Guest Speaker – Jason Orendorff from Mozilla

Dr. T | December 5, 2017

When we began the Ruby back-end, which included both tech and guest speakers, I didn’t realize how much value this part of the program would offer. I know every speaker who comes is the best yet, but Jason did more than encourage me to be a better developer. He encouraged me to take responsibility for outcomes, something we really don’t think much about until it is too late.

He set the stage by having us, from the future, look at the organizational and tech failures that led to one death too many.

Who would have thought that software could have so much impact on the outcomes of a person’s life? I have always thought of development as a superpower, and with it comes great responsibility.

He said a few things that stuck with me, but one of the most impactful was: “Everything in industry has a greater impact and the responsibility for safety and security is entirely on the developer.”

He presented us with the worst-case scenario – processes and tech that led to the death of 3 people in the Therac-25 case.

He began by asking who was at fault: the organization or the tech. Then he listed out the errors on both sides.

Organizationally they were:

  • No code reviews of system code
  • No written systematic test plan – When the FDA asked for test documentation after the tragedy, they couldn’t provide anything.
  • Software risks not assessed – with a predicted failure rate of 1 in 10 million, they thought it was acceptable to proceed.
  • Errors not documented – error codes popped up, but their meaning wasn’t written down anywhere, which led users to believe that all errors were minor. There was also no guidance on what to do next.
  • No final system testing
  • Overconfidence – no one wanted to take blame
  • No FDA oversight for medical machines or devices

 

The technical risks/problems were:

  • Bad error messages – A good error message tells you what to do next…
  • Bad error recovery
  • Data races: a race condition happens when multiple processes run at the same time and the result depends on which one gets there first. These errors are hard to track and debug because they don’t happen consistently.
  • Incrementing a boolean
  • Status confusion – when there are more statuses than are necessary
  • Programming mistakes
  • No hardware interlocks to prevent harm – the example he gave was a microwave. You can’t nuke your head because the unit won’t work unless the latch clicks.
  • Frequent minor errors – these conditioned the operators to believe that all errors were minor
  • General bad software quality
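
To make the “incrementing a boolean” item concrete, here is a minimal sketch of my own. It is not code from the talk or from the Therac-25. It assumes a flag stored in a single byte, where zero means “no safety check needed”, and every name in it is hypothetical. If the flag is set by adding 1 instead of assigning a fixed value, it wraps back to zero on every 256th pass, and the check is silently skipped.

```python
def set_flag_by_increment(flag: int) -> int:
    """Risky pattern: mark 'check needed' by adding 1 to a one-byte flag."""
    return (flag + 1) & 0xFF  # emulate one-byte wraparound (255 + 1 -> 0)


def set_flag_by_assignment(flag: int) -> int:
    """Safer pattern: assign a fixed non-zero value every time."""
    return 1


def passes_that_skip_check(set_flag, passes: int) -> list:
    """Return the pass numbers on which the flag reads as 'no check needed'."""
    flag = 0
    skipped = []
    for n in range(1, passes + 1):
        flag = set_flag(flag)
        if flag == 0:  # zero is interpreted as "skip the safety check"
            skipped.append(n)
    return skipped


print(passes_that_skip_check(set_flag_by_increment, 600))   # [256, 512]
print(passes_that_skip_check(set_flag_by_assignment, 600))  # []
```

The failure only shows up once every 256 passes, which is exactly the kind of bug that doesn’t happen consistently and is hard to catch by hand.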

  Things we hope to have learned from this tragedy:

  • When you know more, things look different.
  • Failures have a web of causes
  • In software, safety is congruent to quality, which is congruent to security
  • Blame is not a number – it is not our job to deflect culpability; our job is to make sure it doesn’t happen again.
  • Safety engineering is breaking causal chains that lead to failure

 

So how does the developer superpower manifest itself? Since we can’t address the root cause, because people are hard to change, we should focus on what we can change. We can change processes and systems.

So, bringing this all together into measurable things that we can focus on, here is a list Jason gave us that should be the minimum for any software development environment, and I couldn’t agree with him more:

  • Version Control – who is writing the version and why?
  • Issue tracking – this builds collaboration and community, and it also adds quality
  • User feedback – without it the product will fail
  • Code Review Policy that everyone follows
  • Automated testing with a focus on regression testing – a regression test checks something that you know is working, so if a new feature or version is added and the test fails, devs can catch it quickly and fix it.
  • Continuous Integration – extra infrastructure to get rid of the chaos and to know immediately when a change has messed something up.
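
The regression-testing idea from the list above can be sketched in a few lines. This is only an illustration under my own assumptions: `average` is a hypothetical function standing in for real application code, and the tests pin down behavior that is known to work today so that a later change that breaks it fails right away.

```python
import unittest


def average(values):
    """Hypothetical function under test; returns 0.0 for an empty list."""
    if not values:
        return 0.0
    return sum(values) / len(values)


class AverageRegressionTest(unittest.TestCase):
    def test_known_good_behavior(self):
        # Behavior we know works today and want to keep working.
        self.assertEqual(average([2, 4, 6]), 4.0)

    def test_empty_list_still_handled(self):
        # Guards the empty-list case so a future change cannot quietly break it.
        self.assertEqual(average([]), 0.0)


def run_regression_suite():
    """Load and run the regression tests, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(AverageRegressionTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_regression_suite()
print("all regression tests passed:", result.wasSuccessful())
```

A continuous integration setup would run a suite like this on every change, so nobody has to remember to do it by hand.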

These are extra precautions that Mozilla has implemented to ensure quality:

  • Try Server – to test and see if the new code would pass tests
  • A security team – they are dedicated builders and breakers
  • Fuzzer Team (The Fuzzers) – generates random input and submits a report if there is a crash
  • Assertions
  • QA Team – provides the normal human expectation of what a program should be and how it should function, and reports bugs
  • Automated updates
  • Bug bounty – if you find a security hole, you get some loot upwards to $10K! And the problem gets fixed so we all win!
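
Fuzzing and assertions work well together, so here is a toy sketch of both. It is my own illustration and has nothing to do with Mozilla’s actual tools: `parse_percent` is a hypothetical function, the assertion inside it states what the programmer believes must always be true, and `fuzz` throws random strings at it and records every input that fails in an unexpected way.

```python
import random


def parse_percent(text: str) -> int:
    """Hypothetical function under test: parse a string like '42%' into 42."""
    number = int(text.rstrip("%"))  # raises ValueError on malformed input
    # Assertion: documents an assumption and fails loudly if it is violated.
    assert 0 <= number <= 100, f"percent out of range: {number}"
    return number


def fuzz(target, runs: int = 1000, seed: int = 0) -> list:
    """Call target with random strings and collect unexpected failures."""
    rng = random.Random(seed)  # fixed seed so a failing run can be repeated
    alphabet = "0123456789%- "
    crashes = []
    for _ in range(runs):
        length = rng.randint(0, 6)
        text = "".join(rng.choice(alphabet) for _ in range(length))
        try:
            target(text)
        except ValueError:
            pass  # expected: malformed input is rejected cleanly
        except Exception as exc:  # anything else counts as a "crash" to report
            crashes.append((text, repr(exc)))
    return crashes


crashes = fuzz(parse_percent)
print(f"{len(crashes)} crashing inputs found, for example: {crashes[:3]}")
```

Here the fuzzer finds inputs such as out-of-range numbers that trip the assertion, which tells us the function needs to validate its input instead of assuming it.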

The common thread throughout all of these issues and solutions is

  1. to avoid relying too much on programmer diligence. Put a system of processes in place that nullifies that factor.
  2. WE CAN WORK ON THIS! We have the power to do this and put it into place

Ultimately he closed with this: Engineering = Quality Engineering

I know firsthand the impact of crappy software. In my research I uncovered the significance of good tech and how tech can negatively impact a person’s life if it is not created with care and user feedback. This talk was just in time for me.

I had been struggling with the gap between creating software and creating quality software, and this has given me the courage I needed to be great! Thank you, Jason!