Hi, and welcome back. This will be a dark and tragic lesson. It's about the Therac-25 radiation therapy machine: how it was poorly designed, and how it killed patients as a result. The events took place between 1985 and 1987. An IEEE article analyzing the failures appeared in 1993. Although the article is now 24 years old, it is a reminder of what can happen when development, particularly design, is shoddy. The problems that existed then still exist today: the attitudes, the haste, the poor engineering. I think when we're done with this case study, you'll see that the good design practices advocated previously would have prevented many, if not all, of the problems.

Some background. Radiation therapy for cancer uses X-rays to disrupt or destroy the DNA of cancer cells, preventing them from reproducing. The high-energy photons impart some of their energy to the chemical bonds holding the DNA together, fracturing it so that it can't be copied. Of course, the high energy also disrupts the DNA of normal cells. So the focus, intensity, and distribution of the X-ray beam are critical in radiation therapy. The Therac-25 was built by Atomic Energy of Canada Limited (AECL). AECL and a French company, CGR, had collaborated since the early 1970s on linear accelerators for medical applications. The Therac-25's predecessor was the Therac-6, a 6-million-electron-volt (6 MeV) accelerator. The Therac-25 was produced alongside another machine, the Therac-20; both were derived from the Therac-6. The 20 and 25 models had 20 MeV and 25 MeV accelerators respectively, so they were much more powerful. Also, the Therac-6, besides being much lower powered, was largely a mechanical system rather than one governed by software. As such, its settings and safety interlocks were physical rather than virtual.
The important point here is that this was the first radiation therapy machine produced by AECL that was completely run by software. My hands start to sweat at this point.

The computer hardware for the system was a DEC PDP-11. By the mid-1980s this was a venerable machine, having been around since the early 1970s. It had a 16-bit processor and a rich, highly flexible instruction set, which was not uncommon for machines of that era. But by the mid-1980s the PDP-11 was in decline, because it was a 16-bit architecture and 32-bit CPUs were coming into production. In fairness, the PDP-11 wasn't a bad choice, since the first Therac-25 had been prototyped in 1976 and the first production machine was manufactured in 1982.

A single programmer produced the software for the Therac-25. He left AECL in 1986, and lawyers in a later lawsuit were unable to obtain any information about him. There was also no documentation for the software he wrote, nor any requirements or design behind it. Rather than use Unix, which was widely run on the PDP-11, the programmer wrote a custom operating system. That, and the operational software, were all written in PDP-11 assembly language.

Let me say something about that. In the early 1980s, I wrote C on Unix for the PDP-11 and its 32-bit follow-on, the VAX (Virtual Address eXtension) machines. I was familiar with PDP-11 assembly, and its instruction set, while extremely flexible, is difficult to read. The same instruction, MOV for example, can act as a load or a store depending on its source and destination operands. Instructions that manipulated data also had sources and destinations in them. So you had to look at the code longer and think harder to figure out what it was doing. By comparison, C, as cryptic as it can be, was a lot easier to read and to proofread. Given the information so far, I have two conclusions. First, this won't end well.
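To make the point about MOV concrete, here is a small illustrative decoder for the PDP-11 MOV instruction word. It is a sketch, not DEC tooling and not anything from the Therac-25. It assumes the standard PDP-11 double-operand layout: a 4-bit opcode (0001 for MOV), then a 6-bit source field and a 6-bit destination field, each made of a 3-bit addressing mode and a 3-bit register number. The names `MODE_SYNTAX`, `operand`, and `decode_mov` are hypothetical, and the special PC-relative forms (immediate, absolute) are rendered only in generic syntax.

```python
# Illustrative sketch: classify a PDP-11 MOV instruction by its operand modes.
# Assumes the standard double-operand layout 0001 SSSSSS DDDDDD (octal 01SSDD),
# where each 6-bit field is a 3-bit addressing mode then a 3-bit register.

# Generic operand syntax for the eight PDP-11 addressing modes.
# (Special PC-relative forms such as #imm and @#addr are not rendered here.)
MODE_SYNTAX = {
    0: "R{r}",       # register
    1: "(R{r})",     # register deferred
    2: "(R{r})+",    # autoincrement
    3: "@(R{r})+",   # autoincrement deferred
    4: "-(R{r})",    # autodecrement
    5: "@-(R{r})",   # autodecrement deferred
    6: "X(R{r})",    # index (X is the word following the instruction)
    7: "@X(R{r})",   # index deferred
}


def operand(mode: int, reg: int) -> str:
    """Render one operand in generic assembler syntax."""
    return MODE_SYNTAX[mode].format(r=reg)


def decode_mov(word: int) -> tuple[str, str]:
    """Return (assembly text, role) for a 16-bit MOV instruction word."""
    if ((word >> 12) & 0o17) != 0o01:  # top four bits must be 0001 (MOV)
        raise ValueError(f"{word:06o} is not a MOV instruction")
    src_mode, src_reg = (word >> 9) & 0o7, (word >> 6) & 0o7
    dst_mode, dst_reg = (word >> 3) & 0o7, word & 0o7

    # Mode 0 means the operand is a register; any other mode touches memory.
    src_is_reg = src_mode == 0
    dst_is_reg = dst_mode == 0
    if src_is_reg and dst_is_reg:
        role = "register copy"
    elif dst_is_reg:
        role = "load"                    # memory -> register
    elif src_is_reg:
        role = "store"                   # register -> memory
    else:
        role = "memory-to-memory copy"

    text = f"MOV {operand(src_mode, src_reg)}, {operand(dst_mode, dst_reg)}"
    return text, role


if __name__ == "__main__":
    for w in (0o010102, 0o011102, 0o010112, 0o012122):
        print(f"{w:06o}", *decode_mov(w))
```

For example, `decode_mov(0o011102)` reports `MOV (R1), R2` as a load, while `decode_mov(0o010112)` reports `MOV R1, (R2)` as a store: the same opcode in both, with the role decided entirely by the addressing modes. That is exactly what makes hand-reading the code slow.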
Second, it wasn't entirely, or even mostly, the developer's fault. I can understand the developer's attitude. At that time there weren't many people with programming skills. In fact, the first PDP-11 I worked on, I'd built from a kit. It didn't even come assembled, so if you wanted to program it, you had to put the thing together first. It was also a time when there were few standards, and nothing approaching the idea of software engineering crossed people's minds.

I do consider it a management and design fault, however, not to have used a standard operating system with a proven track record of correct execution. One of the failure patterns eventually discovered in the Therac-25 software had to do with the operating system having no test-and-set mechanism, which is critical for writing correct multitasking, multithreaded code. At that time, writing operating systems was not a common skill. Certainly there were people who did an excellent job of it, but they weren't running around loose as programmers for hire. We're all familiar with the bravado of the frontline programmer: "Yeah, sure, I can do that," and then hoping to figure it out later. Management didn't appreciate this. It should have.

The six serious overdose cases occurred between 1985 and 1987. The Food and Drug Administration withdrew approval for use of the machine in 1987. In each case, a machine malfunction, which always had a software component, caused patients to receive instant radiation burns from doses hundreds of times higher than prescribed. A number of designed-in deficiencies contributed to the overdoses. First, if the software detected an error, it notified the operator by displaying the word MALFUNCTION followed by a number from 1 to 64. No document has ever been found that explained what all the codes meant.
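The test-and-set point deserves a concrete illustration. A test-and-set operation atomically sets a flag and returns its previous value, so exactly one task can observe "it was clear" and enter a critical section. Python exposes no raw test-and-set instruction, so this sketch assumes a stand-in: a non-blocking `threading.Lock.acquire()`, which atomically takes the lock if it is free and reports whether it succeeded. The names `TestAndSetFlag`, `SpinLock`, and `count_with_lock` are hypothetical, chosen for illustration.

```python
import threading
import time


class TestAndSetFlag:
    """Hypothetical stand-in for a hardware test-and-set instruction.

    Assumption: a non-blocking Lock.acquire() serves as the atomic primitive.
    It either takes the lock (flag was clear, now set) or fails (flag was set).
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()

    def test_and_set(self) -> bool:
        """Atomically set the flag and return its PREVIOUS value."""
        acquired = self._lock.acquire(blocking=False)
        return not acquired  # acquired means the flag was previously clear

    def clear(self) -> None:
        self._lock.release()


class SpinLock:
    """Mutual exclusion built only from test-and-set: spin until we win."""

    def __init__(self) -> None:
        self._flag = TestAndSetFlag()

    def __enter__(self) -> "SpinLock":
        while self._flag.test_and_set():  # someone else holds it
            time.sleep(0)                 # yield so the holder can finish
        return self

    def __exit__(self, *exc) -> None:
        self._flag.clear()


def count_with_lock(n_threads: int = 4, increments: int = 10_000) -> int:
    """Several threads do read-modify-write on one shared counter."""
    lock = SpinLock()
    shared = {"count": 0}

    def worker() -> None:
        for _ in range(increments):
            with lock:
                current = shared["count"]      # read
                shared["count"] = current + 1  # modify and write back

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]


if __name__ == "__main__":
    print(count_with_lock())  # 40000: no increments are lost
```

The design choice here is the atomicity of "check the flag and set it" as one step. A naive version that first reads a plain flag and then sets it lets two tasks both see it clear and both enter the critical section, which is the kind of interleaving a custom operating system without such a primitive leaves open.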
When a malfunction occurred, the machine would go either into a pause state, in which treatment could be resumed by entering the letter P at the operator console, or into a reset state, in which all the prescription and dosing information had to be re-entered by hand. It's also worth mentioning at this point that for each patient's treatment, the beam parameters had to be entered by hand.

A note about the user interface. This was a VT100 terminal, a standard at the time, with a 24-line by 80-character screen. There were no graphics; tabbed fields and command lines were very common. The radiation prescription information included the radiation mode (electron beams for low energy, X-rays for high energy), dose, dose rate, time, field size, gantry rotation, and data for accessories added to the machine. This information originally had to be entered twice, and the two entries had to match or the operator would have to start over. Before production, this was judged too slow, so the data was entered just once and could be copied into the second set of fields simply by pressing the Enter key.

In the overdose cases, it wasn't inaccurate data entry that caused the problems. Two of the cases, however, resulted from an operator discovering an inaccurate entry and quickly changing the parameters. The lack of integrity in the operating system produced race conditions between the routine that read characters from the operator console and the routine that recorded and processed them. This happened notably with the mode field, a single character: X for X-ray and E for electron beam. X-rays were much more powerful than electron beams, and X was the default for the field. Operators would recognize their mistake, quickly tab up to the mode field, change it to an E, and then proceed with the treatment. The real-time routines, however, would occasionally not pick up the edit.
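A simplified model of the mode-field race can make this failure concrete. It is not a reconstruction of the Therac-25's actual assembly code; the names `Prescription`, `BeamSetup`, `start_setup`, `fire_unchecked`, `fire_checked`, and `replay` are hypothetical. It assumes the setup routine snapshots the console fields once data entry is complete and then spends time configuring hardware, while the operator can still edit the screen. The events are replayed in a fixed order rather than with real threads, so the outcome is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Prescription:
    """What the operator sees on the console (hypothetical model)."""
    mode: str = "X"   # 'X' = X-ray, 'E' = electron beam; X is the default
    version: int = 0  # incremented on every operator edit

    def edit_mode(self, new_mode: str) -> None:
        self.mode = new_mode
        self.version += 1


@dataclass(frozen=True)
class BeamSetup:
    """What the setup routine configured the hardware for."""
    mode: str
    based_on_version: int


def start_setup(rx: Prescription) -> BeamSetup:
    """Snapshot the console and begin (slow) hardware configuration."""
    return BeamSetup(mode=rx.mode, based_on_version=rx.version)


def fire_unchecked(rx: Prescription, setup: BeamSetup) -> str:
    """Flawed: trusts the snapshot even if the console changed since."""
    return setup.mode  # rx is ignored, which is exactly the bug


def fire_checked(rx: Prescription, setup: BeamSetup) -> str:
    """Safer: refuse to fire unless the snapshot matches the console."""
    if setup.based_on_version != rx.version:
        raise RuntimeError("console edited during setup; redo setup")
    return setup.mode


def replay(fire) -> tuple[str, str]:
    """Replay the fatal interleaving; return (console mode, fired mode)."""
    rx = Prescription()      # operator leaves 'X' by mistake
    setup = start_setup(rx)  # setup routine snapshots mode 'X'
    rx.edit_mode("E")        # operator quickly corrects to 'E' mid-setup
    return rx.mode, fire(rx, setup)


if __name__ == "__main__":
    print(replay(fire_unchecked))  # ('E', 'X'): screen says E, beam is X
    try:
        replay(fire_checked)
    except RuntimeError as err:
        print("blocked:", err)
```

The flawed version fires the stale mode while the screen shows E. The checked version tags each snapshot with an edit counter and refuses to fire if the console changed after setup began. In truly concurrent code, that final comparison and the firing must also be atomic with respect to edits, for example by holding a lock shared with the keyboard routine; this single-threaded replay does not model that part.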
The machine would proceed with a very high-energy beam configured for X-ray mode, even though the operator console clearly showed the letter E in the mode field. This particular error was so subtle that a number of other errors were found and fixed in the belief that they were the cause, only to have the original overdose problem reappear. There are a number of side stories you can investigate if you're interested. One is the manufacturer refusing to believe the problem was theirs. Another is finding what they thought was the problem and fixing the wrong thing. A third is the lack of information exchange about treatment accidents.

But as far as the software engineering aspect is concerned, here are the main takeaways. Documentation should not be an afterthought. Software quality assurance practices and standards should be established. Designs should be kept simple. Ways to get information about errors, for example software audit trails, should be designed into the software from the beginning. The software should be subjected to extensive testing and formal analysis at the module and software level; system testing alone is not adequate. I think you'll see that the practices recommended a few lessons ago would have prevented the problems experienced by this system.

Two more points. First, about security. Some may say, well, this might have been about safety, but it's not about security. Security is about two things: proper function and proper handling of malfunctions. And that's what this case is about. Second, about radiation treatment, in case you or your loved ones, heaven forbid, need it: a lot has changed since the Therac-25. Machines are not that much more powerful, but the systems are much better engineered. Radiation planning is done by modeling the volume of a tumor using computed tomography (CT), magnetic resonance imaging (MRI), or both.
A set of treatment parameters, describing the minimum radiation the tumor volume must receive and the maximum radiation the surrounding tissue may receive, is put into the system. A computer program then runs simulations of the machine to find which of literally hundreds of beam parameters will best irradiate the affected area while sparing the surrounding tissue. Once the treatment program has been generated, a high-fidelity simulation is run to verify that the computed parameters are correct. Finally, a three-dimensional radiation-sensing array is placed on the treatment table and the radiation program is run. The collectors in the sensing array verify that the computer model is correct, and then treatment can proceed.

At no point during this process is data entered manually, with the possible exception of outlining the tumor volume to be irradiated. This is generally done on high-resolution screens that overlay the planned tumor volume on other imagery to ensure accuracy. From there on, data is simply transferred from system to system, and no hand entry is performed. So if you or someone you know needs radiation treatment, ask for an explanation. The doctors and technicians should be happy to share their knowledge and explain what's happening.

Okay, so, not to forget: secure design is just good design. Thanks, and we'll see you next time.