IN/MSX: Running 4 Copies of an Operating System at Once
In mid-1981 I was working at a little place in Rockville, Maryland called the National Institute for Safety Research (NISR) while studying part-time for my BS degree in Computer Science at American University in Washington, DC.
While working at NISR, writing PL/I code to analyze driver records pulled from the DMVs of multiple states, I came to know an IBM operating system called VM/370. This “hypervisor” ran other operating systems as guests, a concept which intrigued me. Early in my IBM mainframe programming career I had discovered that it was possible (and inexpensive) to order the “program logic” manuals from IBM. These documents exposed the internal workings of IBM’s operating systems and compilers in considerable detail. I learned all that I could about VM/370 while waiting for my programs to chew through mounds of data.
Even before this, while at Montgomery College in Rockville, I had fished entire OS/370 operating system “build” listings from the trash (a pile of paper 5 or 6 inches thick) and studied them with care.
Rumors of IBM’s entry into the personal computer business had been flying for some time. Shortly after the announcement I went to the local Computerland store to see one of these much-anticipated machines in the flesh. While I was peeking my head inside, the guy next to me asked me what I thought of it and of the Intel 8088 chip inside. I told him that it was ok, but that the Motorola 68000 was a much more interesting and better designed chip. He thought that this was an interesting opinion and we started to talk. He introduced himself as Dave Dikel and said that he was working for a company that was going to bring a 68000-based computer to market. He invited me to come over to talk to them.
Before I knew it I was meeting with Dave and with Dick Naedel, the CEO of Intellimac. Dick explained that they were building computers targeted at Ada programmers. At that time, [Ada](http://en.wikipedia.org/wiki/Ada_(programming_language)) was supposed to be the next big thing in high-level programming languages. Commissioned by the US Department of Defense, it was designed to replace the hundreds of unique and often proprietary languages then in use. Dick had a huge and well-appointed office (perhaps 30 by 30 feet) and spoke with knowledge and confidence about his plans.
The machines were to be built in 19″ racks, using the Stanford SUN board, which had a Motorola 68000, memory management hardware, space for some ROMs and 2 Megabytes of RAM. Intellimac added a large (14″ platter, perhaps 140 MB) Fujitsu hard drive and some other peripherals. The SUN board used a then-current standard called the Multibus, designed (ironically enough) by Intel.
Dick explained to me that they were using an operating system from a company called Telesoft. Telesoft, headed by UCSD Pascal author Ken Bowles, was building an Ada compiler on top of its ROS (Renaissance Operating System) product. He told me (and I remember this clearly) that they had the operating system running in single user mode but that they wanted to run it in multi-user mode. At that point I was barely 21 years old. I had written a whole bunch of system-level 6502 assembler code and I had a really good ground-up understanding of the way that contemporary computer hardware worked. After studying the manual for the SUN board, I decided that I could simply break the 2MB of physical memory into 4 chunks of 512KB each and run 4 copies of the operating system, gaining control via interrupts and device drivers. Dick offered me $10 per hour, just slightly more than I was making at NISR, and I accepted with only the vaguest idea of what I was getting into. When you are young and naive, anything seems possible with technology.
Two weeks later I showed up for my first day at work, expecting to take the existing single-user version of the operating system and make it multi-user. I started in, only to discover that Dick had been stretching the truth just a little bit. In fact, they didn’t have the bootstrap code or the device drivers needed to make the hardware run at all! Undaunted, and facing a three-month deadline before they were going to be ready to ship the hardware, I set to work. The details elude me at this point, but I must have used some sort of cross-assembly system to write the first simple bootstrap loader (in 68000 assembler code) and the single-user driver for the hard drive. Because I had to bring the machine up from scratch, I had a really good understanding of what had to be done to initialize the various bits of hardware, set up the memory management, seek the disk drive, and so forth. After about 6 weeks of hard labor I actually booted ROS for the first time and soon afterward it was running reliably.
The clock to the three-month shipping date was still ticking, so I then set to work on the multi-user hypervisor environment. I spent a lot of time studying the interrupt and exception handling model of the 68000 and was able to see how to context switch between multiple processes pretty easily. Basically, I kept a separate context (all of the data and address registers, the stack pointer, the status flags, the interrupt mask, and the program counter) for each of the 4 copies of the operating system. I wrote code to store and restore the contexts. I wrote some nice triple-nested loops to load up the memory management hardware, allocating an equal amount to each partition and protecting each partition from the others. I added some hooks in the disk and console (serial port) drivers and we procured a multiple-port serial I/O board to run the additional dumb terminals.
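For readers who want to see the shape of this, here is a rough sketch in Python rather than 68000 assembler. It is an illustration only, not the original code: the `Context` and `Cpu` classes and the `switch` function are hypothetical names, and the model treats A7 as the single stack pointer, which simplifies the real 68000. The only figures taken from the story are the 2 MB of RAM, the 4 partitions, and the list of registers kept per partition.

```python
from dataclasses import dataclass, field

RAM_BYTES = 2 * 1024 * 1024                  # 2 MB on the board, per the story
PARTITIONS = 4                               # four copies of the operating system
PARTITION_BYTES = RAM_BYTES // PARTITIONS    # 512 KB each


def partition_range(index):
    """Return the (first, last) physical address of partition `index`."""
    base = index * PARTITION_BYTES
    return base, base + PARTITION_BYTES - 1


@dataclass
class Context:
    """Saved state for one guest (hypothetical layout)."""
    d: list = field(default_factory=lambda: [0] * 8)   # data registers D0-D7
    a: list = field(default_factory=lambda: [0] * 8)   # address registers; A7 is the stack pointer
    sr: int = 0    # status flags plus interrupt mask
    pc: int = 0    # program counter


class Cpu:
    """Hypothetical stand-in for the live register file."""
    def __init__(self):
        self.d = [0] * 8
        self.a = [0] * 8
        self.sr = 0
        self.pc = 0


def save_context(cpu, ctx):
    """Copy the live registers into the guest's saved context."""
    ctx.d, ctx.a = list(cpu.d), list(cpu.a)
    ctx.sr, ctx.pc = cpu.sr, cpu.pc


def restore_context(cpu, ctx):
    """Copy a guest's saved context back into the live registers."""
    cpu.d, cpu.a = list(ctx.d), list(ctx.a)
    cpu.sr, cpu.pc = ctx.sr, ctx.pc


def switch(cpu, contexts, current, nxt):
    """Save the running guest, load the next one, return the new index."""
    save_context(cpu, contexts[current])
    restore_context(cpu, contexts[nxt])
    return nxt


cpu = Cpu()
contexts = [Context() for _ in range(PARTITIONS)]
cpu.pc = 0x1000                              # pretend guest 0 was running here
current = switch(cpu, contexts, 0, 1)
for i in range(PARTITIONS):
    first, last = partition_range(i)
    print(f"partition {i}: {first:#08x}-{last:#08x}")
```

The sketch leaves out the memory management setup entirely, since the story does not describe the board's mapping registers in enough detail to model them.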
I built something I called VMON, the Virtual Monitor. Each of the 4 terminals would boot up into VMON. From there, keyboard commands could be used to examine and modify memory, change system settings, and to boot up a copy of ROS into the memory partition. The exception handling was set up so that the operating system running in any partition could crash back into VMON without affecting the other running copies.
Getting to this point took about a month. The bootstrap code, the hypervisor, and VMON were all burned into ROMs (read-only memories). I would write the code, assemble it, convert it to the format required by the ROM burner, burn it, plug the ROMs into the test hardware, and give it a shot. All debugging was done using the system itself. I wasn’t even aware that most system-level programming like this was done with the aid of an ICE, an in-circuit emulator. I got to know the machine inside and out, including the exact layout of the stack after the registers had been pushed. There’s something really good and rewarding about knowing a system at this very intimate level. If you understand everything, then nothing at all should be a surprise. A complete understanding can lead to complete mastery of what can be done. Conversely, incomplete or incorrect understanding will lead to trouble. You have to know what’s going on. You can’t browbeat the computer into doing something other than what you’ve programmed it to do. It is clean, pure, and mentally demanding like almost nothing else.
Miracle of miracles, I completed the code and it booted a copy of the operating system. And then a second, a third, and a fourth. Everyone was impressed — it seemed to work pretty well and was demoable within a few days. However, it wasn’t perfect. After 20 to 30 minutes of operation, one or another of those operating systems would crash into VMON.
Much head-scratching ensued. It was pretty clear that I had done just about everything right. For things to work at all, I simply had to be storing and recovering the system state with perfect accuracy as I transitioned from partition to partition. If anything wasn’t right then it would have to fall over and die within seconds. In the course of 20 to 30 minutes of flawless operation it had to be doing things right millions and millions of times.
Suspecting a flaw in the context switching, I simply rewrote it. This didn’t help. A week passed. I remember walking around downtown Bethesda all night, running the code in my head trying to figure out what I had done wrong.
I decided that I was somehow corrupting the memory partitions from outside. I replaced each booted operating system with a test program. Each copy of the program would simply fill up its memory partition with a known pattern, then read it back while checking for errors. I started them up and waited. Sure enough, one of them detected a mistake. Oddly enough (and I can still remember this with absolute clarity 25 years later) the wrong value was the status flags and the interrupt mask that my context switching code would push onto the stack while switching between memory partitions. If I didn’t know the machine in such great detail, I would have thought that the 0x2703 was just another value. But I recognized it for what it was, and in an instant I knew what was wrong.
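A fill-and-verify test of this kind can be sketched in a few lines of Python. The pattern function, the buffer size, and the corrupted offset below are all made up for illustration, since the story does not say what pattern the real test program used. The `decode_sr` helper follows the standard 68000 status register layout (trace bit, supervisor bit, 3-bit interrupt mask, condition codes) and shows why 0x2703 stands out: supervisor mode with the interrupt mask at 7.

```python
def pattern_for(partition, offset):
    """Hypothetical 16-bit test pattern that depends on partition and offset."""
    return (partition * 0x1111 + offset) & 0xFFFF


def fill(memory, partition):
    """Write the known pattern into every word of the buffer."""
    for offset in range(len(memory)):
        memory[offset] = pattern_for(partition, offset)


def verify(memory, partition):
    """Return (offset, expected, found) for every word that does not match."""
    errors = []
    for offset, found in enumerate(memory):
        expected = pattern_for(partition, offset)
        if found != expected:
            errors.append((offset, expected, found))
    return errors


def decode_sr(value):
    """Split a 68000 status register word into its fields."""
    return {
        "trace": bool(value & 0x8000),
        "supervisor": bool(value & 0x2000),
        "interrupt_mask": (value >> 8) & 0x7,
        "condition_codes": value & 0x1F,
    }


words = [0] * 1024            # small stand-in for a 512 KB partition
fill(words, partition=2)
words[100] = 0x2703           # simulate the stray write described above
for offset, expected, found in verify(words, partition=2):
    print(f"offset {offset}: expected {expected:#06x}, found {found:#06x}")
    print(decode_sr(found))
```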
When switching from partition to partition I had to first save out the old state and then switch to the new one. Each partition had its own call stack and there was another call stack for the hypervisor. Part of my context switching code would adjust the memory management to activate the block of memory which contained the partition’s call stack. Well, there was a small problem with my code. I would first set up the new stack pointer, and then activate the new block of memory. If a high-priority clock interrupt occurred between these two operations, the stack pointer would effectively be pointing into another partition’s memory. There was a one-instruction window during which an interrupt would be deadly. The 0x2703 value was the evidence that I needed to realize that I had to do things slightly differently, setting up a separate stack for the hypervisor. After this tiny change the system ran flawlessly.
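The ordering problem is easier to see in a toy model. The Python sketch below is one possible reading of the bug and the fix, not a reconstruction of the real code: the `Machine` class, the two-guest setup, and the assumption that the hypervisor's stack is always mapped are all invented for illustration. An interrupt is modeled as a single push of the status word through whatever memory is currently mapped.

```python
SR_WORD = 0x2703   # the status word an interrupt pushes in this model


class Machine:
    """Toy model: one guest region mapped at a time, plus a hypervisor stack."""

    def __init__(self, size=16):
        self.guest = {"A": [0] * size, "B": [0] * size}
        self.hyp_stack = [0] * size      # assumed to be always mapped
        self.mapped = "A"                # guest region currently exposed
        self.sp = ("guest", size)        # (address space, index); stacks grow down

    def interrupt(self):
        """Push the status word through the current stack pointer and mapping."""
        space, index = self.sp
        target = self.hyp_stack if space == "hyp" else self.guest[self.mapped]
        target[index - 1] = SR_WORD


def switch_buggy(m, new_region, new_index, interrupt_in_window):
    m.sp = ("guest", new_index)          # step 1: load the new stack pointer
    if interrupt_in_window:
        m.interrupt()                    # the old region is still mapped here
    m.mapped = new_region                # step 2: activate the new memory


def switch_fixed(m, new_region, new_index, interrupt_in_window):
    m.sp = ("hyp", len(m.hyp_stack))     # run the switch on the hypervisor stack
    if interrupt_in_window:
        m.interrupt()                    # the push lands in hypervisor memory
    m.mapped = new_region
    m.sp = ("guest", new_index)          # mapping is already correct at this point


bad = Machine()
switch_buggy(bad, "B", 8, interrupt_in_window=True)
print("buggy: guest A word 7 =", hex(bad.guest["A"][7]))

good = Machine()
switch_fixed(good, "B", 8, interrupt_in_window=True)
print("fixed: guest A word 7 =", hex(good.guest["A"][7]))
```

In the buggy ordering the push goes to partition B's stack offset inside partition A's memory, which is exactly the kind of stray status word the pattern test would catch. In the fixed ordering neither guest's memory is touched.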
Intellimac went on to sell quite a few of these machines. For $50K (back in 1982) a 4-user Ada development machine was a bargain. They named the hypervisor IN/MSX, and later made some money by selling this name off to Microsoft in the company’s waning days. The Telesoft Ada compiler was never quite finished — as far as I know they never reached the coveted goal of passing the entire suite of validation tests, despite at least one ground-up rewrite. Dick wanted me to singlehandedly write an Ada compiler for Intellimac and even sent me to an ACM compiler construction conference, but this was clearly a very daunting task and we never actually wrote any code.
After some initial success the company got a bit conceited and started to make promises thinking that I could always figure out a way to make good on them. Sadly, I was not a miracle worker, and when they started to promise customers that they could run Unix and ROS on the same machine I knew that it was time to go. I moved on to Contel Information Systems, and the crazy stuff that I did there will have to be a story for another day.