Computers in Spaceflight: The NASA Experience

- Chapter Three -
- The Skylab Computer System -
 
Software
 
 
[72] IBM wanted to do a careful job on the software for Skylab. In the late 1960s and early 1970s, the company internally pushed the development and implementation of software engineering techniques. IBM learned many lessons from the creation of the OS/360 operating system and from various government-related projects. Two IBM software management experts, Harlan Mills and Frederick Brooks, circulated these lessons both within IBM and to the computing public28. The small size (16K) of the Skylab software and the correspondingly small group assigned to write it (never more than 75 people, not all of whom were programmers, and only 5 or 6 for the reactivation software) meant that the difficulties in communication and configuration control associated with large projects were not as much of a factor. Also, the IBM programmers were specialists. MIT, by contrast, had assigned engineers to the programming of the Apollo computer, assuming that it was easier to teach an engineer to program than to teach a programmer the nuances of the system. This turned out to be a mistake, which MIT acknowledged29. Thus, the stage was set for IBM to produce a superb real-time program. However, the complexity of the control moment laws, the redundancy management needs, and the inevitable memory overrun kept the development from being simple.
 
 
Requirements Definition and Design
 
 
IBM and NASA jointly defined the requirements for the Skylab software. Marshall Space Flight Center delivered the detailed requirements for the control laws, navigation, and momentum management, leaving lesser items such as I/O handling to the contractor. IBM and NASA made a parallel effort to determine if the equations actually worked30. The result was the Program Requirements Document (PRD), issued July 1, 1970 31.
 
The actual design, the Program Definition Document (PDD), was released later and served as the baseline for the software, which meant that the design could not be changed without formal review. The software resulting from these documents ranged from 9,000 words to nearly 20,000 words of memory. Since the memory size of the computer was just over 16,000 words, a "scrub" was necessary, continuing the NASA tradition of exceeding the memory size of an already-procured computer by the time the planners knew the final requirements. Managers had not yet learned that software needs should drive the hardware choices. Engineers changed the control moment gyro [73] logic to reduce core usage and made other cuts32. Memory became the prime consideration in allowing requirements changes33.
 
 
Architecture and Coding
 
 
Skylab gave IBM an opportunity to demonstrate how to do software development right. The company carefully separated the production process into strictly designed phases. Two different flight loads resulted: one full-function program that filled the 16K memory, and an 8K version as a backup that needed only one module for storage. These two programs needed slightly different architectures, or schemes for organizing the execution of functions, which made the job tougher. Also increasing complexity was the requirement for redundancy management. An advanced development environment helped keep the complexity under control.
 
 
Production Phases
 
 
IBM developed the software load for Skylab in four baselined phases. Originally, three were planned (Phase I, Phase II, and Final), but numerous changes made during Phase I required an intermediate stage, Phase IA. Crews used the software resulting from Phase IA for training in the simulators in Houston34.
 
The PDD for Phase I was released on November 4, 1970, and coding began35. The Phase I program contained most of the major components of the eventual flight load, including discrete I/O and interrupt processing, command system processing, initialization, redundancy management, attitude reference determination, attitude control, momentum desaturation, maneuvering, navigation and timing, ATM experiment control, displays, telemetry, and algorithms for utilities36. IBM's programming team completed and released the Phase I program for verification on June 23, 1971. It consisted of 16,224 words, filling about 99% of the computer's memory37.
 
This situation led to the added phase, which was chiefly a memory scrub. Not only was Phase I a fairly extensive program, but three modules still had to be coded, and many changes would likely occur in the nearly 20 months remaining before launch38. By the time IBM delivered Phase IA on February 9, 1972, it had incorporated 45 waivers and 105 software change requests (SWCR) made after the thirteenth revision of the design39. This meant that nearly 40% of the original program was changed. Even with the attention to memory size, the new software amounted to 16,111 words, or 98.3% of the locations.
 
[74] Phase II represented another extensive revision of the software. The baseline for it was Phase IA plus 49 approved change requests. By delivery on August 28, 1972, 102 additional changes had been incorporated and the design was up to revision 19 40. As a result, software engineers modified about 35% of the program. The memory usage rose to 99.7%, or 16,338 locations. The final version reduced this to 16,329 words. The difference between Phase II and the flight release was only 17 additional changes. IBM made the delivery on March 20, 1973, 2 months before the launch.
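The memory figures quoted for each phase can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming that "16K" means exactly 16 x 1,024 = 16,384 words; the function and variable names are illustrative and do not come from IBM's documentation.

```python
MEMORY_WORDS = 16 * 1024  # assumption: "16K" means 16,384 words

# Word counts quoted in the text for each delivered load.
PHASE_WORDS = {
    "Phase I": 16_224,
    "Phase IA": 16_111,
    "Phase II": 16_338,
    "Final": 16_329,
}


def memory_usage(words, capacity=MEMORY_WORDS):
    """Return (percent of memory used, words left free) for one load."""
    return 100.0 * words / capacity, capacity - words


for phase, words in PHASE_WORDS.items():
    percent, free = memory_usage(words)
    print(f"{phase:8s} {words:6d} words  {percent:5.1f}% used  {free:3d} free")
```

Under that assumption the percentages agree with those in the text, and the flight release leaves only 55 words unused, which suggests why memory dominated decisions about requirements changes.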
 
 
Architecture: The 16K Program
 
 
The ATMDC software was divided into an executive and applications modules. The executive module handled priority multitasking, interrupt processing, support for the interval timer, and basic timekeeping chores41. Applications consisted of three major groups: time-dependent functions, asynchronous functions, and utilities. Time-dependent functions were executed in three cycles, with the possibility of higher priority jobs interrupting the currently running module. The cycles were differentiated by time: There was a "slow loop" each second, an "intermediate loop" executing five times each second, and the switch-over processor running each half second42. Designers grouped appropriate modules in a cycle. An exception to the cycle groupings, but nevertheless time dependent, was the output-write routine, which was run between intermediate loops in order to take more efficient advantage of the system resources. The switch-over process aided in redundancy management, as explained below. Asynchronous functions could be called at any time. One was telemetry, which sent 24 strings of 50 bits per second. The other was the command system, which could receive signals from either the ground or the Digital Address System (DAS, the crew interface) in the workshop; those signals resulted in interrupts. Utility functions included such common algorithms as square root, sine, and cosine, and unique functions such as gimbal angle computations and quaternion multiplication43.
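The three cycle rates can be pictured with a small timetable. This is only a sketch of the rates stated above, not of the flight executive: the 100-millisecond base tick and the assumption that all three cycles start together at time zero are choices made for the illustration.

```python
TICK_MS = 100  # hypothetical base tick, chosen so every period is a multiple

# Periods taken from the text: five times a second, each half second, each second.
PERIODS_MS = {
    "intermediate loop": 200,
    "switch-over processor": 500,
    "slow loop": 1000,
}


def due_at(time_ms):
    """Return the names of the cycles whose period divides time_ms."""
    return [name for name, period in PERIODS_MS.items() if time_ms % period == 0]


def schedule(duration_ms):
    """List (time, cycles due) for every tick in [0, duration_ms) that has work."""
    plan = []
    for t in range(0, duration_ms, TICK_MS):
        cycles = due_at(t)
        if cycles:
            plan.append((t, cycles))
    return plan


for t, cycles in schedule(1000):
    print(f"{t:4d} ms: {', '.join(cycles)}")
```

Over one simulated second the intermediate loop comes due five times, the switch-over processor twice, and the slow loop once. In the real system these cycles were driven by the executive's priority and interrupt mechanisms rather than by a fixed table.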
 
Interrupt handling was quite straightforward. Each application module had a specific priority ranking. Tasks could be requested by several means, such as interrupts, discrete signals, elapsed time, or by the direct request of another program. Any current task could be interrupted when a new task was requested. The priority of the new task was immediately entered into the priority level control tables. If the new task was of a higher priority than the current task, the computer did the new one first. When telemetry or the command system requested a task, its priority was entered on the table, just like tasks....
 
 

[75]
 
 
Figure 3-4. The real-time cycle of the Skylab 16K flight program. (From IBM, Skylab Operation Assessment, ATMDC, 1974)

 
 

[76] ....called in the other ways44. The standard telemetry signal functioning as a Digital Command System (DCS) word consisted of 35 bits. Buried in it were an enable bit, an execute bit, and 12 information bits. The enable and execute bits caused an interrupt, making it possible for the data to be stored45.
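The priority handling described before Figure 3-4 can be sketched as a small priority queue. This is a toy model, not the ATMDC executive: the class name, the task names, the numeric priorities, and the convention that a lower number means a higher priority are all assumptions made for the illustration.

```python
import heapq


class PriorityTable:
    """Toy stand-in for the priority level control tables (names hypothetical)."""

    def __init__(self):
        self._pending = []  # heap of (priority, sequence, task name)
        self._sequence = 0  # tie-breaker: equal priorities run in request order

    def request(self, name, priority):
        """Enter a task request; a lower number means a higher priority (assumption)."""
        heapq.heappush(self._pending, (priority, self._sequence, name))
        self._sequence += 1

    def preempts(self, current_priority):
        """Return True if a pending task outranks the task now running."""
        return bool(self._pending) and self._pending[0][0] < current_priority

    def next_task(self):
        """Remove and return (name, priority) of the highest-priority request."""
        priority, _, name = heapq.heappop(self._pending)
        return name, priority

    def __len__(self):
        return len(self._pending)


def run(table):
    """Drain the table, returning task names in the order they would start."""
    order = []
    while len(table):
        name, _ = table.next_task()
        order.append(name)
    return order


demo = PriorityTable()
demo.request("slow loop", 5)
demo.request("telemetry output", 2)
demo.request("command system", 1)
print(demo.preempts(current_priority=5))  # a pending request outranks level 5
print(run(demo))
```

Every request, whatever its source, goes through the same table, which mirrors the statement that telemetry and command system requests were entered just like tasks called in other ways.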

 
The 16K program had a computation cycle consisting of six levels: (1) experiment input, Control Moment Gyro gimbal rates, Workshop Computer Interface Unit tests, and the command system processor; (2) telemetry output; (3) the switch-over timer (reset each second) and the 64-bit transfer register (refreshed about once every 17 seconds); (4) the intermediate loop (made up of Control Moment Gyro control); (5) the slow loop (containing timing, navigation, maneuver, momentum management, display, redundancy, self test, and experiment support functions); and (6) the "wait" state (entered when all functions in a particular cycle had finished, about 15% of cycle time in the flight release of the program, depending on the number and nature of interrupts)46.
 
 
The 8K Program
 
 
The 8K program was strongly related to the 16K program in that the larger version served as the model for the smaller. Its design, released April 3, 1972, developed from the Phase IA version of the software. IBM delivered the 8K program on November 14, 1972 after 10 weeks of verification activity. The functions of the short program were largely limited to attitude control and solar experiment activity and data handling47. It was 8,001 words in length. IBM reduced the number of levels in the computation cycle of the 8K program to four: Level I handled command processing and I/O to the Gyros, Level II did telemetry, Level III consisted of the time-dependent functions from both the original intermediate loop and slow loop, and Level IV was the wait state48.
 
 
Redundancy Management
 
 
All mission-critical systems in Skylab were redundant. The computer program contained 1,366 words of redundancy management software49. At less than 10% of the total memory, it was a bargain. Managing redundancy with stand-alone hardware and solely mechanical switching would have added much more cost, weight, and complexity to the workshop design, with the loss of a certain amount of reliability.
 
The redundancy management software consisted of two parts: self [77] tests of the computer system and an error detection program for mission-critical hardware not in the computer system. Self tests of the computer were quite extensive: Logic tests might involve doing a Boolean OR operation on the contents of a register to see if a carry occurred; operation tests required executing EXCHANGE and LOAD instructions; and arithmetic tests meant executing an ADD and checking for planned answer50. IBM also designed tests for memory addressing and I/O51.
 
The error detection program examined critical signs in several systems. If a failure was detected in attitude control hardware such as the Control Moment Gyros, rate gyros, or acquisition sun sensors, then backups or reconfigurations were activated52. During the mission, one Control Moment Gyro and several of the rate gyros failed. In fact, a "six-pack" of replacement rate gyros had to be brought up by the second crew.
 
Switch-over between the two computers was handled by the error detection program or automatically activated by the TMR timer circuits. If self tests indicated a computer hardware failure or that the software was not properly maintaining the workshop's attitude, switch-over would then be initiated. The timers were supposed to be reset about once each second during the computation cycle, after which they then counted down until reset. If two of the three reached zero, then switch-over occurred53. Besides automatic switch-over, the crew or the ground could initiate it, as actually happened in mid-mission. So that the secondary computer would be properly activated, a 64-bit transfer register was kept loaded with relevant data. This register, like the timers, consisted of TMR circuits. Great care was taken to ensure that data loaded into the transfer register were uncontaminated. A write operation to the register was restricted in length to a period of 672 microseconds plus or minus 20%, which was just about how long it took to write 64 bits into a redundant circuit. This operation could only take place after 1.5 to 2.75 seconds had elapsed since the last write, so the computer would not accept transient signals as correct data and a new write could not interfere with an earlier write54. Besides this "time-out feature," the transfer register could only be refreshed after a successful execution of the error detection program55. This way, data could not be written to the register from a failed computer.
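The two safeguards described above, the two-of-three timer vote and the gated write to the transfer register, can be sketched as follows. This is a minimal model under stated assumptions: the timer count, the class and function names, and the use of the upper end of the quoted 1.5 to 2.75 second range as the lockout are all illustrative choices, not details of the flight hardware.

```python
WRITE_NOMINAL_US = 672.0  # nominal write duration from the text
WRITE_TOLERANCE = 0.20    # plus or minus 20%, from the text
LOCKOUT_S = 2.75          # assumption: conservative end of the 1.5 to 2.75 s range


class WatchdogTimer:
    """One of three redundant countdown timers (tick size is hypothetical)."""

    def __init__(self, start_count):
        self.start_count = start_count
        self.count = start_count

    def reset(self):
        """Healthy software resets the timer during each computation cycle."""
        self.count = self.start_count

    def tick(self):
        """Count down one step, stopping at zero."""
        if self.count > 0:
            self.count -= 1

    @property
    def expired(self):
        return self.count == 0


def switch_over_needed(timers):
    """Switch-over occurs when at least two of the three timers reach zero."""
    return sum(timer.expired for timer in timers) >= 2


def write_allowed(duration_us, seconds_since_last_write, error_check_passed):
    """Gate a transfer-register write on duration, lockout, and error detection."""
    low = WRITE_NOMINAL_US * (1 - WRITE_TOLERANCE)
    high = WRITE_NOMINAL_US * (1 + WRITE_TOLERANCE)
    return bool(
        error_check_passed
        and seconds_since_last_write >= LOCKOUT_S
        and low <= duration_us <= high
    )


timers = [WatchdogTimer(start_count=3) for _ in range(3)]
for _ in range(3):          # software stops resetting: all three run down
    for timer in timers:
        timer.tick()
print(switch_over_needed(timers))
print(write_allowed(672, 17, error_check_passed=True))
```

A single timer reaching zero is outvoted by the other two, so one faulty timer cannot force a switch-over, and a write is refused whenever the error detection program has not just run successfully.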
 
The redundancy management software was a step toward the eventual Shuttle redundancy management scheme. Previously, IBM had used TMR hardware to ensure reliability. This system, with its watchdog timer, was software based and, in effect, saved space and weight. Two ATMDCs were smaller and required less power than a single TMR computer of equal reliability.
 
 
The Development Environment and Integration
 
 
[78] The Skylab software development was done in a programming environment that took advantage of useful software tools and proper integration techniques. Binary code for the computer was in hexadecimal (base 16) format, and loaded in that format56. Hand coding in hex is rather tedious, so IBM prepared an assembler to translate mnemonics into it. They also provided a relocatable loader for placing separately coded modules in contiguous memory locations. Macros, blocks of frequently used code, were kept in common libraries. Listings of programs and the original source resided in an IBM System 360/75 57. This environment was small compared with the later Software Production Facility for the Shuttle, but the concept of a good tool set, promoted by IBM's Mills and Brooks, was well realized.
 
Integration of the Skylab software followed a top-down approach: The program was highly modular so as to keep individual functions separate for easy modification and also simple enough for a single programmer to handle. The executive and major subprocesses were coded and integrated first; then the remaining modules were added. The modules were grouped into three batches, so all the modules in a batch were added and tested, then the next batch would be added, and so on58. This helped in the integration process.
 
 
Verification
 
 
The software for Skylab was one of the most extensively verified systems of its era. Because it was a real-time program, verification was more difficult than for a corresponding batch program: it is hard to replicate test inputs when interrupts can occur at any time. Thus, a combination of simulators is needed to properly verify a real-time program.
 
IBM used a number of different simulation configurations in the verification process. The AS-II simulator consisted of a System 360/75 used for analysis of the Skylab while it was in orbit. It could evaluate the effects of changes to the flight program. The Skylab Workshop Simulator (SWS) was an all-digital simulation used in developing the initial software, as well as verification. It ran at a 3.5/1 ratio of execution time to real time. The SWS was so effective that it once correctly identified a deficiency in the requirements relating to the Control Moment Gyro system. The Skylab Hybrid Simulator (SHS) included some analog circuits for greater fidelity. One of the most effective simulators was a System 360/44 connected to an actual ATMDC; the program in the 44 could simulate six degrees of freedom59.
 
[79] The verification process was scheduled for the final 10 weeks prior to the delivery of any software phase. The process included validation of the baseline program to the requirements, coding analysis, logic analysis, equation implementation tests, performance evaluations, and mission procedure validation. The AS-II did the logic analysis and was designed to trace all logic paths through the software. The 360/44 and ATMDC system did performance tests since it was near real time in operation60. The digital simulators could be stopped in order to insert program changes. Tracing was also possible61. Combining simulators and software verification tools contributed to a high level of confidence that was confirmed in actual performance.

