A wide range of empirical approaches to the assessment of human performance has been proposed or actually used in astronaut evaluation. These approaches can be conceptualized as lying along a dimension of apparent or face validity. A depiction of how closely the performance assessment conditions approximate actual mission requirements is given in figure 1. The low fidelity discrete task assessment techniques are at one end of the continuum, and measurements of performance during actual missions are at the other. The several approaches to performance assessment are discussed in the following sections, beginning with the low and proceeding to the high fidelity end of the continuum.
Discrete Task Assessment Techniques
Performance analysts have long relied on the use of test batteries consisting of one or a number of discrete, individual tasks to measure such factors as vigilance, reaction time, tracking, limb steadiness, coordination, and perceptual speed. For example, Mackworth's (1950) classic technique for assessing vigilance involves subjects attending to a clock hand that usually moves in single steps, but occasionally gives a double jump. Subjects are instructed to make a response via some recording device only when they observe the double jump. This single task has been a standard technique in the assessment of effects of monotony and fatigue associated with the monitoring of radar and control equipment by air crews under operational conditions. A discrete task technique used for assessing tracking ability is the rotary pursuit task (Ruff, 1963). Here, the subject attempts to maintain constant contact between a stylus held in his hand and a small disk on a rotating turntable.
The essential assumption underlying the use of such tests is that each task measures some basic human ability that can be related directly to the performance of more complex, real world tasks. By using discrete basic ability tasks under highly controlled conditions, it is assumed that useful information regarding the more complex performance of actual mission tasks can be inferred. The discrete task assessment approach has been preferred in many laboratory studies, particularly those assessing the effects of simple factors on a single aspect of performance. Discrete tasks have the advantages of being relatively uncomplicated to use and low in cost. Performance is rather easily defined and quantified with such tests. However, there are several disadvantages in the use of the techniques, especially with highly skilled special populations such as astronaut operators.
First, there is the issue of apparent validity. The tests, even if proven to tap performance on tasks needed in spaceflight, usually do not resemble the actual mission tasks much, if at all. For example, holding a stylus in holes of decreasing diameter without touching the edges may indeed be one measure of limb steadiness, and limb steadiness may be necessary for successful performance of in flight mission related tasks; however, that single task measure of limb steadiness may have little or no direct relation to the in flight limb steadiness performance required of astronauts. Also, the point at which a change in task performance occurs in the laboratory may be considerably different from the point at which performance is affected under operational conditions. Therefore, the degree to which the results of such methods can be generalized to operational settings is questionable. These individualized tasks appear best suited for settings where there is a specific question regarding the effect of some isolated variable upon a given aspect of performance. They can help to clarify issues, for example, by identifying questions towards which more sophisticated techniques should be directed; they are of lesser value when used individually in attempts to quantify probable levels of performance under operational conditions.
 Fleishman (1967), Parker (1967), and others have argued that discrete task measures are most useful when factor analyses have identified them as pertinent to features of the more complex operational task system. By selecting those tasks that have elements in common with the operational condition, the likelihood of obtaining relevant, generalizable data can be maximized. This, of course, requires that the investigator be able to define and quantify the specific task demands of the real world environment.
The usefulness of discrete task techniques might be enhanced by employing them under conditions that represent the operational setting. One of the important conditions of the operational setting - a condition most often lacking in the use of these sequentially administered tests - is the multitask, concurrent demand, timesharing requirement of the work environment. Some investigators have attempted to deal with this issue by combining two or more of these tests into a concurrent task situation. There is some evidence to suggest that concurrent testing does enhance the sensitivity of the tests (Brown, 1978), but we do not yet know whether it increases the tests' validity, as theory would suggest.
Finkelman and Glass (1970) employed a dual task methodology requiring simultaneous performance of a primary task (tracking) and a secondary task (recall). They found that unpredictable, uncontrollable noise produced decrements in performance of the subsidiary task, but not in the primary task. Likewise, Bell (1978) found, using a tracking task and a number processing subsidiary task, that noise and/or heat had detrimental effects on subsidiary, but not on primary, task performance. The basic notion behind these and other supportive findings is that operators or experimental subjects have only a limited capacity to process information and to respond. When some stressful condition in the environment reduces a person's ability to allocate resources to the tasks, the focus of effort will shift to the primary task, allowing decrements to occur in the secondary task. Thus, performance may deteriorate on what is perceived to be the less important task, while remaining unchanged on the more important task. Such findings represent typical experiences in operational settings. Although the concurrent use of single tasks may increase measurement sensitivity, such an approach still fails to represent the much more complicated real world work situation.
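This limited capacity account can be illustrated with a toy model (the function name and all numbers below are hypothetical, not taken from the cited studies): when available capacity shrinks under stress, the primary task is served first and only the remainder goes to the secondary task.

```python
def allocate(capacity, primary_demand, secondary_demand):
    """Toy limited-capacity model: serve the primary task first; the
    secondary task receives whatever capacity remains. Returns the
    fraction of each task's demand that is met (1.0 = no decrement)."""
    to_primary = min(capacity, primary_demand)
    to_secondary = min(capacity - to_primary, secondary_demand)
    return to_primary / primary_demand, to_secondary / secondary_demand

# Benign condition: capacity covers both tasks, so neither shows a decrement.
print(allocate(capacity=10, primary_demand=6, secondary_demand=4))   # (1.0, 1.0)

# A stressor (e.g., noise) reduces capacity: the primary task is protected,
# and the decrement appears only in the secondary task.
print(allocate(capacity=8, primary_demand=6, secondary_demand=4))    # (1.0, 0.5)
```

Real performance-resource functions are of course not this simple, but the qualitative pattern matches the findings above: the secondary task absorbs the loss.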
 Multiple Task Batteries
In efforts to avoid, or at least reduce, the disadvantages of the discrete task methodology, various investigators have adopted a task battery approach to the study of human performance. Alluisi (1967) describes such a technique as "synthetic" because it synthesizes various tasks into a general work situation that requires time sharing. Although the tasks individually may have low apparent validity in terms of any specific application, they are selected to represent functions that operators are called upon to perform in a variety of man machine operational settings and collectively constitute a reasonable and credible "job" to operators and experimental subjects (Alluisi, 1967, 1969; Chiles, Alluisi, and Adams, 1968).
A multiple task performance battery has been described and an operator's panel pictured by Chiles and his colleagues (1968). Behavioral measures are obtained from the operator's performance in working at the tasks presented with the panel. In the final version of this test battery, six tasks (three passive, watch keeping tasks and three active, computational/procedural tasks) are displayed at each of the identical work stations - one station for each member of a four or five person crew. Communications functions are measured indirectly in the performance of the three active tasks. For further details, including citations of the individual technical reports of research conducted with this approach, consult the three brief summaries (Alluisi, 1967, 1969; Alluisi and Chiles, 1967), the general description of synthetic work methodology (Morgan and Alluisi, 1972), and the report of the work rest scheduling research (Chiles et al., 1968).
Using the synthetic work, multiple task performance battery, information relevant to a wide range of abilities can be acquired simultaneously, minimizing the difficulties of cross experimental comparisons often cited as a problem when different subjects in different experiments perform different tasks. The synthetic, multiple task battery approach has been used by Alluisi and his colleagues to investigate work rest schedules and also a wide range of variables of potential impact to the crew of spacecraft: desynchronosis (disruption of the body's circadian activity), sleep loss and the recovery therefrom, confinement, illness, etc. (see, e.g., Adams, Levine, and Chiles, 1959; Adams and Chiles, 1960, 1961; Alluisi, Chiles, Hall, and Hawkes, 1963; Alluisi, Chiles, and Hall, 1964; Alluisi, Beisel, Bartelloni, and Coates, 1973; Beisel, Morgan, Bartelloni, Coates, DeRubertis, and Alluisi, 1974). Some of this work is discussed in a section of this chapter entitled "Issues in Astronaut Work Regimes."
The synthetic test battery would seem to be the most generally useful approach to the selection and preliminary training of astronauts. It is less expensive than most simulation systems and permits more than a single individual to be measured at a given time. It yields a maximum of information regarding subjects' performance capabilities on several psychophysiological dimensions and can be used to investigate a range of environmental conditions. However, there remains the issue of validity, particularly when mission training is involved. To fully prepare spaceflight crews for the rigors of space, simulators are required.
Partial and Full Scale Simulation
Mission simulators provide the highest degree of fidelity possible in the simulation of true operational conditions. They are particularly important in space mission design because there is essentially no opportunity for a graduated series of practice efforts under true operational conditions before the mission takes place. Since space mission crews must be trained and highly proficient in their tasks before the flight, it is imperative that high fidelity simulator systems be available for training on specific, individual aspects of the mission (partial simulation) and for the completely integrated "dress rehearsal" simulation of the mission (full scale simulation).
Simulator systems have formed an integral part of our manned space effort. From the beginning, high fidelity simulators provided much of the information for man machine design factors and training requirements, and allowed for total system inspection (Link, 1965; Berry, 1967). For example, Mercury astronauts participated in four centrifuge programs to investigate crew capability to control the spacecraft manually during the high acceleration loads imposed during launch and entry (Link and Gurovskiy, 1975). There was general agreement that the centrifuge technique was the most useful high g environmental simulation device used during training and that the accurate representation of the actual craft was extremely valuable. Mercury crews used a yaw recognition simulator during training to familiarize themselves with the wide angle optics of the periscope they were to use for outside viewing during the actual mission. In addition, a simulation training program using a multiaxis spin test inertia facility trainer was used to examine reaction to recovery from tumbling flight. In all, some type of simulation was provided for  every significant flight phase which required integrating the crew with the flight plan and the ground support elements of the Mercury project.
As the complexity of spacecraft and missions increased with the advent of the Gemini Program, the importance of simulator training increased. Results from the Gemini Program strongly suggested that crew response during flight was highly dependent upon the fidelity of simulation training received before flight (Kelly and Coons, 1967). During the Apollo Program, simulators were again a significant part of preflight training (Ertel and Morse, 1969; Brooks, Grimwood, and Swenson, 1979). During the Skylab program, simulators depicting diverse visual displays were used routinely in training. Combining such tasks as target recognition and status assessment, complex and time dependent point operations, malfunction analyses, and rapid response to flare and other transient events, astronauts were able to use high fidelity simulations to master operations important to the flight and to the experiments conducted aboard the spacecraft (Holt and da Silva, 1977). Simulation continues as an essential preparatory tool for Shuttle (Bilodeau, 1980).
New developments in computer graphics have resulted in significant advancement in simulation experiments. At Johnson Space Center's Graphics Analysis Facility, researchers are able to simulate many of the concerns of man machine integration without the necessity of hardware development. Full scale modeling will continue to be required; however, for many applications, computer graphics offer a flexible alternative to this more expensive and time consuming approach.
While there is no doubt that simulator studies have been useful during all stages of spaceflight, they have been employed primarily for training (e.g., the neutral buoyancy simulations achieved through water immersion (Machell, Bell, Shyken, and Prim, 1967) and the Keplerian trajectory aircraft flights (Nicogossian and Parker, 1982)) and for evaluating the operation of the entire man machine complex. In contrast, they have been used only rarely to assess the performance of humans within a system. A partial exception is the work of Milton Grodsky (Grodsky and Bryant, 1963; Grodsky, Warfield, Flaherty, Mandour, and Hurley, 1964; Grodsky, Glazer, and Hopkins, 1966a; Grodsky, 1967), who employed man machine simulation to assess the reliability of human operators and to investigate performance changes across time.
As one example, Grodsky, Moore, and Flaherty (1966b) compared performance levels during a 7 day, integrated lunar mission simulation with asymptotic performance levels previously obtained during a 5 wk baseline training period. The simulator used was considered to be a good representation of the Apollo command module. Subjects tested included nine Air Force test pilots with pilot qualifications similar to those of the astronauts. The tasks performed during the simulated lunar landing mission were divided into four categories: switching, flight control, information handling, and malfunction detection. The results of the first two task categories demonstrated some changes from baseline data. Performance was superior to baseline levels for some types of navigational switching tasks, but was degraded for certain other tasks during specific mission phases, and flight control results demonstrated suggestive, though not statistically significant, degradation.
Although Grodsky et al. (1966b) did not delineate any variables (other than the mission time line) that might have influenced performance, there are suggestions that workload, boredom, or task complexity may have been involved. For example, in terms of task complexity, the brake and hover requirement of lunar landing was clearly the most difficult, based on the number of trials required during training to achieve an adequate baseline. This was one of the areas that appeared to degrade during the simulated mission. Also, performance during the lunar orbit insertion was relatively poor (20% to 35% degraded when compared with baseline), although this phase was conceptually and functionally identical to the translunar insertion and transearth insertion phases, which showed no such degradations. The authors hypothesize that the observed decrements prior to lunar landing may have been a result of emotional shifts, but they are unable to say if the prevailing mood was one of boredom or excitement.
Although the results of this preliminary study are not particularly striking, they do suggest that important performance effects can be identified and measured during simulation. Performance assessment under such conditions offers certain advantages over the discrete task and synthetic work approaches previously discussed. Mission simulation has very high apparent validity. This validity facilitates comparison of performance changes between test and mission settings. Short of actually studying in flight performance, the simulation approach is our best single indicator of how overall performance under operational conditions may vary as a function of the combined conditions and stresses of the mission. Data obtained  during training periods and during mission simulations can be used as a gauge with which to predict, and later to judge the sufficiency of performance observed during actual flights. Future studies should employ simulation as an experimental tool to explore the effects of such independent variables as task scheduling, work rest cycles, sleep deprivation, and desynchronosis.
There are at least three drawbacks to the use of the simulated mission technique in the study of performance which should be pointed out: (1) some of the conditions of an actual mission, such as weightlessness, simply cannot be adequately replicated, and so their effects cannot actually be measured; (2) although high fidelity system simulation may permit more direct comparison of specific system performance between test and mission settings, data obtained for one system may not be generalizable to other systems; and (3) the simulation techniques are quite costly in terms of all resources (relative to the two simpler techniques previously described).
In flight Performance Assessment
In the early days of spaceflight, few quantitative performance data were obtained under operational conditions. During Apollo 15 and 16 (Kubis, Elrod, Rusnak, and Barnes, 1972; Kubis, Elrod, Rusnak, Barnes, and Saxon, 1972) and later on Skylab (Kubis, McLaughlin, Jackson, Rusnak, McBride, and Saxon, 1977), a major source of in flight data became available through time motion analyses of image and voice data. For instance, video and auditory recordings from Skylab Missions 2, 3, and 4 were analyzed to determine the amount of time required to complete various tasks. Comparable baseline data had been collected prior to flight. Eight tasks were employed, each subdivided into the many different components required to complete the task. The tasks selected were limited to those with standardized, repetitive maneuvers that would satisfy replication and conformity conditions. Analyses were conducted to determine the degree of performance change (amount of time required in flight versus preflight) and the time to return to baseline (preflight) levels.
The results showed that the first attempts in space to carry out mission tasks usually were inefficient. For example, during Skylab 2, the first in flight task trial took longer than the last preflight trial in 68% of the cases (Kubis et al., 1977). Similar data taken during Skylab 3 showed that 54% of the task elements took longer in space than on Earth, and a comparable figure from Skylab 4 was 58%. However, during each of the three missions, by the end of the second performance trial, approximately half of all tasks were completed within the time recorded for the last preflight trials. This suggests that to facilitate performance adaptation, tasks critical to mission success should be rehearsed early in the flight.
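The statistic reported above - the percentage of task elements whose first in-flight trial took longer than the last preflight trial - is straightforward to compute. The sketch below uses invented completion times for illustration, not the Skylab data.

```python
def fraction_slower(preflight, inflight):
    """Fraction of task elements whose first in-flight trial took longer
    than the last preflight trial (the statistic reported by Kubis et al.,
    1977). Times are in seconds; both lists are paired element by element."""
    slower = sum(1 for pre, post in zip(preflight, inflight) if post > pre)
    return slower / len(preflight)

# Hypothetical last-preflight vs. first-in-flight completion times (s).
pre = [42, 65, 30, 88, 51]
post = [55, 60, 41, 97, 49]
print(f"{fraction_slower(pre, post):.0%} of elements took longer in flight")
# prints "60% of elements took longer in flight"
```

As the text notes, reporting only this fraction discards the magnitude of each delay; retaining the paired times would allow that refinement.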
This type of in flight data collection represents a good first attempt to quantify performance in space. However, the statistical data presented could be further refined. For example, the available reports cite only the number of tasks that took longer to perform in flight than preflight, but give no indication of how much longer. It would be important to determine whether these time periods represent significant delays. Also, the analysis is complicated by the confounding of tasks with the time at which they were measured; different tasks were performed on different days of the missions. In addition, future in flight measurements should include a specification of the types of errors made. With the time motion studies presently available, the only inferences that can be made are those involving comparison of fine, medium, and gross motor movements. For example, Kubis et al. (1977) concluded from their Skylab analysis that during adaptation to spaceflight, fine motor movements are affected more adversely than medium or gross motor movements during both intravehicular and extravehicular activities. This result was confirmed in the debriefing comments of the astronauts who reported that they had more difficulty with the control of small objects than of large objects. These findings suggest that extra preflight training should be given to maneuvers requiring fine, delicate movements, and that, where possible, these tasks should be scheduled later in the mission, after crews have fully developed their fine motor dexterity.
Work time ratios constitute another type of in flight performance analysis which could prove useful in judging the overall work capacities of spacecrews. In one analysis (Garriott and Doerre, 1977), total estimated time associated with tasks accomplished was divided by the total number of hours available to work. This computation was termed the "work efficiency ratio" and was computed for each crewman of the Skylab missions. (Time estimates were based on the time it took a trained crewman to complete the task on the ground.) A ratio of 0.50 was defined as a normal work day based on "normal" working conditions on Earth, where the hours of useful work (8 hr) are divided by awake time (16 hr). Astronauts maintained or exceeded the 0.50 level on most mission days. However, during the second day of Skylab 3 and the third day of Skylab 4, the work efficiency ratios were 0.41 and 0.45, respectively. This decrease in work efficiency appears to have resulted from space sickness, as reported by the crewmembers. It might prove useful to combine the analyses of work efficiency ratios with those from time motion techniques to determine (1) the extent to which these measures of performance speed may be correlated and how they might be combined to yield maximum information, and (2) the relationship between performance speed and error rate.
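The work efficiency ratio computation can be made concrete with a short sketch. The 6.6 hr figure below is invented to reproduce a low value comparable to those cited; it is not from Garriott and Doerre's data.

```python
def work_efficiency_ratio(useful_work_hr, available_hr):
    """Work efficiency ratio (Garriott and Doerre, 1977): estimated time
    for tasks accomplished divided by the hours available to work."""
    return useful_work_hr / available_hr

# Earth-normal reference point: 8 hr of useful work in a 16 hr waking day.
print(work_efficiency_ratio(8, 16))               # 0.5

# A hypothetical day degraded by space sickness (6.6 of 16 hr productive),
# comparable to the low Skylab values cited above.
print(round(work_efficiency_ratio(6.6, 16), 2))   # 0.41
```

Because the numerator is a ground-based time estimate, the ratio measures output relative to an Earth baseline rather than raw hours spent working, which is what makes it comparable across crewmen and missions.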
Future Focus of Research on Performance Assessment
No single approach to the study of performance variables related to spaceflight can provide all of the resources and data needed. Instead, our future efforts need to focus on the coordinated and integrated use of all the assessment approaches discussed so far (Alluisi, 1975, 1977). Although mission simulations and actual in flight measurements may be extremely valuable in gauging the overall functioning of crews and in helping to identify problems of concern, these approaches do not support inferences regarding cause and effect relationships. For example, there is no way of determining whether the changes in performance observed early in the Skylab missions were due primarily to motion sickness, lack of adaptation to weightlessness, increased arousal resulting from the excitement and novelty of the mission, or some combination of these or other factors. To investigate these variables more directly, we must rely upon more controlled laboratory tests, using the discrete task and synthetic work approaches. Important findings from research using such techniques can guide our assessment of performance under more complex, operational conditions.
Other performance measurement problems also deserve attention. One problem is to define criteria for measuring performance under operational conditions. Until such criteria are defined, there can be no clear direction on how to validate and implement ground based systems to aid our understanding of work in space. Also, more effort is required to ensure systematic quantitative and qualitative data on in flight performance. Unfortunately, there are no plans for such performance studies in the Space Shuttle series.
Another problem area concerns task predictive validity under ground based conditions (i.e., the ability of the ground based task to effectively predict task performance in space). For research and development purposes, emphasis should be placed on the predictive  validity of assessment approaches rather than relying on their apparent validity (i.e., validity based on the extent to which a test seems to measure the variable to be tested because of its similarity to the criterion measure), although for selection and training purposes apparent validity may have to remain a major consideration.
Finally, we need to relate relevant physiological indices to the performance aspects of behavior. Behavioral measures could be useful to determine flight crew status and to indicate changes in their capability to perform mission tasks.