When President George Bush signed the High-Performance Computing and Communications (HPCC) Act into law in December 1991, implementing systems capable of a trillion floating-point operations per second (teraflops) seemed daunting. Yet, only two years after the initiation of this multiagency program, a blue-ribbon panel of experts from academia, industry and government met in an extraordinary forum to explore the feasibility of achieving performance levels at least a thousand times greater: petaflops.
Even the word petaflops was unfamiliar to many of the 60-plus participants invited to the February 1994 Workshop on Enabling Technologies for Petaflops Computing, hosted by the NASA Jet Propulsion Laboratory (JPL) in Pasadena, Calif. The workshop's keynote speaker, Seymour Cray, said, "If we can maintain our current incremental rate, it will take 20 years. It's going to be interesting for me to hear your proposals on how we get a factor of a thousand in a quick period."
Today, as the first teraflops computers are being installed and applied, the prospect for something capable of operating a thousand times faster is almost beyond comprehension. A petaflops computer is more powerful than all of the computers on today's Internet combined. If such a system incorporated a petabyte of memory, it could hold all 17 million books in the Library of Congress or several thousand years' worth of videotapes.
To fabricate such a system today from the best price/performance systems available would require up to 10 million processors and consume more than one billion watts of power. It would cost approximately $25 billion, and the supercomputer would fail every couple of minutes. The system would cover the flight decks of all existing Nimitz-class aircraft carriers or fill up most of the Empire State Building with its hardware.
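These scale estimates are easy to sanity-check with back-of-envelope arithmetic. The per-node figures below are assumptions chosen only to illustrate the calculation; the article itself states just the system-level totals.

```python
# Back-of-envelope check of the system-level estimates.
# Per-node figures are illustrative assumptions, not from the article.

nodes          = 10_000_000   # processors, per the article
watts_per_node = 100          # assumed power budget per processor
cost_per_node  = 2_500        # assumed dollars per processor plus memory share

total_power_gw = nodes * watts_per_node / 1e9   # gigawatts
total_cost_bn  = nodes * cost_per_node / 1e9    # billions of dollars

print(total_power_gw, total_cost_bn)  # -> 1.0 25.0
```

With those assumed per-node figures, the totals reproduce the article's one-gigawatt, $25 billion estimates.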
Nonetheless, by the conclusion of the unprecedented Pasadena gathering, attendees had formulated a framework and roadmap for accomplishing this revolutionary capability, later documented in the MIT Press book Enabling Technologies for Petaflops Computing, by Thomas Sterling of JPL, Paul Messina of the California Institute of Technology, and Paul H. Smith of the Department of Energy. Although none of the contributors could know it, this workshop catalyzed a four-year process that has extended these early-stage conclusions through a series of in-depth workshops, studies and projects, revealing a detailed picture of future ultrascale computing systems and their development.
In August 1995, the first in-depth workshop explored the opportunities and requirements of performing Grand Challenge applications at a petaflops level. The 10-day Petaflops Applications Summer Study, chaired by Rick Stevens of Argonne National Laboratory, revealed the major impact in science and engineering that such capability would have. It also determined that memory capacity required for many problems was far less than assumed. One barrier to better quantitative understanding is the lack of detailed architectural models for application studies to target. Without those, only general conclusions can be derived concerning application behavior.
As a consequence, the April 1996 Petaflops Architecture Workshop (PAWS), chaired by Thomas Sterling, formerly of NASA's Goddard Space Flight Center in Greenbelt, Md., investigated alternative architectural approaches to structuring such large machines and efficiently managing their resources. Participants identified four architecture classes and recognized several advanced device technologies as possibly important to future computing systems. Among the petaflops computing challenges pinpointed was parallelism, or linking multiple CPUs to build powerful distributed and parallel systems. One problem is the substantial delay, or latency, incurred by communication among so many processors. Another consequence of parallelism is the resulting limitation in the bandwidth available for that communication. Additional constraints dominating the design of any petaflops computer are complexity, power consumption and cost.
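The latency problem can be made concrete with a toy performance model: each processor executes a fixed quantum of work but also pays a fixed delay for every message it sends. All figures in this sketch are illustrative assumptions, not measurements from any of the machines discussed.

```python
# Toy model: how per-message latency erodes parallel speedup.
# All numbers are illustrative assumptions.

def effective_speedup(p, work_per_proc_s, msgs_per_proc, latency_s):
    """Speedup of p processors when each pays msgs * latency overhead."""
    serial_time = p * work_per_proc_s
    parallel_time = work_per_proc_s + msgs_per_proc * latency_s
    return serial_time / parallel_time

# With a million processors and millisecond work quanta, a thousand
# one-microsecond messages per processor cut the speedup in half.
ideal = effective_speedup(1_000_000, 1e-3, 0, 0.0)
real  = effective_speedup(1_000_000, 1e-3, 1000, 1e-6)
print(ideal, real)  # roughly 1,000,000 vs 500,000
```

The same model shows why shrinking the work quantum without shrinking latency makes massive parallelism self-defeating.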
Resource management in the presence of million-way parallelism and long latencies demands a new generation of system software, including languages, compilers, debuggers, operating systems and low-level runtime systems. Devising such software even for teraflops machines has proven challenging and is expected to be more difficult for petaflops machines. The June 1996 Systems Software Workshop (PetaSoft), chaired by Ian Foster of Argonne National Laboratory, and the September 1996 MiniSoft workshop on system software, chaired by Sid Karin of the San Diego Supercomputer Center, explored this critical technology.

From these studies, an impression emerged that application algorithms for such machines are markedly different from those employed on traditional machines ranging from mainframes to vector supercomputers. The necessary algorithms expose a high degree of parallelism while exhibiting a high tolerance to system latency. April 1997's Petaflops Algorithms Workshop, chaired by David Bailey of NASA Ames Research Center, examined the implications of this premise. The meeting led to a better understanding of the principles and forms of latency-tolerant algorithms, as well as other characteristics required of petaflops systems driven by algorithm-specific issues.
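One common form of latency tolerance is software pipelining: an algorithm issues the next long-latency request before consuming the current result, so communication overlaps computation instead of serializing with it. A minimal sketch of the pattern, with a hypothetical `remote_load` standing in for a remote memory access:

```python
# Sketch of a latency-tolerant loop via software pipelining.
# remote_load is a hypothetical stand-in for a long-latency access.

import concurrent.futures
import time

def remote_load(i):
    time.sleep(0.02)  # simulated network/memory latency
    return i * i

def pipelined_sum(n):
    """Sum remote_load(0..n-1), prefetching the next element each iteration."""
    total = 0
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(remote_load, 0)      # prefetch first element
        for i in range(n):
            value = pending.result()               # wait for current element
            if i + 1 < n:
                pending = pool.submit(remote_load, i + 1)  # overlap next fetch
            total += value                         # compute while fetch is in flight
    return total

print(pipelined_sum(5))  # -> 30
```

The naive request-wait-compute loop pays full latency on every element; the pipelined version hides each fetch behind the previous element's computation.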
In addition to the focused workshops, the National Science Foundation, with support from NASA and the Defense Advanced Research Projects Agency (DARPA), sponsored a set of studies to investigate applications for near-petaflops-scale computer architectures. Expert teams across the country undertook eight projects to explore approaches and technologies that provided the basis for the PAWS meeting. One project, carried out by the University of Notre Dame, examined the impact of merging processor logic with memory cells on a single chip. This processor-in-memory approach exposes the high memory bandwidth intrinsic to the memory chip, enabling very-high-efficiency computing with low power requirements.
This same technology/architecture concept is being considered for future scalable spaceborne computing by JPL in NASA HPCC's Remote Exploration and Experimentation Project. A team of researchers at Caltech, the State University of New York at Stony Brook and the University of Delaware is exploring the use of superconductor rapid single-flux quantum electronics for very-high-speed electronics with very low power consumption in a framework employing the multithreaded execution model.
This model addresses the serious challenge of long latency delays inherent in message passing communications and slow main memory. To hide these delays, a multithreaded architecture performs more than one task or thread at the same time. When one thread is stalled by a communication or memory-access request and has to wait, the architecture automatically switches to another thread and processes it. This method achieves high efficiency of the execution resources and memory bandwidth, providing a way to manage long latencies.
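The latency-hiding effect described above can be sketched in miniature: when stalled requests run in separate threads, their waits overlap instead of adding up. In this illustrative sketch, `fetch` is a stand-in for a memory or message request, and the delay values are arbitrary assumptions.

```python
# Miniature illustration of multithreaded latency hiding.
# fetch is a stand-in for a long-latency memory or message request.

import concurrent.futures
import time

def fetch(delay_s):
    """Simulate a long-latency request by sleeping."""
    time.sleep(delay_s)
    return delay_s

# Serial execution: each request's latency is paid in full, one after another.
start = time.perf_counter()
for _ in range(4):
    fetch(0.05)
serial = time.perf_counter() - start

# Multithreaded execution: while one thread waits, others run, so the
# four 50 ms waits overlap instead of summing.
start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(fetch, [0.05] * 4))
overlapped = time.perf_counter() - start

print(serial, overlapped)  # overlapped is well under serial
```

A real multithreaded architecture does this switching in hardware, per cycle, but the principle is the same: keep execution resources busy while requests are in flight.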
This Hybrid Technology MultiThreaded (HTMT) project identified the need for smart dynamic random access memory, while the Processor-In-Memory project uncovered the value of incorporating very-low-power, high-speed accelerators and the requirement for an overall governing computing model.
As a result of these studies, the two projects have merged into a single new HTMT project directed by JPL and sponsored by NASA, the National Security Agency and DARPA for two years. The project involves a dozen collaborating investigator teams in technology, architecture, system software and applications. The results of this new research initiative may lead the way to rapidly achieving petaflops-scale computer systems.
Exploring the realm of ultrafast computing will continue with a final focused workshop on the full integration and operation of the different hardware and software components. The Petaflops Operations Working Review (POWR) workshop, to be held June 2-5, 1998, will explore the synthesis and behavior of complete systems to determine elements that must be investigated in future research projects, thus providing guidance for establishing follow-on government-sponsored programs in high-end computing. POWR will be hosted by NASA, with additional sponsorship by its other partnering federal agencies.
The Second Workshop on Enabling Technologies for Petaflops Computing will be conducted in February 1999 to review all aspects of this difficult challenge and to set new directions. NASA and other federal agencies have worked effectively with representatives of academia, industry and government laboratories to investigate high-risk intellectual questions that no one component of the community could explore independently. Leadership in coordinating the initiative has proven essential in advancing our nation's understanding of future opportunities and establishing community consensus guiding our next steps towards them.