DRAFT Findings

Petaflops Architecture Workshop
Oxnard, California
April 22-25, 1996
by: Dr. Thomas Sterling, Senior Staff Scientist
Center of Excellence in Space Data and Information Sciences
NASA Goddard Space Flight Center

FINDINGS

The workshop on Petaflops Architecture was rich in understanding and findings related to the advance of very high performance computing towards the goal of achieving practical and useful systems capable of sustaining petaflops level performance. This chapter presents many of these findings in two groups: the seminal findings drawn from all aspects of the workshop deliberations, and a much larger list of supporting detailed findings organized by subdomain of architecture, technology, applications/algorithms, and system software.

MAJOR FINDINGS

ACCELERATED PATH TO PETAFLOPS:
Alternative approaches to petaflops architecture may yield petaflops capable systems significantly sooner than the natural evolution of computing systems limited to COTS technology.

LATENCY:
Latency management is critical to effective computing at petaflops scale and may require aggressive techniques in architecture, software, and algorithms. Low latency is achieved only when enough fast memory is located close to the processor.
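
One way to quantify the latency problem is Little's law: the number of memory requests that must be in flight equals the request rate multiplied by the latency. The following is a minimal sketch only; the function name and the cycle counts are illustrative assumptions, not figures from the workshop.

```python
def requests_in_flight(latency_cycles, requests_per_cycle):
    """Little's law: concurrency = throughput * latency.

    Both arguments are hypothetical inputs chosen for illustration.
    """
    return latency_cycles * requests_per_cycle


# Assumed example: a 500-cycle memory latency with 2 requests issued
# per cycle needs this many requests outstanding to avoid stalling.
print(requests_in_flight(500, 2))
```

The longer the latency, the more independent work the architecture, software, and algorithms must expose to cover it.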

PARALLELISM:
Parallelism for petaflops computing, for both performance and efficiency, will require between 100,000 and 1,000,000 sustained concurrently active threads. It will also require new scalable algorithms to expose sufficient parallelism, as well as architecture and system software to manage it.
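
As a rough illustration of how a thread count of this magnitude can arise, the sketch below divides a 1 petaflops target by an assumed sustained rate per thread. The per-thread rates used here (1 and 10 Gflops) are illustrative assumptions, not figures from the workshop.

```python
PETAFLOPS = 10**15  # target sustained rate, floating point operations per second


def threads_required(per_thread_flops, target_flops=PETAFLOPS):
    """Threads needed to sustain target_flops if each thread sustains
    per_thread_flops (a hypothetical value). Uses integer ceiling division."""
    return -(-target_flops // per_thread_flops)


# Assumed per-thread sustained rates of 1 Gflops and 10 Gflops.
print(threads_required(10**9))   # 1000000
print(threads_required(10**10))  # 100000
```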

BANDWIDTH:
The primary obstacle to efficient petaflops execution is providing sufficient bandwidth across the system between processors and memory. Advances in hierarchical structures of processors and memory including PIM, advanced interconnection technologies and topologies, and possibly on-the-fly data compression may all be required.
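
The scale of the bandwidth requirement can be sketched by multiplying the target operation rate by the memory traffic each operation generates. The bytes-per-flop ratio and the compression ratio below are hypothetical parameters introduced for illustration; the workshop findings do not specify values for either.

```python
PETAFLOPS = 1e15  # target sustained rate, floating point operations per second


def required_bandwidth(bytes_per_flop, compression_ratio=1.0, flops=PETAFLOPS):
    """Aggregate processor-memory bandwidth in bytes per second.

    bytes_per_flop and compression_ratio are assumed inputs; a
    compression_ratio of 2.0 means data moves at half its raw size.
    """
    return flops * bytes_per_flop / compression_ratio


# Assumed 1 byte of memory traffic per flop, without and with 2:1 compression.
print(required_bandwidth(1.0))
print(required_bandwidth(1.0, compression_ratio=2.0))
```

Under these assumptions, even a modest ratio of one byte per operation implies aggregate traffic on the order of a petabyte per second, which is why hierarchy, PIM, and compression are all candidates.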

SUPERCONDUCTING RSFQ TECHNOLOGY:
Superconducting technology employing advanced RSFQ devices provides an important opportunity for implementation of petaflops processors at very high clock rates (> 100 GHz) and extremely low power consumption.

SPD - 1ST PETAFLOPS COMPUTER:
The first petaflops computer is likely to be a special purpose device designed with physical data paths and processing logic configured to match the requirements of kernel computations for a single algorithm. This could occur within 5 years and cost a modest $10 M.

PROCESSOR GRANULARITY:
Many small processors will provide greater performance at lower power consumption than fewer large processors for the same chip area, and if integrated with memory on the same die, such as PIM, can exploit memory bandwidth intrinsic to DRAM row buffers. However, effects of Amdahl's Law favor fewer high speed processors versus many more low speed processors.
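
The Amdahl's Law effect noted above can be illustrated with a minimal sketch that compares two configurations with the same aggregate peak rate. The serial fraction, processor counts, and relative speeds are illustrative assumptions, not workshop data.

```python
def run_time(serial_fraction, n_procs, proc_speed):
    """Normalized run time under Amdahl's Law.

    The serial part runs on one processor; the parallel part is divided
    evenly across n_procs. proc_speed is relative to a unit-speed processor.
    """
    parallel_fraction = 1.0 - serial_fraction
    return serial_fraction / proc_speed + parallel_fraction / (n_procs * proc_speed)


# Same aggregate peak (10,000 units), assumed 0.1% serial work.
few_fast = run_time(0.001, n_procs=1_000, proc_speed=10.0)
many_slow = run_time(0.001, n_procs=10_000, proc_speed=1.0)
print(few_fast < many_slow)  # True: the serial part penalizes slow processors
```

With no serial work the two configurations finish in the same time, so the advantage of fewer fast processors depends entirely on how much of the computation cannot be parallelized.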

MEMORY:
Memory based on DRAM technology will dominate the cost of a petaflops machine for many architecture concepts and the processor-to-memory ratio is a critical parameter of system design. Optical photo-refractive holographic storage technology and out-of-core approaches as well as emphasis on small working data set applications may significantly reduce system cost and make petaflops machines available sooner than would otherwise be possible. Distributed shared memory structures will ease programming and system software design but may have limits in effectiveness of hiding latency and delivering bandwidth.

OPTICAL INTERCONNECT:
Optical networks have wide applicability across classes of architecture and will be essential for achieving high bandwidth and constrained latency; in the case of free-space optics, they will also reduce interconnect complexity.

ARCHITECTURE CONCEPT CATEGORIES:
Among the diverse architecture concepts explored, the conceptual advances could be grouped into three categories: 1. advanced processor design, 2. embedded aggressive latency management techniques in hardware and software, and 3. system interconnection structures and methods.

ARCHETYPAL ARCHITECTURES:
Four major architecture types were identified although many variants or even combinations of each are possible.

ALGORITHMS:
A new class of computational algorithms will be required to enhance application parallelism by orders of magnitude and to decouple computation from communication for latency-tolerant execution. These algorithms will be complicated by a move to meta-applications combining multiple disjoint kernels with distinct execution profiles.

SYSTEM SOFTWARE:
Although there is a basic level of software required to make a petaflops computer usable, there appear to be no new system software challenges that are unique to petaflops level computing. Nonetheless, the degree of scaling of critical system parameters such as parallelism and latency will significantly aggravate the responsibilities of system software and will require innovative solutions to compile time and runtime management of parallel activities, data sets, and physical resources.

BUSINESS MODEL:
An approach to developing and sustaining a petaflops computer community including joint industrial and academic R&D and hardware and software manufacturing and support on an economically sound basis is crucial to the realizability of useful petaflops computing. A new business model is required redefining the relationship and roles of all participating entities.


SUPPORTING FINDINGS

ARCHITECTURE

TECHNOLOGY

APPLICATIONS/ALGORITHMS

SYSTEM SOFTWARE

GENERAL


lpicha:5/30/96