Beowulf was the only warrior powerful enough to defeat the terrifying monster Grendel.1 For many of today's data warriors, the high-performance computer of choice is a build-it-yourself machine named after the epic's hero.
This 20th-century Beowulf draws its strength from the mass market, linking personal computers across Ethernet networks. Using commodity equipment sold in stores or by mail order brings about "ultra-low-cost computing," said Thomas Sterling, who led the NASA Goddard Space Flight Center team that built the first Beowulf. "It is something lean and mean attacking a big, nasty problem," explained the visiting associate at Caltech's Center for Advanced Computing Research.
Winning mass acceptance
Key to Beowulf is the Linux operating system, a free version of Unix that runs efficiently on Intel-compatible x86 chips (386 and later), Digital Alpha and PowerPC chips. Linux comes with complete source code, so anyone can study, improve or extend it.
When starting the Beowulf project in 1994, "the missing piece was networking," said Don Becker, staff scientist at Goddard's Center of Excellence in Space Data and Information Sciences (CESDIS). Each chip type requires a different network adapter, so Becker began developing, and is constantly updating, software to drive nearly all adapters.
The software supports all three Ethernet tiers: standard, Fast and Gigabit, which carry 10 million, 100 million and one billion bits per second, respectively. These drivers are now part of Linux, which is installed on at least 5 million computers.
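To put those three wire speeds in perspective, a rough, back-of-envelope sketch of how long a bulk transfer would take at each tier (the 1-gigabyte payload is an assumed example, and real throughput falls short of the raw wire rate, so treat these as best-case figures):

```python
# Best-case time to move an assumed 1-gigabyte data set over each of
# the three Ethernet tiers named in the text. Protocol overhead is
# ignored, so real transfers would take somewhat longer.
payload_bits = 8 * 10**9                                   # 1 gigabyte, in bits
rates = {"standard": 10**7, "Fast": 10**8, "Gigabit": 10**9}  # bits per second

for name, bits_per_second in rates.items():
    seconds = payload_bits / bits_per_second
    print(f"{name} Ethernet: {seconds:.0f} seconds")
# standard: 800 s, Fast: 80 s, Gigabit: 8 s
```

Each tenfold step in bandwidth cuts the best-case transfer time by the same factor, which is why the jump to Gigabit Ethernet mattered so much for data-intensive clusters.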
Skeptics did not think high-performance computers could be home-built from such parts, especially the volunteer-coded Linux. Proving them wrong, Goddard built several 16-processor Beowulfs under the auspices of the NASA HPCC Earth and Space Sciences (ESS) Project. One ran 290 days without crashing, but Beowulf truly came of age in November 1996, when two $50,000 systems, each with 16 Pentium Pro 200-megahertz processors, exceeded a compute speed of 2 gigaflops (2 billion floating-point operations per second).
"That caught people by surprise," Sterling said, and the achievement earned the 1997 Gordon Bell Prize for Price/Performance. Caltech's John Salmon and Los Alamos National Laboratory's Mike Warren spearheaded building those Beowulfs to run their N-body code, which simulates the universe's evolution.
This performance milestone inspired other laboratories and academic institutions to build Beowulf computers. From four-processor teaching systems at high schools to veritable supercomputers at major laboratories, Sterling estimated there are 200 to 300 Beowulfs in the world, "but it changes every week."
Would-be Beowulf builders can obtain Linux and monitoring software from Goddard's World Wide Web site or from a $30 extreme.linux CD-ROM published by Red Hat Software. Several heavily attended NASA- and Caltech-sponsored How to Build a Beowulf tutorials have also demonstrated the complete process.
Scaling greater heights
While guiding others, researchers at the sites of Beowulf's first victories are exploring larger systems. Caltech's $240,000, 120-processor Naegling mixes 200- and 300-megahertz Pentium Pro chips. "Sustained over many hours, we get 70 megaflops (70 million flops) per processor on the N-body code," Salmon said. Comparable performance has been reported for a Caltech model of Southern California air pollution and for the European Southern Observatory's data processing system.
Naegling also serves as an emulator for the Hybrid Technology MultiThreaded petaflops (one million billion flops) computer [see "Mix of technologies spurs future supercomputer," INSIGHTS, July 1998]. Slated for completion in 2007, the design calls for processor-in-memory chips to feed data to superconducting processors. "We are trying to figure out how we are going to program the machine," explained Loring Craymer, NASA Jet Propulsion Laboratory scientist. "The purpose is to look at application behavior on Naegling and extrapolate the HTMT performance."
Also setting trends is Goddard's Highly-parallel Integrated Virtual Environment (theHIVE), which employs 128 Pentium Pro 200-megahertz chips. Primary funds for the $210,000 system came from NASA's Regional Application Centers. "These are Earth science projects in the germination stage," said ESS Project chief technologist John Dorband. "With their own HIVEs, Baltimore might study the Chesapeake Bay, or Chicago might investigate the Great Lakes." A particular interest is image segmentation, which is searching through satellite observations of Earth's surface for regions with common attributes. Goddard scientist Jim Tilton said theHIVE crunches segmentation routines 33 percent faster than a 512-processor CRAY T3E supercomputer.
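The image segmentation task Tilton describes, grouping pixels of Earth's surface into regions with common attributes, can be illustrated with a toy flood-fill sketch. Real satellite segmentation codes use far richer similarity measures; here the "common attribute" is simply an identical class value, and the scene is a made-up 3-by-3 grid:

```python
# Toy image segmentation: label 4-connected regions of equal-valued
# pixels. This is only an illustration of the idea; production codes
# such as Tilton's use more sophisticated region-merging criteria.
def segment(image):
    """Return a grid of region labels for 4-connected equal-valued pixels."""
    rows, cols = len(image), len(image[0])
    labels = [[None] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] is not None:
                continue
            # Flood-fill one region starting from the unlabeled pixel (r, c).
            stack, value = [(r, c)], image[r][c]
            labels[r][c] = next_label
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] is None
                            and image[ny][nx] == value):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels

scene = [[1, 1, 2],
         [1, 2, 2],
         [3, 3, 2]]
print(segment(scene))  # [[0, 0, 1], [0, 1, 1], [2, 2, 1]]
```

The appeal for a Beowulf is that a large satellite image can be cut into tiles and segmented on many processors at once, which is where theHIVE's 128 chips earn their keep.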
Goddard and Clemson University are researching Beowulf as a secondary mass storage system. "With the Earth and space sciences' large data sets, you need to move into and out of the processors very quickly and not go back to tape storage or slow disks," said CESDIS staff scientist Phil Merkey. Using Gigabit Ethernet for its main internal network, the $162,000 Bulk Data Server can access intermediate storage at several billion bits per second.
Gigabit Ethernet also connects to theHIVE for testing Clemson's parallel file system software on applications. Furthermore, the addition of the server's 100 personal computers "offers more than 200 processors in one room for computing," Merkey said. "A big issue with Beowulf is determining how well applications will scale." Caltech studies show that 1,000 to 2,000 processors are at least theoretically possible at acceptable cost.
The current performance and price/performance champion is Los Alamos' Avalon. Constructed by Mike Warren and colleagues last May, Avalon uses 68 Alpha processors, the same type as a CRAY T3E. Its 20 gigaflops performance on the Linpack benchmark places Avalon 315th on the latest TOP 500 list of the world's supercomputers. The $150,000 price tag translates into $7 per megaflop, compared to about $175 per megaflop for a commercial machine.
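Avalon's price/performance figure follows from simple division of the numbers quoted above; the arithmetic gives about $7.50 per megaflop, so the article's "$7" presumably reflects rounding or a slightly higher measured rate:

```python
# Back-of-envelope price/performance for Avalon, using the figures in
# the text: a $150,000 system cost and 20 gigaflops on Linpack.
system_cost = 150_000                 # dollars
linpack_gigaflops = 20
megaflops = linpack_gigaflops * 1000  # 20 gigaflops = 20,000 megaflops

dollars_per_megaflop = system_cost / megaflops
print(f"${dollars_per_megaflop:.2f} per megaflop")  # $7.50 per megaflop
```

Against the roughly $175 per megaflop of a commercial machine, that is better than a twentyfold advantage, which is the whole Beowulf argument in one number.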
Price/performance on Beowulfs improves quickly because mass-market parts get cheaper and faster every few months. Even so, the software architecture can stay essentially the same. "High-performance computing has a sad history of people moving to new generations without the same programming model," Sterling said. "Beowulfs are forever."
Despite that consistency, Beowulfs will not replace traditional supercomputer centers. The primary reason is a relatively long delay, or latency, in passing data among processors. Average Beowulf latency is 100 millionths of a second, while it is only a few millionths of a second on the specialized network of a CRAY T3E. "You have to make data packages as large as possible so you don't lose the benefits of your network bandwidth," Dorband said. "Think of buses instead of cars."
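Dorband's "buses instead of cars" advice can be captured in a simple cost model: each message pays the fixed latency once, plus its size divided by bandwidth. The latency (100 microseconds) and Fast Ethernet rate come from the article; the 1-megabyte payload and message counts are assumed examples:

```python
# Toy cost model for message passing on a Beowulf:
#   time = messages * (latency + bytes_per_message / bandwidth)
# Latency and bandwidth figures are taken from the article; the payload
# and message counts below are illustrative assumptions.
LATENCY = 100e-6          # seconds per message (average Beowulf latency)
BANDWIDTH = 100e6 / 8     # Fast Ethernet, converted to bytes per second

def transfer_time(total_bytes, num_messages):
    """Time to move total_bytes split evenly across num_messages packets."""
    per_message = total_bytes / num_messages
    return num_messages * (LATENCY + per_message / BANDWIDTH)

total = 10**6             # move 1 megabyte in all
print(transfer_time(total, 1000))  # 1,000 "cars": latency dominates (0.18 s)
print(transfer_time(total, 1))     # one "bus": bandwidth dominates (0.08 s)
```

Splitting the same megabyte into a thousand small packets more than doubles the transfer time, because the 100-microsecond toll is paid a thousand times over. Latency-tolerant applications are simply those that can ride the bus.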
Like ESS Project graduates Salmon and Warren, "our investigators should use premium-cost cycles to develop applications," stated Clark Mobarry, ESS deputy project manager for applications. "If they find the programs are latency-tolerant, they can use Beowulfs for production." Besides installing a large supercomputer at Goddard early next century, the project is considering building a 256-processor Beowulf to encourage such investigator activity.
For the general science community, Salmon stressed that "the real work that needs to get done is in software and standard setting." Community agreement is the best path to standards, but software advances are being incorporated into a second extreme.linux CD-ROM due out this fall. "It will be easier to install and have a few applications ready to run," Becker said. Together with the new MIT Press book, How to Build a Beowulf, these tools will help greater numbers make their own heroes.
1. Anonymous, Beowulf, translated by Francis B. Gummere, P.F. Collier & Sons, New York, 1910. Prepared for the University of Virginia Library Electronic Text Center.