Leaders of government, academia and industry met on June 11-12 to collaborate on building a foundation that provides continuous access to high-end computational capabilities. This network will make working together easier for remote collaborators by providing uniform access to information, software and computing resources while acting as a tool to augment that access. Scientists, engineers and decision-makers from across the country will collaborate in this robust working environment from which new algorithms will spread at our nation's feet.

While the notion of linking people, computers and data over networks is a familiar concept, "this is the first time an unprecedented number of hardware, software and application experts are collaborating on this effort," says Ian Foster of Argonne National Laboratory and the University of Chicago. Noting the irony of having traveled across the country to contemplate a plan that, among other things, would improve videoconferencing, Foster adds: "Clearly, we are doing one thing wrong. We've all come to Ames for the meeting. We should have been able to stay home."

Access to the nation's data

Foster and Carl Kesselman, who is with the University of Southern California's Information Sciences Institute, led the Globus research project that investigates the integration of geographically distributed computational and information resources. They both know the value of providing continuous access to compute power and information. Applications ranging from climate modeling to aeronautics demand it.

Take the problem of handling geographically distributed data repositories from NASA's Earth Observing System satellites. It is difficult to retrieve and filter the terabytes of data gushing in from around the country. "What good is this ocean of data if you can't easily integrate it with other models? What if you need to know the ozone level over Pasadena, Calif., every day at noon over the last three years?"
asks California Institute of Technology's Paul Messina. To provide a climate model showing ozone levels 30 years in the future under a deadline, researchers and scientists across the country need to synthesize data maintained in geographically distributed repositories, digital libraries and databases. "What's the alternative?" asks Messina. "You can ask for 70 CD-ROMS to be mailed to you and somehow sort through all of them to find what you want. If you can't get to the vast information stored digitally, then you have to travel to the information. In fact, many people will simply not take advantage of all that information. Government agencies continue to amass their own huge archives of weather data because it's impractical to share. This means there is duplication of effort, and you, the taxpayer, end up paying more." A computational infrastructure, on the other hand, would allow the public to query for the information. The resources being shared with the public could eventually include the instruments themselves: data archives containing public information, various supercomputers and other resources used for the analysis. Funded by the High-Performance Computing and Communications (HPCC) Program and the Information Technology Program, NASA is building such an infrastructure or computational grid that will link NASA systems with National Science Foundation's computational grid R&D efforts and infrastructure (computing, data and network resources.) Experts from diverse branches of computer science see this evolving into a full-scale, nationwide resource called the Information Power Grid (IPG). A blueprint for this proposed grid technology is documented in the book, The Grid, published by Morgan Kaufmann and edited by Foster and Kesselman. The term grid is analogous to the electric power grid that provides uniform access to electricity. IPG will provide currents of computational power and data to autonomous users. There is no information power grid now. 
At the Supercomputing Conference in late 1995, there was a close example: the Information Wide-Area Year (I-WAY) experiment. This wide-area, distributed computing experiment simulated a grid-like environment for three days by connecting supercomputers, databases and other resources at 17 sites across North America.
The analogy between electricity and computation may not be perfect. Nevertheless, "IPG is like the electrical grid in that we don't care what power station is producing the watts. I don't know why the computing industry should be any different. I mean, think about it," says Messina, chief architect of the National Partnership for Advanced Computational Infrastructure, led by the University of California at San Diego. Computing experts have thought about it, and some clearly are believers. Several major grid projects funded by federal agencies such as the National Science Foundation, the Department of Energy, the Department of Defense and NASA are underway. "Our challenge is to coordinate these efforts so they will lead to interoperable technologies and division of labor. As the process unfolds, we'll see the development and deployment of interconnected national-scale grids as one of the major activities shaping the 21st century," pronounces the National Computational Science Alliance (NCSA) director, Larry Smarr. Many researchers at the NASA June IPG meeting had already weathered technical and sociological beatings against their protogrids and still managed to push through meaningful issues. Ending "unhappy owners of machines," as University of Wisconsin's Miron Livny puts it, increasing the computing capabilities one to two orders of magnitude for scientists and engineers, building software that supports development of grid capabilities, and conducting high-speed network experiments with metacomputing (the coupling of geographically distributed resources) are major accomplishments. As impressive as these accomplishments are, a national computational infrastructure will not come into being just because "a bunch of researchers got together and thought it would be a cool thing to do," adds Smarr. Similar to the evolution of major infrastructure components such as the railroads, telephones, power and light, industry and government play a major role. 
Key factors include research, invention and standardization. Computing experts are hopeful that industry will eventually open the floodgates to information and computation even further. "It's the government's role to do the high-risk research, smoothing the way for industry to make further investment," adds NCSA's deputy director John Toole, who is uniting the technology and partnerships among hundreds of these alliance partners and 30-plus institutions.

Distributed supercomputing

IPG's future will evolve as it becomes more useful to industry. Large problems such as climate modeling require supercomputer-based online analysis using distributed resources such as data storage, processors, real-time interactive devices and networks. "There's no point in building the grid if you don't have the power plants to power the applications built on the grid," says Foster.

"Distributed computing (a method of linking individual computers over the Internet) will allow more horsepower for complex problems," says Bill Feiereisen, division chief of NASA's Numerical Aerospace Simulation Facility at Ames Research Center. Feiereisen thinks 95 percent of today's multidisciplinary modeling of aerospace vehicles is applicable to this conceptual framework. "With proper decomposition, an application can be spread over multiple systems by sharing the computation over these distributed systems. Many assume that to achieve petaflops (one million billion floating-point operations per second) performance, the operations have to reside on one centralized calculation. But you can harness the power of a sum total of 10,000 independent calculations on individual machines to run thousands of simulations, with quick turnaround."

[See Beowulf lives on as a build-it-yourself computer]

Already many in the industry see build-it-yourself supercomputers as the wave of the future, promising petaflops capability at reduced cost. Potentially, supercomputers such as Beowulf could be one of IPG's network nodes.
The algorithms developed for IPG must be delay-tolerant, much like those developed for Beowulf, says Messina. Latency arises in both distributed processing, in which applications travel to the site where the data reside, and distributed caching, in which data are moved to the supercomputer for analysis. "If a computation uses computers in San Francisco and Chicago, the time it takes to send a message across the country in optical fiber is close to 15 milliseconds. That's a very long time for a computer," adds Messina. (The speed of light in a vacuum is 299,792.458 kilometers per second, or about 186,282 miles per second.)

"You can't surpass the speed of light. But if your algorithms are delay-tolerant, you can put together a system like IPG that is dynamic and able to translate requests into a set of characteristics," says NASA's Jim McCabe, who is involved in developing the networking elements of the grid. "We're moving into a new market economy, where cycles and bandwidth will be bought and sold like commodities."

Some researchers doubt whether the communication topology can handle the sheer size of these problems if they are spread across the nation. "Very few people see the performance needed for this type of application in their local area network. The exception is High-Performance Parallel Interface (HIPPI) connections to Cray supercomputers and high-performance systems. The real question is, are our networks developed to the point that we can deliver this performance in the wide area? The answer, unequivocally, is yes," asserts William Johnston, NASA's new IPG manager from Lawrence Berkeley National Laboratory and the architect of DOE's grid project called China Clipper. The NASA Research and Education Network/Next Generation Internet initiative will provide high-speed connectivity for NASA's first protogrid.
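
Messina's figure of close to 15 milliseconds can be sanity-checked with a back-of-the-envelope calculation. The sketch below is illustrative and not taken from the article: it assumes a straight-line distance of roughly 2,990 km between San Francisco and Chicago and a refractive index of about 1.47 for optical fiber. Both are approximations, and real fiber routes run longer than the straight-line path.

```python
# Rough one-way propagation delay through optical fiber.
# The distance and refractive index are assumed, approximate values.

C_VACUUM_KM_PER_S = 299_792.458  # speed of light in a vacuum, as quoted in the text


def one_way_delay_ms(distance_km: float, refractive_index: float = 1.47) -> float:
    """Return the one-way propagation delay in milliseconds."""
    # Light travels more slowly in glass than in a vacuum by the refractive index.
    speed_in_fiber = C_VACUUM_KM_PER_S / refractive_index
    return distance_km / speed_in_fiber * 1000.0


SF_TO_CHICAGO_KM = 2_990.0  # assumed straight-line distance

delay = one_way_delay_ms(SF_TO_CHICAGO_KM)
print(f"one-way delay: {delay:.1f} ms")
```

Under these assumptions the result lands just under 15 milliseconds, consistent with the quoted figure. Switching and routing delays, which the sketch ignores, would only add to it.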
Many clearly hope that leveraging a collection of geographically diverse, heterogeneous resources (a metasystem) will lead to even bigger calculations. "Computing power is the key to reducing design-to-manufacturing time for aircraft. It could lead to fewer design cycles," says Ray Cosner, Boeing senior technical fellow. "Distributing computing needs across IPG could cut cycle time by 50 percent or more." This is critical, considering the largest datasets for wind-tunnel testing come from advanced optical instrumentation methods, such as pressure-sensitive paint, that return quantitative data in large quantities. According to Cosner, the design process could be reduced from 36 months to about 15 months if this cycle time is reduced and the number of cycles cut from three to two. "That's an enormous competitive advantage. It will depend on whether the grid delivers reliable access to huge compute power. If it does, what IPG will bring to American industry is increased reliance on high-fidelity modeling and simulation, which are low-cost alternatives to current design practices."

Collaborative design

High-performance computing is just one objective of IPG, as Andrew Grimshaw, University of Virginia associate professor of computer science, points out. "It's not simply lashing a bunch of machines together to do large-scale computation. What IPG is about is breaking down barriers between different groups who are working together so they can share information in a safe, secure, fault-tolerant way."

IPG is a perfect solution for global companies such as Boeing. It can bring together geographically separated databases containing information that includes structures, wind-tunnel experiments and maintenance records, adds Feiereisen. In this way, engineering, manufacturing, marketing and even the customer can cooperate on design. For instance, two years ago NCSA industrial partner Caterpillar "designed in virtual environment caves," said Smarr. "Now they're extending this cave using high-speed networking to Germany and Houston in order to allow multiple participants to collaborate in new product design." Allowing engineers and manufacturing experts to work together on a design saved manufacturing costs and shortened time to market, both key benefits that result from these internal groups working together.

Smarr seems to be telling everyone that radical change in humans working with information is IPG's religion. And many are telling others the same. "Information and compute power could improve the complex process of aeronautics. I talked with a senior engineer who has been in the aeronautics field for 40 years. He says, 'I'm just now barely understanding all the design pieces'," recalled NASA's Bill Van Dalsem, who hopes one day to see metal bent differently using better processes because IPG exists. How do you achieve this kind of acid test? Better design decisions. "In current design scenarios, 80 percent to 90 percent of the total cost of a project is committed when we have only 10 percent of the information," says Robert J. Hansen, deputy director for research at ARC.

Other applications of the grid include high-throughput computing in which grids are used to perform large numbers of tasks. For example, one user might ask: How much behavioral data can I generate by the time my report is due? It will also be used for on-demand applications in which grids are used to meet peak needs for computational resources. Designers with tight deadlines may ask their companies to authorize paying for large amounts of resources in a short time period, including the interconnects.

Interacting with the grid

NASA wants nothing less than to redefine how people interact with information and computers. "You need robust software that ties together assets such as computing, real-time data acquisition, visualization stations and sensors," said Feiereisen. "We have to develop robust software because we cannot afford to debug complex and fragile mechanisms in such a diverse environment," agrees Livny. In a grid environment, where nodes may be removed and sold at any time by their owners, commonly used mechanisms are likely to turn fragile. "Hardware problems such as node failure, link failure and architectural differences reflect themselves in the software. Complexity and failure are difficult issues," adds Grimshaw.

Over the next seven years, application software will in most cases be written from scratch by the best scientists and engineers in the country, Smarr tells us. "The software model is probably going to be something like a distributed ocean of object components that are being generated in a decentralized authorship fashion, but that interoperate with each other because they're objects." Conceivably, the Globus Project, which is one of the early, larger efforts that created a software infrastructure for grid capabilities, could be leveraged and scaled to new platforms. Another model is Legion, a large-scale, object-based distributed computing system. Portable Batch System, a mechanism that ties computers together while allocating jobs between them, will also be leveraged.

"The question 'what does it mean to support application development in these grid environments?' is just now being explored. First, we are leveraging existing programming models and adapting them to run on heterogeneous dynamic grids, but we need to look at new types of programming models as well," says Foster. Certainly there is potential for revolutionary changes in software as the grid is used more. For example, it may be useful to have components implemented in a language such as Java that can be ported to many different machines. Still, by early in the next century, developers will know this new direction better. Above all, the new software will reduce barriers to remote resources and hide the heterogeneity of interfaces.
There is a growing need for tools that help program designers analyze how well they are adapting their codes to the grid. One such tool developed within HPCC's Computational Aerosciences program and in collaboration with other programs is the portable parallel distributed debugger (p2d2). An additional uniform debugger that operates on distributed machines is on the wish list of grid developers. Another issue that concerns middleware system developers is resource management. One of many approaches to resource management that NASA is researching is the model demonstrated by the UWFlock at the University of Wisconsin (UW). The UWFlock uses Condor, a high-throughput computing system, to harness the power of more than 500 desktop UNIX workstations linked throughout the campus. Condor operates under an open market model in which each individual application or resource is an agent for itself. "It works just like the classified ads section in the newspaper. Two people describe themselves and the constraint of who they're looking for. Then a matchmaker checks to see if there is a match between the ads and informs the parties they're a match. Still, the matchmaker doesn't know anything about the semantics of the ads themselves. It allows members of each Condor community to select the attributes it uses to describe itself," explains Livny. For example, if a subset of the computing community would like to add a given attribute such as a video card or a solver, with the classified ad approach any feature can be added to the description of individual owners without involving any centralized authority. "So if you are looking for a video card, add it to your request," says Livny, "and you will be sent to that provider." With Livny, who for a decade has been developing Condor, the important question behind IPG research is "how to effectively address distributed ownership of large and evolving collections of hardware and software resources." 
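
The classified-ad matchmaking that Livny describes can be illustrated with a small sketch. This is not Condor's actual ad format or API; the names `Ad`, `accept_any` and `match` are hypothetical, chosen only for illustration. Each party publishes free-form attributes plus a constraint on the other party, and the matchmaker merely evaluates the constraints without interpreting what the attributes mean.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attributes = Dict[str, object]


@dataclass
class Ad:
    """A hypothetical classified ad: a self-description and a constraint."""
    owner: str
    attributes: Attributes                      # free-form; no central schema
    constraint: Callable[[Attributes], bool]    # what the owner is looking for


def accept_any(other: Attributes) -> bool:
    """Constraint for an owner with no requirements on the other party."""
    return True


def match(requests: List[Ad], offers: List[Ad]) -> List[Tuple[str, str]]:
    """Pair each request with the first free offer where BOTH constraints hold.

    The matchmaker never inspects attribute names itself; it only calls
    the constraints supplied by the two parties.
    """
    pairs: List[Tuple[str, str]] = []
    free = list(offers)
    for req in requests:
        for off in free:
            if req.constraint(off.attributes) and off.constraint(req.attributes):
                pairs.append((req.owner, off.owner))
                free.remove(off)  # each offer serves one request at a time
                break
    return pairs


offers = [
    Ad("ws-01", {"os": "UNIX", "memory_mb": 64}, accept_any),
    # This owner added a "video_card" attribute on its own initiative
    # and only accepts jobs from one group.
    Ad("ws-02", {"os": "UNIX", "memory_mb": 128, "video_card": True},
       lambda job: job.get("owner_group") == "physics"),
]
requests = [
    Ad("render-job", {"owner_group": "physics"},
       lambda machine: bool(machine.get("video_card", False))),
]

pairs = match(requests, offers)
print(pairs)  # [('render-job', 'ws-02')]
```

Because attributes are plain key-value pairs, a new feature such as `video_card` can appear in one owner's ad without any change to the matchmaker, which is the property Livny emphasizes.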
Security

Open cyberspace leads to serious questions about security. IPG researchers unanimously agree that security is one of their greatest hurdles. If aerospace designers are to use resources scattered across the country, protecting proprietary information is paramount. Some feel the real burden for ensuring security lies in the virtual-machine component of the computational system, which would enable someone like senior engineer Mark Turner of GE to collaborate with competitors on engine data. How is Turner to retrieve authorized data from his competitor but no other files? The grid's virtual-machine layer will enable a short-lived collection of resources to allow him to get what he needs while hiding the location on the grid. "But first you have to figure out how data will cross firewalls," challenges Turner. According to NASA's Jim McCabe, special policies must be written to allow passage through firewalls. "Today's data are identified by a simple IP address. We will need to write high-level descriptions to accompany the data that answer questions such as who sent the data? when? and what's the security clearance?"

"As an MIT, Nobel-prize-winning economist once said, 'It's not enough to invent a technology. You only make change in productivity if you deploy the use of technology throughout the industry'," Smarr remarks. "Right now, we are not yet at the John Glenn-in-orbit stage. We're sort of at the Ham-the-monkey stage. The good news is he's in orbit and the ride on the rocket didn't kill him. He's still breathing. That's basically where we are. It's a long way to Space Station Freedom."

NASA's IPG project is on an evolutionary path that may become revolutionary, according to McCabe. "It's just like the web. Five years ago, who would have thought we'd be wired the way we are today?" There are bumps along the way, of course. "But I think the role of the government is to do the high-risk research that no one in industry could do on their own," says Van Dalsem.
"The benefits to be gained from this information grid are overwhelming," says Ken Kennedy, co-chair of the President's Advisory Committee on Information Technology. "We'll be able to solve problems earlier, save lives and make life better." However, sentiments may change if the grid is too difficult to use, cautions Van Dalsem. "We have to cue the grid to respond to the kind of question a designer might ask." Leaders of IPG are determined to make ease of use a priority so that IPG will be everywhere, like Coca-Cola. It's inevitable. Information technology is becoming an integral part of modern lifestyles. And yet, "seven years from now, will we get our investment back? That's hard to evaluate. But if we don't do the research, a decade from now, we will be sorry. Our aerospace industry and our national research infrastructure will have deteriorated," Foster warns. NASA and its partners seem determined to make every detail of IPG right, including solving each technical issue and resolving any doubts that would slow progress. "I will continue to push for a national grid as long as I am alive and kicking," asserts Livny.