Using iRODS
Speeding Information to Help Predict “the Big One”
By Paul Tooby, DICE Communications
iRODS Data System Powers High Speed Automated Data Transfer
Scientists from the Southern California Earthquake Center (SCEC) are unleashing massive “virtual earthquakes” using supercomputers, giving a crucial advance look at the likely impacts of the large earthquake expected to hit California – “the Big One.” This information can provide guidance for improved hazard estimates, building codes, and safer buildings, potentially saving lives and billions of dollars.
But these detailed simulations produce mountains of data, creating new challenges in how to manage and move all this information. When running large simulations on the National Science Foundation (NSF) Ranger supercomputer at the Texas Advanced Computing Center (TACC), computational scientist Yifeng Cui of the San Diego Supercomputer Center at UC San Diego needed to efficiently transfer 20 terabytes of data from Texas into the 159-terabyte SCEC Digital Library in San Diego (one terabyte is about 1,000 gigabytes, the equivalent of about 2 million books).
To manage this large data collection, Cui and colleagues chose the NSF-supported Integrated Rule-Oriented Data System (iRODS), which makes data transfer easy. Even the 20 terabytes of SCEC data could be transferred from Texas to San Diego using just a single command.
By taking full advantage of the capabilities of iRODS, Cui and UCSD graduate student Sashka Davis achieved a six-fold speed improvement over initial performance, cutting the total time required to transfer 20 terabytes of data from about 10 days to less than 2 days of continuous transfer. This is an important improvement that helps the researchers get more science done. A technical report, listed under Related links, describes how they achieved high-speed data transfer using iRODS.
The researchers transferred earthquake simulation data at a peak speed of 177.8 megabytes/sec and an average rate of 130 megabytes/sec. At its peak, the transfer used more than 70 percent of the theoretical 250 megabytes/sec network bandwidth, which networking experts said is excellent performance over this real-world 1,300-mile network link. This performance includes not only transferring the data itself but also ingesting both the data and the descriptive metadata into the iRODS digital library, where it can be stored, shared, discovered, re-used, preserved, and more.
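The arithmetic behind these figures can be checked with a short calculation. The sketch below is illustrative only and is not taken from the technical report: it assumes decimal units (one terabyte as 10^12 bytes, one megabyte as 10^6 bytes), and the function and variable names are made up for this example.

```python
# Back-of-the-envelope check of the transfer figures quoted in the article.
# Assumption: decimal units (1 TB = 10**12 bytes, 1 MB = 10**6 bytes).
TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 86_400


def transfer_days(total_bytes, rate_bytes_per_sec):
    """Days needed to move total_bytes at a constant rate, running continuously."""
    return total_bytes / rate_bytes_per_sec / SECONDS_PER_DAY


def utilization(rate_bytes_per_sec, link_bytes_per_sec):
    """Fraction of the theoretical link bandwidth that a given rate uses."""
    return rate_bytes_per_sec / link_bytes_per_sec


data_size = 20 * TB        # simulation output moved from Texas to San Diego
average_rate = 130 * MB    # reported average transfer rate
peak_rate = 177.8 * MB     # reported peak transfer rate
link_capacity = 250 * MB   # reported theoretical peak network bandwidth

days_at_average = transfer_days(data_size, average_rate)
peak_utilization = utilization(peak_rate, link_capacity)
average_utilization = utilization(average_rate, link_capacity)

print(f"Days at average rate: {days_at_average:.2f}")    # 1.78
print(f"Peak utilization: {peak_utilization:.0%}")       # 71%
print(f"Average utilization: {average_utilization:.0%}")  # 52%
```

Under these assumptions, 20 terabytes at the average rate takes about 1.8 days, in line with the "less than 2 days" figure, and the 70 percent utilization figure corresponds to the peak rate rather than the average.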
By speeding access to data, the iRODS Data System helps ensure that the growing realism of “virtual earthquakes” can teach scientists and policymakers the secrets of tomorrow’s earthquakes, giving new scientific understanding and important time to prepare.
The iRODS system is developed by the Data Intensive Cyber Environments (DICE) group at the University of North Carolina at Chapel Hill and the University of California, San Diego, and supported by the NSF and the National Archives and Records Administration (NARA).
Related links:
Technical information on fast data transfer with iRODS: “Progress Towards Efficient Data Ingestion into iRODS,” Sashka Davis, UCSD, September 2008.
Southern California Earthquake Center (SCEC) http://www.scec.org
Data Intensive Cyber Environments (DICE) group http://diceresearch.org
Integrated Rule-Oriented Data System (iRODS) https://www.irods.org
National Science Foundation (NSF) http://www.nsf.gov
National Archives and Records Administration (NARA) http://www.archives.gov
Massive Virtual Earthquakes Require Fast Data
To explore the detailed impacts of the expected “Big One” on the San Andreas Fault, the NSF Southern California Earthquake Center (SCEC) conducts large-scale “virtual earthquake” simulations. As part of the research, the iRODS Data System efficiently moved twenty terabytes of simulation output 1,300 miles from Texas into the iRODS-based SCEC digital library in San Diego. Credit: A. Chourasia, UCSD.