The case study “How Will Astronomy Archives Survive the Data Tsunami?” shows that astronomy has begun generating more data than current methods can readily manage, serve, and process. It states that data volumes are growing at a rate of 0.5 PB each year and that over 60 PB of archived data will be available by 2020. It further outlines how next-generation methods and tools for handling this volume of data can be developed. This paper takes a critical look at the case study: it assesses the performance degradation issues the study highlights and evaluates the archival techniques and emerging technologies it discusses. The paper also evaluates the methods proposed for reducing the potentially large financial and computational costs of data archiving.
The Performance Degradation Issues Highlighted in the Case Study
The growth in data sizes and usage expected over the next few years is already affecting the performance of astronomy archives and data centers. Among those affected is the NASA/IPAC IRSA. This archive is experiencing increased data sizes and usage because it is responsible for storing and hosting the data from the Spitzer Space Telescope and the WISE mission. Notably, these two missions have generated a larger data volume than that of the more than 35 missions already archived.
The case study also points out that archive usage has been driven up by the availability of data, coupled with rapid growth in program-based queries. This has contributed greatly to the degradation of the archive's performance, which could worsen as new data sets are released through the archive. Requests for larger volumes of data have also contributed, since they have slowed response times to queries.
The case study further notes that archives operate on limited budgets that are usually fixed for several years. Consequently, simply adding infrastructure as usage of the archived data increases is not the best answer to declining performance. It also notes that as archive holdings grow, demand for data grows with them, and queries become more sophisticated because astronomical objects change over time. Both trends further reduce the performance of the data centers and archives.
Archival Techniques Outlined in the Study
The case study presents several archival techniques that could be used to handle the growth of archived data. These include cloud storage, R-trees, geographic information systems (GIS), and graphics processing units (GPUs). Cloud storage works well for applications that require a great deal of memory and processing, because the cost of processing is low. R-trees are used to index multidimensional data and thereby shorten access times, while geographic information systems “store information about the world as a collection of thematic layers that can be linked together by geography” (Bayfieldcounty, 2013). Lastly, graphics processing units can be used to speed up the rendering of a picture or an image on a display device.
All of these methods can be used, but they might not be able to cope with the increasing amounts of archived data. For instance, given the large quantities of image data in astronomy, cloud computing would need high-throughput networks and parallel file systems to achieve its best performance. Cloud technology would therefore be costly and uneconomical. Other disadvantages, such as limited Internet bandwidth that reduces performance, also make it unable to handle the growth of archived data (Cloud Consulting, 2011).
Geographic information systems, for their part, are expensive and far more complex than astronomy requires. The high cost and complexity make them unsuitable for astronomical data archiving. Graphics processing units can be used, but they support only single-precision calculations, whereas astronomy more often needs double-precision calculations, and their performance is often limited by the transfer of data to and from the GPU.
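The precision concern can be made concrete with a small sketch. It is only an illustration under stated assumptions: the Julian Date used below is a hypothetical value chosen for the example, not one taken from the case study, and the helper `to_float32` is an illustrative name. The code rounds a double-precision value to single precision using Python's standard `struct` module and reports how much timing information is lost.

```python
import struct


def to_float32(x: float) -> float:
    """Round a double-precision Python float to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Hypothetical observation time: a Julian Date with a fractional-day offset.
jd_double = 2455197.501

# The same value after being stored in single precision.
jd_single = to_float32(jd_double)

# Difference between the two representations, converted from days to seconds.
error_seconds = abs(jd_double - jd_single) * 86400.0

print(f"double precision: {jd_double!r}")
print(f"single precision: {jd_single!r}")
print(f"timing error    : {error_seconds:.1f} s")
```

At this magnitude, neighbouring single-precision values are a quarter of a day apart, so the fractional offset is lost entirely. Quantities such as timestamps are therefore a plausible reason why astronomical calculations often call for double precision.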
The Emerging Technologies that Could be Applied to the Data Archival
Emerging technologies such as clustering, the Montage image mosaic engine, and infrastructure such as the SciDB database could prove effective in solving the data archival problem in astronomy.
Clustering involves linking a set of computers together so that they operate as a single, more powerful system. Each computer in the cluster is usually called a node; it has its own CPU, memory, operating system, and I/O subsystem, and it can communicate with the other nodes. Clustering allows heavy programs that would otherwise take a long time to run to be executed on regular hardware (Narayan, 2005).
The Montage image mosaic engine is a toolkit that astronomers use to assemble astronomical images into mosaics. It has been tested on operating systems such as Linux, Mac OS X, and Solaris. The engine produces an image mosaic in four steps, each implemented as a separate, independent module that operates on files in the Flexible Image Transport System (FITS) format. This format has become the standard among astronomers because the files are in human-readable form and the format is convenient for manipulating and managing large image files (Medina, 2007).
SciDB, on the other hand, is an open-source database system designed as a next-generation computing platform for data scientists such as astronomers, bioinformaticians, and practitioners in any other field that uses huge volumes of multidimensional data, such as genomic and geospatial data. The system combines analytical capabilities with data-management capabilities to support complex and flexible analytics. It is declarative, array-oriented, and extensible (Cudré-Mauroux, 2010).
The Proposed Methods of Reducing the Potentially Big Financial and Computational Costs Resulting from the Data Archiving
The case study proposes several methods of reducing the financial and computational costs that may be incurred in archiving data. These include the use of graphics processing units (GPUs), R-tree-based indexing schemes, and academic cloud computing.
A graphics processing unit is made up of floating-point processors, and applications that run on it, such as fixed-resolution mesh simulations, machine-learning packages, and volume-rendering packages, may benefit from its use. GPU computing combines a graphics processing unit with a CPU to speed up general-purpose scientific and engineering applications. It offloads the compute-intensive portions of an application to the GPU while the rest of the code keeps running on the CPU, which makes the application run faster. GPUs are generally used for manipulating computer graphics and prove more effective than general-purpose CPUs in areas where large blocks of information are processed in parallel (Nvidia, 2013).
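How much an application gains from offloading depends on how much of its run time is actually moved to the GPU. A rough estimate can be sketched with Amdahl's law, which is brought in here as an aid and is not part of the case study. The function name, the 90 percent offloaded fraction, and the tenfold GPU speed-up are illustrative assumptions, and the sketch ignores the cost of transferring data to and from the GPU.

```python
def overall_speedup(offloaded_fraction: float, gpu_speedup: float) -> float:
    """Estimate whole-application speed-up using Amdahl's law.

    offloaded_fraction: share of the original run time moved to the GPU (0 to 1).
    gpu_speedup: how many times faster the offloaded portion runs on the GPU.
    Data-transfer overhead is deliberately ignored in this sketch.
    """
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if gpu_speedup <= 0.0:
        raise ValueError("gpu_speedup must be positive")
    cpu_time = 1.0 - offloaded_fraction          # part that stays on the CPU
    gpu_time = offloaded_fraction / gpu_speedup  # part that runs on the GPU
    return 1.0 / (cpu_time + gpu_time)


# Assumed figures for illustration only: 90% of the work offloaded, 10x faster on the GPU.
estimate = overall_speedup(0.9, 10.0)
print(f"estimated overall speed-up: {estimate:.2f}x")
```

Under these assumed figures the whole application speeds up by roughly a factor of five rather than ten, because the portion left on the CPU still takes its original time.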
R-tree-based indexing schemes, on the other hand, support fast, scalable access to large databases of astronomical catalogs and imaging data sets. They index multidimensional information and thereby shorten access times. The Virtual Astronomical Observatory (VAO) is currently developing this technique as part of its effort to offer seamless data discovery services in astronomy. The scheme delivers speed-ups well beyond what database table scans achieve and is already in use in the VAO Image and Catalog Discovery service and the Spitzer Space Telescope Heritage Archive.
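The idea behind R-tree indexing can be sketched in a few lines. The code below is a minimal, simplified illustration and not the VAO implementation: it bulk-loads points by sorting on one coordinate, which is cruder than the packing heuristics real R-trees use, and all names (`build`, `search`, the `(ra, dec)` point layout) are assumptions made for the example. It shows the key property: a search skips every subtree whose bounding box does not overlap the query region, instead of scanning every row.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]              # hypothetical (ra_deg, dec_deg) position
Box = Tuple[float, float, float, float]  # (min_ra, min_dec, max_ra, max_dec)


@dataclass
class Node:
    box: Box          # minimum bounding rectangle of everything below this node
    children: list    # Points when leaf is True, otherwise child Nodes
    leaf: bool


def box_of_points(points: List[Point]) -> Box:
    ras = [p[0] for p in points]
    decs = [p[1] for p in points]
    return (min(ras), min(decs), max(ras), max(decs))


def box_of_nodes(nodes: List[Node]) -> Box:
    return (min(n.box[0] for n in nodes), min(n.box[1] for n in nodes),
            max(n.box[2] for n in nodes), max(n.box[3] for n in nodes))


def intersects(a: Box, b: Box) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def contains(box: Box, p: Point) -> bool:
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def build(points: List[Point], capacity: int = 4) -> Node:
    """Bulk-load a tree: pack sorted points into leaves, then pack nodes upward."""
    if not points:
        raise ValueError("at least one point is required")
    pts = sorted(points)
    level = [Node(box_of_points(pts[i:i + capacity]), pts[i:i + capacity], True)
             for i in range(0, len(pts), capacity)]
    while len(level) > 1:
        level = [Node(box_of_nodes(level[i:i + capacity]), level[i:i + capacity], False)
                 for i in range(0, len(level), capacity)]
    return level[0]


def search(node: Node, query: Box) -> List[Point]:
    """Return all points inside the query box, pruning non-overlapping subtrees."""
    if not intersects(node.box, query):
        return []  # nothing below this node can match, so skip it entirely
    if node.leaf:
        return [p for p in node.children if contains(query, p)]
    found: List[Point] = []
    for child in node.children:
        found.extend(search(child, query))
    return found


# Usage: a small synthetic grid of positions and one rectangular region query.
catalog = [(float(ra), float(dec)) for ra in range(10) for dec in range(10)]
root = build(catalog)
region = (2.0, 2.0, 4.0, 4.0)
matches = search(root, region)
print(len(matches), "sources found in region")
```

A table scan would test every row against the region, whereas the tree only descends into nodes whose bounding rectangles overlap it. On large catalogs, that pruning is where the reported speed-ups would be expected to come from.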
In addition, the academic cloud computing model used by the Canadian Astronomy Data Center could be adopted by other archives. This cloud system supports the delivery, processing, storage, analysis, and distribution of all kinds of astronomical data sets.
In conclusion, the amount of information being processed in astronomy is growing every year, and new ways of managing, processing, storing, and accessing such large quantities of data must be devised.