Indiana University

MapReduce and Data Intensive Applications

Title: MapReduce and Data Intensive Applications
Publication Type: Conference Paper
Year of Publication: 2012
Date Published: 07/2012
Authors: Qiu, J., and G. Fox
Refereed Designation: Unknown
Conference Name: XSEDE12
Conference Location: Chicago, IL
Publication Language: eng
Abstract: We are in the era of the data deluge, and future success in science depends on the ability to leverage large-scale data. This proposal follows up on our successful first meetings in the “MapReduce application and environments” series at TeraGrid 2011. We will also use it to kick-start an XSEDE forum. It aligns directly with several NSF goals, including Cyberinfrastructure Framework for 21st Century Science and Engineering (CF21) and Core Techniques and Technologies for Advancing Big Data Science & Engineering (BIGDATA). In particular, MapReduce-based programming models and run-time systems such as the open-source Hadoop system have increasingly been adopted by researchers in the HPC, Grid, and Cloud communities who face data-intensive problems in areas including bioinformatics, data mining and analytics, and text processing. Although MapReduce run-time systems such as Hadoop are not currently supported across XSEDE systems (they are available on some systems, including FutureGrid), demand for these environments from the science community is increasing. Figure 1 shows the statistics of projects on the FutureGrid testbed, where Hadoop, MapReduce, and Twister (a MapReduce variant) have been used extensively as frameworks for experiments in scalable data processing.