Accepted Poster-Only Abstracts
Exploring Large Composite Simulation Models with Splash (Tuesday)
Peter J. Haas, Patricia Selinger, Ignacio Terrizzano, Haidong Xue / IBM Research / Almaden
Large, high-resolution simulation models can easily generate vast amounts of data. Such models are becoming prevalent both in e-science and in guiding investment, planning, and policy decisions around complex issues such as population health and safety. IBM's Splash project provides a platform for combining heterogeneous simulation models across disciplines to create composite simulation models of complex "systems of systems". Although Splash provides a powerful capability for collaborative, deep predictive analytics, combining multiple models compounds the volume of generated data, and understanding the behavior of a composite model can become extremely challenging. To fully exploit the potential of simulation models as tools for understanding complex systems, it is crucial that the generation of simulated data be carefully controlled and efficient.
The Splash Experiment Manager (Splash-EM) allows efficient exploration of composite simulation models. In Splash, model contributors provide metadata about component models in the Splash Actor Description Language (SADL). We extend SADL to let users specify information pertinent to experiment management, such as a recommended set of model parameters to potentially use as experimental factors, along with recommended factor values. Splash-EM assembles this metadata for the component models in a unified interface where a user can easily specify efficient numerical experiments using sophisticated statistical experimental designs. Splash-EM builds on this functionality to support sensitivity analysis and simulation-based optimization, identifying the most important system parameters as well as their optimal settings. The use of statistical design methodology can reduce data-generation requirements for these tasks by orders of magnitude. Splash-EM also provides a metamodeling capability, using stochastic kriging techniques to fit a response surface. The resulting metamodel can be used to explore model behavior interactively, without the need to generate new output data on the spot. Taken together, these new capabilities allow insight in the face of massive simulation data.
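As a rough illustration of why a statistical design can cut data-generation requirements, the sketch below builds a Latin hypercube design, one common space-filling design, and compares its run count with a full factorial grid. This is not Splash-EM's implementation, and the abstract does not say which designs Splash-EM uses; the factor names, ranges, and function names are hypothetical.

```python
import random

def latin_hypercube(factors, n_runs, seed=0):
    """Build a Latin hypercube design.

    factors: dict mapping a factor name to its (low, high) range.
    Returns a list of n_runs dicts, one parameter setting per simulation run.
    """
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in factors.items():
        width = (high - low) / n_runs
        # Draw one point from each of n_runs equal-width strata...
        points = [low + (i + rng.random()) * width for i in range(n_runs)]
        # ...then shuffle so strata are paired at random across factors.
        rng.shuffle(points)
        columns[name] = points
    return [{name: columns[name][i] for name in factors} for i in range(n_runs)]

def full_factorial_size(factors, levels):
    """Number of runs needed by a full grid with `levels` values per factor."""
    return levels ** len(factors)

# Hypothetical experimental factors for a composite model.
factors = {
    "infection_rate": (0.1, 0.5),
    "clinic_count": (5.0, 50.0),
    "budget": (1.0, 10.0),
}
design = latin_hypercube(factors, n_runs=10)
grid_runs = full_factorial_size(factors, levels=10)
print(len(design), "runs in the Latin hypercube versus", grid_runs, "in the full grid")
```

Each factor's range is still covered at ten distinct levels, but with ten runs rather than a thousand; the gap widens as the number of factors grows.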
MapReduce on Massively Multi-Core Systems (Tuesday)
Piramanayagam Arumuga Nainar, Debabrata Sarkar / Oracle
Big data analytics is commonly performed on Apache Hadoop deployments that are usually scaled out horizontally across large numbers of small machines. By contrast, enterprises usually deploy traditional database applications on SMP servers with large numbers of cores and large main memory (typically greater than one terabyte). SMP servers are rarely considered for Hadoop's big data applications.
Yet a recent study of a production Hadoop cluster found that the working dataset for more than 90% of the jobs in that cluster is less than one terabyte. Prefetching or fitting all the data in an SMP server's main memory can provide huge performance benefits by reducing I/O and communication latencies. Iterative MapReduce applications (e.g., the machine learning algorithms in Apache Mahout) and real-time (non-batch) Hadoop jobs can also benefit from SMP servers if their working datasets fit in main memory.
In this session, we will share our experience of running Hadoop and Mahout on Oracle's SPARC servers. Some of these servers can have as many as a thousand CPUs and more than a terabyte of main memory. We present performance results and discuss the challenges and solutions for configuring Hadoop on such a system. We also present best practices and tips for tuning Hadoop in this atypical configuration.
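To make the point about iterative jobs concrete, here is a minimal sketch of one-dimensional k-means written as repeated map and reduce phases in plain Python. It does not use the Hadoop or Mahout APIs; the function names and data are hypothetical. The thing to notice is that every iteration rescans the full dataset, which is why keeping the working set resident in memory matters for this class of job.

```python
from collections import defaultdict

def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every input point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        yield idx, p

def reduce_phase(pairs, centroids):
    """Reduce: average the points assigned to each centroid."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    # A centroid with no assigned points keeps its previous position.
    return [
        sum(groups[i]) / len(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]

def kmeans_1d(points, centroids, iterations=10):
    """Run a fixed number of map/reduce rounds over the same dataset."""
    for _ in range(iterations):
        # Each round reads every point again; in memory this is a cheap scan,
        # whereas a disk-backed job would pay the I/O cost on every round.
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids

points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
result = kmeans_1d(points, [0.0, 5.0])
print(result)
```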
AwardsMeter: Real-Time Social Analytics for the Academy Awards with HP Vertica (Wednesday)
Malu Castellanos, Manolo Garcia-Solaco, Riddhiman Ghosh, Peter Grimsdale, Meichun Hsu and Kieran McCorry / Hewlett-Packard
With the growing popularity of social sites such as Twitter, Facebook and blogs, there is a significant and increasing need to collect and analyze huge amounts of unstructured data from social media channels and to present the resulting insights in a form that is simple for individuals to consume. AwardsMeter is an example of how this can be achieved in the communications, media, and entertainment space: it collects large amounts of social media data, processes it in real time, extracts sentiment and properties, and provides both an aggregate view of the extracted information and a convenient 'second screen' to complement existing media channels. This poster describes the AwardsMeter platform, which exploits HP's Vertica database and cloud infrastructure for real-time analysis of structured and unstructured big data in social media, and our experience of deploying it for a live trial during the 2013 Oscars event.