Adaptive Runtime for Resource Intensive Applications (ARRIA).

ARRIA will be put back up on GitHub at some point.


ARRIA is a parallel computing framework and runtime system designed to make parallel execution transparent to the user by applying pattern matching and machine learning techniques to common parallel programming patterns at application runtime. By using runtime performance metrics to scale and manage computational tasks and resources, ARRIA demonstrates not only more effective utilization of computational resources but also increased application scalability and performance. Through an intuitive interface and binary instrumentation, ARRIA performs isoefficiency analysis as a program runs and strategically maps I/O- and compute-intensive tasks onto resources using parallel programming patterns. ARRIA is well suited to exposing and overcoming the performance limitations, most often due to hardware and algorithmic constraints, found in high performance computing applications.
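
For reference, the textbook isoefficiency relation that such an analysis rests on (standard parallel-performance notation, not ARRIA-specific) can be sketched as:

    E = \frac{S}{p} = \frac{W}{W + T_o(W, p)},
    \qquad
    W = \frac{E}{1 - E}\, T_o(W, p) = K\, T_o(W, p)

Here W is the serial work (problem size), p the number of processors, S the speedup, T_o the total parallel overhead (communication, idling, redundant work), and E the efficiency. The isoefficiency function is the rate at which W must grow with p so that E, and hence K, stays constant; the more slowly W must grow, the more scalable the application/architecture pairing.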

Consider the composition of a classical piece of music. The music for each instrument is generally written by a single individual, with orchestration spanning many parts, movements, rhythms, metric modulations, and individual voices that must be delicately synchronized and syncopated. The human ear is quite good at discerning when these instruments, often led by a conductor, are working well together and when they are not. The ARRIA project mimics, to some degree, this symphonic development process in its measurement and classification of application performance, encoded as a collection of patterns for high performance parallel computing. An application running in a high performance computing environment must take advantage of space and time much as an aria interweaves a single voice with an orchestral accompaniment.


The following observations have provided guidance and motivation in the development of ARRIA:

Problem domain: High performance heterogeneous distributed computing environments and scientific data service (cloud) environments, as outlined in [7, 8].

Observations toward establishing the problem statement:

Observation 1.) The complexity of parallel processing will no longer be hidden in hardware by a combination of increased instruction-level parallelism (ILP) and pipelining techniques, as it was with superscalar designs. It will have to be addressed at a higher level, in software, either directly in the context of the applications or in the programming environment. As portability remains a requirement, the programming environment clearly has to change drastically [6, 7].

Observation 2.) Parallel programming will continue to be difficult, particularly for the scientific community's efforts to migrate legacy applications to new heterogeneous multicore architectures. Using an entirely new language is not an immediately feasible solution, as this would most likely require restructuring and rewriting the code. The majority of HPC applications still rely on the message passing model and are programmed by domain experts, often non-computer scientists, in Fortran/C with MPI and/or OpenMP. Through the use of link-stage libraries and binary instrumentation we can provide effective runtime analysis of instrumented code [13].
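
As a concrete, though hedged, illustration of what a link-stage library can provide without touching application source, the sketch below uses the standard MPI profiling interface (PMPI) to intercept MPI_Send and tally bytes and time per rank. The counter names and the report format are ours for illustration only, not part of ARRIA, and the prototypes follow the MPI-3 C bindings.

    /* mpi_send_profile.c: a minimal sketch of link-stage interposition via the
     * MPI profiling interface (PMPI). Linked ahead of the MPI library, it
     * intercepts MPI_Send with no change to application source.
     * Names below (total_send_bytes, etc.) are illustrative, not ARRIA APIs. */
    #include <mpi.h>
    #include <stdio.h>

    static long   total_send_calls = 0;
    static long   total_send_bytes = 0;
    static double total_send_time  = 0.0;

    /* Wrapper: every MPI_Send in the application resolves here at link time. */
    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        int size;
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, datatype, dest, tag, comm); /* real send */
        total_send_time += MPI_Wtime() - t0;

        MPI_Type_size(datatype, &size);
        total_send_bytes += (long)size * count;
        total_send_calls++;
        return rc;
    }

    /* Report per-rank totals when the application shuts MPI down. */
    int MPI_Finalize(void)
    {
        int rank;
        PMPI_Comm_rank(MPI_COMM_WORLD, &rank);
        fprintf(stderr, "rank %d: %ld sends, %ld bytes, %.3f s in MPI_Send\n",
                rank, total_send_calls, total_send_bytes, total_send_time);
        return PMPI_Finalize();
    }

Compiled into its own object or library and linked ahead of the MPI library, the wrapper sees every MPI_Send the application makes; dynamic binary instrumentation in the spirit of [13] serves the same purpose when relinking is not practical.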

Observation 3.) Data management and movement, both locally and globally, is still a performance bottleneck. Furthermore, multicore architectural designs do little to improve this and often exacerbate the problem. This is true both on node and off node [11, 12, 14].
"Data intensive computing requires new paradigms for accessing, managing and processing petabytes of data on local heterogeneous clusters" [Phuong Nguyen '09].

Observation 4.) There has been considerable effort toward establishing and implementing efficient parallel programming strategies for numerical methods, as these continue to provide the foundation of high performance scientific computing. However, these methods have been unable to maintain scalability through the architectural changes of "the multicore revolution". By focusing on efficiently scaling both the computation- and data-intensive aspects of parallel numerical methods, we can provide a solid foundation for future HPC applications.

Observation 5.) Much of the research in this domain (high performance cloud/grid computing for science) focuses on either 1.) task-level scheduling and node-level optimization at the instruction level for SPMD applications, or 2.) large-scale data distribution frameworks for post-compute analysis, pre-run staging, or longer-term storage for visualization [3]. I would argue that the vast majority of these efforts focus on optimizing a few unique properties of a particular application.


Observation 6.) Parallel programming patterns [1, 2] have been incorporated into a novel programming language [4]. Getting scientists to use a new language will be difficult, but a framework that automatically chooses and implements a parallel programming pattern at runtime has not been explored in an HPC cloud/grid environment.
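
To make the idea of runtime pattern selection concrete, here is a deliberately simplified sketch. The thresholds, metric names, and pattern set are hypothetical and not ARRIA's actual decision logic; the point is only that measured compute, communication, and I/O fractions can drive the choice among a few well-known patterns.

    /* pattern_select.c: a hypothetical sketch of choosing a parallel pattern
     * from runtime metrics. Thresholds and the pattern set are illustrative
     * only; ARRIA's real classifier is not reproduced here. */
    #include <stdio.h>

    typedef enum { PATTERN_MAP, PATTERN_PIPELINE, PATTERN_REDUCE } pattern_t;

    typedef struct {
        double compute_time;   /* seconds spent in user computation       */
        double comm_time;      /* seconds spent in communication          */
        double io_time;        /* seconds spent in file/network I/O       */
        double load_imbalance; /* max/mean task time ratio, >= 1.0        */
    } runtime_metrics_t;

    /* Pick a pattern from the most recent profiling window. */
    static pattern_t select_pattern(const runtime_metrics_t *m)
    {
        double total = m->compute_time + m->comm_time + m->io_time;
        if (total <= 0.0)
            return PATTERN_MAP;                    /* nothing measured yet     */
        if (m->io_time / total > 0.5)
            return PATTERN_PIPELINE;               /* overlap I/O with compute */
        if (m->comm_time / total > 0.4 || m->load_imbalance > 1.5)
            return PATTERN_REDUCE;                 /* aggregate, cut messages  */
        return PATTERN_MAP;                        /* embarrassingly parallel  */
    }

    int main(void)
    {
        runtime_metrics_t m = { 8.0, 1.0, 0.5, 1.1 };   /* example window */
        static const char *names[] = { "map", "pipeline", "reduce" };
        printf("selected pattern: %s\n", names[select_pattern(&m)]);
        return 0;
    }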



Observation 7.) We can effectively establish a DAG representation that is independent of problem size and supports overlapping of communication and computation, task prioritization, and architecture-aware scheduling and management of micro-tasks on distributed architectures featuring heterogeneous many-core nodes [5].
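
As a rough illustration of such a micro-task DAG, in the spirit of [5] and DAGuE [6] but with hypothetical names and no scheduler, each node below carries a priority, a count of unresolved dependencies, and its successors; a task becomes runnable when its dependency count reaches zero, independent of how large the overall problem is.

    /* task_dag.c: a minimal, hypothetical micro-task DAG node. Only the data
     * structure and the dependency-release step are shown; scheduling,
     * communication overlap, and architecture awareness are omitted. */
    #include <stdio.h>

    #define MAX_SUCCESSORS 8

    typedef struct task {
        void (*kernel)(void *args);        /* work to run on some resource    */
        void *args;                        /* kernel arguments                */
        int   priority;                    /* higher runs earlier             */
        int   unresolved_deps;             /* predecessors not yet finished   */
        int   nsucc;                       /* number of successors            */
        struct task *succ[MAX_SUCCESSORS]; /* edges of the DAG                */
    } task_t;

    /* Called when 't' finishes: decrement each successor's dependency count
     * and hand newly ready tasks to the (not shown) ready queue. */
    void task_complete(task_t *t, void (*enqueue_ready)(task_t *))
    {
        for (int i = 0; i < t->nsucc; i++) {
            task_t *s = t->succ[i];
            if (--s->unresolved_deps == 0)
                enqueue_ready(s);          /* all inputs available: runnable  */
        }
    }

    static void print_kernel(void *args) { printf("%s\n", (const char *)args); }
    static void run_now(task_t *t) { t->kernel(t->args); }  /* stand-in queue */

    int main(void)
    {
        task_t b = { print_kernel, "task B runs after A", 0, 1, 0, {0} };
        task_t a = { print_kernel, "task A", 0, 0, 1, { &b } };
        a.kernel(a.args);                  /* A has no dependencies: run it   */
        task_complete(&a, run_now);        /* releases B, which then runs     */
        return 0;
    }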



[1] Parallel Programming Patterns, http://www.cs.uiuc.edu/homes/snir/PPP/
[2] http://parlab.eecs.berkeley.edu/wiki/patterns/patterns
[3] Liu, Tieman, Kettimuthu, Foster, "A Data Transfer Framework for Large-Scale Science Experiments".
[4] Keutzer, Mattson, "A Design Pattern Language for Engineering (Parallel) Software", Dr. Dobb's Journal, May 18, 2010.
[5] Isard, Budiu, Yu, Birrell, Fetterly, "Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks", Microsoft Research, http://research.microsoft.com/en-us/projects/dryadlinq/eurosys07.pdf. Retrieved 2007-12-04.
[6] Bosilca, G., Bouteiller, A., Danalis, A., Faverge, M., Haidar, H., Herault, T., Kurzak, J., Langou, J., Lemarinier, P., Ltaief, H., Luszczek, P., YarKhan, A., Dongarra, J., "Distributed-Memory Task Execution and Dependence Tracking within DAGuE and the DPLASMA Project", Innovative Computing Laboratory Technical Report ICL-UT-10-02, 2010.
[7] Alexey L. Lastovetsky, Jack J. Dongarra, "High Performance Heterogeneous Computing", Wiley-Interscience, 2009.
[8] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia, "Above the Clouds: A Berkeley View of Cloud Computing", EECS Department, University of California, Berkeley, Technical Report No. UCB/EECS-2009-28, February 10, 2009.
[9] S. Vazhkudai, X. Ma, V. Freeh, J. Strickland, N. Tammineedi, T.A. Simon, S.L. Scott, "Constructing Collaborative Desktop Storage Caches for Large Scientific Datasets", ACM Transactions on Storage (TOS), 2006.
[10] X. Ma, S. Vazhkudai, V. Freeh, T.A. Simon, T. Yang, S.L. Scott, "Coupling Prefix Caching and Collective Downloads for Remote Data Access", in Proceedings of the 20th ACM International Conference on Supercomputing, pp. 229-238, Cairns, Australia, June 2006.
[11] Tyler A. Simon, James W. McGalliard, "Observation and analysis of the multicore performance impact on scientific applications", Concurrency and Computation: Practice and Experience 21(17): 2213-2231, 2009.
[12] T. Simon, S. Cable, M. Mahmoodi, "Application Scalability and Performance on Multicore Architectures", in Proceedings of the 2007 DoD High Performance Computing Modernization Program Users Group Conference, pp. 378-381, 2007.
[13] Jeffrey K. Hollingsworth and Barton P. Miller, "Dynamic Control of Performance Monitoring on Large Scale Parallel Systems", International Conference on Supercomputing, Tokyo, July 19-23, 1993.
[14] "Uncovering Results in the Magellan Testbed: An Interview with NERSC Director Kathy Yelick", http://www.hpcinthecloud.com/features/96947084.html (accessed 7/1/2010).