Title: Cooking the perfect reduction, or how to shrink science data while keeping its substance
Science is our best tool for addressing important societal problems, such as pandemics, climate change, and the transition to green energy, with solid and effective solutions. Discovering solutions to these problems often relies on ultra-precise scientific instruments and large-scale numerical simulations that generate extreme volumes of data at high velocity. The next generation of these scientific instruments, currently under construction, will generate more scientific data than can be stored, communicated, and analyzed. To respond to this unprecedented challenge, the community has identified scientific data reduction as a major research topic, with the goal of finding solutions that reduce data size by one to several orders of magnitude while preserving the potential for scientific discoveries. Several teams are actively searching for scientific data reduction techniques that satisfy researchers' constraints in terms of accuracy, speed, and reduction ratio. This talk will discuss the current situation and detail solutions that researchers have developed to start addressing this research challenge. In particular, we will discuss progress in lossy compression for scientific data and methods to assess the performance of scientific data reduction techniques.
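To make the three assessment criteria named above (accuracy, speed, reduction ratio) concrete, here is a minimal Python sketch of an error-bounded lossy reducer and how its output might be evaluated. The quantize-then-compress scheme is purely illustrative; real scientific compressors such as SZ use prediction models and far more sophisticated encoding.

```python
import math
import struct
import zlib

def reduce_lossy(values, abs_error_bound):
    """Toy error-bounded reduction: quantize each value to a multiple of
    2 * abs_error_bound, then losslessly compress the quantized integers.
    Guarantees every reconstructed value is within abs_error_bound of the
    original. (Illustrative only -- not SZ's actual algorithm.)"""
    step = 2.0 * abs_error_bound
    ints = [round(v / step) for v in values]
    raw = struct.pack(f"{len(ints)}i", *ints)
    return zlib.compress(raw), step

def restore(blob, step, n):
    """Decompress and de-quantize back to floating-point values."""
    ints = struct.unpack(f"{n}i", zlib.decompress(blob))
    return [i * step for i in ints]

# Assess a reduction run on smooth synthetic data (100k doubles).
data = [math.sin(0.01 * i) for i in range(100_000)]
bound = 1e-3
blob, step = reduce_lossy(data, bound)
recon = restore(blob, step, len(data))

ratio = (len(data) * 8) / len(blob)  # original size as 8-byte doubles
max_err = max(abs(a - b) for a, b in zip(data, recon))
assert max_err <= bound  # the error bound is respected pointwise
```

In practice, tools like Z-Checker report many more accuracy metrics (PSNR, spectral distortion, derived-quantity preservation) than the single maximum pointwise error shown here.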
Cappello received his Ph.D. from the University of Paris XI in 1994. He joined CNRS, the French National Center for Scientific Research, in 1995 and Inria in 2003, where he holds the position of permanent senior researcher. He created and directed the Grid’5000 project to help researchers in parallel and distributed systems run experiments at large scale in a controlled and reproducible way (https://www.grid5000.fr). Grid’5000 is still in use nearly 20 years later and has supported hundreds of researchers in producing more than 1,500 research publications. In 2009, Cappello became a visiting research professor at the University of Illinois. With Marc Snir, he created the Joint Laboratory on Extreme Scale Computing (JLESC: https://jlesc.github.io), gathering seven of the most prominent research and production centers in supercomputing: NCSA, Inria, ANL, BSC, JSC, Riken CCS, and UTK. Over his 12-year tenure as director of the JLESC, Cappello has helped hundreds of researchers and students share their research and collaborate to explore the frontiers of supercomputing. Beginning in 2008, as a member of the executive committee of the International Exascale Software Project, he led the roadmap and strategy efforts for projects related to resilience at the extreme scale. In 2016, Cappello became the director of two Exascale Computing Project (ECP: https://www.exascaleproject.org/) software projects related to resilience and lossy compression of scientific data, which will help Exascale applications run efficiently on Exascale systems.
Over his 30-year research career, Cappello has directed the research and development of several high-impact software tools, including XtremWeb, one of the first desktop grid systems; the MPICH-V fault-tolerant MPI library; the Fault Tolerance Interface (https://github.com/leobago/fti); the VeloC multilevel checkpointing environment; the SZ lossy compressor for scientific data (https://exascaleproject.org/wp-content/uploads/2019/11/VeloC_SZ.pdf); and the Z-Checker tool for assessing the errors produced by lossy compressors (https://github.com/CODARcode/Z-checker). He is an IEEE Fellow and the recipient of the 2018 IEEE TCPP Outstanding Service Award.
Friday, February 26, 2:30pm to 3:30pm. Virtual Event.