PROCESS: Towards Exascale Data Processing

Supercomputing can be made simpler and more flexible: this is the experience of the EU project PROCESS, short for PROviding Computing solutions for ExaScale ChallengeS. Since autumn 2017, six research institutes, together with Lufthansa and the Spanish dissemination specialist Inmark, have been building a software portal for High Performance Computing (HPC) and developing programs for processing huge amounts of data from research and industry.

Munich’s Ludwig-Maximilians-University (LMU) coordinates the project and supports the individual partner organizations in software development. Together with the Leibniz Supercomputing Centre (LRZ) in Garching, it also provides storage space and access to raw data. “PROCESS already offers modular and generalizable open source solutions for processing big and extreme data,” explains Maximilian Höb, a PhD student at LMU’s Department of Computer Science and technical coordinator of PROCESS. “We decided on a container model from the start because it allows supercomputing applications to be executed on any HPC cluster, free of hardware and software dependencies.”

HPC software for sharing and collaboration

With PROCESS software solutions, scientists can better adapt their own applications to the requirements of supercomputers. The participating scientific HPC centres in Germany, the Netherlands, Poland and Slovakia can jointly manage computing power and computing time and distribute large data projects among themselves. The platform provides around 15 software modules. These cover data and compute management, the analysis and evaluation of large amounts of data, the virtualization of services, and the standardization or harmonization of authentication processes and computing infrastructures.

“To do this, we abstract the hardware and have developed a middleware package that builds bridges between users’ individually programmed applications and the different operating systems of the supercomputers,” explains Höb. In theory, PROCESS software could even link all of Europe’s scientific supercomputers into a single network. At the very least, it can promote cooperation between individual computing centres and make it easier to divide the workload as data volumes in research and industry continue to grow rapidly. Since PROCESS began, the team has also gained valuable experience in developing HPC software that is interchangeable and compatible across systems.

Tools for Extreme Data Processing

The individual modules of the PROCESS platform interlock like building bricks. Because they were developed in various sub-projects, they have already proven themselves in practice in larger data projects from a wide variety of scientific disciplines. For example, PROCESS offers an execution environment for machine learning and image recognition. Although it was originally programmed for the analysis and evaluation of medical image data, it can easily be adapted to new application areas: “The PROCESS infrastructure even makes it possible to develop more complex models and to use ever larger amounts of data at ever finer scales,” reports Höb.

PROCESS software stack release in preparation

Various workflow tools also improve the transfer of data from different sources. They help filter out noise and simplify the preparation of data for analysis and processing on the supercomputer. Protecting sensitive data was also a requirement in every use case whose software and tools were combined on the PROCESS platform. The PROCESS portal therefore provides technical solutions for identifying users and keeping data sets separate.

With about a year to go before the EU project ends, the team is busy fine-tuning the software and the portal: “By July 2020, we should have completed the PROCESS software stack release,” says Höb. At SC19, his focus is on presenting PROCESS to the wider HPC community and building collaborations for the project’s continued growth and success.
Software Repository: