Tens of thousands of compute cores, thousands of nodes, and intricate system architectures: Supercomputers are a challenge to program and run efficiently. Application developers, users, and system administrators need tools to meet that challenge. The Friedrich-Alexander-Universität (FAU) and the Erlangen Regional Computing Center (RRZE) develop LIKWID, an open source tool suite “for everyone who wants to take control of their software and the hardware it is running on,” as Thomas Gruber, research associate at the Department of Computer Science at the FAU and main LIKWID developer, explains.
Taking control in HPC
The name says it all – LIKWID stands for the motto “Like I knew what I’m doing.” The fifth major version of the software collection has now been released. In development since 2009, it comprises ten useful tools and is funded by the Federal Ministry of Education and Research (BMBF) in Germany. “As of version 5.0, LIKWID now also supports the ARM and IBM Power processor architectures as well as Nvidia accelerators,” says Gruber, listing the most important highlights of the toolkit, which will also be presented at booth 2063 during SC19.
Used and appreciated worldwide
Since its launch, LIKWID has built quite a reputation in the HPC community. For example, most members of the Gauss Alliance, a network of German high-performance scientific computing centers, work with the tools from Erlangen, as do the national supercomputing centers of Switzerland (CSCS) and China (NSCC) and the National Energy Research Scientific Computing Center (NERSC) in Berkeley. LIKWID is also part of the Debian software repository and therefore available in all Debian-based Linux operating systems. The HPC package manager Spack contains the toolkit as a mainline package. And finally, the SPEC Research Group lists LIKWID as one of the “peer-reviewed tools for quantitative system evaluation and analysis.”
Many universities employ the tools in lectures and courses for computer scientists, computational scientists, and other specialists: “LIKWID can reveal how hardware interacts with software; for example, the use of resources such as computational units or the memory interface of a processor can be accurately measured. Scientists and data experts can use LIKWID to check whether a hypothesis or a performance model of their application is correct,” Gruber says.
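To illustrate what checking a performance model against a measurement can look like, here is a minimal sketch in Python. It assumes a simple roofline-style model, which is one common choice and not necessarily the one LIKWID users apply. The function names, the 20 percent tolerance, and all numbers are hypothetical and do not come from LIKWID or from any real machine.

```python
def roofline_gflops(peak_gflops, mem_bw_gbs, intensity_flop_per_byte):
    """Predict attainable performance with a simple roofline model.

    The prediction is the smaller of the compute peak and the
    memory-bound limit (bandwidth times arithmetic intensity).
    """
    return min(peak_gflops, mem_bw_gbs * intensity_flop_per_byte)


def model_holds(measured_gflops, predicted_gflops, tolerance=0.2):
    """Return True if the measurement is within `tolerance` of the prediction.

    The 20 percent default is an arbitrary, illustrative threshold.
    """
    return abs(measured_gflops - predicted_gflops) <= tolerance * predicted_gflops


# Hypothetical machine and kernel characteristics (illustrative only).
predicted = roofline_gflops(
    peak_gflops=1000.0,            # assumed compute peak
    mem_bw_gbs=100.0,              # assumed memory bandwidth
    intensity_flop_per_byte=0.125, # assumed arithmetic intensity of the kernel
)

# Hypothetical measured value, e.g. derived from hardware counter data.
measured = 11.8

print(f"predicted: {predicted} GFLOP/s, model holds: {model_holds(measured, predicted)}")
```

In this sketch the kernel is predicted to be memory bound. A measurement far from the prediction would suggest that the hypothesis about the bottleneck is wrong and needs to be revisited.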
Monitoring in High Performance Computing
LIKWID is also a capable monitoring tool on HPC clusters. It helps users to identify possible shortcomings of their code that lead to sub-optimal performance, such as an unbalanced distribution of work across the CPUs. Administrators, on the other hand, can monitor their whole system as well as individual user jobs to pinpoint bottlenecks or oversights on the user side – e.g., when someone accidentally sets too small a problem size so that all processors do nothing but wait for work. In this respect, LIKWID simplifies the daunting task of finding optimization opportunities and makes supercomputing just a little bit more transparent.
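An unbalanced distribution of work, as mentioned above, can be expressed as a single number. The following minimal Python sketch assumes that per-core work counts (for example, retired instructions) have already been collected by some measurement tool. The function names, the idle threshold, and the sample data are hypothetical and are not part of LIKWID.

```python
def imbalance_factor(work_per_core):
    """Return max work divided by mean work across cores.

    1.0 means perfectly balanced; larger values mean that the
    busiest core carries more than its fair share.
    """
    if not work_per_core:
        raise ValueError("need at least one core")
    mean = sum(work_per_core) / len(work_per_core)
    if mean == 0:
        return 1.0  # no work at all: nothing to balance
    return max(work_per_core) / mean


def mostly_idle_cores(work_per_core, threshold=0.1):
    """Return indices of cores doing less than `threshold` times the busiest core.

    The 10 percent default is an arbitrary, illustrative cutoff.
    """
    busiest = max(work_per_core)
    return [i for i, w in enumerate(work_per_core) if w < threshold * busiest]


# Hypothetical per-core work counts for a four-core job.
balanced = [100, 100, 100, 100]
skewed = [400, 0, 0, 0]

print(imbalance_factor(balanced), mostly_idle_cores(balanced))
print(imbalance_factor(skewed), mostly_idle_cores(skewed))
```

A factor close to 1.0 indicates evenly distributed work. A large factor combined with several mostly idle cores is the pattern an administrator would look for when a job leaves processors waiting.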