Slovenian supercomputer Vega processes experimental ATLAS data
16 June 2022 – The ATLAS collaboration uses a global network of data centres – the Worldwide LHC Computing Grid – to perform data processing and analysis. These data centres are generally built from commodity hardware and carry out the full spectrum of ATLAS data processing, from reducing the raw data coming from the detector to a manageable size to producing the final plots for publication.
While the Grid’s distributed approach has proven very successful, the computing needs of the LHC experiments continue to grow, so the ATLAS collaboration is exploring the potential of integrating high-performance computing (HPC) centres into the Grid’s distributed environment. HPC harnesses the power of purpose-built supercomputers constructed from specialised hardware, and is widely used in other scientific disciplines.
However, HPC poses major challenges for ATLAS data processing. Access to supercomputer facilities is usually subject to more restrictions than access to Grid sites, and their CPU architectures may not be suitable for ATLAS software. Their scheduling mechanisms favour very large jobs using thousands of nodes, which is not typical of ATLAS workflows. Finally, a supercomputer may be geographically distant from ATLAS storage, which can cause networking problems.
Despite these challenges, ATLAS members have successfully exploited HPCs over the past few years, including several near the top of the Top 500 list of supercomputers. Technical barriers were overcome by isolating the core computation from the parts that require network access, such as data transfer. Software problems were solved by using container technology, which allows ATLAS software to run on any operating system, and by developing “edge services”, which allow computations to run in offline mode without contacting external services.
The latest HPC to process ATLAS data is Vega – the first of the new petascale EuroHPC JU machines, hosted at the Institute of Information Sciences (IZUM) in Maribor, Slovenia. Vega became operational in April 2021 and consists of 960 nodes, each of which contains 128 physical CPU cores, for a total of 122,880 physical or 245,760 logical cores. To put this in perspective, the total number of cores available to ATLAS from Grid resources is around 300,000.
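The core counts above follow directly from the node configuration; a quick sketch of the arithmetic (assuming, as is common, two hardware threads per physical core):

```python
# Vega core counts, derived from the figures quoted above.
nodes = 960
cores_per_node = 128          # physical CPU cores per node
threads_per_core = 2          # assumed SMT factor (two logical cores per physical core)

physical_cores = nodes * cores_per_node
logical_cores = physical_cores * threads_per_core

print(physical_cores)  # 122880
print(logical_cores)   # 245760
```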
Thanks to close ties with the ATLAS physics community in Slovenia, some of whom were heavily involved in the planning and commissioning of Vega, ATLAS was one of the first users to be granted an official allocation of time. This benefited both the ATLAS collaboration, which gained a significant additional resource, and Vega, which had a steady, well-understood workload to help during its start-up phase.
Vega has been almost constantly busy with ATLAS jobs since it was switched on; periods with fewer jobs were due either to other Vega users or to a lack of ATLAS jobs to submit. This enormous additional computing power – which effectively doubled the resources available to ATLAS – was invaluable, as it allowed several large-scale data-processing campaigns to run in parallel. As a result, the ATLAS collaboration approaches the restart of the LHC with a fully reprocessed Run 2 dataset and its associated simulations, many of which have been significantly extended in statistics thanks to the additional resources provided by Vega.
This is a testament to the robustness of ATLAS’s distributed computing systems, which could be scaled up to a single site equivalent in size to the entire Grid. While Vega will eventually be devoted to other science projects, part of it will continue to be dedicated to ATLAS. Moreover, this successful experience shows that ATLAS members (and their data) are ready to jump onto the next available HPC and make full use of its potential.
Source: CERN