Introduction
Last year, I was faced with a user issue that initially confused me. A user running open-source CFD software told me they had installed the same software on their own 8-core workstation, and it was running much faster there than on a 120-core, HPC-optimised machine in a cloud HPC cluster.
Could we have installed the software so poorly that a vanilla installation on a small workstation could do a better job?
I then asked the user exactly what they were running, and it turned out to be quite small in scale. So small, in fact, that the overhead introduced by using HPC was actually harmful.
Why users may misunderstand when to use HPC
I first started using HPC as part of my PhD project. I had a good grounding in science but no formal training in HPC.
I did eventually work out how to submit jobs for the specific software I was using, but that was mainly by asking colleagues and borrowing submission scripts, whose contents I didn’t totally understand.
It was very much a black box, and as far as I was concerned, the more cores and memory I could ask for, the faster my jobs would run.
Thankfully, I’ve never had to run something so small that a workstation would perform better. However, I’m sure that at the time I would have been just as puzzled as the user who came to me with this issue.
As an admin, it’s important to remember that many users might not have considered when HPC could be detrimental to their calculations.
In fact, even in my role as an admin, my first thought was that something was wildly wrong with the software installation, as I assumed anyone would be aware of situations where HPC is overkill (although, ironically, I’m sure I wouldn’t have been when I started).
When is HPC not a good idea?
HPC speeds up time to solution by breaking a big problem into small components and spreading the components out between cores and compute nodes. The components are then worked on in parallel.
These components usually need to communicate with one another constantly in order to perform the calculation (the more communication there is, the more ‘tightly coupled’ the calculation).
With all this communication taking place, delays are introduced by the bandwidth and latency constraints of the interconnects between cores and between nodes.
This delay is much greater for communication between nodes than between cores on the same node, because of differences in interconnect technology and physical distance.
When a small problem is broken into too many components, there comes a point where the communication overhead adds so much delay that it would be faster not to parallelise the problem at all, or at least to run it across fewer components.
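To make this concrete, here is a minimal sketch in Python of a toy strong-scaling cost model. It is my own illustration rather than anything taken from a real solver: it assumes the compute time shrinks ideally with core count, that every time step pays for one tree-style global reduction whose cost grows with log2 of the core count, and the latency and workload figures are invented purely to show the crossover.

import math

def time_to_solution(serial_work_s, cores, steps, step_latency_s=1e-3):
    # Compute part: assume perfect load balance, so it shrinks with core count.
    compute = serial_work_s / cores
    # Communication part: one global reduction per time step, costing roughly
    # log2(cores) network hops (log2(1) == 0, i.e. no communication when serial).
    communication = steps * step_latency_s * math.log2(cores)
    return compute + communication

# A deliberately small job: 60 s of serial compute spread over 10,000 time steps.
for cores in (1, 8, 32, 120):
    print(f"{cores:>3} cores: {time_to_solution(60.0, cores, 10_000):6.1f} s")

With these made-up numbers, the 8-core run is the fastest of the four and the 120-core run actually comes in slower than the serial one, which is essentially what happened in the anecdote above. Give the model more serial work per step and the sweet spot moves back towards higher core counts.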
Conclusion
From our earliest experience with laptops and desktops, we are conditioned to believe that more of everything will speed up a calculation. However, in the realm of HPC, where many computers are connected to one another, this is not always the case.
Organisations should take steps to ensure this is made clear to users at an early stage. It may even encourage users to experiment with core counts and memory for themselves and further optimise their workflows.
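One low-effort way to run that experiment is a strong-scaling sweep: run the same case at a handful of core counts and record the wall time before committing to large production runs. The sketch below is only a skeleton with a hypothetical launch command; './solver case/' is a placeholder for whatever actually starts the user's job.

import subprocess
import time

# Placeholder command line: replace './solver case/' with the real launch
# command for the application being tested.
for cores in (1, 2, 4, 8, 16, 32):
    start = time.perf_counter()
    subprocess.run(["mpirun", "-np", str(cores), "./solver", "case/"], check=True)
    print(f"{cores:>2} cores: {time.perf_counter() - start:.1f} s wall time")

Plotting wall time against core count usually makes the sweet spot, and the point of diminishing returns, obvious at a glance.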
Manveer Munde
Principal Consultant
Red Oak Consulting