For the first time since AMD's ill-fated launch of the Bulldozer architecture (the successor to their K10-based Opterons), the answer to the question 'Which CPU will be in my next HPC system?' doesn't have to be 'Whichever variety of Intel Xeon E5 they are selling when you procure'.
In fact, it's not just in the x86 market where there is now a genuine choice. Soon we will have at least two credible ARM v8 ISA CPUs (from Cavium and Qualcomm), and IBM have gone all in on the Power architecture (having at one point in the last ten years had four competing HPC CPU lines: x86, Blue Gene, Power and Cell). Indeed, it may even be Intel that is left wondering which horse to back in the HPC CPU race, with both of its Xeon lines (Xeon and Xeon Phi) looking insufficiently differentiated going forward.
For the first time in at least five years, comparative benchmarking, conducted as part of your pre-tender process, is looking to be an absolutely essential step in delivering best value, rather than something that merely provides a little more confidence that the vendors have tuned their MPI implementations.
At Red Oak we routinely track the evolution of processor technology from all the vendors, and we have the relationships and the capability to ensure that early pre-tender benchmarking can be carried out. Contact us to ensure that you aren't left ruing a decision to restrict your procurement choice to an x86 platform.
Machine learning is one of the buzzwords of the last few years but it is hardly a new phenomenon and can be traced back to the early 1980s. There are in fact a number of related terms, including artificial intelligence (AI) and deep learning, which are often erroneously used interchangeably by the unfamiliar. The term AI was coined in the mid-1950s and was intended to mean generalised human intelligence being exhibited by machines. Machine learning is the practice of using algorithms to parse raw data, generate a set of rules from the analysis and then make a determination or a prediction about new inputs based on the knowledge that has been ‘learnt’. This is usually against a narrow or well-defined set of tasks. Deep learning is really just a set of techniques for implementing more nuanced and sophisticated machine learning methods, primarily based around layered artificial neural networks (graphs). What is clear is that a machine learning revolution is underway and that soon very few areas of technical endeavour will not have some level of AI associated with them.
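To make the learn-then-predict loop described above concrete, here is a deliberately tiny sketch (my own illustration, not from any real ML library): it fits a one-dimensional threshold rule from labelled examples, then uses that learned rule to classify unseen inputs.

```python
# Minimal illustration of 'learning': derive a decision rule from
# labelled data, then apply it to new inputs. The threshold rule and
# the toy data are purely illustrative assumptions.
def learn_threshold(samples):
    """Learn a 1-D decision threshold: the midpoint between class means."""
    zeros = [x for x, label in samples if label == 0]
    ones = [x for x, label in samples if label == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2

def predict(threshold, x):
    """Classify a new input using the learned rule."""
    return 1 if x > threshold else 0

# 'Training': parse the raw data and generate the rule...
training = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
t = learn_threshold(training)            # midpoint of 1.5 and 8.5 -> 5.0

# ...then make predictions about inputs the system has never seen.
labels = [predict(t, x) for x in (0.5, 7.2)]   # -> [0, 1]
```

Real systems replace the threshold with millions of learned parameters, but the shape of the process (fit on data, predict on new inputs, within a narrow task) is the same.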
NVIDIA as a company has very much been at the forefront of this new wave of machine learning, which has been driven by wide-scale adoption by the ‘Super Seven’ (Google, Amazon, Microsoft, Facebook, Alibaba, Baidu and Tencent). The current techniques used to train machine learning systems are demanding and to date rely heavily on the parallel processing power of GPUs (principally NVIDIA P100s). While inferencing is less amenable to a GPU architecture, NVIDIA has directly addressed this with ‘Volta’, their next-generation GPU (which will ship in Q3 this year in DGX-1 boxes).
By any metric, Volta is a beast of a device, featuring 21.1B transistors and a die area of 815mm^2 on a 12nm TSMC process node. Performance has been significantly boosted across the board (by between 40 and 60% for many typical HPC benchmarks), but the biggest departure is probably the addition of the 672 TensorCores, which are specifically geared for machine learning workloads (tensor products, essentially 4x4 matrix multiplies) and can deliver a theoretical peak of 120TF of mixed-precision performance. Understandably, a lot of detail wasn’t given at GTC; when it emerges, I imagine it will help us better understand the strengths and weaknesses of the approach. According to NVIDIA, Volta is good for a 12x increase in training performance and more than 6x for inferencing.
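The mixed-precision operation the TensorCores perform can be sketched numerically. This is my own NumPy illustration of the pattern (FP16 inputs, FP32 accumulation), not NVIDIA's actual hardware interface:

```python
import numpy as np

# Sketch of the tensor-core style primitive: D = A @ B + C, where A and
# B are FP16 4x4 matrices and the multiply-accumulate is carried out at
# FP32 precision. The function name and structure are illustrative.
def tensor_core_mma(a_fp16, b_fp16, c_fp32):
    # Promote the FP16 operands to FP32 before multiplying, mirroring
    # the FP32 accumulation path that preserves accuracy in training.
    return a_fp16.astype(np.float32) @ b_fp16.astype(np.float32) + c_fp32

a = np.ones((4, 4), dtype=np.float16)
b = np.ones((4, 4), dtype=np.float16)
c = np.zeros((4, 4), dtype=np.float32)
d = tensor_core_mma(a, b, c)   # each element is a sum of four products
```

The appeal for deep learning is that the bulk of the arithmetic runs at half precision (cheap, fast) while the running sums stay at single precision, which is usually enough to keep training numerically stable.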
The addition of what amounts to machine-learning-specific acceleration to their mainstream GPU architecture speaks to both how computationally demanding the space is and how much of a market there is to serve. In addition, the decision to open-source NVIDIA’s deep learning accelerator (DLA) IP, which will have its first outing on the Xavier SoC targeted at the autonomous vehicle market, can be seen as a move to address Google’s TPU strategy and build a deep learning ecosystem around NVIDIA IP.
NVIDIA have also looked to serve the machine learning market with the DGX-1 (what they are calling an AI supercomputer) and the NVIDIA deep learning stack. Both are really an attempt to make picking up and using machine learning more of a turnkey or appliance proposition which, by bundling an ecosystem and support, enables profit margins to remain healthy.
What’s clear is that machine learning is going to be a battleground for all the silicon heavyweights and that the architecture and marketing types are going to be busy trying to capture market share in what is sure to be one of the major growth areas for the foreseeable future.
#GTC17 #ai #machinelearning #deeplearning #volta #xavier #dla
Throughout the period since the announcement of Omni-Path, one eyebrow-raising trend to be noted has been the prevalence of commentary from Mellanox which tries to paint Omni-Path in a bad light, rather than focusing on the benefits of their own technology. Particularly disquieting was this benchmarking study - not because of the results, but because Mellanox committed the cardinal sin of presenting data (to a largely scientific audience) with dodgy graph axes ("performance rating"?) and nothing in the way of reproducible parameters. Responses by Intel in the media were conspicuous by their absence.
Fortunately for Intel, they had a platform from which to set the record straight at their pre-SC16 HPC Developer Conference. Intel OPA bosses went to some length to explain how the onload/offload comparison was a red herring (due to the overhead of offloading small messages), and how their interconnect delivers much greater performance per dollar than the competition. Regardless of what you think of the arguments either way, Intel should win points for a more respectable marketing strategy, and a more scientific approach to presenting benchmark results.
It isn't all sunshine and rainbows for Omni-Path, though. From the announcement of HDR InfiniBand at the start of SC16 and the lack of public news about Omni-Path 2, we can deduce that come mid-2017, the 200Gb/s segment of the interconnect market will belong to Mellanox for at least a few months. To make matters worse for Intel, the Mellanox HDR100 solution will negate the Omni-Path switch radix advantage and corresponding cost effectiveness argument. More damning, though, is the slow uptake of Omni-Path by major storage vendors. Poor throughput performance is rumoured, but not publicly confirmed. Unfortunately, the mud-slinging is unlikely to be finished just yet.
HPC software stacks is a topic I covered in a recent technology intelligence briefing. Having been impressed by OpenHPC in my first (successful!) attempt at system configuration, I wanted to hear about the release of version 1.2 and the healthy growth in the number of supported packages - including Omni-Path support - which was bizarrely lacking from the previous release.
Details were also presented for the Intel-supported version of the OpenHPC software stack, HPC Orchestrator. Having commercial efforts feed into the open source project (and vice versa) will help to protect OpenHPC from abandonment, and the promise of expanded functionality such as resiliency and the development of a login node image should broaden appeal to more complicated, production-critical systems. OpenHPC and HPC Orchestrator are good for customers because they provide a sane, default fall-back option in case vendor-specific software proves to be too bloated or expensive for the desired use case. They are also good for vendors, providing a stable platform on which to build their value-add.
But my view did not seem to be shared within the audience at the developer conference session. Some customers, who are perhaps weary of the Intel monopoly on their market, were not particularly enthused at the idea of paying yet more money for an HPC Orchestrator support contract. Without detailed pricing information for the various levels of HPC Orchestrator support, it is hard to judge what the impact will be for any given procurement, but I don't believe this is something for service managers to fear. If their funding model precludes a support contract, OpenHPC provides a perfectly capable base software stack at no cost and can easily be integrated with their current tools by installing only the desired subset of packages. With improvements from the commercial stack feeding back into the open source project, there are no obvious downsides for the community. What I see is HPC Orchestrator being a safety net for small admin teams and a hedge against software compatibility issues for large HPC centres - an insurance policy they ought to be willing to pay for. After all, if you have an application which is important enough to warrant a share of your HPC, someone should probably care enough to keep it running!
Softbank are a giant technology and infrastructure magpie who, along with a large portfolio, have also accumulated staggering amounts of debt in recent years, approaching $120B even before the ARM bid, which will cost them another $32.4B. Is Softbank a good home for a company on which so many pin their hopes of challenging the Intel hegemony?
The general consensus is that Softbank are opportunistically adding to their portfolio with some vague longer-term view to leveraging ARM to position Softbank as a major player in the Asia-Pacific Internet of Things (IOT) market. From that perspective, adding ARM to their collection of companies looks like a strategically interesting choice. However, any well-reasoned rationale for the purchase has so far not been forthcoming from Softbank. In most respects they would be well advised to leave well enough alone at ARM, at least as far as senior management and technical direction are concerned. They have promised significant investment and growth, along with leaving the corporate HQ in Cambridge, but only time will tell how many of these promises will materialise.
If they are sensible they will tread lightly, providing steady investment capital (because growing a technology company is often gated more by the longer-term ability to attract and retain engineering talent) and gently ensuring that the low-power, IOT-tuned devices ARM are bound to be developing have a very firm focus and are aligned with how Softbank Group and their many subsidiaries want to do things. Despite the press noise, it was never likely that a counterbid from any other large vertically integrated corporation (think Samsung, Google or Apple) would have passed the competition scrutiny that would have followed.
The purchase comes on the back of ARM's most successful year ever, with a number of developments in the HPC and Enterprise sectors (ISA extensions etc) and several new headline architectural licensees (including Fujitsu) as well as a steady stream of vendor roadmap announcements in the HPC sector (think Qualcomm, Broadcom, Cavium and Applied Micro). ARM technology in the datacentre is, at last, reaching the critical mass required to compete with x86 (and to an extent IBM's POWER). It won't be competing on absolute performance, not yet at least, but you can guarantee that at a price point there will start to be some deals done and that will give the vendors time to run Intel a little closer.
Against a background of acquisition and consolidation in the technology sector, it is also clear that the HPC battleground continues to be China. China overtook the US as having the largest installed base of HPC this June (though one can argue how valid that result is), and with deep pockets and a national investment programme that will generate results, ARM is definitely in the mix.
While the home-grown architecture powering the Sunway TaihuLight, current number one on the June 2016 Top500, is a significant first, it is also clear that an ARM based HPC class CPU will be developed in China. Just such a beast was unveiled by a Chinese company (Phytium Technology) at last year’s Hot Chips, and at this year’s conference Phytium showed functioning silicon (the Mars derivative of their Xiaomi architecture).
The Post-K announcements from Fujitsu and Riken are also important and will ensure that the rivalry between China and Japan will continue to be played out in the Top500. In the US, there are at least two and arguably four credible ARM based HPC focussed SoCs currently on the table, so it will be interesting to see where they enter the market. Best guess is that we will probably have to wait till 2019 before we see a Top10 entry using an ARM architecture.
HPC is hotting up again and the next six or seven years are bound to be interesting times!
You will be horrified by the answers.
As an industry we have become so used to projects being late that we no longer notice. We’ve even built slippages and overruns into our project methodologies: milestone and cost tolerances. Argh!
This is nothing short of a scandal and as professionals we should be ashamed of ourselves.
Let’s look at the scale of the scandal. A few minutes with a spreadsheet will reveal that each week of delay for a typical £3m HPC purchase is equivalent to £50,000. These costs are invisible to the project - staff costs, lost business benefit, electricity, data centre opportunity costs, cost of money, etc. - but to the business overall they are very real. And this ignores the wider industry costs. Delay in customer acceptance (and therefore revenue recognition) is so endemic that suppliers build a risk premium into all of their prices. (The cost to a supplier of each week’s delay to acceptance of a £3m HPC system is about £15,000. It’s not the supplier paying this; it’s the customers.)
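The spreadsheet exercise alluded to above can be sketched in a few lines. The £3m purchase and the ~£50,000/week total are the figures quoted here; the four-year useful life and the resulting cost split are my illustrative assumptions:

```python
# Back-of-the-envelope sketch of the cost of each week of delay.
# The four-year lifetime is an assumption for illustration.
SYSTEM_COST = 3_000_000        # GBP, typical HPC purchase
LIFETIME_WEEKS = 4 * 52        # assumed four-year useful life

# Amortised capital value of the machine per week of delay...
capital_per_week = SYSTEM_COST / LIFETIME_WEEKS    # ~ GBP 14,400

# ...with the balance of the ~GBP 50,000/week coming from the
# 'invisible' costs: staff, lost business benefit, electricity,
# data-centre opportunity cost and cost of money.
invisible_per_week = 50_000 - capital_per_week     # ~ GBP 35,600
```

The point of the split is that the visible part (the machine sitting idle) is the smaller fraction; most of the weekly loss never appears on the project's books.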
This is our industry’s Silent Spring moment.
We have the tools in our hands, but we are either too lazy or too arrogant to use them. Strong governance is hard and it is difficult to admit that we might not have the required level of expertise or experience in house.
This is not about delivering a convincing case for Red Oak engagement (though it is that in part!) but a more general call to arms across the industry.
Although the US is not at risk of losing its lead in the HPC race, 1 China is making a highly credible challenge for the top spot. Ironically, the US’s recent decision to prevent Intel from selling key components to the Chinese laboratories will probably only increase Chinese resolve and accelerate the development of indigenous technologies. 2
The recent announcement that NUDT will use a home-grown DSP chip (instead of the planned Xeon Phi) to deliver the world’s first 100 PF system caused many in the community to raise their eyebrows and reflect that China’s response to the US embargo was much quicker than anyone probably imagined.
There was therefore an almost tangible sigh of relief from the beleaguered US HPC community upon the release of President Obama’s Executive Order instantiating the National Strategic Computing Initiative. This initiative fully embraces “to outcompete we must outcompute” and sets ambitious targets. It is very sensible.
However, it comes with no money.
And that is the rub. HPC more than any other industry is dominated by levels of investment; to call the HPC tune you need to pay the piper – handsomely!
So what will happen to the NSCI? My guess is that it will develop some very interesting and worthwhile strategy documents over the next couple of years and then be quietly forgotten when the new President takes office. Beyond that, its impact will be nil. Unfortunately.
1 The level of HPC investment across the whole US economy (i.e. not just by the national laboratories) is too great for that lead to be lost.
2 The US Department of Justice’s track record in successfully protecting the US’s lead in key technological areas is patchy (at best). Own goals that spring to mind include the Paragon in the 1990s, DRAM in the 1980s and satellite technology in the 2000s.
In this environment it is common for customers to self-insure their hardware and software support, particularly at the end of the typical three-year initial support contract. This is an attractive proposition: after all, the sum of all the hardware failures and software support calls rarely adds up to the total price of the support paid to the supplier. 1 And the attractiveness increases when the follow-on quotation for ongoing support arrives from the supplier! 2
The first dragon is obvious, the Bathtub dragon. Hardware becomes increasingly unreliable the older it is. It is, though, relatively easy to defeat this dragon and to build this into an internal business case. About 50% of business cases do this.
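The Bathtub dragon can be made concrete with a standard reliability-engineering sketch (my illustration; the parameters are invented, not drawn from any vendor data): summing a decreasing Weibull hazard (infant mortality) and an increasing one (wear-out) yields the classic bathtub-shaped failure rate that an internal business case needs to model.

```python
# Illustrative bathtub failure-rate model. Shape/scale values are
# arbitrary assumptions chosen to show the shape of the curve.
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (k/s) * (t/s)^(k-1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def bathtub_hazard(t):
    infant = weibull_hazard(t, shape=0.5, scale=2.0)   # early-life failures fall off
    wearout = weibull_hazard(t, shape=5.0, scale=6.0)  # ageing failures climb
    return infant + wearout

# Failure rate in years 0.1, 3 and 7: high early, dipping mid-life,
# then rising steeply as the hardware ages out of its useful life.
rates = [bathtub_hazard(y) for y in (0.1, 3.0, 7.0)]
```

The business-case implication is the right-hand wall of the bathtub: self-insurance that looks cheap against year-two failure rates can be badly mispriced against year-five ones.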
The second dragon is Bathtub’s older brother, Obsolescence. Most hardware is obsolete within six months of being sold, after which it is no longer possible to buy it. This is harder to factor into the internal support strategy, but it boils down to three choices: pre-buy a stock of spares (the so-called 'lifetime buy'), engineer the system so that it is capable of accepting different-but-similar parts, or over-buy the original solution combined with a design which allows the system to degrade gracefully when parts fail. This requires a level of sophistication within the internal HPC group that implies a reasonably sized dedicated HPC team. This is rare: 10-20% of groups have this.
The third dragon, Head Count, is an artefact of the financial realities facing businesses (and government): front-line, value-generating staff are more valuable than back-office support staff. In the battle to secure headcount it is easier to justify a new front-line member of staff (with a typical income-to-cost ratio in excess of five and normally nearer to ten) than it is to justify an additional support person (with a typical saving-to-cost ratio of between one and two). This is why the fraction of a full-time equivalent (FTE) built into the savings business case rarely materialises! Fewer than 10% of businesses are able to defeat this dragon.
The fourth dragon is Actuarial Reality. It is economically naïve for a customer to think that it is better placed to assess the actuarial risk of hardware and software failure than the supplier that sold the original equipment and has a large installed base of systems upon which to build its cost and risk models. For sure, support is (nearly) always a significant profit earner for suppliers. However, by and large, if the quoted support cost is high it is because it reflects the risk as measured by the supplier. (A simple test of this is how hard the supplier tries to win the support business. Often the customer might be able to secure a 5-10% reduction, but after that the supplier will simply walk away.) It is not possible to defeat this dragon, only to ignore it!
The last dragon is the most fearsome of all, the P45 dragon. A failure to secure the IT infrastructure of an organisation will cost the IT Director his or her job. In the same way that only bad things follow from a failure to regularly patch systems with security updates, so running operationally important applications on unsupported hardware presents only downside for senior IT staff. The question typically asked by the Board is: “If an application is important, then surely it’s important enough to run on supported hardware?” This conversation always ends badly for the IT Director.
“Ah,” but I hear, “we know all of this and we have agreed with our (internal) customers that our support is on a best-endeavours basis only.” 3,4 This is very dangerous territory for a professional IT organisation. It inevitably leads to a mindset that is dominated by capital expense, implicitly assuming that operating costs are zero. (If you have a data centre which houses equipment older than five years, you are probably a victim of this mindset.) It is common in these organisations for the support function to be undervalued, and for support to be a reactive, do-it-in-the-margins activity rather than a proactive one that maximises the usage experience of the estate.
At the end of the day, we undertake support (and buy in external hardware and software support) to maximise the benefit to our users of our systems, as measured by predictability of availability, downtime and number of lost jobs. That’s not to say that there is no place for self-insurance of hardware and software support. However, it is remarkable that those organisations that are the most mature and capable of undertaking these activities are often those that most aggressively decline to do so, often married with an aggressive decommissioning and refresh timetable.
1 These sums often ignore the cost of the engineer call out time as well as any margin for the supplier. In any case, support is in essence an insurance policy.
2 The typical argument runs along the lines of: “Over the last year we’ve had X hardware failures. The cost of these parts is about Y. Last year the support bill was Z. Thus we can save money if we support the hardware ourselves, though it will require an additional fraction of an FTE to undertake the repairs.”
3 Most people actually mean “reasonable endeavours” in the legal sense.
4 Wait until the production HPC is unavailable for two weeks ahead of an important deliverable due to a catastrophic failure of, for example, the disk controllers, before being too confident that the Board really did accept this risk. The fifth dragon applies here!
There has been much recent speculation about the veracity of D-Wave System Inc.’s claims to have invented a quantum computer. See here, here, here, here, here, here and here for just a small selection. There has also been much discussion about quantum entanglement, types of quantum computers, and the ability to run Shor’s algorithm. However, this misses the point.
It is clear that the D-Wave machine is different from traditional CPU-based (or even analogue) computers and it might well display the characteristics of a traditional (sic) quantum computer. But why is the industry so fixated by what is inside the box? For sure, in our desire to better understand the nature of the world and thereby develop better things to help humanity, we will need, in time, to understand what is going on inside the D-Wave devices. But at this stage, we need to focus on the real question: For what class of problems is the D-Wave effective, and can we predict in advance which specific problems it will be better at?
At this stage, I don’t care if the D-Wave devices are quantum computers or even if they are populated by very smart, sub-atomic kittens. (Though, if I had a choice, I think I would prefer the latter!)
The crux of the matter is in the last part of the problem statement: predict. Until that part is answered, D-Wave is likely to remain nothing more than an interesting footnote to the history of computing.
I confess it’s a time of year that I enjoy immensely, despite being frequently reminded by my friends in the industry - many of whom I have known for some twenty-odd years - that we are getting older… I’d like to say wiser, but maybe I should stick with older…
That notwithstanding, it’s always interesting to see what is going on, what’s coming back around again and to hear about the imminent demise of Moore’s law – plus ça change…. I can but hope that the weather is not as hot as last year and that I can savour new beers. On a more serious note, we are starting to see the promise of heterogeneous systems and the physical realisation of 64-bit cores. Many of those who know me will be aware of my positive stance on heterogeneity and utilising the right core for the right task, and my constant harping on about the state of HPC software…
Speaking of HPC software, I hope you all saw the recent NAG and Red Oak press release http://www.hpcwire.com/off-the-wire/nag-red-oak-announce-strategic-partnership/ and will therefore be flocking to the Red Oak/NAG (Booth 850) stand at ISC ‘14. We will be there dishing out sage advice shaped by a healthy dose of reality as well as, it is rumoured, free pens.
Have a great ISC and I hope to see you there but please take a minute to remember Hans Meuer, without whom there would be no ISC.