Arm doubled down on high-performance computing today with the launch of the Neoverse V1 and N2 processor architectures. The new chip designs build on the success of the company's Neoverse N1 and E1, which spawned chips like Ampere's Altra, Marvell's ThunderX2, and Amazon's Graviton2.
According to Arm, these chips are expected to deliver 40% to 50% higher performance than the previous generation Neoverse N1 while consuming the same amount of power.
With Neoverse N1 "we've had quite a bit of success, and we now have four out of the top seven hyperscalers talking publicly about their use of Arm," said Chris Bergey, SVP and GM of Arm's infrastructure line, in an interview with SDxCentral. "We've got a software ecosystem that's growing week over week, and we've been able to talk about the fact that now the No. 1 supercomputer in the world, the Fugaku system, is based on Arm."
Arm set out to deliver even greater performance gains with its next generation of Neoverse chip designs.
Peak Performance Per Thread
Arm is positioning the V1 for CPU-bound workloads where single-threaded performance is the key factor.
"The thing that we've seen with N1 is that there was interest in Arm, but it was like 'hey you're just not there from a performance point of view,'" said Bergey. "Not only have we closed the performance gap for some of these workloads, you can actually see a considerable uplift."
Building on this, Arm developed a chip that put performance first at the expense of power efficiency.
"Really the design principle here was really pushing performance. We gave up a little on power performance. ... We really wanted to get that performance per thread up," said Bergey.
The V1 also adds support for the Scalable Vector Extension (SVE), which Arm says makes it well suited to high-performance computing (HPC), cloud, and machine learning applications. SVE enables the execution of single instruction, multiple data (SIMD) integer, bfloat16, and floating-point instructions on wider vector units using a software programming model that's agnostic to the width of the unit.
"We really think it's SVE done right," he said. "It allows for full control over voltage frequency transitions. The idea is that you should be able to stay at speed and not have to make any kind of frequency switch, and it's designed to transition seamlessly between different register widths."
This is important because it allows the chip to switch between wider SVE instructions and standard instructions without disruption, Bergey explained.
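To illustrate what a width-agnostic programming model means in practice, here is a minimal Python sketch that simulates a vector-length-agnostic loop. It is not SVE code and does not reflect Arm's implementation; the function name `vla_add` and the lane counts (4, 8, and 16) are hypothetical, chosen only to show that the same loop gives the same answer whatever vector width the hardware provides.

```python
def vla_add(a, b, vector_lanes):
    """Add two equal-length lists, `vector_lanes` elements per pass.

    `vector_lanes` stands in for the hardware vector width, which the
    loop discovers at run time rather than hard-coding.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    if vector_lanes < 1:
        raise ValueError("vector_lanes must be positive")

    out = [0.0] * len(a)
    i = 0
    while i < len(a):
        # Predicate: only the lanes that still map to real elements are
        # active, so the final partial pass needs no separate tail loop.
        active = min(vector_lanes, len(a) - i)
        for lane in range(active):
            out[i + lane] = a[i + lane] + b[i + lane]
        i += vector_lanes
    return out


a = [float(x) for x in range(10)]
b = [10.0] * 10

# The same function runs unchanged for each simulated vector width.
results = {lanes: vla_add(a, b, lanes) for lanes in (4, 8, 16)}
```

The point of the sketch is that nothing in the loop body depends on a fixed width: a wider unit simply finishes in fewer passes.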
A Second-Gen Neoverse
While the V1 is designed for maximum single-threaded CPU performance, the N2, the follow-up to the company's original data-center architecture, was built specifically for scale-out.
Now that Arm is competitive on performance with more traditional silicon from Intel, AMD, or IBM Power, Bergey said the architecture's power efficiency is becoming a key differentiator. Rather than engineering for a lower thermal design power (TDP), Arm set out to increase the performance under the same power envelope. The result was a 40% increase in performance at a given wattage.
"At any fixed TDP, that's a considerable uplift," Bergey said.
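As a rough worked example of what a fixed-TDP uplift implies, the short Python sketch below compares performance per watt before and after a 40% gain. The 100 W TDP and the normalized N1 score of 1.0 are hypothetical values picked for illustration, not figures from Arm.

```python
def perf_per_watt(performance, watts):
    """Return performance delivered per watt of power budget."""
    return performance / watts


tdp_watts = 100.0          # hypothetical power envelope, same for both designs
n1_perf = 1.0              # normalized baseline score (assumption)
n2_perf = n1_perf * 1.40   # the 40% uplift cited for the N2

# With the power budget held constant, the efficiency gain equals the
# performance gain.
gain = perf_per_watt(n2_perf, tdp_watts) / perf_per_watt(n1_perf, tdp_watts)
```

Because the wattage is the same on both sides, it cancels out, and the performance-per-watt ratio comes out to roughly 1.4 regardless of the TDP chosen.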
Arm envisions the Neoverse N2 for use in SmartNICs, cloud, enterprise networking, and at the edge where per-core performance is secondary to core count or energy efficiency.
The Neoverse N2 will arrive sometime in 2021.
With V1 and N2, "you can kind of pick what you think you want to optimize," Bergey said. "If you want to optimize for number of cores and a fixed TDP, then you go N2. If you go 'no, I need the max per-thread performance,' you go with V1."
Arm's Next Steps
Looking to the future, Arm plans to expand its addressable market through continued investment in technologies like Compute Express Link (CXL) and Cache Coherent Interconnect for Accelerators (CCIX) to enable ultra-low-latency fabrics and disaggregated computing platforms.
Arm will also push the development of Project Cassini, which aims to remove barriers to software development through the production of new standards, platform security, and reference implementations. The idea is to create software that "just works" on Arm.
The $40B Question
The question remains how Nvidia's planned $40 billion acquisition of Arm will affect the company's roadmap moving forward.
While Neoverse V1 and N2 will launch long before the acquisition is complete (it is slated to take approximately 18 months pending regulatory approval in the U.S., United Kingdom, European Union, and China), Nvidia's influence on future chip designs is all but certain.
Even if Nvidia leaves the architectural design to Arm, Nvidia plans to license its networking and GPU intellectual property through Arm.