Cerebras outperforms GPUs and breaks the record for largest AI models trained on a single device

Cerebras, the company behind the world's largest accelerator chip in existence, the CS-2 Wafer Scale Engine, has just announced a milestone: the training of the world's largest NLP (Natural Language Processing) AI model on a single device. While that on its own could mean many things (it wouldn't be much of a record to break if the previous largest model had been trained on a smartwatch, for instance), the Cerebras-trained AI model reached an astounding and unprecedented 20 billion parameters, all without the workload having to scale across multiple accelerators. That's enough to fit the latest internet sensation, OpenAI's image-from-text generator DALL-E, at 12 billion parameters.

The most crucial part of Cerebras' achievement is the reduction in software and infrastructure complexity it requires. Granted, a single CS-2 system is akin to a supercomputer on its own. The Wafer Scale Engine-2 – which, as the name suggests, is etched onto a single 7nm wafer, normally enough for hundreds of conventional chips – features a staggering 2.6 trillion 7nm transistors, 850,000 cores, and 40GB of onboard cache in a package that consumes around 15kW.
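For a rough sense of how those figures relate, here is a back-of-envelope sketch (not from Cerebras, and ignoring optimizer state, gradients, activations, and whatever memory-streaming tricks Cerebras actually uses): a 20-billion-parameter model stored in 16-bit precision needs roughly the 40GB of on-chip memory quoted above.

```python
# Back-of-envelope only: assumes plain fp16/bf16 weights and nothing else.
# It says nothing about how Cerebras actually lays the model out on the wafer.
params = 20e9            # 20 billion parameters
bytes_per_param = 2      # 16-bit storage per weight

weight_bytes = params * bytes_per_param
print(f"Weights alone: {weight_bytes / 1e9:.0f} GB")   # -> 40 GB
```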

Cerebras Wafer Scale Engine

Wafer Scale Engine-2 from Cerebras in all its wafer-sized glory. (Image credit: Cerebras)

Keeping up to 20-billion-parameter NLP models on a single chip significantly reduces the overhead of training across thousands of GPUs (and the associated hardware and scaling requirements), while eliminating the technical difficulty of partitioning models across them. Cerebras says this is "one of the most painful aspects of the NLP workload", often "taking months to complete".

It's a bespoke problem, unique not only to each neural network being processed, but also to the specs of each GPU and the network that ties it all together – elements that must be worked out ahead of time, before the first training run begins. And it can't be ported between systems.
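To illustrate the kind of hand-tuning involved (a generic, hypothetical PyTorch sketch, not Cerebras' workflow or any particular lab's code), manually partitioning even a toy model across two GPUs means hard-coding which layers live on which device and shuttling activations between them – choices that depend on the model's shape and the machines it runs on:

```python
import torch
import torch.nn as nn

# Hypothetical, minimal model-parallel split across two GPUs.
# Which layers go where must be decided per model and per cluster --
# exactly the kind of manual partitioning a single large device avoids.
class SplitModel(nn.Module):
    def __init__(self):
        super().__init__()
        # First half of the network lives on GPU 0 ...
        self.part1 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:0")
        # ... second half lives on GPU 1.
        self.part2 = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU()).to("cuda:1")

    def forward(self, x):
        x = self.part1(x.to("cuda:0"))
        x = x.to("cuda:1")          # activations must hop between devices
        return self.part2(x)

model = SplitModel()
out = model(torch.randn(8, 4096))   # runs only on a machine with two GPUs
```

On a device large enough to hold the whole model, none of that placement logic exists – which is the simplification Cerebras is claiming.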

CS-2 brain

The Cerebras CS-2 is a self-contained supercomputing cluster that includes not only the Wafer Scale Engine-2, but also all associated power, memory, and storage subsystems. (Image credit: Cerebras)

Raw numbers may make Cerebras' achievement look underwhelming: OpenAI's GPT-3, an NLP model that can write entire articles that sometimes fool human readers, has a staggering 175 billion parameters. DeepMind's Gopher, released late last year, raises that figure to 280 billion. The brains at Google Brain have even announced the training of a model with more than a trillion parameters, the Switch Transformer.
