Converge Digest
Friday, June 12, 2026


Cerebras Launches AI Inference Solution 20x Faster Than GPUs

August 27, 2024

Cerebras Systems introduced a new AI inference solution that it claims is the fastest in the world. Cerebras Inference delivers 1,800 tokens per second on the Llama 3.1 8B model and 450 tokens per second on the Llama 3.1 70B model, which the company says is 20 times faster than NVIDIA GPU-based solutions in hyperscale clouds. Pricing starts at 10 cents per million tokens, which Cerebras positions as a significant cost advantage over existing GPU options.
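To put the quoted figures in perspective, the short sketch below works through the arithmetic. The throughput, price, and 20x numbers come from the article; the function names and the sample workload sizes are hypothetical, and applying the single quoted price to both models is an assumption, since the article does not break pricing out per model.

```python
# Figures quoted in the article.
TOKENS_PER_SECOND = {"Llama 3.1 8B": 1800, "Llama 3.1 70B": 450}
PRICE_PER_MILLION_TOKENS_USD = 0.10  # assumption: same quoted price for every model
CLAIMED_SPEEDUP = 20                 # vendor claim versus GPU-based solutions


def generation_seconds(num_tokens, tokens_per_second):
    """Time to emit num_tokens at a steady output rate."""
    return num_tokens / tokens_per_second


def cost_usd(num_tokens, price_per_million=PRICE_PER_MILLION_TOKENS_USD):
    """Cost of num_tokens at a flat price per million tokens."""
    return num_tokens / 1_000_000 * price_per_million


def implied_gpu_rate(tokens_per_second, speedup=CLAIMED_SPEEDUP):
    """GPU output rate implied by taking the claimed speedup at face value."""
    return tokens_per_second / speedup


if __name__ == "__main__":
    sample_tokens = 900  # hypothetical length of one long response
    for model, rate in TOKENS_PER_SECOND.items():
        seconds = generation_seconds(sample_tokens, rate)
        baseline = implied_gpu_rate(rate)
        print(f"{model}: {seconds:.2f} s for {sample_tokens} tokens, "
              f"implied GPU baseline {baseline:.1f} tokens/s")
    print(f"Cost of 5 million tokens: ${cost_usd(5_000_000):.2f}")
```

The implied GPU baseline is only what the 20x claim yields when divided out; the article does not state which GPU configuration was measured.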

Powered by the Cerebras CS-3 system and its Wafer Scale Engine 3 (WSE-3) processor, the platform is said to maintain state-of-the-art accuracy without sacrificing speed by keeping inference in the 16-bit domain. According to Cerebras, the WSE-3 provides 7,000 times more memory bandwidth than the NVIDIA H100, addressing one of the core bottlenecks of generative AI inference. Cerebras Inference is available in three pricing tiers (Free, Developer, and Enterprise) that range from basic access to custom enterprise solutions.
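A rough way to see why memory bandwidth is the limiting factor: under a simplified model in which generating each token for a single request requires reading every 16-bit weight from memory once, the bandwidth needed is roughly the model size in bytes multiplied by the token rate. The sketch below applies that model to the quoted throughput figures. It is an illustration under stated assumptions (nominal parameter counts taken from the model names, no batching, KV-cache and activation traffic ignored), not a description of how Cerebras measures or achieves its numbers.

```python
BYTES_PER_WEIGHT = 2  # 16-bit weights, as described in the article

# Assumption: nominal parameter counts implied by the model names.
NOMINAL_PARAMS = {
    "Llama 3.1 8B": 8_000_000_000,
    "Llama 3.1 70B": 70_000_000_000,
}
QUOTED_TOKENS_PER_SECOND = {"Llama 3.1 8B": 1800, "Llama 3.1 70B": 450}


def weight_bytes(num_params):
    """Memory footprint of the weights alone at 16 bits each."""
    return num_params * BYTES_PER_WEIGHT


def required_bandwidth_bytes_per_s(num_params, tokens_per_second):
    """Bandwidth needed if every token reads all weights exactly once."""
    return weight_bytes(num_params) * tokens_per_second


if __name__ == "__main__":
    for model, params in NOMINAL_PARAMS.items():
        rate = QUOTED_TOKENS_PER_SECOND[model]
        size_gb = weight_bytes(params) / 1e9
        bandwidth_tb_s = required_bandwidth_bytes_per_s(params, rate) / 1e12
        print(f"{model}: {size_gb:.0f} GB of weights, "
              f"about {bandwidth_tb_s:.1f} TB/s at {rate} tokens/s")
```

Under these assumptions, single-stream decoding at the quoted rates needs tens of terabytes per second of weight traffic, which is why the article singles out memory bandwidth as the key hardware metric.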

• Performance: 20x faster than GPU-based solutions, delivering 1,800 tokens per second on Llama 3.1 8B and 450 tokens per second on Llama 3.1 70B.

• Pricing: Starting at 10 cents per million tokens, significantly lower than GPU alternatives.

• Technology: Powered by the WSE-3 processor with 7,000x more memory bandwidth than NVIDIA H100.

• Availability: Offered in Free, Developer, and Enterprise tiers with varying levels of access and support.

“Speed and scale change everything,” said Kim Branson, SVP of AI/ML at GlaxoSmithKline, an early Cerebras customer.

“LiveKit is excited to partner with Cerebras to help developers build the next generation of multimodal AI applications. Combining Cerebras’ best-in-class compute and SoTA models with LiveKit’s global edge network, developers can now create voice and video-based AI experiences with ultra-low latency and more human-like characteristics,” said Russell D’Sa, CEO and Co-Founder of LiveKit.

“For traditional search engines, we know that lower latencies drive higher user engagement and that instant results have changed the way people interact with search and with the internet. At Perplexity, we believe ultra-fast inference speeds like what Cerebras is demonstrating can have a similar unlock for user interaction with the future of search – intelligent answer engines,” said Denis Yarats, CTO and co-founder, Perplexity.

Tags: Cerebras, Hot Chips

Jim Carroll

Editor and Publisher, Converge! Network Digest, Optical Networks Daily - Covering the full stack of network convergence from Silicon Valley

© 2026 Converge Digest - A private dossier for networking and telecoms.
