dstl

//

Cerebras CEO, Andrew Feldman on Compute Power

Jan 23, 2026

All-In Podcast

39:58

94K Views

THESIS

Wafer-scale architecture overcomes the memory bandwidth constraints of traditional GPUs, unlocking millisecond-latency inference that transforms AI utility and drives exponential demand.

ASSET CLASS

SECULAR

CONVICTION

HIGH

TIME HORIZON

5-10 Years (Secular Growth Phase)

01

//

PREMISE

The Memory Wall Bottleneck

Traditional GPU architectures are constrained by the physical separation of compute and memory, the 'memory wall', which throttles how quickly model weights can be moved to the processor and adds significant latency. This bottleneck limits inference speed, forcing users to wait seconds for results and preventing seamless, real-time agentic AI workflows.
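For intuition, a rough memory-wall sketch (the model size and bandwidth below are illustrative assumptions, not figures from the episode): during autoregressive decoding, every generated token requires streaming the model's weights from off-chip memory, so memory bandwidth rather than raw compute sets the ceiling on tokens per second.

# Illustrative memory-wall arithmetic in Python; all figures are assumed.
PARAMS = 70e9            # assumed model size: 70B parameters
BYTES_PER_PARAM = 2      # assumed 16-bit weights
HBM_BANDWIDTH = 3.35e12  # assumed off-chip memory bandwidth, bytes/sec (HBM-class)

bytes_per_token = PARAMS * BYTES_PER_PARAM            # weights streamed per token
seconds_per_token = bytes_per_token / HBM_BANDWIDTH   # bandwidth-bound lower limit

print(f"~{seconds_per_token * 1000:.0f} ms per token "
      f"(~{1 / seconds_per_token:.0f} tokens/sec ceiling at batch size 1)")

Under these assumed numbers the bandwidth bound alone is roughly 40 ms per token, before any queuing or network overhead, which is why multi-second responses are the norm on conventional architectures.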

02

//

MECHANISM

Wafer-Scale Integration

Cerebras replaces the standard 'postage stamp' sized chip with a single Wafer Scale Engine (WSE) that is 56 times larger than a standard GPU and contains 4 trillion transistors. This architecture integrates massive compute and high-speed memory on a single substrate, eliminating off-chip communication delays and solving the memory bandwidth problem.
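To put the '56 times larger' figure in perspective, a hedged back-of-the-envelope comparison (the GPU die area below is an assumed reticle-limit-class value, not a number from the episode):

# Illustrative sizing of the wafer-scale claim; the GPU die area is assumed.
GPU_DIE_MM2 = 814        # assumed large-GPU die area, mm^2
WSE_SCALE = 56           # scale factor quoted in the episode
WSE_TRANSISTORS = 4e12   # transistor count quoted in the episode

wse_area_mm2 = GPU_DIE_MM2 * WSE_SCALE
print(f"Implied wafer-scale area: ~{wse_area_mm2:,.0f} mm^2")
print(f"Implied density: ~{WSE_TRANSISTORS / wse_area_mm2 / 1e6:.0f}M transistors/mm^2")

Keeping compute and memory on one wafer-sized substrate is what lets weight traffic stay on-die instead of crossing a package boundary to external memory.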

03

//

OUTCOME

Qualitative Shift in User Experience

By reducing inference times from seconds to milliseconds, the technology shifts the user experience from a transactional query-response model to an instantaneous flow. This latency reduction is the primary driver of increased usage volume, unlocking complex, multi-step agentic workflows that were previously impractical.
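A hedged illustration of why this is a change in kind for agentic work (the step count and per-call latencies are assumptions for illustration, not figures from the episode): agentic tasks chain many model calls sequentially, so per-call latency multiplies rather than averages out.

# Illustrative compounding of per-call latency across a chained agentic task.
STEPS = 25              # assumed sequential model calls in one agentic task
LATENCIES = {
    "GPU-class (assumed 2 s/call)": 2.0,
    "wafer-scale (assumed 50 ms/call)": 0.05,
}

for label, per_call in LATENCIES.items():
    total = STEPS * per_call  # calls are sequential, so latency adds up linearly
    print(f"{label}: {total:.1f} s end-to-end")

At two seconds per call the 25-step chain takes nearly a minute and breaks the user's flow; at tens of milliseconds the same chain finishes in about a second, which is the 'instantaneous flow' described above.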

//

NECESSARY CONDITION

Inference demand must continue to scale towards complex, multi-step agentic workflows rather than simple, single-shot queries.

When Netflix got fast, when the internet got fast, Netflix didn't get better at delivering DVDs. Netflix became a movie studio... It wasn't a change in degree. It was a fundamental change in kind. And what speed does for AI is the same.

46:11

RISK

Steel Man Counter-Thesis

While the demand for millisecond-latency inference is theoretically exponential, the physical reality of a 'decrepit' US power grid and an 18-month memory supply chain freeze creates a hard ceiling on deployment. The industry may face a severe 'digestion period' where infrastructure lag prevents the realization of the wafer-scale speed advantage, rendering the theoretical utility gains inaccessible in the near term.

//

RISK 01

Systemic Memory Supply Chain Bottleneck

THESIS

The speaker identifies a 'massive memory shortage' driven by panic-buying, where lead times have extended from 6 to 18 months. This 'bullwhip effect' in the supply chain creates high prices and logistical confusion, potentially stalling the production and deployment of hardware required to meet inference demand.

DEFENSE

The speaker frames this as a known cyclical market phenomenon that occurs every 6-8 years and anticipates an 18-month digestion period.

//

RISK 02

Power Infrastructure Capacity Ceiling

THESIS

The 'limiting constraint' for data centers has shifted from space to power availability. The US power grid is described as 'decrepit' due to 50 years of underinvestment, creating a physical barrier to deploying the massive gigawatt-scale clusters required for the thesis to play out.

DEFENSE

Cerebras is circumventing the grid by sourcing alternative energy, such as 'flare-off gas' from petroleum extraction and hydroelectric power, and by locating facilities in energy-rich regions like West Texas, Wyoming, and the Nordics.

//

RISK 03

Geopolitical Asymmetry in Software/Open Source

THESIS

While the US maintains a lead in chip manufacturing, the speaker admits that China has 'pushed ahead' in the open-source model category. If the dominant AI software ecosystem shifts toward Chinese open-source standards, superior US hardware could face integration or adoption friction.

DEFENSE

The speaker acknowledges the threat and the 'realpolitik' of adversaries but relies on the assumption that US chip superiority ('getting better at the game by playing it') is sufficient to maintain the lead.

//

ASYMMETRIC SKEW

Asymmetric Upside (Secular) bounded by significant Short-Term Physical Execution Risk (Grid/Supply Chain).

ALPHA

NOISE

The Consensus

The AI infrastructure build-out is defined by the shortage of discrete GPU units (e.g., H100s), with value primarily driven by Large Model Training. There is growing concern that the industry may be 'overbuilding' relative to current enterprise adoption.

Speed is a linear efficiency metric; faster chips allow users to wait less time for the same answers (Change in Degree).

SIGNAL

The Variant

The defining metric has shifted from 'Units' to 'Power' (megawatts), and the driver has flipped from 'Training' to 'Inference.' We are in the earliest stages because the shift to 'Agentic' workflows (machines recursively querying machines) will cause inference demand to explode exponentially, making current overbuilding concerns moot.

Speed is a qualitative transformer; eliminating latency enables a 'Change in Kind' (analogous to broadband enabling Netflix streaming vs. DVD mail). Millisecond latency is not just faster—it is the necessary condition for agentic coding and research workflows, which fail on traditional GPU architectures due to compounding delays in recursive query cascades.

SOURCE OF THE EDGE

First Principles Engineering (Architectural foresight on the 'Memory Wall' 7 years prior to ChatGPT) & Empirical feedback from deploying the world's largest chip.

//

CONVICTION DETECTED

• Blisteringly fast • Without question • Literally don't need them • There is zero latency • 100% it's coming • Absolutely ran the table

//

HEDGE DETECTED

• It's hard to predict • Plus or minus a year • I'm not sure I agree... but it's reasonable to disagree with me there • I think maybe some valid concerns around jobs