THESIS
//
NECESSARY CONDITION
Regulatory frameworks must remain permissive toward innovation (avoiding the 'European' model), and open-source development must remain unencumbered by downstream liability.
RISK
Steel Man Counter-Thesis
The agent-driven productivity revolution may plateau far short of the transformative vision presented. Three structural constraints could limit its scope:

First, reinforcement learning architectures fundamentally cannot improve capabilities in non-verifiable domains, meaning the 'jaggedness' problem is not a bug to be fixed but an intrinsic feature of how these systems learn. Models will continue excelling at code while remaining frozen at jokes, nuance, and judgment indefinitely.

Second, the assumption that 'everything is skill issue' may be a cognitive trap: practitioners attribute failures to their own inadequate prompting rather than recognizing hard capability limits, leading to endless optimization of diminishing returns.

Third, the vision of agents acting autonomously requires trust infrastructure that does not exist and may never exist at scale. The security, liability, and verification systems needed for agents to truly 'act on your behalf' without supervision represent a coordination problem as hard as the AI problem itself.

The result could be a world where agents provide substantial but bounded productivity gains in specific verifiable domains while human bottlenecks persist everywhere else, making the '16 hours of manifesting will to agents' more like sophisticated autocomplete than genuine autonomy.
//
THESIS
DEFENSE
//
ASYMMETRIC SKEW
Upside scenario delivers 10-100x productivity gains in verifiable software domains with gradual expansion into adjacent areas. Downside scenario delivers 2-3x gains in narrow domains with persistent human bottlenecks, wasted investment in over-automated workflows, and potential security incidents from premature trust. The ratio suggests moderate upside with meaningful tail risk of disappointment relative to current expectations.
ALPHA
NOISE
The Consensus
The market believes we are in the early-to-middle stages of a gradual AI integration, where language models serve as increasingly capable assistants but human engineers and researchers remain the primary drivers of intellectual work. Software engineering roles will evolve but largely persist in recognizable form. AI research progress will continue to be human-directed, with researchers proposing hypotheses, designing experiments, and interpreting results. Open source AI will remain substantially behind closed frontier models, and physical world robotics and automation will develop roughly in parallel with digital AI capabilities.
The market's logic runs: AI capabilities are improving steadily; humans with AI tools are more productive than humans without them; the equilibrium is enhanced human workers using AI assistants; research progress requires human creativity and judgment; open source lags closed models by a meaningful margin (12-18 months); job displacement will be gradual and allow for adaptation.
SIGNAL
The Variant
Karpathy believes we have already crossed an irreversible threshold where human involvement in coding and research is becoming a bottleneck to be systematically removed. The shift from 80/20 human-to-agent coding to 20/80 (or more extreme) happened abruptly in December 2024, and most people have not recognized how dramatic this change is. The correct frame is not 'AI as tool' but 'AI as autonomous worker requiring supervision.' Auto-research—where models run experiments, interpret results, and iterate without human involvement—is not a future possibility but a present reality that works. Digital transformation will massively outpace physical world automation because bits are fundamentally easier to manipulate than atoms. The goal is to maximize token throughput, remove yourself from the loop entirely, and treat your role as arranging autonomous systems rather than doing the work yourself.
Karpathy's logic runs differently at every node. First, capability gains are not linear but involve phase transitions—something 'flipped' in December 2024. Second, the productivity gain is so large that the human becomes the constraint, not the contributor. Third, the equilibrium is not 'enhanced humans' but 'humans arranging autonomous systems and getting out of the way.' Fourth, research progress in verifiable domains can be fully automated now—auto-research found hyperparameter improvements Karpathy missed after two decades of manual tuning. Fifth, open source is now 6-8 months behind and converging, which is close enough to matter for most use cases. Sixth, the digital economy will transform at 'speed of light' while physical automation lags by years, creating a specific temporal structure to disruption that differs from gradual uniform change.
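The auto-research claim in the fourth point reduces to a propose-run-evaluate loop that needs no human in the verifiable-domain case. The sketch below is purely illustrative, not Karpathy's actual setup: `propose_config`, `run_experiment`, and the single `lr` hyperparameter are hypothetical stand-ins, and the "experiment" is a toy scoring function rather than a real training run.

```python
import random

def run_experiment(config):
    # Stand-in for an unattended training run in a verifiable domain:
    # it returns a scalar score the loop can optimize without human
    # judgment. This toy score peaks at lr = 0.1 (hypothetical optimum).
    lr = config["lr"]
    return -(lr - 0.1) ** 2  # higher is better

def propose_config(history):
    # Stand-in for the agent's proposal step: perturb the best
    # configuration found so far (simple multiplicative hill-climbing).
    if not history:
        return {"lr": random.uniform(0.001, 1.0)}
    best_cfg, _ = max(history, key=lambda h: h[1])
    return {"lr": max(1e-4, best_cfg["lr"] * random.uniform(0.5, 2.0))}

def auto_research(budget=50, seed=0):
    # The full loop: propose, run, record, iterate -- no human step.
    random.seed(seed)
    history = []
    for _ in range(budget):
        cfg = propose_config(history)   # agent proposes an experiment
        score = run_experiment(cfg)     # experiment runs unattended
        history.append((cfg, score))    # agent interprets and iterates
    return max(history, key=lambda h: h[1])

best_cfg, best_score = auto_research()
print(best_cfg, best_score)
```

The point of the sketch is the structure, not the search strategy: because the score is machine-checkable, the loop closes without a human, which is exactly why the claim is scoped to verifiable domains and fails where no such score exists.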
SOURCE OF THE EDGE
Karpathy's edge is genuine but domain-specific. His credibility comes from three sources: (1) direct operational experience at the frontier—he led Tesla's Autopilot program, worked at OpenAI, and has been hands-on with these tools for months in an intensive daily workflow; (2) a two-decade track record in the specific domain he is discussing (neural network training, LLM optimization); and (3) concrete, falsifiable demonstrations—auto-research found improvements to a codebase he had already optimized, which he openly admits he missed. The edge is real for his claims about coding agents and research automation. However, his broader claims about societal transformation, job market dynamics, and long-term trajectories are extrapolations from his technical experience, not positions where he has a structural informational advantage. He explicitly acknowledges this limitation when discussing job market forecasting ('I'm not professionally doing that really and I think it's a job of economists to do properly'). His edge should be weighted heavily on the technical mechanics of what agents can do now, moderately on near-term implications for software engineering and research workflows, and lightly on macroeconomic or societal predictions.
//
CONVICTION DETECTED
• I don't think I've typed like a line of code probably since December basically
• A normal person actually realizes that this happened or how dramatic it was like literally
• This is like an extremely large change
• Everything is skill issue
• This should be free like in a year or two or three. There's no vibe coding involved. This is trivial. This is table stakes.
• I can't believe I just typed in like 'Can you find my Sonos?' And suddenly it's playing music.
• Micro GPT is like my end of my obsession. It's the 200 lines. I thought about this for a long time. This is the solution. Trust me, it can't get simpler.
• I'm not explaining to people anymore. I'm explaining it to agents.
//
HEDGE DETECTED
• I kind of feel like I was just in this perpetual I still am often in this state of AI psychosis
• It all kind of feels like skill issue when it doesn't work to some extent
• I still am often in this state
• I kind of feel like my judgment will inevitably start to drift
• It's kind of really hard to forecast to be honest
• I don't actually think that's true but it's kind of interesting to think about
• It's a very loaded question a little bit
• I think it's really hard to tell because again like the job market is extremely diverse and I think the answers will probably vary
• I don't have something that I'm like super happy with just yet
• The whole thing still doesn't it's still kind of like bursting at the seams a little bit and there's cracks and it doesn't fully work

The ratio reveals a telling pattern: Karpathy hedges extensively on predictions about the future, societal impact, and areas outside his direct technical experience, but speaks with near-absolute certainty about what he has personally observed and done (the December shift, auto-research results, his coding workflow). This is the signature of genuine operational confidence combined with intellectual honesty about extrapolation limits. The hedging is not performative uncertainty—it is calibrated to domain. His high-conviction claims about the present state of coding agents and research automation should be taken seriously precisely because he does not extend the same certainty to domains where he lacks direct evidence.

