TL;DR

A recent Google whitepaper argues that the model accounts for only a small share of an AI system’s behavior. The emphasis is shifting toward the harness, verification, and context engineering, which the paper treats as more decisive for effective AI deployment.

A new whitepaper from Google, authored by Addy Osmani, Shubham Saboo, and Sokratis Kartakis, states that the model constitutes only about 10% of an AI system’s behavior. The report argues that the real focus should be on harnessing, configuration, and verification, which account for the remaining 90%. This shift has significant implications for how organizations develop and deploy AI systems, emphasizing the importance of system design over model selection alone.

The whitepaper, titled The New SDLC With Vibe Coding, challenges the common perception that upgrading to the latest AI model automatically improves system performance. Instead, it argues that most failures and inefficiencies stem from configuration errors, missing tools, or poor context management. Experiments cited in the paper show that changing the harness (prompts, tools, and middleware) can drastically improve an agent’s performance even when the underlying model stays the same. For example, one team moved a coding agent from outside the Top 30 to the Top 5 on Terminal Bench 2.0 by changing only the harness, not the model.

The authors advocate for a disciplined approach called agentic engineering, where AI is embedded within a framework of verification, structured context, and continuous oversight. This approach contrasts sharply with vibe coding, which involves minimal review and quick prompts, often leading to higher long-term costs and security risks. The paper emphasizes that costs are driven more by how AI is configured and used than by the model itself. High upfront investments in system design and context engineering yield lower marginal costs over time, making AI deployment more sustainable and secure.
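The cost argument can be made concrete with a small break-even calculation. The sketch below is illustrative only: the `Approach` class, the cost units, and every number in it are hypothetical and do not come from the whitepaper. It simply shows how a higher upfront cost can be recovered once the per-feature cost is lower.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    name: str
    upfront_cost: float      # one-time spend (specs, evals, context), hypothetical units
    cost_per_feature: float  # recurring spend per shipped feature, hypothetical units

    def total_cost(self, features: int) -> float:
        return self.upfront_cost + self.cost_per_feature * features


def break_even_features(cheap_start: Approach, cheap_run: Approach) -> int:
    """Smallest feature count at which cheap_run is strictly cheaper overall."""
    saving = cheap_start.cost_per_feature - cheap_run.cost_per_feature
    if saving <= 0:
        raise ValueError("cheap_run never becomes cheaper per feature")
    extra_upfront = cheap_run.upfront_cost - cheap_start.upfront_cost
    if extra_upfront < 0:
        return 0  # cheaper from the very start
    return math.floor(extra_upfront / saving) + 1


# Made-up figures: vibe coding costs 5x more per feature but nothing upfront.
vibe = Approach("vibe coding", upfront_cost=0.0, cost_per_feature=5.0)
agentic = Approach("agentic engineering", upfront_cost=40.0, cost_per_feature=1.0)

n = break_even_features(vibe, agentic)
print(n, vibe.total_cost(n), agentic.total_cost(n))
```

With these made-up figures the two approaches cost the same at 10 features, and the agentic setup is cheaper from the 11th feature onward. Real break-even points depend entirely on a team's own costs.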

At a glance
When: published March 2026
The development: The new Google whitepaper highlights that only 10% of AI system behavior is determined by the model itself, with the rest governed by configuration and context engineering.
Google · Osmani, Saboo & Kartakis · May 2026

The model is only 10%

A Google whitepaper argues software’s biggest shift is from writing code to expressing intent. Its sharpest claim: the model you obsess over is the smallest part of the system — the scaffolding around it does the real work.

A spectrum, not a binary: the differentiator is how outputs get verified

Vibe Coding: casual prompts · “does it seem to work?” · disposable code · high risk
Structured AI-Assisted: detailed prompts + constraints · manual testing · features in real codebases
Agentic Engineering: formal specs · automated tests + evals + CI gates · production scale · low risk

Tests verify the deterministic; evals verify the rest. Without both, it’s vibe coding, however clever the prompt.
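One way to picture "tests plus evals" is a single gate that refuses to pass output unless both kinds of check exist and succeed. The following is a minimal sketch under stated assumptions: `run_gate`, `GateResult`, the 0.8 threshold, and the toy `docstring_score` eval are all hypothetical names invented for illustration. Real evals are usually rubric-based or model-graded rather than a string check.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GateResult:
    tests_passed: bool
    eval_score: float
    shippable: bool


def compiles(source: str) -> bool:
    """Deterministic test: does the candidate parse as Python?"""
    try:
        compile(source, "<candidate>", "exec")
        return True
    except SyntaxError:
        return False


def docstring_score(source: str) -> float:
    """Toy stand-in for an eval: rewards documented code."""
    return 1.0 if '"""' in source else 0.0


def run_gate(
    output: str,
    tests: Sequence[Callable[[str], bool]],
    evals: Sequence[Callable[[str], float]],
    eval_threshold: float = 0.8,  # hypothetical cut-off
) -> GateResult:
    tests_passed = all(test(output) for test in tests)
    eval_score = sum(e(output) for e in evals) / len(evals) if evals else 0.0
    # Both kinds of check must be present and must pass.
    shippable = (
        bool(tests) and bool(evals)
        and tests_passed and eval_score >= eval_threshold
    )
    return GateResult(tests_passed, eval_score, shippable)


candidate = 'def add(a, b):\n    """Return a + b."""\n    return a + b\n'
result = run_gate(
    candidate,
    tests=[compiles, lambda s: "def add" in s],
    evals=[docstring_score],
)
print(result)
```

Calling `run_gate` with an empty `evals` list yields `shippable=False` even when every test passes, which mirrors the "without both" rule above.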
The idea worth building your strategy around

Agent = Model + Harness

Model: ~10%
Harness: ~90% (prompts · tools · context · hooks · sandboxes · observability). That ~90% is your surface area, not the provider’s.

Outside Top 30 → Top 5 on Terminal Bench 2.0 by changing only the harness, with the same model.
“Most agent failures, examined honestly, are configuration failures”: a missing tool, a vague rule, a noisy context.
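The "Agent = Model + Harness" idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `Harness` and `Agent` classes, the `CALL <tool> <arg>` convention, and the `stub_model` function are hypothetical and are not taken from the whitepaper or from any real agent framework. The stub stands in for a real LLM so that the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Model = Callable[[str], str]  # prompt in, completion out


@dataclass
class Harness:
    system_prompt: str
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    context: List[str] = field(default_factory=list)

    def build_prompt(self, task: str) -> str:
        tool_list = ", ".join(sorted(self.tools)) or "none"
        parts = [self.system_prompt, f"Tools: {tool_list}", *self.context, f"Task: {task}"]
        return "\n".join(parts)


@dataclass
class Agent:
    model: Model
    harness: Harness

    def run(self, task: str) -> str:
        reply = self.model(self.harness.build_prompt(task))
        # Convention for this sketch only: "CALL <tool> <arg>" requests a tool.
        if reply.startswith("CALL "):
            _, name, arg = (reply.split(" ", 2) + [""])[:3]
            tool = self.harness.tools.get(name)
            return tool(arg) if tool else f"error: tool {name!r} not configured"
        return reply


def stub_model(prompt: str) -> str:
    """Stand-in for a real LLM: always asks for the test runner."""
    return "CALL run_tests tests/"


bare = Agent(stub_model, Harness("You are a coding agent."))
equipped = Agent(
    stub_model,
    Harness("You are a coding agent.", tools={"run_tests": lambda path: f"ran tests in {path}"}),
)

print(bare.run("fix the failing build"))
print(equipped.run("fix the failing build"))
```

Both agents use the same `stub_model`. Only the harness differs, and only the equipped one can act on the model's request, which is a small instance of the "missing tool" configuration failure.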
The economics: it’s a token-cost problem (CapEx vs OpEx)

Vibe Coding: Low CapEx · High OpEx
Looks free, hides debt: token burn (fix-it loops), maintenance tax (AI spaghetti), security remediation. Crosses over to 3–10× more per feature.

Agentic Engineering: High CapEx · Low OpEx
Pay upfront (specs, evals, context), then ship cheaply. Levers: context engineering for first-pass success + intelligent model routing (cheap models for the easy work).
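Model routing, the second lever above, can be as simple as a rule that sends easy tasks to a cheaper model. The sketch below is a deliberately naive heuristic. The tier names, prices, keyword list, and word-count cut-off are all hypothetical, and a production router would more plausibly use a classifier or past success rates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelTier:
    name: str
    cost_per_1k_tokens: float  # hypothetical price, arbitrary currency


CHEAP = ModelTier("small-model", 0.10)
STRONG = ModelTier("large-model", 2.00)

# Hypothetical keywords that suggest a task needs the stronger model.
HARD_HINTS = ("refactor", "architecture", "security", "migration")


def route(task: str, max_easy_words: int = 40) -> ModelTier:
    """Pick a tier from surface features of the task description."""
    text = task.lower()
    looks_hard = any(hint in text for hint in HARD_HINTS) or len(text.split()) > max_easy_words
    return STRONG if looks_hard else CHEAP


def estimated_cost(task: str, tokens: int) -> float:
    return route(task).cost_per_1k_tokens * tokens / 1000


for task in ("Rename variable foo to bar", "Refactor the auth module for security"):
    print(task, "->", route(task).name, estimated_cost(task, tokens=2000))
```

The point of the design is that routing lives in the harness, so it can be tuned or replaced without touching either model.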
85% of devs use AI coding agents (51% daily)
41% of all new code is AI-generated
~90% of agent behavior is the harness, not the model
+19% longer on some tasks (METR): verification is the cost
The read

The clearest map yet of how serious AI development works — and mostly tool-agnostic. But it’s a Google funnel: the concepts are neutral, the on-ramps point to Gemini, Jules & the ADK. If the harness is 90% and it’s yours, your moat and your costs both live there — so own your scaffolding, route across models, and remember: AI amplifies whatever engineering culture it lands in.

Source: Osmani, Saboo & Kartakis, “The New SDLC With Vibe Coding,” Google (May 2026). Figures are the paper’s own, incl. METR & LangChain. Analysis is the author’s.
thorstenmeyerai.com

Why System Design and Configuration Matter More Than the Model

The whitepaper’s central argument is that system configuration, harness design, and verification affect results more than model upgrades do. For organizations, this means that investment in system architecture, tools, and context management can yield better performance, lower costs, and stronger security. The paper challenges teams to build durable, configurable frameworks rather than chase the latest model versions, which tend to deliver diminishing returns when the surrounding configuration is weak.

Shift Toward System-Centric AI Development

As of early 2026, AI adoption is widespread: 85% of professional developers use AI coding agents (51% daily), and 41% of all new code is AI-generated. Previously, the emphasis was on acquiring the most advanced models; now, the focus is on how those models are integrated, configured, and verified within larger systems. The whitepaper builds on ongoing discussions about the total cost of ownership in AI, emphasizing that the model itself is only a small component of overall system performance and security.

This perspective aligns with the experiments cited in the paper, in which changes to prompts, tools, and context loading improved results more than a model upgrade would have. The authors argue that the real skill lies in system design and context engineering, which have historically been undervalued.

“The model constitutes only about 10% of an AI system’s behavior; the rest is determined by harnessing, configuration, and verification.”

— Addy Osmani

What Aspects of System Configuration Are Still Unclear

While the whitepaper provides strong evidence that configuration and harnessing are critical, it remains unclear how organizations can best scale these practices across complex systems. Specific methodologies for systematically measuring and improving harness design are still evolving. Additionally, the long-term impact of this shift on AI model development cycles and industry standards has yet to be fully established.

Next Steps for AI Development and System Optimization

Organizations are likely to invest more in system architecture, tooling, and context engineering practices. Expect to see increased focus on developing frameworks for verifying and maintaining AI systems, along with new standards for harness design. Researchers and practitioners will explore best practices for scalable context management and automated configuration, shaping the future of AI engineering. Further studies are anticipated to quantify the cost savings and performance improvements achievable through these approaches.

Key Questions

Why is the model only 10% of the AI system’s behavior?

The whitepaper explains that most of an AI system’s behavior depends on how it is configured, harnessed, and verified, not just the underlying model. The model provides a base, but the surrounding system determines how effectively it performs.

How can organizations improve their AI systems based on this insight?

By investing in system design, including tools, prompts, context management, and verification processes, organizations can significantly enhance AI performance without always needing the latest model upgrades.

Does this mean model upgrades are no longer important?

Model upgrades remain valuable but are now seen as one part of a broader system strategy. The whitepaper emphasizes that system configuration and harness design often have a larger impact on results.

What are the risks of focusing too much on system configuration?

Over-reliance on configuration without understanding foundational model capabilities could lead to security vulnerabilities or maintenance challenges. Balanced development that includes ongoing model evaluation is recommended.

What is agentic engineering?

Agentic engineering embeds AI within a framework of formal specs, automated verification (tests, evals, and CI gates), tools, and structured context, with the aim of more reliable and scalable AI deployment.

Source: ThorstenMeyerAI.com
