TL;DR
AI systems are now capable of automating most engineering tasks in AI development, with some benchmarks approaching saturation. However, AI’s ability to fully automate research remains uncertain, leaving research as the residual human activity.
Recent advances in AI capability demonstrate that engineering tasks in AI research and development are increasingly automated, with some benchmarks nearing full saturation. Meanwhile, the automation of research activities remains incomplete, leaving research as the residual activity requiring human insight. This shift has implications for the future of AI innovation and the roles of human researchers.
Multiple independent benchmarks, including CORE-Bench, MLE-Bench, and kernel-design tasks, show AI systems progressing rapidly toward automating core engineering skills relevant to AI R&D. For example, CORE-Bench, which measures the ability to reproduce research papers, reached 95.5% success in December 2025, with its author declaring it ‘solved.’ Similarly, MLE-Bench, which evaluates performance on Kaggle competitions, hit 64.4% in February 2026, approaching mid-tier human performance. Taken together, these results indicate that the engineering side of AI research is now largely automatable, with the bottleneck shifting from capability to application.
In contrast, the capacity of AI to automate research activities—such as hypothesis generation, experimental design, and creative problem-solving—remains less certain. While some evidence suggests progress, the structural question remains whether research itself is reducible to engineering at scale, or if it requires fundamentally different, human-centric skills. The evidence base is growing, but definitive conclusions are still pending.
Engineering is automated.
Research is the residual.
Six skill benchmarks. Edison’s framing. The question Clark leaves open is whether research is just engineering at scale.
Jack Clark’s Import AI #455 catalogs six benchmarks measuring AI capability on AI R&D tasks and concludes “AI can today automate vast swatches, perhaps the entirety, of AI engineering.” The residual question is research. The structural read on the residual: it may not be a permanent moat.
Six skills. One trajectory.
Clark catalogs six benchmarks measuring AI capability on AI R&D-relevant tasks. Each individual benchmark could be noise. Six benchmarks moving together is a curve. The pattern is the cascade observed across the broader Clark series — visible here in the specific R&D-skill domain.

Three data points. Mixed signal.
Clark provides three data points on the creative-spark question. Yes-evidence: Erdős-1051, centaur math discovery, sporadic Move-37-style moments. No-evidence: low yield, framing dependence, absence of acceleration. The mixed signal is the honest read.
The data supports two readings. Pessimistic: rare moments suggest creative insight is qualitatively distinct from engineering work. Optimistic: rare moments are an artifact of low-volume exploration; more shots on goal yields more discoveries. Both readings are consistent with Clark’s “vast swatches, perhaps the entirety” claim. They differ on the residual.
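The optimistic "shots on goal" reading can be made concrete with a small sketch. It assumes every attempt is independent and has the same small chance of producing a discovery. Both the model and the value of `p` are illustrative assumptions, not figures from Clark. The pessimistic reading amounts to saying this model is wrong: creative insight does not behave like a fixed-probability draw.

```python
# Hypothetical illustration of the "more shots on goal" reading.
# p is an assumed per-attempt chance of a discovery; it is NOT taken
# from Clark or from any benchmark.


def prob_at_least_one(p: float, attempts: int) -> float:
    """Chance of at least one success in `attempts` independent tries."""
    return 1.0 - (1.0 - p) ** attempts


def expected_discoveries(p: float, attempts: int) -> float:
    """Expected number of successes under the same independence assumption."""
    return p * attempts


if __name__ == "__main__":
    p = 0.001  # assumed, illustrative only
    for n in (10, 1_000, 100_000):
        print(n, round(prob_at_least_one(p, n), 4), expected_discoveries(p, n))
```

Under these assumptions, rare moments at low volume say little about yield at high volume, which is why the two readings diverge on the residual while agreeing on the current data.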

Five dimensions Clark gestures at but leaves underdeveloped.
Clark’s section is rigorous on the empirical evidence. What it leaves underdeveloped are five strategic dimensions that matter for the institutional response, which the Clark series synthesis argues is structurally inadequate.

Two readings. Different equilibria.
The structural question Clark leaves open: is research a permanent moat that bounds automated AI R&D, or is it engineering at scale that dissolves with more shots on goal? Both readings are consistent with the current data. They differ by orders of magnitude in consequences.
[Figure: the two readings compared. Surviving labels: “Productivity multiplier years” and “Recursive loop operational.”]

Five audiences. Asymmetric cost of being wrong.
The institutional response should not bet on inspiration being a permanent moat. If the distinction holds, capacity built is still useful. If it closes, capacity is necessary. Asymmetric cost-of-being-wrong points toward building now.
The five audiences: industry, academia, policymakers, investors, and everyone else.
Engineering is automated. The residual is the question. The institutional response should not bet on inspiration being a permanent moat.
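The asymmetric cost-of-being-wrong argument can be sketched as a two-by-two decision table. Every cost below is a hypothetical unit chosen for illustration, since the article gives no numbers. The only structure taken from the text is that built capacity stays useful if the moat holds and is necessary if it closes.

```python
# Hypothetical decision table: action x outcome -> cost.
# All numbers are assumed for illustration; the article gives none.
COSTS = {
    ("build", "moat_holds"): 1.0,   # capacity built is still useful; modest net cost
    ("build", "moat_closes"): 1.0,  # capacity was necessary and is in place
    ("wait", "moat_holds"): 0.0,    # nothing spent, nothing lost
    ("wait", "moat_closes"): 10.0,  # caught without the necessary capacity
}


def expected_cost(action: str, p_closes: float) -> float:
    """Expected cost of an action given the probability that the moat closes."""
    holds = COSTS[(action, "moat_holds")]
    closes = COSTS[(action, "moat_closes")]
    return (1.0 - p_closes) * holds + p_closes * closes


def break_even_probability() -> float:
    """Probability of closure above which building is cheaper than waiting."""
    extra_if_holds = COSTS[("build", "moat_holds")] - COSTS[("wait", "moat_holds")]
    saved_if_closes = COSTS[("wait", "moat_closes")] - COSTS[("build", "moat_closes")]
    return extra_if_holds / (extra_if_holds + saved_if_closes)


if __name__ == "__main__":
    print("break-even probability:", break_even_probability())
    for p in (0.05, 0.25, 0.5):
        print(p, expected_cost("build", p), expected_cost("wait", p))
```

With these assumed costs, building is the cheaper bet once the chance of the moat closing exceeds a fairly small threshold. The conclusion depends on the cost ratio, so the table is a way to state the argument, not evidence for it.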
Implications of AI Automating Engineering Tasks
The rapid automation of engineering tasks in AI R&D suggests that the traditional bottleneck in AI development is shifting. Organizations may soon see a significant reduction in the cost and time required to reproduce and test research ideas. However, the residual nature of research implies that human creativity and insight remain crucial, at least for now. This transition could reshape the roles of researchers, potentially leading to a focus on strategic and innovative aspects rather than routine engineering.
Progress in AI Capabilities and Benchmark Saturation
Recent months have seen a cascade of benchmark results indicating AI’s increasing proficiency in core engineering skills. For example, CORE-Bench, which assesses research reproduction capabilities, improved from 21.5% in September 2024 to 95.5% by December 2025. Similarly, MLE-Bench, which measures performance on Kaggle competitions, advanced from 16.9% to 64.4% by February 2026. These benchmarks are designed to evaluate specific skills like reproducing research, optimizing kernels, and solving competitions, and several are approaching saturation. This pattern suggests that engineering work in AI R&D is becoming nearly fully automatable, with the current limitations being primarily measurement or application-specific rather than fundamental.
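The arithmetic behind these figures can be checked with a short sketch. It uses only the percentages and the CORE-Bench dates quoted above. Treating 100% as the ceiling and using the first day of each month as the date are assumptions made for illustration.

```python
from datetime import date


def months_between(start: date, end: date) -> int:
    """Whole calendar months from start to end (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def summarize(start_pct: float, end_pct: float, ceiling: float = 100.0) -> dict:
    """Percentage-point gain and remaining headroom below an assumed ceiling."""
    return {
        "gain_points": round(end_pct - start_pct, 1),
        "headroom_points": round(ceiling - end_pct, 1),
    }


# CORE-Bench: 21.5% (September 2024) to 95.5% (December 2025), per the article.
core = summarize(21.5, 95.5)
core_months = months_between(date(2024, 9, 1), date(2025, 12, 1))
core_rate = core["gain_points"] / core_months  # average points per month

# MLE-Bench: 16.9% to 64.4%, per the article.
mle = summarize(16.9, 64.4)

if __name__ == "__main__":
    print("CORE-Bench:", core, "months:", core_months,
          "avg points/month:", round(core_rate, 2))
    print("MLE-Bench:", mle)
```

On these numbers, CORE-Bench has 4.5 points of headroom left while MLE-Bench has 35.6, so the saturation claim applies much more strongly to the former than to the latter.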
Meanwhile, the question of whether AI can automate research—such as hypothesis formulation, experimental design, and scientific discovery—remains open. Some researchers argue that research may itself be a form of scaled engineering, which AI could eventually master, while others believe it fundamentally requires human insight. The ongoing development of AI tools for generating research ideas and designing experiments hints at a possible future where research becomes increasingly automated, but this remains a developing area.
“The structural read is that research may itself be engineering at scale — in which case the residual closes faster than Clark’s framing implies.”
— Thorsten Meyer
Unresolved Questions About AI-Driven Research Automation
It is still unclear whether AI can fully automate scientific research activities such as hypothesis generation, experimental design, and creative problem-solving. While benchmarks show progress in engineering tasks, the structural nature of research as potentially distinct from engineering means that complete automation may require breakthroughs not yet achieved. The timeline for this transition remains uncertain, and some experts caution that human insight may always play a role.
Next Steps in Monitoring AI’s Research Capabilities
Researchers and organizations will continue to monitor benchmark developments and real-world applications of AI in research settings. The focus will be on whether AI can autonomously generate novel hypotheses, design experiments, and publish scientific findings at scale. Additionally, efforts are underway to develop new benchmarks that better capture the essence of research activities, which will inform predictions about the future of AI-driven scientific discovery. The next 32 months are likely to see further advances and debates over the limits of automation in research.
Key Questions
What are the main benchmarks indicating AI automation progress?
Key benchmarks include CORE-Bench, measuring research reproduction, and MLE-Bench, assessing performance on Kaggle competitions. CORE-Bench is near saturation at 95.5%, and MLE-Bench has reached 64.4%, indicating significant progress in engineering skills.
Can AI fully automate scientific research?
The evidence suggests AI can automate many engineering tasks, but whether it can fully automate research activities like hypothesis generation remains uncertain. Research may be structurally different from engineering, in which case human insight is still required for now.
What are the implications for human researchers?
If engineering becomes automated, researchers may shift focus from routine tasks to strategic, creative, and oversight roles, potentially reducing costs and accelerating development cycles.
How soon might AI automate all aspects of research?
It is currently unclear; while progress is rapid, full automation of research activities could still be years away, depending on breakthroughs in AI capabilities and understanding of scientific discovery processes.
Source: ThorstenMeyerAI.com