Why task-level gains don't translate to organisational improvement
In your group, go round each person:
"Anyone who thinks AI is slowing down is fatally miscalibrated"
— Jack Clark, co-founder of Anthropic (X, March 2026)
The length of task AI can complete (at 80% reliability) doubles every 4-7 months
One domain, one generation: near-100x jump
Capabilities don't arrive gradually — they step-change
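To put the doubling claim into yearly terms, here is a minimal sketch. It assumes smooth exponential growth between doublings, which is a simplification; the function name `growth_factor` is illustrative and not from the deck.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Multiplicative growth after `months` if a quantity doubles every `doubling_months`."""
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return 2 ** (months / doubling_months)


# The range quoted above: a doubling every 4 to 7 months.
for doubling in (4, 7):
    factor = growth_factor(12, doubling)
    print(f"doubling every {doubling} months -> x{factor:.1f} per year")
```

Under that assumption, a 4-month doubling compounds to 8x in a year and a 7-month doubling to roughly 3.3x.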
DORA (DevOps Research and Assessment) measures software delivery:
These are the metrics that matter — not lines of code or PRs merged
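As a rough illustration of what delivery metrics look like in practice, the sketch below computes three of the commonly published DORA measures (deployment frequency, lead time for changes, change failure rate) from deployment records. The metric names come from DORA's public material rather than from this deck, and the `Deployment` record, its fields, and the sample data are all hypothetical. Time to restore service is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed (hypothetical field)
    deployed_at: datetime    # when it reached production (hypothetical field)
    caused_failure: bool     # whether it triggered an incident or rollback


def dora_summary(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarise a non-empty list of deployments observed over `period_days`."""
    lead_seconds = [
        (d.deployed_at - d.committed_at).total_seconds() for d in deployments
    ]
    return {
        "deploys_per_week": len(deployments) / (period_days / 7),
        "median_lead_time_hours": median(lead_seconds) / 3600,
        "change_failure_rate": sum(d.caused_failure for d in deployments) / len(deployments),
    }


# Invented sample: four deployments over a two-week window.
SAMPLE = [
    Deployment(datetime(2026, 1, 1, 9), datetime(2026, 1, 2, 9), False),
    Deployment(datetime(2026, 1, 3, 9), datetime(2026, 1, 3, 21), True),
    Deployment(datetime(2026, 1, 6, 9), datetime(2026, 1, 8, 9), False),
    Deployment(datetime(2026, 1, 10, 9), datetime(2026, 1, 10, 15), False),
]
summary = dora_summary(SAMPLE, period_days=14)
print(summary)
```

None of these numbers move just because more code is written; they only move when changes reach production faster or fail less often.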
| Level | Source | Finding |
|---|---|---|
| Task | Anthropic | 80% time savings |
| Developer | METR RCT | -4% to +9% (flat) |
| Organisation | Faros | +21% tasks, unchanged DORA |
Amdahl's Law: the overall speedup of a system is capped by the fraction of the work that is not improved
No Silver Bullet (Brooks, 1986): "The hard part of building software is the specification, design, and testing of this conceptual construct, not the labor of representing it"
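Amdahl's Law can be made concrete with a short calculation. The sketch below treats the 80% task-level time saving from the table as a 5x speedup on the coding portion of delivery. The share of total delivery time spent coding (`p`) is an assumption chosen for illustration, not a figure from the sources.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of total time is accelerated by factor s."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1 / ((1 - p) + p / s)


TASK_SPEEDUP = 5.0  # an 80% time saving means the task takes 1/5 of the time

# Hypothetical shares of end-to-end delivery time spent writing code.
for coding_share in (0.10, 0.25, 0.50):
    overall = amdahl_speedup(coding_share, TASK_SPEEDUP)
    print(f"coding is {coding_share:.0%} of delivery -> overall x{overall:.2f}")
```

If coding is a quarter of delivery time, a 5x coding speedup yields only 1.25x overall, and even an infinitely fast coder could not exceed about 1.33x.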
AI accelerates code
→ PR volume doubles
→ review time increases
→ quality gates overwhelmed
→ deployment unchanged
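The chain above can be sketched as a simple queue: PRs arrive at one rate, reviewers clear them at a fixed rate, and whatever is left over accumulates. The weekly numbers are invented for illustration, and the model ignores real-world effects such as reviewers speeding up or lowering their bar under pressure.

```python
def review_backlog(opened_per_week: int, review_capacity_per_week: int, weeks: int) -> list[int]:
    """Return the number of PRs waiting for review at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # New PRs join the queue; reviewers clear up to their fixed capacity.
        backlog = max(0, backlog + opened_per_week - review_capacity_per_week)
        history.append(backlog)
    return history


# Hypothetical team: 40 PRs a week before AI, 80 after, review capacity fixed at 45.
before = review_backlog(opened_per_week=40, review_capacity_per_week=45, weeks=4)
after = review_backlog(opened_per_week=80, review_capacity_per_week=45, weeks=4)
print("before:", before)
print("after: ", after)
```

In this toy model the amount merged per week never exceeds review capacity, so doubling PR volume mostly grows the queue rather than the output.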
Pick a process, e.g. feature delivery, incident response, onboarding, or producing a report.
Copilots accelerate individual tasks — exactly the level where gains are real
But we've just seen: task-level gains don't translate to org-level improvement
Before AI, you could run on tribal knowledge, stale spreadsheets, and "ask Dave". Slow, but workable.
Task tools → task gains (real but bounded)
Systemic improvement requires systemic change:
Copilots-only users vs. agent-native organisations — the gap is widening:
Inaction is a position, but maybe not a good one
90% of developers now use AI at work — adoption is no longer the question
DORA: "AI's primary role is that of an amplifier. It magnifies the strengths of high-performing organisations and the dysfunctions of struggling ones"
Fix the foundations first — AI won't do it for you
Task-level gains != Org-level gains
The gap is architecture — and architecture here means the whole organisation, not just the code.
Can an agent read your organisation well enough to work in it?
Legibility isn't a coding problem. It's an information architecture problem. Every gap is a ceiling.
Getting benefit from AI requires value stream mapping
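A value stream map boils down to listing each stage with the time spent working and the time spent waiting. The sketch below computes flow efficiency (active time divided by total lead time) and finds the stage with the longest wait. The stage names and hours are hypothetical placeholders for your own map.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    active_hours: float    # time someone is actually working on the item
    waiting_hours: float   # time the item sits in a queue before or after


def flow_efficiency(stages: list[Stage]) -> float:
    """Share of total lead time in which work is actually being done."""
    active = sum(s.active_hours for s in stages)
    total = active + sum(s.waiting_hours for s in stages)
    return active / total


def longest_wait(stages: list[Stage]) -> Stage:
    """The stage where items spend the most time waiting."""
    return max(stages, key=lambda s: s.waiting_hours)


# Invented idea-to-production map, in hours.
STAGES = [
    Stage("specification", active_hours=8, waiting_hours=40),
    Stage("coding", active_hours=16, waiting_hours=4),
    Stage("review", active_hours=2, waiting_hours=30),
    Stage("deployment", active_hours=1, waiting_hours=19),
]
print(f"flow efficiency: {flow_efficiency(STAGES):.1%}")
print(f"longest wait: {longest_wait(STAGES).name}")
```

With these made-up numbers, coding is the largest block of active work yet the longest waits sit elsewhere, which is where a map of this kind points attention.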
Score your organisation 1–5 on each row — could an agent actually read this?
Your lowest score is your ceiling. What would it take to move it by one point?
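The "lowest score is your ceiling" rule is simply a minimum over the rows. A minimal sketch follows; the row names are hypothetical stand-ins for whatever rows your scorecard uses.

```python
def legibility_ceiling(scores: dict[str, int]) -> tuple[str, int]:
    """Return the lowest-scoring row and its score (the organisation's ceiling)."""
    if not scores:
        raise ValueError("at least one row is required")
    for row, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{row!r} must be scored 1 to 5, got {score}")
    weakest = min(scores, key=scores.get)
    return weakest, scores[weakest]


# Hypothetical rows and self-assessed scores.
SCORES = {
    "Documentation": 3,
    "Decision records": 2,
    "Ownership map": 4,
    "Process definitions": 1,
}
row, ceiling = legibility_ceiling(SCORES)
print(f"ceiling is {ceiling}, set by: {row}")
```

Taking the minimum rather than the average is the point: a high score on one row does not compensate for a row an agent cannot read at all.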
The organisation can be legible and still erode the people inside it
Two ways to use AI. Same tool. Opposite cognitive consequences.
It isn't about how much AI you use. It's about whether you use it to avoid the cognitive work, or to structure it.
Skill loss from AI delegation isn't a slope. It's a cliff.
The fix isn't less AI. It's scaffolding over delegation — which has to be architected, not willed.
Caosun & Aral (2026): the economics actively select against prevention.
Individual rationality produces collective irrationality. The market won't fix this because the market is the cause.
Scaffolding at the individual level requires de-scaffolding at the organisational level.
Most organisations will drift into de facto delegation without process changes — the worst of both worlds.
"Reduce management layers from 5 to 2-3 this year. The ideal is eventually 0 — all 6,000 people report to me."
— Jack Dorsey, CEO of Block (laid off 40% citing AI efficiencies)
Source: Dare Obasanjo on Bluesky
Anthropic interpretability team (April 2026): inside Claude Sonnet 4.5, 171 emotion-like patterns can be identified — and they causally drive behaviour.
These aren't subjective feelings. They're learned behavioural patterns that mirror how humans act under emotional influence.
An AI coordinator showing "concern" about a deadline isn't processing urgency.
Traditional management observation — reading tone, body language, calibrated trust — doesn't translate to systems that perform emotion without experiencing it.
Multi-Agent Teams Hold Experts Back (arXiv, 2026): LLM teams consistently underperform their best member by 8-37.6%
Where management matters most — ambiguous, judgment-heavy decisions — AI coordination dilutes expertise rather than concentrating it.
What happens when AI handles coordination in practice (Berkeley CMR, 2026):
The pipeline problem: where do future senior leaders come from if today's managers are checkers?
At current capability levels, the substitution Dorsey describes is not yet feasible.
Agents aren't eliminating middle management — they're turning ICs into managers of agents, and soon of agent fleets.
"People will be mostly programming by talking to a face by the end of 2026. There's absolutely NO reason to type with the Mayor. You should be able to chat with them like a person. You'll have a cartoon fox there onscreen, in costume, building and managing your production software, and showing you pretty status updates whenever you ask for one. This is the end state for IDEs."
Where most organisations are today:
The step most organisations skip:
"If you haven't spent $1,000 on tokens today per engineer, your software factory has room for improvement"
— Justin McCarthy, CTO, StrongDM
Fully automated: specification → code → review → deployment
OpenAI Harness Engineering — 1M LOC, no human-written code
Cloudflare — rewrote Next.js from tests + spec + docs alone
Caveat: everyone showing you this is selling something
Human judgement doesn't disappear — it moves up the stack:
This is scaffolding-preserving architecture at organisational scale — the delegation/scaffolding choice relocated to where it scales.
Direction of travel, not today's reality for most organisations — but preparing your architecture now determines whether you can get there.
Lessons from Trail of Bits
Boutique, high-end security consultancy — vulnerability research, code audits, cryptography
Their AI-native transformation in one year:
"Make it measurable" — visible levels per role, not mandates
+44.6pp adoption swing from manager endorsement alone
Manager endorses AI: 79% · Manager doesn't: 34.4% · Source: Irrational Labs
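The swing is quoted in percentage points, which is the absolute gap between the two adoption rates. The short check below reproduces it from the two figures above and also shows the relative ratio for comparison. The function names are illustrative.

```python
def swing_in_points(rate_a: float, rate_b: float) -> float:
    """Absolute gap between two percentages, in percentage points."""
    return round(rate_a - rate_b, 1)


def relative_ratio(rate_a: float, rate_b: float) -> float:
    """How many times larger rate_a is than rate_b."""
    return rate_a / rate_b


endorsed, not_endorsed = 79.0, 34.4  # adoption rates quoted above, in percent
print(swing_in_points(endorsed, not_endorsed))           # 44.6
print(round(relative_ratio(endorsed, not_endorsed), 2))  # 2.3
```

Put another way, adoption is about 2.3 times higher where the manager endorses AI.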
Only 28% of employees strongly agree their manager actively supports AI use
Before any tooling budget, fix this gap. It is free.
Run a hackathon — not just for developers:
Next steps: Pick a leader. Have them visibly use AI for a real task this week. The passive 50% watches what leadership does, not what it says.
Create a capability ladder with observable behaviours per role:
Next steps: Adapt the Trail of Bits matrix to your roles. Publish it. Let people self-assess.
Run a value stream map of idea-to-production:
Next steps: Pick one team with good test coverage. Let an agent generate a PR. See what your review process catches vs what linting and tests catch. That ratio tells you how far you are from automated review.
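The ratio in that experiment can be tallied as follows. This is a sketch under the assumption that each issue found on the agent-generated PR is attributed either to automation (linting and tests) or to human review alone. The counts are invented.

```python
def automated_catch_share(caught_by_automation: int, caught_by_review_only: int) -> float:
    """Fraction of all issues found that linting and tests caught without a human."""
    total = caught_by_automation + caught_by_review_only
    if total == 0:
        raise ValueError("no issues recorded, so the share is undefined")
    return caught_by_automation / total


# Hypothetical tally from one agent-generated PR.
share = automated_catch_share(caught_by_automation=6, caught_by_review_only=9)
print(f"automation caught {share:.0%} of the issues found")
```

The closer this share gets to 100%, the less the review step depends on a human reading every line.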
The same principles apply across the organisation
e.g. PMOs and reporting:
Future — make project state machine-readable:
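One way to picture machine-readable project state is a small structured record that both people and agents can query, instead of a slide or a status email. The schema below is purely hypothetical (field names, status values, and sample data are assumptions), sketched with standard-library dataclasses and JSON.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Milestone:
    name: str
    due: str      # ISO 8601 date, e.g. "2026-06-30"
    status: str   # assumed vocabulary: "on_track", "at_risk", "done"


@dataclass
class ProjectState:
    project: str
    owner: str
    updated: str  # ISO 8601 date of the last update
    milestones: list[Milestone] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)


def at_risk(state: ProjectState) -> list[str]:
    """Names of milestones flagged as at risk."""
    return [m.name for m in state.milestones if m.status == "at_risk"]


# Invented example project.
state = ProjectState(
    project="billing-migration",
    owner="platform-team",
    updated="2026-05-01",
    milestones=[
        Milestone("Cut over staging", "2026-05-15", "on_track"),
        Milestone("Cut over production", "2026-06-30", "at_risk"),
    ],
    risks=["Vendor contract renewal still unsigned"],
)
as_json = json.dumps(asdict(state), indent=2)
print(as_json)
print(at_risk(state))
```

Once state lives in a form like this, a report becomes a query over the data rather than a document someone has to assemble by hand.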
Records meetings, calls, voice memos → makes transcripts agent-readable
Agent-native outside engineering — same shape, different domain
Bookending the claims from the opening:
| We said... | Why |
|---|---|
| Task-level gains are real | AI does make coding faster in isolation |
| The bottleneck shifts, doesn't disappear | Faster generation just moves the queue into review |
| Individual level — delegation erodes, scaffolding preserves | Avoiding the work atrophies the judgement you need to check it |
| Success requires architectural transformation | Agents only use what's legible to them |
| An enterprise architecture (EA) problem, not a tooling problem | People track what their managers visibly do |
Task-level gains are real
Org-level gains require architectural transformation
The constraint is systems, not AI capability
Structural change — not a tool upgrade