AI Survey Negatives

Quick read of downside signals from Anthropic's 2026 paper on coding skill formation ("How AI Impacts Skill Formation").

How to read this page: the sections before "AI Usage Pattern Split" compare AI vs No AI. The "AI Usage Pattern Split" section compares different behavior styles inside the AI group only.

Main Result (AI vs No AI): Learning Drop with AI Assistance

No AI (control): 100% (baseline score in the main randomized study)
AI-assisted: 83% (17% lower quiz score, p = 0.010)

The paper reports no statistically significant speed-up in the main study (p = 0.391), but a significant drop in post-task knowledge.

Errors During Task (AI vs No AI, Median per Participant)

AI-assisted: median 1 error per participant (Q1-Q3: 0-3)
No AI (control): median 3 errors per participant (Q1-Q3: 2-5)

Fewer errors can mean less hands-on debugging practice when learning a new library.

No-AI Group Snapshot (Control)

Control participants (No AI): 26 (main study group size)
Median total errors per participant: 3 (Q1-Q3: 2-5 errors)
Finished Task 2 within 35 min: 22/26 (4 of 26 did not finish Task 2 in time)
Average quiz score vs AI group: Higher (higher across all coding-experience bands, Figure 7)

Task 2 completion within 35 minutes (AI vs No AI)

No AI (control): 84.6%
AI-assisted: 100.0%

This reflects in-time completion only; the paper's primary finding is about post-task learning decline under AI assistance.
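
The control-group completion rate can be reproduced from the counts in the snapshot above. This is a minimal arithmetic sketch; the variable names are illustrative, and only the control counts (22 of 26) appear on this page, so the AI-group rate is not recomputed here.

```python
# Counts taken from the No-AI group snapshot on this page.
finished_in_time = 22   # control participants who finished Task 2 within 35 min
control_total = 26      # control participants in the main study

# In-time completion rate as a percentage, rounded to one decimal place.
completion_rate = round(100 * finished_in_time / control_total, 1)
did_not_finish = control_total - finished_in_time

print(f"Control completion: {completion_rate}%")   # Control completion: 84.6%
print(f"Did not finish in time: {did_not_finish}")  # Did not finish in time: 4
```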

AI Usage Pattern Split (AI Group Only)

These cards are not AI-vs-No-AI. Every row below is from participants who had AI assistance; it shows which AI usage style performed better or worse.

Generation-then-comprehension: quiz score 86%, time 24m
Hybrid code-explanation: quiz score 68%, time 24m
Conceptual inquiry: quiz score 65%, time 22m
AI delegation: quiz score 39%, time 19.5m
Progressive AI reliance: quiz score 35%, time 22m
Iterative AI debugging: quiz score 24%, time 31m

Lowest-learning clusters are heavy delegation/debugging behaviors (24%-39% quiz), while explanation-first patterns score much higher (65%-86%).
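
The two score ranges quoted above follow directly from the six pattern cards. The sketch below groups the patterns and recomputes the ranges; the 50% cut-off is an assumption chosen only to separate the two groups on this page, not a threshold taken from the paper.

```python
# Quiz score (%) and time (minutes) per AI usage pattern, as listed on this page.
patterns = {
    "Generation-then-comprehension": (86, 24.0),
    "Hybrid code-explanation": (68, 24.0),
    "Conceptual inquiry": (65, 22.0),
    "AI delegation": (39, 19.5),
    "Progressive AI reliance": (35, 22.0),
    "Iterative AI debugging": (24, 31.0),
}

# Hypothetical cut-off used only to split the cards into two groups.
THRESHOLD = 50

low = {name: score for name, (score, _) in patterns.items() if score < THRESHOLD}
high = {name: score for name, (score, _) in patterns.items() if score >= THRESHOLD}


def score_range(group):
    """Return (min, max) quiz score for a group of patterns."""
    return min(group.values()), max(group.values())


print("Low-learning range:", score_range(low))    # Low-learning range: (24, 39)
print("High-learning range:", score_range(high))  # High-learning range: (65, 86)
```

Note that the fastest pattern on this page (AI delegation, 19.5m) falls in the low-scoring group, so time and quiz score should be read separately.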

Behavior Signals

More debugging-query reliance -> lower quiz score: correlation r = -0.41 (p = 0.043)
More debugging-query reliance -> slower completion: correlation r = 0.43 (p = 0.033)
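
One way to gauge the size of these correlations is to square them: for a simple linear relationship, r squared is the share of variance in one variable associated with the other. The sketch below applies that standard identity to the two values on this page; it is an interpretation aid, not an analysis from the paper, and it says nothing about causation.

```python
# Correlations with debugging-query reliance, as reported on this page.
correlations = {
    "quiz score": -0.41,
    "completion time": 0.43,
}

# r squared: share of variance associated with debugging-query reliance
# under a simple linear fit (the sign of r is lost when squaring).
shared_variance = {name: round(r ** 2, 3) for name, r in correlations.items()}

for name, r2 in shared_variance.items():
    print(f"{name}: r^2 = {r2} (about {round(100 * r2)}% of variance)")
```

Both values come out below one fifth, so most of the variation in quiz score and completion time is not accounted for by debugging-query reliance alone.
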

Practical takeaway: read this as a skill-atrophy risk signal, not a literal medical claim. The measured outcome is weaker conceptual and debugging performance after AI-heavy workflows.

Source: Shen & Tamkin (Anthropic), "How AI Impacts Skill Formation", arXiv:2601.20245v2 (Feb 2026).