
Education in the Age of LLMs: Stop Grading the Robot's Homework

June 10, 2026 · 8 min read

Somewhere right now, a student is submitting an essay they didn't write, to a teacher who won't read it, who will grade it against a rubric an AI could apply better, so the student can get marks that certify skills they don't have for a job that won't exist.

Everyone in this pipeline knows it's broken. Everyone keeps the pipeline running.

LLMs aren't coming to education. They're already here, sitting in every pocket, writing every assignment, solving every problem set. The question isn't "how do we stop students from using AI?" That war is over. We lost. The question is: what's the point of education when the answers write themselves?

The Detection Delusion

Our first response, naturally, was panic. Ban ChatGPT. Buy AI detectors. Make students write essays in airplane mode while a teacher watches them like a parole officer.

How's that going?

The detectors flag non-native English speakers as bots and let actual AI essays sail through. Students learned to add typos and "humanize" their outputs faster than schools learned to spell "perplexity." We've created an arms race where the only skill being developed is evasion.

This is the practical file problem all over again. Remember those? Sixty students, identical "experimental observations," full marks for neat handwriting. We've been grading fiction for decades. The AI just industrialized the fiction.

Here's the thing: if a take-home assignment can be completed by a machine, the assignment was never measuring anything worth measuring. The LLM didn't break your evaluation. It exposed it.

Grade the Understanding, Not the Artifact

For a hundred years, we've used a lazy proxy: if the deliverable exists, learning must have happened. Essay submitted? Knowledge acquired. Problem set complete? Concepts understood. The artifact was the evidence.

That proxy is dead. Time for a viking funeral — light it on fire, push it out to sea.

The replacement is older than the proxy itself: check whether the human in front of you actually understands. The technology for this has existed for millennia. It's called a conversation.

What this looks like in practice:

  • Defend your submission. You wrote this essay? Great. Explain your second paragraph. Why this argument and not the obvious counter? What would change your mind? Five minutes of viva reveals more than five pages of prose.
  • Modify it live. Here's your code. Now change the requirement slightly. Watch the student who understands fly, and the student who copy-pasted freeze.
  • Explain it to someone else. Teaching is the final boss of understanding. You can't prompt-engineer your way through a ten-year-old asking "but why?"
  • Show your process, not your product. The conversation with the AI — the questions asked, the outputs rejected, the corrections made — is the new rough work. Grade the margins, not the final answer.

"But that doesn't scale!" I hear the administrators cry. "We have 60 students per class!"

Funny how we found money for AI detection software but not for actually talking to students. Besides — and savour this irony — the same LLMs causing the panic can handle the scale. An AI can conduct a Socratic dialogue with every student simultaneously, probe their reasoning, and flag where understanding goes hollow. The teacher reviews the flags and has the human conversations that matter. The machine does the homework of checking the homework.

We just have to want understanding badly enough to measure it.

Crutch vs. Exoskeleton

Now for the part that makes both camps uncomfortable.

The "ban AI" camp is wrong because they're preparing students for a world that no longer exists. The "let AI do everything" camp is wrong because they're not preparing students at all. The truth lives in a distinction we're failing to teach: the difference between a crutch and an exoskeleton.

A crutch replaces a function. Use it long enough when you don't need it, and the muscle atrophies. A student who lets the LLM do their thinking ends up unable to think — confidently illiterate, fluently hollow, able to produce anything and understand nothing.

An exoskeleton amplifies a function. The muscle still works; it just lifts more. A student who drafts their own argument, then uses the LLM to stress-test it, find counterexamples, and explore ten framings instead of one: that student is thinking more, not less.

Same tool. Opposite outcomes. The difference isn't in the AI; it's in who's doing the cognitive work.

We've handled this before, badly and then well. Calculators were going to destroy arithmetic — instead we (eventually) figured out that you learn long division first, then get the calculator, and math class moved up the ladder to harder problems. Nobody calls an engineer a cheater for using MATLAB.

But an LLM isn't a calculator for numbers. It's a calculator for thought itself. Which means the thing students must learn before leaning on it isn't arithmetic — it's reasoning. Forming an argument. Smelling a wrong answer. Knowing enough to catch the machine confidently hallucinating page 47 of a book that doesn't exist.

Skepticism, in other words. The thing our education system was already terrible at teaching when the misinformation merely came from WhatsApp forwards. Now it comes pre-polished, grammatically perfect, and personalized. The bar just went up. Our curriculum is still doing limbo under the old one.

The Employability Cliff

Here's the uncomfortable part nobody wants to say out loud at parent-teacher meetings.

The entry-level knowledge job — the junior analyst formatting decks, the trainee developer writing boilerplate, the fresh graduate summarizing documents — was the bottom rung of every white-collar career ladder. That rung is now a product feature. It costs twenty dollars a month and doesn't take chai breaks.

This means the youth of this country face a brutal fork:

  • Those who can direct AI — who can decompose a problem, delegate the mechanical parts, verify the output, and own the result — enter the workforce operating at the level of yesterday's five-year veteran.
  • Those who can only do what AI does — produce passable text, standard code, generic analysis on instruction — are competing with a machine on price. They will lose. Not eventually. At the interview.

And what are we training them for? Twelve years of memorize-reproduce-repeat, optimizing for exams that reward exactly the skill set LLMs have automated. We are running the world's largest training program for a profession — human photocopier — whose last vacancy closed in 2023.

Sharma ji ka beta, the proverbial neighbour's son, scored 98% by perfectly reproducing the textbook. The textbook is now a free API. What exactly did he score 98% in?

I've written before that we measure students by marks scored, not concepts understood, and that this trauma graduates with them into annual performance reviews. The stakes just changed. Gaming the system used to get you a mediocre but survivable career. Now the system being gamed certifies you for unemployment. The marksheet says "first division." The job market says "we have that as a browser extension."

The Classroom We Actually Need

So burn it down and rebuild around one principle: humans should learn what makes them better directors of intelligence, and stop competing with it as producers of output.

  • Problem decomposition over problem solving. Breaking a vague, messy goal into precise, delegable pieces is the core skill of the AI age. It's also, conveniently, the core skill of thinking.
  • Verification as a first-class subject. Reading critically, checking claims, testing code, asking "what would make this wrong?" Trust, but verify. Actually — verify first.
  • Taste and judgment. When the cost of producing ten options drops to zero, the entire value moves to choosing well. We've never taught taste. We've barely tolerated it.
  • AI-free zones, by design. Mental math didn't die with calculators; we kept it because the muscle matters. Some thinking must still happen unplugged — not as punishment, but as strength training. You don't bring a forklift to the gym.
  • Evaluation as conversation. Continuous, verbal, adaptive. Did you understand it? Can you defend it? Can you extend it? The questions a machine can't answer for you, because the answer is you.

None of this needs new technology. It needs us to admit the old system was optimized for a world where producing text was expensive and verifying understanding was a luxury. That world ended. The price of text hit zero. Understanding is the only thing left with a market value.

Full disclosure: an LLM could have written a version of this post. Faster, probably with fewer rants. So am I a hypocrite typing this myself, or a fool for not delegating it? Neither — because if you sat me down and asked me to defend any paragraph here, I could. For the next generation, that's the entire exam.

Can you defend what you submitted?

Everything else is just very expensive typing.
