Learning Intelligence


A guide for educators · 2026

# How we teach when the answer is _free_

A field guide to assessment, pedagogy, and evidence in the post-output classroom — and to _learning intelligence_, the discipline trying to put them back together.

14,500 words · ~55 minutes · For higher ed & K–12 leaders · Updated May 2026


Chapter 01 · The break

## A contract _quietly_ broken

In the fall of 2022, a freshman writing instructor could still reasonably believe that the essay sitting in her grading queue was a record of her student's thinking. By the spring of 2026, she cannot. The basic contract of formal education quietly broke between those two semesters.

The break has a date. On [November 30, 2022](https://openai.com/index/chatgpt/), OpenAI released ChatGPT to the public. Within five days it had a million users. Within two months, [100 million](https://techcrunch.com/2025/11/30/chatgpt-launched-three-years-ago-today/). By the start of 2026, 900 million people were using it weekly, and the question facing every teacher in every classroom on every continent had inverted. It was no longer, _can the student produce this work?_ It was, _if the student can produce this work in thirty seconds with a chatbot, what was the work for?_

Three and a half years in, the data is overwhelming.

-   **88%** of UK undergraduates use generative AI specifically on assessed work, up from 53% the year before. (HEPI Student Survey 2026)
-   **95%** of US college faculty expect AI to increase student overreliance; 90% expect it to diminish critical thinking. (Elon University · AAC&U, Jan 2026)
-   **86%** of students globally are already using AI regularly in their studies, with one in four using it daily. (Digital Education Council, 2024)
-   **38%** of faculty say AI has _increased_ their workload, mostly from policing cheating and rebuilding assessments. Only 11% say it has decreased it. (Tyton Partners, Time for Class 2025)

The instructor with the unreadable essay is not alone. Her problem is now the central operational problem of the entire sector.

This guide is about the field that is emerging in response to that problem. It does not yet have a settled name, but the most accurate one available is **learning intelligence**: the practice of generating and interpreting trustworthy evidence of how learning is happening, not just what was finally submitted, so that teachers can teach, students can learn, and institutions can certify that something real took place between them.

The thesis

#### When output becomes cheap, evidence becomes everything.

Learning intelligence is not the same as learning analytics. It is not the same as AI tutoring. It is not the same as plagiarism detection. It borrows from all three, but it points somewhere they do not — at the place where assessment was always pointing, before the post-war research university convinced itself that a stack of essays was a sufficient record of a mind at work: at the _process_ of learning itself.

This guide is written for the people who already know that something has to change: K-12 superintendents and deans, provosts and chief academic officers, instructional designers, and the teachers and faculty who are doing the actual work of teaching while the ground shifts under them. It walks through what we knew about learning before generative AI made the question urgent again, what the last four years actually did to the classroom, why the assessment crisis is a validity crisis and not a cheating crisis, what AI can and cannot do as a tutor and a colleague, and what a credible model of learning intelligence looks like for the institutions that have to live inside it.

The schools that thrive in the next decade will be the ones that learn how to _see_ learning again.

Chapter 02 · Foundations

## What we knew _before_ the machines could write

The strange thing about the AI crisis in education is that almost everything we need to solve it is already in the research literature. The science of how people learn has not changed because chatbots got good at writing essays. What has changed is the price of pretending it doesn't matter.

Begin with three findings that nearly every cognitive scientist would put on the same short list.

### Effortful processing builds memory

Robert and Elizabeth Bjork's work on **desirable difficulties** — the now-classic body of research showing that easier conditions during practice often produce _worse_ long-term learning than harder ones — established a principle that has been replicated across decades of memory and cognition studies. Spacing practice, mixing problem types, generating answers before being told them, and being forced to retrieve information from memory all feel less productive in the moment and produce more durable learning in the long run. Ease is the enemy of encoding.

### Active learning beats passive instruction

The largest meta-analysis on the question, [Freeman and colleagues' 2014 paper in PNAS](https://www.pnas.org/doi/10.1073/pnas.1319030111), pooled 225 studies of undergraduate STEM courses and found that active learning raised student performance on exams and concept inventories by 0.47 standard deviations, and that students in traditional lecture courses were 1.5 times more likely to fail. The effect held across every STEM discipline studied.
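To make the 0.47 figure concrete, an effect size can be translated into a percentile shift. The sketch below assumes exam scores are roughly normally distributed with the same spread in both conditions, which is a simplification rather than something the headline number guarantees; `percentile_after_shift` is an illustrative name, not a function from the paper.

```python
from statistics import NormalDist


def percentile_after_shift(effect_size_sd: float) -> float:
    """Percentile reached by a median student whose score rises by effect_size_sd.

    Assumes a standard normal score distribution (an assumption of this sketch).
    """
    return NormalDist().cdf(effect_size_sd) * 100


# Freeman et al.'s pooled effect: 0.47 standard deviations.
shifted = percentile_after_shift(0.47)
print(round(shifted))  # 68
```

Under those assumptions, a student at the median of a lecture course would land at about the 68th percentile of that same distribution after a 0.47 SD gain.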

"These results support active learning as the preferred, empirically validated teaching practice in regular classrooms." Freeman et al., PNAS · 2014

### Feedback is the strongest classroom lever we have

[John Hattie and Helen Timperley's 2007 review in _Review of Educational Research_](https://journals.sagepub.com/doi/abs/10.3102/003465430298487) put the effect size of well-formed feedback on student achievement between 0.70 and 0.79 — extraordinarily large by educational-research standards. The crucial word is _well-formed_. Feedback that tells the student where they were trying to go, where they actually are, and what to do next outperforms feedback that just labels work as good or bad. Praise and grades, by themselves, are weak instructional tools; substantive, forward-looking feedback is a near-miracle.

Stack these findings together and the picture is clear. Learning is something a person does with effort, in interaction with material and with other people, under conditions where feedback can flow continuously and the student can act on it. Or as [Paul Black and Dylan Wiliam wrote in their landmark 1998 monograph _Inside the Black Box_](http://allianceforlearning.co.uk/wp-content/uploads/2017/03/William-and-Black-Inside-the-Black-Box.pdf), summarizing the case for formative assessment:

"There is a body of firm evidence that formative assessment is an essential component of classroom work and that its development can raise standards of achievement. We know of no other way of raising standards for which such a strong prima facie case can be made." Black & Wiliam · 1998

That was twenty-eight years ago. Most universities still grade mainly the final paper.

### The hidden engine: self-regulated learning

There is a fourth, less tidy finding that turns out to matter enormously in the AI era. Learning, even in adults, is regulated by the learner. The body of research on **self-regulated learning**, built up over decades primarily by Barry Zimmerman, Ernesto Panadero, and their collaborators, treats learning as a cycle: students set goals, plan how to reach them, monitor their own understanding, adjust their strategies when they notice they are off track, and reflect afterward on what worked.

Students who do this well outperform students who don't, even controlling for prior achievement. The skills are teachable but rarely taught explicitly. And — this is the part that matters now — when a student offloads cognitive work to an AI, the most important thing they often offload is not the answer. It is the _monitoring_. They stop noticing what they don't understand.

### The framework that named the practice: the 4Cs

In the early 2000s, the U.S.-based [Partnership for 21st Century Skills](https://www.battelleforkids.org/wp-content/uploads/2023/11/P21_framework_0816_2pgs.pdf), now hosted by Battelle for Kids, codified what eventually became known as the 4Cs: **critical thinking, communication, collaboration, and creativity**. The 4Cs are not a research result; they are a synthesis of what employers, educators, and policy bodies converged on as the durable, transferable capabilities every student should develop.

They are deliberately not a list of facts. They are a list of practices. You can't take a multiple-choice test on creativity. You have to do something creative, in front of someone who can judge it. This is the framework most schools claim to teach. It is also the framework most schools have the hardest time actually assessing — because the 4Cs are processes, and the conventional assessment infrastructure of higher education is built around products. The [Association of American Colleges and Universities' VALUE rubrics](https://www.aacu.org/value/rubrics), developed by faculty teams across more than two thousand institutions, are the most widely adopted attempt to bridge the gap.

The synthesis

#### Before ChatGPT, the research had already told us:

Learning is effortful, social, and continuous. The strongest tool a teacher has is timely, well-formed feedback inside an active task. The skills that matter most for a graduate's life are not facts but practices — critical thinking, communication, collaboration, creativity — and those practices have to be seen and judged over time. Assessment systems that compress all of this into a final product and a number are weak assessments even on their own terms.

What generative AI did was take that weakness and turn it into an emergency.

Chapter 03 · The constructs

## The 4Cs, _observed_ not assumed

The entire learning intelligence project depends on whether the 4Cs can be made observable. If "critical thinking" is a vibe, it cannot be evidenced and the field collapses into self-reporting. If it is a sequence of behaviors a person can be seen doing — well, then we can build something.

C₁

#### Critical thinking

How well a learner frames questions, evaluates evidence, reasons through alternatives, and revises judgments in response to counterevidence.

C₂

#### Communication

How well a learner makes ideas understandable and audience-appropriate across written, oral, and multimodal forms.

C₃

#### Collaboration

How productively a learner contributes to collective understanding, coordinates with peers, and incorporates feedback.

C₄

#### Creativity

How well a learner generates original combinations, explores alternatives, and iterates toward novel but useful solutions.

### Critical thinking, in five visible moves

The [VALUE rubric for critical thinking](https://www.aacu.org/initiatives/value-initiative/value-rubrics/value-rubrics-critical-thinking) breaks the construct into five observable dimensions: explanation of issues, evidence (selecting and using credible information), influence of context and assumptions, the student's own position (with appropriate complexity), and conclusions and related outcomes. Each of those leaves traces. A student frames a question well or badly. They cite a source or invent one. They acknowledge a counterargument or steamroll past it. They reach a conclusion that follows from their evidence or one that overreaches. In a writing assignment, almost every one of these moves is visible in the draft history if anyone bothers to look.

### Communication is behavioral too

The VALUE rubric for written communication asks about context and purpose, content development, genre and disciplinary conventions, sources and evidence, and control of syntax and mechanics. Every one of these is something a reader can see, and every one is something a draft history can reveal as a process. The student who revises for clarity is doing communication. The student who hands in a polished first draft they didn't write may have done communication on the page but not in their head.

### Collaboration is where assessment usually fails

Most universities give group grades. A group grade tells you almost nothing about which student did the collaborating. The VALUE rubric for teamwork instead asks about contributions to team meetings, facilitation of teammates' contributions, individual contributions outside meetings, fostering a constructive team climate, and responding to conflict. Each of these is observable in a peer-evaluation workflow, a shared document's revision log, or a structured peer-review system. The signal exists. We mostly don't capture it.

### Creativity, redefined as iteration

The [OECD's PISA 2022 creative thinking assessment](https://www.oecd.org/en/topics/sub-issues/creative-thinking/pisa-2022-creative-thinking.html) — the first time creativity has been measured at international scale, across 64 countries — defines the construct as the capacity to generate diverse and original ideas, _and_ to evaluate and improve upon ideas, in open-ended tasks across written, visual, scientific, and social problem-solving domains. The framework is deliberately not about a single output. It is about whether a student can generate alternatives, recognize promising ones, and improve them iteratively.

AI can produce a thousand decent ideas in a minute. What it cannot do, yet, is sit inside a learner's head and develop their ability to discriminate between them.

### The hidden fifth: metacognition

There is a fifth capability that increasingly belongs in this list, even though it is not part of the original 4Cs: **metacognition and self-regulated learning**. This is the practice of monitoring your own understanding, planning your approach, noticing when you're stuck, and choosing a better strategy. In a world where students can outsource almost every other cognitive operation to a machine, the one thing they cannot outsource is knowing whether they have actually understood something.

The work of Singh and colleagues at ASIS&T in 2025, which embedded metacognitive prompts into a generative AI search workflow, found that students who were nudged to evaluate the AI's answers — to ask themselves whether what they were reading actually addressed their question — engaged in deeper inquiry and were more discerning about AI responses. The intervention was small. The implication is not.

The 4Cs, plus metacognition, are the right scaffolding for the rest of this guide. They are not abstract. They are practices. They produce evidence. And the practices are exactly what generative AI most threatens to collapse if they are not deliberately preserved.

Chapter 04 · History

## Four years that _broke_ the model

The story of GenAI in education from the end of 2022 to the spring of 2026 is the story of an entire sector going through the stages of grief in under four years. The policy and product responses still alive today were forged in specific moments, and the moments still shape what is possible.

Nov 30, 2022

Release.

[OpenAI publishes ChatGPT](https://openai.com/index/chatgpt/), a free conversational interface on top of GPT-3.5. There is no official "education launch." There doesn't need to be one. Within days, students discover that the chatbot will write a passable essay on almost any topic in any voice, and the news travels through TikTok faster than any administrator can respond.

Jan 2023

Panic.

New York City Public Schools, the largest district in the United States, blocks ChatGPT on school networks. Multiple universities follow. Op-eds appear under headlines like "The College Essay is Dead." Some instructors switch to handwritten in-class essays. Most do nothing different because the semester is already underway.

Feb 2023

The detection arms race begins.

Turnitin announces that AI writing detection is coming to its products. Within months, a parallel industry of "AI humanizers" appears. Students begin running their AI-generated text through second AI tools to bypass detection. The detection market grows. The arms race accelerates.

May 2023

The first major policy document.

The U.S. Department of Education's Office of Educational Technology publishes [_Artificial Intelligence and the Future of Teaching and Learning_](https://www.ed.gov/sites/ed/files/documents/ai-report/ai-report.pdf). Its core message: AI should "augment human intelligence, not replace it," and the right response is not bans but informed adoption. The phrase "humans in the loop" appears repeatedly.

Aug 2023

Vanderbilt disables Turnitin's AI detector.

In one of the most consequential institutional decisions of the year, [Vanderbilt publicly turns off Turnitin's AI writing detection](https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/). The rationale is unusually candid. A 1% false positive rate would mean roughly 750 of Vanderbilt's 75,000 annual submissions being wrongly flagged. AI detectors were also more likely to flag text by non-native English speakers. The detection-first strategy started losing credibility a year before it started losing in court.

Sep 2023

UNESCO weighs in.

UNESCO publishes [the first global guidance document on generative AI in education](https://www.unesco.org/en/articles/guidance-generative-ai-education-and-research). It calls on governments to regulate use, protect student data, set age limits, and build AI literacy into curricula — framing the issue as a curriculum and pedagogy issue, not just a cheating issue.

Nov 2023

TEQSA reframes assessment.

Australia's Tertiary Education Quality and Standards Agency publishes [_Assessment Reform for the Age of Artificial Intelligence_](https://www.teqsa.gov.au/guides-resources/resources/corporate-publications/assessment-reform-age-artificial-intelligence). It is the first national regulator to move the conversation from detection to redesign. The premise: the assurance of learning is the institution's responsibility, and it cannot be discharged by trying to catch AI use. It has to be discharged by designing assessments that produce evidence AI cannot easily fake.

2024

Normalization.

The Digital Education Council's first global student survey finds [86% of students using AI regularly](https://www.digitaleducationcouncil.com/post/what-students-want-key-results-from-dec-global-ai-student-survey-2024); half do not feel "AI ready." OECD releases [PISA 2022 creative thinking results](https://www.oecd.org/en/topics/sub-issues/creative-thinking/pisa-2022-creative-thinking.html), the first internationally comparable measure of creative capability across 64 countries. Singapore tops the rankings. The conversation shifts from "how do we ban this" to "what should students actually be able to do."

2025

The institutional response hardens.

[EDUCAUSE's 2025 AI Landscape Study](https://www.educause.edu/content/2025/2025-educause-ai-landscape-study/introduction-and-key-findings) finds 57% of higher-ed institutions treat AI as a strategic priority (up from 49%), and 74% are focused on academic integrity. [Tyton Partners' _Time for Class 2025_](https://www.globenewswire.com/news-release/2025/06/11/3097384/0/en/Tyton-Partners-Releases-2025-Time-for-Class-Report-Institutions-Rebalance-Human-Connection-and-Digital-Innovation-in-Higher-Ed.html) finds 38% of faculty reporting increased workload from AI versus 11% reporting decreased.

Jun 2025

The Kestin RCT.

[Kestin et al. publish a randomized controlled trial in _Scientific Reports_](https://www.nature.com/articles/s41598-025-97652-6) in which roughly 180 Harvard physics students alternated weekly between in-class active learning and homework using a custom-built AI tutor. The AI tutor produced learning gains roughly **twice as large** as the active-learning sessions, along with higher engagement and motivation. The first major evidence that pedagogically designed AI can outperform what was previously considered the gold standard of in-person instruction.

Jul 29, 2025

Study Mode.

[OpenAI launches "Study Mode"](https://openai.com/index/chatgpt-study-mode/) in ChatGPT, built in consultation with pedagogy experts from over 40 institutions. Instead of producing direct answers, Study Mode uses guiding questions and Socratic prompting. The first major signal that the platform layer recognizes the difference between answering a question and teaching the person who asked.

Jan 2026

OECD names the problem.

[The OECD Digital Education Outlook 2026](https://www.oecd.org/en/publications/oecd-digital-education-outlook-2026_062a7394-en.html) introduces the phrase that will probably define the policy conversation for the rest of the decade: **false mastery**. Students who practiced math with a generic chatbot performed better in the moment but scored _up to 17% worse_ on subsequent closed-book exams than peers who studied alone. The output looked like learning. The learner had not actually learned.

Jan 2026

Faculty hit a wall.

The [Elon/AAC&U survey of 1,057 faculty](https://www.aacu.org/research/the-ai-challenge) publishes. 95% expect AI to increase student overreliance. 90% expect it to diminish critical thinking. 83% expect it to shorten attention spans. The faculty, three years in, are not enthusiastic about how this is going.

Mar 2026

HEPI publishes the new normal.

[HEPI's 2026 student survey](https://www.hepi.ac.uk/reports/student-generative-ai-survey-2026/) finds that AI use among UK undergraduates is now near universal, with 88% using GenAI for assessed work. The question fully shifts. It is no longer, _will students use AI?_ It is, _what are we able to certify about what they learned?_

The arc

#### Panic → ban → detect → regulate → integrate → re-examine.

The sector has not landed yet. But almost every policy body and serious researcher is now pointing toward the same place. Not better detection. Not faster grading. Not bigger LMS dashboards. _Better evidence of learning, captured during the learning itself._

That place is what learning intelligence is for.

Chapter 05 · The crisis

## When output becomes _cheap_

There is a moment in the life of any measurement system when the thing it measures stops being scarce, and the system stops being useful. The assessment crisis in education is that moment.

For most of the last century, a well-written paper was a reasonable proxy for a literate mind. It took hours of reading, drafting, revising, and re-reading to produce. The labor and the cognition were entangled. To submit the paper was, with allowances for cheating, also to have done the thinking. **The artifact was the evidence.**

That entanglement is what generative AI breaks. The price of a polished paragraph has collapsed. The price of a coherent five-page argument has collapsed. The price of a passable C+ undergraduate essay has collapsed approximately to zero.

A thermometer that returns the same number whether or not anything is hot has stopped being a thermometer.

This is not a moral observation. It is a measurement observation. Whatever those artifacts used to certify, they no longer certify in the same way.

### The validity crisis, not the cheating crisis

The polite name for this in the assessment literature is the **validity crisis**. The OECD's "false mastery" finding is one way to describe it. Students who practiced with a chatbot looked stronger during practice yet scored up to 17% worse than their peers on unaided exams afterward, a direct empirical demonstration that the metric (the practice score) is decoupling from the construct (what the student actually knows).

The [research on cognitive offloading](https://www.mdpi.com/2075-4698/15/1/6), much of it published in 2025, makes the mechanism more specific. When students hand cognitive work to AI tools, they engage in less of the self-monitoring and effortful retrieval that produce durable learning. Frequent AI use is now reliably correlated with weaker critical thinking, with cognitive offloading as the statistical mediator. The effect is strongest in younger students and those with less academic experience — the populations whose learning was most fragile to begin with.

### Why detection isn't the answer

The reflex response — find the AI text, punish it, restore the old contract — has been tested at scale and is not working. Vanderbilt's [public reasoning for disabling Turnitin's AI detector in 2023](https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/) is still the cleanest statement of why:

-   False positive rates that look acceptable in a vendor deck (1%, for example) translate into hundreds or thousands of wrongly accused students at institutional scale.
-   Non-native English speakers are systematically more likely to be flagged.
-   Detectors look at one snapshot of text, while AI generation models update continuously and rapidly outpace what detectors can learn.
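The first point is plain arithmetic, and it is worth seeing at scale. The sketch below is a minimal illustration: it assumes every submission is human-written and is flagged independently at the stated false-positive rate. The 75,000 figure and the 1% rate are Vanderbilt's; the function names and the 40-paper student are hypothetical.

```python
def expected_false_flags(submissions: int, false_positive_rate: float) -> float:
    """Expected number of human-written submissions wrongly flagged as AI."""
    return submissions * false_positive_rate


def prob_flagged_at_least_once(n_submissions: int, false_positive_rate: float) -> float:
    """Chance that an honest student is wrongly flagged at least once.

    Assumes flags on separate submissions are independent (an assumption
    of this sketch, not a property of any particular detector).
    """
    return 1 - (1 - false_positive_rate) ** n_submissions


# Vanderbilt's figures: 75,000 submissions a year at a 1% false-positive rate.
yearly_false_flags = expected_false_flags(75_000, 0.01)
print(yearly_false_flags)  # 750.0

# Hypothetical honest student who submits 40 papers over a degree.
degree_risk = prob_flagged_at_least_once(40, 0.01)
print(round(degree_risk, 3))  # 0.331
```

Under these assumptions, a rate that sounds negligible per paper becomes roughly a one-in-three chance of a wrongful flag over a student's degree.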

Multiple recent papers, including [Garland's 2026 mathematical framing](https://arxiv.org/abs/2510.03531) of the detection problem, argue that text-only one-shot detection is structurally incapable of achieving the fairness properties educational institutions need. Even Turnitin itself has shifted its public messaging, repositioning AI detection as one signal among many rather than a determinant of misconduct.

### The deeper problem

The detection failure is not an accident. It reflects a deeper truth: AI text is not, in any robust technical sense, distinguishable from human text. It is a category of writing, and that category is converging on the same prose features the academy taught us to value — fluency, clarity, organization, conventional grammar, formal register. We trained machines on those features, then asked them to produce text by maximizing those features, and we are now surprised that the resulting text is hard to distinguish from text by students whose teachers also asked them to produce text by maximizing those features.

What follows from this is uncomfortable but unavoidable. The artifact-only model of assessment was never very strong. It survived because the cost of producing acceptable artifacts was high enough that the artifact and the learning were _in practice_ yoked together. Once the yoke is removed, the artifact stops being able to do the assessment job.

What replaces it

#### An evidence model that doesn't depend on the artifact being scarce.

The answer cannot be "go back to in-person handwritten exams forever." Handwritten exams have their own validity problems, and no graduate of any of these institutions will go on to do their professional work without AI tools. The answer cannot be "trust the student" either — not because students are dishonest but because the question is about evidence, not honesty.

The answer has to be a different evidence model. One that can absorb the fact that AI is everywhere, in every step of the work, and still produce something an instructor can act on, a student can learn from, and an institution can defend.

Chapter 06 · The split

## There are _two_ AIs

It is tempting to conclude that generative AI is bad for learning. Many faculty have. The Elon/AAC&U numbers describe a faculty population that has seen, up close, what unconstrained AI use looks like. The evidence is more complicated than the faculty mood.

AI is not the variable. Pedagogy is.

There appear to be two AIs, distinguished not by which model is running but by how the model is used. **One AI helps learners. The other replaces them.** The same chatbot can do both within the same hour.

### The amplifier case

The case for AI as a learning amplifier is real and growing. The [Kestin et al. randomized controlled trial in _Scientific Reports_](https://www.nature.com/articles/s41598-025-97652-6) found that students using a carefully designed physics AI tutor — short responses, expert scaffolds, explicit step-by-step reasoning, guardrails against giving away answers — learned roughly twice as much per hour as students in an active-learning classroom.

The 2025 meta-analyses on ChatGPT and academic performance, and the 2026 meta-analyses on GenAI and educational outcomes in higher education, point in the same direction on average: AI-supported learning interventions produce significant positive effects on achievement, particularly when the intervention scaffolds the learning process rather than substituting for it.

### The erosion case

The case for AI as a learning erosion mechanism is equally real. The OECD Outlook's "false mastery" finding, the cognitive-offloading research, the workload survey showing faculty time absorbed by policing cheating and rebuilding assessments, the studies showing that students who lean heavily on AI tools score worse on subsequent unaided assessments: these are not noise. They describe what happens when AI is used without pedagogical structure.

AI is not a pedagogy. It is an amplifier of whichever pedagogy you bring to it.

Good designs become better. Bad designs become much worse. An assignment that was already a poor test of student understanding becomes a near-zero test of student understanding when AI is added. An assignment that was already focused on the process of thinking, with scaffolding and feedback and revision, can become much more powerful with AI as the tireless second reader.

### Learning work vs. output work

What separates the two cases is whether the AI is being used to do the _learning work_ or the _output work_.

Learning work

AI asks the student a question, makes them try, shows them where they went wrong, encourages them to try again, models the reasoning, then steps back. The student's neural circuits for the topic are exercised. Their schema is built. Their ability to do the work without the tool improves. This is the kind of support Bloom's famous "two sigma problem" was about: one-to-one tutoring that lifted achievement by roughly two standard deviations over conventional instruction. The Kestin study's two-fold gain over active learning is a different measure, but it points in the same direction.

Output work

The student types the prompt, takes the result, lightly edits it, and submits. The artifact looks the same as it did before, but no learning has happened. The student's circuits for thinking through the problem have not been exercised. Their ability to do the work without the tool has not improved and may have actively decayed. This is what cognitive offloading looks like in practice.

### Can AI be empathetic enough?

There is a third case worth naming, which is the question of whether AI can do the _relational_ work of education. The honest answer from the research is partial. The 2024 systematic review by Sorin and colleagues in _JMIR_ found that large language models can demonstrate elements of cognitive empathy — recognizing emotional content, producing supportive-sounding responses, sometimes outperforming rushed humans on perceived bedside manner. But they do not feel with the learner, and their "empathy" is prompt-sensitive, surface-level, and easily destabilized.

A 2024 quasi-experimental study in online higher education found that empathic chatbot feedback was comparable to teacher feedback on learning performance in that specific context. But a 2025 study on the "emotional cost of educational chatbots" found that students using a chatbot during an assignment reported significantly lower positive affect than peers who did not. A 2026 study on the "AI empathy choice paradox" found that people generally prefer to receive empathy from humans, even while rating AI-generated empathy as high quality when they receive it.

Role-specific, not categorical

#### AI can be empathetic enough for some roles, not for others.

AI is empathetic enough for first-line support, formative feedback, low-stakes encouragement, structured Socratic prompting, and some tutoring at large scale. It is not empathetic enough to replace faculty mentorship, the relational trust that underwrites belonging and challenge, high-stakes advising that shapes a student's life, or the moral seriousness that good teachers bring to difficult conversations.

Empathy is, in the end, a feature of accountability between persons. A chatbot is not accountable in that sense, and pretending otherwise is the same category mistake that has tripped up every wave of educational technology since the teaching machine.

Chapter 07 · The reframe

## From artifact to _process_: a new evidence model

If the artifact-only model of assessment is broken, what replaces it? The answer was already in the research literature well before generative AI made it urgent: evidence-centered design, which builds assessments around the evidence you actually need to support the claims you want to make.

The more steps you observe, the more confidently you can read what the student knows.

### Three converging literatures

**Evidence-centered design**, from Robert Mislevy and colleagues' work in the late 1990s, treats assessment as an argument structure. The argument starts with claims (what we want to say a student knows or can do), specifies the evidence that would support those claims, and only then designs tasks that elicit that evidence. The grade is the conclusion of an argument, not the start of one.
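As a minimal sketch of that argument structure, a claim, the evidence it needs, and the tasks that elicit that evidence can be written down as linked records. The class names, field names, and example strings below are hypothetical illustrations, not part of the evidence-centered design literature or of any product.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    elicits: set[str]  # labels of the evidence this task is designed to produce


@dataclass
class Claim:
    statement: str
    evidence_needed: set[str]
    tasks: list[Task] = field(default_factory=list)

    def unsupported_evidence(self) -> set[str]:
        """Evidence the claim needs that no current task elicits."""
        elicited = set()
        for task in self.tasks:
            elicited |= task.elicits
        return self.evidence_needed - elicited


# Hypothetical example: a final essay alone leaves part of the claim unsupported.
claim = Claim(
    statement="The student can defend a thesis against an alternative reading",
    evidence_needed={"thesis_from_sources", "response_to_counterargument"},
    tasks=[Task("final essay", elicits={"thesis_from_sources"})],
)
gaps = claim.unsupported_evidence()
```

Read this way, a gap is a design prompt: it names the evidence a new task (an oral defense, say) would have to produce before the grade can stand as a conclusion.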

**Stealth assessment** is the practice of capturing evidence of learning _during_ an authentic activity, rather than interrupting the activity with separate tests. Developed primarily in educational games and simulations, the principle generalizes: the evidence is built into the experience, not bolted on.

**Process data in large-scale assessments** captures not just student answers but the sequences of actions that produced them — which items they returned to, how long they spent on each step, how they revised. The PISA 2025 "Learning in the Digital World" framework treats iterative knowledge building and effective self-regulation with digital tools as integral parts of the competence being assessed.
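To make "process data" concrete, here is a small sketch that turns a timestamped action log into two of the signals just mentioned: time spent per item and the number of returns to an item. The log format and the `summarize` helper are assumptions made for illustration and do not reflect how PISA or any platform stores its data.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (seconds since start, item the student opened).
# The final "end" event only closes the last interval.
events = [
    (0, "q1"),
    (40, "q2"),
    (100, "q1"),
    (130, "q3"),
    (190, "end"),
]


def summarize(events):
    """Return (seconds spent per item, number of returns to each item)."""
    time_on = defaultdict(int)
    visits = Counter()
    # Pair each event with the next one to get the interval it opened.
    for (start, item), (next_start, _) in zip(events, events[1:]):
        time_on[item] += next_start - start
        visits[item] += 1
    revisits = {item: n - 1 for item, n in visits.items() if n > 1}
    return dict(time_on), revisits


time_on, revisits = summarize(events)
```

The same final answers could sit on top of very different logs, which is the point: the sequence carries information the answer sheet does not.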

The more steps you can observe between a student's first encounter with a problem and their final answer, the more confidently you can certify what they learned.

### The new menu of assessment patterns

The most robust assessment designs in the AI era are the ones that have been recommended for decades by serious assessment researchers but rarely adopted at scale because they are labor-intensive. Today, they are also the only ones whose evidence still holds up.

01

#### Staged writing assignments

Where the draft history is itself evidence. Students submit a planning document, outline, annotated bibliography, first draft, revision memo, and final draft, with feedback exchanges between each. The grade considers the trajectory, not just the endpoint. AI can be used at any stage and the system still produces signal about what the student actually engaged with.

02

#### Inquiry-driven discussion

Where the quality of the questions a student asks is itself a measurable thing. The Packback platform's Curiosity Score, for example, is not a generative AI scoring system; it is an algorithmic measure of the open-endedness, sourcing, and clarity of student-posed questions. A peer-reviewed study on 2,800 long-form assignments found that AI-assisted process feedback improved writing quality and reduced grading workload. The mechanism was process visibility, not output evaluation.

03

#### Social annotation

Where students mark up shared readings in front of each other before class. The annotation behavior is the evidence. A [study on the Perusall platform](https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2018.00008/full) in _Frontiers in Education_ found that pre-class annotation grades and discussion behavior together accounted for **41.8% of the variance in students' weekly post-class essay performance**. Annotation behavior is one of the most diagnostic single signals available in undergraduate teaching.

04

#### Peer review with calibration

Where students assess each other's work using rubrics and are themselves assessed on the quality of their feedback. The peer review is the assessment of collaboration. Done well, it produces evidence about both the reviewed student and the reviewer.

05

#### Oral defense components

Where students explain their work and answer follow-up questions. The oldest and most reliable assessment in the academy — the one PhD committees use precisely because they cannot easily certify a dissertation just by reading it. AI is structurally bad at fake-defending an argument it generated.

06

#### Reflective process notes

Where students articulate what they tried, what they learned, where they got stuck, and what they would do differently. Evidence of metacognition. Also, when done honestly, the cheapest learning intervention any course can add.

07

#### Explicit AI use disclosure

Where students describe what AI tools they used, for what purposes, and how they evaluated the results. The assessment-design move that does the most work for the least cost. It does not depend on detection; it does not depend on trust. It makes AI use a legible part of the assignment.

What all of these have in common is that they multiply the number of points where the student is visibly thinking. Each individual point is not necessarily harder to fake than a final essay. But together, they form a pattern that takes much more work to fake than to do, and that produces a much richer record for the instructor to read.

Chapter 08 · Definition

## Defining _learning intelligence_

It is time to be specific.

Working definition

#### Learning intelligence is the continuous collection and interpretation of evidence about how learning is happening, used to improve teaching, learning, and assessment.

The definition is one sentence long for a reason. The longer versions tend to obscure what the field is actually for. Learning intelligence is not a technology. It is a practice. The technology exists to enable the practice, the way the microscope exists to enable biology.

### What it isn't

Several other terms are in the air, and the differences matter:

Learning analytics

The established academic field, re-codified in 2025 by [SoLAR](https://www.solaresearch.org/2025/06/solars-learning-analytics-definition-taskforce-releases-report/) as "the collection, analysis, interpretation and communication of data about learners and their learning that provides theoretically relevant and actionable insights." Learning analytics is the intellectual parent. The difference is that classical learning analytics has tended toward retrospective dashboards (engagement, time-on-task, click counts) often weakly connected to specific learning constructs. Learning intelligence is more pedagogically opinionated, more assignment-native, and more focused on inferring constructs rather than reporting engagement.

AI in education

The broader category, including AI tutors and administrative chatbots. Learning intelligence is a specific use case within it. The distinguishing question is whether the system's primary output is _evidence a human can act on_.

Plagiarism & integrity detection

Has tried to colonize the assessment problem from the integrity side. Detection is a small, narrow, and now-discredited slice of what assessment needs. A learning intelligence system may include some integrity signals, but its main job is positive: showing what students did, not just flagging what they might not have done.

Adaptive learning

About adjusting content delivery to individual students. Learning intelligence is about understanding what a student's work reveals about their thinking. Adjacent, not identical.

### Four emerging variants

Within the learning intelligence space itself, four overlapping variants are emerging:

I

#### Instructional intelligence

Mostly K-12. AI embedded in curriculum and lesson delivery. Vendors like Kiddom and Subject. Signal: engagement and standards-aligned progress.

II

#### Authorship intelligence

Systems capture evidence of writing process, draft history, AI use, and revision behavior at the assignment level. Cadmus, Turnitin Clarity, Brisk's writing replay. Signal: process provenance.

III

#### Institutional intelligence

LMS and campus platforms aggregate outcomes across courses. Canvas Intelligent Insights, D2L Achievement+, Anthology's analytics suite. Signal: which courses and student segments are at risk?

IV

#### Process-native intelligence

Systems that capture evidence of student thinking inside high-cognition assignments — discussion, inquiry, writing, peer review, oral defense. Packback, FeedbackFruits, Perusall, Kritik. The variant with the most upside.

### Five principles

What these variants share is a set of design principles. The following five, taken together, distinguish a learning intelligence system from a generic AI tool or analytics dashboard.

1

#### Process over artifact

The primary unit of analysis is what the learner did, not just what they submitted. A system that only ingests final submissions cannot do learning intelligence in any robust sense.

2

#### Constructs over clicks

Raw events are inputs, not outputs. A learning intelligence system maps events to learning constructs (critical thinking, revision quality, collaboration depth) using transparent evidence models. Engagement metrics are at best weak proxies for learning; surfacing them as if they were learning is engagement theater.

3

#### Evidence over scores

The system's primary output should be evidence a human can interpret, not a black-box score. When the system does produce scores, the scores should be defensible: the user should be able to ask "why" and get a useful answer that points back to specific observed behavior.

4

#### Transparency over surveillance

Students should know what is being captured, why, and how it will be used. Teachers should be able to see and challenge the system's inferences. Institutional governance — privacy, retention, role-based access, audit logging — is part of the system, not an afterthought. This is a precondition of trust, and trust is a precondition of any assessment that actually changes student behavior.

5

#### Human-in-the-loop over autonomous judgment

The system supports instructor judgment; it does not replace it. High-stakes decisions — grades, integrity findings, intervention referrals — remain with humans. The system's job is to make those decisions better-informed, not to make them automatically.

Does it help the people in the room understand what is happening between them? That is the test for whether something is learning intelligence or just edtech with AI features.
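Principles 2 and 3 can be sketched in a few lines: raw events are mapped to constructs through an explicit evidence model, and every inference keeps the observations that support it, so the question "why?" has an answer. The event types, construct names, and `interpret` function below are hypothetical; a real evidence model would be designed and validated with faculty.

```python
from dataclasses import dataclass

# Hypothetical evidence model: which event types count toward which construct.
EVIDENCE_MODEL = {
    "revision_quality": {"revised_after_feedback", "rewrote_thesis"},
    "collaboration_depth": {"replied_to_peer", "gave_peer_feedback"},
}


@dataclass
class Inference:
    construct: str
    supporting_events: list[str]  # the answer to "why?"


def interpret(events: list[tuple[str, str]]) -> list[Inference]:
    """Map (event_type, description) pairs to constructs, keeping the evidence."""
    inferences = []
    for construct, event_types in EVIDENCE_MODEL.items():
        support = [desc for kind, desc in events if kind in event_types]
        if support:
            inferences.append(Inference(construct, support))
    return inferences


observed = [
    ("login", "opened the course page"),  # a click, not evidence of a construct
    ("revised_after_feedback", "rewrote paragraph 2 after instructor comment"),
    ("replied_to_peer", "answered a classmate's question with a source"),
]
result = interpret(observed)
```

Note that the login event contributes to nothing. That is the design choice behind "constructs over clicks": events with no place in the evidence model are not reported as learning.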

Chapter 09 · Anatomy

## How the _platform_ actually works

Definitions move quickly to abstraction. It is worth being concrete about the shape a learning intelligence system actually takes, because most products marketed under the "learning intelligence" banner over the next two years will be something else wearing the label. A serious platform sits on three layers, and a buyer can tell whether a product is real by asking which of the three are present.

Signals flow upward through the stack. Each layer fails on its own.

Layer 03

Synthesis & insight

**Rolls signals into views that match the audience.** A student should see their own evidence portfolio — what they did, what feedback they received, where they grew, where they are stuck.

An instructor should see a per-section view that answers the three questions worth asking in any teaching moment: _who needs help now, what evidence supports that inference, where should I intervene_. A department chair or chief academic officer should see aggregated, privacy-preserving rollups of 4C coverage across a course, cohort, program, or institution — the kind of view an accreditor can read and a budget committee can fund against.

Layer 02

Assignment types as instruments

**The vehicles through which evidence is produced.** No single activity surfaces all of the constructs defined in the foundation layer (Layer 01). Discussions surface inquiry quality and reasoning chains. Writing surfaces argument structure and revision behavior. Close reading and annotation surface comprehension. Peer review surfaces collaboration and meta-critical thinking. Team-based projects surface coordination. Oral defense surfaces explanation under pressure.

And a newer category — **AI-integrated assignments** — surfaces AI literacy itself: students prompt, evaluate, accept, reject, and reflect on AI use, and the entire trajectory becomes evidence the instructor can read. Whoever owns the assignment owns the learning intelligence.

Layer 01

Pedagogy and the constructs

**The foundation. The platform has to know what it is trying to measure.** The 4Cs are the most defensible anchor framework available, paired with what a growing number of practitioners now call AI literacy capabilities: _judgment_ (knowing when to trust an AI output), _explanation_ (knowing how to defend a choice an AI helped produce), _coordination_ (working with AI as one collaborator among many), and _agency_ (deciding when not to use it at all).

Without an explicit construct layer, every "insight" the system produces is a behavioral metric in search of meaning.

The three-layer test

#### A buyer can diagnose any product in this space with three questions.

A platform that has only Layer 3 is a dashboard pretending to be intelligence. A platform that has only Layer 2 is a workflow tool. A platform that has only Layer 1 is a framework, not a system. The serious work of the next two years in this category is to assemble all three.
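The test is simple enough to write as a function. This is a sketch of the diagnosis as stated above; the labels for products with two layers or none are assumptions added for completeness, since the text only names the single-layer and three-layer cases.

```python
def diagnose(has_constructs: bool, has_assignments: bool, has_synthesis: bool) -> str:
    """Label a product by which of the three layers it implements.

    Layer 1 = constructs, Layer 2 = assignment types, Layer 3 = synthesis.
    """
    present = (has_constructs, has_assignments, has_synthesis)
    if all(present):
        return "learning intelligence platform"
    if present == (False, False, True):
        return "dashboard"
    if present == (False, True, False):
        return "workflow tool"
    if present == (True, False, False):
        return "framework"
    if not any(present):
        return "none of the three layers"
    # Assumption: the source does not name the two-layer case.
    return "partial: two of three layers"
```

A buyer would fill in the three arguments from a product demo, for example `diagnose(False, False, True)` for an analytics product with no construct model and no assignment surface of its own.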

### The assignment-type taxonomy, mapped to the 4Cs

The strength of a learning intelligence platform's coverage of the 4Cs is the sum of the assignments it can host and the constructs it can map them to. A platform that hosts only one or two assignment types — discussion alone, or writing alone — is by definition a partial measurement system, regardless of how good its analytics layer is.

| Assignment type | What it evidences | Primary 4Cs |
| --- | --- | --- |
| Inquiry-driven discussion | Question quality, reasoning chains, response uptake, source-grounded debate. | Critical Thinking · Communication |
| Staged writing | Argument structure, source use, revision behavior, feedback uptake across draft history. | Critical Thinking · Communication |
| Close reading & annotation | Comprehension, engagement with text, ability to surface confusion, social sense-making. | Critical Thinking · Communication |
| Peer review & calibration | Quality of feedback given, reciprocity, ability to read another's work the way an instructor would. | Collaboration · Critical Thinking |
| Team-based work | Contribution equity, coordination moves, conflict resolution, collective problem-solving. | Collaboration · Creativity |
| Conversational reasoning & oral defense | Explanation under pressure, audience adaptation, ability to defend a position in dialogue. | Communication · Critical Thinking |
| Co-writing with AI | What the student asked, what they accepted, what they rejected, and why, captured as evidence. | Critical Thinking · AI Literacy |
| AI-integrated assignments | Judgment about when to use AI, explanation of choices, agency about when not to. The trajectory is the assessment. | AI Literacy · all 4Cs |
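The "partial measurement system" point can be checked mechanically. The sketch below encodes the Primary 4Cs column of the taxonomy and reports which of the 4Cs a given set of hosted assignment types leaves uncovered. AI Literacy is left out because it is not one of the 4Cs, and the `uncovered` helper is a hypothetical name introduced for illustration.

```python
FOUR_CS = {"Critical Thinking", "Communication", "Collaboration", "Creativity"}

# Primary 4Cs per assignment type, taken from the taxonomy above.
PRIMARY_CS = {
    "Inquiry-driven discussion": {"Critical Thinking", "Communication"},
    "Staged writing": {"Critical Thinking", "Communication"},
    "Close reading & annotation": {"Critical Thinking", "Communication"},
    "Peer review & calibration": {"Collaboration", "Critical Thinking"},
    "Team-based work": {"Collaboration", "Creativity"},
    "Conversational reasoning & oral defense": {"Communication", "Critical Thinking"},
    "Co-writing with AI": {"Critical Thinking"},
    "AI-integrated assignments": set(FOUR_CS),
}


def uncovered(hosted: list[str]) -> set[str]:
    """Return the 4Cs that none of the hosted assignment types evidences."""
    covered = set()
    for assignment_type in hosted:
        covered |= PRIMARY_CS[assignment_type]
    return FOUR_CS - covered


gaps = uncovered(["Inquiry-driven discussion", "Staged writing"])
```

A platform hosting only discussion and writing leaves Collaboration and Creativity without evidence, however strong its analytics layer is.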

### A maturity model for the institution

Institutions, like products, do not arrive at full learning intelligence overnight. A useful way to think about adoption is as three progressive postures an institution can take toward AI in its assessments, each enabled by a different set of platform capabilities.

Stage 01 · Foundation

AI Aware

The institution recognizes AI is in every classroom and makes its assessment more visible. Faculty use the 4Cs explicitly; assignments are designed with process visibility in mind. No longer pretending AI is absent — not yet doing anything specific about it.

-   4Cs framework & curriculum mapping
-   Faculty enablement & assignment design
-   Discussions and writing assignments
-   Engagement insights, course-level analytics

Stage 02 · Transition

AI Active

The institution now treats AI use itself as an object of instruction. Assignments include conversational reasoning, co-writing with AI, collaborative live thinking, and a metacognitive layer in which students reflect on what they asked and why. Faculty teach _with_ AI rather than policing it.

-   All Foundation capabilities
-   AI literacy curriculum & thought leadership
-   Conversational reasoning · co-writing assignments
-   Metacognitive layer · visible decision-making
-   Cohort-level insights · student journey view

Stage 03 · Intelligence

AI Native

AI is embedded throughout, and the institution measures not only the 4Cs but AI literacy itself. Full learning-journey visibility for faculty and administration. The institution can produce, on demand, defensible evidence of what its graduates can do with AI and without it.

-   All Transition capabilities
-   AI literacy outcomes measurement
-   Full assignment-type library
-   Learning-journey visibility for admin & faculty
-   AI literacy benchmarking across cohorts
-   Predictive engagement & risk signals

The point

#### "AI policy" is not a single decision. It is a position on a continuum.

The point of the maturity model is not that every institution should sprint to AI Native. The point is that the right position depends on faculty readiness, student population, governance maturity, and what the institution is trying to certify. A learning intelligence platform that respects this is one that meets the institution where it is and offers a clear path forward.

Chapter 10 · Practice

## How to teach _with_ AI

The categorical answer to "how should I teach with AI" is "it depends," which is true and useless. The operational answer, drawn from the research evidence assembled so far, is more specific.

#### For instructors

→

#### Start with what you are trying to certify.

The first question for AI redesign is whether your intended learning outcomes are still meaningful in a world where AI can produce most of the artifacts that previously evidenced them. If "the student can write a coherent five-page essay" is the outcome, that outcome is now a weak proxy for what the course most likely cares about, which is something more like "the student can read closely, generate a thesis from evidence, defend it against an alternative reading, and revise based on critique." The redesign starts by sharpening the outcome.

→

#### Make AI use explicit and bounded.

The single most leveraged move any instructor can make is to write an AI use policy into each assignment that specifies what AI tools may be used, for what stages, and with what disclosure. The policy can be permissive ("any AI tool, any stage, with a disclosure paragraph") or restrictive ("no AI tools for this exam"). What it cannot be is implicit. Implicit policies turn every assignment into a guessing game for students and a detection puzzle for instructors.

→

#### Stage the work.

Almost any high-cognition assignment can be broken into stages with checkpoints. A research paper can have a topic proposal, an annotated bibliography, an outline, a draft, a peer review exchange, a revision memo, and a final draft. AI cannot easily fake seven stages of legible thinking across two months.

→

#### Add a metacognitive layer.

The cheapest way to convert AI use from passive offloading into active learning is to require students to reflect on it. "What did you ask the AI? What did you accept? What did you reject? Why?" These four questions, answered in a short reflective paragraph attached to every AI-permitted assignment, do an enormous amount of work.

→

#### Reintroduce the voice.

A five-minute oral defense after a major paper produces more evidence about whether the student understood their argument than another five pages of writing would. AI is structurally bad at fake-defending an argument it generated. A student who genuinely wrote their paper can usually defend it. A student who didn't usually can't.

→

#### Use AI as a feedback amplifier, not a grading machine.

The role generative AI is best at in the classroom is the role of patient, tireless, immediate first reader. AI feedback on a draft — what is the thesis, what is the strongest evidence, what is the weakest, what is missing — is often genuinely useful, and it can be delivered at 2 a.m. when the student is working. What AI is bad at is making the final grading judgment that affects the student's transcript and life. The Kestin study's effect size came from the AI doing the tutoring; the grading remained with the instructor.

→

#### Calibrate peer review.

Peer review done badly is busywork that students don't trust. Done well, it produces evidence of collaboration, communication, and critical reading, and it is a powerful learning experience for the reviewer. The trick is calibration: train students on what good feedback looks like, give them rubrics, and assess them on the quality of their reviews as well as on their own work.

→

#### Talk about it openly with students.

This is the move faculty most often skip and most often regret. Students are not the enemy. Most of them want to learn. Many of them are confused, often legitimately, about what is and is not allowed. A short, honest conversation at the start of a course about why the assignments are designed the way they are does more to shape student behavior than any technical countermeasure.
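One way to keep an AI use policy explicit is to write it down as a small structured record per assignment, covering the three things named above: which tools, which stages, and what disclosure. The sketch below is one hypothetical shape for such a record; the field names, the stage labels, and the convention that `None` means "any tool" are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AIUsePolicy:
    allowed_tools: Optional[frozenset]  # None means any tool is allowed
    allowed_stages: frozenset           # empty means AI is allowed at no stage
    disclosure_required: bool

    def permits(self, tool: str, stage: str) -> bool:
        """True if the named tool may be used at the named stage."""
        tool_ok = self.allowed_tools is None or tool in self.allowed_tools
        return tool_ok and stage in self.allowed_stages


# Permissive: any AI tool, any stage of the paper, with a disclosure paragraph.
STAGES = frozenset({"outline", "draft", "revision"})
permissive = AIUsePolicy(allowed_tools=None, allowed_stages=STAGES, disclosure_required=True)

# Restrictive: no AI tools for this exam.
exam = AIUsePolicy(allowed_tools=frozenset(), allowed_stages=frozenset(), disclosure_required=False)
```

The record has no "unspecified" state. Whatever the instructor decides, the student can look up the answer instead of guessing.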

The bar has moved

#### None of this is exotic. It is mostly old.

Before generative AI, instructors who used these practices were doing exceptional teaching. After generative AI, instructors who don't use them are doing weak assessment, regardless of whether they realize it.

The good news is that the bar has moved toward the kind of teaching most teachers came into the profession wanting to do. Less grading of artifacts that may or may not represent student thought. More conversation. More feedback. More visible learning. The AI era is a forcing function on a transition the research literature has been quietly recommending for thirty years.

Chapter 11 · Institutions

## The _institutional_ layer

What individual faculty can do, by themselves, has limits. The transition learning intelligence describes is also an institutional transition, and the institutions that get this right will get it right at the system level.

The institutional response has so far been mixed. [EDUCAUSE's 2025 AI Landscape Study](https://www.educause.edu/content/2025/2025-educause-ai-landscape-study/introduction-and-key-findings) found 57% of higher-ed institutions now consider AI a strategic priority, up from 49% the previous year. The proportion with AI Acceptable Use Policies climbed from 23% to 39% in a single year. And yet only 9% reported that their cybersecurity and privacy policies were adequate for AI-related risks. The infrastructure is catching up to the urgency, slowly.

### Five institutional priorities

01

#### Get clarity on what the institution wants to certify.

Most colleges have learning outcomes documents. Most of those documents are aspirational and rarely consulted. The first step toward a defensible AI-era assessment strategy is to take those outcomes seriously enough to ask, for each one, what evidence the institution is actually generating, and whether that evidence is robust to AI. This is a faculty-governance conversation as much as it is an administrative one. It cannot be outsourced.

02

#### Align assessment design at the program level.

A degree certifies what the _program_ did, not what each course did. Programs need to map outcomes to courses, identify where each outcome is taught, practiced, and assessed, and ensure that the cumulative evidence supports the credential. A program where every course relies on take-home essays as its primary assessment is now structurally vulnerable. A program that distributes assessment across written, oral, project-based, and applied work is much more robust.

03

#### Fund the infrastructure for process evidence.

Capturing process evidence is more expensive than capturing artifact evidence. Draft histories take storage. Peer review systems take licenses. Oral defenses take time. The institutions that succeed will treat this as a capital expense, not an operating burden on individual instructors. The institutions that fail will quietly require faculty to do this work in addition to everything else they were already doing, at which point most faculty will quite reasonably refuse, and the assessment redesign will not happen.

04

#### Build the data governance before you need it.

Every learning intelligence system is also a student data system. The questions that matter — what data are collected, how long they are retained, who can see them, what inferences can be made — should be answered before the platforms are bought, not after. The Digital Education Council's 2024 survey found 80% of students said AI in universities was not fully meeting expectations, with 60% specifically worrying about the fairness of AI evaluations and 70% citing privacy. These are not abstract policy questions. They are trust prerequisites.

05

#### Invest in faculty development that respects faculty time.

The training that works is not generic AI literacy. It is discipline-specific assessment redesign, run by colleagues who teach similar courses, with concrete examples, model assignments, and time to revise actual syllabi. Institutions pairing instructional designers with faculty cohorts and offering course buyouts for redesign are seeing substantive curriculum change. Institutions that offered a webinar and a policy document are mostly still where they were three years ago.
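The program-level mapping in the second priority lends itself to a simple check: for each outcome, is there at least one course where it is taught, one where it is practiced, and one where it is assessed? The sketch below assumes a hypothetical curriculum map; the outcomes and course codes are invented for illustration.

```python
# Hypothetical curriculum map: outcome -> role -> courses playing that role.
curriculum_map = {
    "Defend an argument orally": {
        "taught": ["ENG 101"],
        "practiced": ["ENG 210"],
        "assessed": [],
    },
    "Revise based on critique": {
        "taught": ["ENG 101"],
        "practiced": ["ENG 101", "ENG 210"],
        "assessed": ["ENG 310"],
    },
}

ROLES = ("taught", "practiced", "assessed")


def coverage_gaps(cmap):
    """Return {outcome: [roles with no course]} for outcomes with any gap."""
    gaps = {}
    for outcome, roles in cmap.items():
        missing = [role for role in ROLES if not roles.get(role)]
        if missing:
            gaps[outcome] = missing
    return gaps


gaps = coverage_gaps(curriculum_map)
```

In this invented program, oral defense is taught and practiced but never assessed, so the credential cannot claim it on evidence.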

The deeper question

#### The institution has to decide what business it is in.

The transactional model — pay tuition, complete assignments, receive credential — has been weakening for years, and AI accelerates the weakening. If the credential is going to mean anything in 2030, it will need to mean that the institution can credibly say what the holder of it can do. Which means the institution needs to have evidence. Which means it needs to invest in producing that evidence, deliberately, at scale, across its curriculum.

When output becomes cheap, evidence becomes everything. The institutions that act on this will be in a strong position. The ones that don't will find that their credentials are slowly losing their power to certify anything an employer or a graduate school cares about, and that nobody can quite say when it happened.

Chapter 12 · The market

## Where to look in the _field_

A category is more real when you can see who is building inside it. The market for learning intelligence and AI-era assessment is still forming, and the categories below are more useful than vendor names — most companies fit primarily inside one of them, and the test for any product is what it actually does, not what it markets itself as.

The five categories below were derived by reading the field across higher education and K-12 vendor materials, peer-reviewed efficacy research where it exists, and the institutional buying patterns visible in EDUCAUSE, Tyton, and HEPI data. Each category solves a different piece of the assessment-after-AI problem. Each has a place in a well-considered assessment architecture. None of them, on its own, is yet a complete answer.

### The capability matrix

The matrix below is the fastest way to read the market. Categories run down the rows; the seven capabilities a learning intelligence platform needs to deliver run across the columns. A buyer can read it in under a minute and immediately see what each category gives them and what it leaves on the table.

| Category | Process evidence | Construct mapping (4Cs) | Multiple assignment types | Faculty-readable | Accreditor-defensible | Institutional rollups | LMS integration |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Process-native LI | ● | ● | ● | ● | ◐ | ◐ | ● |
| Integrity-first & detection | ◐ | ○ | ○ | ◐ | ◐ | ◐ | ● |
| LMS-native analytics | ○ | ◐ | ○ | ◐ | ● | ● | ● |
| AI tutoring | ◐ | ○ | ◐ | ○ | ○ | ○ | ◐ |
| K-12 instructional | ◐ | ◐ | ◐ | ● | ○ | ◐ | ◐ |

● Full capability · ◐ Partial · ○ Not in the category
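The matrix can also be read programmatically. The sketch below encodes each row as a string of levels and asks two questions: does any single category deliver all seven capabilities in full, and which pairs of categories do so together? The encoding and the helper name are assumptions made for illustration; the ratings are transcribed from the matrix above.

```python
from itertools import combinations

CAPABILITIES = [
    "Process evidence", "Construct mapping (4Cs)", "Multiple assignment types",
    "Faculty-readable", "Accreditor-defensible", "Institutional rollups",
    "LMS integration",
]

# One letter per capability column: F = full, P = partial, N = not in the category.
MATRIX = {
    "Process-native LI": "FFFFPPF",
    "Integrity-first & detection": "PNNPPPF",
    "LMS-native analytics": "NPNPFFF",
    "AI tutoring": "PNPNNNP",
    "K-12 instructional": "PPPFNPP",
}


def full_capabilities(category):
    """Capabilities the category delivers in full."""
    return {cap for cap, level in zip(CAPABILITIES, MATRIX[category]) if level == "F"}


everything = set(CAPABILITIES)
complete_alone = [c for c in MATRIX if full_capabilities(c) == everything]
complete_pairs = [
    pair for pair in combinations(MATRIX, 2)
    if full_capabilities(pair[0]) | full_capabilities(pair[1]) == everything
]
```

On these ratings no category is complete alone, and the only pair that covers every column in full is process-native learning intelligence combined with LMS-native analytics.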

### The five categories, explained

01 · The newest

#### Process-native learning intelligence

Products in this category capture evidence of student thinking inside specific high-cognition assignments — discussion, writing, peer review, oral defense, AI-integrated work — and map those signals to learning constructs like the 4Cs and AI literacy. They are pedagogically opinionated by design and are the only category architecturally aligned with the "process over artifact" thesis the rest of the field is now converging on.

**Strength** Produce defensible evidence that learning happened, before the final artifact is submitted.

**Gap** Cross-program institutional reporting at the scale of LMS analytics is still expanding. Most platforms stop at the assignment or course level.

**When to look** When the question is "how do we know learning happened" and the artifact alone is no longer trustworthy. Most useful for institutions facing assurance-of-learning pressure.

**Representative** Packback · Cadmus · FeedbackFruits · Perusall · Kritik

02 · The incumbent

#### Integrity-first and detection

The established category for academic-misconduct workflow. Products here detect similarity, identify likely AI-generated text, capture authorship transparency, and provide audit trails for hearing processes. The category leaders are now pivoting toward process visibility (notably Turnitin's Clarity product), recognizing that detection alone is structurally limited.

**Strength** Surface signals that a piece of work may not be the student's own. Maintain the institutional integrity infrastructure accreditors still expect.

**Gap** Cannot prove learning occurred (only flag where it may not have). [Vanderbilt's 2023 decision to disable Turnitin's AI detector](https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/) on fairness grounds was an early signal of waning institutional confidence; non-native English speakers continue to face 2–3× higher false-positive rates.

**When to look** When the institution still needs an integrity workflow for high-stakes assessments, with full awareness of false-positive risk and an explicit policy that detection alone is not sufficient evidence for misconduct.

**Representative** Turnitin · Originality.AI · GPTZero · Copyleaks

03 · The institutional layer

#### LMS-native analytics

The reporting infrastructure where provost- and CIO-level conversations already happen. Products here aggregate course-level data — submissions, grades, page views, login frequency — into dashboards that flag at-risk students and report on course health and outcome trends. They are where institutional accreditation reporting is already structured.

**Strength** Provide the institutional view. Native data flow from existing LMS adoption. Already trusted by accreditors for outcome reporting.

**Gap** Engagement counts are not learning measurements; participation and login frequency are weak proxies for whether anything has been learned. The 24–48 hour data lag also makes them retrospective by design, not real-time intervention tools.

**When to look** For the institutional reporting layer, not as a substitute for assignment-level evidence. Best deployed as the destination that ingests process evidence from elsewhere.

**Representative** Canvas Intelligent Insights · D2L Achievement+ and Lumi · Anthology Analytics for Learn

04 · The fastest-growing

#### AI tutoring

The category propelled by both consumer interest and pedagogically engineered systems. Products here scaffold individual study sessions, generate explanations, provide practice problems, and deliver immediate formative feedback. Some are extraordinarily effective when carefully designed; others are general-purpose chatbots in a study skin.

**Strength** Personalize learning at scale. The [Kestin et al. 2025 RCT in Scientific Reports](https://www.nature.com/articles/s41598-025-97652-6) showed a carefully designed AI tutor producing learning gains roughly twice those of in-class active learning.

**Gap** AI tutoring is an input to learning, not a measurement of it. A student who learned a topic through a tutor still needs an assessment surface that captures whether the learning held without the tool.

**When to look** For personalized study support, supplemental instruction, and outcomes-aligned remediation, but not for grading, certification, or program-level assurance.

**Representative** Khan Academy and Khanmigo · ChatGPT Study Mode · Squirrel AI · Carnegie Learning

05 · The K-12 wedge

#### K-12 instructional intelligence

The category that has most actively claimed the "learning intelligence" phrase, generally inside K-12 curriculum and lesson workflows. Products here align lessons to standards, deliver classroom-level visibility to teachers, and provide AI productivity tools for routine instructional tasks. The K-12 context — younger students, more standardization, less faculty autonomy — is structurally different from higher ed, and tools built primarily for it often do not generalize directly upward.

**Strength** Standardize AI use across schools and districts. Reduce teacher workload on lesson planning and feedback generation. Strongest district-level adoption motion in the field.

**Gap** Most are not yet evidence-architected for cross-program assurance in the way higher education will require. Strong in the classroom, weaker in the program-review or accreditation layer.

**When to look** For K-12 specifically, especially districts standardizing AI policy and seeking teacher-productivity gains.

**Representative** Kiddom · Subject · MagicSchool · Brisk

### Convergence is happening

The five categories overlap more than they used to. Integrity vendors are adding process-visibility features. LMS vendors are partnering with AI providers to embed tutoring and feedback directly. Process-native platforms are extending upward into the institutional reporting layer. K-12 vendors are eyeing higher ed.

The convergence is not random. Every category is moving toward the same destination: a system that captures evidence of how learning is happening, maps it to defensible constructs, and produces interpretable reports for the people who need them. That destination is what this guide has been calling learning intelligence throughout. The categories represent different starting points, not different end states.

This matters for the buyer because it means a vendor's category of origin tells you what they will do best in 2026 — but the maturity of their other layers tells you what they will be able to do in 2028. Integrity vendors moving into process visibility, LMS vendors moving into construct mapping, and process-native vendors moving into institutional rollups are all making the same bet on the same destination. The question is which ones will actually arrive.

A buyer's tool

#### Eight questions to ask any vendor

1.  Does the product capture process evidence, or only final artifacts?
2.  Can you map signals to specific learning constructs (such as the 4Cs and AI literacy), or only to engagement counts?
3.  Can a faculty member see _why_ a student was flagged or scored, in terms of observable behavior?
4.  Can a student see their own evidence, and contest incorrect inferences?
5.  Does the platform host multiple assignment types, or only one?
6.  Is the data architecture defensible to an accreditor, with rollups by course, cohort, and program?
7.  What student data is captured, how long is it retained, and who has access?
8.  Is any high-stakes judgment — grades, integrity findings, intervention referrals — held by humans?

The vendor that can answer all eight questions clearly, in plain language, without sales evasion, is the vendor worth talking to longer. When a vendor responds to half of them by talking about how innovative their AI is, that evasion is itself the answer to the question you actually asked.

CH 13 / 14

13

Chapter 13 · Limits

## Open questions and _limits_

Honesty requires acknowledging what the field of learning intelligence does not yet know, and what its risks are.

### The research base is still young

Most of the strongest empirical studies on AI in higher education are short-term, concentrated in a few disciplines (typically language learning or introductory STEM), and often built around carefully engineered interventions that may not generalize to the chatbot a typical student uses on a typical Tuesday night. The Kestin RCT is the strongest single piece of evidence for AI as a learning amplifier, and it is one study, in one course, with a custom-built tutor. Larger and longer trials are coming, but the field's current claims should be held with appropriate humility.

### Process measurement is hard

The process-data and stealth-assessment literatures are robust as frameworks but uneven as implementations. Most existing edtech "process data" is engagement counting in disguise. The hard work of mapping events to constructs, validating those mappings, and demonstrating that the resulting inferences are fair across student populations is mostly still ahead of us.
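The gap between engagement counting and construct mapping can be sketched in a few lines of Python. Everything here is hypothetical: the event names, the construct labels, and the mapping table are illustrative assumptions, not taken from any product or validated framework. The sketch shows only the shape of the problem; the validation and fairness work described above is exactly what code like this cannot supply.

```python
from collections import Counter

# Hypothetical mapping from logged event types to learning constructs.
# Each entry would need empirical validation; these are illustrative only.
EVENT_TO_CONSTRUCT = {
    "revised_thesis": "critical_thinking",
    "responded_to_peer_comment": "collaboration",
    "cited_counterargument": "critical_thinking",
    "asked_ai_to_explain_source": "ai_literacy",
}


def engagement_count(events):
    """Engagement counting: every event is worth one, whatever it was."""
    return len(events)


def construct_evidence(events):
    """Construct mapping: tally only events with a declared construct.

    Events missing from the mapping (logins, page views) are returned
    separately instead of being silently treated as evidence of learning.
    """
    mapped = Counter()
    unmapped = []
    for event in events:
        construct = EVENT_TO_CONSTRUCT.get(event)
        if construct is None:
            unmapped.append(event)
        else:
            mapped[construct] += 1
    return dict(mapped), unmapped


events = ["login", "page_view", "revised_thesis", "cited_counterargument", "login"]
print(engagement_count(events))
print(construct_evidence(events))
```

The same five events look like healthy activity under the first function and like two pieces of evidence for one construct under the second. Whether those two events really indicate critical thinking is the open validation question.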

### Affective AI is not ready

Multiple recent systematic reviews, including a 2026 review of 96 studies, have found that affective computing research in education tends to infer engagement, confusion, and frustration from facial expressions using CNNs, often without real classroom validation and almost always without serious ethical analysis. The safe use case is opt-in, low-stakes, instructor-facing support. Anything resembling automatic grading of "engagement" or "attention" from a webcam is not safe.

### Equity outcomes are unclear

A within-subject writing experiment in 2025 found that all students benefited from AI assistance but less-skilled writers benefited more — suggesting AI could narrow some performance gaps. The DEC and HEPI surveys, on the other hand, show socioeconomic divides in usage patterns. Whether AI is a leveler or an amplifier of existing inequalities is probably context-dependent, and learning intelligence systems should be designed to monitor and audit their own equity effects rather than assume them.
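One way a system could audit its own equity effects is to compare how often it wrongly flags students in different groups. The Python sketch below is a minimal illustration under stated assumptions: the record format, the group labels, and the sample numbers are all made up, and it presumes the institution has some ground truth about which flagged work was actually authentic, which in practice is hard to obtain.

```python
from collections import defaultdict


def false_positive_rates(records):
    """Return {group: false-positive rate} over work known to be authentic.

    Each record is (group, was_flagged, was_actually_misconduct). Only
    records where misconduct did NOT occur count toward the rate.
    """
    flagged = defaultdict(int)
    total = defaultdict(int)
    for group, was_flagged, was_misconduct in records:
        if was_misconduct:
            continue
        total[group] += 1
        if was_flagged:
            flagged[group] += 1
    return {group: flagged[group] / total[group] for group in total}


def disparity_ratio(rates, group, reference):
    """How many times higher one group's rate is than the reference group's."""
    if rates[reference] == 0:
        raise ValueError("reference group has a zero rate; ratio is undefined")
    return rates[group] / rates[reference]


# Made-up audit sample: 3 of 20 authentic submissions flagged in group_a,
# 1 of 20 in group_b, plus 2 genuine misconduct cases that are excluded.
records = (
    [("group_a", True, False)] * 3
    + [("group_a", False, False)] * 17
    + [("group_b", True, False)] * 1
    + [("group_b", False, False)] * 19
    + [("group_b", True, True)] * 2
)
rates = false_positive_rates(records)
print(rates)
print(disparity_ratio(rates, "group_a", "group_b"))
```

A ratio well above 1 between groups is a signal to investigate, not a verdict. Small samples make the ratio unstable, so a real audit would also report counts and uncertainty.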

### Privacy is the chronic risk

Capturing process evidence at high fidelity creates real surveillance risks. The safe defaults are data minimization, bounded retention, redaction, role-based access, and auditable model use. A learning intelligence system that does not respect these defaults will produce backlash that may delay the entire field.
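Three of these defaults (bounded retention, redaction by role, and an access trail) can be illustrated in a short Python sketch. The retention window, role names, and field names are hypothetical placeholders; real values would come from institutional governance, and the sketch does not cover auditing of model use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy values, for illustration only.
RETENTION = timedelta(days=365)
ROLE_FIELDS = {
    "instructor": {"student_id", "draft_text", "captured_at"},
    "program_reviewer": {"captured_at"},  # no identity, no raw text
}


@dataclass
class ProcessRecord:
    student_id: str
    draft_text: str
    captured_at: datetime


def is_expired(record, now):
    """Bounded retention: records older than RETENTION are due for deletion."""
    return now - record.captured_at > RETENTION


def view_for(role, record, audit_log):
    """Role-based access with an access trail.

    Returns only the fields the role may see; unknown roles see nothing.
    Every access attempt is appended to audit_log.
    """
    allowed = ROLE_FIELDS.get(role, set())
    audit_log.append((role, record.student_id, sorted(allowed)))
    return {name: getattr(record, name) for name in sorted(allowed)}


now = datetime(2026, 5, 1, tzinfo=timezone.utc)
record = ProcessRecord(
    student_id="s-001",
    draft_text="First draft of the essay.",
    captured_at=datetime(2025, 1, 15, tzinfo=timezone.utc),
)
log = []
print(is_expired(record, now))
print(view_for("program_reviewer", record, log))
print(view_for("vendor_support", record, log))
print(log)
```

The design choice worth noting is the deny-by-default lookup: a role missing from the policy table receives an empty view, and the attempt is still logged.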

### The category itself is contested

"Learning intelligence" is already being used by multiple organizations, including 1EdTech, Kiddom, Subject, and Brisk. The phrase has not been claimed by any single body and probably will not be. The thing that matters is not who owns the phrase but who builds the practice. The practice can succeed under several names. What cannot succeed is treating the phrase itself as a moat.

CH 14 / 14

14

Chapter 14 · Closing

## The post-output _classroom_

There is a temptation, in a moment like this, to be either apocalyptic or utopian. Either the universities are ending and the AI is winning, or the AI is liberating and the old gatekeepers should get out of the way. Both moods miss what is actually happening.

What is actually happening is that a measurement system whose limits had been quietly tolerated for decades has now broken in public. The old contract — submit the artifact, receive the grade, accumulate the credential — relied on a scarcity that has been quietly removed. Faculty are not wrong that something has been lost. Students are not wrong that something has been gained. Both are responding to real features of the situation.

The way out is not nostalgia and not utopia. It is to take what we have always known about how people learn, and what we have always known about how to assess what they have learned, and finally to build the infrastructure that takes those things seriously. The research has been telling us for thirty years that learning is a process, that feedback is the most powerful intervention we have, that the 4Cs are practices, that authentic assessment is the only assessment that produces durable evidence. We did not act on it at scale because the artifact-only model was good enough.

It is no longer good enough.

Learning intelligence is the name being attached, for now, to the work of building the new system. Whether the term sticks is less important than whether the practice does. The practice is captured in a few principles that have been the through-line of this entire guide. Watch the process, not just the product. Map signals to constructs, not to clicks. Treat evidence as something the human reads, not something the system decides. Be transparent about what is captured and why. Keep humans in the loop for any decision that matters.

A teacher in 2026 who designs an assignment that produces six points of legible thinking, gives students explicit AI use guidelines and a metacognitive reflection prompt, calibrates peer review, requires a short oral defense, and uses AI as a tireless feedback amplifier between drafts is not doing something exotic. They are doing the kind of teaching that the research literature has recommended since before most of their current students were born. The difference is that, in 2026, they are also doing the only kind of teaching whose evidence still holds up.

The institutions that fund this work, that align their assessment infrastructure to it, that protect their faculty's time to do it, and that build the data governance to support it without slipping into surveillance, will be the institutions whose credentials still mean something at the end of the decade. The institutions that don't will continue to graduate students whose transcripts certify achievements the institutions can no longer credibly verify.

The post-output classroom is here. It has been here, in fragments, for years. What learning intelligence is for is to assemble those fragments into a system the next generation of students can actually be learners inside.

The work

#### That is the work. It will take a decade.

It is the most interesting work in education right now.

---


Sources & further reading

## The evidence _behind_ this guide

This guide synthesizes peer-reviewed research, major institutional reports, and current sector survey data published between 1998 and early 2026. The most consequential sources are organized below by what they ground in the argument.

#### The pedagogy that still holds

-   [Freeman et al. (2014). Active learning increases student performance in science, engineering, and mathematics. _PNAS_.](https://www.pnas.org/doi/10.1073/pnas.1319030111) — Meta-analysis of 225 studies, 0.47 SD improvement under active learning.
-   [Hattie & Timperley (2007). The power of feedback. _Review of Educational Research_.](https://journals.sagepub.com/doi/abs/10.3102/003465430298487) — The canonical synthesis on feedback (effect sizes 0.70–0.79).
-   [Black & Wiliam (1998). Inside the Black Box.](http://allianceforlearning.co.uk/wp-content/uploads/2017/03/William-and-Black-Inside-the-Black-Box.pdf) — The case for formative assessment that reshaped classroom practice worldwide.

#### The 4Cs and authentic assessment

-   [Partnership for 21st Century Learning Framework](https://www.battelleforkids.org/wp-content/uploads/2023/11/P21_framework_0816_2pgs.pdf) — The codification of the 4Cs.
-   [AAC&U VALUE Rubrics](https://www.aacu.org/value/rubrics) — Sixteen faculty-developed rubrics for authentic assessment, used by 2,000+ institutions.
-   [PISA 2022 Creative Thinking Results](https://www.oecd.org/en/topics/sub-issues/creative-thinking/pisa-2022-creative-thinking.html) — The first international measurement of creative capability.

#### The surveys that define the moment

-   [HEPI Student Generative AI Survey 2026](https://www.hepi.ac.uk/reports/student-generative-ai-survey-2026/) — 88% of UK undergraduates using GenAI for assessed work.
-   [Elon University & AAC&U (2026). The AI Challenge.](https://www.aacu.org/research/the-ai-challenge) — 95% of faculty expect overreliance; 90% expect diminished critical thinking.
-   [Digital Education Council Global Student AI Survey 2024](https://www.digitaleducationcouncil.com/post/what-students-want-key-results-from-dec-global-ai-student-survey-2024) — 86% of students using AI regularly; 50% don't feel AI-ready.
-   [Tyton Partners. Time for Class 2025.](https://www.globenewswire.com/news-release/2025/06/11/3097384/0/en/Tyton-Partners-Releases-2025-Time-for-Class-Report-Institutions-Rebalance-Human-Connection-and-Digital-Innovation-in-Higher-Ed.html) — 38% of faculty report increased workload from AI vs 11% decreased.
-   [EDUCAUSE 2025 AI Landscape Study](https://www.educause.edu/content/2025/2025-educause-ai-landscape-study/introduction-and-key-findings) — Strategic-priority, AUP, and workforce data.

#### Policy and regulatory landmarks

-   [U.S. Department of Education (2023). Artificial Intelligence and the Future of Teaching and Learning.](https://www.ed.gov/sites/ed/files/documents/ai-report/ai-report.pdf)
-   [UNESCO (2023). Guidance for Generative AI in Education and Research.](https://www.unesco.org/en/articles/guidance-generative-ai-education-and-research)
-   [TEQSA (2023). Assessment Reform for the Age of Artificial Intelligence.](https://www.teqsa.gov.au/guides-resources/resources/corporate-publications/assessment-reform-age-artificial-intelligence)
-   [OECD Digital Education Outlook 2026](https://www.oecd.org/en/publications/oecd-digital-education-outlook-2026_062a7394-en.html) — The "false mastery" framing.
-   [Vanderbilt University (2023). Guidance on AI Detection.](https://www.vanderbilt.edu/brightspace/2023/08/16/guidance-on-ai-detection-and-why-were-disabling-turnitins-ai-detector/)

#### The new outcome studies

-   [Kestin et al. (2025). AI tutoring outperforms in-class active learning. _Scientific Reports_.](https://www.nature.com/articles/s41598-025-97652-6) — Harvard physics RCT showing roughly 2× learning gains with a designed AI tutor.
-   [OpenAI (2025). Introducing Study Mode.](https://openai.com/index/chatgpt-study-mode/) — The platform layer's move toward scaffolded learning.
-   [Gerlich (2025). AI Tools in Society: Impacts on Cognitive Offloading and the Future of Critical Thinking. _Societies_.](https://www.mdpi.com/2075-4698/15/1/6)
-   [Miller et al. (2018). Use of a Social Annotation Platform for Pre-Class Reading. _Frontiers in Education_.](https://www.frontiersin.org/journals/education/articles/10.3389/feduc.2018.00008/full) — Annotation behavior accounts for 41.8% of variance in post-class essay performance.

#### The category being defined

-   [SoLAR (2025). Updated definition of Learning Analytics.](https://www.solaresearch.org/2025/06/solars-learning-analytics-definition-taskforce-releases-report/)
-   [OpenAI (2022). Introducing ChatGPT.](https://openai.com/index/chatgpt/) — Where this all started.

For agents & technical readers

[**⌘ Agent-ready prompts** Five copyable prompts for Claude, ChatGPT, Cursor.](#prompts) [**↓ Download PDF (44 pages)** Print-ready edition with full citations.](learning-intelligence-guide.pdf)

---

[**⊞ Raw Markdown source** Full essay text, agent-friendly.](learning-intelligence-article.md)

For agents