A Framework for Evaluating AI-Powered Education: Purpose, Embodiment, and Pedagogical Application
- SRGE
Summary: AI tools in education are easier to evaluate when teams share a common vocabulary. Three concepts do the work: purpose (what the tool does), embodiment (how it appears to the learner), and pedagogical application (whether it is pastoral, instructional, or metacognitive). Together, these categories let ed-tech professionals compare tools, interpret research, and make clearer decisions.
Written by Alex@srge.co
Education technology teams evaluate AI tools constantly, but those conversations often stall because the vocabulary isn't shared. The three concepts below (purpose, embodiment, and pedagogical application) give teams a common foundation for discussing what a tool does, how it presents itself, and what learning function it serves.
Purpose describes what the AI actually does. A tool may serve one purpose or several:
- Providing access to a knowledge base
- Motivating or inspiring learners through emotional or psychological stimuli
- Promoting analytical, divergent, critical, or creative thinking
- Simulating a context, role, or persona for the learner to interact with
- Providing formative or summative feedback on learner work
- Monitoring learner progress and delivering recommendations or instruction based on that data
Embodiment describes how the tool appears to the user. Some systems are visible: a chatbot the student types to, or an avatar they interact with directly. Others are invisible: the student notices only that content adapts to them, without knowing an AI model drives it.
Pedagogical application describes the learning function the tool serves. In a course context, tools fall into three categories:
- Pastoral: supports the instructor or course administrator. Handles assessment, monitoring, and reporting. The AI serves the operator, not the student.
- Instructional: delivers or guides content directly to the student. Acts as a tutor or on-demand explainer. This category carries the strongest research evidence.
- Metacognitive: helps students develop awareness of their own learning. Rather than delivering content, it asks students to reflect, predict, or explain their thinking before receiving feedback. The goal is durable learning: students who can self-correct.
Three examples:
An example of a Pastoral system, created by Habeed et al., is a disembodied system that reduces the effort a teacher must spend administering viva voce performance-based exams. Their system uses the ITS model, built on four components: a Domain Model (the subject knowledge base), a Student Model (tracking individual progress and misconceptions), a Tutor Model (determining when and how to intervene), and a User Interface (the interaction layer). Together, these components continuously assess where a student stands, compare that against expert knowledge, and respond accordingly.
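The four-component ITS loop can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: the concept names, scoring rule, and hint logic are all assumptions made for the example.

```python
# Illustrative sketch of the classic four-component ITS architecture.
# All names and logic here are hypothetical; see the linked article
# for the real system.

# Domain Model: the expert knowledge base (concept -> correct answer).
DOMAIN_MODEL = {
    "ohms_law": "V = I * R",
    "kirchhoff_current": "currents into a node sum to zero",
}

class StudentModel:
    """Tracks what the learner has demonstrated so far."""
    def __init__(self):
        self.mastered = set()
        self.misconceptions = set()

    def record(self, concept, correct):
        (self.mastered if correct else self.misconceptions).add(concept)

def tutor_model(student, concept):
    """Decides when and how to intervene, based on the student model."""
    if concept in student.misconceptions:
        return f"Hint: recall that {DOMAIN_MODEL[concept]}."
    if concept in student.mastered:
        return "Looks good, move on to the next question."
    return "Please explain your reasoning for this step."

# User Interface layer: check a response against the domain model,
# update the student model, and surface the tutor's next move.
def interact(student, concept, response):
    correct = response.strip().lower() == DOMAIN_MODEL[concept].lower()
    student.record(concept, correct)
    return tutor_model(student, concept)
```

Even at this scale, the division of labor is visible: the domain model holds the expert knowledge, the student model holds the learner's state, and the tutor model mediates between them.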
Original Article: https://link.springer.com/article/10.1007/s10639-025-13755-7
An example of an Instructional system is the Harvard GPT-4 Physics Tutor. This GPT-4-powered Socratic tutoring system sits inside an undergraduate physics course. Students work through problem sets with AI guidance rather than static answer keys. The tutor asks probing questions, confirms correct reasoning, and redirects errors; it does not give answers directly. Kestin et al. used Bloom's Taxonomy, a framework for assessing learning outcomes, to evaluate it. They also note that the system's success may have depended on the fact that the builders were also the subject matter experts teaching a quantitative course to college students.
Original Article: https://pmc.ncbi.nlm.nih.gov/articles/PMC12179260/#Sec12
An example of a metacognitive tool is the Longevity Games Interview Simulator, developed to prepare undergraduate students for real-world research interviews with older adults. Built on GPT-4o and Claude-3.7, the simulator places students in repeatable, low-stakes interview scenarios generated from real demographic data. After each simulation, students complete a reflective exercise where they evaluate their own information-gathering and ethical reasoning around protected health information. The system does not test what students know; it asks them to examine how they think and why they made the decisions they did. A small pilot with senior nursing students showed improvements in confidence, preparedness, and ethical awareness. The authors also note the system's potential for adaptation across disciplines, suggesting its metacognitive scaffolding is not domain-dependent.
Original Article: https://journals.asm.org/doi/10.1128/jmbe.00122-25
Comparing the examples:
Each example served multiple purposes. Examples 1 and 2 provided access to a knowledge base, promoted critical thinking, and delivered formative feedback. Example 3 simulated a real interaction, provided feedback, and promoted critical thinking. Example 1 is disembodied; it helped the teacher facilitate and the student perform, but neither user communicated with a system meant to present as human. Example 2 is semi-embodied; users talk with a chatbot, but see no face and simulate no real-world interaction. Example 3 is fully embodied; users simulate a real person while knowing they interact with AI. Artificiality is the point.
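One way to operationalize this kind of comparison is to record each tool's purposes, embodiment, and pedagogical application as structured data. The field names and purpose labels below are assumptions chosen for illustration, not a standard schema.

```python
# Hypothetical encoding of the three examples using the article's vocabulary.
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolProfile:
    name: str
    application: str   # "pastoral" | "instructional" | "metacognitive"
    embodiment: str    # "disembodied" | "semi-embodied" | "embodied"
    purposes: frozenset  # one tool may serve several purposes

TOOLS = [
    ToolProfile("Viva voce ITS (Habeed et al.)", "pastoral", "disembodied",
                frozenset({"knowledge-base", "critical-thinking",
                           "formative-feedback"})),
    ToolProfile("Harvard GPT-4 Physics Tutor", "instructional", "semi-embodied",
                frozenset({"knowledge-base", "critical-thinking",
                           "formative-feedback"})),
    ToolProfile("Longevity Games Interview Simulator", "metacognitive",
                "embodied",
                frozenset({"simulation", "critical-thinking",
                           "formative-feedback"})),
]

def shared_purposes(tools):
    """Purposes every tool in the list serves, useful when comparing candidates."""
    return frozenset.intersection(*(t.purposes for t in tools))
```

A team evaluating several candidate tools could fill in a profile per tool and then ask structural questions (which purposes overlap, which embodiments are represented) instead of debating from memory.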
To review:
Pedagogical application sits at the highest level. A system is instructional, metacognitive, or pastoral. Each application runs on a system that is embodied, disembodied, or semi-embodied. And each application serves one purpose or a collection of purposes that together define it. For ed-tech professionals, this vocabulary makes it easier to evaluate tools, align teams, and engage with the research.
If you are interested in discussing this topic further, visit srge.co or email alex@srge.co