Intentional AI Spotlight: Jonathan Kalodimos on assessing learning when students can’t hide behind the text

By Demian Hommel, CTL AI in Teaching and Learning Fellow in partnership with the AI Literacy Center

Stories of AI at OSU

As part of the Intentional AI at OSU series, I sat down with Jonathan Kalodimos, an associate professor of finance and Harley & Brigitte Smith Fellow. Jon's background includes time as a financial economist at the U.S. Securities and Exchange Commission, and he brings that same empirical rigor to a problem now facing every department on campus: generative AI has made written work unreliable as a signal of what students actually know. His response is to build a system that makes oral evaluation, where you can hear students think, feasible at scale.

The challenge: Seeing what students actually know

In a traditional classroom, written assignments have been the primary window into student understanding. Generative AI has clouded that window. For Jon, the challenge goes deeper than academic integrity:

  • Writing has become opaque: A well-prompted chatbot can produce a polished essay or financial analysis indistinguishable from strong student work. The signals faculty have relied on for decades—clarity of argument, depth of analysis—are now easily manufactured. Detection software is unreliable and shifts the focus from learning to policing.
  • Oral evaluation reveals reasoning: When a student presents orally, they have to explain, connect, and defend ideas. You get a direct signal of what they actually understand. But grading presentations is one of the most time-intensive things faculty do, which keeps most instructors from assigning them at the frequency students need.
  • The gap is structural: Faculty need a system that handles the labor-intensive parts of evaluating oral work—extracting evidence, mapping it to criteria—so they can focus their limited time on exercising judgment.

The innovation: A faculty-AI partnership for oral assessment

Jon has built a system at OSU that processes student video presentations and produces evidence-based evaluation drafts for faculty review—a partnership that makes better assessment sustainable.

  • Evidence extraction: The system goes through every word a student said and every piece of evidence in their slides, then pulls out the specific evidence relevant to each rubric criterion. The output is a draft showing the instructor exactly what the student demonstrated, with each point tied to specific evidence extracted from the presentation.
  • Cognitive classification using Bloom’s taxonomy: Beyond collecting evidence, the system maps what it finds against Bloom’s taxonomy. Did the student merely recall a definition or explain a concept in their own words? Apply a method or analyze trade-offs between approaches? This gives faculty a structured starting point for evaluating the depth of student thinking, which they can adjust based on their own professional judgment.
  • Domain-agnostic by design: Because the system evaluates what students say and write on their slides rather than discipline-specific structures, it works across disciplines. The same framework applies whether a finance student is defending a valuation, a geography student is presenting research, or an engineering student is walking through a technical design. What changes is the rubric, not the system.
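The workflow described above can be sketched in a few lines of code. This is a minimal, illustrative sketch only: the names (`RubricCriterion`, `extract_evidence`, `classify_bloom`) and the simple keyword matching are assumptions for demonstration, not Jon's actual implementation, which would presumably use a language model rather than string matching.

```python
# Hypothetical sketch of an evidence-extraction pipeline for oral assessment.
# All names and the keyword-based matching below are illustrative assumptions.
from dataclasses import dataclass


@dataclass
class RubricCriterion:
    name: str
    keywords: list  # observable terms the instructor expects to hear


def extract_evidence(transcript: str, criteria: list) -> dict:
    """Pull transcript sentences that bear on each rubric criterion."""
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    evidence = {c.name: [] for c in criteria}
    for sentence in sentences:
        lower = sentence.lower()
        for criterion in criteria:
            if any(kw in lower for kw in criterion.keywords):
                evidence[criterion.name].append(sentence)
    return evidence


def classify_bloom(sentence: str) -> str:
    """Very rough Bloom-level guess from verb cues (stand-in for an LLM call)."""
    cues = {
        "analyze": ["compare", "trade-off", "because", "implies"],
        "apply": ["use ", "apply", "calculate", "compute"],
        "understand": ["means", "explain", "in other words"],
    }
    lower = sentence.lower()
    for level, verbs in cues.items():
        if any(v in lower for v in verbs):
            return level
    return "remember"


# Usage: one criterion, a two-sentence transcript snippet
criteria = [RubricCriterion("DCF valuation", ["discount", "cash flow"])]
transcript = (
    "A discounted cash flow model means we value future cash flows today. "
    "We compute the discount rate from the firm's cost of capital."
)
draft = extract_evidence(transcript, criteria)
for sentence in draft["DCF valuation"]:
    print(classify_bloom(sentence), "->", sentence)
```

The point of the sketch is the division of labor: the machine gathers and organizes evidence against criteria, and the instructor's judgment is applied to a structured draft rather than to raw video.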

Reflection: AI as a partner in better assessment

Most of the time faculty spend evaluating oral work goes to gathering evidence, not making judgments. Jon’s system redirects that balance: if AI can reliably surface what a student said and map it to your criteria, you spend your time on the decisions that require your expertise.

Key advice for faculty

  • Start with oral, even small: As written work becomes easier for AI to generate, consider adding even short oral components—a three-minute video explanation of a concept gives you a window into student reasoning that a polished essay no longer provides.
  • Write criteria around what’s observable: Whether or not you use AI-assisted evaluation, rubrics built around observable behaviors produce better feedback. “Student explains the relationship between X and Y using a specific example” is actionable. “Student demonstrates deep understanding” is not.
  • Treat AI outputs as drafts: AI evaluation systems tend to be generous; they treat any mention of the right concepts as evidence of comprehension. Faculty review is the mechanism that maintains rigor.
  • Think beyond your own classroom: The assessment challenges created by generative AI aren’t unique to any single discipline. Faculty who engage with these approaches now are helping shape what AI-assisted education looks like across the institution.


About the Author: Demian Hommel is a professor of geography and environmental science in the College of Earth, Ocean, and Atmospheric Sciences and is an AI in Teaching and Learning Fellow with the OSU Center for Teaching and Learning. When he isn’t exploring the societal and environmental impacts of AI, you can find him DJing under the alias Dr. Gonzo or trying to graft citrus trees in his greenhouse.


Top image generated with Microsoft Copilot.
