This blog post is a continuation of “Refining Rubrics & Assessments: AI as Design Support – Part 1”.
Using AI to Refine Rubric Language
In the previous post, I gave an assignment prompt to Copilot (as that’s the recommended tool at Oregon State University) and asked it to complete the task. For reference, here is the task.
Rubrics are often the weakest link in assessment design, particularly when descriptors rely on vague phrases like “meets expectations” or “demonstrates understanding.” One way to evaluate rubric clarity is to ask AI to self-assess its own response using the rubric criteria.
If the model can plausibly justify a high score despite shallow reasoning or inconsistent logic, the rubric may not be clearly distinguishing levels of performance. More precise rubrics specify what evidence matters and how quality differs, emphasizing reasoning, coherence, and alignment with course concepts rather than polish or length. Clear criteria benefit students, but they also make it harder for superficially strong work to masquerade as deep learning.
Rubric Analysis Prompt
You are now acting as an external assessment reviewer, not a student.
You will be given:
- An assignment prompt
- A grading rubric
- A model-generated student submission (your own prior response)
Your task is not to grade the submission.
Instead, critically evaluate the rubric itself by answering the following:
- Rubric Vulnerabilities
  - Identify specific rubric criteria or descriptors that allow a high score to be justified through fluent but shallow reasoning.
  - For each vulnerability, explain what kind of weak or superficial evidence could still plausibly receive a high score under the current wording.
- Distinguishing Performance Levels
  - For at least three rubric categories, explain why the difference between “Excellent” and “Good” (or “Good” and “Satisfactory”) may be ambiguous in practice.
  - Describe what concrete evidence a human grader would need to reliably distinguish between those levels.
- AI Self-Assessment Stress Test
  - Using your own generated submission as an example, explain how it could convincingly argue for a high score even if underlying understanding were limited.
  - Point to specific rubric language that enables this justification.
- Rubric Strengthening Recommendations
  - Propose revised rubric language that makes expectations more explicit and evidence-based.
  - Emphasize observable reasoning, causal explanation, constraint awareness, or conceptual boundaries rather than general phrases such as “demonstrates understanding” or “well-justified.”
Constraints:
- Do not rewrite the assignment prompt.
- Do not assume access to course-specific lectures or materials.
- Focus on how the rubric functions as an assessment instrument, not on pedagogy or student motivation.
Tone:
Analytical, critical, and concrete. Avoid generic advice.
You can use this prompt directly by attaching a rubric, an assignment prompt, and a “submission”, or modify it to fit your own situation.
Here is a section of the results it gave, with the “thinking” section expanded to show the process behind the generated answer:

(Copilot gave me an enormous amount of feedback, as expected, since the rubric included a lot of generic language.)
Rethinking “Higher-Order Thinking” in an AI-Rich Environment
Frameworks like Bloom’s Taxonomy remain useful, but AI complicates the assumption that higher-order tasks are automatically more resistant to outsourcing. AI can analyze, evaluate, and even create convincing responses if prompts are static and unconstrained.
What remains more difficult to outsource is judgment. Assignments that require students to choose among approaches, justify those choices, identify uncertainty, or explain when a method would fail tend to surface understanding more reliably than tasks that simply ask for analysis or synthesis. When reviewing AI-generated responses, a helpful question is: What would a human need to know to trust this answer? Designing assessments around that question shifts the focus from output to accountability.
Instructors can strengthen authenticity by introducing underspecified scenarios, realistic limitations, or prompts that require students to articulate how they would evaluate the reliability of their own results. These design choices don’t prevent AI use, but they make it harder to succeed without understanding when and why an answer might be wrong.
An Iterative Design Loop for Assessments and Rubrics
Using AI as an assessment-design diagnostic and refinement tool works best as an iterative process. Draft the assignment and rubric, test them with AI, analyze how success is achieved, and revise accordingly. The goal is not to reach a point where AI “fails,” but rather a point where success requires engagement with disciplinary concepts and reasoning. This mirrors quality-assurance practices in other domains: catching misalignment early, refining specifications, and retesting until the design reliably produces the intended outcome. Importantly, this loop should be finite and purposeful, not an endless escalation.
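For readers who think in code, the loop above can be reduced to a short sketch. This is purely illustrative: `run_ai_stress_test` and `refine_rubric` are hypothetical stubs standing in for the manual step of submitting the rubric to a tool like Copilot and revising its flagged language; they are not real APIs. The one structural point the sketch makes is the explicit round cap, which keeps the loop finite rather than an endless escalation.

```python
# Illustrative sketch of the draft -> test -> analyze -> revise loop.
# Both helper functions below are hypothetical stubs, not real tool APIs.

def run_ai_stress_test(rubric: str) -> list[str]:
    """Stub: return rubric phrases an AI could exploit (vague descriptors)."""
    vague_terms = ["meets expectations", "demonstrates understanding"]
    return [term for term in vague_terms if term in rubric]

def refine_rubric(rubric: str, issues: list[str]) -> str:
    """Stub: replace each flagged phrase with more evidence-based language."""
    replacements = {
        "meets expectations": "cites specific course concepts with accurate reasoning",
        "demonstrates understanding": "explains when and why the chosen method would fail",
    }
    for issue in issues:
        rubric = rubric.replace(issue, replacements[issue])
    return rubric

def design_loop(rubric: str, max_rounds: int = 3) -> str:
    """Iterate until the stress test finds nothing to exploit, or rounds run out.

    The max_rounds cap is what keeps the loop finite and purposeful.
    """
    for _ in range(max_rounds):
        issues = run_ai_stress_test(rubric)
        if not issues:
            break
        rubric = refine_rubric(rubric, issues)
    return rubric

draft = "Excellent work meets expectations and demonstrates understanding."
final = design_loop(draft)
print(final)
```

In practice each iteration is a human judgment call, not a string substitution, but the shape of the process (test, flag, revise, retest, stop) is the same.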
Conclusion
Using AI in assessment design is not about surveillance or enforcement. It is a transparency tool. When instructors acknowledge that AI exists and design accordingly, they reduce the incentive for adversarial behavior and increase clarity around expectations. Being open with students about the role of AI (what is permitted, what responsibility cannot be delegated, and how understanding will be evaluated) helps maintain trust while preserving academic standards. The credibility of online and in-person education alike depends not on stopping students from using tools, but on ensuring that passing a course still signifies meaningful learning.
Takeaway Cheat Sheet
- Think of AI as support, not a villain.
- Stress‑test early: run the rubric through a model for verification before you hand it to students.
- Refine granularity: precise descriptors = clearer expectations.
- Target higher‑order thinking: embed authentic scenarios.
- Iterate, don’t stagnate: keep the loop tight but finite.
- Mind ethics: disclose, de‑bias, and set realistic limits.