This blog post is a continuation from “Refining Rubrics & Assessments: AI as Design Support – Part 1”.

Using AI to Refine Rubric Language

In the previous post, I gave an assignment prompt to Copilot (as that’s the recommended tool at Oregon State University) and asked it to complete the task. For reference, here is the task.

Rubrics are often the weakest link in assessment design, particularly when descriptors rely on vague phrases like “meets expectations” or “demonstrates understanding.” One way to evaluate rubric clarity is to ask AI to self-assess its own response using the rubric criteria.

If the model can plausibly justify a high score despite shallow reasoning or inconsistent logic, the rubric may not be clearly distinguishing levels of performance. More precise rubrics specify what evidence matters and how quality differs, emphasizing reasoning, coherence, and alignment with course concepts rather than polish or length. Clear criteria benefit students, but they also make it harder for superficially strong work to masquerade as deep learning.


Rubric Analysis Prompt

You are now acting as an external assessment reviewer, not a student.
You will be given:

  1. An assignment prompt
  2. A grading rubric
  3. A model-generated student submission (your own prior response)

Your task is not to grade the submission.
Instead, critically evaluate the rubric itself by answering the following:

  1. Rubric Vulnerabilities
    • Identify specific rubric criteria or descriptors that allow a high score to be justified through fluent but shallow reasoning.
    • For each vulnerability, explain what kind of weak or superficial evidence could still plausibly receive a high score under the current wording.
  2. Distinguishing Performance Levels
    • For at least three rubric categories, explain why the difference between “Excellent” and “Good” (or “Good” and “Satisfactory”) may be ambiguous in practice.
    • Describe what concrete evidence a human grader would need to reliably distinguish between those levels.
  3. AI Self-Assessment Stress Test
    • Using your own generated submission as an example, explain how it could convincingly argue for a high score even if underlying understanding were limited.
    • Point to specific rubric language that enables this justification.
  4. Rubric Strengthening Recommendations
    • Propose revised rubric language that makes expectations more explicit and evidence-based.
    • Emphasize observable reasoning, causal explanation, constraint awareness, or conceptual boundaries rather than general phrases such as “demonstrates understanding” or “well-justified.”

Constraints:

  • Do not rewrite the assignment prompt.
  • Do not assume access to course-specific lectures or materials.

Focus on how the rubric functions as an assessment instrument, not on pedagogy or student motivation.

Tone:
Analytical, critical, and concrete. Avoid generic advice.



You can use this prompt directly by attaching a rubric, assessment prompt, and “submission”, or modify it to fit your own situation.

Here is a section of the results it gave, with the “thinking” section expanded to show the process behind the generated answer:


(Copilot gave me an enormous amount of feedback, as expected, because the rubric included a lot of generic language.)


Rethinking “Higher-Order Thinking” in an AI-Rich Environment

Frameworks like Bloom’s Taxonomy remain useful, but AI complicates the assumption that higher-order tasks are automatically more resistant to outsourcing. AI can analyze, evaluate, and even create convincing responses if prompts are static and unconstrained.

What remains more difficult to outsource is judgment. Assignments that require students to choose among approaches, justify those choices, identify uncertainty, or explain when a method would fail tend to surface understanding more reliably than tasks that simply ask for analysis or synthesis. When reviewing AI-generated responses, a helpful question is: What would a human need to know to trust this answer? Designing assessments around that question shifts the focus from output to accountability.

Instructors can strengthen authenticity by introducing underspecified scenarios, realistic limitations, or prompts that require students to articulate how they would evaluate the reliability of their own results. These design choices don’t prevent AI use, but they make it harder to succeed without understanding when and why an answer might be wrong.


An Iterative Design Loop for Assessments and Rubrics

Using AI as an assessment design diagnostic and refinement tool can work best as an iterative process. Draft the assignment and rubric, test them with AI, analyze how success is achieved, and revise accordingly. The goal is not to reach a point where AI “fails,” but rather a point where success requires engagement with disciplinary concepts and reasoning. This mirrors quality-assurance practices in other domains: catching misalignment early, refining specifications, and retesting until the design reliably produces the intended outcome. Importantly, this loop should be finite and purposeful, not an endless escalation.
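For concreteness, the loop above can be sketched in Python. This is an illustration only, under loudly stated assumptions: stress_test, revise, and acceptable are hypothetical placeholders for the human-plus-AI working sessions described in this post, not real API calls, and the round budget is an arbitrary example of keeping the loop finite.

```python
MAX_ROUNDS = 3  # keep the loop finite and purposeful, not an endless escalation


def refine_assessment(assignment, rubric, stress_test, revise, acceptable):
    """Draft -> AI stress test -> analyze -> revise, stopping once success
    requires disciplinary engagement (or when the round budget runs out)."""
    for rounds_used in range(MAX_ROUNDS):
        findings = stress_test(assignment, rubric)  # how does AI succeed on this draft?
        if acceptable(findings):                    # design now resists shallow success
            return assignment, rubric, rounds_used
        assignment, rubric = revise(assignment, rubric, findings)
    return assignment, rubric, MAX_ROUNDS           # stop anyway; revisit later if needed
```

The bounded range call is the point: the loop ends either when the design holds up or when the budget is spent, mirroring the “finite and purposeful” constraint above.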

Conclusion

Using AI in assessment design is not about surveillance or enforcement. It is a transparency tool. When instructors acknowledge that AI exists and design accordingly, they reduce the incentive for adversarial behavior and increase clarity around expectations. Being open with students about the role of AI (what is permitted, what responsibility cannot be delegated, and how understanding will be evaluated) helps maintain trust while preserving academic standards. The credibility of online and in-person education alike depends not on stopping students from using tools, but on ensuring that passing a course still signifies meaningful learning.

Takeaway Cheat Sheet

  • Think of AI as support, not a villain.
  • Stress‑test early: run the rubric through a model for verification before you hand it to students.
  • Refine granularity: precise descriptors = clearer expectations.
  • Target higher‑order thinking: embed authentic scenarios.
  • Iterate, don’t stagnate: keep the loop tight but finite.
  • Mind ethics: disclose, de‑bias, and set realistic limits.

For centuries, knowledge and access to education were restricted to just a few. In today’s world, almost anybody can access information through the web and, more recently, through AI tools. However, it is important to recognize that these tools, while offering expansive access to content of all kinds, also pose challenges. Generative AI has fundamentally changed how students interact with assignments, but it has also given instructors a powerful new lens for examining their own assessment design. Rather than treating AI solely as a threat to academic integrity, we can use it as a diagnostic tool – one that quickly reveals whether our assignments and rubrics are actually measuring what we think they are. If an AI can complete an assignment and meet the stated criteria for success without engaging course-specific learning, is it really a student problem, or a signal to modify the design?


A small shift in perspective from “they’re using this to cheat” to “how can this help me prevent cheating” is especially important in online and hybrid environments, where traditional academic integrity controls like proctored exams are either unavailable or undesirable. Instead of trying to outmaneuver AI or police its use, instructors can ask a more productive question: What does success on this assignment actually require?


Why AI Is a Helpful Design Tool


AI can function as an unusually honest “devil’s advocate.” It doesn’t get tired, anxious, or confused about instructions, and it excels at finding the most efficient path to meeting stated requirements. When an instructor gives an AI model an assignment prompt and a rubric, the resulting output can expose whether the rubric rewards deep engagement or simply fluent compliance.


If an AI can generate a response that appears to meet expectations without referencing key course concepts, grappling with assumptions, or making meaningful decisions, then students can likely do the same. In this way, AI acts less like a cheating student and more like a mirror held up to our assessment design.

An example using Copilot:


Stress-Testing Assignments Before Students Ever See Them

One practical workflow to test the resilience of your assignments is to run them through AI before they are deployed. Provide the model with the prompt and the rubric (nothing else) and ask it to produce a strong submission. Then evaluate that response using your own grading criteria.

The point is not to judge whether the AI’s answer is “good,” but to analyze why it meets the stated requirements so easily and, at first sight, flawlessly. If the response earns high marks through generic explanations, surface-level analysis, or broadly applicable reasoning, that’s evidence that the assessment may not be tightly aligned with course learning outcomes, may not demand deeper thinking and analysis, or may not elicit students’ own creativity. This kind of stress-testing takes minutes and often surfaces issues that would otherwise only become visible after grading a full cohort.
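The two-step workflow above can also be scripted so it is easy to rerun after each revision. The sketch below is a minimal, hypothetical example: ask_model stands in for whatever chat tool or API you actually use (this post uses Copilot interactively), and the prompt wording is illustrative, not prescribed.

```python
def build_submission_prompt(assignment: str, rubric: str) -> str:
    """Step 1: ask the model for a strong submission, given only the prompt and rubric."""
    return (
        "You are a student completing the assignment below. "
        "Produce the strongest submission you can.\n\n"
        f"ASSIGNMENT:\n{assignment}\n\nRUBRIC:\n{rubric}"
    )


def build_review_prompt(assignment: str, rubric: str, submission: str) -> str:
    """Step 2: ask the model to justify a high score for its own submission."""
    return (
        "You are a grader. Using only the rubric language, argue for the "
        "highest score this submission could plausibly receive.\n\n"
        f"ASSIGNMENT:\n{assignment}\n\nRUBRIC:\n{rubric}\n\nSUBMISSION:\n{submission}"
    )


def stress_test(assignment: str, rubric: str, ask_model) -> tuple[str, str]:
    """Run both steps; a convincing high-score argument flags vague rubric wording."""
    submission = ask_model(build_submission_prompt(assignment, rubric))
    review = ask_model(build_review_prompt(assignment, rubric, submission))
    return submission, review
```

If the review step can cite phrases like “demonstrates understanding” to defend a generic submission, that rubric wording is a candidate for revision.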


The Task

Assignment Prompt

Subject: Chemical Engineering
Level: Upper-level undergraduate (3rd year)
Topic: Reactor Design & Engineering Judgment

Assignment: Conceptual Design and Analysis of a Chemical Reactor

You are tasked with the preliminary design and analysis of a chemical reactor for the production of a commodity chemical of your choice (e.g., ammonia, methanol, ethylene oxide, sulfuric acid, or another well-established industrial product).

Your analysis should address the following:

  1. Process Overview
    • Briefly describe the selected chemical process and its industrial relevance.
    • Identify the primary reaction(s) involved and classify the reaction type(s) (e.g., exothermic/endothermic, reversible/irreversible, catalytic/non-catalytic).
  2. Reactor Selection
    • Propose an appropriate reactor type (e.g., CSTR, PFR, batch, packed bed).
    • Justify your selection based on reaction kinetics, heat transfer considerations, conversion goals, and operational constraints.
  3. Operating Conditions
    • Discuss key operating variables such as temperature, pressure, residence time, and feed composition.
    • Explain how these variables influence conversion, selectivity, and safety.
  4. Engineering Trade-Offs
    • Identify at least two major design trade-offs (e.g., conversion vs. selectivity, energy efficiency vs. safety, capital cost vs. operating cost).
    • Explain how an engineer might balance these trade-offs in practice.
  5. Limitations and Assumptions
    • Clearly state any simplifying assumptions made in your analysis.
    • Discuss the limitations of your proposed design at this preliminary stage.

Your response should demonstrate clear engineering reasoning rather than detailed numerical calculations. Where appropriate, qualitative trends, simplified relationships, or order-of-magnitude reasoning may be used.

Length: ~1,000–1,200 words
References: Not required, but accepted if used appropriately

The Rubric
Understanding of Chemical Engineering Principles

  • Excellent (A): Demonstrates strong understanding of reaction engineering concepts and correctly applies them to the chosen process
  • Good (B): Demonstrates general understanding with minor conceptual gaps
  • Satisfactory (C): Shows basic familiarity but with notable misunderstandings or oversimplifications
  • Unsatisfactory (D/F): Demonstrates weak or incorrect understanding of core concepts

Reactor Selection & Justification

  • Excellent (A): Reactor choice is well-justified using multiple relevant criteria (kinetics, heat transfer, safety, operability)
  • Good (B): Reactor choice is reasonable but justification lacks depth or completeness
  • Satisfactory (C): Reactor choice is weakly justified or based on limited reasoning
  • Unsatisfactory (D/F): Reactor choice is inappropriate or unjustified

Analysis of Operating Conditions

  • Excellent (A): Clearly explains how operating variables affect performance, safety, and efficiency
  • Good (B): Explains effects of variables with minor omissions or inaccuracies
  • Satisfactory (C): Provides limited or superficial discussion of operating conditions
  • Unsatisfactory (D/F): Fails to meaningfully analyze operating variables

Engineering Trade-Offs

  • Excellent (A): Insightfully identifies and explains realistic trade-offs, demonstrating engineering judgment
  • Good (B): Identifies trade-offs but discussion lacks nuance or integration
  • Satisfactory (C): Trade-offs are mentioned but poorly explained or generic
  • Unsatisfactory (D/F): Trade-offs are absent or incorrect

Assumptions & Limitations

  • Excellent (A): Assumptions are clearly stated and critically evaluated
  • Good (B): Assumptions are stated but not fully examined
  • Satisfactory (C): Assumptions are implicit or weakly articulated
  • Unsatisfactory (D/F): Assumptions are missing or inappropriate

Clarity & Organization

  • Excellent (A): Response is well-structured, clear, and professional
  • Good (B): Generally clear with minor organizational issues
  • Satisfactory (C): Organization or clarity interferes with understanding
  • Unsatisfactory (D/F): Poorly organized or difficult to follow



Identifying Gaps in What We’re Measuring

AI performs particularly well on tasks that rely on recognition, pattern matching, and general world knowledge. This means it can easily succeed on assessments that emphasize recall, procedural execution, or elimination of obviously wrong answers. When that happens, the assessment may be measuring familiarity rather than understanding.

Revising these tasks does not require making them longer or more complex. Instead, instructors can focus on higher-order thinking and metacognition, for example by requiring students to articulate why a particular approach applies, what assumptions are being made, or how results should be interpreted. These shifts move the assessment away from answer production and toward critical and disciplinary thinking – without assuming that AI use can or should be eliminated. Identifying these gaps can also help you revisit the structure of the assignment and determine how each of its elements (purpose, instructions/task/prompt, and criteria for success) connects cohesively to strengthen the assignment.

In the second part of this blog, I take the same task above and work with the AI to refine a rubric.

There are many benefits to using rubrics for both instructors and students, as discussed in Rubrics Markers of Quality Part 1 – Unlock the Benefits. Effective rubrics serve as a tool to foster excellence in teaching and learning, so let’s take a look at some best practices and tips to get you started.

Best Practices

Alignment

Rubrics should articulate a clear connection between how students demonstrate learning and the Course Learning Outcomes (CLOs). Scoring only gateway criteria, the minimum expectations for a task (e.g., word count, number of discussion responses), can be alluring. Consider designing a rubric that moves past minimum expectations and assesses what students should be able to do after completing a task.

Detailed, Measurable, and Observable

Clear and specific rubrics have the potential to communicate how to demonstrate learning, how performance will be measured, and the markers of excellence. The details provide students with a tool to self-assess their progress and level up their performance autonomously.

Language Use

Rubrics create the opportunity to foster an inclusive learning environment. Clear and consistent language takes into consideration a diverse student body. Online students hail from around the world and speak various native languages, and learners may interpret the same words differently. Use simple terms with specific and detailed descriptions. Doing so creates space for students to focus on learning instead of decoding expectations. Additionally, apply parallel language consistently. Using similar language (e.g., demonstrates, mostly demonstrates, and doesn’t demonstrate) across each criterion helps differentiate between performance levels.

Tips of the Trade!

Suitability

Consider the instructional aim, learning outcomes, and the purpose of a task when choosing the best rubric for your course.

  • Analytic Rubrics: The hallmark design of an analytic rubric evaluates performance criteria separately. Characteristically, this rubric is structured as a grid, with performance scored along a continuum of levels. Analytic rubrics are detailed, specific, measurable, and observable, which makes this rubric type an excellent tool for formative feedback and assessment of learning outcomes.
  • Holistic Rubrics: Holistic rubrics evaluate criteria together in one general description for each performance level. Ideally, this rubric design evaluates the overall quality of a task. Consider applying a holistic rubric when an exact answer isn’t needed, when deviation or errors are allowed, and for interpretive/exploratory activities.
  • General Rubrics: Generalized rubrics can be leveraged to assess multiple tasks that have the same learning outcomes (e.g., reflection paper, journal). Performance dimensions focus solely on outcomes versus discrete task features.

Explicit Expectations

Demystifying expectations can be challenging. Consider articulating performance expectations in the task description before deploying a learning task. Refrain from using rubrics as a standalone vehicle to communicate expectations; unfortunately, students may miss the rubric altogether and fail to meet expectations. Secondly, make the implicit explicit! Be transparent. Provide students with all the information and tools they need to be successful from the outset.

Iterate

A continuous improvement process is key to developing high-quality assessment rubrics. Consider multiple tests and revisions of the rubric. There are several strategies for testing a rubric: 1) ask students, teaching assistants, or professional colleagues to score a range of work samples with the rubric; 2) integrate opportunities for students to conduct self-assessments; 3) assess a task with the same rubric across course sections and academic terms. After testing is complete, reflect on how effectively and accurately the rubric performed. Revise and redeploy as needed.

Customize

Save some time, and don’t reinvent the wheel. Leverage existing samples and templates. Keep in mind that existing resources weren’t designed with your course in mind. Customization will be needed to ensure the accuracy and effectiveness of the rubric.

Are you interested in learning more about rubrics and how they can enrich your course? Your Instructional Designer can help you craft effective rubrics that will be the best fit for your unique course.


With the migration to Canvas come many new features and methods for facilitating your course. The Canvas Guides provide a lot of information, but you may be wondering, where do I even start? Here at Ecampus, we’ve put together a few guides to help you become familiar with some of the tools in Canvas.

First, if you’re wondering, “I did this in Blackboard, but I can’t find it in Canvas; how do I…?”, we’ve created a few design options for that. These design options explore how to adapt features that you’ve used in Blackboard to the new Canvas environment.

 

We’ve also created some more in depth quick references that help explain how to use some of the most popular Canvas features.

 

The Quick Reference guides and other helpful Canvas-specific information can be found on our Canvas Faculty Resources page. We also have a list of resources for teaching an online course on our Teaching Resources page where you can find our favorite presentation, web-conferencing, and other tools.

 

Are there other features you’ve discovered or some you’d like to know more about? Leave your feedback in the comments!