Many educators are understandably concerned about the impact of AI on education. There is a fear that students may rely on generative tools to complete coursework instead of engaging deeply with the material.
Our work started from a different concern altogether.
When you have been working in analytical chemistry for years (or decades), certain ideas become so natural that you stop noticing how conceptually demanding they are for beginners. You say things like, “As we all know…” or “You should already be familiar with this from the previous course.” But what feels obvious to the expert may still be transformative for the novice.
In that sense, the calibration problem does not lie only with students; it often lies with instructors.
Threshold concepts
For readers unfamiliar with the term, a “threshold concept” is a concept that changes how a learner sees the discipline. Once grasped, it reorganizes the learner’s understanding in a way that is transformative and, in many cases, irreversible. It is less about memorizing content and more about crossing into the field’s way of thinking.
In analytical chemistry, professional identity is deeply tied to how we reason about measurement and evidence: how we judge data quality, interpret signals, validate methods, and make decisions under constraints. Students do not merely accumulate techniques; they gradually internalize a way of thinking about evidence.
Crossing a threshold concept marks a moment along that journey when a novice moves closer to becoming an analytical chemist.
Take precision and accuracy, for example. On the surface, they can look like a purely definitional distinction. But in practice, confusion here can distort how students interpret method performance and make laboratory decisions. Our upcoming paper argues that precision and accuracy may function as a threshold concept in analytical chemistry precisely because they reshape how measurement is understood and operationalized.
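To make the distinction concrete, here is a minimal sketch (not taken from our study) that summarizes two invented sets of replicate results for a hypothetical reference material certified at 10.00 mg/L. The bias (mean minus reference value) captures systematic error, which is what “accuracy” usually refers to in this pairing; the sample standard deviation captures precision. The function name `describe` and all numbers are illustrative assumptions.

```python
from statistics import mean, stdev


def describe(replicates, reference_value):
    """Return (bias, sample standard deviation) for replicate results.

    Bias = mean - reference value: the systematic part (trueness/accuracy).
    Standard deviation: the random scatter (precision).
    """
    bias = mean(replicates) - reference_value
    spread = stdev(replicates)
    return bias, spread


# Hypothetical replicate results in mg/L; reference value assumed to be 10.00 mg/L.
precise_but_biased = [10.48, 10.51, 10.49, 10.50, 10.52]
accurate_but_scattered = [9.60, 10.45, 9.85, 10.30, 9.80]

for label, data in [("precise but biased", precise_but_biased),
                    ("accurate but scattered", accurate_but_scattered)]:
    bias, spread = describe(data, 10.00)
    print(f"{label}: bias = {bias:+.2f} mg/L, s = {spread:.2f} mg/L")
```

The first set clusters tightly around the wrong value; the second centres on the right value but scatters widely. A student who judges a method only by how closely replicates agree would prefer the first set, which is exactly the kind of decision this distinction is meant to inform.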
We have also explored concepts such as chemical equilibrium and speciation: ideas that may appear “covered” in earlier courses but repeatedly become troublesome when students must apply them in real analytical contexts.
The difficulty is that experts no longer experience these ideas as transformative. They have long since crossed the threshold.
Why AI?
Our project, funded by a Brazilian research agency (FAPESP), uses generative AI as a disciplined way to produce testable hypotheses for experts to evaluate.
We developed a two-stage workflow.
Stage 1 focuses on building and refining a prompt to analyze short excerpts from analytical chemistry textbooks, course texts, and research articles on analytical chemistry education. The prompt operationalizes widely discussed threshold-concept criteria from the education literature. For each excerpt, the model must surface candidate concepts and classify them by strength (for example, as strong, plausible, or weak candidates) according to how well they align with threshold-concept characteristics. This stage is “AI-supported” but heavily prompt-disciplined: the goal is not a free-form brainstorm but a structured, criterion-anchored scan that can be repeated and audited.
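One way to keep such a scan repeatable and auditable is to require machine-readable output and validate it before anything is aggregated. The sketch below is hypothetical: the JSON field names (`concept`, `strength`, `criteria`), the three strength labels, and the criteria vocabulary (the five characteristics commonly cited in the threshold-concept literature: transformative, irreversible, integrative, bounded, troublesome) are assumptions for illustration, not the actual schema of our prompt.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Strength(Enum):
    STRONG = "strong"
    PLAUSIBLE = "plausible"
    WEAK = "weak"


# Assumed criteria vocabulary; a real prompt may use different labels.
CRITERIA = {"transformative", "irreversible", "integrative", "bounded", "troublesome"}


@dataclass(frozen=True)
class Candidate:
    excerpt_id: str
    theme: str
    concept: str
    strength: Strength
    criteria_met: tuple  # sorted criterion labels the model cited


def parse_stage1_output(raw_json: str, excerpt_id: str, theme: str) -> list:
    """Validate one excerpt's model output (hypothetical JSON list) into records.

    Raises ValueError on unknown strength labels or criteria, so malformed
    runs are caught instead of silently entering the aggregation step.
    """
    candidates = []
    for record in json.loads(raw_json):
        strength = Strength(record["strength"])  # ValueError if label is unexpected
        criteria = tuple(sorted(record["criteria"]))
        unknown = set(criteria) - CRITERIA
        if unknown:
            raise ValueError(f"unknown criteria: {sorted(unknown)}")
        candidates.append(Candidate(excerpt_id, theme, record["concept"], strength, criteria))
    return candidates


raw = '[{"concept": "bias vs. scatter", "strength": "strong", "criteria": ["troublesome", "transformative"]}]'
print(parse_stage1_output(raw, "excerpt-01", "precision and accuracy"))
```

Rejecting output that falls outside the agreed labels is a design choice: it trades some flexibility for records that can be compared across repeated runs and inspected by a human reviewer.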
Stage 2 takes the structured output from Stage 1 and aggregates it by theme. In other words, instead of treating each excerpt in isolation, we bring together all excerpt-level outputs related to a given theme (for example, precision and accuracy) and then look for convergences across sources. The result is a synthesized, themed shortlist designed specifically for expert adjudication: a list that analytical chemistry specialists can review, challenge, and refine, ultimately deciding whether it is genuinely useful for identifying threshold concept candidates in the discipline.
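The idea of “convergence across sources” can be sketched as a simple grouping step. In this hypothetical version, each Stage 1 row is a tuple `(source_id, theme, concept, strength)`, a concept makes the shortlist only if at least `min_sources` distinct sources propose it, and survivors are ranked by source count and then by a cumulative strength score. The scoring weights, the threshold, and the sample rows are all invented for illustration; they are not our actual aggregation rules.

```python
from collections import defaultdict

# Assumed weights for the Stage 1 strength labels.
STRENGTH_SCORE = {"strong": 2, "plausible": 1, "weak": 0}


def shortlist(rows, theme, min_sources=2):
    """Group rows for one theme; keep concepts proposed by >= min_sources sources."""
    sources = defaultdict(set)
    scores = defaultdict(int)
    for source_id, row_theme, concept, strength in rows:
        if row_theme != theme:
            continue
        sources[concept].add(source_id)
        scores[concept] += STRENGTH_SCORE[strength]
    kept = [c for c in sources if len(sources[c]) >= min_sources]
    # Rank by number of distinct sources, then by total strength, then alphabetically.
    return sorted(kept, key=lambda c: (-len(sources[c]), -scores[c], c))


# Invented example rows, for illustration only.
rows = [
    ("textbook-A", "precision and accuracy", "systematic vs. random error", "strong"),
    ("course-notes-B", "precision and accuracy", "systematic vs. random error", "plausible"),
    ("article-C", "precision and accuracy", "systematic vs. random error", "strong"),
    ("textbook-A", "precision and accuracy", "significant figures", "weak"),
    ("article-C", "precision and accuracy", "validation as evidence", "plausible"),
    ("course-notes-B", "precision and accuracy", "validation as evidence", "strong"),
    ("textbook-A", "chemical equilibrium", "activity vs. concentration", "strong"),
]

print(shortlist(rows, "precision and accuracy"))
# ['systematic vs. random error', 'validation as evidence']
```

Here “significant figures” drops out because only one source proposes it. The output is a ranked shortlist for experts to adjudicate, not a verdict.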
The crucial point is this: the AI does not declare threshold concepts; it proposes candidates and patterns, and the human experts remain the decision-makers.
AI as a calibration tool
The metaphor of calibration is central to our thinking.
In instrumental analysis, if a calibration curve carries a systematic error, every measurement downstream will be biased in the same direction. Something similar can happen in education.
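A small numerical sketch shows why the analogy works. It assumes a hypothetical linear detector response and standards labelled 0 to 8 mg/L that were actually prepared from a stock 5% more concentrated than stated (a proportional systematic error). Noise is omitted so that only the bias remains. The response factor, the 5% error, and the helper names `fit_line` and `predict` are illustrative assumptions.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(signal, slope, intercept):
    """Invert the calibration line to report a concentration."""
    return (signal - intercept) / slope


TRUE_SLOPE = 0.050  # hypothetical detector response, signal units per mg/L

# Standards are labelled with nominal values but are actually 5% more concentrated.
nominal = [0.0, 2.0, 4.0, 6.0, 8.0]
actual = [c * 1.05 for c in nominal]
signals = [TRUE_SLOPE * c for c in actual]  # noise-free, to isolate the bias

slope, intercept = fit_line(nominal, signals)

for true_conc in [1.0, 3.0, 5.0, 7.0]:
    reported = predict(TRUE_SLOPE * true_conc, slope, intercept)
    print(f"true {true_conc:.1f} mg/L -> reported {reported:.3f} mg/L")
```

Every sample is under-reported by the same factor (1/1.05). Repeating the measurement more carefully would not remove this error, because it sits in the reference frame itself.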
If instructors systematically underestimate the conceptual difficulty of certain ideas, they introduce a persistent bias into the learning process. Students’ struggles are then interpreted as carelessness or lack of preparation, rather than as signals of a deeper conceptual transition.
By surfacing candidate threshold concepts from disciplinary texts and then consolidating convergent patterns across multiple sources, AI can act like a mirror held up to the discipline. It helps reveal where “as we all know” may be masking genuine conceptual thresholds.
What AI can see that we might miss
Traditional curriculum design relies heavily on expert consensus. But consensus can sometimes conceal blind spots.
AI can help detect signals distributed across multiple sources: recurring conceptual tensions, repeated explanatory moves, and patterns that are hard to notice when you read one text at a time. When you then aggregate these patterns by theme, you often get a clearer picture of where learners may be crossing (or failing to cross) the same conceptual threshold again and again.
AI also disrupts the comfort of naturalized discourse and forces a second look at what we assume students “should already know.” In practice, this can reshape how instructors think about sequencing, emphasis, and assessment: if threshold concepts are real “crossings,” curricula should not treat every topic as equally weighted. Some ideas may deserve more time, more scaffolding, and more workplace-proximal practice than a content-heavy syllabus typically allows.
A legitimate concern, however, is whether such a framework risks drifting into abstract educational theory.
We avoid that by keeping the focus on laboratory reasoning and professional practice. The goal is not to create a purely conceptual taxonomy, but to identify threshold concept candidates that matter for how students interpret results, evaluate methods, and justify decisions in realistic analytical scenarios.
What we have observed so far
In preliminary tests across three themes (precision and accuracy, chemical equilibrium, and speciation), we saw increasing stability in excerpt-level identification as the Stage 1 prompt was refined. When we then moved to Stage 2, the themed aggregation helped produce more coherent shortlists by highlighting convergences across sources rather than isolated claims from single excerpts.
We are now entering the phase in which analytical chemistry experts evaluate whether these aggregated outputs are genuinely useful for identifying threshold concept candidates and for informing the design of teaching and assessment.
One of the most productive moments is not when AI produces a list, but when an expert pauses and says, “I hadn’t thought about it that way.” That pause is the beginning of recalibration.
Looking ahead
If becoming an analytical chemist involves crossing conceptual thresholds, then curriculum design becomes less about content accumulation and more about calibrated conceptual progression. Future analytical scientists must be able to reason under uncertainty, integrate cross-disciplinary knowledge, and make defensible decisions in real-world contexts. Identifying and teaching through threshold concepts may be one way to prepare them more deliberately for that role.
