The Stochastic Clock

What use would a guessing clock be? We are often told that even a broken clock tells the right time twice a day. But what about a clock that randomly presents a time when a button is pressed? If the clock is random, it will, of course, be mostly wrong. But, from time to time, we can imagine that the clock does, in fact, chance upon the right time.

Imagine waking up and pressing the clock to see the time: 6.55 am. Time to get up. Or is it? Maybe the clock is wrong. Maybe it is the middle of the night. Or maybe you have overslept and it is much later.

What use would such a clock be? Without some measure of reliability this clock would be useless. Actually, no. It would be worse than useless, since people might use it, believing it to be accurate. The fact that from time to time it would be correct is no help because, of course, we don’t know when it’s right or wrong.

But maybe we can independently verify the time it tells. Perhaps we could check with another source (another clock, or the position of the sun in the sky). But if there were another clock available, why on earth would we use this guessing clock at all? If we could read the time from the position of the sun, again, wouldn’t that make our guessing clock redundant?

Let’s imagine now that our guessing clock benefits from a mysterious technological improvement, making it correct 90% (or 99%) of the time, but that when it’s wrong, it’s very wrong: not just a minute or two slow or fast, but completely, randomly wrong. When we press our improved clock, do we trust the displayed time? Why not? Chances are it is correct.
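
To make the intuition concrete, here is a minimal sketch in Python of such a clock, assuming it shows the true time with probability p and otherwise a uniformly random minute of the day. The function name and figures are illustrative, not part of the original thought experiment:

    import random

    def stochastic_clock(true_minute, p=0.99):
        """Return the true time with probability p; otherwise a
        uniformly random minute of the day (0..1439)."""
        if random.random() < p:
            return true_minute
        return random.randrange(24 * 60)

    # Press the button 10,000 times at 6.55 am (minute 415 of the day).
    true_minute = 6 * 60 + 55
    readings = [stochastic_clock(true_minute) for _ in range(10_000)]
    wrong = sum(r != true_minute for r in readings)
    print(f"{wrong} of {len(readings)} readings showed the wrong time")

Even at 99% accuracy, roughly a hundred of those ten thousand presses mislead us, and nothing in any single reading marks it out as one of the wrong ones. That is precisely why “chances are it is correct” is cold comfort.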

Theorists of AI describe a parallel problem known as the validation paradox. Suppose that what GenAI presents is very likely correct, but with a small possibility that it is wrong; then we have a serious problem. If we can easily validate the output then, as with our slightly dodgy clock, we can ask what the point is of having the GenAI at all. If we cannot validate the output, then we cannot be sure of its reliability. The need for independent validation might not always render all outputs useless, but it certainly creates a problem. That problem is magnified by the propensity of human users to overlook the potential inaccuracies of GenAI because its outputs seem so convincing and plausible.

Educational theorists might recognise the form of this paradox from the learning paradox derived from Plato’s dialogue, Meno: How can we inquire into something we do not know? If we already know it, inquiry is unnecessary. If we do not know what we are looking for, we would not recognise it even if we came upon it. This dilemma, known as Meno’s paradox, calls into question the very possibility of learning. Socrates famously responds to the paradox by invoking the idea of anamnesis, or recollection: learning is not acquisition from outside but recollection of knowledge already latent in the soul, hence we are able to re-cognise as true something we apparently did not know. The latent presence of the knowledge in the soul means we are (at least in some sense) able to validate what is learned (this argument only really works for certain forms of non-empirical knowledge, such as knowledge of ‘justice’). The connection with GenAI concerns the idea of output validation: to be helpful, the knowledge provided by GenAI should be something I could not easily arrive at myself. But if so, it is also knowledge that I cannot easily validate. In short, the problem is this: the more powerful and useful the GenAI output is, the less able I am to be certain of its validity.

So what should we do? Stop using GenAI? Readers should be aware that we are already using it to a huge extent, and it might seem unrealistic simply to stop. No one really knows how much of social media consists of AI-generated content, but it is safe to assume it is considerable. Soon enough (if not already) the music we listen to, the books we read, and the TV and movies we watch could all have the fingerprints of GenAI. And would we even know? Would we care? This raises a second kind of validation problem: how can we know what has, and has not, been generated by some kind of artificial intelligence?

Perhaps the real danger lies not in the unreliability of GenAI, nor even in its ubiquity, but in what it encourages us to forget: that education is not a process of arriving at the right answers, but of dwelling with uncertainty, of learning how to ask better questions, and of becoming attuned to what is worthy of our attention. Like the stochastic clock, GenAI might sometimes tell us something useful—but only if we remember that the real work of learning is not in pressing the button, but in knowing when not to. In a world of increasingly plausible simulations, perhaps education’s most urgent task is to sustain our capacity for discernment—not just of facts, but of meaning, trust, and value.

David Lewin, July 2025