“AI Tools in the Classroom?” – Radio Feature on Deutschlandfunk Kultur, Political Feuilleton
My position on the risks and shortcomings of automated feedback and grading tools in the classroom: radio feature on September 25, 2025, 7:20 a.m., Deutschlandfunk Kultur [in German].
Radio Feature Transcript (translation)
Host’s intro: “Artificial intelligence” in education – is this the solution to teacher shortages and overburdened schools? Many believe so and place great hopes in AI tools that are supposed to revolutionize teaching and assessment. AI theorist Rainer Mühlhoff, however, considers the hype overblown and expects reality to look very different.
Since the release of ChatGPT, Germany has seen a veritable boom in so-called AI tools for schools. These can be chatbots acting as learning tutors to explain homework to students, or tools for teachers that create lesson plans, generate feedback on homework, or automatically grade exams.
The promise that AI can take over complex and pedagogically sensitive tasks such as giving individual improvement suggestions or assessing performance seems to be part of the hype around the new technology – which makes us believe that AI can pretty much do everything. This optimism meets an education system that is working beyond capacity, struggling with teacher shortages and growing class sizes, and has been chronically underfunded for decades.
In reality, the state of our education system is a politically created problem. But precisely in such deadlocked situations, technological and thus supposedly depoliticized solutions are often presented as especially attractive. More than half of the federal states have invested in AI tools for their teachers – in some cases with seven-figure sums per year – instead of investing in more staff to solve our problems.
At the Chair of Ethics of Artificial Intelligence at the University of Osnabrück, we conducted a study taking a closer look at the three most widely used AI tools for automated feedback and grading. The results are alarming: assessments of the same text vary by several grades when repeated – almost as if they had been decided by a roll of the dice. Verbal feedback remains generic and clichéd, in some cases even recommending the use of false information. When we followed the AI’s improvement suggestions, we were unable to consistently improve our grade in any of our tests. A similar study at the University of Flensburg confirmed this and additionally showed that none of the common tools reliably detects false information or unconstitutional content.
The AI tools for schools, mostly offered by smaller German companies, rely in the background on the large language models of global corporations such as OpenAI and Microsoft. Anyone who has used ChatGPT a little knows that these AI services tend to “hallucinate” – producing output that sounds good but is sometimes nonsensical, inventing facts, or contradicting themselves. Technically, this is not surprising, since large language models are statistical programs. They calculate the most likely next word in a sequence of words so that entire sentences emerge – without understanding the meaning of those sentences.
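The next-word mechanism described here can be sketched with a toy bigram model. This is purely illustrative: the word table below is invented, and real large language models use neural networks over vastly larger vocabularies and contexts – but the principle of sampling a statistically likely continuation, with no grasp of meaning, is the same.

```python
import random

# Toy "language model": for each word, a probability distribution over
# possible next words. Real LLMs learn such distributions (over subword
# tokens, conditioned on long contexts) from massive text corpora;
# this hand-made table exists only for illustration.
bigram_probs = {
    "the": {"student": 0.5, "teacher": 0.3, "answer": 0.2},
    "student": {"writes": 0.6, "reads": 0.4},
    "teacher": {"grades": 0.7, "writes": 0.3},
    "writes": {"the": 1.0},
    "grades": {"the": 1.0},
    "reads": {"the": 1.0},
    "answer": {"is": 1.0},
    "is": {"plausible": 1.0},
}

def generate(start: str, length: int, seed: int = 0) -> list[str]:
    """Produce text by repeatedly sampling a likely next word.
    No understanding of meaning is involved at any step."""
    rng = random.Random(seed)
    words = [start]
    for _ in range(length):
        dist = bigram_probs.get(words[-1])
        if not dist:  # no known continuation: stop
            break
        choices, weights = zip(*dist.items())
        words.append(rng.choices(choices, weights=weights)[0])
    return words

print(" ".join(generate("the", 6)))
```

The output always sounds locally plausible because every step picks a statistically common continuation – which is exactly why such systems can produce fluent sentences that are nonetheless factually empty or wrong.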
What chatbots produce is therefore, technically speaking, statistical mediocrity: whatever sounds most plausible – to “muddle through,” so to speak. But is that really the standard by which we want to measure our students?
Shouldn’t school rather encourage recognizing the world, investigating facts, and having the courage to use one’s own understanding? Instead of training students, via machine feedback, to make their “outputs” increasingly conform to the statistical mediocrity of an AI?
The capacity problems in the school system won’t be solved by such AI systems either. First, as teachers repeatedly report, it takes an awful lot of time to manually check the outputs of these feedback and grading tools before passing them on to students. And second, we must expect that any time teachers save will be “paid for” by being required to take on even more classes, students, and tasks.
Reference
- Mühlhoff, Rainer, and Marte Henningsen. 2024. “Chatbots im Schulunterricht: Wir testen das Fobizz-Tool zur automatischen Bewertung von Hausaufgaben” [Chatbots in the Classroom: We Test the Fobizz Tool for Automatic Grading of Homework]. arXiv preprint. doi:10.48550/arXiv.2412.06651.