As of now, there is a full-blown panic spreading regarding the free availability of chatGPT. It was made available to the general public as a free research preview in late November 2022; the underlying GPT-3 models had already been accessible through OpenAI's API before that.

The discussion around ML systems creating expressive artifacts such as text, images, videos, and even voice has been going on for some time, but the last few weeks have elevated it into public view, and now everybody and their aunt is talking about it. This is good, as we urgently need a public discourse about the doors that have been opened here.

After learning in late December that our (informatics/computer science) students were exchanging good chatGPT prompts to generate text for their hand-ins in our ways of thinking in informatics lecture, we also became slightly alarmed. One learning outcome of the course is to teach thinking skills to (first-year) students. To get them to think, we have them hand in written summaries, essays, reflections, etc., on their work within the course as well as on the content of the course. Using chatGPT to have those texts written is a shortcut that leaves out the most important part – the actual thinking.

In this setting, the written text is not the main result – I couldn’t care less about the texts, to be honest – but it is meant to be the evidence of thinking. I call this the writing-as-proof-of-thinking(1) approach.

The thinking part can be many things: proof that they read a text, watched a video, conducted literature research on something, thought about a problem, or reflected critically on their own work. It can serve as evidence for many different abilities and learnings. This approach had its problems from the start, but the incentives (a good ratio of resources invested vs. students supervised) were always stronger than my reservations.

But now, this has changed dramatically. The fakeability (not a word, I know) of writing-as-proof-of-thinking with chatGPT means there is no basis to grade students; maybe they did the thinking, maybe they didn’t – we cannot know.

Of course, the quality of texts written by chatGPT is quite lacking, but given that some work is put into generating and revising the writing prompts for chatGPT, the texts are often good enough to deceive the tutors who are assessing the work handed in. Since we have 700+ students in this course, there is a strong incentive to rely on this evaluation method. Also, some original texts handed in by students are of questionable quality even without chatGPT, for reasons that may be covered in a separate blog post here…

In one word: sigh

We discussed this situation with the students in one of the last units of this semester's ways of thinking in informatics lecture. There is a strong bias in the group of students still attending the lecture hall units at that point; they are the ones actually interested in the course, so probably nobody from the chatGPT-using crowd was present. Still, we were able to collect some interesting suggestions and quotes.

We used an online tool to collect suggestions and ideas – not all students were in the lecture hall, as we also stream the lecture, and not everybody likes to stand up and talk into a microphone – and received around 45 written suggestions and comments in addition to the things that were suggested in the room. These comments, questions, and ideas give a good impression of what our students believe, hope, and fear about AI in their future. (Everything was written and said in German; I transcribed and translated it using various services.)

I’ll start with a couple of general comments:

I don’t really know, but I am quite afraid that systems like chatGPT will replace me as a software engineer.

There are already so many other students in our year who solve tasks (in all classes) with chatGPT without thinking, and you can immediately tell that these people have never thought for themselves. In my opinion, chatGPT is quite overrated.

Anyone who has to use chatGPT to complete a course (LVA) like DWI will not stay long at the TU anyway.

These are certainly interesting, but of little help. While I certainly hope that it is true that those who use chatGPT will have problems down the road, I also think that these systems will get better to the point that work written with such systems will be as good or even better than what (some? many?) students can write on their own.

Some students suggested different ways to change what we do to accommodate or combat chatGPT.

Ask students to work much more with sources, where chatGPT has problems because it cannot access the internet.

Switch from digital to analog: elaborations have to be written by hand.

Emphasize discussions in groups, where chatGPT cannot be used, or give more »diverse« assignments that need video, voice recordings, discussions etc. to create an elaboration.

More creative tasks, less »one right answer«. Assignments should have a personal component where students have to include aspects of their own life or experiences, so using chatGPT becomes impossible.

I’m afraid that all of these strategies fail, simply because either I don’t want to follow them (all analog??? it’s 2023!) or they don’t really solve the problem (how can we validate and evaluate the personal experience part?). But it is in any case a vain undertaking to try to outsmart the ML systems, as one student points out:

It will also be essential to say goodbye to the arms race. All suggestions in this direction are only helpful in the very short term and tie up resources in their development.

So, let’s have a look at the most helpful comments by students:

One problem in general is that many students submit their work while only minimally engaging with the content; interest and motivation are ruined by any form of assessment.

In the medium term, there will be no getting around the question of the student-teacher ratio. And that will cost accordingly. You can't ignore that.

Personally, if there were tests, I would probably take less away from the lecture, because a learning-for-the-test mode would set in for me, whereby I would NOT learn according to interest, but according to the test material.

Each of these three comments touches on the core of the problem in a different way.

The first comment illustrates how teaching and learning are framed. Robert Fried wrote a book in 2005 where he described this framing as the game of school – a game played by teachers, kids, and parents alike, and one which mostly hurts kids (and society). We treat learning as something that is decoupled from interest and motivation and – most harmful of all – something that can be assessed through grades or scores, which then reflect how much we have learned. Those scores, as the signifiers of having-learned, then become the most important part of learning and, consequently, of teaching, and I believe this to be wrong.

I don’t want to discuss the whole instant-gratification-vs-long-term-goals thing here, but it has been shown quite conclusively that extrinsic motivation can negatively impact intrinsic motivation. In other words, »paying« students for their work in grades and scores can condition them to ignore their interests and follow the »money«.

So the question here is: how can we (a) maintain the motivation many students bring to the course and (b) help the other students find the parts or aspects of the course that interest them, so that they can build up some sort of engagement with the course?

The second comment stabs a knife into the very Austrian problem of universities being expected to teach huge classes with few resources. We have 700+ students, and most of the help we get comes from students of last year’s course, called »tutors«, who receive a little money to help us organize the course. The ratio of students to tutors is in the ballpark of 30:1, and we have little to no time or money to actually qualify these tutors beyond what they have already learned. So, they can do administrative work like organizing and monitoring group work, and some lightweight grading of student hand-ins, but not the kind of supervision where you form a critical verdict on a student’s learning progress. In other words: we are dependent on the writing-as-proof-of-thinking cycle described above.

No question here – a change in this situation is highly unlikely, so we will have to find ways to make it work with the inadequate resources, or we buy into a much more resource-friendly model of teaching, like tests & exams – and this is where the third comment comes in.

It mirrors my sentiments towards tests and exams. Learning for a test does not lead to sustainable knowledge acquisition, at least not in the learn-to-think-for-yourself type of course we run. I have my doubts about for-the-test learning in general, which has also been described by the somewhat insensitive term bulimic learning (learn for the test and »barf« it onto the paper, leaving as little as possible in the brain). Specifically for courses like ways of thinking in informatics, I know for a fact (from years of trial-and-failing before giving up on testing) that learning for the test helps nobody: it creates a lot of useless work for teachers, who have to write, administer, and grade the tests, and students learn next to nothing of actual value from it.

So, while the writing-as-proof-of-thinking approach we have been following has shown a lot of merit despite its inherent problems, we have now definitely reached the end of this road, and we have little idea how to proceed sustainably, i.e. in a way that works even with the next couple of versions of AI/ML content-generating tools coming along.

Some thinking has already been done, and the list of approaches we are pondering at the moment looks like this:

  • Forget writing-as-proof-of-thinking. Many students (especially informatics students in the first semester) hate writing. While we happily ignored this for many years (just because students hate it does not mean it is a bad thing), this is over now. We need to rely on assignments that let students express their critical thinking in other ways, such as creating videos, posters, group discussions with external experts, etc. This could lead us all the way down the PBL (problem-based learning) rabbit hole, with the problem that facilitating, supervising, and monitoring PBL for 700 students sounds like a nightmare…

  • Let go of grading. The spark for this approach came from a thread on Mastodon by @jonny@neuromatch.social. I have been intrigued by this concept for as long as I can remember, but I never dared to actually do it.

  • Demand goals. Have students express explicit (learning) goals for the semester, and peer-evaluate/self-evaluate the accomplishment of these goals.

  • Run class-wide contests and competitions, like in these examples. They surely sound like a lot of fun, but also like a metric ton of work. Also, I have some reservations about creating overly competitive settings.

  • Play games. Maybe it is possible to create a set of games that help create not only interest, but also critical discussion surrounding the topics of our course. We are certainly trying this one out. Of course, learning relies on reflection and active transfer, which still needs to be facilitated separately – somehow.

There are certainly more ideas out there. If you have some, let me know – or rather, let us know, as we are in this together (which I deduce from the fact that you read this wall of text to the end).

If there is enough interest, I am more than glad to organize a shared workspace of some form (like a Miro board or a CryptPad) for collecting and sorting ideas, and/or a Zoom meeting, etc.

The problems of our reliance on writing-as-proof-of-thinking were always there, but now, we quite suddenly have to solve them.

(image: Stable Diffusion with the prompt »the university of the future run by an artificial intelligence, by moebius«)

(1) I just learned that this is also called artifact as proxy of process in the educational literature.
