On 13 November 2024 I attended a workshop that was all about Generative AI and assessment for OU computing modules that was organised by colleagues Michel Wermelinger (academic co-lead GAI in LTA), Mark Slaymaker (C&C Director of Teaching) and Anton Dil (Assessment Lead of C&C Board of Studies). What follows are a set of notes and reflections which relates to the event. I also share some useful links and articles that I need to find the time to follow up on. This summary is, of course, very incomplete; I only give a very broad sketch of some of the discussions that have taken place, since there is such a lot of figuring out to do. Also, for brevity, Generative AI is abbreviated to GenAI.
Introduction and some resources
The workshop opened with a useful introductory session, where we were directed various resources which we could follow up after the event. I’ll pick on a few:
To get a handle on what public research has been published by colleagues within OU, it is worth reviewing ORO papers about Generative AI.
The following notable book was highlighted:
There were also a few papers about how GenAI could be used with the teaching of programming:
- Lehtinen et al. (2023) Automated Questions About Learners' Own Code Help to Detect Fragile Prerequisite Knowledge.
- Denny et al. (2023) Conversing with Copilot: Exploring Prompt Engineering for Solving CS1 Problems Using Natural Language.
To conclude this very brief summary, there is also an AL Development Resource which has been produced by Kevin Waugh, available under a Creative Commons Licence.
Short presentations
There were ten short presentations which covered a range of issues, from how GenAI might be used within assessments to facilitate meaningful learning, through to the threats it may offer to academic integrity. What follows are some points that I found particularly interesting.
One of the school’s modules, TM352 Web Mobile and Cloud technologies, has a new block that is dedicated to tools that can be used with software development. Over the last couple of years, it has changed quite a bit in terms of the technologies it introduces to learners. Since software development practices are clearly evolving, I need to find the time to see what has found its way into that module. It will, of course, have an influence on what any rewrite to TM354 Software Engineering might contain.
I learnt about a new software project called Llama. From what I understand, Llama is an open source large language model (LLM) engine that can be locally installed on your desktop computer, from where it can be fed your own documents and resources. The thing is: LLMs can make things up and get things wrong. Another challenge is that LLMs need a lot of computing resources to do anything useful. If students are ever required to play with their own LLMs as a part of a series of learning activities, this raises the subject of computing equity: some students will have access to powerful laptops, whereas other might not. Maybe there will a point when the school may have to deploy AI tools within the OpenSTEM Labs?
Whether you like them or not, a point was made that these tools may begin to find their way closer to the user. Tony Hirst made the point that when models start to operate on your own data (or device) this may open up the possibility of semantically searching sets of documents. Similarly, digital assistants may begin to offer more aggressive help and suggestions about how to write bits of documents. Will the new generation of AI tools and digital assistants be more annoying than the memorable Microsoft Clippy?
GenAI appears to be quite good at helping to solve well-defined and well-known programming problems. This leads us to consider a number of related issues and tensions. Knowing how to work with programming languages is currently an important graduate skill; programming also develops problem decomposition and algorithmic thinking skills. An interesting reflection is that GenAI may well help certain groups of students more than others.
Perhaps the nature of programming is changing as development environments draw upon coding solutions of others. Put another way, in the same way that nobody (except for low level software engineers) knows assembly language these days, perhaps the task of software development is moving to a higher level of abstraction. Perhaps developing software will mean less coding, but more about how to combine bits of code together. This said, it is still going to be important understand what those raw materials look like.
An interesting research question was highlighted by Mike Richards: can tutors distinguish between what has been generated by AI and what has been written by students? For more information about this research, do refer to the article Bob or Bot: Exploring ChatGPT’s answers to University Computer Science Assessment (ORO) A follow on question is, of course: what do we do about this?
One possible answer to this question may lie in a presentation which shared practice about the conducting of oral assessments, which is something that is already done on the postgraduate M812 digital forensics module. Whilst oral assessments can be a useful way to assess whether learning has taken place, it is important to consider the necessity of reasonable adjustments, to take account of students who may not be able to make an oral assessment, either due to communication difficulties, or mental health difficulties.
The next presentation, given by Zoe Tompkins, helped us to consider another approach to assessment: asking students to create study logs (which evidence their engagement), accompanied by pieces of reflective writing. On this point, I’m reminded of my current experience as an A334 English literature student, where I’m required to make regular forum postings to demonstrate regular independent study (which I feel that I’m a bit behind with). In addition to this, I also have an A334 reflective blog. A further reflection is that undergraduate apprentices have to evidence a lot of their learning by uploading digital evidence into an ePortfolio tool, which is then confirmed by a practice tutor. Regular conversations strengthen academic integrity.
This leads onto an important question which relates to the next presentation: what can be done to ensure that written assessments are ‘GenAI proof’? Is this something that can be built in? A metaphor was shared: we’re trying to ‘beat the machine’, whilst at the same time teaching everyone about the machine. One way to try to beat the machine is to use processes, and to refer to contexts that ‘the machine’ doesn’t know about. The context of questions is important.
The final presentation was by one of our school’s academic conduct officers. Two interesting numbers were mentioned. There are 6 points that students need to bear in mind when considering GenAI. If I’ve understood this correctly, there are 19 different points of guidance available for module teams to help them to design effective assessments. There’s another point within all this, which is: tutors are likely to know whether a bit of text has been generated by a LLM.
Reflections
This event reminded me that I have quite an extensive TODO list: I need to familiarise myself with Visual Studio Code, have a good look at Copilot, get up to speed with GitHub for education, look at the TM352 materials (in addition to M813 and M814, which I keep meaning to do for quite a while), and review the new Software Engineering Body of Knowledge (SWEBOK 4.0) that has been recently released. This is in addition to learning more about the architecture of LLMs, and upskill myself when it comes to the ethics, and figure out more about the different dimensions of cloud computing. Computing as moved on since I was last a developer and software engineer. With my TM470 tutor hat on, we need to understand how and where LLMs might be useful, and more about the risks they post to academic integrity.
At the time of writing, there is such a lot of talk about GenAI (and AI in general). I do wonder where we are in the Gartner hype cycle (Wikipedia). As I might have mentioned in other blogs, I’ve been around in computing for long enough to know that AI hype has happened before. I suspect we’re currently climbing up the ‘peak of inflated expectations’. With each AI hype cycle, we always learn new things. I’m of the school of thought that the current developments represent yet another evolutionary change, rather than one that offers revolutionary change.
Whilst studying A334, my tutor talked a little about GenAI in an introductory tutorial. In doing so, he shared something about expectations, in terms of what was expected in a good assessment submission. If I remember rightly, he mentioned the importance of writing that answered the question (a context that was specific, not general), demonstrated familiarity with the module materials (by quoting relevant sections of course texts), and clear and unambiguous referencing. Since the module is all about literature, there is scope to say what we personally think a text might be about. These are all the kind of things that LLMs might be able to do at some level, but not to a degree that is yet thoroughly convincing. To get something convincing, students need to spend time doing ‘prompt engineering’.
This leads us to a final reflection: do we spent a lot of time writing prompts and interrogating a LLM to try to get what we need, or would that time be spent more effectively writing what needed to be written in the first place? If the writing of assessments are all about learning, then does it matter how learning has taken place, as long as the learning has occurred? There is, of course, the important subject of good academic practice, which means becoming aware of what the cultural norms of academic debate and discourse are all about. To offer us a little more guidance, in the coming months I understand there will be some resources about Generative AI available on OpenLearn.
Acknowledgments
Many thanks to the organisers and facilitators. Thanks to all presenters; there were a lot of them!
Addendum
Edited on 27 November 24, attributing Zoe Tompkins to one of the sessions. During the event, a session was given by Professor Karen Kear, who demonstrated how Generative AI can struggle with very specific tasks: creating useful image descriptions. Generative AI is general; it doesn't understand the context in which problems are applied.