AI code generators could make learning to code easier for young students, new research shows

While programming can be challenging for young students, new research from University of Toronto computer scientists suggests artificial intelligence-based coding assistants can play a key role in helping learners progress.

Researchers found encouraging evidence that AI code generators built on the technology used by tools like ChatGPT could ultimately make programming and computing more accessible to young learners. Third-year PhD student Majeed Kazemitabaar led a study that involved 69 students between the ages of 10 and 17 who were learning Python, a text-based programming language, for the first time.

His research found that using AI code generators is not detrimental to learning and could significantly improve learning outcomes and retention for students who have prior experience with block-based programming environments, such as Scratch, where pieces of code are dragged and dropped.

The paper, “Studying the effect of AI Code Generators on Supporting Novice Learners in Introductory Programming,” has been accepted to CHI 2023, the premier international conference in human-computer interaction. Its co-authors include U of T engineering science undergraduate students Justin Chow and Carl Ka To Ma, Associate Professor Tovi Grossman and collaborators from the University of Michigan and the University of Maryland.

Research from PhD student Majeed Kazemitabaar shows AI code generators are beneficial to novice programmers learning to code and do not negatively impact learning retention. (Photo: Matt Hintsa)

The research team built a self-paced learning environment called Coding Steps where learners worked on tasks to learn basic Python programming concepts.

Inside the learning environment, the team included an AI code generator based on OpenAI Codex that could help learners produce code from natural-language descriptions. Codex is based on GPT-3, the OpenAI large language model family that also underpins tools such as ChatGPT and GitHub’s Copilot coding assistant.
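The paper’s actual integration details are not described here, but the basic idea of turning a plain-English description into Python code can be sketched with a Codex-style completion call. The snippet below is a minimal, illustrative sketch assuming the legacy (pre-1.0) openai Python client and the code-davinci-002 Codex model; the prompt format, parameters and helper name are assumptions, not the study’s implementation.

```python
# Minimal sketch: natural-language description -> Python code via a Codex-style model.
# Assumes the legacy (pre-1.0) openai client and the code-davinci-002 model;
# the helper name and prompt format are illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

def generate_python_code(task_description: str) -> str:
    """Ask the model to turn a plain-English task description into Python code."""
    prompt = f'"""\n{task_description}\n"""\n'  # frame the description as a docstring
    response = openai.Completion.create(
        model="code-davinci-002",  # Codex code-generation model
        prompt=prompt,
        max_tokens=256,
        temperature=0,             # deterministic output, easier for novices to study
        stop=['"""'],              # stop before the model echoes another docstring
    )
    return response.choices[0].text.strip()

# Example: the kind of request a novice might type
print(generate_python_code("Ask the user for two numbers and print their sum."))
```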

During the three-week study, learners first completed 45 code-authoring tasks; half of the participants had access to Codex for these tasks. Each authoring task was followed by a code-modification task, which both groups completed without Codex. The study ended with two evaluation sessions: an immediate post-test and a retention test one week later.
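The study’s task materials are not reproduced here, but as a rough, hypothetical illustration, an authoring task asks learners to write a short program from scratch, while the paired modification task asks them to adapt it by hand:

```python
# Hypothetical illustration (not from the study materials) of a task pairing
# like the one described above: a code-authoring task followed by a
# code-modification task on the same basic Python concept.

# Authoring task: "Write a program that prints the numbers 1 to 5."
for i in range(1, 6):
    print(i)

# Modification task: "Change the program so it prints only the even
# numbers from 2 to 10, one per line."
for i in range(2, 11, 2):
    print(i)
```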

Researchers found that students who had access to the AI code generator were able to successfully author and understand the generated code, and their performance on the subsequent manual code-modification tasks did not suffer: both groups performed similarly.

The results from the code-authoring tasks indicated that using AI code generators can improve progression and correctness scores while reducing errors and task completion time.

The research team was surprised by the learning gains made by the Codex group after the training phase.

“On the retention test, we never expected this, but even the students that had access to the code generator were performing slightly better. And there were trends showing they were doing better on more complex topics like loops and arrays,” says Kazemitabaar.

“That was very interesting to see how the students that didn’t have access to the code generator regressed more a week later,” he adds.

The study also found that students with prior block-based programming knowledge who used the AI code generator performed significantly better on the retention post-test than peers who completed the authoring tasks without access to it.

Kazemitabaar notes that students who did not have access to the code generator experienced considerable frustration trying to fix syntax errors, even though they had access to novice-friendly documentation.

“Students’ access to the code generator reduced their frustration and improved their progression,” Kazemitabaar says.

With the current debate surrounding ChatGPT in educational contexts, what does this mean for computer science education and the way AI code generators can be used in K-12 environments?

The researchers acknowledge that over-reliance on these tools is a potential drawback, but the study showed that having access to AI code generators did not impede learning gains or reduce learners’ ability to manually modify code.

Students used the code generators in passive and active ways, explains Kazemitabaar. Passive uses involved copying the question to prompt Codex and then submitting the generated code, while active uses involved tinkering with and manually verifying the AI-generated code before submission.

“To achieve the best learning outcomes for students, we should aim to better understand these behaviours and try to promote the more active approaches,” he suggests.

Kazemitabaar says he hopes to see further research on how AI code generators could support educators in striking a balance between accelerating students’ learning and preventing them from becoming dependent on these tools.

“How could they be used to not necessarily show solutions, but instead, progressively, gradually help students learn without revealing too much of the answers?” he ponders.

He believes these tools can be useful for learning and democratizing programming and could help more students, including underrepresented populations, engage in computer science.

“Programming is very difficult for a young student to learn, and they will be frustrated very easily when they see their first error,” he says. “That’s what I am trying to help students not worry about, but still learn.”

While this research focuses on introductory programming, Kazemitabaar says it will be worth exploring the effects these tools may have in computer science education at higher levels, such as in university courses.

“What about other contexts, like a fourth-year university programming course or a course in which students are required to learn about the theoretical concepts of algorithms? What about the coding parts of those courses? And do these tools benefit students on those topics as well or not?”