A New Look at Novice Programmer Errors
Davin McCall
Michael Kölling

2019

ACM Transactions on Computing Education, Volume 19, Issue 4

The types of programming errors that novice programmers make and struggle to resolve have long been of interest to researchers. Various past studies have analyzed the frequency of compiler diagnostic messages. This information, however, does not have a direct correlation to the types of errors students make, due to the inaccuracy and imprecision of diagnostic messages. Furthermore, few attempts have been made to determine the severity of different kinds of errors in terms other than frequency of occurrence. Previously, we developed a method for meaningful categorization of errors, and produced a frequency distribution of these error categories; in this article, we extend the previous method to also make a determination of error difficulty, in order to give a better measurement of the overall severity of different kinds of errors. An error category hierarchy was developed and validated, and errors in snapshots of students' source code were categorized accordingly. The result is a frequency table of logical error categories rather than diagnostic messages. Resolution time for each of the analyzed errors was calculated, and the average resolution time for each category of error was determined; this defines an error difficulty score. The combination of frequency and difficulty allows us to identify the types of error that are most problematic for novice programmers. The results show that ranking errors by severity (a product of frequency and difficulty) yields a significantly different ordering than ranking them by frequency alone, indicating that error frequency by itself may not be a suitable indicator for which errors are actually the most problematic for students.
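The abstract defines difficulty as the average resolution time per error category and severity as the product of frequency and difficulty. The arithmetic can be sketched in Java as below. This is only an illustration under stated assumptions: the class and method names, the category labels A, B and C, and every number are made-up placeholders rather than data from the paper, and the paper's exact averaging or normalization may differ.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SeveritySketch {

    /** One error category: how often it occurred and how long each analyzed instance took to fix. */
    static class Category {
        final String name;
        final int frequency;
        final double[] resolutionSeconds;

        Category(String name, int frequency, double... resolutionSeconds) {
            this.name = name;
            this.frequency = frequency;
            this.resolutionSeconds = resolutionSeconds;
        }

        int frequency() {
            return frequency;
        }

        /** Difficulty: mean resolution time over the analyzed errors of this category. */
        double difficulty() {
            if (resolutionSeconds.length == 0) {
                return 0.0;
            }
            double sum = 0.0;
            for (double t : resolutionSeconds) {
                sum += t;
            }
            return sum / resolutionSeconds.length;
        }

        /** Severity: frequency multiplied by difficulty. */
        double severity() {
            return frequency * difficulty();
        }
    }

    /** Returns category names ordered from highest to lowest score. */
    static List<String> rank(List<Category> categories, Comparator<Category> byScore) {
        List<Category> sorted = new ArrayList<>(categories);
        sorted.sort(byScore.reversed());
        List<String> names = new ArrayList<>();
        for (Category c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    static List<String> rankByFrequency(List<Category> categories) {
        return rank(categories, Comparator.comparingInt(Category::frequency));
    }

    static List<String> rankBySeverity(List<Category> categories) {
        return rank(categories, Comparator.comparingDouble(Category::severity));
    }

    /** Placeholder data (hypothetical): a frequent but quick fix versus a rarer but slow one. */
    static List<Category> sample() {
        return List.of(
            new Category("A", 85, 20.0, 40.0),    // difficulty 30, severity 2550
            new Category("B", 21, 200.0, 100.0),  // difficulty 150, severity 3150
            new Category("C", 10, 30.0));         // difficulty 30, severity 300
    }

    public static void main(String[] args) {
        System.out.println(rankByFrequency(sample())); // prints [A, B, C]
        System.out.println(rankBySeverity(sample()));  // prints [B, A, C]
    }
}
```

With these placeholder numbers, category B overtakes A once resolution time is taken into account, which is the kind of reordering the abstract reports.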

Study Information
Manually extracted from the paper by the Progmiscon.org team

Programming Languages

Java

Method

Quantitative systematic research analyzing 1000+ randomly selected compilation events

Subjects

199 user sessions (each session associated with a different user)

Artifact

https://bluej.org/blackbox/
Note by Progmiscon.org Team
The paper's appendices contain most of the underlying data. The raw data would be available in the Blackbox dataset; however, the paper does not state which 199 user sessions were randomly selected from that dataset (within the period from 2013-06-11 to 2014-05-30).

Related Study Results
Phenomena studied in this paper that map to Progmiscon.org misconceptions

The following list summarizes those phenomena reported in this study that provide evidence for misconceptions documented on Progmiscon.org. (The paper may provide evidence for other misconceptions as well. This list focuses exclusively on misconceptions documented on Progmiscon.org.)

Errors
A Blackbox dataset of randomly selected, student-generated compilation events over a 1-year period.

Tab8.2
Variable not declared
85 / 1 000
Compilation events containing this error
This provides evidence potentially relevant for the following Progmiscon.org misconceptions:
Note by Progmiscon.org Team
The exact total number of errors studied is unclear (the paper states 'a total of (just over) 1,000 compilation events with errors').
Tab8.27
Method declaration: missing return type
10 / 1 000
Compilation events containing this error
This provides evidence potentially relevant for the following Progmiscon.org misconceptions:
Tab8.18
Missing parentheses for method call
21 / 1 000
Compilation events containing this error
This provides evidence potentially relevant for the following Progmiscon.org misconceptions:
Tab8.33
'=' used in place of '=='
6 / 1 000
Compilation events containing this error
This provides evidence potentially relevant for the following Progmiscon.org misconceptions:
Tab8.46
Class does not implement required method
2 / 1 000
Compilation events containing this error
Tab8.60
Local variable declaration with illegal modifier
1 / 1 000
Compilation events containing this error
This provides evidence potentially relevant for the following Progmiscon.org misconceptions:
Tab8.69
Constructor call without using 'new'
0 / 1 000
Compilation events containing this error
This provides evidence potentially relevant for the following Progmiscon.org misconceptions:
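To make the error categories listed above concrete, the sketch below shows one plausible Java instance of each. The erroneous form appears in a comment (javac rejects it) directly above a corrected form that compiles. The snippets are illustrative assumptions about what each category name covers; they are not taken from the paper or from the Blackbox data, and the names `Shape`, `Square`, and `ErrorCategoryExamples` are hypothetical.

```java
interface Shape {
    double area();
}

// Class does not implement required method
//   class Square implements Shape { }        // rejected: area() is not implemented
class Square implements Shape {
    private final double side;

    Square(double side) {
        this.side = side;
    }

    @Override
    public double area() {
        return side * side;
    }
}

class ErrorCategoryExamples {

    // Method declaration: missing return type
    //   static twice(int x) { return 2 * x; } // rejected: a return type is required
    static int twice(int x) {
        return 2 * x;
    }

    static String demo() {
        // Variable not declared
        //   total = 3;                         // rejected: total was never declared
        int total = 3;

        // Local variable declaration with illegal modifier
        //   private int count = 0;             // rejected: 'private' is not allowed on a local variable
        int count = 0;

        // Constructor call without using 'new'
        //   Square sq = Square(2.0);           // rejected: treated as a call to a method named Square
        Square sq = new Square(2.0);

        // Missing parentheses for method call
        //   double a = sq.area;                // rejected: treated as access to a field named area
        double a = sq.area();

        // '=' used in place of '=='
        //   if (total = 3) { count++; }        // rejected: an int assignment is not a boolean condition
        if (total == 3) {
            count++;
        }

        return twice(total) + " " + count + " " + a;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 6 1 4.0
    }
}
```

Uncommenting any one of the erroneous lines (and removing its corrected counterpart) should produce a compile-time error, though the wording of the diagnostic depends on the compiler and often does not name the underlying mistake directly.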