BASE Cognitive Stress Monitoring During Bug Inspection

Description

The main goal of the BASE project is to study software bugs from a new perspective, combining neuroscience, physiological response measurement and software reliability engineering in a tightly interdisciplinary approach. The aim is to understand the brain mechanisms involved in error making and error discovery, focusing on software programming and code inspection activities, and to evaluate how these findings could improve software quality through a new, comprehensive, biofeedback-augmented software engineering approach.

This study was designed to investigate the neural network associated with human error making and error discovery during software inspection activities, using electroencephalography (EEG) and functional magnetic resonance imaging (fMRI), in order to understand the mental conditions that lead to a bug and to reveal the neural patterns associated with the “eureka moment” of finding one.

Furthermore, this study investigates correlation models between the neuronal activation patterns associated with bug scenarios and the physiological responses of programmers that can be monitored by currently available smartwatches and similar wearable devices. Examples of such signals used in the current study are the ECG (electrocardiogram), PPG (photoplethysmogram) and EDA (electrodermal activity). The main idea is to monitor physiological responses to emotional and cognitive stress and to concentration states (e.g., variations in heart rate, breathing rhythm, or sweat gland secretion), also considering stress- and concentration-related behavioural information that can be inferred from the typical programming environment, such as mouse and keyboard activity, eye movements and facial expressions.
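The text does not specify how heart-rate variations are quantified. As a minimal sketch, assuming R-peak times (in seconds) have already been detected in the ECG, mean heart rate and RMSSD (root mean square of successive R-R differences, a common short-term heart-rate variability index chosen here purely for illustration) can be computed as follows. The function names are hypothetical and not part of the dataset.

```python
"""Heart-rate features from ECG R-peak times (illustrative sketch).

ASSUMPTION: R-peak timestamps in seconds are already available; R-peak
detection itself is not shown.
"""
from math import sqrt


def rr_intervals(r_peaks_s: list[float]) -> list[float]:
    """Successive R-R intervals in seconds."""
    return [b - a for a, b in zip(r_peaks_s, r_peaks_s[1:])]


def mean_heart_rate_bpm(r_peaks_s: list[float]) -> float:
    """Mean heart rate in beats per minute: 60 / mean R-R interval."""
    rr = rr_intervals(r_peaks_s)
    return 60.0 / (sum(rr) / len(rr))


def rmssd_ms(r_peaks_s: list[float]) -> float:
    """Root mean square of successive R-R differences, in milliseconds."""
    rr = rr_intervals(r_peaks_s)
    diffs_ms = [(b - a) * 1000.0 for a, b in zip(rr, rr[1:])]
    return sqrt(sum(d * d for d in diffs_ms) / len(diffs_ms))
```

For beats at 0.0, 0.8, 1.6, 2.5 and 3.3 s, the mean R-R interval is 0.825 s (about 72.7 bpm) and the RMSSD is about 81.6 ms.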

In this study, the cognitive state of programmers during code inspection and bug detection was assessed with a set of sensors placed on the participants. We used the experimental setup of one of the experimental campaigns of the BASE initiative, which covers a comprehensive set of sensors including a 64-channel EEG cap, ECG, EDA, eye tracking with pupillography, and fMRI. The signals from all these sources are synchronized to a common time base to allow consistent cross-analysis.
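The description does not say how cross-modal analysis is performed once the streams share a time base. One simple approach, shown here only as a hedged sketch, is to linearly interpolate each stream (sampled at its own rate) onto a shared time grid. The (timestamp, value) representation and the helper names are assumptions, not the dataset's file format.

```python
"""Resampling streams onto a shared time grid (illustrative sketch).

ASSUMPTION: each stream is a list of (timestamp_s, value) pairs whose
timestamps are already expressed in the common time base.
"""
from bisect import bisect_right


def interp_at(times: list[float], values: list[float], t: float) -> float:
    """Linearly interpolate a sampled signal at time t, clamping at the ends."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # guarantees times[i-1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def common_grid(start_s: float, stop_s: float, rate_hz: float) -> list[float]:
    """Evenly spaced timestamps from start_s to stop_s (inclusive) at rate_hz."""
    n = int(round((stop_s - start_s) * rate_hz)) + 1
    return [start_s + k / rate_hz for k in range(n)]


def resample(stream: list[tuple[float, float]], grid: list[float]) -> list[float]:
    """Values of one stream evaluated at every timestamp of the shared grid."""
    times = [t for t, _ in stream]
    values = [v for _, v in stream]
    return [interp_at(times, values, t) for t in grid]
```

For instance, an EDA stream sampled at 2 Hz, `[(0.0, 1.0), (0.5, 2.0), (1.0, 4.0)]`, resampled onto a 4 Hz grid from 0 to 1 s yields `[1.0, 1.5, 2.0, 3.0, 4.0]`. Linear interpolation suits slow signals such as EDA; downsampling fast signals such as EEG would normally call for anti-aliased resampling instead.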

This study involved 21 participants with experience in the C programming language and in code inspection, selected after a series of interviews focusing on their C programming skills. All volunteers were male, with ages ranging from 19 to 40 years (mean 25.56 ± 6.85 years). During screening, candidates completed two questionnaires: a programming experience questionnaire and a technical questionnaire. The first assessed each candidate's programming experience based on the volume of code written in the last three years. The second, composed of 10 questions, assessed the candidate's coding skills. Candidates scoring lower than 3 out of 10 points were considered ineligible, and 21 of the 49 candidates were selected based on their final questionnaire scores. The selected participants were classified into two proficiency levels: 16 intermediate (4 to 7 points) and 5 expert (8 to 10 points).
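The score thresholds above can be expressed as a small function. This is a sketch only: it assumes integer scores from the 10-question technical questionnaire and covers the score-based classification, not the full selection, which also relied on interviews. The text does not state how a score of exactly 3 was treated, so the sketch leaves that case unclassified rather than guessing. The function name is hypothetical.

```python
"""Screening-score classification (illustrative sketch).

Thresholds follow the dataset description: below 3 is ineligible,
4-7 is intermediate, 8-10 is expert. ASSUMPTION: integer scores 0-10.
"""
from typing import Optional


def proficiency(score: int) -> Optional[str]:
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score < 3:
        return "not eligible"
    if 4 <= score <= 7:
        return "intermediate"
    if score >= 8:
        return "expert"
    return None  # score == 3: handling not specified in the source
```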

The study was approved by the Ethics Committee of our organization, in accordance with the Declaration of Helsinki, and all experiments were performed in accordance with relevant guidelines and regulations. Informed consent was signed by all participants. The anonymized data from this research will be available upon request to the authors.

The four code snippets used in the bug detection condition (Bucket Sort, Fibonacci, Hondt Method and Matrix Determinant) differ in complexity (simple/complex) and algorithm type (recursive/iterative). They are described below and summarized in the table that follows:

    • The code snippet Bucket Sort implements a sorting algorithm and was presented as an iterative, medium-sized, and complex code snippet with four bugs.
    • The Fibonacci code implements the algorithm that generates the Fibonacci sequence and was used as a recursive, small-sized, and simple code snippet with one bug.
    • The Hondt Method code implements the Hondt algorithm for allocating seats after an election and was used as an iterative, small-sized, and medium-complex code snippet with four bugs.
    • The Matrix Determinant code implements the recursive algorithm that computes the determinant of square matrices and was used as a recursive, medium-sized, and complex code snippet with four bugs.
Program             Type        Number of bugs  Lines of code  Cyclomatic complexity
Bucket Sort         Iterative   4               42             10
Fibonacci           Recursive   1               9              2
Hondt Method        Iterative   4               32             5
Matrix Determinant  Recursive   5               39             10
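The table does not state how cyclomatic complexity was computed; it presumably refers to McCabe's metric. For a control-flow graph with E edges, N nodes and P connected components (one per function), and, equivalently, for a single function whose branches are all binary decisions with π decision points:

```latex
M = E - N + 2P, \qquad M_{\text{single function}} = \pi + 1
```

For example, a recursive Fibonacci function whose only branch is the base-case test has π = 1 and therefore M = 2, consistent with the value listed for the Fibonacci snippet.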

All participants performed the experiment in the same room, free of distractions, noise, or the presence of people unrelated to the experiment. Each run of the protocol consists of the following steps, presented on a laptop screen:

    1. Each run of the experiment starts with an empty grey screen with a black cross in the center. This step lasts 30 seconds; its purpose is to let the participant disengage from the surrounding environment and adjust to the experimental setup. The participant is instructed to “think about nothing” and especially to avoid task-related thoughts.
    2. In the second step, a task is randomly selected from the task stack. It can be: 1) natural language reading; 2) simple code snippet analysis; or 3) C code snippet analysis (bug detection).
    3. The third step is similar to the first and is intended to detach the participant from the activity performed in the previous step, so that it does not affect the next one. This step lasts 30 seconds.
    4. In the fourth step, a task is randomly selected from the task stack. It cannot be of the same type as the task selected in step 2.
    5. The fifth step is similar to the first and third steps. This step lasts 30 seconds.
    6. In the last step, a task is randomly selected from the task stack. It cannot be of the same type as the tasks selected in steps 2 and 4.

After the four runs are completed, a concluding phase takes place, in which an empty grey screen with a black cross in the center is presented to the participant for 30 seconds.

All tasks are randomly drawn from the task stack and randomly assigned to the corresponding steps of each run.
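The randomization constraints can be sketched as follows. Because steps 2, 4 and 6 must all be of different types and there are exactly three task types, each run ends up containing one task of each type in a random order. The sketch additionally assumes that stimuli are drawn without replacement from per-type pools; the pool contents, type labels and the function name `make_schedule` are hypothetical. The concluding 30-second fixation after the fourth run is not included.

```python
"""Run schedule generation (illustrative sketch of the protocol steps)."""
import random

# Task types listed in step 2 of the protocol (labels are hypothetical).
TASK_TYPES = ("natural_language", "simple_code", "bug_detection")
N_RUNS = 4


def make_schedule(
    pools: dict[str, list[str]], rng: random.Random
) -> list[list[tuple[str, str]]]:
    """Build N_RUNS runs, each a list of six (step_type, stimulus) pairs.

    ASSUMPTION: stimuli are drawn without replacement from per-type pools,
    so each pool needs at least N_RUNS entries.
    """
    # Shuffled copies, so the caller's pools are left untouched.
    remaining = {t: rng.sample(items, len(items)) for t, items in pools.items()}
    runs = []
    for _ in range(N_RUNS):
        # Steps 2, 4 and 6 must differ in type; with three types this means
        # every run uses each type exactly once, in random order.
        order = rng.sample(TASK_TYPES, len(TASK_TYPES))
        run = []
        for task_type in order:
            run.append(("fixation", "grey screen with black cross"))  # steps 1, 3, 5
            run.append((task_type, remaining[task_type].pop()))       # steps 2, 4, 6
        runs.append(run)
    return runs


if __name__ == "__main__":
    # Hypothetical text and simple-snippet identifiers; only the four
    # bug-detection snippet names come from the dataset description.
    example_pools = {
        "natural_language": [f"text_{i}" for i in range(1, 5)],
        "simple_code": [f"simple_{i}" for i in range(1, 5)],
        "bug_detection": ["bucket_sort", "fibonacci", "hondt", "matrix_determinant"],
    }
    for i, run in enumerate(make_schedule(example_pools, random.Random(42)), start=1):
        print(f"Run {i}:", [label for label, _ in run])
```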

The protocol scheme is presented below.

Description of each task:

    • Baseline – An empty grey screen with a black cross in its center is shown for 30 seconds. It serves as a baseline phase in which the participant does not perform any activity.
    • Natural language reading – In this task, a natural language text is presented to the participant (selected in random order from a group of texts listed in section 6 of the protocol document). The texts are based on Portuguese stories with neutral content, in order to avoid measurement fluctuations induced by narrative-triggered emotions. The duration of this task is 60 seconds.
    • Simple code snippet reading – In this task, the participant is presented with a screen containing a simple, iterative code snippet to be analyzed, selected in random order from the code snippets in section 7.1 of the protocol document. The main objective of this task is to induce a low cognitive effort state, which is used as a reference state in the subsequent analysis. This task lasts 300 seconds.
    • Bug detection – In this task, a C code snippet is displayed to the participant, who is asked to inspect the code and detect bugs (software faults). The four code snippets are presented in section 7.2 of the protocol document, along with their descriptions and characteristics, and are selected in random order. The duration of this task is 600 seconds.
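Using the durations above, the onset of every step in a run can be computed, for example to build an event table for EEG or fMRI analysis. This sketch assumes that steps follow one another without gaps, which the text does not state explicitly; `run_events` and the task labels are hypothetical.

```python
"""Event timeline for one run (illustrative sketch)."""

# Step durations in seconds, taken from the task descriptions.
DURATION_S = {
    "fixation": 30,
    "natural_language": 60,
    "simple_code": 300,
    "bug_detection": 600,
}


def run_events(task_order: list[str]) -> list[tuple[int, int, str]]:
    """Return (onset_s, duration_s, label) for the six steps of one run.

    ASSUMPTION: no gaps between steps; each task is preceded by a fixation.
    """
    events, t = [], 0
    for task in task_order:
        for label in ("fixation", task):
            events.append((t, DURATION_S[label], label))
            t += DURATION_S[label]
    return events
```

Under that assumption every run lasts 3 × 30 + 60 + 300 + 600 = 1050 seconds regardless of task order, and four runs plus the concluding 30-second fixation total 4230 seconds (70.5 minutes), not counting any pauses between runs.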

Parameters

Time series
    • EDA – Electrodermal Activity
    • ECG – Electrocardiogram
    • PPG – Photoplethysmogram
    • EEG – Electroencephalogram (64 channels)
    • fMRI – Functional magnetic resonance imaging

Eye tracking data
    • Raw position data
    • Pupil diameter (left and right eye)
    • Corneal reflex position (left and right eye)
    • Point of regard (gaze data, left and right eye)
    • Head position and rotation

Meta data
    • Screening evaluation and demographic data
    • NASA-TLX evaluation data
    • Bug Detection evaluation data
Annotations
    • Expert annotation of code sections
Others
    • Protocol definition
    • Screening questionnaires and C-Test
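The metadata includes NASA-TLX evaluation data, but the description does not say which scoring variant was used. As a hedged sketch, assuming the standard six subscales rated on a 0 to 100 scale, the two common overall scores are the raw TLX (the unweighted mean of the ratings) and the weighted TLX (each rating multiplied by the number of times its subscale was chosen in the 15 pairwise comparisons, summed and divided by 15). The subscale keys and function names are hypothetical.

```python
"""NASA-TLX overall workload scores (illustrative sketch)."""

# The six standard NASA-TLX subscales (key names are hypothetical).
SUBSCALES = ("mental", "physical", "temporal", "performance", "effort", "frustration")


def raw_tlx(ratings: dict[str, float]) -> float:
    """Raw TLX: unweighted mean of the six subscale ratings (0-100)."""
    return sum(ratings[s] for s in SUBSCALES) / len(SUBSCALES)


def weighted_tlx(ratings: dict[str, float], weights: dict[str, int]) -> float:
    """Weighted TLX: weights are tallies from the 15 pairwise comparisons."""
    if sum(weights[s] for s in SUBSCALES) != 15:
        raise ValueError("pairwise-comparison weights must sum to 15")
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15
```

For ratings of 70, 10, 50, 30, 60 and 40 (in subscale order) with weights 5, 0, 3, 2, 4 and 1, the raw score is about 43.3 and the weighted score is 56.0.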

Access:

This database is available to logged-in users via the link below.

How to cite this database:

Medeiros, Júlio and Simões, Marco and Castelhano, João and Abreu, Rodolfo and Couceiro, Ricardo and Henriques, Jorge and Castelo-Branco, Miguel and Madeira, Henrique and Teixeira, César and de Carvalho, Paulo (2024). EEG as a Potential Ground Truth for the Assessment of Cognitive State in Software Development Activities: A Multimodal Imaging Study. PLOS ONE, 19(3), e0299108.

Castelhano, Joao and Duarte, Isabel C and Couceiro, Ricardo and Medeiros, Julio and Duraes, Joao and Afonso, Sónia and Madeira, Henrique and Castelo-Branco, Miguel (2022). Software Bug Detection Causes a Shift From Bottom-Up to Top-Down Effective Connectivity Involving the Insula Within the Error-Monitoring Network. Frontiers in Human Neuroscience, 16, 788272.