Towards an Objective Measure of Developers' Cognitive Activities

Zohreh Sharafi, Yu Huang, Kevin Leach, and Westley Weimer
Journal Paper ACM Transactions on Software Engineering and Methodology (TOSEM), 2021


Understanding how developers carry out different computer science activities with objective measures can help to improve productivity and guide the use and development of supporting tools in software engineering. In this article, we present two controlled experiments involving 112 students to explore multiple computing activities (code comprehension, code review, and data structure manipulations) using three different objective measures including neuroimaging (functional near-infrared spectroscopy (fNIRS) and functional magnetic resonance imaging (fMRI)) and eye tracking. By examining code review and prose review using fMRI, we find that the neural representations of programming languages vs. natural languages are distinct. We can classify which task a participant is undertaking based solely on brain activity, and those task distinctions are modulated by expertise. We leverage insights from the psychological notion of spatial ability to decode the neural representations of several fundamental data structures and their manipulations using fMRI, fNIRS, and eye tracking. We examine list, array, tree, and mental rotation tasks and find that data structure and spatial operations use the same focal regions of the brain but to different degrees: they are related but distinct neural tasks. We demonstrate best practices and describe the implications and tradeoffs between fMRI, fNIRS, eye tracking, and self-reporting for software engineering research.

Eyes on Code: A Study on Developers' Code Navigation Strategies

Zohreh Sharafi, Ian Bertram, Michael Flanagan, and Westley Weimer
Journal Paper IEEE Transactions on Software Engineering (TSE), 2020


What code navigation strategies do developers use and what mechanisms do they employ to find relevant information? Do their strategies evolve over the course of longer tasks? Answers to these questions can provide insight to educators and software tool designers to support a wide variety of programmers as they tackle increasingly complex software systems. However, little research to date has measured developers' code navigation strategies in ecologically valid settings or analyzed how strategies progressed throughout a maintenance task. We propose a novel experimental design that more accurately represents the software maintenance process in terms of software complexity and IDE interactions. Using this framework, we conduct an eye-tracking study (n=36) of realistic bug-fixing tasks, dynamically and empirically identifying relevant code areas. We introduce a three-phase model to characterize developers' navigation behavior supported by statistical variations in eye movements over time. We also propose a quantifiable notion of "thrashing" with the code as a navigation activity. We find that thrashing is associated with lower effectiveness. Our results confirm that the relevance of various code elements changes over time, and that our proposed three-phase model is capable of capturing these significant changes. We discuss our findings and their implications for tool designers, educators, and the research community.

A Practical Guide on Conducting Eye Tracking Studies in Software Engineering

Zohreh Sharafi, Bonita Sharif, Andrew Begel, Roman Bednarik, Martha Crosby, and Yann-Gaël Guéhéneuc
Journal Paper Empirical Software Engineering, Springer, 2020


For several years, the software engineering research community has used eye trackers to study program comprehension, bug localization, pair programming, and other software engineering tasks. Eye trackers provide researchers with insights on software engineers' cognitive processes, data that can augment those acquired through other means, such as on-line surveys and questionnaires. While there are many ways to take advantage of eye trackers, advancing their use requires defining standards for experimental design, execution, and reporting. We begin by presenting the foundations of eye tracking to provide context and perspective. Based on previous surveys of eye tracking for programming and software engineering tasks and our collective, extensive experience with eye trackers, we discuss when and why researchers should use eye trackers as well as how they should use them. We compile a list of typical use cases (real and anticipated) of eye trackers, as well as metrics, visualizations, and statistical analyses to analyze and report eye-tracking data. We also discuss the pragmatics of eye tracking studies. Finally, we offer lessons learned about using eye trackers to study software engineering tasks. This paper is intended to be a one-stop resource for researchers interested in designing, executing, and reporting eye tracking studies of software engineering tasks.

A Systematic Literature Review on the Usage of Eye-tracking in Software Engineering

Zohreh Sharafi, Zéphyrin Soh, and Yann-Gaël Guéhéneuc
Journal Paper Information and Software Technology (IST), Elsevier, 2015


Eye-tracking is a means to collect evidence regarding some participants’ cognitive processes. Eye-trackers monitor participants’ visual attention by collecting eye-movement data. These data are useful to get insights into participants’ cognitive processes during reasoning tasks.

The Evidence-based Software Engineering (EBSE) paradigm was proposed in 2004 and, since then, has been used to provide detailed insights regarding different topics in software engineering research and practice. Systematic Literature Reviews (SLR) are also useful in the context of EBSE by bringing together all existing evidence of research and results about a particular topic. This SLR evaluates the current state of the art of using eye-trackers in software engineering and provides evidence on the uses and contributions of eye-trackers to empirical studies in software engineering.

We perform an SLR covering eye-tracking studies in software engineering published from 1990 up to the end of 2014. To search all recognised resources, instead of applying manual search, we perform an extensive automated search using Engineering Village. We identify 36 relevant publications, including nine journal papers, two workshop papers, and 25 conference papers. The software engineering community started using eye-trackers in the 1990s and they have become increasingly recognised as useful tools to conduct empirical studies since 2006. We observe that researchers use eye-trackers to study model comprehension, code comprehension, debugging, collaborative interaction, and traceability. Moreover, we find that studies use different metrics based on eye-movement data to obtain quantitative measures. We also report the limitations of current eye-tracking technology, which threaten the validity of previous studies, along with suggestions to mitigate these limitations. However, notwithstanding these limitations and threats, we conclude that the advent of new eye-trackers makes the use of these tools easier and less obtrusive and that the software engineering community could benefit more from this technology.

An Empirical Study on the Importance of Source Code Entities for Requirements Traceability

Nasir Ali, Zohreh Sharafi, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
Journal Paper Empirical Software Engineering Journal (EMSE), Springer, 2014

Requirements Traceability (RT) links help developers during program comprehension and maintenance tasks. However, creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques are useful to automatically create traceability links. However, IR-based techniques typically have low accuracy (precision, recall, or both) and, thus, creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based RT techniques to create RT links. Consequently, we perform an empirical study consisting of four case studies. First, we use an eye-tracking system to capture developers’ eye movements while they verify RT links.

We analyse the obtained data to identify and rank developers’ preferred types of Source Code Entities (SCEs), e.g., domain vs. implementation-level source code terms and class names vs. method names. Second, we perform another eye-tracking case study to confirm that it is the semantic content of the developers’ preferred types of SCEs and not their locations that attract developers’ attention and help them in their task to verify RT links. Third, we propose an improved term weighting scheme, i.e., Developers Preferred Term Frequency/Inverse Document Frequency (DPTF/IDF), that uses the knowledge of the developers’ preferred types of SCEs to give more importance to these SCEs in the term weighting scheme. We integrate this weighting scheme with an IR technique, i.e., Latent Semantic Indexing (LSI), to create a new technique for RT link recovery. Using three systems (iTrust, Lucene, and Pooka), we show that the proposed technique statistically improves the accuracy of the recovered RT links over a technique based on LSI and the usual Term Frequency/Inverse Document Frequency (TF/IDF) weighting scheme. Finally, we compare the newly proposed DPTF/IDF with our original Domain Or Implementation/Inverse Document Frequency (DOI/IDF) weighting scheme.
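
As a rough illustration of how a term weighting scheme can give more importance to certain source code entities, the sketch below scales a standard TF/IDF weight by a per-entity-type boost. The boost values, the entity-type labels, and the function name `boosted_tfidf` are hypothetical; this abstract does not give the paper's actual DPTF/IDF formula, so this should not be read as the authors' method.

```python
import math
from collections import Counter

# Hypothetical boosts per entity type; illustrative values only,
# not taken from the paper.
ENTITY_BOOST = {"class_name": 2.0, "method_name": 1.5, "comment": 1.0}


def boosted_tfidf(documents):
    """Return one {term: weight} dict per document.

    Each document is a list of (term, entity_type) pairs. The weight is a
    boosted term frequency multiplied by log(N / document frequency).
    """
    n_docs = len(documents)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update({term for term, _ in doc})

    weights = []
    for doc in documents:
        boosted_tf = Counter()
        for term, entity_type in doc:
            # Unknown entity types fall back to a neutral boost of 1.0.
            boosted_tf[term] += ENTITY_BOOST.get(entity_type, 1.0)
        weights.append({
            term: tf * math.log(n_docs / doc_freq[term])
            for term, tf in boosted_tf.items()
        })
    return weights


# Toy corpus: three "documents" with made-up terms.
docs = [
    [("patient", "class_name"), ("save", "method_name")],
    [("save", "method_name"), ("index", "comment")],
    [("index", "comment"), ("query", "comment")],
]
w = boosted_tfidf(docs)
```

In this toy corpus, a term that appears as a class name in a single document ends up with a larger weight than a method name shared across documents, which is the kind of preference such a scheme is meant to encode.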

Taupe: Visualizing and Analysing Eye-tracking Data

Benoît de Smet, Lorent Lempereur, Zohreh Sharafi, Yann-Gaël Guéhéneuc, Giuliano Antoniol, and Naji Habra
Journal Paper Science of Computer Programming Journal (SCP), Elsevier, 2011

Program comprehension is an essential part of any maintenance activity. It allows developers to build mental models of the program before undertaking any change. It has been studied by the research community for many years with the aim to devise models and tools to understand and ease this activity. Recently, researchers have introduced the use of eye-tracking devices to gather and analyze data about the developers’ cognitive processes during program comprehension. However, eye-tracking devices are not completely reliable and, thus, recorded data sometimes must be processed, filtered, or corrected. Moreover, the analysis software tools packaged with eye-tracking devices are not open-source and do not always provide extension points to seamlessly integrate new sophisticated analyses.

Consequently, we develop the Taupe software system to help researchers visualize, analyze, and edit the data recorded by eye-tracking devices. The two main objectives of Taupe are compatibility and extensibility so that researchers can easily: (1) apply the system on any eye-tracking data and (2) extend the system with their own analyses. To meet our objectives, we base the development of Taupe: (1) on well-known good practices, such as design patterns and a plug-in architecture using reflection, (2) on a thorough documentation, validation, and verification process, and (3) on lessons learned from existing analysis software systems. This paper describes the context of development of Taupe, the architectural and design choices made during its development, and its documentation, validation and verification process. It also illustrates the application of Taupe in three experiments on the use of design patterns by developers during program comprehension.

LOGI: An Empirical Model of Heat-Induced Disk Drive Data Loss and its Implications for Data Recovery

Hammad Ahmad, Colton Holoday, Ian Bertram, Kevin Angstadt, Zohreh Sharafi, and Westley Weimer
Conference Papers Predictive Models and Data Analytics in Software Engineering (PROMISE), 2022


Disk storage continues to be an important medium for data recording in software engineering, and recovering data from a failed storage disk can be expensive and time-consuming. Unfortunately, while physical damage instances are well documented, existing studies of data loss are limited, often only predicting times between failures. We present an empirical measurement of patterns of heat damage on indicative, low-cost commodity hard drives. Because damaged hard drives require many hours to read, we propose an efficient, accurate sampling algorithm. Using our empirical measurements, we develop LOGI, a formal mathematical model that, on average, predicts sector damage with precision, recall, F-measure, and accuracy values of over 0.95. We also present a case study on the usage of LOGI and discuss its implications for file carver software. We hope that this model is used by other researchers to simulate damage and bootstrap further study of disk failures, helping engineers make informed decisions about data storage for software systems.
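
For readers unfamiliar with the four reported measures, the sketch below computes precision, recall, F-measure, and accuracy from confusion-matrix counts. It assumes that F-measure means the balanced F1 score; the counts and the function name `classification_metrics` are hypothetical and are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, f_measure, accuracy


# Hypothetical counts: sectors predicted damaged vs. actually damaged.
precision, recall, f_measure, accuracy = classification_metrics(
    tp=96, fp=4, fn=4, tn=896
)
```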

Trustworthiness Perceptions in Code Review: An Eye-tracking Study

Ian Bertram, Jack Hong, Yu Huang, Westley Weimer, and Zohreh Sharafi
Conference Papers ACM International Symposium on Empirical Software Engineering and Measurement (ESEM) - Emerging results, 2020


Background: Automated program repair and other bug-fixing approaches are gaining attention in the software engineering community. Automation shows promise in reducing bug fixing costs. However, many developers express reluctance about accepting machine-generated patches into their codebases. Aims: To contribute to the scientific understanding and the empirical investigation of human trust and perception with regard to automation in software maintenance. Method: We design and conduct an eye-tracking study investigating how developers perceive trust as a function of code provenance (i.e., author or source). We systematically vary provenance while controlling for patch quality. Results: In our study of ten participants, overall visual code scanning and the distribution of attention differed across identical code patches labeled as human- vs. machine-written. Participants looked more at the source code for human-labeled patches and looked more at tests for machine-labeled patches. Participants judged human-labeled patches to have better readability and coding style. However, participants were more comfortable giving a critical task to an automated program repair tool. Conclusion: We find that there are significant differences in code review behavior based on trust as a function of patch provenance. Further, we find that important differences can be revealed by eye tracking. Our results may inform the subsequent design and analysis of automated repair techniques to increase developers' trust and, consequently, their deployment.

Investigating Gender Bias and Differences in Code Review using Medical Imaging and Eye-Tracking

Yu Huang, Kevin Leach, Zohreh Sharafi, Nicholas McKay, Tyler Santander, and Westley Weimer
Conference Papers ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), 2020


Code review is a critical step in modern software quality assurance, yet it is vulnerable to human biases. Previous studies have clarified the extent of the problem, particularly regarding gender biases, but no consensus of understanding has emerged. Advances in medical imaging are increasingly applied to software engineering, supporting grounded neurobiological explorations of computing activities, including the review, reading, and writing of source code. In this paper, we present the results of a controlled experiment using both medical imaging and eye tracking to investigate the neurological correlates of gender bias and differences in code review. We find that men and women conduct code reviews differently, in ways that are measurable and supported by behavioral, eye-tracking, and medical imaging data. We also find biases in how humans review code as a function of its apparent author, when controlling for code quality. In addition to advancing our fundamental understanding of how cognitive biases relate to the code review process, the results may inform subsequent training and tool design to reduce bias.

Eye-tracking Metrics in Software Engineering

Zohreh Sharafi, Timothy Shaffer, Bonita Sharif, and Yann-Gaël Guéhéneuc
Conference Papers Asia-Pacific Software Engineering Conference (APSEC), 2015


Eye-tracking studies are becoming more prevalent in software engineering. Researchers often use different metrics when publishing their results in eye-tracking studies. Even when the same metrics are used, they are given different names, causing difficulties in comparing studies. To encourage replications and facilitate advancing the state of the art, it is important that the metrics used by researchers be clearly and consistently defined in the literature. There is therefore a need for a survey of eye-tracking metrics to support the (future) goal of standardizing eye-tracking metrics. This paper seeks to bring awareness to the use of different metrics along with practical suggestions on using them. It compares and contrasts various eye-tracking metrics used in software engineering. It also provides definitions for common metrics and discusses some metrics that the software engineering community might borrow from other fields.
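
To make the idea of a consistently defined metric concrete, the sketch below computes three fixation-based measures over a rectangular area of interest (AOI): fixation count, total fixation time, and mean fixation duration. The metric names, the `Fixation` record, and the rectangular AOI representation are assumptions chosen for illustration; they are not the definitions given in the paper.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    x: float            # horizontal gaze position in pixels
    y: float            # vertical gaze position in pixels
    duration_ms: float  # how long the gaze stayed at this position


def aoi_metrics(fixations, aoi):
    """Summarize fixations that fall inside a rectangular AOI.

    `aoi` is a (left, top, right, bottom) tuple in pixels; the borders
    count as inside.
    """
    left, top, right, bottom = aoi
    inside = [
        f for f in fixations
        if left <= f.x <= right and top <= f.y <= bottom
    ]
    count = len(inside)
    total = sum(f.duration_ms for f in inside)
    return {
        "fixation_count": count,
        "total_fixation_time_ms": total,
        "mean_fixation_duration_ms": total / count if count else 0.0,
    }


# Made-up fixations: two inside the AOI, one outside.
fixations = [
    Fixation(10, 10, 200),
    Fixation(50, 60, 300),
    Fixation(500, 500, 250),
]
metrics = aoi_metrics(fixations, aoi=(0, 0, 100, 100))
```

Writing each metric down as an explicit computation like this is one way to avoid the naming ambiguity described above, since two studies can then compare definitions instead of labels.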

An Empirical Study on the Efficiency of Graphical vs. Textual Representations in Requirements Comprehension

Zohreh Sharafi, Alessandro Marchetto, Angelo Susi, Giuliano Antoniol, and Yann-Gaël Guéhéneuc
Conference Papers 21st International Conference on Program Comprehension (ICPC), May 2013. IEEE Computer Society Press


Graphical representations are used to visualise, specify, and document software artifacts in all stages of the software development process. In contrast with text, graphical representations are presented in two-dimensional form, which seems easy to process. However, few empirical studies have investigated the efficiency of graphical representations vs. textual ones in modelling and presenting software requirements.

Therefore, in this paper, we report the results of an eye-tracking experiment involving 28 participants to study the impact of structured textual vs. graphical representations on subjects' efficiency while performing requirement comprehension tasks. We measure subjects' efficiency in terms of the percentage of correct answers (accuracy) and of the time and effort spent to perform the tasks. We observe no statistically significant difference in terms of accuracy. However, our subjects spent more time and effort while working with the graphical representation, although this extra time and effort does not affect accuracy. Our findings challenge the general assumption that graphical representations are more efficient than the textual ones, at least in the case of developers not familiar with the graphical representation. Indeed, our results emphasise that training can significantly improve the efficiency of our subjects working with graphical representations. Moreover, by comparing the visual paths of our subjects, we observe that the spatial structure of the graphical representation leads our subjects to follow two different strategies (top-down vs. bottom-up) and subsequently this hierarchical structure helps developers to ease the difficulty of model comprehension tasks.

An Empirical Study on Requirements Traceability Using Eye-Tracking

Nasir Ali, Zohreh Sharafi, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
Conference Papers 28th IEEE International Conference on Software Maintenance (ICSM) - 2012, Trento, Italy


Requirements traceability (RT) links help developers to understand programs and ensure that their source code is consistent with its documentation. Creating RT links is a laborious and resource-consuming task. Information Retrieval (IR) techniques are useful to automatically recover traceability links. However, IR-based approaches typically have low accuracy (precision and recall) and, thus, creating RT links remains a human-intensive process. We conjecture that understanding how developers verify RT links could help improve the accuracy of IR-based approaches to recover RT links.

Consequently, we perform an empirical study consisting of two controlled experiments. First, we use an eye-tracking system to capture developers' eye movements while they verify RT links. We analyse the obtained data to identify and rank developers' preferred source code entities (SCEs), e.g., class names and method names. Second, we use the ranked SCEs to propose two new weighting schemes called SE/IDF (source code entity/inverse document frequency) and DOI/IDF (domain or implementation/inverse document frequency) to recover RT links combined with an IR technique. SE/IDF is based on the developers' preferred SCEs to verify RT links. DOI/IDF is an extension of SE/IDF distinguishing domain and implementation concepts. We use LSI combined with SE/IDF, DOI/IDF, and TF/IDF to show, using two systems, iTrust and Pooka, that LSI with DOI/IDF statistically improves the accuracy of the recovered RT links over LSI with TF/IDF.

Women & Men: Different but Equal: A Study on the Impact of Identifiers on Source Code Understanding

Zohreh Sharafi, Zéphyrin Soh, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
Conference Papers 20th IEEE International Conference on Program Comprehension (ICPC) - 2012, Passau, Bavaria, Germany


Program comprehension is preliminary to any program evolution task. Researchers agree that identifiers play an important role in code reading and program understanding activities. Yet, to the best of our knowledge, only one work investigated the impact of gender on the memorability of identifiers and thus, ultimately, on program comprehension. This paper reports the results of an experiment involving 15 male subjects and nine female subjects to study the impact of gender on the subjects' visual effort, required time, as well as accuracy to recall Camel Case versus Underscore identifiers in source code reading.

We observe no statistically significant difference in terms of accuracy, required time, and effort. However, our data supports the conjecture that male and female subjects follow different comprehension strategies: female subjects seem to carefully weigh all options and spend more time to rule out wrong answers while male subjects seem to quickly set their minds on some answers, possibly the wrong ones. Indeed, we found that the effort spent on wrong answers is significantly higher for female subjects and that there is an interaction between the effort that female subjects invested on wrong answers and their higher percentages of correct answers when compared to male subjects.

Professional Status or Expertise for UML Class Diagram Comprehension: An Empirical Study

Zéphyrin Soh, Zohreh Sharafi, Bertrand Van Den Plas, Gerardo Cepeda, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
Conference Papers 20th IEEE International Conference on Program Comprehension (ICPC) - 2012, Passau, Bavaria, Germany


Professional experience is one of the most important criteria for almost any job offer in software engineering. Professional experience refers both to professional status (practitioner vs. student) and expertise (expert vs. novice). We perform an experiment with 21 subjects including both practitioners and students, and experts and novices. We seek to understand the relation between the speed and accuracy of the subjects and their status and expertise in performing maintenance tasks on UML class diagrams. We also study the impact of the formulation of the maintenance task. We use an eye-tracking system to gather the fixations of the subjects when performing the task. We measure the subjects' comprehension using their accuracy, the time spent, the search effort, the overall effort, and the question comprehension effort.

We found that (1) practitioners are more accurate than students while students spend around 35 percent less time than practitioners, (2) experts are more accurate than novices while novices spend around 33 percent less time than experts, (3) expertise is the most important factor for accuracy and speed, (4) experienced students are more accurate and spend around 37 percent less time than experienced practitioners, and (5) when the description of the task is precise, the novice students can be accurate. We conclude that it is an illusion for project managers to focus on status only when recruiting a software engineer. Our result is the starting point to consider the differences between status and expertise when studying software engineers' productivity. Thus, it can help project managers to recruit productive engineers and motivated students to acquire the experience and ability in the projects.

Extending the UML Metamodel to Provide Support for Cross-cutting Concerns

Zohreh Sharafi, Parisa Mirshams, Abdelwahab Hamou-Lhadj, and Constantinos Constantinides
Conference Papers 8th ACIS International Conference on Software Engineering Research, Management and Applications (SERA), pp. 149-157


Aspect-orientation is a term used to describe approaches that explicitly capture, model and implement crosscutting concerns (or aspects). There are currently a number of new programming languages as well as extensions to current programming languages. The design dimensions of most of them have been influenced by the AspectJ language through three concepts and their respective constructs, namely join points, pointcuts, and advice, which can support two principles recognized as being key concepts of aspect-oriented programming (AOP): quantification and obliviousness. At the modeling level, the reception of AOP has long been focused on the modeling of AspectJ programs, and there exists no model that is generic enough to capture non-AspectJ aspects either as a source language during forward engineering or as a target language during reverse engineering.

In this paper, we present an extension to the UML metamodel to explicitly capture crosscutting concerns. The model is independent from any programming language and abstracted away from platform specific details. An instantiation of the newly created metamodel can be represented in standard XMI format, which enables current CASE tools to read and to visualize the instance models in UML. This language-independent aspectual description can support model transformations vital to software development and maintenance, such as forward engineering, reverse engineering, and reengineering.

Model Based Global Image Registration

Niloofar Gheissari, Mostafa Kamali, Parisa Mirshams, and Zohreh Sharafi
Conference Papers 3rd International Conference on Computer Vision Theory and Applications (VISAPP), pp. 440-445


In this paper, we propose a model-based image registration method capable of detecting the true transformation model between two images. We incorporate a statistical model selection criterion to choose the true underlying transformation model. Therefore, the proposed algorithm is robust to degeneracy as any degeneracy is detected by the model selection component. In addition, the algorithm is robust to noise and outliers since any corresponding pair that does not undergo the chosen model is rejected by a robust fitting method adapted from the literature. Another important contribution of this paper is evaluating a number of different model selection criteria for the image registration task. We evaluated all of the criteria under different levels of noise. We conclude that CAIC and GBIC slightly outperform the other criteria for this application. The next choices are GIC, SSD, and MDL. Finally, we create panorama images using our registration algorithm. The panorama images show the success of this algorithm.