We would like to take this opportunity to welcome you to the First International Workshop on Data Analysis Patterns in Software Engineering (DAPSE 2013).
We present the Source Code Statistical Language Model data analysis pattern. Statistical language models have been an enabling tool for a wide array of important language technologies. Speech recognition, machine translation, and document summarization (to name a few) all rely on statistical language models to assign probability estimates to natural language utterances or sentences. In this data analysis...
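As a rough illustration of how a statistical language model assigns probability estimates to token sequences, the sketch below trains a bigram model over tokenized source lines. It is a minimal example under assumptions that are not in the abstract: the function names `train_bigram` and `log_prob`, the add-one smoothing, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter

def train_bigram(token_sequences):
    """Count bigram contexts and bigrams over tokenized source lines (hypothetical helper)."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for tokens in token_sequences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        vocab.update(tokens)
        contexts.update(padded[:-1])            # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))
    return contexts, bigrams, len(vocab)

def log_prob(tokens, model):
    """Log-probability of a token sequence under the bigram model."""
    contexts, bigrams, vocab_size = model
    padded = ["<s>"] + list(tokens) + ["</s>"]
    total = 0.0
    for prev, cur in zip(padded, padded[1:]):
        # Add-one smoothing (an assumption) keeps unseen bigrams from getting zero probability.
        total += math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
    return total

# Toy corpus, invented for illustration only.
corpus = [
    ["for", "i", "in", "range", "(", "n", ")", ":"],
    ["for", "j", "in", "range", "(", "m", ")", ":"],
]
model = train_bigram(corpus)
natural = log_prob(["for", "i", "in", "range", "(", "n", ")", ":"], model)
scrambled = log_prob([":", ")", "n", "(", "range", "in", "i", "for"], model)
```

On this toy corpus the familiar token order scores higher than the reversed one, which is the kind of signal such a model provides.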
We present commit graphs, a graph representation of the commit history in version control systems. The graph is structured by commonly changed files between commits. We derive two analysis patterns relating to bug-fixing commits and system modularity.
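One possible reading of this structure is a graph whose nodes are commits and whose edges connect commits that changed at least one file in common. The sketch below follows that reading; the function name `build_commit_graph`, the input shape, and the sample data are assumptions for illustration, not the representation defined in the paper.

```python
from itertools import combinations

def build_commit_graph(commits):
    """Link commits that share changed files.

    commits: dict mapping a commit id to the set of file paths it changed
             (an assumed input shape).
    Returns a dict mapping (commit_a, commit_b) to the set of shared files.
    """
    edges = {}
    for a, b in combinations(sorted(commits), 2):
        shared = commits[a] & commits[b]
        if shared:
            edges[(a, b)] = shared
    return edges

# Invented example history.
history = {
    "c1": {"parser.py", "lexer.py"},
    "c2": {"lexer.py"},
    "c3": {"README.md"},
}
graph = build_commit_graph(history)
```

Here `c1` and `c2` are connected through `lexer.py`, while `c3` stays isolated.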
The concept to commit pattern is used for tracing code changes from user requests (analyzing the mailing list) to change implementation (analyzing the code repository). The analysis is done via text mining of both emails and commit descriptions in four stages. The first stage is identifying a search time window for the mailing list by evaluating a targeted commit time stamp. Once a window is established,...
The paper introduces the concept of data analysis anti-patterns, i.e., data analysis procedures that can produce invalid results and thereby mislead decision makers. Two examples of anti-patterns are presented and discussed.
When we seek insight into collected data we are most often forced to limit our measurements to a portion of all individuals that can be hypothetically considered for observation. Nevertheless, as researchers, we want to draw more general conclusions that are valid beyond the restricted subset we are currently analyzing. Statistical significance testing is a fundamental pattern of data analysis that...
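To make the idea of significance testing concrete, the following sketch runs an exact two-sided permutation test on the difference of two sample means. The choice of a permutation test, the function name `exact_permutation_test`, and the sample values are assumptions made for illustration; the abstract does not say which tests the paper covers.

```python
from itertools import combinations

def exact_permutation_test(a, b):
    """Two-sided exact permutation test on the difference of means.

    Enumerates every way to split the pooled values into groups of the
    original sizes, so it is only practical for small samples.
    """
    pooled = list(a) + list(b)
    n, m = len(a), len(b)
    total = sum(pooled)
    observed = abs(sum(a) / n - sum(b) / m)
    extreme = splits = 0
    for idx in combinations(range(n + m), n):
        s = sum(pooled[i] for i in idx)
        diff = abs(s / n - (total - s) / m)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits

# Invented measurements for two small groups.
group_a = [10, 11, 12, 13, 14]
group_b = [20, 21, 22, 23, 24]
p_value = exact_permutation_test(group_a, group_b)
```

The returned value is the fraction of relabelings that produce a mean difference at least as large as the observed one.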
Software engineering (SE) data contains binary relationships between entities and their properties. Users are usually interested in meaningful groupings of those entities and properties. Formal concept analysis (FCA) is a powerful technique that uses the binary relation between entities and entity properties to infer a hierarchy of concepts. The output of FCA...
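A small sketch may help show what a formal concept is: a pair (extent, intent) where the extent is exactly the set of entities sharing every property in the intent. The code below enumerates all concepts of a tiny binary context by closing the entity property sets under intersection. The function name `formal_concepts` and the example context are hypothetical, and this naive enumeration is only suitable for small inputs.

```python
def formal_concepts(context):
    """Enumerate formal concepts of a binary context.

    context: dict mapping an entity to the set of properties it has.
    Returns a list of (extent, intent) pairs as frozensets.
    """
    all_props = set().union(*context.values()) if context else set()
    # Concept intents are exactly the intersections of entity property sets,
    # together with the full property set.
    intents = {frozenset(all_props)}
    for props in context.values():
        intents |= {frozenset(props) & intent for intent in intents}
    concepts = []
    for intent in intents:
        extent = frozenset(e for e, props in context.items() if intent <= props)
        concepts.append((extent, intent))
    return concepts

# Invented context: modules and two properties.
modules = {
    "Parser": {"has_tests", "uses_io"},
    "Lexer": {"has_tests"},
    "Logger": {"uses_io"},
}
concepts = formal_concepts(modules)
```

Ordering the resulting concepts by extent inclusion gives the concept hierarchy (lattice) that FCA is used to infer.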
Software developers and managers make decisions based on the understanding they have of their software systems. This understanding is built up both experientially and through investigating various software development artifacts. While artifacts can be investigated individually, being able to summarize characteristics about a set of development artifacts can be useful. In this paper we propose lifecycle...
The ‘Measure what counts’ pattern consists of evaluating software data analysis techniques against problem-specific measures related to cost and other stakeholders' goals instead of relying solely on generic metrics such as recall, precision, F-measure, and the area under the Receiver Operating Characteristic (ROC) curve.
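The difference between a generic metric and a problem-specific one can be shown with a small example. In the sketch below, two hypothetical defect predictors are compared by F-measure and by a total misclassification cost. The cost values (1 per false alarm, 10 per missed defect), the function names, and the predictions are all invented for illustration.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn) for binary labels where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def f_measure(y_true, y_pred):
    """Generic metric: harmonic mean of precision and recall."""
    tp, fp, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def total_cost(y_true, y_pred, cost_fp, cost_fn):
    """Problem-specific measure: cost of false alarms plus cost of misses."""
    _, fp, fn = confusion_counts(y_true, y_pred)
    return fp * cost_fp + fn * cost_fn

# Invented ground truth and predictions (1 = defective module).
truth   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]  # finds every defect, three false alarms
model_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # no false alarms, misses one defect

f_a, f_b = f_measure(truth, model_a), f_measure(truth, model_b)
# Assumed costs: a missed defect is ten times as expensive as a false alarm.
cost_a = total_cost(truth, model_a, cost_fp=1, cost_fn=10)
cost_b = total_cost(truth, model_b, cost_fp=1, cost_fn=10)
```

Under these assumed costs, `model_b` has the higher F-measure while `model_a` is cheaper overall, so the two measures rank the models differently.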
This pattern was originally designed to classify sequences of events in log files by error-proneness. Sequences of events trace application use in real contexts. As such, identifying error-prone sequences helps understand and predict application use. The classification problem we describe is typical in supervised machine learning, but the composite pattern we propose investigates it with several techniques...
Bug reports provide insight into the quality of an evolving software system and into its development process. Such data, however, is often incomplete and inaccurate, and thus should be cleaned before analysis. In this paper, we present patterns that help both novice and experienced data scientists to discard invalid bug data that could lead to wrong conclusions.
Bug reports record tasks performed by users and developers while collaborating to resolve bugs. Such data can be transformed into higher level information that helps data scientists understand various aspects of the team's development process. In this paper, we present patterns that show, step by step, how to extract higher level information about software verification from bug report data.
In this paper, we propose a data transformation pattern that converts sequential data into a set of binary/categorical features and numerical features to enable data analysis. These features capture both structural and temporal information inherent in sequential data.
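A transformation of this kind could look like the sketch below, which turns one timestamped event sequence into a flat feature dictionary. The specific features (event presence, event counts, adjacent-pair indicators, duration, mean gap), the function name `sequence_features`, and the sample events are assumptions chosen for illustration rather than the feature set proposed in the paper.

```python
def sequence_features(events):
    """Turn a time-ordered event sequence into a flat feature dict.

    events: list of (timestamp_in_seconds, event_name), sorted by timestamp.
    """
    names = [name for _, name in events]
    times = [t for t, _ in events]
    features = {}
    # Binary and count features: which events occur, and how often.
    for name in set(names):
        features[f"has_{name}"] = 1
        features[f"count_{name}"] = names.count(name)
    # Structural features: which event directly follows which.
    for first, second in zip(names, names[1:]):
        features[f"follows_{first}>{second}"] = 1
    # Temporal features: overall length, duration, and mean gap between events.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    features["length"] = len(events)
    features["duration"] = times[-1] - times[0] if times else 0
    features["mean_gap"] = sum(gaps) / len(gaps) if gaps else 0.0
    return features

# Invented editing session.
session = [(0, "open"), (5, "edit"), (20, "save")]
feats = sequence_features(session)
```

Features missing from a given sequence would be treated as 0 when the dictionaries are assembled into a table for a learner.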
Chunks are sets of code that have the property that a change that touches a chunk touches only that chunk. The pattern described in this paper defines chunks, indicates their usefulness, and provides an algorithm for calculating them.
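One way to obtain sets with this property is to treat files as nodes, link files that were touched by the same change, and take the connected components. The sketch below does that with a union-find structure. It is an interpretation, not the algorithm from the paper: the file-level granularity, the function name `compute_chunks`, and the sample changes are all assumptions.

```python
def compute_chunks(changes):
    """Group files so that every change falls inside exactly one group.

    changes: iterable of sets of file paths, one set per change.
    Returns a sorted list of sorted file lists (connected components
    of the co-change relation).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for files in changes:
        files = sorted(files)
        for f in files:
            find(f)                        # register every file
        for f in files[1:]:
            parent[find(f)] = find(files[0])

    groups = {}
    for f in list(parent):
        groups.setdefault(find(f), set()).add(f)
    return sorted(sorted(group) for group in groups.values())

# Invented change history.
changes = [{"a.c", "a.h"}, {"a.h", "b.c"}, {"x.py"}, {"y.py", "x.py"}]
chunks = compute_chunks(changes)
```

With this history, `a.c`, `a.h`, and `b.c` end up in one group and `x.py`, `y.py` in another, and no recorded change crosses a group boundary.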