One of the most common causes of bugs is overlooking changes. To prevent bugs and improve the quality of the products, numerous studies have been undertaken on change guides based on logical couplings extracted from developers' past process histories, such as change history. While valuable change rules based on logical couplings can be gleaned from the change history, these rules often fail...
Refactoring is an important technique for improving the maintainability of software, and developers often use it during development. Researchers have previously proposed techniques for finding refactoring opportunities, that is, identifying locations to be refactored. However, there are no specific criteria for developers to determine...
Static analysis tools are often used by software developers to enable early detection of potential faults, vulnerabilities, and code smells, or to assess the adherence of source code to coding standards and guidelines. Also, their adoption within Continuous Integration (CI) pipelines has been advocated by researchers and practitioners. This paper studies the usage of static analysis tools in 20 Java open...
A main difficulty in studying the evolution and quality of real-life software systems is the effect of moderator factors, such as programming skill, type of maintenance task, and learning effect. Experimenters must account for moderator factors to identify the relationships between the variables of interest. In practice, controlling for moderator factors in realistic (industrial) settings is expensive...
Continuous Integration (CI) has become a best practice of modern software development. Thanks in part to its tight integration with GitHub, Travis CI has emerged as arguably the most widely used CI platform for Open-Source Software (OSS) development. However, despite its prominent role in software engineering practice, the benefits, costs, and implications of doing CI are anything but clear from an...
Over the last few years, researchers proposed several semantic history slicing approaches that identify the set of semantically-related commits implementing a particular software functionality. However, there is no comprehensive benchmark for evaluating these approaches, making it difficult to assess their capabilities. This paper presents a dataset of 81 semantic change data collected from 8 real-world...
Source code comments are valuable for preserving developers' explanations of code fragments. Proper comments help code readers understand the source code quickly and precisely. However, developers sometimes delete valuable comments because they do not know how much readers already know and think the written comments are redundant. This paper describes a study of lost comments based on edit operation histories...
Change distilling algorithms compute a sequence of fine-grained changes that, when executed in order, transform a given source AST into a given target AST. The resulting change sequences are used in the field of mining software repositories to study source code evolution. Unfortunately, detecting and specifying source code evolutions in such a change sequence is cumbersome. We therefore introduce...
In this paper, we present a collection of Modern Code Review data for five open source projects. The data showcases mined data from both an integrated peer review system and source code repositories. We present an easy-to-use and richer data structure to retrieve the (1) People, (2) Process, and (3) Product aspects of the peer review. This paper presents the extraction methodology, the dataset structure,...
Many studies analyze issue tracking repositories to understand and support software development. To facilitate the analyses, we share a Mozilla issue tracking dataset covering a 15-year history. The dataset includes three extracts and multiple levels for each extract. The three extracts were retrieved through two channels, a front-end (web user interface (UI)), and a back-end (official database dump)...
Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, where is it located in the source code files? Information retrieval (IR) approaches see a bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the...
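The IR framing described above (bug report as query, source files as documents ranked by relevance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the approach of the paper: it uses TF-IDF weighting with cosine similarity, and the names `tokenize` and `rank_files` as well as the sample file contents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens.

    Assumption: identifiers are not split on camelCase; real tools often do that.
    """
    return re.findall(r"[a-z]+", text.lower())


def rank_files(bug_report, files):
    """Rank source files against a bug report by TF-IDF cosine similarity.

    `files` maps a file path to its source text. Returns (path, score) pairs,
    highest score first.
    """
    docs = {path: Counter(tokenize(src)) for path, src in files.items()}
    n_docs = len(docs)

    # Document frequency: in how many files each term occurs.
    df = Counter()
    for tf in docs.values():
        df.update(tf.keys())

    # Smoothed inverse document frequency (one common variant, chosen here).
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

    def vectorize(tf):
        # Terms unseen in any file carry no weight.
        return {t: f * idf[t] for t, f in tf.items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    query = vectorize(Counter(tokenize(bug_report)))
    scores = [(path, cosine(query, vectorize(tf))) for path, tf in docs.items()]
    return sorted(scores, key=lambda s: (-s[1], s[0]))


if __name__ == "__main__":
    # Hypothetical two-file project and bug report.
    files = {
        "Parser.java": "class Parser { void parse(String input) { } }",
        "Cache.java": "class Cache { void evict() { } }",
    }
    for path, score in rank_files("Crash when parse is called on empty input", files):
        print(f"{score:.3f}  {path}")
```

The ranking needs only the text of the report and the files, which is why such approaches avoid the cost of static or dynamic analysis.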
Past research has proposed association rule mining as a means to uncover the evolutionary coupling from a system’s change history. These couplings have various applications, such as improving system decomposition and recommending related changes during development. The strength of the coupling can be characterized using a variety of interestingness measures. Existing recommendation engines typically...
Build systems describe how source code is translated into deliverables. Developers use build management tools like Maven to specify their build systems. Past work has shown that while Maven provides invaluable features (e.g., incremental building), it introduces an overhead on software development. Indeed, Maven build systems require maintenance. However, Maven build systems follow the build lifecycle,...
This paper proposes an active mining process for improving the quality of clinical processes by using service logs in a hospital information system. First, datasets of temporal changes in the number of orders are extracted from service logs stored in the hospital information system. Then, since datasets of temporal change can be viewed as time series of a statistic, clustering can be applied to the data...
Association rule mining is an unsupervised learning technique that infers relationships among items in a data set. This technique has been successfully used to analyze a system's change history and uncover evolutionary coupling between system artifacts. Evolutionary coupling can, in turn, be used to recommend artifacts that are potentially affected by a given set of changes to the system. In general,...
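As a rough illustration of how association rules over a change history can drive recommendations, the sketch below mines pairwise co-change rules and ranks them by confidence. It is a simplified example under stated assumptions, not the technique evaluated in the paper: rules are restricted to single-file antecedents, and the names `mine_cochange_rules` and `recommend`, the thresholds, and the sample commits are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_cochange_rules(commits, min_support=2, min_confidence=0.6):
    """Mine pairwise rules "lhs changed => rhs changed" from a change history.

    `commits` is a list of file lists, one per commit. Support is the number of
    commits that touch both files; confidence is support / commits touching lhs.
    Returns (lhs, rhs, support, confidence) tuples, strongest first.
    """
    file_counts = Counter()
    pair_counts = Counter()
    for files in commits:
        unique = sorted(set(files))
        file_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), support in pair_counts.items():
        if support < min_support:
            continue
        # Each co-changing pair yields up to two directed rules.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = support / file_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], -r[2], r[0], r[1]))


def recommend(rules, changed_files):
    """List files that the rules suggest may also need to change."""
    changed = set(changed_files)
    suggestions = []
    for lhs, rhs, _, _ in rules:
        if lhs in changed and rhs not in changed and rhs not in suggestions:
            suggestions.append(rhs)
    return suggestions


if __name__ == "__main__":
    # Hypothetical change history: each inner list is one commit.
    history = [
        ["a.py", "b.py"],
        ["a.py", "b.py", "c.py"],
        ["a.py"],
        ["c.py", "d.py"],
    ]
    rules = mine_cochange_rules(history)
    print(rules)
    print(recommend(rules, ["b.py"]))
```

Support and confidence are only two of the interestingness measures that can rank such rules; the thresholds used here are arbitrary.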
Since its introduction 10 years ago, GIT has taken the world of version control systems (VCS) by storm. Its success is partly due to creating opportunities for new usage patterns that empower developers to work more efficiently. However, the resulting change in both user behavior and the way GIT stores changes impacts data mining and data analytics procedures [6], [13]. While some of these unique...
Many software projects adopt mailing lists for the communication of developers and users. Researchers have been mining the history of such lists to study communities' behavior, organization, and evolution. A potential threat of this kind of study is that users often use multiple email addresses to interact in a single mailing list. This can affect the results and tools, when, for example, extracting...
In order to better manage software evolution in CBSE, this paper proposes a framework and a prototype supporting system from the technical and process management perspectives. This framework integrates several components, including software change control and tracking, by defining the change template based on a process engine. The framework can not only support software evolution information collection...
In Mining Software Repositories research, source code repositories are one of the core sources, since they contain both the product and the process of software development. A source code repository stores the versions of files and makes it possible to browse the histories of files, such as modification dates, authors, messages, and so on. Although such rich information of file histories is easily available,...
Micro-clones are small pieces of redundant code, such as repeated subexpressions or statements. In this paper, we establish the considerations and value toward automated detection and removal of micro-clones at scale. We leverage the Boa software mining infrastructure to detect micro-clones in a data set containing 380,125 Java repositories, and yield thousands of instances where redundant code may...