The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Big Data Analysis (BDA) has attracted considerable interest and curiosity from scientists of various fields recently. As big size and complexity of big data, it is pivotal to uncover hidden patterns, bursts of activity, correlations and laws of it. Complex network analysis could be effective method for this purpose, because of its powerful data organization and visualization ability. Besides the general...
Test cases are an essential tool in software quality assurance: they ensure that code behaves as specified in the requirement. However, writing test cases does not have only benefits, it comes with a cost: the programmer has to formulate the test cases and maintain them when the tested source code changes. Particularly for start-ups or small enterprises such costs become prohibitive, which often prefer...
The use of Application Programming Interfaces (APIs) is pervasive in software systems; it makes the development of new software much easier, but remembering large APIs with sophisticated usage protocol is arduous for software developers. Code recommendation systems alleviate this burden by providing developers with a ranked list of API usages that are estimated to be most useful to their development...
Source code comments are valuable to keep developers' explanations of code fragments. Proper comments help code readers understand the source code quickly and precisely. However, developers sometimes delete valuable comments since they do not know about the readers' knowledge and think the written comments are redundant. This paper describes a study of lost comments based on edit operation histories...
Stack Overflow is one of the most popular question-and-answer sites for programmers. However, there are a great number of duplicate questions that are expected to be detected automatically in a short time. In this paper, we introduce two approaches to improve the detection accuracy: splitting body into different types of data and using word-embedding to treat word ambiguities that are not contained...
Identifying dependencies between classes is an essential activity when maintaining and evolving software applications. It is also known that JavaScript developers often use classes to structure their projects. This happens even in legacy code, i.e., code implemented in JavaScript versions that do not provide syntactical support to classes. However, identifying associations and other dependencies between...
A large corpora of software-related documents is available on the Web, and these documents offer the unique opportunity to learn from what developers are saying or asking about the code snippets that they are discussing. For example, the natural language in a bug report provides information about what is not functioning properly in a particular code snippet. Previous research has mined information...
A key goal of this research is to understand the relationship between adoption of software library versions and its release cycle. In detail, we conducted an empirical study of the release cycle of 23 libraries and how they were adopted by 415 Apache Software Foundation (ASF) client projects. Our preliminary findings show that software projects are quicker to update earlier rapid-release libraries...
Change distilling algorithms compute a sequence of fine-grained changes that, when executed in order, transform a given source AST into a given target AST. The resulting change sequences are used in the field of mining software repositories to study source code evolution. Unfortunately, detecting and specifying source code evolutions in such a change sequence is cumbersome. We therefore introduce...
In this paper, we present a collection of Modern Code Review data for five open source projects. The data showcases mined data from both an integrated peer review system and source code repositories. We present an easy–to–use andricher data structure to retrieve the 1.) People 2.) Process and 3.) Product aspects of the peer review. This paperpresents the extraction methodology, the dataset structure,...
One of the many effects of social media in software development is the flourishing of very large communities of practice where members share a common interest, such as programming languages, frameworks, and tools. These communities of practice use many different communication channels but little is known about how these communities create, share, and curate knowledge using such channels. In this paper,...
GitHub, one of the most popular social coding platforms, is the platform of reference when mining Open Source repositories to learn from past experiences. In the last years, a number of research papers have been published reporting findings based on data mined from GitHub. As the community continues to deepen in its understanding of software engineering thanks to the analysis performed on this platform,...
Exception handling is a powerful tool provided by many pro- gramming languages to help developers deal with unforeseen conditions. Java is one of the few programming languages to enforce an additional compilation check on certain sub- classes of the Exception class through checked exceptions. As part of this study, empirical data was extracted from soft- ware projects developed in Java. The intent...
ABSTRACTIssue tracking systems store valuable data for testing hy-potheses concerning maintenance, building statistical pre-diction models and (recently) investigating developer affec-tiveness. For the latter, issue tracking systems can be minedto explore developers emotions, sentiments and politeness, affects for short. However, research on affect detection insoftware artefacts is still in its early...
Developers summarize their changes to code in commit messages.When a message seems “unusual’', however, this puts doubt into the quality of the code contained in the commit. We trained n-gram language models and used cross-entropy as an indicator of commit message “unusualness” of over 120,000 commits from open source projects.Build statuses collected from Travis-CI were used as a proxy for code quality...
In this paper, we present a curated collection of 2833 C# solutions taken from Github. We encode the data in a new intermediate representation (IR) that facilitates further analysis by restricting the complexity of the syntax tree and by avoiding implicit information. The dataset is intended as a standardized input for research on recommendation systems for software engineering, but is also useful...
In software development process, developers often seek solutions to the technical problems they encounter by searching relevant questions on Q&A sites. When developers fail to find solutions on Q&A sites in their native language (e.g., Chinese), they could translate their query and search on the Q&A sites in another language (e.g., English). However, developers who are non-native English...
Many studies analyze issue tracking repositories to understand and support software development. To facilitate the analyses, we share a Mozilla issue tracking dataset covering a 15-year history. The dataset includes three extracts and multiple levels for each extract. The three extracts were retrieved through two channels, a front-end (web user interface (UI)), and a back-end (official database dump)...
Stack Overflow is a popular question answering site that is focused on programming problems. Despite efforts to prevent asking questions that have already been answered, the site contains duplicate questions. This may cause developers to unnecessarily wait for a question to be answered when it has already been asked and answered. The site currently depends on its moderators and users with high reputation...
Exception handling is a technique that addresses exceptional conditions in applications, allowing the normal flow of execution to continue in the event of an exception and/or to report on such events. Although exception handling techniques, features and bad coding practices have been discussed both in developer communities and in the literature, there is a marked lack of empirical evidence on how...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.