Dependency-based software change impact analysis is the domain concerned with estimating the sets of artifacts impacted by a change to a related artifact. Research has shown that analysing the various class dependency types independently will never completely reveal the impact sets. Therefore, dependency types are combined to improve the precision of the estimated impact sets. Software...
Currently, open source projects receive various kinds of issues daily, because of the extreme openness of the Issue Tracking System (ITS) in GitHub. Categorizing these issues is a labor-intensive and time-consuming task for project managers. However, a contributor is only required to provide a short textual abstract to report an issue in GitHub. Thus, most traditional classification approaches based on detailed...
A wide range of text-based artifacts contribute to software projects (e.g., source code, test cases, use cases, project requirements, interaction diagrams, etc.). Traceability Link Recovery (TLR) is the software task in which relevant documents in these various sets are linked to one another, uncovering information about the project that is not available when considering only the documents themselves...
Developing reliable software at low cost is an objective for any software developer. Software errors are removed during the various review and test phases in the development lifecycle. Software is modified to eliminate these errors. Changes are implemented during the maintenance phases also. These software changes and fixes may introduce new errors and can cause defect propagation in different dependent...
Existing registries organize functionally similar services into groups without considering past service-usage from the consumers' perspective, a.k.a. pragmatics. Pragmatics can help registries to calculate service similarity more effectively and improve organization schemes. However, pragmatics are not available beforehand and their highly accumulated number over time creates time and space efficiency...
Integrating code from different sources can be an error-prone and effort-intensive process. While an integration may appear statically sound, unexpected errors may still surface at run time. The industry practice of continuous integration aims to detect these and other run-time errors through an extensive pipeline of successive tests. Using data from a continuous integration service, Travis CI, we...
We propose to study the impact of the representation of the data in defect prediction models. For this study, we focus on the use of developer activity data, from which we structure dependency graphs. Then, instead of manually generating features, such as network metrics, we propose a model inspired by recent advances in Representation Learning which are able to automatically learn representations...
Code reuse is a common practice among software developers, whether novices or experts. Developers often rely on online resources in order to find code to reuse. For Python, the Python Package Index (PyPI) contains all packages developed for the community and is the largest catalog of reusable, open source packages developers can consult. While a valuable resource, the state of the art PyPI search has very...
Predictive models for software projects' characteristics have traditionally been based on project-level metrics, employing little developer-level information, or none at all. In this work we suggest novel metrics that capture temporal and semantic developer-level information collected on a per-developer basis. To address the scalability challenges involved in computing these metrics for each...
This article studies the extent and practicality of plagiarism detection systems using multiple classifications of detection engines, further described within the article. An in-depth analysis of 8 individual articles from different fields of work was carried out allowing comparisons both between detection systems and different writing styles/formats. The first analysis used unmodified versions of...
A major risk in software development is the existence of duplicate code, which can affect software maintainability. The main aim of clone identification techniques is to search for and detect the parts of the software code that are identical. In the past, various techniques have been used to identify and reflect the code identity and code fragments. Code cloning reduces...
Context: Software source code is frequently changed for fixing revealed bugs. These bug-fixing changes might introduce unintended system behaviors, which are inconsistent with scenarios of existing regression test cases, and consequently break regression testing. For validating the quality of changes, regression testing is a required process before submitting changes during the development of software...
In Software Product Line (SPL) engineering, Feature Models (FMs) are widely used to capture and manage variability in a sound and organized fashion. Though semantics, notations and reasoning support are well established, maintaining large FMs is still an open problem. As large FMs naturally contain different concerns, some related to domains, others being inherently cross-cutting ones, it is challenging...
The problem of software artifact retrieval aims to effectively locate software artifacts, such as a piece of source code, in a large code repository. This problem has traditionally been addressed through textual queries. In other words, information retrieval techniques are exploited based on the textual similarity between queries and the textual representation of software artifacts, which...
Modern software and hardware designs are mostly hierarchical. Moreover, while the design specification is defined top-down, the design implementation and verification are done bottom-up. In such a case, as a rule, coverage properties for simulation-based verification are defined inconsistently for different stages of the design flow. This leads to the well-known explosion of the bug rate, when we pass...
Many automated software engineering approaches, including code search, bug report categorization, and duplicate bug report detection, measure similarities between two documents by analyzing natural language contents. Often different words are used to express the same meaning and thus measuring similarities using exact matching of words is insufficient. To solve this problem, past studies have shown...
In this paper, we report an experience on using and adapting Semantic Clustering to evaluate software remodularizations. Semantic Clustering is an approach that relies on information retrieval and clustering techniques to extract sets of similar classes in a system, according to their vocabularies. We adapted Semantic Clustering to support remodularization analysis. We evaluate our adaptation using...
Extract Class refactoring (ECR) is used to divide large classes with low cohesion into smaller, more cohesive classes. However, splitting a class might result in increased coupling in the system due to new dependencies between the extracted classes. Thus, ECR requires that a software engineer identify a trade-off between cohesion and coupling. Such a trade-off may be difficult to identify manually...
The development cost of safety-critical embedded systems is dominated today by the cost of software including verification and validation. This cost is typically related to the complexity of the software functions implementing the desired system behavior in nominal and off-nominal conditions. A widely used measure of complexity is the cyclomatic number, which is computed on the implementation code...
Many software engineering tasks, such as feature location and duplicate bug report detection, leverage similarities among textual corpora. However, due to the different words used by developers to express the same concept, exact matching of words is insufficient. One document can contain a particular word while the other document may contain another word that is semantically related but is not the...