Automatic text summarization has been widely studied for more than fifty years. In software engineering, automatic summarization is an emerging area that shows great potential and poses new and exciting research challenges. This technical briefing provides an introduction to the state of the art and maps future research directions in automatic software summarization.
A good understanding of the practices followed by software development projects can positively impact their success — particularly for attracting talent and on-boarding new members. In this paper, we perform a cluster analysis to classify software projects that follow continuous integration in terms of their activity, popularity, size, testing, and stability. Based on this analysis, we identify and...
The Unified Modeling Language (UML) is widely taught in academia and has good acceptance in industry. However, no ample dataset of UML diagrams is publicly available. Our aim is to offer a dataset of UML files, together with meta-data of the software projects to which the UML files belong. Therefore, we have systematically mined over 12 million GitHub projects to find UML files in them....
Over the last few years, researchers proposed several semantic history slicing approaches that identify the set of semantically-related commits implementing a particular software functionality. However, there is no comprehensive benchmark for evaluating these approaches, making it difficult to assess their capabilities. This paper presents a dataset of 81 semantic change data collected from 8 real-world...
Software developers often need to repeat similar modifications in multiple different locations of a system's source code. These repeated similar modifications, or systematic edits, can be both tedious and error-prone to perform manually. While there are tools that can be used to assist in automating systematic edits, it is not straightforward to find out where the occurrences of a systematic edit...
In the emerging field of big data, a large volume of data has to be managed. Operating on data of huge volume becomes easier when it is sorted and structured. The data can be structured using a simple algorithm, i.e. an index algorithm, which stores and categorizes data on the basis of their application. This in turn will be very beneficial at the business level as well as at the software level.
In Data Mining (DM) projects, more specifically in the Data Understanding and the Data Preparation phases, several techniques found in the literature are used to detect and handle data quality problems such as missing data, outliers, inconsistent data or time-variant data. However, the main limitation in the application of these techniques is the complexity caused by a lack of anticipation in the...
Today the microservice architectural style is being adopted by many key technological players such as Netflix, Amazon, and The Guardian. A microservice architecture is composed of a large set of small services, each running in its own process and communicating with lightweight mechanisms (often via REST APIs). If on one side having a large set of independently developed services helps in terms of developer...
Taking Shenzhen Dafen Oil Painting Village as an example, this paper studies the effect of convergence between a creative industry park and tourism. Using ROST Content Mining 6 software, which performs analysis based on network text, this paper constructs a conceptual model of the development effect of urban cultural and creative tourism, and then views the development situation of cultural and creative...
Generally, software evolution is a process of frequent iteration that rapidly produces large-volume, heterogeneous, and unstructured data. During this process, lots of noisy data and side effects are generated. In this way, these software evolution data form the so-called four Vs of Big Data. So it is necessary to extract valuable information from the big software evolution data in...
Modern systems are growing in complexity beyond deep comprehension of developers. Increasing difficulties of keeping software projects on schedule and increasing recall rates are symptoms of this development. Consequently, developers need new methods and tools to build embedded systems, such as tools that dynamically analyze systems and recover comprehensible specifications of particular aspects....
The identification of vulnerabilities relies on detailed information about the target infrastructure. The gathering of the necessary information is a crucial step that requires an intensive scanning or mature expertise and knowledge about the system even though the information was already available in a different context. In this paper we propose a new method to detect vulnerabilities that reuses...
In a large, long-lived project, an effective code review process is key to ensuring the long-term quality of the code base. In this work, we study code review practices of a large, open source project, and we investigate how the developers themselves perceive code review quality. We present a qualitative study that summarizes the results from a survey of 88 Mozilla core developers. The results provide...
Software developers need access to different kinds of information which is often dispersed among different documentation sources, such as API documentation or Stack Overflow. We present an approach to automatically augment API documentation with "insight sentences" from Stack Overflow -- sentences that are related to a particular API type and that provide insight not contained in the API...
Software licensing determines how open source systems are reused, distributed, and modified from a legal perspective. While it facilitates rapid development, the legal language of these licenses can make them difficult for developers to understand. Because of misunderstandings, systems can incorporate licensed code in a way that violates the terms of the license. Our research first aimed...
As the popularity of mobile smart devices continues to climb, the complexity of “apps” continues to increase, making the development and maintenance process challenging. Current bug tracking systems lack key features to effectively support construction of reports with actionable information that directly lead to a bug’s resolution. In this demo we present the implementation of a novel bug reporting...
There is a vast growth of generated event data being collected and stored by organizations. Within the field of Process Mining, this data has been used to discover, analyze and enhance processes from different domains. For this purpose there are hundreds of techniques available in different tools. These techniques are mostly focused on single processes. On the other hand, there are several proposals...
App developers naturally want to know which of their releases are successful and which are unsuccessful. Such information can help with release planning and requirements prioritisation and elicitation. To address this problem, I performed causal analysis on 52 weeks of popular app releases from Google Play and Windows Phone Store. The results reveal properties of successful releases in multiple app...
Code reuse is a common practice among software developers, whether novices or experts. Developers often rely on online resources in order to find code to reuse. For Python, the Python Package Index (PyPI) contains all packages developed for the community and is the largest catalog of reusable, open source packages developers can consult. While a valuable resource, the state of the art PyPI search has very...
Nowadays, software development projects produce a large number of software artifacts including source code, execution traces, end-user feedback, as well as informal documentation such as developers' discussions, change logs, StackOverflow, and code reviews. Such data embeds rich and significant knowledge about software projects, their quality and services, as well as the dynamics of software development...