The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Currently, open source projects receive various kinds of issues daily, because of the extreme openness of Issue Tracking System (ITS) in GitHub. ITS is a labor-intensive and time-consuming task of issue categorization for project managers. However, a contributor is only required a short textual abstract to report an issue in GitHub. Thus, most traditional classification approaches based on detailed...
Commit comments increasingly receive attention as an important complementary component in code change comprehension. To address the comment scarcity issue, a variety of automatic approaches for commit comment generation have been intensively proposed. However, most of these approaches mechanically outline a superficial level summary of the changed software entities, the change intent behind the code...
Log analysis plays an important role for computer failure diagnosis. With the ever increasing size and complexity of logs, the task of analyzing logs has become cumbersome to carry out manually. For this reason, recent research has focused on automatic analysis techniques for large log files. However, log messages are texts with certain formats and it is very challenging for automatic analysis to...
As more and more companies become aware of the benefits of collecting and analyzing data, hiring employee with data analytics expertise is a key issue faced by HR practitioners. Although previous research empirically highlighted the differences of knowledge and skill requirements between big data (BD) and business intelligence (BI) in English-speaking countries, limited similar study is conducted...
Ultra-large-scale mining has been shown to be useful for a number of software engineering tasks e.g. mining specifications, defect prediction. We propose a new research direction for accelerating ultra-large-scale mining that goes beyond parallelization. Our key idea is to analyze the interaction pattern between the mining task and the artifact to cluster artifacts such that running the mining task...
Integrating code from different sources can be an error-prone and effort-intensive process. While an integration may appear statically sound, unexpected errors may still surface at run time. The industry practice of continuous integration aims to detect these and other run-time errors through an extensive pipeline of successive tests. Using data from a continuous integration service, Travis CI, we...
Over the last few years, researchers proposed several semantic history slicing approaches that identify the set of semantically-related commits implementing a particular software functionality. However, there is no comprehensive benchmark for evaluating these approaches, making it difficult to assess their capabilities. This paper presents a dataset of 81 semantic change data collected from 8 real-world...
Code reuse is a common practice among software developers,whether novices or experts. Developers often rely on onlineresources in order to find code to reuse. For Python, thePython Package Index (PyPI) contains all packages developedfor the community and is the largest catalog of reusable, opensource packages developers can consult. While a valuableresource, the state of the art PyPI search has very...
Bug localisation is a core program comprehension task in software maintenance: given the observation of a bug, where is it located in the source code files? Information retrieval (IR) approaches see a bug report as the query, and the source code files as the documents to be retrieved, ranked by relevance. Such approaches have the advantage of not requiring expensive static or dynamic analysis of the...
Many kinds of real world data can be modeled by a heterogeneous information network (HIN) which consists of multiple types of objects. Clustering plays an important role in mining knowledge from HIN. Several HIN clustering algorithms have been proposed in recent years. However, these algorithms suffer from one or moreof the following problems: (1) inability to model general HINs, (2) inability to...
The traditional duplicate bug reports detection approaches are usually based on vector space model. However, the experimental result is rarely satisfying since this method cannot distinguish semantic correlation among bug reports which written by natural languages. Topic model, as a method to model underlying topics of texts, can solve the problem of document similarity calculation methods used in...
Predictive models for software projects' characteristics have been traditionally based on project-level metrics, employing only little developer-level information, or none at all. In this work we suggest novel metrics that capture temporal and semantic developer-level information collected on a per developer basis. To address the scalability challenges involved in computing these metrics for each...
The present work proposes a paradigm for the analysis of social networks hidden within incomplete data models, based on the semantic mining of noisy corpora. A proof of concept is implemented and experimented on a partial database resulting from the capture of short text messages in line with the international project ‘Sms4science’.
To assist decision makers, there is need of providing an insight on current of scenario of market, considering really sensitive news involving economic events like acquisitions, stock splits, or dividend announcements. Similar to the work discussed above, to machine process natural language constraints, there is need of a mechanism that can automate events related information extraction and knowledge...
We introduce a framework to extract and parse Java source code, serialize it into RDF triples by applying an appropriate ontology and then analyze the resulting structured code information by using standard SPARQL queries. We present our experiments on a sample of 134 Java repositories collected from Github, obtaining 17 Million triples about methods, input and output types, comments, and other source...
Temporal properties are useful for describing and reasoning about software behavior, but developers rarely write down temporal specifications of their systems. Prior work on inferring specifications developed tools to extract likely program specifications that fit particular kinds of tool-specific templates. This paper introduces Texada, a new temporal specification mining tool for extracting specifications...
Recently, Begel et al. found that one of the most important questions software developers ask is "what parts of software are used/loved by users." User reviews provide an effective channel to address this question. However, most existing review summarization tools treat reviews as bags-of-words (i.e., mixed review categories) and are limited to extract software aspects and user preferences...
In this paper we present an ongoing FP7 project VENIS, where we are focusing on interoperability problems between Large Enterprise (LE) and Small/Medium Enterprise (SME) as well as Micro Enterprise (ME). We are proposing a solution for the enterprise search and enterprise interoperability, which follows the lightweight semantic approach. This approach is more suitable for SMEs and MEs, which are not...
GitHub is a social coding platform that enables developers to efficiently work on projects, connect with other developers, collaborate and generally "be seen: by the community. This visibility also extends to prospective employers and HR personnel who may use GitHub to learn more about a developer's skills and interests. We propose a pipeline that automatizes this process and automatically suggests...
Recent developments in geotechnologies provide resources to propose innovative strategies for urban and environmental management, including remote sensing data and computational resources for processing them. With the main objective of identifying urban areas of illegal occupation, this work uses WorldView-2-sensor images and the InterIMAGE, an image interpretation software, based on knowledge, under...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.