chapter

"How Not to Do It": Anti-patterns for Data Science in Software Engineering

Tim Menzies

2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C) > 887

Many books and papers describe how to do data science. While those texts are useful, it can also be important to reflect on anti-patterns, i.e., common classes of errors seen when large communities of researchers and commercial software engineers use and misuse data mining tools. This technical briefing will present those errors and show how to avoid them.

chapter

Security Expert Recommender in Software Engineering

Shahab Bayati

2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C) > 719 - 721

Software engineering is a complex field with diverse specialties. With the growth of Internet-based applications, information security plays an important role in the software development process. Finding software engineers with expertise in information security requires too much effort. Stack Overflow is the largest social Q&A website in the field of software engineering. Stack Overflow contains...

chapter

Candoia: A Platform and Ecosystem for Mining Software Repositories Tools

Nitin M Tiwari, Ganesha Upadhyaya, Hridesh Rajan

2016 IEEE/ACM 38th International Conference on Software Engineering Companion (ICSE-C) > 759 - 761

We introduce Candoia, a platform and ecosystem for building Mining Software Repositories (MSR) tools. The platform is designed to support building of MSR tools by providing necessary tools and abstractions that hide the complex details of version control, bug databases, source code programming languages and forges. The ecosystem allows easy sharing and accessing of MSR apps for researchers and practitioners...

chapter

Software emergence for need based large data processing in engineering problems

Abdul Waheed, Qasim Ali

2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA) > 442 - 446

Large data handling and analysis, at either the industrial or the research level, has always faced problems. These problems grow as the number of machine-dedicated software packages increases. Large data processing and analysis is error-prone and time-consuming when moving data from data generation to data analysis. In this paper, first, methods of data generation, methods to move data from...

chapter

Comparison of complex network analysis software: Citespace, SCI² and Gephi

Jing Yang, Changxiu Cheng, Shi Shen, Shanli Yang

2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA) > 169 - 172

Big Data Analysis (BDA) has recently attracted considerable interest and curiosity from scientists in various fields. Given the size and complexity of big data, it is pivotal to uncover its hidden patterns, bursts of activity, correlations and laws. Complex network analysis could be an effective method for this purpose because of its powerful data organization and visualization abilities. Besides the general...

chapter

Test Case Generation and Prioritization: A Process-Mining Approach

Andrea Janes

2017 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW) > 38 - 39

Test cases are an essential tool in software quality assurance: they ensure that code behaves as specified in the requirements. However, writing test cases does not only bring benefits; it also comes with a cost: the programmer has to formulate the test cases and maintain them when the tested source code changes. Such costs become prohibitive particularly for start-ups or small enterprises, which often prefer...

chapter

A framework for classifying and comparing source code recommendation systems

Mohammad Ghafari, Hamidreza Moradi

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) > 555 - 556

The use of Application Programming Interfaces (APIs) is pervasive in software systems; it makes the development of new software much easier, but remembering large APIs with sophisticated usage protocols is arduous for software developers. Code recommendation systems alleviate this burden by providing developers with a ranked list of API usages that are estimated to be most useful to their development...

chapter

Lost comments support program comprehension

Takayuki Omori

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) > 567 - 568

Source code comments are valuable to keep developers' explanations of code fragments. Proper comments help code readers understand the source code quickly and precisely. However, developers sometimes delete valuable comments since they do not know about the readers' knowledge and think the written comments are redundant. This paper describes a study of lost comments based on edit operation histories...

chapter

Two improvements to detect duplicates in Stack Overflow

Yuji Mizobuchi, Kuniharu Takayama

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) > 563 - 564

Stack Overflow is one of the most popular question-and-answer sites for programmers. However, there are a great number of duplicate questions that are expected to be detected automatically in a short time. In this paper, we introduce two approaches to improve the detection accuracy: splitting body into different types of data and using word-embedding to treat word ambiguities that are not contained...

chapter

Statically identifying class dependencies in legacy JavaScript systems: First results

Leonardo Humberto Silva, Marco Tulio Valente, Alexandre Bergel

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) > 427 - 431

Identifying dependencies between classes is an essential activity when maintaining and evolving software applications. It is also known that JavaScript developers often use classes to structure their projects. This happens even in legacy code, i.e., code implemented in JavaScript versions that do not provide syntactical support to classes. However, identifying associations and other dependencies between...

chapter

What information about code snippets is available in different software-related documents? An exploratory study

Preetha Chatterjee, Manziba Akanda Nishi, Kostadin Damevski, Vinay Augustine, et al.

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) > 382 - 386

A large corpus of software-related documents is available on the Web, and these documents offer a unique opportunity to learn from what developers are saying or asking about the code snippets that they are discussing. For example, the natural language in a bug report provides information about what is not functioning properly in a particular code snippet. Previous research has mined information...

chapter

Does the release cycle of a library project influence when it is adopted by a client project?

Daiki Fujibayashi, Akinori Ihara, Hirohiko Suwa, Raula Gaikovina Kula, et al.

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) > 569 - 570

A key goal of this research is to understand the relationship between the adoption of software library versions and the library's release cycle. In detail, we conducted an empirical study of the release cycle of 23 libraries and how they were adopted by 415 Apache Software Foundation (ASF) client projects. Our preliminary findings show that software projects are quicker to update earlier rapid-release libraries...

chapter

Extracting executable transformations from distilled code changes

Reinout Stevens, Coen De Roover

2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER) > 171 - 181

Change distilling algorithms compute a sequence of fine-grained changes that, when executed in order, transform a given source AST into a given target AST. The resulting change sequences are used in the field of mining software repositories to study source code evolution. Unfortunately, detecting and specifying source code evolutions in such a change sequence is cumbersome. We therefore introduce...

chapter

Mining the Modern Code Review Repositories: A Dataset of People, Process and Product

Xin Yang, Raula Gaikovina Kula, Norihiro Yoshida, Hajimu Iida

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) > 460 - 463

In this paper, we present a collection of Modern Code Review data for five open source projects. The data showcases mined data from both an integrated peer review system and source code repositories. We present an easy-to-use and richer data structure to retrieve the (1) People, (2) Process and (3) Product aspects of the peer review. This paper presents the extraction methodology, the dataset structure,...

chapter

How the R Community Creates and Curates Knowledge: A Comparative Study of Stack Overflow and Mailing Lists

Alexey Zagalsky, Carlos Gomez Teshima, Daniel M. German, Margaret-Anne Storey, et al.

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) > 441 - 451

One of the many effects of social media in software development is the flourishing of very large communities of practice where members share a common interest, such as programming languages, frameworks, and tools. These communities of practice use many different communication channels but little is known about how these communities create, share, and curate knowledge using such channels. In this paper,...

chapter

Findings from GitHub: Methods, Datasets and Limitations

Valerio Cosentino, Javier Luis Canovas Izquierdo, Jordi Cabot

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) > 137 - 141

GitHub, one of the most popular social coding platforms, is the platform of reference when mining Open Source repositories to learn from past experiences. In recent years, a number of research papers have been published reporting findings based on data mined from GitHub. As the community continues to deepen its understanding of software engineering thanks to the analysis performed on this platform,...

chapter

Analysis of Exception Handling Patterns in Java Projects: An Empirical Study

Suman Nakshatri, Maithri Hegde, Sahithi Thandra

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) > 500 - 503

Exception handling is a powerful tool provided by many programming languages to help developers deal with unforeseen conditions. Java is one of the few programming languages to enforce an additional compilation check on certain subclasses of the Exception class through checked exceptions. As part of this study, empirical data was extracted from software projects developed in Java. The intent...

chapter

The Emotional Side of Software Developers in JIRA

Marco Ortu, Alessandro Murgia, Giuseppe Destefanis, Parastou Tourani, et al.

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) > 480 - 483

Issue tracking systems store valuable data for testing hypotheses concerning maintenance, building statistical prediction models and (recently) investigating developer affectiveness. For the latter, issue tracking systems can be mined to explore developers' emotions, sentiments and politeness, affects for short. However, research on affect detection in software artefacts is still in its early...

chapter

Judging a Commit by Its Cover: Correlating Commit Message Entropy with Build Status on Travis-CI

Eddie Antonio Santos, Abram Hindle

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) > 504 - 507

Developers summarize their changes to code in commit messages. When a message seems “unusual”, however, this casts doubt on the quality of the code contained in the commit. We trained n-gram language models and used cross-entropy as an indicator of commit message “unusualness” of over 120,000 commits from open source projects. Build statuses collected from Travis-CI were used as a proxy for code quality...

chapter

A Dataset of Simplified Syntax Trees for C#

Sebastian Proksch, Sven Amann, Sarah Nadi, Mira Mezini

2016 IEEE/ACM 13th Working Conference on Mining Software Repositories (MSR) > 476 - 479

In this paper, we present a curated collection of 2833 C# solutions taken from GitHub. We encode the data in a new intermediate representation (IR) that facilitates further analysis by restricting the complexity of the syntax tree and by avoiding implicit information. The dataset is intended as a standardized input for research on recommendation systems for software engineering, but is also useful...

INFONA - science communication portal
