Where Is the Road for Issue Reports Classification Based on Text Mining?

Qiang Fan; Yue Yu; Gang Yin; Tao Wang; Huaimin Wang

doi:10.1109/ESEM.2017.19

Source

2017 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM) > 121 - 130

Abstract

Currently, open source projects receive various kinds of issues daily because of the extreme openness of the Issue Tracking System (ITS) in GitHub. Categorizing these issues is a labor-intensive and time-consuming task for project managers. However, a contributor only needs to provide a short textual abstract to report an issue in GitHub. Thus, most traditional classification approaches based on detailed and structured data (e.g., priority, severity, and software version) are difficult to adopt. In this paper, issue classification approaches were investigated on a large-scale dataset of 80 popular projects and over 252,000 issue reports collected from GitHub. First, four traditional text-based classification methods and their performance were discussed. Second, quantitative and qualitative study showed that semantic perplexity (i.e., an issue's description confuses bug-related sentences with non-bug-related sentences) is a crucial factor affecting classification performance. Finally, a two-stage classifier framework based on novel metrics of the semantic perplexity of issue reports was designed. Results show that our two-stage classification can significantly improve issue classification performance.
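
The idea of a report that mixes bug-related and non-bug-related sentences can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not define the semantic perplexity metrics or either classification stage, so the cue-word list, the `mixing_score` function, the 0.25 threshold, and the "needs-second-stage" routing below are all hypothetical stand-ins chosen only to show the general shape of such an approach.

```python
import re

# Hypothetical cue words; the paper's actual features are not given in the abstract.
BUG_CUES = {"crash", "error", "exception", "fails", "broken"}

def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace; drop empty pieces."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def is_bug_sentence(sentence):
    """Stand-in sentence classifier: bug-related if any cue word occurs."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & BUG_CUES)

def mixing_score(text):
    """Hypothetical proxy for semantic perplexity: the share of sentences
    in the minority class (0.0 = all sentences agree, 0.5 = evenly mixed)."""
    labels = [is_bug_sentence(s) for s in split_sentences(text)]
    if not labels:
        return 0.0
    bug = sum(labels)
    return min(bug, len(labels) - bug) / len(labels)

def classify_issue(text, threshold=0.25):
    """Label clear-cut reports directly; flag mixed reports for a second stage."""
    labels = [is_bug_sentence(s) for s in split_sentences(text)]
    score = mixing_score(text)
    if score > threshold:
        # Assumption: mixed reports would go to a more careful classifier.
        return "needs-second-stage", score
    label = "bug" if sum(labels) * 2 > len(labels) else "non-bug"
    return label, score
```

For example, `classify_issue("Login fails with an error. It would be nice to add a dark theme.")` sees one bug-related and one non-bug-related sentence, so the report is flagged as mixed rather than labeled directly. A real system would replace the cue-word lookup with a trained text classifier.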

Identifiers

Book e-ISBN: 978-1-5090-4039-1
DOI: 10.1109/ESEM.2017.19

Keywords

Computer bugs; Software; Data mining; Feature extraction; Semantics; Measurement; issue tracking system; machine learning technique; mining software repositories

Additional information

Data set: ieee

Publisher

IEEE
