Search results for: T.M. Khoshgoftaar

Items from 1 to 10 out of 10 results

chapter

Mining Data from Multiple Software Development Projects

Huanjing Wang, T.M. Khoshgoftaar, Kehan Gao, N. Seliya

2009 IEEE International Conference on Data Mining Workshops > 551 - 557

2009 IEEE International Conference on Data Mining Workshops (ICDMW 2009)

A large system often goes through multiple software project development cycles, in part due to changes in operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes are...

chapter

Feature Selection with Imbalanced Data for Software Defect Prediction

T.M. Khoshgoftaar, Kehan Gao

2009 International Conference on Machine Learning and Applications > 235 - 240

Eighth International Conference on Machine Learning and Applications (ICMLA 2009)

In this paper, we study the learning impact of data sampling followed by attribute selection on the classification models built with binary class imbalanced data within the scenario of software quality engineering. We use a wrapper-based attribute ranking technique to select a subset of attributes, and the random undersampling technique (RUS) on the majority class to alleviate the negative effects...

chapter

Exploring Software Quality Classification with a Wrapper-Based Feature Ranking Technique

Kehan Gao, T.M. Khoshgoftaar, A. Napolitano

2009 21st IEEE International Conference on Tools with Artificial Intelligence > 67 - 74

2009 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2009)

Feature selection is a process of selecting a subset of relevant features for building learning models. It is an important activity for data preprocessing used in software quality modeling and other data mining problems. Feature selection algorithms can be divided into two categories, feature ranking and feature subset selection. Feature ranking orders the features by a criterion and a user selects...

chapter

An empirical comparison of repetitive undersampling techniques

J. Van Hulse, T.M. Khoshgoftaar, A. Napolitano

2009 IEEE International Conference on Information Reuse&Integration > 29 - 34

2009 IEEE International Conference on Information Reuse & Integration (IRI 2009)

A common problem for data mining and machine learning practitioners is class imbalance. When examples of one class greatly outnumber examples of the other class (es), traditional machine learning algorithms can perform poorly. Random undersampling is a technique that has shown great potential for alleviating the problem of class imbalance. However, undersampling leads to information loss which can...

chapter

Comparison of Four Performance Metrics for Evaluating Sampling Techniques for Low Quality Class-Imbalanced Data

A. Folleco, T.M. Khoshgoftaar, A. Napolitano

2008 Seventh International Conference on Machine Learning and Applications > 153 - 158

2008 Seventh International Conference on Machine Learning and Applications

Erroneous attribute values can significantly impact learning from otherwise valuable data. The learning impact can be exacerbated by the class imbalanced training data. We investigate and compare the overall learning impact of sampling such data by using four distinct performance metrics suitable for models built from binary class imbalanced data. Seven relatively free of noise, class imbalanced software...

chapter

A Comparative Study of Data Sampling and Cost Sensitive Learning

C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano

2008 IEEE International Conference on Data Mining Workshops > 46 - 52

2008 IEEE International Conference on Data Mining Workshops

Two common challenges data mining and machine learning practitioners face in many application domains are unequal classification costs and class imbalance. Most traditional data mining techniques attempt to maximize overall accuracy rather than minimize cost. When data is imbalanced, such techniques result in models that highly favor the over represented class, the class which typically carries a...

chapter

Imputed Neighborhood Based Collaborative Filtering

Xiaoyuan Su, T.M. Khoshgoftaar, R. Greiner

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology > 1 > 633 - 639

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

Collaborative filtering (CF) is one of the most effective types of recommender systems. As data sparsity remains a significant challenge for CF, we consider basing predictions on imputed data, and find this often improves performance on very sparse rating data. In this paper, we propose two imputed neighborhood based collaborative filtering (INCF) algorithms: imputed nearest neighborhood CF (INN-CF)...

chapter

RUSBoost: Improving classification performance when training data is skewed

C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano

2008 19th International Conference on Pattern Recognition > 1 - 4

ICPR 2008 19th International Conference on Pattern Recognition

Constructing classification models using skewed training data can be a challenging task. We present RUSBoost, a new algorithm for alleviating the problem of class imbalance. RUSBoost combines data sampling and boosting, providing a simple and efficient method for improving classification performance when training data is imbalanced. In addition to performing favorably when compared to SMOTEBoost (another...

chapter

Improving Learner Performance with Data Sampling and Boosting

C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano

2008 20th IEEE International Conference on Tools with Artificial Intelligence > 1 > 452 - 459

2008 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)

Learning from imbalanced datasets is a well known problem in the data mining community. Many techniques have been proposed to alleviate the problems associated with class imbalance, including data sampling and boosting. While data sampling has received the bulk of the attention from the research community, our results show that boosting often results in better classification performance than even...

chapter

Addressing Class Imbalance in Non-binary Classification Problems

N. Seliya, Zhiwei Xu, T.M. Khoshgoftaar

2008 20th IEEE International Conference on Tools with Artificial Intelligence > 1 > 460 - 466

2008 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)

The problem of class imbalance in machine learning is quite real and cumbersome when it comes to building a useful and practical classification model. We present a unique insight into addressing class imbalance for classification problems that involve three or more categories, i.e. non-binary. This study is different than related works in the literature because most works focus on addressing class...

Filter options

Keywords:
DATA MODELS

Publication date

Set your own date range

Keywords

DATA MINING (7)
LEARNING (ARTIFICIAL INTELLIGENCE) (7)
CLASS IMBALANCE (5)
DATA SAMPLING (5)
MEASUREMENT (5)
TRAINING (5)
TRAINING DATA (5)
MACHINE LEARNING (4)
PATTERN CLASSIFICATION (4)
SOFTWARE QUALITY (4)
ACCURACY (3)
BIOLOGICAL SYSTEM MODELING (3)
BOOSTING (3)
DECISION TREES (3)
CLASSIFICATION ALGORITHMS (2)
CLASSIFICATION MODEL (2)
FEATURE SELECTION (2)
SUPPORT VECTOR MACHINES (2)
ANALYSIS OF VARIANCE (1)
APPLICATION DOMAIN (1)
AREA UNDER PRC (1)
AREA UNDER ROC (1)
ARTIFICIAL INTELLIGENCE (1)
ATTRIBUTE SELECTION (1)
BAYESIAN METHODS (1)
BAYESIAN MULTIPLE IMPUTATION (1)
BEST ARITHMETIC MEAN (1)
BEST F-MEASURE (1)
BEST GEOMETRIC MEAN (1)
BINARY CLASS IMBALANCED DATA (1)
BOOSTING ALGORITHM (1)
COLLABORATION (1)
COLLABORATIVE FILTERING (1)
CORRELATION (1)
COST SENSITIVE LEARNING (1)
DATA ANALYSIS (1)
DATA COLLECTION (1)
DATA PREPROCESSING (1)
DATA QUALITY (1)
DATA SAMPLING METHODS (1)
DATA SAMPLING TECHNIQUES (1)
DATA SPARSITY (1)
DEFAULT ARITHMETIC MEAN (1)
DEFAULT F-MEASURE (1)
DEFAULT GEOMETRIC MEAN (1)
DEVELOPMENT PROCESS (1)
EMPIRICAL COMPARISON (1)
EQUATIONS (1)
ERRONEOUS ATTRIBUTE VALUES (1)
FAULT DIAGNOSIS (1)
FEATURE RANKING TECHNIQUE (1)
FILTERING (1)
FREQUENCY MODULATION (1)
GLASS (1)
HINDER CLASSIFICATION (1)
HYBRID FEATURE SELECTION TECHNIQUE (1)
IMBALANCED DATA (1)
IMBALANCED SOFTWARE ENGINEERING MEASUREMENT (1)
IMPACT LEARNING (1)
IMPUTATION TECHNIQUES (1)
IMPUTED DENSEST NEIGHBORHOOD COLLABORATIVE FILTERING (1)
IMPUTED NEAREST NEIGHBORHOOD COLLABORATIVE FILTERING (1)
INFORMATION FILTERING (1)
INFORMATION FILTERS (1)
LEARNING (1)
LEARNING IMPACT (1)
LEARNING MODELS (1)
LOW QUALITY CLASS-IMBALANCED DATA (1)
MEAN IMPUTATION (1)
MULTILAYER PERCEPTRON (1)
MULTILAYER PERCEPTRONS (1)
MULTIPLE IMPUTATION (1)
MULTIPLE SOFTWARE DEVELOPMENT PROJECTS (1)
NIOBIUM (1)
NOISE (1)
NOISE INJECTION PROCEDURE (1)
NOISE MEASUREMENT (1)
NON-BINARY CLASSIFIERS (1)
NONBINARY CLASSIFICATION (1)
OVERALL ACCURACY (1)
PEARSON CORRELATION-BASED COLLABORATIVE FILTERING ALGORITHM (1)
PERFORMANCE METRIC (1)
PERFORMANCE METRICS (1)
PREDICTION ALGORITHMS (1)
PREDICTIVE MODELS (1)
QUALITY BINARY CLASS IMBALANCED DATA (1)
RANDOM OVERSAMPLING (1)
RANDOM PROCESSES (1)
RANDOM UNDERSAMPLING (1)
RANDOM UNDERSAMPLING TECHNIQUE (1)
RECOMMENDER SYSTEMS (1)
REPETITIVE UNDERSAMPLING TECHNIQUE (1)
RISK IMPACT (1)
SAMPLING METHODS (1)
SAMPLING TECHNIQUES (1)
SENSITIVITY ANALYSIS (1)
SIMULATED NOISE (1)
SKEWED TRAINING DATA (1)
SOFTWARE (1)
more

INFONA - science communication portal

Search results for: T.M. Khoshgoftaar

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options