Search results for: T.M. Khoshgoftaar

Items from 1 to 20 out of 29 results

chapter

Mining Data from Multiple Software Development Projects

Huanjing Wang, T.M. Khoshgoftaar, Kehan Gao, N. Seliya

2009 IEEE International Conference on Data Mining Workshops > 551 - 557

2009 IEEE International Conference on Data Mining Workshops (ICDMW 2009)

A large system often goes through multiple software project development cycles, in part due to changes in operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes are...

chapter

Feature Selection with High-Dimensional Imbalanced Data

J. Van Hulse, T.M. Khoshgoftaar, A. Napolitano, R. Wald

2009 IEEE International Conference on Data Mining Workshops > 507 - 514

2009 IEEE International Conference on Data Mining Workshops (ICDMW 2009)

Feature selection is an important topic in data mining, especially for high dimensional datasets. Filtering techniques in particular have received much attention, but detailed comparisons of their performance is lacking. This work considers three filters using classifier performance metrics and six commonly-used filters. All nine filtering techniques are compared and contrasted using five different...

chapter

Feature Selection with Imbalanced Data for Software Defect Prediction

T.M. Khoshgoftaar, Kehan Gao

2009 International Conference on Machine Learning and Applications > 235 - 240

Eighth International Conference on Machine Learning and Applications (ICMLA 2009)

In this paper, we study the learning impact of data sampling followed by attribute selection on the classification models built with binary class imbalanced data within the scenario of software quality engineering. We use a wrapper-based attribute ranking technique to select a subset of attributes, and the random undersampling technique (RUS) on the majority class to alleviate the negative effects...

chapter

Wrapper-Based Feature Ranking for Software Engineering Metrics

W. Altidor, T.M. Khoshgoftaar, A. Napolitano

2009 International Conference on Machine Learning and Applications > 241 - 246

Eighth International Conference on Machine Learning and Applications (ICMLA 2009)

The application of feature ranking to software engineering datasets is rare at best. In this study, we consider wrapper-based feature ranking where nine performance metrics aided by a particular learner are evaluated. We consider five learners and take two different approaches, each in conjunction with one of two different methodologies: 3-fold Cross-Validation (CV) and 3-fold Cross-Validation Risk...

chapter

High-Dimensional Software Engineering Data and Feature Selection

Huanjing Wang, T.M. Khoshgoftaar, Kehan Gao, N. Seliya

2009 21st IEEE International Conference on Tools with Artificial Intelligence > 83 - 90

2009 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2009)

Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is very keen on learning which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set...

chapter

An Empirical Study on Wrapper-Based Feature Ranking

W. Altidor, T.M. Khoshgoftaar, J. Van Hulse

2009 21st IEEE International Conference on Tools with Artificial Intelligence > 75 - 82

2009 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2009)

Feature selection has become the cornerstone of many classification problems. It has been applied in many domains such as Web mining, text categorization, gene expression microarray analysis, image analysis, and combinatorial chemistry. One type of well-studied feature selection methodology is filtering, which is typically divided into ranking and subset evaluation. This work provides an empirical...

chapter

A Study on the Relationships of Classifier Performance Metrics

N. Seliya, T.M. Khoshgoftaar, J. Van Hulse

2009 21st IEEE International Conference on Tools with Artificial Intelligence > 59 - 66

2009 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2009)

There is no general consensus on which classifier performance metrics are better to use as compared to others. While some studies investigate a handful of such metrics in a comparative fashion, an evaluation of specific relationships among a large set of commonly-used performance metrics is much needed in the data mining and machine learning community. This study provides a unique insight into the...

chapter

Exploring Software Quality Classification with a Wrapper-Based Feature Ranking Technique

Kehan Gao, T.M. Khoshgoftaar, A. Napolitano

2009 21st IEEE International Conference on Tools with Artificial Intelligence > 67 - 74

2009 21st IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2009)

Feature selection is a process of selecting a subset of relevant features for building learning models. It is an important activity for data preprocessing used in software quality modeling and other data mining problems. Feature selection algorithms can be divided into two categories, feature ranking and feature subset selection. Feature ranking orders the features by a criterion and a user selects...

chapter

An empirical comparison of repetitive undersampling techniques

J. Van Hulse, T.M. Khoshgoftaar, A. Napolitano

2009 IEEE International Conference on Information Reuse&Integration > 29 - 34

2009 IEEE International Conference on Information Reuse & Integration (IRI 2009)

A common problem for data mining and machine learning practitioners is class imbalance. When examples of one class greatly outnumber examples of the other class (es), traditional machine learning algorithms can perform poorly. Random undersampling is a technique that has shown great potential for alleviating the problem of class imbalance. However, undersampling leads to information loss which can...

chapter

Aggregating performance metrics for classifier evaluation

N. Seliya, T.M. Khoshgoftaar, J. Van Hulse

2009 IEEE International Conference on Information Reuse&Integration > 35 - 40

2009 IEEE International Conference on Information Reuse & Integration (IRI 2009)

There are several performance metrics that have been proposed for evaluating a classification model, e.g., accuracy, error rates, precision, recall, etc. While it is known that evaluating a classifier on only one performance metric is not advisable, the use of multiple performance metrics poses unique comparative challenges for the analyst. Since different performance metrics provide different perspectives...

chapter

An empirical investigation of filter attribute selection techniques for software quality classification

Kehan Gao, T.M. Khoshgoftaar, Huanjing Wang

2009 IEEE International Conference on Information Reuse&Integration > 272 - 277

2009 IEEE International Conference on Information Reuse & Integration (IRI 2009)

Attribute selection is an important activity in data preprocessing for software quality modeling and other data mining problems. The software quality models have been used to improve the fault detection process. Finding faulty components in a software system during early stages of software development process can lead to a more reliable final product and can reduce development and maintenance costs...

chapter

Comparison of Four Performance Metrics for Evaluating Sampling Techniques for Low Quality Class-Imbalanced Data

A. Folleco, T.M. Khoshgoftaar, A. Napolitano

2008 Seventh International Conference on Machine Learning and Applications > 153 - 158

2008 Seventh International Conference on Machine Learning and Applications

Erroneous attribute values can significantly impact learning from otherwise valuable data. The learning impact can be exacerbated by the class imbalanced training data. We investigate and compare the overall learning impact of sampling such data by using four distinct performance metrics suitable for models built from binary class imbalanced data. Seven relatively free of noise, class imbalanced software...

chapter

A Comparative Study of Data Sampling and Cost Sensitive Learning

C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano

2008 IEEE International Conference on Data Mining Workshops > 46 - 52

2008 IEEE International Conference on Data Mining Workshops

Two common challenges data mining and machine learning practitioners face in many application domains are unequal classification costs and class imbalance. Most traditional data mining techniques attempt to maximize overall accuracy rather than minimize cost. When data is imbalanced, such techniques result in models that highly favor the over represented class, the class which typically carries a...

chapter

Imputed Neighborhood Based Collaborative Filtering

Xiaoyuan Su, T.M. Khoshgoftaar, R. Greiner

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology > 1 > 633 - 639

2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology

Collaborative filtering (CF) is one of the most effective types of recommender systems. As data sparsity remains a significant challenge for CF, we consider basing predictions on imputed data, and find this often improves performance on very sparse rating data. In this paper, we propose two imputed neighborhood based collaborative filtering (INCF) algorithms: imputed nearest neighborhood CF (INN-CF)...

chapter

RUSBoost: Improving classification performance when training data is skewed

C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano

2008 19th International Conference on Pattern Recognition > 1 - 4

ICPR 2008 19th International Conference on Pattern Recognition

Constructing classification models using skewed training data can be a challenging task. We present RUSBoost, a new algorithm for alleviating the problem of class imbalance. RUSBoost combines data sampling and boosting, providing a simple and efficient method for improving classification performance when training data is imbalanced. In addition to performing favorably when compared to SMOTEBoost (another...

chapter

Improving Learner Performance with Data Sampling and Boosting

C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano

2008 20th IEEE International Conference on Tools with Artificial Intelligence > 1 > 452 - 459

2008 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)

Learning from imbalanced datasets is a well known problem in the data mining community. Many techniques have been proposed to alleviate the problems associated with class imbalance, including data sampling and boosting. While data sampling has received the bulk of the attention from the research community, our results show that boosting often results in better classification performance than even...

chapter

Addressing Class Imbalance in Non-binary Classification Problems

N. Seliya, Zhiwei Xu, T.M. Khoshgoftaar

2008 20th IEEE International Conference on Tools with Artificial Intelligence > 1 > 460 - 466

2008 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)

The problem of class imbalance in machine learning is quite real and cumbersome when it comes to building a useful and practical classification model. We present a unique insight into addressing class imbalance for classification problems that involve three or more categories, i.e. non-binary. This study is different than related works in the literature because most works focus on addressing class...

chapter

Resampling or Reweighting: A Comparison of Boosting Implementations

C. Seiffert, T.M. Khoshgoftaar, J. Van Hulse, A. Napolitano

2008 20th IEEE International Conference on Tools with Artificial Intelligence > 1 > 445 - 451

2008 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)

Boosting has been shown to improve the performance of classifiers in many situations, including when data is imbalanced. There are, however, two possible implementations of boosting, and it is unclear which should be used. Boosting by reweighting is typically used, but can only be applied to base learners which are designed to handle example weights. On the other hand, boosting by resampling can be...

chapter

Using Imputation Techniques to Help Learn Accurate Classifiers

Xiaoyuan Su, T.M. Khoshgoftaar, R. Greiner

2008 20th IEEE International Conference on Tools with Artificial Intelligence > 1 > 437 - 444

2008 20th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)

It is difficult to learn good classifiers when training data is missing attribute values. Conventional techniques for dealing with such omissions, such as mean imputation, generally do not significantly improve the performance of the resulting classifier. We proposed imputation-helped classifiers, which use accurate imputation techniques, such as Bayesian multiple imputation (BMI), predictive mean...

chapter

Software quality modeling: The impact of class noise on the random forest classifier

A. Folleco, T.M. Khoshgoftaar, J. Van Hulse, L. Bullard

2008 IEEE Congress on Evolutionary Computation (IEEE World Congress on Computational Intelligence) > 3853 - 3859

2008 IEEE Congress on Evolutionary Computation (CEC)

This study investigates the impact of increasing levels of simulated class noise on software quality classification. Class noise was injected into seven software engineering measurement datasets, and the performance of three learners, random forests, C4.5, and Naive Bayes, was analyzed. The random forest classifier was utilized for this study because of its strong performance relative to well-known...

Publication type:
book

Publication date

Set your own date range

Content availability

Available (28)
None (1)

Keywords

DATA MINING (17)
LEARNING (ARTIFICIAL INTELLIGENCE) (12)
MEASUREMENT (11)
PATTERN CLASSIFICATION (11)
DATA MODELS (10)
SOFTWARE QUALITY (9)
TRAINING (9)
ACCURACY (8)
SUPPORT VECTOR MACHINES (7)
CLASSIFICATION ALGORITHMS (6)
DATA SAMPLING (6)
MACHINE LEARNING (6)
SOFTWARE METRICS (6)
TRAINING DATA (6)
CLASS IMBALANCE (5)
DECISION TREES (5)
FEATURE SELECTION (5)
BIOLOGICAL SYSTEM MODELING (4)
BOOSTING (4)
CORRELATION (4)
INFORMATION FILTERING (4)
NIOBIUM (4)
SOFTWARE QUALITY CLASSIFICATION (4)
BAYES METHODS (3)
CLASSIFICATION (3)
MULTILAYER PERCEPTRONS (3)
PERFORMANCE METRICS (3)
SOFTWARE ENGINEERING (3)
SOFTWARE FAULT TOLERANCE (3)
SOFTWARE MEASUREMENT (3)
SOFTWARE QUALITY MODELING (3)
AREA UNDER PRC (2)
AREA UNDER ROC (2)
BAYESIAN MULTIPLE IMPUTATION (2)
BEST ARITHMETIC MEAN (2)
BEST F-MEASURE (2)
BEST GEOMETRIC MEAN (2)
BINARY CLASSIFICATION (2)
BOOSTING ALGORITHM (2)
CLASSIFICATION MODEL (2)
CLASSIFIER PERFORMANCE METRICS (2)
DATA ANALYSIS (2)
DATA PREPROCESSING (2)
EIGENVALUES AND EIGENFUNCTIONS (2)
FEATURE EXTRACTION (2)
FREQUENCY MODULATION (2)
IMBALANCED DATA (2)
IMPUTATION TECHNIQUES (2)
LEARNING (2)
LOADING (2)
LOGISTIC REGRESSION (2)
MEAN IMPUTATION (2)
MULTILAYER PERCEPTRON (2)
NOISE (2)
NOISE MEASUREMENT (2)
OVERALL ACCURACY (2)
PREDICTION ALGORITHMS (2)
PREDICTIVE MODELS (2)
RANDOM PROCESSES (2)
RANDOM UNDERSAMPLING (2)
REGRESSION ANALYSIS (2)
ROUGH SET THEORY (2)
SAMPLING METHODS (2)
SOFTWARE (2)
WRAPPER-BASED FEATURE RANKING (2)
3- FOLD CROSS VALIDATION RISK IMPACT (1)
3-FOLD CROSS VALIDATION (1)
3-FOLD CROSS-VALIDATION RISK IMPACT (1)
5-NEAREST NEIGHBOR LEARNING (1)
802.11 NETWORK (1)
ADABOOST (1)
ALGORITHM DESIGN AND ANALYSIS (1)
ANALYSIS OF VARIANCE (1)
ANOMALOUS BEHAVIORS DETECTION (1)
APPLICATION DOMAIN (1)
ARTIFICIAL INTELLIGENCE (1)
ARTIFICIAL NEURAL NETWORKS (1)
ATTRIBUTE SELECTION (1)
AUTOMATIC HYBRID SEARCH (1)
BAYESIAN METHODS (1)
BELIEF NETWORKS (1)
BINARY CLASS IMBALANCED DATA (1)
BIOINFORMATICS (1)
BIOLOGICAL INFORMATION THEORY (1)
BOOLEAN PREDICATES (1)
C4.5 (1)
C4.5 DECISION TREES (1)
CLASS NOISE (1)
CLASSIFICATION DATASETS (1)
CLASSIFICATION TREE ANALYSIS (1)
CLASSIFIER EVALUATION (1)
CLUSTERING-BASED INTRUSION DETECTION (1)
COLLABORATION (1)
COLLABORATIVE FILTERING (1)
COMBINATORIAL CHEMISTRY (1)
COMMUNICATION SYSTEM TRAFFIC (1)
CONTENT-BOOSTED CF (1)
COST SENSITIVE LEARNING (1)
COVARIANCE MATRIX (1)
DATA CLASSIFIER (1)
more

INFONA - science communication portal

Search results for: T.M. Khoshgoftaar

Mining Data from Multiple Software Development Projects

Feature Selection with High-Dimensional Imbalanced Data

Feature Selection with Imbalanced Data for Software Defect Prediction

Wrapper-Based Feature Ranking for Software Engineering Metrics

High-Dimensional Software Engineering Data and Feature Selection

An Empirical Study on Wrapper-Based Feature Ranking

A Study on the Relationships of Classifier Performance Metrics

Exploring Software Quality Classification with a Wrapper-Based Feature Ranking Technique

An empirical comparison of repetitive undersampling techniques

Aggregating performance metrics for classifier evaluation

An empirical investigation of filter attribute selection techniques for software quality classification

Comparison of Four Performance Metrics for Evaluating Sampling Techniques for Low Quality Class-Imbalanced Data

A Comparative Study of Data Sampling and Cost Sensitive Learning

Imputed Neighborhood Based Collaborative Filtering

RUSBoost: Improving classification performance when training data is skewed

Improving Learner Performance with Data Sampling and Boosting

Addressing Class Imbalance in Non-binary Classification Problems

Resampling or Reweighting: A Comparison of Boosting Implementations

Using Imputation Techniques to Help Learn Accurate Classifiers

Software quality modeling: The impact of class noise on the random forest classifier

Filter options

Publication date

Content availability

Keywords

INFONA - science communication portal

Search results for: T.M. Khoshgoftaar

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Content availability

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options