Alireza Fazelpour

chapter

Investigating the Variation of Ensemble Size on Bagging-Based Classifier Performance in Imbalanced Bioinformatics Datasets

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2016 IEEE 17th International Conference on Information Reuse and Integration (IRI) > 377 - 383

Bagging ensemble techniques have been utilized effectively by practitioners in the field of bioinformatics to alleviate the problem of class imbalance and to improve the performance of classification models. However, many previous works have used bagging only with a single arbitrary number of iterations. In this study, we raise the question of what is the impact of altering the number of iterations/ensembles...
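To make the "number of iterations" concrete, here is a minimal bagging sketch in which the ensemble size is an explicit parameter. Each iteration trains one base model on a bootstrap sample, which is n draws with replacement from n instances, and predictions are combined by majority vote. The nearest-centroid base learner, the function names (`bagging_fit`, `bagging_predict`) and the toy data are hypothetical choices for illustration. They are not the learners or datasets used in the paper.

```python
import random
from collections import Counter

def train_centroid(X, y):
    """Hypothetical base learner: store the mean feature vector of each class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def predict_centroid(model, row):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, model[c])))

def bagging_fit(X, y, n_iterations, seed=0):
    """Train one base model per iteration, each on its own bootstrap sample."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_iterations):
        idx = [rng.randrange(n) for _ in range(n)]  # n draws with replacement
        models.append(train_centroid([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, row):
    """Combine the ensemble by unweighted majority vote."""
    votes = Counter(predict_centroid(m, row) for m in models)
    return votes.most_common(1)[0][0]

X = [[0.0], [0.2], [0.1], [5.0], [5.2]]
y = ["neg", "neg", "neg", "pos", "pos"]
for n_iterations in (1, 10, 50):
    models = bagging_fit(X, y, n_iterations)
    print(n_iterations, bagging_predict(models, [4.9]))
```

With a single iteration, the prediction depends entirely on one bootstrap sample. A sample that happens to contain no minority instances can only vote for the majority class. Larger ensembles average out that variance. How many iterations are enough on imbalanced data is the question the study raises.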

chapter

Does the Inclusion of Data Sampling Improve the Performance of Boosting Algorithms on Imbalanced Bioinformatics Data?

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) > 527 - 534

Bioinformatics datasets contain many challenging characteristics, such as class imbalance, which adversely impacts the performance of supervised classification models built on these datasets. Techniques such as ensemble learning and data sampling from the domain of data mining can be deployed to alleviate the problem and to improve classification performance. In this study, we sought to determine whether...

chapter

Investigating New Bootstrapping Approaches of Bagging Classifiers to Account for Class Imbalance in Bioinformatics Datasets

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) > 987 - 994

One major challenge posed by bioinformatics datasets is class imbalance, which occurs when one class has many more instances than the other class(es). Its undesirable effect on classification performance is compounded by the fact that, in general, the class with fewer instances is the class of interest. Bagging has been utilized by practitioners in the field to overcome the challenge of class...
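The sketch below shows why imbalance matters most when the minority class is the class of interest. On a hypothetical 95:5 dataset, a classifier that always predicts the majority class reaches 95% accuracy yet finds none of the minority instances. The class labels ("normal", "tumor") and the helper names are assumptions made for illustration only.

```python
from collections import Counter

def class_distribution(labels):
    """Return (minority label, minority share, majority:minority ratio)."""
    counts = Counter(labels)
    minority, n_min = min(counts.items(), key=lambda kv: kv[1])
    _, n_maj = max(counts.items(), key=lambda kv: kv[1])
    return minority, n_min / len(labels), n_maj / n_min

def accuracy_and_recall(y_true, y_pred, positive):
    """Overall accuracy plus recall (true positive rate) on the class of interest."""
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    preds_on_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    recall = sum(p == positive for p in preds_on_pos) / len(preds_on_pos)
    return acc, recall

y = ["normal"] * 95 + ["tumor"] * 5
print(class_distribution(y))  # ('tumor', 0.05, 19.0)

# A trivial "always predict the majority class" model
acc, rec = accuracy_and_recall(y, ["normal"] * len(y), positive="tumor")
print(acc, rec)  # 0.95 0.0
```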

chapter

Ensemble vs. Data Sampling: Which Option Is Best Suited to Improve Classification Performance of Imbalanced Bioinformatics Data?

Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman, Amri Napolitano

2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI) > 705 - 712

Bioinformatics datasets contain challenging characteristics, such as class imbalance, which occurs when one class has many more instances than the other class(es). These challenges make the task of classification much more subtle for practitioners and researchers in the field. Fortunately, there are tools, such as ensemble learning and data sampling methods, that can be applied to overcome these problems...

chapter

Observing the Effect of the Choice of Classifier on Bioinformatics Data with Varying Levels of Data Quality and Class Balance

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Ahmad Abu Shanab

2015 IEEE International Conference on Information Reuse and Integration (IRI) > 372 - 379

Noise is a prominent challenge found in many bioinformatics datasets, and it refers to erroneous or missing data. The presence of noise in gene expression datasets has adverse effects on machine-learning techniques, such as supervised classification algorithms and feature selection techniques. Additionally, the identification of noise and its quantification are challenging tasks that require a proper...

chapter

Alterations to the Bootstrapping Process within Random Forest: A Case Study on Imbalanced Bioinformatics Data

Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman, Amri Napolitano

2015 IEEE International Conference on Information Reuse and Integration (IRI) > 342 - 348

Class imbalance is a significant challenge that practitioners in the field of bioinformatics face on a daily basis. It is a phenomenon that occurs when the number of instances of one class is much greater than the number of instances of the other class(es), and it has adverse effects on the performance of classification models built on this skewed data. Random Forest, as a robust classifier, has been...
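The standard Random Forest bootstrap draws n instances with replacement regardless of class. As a result, each tree sees roughly the original imbalance. The abstract is truncated before it describes the alterations the paper actually studies. Purely as an assumption, the sketch below shows one possible class-aware alteration: drawing the same number of instances, with replacement, from every class. The function name `balanced_bootstrap` and its parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict

def balanced_bootstrap(y, per_class=None, seed=0):
    """Return indices of a class-balanced bootstrap sample: `per_class`
    draws with replacement from each class (default: minority class size)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    if per_class is None:
        per_class = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        sample.extend(rng.choices(idx, k=per_class))  # with replacement
    return sample

y = ["neg"] * 90 + ["pos"] * 10
idx = balanced_bootstrap(y)
print(len(idx), Counter(y[i] for i in idx))
```

In a forest, each tree would be grown on its own `balanced_bootstrap` draw (a different seed per tree) instead of the usual uniform bootstrap. The per-class size is the main design knob.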

chapter

Choosing an Appropriate Ensemble Classifier for Balanced Bioinformatics Data

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2015 IEEE International Conference on Information Reuse and Integration (IRI) > 17 - 24

Bioinformatics datasets contain a number of characteristics, such as noisy data and difficult-to-learn class boundaries, which make it challenging to build effective predictive models. One option for improving results is the use of ensemble learning methods, which involve combining the results of multiple predictive models into a single decision. Since we do not rely on a single model, we reduce the...

chapter

Building an Effective Classification Model for Breast Cancer Patient Response Data

Brian Heredia, Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman

2015 IEEE International Conference on Information Reuse and Integration (IRI) > 229 - 235

Choosing an appropriate cancer treatment is potentially the most important task in the treatment of a cancer patient. If it were possible to identify the best option for a patient (or, at minimum, to remove options that will not help the patient), then the general prognosis of the patient would improve. However, this task becomes much more subtle due to characteristics such as high dimensionality found in...

chapter

Select-Bagging: Effectively Combining Gene Selection and Bagging for Balanced Bioinformatics Data

David J. Dittman, Taghi M. Khoshgoftaar, Amri Napolitano, Alireza Fazelpour

2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE) > 413 - 419

Bioinformatics datasets have historically been difficult to work with. However, within machine learning, there is a potentially effective tool to combat such problems: ensemble learning. Ensemble learning generates a series of models and combines their results to make a single decision. This process has the benefit of utilizing the power of multiple models but the overhead of having to compute the...

chapter

Effects of the Use of Boosting on Classification Performance of Imbalanced Bioinformatics Datasets

Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman, Amri Napolitano

2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE) > 420 - 426

In the domain of bioinformatics, two common problems encountered when analyzing real-world datasets are class imbalance and high dimensionality. Boosting is a technique that can be used to improve classification performance, even in the presence of class imbalance. In addition, data sampling and feature selection are two important preprocessing techniques used to counter the adverse effects of both...

chapter

Classification performance of three approaches for combining data sampling and gene selection on bioinformatics data

Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman, Amri Napolitano

Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014) > 315 - 321

Bioinformatics datasets pose two major challenges to researchers and data-mining practitioners: class imbalance and high dimensionality. Class imbalance occurs when instances of one class vastly outnumber instances of the other class(es), and high dimensionality occurs when a dataset has many independent features (genes). Data sampling is often used to tackle the problem of class imbalance, and the...

chapter

Hidden dependencies between class imbalance and difficulty of learning for bioinformatics datasets

Randall Wald, Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman

2013 IEEE 14th International Conference on Information Reuse & Integration (IRI) > 232 - 238

Many bioinformatics datasets share certain problems: they have class imbalance (one class with many more instances than the remaining class(es)), or are difficult to learn from (build accurate models with). Much research has investigated these two problems, or even considered both at once. However, hidden dependencies can exist between these two problems: in a given collection of datasets, the highly...

chapter

A survey of stability analysis of feature subset selection techniques

Taghi M. Khoshgoftaar, Alireza Fazelpour, Huanjing Wang, Randall Wald

2013 IEEE 14th International Conference on Information Reuse & Integration (IRI) > 424 - 431

With the proliferation of high-dimensional datasets across many application domains in recent years, feature selection has become an important data mining task due to its capability to improve both performance and computational efficiency. The chosen feature subset is important not only due to its ability to improve classification performance, but also because in some domains, knowing the most important...
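Stability analysis asks how much the selected feature subset changes when the data are perturbed. One example of such a perturbation is running the same selector on different bootstrap samples. The sketch below uses the Jaccard index, averaged over all pairs of runs, as one common similarity measure. The survey may cover other measures as well, and the gene names and runs are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """|A intersect B| / |A union B|; defined as 1.0 when both subsets are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def average_pairwise_stability(subsets):
    """Mean Jaccard similarity over every pair of selected subsets (runs)."""
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Subsets chosen by one selector on three perturbed copies of a dataset
runs = [{"g1", "g2", "g3"}, {"g1", "g2", "g4"}, {"g1", "g2", "g3"}]
print(average_pairwise_stability(runs))  # pairwise 0.5, 1.0, 0.5 -> about 0.667
```

A value of 1.0 means that every run selected exactly the same genes. Values near 0 mean that the selections barely overlap.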

chapter

The use of balance-aware subsampling for bioinformatics datasets

Randall Wald, Taghi M. Khoshgoftaar, Alireza Fazelpour

2013 IEEE 14th International Conference on Information Reuse & Integration (IRI) > 325 - 332

A major challenge facing data-mining practitioners in the field of bioinformatics is class imbalance, which occurs when instances of one class (called the majority class) vastly outnumber instances of the other (minority) classes. This can result in models with increased bias towards the majority class (minority-class instances predicted as being in the majority class). Data sampling, a process which...
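Random undersampling is one widely used data sampling technique for the problem described above. It discards randomly chosen majority-class instances until a target class ratio is reached. It is shown here only as a generic, binary-class example. It is not the balance-aware subsampling method the paper proposes, whose details are cut off in this abstract. The function name and the `target_ratio` parameter are hypothetical.

```python
import random
from collections import Counter

def random_undersample(X, y, majority, target_ratio=1.0, seed=0):
    """Binary case: keep every non-majority instance and a random subset of
    majority instances so that #majority <= target_ratio * #minority."""
    rng = random.Random(seed)
    maj_idx = [i for i, lab in enumerate(y) if lab == majority]
    min_idx = [i for i, lab in enumerate(y) if lab != majority]
    n_keep = min(len(maj_idx), int(target_ratio * len(min_idx)))
    keep = sorted(min_idx + rng.sample(maj_idx, n_keep))  # without replacement
    return [X[i] for i in keep], [y[i] for i in keep]

X = list(range(100))                 # stand-in for feature rows
y = ["neg"] * 90 + ["pos"] * 10      # 90:10 imbalance
Xs, ys = random_undersample(X, y, majority="neg", target_ratio=1.0)
print(Counter(ys))
```

Setting `target_ratio` above 1.0 keeps some imbalance while discarding less majority-class information. This trade-off is one reason sampling ratios are usually treated as an experimental parameter.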

chapter

First Order Statistics Based Feature Selection: A Diverse and Powerful Family of Feature Selection Techniques

Taghi M. Khoshgoftaar, David J. Dittman, Randall Wald, Alireza Fazelpour

2012 11th International Conference on Machine Learning and Applications (ICMLA) > Vol. 2 > 151 - 157

Dimensionality reduction techniques have become a required step when working with bioinformatics datasets. Techniques such as feature selection are known not only to improve computation time but also to improve the results of experiments by removing redundant and irrelevant features or genes from consideration in subsequent analysis. Univariate feature selection techniques in particular are...
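A univariate technique scores each gene on its own and keeps the top-ranked ones. As one example built from first-order statistics (class means and standard deviations), the sketch ranks genes by the absolute signal-to-noise ratio (mean_pos - mean_neg) / (sd_pos + sd_neg). The truncated abstract does not list the paper's exact family of techniques. The use of the population standard deviation, the helper names and the toy data are all assumptions.

```python
import math
from statistics import mean, pstdev

def signal_to_noise(values, labels, positive):
    """(mean_pos - mean_neg) / (sd_pos + sd_neg) for one feature (gene)."""
    pos = [v for v, lab in zip(values, labels) if lab == positive]
    neg = [v for v, lab in zip(values, labels) if lab != positive]
    diff = mean(pos) - mean(neg)
    denom = pstdev(pos) + pstdev(neg)  # population SD: an assumption
    if denom == 0:
        # Degenerate case: the feature is constant within each class.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom

def rank_features(data, labels, positive, k):
    """Score every feature independently and keep the k largest |S2N| values."""
    scores = {name: signal_to_noise(vals, labels, positive) for name, vals in data.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)[:k]

labels = ["tumor", "tumor", "normal", "normal"]
data = {
    "geneA": [5, 7, 1, 3],   # S2N = (6 - 2) / (1 + 1) = 2.0
    "geneB": [1, 2, 1, 2],   # S2N = 0.0 (no class difference)
    "geneC": [1, 3, 6, 8],   # S2N = (2 - 7) / (1 + 1) = -2.5
}
print(rank_features(data, labels, positive="tumor", k=2))  # ['geneC', 'geneA']
```

Ranking by the absolute value keeps genes that are strongly down-regulated in the class of interest, such as geneC, as well as up-regulated ones.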

INFONA - science communication portal

Search results for: Alireza Fazelpour