David J. Dittman

chapter

Investigating the Variation of Ensemble Size on Bagging-Based Classifier Performance in Imbalanced Bioinformatics Datasets

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2016 IEEE 17th International Conference on Information Reuse and Integration (IRI) > 377 - 383

2016 IEEE 17th International Conference on Information Reuse and Integration (IRI)

Bagging ensemble techniques have been utilized effectively by practitioners in the field of bioinformatics to alleviate the problem of class imbalance and to improve the performance of classification models. However, many previous works have used bagging only with a single arbitrary number of iterations. In this study, we raise the question of what is the impact of altering the number of iterations/ensembles...

chapter

Feature Selection Algorithms for Mining High Dimensional DNA Microarray Data

David J. Dittman, Taghi M. Khoshgoftaar, Randall Wald, Jason Van Hulse

Handbook of Data Intensive Computing > Applications > 685-710

The World Health Organization identified cancer as the second largest contributor to death worldwide, surpassed only by cardiovascular disease. The death count for cancer in 2002 was 7.1 million and is expected to rise to 11.5 million annually by 2030 [17]. In 2009, the International Conference on Machine Learning and Applications, or ICMLA, proposed a challenge regarding gene expression profiles in...

chapter

Does the Inclusion of Data Sampling Improve the Performance of Boosting Algorithms on Imbalanced Bioinformatics Data?

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) > 527 - 534

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)

Bioinformatics datasets contain many challenging characteristics, such as class imbalance, which adversely impacts the performance of supervised classification models built on these datasets. Techniques such as ensemble learning and data sampling from the domain of data mining can be deployed to alleviate the problem and to improve the classification performance. In this study, we sought to determine whether...

chapter

Investigating New Bootstrapping Approaches of Bagging Classifiers to Account for Class Imbalance in Bioinformatics Datasets

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA) > 987 - 994

2015 IEEE 14th International Conference on Machine Learning and Applications (ICMLA)

One major challenge posed by bioinformatics datasets is class imbalance, which occurs when one class has many more instances than the other class(es). Its undesirable effect on the classification performance is compounded by the fact that, in general, the class with fewer instances is the class of interest. Bagging has been utilized by practitioners in the field to overcome the challenge of class...

chapter

Ensemble vs. Data Sampling: Which Option Is Best Suited to Improve Classification Performance of Imbalanced Bioinformatics Data?

Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman, Amri Napolitano

2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI) > 705 - 712

2015 IEEE 27th International Conference on Tools with Artificial Intelligence (ICTAI)

Bioinformatics datasets contain challenging characteristics, such as class imbalance, which occurs when one class has many more instances than the other class(es). These challenges make the task of classification much more subtle for practitioners and researchers in the field. Fortunately, there are tools, such as ensemble learning and data sampling methods, that can be applied to overcome these problems...

chapter

Observing the Effect of the Choice of Classifier on Bioinformatics Data with Varying Levels of Data Quality and Class Balance

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Ahmad Abu Shanab

2015 IEEE International Conference on Information Reuse and Integration > 372 - 379

2015 IEEE International Conference on Information Reuse and Integration (IRI)

Noise is a prominent challenge found in many bioinformatics datasets and it refers to erroneous or missing data. The presence of noise in gene expression datasets has adverse effects on machine-learning techniques, such as supervised classification algorithms and feature selection techniques. Additionally, the identification of noise and its quantification are challenging tasks that require a proper...

chapter

Alterations to the Bootstrapping Process within Random Forest: A Case Study on Imbalanced Bioinformatics Data

Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman, Amri Napolitano

2015 IEEE International Conference on Information Reuse and Integration > 342 - 348

2015 IEEE International Conference on Information Reuse and Integration (IRI)

Class imbalance is a significant challenge that practitioners in the field of bioinformatics face on a daily basis. It is a phenomenon that occurs when the number of instances of one class is much greater than the number of instances of the other class(es), and it has adverse effects on the performance of classification models built on this skewed data. Random Forest as a robust classifier has been...

chapter

Choosing an Appropriate Ensemble Classifier for Balanced Bioinformatics Data

Alireza Fazelpour, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2015 IEEE International Conference on Information Reuse and Integration > 17 - 24

2015 IEEE International Conference on Information Reuse and Integration (IRI)

Bioinformatics datasets contain a number of characteristics, such as noisy data and difficult-to-learn class boundaries, which make it challenging to build effective predictive models. One option for improving results is the use of ensemble learning methods, which involve combining the results of multiple predictive models into a single decision. Since we do not rely on a single model, we reduce the...

chapter

The Effect of Data Sampling When Using Random Forest on Imbalanced Bioinformatics Data

David J. Dittman, Taghi M. Khoshgoftaar, Amri Napolitano

2015 IEEE International Conference on Information Reuse and Integration > 457 - 463

2015 IEEE International Conference on Information Reuse and Integration (IRI)

Ensemble learning is a powerful tool that has shown promise when applied to bioinformatics datasets. In particular, the Random Forest classifier has been an effective and popular algorithm due to its relatively good classification performance and its ease of use. However, Random Forest does not account for class imbalance, which is known for decreasing classification performance and increasing...

chapter

Using Random Undersampling to Alleviate Class Imbalance on Tweet Sentiment Data

Joseph Prusa, Taghi M. Khoshgoftaar, David J. Dittman, Amri Napolitano

2015 IEEE International Conference on Information Reuse and Integration > 197 - 202

2015 IEEE International Conference on Information Reuse and Integration (IRI)

Sentiment classification of tweets is used for a variety of social sensing tasks and provides a means of discerning public opinion on a wide range of topics. A potential concern when performing sentiment classification is that the training data may contain class imbalance, which can negatively affect classification performance. A classifier trained on imbalanced data may be biased in favor of the...

chapter

Building an Effective Classification Model for Breast Cancer Patient Response Data

Brian Heredia, Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman

2015 IEEE International Conference on Information Reuse and Integration > 229 - 235

2015 IEEE International Conference on Information Reuse and Integration (IRI)

Choosing an appropriate cancer treatment is potentially the most important task in the treatment of a cancer patient. If it were possible to identify the best option for a patient (or at minimum to remove options that will not help the patient), then the general prognosis of the patient improves. However, this task becomes much more subtle due to characteristics such as high dimensionality found in...

chapter

Select-Bagging: Effectively Combining Gene Selection and Bagging for Balanced Bioinformatics Data

David J. Dittman, Taghi M. Khoshgoftaar, Amri Napolitano, Alireza Fazelpour

2014 IEEE International Conference on Bioinformatics and Bioengineering > 413 - 419

2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE)

Bioinformatics datasets have historically been difficult to work with. However, within machine learning, there is a potentially effective tool to combat such problems: ensemble learning. Ensemble learning generates a series of models and combines their results to make a single decision. This process has the benefit of utilizing the power of multiple models but the overhead of having to compute the...

chapter

Effects of the Use of Boosting on Classification Performance of Imbalanced Bioinformatics Datasets

Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman, Amri Napolitano

2014 IEEE International Conference on Bioinformatics and Bioengineering > 420 - 426

2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE)

In the domain of bioinformatics, two common problems encountered when analyzing real-world datasets are class imbalance and high dimensionality. Boosting is a technique that can be used to improve classification performance, even in the presence of class imbalance. In addition, data sampling and feature selection are two important preprocessing techniques used to counter the adverse effects of both...

chapter

Selecting the Appropriate Data Sampling Approach for Imbalanced and High-Dimensional Bioinformatics Datasets

David J. Dittman, Taghi M. Khoshgoftaar, Amri Napolitano

2014 IEEE International Conference on Bioinformatics and Bioengineering > 304 - 310

2014 IEEE International Conference on Bioinformatics and Bioengineering (BIBE)

One of the more prevalent problems when working with bioinformatics datasets is class imbalance, when there are more instances in one class compared to the other class(es). This problem is made worse because frequently, the class of interest is also the minority class. A possible solution is data sampling, a powerful tool for combating class imbalance by adding or removing instances to make the dataset...

chapter

Classification performance of three approaches for combining data sampling and gene selection on bioinformatics data

Taghi M. Khoshgoftaar, Alireza Fazelpour, David J. Dittman, Amri Napolitano

Proceedings of the 2014 IEEE 15th International Conference on Information Reuse and Integration (IEEE IRI 2014) > 315 - 321

2014 IEEE International Conference on Information Reuse and Integration (IRI)

Bioinformatics datasets pose two major challenges to researchers and data-mining practitioners: class imbalance and high dimensionality. Class imbalance occurs when instances of one class vastly outnumber instances of the other class(es), and high dimensionality occurs when a dataset has many independent features (genes). Data sampling is often used to tackle the problem of class imbalance, and the...

chapter

Contrasting Undersampled Boosting with Internal and External Feature Selection for Patient Response Datasets

Taghi M. Khoshgoftaar, David J. Dittman, Randall Wald, Amri Napolitano

2013 12th International Conference on Machine Learning and Applications > 2 > 404 - 410

2013 12th International Conference on Machine Learning and Applications (ICMLA)

Class imbalance (where one class has many more instances than the other class(es)) and high dimensionality (large number of features per instance) are two prevalent problems that are frequently present in patient response datasets. In addition to these problems, these datasets are notoriously difficult to build effective models from. This paper introduces a new hybrid boosting algorithm named SelectRUSBoost...

chapter

Simplifying the Utilization of Machine Learning Techniques for Bioinformatics

David J. Dittman, Taghi M. Khoshgoftaar, Randall Wald, Amri Napolitano

2013 12th International Conference on Machine Learning and Applications > 2 > 396 - 403

2013 12th International Conference on Machine Learning and Applications (ICMLA)

The domain of bioinformatics has a number of challenges such as handling datasets which exhibit extreme levels of high dimensionality (large number of features per sample) and datasets which are particularly difficult to work with. These datasets contain many pieces of data (features) which are irrelevant and redundant to the problem being studied, which makes analysis quite difficult. However, techniques...

chapter

Random Forest with 200 Selected Features: An Optimal Model for Bioinformatics Research

Randall Wald, Taghi Khoshgoftaar, David J. Dittman, Amri Napolitano

2013 12th International Conference on Machine Learning and Applications > 1 > 154 - 160

2013 12th International Conference on Machine Learning and Applications (ICMLA)

Many problems in bioinformatics involve high-dimensional, difficult-to-process collections of data. For example, gene microarrays can record the expression levels of thousands of genes, many of which have no relevance to the underlying medical or biological question. Building classification models on such datasets can thus take excessive computational time and still give poor results. Many strategies...

chapter

Maximizing Classification Performance for Patient Response Datasets

David J. Dittman, Taghi M. Khoshgoftaar, Randall Wald, Amri Napolitano

2013 IEEE 25th International Conference on Tools with Artificial Intelligence > 454 - 462

2013 IEEE 25th International Conference on Tools with Artificial Intelligence (ICTAI)

The ability to predict a patient's response to a treatment has long been a goal in the fields of medicine and pharmacology. This is especially true for cancer treatments, as many of these incur extreme side effects as a consequence of destroying healthy cells along with cancerous ones. Gene profiles such as DNA microarrays could potentially contain information on which treatments are most likely to work...

chapter

A Review of Ensemble Classification for DNA Microarrays Data

Taghi M. Khoshgoftaar, David J. Dittman, Randall Wald, Wael Awada

2013 IEEE 25th International Conference on Tools with Artificial Intelligence > 381 - 389

2013 IEEE 25th International Conference on Tools with Artificial Intelligence (ICTAI)

Ensemble classification has been a frequent topic of research in recent years, especially in bioinformatics. The benefits of ensemble classification (less prone to overfitting, increased classification performance, and reduced bias) are a perfect match for a number of issues that plague bioinformatics experiments. This is especially true for DNA microarray data experiments, due to the large amount of...

INFONA - science communication portal

Search results for: David J. Dittman

