The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Bagging ensemble techniques have been utilized effectively by practitioners in the field of bioinformatics to alleviate the problem of class imbalance and to improve the performance of classification models. However, many previous works have used bagging only with a single arbitrary number of iterations. In this study, we raise the question of what is the impact of altering the number of iterations/ensembles...
The World Heath Organization identified cancer as the second largest contributor to death worldwide, surpassed only by cardiovascular disease. The death count for cancer in 2002 was 7.1 million and is expected to rise to 11.5 million annually by 2030 [17]. In 2009, the International Conference on Machine Learning and Applications, or ICMLA, proposed a challenge regarding gene expression profiles in...
Bioinformatics datasets contain many challenging characteristics, such as class imbalance, which adversely impacts the performance of supervised classification models built on these datasets. Techniques such as ensemble learning and data sampling from the domain of data mining can be deployed to alleviate the problem and to improve the classification performance. In this study, we sought to seek whether...
One major challenge posed by bioinformatics datasets is class imbalance which occurs when one class has many more instances than the other class(es). Its undesirable effect on the classification performance is compounded with the fact that, in general, the class with fewer instances is the class of interest. Bagging has been utilized by practitioners in the field to overcome the challenge of class...
Bioinformatics datasets contain challenging characteristics, such as class imbalance that occurs when one class has many more instances than the other class(es). These challenges make the task of classification much more subtle for practitioners and researchers in the field. Fortunately, there are tools, such as ensemble learning and data sampling methods that can be applied to overcome these problems...
Noise is a prominent challenge found in many bioinformatics datasets and it refers to erroneous or missing data. The presence of noise in gene expression datasets has adverse effects on machine-learning techniques, such as supervised classification algorithms and feature selection techniques. Additionally, the identification of noise and its quantification are challenging tasks that require a proper...
Class imbalance is a significant challenge that practitioners in the field of bioinformatics are faced with on a daily basis. It is a phenomenon that occurs when number of instances of one class is much greater than number of instances of the other class(es) and it has adverse effects on the performance of classification models built on this skewed data. Random Forest as a robust classifier has been...
Bioinformatics datasets contain a number of characteristics, such as noisy data and difficult to learn class boundaries, which make it challenge to build effective predictive models. One option for improving results is the use of ensemble learning methods, which involve combining the results of multiple predictive models into a single decision. Since we do not rely on a single model, we reduce the...
Ensemble learning is a powerful tool that has shown promise when applied towards bioinformatics datasets. In particular, the Random Forest classifier has been an effective and popular algorithm due to its relatively good classification performance and its ease of use. However, Random Forest does not account for class imbalance which is known for decreasing classification performance and increasing...
Sentiment classification of tweets is used for a variety of social sensing tasks and provides a means of discerning public opinion on a wide range of topics. A potential concern when performing sentiment classification is that the training data may contain class imbalance, which can negatively affect classification performance. A classifier trained on imbalanced data may be biased in favor of the...
Choosing an appropriate cancer treatment is potentially the most important task in the treatment of a cancer patient. If it were possible to identify the best option for a patient (or at minimum to remove options that will not help the patient), then the general prognosis of the patient improves. However, this task becomes much more subtle due to characteristics such as high dimensionality found in...
Bioinformatics datasets have historically been difficult to work with. However, within machine learning, there is a potentially effective tool to combat such problems: ensemble learning. Ensemble learning generates a series of models and combines their results to make a single decision. This process has the benefit of utilizing the power of multiple models but the overhead of having to compute the...
In the domain of bioinformatics, two common problems encountered when analyzing real-world datasets are class imbalance and high dimensionality. Boosting is a technique that can be used to improve classification performance, even in the presence of class imbalance. In addition, data sampling and feature selection are two important preprocessing techniques used to counter the adverse effects of both...
One of the more prevalent problems when working with bioinformatics datasets is class imbalance, when there are more instances in one class compared to the other class (es). This problem is made worse because frequently, the class of interest is also the minority class. A possible solution is data sampling, a powerful tool for combating class imbalance by adding or removing instances to make the dataset...
Bioinformatics datasets pose two major challenges to researchers and data-mining practitioners: class imbalance and high dimensionality. Class imbalance occurs when instances of one class vastly outnumber instances of the other class(es), and high dimensionality occurs when a dataset has many independent features (genes). Data sampling is often used to tackle the problem of class imbalance, and the...
Class imbalance (where one class has many more instances than the other class(es)) and high dimensionality (large number of features per instance) are two prevalent problems that are frequently present in patient response datasets. In addition to these problems, these datasets are notoriously difficult to build effective models from. This paper introduces a new hybrid boosting algorithm named SelectRUSBoost...
The domain of bioinformatics has a number of challenges such as handling datasets which exhibit extreme levels of high dimensionality (large number of features per sample) and datasets which are particularly difficult to work with. These datasets contain many pieces of data (features) which are irrelevant and redundant to the problem being studied, which makes analysis quite difficult. However, techniques...
Many problems in bioinformatics involve high-dimensional, difficult-to-process collections of data. For example, gene micro arrays can record the expression levels of thousands of genes, many of which have no relevance to the underlying medical or biological question. Building classification model son such datasets can thus take excessive computational time and still give poor results. Many strategies...
The ability to predict a patient's response to a treatment has long been a goal in the fields of medicine andpharmacology. This is especially true for cancer treatments, as many of these incur extreme side effects as a consequenceof destroying healthy cells along with cancerous ones. Geneprofiles such as DNA microarrays could potentially containinformation on which treatments are most likely to work...
Ensemble classification has been a frequent topicof research in recent years, especially in bioinformatics. The benefits of ensemble classification (less prone to overfitting, increased classification performance, and reduced bias) are aperfect match for a number of issues that plague bioinformaticsexperiments. This is especially true for DNA microarray dataexperiments, due to the large amount of...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.