Bagging ensemble techniques have been utilized effectively by practitioners in the field of bioinformatics to alleviate the problem of class imbalance and to improve the performance of classification models. However, many previous works have used bagging with only a single, arbitrarily chosen number of iterations. In this study, we ask what impact altering the number of iterations/ensembles...
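As a rough illustration of what "number of iterations" means for bagging, the sketch below trains one base model per bootstrap sample and combines them by majority vote. The base learner (`train_stump`), the toy one-feature dataset, and the iteration counts are all hypothetical choices made for illustration; they are not taken from the study.

```python
import random
from collections import Counter


def train_stump(sample):
    """Hypothetical base learner: threshold midway between the class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # The bootstrap sample held only one class, so always predict it.
        only = 1 if pos else 0
        return lambda x: only
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    threshold = (mean_pos + mean_neg) / 2
    high_label = 1 if mean_pos > mean_neg else 0
    return lambda x: high_label if x > threshold else 1 - high_label


def bagging(data, n_iterations, seed=0):
    """Train n_iterations stumps on bootstrap samples; return a voting predictor."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_iterations):
        # Bootstrap: draw len(data) instances with replacement.
        bootstrap = [rng.choice(data) for _ in data]
        models.append(train_stump(bootstrap))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy imbalanced data (assumed): six class-0 instances, two class-1 instances.
data = [(0.10, 0), (0.15, 0), (0.20, 0), (0.25, 0), (0.30, 0), (0.40, 0),
        (0.80, 1), (0.90, 1)]

for n in (1, 10, 50):
    predictor = bagging(data, n_iterations=n)
    print(n, predictor(0.95), predictor(0.05))
```

With very few iterations, a single unlucky bootstrap sample (for example, one that contains no minority-class instances) can dominate the vote; more iterations dilute that effect at the cost of more training time.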
Bioinformatics datasets contain many challenging characteristics, such as class imbalance, which adversely impacts the performance of supervised classification models built on these datasets. Techniques such as ensemble learning and data sampling from the domain of data mining can be deployed to alleviate the problem and to improve the classification performance. In this study, we sought to determine whether...
One major challenge posed by bioinformatics datasets is class imbalance, which occurs when one class has many more instances than the other class(es). Its undesirable effect on the classification performance is compounded by the fact that, in general, the class with fewer instances is the class of interest. Bagging has been utilized by practitioners in the field to overcome the challenge of class...
Bioinformatics datasets contain challenging characteristics, such as class imbalance, which occurs when one class has many more instances than the other class(es). These challenges make the task of classification much more difficult for practitioners and researchers in the field. Fortunately, there are tools, such as ensemble learning and data sampling methods, that can be applied to overcome these problems...
Noise is a prominent challenge found in many bioinformatics datasets and it refers to erroneous or missing data. The presence of noise in gene expression datasets has adverse effects on machine-learning techniques, such as supervised classification algorithms and feature selection techniques. Additionally, the identification of noise and its quantification are challenging tasks that require a proper...
Class imbalance is a significant challenge that practitioners in the field of bioinformatics face on a daily basis. It occurs when the number of instances of one class is much greater than the number of instances of the other class(es), and it has adverse effects on the performance of classification models built on this skewed data. Random Forest as a robust classifier has been...
Bioinformatics datasets contain a number of characteristics, such as noisy data and difficult-to-learn class boundaries, which make it challenging to build effective predictive models. One option for improving results is the use of ensemble learning methods, which involve combining the results of multiple predictive models into a single decision. Since we do not rely on a single model, we reduce the...
Choosing an appropriate cancer treatment is potentially the most important task in the treatment of a cancer patient. If it were possible to identify the best option for a patient (or at minimum to remove options that will not help the patient), then the general prognosis of the patient would improve. However, this task becomes much more difficult due to characteristics such as high dimensionality found in...
Bioinformatics datasets have historically been difficult to work with. However, within machine learning, there is a potentially effective tool to combat such problems: ensemble learning. Ensemble learning generates a series of models and combines their results to make a single decision. This process has the benefit of utilizing the power of multiple models but the overhead of having to compute the...
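The idea of generating several models and combining their results into a single decision can be sketched with simple majority voting. This is only one of several possible combination rules; the function name `majority_vote`, the labels, and the model outputs below are made up for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model predictions (one list per model) into one label per instance."""
    combined = []
    for votes in zip(*predictions):
        # most_common(1) returns the label with the most votes; ties go to
        # the label that appears first among the models' votes.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


# Hypothetical outputs of three models on the same three instances.
model_outputs = [
    ["tumor", "normal", "normal"],
    ["tumor", "tumor", "normal"],
    ["normal", "normal", "tumor"],
]
ensemble_decision = majority_vote(model_outputs)
print(ensemble_decision)
```

Each model must be trained and evaluated before the vote can be taken, which is where the computational overhead of an ensemble comes from.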
In the domain of bioinformatics, two common problems encountered when analyzing real-world datasets are class imbalance and high dimensionality. Boosting is a technique that can be used to improve classification performance, even in the presence of class imbalance. In addition, data sampling and feature selection are two important preprocessing techniques used to counter the adverse effects of both...
Bioinformatics datasets pose two major challenges to researchers and data-mining practitioners: class imbalance and high dimensionality. Class imbalance occurs when instances of one class vastly outnumber instances of the other class(es), and high dimensionality occurs when a dataset has many independent features (genes). Data sampling is often used to tackle the problem of class imbalance, and the...
Many bioinformatics datasets share certain problems: they have class imbalance (one class with many more instances than the remaining class(es)), or are difficult to learn from (build accurate models with). Much research has investigated these two problems, or even considered both at once. However, hidden dependencies can exist between these two problems: in a given collection of datasets, the highly...
With the proliferation of high-dimensional datasets across many application domains in recent years, feature selection has become an important data mining task due to its capability to improve both performance and computational efficiency. The chosen feature subset is important not only due to its ability to improve classification performance, but also because in some domains, knowing the most important...
A major challenge facing data-mining practitioners in the field of bioinformatics is class imbalance, which occurs when instances of one class (called the majority class) vastly outnumber instances of the other (minority) classes. This can result in models with increased bias towards the majority class (minority-class instances predicted as being in the majority class). Data sampling, a process which...
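One common form of data sampling is random undersampling, which discards majority-class instances until the classes are balanced. The abstract is truncated before it names a specific method, so the sketch below is an assumption about the general technique rather than the approach used in the study; `random_undersample` and the toy data are hypothetical.

```python
import random
from collections import Counter


def random_undersample(instances, labels, seed=0):
    """Randomly keep only as many instances per class as the smallest class has."""
    rng = random.Random(seed)
    minority_size = min(Counter(labels).values())

    # Group instances by their class label.
    by_class = {}
    for x, y in zip(instances, labels):
        by_class.setdefault(y, []).append(x)

    sampled = []
    for y, xs in by_class.items():
        # Sample without replacement so no instance is duplicated.
        for x in rng.sample(xs, minority_size):
            sampled.append((x, y))
    rng.shuffle(sampled)
    return sampled


# Toy imbalanced data (assumed): eight majority and two minority instances.
instances = list(range(10))
labels = ["majority"] * 8 + ["minority"] * 2

balanced = random_undersample(instances, labels)
print(Counter(y for _, y in balanced))
```

Undersampling trades information for balance: the majority-class instances that are dropped no longer contribute to training.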
Dimensionality reduction techniques have become a required step when working with bioinformatics datasets. Techniques such as feature selection have been known not only to improve computation time, but also to improve the results of experiments by removing redundant and irrelevant features or genes from consideration in subsequent analysis. Univariate feature selection techniques in particular are...