Feature selection from microarray data has become an ever-evolving area of research. Numerous techniques have been widely applied to extract genes that are differentially expressed in microarray data. These include the fold-change approach, classical t-statistics and modified t-statistics. It has been found that the gene lists returned by these methods are dissimilar...
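As a rough illustration of why fold-change and t-statistic rankings can disagree, the sketch below scores a few genes both ways. It is a minimal example under stated assumptions, not the method of the paper above: the gene names and expression values are hypothetical toy data, values are assumed to be already log2-transformed, and Welch's t-statistic stands in for the "classical" t-statistic.

```python
import math
from statistics import mean, variance


def log2_fold_change(a, b):
    """Difference of group means; inputs are assumed to be log2 expression values."""
    return mean(a) - mean(b)


def welch_t(a, b):
    """Welch's t-statistic for two independent groups with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


def rank_genes(expr, score):
    """Return gene names sorted by decreasing absolute score."""
    return sorted(expr, key=lambda g: abs(score(*expr[g])), reverse=True)


# Hypothetical toy data: gene -> (group A values, group B values)
toy = {
    "gene_noisy": ([9.0, 5.0, 8.0, 4.0], [4.0, 3.0, 6.0, 2.0]),  # large but variable shift
    "gene_tight": ([5.5, 5.6, 5.4, 5.5], [5.0, 5.1, 4.9, 5.0]),  # small but consistent shift
    "gene_flat": ([6.0, 6.2, 5.8, 6.1], [6.1, 5.9, 6.0, 6.1]),   # no shift
}

by_fc = rank_genes(toy, log2_fold_change)
by_t = rank_genes(toy, welch_t)
print("fold-change ranking:", by_fc)
print("t-statistic ranking:", by_t)
```

On this toy data the fold-change ranking puts the noisy gene first, while the t-statistic favors the gene with the small but consistent shift, because the t-statistic divides the mean difference by its standard error.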
The development of data mining applications such as classification and clustering has shown the need for machine learning algorithms that can be applied to large-scale data. Although cancer classification has improved over the past 20 years, there has been no general approach for identifying new cancer classes or for assigning tumors to known classes (class prediction). Most proposed cancer classification methods...
Accurate tumor classification is important for the diagnosis and treatment of cancers. The conventional methods for tumor classification include training and testing phases, which may cause overfitting. Although this problem can be avoided by using sparse representation classification, the existing sparse representation methods for tumor classification are inefficient. In this paper, an efficient and...
Breast cancer is the most commonly diagnosed cancer and the second leading cause of death among women worldwide. Accurate diagnosis of the specific subtypes of this disease is vital to ensure that patients are provided with the most effective therapeutic strategies that yield the greatest response. Using the newly proposed ten subtypes of breast cancer, we hypothesize that machine learning techniques...
Classifying gene expression data to determine the subtype of samples is valuable for studying tumors at the molecular level. It is also an important step in planning further treatment for the patient. Particle swarm optimization (PSO) is proven to be an ineffective solution for classification and clustering in bioinformatics as it could not give a stable prediction result. In this study, a...
One of the most important links in improving diagnostic accuracy and disease cure rates is the accurate classification of disease. With the development and wide application of gene chips, diagnosis based on tumor gene expression profiling is expected to become a fast and effective clinical diagnostic method. However, the number of samples is small and the expression data are multivariate. In this article,...
An important application of microarray data in functional genomics is to classify samples according to their gene expression profiles such as to classify cancer versus normal samples or to classify different types or subtypes of cancer. One of the major tasks with gene expression data is to find co-regulated gene groups whose collective expression is strongly associated with sample categories. In...
In this investigation, a cancer classification approach is presented using clustering-based gene selection and artificial neural networks. To address the so-called ‘curse of dimensionality’, a t-statistic feature selection method, one of the univariate filter techniques, is used to select the most informative genes. However, instead of selecting a small group of relevant genes at once from the whole...
Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disease causing a progressive loss of motor neurons. The disease prevalence is 5 per 100,000 people. There is no cure, and it generally leads to death from respiratory failure approximately 3-5 years after the first symptoms. The exact causes of the disease are still unknown; however, almost 20% of the known cases have shown gene mutations...
Classification analysis of gene expression data could lead to knowledge of gene functions and disease mechanisms. However, the data involve nonlinear interactions among genes and environmental factors. Worse yet, while the data are usually high-dimensional, the obtainable sample sizes are generally relatively small, resulting in the well-known difficulty, the curse of dimensionality, in the...
To obtain feature genes for classification, a feature selection method based on gene expression profiles was proposed according to the characteristics of gene expression data. In this method, an improved FDR was used as the scoring criterion for classification features to remove the genes which are irrelevant to classification. A new distance composed of space distance and function distance was...
An important issue in the design of gene selection algorithms for microarray data analysis is the formulation of a suitable criterion function for measuring the relevance between different gene expressions. Mutual Information (MI) is a widely used criterion function, but it calculates the relevance over the entire set of samples only once, which cannot exactly identify the informative genes. This paper proposes...
Detection of different types of cancers is important in clinical diagnosis and treatment. Leukemia is one of the cancers that has different subtypes: acute lymphoblastic leukemia (ALL) and acute myeloid leukemia (AML). The detection of these subtypes according to different genetic markups in leukemia patients will lead to individualized therapies. Gene expression analysis has been used for the study...
The concept of linear separability of gene expression data sets with respect to two classes has recently been studied in the literature. The problem is to efficiently find all pairs of genes which induce a linear separation of the data. It has been suggested that an underlying molecular mechanism relates the two genes of a separating pair to the phenotype under study, such as a specific cancer...
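To make the notion of a separating gene pair concrete, the following sketch tests every pair of genes by plotting each sample as a 2D point (one coordinate per gene) and asking whether a line separates the two classes. This is a naive illustration, not the efficient algorithm the abstract refers to: it assumes a plain perceptron with an epoch cap as the separability check, and the gene names, labels and values are hypothetical.

```python
from itertools import combinations


def perceptron_separates(points, labels, max_epochs=1000):
    """Search for w, b with label * (w . x + b) > 0 for every 2D point.

    Returns True if the perceptron converges within max_epochs. A False
    result only means no separator was found within the cap; it is not a
    proof that the points are inseparable.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), t in zip(points, labels):
            if t * (w[0] * x + w[1] * y + b) <= 0:
                # Misclassified (or on the boundary): nudge the line toward the point.
                w[0] += t * x
                w[1] += t * y
                b += t
                mistakes += 1
        if mistakes == 0:
            return True
    return False


def separating_pairs(expr, labels):
    """expr maps gene -> per-sample values; labels holds +1 / -1 per sample."""
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        points = list(zip(expr[g1], expr[g2]))
        if perceptron_separates(points, labels):
            pairs.append((g1, g2))
    return pairs


# Hypothetical toy data: four samples, two per class.
labels = [1, 1, -1, -1]
toy = {
    "g1": [1.0, 2.0, 2.0, 1.0],
    "g2": [1.0, 2.0, 1.0, 2.0],
    "g3": [3.0, 3.5, 0.5, 1.0],
}
print(separating_pairs(toy, labels))
```

Here the pair (g1, g2) forms an XOR-like layout that no line can split, while any pair containing g3 is separable. Testing all pairs this way costs one check per pair, which is exactly the quadratic cost an efficient method would try to avoid.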
Many attempts have been made to deal with missing values (MV) in microarray data representing gene expressions. This is a problematic issue as many data analysis techniques are not robust to missing data. Most of the MV imputation methods currently being used have been evaluated only in terms of the similarity between the original and imputed data. While imputed expression values themselves...
Selection of reliable genes from microarray gene expression data is essential for carrying out a diagnostic test and successful treatment. In this regard, a rough set based gene selection algorithm was recently developed to select genes from microarray data. In this paper, a fuzzy discretization method is proposed for the rough set based gene selection algorithm to compute relevance and significance of continuous...
To construct biologically interpretable features and facilitate Muscular Dystrophy (MD) subtype classification, we propose a novel integrative scheme utilizing a PPI network, functional gene set information, and mRNA profiling. The workflow of the proposed scheme includes three major steps: First, by combining protein-protein interaction network structure and gene co-expression relationship into...
Among the large number of genes present in microarray data, only a small fraction are effective for performing a certain diagnostic test. However, it is very difficult to identify these genes for disease diagnosis. In this regard, a new supervised gene clustering algorithm is proposed to cluster genes from microarray data. The proposed method directly incorporates the information of response...
In data mining, classification algorithms usually pursue higher accuracy. This is based on the assumption that all misclassifications have the same cost. Obviously, this assumption is not suitable. By improving the encode/decode methods and taking different misclassification costs into account, this paper presents a new cost-sensitive algorithm called CS-GE based on Gene Expression. The experimental...
Gene selection is a challenging task in microarray data mining because a typical microarray dataset has only a small number of records while having thousands of attributes. This kind of dataset creates a high likelihood of finding false predictions that are due to chance. Finding the most relevant genes is often the key phase in building an accurate classification model. Irrelevant and redundant attributes...