Recommender systems are widely used by companies that sell all or some of their products via the Internet. Furthermore, they are destined to take on an even more important role when their use is generalized as a Web 2.0 social service and is no longer only linked to e-commerce companies. The recommendations that a recommender system offers any given user are based on the preferences shown by a given...
Indian languages such as Hindi are phonetic in nature. The text-to-speech (TTS) system for Hindi exploits this phonetic nature. The algorithm we developed analyzes a sentence into words and then into symbols, using a technique that combines pure consonants and vowels. Wave files are merged as required to generate the modified consonants influenced by matras,...
Many organizations collect large amounts of data to support their business and decision-making processes. Data collected from various sources may have quality problems. These issues become prominent when various databases are integrated: the integrated databases inherit the data quality problems that were present in the source databases. The data in the integrated systems need...
Making a wise career decision is crucial for anybody. Today, excellent decision support tools, such as data mining tools, help people make the right decisions. This paper attempts to help prospective students make wise career decisions using technologies such as data mining. In India, technical manpower analysis is carried out by an organization named NTMIS (National...
In this era of data digitization, data mining is essential for extracting valuable information. However, privacy and security issues remain major barriers in this process. Since medical records relate to human subjects, privacy protection is taken more seriously than in other data mining tasks. As required by the Health Insurance Portability and Accountability Act (HIPAA), it is necessary to protect...
This paper proposes an improved K-medoids clustering algorithm (IKMC) to address the problem of detecting near-duplicate records. It treats every record in the database as a separate data object, uses the edit-distance method and attribute weights to compute similarity values between records, and then detects duplicate records by clustering these similarity values. This algorithm can automatically...
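The similarity step described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's IKMC implementation: the normalization by the longer field length, the weighted average, and the names `edit_distance`, `field_similarity`, and `record_similarity` are all assumptions made for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def field_similarity(a, b):
    """Map edit distance to [0, 1]; 1.0 means identical (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def record_similarity(r1, r2, weights):
    """Weighted average of per-attribute similarities (assumed combination rule)."""
    total = sum(weights)
    return sum(w * field_similarity(x, y)
               for w, x, y in zip(weights, r1, r2)) / total


# Hypothetical records: (name, city), with the name weighted twice as heavily.
r1 = ("John Smith", "London")
r2 = ("Jon Smith", "London")
score = record_similarity(r1, r2, weights=(2, 1))
```

A clustering step such as K-medoids would then operate on these pairwise scores; that step is omitted here because the abstract does not give its details.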
Finding the similarity between a pair of protein structures is one of the fundamental tasks in many areas of bioinformatics research, such as protein structure prediction and function mapping. We propose a method for finding a pairing of amino acids based on the densities of the structures, and we also propose a modification to the original TM-score rotation algorithm that assigns a similarity score to this alignment...
In unsupervised texture classification, a combination of various families of methods is usually used to obtain better classification results. However, existing methods are usually designed for a specific application and evaluated with a fixed window size. In this paper, we combine multi-scale features for unsupervised texture classification. The local binary pattern (LBP)...
We present a novel Bayesian network (BN) to classify strains of Mycobacterium tuberculosis complex (MTBC) into six major genetic lineages using mycobacterial interspersed repetitive units (MIRUs), a high-throughput biomarker. MTBC is the causative agent of tuberculosis (TB), which remains one of the leading causes of disease and morbidity worldwide. DNA fingerprinting methods such as MIRU...
Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening among others. It is widely believed that structure based methods provide an efficient way to do the query. Recently various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful...
Text Categorization is used to organize and manage biomedical text databases that are growing at an exponential rate. Feature representations for documents are a crucial factor for the performance of text categorization. Most of the successful existing techniques use a vector representation based on key entities extracted from the text. In this paper we investigate a new direction where we represent...
The amount of speaker-specific information in a speech signal varies from frame to frame depending on the spoken text and environmental conditions. Frame selection at the preprocessing stage can therefore be an advantage. In pre-quantization (PQ), we select a new sequence of frames Y from the original frames X such that Y is shorter than X. In this paper, we first analyze a number of...
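As a rough illustration of pre-quantization, the sketch below uses fixed-rate decimation, which keeps every `step`-th frame of X to form a shorter sequence Y. Decimation is only one possible selection rule and is chosen here as an assumption; the abstract does not say which PQ methods the paper analyzes. The function name `decimate_frames` and the sample frames are hypothetical.

```python
def decimate_frames(frames, step):
    """Return every `step`-th frame of `frames`, starting from the first.

    `frames` is a sequence of per-frame feature vectors (X in the abstract);
    the result is the shorter sequence Y.
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(frames[::step])


# Hypothetical input: ten frames, each a 2-dimensional feature vector.
X = [(float(i), float(i) * 0.5) for i in range(10)]
Y = decimate_frames(X, step=3)  # keeps frames 0, 3, 6 and 9
```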
Early diagnosis and correct therapy for generalized infections are important factors for patient survival in intensive care burn units (ICBUs). Because of the number of pathologies involved, there is no specific etiology, and it is therefore difficult for physicians to quantify patient severity and establish a diagnosis. In this scenario, CBR has difficulty obtaining a reliable solution...
The K-means clustering algorithm is one of the important cluster analysis methods in data mining. However, analysis of and experiments with the traditional K-means algorithm show that its clustering result varies with the initially selected cluster centers, and that the difference is large. To address this problem, this paper proposes a method of seeking the initial cluster center...
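The sensitivity to initial centers noted in this abstract can be reproduced with a small sketch. It implements plain Lloyd-style K-means, not the paper's proposed initialization method, which the truncated abstract does not describe. The four sample points and both sets of starting centers are made up for the illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm; returns final centers and sum of squared errors."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


# Corners of a wide rectangle: the natural split is left versus right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good_centers, good_sse = kmeans(points, [(0, 0), (4, 0)])  # left/right split
bad_centers, bad_sse = kmeans(points, [(2, 0), (2, 1)])    # bottom/top split
```

Both runs converge, but the second is stuck in a much worse local optimum, which is the behavior that motivates choosing initial centers more carefully.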
This work proposes the use of a differential evolution algorithm to find the parameters of a data mining system used to pre-select electrical energy consumers suspected of fraud. A pattern recognition system was built to identify suspicious behavior of electrical energy consumers. However, the system only flags such clients, and the frauds must be confirmed through on-site inspection...
Prototype-based classifiers make it possible to determine the class of a new example from a reduced set of prototypes instead of a large set of known samples. This substantially decreases computation time, because the initial set is replaced by a reduced one and classification therefore requires fewer computations to estimate nearest neighbours. In most simple classification problems...
Knowledge constantly grows in scientific discourse and is revised over time by domain experts. The body of knowledge becomes structured and refined as the communities of practice concerned with the field develop a deeper understanding of its issues. As a result, the knowledge model evolves to a new state to accommodate the new knowledge. Keeping track of these changes in semantically rich...
This contribution proposes a feature selection method for semi-supervised problems. The method selects variables using a feature clustering strategy that combines supervised and unsupervised feature distance measures based on conditional mutual information and conditional entropy. Real databases were analyzed with different ratios between labelled and unlabelled samples in...
With the development of wireless communication and mobile spatial information services, there is an increasing demand for applications capable of processing spatio-temporal data. Typical applications include intelligent transport systems, digital battlefields, location-based e-commerce, and so forth. A moving objects database, which manages datasets of moving objects, serves as an essential...
Naive Bayes classifiers are known for their high efficiency and good classification accuracy, and they have been widely used in many domains. However, these classifiers need complete data, and missing data is common in practice. To address this, this paper develops methods for learning a naive Bayes classifier and for classification with missing data. Compared with...