Incorporating user characteristics and contextual information has been shown to be essential for personalized music retrieval and recommendation. To this end, the current location of a user is often exploited. However, relying solely on GPS coordinates neglects the cultural background of users, which does not necessarily coincide with political borders. In this paper, we analyze culture-specific...
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset w.r.t. a parameter mpts. While the performance of HDBSCAN* is robust w.r.t. mpts, choosing a "good" value for it can be challenging: depending on the data distribution, a high or low value for mpts may be more appropriate, and certain data clusters may...
We suggested a clustering method that makes it possible to build a conceptual clustering model for objects of a fuzzy nature and to increase the accuracy of clustering for such objects. We used the Cobweb clustering method as a base. We modified the formula for assessing the utility of conceptual clustering for objects with fuzzy parameter values. Then we suggested a modified Cobweb version for working...
Medical institutes use Electronic Medical Records (EMRs) to record a series of medical events, including diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Many data mining techniques are applied to EMR data sets for knowledge discovery, which is valuable to medical practice. The knowledge found helps to develop treatment plans, improve...
Clustering is an important tool for analyzing gene expression data, and many clustering algorithms have been proposed for this purpose. In this article, we cluster real-life gene expression data with K-means, one such clustering algorithm. We also propose a new method for determining the initial cluster centers for K-means. We have compared results of our method...
Central examinations are one of the measurement and evaluation tools used throughout the world to select among participants, to rank them, to reduce the number of candidates before an interview, or to determine whether the level of education varies by region or demographic group. A more objective measurement and evaluation can be made through the use of multiple-choice questions...
The internet offers a diverse set of products of any particular type. When a user tries to find the best product of a certain type, it is very difficult to go through every one of them manually, so manual searching is not very efficient. In that scenario, a recommendation system plays an important role in recommending the best products. In this study, we develop a recommendation...
Missing data is a data mining problem that is frequently encountered in healthcare data for a variety of reasons and that adversely affects data analysis and decision-making processes. It is still an important research topic because the success of a method for handling it is influenced by many factors, such as the characteristics of the data and the type of the missing data. In this study, a clustering and...
How to reduce the computation time and how to improve the quality of the clustering result are the two major research issues. Although several efficient and effective clustering algorithms have been presented, none of them is perfect. As such, an effective clustering algorithm, which is based on the prediction of searching information to determine the search directions at later iterations and employs...
In this work, we analyze the usefulness of the normalized compression distance (NCD) as a similarity measure for bird species identification from audio samples. As a first approach, we review the effect of different compression methods from 7z and the CompLearn Toolkit on subsets of bird audio samples obtained from the xeno-canto database. The performance of each compression method was measured applying...
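For readers unfamiliar with NCD, the following is a minimal sketch of its standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C denotes compressed length. It uses Python's `zlib` as a stand-in compressor, which is an assumption: the abstract evaluates compressors from 7z and the CompLearn Toolkit, and the function names below are illustrative only.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Values near 0 suggest the inputs share most of their structure;
    values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)  # compress the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical inputs; real use would pass raw audio bytes.
    repetitive = b"abc" * 150
    varied = bytes(range(256))
    print(ncd(repetitive, repetitive), ncd(repetitive, varied))
```

The result depends on the compressor, so distances computed with different compressors are not directly comparable.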
Accurate forecasting of solar time series is challenging due to irregularities and uncertainties of such datasets. This paper develops an advanced hybrid forecasting method for solar radiation. The proposed framework combines a novel data mining technique for clustering the time-series data with an innovative cluster selection method and a multilayer recurrent neural network (RNN) to enhance the forecast...
A simple method for extracting a semantic lexicon from the Baidu Chinese Network Encyclopedia is proposed, based on one hypothesis and three filtering rules. The acquired affective lexicon includes emotional words and their lexical semantic relations, including synonyms and antonyms. The acquisition method is a recursive algorithm that starts from seed words. The extracted affective lexicon is labeled with affective tendency...
With the advance of mobile electronic devices and the development of positioning technology, a large volume of spatio-temporal data is collected in the form of desultory data streams, which contain a lot of potential information. In this study, we focus on discovering the composition relationships between observed moving objects over a long period. Such research can be widely used in military...
With social media encompassing people in all aspects of life, the information shared over these media is becoming highly relevant. The marketing and retail industries have been using social media like Twitter and Facebook extensively to collect information and promote their products. Now, it is the healthcare industry's turn to find hidden insights from the vast data available on the web...
Good nutrition is an essential component of life. Undernutrition is the root cause of death of over 3.5 million children under the age of five in India. Although an overarching national policy to address malnutrition is desirable, it may not be effective if the root cause of malnutrition varies across regions of the country. In this context, the attempt made in this paper is two-fold. First,...
Clustering is a well-recognized data mining technique which enables the determination of underlying patterns in datasets. In electric power systems, it has traditionally been used for purposes such as defining customer load profiles, designing tariffs and improving load forecasting. Some surveys have summarized different clustering techniques which were traditionally used for customer segmentation...
With advances in technology, high volumes of a wide variety of valuable data of different veracity can be easily collected or generated at a high velocity in the current era of big data. Embedded in these big data is implicit, previously unknown and potentially useful information. Hence, fast and scalable big data science and engineering solutions that mine and discover knowledge from these big data...
The problem faced by the company is how to identify potential customers and apply Customer Relationship Management (CRM) in order to pursue the right marketing strategy and so bring benefits to the company. This research aims to perform customer clustering and profiling using the Recency, Frequency and Monetary (RFM) model to provide customer relationship management (CRM) recommendation...
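As an illustration of the RFM model named above, the sketch below derives the three RFM values per customer from a transaction list: recency (days since the last purchase), frequency (number of purchases) and monetary value (total spend). The data layout, function name and sample values are hypothetical and not taken from the paper; the clustering step applied to these features is not shown.

```python
from collections import defaultdict
from datetime import date


def rfm_table(transactions, today):
    """Compute (recency_days, frequency, monetary) per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples.
    today: reference date used to measure recency.
    """
    last_purchase = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for customer_id, purchase_date, amount in transactions:
        # Track the most recent purchase date seen for each customer.
        if customer_id not in last_purchase or purchase_date > last_purchase[customer_id]:
            last_purchase[customer_id] = purchase_date
        frequency[customer_id] += 1
        monetary[customer_id] += amount
    return {
        cid: ((today - last_purchase[cid]).days, frequency[cid], monetary[cid])
        for cid in last_purchase
    }


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        ("a", date(2020, 1, 10), 10.0),
        ("a", date(2020, 1, 26), 20.0),
        ("b", date(2020, 1, 1), 5.0),
    ]
    for customer, values in rfm_table(sample, date(2020, 1, 31)).items():
        print(customer, values)
```

The resulting three-value vectors would typically be scaled before being passed to a clustering algorithm, since the three features have very different ranges.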
Automatic multi-document summarization may help news readers retrieve information from digital news media efficiently. The summarizer creates a concise summary containing important information from a collection of articles, enabling readers to read only one text to gain information from multiple text sources. Reflecting on previous research, we propose an automatic summarization system using sentence...
The article focuses on the results of research into scientific publications in different fields from the database of the All-Russian Institute for Scientific and Technical Information of the Russian Academy of Sciences (VINITI Database RAS). The purpose of the work was to increase the accuracy of partitioning large volumes of scientific data by research direction. This analysis was carried out on summaries of scientific...