In the emerging field of big data, a large volume of data has to be managed, and operating on data of huge volume becomes easier when it is sorted and structured. The data can be structured using a simple algorithm, i.e., an index algorithm, which stores and categorizes data on the basis of their application. This in turn will be very beneficial at the business level as well as at the software level.
Landsat data have high resolution and a wide spectrum, and have been widely used to extract river networks. Based on feature analysis and preprocessing of Landsat-5 Thematic Mapper (TM) data, in this paper we studied and implemented the extraction of the river network in Hunan Province, and analyzed and processed the extraction results. The main research works are as...
This study explores the application of artificial intelligence on the causal relationship between mining production index and electricity load. The data used is the total mining production index and total electricity consumption in the mining sector sampled on a monthly basis from January 1985 to December 2011 in South Africa. Optimally-pruned and basic extreme learning machines were used to develop...
One of the main challenges in data mining is fuzzy clustering of time series in real-world applications. The reason may lie in the characteristics of time-series data, which include high dimensionality, large volume, and temporal ordering. So far, many studies have been performed on issues such as addressing the high dimensionality of time-series data and applying a different effect of each dimension...
Hemoglobin (HB) is an important protein molecule in blood. Traditionally, hemoglobin level measurement is performed invasively [1] in clinics. Non-invasive techniques are becoming more popular nowadays, though their accuracy is not up to the mark. The conventional invasive approaches need a blood sample from the patient, which is always quite inappropriate for premature infants, aging...
Order-preserving pattern matching was first studied surprisingly recently but has already attracted much attention. For this problem we propose a space-efficient index that works well in practice despite its lack of good worst-case time bounds. Our solution is based on the new approach of decomposing the indexed sequence into an order component, containing ordering information, and a δ component, containing...
In order to explore the inherent laws of serious collisions and provide a reference for determining the result of accidents, this paper proposed a new model of vessel collision analysis and prediction based on data mining. After collecting complete vessel collision accident reports, indexes of the severity of vessel collision were extracted and quantified. By using the method of factor analysis, the...
An improved canonical correlation analysis (CCA) approach for multi-subject blind source separation (BSS) of brain functional magnetic resonance imaging (fMRI) data is proposed. Group-level comparison analysis has attracted increasing interest in the human brain fMRI analysis. Canonical correlation analysis for blind source separation (BSS-CCA) relies on the fact that all meaningful real signals are...
The success of Empirical Mode Decomposition (EMD) resides in its practical approach to dissecting non-stationary data. EMD repeatedly goes through the entire data span to iteratively extract Intrinsic Mode Functions (IMFs). This approach, however, is not suitable for data streams, as the entire data set has to be reconsidered every time a new point is added. To overcome this, we propose Online EMD, an...
In this paper, we address the issue of speech polarity detection using strength of impulse-like excitation around epoch. The correct detection of speech polarity is a crucial step for many speech processing algorithms to extract suitable information. Occurrence of errors in the detection of speech polarity could have an impact on the performance of speech systems. Automatic detection of speech polarity...
Magnitude-only resting-state fMRI data have been largely investigated via independent component analysis (ICA) for extracting spatial maps (SMs) and time courses. However, the native complex-valued fMRI data have rarely been studied. Motivated by the significant improvements achieved by ICA of complex-valued task fMRI data over magnitude-only task fMRI data, we present an efficient method for de-noising...
Nonparametric Bayesian models have been implemented in dictionary learning. However, for signal samples from multiple subspaces, existing methods only learn one uniform dictionary and thus are not optimal for representing the subspace structures. To address this issue, we first utilize a combination of Dirichlet process and hierarchical Beta process as priors to infer the latent subspace number and...
Data mining extracts previously unknown knowledge from huge amounts of stored operational data of organizations, which can be used for managerial decision making. The datasets are mostly high dimensional due to the advancements in information and communication technologies. Feature selection is an important dimensionality reduction technique to manage the “curse of dimensionality”. The subset of features...
Documents contain various types of information, and monetary information is one such type. In the sentence “He borrowed ten dollars from me”, the expression ‘ten dollars’ conveys important information. When it is normalized into a specific form (e.g., 10 USD), it can be used to develop various applications, such as question-answering (QA) systems or dialog systems. In this paper, we propose an annotation...
Spatial co-location pattern mining and spatio-temporal co-occurrence pattern mining are important directions of spatial data mining. However, the existing relevant mining algorithms are computationally expensive and cannot effectively deal with uncertain data, which are widespread in many areas. Fast co-occurrence data mining algorithms for uncertain data are proposed by using...
Over time, information related to geography and volunteered geography also changes. As a result, the extraction of spatial patterns from crowdsourced data has become highly valuable for service suppliers. These patterns represent the spatial features of the co-related objects. The existing approaches used Dijkstra's algorithm and Euclidean distance to find spatial patterns, which cannot compute...
MicroRNAs form a family of single-stranded RNA molecules with a length of approximately 22 nucleotides that are present in all animals and plants. Various studies have revealed that microRNAs tend to cluster on chromosomes. In this regard, a novel clustering algorithm is presented in this paper, integrating the rough hypercuboid approach with fuzzy c-means. Using the concept of rough hypercuboid equivalence...
The association rules mining process enables the end users to analyze, understand, and use the extracted knowledge in an intelligent system or to support the decision-making processes. To find valuable association rules from a large number of redundant rules, this paper proposes a deeper mining process, multi-mode and high value association rules mining (MH-ARM). This method takes into account the...
Spectral clustering is able to extract clusters with various characteristics without a parametric model; however, it is infeasible for large datasets due to its high computational cost and memory requirements. Approximate spectral clustering (ASC) addresses this challenge with a representative-based partitioning approach, which first finds a set of data representatives either by sampling or quantization,...
Bayesian nonparametric (BNP) models have recently become popular due to their flexibility in identifying the unknown number of clusters. However, they have difficulties handling heterogeneous data from multiple sources. Existing BNP methods either treat each of these sources independently, and hence do not benefit from the correlating information between them, or require explicitly specifying data...