With the rapid growth of online content consumption, knowing end-users and having actionable content insights has become extremely important for any online content provider. Insights from user segment identification could help in developing content recommendations as well as in acquiring new content. For advertisers, identifying segments could assist in designing ad campaigns with greater target accuracy...
Chronic wounds present a significant risk to the patient and a substantial drain on health budgets, with the problem likely to worsen markedly with increased incidence of type II diabetes. The wound fluid microbiome is known to influence wound healing outcomes, but is poorly characterised. Next Generation Sequencing approaches yield abundant data from wound samples, but progress in understanding these...
This paper compares the performance and stability of two Big Data processing tools: Apache Spark and the High Performance Analytics Toolkit (HPAT). The comparison was performed using two applications: a unidimensional vector sum and the K-means clustering algorithm. The experiments were performed in distributed and shared memory environments with different numbers and configurations of virtual...
Using solely the information retrieved by audio fingerprinting techniques, we propose methods to treat a possibly large dataset of user-generated audio content that (1) enable the grouping of several audio files that contain a common audio excerpt (i.e. relate to the same event), and (2) give information about how those files are correlated in terms of time and quality inside each event. Furthermore,...
In this work, we analyze the usefulness of the normalized compression distance (NCD) as a similarity measure for bird species identification through audio samples. As a first approach we review the effect of different compression methods from 7z and the CompLearn Toolkit on subsets of bird audio samples obtained from the xeno-canto database. The performance of each compression method was measured applying...
Learning Management Systems such as the Modular Object-Oriented Dynamic Learning Environment (Moodle) only support random group assignment or instructor-based assignment. However, random assignment only increases the likelihood of heterogeneity in the group, while the instructor-based method depends on the instructor and is not dynamic, so there is a need to develop...
The paper presents the results of research on the practical implementation of the DBSCAN clustering algorithm within the framework of the objective clustering inductive technology. The experimental data were the Aggregation and Compound datasets of the School of Computing of the University of Eastern Finland and the gene expression sequences of lung cancer patients from the ArrayExpress database. The architecture of...
This paper proposes a methodology for finding typical load profiles for residential customers by using clustering techniques. Such a task is particularly challenging due to the great diversity of electricity use by residential customers. Specific characteristics of this kind of customer, such as the number of inhabitants or the house surface area, may help the clustering, but such features are often, maybe always, unknowable...
This paper discusses the similarities and differences in both the ideology expressed and the practices employed by two terrorist groups that operated in Greece between 1975 and 2017: Revolutionary Organization 17 November and Conspiracy of Fire Nuclei. Within this line of thought, we will briefly provide an outline of the political and ideological framework of the groups in focus in an effort...
Due to the emerging Big Data paradigm, traditional data management techniques prove inadequate in many real-life scenarios. In particular, the availability of huge amounts of data pertaining to social interactions among users calls for advanced analysis strategies. Furthermore, the heterogeneity and high speed of this data require suitable data storage and management tools to be designed from scratch...
In many applications, such as data integration and big data analytics, one has to integrate data from multiple sources without detailed and accurate schema information. The state of the art focuses on matching attributes among sources based on the information derived from the data in those sources. However, a best join result according to a method's own pre-determined criteria may not fit a user's...
WiFi Fingerprint Positioning (WFP) in outdoor scenarios needs a large amount of location information, including a WiFi signal map and GPS (Global Positioning System) information. Generally, a pre-measured solution can provide high-quality data, but it requires a lot of labor and time. Unlike the pre-measured solution, crowdsourcing is an economical and efficient way to obtain location information. WFP based on Clustering...
The increase of the quantity of user-generated content experienced in social media has boosted the importance of analysing and organising the content by its quality. Here, we propose a method that uses audio fingerprinting to organise and infer the quality of user-generated audio content. The proposed method detects the overlapping segments between different audio clips to organise and cluster the...
In order to extract useful information from massive data, researchers have proposed data mining technology, one of the most critical parts of which is clustering analysis. In this paper, an improved clustering algorithm based on shared nearest neighbors is proposed as an improvement on the existing shared clustering algorithm, and the improved algorithm is applied to fingerprint localization. The algorithm reduces...
The penicillin fermentation process is characterised by time variation, non-linearity and uncertainty, and an accurate mechanism model of it is quite difficult to establish. In order to establish a rapid and accurate model to describe the characteristics of the penicillin fermentation process, a local modeling method based on the Just-In-Time algorithm is proposed for product concentration prediction during penicillin...
Cluster analysis aims at classifying data elements into different categories according to their similarity. It is a common task in data mining and useful in various fields including pattern recognition, machine learning, information retrieval and so on. As it is an extensively studied area, many clustering methods have been proposed in the literature. Among them, some methods are focused on mining clusters with arbitrary...
Designing a Chinese-Uighur-English online dictionary is very important for the development of ethnic scientific research and education, and is the basis for the work of Uighur semantic study. An online dictionary with a huge thesaurus can be implemented with a knowledge graph, which existing works have not addressed in much detail. In order to design the thesaurus of the online dictionary, this paper is based...
This paper introduces the techniques of data cleaning and data extraction. The current state of domestic and international research in these two areas is reviewed and their future development considered. The following concepts are all explained: the basic principle of data cleaning, the framework models, the need for and the objectives of data cleaning, the testing method and the cleaning...
This paper proposes an attack pattern mining algorithm to extract attack patterns from massive security logs. The improved fuzzy clustering algorithm is used to generate a sequence set. Then PrefixSpan is used to mine frequent sequences from the sequence set. The experimental results show that this algorithm can effectively mine attack patterns, improve the accuracy and generate more valuable attack...
There is no previous research that compares the results of k-means, CLOPE clustering and Latent Dirichlet Allocation (LDA) topic modeling algorithms for detecting trending topics on tweets. Since not all tweets contain hashtags, we considered three training data feature sets: hashtags, keywords and keywords + hashtags in this study. Our proposed methodology proved that CLOPE can also be used in a...