This paper presents a novel scheme for precisely extracting knowledge from the massive and complex streams of live scene data captured at crowded places. The prime contribution of the proposed system is to perform enough processing over the raw, unstructured data distributed across multiple locations so that processing over distributed storage and mining can be done with...
Trolls in social media are ‘malicious’ users who try to propagate an opinion or distort general perceptions. Identifying trolls in social media is a task of interest for many big data applications, since data cannot be analyzed effectively without eliminating such users from the crowd. In this paper, we present a solution for troll detection as well as the results of measuring terror awareness among...
Social Network Analysis (SNA) has become an important and increasingly popular research topic in recent years, especially since the emergence of Semantic Web and Big Data technologies. Social networking services such as Facebook, Google+, and Twitter provide large amounts of data that researchers can use for social network analysis. Semantic Web technology plays an important role for...
Due to growing urban populations and an increasing number of motor vehicles, traffic congestion is becoming a major problem of the 21st century. One of the main causes of traffic congestion is accidents, which can result not only in casualties and losses for those involved but also in wasted time for the other drivers stuck behind them. Early detection of an accident can save lives,...
Estimating the skills, talents, and expertise of employees is essential for human capital management in knowledge-based organizations across industries and sectors. In this paper, we describe an approach to infer the expertise of employees from their enterprise data and digital footprints. Using a novel big data workflow with components of information retrieval and search, data fusion, matrix completion,...
Provenance is information about the origin and creation of data. In data science and engineering, such information is useful and sometimes even critical. Nevertheless, provenance for big data remains under-explored due to the challenges posed by the ‘Vs’ of big data. In data analytics, users need to query history, reproduce intermediate or final results, tune models, and adjust parameters at runtime for...
The value that can be extracted from big data greatly motivates organizations to explore data analytics technologies for better decision making and problem solving in a wide range of application domains. Cloud computing greatly eases and benefits big data analytics by offering on-demand and scalable computing infrastructures, platforms, and applications as services. Big data Analytics-as-a-Service...
Multi-engine analytics has been gaining increasing attention from both academia and industry, as it can successfully cope with the heterogeneity and complexity brought forth by the plethora of frameworks, technologies and requirements. It is now common for a data analyst to combine data that resides on multiple, fully independent engines and perform complex...
In the big data era, data mining techniques and applications are becoming increasingly important across industries. Among the numerous data mining techniques, frequent pattern mining is a crucial tool. Most existing studies on frequent pattern mining use a single minimum support threshold, which is unrealistic for many real-world datasets. Although there has been extensive research on support...
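To make the multiple-minimum-support idea concrete, here is a minimal brute-force sketch. It assumes the MS-Apriori convention, in which an itemset's threshold is the lowest minimum item support (MIS) among its items; the function name `frequent_itemsets_mis` is hypothetical, and this is not the method proposed in the abstract. Because the downward-closure property that Apriori relies on does not hold in general under multiple thresholds, the sketch simply enumerates every itemset, which is only feasible for very small item sets.

```python
from itertools import combinations


def frequent_itemsets_mis(transactions, mis):
    """Brute-force frequent itemset mining with multiple minimum supports.

    transactions: list of sets of items.
    mis: dict mapping each item to its minimum item support (a fraction in [0, 1]).
    An itemset is frequent if its support is at least the smallest MIS value
    among its items (MS-Apriori convention, assumed here).
    Returns {itemset tuple (items in sorted order): support}.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    result = {}
    for k in range(1, len(items) + 1):
        for itemset in combinations(items, k):
            s = set(itemset)
            # Support = fraction of transactions containing every item of the itemset.
            support = sum(1 for t in transactions if s <= t) / n
            if support >= min(mis[i] for i in itemset):
                result[itemset] = support
    return result
```

For example, with transactions `[{"bread", "milk"}, {"bread", "caviar"}, {"milk"}, {"bread", "milk"}]`, a single 0.5 threshold would discard the rare item `caviar` (support 0.25); giving it its own MIS of 0.2 lets both `caviar` and `{bread, caviar}` be reported.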
Rating score prediction, which predicts users' rating scores on items from user-item interaction information, is widely studied in recommender systems. Besides the rating information between users and items, much additional information has been employed to improve recommendations, such as social relations and geographic location. Expenditure information on each transaction...
Recommender systems play a key role in personalizing service experiences by recommending relevant items to users. One popular technique for producing such personalization at scale is collaborative filtering via Matrix Factorization (MF). The essence of MF is to train a model by factorizing a sparse rating matrix consisting of users' ratings of items. Unfortunately, existing MF methods require model...
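A minimal sketch of MF trained with stochastic gradient descent, in pure Python: it learns a k-dimensional factor vector per user and per item from the observed entries only, and predicts a missing rating as the dot product of the two vectors. The hyperparameters (`k`, `lr`, `reg`, `epochs`) and the function names are illustrative assumptions, and the sketch does not address the limitation of existing MF methods that the abstract goes on to describe.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates each observed rating.

    ratings: list of (user_index, item_index, rating) triples; unobserved cells are simply absent.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating of user u for item i: dot product of their factor vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))
```

After training on the observed triples, `predict(P, Q, u, i)` can be called for any (user, item) pair, including cells missing from the sparse matrix.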
In this paper, we propose an online multi-view clustering algorithm, OMVC, which deals with large-scale incomplete views. We model the multi-view clustering problem as a joint weighted NMF problem and process the multi-view data chunk by chunk to reduce the memory requirement. OMVC learns the latent feature matrices for all the views and pushes them towards a consensus. We further increase the robustness...
In this paper, we study a 3-hop approach to distance estimation that uses two intermediate landmarks, where each landmark only stores distances to vertices in its local neighborhood and to the other landmarks. We show how to suitably represent and compress the distance data stored for each landmark, for the 2-hop and 3-hop case. Overall, we find that 3-hop methods achieve modest but promising improvement...
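The landmark idea can be illustrated with a small sketch on an unweighted graph. The following are assumptions, not details from the paper: a landmark's "local neighborhood" is every vertex within `radius` BFS hops, distances are stored uncompressed, and the helper names are hypothetical. Because stored values are true shortest-path distances, both the 2-hop estimate d(u, l) + d(l, v) and the 3-hop estimate d(u, l1) + d(l1, l2) + d(l2, v) are upper bounds on the true distance; the 3-hop form can still answer a query when no single landmark stores both endpoints.

```python
from collections import deque
import math


def bfs(graph, source):
    """Unweighted single-source shortest-path distances (hop counts)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in graph[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist


def build_index(graph, landmarks, radius):
    """Per landmark: distances to vertices within `radius` hops plus to every other landmark."""
    index = {}
    for l in landmarks:
        full = bfs(graph, l)
        index[l] = {v: d for v, d in full.items() if d <= radius or v in landmarks}
    return index


def estimate_2hop(index, u, v):
    """Minimum over landmarks l storing both u and v of d(u, l) + d(l, v)."""
    best = math.inf
    for table in index.values():
        if u in table and v in table:
            best = min(best, table[u] + table[v])
    return best


def estimate_3hop(index, u, v):
    """Minimum over landmark pairs (l1, l2) of d(u, l1) + d(l1, l2) + d(l2, v)."""
    best = math.inf
    for t1 in index.values():
        if u not in t1:
            continue
        for l2, t2 in index.items():
            if v in t2 and l2 in t1:
                best = min(best, t1[u] + t1[l2] + t2[v])
    return best
```

On a 10-vertex path 0-1-...-9 with landmarks 2 and 7 and `radius=2`, no landmark stores both 0 and 9, so the 2-hop estimate is infinite, while the 3-hop route 0 → 2 → 7 → 9 yields 9, the exact distance.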
Demand-prediction-based resource provisioning schemes help assure service level objectives (SLOs) in cloud systems. We observe that a provisioning scheme that does not exclude bursts from historical resource demands when predicting normal demand, or that always uses a large padding to correct under-prediction, leads to resource over-provisioning and low resource utilization. To improve the previous schemes,...
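The effect of excluding bursts can be sketched as follows. Everything here is a hypothetical illustration rather than the paper's scheme: a sample is treated as a burst if it exceeds the history's mean plus `k` population standard deviations, normal demand is predicted as the mean of the remaining samples, and a fixed `padding` fraction is added on top.

```python
import statistics


def provision(history, k=2.0, padding=0.1):
    """Predict next-interval demand from history, excluding bursts, then add a padding margin.

    Burst rule (hypothetical): a sample above mean + k * population stdev is a burst.
    The prediction is the mean of the non-burst samples, scaled by (1 + padding).
    """
    mean = statistics.mean(history)
    std = statistics.pstdev(history)
    # Keep only "normal" samples; the maximum can exceed the cutoff only if some samples differ.
    normal = [x for x in history if x <= mean + k * std]
    base = statistics.mean(normal)
    return base * (1 + padding)
```

For the history `[10, 11, 9, 10, 10, 50, 10, 11, 9, 10]`, the spike of 50 is excluded and the padded prediction is about 11, whereas padding the plain mean (14) would provision about 15.4.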
Big data is currently a hot research topic, with four million hits on Google Scholar in October 2016. One reason for the popularity of big data research is the knowledge that can be extracted from analyzing these large data sets. However, data can contain sensitive information and must therefore be sufficiently protected as it is stored and processed. Furthermore, it might also be required to...
Bayesian networks are probabilistic graphical models often used in big data analytics. The problem of Bayesian network exact structure learning is to find a network structure that is optimal under certain scoring criteria. The problem is known to be NP-hard and the existing methods are both computationally and memory intensive. In this paper, we introduce a new approach for exact structure learning...
The K-Means algorithm is one of the most popular methods for flat clustering, but its similarity calculations are time-consuming on big data, which lowers its performance in practice. Previous studies proposed improvements for finding better initial centroids to facilitate effective assignment of the data points to suitable clusters with reduced time complexity. However, in vector space representation,...
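For reference, the baseline that such improvements target is Lloyd's k-means. The minimal sketch below (random initial centroids, Euclidean distance, function name hypothetical) is not the paper's method; it shows where the cost mentioned above comes from, since every iteration computes a distance from each of the n points to each of the k centroids.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: n * k distance computations per iteration.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels
```

Better initial centroids, as in the previous studies mentioned above, mainly reduce the number of iterations; the per-iteration n-by-k distance cost remains.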
This paper provides a comprehensive analysis of skeleton decomposition used for segmentation of data $W = [w_1 \cdots w_N] \subset \mathbb{R}^d$ drawn from a union $U = \bigcup_{i=1}^{M} S_i$ of linearly independent subspaces $\{S_i\}_{i=1}^{M}$ of dimensions $\{d_i\}_{i=1}^{M}$. Our previous work developed a generalized theoretical framework for computing similarity matrices by matrix factorization. Skeleton decomposition is a special case of this general...
We have implemented an updated Hierarchical Triangular Mesh (HTM) as the basis for a unified data model and an indexing scheme for geoscience data to address the variety challenge of Big Earth Data. In the absence of variety, the volume challenge of Big Data is relatively easily addressable with parallel processing. The more important challenge in achieving optimal value with a Big Data solution for...
Truth Finding is the problem of determining correct information from several conflicting sources and is required for data aggregation. Existing algorithms solve the problem by simultaneously estimating source qualities and fact confidences, working on either numeric or non-numeric data. However, in practice, datasets are a mixture of several different data types. In this work we present a unified...
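The idea of jointly estimating source quality and fact confidence can be sketched with a simple iterative scheme. This is a generic, hypothetical illustration, not the unified algorithm from the abstract: it handles only categorical values compared by exact match, starts every source at a trust of 0.8, scores each candidate value by its trust-weighted share of the votes for that object, and sets each source's trust to the mean confidence of its claims.

```python
from collections import defaultdict


def truth_find(claims, iters=20):
    """claims: list of (source, object, value). Returns ({object: best value}, {source: trust})."""
    if iters < 1:
        raise ValueError("iters must be at least 1")
    trust = {s: 0.8 for s, _, _ in claims}  # hypothetical uniform prior trust
    for _ in range(iters):
        # Fact confidence: trust-weighted votes, normalised per object.
        votes = defaultdict(lambda: defaultdict(float))
        for s, o, v in claims:
            votes[o][v] += trust[s]
        conf = {}
        for o, vals in votes.items():
            total = sum(vals.values())
            for v, w in vals.items():
                conf[(o, v)] = w / total
        # Source trust: mean confidence of the facts the source asserts.
        per_source = defaultdict(list)
        for s, o, v in claims:
            per_source[s].append(conf[(o, v)])
        trust = {s: sum(c) / len(c) for s, c in per_source.items()}
    # For each object, pick the value with the highest final confidence.
    truth = {}
    for (o, v), c in conf.items():
        if o not in truth or c > conf[(o, truth[o])]:
            truth[o] = v
    return truth, trust
```

On a toy input where sources A and B agree and source C disagrees, a value backed by A alone beats C's competing value once the first trust update has raised A's trust above C's.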