Today big data is synonymous with every business and organization, so much so that data brokers have made a business of trading this big data like any other commodity. In turn, the buyers of this big data make massive profits. The only one who loses out on both profits and privacy is the internet user, the generator and owner of this big data. Our work looks at allowing the user to monetize on his...
Nowadays mobile phone data are an actual proxy for studying the users' social life and urban dynamics. In this paper we present the Sociometer, an analytical framework aimed at classifying mobile phone users into behavioral categories by means of their call habits. The analytical process starts from spatio-temporal profiles, learns the different behaviors, and returns annotated profiles. After the...
Based on empirical studies, the feature of random initialization in Particle Swarm Optimization (PSO) based Fuzzy c-means (FCM) methods affects the computational performance especially in big data. As the data points in high-density areas are more likely near the cluster centroids, we design a new algorithm to guide the initialization according to the data density patterns. Our algorithm is initialized...
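The abstract above is truncated, so the authors' actual procedure is not shown here. As a rough illustration of the general idea only (seeding centroids in dense regions instead of at random), the following is a minimal sketch under stated assumptions: the function name `density_guided_init`, the neighbor-count density estimate, and the `radius` parameter are all hypothetical and are not taken from the paper.

```python
import math


def density_guided_init(points, k, radius):
    """Pick up to k initial centroids from dense, mutually separated regions.

    Hypothetical sketch: density is estimated as the number of other points
    within `radius`; this is an assumption, not the paper's method.
    """
    # Density of point i = count of other points within `radius` of it.
    density = [
        sum(
            1
            for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius
        )
        for i, p in enumerate(points)
    ]

    # Visit points from densest to sparsest (ties keep input order).
    order = sorted(range(len(points)), key=lambda i: -density[i])

    centroids = []
    for i in order:
        candidate = points[i]
        # Accept a candidate only if it is farther than `radius` from
        # every centroid already chosen, so seeds do not pile up in one cluster.
        if all(math.dist(candidate, c) > radius for c in centroids):
            centroids.append(candidate)
        if len(centroids) == k:
            break
    # May return fewer than k seeds if too few separated points exist.
    return centroids


points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),        # dense group near the origin
    (10.0, 10.0), (10.1, 10.0), (10.0, 10.1),  # dense group near (10, 10)
    (5.0, 5.0),                                # isolated point
]
seeds = density_guided_init(points, k=2, radius=1.0)
```

In this toy input the isolated point has no neighbors within the radius, so it is visited last and is never chosen as a seed; the two seeds come from the two dense groups. Seeds of this kind could then be handed to a clustering routine such as FCM in place of random starting centroids.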
Estimation of data veracity is recognized as one of the grand challenges of big data. Typically, the goal of truth discovery is to determine the veracity of multi-source, conflicting data and return, as outputs, a veracity label and a confidence score for each data value, along with the trustworthiness score of each source claiming it. Although a plethora of methods has been proposed, it is unlikely...
We have already proposed a graph analysis method that could shorten the analysis time by reconstructing a web graph. In our proposed method, a web graph is reconstructed for parallel distributed processing of possible graphs by clustering a web graph and reconstructing the web graph for Compression Graph and Cluster Graphs. Compression Graph represents the relationship between clusters, whereas Cluster...
Modern graphs are large, often containing billions of nodes and edges that demand a huge amount of processing for analysis purposes. The algorithms processing these graphs often run for a long time and consume a substantial amount of energy. However, not all edges in the graphs are equally important. Some edges play a critical role in maintaining the community and other interesting structures in the graph,...
In this paper we introduce a novel family of decision lists consisting of highly interpretable models which can be learned efficiently in a greedy manner. The defining property is that all rules are oriented in the same direction. Particular examples of this family are decision lists with monotonically decreasing (or increasing) probabilities. On simulated data we empirically confirm that the proposed...
A set-valued dataset contains different types of items/values per individual, for example, visited locations, purchased goods, watched movies, or search queries. As it is relatively easy to re-identify individuals in such datasets, their release poses significant privacy threats. Hence, organizations aiming to share such datasets must adhere to personal data regulations. In order to get rid of these...
In this paper, we report on an evaluation of four representative Big Data management systems (BDMSs): MongoDB, Hive, AsterixDB, and a commercial parallel shared-nothing relational database system. In terms of features, all offer to store and manage large volumes of data, and all provide some degree of query processing capabilities on top of such data. Our evaluation is based on a micro-benchmark...