The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
In this paper, we present LSHDB, the first parallel and distributed engine for record linkage and similarity search. LSHDB materializes an abstraction layer to hide the mechanics of the Locality-Sensitive Hashing (a popular method for detecting similar items in high dimensions) which is used as the underlying similarity search engine. LSHDB creates the appropriate data structures from the input data...
Most data scientists use nowadays functional or semi-functional languages like SQL, Scala or R to treat data, obtained directly from databases. Such process requires to fetch data, process it, then store again, and such process tends to be done outside the DB, in often complex data-flows. Recently, database service providers have decided to integrate "R-as-a-Service" in their DB solutions...
Big data applications consist of very large collection of small records, for example data from a retail website, data from movie streaming services, sensor data applications and many other such applications. Frequent item set mining is one of the common tools used for all these applications to generate recommendations to improve user experience of the website. Frequent itemset mining is also used...
Probabilistic graphic model is an elegant framework to compactly present complex real-world observations by modeling uncertainty and logical flow (conditionally independent factors). In this paper, we present a probabilistic framework of neighborhood-based recommendation methods (PNBM) in which similarity is regarded as an unobserved factor. Thus, PNBM leads the estimation of user preference to maximizing...
With the increase of systems' complexity, exception detection becomes more important and difficult. For most complex systems, like cloud platform, exception detection is mainly conducted by analyzing a large amount of telemetry data collected from systems at runtime. Time series data and events data are two major types of telemetry data. Techniques of correlation analysis are important tools that...
As contemporary software-intensive systems reach increasingly large scale, it is imperative that failure detection schemes be developed to help prevent costly system downtimes. A promising direction towards the construction of such schemes is the exploitation of easily available measurements of system performance characteristics such as average number of processed requests and queue size per unit...
Two of the most popular approaches for dealing with big data are distributed computing and stream mining. In this paper, we incorporate both approaches in order to bring a competitive stream clustering algorithm, namely CluStream, into a modern framework for distributed computing, namely, Apache Spark. CluStream is one of the most popular clustering approaches for stream clustering and the one that...
Discovering and modeling lead-lag relations is a critical task in a variety of domains, including energy management, financial markets and environment monitoring. This task becomes more challenging when processing massive and highly dynamic data sources, often produced by sensors and live feeds that collect data about evolving entities in the real world. To cope with this data volume and velocity,...
This paper presents a framework for processing the data generated by Smart City sensors and IoT data streams in real-time. The scope of processing is to detect various event patterns from the raw data. The framework is extensible because at any moment new data sources can be registered or new specific event detection mechanism can be deployed. The framework offers a HTTP interface which can be used...
Detection of the changes in pattern of disease spread over a population network, Meme-tracking and opinion spread on the Twitter network and product-rating-cascade over a social network are a few among the many embodiments of graph sequence segmentation problem with labeled nodes. Most of the previous approaches to network sequence segmentation are on plain graphs without considerations for the dynamics...
In recent years, association networks and their applications have received increasing interest. The relationships in a network should ideally be ascertained without any preconceptions about the existence of a connection a priori. This would allow interpretations to be based on the underlying structure rather than on assumptions. Furthermore, a method that discounts outside influence on the relationships...
This paper focuses on the identification of overlapping communities, allowing nodes to simultaneously belong to several communities, in a decentralised way. To that aim it proposes LOCNeSs, an algorithm specially designed to run in a decentralised environment and to limit propagation, two essential characteristics to be applied in mobile networks. It is based on the exploitation of the preferential...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.