This paper describes our work on mining pollutant data to assess air quality in urban areas. Notable aspects of this work are that we mine social media and structured data in a domain-specific context, incorporate commonsense knowledge in mining media opinions and focus on the urban planning domain in a multicity environment. The results of mining are useful for predictive analysis in urbanization...
Pathology reports are written by pathologists, skilled physicians who know how to interpret disorders in various tissue samples from the human body. Statistics are collected to assess the outcome of disorders such as cancer and the effect of treatment. Therefore, cancer pathology reports are interpreted and coded into databases at cancer registries. In Norway, this task is carried...
Many applications in various industrial and research areas analyze large continuously evolving data. Big data analytics platforms such as MapReduce focus on distributed batch processing, and therefore, a query needs to be re-executed every time its input data evolve. In this paper, we present IncReStore, a system that incrementally computes queries on fast growing datasets by materializing query outputs...
In today's world, large volumes of medical data are being continuously generated, but their value is severely undermined by our inability to translate them into knowledge and, ultimately, actions. Data mining techniques allow the extraction of previously unknown interesting patterns from large datasets, but their complexity limits their practical diffusion. Data-driven analysis is a multi-step process,...
The Internet encompasses websites, email, social media, and Internet-based television. Given the widespread use of networked computers and mobile devices, it has become possible to monitor the behavior of Internet users by examining their access logs and queries. Based on large-scale web and text mining of Internet media articles and associated user comments, we propose a framework to rapidly monitor...
Information about an entity can hardly be assumed to be given in one single document, created in a single instance of time. Rather, it is reasonable to assume that information is spread over multiple documents and created/enriched over time—for instance through crowdsourcing facts or mined from social network streams, one after the other. In this work, we consider the problem of assembling entity-centric...
We study the use of SIMD instructions to support complex conjunctive numerical predicates. Compared to previous studies, we aim to model more realistic use scenarios, where different data types, different comparison operations, and different predicate types can be mixed in a single filtering clause. Moreover, the evaluation of the predicates on a set of columns can take advantage of multiple processor...
Person-to-person cloud service providers such as Facebook use Host-side (HsC) and Application-side (AsC) caches to enhance performance. Using Facebook's Flashcache as the representative of HsC and IQ-Twemcached as the representative of AsC, this study quantifies their tradeoffs using both a read-heavy and a write-heavy workload. Obtained results show Flashcache provides significant benefit for I/O...
Big Data is colloquially described in terms of the three Vs: Volume, Velocity, and Variety. Volume and velocity receive a disproportionate amount of research attention; however, variety is frequently cited by practitioners as the Big Data problem that “keeps them up at night”: the problem that resists direct attacks in terms of new algorithms, systems, and approaches. We find that the cloud-based...
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.
In this presentation, we survey the use of advanced hardware features for optimizing main-memory database systems in the context of our HyPer project. The access behavior of database objects from simultaneous OLTP transactions is monitored using the virtual memory management component in order to compact the database into hot and cold partitions. The cold partitions are stored in compressed data...
Conference proceedings front matter may contain various advertisements, welcome messages, committee or program information, and other miscellaneous conference information. This may in some cases also include the cover art, table of contents, copyright statements, title-page or half title-pages, blank pages, venue maps or other general information relating to the conference that was part of the original...
Twitter is a rapidly growing microblogging platform that allows its users to send and read short messages, called tweets. Because a user's timeline consists of the latest tweets of their followees (users that they are following), followee recommendation is a problem of significant importance. In this work we propose a followee recommendation approach, which takes advantage of the...
Distributed and replicated systems, such as Big Data applications, deal with conflicting and duplicated data. Therefore, a database with a flexible data model is needed to materialize the data, along with an end-user oriented interface that allows queries over heterogeneous data. Using an OBDA approach in a Graph Database is an attempt to solve these problems. However, new problems arise regarding hierarchical...