The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
Cliques are topological structures that usually provide important information for understanding the structure of a graph or network. However, detecting and extracting cliques efficiently is known to be very hard. In this paper, we define and introduce the notion of a Triangle K-Core, a simpler topological structure and one that is more tractable and can moreover be used as a proxy for extracting clique-like...
Top-k queries have gained considerable attention as an effective means for narrowing down the overwhelming amount of data. This paper studies the problem of constructing an indexing structure that efficiently supports top-k queries for varying scoring functions and retrieval sizes. The existing work can be categorized into three classes: list-, layer-, and view-based approaches. This paper focuses...
In this paper, we describe a versioned database storage manager we are developing for the SciDB scientific database. The system is designed to efficiently store and retrieve array-oriented data, exposing a ``no-overwrite'' storage model in which each update creates a new ``version'' of an array. This makes it possible to perform comparisons of versions produced at different times or by different algorithms,...
Many frequent pattern mining algorithms have been developed for categorical, numerical, time series, or interval data. However, little attention has been given to integrate these algorithms so as to mine frequent patterns from multiple domain datasets for classification. In this paper, we introduce the notion of a heterogenous pattern to capture the associations among different kinds of data. We propose...
In this paper, we study efficient processing of rank joins in highly distributed systems, where servers store fragments of relations in an autonomous manner. Existing rank-join algorithms exhibit poor performance in this setting due to excessive communication costs or high latency. We propose a novel distributed rank-join framework that employs data statistics, maintained as histograms, to determine...
The abundance of Web 2.0 resources in various media formats calls for better resource integration to enrich user experience. This naturally leads to a new cross-modal resource search requirement, in which a query is a resource in one modal and the results are closely related resources in other modalities. With cross-modal search, we can better exploit existing resources. Tags associated with Web 2...
In a previous paper, we laid out the vision of a novel graph query processing paradigm where instead of processing a visual query graph after its construction, it interleaves visual query formulation and processing by exploiting the latency offered by the GUI to filter irrelevant matches and prefetch partial query results [8]. Our first attempt at implementing this vision, called GBLENDER [8], shows...
Most objects and data in the real world are interconnected, forming complex, heterogeneous but often semistructured information networks. However, many database researchers consider a database merely as a data repository that supports storage and retrieval rather than an information-rich, inter-related and multi-typed information network that supports comprehensive data analysis, whereas many network...
Forecasts are important to decision-making and risk assessment in many domains. Since current database systems do not provide integrated support for forecasting, it is usually done outside the database system by specially trained experts using forecast models. However, integrating model-based forecasting as a first-class citizen inside a DBMS speeds up the forecasting process by avoiding exporting...
A key step in validating a proposed idea or system is to evaluate over a suitable dataset. However, to this date there have been no useful tools for researchers to understand which datasets have been used for what purpose, or in what prior work. Instead, they have to manually browse through papers to find the suitable datasets and their corresponding URLs, which is laborious and inefficient. To better...
The Earth Mover's Distance, or EMD for short, has been proven to be effective for content-based image retrieval. However, due to the cubic complexity of EMD computation, it remains difficult to use EMD in applications with stringent requirement for efficiency. In this paper, we present our new system, called EUDEMON, which utilizes new techniques to support fast Online Video Frame Copy Detection based...
The studies of protein-DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) are important bioinformatics topics. High-resolution (length<10) TF-TFBS binding cores are discovered by expensive and time-consuming 3D structure experiments. Recent association rule mining approaches on low-resolution binding sequences (TF length>490) are shown promising in...
Neuroscientists increasingly use computational tools in building and simulating models of the brain. The amounts of data involved in these simulations are immense and efficiently managing this data is key. One particular problem in analyzing this data is the scalable execution of range queries on spatial models of the brain. Known indexing approaches do not perform well even on today's small models...
Complex pattern queries play a critical role in many applications that must efficiently search databases and data streams. Current techniques support the search for multiple patterns using deterministic or non-deterministic automata. In practice however, the static pattern representation does not fully utilize available system resources, subsequently suffering from poor performance. Therefore a low...
Aggregate Continuous Queries (ACQs) are both a very popular class of Continuous Queries (CQs) and also have a potentially high execution cost. As such, optimizing the processing of ACQs is imperative for Data Stream Management Systems (DSMSs) to reach their full potential in supporting (critical) monitoring applications. For multiple ACQs that vary in window specifications and pre-aggregation filters,...
Textual web pages dominate web search engines nowadays. However, there is also a striking increase of structured data on the web. Efficient keyword query processing on structured data has attracted enough attention, but effective query understanding has yet to be investigated. In this paper, we focus on the problem of keyword query reformulation in the structured data scenario. These reformulated...
The skyline of a multidimensional point set consists of the points that are not dominated by other points. In a scenario where product features are represented by multidimensional points, the skyline points may be viewed as representing competitive products. A product provider may wish to upgrade uncompetitive products to become competitive, but wants to take into account the upgrading cost. We study...
Progress indicators for SQL queries were first published in 2004 with the simultaneous and independent proposals from Chaudhuri et al. and Luo et al. In this paper, we implement both progress indicators in the same commercial RDBMS to investigate their performance. We summarize common cases in which they are both accurate and cases in which they fail to provide reliable estimates. Although there are...
Many applications process data in which there exists a ``conservation law'' between related quantities. For example, in traffic monitoring, every incoming event, such as a packet's entering a router or a car's entering an intersection, should ideally have an immediate outgoing counterpart. We propose a new class of constraints -- Conservation Rules -- that express the semantics and characterize the...
We present in this paper a novel approach for extracting structured data from the Web, whose goal is to harvest real-world items from template-based HTML pages (the structured Web). It illustrates a two-phase querying of the Web, in which an intentional description of the data that is targeted is first provided, in a flexible and widely applicable manner. The extraction process leverages then both...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.