High energy physics scientists analyze large amounts of data looking for interesting events when particles collide. These analyses are easily expressed using complex queries that filter events. We developed a cost model for aggregation operators and other functions used in such queries and show that it substantially improves performance. However, the query optimizer still produces suboptimal plans...
We propose that a scientific database should be inherently different from, say, a business database. The difference is based on the nature of science itself, in which hypotheses, or logical implications, form an essential part of the discipline. Empirical observations give rise to tentative hypotheses. Individual hypotheses are then tested, refuted or refined, by further empirical observation. In the...
We consider pedigree data structured in the form of a directed acyclic graph, and use an encoding scheme, called NodeCodes, for expediting the evaluation of queries on pedigree graph structures. Inbreeding is the quantitative measure of the genetic relationship between two individuals. The inbreeding coefficient is related to the probability that both copies of any given gene are received from the...
The database research community's appetite for new applications has led to increased interest in the data management needs of scientists. This area encompasses a huge range of applications, extending from public repositories of observational data such as the popular Sloan Digital Sky Survey to one-of-a-kind runs of simulation codes crafted by individual scientists. In this talk, we will survey the...
K-anonymity is a simple yet practical mechanism to protect privacy against attacks of re-identifying individuals by joining multiple public data sources. All existing methods achieving k-anonymity assume implicitly that the data objects to be anonymized are given once and fixed. However, in many applications, the real world data sources are dynamic. In this paper, we investigate the problem of maintaining...
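As a rough illustration of the property described above, the following minimal Python sketch checks whether a table is k-anonymous with respect to a set of quasi-identifier columns. The function name `is_k_anonymous`, the column names, and the sample records are assumptions made for illustration; this is not the maintenance method the paper investigates.

```python
from collections import Counter


def is_k_anonymous(rows, quasi_identifiers, k):
    """Return True if every quasi-identifier combination occurs at least k times.

    rows: list of dicts (hypothetical record layout)
    quasi_identifiers: column names that could be joined with public data
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical, already-generalized records.
records = [
    {"zip": "130**", "age": "20-29", "diagnosis": "flu"},
    {"zip": "130**", "age": "20-29", "diagnosis": "asthma"},
    {"zip": "148**", "age": "30-39", "diagnosis": "flu"},
    {"zip": "148**", "age": "30-39", "diagnosis": "diabetes"},
]

# A newly arriving record forms a group of size one.
new_record = {"zip": "148**", "age": "40-49", "diagnosis": "flu"}
```

Here `is_k_anonymous(records, ["zip", "age"], 2)` holds for the static table, while the same check on `records + [new_record]` fails, which is the kind of situation that arises when the data source is dynamic.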
This paper presents an effective cost model to estimate the number of disk accesses (I/O cost) and the number of distance calculations (CPU cost) to process similarity queries over data indexed by metric access methods. Two types of similarity queries were taken into consideration: range and k-nearest neighbor queries. The main point of the cost model is considering not only global parameters of the...
In high dimensional data, clusters often only exist in arbitrarily oriented subspaces of the feature space. In addition, these so-called correlation clusters may have complex relationships between each other. For example, a correlation cluster in a 1-D subspace (forming a line) may be enclosed within one or even several correlation clusters in 2-D superspaces (forming planes). In general, such relationships...
In this paper, we introduce a new data layout scheme to efficiently handle out-of-core axis-aligned slicing queries of very large multidimensional volumetric data. Slicing is a very useful dimension reduction tool that removes or reduces occlusion problems in visualizing 3D/4D volumetric data sets and that enables fast visual exploration of such data sets. We show that the data layouts based...
We discuss a new efficient out-of-core multidimensional indexing structure, information-aware 2n-tree, for indexing very large multidimensional volumetric data. Building a series of (n-1)-Dimensional indexing structures on n-Dimensional data causes a scalability problem in the situation of continually growing resolution in every dimension. However, building a single n-Dimensional indexing structure...
This paper proposes novel and effective techniques to estimate a radius to answer k-nearest neighbor queries. The first technique targets datasets where it is possible to learn the distribution about the pairwise distances between the elements, generating a global estimation that applies to the whole dataset. The second technique targets datasets where the first technique cannot be employed, generating...
Spatial networks, such as road systems, operate differently from normal geospatial systems because objects are constrained to locations on the network. Performing queries on spatial networks demands entirely different solutions. Most spatial queries make use of an R-Tree to process them efficiently. The M-Tree is a data tree index which is capable of indexing data in any metric space. The M-Tree index...
Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focuses on pre-materializing a set of skyline points in various subspaces, while the second focuses on dynamically answering the queries by using a set of anchors to prune off skyline points through spatial reasoning. Despite efforts to compress the pre-materialized subspace skylines through...
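For readers unfamiliar with the term, a subspace skyline query can be sketched in Python as below. This is a naive quadratic baseline under the assumption that smaller values are preferred in every dimension; the names `dominates` and `subspace_skyline` and the sample points are hypothetical, and the sketch implements neither the pre-materialization nor the anchor-based approach mentioned above.

```python
def dominates(a, b, dims):
    """a dominates b on dims: no worse in every dimension, strictly better in one.

    Assumes smaller values are better.
    """
    return all(a[d] <= b[d] for d in dims) and any(a[d] < b[d] for d in dims)


def subspace_skyline(points, dims):
    """Return the points not dominated by any other point on the chosen dimensions."""
    return [p for p in points if not any(dominates(q, p, dims) for q in points)]


# Hypothetical 3-D data set.
points = [(1, 9, 3), (2, 2, 8), (3, 1, 1), (4, 4, 4)]

skyline_01 = subspace_skyline(points, dims=(0, 1))
skyline_12 = subspace_skyline(points, dims=(1, 2))
```

The same data set yields different skylines in different subspaces (three points on dimensions 0 and 1, one point on dimensions 1 and 2), which is why answering such queries for arbitrary subspaces is costly without precomputation or pruning.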
This paper presents our experiences in porting the Sloan Digital Sky Survey (SDSS)/SkyServer to the state-of-the-art open source database system MonetDB/SQL. SDSS acts as a well-documented benchmark for scientific database management. We have achieved a fully functional prototype for the personal SkyServer, to be downloaded from our site. The lessons learned are 1) the column store approach of MonetDB...