The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
We introduce a novel metric space search data structure called EGNAT, which is fully dynamic and designed for secondary memory. The EGNAT is based on Brin's GNAT static index, and partitions the space according to hyperplanes. The EGNAT implements deletions using a novel technique dubbed Ghost Hyperplanes, which is of independent interest for other metric space indexes. We show experimentally that...
Metric space as a universal and versatile model of similarity can be applied in various areas of non-text information retrieval. However, a general, efficient and scalable solution for metric data management is still a resisting research challenge. We introduce a novel indexing and searching mechanism called metric index (M-Index), that employs practically all known principles of metric space partitioning,...
We propose an alternate method for indexing data for answering queries in non-metric spaces. The traditional use of distance and triangle inequality is substituted with the use of fuzzy similarity fulfilling the transitivity property with a tuneable fuzzy conjunctor. In a non-metric space it is still possible that there is a fuzzy conjunctor such that transitivity holds and usual indexing techniques...
The following topics are dealt with: similarity query search; data analysis; query-evaluation algorithm; distributed and parallel index structure and image retrieval.
We offer a theoretical validation of the curse of dimensionality in the pivot based indexing of datasets for similarity search, by proving, in the framework of statistical learning, that in high dimensions no pivot based indexing scheme can essentially outperform the linear scan. A study of the asymptotic performance of pivot based indexing schemes is performed on a sequence of datasets modeled as...
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex objects. Although promising, the metric space approach is still immature in several aspects that are well established in traditional databases. In particular, most indexing schemes are not dynamic, that...
The traditional problem of similarity search requires to find, within a set of points, those that are closer to a query point q, according to a distance function d. In this paper we introduce the novel problem of metric filtering: in this scenario, each data point xi possesses its own distance function di and the task is to find those points that are close enough, according to di, to a query point...
We present an overview of combinatorial framework for similarity search. An algorithm is combinatorial if only direct comparisons between two pairwise similarity values are allowed. Namely, the input dataset is represented by a comparison oracle that given any three points X,Y,Z answers whether Y or Z is closer to X. We assume that the similarity order of the dataset satisfies the four variations...
We consider the problem of similarity search in metric spaces with costly distance functions and large databases. There is a trade-off between the amount of information stored in the index and the reduction in the number of comparisons for solving a query. Pivot-based methods clearly outperform clustering-based ones in number of comparisons, but their space requirements are higher and this can prevent...
Sketches are compact bit string representations of objects. Objects that have the same sketch are stored in the same database bucket. By calculating the Hamming distance of the sketches, an estimation of the similarity of their respective objects can be obtained. Objects that are close to each other are expected to have sketches with small hamming distance values. This estimation helps to schedule...
We show a new metric for comparing unordered, tree-structured data. While such data is increasingly important in its own right, the methodology underlying the construction of the metric is generic and may be reused for other classes of ordered and partially ordered data. The metric is based on the information content of the two values under consideration, which is measured using Shannon's entropy...
The content-based photo image retrieval (CoPhIR) dataset is the largest available database of digital images with corresponding visual descriptors. It contains five MPEG-7 global descriptors extracted from more than 106 million images from Flickr photo-sharing system. In this paper, we analyze this dataset focusing on 1) efficiency of similarity-based indexing and searching and on 2) expressiveness...
A recent probabilistic approach for searching in high dimensional metric spaces is based on predicting the distances between database elements according to how they order their distances towards some set of distinguished elements, called permutants. In the preprocessing phase a set of permutants is chosen, and are sorted (permuted) by their distances against every database element. The permutations...
By exploiting and extending the research achievements in metric searching technology over the last ten years, MUFIN proves the expected extensibility and scalability properties of this technology. The scalability has been demonstrated by an interactive image retrieval system over 280-dimensional vectors, which is one order of magnitude higher than what most of the literature considers to be the dimensionality...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.