Hand posture recognition (HPR) plays an important role in human-computer interaction (HCI) since it is one of the most common and natural ways of communication among human beings. Different fingers often represent different meanings, which is attracting increasing attention in HPR research. Based on finger geometric features and their classification, we develop an HPR system that can tell its posture on possible...
Music listeners often mishear the lyrics to unfamiliar songs heard from public sources. Speech recognition and acoustic distance can be used to solve this problem. This paper proposes a robust lyric search method for music information retrieval. First, a weighted syllable confusion matrix (WSCM) is derived from a confusion network. Then we apply acoustic distance, which is computed based on a WSCM...
An automatic text recognizer must first localize the text in the image as accurately as possible. For this purpose, we present in this paper a robust method for text detection. It is composed of three main stages: a segmentation stage to find character candidates, a connected component analysis based on fast-to-compute but robust features to accept characters and discard non-text...
Reading text from scene images is a challenging problem that is receiving much attention, especially since the appearance of imaging devices in low-cost consumer products like mobile phones. This paper presents an easy and fast method to recognize individual characters in images of natural scenes that is applied after an algorithm that robustly locates text on such images. The recognition is based...
We propose a new component-tree based method with efficient and effective pruning strategies for user-intention-guided text extraction from scene images. A grayscale image is represented first as two component-trees, whose nodes represent possible candidates of character components. The non-text candidates are then pruned by using contrast, geometric and text line information as well as the constraint...
Symbol retrieval for technical documents remains an open challenge in the document analysis community. In this paper we propose another way to spot symbols. We define a pixel-based template operator that is an adaptation of the hit-or-miss transform. This operator is robust to translation, rotation and reflection. Experimental results on a real application show the efficiency of our approach.
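As an illustration of the idea behind the abstract above, the following is a minimal sketch of a classical binary hit-or-miss match that is tried under all rotations by 90 degrees and their mirror images. It is not the authors' operator, whose definition is not given here; the function names (`hit_or_miss`, `variants`, `spot_symbol`) and the restriction to right-angle rotations are assumptions made for this example.

```python
def hit_or_miss(image, template):
    """Return top-left (row, col) positions where the template matches exactly.

    Template value 1 must cover foreground, 0 must cover background,
    and None marks a "don't care" pixel. Sliding the window over every
    position is what gives translation invariance.
    """
    H, W = len(image), len(image[0])
    h, w = len(template), len(template[0])
    matches = []
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            if all(
                template[i][j] is None or image[r + i][c + j] == template[i][j]
                for i in range(h)
                for j in range(w)
            ):
                matches.append((r, c))
    return matches


def rotate90(t):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*t[::-1])]


def mirror(t):
    """Reflect a 2D list left to right."""
    return [row[::-1] for row in t]


def variants(template):
    """All distinct right-angle rotations and reflections of the template."""
    seen, out = set(), []
    for base in (template, mirror(template)):
        t = base
        for _ in range(4):
            key = tuple(map(tuple, t))
            if key not in seen:
                seen.add(key)
                out.append(t)
            t = rotate90(t)
    return out


def spot_symbol(image, template):
    """Positions where any rotated or mirrored variant of the template matches."""
    found = set()
    for t in variants(template):
        found.update(hit_or_miss(image, t))
    return sorted(found)


# Hypothetical toy data: an L-shaped symbol appears twice, once rotated.
image = [
    [1, 0, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
]
template = [
    [1, 0],
    [1, 1],
]
print(spot_symbol(image, template))
```

Matching the plain template alone would find only the first occurrence; enumerating the variants is one simple way to obtain the rotation and reflection robustness the abstract mentions, at the cost of up to eight passes over the image.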
In this paper, we propose a novel text detection approach based on stroke width. Firstly, a unique contrast-enhanced Maximally Stable Extremal Region (MSER) algorithm is designed to extract character candidates. Secondly, simple geometric constraints are applied to remove non-text regions. Then, by integrating stroke width generated from skeletons of those candidates, we reject the remaining false positives...
The Kanungo noise model is widely used to test the robustness of different binary document image analysis methods towards noise. This model only works with binary images while most document images are in grayscale. Because binarizing a document image might degrade its contents and lead to a loss of information, more and more researchers are currently focusing on segmentation-free methods (Angelika et...
In this paper, we propose a novel method for extracting a set of baseline-independent features, which are based on the combination of global and local information. An HMM-based recognition system is developed with 161 models that include a space model and a blank model. All of the models are trained using the standard Baum-Welch Algorithm with the state-tying technique, and are then decoded using the...
Text localization in natural scene images is an important prerequisite for many content-based image analysis tasks. In this paper, we propose a novel and effective approach to accurately localize scene text. Firstly, Maximally Stable Extremal Regions (MSER) are extracted as letter candidates. Secondly, after elimination of non-letter candidates by using geometric information, candidate regions are...
In this paper we present a novel method for robust stereo matching on document image pairs. The matching itself is performed using an affine-invariant similarity measurement to compensate for perspective distortions, where affine invariance is achieved by normalization using second-order statistics, to finally allow a simple pixel-wise comparison. To handle the inherent high self-similarity of the...
In this paper, we present a new locally adaptive region detector called Bilateral kernel-based Region Detector (BIRD). It detects stable regions in images by consecutively computing a multiscale decomposition based on the bilateral kernel. The BIRD regards a region as covariant if it exhibits predictability in its photometric distance over spatial distance. Distinctiveness...
Intuitive and easily interpretable performance measures, repeatability and matching performance, for local feature detectors and descriptors were introduced by Mikolajczyk et al. [10, 9]. They, however, measured performance in a wide baseline setting that does not correspond to the visual object categorisation problem which is a popular application of the detectors and descriptors. The limitation...
This paper proposes a new cost construction method with a multiscale Weber (MSW) descriptor and weighted linear regression for robust stereo matching in a two-layer hierarchical structure. Firstly, the MSW descriptors extracted from stereo pairs are used to combine raw matching costs to reduce the disparity search range. Secondly, the indispensable matching costs on the subsets of disparity candidates...
In this paper, we focus on a challenging pattern recognition problem of significant industrial impact: classifying vehicles from their rear videos as observed by a camera mounted on top of a highway with vehicles traveling at high speed. To solve this problem, we present a novel feature called structural signatures. From a rear view video, a structural signature recovers the vehicle side profile information...
Local Binary Descriptors (LBDs) are good at matching image parts, but how much information is actually carried? Surprisingly, this question is usually ignored and replaced by a comparison of matching performances. In this paper, we directly address it by trying to reconstruct plausible images from different LBDs such as BRIEF [4] and FREAK [1]. Using an inverse problem framework, we show that this...
Short message service (SMS) is now an indispensable means of social communication. However, mobile spam is becoming increasingly serious, disrupting users' daily lives and degrading service quality. We propose a novel approach for spam message detection based on mining the underlying social network of SMS activities. Compared with strategies based on keywords or flow detection, our network-based approach...
We propose a novel local feature descriptor, Local Gaussian Directional Pattern (LGDP), for face recognition. LGDP encodes the directional information of the face's textures (i.e., the texture's structure) in a compact way, producing a more discriminating code than other methods. The structure of each micro-pattern is computed by using a derivative-Gaussian compass mask, and encoded by using its prominent...
There are cases where surveillance video must be transmitted in the clear, that is, without encryption, such as when multiple public safety agencies use the same video feeds. To ensure trust in the received video, it must be authenticated. While lossless video is straightforward to authenticate by cryptographic means, lossy video, as may result from UDP, wireless, or transcoded transmissions, is more...
We extend the PCT (Pseudo Census Transform)-based appearance model [3] to a ranking-based appearance model for face alignment. The PCT-based weak ranking function is learned using RankSVM, and the ranking appearance model (RAM) is constructed in a boosting manner. Experiments show that the PCT-based RAM is more robust and generalizes better than the PCT-based boosted appearance model (BAM). The PCT-RAM...