Text encoding is considered the most functional starting point for storing and retrieving data, with trees of information and lists of concordances as its first immediate results, but a wide range of further results opens up once a complete encoding process is accomplished. The three case studies described in this paper are meant to give an overall view of the preliminary steps of a wider project...
Based on the characteristics of Weibo events, this paper analyzes the advantages and disadvantages of the traditional K-means algorithm and proposes a K-means event clustering algorithm based on variable time granularity. The experiments show that the improved algorithm is better suited to clustering analysis of Weibo events, improves the efficiency of clustering, and solves the...
Data file layout inference refers to the problem of identifying the organizational characteristics associated with a structured text file, where every record in a text file shares the same structural properties. These properties include: character encoding, record length, field length (indicated by delimiting characters or fixed length), field position, and field semantic content. Within this paper,...
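One of the properties listed above, the delimiting character, can be illustrated with a simple consistency heuristic: because every record shares the same structure, a true field delimiter should occur the same nonzero number of times in every record. The sketch below is a minimal illustration under that assumption only; it is not the method of this paper, and the candidate set and the function name `infer_delimiter` are hypothetical.

```python
from typing import Iterable, Optional, Sequence

# Hypothetical candidate delimiters; a real system would likely consider more.
CANDIDATES: Sequence[str] = (",", "\t", ";", "|")


def infer_delimiter(records: Iterable[str],
                    candidates: Sequence[str] = CANDIDATES) -> Optional[str]:
    """Return the candidate that occurs a constant, nonzero number of times
    in every non-blank record, or None if no candidate qualifies.

    Among qualifying candidates, the one with the highest per-record count
    (i.e. the most fields) wins; on equal counts the earlier candidate wins.
    Quoted fields that contain the delimiter are NOT handled here.
    """
    lines = [r for r in records if r.strip()]
    if not lines:
        return None
    best: Optional[str] = None
    best_count = 0
    for delim in candidates:
        # Distinct occurrence counts of this candidate across all records.
        counts = {line.count(delim) for line in lines}
        if len(counts) == 1:
            count = counts.pop()
            if count > best_count:
                best, best_count = delim, count
    return best


if __name__ == "__main__":
    sample = ["id|name|city", "1|Ann|Oslo", "2|Bob|Rome"]
    print(infer_delimiter(sample))  # the pipe occurs twice in every record
```

For practical use, Python's standard library offers `csv.Sniffer`, whose `sniff()` method guesses a dialect (including the delimiter) from a text sample and also accounts for quoting; the hand-written version above only makes the consistency idea explicit.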
Ancient texts represent a primary source for research in the classics. A substantial body of digital material that enriches these texts has evolved. Unfortunately, these data are often distributed across myriad locations, stored in diverse and incompatible formats, and are either not available online or made available only in isolation. This paper describes an investigation into using linked data principles...
We propose an XML representation of C source code to support the development of CASE tools. Since source code is a main artifact of software development, most CASE tools include source-code-related features such as editing, static analysis, and profiling. Developing such tools requires detailed information about the source code. However, it is quite difficult to reuse program analysis features because they do...
Despite the proliferation of work on XML keyword queries, supporting keyword queries over probabilistic XML data remains an open problem. Compared with traditional keyword search, answering a keyword query over probabilistic XML data is far more expensive because possible-world semantics must be taken into account. In this paper, we first define the new problem of studying top-k keyword search over probabilistic...
Opinion identification is a set of techniques within natural language processing, particularly in the field of information retrieval. It consists of developing systems able to extract and explore the opinions present in corpora. The large volume of Arabic newspaper text available in electronic format requires a particular exploration technique. We intend to present in this...
Geography Markup Language (GML) has become a de facto international encoding standard for exchanging geospatial data among heterogeneous Geographic Information Systems (GIS). However, structurally redundant tags and textual data representation usually inflate the size of GML documents substantially, which makes storage and transport costly. In this paper, we propose an effective compression approach...
Keyword search is considered to be an effective information discovery method for both structured and semi-structured data. In XML keyword search, query semantics is based on the concept of Lowest Common Ancestor (LCA). However, naive LCA-based semantics leads to exponential computation and result size. In the literature, LCA-based semantic variants (e.g., ELCA and SLCA) were proposed, which define...
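To make the SLCA variant mentioned above concrete: a node is a Smallest LCA (SLCA) if its subtree contains every query keyword and none of its descendants' subtrees also does. The sketch below is a minimal, brute-force illustration of that definition, not the algorithm of this paper. It assumes, as an illustration, that XML nodes are identified by Dewey labels (tuples where every prefix is an ancestor), and the function names `ancestors_or_self` and `slca` are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

Dewey = Tuple[int, ...]  # assumed node labeling: (1, 2, 1) is a child of (1, 2)


def ancestors_or_self(label: Dewey) -> Set[Dewey]:
    # Every nonempty prefix of a Dewey label identifies an ancestor of the node.
    return {label[:i] for i in range(1, len(label) + 1)}


def slca(matches: Dict[str, List[Dewey]]) -> List[Dewey]:
    """Brute-force SLCA: nodes whose subtree contains all keywords and that
    have no proper descendant with the same property."""
    covering: Optional[Set[Dewey]] = None
    for labels in matches.values():
        # Nodes whose subtree contains at least one match for this keyword.
        nodes: Set[Dewey] = set()
        for label in labels:
            nodes |= ancestors_or_self(label)
        covering = nodes if covering is None else covering & nodes
    if not covering:
        return []
    result = []
    for node in covering:
        # Keep the node only if no other covering node lies strictly below it.
        if not any(other != node and other[:len(node)] == node
                   for other in covering):
            result.append(node)
    return sorted(result)


if __name__ == "__main__":
    query = {"xml": [(1, 1, 1), (1, 2, 1)], "search": [(1, 1, 2), (1, 3)]}
    print(slca(query))  # [(1, 1)]
```

In this example the root `(1,)` contains both keywords but is excluded, because its descendant `(1, 1)` already does. Under ELCA semantics the root would additionally qualify, since after removing the subtree at `(1, 1)` it still has its own witnesses `(1, 2, 1)` and `(1, 3)`. Intersecting ancestor sets avoids enumerating every combination of match nodes, which is where the naive LCA semantics becomes exponential.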