This paper presents a case study of discovering and classifying verbs in large web corpora. Many tasks in natural language processing require corpora containing billions of words, and at such volumes of data co-occurrence extraction becomes one of the performance bottlenecks in the Vector Space Models of computational linguistics. We propose a co-occurrence extraction kernel based on ternary trees...
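The abstract names a ternary-tree-based kernel but does not reproduce it; the following is a minimal sketch of the general idea, assuming pair counts are accumulated in a ternary search tree keyed by the concatenated word pair and that co-occurrence means appearance within a fixed token window. All class and function names here are illustrative, not taken from the paper.

```python
class TSTNode:
    """One node of a ternary search tree over pair keys."""
    __slots__ = ("ch", "lo", "eq", "hi", "count")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.count = 0  # nonzero only at the node ending a full key


class TernaryCooccurrenceCounter:
    """Counts word pairs, keyed as 'w1<NUL>w2' in a ternary search tree."""

    def __init__(self):
        self.root = None

    def _insert(self, node, key, i):
        ch = key[i]
        if node is None:
            node = TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i)
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i)
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1)
        else:
            node.count += 1
        return node

    def add(self, w1, w2):
        self.root = self._insert(self.root, w1 + "\x00" + w2, 0)

    def count(self, w1, w2):
        key, node, i = w1 + "\x00" + w2, self.root, 0
        while node is not None:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node.count
        return 0


def extract_cooccurrences(tokens, window, counter):
    """Slide a fixed-size window over the token stream, counting ordered pairs."""
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            counter.add(w, tokens[j])
```

Compared with a hash map, the tree keeps keys in sorted order and shares prefixes between the many pairs that start with the same frequent word, which is one plausible reason to prefer it at billion-word scale.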
Today, user-generated content and online shared opinions are gaining relevance as a source of information not only for other consumers but also for retailers. However, the huge number of posted opinions makes any manual analysis difficult. This paper proposes a new approach for gender discourse analysis based on the semantic analysis of the content of shared reviews in electronic word-of-mouth communities...
Twitter has attracted millions of users to share and disseminate the most up-to-date information, resulting in large volumes of data produced every day. However, many applications in Information Retrieval (IR) and Natural Language Processing (NLP) suffer severely from the noisy and short nature of tweets. In this paper, we propose a novel framework for tweet segmentation in a batch mode, called HybridSeg...
The internet has evolved from an informational space into a significant communication space, a worldwide social network through which millions of opinions are expressed daily with no sociological, psychological, temporal or spatial constraint. The content analysis of these opinions allows us to identify and then categorize the sentiments they carry. In this article, we will attempt to present the state of...
To access the Internet, companies define a Service Level Agreement (SLA) with Internet Service Providers (ISPs). Nevertheless, the current Internet does not assure Quality of Service (QoS), which points toward the concepts of virtual networks (VNs) and software-defined networking (SDN) to support the Future Internet. Moreover, the VN and SDN approaches can be combined, creating the Virtual Software Defined...
The main challenge of question answering is that the lack of task structure prohibits the use of simplified assumptions as in task-oriented dialogue systems. This problem was tackled by integrating a dialogue management environment into a question answering system. Firstly, Wizard of Oz studies were conducted to discover how users describe their music information needs in contextual situations as...
This paper introduces the normalized Google distance into the study of word sense disambiguation and presents a novel unsupervised method of word sense disambiguation. The normalized Google distance is a measure of similarity between words and phrases, based on information distance and Kolmogorov complexity, using the World Wide Web as a database, with its page counts derived from a search engine such...
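The normalized Google distance mentioned above has a standard closed form over search-engine page counts: with f(x) the number of pages containing x, f(x, y) the number containing both terms, and N the total number of indexed pages, NGD(x, y) = (max(log f(x), log f(y)) − log f(x, y)) / (log N − min(log f(x), log f(y))). A minimal sketch, assuming the page counts are supplied by the caller rather than fetched live:

```python
from math import log


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from raw page counts.

    fx, fy -- hit counts for the terms x and y alone
    fxy    -- hit count for the conjunctive query "x AND y"
    n      -- total number of pages indexed by the search engine
    """
    lfx, lfy, lfxy = log(fx), log(fy), log(fxy)
    return (max(lfx, lfy) - lfxy) / (log(n) - min(lfx, lfy))
```

Two terms that always occur together yield a distance of 0, and the distance grows as the terms' joint count shrinks relative to their individual counts; how the disambiguation method ranks candidate senses with these distances is specific to the paper and not reproduced here.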
A novel solution is proposed to an important problem: learning the real querying preferences and intentions of users who need to retrieve interesting information from a database but are not in a position to specify their information needs and/or intentions in a query language, for lack of knowledge and/or experience. The proposed solution is based on presenting the user with consecutive...
The development of Internet technologies makes it possible to obtain data in near real time about the financial state of companies. Moreover, tools such as XBRL have been developed to deal with the automatic generation of business reports. However, the available tools are not suitable to support the current tendency towards so-called Continuous Reporting. Here, for a specific purpose, the wealth...
Since the World Wide Web contains a large set of data in different languages, retrieving language-specific information creates a new challenge in information retrieval called language-specific crawling. In this paper, a new approach is proposed for language-specific crawling in which a combination of selected content and context features of web documents has been applied. This approach has been implemented...
In the emerging e-Science scenario, users should be able to easily combine data resources and tools/services, and machines should automatically be able to trace paths and carry out interpretations. Users who want to participate need to move from a download-first paradigm to a cyberinfrastructure paradigm, thus increasing their dependency on the seamless operation of all components in the Internet. Such a scenario...