Wikipedia is one of the most popular information platforms on the Internet. The user access pattern to Wikipedia pages depends on their relevance in the current worldwide social discourse. We use publicly available statistics about the 1000 most popular pages on each day to estimate the efficiency of caches in support of the platform. While the data volumes are moderate, the main goal of Wikipedia...
There is a vast amount of information about individuals available on the Web that has potential uses in Human Resource Management (HRM), both for recruiters and job seekers. Since people's names are inherently ambiguous, finding information related to a specific person is challenging, and a simple query by name will likely return web pages related to several different individuals who happen to share...
In this paper, we present a system for the automatic generation of Multiple Choice Questions (MCQs), along with a set of distractors, from any given input text. The system is trained on a Wikipedia-based dataset consisting of URLs of Wikipedia articles. The important words (keywords), which consist of both bigrams and unigrams, are extracted and stored in a dictionary along with many other components of the...
Many organizations have been reported to create and monitor targeted Twitter streams to collect information and understand users' views. A targeted Twitter stream is usually constructed by filtering tweets and abusive words with predefined selection criteria. Due to the invaluable business value of timely information from these tweets, it is necessary to understand...
Topic modeling has attracted increasing interest from researchers. Common methods of topic modeling usually produce a collection of unlabeled topics where each topic is depicted by a distribution of words. Associating semantic meaning with these word distributions is not always straightforward. Traditionally, this task is left to human interpretation. Manually labeling the topics is unfortunately...
This paper introduces an automatic categorical-marking model for text categorization. Traditional classification algorithms generally rely on a labeled training set and call for a lot of manual work to tag classes beforehand. Also, due to the ambiguity and fuzziness of texts, the results of traditional text categorization algorithms may be neither clear enough nor rich enough in content. This paper...
As more and more learners are opting for online learning, the e-learning industry is working on improving the learning experience of online users by providing relevant content and a lot of additional references. Since online learners mostly prefer video tutorials, identifying the major topics and subtopics covered in a video tutorial is a big challenge. Recently, for efficient knowledge sharing and interoperability over...
Cross-modal retrieval, which aims to solve the problem that the query and the retrieved results are from different modalities, has become more and more essential with the development of the Internet. In this paper, we mainly focus on the exploration of high-level semantic representations of image and text for cross-modal matching. Deep convolutional image features and Fisher Vector with neural word embeddings...
This paper proposes a method of assisting movie summarization using plot information. The plot of a movie available on Wikipedia contains the main story of the movie. From such a plot, we extract several important sentences as the content of the summary. For summarizing a movie, the key task is finding the best alignment between sentences of the plot and shots segmented from the movie. There are...
Geo-tweet visualization helps users learn about events happening over space and time from tweets or Wikipedia when they click on a specified location for a 3D-based tag visualization. Normal events, which can happen anywhere or anytime, are detected by the system using a machine learning algorithm, and special events are also extracted by comparing the current situation to normal regularities. Generally,...
The classification of text documents into a number of pre-defined categories has many application scenarios, for example the classification of news items into thematic sections. Documents to be classified are commonly represented by a bag-of-words feature vector. The bag-of-words model cannot handle two language phenomena: synonymy and polysemy. Besides, the dimensions of feature vectors are orthogonal...
The purpose of this Android application is to provide an education-based chatbot for visually impaired people. It answers education-related queries asked by visually impaired people. They can easily launch the application with the help of Google voice search. Once the application is open, it gives voice instructions on how to use the application. Output will be provided in voice...
The objective of a question answering system (QAS) is to generate a concise answer to an arbitrary question asked in natural language. This kind of information retrieval is required with the growth of digital information. Analysis of natural language is a complex task. Previously, QAS were developed for specific domains and had limited efficiency. Present QAS target the types of questions commonly asked by users, characteristics...
Wikipedia is one of the fastest growing websites and a primary source of knowledge on the Internet. Being a wiki, its content is crowd-sourced by the users. This has many benefits and it is one of the main reasons it has grown to reach more than 5 million articles in its English version. Nevertheless, this also raises issues, like the overlinking of articles, which are difficult to deal with by editors...
Comparable corpora contain significant quantities of useful data for Natural Language Processing tasks, especially in the area of Machine Translation. They are mainly a source of parallel text fragments. This paper investigates how to effectively extract bilingual texts from comparable corpora relying on a small parallel training corpus. We propose a new technique to filter non-parallel articles...
The domain of the traditional web is gradually evolving with the adoption of newer techniques, including the semantic web. Integration of web content using ontologies in a language-independent manner is a required feature in this process. For better utilization of the resources, it is necessary that the ontology, which works as a central knowledge repository, be language-independent as well...
Named Entity Identification (NEI) is the task of identifying named entities in textual data. While NEI for the English language can be done with considerable accuracy owing to tools like the Stanford NER tagger, the accuracy in the case of Indian languages like Hindi is comparatively poor. One of the reasons for this is the lack of sufficiently large annotated corpora in Indian languages on which NE-taggers...
The fifth Dialog State Tracking Challenge (DSTC5) introduces a new cross-language dialog state tracking scenario, where the participants are asked to build their trackers based on the English training corpus, while evaluating them on the unlabeled Chinese corpus. Although computer-generated translations for both the English and Chinese corpora are provided in the dataset, these translations contain...
Discovering topics in short texts, such as news titles and tweets, has become an important task for many content analysis applications. However, due to the lack of rich context information in short texts, the performance of conventional topic models on short texts is usually unsatisfactory. In this paper, we propose a novel topic model for short text corpora using word embeddings. Continuous space word...
Understanding changes in the mood and mental health of large populations is a challenge, with the need for large numbers of samples to uncover any regular patterns within the data. The use of data generated by online activities of healthy individuals offers the opportunity to perform such observations on the large scales and for the long periods that are required. Various studies have previously examined...