In this study, we examine the effect of the weighting functions used by the well-known word embedding algorithms in the literature to weight the co-occurrence statistics of words. The literature commonly assumes that the semantic relation between two words weakens in inverse proportion to the distance between them. However, this assumption is not always acceptable...
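The inverse-distance assumption the abstract questions can be made concrete with a small sketch. The harmonic 1/d weighting below is the scheme popularized by GloVe-style co-occurrence counting; the window size and toy token list are illustrative assumptions, not details from the paper.

```python
from collections import defaultdict

def weighted_cooccurrence(tokens, window=5):
    """Count co-occurring word pairs, weighting each pair by 1/distance.

    A context word at distance d contributes 1/d to the pair's count --
    the common assumption that semantic relatedness decays with distance.
    """
    counts = defaultdict(float)
    for i, word in enumerate(tokens):
        for d in range(1, window + 1):
            j = i + d
            if j >= len(tokens):
                break
            counts[(word, tokens[j])] += 1.0 / d
    return counts

# Toy example: within a window of 2, ("the", "cat") at distance 1
# gets weight 1.0, while ("the", "sat") at distance 2 gets 0.5.
c = weighted_cooccurrence(["the", "cat", "sat"], window=2)
```

Studying alternatives to this fixed 1/d decay is exactly the kind of question the abstract raises, since adjacent words are not always the most semantically related.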
Wikipedia is one of the most popular information platforms on the Internet. The user access pattern to Wikipedia pages depends on their relevance in the current worldwide social discourse. We use publicly available statistics about the top-1000 most popular pages on each day to estimate the efficiency of caches for support of the platform. While the data volumes are moderate, the main goal of Wikipedia...
There is a vast amount of information about individuals available on the Web that has potential uses in Human Resource Management (HRM), both for recruiters and job seekers. Since person names are inherently ambiguous, finding information related to a specific person is challenging, and a simple query by name will likely return web pages related to several different individuals who happen to share...
Automatic building of software projects provides a desirable foundation to support a large variety of software engineering research tasks based on open software repositories. In this paper, we propose the first technique to automatically extract software build commands from software readme files and Wiki pages, and combine the extracted commands for software building. Specifically, we leverage the Named...
In this paper, we present the system of automatic MCQs (Multiple Choice Questions) generation for any given input text along with a set of distractors. The system is trained on a Wikipedia-based dataset consisting of URLs of Wikipedia articles. The important words (keywords) which consist of both bigrams and unigrams are extracted and stored in a dictionary along with many other components of the...
It is difficult for software professionals to find all the architectural knowledge they need from architecture documentation, and this results in wasted time and mistakes in projects. This is the case even when architecture documentation is indexed by an ontology and stored in a semantic wiki. We present a prototype tool called AK-Finder which queries architectural knowledge stored in a semantic wiki...
Topic modeling has increasingly attracted interest from researchers. Common methods of topic modeling usually produce a collection of unlabeled topics, where each topic is depicted by a distribution of words. Associating semantic meaning with these word distributions is not always straightforward. Traditionally, this task is left to human interpretation. Manually labeling the topics is unfortunately...
Today's enterprises have to align their information systems continuously with their dynamic business and IT environment. Collaborative information systems address this challenge by involving diverse users in managing the application's data as well as its conceptual model. In this sense, both the data and the model co-evolve. There are different approaches for aligning data and model evolution, wherein...
This paper introduces an automatic categorical-marking model for text categorization. Traditional classification algorithms generally rely on a labeled training set and require substantial manual work to tag categories beforehand. Also, due to the ambiguity and fuzziness of texts, the results of traditional text categorization algorithms may not be sufficiently clear or rich in content. This paper...
With the advent of the big data era, the traditional knowledge management of teaching administration in colleges and universities is facing new difficulties and challenges. Due to the increasing trend of reform in colleges and universities, such as running schools in a more international way, developing students under the cooperation of enterprises and schools, allowing students who start an undertaking...
As more and more learners are opting for online learning, the e-learning industry is working on improving the learning experience of online users by providing relevant content and lots of additional references. Since online learners mostly prefer video tutorials, identifying the major topics and subtopics covered in a video tutorial is a big challenge. Recently, for efficient knowledge sharing and interoperability over...
Cross-modal retrieval, which aims to solve the problem that the query and the retrieved results come from different modalities, is becoming more and more essential with the development of the Internet. In this paper, we mainly focus on exploring high-level semantic representations of images and text for cross-modal matching. Deep convolutional image features and Fisher Vectors with neural word embeddings...
This paper proposes a method of assisting movie summarization using plot information. A plot of a movie available at Wikipedia contains the major story of the movie. From such a plot, we extract several important sentences as the content of the summary. For summarizing a movie, the key work is finding the best alignment between sentences of the plot and shots segmented from the movie. There are...
Geo-tweet visualization helps users learn about the events happening over space and time from tweets or Wikipedia when they click on a specified location in a 3D tag visualization. Normal events, which happen anywhere or anytime, are detected by the system using a machine learning algorithm, and special events are extracted by comparing the current situation to normal regularities. Generally,...
The classification of text documents into a number of pre-defined categories has many application scenarios, for example the classification of news items into thematic sections. Documents to be classified are commonly represented by a bag-of-words feature vector. The bag-of-words model cannot handle two language phenomena, synonymy and polysemy; moreover, the dimensions of the feature vectors are orthogonal...
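The limitation described above can be seen directly in a minimal bag-of-words sketch: each distinct surface form gets its own dimension, so synonyms land on orthogonal axes. The two toy documents are illustrative assumptions.

```python
def bag_of_words(docs):
    """Build bag-of-words count vectors over a shared vocabulary."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for w in doc.lower().split():
            vec[index[w]] += 1
        vectors.append(vec)
    return vocab, vectors

vocab, vecs = bag_of_words(["the car drives", "the automobile drives"])
# "car" and "automobile" occupy different, orthogonal dimensions,
# so the model treats the synonyms as unrelated words.
```

This orthogonality is why the documents above share no weight on the car/automobile axes despite being near-paraphrases, motivating semantically enriched representations.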
The purpose of this Android application is to provide an education-oriented chatbot for visually impaired people. It answers educational queries asked by visually impaired users. They can easily launch the application with the help of Google voice search. Once the application is open, it gives voice instructions on how to use the application. Output is provided in voice...
The objective of a question answering (QA) system is to generate a concise answer to an arbitrary question asked in natural language. This kind of information retrieval is increasingly required with the growth of digital information. Analysis of natural language is a complex task. Previously, QA systems were developed for specific domains and had limited efficiency. Present QA systems target the types of questions commonly asked by users, characteristics...
Wikipedia is one of the fastest growing websites and a primary source of knowledge on the Internet. Being a wiki, its content is crowd-sourced by the users. This has many benefits and it is one of the main reasons it has grown to reach more than 5 million articles in its English version. Nevertheless, this also raises issues, like the overlinking of articles, which are difficult to deal with by editors...
Comparable corpora contain significant quantities of useful data for Natural Language Processing tasks, especially in the area of Machine Translation. They are mainly a source of parallel text fragments. This paper investigates how to effectively extract bilingual texts from comparable corpora relying on a small-size parallel training corpus. We propose a new technique to filter non-parallel articles...
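A common baseline for this kind of filtering, sketched below under assumptions of ours rather than as the paper's actual technique, is to vectorize each candidate article pair and keep only pairs whose cosine similarity clears a threshold. The `vectorize` function and the threshold value are placeholders; in practice the vectorizer would be learned from the small parallel training corpus.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def filter_parallel(pairs, vectorize, threshold=0.5):
    """Keep (source, target) article pairs that look parallel.

    `vectorize` is any text-to-vector mapping (hypothetical here);
    `threshold` is an illustrative cut-off, normally tuned on held-out data.
    """
    return [(s, t) for s, t in pairs
            if cosine(vectorize(s), vectorize(t)) >= threshold]

# Toy vectorizer for demonstration: character frequencies over a-z.
def char_vec(text):
    v = [0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            v[ord(ch) - 97] += 1
    return v

kept = filter_parallel([("hello", "hello"), ("abc", "xyz")],
                       char_vec, threshold=0.9)
# Identical texts pass; texts with no shared characters are filtered out.
```

Real systems replace the toy character vectorizer with bilingual sentence or document embeddings so that similarity is measured across languages, not surface forms.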
The domain of the traditional web is gradually evolving with the adoption of newer techniques, including the semantic web. Integration of web content using ontologies in a language-independent manner is a required feature in this process. For better utilization of the resources, it is necessary that the ontology, which works as a central knowledge repository, be language independent as well...