Data scientists in software engineering seek insight in data collected from software projects to improve software development. The demand for data scientists with domain knowledge in software development is growing rapidly and there is already a shortage of such data scientists. Data science is a skilled art with a steep learning curve. To shorten that learning curve, this workshop will collect best...
Sentiment analysis refers to the automatic extraction of sentiments from a natural language text. We study the effect of subjectivity-based features on sentiment classification on two lexicons and also propose new subjectivity-based features for sentiment classification. The subjectivity-based features we experiment with are based on the average word polarity and the new features that we propose are...
In this research we propose a new motion classification method to improve the operability of a 3D gesture interface that assists text input on mobile devices. A certain range of time-series finger scale data is cropped and classified using linear discriminant analysis. To confirm the possibility of linear separation, the data were visualized using principal component analysis. Experimental result with changing...
In the past decade, we have witnessed an explosive growth of the Web, online communities, and social media. This has led to a substantial increase in the range and scope of electronic communication and distributed collaboration. In distributed teams, social communication is thought to be critical for creating and sustaining relationships, but there is often limited opportunity for team members to...
Many application areas that use supervised machine learning make use of multiple raters to collect target ratings for training data. Usage of multiple raters, however, inevitably introduces the risk that a proportion of them will be unreliable. The presence of unreliable raters can prolong the rating process, make it more expensive and lead to inaccurate ratings. The dominant, "static" approach...
In our research we use state-of-the-art Web and human language technology to create a language learning experience which blends foreign language acquisition with the user's everyday browsing activities. The language students can interact with their peers to foster social learning skills. The whole learning process is supported and supervised by a human instructor to offer encouragement and feedback...
Providers of composite Web services face the challenge of having to comply with SLAs, which are agreements governing the minimum performance that customers can expect from a composite service. In this work, a framework for optimizing adaptations of service compositions with regard to SLA violations has been developed. The framework, dubbed PREvent (Prediction and Prevention of SLA Violations Based...
Word Sense Disambiguation (WSD) is the task of choosing the most appropriate sense of a word that has multiple senses in a given context. Collocational features acquired from the words neighboring the ambiguous word are among the important knowledge sources in this area. This paper explores effective sets of collocational features in Turkish in order to obtain better Turkish WSD systems...
We review a statistical machine learning model of top-down, task-driven attention based on the notion of ‘gist’. In this framework we consider the task to be represented as a classification problem with two sets of features: a gist of coarse-grained global features and a larger set of low-level local features. Attention is modeled as the choice process over the low-level features given the gist. The...
The field of machine learning strives to develop algorithms that, through learning, lead to generalization; that is, the ability of a machine to perform a task that it was not explicitly trained for. Numerous approaches have been developed ranging from neural network models striving to replicate neurophysiology to more abstract mathematical manipulations which identify numerical similarities. Nevertheless...
Software analytics aims to enable software practitioners to perform data exploration and analysis in order to obtain insightful and actionable information for data-driven tasks around software and services. When applying analytic technologies in the practice of software analytics, one should incorporate (1) a broad spectrum of domain knowledge and expertise, e.g., management, machine learning, large-scale...
This is preliminary work for a project that will automatically filter comments made on news and papers. Our database holds over 1 million news items and comments. Because of the volume of our data, 30,677 comments made on 15,064 articles in 44 different categories are used as experimental data. The proposed anomaly-based method obtained fast and highly accurate results without the high storage...
With recent advancements in Web 2.0, people can no longer be regarded as simple content readers; they can also contribute content as writers. This work combines microblogging and text categorization. Text categorization steps were applied to microblogs to identify users whose contributions are more valuable for the related category. 2015 RSS news feeds were taken for training and users' tweets were used as test...
This paper presents an optimization model of eye-hand coordination, which is based on a partially observable Markov decision process with 17 continuous state dimensions. The maximum-likelihood observation is always obtained. Since the globally-optimal solution for a high-dimensional domain is computationally intractable,...
This paper presents an approach for detecting duplicate records in the context of digital gazetteers, using a state-of-the-art machine learning technique. It reports on a thorough evaluation of a machine learning approach designed for the task of classifying pairs of gazetteer records as either duplicates or not, built by using Random Forests and leveraging different combinations of similarity...
An application that operates on an imbalanced dataset loses classification performance on the minority class, which is rare and important. There are a number of over-sampling techniques, which insert minority instances into a dataset to adjust the class distribution. Unfortunately, these instances greatly affect the computation needed to generate a classifier. In this paper, a new simple and effective...
An ambiguous proper name is a name which is also a valid dictionary word with a meaning of its own when used in the text. For example in English, the word 'bush' in 'Mr. Bush' is a proper name whereas in 'a dense bush' it is a lexical entity. Almost all proper names in Hindi have a meaning and find an entry in the dictionary. Recognition of named entities finds wide application in MT, IR and several...
All manufacturing companies have been trying to integrate intelligence-oriented strategies into their workflow. This saves time and money and leads to a better quality of goods. This intelligent strategy falls within the field of Artificial Intelligence (AI). In our contribution, a “Knowledge-Based System” (KBS) for image processing and pattern recognition, machine learning (ML) is used to measure...
This paper presents a comparison between DMPML and three data mining applications (Weka, RapidMiner, and KNIME) that implement the directed graph approach, concerning the time spent to create and execute the data preparation tasks for two data mining algorithms. The tests were executed using different types of data sets: numerical, categorical, and mixed. We observed that the scheme used by the DMPML...
In transfer learning scenarios, previous discriminative dimensionality reduction methods tend to perform poorly owing to the difference between source and target distributions. In such cases, it is unsuitable to only consider discrimination in the low-dimensional source latent space since this would generalize badly to target domains. In this paper, we propose a new dimensionality reduction method...