We propose a new pretreatment for pedestrian detection with convolutional networks. It is widely known that overlapping feature distributions are common, which leads to an overfitting problem. We present a method that divides one category that has overlapping feature distributions into multiple subcategories. By this means smooth boundaries can be easily found to separate different subcategories,...
Privacy-preserving data publishing is an important research problem that has become increasingly vital in recent years. We come across situations where a data owner wishes to publish data without revealing private information. A known solution to this problem is differential privacy, a research topic that implements noise injection using the Laplace distribution and building...
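As an illustration of Laplace noise injection in general, the following is a minimal sketch of the textbook Laplace mechanism applied to a counting query. It is not the method of the abstract above, whose details are truncated; the function names `laplace_noise` and `private_count` and the choice of a counting query with sensitivity 1 are assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a zero-mean Laplace distribution.

    The difference of two independent exponential variables with
    rate 1/scale follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values that satisfy predicate.

    Assumption: adding or removing one record changes the count by
    at most 1, so the sensitivity is 1 and the noise scale is 1/epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    ages = [23, 35, 41, 52, 29, 67, 38]  # hypothetical data
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5, rng=rng))
```

A smaller `epsilon` gives a larger noise scale, so the published count reveals less about any single record at the cost of accuracy.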
Pregnancy complications are a leading cause of maternal deaths in the present era. There is a rising need to protect pregnant women from possible threats posed by abnormalities induced by changing physiological parameters. Pregnancy is a delicate stage and requires acute medical attention and care. Decision tree classification algorithms are popular and powerful methods most suitable for the medical...
Linear Support Vector Machine (LSVM) has recently become one of the most prominent learning methods for solving classification and regression problems because of its applications in text classification, word-sense disambiguation, and drug design. However, LSVM and its variations can neither adapt to a dynamic dataset nor learn in online mode. In this paper, we introduce an Adaptable Linear Support...
Finding all similar time-series patterns in real time under Dynamic Time Warping (DTW) is a huge challenge in today's data mining. A vital requirement of this critical task is data normalization, so that the search results are accurate. However, DTW and data normalization, particularly in the streaming context, cost a great deal of computation time and memory space, so many techniques are required to...
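To make the cost of combining DTW with normalization concrete, here is a minimal sketch of z-normalization followed by the classic quadratic-time DTW recurrence. It is the plain textbook formulation, not the streaming technique of the abstract above; the function names `z_normalize` and `dtw_distance` are assumptions for this example.

```python
import math


def z_normalize(series):
    """Rescale a series to zero mean and unit standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant series carries no shape information.
        return [0.0] * n
    return [(x - mean) / std for x in series]


def dtw_distance(a, b):
    """Unconstrained DTW distance with squared point-wise cost.

    Runs in O(len(a) * len(b)) time and keeps only two rows of the
    cumulative cost matrix in memory.
    """
    inf = math.inf
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0
    for i in range(1, len(a) + 1):
        curr = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three admissible warping steps.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return math.sqrt(prev[len(b)])
```

Every candidate window in a stream has to be normalized and then compared with this quadratic recurrence, which is why pruning and incremental techniques matter in that setting.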
Software misconfigurations are responsible for a substantial part of today's system failures, causing about one-quarter of all customer-reported issues. Identifying their root causes can be costly in terms of time and human resources. We present an approach to automatically pinpoint such defects without error reproduction. It uses static analysis to infer the correlation degree between each configuration...
Document classification is usually more challenging than numerical data classification, because it is much more difficult to effectively represent documents than numerical data for classification purposes. Vector space model (VSM) has been widely used for document representation for classification, in which a document is represented by a vector of feature values based on a bag of words. This paper...
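The bag-of-words representation mentioned above can be sketched as follows. This is a minimal term-frequency example, not the representation proposed in the paper; the helper names `tokenize`, `build_vocabulary` and `to_vector` and the sample documents are assumptions for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(documents):
    """Collect the sorted set of distinct terms over all documents."""
    return sorted({term for doc in documents for term in tokenize(doc)})


def to_vector(document, vocabulary):
    """Represent a document as term counts, one entry per vocabulary term."""
    counts = Counter(tokenize(document))
    return [counts[term] for term in vocabulary]


if __name__ == "__main__":
    docs = ["The cat sat", "The dog sat down"]  # hypothetical corpus
    vocab = build_vocabulary(docs)
    for doc in docs:
        print(to_vector(doc, vocab))
```

Word order is discarded in this representation, and the vector length grows with the vocabulary, which is one reason documents are harder to represent effectively than numerical data.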
Principal component analysis (PCA) is a commonly used method for feature extraction and dimensionality reduction. This paper proposes PCA based on similarity/correlation criteria instead of covariance to gain low-dimensional features with high performance in text classification. Experimental results have demonstrated the advantages and usefulness of the proposed method in text classification in high-dimensional...
In this paper, we present a new approach to computing the lower bound on the measurement of buffer overflow probability when the buffer state is modeled as a semi-Markov process. In this commonly assumed model of buffer overflow, we use this approach to explore the relationship between sampling rate and accuracy. Crucially, we go on to show that a realistic simulation of a packet buffer reveals that...
Automatic language identification is a natural language processing problem that tries to determine the natural language of a given content. In this paper we present a statistical method for automatic language identification of written text using dictionaries containing stop words and diacritics. We propose different approaches that combine the two dictionaries to accurately determine the language...
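One simple way to combine a stop-word dictionary with a diacritics dictionary is a weighted hit count per language, sketched below. The abstract does not say how the paper combines the two, so the scoring rule, the weights, the function name `identify_language` and the tiny dictionaries are all assumptions for this example.

```python
import re

# Hypothetical, deliberately tiny dictionaries for illustration only.
STOP_WORDS = {
    "english": {"the", "and", "of", "is", "to"},
    "french": {"le", "la", "et", "est", "de"},
    "spanish": {"el", "la", "y", "es", "de"},
}
DIACRITICS = {
    "english": set(),
    "french": set("éèêàçù"),
    "spanish": set("ñáéíóú¿¡"),
}


def identify_language(text, stop_weight=1.0, diacritic_weight=1.0):
    """Return the language with the highest combined dictionary score.

    Returns None when no stop word or diacritic of any language matches.
    """
    lowered = text.lower()
    # Match runs of Unicode letters, excluding digits and underscores.
    words = re.findall(r"[^\W\d_]+", lowered)
    scores = {}
    for lang in STOP_WORDS:
        stop_hits = sum(1 for w in words if w in STOP_WORDS[lang])
        diacritic_hits = sum(1 for ch in lowered if ch in DIACRITICS[lang])
        scores[lang] = stop_weight * stop_hits + diacritic_weight * diacritic_hits
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Stop words shared between languages (such as "la" and "de" here) contribute to several scores at once, so the diacritic hits help break ties between closely related languages.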
Tens of thousands of pictures are taken at different locations throughout the year. People often visit places and take pictures to remember their visits. We believe that the seasonal travel patterns of people to specific locations will create a correlation between a location and the season of the images taken in that location. For example, fewer people visit Bear Valley, California during the summer...
In recent years, with the popularity of Internet technology, online e-learning has become increasingly well known. Flipped classrooms and MOOCs (Massive Open Online Courses) are becoming innovative ways of studying. The MOOC is one representative of the online teaching concept. In this situation, when a great number of students attend a course, the teacher will be burdened because of the more and more...
Support Vector Machine (SVM) is a popular machine learning technique for classification. SVM is computationally infeasible with large datasets due to its long training time. In this paper we compare three different methods for reducing the training time of SVM. Different combinations of Decision Tree (DT), Fisher Linear Discriminant (FLD), QR Decomposition (QRD) and Modified Fisher Linear Discriminant...
The advantages of multi-classification schemes based on decomposition strategies, and especially the One-vs-One framework, have been stressed even for those algorithms that can address multiple classes. However, there is an inherent hitch for the One-vs-One learning scheme related to the decision process: the non-competent classifier problem. This issue refers to the case where a binary classifier...
The authors study the problem of how news summarization can help stock price prediction, proposing a generic stock price prediction framework to enable the use of different external signals to predict stock prices. Experiments were conducted on five years of Hong Kong Stock Exchange data, with news reported by Finet; evaluations were performed at individual stock, sector index, and market index levels...
Multimodal sentiment analysis is the analysis of emotions, attitudes, and opinions expressed in audiovisual content. A company can improve the quality of its products and services by analyzing the reviews about them [5]. Sentiment analysis is widely used in managing customer relations. There are many textual reviews from which we cannot extract emotions by traditional sentiment analysis techniques. Some...
An Intrusion Detection System (IDS) is used to preserve data integrity and confidentiality against attacks. To identify the type of attack in an IDS, different methodologies exist, including various data mining techniques, but some are very time consuming and laborious. Therefore, we propose the use of SVM (Support Vector Machine) for classification of attacks from a large amount of raw intrusion...
Over recent years, the world has experienced a huge growth in the volume of shared web texts. Web users generate a huge volume of comments and reviews every day, related to different aspects of their lives. In general, opinion mining/sentiment analysis refers to the task of identifying positive and negative opinions, emotions and evaluations related to an article, news, products, services, etc. [1]. Arabic...
In 2010, the World Health Organization (WHO) Global Status Report on NCD reported that 60 percent of deaths in the world were caused by non-communicable diseases, and one of the non-communicable diseases that drew a lot of attention was diabetes mellitus. Diabetes is a serious threat to health development, because diabetes is a disease that causes most other diseases (complications), such as blindness,...
Identifying the root causes of a performance problem is very difficult in large-scale IT environments. A model that is scalable and reasonably accurate is required for such complex scenarios. This paper proposes a hybrid model using random forest and statistical change point detection, for root cause localization. Based on impurity measure and change in error rates, random forest identifies...