Nature-inspired approaches have been used to design computer solutions for real-life problems. These solutions take the form of algorithms that mimic specific behaviours of animals or birds in their natural habitat. The two main bio-inspired computational concepts in modern times are evolutionary computation and swarm intelligence. A novel introduction to the bio-inspired computational...
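To make "swarm intelligence" concrete, here is a minimal particle swarm optimization (PSO) sketch, a classic algorithm modelled loosely on bird flocking. It is an illustrative example only, not the method of the work summarized above; the function names (`pso`, `sphere`) and the coefficient values (`w`, `c1`, `c2`, commonly used constriction-style settings) are assumptions chosen for the demo.

```python
import random

def sphere(x):
    """Toy objective with its minimum 0 at the origin."""
    return sum(v * v for v in x)

def pso(f, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
        w=0.729, c1=1.49445, c2=1.49445, seed=None):
    """Minimise f with a basic global-best PSO (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Each particle starts at a random position with zero velocity
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Personal bests ("what I found") and the swarm's global best ("what the flock found")
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward own best + pull toward swarm best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso(sphere, dim=2, seed=0)
print(best, best_val)
```

No single particle "knows" the optimum; the swarm converges through the shared global best, which is the cooperative behaviour that swarm-intelligence methods borrow from nature.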
Big data analytics is emerging as an important research field, with many technical challenges that confront both commercial IT deployment and big data research communities. One of the inherent problems of big data is the curse of dimensionality: modern data are described by many attributes and are therefore stored in high dimensions. In data analytics, feature selection has been popularly used to lighten...
Data clustering is one of the most popular branches of machine learning and data analysis. Partitioning-based clustering algorithms, such as K-means, are prone to producing a set of clusters that is far from perfect because of their probabilistic nature. The clustering process starts with random partitions and then tries to improve them progressively...
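The random-start-then-refine process described above can be illustrated with a minimal sketch of standard (Lloyd's) K-means. This is not the paper's proposed method; the helper names (`kmeans`, `sse`) and parameters (`max_iter`, `seed`) are hypothetical, and the sum of squared errors (SSE) is used here only as one common way to compare the quality of different random starts.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=None):
    """Lloyd's K-means: random initial centroids, then iterative refinement."""
    rng = random.Random(seed)
    # Random start: pick k distinct data points as the initial centroids
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def sse(points, centroids, labels):
    """Sum of squared errors: lower means tighter clusters."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, labs = kmeans(pts, 2, seed=1)
print(labs, sse(pts, cents, labs))
```

Because the result depends on the initial centroids, a common (though not guaranteed) mitigation is to run several seeds and keep the partition with the lowest SSE; on messier data, different seeds can converge to visibly different local optima.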
SMOTE (Synthetic Minority Over-sampling Technique) is a commonly used over-sampling technique for alleviating the imbalanced dataset problem. Traditionally, SMOTE has two key parameters: one controls the amount of over-sampling, and the other specifies the number of nearest neighbours considered. Both parameters are chosen arbitrarily by the user, and there are no universally best default values. In...
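The role of SMOTE's two parameters can be seen in a minimal sketch of the classic interpolation scheme: for each minority sample, pick one of its `k` nearest minority neighbours and create a synthetic point at a random position on the segment between them. The function name `smote` and the restriction of `n_percent` to multiples of 100 are simplifying assumptions for this illustration; library implementations (e.g. imbalanced-learn) differ in interface and details.

```python
import random

def smote(minority, n_percent=100, k=5, seed=None):
    """Generate synthetic minority samples (simplified SMOTE sketch).

    n_percent: amount of over-sampling; 200 means two synthetic samples per
               original sample (this sketch only accepts multiples of 100).
    k:         number of nearest minority neighbours to interpolate toward.
    """
    if n_percent <= 0 or n_percent % 100:
        raise ValueError("this sketch only supports positive multiples of 100")
    if not 1 <= k < len(minority):
        raise ValueError("k must be between 1 and len(minority) - 1")
    rng = random.Random(seed)
    per_sample = n_percent // 100
    synthetic = []
    for i, x in enumerate(minority):
        # k nearest minority neighbours of x (excluding x itself)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(
            others, key=lambda m: sum((a - b) ** 2 for a, b in zip(x, m)))[:k]
        for _ in range(per_sample):
            nn = rng.choice(neighbours)
            gap = rng.random()  # uniform in [0, 1)
            # New point lies on the line segment from x toward nn
            synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic

minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (2.5, 2.5)]
print(smote(minority, n_percent=200, k=2, seed=42))
```

Raising `n_percent` adds more synthetic points, while a larger `k` spreads them over a wider neighbourhood, which is why a poor choice of either can under-correct the imbalance or blur the class boundary.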
Although Big Data is a hyped topic that raises many technical challenges for both academic research communities and commercial IT deployment, its root sources are data streams. It is generally known that data sourced from data streams accumulate continuously, making traditional batch-based model induction algorithms infeasible for real-time data mining or high-speed...
Sentiment analysis in text mining is known to be a challenging task. Sentiment is subtly reflected in the tone, affective state or emotion of a writer's choice of words. Conventional text mining techniques based on keyword frequency counting usually fall short of accurately detecting such subjective information implied in the text. In this paper we evaluated several popular classification...
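The weakness of keyword frequency counting can be shown with a deliberately naive sketch. The word lists and the function name `keyword_sentiment` are hypothetical and only illustrate the conventional baseline the text criticizes, not any classifier evaluated in the paper.

```python
import re

# Hypothetical, tiny sentiment lexicons for illustration only
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}

def keyword_sentiment(text):
    """Label text by counting positive minus negative keywords."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = (sum(t in POSITIVE for t in tokens)
             - sum(t in NEGATIVE for t in tokens))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(keyword_sentiment("I love this phone"))   # positive (correct)
print(keyword_sentiment("This is not good"))    # positive (wrong: negation ignored)
```

Counting cannot see negation, sarcasm or tone, which is the kind of subjective information the abstract says keyword-based methods miss.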
Finding an appropriate set of features from high-dimensional data for building an accurate classification model is a well-known NP-hard computational problem. Unfortunately, in data mining some big data are not only large in volume but also described by a large number of features. Many feature subset selection algorithms have been proposed in the past; they are nevertheless far from perfect...
Hoeffding's bound (HB) has been widely used for node splitting in incremental decision tree algorithms. Many decision-tree algorithms adopt a sliding-window technique to detect concept drift when mining changing data streams. This paper presents a novel node-splitting approach that replaces the traditional HB with a new measure. The new measure is derived from a loss function applied in a cache-based...
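For reference, the traditional Hoeffding bound that the paper proposes to replace can be sketched as follows, in the form popularized by VFDT-style incremental decision trees: with a split heuristic of range $R$, $n$ observed examples and confidence parameter $\delta$, $\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$. The function names and the tie-breaking threshold `tau` are assumptions for illustration; this is not the paper's new loss-function-based measure.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

def should_split(best_gain, second_gain, value_range, delta, n, tau=0.05):
    """VFDT-style rule: split when the best attribute beats the runner-up
    by more than epsilon, or when epsilon is so small the two are tied."""
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps or eps < tau

# Information gain with 2 classes has range R = log2(2) = 1
print(hoeffding_bound(1.0, 1e-7, 1000))             # about 0.0898
print(should_split(0.30, 0.10, 1.0, 1e-7, 1000))    # True: gap 0.20 > eps
print(should_split(0.15, 0.10, 1.0, 1e-7, 1000))    # False: gap 0.05 < eps
```

Because $\epsilon$ shrinks as $1/\sqrt{n}$, a leaf must accumulate enough examples before a split decision becomes statistically safe, which is one reason alternative splitting measures are of interest for fast-changing streams.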
Alarming news recently revealed in the media reports that 8.7 percent of Facebook users are fake, amounting to more than 83 million accounts worldwide. This huge number of fake users with unverified profiles poses potential dangers to users and their families, ranging from espionage, identity theft and information misuse to loopholes that compromise privacy. Nowadays...
Today the channels for expressing opinions seem to multiply daily. When these opinions are relevant to a company, they are important sources of business insight, whether they represent critical intelligence about a customer's defection risk, the impact of an influential reviewer on other people's purchase decisions, or early feedback on product releases, company news or competitors. Capturing and...
People increasingly use Twitter to share advice, opinions, news, moods, concerns, facts, rumors, and everything else imaginable. Much of that data is public and available for mining. However, automatically classifying the sentiment of Twitter messages as either positive or negative with respect to a query term represents a new research challenge. A variety of approaches that use natural language...
Data mining is concerned with extracting interesting patterns or knowledge from huge amounts of data. Generally, data mining tasks are either predictive or descriptive. Classification falls under predictive induction, while clustering and association rule mining fall under descriptive induction. Subgroup discovery is a task at the intersection of supervised learning and descriptive induction. In...