In machine learning, data augmentation is the process of creating synthetic examples in order to augment a dataset used to learn a model. One motivation for data augmentation is to reduce the variance of a classifier, thereby reducing error. In this paper, we propose new data augmentation techniques specifically designed for time series classification, where the space in which they are embedded is...
Similarity measurement is an essential issue in time series data mining and is widely used in various applications. Given that most current measures neglect the shape characteristics of time series, this paper proposes a shape-based similarity measure. By introducing a shape coefficient into the traditional weighted dynamic time warping algorithm, an improved version,...
Telecommunications fraud, a new type of crime, has shown a rising trend in recent years. However, research that detects such fraud from a data mining perspective is scarce, especially research that considers behavioral sequences. Though a call detail record (CDR) in telecommunications is generally a snapshot, the history of a caller/callee can be treated as a sequence. Indeed, the historical calling sequences...
Today, we witness the appearance of many lifelogging cameras that are able to capture the life of the person wearing the camera and that produce a large number of images every day. Automatically characterizing the experience and extracting patterns of behavior of individuals from this huge collection of unlabeled and unstructured egocentric data present major challenges and require novel and efficient...
While there exist a plethora of classification algorithms for most data types, there is an increasing acceptance that the unique properties of time series mean that the combination of nearest neighbor classifiers and Dynamic Time Warping (DTW) is very competitive across a host of domains, from medicine to astronomy to environmental sensors. While there has been significant progress in improving the...
Dynamic Time Warping (DTW) distance has been effectively used in mining time series data in a multitude of domains. However, in its original formulation DTW is extremely inefficient in comparing long sparse time series, containing mostly zeros and some unevenly spaced non-zero observations. Original DTW distance does not take advantage of this sparsity, leading to redundant calculations and a prohibitively...
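For reference, the quadratic cost that the abstract above attributes to the original DTW formulation can be seen in a minimal sketch of the classic dynamic program. This is not the sparse variant the authors propose; the function name `dtw_distance` and the use of absolute difference as the local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW between two numeric sequences (illustrative sketch).

    Assumes absolute difference as the local cost and no warping window.
    Every cell of the len(a) x len(b) grid is visited, including cells
    where both series are zero, which is the redundancy that matters
    for long sparse series.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # prev[j] holds the best cumulative cost of aligning a[:i-1] with b[:j].
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]
```

The two nested loops make the running time proportional to the product of the two series lengths, regardless of how many observations are zero.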
Time series are a ubiquitous type of data found in different domains, including finance, medicine, business and other industrial fields. Recently, time series data mining has attracted much attention. In this paper, we propose multilayer piecewise aggregate approximation (MPA) to measure the similarity of time series. The proposed method consists of two parts: multi-level segment method based on extreme...
The amount of financial data is enormous, and mining it has great value. For the stock market, how to effectively select stocks from a reference sector is very important for investors. Based on the co-movement effect between stocks, this paper introduces the concept of a stock's influence and constructs the influence matrix by using the time series of stocks. Then we divide the new stock...
To cope with the time attribute and variations of event distribution in a dynamically evolving process, a streaming process mining method based on time series prediction and a hybrid heuristic miner is proposed. A heuristic miner is improved based on post-task of activity in event logs to optimize the initial particle distribution for Particle Swarm Optimization. Furthermore, an “aging factor” based on time series attribute...
The Bike Sharing System (BSS) has become an increasingly popular means of transport in Paris and in many other cities around the world. It is also generating an ever larger amount of data describing users' trips. Such datasets may be very useful for the data mining community in order to improve the global performance of the BSS. In this paper, we focus on the resource availability (free docks...
Movement data have been widely collected from GPS and sensors, allowing us to analyze how moving objects interact in terms of space and time and to learn about the relationships that exist among the objects. In this paper, we investigate an interesting relationship that has not been adequately studied so far: the following relationship. Intuitively, a follower has trajectories similar to those of its leader...
With the remarkable innovations in network infrastructure, new features of internet users' behavior are emerging. Researchers used to analyze users' online log data with statistics and clustering to find valuable time preference patterns. However, the traditional method to define distance between users would either neglect the relevance of time series or lead to high-dimensional crisis...
We introduce a temporal pattern model called Temporal Tree Associative Rule (TTA rule). This pattern model can be used to express both uncertainty and temporal inaccuracy of temporal events expressed as Symbolic Time Sequences. Among other things, TTA rules can express the usual time point operators, synchronicity, order, chaining, as well as temporal negation. The TTA rule is designed to allow predictions...
Effective mining technology can extract the spatial distribution pattern of road network traffic flow. In this paper, the similarities between traffic flow objects with spatio-temporal characteristics are measured by introducing Dynamic Time Warping (DTW) and the shortest path analysis method. We propose a clustering analysis method for road network traffic flow data, so that traffic...
An integrated pattern mining technique for query answering is proposed for marine sensor data. For pattern queries, we adopt the dynamic time warping (DTW) method and propose the use of a query relaxation approach in finding similar patterns. We further compute predictions from the similar patterns discovered in marine sensor data. The predictive values are then compared with the forecast from hydrodynamic...
A biosignal is a noninvasive measurement of the internal status of an organism, such as an electrocardiogram (ECG), electroencephalogram (EEG), or electromyogram (EMG). With machine learning techniques, these biosignals are normally classified into one of a number of disease categories. Hence, they are ideally suited to support clinicians in making diagnostic decisions. However, if a given biosignal is...
Anomaly detection in data streams is the problem of extracting subsequences that do not match an expected behavior. Its importance originates from its applicability in many fields, such as system health monitoring, event detection in sensor networks, and detecting eco-system disturbances. In detecting anomalous subsequences from data streams, the main challenge for the existing techniques is...
In this paper, we introduce a new method, called EWAT+, for finding discords in time series databases. The proposed method takes full advantage of WAT, the discord discovery algorithm proposed by Fu et al., with major improvements based on new discord measure functions which help to set up a range of alternative good orderings for the outer loop of the discord discovery algorithm. In addition, we...
The problem of similarity measure for time series has attracted considerable research interest. Most of the recently used algorithms utilize the Dynamic Time Warping (DTW) distance for measuring the similarity of time series, in various areas such as science, medicine, industry, and finance. DTW is a considerably more robust distance measure for time series, which allows similar shapes to match even...
Identifying outliers is a difficult task in data mining. We adopt the notion of deviants for outliers in data streams. Deviants are the set of data points whose removal from the data sequence minimizes the sum of squared errors (SSE). We present the DDA algorithm to detect deviants over massive data streams. With this algorithm, the histogram can determine the deviants more accurately and greatly reduce error.
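The notion of deviants in the abstract above can be illustrated with a brute-force sketch. This is not the DDA algorithm, which the abstract does not describe in detail. It assumes the simplest possible representation, a single bucket summarized by its mean, and an exhaustive search that is only practical for tiny inputs; the names `sse` and `find_deviants` are hypothetical.

```python
from itertools import combinations


def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def find_deviants(values, k):
    """Return (indices, error) for the k points whose removal minimizes SSE.

    Exhaustive search over all k-subsets; an illustrative assumption,
    not a streaming or histogram-based method.
    """
    best_idx, best_err = None, float("inf")
    for idx in combinations(range(len(values)), k):
        removed = set(idx)
        rest = [v for i, v in enumerate(values) if i not in removed]
        err = sse(rest)
        if err < best_err:
            best_idx, best_err = idx, err
    return best_idx, best_err
```

For example, `find_deviants([1, 1, 1, 10, 1, 1], 1)` selects index 3, since removing the value 10 leaves a constant sequence with zero error.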