In recent years, both the volume and the complexity of the data generated by mankind have increased. Since computers have evolved and provide more processing power, it is possible to carry out real-time analysis of big volumes of data. This paper suggests the architecture of a big data processing platform called BigTim, which is able to run clustering...
A new method of pattern analysis, based on paired index comparison, is introduced. Key properties of the method are described. Its effectiveness is demonstrated on the Iris Anderson-Fisher Data.
Nowadays, large volumes of data and measurements are being continuously generated by computer and telecommunication networks, but such volumes make it difficult to extract meaningful knowledge from them. This paper presents SaFe-NeC, an innovative methodology for analyzing network traffic by exploiting data mining techniques, i.e. clustering and classification algorithms, focusing on self-learning...
A distributed system consists of several autonomous nodes. In a distributed system some of the nodes may be overloaded due to a large number of job arrivals while other nodes may remain idle without any processing. The performance of a distributed system depends crucially on dividing up work effectively among the computing nodes. So a way is needed to share load across all the computing nodes. In...
Strengthening Smart Grid functionalities has become a need of the 21st century. Security has evolved into the primary concern at the deployment level of Smart Grids. Cyber security threats and vulnerabilities in the Smart Grid network need to be addressed before the Smart Grid is deployed. Our proposed intrusion detection scheme identifies anomalies in the Smart Grid traffic and detects attacks...
In recent years, many successful machine learning applications have been developed. Classification & Clustering is one of them. This application is cross-disciplinary, since it is based on data mining algorithms on the technical side and on graphemes and morphophonemics on the linguistic side. It will thus map the correspondence between grapheme 〈y〉 and related phonemes via morphemes in a given...
In this paper we evaluate and compare two representative and popular distributed processing engines for large scale big data analytics, Spark and graph based engine GraphLab. We design a benchmark suite including representative algorithms and datasets to compare the performances of the computing engines, from performance aspects of running time, memory and CPU usage, network and I/O overhead. The benchmark...
The standard fuzzy c-means algorithm considers only gray information, and its noise tolerance is poor. To overcome the drawbacks of the traditional fuzzy c-means algorithm, an improved ant colony algorithm is used to optimize fuzzy c-means. A new image segmentation algorithm is then put forward based on the improved fuzzy c-means method. The experiment results show that the proposed...
Brain tumor tissue detection makes it possible to localize a mass of abnormal cells in a Magnetic Resonance (MR) slice. Automating this process is useful for post-processing of the extracted region of interest, such as tumor segmentation. In order to detect this abnormal growth of tissue in an image, this paper presents a novel scheme which uses a two-step procedure; the k-means method and...
In today's digital world, digital data is coming in and going out faster than ever before. This data is of no use until we extract some useful content from it. However, it is impractical and inefficient to use traditional database management techniques on big data. That is why big data technologies like Hadoop came into existence. Hadoop is an open source framework, which can be used to process...
In view of the information available today, recent progress in data mining research has led to the development of various efficient methods for mining interesting patterns in large databases. Data mining plays a vital role in the knowledge discovery process by analyzing huge amounts of data from various sources and summarizing it into useful information. It is helpful for analyzing the volumes of data in different domains...
Spatio-temporal methods are the process of innovation and of finding patterns from knowledge representations through outliers. This kind of data represents (i) the states of an object and (ii) a position or event in space at a particular period of time. Outliers are objects whose attribute values are entirely different from those of their neighbourhood. Always their locations are different even the...
Financial stock data analysis and future prediction in terms of sentiments is a great challenge in big data research. For unlabelled opinions, opinion classification with an unsupervised learning algorithm leads to classification errors because the data is sparse and high-dimensional. To overcome this problem, the sentiment analysis to extract the opinion of each word in the stock data has been...
Based on an investigation of periodic shrew distributed DoS attacks hidden among the flows of an enormous number of normal end-users in cloud computing, this paper proposes a new method that takes frequency-domain characteristics from the autocorrelation sequence of network flow as the clustering feature to group end-user flow data with the BIRCH algorithm, and re-merges these clustering results into new groups by overcoming the deficiency...
Recent advances in the use of computers across different fields of science have produced huge amounts of data. These data serve as an analysis tool and a key to overcoming many problems. Clustering is a primary process for analyzing the data, as well as a preprocessing step before other techniques like classification. Density-based clustering algorithms have advantages like clustering any arbitrary shapes and...
Outlier detection is an important issue in the realm of data mining. Several applications rely on outlier detection, such as intrusion detection, fraud detection, medical and public health data, image processing, etc. Clustering-based outlier detection algorithms are considered the most important outlier detection approaches. They provide a high detection rate; however, they suffer from high false...
With the phenomenal increase in digital data, it is inefficient to run the traditional clustering algorithms on separate servers. To deal with this problem, researchers are migrating to distributed environments to implement the traditional clustering algorithms, more specifically K-means clustering. In traditional K-means clustering, the problem of instability caused by the random initial centers exists...
Restructuring web search results is the best solution for ambiguous queries entered into a search engine. When an ambiguous query is entered, the search engine returns multiple results for the same query, so users do not get specific and accurate information about what they really want, and it becomes difficult for them to find specific information related to the submitted keyword. For this reason...
Box and Tiao described the prior distribution as one that hypothetically represents the knowledge about anonymous constraints prior to the availability of data. It plays a productive role in Bayesian analysis. Further, allotments of such kind also represent former knowledge or relative ignorance [4]. The chance of occurrence or predictability is defined by the term Probability...
Word Sense Disambiguation (WSD) is crucial, and its significance is prominent in every application of computational linguistics. WSD is a challenging problem of Natural Language Processing (NLP). Though many algorithms for WSD are available, little work has been carried out on choosing the optimal algorithm. Three approaches are available for WSD, namely, Knowledge-based approach, Supervised...