The large-scale integration of electric vehicles (EVs) into the future smart grid may cause serious power quality problems. However, EVs can also provide ancillary services for the power system as distributed energy resources through vehicle-to-grid (V2G) technology. Fast and accurate prediction of the schedulable capacity (SC) of EVs is the first step toward enabling this benefit. In this paper, two different...
Privacy is an important issue for big data that includes sensitive attributes. Sharing or publishing such data directly leads to privacy breaches. To overcome this problem, previous studies focused on developing big data anonymization techniques in the Hadoop environment. Compared to Hadoop, Spark makes it easier to develop faster applications by keeping data in memory...
Big data analytics gives enterprises new opportunities to improve their management and manufacturing. A solution with a case study is proposed to accomplish deep-level quality management based on big data analytics. First, the implementation of big data analytics based on industrial process data is illustrated with a case study. Through the analysis and feature extraction of off-line...
The need for smart information retrieval systems is at odds with the difficulty of handling huge amounts of data. In this paper we present a Big Data Analytics architecture used to implement a semantic similarity search tool for natural language texts in the biomedical domain. The implemented methodology is based on Word Embeddings (WEs) models obtained using the word2vec algorithm. The system has...
Because of the imbalanced distribution of business data, missing user features, and many other factors, applying big data techniques directly to real-world business data tends to deviate from business goals. It is difficult to model insurance business data with classification algorithms such as Logistic Regression and SVM. This paper exploits a heuristic bootstrap sampling approach combined with...
Due to its simplicity and scalability, MapReduce has become a de facto standard computing model for big data processing. Since the original MapReduce model was only appropriate for embarrassingly parallel batch processing, many follow-up studies have focused on improving the efficiency and performance of the model. Spark follows one of these recent trends by providing in-memory processing capability...
An IPTV video evaluation model based on big data provides a useful basis for IPTV video evaluation. As new media, social networks, the Internet of Things, and cloud computing continue to evolve, video-related big data has emerged. IPTV has also become the choice of more and more users, and IPTV editors struggle to choose the best videos for them. In this paper,...
Big Data can be defined as large data sets generated from different sources such as social media, audio, imaging, website logs, etc. This huge amount of data needs to be processed and analyzed to extract meaningful information, which can be a challenging task. Big data exceeds the processing capability of traditional databases to capture, manage, and process the voluminous...
Big data is a broad term with numerous dimensions, most notably: big data characteristics, techniques, software systems, application domains, computing platforms, and big data milieu (industry, government, and academia). In this paper we briefly introduce fundamental big data characteristics and then present seven case studies of big data techniques, systems, applications, and platforms, as seen from...
Moving object prediction and indexing are a well-studied area of research, with applications in environment monitoring, traffic prediction, advertising, and efficient routing. Spark is a cluster computing framework that uses Resilient Distributed Datasets (RDD) on a cluster of several commodity machines. Spark is widely used for parallel processing of massive datasets. The modeling...
Data analytics is becoming increasingly important in big data applications. Adaptively subsetting large amounts of data to extract interesting events, such as the centers of hurricanes or thunderstorms, and then statistically analyzing and visualizing the subset is an effective way to analyze ever-growing data. This is particularly crucial for analyzing Earth Science data, such as extreme weather. The Hadoop...
In this paper, we focus on designing an online credit card fraud detection framework with big data technologies, with three major goals: 1) the ability to fuse multiple detection models to improve accuracy, 2) the ability to process large amounts of data, and 3) the ability to perform detection in real time. To accomplish that, we propose a general workflow, which satisfies most...
Topic modeling is a widely used approach for analyzing large text collections. In particular, Latent Dirichlet Allocation (LDA) is one of the most popular topic modeling approaches for aggregating vocabulary from a document corpus into latent "topics". However, learning meaningful topic models from massive collections containing millions of documents and billions of tokens is challenging,...
Spark has grown both in popularity and complexity in recent years. In order to use available resources in an efficient way, users need to understand how the behavior of their applications is affected by the size of the datasets and various configuration settings. Indeed, Spark allows users to specify many configuration parameters and understanding the impact of these choices with respect to the application...
With the explosive growth of semantic data on the Web in recent years, many large-scale RDF knowledge bases with billions of facts are being generated. This poses significant challenges for storing and querying big RDF graphs. Current systems still have many limitations in processing big RDF graphs, including scalability and real-time performance. In this paper, we introduce SparkRDF, an elastic discreted...
Big data is big business, and effective modeling of this data is key. This paper provides a comprehensive multidimensional analysis of various open source tools for machine learning with big data. An evaluation standard is proposed along with detailed comparisons of the frameworks discussed, with regard to algorithm availability, scalability, speed, and more. The major tools profiled are Mahout,...