The continuously growing wealth of data has radically changed the data science landscape. At the same time, Big Data tools have seen important progress in terms of optimising performance and scalability. However, applying them in practical deployment settings is still a challenging task that is highly dependent on the particularities of the data. In this paper, we present our experiences with implementing...
Remote Access Trojans (RATs) provide cyber criminals with unlimited access to infected endpoints. Using the victim's access privileges, they can access and steal sensitive business and personal data, including intellectual property and personally identifiable information. However, due to attack evolution, targeted attacks utilize modified versions of known signatures, which means that IDS rules that...
People sensing data have been successfully utilized in various domains to support a more livable place with on-demand transport systems, a green environment, a profitable economy and interactive governance. However, their potential in supporting the design of places has not been widely studied and explained. As an on-going multidisciplinary project in Singapore, “Livable Places” mines valuable insights from...
Herein we present a novel big-data framework for healthcare applications. Healthcare data is well suited for big-data processing and analytics because of the variety, veracity and volume of these types of data. In recent times, many areas within healthcare have been identified that can directly benefit from such treatment. However, setting up these types of architecture is not trivial. We present a...
Low latency and high availability of an app or a web service are key, amongst other factors, to the overall user experience (which in turn directly impacts the bottom line). Exogenic and/or endogenic factors often give rise to breakouts in cloud data, which makes maintaining high availability and delivering high performance very challenging. Existing breakout detection techniques are not suitable for...
With the enormous amount of data generated through the internet, sensors, and the Internet of Things, it becomes too overwhelming for humans to examine it all. One solution is to reduce the data to a set of statistics. The perspective in this paper is the opposite, namely that most of this data is just background noise, and the interesting parts are those that deviate from background noise, the parts that...
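The "deviation from background noise" perspective can be illustrated with a minimal sketch (an assumption of this listing, not the paper's actual method): model the background as a mean and standard deviation, and report only the values that fall far outside it.

```python
import statistics

def deviations(values, threshold=3.0):
    """Return the values that deviate from the 'background noise',
    modeled here as points more than `threshold` standard deviations
    from the mean. A toy illustration, not the paper's technique."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mean) / sd > threshold]
```

Note how this reverses the "reduce to statistics" approach: the statistics are computed only to discard the unremarkable majority of the data.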
Close monitoring of ICU patients is a necessity for health care providers. Prediction of mortality of ICU patients based on the monitored data is an active research area. If the probability of survival (or death) of a patient could be predicted early enough, proper and timely attention could be given to the patient, saving the patient's life. Most of the existing work in this regard tries to predict mortality...
We propose a novel iterative unified clustering algorithm for data with both continuous and categorical variables, in the big data environment. Clustering is a well-studied problem and finds several applications. However, none of the existing big data clustering works discusses the challenge of mixed-attribute datasets, with both categorical and continuous attributes. We study an application in the health care...
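The core difficulty with mixed-attribute data is defining a single dissimilarity measure over both variable types. A common k-prototypes-style formulation (shown here as a hedged illustration, not the paper's algorithm) combines squared Euclidean distance on the continuous attributes with a weighted mismatch count on the categorical ones:

```python
def mixed_distance(x, y, cat_idx, gamma=1.0):
    """Dissimilarity between two mixed-attribute records.
    cat_idx: set of positions holding categorical attributes.
    gamma: weight trading off categorical mismatches against
    continuous squared differences (a tuning assumption)."""
    d = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in cat_idx:
            d += gamma * (a != b)   # 0/1 mismatch for categories
        else:
            d += (a - b) ** 2       # squared Euclidean for numerics
    return d
```

Any centroid-based iterative scheme can then cluster with this measure, using means for continuous attributes and modes for categorical ones.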
This paper introduces three interpolation methods that enrich complex evolving region trajectories that are captured every day from numerous ground-based and space-based solar observatories. The interpolation module takes a trajectory as its input and generates an enriched trajectory with interpolated time-geometry pairs. We created three different interpolation techniques: MBR-Interpolation...
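As a hedged illustration of what interpolating a time-geometry pair might involve (the paper's MBR-Interpolation may differ), one can linearly interpolate each corner of two minimum bounding rectangles observed at different timestamps:

```python
def interpolate_mbr(mbr1, mbr2, t1, t2, t):
    """Linearly interpolate a minimum bounding rectangle
    (x_min, y_min, x_max, y_max) between observations at t1 and t2.
    An illustrative sketch, not necessarily the paper's method."""
    if not t1 <= t <= t2:
        raise ValueError("t must lie within [t1, t2]")
    w = (t - t1) / (t2 - t1)  # fraction of the way from t1 to t2
    return tuple(a + w * (b - a) for a, b in zip(mbr1, mbr2))
```

Enrichment then consists of emitting such interpolated geometries at the timestamps missing from the captured trajectory.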
The introduction of Advanced Metering Infrastructures in electricity networks brings new means of dealing with issues influencing financial margins and system-safety problems, thanks to the information reported continuously by smart meters. One such issue is the detection of Non-Technical Losses (NTLs) in electric power grids. We introduce a data-driven method, called Structure&Detect, to identify...
The k-nearest neighbor (kNN) join has recently attracted considerable attention due to its broad applications. However, processing kNN joins is very expensive due to the quadratic nature of the join operation. Furthermore, since there is an increasing trend of applications to deal with big data, computing kNN joins becomes more challenging. In order to process such big data, parallel and distributed...
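The quadratic cost the abstract refers to is visible in the brute-force formulation of the kNN join, sketched below (an illustration of the problem definition, not the paper's distributed algorithm): every point of R is compared against every point of S.

```python
import heapq

def knn_join(R, S, k):
    """Brute-force kNN join: for each point r in R (tuples of numbers),
    find its k nearest neighbours in S under Euclidean distance.
    Performs |R| * |S| distance computations -- the quadratic cost
    that motivates parallel and distributed approaches."""
    result = {}
    for r in R:
        dists = ((sum((a - b) ** 2 for a, b in zip(r, s)) ** 0.5, s)
                 for s in S)
        result[r] = [s for _, s in heapq.nsmallest(k, dists)]
    return result
```

Distributed schemes typically partition R and S so that each worker only compares a subset of the pairs, trading replication for reduced per-node cost.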
Static Index Pruning is a performance optimization technique for search engines that attempts to identify and remove index postings that are unlikely to lead to top results for typical user queries. The goal is to obtain a much smaller inverted index that can quickly return results that are (almost) as good as those for the unpruned index. We make two contributions: First, we improve on previous results...
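The pruning idea described here can be sketched in a few lines (a toy illustration of the general technique; the paper's actual pruning criterion is more refined): for each term, keep only the top-scoring fraction of postings, on the assumption that the rest rarely contribute to top-ranked results.

```python
def prune_index(index, keep_fraction=0.5):
    """Toy static index pruning: index maps term -> list of
    (doc_id, score) postings. Keep only the highest-scoring
    keep_fraction of each posting list (at least one posting),
    yielding a smaller inverted index."""
    pruned = {}
    for term, postings in index.items():
        ranked = sorted(postings, key=lambda p: p[1], reverse=True)
        keep = max(1, int(len(ranked) * keep_fraction))
        pruned[term] = ranked[:keep]
    return pruned
```

The quality/size trade-off lies entirely in how the per-posting score is estimated; a score that mispredicts which postings reach the top-k degrades result quality.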
Top-k join is an essential tool for data analysis, since it enables selective retrieval of the k best combined results that come from multiple different input datasets. In the context of Big Data, processing top-k joins over huge datasets requires a scalable platform, such as the widely popular MapReduce framework. However, such a solution does not necessarily imply efficient processing, due to inherent...
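A centralized baseline makes the scalability problem concrete (this sketch is an illustration of the operator, not the paper's MapReduce algorithm): scoring every pair from the two inputs grows multiplicatively with their sizes, even though only k results survive.

```python
import heapq

def topk_join(R, S, k, score):
    """Naive top-k join: evaluate score(r, s) for every pair in
    R x S and keep the k highest-scoring combinations, using a
    min-heap of the current best k. Returns (score, r, s) tuples
    in descending score order."""
    heap = []  # min-heap; heap[0] is the weakest of the best k so far
    for r in R:
        for s in S:
            item = (score(r, s), r, s)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heappushpop(heap, item)
    return sorted(heap, reverse=True)
```

Efficient distributed variants avoid materializing the full pair space, e.g. by bounding the best possible score of unseen combinations and pruning early.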
Building Information Modeling needs better strategies for schema interoperability in order to begin solving some of the problems the building industry faces including discrepancies in simulation tool data, missing or incorrect data, and gaps in data sourcing transparency. Addressing these challenges so far has often resulted in further “siloed” translation tools that only work for the few formats...
The in-memory data processing framework, Apache Spark, has been stealing the limelight for low-latency interactive applications, iterative and batch computations. Our early experience study [17] has shown that Apache Spark can be enhanced to leverage advanced features (e.g., RDMA) on high-performance networks (e.g., InfiniBand and RoCE) to improve the performance of the shuffle phase. With the fast evolving...
Big data workflows often require the assembly and exchange of complex, multi-element datasets. For example, in biomedical applications, the input to an analytic pipeline can be a dataset consisting of thousands of images and genome sequences assembled from diverse repositories, requiring a description of the contents of the dataset in a concise and unambiguous form. Typical approaches to creating datasets...
This paper proposes an intercloud brokerage method for system infrastructure deployments of genomic big data analytics workflows. The proposed method utilizes a conjunction of universally quantified atomic formulas to describe requirements given by users, and selects combinations of cloud services based on logical reasoning by the replacement of definite clause sets created from the conjunction of the...
The construction of data analysis infrastructures that handle continuously accumulating data is quickly becoming an essential requirement for many organizations such as the U.S. Department of Energy (DOE). While DOE supports some of the largest computing facilities in the world, new analysis infrastructures like Apache Spark are difficult to implement. In this paper, we propose an on-demand Spark...
Apache Spark enables fast computations and greatly accelerates analytics applications by efficiently utilizing the main memory and caching data for later use. At its core, Apache Spark uses data structures called RDDs (Resilient Distributed Datasets) to give a unified view of the distributed data. However, the data represented in the RDDs remains unencrypted, which can result in leakage of confidential...
Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services like Hive and Spark come pre-configured for general-purpose use and ready to deploy, giving companies a quick entry point and on-demand deployment of ready SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective,...