2016 IEEE International Conference on Big Data (Big Data)

chapter

Mix ‘n’ match multi-engine analytics

Katerina Doka, Nikolaos Papailiou, Victor Giannakouris, Dimitrios Tsoumakos, et al.

2016 IEEE International Conference on Big Data (Big Data) > 194 - 203

Current platforms fail to efficiently cope with the data and task heterogeneity of modern analytics workflows due to their adherence to a single data and/or compute model. As a remedy, we present IReS, the Intelligent Resource Scheduler for complex analytics workflows executed over multi-engine environments. IReS is able to optimize a workflow with respect to a user-defined policy relying on cost and...

chapter

Harmonization of methods to facilitate reproducibility in medical data processing: Applications to diffusion tensor magnetic resonance imaging

Jeffrey Jenkins, Lin-Ching Chang, Elizabeth Hutchinson, M. Okan Irfanoglu, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3992 - 3994

Data and methodology sharing is essential for the progress of scientific research. Several research groups have built tools for medical big data (MBD) processing applicable to Diffusion Tensor MRI (DTI) processing pipelines. In this paper, we propose a framework enabling methodology sharing (i.e. harmonization) to facilitate reproducibility in DTI processing.

chapter

Cache-oblivious loops based on a novel space-filling curve

Christian Bohm, Martin Perdacher, Claudia Plant

2016 IEEE International Conference on Big Data (Big Data) > 17 - 26

Modern microprocessors offer a rich memory hierarchy including various levels of cache and registers. Some of these memories (like main memory, L3 cache) are big but slow and shared among all cores. Others (registers, L1 cache) are fast and exclusively assigned to a single core but small. Only if the data accesses have high locality can we avoid excessive data transfers between the memory hierarchy...

chapter

Phishing detection based on newly registered domains

Xueni Li, Guanggang Geng, Zhiwei Yan, Yong Chen, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3685 - 3692

Phishing is a security attack that involves creating websites that mimic legitimate ones, and these fraudulent websites cause Internet users substantial losses. Traditional anti-phishing methods usually work passively, relying on reports submitted by users. Because the survival time of phishing sites keeps growing shorter, such methods are not efficient enough to find and take down new phishing attacks...

chapter

Drug target path discovery on semantic biomedical big data

Fang Du, Ting Li, Yingjie Shi, Lijuan Song, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3381 - 3386

Systems chemical biology integrates chemistry, biology, and computational tools into a whole system, which can help researchers study in depth the interactions and relationships among small molecules, such as genes, proteins, targets, compounds and so on. With systems chemical biology, researchers can concentrate on new ways of drug discovery, including drug target path discovery, which can not only help...

chapter

Distributed rank-1 dictionary learning: Towards fast and scalable solutions for fMRI big data analytics

Milad Makkie, Xiang Li, Tianming Liu, Shannon Quinn, et al.

2016 IEEE International Conference on Big Data (Big Data) > 3396 - 3403

The use of functional brain imaging for research and diagnosis has benefitted greatly from the recent advancements in neuroimaging technologies, as well as the explosive growth in size and availability of fMRI data. While it has been shown in literature that using multiple and large scale fMRI datasets can improve reproducibility and lead to new discoveries, the computational and informatics systems...

chapter

Swarm Intelligence (SI) based profiling and scheduling of big data applications

Thamarai Selvi Somasundaram, Kannan Govindarajan, Vivekanandan Suresh Kumar

2016 IEEE International Conference on Big Data (Big Data) > 1875 - 1880

Personalization targets a user's software, hardware, and QoS requirements at any given moment in the cloud environment for big data applications. Individualization, however, aims to target the daily needs of an individual user in a dynamic manner. The proposed research work aims to design a system that will be able to optimize a user's applications towards a specified target goal. Furthermore,...

chapter

Event segmentation using MapReduce based big data clustering

M. Omair Shafiq

2016 IEEE International Conference on Big Data (Big Data) > 1857 - 1866

Event segmentation is an important step in monitoring and management applications that categorizes different events into different segments. This is especially important when the applications to be monitored and managed are large-scale, comprehensive, and data-intensive in nature. The process of segmentation is based on data clustering, which is one of the key data mining methods used these days. There...

chapter

Data quality: Experiences and lessons from operationalizing big data

Archana Ganapathi, Yanpei Chen

2016 IEEE International Conference on Big Data (Big Data) > 1595 - 1602

Data quality issues pose a significant barrier to operationalizing big data. They pertain to the meaning of the data, the consistency of that meaning, the human interpretation of results, and the contexts in which the results are used. Data quality issues arise after organizations have moved past clear-cut technical solutions to early bottlenecks in using data. Left unaddressed, such issues can and...

chapter

UStore: An optimized storage system for enterprise data warehouses at UnionPay

Hongfeng Chai, Hao Liu, Xibo Zhou, Yanjun Xu, et al.

2016 IEEE International Conference on Big Data (Big Data) > 1574 - 1578

UnionPay's inter-bank transaction settlement platform (ITSP) generates a huge amount of bankcard transaction data every day, recording different bankcard activities. In order to unleash the business value of these data, UnionPay has built a customized data warehouse based on Hadoop to manage and query the massive data imported from ITSP. However, the original system suffers from low storage utilization...

chapter

Hidden Markov based anomaly detection for water supply systems

Zahra Zohrevand, Uwe Glasser, Hamed Yaghoubi Shahir, Mohammad A. Tayebi, et al.

2016 IEEE International Conference on Big Data (Big Data) > 1551 - 1560

Considering the fact that fully immunizing critical infrastructure such as water supply or power grid systems against physical and cyberattacks is not feasible, it is crucial for every public or private sector to invigorate the detective, predictive, and preventive mechanisms to minimize the risk of disruptions, resource loss or damage. This paper proposes a methodical approach to situation analysis...

chapter

Identifying dynamic changes with noisy labels in spatial-temporal data: A study on large-scale water monitoring application

Xiaowei Jia, Xi Chen, Anuj Karpatne, Vipin Kumar

2016 IEEE International Conference on Big Data (Big Data) > 1328 - 1333

The need for effective change detection is ever growing with more emerging large-scale spatial-temporal datasets that contain gridded time series data. To detect meaningful changing events with respect to our desired characteristics, in this paper we focus on the post-classification change detection problem which aims to apply change detection techniques on the time series of classification outputs...

chapter

Persistent cascades: Measuring fundamental communication structure in social networks

Steven Morse, Marta C. Gonzalez, Natasha Markuzon

2016 IEEE International Conference on Big Data (Big Data) > 969 - 975

We define a new structural property of large-scale communication networks consisting of the persistent patterns of communication among users. We term these patterns “persistent cascades,” and claim they represent a strong estimate of actual information spread. Using metrics of inexact tree matching, we group these cascades into classes which we then argue represent the communication structure of a...

chapter

Labeling actors in multi-view social networks by integrating information from within and across multiple views

Ngot Bui, Thanh Le, Vasant Honavar

2016 IEEE International Conference on Big Data (Big Data) > 616 - 625

Real world social networks typically consist of actors (individuals) that are linked to other actors or different types of objects via links of multiple types. Different types of relationships induce different views of the underlying social network. We consider the problem of labeling actors in such multi-view networks based on the connections among them. Given a social network in which only a subset...

chapter

Estimation of local subgraph counts

Nesreen K. Ahmed, Theodore L. Willke, Ryan A. Rossi

2016 IEEE International Conference on Big Data (Big Data) > 586 - 595

Graphlets represent small induced subgraphs and are becoming increasingly important for a variety of applications. Despite the importance of the local subgraph (graphlet) counting problem, existing work focuses mainly on counting graphlets globally over the entire graph. These global counts have been used for tasks such as graph classification as well as for understanding and summarizing the fundamental...

chapter

A comparison of general-purpose distributed systems for data processing

Jinfeng Li, James Cheng, Yunjian Zhao, Fan Yang, et al.

2016 IEEE International Conference on Big Data (Big Data) > 378 - 383

General-purpose distributed systems for data processing have become popular in recent years due to the high demand from industry for big data analytics. However, there is a lack of comprehensive comparison among these systems and of detailed analysis of their performance. In this paper, we conduct an extensive performance study on four state-of-the-art general-purpose distributed computing systems. Our results...

chapter

Java thread and process performance for parallel machine learning on multicore HPC clusters

Saliya Ekanayake, Supun Kamburugamuve, Pulasthi Wickramasinghe, Geoffrey C. Fox

2016 IEEE International Conference on Big Data (Big Data) > 347 - 354

The growing use of Big Data frameworks on large machines highlights the importance of performance issues and the value of High Performance Computing (HPC) technology. This paper looks carefully at three major frameworks, Spark, Flink, and Message Passing Interface (MPI), both in scaling across nodes and internally over the many cores inside modern nodes. We focus on the special challenges of the Java...

chapter

Understanding computational web archives research methods using research objects

Emily Maemura, Christoph Becker, Ian Milligan

2016 IEEE International Conference on Big Data (Big Data) > 3250 - 3259

The use of computational methods for exploration and analysis of web archive sources is emerging in new disciplines such as digital humanities. This raises urgent questions about how such research projects process web archival material using computational methods to construct their findings. This paper aims to enable web archives scholars to document their practices systematically to improve the transparency...

chapter

Appraising digital archives with Archivematica

Michael Shallcross

2016 IEEE International Conference on Big Data (Big Data) > 3272 - 3276

The Bentley Historical Library, funded by a generous grant from the Andrew W. Mellon Foundation, has developed a new Appraisal and Arrangement tab in the Archivematica digital preservation system as part of its “ArchivesSpace-Archivematica-DSpace Workflow Integration” project. This new functionality permits users to conduct large-scale appraisal of digital archives as part of a largely automated workflow...

chapter

Using big data to enhance the Bosch production line performance: A Kaggle challenge

Ankita Mangal, Nishant Kumar

2016 IEEE International Conference on Big Data (Big Data) > 2029 - 2035

This paper describes our approach to the Bosch production line performance challenge run by Kaggle.com. Maximizing the production yield is at the heart of the manufacturing industry. At the Bosch assembly line, data is recorded for products as they progress through each stage. Data science methods are applied to this huge data repository consisting of records of tests and measurements made for each component...
