2015 IEEE International Conference on Big Data (Big Data)

Using Word2Vec to process big text data

Long Ma, Yanqing Zhang

2015 IEEE International Conference on Big Data (Big Data) > 2895 - 2897

Big data refers to broad data sets that are used in many fields. Processing a huge data set is time-consuming, not only because of its sheer volume but also because its data types and structures can be diverse and complex. Currently, many data mining and machine learning techniques are being applied to big data problems; some of them can construct a good learning algorithm in terms...
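Word2Vec's skip-gram model is trained on (center word, context word) pairs drawn from a sliding window over the text. The minimal sketch below only generates those training pairs and trains no vectors. The fixed window size and the function name `skipgram_pairs` are illustrative assumptions (the reference implementation samples the window size per word), and the sketch does not reflect the paper's own pipeline.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        # Clamp the window to the sentence boundaries.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    sentence = "big data needs scalable text processing".split()
    for center, context in skipgram_pairs(sentence, window=1):
        print(center, "->", context)
```

At big-data scale, these pairs are generated on the fly while streaming the corpus rather than materialized as a list.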

Mining the relation between dorm arrangement and student performance

Man Li, Ruisheng Shi

2015 IEEE International Conference on Big Data (Big Data) > 2344 - 2347

This paper discusses the relation between dorm arrangement and student performance. The k-means algorithm, an unsupervised learning method, is the main tool used in the analysis. Students are grouped into several clusters according to the similarity of their performance scores, and the clustering result is analyzed by comparing it with the actual dorm arrangement. In the end, drawbacks...
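A minimal k-means (Lloyd's algorithm) sketch for grouping students by score vectors, followed by one possible comparison with dorm assignments. Several parts are assumptions: each student is a tuple of numeric scores, and the first k points seed the centroids (deterministic, unlike random or k-means++ seeding). The `dorm_agreement` metric is also hypothetical, and the sample data are made up. The paper's actual features and comparison method are not specified in the abstract.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iters: int = 100) -> List[int]:
    """Lloyd's algorithm; returns a cluster label for each point."""
    # Deterministic seeding with the first k points (assumption, for reproducibility).
    centroids = [tuple(map(float, p)) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def dorm_agreement(labels: Sequence[int], dorms: Sequence[str]) -> float:
    """Hypothetical metric: share of students who fall in their dorm's majority cluster."""
    by_dorm: Dict[str, List[int]] = {}
    for lab, dorm in zip(labels, dorms):
        by_dorm.setdefault(dorm, []).append(lab)
    matched = sum(Counter(labs).most_common(1)[0][1] for labs in by_dorm.values())
    return matched / len(labels)


if __name__ == "__main__":
    scores = [(90, 85), (88, 92), (60, 55), (58, 62), (91, 89), (57, 59)]
    dorms = ["A", "A", "B", "B", "A", "B"]
    labels = kmeans(scores, k=2)
    print(labels, dorm_agreement(labels, dorms))
```

A value near 1.0 means that students sharing a dorm also tend to share a performance cluster. A value near the share of the largest cluster suggests little relation.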

An optimized interestingness hotspot discovery framework for large gridded spatio-temporal datasets

Fatih Akdag, Christoph F. Eick

2015 IEEE International Conference on Big Data (Big Data) > 2010 - 2019

We define interestingness hotspots as contiguous regions in space which are interesting based on a domain expert's notion of interestingness captured by an interestingness function. This paper centers on finding interestingness hotspots on very large gridded datasets which are quite common in scientific computing. Mining large gridded datasets with a lot of variables and measurements requires a scalable...
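One simple way to grow a contiguous region on a grid is seeded region growing. It starts from one cell and keeps adding 4-connected neighbors while the region's interestingness stays at or above a threshold. The interestingness function used here (mean cell value), the threshold, and the name `grow_hotspot` are hypothetical placeholders for a domain expert's function. This greedy sketch is not the paper's optimized framework.

```python
from typing import Callable, List, Set, Tuple

Cell = Tuple[int, int]


def grow_hotspot(grid: List[List[float]], seed: Cell,
                 interestingness: Callable[[List[float]], float],
                 threshold: float) -> Set[Cell]:
    """Greedily grow a 4-connected region from `seed` while it stays interesting."""
    rows, cols = len(grid), len(grid[0])
    region = {seed}
    frontier = [seed]
    while frontier:
        r, c = frontier.pop()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                candidate = region | {(nr, nc)}
                values = [grid[i][j] for i, j in candidate]
                # Keep the neighbor only if the enlarged region remains interesting.
                if interestingness(values) >= threshold:
                    region = candidate
                    frontier.append((nr, nc))
    return region


def mean(values: List[float]) -> float:
    """Placeholder interestingness function: the average cell value."""
    return sum(values) / len(values)
```

Recomputing the function over the whole candidate region at every step is quadratic. This is the kind of cost a scalable framework on very large grids would need to avoid.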

In-situ analytics for tomographic imaging in sensor network

Goutham Kamath, Wen-Zhan Song

2015 IEEE International Conference on Big Data (Big Data) > 2173 - 2176

In both industry and academia, seismic exploration does not yet have the capability of illuminating the physical dynamics at high resolution and in real time. The major bottleneck in real-time monitoring today is transferring large volumes of raw data for post-processing. Although the computation capacity and sampling rate of sensors have increased exponentially, we still face challenges in terms...

Detecting rumor patterns in streaming social media

Shihan Wang, Takao Terano

2015 IEEE International Conference on Big Data (Big Data) > 2709 - 2715

Rumor detection in streaming social media is a significant but challenging problem. In this paper, we present a method to identify rumor patterns in the streaming social media environment. Patterns that combine both structural and behavioral properties of rumors are first proposed to distinguish false rumors from valid news. A novel graph-based pattern matching algorithm is also described to detect...

KeyLabel algorithms for keyword search in large graphs

Yue Wang, Ke Wang, Ada Wai-Chee Fu, Raymond Chi-Wing Wong

2015 IEEE International Conference on Big Data (Big Data) > 857 - 864

Graph keyword search is the process of extracting, from a graph, small subgraphs that contain a set of query keywords. This problem is challenging because there are many constraints, including distance, keyword, search time, index size, and memory constraints, while data sizes are growing very rapidly. Existing greedy algorithms guarantee...
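To illustrate the distance constraint, the brute-force baseline below runs one multi-source BFS per keyword. Each BFS starts from every node carrying that keyword, and the function reports the nodes lying within `d` hops of all query keywords. This is a minimal sketch of the problem itself, not the KeyLabel algorithms. The adjacency-list graph, the node labels, and the hop bound `d` are assumptions.

```python
from collections import deque
from typing import Dict, List, Set


def bfs_dist(adj: Dict[int, List[int]], sources: Set[int]) -> Dict[int, int]:
    """Hop distance from the nearest source to every reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def keyword_roots(adj: Dict[int, List[int]], labels: Dict[int, Set[str]],
                  keywords: List[str], d: int) -> Set[int]:
    """Nodes within d hops of at least one node holding each query keyword."""
    roots = set(adj)
    for kw in keywords:
        holders = {n for n, kws in labels.items() if kw in kws}
        dist = bfs_dist(adj, holders)
        # Intersect: a root must satisfy the distance bound for every keyword.
        roots &= {n for n, hops in dist.items() if hops <= d}
    return roots
```

Each query costs one full BFS per keyword. That cost is what index-based approaches try to trade for precomputed labels, within an index size budget.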

An interactive learning framework for scalable classification of pathology images

Michael Nalisnik, David A Gutman, Jun Kong, Lee A D Cooper

2015 IEEE International Conference on Big Data (Big Data) > 928 - 935

Recent advances in microscopy imaging and genomics have created an explosion of patient data in the pathology domain. Whole-slide images (WSIs) of tissues can now capture disease processes as they unfold in high resolution, recording the visual cues that have been the basis of pathologic diagnosis for over a century. Each WSI contains billions of pixels and up to a million or more microanatomic objects...

Join algorithms on GPUs: A revisit after seven years

Ran Rui, Hao Li, Yi-Cheng Tu

2015 IEEE International Conference on Big Data (Big Data) > 2541 - 2550

Implementing database operations on parallel platforms has gained a lot of momentum in the past decade. A number of studies have shown the potential of using GPUs to speed up database operations. In this paper, we present empirical evaluations of a state-of-the-art work published in SIGMOD'08 on GPU-based join processing. In particular, this work presents four major join algorithms and a number of join-related...
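Hash join is a standard equi-join algorithm with a build phase and a probe phase. The sequential CPU sketch below shows only those two phases as a reference point. Which algorithms the evaluated work covers is elided above, so this is not a claim about them. The function name and sample relations are assumptions.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Tuple[Any, ...]


def hash_join(build: List[Row], probe: List[Row], bkey: int, pkey: int) -> List[Row]:
    """Equi-join: hash the build relation on column bkey, then probe on column pkey."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in build:  # build phase: bucket rows by join key
        table[row[bkey]].append(row)
    out: List[Row] = []
    for row in probe:  # probe phase: emit one output row per matching build row
        for match in table.get(row[pkey], []):
            out.append(match + row)
    return out
```

The smaller relation is normally chosen as the build side so that its hash table fits in fast memory. On a GPU, the build and probe loops would be parallelized across threads, and that mapping is not shown here.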

Scalable k-NN based text clustering

Alessandro Lulli, Thibault Debatty, Matteo Dell'Amico, Pietro Michiardi, et al.

2015 IEEE International Conference on Big Data (Big Data) > 958 - 963

Clustering items using textual features is an important problem with many applications, such as root-cause analysis of spam campaigns, as well as identifying common topics in social media. Due to the sheer size of such data, algorithmic scalability becomes a major concern. In this work, we present our approach for text clustering that builds an approximate k-NN graph, which is then used to compute...
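The sketch below builds an exact k-NN graph over short texts by brute force, using Jaccard similarity between word sets. It is quadratic in the number of documents and only stands in for the approximate construction the abstract describes. The similarity measure, tokenization, and function names are assumptions, and the clustering step that follows in the paper is elided above and not reproduced.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def knn_graph(docs: List[str], k: int) -> Dict[int, List[int]]:
    """Map each document index to its k most similar other documents (exact, O(n^2))."""
    sets = [set(d.lower().split()) for d in docs]
    graph: Dict[int, List[int]] = {}
    for i, si in enumerate(sets):
        scored = [(jaccard(si, sj), j) for j, sj in enumerate(sets) if j != i]
        # Highest similarity first; ties broken by smaller index.
        scored.sort(key=lambda t: (-t[0], t[1]))
        graph[i] = [j for _, j in scored[:k]]
    return graph
```

Approximate methods avoid comparing every pair, which is what makes the graph construction feasible at the data sizes the abstract mentions.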

A novel initialization method for particle swarm optimization-based FCM in big biomedical data

Chanpaul J. Wang, Hua Fang, Chonggang Wang, Mahmoud Daneshmand, et al.

2015 IEEE International Conference on Big Data (Big Data) > 2942 - 2944

Empirical studies show that the random initialization used in Particle Swarm Optimization (PSO)-based Fuzzy c-means (FCM) methods affects computational performance, especially on big data. Because data points in high-density areas are more likely to lie near the cluster centroids, we design a new algorithm to guide the initialization according to the data density patterns. Our algorithm is initialized...
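The following is a hypothetical density heuristic in the spirit of the abstract. It scores each point by how many other points lie within radius `r`, then takes the densest points as initial centers, skipping any point closer than `min_sep` to an already chosen center. The radius, the separation rule, and the name `density_init` are illustrative assumptions, not the authors' algorithm, and the PSO and FCM steps themselves are omitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def density_init(points: Sequence[Point], c: int, r: float, min_sep: float) -> List[Point]:
    """Pick up to c initial centers from high-density, mutually separated points."""
    # Density = number of other points within radius r (brute force, O(n^2)).
    density = [sum(1 for q in points if math.dist(p, q) <= r) - 1 for p in points]
    # Densest first; sorted() is stable, so ties keep their input order.
    order = sorted(range(len(points)), key=lambda i: -density[i])
    centers: List[Point] = []
    for i in order:
        if all(math.dist(points[i], ctr) >= min_sep for ctr in centers):
            centers.append(points[i])
        if len(centers) == c:
            break
    return centers
```

The separation check keeps several centers from landing in the same dense region, a known failure mode of picking the top-density points alone.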

Data optimised computing for heterogeneous big data computing applications

Erica Yang, Derek Ross, Srikanth Nagella, Martin Turner, et al.

2015 IEEE International Conference on Big Data (Big Data) > 2817 - 2819

The rise of big science techniques is reshaping the provisioning of computing resources and scientific software in large science facilities. As facilities gear up for data-intensive computing infrastructure, a wave of facility-based big science computing platforms is emerging. This paper presents a new computing paradigm for designing HPC data analysis platforms, named Data Optimised Computing...

Discovering time-evolving influence from dynamic heterogeneous graphs

Chuan Hu, Huiping Cao

2015 IEEE International Conference on Big Data (Big Data) > 2253 - 2262

Influence among objects is prevalent in graph-structured data. However, most existing research efforts detect influence among objects from snapshots of homogeneous graphs. In this paper, we study a new problem of detecting time-evolving influence among objects from dynamic heterogeneous graphs. We propose a probabilistic graphical model, Time-evolving Influence Model (TIM), to capture the temporal...

Genomic analysis with MapReduce

Wei Yi Liu, Hui-I Hsiao, Shih Yao Dai

2015 IEEE International Conference on Big Data (Big Data) > 1330 - 1335

Genomic analysis [1] usually consists of a pipeline of three stages: sequence alignment, data conversion, and advanced analysis. The analysis pipeline needs to handle hundreds of gigabytes of data and run complex analytics algorithms, which traditionally requires long execution times (20+ hours) for a full-genome analysis. Parallelizing the execution of analytics algorithms is one way to speed...

Spatio-temporal queries in HBase

Xiaoying Chen, Chong Zhang, Bin Ge, Weidong Xiao

2015 IEEE International Conference on Big Data (Big Data) > 1929 - 1937

Geoscience gives insights into our surroundings and benefits many aspects of our lives. Nowadays, with massive numbers of sensors deployed to measure all kinds of environmental parameters, tens of billions or even trillions of sensor readings are collected and need to be analyzed for surveillance or other purposes. From many perspectives, users always issue queries according to specific spatial and temporal predicates...
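A common way to make spatial predicates range-scannable in a sorted key-value store such as HBase is to interleave the bits of grid coordinates into a Z-order (Morton) code. That code is then combined with a time bucket in the row key. The key layout below (hour bucket first, then Z-order cell, both fixed-width hex) and the function names are illustrative assumptions. The abstract does not say which scheme the paper uses.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) code: bit i of x goes to position 2i, bit i of y to 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def row_key(x: int, y: int, timestamp: int, bucket_seconds: int = 3600) -> str:
    """Hypothetical row key: time bucket, then Z-order cell, both as fixed-width hex."""
    bucket = timestamp // bucket_seconds
    return f"{bucket:08x}-{interleave_bits(x, y):08x}"
```

Because both fields are fixed-width lowercase hex, lexicographic row-key order matches numeric order. A time range therefore maps to one contiguous scan, and nearby grid cells tend to share key prefixes within each bucket.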

Fast detection of material deformation through structural dissimilarity

Daniela Ushizima, Talita Perciano, Dilworth Parkinson

2015 IEEE International Conference on Big Data (Big Data) > 2775 - 2781

Designing materials that are resistant to extreme temperatures and brittleness relies on assessing structural dynamics of samples. Algorithms are critically important to characterize material deformation under stress conditions. Here, we report on our design of coarse-grain parallel algorithms for image quality assessment based on structural information and on crack detection of gigabyte-scale experimental...
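Structural dissimilarity is commonly derived from the SSIM index as DSSIM = (1 - SSIM) / 2. The sketch computes a single global SSIM over two equal-length pixel lists, whereas practical implementations average SSIM over local windows. It uses the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 for dynamic range L. That the paper uses exactly this formulation is an assumption.

```python
from statistics import fmean
from typing import Sequence


def global_ssim(a: Sequence[float], b: Sequence[float], dynamic_range: float = 255.0) -> float:
    """SSIM computed once over whole images given as flat, equal-length pixel lists."""
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    mu_a, mu_b = fmean(a), fmean(b)
    var_a = fmean([(x - mu_a) ** 2 for x in a])
    var_b = fmean([(y - mu_b) ** 2 for y in b])
    cov = fmean([(x - mu_a) * (y - mu_b) for x, y in zip(a, b)])
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )


def dssim(a: Sequence[float], b: Sequence[float], dynamic_range: float = 255.0) -> float:
    """Structural dissimilarity: 0 for identical images, larger as structure diverges."""
    return (1.0 - global_ssim(a, b, dynamic_range)) / 2.0
```

Applied to successive frames of a sample under stress, a jump in DSSIM would flag a structural change worth inspecting. The windowed version localizes where that change occurs.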

Visual analysis of large-scale LiDAR point clouds

Wanbo Luo, Hui Zhang

2015 IEEE International Conference on Big Data (Big Data) > 2487 - 2492

In this study we analyzed a series of LiDAR point clouds acquired over Taijiang district (part of Fujian province, China). The objective was to detect and extract water surface areas from individual LiDAR point clouds in parallel. To this end, interactive visualization of fine-grained data, global cluster algorithms, and statistical investigation were applied. We first rasterized point clouds...
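Rasterizing a point cloud typically means binning the (x, y) coordinates into grid cells and reducing the z values within each cell. The sketch keeps the minimum elevation per cell. The cell size, the choice of the minimum as the reduction, and the function name are assumptions, because the abstract is truncated before it describes the authors' rasterization.

```python
import math
from typing import Dict, Iterable, Tuple


def rasterize(points: Iterable[Tuple[float, float, float]],
              cell: float) -> Dict[Tuple[int, int], float]:
    """Map each (col, row) grid cell to the lowest z among the points falling in it."""
    grid: Dict[Tuple[int, int], float] = {}
    for x, y, z in points:
        # floor() keeps negative coordinates in the correct cell.
        key = (math.floor(x / cell), math.floor(y / cell))
        grid[key] = min(z, grid.get(key, math.inf))
    return grid
```

Cells without any returns are simply absent from the dictionary, so gaps in coverage are easy to detect. Because each point is handled independently, the binning also partitions naturally for parallel processing.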

Practical message-passing framework for large-scale combinatorial optimization

Inho Cho, Soya Park, Sejun Park, Dongsu Han, et al.

2015 IEEE International Conference on Big Data (Big Data) > 24 - 31

Graphical Model (GM) has provided a popular framework for big data analytics because it often lends itself to distributed and parallel processing by utilizing graph-based ‘local’ structures. It models correlated random variables; in particular, max-product Belief Propagation (BP) is the most popular heuristic for computing the most likely assignment in GMs. In the past years, it has been proven...
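On a chain-structured model, max-product BP is exact and reduces to dynamic programming. The log-domain (max-sum) sketch below passes messages left to right and then backtracks to recover the most likely assignment. The potentials and the function name are made-up inputs for illustration. The paper's framework targets large-scale combinatorial optimization on general graphs, where BP is a heuristic, rather than only chains.

```python
from typing import List


def max_product_chain(unary: List[List[float]], pairwise: List[List[float]]) -> List[int]:
    """MAP assignment for a chain model using log-potentials (max-sum form of max-product).

    unary[i][s]: log-potential of variable i taking state s.
    pairwise[s][t]: log-potential shared by every neighboring pair in states (s, t).
    """
    n, k = len(unary), len(unary[0])
    msg = list(unary[0])  # best score of any prefix ending in each state
    back: List[List[int]] = []
    for i in range(1, n):
        new_msg, arg = [], []
        for t in range(k):
            # Best predecessor state for variable i taking state t.
            best = max(range(k), key=lambda s: msg[s] + pairwise[s][t])
            arg.append(best)
            new_msg.append(msg[best] + pairwise[best][t] + unary[i][t])
        back.append(arg)
        msg = new_msg
    # Backtrack from the best final state.
    states = [max(range(k), key=lambda s: msg[s])]
    for arg in reversed(back):
        states.append(arg[states[-1]])
    return states[::-1]
```

Working with sums of log-potentials instead of products of potentials avoids numerical underflow on long chains. The argmax is unchanged because the logarithm is monotonic.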

Robust and distributed web-scale near-dup document conflation in microsoft academic service

Chieh-Han Wu, Yang Song

2015 IEEE International Conference on Big Data (Big Data) > 2606 - 2611

In modern web-scale applications that collect data from different sources, entity conflation is a challenging task due to various data quality issues. In this paper, we propose a robust and distributed framework to perform conflation on noisy data in the Microsoft Academic Service dataset. Our framework contains two major components. In the offline component, we train a GBDT model to determine whether...

Top (k1, k2) Distance-based outliers detection in an uncertain dataset

Fei Liu, Yan Jia

2015 IEEE International Conference on Big Data (Big Data) > 2290 - 2299

In this paper, we focus on distance-based outlier detection in uncertain datasets, which is very useful in large social networks. Based on the x-tuple model and possible world semantics, we propose the concepts of tuple outlier score, top-k probability, and top (k1, k2) distance-based outlier. We then design an algorithm using a dynamic programming technique to calculate tuple outlier scores and...
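For certain (non-probabilistic) data, a common distance-based outlier score is the distance from a point to its k-th nearest neighbor. Under that score, the top-n outliers are the points with the largest values. The sketch uses only this deterministic definition and is a simplified baseline. The paper's tuple outlier score under the x-tuple model and possible world semantics is different and is not reproduced here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def knn_outlier_scores(points: Sequence[Point], k: int) -> List[float]:
    """Score each point by the distance to its k-th nearest other point (brute force)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_n_outliers(points: Sequence[Point], k: int, n: int) -> List[int]:
    """Indices of the n points with the largest k-NN distance."""
    scores = knn_outlier_scores(points, k)
    return sorted(range(len(points)), key=lambda i: -scores[i])[:n]
```

In the uncertain setting, each tuple has alternative instances with probabilities. A score like this would then have to be evaluated across possible worlds rather than computed once.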

Open research challenges with Big Data — A data-scientist's perspective

Sreenivas R. Sukumar

2015 IEEE International Conference on Big Data (Big Data) > 1272 - 1278

In this paper, we discuss data-driven discovery challenges of the Big Data era. We observe that recent innovations in being able to collect, access, organize, integrate, and query massive amounts of data from a wide variety of data sources have brought statistical data mining and machine learning under more scrutiny and evaluation for gleaning insights from the data than ever before. In that context,...
