2015 IEEE International Conference on Big Data (Big Data)

book

IEEE

chapter

Hotspots of news articles: Joint mining of news text & social media to discover controversial points in news

Ismini Lourentzou, Graham Dyer, Abhishek Sharma, ChengXiang Zhai

2015 IEEE International Conference on Big Data (Big Data) > 2948 - 2950

We propose and study a novel problem of mining news text and social media jointly to discover controversial points in news, which enables many applications such as highlighting controversial points in news articles for readers, revealing controversies in news and their trends over time, and quantifying the controversy of a news source. We design a controversy scoring function to discover the most...

chapter

Improving the quality of semantic relationships extracted from massive user behavioral data

Khalifeh AlJadda, Mohammed Korayem, Trey Grainger

2015 IEEE International Conference on Big Data (Big Data) > 2951 - 2953

As the ability to store and process massive amounts of user behavioral data increases, new approaches continue to arise for leveraging the wisdom of the crowds to gain insights that were previously very challenging to discover by text mining alone. For example, through collaborative filtering, we can learn previously hidden relationships between items based upon users' interactions with them, and...

chapter

Algorithmic content generation for products

Chandra Khatri, Suman Voleti, Sathish Veeraraghavan, Nish Parikh, more

2015 IEEE International Conference on Big Data (Big Data) > 2945 - 2947

Content is one of the most essential parts of products on e-commerce websites such as eBay. It drives not only user engagement but also traffic from search engines, based on its relevance. Generating content for products, however, comes with a wide set of challenges due to the complexity of commerce at scale, and requires new applications in text processing and information extraction...

chapter

Towards a subgraph/supergraph cached query-graph index

Jing Wang, Nikos Ntarmos, Peter Triantafillou

2015 IEEE International Conference on Big Data (Big Data) > 2919 - 2921

Many modern big data applications deal with graph-structured data, such as databases of molecular compounds represented as graphs of atoms and bonds, or “structured interaction networks” in biological and social networks, where nodes refer to entities (proteins, people, etc.) and edges represent their relationships. Central to high-performance graph analytics over such data is locating patterns...

chapter

Factorization machines with follow-the-regularized-leader for CTR prediction in display advertising

Anh-Phuong Ta

2015 IEEE International Conference on Big Data (Big Data) > 2889 - 2891

Predicting ad click-through rates is the core problem in display advertising, which has received much attention from the machine learning community in recent years. In this paper, we present an online learning algorithm for click-through rate prediction, namely Follow-The-Regularized-Factorized-Leader (FTRFL), which incorporates the Follow-The-Regularized-Leader (FTRL-Proximal) algorithm with per-coordinate...

chapter

Integrating semantic knowledge into Tag-LDA model through cloud model

Maoyuan Zhang, Fang Yuan, Jianping Zhu

2015 IEEE International Conference on Big Data (Big Data) > 2907 - 2909

Semantic knowledge is usually added to topic models to improve topic coherence. However, it is hard to judge whether semantic information is related to a topic without using complicated lexical characteristics. In this paper, we demonstrate a novel model called the Cloud Transformation Model, which can easily judge whether semantic information is related to a topic, and integrate semantic information into...

chapter

Using probabilistic approach to joint clustering and statistical inference: Analytics for big investment data

Hua Fang, Honggang Wang, Chonggang Wang, Mahmoud Daneshmand

2015 IEEE International Conference on Big Data (Big Data) > 2916 - 2918

This paper proposes a Contrarian Probabilistic Model (CPM) to evaluate the effectiveness of contrarians' investment in preferred stocks using big data from Tradeline. CPM accommodates the unique features of investment data, which are often correlated, nested, heterogeneous, and non-normal, with missing values. Clustering and statistical inference are integrated in CPM, which enables joint investment...

chapter

Using Word2Vec to process big text data

Long Ma, Yanqing Zhang

2015 IEEE International Conference on Big Data (Big Data) > 2895 - 2897

Big data refers to broad data sets that have been used in many fields. Processing a huge data set is time-consuming, not only because of its volume but also because data types and structures can be different and complex. Currently, many data mining and machine learning techniques are being applied to big data problems; some of them can construct a good learning algorithm in terms...

chapter

Inferring bike trip patterns from bike sharing system open data

Longbiao Chen, Jeremie Jakubowicz

2015 IEEE International Conference on Big Data (Big Data) > 2898 - 2900

Understanding bike trip patterns in a bike sharing system is important for researchers designing models for station placement and bike scheduling. By bike trip patterns, we refer to the large number of bike trips observed between two stations. However, due to privacy and operational concerns, bike trip data are usually not made publicly available. In this paper, instead of relying on time-consuming...

chapter

Mining the relation between dorm arrangement and student performance

Man Li, Ruisheng Shi

2015 IEEE International Conference on Big Data (Big Data) > 2344 - 2347

This paper discusses the relation between dorm arrangement and student performance. The k-means algorithm, an unsupervised learning method, is the main tool used in the analysis. Students are grouped into clusters according to the similarity of their performance scores. This paper analyzes the clustering result by comparing it with the actual dorm arrangement. In the end, drawbacks...

chapter

Finding banded patterns in big data using sampling

Fatimah B Abdullahi, Frans Coenen, Russell Martin

2015 IEEE International Conference on Big Data (Big Data) > 2233 - 2242

A mechanism for identifying bandings in large "zero-one" N-dimensional data sets, using a sampling technique, is presented. The challenge of identifying bandings in data is the large number of potential permutations that need to be considered. To circumvent this, a banding score mechanism is proposed that avoids the need to consider large numbers of permutations. This has been incorporated...

chapter

Scalable preference queries for high-dimensional data using map-reduce

Gheorghi Guzun, Joel E. Tosado, Guadalupe Canahuate

2015 IEEE International Conference on Big Data (Big Data) > 2243 - 2252

Preference (top-k) queries play a key role in modern data analytics tasks. Top-k techniques rely on ranking functions to determine an overall score for each object across all the relevant attributes being examined. This ranking function is provided by the user at query time, or generated for a particular user by a personalized search engine, which prevents the pre-computation of the...

chapter

An optimized interestingness hotspot discovery framework for large gridded spatio-temporal datasets

Fatih Akdag, Christoph F. Eick

2015 IEEE International Conference on Big Data (Big Data) > 2010 - 2019

We define interestingness hotspots as contiguous regions in space which are interesting based on a domain expert's notion of interestingness captured by an interestingness function. This paper centers on finding interestingness hotspots on very large gridded datasets which are quite common in scientific computing. Mining large gridded datasets with a lot of variables and measurements requires a scalable...

chapter

A proactive discovery and filtering solution on phishing websites

Lv Fang, Wang Bailing, Huang Junheng, Sun Yushan, more

2015 IEEE International Conference on Big Data (Big Data) > 2348 - 2355

Phishing websites are becoming a major threat to information security in social networks. The attacks not only lessen users' trust but also affect the interests of the third party that develops the platform. To address the time lag in passive phishing website detection, this paper proposes a solution to discover phishing websites proactively based on a blacklist, in which the anomalies of...

chapter

SciSpark: Applying in-memory distributed computing to weather event detection and tracking

Rahul Palamuttam, Renato Marroquin Mogrovejo, Chris Mattmann, Brian Wilson, more

2015 IEEE International Conference on Big Data (Big Data) > 2020 - 2026

In this paper we present SciSpark, a Big Data framework that extends Apache™ Spark for scaling scientific computations. The paper details the initial architecture and design of SciSpark. We demonstrate how SciSpark achieves parallel ingesting and partitioning of earth science satellite and model datasets. We also illustrate the usability and extensibility of SciSpark by implementing aspects of the...

chapter

In-situ analytics for tomographic imaging in sensor network

Goutham Kamath, Wen-Zhan Song

2015 IEEE International Conference on Big Data (Big Data) > 2173 - 2176

In both industry and academia, seismic exploration does not yet have the capability to illuminate physical dynamics at high resolution and in real time. The major bottleneck in real-time monitoring today is transferring large volumes of raw data for post-processing. Although the computation capacity and sampling rate of sensors have increased exponentially, we still have challenges in terms...

chapter

Towards a taxonomy of standards in smart data

Alexander Lenk, Leif Bonorden, Astrid Hellmanns, Nico Roedder, more

2015 IEEE International Conference on Big Data (Big Data) > 1749 - 1754

The usage of large amounts of data has an immense potential for global economic growth and the competitiveness of countries with high technological standards. Vast amounts of data from different sources are collected and analyzed in order to seek economic profit and competitive advantages for companies and society in general. To gain profit from such data, it needs to be analyzed, processed, and interpreted...

chapter

Mixed-initiative social media analytics at the World Bank: Observations of citizen sentiment in Twitter data to explore "trust" of political actors and state institutions and its relationship to social protest

Nadya A. Calderon, Brian Fisher, Jeff Hemsley, Billy Ceskavich, more

2015 IEEE International Conference on Big Data (Big Data) > 1678 - 1687

This paper discusses a project that studied the relationship between citizen trust and social protest using visual analysis of approximately 11 million sentiment-classified Tweets from the period of the 2014 Brazilian World Cup. The results of the study reveal that the 2014 World Cup protests in Brazil sprang from a wide range of grievances coupled with a relative sense of deprivation compared with...

chapter

Marlin: Taming the big streaming data in large scale video similarity search

Nan Zhu, Wenbo He, Yu Hua, Yixin Chen

2015 IEEE International Conference on Big Data (Big Data) > 1755 - 1764

The extreme volume and staggering growth rate of video data inevitably put unprecedented pressure on any large-scale video sharing and hosting system. Among the efforts to mitigate this pressure, content-based video similarity search is becoming more and more important with the exponential growth of the data size. Though various approaches have been proposed to address this problem, they are mainly focusing...
