2015 IEEE International Conference on Big Data (Big Data)

chapter

Towards a subgraph/supergraph cached query-graph index

Jing Wang, Nikos Ntarmos, Peter Triantafillou

2015 IEEE International Conference on Big Data (Big Data) > 2919 - 2921

Many modern big data applications deal with graph-structured data, such as databases of molecular compounds represented as graphs of atoms and bonds, or "structured interaction networks" in biological and social networks, where nodes refer to entities (proteins, people, etc.) and edges represent their relationships. Central to high-performance graph analytics over such data is locating patterns...
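A cached query-graph index can exploit a containment property of subgraph queries. Suppose a cached query g is a subgraph of a new query q. Then every data graph containing q also contains g, so q's answers must come from g's cached answers. Conversely, if q is a subgraph of a cached g, every cached answer of g is already an answer of q. The minimal Python sketch below assumes tiny labeled graphs and uses a brute-force subgraph test. The classes `Graph` and `QueryCache` are hypothetical illustrations, not the index structure from the paper.

```python
from itertools import permutations


class Graph:
    """Tiny labeled undirected graph: node -> label, edges as frozensets of endpoints."""

    def __init__(self, labels, edges):
        self.labels = dict(labels)
        self.edges = {frozenset(e) for e in edges}


def is_subgraph(small, big):
    """True if `small` maps injectively into `big`, preserving labels and edges.
    Brute force over all node mappings: exponential, for toy graphs only."""
    nodes = list(small.labels)
    for image in permutations(big.labels, len(nodes)):
        m = dict(zip(nodes, image))
        if any(small.labels[v] != big.labels[m[v]] for v in nodes):
            continue
        if all(frozenset(m[v] for v in e) in big.edges for e in small.edges):
            return True
    return False


class QueryCache:
    """Hypothetical cache of answered subgraph queries (query graph -> matching data-graph ids)."""

    def __init__(self, dataset):
        self.dataset = dataset      # data-graph id -> Graph
        self.entries = []           # (cached query, frozenset of matching ids)
        self.verifications = 0      # subgraph tests actually run against data graphs

    def answer(self, q):
        candidates = set(self.dataset)
        confirmed = set()
        for g, hits in self.entries:
            if is_subgraph(g, q):   # cached g is a subgraph of q: answers(q) lie within hits
                candidates &= hits
            if is_subgraph(q, g):   # q is a subgraph of cached g: every hit also answers q
                confirmed |= hits
        # Only candidates not already confirmed need an expensive subgraph test.
        to_verify = candidates - confirmed
        self.verifications += len(to_verify)
        result = confirmed | {i for i in to_verify if is_subgraph(q, self.dataset[i])}
        self.entries.append((q, frozenset(result)))
        return result


dataset = {
    "d1": Graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3), (1, 3)]),  # C-C-O triangle
    "d2": Graph({1: "C", 2: "C"}, [(1, 2)]),                           # C-C bond
    "d3": Graph({1: "C", 2: "O"}, [(1, 2)]),                           # C-O bond
}
cache = QueryCache(dataset)
q_cc = Graph({1: "C", 2: "C"}, [(1, 2)])
q_cco = Graph({1: "C", 2: "C", 3: "O"}, [(1, 2), (2, 3)])
r1 = cache.answer(q_cc)    # empty cache: all three data graphs are verified
r2 = cache.answer(q_cco)   # q_cc is a subgraph of q_cco, so only r1's ids are verified
```

The second query needs only two verifications instead of three, because the cached answer to the smaller query bounds its candidate set.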

chapter

Data deidentification in medical transcriptions using regular expressions and machine learning

Joshua Seeger, Aron Culotta, Jason Keller, Patrick van Kessel, et al.

2015 IEEE International Conference on Big Data (Big Data) > 1322 - 1323

A system is developed to redact personally identifiable information (PII) from millions of medical transcriptions with very high precision, using a combination of entity recognition, regular expressions, and machine learning. The system is trained and tested on manually redacted medical transcriptions using an internally developed coding system, which provides double-blind classification capabilities.
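The regular-expression component of such a system can be illustrated with a minimal Python sketch. The tag names and patterns are hypothetical and deliberately narrow (US-style phone numbers, SSNs, numeric dates, e-mail addresses). They are not the authors' rules. Names and other free-text PII would still need the entity-recognition and machine-learning stages described above.

```python
import re

# Hypothetical patterns for a few structured PII types. Free-text identifiers
# such as patient names are out of reach for rules like these.
PII_PATTERNS = {
    "PHONE": re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-.]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact(text):
    """Replace every match of each pattern with a bracketed type tag, e.g. [PHONE]."""
    for tag, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{tag}]", text)
    return text
```

For example, `redact("Pt seen 03/14/2015, call (555) 123-4567 or jdoe@example.com; SSN 123-45-6789.")` yields `"Pt seen [DATE], call [PHONE] or [EMAIL]; SSN [SSN]."`. Replacing matches with type tags rather than deleting them keeps the transcription readable and lets a reviewer check what category was removed.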

chapter

Personalized expertise search at LinkedIn

Viet Ha-Thuc, Ganesh Venkataraman, Mario Rodriguez, Shakti Sinha, et al.

2015 IEEE International Conference on Big Data (Big Data) > 1238 - 1247

LinkedIn is the largest professional network, with more than 350 million members. As the member base grows, searching for experts becomes increasingly challenging. In this paper, we propose an approach to address the problem of personalized expertise search on LinkedIn, particularly for exploratory search queries containing skills. In the offline phase, we introduce a collaborative filtering approach...

chapter

Graph analytics using vertica relational database

Alekh Jindal, Samuel Madden, Malu Castellanos, Meichun Hsu

2015 IEEE International Conference on Big Data (Big Data) > 1191 - 1200

Graph analytics is becoming increasingly popular, with a number of new applications and systems developed in the past few years. In this paper, we study Vertica relational database as a platform for graph analytics. We show that vertex-centric graph analysis can be translated to SQL queries, typically involving table scans and joins, and that modern column-oriented databases are very well suited to...
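The translation of vertex-centric analysis into SQL scans and joins can be sketched with PageRank. Vertica is a commercial system, so this minimal sketch uses SQLite from Python's standard library as a stand-in. Each superstep is one `INSERT ... SELECT` that joins every vertex with its in-edges, their source vertices, and those sources' out-degrees, then aggregates. The table names `edge`, `vertex`, `outdeg`, and `next_rank` are hypothetical, and the sketch assumes every vertex has at least one out-edge (no dangling-node correction). It is not the authors' query plan.

```python
import sqlite3


def pagerank_sql(edges, d=0.85, iterations=30):
    """PageRank where each superstep is a single SQL join + GROUP BY.
    Assumes every vertex has at least one out-edge (no dangling nodes)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    conn.execute("CREATE TABLE vertex (id INTEGER PRIMARY KEY, pr REAL)")
    conn.execute("INSERT INTO vertex (id) SELECT src FROM edge UNION SELECT dst FROM edge")
    n = conn.execute("SELECT COUNT(*) FROM vertex").fetchone()[0]
    conn.execute("UPDATE vertex SET pr = ?", (1.0 / n,))  # uniform initial rank
    conn.execute("CREATE TABLE outdeg AS SELECT src AS id, COUNT(*) AS deg FROM edge GROUP BY src")
    conn.execute("CREATE TABLE next_rank (id INTEGER PRIMARY KEY, pr REAL)")

    # One vertex-centric superstep: each vertex sums pr/deg over its in-neighbors.
    step = """
        INSERT INTO next_rank (id, pr)
        SELECT v.id, (1 - :d) / :n + :d * COALESCE(SUM(p.pr / o.deg), 0)
        FROM vertex AS v
        LEFT JOIN edge AS e ON e.dst = v.id      -- in-edges of v
        LEFT JOIN vertex AS p ON p.id = e.src    -- their source vertices
        LEFT JOIN outdeg AS o ON o.id = e.src    -- and those sources' out-degrees
        GROUP BY v.id"""
    for _ in range(iterations):
        conn.execute("DELETE FROM next_rank")
        conn.execute(step, {"d": d, "n": n})
        conn.execute("UPDATE vertex SET pr = "
                     "(SELECT pr FROM next_rank WHERE next_rank.id = vertex.id)")

    result = dict(conn.execute("SELECT id, pr FROM vertex"))
    conn.close()
    return result
```

The per-superstep work is a scan of `vertex` plus joins against `edge` and `outdeg`, which is the kind of workload a column-oriented engine is built to execute efficiently.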

chapter

Klout score: Measuring influence across multiple social networks

Adithya Rao, Nemanja Spasojevic, Zhisheng Li, Trevor Dsouza

2015 IEEE International Conference on Big Data (Big Data) > 2282 - 2289

In this work, we present the Klout Score, an influence scoring system that assigns scores to 750 million users across 9 different social networks on a daily basis. We propose a hierarchical framework for generating an influence score for each user, by incorporating information for the user from multiple networks and communities. Over 3600 features that capture signals of influential interactions are...

chapter

Genomic analysis with MapReduce

Wei Yi Liu, Hui-I Hsiao, Shih Yao Dai

2015 IEEE International Conference on Big Data (Big Data) > 1330 - 1335

Genomic analysis [1] usually consists of a pipeline of three stages: sequence alignment, data conversion, and advanced analysis. The pipeline must handle hundreds of gigabytes of data and run complex analytics algorithms, which traditionally requires long execution times (20+ hours) for a full-genome analysis. Parallelizing the execution of analytics algorithms is one way to speed...
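To make the MapReduce pattern concrete, here is a minimal in-process Python sketch on a toy genomics task: per-position read depth. The record layout `(chrom, start, length)` and the task itself are hypothetical illustrations, not the stages or algorithms of the paper. A framework such as Hadoop would run the map and reduce functions as distributed tasks and perform the shuffle itself.

```python
from collections import defaultdict
from itertools import chain


def map_read(read):
    """Map: an aligned read (chrom, start, length) emits ((chrom, pos), 1) per covered base."""
    chrom, start, length = read
    return [((chrom, pos), 1) for pos in range(start, start + length)]


def shuffle(pairs):
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_coverage(key, values):
    """Reduce: total read depth at one genomic position."""
    return key, sum(values)


def coverage(reads):
    """Run map, shuffle and reduce sequentially; each map call and each reduce
    call is independent, which is what makes the job parallelizable."""
    mapped = chain.from_iterable(map(map_read, reads))
    return dict(reduce_coverage(k, v) for k, v in shuffle(mapped).items())
```

For example, reads `("chr1", 100, 3)` and `("chr1", 101, 2)` give depth 1 at position 100 and depth 2 at positions 101 and 102 on `chr1`.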

chapter

Agile text mining with Sherlok

Renaud Richardet, Jean-Cedric Chappelier, Shreejoy Tripathy, Sean Hill

2015 IEEE International Conference on Big Data (Big Data) > 1479 - 1484

The successful development of an intelligent text mining application requires the collaboration of two main stakeholders: subject matter experts and text miners. In this paper, we describe a new methodology, agile text mining, to improve that collaboration. Agile text mining is characterized by short development cycles, frequent task redefinition and continuous performance monitoring through integration...

chapter

Twitter opinion mining for adverse drug reactions

Liang Wu, Teng-Sheng Moh, Natalia Khuri

2015 IEEE International Conference on Big Data (Big Data) > 1570 - 1574

Although rigorous clinical studies are required before a drug is placed on the market, it is impossible to predict all side effects for the approved medication. The United States Food and Drug Administration actively monitors approved drugs to identify adverse events. The FDA Adverse Event Reporting System (FAERS) contains a database of adverse drug events reported by the healthcare providers and...

chapter

Scientific computing meets big data technology: An astronomy use case

Zhao Zhang, Kyle Barbary, Frank Austin Nothaft, Evan Sparks, et al.

2015 IEEE International Conference on Big Data (Big Data) > 918 - 927

Scientific analyses commonly compose multiple single-process programs into a dataflow. An end-to-end dataflow of single-process programs is known as a many-task application. Typically, tools from the HPC software stack are used to parallelize these analyses. In this work, we investigate an alternate approach that uses Apache Spark — a modern big data platform — to parallelize many-task applications...

chapter

Big data gathering and mining pipelines for CRM using open-source

Kang Li, Vinay Deolalikar, Neeraj Pradhan

2015 IEEE International Conference on Big Data (Big Data) > 2936 - 2938

Customer Relationship Management (CRM) is currently the fastest growing sector of enterprise software, estimated to increase to $36.5B worldwide by 2017. CRM technologies increasingly use data mining primitives across multiple applications. At the same time, the growth of big data has led to the evolution of an open source big data software stack (primarily powered by Apache software) that rivals...

chapter

Flexible ingest framework: A scalable architecture for dynamic routing through composable pipelines

Alexei Samoylov, Jason Schlachter

2015 IEEE International Conference on Big Data (Big Data) > 2843 - 2845

In this paper, we describe a flexible and scalable big data ingestion framework based on Apache Spark. It is flexible in that meta-information about the data is used to build custom processing pipelines at run-time. It is scalable in that it leverages Apache Spark with minimal additional overhead. These capabilities allow a user to set up custom big data processing pipelines capable of handling changing...
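The idea of building a pipeline at run-time from meta-information can be sketched in a few lines of Python. Everything here is a hypothetical illustration: the stage registry `STAGES`, the routing table `ROUTES`, and the functions `build_pipeline` and `ingest` are not taken from the paper. The stages are plain Python callables over lists. In the framework described above they would run on Apache Spark.

```python
from functools import reduce

# Hypothetical registry of reusable, composable processing stages.
STAGES = {
    "strip": lambda records: [r.strip() for r in records],
    "lowercase": lambda records: [r.lower() for r in records],
    "drop_empty": lambda records: [r for r in records if r],
    "dedupe": lambda records: list(dict.fromkeys(records)),  # keeps first occurrence order
}

# Hypothetical routing table: metadata about each source selects its stages.
ROUTES = {
    "web_logs": {"stages": ["strip", "drop_empty"]},
    "user_tags": {"stages": ["strip", "lowercase", "drop_empty", "dedupe"]},
}


def build_pipeline(metadata):
    """Compose, at run-time, the stages named in the metadata into one callable."""
    stages = [STAGES[name] for name in metadata["stages"]]
    return lambda records: reduce(lambda data, stage: stage(data), stages, records)


def ingest(source, records):
    """Route records through the pipeline selected by the source's metadata."""
    return build_pipeline(ROUTES[source])(records)
```

For example, `ingest("user_tags", [" Spark ", "spark", "", "Kafka"])` returns `["spark", "kafka"]`. Adding a new data source only requires a new metadata entry, not new pipeline code.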
