Data Analytics in Bioinformatics: Data Science in Practice for Genomics Analysis Workflows

Kary A. C. S. Ocana; Vitor Silva; Daniel de Oliveira; Marta Mattoso

doi:10.1109/eScience.2015.50

Source

2015 IEEE 11th International Conference on e-Science, pp. 322-331

Abstract

Workflow systems manage large-scale experiments and produce large volumes of provenance data. The provenance repository of these systems contains information about the workflow execution, which allows data transformations to be tracked and analyzed. However, provenance data may still be a black box when it comes to analyzing the contents of the resulting data files. Current solutions focus on data transformations at a coarse grain: they point to input and output files but do not allow domain-specific data to be explored. Data analytics is essential for managing large-scale workflows executed in parallel, especially when tracking anomalous executions. In this paper, we present a data analytics approach based on provenance data enriched with domain-specific data and coupled to a data mining tool. A real bioinformatics workflow was modeled and executed in parallel on the Amazon cloud. It manipulates complex biological data, which, as in many other genomic workflows, is difficult to monitor. We evaluate the benefits of using domain-specific data together with provenance data for user steering: monitoring the execution with detailed filters, steering on specific conditions, and evaluating performance. Results show that the provenance database coupled to workflow systems has unexplored potential for raw data analytics, which may improve user confidence and reduce overall execution time.

Identifiers

Book e-ISBN: 978-1-4673-9325-6
DOI: 10.1109/eScience.2015.50

Keywords

Databases; Data analysis; Data mining; Runtime; Bioinformatics; Monitoring; Phylogeny; scientific workflows; provenance analytics; scientific experiments

Additional information

Data set: ieee

Publisher

IEEE
