This work addresses the problem of re-projecting a terabyte-sized 3D data set represented as a set of 2D Deep Zoom pyramids. In general, a re-projection for small 3D data sets is executed directly in RAM. However, RAM becomes a limiting factor for terabyte-sized 3D volumes formed by a stack of hundreds of megapixel to gigapixel 2D frames. We have benchmarked three methods to perform the re-projection...
Cloudera provides enterprises with the big data platform for next-generation data management and analytics. This new platform allows companies to perform more flexible analysis on more types of data and in greater volumes. Amr Awadallah, CTO/Founder at Cloudera, will cover the key underlying patterns for how Hadoop is transforming the way organizations manage and derive value from data.
Microsoft Academic Search is a free search engine specific to scholarly material. It currently covers more than 50 million publications and over 19 million authors across a variety of domains. One of the main challenges in correctly indexing this material is author name ambiguity and the resulting noise in author profiles. KDD Cup 2013 invited participants to tackle this problem in 2 ways: (1) by...
Asynchronous discussion forums are one of the artifacts of the internet age. They occur in a wide variety of applications from distance learning to technical support. Technical support forums have also proliferated in enterprises, and today form a salient feature of many technical interactions in large enterprises. Two interconnected example applications where such forums may be employed are the following:...
The MapReduce framework has become the de facto choice for big data analysis in a variety of applications. In the MapReduce programming model, computation is distributed across a cluster of computing nodes that run in parallel. The performance of a MapReduce application is thus affected by the system and middleware, the characteristics of the data, and the design and implementation of the algorithms. In this study, we focus...
We are dealing with large-scale high-dimensional image data sets requiring new approaches for data mining where visualization plays the main role. Dimension reduction (DR) techniques are widely used to visualize high-dimensional data. However, the information loss due to reducing the number of dimensions is the drawback of DRs. In this paper, we introduce a novel metric to assess the quality of DRs...
Given a power network consisting of nodes (generators/loads) and edges (lines), there exists a set of constraints that must be satisfied in order for the system to be operational. When one or more power lines are cut, the bus phases and load/generator power values may need to be altered in order to restore the system to operation. The load shedding problem is to find the smallest adjustment to the...
The paper considers the impact of changing code parameters on the network load for a given storage-flexible Data Center Network (DCN), i.e., a DCN in which the reliability and the storage volume can be modified during the storage life of the DCN data. Two regimes of the network load are considered: transition (during the migration process) and stationary (at the end of the migration process)...
A Knowledge Cube, or cube for short, is an intelligent and adaptive database instance capable of storing, analyzing, and searching data. Each cube is established based on semantic aspects, e.g., (1) Topical, (2) Contextual, (3) Spatial, or (4) Temporal. A cube specializes in handling data that is only relevant to the cube's semantics. Knowledge cubes are inspired by two prime architectures: (1) Dataspaces...
Relational databases have provided storage for several decades. However, for today's interactive web and mobile applications, the importance of flexibility and scalability in the data model cannot be overstated. The term NoSQL broadly covers all non-relational databases that provide a schema-less and scalable model. NoSQL databases, which are also termed Internet-age databases, are currently being used...
We analyze lung transplant data from the United Network for Organ Sharing (UNOS) program with the aim of developing accurate risk prediction models for mortality within 1 year of lung transplant using data mining techniques. The data used in this study is de-identified and consists of 62 predictor attributes and the 1-year post-transplant survival outcome for patients who underwent lung transplant between...
The Ophidia project is a research effort addressing big data analytics requirements, issues, and challenges for eScience. We present here the Ophidia analytics framework, which is responsible for atomically processing, transforming and manipulating array-based data. This framework provides a common way to run analytics tasks on big datasets across large clusters. The paper highlights the design...
Modern database management systems (DBMS) have been designed to efficiently store, manage and perform computations on massive amounts of data. In contrast, many existing visualization systems do not scale seamlessly from small data sets to enormous ones. We have designed a three-tiered visualization system called ScalaR to deal with this issue. ScalaR dynamically performs resolution reduction when...
Massive graphs emerge in many real-world applications. Practitioners often find that relational databases are inefficient for graph data management. In this paper, we investigate the efficiency issue by analyzing both I/O and CPU costs. First, we find that the storage of a graph in a relational DBMS violates the locality principle: graph queries will always reference neighbors; however, the data locations of neighbors...
In this paper, we test the robustness of emotion extraction from English language books published in the 20th century. Our analysis is performed on a sample of the 8 million digitized books available in the Google Books Ngram corpus by applying three independent emotion detection tools: WordNet Affect, Linguistic Inquiry and Word Count, and a recently proposed ‘Hedonometer’ method. We also assess...