Current platforms fail to efficiently cope with the data and task heterogeneity of modern analytics workflows due to their adherence to a single data and/or compute model. As a remedy, we present IReS, the Intelligent Resource Scheduler for complex analytics workflows executed over multi-engine environments. IReS is able to optimize a workflow with respect to a user-defined policy relying on cost and...
Data and methodology sharing is essential for the progress of scientific research. Several research groups have built tools for medical big data (MBD) processing applicable to Diffusion Tensor MRI (DTI) processing pipelines. In this paper, we propose a framework enabling methodology sharing (i.e. harmonization) to facilitate reproducibility in DTI processing.
Modern microprocessors offer a rich memory hierarchy including various levels of cache and registers. Some of these memories (like main memory, L3 cache) are big but slow and shared among all cores. Others (registers, L1 cache) are fast and exclusively assigned to a single core but small. Only if the data accesses have high locality can we avoid excessive data transfers between the memory hierarchy...
Phishing is a security attack that involves the creation of websites that mimic legitimate websites, and these fraudulent websites cause Internet users substantial losses. Traditional anti-phishing methods usually work passively, relying on reports submitted by users. Because the survival time of phishing sites keeps getting shorter, such methods are not efficient enough to find and take down new phishing attacks...
Systems chemical biology integrates chemistry, biology, and computational tools into a single system, which can help researchers study in depth the interactions and relationships among small molecules, such as genes, proteins, targets, compounds and so on. With systems chemical biology, researchers can concentrate on new ways of drug discovery, including drug target path discovery, which can not only help...
The use of functional brain imaging for research and diagnosis has benefitted greatly from the recent advancements in neuroimaging technologies, as well as the explosive growth in size and availability of fMRI data. While it has been shown in the literature that using multiple and large-scale fMRI datasets can improve reproducibility and lead to new discoveries, the computational and informatics systems...
Personalization targets a user's software, hardware, and QoS requirements at any given moment for big data applications in the cloud environment. Individualization, however, aims to target the daily needs of an individual user in a dynamic manner. The proposed research work aims to design a system which will be able to optimize a user's applications towards a specified target goal. Furthermore,...
Event segmentation is an important step in monitoring and management applications that categorizes different events into different segments. This is especially important when the applications to be monitored and managed are large-scale, comprehensive and data-intensive in nature. The process of segmentation is based on data clustering, which is one of the key data mining methods used these days. There...
Data quality issues pose a significant barrier to operationalizing big data. They pertain to the meaning of the data, the consistency of that meaning, the human interpretation of results, and the contexts in which the results are used. Data quality issues arise after organizations have moved past clear-cut technical solutions to early bottlenecks in using data. Left unaddressed, such issues can and...
UnionPay's inter-bank transaction settlement platform (ITSP) generates a huge amount of bankcard transaction data every day, recording different bankcard activities. In order to unleash the business value of these data, UnionPay has built a customized data warehouse based on Hadoop to manage and query the massive data imported from ITSP. However, the original system suffers from low storage utilization...
Because fully immunizing critical infrastructure such as water supply or power grid systems against physical and cyberattacks is not feasible, it is crucial for the public and private sectors to strengthen their detective, predictive, and preventive mechanisms to minimize the risk of disruptions, resource loss or damage. This paper proposes a methodical approach to situation analysis...
The need for effective change detection is ever growing as more large-scale spatial-temporal datasets containing gridded time series data emerge. To detect meaningful changing events with respect to our desired characteristics, in this paper we focus on the post-classification change detection problem, which aims to apply change detection techniques to the time series of classification outputs...
We define a new structural property of large-scale communication networks consisting of the persistent patterns of communication among users. We term these patterns “persistent cascades,” and claim they represent a strong estimate of actual information spread. Using metrics of inexact tree matching, we group these cascades into classes which we then argue represent the communication structure of a...
Real world social networks typically consist of actors (individuals) that are linked to other actors or different types of objects via links of multiple types. Different types of relationships induce different views of the underlying social network. We consider the problem of labeling actors in such multi-view networks based on the connections among them. Given a social network in which only a subset...
Graphlets represent small induced subgraphs and are becoming increasingly important for a variety of applications. Despite the importance of the local subgraph (graphlet) counting problem, existing work focuses mainly on counting graphlets globally over the entire graph. These global counts have been used for tasks such as graph classification as well as for understanding and summarizing the fundamental...
General-purpose distributed systems for data processing have become popular in recent years due to the high demand from industry for big data analytics. However, there is a lack of comprehensive comparisons among these systems and of detailed analysis of their performance. In this paper, we conduct an extensive performance study on four state-of-the-art general-purpose distributed computing systems. Our results...
The growing use of Big Data frameworks on large machines highlights the importance of performance issues and the value of High Performance Computing (HPC) technology. This paper looks carefully at three major frameworks, Spark, Flink and Message Passing Interface (MPI), both in scaling across nodes and internally over the many cores inside modern nodes. We focus on the special challenges of the Java...
The use of computational methods for the exploration and analysis of web archive sources is emerging in new disciplines such as digital humanities. This raises urgent questions about how such research projects process web archival material using computational methods to construct their findings. This paper aims to enable web archives scholars to document their practices systematically to improve the transparency...
The Bentley Historical Library, funded by a generous grant from the Andrew W. Mellon Foundation, has developed a new Appraisal and Arrangement tab in the Archivematica digital preservation system as part of its “ArchivesSpace-Archivematica-DSpace Workflow Integration” project. This new functionality permits users to conduct large-scale appraisal of digital archives as part of a largely automated workflow...
This paper describes our approach to the Bosch production line performance challenge run by Kaggle.com. Maximizing the production yield is at the heart of the manufacturing industry. At the Bosch assembly line, data is recorded for products as they progress through each stage. Data science methods are applied to this huge data repository consisting of records of tests and measurements made for each component...