Many large-scale distributed storage systems deploy erasure coding to protect data from frequent server failures for cost reasons. In most of these systems, newly inserted data is first replicated across different storage servers and then migrated to erasure-coded form. Although this offline encoding approach can improve data access before the data is erasure coded for some storage systems, it helps little and...
Similarity-oriented services serve as a foundation for a wide range of data analytics applications such as machine learning, targeted advertising, and real-time decision making. Both industry and academia strive for efficient and scalable similarity discovery and querying techniques to handle massive, complex data records in the real world. In addition to performance, data security and privacy become an indispensable...
Emerging distributed in-memory computing frameworks, such as Apache Spark, can process a huge amount of cached data within seconds. This remarkably high efficiency requires the system to balance data well across tasks and ensure data locality. However, it is challenging to satisfy these requirements for applications that operate on a collection of dynamically loaded and evicted datasets. The dynamics...
A distributed secure data management system for the Internet of Things (IoT) is characterized by authentication and privacy policies that preserve data integrity. Multi-phase security and privacy policies ensure confidentiality and trust between users and service providers. In this regard, we present a novel Two-phase Incentive-based Secure Key (TISK) system for distributed data management in IoT. The proposed...
Given a set of files that show a certain degree of similarity, we consider a novel problem of performing data redundancy elimination across a set of distributed worker nodes in a shared-nothing in-memory big data analytics system. The redundancy elimination scheme is designed to be: (i) space-efficient: the total space needed to store the files is minimized, and (ii) access-isolation:...
In an Information-Centric Internet of Things (ICIoT) environment for big data sharing, IoT data can be cached throughout the network. Such distributed data caching poses a challenge for flexible authorization and identity verification. For fine-grained data access authorization in a distributed manner, Ciphertext-Policy Attribute-Based Encryption (CP-ABE) has been identified as a promising approach...
Elastic distributed storage systems have been increasingly studied in recent years because power consumption has become a major problem in data centers. Much progress has been made in improving the agility of resizing small- and large-scale distributed storage systems. However, most of these studies focus on metadata-based distributed storage systems. On the other hand, emerging consistent hashing...
■ Presented BlueWall: Software Defined Network Management in Hybrid Enterprise Cloud Environments, for managing firewall requests when servers (or services) are created (via APIs). ■ Demonstrated hybrid network management design and self-service capabilities. ■ Discussed challenges arising in network management in hybrid enterprise cloud environments.
In the big data era, data are usually stored in databases for easy access and utilization, and these databases are now woven into every aspect of our lives. However, traditional relational databases cannot meet users' demands for fast data access and computation, since they cannot process data in a distributed way. To tackle this problem, non-relational databases such as MongoDB have emerged and been applied...
To execute cloud computing tasks over a data center hosting hundreds of thousands of server nodes, it is natural to distribute computations across the nodes to take advantage of parallel processing. However, as we allocate more computing resources and further distribute the computations, a large amount of intermediate data must be moved between consecutive computation stages among the nodes, causing...
Recent advances in geo-distributed systems have made distributed data processing possible, where tasks are decomposed into subtasks, deployed to multiple data centers, and run in parallel. Compared to conventional approaches that process every task in a single data center, resulting in high latency and large data aggregation, the geo-distributed cloud systems provide a highly available and more economic...
Recently, there has been a growing interest in enabling fast data analytics by leveraging system capabilities from large-scale high-performance computing (HPC) systems. OpenSHMEM is a popular run-time system on HPC systems that has been used for large-scale compute-intensive scientific applications. In this paper, we propose to leverage OpenSHMEM to design a distributed in-memory key-value store for...
The relevance of data created in or about the IoT has a strong reliance on the context, especially spatiotemporal context, of the device and application perceiving it. To ensure that applications perceive data items that are relevant to the current context, it is necessary to restrict when each item is available. To control an application's perceptions of data availability, data items are often put...
The increasing volume and importance of point-to-multipoint traffic in virtualized data centers mean that the deployment of IP multicast is increasingly attractive. However, concerns about the ability of switches and routers based on commodity hardware to support the conventional IP multicast control plane and data plane, especially when there are thousands of participants in the multicast group communication,...
Nowadays, high-performance computers (HPC) are used to solve increasingly complex problems and process larger amounts of data. The growing computational requirements of applications can be met by utilizing more compute nodes. However, the average I/O performance available to each compute node decreases as the number of nodes increases. The performance gap between computation and I/O has long been a primary...
Thanks to their high availability, scalability, and usability, cloud databases have become one of the dominant cloud services. However, since cloud users do not physically possess their data, data integrity may be at risk. In this paper, we present a novel protocol that uses the crowdsourcing paradigm to provide practical data integrity assurance in key-value cloud databases. The main advantage of...
Distributed complex event processing systems must handle high-volume, continuous data streams with high throughput to support further decision making. Due to the specific properties of pattern operators, it is difficult to process the data streams in parallel over complex event processing systems. To address the issue, a novel parallel processing strategy is proposed. The proposed method...
Big Data has become a pervasive technology to manage the ever-increasing volumes of data. Among Big Data solutions, scalable data stores play an important role, especially key-value data stores, due to their high scalability (thousands of nodes). The typical workflow for Big Data applications includes two phases. The first one is to load the data into the data store typically as part of an ETL (Extract-Transform-Load)...
In extremely connected and dynamic environments, such as data centers, SDN network devices can be exploited to simplify the management of network provisioning. However, they rely on TCAMs to implement the flow tables, i.e., on size-limited memories that can fill up quickly when fine-grained traffic control is required, eventually preventing the installation of new forwarding rules. In this...
The demand for multi-dimensional range queries over Distributed Ordered Tables (DOT) has grown steadily; however, DOT does not support queries on anything other than the primary key very well. One solution to this problem is indexing. Many indexing techniques focus on improving query capability but neglect the consistency between the index table and the base data table. This...