In numerous IoT applications, a large number of sensors and data receivers send information to servers. The servers gather information that reaches a huge volume in a short time. In such cases, IoT applications face the challenge of managing, displaying, and extracting useful client information in real time from the whole data stored on the servers. Especially in critical situations, a client's database query can take...
Proteogenomics is an emerging field of systems biology research at the intersection of proteomics and genomics. Two high-throughput technologies, Mass Spectrometry (MS) for proteomics and Next Generation Sequencing (NGS) machines for genomics, are required to conduct proteogenomics studies. Independently, both MS and NGS technologies are afflicted with a data deluge, which creates problems of storage,...
In our Big Data era, data is being generated, collected and analyzed at an unprecedented scale, and data-driven decision making is sweeping through all aspects of society. Recent studies have shown that poor quality data is prevalent in large databases and on the Web. Since poor quality data can have serious consequences on the results of data analyses, the importance of veracity, the fourth ‘V’ of...
One of the major challenges of the "Big Data" epoch is unstructured data mining. The problem arises from the storage of high-dimensional data that has no standard schema. While knowledge discovery in databases (KDD) algorithms were designed for data extraction, these algorithms are best suited to structured data stores. Moreover, today, at the data storage level, NoSQL databases have been deployed...
Reduction of the number of association rules in data mining is a very important issue in the field of socially-aware computing, in which big data needs to be manipulated. Existing schemes based on frequency of occurrence are not effective for relatively large datasets. In this paper we propose a tabular algorithm that assigns a weight to each rule for the removal of unimportant rules...
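The abstract is truncated before the algorithm's details, but the core idea of assigning a weight to each rule and discarding low-weight ones can be sketched. The weighting function below (support × confidence) is an illustrative assumption, not the paper's actual tabular algorithm.

```python
# Hypothetical sketch of weight-based association-rule pruning.
# The weight function (support * confidence) is an assumed placeholder,
# not the tabular algorithm described in the paper.
from dataclasses import dataclass


@dataclass
class Rule:
    antecedent: frozenset
    consequent: frozenset
    support: float
    confidence: float


def prune_rules(rules, threshold):
    """Keep only rules whose weight meets the threshold."""
    def weight(r):
        return r.support * r.confidence  # placeholder weighting scheme
    return [r for r in rules if weight(r) >= threshold]


rules = [
    Rule(frozenset({"milk"}), frozenset({"bread"}), 0.4, 0.8),
    Rule(frozenset({"eggs"}), frozenset({"beer"}), 0.05, 0.3),
]
kept = prune_rules(rules, threshold=0.1)
# Only the first rule survives: 0.4 * 0.8 = 0.32 >= 0.1, while 0.05 * 0.3 = 0.015 < 0.1
```

In practice the threshold and weighting scheme would be tuned to the dataset; the point is that a single pass over the rule set suffices once each rule carries a precomputed weight.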
There has been increasing interest in big data and big data security with the development of network technology and cloud computing. However, big data is not an entirely new technology but an extension of data mining. In this paper, we describe the background of big data, data mining, and big data features, and propose an attribute selection methodology for protecting the value of big data. Extracting...
Today a huge amount of geospatial data is being created, collected, and used more than ever before. The ever-increasing observations and measurements from geo-sensor networks, satellite imagery, point clouds from laser scanning, and geospatial data from Location-Based Services (LBS) and location-based social networks have become a serious challenge for data management and analysis systems. Traditionally,...
This paper introduces a service selection model that takes service location into account. The location of a service represents its position in the network, which determines the transmission cost of calling this service in the composite service. The more concentrated the invoked services are, the less transmission time the composite service costs. Meanwhile, the increasingly popular big data...
As time goes on, a running information system produces massive data. The previous hardware can no longer manage this much larger volume of data. In order to upgrade the system, several main requirements must be considered: sufficient storage space, high reliability, high performance, and relatively low cost. As an open-source J2EE framework, SSH2 is widely used by developers to...
Recent developments in network, mining, and data storage technology have heightened the need for big data and big data security. In this paper, we focus on the characteristic of big data that values the analysis more than the data itself. We express the relationships between attributes using nodes and edges. Through this, we propose a big data security hardening methodology by selecting...
Businesses and governments exploit big data without regard for issues of legality, data quality, disparate data meanings, and process quality. This often results in poor decisions, with individuals bearing the greatest risk. The threats harbored by big data extend far beyond the individual, however, and call for new legal structures, business processes, and concepts such as a Private Data Commons...
Big data is an emerging phenomenon characterized by the three Vs: volume, velocity, and variety. The volume of data has increased from terabytes to petabytes and is encroaching on exabytes. Some pundits are suggesting that zettabytes (10²¹ bytes) are reachable within the next several years. Velocity is concerned with not only how fast we accumulate data, but also how fast some of the data that we already...
MapReduce has shown remarkable vitality and has penetrated both academia and industry in recent years. MapReduce can be used not only as an ETL tool; it can do much more. The technique has been applied to SQL summation, OLAP, data mining, machine learning, information retrieval, multimedia data processing, science data processing, etc. Basically, MapReduce is a general-purpose parallel computing framework...
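The programming model underlying this framework can be sketched with the classic word-count example below. This is a minimal in-process illustration of the map/shuffle/reduce phases; real MapReduce frameworks distribute these steps across a cluster.

```python
# Minimal in-process sketch of the MapReduce programming model (word count).
# Real frameworks run map and reduce tasks in parallel across many machines.
from collections import defaultdict


def map_phase(doc):
    # Emit an intermediate (key, value) pair per word.
    return [(word, 1) for word in doc.split()]


def shuffle(pairs):
    # Group all intermediate values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Aggregate each key's values into a final result.
    return {key: sum(values) for key, values in groups.items()}


docs = ["big data big", "data mining"]
pairs = [p for doc in docs for p in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
# counts == {"big": 2, "data": 2, "mining": 1}
```

Because the map and reduce functions are pure and operate on independent key groups, the same user code scales from this toy example to cluster-sized inputs.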
This paper examines the needs of emerging applications of High Performance Computing by the Humanities, Arts, and Social Sciences (HASS) disciplines and presents a vision for how the current academic HPC environment could be adapted to better serve this new class of “big data” research.
A large number of stakeholders are often involved in Smart Grid projects. Each partner has its own way of storing, representing, and accessing its data. Integrated data storage and a joint online analytical mining infrastructure are needed to limit the amount of duplicated work and to raise the overall security of the system. The proposed infrastructure is composed of standard application software...
This paper complements our previous results in the context of effectively and efficiently designing Parallel Relational Data Warehouses (PRDW) over heterogeneous database clusters, represented by the proposal of a methodology called Fragmentation & Allocation (F&A). The main merit of F&A is that of combining the fragmentation and the allocation phases simultaneously,...
In this paper, we introduce ReTSO, a reliable and efficient design for transactional support in large-scale storage systems. ReTSO uses a centralized scheme and implements snapshot isolation, a property that guarantees that the read operations of a transaction see a consistent snapshot of the stored data. The centralized scheme of ReTSO enables a lock-free commit algorithm that prevents unreleased locks...
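The snapshot-isolation guarantee described above can be illustrated with a minimal centralized commit check using first-committer-wins on write-write conflicts. This sketch shows the general SI property only; it is not ReTSO's actual lock-free algorithm, whose details are truncated here.

```python
# Minimal sketch of a centralized snapshot-isolation commit check
# (first-committer-wins on write-write conflicts). Illustrates the SI
# property in general, not ReTSO's actual implementation.
class TimestampOracle:
    def __init__(self):
        self.clock = 0
        self.last_commit = {}  # key -> timestamp of its latest commit

    def begin(self):
        # A transaction's start timestamp defines its snapshot.
        self.clock += 1
        return self.clock

    def try_commit(self, start_ts, write_set):
        # Abort if any written key was committed after our snapshot began.
        for key in write_set:
            if self.last_commit.get(key, 0) > start_ts:
                return None  # write-write conflict -> abort
        self.clock += 1
        commit_ts = self.clock
        for key in write_set:
            self.last_commit[key] = commit_ts
        return commit_ts


oracle = TimestampOracle()
t1 = oracle.begin()
t2 = oracle.begin()
first = oracle.try_commit(t1, {"x"})   # t1 commits its write to "x"
second = oracle.try_commit(t2, {"x"})  # t2 conflicts on "x" and aborts (None)
```

Because conflict detection happens only at commit time against committed timestamps, no transaction ever holds a lock during its execution, which is the motivation for a lock-free commit path.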
Fast access to clinical data is necessary when performing real-time predictions of medical events. A clinical data repository (CDR) therefore requires an efficient format for storing data so it can meet the access demands of prediction algorithms for clinical decision support. We have developed a new hybrid entity–attribute–value (EAV) storage format for CDRs that is compared with the common simple...
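The entity–attribute–value layout mentioned above can be sketched as follows. The paper's hybrid format is truncated here, so this shows only the basic EAV idea: one (entity, attribute, value) row per fact, which lets patients carry arbitrary, sparse sets of clinical attributes without schema changes; the attribute names are illustrative.

```python
# Illustrative sketch of the basic entity-attribute-value (EAV) layout
# used by clinical data repositories. Attribute names are hypothetical.
eav_rows = [
    ("patient_001", "heart_rate", 72),
    ("patient_001", "sys_bp", 118),
    ("patient_002", "heart_rate", 95),
]


def pivot(rows):
    """Pivot EAV rows into one record (dict) per entity for faster access."""
    records = {}
    for entity, attribute, value in rows:
        records.setdefault(entity, {})[attribute] = value
    return records


records = pivot(eav_rows)
# records["patient_001"] == {"heart_rate": 72, "sys_bp": 118}
```

The trade-off motivating hybrid formats is visible even in this sketch: pure EAV is flexible but requires a pivot like the one above before row-oriented prediction algorithms can consume the data efficiently.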