The curse of dimensionality refers to the problem that one faces when analyzing datasets with thousands or hundreds of thousands of attributes. This problem is usually tackled by different feature selection methods which have been shown to effectively reduce computation time, improve prediction performance, and facilitate better understanding of datasets in various application areas. These methods...
Electrocardiography (ECG) signals are widely used to gauge the health of the human heart, and the resulting time series signal is often analyzed manually by a medical professional to detect any arrhythmia that the patient may have suffered. Much work has been done to automate the process of analyzing ECG signals, but most of the research involves extensive preprocessing of the ECG data to derive vectorized...
Protein-protein interaction (PPI) networks are a valuable biological data source containing rich information useful for protein function prediction. The PPI network data set obtained from high-throughput experiments is known to be noisy and incomplete. By modeling PPI data as a graph, research efforts are being made in the literature to improve the performance of protein function prediction by extending...
Big data clustering is a challenging task that arises in many application domains. Traditional clustering methods cannot handle large-scale data. Furthermore, big data often contain mixed types of data, including numerical and categorical attributes. We therefore propose in this paper a parallelization of the k-prototypes clustering method (MR-KP) using...
Many multidimensional hashing schemes have been actively studied in recent years, providing efficient nearest neighbor search. Generally, we can distinguish several hashing families, such as learning-based hashing, which provides better hash function selectivity by learning the dataset distribution. The spatial hashing family proposes a suitable partition of the multidimensional space, more adapted...
Advances in sequence data generation technologies are churning out voluminous omics data, posing a massive challenge for annotating biological functional features. Sequence data from the well-studied model organism Saccharomyces cerevisiae has been commonly used to test and validate in silico prediction methods. DNA replication is a critical step in the cellular process and the sequence location...
The use of data mining has led to many significant medical discoveries. However, many challenges still exist in using these methods for knowledge discovery within this field, given that the large amounts of data medical practitioners collect often create a curse of dimensionality. To address this challenge, attribute selection approaches have been developed. However, current approaches typically put...
The automatic detection of sarcasm and irony in user-generated content is one of the most challenging tasks in Natural Language Processing. In this paper we address this problem by introducing Bayesian Model Averaging (BMA), an ensemble approach that takes into account several classifiers according to their reliabilities and their marginal probability predictions. The impact of the most used expressive...
Many websites currently let users rate the quality of items based on their opinions. These ratings are later used to produce item reputation scores. The majority of websites apply the mean method to aggregate user ratings. This method is very simple and is not considered an accurate aggregator. Many methods have been proposed to make aggregators produce more accurate reputation scores...
With the rapid growth of real-time social media such as Twitter, many users share and discuss topics of interest through such platforms. A hashtag is a type of metadata tag that allows users to annotate the topics of their tweets. In research, for example, hashtags can improve the performance of event detection through observation of hashtag trends. Although Twitter grows rapidly, hashtag growth is...
In the Web environment, providing Web services appropriate to users' needs requires quickly and accurately extracting contents such as main-content, menu-list, article-list, and comments from Web documents. In this paper, we propose an efficient method that extracts various contents from Web documents. In the method, text blocks are separated from the document and context information...
The growing trend of using spatial information in various domains has increased the need for spatial data analysis. As spatial data analysis involves the study of interactions between spatial objects, Probabilistic Relational Models (PRMs) can be a good choice for modeling probabilistic dependencies between such objects. However, standard PRMs do not support spatial objects. Here, we present a general...
Due to the increasing number of vehicles in recent years, traffic congestion is a common problem for residents of metropolises. For a better understanding of traffic congestion, data analyzed with big data technology can be provided as timeline information. However, a scalability problem occurs when raw traffic data are converted into timeline information due to the volume and complexity...
Ever since the advent of online social networking, people have been voluntarily posting and consuming information on the web. This new means of digital communication makes it possible to spread information considerably far in a very short span of time with minimal resources. Social networks are increasingly being used to spread misinformation online due to low costs in organizing grassroots of these...
The pervasive availability of increasingly powerful mobile computing devices such as PDAs, smartphones, and wearable sensors is widening their use in complex applications such as collaborative analysis, information sharing, and data mining in a mobile context. Energy characterization plays a critical role in determining the requirements of data-intensive applications that can be efficiently executed...
Argo, a global ocean monitoring system with more than 3,600 small, lightweight drifting buoys, continuously measures oceanic temperature and salinity. The accumulated big ocean observation data help many studies, such as investigations into the mechanisms of climate change. Although human experts visually confirm and revise quality control (QC) labels, it is difficult to regularize the...
In blood transfusion studies, it is often desirable before a surgical procedure to estimate the likelihood of a patient bleeding, the need for blood products, re-operation due to bleeding, and other important patient outcomes. Such prediction rules are crucial in allowing for optimal planning, more efficient use of blood bank resources, and identification of high-risk patient cohorts for specific perioperative...
In this work we investigate the use of parametric statistical methods for anomaly detection in time series data. The approach involves the use of simple and computationally efficient algorithms, the Cumulative Sum (CUSUM) and the Exponentially Weighted Moving Average (EWMA), that have demonstrated acceptable performance in detecting different shifts from the process mean. However, while the performance...
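For readers unfamiliar with the two statistics named in this abstract, the following is a minimal textbook-style sketch of EWMA smoothing and a two-sided tabular CUSUM. It is not the authors' implementation: the function names and the parameters `alpha`, `target`, `slack`, and `threshold` are illustrative assumptions, as are the choices to seed the EWMA with the first observation and to reset the CUSUM sums after each alarm.

```python
from typing import List, Sequence


def ewma(values: Sequence[float], alpha: float = 0.3) -> List[float]:
    """Exponentially weighted moving average.

    Recurrence: z_t = alpha * x_t + (1 - alpha) * z_{t-1}.
    Assumption: z_0 is seeded with the first observation.
    """
    if not values:
        return []
    smoothed = [float(values[0])]
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


def cusum_alarms(
    values: Sequence[float],
    target: float,
    slack: float,
    threshold: float,
) -> List[int]:
    """Two-sided tabular CUSUM.

    Accumulates deviations from `target` beyond the allowance `slack`
    and returns the indices where either sum exceeds `threshold`.
    Assumption: both sums are reset to zero after an alarm.
    """
    high = 0.0  # accumulates upward shifts
    low = 0.0   # accumulates downward shifts
    alarms: List[int] = []
    for i, x in enumerate(values):
        high = max(0.0, high + (x - target - slack))
        low = max(0.0, low + (target - x - slack))
        if high > threshold or low > threshold:
            alarms.append(i)
            high = 0.0
            low = 0.0
    return alarms


if __name__ == "__main__":
    # Hypothetical series: in control for five samples, then a mean shift.
    series = [0.0] * 5 + [3.0] * 5
    print(ewma(series, alpha=0.5))
    print(cusum_alarms(series, target=0.0, slack=0.5, threshold=4.0))
```

A larger `slack` makes the CUSUM ignore small deviations, and a larger `threshold` delays alarms but reduces false positives; in practice both are tuned to the size of the shift one wants to detect.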
The ability to detect adverse drug events (ADEs) in electronic health records (EHRs) is useful in many medical applications, such as alerting systems that indicate when an ADE-specific diagnosis code should be assigned. Automating the detection of ADEs can be attempted by applying machine learning to existing, labeled EHR data. How to do this in an effective manner is, however, an open question. The...
The enormous amounts of data that are continuously recorded in electronic health record systems offer ample opportunities for data science applications to improve healthcare. There are, however, challenges involved in using such data for machine learning, such as high dimensionality and sparsity, as well as an inherent heterogeneity that does not allow the distinct types of clinical data to be treated...