Noise, meaning erroneous or missing data, is a prominent challenge in many bioinformatics datasets. The presence of noise in gene expression datasets has adverse effects on machine-learning techniques, such as supervised classification algorithms and feature selection techniques. Additionally, the identification of noise and its quantification are challenging tasks that require a proper...
Big data is a big business, and effective modeling of this data is key. This paper provides a comprehensive multidimensional analysis of various open source tools for machine learning with big data. An evaluation standard is proposed along with detailed comparisons of the frameworks discussed, with regard to algorithm availability, scalability, speed, and more. The major tools profiled are Mahout,...
With the rapid deployment of a number of sensors, it is crucial to efficiently manage their data streams with heterogeneous properties. To achieve various sensor applications such as discovery and mashup, a method of retrieving meaningful information from raw sensor data is required. However, it is hard to analyze and represent the sensor data since sensors generate streaming data of different patterns...
Faceted browsing has become ubiquitous with modern digital libraries and online search engines, yet the process is still difficult to abstractly model in a manner that supports the development of interoperable and reusable interfaces. We propose category theory as a theoretical foundation for faceted browsing and demonstrate how the interactive process can be mathematically abstracted. Existing efforts...
Ensemble learning is a powerful tool that has shown promise when applied to bioinformatics datasets. In particular, the Random Forest classifier has been an effective and popular algorithm due to its relatively good classification performance and its ease of use. However, Random Forest does not account for class imbalance, which is known for decreasing classification performance and increasing...
Devices, objects, and sensors are increasingly connected with one another in the Internet of Things (IoT). Although models for representing sensors exist, there is a lack of methods for integrating sensor data with domain knowledge to construct complex sensors. Semantic models for complex sensor mashups are therefore required. In this paper, we present a complex sensor model that enables us to combine...
Choosing an appropriate cancer treatment is potentially the most important task in the treatment of a cancer patient. If it were possible to identify the best option for a patient (or at minimum to remove options that will not help the patient), then the general prognosis of the patient would improve. However, this task becomes much more subtle due to characteristics such as high dimensionality found in...
Data mining and machine learning methods play an important role in searching and retrieving multimedia information from all kinds of multimedia repositories. Although some of these methods have proven useful, retrieving multimedia information effectively and efficiently under difficult scenarios remains an interesting and active research area, e.g., detecting rare events...