Clinical Decision Support (CDS) is widely seen as an information retrieval (IR) application in the medical domain. The goal of CDS is to help physicians find useful information from a collection of medical articles with respect to the given patient records, in order to take the best care of their patients. Most of the existing CDS methods do not sufficiently consider the semantic relation between...
Clinical Decision Support (CDS) can be regarded as an information retrieval (IR) task in which medical records are used to retrieve full-text biomedical articles that satisfy physicians' information needs, aiming at better medical solutions. Recent attempts have brought in advances from deep learning by employing neural IR methods for CDS, where, however, only the document-query relationship...
Data representation is a fundamental task in machine learning, which affects the performance of the whole machine learning system. In the past few years, with the rapid development of deep learning, the models for word embedding based on neural networks have brought new inspiration to the research of natural language processing. In this paper, two kinds of schemes for improving the Continuous Bag-of-Words...
Apache Spark is an open source distributed data processing platform that uses a distributed memory abstraction to process large volumes of data efficiently. As Apache Spark is applied more and more widely, some problems have been exposed. One of the most important is performance. Apache Spark has more than 180 configuration parameters, which can be adjusted by users according...
In recent years, the development of the Internet has enabled rapid growth in global data volume, and the arrival of the era of big data has brought great challenges to traditional computing. Big data systems such as Hadoop and Spark are becoming important platforms for handling big data, but due to design flaws in big data applications themselves and unreasonable distributed framework configuration, the performance...
Learning to rank plays a very important role in information retrieval. Existing works mainly focus on applying one ranking model to all samples, which may not be realistic. In this paper, a new method for learning to rank based on query-level vector extraction is proposed, in which we assume that all samples can be divided into multiple parts, and each part is used to train one set...
Recently, as the building block of deep generative models such as Deep Belief Networks (DBNs), Restricted Boltzmann Machines (RBMs) have attracted much attention. An RBM is a Markov Random Field (MRF) associated with a bipartite undirected graph, known for its expressive power and tractable inference. While training an RBM, we need to sample from the model. The larger the mixing rate is, the...
Recently, HPC in the Cloud has emerged as a new paradigm in the field of parallel computing. Most cloud systems deploy virtual machines for provisioning resources. However, in a virtual machine environment, there is still no mature method to analyze the performance of MPI parallel programs. In this paper, we propose a series of innovative methods for performance analysis of MPI parallel programs on...
Bayesian Network parameter learning is one of the core issues of Bayesian Network research. Estimating the parameters of a Bayesian Network from a large incomplete dataset can be very compute-intensive. A factor graph based Bayesian Network parameter learning algorithm using MapReduce is presented in this paper, which decomposes one Bayesian Network into factors and gets the Bayesian Network parameter...
Nowadays, private clouds are widely used for resource sharing. Hadoop-based clusters are the most popular implementations of private clouds. However, because workload traces are not publicly available, few previous works compare and evaluate different cloud solutions with publicly available benchmarks. In this paper, we use a recently released cloud benchmark suite, CloudRank-D, to quantitatively...
The growing computing and storage needs of several scientific applications mandate the deployment of extreme-scale parallel machines such as Blue Gene/L, Spirit, Liberty and Red Storm. One of the challenges when designing and deploying these systems in a production setting is the need to take failure occurrences into account. In this paper, an offline analysis framework of cluster system logs...
The number of files stored on a personal computer is increasing very quickly, so it is difficult for users to find the information they want. A desktop search engine named SoDesktop is proposed in this paper. It is composed of four modules: Data crawler, Task scheduler, Data indexer and Data searcher. The implementations of these four modules are described in detail, and the implementation...
Topic Detection and Tracking (TDT) has been studied for years, but most existing research is oriented to news web pages. Compared to news web pages, texts in a Bulletin Board System (BBS) are more complicated and shaped by heavy user participation. In this paper, we propose a novel method of TDT for BBS, which mainly includes: a representative post selection procedure based on post quality ranking and...
Metadata is the command center of the data warehousing process and is very helpful for data ETL (Extraction, Transformation and Loading), data storage management, data analysis and data mining. CWM (Common Warehouse Meta-model) is an open industry metadata standard that is currently in wide use; the public meta-models and rules defined in CWM can properly support data transformation and...
An improved K-medoids clustering algorithm (IKMC) for detecting near-duplicate records is proposed in this paper. It treats every record in the database as a separate data object, uses the edit-distance method and attribute weights to compute similarity values among records, then detects duplicate records by clustering these similarity values. This algorithm can automatically...
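The similarity step mentioned in the abstract above (edit distance combined with attribute weights) can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the normalization by the longer field length, the weighted average, and the function names `edit_distance`, `field_similarity` and `record_similarity` are all assumptions made for illustration; the K-medoids clustering step itself is not shown.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def field_similarity(a: str, b: str) -> float:
    """Map edit distance to [0, 1]; normalizing by the longer string is an assumption."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def record_similarity(r1: Sequence[str], r2: Sequence[str],
                      weights: Sequence[float]) -> float:
    """Weighted average of per-attribute similarities (hypothetical weighting scheme)."""
    total = sum(weights)
    return sum(w * field_similarity(x, y)
               for w, x, y in zip(weights, r1, r2)) / total


if __name__ == "__main__":
    # Made-up records: a name attribute weighted more heavily than a city code.
    score = record_similarity(("John Smith", "NY"), ("Jon Smith", "NY"), [0.7, 0.3])
    print(f"similarity = {score:.2f}")
```

A pairwise similarity of this kind would typically be converted to a distance (for example, one minus the similarity) before being passed to a medoid-based clustering routine.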