Raymond Wan

chapter

Block Merging for Off-Line Compression

Raymond Wan, Alistair Moffat

Lecture Notes in Computer Science > Combinatorial Pattern Matching > 32-41

To bound memory consumption, most compression systems provide a facility that controls the amount of data that may be processed at once. In this work we consider the Re-Pair mechanism of Larsson and Moffat [2000], which processes large messages as disjoint blocks. We show that the blocks emitted by Re-Pair can be post-processed to yield further savings, and describe techniques that allow files of 500 MB or more to be...

chapter

Efficient Probabilistic Latent Semantic Analysis through Parallelization

Raymond Wan, Vo Ngoc Anh, Hiroshi Mamitsuka

Lecture Notes in Computer Science > Information Retrieval Technology > Posters > 432-443

Probabilistic latent semantic analysis (PLSA) is considered an effective technique for information retrieval, but has one notable drawback: its dramatic consumption of computing resources, in terms of both execution time and internal memory. This drawback limits the practical application of the technique to document collections of modest size. In this paper, we look into the practice of...

chapter

Term Impacts as Normalized Term Frequencies for BM25 Similarity Scoring

Vo Ngoc Anh, Raymond Wan, Alistair Moffat

Lecture Notes in Computer Science > String Processing and Information Retrieval > 51-62

The BM25 similarity computation has been shown to provide effective document retrieval. In operational terms, the formulae which form the basis for BM25 employ both term frequency and document length normalization. This paper considers an alternative form of normalization using document-centric impacts, and shows that the new normalization simplifies BM25 and reduces the number of tuning parameters...

chapter

Applying Gaussian Distribution-Dependent Criteria to Decision Trees for High-Dimensional Microarray Data

Raymond Wan, Ichigaku Takigawa, Hiroshi Mamitsuka

Lecture Notes in Computer Science > Data Mining and Bioinformatics > 40-49

Biological data presents unique problems for data analysis due to its high dimensionality. Microarray data is one example of such data, and it has received much attention in recent years. Machine learning algorithms such as support vector machines (SVM) are well suited to microarray data due to their high classification accuracy. However, sometimes the information being sought is a list of genes which best separates...

INFONA - science communication portal

Search results for: Raymond Wan
