MicroRNAs (miRNAs) are small non-coding RNAs of approximately 23 nucleotides that negatively regulate gene expression at the post-transcriptional level. miRNAs are considered good candidate biomarkers for the early detection or prognosis of various diseases. Validated miRNA targets are usually reported in the literature, so researchers must manually screen related publications to stay current with novel findings. However, the volume of miRNA-related literature is growing rapidly, which makes manual screening increasingly impractical. This study develops a text mining pipeline based on the statistical principle-based approach (SPBA) to detect miRNA-target interactions (MTIs) mentioned in the literature. SPBA uses a collection of principles to represent the linguistic concepts or rules that humans use to describe MTIs. Each principle is composed of a sequence of slots and can be learned automatically from training data by merging labeled slot sequences into more representative principles through a dominating set algorithm. Combined with a partial matching algorithm, the proposed approach can recognize miRNA mentions and extract their MTIs from articles with a promising F-score of 98.8% and an accuracy of 71.43%.
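To make the idea of slot-based principles and partial matching concrete, the following is a minimal Python sketch, not the implementation used in this study. The slot names (`MIRNA`, `REGULATE`, `GENE`), the toy lexicons, the miRNA regular expression, the `max_gap` parameter, and all function names are assumptions introduced for illustration; the learning of principles via a dominating set algorithm is not shown, and a single hand-written principle stands in for the learned ones.

```python
import re

# Hypothetical slot lexicons; a real system would learn slots from labeled data.
MIRNA_RE = re.compile(r"^(?:hsa-)?(?:miR|let)-\d+[a-z]?(?:-[35]p)?$", re.IGNORECASE)
REGULATION_WORDS = {"targets", "represses", "inhibits", "suppresses", "downregulates"}
KNOWN_GENES = {"PTEN", "BCL2", "KRAS"}  # toy gene dictionary


def label_slots(tokens):
    """Map each token to a slot label, or None when no slot applies."""
    labels = []
    for tok in tokens:
        if MIRNA_RE.match(tok):
            labels.append("MIRNA")
        elif tok.lower() in REGULATION_WORDS:
            labels.append("REGULATE")
        elif tok in KNOWN_GENES:
            labels.append("GENE")
        else:
            labels.append(None)
    return labels


def partial_match(principle, labels, max_gap=3):
    """Find token indices that fill the principle's slots in order.

    Up to max_gap non-matching tokens may sit between consecutive slots.
    Returns a list of indices, or None if the principle does not match.
    """
    def search(slot_idx, start, first):
        if slot_idx == len(principle):
            return []
        # The first slot may start anywhere; later slots are gap-limited.
        end = len(labels) if first else min(len(labels), start + max_gap + 1)
        for i in range(start, end):
            if labels[i] == principle[slot_idx]:
                rest = search(slot_idx + 1, i + 1, False)
                if rest is not None:
                    return [i] + rest
        return None

    return search(0, 0, True)


def extract_mti(sentence, principle=("MIRNA", "REGULATE", "GENE"), max_gap=3):
    """Return a (miRNA, target) pair if the principle matches, else None."""
    tokens = re.findall(r"[\w-]+", sentence)
    hit = partial_match(principle, label_slots(tokens), max_gap)
    if hit is None:
        return None
    return tokens[hit[0]], tokens[hit[-1]]


print(extract_mti("miR-21 directly targets PTEN in tumor cells."))
```

The gap limit is what makes the match partial: the word "directly" is not a slot in the principle, yet the sentence still matches because the intervening token is skipped. A principle whose slots are separated by more than `max_gap` unlabeled tokens is rejected.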
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 within the strategic scientific research and experimental development program SYNAT, “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.