This paper describes a mechanism for extracting relevant information about people from Polish portals for professionals. The extraction method is based on the hierarchical execution of XPath commands and regular expressions, depending on the structure of the processed documents. The extraction component, EXT, is part of the eXtraSpec system, whose task is to support the Human Resources departments of Polish companies during recruitment and team building. EXT can deal with several sources of information and with user profiles acquired from professionals' portals. We also discuss the advantages of the chosen extraction method in the context of the goals of the whole eXtraSpec system and outline directions for future research.
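As an illustration of the general idea of hierarchical XPath and regular-expression extraction, the following is a minimal sketch and not the EXT implementation. The profile markup, the rule format, and the names `PROFILE`, `RULES`, `apply_rule`, and `extract` are all assumptions made for this example. It uses only the limited XPath subset supported by Python's standard `xml.etree.ElementTree` module, and it assumes a well-formed document.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed profile markup; real portal pages will differ.
PROFILE = """
<html><body>
  <div class="profile">
    <h1>Jan Kowalski</h1>
    <ul class="jobs">
      <li>Programista, Firma A (2005-2009)</li>
      <li>Analityk, Firma B (2009-2011)</li>
    </ul>
  </div>
</body></html>
"""

# Hypothetical rule tree: each rule has an XPath expression, and either
# child rules (evaluated relative to every matched node) or an optional
# regular expression applied to the matched node's text.
RULES = {
    "xpath": ".//div[@class='profile']",
    "children": {
        "name": {"xpath": "./h1"},
        "jobs": {
            "xpath": "./ul[@class='jobs']/li",
            "regex": r"(?P<title>[^,]+),\s*(?P<company>.+?)\s*"
                     r"\((?P<start>\d{4})-(?P<end>\d{4})\)",
        },
    },
}


def apply_rule(node, rule):
    """Apply one rule to a node and return a list of extracted values."""
    results = []
    for match in node.findall(rule["xpath"]):
        if "children" in rule:
            # Descend: child XPaths are evaluated inside the matched node.
            results.append(
                {key: apply_rule(match, child)
                 for key, child in rule["children"].items()}
            )
            continue
        text = "".join(match.itertext()).strip()
        if "regex" in rule:
            # Leaf with a regex: split the text into named fields.
            found = re.search(rule["regex"], text)
            if found:
                results.append(found.groupdict())
        else:
            # Leaf without a regex: keep the plain text.
            results.append(text)
    return results


def extract(document):
    """Parse the document and run the rule tree from the root element."""
    return apply_rule(ET.fromstring(document.strip()), RULES)
```

Calling `extract(PROFILE)` returns one record per matched profile block, with the name as plain text and each job entry split into named fields by the regular expression. Nesting the rules keeps each XPath expression short and local to its parent node, so a change in one part of a page layout affects only the rules for that part.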
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10 within the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.