This article introduces the statistical approach to natural language processing. It presents quantitative linguistics as a research discipline, together with the text units that are suitable for statistical research. The definitions of these text units are discussed in terms of their applicability to statistical natural language processing, with special attention to differences between Polish and English terminology. The statistical attributes of lexical units are also presented, along with the categories and measures used in quantitative research on lexical units.
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10, within the strategic scientific research and experimental development program:
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.