Most female surnames registered in the Czech Republic differ as strings from the corresponding male surnames. This article presents a method for grouping surnames by similarity and computing frequencies for the resulting groups. The method reduces the 251,723 registered surname variants to 142,586 groups. Grouped surname frequencies can be used for linguistic research on similar surnames, for determining the geographic distribution of surnames, or by researchers who require surname frequencies irrespective of gender.
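The idea of grouping gendered variants and summing their frequencies can be illustrated with a minimal sketch. It is not the method of this article, which is not reproduced here: the suffix rules in `group_key` (stripping "-ová", mapping a final "-á" to "-ý") are a deliberately simplified assumption, the function names are hypothetical, and the counts in `sample` are made-up illustrative values rather than registry data.

```python
from collections import defaultdict


def group_key(surname: str) -> str:
    """Map a surname to a hypothetical group key (simplified assumption).

    Only two common Czech feminine patterns are handled:
    Nováková -> novák, Novotná -> novotný. Real data needs many more rules.
    """
    s = surname.lower()
    if s.endswith("ová") and len(s) > 4:
        return s[:-3]          # drop the feminine suffix "-ová"
    if s.endswith("á") and len(s) > 2:
        return s[:-1] + "ý"    # adjectival surnames: "-á" -> "-ý"
    return s


def grouped_frequencies(counts: dict[str, int]) -> dict[str, dict]:
    """Collect variants that share a key and sum their frequencies."""
    groups = defaultdict(lambda: {"variants": [], "total": 0})
    for surname, n in counts.items():
        group = groups[group_key(surname)]
        group["variants"].append(surname)
        group["total"] += n
    return dict(groups)


# Made-up illustrative counts, not registry data.
sample = {"Novák": 10, "Nováková": 12, "Novotný": 7, "Novotná": 8, "Janů": 3}
result = grouped_frequencies(sample)

for key, group in result.items():
    print(key, group["variants"], group["total"])
```

In this toy input, five surname variants collapse into three groups, which mirrors on a small scale the reduction in variant count reported above. A rule-based key like this one would misgroup many real surnames, so it should be read only as a picture of the grouping-and-summing step.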
Financed by the National Centre for Research and Development under grant No. SP/I/1/77065/10, within the strategic scientific research and experimental development program
SYNAT - “Interdisciplinary System for Interactive Scientific and Scientific-Technical Information”.