Corpus: rus_news_2019_1M

1.1 Summary

Values for some general parameters

Parameter                                                      Value
Number of sentences                                            1000000
Average sentence length in characters                          184.2014
Average sentence length in words                               14.1070
Number of distinct word forms                                  613697
Number of distinct word forms (without multiwords)             574966
Percentage of lower case word forms                            58.3236
Number of multi word units                                     38731
Percentage of multi word units                                 6.3111
Number of running word forms                                   14231712
Number of running word forms (without multiwords)              13934976
Percentage of lower case running words                         83.2838
Average word form length                                       17.7922
Average running word length                                    12.03487046
Percentage of word forms with frequency=1                      51.5935
Number of sentence based co-occurrences                        3603310
- minimal likelihood ratio                                     6.63
- maximal likelihood ratio                                     71273.64
Number of neighbour based co-occurrences                       479035
- minimal likelihood ratio                                     3.84
- maximal likelihood ratio                                     127949.69
Average number of sentence based co-occurrences per sentence   55.1584
Average number of neighbour co-occurrences per sentence        6.0293
Most frequent word                                             в
Most frequent word's frequency                                 500832
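
The minimal likelihood ratios listed above, 6.63 and 3.84, coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The following is a minimal sketch of how such a score could be computed for one word pair, assuming the statistic is the log-likelihood ratio (G^2) over a 2x2 contingency table. The source does not state the exact formula; the function name, the argument names and the example counts are hypothetical.

```python
import math

# Thresholds taken from the table above.
SENTENCE_THRESHOLD = 6.63    # sentence based co-occurrences
NEIGHBOUR_THRESHOLD = 3.84   # neighbour based co-occurrences


def log_likelihood_ratio(n_ab, n_a, n_b, n):
    """Return G^2 for a 2x2 contingency table (assumed formula).

    n_ab: units (e.g. sentences) containing both words
    n_a:  units containing word A
    n_b:  units containing word B
    n:    total number of units
    """
    observed = [
        n_ab,                    # A and B
        n_a - n_ab,              # A without B
        n_b - n_ab,              # B without A
        n - n_a - n_b + n_ab,    # neither A nor B
    ]
    expected = [
        n_a * n_b / n,
        n_a * (n - n_b) / n,
        (n - n_a) * n_b / n,
        (n - n_a) * (n - n_b) / n,
    ]
    total = 0.0
    for o, e in zip(observed, expected):
        if o > 0:  # a zero cell contributes nothing to the sum
            total += o * math.log(o / e)
    return 2.0 * total


# Hypothetical counts for one word pair in a corpus of 1000000 sentences.
score = log_likelihood_ratio(n_ab=30, n_a=1000, n_b=2000, n=1_000_000)
is_significant = score >= SENTENCE_THRESHOLD
```

Under this assumption, a pair whose joint count equals the count expected under independence scores 0, and only pairs at or above the threshold would be kept as co-occurrences.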
Computed in 15617 msec on 2021-07-08 17:00
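
Some of the figures in the summary can be derived from others. A short sketch of that check, assuming the percentage of multi word units is taken relative to the number of distinct word forms (the source does not say so explicitly; the variable names are illustrative):

```python
# Counts copied from the summary table.
distinct_word_forms = 613697
multi_word_units = 38731

# Distinct word forms once multi word units are excluded.
distinct_without_multiwords = distinct_word_forms - multi_word_units

# Share of multi word units among all distinct word forms, in percent.
pct_multi_word_units = round(100 * multi_word_units / distinct_word_forms, 4)

print(distinct_without_multiwords)
print(pct_multi_word_units)
```

Both results agree with the reported rows: 574966 distinct word forms without multiwords and 6.3111 percent multi word units.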