Corpus: ukr_wikipedia_2016_10K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           10000
Average sentence length in characters                         195.7778
Average sentence length in words                              15.3610
Number of distinct word forms                                 50799
Number of distinct word forms (without multiwords)            50159
Percentage of lower-case word forms                           69.9128
Number of multi-word units                                    640
Percentage of multi-word units                                1.2599
Number of running word forms                                  152901
Number of running word forms (without multiwords)             151967
Percentage of lower-case running words                        82.2323
Average word form length                                      15.9801
Average running word length                                   11.76587680
Percentage of word forms with frequency=1                     70.9423
Number of sentence-based co-occurrences                       15100
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    500.61
Number of neighbour-based co-occurrences                      2885
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    796.57
Average number of sentence-based co-occurrences per sentence  8.3132
Average number of neighbour co-occurrences per sentence       1.4695
Most frequent word                                            в
Most frequent word's frequency                                2934
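
Some figures in the summary follow from others and can be cross-checked. The Python sketch below recomputes the multi-word percentage from the counts listed above. The variable names are illustrative only and are not part of the corpus tooling, and reading the difference between the two running word form counts as occurrences of multi-word units is an assumption.

```python
# Counts copied from the summary parameters above.
distinct_word_forms = 50799
distinct_without_multiwords = 50159
multiword_units = 640
running_word_forms = 152901
running_without_multiwords = 151967

# Distinct forms split into single-word forms and multi-word units.
assert distinct_without_multiwords + multiword_units == distinct_word_forms

# Share of multi-word units among all distinct word forms, in percent.
pct_multiword = 100 * multiword_units / distinct_word_forms
print(round(pct_multiword, 4))  # 1.2599, matching the reported value

# Assumption: the gap between the two running counts is the number of
# running occurrences of multi-word units.
multiword_tokens = running_word_forms - running_without_multiwords
print(multiword_tokens)  # 934
```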
Generated in 539 msec at 2018-01-26 11:00
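
The minimal likelihood ratios in the summary, 6.63 and 3.84, coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The summary does not say which statistic was computed. The sketch below assumes a log-likelihood ratio (G²) over a 2x2 contingency table, a common choice for co-occurrence significance. The function name and the example counts are hypothetical and not taken from the corpus.

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    k11: units containing both words, k12: only word A,
    k21: only word B, k22: neither word.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2

# Hypothetical counts over 10000 sentences (not corpus data).
score = log_likelihood_ratio(30, 70, 120, 9780)
print(score > 6.63)  # True: this pair would pass the sentence-level threshold
```

Under this assumption, a word pair is kept as a sentence-based co-occurrence only if its score reaches 6.63, and as a neighbour-based co-occurrence only if it reaches 3.84.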