Corpus: bul_newscrawl_2017_30K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           30000
Average sentence length in characters                         189.7095
Average sentence length in words                              16.8074
Number of distinct word forms                                 76408
Number of distinct word forms (without multiwords)            74351
Percentage of lower case word forms                           72.1299
Number of multi word units                                    2057
Percentage of multi word units                                2.6921
Number of running word forms                                  505368
Number of running word forms (without multiwords)             501232
Percentage of lower case running words                        85.8851
Average word form length                                      16.1648
Average running word length                                   10.22890997
Percentage of word forms with frequency=1                     61.1585
Number of sentence based co-occurrences                       65022
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    8847.62
Number of neighbour based co-occurrences                      17310
- minimal likelihood ratio                                    3.85
- maximal likelihood ratio                                    15479.78
Average number of sentence based co-occurrences per sentence  34.0475
Average number of neighbour co-occurrences per sentence       7.3686
Most frequent word                                            на
Frequency of the most frequent word                           22932
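
Some rows in the table appear to be derivable from others. The following is a minimal sketch, not part of the original report, that recomputes two of them from the listed counts. It assumes that the multi-word percentage is taken relative to the number of distinct word forms and that the "without multiwords" counts are simple differences; the function name `derived_checks` is hypothetical.

```python
# Values copied from the summary table.
DISTINCT_FORMS = 76408        # Number of distinct word forms
DISTINCT_FORMS_NO_MW = 74351  # ... without multiwords
MULTIWORD_UNITS = 2057        # Number of multi word units
RUNNING_FORMS = 505368        # Number of running word forms
RUNNING_FORMS_NO_MW = 501232  # ... without multiwords


def derived_checks():
    """Recompute values that can be derived from the raw counts.

    Assumption: percentages are relative to the number of distinct
    word forms, and the 'without multiwords' rows are differences.
    """
    return {
        # Distinct forms left after removing multi-word units.
        "distinct_forms_minus_multiwords": DISTINCT_FORMS - MULTIWORD_UNITS,
        # Share of multi-word units among distinct forms, in percent.
        "pct_multiword_units": round(100 * MULTIWORD_UNITS / DISTINCT_FORMS, 4),
        # Running tokens that belong to multi-word units.
        "running_multiword_tokens": RUNNING_FORMS - RUNNING_FORMS_NO_MW,
    }


if __name__ == "__main__":
    for name, value in derived_checks().items():
        print(f"{name}: {value}")
```

Under these assumptions the first two values reproduce the table rows "Number of distinct word forms (without multiwords)" and "Percentage of multi word units". The third value is not listed in the table and is shown only as a derived quantity.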
Generated in 448 ms on 2018-02-04 22:40