Corpus: bul_newscrawl_2017

1.1 Summary

Values for some general parameters

Parameter Value
Number of sentences 13029061
Average sentence length in characters 189.3340
Average sentence length in words 16.7883
Number of distinct word forms 2377431
Number of distinct word forms (without multiwords) 2352902
Percentage of lower case word forms 61.1471
Number of multi word units 24529
Percentage of multi word units 1.0317
Number of running word forms 219140851
Number of running word forms (without multiwords) 217417175
Percentage of lower case running words 85.9057
Average word form length 18.5639
Average running word length 10.22070446
Percentage of word forms with frequency=1 58.2379
Number of sentence based co-occurrences 38626298
- minimal likelihood ratio 6.63
- maximal likelihood ratio 3751048.50
Number of neighbour based co-occurrences 3592825
- minimal likelihood ratio 3.84
- maximal likelihood ratio 6504258.50
Average number of sentence based co-occurrences per sentence 140.2352
Average number of neighbour co-occurrences per sentence 10.5175
Most frequent word на
Most frequent word's frequency 9974222
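
Some of the values listed above can be cross-checked against each other. The following is a minimal sketch in Python, using only numbers copied from the listing; the variable names and the helper percentage() are illustrative and not part of the corpus tooling. It checks that the multi word unit count equals the difference between the two distinct word form counts, and recomputes the multi word unit percentage.

```python
# Values copied from the parameter listing above.
distinct_word_forms = 2_377_431
distinct_without_multiwords = 2_352_902
multi_word_units = 24_529
running_word_forms = 219_140_851
most_frequent_word_count = 9_974_222


def percentage(part: int, whole: int) -> float:
    """Return part as a percentage of whole (hypothetical helper)."""
    return 100.0 * part / whole


# The multi word unit count is the gap between the two distinct-form counts.
multiword_gap = distinct_word_forms - distinct_without_multiwords

# Share of multi word units among all distinct word forms.
multiword_share = percentage(multi_word_units, distinct_word_forms)

# Share of running word forms taken up by the most frequent word.
# This figure is not reported in the listing; it is derived here.
top_word_share = percentage(most_frequent_word_count, running_word_forms)

print(multiword_gap == multi_word_units)  # True
print(f"{multiword_share:.4f}")           # 1.0317
print(f"{top_word_share:.2f}")
```

The recomputed share agrees with the reported "Percentage of multi word units" to four decimal places.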
Computed in 182371 msec on 2018-02-04 19:03
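
The minimal likelihood ratios reported for the co-occurrence counts (6.63 and 3.84) coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The sketch below shows how such a cutoff could be applied, assuming the score is the log-likelihood ratio (G²) of a 2x2 contingency table; the listing does not state the exact formula used, so the function, its argument names, and the example counts are hypothetical.

```python
import math

# Cutoffs copied from the listing above.
SENTENCE_THRESHOLD = 6.63   # sentence based co-occurrences
NEIGHBOUR_THRESHOLD = 3.84  # neighbour based co-occurrences


def _xlogx(x: float) -> float:
    """Return x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 contingency table (assumed scoring formula).

    k11: units containing both words
    k12: units containing the first word only
    k21: units containing the second word only
    k22: units containing neither word
    """
    total = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return 2.0 * (cells + _xlogx(total) - rows - cols)


def is_significant(score: float, threshold: float) -> bool:
    """Keep a co-occurrence only if its score reaches the cutoff."""
    return score >= threshold


# Hypothetical counts: two words that always appear together, and two
# words whose occurrences are statistically independent.
always_together = log_likelihood_ratio(10, 0, 0, 10)
independent = log_likelihood_ratio(10, 10, 10, 10)

print(is_significant(always_together, SENTENCE_THRESHOLD))  # True
print(is_significant(independent, NEIGHBOUR_THRESHOLD))     # False
```

Under this reading, a lower cutoff for neighbour based co-occurrences admits pairs at a weaker significance level than the sentence based cutoff does.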