Corpus: hrv_news_2020_1M

1.1 Summary

Values for some general parameters

Parameter Value
Number of sentences 1000000
Average sentence length in characters 123.3214
Average sentence length in words 19.0351
Number of distinct word forms 509590
Number of distinct word forms (without multiwords) 496896
Percentage of lower case word forms 59.2390
Number of multi word units 12694
Percentage of multi word units 2.4910
Number of running word forms 19102237
Number of running word forms (without multiwords) 18983773
Percentage of lower case running words 86.9593
Average word form length 8.8876
Average running word length 5.40215146
Percentage of word forms with frequency=1 46.8361
Number of sentence based co-occurrences 5245064
- minimal likelihood ratio 6.63
- maximal likelihood ratio 71402.73
Number of neighbour based co-occurrences 595995
- minimal likelihood ratio 3.84
- maximal likelihood ratio 118703.92
Average number of sentence based co-occurrences per sentence 114.4514
Average number of neighbour co-occurrences per sentence 9.9654
Most frequent word je
Most frequent word's frequency 806271
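
Several rows in the table follow from others by simple arithmetic. The sketch below recomputes three of them from the listed counts. The variable names are illustrative, and reading the multi word percentage as a share of distinct word forms is an assumption that the numbers happen to support, not something the page states.

```python
# Counts copied from the summary table; variable names are illustrative.
distinct_forms = 509590          # Number of distinct word forms
multiword_units = 12694          # Number of multi word units
running_forms = 19102237         # Number of running word forms
running_forms_no_mw = 18983773   # Number of running word forms (without multiwords)

# Distinct word forms once multi word units are excluded.
distinct_no_mw = distinct_forms - multiword_units

# Share of multi word units among distinct word forms, in percent
# (assumed denominator; it reproduces the tabulated 2.4910).
pct_multiword = 100 * multiword_units / distinct_forms

# Running tokens accounted for by multi word units (derived, not in the table).
running_mw = running_forms - running_forms_no_mw

print(distinct_no_mw)            # 496896
print(round(pct_multiword, 4))   # 2.491
print(running_mw)                # 118464
```

The first two results agree with the tabulated values 496896 and 2.4910. The third is a derived figure that the table does not list.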
17399 msec needed at 2021-06-04 21:00
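
The minimal likelihood ratios 6.63 and 3.84 coincide with the chi-square critical values for one degree of freedom at the 0.01 and 0.05 levels. The page does not say how the ratio is computed. A minimal sketch, assuming the widely used log-likelihood ratio (G²) over a 2x2 contingency table, is shown below. The function names and the layout of the contingency table are hypothetical and not taken from the corpus tooling.

```python
import math

# Thresholds as listed in the table.
SENTENCE_MIN_LLR = 6.63    # sentence based co-occurrences
NEIGHBOUR_MIN_LLR = 3.84   # neighbour based co-occurrences


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of counts (assumed formula).

    k11: units containing both words
    k12: units containing word A but not word B
    k21: units containing word B but not word A
    k22: units containing neither word
    """
    def xlogx_sum(*counts):
        # Sum of c * ln(c), treating 0 * ln(0) as 0.
        return sum(c * math.log(c) for c in counts if c > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols + xlogx_sum(n))


def is_significant(llr, threshold):
    """True if a co-occurrence would pass the given minimum ratio."""
    return llr >= threshold


# Hypothetical counts: independent words versus strongly associated words.
independent = log_likelihood_ratio(10, 10, 10, 10)
associated = log_likelihood_ratio(100, 0, 0, 100)
print(is_significant(independent, NEIGHBOUR_MIN_LLR))
print(is_significant(associated, SENTENCE_MIN_LLR))
```

Under this assumption, a word pair whose counts match independence scores close to zero and is filtered out, while a pair that always occurs together scores far above either threshold.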