Corpus: hrv_wikipedia_2021_100K

1.1 Summary

Values for some general parameters

Parameter                                                       Value
Number of sentences                                             100000
Average sentence length in characters                           102.3062
Average sentence length in words                                15.3438
Number of distinct word forms                                   210848
Number of distinct word forms (without multiwords)              204365
Percentage of lowercase word forms                              58.3686
Number of multiword units                                       6483
Percentage of multiword units                                   3.0747
Number of running word forms                                    1540106
Number of running word forms (without multiwords)               1529337
Percentage of lowercase running words                           84.0151
Average word form length                                        8.4905
Average running word length                                     5.60030327
Percentage of word forms with frequency = 1                     61.2574
Number of sentence-based co-occurrences                         228640
  - minimal likelihood ratio                                    6.63
  - maximal likelihood ratio                                    4634.98
Number of neighbour-based co-occurrences                        45769
  - minimal likelihood ratio                                    3.84
  - maximal likelihood ratio                                    9461.52
Average number of sentence-based co-occurrences per sentence    29.5253
Average number of neighbour-based co-occurrences per sentence   4.0333
Most frequent word                                              je
Most frequent word's frequency                                  67590
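
Several rows of the summary table are derived from others. The following Python sketch is a consistency check, not part of the corpus tooling: it recomputes two values from the raw counts. It assumes that the multiword percentage is taken relative to the number of distinct word forms (the figures agree with this reading), and it treats the difference between the two running-word counts as tokens covered by multiword units, which is an interpretation rather than something the table states. The names check_summary and multiword_share are hypothetical.

```python
# Counts copied from the summary table.
DISTINCT_FORMS = 210848
DISTINCT_FORMS_NO_MWU = 204365
MULTIWORD_UNITS = 6483
RUNNING_FORMS = 1540106
RUNNING_FORMS_NO_MWU = 1529337


def multiword_share(distinct: int, multiword: int) -> float:
    """Percentage of distinct word forms that are multiword units.

    Assumption: the table's percentage uses distinct word forms as the base.
    """
    return 100.0 * multiword / distinct


def check_summary() -> dict:
    # Distinct forms with multiwords = distinct forms without them + multiword units.
    assert DISTINCT_FORMS_NO_MWU + MULTIWORD_UNITS == DISTINCT_FORMS
    share = round(multiword_share(DISTINCT_FORMS, MULTIWORD_UNITS), 4)
    # Assumption: this difference counts running tokens belonging to multiword units.
    multiword_tokens = RUNNING_FORMS - RUNNING_FORMS_NO_MWU
    return {
        "multiword_share_percent": share,
        "multiword_running_tokens": multiword_tokens,
    }


if __name__ == "__main__":
    print(check_summary())
```

The recomputed share matches the reported 3.0747 when rounded to four decimals.
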
Computation time: 1575 ms (2021-06-13, 18:00)
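
The two co-occurrence rows report minimal likelihood ratios of 6.63 and 3.84. These coincide with the critical values of the chi-square distribution with one degree of freedom at p = 0.01 (6.635) and p = 0.05 (3.841), so they plausibly serve as significance cut-offs, although the page does not say so. The following Python sketch computes Dunning's log-likelihood ratio G² from the 2×2 contingency table of two words and converts it to a p-value. It assumes G² is the statistic behind the reported ratios, which the source does not confirm; the function names and example counts are hypothetical. For neighbour-based co-occurrences the counting units would be adjacent word positions rather than sentences; how the corpus defines them is not stated here.

```python
import math


def log_likelihood_ratio(n_ab: int, n_a: int, n_b: int, n: int) -> float:
    """Dunning's G^2 statistic for the 2x2 table of two words a and b.

    Assumption: this is the likelihood ratio used for the co-occurrence rows.
    n_ab: units (e.g. sentences) containing both a and b
    n_a, n_b: units containing a, respectively b
    n: total number of units
    """
    observed = [
        n_ab,                  # a and b
        n_a - n_ab,            # a without b
        n_b - n_ab,            # b without a
        n - n_a - n_b + n_ab,  # neither a nor b
    ]
    if min(observed) < 0:
        raise ValueError("inconsistent counts")
    # Marginal totals per cell: row total (a / not a) and column total (b / not b).
    row_totals = [n_a, n_a, n - n_a, n - n_a]
    col_totals = [n_b, n - n_b, n_b, n - n_b]
    g2 = 0.0
    for o, r, c in zip(observed, row_totals, col_totals):
        if o > 0:  # empty cells contribute 0 (limit of x * ln x)
            expected = r * c / n
            g2 += o * math.log(o / expected)
    return 2.0 * g2


def chi2_1df_pvalue(statistic: float) -> float:
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # Hypothetical counts: a and b each occur in the same 10 of 1000 sentences.
    g2 = log_likelihood_ratio(n_ab=10, n_a=10, n_b=10, n=1000)
    print(f"G^2 = {g2:.2f}, p = {chi2_1df_pvalue(g2):.2e}")
    # p-values at the two minimal ratios reported in the table.
    for threshold in (6.63, 3.84):
        print(threshold, round(chi2_1df_pvalue(threshold), 3))
```

The p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable, so only the standard library's math.erfc is needed.
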