Corpus: nld_news_2023_1M


1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           1000000
Average sentence length in characters                         90.1422
Average sentence length in words                              14.7498
Number of distinct word forms                                 467997
Number of distinct word forms (without multiwords)            436927
Percentage of lower case word forms                           56.8284
Number of multi word units                                    31070
Percentage of multi word units                                6.6389
Number of running word forms                                  14944062
Number of running word forms (without multiwords)             14764947
Percentage of lower case running words                        84.9802
Average word form length                                      10.4322
Average running word length                                   5.04034346
Percentage of word forms with frequency=1                     57.1292
Number of sentence based co-occurrences                       2006144
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    222089.94
Number of neighbour based co-occurrences                      365317
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    202619.08
Average number of sentence based co-occurrences per sentence  83.1120
Average number of neighbour co-occurrences per sentence       7.8328
Most frequent word                                            de
Most frequent word's frequency                                712840
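
Several of the values above can be cross-checked against each other. The following is a minimal sketch in Python using only numbers copied from the table. The variable names are illustrative. The last calculation assumes that the most frequent word's frequency is counted against the total number of running word forms, which the table does not state.

```python
# Values copied from the summary table.
distinct_forms = 467_997
distinct_forms_no_multiwords = 436_927
multiword_units = 31_070
running_forms = 14_944_062
running_forms_no_multiwords = 14_764_947
most_frequent_count = 712_840

# The multi-word units account exactly for the gap between the two
# distinct-form counts.
assert distinct_forms - distinct_forms_no_multiwords == multiword_units

# Share of multi-word units among all distinct word forms, in percent.
multiword_share = 100 * multiword_units / distinct_forms
print(f"multi word units: {multiword_share:.4f} %")

# Running tokens that belong to multi-word units.
running_multiword_tokens = running_forms - running_forms_no_multiwords
print(f"running multi word tokens: {running_multiword_tokens}")

# Assumption: the frequency of "de" is relative to all running word forms.
top_word_share = 100 * most_frequent_count / running_forms
print(f"share of most frequent word: {top_word_share:.2f} %")
```

The computed multi-word share reproduces the "Percentage of multi word units" row when rounded to four decimal places.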
Computed in 11145 msec on 2024-12-07 at 07:00
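
The minimal likelihood ratios in the table, 6.63 and 3.84, coincide with the chi-square critical values for one degree of freedom at the 1 % and 5 % significance levels. The page does not say which statistic is used. The sketch below assumes Dunning's log-likelihood ratio (G²) over a 2x2 contingency table, a common choice for co-occurrence significance. The function name and the example counts are hypothetical and not taken from the corpus.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 contingency table of counts.

    k11: units containing both words
    k12: units containing only the first word
    k21: units containing only the second word
    k22: units containing neither word
    """
    total = k11 + k12 + k21 + k22
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    cells = (
        (k11, row1, col1),
        (k12, row1, col2),
        (k21, row2, col1),
        (k22, row2, col2),
    )
    g2 = 0.0
    for observed, row, col in cells:
        if observed > 0:  # a zero cell contributes nothing to the sum
            expected = row * col / total
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Thresholds as listed in the table (assumed to be significance cut-offs).
SENTENCE_THRESHOLD = 6.63
NEIGHBOUR_THRESHOLD = 3.84

# Hypothetical counts: in 1,000,000 sentences, word A occurs in 1000,
# word B in 2000, and both together in 50.
score = log_likelihood_ratio(50, 950, 1950, 997_050)
print(score > SENTENCE_THRESHOLD)

# Hypothetical independent pair: observed counts equal expected counts.
baseline = log_likelihood_ratio(2, 998, 1998, 997_002)
print(baseline > SENTENCE_THRESHOLD)
```

Under this assumption, a word pair would be listed as a sentence-based co-occurrence only if its score reaches 6.63, and as a neighbour-based co-occurrence only if it reaches 3.84.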