Corpus: dan_news_2021_100K


1.1 Summary

Values for some general parameters

Parameter                                                      Value
Number of sentences                                            100000
Average sentence length in characters                          110.8653
Average sentence length in words                               17.8710
Number of distinct word forms                                  103928
Number of distinct word forms (without multiwords)             98423
Percentage of lowercase word forms                             65.0412
Number of multiword units                                      5505
Percentage of multiword units                                  5.2969
Number of running word forms                                   1808715
Number of running word forms (without multiwords)              1782257
Percentage of lowercase running words                          85.4811
Average word form length                                       10.1206
Average running word length                                    5.13039926
Percentage of word forms with frequency = 1                    53.5890
Number of sentence-based co-occurrences                        337486
  - minimum likelihood ratio                                   6.63
  - maximum likelihood ratio                                   8431.12
Number of neighbour-based co-occurrences                       63945
  - minimum likelihood ratio                                   3.84
  - maximum likelihood ratio                                   17655.01
Average number of sentence-based co-occurrences per sentence   82.8559
Average number of neighbour-based co-occurrences per sentence  8.2929
Most frequent word                                             i
Most frequent word's frequency                                 57178
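The word-form parameters in the table can be made concrete with a small sketch. The table does not define them, so the sketch assumes the following definitions: a word form is a distinct, case-sensitive token string, a form counts as lowercase when Python's str.islower() is true, form length is averaged over distinct forms while running length is averaged over all tokens, and multiword units are ignored. The function summary_stats and the toy Danish sentences are hypothetical illustrations. They are not the tool that produced this table.

```python
from collections import Counter


def summary_stats(sentences):
    """Compute summary statistics from pre-tokenized sentences.

    Hypothetical definitions (the source table does not state them):
    - a word form is a distinct, case-sensitive token string;
    - a form counts as lowercase when str.islower() is true;
    - form length is averaged over distinct forms, running length over tokens;
    - multiword units are ignored.
    """
    tokens = [tok for sentence in sentences for tok in sentence]
    freq = Counter(tokens)          # frequency of each distinct word form
    forms = list(freq)              # distinct word forms
    return {
        "sentences": len(sentences),
        "avg_sentence_len_words": len(tokens) / len(sentences),
        "distinct_forms": len(forms),
        "running_forms": len(tokens),
        "pct_lowercase_forms": 100 * sum(f.islower() for f in forms) / len(forms),
        "pct_lowercase_running": 100 * sum(t.islower() for t in tokens) / len(tokens),
        "avg_form_len": sum(len(f) for f in forms) / len(forms),
        "avg_running_len": sum(len(t) for t in tokens) / len(tokens),
        # share of distinct forms that occur exactly once
        "pct_freq_1": 100 * sum(c == 1 for c in freq.values()) / len(forms),
        "most_frequent": freq.most_common(1)[0],
    }


# Toy input: three short Danish sentences, already tokenized.
SENTENCES = [
    ["Han", "bor", "i", "Aarhus"],
    ["Hun", "arbejder", "i", "København"],
    ["Det", "regner", "i", "dag"],
]

for name, value in summary_stats(SENTENCES).items():
    print(name, value)
```

On this toy input, the mean length of distinct forms (4.5) is higher than the mean length of running words (47/12, about 3.92). The one-letter form "i" is counted once as a form but three times as a token. The table shows the same direction (10.1206 vs. 5.13039926). That is consistent with these definitions but does not prove them.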
Computed in 1073 msec on 2022-01-21 at 19:00.
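The table reports likelihood ratios for the co-occurrences but not the formula behind them. A common significance measure for word co-occurrence is Dunning's log-likelihood ratio G² over a 2x2 contingency table. The sketch below assumes that measure and counts sentences for sentence-based co-occurrences. The reported minima 6.63 and 3.84 match, after rounding, the critical values of the chi-square distribution with one degree of freedom at p = 0.01 (6.635) and p = 0.05 (3.841). This suggests significance cutoffs, but the source does not confirm it. The names log_likelihood and p_value_1df, and the example counts, are hypothetical.

```python
import math


def _x_log_x(x):
    # Convention: 0 * log(0) = 0, so empty cells contribute nothing.
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 table of counts (assumed measure).

    k11: units containing both words A and B
    k12: units containing A but not B
    k21: units containing B but not A
    k22: units containing neither
    A "unit" is a sentence for sentence-based co-occurrences (assumption).
    """
    n = k11 + k12 + k21 + k22
    cells = _x_log_x(k11) + _x_log_x(k12) + _x_log_x(k21) + _x_log_x(k22)
    rows = _x_log_x(k11 + k12) + _x_log_x(k21 + k22)
    cols = _x_log_x(k11 + k21) + _x_log_x(k12 + k22)
    # Equivalent to 2 * sum(O * ln(O / E)) with E = row_total * col_total / n.
    return 2.0 * (cells - rows - cols + _x_log_x(n))


def p_value_1df(g2):
    # Survival function of chi-square with 1 degree of freedom:
    # P(X > g2) = erfc(sqrt(g2 / 2)); clamp tiny negative rounding errors.
    return math.erfc(math.sqrt(max(g2, 0.0) / 2.0))


# Hypothetical counts in a 100000-sentence corpus: A occurs in 50 sentences,
# B in 40, and both together in 20.
g2 = log_likelihood(20, 30, 20, 99930)
print(round(g2, 2), p_value_1df(g2) < 0.01)
```

Under this assumption, a pair would be kept as a sentence-based co-occurrence only if its G² reached about 6.63. The lower neighbour-based minimum of 3.84 would then correspond to the looser p = 0.05 level.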