Corpus: deu_news_2016_1M

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           1000000
Average sentence length in characters                         107.4255
Average sentence length in words                              15.2731
Number of distinct word forms                                 731650
Number of distinct word forms (without multiwords)            662004
Percentage of lower case word forms                           18.4400
Number of multi word units                                    69646
Percentage of multi word units                                9.5190
Number of running word forms                                  15632893
Number of running word forms (without multiwords)             15232219
Percentage of lower case running words                        62.9358
Average word form length                                      12.0337
Average running word length                                   5.94330051
Percentage of word forms with frequency=1                     60.9613
Number of sentence based co-occurrences                       2543058
  - minimal likelihood ratio                                  6.63
  - maximal likelihood ratio                                  71264.16
Number of neighbour based co-occurrences                      438564
  - minimal likelihood ratio                                  3.84
  - maximal likelihood ratio                                  134814.38
Average number of sentence based co-occurrences per sentence  75.3051
Average number of neighbour co-occurrences per sentence       7.2511
Most frequent word                                            der
Most frequent word's frequency                                440550
Computed in 13384 ms on 2021-05-11 at 20:00.
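
Some rows of the table can be cross-checked against each other. The sketch below recomputes the multi-word figures from the listed counts and looks at what the two minimal likelihood ratios would mean if read as chi-square statistics with one degree of freedom. That reading is an assumption, not something the report states: the values 3.84 and 6.63 happen to coincide with the usual 5% and 1% critical values. All variable names are illustrative, and the choice of distinct word forms as the denominator for the multi-word percentage is inferred from the numbers.

```python
import math

# Counts copied from the table above.
distinct_forms = 731650
distinct_forms_no_mwu = 662004
multi_word_units = 69646
running_forms = 15632893
running_forms_no_mwu = 15232219

# Distinct word forms split into single words and multi-word units.
assert distinct_forms - multi_word_units == distinct_forms_no_mwu

# Share of multi-word units among distinct word forms, in percent.
# (Assumed denominator; it reproduces the table's 9.5190.)
pct_multi_word_units = round(100 * multi_word_units / distinct_forms, 4)

# Running tokens that are multi-word units (derived, not listed in the table).
running_multi_word_tokens = running_forms - running_forms_no_mwu


def chi2_1df_p_value(statistic: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))


# Assumption: the minimal likelihood ratios act as significance cut-offs.
p_neighbour = chi2_1df_p_value(3.84)  # neighbour based co-occurrences
p_sentence = chi2_1df_p_value(6.63)   # sentence based co-occurrences

print(pct_multi_word_units, running_multi_word_tokens)
print(f"{p_neighbour:.4f} {p_sentence:.4f}")
```

Under that assumption, neighbour based co-occurrences are kept at roughly the 5% level and sentence based ones at roughly the 1% level.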