Corpus: deu-eu_web_2017_100K


1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           100000
Average sentence length in characters                         116.7085
Average sentence length in words                              16.5965
Number of distinct word forms                                 184316
Number of distinct word forms (without multiwords)            175718
Percentage of lower case word forms                           30.8443
Number of multiword units                                     8598
Percentage of multiword units                                 4.6648
Number of running word forms                                  1666474
Number of running word forms (without multiwords)             1645316
Percentage of lower case running words                        63.9230
Average word form length                                      10.6297
Average running word length                                   5.99326938
Percentage of word forms with frequency=1                     64.8001
Number of sentence-based co-occurrences                       315156
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    5606.78
Number of neighbour-based co-occurrences                      55478
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    9636.21
Average number of sentence-based co-occurrences per sentence  49.5445
Average number of neighbour co-occurrences per sentence       5.1060
Most frequent word                                            und
Most frequent word's frequency                                43381
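
Some of the values in the table are related by simple arithmetic. The following Python sketch recomputes a few of them from the raw counts as a consistency check. The dictionary keys and the function name `derived_values` are illustrative and not part of the corpus tooling. Reading the difference between the two running word form counts as "running multiword tokens" is an assumption.

```python
# Raw counts copied from the summary table.
SUMMARY = {
    "distinct_forms": 184316,
    "distinct_forms_no_multiwords": 175718,
    "multiword_units": 8598,
    "running_forms": 1666474,
    "running_forms_no_multiwords": 1645316,
    "most_frequent_word_frequency": 43381,
}


def derived_values(s):
    """Recompute values that follow from the raw counts (illustrative helper)."""
    return {
        # Distinct forms minus multiword units should give the
        # "without multiwords" count.
        "distinct_forms_no_multiwords": s["distinct_forms"] - s["multiword_units"],
        # Share of multiword units among all distinct word forms, in percent.
        "pct_multiword_units": round(
            100 * s["multiword_units"] / s["distinct_forms"], 4
        ),
        # Assumption: the gap between the two running counts is the number
        # of running tokens that are multiword units.
        "running_multiword_tokens": s["running_forms"]
        - s["running_forms_no_multiwords"],
        # Share of all running word forms taken up by the most frequent word.
        "pct_most_frequent_word": 100
        * s["most_frequent_word_frequency"]
        / s["running_forms"],
    }


if __name__ == "__main__":
    for name, value in derived_values(SUMMARY).items():
        print(name, value)
```

The first two derived values reproduce the table entries 175718 and 4.6648, which suggests that the multiword percentage is taken relative to the number of distinct word forms.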
Computed in 2584 msec on 2019-12-17 at 08:01
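
The minimal likelihood ratios reported for the co-occurrences (6.63 and 3.84) are close to the chi-square critical values for one degree of freedom at the 0.01 and 0.05 significance levels. The page does not state which formula produced them. The sketch below assumes a standard log-likelihood ratio (G²) over a 2x2 contingency table, and the function names and the example counts are hypothetical.

```python
import math

# Minimal likelihood ratios as reported in the summary table.
SENTENCE_THRESHOLD = 6.63
NEIGHBOUR_THRESHOLD = 3.84


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: units containing both words
    k12: units containing only the first word
    k21: units containing only the second word
    k22: units containing neither word
    """
    row_totals = (k11 + k12, k21 + k22)
    col_totals = (k11 + k21, k12 + k22)
    n = row_totals[0] + row_totals[1]
    cells = (
        (k11, row_totals[0], col_totals[0]),
        (k12, row_totals[0], col_totals[1]),
        (k21, row_totals[1], col_totals[0]),
        (k22, row_totals[1], col_totals[1]),
    )
    total = 0.0
    for observed, row, col in cells:
        if observed > 0:  # empty cells contribute nothing to the sum
            expected = row * col / n
            total += observed * math.log(observed / expected)
    return 2.0 * total


def is_significant(k11, k12, k21, k22, threshold=SENTENCE_THRESHOLD):
    """Keep a word pair only if its G^2 reaches the given threshold."""
    return log_likelihood_ratio(k11, k12, k21, k22) >= threshold


if __name__ == "__main__":
    # Hypothetical counts: two words that each occur in 100 of 100000
    # sentences and share 30 of them.
    print(is_significant(30, 70, 70, 99830))
```

Under this assumption, a pair whose counts match what independence predicts scores 0 and would be dropped, while strongly associated pairs score far above either threshold.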