Corpus: deu_news_2002_1M

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           1000000
Average sentence length in characters                         115.2999
Average sentence length in words                              16.2418
Number of distinct word forms                                 744982
Number of distinct word forms (without multiwords)            638994
Percentage of lower case word forms                           18.9838
Number of multi word units                                    105988
Percentage of multi word units                                14.2269
Number of running word forms                                  17006723
Number of running word forms (without multiwords)             16178697
Percentage of lower case running words                        62.8299
Average word form length                                      12.3421
Average running word length                                   6.02108680
Percentage of word forms with frequency=1                     58.2650
Number of sentence based co-occurrences                       3012266
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    79475.43
Number of neighbour based co-occurrences                      503934
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    126648.48
Average number of sentence based co-occurrences per sentence  86.8625
Average number of neighbour co-occurrences per sentence       8.1760
Most frequent word                                            der
Most frequent word's frequency                                502802
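
Some derived rows of the table can be cross-checked against the raw counts. The following is a minimal Python sketch and not part of the corpus tooling. The variable names are illustrative, and the denominator used for the share of the most frequent word (all running word forms, multiwords included) is an assumption.

```python
# Raw counts copied from the summary table.
distinct_word_forms = 744982
multi_word_units = 105988
running_word_forms = 17006723
most_frequent_word_count = 502802  # frequency of "der"

# Distinct word forms without multiwords: total minus multi word units.
forms_without_multiwords = distinct_word_forms - multi_word_units

# Percentage of multi word units among all distinct word forms.
multiword_percentage = 100 * multi_word_units / distinct_word_forms

# Share of "der" among running word forms.
# Assumption: the denominator includes multiword tokens.
der_share = 100 * most_frequent_word_count / running_word_forms

print(forms_without_multiwords)
print(round(multiword_percentage, 4))
print(round(der_share, 4))
```

The first two results agree with the table rows "Number of distinct word forms (without multiwords)" and "Percentage of multi word units". Rows such as the average sentence length in words are left out, because the table does not state which token count they are based on.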
Generated in 13346 ms on 2018-02-15 at 15:40.
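
The two minimal likelihood ratios in the table, 6.63 and 3.84, coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The page does not state which formula was used to score co-occurrences. The sketch below assumes a Dunning-style log-likelihood ratio (G²) over a 2x2 contingency table; the function names and the example counts are hypothetical.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    k11: units containing both words
    k12: units containing only the first word
    k21: units containing only the second word
    k22: units containing neither word
    """
    total = k11 + k12 + k21 + k22
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def is_significant(g2, threshold=6.63):
    """Keep a pair only if its score reaches the threshold.

    The default 6.63 is the minimum listed for sentence based
    co-occurrences; 3.84 is the one listed for neighbour based ones.
    """
    return g2 >= threshold


# Hypothetical counts, not taken from the corpus.
score = log_likelihood_ratio(20, 5, 5, 20)
print(score, is_significant(score))
```

Under this assumption, a table whose cells match their expected values scores 0, and only pairs scoring at or above the chosen threshold would be counted as co-occurrences.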