Corpus: eng_news_2019_10K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           10000
Average sentence length in characters                         115.2437
Average sentence length in words                              19.3344
Number of distinct word forms                                 37937
Number of distinct word forms (without multiwords)            29428
Percentage of lower case word forms                           53.7312
Number of multi word units                                    8509
Percentage of multi word units                                22.4293
Number of running word forms                                  210022
Number of running word forms (without multiwords)             193336
Percentage of lower case running words                        81.0839
Average word form length                                      7.3130
Average running word length                                   4.87055696
Percentage of word forms with frequency=1                     65.0500
Number of sentence based co-occurrences                       33538
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    1416.81
Number of neighbour based co-occurrences                      7161
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    2123.86
Average number of sentence based co-occurrences per sentence  36.6284
Average number of neighbour co-occurrences per sentence       5.0050
Most frequent word                                            the
Most frequent word's frequency                                9978
Computed in 292 ms on 2021-05-28 at 10:00.
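
Some of the reported values appear to be derived from others. The sketch below recomputes two of them from the figures in the table. It assumes that the multi word percentage is taken over the distinct word forms and that the "without multiwords" count is the plain difference; the page does not state how the values were computed, and the variable names are illustrative only.

```python
# Values copied from the summary table.
distinct_forms = 37937               # Number of distinct word forms
distinct_forms_no_multiword = 29428  # ... (without multiwords)
multiword_units = 8509               # Number of multi word units
reported_multiword_pct = 22.4293     # Percentage of multi word units
reported_hapax_pct = 65.0500         # Percentage of word forms with frequency=1

# Assumption: distinct forms without multiwords = distinct forms - multi word units.
derived_no_multiword = distinct_forms - multiword_units

# Assumption: the multi word percentage is relative to distinct word forms.
derived_multiword_pct = 100 * multiword_units / distinct_forms

# Rough estimate of how many word forms occur exactly once,
# back-calculated from the reported percentage (not a reported value).
estimated_hapax_forms = round(reported_hapax_pct / 100 * distinct_forms)

print("without multiwords matches:", derived_no_multiword == distinct_forms_no_multiword)
print("multi word percentage (derived):", round(derived_multiword_pct, 4))
print("estimated forms with frequency=1:", estimated_hapax_forms)
```

Under these assumptions both derived values agree with the table to the reported precision. The same approach does not reproduce every row (for example, the average sentence length in words is not simply the running word forms divided by the number of sentences), so the remaining values presumably rest on definitions the page does not spell out.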