Corpus: afr_news_2016_10K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           10000
Average sentence length in characters                         110.2775
Average sentence length in words                              19.3793
Number of distinct word forms                                 25970
Number of distinct word forms (without multiwords)            25716
Percentage of lower case word forms                           69.9576
Number of multi word units                                    254
Percentage of multi word units                                0.9781
Number of running word forms                                  193584
Number of running word forms (without multiwords)             193226
Percentage of lower case running words                        87.3068
Average word form length                                      8.3390
Average running word length                                   4.61762392
Percentage of word forms with frequency=1                     64.0277
Number of sentence based co-occurrences                       24906
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    5412.23
Number of neighbour based co-occurrences                      6123
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    4049.14
Average number of sentence based co-occurrences per sentence  47.2558
Average number of neighbour co-occurrences per sentence       4.9309
Most frequent word                                            die
Frequency of the most frequent word                           10970
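
Several of the values above are simple ratios of counts. For instance, 254 multi word units out of 25970 distinct word forms give 0.9781 percent, and 25970 minus 254 gives the 25716 forms without multiwords. The following Python sketch shows how such parameters could be computed from a list of sentences. It is an illustration only: the function name `corpus_summary`, the dictionary keys, and the whitespace tokenisation are assumptions, since the source does not describe the tokeniser or the multiword handling behind the published figures.

```python
from collections import Counter


def corpus_summary(sentences):
    """Compute a few summary parameters for a non-empty list of sentence strings.

    Assumption: tokens are whitespace-separated. The tokeniser used for the
    published table is not documented in the source, so results on the real
    corpus may differ from the table.
    """
    tokens = [tok for s in sentences for tok in s.split()]
    freq = Counter(tokens)          # word form -> number of occurrences
    n_types = len(freq)             # distinct word forms
    n_tokens = len(tokens)          # running word forms
    top_word, top_count = freq.most_common(1)[0]
    return {
        "sentences": len(sentences),
        "avg_sentence_chars": sum(len(s) for s in sentences) / len(sentences),
        "avg_sentence_words": n_tokens / len(sentences),
        "distinct_word_forms": n_types,
        "running_word_forms": n_tokens,
        # str.islower() is False for tokens without cased characters (e.g. "2016").
        "pct_lowercase_forms": 100 * sum(w.islower() for w in freq) / n_types,
        "pct_lowercase_running": 100 * sum(c for w, c in freq.items() if w.islower()) / n_tokens,
        "avg_word_form_length": sum(len(w) for w in freq) / n_types,
        "avg_running_word_length": sum(len(w) * c for w, c in freq.items()) / n_tokens,
        "pct_hapax": 100 * sum(1 for c in freq.values() if c == 1) / n_types,
        "most_frequent_word": top_word,
        "most_frequent_count": top_count,
    }


if __name__ == "__main__":
    # Toy input, not taken from the corpus.
    demo = ["die kat sit", "die hond blaf"]
    for key, value in corpus_summary(demo).items():
        print(f"{key}: {value}")
```

Averaging over distinct forms versus running words is why the table shows two different word lengths: frequent words such as "die" are short and count many times in the running average, but only once among the distinct forms.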
Generated in 310 ms on 2018-01-30 at 04:20
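
The summary lists minimal likelihood ratios of 6.63 and 3.84 for the co-occurrence counts but does not give the formula behind them. These two values match the chi-square critical values for one degree of freedom at the 0.01 and 0.05 levels, so one plausible reading is a log-likelihood ratio (G²) test on a 2x2 contingency table. The sketch below follows that assumption and is not the documented method of the corpus pipeline. The function name, argument names, and constants are hypothetical.

```python
import math

# Thresholds as listed in the summary; their interpretation as
# significance cut-offs is an assumption.
SENTENCE_THRESHOLD = 6.63    # sentence based co-occurrences
NEIGHBOUR_THRESHOLD = 3.84   # neighbour based co-occurrences


def log_likelihood_ratio(n_ab, n_a, n_b, n):
    """G^2 statistic for the co-occurrence of two words A and B.

    n_ab: units (sentences or neighbour pairs) containing both A and B
    n_a:  units containing A
    n_b:  units containing B
    n:    total number of units (must be > 0)
    """
    # Observed 2x2 table: (A,B), (A,not B), (not A,B), (not A,not B).
    observed = [n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab]
    rows = [n_a, n - n_a]
    cols = [n_b, n - n_b]
    # Expected counts under independence, in the same cell order.
    expected = [r * c / n for r in rows for c in cols]
    # Cells with an observed count of zero contribute nothing to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


if __name__ == "__main__":
    # Toy counts, not taken from the corpus.
    score = log_likelihood_ratio(n_ab=30, n_a=50, n_b=60, n=10000)
    print(score, score >= SENTENCE_THRESHOLD)
```

Under this reading, a word pair would be kept as a sentence based co-occurrence only if its score reaches 6.63, which would explain why that value appears as the minimum in the table.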