Corpus: kor_web_2011_30K

1.1 Summary

Values for some general parameters

Parameter Value
Number of sentences 30000
Average sentence length in characters 171.2898
Average sentence length in words 16.2873
Number of distinct word forms 165416
Number of distinct word forms (without multiwords) 165290
Percentage of lower case word forms 95.6534
Number of multi word units 126
Percentage of multi word units 0.0762
Number of running word forms 487730
Number of running word forms (without multiwords) 487593
Percentage of lower case running words 98.0461
Average word form length 12.3086
Average running word length 9.45348887
Percentage of word forms with frequency=1 73.3133
Number of sentence based co-occurrences 85640
- minimal likelihood ratio 6.63
- maximal likelihood ratio 1379.08
Number of neighbour based co-occurrences 8798
- minimal likelihood ratio 3.86
- maximal likelihood ratio 5197.83
Average number of sentence based co-occurrences per sentence 12.5098
Average number of neighbour co-occurrences per sentence 1.5371
Most frequent word
Most frequent word's frequency 3928
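
Two of the derived values in the table can be cross-checked from the counts listed with them, and the likelihood-ratio rows can be illustrated with a small sketch. The page does not say which formula produced its likelihood ratios. The code below assumes the log-likelihood ratio (G^2) that is commonly used for co-occurrence significance, computed on a 2x2 contingency table. The function names and the example counts are hypothetical and chosen only for illustration. The reported minima of 6.63 and 3.86 lie at or just above the chi-square critical values for one degree of freedom (about 6.63 at p = 0.01 and 3.84 at p = 0.05), which would be consistent with significance cutoffs, but that reading is an inference and not something the page states.

```python
import math

# Values copied from the table above.
DISTINCT_WORD_FORMS = 165416
DISTINCT_WORD_FORMS_NO_MULTIWORDS = 165290
MULTI_WORD_UNITS = 126


def percentage(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table (assumed formula).

    k11: sentences containing both words
    k12: sentences containing word A but not word B
    k21: sentences containing word B but not word A
    k22: sentences containing neither word
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    return 2.0 * (
        xlogx_sum(k11, k12, k21, k22)
        - xlogx_sum(k11 + k12, k21 + k22)  # row totals
        - xlogx_sum(k11 + k21, k12 + k22)  # column totals
        + xlogx_sum(n)
    )


if __name__ == "__main__":
    # Cross-check: distinct forms minus multi-word units.
    print(DISTINCT_WORD_FORMS - MULTI_WORD_UNITS)
    # Cross-check: share of multi-word units among distinct word forms.
    print(round(percentage(MULTI_WORD_UNITS, DISTINCT_WORD_FORMS), 4))
    # Hypothetical counts for a word pair in a 30000-sentence corpus.
    print(log_likelihood_ratio(50, 50, 50, 29850))
```

Under this assumption, a word pair would be counted as a sentence-based co-occurrence only if its score reaches the minimum shown in the table. A pair whose joint count matches what independence predicts scores close to zero.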
Generated in 1142 msec on 2018-05-03 at 02:40