Corpus: nep_wikipedia_2018_10K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           10000
Average sentence length in characters                         233.1569
Average sentence length in words                              13.7067
Number of distinct word forms                                 36945
Number of distinct word forms (without multiwords)            36779
Percentage of lower case word forms                           99.0148
Number of multi word units                                    166
Percentage of multi word units                                0.4493
Number of running word forms                                  137311
Number of running word forms (without multiwords)             136972
Percentage of lower case running words                        99.6839
Average word form length                                      21.4009
Average running word length                                   16.04419152
Percentage of word forms with frequency=1                     69.7686
Number of sentence based co-occurrences                       20068
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    1246.24
Number of neighbour based co-occurrences                      2876
- minimal likelihood ratio                                    3.87
- maximal likelihood ratio                                    3479.10
Average number of sentence based co-occurrences per sentence  14.7958
Average number of neighbour co-occurrences per sentence       2.1254
Most frequent word
Most frequent word's frequency                                4810
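
Two of the rows above can be recomputed from the raw counts, which gives a quick consistency check on the table. The sketch below is only an illustration: the variable names are made up here, and reading the gap between the two running word form counts as occurrences of multi word units is an assumption, not something the report states.

```python
# Counts copied from the summary table above (variable names are hypothetical).
distinct_forms = 36945          # Number of distinct word forms
distinct_forms_no_mwu = 36779   # Number of distinct word forms (without multiwords)
multiword_units = 166           # Number of multi word units
running_forms = 137311          # Number of running word forms
running_forms_no_mwu = 136972   # Number of running word forms (without multiwords)

# Distinct forms minus multi word units should give the "without multiwords" row.
assert distinct_forms - multiword_units == distinct_forms_no_mwu

# Share of multi word units among all distinct word forms, in percent.
pct_multiword = round(100 * multiword_units / distinct_forms, 4)
print(pct_multiword)  # 0.4493, matching "Percentage of multi word units"

# Difference between the two running word form rows
# (assumed here to be running occurrences of multi word units).
running_mwu = running_forms - running_forms_no_mwu
print(running_mwu)  # 339
```

The per-sentence averages in the table do not follow from a simple division of these counts by the number of sentences, so they are presumably computed differently and are not recomputed here.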
Generated in 291 ms on 2024-04-03 at 02:00