Corpus: yor_wikipedia_2018_10K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           10000
Average sentence length in characters                         98.4334
Average sentence length in words                              15.8604
Number of distinct word forms                                 26851
Number of distinct word forms (without multiwords)            26385
Percentage of lower case word forms                           55.0296
Number of multi word units                                    466
Percentage of multi word units                                1.7355
Number of running word forms                                  158765
Number of running word forms (without multiwords)             157839
Percentage of lower case running words                        77.3615
Average word form length                                      7.7186
Average running word length                                   5.14818898
Percentage of word forms with frequency=1                     66.7796
Number of sentence based co-occurrences                       58994
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    3621.29
Number of neighbour based co-occurrences                      5625
- minimal likelihood ratio                                    3.86
- maximal likelihood ratio                                    4805.74
Average number of sentence based co-occurrences per sentence  65.4914
Average number of neighbour co-occurrences per sentence       5.0106
Most frequent word                                            ni
Most frequent word's frequency                                5115
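
Some of the values above can be cross-checked against each other. The following is a minimal sketch in Python. The variable and function names are illustrative and not part of the corpus tooling, and it assumes that the multi word percentage is taken relative to the number of distinct word forms, which is consistent with the reported figures.

```python
# Values copied from the summary table.
distinct_forms = 26851          # Number of distinct word forms
distinct_forms_no_mw = 26385    # ... without multiwords
multiword_units = 466           # Number of multi word units
running_forms = 158765          # Number of running word forms
running_forms_no_mw = 157839    # ... without multiwords


def multiword_share(multiword: int, distinct: int) -> float:
    """Percentage of distinct word forms that are multi word units.

    Assumption: the table's percentage uses distinct word forms
    as the denominator.
    """
    return round(100 * multiword / distinct, 4)


# Removing the multi word units from the distinct forms gives the
# "without multiwords" count.
assert distinct_forms - multiword_units == distinct_forms_no_mw

print(multiword_share(multiword_units, distinct_forms))  # 1.7355
print(running_forms - running_forms_no_mw)               # 926
```

The last line shows that the two running word form counts differ by 926 tokens.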
334 msec needed at 2024-05-25 01:00
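
The summary does not state how the likelihood ratio of a co-occurrence is computed. As a hedged illustration only, the sketch below uses the common log-likelihood ratio (G²) over a 2x2 contingency table of sentence counts. The function name and its parameters are hypothetical, and the actual scoring used for this corpus may differ. The reported minima (6.63 and 3.86) lie close to the chi-square critical values for one degree of freedom (about 6.63 at p = 0.01 and 3.84 at p = 0.05), which suggests, but does not confirm, significance cut-offs of that kind.

```python
import math


def log_likelihood_ratio(n_ab: int, n_a: int, n_b: int, n: int) -> float:
    """G^2 statistic for the co-occurrence of words a and b.

    n_ab: sentences containing both a and b
    n_a:  sentences containing a
    n_b:  sentences containing b
    n:    total number of sentences
    (Hypothetical parameter names; a sketch, not the corpus tooling.)
    """
    # Observed 2x2 contingency table.
    table = (
        (n_ab, n_a - n_ab),
        (n_b - n_ab, n - n_a - n_b + n_ab),
    )
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed > 0:  # empty cells contribute nothing
                expected = row_sums[i] * col_sums[j] / n
                g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Independent words score 0; perfectly associated words score high.
print(log_likelihood_ratio(25, 50, 50, 100))
print(log_likelihood_ratio(50, 50, 50, 100))
```

Under this formulation, a pair would be kept as a significant co-occurrence only if its score reaches the chosen threshold.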