Corpus: kor_wikipedia_2011_100K

1.1 Summary

Values for some general parameters

parameter                                                        value
number of sentences                                              100000
average sentence length in characters                            154.3983
average sentence length in words                                 14.7836
number of distinct word forms                                    401029
percentage of lower case word forms                              94.0862
percentage of multi-word units                                   0.1588
number of running word forms                                     1892991
percentage of lower case running words                           97.6969
average word form length (characters)                            12.4843
average running word length (characters)                         9.38975678
percentage of word forms with frequency = 1                      71.7444
number of sentence-based co-occurrences                          121814
minimal likelihood ratio (sentence-based)                        6.63
maximal likelihood ratio (sentence-based)                        2190.04
number of neighbour-based co-occurrences                         22832
minimal likelihood ratio (neighbour-based)                       3.84
maximal likelihood ratio (neighbour-based)                       9697.99
average number of sentence-based co-occurrences per sentence     8.3532
average number of neighbour-based co-occurrences per sentence    1.5121
most frequent word                                               있다
frequency of the most frequent word                              14206
Computed in 3278 ms on 2017-12-30 at 01:32.
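The parameters above are simple counts and ratios over the corpus, plus likelihood ratios that score how strongly two words co-occur. The sketch below shows one way such values could be computed. It is not the Leipzig tooling. It assumes each sentence is already tokenized into a list of word forms, and it assumes the likelihood ratio is Dunning's log-likelihood (G²) over sentence counts. The function names `corpus_summary` and `log_likelihood_ratio` are hypothetical. The listed minima, 6.63 and 3.84, match the chi-square critical values with one degree of freedom for p = 0.01 and p = 0.05, which suggests they act as significance cut-offs. That reading is an assumption.

```python
import math
from collections import Counter


def corpus_summary(sentences):
    """Compute a few summary parameters for a non-empty list of tokenized sentences.

    Assumption: each sentence is a list of word-form strings and tokenization
    happens elsewhere. The original tools may define these values differently.
    """
    tokens = [tok for sent in sentences for tok in sent]
    freq = Counter(tokens)  # frequency of each distinct word form
    n_running = len(tokens)
    n_distinct = len(freq)
    hapax = sum(1 for count in freq.values() if count == 1)
    top_word, top_freq = freq.most_common(1)[0]
    return {
        "number of sentences": len(sentences),
        "average sentence length in words": n_running / len(sentences),
        "number of distinct word forms": n_distinct,
        "number of running word forms": n_running,
        # Averaged over distinct forms vs. over every running token.
        "average word form length": sum(len(w) for w in freq) / n_distinct,
        "average running word length": sum(len(w) for w in tokens) / n_running,
        "percentage of word forms with frequency 1": 100 * hapax / n_distinct,
        "most frequent word": top_word,
        "frequency of most frequent word": top_freq,
    }


def log_likelihood_ratio(n_ab, n_a, n_b, n):
    """Dunning's G^2 for words a and b co-occurring in n sentences (assumed measure).

    n_ab: sentences containing both words; n_a, n_b: sentences containing each word.
    """
    # 2x2 contingency table: rows = a present/absent, columns = b present/absent.
    k = [[n_ab, n_a - n_ab],
         [n_b - n_ab, n - n_a - n_b + n_ab]]
    rows = [k[0][0] + k[0][1], k[1][0] + k[1][1]]
    cols = [k[0][0] + k[1][0], k[0][1] + k[1][1]]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            if k[i][j] > 0:  # empty cells contribute 0 to the sum
                expected = rows[i] * cols[j] / n
                g2 += k[i][j] * math.log(k[i][j] / expected)
    return 2 * g2


# Assumed cut-offs: chi-square critical values with 1 degree of freedom.
THRESHOLD_P05 = 3.84
THRESHOLD_P01 = 6.63
```

For example, two words that each occur in 10 of 100 sentences and always together score about 65, well above the 6.63 cut-off. Words whose joint count equals what independence predicts score 0. Counting sentences rather than tokens is a design choice made for this sketch. A neighbour-based variant would count adjacent token pairs instead.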