Corpus: ilo_wikipedia_2014_10K

1.1 Summary

Values for some general parameters

parameter                                                        value
number of sentences                                              10000
average sentence length in characters                            136.6898
average sentence length in words                                 22.2692
number of distinct word forms                                    30034
percentage of lower case word forms                              63.3116
percentage of multi word units                                   0.6759
number of running word forms                                     249357
percentage of lower case running words                           81.7080
average word form length                                         8.3309
average running word length                                      5.07787620
percentage of word forms with frequency=1                        66.2815
number of sentence based co-occurrences                          19156
minimal likelihood ratio (sentence based)                        6.63
maximal likelihood ratio (sentence based)                        1749.70
number of neighbour based co-occurrences                         6215
minimal likelihood ratio (neighbour based)                       3.84
maximal likelihood ratio (neighbour based)                       4836.22
average number of sentence based co-occurrences per sentence     51.4450
average number of neighbour co-occurrences per sentence          8.9428
most frequent word                                               a
most frequent word's frequency                                   25292
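
The page does not say how these parameters were computed. As a rough illustration only, the sketch below shows one way several of them could be derived from a list of sentences, together with a log-likelihood ratio for a 2x2 co-occurrence table. Everything in it is an assumption rather than the corpus tool's actual method: the function names `corpus_summary` and `log_likelihood_ratio` are hypothetical, tokens are obtained by splitting on whitespace, "lower case" is tested with `str.islower`, and the ratio is Dunning's G² statistic. The minimal ratios listed above, 6.63 and 3.84, coincide with the chi-square critical values for one degree of freedom at the 0.01 and 0.05 levels, which is consistent with a significance cut-off of that kind.

```python
import math
from collections import Counter


def corpus_summary(sentences):
    """Compute a few summary parameters for a non-empty list of sentence strings.

    Assumption: tokens are whitespace-separated; the real corpus tool
    may tokenize differently.
    """
    tokens = [tok for sentence in sentences for tok in sentence.split()]
    freq = Counter(tokens)  # word form -> number of occurrences
    top_word, top_count = freq.most_common(1)[0]
    return {
        "number of sentences": len(sentences),
        "average sentence length in characters":
            sum(len(s) for s in sentences) / len(sentences),
        "average sentence length in words": len(tokens) / len(sentences),
        "number of distinct word forms": len(freq),
        "number of running word forms": len(tokens),
        # str.islower is False for forms without cased letters, e.g. "2014"
        "percentage of lower case word forms":
            100 * sum(w.islower() for w in freq) / len(freq),
        "percentage of word forms with frequency=1":
            100 * sum(c == 1 for c in freq.values()) / len(freq),
        "average word form length": sum(len(w) for w in freq) / len(freq),
        "average running word length":
            sum(len(t) for t in tokens) / len(tokens),
        "most frequent word": top_word,
        "most frequent word's frequency": top_count,
    }


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 contingency table of counts.

    k11: units containing both words, k12: only the first word,
    k21: only the second word, k22: neither word.
    """
    def xlogx(*counts):
        # Sum of c * ln(c), skipping zero counts (limit of c*ln(c) is 0)
        return sum(c * math.log(c) for c in counts if c > 0)

    n = k11 + k12 + k21 + k22
    return 2 * (
        xlogx(k11, k12, k21, k22)
        - xlogx(k11 + k12, k21 + k22)   # row totals
        - xlogx(k11 + k21, k12 + k22)   # column totals
        + xlogx(n)
    )


if __name__ == "__main__":
    for name, value in corpus_summary(["a b a", "B c"]).items():
        print(f"{name}: {value}")
    print(log_likelihood_ratio(20, 0, 0, 20))
```

Under this reading, a word pair would be kept as a co-occurrence only if its ratio reaches the listed minimum. The two average-length rows differ because "word form" averages run over distinct forms while "running word" averages run over every token.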
Time needed: 311 msec (2017-12-25 04:00)