Corpus: hbs_wikipedia_2016_10K

1.1 Summary

Values for some general parameters

parameter value
number of sentences 10000
average sentence length in characters 112.8248
average sentence length in words 16.7671
number of distinct word forms 50504
percentage of lower case word forms 68.9193
percentage of multi word units 0.4812
number of running word forms 191890
percentage of lower case running words 83.7125
average word form length 8.0743
average running word length 5.6702
percentage of word forms with frequency=1 70.3311
number of sentence based co-occurrences 12130
minimal likelihood ratio (sentence based) 6.64
maximal likelihood ratio (sentence based) 1120.21
number of neighbour based co-occurrences 2805
minimal likelihood ratio (neighbour based) 3.86
maximal likelihood ratio (neighbour based) 1718.46
average number of sentence based co-occurrences per sentence 20.5024
average number of neighbour co-occurrences per sentence 2.5452
most frequent word je
frequency of the most frequent word 6867
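
As an illustration of how several of the parameters above relate to each other, the following is a minimal sketch that computes them for a small list of sentences. It is not the tool that produced this table: the function name `corpus_summary`, the whitespace tokenization, and the use of `str.islower()` to decide what counts as a lower case word form are all assumptions made for the example. Multi word units and co-occurrence statistics are left out.

```python
from collections import Counter


def corpus_summary(sentences):
    """Compute a few summary parameters for a non-empty list of sentence strings.

    Hypothetical helper for illustration only; the tokenizer used for the
    table above is not documented here, so whitespace splitting is assumed.
    """
    tokenized = [s.split() for s in sentences]
    tokens = [t for sent in tokenized for t in sent]  # running word forms
    freq = Counter(tokens)                            # distinct word forms
    n_sent, n_tokens, n_types = len(sentences), len(tokens), len(freq)
    top_word, top_freq = freq.most_common(1)[0]
    return {
        "number of sentences": n_sent,
        "average sentence length in characters":
            sum(len(s) for s in sentences) / n_sent,
        "average sentence length in words": n_tokens / n_sent,
        "number of distinct word forms": n_types,
        # Assumption: "lower case" means str.islower() is true for the form.
        "percentage of lower case word forms":
            100 * sum(1 for w in freq if w.islower()) / n_types,
        "number of running word forms": n_tokens,
        # Each distinct form counts once here ...
        "average word form length": sum(len(w) for w in freq) / n_types,
        # ... while here every occurrence counts.
        "average running word length": sum(len(t) for t in tokens) / n_tokens,
        "percentage of word forms with frequency=1":
            100 * sum(1 for c in freq.values() if c == 1) / n_types,
        "most frequent word": top_word,
        "frequency of the most frequent word": top_freq,
    }


if __name__ == "__main__":
    toy = ["je to je", "to je dobro"]
    for name, value in corpus_summary(toy).items():
        print(name, value)
```

The two length averages differ in what they count: the average word form length weights every distinct form once, while the average running word length weights each form by its frequency. Short, very frequent forms such as "je" pull the running average down, which is consistent with the gap between 8.0743 and 5.6702 in the table.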
Generated in 457 msec at 2017-12-21 15:30