Corpus: bpy_wikipedia_2010

1.1 Summary

Values for some general parameters

parameter value
number of sentences 70496
average sentence length in characters 203.6868
average sentence length in words 11.9218
number of distinct word forms 72909
percentage of lower case word forms 83.5425
percentage of multi word units 6.7509
number of running word forms 995815
percentage of lower case running words 97.2159
average word form length 16.6388
average running word length 16.35704817
percentage of word forms with frequency=1 63.8056
number of sentence based co-occurrences 80468
minimal likelihood ratio (sentence based) 6.63
maximal likelihood ratio (sentence based) 67940.12
number of neighbour based co-occurrences 8273
minimal likelihood ratio (neighbour based) 4.00
maximal likelihood ratio (neighbour based) 136032.00
average number of sentence based co-occurrences per sentence 117.0694
average number of neighbour co-occurrences per sentence 6.8238
most frequent word বারো
most frequent word's frequency 25960
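
Several of the parameters above are simple counts and ratios over the sentence list. The following is a minimal sketch of how such values could be derived, not the procedure actually used to produce this table. It assumes whitespace tokenization and measures length in Unicode code points; the corpus tooling may tokenize and measure length differently, so the sketch is not expected to reproduce the figures above. The function name `corpus_summary` and the sample input are hypothetical.

```python
from collections import Counter


def corpus_summary(sentences):
    """Compute a few summary parameters for a non-empty list of sentences.

    Assumptions (not taken from the corpus documentation):
    - tokens are obtained by splitting on whitespace;
    - lengths are counted in Unicode code points.
    """
    tokens = [tok for sent in sentences for tok in sent.split()]
    freq = Counter(tokens)  # word form -> number of occurrences

    n_sentences = len(sentences)
    n_running = len(tokens)  # running word forms (tokens)
    n_distinct = len(freq)   # distinct word forms (types)
    hapax = sum(1 for count in freq.values() if count == 1)
    top_word, top_count = freq.most_common(1)[0]

    return {
        "number of sentences": n_sentences,
        "average sentence length in characters":
            sum(len(s) for s in sentences) / n_sentences,
        "average sentence length in words": n_running / n_sentences,
        "number of distinct word forms": n_distinct,
        "number of running word forms": n_running,
        "average word form length":
            sum(len(w) for w in freq) / n_distinct,
        "average running word length":
            sum(len(t) for t in tokens) / n_running,
        "percentage of word forms with frequency=1":
            100.0 * hapax / n_distinct,
        "most frequent word": top_word,
        "most frequent word's frequency": top_count,
    }


if __name__ == "__main__":
    # Hypothetical toy input, only to show the shape of the result.
    sample = ["a b a", "a c"]
    for name, value in corpus_summary(sample).items():
        print(name, value)
```

The distinction between "word forms" (distinct types) and "running word forms" (tokens) explains why two average lengths appear in the table: the first weights every distinct form once, the second weights each form by how often it occurs.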
Generated in 1732 msec at 2017-11-29 08:30