Corpus: yor_wikipedia_2014

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           8661
Average sentence length in characters                         113.1626
Average sentence length in words                              17.9185
Number of distinct word forms                                 26186
Number of distinct word forms (without multi-word units)      25830
Percentage of lower-case word forms                           62.2661
Number of multi-word units                                    356
Percentage of multi-word units                                1.3595
Number of running word forms                                  155261
Number of running word forms (without multi-word units)       154760
Percentage of lower-case running words                        82.4354
Average word form length                                      8.2577
Average running word length                                   5.25651977
Percentage of word forms with frequency=1                     66.4401
Number of sentence-based co-occurrences                       54094
  - minimal likelihood ratio                                  6.63
  - maximal likelihood ratio                                  1282.26
Number of neighbour-based co-occurrences                      5692
  - minimal likelihood ratio                                  3.85
  - maximal likelihood ratio                                  4824.35
Average number of sentence-based co-occurrences per sentence  63.6659
Average number of neighbour co-occurrences per sentence       5.1763
Most frequent word                                            ni
Frequency of the most frequent word                           5712
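
Some of the values above can be cross-checked against each other. The following is a minimal sketch in Python, assuming that the multi-word percentage is taken relative to the number of distinct word forms (the figures in the table are consistent with that reading). The variable names are illustrative, and the hapax count and the share of the most frequent word are derived here and do not appear in the source.

```python
# Values copied from the summary table.
distinct_word_forms = 26186
multi_word_units = 356
running_word_forms = 155261
hapax_percentage = 66.4401   # word forms with frequency=1
top_word_frequency = 5712    # frequency of "ni"

# Distinct word forms that are not multi-word units.
single_word_forms = distinct_word_forms - multi_word_units

# Multi-word units as a percentage of all distinct word forms.
multi_word_percentage = 100 * multi_word_units / distinct_word_forms

# Derived (not in the source): approximate number of word forms seen once.
hapax_count = round(distinct_word_forms * hapax_percentage / 100)

# Derived (not in the source): share of running words taken by "ni".
top_word_share = 100 * top_word_frequency / running_word_forms

print(single_word_forms)                # 25830
print(round(multi_word_percentage, 4))  # 1.3595
print(hapax_count)
print(round(top_word_share, 2))
```

The first two results reproduce the table rows for distinct word forms without multi-word units and for the percentage of multi-word units.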
Generated in 1231 msec at 2018-01-29 22:00
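
The summary reports likelihood ratios for co-occurrences but does not state which formula produced them. A common choice for this kind of significance score is Dunning's log-likelihood ratio (G²) over a 2x2 contingency table, sketched below under that assumption. The function name and the example counts are hypothetical; only the sentence total (8661) and the reported minimum for sentence-based co-occurrences (6.63) come from the table.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's G^2 for a 2x2 contingency table of sentence counts.

    k11: sentences containing both words
    k12: sentences containing only the first word
    k21: sentences containing only the second word
    k22: sentences containing neither word
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    # Equivalent to 2 * sum(observed * ln(observed / expected)).
    return 2.0 * (cells - rows - cols + n * math.log(n))


# Hypothetical counts: word A in 120 sentences, word B in 80, both in 30.
total_sentences = 8661  # from the summary table
both, only_a, only_b = 30, 90, 50
neither = total_sentences - both - only_a - only_b

score = log_likelihood_ratio(both, only_a, only_b, neither)
# 6.63 is the smallest sentence-based likelihood ratio reported in the table.
print(score > 6.63)
```

The reported minima (6.63 and 3.85) lie close to the chi-square critical values for one degree of freedom at p = 0.01 (6.635) and p = 0.05 (3.841), which suggests, without confirming, that such significance cutoffs were applied.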