Corpus: eus_wikipedia_2014


1.1 Summary

Values for some general parameters

parameter                                                      value
number of sentences                                            645280
average sentence length (characters)                           108.6867
average sentence length (words)                                13.9952
number of distinct word forms                                  673203
percentage of lowercase word forms                             51.3529
percentage of multi-word units                                 2.0882
number of running word forms (tokens)                          10726690
percentage of lowercase running words                          79.6673
average word form length (characters)                          9.6713
average running word length (characters)                       6.69666777
percentage of word forms with frequency = 1                    60.2178
number of sentence-based co-occurrences                        1588946
  minimal likelihood ratio                                     6.63
  maximal likelihood ratio                                     185838.77
number of neighbour-based co-occurrences                       226508
  minimal likelihood ratio                                     3.84
  maximal likelihood ratio                                     315123.28
average number of sentence-based co-occurrences per sentence   51.7483
average number of neighbour-based co-occurrences per sentence  5.1269
most frequent word                                             eta
frequency of the most frequent word                            372258
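To make the general parameters concrete, here is a minimal Python sketch that derives comparable figures from a tokenized corpus. It assumes the sentences are already split into word forms, treats a word form as lowercase when `str.islower()` is true, and computes type-based figures over distinct word forms and token-based figures over running words. The function name `corpus_summary` and the toy sentences are hypothetical. The original tool's definitions evidently differ in at least one respect: dividing the 10726690 running word forms by the 645280 sentences gives about 16.62 words per sentence, not the listed 13.9952, so the sketch will not necessarily reproduce the table.

```python
from collections import Counter


def corpus_summary(sentences):
    """Summary statistics for a corpus given as a list of tokenized sentences.

    Each sentence is a list of word-form strings. The definitions below are
    assumptions for illustration; the tool that produced the table may
    tokenize and count differently.
    """
    freq = Counter(tok for sent in sentences for tok in sent)
    n_sentences = len(sentences)
    n_running = sum(freq.values())  # running word forms (tokens)
    n_distinct = len(freq)          # distinct word forms (types)
    top_word, top_freq = freq.most_common(1)[0]
    return {
        "number of sentences": n_sentences,
        "average sentence length in words": n_running / n_sentences,
        "number of distinct word forms": n_distinct,
        "number of running word forms": n_running,
        # a word form counts as lowercase if str.islower() is True
        "percentage of lowercase word forms":
            100 * sum(w.islower() for w in freq) / n_distinct,
        "percentage of lowercase running words":
            100 * sum(c for w, c in freq.items() if w.islower()) / n_running,
        # lengths in characters, averaged over types and over tokens
        "average word form length": sum(len(w) for w in freq) / n_distinct,
        "average running word length":
            sum(len(w) * c for w, c in freq.items()) / n_running,
        "percentage of word forms with frequency=1":
            100 * sum(1 for c in freq.values() if c == 1) / n_distinct,
        "most frequent word": top_word,
        "frequency of the most frequent word": top_freq,
    }


# Toy Basque input, already tokenized (hypothetical example data)
sentences = [
    ["Bilbo", "eta", "Donostia", "hiriak", "dira"],
    ["eta", "hori", "da"],
]
stats = corpus_summary(sentences)
for name, value in stats.items():
    print(f"{name}\t{value}")
```

A `Counter` over all tokens is enough for every figure here, because each type-based value needs only the distinct keys and each token-based value needs only the keys weighted by their counts.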
Statistics computed in 10029 ms on 2017-12-13 at 04:00.
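The two minimal likelihood ratios coincide with the rounded critical values of the χ² distribution with one degree of freedom: 3.84 at the 5% level and 6.63 at the 1% level. That suggests co-occurrences were kept only when significant at those levels, but the exact significance measure is not stated here. The sketch below assumes Dunning's log-likelihood ratio (G²) over a 2×2 contingency table of sentence counts; for neighbour-based co-occurrences the counts would be taken over adjacent word pairs instead. The function names and the example counts are hypothetical.

```python
import math

# Rounded critical values of the chi-square distribution with 1 degree of freedom
CHI2_1DF_P05 = 3.84  # equals the minimal neighbour-based ratio in the table
CHI2_1DF_P01 = 6.63  # equals the minimal sentence-based ratio in the table


def log_likelihood_ratio(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio G^2 for a 2x2 contingency table.

    k11: sentences containing both words A and B
    k12: sentences containing A but not B
    k21: sentences containing B but not A
    k22: sentences containing neither word
    """
    observed = ((k11, k12), (k21, k22))
    n = k11 + k12 + k21 + k22
    row_totals = (k11 + k12, k21 + k22)
    col_totals = (k11 + k21, k12 + k22)
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes 0 (limit of o*log(o) as o -> 0)
                expected = row_totals[i] * col_totals[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2


def is_significant(g2, threshold=CHI2_1DF_P01):
    """Keep a co-occurrence only if its ratio reaches the threshold."""
    return g2 >= threshold


# Hypothetical counts from a corpus of 1000 sentences
independent = log_likelihood_ratio(10, 90, 90, 810)  # exactly as expected by chance
associated = log_likelihood_ratio(50, 50, 50, 850)   # A and B co-occur far more often
print(independent, is_significant(independent))
print(associated, is_significant(associated))
```

When the observed counts equal the counts expected under independence, every log term is zero and G² is 0, so the pair is discarded; a pair that co-occurs much more often than chance predicts easily clears both thresholds.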