Corpus: fra_wikipedia_2012_300K

1.1 Summary

Values for some general parameters

parameter value
number of sentences 300000
average sentence length in characters 125.4535
average sentence length in words 19.9468
number of distinct word forms 314625
percentage of lower case word forms 39.9634
percentage of multi word units 7.8039
number of running word forms 7242012
percentage of lower case running words 84.5751
average word form length 8.3732
average running word length 5.03973536
percentage of word forms with frequency=1 59.0363
number of sentence based co-occurrences 633718
minimal likelihood ratio (sentence based) 6.63
maximal likelihood ratio (sentence based) 33399.66
number of neighbour based co-occurrences 146167
minimal likelihood ratio (neighbour based) 3.84
maximal likelihood ratio (neighbour based) 118603.28
average number of sentence based co-occurrences per sentence 102.4257
average number of neighbour co-occurrences per sentence 10.1095
most frequent word de
frequency of the most frequent word 333271
Generated in 4271 msec on 2017-12-16 10:07
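
The minimal likelihood ratios in the table (6.63 and 3.84) coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The following is a minimal sketch of how such a score could be computed for one word pair, assuming the measure is the log-likelihood ratio statistic G^2 over a 2x2 contingency table. The source does not state the formula actually used, and the function names and the example counts are hypothetical.

```python
import math

# Thresholds taken from the table above.
SENTENCE_THRESHOLD = 6.63   # minimal likelihood ratio, sentence based
NEIGHBOUR_THRESHOLD = 3.84  # minimal likelihood ratio, neighbour based


def contingency_table(n_ab, n_a, n_b, n):
    """Build 2x2 counts from: units containing both words (n_ab),
    units containing word A (n_a), word B (n_b), and all units (n)."""
    k11 = n_ab                    # A and B together
    k12 = n_a - n_ab              # A without B
    k21 = n_b - n_ab              # B without A
    k22 = n - n_a - n_b + n_ab    # neither A nor B
    return (k11, k12, k21, k22)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 = 2 * sum(observed * ln(observed / expected)) over the four cells.
    Cells with an observed count of zero contribute nothing."""
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


if __name__ == "__main__":
    # Hypothetical counts: 5 joint sentences, 20 with A, 30 with B,
    # out of the 300000 sentences reported for this corpus.
    table = contingency_table(5, 20, 30, 300000)
    score = log_likelihood_ratio(*table)
    print(score, score >= SENTENCE_THRESHOLD)
```

Under this assumption, a pair would be kept as a sentence based co-occurrence only if its score reaches 6.63, and as a neighbour based co-occurrence only if it reaches 3.84.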