Corpus: fra_news_2013_1M

1.1 Summary

Values for some general parameters

Parameter                                                      Value
Number of sentences                                            999983
Average sentence length in characters                          122.6220
Average sentence length in words                               19.4077
Number of distinct word forms                                  450219
Number of distinct word forms (without multiwords)             387646
Percentage of lower case word forms                            45.7204
Number of multi word units                                     62573
Percentage of multi word units                                 13.8983
Number of running word forms                                   19654982
Number of running word forms (without multiwords)              19281598
Percentage of lower case running words                         85.7500
Average word form length                                       8.7574
Average running word length                                    5.25100243
Percentage of word forms with frequency=1                      52.1520
Number of sentence based co-occurrences                        3903920
- minimal likelihood ratio                                     6.63
- maximal likelihood ratio                                     168150.66
Number of neighbour based co-occurrences                       507146
- minimal likelihood ratio                                     3.84
- maximal likelihood ratio                                     307273.12
Average number of sentence based co-occurrences per sentence   142.2964
Average number of neighbour co-occurrences per sentence        12.0286
Most frequent word                                             de
Most frequent word's frequency                                 1019896
13548 msec needed at 2024-09-13 13:00
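
Two of the values in the summary table appear to be derivable from others, which gives a quick consistency check. The sketch below is an illustration only: the variable names are hypothetical, and the two formulas (subtracting multi word units from the distinct word forms, and taking the percentage relative to all distinct word forms) are assumptions about how the figures were computed, not something the report states.

```python
# Values copied from the summary table.
distinct_word_forms = 450219
multi_word_units = 62573
reported_without_multiwords = 387646
reported_multiword_percentage = 13.8983

# Assumption: "without multiwords" = all distinct word forms minus multi word units.
derived_without_multiwords = distinct_word_forms - multi_word_units

# Assumption: the percentage is relative to all distinct word forms.
derived_multiword_percentage = 100 * multi_word_units / distinct_word_forms

print(derived_without_multiwords)             # 387646
print(f"{derived_multiword_percentage:.4f}")  # 13.8983
```

Under these assumptions both derived values agree with the reported ones. The per-sentence averages in the table do not follow from a plain division of the listed totals by the number of sentences, so they are presumably computed on a different basis and are left out of this check.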