Corpus: ell_news_2023_1M

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           1000000
Average sentence length in characters                         228.4798
Average sentence length in words                              19.5533
Number of distinct word forms                                 435534
Number of distinct word forms (without multiwords)            422123
Percentage of lower case word forms                           56.6374
Number of multi word units                                    13411
Percentage of multi word units                                3.0792
Number of running word forms                                  19661669
Number of running word forms (without multiwords)             19513660
Percentage of lower case running words                        84.9218
Average word form length                                      16.1441
Average running word length                                   10.58924825
Percentage of word forms with frequency=1                     48.0144
Number of sentence based co-occurrences                       4973796
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    45949.20
Number of neighbour based co-occurrences                      559780
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    156483.62
Average number of sentence based co-occurrences per sentence  146.5025
Average number of neighbour co-occurrences per sentence       12.0097
Most frequent word                                            και
Most frequent word's frequency                                616064
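
Several of the values above are related by simple arithmetic. The following minimal Python sketch recomputes two of them from the listed counts as a consistency check. The variable names are illustrative and not part of the corpus tooling. The last figure, the share of all running word forms taken up by the most frequent word, is a derived value that the summary itself does not list.

```python
# Counts copied from the summary table above.
distinct_word_forms = 435534
distinct_without_multiwords = 422123
multi_word_units = 13411
running_word_forms = 19661669
most_frequent_word_count = 616064  # frequency of "και"

# The multi word units account for the gap between the two
# distinct word form counts.
multiword_gap = distinct_word_forms - distinct_without_multiwords

# Percentage of multi word units among all distinct word forms,
# rounded to four decimals like the table values.
pct_multi_word_units = round(100 * multi_word_units / distinct_word_forms, 4)

# Derived value (not in the table): share of all running word forms
# that are occurrences of the most frequent word.
pct_most_frequent = 100 * most_frequent_word_count / running_word_forms

print(multiword_gap)
print(pct_multi_word_units)
print(f"{pct_most_frequent:.2f}")
```

The first two printed values can be compared with the rows "Number of multi word units" and "Percentage of multi word units".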
9378 msec needed at 2024-12-06 14:00