Corpus: hun_news_2020_1M

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           1000000
Average sentence length in characters                         134.5942
Average sentence length in words                              16.9665
Number of distinct word forms                                 852263
Number of distinct word forms (without multiwords)            830224
Percentage of lowercase word forms                            73.4788
Number of multi-word units                                    22039
Percentage of multi-word units                                2.5859
Number of running word forms                                  17031198
Number of running word forms (without multiwords)             16849107
Percentage of lowercase running words                         86.0130
Average word form length                                      12.0064
Average running word length                                   6.86135093
Percentage of word forms with frequency=1                     55.3492
Number of sentence-based co-occurrences                       4365922
- minimum likelihood ratio                                    6.63
- maximum likelihood ratio                                    43091.24
Number of neighbour-based co-occurrences                      543357
- minimum likelihood ratio                                    3.84
- maximum likelihood ratio                                    90947.49
Average number of sentence-based co-occurrences per sentence  76.4429
Average number of neighbour co-occurrences per sentence       7.5901
Most frequent word                                            a
Most frequent word's frequency                                1346883
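
Several of the values above are related by simple arithmetic. The following is a minimal sketch that re-derives them from the raw counts. The variable names are illustrative, and the reading of each row (for example, that the multi-word percentage is taken over distinct word forms) is an assumption, not something the summary states.

```python
# Counts copied from the summary table above.
sentences = 1000000
distinct_forms = 852263
distinct_forms_no_mw = 830224
multiword_units = 22039
running_forms = 17031198
running_forms_no_mw = 16849107

# Distinct word forms should split into single-word forms plus multi-word units.
forms_add_up = distinct_forms_no_mw + multiword_units == distinct_forms

# Assumed reading: share of multi-word units among all distinct word forms.
pct_multiword = round(100 * multiword_units / distinct_forms, 4)

# Running tokens that belong to multi-word units (difference of the two totals).
running_multiword = running_forms - running_forms_no_mw

# Tokens per sentence under both counting schemes.
tokens_per_sentence = running_forms / sentences
tokens_per_sentence_no_mw = running_forms_no_mw / sentences

print(forms_add_up, pct_multiword, running_multiword)
print(tokens_per_sentence, tokens_per_sentence_no_mw)
```

Under this reading the multi-word percentage matches the reported 2.5859. Both tokens-per-sentence ratios land close to, but not exactly on, the reported 16.9665, so the tool presumably counts words per sentence in some other way.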
Computed in 17516 msec on 2021-06-05 at 10:00
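
The minimum likelihood ratios in the co-occurrence rows, 6.63 and 3.84, coincide with the chi-squared critical values for one degree of freedom at the 1% and 5% levels. The sketch below checks this and shows one common way to compute such a score for a word pair, the log-likelihood ratio (G²) of a 2x2 contingency table. That the corpus tool uses exactly this formula is an assumption; the function names and the example counts are hypothetical.

```python
import math

def chi2_sf_1df(x):
    """Upper tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    k11: units containing both words
    k12: units containing only the first word
    k21: units containing only the second word
    k22: units containing neither word
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total

# Significance levels implied by the two thresholds in the table.
p_sentence = chi2_sf_1df(6.63)   # threshold for sentence-based co-occurrences
p_neighbour = chi2_sf_1df(3.84)  # threshold for neighbour-based co-occurrences

# Hypothetical counts for one word pair over 1000 sentences.
score = log_likelihood_ratio(50, 5, 5, 940)
print(p_sentence, p_neighbour, score)
```

With these thresholds, a pair would be kept as a sentence-based co-occurrence only if its score reaches 6.63, and as a neighbour-based co-occurrence if it reaches 3.84.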