Corpus: ron_wikipedia_2018_300K (Romanian Wikipedia, 2018, 300K sentences)

1.1 Summary

Values for some general parameters

Parameter                                                      Value
Number of sentences                                            300000
Average sentence length in characters                          125.6643
Average sentence length in words                               18.8779
Number of distinct word forms                                  331058
Number of distinct word forms (without multiwords)             331057
Percentage of lowercase word forms                             53.6441
Number of multiword units                                      1
Percentage of multiword units                                  0.0003
Number of running word forms                                   5627231
Number of running word forms (without multiwords)              5627227
Percentage of lowercase running words                          84.0130
Average word form length                                       8.6896
Average running word length                                    5.58125930
Percentage of word forms with frequency = 1                    57.3942
Number of sentence-based co-occurrences                        1097844
  - minimal likelihood ratio                                   6.63
  - maximal likelihood ratio                                   52225.53
Number of neighbour-based co-occurrences                       175866
  - minimal likelihood ratio                                   3.84
  - maximal likelihood ratio                                   199932.09
Average number of sentence-based co-occurrences per sentence   85.3079
Average number of neighbour-based co-occurrences per sentence  8.7574
Most frequent word                                             de
Most frequent word's frequency                                 275431
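How several of these parameters relate to a tokenized corpus can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to build this corpus: each sentence is assumed to be a list of word-form strings from some tokenizer, multiword units are ignored, and `str.islower()` stands in for the lowercase test. The page does not give the formula behind "likelihood ratio"; the sketch assumes Dunning's log-likelihood ratio (G²) over a 2×2 contingency table. The minima 3.84 and 6.63 coincide with the chi-square critical values (1 degree of freedom) for p < 0.05 and p < 0.01, which suggests they serve as significance cut-offs, but that is an inference. The names `corpus_summary`, `log_likelihood`, `SIGNIFICANCE_05` and `SIGNIFICANCE_01` are hypothetical.

```python
import math
from collections import Counter


def corpus_summary(sentences):
    """Compute a few summary parameters for a tokenized corpus.

    Assumption: each sentence is a list of word-form strings produced by some
    tokenizer; multiword units are not modelled.
    """
    running = [w for sentence in sentences for w in sentence]  # running word forms (tokens)
    freq = Counter(running)                                     # distinct word forms (types)
    n_running, n_distinct = len(running), len(freq)
    return {
        "sentences": len(sentences),
        "avg_sentence_len_words": n_running / len(sentences),
        "distinct_word_forms": n_distinct,
        "running_word_forms": n_running,
        # Assumption: str.islower() approximates the corpus tools' lowercase test.
        "pct_lowercase_running": 100 * sum(w.islower() for w in running) / n_running,
        # Averaged over distinct forms vs. over every token.
        "avg_word_form_len": sum(len(w) for w in freq) / n_distinct,
        "avg_running_word_len": sum(len(w) for w in running) / n_running,
        "pct_forms_freq_1": 100 * sum(c == 1 for c in freq.values()) / n_distinct,
        "most_frequent": freq.most_common(1)[0],
    }


def log_likelihood(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table.

    Assumed layout for a sentence-based co-occurrence of words A and B:
      k11 = sentences with A and B,  k12 = A without B,
      k21 = B without A,             k22 = neither A nor B.
    """
    def h(*counts):
        # Unnormalised entropy term: sum of c * ln(c / total), skipping zero cells.
        total = sum(counts)
        return sum(c * math.log(c / total) for c in counts if c > 0)

    return 2 * (h(k11, k12, k21, k22)
                - h(k11 + k12, k21 + k22)   # row totals: A present / absent
                - h(k11 + k21, k12 + k22))  # column totals: B present / absent


# Chi-square critical values with 1 degree of freedom (assumed cut-offs).
SIGNIFICANCE_05 = 3.84  # p < 0.05
SIGNIFICANCE_01 = 6.63  # p < 0.01


if __name__ == "__main__":
    stats = corpus_summary([["Ana", "are", "mere"], ["de", "de", "mere", "de"]])
    print(stats["most_frequent"])                                 # ('de', 3)
    print(log_likelihood(50, 10, 10, 930) > SIGNIFICANCE_01)      # True
```

Computing the average word length both over distinct forms and over running tokens mirrors the two rows in the table: frequent function words such as "de" are short, so the token average comes out lower than the type average.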
Generated in 3562 ms on 2024-04-20 13:02.