Corpus: ron_news_2019_1M

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           1000000
Average sentence length in characters                         120.9449
Average sentence length in words                              18.8997
Number of distinct word forms                                 458431
Number of distinct word forms (without multiwords)            438817
Percentage of lower case word forms                           59.1352
Number of multi word units                                    19614
Percentage of multi word units                                4.2785
Number of running word forms                                  19084661
Number of running word forms (without multiwords)             18857108
Percentage of lower case running words                        84.5412
Average word form length                                      9.0567
Average running word length                                   5.29933736
Percentage of word forms with frequency=1                     50.1349
Number of sentence based co-occurrences                       4983746
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    69137.97
Number of neighbour based co-occurrences                      533246
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    411451.41
Average number of sentence based co-occurrences per sentence  134.7962
Average number of neighbour co-occurrences per sentence       11.1117
Most frequent word                                            de
Most frequent word's frequency                                960149
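
Several of the values above are related by simple arithmetic. The following is a minimal sketch that re-derives two of them from the reported counts. The variable names are illustrative, and the reading of the last difference as "occurrences of multi word units" is an assumption, since the page does not define it.

```python
# Counts copied from the summary table.
distinct_forms = 458431
distinct_forms_no_mw = 438817
multiword_units = 19614
running_forms = 19084661
running_forms_no_mw = 18857108

# Distinct word forms minus multi word units gives the count without multiwords.
assert distinct_forms - multiword_units == distinct_forms_no_mw

# Share of multi word units among all distinct word forms, in percent.
multiword_pct = round(100 * multiword_units / distinct_forms, 4)
print(multiword_pct)  # 4.2785

# Difference between the two running word form counts
# (assumed here to be the running occurrences of multi word units).
multiword_running = running_forms - running_forms_no_mw
print(multiword_running)  # 227553
```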
Generated in 13976 msec on 2021-07-08 00:00
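
The minimal likelihood ratios listed in the summary, 6.63 and 3.84, are close to the chi-square critical values for one degree of freedom at p = 0.01 (6.635) and p = 0.05 (3.841). The page does not say how the ratio is computed. The sketch below assumes the common log-likelihood ratio (G²) over a 2x2 contingency table; the function name and the example counts are hypothetical and not taken from the corpus.

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of co-occurrence counts.

    k11: units containing both words, k12: only word A,
    k21: only word B, k22: neither word.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2

# Hypothetical counts: the first table matches independence exactly,
# the second shows a strong association between the two words.
independent = log_likelihood_ratio(10, 90, 90, 810)
associated = log_likelihood_ratio(50, 50, 50, 850)
print(independent < 3.84 < 6.63 < associated)  # True
```

Under this assumption, a word pair would be kept as a co-occurrence only if its score reaches the listed minimum, with the stricter threshold applied to sentence based pairs.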