Corpus: por_news_2019

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           4490898
Average sentence length in characters                         121.9105
Average sentence length in words                              19.4821
Number of distinct word forms                                 785287
Number of distinct word forms (without multiwords)            691987
Percentage of lower case word forms                           47.3149
Number of multi word units                                    93300
Percentage of multi word units                                11.8810
Number of running word forms                                  89449824
Number of running word forms (without multiwords)             87349589
Percentage of lower case running words                        82.4604
Average word form length                                      8.8860
Average running word length                                   5.14802056
Percentage of word forms with frequency=1                     47.1221
Number of sentence based co-occurrences                       18370098
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    396957.38
Number of neighbour based co-occurrences                      1505400
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    584284.50
Average number of sentence based co-occurrences per sentence  196.6429
Average number of neighbour co-occurrences per sentence       13.5228
Most frequent word                                            de
Most frequent word's frequency                                4280241
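
Two of the multiword figures in the table can be recomputed from the other rows. The sketch below is only a consistency check. It assumes that the multiword count is the difference between the two distinct word form counts, and that the percentage is taken relative to all distinct word forms. The variable names are illustrative and do not come from the corpus tooling.

```python
# Values copied from the summary table above.
distinct_forms = 785287          # Number of distinct word forms
distinct_forms_no_mw = 691987    # ... without multiwords
multiword_units = 93300          # Number of multi word units

# Assumption: multiword units = all distinct forms minus single-word forms.
mw_from_difference = distinct_forms - distinct_forms_no_mw

# Assumption: the percentage is relative to all distinct word forms.
mw_percentage = round(100 * multiword_units / distinct_forms, 4)

print(mw_from_difference)  # compare with "Number of multi word units"
print(mw_percentage)       # compare with "Percentage of multi word units"
```

Under these assumptions both recomputed values agree with the table (93300 and 11.8810).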
Computation time: 51308 msec (generated 2021-07-06 19:00)
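
The two minimal likelihood ratios in the summary, 6.63 and 3.84, coincide with the critical values of a chi-square distribution with one degree of freedom at the 0.01 and 0.05 levels. The page does not say how the thresholds were chosen, so treat the link as an observation rather than documented behaviour. The sketch below computes those critical values with the Python standard library, using the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable. The function name is hypothetical.

```python
from statistics import NormalDist


def chi2_critical_1df(alpha: float) -> float:
    """Upper critical value of chi-square with 1 degree of freedom.

    If Z is standard normal, Z**2 is chi-square with 1 df, so the
    upper-alpha critical value is the squared two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


# Assumption: the thresholds correspond to these significance levels.
sentence_threshold = round(chi2_critical_1df(0.01), 2)
neighbour_threshold = round(chi2_critical_1df(0.05), 2)

print(sentence_threshold, neighbour_threshold)
```

If the assumption holds, sentence based co-occurrences are kept at a stricter significance level than neighbour based ones.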