Corpus: por-com_web_2018

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           5264961
Average sentence length in characters                         105.5337
Average sentence length in words                              17.0911
Number of distinct word forms                                 908977
Number of distinct word forms (without multiwords)            873036
Percentage of lower case word forms                           51.9339
Number of multi word units                                    35941
Percentage of multi word units                                3.9540
Number of running word forms                                  90158787
Number of running word forms (without multiwords)             89746831
Percentage of lower case running words                        87.6267
Average word form length                                      9.3049
Average running word length                                   5.09880390
Percentage of word forms with frequency=1                     53.0092
Number of sentence based co-occurrences                       14475414
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    226286.17
Number of neighbour based co-occurrences                      1536037
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    636720.94
Average number of sentence based co-occurrences per sentence  150.5286
Average number of neighbour co-occurrences per sentence       11.2110
Most frequent word                                            de
Most frequent word's frequency                                4263250
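
Some of the values above are derived from others, so they can be cross-checked. The following is a minimal sketch in Python. It assumes that the percentage of multi word units is taken relative to the number of distinct word forms, which reproduces the listed figures. The variable names are illustrative, and the share of "de" among all running word forms is an extra figure not listed on the page.

```python
# Values copied from the summary table.
distinct_word_forms = 908977
multi_word_units = 35941
running_word_forms = 90158787
most_frequent_word_count = 4263250  # frequency of "de"

# Distinct word forms excluding multi word units.
single_word_forms = distinct_word_forms - multi_word_units

# Assumption: the percentage is relative to all distinct word forms.
multi_word_pct = 100 * multi_word_units / distinct_word_forms

# Share of all running word forms taken up by "de" (not listed in the table).
de_share_pct = 100 * most_frequent_word_count / running_word_forms

print(single_word_forms)
print(round(multi_word_pct, 4))
print(round(de_share_pct, 2))
```

Under this assumption the first two results agree with the rows "Number of distinct word forms (without multiwords)" and "Percentage of multi word units".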
Generated in 39825 msec at 2020-06-13 00:01
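
The minimal likelihood ratios listed for co-occurrences, 6.63 and 3.84, coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The page does not say how the ratio is computed. The sketch below assumes the common log-likelihood ratio (G²) over a 2x2 contingency table. The function name, the threshold constants and the example counts are hypothetical and only illustrate how such a cut-off could be applied.

```python
import math

# Cut-offs as listed on the page (interpreted here as significance thresholds).
SENTENCE_THRESHOLD = 6.63   # chi-square critical value, 1 d.o.f., p = 0.01
NEIGHBOUR_THRESHOLD = 3.84  # chi-square critical value, 1 d.o.f., p = 0.05


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    k11: units containing both words     k12: only the first word
    k21: only the second word            k22: neither word
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts: two words that appear together far more often than chance.
score = log_likelihood_ratio(100, 10, 10, 1000)
is_significant = score >= SENTENCE_THRESHOLD
```

A pair whose counts match what independence would predict scores zero, so it falls below either cut-off and would not be counted as a co-occurrence.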