Corpus: ita_news_2020_1M

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           1000000
Average sentence length in characters                         122.3297
Average sentence length in words                              19.2403
Number of distinct word forms                                 439308
Number of distinct word forms (without multiwords)            393123
Percentage of lower case word forms                           51.9255
Number of multi word units                                    46185
Percentage of multi word units                                10.5131
Number of running word forms                                  19508958
Number of running word forms (without multiwords)             19171774
Percentage of lower case running words                        85.9125
Average word form length                                      9.0960
Average running word length                                   5.27190301
Percentage of word forms with frequency=1                     51.5843
Number of sentence based co-occurrences                       3801330
  - minimal likelihood ratio                                  6.63
  - maximal likelihood ratio                                  40160.87
Number of neighbour based co-occurrences                      518129
  - minimal likelihood ratio                                  3.84
  - maximal likelihood ratio                                  159785.05
Average number of sentence based co-occurrences per sentence  144.1807
Average number of neighbour co-occurrences per sentence       11.4611
Most frequent word                                            di
Most frequent word's frequency                                729523
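
Some of the values above can be derived from others, which gives a quick consistency check. The following is a minimal sketch in Python; the variable names and the `percentage` helper are illustrative assumptions, not part of the corpus tooling, and the numbers are copied from the table.

```python
# Values copied from the summary table (names are illustrative).
distinct_forms = 439308          # Number of distinct word forms
distinct_forms_no_mwu = 393123   # ... without multiwords
multiword_units = 46185          # Number of multi word units
running_forms = 19508958         # Number of running word forms
most_frequent_count = 729523     # Frequency of the most frequent word, "di"


def percentage(part, whole):
    """Return part/whole as a percentage rounded to four decimals."""
    return round(100.0 * part / whole, 4)


# Share of multi word units among all distinct word forms.
mwu_share = percentage(multiword_units, distinct_forms)

# Distinct word forms left after removing multi word units.
remaining_forms = distinct_forms - multiword_units

# Share of all running word forms taken up by the most frequent word.
top_word_share = percentage(most_frequent_count, running_forms)

print(mwu_share, remaining_forms, top_word_share)
```

The first two results reproduce the reported "Percentage of multi word units" and "Number of distinct word forms (without multiwords)". The last figure is not reported in the table and is shown only as a derived illustration.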
17392 msec needed at 2021-06-07 08:00
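
The minimal likelihood ratios reported above (6.63 and 3.84) coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. Assuming the co-occurrence significance is a log-likelihood ratio over a 2x2 contingency table (Dunning's G²), which the page itself does not confirm, a sketch of the computation could look like this. The function name, the cell layout, and the example counts are hypothetical.

```python
import math

# Chi-square critical values, one degree of freedom.
CRITICAL_P05 = 3.84  # matches the neighbour based minimum in the table
CRITICAL_P01 = 6.63  # matches the sentence based minimum in the table


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 table of counts.

    k11: units containing both words
    k12: units containing word A but not word B
    k21: units containing word B but not word A
    k22: units containing neither word
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols + xlogx_sum(n))


# Hypothetical counts for one word pair in a corpus of 1000000 sentences.
score = log_likelihood_ratio(120, 880, 1880, 997120)
print(score > CRITICAL_P01)
```

Under this assumption, a word pair would be listed as a sentence based co-occurrence only if its score reaches 6.63, and as a neighbour based co-occurrence only if it reaches 3.84.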