Corpus: slv_newscrawl_2016_100K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           100000
Average sentence length in characters                         114.1953
Average sentence length in words                              17.7119
Number of distinct word forms                                 172421
Number of distinct word forms (without multiwords)            170754
Percentage of lower case word forms                           64.7340
Number of multi word units                                    1667
Percentage of multi word units                                0.9668
Number of running word forms                                  1768484
Number of running word forms (without multiwords)             1764241
Percentage of lower case running words                        86.3464
Average word form length                                      8.5671
Average running word length                                   5.36745943
Percentage of word forms with frequency=1                     56.9130
Number of sentence based co-occurrences                       311644
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    8025.13
Number of neighbour based co-occurrences                      54867
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    18655.07
Average number of sentence based co-occurrences per sentence  55.4201
Average number of neighbour co-occurrences per sentence       5.7802
Most frequent word                                            je
Most frequent word's frequency                                66285
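
Some of the values in the summary table can be cross-checked against each other. The sketch below recomputes the multiword figures from the raw counts. The variable names are illustrative only and are not part of the corpus tooling. It assumes that "Percentage of multi word units" is taken relative to the number of distinct word forms, which is consistent with the reported value.

```python
# Counts copied from the summary table above.
distinct_forms = 172421        # Number of distinct word forms
distinct_forms_no_mw = 170754  # ... without multiwords
multiword_units = 1667         # Number of multi word units
running_forms = 1768484        # Number of running word forms
running_forms_no_mw = 1764241  # ... without multiwords

# The two distinct-form counts differ exactly by the number of multiword units.
assert distinct_forms - distinct_forms_no_mw == multiword_units

# Assumption: the percentage is relative to all distinct word forms.
pct_multiword = 100 * multiword_units / distinct_forms
print(round(pct_multiword, 4))  # 0.9668, matching the table

# Running tokens that are multiword units.
multiword_tokens = running_forms - running_forms_no_mw
print(multiword_tokens)  # 4243
```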
Generated in 1879 ms on 2019-11-30 at 08:01
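
The minimal likelihood ratios in the table (6.63 and 3.84) coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. That suggests the co-occurrences were filtered by a log-likelihood significance test, although the page does not state the exact formula. The following is a minimal sketch under that assumption, using the standard G² statistic on a 2x2 contingency table. The function name, the example counts, and the threshold constants are hypothetical and are not taken from the corpus pipeline.

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of sentence counts.

    k11: sentences containing both words
    k12: sentences containing word A but not word B
    k21: sentences containing word B but not word A
    k22: sentences containing neither word
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                g2 += o * math.log(o / expected)
    return 2.0 * g2

# Assumed thresholds, read off the "minimal likelihood ratio" rows.
SENTENCE_THRESHOLD = 6.63   # chi-square, 1 d.f., p = 0.01
NEIGHBOUR_THRESHOLD = 3.84  # chi-square, 1 d.f., p = 0.05

# Hypothetical counts for two words in a 100000-sentence corpus.
n_sentences = 100000
freq_a, freq_b, freq_ab = 500, 400, 30
score = log_likelihood_ratio(
    freq_ab,
    freq_a - freq_ab,
    freq_b - freq_ab,
    n_sentences - freq_a - freq_b + freq_ab,
)
print(score >= SENTENCE_THRESHOLD)  # True: the pair would be kept
```

With these made-up counts the pair occurs together 30 times where about 2 joint occurrences would be expected under independence, so the score clears the threshold by a wide margin.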