Corpus: zul_news_2013_30K

1.1 Summary

Values for some general parameters

Parameter                                                      Value
Number of sentences                                            30000
Average sentence length in characters                          125.7152
Average sentence length in words                               14.5596
Number of distinct word forms                                  106872
Number of distinct word forms (without multiwords)             106549
Percentage of lowercase word forms                             70.1166
Number of multiword units                                      323
Percentage of multiword units                                  0.3022
Number of running word forms                                   437011
Number of running word forms (without multiwords)              436196
Percentage of lowercase running words                          78.4216
Average word form length in characters                         9.3096
Average running word length in characters                      7.53266880
Percentage of word forms with frequency 1                      66.7378
Number of sentence-based co-occurrences                        58046
  - minimal likelihood ratio                                   6.63
  - maximal likelihood ratio                                   2910.70
Number of neighbour-based co-occurrences                       11217
  - minimal likelihood ratio                                   3.84
  - maximal likelihood ratio                                   4115.49
Average number of sentence-based co-occurrences per sentence   10.6804
Average number of neighbour-based co-occurrences per sentence  2.2602
Most frequent word                                             ukuthi
Frequency of the most frequent word                            9369
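The table mixes type-level figures (distinct word forms) with token-level figures (running word forms). The following is a minimal sketch of how such figures could be derived from a tokenized corpus. It assumes that a multiword unit is a token containing an internal space and that a form counts as lowercase when its first character is lowercase; both conventions, and the names `corpus_stats` and `CorpusStats`, are hypothetical and are not taken from the source.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class CorpusStats:
    sentences: int
    avg_sentence_words: float
    distinct_forms: int
    distinct_forms_no_mwu: int
    multiword_units: int
    pct_multiword_units: float
    running_forms: int
    running_forms_no_mwu: int
    pct_lowercase_forms: float
    pct_lowercase_running: float
    avg_form_length: float
    avg_running_length: float
    pct_hapax: float
    most_frequent: str
    most_frequent_count: int


def is_multiword(token: str) -> bool:
    # Hypothetical convention: a multiword unit is a token with an internal space.
    return " " in token


def is_lowercase(token: str) -> bool:
    # Hypothetical convention: "lowercase" means the first character is lowercase.
    return token[:1].islower()


def corpus_stats(sentences: list[list[str]]) -> CorpusStats:
    tokens = [tok for sent in sentences for tok in sent]  # running word forms
    freq = Counter(tokens)
    forms = list(freq)  # distinct word forms
    mwus = [f for f in forms if is_multiword(f)]
    most_frequent, most_frequent_count = freq.most_common(1)[0]
    return CorpusStats(
        sentences=len(sentences),
        avg_sentence_words=len(tokens) / len(sentences),
        distinct_forms=len(forms),
        distinct_forms_no_mwu=len(forms) - len(mwus),
        multiword_units=len(mwus),
        pct_multiword_units=100 * len(mwus) / len(forms),
        running_forms=len(tokens),
        running_forms_no_mwu=sum(1 for t in tokens if not is_multiword(t)),
        pct_lowercase_forms=100 * sum(map(is_lowercase, forms)) / len(forms),
        pct_lowercase_running=100 * sum(map(is_lowercase, tokens)) / len(tokens),
        # Type-level average: each distinct form counted once.
        avg_form_length=sum(map(len, forms)) / len(forms),
        # Token-level average: frequent (usually short) words weigh more.
        avg_running_length=sum(map(len, tokens)) / len(tokens),
        # Hapax legomena: forms occurring exactly once.
        pct_hapax=100 * sum(1 for c in freq.values() if c == 1) / len(forms),
        most_frequent=most_frequent,
        most_frequent_count=most_frequent_count,
    )
```

The split between type and token averages explains why the average word form length (9.3096) exceeds the average running word length (7.53266880): short, frequent words such as the most frequent word are counted many times at token level. As a check against the table, 106872 − 323 = 106549 distinct forms remain without multiwords, and 100 × 323 / 106872 ≈ 0.3022 percent.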
Generated in 503 ms on 2018-04-02 at 13:00.
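The minimal likelihood ratios in the table (6.63 for sentence-based and 3.84 for neighbour-based co-occurrences) coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05 (about 6.635 and 3.841), which suggests they act as significance cut-offs. The sketch below assumes the scores are Dunning's log-likelihood ratio G² computed from a 2×2 contingency table of counts; the source does not state the exact formula, and the function names `log_likelihood` and `significant_pairs` are hypothetical.

```python
import math

# Cut-offs copied from the table; treating them as G^2 thresholds is an assumption.
SENTENCE_THRESHOLD = 6.63
NEIGHBOUR_THRESHOLD = 3.84


def log_likelihood(n: int, n_a: int, n_b: int, n_ab: int) -> float:
    """Dunning's G^2 for a 2x2 contingency table.

    n: number of units (e.g. sentences); n_a, n_b: units containing word a / b;
    n_ab: units containing both words.
    """
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n - n_a - n_b + n_ab],
    ]
    rows = [sum(r) for r in observed]
    cols = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            k = observed[i][j]
            if k > 0:  # 0 * ln(0) is taken as 0
                expected = rows[i] * cols[j] / n
                g2 += k * math.log(k / expected)
    return 2 * g2


def significant_pairs(
    n: int,
    word_counts: dict[str, int],
    pair_counts: dict[tuple[str, str], int],
    threshold: float,
) -> dict[tuple[str, str], float]:
    """Keep positively associated pairs whose G^2 reaches the threshold."""
    result = {}
    for (a, b), n_ab in pair_counts.items():
        # G^2 is also large for pairs that co-occur LESS than expected,
        # so only pairs with observed > expected joint count are kept.
        if n_ab * n <= word_counts[a] * word_counts[b]:
            continue
        score = log_likelihood(n, word_counts[a], word_counts[b], n_ab)
        if score >= threshold:
            result[(a, b)] = score
    return result
```

Under this reading, a pair that occurs together exactly as often as independence predicts scores 0 and is discarded, while the lower neighbour-based cut-off admits weaker associations than the sentence-based one.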