Corpus: ind_news_2020_300K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           300000
Average sentence length in characters                         110.1620
Average sentence length in words                              15.4562
Number of distinct word forms                                 182920
Number of distinct word forms (without multiwords)            163104
Percentage of lower case word forms                           40.1558
Number of multi word units                                    19816
Percentage of multi word units                                10.8332
Number of running word forms                                  4770768
Number of running word forms (without multiwords)             4630183
Percentage of lower case running words                        73.2927
Average word form length                                      7.6800
Average running word length                                   6.02483682
Percentage of word forms with frequency=1                     53.0172
Number of sentence based co-occurrences                       1304848
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    24226.84
Number of neighbour based co-occurrences                      178356
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    43535.43
Average number of sentence based co-occurrences per sentence  68.0009
Average number of neighbour co-occurrences per sentence       7.2570
Most frequent word                                            yang
Most frequent word's frequency                                111200
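
Some of the values above can be cross-checked against each other. The following is a minimal sketch in Python, not part of the original page. The variable names are illustrative. Reading the minimal likelihood ratios 6.63 and 3.84 as chi-square thresholds with one degree of freedom is an assumption: the page does not state how they were chosen.

```python
import math

# Values copied from the summary table.
distinct_forms = 182920        # Number of distinct word forms
multiword_units = 19816        # Number of multi word units
running_forms = 4770768        # Number of running word forms
running_forms_no_mw = 4630183  # Number of running word forms (without multiwords)

# Distinct forms without multiwords = all distinct forms minus multiword units.
distinct_no_mw = distinct_forms - multiword_units

# Share of multiword units among the distinct word forms, in percent.
multiword_pct = 100 * multiword_units / distinct_forms

# Running tokens that are multiword units (derived here, not listed in the table).
multiword_tokens = running_forms - running_forms_no_mw


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    Such a variable is the square of a standard normal Z, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# ASSUMPTION: the minimal likelihood ratios act as chi-square cut-offs.
p_sentence = chi2_sf_1df(6.63)   # sentence based co-occurrences
p_neighbour = chi2_sf_1df(3.84)  # neighbour based co-occurrences

print(distinct_no_mw)
print(round(multiword_pct, 4))
print(multiword_tokens)
print(round(p_sentence, 3), round(p_neighbour, 3))
```

The first two printed values reproduce the table rows "Number of distinct word forms (without multiwords)" and "Percentage of multi word units". Under the stated assumption, the two thresholds correspond to significance levels of roughly 0.01 and 0.05.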
Computation took 3533 msec (2021-06-06 12:03)