Corpus: ind_newscrawl_2011_1M

1.1 Summary

Values for some general parameters

Parameter Value
Number of sentences 1000000
Average sentence length in characters 116.1792
Average sentence length in words 16.2872
Number of distinct word forms 367740
Number of distinct word forms (without multiwords) 353896
Percentage of lower case word forms 42.7519
Number of multi word units 13844
Percentage of multi word units 3.7646
Number of running word forms 16357463
Number of running word forms (without multiwords) 16258197
Percentage of lower case running words 75.8600
Average word form length 8.1461
Average running word length 6.02244831
Percentage of word forms with frequency=1 53.7861
Number of sentence based co-occurrences 3801188
- minimal likelihood ratio 6.63
- maximal likelihood ratio 53850.85
Number of neighbour based co-occurrences 496976
- minimal likelihood ratio 3.84
- maximal likelihood ratio 110404.86
Average number of sentence based co-occurrences per sentence 85.7926
Average number of neighbour co-occurrences per sentence 8.3807
Most frequent word yang
Most frequent word's frequency 422870
Computed in 12425 msec at 2018-03-09 22:00
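
Some rows in the table can be derived from others, which gives a quick consistency check. The sketch below is illustrative only: the variable names and the pct helper are hypothetical, and the numbers are copied from the table. It assumes that the multiword percentage is taken relative to the number of distinct word forms, and that the gap between the two running word counts corresponds to multiword tokens. Neither assumption is stated in the source.

```python
# Values copied from the summary table; all names here are illustrative.
distinct_forms = 367740          # Number of distinct word forms
distinct_forms_no_mwu = 353896   # ... without multiwords
multiword_units = 13844          # Number of multi word units
running_forms = 16357463         # Number of running word forms
running_forms_no_mwu = 16258197  # ... without multiwords
top_word_freq = 422870           # frequency of the most frequent word, "yang"


def pct(part, whole):
    """Return part/whole as a percentage rounded to 4 decimals."""
    return round(100.0 * part / whole, 4)


# Assumption: the multiword percentage is relative to distinct word forms.
mwu_share = pct(multiword_units, distinct_forms)

# Distinct forms minus multiword units should match the "without multiwords" row.
forms_minus_mwu = distinct_forms - multiword_units

# Assumption: the gap between the two running counts is multiword tokens.
mwu_tokens = running_forms - running_forms_no_mwu

# Share of all running word forms taken up by the most frequent word.
top_word_share = pct(top_word_freq, running_forms)

print("multiword share of distinct forms (%):", mwu_share)
print("distinct forms minus multiwords:", forms_minus_mwu)
print("running forms minus running forms without multiwords:", mwu_tokens)
print("share of running forms that are 'yang' (%):", top_word_share)
```

Under these assumptions the first two results reproduce the table's "Percentage of multi word units" and "Number of distinct word forms (without multiwords)" rows. The two per-sentence co-occurrence averages cannot be reproduced by dividing the co-occurrence counts by the number of sentences, so they are presumably computed differently and are left out of the check.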