Corpus: hin_wikipedia_2014_10K


1.1 Summary

Values for some general parameters

Parameter Value
Number of sentences 10000
Average sentence length in characters 255.2783
Average sentence length in words 18.8774
Number of distinct word forms 33377
Number of distinct word forms (without multiwords) 30519
Percentage of lower case word forms 96.8421
Number of multi word units 2858
Percentage of multi word units 8.5628
Number of running word forms 192986
Number of running word forms (without multiwords) 188153
Percentage of lower case running words 99.3155
Average word form length 18.5416
Average running word length 12.5388
Percentage of word forms with frequency=1 66.8784
Number of sentence based co-occurrences 29302
- minimal likelihood ratio 6.63
- maximal likelihood ratio 2045.16
Number of neighbour based co-occurrences 5954
- minimal likelihood ratio 3.84
- maximal likelihood ratio 5633.90
Average number of sentence based co-occurrences per sentence 45.0152
Average number of neighbour co-occurrences per sentence 5.2059
Most frequent word के
Most frequent word's frequency 9344
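
Several rows in the table above are derived from other rows, so they can be cross-checked with simple arithmetic. The sketch below is only an illustration: the variable names are hypothetical, the numbers are copied from the table, and it assumes that the multi-word percentage is taken relative to the number of distinct word forms.

```python
# Values copied from the summary table above.
distinct_forms = 33377          # Number of distinct word forms
distinct_forms_no_mwu = 30519   # ... without multiwords
multi_word_units = 2858         # Number of multi word units
running_forms = 192986          # Number of running word forms
running_forms_no_mwu = 188153   # ... without multiwords


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to four decimals."""
    return round(100.0 * part / whole, 4)


# Multi-word units should equal the gap between the two distinct-form counts.
mwu_from_difference = distinct_forms - distinct_forms_no_mwu

# Assumption: the percentage is relative to all distinct word forms.
mwu_share = pct(multi_word_units, distinct_forms)

# Running tokens that are multi-word units (not listed in the table itself).
running_mwu_tokens = running_forms - running_forms_no_mwu

print(mwu_from_difference)  # 2858
print(mwu_share)            # 8.5628
print(running_mwu_tokens)   # 4833
```

Both derived values agree with the table, which supports the assumed reading of the percentage row.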
284 msec needed at 2021-08-21 17:00
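
The minimal likelihood ratios in the table, 6.63 and 3.84, coincide with the chi-square critical values for one degree of freedom at the 0.01 and 0.05 levels. The sketch below shows one common way such a score can be computed, the log-likelihood ratio (G²) over a 2x2 contingency table. Whether the corpus tool uses exactly this formula is an assumption, and the function name and the example counts are hypothetical.

```python
import math

# Thresholds copied from the table above.
SENTENCE_THRESHOLD = 6.63    # sentence based co-occurrences
NEIGHBOUR_THRESHOLD = 3.84   # neighbour based co-occurrences


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 table of co-occurrence counts.

    k11: units containing both words      k12: only the first word
    k21: only the second word             k22: neither word
    """
    def xlogx_sum(*counts: int) -> float:
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols + xlogx_sum(n))


# Hypothetical counts: two words that always appear together.
strong = log_likelihood_ratio(10, 0, 0, 10)
# Hypothetical counts: two words that are statistically independent.
independent = log_likelihood_ratio(10, 10, 10, 10)

print(strong > SENTENCE_THRESHOLD)        # True
print(independent > NEIGHBOUR_THRESHOLD)  # False
```

Under this reading, a word pair is kept as a co-occurrence only if its score reaches the relevant threshold, which would explain why the minimal values in the table sit exactly at 6.63 and 3.84.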