Corpus: dan_wikipedia_2018_300K

1.1 Summary

Values for some general parameters

Parameter                                                      Value
Number of sentences                                            300000
Average sentence length in characters                          114.9461
Average sentence length in words                               17.8804
Number of distinct word forms                                  404592
Number of distinct word forms (without multiwords)             387901
Percentage of lower-case word forms                            54.2734
Number of multiword units                                      16691
Percentage of multiword units                                  4.1254
Number of running word forms                                   5376651
Number of running word forms (without multiwords)              5337065
Percentage of lower-case running words                         83.9765
Average word form length (characters)                          10.1992
Average running word length (characters)                       5.37136685
Percentage of word forms with frequency = 1                    62.9264
Number of sentence-based co-occurrences                        861124
  - minimal likelihood ratio                                   6.63
  - maximal likelihood ratio                                   13038.05
Number of neighbour-based co-occurrences                       154906
  - minimal likelihood ratio                                   3.84
  - maximal likelihood ratio                                   40228.18
Average number of sentence-based co-occurrences per sentence   77.0493
Average number of neighbour-based co-occurrences per sentence  7.4380
Most frequent word                                             og (Danish "and")
Most frequent word's frequency                                 175274
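Several rows of the table are linked by simple arithmetic: the multiword count is the difference between the two distinct-form counts, and the multiword percentage is that count divided by all distinct word forms. The short Python sketch below recomputes these values from the table's own numbers as a consistency check; the variable names are illustrative and not part of the corpus tools.

```python
# Counts copied from the summary table.
distinct_forms = 404592          # Number of distinct word forms
distinct_forms_no_mwu = 387901   # ... (without multiwords)
multiword_units = 16691          # Number of multiword units
running_forms = 5376651          # Number of running word forms
running_forms_no_mwu = 5337065   # ... (without multiwords)

# Multiword units account for the gap between the two distinct-form counts.
assert distinct_forms - distinct_forms_no_mwu == multiword_units

# Share of distinct word forms that are multiword units, in percent.
mwu_percentage = 100 * multiword_units / distinct_forms
print(f"{mwu_percentage:.4f}")

# Running (token) occurrences contributed by multiword units.
mwu_running = running_forms - running_forms_no_mwu
print(mwu_running)
```

The recomputed percentage agrees with the table's 4.1254, which indicates that this percentage is taken over distinct word forms rather than over running words.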
Generation time: 4990 ms (2019-03-02 08:05)
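The minimal likelihood ratios 3.84 and 6.63 appear to match the chi-square critical values for one degree of freedom at p = 0.05 and p = 0.01, which suggests they act as significance cut-offs for accepting a co-occurrence. The table does not state which formula the corpus tools use. The sketch below assumes Dunning's log-likelihood ratio (G²) over a 2×2 contingency table, where a "unit" is a sentence for sentence-based co-occurrences (for neighbour-based ones it would be an adjacent word pair; this is also an assumption). The function name, threshold constants and counts are hypothetical.

```python
import math


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 for a 2x2 contingency table (assumed formula).

    k11: units containing both word A and word B
    k12: units containing A but not B
    k21: units containing B but not A
    k22: units containing neither
    """
    table = [[k11, k12], [k21, k22]]
    n = k11 + k12 + k21 + k22
    row_sums = [k11 + k12, k21 + k22]
    col_sums = [k11 + k21, k12 + k22]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            if observed == 0:
                continue  # the term 0 * ln(0) is taken as 0
            # Expected count under independence of A and B.
            expected = row_sums[i] * col_sums[j] / n
            g2 += observed * math.log(observed / expected)
    return 2.0 * g2


# Chi-square critical values (1 degree of freedom) that the table's
# minimal ratios appear to correspond to.
THRESHOLD_P05 = 3.84
THRESHOLD_P01 = 6.63

# Hypothetical counts for one word pair in a 300,000-sentence corpus:
# A occurs in 1,000 sentences, B in 500, and both together in 30.
score = log_likelihood_ratio(k11=30, k12=970, k21=470, k22=298530)
print(round(score, 2), score >= THRESHOLD_P01)
```

With these counts the pair co-occurs far more often than the roughly 1.7 sentences expected under independence, so its score lies well above 6.63. A pair whose observed counts equal the expected counts scores 0 and would be discarded.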