Corpus: dan-com_web_2018

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           541776
Average sentence length in characters                         108.2350
Average sentence length in words                              17.2205
Number of distinct word forms                                 379027
Number of distinct word forms (without multiwords)            375300
Percentage of lower case word forms                           66.4802
Number of multi word units                                    3727
Percentage of multi word units                                0.9833
Number of running word forms                                  9301412
Number of running word forms (without multiwords)             9285115
Percentage of lower case running words                        88.6492
Average word form length                                      11.1173
Average running word length                                   5.22259229
Percentage of word forms with frequency=1                     57.6925
Number of sentence based co-occurrences                       1809610
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    61010.72
Number of neighbour based co-occurrences                      254301
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    134092.72
Average number of sentence based co-occurrences per sentence  109.8327
Average number of neighbour co-occurrences per sentence       8.9335
Most frequent word                                            og
Most frequent word's frequency                                315581
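
Some of the figures in the summary can be cross-checked against each other. The following is a minimal sketch, assuming that the "without multiwords" counts are obtained by subtracting multi word units from the totals and that the multi word percentage is taken relative to the number of distinct word forms; the variable names are illustrative and not part of the corpus tooling.

```python
# Counts copied from the summary table.
distinct_forms = 379027
distinct_forms_no_mw = 375300
multiword_units = 3727
running_forms = 9301412
running_forms_no_mw = 9285115

# Assumption: distinct forms without multiwords = all distinct forms
# minus the number of multi word units.
assert distinct_forms - multiword_units == distinct_forms_no_mw

# Assumption: the multi word percentage is relative to distinct word forms.
pct_multiword = 100 * multiword_units / distinct_forms
print(f"multi word units: {pct_multiword:.4f} %")

# Derived figure (not listed in the table): running tokens that are
# multi word units.
multiword_tokens = running_forms - running_forms_no_mw
print(f"multi word running tokens: {multiword_tokens}")
```

Under these assumptions both reported values are reproduced from the raw counts.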
8562 msec needed at 2019-12-11 08:00
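
The minimal likelihood ratios in the summary (6.63 for sentence based and 3.84 for neighbour based co-occurrences) coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The page does not say which statistic is used, so the sketch below assumes the common log-likelihood ratio (G²) over a 2x2 contingency table; the function name and the example counts are hypothetical.

```python
import math

# Thresholds as reported in the summary table.
SENTENCE_THRESHOLD = 6.63    # matches chi-square(1 df), p = 0.01
NEIGHBOUR_THRESHOLD = 3.84   # matches chi-square(1 df), p = 0.05


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 for a 2x2 contingency table (assumed statistic, see lead-in).

    k11: units containing both words
    k12: units containing only word A
    k21: units containing only word B
    k22: units containing neither word
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts: two words that co-occur far more often than chance.
score = log_likelihood_ratio(50, 5, 5, 1000)
print(score > SENTENCE_THRESHOLD)
```

A pair whose score falls below the threshold would not appear in the co-occurrence counts under this reading, which is why the reported minimum equals the threshold itself.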