Corpus: nld-com_web_2018


1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           2954062
Average sentence length in characters                         96.8410
Average sentence length in words                              15.7106
Number of distinct word forms                                 879206
Number of distinct word forms (without multiwords)            878613
Percentage of lower case word forms                           62.1395
Number of multi word units                                    593
Percentage of multi word units                                0.0674
Number of running word forms                                  46351522
Number of running word forms (without multiwords)             46326539
Percentage of lower case running words                        88.3347
Average word form length                                      11.5084
Average running word length                                   5.11645619
Percentage of word forms with frequency=1                     57.4541
Number of sentence based co-occurrences                       6655118
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    1056040.50
Number of neighbour based co-occurrences                      896568
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    604163.00
Average number of sentence based co-occurrences per sentence  112.2037
Average number of neighbour co-occurrences per sentence       9.3531
Most frequent word                                            de
Most frequent word's frequency                                1960216
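
The minimal likelihood ratios listed above (6.63 and 3.84) coincide with the chi-square critical values for one degree of freedom at the 0.01 and 0.05 levels. The page does not say how the ratio is computed, so the following Python sketch is only an illustration: it assumes the statistic is the log-likelihood ratio (G²) of a 2x2 contingency table, and the function names and the example counts are hypothetical, not taken from the corpus.

```python
import math

# Cut-offs as reported in the table above.
SENTENCE_THRESHOLD = 6.63   # sentence based co-occurrences
NEIGHBOUR_THRESHOLD = 3.84  # neighbour based co-occurrences


def contingency_from_counts(n_ab, n_a, n_b, n):
    """Build a 2x2 table from co-occurrence counts (hypothetical helper).

    n_ab: units (e.g. sentences) containing both words
    n_a:  units containing word A
    n_b:  units containing word B
    n:    total number of units
    """
    return (n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab)


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table.

    Assumption: this is the measure behind the "likelihood ratio" rows;
    the source does not confirm it.
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts, chosen only to exercise the functions.
table = contingency_from_counts(n_ab=30, n_a=100, n_b=200, n=10000)
score = log_likelihood_ratio(*table)
print(score >= SENTENCE_THRESHOLD)
```

Under this assumption, a word pair would be kept as a sentence based co-occurrence only if its score reaches 6.63, and as a neighbour based co-occurrence only if it reaches 3.84.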
22119 msec needed at 2020-06-08 12:00