Corpus: nld_news_2010_1M


1.1 Summary

Values for some general parameters

Parameter Value
Number of sentences 1000000
Average sentence length in characters 92.8082
Average sentence length in words 15.1972
Number of distinct word forms 441677
Number of distinct word forms (without multiwords) 441101
Percentage of lower case word forms 59.0825
Number of multi-word units 576
Percentage of multi-word units 0.1304
Number of running word forms 15206412
Number of running word forms (without multiwords) 15184931
Percentage of lower case running words 85.1332
Average word form length 10.3522
Average running word length 5.05187933
Percentage of word forms with frequency=1 59.1247
Number of sentence based co-occurrences 2268982
- minimal likelihood ratio 6.63
- maximal likelihood ratio 180291.91
Number of neighbour based co-occurrences 365977
- minimal likelihood ratio 3.84
- maximal likelihood ratio 245923.77
Average number of sentence based co-occurrences per sentence 91.4977
Average number of neighbour based co-occurrences per sentence 8.1589
Most frequent word de
Most frequent word's frequency 761915
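
Some of the values above can be cross-checked against each other. The following is a minimal Python sketch, not part of the original report: it recomputes the multi-word-unit figures from the counts in the table and converts the two minimal likelihood ratios into tail probabilities. The second step assumes that the likelihood ratio is compared against a chi-square distribution with one degree of freedom; the report does not say which test was used, so that reading is an assumption. The helper names `pct` and `chi2_p_value_1df` are hypothetical.

```python
import math

# Values copied from the summary table above.
distinct_forms = 441677          # Number of distinct word forms
distinct_forms_no_mwu = 441101   # ... without multiwords
multi_word_units = 576           # Number of multi word units
min_lr_sentence = 6.63           # minimal likelihood ratio, sentence based
min_lr_neighbour = 3.84          # minimal likelihood ratio, neighbour based


def pct(part, whole):
    """Return part/whole as a percentage, rounded to four decimals."""
    return round(100.0 * part / whole, 4)


def chi2_p_value_1df(statistic):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom.

    ASSUMPTION: the report's likelihood ratio is treated as chi-square
    distributed with one degree of freedom; the report does not state this.
    """
    return math.erfc(math.sqrt(statistic / 2.0))


# Distinct forms minus multi-word units should give the "without multiwords" count.
counts_agree = (distinct_forms - multi_word_units) == distinct_forms_no_mwu

# Share of multi-word units among all distinct word forms, in percent.
mwu_share = pct(multi_word_units, distinct_forms)

# Tail probabilities implied by the two minimal likelihood ratios.
p_sentence = chi2_p_value_1df(min_lr_sentence)
p_neighbour = chi2_p_value_1df(min_lr_neighbour)

print("counts agree:", counts_agree)
print("multi-word unit share (%):", mwu_share)
print("p-value at 6.63:", round(p_sentence, 3))
print("p-value at 3.84:", round(p_neighbour, 3))
```

Under that assumption, the two thresholds correspond roughly to the conventional 1% and 5% significance levels, which would explain why the sentence based and neighbour based co-occurrence lists start at different minimal values.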
10546 msec needed at 2018-03-17 15:10