Values for some general parameters
Parameter | Value |
Number of sentences | 1000000 |
Average sentence length in characters | 139.2186 |
Average sentence length in words | 22.7173 |
Number of distinct word forms | 470102 |
Number of distinct word forms (without multiwords) | 397625 |
Percentage of lower case word forms | 46.0026 |
Number of multi word units | 72477 |
Percentage of multi word units | 15.4173 |
Number of running word forms | 23226639 |
Number of running word forms (without multiwords) | 22694075 |
Percentage of lower case running words | 84.9411 |
Average word form length | 8.6077 |
Average running word length | 5.03716798 |
Percentage of word forms with frequency=1 | 51.8728 |
Number of sentence based co-occurrences | 4259556 |
- minimal likelihood ratio | 6.63 |
- maximal likelihood ratio | 95735.37 |
Number of neighbour based co-occurrences | 514062 |
- minimal likelihood ratio | 3.84 |
- maximal likelihood ratio | 357529.12 |
Average number of sentence based co-occurrences per sentence | 186.9129 |
Average number of neighbour co-occurrences per sentence | 14.6909 |
Most frequent word | de |
Most frequent word's frequency | 1520448 |
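
Two of the reported values can be rederived from the others, which gives a quick consistency check on the table. The sketch below uses only figures copied from the table. The variable names are illustrative, and the assumption that the percentage of multi word units is taken relative to all distinct word forms is ours, not stated in the source.

```python
# Values copied from the table; variable names are illustrative.
distinct_forms = 470102          # Number of distinct word forms
distinct_forms_no_mwu = 397625   # ... without multiwords
multi_word_units = 72477         # Number of multi word units

# Removing the multi word units from all distinct forms should give
# the reported "without multiwords" count.
derived_no_mwu = distinct_forms - multi_word_units

# Assumption: the percentage is relative to all distinct word forms.
mwu_percentage = round(100 * multi_word_units / distinct_forms, 4)

print(derived_no_mwu == distinct_forms_no_mwu)
print(mwu_percentage)
```

Under that assumption both derived values agree with the table (397625 and 15.4173).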
13568 msec needed at 2021-07-14 15:00
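
The minimal likelihood ratios in the table, 6.63 and 3.84, are close to the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The table does not say how the ratio is computed. The sketch below assumes a Dunning-style log-likelihood ratio (G²) over a 2x2 contingency table of counts; the function name, argument layout, and example counts are hypothetical.

```python
import math

# Thresholds copied from the table.
SENTENCE_THRESHOLD = 6.63    # sentence based co-occurrences
NEIGHBOUR_THRESHOLD = 3.84   # neighbour based co-occurrences


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 for a 2x2 table (assumed formula, not confirmed by the source).

    k11: units containing both words
    k12: units containing only the first word
    k21: units containing only the second word
    k22: units containing neither word
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    cells = xlogx_sum(k11, k12, k21, k22)
    return 2.0 * (cells + n * math.log(n) - rows - cols)


# Hypothetical counts: two words that always occur together.
score = log_likelihood_ratio(10, 0, 0, 10)
print(score > SENTENCE_THRESHOLD)
```

With this formula, a pair whose counts are exactly what independence predicts scores 0, so it would fall below both thresholds.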