Values for some general parameters
Parameter | Value |
Number of sentences | 1000000 |
Average sentence length in characters | 118.0583 |
Average sentence length in words | 19.8521 |
Number of distinct word forms | 457910 |
Number of distinct word forms (without multiwords) | 419727 |
Percentage of lower case word forms | 35.7920 |
Number of multi word units | 38183 |
Percentage of multi word units | 8.3385 |
Number of running word forms | 20350382 |
Number of running word forms (without multiwords) | 20060598 |
Percentage of lower case running words | 81.0616 |
Average word form length | 8.6816 |
Average running word length | 4.80241362 |
Percentage of word forms with frequency=1 | 56.3008 |
Number of sentence based co-occurrences | 4190272 |
- minimal likelihood ratio | 6.63 |
- maximal likelihood ratio | 74675.20 |
Number of neighbour based co-occurrences | 518702 |
- minimal likelihood ratio | 3.84 |
- maximal likelihood ratio | 267369.97 |
Average number of sentence based co-occurrences per sentence | 153.7053 |
Average number of neighbour co-occurrences per sentence | 11.6146 |
Most frequent word | the |
Most frequent word's frequency | 1084807 |
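
Some of the derived values in the table can be cross-checked from the raw counts. The following is a minimal sketch in Python, using only numbers copied from the table; the variable names are illustrative. The last part rests on an assumption that the source does not state: the two minimal likelihood ratios (6.63 and 3.84) coincide with the chi-square critical values for one degree of freedom at the 1% and 5% significance levels, which may be how the cut-offs were chosen.

```python
from statistics import NormalDist

# Counts copied from the table above.
distinct_word_forms = 457910
distinct_word_forms_no_multiwords = 419727
multi_word_units = 38183

# The multi-word units account for the gap between the two vocabulary counts.
vocabulary_gap = distinct_word_forms - distinct_word_forms_no_multiwords

# Share of multi-word units among all distinct word forms, in percent.
pct_multi_word_units = round(100 * multi_word_units / distinct_word_forms, 4)


def chi2_critical_1df(confidence: float) -> float:
    """Chi-square quantile for 1 degree of freedom.

    A chi-square variable with 1 degree of freedom is the square of a
    standard normal variable, so its quantile is a squared normal quantile.
    """
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return z * z


# Assumption (not stated in the source): the thresholds are significance cut-offs.
sentence_threshold = round(chi2_critical_1df(0.99), 2)   # compare with 6.63
neighbour_threshold = round(chi2_critical_1df(0.95), 2)  # compare with 3.84

print(vocabulary_gap, pct_multi_word_units)
print(sentence_threshold, neighbour_threshold)
```

The first two checks reproduce the table's "Number of multi word units" and "Percentage of multi word units" rows. The threshold comparison only shows that the numbers agree; it does not confirm how the report actually filtered its co-occurrences.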
13875 msec needed at 2018-02-24 19:20