Values for some general parameters
Parameter | Value |
Number of sentences | 1000000 |
Average sentence length in characters | 122.3297 |
Average sentence length in words | 19.2403 |
Number of distinct word forms | 439308 |
Number of distinct word forms (without multiwords) | 393123 |
Percentage of lower case word forms | 51.9255 |
Number of multi word units | 46185 |
Percentage of multi word units | 10.5131 |
Number of running word forms | 19508958 |
Number of running word forms (without multiwords) | 19171774 |
Percentage of lower case running words | 85.9125 |
Average word form length | 9.0960 |
Average running word length | 5.27190301 |
Percentage of word forms with frequency=1 | 51.5843 |
Number of sentence based co-occurrences | 3801330 |
- minimal likelihood ratio | 6.63 |
- maximal likelihood ratio | 40160.87 |
Number of neighbour based co-occurrences | 518129 |
- minimal likelihood ratio | 3.84 |
- maximal likelihood ratio | 159785.05 |
Average number of sentence based co-occurrences per sentence | 144.1807 |
Average number of neighbour co-occurrences per sentence | 11.4611 |
Most frequent word | di |
Most frequent word's frequency | 729523 |
Generation took 17392 msec, at 2021-06-07 08:00
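Some of the figures above are related to each other, and a short sketch can make those relationships explicit. The Python below recomputes the multi word unit count and percentage from the distinct word form counts, and converts the two minimal likelihood ratio values into upper-tail probabilities of a chi-square distribution with one degree of freedom. The variable and function names are illustrative and not part of the source. Reading 6.63 and 3.84 as significance cut-offs is an assumption: the source does not say how the thresholds were chosen, but they sit close to the usual critical values for p = 0.01 and p = 0.05.

```python
import math

# Values copied from the table above.
DISTINCT_FORMS = 439308            # Number of distinct word forms
DISTINCT_FORMS_NO_MW = 393123      # ... without multiwords
MULTIWORD_UNITS = 46185            # Number of multi word units
MIN_LR_SENTENCE = 6.63             # minimal likelihood ratio, sentence based
MIN_LR_NEIGHBOUR = 3.84            # minimal likelihood ratio, neighbour based


def multiword_share(multiwords: int, forms: int) -> float:
    """Percentage of distinct word forms that are multi word units."""
    return 100.0 * multiwords / forms


def chi2_upper_tail_1df(x: float) -> float:
    """Upper-tail probability P(X > x) for chi-square with 1 degree of freedom.

    For one degree of freedom X = Z**2 with Z standard normal,
    so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))


if __name__ == "__main__":
    # Multi word units are the difference between the two distinct form counts.
    print(DISTINCT_FORMS - DISTINCT_FORMS_NO_MW)
    # Share of multi word units among distinct word forms, in percent.
    print(f"{multiword_share(MULTIWORD_UNITS, DISTINCT_FORMS):.4f}")
    # Assumed interpretation: thresholds as chi-square significance cut-offs.
    print(f"{chi2_upper_tail_1df(MIN_LR_SENTENCE):.4f}")
    print(f"{chi2_upper_tail_1df(MIN_LR_NEIGHBOUR):.4f}")
```

The first two printed values should reproduce the table's "Number of multi word units" and "Percentage of multi word units" rows. The last two should come out close to 0.01 and 0.05, which is consistent with, but does not prove, the significance-threshold reading.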