Values for some general parameters
Parameter | Value |
Number of sentences | 1000000 |
Average sentence length in characters | 123.3214 |
Average sentence length in words | 19.0351 |
Number of distinct word forms | 509590 |
Number of distinct word forms (without multiwords) | 496896 |
Percentage of lower case word forms | 59.2390 |
Number of multi word units | 12694 |
Percentage of multi word units | 2.4910 |
Number of running word forms | 19102237 |
Number of running word forms (without multiwords) | 18983773 |
Percentage of lower case running words | 86.9593 |
Average word form length | 8.8876 |
Average running word length | 5.40215146 |
Percentage of word forms with frequency=1 | 46.8361 |
Number of sentence based co-occurrences | 5245064 |
- minimal likelihood ratio | 6.63 |
- maximal likelihood ratio | 71402.73 |
Number of neighbour based co-occurrences | 595995 |
- minimal likelihood ratio | 3.84 |
- maximal likelihood ratio | 118703.92 |
Average number of sentence based co-occurrences per sentence | 114.4514 |
Average number of neighbour co-occurrences per sentence | 9.9654 |
Most frequent word | je |
Most frequent word's frequency | 806271 |
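
The multiword figures in the table can be cross-checked against each other. The sketch below is only a sanity check on the published numbers, not the procedure that produced them; the variable names and the helper `multiword_share` are hypothetical, and it assumes the multiword percentage is taken relative to all distinct word forms.

```python
# Figures copied from the table above.
distinct_word_forms = 509590
distinct_without_multiwords = 496896
multi_word_units = 12694


def multiword_share(multiword_count: int, distinct_count: int) -> float:
    """Percentage of distinct word forms that are multi word units.

    Assumption: the percentage is relative to all distinct word forms.
    """
    return 100.0 * multiword_count / distinct_count


# The two distinct-word-form counts should differ by the multiword count.
difference = distinct_word_forms - distinct_without_multiwords
share = multiword_share(multi_word_units, distinct_word_forms)

print(difference == multi_word_units)
print(f"{share:.4f}")
```

Under that assumption both checks agree with the table: the difference equals the number of multi word units, and the share rounds to the listed percentage.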
17399 msec needed at 2021-06-04 21:00