Values for some general parameters
| Parameter | Value |
|---|---|
| Number of sentences | 30000 |
| Average sentence length in characters | 125.7152 |
| Average sentence length in words | 14.5596 |
| Number of distinct word forms | 106872 |
| Number of distinct word forms (without multiwords) | 106549 |
| Percentage of lowercase word forms | 70.1166 |
| Number of multiword units | 323 |
| Percentage of multiword units | 0.3022 |
| Number of running word forms | 437011 |
| Number of running word forms (without multiwords) | 436196 |
| Percentage of lowercase running words | 78.4216 |
| Average word form length (in characters) | 9.3096 |
| Average running word length (in characters) | 7.53266880 |
| Percentage of word forms with frequency = 1 | 66.7378 |
| Number of sentence-based co-occurrences | 58046 |
| Sentence-based co-occurrences: minimal likelihood ratio | 6.63 |
| Sentence-based co-occurrences: maximal likelihood ratio | 2910.70 |
| Number of neighbour-based co-occurrences | 11217 |
| Neighbour-based co-occurrences: minimal likelihood ratio | 3.84 |
| Neighbour-based co-occurrences: maximal likelihood ratio | 4115.49 |
| Average number of sentence-based co-occurrences per sentence | 10.6804 |
| Average number of neighbour-based co-occurrences per sentence | 2.2602 |
| Most frequent word | ukuthi |
| Frequency of the most frequent word | 9369 |
503 msec needed at 2018-04-02 13:00
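The two minimal likelihood ratios in the table, 6.63 and 3.84, coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. This suggests (an assumption, not stated in the statistics) that sentence-based co-occurrences were kept only above 6.63 and neighbour-based co-occurrences only above 3.84. A minimal sketch of a log-likelihood ratio (Dunning's G²) for one word pair, computed from a 2×2 contingency table of counts, follows. The function name, the constant names, and the counting scheme are hypothetical illustrations and are not taken from the tool that produced these statistics.

```python
import math

# Chi-square critical values for 1 degree of freedom (assumed to be the
# thresholds behind the "minimal likelihood ratio" rows above).
THRESHOLD_P05 = 3.84  # p = 0.05
THRESHOLD_P01 = 6.63  # p = 0.01


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 for a 2x2 contingency table (hypothetical helper).

    k11: units (e.g. sentences) containing both word A and word B
    k12: units containing A but not B
    k21: units containing B but not A
    k22: units containing neither A nor B
    """
    n = k11 + k12 + k21 + k22
    observed = [k11, k12, k21, k22]
    row = [k11 + k12, k21 + k22]  # totals with / without A
    col = [k11 + k21, k12 + k22]  # totals with / without B
    # Expected counts if A and B occurred independently.
    expected = [
        row[0] * col[0] / n,
        row[0] * col[1] / n,
        row[1] * col[0] / n,
        row[1] * col[1] / n,
    ]
    g2 = 0.0
    for o, e in zip(observed, expected):
        if o > 0:  # 0 * log(0) is taken as 0
            g2 += o * math.log(o / e)
    return 2.0 * g2


if __name__ == "__main__":
    # Independent pair: observed counts equal the expected counts.
    print(log_likelihood_ratio(10, 10, 10, 10))
    # Strongly associated pair: passes the stricter p = 0.01 threshold.
    print(log_likelihood_ratio(50, 5, 5, 940) >= THRESHOLD_P01)
```

Under this reading, a pair would be listed as a sentence-based co-occurrence only if its G² reached 6.63, which fits the minimum shown for that row. The maximal values (2910.70 and 4115.49) would then belong to the most strongly associated pairs in each set.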