Corpus: sna-zw_web_2013_10K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           10000
Average sentence length in characters                         114.7498
Average sentence length in words                              14.7308
Number of distinct word forms                                 35258
Number of distinct word forms (without multiwords)            35145
Percentage of lower case word forms                           78.0617
Number of multi word units                                    113
Percentage of multi word units                                0.3205
Number of running word forms                                  147427
Number of running word forms (without multiwords)             147235
Percentage of lower case running words                        86.1362
Average word form length                                      8.9513
Average running word length                                   6.67242843
Percentage of word forms with frequency=1                     68.1179
Number of sentence based co-occurrences                       27922
  - minimal likelihood ratio                                  6.63
  - maximal likelihood ratio                                  5315.47
Number of neighbour based co-occurrences                      3873
  - minimal likelihood ratio                                  3.85
  - maximal likelihood ratio                                  16296.71
Average number of sentence based co-occurrences per sentence  27.8206
Average number of neighbour co-occurrences per sentence       3.3340
Most frequent word                                            kuti
Frequency of the most frequent word                           3337
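
Some of the values in the table are related by simple arithmetic. The following is a minimal sketch, assuming that the multiword figures are the difference between the totals with and without multiwords, and that the percentage is taken relative to the total number of distinct word forms. The table does not state how the values were computed, so this is an assumption; the name `multiword_stats` and the variable names are illustrative only.

```python
# Counts copied from the summary table.
distinct_forms = 35258          # Number of distinct word forms
distinct_forms_no_mw = 35145    # ... without multiwords
running_forms = 147427          # Number of running word forms
running_forms_no_mw = 147235    # ... without multiwords


def multiword_stats(total, without_multiwords):
    """Return (multiword count, multiword share in percent).

    Assumes the multiword count is simply the difference between
    the two totals; the source does not confirm this definition.
    """
    count = total - without_multiwords
    share = 100.0 * count / total
    return count, share


# Distinct (type-level) and running (token-level) multiword figures.
mw_types, mw_type_pct = multiword_stats(distinct_forms, distinct_forms_no_mw)
mw_tokens, mw_token_pct = multiword_stats(running_forms, running_forms_no_mw)

print(f"multiword types:  {mw_types} ({mw_type_pct:.4f}%)")
print(f"multiword tokens: {mw_tokens} ({mw_token_pct:.4f}%)")
```

Under these assumptions the type-level result is 113 multi word units, or 0.3205 percent of the distinct word forms, which agrees with the two corresponding rows of the table. The token-level figure (192 running multiword occurrences) is derived the same way and is not listed in the table itself.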
385 msec needed at 2018-06-16 09:50