Corpus: ara_web_2012_10K

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           10000
Average sentence length in characters                         175.6828
Average sentence length in words                              16.9399
Number of distinct word forms                                 48186
Number of distinct word forms (without multiwords)            48178
Percentage of lower case word forms                           98.7216
Number of multi word units                                    8
Percentage of multi word units                                0.0166
Number of running word forms                                  168565
Number of running word forms (without multiwords)             168557
Percentage of lower case running words                        99.5652
Average word form length                                      11.7438
Average running word length                                   9.39149961
Percentage of word forms with frequency=1                     66.7435
Number of sentence based co-occurrences                       25396
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    694.18
Number of neighbour based co-occurrences                      3473
- minimal likelihood ratio                                    3.86
- maximal likelihood ratio                                    755.78
Average number of sentence based co-occurrences per sentence  15.0546
Average number of neighbour co-occurrences per sentence       1.7189
Most frequent word                                            في
Frequency of the most frequent word                           4687
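
As a rough illustration of how parameters of this kind could be derived, here is a minimal Python sketch. It assumes whitespace tokenization and counts length in Unicode characters. The tokenizer and length unit actually used for this corpus are not stated here, so the sketch is not expected to reproduce the values above. The function name corpus_summary and the dictionary keys are hypothetical.

```python
from collections import Counter


def corpus_summary(sentences):
    """Compute a few summary parameters for a non-empty list of sentence strings."""
    # Assumption: tokens are whitespace-separated; the real tokenizer may differ.
    tokens = [tok for sentence in sentences for tok in sentence.split()]
    freq = Counter(tokens)  # distinct word form -> number of occurrences
    n_sentences = len(sentences)
    top_word, top_count = freq.most_common(1)[0]
    # Word forms that occur exactly once in the corpus.
    singletons = sum(1 for count in freq.values() if count == 1)
    return {
        "sentences": n_sentences,
        "avg_sentence_length_chars": sum(len(s) for s in sentences) / n_sentences,
        "avg_sentence_length_words": len(tokens) / n_sentences,
        "distinct_word_forms": len(freq),
        "running_word_forms": len(tokens),
        # Each distinct form counted once.
        "avg_word_form_length": sum(len(w) for w in freq) / len(freq),
        # Each occurrence counted, so frequent short words pull this down.
        "avg_running_word_length": sum(len(t) for t in tokens) / len(tokens),
        "pct_frequency_1": 100.0 * singletons / len(freq),
        "most_frequent_word": top_word,
        "most_frequent_word_frequency": top_count,
    }
```

Calling corpus_summary(["a b a", "c a"]) counts 5 running word forms and 3 distinct word forms, 2 of which occur only once.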
295 msec needed at 2018-04-04 03:40
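
The summary reports likelihood ratios for co-occurrences without giving the formula. One common choice for such a significance score is the log-likelihood ratio (G²) over a 2x2 contingency table. The sketch below assumes that measure and sentence-level counts, and it is not confirmed to be the computation behind the figures above. The function name and parameters are hypothetical.

```python
import math


def log_likelihood_ratio(n_a, n_b, n_ab, n):
    """G^2 score for the co-occurrence of words A and B.

    n_a, n_b: number of sentences containing A, respectively B
    n_ab:     number of sentences containing both
    n:        total number of sentences
    """
    # Observed 2x2 table: both, A only, B only, neither.
    observed = [n_ab, n_a - n_ab, n_b - n_ab, n - n_a - n_b + n_ab]
    # Expected counts if A and B occurred independently.
    expected = [
        n_a * n_b / n,
        n_a * (n - n_b) / n,
        (n - n_a) * n_b / n,
        (n - n_a) * (n - n_b) / n,
    ]
    # Cells with an observed count of 0 contribute nothing to the sum.
    return 2.0 * sum(
        o * math.log(o / e) for o, e in zip(observed, expected) if o > 0
    )
```

With n = 100 and n_a = n_b = 10, a pair that co-occurs once (exactly what independence predicts) scores 0, while a pair that co-occurs in all 10 sentences scores far higher. For reference, 6.63 is approximately the chi-square critical value for one degree of freedom at p = 0.01, so the minimal sentence-based ratio reported above would be consistent with a cutoff of that kind.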