Corpus: ara_newscrawl_2013_30K


1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           30000
Average sentence length in characters                         221.2029
Average sentence length in words                              20.6522
Number of distinct word forms                                 109154
Number of distinct word forms (without multiwords)            109149
Percentage of lower-case word forms                           99.7801
Number of multiword units                                     5
Percentage of multiword units                                 0.0046
Number of running word forms                                  619363
Number of running word forms (without multiwords)             619358
Percentage of lower-case running words                        99.9483
Average word form length                                      12.3266
Average running word length                                   9.68705175
Percentage of word forms with frequency=1                     61.8466
Number of sentence-based co-occurrences                       110462
  - minimal likelihood ratio                                  6.63
  - maximal likelihood ratio                                  2216.71
Number of neighbour-based co-occurrences                      17451
  - minimal likelihood ratio                                  3.84
  - maximal likelihood ratio                                  3985.87
Average number of sentence-based co-occurrences per sentence  29.3062
Average number of neighbour co-occurrences per sentence       3.6286
Most frequent word                                            في
Frequency of the most frequent word                           20368
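
The two minimal likelihood ratios reported above, 6.63 and 3.84, match the chi-square critical values for one degree of freedom at the 1% and 5% significance levels. The page does not say how the ratio is computed. The sketch below assumes Dunning's log-likelihood ratio over a 2x2 contingency table, which is a common choice for co-occurrence significance; the function names, constant names, and example counts are hypothetical.

```python
import math

# Cutoffs taken from the table above. They match the chi-square critical
# values for 1 degree of freedom at p = 0.01 and p = 0.05.
SENTENCE_THRESHOLD = 6.63
NEIGHBOUR_THRESHOLD = 3.84


def _xlogx(x):
    """Return x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood_ratio(n_ab, n_a, n_b, n):
    """Log-likelihood ratio (G^2) for the co-occurrence of words A and B.

    n_ab: number of units (sentences or neighbour pairs) containing both words
    n_a:  number of units containing A
    n_b:  number of units containing B
    n:    total number of units
    """
    # Cells of the 2x2 contingency table.
    k11 = n_ab                    # A and B
    k12 = n_a - n_ab              # A without B
    k21 = n_b - n_ab              # B without A
    k22 = n - n_a - n_b + n_ab    # neither A nor B
    cells = [k11, k12, k21, k22]
    if min(cells) < 0:
        raise ValueError("inconsistent counts")

    rows = [k11 + k12, k21 + k22]
    cols = [k11 + k21, k12 + k22]

    # G^2 = 2 * sum(O * ln(O / E)), expanded into x*ln(x) terms.
    return 2.0 * (
        sum(_xlogx(k) for k in cells)
        - sum(_xlogx(r) for r in rows)
        - sum(_xlogx(c) for c in cols)
        + _xlogx(n)
    )


def is_significant(n_ab, n_a, n_b, n, threshold):
    """True if the pair's likelihood ratio reaches the given cutoff."""
    return log_likelihood_ratio(n_ab, n_a, n_b, n) >= threshold


if __name__ == "__main__":
    # Hypothetical counts: two words that each occur in 100 of 30000
    # sentences and share 50 of them.
    score = log_likelihood_ratio(50, 100, 100, 30000)
    print(score, score >= SENTENCE_THRESHOLD)
```

Under this assumption, a word pair would count as a sentence-based co-occurrence only if its score is at least 6.63, and as a neighbour-based co-occurrence only if it is at least 3.84.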
Generated in 548 ms on 2018-02-01 at 21:00