Corpus: ara-ma_newscrawl-OSIAN_2018

1.1 Summary

Values for some general parameters

Parameter                                                       Value
Number of sentences                                             537227
Average sentence length in characters                           248.2435
Average sentence length in words                                22.9719
Number of distinct word forms                                   613187
Number of distinct word forms (without multiwords)              613187
Percentage of lower-case word forms                             99.4118
Number of multiword units                                       0
Percentage of multiword units                                   0.0000
Number of running word forms                                    12340176
Number of running word forms (without multiwords)               12340176
Percentage of lower-case running words                          99.9556
Average word form length                                        13.4589
Average running word length                                     9.73240390
Percentage of word forms with frequency 1                       55.9425
Number of sentence-based co-occurrences                         4514930
- minimum likelihood ratio                                      6.63
- maximum likelihood ratio                                      84849.42
Number of neighbour-based co-occurrences                        456149
- minimum likelihood ratio                                      3.84
- maximum likelihood ratio                                      141222.50
Average number of sentence-based co-occurrences per sentence    122.8953
Average number of neighbour-based co-occurrences per sentence   9.8692
Most frequent word                                              في ("in")
Frequency of the most frequent word                             393716
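The word-form rows above can be approximated from raw text. The sketch below is a minimal Python illustration, assuming whitespace tokenization and the simplest reading of each parameter: distinct word forms are types, running word forms are tokens, and word forms with frequency 1 are forms seen exactly once. The function name `corpus_summary` and its dictionary keys are hypothetical; the pipeline that produced this table uses its own sentence segmentation and tokenizer, so the sketch will not reproduce these figures exactly.

```python
from collections import Counter


def corpus_summary(sentences: list[str]) -> dict:
    """Approximate some summary parameters for a list of sentence strings.

    Hypothetical helper. Assumes whitespace tokenization; the real corpus
    pipeline uses its own tokenizer, so its figures will differ.
    """
    tokenized = [s.split() for s in sentences]
    freq = Counter(tok for toks in tokenized for tok in toks)

    n_sentences = len(sentences)
    running = sum(freq.values())  # running word forms (tokens)
    distinct = len(freq)          # distinct word forms (types)
    hapax = sum(1 for c in freq.values() if c == 1)  # forms seen exactly once
    top_word, top_freq = freq.most_common(1)[0]

    return {
        "sentences": n_sentences,
        "avg_sentence_chars": sum(len(s) for s in sentences) / n_sentences,
        "avg_sentence_words": running / n_sentences,
        "distinct_word_forms": distinct,
        "running_word_forms": running,
        "pct_freq_1": 100.0 * hapax / distinct,
        # mean length over distinct forms (each type counted once)
        "avg_word_form_length": sum(len(w) for w in freq) / distinct,
        # mean length over running text (each type weighted by its frequency)
        "avg_running_word_length": sum(len(w) * c for w, c in freq.items()) / running,
        "most_frequent_word": top_word,
        "most_frequent_freq": top_freq,
    }


if __name__ == "__main__":
    demo = ["the cat sat", "the dog sat on the mat"]
    for key, value in corpus_summary(demo).items():
        print(key, value)
```

Separating the two length averages shows why the average word form length can exceed the average running word length: short function words such as في are few as distinct forms but very frequent in running text.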
Processing time: 29494 ms (2018-05-22 17:40)