Corpus: fas_newscrawl_2017

1.1 Summary

Values for some general parameters

| Parameter | Value |
| --- | --- |
| Number of sentences | 2254366 |
| Average sentence length in characters | 228.8614 |
| Average sentence length in words | 24.6898 |
| Number of distinct word forms | 876741 |
| Number of distinct word forms (without multiwords) | 876741 |
| Percentage of lower case word forms | 97.9679 |
| Number of multi word units | 0 |
| Percentage of multi word units | 0.0000 |
| Number of running word forms | 55583988 |
| Number of running word forms (without multiwords) | 55583988 |
| Percentage of lower case running words | 99.8939 |
| Average word form length | 15.4471 |
| Average running word length | 8.26416527 |
| Percentage of word forms with frequency=1 | 58.8135 |
| Number of sentence based co-occurrences | 11948570 |
| - minimal likelihood ratio | 6.63 |
| - maximal likelihood ratio | 180470.05 |
| Number of neighbour based co-occurrences | 1245105 |
| - minimal likelihood ratio | 3.84 |
| - maximal likelihood ratio | 428716.84 |
| Average number of sentence based co-occurrences per sentence | 276.0086 |
| Average number of neighbour co-occurrences per sentence | 14.8810 |
| Most frequent word | و |
| Most frequent word's frequency | 2324106 |
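
Some of the figures above can be combined into derived values that the table does not list directly. The following Python sketch uses only numbers copied from the table to estimate the share of all running word forms taken by the most frequent word, the approximate number of word forms that occur exactly once, and the ratio of distinct to running word forms. The variable names are illustrative and not part of any corpus tooling.

```python
# Values copied from the summary table above.
running_word_forms = 55_583_988
distinct_word_forms = 876_741
most_frequent_word_count = 2_324_106
pct_frequency_one = 58.8135  # percentage of word forms with frequency=1

# Share of all running word forms taken by the most frequent word.
top_word_share = most_frequent_word_count / running_word_forms

# Approximate number of word forms occurring exactly once. The result is
# rounded because the percentage in the table is itself rounded.
frequency_one_forms = round(distinct_word_forms * pct_frequency_one / 100)

# Distinct word forms per running word form (type-token ratio).
type_token_ratio = distinct_word_forms / running_word_forms

print(f"Share of most frequent word: {top_word_share:.2%}")
print(f"Word forms with frequency 1: {frequency_one_forms}")
print(f"Type-token ratio: {type_token_ratio:.4f}")
```

The most frequent word comes out at roughly 4.2% of all running word forms, and a little over half a million word forms occur only once.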
32865 msec needed at 2024-12-24 01:00
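
The two minimal likelihood ratios in the table, 6.63 and 3.84, match the chi-square critical values for one degree of freedom at the 1% and 5% significance levels (about 6.635 and 3.841). This suggests that co-occurrences were kept only if they passed a log-likelihood significance test. The page does not state the exact formula, so the sketch below assumes the standard log-likelihood ratio (G²) over a 2×2 contingency table. The function name and the example counts are hypothetical.

```python
import math

def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table (assumed formula).

    k11: units containing both words    k12: units with word A only
    k21: units with word B only         k22: units with neither word
    """
    total = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2

# Thresholds taken from the table above.
SENTENCE_THRESHOLD = 6.63
NEIGHBOUR_THRESHOLD = 3.84

# Hypothetical counts: two words that co-occur in 30 of 10000 sentences.
score = log_likelihood_ratio(30, 70, 170, 9730)
print(score >= SENTENCE_THRESHOLD)
```

With counts that are exactly what independence would predict, the statistic is 0, so the pair would fall below both thresholds.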