Corpus: isl-is_web_2013

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           217111
Average sentence length in characters                         109.1274
Average sentence length in words                              15.8707
Number of distinct word forms                                 260204
Number of distinct word forms (without multiwords)            255112
Percentage of lower case word forms                           71.3513
Number of multi word units                                    5092
Percentage of multi word units                                1.9569
Number of running word forms                                  3523145
Number of running word forms (without multiwords)             3441217
Percentage of lower case running words                        88.1606
Average word form length                                      11.0445
Average running word length                                   5.83858966
Percentage of word forms with frequency=1                     57.4284
Number of sentence based co-occurrences                       753008
  - minimal likelihood ratio                                  6.63
  - maximal likelihood ratio                                  37518.73
Number of neighbour based co-occurrences                      127880
  - minimal likelihood ratio                                  3.84
  - maximal likelihood ratio                                  31826.02
Average number of sentence based co-occurrences per sentence  72.8234
Average number of neighbour co-occurrences per sentence       6.8057
Most frequent word                                            og
Most frequent word's frequency                                147508
4073 msec needed at 2018-04-30 06:50
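
Some of the values above can be cross-checked against each other. The following Python sketch recomputes the multi word unit count and percentage from the two distinct word form counts. It also shows that the two minimal likelihood ratios (6.63 and 3.84) coincide with the chi-square critical values for one degree of freedom at p = 0.01 and p = 0.05. The page does not say how these thresholds were chosen, so that reading is an assumption; the variable names and the helper chi2_critical_1df are hypothetical and introduced only for illustration.

```python
from statistics import NormalDist

# Raw counts copied from the parameter table above.
distinct_forms = 260204         # Number of distinct word forms
distinct_forms_no_mwu = 255112  # ... without multiwords

# Multi word units as the difference between the two counts,
# and their share of all distinct word forms in percent.
multi_word_units = distinct_forms - distinct_forms_no_mwu
multi_word_pct = round(100 * multi_word_units / distinct_forms, 4)


def chi2_critical_1df(p):
    """Chi-square critical value for 1 degree of freedom at level p.

    With one degree of freedom the chi-square variable is the square of a
    standard normal variable, so the critical value is the square of the
    two-sided normal quantile.
    """
    z = NormalDist().inv_cdf(1 - p / 2)
    return z * z


# Assumption: the reported minimal likelihood ratios are significance
# thresholds (the source does not confirm this).
sentence_threshold = round(chi2_critical_1df(0.01), 2)
neighbour_threshold = round(chi2_critical_1df(0.05), 2)

print(multi_word_units, multi_word_pct)
print(sentence_threshold, neighbour_threshold)
```

The recomputed count and percentage agree with the table entries "Number of multi word units" and "Percentage of multi word units".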