Corpus: fin-com_web_2018

1.1 Summary

Values for some general parameters

Parameter                                                     Value
Number of sentences                                           834619
Average sentence length in characters                         104.1893
Average sentence length in words                              11.7820
Number of distinct word forms                                 936369
Number of distinct word forms (without multiwords)            931634
Percentage of lower case word forms                           72.5475
Number of multi word units                                    4735
Percentage of multi word units                                0.5057
Number of running word forms                                  9812245
Number of running word forms (without multiwords)             9794893
Percentage of lower case running words                        85.8923
Average word form length                                      12.8596
Average running word length                                   7.77841044
Percentage of word forms with frequency=1                     62.3724
Number of sentence based co-occurrences                       2000236
- minimal likelihood ratio                                    6.63
- maximal likelihood ratio                                    64002.98
Number of neighbour based co-occurrences                      301247
- minimal likelihood ratio                                    3.84
- maximal likelihood ratio                                    106995.16
Average number of sentence based co-occurrences per sentence  29.3662
Average number of neighbour co-occurrences per sentence       3.4515
Most frequent word                                            ja
Most frequent word's frequency                                393021
Computed in 12489 ms on 2024-12-11 03:00
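
Several rows of the summary can be derived from the raw counts, and a short check makes those relationships explicit. The sketch below is illustrative only: the variable names are hypothetical, and it assumes that the percentage of multi word units is taken relative to the number of distinct word forms (the reported figures are consistent with that reading). It also evaluates the two minimal likelihood ratios, 6.63 and 3.84, as chi-square values with one degree of freedom; they coincide with the usual critical values for p = 0.01 and p = 0.05, though the page does not say that this is why they were chosen.

```python
import math

# Counts copied from the summary table above.
distinct_forms = 936369        # number of distinct word forms
multiword_units = 4735         # number of multi word units
running_forms = 9812245        # number of running word forms
most_frequent_count = 393021   # frequency of the most frequent word, "ja"

# Distinct word forms once multi word units are excluded.
forms_without_multiwords = distinct_forms - multiword_units

# Assumption: the multi word percentage is relative to distinct word forms.
multiword_share = 100 * multiword_units / distinct_forms

# Share of all running word forms taken up by the most frequent word.
most_frequent_share = 100 * most_frequent_count / running_forms


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))


print(forms_without_multiwords)       # 931634, as reported in the table
print(round(multiword_share, 4))      # 0.5057, as reported in the table
print(round(most_frequent_share, 2))
print(chi2_sf_1df(6.63), chi2_sf_1df(3.84))
```

The derived share of "ja" among running word forms is not part of the original table; it is included only to show how the two last rows relate to the running word count.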