Corpus: swa_newscrawl_2011_100K

2.2.11 Repetitions

Typical repetitions within words
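
The report does not define "repeated subword". Judging only from the listed examples (an in wananchi, cho in chochote, sua in kusua-sua, wa in Ofisa-mkuu-wa-Watumishi), it appears to mean a substring of length n that occurs twice in a row, compared without regard to case, with the two copies separated by a hyphen in the "with hyphen" tables. The sketch below works under that assumption. The function names `repeated_subwords` and `per_mille` are hypothetical, not part of the tool that produced this report, and it is not known whether the reported per-mille values are computed over word types or word tokens.

```python
def repeated_subwords(word, n, hyphen=False):
    """Return the length-n subwords that occur twice in a row in `word`.

    Assumption: comparison is case-insensitive. With hyphen=True the two
    copies must be separated by exactly one hyphen (e.g. "sua-sua").
    """
    text = word.lower()
    sep = "-" if hyphen else ""
    span = 2 * n + len(sep)  # total length of "xx" or "x-x"
    found = set()
    for i in range(len(text) - span + 1):
        first = text[i:i + n]
        middle = text[i + n:i + n + len(sep)]
        second = text[i + n + len(sep):i + span]
        # The subword itself must not contain a hyphen.
        if first == second and middle == sep and "-" not in first:
            found.add(first)
    return found


def per_mille(words, n, hyphen=False):
    """Share (per mille) of `words` containing a repeated subword of length n."""
    if not words:
        return 0.0
    hits = sum(1 for w in words if repeated_subwords(w, n, hyphen))
    return 1000.0 * hits / len(words)
```

For example, `repeated_subwords("wananchi", 2)` picks out the doubled "an", and `repeated_subwords("kusua-sua", 3, hyphen=True)` picks out "sua". Passing a list of distinct word forms to `per_mille` gives a type-based share; passing every running word gives a token-based one.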

Subword Length 2 - most frequent words
Subword Word Frequency
an wananchi 4473
An wananchi 4473
to watoto 2472
To watoto 2472
Ni nini 1463
ni nini 1463
ji jijini 1305
Ji jijini 1305
Me umeme 1267
me umeme 1267
Subword Length 2 - Most frequent subwords
Subword Count
li 1043
Li 1043
il 361
Ki 337
ki 337
ku 276
la 232
La 232
an 225
An 225
Proportion of words containing repeated subwords of length 2 - per mille
Per mille
41.7642
Subword Length 3 - most frequent words
Subword Word Frequency
cho chochote 396
mba sambamba 179
huo huohuo 81
Huo huohuo 81
cha Chacha 46
Cha Chacha 46
and Kandanda 39
And Kandanda 39
Any manyanyaso 21
Sua kusuasua 21
Subword Length 3 - Most frequent subwords
Subword Count
Any 24
cha 20
Cha 20
mba 10
cho 9
Che 9
che 9
Sua 6
sua 6
mbu 4
Proportion of words containing repeated subwords of length 3 - per mille
Per mille
1.7460
Subword Length 4 - most frequent words
Subword Word Frequency
Bara barabara 731
bara barabara 731
Bara barabarani 278
bara barabarani 278
wasi wasiwasi 256
Wasi wasiwasi 256
kati katikati 218
Kati katikati 218
piki pikipiki 162
Vile vilevile 106
Subword Length 4 - Most frequent subwords
Subword Count
Bara 12
bara 12
kata 6
Kata 6
kati 5
Kati 5
omba 4
Omba 4
pili 4
vugu 4
Proportion of words containing repeated subwords of length 4 - per mille
Per mille
1.9802
Subword Length 5 - most frequent words
Subword Word Frequency
mbali mbalimbali 2890
Mbali mbalimbali 2890
ndogo ndogondogo 45
Ndogo ndogondogo 45
rambi rambirambi 43
Pindu kipindupindu 39
mboga mbogamboga 33
Mboga mbogamboga 33
kweli kwelikweli 19
Kweli kwelikweli 19
Subword Length 5 - Most frequent subwords
Subword Count
Pindu 4
Chana 3
chana 3
Randa 3
mbali 3
Mbali 3
gonga 3
mboga 2
Ajabu 2
Hivyo 2
Proportion of words containing repeated subwords of length 5 - per mille
Per mille
1.4320
Subword Length 6 - most frequent words
Subword Word Frequency
Mikiki mikikimikiki 14
mikiki mikikimikiki 14
Shamra shamrashamra 12
shamra shamrashamra 12
Chembe chembechembe 9
wadogo wadogowadogo 9
chembe chembechembe 9
pilika pilikapilika 9
Wadogo wadogowadogo 9
vidogo vidogovidogo 8
Subword Length 6 - Most frequent subwords
Subword Count
wadogo 2
Wadogo 2
Mikiki 2
mikiki 2
shamra 2
Kamata 2
Shamra 2
kamata 2
pilika 1
mshike 1
Proportion of words containing repeated subwords of length 6 - per mille
Per mille
1.4834
Subword Length 2 - most frequent words with hyphen
Subword Word Frequency
wa Ofisa-mkuu-wa-Watumishi 1
ma Dodoma-Manyoni 1
Wa Ofisa-mkuu-wa-Watumishi 1
Subword Length 2 - Most frequent subwords in words with hyphen
Subword Count
ma 1
wa 1
Wa 1
Proportion of words with hyphen containing repeated subwords of length 2 - per mille
Per mille
0.0205
Subword Length 3 - most frequent words with hyphen
Subword Word Frequency
Sua kusua-sua 1
sua kusua-sua 1
Subword Length 3 - Most frequent subwords in words with hyphen
Subword Count
Sua 1
sua 1
Proportion of words with hyphen containing repeated subwords of length 3 - per mille
Per mille
0.0116
Proportion of words with hyphen containing repeated subwords of length 4 - per mille
Per mille
0.0000
Subword Length 5 - most frequent words with hyphen
Subword Word Frequency
South South-South 1
hovyo hovyo-hovyo 1
fasta fasta-fasta 1
Fasta fasta-fasta 1
Hovyo hovyo-hovyo 1
vyake kivyake-vyake 1
Subword Length 5 - Most frequent subwords in words with hyphen
Subword Count
South 1
fasta 1
Fasta 1
hovyo 1
Hovyo 1
vyake 1
Proportion of words with hyphen containing repeated subwords of length 5 - per mille
Per mille
0.0988
Proportion of words with hyphen containing repeated subwords of length 6 - per mille
Per mille
0.0000
959041 msec needed at 2018-03-28 04:20