Corpus: por_newscrawl_2015_100K

2.2.11 Repetitions

Typical repetitions within words

Subword Length 2 - most frequent words
Subword Word Frequency
es meses 813
és meses 813
educação 401
ca educação 401
educação 401
Educação 400
ca Educação 400
Educação 400
comunicação 210
ca comunicação 210
Subword Length 2 - Most frequent subwords
Subword Count
183
183
ca 183
Da 104
104
da 104
104
os 82
Os 82
ar 78
Share of words containing repeated subwords of length 2 - per mille
Per mille
12.3707
Subword Length 3 - most frequent words
Subword Word Frequency
End atendendo 48
Ass assassinato 45
End dependendo 45
Ass assassinado 39
Ass assassinatos 33
Ass assassinos 22
End vendendo 19
and andando 18
And andando 18
bar Bárbara 16
Subword Length 3 - Most frequent subwords
Subword Count
Ass 21
End 20
bar 14
Bar 14
Tan 6
and 4
And 4
Che 3
Can 2
can 2
Share of words containing repeated subwords of length 3 - per mille
Per mille
1.1068
Subword Length 4 - most frequent words
Subword Word Frequency
Peri Periperi 3
Tchê Tchetchénia 2
cara Caracaraí 1
Cara Caracaraí 1
Subword Length 4 - Most frequent subwords
Subword Count
Tchê 1
cara 1
Cara 1
Peri 1
Share of words containing repeated subwords of length 4 - per mille
Per mille
0.0562
Subword Length 5 - most frequent words
Subword Word Frequency
mente veementemente 4
Mente veementemente 4
Subword Length 5 - Most frequent subwords
Subword Count
Mente 1
mente 1
Share of words containing repeated subwords of length 5 - per mille
Per mille
0.0335
Subword Length 6 - most frequent words
Subword Word Frequency
Brasil BrasilBrasil 1
brasil BrasilBrasil 1
Subword Length 6 - Most frequent subwords
Subword Count
brasil 1
Brasil 1
Share of words containing repeated subwords of length 6 - per mille
Per mille
0.0681
Subword Length 2 - most frequent words with hyphen
Subword Word Frequency
pré-requisitos 6
Re pré-requisitos 6
pré-requisitos 6
te registe-te 2
pré-requisito 2
Te registe-te 2
co almoço-convívio 2
Co almoço-convívio 2
Re pré-requisito 2
pré-requisito 2
Subword Length 2 - Most frequent subwords in words with hyphen
Subword Count
Re 4
4
4
co 2
Co 2
Os 1
1
1
La 1
te 1
Share of words with hyphen containing repeated subwords of length 2 - per mille
Per mille
0.1370
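
For hyphenated words, the listed examples ("pré-requisitos", "registe-te", "almoço-convívio") all repeat the subword directly across the hyphen. A minimal sketch under that assumption follows; the function name `repeated_across_hyphen` is hypothetical, and the report may also count repetitions that do not span the hyphen.

```python
import re
import unicodedata


def fold(word):
    """Lowercase a word and strip combining diacritics (assumed normalization)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def repeated_across_hyphen(word, n):
    """Return n-letter subwords that end one part and start the next part of a hyphenated word."""
    # Matches n letters, a hyphen, and the same n letters again.
    pattern = re.compile(r"(?=([^\W\d_]{%d})-\1)" % n)
    return {m.group(1) for m in pattern.finditer(fold(word))}


if __name__ == "__main__":
    for w, n in [("pré-requisitos", 2), ("primavera-verão", 4), ("quebra-quebra", 6)]:
        print(w, n, repeated_across_hyphen(w, n))
```

With diacritics folded, "primavera-verão" yields "vera" at length 4, which is consistent with the corresponding table row further down.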
Subword Length 3 - most frequent words with hyphen
Subword Word Frequency
sol Sol-Sol 3
Sol Sol-Sol 3
Chi Chi-chi-chi 1
chi Chi-chi-chi 1
Piu Piu-piu 1
vai Vai-Vai 1
Sul Sul-Sul 1
sul Sul-Sul 1
Vai Vai-Vai 1
Subword Length 3 - Most frequent subwords in words with hyphen
Subword Count
Sol 1
Chi 1
chi 1
Piu 1
Sul 1
sul 1
vai 1
Vai 1
sol 1
Share of words with hyphen containing repeated subwords of length 3 - per mille
Per mille
0.0644
Subword Length 4 - most frequent words with hyphen
Subword Word Frequency
mata mata-mata 8
Mata mata-mata 8
Vera primavera-verão 4
verá primavera-verão 4
chic Chic-Chic 1
Chic Chic-Chic 1
Vera Primavera-Verão 1
verá Primavera-Verão 1
Subword Length 4 - Most frequent subwords in words with hyphen
Subword Count
Vera 2
verá 2
Mata 1
chic 1
Chic 1
mata 1
Share of words with hyphen containing repeated subwords of length 4 - per mille
Per mille
0.0749
Subword Length 5 - most frequent words with hyphen
Subword Word Frequency
ganha ganha-ganha 2
corre corre-corre 2
Corre corre-corre 2
Ganha ganha-ganha 2
Lambe Lambe-Lambe 2
duplo duplo-duplo 1
Duplo duplo-duplo 1
Subword Length 5 - Most frequent subwords in words with hyphen
Subword Count
corre 1
Corre 1
ganha 1
Ganha 1
duplo 1
Duplo 1
Lambe 1
Share of words with hyphen containing repeated subwords of length 5 - per mille
Per mille
0.1339
Subword Length 6 - most frequent words with hyphen
Subword Word Frequency
quebra quebra-quebra 3
Quebra quebra-quebra 3
Double double-double 1
cheira cheira-cheira 1
double double-double 1
Subword Length 6 - Most frequent subwords in words with hyphen
Subword Count
Quebra 1
cheira 1
Double 1
double 1
quebra 1
Share of words with hyphen containing repeated subwords of length 6 - per mille
Per mille
0.2042
Generated in 1090341 msec at 2018-03-21 04:32