Corpus: guj_newscrawl_2014_30K

Overview

1 General Corpus Information
1.1 Summary
2 Characters and Character N-Grams
2.1.4 Special Characters
2.1.5 Sukhotin
2.1.6 Amount of special characters
2.2.3 Word prefixes
2.2.4 Word suffixes
2.2.5 Most frequent word beginnings
2.2.6 Most frequent word endings
2.2.7 Postfixes of Length 2 (between NN Co-occurrences)
2.2.8 Prefixes of Length 2 (between NN Co-occurrences)
2.2.9 Postfixes of Length 3 (between NN Co-occurrences)
2.2.10 Prefixes of Length 3 (between NN Co-occurrences)
2.2.11 Repetitions
2.3.1 Distribution of Letter A in Words
2.3.2 Distribution of Letter B in Words
2.3.3 Distribution of Letter C in Words
2.3.4 Distribution of Letter D in Words
2.3.5 Distribution of Letter E in Words
2.3.6 Distribution of Letter F in Words
2.3.7 Distribution of Letter G in Words
2.3.8 Distribution of Letter H in Words
2.3.9 Distribution of Letter I in Words
2.3.10 Distribution of Letter J in Words
2.3.11 Distribution of Letter K in Words
2.3.12 Distribution of Letter L in Words
2.3.13 Distribution of Letter M in Words
2.3.14 Distribution of Letter N in Words
2.3.15 Distribution of Letter O in Words
2.3.16 Distribution of Letter P in Words
2.3.17 Distribution of Letter Q in Words
2.3.18 Distribution of Letter R in Words
2.3.19 Distribution of Letter S in Words
2.3.20 Distribution of Letter T in Words
2.3.21 Distribution of Letter U in Words
2.3.22 Distribution of Letter V in Words
2.3.23 Distribution of Letter W in Words
2.3.24 Distribution of Letter X in Words
2.3.25 Distribution of Letter Y in Words
2.3.26 Distribution of Letter Z in Words
2.3.27 Distribution of Letter Á in Words
2.3.28 Distribution of Letter Ð in Words
2.3.29 Distribution of Letter É in Words
2.3.30 Distribution of Letter Í in Words
2.3.31 Distribution of Letter Ó in Words
2.3.32 Distribution of Letter Ú in Words
2.3.33 Distribution of Letter Ý in Words
2.3.34 Distribution of Letter Þ in Words
2.3.35 Distribution of Letter Æ in Words
2.4.1 Distribution of Digit 1 in Words
2.4.2 Distribution of Digit 2 in Words
2.4.3 Distribution of Digit 3 in Words
2.4.4 Distribution of Digit 4 in Words
2.4.5 Distribution of Digit 5 in Words
2.4.6 Distribution of Digit 6 in Words
2.4.7 Distribution of Digit 7 in Words
2.4.8 Distribution of Digit 8 in Words
2.4.9 Distribution of Digit 9 in Words
2.4.10 Distribution of Digit 0 in Words
2.4.11 Distribution of Digit - in Words
2.5.1 Distribution of Letter A in Sentences
2.5.2 Distribution of Letter B in Sentences
2.5.3 Distribution of Letter C in Sentences
2.5.4 Distribution of Letter D in Sentences
2.5.5 Distribution of Letter E in Sentences
2.5.6 Distribution of Letter F in Sentences
2.5.7 Distribution of Letter G in Sentences
2.5.8 Distribution of Letter H in Sentences
2.5.9 Distribution of Letter I in Sentences
2.5.10 Distribution of Letter J in Sentences
2.5.11 Distribution of Letter K in Sentences
2.5.12 Distribution of Letter L in Sentences
2.5.13 Distribution of Letter M in Sentences
2.5.14 Distribution of Letter N in Sentences
2.5.15 Distribution of Letter O in Sentences
2.5.16 Distribution of Letter P in Sentences
2.5.17 Distribution of Letter Q in Sentences
2.5.18 Distribution of Letter R in Sentences
2.5.19 Distribution of Letter S in Sentences
2.5.20 Distribution of Letter T in Sentences
2.5.21 Distribution of Letter U in Sentences
2.5.22 Distribution of Letter V in Sentences
2.5.23 Distribution of Letter W in Sentences
2.5.24 Distribution of Letter X in Sentences
2.5.25 Distribution of Letter Y in Sentences
2.5.26 Distribution of Letter Z in Sentences
2.5.27 Distribution of Letter Á in Sentences
2.5.28 Distribution of Letter Ð in Sentences
2.5.29 Distribution of Letter É in Sentences
2.5.30 Distribution of Letter Í in Sentences
2.5.31 Distribution of Letter Ó in Sentences
2.5.32 Distribution of Letter Ú in Sentences
2.5.33 Distribution of Letter Ý in Sentences
2.5.34 Distribution of Letter Þ in Sentences
2.5.35 Distribution of Letter Æ in Sentences
2.6.1 Distribution of Digit 1 in Sentences
2.6.2 Distribution of Digit 2 in Sentences
2.6.3 Distribution of Digit 3 in Sentences
2.6.4 Distribution of Digit 4 in Sentences
2.6.5 Distribution of Digit 5 in Sentences
2.6.6 Distribution of Digit 6 in Sentences
2.6.7 Distribution of Digit 7 in Sentences
2.6.8 Distribution of Digit 8 in Sentences
2.6.9 Distribution of Digit 9 in Sentences
2.6.10 Distribution of Digit 0 in Sentences
2.6.11 Distribution of Commas in Sentences
2.6.12 Distribution of Semicolons in Sentences
2.6.13 Distribution of Colons in Sentences
3 Words and Multiwords
3.2.1 The Most Frequent 50 Words
3.2.3.1 Longest Words in Top-1,000 by rank
3.2.3.2 Longest Words in Top-10,000 by rank
3.2.3.3 Longest Words in Top-100,000 by rank
3.3.2.1 Frequency of numbers and special patterns I
3.3.2.2 Frequency of numbers and special patterns II
3.3.2.3 Frequency of numbers and special patterns III
3.3.2.4 Numbers in date format (1980-2029)
3.5.1.1 Words by Length without multiplicity
3.5.1.2 Words by Length with multiplicity
3.5.2 Average word length for different frequency ranges
3.5.6 Longest Words
3.6.1 Zipf's law (Standard version)
3.6.2 Zipf's law for words of fixed lengths
3.6.3 Zipf's law for words with same first letter
3.6.4 Zipf's law for words with same last letter
3.6.5 Zipf's law for numbers
3.8.1 Number of letter-N-grams at word beginnings
3.8.2 Number of letter-N-grams at word endings
3.8.4 Number of words of fixed length in different frequency ranges
3.9.1.1 Most Frequent Abbreviations
3.10.1 Text coverage by top words
3.10.2 Sentences containing the most frequent words
3.10.3 Highest ranked word in sentence
3.10.4 Repeat Rate - Words
3.10.5 Entropy
3.12.1 Words with Hyphens
3.12.2 Multiwords
3.12.3 (Multi-)Words with dots
3.12.4 Words containing special characters
3.12.5 Palindromes
3.12.6 Words with reverse word
3.12.7 Words containing many different characters
3.12.9 Problems with sentence segmentation - Words ending in a stopword
3.12.10 Left Neighbors of Full Stop
3.12.11 Left neighbors of the full stop
3.12.12 Left neighbors of the full stop with additional internal full stops
3.12.13 Compounds
3.13.1 Average Position of Words by Word Length
3.14.1 Growth of types
3.14.2 Type-Token Ratios
4 Sentences
4.1.1 Shortest sentences
4.1.2 Sentences of fixed length I
4.1.3 Sentences of fixed length II
4.1.4 Sentences of fixed length III
4.1.5 Longest sentences
4.2.1 Length of sentences in characters
4.2.2 Length of sentences in words
4.3.1.1 Most Frequent Sentence Beginnings I
4.3.1.2 Most Frequent Sentence Beginnings II
4.3.1.3 Most Frequent Sentence Beginnings III
4.3.1.4 Most Frequent Sentence Beginnings IV
4.3.1.5 Number of Word-N-grams at Sentence Beginnings
4.3.1.6 Sentences with the most frequent beginning
4.4.1.1 Most Frequent Sentence Endings I
4.4.1.2 Most Frequent Sentence Endings II
4.4.1.3 Most Frequent Sentence Endings III
4.4.1.4 Most Frequent Sentence Endings IV
4.4.1.5 Number of Word-N-grams at Sentence Endings
4.4.1.6 Sentences with the most frequent endings
4.4.2 Types of Sentences by Punctuation Mark
4.5.2.1 Maximum word rank in sentence
4.5.2.2 Average word rank in sentence
4.5.2.3 Sentences consisting of many low-frequency words I
4.5.2.4 Sentences consisting of many low-frequency words II
4.5.2.5 Sentences consisting of short words only I
4.5.2.6 Sentences consisting of short words only II
4.5.2.7 Sentences consisting of long words only I
4.5.2.8 Sentences with high average word length
4.5.2.9 Sentences without high-frequency words
4.7.1.1 Most Frequent Sentence Signatures
4.7.1.2 Sentences with Most Frequent Sentence Signatures
4.7.3.1 Most Frequent Hash Values For Sentences
4.7.3.2 Sentences with Most Frequent Hash Values
4.8.2 Distribution of words with frequency one
4.8.3 Sentences with foreign stopwords
4.8.4 Sentences with Internationalisms or Proper Names
4.9.1 Sentences with repeated stopword
4.9.2 Sentences with repeated Non-stopword
5 Co-occurrences
5.1.1 Next neighbor co-occurrence summary
5.1.2 Most significant next neighbor co-occurrences
5.1.3 Number of words without NN co-occurrences
5.1.4 Significance and frequency for NN co-occurrences
5.1.6 Language Fingerprint
5.1.7.1 Number of NN co-occurrences vs. Frequency I
5.1.7.2 Number of NN co-occurrences vs. Frequency II
5.1.7.3 Number of left vs. right NN co-occurrences
5.1.9.1 Skewness in NN co-occurrences I
5.1.9.2 Skewness in NN co-occurrences II
5.1.9.3 Skewness in NN co-occurrences III
5.1.9.4 Skewness in NN co-occurrences IV
5.1.9.5 Skewness in NN co-occurrences V
5.1.9.6 Skewness in NN co-occurrences VI
5.1.11 Zipf's law for NN co-occurrences
5.1.12 Number of NN-co-occurrences depending on frequency classes
5.1.13 Number of NN-co-occurrences depending on frequency classes and upper or lower case I
5.1.14 Number of NN-co-occurrences depending on frequency classes and upper or lower case II
5.1.15 Number of NN-co-occurrences depending on frequency classes and upper or lower case III
5.1.16 Number of NN-co-occurrences depending on frequency classes and upper or lower case IV
5.1.17 Number of NN co-occurrences depending on word length
5.1.18 Words nearly always as next neighbors
5.1.19 Identical Words as NN co-occurrences
5.1.20 Number of NN co-occurrences depending on rank
5.2.1 Sentence based co-occurrence summary
5.2.2 Most significant sentence based co-occurrences
5.2.3 Number of words without sentence co-occurrences
5.2.4 Significance and frequency for sentence co-occurrences
5.2.5 Number of sentence co-occurrences vs. Frequency
5.2.9 Zipf's law for Sentence co-occurrences
5.2.10 Number of Sentence co-occurrences depending on frequency classes
5.2.11 Number of Sentence co-occurrences depending on word length
5.2.13 Number of Sentence co-occurrences depending on frequency classes and upper or lower case I
5.2.14 Number of Sentence co-occurrences depending on frequency classes and upper or lower case II
5.2.15 Number of Sentence co-occurrences depending on frequency classes and upper or lower case III
5.2.18 Words nearly always together in sentences
5.3.1 Quotient of Sentence and NN Co-occurrences
6 Sources
6.1.1 Number of sources by time period
6.2.1 Size of Sources
6.2.2 Size of largest domains
6.2.3 Size of different TLDs
6.4.1.1 Sentence length for different sources
6.4.2.1 Word length for different sources
6.4.2.2 Average logarithmic word rank for different sources
6.4.2.6 Sources consisting of many or few words with frequency 1
6.4.2.8 Sources with low or high average word length of rare words