1 | General Corpus Information
1.1 | Summary
2 | Characters and Character N-Grams
2.1.4 | Special Characters
2.1.6 | Amount of special characters
2.1.7 | Repeat Rate - Characters
2.2.3 | Word prefixes
2.2.4 | Word suffixes
2.2.5 | Most frequent word beginnings
2.2.6 | Most frequent word endings
2.2.10 | Prefixes of Length 3 (between NN Co-occurrences)
2.2.11 | Repetitions
2.2.12 | Typical Prefixes and Suffixes
3 | Words and Multiwords
3.2.1 | The Most Frequent 50 Words
3.2.3.1 | Longest Words in Top 1,000 by rank
3.2.3.2 | Longest Words in Top 10,000 by rank
3.2.3.3 | Longest Words in Top 100,000 by rank
3.3.2.1 | Frequency of numbers and special patterns I
3.3.2.2 | Frequency of numbers and special patterns II
3.3.2.3 | Frequency of numbers and special patterns III
3.3.2.4 | Numbers in date format (1980-2029)
3.5.1.1 | Words by Length without multiplicity
3.5.1.2 | Words by Length with multiplicity
3.5.2 | Average word length for different frequency ranges
3.5.6 | Longest Words
3.6.1 | Zipf's law (Standard version)
3.6.2 | Zipf's law for words of fixed lengths
3.6.3 | Zipf's law for words with same first letter
3.6.4 | Zipf's law for words with same last letter
3.6.5 | Zipf's law for numbers
3.7.1 | String similarity graph for words
3.7.2 | String similar words of similar frequency
3.7.3 | Distribution of the string similarity for different rank ranges
3.7.4 | Node degree vs. word length
3.7.5 | Levenshtein similarity examples
3.8.1 | Number of letter-N-grams at word beginnings
3.8.2 | Number of letter-N-grams at word endings
3.8.4 | Number of words of fixed length in different frequency ranges
3.9.1.1 | Most Frequent Abbreviations
3.10.1 | Text coverage by top words
3.12.1 | Words with Hyphens
3.12.2 | Multiwords
3.12.3 | (Multi-)Words with dots
3.12.4 | Words containing special characters
3.12.5 | Palindromes
3.12.6 | Words with reverse word
3.12.7 | Words containing many different characters
3.12.8 | Words both in lower and upper case
3.12.9 | Problems with sentence segmentation - Words ending in a stopword
3.12.11 | Left neighbors of the full stop
3.12.12 | Left neighbors of the full stop with additional internal full stops
3.12.13 | Compounds
3.13.1 | Average Position of Words by Word Length
3.14.1 | Growth of types
3.14.2 | Type-Token Ratios
4 | Sentences
4.1.1 | Shortest sentences
4.1.2 | Sentences of fixed length I
4.1.3 | Sentences of fixed length II
4.1.4 | Sentences of fixed length III
4.1.5 | Longest sentences
4.2.1 | Length of sentences in characters
5 | Co-occurrences
5.1.1 | Next neighbor co-occurrence summary
6 | Sources
6.1.1 | Number of sources by time period
6.2.1 | Size of Sources
6.2.2 | Size of largest domains
6.2.3 | Size of different TLDs
6.4.2.1 | Word length for different sources
6.4.2.2 | Average logarithmic word rank for different sources
6.4.2.8 | Sources with low or high average word length of rare words