Text Machine

Powerful text tools, in your browser

Frequency Analysis

Paste any text to see how often each letter appears, compared side by side with the frequencies of written English. Read the bigram and trigram counts, check the Index of Coincidence to tell a monoalphabetic cipher from a polyalphabetic one, and export the table. Everything runs in your browser.

How to use Frequency Analysis

  1. Paste your text

    Copy the text or ciphertext you want to study and paste it into the box. Letters are counted without regard to case, and spaces, numbers and punctuation are ignored.

  2. Read the summary

    Check the character and letter counts, how many distinct letters appear, the most frequent letter, and the Index of Coincidence, which hints at whether one alphabet or several were used.

  3. Study the letter-frequency chart

    Compare each letter's bar against its English marker. Switch to 'By frequency' to rank the letters and see the overall shape: lumpy for a monoalphabetic substitution cipher, flat for a polyalphabetic one.

  4. Scan the bigrams and trigrams

    Look at the most common pairs and triples. In a cipher, the top trigram is often a disguised THE, which hands you three letters at once.

  5. Export or share

    Download the frequency table as a CSV for your notes or spreadsheet, or copy a shareable link that reopens the tool with your exact text. Everything stays in your browser.

Letter frequency analysis, explained

What is frequency analysis?

Frequency analysis is the study of how often each letter, pair of letters, or triple of letters appears in a piece of text. Because the letters of a language are not used equally — E and T are everywhere in English while Q and Z are rare — the pattern of frequencies acts like a fingerprint. Counting that pattern is the oldest and most powerful technique in classical cryptanalysis, first written down by the Arab scholar al-Kindi in the ninth century.

This tool counts the letters in whatever you paste, shows each one as a bar next to the expected English frequency, lists the most common bigrams and trigrams, and reports the Index of Coincidence. Together these numbers tell you whether the text is ordinary writing, a simple substitution cipher, or something that uses several alphabets at once — without you having to count a single letter by hand.

Reading the letter-frequency chart

Each row is one letter of the alphabet. The filled bar shows how often that letter appears in your text as a percentage of all the letters, and the thin vertical marker shows the frequency of the same letter in typical English. When a bar reaches well past its marker, that letter is over-represented; when it falls short, the letter is rarer than usual. Switch the sort order to rank the letters from most to least frequent, which makes the shape of the distribution obvious at a glance.
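The bar heights described above can be reproduced with a few lines of code. The following is a minimal sketch, not the tool's actual implementation: the function names `letter_percentages` and `rank_by_frequency` are hypothetical, and it assumes only the plain letters A to Z are counted, with case folded together.

```python
from collections import Counter
import string


def letter_percentages(text: str) -> dict[str, float]:
    """Return each A-Z letter's share of all letters, as a percentage."""
    # Keep only ASCII letters; spaces, digits and punctuation are ignored.
    letters = [c.upper() for c in text if c in string.ascii_letters]
    total = len(letters)
    if total == 0:
        return {letter: 0.0 for letter in string.ascii_uppercase}
    counts = Counter(letters)
    return {
        letter: 100.0 * counts[letter] / total
        for letter in string.ascii_uppercase
    }


def rank_by_frequency(text: str) -> list[tuple[str, float]]:
    """Sort letters from most to least frequent (ties broken alphabetically)."""
    percentages = letter_percentages(text)
    return sorted(percentages.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for letter, pct in rank_by_frequency("Hello, World! 123")[:3]:
        print(f"{letter}: {pct:.1f}%")
```

Sorting the result corresponds to the 'By frequency' view, while iterating the dictionary in A to Z order corresponds to the default chart.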

In normal English the tallest bars are E, T, A, O, I and N, and the chart looks lumpy and uneven. A monoalphabetic cipher keeps that lumpy shape but moves the peaks to different letters, because each letter is simply swapped for another. A polyalphabetic cipher flattens the chart toward bars of roughly equal height, because the same plaintext letter is enciphered differently depending on its position. Recognising those two shapes is the single most useful skill in breaking classical ciphers.

The Index of Coincidence

The Index of Coincidence, or IoC, measures the probability that two letters drawn at random from the text are identical. Ordinary English sits around 0.067 because its frequencies are so uneven, while completely random text, where every letter is equally likely, approaches 1/26, or about 0.038. A single number therefore captures how lumpy or flat the distribution is.
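As a rough illustration, the probability described above can be computed from the letter counts alone: for each letter with count n, add n × (n − 1), then divide the total by N × (N − 1), where N is the number of letters. The sketch below uses this standard "draw two letters without replacement" form. Whether the tool uses exactly this variant is an assumption, and the function name `index_of_coincidence` is hypothetical.

```python
from collections import Counter
import string


def index_of_coincidence(text: str) -> float:
    """Probability that two letters drawn without replacement are identical."""
    # Fold case and drop everything that is not an ASCII letter.
    letters = [c.upper() for c in text if c in string.ascii_letters]
    n = len(letters)
    if n < 2:
        # Two letters cannot be drawn, so report 0.0 by convention.
        return 0.0
    counts = Counter(letters)
    matching_pairs = sum(k * (k - 1) for k in counts.values())
    return matching_pairs / (n * (n - 1))


if __name__ == "__main__":
    sample = "Frequency analysis is the study of how often each letter appears"
    print(round(index_of_coincidence(sample), 4))
```

On short samples the value fluctuates considerably, so it is best read alongside the chart and not treated as a verdict on its own.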

This makes the IoC the quickest test for telling cipher families apart. Caesar, Atbash and keyword substitution ciphers only relabel letters, so the uneven English profile survives and the IoC stays high, near the plain-English value of 0.067. Vigenère and other polyalphabetic ciphers blend several alphabets, flattening the frequencies and dragging the IoC down toward 0.04, close to the random-text value. The tool prints the value with a short hint, so a high reading points you at a monoalphabetic substitution cipher and a low one points you at a polyalphabetic cipher.

Bigrams, trigrams and contact patterns

Single letters are only the start. English also has strongly preferred letter pairs and triples: TH, HE, IN, ER and AN are the commonest bigrams, and THE, AND, ING and ENT dominate the trigrams. The tool lists the most frequent pairs and triples in your text, counting them only inside words so that a space never joins two unrelated letters into a false pair.
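Counting pairs and triples only inside words might look like the sketch below. It assumes that any non-letter character (space, digit or punctuation) ends a word, which may differ from the tool's exact rule, and the function name `ngram_counts` is hypothetical.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int) -> Counter:
    """Count n-letter sequences, never letting one span a word boundary."""
    counts: Counter = Counter()
    # Each run of ASCII letters is treated as one word (an assumption).
    for word in re.findall(r"[A-Za-z]+", text):
        word = word.upper()
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


if __name__ == "__main__":
    text = "the then he"
    print(ngram_counts(text, 2).most_common(3))
    print(ngram_counts(text, 3).most_common(3))
```

Because each word is scanned separately, the E ending "the" and the T starting "then" are never joined into a false ET pair.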

These contact patterns are invaluable when a simple letter count is not enough. In a substitution cipher the disguised version of THE often shows up as the most common trigram, giving you three letters at once. Repeated sequences, especially trigrams and longer, can betray the length of a Vigenère key through the Kasiski method, because the gaps between repeats tend to be multiples of the key length. Even the absence of doubled letters, or a suspicious run of rare pairs, is a clue about which cipher you are facing.
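The Kasiski method mentioned above can be sketched as follows: record where each repeated sequence occurs, measure the distances between consecutive occurrences, and tally which candidate key lengths divide those distances. This is an illustrative sketch, not something the tool is known to do. The function name `kasiski_factor_counts`, the default sequence length of 3, and the `max_key` limit of 20 are all assumptions.

```python
from collections import Counter, defaultdict
import string


def kasiski_factor_counts(ciphertext: str, seq_len: int = 3,
                          max_key: int = 20) -> Counter:
    """Tally candidate key lengths that divide gaps between repeated sequences."""
    letters = "".join(c.upper() for c in ciphertext if c in string.ascii_letters)

    # Map every sequence of seq_len letters to the positions where it starts.
    positions = defaultdict(list)
    for i in range(len(letters) - seq_len + 1):
        positions[letters[i:i + seq_len]].append(i)

    factors: Counter = Counter()
    for starts in positions.values():
        # Only sequences that occur more than once yield any distances.
        for first, second in zip(starts, starts[1:]):
            distance = second - first
            for key_length in range(2, max_key + 1):
                if distance % key_length == 0:
                    factors[key_length] += 1
    return factors


if __name__ == "__main__":
    # "ABC" repeats at a distance of 6, so 2, 3 and 6 each get one vote.
    print(kasiski_factor_counts("ABCDEFABCGHI").most_common())
```

On real ciphertext, the true key length and its divisors tend to collect the most votes, while chance repeats add scattered noise.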

Breaking ciphers with frequency analysis

To attack a monoalphabetic substitution cipher, sort the chart by frequency and line it up against English. The most common cipher letter is probably E, the next probably T, and the top trigram is probably THE. Pencil in those guesses, then use the bigram and trigram lists to extend them — once you know E and T, the pair TH and the word THE fall into place quickly, and the rest of the message unravels from there.

For a Caesar cipher the same logic is even simpler, because every letter moves by the same amount: find the shift that lines the cipher's peak up with English's E and you have the key. For a Vigenère cipher, frequency analysis still works, but only after you split the text into columns by the key length, since each column is then a separate Caesar cipher you can solve on its own. Knowing the Index of Coincidence first tells you whether this column trick is even necessary.
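The two ideas in the paragraph above can be sketched in code: guess a Caesar shift by assuming the most frequent cipher letter stands for E, and split a Vigenère ciphertext into one column per key position. The function names here are hypothetical, and the peak-equals-E guess is a heuristic that can fail on short or unusual text.

```python
from collections import Counter
import string


def guess_caesar_shift(ciphertext: str) -> int:
    """Guess the shift by assuming the most frequent letter is an enciphered E."""
    letters = [c.upper() for c in ciphertext if c in string.ascii_letters]
    if not letters:
        return 0
    peak = Counter(letters).most_common(1)[0][0]
    return (ord(peak) - ord("E")) % 26


def caesar_decrypt(ciphertext: str, shift: int) -> str:
    """Move every letter back by `shift`, leaving other characters untouched."""
    result = []
    for c in ciphertext:
        if c in string.ascii_uppercase:
            result.append(chr((ord(c) - ord("A") - shift) % 26 + ord("A")))
        elif c in string.ascii_lowercase:
            result.append(chr((ord(c) - ord("a") - shift) % 26 + ord("a")))
        else:
            result.append(c)
    return "".join(result)


def split_columns(ciphertext: str, key_length: int) -> list[str]:
    """Group letters by position modulo the key length (one column per key letter)."""
    letters = "".join(c.upper() for c in ciphertext if c in string.ascii_letters)
    return [letters[i::key_length] for i in range(key_length)]


if __name__ == "__main__":
    secret = caesar_decrypt("MEET ME HERE BEFORE THE EVENING ENDS", -3)  # encrypt
    shift = guess_caesar_shift(secret)
    print(shift, caesar_decrypt(secret, shift))
```

For a Vigenère ciphertext with a known or guessed key length, each string returned by `split_columns` can be passed to `guess_caesar_shift` on its own.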

Monoalphabetic versus polyalphabetic at a glance

If you remember only one thing, make it this. A high Index of Coincidence and a lumpy chart with clear tall bars mean a monoalphabetic cipher, where each letter maps to exactly one other letter — Caesar, Atbash, affine, or a keyword substitution. These yield to frequency analysis directly, because the statistics of the plaintext shine straight through.

A low Index of Coincidence and a flat chart where every bar is about the same height mean a polyalphabetic cipher, where one plaintext letter can become many different cipher letters — Vigenère, Beaufort, Gronsfeld or Porta. These hide the raw letter frequencies, so you must first recover the key length and then analyse each position separately. The chart and the IoC tell you which of these two worlds you are in before you spend any effort.
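One common way to recover the key length, offered here as an assumption and not as something the tool is known to do, is to try each candidate length, split the text into that many columns, and average the Index of Coincidence of the columns. At the right length each column is a single-alphabet cipher, so the average climbs back toward the English value. The function names below are hypothetical.

```python
from collections import Counter
import string


def index_of_coincidence(letters: str) -> float:
    """IoC of a string that already contains only uppercase letters."""
    n = len(letters)
    if n < 2:
        return 0.0
    counts = Counter(letters)
    return sum(k * (k - 1) for k in counts.values()) / (n * (n - 1))


def average_column_ioc(ciphertext: str, key_length: int) -> float:
    """Average IoC over the columns obtained for one candidate key length."""
    letters = "".join(c.upper() for c in ciphertext if c in string.ascii_letters)
    columns = [letters[i::key_length] for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length


def rank_key_lengths(ciphertext: str,
                     max_length: int = 12) -> list[tuple[int, float]]:
    """Return (key_length, average IoC) pairs, highest average first."""
    scores = [
        (length, average_column_ioc(ciphertext, length))
        for length in range(1, max_length + 1)
    ]
    return sorted(scores, key=lambda item: -item[1])


if __name__ == "__main__":
    # A toy text with period 2: both columns are constant, so length 2 wins.
    print(rank_key_lengths("ABABABAB", max_length=3))
```

Multiples of the true key length usually score well too, so the shortest length among the high scorers is generally the one to try first.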

Limits and good practice

Frequency analysis is statistical, so it needs enough text to be trustworthy. A short message of a dozen letters can show wildly misleading frequencies simply by chance, while a full paragraph settles close to the expected pattern. When a sample looks ambiguous, the usual cause is that it is too short rather than that the method has failed.

Keep in mind that the English baseline shown here is for ordinary prose. Specialised text — a list of names, a chunk of source code, or writing in another language — has its own profile and will not match. The tool ignores spaces, digits and punctuation and folds upper and lower case together, which is exactly what you want for classical ciphers, but it means it analyses letters only, not the structure of an encoding like Base64 or Morse. For those, identify the encoding first and decode it, then run frequency analysis on the letters underneath.

Frequently asked questions

What is frequency analysis?
Frequency analysis counts how often each letter, pair and triple appears in a text. Because languages use letters unevenly — E and T are common in English, Q and Z rare — that pattern acts as a fingerprint. It is the oldest technique in cryptanalysis and the main way classical substitution ciphers are broken.
How do I use frequency analysis to break a cipher?
Sort the chart by frequency and match it against English: the most common cipher letter is probably E, the next T, and the top trigram is probably THE. Pencil in those guesses, then extend them with the bigram and trigram lists until the message reads. For a Caesar cipher, just find the shift that lines the peak up with E.
What is the Index of Coincidence?
The Index of Coincidence measures the chance that two letters picked at random from the text are the same. English is about 0.067 and random text about 0.038. Monoalphabetic ciphers keep the value high, near the English figure, while polyalphabetic ciphers like Vigenère flatten it toward 0.04, which is the quickest way to tell the two families apart.
What is the difference between monoalphabetic and polyalphabetic?
In a monoalphabetic cipher each letter always maps to the same other letter, so the lumpy English frequency profile survives and a high Index of Coincidence and uneven chart give it away. A polyalphabetic cipher uses several alphabets, so one plaintext letter becomes many cipher letters, flattening the chart and lowering the Index of Coincidence.
Why does the tool show bigrams and trigrams?
Single letters are not always enough. English has strongly preferred pairs and triples like TH, HE, THE and ING. In a substitution cipher the disguised THE is often the commonest trigram, giving you three letters at once, and repeated letter sequences can reveal a Vigenère key length through the Kasiski method.
What do the bars and the vertical marker mean?
The filled bar is how often a letter appears in your text, as a percentage of all letters. The thin vertical marker on the same row is that letter's frequency in typical English. A bar that overshoots its marker is over-represented; one that falls short is rarer than usual. The comparison shows at a glance how your text differs from plain English.
How much text do I need for reliable results?
Frequency analysis is statistical, so longer is better. A dozen letters can show misleading frequencies by pure chance, while a full paragraph settles close to the expected pattern. If a sample looks ambiguous, it is usually too short rather than the method failing. Aim for at least a sentence or two.
Does it work for languages other than English?
It counts the letters A to Z and compares them with the English baseline, so the counts are correct for any text but the comparison only makes sense for English prose. Other languages have their own frequency profiles, so the bars will not line up with the markers, though the raw counts, bigrams and Index of Coincidence are still useful.
Can I analyze Base64, Morse or binary?
This tool studies letter frequencies, so it works best on alphabetic text and ciphers. Encodings like Base64, Morse or binary represent text as symbols or numbers rather than letters, so you should identify and decode them first, then run frequency analysis on the letters underneath. The Cipher Identifier can tell you which encoding you have.
Is my text uploaded to a server?
No. All counting happens entirely in your browser, so your text is never uploaded, logged or stored. Even a share link keeps your text in the part of the URL after the hash, which browsers never send to a server, so it stays private until you choose to share it.
Can I export the frequency table?
Yes. The Export CSV button downloads the full A-to-Z table with each letter's count, its percentage in your text, and the English baseline percentage, ready to open in a spreadsheet or paste into your notes. You can also copy a shareable link that reopens the tool with the same text.