Cheating at Word Games
 6 minutes read  1125 words  Page SourceThe other day, I saw the word game Wordle going around on my Twitter feed. The game prompts you to guess a 5letter word in a Mastermindlike style  every letter in your guess is reported as being correct, as present (i.e. that letter occurs somewhere in the answer, but is misplaced), or as absent.
As is my way, I immediately started thinking about how I could use this as a prompt for a small tech project. I considered making a bot to play the game (using Greasemonkey and KeyboardEvents), but the interface aspect is less interesting than the underlying strategy. What’s the best way to play the game?
Strategy
There is a limited universe of “answer candidates”^{1}. Every time you guess, the set of answercandidates is divided into (3^5=)243 different partitions^{2}, based on the 3 possible states each of the 5 letters in your guess might be in:
 All answers for which all letters in your guess would be entirely incorrect
 All answers for which the first letter of your guess is presentbutincorrectlyplaced, and all other answers in your guess are incorrect
 All answers for which the first letter of your guess is correct, and all other letters are incorrect
 All answers for which the first letter of your guess is correct, the second is presentbutincorrectly placed, and all others are incorrect
 …and so on…
This might be easier to understand with the following diagram:
(Examples generated with this script)
The game’s response to your guess indicates which partition the answer lies in. Assuming that you haven’t guessed the word, you can then repeat the process with a different guess  your guess will tell you which “subpartition” within that partition the word lies in, and so on. With each guess, you are gaining information that is used to reduce the set of possibilities. The strategy, then, is to determine which words you should pick to maximize the information gained at each step^{3}.
Information is maximized when the size of the indicated partition is minimized^{4}  the smaller the indicated partition, the less uncertainty exists about which candidate is the actual answer. However, since we don’t know in advance which partition will be indicated (because we don’t know the answer), we instead have to minimize the expected partition size for all possible answers, rather than minimizing the size of the indicated partition (being able to do the latter would be equivalent to knowing the answer in advance!). For a given partioning strategy, we can calculate the expected partition size with the following (where “part(x)” is “the partition that contains (candidate answer) x”, $ \mathbb{P} $ is the set of partitions, and $ \mathbb{C} $ is the full set of candidate answers)^{5}:
$$ \begin{aligned} E(size\ of\ partition) &= \sum_{c \in \mathbb{C}} \mathcal{P}(c\ is\ correct) *  part(c) \newline &= \sum_{p \in \mathbb{P}} \sum_{c \in p} \mathcal{P}(c\ is\ correct) * part(c) \newline &= \sum_{p \in \mathbb{P}} \sum_{c \in p} \mathcal{P}(c\ is\ correct) * p\newline &= \sum_{p \in \mathbb{P}} p \sum_{c \in p} \mathcal{P}(c\ is\ correct)\newline &= \sum_{p \in \mathbb{P}} p \sum_{c \in p} \frac {1} {\mathbb{C}}\newline &= \sum_{p \in \mathbb{P}} p * \frac {p} {\mathbb{C}}\newline &= \frac {1} {\mathbb{C}} \sum_{p \in \mathbb{P}} p^2 \end{aligned} $$
Therefore, we can maximize the information gained at each guess by picking a guess that minimizes the sumofsquares of the sizes of all resultant partitions. This strategy should then be repeatable to narrow down to the actual answer, by using the identified partition as the next “universe” of guessable words.
Conclusion
With this code, I figured out the best and worst words to start with (lower numbers are better):
Computing: [#############################################] 10657/10657
Best guesses:
(0.023954023202981775, 'roate')
(0.02416039632596131, 'soare')
(0.02440110277138952, 'salet')
(0.025732265392850645, 'orate')
(0.026118142082110753, 'earst')
Worst guesses:
(0.4011596826033615, 'immix')
(0.3858660533939142, 'jugum')
(0.38409863366438246, 'qajaq')
(0.375986639859308, 'gyppy')
(0.3756119588186725, 'fuffy')
Intuitively, these seem reasonable. The letters R, O, A, T, E, and S are all pretty common, so it makes sense to me that starting with words that use these letters would be optimal (particularly since, then, you’re more likely to get a “correct” letter, which is more informative than a “present” one). The worst guesses not only use rarer letters (J, X, Q, Y), but also use duplicated letters which give less information.
That said, I would love to do some testing on this strategy, by setting up the iterated strategy, having it “play” the game, and recording how may attempts are required to win. It would be particularly cool to see if there are cases where the second guess of this iterated strategy is known to be wrong (from the results of the first guess) because that increases the amount of information gained. If we know that the word ends in “E”, there’s more information gained by guessing a word that doesn’t end in E  but humans (probably?) intuitively try to “keep the known letters”. Maybe a followup post!
In particular, I’ve been informed (while writing this post) that a friendofafriend had already figured out the optimal startwords, and they’re similartobutdifferentfrom mine  so, there’s at least one interesting alternative perspective out there! (EDIT: 20211229) I implemented an alternative approach  finding pairs of 5letter words that together comprise the 10 most common letters in the corpus  which I’ve been told is pretty similar to Bruno’s (I intentionally waited until after I’d got two implementations before I spoiled myself by reading those tweets!), though we somehow got different answers. I’m looking forward to learning what I did wrong!

We can figure out exactly what this is by taking a quick peek at the code  or, we could infer that it exists by noting that there are a largebutfinite number of ways of arranging 26 letters in 5 positions (a little less than 12 million), or a smallerbutstillverylarge number of actual fiveletter words. The actual list of potential answers for Wordle includes 2,315 words, and there are 10,657 words that you are allowed to guess (that is  there are 8,342 words that you’re allowed to guess for information, but that cannot possibly be the answer). ↩︎

Thanks to @fuseboy for pointing out my error of arithmetic here! ↩︎

This is a naïve implementation: there might be a better strategy which is not locallymaximal, but which generates more information by coordination between the steps. For instance, there might be a choice at Step 1 that results in largerthanoptimal partitions, but where each partition is then more amenable to subdivision in Step 2. My intuition is that this isn’t the case for this problem, but I’m open to disagreement! ↩︎

It’s been a while since I’ve studied Information Theory, so I might be misusing some terms. Please do feel free to correct me if so! ↩︎

LaTeX formatting added to this blog courtesy of this guide. ↩︎