```I have some things to say about the word game "Wordle" that became
popular in 2022, along with related games that are based on (roughly) the
same dictionary (Quordle, Octordle, etc.). I do like to play them,
but as a mathematician I wanted to do a thorough analysis of some
questions that arose as I played.

What I present here is (mostly) a discussion of the "good" sets
of words drawn from the original 2315-word Wordle vocabulary list.
A set of words is "good" if, when used as a starting set of entries in the
games, it enables a human to guess all the words in a small number of turns.
The starting set must help win all the "subgames" in a compound game; the
player then switches to "hard mode" in each of the subgames one at a time,
guessing only words that are consistent with the clues (the colored tiles that result).

I wish to find and to compare those initial lists of starting words.
I will use both exhaustive searches and clever optimization techniques
to find "discriminating" sets of words: sets for which each resulting
array of green/yellow/gray tiles matches relatively few vocabulary words.

If I accomplish nothing else with these sets of words, at least I will
have generated some great passwords!

I am hardly the first person to apply a thorough mathematical analysis
to Wordle. Some open forums with information include Stack Exchange and
Reddit. Laurent Poirrier has collected information about "optimal"
algorithms for playing the games. When applied to Wordle itself, those
alternatives are more "efficient" than anything I propose here, at
least in the sense that the average number of guesses will be higher
when playing the way(s) I propose. But beginning with a fixed,
good starting list is both simpler for a human player (many fewer
branching rules required) and better suited to the compound games,
and those are the criteria of interest to me.

If I ever want to waste more time on this project I really should
rewrite from scratch the software routines I used, rerun everything
efficiently and to completion, and summarize the findings in a clear,
complete, and concise way. Yeah, sure.

Please let me know of corrections or additions to this document.
-- dave
(rusin@math.utexas.edu)

Index of sections:
Some caveats
Introduction: what are we doing in this document?
The best starting sets of six (and more!), and why these are interesting
Best starting quintuples and waltzing nymphs
Best starting quadruples: everyone can win at Wordle
Best starting triples (by various measures)
Best (and possibly best) starting pairs
The best single word to start with
Concluding remarks

==============================================================================
Some initial caveats first:

1. All comments here are about playing Wordle in "easy mode".
(I don't even know what "hard mode" would mean for a compound game.)

2. All my analyses are built upon the word lists in the version of Wordle that
was a simple web page in February 2022 (before purchase by the New York Times).
In particular, (almost) all uses of the word "word" here mean "one of the original 2315
possible answers to a Wordle puzzle". (I do make a few comments below that refer
to the larger, 12972-word, list of acceptable inputs to Wordle, but I have made
little effort to update them in response to NYT's enlargement of that set in Summer 2022.)
I have gathered together a long list of comments about the word list(s)
that I recommend to a person who actually wants to play the games well.
(It is important to know what words are, or are not, potential Wordle answers!)
Some of the compound games use slightly different word lists; these are
discussed only briefly.

3. In original Wordle, the daily hidden words were presented in a
particular (random-looking) order; since November 2022 they are
chosen by a "curator" at the Times. Our model of the games assumes
instead that the words in the word list are chosen at random, with
uniform probability, to be hidden each time we play. (This appears
to be the mode of play in the compound games, at least in "practice
mode", although as far as I can determine the multiple hidden words
are chosen to be distinct.)

4. In some cases I am stating claims of optimality or completeness.
The proofs I give are mostly just sketches that can be fleshed out by
the reader if interested.  The only parts that do not amount to a
simple case-by-case computer check have to do with the computation of
covering sets (which I did with the linear-programming/optimization
software Gurobi). I have written up a brief introduction to that
technique, available here. The key ideas are (a) to make precise the notions of
"nearness" or "similarity" in the word list, (b) to identify sets of
"most-similar" words that will be problematic late in the game, (c) to
compute for each such set the collection of words to play that will
help avoid these problematic sets, then (d) to find a "cover" for
these collections --- a set of words that intersects all or most of
these collections.
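Step (d) is a hitting-set problem: choose as few words as possible so that every problematic collection contains at least one of them. Solved exactly, it is an integer program of the kind Gurobi handles (minimize the number of chosen words, subject to each collection containing at least one chosen word). The Python sketch below swaps in the classic greedy heuristic instead, which is simpler but not guaranteed to find the smallest cover; the function name and the toy collections are hypothetical, made up for illustration.

```python
def greedy_hitting_set(collections):
    """Pick words until every collection contains at least one picked word.

    `collections` is a list of sets; each set holds the words that would help
    resolve one problematic cluster. Empty sets can never be hit and are ignored.
    Greedy rule: repeatedly take the word lying in the most still-unhit sets
    (ties broken alphabetically). This approximates, but need not equal, the
    optimal cover an integer-programming solver would return.
    """
    unhit = [set(c) for c in collections if c]
    chosen = []
    while unhit:
        candidates = sorted(set().union(*unhit))
        best = max(candidates, key=lambda w: sum(w in c for c in unhit))
        chosen.append(best)
        unhit = [c for c in unhit if best not in c]
    return chosen


if __name__ == "__main__":
    # Toy data: four problematic clusters and the (placeholder) words resolving each.
    toy = [{"w1", "w2"}, {"w2", "w3"}, {"w3", "w4"}, {"w4"}]
    print(greedy_hitting_set(toy))
```

On this toy input the greedy pass happens to find a smallest possible cover (two words); on larger instances it can overshoot, which is where an exact solver earns its keep.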

==============================================================================

INTRODUCTION: HOW DO PEOPLE PLAY WORDLE?

Ask a frequent Wordler their strategy and you'll get a variety of
answers.  "I just pick a random word to start with and run with it";
"I start with ADIEU to get a lot of vowels"; "I read somewhere that
it's best to start with CRANE + SPILT". To me, these sound like only
Phase 1 of a strategy: all these answers specify a certain number "a"
of fixed starting words (a=0, a=1, and a=2 respectively). (I should
note parenthetically that the second answer came from someone who,
unlike me, is willing to guess a word that's not a Wordle answer-word,
and the third answer came from someone who, like most of us, is
playing Wordle's "easy mode".)

This Phase 1 is about gathering information about what the hidden word
might be.  And it's important for people who (like me) play the compound
games, because the hope is that these first "a" moves will simultaneously
reveal a lot of information about the multiple Wordle subgames.

But then comes Phase 2: how do we use the information gained? At some
point, people start to enter guesses of what the hidden word might be;
unlike using SPILT after CRANE, most people at some point in the game
start to enter only words that might actually be the answer word
(i.e., they unconsciously switch to Wordle's "hard mode"). They'll
spend some number "b" of guesses in this mode trying to guess the
right word. They will try to have a+b no larger than 6, to win the game.

A person playing Wordle does not need to think of the distinction
between Phase 1 and Phase 2. But it does become more important when
playing the compound games like Quordle (with N=4 subgames) in which
we are in essence playing Phase 1 just once for all N of the subgames
and then carrying out Phase 2 separately for each of them.  Thus the
total number of moves would be not a+b but rather a + N b . As N
increases, it becomes more and more important to keep b small, even if
it means  a  has to be a bit larger. In other words, we want to find
sets of words for Phase 1 that are really good at determining what the
hidden word(s) might be, so that we spend very little time after that
actually pondering the clues left after Phase 1. Considering the
optimal solutions we find in this document, the expected number of
steps to solve an N-fold compound game need not be higher than
the lowest of these:
6 + N
5 + 1.0058 N
4 + 1.0298 N
3 + 1.1682 N
2 + 1.6590 N
These are merely upper bounds, but the pattern is clear: for
sufficiently large values of N it can be more efficient to use a
larger starting set.
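The crossover points hidden in this list come from solving a + c_a N = a' + c_a' N for N. A small Python sketch using only the five bounds quoted above (the dictionary and function names are illustrative):

```python
# Upper bounds quoted above: an N-fold game started with a fixed set of
# a words needs at most a + c_a * N moves on average.
BOUNDS = {6: 1.0, 5: 1.0058, 4: 1.0298, 3: 1.1682, 2: 1.6590}


def best_starting_size(n):
    """Return (a, bound) for the starting-set size with the lowest bound at N = n."""
    return min(((a, a + c * n) for a, c in BOUNDS.items()), key=lambda pair: pair[1])


def crossover(small, large):
    """N at which the bound for `large` starters drops below the one for `small`.

    Solves small + c_small * N = large + c_large * N for N.
    """
    return (large - small) / (BOUNDS[small] - BOUNDS[large])


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 32, 64):
        a, bound = best_starting_size(n)
        print(f"N={n:2d}: a={a}, bound {bound:.3f}")
    for small, large in ((2, 3), (3, 4), (4, 5), (5, 6)):
        print(f"a={large} beats a={small} once N > {crossover(small, large):.1f}")
```

By these bounds alone, a=2 is best for N up to 2, a=3 for N from 3 to 7, a=4 from 8 to 41, and a=5 from 42 through 172. Since they are only upper bounds, the true crossovers may sit elsewhere.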

----------------

Let's see how we can analyze just how good a starting set is. As we
shall see, in order to be able to compare different starting sets,
it will be important to know not just what words a player starts with
but also what exactly they will do after those words are entered.

Let's follow one person whose Phase 1 has a=3: they use the three starting
words LOATH+MURKY+SPINE. It's a good start! But now what? Consider
what this person might do when the colored tiles show up as in each of
these examples.
lower case = yellow tile = right letter, wrong place;
UPPER CASE = green  tile = right letter, right place.
Play along here: what would YOU do in each case?

LOATH + MURKY + SPINE
1)  .O..h   .u...   s...E
2)  .....   .....   ....E
3)  ..A..   ..rK.   ....E
4)  ....H   .ur..   s....
5)  l.A..   ...k.   ...N.
6)  ...t.   ..r..   ..i.e
7)  .o...   ..r..   .p..E
8)  .....   ..r..   .pI.E
9)  .o...   ..r..   ...n.
10)  LOA..   m...Y   .....
11)  ..a..   ..r..   ....e
12)  ..at.   .u...   ...N.
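The tile arrays in these examples can be generated mechanically. Here is a minimal Python sketch of the scoring rule in this document's notation (UPPER = green, lower = yellow, '.' = gray). It assumes the usual Wordle treatment of repeated letters (greens are assigned first, then each unmatched letter of the hidden word can light at most one yellow tile); the function names are illustrative.

```python
from collections import Counter


def tiles(guess, hidden):
    """Tile pattern: UPPER = green, lower = yellow, '.' = gray."""
    guess, hidden = guess.lower(), hidden.lower()
    result = ["."] * len(guess)
    unused = Counter()  # hidden-word letters not already matched by a green tile
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            result[i] = g.upper()
        else:
            unused[h] += 1
    for i, g in enumerate(guess):
        # A yellow tile consumes one unmatched copy of that letter.
        if result[i] == "." and unused[g] > 0:
            result[i] = g
            unused[g] -= 1
    return "".join(result)


def tile_row(starters, hidden):
    """The row of patterns a player sees after entering every starter word."""
    return "   ".join(tiles(w, hidden) for w in starters)


if __name__ == "__main__":
    starters = ["loath", "murky", "spine"]
    print(tile_row(starters, "house"))  # Example #1
    print(tile_row(starters, "blank"))  # Example #5
```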

I hope for Example #1 you decided the word was "HOUSE". You're right!
That's the only word consistent with those clues, and you might as well
enter it on your next turn and win.

Example #2 is much harder but it turns out there is only one Wordle word
that matches this pattern: "WEDGE". Most people, I think, would need
a hint to figure this out, and that's fine, but of course it costs
a turn. In what follows, we will assume the player is perspicacious
enough to spot the right word without hints, when there is only one
(and more generally we will assume the player can list all the possible
words consistent with the hints). This is NOT realistic; from time
to time we will discuss ways to make things easier for the player.
But it illustrates why a person needs to really know the word list!

It turns out these first two examples are pretty representative of
what this player will face: Of all the words in the Wordle dictionary,
1364 (59%) are uniquely identified by the colored tiles that result
from playing LOATH+MURKY+SPINE. But for the rest, the colored tiles
have only indicated that the hidden word is one of a "cluster" of
similar-looking words.
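Counting how many words a starting set identifies uniquely amounts to grouping the answer list by the tuple of patterns the starter words produce. In this Python sketch `answers` stands for the 2315-word list, which is not reproduced here; the sample is just nine words from the examples above, so its numbers say nothing about the full list. Function names are illustrative.

```python
from collections import Counter, defaultdict


def tiles(guess, hidden):
    """UPPER = green, lower = yellow, '.' = gray; greens are assigned first.
    Both words are assumed to be lowercase and five letters long."""
    result, unused = ["."] * 5, Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            result[i] = g.upper()
        else:
            unused[h] += 1
    for i, g in enumerate(guess):
        if result[i] == "." and unused[g] > 0:
            result[i] = g
            unused[g] -= 1
    return "".join(result)


def clusters(answers, starters):
    """Group answer words by the full tuple of patterns the starters produce."""
    groups = defaultdict(list)
    for word in answers:
        groups[tuple(tiles(s, word) for s in starters)].append(word)
    return list(groups.values())


def summarize(answers, starters):
    """Return (uniquely identified words, number of clusters, {size: count})."""
    groups = clusters(answers, starters)
    unique = sum(1 for g in groups if len(g) == 1)
    sizes = Counter(len(g) for g in groups)
    return unique, len(groups), dict(sorted(sizes.items()))


if __name__ == "__main__":
    sample = ["house", "wedge", "brake", "drake", "brush",
              "crush", "blank", "clank", "flank"]
    starters = ["loath", "murky", "spine"]
    print(summarize(sample, starters))
    print(clusters(sample, starters))
```

Run on the full answer list, the first number returned is the count of uniquely identified words, and a starting set is fully discriminating exactly when every cluster has size 1.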

In Example #3, there's a good chance you see the "_rake" and so you'd enter
"brake" right away. Again, not a bad plan but it turns out "drake" is
also a Wordle word. This is common: we think we know the word, so why
not enter it? But then we discover the word we entered is only one of
several possibilities; in this example we have a 50-50 chance of getting
the word right, even if we *do* know the two possibilities. The same
would happen in Example #4: this time there's a better chance you recognize
both "brush" and "crush" as possibilities, and they are the only ones, so
what else is there to do but enter one or the other and hope for the best;
half the time you'll win on the first guess, half the time on the second.

With a bit of effort you might figure out that when Example #5 shows
up, the word is either BLANK, CLANK, or FLANK. So what do we do now?
Most people would probably enter these words one at a time, especially
if at first only one or two of the possibilities come to mind.
(Really? "clank"?!) If that's what you choose to do, you'll get the
right word in either 1, 2, or 3 more moves, each with probability 1/3.
Playing this way --- simply entering the first matching word that
comes to mind --- we might call "guess-at-will" mode, or (since we have
now slipped into playing "hard mode"!) we might call it "free-form
hard mode".

The situation in Example #6 is a little different. The possible words
now are REFIT, RIVET, and TIGER. But this is a better situation for the
player than Example #5! No matter which of the three we choose to enter
as our fourth word, if it's wrong we will get enough new information
to tell which of the other two is the hidden word. So in reality we're
playing the same way, but have better odds: still a 1/3 chance of
winning on move 4 but then a 2/3 chance of winning on move 5.

This now leads us to look at Example #7: there are again three possible
answers: GROPE, PROBE, and PROVE. But this time the situation is a mix
of the previous two. If we guess GROPE on move 4, then we have no
additional information to distinguish whether PROBE or PROVE is the
right word. If instead we guess one of the other two, then we *do* get
that information and can surely win on move 5.

So in this case, the player has two choices: it's simpler just
to continue to guess whatever seems to fit, continuing in freeform
hard mode. But it's more efficient to use a "guided hard-mode",
in which (in addition to memorizing the three starting words) the
player memorizes that playing PROBE when it's possible to do so is the
better thing to do.

In that last example, taking the effort to remember an additional rule
has only a small payoff, but the same principle applies in more important
cases. Example #8 shows an array that could signal any of GRIPE, PRICE,
PRIDE, or PRIZE. It's quite possible that a player using freeform
guessing would guess GRIPE first, which unfortunately would give no
information about which other word is the hidden one (if it's not
GRIPE itself) and then no matter which other of the four words we try next,
we never get more information about the remaining candidates when we
guess wrong. In this case the player could definitely run out of turns
and lose.  By contrast, if the player takes pains to remember to keep
an eye out for PRIZE and play it when it's possible to do so, then he
or she will definitely win by move 6 whichever of the other three is
the hidden word.

So in our analysis of starting wordsets, we will draw a distinction
between their value to a player guessing freely each time, and the
value to a player who remembers preferential members of key clusters.

There's one more option that a clever player can use, and it's again
illustrated by Example #5. For a player not committed to hard mode at
all, there's no reason the player could not enter, say, BRACE as the
fourth word. Depending on whether the B, the C, or neither gets a
colored tile, the player knows right away which is the right word and
can enter it as the fifth word, and win. So this manner of play ---
this "out of the box thinking" --- can reduce the maximum number
of turns that it will take to resolve a situation like Example #5.
That can mean the difference between winning and losing!

Out-of-the-box mode can also reduce the average number of turns
needed (i.e. the expected value of this random variable).  Example #9
demonstrates this: there are five words that could possibly be that
day's hidden word: BROWN, CROWN, DROWN, FROWN, and GROWN.  It's pretty
clear that both freeform and guided hard mode can take up to five moves
to get the right answer, with the player losing the game.  But if the
player enters BADGE for the fourth word, for example, then the tiles will
show clearly whether the hidden word is BROWN, DROWN, or GROWN (and which
one), so it can be entered to win on move 5; otherwise the word is either
CROWN or FROWN, and we can enter one of them on move 5 and if necessary
enter the other on move 6 to win. So the maximum number of guesses needed
drops from 5 to 3, and the expected number drops from 3.00 to 2.20
(three of the five words take 2 more moves; the other two take 2 or 3).
That's a significant improvement, but it does come at the cost of the player
having to memorize more steps to their algorithm (i.e. to remember that
if the word could be BROWN, then it's best to play BADGE).
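The move counts in Examples #5 through #9 can be checked by brute force. The Python sketch below assumes the hidden word is uniformly random within its cluster and that a guess-at-will player picks uniformly among the words still consistent with all clues; `after_probe` models entering one chosen word first (a "preferred" cluster member or an out-of-cluster word) and then reverting to guessing at will. All counts are moves after the starting set, exact fractions are used to avoid rounding, and the function names are illustrative.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from functools import lru_cache


def tiles(guess, hidden):
    """UPPER = green, lower = yellow, '.' = gray; greens are assigned first."""
    result, unused = ["."] * 5, Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            result[i] = g.upper()
        else:
            unused[h] += 1
    for i, g in enumerate(guess):
        if result[i] == "." and unused[g] > 0:
            result[i] = g
            unused[g] -= 1
    return "".join(result)


def split(cluster, probe):
    """Candidates other than the probe itself, grouped by the probe's tiles."""
    groups = defaultdict(set)
    for word in cluster:
        if word != probe:
            groups[tiles(probe, word)].add(word)
    return [frozenset(g) for g in groups.values()]


@lru_cache(maxsize=None)
def expected_at_will(cluster):
    """Expected moves for a frozenset `cluster` when every guess is a uniformly
    random word still consistent with all clues (memoized on the subset)."""
    total = Fraction(0)
    for guess in cluster:
        # A correct guess costs 1 move; otherwise 1 move plus the expected
        # cost of the smaller group the hidden word falls into.
        rest = sum(len(g) * expected_at_will(g) for g in split(cluster, guess))
        total += 1 + Fraction(rest) / len(cluster)
    return total / len(cluster)


@lru_cache(maxsize=None)
def worst_at_will(cluster):
    """Most moves an unlucky guess-at-will player can need (frozenset input)."""
    return max(1 + max(map(worst_at_will, split(cluster, guess)), default=0)
               for guess in cluster)


def after_probe(cluster, probe):
    """(expected, worst) moves if `probe` is entered first and the player then
    guesses at will. The probe may be a cluster member or an outside word."""
    cluster = frozenset(cluster)
    groups = split(cluster, probe)
    rest = sum(len(g) * expected_at_will(g) for g in groups)
    expected = 1 + Fraction(rest) / len(cluster)
    worst = 1 + max(map(worst_at_will, groups), default=0)
    return expected, worst


def preferred_words(cluster, moves_left):
    """Cluster members whose entry guarantees a win within `moves_left` moves."""
    return [w for w in sorted(cluster) if after_probe(cluster, w)[1] <= moves_left]


if __name__ == "__main__":
    brown = frozenset(["brown", "crown", "drown", "frown", "grown"])
    print(expected_at_will(brown), worst_at_will(brown))  # guessing at will
    print(after_probe(brown, "badge"))                    # out-of-the-box probe
    print(preferred_words(["gripe", "price", "pride", "prize"], 3))
```

For Example #8 with three moves left, `preferred_words` flags PRICE, PRIDE, and PRIZE, so PRIZE is one of several members that would serve; only GRIPE is a trap.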

With more options to choose from, it's not surprising that we can often
find out-of-cluster words that trim the set of candidates in a cluster
more often than using in-cluster ("preferred") words. In an attempt to
use fewer moves it's tempting to look outside the cluster more often.
When an in-cluster word is available, I prefer not to do so, for two
reasons. First, the special rules to resolve a problematic cluster take
half as much memory this way! Second, when used in the compound games,
the preferred-word rules continue to be just as useful even when
previously-solved subgames remove some candidates from a cluster (e.g.
the rule "play PRIZE whenever it's a candidate" continues to be at
least as good a move as making a random selection among the candidates,
irrespective of how many candidates are left in the same cluster as
PRIZE). On the other hand, using up a move to enter an out-of-cluster
word could be *less* efficient than choosing a random candidate, after
some of the candidates are removed (e.g. if DROWN and GROWN have already
been eliminated, it is a waste of a move to enter BADGE).

So we will generally assume the player will NOT reach for an
out-of-cluster word if at least one of the words in the cluster will
lead to a guaranteed win.  In practice -- particularly in compound
games -- humans will find it can be handy to employ those tactics if
they can spot them on the fly (potentially even using words that are
not on the shorter Wordle answer-list) if the player is close to running
out of moves. But we will avoid discussion of such ad-hoc strategies.

----------------

For a player who has just entered  LOATH, MURKY, and SPINE there are 1,715
possible ways the colored tiles can then appear. (We've just worked through
9 of them.) Fully 80% of them indicate precisely one word and the
player has an easy win on move 4. But those other 20% can be tricky,
as we have seen. It's easy enough to write a computer program to alert
us to all of them and outline potential responses, but if we wish to
answer the question of how good this starting set is, we have to know
just *how* the player intends to proceed in those other 20% of the cases!

In my analyses, I will assume players play out "Phase 2" in one of two ways:
(1) Simply use a guess-at-will strategy. With a six-move limit, that
may mean accepting the possibility of a loss; we might want to
compute that probability. Or we can imagine the freedom to
continue playing more moves until victory, and then we can ask
for the probability distribution: what is the probability that
the player will win after 1, 2, 3, ... moves. From that we could
compute the expected number of moves until a win.
(2) Use the same guess-at-will strategy in general, but by
pre-computing strategies for the possible tricky situations, the
player will use a "preferred" right answer (like PRIZE for Example 8)
when necessary to do so to ensure a win. And (only) if no such
preferred word exists, the player will use an "out-of-the-box"
solution (like playing BADGE when BROWN is indicated by the clues,
in Example 9). In either case, I would expect the player to
revert to free-form guessing in the very next move.

There can also be situations in which neither a "preferred" word nor
an "out of the box" solution exists. That turns out not to happen with
LOATH + MURKY + SPINE but it occurs for example with LEARN + STICK + DOUGH:
when the colored tiles are consistent with "batch", the hidden word could be any of
batch, catch, hatch, match, patch, watch
It turns out that no matter what word you enter next, from the entire
Wordle answer list, you could STILL have a set of at least 3 words
that produce the same colored-tile patterns. In fact, just this once,
I even checked all 14,853 currently-allowed Wordle input words, and
every one of them still leaves at least three of these six words unseparated.
(I have to say I was surprised by this!)  That doesn't mean the player
cannot still win by the 6th move, but now he or she must use BOTH
moves 4 and 5 to gain enough information to be able confidently to
enter the correct word, finally, on move 6. For example the player
would have to know in advance to enter (say) BAWDY on move 4 and CHAMP
on move 5. So that's another rule this player has to remember:
BATCH -> BAWDY + CHAMP ; using it, the player will get enough information
to know what to enter for move 6. Even a reasonably good starting set
might need a "two-word strategy" like this for a couple of its most
problematic clusters of similar words that yield the same tile patterns,
but the best starting sets resolve every cluster with just "preferred"
and (single) "out-of-the-box" moves.
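Checking a two-word rule like BATCH -> BAWDY + CHAMP is mechanical: enter each probe against every candidate and see whether all the combined tile patterns differ. A Python sketch follows; the helper names are illustrative, and since the full list of allowed input words is not included here, the "at least three" claim is not re-verified by it.

```python
from collections import Counter


def tiles(guess, hidden):
    """UPPER = green, lower = yellow, '.' = gray; greens are assigned first."""
    result, unused = ["."] * 5, Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            result[i] = g.upper()
        else:
            unused[h] += 1
    for i, g in enumerate(guess):
        if result[i] == "." and unused[g] > 0:
            result[i] = g
            unused[g] -= 1
    return "".join(result)


def separates(cluster, probes):
    """True if the probes give every (distinct) word in `cluster` its own combined pattern."""
    patterns = {tuple(tiles(p, w) for p in probes) for w in cluster}
    return len(patterns) == len(cluster)


def largest_leftover(cluster, probe):
    """Size of the biggest group of candidates that one probe leaves unseparated."""
    return max(Counter(tiles(probe, w) for w in cluster).values())


if __name__ == "__main__":
    batch = ["batch", "catch", "hatch", "match", "patch", "watch"]
    print(largest_leftover(batch, "bawdy"), largest_leftover(batch, "champ"))
    print(separates(batch, ["bawdy", "champ"]))
```

Neither probe suffices alone (BAWDY leaves four of the six together, CHAMP three), but together they split the cluster completely. Mapping `largest_leftover` over a list of allowed input words is one way to repeat the single-word check described above.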

There is one more playing option we might consider, as we try to model
what the player might do when armed with a fixed starting word-set.
Consider the player using LOATH + MURKY + SPINE who gets the
colored-tile pattern in Example #10. It is already clear after the
first two words have been entered that the word must be LOAMY, so
there is no point to entering SPINE. This example is especially
obvious but more generally there is no point to entering SPINE if the
first two words have already revealed five different letters in the
hidden word; since SPINE does not repeat any of the letters in LOATH
and MURKY, we know in advance that the response to SPINE would just be
five gray tiles, giving us no new information. Even if only four
colored tiles had shown from LOATH and MURKY, we could probably skip
entering SPINE; the same might be true if we had gotten three green
tiles from LOATH and MURKY. The point is that in such cases it is
likely that there are only very few words --- maybe just one --- that
fit these unusually helpful sets of clues. A dedicated player might
even make a list in advance of any problematic situations that could
arise from skipping SPINE when there are four colored tiles from LOATH
and MURKY, and then devise separate strategies for those cases.

We will not pursue this line of inquiry very far because it is
less useful in a compound game. Suppose, for example, a person is
playing Dordle (N=2) and after just LOATH and MURKY are entered
the player sees  LOA..+m...Y  in one subgame but  .....+..... in
the other. Surely they can enter LOAMY next to win the first subgame,
but this will give no new information in the other, so inevitably the
player will use SPINE again anyway. More generally, it only makes
sense to abandon the intended list of starting words if *all* of the
subgames have already given abundant clues in response to the first
couple of words.  This certainly can happen, especially in Wordle
itself (N=1) but it becomes increasingly rare as N increases.

----------------

To complete this introduction, we can now summarize the prospects
for the player who begins with LOATH + MURKY + SPINE .

(1) The guess-at-will strategy cannot guarantee success. Example #11
is among the worst: from those tiles all we know is that the hidden
word is one of these twelve:
bread, cedar, debar, dread, eager, gazer, racer, rebar, wafer, wager, waver, zebra
An unlucky guesser could take as many as six more guesses to find the
hidden word, even if properly following all the additional clues that
come from earlier incorrect guesses, and even if only guessing
legitimate Wordle answer-words.

A player who enters these three starter words and then follows this
process can expect to guess the hidden word on the very next turn
74.1% of the time. On subsequent moves the player will win (21.1%,
3.90%, 0.78%, 0.12%, 0.002%) of the time. In particular that means
losing standard Wordle 0.90% of the time. If it is possible to keep
guessing through move 9, the expected number of moves needed is 4.317.

(2) On the other hand, the player who does not want to face defeat (but
still wants to begin with LOATH + MURKY + SPINE ) has the option of
using special rules for tricky situations. As it turns out, of the
1715 possible ways the colored tiles can result from this starting
word-set, 25 of them can lead to defeat if we just guess at will any
word consistent with the clues. Of these, 13 cases can still give a
victory if we play a "preferred" word in the cluster of consistent
words (like PRIZE) but the other 12 require an out-of-cluster word
(like BADGE) to ensure success. (See e.g. Example #12, which could
indicate any of daunt, gaunt, jaunt, taunt, or vaunt as an answer. But
the player could enter JUDGE and then be sure of a victory after just
1 or 2 more moves.)

For a player who follows this strategy, I calculate the probabilities
of completing the puzzle on moves 4, 5, and 6 to be 73.56%, 22.88%,
and 3.56% respectively, for an average of 4.300 moves to win (and a
100% chance of winning by the sixth move). This represents an
improvement over the guess-at-will strategy, but comes at the expense
of having a more complicated set of rules of play.
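The averages quoted for both strategies follow directly from the move-by-move percentages. A short Python check (the function name is illustrative; the inputs are just the rounded percentages above, so the results agree only to about three decimals):

```python
def summarize(first_move, probs, limit=6):
    """Return (expected moves, probability of needing more than `limit` moves).

    probs[k] is the probability of finishing on move first_move + k.
    """
    expected = sum((first_move + k) * p for k, p in enumerate(probs))
    p_loss = sum(p for k, p in enumerate(probs) if first_move + k > limit)
    return expected, p_loss


if __name__ == "__main__":
    # Percentages quoted above for LOATH + MURKY + SPINE, as fractions.
    at_will = [0.741, 0.211, 0.0390, 0.0078, 0.0012, 0.00002]  # moves 4..9
    guided = [0.7356, 0.2288, 0.0356]                           # moves 4..6
    for name, probs in (("guess-at-will", at_will), ("guided", guided)):
        expected, p_loss = summarize(4, probs)
        print(f"{name}: {expected:.3f} moves on average, "
              f"{expected - 3:.3f} after the 3 starters, loss rate {p_loss:.2%}")
```

Subtracting the three starters gives the per-subgame figures (1.317 and 1.300) used in the next paragraph.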

Those probabilities apply to Wordle itself but they point to some
statistics for the compound games too. If the player is playing a
compound game built from N Wordle subgames, then (if sufficiently many
moves are permitted), the player will first enter the three starting
words, and then in any of the N subgames can expect to require 1.317
additional moves to win by using a guess-at-will strategy (or 1.300
additional moves to win by memorizing what to do with the 25 anomalous
clusters). That would mean the expected number of total moves is 3 +
1.317 N.  Well... it *would* mean this expected number of moves if
(after the three starter words) the subsequent guesses are applied
only to one subgame at a time. In practice, this is not true -- the
additional words entered for the first subgame may give additional
clues in the second and subsequent subgames, so the expected number of
moves is surely smaller than 3 + 1.317 N .

Nonetheless, this gives an upper bound on the expected number of
moves for e.g. Quordle, and more broadly shows the relative importance
of the two phases of Wordle-solving. For Wordle itself (N=1), all that
matters is the combined number of moves taken for both the fixed initial
guesses and the guess-at-will phase. But for increasingly
large values of  N, the size  "a"  of the starter set (here, a=3)
becomes less important than the set's effectiveness in approaching a
solution.

For example, we will see later that there is a four-word starting
set that completes the puzzle on the very next move 97.02% of the
time and on the following move 2.98% of the time, so the expected
number of moves is 4 + 1.0298 for Wordle and no more than 4 + 1.0298 N
for a compound game. Already for Quordle this is a smaller number
than 3 + 1.317 N.

In short: for Wordle itself (and the smaller compound games) it may
be inefficient to fix more than one or two starting words to be
used every day, but for the larger compound games, it may indeed
be more efficient to begin with a larger fixed starting set.

Also note that, while we have a way to turn LOATH + MURKY + SPINE into
a 100%-winning strategy for Wordle, it doesn't guarantee success for
the compound games. Our refined strategy (2) will surely win Wordle with no
more than 3 moves after the starting set, but for an N-fold compound
game that means the maximum number of moves needed could be 3 + 3N, in
the (rare) case that all the subgames force the player to use three
additional guesses to discover the word. (Not only would this happen
at most .0356^N of the time, according to an earlier paragraph, but it
would require that none of the words entered to complete any subgame
offer any succor in any of the other subgames -- a very rare situation!)

How else could one offer more information about how the strategies
will fare in the compound cases? Surely failure is possible for N>1
even if it is impossible for N=1. Presumably one could (at least for
very small values of N) itemize a catalogue of the *combinations* of
tile patterns that could lead to a loss and perhaps find ways to
circumvent them, as we did above with "preferred" and "out-of-the-box"
moves, but I have not tried to do so. I have tried to run computer
simulations of thousands of randomly-selected Quordle games to see how
the different starting sets compare, but it is not clear how
representative these are, since there are over one trillion different
Quordle games, and many more for Octordle, etc.
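The count of Quordle games is a one-line computation. The Python sketch assumes, for illustration, that the hidden words are distinct draws from the full 2315-word Wordle list (a compound game's own list may be somewhat smaller) and counts boards as ordered, since the subgames occupy fixed positions; `math.perm` and `math.comb` need Python 3.8 or later.

```python
import math

ANSWERS = 2315  # assumed answer-list size (the original Wordle list)


def games(n, ordered=True):
    """Number of n-word games: ordered boards, or unordered sets of distinct words."""
    return math.perm(ANSWERS, n) if ordered else math.comb(ANSWERS, n)


if __name__ == "__main__":
    for n, name in ((4, "Quordle"), (8, "Octordle")):
        print(f"{name}: {games(n):.3e} ordered, {games(n, ordered=False):.3e} unordered")
```

Even the unordered count for Quordle exceeds a trillion, so a few thousand simulated games sample only a tiny fraction of the possibilities.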

So in the end, is LOATH+MURKY+SPINE a good Phase-1 strategy? That's a
matter of opinion. Memorizing the 13 "preferred" words, plus recognizing
the other 12 difficult clusters and remembering the out-of-cluster
word that resolves them, might be a bit much. Figuring them out on the
fly is not impossible, but taxing. Sticking with mode (1) is simple,
and if you're a gambler who is a "lucky guesser", that may be sufficient.
And, well, maybe it's more fun too, even if it means the occasional
loss. It's a personal decision, of course. But we can provide
comparable data for other word sets, and let each player make a
decision separately.

----------------

So to summarize all this notation: we will analyze some sets of  "a"  words
to be entered at the start of an  "N"-subgame compound Wordle game.
Based on the colored tiles that result, the 2,315 Wordle answer words
will be split into "clusters", and seeing the cluster in which the
hidden word is contained, the player will try to discover which word
it is by either making a random guess, or by using a memorized
"preferred" member of the cluster, or by using a pre-computed
"out-of-the-box" word that separates the cluster into smaller clusters,
or (in extreme cases) by entering pre-determined words over the
next *two* moves to resolve ambiguity.

In order to compare different starting sets, we can tabulate, for each
such starting set, the number and sizes of the clusters; the probabilities
of completing the puzzle after 1, 2, 3, ... additional moves; and
the amount of information the player must memorize to carry out their
algorithm. We can look for measures of how likely it is that the
collections of colored tiles will be very sparse (does the player need
a hint?) or abundant (can we guess the hidden word before using all
the starter words?) After all these measurements, we can decide for
ourselves which is most important, and then determine which
starting set is "best".

Very well, then, let's look at some very good starting sets of Wordle words.
We start with sets of  a=6  words, then progress down to smaller values of  a.

==============================================================================

SIX (AND UP)
The six-word set
[catty, frond, rumba, spill, verge, whack]
is nearly perfect for playing these games. It completely distinguishes
all 2315 Wordle words (despite containing none of j, q, x, or z!).
That is, any two different hidden Wordle words will generate different
patterns of green/yellow/gray tiles when these six words are entered.
So if we enter these words as Phase 1, then Phase 2 is simply: enter the
hidden word and win; there is no need to wonder about the player's behavior.
There is no guessing nor branching in this routine. After the  a=6
starting words are entered, the probability of finishing on the next
move is 100%.

So this six-word set is nearly ideal for the N-fold compound Wordle
games: all of them will be completed successfully in N+6 turns, 100%
of the time. Sadly, most of these games (Wordle itself is the case N=1,
then Dordle, Quordle, etc.) allow only N+5 turns to finish the game, so
this starting set has a 0% chance of actually winning the game.

One exception is Octordle's variant that requires the player
to solve 8 simultaneous Wordle subgames *in order*; because of the
extra level of difficulty, that game allows the player 15 guesses to
try to win. This starting set of 6 words permits the player to win
every such game in only 14 guesses! (Here I use the fact that all
the answers-words for Octordle are among the 2315 Wordle answers.)

*Almost* serving as another application is Sexaginta-quattuordle,
which gives the player 70 moves to guess 64 words --- just enough to
use this starting sextet.  In fact, starting with this sextet gives
an excellent way to play this compound game, since then the player
need only scan the first six rows of the crowded display for each subgame.
Unfortunately, the word list for 64ordle is significantly larger than
that of Wordle, so in 64ordle there are pairs that are not
distinguished by this sextet:
edged, egged    sided, sized    dozed, oozed
boded, boxed    dazed, jaded    waded, waxed
including some pairs involving Wordle words (marked *):
unzip*, unpin   bonus*, bosun   mummy*, yummy  organ*, argon
so the player would not be assured a win with this sextet.
Indeed no sextet will suffice for the full set of 64ordle words;
I suspect there are many septets that will suffice to distinguish
every one of the game's answer words, but with only 70 moves
allowed, a starting septet will not allow the player a victory.

The sextet also definitely fails (slightly) for Dordle,
whose solution list includes the words UNPIN, YUMMY, and ARGON; the
sextet does not distinguish these from UNZIP, MUMMY, and ORGAN,
respectively. Since the Dordle word list also *deletes* some of
Wordle's words, there may be a different discriminating sextet
for Dordle. I have not looked for one.

Of course, this discriminating sextet also works for those compound
games whose solution set is a subset of Wordle's: Quordle, Octordle,
and Duotrigordle. But it's of no practical value since those games
do not allow N+6 words to be entered.

It is comparatively easy to get more 6-word sets that *almost* split
the whole wordset, and thus it is easy to get many sets of 7 that do.
But it turns out that this six-word set is the UNIQUE sextet of
Wordle answer-words that splits the Wordle dictionary into
singletons in the way I have described! (I have to say I was very
surprised by this!)

Of course, even when playing this sextet, the player still has to do
some thinking to *recognize* the hidden word each day; knowing that it
is unique, and knowing a few letters in it, are not quite the end of
the story. Of the 30 letter tiles shown after the six words are
entered, the player may see no greens at all and as few as four yellow
tiles (both {b,i,n,o} and {i,n,p,u} can occur), and it can take some
effort to realize that the hidden words are "inbox" and "unzip".

A novice player who wants to practice recognizing Wordle words might want
to play with this set, since it allows only one Wordle-correct word to be
built from any set of clues. You could, for example, enter these words
into the sequential version of Sedecordle (because it allows you to
enter so many words) and then practice recognizing Wordle words.

If you want to have a set of words that has the same property as the
Magic Six, but includes all 26 letters, you'll need at least eight
starting words. One such solution is
[cross, equip, expel, flank, jumbo, razor, vodka, wight]

How much help do you need? We can even arrange for any doubled letter to
be flagged, whichever letter it is, though to do so you'll need at least 17 starting words, e.g.
[affix, booby, ditto, jazzy, kappa, kayak, mimic, occur, penny,
piggy, queue, radar, shush, slyly, undid, vivid, widow]

Oh, there are some tripled letters too; if you want those flagged as well
you'll need at least 20 words, e.g.
[bobby, cocoa, daddy, error, fluff, heath, jazzy, knack, leggy, mamma,
melee, ninny, pixie, puppy, queue, sassy, slyly, tatty, vivid, widow]
At that point, you know not only which letters appear but how many of each
there are! With 20 words entered, Sedecordle leaves you room for just
one more guess, but you have literally nothing to do but permute the letters
in yellow! (On average, 3.74 of the tiles are already green; only
abort, acorn, adorn, avian, axial, offal
have no green tiles, forcing you to consider every ordering of the
five yellow tiles: 120 orderings for the first three, but only 60 distinct
orderings for avian, axial, and offal, which repeat a letter.)
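The number of orderings is a multinomial coefficient. The sketch below computes it, and the function name is my own. It ignores the extra information that no letter sits where it showed yellow, which trims the real number of candidates further.

```python
from collections import Counter
from math import factorial


def distinct_arrangements(word: str) -> int:
    """Distinct orderings of a word's letters: n! / (m1! * m2! * ...),
    where m1, m2, ... are the multiplicities of its letters."""
    result = factorial(len(word))
    for multiplicity in Counter(word).values():
        result //= factorial(multiplicity)  # exact: multinomials are integers
    return result


for w in ["abort", "acorn", "adorn", "avian", "axial", "offal"]:
    print(w, distinct_arrangements(w))  # 120 for the first three, 60 for the rest
```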

I can even provide a starting word-set that relieves the player of all thought!
We do so by finding an (optimal) solution to the game Kilordle, which
requires solving N=1000 Wordle games at once. (In Kilordle, it is not
necessary to *enter* each correct word, merely to get a green tile in
each of its five columns. As an additional assist, the 1000 subgames
are sorted to present first the ones that are closest to completion by
some metric, with the completed subgames removed from view.) But in
fact we can ignore the given subgames completely! Just treat this as
requiring a list of words that contains each letter in each position:
26 x 5 = 130 tasks. Actually 5 of the tasks are never presented in a Wordle game
(e.g. there is no answer word with an  x  as the first letter). When solving
Kilordle manually, I typically need to enter about 100 words.
At least 26 words are certainly needed, because among all the subgames
we will eventually need to enter every letter in column 2, and each guess
supplies only one letter to that column. So the minimum number of words needed,
to be sure to solve every round of Kilordle, is between 26 and 125.
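The bookkeeping behind these bounds is easy to mechanize. Below is a small sketch that collects the (column, letter) tasks, computes the column-count lower bound, and reports which tasks a proposed word list leaves uncovered. All function names are my own, and the three-word `answers` list is only a toy stand-in for the real answer list. The sketch does no optimization: finding a minimum cover is the harder problem that needs optimization software.

```python
def positional_tasks(words):
    """Set of (column, letter) pairs occurring in the words; columns are 0-based."""
    return {(col, ch) for w in words for col, ch in enumerate(w)}


def column_lower_bound(answers):
    """Each guess places exactly one letter in each column, so a covering list needs
    at least as many words as the most distinct letters seen in any single column."""
    tasks = positional_tasks(answers)
    return max(sum(1 for (c, _) in tasks if c == col) for col in range(5))


def uncovered_tasks(guesses, answers):
    """Tasks required by the answers that no guess supplies."""
    return positional_tasks(answers) - positional_tasks(guesses)


answers = ["cigar", "rebut", "sissy"]  # toy stand-in for the 2315-word list
print(column_lower_bound(answers))  # 3
print(sorted(uncovered_tasks(["cigar"], answers)))
```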

I found manually a set of 36 words that did the trick. But by using
optimization software I discovered that the minimum is actually 35.
A sample solution is
[above, affix, askew, banjo, bayou, civic, debug, eject, epoxy,
ethos, evoke, extra, fritz, globe, howdy, igloo, imply, jazzy,
known, leggy, maxim, nymph, ozone, pique, quasi, rajah, scrub,
skimp, squad, tweak, udder, vinyl, whiff, yacht, zesty]

Other 35-word kilordle solutions exist but all of them must contain
"pique", which is the unique word having q in position 3. They must
likewise all contain "bayou" (u5), and either "banjo" or "ninja" (j4),
"azure" or "ozone" (z2), "eject" or "fjord" (j2), etc. I already fiddled with
the list to remove words I didn't care for ("squib", "waxen", "twixt",...);
I'm not sure what the most normal-sounding list of 35 words would be.
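The reasoning that forces "pique" into every solution generalizes: any (column, letter) task supplied by exactly one candidate word forces that word into every covering list. Here is a sketch of that check. The helper name is hypothetical, and in the setting above `vocabulary` would be the answer list itself.

```python
from collections import defaultdict


def forced_words(vocabulary, answers):
    """Words that are the only vocabulary word supplying some (column, letter)
    pair needed by the answers; every covering list must contain them."""
    needed = {(col, ch) for w in answers for col, ch in enumerate(w)}
    suppliers = defaultdict(set)
    for w in vocabulary:
        for col, ch in enumerate(w):
            if (col, ch) in needed:
                suppliers[(col, ch)].add(w)
    # A task with a single supplier pins down that supplier.
    return {next(iter(ws)) for ws in suppliers.values() if len(ws) == 1}


print(sorted(forced_words(["skate", "stake", "slate"], ["skate", "stake"])))
# ['skate', 'stake']
```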

So not only does this set of words solve every game of Kilordle, it
gives a "simple" way to solve Wordle: just enter all 35 of these
words, and then locate the green tiles in each column to form a
5-letter word! Of course, we're now waaay past the 6-word Wordle limit...

(If you don't mind using words like "embog" and "jambu", then the
minimum drops to 30, using the 12000-word list of possible Wordle
inputs. A solution was posted to Reddit by user "k3and". As the Times increases the pool of
acceptable input words, the size of a minimal winning Kilordle set
can only stay the same or decrease.)

Mathematicians might want to click here for a short description
of how I first found the 6-word set; you'll probably want to read about
measures of word similarity first. The point is that we can talk in a
meaningful way about what it means for two words to be "close" to each
other. (Mathematically, we can impose a metric on the set of these
words, and all our searches for optimal word sets focus on finding
words that are within a small distance of each other, then ensuring
that Phase 1 leaves us with ways to distinguish those words.)
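The measures of word similarity are described separately. Purely as a stand-in, here is a sketch using plain Hamming distance, the number of positions where two words differ. This is my assumption for illustration, not necessarily the metric behind the searches. Note that Hamming distance puts anagram pairs such as lemon/melon two positions apart, even though such pairs are among the hardest to separate, so a practical measure likely needs more than this.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length words differ."""
    return sum(x != y for x, y in zip(a, b))


def close_pairs(words, radius=1):
    """All pairs of words within the given Hamming distance of each other."""
    return [(a, b) for a, b in combinations(words, 2) if hamming(a, b) <= radius]


print(close_pairs(["jaunt", "taunt", "vaunt", "blast"]))
# [('jaunt', 'taunt'), ('jaunt', 'vaunt'), ('taunt', 'vaunt')]
print(hamming("lemon", "melon"))  # 2
```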

The claims of minimality for the 8-, 17-, 20-, and 35-word sets are
proved by covering-set arguments and computations using Gurobi.

==============================================================================

FIVE

With five 5-letter words we can hope to include all but one letter of the
English alphabet, and sure enough this is possible. One example that comes
immediately to mind contains all the letters but  j :
[waqfs, vozhd, blunk, cimex, grypt]

Hah hah, just kidding, that's a bit ridiculous. Not one of those five
words is in the Wordle wordlist, although all five of them are accepted
as input in a Wordle game.  We can get as many as three wordlist
words into the set and still have 25 letters:
[waltz, fjord, chunk;   vibex, gymps]
(This one misses only  q .) But having two words outside the basic Wordle
wordlist is the provable(*) minimum, if you hope to include 25 of the 26 letters,
and the only two such sets are this one and
[waltz, fjord, nymph;   vibex, gucks]
and neither of these is particularly great as an opening play. (The first
one leaves 41 pairs of words and 4 triples undistinguished; the second is
similar. So neither will guarantee you a win of the game in 6 moves.)

[ (*) UPDATE: I re-verified this after the NYT increased the set of Wordle
inputs. Simply compare each quadruple of Wordle words that uses 20 distinct
letters against the new list of acceptable Wordle inputs: in every case,
each input word shares at least one letter with the quadruple. Although, let's be honest,
who would know better if I announced that
[ fjord, nymph, squib, waltz ; gveck ]
was a 25-letter quintuple that used only one word from the list of
acceptable inputs?... ]
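For anyone who wants to repeat that check, here is a minimal sketch of the test applied to each quadruple. The function and variable names are mine, and the actual list of acceptable inputs is not reproduced here. A five-letter input completes a 20-letter quadruple to 25 letters exactly when its five letters are distinct and none of them is already used.

```python
def disjoint_completions(quadruple, inputs):
    """Input words whose five distinct letters all avoid the quadruple's 20 letters."""
    used = set("".join(quadruple))
    if len(used) != 20:
        raise ValueError("expected four words with 20 distinct letters")
    return [w for w in inputs if len(set(w)) == 5 and not (set(w) & used)]


quad = ["carve", "downy", "plumb", "sight"]
print(disjoint_completions(quad, ["fjord", "gveck"]))  # []
```

The re-verification amounts to finding an empty result for every 20-letter quadruple.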

A five-word set with 25 distinct letters is impossible if it includes at most
one non-wordlist word; the best we can do is 24 distinct letters.
Before I turn to the 24-letter sets formed only from answer-list words,
let me mention one example that includes just ONE non-Wordle word,
which I do because that word is a perfectly ordinary one. It's (almost)
a sentence, or at least a headline:
[quick, waltz, vexes, fjord, nymph]
(It's got two e's while missing b and g . "Vexes" is not in the Wordle
wordlist, being a third-person singular form of a verb.)  Entertaining
though it may be, it falls well short of the sextet in the previous
section: the colored tiles returned from these five words are not
sufficient to distinguish "error" from "gorge", "blast" from "stall", etc.

----------------

That's the last word set I analyzed that uses non-Wordle words; in everything
that follows, I only consider sets of words from the Wordle solution-word list.

Of all the 5-word sets made of Wordle answer words, none have 25
distinct letters. There are 58 5-word sets with 24 different letters.
Four of them include one word with a repeated letter, e.g.
[blitz, chump, fjord, gawky, seven]
and the rest have a pair of words sharing a letter, e.g.
[coven, fjord, gawky, plumb, sixth]

As starting sets for playing Wordle and the other games, I
would argue that these two are each the best in their class. But
despite revealing nearly all the letters in the hidden word (they miss
q,x and q,z respectively) they still don't quite pin down the word
unambiguously: using the first one to get clues we would not be able
to distinguish odder, order, and rodeo from each other as candidates
for the missing word, and there are 26 additional pairs of words that
cannot be distinguished. The second quintuple has no unresolved
triples but does have 31 indistinguishable pairs.

With so many letters revealed, these quints do give a human player
lots of help figuring out the missing word. And they make a great
start for the N-fold games with large N, because the average number
"b" of words that are used in Phase 2 is so small: the second of the
quints terminates by the next (sixth) move 98.66% of the time, and
in the remaining 1.34% of the cases it finishes after just b=2 moves.
(There's really no question about mode of play here: each set of
25 colored tiles corresponds to only one or two possible words.)

The first quint is a little different, though. That one triple
{odder, order, rodeo} does benefit from preferentially choosing
odder or order instead of rodeo. So we can run two analyses:
using a guess-at-will strategy, the distribution is
98.79% of games end with (0 or) 1 additional move
1.20% end with 2 additional moves
0.014% end with 3
Or we can instead use a "guided hard mode" strategy: if we
choose to play "order" whenever it is consistent with the clues,
the distribution changes only slightly, but in an important way:
98.79% of games end with (0 or) 1 additional move
1.21% end with 2 additional moves
none  need 3 (or more) additional moves.
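The trouble with guessing "rodeo" can be checked by brute force. The sketch below pairs my own reimplementation of Wordle's tile rule (greens first, then yellows limited by the remaining letter counts) with an exact computation of the move distribution under the guess-at-will model. I take that model to mean that every guess is drawn uniformly from the words still consistent with the clues.

```python
from collections import Counter
from fractions import Fraction


def score(guess, hidden):
    """Wordle tiles: 'G' green, 'Y' yellow, '.' gray (repeated letters handled)."""
    tiles = ["."] * 5
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def extra_moves(hidden, candidates):
    """Exact distribution of moves needed to hit `hidden` when each guess is drawn
    uniformly from the candidates still consistent with the clues."""
    dist = Counter()
    p = Fraction(1, len(candidates))
    for guess in candidates:
        if guess == hidden:
            dist[1] += p
            continue
        pattern = score(guess, hidden)
        rest = [w for w in candidates if w != guess and score(guess, w) == pattern]
        for k, q in extra_moves(hidden, rest).items():
            dist[k + 1] += p * q
    return dist


triple = ["odder", "order", "rodeo"]
print(score("rodeo", "odder"), score("rodeo", "order"))  # YYGG. YYGG.
total = sum((extra_moves(w, triple) for w in triple), Counter())
print(total[3])  # 1/3: expected number of the three words needing a third move
```

Dividing such counts by 2315 turns them into shares of all games.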

----------------

Next we set aside the desire to include 24 different letters, and just
look for ANY good set of five words.

Is there any five-word set that's as good as the six-word set of the
previous section --- one that always narrows down the set of possibilities
to just one word (and thus guarantees a win by move 6)? The answer
is provably "no". I have an elementary argument
that explains why no such perfect quint exists.  But it's faster to
simply use the Linear Programming techniques already described
in this document, especially since
(a) That technique also proves we cannot find a perfect quintuple even
among the much larger list of Wordle's recognized inputs
(including the additional words added Aug 2022); and
(b) LP techniques helped me find the "tough pairs" that I used to create
the elementary argument, anyway.
(The LP techniques are also used to prove the uniqueness of the six-word
set in the previous section, and that uniqueness in turn trivially proves
that no five-word set can detect every hidden word unambiguously: such a
quint, extended by any sixth answer word, would yield many perfect sextets.)

Nonetheless, some really good sets of five words do exist.

----------------

A provably most-efficient-possible 5-word set is:
[blank, chump, goody, river, swift]
(No jqxz; has double r, double o, and two i)
Laurent Poirrier found this one, and we have proved that no other
quintuple does better: this quint distinguishes all the Wordle answer
words except for these 11 pairs:
[ample, maple]  [booby, boozy]  [bugle, bulge]  [chili, chill]
[eagle, legal]  [gauge, gauze]  [jaunt, taunt]  [lemon, melon]
[pasty, patsy]  [skate, stake]  [testy, zesty]
So there is no need for anything but free guessing --- if the
hidden word can only be one of these 22 words, we might as well
flip a coin to pick one in the right pair. Thus the distribution
of games is that
99.52% of games complete with 1 more move
0.48% complete on the second move (after the first 5).

The only other quintuple that is equally efficient is
[bawdy, clove, furor, might, spank]
(No jqxz; has double r, two o, and two a); it has the very same
distribution of numbers of moves needed.

If we sequentially ran  N  independent random processes, each of which
finished after 1 step with probability 0.9952 and after 2 steps otherwise,
then the number of steps needed to complete all of them could
be anywhere between  N  and  2N ; the probability that exactly
k  of these  N  processes took that second step to complete
would be  binomial(N,k) (.9952)^(N-k) (.0048)^k, and the
expected number of steps taken would be  1.0048 N . That almost
models what happens with an N-fold compound Wordle game like
Quordle (N=4): the expected total number of moves needed,
if we begin with either of the starting quintuples above,
would be 5 + 1.0048 N  IF the  N  subgames were independent.
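The binomial model is easy to check numerically. In this sketch (names mine), q is the probability that a process needs its second step. For the quintuples above I take q = 11/2315, since in each of the 11 undistinguished pairs one word, on average, loses the coin flip. By linearity of expectation the sum collapses to N(1 + q).

```python
from math import comb


def expected_total_steps(n, q):
    """Expected steps for n independent processes, each taking 2 steps with
    probability q and 1 step otherwise (k = number of 2-step processes)."""
    return sum(comb(n, k) * (1 - q) ** (n - k) * q ** k * (n + k) for k in range(n + 1))


q = 11 / 2315
for n in (1, 4, 8):
    # expected steps, and the chance that every process finishes in one step
    print(n, round(expected_total_steps(n, q), 4), round((1 - q) ** n, 4))
```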

But they're not! Suppose for example we are using the first of these
two quintuples. If after entering those words we have concluded that
in one of the N subgames the hidden word could either be "ample" or
"maple", and we also know that the hidden word in another subgame is
either "bugle" or "bulge", then we should indeed flip a coin to enter
either "ample" or "maple", but then (by looking to see whether the L
is yellow or green) we would know whether the other hidden word is
"bugle" or "bulge". As it turns out, for EVERY one of the 11 pairs at
least one of the words in the pair can resolve at least one of the
other 10 ambiguous pairs. More precisely, a compound game in which exactly
two subgames have hidden words lying in (different) ambiguous pairs can
involve any of 55 combinations of two of these pairs, and in 24 of these
55 combinations we can guess a word from one pair that will resolve the
question of which word from the other pair is the hidden word in that
other subgame. As a result, it is impossible for a game to require more
than 10 + N steps to complete (far fewer than 5 + 2N except for the
smallest N), because so many of the ambiguous pairs include words to guess
preferentially so as to discover the hidden words in other pairs! (A maximal example includes
subgames with the hidden words
chili  gauge  jaunt  lemon  skate
Such a game would require five coin tosses, which if they are all
unlucky would cost us 10 moves to win, after the initial quintuple
is entered.)
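Whether one pair can help with another is again a question about tile patterns. This sketch uses my own helper names and a reimplementation of the standard two-pass tile rule. It checks whether some word of one pair, entered for its own subgame, also separates the two words of the other pair.

```python
from collections import Counter


def score(guess, hidden):
    """Wordle tiles: 'G' green, 'Y' yellow, '.' gray (repeated letters handled)."""
    tiles = ["."] * 5
    unmatched = Counter(h for g, h in zip(guess, hidden) if g != h)
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        elif unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def splits(guess, pair):
    """True if `guess` gives different tiles for the two words of `pair`."""
    return score(guess, pair[0]) != score(guess, pair[1])


def helps(p, q):
    """True if some word of one pair also tells apart the words of the other."""
    return any(splits(w, q) for w in p) or any(splits(w, p) for w in q)


print(helps(("ample", "maple"), ("bugle", "bulge")))  # True
print(helps(("skate", "stake"), ("jaunt", "taunt")))  # False
```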

Since the subgames are NOT independent, then, we can
only conclude that the expected number of moves to complete an  N-fold
compound game, when starting with one of these two quintuples, is
at most  5 + 1.0048 N .

If we are playing an  N-fold  compound game that allows only
N+5 moves to complete the game, then after the starting quintuple
is entered, we must finish every one of the subgames with just
one move (each). That would happen with probability (0.9952)^N
IF the games were independent but as above we notice that
the earlier subgames can provide additional information to help
resolve the later ones. So in fact the probability of success is for
all  N  at least  0.9952^5 = 0.97623 (i.e. a 97.623% chance of winning),
assuming the player resolves any of the 11 ambiguous cases in
an advantageous order.

----------------

When I claim that the previous two starting quintuples are optimal,
what I mean is that they minimize the number of pairs of words that are
not distinguished from each other, and consequently they minimize the
expected number of moves needed until a win.  The proof of their optimality
comes from searching for sets of five words that split as many pairs as
possible from a select list of pairs of similar words. Searching for maximizing
quints in that way allows us to discover other quints that are nearly as good.

The next few close contenders for "best" starting quintuple, all of which
happen to contain all letters except j,q,x, and z,  are these:
[bawdy, furor, month, speck, vigil]
will win on move 6 99.48% of the time, and on move 7 the rest of the
time. Nothing is any better than free-form guessing. This quint leaves
10 pairs and a triple unresolved.
[flock, haven, rugby, swept, timid]
has exactly the same distribution of moves, but this one leaves 12 pairs
unresolved (and no triples).
[bawdy, chump, front, skill, verge]
finishes by move six 99.35% of the time, and on move 7 the rest of the time.
No need to think about different strategies or modes; all this quint
leaves undecided is 15 pairs.
[batty, champ, furor, slink, wedge]
which leaves undecided 15 pairs plus a triple (the same triple [bobby,
booby, boozy] as a previous case) for a 6-move success rate of 99.27%.
(This last one might be slightly easier for a human to use because
it most frequently returns four or more yellow or green tiles,
although how to compare the quints on this score is not clear because
the repeated letters in the quint mean there may be less information
from yellow tiles than meets the eye.)

For the curious: these are the six quints that split the largest
numbers of the 919 hardest pairs. That is, I checked the
919 pairs of words that are hardest to differentiate; these quints split
the most of them (at least 907), and moreover these quints *did*
distinguish every pair of words that wasn't on this list of 919 tough
pairs.

(For completeness' sake, I ran a similar test with the much more
accommodating set of 14,853 currently-allowed input words for Wordle.
There exist (multiple) sets of five words that successfully distinguish
all the Wordle answer words except for 8 pairs, and  8  is the minimal
number of failures. For example, for
spill verge dumbo fawny chott
all the clusters are singletons except for these eight pairs:
algae/glaze    crock/crook    dried/drier    husky/hussy
liken/linen    odder/order    piper/riper    rebar/zebra )

It is easy to find many starting quintuples which give success rates
over 99%, but never 100%, so perhaps we are just splitting hairs
here. But one quint of note is
[carve, sight, downy, plumb, fetal]
which extends the primary quadruple of the next section to
reduce that quadruple's ambiguity --- all that's left are 20
ambiguous pairs and two triples. There's no need for any strategy;
even the triples can be guessed at will.
98.96% of the games finish by move 6,
1.04% finish on move 7
Also note that this quint uses 20 different letters in the first four
words, potentially helping the player discover the hidden word more quickly.

So to summarize, we have found several quintuples of starting words that
*almost* always enable us to know the hidden word and enter it as move 6
and win --- but not one of them allows us to win 100% of the time.
This is about to change...

==============================================================================

FOUR

Using the right starting set, we can guarantee a win of Wordle.

With a four-word starting set, it is conceivable one could win
*before* using up all the moves allowed by Wordle's rules. Indeed we
will see in the next section that a perfect player can always win
Wordle in at most 5 moves. But in order to do so with a four-word
starting set, that quadruple would have to unambiguously identify
the hidden word every time. As discussed in the previous section,
that's not even possible with FIVE starting words, let alone with four.

So instead we look for sets of four starting words that can guarantee
a win by move 6, i.e. with TWO rounds of guessing after the four
starting words are entered. That's flexibility we did not have in
the previous section, and it turns out to be just what the doctor
ordered. We start with:

Most useful 4-word set, ideal for casual players who want a 100% win rate:
[carve, downy, plumb, sight]
This set has 20 different letters (all but k, f, and the rare j,q,x,z)
which gives the human player a lot of information about the hidden word.
Then, in 94.25% of all cases, there is only one possible word which can be
entered on move 5. The uncertain clusters consist of 5 triples of words and
59 pairs. Guessing a word from each set will add wins on move 5 in an
additional 2.76% of the cases; and in the remaining 2.98% of cases there
is a guaranteed win on move 6.  So we always win, and over time take
an average of 5.0298 moves to do so. With this starting set, the player
can finish using "freeform hard mode" --- there is no need to use
the other modes --- and will finish on move five 97.02% of the time
and on move six the rest of the time.
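These percentages follow from the cluster sizes alone. Here is a small sketch with a hypothetical function name. It rests on the assumption stated above: a random in-cluster guess is right with probability 1/k for a cluster of size k, and a wrong guess always leaves a unique candidate.

```python
from fractions import Fraction


def finish_distribution(cluster_sizes, total=2315):
    """(P(win on the next move), P(win one move later)) given the sizes of the
    ambiguous clusters left after the starting words."""
    ambiguous = sum(cluster_sizes)
    # Singletons always win next move; each cluster contributes one lucky hit on average.
    next_move = total - ambiguous + len(cluster_sizes)
    return Fraction(next_move, total), Fraction(total - next_move, total)


p5, p6 = finish_distribution([3] * 5 + [2] * 59)  # after [carve, downy, plumb, sight]
print(f"{float(p5):.2%} {float(p6):.2%} mean {float(4 + p5 + 2 * p6):.4f}")
# 97.02% 2.98% mean 5.0298
```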

As in the previous section we can at least estimate the performance
for compound games: an independence assumption would make the expected
number of moves be  4 + 1.0298 N , and the fact that the subgames need
not be independent only serves to lower the expected number of moves.

If I have run the tabulation correctly, there are 45147 sets of four
Wordle words that use 20 different letters. None of them appears to
be better than the one above, but some are reasonably good. Many
permit a win just by guessing (after the four words are entered), i.e.
finishing with "freeform hard mode", but they have slightly worse
distributions of numbers of moves needed compared to the quad above.
For example after entering
[carve, downy, fight, slump]
we have only 4 ambiguous triples, but 63 unsplit pairs, leaving a
probability distribution of ending on move five 96.93% of the time,
and on move six 3.07% of the time, so an average of 5.031 moves to win.

Other quads that are good (by various measures) include
[burst, champ, dingy, vowel]
[brawl, coven, dumpy, sight]
[covet, gland, shrub, wimpy]
etc. They all involve a handful of undifferentiated triples and dozens
of undifferentiated pairs; none of the triples requires preferential
guessing or out-of-the-box thinking. All three of these sets end in a win
by move 6, and in fact the fraction of the time a 6th move is
needed is small: 3.02%, 3.02%, 3.07% respectively.

The set
[bawdy, flung, porch, smite]
was suggested to me by a friend when I was first introduced to
Wordle. It's nearly as good by the above measures (not quite!) though
it does have the advantage that there are fewer words for which we get
only a couple of yellow/green tiles.  This doesn't technically make it
better for distinguishing the words, but it does make it easier for a
human player to deduce what the possible matching words are!
The particular Achilles heel for this quadruple is the set of 3 words
jaunt,taunt,vaunt  which will yield the same tile colors. In this case
in order to be sure to win by turn 6, the player cannot continue in
hard mode but must turn to "out-of-the-box" mode of play: s/he must enter
either "jetty" or "trove" on move 5 to get enough additional information
to know which of the three candidates is the actual word of the
day. (There are three other triples and 69 pairs which are also
undifferentiated after the first four words are entered, but in those
cases we can simply guess at will and be sure to win by move 6.)
Using just free-form guessing we will win in
five moves 96.67% of the time; six moves 3.28%; 7 moves 0.04%
But using out-of-the-box mode for that one cluster, we would change
those numbers to 96.63%, 3.37%, and 0% -- it's still the same 5.0337
expected moves, but (crucially!) we would always be able to finish
by move 6.
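A word splits a cluster completely when it produces a different tile pattern for every member. That is the property "jetty" and "trove" have for {jaunt, taunt, vaunt}. Here is a minimal check, again using my own reimplementation of the tile rule and a hypothetical helper name.

```python
from collections import Counter


def score(guess, hidden):
    """Wordle tiles: 'G' green, 'Y' yellow, '.' gray (repeated letters handled)."""
    tiles = ["."] * 5
    unmatched = Counter(h for g, h in zip(guess, hidden) if g != h)
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        elif unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def splits_cluster(guess, cluster):
    """True if every word of `cluster` yields a different tile pattern for `guess`."""
    return len({score(guess, w) for w in cluster}) == len(cluster)


cluster = ["jaunt", "taunt", "vaunt"]
for guess in ["jetty", "trove", "jaunt"]:
    print(guess, splits_cluster(guess, cluster))  # True, True, False
```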

The 4-word set
[bugle, champ, downy, first]
(again with 20 letters, this time missing k, v, and jqxz)
has a slightly higher ability to identify words unambiguously; it leaves
undifferentiated only 50 pairs and only 4 triples, but also one quad.
([piper, riper, viper], [jaunt, taunt, vaunt], [eater, extra, taker],
[bobby, booby, boozy], and [skate, stake, state, stave]). These larger
sets prevent us from winning by move 6 if we use a "freeform hard-mode"
style of playing: the distribution of play times is
97.37% of games end by move 5, 2.52% end on move 6, 0.12% end on move 7
Using "guided hard-mode" allows us to finish one of those sets by move 6
(play SKATE if you can) but two more require "out-of-the-box mode" to
finish that fast: if JAUNT fits, play JETTY; and if PIPER fits, play PARER.
Using those three rules, the probability distribution changes to
97.28% win by move 5, 2.72% win on move 6. So this quadruple
is slightly more efficient than the others, but this comes at the price of
extra complexity: we must remember the three additional rules
Prefer: SKATE    Replace: JAUNT->JETTY, PIPER->PARER

An interesting also-ran in this category is:
[blast, midge, porch, funky]
which extends an excellent starting triple that we will meet in the
next section; so if you start to play Wordle thinking you will just
use the first three, and then you get stuck or otherwise want to bail
out, you could just use funky as your fourth word. This quadruple would
still leave 66 pairs to be split, and 9 triples. We can guarantee a
win by move 6 if we steer clear of free guessing in four of those
triples.  A suitable set of rules is
Prefer: EAGER, FEVER    Replace: JAUNT->JETTY, CATCH->CLOWN
which gives game probabilities of 96.29% by move 5, 3.71% on move 6.

There may be more efficient 4-word sets that involve fewer than 20
letters; I haven't found any yet. (And I observe that these may be
harder to use for people who don't know the wordlist well.)
Here is one example that at least comes close:
[champ, flown, rugby, steed]
This one leaves 90 ambiguous clusters: 87 pairs and 3 triples. The
distributions for free-guessing mode would be
95.98% by move 5, 3.92% on move 6, 0.10% on move 7
We can guarantee a win by move 6 with rules for the three triples:
only one has a preferred word (play STARK if it fits); the other
two require an out-of-cluster word (if JAUNT fits, play JETTY; if
STAKE fits, play EVOKE). Then we can replace those probabilities with
95.90% by move 5, 4.10% on move 6 (never on move 7 or later)
for an average of 5.04 moves.

A search is underway for other, "better" four-word starting sets, but
already we can prove that any quadruple of Wordle solution words will
leave dozens of pairs undifferentiated, possibly in sets of three or
more (as illustrated in the examples above).

==============================================================================

THREE

This is a long section because there are many starting triples that are
"good" for various reasons, so no single one can be called "best".

With a set of three starting words, we can surely win by move 6, but in
practice this can be tricky. After just three initial words there are
at least 11 letters that will not have been tested, so the player must
do more sleuthing; e.g. it is quite possible that after three initial
guesses the player has seen nothing but grey tiles!

Still, by choosing an appropriate starting set of three words,
one can hope to have a 100% win rate at Wordle. After all, we have
already seen in the last section that we can win 100% of the time
starting with CARVE + DOWNY + PLUMB; surely with the freedom to choose
something other than SIGHT next, we should be able to ask for
something more than just a 100% success rate in 6 moves. At the very
least we should be able to arrange a lower average number of moves
until a win. What else might we ask for? What are we willing to give up?
How do we decide that one or another three-word starting set is
"better" than another, or even "the best"?

The question of what is a good three-word starting set arises periodically
on the Reddit forum. I fashioned a detailed response analyzing many of
the starting triples that had been proposed. In this document, we can put
that analysis into context.

What we will see is that trying to reduce the expected number of moves
needed to win will introduce more complexity in our algorithms. To be
precise, what we had in the previous section was a starting set
[carve,downy,plumb,sight] that had two features:
(A) It wins 100% of the time within two more moves.
(B) It requires no decision-making besides guessing freely among candidates.
In this section we could hope for a THREE-word starting set with both
those properties. After all, it *is* known that there are algorithms
to win Wordle in just 5 moves (although the best algorithms that do that
to my knowledge all require significant amounts of branching).

Sadly, I can prove that no set of three words can have BOTH
properties (A) and (B). In fact, I am pretty sure that (B) alone is
impossible (more on this below). But we can find starting triples
that have property (A).

----------------

I have found that there are exactly 261 starting triples with which
every game can be won by move 5. For each of these triples, there will be
clusters of words (signalled by the pattern of the 15 colored tiles)
that could lead to a loss if we simply play with a guess-at-will mode,
that is, we will have to map out some preferred words or out-of-cluster
words to use in those cases. (See the examples of PROBE and BRACE in the
Introduction.) Unfortunately, for each of these starting triples, in order
to achieve goal (A), we need at least 54 rules of these two types, which
is perhaps too many for a human to execute while playing a game.

Which is "best" among these 261 triples is a matter of taste but
[blast, midge, porch]
is certainly a good choice. It minimizes the expected number of moves
(4.18) by maximizing the probability of a win by move 4 (1888/2315 =
81.6%), and among all these starting triples it reveals the highest
number of words with certainty (1597, a bit more than two-thirds). There are
also 414 more words that come in pairs, such that if we guess one of
the pair on move 4 and it's not the hidden word, then the other is
surely right and we can enter it on move 5 and win. The
remaining 304 words are grouped in 84 other clusters, each cluster
containing between 3 and 10 words, that are still ambiguous.  For 29 of
the clusters, free-form hard mode suffices: we may simply guess any word
in the cluster on move 4; if that's not the hidden word, we will get
enough information from those colored tiles to know unambiguously
which other word in the cluster is the right one. But the other 55
clusters require extra care: there *are* words to enter on move 4
which will give enough information to allow a win on move 5, but we
must choose them carefully. In 39 of the cases, a word from within
the cluster will do (e.g. {arena, freak, raven, wafer, waver, wreak} is
such a cluster; "wafer" is the one choice that will work). In the
other 16 clusters we need a word not in the cluster (e.g. one such
cluster is {jaunt, taunt, vaunt}; in order to guarantee a win by
move 5 the only words in the Wordle answer-list that can be entered
on move 4 are "jetty" and "trove").

So in toto we have 55 such rules that must be memorized. A sample
algorithm using this information might be this:

After blast+midge+porch, enter any word consistent with the clues, except
* If the word COULD be any of the following 39 words, then play it:
allow, antic, awake, award, bevel, crown, dizzy, dowdy, dried, drone,
eater, enter, equal, fauna, fatty, fewer, filly, finer, folly, funky,
jelly, kitty, liner, mafia, otter, relax, safer, seize, sever, shown,
skate, skulk, swash, taste, testy, udder, unfed, value, wafer
* If the word COULD be the first word of any of the following 16 pairs, play the second word of that pair instead:
[anger, gawky], [catch, crown], [cinch, crown], [crane, ozone],
[fatal, fella], [field, gawky], [fight, frown], [fizzy, ozone],
[focal, fella], [forth, crown], [fudge, funky], [jaunt, jetty],
[major, jetty], [rower, gawky], [snoop, frown], [stoke, funky]
Then (if the hidden word has not already been played) there is only one
Wordle word consistent with the clues; play it on move 5 and win.
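The rule list above amounts to a small lookup table. Here is a minimal Python sketch of it. It assumes the cluster of still-possible words has already been determined from the tiles. The names `PREFERRED`, `OUT_OF_CLUSTER`, and `move4` are hypothetical, and the word lists are copied from the rules above.

```python
# The 39 preferred words: play one if the hidden word could be it
PREFERRED = set("""
allow antic awake award bevel crown dizzy dowdy dried drone
eater enter equal fauna fatty fewer filly finer folly funky
jelly kitty liner mafia otter relax safer seize sever shown
skate skulk swash taste testy udder unfed value wafer
""".split())

# The 16 out-of-cluster rules: first half possible -> play second half
OUT_OF_CLUSTER = {
    "anger": "gawky", "catch": "crown", "cinch": "crown", "crane": "ozone",
    "fatal": "fella", "field": "gawky", "fight": "frown", "fizzy": "ozone",
    "focal": "fella", "forth": "crown", "fudge": "funky", "jaunt": "jetty",
    "major": "jetty", "rower": "gawky", "snoop": "frown", "stoke": "funky",
}


def move4(cluster):
    """Choose the move-4 entry after blast+midge+porch for a cluster of candidates."""
    candidates = sorted(cluster)
    # Rule 1: a preferred word that could be the hidden word
    for word in candidates:
        if word in PREFERRED:
            return word
    # Rule 2: a listed first half that could be the hidden word -> its partner
    for word in candidates:
        if word in OUT_OF_CLUSTER:
            return OUT_OF_CLUSTER[word]
    # Otherwise any word consistent with the clues will do
    return candidates[0]


print(move4({"arena", "freak", "raven", "wafer", "waver", "wreak"}))  # wafer
print(move4({"jaunt", "taunt", "vaunt"}))                             # jetty
```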

Other strategies exist; for example, for every one of the 55 problematic
clusters --- indeed for all but four of the 291 non-singleton
clusters! --- one or more of the following ten words will split the cluster
completely. Make a table of which of these words you wish to use to
resolve each of the 55 clusters to obtain your win-by-move-5 algorithm:
[crown, fewer, filly, funky, gawky, jetty, navel, skate, spunk, tawny]
(The cardinalities of this set of ten words and of the seven out-of-cluster
words used in the previous algorithm are minimal, as determined by Gurobi.)
With this given starting triple ("Phase 1") these different algorithms to
complete the daily puzzle ("Phase 2") can have different expected numbers
of moves, and different probabilities of success on moves 3, 4, and 5,
but the numbers do not range very far.
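It is easy to test whether a probe word "splits the cluster completely": every word in the cluster must produce a different tile pattern for that probe. Below is a minimal Python sketch using the standard Wordle scoring rule. The helper names are hypothetical.

```python
from collections import Counter


def feedback(guess: str, hidden: str) -> str:
    """Tile pattern: 'G' green, 'Y' yellow, '-' gray (standard Wordle rules)."""
    tiles = ["-"] * len(guess)
    unmatched = Counter(h for g, h in zip(guess, hidden) if g != h)
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        elif unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def splits(probe: str, cluster) -> bool:
    """True if every word in the cluster gives the probe a distinct pattern."""
    patterns = [feedback(probe, w) for w in cluster]
    return len(set(patterns)) == len(patterns)


cluster = ["jaunt", "taunt", "vaunt"]
for probe in ["jaunt", "jetty", "trove"]:
    print(probe, splits(probe, cluster))
```

JAUNT fails because TAUNT and VAUNT give it identical tiles, while JETTY and TROVE both separate all three words. Finding the fewest probe words that together split every problematic cluster is then a set-cover problem. That is the kind of integer program a solver such as Gurobi can settle exactly.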

Note that by using this 3-word starter set on an N-fold compound game, we can
solve all N of the subgames in at worst 3+2N moves. For N > 2 this worst case
exceeds the N+5 moves typically allowed in compound games. But when N=2, the
two are equal, meaning we have a guaranteed winning strategy for 2-fold Wordle.
Unfortunately, Dordle uses a different word set than Wordle, that is, Dordle is
not exactly a 2-fold Wordle, so one does not have an a priori guarantee that
this algorithm will work for Dordle. But as it turns out, it does still work,
with minor modifications.
Change the preferred words (Rule 1) to this set of 35:
allow, assay, awake, awash, awful, crown, dowdy,
drone, eater, enjoy, enter, fatty, fever, finer,
folly, funky, goner, jawed, kneed, lefty, newly,
otter, relax, sally, seize, sever, skate, skier,
skulk, snipe, testy, tower, value, viper, wafer
and change the set of out-of-cluster moves to this set of 20 pairs:
[anger, wagon], [catch, clown], [cinch, awful], [crane, anvil], [dizzy, dozen],
[fatal, awful], [field, awful], [fifty, flank], [fight, flown], [focal, fever],
[forth, awful], [foyer, gawky], [fudge, fauna], [jaunt, jetty], [liner, anvil],
[lower, anvil], [major, agony], [snoop, flown], [staff, bonus], [stoke, ankle]
Then each half of the Dordle game will definitely end within 3+2 moves, i.e.
the whole game will end within 3+2+2=7 moves.
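The budget comparison for an N-fold game reduces to a one-line inequality:

```latex
\underbrace{3 + 2N}_{\text{worst case with this triple}}
\;\le\; \underbrace{N + 5}_{\text{moves allowed}}
\iff N \le 2,
\qquad \text{with equality } 3 + 2\cdot 2 = 2 + 5 = 7 \text{ at } N = 2 .
```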

This starting triple can also be used, more easily, to win in Wordle
by move 6. That is, a player who initially intends to follow this
algorithm so as to win by move 5 may decide during play that it
would be sufficient to win by move 6, and then need not remember
all 55 special cases listed above; the only ones needed are the
preferred words CROWN, FAUNA, and FEWER, and the out-of-cluster pairs
[fight, frown], [fudge, funky], [rower, gawky], [snoop, frown].
Alternatively, we need to go outside the cluster only for the first
pair: if we are willing to wait until move 6 to win, we can instead
use the first elements of the other three pairs (FUDGE, ROWER, SNOOP)
as preferred words. This algorithm will still complete by move 5
98.74% of the time, and its expected number of moves (4.197) is only
slightly higher than that of the complicated win-by-move-5 algorithm. And we have already
mentioned a third alternative in the previous section: we can
consistently play FUNKY on move 4 and then follow rules for four
special cases; but this gives a significantly higher expected number of
moves: 5.037.

Finally, for basic Wordle we may return to a point made in the
Introduction. If we enter only BLAST + PORCH, on about one-fourth of
the days we will see 4 or 5 colored tiles, or 3 greens, or 2 greens
and a yellow. In most of those cases we can still win by move 5 by
simple guessing without entering MIDGE! We need only watch for
the following words, to be used as preferred members of their cluster:
[blade, blond, brain, ditch, graft, gulch, mouth, plain, swash]
and if the word could be STORE or CATCH, play HYMEN.
Doing so will lower the expected number of moves to 3.9.
(We can similarly avoid MIDGE in the compound games, but this analysis
applies only if all the subgames show such a favorable return from
just BLAST + PORCH, which becomes increasingly rare as the number of
subgames increases.)

This long analysis of BLAST + MIDGE + PORCH can be repeated for each
of the other 260 starting triples that have property (A). I have not
done so, but have collected some data about those triples and invite a
discussion of which others are, by some measure, better than this one.

----------------

Now, what about property (B)? Surely it would be convenient to have
a starting triple that worked as well as the starting quad
of CARVE + DOWNY + PLUMB + SIGHT: just enter the starting set and
keep guessing words that are consistent with the clues.

I believe I can prove that no such triple exists when using only
words from the Wordle answer-list. Just for this search, though, I
also looked at the longer 14853-word list of valid input words.
In order to speed things up I made the reasonable, but not ironclad,
assumption that such a triple would involve 15 distinct letters.
(This permitted me to do a preliminary compression to the 5,649 sets
of five distinct letters that are involved in those words, and then
to non-intersecting triples of such letter-sets.) If I have done the
search properly, I can report that no such perfect triple exists:
for every (15-letter) triple of allowed input words, there is at
least one cluster for which the guess-at-will strategy can lead to
a loss in standard 6-move Wordle.
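Here is a sketch of that compression step. It uses a bitmask encoding of letter sets, which is my own choice for illustration and not necessarily how the original search was coded. Each word becomes a 26-bit integer. Words with five distinct letters are grouped by that integer, so anagrams such as skate/stake collapse into one letter set. Pairwise-disjoint triples of letter sets are then enumerated with bitwise AND. The short word list is a toy stand-in, and the function names are hypothetical.

```python
def letter_mask(word: str) -> int:
    """Encode the set of letters in a word as a 26-bit integer."""
    m = 0
    for ch in word:
        m |= 1 << (ord(ch) - ord("a"))
    return m


def disjoint_letter_triples(words):
    """Group 5-distinct-letter words by letter set; list disjoint triples of sets."""
    by_mask = {}
    for w in words:
        m = letter_mask(w)
        if bin(m).count("1") == 5:  # skip words with a repeated letter
            by_mask.setdefault(m, []).append(w)
    masks = sorted(by_mask)
    triples = []
    for i, a in enumerate(masks):
        for j in range(i + 1, len(masks)):
            b = masks[j]
            if a & b:
                continue  # first two sets share a letter
            for c in masks[j + 1:]:
                if not (a & c) and not (b & c):
                    triples.append((a, b, c))
    return by_mask, triples


toy = ["bonds", "glamp", "fecht", "techs", "rownd", "skate", "stake"]
by_mask, triples = disjoint_letter_triples(toy)
for t in triples:
    print([by_mask[m] for m in t])
```

Each disjoint triple of letter sets then expands back into every combination of the words (anagrams included) that share those sets.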

I did also look for near-misses, though, and found a couple of
triples for which there is only one bad cluster. The best is
BONDS + GLAMP + FECHT
Each of the words {skate, stake, stare, state, stave} will turn the A tile
green and the S, E, and T tiles yellow, and a guess-at-will strategy would,
for example, allow the player to guess them in reverse alphabetical
order, which would be a loss if the hidden word were "stake". So
in this case the player must remember (only) one additional rule:
"prefer SKATE".

Also having just one bad cluster is TECHS + GLAMP + ROWND. For this
triple, the cluster {berry, eerie, ferry, fever, jerky, verve}
can again lead to a loss from random guessing (e.g. the sequence
verve, jerky, ferry, berry) but again the loss can be prevented
by playing the preferred word BERRY on move 4 if that is
the cluster indicated after the initial triple.

(For technical reasons I consider BONDS+GLAMP+FECHT to be the better
of the two. Finding these good triples amounts to making sure they
never (or rarely) permit quadruples like {berry, ferry, jerky, eerie}
to be together in a cluster after the initial triple of words is
entered. I assembled a list of tens of thousands of these problematic
quadruples and then developed mechanisms to detect starting triples
that broke most of these quads apart. The first triple I listed
only missed its one quadruple "STA_E". The other one actually missed two:
both {berry, ferry, jerky, eerie} and {berry, ferry, jerky, verve} are
problematic. This is a minor distinction of course. The first triple
is also better than the second in the sense that we only have to
invoke our special rule ("guess SKATE") on five days out of a 2315-day
cycle, as opposed to needing the special rule ("guess BERRY") on six
days per cycle!)
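To check whether a starting triple "breaks apart" a problematic quadruple, compare the combined tile patterns of its four words. Below is a minimal sketch with hypothetical function names. It reproduces the examples discussed above. Counting how many quadruples on a long list remain unbroken would give a score of the kind described, though the original mechanisms may have differed.

```python
from collections import Counter


def feedback(guess: str, hidden: str) -> str:
    """Tile pattern: 'G' green, 'Y' yellow, '-' gray (standard Wordle rules)."""
    tiles = ["-"] * len(guess)
    unmatched = Counter(h for g, h in zip(guess, hidden) if g != h)
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        elif unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def breaks(triple, quad) -> bool:
    """True if the starting words put the words of quad into two or more clusters."""
    keys = {tuple(feedback(s, w) for s in triple) for w in quad}
    return len(keys) > 1


print(breaks(["bonds", "glamp", "fecht"], ["stake", "stare", "state", "stave"]))
print(breaks(["bonds", "glamp", "fecht"], ["berry", "ferry", "jerky", "eerie"]))
print(breaks(["techs", "glamp", "rownd"], ["berry", "ferry", "jerky", "eerie"]))
```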

I did not find any other starting triples that involved only a single
non-singleton cluster. Both GLAND+ROMPS+FECHT and GLAND+ROMPS+WECHT
involve just two (and each of them actually leaves three problematic
quadruples unseparated).

I make no claim about whether other equally-good triples exist. (My
method of sorting was only designed to make sure I didn't miss any
starting triples that guaranteed success *just by guessing* (with zero
special rules like "use SKATE"), and so I am fairly confident that
such a triple does not exist; but along the way I had to branch through
decision trees to trim the candidate pool, and a starting triple that
was a "near miss" might not have been good enough to survive an
early-stage pruning.)

----------------

Among triples of words drawn from the more limited (and more reasonable!)
Wordle answer-list, it appears that the minimum number of problematic
clusters is three, that is, every starting triple requires the player
to remember at least three additional rules if he or she wishes to ensure
a win by move 6.

I believe there are just two that accomplish this using only preferred words:
[blond, girth, swamp]: prefer CATER, ROCKY, and TUTOR
[blond, right, swamp]: prefer CATER, SKATE, and FORCE
If the player chooses randomly from possible words at each stage, except when
using the three preferred words for the fourth move, then he or she will win
the game on the fourth move 73.22% of the time, on the fifth move 24.33% of
the time, and on the sixth move 2.45% of the time, for an expected number of
moves of 4.292. (The numbers for the second triple are just a bit different:
72.87%, 24.85%, 2.28%, and 4.294 moves on average.) The first of these
triples actually leaves seven of the forbidden quadruples (mentioned earlier)
intact; the second leaves eight unbroken.
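The expected numbers of moves quoted above follow directly from the win-by-move distributions. Here is a small Python check. The function name is hypothetical.

```python
def expected_moves(dist):
    """dist maps move number -> probability of winning on that move."""
    total = sum(dist.values())  # normalize in case the percentages are rounded
    return sum(move * p for move, p in dist.items()) / total


blond_girth_swamp = {4: 0.7322, 5: 0.2433, 6: 0.0245}
blond_right_swamp = {4: 0.7287, 5: 0.2485, 6: 0.0228}
print(round(expected_moves(blond_girth_swamp), 3))  # 4.292
print(round(expected_moves(blond_right_swamp), 3))  # 4.294
```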

These two triples were found waiting in the list of 261 triples that
allow a win by move 5; that is, if instead of just the three preferred
words we were to memorize dozens of rules, then we could force a win
one move earlier even in the 2%-3% of cases in which the game would
go to the sixth round without those rules. Compared to
the triple BLAST + MIDGE + PORCH that we discussed earlier, these
two are less good for trying to finish in 5 moves, but simpler for
trying to finish in 6 moves.

Other starting triples come very close to meeting goal (B):
[choir, swept, gland]
also has just three tricky clusters, but only two of them can be
resolved with preferred words (merry and mayor); the last requires
an out-of-cluster solution (FOYER -> FAVOR). Similar remarks apply to
[copse, drawn, light]
[blimp, cedar, ghost]
[force, glint, swamp]
[glint, peach, sword]
the last of which requires TWO out-of-cluster words (BATCH->CLIMB,
BOXER->ROCKY) as well as one preferred in-cluster word (MAYOR).
I believe there are no other starting triples that lead to just
three problematic clusters. There are also
[blimp, cedar, thong]
[blend, match, sprig]
which can be played using only preferred words, yet need four of them
(kitty, leave, rower, skate for the first; fudge, leaky, outer, rower for
the second) because there are four problematic clusters.
Only one of these triples can guarantee a win by the fifth move, but
they all can come close: for example just by following the four rules,
the last will finish on move four 80.09% of the time, taking an average
of 4.212 moves. (Actually in 16% of the cases, we can confidently guess
the hidden word after just blend+match, which lowers our average to
around 4.0 --- assuming we are sure that we can recognize those cases
when they occur!) The other triples have numbers that are similar.

----------------

So we have found starting triples that come as close as possible to the
goal of having a simple mode of play; but such triples tend to take
more moves to win. Before that, we found starting triples that never
need more than 5 moves to win; but they all require complex sets of
rules to achieve this. It may be better to find triples that are somehow
intermediate between these extremes. And, since the player now has three
more moves after the starting set is entered, we have more latitude to
devise different sets of rules for the triples; so we will need to
describe both the "good" starting triples, AND the algorithm we will
use after entering them. We will see there are many reasonable
starting triples and many ways to play, so no matter how we choose
to judge when a triple is "best", there will likely be many triples
that are very close in quality. In short: in this section it will
be hard to name a "winner" starting triple!

Using only the words in the Wordle answer list, there are over 2 billion
sets of three words that we might potentially consider as starting sets.
As we discussed in the sections on five- and four-word starting sets, it
is certainly true that some starting sets with repeated letters are good.
In fact, 46 of the 261 starting triples that can lead to a guaranteed win
by move 5 have repeated letters! (For example [crump, doubt, salve]
repeats "u" but lacks both i and y!) Some of the sets people
reported on Reddit that they enjoy using are reasonably good despite duplicating
letters, e.g. [blind, stare, wimpy] and [colon, right, speed].
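The "over 2 billion" figure is simply the number of unordered triples that can be drawn from a 2315-word list:

```python
from math import comb

n_triples = comb(2315, 3)  # unordered sets of three distinct answer words
print(f"{n_triples:,}")    # 2,065,088,805
```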

But generally speaking it seems prudent to focus on sets of three Wordle
words that include 15 different letters. That reduces the number of candidate
triples to 1,243,026. (I have listed them all in a 26Mb zipped file.)
This set is small enough that it is possible to run some quick preliminary
computations on all of them and then run longer analyses on the most
promising among them.

I have done so (and continue to do so as I write this) but now the
question arises: what exactly do we want to measure about each candidate?
How do we judge which is better or worse?

For each triple, I could measure the following:
1. The number of clusters it creates. (This happens to be the same as
(the number of Wordle words, 2315) * (probability of guessing the
right word on the very next guess), so higher is better.)
2. In more detail we can count the number of clusters of different sizes.
Generally we prefer more smaller clusters, e.g. more singletons, and
fewer larger ones, e.g. a low maximum cluster size.
3. The total numbers of green and yellow tiles that are produced by the triple
across a 2315-day (roughly 6.3-year) cycle. Whether the hidden word is uniquely
identified by a set of tiles or not, it is difficult for a human to
realize what the hidden word might be if he or she sees mostly gray
tiles! So we want these numbers to be high.
4. The number of clusters for which guessing randomly could lead to a loss;
specifically we could count the number that can be resolved by using
a preferred word in the cluster; the number that can be resolved by a
single word (but not one in the cluster); and the others, that require
a two-word or other compound strategy. Of course to actually USE
a starting triple, we would also have to know what these words ARE.
5. The probability distribution of numbers of moves until victory, if the
player simply turns to freeform guessing after the initial triple is
entered. From this in particular we can compute the probability of a
loss when using this strategy ("loss" = "more than 6 moves used") and
the average number of moves used.
6. The probability distribution of numbers of moves until victory, assuming
the player follows a pre-determined strategy. I would assume the
player follows "guess-at-will", "best preferred word", and "best
out-of-the-box word" strategies in that order --- whichever guarantees
a win by move 6. From this we can then determine the expected number
of moves until victory.
7. As a measure of the difficulty of proceeding after the initial triple,
it is helpful to measure the number of times per cycle that the player
must begin with fewer than three colored tiles when deciding which
is the cluster containing the day's hidden word.
8. As a measure of how often one can abandon a word in the chosen triple,
we can count how often a player has an abundant set of clues after
just two of the words have been entered. (It would also be relevant
to know just how complicated the algorithm would have to be if we
try to branch off at that point, but I will not carry out that
analysis for many starting triples!)
9. As an assist to the stumped player, it would be helpful to list the
words that best complement the starting triple: to find the words that
introduce the greatest number of additional letters in case the player
is already stymied at move 4.
10. It would be fairly straightforward to run a simulation to see how a
starting set would fare for Quordle and other compound games. Whether
this simulation accurately represents rare behaviour is unclear.

(Note that statistic 9, and the combined count of colored tiles in statistic 3,
depend only on the 15 letters involved, not the actual words, so these can be
precomputed and recycled for multiple triples.)

Ultimately it is up to the player to choose how to weight these numbers
but having done so, the list of tested starting triples can be sorted.
I would like to calculate these numbers at least for the "best" of the 1.2 million
triples that involve 15 different letters. (I should probably do likewise
at least for the 46 repeated-letter triples that allow a guaranteed win by
move 5, even if I am only testing them for their suitability for a win by move 6.)

Until I have processed more data to include here, I will direct the reader
to the partial analysis I have already done: I computed #1 and #3 for all
1.2 million 15-letter triples in this zipped file, and I computed
assorted other statistics for some "promising" triples in this Reddit post.

But already it is clear that many, many triples score well by some measures,
and almost inevitably they are noticeably lacking in others. Thus there
is not likely to be a clear "best" starting triple, but rather various
contenders for the title depending on what the player values most.

Here are some notable triples to consider.

We have already mentioned
[blast, midge, porch]
With the simplified win-by-move-6 algorithm described above, it averages
4.197 moves and completes by move 5 98.74% of the time.

We also mentioned
[blond, girth, swamp]
It's an excellent choice for someone who simply wants to use freeform
guessing after the initial three words are played.  It's actually kind
of hard to make such bad guesses that you lose after 6 entries: the
expected number of losses in an entire 2315-word cycle is 0.446,
i.e. you could reasonably expect to go *14 years* between losses! (But
yes, it can happen: if the initial 3 words leave you with just yellow
A, R, and T tiles, you might guess "treat"; then if you get a yellow E
too, you might go for "extra"; seeing the T go green you might then
guess "cater"; but then you'd lose if the hidden word is "after").
Close behind are [blond, right, swamp] and [blend, right, scamp].

"Best" by several standards is
[bland, copse, right]
1. It is the triple with the lowest expected length of the game;
it will finish with an average of 4.1682 moves if the player follows
the decision tree I have indicated. (Just slightly longer averages can
be expected from [midge, porch, slant], [blimp, dance, short], and
[bland, copse, mirth].)
2. This is also the triple with the largest number of clusters. It
spreads the 2315 Wordle words into 1954 different clusters, each
identified by its unique set of colored tiles when these three words
are played. That includes 1689 singletons (words that are already
uniquely identified after the starting triple). Second place by this
measure, but perhaps overall better, is [blimp, dance, short].
(Obviously the singletons and the clusters of sizes 2 and 3 can
be resolved by move 6. The extra rules cover five clusters,
of sizes 10, 7, 6, 5, and 4. That leaves only one cluster of size 5 and
13 of size 4, which fortunately can all be solved by freeform guessing.)
3. This triple has the highest probability of finishing on the very
next move (84.36%) (Actually this statement is logically equivalent
to statement (2)! )
4. Experimental simulations suggest this is
the best set with which to start Quordle, in order to minimize the
expected length of the game (about 7.420 moves per game, and winning
Quordle over 99.5% of the time.)

(To win in six moves with BLAND+COPSE+RIGHT, you can preferentially
use MAJOR, MOWER, FEWER, and WATER, and if FIGHT fits, play AWFUL.
To win with BLIMP+DANCE+SHORT you can preferentially play ROVER, SKATE,
FOLLY, FREER, and WAGER; if GAUNT fits, play GRAVY.)

Dual to the most-on-move-4 record is the least-on-move-6. The lowest
I've found is with
[brown, midst, place]
which only gets to a sixth move 0.82% of the time. (Second place goes to
[blown, caper, midst]. The triple [blond, midge, porch] also scores very well.)
You can use the preferred words {gayer, skate, other, gavel, earth, rover, relay};
if the word could be FOLLY, play FIGHT; if it could be TAUNT, play THONG.

Every one of the 15-letter triples leaves at least one cluster with six
or more elements (and I would imagine that's true of triples with repeated
letters, too!). There are 283 triples whose largest clusters have only 6
members. Probably the best of these is
[blimp, dance, worst]
it leaves just one cluster of six (if you get just a yellow E and
a yellow R, then the hidden word is one of
{every, fever, freer, queer, query, refer}).
Comparably good is
[bland, comet, sprig]
which also has only one such cluster. These two triples are also the
ones in this pool that immediately identify the most words (i.e. they
leave the largest numbers of singleton "clusters"). And they are the
two with the smallest average sizes of clusters.
(To win with BLIMP+DANCE+WORST you can preferentially play ROVER and SKATE;
if JOLLY fits, play FIGHT, and if JAUNT fits, play TIGHT. To win with
SPRIG+COMET+BLAND, you need 6 preferences {filly, folly, fewer,
hater, witch, value} and 2 out-of-box rules HAUNT->VOUCH, FOYER->GAWKY).

If you REALLY want to avoid days when you get the worst hints about the
hidden word, I might suggest one of these triples:
[handy, slice, tumor],  [handy, lemur, stoic],  [duchy, merit, salon]
In each case there is only one day in the whole 2315-day cycle when you
will get just a single colored tile (it will be a green A because the hidden
word will be "kappa") and only 15 days when you get just two yellow tiles.

In the opposite direction, if you want the starting triple to give you
the greatest likelihood of getting four or five colored tiles, from just
two of the three words, I've got a couple of good ones: the best I've seen
will put you in that position almost one game out of every three:
[close, train; dumpy],   [scone, trial; dumpy]
(In all the good examples I found, the weak word is interchangeably dumpy,
jumpy, or pudgy.)  Similar examples can be made from almost any
disjoint PAIR of words that provide a lot of yellow and green
tiles. But apart from the one measure that's being optimized in this
paragraph, these triples do not fare especially well.

A personal favorite, scoring highly on several scales, is
[blond, march, spite]
It has one of the highest rates of completion on move 4 (84.06%) and a
high frequency of offering good hints right away (on one-third of all
days it reveals 4 or more colored tiles, or three colored tiles with more
greens than yellows.) To play, use the out-of-cluster solution GAUNT -> GUAVA
and the preferred words {eager, rower, fewer, jolly, otter}.

I also recommend
[blond, parse, wight]
It uses all 9 of the most common letters; it allows the player to win
by the fourth move in 80.3% of the cases just by guessing the first
word that comes to mind, and in fact by just guessing that way the
player will win in 99.95% of all games. In order to handle the
remaining cases one need only remember to use
local mayor otter mover fella
when possible. At that point the player will need the 6th move only
1.8% of the time.  (That sixth move can also be avoided, by remembering 38
"preferred" words and 24 substitution pairs.)

A few other triples that I have found to have a good mix of virtues
include these:
[blend, right, scamp]
[crawl, fight, spend]
[glide, spawn, throb]
All of them can be resolved with just a few preferred words, respectively
{local, water, judge, fewer, voter}
{gamer, baker, mayor, outer, berry}
{match, mower, mayor, outer, mover, dryer}

==============================================================================

TWO

It is very popular to ask about "best" pairs of words to start a Wordle
game. What goes unsaid is what we discovered in the previous section:
very much depends on how exactly the player intends to proceed, and on
what the player values when comparing one starting pair over another.

We have already found starting triples that allow us to
complete every Wordle game in 5 moves, while it is known that there
is no algorithm that can successfully complete every Wordle
game in 4 moves. So it is not clear what else we want to accomplish
with a starting pair that we have not already accomplished with
larger starting sets, except to reduce the number of moves.  (We could
measure this with the expected number of moves, or the frequency of
winning on moves 3,4,5, or 6, etc.) At the same time we must accept
the fact that the algorithms to improve performance will of necessity
be more complicated than the ones we have already encountered.
Moreover, there is now an even greater likelihood that the starting
set (having now at most 10 letters in it) will give us paltry
information about what the hidden word might be.

When we looked at five-word starting sets, we discovered that having
repeated letters was the only choice; among four-word starting
sets it was a competitive choice, and among three-word starting
sets, having repeated letters was an uncommon choice.
Now, among two-word starting sets, we expect that having repeated
letters will be a poor choice. So in what follows we will
restrict our attention to pairs with no repeated letters.

It turns out that there are exactly 196,175 such pairs of Wordle words.
I have made a list of all of them, along with some basic data about each.
(It is sorted informally to reflect a notion of expected "quality".
At the top is [salon, trice]; at the bottom is [inbox, jumpy].)
But what is there about any of them that would cause us to call it
better or worse than another? Here are some possible considerations.

First of all, we could ask that the pair give us the best chance to
get a solution on the very next move (move 3). It is an elementary
probability exercise to see that the probability of success on move 3
is K/S, where S is the number of possible words (S=2315) and K is the
number of clusters into which they are separated by the starting words.
So we look for the pair that partitions the dictionary into the most
clusters. Of all the 10-letter starting pairs, the ones that produce
the most clusters (K=1071 of them) are PRICE + SLANT and CRANE + SPILT.
Play these and then guess a word consistent with the clues; there is
a 46% chance your guess will be correct. (The average size of a cluster
is S/K, so these are also the starting pairs that have the smallest
average cluster size, 2.1615.)
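The "elementary probability exercise" works out as follows. The hidden word lies in cluster C with probability |C|/S, and a guess drawn from C is then right with probability 1/|C|. Every cluster therefore contributes exactly 1/S, and the K clusters together contribute K/S. Here is a small sketch, with a hypothetical function name and a toy partition:

```python
def next_move_success(cluster_sizes):
    """P(a guess consistent with the clues is correct), and mean cluster size."""
    S = sum(cluster_sizes)   # number of possible hidden words
    K = len(cluster_sizes)   # number of clusters
    # Each cluster contributes (|C|/S) * (1/|C|) = 1/S, so the sum is K/S
    p = sum((c / S) * (1 / c) for c in cluster_sizes)
    return p, S / K


p, avg = next_move_success([1, 1, 2, 4])   # toy partition of 8 words
print(p, avg)                               # 0.5 2.0
print(round(1071 / 2315, 4), round(2315 / 1071, 4))  # 0.4626 2.1615
```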

More generally, we want our starting pair to sort the dictionary into
a "large" number of "small" clusters. The smallest (singleton) clusters
are best of course: these are the words we know unambiguously. So
we can ask for the pair that has the most clusters of cardinality 1.
That pair is again PRICE + SLANT (with 634; CRANE + SPILT is second, with 631.)

Looking forward to the remaining moves, we might instead want to insist
that there be no "large" clusters. Sadly, every pair leaves some clusters
of size 16 or more --- no matter what starting pair is played, there will
be days on which there are 16 or more words that are consistent with
the clues they provide. The only pairs whose largest clusters have "only"
16 elements are
SCALD + TENOR     CLONE + STAIR      NOSEY + TRAIL
They have, respectively, two, three, and four clusters of that size.
(For example, if after the first pair is played, the E and R tiles go
yellow while the other eight are gray, then the hidden word that day
could be any of
bribe, brief, every, fibre, fiery, grief, grime, gripe,
prime, prize, puree, purge, query, rhyme, rupee, where
An all-gray color display indicates the hidden word is one of the
sixteen Wordle words that lack s,c,a,l,d,t,e,n,o, and r.)

Not all clusters are equally tricky of course. Since (after the
starting pair) we have four moves in which to guess the hidden word, a
cluster of four or fewer words will surely allow us a win. So we may
ask for the pair which puts the most words into those small clusters
(i.e. for which the fewest words lie in clusters of size 5 or more).
The prize now goes to SALON + TRICE, which has 1,543 words in these
"guaranteed" clusters. (Or we can count the clusters rather than the
words; PRICE + SLANT has 91.5% of its clusters "small", just beating
second-place CRANE + SPILT, which has 91.4%.)
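Each guess inside a cluster eliminates at least the guessed word, so a cluster of at most four words is safe when four moves remain. Both counts in the paragraph above can be computed from a list of cluster sizes. This sketch uses invented sizes purely for illustration, and the function name is hypothetical.

```python
def small_cluster_stats(cluster_sizes, moves_left=4):
    """Words in 'guaranteed' clusters, and the fraction of clusters that are small."""
    small = [c for c in cluster_sizes if c <= moves_left]
    return sum(small), len(small) / len(cluster_sizes)


words_safe, frac_small = small_cluster_stats([1, 1, 3, 4, 5, 16])
print(words_safe, round(frac_small, 3))
```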

There is another way to look at the situation just after we have played
our starting pair: how hard is it (for a human!) to come up with
a word that matches the clues provided by the starting pair?
It's obviously easier when there are many green and yellow tiles,
green being better. Of course this will vary by day; we can look at the
average numbers of those tiles or equivalently, the total number that
appear over the entire 2315-day Wordle cycle. The pair that will
show the most greens is CRONY + SLATE, with 2692. I considered weighting
the yellows as half as valuable as a green; with that weighting CRONY + SLATE
is still the best (it will show 4131 yellows over the cycle, too) but
if we value the yellows more heavily the ranking changes. At the extreme
of counting yellows and greens equally, there is a 13-way tie. Indeed
it is not hard to show that the total number of colored tiles that
will appear over a Wordle cycle depends only on the letters used. Thus
all of these pairs
[route, slain]   [route, snail]   [louse, train]
[sonar, utile]   [solar, unite]   [outer, slain]
[solar, untie]   [outer, snail]   [arose, unlit]   [arose, until]
[alien, torus]   [noise, ultra]   [arson, utile]
will score equally (they will produce 7062 colored tiles) because
they are made of the same letters: a,e,i,o,u and l,n,r,s,t . These
happen to be the 10 most-used letters in Wordle if we count by
*words containing the letter*. If instead we multiply-count any
repeat letters within a word, then "c" would replace "u" in this list;
and as it happens the words containing aeioclnrst are (tied for)
second in this ranking, with 7053 colored tiles.
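The claim that the total depends only on the letters can be made concrete. Assume the starting words use ten distinct letters. Each of those letters that also occurs in the hidden word then lights exactly one tile, either green or yellow, however often it repeats in the hidden word. Here is a minimal sketch, with a hypothetical function name and a toy word list:

```python
def colored_tiles(start_letters, answers):
    """Total green+yellow tiles over the answers, assuming the starting words
    use distinct letters: each such letter present in the hidden word lights
    exactly one tile, green or yellow."""
    letters = set(start_letters)
    return sum(len(letters & set(w)) for w in answers)


toy = ["stare", "crane", "fuzzy"]  # hypothetical mini answer list
print(colored_tiles("route" + "slain", toy))
print(colored_tiles("louse" + "train", toy))
```

Both pairs print the same total because they are built from the same ten letters.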

My list of all the 10-letter pairs is sorted by an ad-hoc measure
that combines several of the ideas discussed so far. By that criterion
the top pairs include some that are by now familiar:
[salon, trice]
[cairn, stole]
[price, slant]
[close, train]
[crane, spilt]
...
----------------

In order to analyze and rank the starting pairs any further, we would
have to know how the player intends to proceed after playing the starting
pair. I will investigate two procedures the player might follow.

First let us suppose the player follows a guess-at-will strategy. As it
turns out, for every one of the starting pairs there is then a nonzero
possibility of losing (depending on what the hidden word is, and the ways
that the coin flips as the player picks words that are consistent with
the clues). So one of the ways of ranking the starting pairs is to
compare the probability of a loss when following this strategy.

For this search I resorted to some heuristics to trim the search space a
bit, so it is *possible* that there can be a better pair, but these appear
to be the best. Shown here is the computed (not experimental) rate of failure
when following a guess-at-will strategy after starting with each pair:
[spend, trawl] 0.0024206777 (1 fail per 413 days; about 5.6 per 2315-day cycle)
[blond, tramp] 0.0024243336
[blend, tramp] 0.0024699297
[bland, swept] 0.0025154240
[scold, tramp] 0.0025959420
[blend, stamp] 0.0026618621
[blend, swamp] 0.0026902371
[bland, trump] 0.0026914026
Interestingly the pattern continues even further down the list: the "best"
pairs by this standard only use words with a single vowel, in the middle.
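A failure rate of this kind can be computed by exact recursion rather than simulation. The sketch below rests on one assumption about what guess-at-will means. The hidden word is taken as uniform within its cluster. Each guess is chosen uniformly from the words still consistent with all the clues. The function names are hypothetical. After a starting pair, `moves_left` would be 4. Weighting each cluster's loss probability by |C|/S gives a rate for the whole starting set. This reproduces the kind of calculation, not the figures listed above.

```python
from collections import Counter
from functools import lru_cache


def feedback(guess: str, hidden: str) -> str:
    """Tile pattern: 'G' green, 'Y' yellow, '-' gray (standard Wordle rules)."""
    tiles = ["-"] * len(guess)
    unmatched = Counter(h for g, h in zip(guess, hidden) if g != h)
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        elif unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


@lru_cache(maxsize=None)
def win_prob(cluster: frozenset, moves_left: int) -> float:
    """P(win) with a uniform hidden word and uniform consistent guesses."""
    if moves_left == 0:
        return 0.0
    n = len(cluster)
    total = 0.0
    for guess in cluster:
        for hidden in cluster:
            if guess == hidden:
                total += 1.0
                continue
            p = feedback(guess, hidden)
            # Words still consistent once this guess has been scored
            rest = frozenset(w for w in cluster if feedback(guess, w) == p)
            total += win_prob(rest, moves_left - 1)
    return total / (n * n)


def failure_rate(clusters, moves_left):
    """Loss probability for a whole starting set, given its clusters."""
    S = sum(len(c) for c in clusters)
    return sum(len(c) / S * (1 - win_prob(frozenset(c), moves_left)) for c in clusters)


sta_e = frozenset({"skate", "stake", "stare", "state", "stave"})
print(win_prob(sta_e, 3), win_prob(sta_e, 5))
```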

While these pairs have the lowest failure rates, they tend in general to take
longer to achieve a win. For example, games that start with these pairs
will typically find the hidden word on move 3 less than one-third
of the time, far below the 46% success rate we have seen for other pairs.

As noted already, this strategy can result in a loss, no matter which
starting pair is used. But we can imagine allowing the game to continue for
more than 6 turns, until the hidden word is eventually found. Counting the 7-move and
8-move (etc.) games too, we can compute the average number of moves needed
for a win for each pair. The starting pairs with the lowest average
numbers of moves, following the guess-at-will strategy, are
3.700452018, [crane, spilt]
3.702213793, [price, slant]
3.712982832, [crane, split]
3.713728776, [cried, slant]
3.719044188, [print, scale]

----------------

A player who does not want to allow a loss would have to follow a different
strategy. In this document we have consistently proposed one alternative:
to keep things as simple as possible, we assume the player will
examine the tiles after the starting pair is played, to determine the
cluster in which the hidden word lies. Then:

(1) If that cluster will surely find the hidden word by move 6 using a
guess-at-will strategy, the player will pursue that.
(2) If not, but if one "preferred" word in the cluster will give enough
extra information to finish by move 6, then play it on move 3.
(3) If not, but a non-cluster word will separate the cluster into
smaller clusters that can each be resolved by move 6 using
guess-at-will, then play that word on move 3.
If in (2) or (3) there are multiple candidate words to play, then
play the one that gives the best probability distribution.

Unfortunately it seems that for most pairs there are other clusters that
do not fall under any of these three categories. We have seen that there
can be two-word solutions in those cases, or a recursive use of
"preferred" words. I have done some by-hand analyses for these cases,
but only for a few examples.

Now, how shall we pick a "best" starting pair to use with this strategy?
Unlike in the previous subsection, the chance of a loss is now zero;
but there is something new to optimize: the difficulty of the algorithm.
For none of the starting pairs is it possible to have a short list of
additional rules (as e.g. the three we provided for CHOIR + SWEPT + GLAND).
But we can do our best!

The starting pair with the simplest algorithm that I have found is
[blond, spite]
It will lead to a guaranteed win by move 6, with an average number of
moves that is below 4.0, if we simply follow rules (2) and (3) for
some exceptional clusters. But there are 23 of them: twenty-three
combinations of yellow and green tiles after these two words have been
entered, that signal a cluster of possible words that cannot be resolved
just by freeform guessing. The set of rules to cover these tricky
clusters is then more difficult to memorize than anything we did with
starting triples! In 21 clusters we will use the "preferred" words
aider, cater, chain, charm, chart, crave, crest, fifth, folly, girly,
grill, legal, mayor, money, scary, scree, shrew, stark, trait, twang, wager
The other two clusters require out-of-cluster moves on move 3:
if the word could be "found", then play "wharf"
if the word could be "mover", then play "rocky"
Using these 23 rules will guarantee success by move 6, taking on
average 3.8237 moves.

Among the many starting pairs I have so far studied, none has
fewer than 23 rules for exceptional clusters like this, so BLOND + SPITE
is "best" by this criterion for "simplicity". (It has the added virtue
that using the word MARCH on move 3 resolves many of the 23 clusters by
move 6; that is, if we forget some of the 23 rules while playing, we could
simply revert to the six rules already discussed for the starting triple
BLOND + SPITE + MARCH in the previous section.)

I have found one other starting pair that creates only 23 tricky
clusters: BLOND + TRACE. To play this pair, use the 19 preferred words
agape, amiss, amuse, cinch, favor, fetal, foist, folly, gaunt, gipsy,
grape, hairy, impel, palsy, serif, serve, setup, shift, swill
and the four out-of-cluster moves:
if the word could be "catch", then play "champ"
if the word could be "found", then play "swamp"
if the word could be "gamer", then play "gawky"
if the word could be "mover", then play "whisk"
Using these 23 rules will guarantee success by move 6, taking on
average 3.7461 moves.

An interesting candidate for "simplest" is GLAND + SWEPT. This
starting pair leaves 35 clusters that require special treatment;
34 of them can be resolved using a preferred member of the cluster,
and the last (the one including BERRY) can be resolved using
the out-of-cluster word ROCKY. But alternatively we can use
CHOIR for 33 of the first 34 --- all except the one containing
[arbor, armor, favor, major, mayor, razor].
This works because, as we noted in the previous section, CHOIR + GLAND + SWEPT
is a good starting triple with just three difficult clusters of its own,
and this set is one of them. (It can be resolved using MAYOR as
a preferred word.) In other words we have a "simple" algorithm for
winning Wordle:
Start with GLAND + SWEPT and see which cluster contains the day's word.
Play ROCKY if the cluster contains BERRY.
Play MAYOR if the cluster contains MAYOR.
Play CHOIR if the cluster is any of the other problematic ones.
Otherwise, guess at will.
But this is a cheat! To use this algorithm we must recognize
those 33 clusters as they arise, which is no easier than remembering
the preferred words that signal them. In effect, this algorithm is
just a variant of the one used with the starting triple:
Start with GLAND + SWEPT.
If the word could be BERRY, play ROCKY; then guess at will.
Otherwise play CHOIR. Then:
If the word could be MAYOR, play MAYOR; then guess at will.
Otherwise guess at will.

The other natural criterion to use is the average number of moves
until a win. Alex Selby's web page advocates CRANE + SPILT, which we
have seen is also good by other criteria. This pair allows a win by
move 6 with most of the 1071 clusters dispatched by free-form guessing.
There are 25 clusters which can be handled by using a preferred member:
above, allow, awake, batch, bevel, blade, corer, dingy, ditch,
ditty, dogma, dumpy, earth, foist, goody, gouge, grade, haven,
marry, merge, otter, sewer, vomit, wager, women
An additional four clusters require out-of-cluster solutions:
billy->BAWDY, bound->BAWDY, bully->FJORD, daunt->JUDGE
Then, in addition, there is the largest cluster (containing 33
words), in which no single word played on move 3 (from within or
outside the cluster) pins down the hidden word with certainty. A double-word
substitution will suffice: play ROWDY and JUMBO on moves 3 and 4, and this will
determine the hidden word uniquely, to be played on move 5, except that we
have to toss a coin to pick between {roger, rover} and between {every,
ferry}, which could force us to defer victory until move 6.

In the previous subsection we noted this starting pair used fewer moves
on average (3.7005) than any other starting pair. More precisely,
if we use this starting pair and simply resort to a guess-at-will strategy,
we would discover the hidden word by move 6 about 99.505% of the time: 46.3%
of the time by move 3, 40.4% on move 4, 10.9% on move 5, and 1.9% on move 6.
But now, following the recipe above will guarantee a win even in the remaining
cases, with a lower average of 3.6590 moves per day. (We now finish on move 3
only 46.1% of the time, but a higher 43.1% of the time on move 4 as the
longer tail of the old probability vector is moved toward shorter games
by the complicated rules of play.)

Clearly an average game length of 3.6590 is better than anything we obtained
using fixed starting triples. But when applied to an  N-fold compound
game, this would imply (an upper bound for) the length of the game
being 2 + 1.6590 N. For N=1 this is clearly better than, say, the bound
3 + 1.1682 N which we obtained in the previous section. But already for
N=2 the advantage is nearly lost (5.318 versus 5.336 moves), and the two
bounds cross near N=2.04; so even for Dordle it is not clear
that we are better off with the best starting pair than we would be with
one of our good starting triples. It is unlikely to be better for Quordle and beyond.

We have also already met another promising starting pair: PRICE + SLANT.
This one can win Wordle 100% of the time if we use these 24 preferred words:
awake, badge, berry, bowel, corer, dingy, ditty, dogma, dough, eater, field, foist,
gouge, grade, haven, legal, marry, merge, modal, otter, sewer, vomit, wager, women
and employ these 5 out-of-cluster pairs:
[batch, climb], [billy, howdy], [bound, thumb], [bulky, fjord], [daunt, dough]
However, the cluster containing "berry" is large (33 elements), and even if we
preferentially play "berry" on move 3, there are still two large sub-clusters
that could lead to a loss unless we play preferred members of each of THEM, too:
*after* playing "berry" on move 3, if the word could (still) be "roger" or "mover",
then play the corresponding word on move 4. (So that's a "recursive guided freeform
hard mode". Whew.) With this algorithm, Wordle is definitely won by move 6, taking an
average of 3.6618 moves.

As far as I have computed to date (several thousand of the most promising
starting pairs), the pairs which use the fewest moves on average, while
guaranteeing a win by move 6 using this style of procedure, start with
3.6591 [crane, spilt]
3.6618 [price, slant]
3.6739 [crane, split]
3.6757 [cried, slant]
3.6771 [print, scale]
...

I have not finished an exhaustive search of word pairs, but I have looked
at all pairs drawn from what I consider the "better half" of all Wordle words.
I am running a background process at home that sifts through promising pairs;
for each one it is necessary to identify the problematic clusters and to find
in- or out-of-cluster words that can resolve them, if possible; I can also
search for procedures to resolve the clusters which cannot be won with these
tools, and then compute the probability distribution showing the frequencies
with which this algorithm will end on moves 3, 4, 5, or 6. Over time I
may use these results to update this section. But it seems clear that these
procedures to guarantee a win with a particular starting pair are inevitably
too complicated to actually be used by a human, and unlikely to be useful
for the compound games.

==============================================================================

ONE

As for the best single starting word: by various criteria it might be
trace, or brute or chant, or raise or arise, or filet or parse, or dealt, or...
Many people have weighed in, using different rubrics to assess the choices.
YMMV: by now the reader understands that we cannot answer the question of
"What is the best starting word?" without deciding how to measure quality,
and without fixing an algorithm that the player will follow after entering
that first word. (Most answers to this question assume the player "will play
optimally", but that seems unrealistic unless a simple algorithm is stated
clearly and then followed by the player!)

==============================================================================

CONCLUSION(?)

So ... what does all this tell us about how to play Wordle and the compound games?

One conclusion, surely, is that what we choose to do will depend on what we
consider important. We have seen in the examples that the most important goals
may be at odds. We want to win as often as possible; we want to keep the number
of moves used as low as possible; we want to follow a procedure that is as
simple as possible; and along the way we appreciate not having to come up with
good moves in the absence of concrete hints.

In order to say something more definitive, I ran some simulations
of Wordle and the compound games, playing with the starting sets
discussed in the text, and following the intended procedure (using
free-form guessing whenever possible; playing preferred words when
necessary; and using out-of-cluster words only when that is the only
option). I tried a couple dozen of the starting sets from the text,
and used them to play Wordle, Quordle, Octordle, Sedecordle, and
Sexaginta-quattuordle, as well as idealized Dordle and Duotrigordle
(versions that would use the same word list as Wordle). More precisely,
I simulated the "sequential" versions of these games, in which the
player must complete the subgames in order. The primary statistics I
collected are (a) the average numbers of moves used to win (assuming no
limit to the number of moves allowed) and (b) the frequency of failure
(taking more than N+5 moves to complete an N-subgame game). I also
kept track of the full distribution of the numbers of moves used.

Looking over the numbers, we can draw some general conclusions.

1. The primary determinant of these numbers is the number of words in the starting set.
The two-word starting sets take lower numbers of moves, but have the highest rates
of failure; the four-word starting sets are just the opposite.

2. For two-word starting sets, the average number of moves increases rapidly with the
number  N  of subgames; for larger starting sets the effect is less pronounced.
The overall result is what we predicted in the introduction: when  N  is small
(up to N=3), a one- or two-word starting set is likely to be best; in an
intermediate range (N=4 through perhaps N=100?) a three-word starting set
will probably be best, and for very large N we might expect the four-word
starting sets to win.

3. The average number of moves taken in an  N-fold  compound game is obviously
at least  N . But what is interesting is just how much larger than  N  it is:
the excess over  N  increases as  N  increases --- but only up to a point, and
then starts to decrease! (The maximum in each table is usually around N=16 or
N=32.) I suppose the explanation is that as the number of subgames grows very
large, the set of hints available in the later subgames grows quite abundant
from the words played in the earlier subgames, and so it becomes more and more
likely in later games that we can immediately deduce the hidden word in one
try --- or that the new hidden word was already one of the guesses in a previous
subgame!

4. Players wishing to keep their "streak" alive should definitely use the larger
starter sets. For example it is possible to win at Quordle over 99.8% of the time
using a 4-word starting set; the failure rate for a 2-word starting set is over
ten times as large.

5. For starting sets of a given size, it can be difficult to observe a meaningful
distinction between their "success rates". I tested quite a few 3-word starting sets
and while there is some noticeable difference between the starting triples that
were optimal by one criterion and the triples that were optimal by a different
criterion, many of the "generally good" triples had very similar tables.

6. "User error" is likely to be a large enough problem to obliterate the differences
in many cases. For example, I have personally played a few hundred games of Quordle
recently using the spite+march+blond opening; from the tables I see that about
two-thirds of them should have completed with me immediately entering the answer
in every one of the four subgames. I know I have done so more than half the time,
but not 2/3! I fail to notice a yellow tile, I type "blind" instead of "blond",
I forget to use the preferred words, or I just draw a blank!

----------------

To repeat the highlights, the "best" starting sets that I have found, of
various sizes, are
catty, frond, rumba, spill, verge, whack
blank, chump, goody, river, swift
carve, downy, plumb, sight
bland, copse, right
crane, spilt
But who am I to say? :-)

I will continue to process the datasets that I have constructed, and intend to
update this document when something new pops up. In the meantime, I welcome
corrections and suggestions for further investigation.

Now, how about moving on to a nice game of Nerdle, hmm? :-)

--dave
rusin@math.utexas.edu

```
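Several numbers in the discussion of starting pairs, such as the guess-at-will failure rate 0.0024206777 for SPEND + TRAWL and the average game length 3.700452018 for CRANE + SPILT, are described as computed rather than simulated. The routines that produced them are not shown in this document. What follows is only a minimal Python sketch of one way such figures can be computed exactly. It rests on three assumptions: the hidden word is uniform on the answer list, "guess at will" means choosing uniformly at random among the words consistent with every clue so far, and repeated letters are colored in the usual way. All function names (`feedback`, `clusters`, `split`, `expected_guesses`, `win_prob`, `failure_rate`, `average_moves`) are hypothetical.

```python
from collections import Counter, defaultdict
from functools import lru_cache


def feedback(guess: str, answer: str) -> str:
    """Tiles for `guess` against `answer`: G = green, Y = yellow, . = gray.
    Greens are assigned first; a repeated letter is marked yellow only while
    unmatched copies of that letter remain in the answer."""
    tiles = ["."] * len(guess)
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "G"
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        if tiles[i] == "." and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def clusters(start_words, vocab):
    """Group the vocabulary by the combined tiles the starting words would produce."""
    groups = defaultdict(list)
    for word in vocab:
        groups[tuple(feedback(s, word) for s in start_words)].append(word)
    return list(groups.values())


def split(cands, guess):
    """Sub-clusters of cands (excluding guess itself) that `guess` could leave."""
    parts = defaultdict(list)
    for w in cands:
        if w != guess:
            parts[feedback(guess, w)].append(w)
    return [frozenset(p) for p in parts.values()]


@lru_cache(maxsize=None)
def expected_guesses(cands: frozenset) -> float:
    """Expected further guesses under guess-at-will: the hidden word is uniform
    on `cands`, and each guess is uniform on the still-consistent words."""
    n = len(cands)
    total = 0.0
    for g in cands:
        for part in split(cands, g):
            total += len(part) * expected_guesses(part)
    return 1.0 + total / (n * n)


@lru_cache(maxsize=None)
def win_prob(cands: frozenset, guesses_left: int) -> float:
    """Probability that guess-at-will finds the hidden word within guesses_left guesses."""
    if guesses_left <= 0:
        return 0.0
    n = len(cands)
    total = 0.0
    for g in cands:
        total += 1.0  # the hidden word is g itself: solved on this guess
        for part in split(cands, g):
            total += len(part) * win_prob(part, guesses_left - 1)
    return total / (n * n)


def failure_rate(start_words, vocab, max_moves=6):
    """Chance (over hidden words and coin flips) of not finishing by move max_moves."""
    left = max_moves - len(start_words)
    lost = sum(len(c) * (1.0 - win_prob(frozenset(c), left))
               for c in clusters(start_words, vocab))
    return lost / len(vocab)


def average_moves(start_words, vocab):
    """Average game length with no move limit. A hidden word that is itself one of
    the starting words is credited with the move on which that word is played."""
    total = 0.0
    for c in clusters(start_words, vocab):
        if len(c) == 1 and c[0] in start_words:
            total += start_words.index(c[0]) + 1
        else:
            total += len(c) * (len(start_words) + expected_guesses(frozenset(c)))
    return total / len(vocab)
```

With the 2315-word answer list loaded into a Python list `vocab` (not included here), calls such as `failure_rate(["spend", "trawl"], vocab)` or `average_moves(["crane", "spilt"], vocab)` compute quantities of the kind tabulated in the section on starting pairs. Whether they match the quoted figures to every digit depends on the assumptions listed above. Exact recursion over sub-clusters, memoized on the set of remaining candidates, is used instead of simulation so that the results are exact, but it may be slow for the largest clusters. For a toy check, take the five hypothetical candidates fight, light, might, night, sight. Every wrong guess among them leaves the other four still possible, so guess-at-will needs on average (5+1)/2 = 3 guesses.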
==============================================================================

Projects for the future:

1. Look at how the claims are modified for the other games (with different word lists), or when allowing any valid starting word from the Wordle input list.
2. Does our "metric" really satisfy the triangle inequality? Need more examples of triangles!
3. Check whether a 4 DAILY + 1 ADMX combo can give 25 letters.
4. Need to talk in THREE about why the informal ranking is or isn't supported by the experience in TWO, since I only have time to examine the "top" of the list.
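The section on starting pairs compares two bounds: 2 + 1.6590 N for CRANE + SPILT with its full rule set, and 3 + 1.1682 N for the starting triple from the previous section. This comparison is plain linear arithmetic. As the coefficients suggest, the starting words are played once and serve every subgame, and each subgame is then charged the single-game average minus the moves spent on the starting set. The short Python sketch below, with hypothetical names `bound` and `crossover`, tabulates both bounds and finds where they cross. It only redoes the arithmetic in the text. These are upper bounds, so the sketch says nothing new about actual game lengths.

```python
def bound(intercept: float, slope: float, n: int) -> float:
    """Linear upper bound intercept + slope * N on the length of an N-fold compound game."""
    return intercept + slope * n


# (intercept, slope) pairs quoted in the text.
PAIR = (2.0, 1.6590)    # CRANE + SPILT, using the rules that guarantee a win
TRIPLE = (3.0, 1.1682)  # the starting-triple bound from the previous section


def crossover(a, b) -> float:
    """The value of N at which two linear bounds a and b are equal."""
    return (b[0] - a[0]) / (a[1] - b[1])


for n in (1, 2, 3, 4, 8, 16, 32):
    p, t = bound(*PAIR, n), bound(*TRIPLE, n)
    winner = "pair" if p < t else "triple"
    print(f"N={n:2d}  pair {p:7.3f}  triple {t:7.3f}  smaller bound: {winner}")
print(f"the bounds cross at N = {crossover(PAIR, TRIPLE):.3f}")
```

The crossover comes out near N = 2.04. This is why the pair's advantage has nearly vanished by N = 2 (5.318 versus 5.336), and why the triple's bound is the smaller one from N = 3 on.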