The Best Starting Words For

                    W  O  R  D  L  E  !

I have some things to say about the word game "Wordle" that became
popular in 2022, along with related games that are based on (roughly)
the same dictionary (Dordle, Quordle, Octordle, etc.). I do like to
play them, but as a mathematician I wanted to do a thorough analysis
of some questions that arose as I played.

What I present here is (mostly) a discussion of the "good" sets of words
drawn from the original 2315-word Wordle vocabulary list. A set of
words is "good" if, when used as a starting set of entries in the games,
it enables a human to guess all the words in a small number of turns.
The starting set must help win all the "subgames" in a compound game;
the player then, in effect, usually switches to "hard mode" in each of
the subgames one at a time, guessing only words that are consistent
with the clues (the colored tiles that result); the player might
memorize extra rules to handle a few tricky situations.

I wish to find and to compare those initial lists of starting words.
I will use both exhaustive searches and clever optimization techniques
to find "discriminating" sets of words: sets for which each resulting
array of green/yellow/gray tiles matches relatively few vocabulary
words.  Rather than a summary of my own personal experience, this
document is intended to be a comprehensive review of sets of words
that are demonstrably better than the alternatives, according to a
variety of clearly-defined standards.

If I accomplish nothing else with these sets of words, at least I will
have generated some great passwords!

I am hardly the first person to apply a thorough mathematical analysis
to Wordle. Some open forums with information include Stack Exchange,
Reddit, and Discord. Laurent Poirrier has collected information about
"optimal" algorithms for playing the games, including results of Alex Selby. 
When applied to Wordle itself, those alternatives are more "efficient"
than anything I propose here, at least in the sense that playing the
way(s) I propose will take a higher average number of guesses.

But beginning with a fixed, good starting list is both simpler
for a human player (many fewer branching rules are required), and
better suited for the compound games, and those are the criteria
of interest to me. Using the solutions I present here, I will never
lose at Wordle, and can play hundreds of Quordle before losing
once. (I take an average of about 7.4 turns for Quordle; that
includes a lot of "user error".) Xan Gregg has
an analysis applicable to these compound games, although the focus
is on perfect play (rather than something a human can aspire to).


Please let me know of corrections or additions to this document.
-- dave
(rusin@math.utexas.edu)

    Index of sections:
  1. Some caveats
  2. Introduction: what are we doing in this document?
  3. The best starting sets of six (and more!), and why these are interesting
  4. Best starting quintuples and waltzing nymphs
  5. Interlude: What does it mean for a starting set to be "best"?
  6. Best starting quadruples: everyone can win at Wordle
  7. Best starting triples (by various measures):
    1. Completing by move 5
    2. Closest to free guessing
    3. Best situation before turn 4
    4. Guessing without strategy
    5. Guessing with strategy
    6. A few other good triples
    7. Summary table of best triples
  8. Best (and possibly best) starting pairs
  9. The best single word to start with
  10. Concluding remarks

==============================================================================

Some initial caveats first:

1. Except in section 9, all comments here are about playing Wordle in "easy mode". Any starting set containing more than one word will fail to satisfy the hard-mode rules on some days. (And I don't even know what "hard mode" would mean for a compound game.)

2. All my analyses are built upon the word lists in the version of Wordle that was a simple web page in February 2022 (before its purchase by the New York Times). In particular, (almost) all uses of the word "word" here mean "one of the original 2315 possible answers to a Wordle puzzle". (I do make a few comments below that refer to the larger, 12972-word, list of acceptable inputs to Wordle, but I have made little effort to update them in response to NYT's enlargement of that set in Summer 2022.)

I have gathered together a long list of comments about the word list(s) that I recommend to a person who actually wants to play the games well. It is important to know which words are, or are not, potential Wordle answers; in particular, the results quoted in this document assume that the player has perfect recall of the list of Wordle solutions! I believe this "practice site" uses the same word list as Wordle itself, and speedle offers it as an option; I recommend them for testing out the good word sets discussed in this document. Some of the compound games use slightly different word lists; these are discussed only briefly. (The game of Woodle also uses the same word list as Wordle's current list, but the mode of play is very different and will not be discussed in this document.)

3. In the original Wordle, the daily hidden words were presented in a particular (random-looking) order; since November 2022 they are chosen by a "curator" at the Times. Our model of the games assumes instead that the words in the word list are chosen at random, with uniform probability, to be hidden each time we play.
(This appears to be the mode of play in the compound games, at least in "practice mode", except that as far as I can determine the multiple subgames are guaranteed to have different hidden words.) One may therefore interpret probabilistic statements in a frequentist sense: in what fraction of the games in an entire 2315-day Wordle cycle does such and such an event occur?

4. In some cases I am stating claims of optimality or completeness. The proofs I give are mostly just sketches that can be fleshed out by the reader if interested. The only parts that do not amount to a simple case-by-case computer check have to do with the computation of covering sets (which I did with the linear-programming/optimization software Gurobi). I have written up a brief introduction to that technique, available here. The key ideas are (a) to cement ideas of "nearness" or "similarity" in the word list, (b) to identify sets of "most-similar" words that will be problematic late in the game, (c) to compute for each such set the collection of words to play that will help avoid these problematic sets, then (d) to find a "cover" for these collections: a set of words that intersects all or most of these collections.

==============================================================================

INTRODUCTION: HOW DO PEOPLE PLAY WORDLE?

Let's get the terminology straight before we discuss the good word sets. (You should, at the very least, read the end of this introduction!)

If you ask a researcher for the best way to play Wordle, they will present a decision tree: basically a list of if-then statements that specifies what to do at each stage in the game, based on the clues given by the day's hidden word. There must necessarily be thousands of rules, since for each of the more than two thousand possible hidden words the tree must have a separate terminal rule (a "leaf" on the tree) that says it's time to play that word, not to mention intermediate rules to be played mid-game.
Now ask a frequent Wordler their strategy and you'll get a variety of answers. "I just pick a random word to start with and run with it"; "I start with ADIEU to get a lot of vowels"; "I read somewhere that it's best to start with CRANE + SPILT". To me, these sound like only Phase 1 of a strategy: all these answers specify a certain number "a" of fixed starting words (a=0, a=1, and a=2 respectively). (I should note parenthetically that the second answer came from someone who, unlike me, is willing to guess a word that's not a Wordle answer-word, and the third answer came from someone who, like most of us, is playing Wordle's "easy mode".)

This Phase 1 is about gathering information about what the hidden word(s) might be. It is especially important for people who (like me) play the compound games, because the hope is that these first "a" turns will simultaneously reveal a lot of information about the multiple Wordle subgames.

But then comes Phase 2: how do we use the information gained? No human is going to memorize thousands of individual instructions! Maybe a few, to cover special cases ("If I still get all gray tiles then..."). But at some point, people start to enter guesses of what the hidden word might be; unlike using SPILT after CRANE, most people at some point in the game start to enter only words that are consistent with the clues from the previous turns (i.e., they unconsciously switch to something like Wordle's "hard mode"). They'll spend some number "b" of guesses in this mode trying to guess the right word. Unlike a decision tree, this number b is not fixed: given the same exact puzzle (same hidden word) on a different day, the player might make different guesses, maybe finding the hidden word sooner or later. (So actually b is a "random variable", in the parlance of statistics.) The goal is to have a+b no larger than 6, to win the game, but we might interpret that in terms of the expected value of a+b, or of its maximum value.
So humans mostly don't play as the researchers envision, and hence the results of most prior research are primarily of academic interest: they are not necessarily helpful to a human who is willing to learn only a few steps and rules. That's where this document comes in.

A person playing Wordle does not need to think about the distinction between Phase 1 and Phase 2. But it does become more important when playing compound games like Quordle (with N=4 subgames), in which we are in essence playing Phase 1 just once for all N of the subgames and then carrying out Phase 2 separately for each of them. Thus the total number of turns would be not a + b but rather a + N b. As N increases, it becomes more and more important to keep b small, even if it means a has to be a bit larger. In other words, we want to find sets of words for Phase 1 that are really good at providing clues about the hidden word(s), so that we spend very little time afterwards guessing words consistent with the clues left after Phase 1.

Considering the optimal solutions we find in this document, the expected number of turns needed to solve an N-fold compound game need not be higher than the lowest of these:

     6 + N
     5 + 1.0058 N
     4 + 1.0298 N
     3 + 1.1682 N
     2 + 1.6590 N
     1 + 2.8218 N

These are merely upper bounds, but the pattern is clear: smaller values of b come with larger values of a. So for sufficiently large values of N it can be more efficient to use a larger starting set.

----------------

Let's see how we can analyze just how good a starting set is. As we shall see, in order to compare different starting sets, it will be important to know not just what words a player starts with but also what exactly they will do after those starting words are entered, and what it is they value as the game progresses.

Let's follow one player whose Phase 1 has a=3: they use the three starting words LOATH + MURKY + SPINE. It's a good start! But now what?
Consider what this person might do when the colored tiles show up as in each of these examples. Notation: lower case = yellow tile = right letter, wrong place; UPPER CASE = green tile = right letter, right place; a dot = gray tile. Play along here: what would YOU do in each case?

            LOATH + MURKY + SPINE
      1)    .O..h   .u...   s...E
      2)    .....   .....   ....E
      3)    ..A..   ..rK.   ....E
      4)    ....H   .ur..   s....
      5)    l.A..   ...k.   ...N.
      6)    ...t.   ..r..   ..i.e
      7)    .o...   ..r..   .p..E
      8)    .....   ..r..   .pI.E
      9)    .o...   ..r..   ...n.
     10)    LOA..   m...Y   .....
     11)    ..a..   ..r..   ....e
     12)    ..at.   .u...   ...N.

I hope for Example #1 you decided the word was "HOUSE". You're right! That's the only word consistent with those clues, and you might as well enter it on your next turn and win.

Example #2 is much harder, but it turns out there is only one Wordle word that matches this pattern: "WEDGE". Most people, I think, would need a hint to figure this out; they might deliberately enter something that's not a Wordle solution word, or not consistent with the colored tiles, just to get some information about some more letters. That's fine, but of course it costs one turn. In this document, we will assume the player is perspicacious enough to spot the right word without hints when there is only one (and more generally we will assume the player can list all the possible words consistent with the hints). This is NOT realistic; from time to time we will discuss ways to make things easier for the player. But it illustrates why a person needs to really know the word list!

It turns out these first two examples are pretty representative of what this player will face: of all the words in the Wordle dictionary, 1364 (59%) are uniquely identified by the colored tiles that result from playing LOATH + MURKY + SPINE. But for the rest, the colored tiles only indicate that the hidden word is one of a "cluster" of similar-looking words.

In Example #3, there's a good chance you see the "_rake" and so you'd enter "brake" right away.
Again, not a bad plan, but it turns out "drake" is also a Wordle word. This is common: we think we know the word, so why not enter it? But then we discover the word we entered is only one of several possibilities. In this example we have a 50-50 chance of getting the word right, even if we *do* know the two possibilities.

The same would happen in Example #4: this time there's a better chance you recognize both "brush" and "crush" as possibilities, and they are the only ones, so what else is there to do but enter one or the other and hope for the best? Half the time you'll win on the first guess, half the time on the second.

With a bit of effort you might figure out that when Example #5 shows up, the word is either BLANK, CLANK, or FLANK. So what do we do now? Most people would probably enter these words one at a time, especially if at first only one or two of the possibilities come to mind. (Really? "clank"?!) If that's what you choose to do, you'll get the right word in either 1, 2, or 3 more turns, each with probability 1/3. Playing this way (simply entering the first matching word that comes to mind) we might call "guess-at-will" mode, or (since we have now slipped into playing a kind of "hard mode"!) we might call it "free-form hard mode".

The situation in Example #6 is a little different. The possible words now are REFIT, RIVET, and TIGER. But this is a better situation for the player than Example #5! No matter which of the three we choose to enter as our fourth word, if it's wrong we will get enough new information to tell which of the other two is the hidden word. So in reality we're playing the same way, but have better odds: still a 1/3 chance of winning on turn 4, but then a 2/3 chance of winning on turn 5.

This now leads us to look at Example #7: there are again three possible answers: GROPE, PROBE, and PROVE. But this time the situation is a mix of the previous two.
If we guess GROPE on turn 4, then we have no additional information to distinguish whether PROBE or PROVE is the right word. If instead we guess one of the other two, then we *do* get that information and can surely win on turn 5. So in this case, the player has two choices. It's simpler just to continue to guess whatever seems to fit, continuing in free-form hard mode. But it's more efficient to use a "guided hard mode", in which (in addition to memorizing the three starting words) the player memorizes that playing PROBE, when it's possible to do so, is the preferred thing to do.

In that last example, taking the effort to remember an additional rule has only a small payoff, but the same principle applies in more important cases. Example #8 shows an array that could signal any of GRIPE, PRICE, PRIDE, or PRIZE. It's quite possible that a player using free-form guessing would guess GRIPE first, which unfortunately would give no information about which other word is the hidden one (if it's not GRIPE itself); and then, no matter which of the other three words we try next, we never get more information about the remaining candidates when we guess wrong. In this case the player could definitely run out of turns and lose. By contrast, if the player takes pains to keep an eye out for PRIZE and play it when it's possible to do so, then he or she will definitely win by turn 6, whichever of the other three is the hidden word.

So our analysis of each starting word set will consider two situations separately: what will happen if the player simply guesses candidate answer words at random, versus what would happen if the player maps out a strategy that resolves a cluster of candidate solutions by (memorizing and) playing the most "efficient" of those candidates.

There's another strategy that a clever player can use, and it's again illustrated by Example #5 (the "_LANK" one).
For a player not committed to hard mode at all, there's no reason the player could not enter, say, BRACE as the fourth word. Depending on whether the B, the C, or neither gets a colored tile, the player knows right away which is the right word and can enter it as the fifth word, and win. So this manner of play (this "out-of-the-box thinking") can reduce the maximum number of turns that it will take to resolve a situation like Example #5. That can mean the difference between winning and losing!

Out-of-the-box mode can also reduce the average number of turns needed (i.e., the expected value of this random variable). Example #9 demonstrates this: there are five words that could possibly be that day's hidden word: BROWN, CROWN, DROWN, FROWN, and GROWN. It's pretty clear that both free-form and guided hard mode can take up to five turns to get the right answer, with the player losing the game. But if the player enters BADGE for the fourth word, for example, then there will be a clear signal whether or not BROWN, DROWN, or GROWN is the hidden word, which can then be entered to win on turn 5; otherwise the word is either CROWN or FROWN, and we can enter one of them on turn 5 and, if necessary, enter the other on turn 6 to win. So the maximum number of additional guesses needed drops from 5 to 3, and the expected number drops from 3.00 to 2.20 (three of the five words take 2 more turns, and each of the other two takes 2 or 3 with equal probability: (3*2 + 2*2.5)/5 = 2.20). That's a significant improvement, but it does come at the cost of the player having to memorize more steps in their algorithm (i.e., to remember that if the word could be BROWN, then it's best to play BADGE).

With more options to choose from, it's not surprising that we can often find out-of-cluster words that trim the set of candidates in a cluster better than in-cluster ("preferred") words do. In an attempt to use fewer turns, it's tempting to look outside the cluster more often. I choose not to do so when an in-cluster word is available, for two reasons.
First, the special rules to resolve a problematic cluster take half as much memory this way! Secondly, when used in the compound games, the preferred-word rules continue to be just as useful even when previously-solved subgames remove some candidates from a cluster. (For example, the rule "play PRIZE whenever it's a candidate" continues to be at least as good a move as making a random selection among the candidates, irrespective of how many candidates are left in the same cluster as PRIZE.) By contrast, using up a turn to enter an out-of-cluster word could be *less* efficient than choosing a random candidate after some of the candidates are removed (e.g., if DROWN and GROWN have already been eliminated, it is a waste of a turn to enter BADGE). So we will generally assume the player will NOT reach for an out-of-cluster word if at least one of the words in the cluster will lead to a guaranteed win.

In practice, particularly in compound games, humans may find it handy to employ those tactics when they can spot them on the fly (potentially even using words that are not on the shorter Wordle answer list), if they are close to running out of turns. But we will avoid discussion of such ad hoc strategies.

----------------

For a player who has just entered LOATH, MURKY, and SPINE, there are 1,715 possible ways the colored tiles can then appear. (We've just worked through 9 of them.) Fully 80% of them indicate precisely one word, and the player has an easy win on turn 4. But those other 20% can be tricky, as we have seen. It's easy enough to write a computer program to alert us to all of them and outline potential responses, but if we wish to answer the question of how good this starting set is, we have to know just *how* the player intends to proceed in those other 20% of the cases!

In my analyses, I will assume players play out "Phase 2" in one of two ways:

(1) Simply use a guess-at-will strategy.
With a six-turn limit, that may mean accepting the possibility of a loss; we might want to compute that probability, and then value most highly the starting sets of words that keep this probability as low as possible. Or we can imagine the freedom to continue playing as many turns as needed until victory, and then ask for the probability distribution: what is the probability that the player will win after 1, 2, 3, ... additional turns? From that we can compute the expected number of turns until a win; we would then seek starting sets that minimize that expected value.

(2) Use the same guess-at-will strategy in general but, by pre-computing strategies for the possible tricky situations, play a "preferred" right answer (like PRIZE for Example #8) when necessary to ensure a win by turn 6. And (only) if no such preferred word exists, play an "out-of-the-box" solution (like BADGE when BROWN is indicated by the clues, in Example #9). In either case, I would expect the player to revert to free-form guessing in the very next turn. Once the set of rules for each exceptional cluster is in place, we can again compute a probability distribution. We usually look for the starting sets that use the fewest turns on average.

To repeat: yes, one can guess the hidden word faster by using additional rules to play more in- and out-of-cluster words; but the point of this document is to discuss *simple* algorithms to play the games! So we will only analyze sets of such rules that refer ONLY to those clusters that cannot be resolved by randomly guessing their members.

There can also be situations in which neither a "preferred" in-cluster word nor an out-of-the-cluster solution exists.
That turns out not to happen with LOATH + MURKY + SPINE, but it occurs, for example, with LEARN + STICK + DOUGH: when the hidden word could be "batch", it could also be any of

     batch, catch, hatch, match, patch, watch

It turns out that no matter what word you enter for your fourth turn, from the entire Wordle answer list, you might STILL have to choose from among a set of at least 3 words on your fifth and sixth turns; then you might guess wrong and lose. (In fact, just this once, I even checked all 14,853 currently-allowed Wordle input words as candidates for the fourth turn, and every one of them leaves a set of three or more from which a player might have to guess on the last two turns. I have to say I was surprised by this!)

That doesn't mean the player who starts with LEARN + STICK + DOUGH cannot win by the 6th turn, but now he or she must use BOTH turns 4 and 5 to gain enough information to be able to enter the correct word confidently, finally, on turn 6. For example, the player would have to know in advance to enter (say) BAWDY on turn 4 and CHAMP on turn 5. So that's another rule this player has to remember:

     BATCH -> BAWDY + CHAMP

Using it, the player will get enough information to know what to enter for turn 6. Even a reasonably good starting set might need a "two-word strategy" like this for a couple of its most problematic clusters. With additional effort we might identify a pair of words to be entered that fixes the problem; or alternatively we can identify some "preferred" words in the sub-clusters that result after turn 4, to be played on turn 5. The very best starting sets avoid all this messiness, but we will describe a few examples in which these techniques are necessary or useful.

In practice, a player using a prescribed recipe like this might sometimes choose to go rogue. Consider the player using LOATH + MURKY + SPINE who gets the colored-tile pattern in Example #10.
It is already clear after the first two words have been entered that the word must be LOAMY, so there is no point in entering SPINE. This example is especially obvious, but more generally there is no point in entering SPINE if the first two words have already revealed five different letters in the hidden word: since SPINE does not repeat any of the letters in LOATH and MURKY, we know in advance that the response to SPINE would just be five gray tiles, giving us no new information. Even if only some colored tiles had shown from LOATH and MURKY, we could probably skip entering SPINE if we already have "lots" of information about the hidden word. The point is that in such cases it is likely that there are only very few words (maybe just one) that fit these unusually helpful sets of clues. A dedicated player might even make a list in advance of any problematic situations that could arise from skipping SPINE when there are (say) four colored tiles from LOATH and MURKY, and then devise separate strategies for those cases.

We will not pursue this line of inquiry very far because it deviates from both principles we declared at the outset. On the one hand, it leads to algorithms for play that increasingly involve branching (so they're not simple). And on the other hand, it's less useful in a compound game. To illustrate, suppose a person is playing Dordle (N=2) and after just LOATH and MURKY are entered the player sees LOA.. + m...Y in one subgame but ..... + ..... in the other. Surely they can enter LOAMY next, to win the first subgame, but this will give no new information in the other, so inevitably the player will use SPINE anyway. More generally, it only makes sense to abandon the intended list of starting words if *all* of the subgames have already given abundant clues in response to the first couple of words. This certainly can happen, especially in Wordle itself (N=1), but it becomes increasingly rare as N increases.
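The cluster arguments in Examples #5 through #9 can be checked mechanically. What follows is a minimal Python sketch, not the program behind this document's numbers: `tiles`, `split`, `expected_guesses`, and `expected_after_probe` are hypothetical names, and guess-at-will play is modeled (an assumption) as picking uniformly at random among the cluster words still consistent with every clue so far. Tiles are written in this document's notation (UPPER = green, lower = yellow, dot = gray); a repeated letter is marked yellow only as often as unmatched copies remain in the answer.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from functools import lru_cache


def tiles(guess: str, answer: str) -> str:
    """Tiles for one guess: UPPER = green, lower = yellow, '.' = gray."""
    guess, answer = guess.lower(), answer.lower()
    result = ["."] * len(guess)
    unmatched = Counter()  # answer letters not already matched by a green tile
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = g.upper()
        else:
            unmatched[a] += 1
    for i, g in enumerate(guess):
        # A letter turns yellow only while unmatched copies remain in the answer.
        if result[i] == "." and unmatched[g] > 0:
            result[i] = g
            unmatched[g] -= 1
    return "".join(result)


def split(probe: str, candidates) -> list:
    """Group candidate answers by the tiles the probe word would produce."""
    groups = defaultdict(list)
    for word in candidates:
        groups[tiles(probe, word)].append(word)
    return sorted(groups.values(), key=lambda g: (len(g), g))


@lru_cache(maxsize=None)
def expected_guesses(cluster: frozenset) -> Fraction:
    """Expected guesses to find a uniformly random answer in `cluster` when each
    guess is drawn uniformly from the words still consistent with the clues."""
    n = len(cluster)
    total = Fraction(0)
    for guess in cluster:  # each candidate is equally likely to be guessed
        rest = [w for w in cluster if w != guess]
        for group in split(guess, rest):  # answer lands in `group` w.p. len(group)/n
            total += Fraction(len(group), n) * expected_guesses(frozenset(group))
    return 1 + total / n


def expected_after_probe(probe: str, cluster) -> Fraction:
    """One turn on a fixed probe word, then guess-at-will in the sub-cluster left."""
    n = len(cluster)
    rest = [w for w in cluster if w != probe]  # if the probe is the answer, we are done
    extra = sum(
        (Fraction(len(g), n) * expected_guesses(frozenset(g)) for g in split(probe, rest)),
        Fraction(0),
    )
    return 1 + extra


rown = ["brown", "crown", "drown", "frown", "grown"]  # Example #9
print(split("badge", rown))  # [['brown'], ['drown'], ['grown'], ['crown', 'frown']]
print(expected_guesses(frozenset(rown)), expected_after_probe("badge", rown))  # 3 11/5

pri_e = ["gripe", "price", "pride", "prize"]  # Example #8
print(split("gripe", pri_e[1:]))  # [['price', 'pride', 'prize']]: GRIPE tells us nothing
print(split("prize", pri_e[:3]))  # [['gripe'], ['price', 'pride']]
```

Exact `Fraction` arithmetic is used so that small expected values like these can be compared without rounding error.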

----------------

To complete this introduction, we can now summarize the prospects for the player who begins with LOATH + MURKY + SPINE.

First, we can describe the situation immediately after those three words are entered. We start by observing that the player is given a good set of hints by the colored tiles that result from this starting triple. It turns out that every hidden word will yield at least one colored tile; the word WEDGE mentioned earlier is the only one that gives only a single green tile, and only CIVIC and VIVID do worse by giving only one yellow tile. At the other extreme, about 18% of the words give five colored tiles; the average over the 2315-day cycle is that we will get 1.35 green tiles and 2.41 yellow ones. So this triple starts us off with pretty generous help with constructing the hidden words.

Now, if the player has learned the list of solution words, then each possible arrangement of colored tiles flags for them a cluster of potential answer words. We can count how many clusters there are that consist of 1, 2, 3, ... words; that gives us the cluster vector

     [1364, 227, 67, 30, 13, 4, 2, 3, 3, 1, 0, 1]

So there are 1364 words that can immediately be guessed with confidence, 227 cases that require a coin flip, etc. These numbers sum to 1715, the total number of clusters. The fact that there are 12 numbers here tells us the largest cluster has 12 words; in fact that cluster is the set of possible solutions to Example #11:

     bread, cedar, debar, dread, eager, gazer, racer, rebar, wafer, wager, waver, zebra

Generally speaking, it's better to have short cluster vectors, with entries dropping in size as quickly as possible from left to right. In another section we will discuss some ways to make that idea precise; each such encapsulation can lead to a different way to describe one starting set as "better" or "worse" than another, and so each can lead to a different conclusion about which is "best".
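A cluster vector like the one above comes from grouping every answer word by the full array of tiles the starting words produce. The following Python sketch shows that grouping under stated assumptions: `clusters` and `cluster_vector` are hypothetical names, inputs are lowercase, and the sample is a six-word toy subset rather than the 2315-word answer list, which is not reproduced here. The last lines only check that the vector quoted above accounts for 1715 clusters and 2315 words.

```python
from collections import Counter, defaultdict


def tiles(guess: str, answer: str) -> str:
    """Tiles for one lowercase guess: UPPER = green, lower = yellow, '.' = gray."""
    result = ["."] * len(guess)
    # Answer letters not matched by a green tile; each can produce one yellow tile.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = g.upper()
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = g
            unmatched[g] -= 1
    return "".join(result)


def clusters(starting_set, answers):
    """Group answer words by the combined tiles produced by all the starting words."""
    groups = defaultdict(list)
    for word in answers:
        groups[" ".join(tiles(g, word) for g in starting_set)].append(word)
    return groups


def cluster_vector(starting_set, answers):
    """Entry k-1 counts the clusters that contain exactly k words."""
    sizes = Counter(len(words) for words in clusters(starting_set, answers).values())
    return [sizes[k] for k in range(1, max(sizes) + 1)]


start = ("loath", "murky", "spine")
sample = ["house", "wedge", "brake", "drake", "brush", "crush"]  # toy subset only
print(cluster_vector(start, sample))  # [2, 2]

# Consistency checks on the vector quoted for the full answer list:
quoted = [1364, 227, 67, 30, 13, 4, 2, 3, 3, 1, 0, 1]
print(sum(quoted), sum(k * n for k, n in enumerate(quoted, start=1)))  # 1715 2315
```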
To analyze further, we have to know which of the two modes of play the person will pursue after this starting triple is entered.

(1) The guess-at-will mode cannot guarantee this person success. A player who enters these three starter words and then follows this strategy will sometimes take many turns to win. We can compute the distribution of the percentages of the time that the hidden word is found after 1, 2, 3, ... additional turns; this is the probability vector, and for this starting triple I calculate it to be

     [0.740821, 0.211184, 0.038993, 0.007789, 0.001194, 0.000019]

The fact that it's six numbers long reflects the fact that an unlucky guesser could take as many as six more guesses (nine turns total) to find the hidden word, even if properly following all the additional clues that come from earlier incorrect guesses, and even if only guessing legitimate Wordle answer words. From this vector we can also easily compute that there is a 0.9002% chance of losing the game (the sum of the last three entries, each of which needs more than three additional turns), and that the expected number of turns needed to find the hidden word is 4.317. (That's a=3 starting words plus an expected value of b=1.317 additional turns, the sum of k times the k-th entry.) Again, a good starting set would give a probability vector that's short and front-loaded. We will discuss a few different ways to turn that idea into a precise metric.

(2) On the other hand, the player who does not want to face defeat (but still wants to begin with LOATH + MURKY + SPINE) has the option of using special rules for tricky situations. As it turns out, of the 1715 clusters, only 25 of them can lead to defeat if we just randomly guess any words consistent with the clues. Thirteen of these clusters will still give a victory if we play a "preferred" word in the cluster of consistent words (like PRIZE), but the other 12 require an out-of-cluster word (like BADGE) to ensure success. (See e.g. Example #12, which could indicate any of daunt, gaunt, jaunt, taunt, or vaunt as an answer.
But the player could enter JUDGE on turn 4 and then be sure of a victory after just 1 or 2 more turns.) We can summarize these additional rules by simply listing the 13 (in-cluster) preferred words

     {algae, allot, aware, baggy, bevel, biddy, bitty, bobby, boxer, bread, brief, budge, gripe}

and the 12 ordered pairs

     [baste, twice], [batch, bawdy], [batty, aback], [billy, bawdy],
     [bully, badge], [baker, aback], [brown, badge], [daunt, judge],
     [coyly, fjord], [cider, ridge], [after, creed], [breed, befit]

This means: play any preferred word that might be consistent with the clues provided by the starting triple; and for each ordered pair [WORD1, WORD2], play the (out-of-cluster) word WORD2 if the word WORD1 is consistent with the clues. (Checking WORD1 amounts to identifying the cluster in which the hidden word lies.) In any other case, and after using these special rules for turn 4, just keep guessing any word that's consistent with the clues up to that point.

For players following this strategy, I computed the probability vector

     [0.735637, 0.228790, 0.035572]

from which we easily deduce an average of 4.300 turns to win (and a 100% chance of winning by the sixth turn). This represents an improvement over the guess-at-will strategy, but comes at the expense of having a more complicated set of rules of play.

(A final variant strategy would be to take nothing to chance and to map out in advance the best moves for each cluster, considering any words for moves 4, 5, 6 that allow the player to win for certain by move 6. This would allow the player to lower the expected number of moves just a bit, to 4.2894. But to do so would mean memorizing an even longer list of pre-computed branching decisions, and that is antithetical to the goals for this paper that we described at the outset!)

So in the end, is LOATH + MURKY + SPINE a good Phase-1 strategy for Wordle? That's a matter of opinion.
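The headline figures for the two modes of play follow directly from their probability vectors. Here is a minimal Python sketch of that arithmetic; `summarize` is a hypothetical helper, and it assumes (as the text does) that entry k of a vector is the probability of finding the word on the k-th turn after the a=3 starting words, with a loss meaning more than six turns in total.

```python
def summarize(prob_vector, a=3, turn_limit=6):
    """Return (expected total turns, probability of needing more than turn_limit turns).

    prob_vector[k-1] is the probability that the hidden word is found on the
    k-th turn after the `a` fixed starting words.
    """
    b = sum(k * p for k, p in enumerate(prob_vector, start=1))
    loss = sum(p for k, p in enumerate(prob_vector, start=1) if a + k > turn_limit)
    return a + b, loss


guess_at_will = [0.740821, 0.211184, 0.038993, 0.007789, 0.001194, 0.000019]
with_rules = [0.735637, 0.228790, 0.035572]

for name, vec in [("guess-at-will", guess_at_will), ("with 25 rules", with_rules)]:
    total, loss = summarize(vec)
    print(f"{name}: {total:.3f} turns on average, {100 * loss:.4f}% losses")
# guess-at-will: 4.317 turns on average, 0.9002% losses
# with 25 rules: 4.300 turns on average, 0.0000% losses
```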
Memorizing the 13 "preferred" words, plus recognizing the other 12 difficult clusters and remembering the out-of-cluster word that resolves each of them, might be a bit much. Figuring them out on the fly is not impossible, but taxing. Sticking instead with mode (1) is simple, and if you're a gambler who is a "lucky guesser", that may be sufficient. And, well, maybe it's more fun too, even if it means the occasional loss. It's a personal decision, of course. But we can provide comparable data for other word sets, and let each player decide separately.

----------------

And how good is LOATH + MURKY + SPINE as an opener for the compound games? Those probability vectors apply only to Wordle itself, but they point to some statistics for the compound games too. If the player is playing a compound game built from N Wordle subgames, then (if the game permits sufficiently many turns) the player will first enter the a=3 starting words, and then in each of the N subgames can expect to require b=1.317 additional turns to win by using a guess-at-will strategy (or 1.300 additional turns by memorizing what to do with the 25 anomalous clusters). That would mean the expected total number of turns is 3 + 1.317 N. Well... it *would* mean this expected number of turns if (after the three starter words) the subsequent guesses were applied to only one subgame at a time. In the games I know, this is not how the games are played -- instead, the words entered for the first subgame might give additional clues in the second and later subgames. So the expected number of turns is surely smaller than 3 + 1.317 N. Nonetheless, this gives an upper bound on the expected number of turns for e.g. Quordle, and more broadly shows the relative importance of the two phases of Wordle-solving. For Wordle itself (N=1), all that matters is the combined number of turns taken by the fixed initial guesses and the guess-at-will phase together.
But for increasingly large values of N, the size "a" of the starter set (here, a=3) becomes less important than the set's effectiveness in approaching a solution (measured here by the coefficient b=1.317). For example, we will see later that there is a four-word starting set that has a probability vector of [0.9702, 0.0298], so the expected number of turns is 4 + 1.0298 for Wordle and no more than 4 + 1.0298 N for a compound game. Already for Quordle (N=4) this is a smaller number than 3 + 1.317 N (about 8.12 versus 8.27). So for small-N games, we might expect LOATH+MURKY+SPINE to be better than the four-word starting set, but for large-N games the opposite is likely true. (More generally: for Wordle itself (and the smaller compound games) it may be inefficient to fix more than one or two starting words to be used every day, but for the larger compound games, it may indeed be more efficient to begin with a starting set of three or four words.) Also note that, while we described a way to turn LOATH + MURKY + SPINE into a 100%-winning strategy for Wordle, it doesn't guarantee success for the compound games. Our refined strategy (2) will surely win Wordle with no more than 3 turns after the starting set, but for an N-fold compound game that means the maximum number of turns needed could be 3 + 3N, in the (rare) case that all the subgames force the player to use three additional guesses to discover the word. (Not only would this happen at most .0356^N of the time, according to an earlier paragraph, but it would require that none of the words entered to complete any subgame offer any succor in any of the other subgames -- a very rare situation!) How else could one offer more information about how the strategies will fare in the compound cases? Surely failure is possible for N>1 even if it is impossible for N=1.
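The arithmetic of the last few paragraphs (expected turns, loss rate, and the a + b N comparison) can be sketched in a few lines. This is a minimal sketch, assuming a single Wordle game is lost when it needs more than six turns, and treating compound subgames one at a time as the text's upper bound does; the function names are mine.

```python
def summarize(prob, a, max_turns=6):
    """Expected total turns and loss probability for one Wordle game.

    prob[k] is the probability that the hidden word is found on the (k+1)-th
    turn after the `a` starting words have been entered.
    """
    b = sum((k + 1) * p for k, p in enumerate(prob))          # expected extra turns
    loss = sum(p for k, p in enumerate(prob) if a + k + 1 > max_turns)
    return a + b, loss


def compound_bound(a, prob, n):
    """a + b*N: expected turns if the N subgames were finished one at a time."""
    b = summarize(prob, a)[0] - a
    return a + b * n


TRIPLE = (3, [0.740821, 0.211184, 0.038993, 0.007789, 0.001194, 0.000019])  # LOATH+MURKY+SPINE
QUAD = (4, [0.9702, 0.0298])   # the four-word set mentioned above

turns, loss = summarize(TRIPLE[1], TRIPLE[0])
print(f"{turns:.3f} turns on average, {loss:.4%} losses")   # 4.317 turns on average, 0.9002% losses
for n in (1, 2, 3, 4, 8, 16):
    print(n, round(compound_bound(*TRIPLE, n), 3), round(compound_bound(*QUAD, n), 3))
```

In the printed table the triple's bound is smaller for N up to 3 and the quadruple's from N=4 on, matching the Quordle remark above.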
Presumably one could (at least for very small values of N) itemize a catalogue of the *combinations* of tile patterns in the subgames that could lead to a loss and perhaps find ways to circumvent them, as we did above with "preferred" and "out-of-the-box" moves, but I have not tried to do so. I have run computer simulations of thousands of randomly-selected Quordle games to see how the different starting sets compare, but it is not clear how representative these are, since there are over one trillion different Quordle games, and many more for Octordle, etc.

----------------

To summarize all this notation: in this document we will analyze some sets of "a" Wordle words to be entered at the start of an "N"-subgame compound Wordle-like game. Based on the colored tiles they would yield, the 2,315 Wordle answer words will be split into "clusters", the sizes of which are recorded in the "cluster vector" (its k-th entry is the number of clusters containing exactly k words). On any one day of play, the player would enter the starting word-set; the resulting colored tiles tell us the cluster in which the hidden word is contained. The player will try to discover which word in the cluster is the hidden word, either by entering candidates at random, or by using a memorized "preferred" member of the cluster, or by using a pre-computed "out-of-the-box" word that splits the cluster into smaller clusters (or, in extreme cases, by using the idea of "preferred" words recursively, or by entering some pre-determined words over the next *two* turns to resolve ambiguity). Knowing the starter set and the intended mode of play (and the particular preferred or out-of-cluster words to be used), we can calculate the "probability vector" of possible lengths of the game. In turn, that allows us to compute the expected number of turns and the expected rate of failure for this starting set of words.
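The notions of cluster and cluster vector can be made concrete in a few lines. This is a minimal sketch, assuming the usual Wordle coloring rule for repeated letters (greens are assigned first, then yellows left to right, each unmatched answer letter used at most once); the short answer list is a stand-in, since the 2315-word list is not reproduced here.

```python
from collections import Counter, defaultdict


def feedback(guess, answer):
    """Tiles for one guess: 'G' green, 'Y' yellow, '-' gray.

    Greens are assigned first; each answer letter not already matched by a
    green can then turn at most one other copy of that letter yellow,
    scanning the guess from left to right.
    """
    tiles = ["-"] * 5
    unused = Counter()                     # answer letters not matched by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "G"
        else:
            unused[a] += 1
    for i, g in enumerate(guess):
        if tiles[i] == "-" and unused[g] > 0:
            tiles[i] = "Y"
            unused[g] -= 1
    return "".join(tiles)


def clusters(starters, answers):
    """Group the answers by the full set of tiles the starting words produce."""
    groups = defaultdict(list)
    for answer in answers:
        key = tuple(feedback(word, answer) for word in starters)
        groups[key].append(answer)
    return list(groups.values())


def cluster_vector(starters, answers):
    """Entry k-1 is the number of clusters containing exactly k answers."""
    sizes = Counter(len(group) for group in clusters(starters, answers))
    return [sizes[k] for k in range(1, max(sizes) + 1)]


# A tiny stand-in for the 2315-word answer list.
sample = ["daunt", "gaunt", "jaunt", "taunt", "vaunt", "judge"]
print(cluster_vector(["loath", "murky", "spine"], sample))   # [1, 0, 0, 0, 1]
```

The five "-aunt" words land in one cluster (Example #12), while JUDGE is on its own. Run over the full answer list, the same functions should yield cluster vectors of the kind quoted throughout this document, though I have not rerun them against it.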
This gives us many potential metrics by which to say one starting set is better or worse than another: we can combine some statistics from the cluster vector and the probability vector, and if we are using extra rules to ensure a win, we can count how many there are and how complicated they are. We can combine these measurements by any formula that appeals to us -- maybe tossing in other measures too. (How likely is it that there will be very few colored tiles, in which case the player might need to waste a turn getting a hint? How likely is it that there will be so many colored tiles after the starting set is partially played that we can jump early into a guess?) In this document we will choose to balance the many measurements in a few ways; the reader is invited to do so differently.

Very well, then, let's look at some very good starting sets of Wordle words. We start with sets of a=6 words, then progress down to smaller values of a.

==============================================================================
SIX (AND UP)

The six-word set [catty, frond, rumba, spill, verge, whack] is nearly perfect for playing these games. It completely distinguishes all 2315 Wordle words (despite not including j, q, x, or z!). That is, any two different hidden Wordle words will generate different patterns of green/yellow/gray tiles when these six words are entered. So if we enter these words as Phase 1, then Phase 2 is simply: enter the hidden word and win; there is no need to wonder about the player's behaviour. There is no guessing or branching in this routine. After the a=6 starting words are entered, the probability of finishing on the very next turn is 100%. So this six-word set is nearly ideal for the N-fold compound Wordle games: all of them will be completed successfully in N+6 turns, 100% of the time. Sadly, most of these games (Wordle itself is the case N=1, then Dordle, Quordle, etc.) 
allow only N+5 turns to finish the game, so this starting set has a 0% chance of actually winning the game. One exception is Octordle's variant that requires the player to solve 8 simultaneous Wordle subgames *in order*; because of the extra level of difficulty, that game allows the player 15 guesses to try to win. This starting set of 6 words permits the player to win every such game in only 14 guesses! (Here I use the fact that all the answer words for Octordle are among the 2315 Wordle answers.) *Almost* serving as another application is Sexaginta-quattuordle, which gives the player N+6=70 turns to guess N=64 words --- just enough to use this starting sextet. In fact, starting with this sextet gives an excellent way to play this compound game, since then the player need only scan the first six rows of the crowded display for each subgame. Unfortunately, the word list for 64ordle is significantly larger than that of Wordle. So in 64ordle there are pairs that are not distinguished by this sextet:

    edged, egged      sided, sized      dozed, oozed
    boded, boxed      dazed, jaded      waded, waxed

including some pairs involving Wordle words (marked *):

    unzip*, unpin     bonus*, bosun     mummy*, yummy     organ*, argon

so the player would not be assured a win with this sextet. In fact no sextet will suffice for the full set of 64ordle words. I suspect there are many septets that will suffice to distinguish every one of the game's answer words, but with only 70 turns allowed, a starting septet will not allow the player a victory. The sextet also definitely fails (slightly) for Dordle, whose solution list includes the words UNPIN, YUMMY, and ARGON; the sextet does not distinguish these from UNZIP, MUMMY, and ORGAN, respectively. Since the Dordle word list also *deletes* some of Wordle's words, there may be a different discriminating sextet for Dordle. I have not looked for one.
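Claims like "the sextet does not distinguish these pairs" can be checked directly by comparing the tiles each hidden word would produce. A minimal sketch, assuming the usual Wordle coloring rule for repeated letters; the helper names are mine. It only confirms the specific pairs quoted above (plus one pair the sextet does separate); verifying that the sextet splits all 2315 answers would need the full answer list, which is not reproduced here.

```python
from collections import Counter

SEXTET = ["catty", "frond", "rumba", "spill", "verge", "whack"]


def feedback(guess, answer):
    """'G' green, 'Y' yellow, '-' gray; greens first, then yellows left to right."""
    tiles = ["G" if g == a else "-" for g, a in zip(guess, answer)]
    unused = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, g in enumerate(guess):
        if tiles[i] == "-" and unused[g] > 0:
            tiles[i] = "Y"
            unused[g] -= 1
    return "".join(tiles)


def signature(starters, answer):
    """Everything the player sees after entering the whole starting set."""
    return tuple(feedback(word, answer) for word in starters)


def indistinguishable(starters, w1, w2):
    return signature(starters, w1) == signature(starters, w2)


for pair in [("edged", "egged"), ("unzip", "unpin"), ("bonus", "bosun"),
             ("mummy", "yummy"), ("organ", "argon")]:
    print(pair, indistinguishable(SEXTET, *pair))    # True for each pair
print(indistinguishable(SEXTET, "daunt", "taunt"))  # False: CATTY separates them
```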
Of course, this discriminating sextet also works for those compound games whose solution set is a subset of Wordle's: Quordle, Octordle, and Duotrigordle. But it's of no practical value since those games do not allow N+6 words to be entered. It is comparatively easy to get more 6-word sets that *almost* split the whole wordset, and thus it is easy to get many sets of 7 that do. But, astoundingly, it turns out that this six-word set is the UNIQUE sextet of Wordle answer-words that splits the Wordle dictionary into singletons in the way I have described! (I have to say I was very surprised by this.) A novice player who wants to practice recognizing Wordle words might want to play with this set, since it allows only one Wordle-correct word to be built from any set of clues. You could, for example, enter these words into the sequential version of Sedecordle (because it allows you to enter so many words) and then practice recognizing Wordle words. Of course, even when playing this sextet, the player still has to do some thinking to *recognize* the hidden word each day; knowing that it is unique, and knowing a few letters in it, are not quite the end of the story. Of the 30 letter tiles shown after the six words are entered, the player may see no greens at all and as few as four yellow tiles (both {b,i,n,o} and {i,n,p,u} can occur) and it can take some effort to realize the hidden words are "inbox" and "unzip". If you want to have a set of words that has the same property as the Magic Six, but includes all 26 letters, you'll need at least eight starting words. One such solution is [cross, equip, expel, flack, jumbo, razor, vodka, wight] How much help do you need to construct the hidden word? We can even test for each letter that's ever doubled, though to do so you'll need at least 17 starting words, e.g. 
[affix, booby, ditto, jazzy, kappa, kayak, mimic, occur, penny, piggy, queue, radar, shush, slyly, undid, vivid, widow] Oh, there are some tripled letters too; if you want those flagged as well you'll need at least 20 words, e.g. [bobby, cocoa, daddy, error, fluff, heath, jazzy, knack, leggy, mamma, melee, ninny, pixie, puppy, queue, sassy, slyly, tatty, vivid, widow] At that point, you know not only which letters appear but how many of each there are! With 20 words entered, Sedecordle is leaving you space for just one guess, but you have literally nothing to do but to permute any letters in yellow! (On average, 3.74 of the tiles are already green; only abort, acorn, adorn, avian, axial, offal have no green tiles and thus force you to consider all 120 permutations of the five yellow tiles.) I even offer a starting word-set that relieves the player of all thought! We do so by finding an (optimal) solution to the game Kilordle, which requires solving N=1000 Wordle games at once. (In Kilordle, it is not necessary to *enter* each correct word, merely to get a green tile in each of its five columns. As an additional assist, the 1000 subgames are sorted to present first the ones that are closest to completion by some metric, with the completed subgames removed from view.) But in fact we can ignore the given subgames completely! Just treat this as requiring a list of words that contains each letter in each position -- 130 tasks. Actually 5 of the tasks are never presented in a Wordle game (e.g. there is no word with an x as the first letter) so we can always win by entering no more than 125 words. When solving Kilordle manually, I typically need to enter about 100 words. On the other hand, at least 26 words would be necessary because we would, among all the subgames, eventually need to enter every letter in column 2. So the minimum number of words needed, to be sure to solve every round of Kilordle, is between 26 and 125. 
Using optimization software I discovered that the minimum number of words is actually 35. A sample solution is [above, affix, askew, banjo, bayou, civic, debug, eject, epoxy, ethos, evoke, extra, fritz, globe, howdy, igloo, imply, jazzy, known, leggy, maxim, nymph, ozone, pique, quasi, rajah, scrub, skimp, squad, tweak, udder, vinyl, whiff, yacht, zesty]. So not only does this set of words solve every game of Kilordle, it gives a "simple" way to solve Wordle, too: just enter all 35 of these words, and then locate the green tiles in each column to form a 5-letter word! Of course, we're now waaay past Wordle's 6-turn limit... Other 35-word Kilordle solutions exist, but all of them must contain "pique", which is the unique word having q in position 3. They must likewise all contain "bayou" (u5), and either "banjo" or "ninja" (j4), "azure" or "ozone" (z2), "eject" or "fjord" (j2), etc. I have already fiddled with the list to remove words I didn't care for. ("squib"? "waxen"? "twixt"?) I'm not sure what I would consider to be the most "normal-sounding" list of 35 words. If you don't mind using words like "embog" and "jambu", then the minimum drops to 30, using the 12000-word list of possible Wordle inputs. A solution was posted to Reddit by user "k3and". (As the Times increases the pool of acceptable input words, the size of a minimal winning Kilordle set can decrease.) Mathematicians might want to click here for a short description of how I first found the 6-word set; you'll probably want to read about measures of word similarity first. The point is that we can talk in a meaningful way about what it means for two words to be "close" to each other. (Mathematically, we can impose a metric on the set of these words, and all our searches for optimal word sets focus on finding words that are within a small distance of each other, then ensuring that Phase 1 leaves us with ways to distinguish those words.)
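The Kilordle covering condition is easy to check mechanically, even though finding a minimum took an optimizer. A sketch, assuming only the task model described in the text: a word completes the task (letter, column) by having that letter in that column. It confirms that the 35-word sample covers 125 of the 130 pairs and that its column 2 alone contains all 26 letters; the five pairs it leaves uncovered should be the five tasks the text says never arise. The function names are mine.

```python
import string

KILORDLE_35 = [
    "above", "affix", "askew", "banjo", "bayou", "civic", "debug", "eject",
    "epoxy", "ethos", "evoke", "extra", "fritz", "globe", "howdy", "igloo",
    "imply", "jazzy", "known", "leggy", "maxim", "nymph", "ozone", "pique",
    "quasi", "rajah", "scrub", "skimp", "squad", "tweak", "udder", "vinyl",
    "whiff", "yacht", "zesty",
]


def covered(words):
    """(letter, column) tasks that a word list completes; columns numbered 1-5."""
    return {(ch, col) for word in words for col, ch in enumerate(word, start=1)}


def uncovered(words):
    """The remaining tasks among all 26 x 5 = 130 (letter, column) pairs."""
    every_task = {(ch, col) for ch in string.ascii_lowercase for col in range(1, 6)}
    return every_task - covered(words)


print(len(KILORDLE_35), len(covered(KILORDLE_35)))   # 35 125
print(sorted(uncovered(KILORDLE_35), key=lambda t: (t[1], t[0])))
# [('x', 1), ('q', 4), ('j', 5), ('q', 5), ('v', 5)]
```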
The claims of minimality for the 8-, 17-, 20-, and 35-word sets are proved by covering-set arguments and computations using Gurobi.

==============================================================================
FIVE

With five 5-letter words we can hope to include all but one letter of the English alphabet, and sure enough this is possible. One example that comes immediately to mind contains all the letters but j: (5-1) [waqfs, vozhd, blunk, cimex, grypt]. Hah hah, just kidding, that's a bit ridiculous. Not one of those five words is in the Wordle wordlist, although all five of them are accepted as input in a Wordle game. We can get as many as three wordlist words into the set and still have 25 letters: (5-2) [waltz, fjord, chunk; vibex, gymps]. (This one misses only q.) But having two words outside the basic Wordle wordlist is the provable(*) minimum if you hope to include 25 of the 26 letters, and the only two such sets are that one and (5-3) [waltz, fjord, nymph; vibex, gucks]. Neither of these is particularly great as an opening play in Wordle. Their cluster vectors are [2242,30,3,1] for (5-2), which cannot distinguish {puree, purer, rupee, upper}, and [2221,41,4] for (5-3). So neither will guarantee you a win of the game in 6 turns.

[ (*) UPDATE: I re-verified this after the NYT increased the set of Wordle inputs. Simply compare each quadruple of Wordle words using 20 distinct letters with the new list of acceptable Wordle inputs: in every case, every acceptable input shares at least one letter with the quadruple, so no fifth word can bring the total to 25. I'm waiting for them to decide that "gveck" is a word; then [fjord, nymph, squib, waltz; gveck] will be a 25-letter quintuple that uses only one word outside the Wordle answer list. Alas, until "gveck" becomes a real word, we will have to use sets of SIX Wordle words if we want to cover 25 -- or all 26 -- letters of the alphabet; "gecko" and "vixen" cover "gveck"+"x".]
A five-word set with 25 distinct letters is impossible if it includes zero or one non-wordlist words; the best we can do is 24 distinct letters. Before I turn to the 24-letter sets formed only from answer-list words, let me mention one example that does include just ONE non-Wordle word, because it's actually a reasonable word. It's (almost) a sentence, or at least a headline: (5-4) [quick, waltz, vexes, fjord, nymph]. (It's got two e's while missing b and g. "Vexes" is not in the Wordle wordlist, being a third-person singular form of a verb.) Entertaining though it may be, it's not as perfect for Wordle as the sextet in the previous section: the colored tiles returned from these five words are not sufficient to distinguish "error" from "gorge", "blast" from "stall", etc. Its cluster vector is a disappointing [1637, 197, 49, 12, 5, 6, 4].

----------------

That's the last word set I analyzed that uses non-Wordle words; for the rest of this section, I only consider sets of words from the Wordle solution-word list. Of all the 5-word sets made of Wordle answer words, none has 25 distinct letters. There are 58 5-word sets with 24 different letters. Four of them include one word with a repeated letter, e.g. (5-5) [blitz, chump, fjord, gawky, seven], and the rest have a pair of words sharing a letter, e.g. (5-6) [coven, fjord, gawky, plumb, sixth]. As starting sets for playing Wordle and the other games, I would argue that these two are each the best in their class. But despite revealing nearly all the letters in the hidden word (they miss q,x and q,z respectively), they still don't quite pin down the word unambiguously: the cluster vectors are [2260, 26, 1] for the first (it cannot distinguish {odder, order, rodeo}) and [2253, 31] for the second. Thus we cannot be sure to win by turn 6 with either of these quints.
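The letter counts quoted in this section are simple set arithmetic. A sketch with hypothetical helper names: the first function reports how many distinct letters a word set uses and which are missing; the second is the test behind the UPDATE above, applied to a small hypothetical candidate list: a fifth word lifts a 20-letter quadruple of answer words to 25 letters only if it has five distinct letters, none already used.

```python
import string

ALPHABET = set(string.ascii_lowercase)


def letter_report(words):
    """Count of distinct letters in a word set, plus the missing letters."""
    used = set("".join(words))
    return len(used), "".join(sorted(ALPHABET - used))


def fifth_words_for_25(quad, candidates):
    """Candidates that would lift a 20-letter quadruple to 25 distinct letters:
    five distinct letters, none of them already used by the quadruple."""
    used = set("".join(quad))
    return [w for w in candidates if len(set(w)) == 5 and not used & set(w)]


print(letter_report(["blitz", "chump", "fjord", "gawky", "seven"]))   # (24, 'qx')
print(letter_report(["coven", "fjord", "gawky", "plumb", "sixth"]))   # (24, 'qz')
print(letter_report(["quick", "waltz", "vexes", "fjord", "nymph"]))   # (24, 'bg')
print(fifth_words_for_25(["fjord", "nymph", "squib", "waltz"],
                         ["gveck", "gecko", "vixen"]))                 # ['gveck']
```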
By flipping a coin for each of the 31 tricky clusters (all of them pairs) of the second quint, we would have a probability vector of [0.986610, 0.013390]: a 1.34% chance of losing and an average of 6.0134 turns to win. The first quint (5-5) is a little different, though. Its one triple {odder, order, rodeo} does benefit from preferentially choosing "odder" or "order" instead of "rodeo". So we can run two analyses: (1) using a guess-at-will strategy, the probability distribution is [0.987904, 0.011663, 0.000432]; (2) using instead a "guided hard mode" strategy, with "order" as a preferred word, the distribution is [0.987904, 0.012095]. Either way there's still a 1.21% chance of losing. But taking the extra effort for one tough cluster lowers the *maximum* game length from 8 to 7, and the *average* game length from 6.0125 to 6.0121.

----------------

Next we set aside the desire to include 24 different letters, and just look for ANY good set of five Wordle words. Is there any five-word set that's as good as the six-word set of the previous section --- one that always narrows down the set of possibilities to just one word (and thus guarantees a win by turn 6)? The answer is provably "no". I have an elementary argument that explains why no such perfect quint exists. But it's faster to simply use the Linear Programming techniques already described in this document, especially since (a) that technique also proves we cannot even find a perfect quintuple among Wordle's much larger list of recognized inputs, and (b) LP techniques helped me find the "tough pairs" that I used to create the elementary argument anyway. (The LP techniques are also used to prove the uniqueness of the six-word set in the previous section, and that uniqueness in turn trivially proves that no five-word set can detect every hidden word unambiguously: adding any sixth word to such a quintuple would produce many different perfect sextets.) We will return to observation (a) in a moment.

----------------

Nonetheless, some really good sets of five Wordle solution words do exist.
A provably most-efficient-possible 5-word set is: (5-7) [blank, chump, goody, river, swift]. (It lacks jqxz; has double r, double o, and two i.) Laurent Poirrier found this one, and we have proved that no other quintuple is better, in the sense that this quint distinguishes all the Wordle answer words except for these 11 pairs:

    [ample, maple] [booby, boozy] [bugle, bulge] [chili, chill] [eagle, legal] [gauge, gauze]
    [jaunt, taunt] [lemon, melon] [pasty, patsy] [skate, stake] [testy, zesty]

That is, its cluster vector is [2293, 11]. So there is no need for any strategy but free guessing --- just flip a coin for those 11 clusters. Thus we compute a distribution vector of [0.995248, 0.004752], so the loss rate is 0.48% and the average number of turns is 6.0048. The only other quintuple that is equally efficient is (5-8) [bawdy, clove, furor, might, spank] (no jqxz; has double r, two o, and two a); it has the same cluster vector and distribution vector, and hence the same average and loss rate.

If we sequentially ran N independent random processes, 99.52% of which finished after 1 step and the remainder after 2 steps, then the number of steps needed to complete all of them could be anywhere between N and 2N; the probability that exactly k of these N processes took that second step to complete would be binomial(N,k) (.9952)^(N-k) (.0048)^k, and the expected number of steps taken would be 1.0048 N. That almost models what happens here with an N-fold compound Wordle game like Quordle (N=4): the expected total number of turns needed, if we begin with either of the starting quintuples above, would be 5 + 1.0048 N IF the N subgames were independent. But they're not! Suppose for example we are using the first of these two quintuples.
If after entering those words we have concluded that in one of the N subgames the hidden word could be either "ample" or "maple", and we also know that the hidden word in another subgame is either "bugle" or "bulge", then we should indeed flip a coin to enter either "ample" or "maple"; but then (by looking to see whether the L is yellow or green) we would know whether the other hidden word is "bugle" or "bulge". As it turns out, for EVERY one of the 11 pairs at least one of the words in the pair can resolve at least one of the other 10 ambiguous pairs, so we should preferentially play those words and reduce the ambiguity in another subgame. In fact, it is impossible for a game to require more than 10 + N turns to complete (far fewer than 5 + 2N except for the smallest N) because so many of the ambiguous pairs include words to guess preferentially so as to discover the hidden words in other pairs! A maximally bad example has N=5 subgames whose five hidden words are chili, gauge, jaunt, lemon, and skate. Such a game would require five coin tosses, which if they are all unlucky would cost us 10 turns to win after the initial quintuple is entered. Since the subgames are very likely NOT independent, then, we can only conclude that the expected number of turns to complete an N-fold compound game, when starting with one of these two quintuples, is *at most* 5 + 1.0048 N. When playing the typical N-fold compound games, which allow only N+5 turns, we must finish every one of the subgames with just one turn each after the starting quintuple is entered. That would happen with probability (0.9952)^N IF the games were independent, but as above we notice that the earlier subgames can provide additional information to help resolve the later ones. So in fact the probability of success is, for all N, at least 0.9952^5 = 0.97623 (i.e. a 97.623% chance of winning), assuming the player resolves any of the 11 ambiguous cases in an advantageous order.
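The coin-flip probability vectors and the independence model of the last few paragraphs can be reproduced from the cluster vectors alone. A sketch under two loudly stated assumptions: (i) a wrong guess inside a cluster rules out only itself, which fits clusters that are pairs and the guess-at-will analysis of (5-5), but not a triple in which every guess splits the other two (such as {bobby, booby, boozy} later in this section); and (ii) the subgames are independent, which, as just explained, only gives an upper-bound model. The function names are mine.

```python
from math import comb


def coin_flip_distribution(cluster_vector):
    """Probability vector after the starting set, for pure coin-flip guessing.

    cluster_vector[k-1] is the number of clusters of size k.  Assumption: a
    wrong guess inside a cluster rules out only itself, so a size-k cluster is
    searched in uniformly random order.  Each cluster then contributes one
    word's worth of probability to each of turns 1..k, which gives
    P(turn j) = (number of clusters of size >= j) / (number of words).
    """
    n_words = sum(k * c for k, c in enumerate(cluster_vector, start=1))
    return [sum(cluster_vector[j - 1:]) / n_words
            for j in range(1, len(cluster_vector) + 1)]


def independent_extra_turns(n, p_one):
    """P(exactly k of N subgames need a second turn), IF the subgames were independent."""
    q = 1 - p_one
    return [comb(n, k) * p_one ** (n - k) * q ** k for k in range(n + 1)]


dist = coin_flip_distribution([2293, 11])        # quintuple (5-7)
print([round(p, 6) for p in dist])               # [0.995248, 0.004752]

quordle = independent_extra_turns(4, dist[0])
expected = sum((4 + k) * p for k, p in enumerate(quordle))
print(round(5 + expected, 3))                    # 9.019, i.e. 5 + 1.0048 * 4
```

The same function gives [0.98661, 0.01339] for the cluster vector [2253, 31] and [0.987905, 0.011663, 0.000432] for [2260, 26, 1], agreeing with the figures quoted earlier in this section up to rounding.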
----------------

When I claim that the previous two starting quintuples are optimal, what I mean is that they minimize the number of pairs of words that are not distinguished from each other, and consequently they minimize the expected number of turns needed until a win. The proof of their optimality comes from searching for sets of five words that maximize the number of pairs split among a select list of pairs of similar words. Searching for maximizing quints in that way allows us to discover other quints that are nearly as good. The next few close contenders for "best" starting quintuple (each of which contains every letter except the rare j, q, x, and z, apart from (5-12), which also lacks v) are these: (5-9) [bawdy, furor, month, speck, vigil] has cluster vector [2292, 10, 1] and distribution vector [0.9948, 0.0052]. (Nothing is better than guessing at random for the triple {bobby, booby, boozy}.) (5-10) [flock, haven, rugby, swept, timid] has cluster vector [2291, 12] and the same distribution vector [0.9948, 0.0052]. (5-11) [bawdy, chump, front, skill, verge] has cluster vector [2285, 15] and distribution vector [0.9935, 0.0065]. (5-12) [batty, champ, furor, slink, wedge] has cluster vector [2282, 15, 1] and distribution vector [0.9927, 0.0073]. (Same ambiguous triple as (5-9), so guess-at-will is as good as anything.) For the curious: these are the six quints that cover the largest numbers among the 919 hardest pairs of words to differentiate; that is, I checked the 919 hardest pairs, and these quints covered the most --- at least 907 of them --- and moreover these quints *did* distinguish any pair of words that wasn't on this list of 919 tough pairs. For completeness' sake, I ran a similar test with the much more accommodating set of 14,853 currently-allowed input words for Wordle. There exist (multiple) sets of five words which successfully distinguish all the Wordle answer words except for 8 pairs, and 8 is the minimal number of failures.
For example, for (5-13) [spill, verge, dumbo, fawny, chott] the cluster vector is [2299, 8]: the non-singleton clusters are {algae, glaze} {crock, crook} {dried, drier} {husky, hussy} {liken, linen} {odder, order} {piper, riper} {rebar, zebra}. So the distribution vector must be [0.9965, 0.0035], and thus the average number of turns is 6.0035 and the failure rate is 0.35%. I cannot predict what "words" will someday be allowed as input for Wordle, but I can guarantee that it will never be possible to enter five words and unambiguously know what the hidden word is, to be entered on turn 6. I considered 945 of the hardest pairs of words to separate, and used Gurobi to determine the largest number that could be distinguished by ANY combination of five strings of five letters each. It reported that the maximum is 943; that is, at least two of those pairs would go unsplit. (There were several sets of five "future words" that would accomplish this, but only after some experimentation did I find one that *also* split all pairs not on my chosen list of 945: the starting quintuple of these "words" [serer, calvl, hyott, gmudn, fpibk] has a cluster vector of [2311, 2], the only unsplit pairs being crook/crock and gauge/gauze. SERER is actually on the current list of admissible inputs; none of the others is. The letters can be permuted within columns to get other equivalent starting sets, as long as the doubled e, l, r, t stay doubled within a word.)

If you really want to stretch the notion of the Wordle game, suppose we start the game with these ... 5.2 words (?!): [spaul, flyin, doogh, crrew, mktbt, v****]. The tile colors that we get in response will almost always identify the hidden word; its one failure is eager/gazer. But this set consists of five complete "words", and the extra letter is right at the front of the sixth word (so can we call it 5.2 words?).
As a bonus the first two words are actually admissible Wordle inputs, and frankly if you told me the next two were Wordle words, too, I'd believe you. (They're not.) As you can see there are words with two Os, two Rs, and two Ts (and those letters need to be together in a word); there are also two Ls which you could put into the same word but there's no need to. (The tool for this optimization is to allow 130 variables for which letter/position pairs are used, together with 26 variables for which letters will be doubled within a word, and 10 more variables to indicate which letters would be tripled within a word. Together, these account for all the mechanisms by which a set of starting words can distinguish any particular pair of Wordle solution words.) It is easy to find many starting quintuples which give success rates over 99%, but never 100%, so perhaps we are just splitting hairs here. But one quint of note is (5-14) [carve, sight, downy, plumb, fetal] with cluster vector [2269, 20, 2] and distribution [0.9896, 0.0104]. (Both triples can be solved using freeform guessing.) The significance of this quintuple is that it extends the best quadruple we will find in the next section (the first four words here). So playing the first four words first already gives a high probability of solving a Wordle puzzle without entering all five words; and those first four words already use 20 different letters, making it easier to guess the hidden word even when it is known to be unique. Alternatively, this quint is good as a backup plan for a player intending to use that "best" quadruple but who has trouble discerning what word to enter next; FETAL offers the most additional help (but does come at a cost of a 1% failure rate that was not present for a player who uses that quadruple and CAN discern the unique solution word!) 
So to summarize, we have found several quintuples of starting words that *almost* always enable us to know the hidden word and enter it as turn 6 and win --- but not one of them allows us to win 100% of the time. This will change when we get to quadruples! But first, we need a little digression...

==============================================================================
INTERLUDE: THE DIFFERENT WAYS TO RANK PROPOSED STARTING SETS

As we discuss smaller starting sets, there will not be a single "best" starting set because there are different ways to rank or score the candidates. In order to rank them, you have to know what it is that you value most! The main question to ask before ranking is: do we want to rank the candidates based only on how the game looks immediately after the starting set is entered? Or should we "play the long game" and incorporate into our ranking the knowledge of how we will proceed for the rest of the game? We'll consider the first possibility first; we'll see there are multiple ways to measure just how well the starting set has worked. Later, we'll investigate rankings based on two possible strategies the player might use to finish the game: a guess-at-will procedure, or a procedure based on pre-computing a few ideal moves to make in just enough cases to ensure a victory by move 6. (One can go further and assume the player has worked out a strategy for more than the minimal number of cases, maybe even computing an ideal move to make at every turn. Since this article is about *human* players, we will not pursue such advanced options.) We can apply these rankings to any sets of candidates. Primarily we will use them to make a systematic examination of all the starting sets that repeat no letters --- this restriction will give us a manageable set of candidates to examine methodically, which generally speaking contains the best starting sets of any size.
Here are the counts of such sets (and some links to the lists):

    Sets           How many exist
    quintuples+    0
    quadruples     45,147
    triples        1,243,026
    pairs          196,175
    singletons     1,566

Note: many times our rankings will necessarily put two candidates at a tie. One reason is that our lists of candidate starting sets include many "perfect anagrams": pairs of starting sets that have the same letters in the same columns and therefore will return precisely the same colored tiles no matter what the hidden word. For example, the triples [crone, guilt, shady] and [crony, guide, shalt] are perfect anagrams of each other. Some extreme examples are the four pairs [place, trunk], [plane, truck], [plank, truce], [plunk, trace] and the four triples [twang, slump, cried], [tried, swung, clamp], [tried, swamp, clung], [tramp, swing, clued]; within each group, all four are perfect anagrams of one another.

Starting sets that are perfect anagrams of each other result in the same progress of the game: they partition the dictionary into the same clusters, they give the same colored tiles, etc. So when we rank the candidate starting sets, we will mention only one of them, and relegate its perfect anagram(s) to a footnote. These sets of anagrams will appear when we review starting quadruples, triples, and pairs. (But no two single words can be perfect anagrams of each other!)

(A) COLOR METRIC(S)

Very well then; how can we assess how good the player's situation is, right after entering the starting set (i.e., making no assumptions about the player's actions thereafter)? All the information we could use at that moment is presented in the tiles that have been turned green/yellow/gray by the starting set. Since we are assuming throughout this document that the player knows the Wordle dictionary, the player can mentally run through the list each day to find the words that match the colored tiles before him: he knows each day what cluster of words contains the day's hidden word. We will use that information in part (B), below.
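Detecting perfect anagrams only requires comparing, column by column, which letters each starting set uses. A small Python sketch, assuming starting sets are given as lists of five-letter lowercase words with no repeated letters (the case treated in this document); the function names are illustrative, not taken from any library:

```python
from collections import defaultdict


def column_signature(words):
    """For each of the five columns, the sorted letters the set places there.

    For starting sets with no repeated letters, each tile's color depends
    only on its letter, its column and the hidden word, so two sets with
    equal signatures produce the same collection of colored tiles for
    every hidden word.
    """
    return tuple("".join(sorted(w[col] for w in words)) for col in range(5))


def perfect_anagram_groups(starting_sets):
    """Group candidate starting sets into classes of perfect anagrams."""
    groups = defaultdict(list)
    for s in starting_sets:
        groups[column_signature(s)].append(s)
    return list(groups.values())


pairs = [["place", "trunk"], ["plane", "truck"], ["plank", "truce"],
         ["plunk", "trace"], ["crone", "guilt"]]
for group in perfect_anagram_groups(pairs):
    print(group)
```

Using the signature as a dictionary key means a whole list of candidates can be deduplicated in one pass, which is how one might relegate anagrams to a footnote automatically.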
But let's concede that a typical human player will instead try to construct the hidden word candidates from just the green and yellow letters; so he wants to have a lot of those! For example, in the Introduction, we gave an example of a display of colored tiles that corresponded to only one possible word (WEDGE), yet the situation would have been difficult for a human because there was only one colored tile to go on. So one of the ways we will rank different starting sets will be in terms of how much information we get from the starting set. Our proxy for that will be simply to count how many yellow and green tiles they produce; more is better than fewer.

We will primarily do this for starting sets without repeated letters. If a starting set includes two words sharing the same letter in the same position, the counts of the green, yellow, and gray tiles will over-estimate the amount of information obtained from the starting set. (That's also true if the starting set includes a word with a repeated letter: if, say, the left-most E goes gray, then we gain no new information from observing that the other Es are gray too.)

Of course the counts will vary depending on what the daily hidden word is, so these should be interpreted as averages --- expected values of a random variable. Equivalently, we can simply count the numbers G and Y of green and yellow tiles that will show up day after day, across an entire 2315-day cycle. (Divide by 2315 if you wish to compute averages.) Then each starting set can be plotted as a point in the (G,Y) plane; the points that are farther out correspond to the starting sets that give the player the most information. Mathematically the points of interest are the points on the boundary of the convex hull of this set of points --- the ones that would snag a lasso tightening around these points. In order to actually rank the candidates, we have to decide how much information we get from a yellow tile as opposed to a green one.
If each yellow is worth a fraction f of a green, then the metric we would use to rank the candidates is simply G + f Y. Natural choices might be f=0 (if only the green tiles are of interest), f=1/2 (if you'd be willing to swap two yellows for a green), or f=1 (if every colored tile is equally valuable to you). But we can determine the ranking of the candidates for every f. Mathematically we can even ask about f>1, although this would make no sense in the context of Wordle!

It's not hard to show that when f=1, the metric depends only on the letters involved, not their positions: anagrams might replace yellow tiles with green ones or vice versa, but no matter what the day's hidden word, the total number of colored tiles is the same for both permutations of the letters. This means that when f=1, there is likely to be a large multi-way tie for "best" starting set, each candidate set being an anagram of the others.

So this becomes our first (family of) metric(s): we can show the highest-ranking candidates for f=0 --- that is, the candidates that produce the most green tiles, on average; then a list of ranges of f < 1 on which the ranking doesn't change, and in each range we can show the rankings of the candidates using the values of G + f Y.

Jeff Dooley has proposed an interesting variation: just as we might weight the yellow tiles differently from the green ones, we might also weight the colored tiles differently depending on the letter revealed. (Not only does a green letter help the player more than a yellow one, but since J is so much rarer than E among Wordle words, the presence of a yellow J is much more valuable than a yellow E.) I have not pursued this very far yet. He remarks that simply weighting consonants differently from vowels can lead to different rankings; for example, the best starting pair under such an assumption came out to be CRONE + SHALT.
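Here is a minimal Python sketch of how the totals G and Y, and the score G + f Y, could be tallied. The `feedback` helper implements the commonly described Wordle rule for repeated letters (greens are assigned first, then yellows left to right). That rule, the function names, and the tiny three-word vocabulary are assumptions for illustration; a real tally would loop over the 2315-word answer list.

```python
from collections import Counter


def feedback(guess, hidden):
    """Tile colors for guess vs hidden: 'G' green, 'Y' yellow, '.' gray.

    Greens are assigned first; then each unmatched letter of the hidden
    word can turn at most one guess tile yellow, scanning left to right.
    """
    tiles = ["."] * 5
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def green_yellow_totals(start_set, vocabulary):
    """Totals G and Y over one pass through vocabulary (a full cycle if it
    is the answer list), entering every word of start_set each day."""
    G = Y = 0
    for hidden in vocabulary:
        for guess in start_set:
            tiles = feedback(guess, hidden)
            G += tiles.count("G")
            Y += tiles.count("Y")
    return G, Y


def color_score(G, Y, f):
    """The color metric G + f*Y: a yellow tile is worth f of a green one."""
    return G + f * Y


# Toy three-word vocabulary standing in for the real answer list.
toy_vocab = ["skate", "stake", "state"]
G, Y = green_yellow_totals(["skate"], toy_vocab)
print(G, Y, color_score(G, Y, 0.5))  # 12 2 13.0
```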
Another alternative using the color tiles is to try to maximize the minimum number(s) of colored tiles day after day: the better starting set is the one that never leaves the player high and dry. We won't pursue this systematically but can make some observations about examples.

Since these rankings depend only on the numbers G and Y, it will happen that two candidate starting sets rank equally for every value of f, if they happen to have the same values of G and of Y. That happens for perfect anagrams of course; it can also happen "by accident".

(B) CLUSTER METRIC(S)

Next we assess the situation after the starting play a little differently. A player who really is familiar with the set of solution words might not need so many colored tiles to identify the candidate words; but he might appreciate a starting set that generally leaves little ambiguity about what the word is; that is, the player would prefer that the starting set partition the dictionary into a lot of small clusters rather than fewer larger ones. On the assumption that all the dictionary words are equally likely to be the hidden word, it makes no difference which words are in the cluster: the quality of the situation depends only on how many words are in the cluster. (In parts (C) and (D) below, we will consider ways in which one cluster may be viewed as better or worse than another cluster of the same size.)

So from this perspective, all the information we need to judge and compare candidate starting sets is the cluster vector v = < v1, v2, ..., v_N > that shows the numbers v1, v2, ... of clusters of sizes 1, 2, ... (N being the size of the largest cluster). We'd like to rank more highly the starting sets for which the numbers of small clusters are high and the other numbers drop off rapidly. So we will form various metrics that can be calculated from v and which grow larger or smaller for the better or worse starting sets. In the Wordle "literature" there are multiple metrics of this type; we will outline a few of them below.
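The cluster vector can be computed by grouping hidden words according to the full tuple of tile patterns the starting set produces. A Python sketch under the same assumptions as the rest of this document (uniform hidden word; the hypothetical `feedback` helper uses the usual repeated-letter rule). The toy example uses the four _AUNT words with JETTY as a one-word "starting set":

```python
from collections import Counter, defaultdict


def feedback(guess, hidden):
    """Tile colors: 'G' green, 'Y' yellow, '.' gray (greens assigned first)."""
    tiles = ["."] * 5
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def clusters(start_set, vocabulary):
    """Partition vocabulary by the tiles the whole starting set produces."""
    groups = defaultdict(list)
    for hidden in vocabulary:
        key = tuple(feedback(guess, hidden) for guess in start_set)
        groups[key].append(hidden)
    return list(groups.values())


def cluster_vector(start_set, vocabulary):
    """v[i-1] = number of clusters of size i, for i = 1 .. largest size."""
    sizes = Counter(len(c) for c in clusters(start_set, vocabulary))
    return [sizes[i] for i in range(1, max(sizes) + 1)]


aunts = ["haunt", "jaunt", "taunt", "vaunt"]
print(clusters(["jetty"], aunts))        # [['haunt', 'vaunt'], ['jaunt'], ['taunt']]
print(cluster_vector(["jetty"], aunts))  # [2, 1]
```

Note that the sizes weighted by their counts always add back up to the vocabulary size, which is the L1 = 2315 observation below.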
But actually, all of their rankings can be determined in a uniform way by computing a version of mathematicians' "L^p metric": for any number p we can compute a number for each candidate starting set, based on the cluster vector:

    Lp ( v ) = sum( v_i * i^p ) = sum of the p-th powers of the sizes of all the clusters

We will even extend this to the values p = - infinity (meaning that i^p = 0 unless i = 1), and p = + infinity (meaning we find the ranking of the candidates that applies for all large p; equivalently, we rank the candidates by the value of lim( p -> infinity ) (Lp)^(1/p)).

Which indicates a better candidate, a larger value of Lp or a smaller one? Suppose two candidates have nearly identical cluster vectors, but one has a single cluster of 2C elements, and the other has two clusters of C elements each. Clearly the second candidate is the better one for the player. How do their Lp values compare? They have all the same summands except for a term (2C)^p = 2^p C^p for the first candidate, and 2 C^p for the second. Thus if the Lp metric is to indicate which candidate is better, the sum containing 2 C^p must rank above the sum containing 2^p C^p. For p > 1 we have 2 < 2^p, so the better candidates are the ones with smaller values of the Lp metric; for p < 1 the sense is reversed.

I don't have a ready explanation of what exactly each metric Lp measures. But think of it this way: using different values of p > 1 allows us to decide just how badly we want to avoid having large clusters; different values of p < 1 (especially negative p) allow us to decide how strongly we want to favor having small clusters.

There are key values of p for which the Lp ranking matches the rankings that other people have investigated.

p = 1 : This sum L1 = sum( v_i * i ) simply counts all the words in all the clusters, and so is the same value for all candidates (L1 = 2315). (Even though all candidates are tied for best when p=1, there will of course be winners using the metrics with p near 1.
The starting sets that already give the smallest Lp values when p is just larger than 1, or the largest values when p is just smaller than 1, are those making the smallest changes from L1 = 2315; mathematically this rate of change is the derivative of Lp at p = 1, which works out to sum( v_i * i * log(i) ). Thus it makes sense to use this expression to give us a ranking that applies when p=1; smaller is better.)

p = 0 : Since i^p = 1 for every cluster size i, this sum L0 = sum(v_i) is simply counting the clusters, i.e. the number of possible ways the tiles can be colored over the years. (Note that the arithmetic average of the sizes of all the clusters is L1 / L0 = 2315/L0, so maximizing the number of clusters, L0, also minimizes the average of their sizes.) Also, it is an elementary probability exercise to see that the probability of successfully guessing the hidden word on the very next turn, when using a guess-at-will strategy, is exactly L0 / 2315: a cluster of size i contains the hidden word with probability i/2315, and a random guess from it is then correct with probability 1/i. So the starting triple with the highest value of L0 is the one that makes it most likely that we'll enter the hidden word by our fourth turn.

p = - infinity : Treating i^p as 0 for every value of i > 1 means that for this value of p, Lp is counting the number of singleton clusters: the number of words that are determined unambiguously by the tile colors. Equivalently, this Lp counts the number of days per cycle that we can be sure of the hidden word; subtract it from the total number of days (2315) to get the number of days we must either select a word at random from the cluster as a guess, or determine a different strategy for that cluster that can guarantee a win by turn 6. We will return to those options in parts (C) and (D), below. (OK, OK, "-infinity" is not a real number, so we are instead using the ranking provided by all sufficiently negative p.)
p = 2 : Over the whole 2315-day cycle, every word will eventually be the hidden word once; thus a cluster with i elements will be recognized as the cluster containing the day's hidden word just that many times. Hence when we tally the size of the day's cluster, day after day, we are computing L2. (We could then divide L2 by the number of days, 2315, to obtain the "average daily ambiguity".)

p = + infinity : the last term v_N * N^p dominates the others for large p, so the candidate with the smallest value of Lp will be the one(s) for which the largest cluster is as small as possible; ties are broken by counting the number of clusters of that size. (Any remaining ties are broken by the next-largest term, which similarly considers the second-largest size of a cluster, and so on.)

We recognize all these special cases as metrics of interest that provide worthwhile rankings of candidate starting sets; but by considering the rankings that result from all values of p, we can see how the different rankings morph into each other as p varies. Here is an illustration, showing the rankings of the top single-word starting sets, for all values of p.

Sometimes a tweak in the metric(s) may be appropriate. For example, after a starting triple is entered, the player has three more turns to try to enter the day's hidden word. It's great if the clues obtained from the starting set uniquely identify the hidden word, but even if there are two or three possibilities we are comfortable: we know we can keep guessing until we land on the correct one and still we will have won by turn 6. So perhaps we should try to maximize not the number of singletons but rather the number of days when we can confidently just try all the words in a cluster: don't maximize v1 (which is what happens with p=-infinity) but rather maximize v1 + 2 v2 + 3 v3. (Along the same lines, I have also looked to maximize the percentage of clusters that are this small; that is computed as (v1+v2+v3)/L0.
) Of course we would rank starting quads in this way by maximizing only v1+2v2, and similarly adjust for starting sets of other sizes.

Parallel to the comment in (A) about candidates with the same (G,Y) measurements, note that all these Lp metrics are computed only from the cluster vector. Two candidates with the same cluster vectors will end up ranked equally for every value of p. This happens with perfect anagrams, of course, but can also occur in other cases, particularly when the cluster vector is short (as happens with the best starting quadruples, for example).

As you can imagine from the foregoing discussion, there is no end to the set of ways that one may assign a "score" to every starting set. Besides inventing new quantities to measure, we also have the option to combine several existing measurements into one; how we mush them all together is a personal choice. (For example, in my files of the pairs and triples that have no repeated letters, I sort them according to the value of 5 L0 + 2 G + Y.) For this reason I will try to determine not only the "best" starting set by each metric, but also a couple of runners-up --- one of them might be "pretty good" by lots of separate metrics, and thus become a player's go-to starting set.

(C) GUESS-AT-WILL RANKINGS

So far we have considered only the ways to compare and rank different starting sets without regard to how the player will proceed afterward. We can make more useful rankings if we know what the players would actually do in later turns; but what strategies might they use? Of course a player may adopt a complicated playbook of their own design, but from here forward we will assume the player will either (a) switch entirely to a guess-at-will strategy, running a risk of losing, or (b) pre-compute a simplest-possible winning strategy, finding a word (or sequence of words) to play in those cases when the colored tiles indicate a cluster that might lead to a loss if we use strategy (a).
For a player pursuing strategy (a), I can think of only two natural measures by which to rank the starting sets. We want both these numbers to be as low as possible:

    M6 = probability of a loss (i.e. not guessing the word by turn 6)
    M7 = average number of turns needed to win (including occasions when 7 or more turns are used)

Both these numbers are computed readily from the probability distribution that shows the probability that the player will guess the hidden word after 1, 2, 3, ... additional turns. We will compute those distributions assuming that the hidden words occur with equal probability, and assuming that the player selects words from the cluster with equal probability.

Note that at this point, not all clusters of a given size are equally difficult. Among clusters of three words, for example, we have seen in previous sections clusters like {skate, stake, state}, in which any wrong guess gives extra information to reveal what the hidden word must be; and clusters like {haunt, jaunt, vaunt}, in which each wrong guess provides no extra information. (So the first cluster adds a vector (1/3, 2/3, 0) to the probability distribution, while the second cluster adds (1/3, 1/3, 1/3).)

(D) RANKINGS THAT ASSUME A (MINIMAL) STRATEGY

The last way to rank and compare starting sets is to indeed take into account the player's actions after entering the starting set, but to assume the player will do something other than enter a cluster word at random. In order to compute such a ranking we have to decide in advance what playbook we think the player will follow, and here there are many options. In this document, we will assume that such a player will above all want to find the hidden word by turn 6; even so, many strategies remain possible. In our ranking of candidate starting sets, we choose to review just one strategy per candidate. Then, we shall simply rank the candidates by the average number of turns their strategy requires until finding the hidden word.
Other analyses I have seen have opted to assume the player would use the "optimal" playbook, that is, for each starting set, to follow up with a full decision tree showing what actions to take for each cluster that could contain the hidden word, the actions chosen to minimize the expected number of turns. We will (usually) not consider these strategies, since they are usually too complicated for human execution, which is our interest. Instead, in order to pick a "simple" strategy which we will assume the player will follow, we will assume their actions after the starting set is entered are governed by these principles:

    (1) Whenever a guess-at-will strategy is sure to bring a victory by turn 6, use it.
    (2) If playing a "preferred" cluster member will guarantee victory, use it; more precisely, use the cluster member that will ensure victory in the fewest turns.
    (3) If not, play a single, out-of-cluster word that will ensure victory (again choosing one that minimizes the number of turns).
    (4) In the rare cases that no single word will allow a guess-at-will strategy afterwards, I have made some ad-hoc choices.

But note in particular that the strategies I am assuming will include special handling rules only for the clusters that might lead to a loss if free guessing is used. More finely-tuned decision trees can surely lower the expected numbers of turns, but in the interest of finding ways for ordinary humans to play the game, their consideration is beyond the scope of this article.

With the strategy in place, it is a straightforward matter to compute the probability distribution showing the likelihood of ending the game after this or that many moves. We can then rank the candidates based on the average number of moves. Unfortunately, it is computationally intensive to work out the best such strategy for a starting set. Therefore, I usually work one out only for the starting sets which have proved to rank highly by the metrics in the previous subsections.
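Rule (1) needs a test for whether free guessing is *sure* to win in the turns that remain. One way to check it, sketched here in Python, is to take the worst case over every hidden word and every sequence of consistent guesses. The function names are hypothetical, and `feedback` again assumes the usual repeated-letter rule.

```python
from collections import Counter
from functools import lru_cache


def feedback(guess, hidden):
    """Tile colors: 'G' green, 'Y' yellow, '.' gray (greens assigned first)."""
    tiles = ["."] * 5
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def worst_case_guesses(cluster):
    """Most guesses a guess-at-will player could need to hit the hidden
    word, over every hidden word and every sequence of consistent guesses."""

    @lru_cache(maxsize=None)
    def worst(cands):
        if len(cands) == 1:
            return 1
        result = 1
        for guess in cands:
            for hidden in cands:
                if guess == hidden:
                    continue  # this path ends after one guess
                pattern = feedback(guess, hidden)
                # Words still consistent after this wrong guess; the guess
                # itself drops out because it would have shown all greens.
                rest = tuple(w for w in cands if feedback(guess, w) == pattern)
                result = max(result, 1 + worst(rest))
        return result

    return worst(tuple(sorted(cluster)))


def guess_at_will_is_safe(cluster, turns_left):
    """Rule (1): free guessing is sure to win if the worst case fits."""
    return worst_case_guesses(cluster) <= turns_left
```

After a starting quad, `turns_left` is 2, so {skate, stake, state} passes rule (1) while {skate, stake, state, stave} does not and needs a rule (2) or (3) move.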
As a general rule, most of the time a player is using these strategies, they are simply using the guess-at-will rule (1); hence the rankings with a strategy are typically similar to the rankings without a strategy. That gives some confidence that we have not missed a "best" candidate.

The other issue of importance in this subsection is a bit informal: we would like to find starting sets whose win-by-turn-6 strategy is "simpler" than those of any competing candidate. Primarily that means we rank more highly the starting sets that have fewer anomalous clusters that require a special rule to be memorized. Generally we prefer starting sets that allow more use of in-cluster preferred words than out-of-cluster words, and we prefer starting sets that do not require rules that are used any later than immediately after the starting set.

So there you have it! Many different metrics by which to compare and rank candidate starting sets, some with side variations or extra parameters to tinker with. For starting sets of four or fewer words, these different metrics can lead to distinctly different choices of which candidate is "best". In the next three sections we will evaluate our starting sets of different sizes according to these different metrics.

==============================================================================

FOUR

Using the right starting set, we can guarantee a win at Wordle. With a four-word starting set, it is conceivable one could win *before* using up all the turns allowed by Wordle's rules. Indeed we'll see in the next section that a perfect player can always win Wordle in at most 5 turns. But in order to do so with a four-word starting set, that quadruple would have to unambiguously identify the hidden word every time. As discussed in the quintuple section, that's not even possible with FIVE starting words, let alone with four. So instead we look for sets of four starting words that can guarantee a win by turn 6, i.e.
with TWO rounds of guessing after the four starting words are entered. That's flexibility we did not have in the previous section, and it turns out to be just what the doctor ordered. Let's get right to my favorite starting quadruple:

The four-word starting set (4-1) [carve, downy, plumb, sight] guarantees a win at Wordle.

The cluster vector is simply [2182, 59, 5]; obviously we can win on the fifth turn whenever the hidden word comes from the many singleton clusters, and if it's in any of the 2-word clusters, we can try one of the two words on turn 5 and (if necessary) the other on turn 6. But as it turns out the five triples are also easy to resolve: no matter which of the three cluster words we enter on turn 5, it turns out to give enough information to determine which of the other two words is the hidden word. So there's no need to develop a strategy: freeform guessing will end the game by turn 5 in 2246/2315 of the cases, and on turn 6 in the other 69/2315. So the distribution vector is just [0.970194, 0.029806], and so there is a 100% chance of victory, taking an average of 5.0298 turns.

As with quintuples, we can at least estimate the performance for compound games: an independence assumption would make the expected number of turns be 4 + 1.0298 N, and the fact that the subgames need not be independent only serves to lower the expected number of turns. Similarly we can compute the probability of a win by the (N+5)th turn under an independence assumption to be .970194^N + N*.029806*.970194^(N-1) and again be confident that the true probability of a win is higher. (Since the cases in which we do NOT have an instant solution are already rare, the independence assumption is not all that far from reality. Interestingly, we can similarly under-estimate the probability of winning a compound game using our best starting quintuple in the previous section, to be 0.995248^N.
The two (under)estimates agree around N=16, which suggests that for Sedecordle and larger games it may be better to use the 5-word starting set, while for smaller games it's the 4-word starting set that may be the better choice.)

Note that this set has 20 different letters (all but k, f, and the rare j, q, x, z), which gives the human player a lot of information about the hidden word right away. That's handy! Intuitively, one would expect that using 20 different letters would tend to keep cluster sizes small. So for most of this section, we will focus only on such starting quads. We can compare all these quadruples, using the different rankings established in the previous section.

(A) THE BEST STARTING QUADRUPLE BY THE VARIOUS POST-START METRICS

We start with the rankings established by the color metrics G + f Y. The best quads for various values of f are:

    [brave, flint, pudgy, shock]   (G,Y = [4117, 6030])   for f < 0.31818
    [budge, flack, print, showy]   (G,Y = [4096, 6096])   for 0.31818 < f < 0.32026
    [balmy, fudge, print, shock]   (G,Y = [4047, 6249])   for 0.32026 < f <= 1

At f=1 there is a 1247-way tie of quads that all yield 10296 colored tiles, namely, the quads made of the 20 most common letters in Wordle (all the letters except v, w, and j, q, x, z); the one yielding the most greens is again [balmy, fudge, print, shock]. (For all f > 1.05405, the best score is held by [aglow, fetid, nymph, scrub] because it is the quad with the most yellows: [G,Y]=[2151, 8137].)
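The boundaries between the f-ranges above are the values of f at which two neighboring quads tie, i.e. where G1 + f Y1 = G2 + f Y2, so f = (G1 - G2)/(Y2 - Y1). A short Python check using only the three (G,Y) pairs quoted above; a full ranking would of course need every candidate quad, and the helper names are hypothetical.

```python
from fractions import Fraction

# (G, Y) totals quoted above for three quads.
quads = {
    "brave, flint, pudgy, shock": (4117, 6030),
    "budge, flack, print, showy": (4096, 6096),
    "balmy, fudge, print, shock": (4047, 6249),
}


def crossover_f(a, b):
    """The f at which G + f*Y ties for two (G, Y) pairs (Y values must differ)."""
    (g1, y1), (g2, y2) = a, b
    return Fraction(g1 - g2, y2 - y1)


def best_at(f, candidates):
    """Name of the candidate maximizing G + f*Y."""
    return max(candidates, key=lambda name: candidates[name][0] + f * candidates[name][1])


names = list(quads)
for a, b in zip(names, names[1:]):
    print(f"{a}  |  {b}:  f = {float(crossover_f(quads[a], quads[b])):.5f}")
# prints f = 0.31818, then f = 0.32026
```

Exact fractions (21/66 = 7/22 and 49/153) avoid any rounding ambiguity right at the boundaries.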
Note: none of these "best" quads (by the color metrics) has any perfect anagrams, but for example in second place for small f we have a tie (of course) between these three perfect anagrams:

    [brick, fudge, plant, showy]
    [black, fudge, print, showy]
    [budge, flack, print, showy]

Altogether there are 28 quads that are in the top 5 when ranked by the values of G + f Y, for some value of f > 0; they form just 22 distinct (G,Y) pairs on the convex hull of all the 45174 (20-letter) quads because 6 of these quads are perfect anagrams of others.

We next can rank the quads by the Lp metrics, for all real numbers p. Over all real values of p, there are only four quads that are ever "best":

    [batch, drove, slung, wimpy]   for p > 4.826180826
    [blown, carve, dumpy, sight]   for smaller p > 3.833921198
    [chump, dying, fable, worst]   for smaller p > 2.000000000
    [bugle, champ, downy, first]   for smaller p (i.e. p < 2.0)

Each of these has 0, 1, or 2 clusters with four elements, and no larger clusters. In particular, the first of these is ranked best by all large p because its largest cluster contains only 3 words, and 3 is the minimum for all these 45K 20-letter quads. (There are quads like [flown, jerky, match, squib] in this set that have as many as 15 words in a cluster!) Some other quads also have just 3 elements in their largest clusters, but this one is the only one to have only a single cluster of 3. (Its cluster vector is [2148, 82, 1].)

The last of these four quads wins on several counts: its cluster vector is [2199, 50, 4, 1], so it has 2199 singleton clusters, which is the maximum, and it has 2254 clusters altogether, which is also the maximum. Its average "ambiguity" is 1.058747300 words per day, which is the minimum, and which is accomplished only by this and the third quad (whose cluster vector is [2193, 56, 2, 1]).
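The crossover values of p quoted above are roots of Lp(v) - Lp(w) for two cluster vectors v and w. A Python sketch that evaluates Lp, its slope at p = 1, and locates a crossover by bisection; the bisection approach and the function names are my illustration, not necessarily how the quoted values were obtained. With the cluster vectors of the third and fourth quads above (both have L2 = 2451), the crossover lands at p = 2, matching the table.

```python
import math


def lp(v, p):
    """Lp(v) = sum of v_i * i**p over cluster sizes i (v[0] counts singletons)."""
    return sum(vi * i ** p for i, vi in enumerate(v, start=1))


def lp_slope_at_1(v):
    """Derivative of Lp with respect to p at p = 1: sum of v_i * i * log(i)."""
    return sum(vi * i * math.log(i) for i, vi in enumerate(v, start=1))


def lp_prefers(v, w, p):
    """True if v beats w under Lp: smaller wins for p > 1, larger for p < 1."""
    a, b = lp(v, p), lp(w, p)
    return a < b if p > 1 else a > b


def crossover_p(v, w, lo, hi, tol=1e-12):
    """Bisection for a p in [lo, hi] where Lp(v) = Lp(w); needs a sign change."""
    def diff(p):
        return lp(v, p) - lp(w, p)

    d_lo = diff(lo)
    if d_lo * diff(hi) > 0:
        raise ValueError("Lp(v) - Lp(w) does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        d_mid = diff(mid)
        if d_lo * d_mid <= 0:
            hi = mid
        else:
            lo, d_lo = mid, d_mid
    return (lo + hi) / 2


chump = [2193, 56, 2, 1]  # [chump, dying, fable, worst]
bugle = [2199, 50, 4, 1]  # [bugle, champ, downy, first]
print(lp(chump, 1), lp(bugle, 1))           # 2315 2315: L1 is the same for all
print(lp(chump, 0), lp(bugle, 0))           # 2252 2254: number of clusters
print(lp(bugle, 2) / 2315)                  # average daily ambiguity, about 1.0587473
print(crossover_p(chump, bugle, 1.5, 3.0))  # about 2.0
```

The interval [1.5, 3.0] is chosen to avoid the trivial common root at p = 1, where every candidate has L1 = 2315.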
Literally using the Lp ranking with p=+infinity puts into first place all the quads whose largest cluster has the minimal size, which among these 45K quads is three. That gives a massive tie for first place to the 3223 quads which have no clusters larger than 3 elements. (We can break the tie among those 3223 quads by considering these same metrics. Using the large-p Lp metrics (which we have done above) is tantamount to ranking them by the number of clusters of size 3 that they have. The one which unambiguously identifies the most words is [chant, dowel, rugby, skimp] with cluster vector [2186, 57, 5]. It's also the one with the lowest average daily ambiguity: on average, the player is choosing the hidden word from a set with 1.062203024 candidate words in it. In fact it's the "best" one of these 3223 candidates by every Lp metric with p<3.685172. It's also the one with the most clusters altogether (2248), which means it will win most often on move 5 (2248/2315 of the time) if we pursue a strategy of simply guessing words within clusters.)

Returning now to the full set of 45K 20-letter quads, we have already listed the ones that are "best" by any Lp metric.
The other quads that show up in the top-5 ranking for some value of p are [batch, dimly, prove, swung], [cable, fight, rowdy, spunk], [clump, grove, handy, swift]*, [crown, dumpy, fight, salve], [comfy, diver, plush, twang], [clasp, downy, giver, thumb], [carve, dumpy, flown, sight], [carve, downy, fight, slump], [chant, dowel, rugby, skimp], [chump, dingy, fable, worst], [crump, downy, fable, sight], [dumpy, globe, ranch, swift], [bland, comfy, purse, wight], [bugle, cramp, downy, shift]*, [barge, clump, downy, shift], and [bugle, candy, morph, swift]. (* = plus the perfect anagrams [crump, glove, handy, swift] and [bugle, crimp, downy, shaft].)

The quads [crown, dumpy, fight, salve] and [clump, grove, handy, swift] have the same cluster vector [2151, 79, 2], so they will be tied in the rankings from every Lp metric. However, they are not perfect anagrams of each other, and so will be different by some of the other (non-Lp) rankings. (For example, the numbers of green and yellow tiles they produce over a complete cycle are [3574, 6596] for the first and [3693, 6477] for the second; both total 10170 colored tiles, so for every f < 1 the second quad is "better" in the sense of giving more information to the player in the form of colored tiles.) This collision is not rare: there are only 24587 distinct cluster vectors for the 45147 quads.

There are also many collisions for specific values of p. Because the cluster vectors for these quads are so simple, the equations that define the values of p for which the Lp metrics of two quads are equal are themselves also very simple, and likely to be repeated for other pairs of quads. Indeed, see the chart showing how the rankings of these two dozen quads vary as p varies; there are multiple values of p where more than one pair of quads exchange places in the rankings. This is quite unusual --- it does not happen (much) for starting sets smaller than quads, because such starting sets have longer, more complicated cluster vectors.
(B) BEST STARTING SETS IF YOU INTEND TO JUST KEEP GUESSING

I computed the probability distribution for each of the 45,174 20-letter quadruples: if a player consistently pursues a guess-at-will strategy with the same starting quad, what fractions of the games will end after 1, 2, 3, ... more turns. From that probability distribution we can compute the average number of turns needed for victory, and the probability of a loss (i.e. failure to guess the hidden word by turn 6). The starting quads with the lowest average number of turns are

    [bugle, champ, downy, first]   5.02754 turns on average
    [barge, clump, downy, shift]   5.02797
    [bugle, candy, morph, swift]   5.02797
    [bland, comfy, purse, wight]   5.02840
    [chump, dying, fable, worst]   5.02840

Notice that the differences are very small, and sometimes zero! They amount to needing a single extra turn across the entire 2315-day cycle! The list continues with very small increments for a considerable length. Indeed, 45 thousand entries later we reach the worst quad, [fjord, glyph, quack, vixen], which still takes only 5.23466 turns on average; so the increments must be small. (Perfect anagrams would have identical turn averages; the pairs in the table which appear to have equal averages really do, but they are not perfect anagrams of each other.)

The other metric that we use when incorporating the guess-at-will strategy is the failure rate: which quads would lose least often when following this strategy? The five quads listed above would all occasionally require a seventh turn to win. But as noted at the outset of this section, there are starting quads like (4-1) which will never lead to a loss if the player simply guesses any Wordle word that is consistent with the clues at each turn! There are 230 such quads among the 45K, so they are all tied for best by this metric. To break the tie we might invoke one of the metrics from section (A). For example, 130 of them have a maximum cluster size of 3.
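For a single cluster, the guess-at-will distribution can be computed by recursing over the player's possible guesses. The Python sketch below assumes a uniformly random hidden word within the cluster and uniformly random guesses among the words still consistent with all clues, and reproduces the (1/3, 2/3, 0) and (1/3, 1/3, 1/3) vectors mentioned in part (C) of the interlude. `game_summary` (a hypothetical helper) then combines clusters into an average number of turns and a loss probability. The recursion is exponential in the worst case, so it is meant for the small clusters left by a good starting set.

```python
from collections import Counter
from fractions import Fraction


def feedback(guess, hidden):
    """Tile colors: 'G' green, 'Y' yellow, '.' gray (greens assigned first)."""
    tiles = ["."] * 5
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def cluster_distribution(cluster):
    """P(hidden word found after 1, 2, 3, ... more guesses) under guess-at-will."""

    def for_hidden(cands, hidden):
        out = Counter()
        share = Fraction(1, len(cands))
        for guess in cands:
            if guess == hidden:
                out[1] += share
                continue
            pattern = feedback(guess, hidden)
            rest = [w for w in cands if feedback(guess, w) == pattern]
            for k, pr in for_hidden(rest, hidden).items():
                out[k + 1] += share * pr
        return out

    total = Counter()
    for hidden in cluster:
        for k, pr in for_hidden(list(cluster), hidden).items():
            total[k] += pr / len(cluster)
    return [total[k] for k in range(1, max(total) + 1)]


def game_summary(clusters, n_start, limit=6):
    """Average total turns and P(loss) after an n_start-word starting set."""
    n_words = sum(len(c) for c in clusters)
    avg = loss = Fraction(0)
    for c in clusters:
        weight = Fraction(len(c), n_words)
        for k, pr in enumerate(cluster_distribution(c), start=1):
            avg += weight * pr * (n_start + k)
            if n_start + k > limit:
                loss += weight * pr
    return avg, loss


print(cluster_distribution(["skate", "stake", "state"]))
print(cluster_distribution(["haunt", "jaunt", "vaunt"]))
```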
(All the others have a max cluster size of 4, except for [angst, birch, dumpy, vowel] whose cluster vector is [2145, 78, 3, 0, 1]; yet even its largest cluster, {skate, stake, state, steak, taste}, can be resolved in two turns with free-form guessing!) Alternatively, we could break the tie by looking at the average number of turns needed. In that case the winners are

    [carve, downy, plumb, sight]   5.02980 turns on average
    [brawl, coven, dumpy, sight]   5.03024
    [burst, champ, dingy, vowel]   5.03024
    [carve, downy, fight, slump]   5.03067
    [covet, gland, shrub, wimpy]   5.03067
    [burst, champ, dying, vowel]   5.03110

(The last has a perfect anagram [burst, champ, dowel, vying] too.) These all have similar, 3-term, cluster vectors.

These 230 quads are the most impressive, but really, starting with any 20-letter quad is sure to give satisfactory results. The very worst of them still wins 97.58% of the time just by guessing Wordle words after the initial quad is played. The smallest *nonzero* rate of failure among the 45K quads is exactly 1 loss per 5 full cycles, i.e. the daily Wordle player could expect a loss only once every 32 YEARS!

(C) BEST STARTING SETS IF YOU'LL USE A SIMPLE STRATEGY THAT FORCES A WIN

Since the guess-at-will strategy is already very successful, the use of a strategy which dictates actions only for problematic clusters is expected to result in only small changes in play. In particular, we expect that the best quads in part (C) are likely to be among the best ones in part (B). So I reviewed each of the top 1000 quads, ranked by the average number of moves until victory when playing guess-at-will. For each of them I determined which clusters could cause a loss by turn 6, and selected a preferred word, or if necessary an out-of-cluster word, to play on turn 5 that would guarantee success by turn 6.
(Since there are only those two turns left after the initial quad, it's easy to see that any preferred word that works will give the same average number of turns; likewise any successful out-of-cluster word will give the same number of moves.) Note that our decision to seek minimal rules for a win-by-turn-6 strategy limits us to using pre-determined moves only for clusters that could otherwise lead to a loss; in particular, for any of the 230 quads that have no problematic clusters, we introduce no new rules at all, so the average number of turns for them will be the same here in part (C) as it was in part (B).

When a win-by-turn-6 strategy is found for a quad, the distribution vector will simply be of the form [1-x, x], where x is the fraction of the time the game goes to turn 6; then the average number of moves is 5+x. Our standard metric for starting sets having a win-by-6 strategy is to minimize this average number, which is equivalent to minimizing x. Here are the best quadruples, along with the value of x. Also shown are the numbers of in- and out-of-cluster words we must remember to use in order to guarantee this win by turn 6:

    x=0.027213, 1, 2   [bugle, champ, downy, first]
    x=0.027645, 1, 2   [bugle, candy, morph, swift]
    x=0.027645, 1, 2   [barge, clump, downy, shift]
    x=0.028077, 1, 2   [chump, dying, fable, worst]
    x=0.028077, 1, 1   [bland, comfy, purse, wight]
    x=0.028509, 1, 2   [bugle, crimp, downy, shaft]
    x=0.028509, 1, 2   [bugle, cramp, downy, shift]  (perfect anagram of the line above)
    x=0.028509, 1, 1   [dumpy, globe, ranch, swift]
    x=0.028509, 1, 1   [downy, farce, plumb, sight]
    x=0.028509, 1, 1   [chump, globe, randy, swift]

Again the numbers are close, and we have multiple quads that rank equally highly.

About 22 of these good (top-1000!) quads do not have any winning strategy! In each of the cases of failure in that cohort, the starting quad created a cluster of four _AUNT words such as {haunt, jaunt, taunt, vaunt}.
In such a case, there is no way to guarantee a win by turn 6: no matter what Wordle word is entered on turn 5, there will be at least one pair of these four words that are scored the same, and all we can do on turn 6 is pick one of them to enter, and face a loss if we guessed wrong. Other quads could guarantee a win by turn 6, but only by fixing a strategy for quite a few clusters, including out-of-cluster plays, because the clusters included multiple words of tricky forms like _AUNT, _ATCH, _IPER, CO_ER, etc. So the quads in the table above are remarkable not only because of the low numbers of turns needed but also because of the low numbers of rules to be learned and followed. (Of course, by any notion of "simplicity", the quads with the simplest strategy to achieve a win by turn 6 are the 230 quads that can accomplish it by guessing any word in the cluster!)

We will close out part (C) by discussing a few examples. The unambiguous best quad by the standards of part (C) heads the preceding table:

    [bugle, champ, downy, first]

Its largest cluster is {skate, stake, state, stave}. Guessing, say, STATE on turn 5 would be a problem if the hidden word were STAKE or STAVE --- both would get the same response from STATE. So instead, guess SKATE as the preferred cluster member. Two of its other clusters are {piper, riper, viper} and {jaunt, taunt, vaunt}, and it is clear that guessing any member of these clusters can lead to a loss. A suitable recipe is to guess PARER in the first case and JETTY in the second; then the game will have to go to turn 6, but now there is enough information to enter the correct word on turn 6 and win. As noted in the table above, this strategy needs turn 6 on 2.72% of the days, so the average number of turns needed is 5.0272. That's a minimum over all these 20-letter quads.
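The cluster arguments above can be checked mechanically. Below is a minimal Python sketch (my own illustration, not the program that produced the numbers in this document); it assumes the standard Wordle scoring rule for repeated letters: greens are assigned first, then yellows, up to the number of unmatched copies of each letter in the hidden word. The helper names are hypothetical. `splits` asks whether one guess separates every member of a cluster, as SKATE, JETTY and PARER do above, and `worst_case_turns` computes how many turns guess-at-will play can need on a cluster in the worst case, taking the cluster to be exactly the set of words still consistent with the clues.

```python
from collections import Counter
from functools import lru_cache


def score(guess: str, answer: str) -> str:
    """Wordle tiles for one guess: 'G' green, 'Y' yellow, '.' gray."""
    result = ["."] * len(guess)
    # Letters of the hidden word not already matched by a green tile.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def splits(guess: str, cluster) -> bool:
    """True if every member of the cluster gets a different response to guess."""
    patterns = [score(guess, w) for w in cluster]
    return len(set(patterns)) == len(patterns)


@lru_cache(maxsize=None)
def worst_case_turns(cluster: frozenset) -> int:
    """Worst-case number of turns to find the hidden word when each guess is
    an arbitrary cluster member still consistent with the clues."""
    if len(cluster) == 1:
        return 1
    worst = 0
    for guess in cluster:
        for hidden in cluster:
            if hidden == guess:
                turns = 1
            else:
                # Words that answer this guess exactly as the hidden word does.
                pattern = score(guess, hidden)
                rest = frozenset(w for w in cluster if score(guess, w) == pattern)
                turns = 1 + worst_case_turns(rest)
            worst = max(worst, turns)
    return worst


sta_e = ["skate", "stake", "state", "stave"]
print(splits("state", sta_e), splits("skate", sta_e))           # False True
print(splits("jetty", ["jaunt", "taunt", "vaunt"]))              # True
print(worst_case_turns(frozenset(["jaunt", "taunt", "vaunt"])))  # 3
```

Under the same assumptions, `worst_case_turns` returns 2 for the five-word cluster {skate, stake, state, steak, taste} mentioned in part (B), because each of those words splits the other four. The brute-force recursion is meant for clusters of a handful of words, not for searching whole word lists.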
In the sequel, we will summarize this strategy in just a few lines:

    startset:   [bugle, champ, downy, first]
    preferred:  { skate }
    out-of-box: [jaunt, jetty], [piper, parer]

We have already highlighted the quad

    [chant, dowel, rugby, skimp]

It uses few moves, on average, to win. Unfortunately two of its clusters are the sets {jaunt, taunt, vaunt} and {focal, local, vocal}, and it is clear that any use of in-cluster words has a one-in-three chance of not finding the hidden word until turn 7. We can guarantee a win, but that requires using turn 5 to play "jetty" or "trove" if the first tricky cluster shows up, and something like "fever" if the second cluster does. In that case the distribution vector will be [0.9701, 0.0299]: 5.0299 turns on average.

The set

    (4-6) [bawdy, flung, porch, smite]

was suggested to me by a friend when I was first introduced to Wordle. It also uses 20 different letters and has a fairly good cluster vector [2165, 69, 4]. But it's actually not quite as good as the previous quads. One of the four largest clusters consists of the three words {jaunt, taunt, vaunt}. For this starting quad there is only the one problematic cluster, so we need only one extra rule:

    Start with [bawdy, flung, porch, smite]. Then
      If the hidden word *could* be "jaunt", play "judge".
      Otherwise, continue to guess anything consistent with the clues.

(In the notation of the introduction, this is the one rule [jaunt, judge]. As it happens, JUDGE and TROVE are the only words we could use here!) If we use no strategy with this starting quad, but just guess words consistent with the clues, the probability distribution (frequency that the game lasts 1, 2, or 3 more turns) is [0.966739, 0.032829, 0.000432]. Using instead the one rule [jaunt, judge] changes the distribution to [0.966307, 0.033693].
The success rate on turn 5 has gone down and the average number of turns is unchanged (at 5.0337), but importantly the maximum length of a game has gone down from 7 to 6 by switching to this strategy.

There may also be more efficient 4-word sets that involve fewer than 20 letters; I haven't found any yet. (And I observe that these may be harder to use for people who don't know the wordlist well.) Here is one example that at least comes close:

    (4-8) [champ, flown, rugby, steed]

Its cluster vector is [2132, 87, 3], and I compute the probability distribution vector to be [0.959827, 0.039165, 0.001008], meaning a 0.10% failure rate and an average of 5.0412 turns if we play by guessing at random. Or we can guarantee a win by turn 6 if we add the rules

    {stark, [jaunt, jetty], [stake, evoke]}

With this algorithm the probability distribution vector is [0.958963, 0.041037], meaning an average of 5.0410 turns per game (and a maximum of 6!). Overall not as good as the 20-letter quadruples we've met, but close! I don't claim to have examined all possible starting quadruples; there may be more that should be listed, especially if they excel according to some metric other than those we have used so far.

Just as at the end of the last section, we can offer a Wordle starting set that bridges two sections of this document. The quad

    (4-9) [blast, midge, porch, funky]

adds one word to one of the best starting *triples* from the next section, for all the same reasons: to get a hint, to backpedal on a plan to start with only a fixed triple, etc. As before, this "augmented triple" won't measure up as well as the actual (excellent) triple, but it may be easier to use. FUNKY is arguably the best word to add to the other three. This quadruple has the simple cluster vector [2156, 66, 9]. With a guess-at-will strategy the distribution vector is [0.963715, 0.035133, 0.001152], which works out to an average of 5.0374 turns.
But there's still a loss that way, so we look for additional rules for the tough clusters. A simple choice turns out to be

    {eager, fever, [catch, clown], [jaunt, jetty]}

For this algorithm, the distribution vector turns out to be [0.962851, 0.037150], for 5.0372 turns on average, and of course a 100% win rate.

==============================================================================

THREE

This is a long section because there are many starting triples that are "good" for different reasons, so no single one can be called "best". I have created a separate file that contains all the statistical data for the triples mentioned in this section; feel free to weight the different criteria as you wish to select your favorite starting triple!

With a set of three starting words, we can surely win by turn 6, but in practice this can be tricky. After just three initial words there are at least 11 letters that will not have been tested, so the player must do more sleuthing; e.g. it is quite possible that after three initial guesses the player has seen nothing but grey tiles! And many starting triples cannot guarantee success by turn 6 simply because they cannot quickly enough distinguish, say, JOKER, BOXER, FOYER, LOVER, WOOER and the other twenty(!) _O_ER words.

Still, by choosing an appropriate starting set of three words, one can hope to have a 100% win rate at Wordle. After all, we have already seen in the last section that we can win 100% of the time starting with CARVE + DOWNY + PLUMB; surely with the freedom to choose something other than SIGHT next, we should be able to ask for something more than just a 100% success rate in 6 turns. At the very least we should be able to arrange a lower average number of turns until a win. What else might we ask for? What are we willing to give up? How do we decide that one three-word starting set is "better" than another, or even "the best"?
The question of what is a good three-word starting set arises periodically on the Reddit forum. I fashioned a detailed response analyzing many of the starting triples that had been proposed. In this document, we can put that analysis into context. What we will see is that trying to reduce the expected number of turns needed to win will introduce more complexity in our algorithms.

To be precise, what we had in the previous section was a starting set [carve, downy, plumb, sight] that had two features:

    (A) It wins 100% of the time within two more turns.
    (B) It requires no added rules besides "guess any candidate".

In this section we could hope for a THREE-word starting set with both those properties. After all, it *is* known that there are algorithms to win Wordle in just 5 turns (although the best algorithms to do that, to my knowledge, all require long lists of rules). Sadly, I can prove that no set of three words can have BOTH properties (A) and (B). In fact, I am pretty sure that (B) alone is impossible (more on this below). But we can find starting triples that have property (A)!

----------------
(A) CAN WE FORCE A WIN IN FIVE TURNS? YES!

I have found that there are exactly 261 starting triples with which every game can be won by turn 5. For each of these triples, there will be clusters of words (signalled by the pattern of the 15 colored tiles) that could lead to a loss if we simply play in guess-at-will mode; that is, we will have to map out some preferred words or out-of-cluster words to use in those cases. (See the examples of PROBE and BRACE in the Introduction.) Unfortunately, for each of these starting triples, in order to achieve goal (A), we need at least 54 rules of these two types, which is perhaps too many for a human to execute while playing a game. Which is "best" among these 261 triples is, as always, a matter of taste, but

    (R3-01) [blast, midge, porch]

is certainly a good choice.
Let's discuss this triple in detail; comparable analyses for other triples are given as a table in another file. Of all 261 starting triples, this one's cluster vector [1597, 207, 53, 18, 9, 3, 0, 0, 0, 1] has the highest total number of clusters, which translates into having the highest probability of getting the word on the very next turn after the starting triple (81.56%) just by guessing. It also has the highest number of singleton clusters, meaning more words are known with certainty after this opening triple than with any of the other 260 special triples.

The distribution vector using the guess-at-will strategy is [0.815551, 0.168524, 0.014825, 0.001100], so this strategy will take an average of 4.2015 turns; more importantly, it won't complete by turn 6 about 0.11% of the time, and it definitely will not always complete by turn 5! The precise reason is that there are 55 clusters in which the hidden word might still not have been found after two clue-consistent guesses from within the cluster. Of these, 39 clusters can be resolved by turn 5 by playing a "preferred" cluster member on turn 4 (e.g. {arena, freak, raven, wafer, waver, wreak} is such a cluster; "wafer" is the one choice that will work). The other 16 clusters can be resolved on turn 5 by using an out-of-cluster word. (One such cluster is {jaunt, taunt, vaunt}; in order to guarantee a win by turn 5 we must enter either "jetty" or "trove" on turn 4.) So in toto we have 55 such rules that must be memorized, one for each tricky cluster.
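The averages and loss rates quoted throughout this document follow from a distribution vector by simple bookkeeping. Here is a minimal Python sketch, assuming (as in the text) that entry k of the vector is the probability that the game needs k more turns after the opening words, and that any game running past turn 6 counts as a loss; `summarize` is a hypothetical helper name.

```python
def summarize(dist, start_words, max_turns=6):
    """Return (expected total turns, probability of a loss).

    dist[k-1] is the probability that the game needs k more turns after the
    start_words opening guesses.  Play is assumed to continue until the word
    is found, and any game running past max_turns counts as a loss.
    """
    if abs(sum(dist) - 1.0) > 1e-4:
        raise ValueError("the distribution vector should sum to 1")
    expected = start_words + sum(k * p for k, p in enumerate(dist, start=1))
    loss = sum(p for k, p in enumerate(dist, start=1) if start_words + k > max_turns)
    return expected, loss


# Guess-at-will play after (R3-01) [blast, midge, porch]:
avg, loss = summarize([0.815551, 0.168524, 0.014825, 0.001100], start_words=3)
print(f"{avg:.4f} turns on average, loss rate {loss:.2%}")
# prints: 4.2015 turns on average, loss rate 0.11%
```

The same call with start_words=4 and the (4-6) vector [0.966739, 0.032829, 0.000432] gives 5.0337 turns and a 0.04% loss rate; with the one-rule vector [0.966307, 0.033693] the average is unchanged but the loss rate drops to zero, as described in the previous section.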
A sample algorithm using this information might be this:

After blast+midge+porch, enter any word consistent with the clues, except

  * If the word COULD be any of the following 39 words, then play it:
      allow, antic, awake, award, bevel, crown, dizzy, dowdy, dried, drone,
      eater, enter, equal, fauna, fatty, fewer, filly, finer, folly, funky,
      jelly, kitty, liner, mafia, otter, relax, safer, seize, sever, shown,
      skate, skulk, swash, taste, testy, udder, unfed, value, wafer

  * If the word COULD be any of the following 16 first-halves, play the second half:
      [anger, gawky], [catch, crown], [cinch, crown], [crane, ozone],
      [fatal, fella], [field, gawky], [fight, frown], [fizzy, ozone],
      [focal, fella], [forth, crown], [fudge, funky], [jaunt, jetty],
      [major, jetty], [rower, gawky], [snoop, frown], [stoke, funky]

Then (if the hidden word has not already been played) there is only one Wordle word consistent with the clues; play it on turn 5 and win.

The probability vector for this set of rules is [0.808639, 0.191361], meaning the game runs 4.1914 turns on average (and has a 100% rate of completion by turn 5). This is the lowest average turn count among the algorithms that I checked for these 261 triples.

Other strategies for the 55 problematic clusters exist; for example, for each of them --- indeed for all but four of the 291 non-singleton clusters! --- one or more of the following ten words will split the cluster completely. Make a table of which of these words you wish to use to resolve each of the 55 clusters to create your own win-by-turn-5 algorithm:

    [crown, fewer, filly, funky, gawky, jetty, navel, skate, spunk, tawny]

(The cardinalities of the set of ten words here and of the seven out-of-cluster words used in the previous algorithm are minimal, as determined by Gurobi.) With this given starting triple ("Phase 1"), these different algorithms to complete the daily puzzle ("Phase 2") can have slightly different probability vectors and thus different expected numbers of turns.
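The sample algorithm above is essentially a lookup table. The sketch below encodes a few of its rules (the hypothetical, abbreviated sets `PREFERRED` and `OUT_OF_CLUSTER`; the full lists are the 39 words and 16 pairs above) and picks the turn-4 play from the words still consistent with the opening clues. Checking the out-of-cluster triggers before the preferred words is my own assumption; the text does not say how to order them. The second function is a bookkeeping identity that I believe holds for any rule set that resolves every cluster: each cluster wins on the next turn for exactly one word (the in-cluster guess), except clusters handled by an out-of-cluster play, where no word does. It reproduces the vector [0.808639, 0.191361] quoted above, and the final-turn fractions of the quads (4-6), (4-8) and (4-9) from the previous section.

```python
# Abbreviated, illustrative subset of the (R3-01) rules listed above.
PREFERRED = {"allow", "skate", "wafer"}
OUT_OF_CLUSTER = {"jaunt": "jetty", "fudge": "funky", "snoop": "frown"}


def next_guess(candidates, preferred=PREFERRED, out_of_cluster=OUT_OF_CLUSTER):
    """Turn-4 play, given the words still consistent with the opening clues."""
    for trigger in sorted(out_of_cluster):
        if trigger in candidates:
            return out_of_cluster[trigger]   # out-of-cluster play
    for word in sorted(candidates):
        if word in preferred:
            return word                      # preferred cluster member
    return min(candidates)                   # guess-at-will: any candidate works


def final_turn_fraction(cluster_vector, n_out_of_cluster):
    """Fraction of days on which the game needs its last planned turn.

    cluster_vector[i-1] is the number of clusters of size i.
    """
    n_words = sum(i * v for i, v in enumerate(cluster_vector, start=1))
    n_clusters = sum(cluster_vector)
    return (n_words - n_clusters + n_out_of_cluster) / n_words


print(next_guess({"arena", "freak", "raven", "wafer", "waver", "wreak"}))  # wafer
print(next_guess({"jaunt", "taunt", "vaunt"}))                             # jetty
print(round(final_turn_fraction([1597, 207, 53, 18, 9, 3, 0, 0, 0, 1], 16), 6))  # 0.191361
```

For the quads, the same identity with the cluster vectors [2165, 69, 4], [2132, 87, 3] and [2156, 66, 9] and one, two and two out-of-cluster rules gives x = 78/2315, 95/2315 and 86/2315, matching the vectors quoted for (4-6), (4-8) and (4-9).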
This starting triple can also be used, more easily, to win Wordle by turn 6. That is, a player who initially intends to follow this algorithm so as to win by turn 5 may decide during play that it would be sufficient to win by turn 6, and can then forget most of the 55 special rules listed above; the only ones still needed are

    {crown, fauna, fewer} and [fight, frown], [fudge, funky], [rower, gawky], [snoop, frown]

Alternatively, if we're willing to wait until turn 6 to win, we can use more preferred words and fewer out-of-cluster ones: the strategy

    {crown, fauna, fewer, fudge, rower, snoop, [fight, frown]}

works, and gives a distribution vector of [0.815119, 0.172282, 0.0125989] and thus a slightly higher number of turns (4.197) than when using the 55 rules to finish by turn 5. (All those extra rules were needed just to avoid these 1.26% of the days when the game took a sixth turn to win!) We have already mentioned a third alternative in the previous section: we can consistently play FUNKY on turn 4 and then follow rules for just four special clusters in (4-9); but this gives a significantly higher expected number of turns: 5.037.

Note that by using this 3-word starter set on an N-fold compound game, we can solve all N of the subgames in at worst 3+2N turns. When N > 2, this worst case is more than the N+5 turns typically allowed in compound games. But when N=2, the two are equal, meaning we have a guaranteed winning strategy for 2-fold Wordle. Dordle is not exactly a 2-fold Wordle (it uses a different word set), so one does not have an a priori guarantee that this algorithm will work for Dordle. But as it turns out, it does still work, with minor modifications.
Change the set of preferred words to this set of 35:

    { allow, assay, awake, awash, awful, crown, dowdy, drone, eater,
      enjoy, enter, fatty, fever, finer, folly, funky, goner, jawed,
      kneed, lefty, newly, otter, relax, sally, seize, sever, skate,
      skier, skulk, snipe, testy, tower, value, viper, wafer }

and change the set of out-of-cluster moves to this set of 20 pairs:

    [anger, wagon], [catch, clown], [cinch, awful], [crane, anvil],
    [dizzy, dozen], [fatal, awful], [field, awful], [fifty, flank],
    [fight, flown], [focal, fever], [forth, awful], [foyer, gawky],
    [fudge, fauna], [jaunt, jetty], [liner, anvil], [lower, anvil],
    [major, agony], [snoop, flown], [staff, bonus], [stoke, ankle]

Then each of the two subgames of the Dordle game will definitely end within 3+2 turns, i.e. the whole game will end within 3+2+2=7 turns.

Finally, for basic Wordle we may return to a point made in the Introduction. If we enter only BLAST + PORCH, on about one-fourth of the days we will see 4 or 5 colored tiles, or 3 greens, or 2 greens and a yellow. In most of those cases we can still win by turn 5 by simple guessing, without entering MIDGE! We need only watch for the following words, to be used as preferred members of their clusters:

    [blade, blond, brain, ditch, graft, gulch, mouth, plain, swash]

and if the word could be STORE or CATCH, play HYMEN. Doing so will lower the expected number of turns to 3.9. (We can similarly avoid MIDGE in the compound games, but this analysis applies only if all the subgames show such a favorable return from just BLAST + PORCH, which becomes increasingly rare as the number of subgames increases.)

This long analysis of BLAST + MIDGE + PORCH can be repeated for each of the other 260 starting triples that have property (A). I have not done so, but have collected some data about those triples and invite a discussion of which others are, by some measure, better than this one.

----------------
(B) IS THERE A WINNING STARTING TRIPLE THAT REQUIRES NO EXTRA RULES? NO!
Now, what about property (B)? Surely it would be convenient to have a starting triple that worked as easily as the starting quad CARVE + DOWNY + PLUMB + SIGHT: just enter the starting set and keep guessing words that are consistent with the clues. It would be great to know for certain that we'd find the hidden word by turn 6!

I believe I can prove that no such triple exists when using only words from the Wordle answer list. Just for this search, though, I also looked at the longer 14853-word list of valid input words. In order to speed things up I made the reasonable, but not ironclad, assumption that such a triple would involve 15 distinct letters. (This permitted me to do a preliminary compression to the 5,649 sets of five distinct letters that can form at least one of those words, and then to non-intersecting triples of such letter-sets.) If I have done the search properly, I can report that no such perfect triple exists: for every (15-letter) triple of allowed input words, there is at least one cluster for which the guess-at-will strategy can lead to a loss in standard 6-turn Wordle.

I did also look for near-misses, though, and found a couple of triples for which there is only one bad cluster. The best is

    (R3-02) [bonds, glamp, fecht]

Each of the hidden words SKATE, STAKE, STARE, STATE, and STAVE will turn the S, E, and T tiles yellow and the A tile green, and a guess-at-will strategy would, for example, allow the player to guess them in reverse alphabetical order, which would be a loss if the hidden word were STAKE. So in this case the player must remember (only) one additional rule: "prefer SKATE". Also having just one bad cluster is

    (R3-03) [techs, glamp, rownd]

which has a cluster {berry, eerie, ferry, fever, jerky, verve}. This cluster can again lead to a loss from random guessing (e.g.
the sequence "verve, jerky, ferry, berry"), but again the loss can be prevented by playing the preferred word BERRY on turn 4 if we get a green E and a yellow R from the starting triple.

(While triple (R3-02) has one word in the Wordle solution list, triple (R3-03) has none at all! I would also say that the first triple is "better" than the second in the sense that we only have to invoke our special rule ("guess SKATE") on five days out of a 6.5-year cycle, as opposed to needing the other special rule ("guess BERRY") on six days per cycle. There's also a minor technical reason. Finding these good triples amounts to making sure they never (or rarely) permit quadruples like {berry, ferry, jerky, eerie} to be together in a cluster after the initial triple of words is entered. I assembled a list of tens of thousands of these problematic quadruples and then developed mechanisms to detect starting triples that broke most of these quads apart. The first triple I listed only missed its one quadruple "STA_E". The other one actually missed two: both {berry, ferry, jerky, eerie} and {berry, ferry, jerky, verve} are problematic quadruples. This is a minor distinction, of course.)

I did not find any other starting triples that involved only a single bad cluster. Both GLAND+ROMPS+FECHT and GLAND+ROMPS+WECHT involve just two (and each of them actually leaves three problematic quadruples unseparated). I make no claim about whether other equally-good triples exist. (My method of sorting was only designed to make sure I didn't miss any starting triples that guaranteed success *just by guessing* (with zero special rules like "use SKATE"), and so I am fairly confident that such a triple does not exist; but along the way I had to branch through decision trees to trim the candidate pool, and a starting triple that was a "near miss" might not have been good enough to survive an early-stage pruning.)

----------------
Now let's return to considering only triples of words drawn from the more limited (and more reasonable!) Wordle answer list. I can present a number of triples that are good (by several measures); only rarely can I truly claim that the ones I found are absolutely the best.

Using only the words in the Wordle answer list, there are over 2 billion sets of three words that we might potentially consider as starting sets. Just as we observed with five- and four-word starting sets, it is certainly true that some starting sets with repeated letters can be good. In fact, 46 of the 261 starting triples that can lead to a guaranteed win by turn 5 have repeated letters! (Example: [crump, doubt, salve].) And some posters on Reddit say they successfully use starting sets with repeated letters, e.g. [blind, stare, wimpy] and [colon, right, speed]. But generally speaking it seems prudent to focus on sets of three Wordle words that include 15 different letters. That reduces the number of candidate triples to 1,243,026. (I have listed them all in a 26Mb zipped file.) This set is just small enough that it is possible to run some quick preliminary computations on all of them and then run longer analyses on the most promising among them. (Some statistics take a while to compute!) I computed assorted other statistics for some "promising" triples in this Reddit post. We will review some of the "best" triples that can be found this way.

----------------
(D) THE BEST STARTING TRIPLE BY THE VARIOUS POST-START METRICS

As discussed in a previous section, there are two types of metrics to assess how the game "looks" immediately after the starting set is played: those using the cluster vector, and those counting the colored tiles. Each method splits into a whole continuum of metrics (distinguished by a parameter), and allows a couple of variations.
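The cluster vector that these metrics start from can be computed by grouping the hidden words according to the full pattern of tiles the starting set produces. Here is a minimal Python sketch, again assuming the standard Wordle scoring of repeated letters; the helper names are hypothetical, and a real run would use the full 2315-word answer list rather than the ten-word sample below, which contains the three clusters of [bugle, champ, downy, first] discussed in part (C) of the previous section.

```python
from collections import Counter


def score(guess, answer):
    """Wordle tiles for one guess: 'G' green, 'Y' yellow, '.' gray."""
    result = ["."] * len(guess)
    # Letters of the hidden word not already matched by a green tile.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def cluster_vector(start_words, answers):
    """v[i-1] = number of clusters of size i after the start words are played."""
    # A word's cluster is determined by the tiles from every starting word.
    pattern_of = {w: tuple(score(g, w) for g in start_words) for w in answers}
    cluster_sizes = Counter(pattern_of.values()).values()
    size_counts = Counter(cluster_sizes)
    return [size_counts[i] for i in range(1, max(size_counts) + 1)]


sample = ["skate", "stake", "state", "stave", "jaunt", "taunt", "vaunt",
          "piper", "riper", "viper"]
print(cluster_vector(["bugle", "champ", "downy", "first"], sample))  # [0, 0, 2, 1]
```

The check sum(i * v_i) = number of hidden words is a cheap sanity test; for the full list it must give 2315, as it does for every cluster vector quoted in this document.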
As for the cluster-vector metrics: I have worked out the rankings of the triples based on the Lp metrics, and found the top-4 triples for every possible value of p. The only triples in that list are

    [birth, model, spawn], [bland, comet, sprig], [bland, copse, mirth],
    [bland, copse, right], [blimp, dance, short], [blimp, dance, worst],
    [blind, comet, sharp], [blind, match, spore]*, [blond, march, spite],
    [blond, rivet, scamp], [chirp, golem, stand], [clamp, diner, ghost],
    [climb, donut, parse], [climb, sandy, trope], [midge, porch, slant]

(*): There is also the anagram triple BLOND+MATCH+SPIRE, which results in identical game play to BLIND+MATCH+SPORE, so it will not be separately mentioned.

Specifically, the very best triple is, for each p, one of these three:

    (R3-04) [blimp, dance, short]   (for all p < -0.7479659304)
    (R3-05) [bland, copse, right]   (for -0.7479659304 < p < 2.166571184)
    (R3-06) [bland, comet, sprig]   (for all larger p)

Each of these passes to second-best for some other ranges of p; the only other triples that are second best for any p are

    (R3-07) [bland, copse, mirth]   (for all p < -0.747965930)
    (R3-08) [clamp, diner, ghost]   (only for 2.232224367 < p < 2.282649886)
    (R3-09) [blimp, dance, worst]   (for all p > 2.282649886)

Particular values of p give us the metrics of independent interest: p=+infinity, p=2, (p=1), p=0, and p=-infinity.

When p=+infinity, the ranking by the Lp metric coincides with ranking by metric M1, the size of the largest cluster. It turns out that every triple yields cluster(s) with at least six words in them. There are 283 triples whose largest clusters have only 6 members, so all tie by metric M1. We can break the ties to pick out the "best" among that set by using the Lp metric for large (finite) p; that puts (R3-06) on top, closely followed by (R3-09); then come

    (R3-10) [blond, rivet, scamp]
    (R3-11) [birth, model, spawn]

As it turns out, these four are the only ones of the 283 to have just a single cluster of size 6.
They also score best by the metrics we will use in parts (E) and (F).

I did a partial search for triples that use fewer than 15 different letters yet still have no "large" clusters. I still did not find any triples whose largest clusters are smaller than 6 words, but I did find some reasonable candidates with just a few 6-element clusters and nothing larger. An example is BLIMP + COAST + RENEW, but it's not quite as good as the previous four: it uses slightly more turns, has more problematic clusters, etc. (It has four clusters of size 6; I have not yet found a triple with repeated letters that has at most one cluster of size 6 and none larger.)

Recall that when p=2, the Lp metric is equivalent to the average cluster size (as measured day by day). According to this ranking, the best triples are (R3-05), (R3-04), (R3-08), and (R3-06). A player using the first of these triples day after day would on average have only 1.4501 words to choose from in the cluster containing the word of the day! (Those first two triples maintain top ranking for all lower p, including p=0 and p=-infinity.)

When p=1, all triples are tied; the L_1 metric simply counts the words in the Wordle dictionary (2315), no matter what the triple. Even though the optimization process is treated a bit differently for p>1 versus p<1, for p very near 1 on either side the ranking is the same: the best for this p is (R3-05), followed by (R3-04), then

    (R3-12) [blind, match, spore]

and then

    (R3-13) [blind, comet, sharp]

As noted in our section on Rankings, these triples are best for values of p near p=1 because they are the ones with the smallest derivative there, sum( v_i * i * log(i) ).

Taking p=0 gives metric M2, the total number of clusters; it is maximized by (R3-05), which spreads the 2315 words into 1954 different clusters; (R3-04) is second best with 1952. Then follow (R3-07) and

    (R3-14) [midge, porch, slant]

with 1949; these two switch their positions in the rankings precisely at p=0.
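The Rankings section that defines the Lp family is not reproduced here, so the following is an assumption: one reading that is consistent with every special case quoted above is the score S(p) = sum_i v_i * i^p, where v_i is the number of clusters of size i. Then S(0) is the number of clusters (M2), S(1) = 2315 for every triple, S(2)/2315 is the day-by-day average cluster size, dS/dp at p=1 is sum v_i * i * log(i), S(p)^(1/p) tends to the largest cluster size (M1) as p grows, and S(p) tends to the number of singletons (M3) as p becomes very negative. Smaller S wins for p > 1 and larger S wins for p < 1. A minimal Python sketch with hypothetical function names, using the cluster vector of (R3-01) [blast, midge, porch] from part (A):

```python
import math


def lp_score(cluster_vector, p):
    """S(p) = sum over cluster sizes i of v_i * i**p (v_i = clusters of size i)."""
    return sum(v * i ** p for i, v in enumerate(cluster_vector, start=1) if v)


def slope_at_one(cluster_vector):
    """dS/dp at p = 1, i.e. sum_i v_i * i * log(i); smaller is better near p = 1."""
    return sum(v * i * math.log(i) for i, v in enumerate(cluster_vector, start=1) if v)


def ranks_ahead(vec_a, vec_b, p):
    """True if vec_a beats vec_b: minimize S(p) for p > 1, maximize it for p < 1."""
    sa, sb = lp_score(vec_a, p), lp_score(vec_b, p)
    return sa < sb if p > 1 else sa > sb


r3_01 = [1597, 207, 53, 18, 9, 3, 0, 0, 0, 1]     # [blast, midge, porch]
print(lp_score(r3_01, 0))                          # 1888 clusters (M2)
print(lp_score(r3_01, 1))                          # 2315 words, the same for every triple
print(round(lp_score(r3_01, 2) / 2315, 4))         # 1.565, average cluster size by day
print(round(lp_score(r3_01, 60) ** (1 / 60), 3))   # 10.0, approaching the largest cluster (M1)
```

The sign flip in `ranks_ahead` is why two cluster vectors can trade places as p varies: [4, 0, 0, 1] beats [0, 4] at p=0 (five clusters against four) but loses at p=2, just as (R3-07) and (R3-14) swap places at p=0.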
Finally, for very negative p the ordering of starting triples matches that of "p = -infinity", i.e. metric M3 (the number of singletons). Now (R3-04) is the best triple, putting each of 1696 words --- nearly three-fourths of the Wordle solution list --- into its own cluster. This is followed by (R3-07), (R3-14), and then by (R3-15) and (R3-16):

    (R3-15) [blond, march, spite]
    (R3-16) [copse, gland, mirth]

In a previous section we proposed variants of the Lp metrics that can also be computed from the cluster vectors. We have metric M4, which counts words that are in clusters with at most 3 words in them. The winner on this scale is

    (R3-17) [bench, midst, polar]

It has just 78 words in clusters of size 4 or more (10 quads, 3 sextets, and one cluster each of sizes 5, 7, and 8); the other 2237 words are in clusters small enough that we can just test all the candidates in a cluster by turn 6! Second and third place go to

    (R3-18) [chirp, donut, false]
    (R3-19) [blind, comet, spray]

with 2234 and 2228 respectively; these are followed very closely by many good ones.

Not much changes if instead we count the number of small clusters as a fraction of the total number of clusters. This time it's (R3-18) that ekes out a win, with 99.21% of all clusters small; (R3-17) is just behind it with 99.16%, then (R3-19) is third with 99.12%, and in fourth, with 99.11%, is

    (R3-20) [bench, solid, tramp]

They have only 15, 16, 17, and 17 clusters, respectively, with more than 3 words in them.

We turn next to the metrics relating to the numbers of green and yellow tiles produced by the starting triple: M5 = G + f Y, with colored tiles counted across a whole 2315-day cycle. When f=0, we are comparing our starting triples by looking only at the green tiles. In this case, and indeed for all f < 0.0769, the best is

    (R3-21) [brace, moult, shiny]

which produces 3611 green tiles per cycle (an average of 1.56 per day) along with 5260 yellow tiles.
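The tile-count metric M5 needs only the green and yellow totals. Below is a minimal Python sketch (hypothetical helper names, standard Wordle scoring of repeated letters assumed) that counts the colored tiles a starting triple shows for one hidden word, and computes the weight f at which two triples tie under M5 = G + f Y. Applied to the cycle totals quoted for (R3-21), (R3-22) and (R3-23), the tie weights come out at the thresholds f = 0.0769 and f = 0.5587 used in this part.

```python
from collections import Counter


def score(guess, answer):
    """Wordle tiles for one guess: 'G' green, 'Y' yellow, '.' gray."""
    result = ["."] * len(guess)
    # Letters of the hidden word not already matched by a green tile.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def colored_tiles(start_words, answer):
    """(greens, yellows) shown by all the starting words for one hidden word."""
    patterns = [score(g, answer) for g in start_words]
    return (sum(p.count("G") for p in patterns),
            sum(p.count("Y") for p in patterns))


def tie_weight(totals_a, totals_b):
    """Weight f at which G + f*Y is equal for two (G, Y) cycle totals."""
    (ga, ya), (gb, yb) = totals_a, totals_b
    return (ga - gb) / (yb - ya)


r3_28 = ["handy", "slice", "tumor"]
print(colored_tiles(r3_28, "kappa"))   # (1, 0): just the green A
print(colored_tiles(r3_28, "gawky"))   # (2, 0)
print(round(tie_weight((3611, 5260), (3605, 5338)), 4))  # 0.0769: R3-21 vs R3-22
print(round(tie_weight((3605, 5338), (3505, 5517)), 4))  # 0.5587: R3-22 vs R3-23
```

Summing `colored_tiles` over the whole answer list would give the cycle totals G and Y themselves; that step is omitted here because it needs the full word list.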
For larger values of f up to 0.5587, including the natural choice of f=1/2, the best is this one, with G=3605, Y=5338:

    (R3-22) [built, crone, shady]

(Note: [built, crony, shade] will yield identical game play, and so need not be mentioned separately.) Then, for higher fractional weights (up to f = 1.0), the torch passes to this one, which offers 3505 green tiles and 5517 yellows (total: 9022):

    (R3-23) [curly, point, shade]

Finally, at f=1.0, yellow and green are counted equally. At that point, any two triples that use the same letters will show the same number of colored tiles, so there is a massive tie of 455 (I think) triples that all use the letters {a, c, d, e, h, i, l, n, o, p, r, s, t, u, y}, probably the best of which is (R3-23) itself. (It also has the highest proportion of green tiles among the colored ones.) So each of those triples qualifies as "best" for a range of values of f.

For each positive value of f <= 1, the second- and third-best are either one of those three triples, or one of these four:

    (R3-24) [briny, coupe, shalt]   (3600 G, 5319 Y)
    (R3-25) [cruel, point, shady]   (3476 G, 5546 Y)
    (R3-26) [count, plier, shady]   (3466 G, 5556 Y)
    (R3-27) [crony, guide, shalt]   (3547 G, 5429 Y)

(The triples [crony, guilt, shade] and [crone, guilt, shady] have exactly the same letters in the same positions as (R3-27), so the game proceeds identically in all three cases.)

Rather than a high average count of colored tiles, you might want a high minimum count of colored tiles. Arguably the best triple to use, to avoid days with few clues from the starting triple, is:

    (R3-28) [handy, slice, tumor]

Use this starting triple and you will never get an all-gray day! There is only one day in the whole 7-year cycle when you will get just a single colored tile (it will be a green A, because the hidden word will be "kappa"). In addition you get just two yellow tiles on 15 days, two greens on 21 days, and one of each on 47 days.
On all the other 2231 days you will be rewarded with three or more colored tiles. (Having these same attributes, but generally worse measures of play, are its anagrams [handy, lemur, stoic] and [duchy, merit, salon]. All triples made from these 15 letters will give three or more colored tiles to all but 84 of the 2315 hidden words; no letter set can reduce this number below 84. The main reason many of those 84 words give only 1 or 2 colored tiles is that the hidden word itself is made of few letters, e.g. MAMMA can never get more than two colored tiles from our 15-letter starting triples! If we restrict our attention to the hidden words made of five distinct letters, every one of them except GAWKY will return three or more colored tiles to triple (R3-28).)

A variant in the opposite direction (many colored tiles) is to ask for the greatest number of colored tiles from just the first two words in a triple. We mentioned in the Introduction the possibility of starting with an intended triple but abandoning the third word if the first two have already yielded "many" clues. (This is much less useful in the compound games.) In the next section we will look at starting pairs, and given any starting pair that gives many colored tiles (by whatever weighting we want for yellow versus green ones), we can simply adjoin a third word to get a starting triple that would be optimal by the metric of this paragraph. Carrying this out for some high-ranking starting pairs, we find that the best third word to add is usually dumpy, jumpy, or pudgy. Judging the triples that are created in this way by the metrics for triples, the best is probably

    (R3-29) [close, train; dumpy]

Also scoring well are [scone, trial; dumpy] and [crony, slate; humid].

----------------
(E) BEST STARTING SETS IF YOU INTEND TO JUST KEEP GUESSING

We have already observed that no triple has property (B); that is, for every starting triple the guess-at-will strategy can fail to find the word by turn 6.
So we begin with metric M6: the probability of a loss. I believe the very best by this metric is

    (R3-30) [blond, girth, swamp]

It's actually kind of hard to make enough bad guesses that you lose after 6 turns! The expected number of losses in an entire 2315-word cycle is 0.446, i.e. you could reasonably expect to go *14 years* between losses! (But yes, it can happen: for example it would be consistent with the rules to guess the sequence "treat", "extra", and "cater", and lose if the hidden word is "after". Try it and see.) Runners-up appear to be

    (R3-31) [blond, right, swamp]
    (R3-32) [blend, right, scamp]

That small chance of failing to finish by turn 6 can be reduced to 0 if, instead of guessing cluster words at random, we remember to play a few preferred words when a troublesome cluster appears. In fact, these three triples also appear in the list of 261 triples that allow a win by turn 5; that is, if instead of just the few preferred words we were to memorize dozens of rules, then we could force a win one turn earlier even in those 2%-3% of the cases when the game would otherwise go to the sixth round. (Triple (R3-32) can accomplish a 5-turn win by using 41 preferred words and 13 out-of-cluster moves; this total of 54 rules is minimal within that group.) These triples respectively manage to break up all but 7, 8, and 10 of the "problematic" quadruples mentioned in part (B).

Metric M7 measures the expected length of a game (playing until the hidden word is found, potentially past the sixth turn). This is minimized by the stellar starting triple

    (R3-05) [bland, copse, right]

which, on average, solves Wordle with this strategy in 4.1707 turns. This triple is also excellent by other measures. We have already seen in part (D) that it has the largest number of clusters (1954), and it has the lowest daily average number of possible solutions (on average only 1.4501 words match the tile colors).
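The guess-at-will figures behind metrics M6 and M7 don't require simulation; they can be computed exactly, one cluster at a time. The sketch below illustrates the idea under stated assumptions and is not the program that produced the figures above. It assumes the hidden word is equally likely to be any member of its cluster, that every guess is drawn uniformly from the words still consistent with all clues, and that `feedback` follows the usual Wordle rule for repeated letters. All function names are hypothetical, and the example uses a toy cluster of four _IGHT words rather than real cluster data.

```python
from collections import Counter, defaultdict
from fractions import Fraction
from functools import lru_cache


def feedback(guess: str, hidden: str) -> str:
    """Tile pattern for one guess: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1  # hidden letters still available for yellows
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


@lru_cache(maxsize=None)
def guess_distribution(cluster: frozenset) -> tuple:
    """Exact distribution of further guesses needed inside one cluster.

    Assumes the hidden word is uniform over the cluster and each guess is
    uniform over the words still consistent with every clue.
    Returns sorted (guesses, probability) pairs.
    """
    n = len(cluster)
    dist = defaultdict(Fraction)
    for guess in cluster:
        dist[1] += Fraction(1, n * n)  # this guess is the hidden word
        groups = defaultdict(list)
        for hidden in cluster:
            if hidden != guess:
                groups[feedback(guess, hidden)].append(hidden)
        for group in groups.values():
            # P(this guess) * P(hidden word lands in this group)
            weight = Fraction(len(group), n * n)
            for k, p in guess_distribution(frozenset(group)):
                dist[k + 1] += weight * p
    return tuple(sorted(dist.items()))


def loss_probability(clusters, guesses_left=3):
    """M6-style loss rate, weighting each cluster by its share of hidden words."""
    total = sum(len(c) for c in clusters)
    return sum(
        Fraction(len(c), total) * p
        for c in clusters
        for k, p in guess_distribution(frozenset(c))
        if k > guesses_left
    )


def expected_game_length(clusters, opening_turns=3):
    """M7-style average game length, counting the opening words as turns."""
    total = sum(len(c) for c in clusters)
    return opening_turns + sum(
        Fraction(len(c), total) * k * p
        for c in clusters
        for k, p in guess_distribution(frozenset(c))
    )


ight = ["fight", "light", "might", "night"]
print(guess_distribution(frozenset(ight)))  # four outcomes, each Fraction(1, 4)
print(loss_probability([ight, ["crane"]]))  # 1/5
```

In the toy cluster every wrong guess eliminates only itself. The number of further guesses is therefore uniform on 1 to 4, and a player with three turns left loses a quarter of the time. Adding a singleton cluster dilutes that loss rate to 1/5. Exact fractions keep very small probabilities exact rather than approximate.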
When using the guess-at-will strategy, (R3-05) is also the triple giving the highest probability of winning on the very next turn (84.4% of games will end by turn 4). And we'll see in part (F) that it also finishes fastest, on average, when using a strategy designed to win by turn 6. Nearly as good in terms of metric M7 are these four (which are actually as good or better by some of the other metrics), so arguably any of these could also be called the best overall starting triple:

    (R3-04) [blimp, dance, short]
    (R3-07) [bland, copse, mirth]
    (R3-15) [blond, march, spite]
    (R3-14) [midge, porch, slant]

Note also that [blond, spite] alone is a good starting pair, as we shall see in the next section.

(Triple (R3-18) has the distinction of being the worst of the triples mentioned in this document by the following criterion: a very unlucky guesser pursuing a guess-at-will strategy with any of the dozens of highlighted triples will surely win by move 9 at the latest -- except that with this one it is still possible to require a tenth move to enter the hidden word and win! This happens, for example, if the player guesses, in succession, the completely appropriate sequence rarer, baker, waver, eager, gamer, gazer, gayer, with "gayer" being the hidden word.)

----------------
(F) BEST STARTING SETS IF YOU'LL USE A SIMPLE STRATEGY THAT FORCES A WIN

I examined every starting triple that I believe has a reasonable chance to excel in this section for some good resolution of its problematic clusters. To be clear, we are interested only in simple strategies that will guarantee a win by turn 6. That means we use a guess-at-will strategy whenever that will surely find the hidden word that fast; we only need a plan for the problematic clusters for which the guess-at-will strategy might not discover the hidden word until turn 7 or later. Rather than include the data in this narrative, I have collected the specifics for each triple mentioned in this document into a separate file.
With this particular chosen strategy to guarantee a win by turn 6, we can then compute the probability distribution showing how long the game might last. So let us turn right away to the triples with a strategy that wins in the fewest turns. Using the sets of preferred in-cluster and out-of-cluster words listed in this file, the starting triples which will take, on average, the fewest turns to find the hidden word are:

    (R3-05) [bland, copse, right]  which needs 4.1682 turns on average
    (R3-14) [midge, porch, slant]
    (R3-04) [blimp, dance, short]
    (R3-07) [bland, copse, mirth]

It is a consistent pattern that the ranking of good starting sets by metric M8 is very similar to their ranking by metric M7. This is not surprising. In this subsection we are assuming the player takes specific actions on turn 4 if the colored tiles signal the need to do so. But the affected clusters only account for a few percent of all the words, so with or without the extra rules, on most days we reach the answer just with freeform guessing. Hence the probability distributions are not greatly different. Similarly, the next few entries in the ranking also score very highly by metric M7, and most of them have already appeared on other lists:

    (R3-16) [copse, gland, mirth]  4.1716 turns on average
    (R3-13) [blind, comet, sharp]
    (R3-33) [blond, right, space]
    (R3-15) [blond, march, spite]

The top four are also at the top of the list of how often the hidden word is discovered right away on turn 4. For example, a player who enters BLAND, COPSE, and RIGHT on turns 1-3 and then follows this strategy will guess the hidden word on turn 4 fully 84.4% of the time. (The precise order for this criterion is (R3-05), (R3-04), (R3-07), and then (R3-14), which is tied with (R3-12).)

Dual to the most-wins-by-turn-4 record held by (R3-05) is the fewest-wins-on-turn-6 triple.
In English: which of these strategies, designed to finish by turn 6, most often actually finishes by turn 5? The best I've found is for

    (R3-34) [brown, midst, place]

So not only does this strategy guarantee a win by turn 6, it is *almost* a strategy to win by turn 5, while using many fewer extra rules than the 261 true win-within-5 strategies discussed earlier. Second place goes to

    (R3-35) [blown, caper, midst]

which does not have a strategy that can guarantee a win by move 5, but can still accomplish it 99.13% of the time! (Triples (R3-09) and (R3-15) also score very well.)

Since triple (R3-05) has now appeared at the top of several lists, let me discuss this one in greater detail. Obviously after any starting triple, the player (assuming they know the Wordle dictionary!) can find the hidden word by move 6 if it is contained in a cluster with only 1, 2, or 3 elements. Triple (R3-05) doesn't have many clusters that are larger than that! The four largest clusters have sizes 10, 7, 6, and 5. It turns out that to prevent a loss by turn 6 when looking within one of these clusters, we just need to play the corresponding preferred words eater, femur, major, mover on turn 4, when they fit. There is a cluster of four _IGHT words that could lead to a loss too, but it can be avoided by using this rule: if FIGHT fits, play AWFUL. That leaves only one cluster of size 5 and thirteen of size 4, and as it turns out, all of them can be resolved by freeform guessing.

That's the playing strategy that will, over the long haul, need 4.1682 turns on average to finish the game, and will always do so by turn 6. In fact, it will finish on turns 4, 5, and 6 with probabilities [0.843629, 0.144557, 0.011814]. Experimentally this triple also minimizes the expected length of a Quordle game, taking about 7.4 turns to solve all four subgames. This has become the starting triple I personally use. (If you try it too, and get stuck on a puzzle, the fourth word MURKY is the word most likely to be helpful.)
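A distribution vector fully determines the average game length, so the 4.1682 figure can be checked directly from the three probabilities just quoted. The sketch below does this with hypothetical helper names. `first_turn` is the turn to which the first entry of the vector applies: 4 after a starting triple, 3 after a pair.

```python
def expected_turns(distribution, first_turn):
    """Average game length, given P(win on turn first_turn + i) = distribution[i]."""
    return sum((first_turn + i) * p for i, p in enumerate(distribution))


def tail_probability(distribution, first_turn, after_turn):
    """Probability that the game is still going after turn `after_turn`."""
    return sum(p for i, p in enumerate(distribution) if first_turn + i > after_turn)


# (R3-05) with its win-by-turn-6 strategy: wins on turns 4, 5 and 6.
r3_05 = [0.843629, 0.144557, 0.011814]
print(round(expected_turns(r3_05, 4), 4))       # 4.1682
print(round(tail_probability(r3_05, 4, 5), 6))  # 0.011814, share of games needing turn 6
```

The same two helpers apply to any of the longer guess-at-will vectors. Summing the entries beyond turn 6 gives the failure rate.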
Were it not for "user error", I would regularly be able to go hundreds of Quordle games between losses. (The probability distribution above would suggest that a four-board game such as Quordle would be won within 9 moves only 97.1318% of the time -- one loss every 35 games or so -- assuming that the four subgames are independent. The fact that a much higher success rate is apparently possible shows the weakness of this independence assumption. Apparently the information gained from the easy subgames greatly reduces the ambiguity in the harder subgames.)

This is relevant as we compare this optimal (?) starting triple to the optimal starting quadruple, quad (4-1). An independence assumption would suggest that already for Dordle, and surely for the larger games, there is a lower probability of loss if we use a larger starting set instead of a starting triple; we made precisely this kind of claim when discussing quad (4-1) in the previous section. But this starting triple is seen experimentally to give fewer losses than (4-1), at least for Quordle, because there is now substantial conveyance of information from one subgame to another; this time the independence assumption is far from valid.

It's beyond the scope of this article, but it is nonetheless possible to write out a complete decision tree showing the optimal cluster words to choose (rather than simply choosing cluster candidates at random). Following this tree over the entire 2315-day cycle would involve entering a total of 9629 words, an average of 4.1594 per day. But we demur, since the goal of this paper is to study simple ways to play Wordle, and a human cannot be expected to memorize the whole tree. Indeed, the other way to compare starting sets in this part (F) is precisely in terms of the simplicity of their win-by-turn-6 algorithms.
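The 97.1318% figure can be reproduced with one simple reading of the independence assumption, though the text does not spell that reading out. Under this model, the four boards are finished one after another after the three shared opening words. Each board independently needs 1, 2, or 3 further guesses, with the probabilities quoted for (R3-05), and the game is won if the total stays within 9 turns. The sketch below enumerates every combination under that assumed model. The function name is hypothetical.

```python
from itertools import product


def independent_win_probability(extra_guesses, boards=4, opening_turns=3, turn_limit=9):
    """Chance of finishing every board within `turn_limit` turns, assuming the
    boards are solved one after another and are statistically independent.

    `extra_guesses` maps k (guesses needed on one board after the shared
    opening words) to its probability.
    """
    total = 0.0
    for combo in product(extra_guesses.items(), repeat=boards):
        turns = opening_turns + sum(k for k, _ in combo)
        if turns <= turn_limit:
            p = 1.0
            for _, q in combo:
                p *= q
            total += p
    return total


# (R3-05): winning a single game on turn 4, 5 or 6 means 1, 2 or 3 extra guesses.
r3_05 = {1: 0.843629, 2: 0.144557, 3: 0.011814}
print(f"{independent_win_probability(r3_05):.4%}")  # 97.1318%
```

The complement, just under 3%, is the "one loss every 35 games or so" mentioned above. Because the observed loss rate is far lower, the gap measures how much the boards inform one another.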
Although using non-Wordle words allowed us in part (B) to find a starting triple that required only one extra rule ("guess SKATE"), we can't do that well when using only Wordle words in our starting triple. It appears that three is the minimum here: every starting triple requires the player to remember at least three additional rules if he or she wishes to ensure a win by turn 6. The two triples (R3-30) and (R3-31) are the only ones that manage to do this using only "preferred" words. Some other starting triples also fall just 3 clusters short of meeting goal (B), but require one or two out-of-cluster resolutions:

    (R3-36) [copse, drawn, light]
    (R3-37) [force, glint, swamp]
    (R3-38) [blimp, cedar, ghost]
    (R3-39) [choir, gland, swept]
    (R3-40) [glint, peach, sword]

I believe there are no other starting triples that lead to just three problematic clusters. Of those with four bad clusters, there are just these three that can resolve them all with in-cluster "preferred" words:

    (R3-20) [bench, solid, tramp]
    (R3-41) [blend, match, sprig]
    (R3-42) [blimp, cedar, thong]

----------------
(G) SOME OTHER FAVORITE TRIPLES

We close with a list of a few additional starting triples that have a well-balanced set of attributes.

    (R3-43) [blond, parse, wight]
    (R3-44) [glide, spawn, throb]
    (R3-45) [crawl, fight, spend]

These three are among the 261 starting triples that can guarantee a win by turn 5, have low failure rates when the player is just guessing, and have win-by-turn-6 strategies that use no out-of-cluster words. (The top and bottom ones use only 5 preferred words.) The top two use all 9 of the most common letters. I believe (R3-45) is the fourth-best triple in terms of its failure rate when using guess-at-will.

    (R3-46) [clang, spied, throw]

This is one of the 283 triples whose largest clusters have only 6 elements. Unfortunately it has several of them, but despite that, it finds the hidden word in fewer moves than most.
    (R3-47) [crimp, doubt, salve]

This triple appears on two short lists: all its clusters are of size 6 or less, and it is possible to create a strategy that allows the player to ensure a win by turn 5. It even manages to accomplish these things without also being on the list of 15-letter triples! (But it's not particularly recommended for play: it takes 8 preferred words plus 4 out-of-cluster plays just to ensure a win by move 6.)

----------------
(H) SUMMARY OF "BEST" STARTING TRIPLES

I said at the start of this section that no one triple could be called "best". Here I will do so anyway, for various ways to phrase the question.

Rankings based on how the game board looks right after the starting triple:
    (R3-09) [blimp, dance, worst]   lowest size of largest cluster (M1=6)
    (R3-05) [bland, copse, right]   highest number of clusters (M2=1954) / lowest average cluster size (1.1847)
    (R3-04) [blimp, dance, short]   highest number of singleton clusters (M3=1696)
    (R3-17) [bench, midst, polar]   most words in small clusters (M4=2237)
    (R3-21) [brace, moult, shiny]   most green tiles (G=3611 per 2315-day cycle)
    (R3-22) [built, crone, shady]   most colored tiles by fractional count (G + Y/2, with G=3605, Y=5338)
    (R3-23) [curly, point, shade]   most colored tiles by total count (G + Y = 3505 + 5517 = 9022)
    (R3-28) [handy, slice, tumor]   fewest bad-tile days (one [1,0], fifteen [0,2])
    (R3-29) [close, train; dumpy]   "many" colored tiles after the first two words

Rankings based on the game's end, after randomly guessing candidates:
    (R3-30) [blond, girth, swamp]   lowest chance of failure (M6 = 0.02% -- once per 5190 games)
    (R3-05) [bland, copse, right]   lowest average turn-count (M7 = 4.1707)

Rankings based on the game's end, after using the simplest algorithm designed to ensure a win by move 6:
    (R3-05) [bland, copse, right]   lowest average turn-count (M8 = 4.1682)
    (R3-30) [blond, girth, swamp]   smallest number of extra rules to follow (3)
    (R3-34) [brown, midst, place]   lowest use of turn 6 (0.8210%)

Rankings based on the game's end, after using an algorithm designed to ensure a win by move 5:
    (R3-01) [blast, midge, porch]   lowest average turn-count (4.1914)
    (R3-32) [blend, right, scamp]   lowest number of rules needed (54)

==============================================================================
TWO

We next ask what might be the "best" two-word starting sets. We saw in a previous section that there are many ways to define "best". We'll use those definitions here too, and will keep track of the best few as measured by each individual metric, in case we'd like to find a starting set that isn't "best" by any single criterion, but is at least "good" by many.

When we looked at five-word starting sets, we discovered that having repeated letters was the only choice; among four-word starting sets it was a competitive choice, and among three-word starting sets, having repeated letters was an uncommon choice. Now, among two-word starting sets, we expect that having repeated letters will be a poor choice. So in what follows we will restrict our attention to pairs with no repeated letters. It turns out that there are exactly 196,175 such pairs of Wordle words. I have made a list of all of them, along with some basic data about each. (It is sorted informally to reflect a notion of expected "quality". At the top is [salon, trice]; at the bottom is [inbox, jumpy].)

I have examined all these pairs against all the metrics in sections (A) and (B). In section (C) I ignored some of the least likely candidates to be "best", but I suppose one of those could end up being a good one.

OK, then, let's follow the pattern of parts (D), (E), and (F) of the previous section. As with the triples, we'll conduct our analysis here and store all the data about the individual pairs mentioned in a separate file. We'll summarize our list of "best" pairs at the end of this section.
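Restricting attention to pairs with 10 distinct letters is easy to automate. The sketch below shows one straightforward way to enumerate such pairs, using a 26-bit mask per word. That representation is my own implementation choice and not necessarily the author's. A word with an internally repeated letter sets fewer than 5 bits, so the 10-bit test excludes it automatically. The example runs on a toy word list. The count of 196,175 depends on the author's vocabulary file, which is not reproduced here.

```python
from itertools import combinations


def letter_mask(word: str) -> int:
    """26-bit mask with one bit set for each distinct letter in the word."""
    mask = 0
    for ch in word:
        mask |= 1 << (ord(ch) - ord("a"))
    return mask


def ten_letter_pairs(vocabulary):
    """Alphabetized pairs of words that together use 10 distinct letters."""
    masks = {word: letter_mask(word) for word in vocabulary}
    return [
        (a, b)
        for a, b in combinations(sorted(vocabulary), 2)
        if bin(masks[a] | masks[b]).count("1") == 10
    ]


toy = ["salon", "trice", "inbox", "jumpy", "crane", "hello"]
print(ten_letter_pairs(toy))
```

Checking all pairs of a 2315-word list this way takes a few million mask operations, which is quick even in plain Python.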
----------------
(A) THE BEST STARTING PAIRS BY THE VARIOUS POST-START METRICS

As with triples, we may rank all the (10-letter) pairs by their Lp metrics, for every real number p including +- infinity. The rankings of the top three only change at a few select values of p:

p=+infinity
        [scald, tenor], [clone, stair], [nosey, trail]
p=+6.55013012 X
        [clone, stair], [scald, tenor], [nosey, trail]
p=+4.71796285 X
        [clone, stair], [scald, tenor], [coast, liner]
p=+3.82766546 X
        [clone, stair], [scald, tenor], [cairn, stole]
p=+3.67351974 X
        [clone, stair], [cairn, stole], [scald, tenor]
p=+3.54643114 X
        [clone, stair], [cairn, stole], [coast, liner]
p=+3.45486425 X
        [clone, stair], [cairn, stole], [salon, trice]
p=+2.98298632 X
        [cairn, stole], [clone, stair], [salon, trice]
p=+2.88072579 X
        [cairn, stole], [salon, trice], [clone, stair]
p=+2.62287098 X
        [salon, trice], [cairn, stole], [clone, stair]
p=+2.13659860 X
        [salon, trice], [cairn, stole], [close, train]
p=+0.69814188 X
        [salon, trice], [close, train], [cairn, stole]
p=+0.54007080 X
        [salon, trice], [close, train], [crane, spilt]
p=+0.50624357 X
        [salon, trice], [crane, spilt], [close, train]
p=+0.48959569 X
        [salon, trice], [crane, spilt], [price, slant]
p=+0.21339525 X
        [crane, spilt], [salon, trice], [price, slant]
p=+0.20671891 X
        [crane, spilt], [price, slant], [salon, trice]
p= 0.00000000 X
        [price, slant], [crane, spilt], [salon, trice]
p=-0.76193452 X
        [price, slant], [crane, spilt], [crane, split]
p=-1.30720631 X
        [price, slant], [crane, spilt], [crest, plain]
p=-2.97845882 X
        [price, slant], [crane, spilt], [lance, sport]
p=-infinity

The rankings for the largest values of p>0 agree with the rankings by metric M1, the size of the largest cluster. Every starting pair leaves some clusters of size 16 or more: no matter what starting pair is played, there will be days on which 16 or more words are consistent with the clues it provides.
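The Lp metrics themselves are defined in an earlier section that is not part of this excerpt, so the sketch below rests on an assumption. It assumes the ranking at exponent p agrees with the power sum S_p, the sum of n^p over the cluster sizes n. Under that assumption, a smaller S_p is better when p > 1 and a larger one is better when p < 1. This matches the special cases named in the text. At p=2 the sum divided by 2315 is the daily average cluster size, at p=0 it counts clusters, and at p=1 every starting set ties. The infinite limits become M1 and M3. The author's exact normalization may still differ. The function names are hypothetical.

```python
import math


def lp_key(sizes, p):
    """Sort key (smaller is better) for a cluster-size vector at exponent p.

    Assumption: rankings agree with the power sum S_p = sum(n**p), where a
    smaller S_p is better for p > 1 and a larger one is better for p < 1.
    p = +inf reduces to M1 (largest cluster); p = -inf to M3 (singletons).
    """
    if p == math.inf:
        return max(sizes)
    if p == -math.inf:
        return -sum(1 for n in sizes if n == 1)
    s = sum(n ** p for n in sizes)
    return s if p > 1 else -s


def average_daily_cluster_size(sizes):
    """The p = 2 case: expected size of the cluster holding the hidden word."""
    return sum(n * n for n in sizes) / sum(sizes)


def top_k(vectors, p, k=3):
    """Names of the k best cluster vectors at exponent p."""
    return sorted(vectors, key=lambda name: lp_key(vectors[name], p))[:k]


toy = {"A": [1, 1, 1, 3], "B": [1, 1, 2, 2]}
print(top_k(toy, math.inf, 1), top_k(toy, 2, 1), top_k(toy, -math.inf, 1))
```

Take the toy vectors [1, 1, 1, 3] and [1, 1, 2, 2]. The second wins for large p, because it has a smaller largest cluster and a smaller sum of squares. The first wins for very negative p, because it has more singletons. This is a miniature version of the handoffs in the table.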
The top three pairs listed for p=+infinity are the only ones whose clusters all contain at most 16 words (they have two, three, and four clusters of exactly 16 words, respectively):

    (S2-1) [scald, tenor]
    (S2-2) [clone, stair]
    (S2-3) [nosey, trail]

(For example, if pair (S2-1) turns the E and R tiles yellow and the other eight gray, then the hidden word that day could be any of: bribe, brief, every, fibre, fiery, grief, grime, gripe, prime, prize, puree, purge, query, rhyme, rupee, where. And an all-gray tile display indicates the hidden word is one of the sixteen Wordle words that lack s, c, a, l, d, t, e, n, o, and r.)

When p=2 we are measuring the daily average cluster size; the starting pairs that minimize this are

    (S2-6)  [salon, trice]
    (S2-2)  [clone, stair]
    (S2-10) [cairn, stole]

For example, the first leaves us with an average of 4.3633 possible solutions each day (but a maximum of 23). These three maintain their leading positions for a range of values of p which includes the case p=1, where all starting sets are momentarily tied.

When p=0 we are measuring the total number of clusters, metric M2. There is a tie here between two very similar anagrams, each with M2=1071 clusters:

    (S2-4) [price, slant]
    (S2-5) [crane, spilt]

These two maintain their lead for all negative p, including p=-infinity, which is the metric M3: the number of singletons. (They have M3=634 and M3=631 respectively. After just these two starting words are entered, there is already a unique possible solution word on over a quarter of all days, and simply guessing among possible solutions will score us a victory on turn 3 46% of the time.) The other starting pairs which finish in the top three for some p are

    (S2-7) [crane, split]
    (S2-8) [crest, plain]
    (S2-9) [lance, sport]
    [close, train]
    [coast, liner]

We turn next to metric M4, which counts the number of words in "small" clusters. Since we are beginning with just a starting pair, we can confidently play through all words in any cluster of size four or less.
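Every metric in this subsection starts from the same step. We play the starting words against each possible hidden word and group together the hidden words whose combined tile patterns agree. The sketch below assumes the usual Wordle rule for repeated letters. The names are hypothetical, and the vocabulary is a toy list rather than the 2315-word solution list. The `small` threshold is 4 for starting pairs, which leave four guesses on turns 3 to 6, and 3 for triples.

```python
from collections import Counter, defaultdict


def feedback(guess: str, hidden: str) -> str:
    """Tile pattern for one guess: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1  # hidden letters still available for yellows
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def clusters(starting_set, vocabulary):
    """Group hidden words by the combined tile patterns of the starting words."""
    groups = defaultdict(list)
    for hidden in vocabulary:
        key = tuple(feedback(word, hidden) for word in starting_set)
        groups[key].append(hidden)
    return list(groups.values())


def cluster_metrics(sizes, small):
    """M1-M4 from a cluster-size vector."""
    return {
        "M1": max(sizes),                            # largest cluster
        "M2": len(sizes),                            # number of clusters
        "M3": sum(1 for n in sizes if n == 1),       # singletons
        "M4": sum(n for n in sizes if n <= small),   # words in small clusters
    }


toy = ["fight", "light", "might", "night", "crane", "slate"]
sizes = [len(c) for c in clusters(["crane"], toy)]
print(sorted(sizes), cluster_metrics(sizes, small=4))
```

Here CRANE alone lumps FIGHT, LIGHT, and MIGHT together (all gray), while NIGHT, CRANE, and SLATE each end up alone.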
For the starting pair (S2-6) SALON + TRICE, the clusters of size four or less include 1543 of the 2315 words. That is almost exactly 2/3 of them, and more than for any other pair. Close behind is (S2-4), and then a tie for third place between (S2-5) and (S2-10). (Counting by clusters instead of by words, the best pair is (S2-4), for which over 91.5% of the clusters contain fewer than five words. Close behind is (S2-2), and CAIRN + SLEPT is third.)

----------------
The other metrics that we described in a previous section use the numbers of green and yellow tiles that result from playing the starting pair day after day. We can determine the pairs that produce the highest values of G + f Y (for various values of f >= 0) over the course of a 2315-day cycle. Once again the conclusion of which pair is best depends on the value of the parameter f: how much of a green tile is a yellow tile worth? The best pairs are these:

    (S2-11) [crony, slate]  if f < .4518, including f=0 (only green matters)
    (S2-12) [irony, slate]  if .4519 < f < .8368
    (S2-2)  [clone, stair]  if .8369 < f < .9135
    (S2-13) [route, slain]  if .9135 < f < 1

At the extreme of counting yellows and greens equally (i.e. when f=1) there is a 13-way tie, since for f=1 the score depends only on the letters used, not their positions. Thus all of these pairs will score equally (they each produce 7062 colored tiles):

    alien,torus   arose,unlit   arose,until   arson,utile
    louse,train   noise,ultra   outer,slain   outer,snail
    route,snail   sonar,utile   solar,unite   solar,untie

and (S2-13)=route,slain, because they are all made of the same letters: a,e,i,o,u and l,n,r,s,t. Of these thirteen pairs, the one with the best distribution vector in the next subsection is (S2-13); for example, it takes an average of 3.8400 turns to win using the guess-at-will strategy with this pair. These ten letters happen to be the 10 most-used letters in Wordle if we count by *words containing the letter*.
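The tile-count metrics reduce to counting, for each hidden word, the green and yellow tiles the starting words produce. The sketch below uses hypothetical names, a toy vocabulary, and the usual Wordle repeated-letter rule. It computes those counts, the M5 score G + f Y, and the value of f at which two starting sets tie. The text does not list the pairs' (G, Y) totals, so the example reuses the triple totals from the previous section. Those totals reproduce the crossovers at 0.0769 and 0.5587.

```python
from collections import Counter
from fractions import Fraction


def feedback(guess: str, hidden: str) -> str:
    """Tile pattern for one guess: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1  # hidden letters still available for yellows
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def tile_counts(starting_set, hidden):
    """(green, yellow) tiles shown by all the starting words for one hidden word."""
    greens = yellows = 0
    for word in starting_set:
        pattern = feedback(word, hidden)
        greens += pattern.count("G")
        yellows += pattern.count("Y")
    return greens, yellows


def cycle_totals(starting_set, vocabulary):
    """Total (G, Y) over one pass through the vocabulary."""
    total_g = total_y = 0
    for hidden in vocabulary:
        g, y = tile_counts(starting_set, hidden)
        total_g += g
        total_y += y
    return total_g, total_y


def m5(totals, f):
    """M5 = G + f*Y for a (G, Y) pair of totals."""
    green, yellow = totals
    return green + f * yellow


def crossover_f(first, second):
    """The f at which two (G, Y) totals tie under M5, or None if they never cross."""
    (g1, y1), (g2, y2) = first, second
    return None if y1 == y2 else Fraction(g1 - g2, y2 - y1)


# Triple totals quoted in the previous section: (R3-21), (R3-22), (R3-23).
r3_21, r3_22, r3_23 = (3611, 5260), (3605, 5338), (3505, 5517)
print(float(crossover_f(r3_21, r3_22)))  # 1/13, about 0.0769
print(float(crossover_f(r3_22, r3_23)))  # 100/179, about 0.5587
print(max([r3_21, r3_22, r3_23], key=lambda t: m5(t, 0.5)))  # (3605, 5338)
```

On any word list, `sum(cycle_totals(...))` gives the same total for ROUTE + SLAIN as for ALIEN + TORUS. With 10 distinct letters, each letter of the pair shows exactly one colored tile whenever it occurs in the hidden word, wherever it sits, which is the f=1 tie described above. The same helper gives `tile_counts(["handy", "slice", "tumor"], "kappa") == (1, 0)`, the single green A mentioned for (R3-28).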
If instead we count by *appearances of letters within words*, then C would replace U in that list of ten most-used letters (a,e,i,o,u,l,n,r,s,t). As it happens, the pairs made from AEIOCLNRST are (tied for) second in this ranking, with 7053 colored tiles. (Last in this ranking is JUMPY + WHISK, which yields only 3585 colored tiles.)

It makes no sense in Wordle to value a yellow tile more than a green one, but mathematicians might ask about values of f > 1 as well. For all f > 1.02, the optimal pair is OCTAL + RESIN, which turns out to yield the most yellow tiles of any starting pair: 5827 of them, along with 1226 greens.

Besides these four best starting pairs, the other pairs that rank among the top three, for various values of f < 1, are: [brine, soapy], [briny, slate], [cairn, stole], [corny, slate], [crony, saute], [irony, stale], [rainy, stole], [route, snail].

In the list of all 10-letter pairs mentioned above, I sorted all the pairs by a composite metric (basically combining M1 and M5). Unsurprisingly, it begins with some of the pairs already mentioned (SALON+TRICE is first), but already by the seventh entry (SCONE+TRAIL) we meet a pair that isn't really close to "best" by any single measure, yet is fairly good by several of them. It's quite reasonable to balance competing metrics in this way; your results may be different and still good!

Since I did the computations with the list of all possible 10-letter pairs, I can, just for laughs, report the worst possible 10-letter starting pairs by the various metrics. These include such gems as [jumpy, shock], [gawky, jumbo], [gawky, squib], and [ethos, umbra].

----------------
(B) BEST STARTING SETS IF YOU INTEND TO JUST KEEP GUESSING

I believe there is no starting pair with which the player can be assured of winning by turn 6, if the player just randomly selects a word consistent with the clues at each step. Metric M6 ranks the starting pairs by the probability of a loss.
For this search I resorted to some heuristics to trim the search space a bit, so it is *possible* that a better pair exists, but these appear to be the best. (Not only is there no starting pair for which this strategy is sure to win by turn 6, but I did not even encounter any pairs for which the strategy is sure to win by turns 7 or 8! A win by turn 9 is guaranteed for many, e.g. for CABLE + SNORT. But with many good starting pairs, the player might continue making logical guesses and still not discover the hidden word for a long time --- e.g. starting with the (good!) pair SLANG + TRICE, it is nonetheless possible for the game to continue to the 12th move this way!)

Here are the best pairs I have found, when judged by metric M6. Shown here is the computed (not experimental!) rate of failure when following a guess-at-will strategy after starting with each pair.

    (S2-14) [spend, trawl]  0.002421  (1 fail per 413 games)
    (S2-15) [blond, tramp]  0.002424
    (S2-16) [blend, tramp]  0.002470
    (S2-17) [bland, swept]  0.002515
    (S2-18) [scold, tramp]  0.002596  (1 fail per 385 games)

The (low!) failure rates are quite close, and there are plenty more good daily choices that would fail less than once per year. There is a consistent pattern to nearly all the pairs high on this list: the two words each have just one vowel, right in the middle.

Metric M7 counts the average number of turns until victory (including games that use a 7th, 8th, ... turn) when the player continually tries randomly chosen words consistent with the clues. The best pairs by this metric are

    (S2-5)  [crane, spilt]  3.700452 turns on average
    (S2-4)  [price, slant]  3.702214
    (S2-7)  [crane, split]  3.712983
    (S2-19) [cried, slant]  3.713729
    (S2-20) [print, scale]  3.719044

(These all use the same 10 letters, except that (S2-19) uses D instead of P.) Again, even further down this list, there is a pattern, and it's very different from the previous list: now there are always three vowels, usually AEI.
There is a definite tradeoff between metrics M6 and M7. The best pairs in the second table will guess the hidden word on turn 3 nearly half the time, while the best pairs in the first table will do so less than one-third of the time. In the other direction, the pairs at the top of the second table usually have failure rates twice as high as the pairs in the first table, and even much higher as we read just a little beyond the listed portion of the table. Occasionally there are pairs that are reasonably good by both metrics, e.g.

    [crisp, table]  [brace, spilt]  [scalp, tribe]  [clasp, trend]

(We would like both the average number of turns, T, and the percentage P of failures to be low, i.e. we want the pair (T,P) to be close to the origin. But for all the pairs studied, 3T+2P tends to be larger than about 12, so that's a description of the tradeoff.)

For a point of reference, I took a pair at random from near the center of the list of all 10-letter pairs ("center" according to the informal combination metric I mentioned in part (A)): RECAP + WOULD. This unremarkable starting pair could also be used with a guess-at-will strategy, and the results are still fairly good: one could expect only one failure per 123.9 games, and an average of 3.9954 turns per game. So our "best" starting pairs are distinctly better, although for the average player playing only once per day, it may be difficult to see the difference!

----------------
(C) BEST STARTING SETS IF YOU'LL USE A SIMPLE STRATEGY THAT FORCES A WIN

As with starting triples, I examined each starting pair that could reasonably be expected to be highly ranked in this section, and determined a minimal strategy to ensure a win by turn 6. That means checking each of the clusters created by the starting pair, and using the simplest resolution of it:

    (1) If guess-at-will play within that cluster is sure to find the hidden word by turn 6, the player simply does that.
    (2) If not, but if one "preferred" word in the cluster will give enough extra information to finish by turn 6, then play it on turn 3.
    (3) If not, but a non-cluster word will separate the cluster into smaller clusters that can each be resolved by turn 6 using guess-at-will, then play that word on turn 3.

If in (2) or (3) there are multiple candidate words to play, then play the one that gives the best probability distribution. I've worked out resolutions for all the clusters, for several hundred of the most promising starting pairs. Unfortunately it seems that for most pairs there are clusters that do not resolve in any of these three ways. In such cases we can use two-word solutions, or a recursive use of "preferred" words.

With a strategy in hand for each of the starting pairs, we can compare them using metric M8. The five best starting pairs by this metric are the same as the ones that are best by metric M7, and in the same order, though the numbers of turns are reduced:

    (S2-5)  [crane, spilt]  3.6591
    (S2-4)  [price, slant]  3.6618
    (S2-7)  [crane, split]  3.6739
    (S2-19) [cried, slant]  3.6757
    (S2-20) [print, scale]  3.6771

These are really good results: 3.6591 is one of the lowest expected numbers of turns of any strategy described in this document! And again we may compare to the run-of-the-mill starting pairs: the example RECAP + WOULD only improves to an average of 3.9072 turns per game when using its best minimal strategy.

But these results come at a cost: let's see what is needed to achieve them. The top example, (S2-5) CRANE + SPILT, guarantees a win by turn 6, and most of the 1071 clusters can be dispatched by free-form guessing. But there are 30 that cannot.
These include 25 clusters which can be handled by using a preferred member:

    above, allow, awake, batch, bevel, blade, corer, dingy, ditch, ditty, dogma, dumpy, earth, foist, goody, gouge, grade, haven, marry, merge, otter, sewer, vomit, wager, women

An additional four clusters require ordinary out-of-cluster solutions:

    [billy, bawdy], [bound, bawdy], [bully, fjord], [daunt, judge]

In addition, there is the largest cluster: 33 words that yield a yellow E and yellow R. We cannot pin down the hidden word among them unambiguously by playing any single word on turn 3 (neither from within nor outside the cluster). We can still guarantee a win by move 6 if we remember rules for turn 3 and turn 4: use the preferred word "berry" on turn 3; then on turn 4 (only), if either "rover" or "mover" fits, play it. Otherwise, guess at will. This algorithm is potentially just within a player's ability to memorize; for such a player I have prepared a cheat sheet.

In the previous subsection we noted that this starting pair also used fewer turns on average (3.700452) than any other starting pair, when using a guess-at-will strategy. The distribution vector with that strategy (the probabilities of winning on turns 3, 4, 5, ...) is

    [.462635, .404369, .109006, .019043, .003868, .001008, .000068, .000003, .00000005]

(with a 0.50% failure rate). But with the strategy above it changes to

    [.460907, .430587, .097044, .011462]

The best strategy for PRICE+SLANT is nearly the same because these two starting pairs are not only anagrams of each other, but have seven of the 10 letters in the same positions! So about 80% of the clusters for one starting pair are exactly the same sets of words as clusters for the other starting pair. Please refer to the strategy cheat sheet for more details. (For the clusters that differ between the two starting pairs, it is often possible to make some minor adjustments in the proposed algorithms so that we not only guarantee a win by turn 6 but also reduce the average number of turns just a little.)
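Rules (1) to (3) of the minimal-strategy procedure can be checked mechanically for a single cluster. The sketch below treats rule (1) as "even the unluckiest sequence of consistent guesses finishes in time", which is my reading of "surely find the hidden word". It is an illustration with hypothetical names and toy clusters, not the procedure behind the cheat sheet. It returns the first workable word in alphabetical order rather than the one with the best probability distribution. It also does not attempt the two-word or recursive fixes needed for clusters like the 33-word yellow-E-and-R cluster. `guesses_left` is 4 after a starting pair and 3 after a triple.

```python
from collections import Counter, defaultdict
from functools import lru_cache


def feedback(guess: str, hidden: str) -> str:
    """Tile pattern for one guess: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unmatched = Counter()
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            tiles[i] = "G"
        else:
            unmatched[h] += 1  # hidden letters still available for yellows
    for i, g in enumerate(guess):
        if tiles[i] != "G" and unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)


def split(word, cluster):
    """Sub-clusters left after playing `word` (the case hidden == word wins)."""
    groups = defaultdict(list)
    for hidden in cluster:
        if hidden != word:
            groups[feedback(word, hidden)].append(hidden)
    return [frozenset(g) for g in groups.values()]


@lru_cache(maxsize=None)
def worst_case(cluster: frozenset) -> int:
    """Most guesses an unlucky guess-at-will player may need in this cluster."""
    return max(
        1 + max((worst_case(sub) for sub in split(guess, cluster)), default=0)
        for guess in cluster
    )


def resolve(cluster, guesses_left, pool):
    """Classify a cluster by rules (1)-(3); `pool` holds candidate outside words."""
    cluster = frozenset(cluster)
    if worst_case(cluster) <= guesses_left:
        return ("guess at will", None)

    def safe_after(word):
        # Every sub-cluster must be guess-at-will safe with one fewer turn.
        return all(worst_case(sub) <= guesses_left - 1 for sub in split(word, cluster))

    for word in sorted(cluster):
        if safe_after(word):
            return ("preferred", word)
    for word in sorted(set(pool) - cluster):
        if safe_after(word):
            return ("out-of-cluster", word)
    return ("unresolved", None)


ight = ["fight", "light", "might", "night"]
print(resolve(ight, 3, ["awful"]))  # ('out-of-cluster', 'awful')
```

With three guesses left, the four words FIGHT, LIGHT, MIGHT, NIGHT defeat guess-at-will (the worst case is 4 guesses), and no member of the cluster helps. AWFUL, however, splits them into {fight}, {light}, and {might, night}, in the spirit of the "if FIGHT fits, play AWFUL" rule given for (R3-05).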
The top two pairs keep their ranking, and the list of also-rans changes little, if we ask for the frequency by which the game is won by turn 3, or by turn 4. It's a little different if we look for wins on turn 5, that is, how often do we really need a sixth turn? In that case the winners are

     (S2-19) [cried, slant]   0.918% of the time, a 6th turn is needed.
     (S2-16) [blend, tramp]   0.968%
             [crowd, slept]   0.978%
             [scald, trope]   0.985%
             [chide, slant]   0.986%

(For comparison, CRANE + SPILT will go to the sixth turn 1.146% of the time if we handle the problematic clusters as above.) What is happening here is that the ranking starts to resemble the first list in part (B), ranking pairs by their ability to end in six turns, using the guess-at-will strategy.

----------------

The other criterion we care about, when comparing candidate starting pairs that each use a strategy designed to guarantee a win by turn 6, is the simplicity of the algorithm that guarantees a win. Clearly with 30 additional rules to learn, the two examples above are only barely simple enough to be used by a casual Wordle player!

The starting pair with the "simplest" algorithm that I have found is

     (S2-21) [blond, spite]

It has "only" 23 problematic clusters, and all can be resolved without any rules that extend beyond the first turn after the pair. It will lead to a guaranteed win by turn 6, with an average of 3.8237 turns, if we simply follow rules (2) and (3) for these exceptional clusters. A simplest strategy involves only two out-of-cluster words:

     { aider, cater, chain, charm, chart, crave, crest, fifth, folly, girly, grill, legal, mayor, money, scary, scree, shrew, stark, trait, twang, wager, [found, wharf], [mover, rocky] }

Alternatively, using the word MARCH on turn 3 resolves many of the 23 clusters by turn 6; that is, if we forget some of the 23 rules while playing, we could simply revert to the six rules already discussed for this starting triple (R3-15) in the previous section.
Among the many starting pairs I have so far studied, none has fewer than 23 rules for exceptional clusters like this, and I have found only one other starting pair that creates only 23 tricky clusters:

     (S2-22) [blond, trace]

A strategy that works uses four out-of-cluster words:

     { favor, fetal, folly, gaunt, gipsy, grape, harpy, mouth, palsy, pinch, serif, serve, setup, shake, shift, sigma, smell, swill, vague, [catch, champ], [found, swamp], [gamer, gawky], [mover, whisk] }

Using these 23 rules will guarantee success by turn 6, taking on average 3.7461 turns.

It is by this measure that these "best" starting pairs really are better than the run-of-the-mill starting pairs. For example, the pair RECAP + WOULD, mentioned earlier, requires using 45 preferred words and 5 out-of-cluster pairs to resolve its many problematic clusters! This seems to be typical for pairs taken from the middle of the list of all 10-letter pairs, but the "best" pairs only need about half that many!

I also have found one starting pair that never requires an out-of-cluster word to be played:

     (S2-23) [scold, tramp]

Simply play these 27 preferred words if they fit the clues:

     { bagel, bugle, catty, deign, diner, feign, flake, fleck, folly, habit, haste, liken, nerve, novel, prone, range, rotor, rough, serif, shunt, skate, stone, swine, taint, vegan, white, wince }

This wins every time but takes an average of 3.8436 turns. (SWAMP + TREND also avoids all out-of-cluster words, but needs 36 preferred words to ensure a 100% win rate and averages 3.8504 turns.) Other examples that never require out-of-cluster words are trickier.
For example, CLASH + TRIPE can be won every time by playing preferred elements of the 28 trickiest clusters, including

     { badge, bowel, brand, bread, broad, budge, crown, diner, dingy, ditto, dolly, drove, exult, filmy, frond, grade, gumbo, jaunt, mangy, marry, meter, modal, sewer, stung, taunt, wager, woven }

However, the largest cluster ("yellow e & r") requires recursive use of preferred words. Guess "derby" on turn 3 if it fits; then (and only then) if "mower" fits, play that *on turn 4*, and then (only then!) if "roger" fits, play it *on turn 5*. If that's STILL not the right hidden word, it nonetheless gives enough information to choose between "goner", "joker", and "rover" on turn 6! This procedure averages 3.7249 turns for a win.

An interesting candidate for "simplest" is

     (S2-24) [gland, swept]

This starting pair leaves 35 clusters that require special treatment; 34 of them can be resolved using a preferred member of the cluster, and the last (the one including BERRY) can be resolved using the out-of-cluster word ROCKY. But alternatively we can use CHOIR for 33 of the first 34 --- all except the one containing [arbor, armor, favor, major, mayor, razor]. This works because, as we noted in the previous section, (R3-39) CHOIR + GLAND + SWEPT is a good starting triple with just three difficult clusters of its own, this set being one of them. (It can be resolved using MAYOR as a preferred word.) In other words we have a "simple" algorithm for winning Wordle:

     Start with GLAND + SWEPT and see which cluster contains the day's word.
     Play ROCKY if the cluster contains BERRY,
     play MAYOR if the cluster contains MAYOR,
     play CHOIR if the cluster is any of the other *problematic* ones.
     Otherwise, guess at will.

But this is a cheat! To use this algorithm we must recognize those 33 clusters as they arise, which is no easier than remembering the preferred words that signal them.
But this does suggest a hybrid algorithm for Wordle -- something between a 2- and a 3-word starting set:

     Start with GLAND + SWEPT.
     Next, play ROCKY if the word could be BERRY,
     play MAYOR if the word could be MAYOR,
     play CHOIR otherwise.
     Then, guess at will.

In practice this is very much like the procedure for the starting triple CHOIR+GLAND+SWEPT; it simply skips a step in about 1% of the cases.

----------------

(D) SINCE WE'RE ALREADY CONSIDERING COMPLEX STRATEGIES...

Throughout this document, we have emphasized just two strategies that a player might use after the starting set is played: if not using guess-at-will all the time, we at least assume the player would follow that strategy for the clusters which will surely lead to a victory by move 6. Just for this starting set CRANE + SPILT, however, I considered a couple of other options.

First of all, the player might commit to memory a preferred word to use in every cluster, including the sub-clusters that appear after turns 3, 4, and 5. This would produce an entire decision tree which, if used daily, would finish the game on the Nth turn (N = 1, 2, ..., 8) this many times during an entire 2315-day cycle, always choosing a word within the clusters:

     1, 1, 1069, 1071, 153, 16, 3, 1

for a total of 8384 words entered (an average of 3.6216 per day --- of course that's better than guessing randomly within each cluster!)

We can also combine the two strategies: use the preferred word from every cluster, as in the previous paragraph, except for the five clusters that required an out-of-cluster word or the two-word strategy used earlier. Then the probability distribution vector (for turns 3 through 6) becomes

     [.461339, .465659, .066955, .006048]

giving an average of 3.617714 moves per game, which is lower than every other algorithm we discuss in this document --- although we have now strayed very far from our goal of "simple" ways to play Wordle!
In fact, we have at that point run almost the complete analysis (for this starting pair) which other researchers have done; the only additional "improvements" one could make to the algorithm at this point would be to consider the use of out-of-cluster words even for those clusters that can be resolved by move 6 without them; and then, finally, to allow the use of Wordle's "other" words --- the words that are not in the solution list but are allowed as input. In fact I believe these two additional refinements would add very little.

The two starting pairs that I have found to be "best" by metrics M7 and M8 are also the highest two (among the pairs that use only Wordle answer-list words) on Alex Selby's list. He computes the average numbers of turns used assuming the player plays optimally after the starting pair. That means not only memorizing a word to be played from each cluster (and sub-cluster) but also pre-computing that word from among all the Wordle-permitted entry words (not just the Wordle answer-list words, as I have done, and certainly not just the words within the cluster!) Of course with more options to select (and assuming a very compliant player!) one can then expect the average number of moves to decrease. But it doesn't go down by much. We can compare the average numbers of turns for the four strategies:

                    (a) random   (b) minimal   (c) previous   (d) optimal
                    guessing     win-by-6      paragraph      (Selby)
     CRANE+SPILT:   3.7005       3.6591        3.6177         3.6003
     PRICE+SLANT:   3.7022       3.6618                       3.6037

Since this document intends to be an analysis of a *human* player's options, the second column is probably the limit of our investigations.

Clearly an average game length of even 3.6591 is better than anything we obtained using fixed starting triples. But when applied to an N-fold compound game, this would imply (an upper bound for) the length of the game being 2 + 1.6591 N. For N=1 this is clearly better than, say, the bound 3 + 1.1682 N which we obtained in the previous section.
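The comparison between the two bounds is simple arithmetic: both are linear in N, so it is enough to find where the lines cross. A small sketch using only the two constants quoted above (the function name is illustrative, and the bounds assume the subgames are finished one at a time, as in the text):

```python
def compound_bound(start_turns, extra_per_subgame, n):
    """Upper bound on an n-fold compound game when subgames are finished one at a time."""
    return start_turns + extra_per_subgame * n


PAIR = (2, 1.6591)    # CRANE + SPILT with the minimal win-by-6 strategy
TRIPLE = (3, 1.1682)  # bound for a good starting triple, from the previous section

for n, game in [(1, "Wordle"), (2, "Dordle"), (4, "Quordle"), (8, "Octordle")]:
    pair, triple = compound_bound(*PAIR, n), compound_bound(*TRIPLE, n)
    print(f"{game:9s} pair {pair:.4f}  triple {triple:.4f}")

# The bounds cross where 2 + 1.6591 N = 3 + 1.1682 N.
crossover = (TRIPLE[0] - PAIR[0]) / (PAIR[1] - TRIPLE[1])
print(f"crossover at N = {crossover:.3f}")
```

The crossover lands just above N = 2, which is why the pair's advantage is almost gone for Dordle and reversed for Quordle and Octordle.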
But already for N=2 the advantage is nearly lost; so even for Dordle it is not clear that we are better off with the best starting pair than we would be with one of our good starting triples. It is unlikely to be better for Quordle and beyond. I have found subjectively that I do less well on Quordle even with this "best" starting pair than I do with good starting triples.

----------------

(E) SUMMARY OF BEST STARTING PAIRS

I have not finished an exhaustive search of word-pairs, but I have looked at all pairs drawn from what I consider the "better half" of all Wordle words. (This informal ranking proves to work well in the next section, hence my optimism.) I am running a background process at home that sifts through promising pairs; for each one it is necessary to identify the problematic clusters and to find in- or out-of-cluster words that can resolve them, if possible; I can also search for procedures to resolve the clusters which cannot be won with these tools, and then compute the probability distribution showing the frequencies with which this algorithm will end on turns 3, 4, 5, or 6. Over time I may use these results to update this section. But it seems clear that these procedures to guarantee a win with a particular starting pair are inevitably very complicated for a human to use, and unlikely to be useful for the compound games.
As at the end of the last section we can summarize the reasons to declare a starting pair "best":

(A) Rankings based on how the game board looks right after the starting pair:
     (S2-2)  [clone, stair]  lowest size of largest cluster (M1 = 16)
     (S2-5)  [crane, spilt]  highest number of clusters (M2 = 1071) / lowest average cluster size (tie)
     (S2-4)  [price, slant]  highest number of clusters (M2 = 1071) / lowest average cluster size (tie)
     (S2-4)  [price, slant]  highest number of singleton clusters (M3 = 634)
     (S2-6)  [salon, trice]  most words in small (< 5) clusters (M4 = 1543)
     (S2-11) [crony, slate]  most green tiles (G = 2692 per 2315-day cycle)
     (S2-12) [irony, slate]  most green & discounted yellow tiles (G = 2528, Y = 4494)
     (S2-13) [route, slain]  most colored tiles (G+Y = 7062)

(B) Rankings based on game's end, after randomly guessing candidates:
     (S2-14) [spend, trawl]  lowest chance of failure (M6 = 0.24% -- once per 413 games)
     (S2-5)  [crane, spilt]  lowest average turn-count (M7 = 3.700452)

(C) Rankings based on game's end, after using an algorithm designed to ensure a win by move 6:
     (S2-5)  [crane, spilt]  lowest average turn-count (M8 = 3.6591)
     (S2-19) [cried, slant]  lowest use of turn 6 (0.918%)
     (S2-21) [blond, spite]  smallest number of extra rules to follow (23)

==============================================================================

ONE

We can address the one-word starting sets in all the multiple ways we have looked at larger sets in the last three sections. Since there are only 2315 candidates this time, we can apply most of our tests comprehensively to every option. (Unsurprisingly, the starting sets without repeat letters are again better!)

(A) THE BEST STARTING WORDS BY THE VARIOUS POST-START METRICS

Again we can find the top-ranking choices for every Lp metric; the rankings stay constant across intervals as p varies across the whole real number line.
I have computed the successive lists of top-5 words (for each p); here there is space just to list the single best word for each p. It is:

     raise   for                  p > +0.9112781617
     slate   for  +0.4922321810 < p < +0.9112781617
     trace   for  -0.3589050698 < p < +0.4922321810
     parse   for  -0.9281602628 < p < -0.3589050698
     filet   for  -1.8523351140 < p < -0.9281602628
     brute   for                  p < -1.8523351140

For large p, ARISE is in second place, but it and RAISE are tied at p=+infinity, where the Lp metric becomes metric M1, measuring the size of the largest cluster. But even they have clusters of 168 words (in both cases, it's the set of words that contain none of the letters A, E, I, R, and S). All other words leave a cluster that's even bigger; ALONE has one of size 182, then AROSE, ATONE, RATIO, ...

At p=+2 we are measuring the daily average of the size of the cluster containing the hidden word; RAISE is the winner here too, with an average of 61.0009. (Even though there are only 10 clusters larger than this, the player will encounter those ten very often!) Next best are ARISE, IRATE, AROSE, and ALTER.

At p=+1, all words give the same value to the Lp metric (namely 2315), but at this point RAISE is the word for which the Lp metrics are growing most slowly, followed by SLATE, CRATE, IRATE, and TRACE.

At p=0 the Lp metric counts the number of clusters. By this point, TRACE has claimed the lead, with 150 clusters. That gives TRACE the smallest average size of its clusters (15.4333 words per cluster) and the greatest likelihood for a person to guess the hidden word on turn 2 (6.48% of the time). Runners up are, in order, CRATE, SLATE, PARSE, and (tie) CRANE and STALE.

At p=-infinity the Lp metric simply counts the singleton clusters, that is, the words which are known unambiguously after the starting word is entered. BRUTE and CHANT are tied for the most; but "most" isn't many -- they only pin down 40 words in the Wordle dictionary! (Next come METRO and SPILT with 39, DINER and HORDE with 38.)
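The special cases above are consistent with reading the Lp metric as the power sum of the cluster sizes, the sum of s^p over all clusters s. Its precise definition is given in an earlier section, so treat that form as an assumption here. A quick sketch with made-up cluster sizes (the helper names are hypothetical):

```python
def power_sum(sizes, p):
    """Sum of s**p over the cluster sizes s (assumed form of the Lp metric)."""
    return sum(s ** p for s in sizes)


def special_cases(sizes):
    words = power_sum(sizes, 1)              # p = +1: always the vocabulary size
    return {
        "clusters": power_sum(sizes, 0),     # p = 0
        "words": words,
        # p = +2, divided by the vocabulary size: average size of the
        # cluster containing a uniformly random hidden word
        "mean_cluster_of_hidden_word": power_sum(sizes, 2) / words,
        # p -> -infinity: only clusters of size 1 still contribute
        "singletons": power_sum(sizes, -60),
        # p -> +infinity: dominated by the largest cluster (metric M1)
        "largest": max(sizes),
    }


print(special_cases([1, 1, 1, 2, 3, 8]))
```

Consistent with the cases described in the text, a smaller value is better for p > 1 and a larger value is better for p < 1; at p = 1 every starting word ties, which is why the comparison there is about how fast the metric grows.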
Our other post-starting-set metrics are based on counting the colored tiles they produce. The words that produce the most green tiles over the six-year cycle are SLATE (with 1437 of them), SAUCE (1411), and SLICE (1409). Treating a yellow as half a green, the highest scoring words are STARE (1326 + 2761/2), then AROSE (2670.0) and RAISE (2668.5); SLATE drops to fifth. Treating a yellow as equal to a green gives a tie score to words with the same letters; on top are ALERT/ALTER/LATER (4117), then IRATE (4116), AROSE (4093), STARE (4067), and RAISE/ARISE (also 4067).

Of course we can rank the words by the value of G + f Y where a yellow is valued at a fraction f of a green; we just worked out f=0, f=0.5, and f=1. Then SLATE is indeed best if f=0 or any f less than f=0.37. At f=0.37 it's a tie between SLATE and STARE. Then STARE is best for larger f until f=0.842, when IRATE takes the lead. At f=0.993 the best becomes LATER, which keeps its title until f=1, as above. For f > 1 the best is ALERT unless you for some reason value a yellow tile at more than 3.25 greens; at that point OPERA is the best word simply because it gives the largest number of expected yellow tiles.

----------------

(B) BEST STARTING WORDS IF YOU INTEND TO JUST KEEP GUESSING

The guess-at-will strategy has a distinctive feature when applied to starting sets of size 1. Since we are starting with just a single fixed word before starting to guess, and with this strategy we'll guess only within clusters formed from previous guesses, the player is also following a strict form of Wordle's "hard mode"! (The game's "hard mode" is actually more permissive: a letter that has previously come up gray can be used again even though from our perspective the new word is now clearly not in the same cluster as the hidden word; likewise the game allows a repeat of a letter in the same spot after the tile is yellow, while for us that also indicates the new word is out-of-cluster.)
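The tile-count scores of part (A) above need only a feedback function and a tally. A sketch, assuming Wordle's usual duplicate-letter convention (greens are assigned first, then yellows left to right while unmatched copies of the letter remain in the hidden word); `tile_counts` and `tie_point` are hypothetical helper names, and the three-word vocabulary is a toy stand-in for the 2315 answers:

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Wordle tiles for `guess` against `answer`: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unused = Counter()                       # answer letters not matched green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "G"
        else:
            unused[a] += 1
    for i, g in enumerate(guess):            # yellows consume unused letters left to right
        if tiles[i] == "." and unused[g] > 0:
            tiles[i] = "Y"
            unused[g] -= 1
    return "".join(tiles)


def tile_counts(word, vocabulary):
    """Total (greens, yellows) that `word` produces over every possible hidden word."""
    greens = yellows = 0
    for answer in vocabulary:
        tiles = feedback(word, answer)
        greens += tiles.count("G")
        yellows += tiles.count("Y")
    return greens, yellows


def tie_point(g1, y1, g2, y2):
    """Yellow weight f at which G1 + f*Y1 equals G2 + f*Y2."""
    return (g1 - g2) / (y2 - y1)


toy_vocabulary = ["crane", "slate", "trace"]
print(tile_counts("slate", toy_vocabulary))   # (9, 1)
```

Thresholds such as f = 0.37 and f = 0.842 are points where the lines G + fY of the two leading words cross, which is what `tie_point` computes for a pair of words.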
The two natural metrics to compute for a strategy of free-form guessing are (1) the average number of turns needed to discover the hidden word, and (2) the probability of doing so by turn 6 (i.e., "winning").

(1) A preliminary sort of the words in the Wordle dictionary works very well to quickly locate the words which find the hidden word fastest. From the probability distributions of these words I can compute the average numbers of turns until the hidden word is found. (As always, this includes counting as 7, 8, etc. turns those rarer occasions when the player will continue past the 6-turn limit --- which is now a fairly common event when starting with just a one-word starting set!) Ranked by average numbers of turns, the list of 1-word starting sets begins:

     slate, 3.8218 turns on average to complete Wordle
     least, 3.8327
     trace, 3.8420
     stale, 3.8463
     crate, 3.8490
     slant, 3.8509
     leant, 3.8541
     plate, 3.8618
     dealt, 3.8659
     react, 3.8662
     ...

Note that all of these are significantly *more* than the counts of the better two-word starting sets. "Guess-at-will" is frankly not an efficient strategy for one-word starting sets. The problem is that after playing just one starting word, we are left with some very large clusters, and truly guessing candidates from them at random is a very inefficient way to uncover the hidden words in them. (Just so we're clear here, these are calculated probabilities, not empirical data. The actual expected number of turns for SLATE, for example, is exactly

     2143345083855809867374480229283246322413669919670571681198349
     -------------------------------------------------------------
     560821042520148446351329436583213214101064845220300800000000

So we do have the precision to rank these properly!)

(2) Though it may not be efficient (few turns), this strategy is certainly simple, and *may* also be effective, as judged by the probability of a win by turn 6.
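Because every step of guess-at-will is a uniform random choice among the words still consistent with the clues, these distributions can be computed exactly in rational arithmetic, which is what makes an exact fraction like the one for SLATE possible. A minimal recursive sketch, assuming the hidden word is uniform within the cluster and each guess is drawn uniformly from the current cluster; it is not necessarily the author's code, and `guess_at_will` and `expected_turns` are hypothetical names:

```python
from collections import Counter, defaultdict
from fractions import Fraction
from functools import lru_cache


def feedback(guess: str, answer: str) -> str:
    """Wordle tiles for `guess` against `answer`: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unused = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "G"
        else:
            unused[a] += 1
    for i, g in enumerate(guess):
        if tiles[i] == "." and unused[g] > 0:
            tiles[i] = "Y"
            unused[g] -= 1
    return "".join(tiles)


@lru_cache(maxsize=None)
def guess_at_will(words):
    """Exact {k: P(hidden word found on the k-th guess)} within a cluster.

    `words` is a frozenset; the hidden word is uniform over it, and every
    guess is drawn uniformly from the words still consistent with the clues.
    """
    n = len(words)
    dist = defaultdict(Fraction)
    for g in words:                          # the player's guess
        for h in words:                      # the hidden word
            weight = Fraction(1, n * n)
            if g == h:
                dist[1] += weight
                continue
            clue = feedback(g, h)
            sub = frozenset(w for w in words if feedback(g, w) == clue)
            for k, p in guess_at_will(sub).items():
                dist[k + 1] += weight * p
    return dict(dist)


def expected_turns(cluster, turns_before=1):
    """Expected finishing turn for a cluster reached after `turns_before` turns."""
    dist = guess_at_will(frozenset(cluster))
    return sum((turns_before + k) * p for k, p in dist.items())


aste = ["baste", "caste", "haste", "paste", "taste", "waste"]
print(expected_turns(aste))   # 9/2: SLATE on turn 1, then guessing inside this cluster
```

For a whole starting word one would partition the answers by the start word's tiles, weight each cluster's distribution by its share of the dictionary, and count the start word itself (the all-green cluster) as finishing on turn 1.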
My preliminary sorting has not been as helpful at suggesting candidates for "best" by this metric, and so I am still computing data for more candidates, but it appears the winners are:

     clasp, .992797, about one loss in every 138.83 runs
     scalp, .992524,                            133.76
     splat, .992047,                            125.75
     spelt, .991969,                            124.52
     slept, .991902,                            123.50
     spilt, .991801,                            121.98
     split, .991771,                            121.53

Again it's worth observing that even though we have five more turns allowed after a single start word (compared to just four more after a starting pair), we will still suffer a loss at least three times as often as we do with the best starting pair.

There's also a sequence of rankings more or less intermediate between these two lists. The ordering in table (1) is similar to a ranking already mentioned above listing the words that have the greatest chance of winning by turn 2 (TRACE leads the pack with 6.48% of games ending on turn 2, then crate, slate, and parse). We can also sort by the fraction of games finished by turn 3; it's SLATE at 39.42% (then trace, then crate, then least). Sorted by the fraction completed by turn 4, it's SLATE at 78.56% (then least, then slant, then stale). Finally, sorting by the fraction completed by turn 5, the highest is CLASP at 95.27%, now followed by slept, spilt, and slant. The last ranking in this sequence would be the same as the ranking by the rate of victories, which is table (2) above.

More of a curiosity than anything else, I suppose: table (2) clarifies that no matter what word a player starts with, an unlucky guesser should expect occasionally to need more than 6 words to guess the hidden word. But in fact, for most starting words, on the worst days it will take 12, 13, 14, or 15 turns to find the hidden word. The closest exceptions I have found are the starting words SPORT, STONY, STORM, STORK, and SCOLD, each of which always finished by the 11th move.
With SPORT as the first word, and following a guess-at-will strategy with Wordle answer words, a player will always finish on turns 2 through 11 with these frequencies:

     [.05097192, .29439833, .41291251, .19020249, .04183704, .00790584, .00158518, .00018201, .00000464, .00000004]

for an average of 3.9085 turns and a 0.97% chance of loss. The situation with the other four is similar, although their distributions are skewed more heavily to the right.

As we have moved through this document, looking at shorter and shorter starting sets in each section, we have seen that the number of moves needed has generally decreased. The information gained from the shorter starting sets is obviously weaker, so more turns will be needed, on average, to finish the puzzle after the starting words are entered. But for good starting sets, those additional turns have been few, and their number increased only slowly as we moved from section to section. In particular, the average number of turns taken for the best starting sets was lower for starting triples than for starting quads, then lower again for starting pairs. But the pattern has now ended: as we drop from studying starting pairs to considering a single fixed starting word, the average game length has increased. (Using in both cases the guess-at-will strategy, we saw that the starting set CRANE + SPILT would take an average of 3.7005 moves. Our best single starting word SLATE requires 3.8218 moves on average.)

A similar conclusion has been drawn in a slightly different context. In a nutshell: if you will simply use a guess-at-will strategy, asking "What is the best starting word?" is pointless; it's always better to ask for the best starting pair unless it's especially important to you to have a fighting chance to finish the game on the second turn. (The probability of that is at best 6.48%, which happens if your opening move is TRACE.)
One-word starting sets can finish more quickly than the best two-word starting sets, but only if we use more complex follow-up strategies than "guess at will", as we shall see in the next subsection.

----------------

(C) BEST STARTING WORDS IF YOU'LL USE A SIMPLE STRATEGY THAT FORCES A WIN

Well, the guess-at-will strategy of part (B) isn't looking as good for any starting singleton as it did for some starting pairs. What about the win-by-turn-6 strategy? It turns out that, whatever the efficiency of such algorithms, their complexity has risen to levels that surely make them unsuitable for regular human use. So I don't expect I will bother trying to compute them for each of the 2315 possible 1-word starting sets to pick a "best". (Moreover, these strategies of preferred cluster members and so on take considerable time to find in the first place.)

So we will content ourselves with a single promising example: what would be our strategy if we started with just the one word that's most efficient for a guess-at-will strategy, SLATE?

Right away from the cluster vector we can tell that such a one-word starting set is going to be trouble: it begins as

     [29, 20, 10, 9, 6, 10, 4, 4, 0, 4, 2, 4, 4, 3, 2, 1, 4, 1, 2, 1, 1, 0, ...]

(which counts the numbers of clusters of sizes up to 22) and then is a sparse list, as the remaining clusters have sizes

     [23, 24, 25, 27, 28, 28, 31, 31, 32, 37, 39, 39, 42, 48, 51, 56, 58, 61, 61, 72, 86, 87, 107, 136, 165, 221 (!) ]

So the Wordle dictionary is split by SLATE into relatively few clusters (only 147 of them), some very large -- a situation very different from what happens with 4-, 3-, and even 2-word starting sets!

To amplify what we've already said about the guess-at-will approach: if after we enter SLATE we simply begin guessing words consistent with the growing list of clues, we will guess the word on turns 2, 3, 4, ...
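The cluster vector is just a histogram of cluster sizes, where a cluster is the set of hidden words that produce identical tiles for every word of the starting set. A sketch with a toy eight-word vocabulary standing in for the 2315 answers (the helper names are hypothetical, and the feedback function assumes Wordle's usual duplicate-letter convention):

```python
from collections import Counter, defaultdict


def feedback(guess: str, answer: str) -> str:
    """Wordle tiles for `guess` against `answer`: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unused = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "G"
        else:
            unused[a] += 1
    for i, g in enumerate(guess):
        if tiles[i] == "." and unused[g] > 0:
            tiles[i] = "Y"
            unused[g] -= 1
    return "".join(tiles)


def clusters(start_words, vocabulary):
    """Group possible hidden words by the tiles the starting set would show."""
    groups = defaultdict(list)
    for answer in vocabulary:
        key = tuple(feedback(w, answer) for w in start_words)
        groups[key].append(answer)
    return list(groups.values())


def cluster_vector(start_words, vocabulary):
    """Entry i counts the clusters of size i + 1."""
    sizes = Counter(len(c) for c in clusters(start_words, vocabulary))
    return [sizes[s] for s in range(1, max(sizes) + 1)]


toy = ["baste", "caste", "haste", "paste", "taste", "waste", "slate", "crane"]
print(cluster_vector(["slate"], toy))   # [2, 0, 0, 0, 0, 1]
```

With the real answer list and SLATE, the entries should account for the 147 clusters and 2315 words described above.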
with these probabilities:

     [0.0634989201, 0.3306985566, 0.3913739602, 0.1629811789, 0.0401178793, 0.0092995402, 0.0017820878, 0.0002364096, 0.0000112207, 0.0000002431, 0.0000000036, 0.0000000000]

That last number isn't really zero, actually; the probability of the game ending on the THIRTEENTH turn is small but nonzero: about 2 x 10^(-11). From this distribution we can compute that the average game will last 3.821799 turns (as in Table (1), above), and will fail to complete by turn 6 about 1.1330% of the time. As already noted: guessing isn't a good strategy with one-word starting sets.

What about pursuing a simplest-possible strategy that can guarantee a win by turn 6? We can do so but will have to devise recipes for the FORTY-FOUR(!) clusters that could fail to complete by turn 6 using free-form guessing. We attempt this as has been done in previous sections. The natural strategy would be to use preferred cluster members when possible and out-of-cluster words when not; we can do this, but in six cases we need out-of-cluster pairs (as we did for CRANE + SPILT). (I have not proved that these 6 multiple-out-of-cluster choices are optimal, but at least they do work.) Here is the strategy in the now-familiar format {preferences, [signals, actions]}:

     { aback, abide, abled, adapt, adept, agile, alarm, album, alien, amity, avail, befit, belch, belie, beset, bigot, bison, blade, bleed, blimp, bloke, boost, bused, chose, cutie, sandy, scare, scene, scion, scout, screw, slick,
       [baste, batch], [scant, frisk], [adage, crimp], [birth, brown], [afoul, manic], [binge, doing],
       [abbot, [cough, thumb]], [abbey, [cigar, bawdy]], [billy, [drill, wharf]], [abhor, [grind, macaw]], [beech, [rowdy, cabin]], [biddy, [frond, chump]] }

The distribution of game lengths for this procedure works out to be

     [0.058315, 0.308362, 0.477195, 0.141094, 0.015033]

giving an average of 3.746167 turns to complete and win.
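Rules (2) and (3), and entries such as [baste, batch] above, can be checked mechanically. A sketch, assuming guess-at-will may pick any consistent word, so `worst_turn` takes the maximum over every possible guess and hidden word; `preferred_word` and `outside_word` are hypothetical helpers that test candidates one at a time. The search is exhaustive, so it only suits small clusters. Applied to the six "_aste" words that all show Y.YGG after SLATE, it indicates that guessing inside the cluster can run to turn 7, while BATCH on turn 2 leaves only safe sub-clusters:

```python
from collections import Counter
from functools import lru_cache


def feedback(guess: str, answer: str) -> str:
    """Wordle tiles for `guess` against `answer`: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unused = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "G"
        else:
            unused[a] += 1
    for i, g in enumerate(guess):
        if tiles[i] == "." and unused[g] > 0:
            tiles[i] = "Y"
            unused[g] -= 1
    return "".join(tiles)


def split(guess, words):
    """Sub-clusters of `words` left after playing `guess` (the guess itself removed)."""
    parts = {}
    for w in words:
        if w != guess:
            parts.setdefault(feedback(guess, w), set()).add(w)
    return [frozenset(p) for p in parts.values()]


@lru_cache(maxsize=None)
def worst_turn(words, turn):
    """Latest turn on which guess-at-will, starting on `turn`, can find the word."""
    if len(words) == 1:
        return turn
    return max(worst_turn(p, turn + 1) for g in words for p in split(g, words))


def preferred_word(cluster, turn, limit=6):
    """Rule (2): an in-cluster word after which guess-at-will always wins by `limit`."""
    words = frozenset(cluster)
    for w in sorted(words):
        if all(worst_turn(p, turn + 1) <= limit for p in split(w, words)):
            return w
    return None


def outside_word(cluster, candidates, turn, limit=6):
    """Rule (3): a word outside the cluster whose sub-clusters all win by `limit`."""
    words = frozenset(cluster)
    for x in candidates:
        if x not in words and all(worst_turn(p, turn + 1) <= limit for p in split(x, words)):
            return x
    return None


aste = ["baste", "caste", "haste", "paste", "taste", "waste"]
print(worst_turn(frozenset(aste), 2))      # 7: guess-at-will can lose
print(preferred_word(aste, 2))             # None: no in-cluster word suffices
print(outside_word(aste, ["batch"], 2))    # batch
```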
That's a notable improvement over a guess-at-will approach, but comes at the expense of having to learn *62* words and the context in which each one applies. (The improvement is because over three-quarters of the Wordle words are in clusters where at least one preferred word has been computed; unlike the situation with larger starting sets, these many additional rules are doing much more than eliminating fringe cases!) To add insult to injury, this strategy still does not finish faster, on average, than the comparatively simpler strategy for CRANE+SPILT discussed in the last section.

Since already in this example we are confronting strategies that could hardly be called simple, we might as well consider other strategies too, even if they are yet more complex, if they offer some other benefit. The next few paragraphs will lead us in that direction, but I don't think I've found the definitive narrative to follow here.

----------------

(D) AREN'T WE NOW DOING HARD MODE?...

Let's speak a bit more about original-Wordle's "hard mode". All the time we have discussed starting sets with two or more words, the sequences of words that we have been proposing will be inconsistent with the hard mode rules on most days (unless, say, the colored tiles all come back gray after the first word!) But now that we are discussing a starting set of just one word, it is conceivable that we could carry out our procedures in a hard-mode game. We have already noted that playing a guess-at-will strategy is consistent with (a strict version of) the hard-mode rules; the same will be true whenever we play a "preferred" member of a cluster. Indeed, we have up to this point barely mentioned the possibility of identifying preferred words on a recursive basis (picking out preferred members of sub-clusters to be used on later turns), but this also would be consistent with (a stricter form of) the "hard mode" rules. So it is natural to ask, can we do that?
Can we get a hard-mode strategy for playing Wordle, starting with SLATE?

Suppose some day you begin with SLATE, and in response you get green T, E and yellow S, A tiles. If you are playing in Wordle's "hard mode", you must then play a word ending in TE that also has S and A (but does not[*] begin with S). The only such Wordle solution-words are

     baste, caste, haste, paste, taste, waste

and one of those is the day's hidden word. But each time you play a word from this cluster, if it's not itself the hidden word, then you gain no additional information about the hidden word. So it could take you six more turns to stumble on the correct word, at which time the game could have ended. In short: any strategy that can start with SLATE and guarantee a win by turn 6 must use an out-of-cluster word for this cluster, and so is not permitted by the rules of hard mode.

[*UPDATE -- I am not a regular user of "hard mode" and only recently learned that this is not true: apparently Wordle would also allow the use of SAUTE, for example, which is not in that cluster. So the remarks in this paragraph and the next actually apply only to a hypothetical "strict hard mode" that DISallows the use of letters that have already been given grey tiles, or yellow tiles in the given positions. But as it turns out, the conclusion for SLATE is unchanged: using hard mode (and using only Wordle's solution-list words), it is not possible to guarantee a win by move 6 if we start with SLATE.]

This situation is not unusual: there are many other "good" starting words for which the cluster that contains "baste" could potentially go unresolved by move 6 if we insist on playing in "hard mode"; this includes not only SLATE but also STALE, STEAL, SLANT, STARE, TRAIL... Other patterns that cause the same problem for other starting words include "(h)atch" (for TRACE, CRATE, REACT, LEANT, DEALT, ...); "(j)aunt" (for CLEAT, ALERT, LEAPT, CRANE, ...) and "(w)ound" for TRADE...
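The difference between the two kinds of "hard mode" is easy to state in code. In the sketch below, `strict_hard_ok` is the document's strict version (the new word must reproduce every clue, i.e. stay in the cluster), while `wordle_hard_ok` is only an assumed, rough model of the game's own rule: green letters must stay in place and every revealed letter must reappear somewhere. The game's exact check is not specified here, and both function names are hypothetical:

```python
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Wordle tiles for `guess` against `answer`: 'G' green, 'Y' yellow, '.' gray."""
    tiles = ["."] * len(guess)
    unused = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "G"
        else:
            unused[a] += 1
    for i, g in enumerate(guess):
        if tiles[i] == "." and unused[g] > 0:
            tiles[i] = "Y"
            unused[g] -= 1
    return "".join(tiles)


def strict_hard_ok(word, history):
    """'Strict hard mode': the word must reproduce every clue seen so far."""
    return all(feedback(guess, word) == clue for guess, clue in history)


def wordle_hard_ok(word, history):
    """Assumed model of Wordle's hard mode: greens stay put, revealed letters reappear."""
    for guess, clue in history:
        needed = Counter()
        for i, (letter, tile) in enumerate(zip(guess, clue)):
            if tile == "G" and word[i] != letter:
                return False
            if tile in ("G", "Y"):
                needed[letter] += 1
        have = Counter(word)
        if any(have[letter] < count for letter, count in needed.items()):
            return False
    return True


history = [("slate", feedback("slate", "baste"))]     # green T, E; yellow S, A
print(history[0][1])                                  # Y.YGG
print(strict_hard_ok("taste", history), strict_hard_ok("saute", history))   # True False
print(wordle_hard_ok("saute", history))               # True
```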
Indeed, the starting words TRACE and REACT are generally good choices, but each of them creates one cluster of *seven* "_ATCH" words; using hard mode obligates us to then spend up to seven turns looking for the hidden word, so the game may not conclude until turn 8!

Nonetheless, there ARE starting words which permit the construction of a win-by-turn-6 strategy that involves only using preferred words within clusters, and this gives a strategy that is permitted in "hard mode". However, in all the cases I have checked, the strategy must be used recursively: we have to map out some preferred words not only for turn 2 but also for later turns (that is, there are words that are preferred for a cluster but only after a previous preferred word has been tried).

One example I have worked out this way is PARSE. I first worked out the consequences of a guess-at-will strategy starting just with PARSE. The probability distribution for this strategy is

     [.06306695, .30292396, .38894339, .18507397, .04831372, .00951574, .00179633, .00032386, .00003878, .00000313, .00000016, .00000001, .00000000]

so (a) a very unlucky guesser could have to go to the 14th turn to discover the hidden word; (b) the average number of turns needed is 3.890251; (c) there is a 1.17% chance of failing (by turn 6).

I next found a standard (non-recursive) strategy for each of the clusters that can lead to a failure that way, just as I did (above) for SLATE.
For PARSE there are 40 such clusters; the simplest procedure for dealing with them is summarized as:

     { alike, carat, caste, debar, drift, gavel, grace, hoist, merry, metal, noose, pouty, pride, prong, rayon, reply, resin, salon, short, slash, slate, smart, spilt, stalk, steak, stink, stone, tonal, torch, tribe, truss, verge,
       [boule, build], [cable, gulch], [craft, droit], [lager, light],
       [badly, [child, tawny]], [blind, [blond, fight]], [betel, [ditch, nobly]], [refit, [blown, ditch, fever]] }

So 32 of the clusters are resolved by using a preferred word inside the cluster. Then, four clusters each require ONE out-of-cluster word; three need a PAIR of out-of-cluster words, and as far as I can tell the last one requires using THREE fixed words, to be played on turns 2, 3, and 4! This guarantees a win by move 6; the fractions of the time that the word is found on turns 2, 3, 4, 5, 6 are

     [0.05961123, 0.32074700, 0.49688725, 0.11121149, 0.01154303]

for an average of 3.694328 turns.

But (unlike for SLATE) for the starting word PARSE, we can write other rules for the last 8 clusters, ones that involve only words within the clusters. The following algorithm will work after PARSE is played on turn 1, and will determine the hidden word by move 6:

**On turn 2, just start guessing, unless one of the above 40 fits, in which case, play it. (The 32 words "alike, carat, ..." and 8 first-halves "boule, cable, ...")

**On turn 3, just start guessing, unless one of these 20 fits, in which case, play it.
     [from BADLY] magma, tawny
     [from BETEL] cello, enjoy, woven
     [from BLIND] cumin, dowdy, filmy, gulch, joint, witty
     [from BOULE] undue
     [from CABLE] gauze
     [from CRAFT] board, drawn
     [from LAGER] water
     [from REFIT] brief, diner, rebel, wreck

**On turn 4, just start guessing, unless one of these 2 fits, in which case, play it.
     [from MAGMA] havoc
     [from WRECK] bluer

**On turn 5, just start guessing, unless one of these 2 fits, in which case, play it.
       [from HAVOC]  taint
       [from BLUER]  homer

(It will only happen that we need the special rule on one turn if we had followed a particular special rule on the prior turn.) With this set of rules, the probability distribution is [0.06306695, 0.37474268, 0.41821998, 0.12766482, 0.01630556], from which we compute the average number of moves to be 3.659399. Again this guarantees a win by turn 6, but now this strategy can be used when playing "hard mode"! (If you wish to play along this way, you might appreciate a brief cheat sheet I wrote up.)

As has been typical for all our larger starting sets, most of the clusters will surely be resolved by turn 6 just by guessing candidates at random. What's different with 1-word starting sets is that there are so few clusters (146 with PARSE), and while, yes, 106 out of 146 can be safely handled by freeform guessing, those clusters account for only 488 words, only about one-fifth of the Wordle dictionary. The next largest group of clusters is the set of 32 for which we have picked out a preferred word, to be used in both the second and third algorithms. Just giving some guidance for these clusters accounts for the improvement from the first strategy to the second, because nearly half the dictionary (1014 words) is in these clusters, so anything better than unguided guessing will affect the number of moves almost every other day! The rest of the words are contained in just 8 clusters, but they are large: together they hold 883 words; dispensing with an out-of-cluster guess (in some sense a wasted turn) on every one of those 883 days accounts for why the third algorithm can be better than the second. (Sadly, both the second and third procedures again require memorizing some five dozen words, properly ordered, so they are unlikely to be of practical use for most people.)
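The side rules above amount to a small lookup performed on each turn. Here is a minimal Python sketch of that procedure, not the author's program: the tile-scoring function assumes the usual Wordle convention (greens assigned first, and a repeated guess letter colored at most as many times as it occurs in the answer), and PREFERRED is a hypothetical table holding only an excerpt of the PARSE rules listed above.

```python
import random
from collections import Counter

def feedback(guess, answer):
    """Tiles as a string: G = green, Y = yellow, . = gray. Greens are assigned
    first; a repeated guess letter is colored at most as often as it occurs
    in the answer."""
    tiles = ["."] * len(guess)
    # Answer letters not already matched by a green tile.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            tiles[i] = "G"
        elif unmatched[g] > 0:
            tiles[i] = "Y"
            unmatched[g] -= 1
    return "".join(tiles)

def consistent(candidates, guess, pattern):
    """Words still possible after `guess` produced `pattern` (the current cluster)."""
    return {w for w in candidates if feedback(guess, w) == pattern}

# Hypothetical excerpt of the PARSE side rules: turn -> preferred words.
PREFERRED = {
    2: ["alike", "carat", "boule", "cable", "badly", "refit"],
    3: ["magma", "tawny", "undue", "gauze", "wreck"],
    4: ["havoc", "bluer"],
    5: ["taint", "homer"],
}

def next_guess(turn, cluster, preferred=PREFERRED, rng=random):
    """Hard-mode move: play a preferred word if it is still in the cluster,
    otherwise guess any remaining candidate at random."""
    for word in preferred.get(turn, []):
        if word in cluster:
            return word
    return rng.choice(sorted(cluster))

vocab = ["batch", "catch", "hatch", "latch", "match", "patch", "watch", "trace"]
cluster = consistent(vocab, "trace", feedback("trace", "watch"))
print(sorted(cluster))  # the seven _ATCH words
```

Filtering a full answer list with consistent() after each clue gives the current cluster; the hard-mode constraint is simply that every guess is drawn from that set.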
While it would not be suitable for a human player, I did pursue this idea to the extreme for some of the likely best starting words: how well could we do if we were willing to memorize preferred words to be used on every move, until the hidden word is found? That is, how well can we do if we are willing to construct the entire decision tree, using only (strict) hard-mode guesses (i.e. always playing words that are in the current cluster)? I have already mentioned that starting with TRACE might take us until the eighth turn to finish, yet it's otherwise not a bad start from this perspective! I can produce a decision tree that shows what to do on each of the 2315 days of a full cycle, in which a total of only 8204 words would be played, an average of 3.543845 turns per game. Actually SLATE is better, and in fact gives the lowest total of all the words I tested in this way: 8186 total (3.536069/day) if we follow its decision tree. Most of these trees sometimes extend past the sixth move, though; among those that do not, the fewest moves I have found is for PLATE. It wins every day, taking 3.562419 moves per day, 8247 total. (SCALE and PLACE are nearly as good; PARSE takes 8282 total.) In other words, PLATE seems to be the best starting word in the sense of using the fewest turns on average, if we play in hard mode and insist on winning by turn 6 (using only solution-list words). In constructing these decision trees, my algorithm minimized something a bit unusual, as explained below. I have since learned that Tom Sirgedas has posted decision trees to play the game with the same constraints that I use: strict hard mode, using only Wordle solution words. He gives one starting with SLATE which, like mine, will occasionally take more than 6 turns; another starting with PLATE which always finishes by the 6th turn; and a third starting with SCAMP which always finishes by the 5th turn.
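The totals quoted above (8204, 8186, 8247, ...) are sums, over all hidden words, of the turn on which each word is found, and the per-day averages are those totals divided by 2315. A hypothetical Python representation of such a decision tree, with a function that computes the total and the latest turn used (which must be at most 6 for a guaranteed win), might look like the sketch below; the node format is an assumption, and the toy tree covers only TRACE and three _ATCH words, not a full cycle.

```python
# Hypothetical node format: (guess, children), where children maps the tile
# pattern returned by `guess` to the subtree played next. The pattern "GGGGG"
# maps to None: the guess was the hidden word.
SOLVED = "GGGGG"

def tree_stats(node, turn=1):
    """Return (hidden words covered, total turns over those words, latest turn)."""
    guess, children = node
    covered, total, latest = 0, 0, 0
    for pattern, child in children.items():
        if pattern == SOLVED:
            c, t, d = 1, turn, turn
        else:
            c, t, d = tree_stats(child, turn + 1)
        covered, total, latest = covered + c, total + t, max(latest, d)
    return covered, total, latest

# Toy hard-mode tree: TRACE, then part of the _ATCH cluster, one word at a time.
toy = ("trace", {
    SOLVED: None,
    "Y.YG.": ("batch", {
        SOLVED: None,
        ".GGGG": ("catch", {
            SOLVED: None,
            ".GGGG": ("hatch", {SOLVED: None}),
        }),
    }),
})

covered, total, latest = tree_stats(toy)
print(covered, total, latest, total / covered)  # 4 10 4 2.5
```

For a full tree, total / 2315 gives the per-day average; for example 8186 / 2315 is about 3.536069, the SLATE figure above.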
Sirgedas's first two trees finish a bit faster than mine (3.51836 turns and 3.55378 turns, respectively), and his third finishes in 3.71404 turns on average. Each of these is optimal in its class by this metric. (My decision trees opt to lower the maximum number of turns needed to handle each word in a cluster, before trying to lower the average number of turns. The difference is evident only in a few clusters.)

----------------
(E) A "WORD" ABOUT REPEATED LETTERS

In previous sections it was necessary to restrict attention to starting sets without repeated letters, to cut down on the size of the search spaces. For single-word starting sets that is not necessary, and indeed this gives us an opportunity to consider the consequences of not allowing repeated letters in the previous sections.

There are 749 Wordle solution words that repeat one or more letters. We can rank them according to any of the metrics that we have applied to other sets of words. For the various Lp metrics, the best of these words is one of these:
   INANE (for all p>10.736), whose largest cluster has "only" 316 elements
   ELATE (for other p>2.525)
   ERASE (for other p>1.133), with a daily ambiguity of "only" 106.97
   TEASE (for other p>0.645)
   TERSE (for other p>-0.895), which has a "large" total of 122 clusters
   TRAIT (for other p>-1.604)
   TOTAL (for all p<-1.604), with 33 singleton clusters, which is the max.
For the color-tile metrics, the best is one of these:
   SOOTY (for all f<0.063), with 1392 green tiles, which is maximal
   CEASE (for other f<0.183)
   TEASE (for other f<0.234)
   ERASE (for other f<0.821)
   EATER (for other f<1.840), with 3641 total colored tiles, which is maximal
   TERRA (for all f>1.840), with 2759 yellow tiles, which is the max.
For the guess-at-will strategy, the best repeat-letter starting words are
   SLEET, which finishes after an average of 3.96898 turns (the min)
   SLEEP, which fails 1.00232% of the time (the min).
Also notable are ROOST, TORSO, and SHOOT, which alone are guaranteed to finish by the 12th move this way.
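Several of the statistics quoted in these rankings (largest cluster, number of clusters, singleton clusters, daily ambiguity) can be read off the list of cluster sizes for a starting word. A hedged Python sketch: the "daily ambiguity" formula used here is an assumption (the expected size of the cluster containing a uniformly random hidden word), and the Lp and color-tile metrics themselves are not reproduced.

```python
def cluster_summary(sizes):
    """Summary statistics for one starting word, given its cluster sizes
    (the sizes should add up to the vocabulary size)."""
    n = sum(sizes)
    return {
        "clusters": len(sizes),
        "largest": max(sizes),
        "singletons": sum(1 for s in sizes if s == 1),
        # Assumed definition: a hidden word lies in a cluster of size s with
        # probability s / n, so the expected cluster size is sum(s * s) / n.
        "daily_ambiguity": sum(s * s for s in sizes) / n,
    }

print(cluster_summary([4, 1, 1, 2]))
```

With real data, the sizes would come from grouping the 2315 answer words by the tile pattern a starting word produces.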
(I did not compute a minimal finish-by-turn-6 strategy for any of these starting words. Each of the words TRAIT, TOTAL, and SLEEP has 28 clusters that require an additional rule of play (i.e. clusters for which guess-at-will can fail), and the other words listed above (probably) have even more, so none of these words would be optimal by the 'simplest strategy' criterion, and I doubt any of the rest of the 749 repeated-letter words would be any simpler.)

Based on this information, one might conclude that ERASE, TEASE, and TERSE are the best words to include in a discussion that's not limited to words with distinct letters. Besides the other words listed above, one might also consider words that showed up fairly high in one or more of the above rankings: STEEL, RESET, BERET, EASEL, LEASE, LATTE, NERVE, NEVER, SNEER, TREAT, FLEET, STATE, SPREE, SCREE, ... *However*, as good as the best of this lot are, they're not great. Ranking all 2315 words by any of various criteria, even the best of these words (measured by the same criterion) is typically outranked by some 10% of the words that do not repeat any letters.

One might hope to pair off the best of these "four-letter words" with the best pairs to get good triples with only 14 distinct letters, or similarly to create good 19-letter quads or 9-letter pairs. This does not seem to be the case. For example, what 14-letter triples can we form which include ERASE? Of all the 196175 10-letter pairs, every one of those that I would consider to be in "the better half" already includes at least one of the letters A, E, R, or S. A good example of one that doesn't is COUNT+DIMLY, but the triple ERASE+COUNT+DIMLY is not competitive with the best triples we itemized in a previous section. Likewise I scanned what is arguably the "top 20%" of all the 15-letter triples, and every one of them shared a letter with every one of the 18 or so best words in this section.
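The letter-overlap checks in this paragraph are easy to mechanize. A minimal Python sketch, using a tiny made-up word list rather than the 2315-word answer list, that finds 10-letter pairs and keeps only those sharing no letter with a given word (the function names are hypothetical):

```python
from itertools import combinations

def distinct_letters(words):
    """Set of letters used by a starting set."""
    return set("".join(words))

def ten_letter_pairs(vocabulary):
    """All pairs of words that together use ten different letters."""
    return [p for p in combinations(vocabulary, 2) if len(distinct_letters(p)) == 10]

def avoiding(word, pairs):
    """Pairs sharing no letter with `word`, so that word + pair has no overlap."""
    return [p for p in pairs if not set(word) & distinct_letters(p)]

pairs = ten_letter_pairs(["count", "dimly", "crane", "spilt"])
print(avoiding("erase", pairs))                            # [('count', 'dimly')]
print(len(distinct_letters(["erase", "count", "dimly"])))  # 14
```

Run over the real answer list, ten_letter_pairs would enumerate the 10-letter pairs, and avoiding("erase", ...) the candidates for a 14-letter triple containing ERASE.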
The conclusion is that in order to build starting sets that repeat a letter but remain competitive with the others we have already found, we would have to either (a) combine moderately good starting sets without repeated letters with moderately good individual words with a repeated letter, in the hope that they have complementary strengths, or (b) combine very good parts that individually have no repeated letters, but whose letter sets overlap. (Option (b) is counter-intuitive, since more information can be returned from a letter repeated in a single word than from a letter in common between two words. Nonetheless, we have a better selection of the two parts this way.)

----------------
(F) SUMMARY OF BEST SINGLE STARTING WORDS

So what's the single best starting word? By various criteria in this document, we have seen it might be TRACE, or BRUTE or CHANT, or RAISE or ARISE, or FILET, or SLATE, or CLASP, or SLEPT, or STARE or ALERT/ALTER/LATER, or PLATE. I once informally combined some of the metrics to get an overall winner, and it came out to be PARSE. WordleBot says CRANE, while if you set the toggles suitably at WordleTools, it will rank TRAIL, TRAIN, or ATONE highest. One "expert" advocates CLAMP, while someone else "proves" it's SLICE or ADEPT. Most of these claims have some justification and computation behind them, so it may seem odd that they can reach different conclusions. Of course the answer depends on what you wish to optimize, and on how you will continue play after the starting word. (Most claims about the best starting word seem to assume the player "will play optimally", but that seems unrealistic unless a clear and simple algorithm is presented, and then followed by the player!) I know the very best one-word starting sets can achieve an average number of turns as low as 3.4212 while guaranteeing a 0% failure rate.
But all the ones I know of allow use of all the non-answer-list words as well; I don't know how much higher the minimum is if we restrict to words from the Wordle answer list. I do know, however, that these "optimal" algorithms involve significantly more branching than our simple algorithms. (The best score in this document was 3.6590 turns per game, for CRANE + SPILT.) Also, it bears repeating that the superiority of those algorithms applies only to original Wordle, not to the compound games.

==============================================================================
CONCLUSION(?)

So ... what does all this tell us about how to play Wordle and the compound games? One conclusion, surely, is that what we choose to do will depend on what we consider important. We have seen in the examples that the most important goals may be at odds. We want to win as often as possible; we want to keep our numbers of turns used as low as possible; we want to follow a procedure that is as simple as possible; and along the way we appreciate not having to come up with good moves in the absence of concrete hints.

In order to measure just how good or bad a particular starting set is, we have insisted throughout that it is important to clarify just what the player will do after the starting set is played. It is interesting to note, however, that (assuming the goal is to minimize the average number of turns) the relative rankings are similar whether we intend to finish the game with a guess-at-will strategy or by following a strategy that guarantees a win within six turns.
So, since people like a firm answer, let's just guess what the reader is really interested in, and make a recommendation: depending on what size of set of words you want, the "best" starting set is one of these:

   catty, frond, rumba, spill, verge, whack
   blank, chump, goody, river, swift
   carve, downy, plumb, sight
   bland, copse, right
   crane, spilt

Should I give a personal, not-necessarily-scientific recommendation that takes into account what actual human players are like? First, familiarize yourself with the Wordle word list! Then:
  (1) If you're playing Wordle on hard mode, start with PLATE.
  (2) If you're playing Wordle on easy mode, and just want to keep your "streak" alive, start with CARVE, DOWNY, PLUMB, SIGHT.
  (3) If you're playing Wordle on easy mode, and want to minimize your number of moves, start with CRANE + SPILT.
  (4) If you're playing a compound game, start with COPSE, BLAND, RIGHT.
In all cases, memorize as many of the corresponding side rules as you can. Then, during play, use those rules but otherwise just keep guessing Wordle words consistent with the clues.

----------------
I will continue to process the datasets that I have constructed, and intend to update this document when something new pops up. In the meantime, I welcome corrections and suggestions for further investigation. Now, how about moving on to a nice game of Nerdle, hmm? :-)

--dave
rusin@math.utexas.edu