This page is part of my <a href=math.utexas.edu/~rusin/wordle>analysis of Wordle</a>.

Theorem: There is no set of three starting words that has both desirable features
  (A) sure to win in at most two more moves
  (B) able to finish with "freeform hard mode"

(For comparison, we noted in the main document that the FOUR-word starting
word-set  carve-downy-sight-plumb  does possess both properties (A) and (B). )

Let's illustrate the style of the proof of this claim.
Suppose for example that after a starting word-set has been entered, you 
notice that the word "haunt" could be the day's hidden word, i.e. "haunt"
is consistent with the clues you've gotten from your 15 colored tiles.
Depending on what words you started with, it is quite likely that
the words "jaunt" and "vaunt" are also candidates. Well, if indeed your
starting word-set has property (A), that means there is some word to
enter as your next move that will tell you which of these three is the
day's hidden word; but if your word-set has property (B) as well, then
you should be able to enter whichever of the three comes to mind first,
and that's a problem: if you enter "haunt" (say) and that's not the day's
hidden word, then you haven't gained enough information to exclude either
"jaunt" nor "vaunt", so you've violated property (A).

The conclusion, then, is that any starting word-set with properties (A)
and (B) cannot allow both "jaunt" and "vaunt" as possibilities on the
day when "haunt" is a possibility. That's exactly what happen when "sight"
is part of the starting set: that "h" will be yellow when the day's word
is "haunt", but it will be grey when the word is "jaunt" or "vaunt".

And this is precisely the feature that makes a starting word-set able
to have properties (A) and (B): at least one of the words in the set
must contain an h, or a j, or a v. Those are the only ways in which the
colored tiles for haunt, jaunt, and vaunt will not all be identical.

The same reasoning applied to the words  match, patch, and watch . If
on our next-to-last move we enter any of these three words, and it's not
the hidden word of the day, then we will have gained no new information to
decide which of the other two is the day's word. So we would contradict
properties (A) and (B) *unless* the starting set of words already treats
match, patch, and watch differently: at least one of the starting words
must contain an m, a p, or a w (and not two of these letters, unless
one of them is in position 1). We can assemble the set of all words
in the Wordle wordlist that contain one of those letters; those are the
words that "crack apart" this set of three words -- they form the
"cracking set" for {match,patch,watch}, and now we understand that our
starting wordset must intersect this cracking set.

[Note: elsewhere we refer to "splitting" the set {match,patch,watch},
which would mean having words that treat *all three* differently.
Words like "women" or "wimpy" split {match,patch,watch}, but words
like "marry" merely crack the set.]

So we obtain multiple requirements for our starting word set: it must
intersect the cracking set of {haunt, jaunt, vaunt}; it must also
intersect the cracking set of {match, patch, watch}; and so on.

Actually, the starting word-set must intersect the cracking set of
EVERY triple of words that score each other equally, and conversely
if it does intersect all of these, then with one (hard-mode) guess
after the starting set, there can be no ambiguity left about the day's
hidden word, so we can enter it and win.

This turns the search for starting wordsets with properties (A) and (B)
into a "covering-set problem": the starting set we seek will be a set
of words that intersects the cracking set of every such triple of words.

In principle this is a known mathematical problem for which software
exists: list the cracking sets of all such triples and then look to see 
if there is a covering set with three words. (My claim is that there 
is not; the smallest cracking sets, like {carve, downy, plumb, sight}, 
contain four words.) But as a practical matter this computation is far
too lengthy --- there are many millions of "haunt"y triples to worry
about, and their cracking sets contain hundreds or even thousands of words.

But it turns out that a relatively small number of "haunt"y triples
is already enough to preclude the possibility of a covering set with
just three words. These are the only triples we need to consider:
 [jaunt, taunt, vaunt], [piper, riper, viper], [catch, hatch, watch]
 [focal, local, vocal], [budge, fudge, judge], [corer, cover, cower]
 [foist, joist, moist], [stake, state, stave], [grade, grave, graze]
 [haunt, jaunt, taunt], [fight, tight, wight], [daunt, jaunt, taunt]
 [built, guilt, quilt], [dolly, folly, jolly], [folly, holly, jolly]
 [hitch, pitch, witch], [bunch, hunch, munch], [miser, riser, wiser]
 [paper, parer, payer], [stage, stake, state], [golly, holly, jolly]
 [skill, spill, swill], [eater, hater, water], [crane, crave, craze]
 [mower, power, rower], [booby, booty, boozy], [clash, clasp, class]
 [cried, dried, fried], [field, wield, yield], [stung, stunk, stunt]
 [taker, tamer, taper], [bitty, ditty, kitty], [budge, judge, nudge]
 [night, tight, wight], [bring, brink, briny], [conic, ionic, sonic]
 [flush, plush, slush], [fiber, filer, fixer], [hatch, latch, watch]
 [spoof, spook, spoon], [catch, hatch, latch], [bribe, bride, brine]

That's 42 triples (arranged in order of the size of their cracking sets).
Some of these triples overlap but that is irrelevant. The triples in this
list are particularly annoying when playing Wordle: in each of these 42
triples, the three words share a set of four letters, that is, any one
of them would score the other two with four green tiles (and one grey).

It is now a relatively simple matter to check the claim that a
starting wordset with properties (A) and (B) has to have at least
four elements. Simply
  1. compute for each of the 42 triples the "cracking set" :
      { w in Wordle wordlist | score(w,w1), score(w,w2), and score(w,w3) are not all identical }
  2. present the 42 cracking sets to e.g. Gurobi and ask it to find
     the smallest covering set.
The answer will be a set of cardinality 4. In other words, for any starting
wordset of just three words, at least one of the 42 triples shown will be "bad":
on any occasion that any one of the words in the triple is a candidate for the 
word of the day, ALL three of them will also be candidates, so if we pick one
of them as our fourth entry and it is wrong, then we have to pick one of the
other two as our fifth entry and if that's wrong too, we won't win before the
sixth entry.

                ================

So the proof has become an easy covering-set computation. Perhaps it is of interest
to know how I obtained those 42 triples.

In fact, I initially began with a much larger set of triples. With some experimentation
I found that the cracking sets for triples tended to be smaller when the triples
were more similar. (See separate article on <a href=similarity>similarity</a>.)
In a covering-set problem, larger subsets are easier to intersect, that is, they 
tend not to restrict the potential covering sets as much; they are less helpful
in looking for a covering set. So I computed first the triples that were "neighbors"
(and still had the property that each word in the triple scored the other two
words equally), and then computed the cracking set for each. The triples
themselves are not really relevant to the covering-set problem --- only their
cracking sets matter --- so duplicates could be omitted. In fact if the
cracking set for one triple even simply *contained* the cracking set for another, 
then the former could be discarded. (A covering set that intersected the smaller
cracking set would automatically intersect the larger one too.) In this way I
found a collection of just 618 cracking sets that must be covered, and software
was able to complete the calculation and determine that the covering set would
have to have cardinality 4 or more.

At that point I simply discarded several hundred of the largest cracking sets
and noticed that this did not change the outcome of the covering-set problem:
no covers of cardinality three exist. This continued to be true as long as I
retained at least a couple hundred of the smallest cracking sets.

The next step was to invert the question: if no set of cardinality three could
cover all 200 of these cracking sets, what is the largest number of them that
*could* be covered by a set of three starting words? The answer turned out to be
"almost all of them", but specifically Gurobi could tell me a starting triple
that would cover all of these cracking sets except for, say, the 27th.
So I would rerun the optimization problem, adding the constraint that the 27th
cracking set MUST be among those covered. This process I would repeat in a 
sort of Whack-A-Mole game: if Gurobi could cover all of the cracking sets
except the 9th and the 83rd, then I would rerun the problem adding the
constraints that the 9th and 83rd cracking sets MUST be covered. The typical
result would be that Gurobi would find another 3-word covering set that 
met the same number of the cracking sets, but occasionally it would be
unable to match the previous level, finding it could meet only somewhat
fewer of the constraints. Eventually, after adding many fewer than 200 of
the cracking sets to the list of constraints that must be met, I would be
informed that there was no feasible solution at all: no starting set of just
three words could cover the list of cracking sets that I had added as must-have
constraints.

In this way I replaced the list of 200 or so unmeetable constraints with a
subset of 96 of the hardest ones. Then I repeated the process and found just
66 of *those* to be sufficient alone to force the covering set to have four or
more elements. Another round got it down to 45, and some more tinkering found
a few more that were unnecessary. This gave the present list of 42 cracking
sets that cannot be covered with any set of just 3 Wordle words. I believe I
checked that every subset of these 42 *could* be covered with 3 words, i.e.
that this set of problematic triples is minimal in the sense of inclusion;
but I don't make any claims about minimality in the sense of cardinality.

                ================

I don't know whether there is a starting set of three words that meets just
condition (B). That would be a set of three words which would be sure to win 
Wordle by the 6th move, with the last three moves being simply freeform 
hard-mode guesses. ("Guess anything that fits the clues so far".)

It is tempting to think that these 42 triples do also give a proof that
no three-word starting set can have the two properties
  (A) sure to win in at most two more moves
  (B') able to finish with "guided hard mode"
Indeed, what the covering-set computations show is that given any
three-word starting set, at least one of these 42 triples will not be cracked, and
at first blush that would seem to be enough -- it wouldn't help to always prefer
say "haunt" to "jaunt", since they all score each other with the same four greens.
But consider for example the (very good!) starting triple 
     [blend, match, sprig]
It does, of course, fail to crack some of those 42 triples, including
[stake, state, stave]; all three of these words will turn the S tile green
and the  A,E,T tiles yellow. But so will the word "skate", and if we 
are permitted to use *guided* hard mode, then we can make it a point
to play "skate" whenever it is an option. If the word of the day is
one of the other three, we'll then know which it is, to play on move 5.
This example shows that the fact that a triple is uncracked by the starting
wordset does not mean that we cannot resolve it in two more moves, 
if *guided* hard mode is allowed (and certainly not, if out-of-the-box
playing is allowed!)

I don't yet know whether there is a starting set of three words which
satisfies conditions (A) and (B').  The starting set blend+match+sprig
does have property (B') : if after those three words are entered the
clues suggest that the hidden word could be fudge, outer, leaky, or joker,
then play that word; otherwise, play in freeform hard mode. Then you
will win by the sixth move. On the other hand I don't know yet whether there
is a starting set of three words that has property (A), with or 
without having property (B') as well.

                ================

Finally, observe that the 42-triple-test is only useful for the proof
of the negative result about three-word starting wordsets; it's not
helpful for the positive task of finding four-word starting sets: as
noted above, that task would be equivalent to finding a covering set
for many thousands of cracking sets; covering these 42 is necessary
as a part of that, but not even close to being sufficient. In the same
way it is unhelpful as a way to predict the "quality" of three-word 
starting sets.

On the one hand the good starter blend+match+sprig mentioned above
fails to crack not only [stake, state, stave] but also [hatch, watch, catch],
[tight, wight, fight], [field, wield, yield], and [focal, local, vocal], 
each of which cannot guarantee a win by move 5 in hard mode; and this
starter also has uncracked clusters [jaunt, taunt, vaunt, tawny] and
[piper, riper, viper, prize], each of which includes one of the 42
triples along with another word, although unlike "skate" that
additional word does not fully resolve the cluster. So although this
is a very good starting wordset, it looks bad when using these 42
triples as a metric.

By contrast, a starting wordset like
    [flask, heavy, thumb]
can crack 41 out of the 42 hard triples, and still not be a particularly
good starting set. Of the 42 it cracks all but the last, but not only
is that triple uncracked, it's lumped into the much larger cluster
  [bicep, binge, booze, borne, boxer, breed, bribe, bride, brine, probe]
It also leaves uncracked some other triples (not tested for among the 42)
such as  [basic, basin, basis] . And it does a generally poor job of
distinguishing words --- there are only 630 words uniquely determined
by this starting set, with all the other Wordle words clustered into
just 396 other clusters, some of which are large, including the 58
indistinguishable words (such as "cider") that return nothing but a
yellow E in response to the three starting words!