Where the Puzzles Come From


Welcome to CrosswordRace.com! This fun little website was created to help me learn some new things, stay sharp on my web development skills, and keep myself from yelling at the TV because the NC State Wolfpack can't seem to field a competent offense.

One of the most common questions I receive from users is, "Where do you get the puzzles?" It's a good question, so let me let the cat out of the bag right away: yes, the puzzles are AI-generated. I’m not a crossword author, just a crossword hobbyist. I didn’t want to write the puzzles myself, as that would require constant manual attention to the site, so I needed a way to generate them automatically.

Step 1: Building a Word List

As anyone who has ever tried to make crossword software knows, you need a list of words that are eligible to appear in your puzzle. There are many word lists available on the internet: some specific to crosswords, some paid, and some free.

I built my word list by combining several existing free and open-source word lists, then manually narrowing the result down to words between 3 and 5 letters long (since all my puzzles are 5x5). I also removed words that I consider too cliché, too esoteric, or too inappropriate for a crossword. Since I based my word list mostly on "words" rather than "crossword clues," you’ll notice that comparatively few phrases appear in the CrosswordRace.com puzzles. Short phrases like "AM SO" or "ARE TOO" are classic in other crosswords, like the NYT. Maybe one day I'll revise the word list and update the generation software to include more of these.
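For the curious, the merge-and-filter step can be sketched in a few lines of Python. This is only an illustration, not the script I actually run: the function name, the idea of passing each source list as an iterable of strings, and the blocklist standing in for my manual pruning are all assumptions made for the example.

```python
def build_word_list(sources, blocklist=frozenset(), min_len=3, max_len=5):
    """Merge several word lists into one sorted, de-duplicated list.

    sources   -- iterable of word lists (each an iterable of strings)
    blocklist -- hypothetical stand-in for words removed by hand
    """
    merged = set()
    for source in sources:
        for raw in source:
            word = raw.strip().upper()
            # Keep single words only: letters, and short enough for a 5x5 grid.
            if word.isalpha() and min_len <= len(word) <= max_len:
                merged.add(word)
    blocked = {w.strip().upper() for w in blocklist}
    return sorted(merged - blocked)


if __name__ == "__main__":
    list_a = ["apple", "cat", "to"]
    list_b = ["Cat", "banana", "asses", "are too"]
    print(build_word_list([list_a, list_b], blocklist={"asses"}))
```

Note that the `isalpha()` check is also what drops multi-word phrases such as "are too" in this sketch, which mirrors why phrases are rare in the puzzles.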

Step 2: Building Candidate Puzzles

With the word list in hand, I needed to create some crossword grids. To do that, I wrote a Python script that picks a word at random from the word list and selects a template from several blank crossword templates. The script then runs a modified breadth-first search (BFS) over the remaining words to determine which ones are still eligible to fit in the puzzle. The search continues until no remaining word can satisfy the puzzle or a complete solution is found. If a complete solution is found, it’s added to the list of "candidate" puzzles. If no solution is found, the program picks a new random word from the list and starts the process over.
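Here is a rough sketch of that loop using a plain, unmodified BFS. It is not my production script, and the tweaks I made to the search are not reproduced here. The names `find_slots`, `fill_grid`, and `generate_candidate` are made up for the example, and I'm assuming a template is a list of strings where "." is an open cell and "#" is a black square, and that `words` is a set of uppercase words.

```python
import random
from collections import deque


def find_slots(template):
    """Return every across/down run of 2+ open cells ('.') as a list of (row, col)."""
    rows, cols = len(template), len(template[0])
    lines = [[(r, c) for c in range(cols)] for r in range(rows)]
    lines += [[(r, c) for r in range(rows)] for c in range(cols)]
    slots = []
    for line in lines:
        run = []
        for cell in line + [None]:  # the trailing None flushes the last run
            if cell is not None and template[cell[0]][cell[1]] == ".":
                run.append(cell)
            else:
                if len(run) >= 2:
                    slots.append(run)
                run = []
    return slots


def fill_grid(template, words, seed_word):
    """Breadth-first search over partial fills; returns grid rows or None."""
    slots = find_slots(template)
    first = next((s for s in slots if len(s) == len(seed_word)), None)
    if first is None:
        return None
    ordered = sorted(words)
    queue = deque([dict(zip(first, seed_word))])  # each state maps cell -> letter
    while queue:
        letters = queue.popleft()
        used, open_slot, dead = set(), None, False
        for slot in slots:
            if all(cell in letters for cell in slot):
                word = "".join(letters[cell] for cell in slot)
                if word not in words or word in used:
                    dead = True  # a crossing spelled a non-word or a repeat
                    break
                used.add(word)
            elif open_slot is None:
                open_slot = slot
        if dead:
            continue
        if open_slot is None:  # every slot holds a distinct, valid word
            return ["".join(letters.get((r, c), "#") for c in range(len(row)))
                    for r, row in enumerate(template)]
        for word in ordered:
            if word in used or len(word) != len(open_slot):
                continue
            if all(letters.get(cell, ch) == ch for cell, ch in zip(open_slot, word)):
                queue.append({**letters, **dict(zip(open_slot, word))})
    return None  # queue exhausted: no fill exists for this seed


def generate_candidate(templates, words, attempts=100, rng=random):
    """Retry with a fresh random seed word and template until a fill is found."""
    pool = sorted(words)
    for _ in range(attempts):
        grid = fill_grid(rng.choice(templates), words, rng.choice(pool))
        if grid is not None:
            return grid
    return None


if __name__ == "__main__":
    tiny_words = {"BAT", "ORE", "WED", "BOW", "ARE", "TED"}
    print(fill_grid(["...", "...", "..."], tiny_words, "BAT"))
```

A plain BFS like this keeps every partial fill in memory, so with a full word list it would need some pruning (for example, dropping a state as soon as any unfinished slot has no matching word) to stay fast.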

You might be wondering, why not just have AI generate the crossword grids? Well, I tried that. It didn’t work very well. The AI kept using bad clues, poor grid patterns, or just outright couldn’t generate a quality 5x5 puzzle. Perhaps there’s a better way to fine-tune a model, prompt, or context to generate a crossword grid, but I found that generating candidate puzzles from the word list is both resource- and time-efficient: a script running locally on my laptop takes around 1 second and negligible compute to generate a candidate puzzle.

Step 3: Clues and Consensus

Now that I had a 5x5 crossword puzzle populated with eligible words from the word list, I needed to generate some clues. To do that, I started asking AI models to write crossword clues for me. It was hit or miss: sometimes the clues were great, and other times they were bad or nonsensical. Throughout the development process I used multiple models and variations, but the workhorse that I found the most success with was Meta Llama 3. The real problem was that the model sometimes hallucinated and generated clues that were outright incorrect. One example is the word ASSES (which has since been removed from the word list). The model wrote a clue for ASSESS instead, likely assuming the word was a typo.

So, how could I monitor the quality of the AI-generated clues? I could read all the puzzles, but that’s boring and still requires constant attention to the site. Instead, I started using other AI models to build a rudimentary concept of consensus around the clue. In essence, I would use one model to generate the clue, and then a few other models to attempt to solve it. If the other models could solve the clue, I assumed it was legitimate. If the entire puzzle could be solved this way, it was published to the site.
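In code, the consensus check boils down to something like the sketch below. The solver functions are placeholders for real model calls (no actual model API is shown), the function names are invented for the example, and requiring every solver to agree by default is my assumption here; the threshold can be loosened with `min_agree`.

```python
def clue_has_consensus(answer, clue, solvers, min_agree=None):
    """Return True if enough solver models recover the answer from the clue.

    solvers   -- callables taking (clue, answer_length) and returning a guess;
                 hypothetical stand-ins for calls to other AI models
    min_agree -- how many solvers must be right (default: all of them)
    """
    target = answer.strip().upper()
    needed = len(solvers) if min_agree is None else min_agree
    hits = sum(
        1 for solve in solvers
        if solve(clue, len(target)).strip().upper() == target
    )
    return hits >= needed


def puzzle_passes(clues, solvers, min_agree=None):
    """A puzzle is publishable only if every one of its clues reaches consensus."""
    return all(
        clue_has_consensus(answer, clue, solvers, min_agree)
        for answer, clue in clues.items()
    )


if __name__ == "__main__":
    # Toy solvers that ignore the clue entirely, just to exercise the logic.
    always_cat = lambda clue, length: "cat"
    always_dog = lambda clue, length: "dog"
    print(clue_has_consensus("CAT", "Feline pet", [always_cat, always_cat]))
    print(clue_has_consensus("CAT", "Feline pet", [always_cat, always_dog]))
```

With a check like this, the ASSES/ASSESS mix-up gets caught naturally: the solvers would answer with a six-letter word that never matches the five-letter entry.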

Conclusion

Word lists, custom software, and AI models – that’s the gist of how the puzzles for CrosswordRace.com are generated. The puzzles are produced in batches every few days, so you might see a duplicate if you play enough. Currently, producing a puzzle takes around 10-20 seconds from start to finish, which is too long to run synchronously while players are waiting for a round to start.

If you’d like more information, or you’re a big nerd with questions like “What temperature did you find generated the best clues?”, feel free to start a topic on the community tab or send me a message through the contact us page.
