A crossword layout generator in Prolog.
This is a crossword grid generator implemented in SWI Prolog. The solver core is portable Prolog, though the clue-input and JSON-output paths use SWI-specific features (see Requirements). The solver works deterministically, laying down one word at a time by placing it such that it intersects with an existing word, and also such that it doesn't sit hard up against any letters from another word.
The solver is designed to run online, so it returns the first solution rather than trying to find them all. The list of input words is shuffled first so repeated solutions on different runs are less likely. The start location for laying down the first word can also be shuffled or specified in advance. The algorithm does not try guess the grid length, so this must be specified by the user. It is of course possible to specify a length for which there are no solutions. Also note that duplicate solutions exist. These consist of the same layout of words, but found by laying out words in a different order.
The words and their accompanying solutions are supplied as an input file.
The repository includes a Prolog fixture, fixtures/bundled_17_clues.pl,
which also allows you to attach a URL to each word, as this was originally
intented for embedding in a web page, where the clues/solved words can be
links.
SWI-Prolog. The solver core is plain
Prolog, but the input and output paths are SWI-specific: clue metadata and
the JSON output both use SWI dicts and library(http/json), so porting to
another Prolog would mean replacing those. It is developed and tested
against SWI and has been confirmed to run on SWI-Prolog 9.2.x (the version
in the Ubuntu repositories) and 10.0.x (from the ppa:swi-prolog/stable
PPA).
crossword.pl is an executable SWI-Prolog script (note the #!
shebang at the top). Run it directly:
# Place the words using a fixed starting position.
$ ./crossword.pl --input <input_file> <grid_length> <start_loc>
# Find a vaguely random layout by shuffling the input words and the
# order in which start positions are tried.
$ ./crossword.pl --input <input_file> --shuffle <grid_length>
# Count how many distinct solutions exist (see the note below).
$ ./crossword.pl --input <input_file> --all <grid_length> [<start_loc>]
# JSON and Prolog clue inputs are both accepted.
$ ./crossword.pl --input puzzle.json <grid_length> <start_loc>
$ ./crossword.pl --input fixtures/bundled_17_clues.pl 17 topleft_across
# Write the output to a file. --out composes with any form above.
$ ./crossword.pl --input <input_file> --out solution.json <grid_length> <start_loc>
# Show all options.
$ ./crossword.pl --help
Arguments:
grid_length— an integer giving the side length of the (square) grid. The words infixtures/bundled_17_clues.plare intended for a grid of length 17.start_loc— where the first word is placed. One of:topleft_across,topleft_down,topright,bottomleft.
Options:
--input <file>— required. Load the word/clue set from a.jsonclue file using the interchange schema, or from a.plProlog fixture containing aclues/1term.--out <file>— write the output to<file>instead of stdout. The file is written only once a solution is found, so an unsolvable run leaves no empty file behind. Without the flag, output goes to stdout as before.--help/-h— print the usage summary and exit.
Flags are parsed with library(optparse), so they compose in any order and
also accept the --flag=value form (e.g. --out=solution.json).
If you do not have execute permission set, you can equivalently run it
as swipl crossword.pl <args...>.
Examples:
$ ./crossword.pl --input fixtures/bundled_17_clues.pl 17 topleft_across
$ ./crossword.pl --input fixtures/bundled_17_clues.pl --shuffle 17
Note on
--all: this enumerates every solution by backtracking, and the search space is large — many solutions are duplicates that differ only in the order words were laid down. With a fixedstart_locit is bounded but can still be slow; with nostart_locit additionally enumerates all four starting positions and can take a very long time. Use it for small grids / small clue sets.
The bundled words and clues live in fixtures/bundled_17_clues.pl. Prolog
fixture files define a single clues/1 term whose argument is a list of
[Answer, Metadata] pairs:
clues([
['OMEGA POINT',
_{clue: 'Transcending entropy',
link: 'http://en.wikipedia.org/wiki/Omega_Point'}],
['FLOW',
_{clue: 'Autotelic activity',
link: 'http://en.wikipedia.org/wiki/Flow_(psychology)'}],
...
]).
- Answer — the word. Spaces are allowed for multi-word answers; they
are stripped before the letters are placed on the grid (so
'OMEGA POINT'occupies a single 10-cell run). The answer keeps its spaces as a display label in the output. - Metadata — an optional dict of arbitrary data carried through to the
output verbatim. By convention it holds a
clueand alink, but the solver never inspects it, so any keys are fine. The metadata element may be omitted entirely (just['BIAS']) or given as an empty dict (['FLOW', _{}]); either yields an emptymetaobject in the output.
Answers must be unique: the output associates metadata with each placed word by its answer string, so a repeated answer is rejected up front.
To use your own words with the main CLI, pass either a JSON clue file or a
Prolog fixture file with --input.
For programmatic use you can supply the clue set as a JSON file instead, without editing any Prolog:
$ ./crossword.pl --input puzzle.json 17 topleft_across
The file is a JSON object with a clues array; each entry mirrors an
output words[] object minus the solver-computed fields:
{
"clues": [
{ "answer": "OMEGA POINT",
"meta": { "clue": "Transcending entropy",
"link": "http://en.wikipedia.org/wiki/Omega_Point" } },
{ "answer": "FLOW" },
{ "answer": "BIAS", "meta": {} }
]
}
answer— required string. Spaces are allowed and stripped before placement, exactly as in the bundled Prolog fixture.meta— optional object (default{}), copied verbatim to the output entry'smeta. By conventionclueandlink, but any keys are fine; the solver never inspects it. Omittingmetaand giving{}are equivalent.
The two sources are otherwise identical: JSON files and Prolog fixtures
produce the same internal word list as clues/1, so uniqueness checking,
solving, and output are unchanged. A missing file, malformed JSON, Prolog
fixture without clues/1, unsupported extension, or JSON schema violation
(no clues array, a non-string answer, a non-object meta) is reported
and the program exits non-zero. See
docs/json-input-spec.md for the full schema and
rationale; tests/clues.json is a worked example.
The solver uses two structures (see the comments at the top of
crossword.pl):
-
The grid — an association list (
library(assoc)) keyed by cell number. Cells are numbered1 .. grid_length²in row-major order (left to right, top to bottom). Moving across is+1(along a row); moving down is+grid_length(down a column). Each value is initially the atomempty. -
Placed words — a list, one entry per word already laid down. Each entry is a dict:
word{answer:Answer, letters:Letters, cells:Cells, dir:Dir, len:Len, start:Start, num:ClueNum}where
lettersis the space-stripped character list,cellsis the list of grid-cell numbers it occupies,dirisacross/down,lenis the length,startis the starting cell, andnum(the clue number) is added at the end byassign_clue_numbers/2. The solver carries no clue text or link here — metadata is rejoined to each word by its answer string only at output time.
The driver is assign_words/8. It is a greedy, backtracking placement:
- Pick a not-yet-placed word (via
member/2, so the choice is a backtrack point). find_intersecting_word/6chooses where to put it. For the very first word there is nothing to cross, so the groundedstart_locis used. For every word after that, it finds an already-placed word that shares a letter, computes the crossing cell, flips the direction (swap_dir/2), derives the candidate start cell, and checks the word fits on the grid (fits_on_grid/4).assign_word/9lays the letters down withassign_letters/7, enforcing the two layout rules:- each cell must either already hold the same letter (a legal crossing) or be empty; and
- an empty cell's perpendicular neighbours must be free
(
adj_is_free/4), plus the cells just before the start and just after the end must be empty (check_prev_cell/4,check_next_cell/4). Together these stop words being laid flush alongside one another — only genuine crossings are allowed.
- Recurse with the word removed from the pool and added to the placed list. When the pool is empty, every word has been placed and the layout succeeds.
Because the words are tried in order and the solver returns the first
success, a --shuffle run randomises both the word order and the order
the start positions are tried (shuffle/2, built on choose/2).
Once all words are placed:
assign_clue_numbers/2sorts the placed words by their start cell and assigns clue numbers in that order (cells that start both an across and a down word share a number).emit_json/3builds the grid and word list and writes the whole solution as a single JSON object viajson_write_dict/2— to standard output, or to the named file when--out <file>is given.
The output is one JSON object with three top-level keys (json_write_dict
emits object keys in sorted order, so on the wire they appear as grid,
gridLength, words):
{
"gridLength": 17,
"grid": [ ...gridLength rows... ],
"words": [ ...one entry per placed word... ]
}
-
grid— a densegrid_length×grid_lengtharray, row-major and 0-indexed[row, col]. Each entry is eithernull(an empty cell) or a cell object{"letter", "number", "across", "down"}:letter— the cell's letter;number— the corner-label clue number, ornullunless a word starts in this cell (across and down words starting together share it);across/down— the clue numbers of the across/down words passing through the cell, ornullfor none. A crossing cell carries both, so both words (and both links) stay reachable; the consumer decides which to follow.
-
words— one object per placed word:number,direction("across"/"down"),answer(spaces preserved),cells(the ordered[row, col]pairs it occupies), andmeta(the verbatim metadata dict from the input clue set,{}if none). To follow a link from a cell, match the cell'sacross/downnumber and direction againstwords.
If no layout exists for the requested grid_length, the program produces
no output and exits non-zero (and with --out, no file is written).
The full schema and design rationale live in
docs/json-output-spec.md.
- Grid size is not inferred. You must pass a
grid_lengththat is large enough; too small and there is no solution, and the solver will simply fail to find one. - First-solution semantics. The solver stops at the first valid
layout.
--allexists to enumerate them but is expensive (see the usage note). - Duplicate solutions. Many enumerated solutions are the same physical layout reached by placing words in a different order.
Run the full regression suite with:
$ make test
For local performance comparisons, run the benchmark harness:
$ make bench
or directly:
$ swipl -q benchmarks/run_benchmarks.pl -- fixtures/bundled_17_clues.pl
The harness reports min, median, and mean wall time, CPU time, and inference
counts for a Prolog fixture file that defines clues/1. These numbers are
machine-specific and are intended for comparing local branches, not as CI
thresholds or authoritative baselines. The default fixture is
fixtures/bundled_17_clues.pl; pass a different fixture path directly, or use
make bench BENCH_FIXTURE=fixtures/other.pl BENCH_GRID=<n>. Use --grid <n>,
--start-loc <loc>, --iterations <n>, --warmup <n>, and
--format text|csv|json to shape a direct run. With make bench, the matching
variables are BENCH_GRID, BENCH_START_LOC, BENCH_ITERATIONS,
BENCH_WARMUP, and BENCH_FORMAT. The default format is text.
Synthetic benchmark fixtures are included for deeper search workloads:
| fixture | words | grid |
|---|---|---|
fixtures/benchmark_08_words.pl |
8 | 13 |
fixtures/benchmark_14_words.pl |
14 | 17 |
fixtures/benchmark_16_dense_words.pl |
16 | 17 |
fixtures/benchmark_20_words.pl |
20 | 37 |
fixtures/benchmark_26_words.pl |
26 | 49 |
For example:
$ make bench BENCH_FIXTURE=fixtures/benchmark_20_words.pl BENCH_GRID=37
The dense 16-word fixture is intentionally much slower than the others; start with one measured iteration:
$ make bench BENCH_FIXTURE=fixtures/benchmark_16_dense_words.pl BENCH_GRID=17 BENCH_ITERATIONS=1 BENCH_WARMUP=0