Ukkonen's algorithm constructs a suffix tree in linear time by scanning the
string left to right and updating the tree after each new character.
It keeps an active point (active node, active edge, active length),
a remainder of suffixes still to add, and uses suffix links
to move between related internal nodes efficiently.
The active point is the cursor for the current extension: the active node is where the
search starts, the active edge is the next outgoing edge being matched, and the active
length is how many characters are already matched on that edge. This compactly represents
a position inside the tree without rebuilding paths, allowing the algorithm to continue
matching or splitting from the exact in-progress location.
The remainder is the count of suffixes that still need to be added for the current phase.
It shrinks as extensions are resolved and grows when a new character is read, letting the
algorithm defer work and avoid rescanning the string from scratch.
For each phase, extensions follow three rules:
-
If the next character already exists on the active edge, the algorithm
increments active length and stops.
-
If there is no matching edge from the active node, it creates a new leaf.
-
If a mismatch happens on an edge, it splits the edge, creates a new internal node,
then adds the new leaf and suffix link.
The result is a compact tree that represents every suffix of the input string, built in
linear time, which makes it suitable for long inputs and interactive visualization.
Original paper