Introduction To Trie - The Prefix Tree You Should Know

All Notes

Code

Code – Trie

Have you ever wondered how your phone suggests words as you type, or how a search engine instantly completes your query? Behind many such feats of efficiency lies a powerful, yet often overlooked, data structure: the Trie.

Sometimes called a “prefix tree” (which is a much more descriptive name, honestly), a Trie is a specialized tree-like data structure used to store a dynamic set or associative array where the keys are usually strings. Unlike a binary search tree, where nodes store the actual key, in a Trie, keys are not stored explicitly in nodes. Instead, their position in the tree defines the key. Let’s dig in.

What is a Trie? The Core Concept

Imagine a dictionary where every letter in a word is a step along a path. If multiple words share a common prefix, they share that initial path. That’s essentially a Trie.

At its core, a Trie is composed of nodes. Each node typically represents a single character.

Root Node: The very first node in a Trie is an empty node. It doesn’t represent any character but serves as the starting point for all words.
Edges: Each edge from a node to its child represents a character.
Children: A node can have multiple children, each corresponding to a different character. This is typically implemented as a dictionary or hash map where keys are characters and values are child nodes.
is_end_of_word Flag: To indicate that a path from the root to a certain node forms a complete, valid word, that node has a special flag (e.g., a boolean is_end_of_word). Without this flag, you wouldn’t know if “cat” is a word if “caterpillar” is also in the Trie.

Let’s visualize. If you insert “apple”, “apply”, and “apricot”:

(root)
  | 'a'
  (node for 'a')
    | 'p'
    (node for 'p')
      | 'p'
      (node for 'p') <-- (This node has children for 'l' and 'r')
        | 'l' ---------------------- 'r'
        (node for 'l')              (node for 'r')
          | 'y'                       | 'i'
          (node for 'y')              (node for 'i')
          (is_end_of_word=True for "apply")  (node for 'c')
            | 'e'                       | 'o'
            (node for 'e')              (node for 't')
            (is_end_of_word=True for "apple") (is_end_of_word=True for "apricot")

See how “apple” and “apply” share the “appl” prefix? And “apple”, “apply”, “apricot” all share “ap”? This shared prefix is where Tries get their efficiency and space savings (for certain datasets).

Implementing a Trie in Python

To implement a Trie, we first need a TrieNode class, and then a Trie class to manage operations on these nodes.

1. `TrieNode` Class

A TrieNode will hold:

children: A dictionary mapping characters to TrieNode objects.
is_end_of_word: A boolean indicating if this node marks the end of a valid word.

# trie_node.py
class TrieNode:
    def __init__(self):
        # A dictionary to store children nodes.
        # Key: character, Value: TrieNode object
        self.children = {}
        # Boolean flag to mark if this node represents the end of a word.
        self.is_end_of_word = False

    def __repr__(self):
        # For debugging: show children and end-of-word status
        children_chars = list(self.children.keys())
        return f"TrieNode(children={children_chars}, is_end_of_word={self.is_end_of_word})"

# Example usage (for demonstration, not part of the main Trie class)
# root = TrieNode()
# root.children['a'] = TrieNode()
# root.children['a'].children['p'] = TrieNode()
# root.children['a'].children['p'].is_end_of_word = True # This would be word "ap"

2. `Trie` Class Operations

The Trie class will encapsulate the root node and provide methods for insert, search, and starts_with.

Insertion (`insert`)

To insert a word:

Start at the root node.
For each character in the word: a. Check if the character already exists as a child of the current node. b. If not, create a new TrieNode for it and add it to the children dictionary. c. Move to that child node.
Once all characters are processed, mark the final node as is_end_of_word = True.

# trie_implementation.py
from trie_node import TrieNode # Assuming TrieNode is in trie_node.py

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        """
        Inserts a word into the Trie.
        """
        current_node = self.root
        for char in word:
            if char not in current_node.children:
                current_node.children[char] = TrieNode()
            current_node = current_node.children[char]
        current_node.is_end_of_word = True
        print(f"'{word}' inserted.")

# --- Example Usage for Insertion ---
if __name__ == "__main__":
    my_trie = Trie()
    print("--- Inserting Words ---")
    my_trie.insert("apple")
    my_trie.insert("apricot")
    my_trie.insert("apply")
    my_trie.insert("app")
    my_trie.insert("banana")

--- Inserting Words ---
'apple' inserted.
'apricot' inserted.
'apply' inserted.
'app' inserted.
'banana' inserted.

Search (`search`)

To search for a complete word:

Start at the root node.
For each character in the word: a. Check if the character exists as a child of the current node. b. If not, the word is not in the Trie; return False. c. If yes, move to that child node.
After processing all characters, check if the is_end_of_word flag of the final node is True. If it is, the word exists; return True. Otherwise, it’s just a prefix of another word, not a full word itself; return False.

# trie_implementation.py (continuation)

    def search(self, word: str) -> bool:
        """
        Searches for a complete word in the Trie.
        Returns True if the word is found, False otherwise.
        """
        current_node = self.root
        for char in word:
            if char not in current_node.children:
                return False # Character not found, so word doesn't exist
            current_node = current_node.children[char]
        return current_node.is_end_of_word # Check if it's marked as a complete word

# --- Example Usage for Search ---
if __name__ == "__main__":
    my_trie = Trie()
    my_trie.insert("apple")
    my_trie.insert("apricot")
    my_trie.insert("apply")
    my_trie.insert("app")
    my_trie.insert("banana")

    print("\n--- Searching for Words ---")
    print(f"Is 'apple' in Trie? {my_trie.search('apple')}")        # Expected: True
    print(f"Is 'app' in Trie? {my_trie.search('app')}")          # Expected: True
    print(f"Is 'appl' in Trie? {my_trie.search('appl')}")         # Expected: False (is a prefix, but not a full word)
    print(f"Is 'ap' in Trie? {my_trie.search('ap')}")            # Expected: False
    print(f"Is 'ban' in Trie? {my_trie.search('ban')}")           # Expected: False
    print(f"Is 'banana' in Trie? {my_trie.search('banana')}")     # Expected: True
    print(f"Is 'orange' in Trie? {my_trie.search('orange')}")     # Expected: False

--- Inserting Words ---
'apple' inserted.
'apricot' inserted.
'apply' inserted.
'app' inserted.
'banana' inserted.

--- Searching for Words ---
Is 'apple' in Trie? True
Is 'app' in Trie? True
Is 'appl' in Trie? False
Is 'ap' in Trie? False
Is 'ban' in Trie? False
Is 'banana' in Trie? True
Is 'orange' in Trie? False

Prefix Search (`starts_with`)

To check if any word in the Trie starts with a given prefix:

Start at the root node.
For each character in the prefix: a. Check if the character exists as a child of the current node. b. If not, no word starts with this prefix; return False. c. If yes, move to that child node.
If you successfully traverse all characters of the prefix, it means that prefix exists in the Trie (either as a full word or as a prefix of other words); return True. No need to check is_end_of_word here.

# trie_implementation.py (continuation)

    def starts_with(self, prefix: str) -> bool:
        """
        Checks if any word in the Trie starts with the given prefix.
        Returns True if a prefix is found, False otherwise.
        """
        current_node = self.root
        for char in prefix:
            if char not in current_node.children:
                return False # Prefix not found
            current_node = current_node.children[char]
        return True # Successfully traversed the entire prefix

# --- Example Usage for Prefix Search ---
if __name__ == "__main__":
    my_trie = Trie()
    my_trie.insert("apple")
    my_trie.insert("apricot")
    my_trie.insert("apply")
    my_trie.insert("app")
    my_trie.insert("banana")

    print("\n--- Checking Prefixes ---")
    print(f"Does any word start with 'app'? {my_trie.starts_with('app')}")   # Expected: True
    print(f"Does any word start with 'ap'? {my_trie.starts_with('ap')}")     # Expected: True
    print(f"Does any word start with 'ban'? {my_trie.starts_with('ban')}")    # Expected: True
    print(f"Does any word start with 'appl'? {my_trie.starts_with('appl')}")  # Expected: True (apple, apply)
    print(f"Does any word start with 'ora'? {my_trie.starts_with('ora')}")    # Expected: False
    print(f"Does any word start with 'z'? {my_trie.starts_with('z')}")      # Expected: False

--- Inserting Words ---
'apple' inserted.
'apricot' inserted.
'apply' inserted.
'app' inserted.
'banana' inserted.

--- Checking Prefixes ---
Does any word start with 'app'? True
Does any word start with 'ap'? True
Does any word start with 'ban'? True
Does any word start with 'appl'? True
Does any word start with 'ora'? False
Does any word start with 'z'? False

Collecting All Words with a Given Prefix (Autocomplete Helper)

This is a very common and practical use case for Tries, enabling features like autocomplete. To implement this, we first find the node corresponding to the end of the prefix. Then, from that node, we perform a Depth-First Search (DFS) or Breadth-First Search (BFS) to collect all words by traversing down all possible paths, adding words to a list when is_end_of_word is true.

We’ll add a helper method (_collect_all_words) to our Trie class.

# trie_implementation.py (continuation)

    def _collect_all_words(self, node: TrieNode, current_prefix: str, words: list):
        """
        Helper function to recursively collect all words from a given node.
        """
        if node.is_end_of_word:
            words.append(current_prefix)

        for char, child_node in node.children.items():
            self._collect_all_words(child_node, current_prefix + char, words)

    def get_words_with_prefix(self, prefix: str) -> list[str]:
        """
        Finds all words in the Trie that start with the given prefix.
        Returns a list of matching words.
        """
        current_node = self.root
        # Traverse to the end of the prefix
        for char in prefix:
            if char not in current_node.children:
                return [] # No words start with this prefix
            current_node = current_node.children[char]

        # Once at the prefix's end node, collect all words from there
        words = []
        self._collect_all_words(current_node, prefix, words)
        return words

# --- Example Usage for Collecting Words with Prefix ---
if __name__ == "__main__":
    my_trie = Trie()
    my_trie.insert("apple")
    my_trie.insert("apricot")
    my_trie.insert("apply")
    my_trie.insert("app")
    my_trie.insert("banana")
    my_trie.insert("band")
    my_trie.insert("bandage")
    my_trie.insert("aptitude")

    print("\n--- Getting Words by Prefix ---")
    print(f"Words starting with 'ap': {my_trie.get_words_with_prefix('ap')}")
    print(f"Words starting with 'app': {my_trie.get_words_with_prefix('app')}")
    print(f"Words starting with 'appl': {my_trie.get_words_with_prefix('appl')}")
    print(f"Words starting with 'bana': {my_trie.get_words_with_prefix('bana')}")
    print(f"Words starting with 'ban': {my_trie.get_words_with_prefix('ban')}")
    print(f"Words starting with 'test': {my_trie.get_words_with_prefix('test')}")
    print(f"Words starting with '': {my_trie.get_words_with_prefix('')}") # All words

--- Inserting Words ---
'apple' inserted.
'apricot' inserted.
'apply' inserted.
'app' inserted.
'banana' inserted.
'band' inserted.
'bandage' inserted.
'aptitude' inserted.

--- Getting Words by Prefix ---
Words starting with 'ap': ['apple', 'apply', 'app', 'apricot', 'aptitude']
Words starting with 'app': ['apple', 'apply', 'app']
Words starting with 'appl': ['apple', 'apply']
Words starting with 'bana': ['banana']
Words starting with 'ban': ['banana', 'band', 'bandage']
Words starting with 'test': []
Words starting with '': ['apple', 'apply', 'app', 'apricot', 'aptitude', 'banana', 'band', 'bandage']

When to Use a Trie (Pros & Cons)

Tries aren’t a one-size-fits-all solution, but they excel in specific scenarios.

Advantages of Tries:

Fast Prefix-Based Operations: The primary strength. Searching for words, checking prefixes, or finding all words with a given prefix can be done in O(L) time, where L is the length of the word/prefix. This is often faster than hashing (which can have O(L) on average for string keys due to hash calculation, but O(L * N) in worst-case collisions for N words) or balanced trees (which are O(L * log N)).
Space Efficiency for Common Prefixes: When many words share prefixes (e.g., “apple”, “apply”, “applet”), the common parts are stored only once, potentially saving space compared to storing each word individually.
Alphabetical Ordering (Implicit): If children are stored in a sorted manner (e.g., a sorted list of (char, node) tuples, though a dictionary/hash map is more common for lookup speed), you can easily retrieve words in alphabetical order. This is a natural property of the tree structure.
No Collisions: Unlike hash tables, Tries inherently avoid hash collisions, which can degrade performance.

Disadvantages of Tries:

Space Complexity: For datasets with very diverse words and few shared prefixes (e.g., random strings), a Trie can consume significantly more memory than a hash set. Each node requires memory for its children dictionary/map and the is_end_of_word flag.
Memory Overhead per Node: The overhead of storing pointers/references (or dictionaries) for potentially 26 (or more, for Unicode) children at each node can be substantial if nodes are sparse.
Deletion Complexity: Deleting words from a Trie is not as straightforward as insertion or search. If a node is part of multiple words (as a prefix for some), it cannot simply be removed. It often requires reference counting or careful recursive checks to ensure you don’t break other words.
- Note: For this introductory post, we skipped deletion to keep things focused on the core strengths and prevent over-complicating the example. Be aware it’s a non-trivial operation!

Common Use Cases

Tries shine in applications where string prefixes are fundamental:

Autocomplete Systems: Type “app”, and it suggests “apple”, “apply”, “appetizer”.
Spell Checkers and Dictionaries: Efficiently check if a word exists and suggest corrections (e.g., finding words with minimal edit distance from a misspelled word).
IP Routing: Routers use Tries to match IP addresses with their longest prefix in the routing table, determining the next hop for network packets.
T9 Predictive Text: Similar to autocomplete, but mapping numbers to potential words.
Browser History Search: Quickly finding previously visited URLs based on a typed prefix.
Bioinformatics: Storing DNA or RNA sequences for efficient pattern matching.

Conclusion

The Trie, or prefix tree, is an elegant and highly efficient data structure for handling string-related problems, especially those involving prefixes. While it might consume more memory in certain scenarios, its speed for operations like searching and prefix-matching makes it invaluable in applications where quick lookups and suggestions are critical.

Understanding Tries expands your algorithmic toolkit significantly, enabling you to build more performant and responsive string-based features. So, the next time you see an autocomplete dropdown, you’ll know a little bit more about the magic happening behind the scenes!

Programming Myths That Waste Your Time

Jun 17, 2025

How Wave Mechanics Informs Optical Computing

Jun 17, 2025

How to Think Like a Physicist When Debugging Systems

Jun 17, 2025

Understanding and Implementing Circular Linked Lists

Jun 18, 2025

Temporal Logic and Why Reactive Systems Can Be So Hard

Jun 17, 2025

OpenAI’s $10B Valuation Spike Why Investors Bet Big on LLMs

Jun 27, 2025