Real-Time Search in 100ms Using Prefix Hashing
The expectation for modern applications is instantaneous feedback. Whether you’re typing a search query, a command in an IDE, or an address in a mapping application, users anticipate results appearing almost as fast as they can type. This isn’t just a nicety; it’s a fundamental part of a seamless user experience. Achieving real-time search, particularly with a sub-100ms latency target, presents a fascinating set of challenges that traditional search engine architectures often struggle with.
While comprehensive full-text search engines like Elasticsearch or Apache Solr excel at complex queries over vast datasets, their overhead—indexing latency, query parsing, scoring, and distributed nature—can make achieving strict 100ms end-to-end response times for simple prefix queries a stretch without significant optimization or specific design patterns. This is where specialized techniques come into play, and one powerful, often overlooked approach for prefix-based searches is prefix hashing.
The Quest for Ultra-Low Latency Search
Before we dive into prefix hashing, let’s understand why real-time search, especially at the sub-100ms threshold, is such a formidable challenge.
- Scale: Modern datasets can contain millions, billions, or even trillions of documents or terms. Searching through such volumes quickly is inherently difficult.
- Dynamic Data: Data is rarely static. New items are added, old ones removed, and existing ones updated, often continuously. The search index must reflect these changes rapidly.
- Latency Expectations: “Real-time” for a user typically means “perceptibly instant.” Cognitive science suggests that latencies above 100-200ms start to feel sluggish [1]. For interactive typing scenarios (autocomplete), anything above 50ms can be noticeable.
Traditional full-text search engines primarily rely on inverted indexes. An inverted index maps words (terms) to the documents containing them. While incredibly powerful for full-text search, retrieving all documents containing a specific prefix (e.g., “appl” for “apple”, “application”, “appliance”) requires iterating through all terms starting with that prefix or leveraging a data structure like a trie internally. For extremely high-volume, low-latency prefix lookups, we can optimize further.
Unpacking Prefix Hashing
Prefix hashing, at its core, is a technique optimized for prefix-based queries (like autocomplete or type-ahead suggestions). Instead of indexing full terms or relying solely on tree-like structures, it pre-computes and stores hashes for various prefixes of your searchable terms.
Imagine you have a dictionary of words: `apple`, `application`, `apply`, `apricot`, `banana`.
For a query “app”, a traditional system might:
- Traverse an inverted index.
- Traverse a trie to find all words under the “app” node.
With prefix hashing, the idea is to generate hashes for specific prefixes of all words in your vocabulary and store them in a hash map.
How it Works Conceptually:
- Term Decomposition: For each term, generate all its prefixes up to a certain maximum length (e.g., 1-gram, 2-gram, …, N-gram prefixes).
  - `apple` -> `a`, `ap`, `app`, `appl`, `apple`
  - `application` -> `a`, `ap`, `app`, `appl`, `appli`, …, `application`
- Prefix Hashing: For each generated prefix, compute a hash value (e.g., a 64-bit integer).
- Index Construction: Store these prefix hashes in a primary hash map where the key is the prefix hash and the value is a list of references (e.g., term IDs) to the actual full terms that contain this prefix.
  - `hash("app")` -> `[term_id_apple, term_id_application, term_id_apply]`
  - `hash("appl")` -> `[term_id_apple, term_id_application, term_id_apply]`
  - `hash("ap")` -> `[term_id_apple, term_id_application, term_id_apply, term_id_apricot]`
- Term Storage: The actual full terms and any associated data (e.g., frequency, metadata) are stored separately, usually in a contiguous array or another map where they can be quickly retrieved using their `term_id`.
When a user types “appl”:
- The system computes `hash("appl")`.
- It looks up this hash in the prefix hash map.
- It retrieves the list `[term_id_apple, term_id_application, term_id_apply]`.
- It fetches the full terms corresponding to these IDs: `apple`, `application`, `apply`.
- Crucially, it then performs a final string comparison on these retrieved terms (`apple.startsWith("appl")`, `application.startsWith("appl")`, and so on) to filter out any false positives due to hash collisions. While modern hash functions have low collision rates, they are not zero, and this step ensures accuracy.
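To make this concrete, here is a minimal, self-contained sketch of the flow above. It uses `std::hash` purely as a stand-in for the fast 64-bit hash a production index would use:

#include <cstddef>
#include <cstdint>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// Illustrative only: std::hash stands in for a fast 64-bit hash (e.g., xxHash).
int main() {
    std::vector<std::string> terms = {"apple", "application", "apply", "apricot", "banana"};
    std::unordered_map<uint64_t, std::vector<int>> prefix_index;

    // Index every prefix of every term.
    for (int id = 0; id < static_cast<int>(terms.size()); ++id) {
        for (std::size_t len = 1; len <= terms[id].size(); ++len) {
            uint64_t h = std::hash<std::string>{}(terms[id].substr(0, len));
            prefix_index[h].push_back(id);
        }
    }

    // Query: hash the prefix, look up candidates, then verify with a real
    // string comparison to discard any hash collisions.
    const std::string query = "appl";
    auto it = prefix_index.find(std::hash<std::string>{}(query));
    if (it != prefix_index.end()) {
        for (int id : it->second) {
            if (terms[id].compare(0, query.size(), query) == 0) {
                std::cout << terms[id] << '\n';  // apple, application, apply
            }
        }
    }
    return 0;
}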
Prefix Hashing vs. Tries vs. Inverted Indexes
- Tries (Prefix Trees): Tries are excellent for prefix matching. They structure data as a tree where each node represents a character, and paths from the root form words. They offer guaranteed prefix matches without collisions. However, for very large alphabets (e.g., Unicode characters) or deep words, tries can be memory-intensive due to the pointers/map at each node. Traversal depth can also be a factor, though generally fast.
- Inverted Indexes: Designed for full-text search where you need to find documents containing specific words or combinations of words, regardless of their position. They are highly flexible and support complex queries but are less optimized for the very specific pattern of “give me all words starting with X” in a sub-100ms window without a preceding trie or B-tree structure.
- Prefix Hashing: Trades memory for speed. It leverages the O(1) average-case lookup time of hash maps. The main overhead is the memory for storing hashes and lists of IDs, and the potential for hash collisions. It’s often more memory-efficient than a full trie for certain sparse datasets because it only stores specific prefixes, not every character path.
For ultra-low latency, the O(1) average-case lookup of a hash map (plus a small list iteration) often beats tree traversals, especially when cache locality is well-managed.
Core Concepts & Data Structures for 100ms Search
Achieving sub-100ms latency requires making almost everything an in-memory operation with minimal CPU cycles.
- The Primary Index: `std::unordered_map<uint64_t, std::vector<int>>` (or similar)
  - Key: `uint64_t` (64-bit integer hash of the prefix). Using 64-bit hashes significantly reduces collision probability compared to 32-bit.
  - Value: `std::vector<int>` (a dynamic array of `term_id`s). `int` (or `uint32_t`) is typically sufficient if your vocabulary has fewer than ~4 billion terms. `std::vector` is generally efficient for small to medium-sized lists, offering good cache locality.
  - Why `unordered_map` (Hash Map): Average O(1) lookup time, which is paramount for speed.
  - Note: For maximum performance and predictability, especially in C++, you might consider custom hash table implementations that are more cache-aware or use open addressing instead of chaining, though `std::unordered_map` is often sufficient.
- The Term Dictionary: `std::vector<std::string>` (or `std::string_view` for more efficiency)
  - This holds the actual full terms. When a `term_id` (an integer index) is returned from the prefix hash map, it directly corresponds to an index in this vector to retrieve the original term string.
  - `std::string_view` (C++17+) or similar slice/span types can be used to avoid copying strings if the original strings are stored contiguously in memory.
  - If you need to store more than just the term (e.g., frequency, relevance score, associated document ID), you’d use a `std::vector<TermRecord>` where `TermRecord` is a struct containing the string and other metadata.
- Prefix Length Strategy:
  - This is a critical design decision. What prefixes do you hash?
  - Fixed length: e.g., only 3-char prefixes. `appl` would not be hashed, only `app`. This limits search depth.
  - All prefixes up to N: e.g., for `application`, hash `a`, `ap`, `app`, `appl`, `appli`, `applic`, `applica`, `applicat`, `applicati`, `applicatio`, `application`. If `N` is 10, then `application` (length 11) would only have its first 10 prefixes hashed.
  - Typical Strategy: Hash all prefixes from length 1 up to a reasonable maximum (e.g., 10-15 characters). Longer prefixes are more specific, leading to fewer results and fewer collisions. Shorter prefixes (`a`, `b`) will have huge result lists, which can be problematic, but are necessary for early suggestions.
- Hashing Function:
  - A fast, high-quality non-cryptographic hash function is essential. Examples:
    - MurmurHash3: Widely used, fast, good distribution. MurmurHash on GitHub
    - xxHash: Extremely fast, good collision resistance. xxHash on GitHub
    - FNV-1a: Simpler, but typically slower and less collision-resistant than MurmurHash or xxHash for string hashing.
  - The hash function must be deterministic (same input always yields same output); a small xxHash-based sketch follows this list.
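As a small illustration, computing a prefix hash with the xxHash C API might look like the following (this assumes the xxHash library is installed and linked; `XXH64` is part of its public interface):

#include <cstdint>
#include <string>
#include <xxhash.h>  // xxHash library; see the xxHash GitHub repository

// Hash a prefix with XXH64. The seed must be identical at indexing time and
// at query time, otherwise lookups will never match.
inline uint64_t hash_prefix(const std::string& prefix, uint64_t seed = 0) {
    return XXH64(prefix.data(), prefix.size(), seed);
}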
Implementation Details & Optimizations
To genuinely hit the 100ms mark, every millisecond counts.
- Memory Layout & Cache Locality:
  - Keep related data together in memory. `std::vector` is good because its elements are contiguous.
  - When the prefix hash map returns `term_id`s, iterating through a `std::vector<int>` and then using those `int`s as indices into a `std::vector<std::string>` (or `TermRecord`s) means jumping around memory. This can be a cache miss generator.
  - Optimization: If the lists of `term_id`s for a hash are consistently very small, consider storing them directly in the hash map value using fixed-size arrays or small-object optimization (e.g., an `std::array<int, N>` or a custom small vector type), potentially spilling over to a `std::vector` for larger lists.
  - Tuning `std::unordered_map`: Its `load_factor` significantly impacts performance. A lower load factor (more empty buckets) reduces collisions and speeds up lookups but uses more memory (see the tuning sketch below).
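A minimal tuning sketch for the standard container; the reserved capacity is an assumed, application-specific estimate:

#include <cstdint>
#include <unordered_map>
#include <vector>

// Illustrative tuning of the prefix map: a lower max load factor trades memory
// for fewer bucket collisions, and reserving capacity up front avoids rehashing
// during the index build. The 70'000'000 figure is a made-up estimate
// (e.g., ~10M terms * ~7 indexed prefixes each).
void tune_index(std::unordered_map<uint64_t, std::vector<int>>& prefix_index) {
    prefix_index.max_load_factor(0.5f);  // default is 1.0
    prefix_index.reserve(70'000'000);    // expected number of prefix entries
}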
- Indexing Process (Offline vs. Online):
  - Initial Build (Offline): For a large vocabulary, the initial index build can take time (minutes to hours). This is typically done offline and then loaded into memory.
  - Incremental Updates (Online): For truly real-time updates (new terms added/removed), you need to manage the hash map and term dictionary dynamically.
    - Adding a term: Compute all its prefixes, hash them, and add the `term_id` to the corresponding `std::vector` in the hash map. Add the term to the `term_dictionary`.
    - Removing a term: This is harder. You’d need to iterate through all prefixes of the term and remove its `term_id` from the lists. Or, more simply, mark the term as “deleted” in your `TermRecord` and filter it out during search, performing a full cleanup periodically (garbage collection); see the lazy-deletion sketch below.
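A lazy-deletion sketch along those lines, using a hypothetical `TermRecord` with a `deleted` flag (this type is illustrative and separate from the `TermData` struct in the listing further below):

#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record with a tombstone flag, shown only to illustrate lazy removal.
struct TermRecord {
    std::string term;
    double score = 0.0;
    bool deleted = false;
};

// "Remove" a term by marking it. Search skips deleted entries, and a periodic
// offline rebuild (garbage collection) reclaims space and cleans the posting lists.
void remove_term(std::vector<TermRecord>& dictionary, std::size_t term_id) {
    if (term_id < dictionary.size()) {
        dictionary[term_id].deleted = true;
    }
}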
- Query Flow Refinements:
  - Minimum Prefix Length: Don’t allow queries shorter than, say, 2 or 3 characters, or results for “a” or “the” will be unmanageably large.
  - Result Limit: For autocomplete, users rarely need more than 5-10 suggestions. Limit the number of results returned by the search, which can prune the final filtering step.
  - Scoring/Relevance: The returned `term_id`s are just matches. To make them useful, you’ll need a scoring mechanism (e.g., term frequency, recency, popularity, user-specific history). Sort the filtered results by this score. This scoring can add a few milliseconds, so it must be lightweight (see the top-k sketch below).
  - Asynchronous Processing: Perform the search on a separate thread or use an async model to prevent blocking the UI thread. The 100ms budget includes the time from keypress to display.
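One way to keep that step cheap is to select only the top k candidates rather than fully sorting all of them. A sketch using `std::partial_sort`; the `ScoredTerm` alias is illustrative:

#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using ScoredTerm = std::pair<std::string, double>;  // (term, score), illustrative

// Keep only the k best-scoring candidates. std::partial_sort does roughly
// O(n log k) work, avoiding a full sort of results that will never be shown.
void keep_top_k(std::vector<ScoredTerm>& candidates, std::size_t k) {
    auto by_score_desc = [](const ScoredTerm& a, const ScoredTerm& b) {
        return a.second > b.second;
    };
    if (candidates.size() > k) {
        std::partial_sort(candidates.begin(), candidates.begin() + k,
                          candidates.end(), by_score_desc);
        candidates.resize(k);
    } else {
        std::sort(candidates.begin(), candidates.end(), by_score_desc);
    }
}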
- Optimizing for Collisions:
  - Even with a 64-bit hash function, collisions can occur, especially if you have billions of terms and are hashing short prefixes.
  - The final string comparison (the `startsWith` check) is crucial. Ensure this is highly optimized. If your terms are stored in a contiguous `char` array, this can be extremely fast, as sketched below.
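A minimal version of that check: a length test plus a single `memcmp` over the prefix bytes:

#include <cstring>
#include <string>

// Branch-light prefix check: a length test plus a single memcmp over the
// prefix bytes. If terms live in one contiguous char buffer, the same memcmp
// can be applied directly to (pointer, length) slices without std::string.
inline bool starts_with(const std::string& term, const std::string& prefix) {
    return term.size() >= prefix.size() &&
           std::memcmp(term.data(), prefix.data(), prefix.size()) == 0;
}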
Example Data Structures (Conceptual C++):
#include <unordered_map>
#include <vector>
#include <string>
#include <cstdint> // For uint64_t
#include <algorithm> // For std::sort, std::remove_if
// Assume a fast hash function exists, e.g., from the xxHash library.
// Forward declaration so the class below compiles; a placeholder definition
// is provided after the class.
uint64_t xxh64(const char* input, size_t length, uint64_t seed);
class PrefixSearchIndex {
public:
// Stores term strings and optional metadata (e.g., score)
struct TermData {
std::string term;
double score; // Example: frequency, popularity, etc.
// Add more metadata as needed
};
// Main index: prefix hash -> list of term IDs
std::unordered_map<uint64_t, std::vector<int>> prefix_hash_to_term_ids;
// Stores the actual term data, indexed by term_id
std::vector<TermData> term_dictionary;
int next_term_id = 0; // Simple auto-incrementing ID
// Add a term to the index
void add_term(const std::string& term_string, double score = 1.0) {
int current_term_id = next_term_id++;
term_dictionary.push_back({term_string, score});
// Generate and index all prefixes
for (size_t i = 1; i <= term_string.length() && i <= static_cast<size_t>(MAX_PREFIX_LENGTH); ++i) {
std::string prefix = term_string.substr(0, i);
uint64_t prefix_hash = xxh64(prefix.c_str(), prefix.length(), HASH_SEED);
prefix_hash_to_term_ids[prefix_hash].push_back(current_term_id);
}
}
// Search for terms matching a prefix
std::vector<TermData> search(const std::string& query_prefix, int max_results = 10) const {
if (query_prefix.empty() || query_prefix.length() > MAX_PREFIX_LENGTH) {
return {}; // Or handle short/long queries differently
}
uint64_t query_hash = xxh64(query_prefix.c_str(), query_prefix.length(), HASH_SEED);
// Fast lookup for the hash bucket
auto it = prefix_hash_to_term_ids.find(query_hash);
if (it == prefix_hash_to_term_ids.end()) {
return {}; // No matches for this hash
}
// Retrieve candidates and perform final string comparison to filter collisions
std::vector<TermData> results;
for (int term_id : it->second) {
if (term_id < term_dictionary.size()) { // Basic bounds check
const TermData& data = term_dictionary[term_id];
// Crucial: Actual string comparison for accuracy and collision handling
if (data.term.rfind(query_prefix, 0) == 0) { // Check if term starts with query_prefix
results.push_back(data);
}
}
}
// Sort results by score (descending)
std::sort(results.begin(), results.end(), [](const TermData& a, const TermData& b) {
return a.score > b.score;
});
// Limit results
if (results.size() > static_cast<size_t>(max_results)) {
results.resize(max_results);
}
return results;
}
private:
static constexpr int MAX_PREFIX_LENGTH = 15; // Max prefix length to index
static constexpr uint64_t HASH_SEED = 0; // Seed for the hash function
};
// Dummy xxHash64 implementation for compilation (replace with actual library)
uint64_t xxh64(const char* input, size_t length, uint64_t seed) {
// This is a placeholder. Use a real xxHash library for production.
uint64_t hash = seed;
for (size_t i = 0; i < length; ++i) {
hash = (hash << 5) + hash + input[i]; // Simple polynomial rolling hash for example
}
return hash;
}
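To tie the listing together, here is a minimal usage sketch, assuming it is compiled in the same translation unit as the class and the placeholder hash above:

#include <iostream>

// Minimal usage sketch of PrefixSearchIndex.
int main() {
    PrefixSearchIndex index;
    index.add_term("apple", 5.0);
    index.add_term("application", 3.0);
    index.add_term("apply", 2.0);
    index.add_term("apricot", 1.0);
    index.add_term("banana", 4.0);

    // Prints matching terms, highest score first: apple, application, apply.
    for (const auto& result : index.search("appl", 5)) {
        std::cout << result.term << " (score " << result.score << ")\n";
    }
    return 0;
}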
Achieving 100ms Latency: Realism and Constraints
The “100ms” target is highly ambitious and depends on several factors:
- Dataset Size:
- Millions of terms (e.g., 1-10 million): Highly achievable on a single modern server with sufficient RAM. The memory footprint might be in the single-digit GBs.
- Billions of terms: This becomes a distributed systems problem. While prefix hashing can be adapted (e.g., sharding the hash map across multiple machines), the inter-node communication and aggregation will likely push the latency beyond 100ms for a single query unless extremely optimized. For single-machine, it’s likely too much memory.
- Memory Budget: The entire index must reside in RAM. Storing all prefixes can be memory-intensive.
- Example: 10 million terms with an average of 7 indexed prefixes per term. Each entry costs roughly 8 bytes for the hash key plus 4 bytes per term_id, plus vector and bucket overhead, and the term strings themselves come on top: (10M terms * 7 prefixes/term) * (8 bytes hash + 4 bytes term_id + vector overhead) + (10M terms * avg_term_len). The first product alone is about 840 MB before container overhead, so a low single-digit-GB footprint is a realistic estimate. This adds up.
- Hardware: Fast CPUs, ample and fast RAM (DDR4/5), and potentially NVMe SSDs if the index needs to be loaded quickly from disk at startup.
- Concurrency: If many users are querying concurrently, the system needs to handle the load without degrading latency. This implies efficient multithreading or asynchronous I/O (though for in-memory, mostly CPU-bound).
- Simplicity of Query: This technique is for simple prefix matching. Adding fuzzy matching, complex boolean logic, or full-text relevance scoring dramatically increases complexity and likely latency.
Note: The 100ms latency goal is primarily for the query execution time itself – from receiving the query to returning the results. It typically does not include network latency, client-side rendering, or complex data manipulation after the initial search.
Trade-offs and Limitations
No solution is perfect. Prefix hashing has its own set of compromises:
- Memory Footprint: Indexing all prefixes can consume significant RAM, especially for a large vocabulary with long average term lengths. This might limit the scale achievable on a single machine.
- Indexing Time: While search is fast, building the initial index can be time-consuming, requiring pre-computation or an efficient pipeline for continuous updates.
- Collision Handling: While rare with good hash functions, collisions mean more data is retrieved than strictly necessary, requiring an extra filtering step. This adds a small, but measurable, overhead.
- Limited Query Types: Excellent for prefix matching. Poor for substring search (“find ‘apple’ anywhere in the word”), fuzzy matching (“aple”), or complex keyword combinations.
- Relevance Scoring Complexity: While you can attach scores to terms, implementing sophisticated relevance models (e.g., TF-IDF, BM25) directly within this simple structure is challenging. You’d typically only score terms, not documents containing them.
Use Cases
Prefix hashing shines in scenarios where:
- Autocomplete/Type-ahead Suggestions: The canonical use case. As a user types, provide instant suggestions from a predefined vocabulary (e.g., product names, user names, commands).
- Command Palettes: In IDEs (like VS Code), editors, or applications, where typing part of a command brings up a list of matching actions.
- Dictionary Lookups: Fast lookup for definitions or translations based on a typed prefix.
- URL/File Path Auto-completion: In browsers or file explorers.
Conclusion
Achieving real-time search in 100ms using prefix hashing is a testament to the power of specialized data structures and meticulous optimization. By pre-computing and indexing hashes of prefixes, leveraging the O(1) average-case lookup of hash maps, and keeping all operations strictly in-memory, it’s possible to deliver a blazing-fast user experience for prefix-based queries.
However, it’s crucial to understand its limitations. Prefix hashing is not a general-purpose search engine replacement. It’s a targeted solution for a specific problem: providing instant, accurate suggestions from a defined vocabulary. When this specific need aligns with your application’s requirements, prefix hashing offers an elegant and incredibly performant path to truly real-time search.
References
- MurmurHash3: GitHub Repository
- xxHash: GitHub Repository
- C++ `std::unordered_map` documentation: cppreference.com
- Trie (Prefix Tree) Overview: GeeksForGeeks Article (for comparison of data structures)
- Inverted Index Overview: Wikipedia (For comparison of search approaches)
- [1] Nielsen, Jakob. “Response Times: The 3 Important Limits.” Nielsen Norman Group, 1993. Link to Article (a foundational text on user experience and latency perception).