Hybrid Fusion: Score Integration Layer
The defining strength of the system lies in its hybrid score fusion mechanism. When a query is received, lexical and semantic engines operate concurrently. Their outputs are merged through configurable score-combination strategies.
One such method is Reciprocal Rank Fusion (RRF):
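In the standard formulation (with a smoothing constant k, commonly set to 60), each document's fused score is the sum of its reciprocal ranks across the result lists:

    RRF(d) = sum_i 1 / (k + rank_i(d))

where rank_i(d) is the rank of document d in the i-th engine's result list; documents absent from a list contribute nothing for that list.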
Alternatively, a weighted linear combination can be used:
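In this scheme the fused score is a convex combination of the normalized per-engine scores (the weight symbols below are illustrative):

    score(d) = w_lex * s_lex(d) + w_sem * s_sem(d),  with w_lex + w_sem = 1

where s_lex(d) and s_sem(d) are the normalized lexical and semantic scores for document d.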
The weights are domain-tunable. In legal search, lexical weighting may dominate to preserve citation integrity, while semantic weighting supplements interpretive depth.
Additionally, frequency-based precedence rules ensure that strong exact matches are never overshadowed by semantic approximations. This prevents a common failure mode in which embedding-based retrieval surfaces paraphrases ahead of the exact statutory language.
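The weighted combination with exact-match precedence can be sketched as follows. Note that fuse_scores, the default weights, and the exact_boost constant are illustrative assumptions, not the system's actual implementation:

```python
def fuse_scores(lexical, semantic, w_lex=0.7, w_sem=0.3, exact_boost=1.0):
    """Weighted linear fusion with exact-match precedence (illustrative sketch).

    lexical / semantic map doc_id -> normalized score in [0, 1].
    Documents with a perfect lexical score receive an additive boost so
    that semantic approximations can never outrank an exact match.
    """
    fused = {}
    for doc_id in set(lexical) | set(semantic):
        score = w_lex * lexical.get(doc_id, 0.0) + w_sem * semantic.get(doc_id, 0.0)
        if lexical.get(doc_id, 0.0) >= 1.0:  # strong exact match: apply precedence
            score += exact_boost
        fused[doc_id] = score
    # Highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

With w_lex = 0.7, an exact lexical match (score 1.0) lands at 1.7 after the boost, above the maximum 1.0 that any purely semantic hit can reach.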
Example in Practice: Merging Search Results
# From HybridIndex logic
def merge_results(hot_results, cold_results):
    merged = {}
    all_terms = set(hot_results.keys()) | set(cold_results.keys())
    for term in all_terms:
        doc_positions = {}
        # Integrate hot index results (in-memory)
        for doc_id, positions in hot_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)
        # Integrate cold index results (on-disk B-Tree)
        for doc_id, positions in cold_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)
        # Deduplicate and order the merged positions
        for doc_id in doc_positions:
            doc_positions[doc_id] = sorted(set(doc_positions[doc_id]))
        merged[term] = sorted(doc_positions.items(), key=lambda x: x[0])
    return merged
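A small worked example of the merge. The terms, document IDs, and positions are invented, and the function is repeated inside the snippet so it runs standalone:

```python
def merge_results(hot_results, cold_results):
    # Repeated from above so this snippet is self-contained.
    merged = {}
    for term in set(hot_results.keys()) | set(cold_results.keys()):
        doc_positions = {}
        for doc_id, positions in hot_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)
        for doc_id, positions in cold_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)
        for doc_id in doc_positions:
            doc_positions[doc_id] = sorted(set(doc_positions[doc_id]))
        merged[term] = sorted(doc_positions.items(), key=lambda x: x[0])
    return merged

hot = {"tax": [(2, [1]), (1, [3, 7])]}          # in-memory postings
cold = {"tax": [(1, [7, 9])], "levy": [(3, [0])]}  # on-disk postings
merged = merge_results(hot, cold)
# merged["tax"] == [(1, [3, 7, 9]), (2, [1])] -- position 7 deduplicated
# merged["levy"] == [(3, [0])]
```

Overlapping postings for the same document (doc 1's position 7 appears in both indexes) collapse into a single sorted position list, and documents are returned in doc_id order per term.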
Performance Characteristics
Performance optimization is embedded at every architectural layer:
- In-memory indexing for high-frequency access
- B-Tree disk persistence for scale
- ANN-based vector search for sub-linear retrieval
- Parallel ingestion using thread pools
- Efficient binary serialization using msgpack
- CUDA acceleration for embeddings
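The parallel-ingestion point can be sketched with Python's standard thread pool; embed_document, the document shape, and the worker count here are assumptions for illustration, not the system's API:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_document(doc):
    # Stand-in for a real (I/O- or GPU-bound) embedding call.
    return (doc["id"], [float(len(doc["text"]))])

def ingest_parallel(docs, max_workers=8):
    """Embed documents concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_document, docs))
```

A thread pool suits this workload because embedding calls are typically I/O- or GPU-bound, so threads overlap the waiting rather than contending for the GIL.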
Target latencies under optimized configuration are:
- Lexical Search: < 20ms
- Semantic Search: < 80ms
- Hybrid Search: < 120ms