Hybrid Fusion: Score Integration Layer
The defining strength of the system lies in its hybrid score fusion mechanism. When a query is received, lexical and semantic engines operate concurrently. Their outputs are merged through configurable score-combination strategies.
One such method is Reciprocal Rank Fusion (RRF):
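In the standard formulation (with a smoothing constant k, commonly set to 60), each document's fused score is the sum of its reciprocal ranks across the result lists:

    RRF(d) = sum_i 1 / (k + rank_i(d))

where rank_i(d) is the rank of document d in the i-th engine's result list; documents absent from a list contribute nothing for that list.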
Alternatively, a weighted linear combination can be used:
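In this scheme the fused score is a convex combination of the normalized per-engine scores (the weight symbols below are illustrative):

    score(d) = w_lex * s_lex(d) + w_sem * s_sem(d),  with w_lex + w_sem = 1

where s_lex(d) and s_sem(d) are the normalized lexical and semantic scores for document d.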
The weights are domain-tunable. In legal search, lexical weighting may dominate to preserve citation integrity, while semantic weighting supplements interpretive depth.
Additionally, frequency-based precedence rules ensure that strong exact matches are never overshadowed by semantic approximations. This prevents a common failure mode in which embedding-based retrieval surfaces paraphrases ahead of the exact statutory language.
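The weighted combination with exact-match precedence can be sketched as follows. Note that fuse_scores, the default weights, and the exact_boost constant are illustrative assumptions, not the system's actual implementation:

```python
def fuse_scores(lexical, semantic, w_lex=0.7, w_sem=0.3, exact_boost=1.0):
    """Weighted linear fusion with exact-match precedence (illustrative sketch).

    lexical / semantic map doc_id -> normalized score in [0, 1].
    Documents with a perfect lexical score receive an additive boost so
    that semantic approximations can never outrank an exact match.
    """
    fused = {}
    for doc_id in set(lexical) | set(semantic):
        score = w_lex * lexical.get(doc_id, 0.0) + w_sem * semantic.get(doc_id, 0.0)
        if lexical.get(doc_id, 0.0) >= 1.0:  # strong exact match: apply precedence
            score += exact_boost
        fused[doc_id] = score
    # Highest fused score first
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)
```

With w_lex = 0.7, an exact lexical match (score 1.0) lands at 1.7 after the boost, above the maximum 1.0 that any purely semantic hit can reach.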
Example in Practice: Merging Search Results
# From HybridIndex logic
def merge_results(hot_results, cold_results):
    merged = {}
    all_terms = set(hot_results.keys()) | set(cold_results.keys())
    for term in all_terms:
        doc_positions = {}
        # Integrate hot index results (in-memory)
        for doc_id, positions in hot_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)
        # Integrate cold index results (on-disk B-Tree)
        for doc_id, positions in cold_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)
        # Deduplicate and order the merged positions
        for doc_id in doc_positions:
            doc_positions[doc_id] = sorted(set(doc_positions[doc_id]))
        merged[term] = sorted(doc_positions.items(), key=lambda x: x[0])
    return merged
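A small worked example of the merge. The terms, document IDs, and positions are invented, and the function is repeated inside the snippet so it runs standalone:

```python
def merge_results(hot_results, cold_results):
    # Repeated from above so this snippet is self-contained.
    merged = {}
    for term in set(hot_results.keys()) | set(cold_results.keys()):
        doc_positions = {}
        for doc_id, positions in hot_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)
        for doc_id, positions in cold_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)
        for doc_id in doc_positions:
            doc_positions[doc_id] = sorted(set(doc_positions[doc_id]))
        merged[term] = sorted(doc_positions.items(), key=lambda x: x[0])
    return merged

hot = {"tax": [(2, [1]), (1, [3, 7])]}          # in-memory postings
cold = {"tax": [(1, [7, 9])], "levy": [(3, [0])]}  # on-disk postings
merged = merge_results(hot, cold)
# merged["tax"] == [(1, [3, 7, 9]), (2, [1])] -- position 7 deduplicated
# merged["levy"] == [(3, [0])]
```

Overlapping postings for the same document (doc 1's position 7 appears in both indexes) collapse into a single sorted position list, and documents are returned in doc_id order per term.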
Performance Characteristics
Performance optimization is embedded at every architectural layer:
- In-memory indexing for high-frequency access
- B-Tree disk persistence for scale
- ANN-based vector search for sub-linear retrieval
- Parallel ingestion using thread pools
- Efficient binary serialization using msgpack
- CUDA acceleration for embeddings
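The parallel-ingestion point can be sketched with Python's standard thread pool; embed_document, the document shape, and the worker count here are assumptions for illustration, not the system's API:

```python
from concurrent.futures import ThreadPoolExecutor

def embed_document(doc):
    # Stand-in for a real (I/O- or GPU-bound) embedding call.
    return (doc["id"], [float(len(doc["text"]))])

def ingest_parallel(docs, max_workers=8):
    """Embed documents concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(embed_document, docs))
```

A thread pool suits this workload because embedding calls are typically I/O- or GPU-bound, so threads overlap the waiting rather than contending for the GIL.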
Target latencies under optimized configuration are:
- Lexical Search: < 20ms
- Semantic Search: < 80ms
- Hybrid Search: < 120ms