Hybrid Fusion: Score Integration Layer

The defining strength of the system lies in its hybrid score fusion mechanism. When a query is received, lexical and semantic engines operate concurrently. Their outputs are merged through configurable score-combination strategies.

One such method is Reciprocal Rank Fusion (RRF):

RRF(d) = \sum_{r \in R} \frac{1}{k + \text{rank}_r(d)}
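A minimal sketch of RRF over two ranked lists; the k = 60 default and the example lists are illustrative, not taken from the system itself:

```python
def rrf_scores(rankings, k=60):
    """Reciprocal Rank Fusion: sum 1/(k + rank) over each ranked list."""
    scores = {}
    for ranked_docs in rankings:
        for rank, doc_id in enumerate(ranked_docs, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Illustrative ranked lists from the lexical and semantic engines
lexical = ["d1", "d2", "d3"]
semantic = ["d1", "d4", "d2"]
fused = rrf_scores([lexical, semantic])
```

Because d1 is ranked first by both engines, it receives 1/(60+1) twice and tops the fused list.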

Alternatively, a weighted linear combination can be used:

Score_{final} = w_l \cdot Score_{lexical} + w_s \cdot Score_{semantic}

The weights are domain-tunable. In legal search, lexical weighting may dominate to preserve citation integrity, while semantic weighting supplements interpretive depth.
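The weighted combination can be sketched as follows, assuming both engines emit scores already normalized to a common range; the 0.7/0.3 split is illustrative of a lexically dominated legal-search configuration, not a value from the source:

```python
def weighted_fusion(lexical_scores, semantic_scores, w_l=0.7, w_s=0.3):
    """Linear score combination; weights are domain-tuned in practice."""
    docs = set(lexical_scores) | set(semantic_scores)
    return {
        d: w_l * lexical_scores.get(d, 0.0) + w_s * semantic_scores.get(d, 0.0)
        for d in docs
    }

# Documents missing from one engine simply contribute a zero term
scores = weighted_fusion({"d1": 0.9, "d2": 0.4}, {"d2": 0.8, "d3": 0.6})
```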

Additionally, frequency-based precedence rules ensure that strong exact matches are never overshadowed by semantic approximations. This prevents the common failure mode where embeddings distort statutory precision.
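One way such a precedence rule could be realized is a two-level sort that keeps exact lexical matches ahead of semantic-only candidates; the exact_matches set is a hypothetical input, not part of the documented API:

```python
def with_exact_match_precedence(fused_scores, exact_matches):
    """Rank exact lexical matches before semantic-only approximations;
    within each group, order by fused score (descending).
    exact_matches is a hypothetical set of doc IDs with strong exact hits."""
    return sorted(
        fused_scores.items(),
        # False sorts before True, so exact matches come first
        key=lambda kv: (kv[0] not in exact_matches, -kv[1]),
    )

ranked = with_exact_match_precedence({"d1": 0.5, "d2": 0.9}, exact_matches={"d1"})
```

Here d1 ranks first despite its lower fused score, because its exact match takes precedence.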

Example in Practice: Merging Search Results

# From HybridIndex logic
def merge_results(hot_results, cold_results):
    merged = {}
    all_terms = set(hot_results.keys()) | set(cold_results.keys())

    for term in all_terms:
        doc_positions = {}

        # Integrate hot index results (in-memory)
        for doc_id, positions in hot_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)

        # Integrate cold index results (on-disk B-Tree)
        for doc_id, positions in cold_results.get(term, []):
            doc_positions.setdefault(doc_id, []).extend(positions)

        # Deduplicate and sort positions per document
        for doc_id in doc_positions:
            doc_positions[doc_id] = sorted(set(doc_positions[doc_id]))

        merged[term] = sorted(doc_positions.items(), key=lambda x: x[0])

    return merged

Performance Characteristics

Performance optimization is embedded at every architectural layer:

  • In-memory indexing for high-frequency access
  • B-Tree disk persistence for scale
  • ANN-based vector search for sub-linear retrieval
  • Parallel ingestion using thread pools
  • Efficient binary serialization using msgpack
  • CUDA acceleration for embeddings
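The parallel-ingestion point above can be sketched with Python's standard thread pool; ingest_document is a placeholder for the real tokenize-and-index step, not the system's actual API:

```python
from concurrent.futures import ThreadPoolExecutor

def ingest_document(doc):
    # Placeholder for tokenization + index insertion
    return (doc["id"], len(doc["text"].split()))

docs = [{"id": i, "text": "sample legal text"} for i in range(8)]

# pool.map preserves input order while ingesting concurrently
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(ingest_document, docs))
```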

Target latencies under optimized configuration are:

  • Lexical Search: < 20ms
  • Semantic Search: < 80ms
  • Hybrid Search: < 120ms
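Latency targets like these can be checked with a simple median-latency harness; search_fn is a stand-in for the engine's real entry point:

```python
import time

def measure_latency_ms(search_fn, query, runs=100):
    """Median wall-clock latency in milliseconds for a search callable.
    search_fn and query are placeholders, not the system's actual API."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        search_fn(query)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return samples[len(samples) // 2]

# Example with a trivial stand-in search function
latency = measure_latency_ms(lambda q: q.lower(), "statute of limitations")
```

The median is preferable to the mean here because a few cold-cache outliers would otherwise dominate the figure.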