📄️ Lexical Engine
The lexical engine is built on an inverted index backed by a hot/cold storage architecture. During indexing, documents are tokenized into lexemes, Unicode-normalized, and passed through language-aware stemming pipelines that handle both Devanagari and Latin scripts.
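The indexing steps described above can be illustrated with a minimal sketch. It is not the engine's actual code: the function names `tokenize` and `build_index` are hypothetical, NFC is assumed as the normalization form, and the stemming and hot/cold storage layers are left out entirely. Tokens are built from letters, combining marks, and digits so that Devanagari vowel signs stay attached to their words.

```python
import unicodedata
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase tokens after Unicode normalization.

    Assumption: NFC is used here; the source does not say which form applies.
    Characters in the Letter, Mark, or Number categories are kept, so
    Devanagari matras (combining marks) remain part of the word.
    """
    text = unicodedata.normalize("NFC", text).lower()
    tokens, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in "LMN":
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def build_index(docs):
    """Build a toy inverted index: token -> set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


if __name__ == "__main__":
    docs = {1: "Contract Law", 2: "कानून और contract"}
    index = build_index(docs)
    print(sorted(index["contract"]))
```

A real pipeline would apply a per-language stemmer between tokenization and index insertion; that step is omitted because the source does not name the stemmers in use.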
📄️ Semantic Engine
Alongside the lexical engine, the semantic engine works with transformer-generated dense embeddings. Each document is segmented into logical chunks, often aligned to legal sections or structural hierarchies, and each chunk is encoded as a 384-dimensional embedding vector by a domain-aligned sentence-transformer model.
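As a rough illustration of section-aligned chunking, the sketch below splits a document wherever a line begins with `Section <number>`. The heading pattern, the `Chunk` type, and the function name `chunk_by_section` are all assumptions made for the example; the source does not describe how section boundaries are actually detected. The embedding step is not shown: each chunk's text would subsequently be passed to the embedding model.

```python
import re
from dataclasses import dataclass

# Assumed heading convention: a line starting with "Section 12" or "Section 12A".
SECTION_RE = re.compile(r"^Section\s+(\d+[A-Za-z]*)\b", re.MULTILINE)


@dataclass
class Chunk:
    section: str  # section label, e.g. "2A"
    text: str     # heading plus body, up to the next heading


def chunk_by_section(document):
    """Split a document into one chunk per detected section heading.

    Text before the first heading is ignored in this sketch; a document
    with no headings yields an empty list.
    """
    matches = list(SECTION_RE.finditer(document))
    chunks = []
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(document)
        body = document[match.start():end].strip()
        chunks.append(Chunk(section=match.group(1), text=body))
    return chunks


if __name__ == "__main__":
    sample = (
        "Section 1. Short title.\nThis Act may be cited as the Example Act.\n"
        "Section 2A. Definitions.\nIn this Act, unless the context requires otherwise.\n"
    )
    for chunk in chunk_by_section(sample):
        print(chunk.section, len(chunk.text))
```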
📄️ Qdrant Integration
Qdrant serves as the vector backbone of the semantic layer. Each stored vector is associated with a structured payload containing:
📄️ Hybrid Fusion
The system's defining strength is its hybrid score fusion mechanism. When a query arrives, the lexical and semantic engines run concurrently, and their results are merged through configurable score-combination strategies.
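One common score-combination strategy is a weighted sum of min-max normalized scores, sketched below. This is an illustrative example rather than the system's confirmed method: the function names `min_max` and `fuse` and the weight parameter `alpha` are hypothetical, and the source does not say which strategies are available. Normalization is needed because lexical scores (for example BM25-style values) and cosine similarities live on different scales.

```python
def min_max(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal; treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(lexical, semantic, alpha=0.5):
    """Combine two result sets with a weighted sum.

    alpha weights the lexical score and (1 - alpha) the semantic score.
    A document missing from one result set contributes 0 for that engine.
    Returns (doc_id, fused_score) pairs, best first.
    """
    lex, sem = min_max(lexical), min_max(semantic)
    fused = {
        doc_id: alpha * lex.get(doc_id, 0.0) + (1 - alpha) * sem.get(doc_id, 0.0)
        for doc_id in lex.keys() | sem.keys()
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    lexical = {"a": 10.0, "b": 5.0, "c": 0.0}
    semantic = {"b": 0.9, "c": 0.5, "d": 0.1}
    for doc_id, score in fuse(lexical, semantic, alpha=0.5):
        print(doc_id, round(score, 3))
```

Setting `alpha` closer to 1 favors exact term matches, while values closer to 0 favor semantic similarity. Rank-based alternatives such as reciprocal rank fusion avoid score normalization altogether and could be swapped in behind the same interface.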