Data & Search Engineer - Immediate joiners
We are looking for a strong Data & Search Engineer to design, build, and operate the data ingestion, indexing, and retrieval foundation for enterprise AI and Agentic AI solutions.
This role is critical for enabling accurate, secure, and scalable AI-powered search, document intelligence, knowledge retrieval, and RAG-based applications. The ideal candidate has hands-on experience handling structured and unstructured enterprise data, designing chunking and enrichment strategies, optimizing search relevance, and validating retrieval quality.
The candidate should be comfortable working with large volumes of documents, enterprise metadata, security-aware indexing, hybrid search, and Azure AI services.
Key Responsibilities
Data Ingestion & Processing
- Design and build scalable ingestion pipelines for structured, semi-structured, and unstructured data sources such as PDFs, Word documents, Excel files, SharePoint, databases, APIs, and enterprise repositories.
- Develop robust document parsing, cleaning, normalization, and transformation workflows.
- Implement document chunking strategies based on structure, sections, headings, tables, document type, and business context.
- Maintain document identifiers, source references, version history, and lineage information across ingestion and indexing workflows.
Metadata, Enrichment & Governance
- Design metadata schemas for enterprise search and RAG use cases.
- Enrich content with document-level, section-level, topic-level, and security-level metadata.
- Implement tagging, classification, topic extraction, entity extraction, and semantic enrichment pipelines.
- Ensure support for RBAC-aware retrieval, data masking, access control filtering, and secure indexing practices.
Search, Indexing & Retrieval
- Build and tune hybrid search solutions combining semantic search, vector search, and keyword-based search.
- Design and maintain indexes for enterprise-grade retrieval performance.
- Work with vector databases, Azure AI Search, embeddings, and ranking strategies.
- Optimize retrieval relevance using filters, scoring profiles, reranking, metadata boosts, and query expansion.
- Evaluate chunk quality, index quality, and retrieval performance through systematic testing.
Retrieval Quality & Evaluation
- Define and execute retrieval evaluation frameworks using relevance metrics, test query sets, golden datasets, and human review feedback.
- Identify issues such as poor chunking, missing metadata, irrelevant retrieval, duplicate chunks, hallucination risk, and low-confidence answers.
- Continuously improve ingestion and indexing strategies based on evaluation results.
- Support RAG and Agentic AI teams with reliable, explainable, and traceable retrieval foundations.
Required Skills & Experience
- Strong experience in enterprise data ingestion, search engineering, indexing, and retrieval.
- Hands-on knowledge of document chunking, metadata modeling, content enrichment, and data preprocessing.
- Experience with hybrid search: semantic search, vector search, full-text search, and keyword search.
- Strong understanding of embeddings, vector indexing, similarity search, and relevance tuning.
- Experience with Azure AI Search, Azure OpenAI, Azure AI Document Intelligence, Microsoft Fabric, SharePoint, Microsoft Graph, or related Microsoft AI services.
- Experience with Python and data processing frameworks.
- Good understanding of data masking, access control, RBAC-aware search, and secure data handling.
- Experience working with enterprise documents, knowledge bases, policies, SOPs, contracts, engineering documents, or operational data.
- Ability to validate retrieval quality and improve search accuracy through structured evaluation.
Preferred Technical Stack
- Microsoft Azure AI Search
- Azure OpenAI Service
- Azure AI Document Intelligence
- Azure Functions / Azure Container Apps
- Microsoft Graph API
- SharePoint / OneDrive / Teams data integration
- Microsoft Fabric / Synapse / Data Factory
- Python
- SQL / PostgreSQL / SQL Server
- Vector search and embedding models
- LangChain / Semantic Kernel / LlamaIndex
- Power BI integration awareness is a plus