What is the salary for this Gen AI Data Engineer position?

Salary information for this Gen AI Data Engineer position is available upon application.

What experience is required for this Gen AI Data Engineer role?

This Gen AI Data Engineer position requires mid-level experience.

Where is this Gen AI Data Engineer job located?

This Gen AI Data Engineer position is located in Pune, Maharashtra.

How do I apply for this Gen AI Data Engineer position at NielsenIQ?

You can apply for this Gen AI Data Engineer position by clicking the 'Apply Now' button on this page, which will direct you to the official application portal.

Gen AI Data Engineer at NielsenIQ | Pune, Maharashtra

As a Generative AI Data Engineer, you will build the data infrastructure critical for powering generative AI applications. You will collaborate with software engineers, ML engineers, and product leaders to facilitate rapid experimentation and deploy scalable, production-ready GenAI systems. Your responsibilities include designing flexible and scalable data pipelines, supporting retrieval systems, and ensuring high-quality data flow throughout the GenAI lifecycle, from initial experimentation to production deployment.

Core Responsibilities

Data Engineering & Pipeline Development

Design and build scalable batch and near real-time data pipelines using modern data processing frameworks.
Develop robust ETL/ELT workflows for ingesting and transforming structured and unstructured data (documents, PDFs, APIs, logs, etc.).
Utilize cloud-native orchestration and data processing solutions for reliability and scalability.
Implement reusable data frameworks to support rapid experimentation and iteration cycles.
Ensure data quality through validation, schema enforcement, and automated checks.

GenAI Data Preparation & Experimentation

Prepare and curate datasets for GenAI use cases, including retrieval-augmented generation (RAG), embeddings, and fine-tuning workflows.
Implement data processing steps like chunking, tokenization, metadata enrichment, and semantic structuring.
Enable fast experimentation loops by supporting dynamic datasets and evaluation pipelines.
Collaborate with engineering and product teams to iterate quickly on features and experiments.
Transition experimental pipelines into production-ready, robust workflows.

Vector Databases & Retrieval Systems

Build and maintain embedding pipelines using LLM providers and open-source models.
Design and optimize retrieval systems using cloud-native vector databases and hybrid storage solutions.
Work with relational databases that support vector capabilities, such as PostgreSQL with vector extensions.
Implement and optimize RAG pipelines, including indexing, retrieval, ranking, and refresh strategies.
Manage the lifecycle of embeddings, vector indexes, and retrieval datasets.

Data Storage & Platform Engineering

Work with cloud-native data platforms and storage solutions, including data lakes, lakehouses, and object storage.
Design efficient storage schemas for both analytical and retrieval workloads.
Optimize relational and hybrid data stores for low-latency, high-throughput access patterns.
Ensure cost-effective and scalable data storage strategies.

Productionization & Scalability

Convert experimental workflows into scalable, reliable production pipelines.
Optimize pipelines for performance, cost, and reliability.
Implement incremental processing, caching, and efficient refresh strategies.

Monitoring & Data Observability

Implement monitoring for pipeline health, data freshness, and quality.
Track dataset drift, embedding drift, and retrieval effectiveness.
Build logging, alerting, and observability frameworks for data systems.

Collaboration & Cross-Functional Work

Partner closely with engineering teams, product leadership, and data scientists to define and deliver data solutions.
Act as a bridge between rapid experimentation and production engineering.
Contribute to architecture decisions and GenAI data best practices.
Document pipelines, architectures, and data models clearly.

Nice-to-Have / Growth Areas

Experience with GenAI frameworks such as LangChain, LlamaIndex, or similar.
Exposure to knowledge graphs and graph-based retrieval approaches.
Understanding of data governance, lineage, and cataloging.
Experience with experiment tracking and dataset versioning.
Experience working with multi-modal datasets (text, image, audio).

Qualifications

2-3 years of experience in Data Engineering or related roles.
Strong proficiency in Python and SQL.
Hands-on experience with modern data processing frameworks and orchestration tools.
Experience working with cloud-native data platforms on Azure, AWS, or GCP.
Experience with relational databases such as PostgreSQL, including extensions for advanced workloads like vector storage.
Strong understanding of building scalable data pipelines for both structured and unstructured data.
Familiarity with GenAI concepts such as LLMs, embeddings, and RAG architectures.

Soft Skills

Strong collaborator comfortable working with engineers, product managers, and leadership.
Ability to balance rapid experimentation with production rigor.
Strong problem-solving and debugging capabilities across data systems.
Clear communicator with strong documentation practices.
Adaptable and able to thrive in fast-moving, GenAI-driven environments.

Additional Information

Enjoy a flexible and rewarding work environment with peer-to-peer recognition platforms.
Recharge and revitalize with wellness plans for you and your family.
Plan your future with financial wellness tools.
Stay relevant and upskill yourself with career development opportunities.

Our Benefits

Flexible working environment
Volunteer time off
LinkedIn Learning
Employee Assistance Program (EAP)

NIQ may use AI tools in recruitment for tasks like resume screening, assessments, scheduling, job matching, and communication support to enhance efficiency and ensure consistent evaluation based on job-related criteria. All AI use adheres to NIQ's principles of fairness, transparency, human oversight, and inclusion. Final hiring decisions are made by humans. NIQ regularly reviews AI tools to mitigate bias and ensure compliance. For questions, accommodations, or to request human review where legally permitted, contact your local HR representative. Learn more about NIQ's AI Safety Policies and Guiding Principles at https://www.nielseniq.com/global/en/ai-safety-policies.

Full Job Description