Blend360
Posted 4h ago on Career Pages

Data Engineer

Hyderabad, Telangana, India
Full Time
Mid Level

Full Job Description

Blend360 is seeking a hands-on Data Engineer with deep expertise in distributed systems, ETL/ELT development, and enterprise-grade database management to join our team in Hyderabad, Telangana, India. You will be instrumental in designing, implementing, and optimizing ingestion, transformation, and storage workflows for our Media Mix Optimization (MMO) platform. This role requires strong technical fluency across big data frameworks like HDFS, Hive, and PySpark, orchestration platforms such as Apache NiFi, and relational systems like Postgres. Excellent coding skills in Python and SQL are essential for automation, custom transformations, and ensuring operational reliability.

The MMO platform is designed to analyze and optimize marketing investments across multiple channels, requiring a robust on-premises data infrastructure that supports distributed computing, large-scale data ingestion, and advanced analytics. As a Data Engineer, you will build and maintain resilient pipelines and data systems that feed into MMO models, ensuring data quality, governance, and availability for Data Science and BI teams. The environment integrates HDFS for distributed storage, Apache NiFi for orchestration, Hive and PySpark for distributed processing, and Postgres for structured data management.

This role is crucial for enabling the seamless integration of massive datasets from disparate sources (media, campaign, transaction, customer interaction, etc.), standardizing this data, and providing reliable foundations for advanced econometric modeling and insights. Your responsibilities will include:

  • Data Pipeline Development & Orchestration: Design, build, and optimize scalable data pipelines in Apache NiFi to automate ingestion, cleansing, and enrichment from structured, semi-structured, and unstructured sources, meeting the low-latency and high-throughput requirements of distributed processing.
  • Data Storage & Processing: Architect and manage datasets on HDFS for high-volume, fault-tolerant storage. Develop distributed processing workflows in PySpark and Hive to handle large-scale transformations, aggregations, and joins across petabyte-level datasets, implementing partitioning, bucketing, and indexing strategies for optimized query performance.
  • Database Engineering & Management: Maintain and tune Postgres databases for high availability, integrity, and performance. Write advanced SQL queries for ETL, analysis, and integration with downstream BI/analytics systems.
  • Collaboration & Integration: Partner with Data Scientists to deliver clean, reliable datasets for model training and MMO analysis. Work with BI engineers to ensure data pipelines align with reporting and visualization requirements.
  • Monitoring & Reliability Engineering: Implement monitoring, logging, and alerting frameworks to track data pipeline health. Troubleshoot and resolve issues in ingestion, transformations, and distributed jobs.
  • Data Governance & Compliance: Enforce standards for data quality, lineage, and security across systems, ensuring compliance with internal governance and external regulations.
  • Documentation & Knowledge Transfer: Develop and maintain comprehensive technical documentation for pipelines, data models, and workflows. Provide knowledge sharing and onboarding support for cross-functional teams.

Qualifications:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field (Master’s preferred).
  • Proven experience as a Data Engineer with expertise in HDFS, Apache NiFi, Hive, PySpark, Postgres, Python, and SQL.
  • Strong background in ETL/ELT design, distributed processing, and relational database management.
  • Experience with on-premises big data ecosystems supporting distributed computing.
  • Solid debugging, optimization, and performance tuning skills.
  • Ability to work in agile environments, collaborating with multi-disciplinary teams.
  • Strong communication skills for cross-functional technical discussions.

Preferred Qualifications:

  • Familiarity with data governance frameworks, lineage tracking, and data cataloging tools.
  • Knowledge of security standards, encryption, and access control in on-premises environments.
  • Prior experience with Media Mix Modeling (MMM/MMO) or marketing analytics projects.
  • Exposure to workflow schedulers like Airflow, Oozie, or similar.
  • Proficiency in developing automation scripts and frameworks in Python for CI/CD of data pipelines.

Company

Blend360

Blend360 is a leading data and AI services company, specializing in data engineering, data science, MLOps, and governance to build scalable analytics solutions. We partner with enterprise and Fortune ...
