What is the salary for this Senior Data Engineer - Python & PySpark position?

Salary information for this Senior Data Engineer - Python & PySpark position is available upon application.

What experience is required for this Senior Data Engineer - Python & PySpark role?

This Senior Data Engineer - Python & PySpark position requires senior-level experience.

Where is this Senior Data Engineer - Python & PySpark job located?

This Senior Data Engineer - Python & PySpark position is located in Chennai, TN, India.

How do I apply for this Senior Data Engineer - Python & PySpark position at Citi?

You can apply for this Senior Data Engineer - Python & PySpark position by clicking the 'Apply Now' button on this page, which will direct you to the official application portal.

Senior Data Engineer - Python & PySpark

About the Role: Citi is seeking a Senior Data Engineer with expertise in Python and PySpark to join our team in Chennai, TN, India. This role involves designing, developing, and optimizing data architectures, pipelines, and models to support critical business needs, including advanced analytics, reporting, and machine learning initiatives.

Key Responsibilities:

Data Architecture & Design: Architect, build, and refine scalable data pipelines, data models, and overall data architectures to meet diverse business objectives.
ETL/ELT Development: Develop, test, and deploy robust ETL/ELT processes using Python and PySpark. Ingest, transform, and load data from various sources into data warehouses and data lakes. Optimize complex data transformations with PySpark.
Data Quality & Governance: Implement and enforce best practices for data quality, governance, and security to ensure data integrity, reliability, and privacy.
Performance Optimization: Monitor, troubleshoot, and enhance the performance of data pipelines, ensuring timely data delivery, with a specific focus on PySpark job efficiency.
Infrastructure Management: Collaborate with DevOps and MLOps teams to manage and optimize data infrastructure, including cloud resources (AWS, Azure, GCP), databases, and data processing frameworks, ensuring efficient operation of PySpark clusters.
Mentorship & Leadership: Provide technical leadership, mentorship, and code reviews to junior data engineers, promoting Python and PySpark best practices and fostering a culture of continuous improvement.
Collaboration: Partner closely with data scientists, analysts, product managers, and other stakeholders to understand data requirements and deliver effective data solutions.
Innovation: Research and evaluate emerging data technologies, tools, and methodologies to enhance data capabilities and maintain a competitive edge.
Documentation: Create and maintain comprehensive documentation for all data pipelines, data models, and data infrastructure.

Qualifications:

Bachelor's or Master's degree in Computer Science, Software Engineering, Data Science, or a related quantitative field.
5+ years of professional experience in data engineering, with a strong background in building and managing large-scale data systems.
Extensive hands-on experience with Python for data engineering tasks.
Proven experience with PySpark for big data processing and transformation.
Demonstrated experience with cloud data platforms such as AWS (Redshift, S3, EMR, Glue), Azure (Data Lake, Databricks, Synapse), or Google Cloud (BigQuery, Dataflow).
Strong experience with SQL and NoSQL databases (e.g., PostgreSQL, MySQL, MongoDB, Cassandra).
Extensive experience with distributed data processing frameworks, particularly Apache Spark.
Programming Languages: Expert proficiency in Python is mandatory. Strong SQL mastery is essential. Familiarity with Scala or Java is a plus.
Big Data Technologies: In-depth knowledge and hands-on experience with Apache Spark (PySpark), including Spark SQL, Spark Streaming, and DataFrame API. Experience with Apache Kafka, Apache Airflow, Delta Lake, or similar technologies is beneficial.
Data Warehousing: In-depth understanding of data warehousing concepts, dimensional modeling, and ETL/ELT processes.
Cloud Platforms: Hands-on experience with at least one major cloud provider (AWS, Azure, GCP) and their data services, especially those supporting Spark/PySpark workloads.
Containerization: Familiarity with Docker and Kubernetes is a plus.
Version Control: Proficient with Git and CI/CD pipelines.
Excellent problem-solving, analytical, communication, and interpersonal skills.
Ability to articulate complex technical concepts to non-technical stakeholders.
Proven ability to work effectively in a fast-paced, agile environment.
Proactive, self-motivated, with a strong sense of ownership.
Experience with real-time data streaming and processing using PySpark Structured Streaming is desirable.
Knowledge of machine learning concepts and MLOps practices, particularly integrating ML workflows with PySpark, is a plus.
Familiarity with data visualization tools (e.g., Tableau, Power BI) is a plus.
Contributions to open-source data projects are a plus.

Join Citi in Chennai and be part of a dynamic team driving data innovation!
