
PySpark Technical Lead
Experience Level: Senior Level
Full Job Description
Sopra Steria is looking for a skilled Data Engineer to join its team in Chennai, Tamil Nadu. The role involves close collaboration with Data Scientists to develop and deploy machine learning models. The ideal candidate is proficient in building and maintaining pipelines for training and inference datasets, using PySpark for data processing and transformation.
Key responsibilities include working with AWS EMR and S3 for scalable data solutions, implementing ETL workflows using StreamSets, designing high-quality datasets, and collaborating with cross-functional teams on deployment and inference. Optimizing pipelines for performance and reliability and securing data access through IAM policies are also critical. Experience with Spark architecture and job optimization is essential.
The role calls for 6-8 years of total experience. Candidates should possess advanced SQL skills (including window functions) and expertise in Spark architecture, PySpark or Scala with Spark, and Hadoop. Proven experience in designing and deploying data pipelines, strong problem-solving abilities, and excellent communication skills are required. Desirable skills include hands-on experience with Airflow, S3, StreamSets, Kafka, AWS IAM, AWS EMR, and Snowflake.
The company is committed to fostering an inclusive and respectful work environment, free from discrimination, and all positions are open to individuals with disabilities.
Company
Sopra Steria
Sopra Steria is a leading European technology company with 56,000 employees operating in approximately 30 countries. Renowned for its expertise in consulting, digital services, and software developmen...