
PySpark Module Lead
Experience Level: Mid Level
Full Job Description
Job Opportunity: PySpark Module Lead in Noida, Uttar Pradesh
Sopra Steria is seeking a skilled and motivated PySpark Module Lead to join our dynamic team in Noida, Uttar Pradesh. This role involves close collaboration with Data Scientists to develop and deploy machine learning models. Proficiency in PySpark and related technologies is essential for building and maintaining robust pipelines for training and inference datasets.
Responsibilities
- Collaborate with Data Scientists to design, develop, and implement machine learning pipelines.
- Utilize PySpark for data processing, transformation, and preparation of datasets for model training.
- Leverage AWS EMR and S3 for scalable and efficient data storage and processing.
- Implement and manage ETL workflows using StreamSets for data ingestion and transformation.
- Design and construct pipelines to deliver high-quality training and inference datasets.
- Partner with cross-functional teams to ensure seamless deployment and real-time/near real-time inferencing capabilities.
- Optimize and fine-tune pipelines for performance, scalability, and reliability.
- Ensure appropriate configuration of IAM policies and permissions for secure data access and management.
- Implement and optimize Spark architecture and Spark jobs for scalable data processing.
Qualifications and Requirements
This position requires a professional degree and 4 to 6 years of total experience.
Mandatory Skills:
- Proficiency in advanced SQL (including window functions), Spark architecture, PySpark or Scala with Spark, and Hadoop.
- Proven expertise in designing and deploying data pipelines.
- Strong problem-solving skills and the ability to work effectively in a collaborative team environment.
- Excellent communication skills, with the ability to translate technical concepts to non-technical stakeholders.
Desirable Skills:
- Hands-on experience with Airflow, S3, and StreamSets or similar ETL tools (training available).
- Understanding of real-time or near real-time inferencing architectures.
- Basic knowledge of Kafka, AWS IAM, AWS EMR, and Snowflake.
Sopra Steria is an equal opportunity employer committed to diversity and inclusion, and we welcome applications from individuals with disabilities.
Company
Sopra Steria
Sopra Steria is a leading European technology company with 56,000 employees operating in approximately 30 countries. They specialize in consulting, digital services, and software development, assistin...