Amgen
Posted on Foundit
Associate Data Engineer
Hyderabad / Secunderabad, Telangana, India
Full Time
Mid Level
Full Job Description
About the Role: Associate Data Engineer at Amgen, Hyderabad
Amgen is seeking a talented Associate Data Engineer to join our dynamic team in Hyderabad / Secunderabad, Telangana, India. This permanent position focuses on developing, testing, and maintaining robust data pipelines to support critical analytics, reporting, and machine learning initiatives.
Key Responsibilities:
- Design, implement, and manage data pipelines using Databricks, PySpark, and Python to process structured and semi-structured data from diverse sources.
- Develop and optimize scalable ETL/ELT workflows for business intelligence, data science, and AI/ML applications.
- Collaborate closely with data engineers, analysts, and data scientists to define data requirements and ensure the delivery of high-quality, reliable datasets.
- Conduct comprehensive data cleansing, validation, and quality assurance checks to maintain data integrity and accuracy.
- Optimize Spark jobs and Databricks notebooks for enhanced performance, reliability, and cost-effectiveness.
- Create and maintain detailed documentation for data pipelines, workflows, data definitions, and operational processes.
- Actively participate in troubleshooting pipeline failures, data discrepancies, and performance bottlenecks.
- Adhere to industry best practices for version control (Git), code quality, testing, and deployment strategies.
- Support foundational AI/ML data preparation tasks, including feature engineering and the creation of training datasets and model inputs.
- Monitor scheduled jobs and workflows to ensure timely and successful data delivery.
- Engage with cross-functional teams within an Agile or iterative development framework.
Required Qualifications:
- Bachelor's degree in Computer Science, Data Engineering, Information Systems, Engineering, Mathematics, or a related field, or equivalent practical experience (2-6 years).
- Proficiency in Python for data processing, scripting, and automation.
- Strong understanding of PySpark and distributed data processing principles.
- Hands-on experience with Databricks, including notebooks, clusters, jobs, workflows, Delta tables, and performance tuning.
- Experience building and maintaining scalable ETL/ELT pipelines in a Databricks environment.
- Familiarity with Delta Lake and lakehouse architecture concepts.
- Solid SQL skills for data querying, transformation, and validation.
- Experience working with various data formats: CSV, JSON, Parquet, and Delta.
- Knowledge of core data engineering concepts (ETL/ELT, data pipelines, data lakes, data warehouses, batch processing, data quality).
- Basic understanding of AI/ML concepts, including features, training data, model inputs/outputs, and evaluation.
- Experience supporting AI/ML data preparation or feature engineering.
- Exposure to cloud data platforms (AWS, Azure, or GCP).
- Familiarity with version control systems like Git.
- Excellent analytical, problem-solving, and troubleshooting abilities.
- Strong communication and collaboration skills, with the ability to work effectively with diverse stakeholders.
- A proactive approach to learning new technologies and data engineering best practices.
Preferred Qualifications:
- Experience with Delta Lake, Unity Catalog, or advanced lakehouse architectures.
- Familiarity with workflow orchestration tools or Databricks Jobs.
- Exposure to CI/CD practices in data engineering.
- Experience with ML tools and frameworks such as MLflow and scikit-learn.
- Experience with data visualization tools (Tableau, Power BI) for dashboarding, reporting, and exploratory analysis.
- Understanding of data governance, security, and access control.
- Experience in an Agile/Scrum environment.
Note: This role may require working evening or night shifts based on business requirements.