
Data Scientist
Experience Level: Mid Level
Full Job Description
Data Scientist Role at ProcDNA in Pune, India
ProcDNA is seeking a skilled Data Scientist to join our team in Pune, India. This role is pivotal in delivering end-to-end data science solutions for the pharmaceutical and healthcare sectors. You will combine technical expertise, business understanding, and a scientific approach to tackle complex challenges. Your responsibilities will include translating intricate business problems into analytical frameworks, developing robust machine learning and statistical models, and generating actionable insights that improve commercial, clinical, and operational outcomes for global clients and patients. Strategic thinking is essential, as you will frame business challenges, design analytical strategies, and guide clients toward data-driven decision-making.
Key Responsibilities
- Lead day-to-day execution of data science projects, ensuring methodological soundness, business relevance, and timely delivery from problem definition to deployment.
- Construct, optimize, and validate sophisticated machine learning and statistical models, including supervised techniques (classification, regression, uplift), unsupervised methods (clustering, PCA, GMM), transformer models, and analytical frameworks (hypothesis testing, causal inference, survival analysis) using industry-standard libraries.
- Develop clean, modular code with reusable components, following best practices for version control, documentation, and scalable pipeline design, so that it is ready for production or client-facing deployments.
- Synthesize insights from diverse data sources such as claims, prescription data (LAAD), lab results, EMRs, and unstructured text into clear narratives to guide client decisions, considering patient, healthcare provider (HCP), and market contexts.
- Collaborate with consultants, domain experts, and engineers to structure analytical workflows addressing complex commercial or clinical questions.
- Present findings and insights to internal and client stakeholders in a clear, structured, and actionable way.
- Actively engage in client discussions, supporting solution development and storyboarding for business audiences.
- Contribute to internal capability enhancement through the creation of reusable ML assets, accelerators, and documentation to enrich the team's solution portfolio.
Required Skills
- Extensive hands-on experience with Python, PySpark, and SQL for managing and processing large structured and unstructured datasets.
- Solid understanding of machine learning algorithms, feature engineering, model tuning, and evaluation methodologies.
- Proficiency in data visualization tools such as Power BI, Tableau, or the MS Office suite, and the ability to effectively communicate analytical results.
- Capability to structure ambiguous business problems, define analytical roadmaps, and communicate insights effectively to both technical and non-technical audiences.
- Strong collaboration and project management skills for coordinating multi-disciplinary teams.
Preferred Skills
- Previous experience in the pharmaceutical or life sciences industry, including familiarity with structured data sources such as LAAD, lab, and sales data, as well as unstructured datasets (e.g., market research, physician notes, publications).
- Experience with R, R Shiny, and data platforms such as Databricks, AWS, Azure, or Snowflake is beneficial.
- Exposure to MLOps frameworks, including MLflow, Docker, Airflow, or CI/CD pipelines, for automating model training, deployment, and monitoring in scalable production environments.
- Experience mentoring junior analysts or working collaboratively in cross-functional data science teams.
Qualifications
- Bachelor's or Master's degree in Computer Science, Statistics, Mathematics, Data Science, or a related field.
- 1 to 6 years of professional experience in data science, analytics, or advanced modeling roles.
- Demonstrated ability to balance analytical rigor with business acumen, delivering models that are explainable, actionable, and production-ready.