AI Evaluation Engineer (Remote)
Company: Crossing Hurdles
Location: India (Remote)
Job Type: Hourly Contract
Compensation: $21 per hour
Commitment: Flexible (30–40 hrs/week or full-time)
About the Role
Crossing Hurdles is a referral partner for organizations that collaborate with the world's leading AI research labs. We are seeking talented individuals to help build and train cutting-edge AI models.
Responsibilities
- Frame and design high-quality machine learning tasks to enhance LLM capabilities.
- Build and optimize ML models for NLP, classification, prediction, recommendation, or generative tasks.
- Run rapid experimentation cycles, evaluate model performance, and iterate on improvements.
- Conduct advanced feature engineering and preprocessing on large-scale datasets.
- Implement adversarial testing, robustness checks, and bias evaluations.
- Fine-tune, evaluate, and deploy transformer-based models when required.
- Create datasets, evaluation rubrics, and benchmarking pipelines for ML tasks.
- Maintain documentation for experiments, datasets, and modeling decisions.
- Stay updated with cutting-edge ML research, tools, and competition-grade methodologies.
Requirements
- 3–5+ years of experience in machine learning model development.
- Degree in Computer Science, Engineering, Statistics, Mathematics, or a related field.
- A competitive ML track record (Kaggle/DrivenData) with medals or strong rankings is preferred.
- Strong proficiency in Python with PyTorch/TensorFlow.
- Solid understanding of ML fundamentals: statistics, optimization, and model evaluation.
- Experience building reproducible ML pipelines and experiment tracking.
- Familiarity with benchmarking, scoring methodologies, and evaluation frameworks.
- Experience with cloud environments (AWS/GCP/Azure).
- Strong problem-solving ability, analytical mindset, and clear communication skills.
- Fluency in English.
Preferred Qualifications
- Kaggle Master/Grandmaster or multiple gold medals.
- Experience with LLMs, generative models, or multimodal learning.
- Knowledge of vector databases, distributed training, and scalable deployments.
- MLOps exposure — W&B, MLflow, Airflow, Docker.
- Publications, open-source contributions, or technical writing experience.
- Prior mentorship or leadership experience.
Application Process
- Apply for this role.
- Await an official message/email from our recruitment team (typically within 1–2 days).
