About Us
Blute Technologies is a technology company that delivers AI-driven, enterprise-grade digital transformation solutions across industries. We specialize in AI/ML, data engineering, analytics, enterprise applications, and intelligent automation.
Role Overview
We are seeking an experienced Dataset Developer with at least 5 years of hands-on experience in data collection, annotation, preprocessing, validation, and dataset engineering for AI/ML applications. This role is crucial for building high-quality datasets that directly affect AI model performance and business outcomes. The ideal candidate will have strong expertise in managing structured and unstructured datasets, data pipelines, labeling tools, and quality assurance processes, and will support machine learning, computer vision, and NLP initiatives in Delhi.
Key Responsibilities
- Design, build, curate, and maintain large-scale datasets for AI/ML projects.
- Perform data cleaning, preprocessing, normalization, augmentation, and transformation.
- Work with structured, semi-structured, and unstructured data sources including images, videos, text, audio, and tabular data.
- Develop and manage data annotation workflows using industry-standard labeling tools.
- Ensure dataset accuracy, consistency, completeness, and quality through validation techniques.
- Collaborate with AI/ML engineers, data scientists, and product teams to understand model requirements and dataset specifications.
- Create metadata standards, taxonomy structures, and labeling guidelines.
- Automate repetitive dataset preparation and validation tasks using scripting/programming.
- Handle data balancing, deduplication, versioning, and dataset lifecycle management.
- Support model training teams with optimized and production-ready datasets.
- Maintain documentation for dataset pipelines, schemas, and data governance practices.
- Ensure compliance with data privacy, security, and regulatory standards.
Required Skills & Qualifications
- Minimum 5 years of experience in dataset development, data engineering, or AI data operations.
- Strong understanding of AI/ML data preparation methodologies.
- Experience working with annotation and labeling tools such as Label Studio, CVAT, Supervisely, Roboflow, or V7.
- Proficiency in Python and data handling libraries such as Pandas, NumPy, and OpenCV.
- Experience with SQL and database systems.
- Knowledge of data formats including JSON, XML, CSV, Parquet, and TFRecord.
- Familiarity with computer vision, OCR, NLP, or speech datasets is preferred.
- Experience with cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform (GCP).
- Understanding of data quality metrics and validation frameworks.
- Experience with Git/version control systems.
- Strong analytical and problem-solving skills.
- Excellent communication and collaboration abilities.
Preferred Qualifications
- Experience supporting Generative AI or Large Language Model (LLM) projects.
- Knowledge of synthetic data generation techniques.
- Familiarity with MLOps and data pipeline orchestration tools.
- Exposure to healthcare, retail, manufacturing, or enterprise AI datasets is an added advantage.
- Bachelor’s or Master’s degree in Computer Science, Data Science, AI, IT, or a related field.
What We Offer
- Opportunity to work on cutting-edge AI and data engineering projects in Delhi.
- Exposure to enterprise-scale AI implementations.
- A collaborative and innovation-driven work environment.
- Career growth opportunities in AI/ML and data engineering domains.
- Competitive compensation and benefits.
Pay: ₹108,333.00 - ₹141,667.00 per month
Work Location: In-person