
Language Engineer
Role & Responsibilities
The Amazon Artificial General Intelligence (AGI) Data Services organization is instrumental in developing diverse datasets essential for training and evaluating Amazon AI models. We are actively seeking Language Engineers to join our dedicated science and engineering team. In this role, you will support the creation of complex, multi-modal datasets by employing a variety of methodologies, including synthetic data generation, model-assisted data creation, and human-in-the-loop data collection processes.
Key Job Responsibilities
- Design complex data collection initiatives involving human participants that directly address scientific requirements. This includes authoring clear instructions, defining and implementing quality targets and control mechanisms, coordinating day-to-day data collection activities (planning, scheduling, and reporting), and ensuring the successful delivery of final outputs.
- Develop and execute complex data creation tasks utilizing synthetic and model-based generation techniques, adhering to state-of-the-art methodologies.
- Analyze and extract meaningful insights from extensive datasets.
- Build tools or prototypes for data analysis and data creation, leveraging Python or other scripting languages.
- Employ modeling tools to initiate or test novel AI functionalities.
- Collaborate effectively with scientists, software engineers, and fellow data creators to assess the performance of AI models.
About the Team
Amazon's commitment to being the world's most customer-centric company means enabling customers to research and purchase anything online or offline. We set ambitious goals and seek individuals who can help us achieve and surpass them. The AGI organization is at the forefront of developing AI capabilities that power numerous Amazon products and search experiences. We provide secure, flexible, cost-effective, and high-quality data development services to our internal and external customers, empowering them to build advanced Machine Learning models.
Basic Qualifications
- Master's degree or higher in a relevant discipline such as Computational Linguistics or an equivalent field with a strong emphasis on computational analysis.
- A minimum of 2 years of experience in computational linguistics, language data processing, or AI data creation.
- Demonstrated experience with language data annotation systems and various forms of data markup.
- Proficiency in scripting languages, particularly Python.
- Experience working with speech, text, and multi-modal data across multiple languages.
- Exceptional communication and organizational skills, and a keen eye for detail.
- Comfort and adaptability in a fast-paced, highly collaborative, and dynamic work environment.
Preferred Candidate Profile
- PhD in Computational Linguistics or an equivalent field with a significant computational focus.
- Expertise in bootstrapping AI data collections to rapidly adapt to evolving requirements.
- Extensive experience working with speech, text, and multi-modal data in a diverse range of languages.
- Proven experience in data creation for sophisticated synthetic workflows.
- Practical experience with Machine Learning concepts and applications.
- Familiarity with technical concepts, including Application Programming Interfaces (APIs).
- Practical knowledge of version control systems and agile development methodologies.
- Familiarity with database querying and data analysis tools (e.g., SQL, R, MATLAB).
- A willingness to support multiple projects concurrently and adapt to reprioritization as needed.
- Strong creative thinking abilities coupled with outstanding analytical and problem-solving skills.
Company
Amazon
Amazon is a global leader committed to being the world's most customer-centric company. We empower customers to research and purchase a vast array of products online and offline. Our ambition is to se...