Principal Machine Learning Engineer - PyTorch
Join the AI Platform Core Components team at Red Hat, a vital part of AI Engineering, and contribute your passion for Open Source and Machine Learning to expand the reach and impact of Red Hat's AI offerings. This role focuses on enhancing Red Hat's AI capabilities for customers and the open-source community.
About the Role
We are seeking a talented PyTorch Machine Learning Engineer to improve and extend PyTorch on Red Hat platforms and to contribute significantly to its upstream development. You will play a key role in advancing PyTorch core functionality, optimizing its performance on cutting-edge hardware, and collaborating actively with the broader upstream PyTorch community.
Responsibilities
- Design, implement, and maintain core PyTorch features, including operators, kernels, and development tools, using both Python and C++.
- Profile and optimize PyTorch execution across various hardware architectures, including CPUs, GPUs (NVIDIA CUDA), and accelerators (Intel, AMD).
- Develop comprehensive tests, benchmarks, and concise examples to ensure the correctness and performance of PyTorch implementations.
- Diagnose and resolve issues across the entire technology stack, including PyTorch, associated libraries, hardware, and drivers, and contribute fixes to the upstream project.
- Foster strong collaboration with upstream PyTorch maintainers and internal Red Hat teams, and produce clear documentation and design proposals.
- Actively contribute to the PyTorch open-source community.
Qualifications
- 2 to 6 years of experience developing and maintaining Machine Learning systems.
- Demonstrated experience contributing to open-source projects.
- Proficiency in C++ and Python.
- Hands-on experience with PyTorch, such as working with its internals, developing custom operators, or applying it in advanced use cases.
- A solid understanding of algorithms, data structures, and performance-optimized coding practices.
- Comfortable working in a Linux environment, using Git for version control, and following modern development workflows.
Bonus Points
- Familiarity with numerical computing techniques, vectorization strategies, and low-level performance profiling tools.
- Previous contributions to the PyTorch project or other significant ML/AI open-source initiatives.
- Experience working with CUDA, ROCm (AMD GPUs), or Intel GPU/oneAPI technologies.
