CUDA Kernel Optimization Specialist - AI Trainer

name
Remote United States Full-time 🌐 English
NA
Experience: Entry-level
Added to JobCollate: June 7, 2026

Source: Himalayas

Tailor your resume to this posting—match keywords and layout for recruiters. Try Resume.io before you apply.

AI Summary Powered by Gemini

This role focuses on analyzing and optimizing GPU kernels to enhance performance and hardware utilization using profiler metrics. It is an ideal opportunity for engineers with at least one year of GPU experience to apply C++ and CUDA expertise in a flexible, remote environment.

Job Description

Role OverviewAnalyze and optimize GPU kernels for performance, efficiency, and hardware utilization. Use profiler metrics to guide kernel improvements. Review GPU kernel implementations to identify bottlenecks without needing extensive algorithmic background.What You Will DoWrite, modify, and reason about C++17, Python, and GPU programming code. Apply CUDA, HIP, and shader programming expertise to improve performance outcomes. Document optimization decisions clearly.Why It Might Be a FitMust have at least 1 year of professional or graduate-level research experience with GPUs. Strong understanding of GPU profiler performance metrics for kernel optimization. Ability to optimize GPU kernels without deep prior context on every algorithm.RequirementsAvailable to work at least 20 hrs/wk.Fluent in core C++ features through C++17.Working knowledge of Python and Git.Fluent in at least one GPU programming model like CUDA, HIP, Slang, HLSL, or GLSL.At least 1 year of professional or graduate-level research experience with GPUs.Strong understanding of GPU profiler performance metrics for kernel optimization.Ability to optimize GPU kernels without deep prior context on every algorithm.Originally posted on Himalayas

Full Description

Role OverviewAnalyze and optimize GPU kernels for performance, efficiency, and hardware utilization. Use profiler metrics to guide kernel improvements. Review GPU kernel implementations to identify bottlenecks without needing extensive algorithmic background.What You Will DoWrite, modify, and reason about C++17, Python, and GPU programming code. Apply CUDA, HIP, and shader programming expertise to improve performance outcomes. Document optimization decisions clearly.Why It Might Be a FitMust have at least 1 year of professional or graduate-level research experience with GPUs. Strong understanding of GPU profiler performance metrics for kernel optimization. Ability to optimize GPU kernels without deep prior context on every algorithm.RequirementsAvailable to work at least 20 hrs/wk.Fluent in core C++ features through C++17.Working knowledge of Python and Git.Fluent in at least one GPU programming model like CUDA, HIP, Slang, HLSL, or GLSL.At least 1 year of professional or graduate-level research experience with GPUs.Strong understanding of GPU profiler performance metrics for kernel optimization.Ability to optimize GPU kernels without deep prior context on every algorithm.Originally posted on Himalayas

Required Skills

AI-Performance-Optimization-Engineer AI-Optimization-Engineer GPU-Programming-Engineer GPU-Software-Engineer GPU-Computing-Engineer Software-Optimization-Engineer