Senior GPU Platform Engineer - Onsite - Redmond, Washington
Company: EPAM Systems
Location: Redmond, Washington
Posted On: 02/28/2026
Join our team to operate and support cutting-edge GPU infrastructure powering AI and high-performance computing workloads for a leading global hyperscale cloud provider. In this hands-on role, you'll manage the full lifecycle of NVIDIA GPU platforms, from bring-up to break/fix, while ensuring optimal performance for advanced AI applications.

At EPAM, you'll work on cutting-edge technologies, solve complex challenges, and shape the future of digital innovation. With access to continuous learning, mentorship, and global projects, your expertise will drive meaningful change.

Responsibilities
• Operate and maintain production GPU and bare-metal compute platforms with hands-on hardware management
• Perform physical infrastructure tasks including rack/stack, cabling, power validation, and system bring-up
• Diagnose hardware faults, replace failed components, and coordinate vendor support for complex issues
• Install and configure Linux operating systems with GPU-specific drivers and software stacks
• Execute platform validation using diagnostic tools to ensure GPU health, stability, and performance
• Provision bare-metal systems through automated workflows while troubleshooting configuration issues
• Apply firmware, BIOS, and platform configuration changes following standardized change processes

Requirements
• 5 years of professional experience supporting production server infrastructure in data center environments
• Strong Linux administration skills with the ability to independently troubleshoot system-level issues
• Hands-on experience with physical server hardware, including diagnostics and component replacement
• Familiarity with GPU platforms, preferably NVIDIA, and associated drivers and software stacks
• Experience working in structured, change-controlled production environments
• Knowledge of infrastructure monitoring tools and alert response procedures
• Excellent communication skills with the ability to collaborate across operations and engineering teams

We offer/Benefits
• Medical, Dental and Vision Insurance (Subsidized)
• Health Savings Account
• Flexible Spending Accounts (Healthcare, Dependent Care, Commuter)
• Short-Term and Long-Term Disability (Company Provided)
• Life and AD&D Insurance (Company Provided)
• Employee Assistance Program
• Unlimited access to LinkedIn learning solutions
• Matched 401(k) Retirement Savings Plan
• Paid Time Off - the employee will be eligible to accrue 15-25 paid days, depending on specific level and tenure with EPAM (accrual eligibility may change over time)
• Paid Holidays - nine (9) total per year
• Legal Plan and Identity Theft Protection
• Accident Insurance
• Employee Discounts
• Pet Insurance
• Employee Stock Purchase Program
• If otherwise eligible, participation in the discretionary annual bonus program
• If otherwise eligible and hired into a qualifying level, participation in the discretionary Long-Term Incentive (LTI) Program

EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our clients, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
