Senior GPU Platform Engineer - Onsite
Company: EPAM Systems
Location: Redmond, Washington
Posted on: February 11, 2026
Job Description:
Join our team to operate and support cutting-edge GPU
infrastructure powering AI and high-performance computing workloads
for a leading global hyperscale cloud provider. In this hands-on
role, you'll manage the full lifecycle of NVIDIA GPU platforms, from
bring-up through break/fix, while ensuring optimal performance for
advanced AI applications. At EPAM, you'll work with emerging
technologies, solve complex challenges, and shape the future of
digital innovation. With access to continuous learning, mentorship,
and global projects, your expertise will drive meaningful change.

Responsibilities:
• Operate and maintain production GPU and bare-metal compute platforms with hands-on hardware management
• Perform physical infrastructure tasks, including rack/stack, cabling, power validation, and system bring-up
• Diagnose hardware faults, replace failed components, and coordinate vendor support for complex issues
• Install and configure Linux operating systems with GPU-specific drivers and software stacks
• Execute platform validation using diagnostic tools to ensure GPU health, stability, and performance
• Provision bare-metal systems through automated workflows and troubleshoot configuration issues
• Apply firmware, BIOS, and platform configuration changes following standardized change processes

Requirements:
• 5 years of professional experience supporting production server infrastructure in data center environments
• Strong Linux administration skills, with the ability to independently troubleshoot system-level issues
• Hands-on experience with physical server hardware, including diagnostics and component replacement
• Familiarity with GPU platforms (preferably NVIDIA) and their associated drivers and software stacks
• Experience working in structured, change-controlled production environments
• Knowledge of infrastructure monitoring tools and alert response procedures
• Excellent communication skills, with the ability to collaborate across operations and engineering teams

We offer/Benefits:
• Medical, Dental and Vision Insurance (Subsidized)
• Health Savings Account
• Flexible Spending Accounts (Healthcare, Dependent Care, Commuter)
• Short-Term and Long-Term Disability (Company Provided)
• Life and AD&D Insurance (Company Provided)
• Employee Assistance Program
• Unlimited access to LinkedIn Learning solutions
• Matched 401(k) Retirement Savings Plan
• Paid Time Off: the employee will be eligible to accrue 15-25 paid days, depending on specific level and tenure with EPAM (accrual eligibility may change over time)
• Paid Holidays: nine (9) total per year
• Legal Plan and Identity Theft Protection
• Accident Insurance
• Employee Discounts
• Pet Insurance
• Employee Stock Purchase Program
• If otherwise eligible, participation in the discretionary annual bonus program
• If otherwise eligible and hired into a qualifying level, participation in the discretionary Long-Term Incentive (LTI) Program

EPAM is a leading global provider of digital platform
engineering and development services. We are committed to having a
positive impact on our clients, our employees, and our communities.
We embrace a dynamic and inclusive culture. Here you will
collaborate with multi-national teams, contribute to a myriad of
innovative projects that deliver the most creative and cutting-edge
solutions, and have an opportunity to continuously learn and grow.
No matter where you are located, you will join a dedicated,
creative, and diverse community that will help you discover your
fullest potential.
Keywords: EPAM Systems, Seattle, Senior GPU Platform Engineer - Onsite, IT / Software / Systems, Redmond, Washington