NVIDIA is known for developing integrated circuits, which are used in everything from electronic game consoles to personal computers (PCs). The company is a leading manufacturer of high-end graphics processing units (GPUs).
The individual in this role will actively monitor the clusters to identify and resolve any issues that arise, collaborating closely with other teams as necessary. The job involves troubleshooting a wide range of problems, from hardware and network issues to problems with Kubernetes or other Linux services.
This role demands a dynamic and proactive approach to maintaining the stability and performance of our clusters, contributing significantly to our cutting-edge HPC and AI initiatives.
Linux/Linux Administration:
Kubernetes:
L2/L3 Networking – Cumulus Switches:
GPU Experience:
InfiniBand:
Hardware Troubleshooting:
If you are looking for stability, professional growth, a long-term career, and technology challenges at sought-after companies, come and join us today! One last thing: if you have many of these skills but not all of them, please still apply. We love to teach those who are willing to learn.