Language Model (LLM) GPU cluster to ensure stable and reliable operation of training tasks; (b) handle GPU node failures, IB network anomalies, CUDA/NCCL errors and Kubernetes scheduling failures, perform
-
Language Model (LLM) training platform, developing unified capabilities for GPU resource pooling, training job scheduling, inference acceleration and the Machine Learning Operations (MLOps) platform
-
/GPU environments. Provide consultative support and training to researchers using BRC AI/ML tools and pipelines. Perform related duties and responsibilities as assigned or requested. Qualifications REQUIRED
-
(URCF) at Drexel University is building a new shared computing platform focused on GPU-accelerated workloads, particularly AI model training. The system includes GPU and CPU compute nodes with NVIDIA H200
-
heterogeneous (CPU/GPU) computing models. Collaborate with physicists, computer scientists, mathematicians and engineers across LBNL divisions to define software requirements, implement robust solutions, and
-
that serve the entire campus community. You will bridge the gap between high-performance hardware and practical user applications, ensuring that our AI infrastructure, from GPU infrastructure to sovereign data
-
experiments, particularly ATLAS and DUNE. Contribute to the architecture and core development of the Phlex framework, emphasizing scalable, multi-threaded, and heterogeneous (CPU/GPU) computing models
-
Job Code 0005 Employee Class Civil Service. About the Job: The successful applicant will assist in the adaptation of the PPMstar code to run well on GPU-accelerated
-
analysis systems using GPU- and FPGA-supported HPC clusters at large international research facilities such as Effelsberg, SKA, and MeerKAT. The systems developed by the BDG are based on state-of-the-art