…vulnerability management, and security monitoring tools. PREFERRED: Professional certification (CISSP or equivalent), hands-on experience with securing HPC, GPU cluster, or data center environments, experience with AI/ML…
-
…maintain the workload scheduler and architect quality-of-service policies. Administer Linux systems across infrastructure projects and support the deployment of new GPUs for research and teaching. Troubleshoot complex…
-
…in chemistry, physics, or a related field. At least 2 years of experience developing quantum Monte Carlo algorithms. Strong problem-solving and analytical skills. Python programming experience. GPU…
-
Massachusetts Institute of Technology (MIT) | Cambridge, Massachusetts | United States | 19 days ago
…of complex AI research workloads on state-of-the-art hardware. The role will have a heavy focus on optimizing existing NVIDIA GPU-based workloads for top-tier AMD GPUs, such as the MI355X and beyond, and will analyze…
-
…as well as large-scale GPU computing facilities for deep learning. We are looking for a Research Engineer to manage the EEE GPU Cluster. The role will focus on enhancing the EEE GPU Cluster team's ability in terms…
-
…different types of parallelism, both at the level of a single compute node (CPU and GPU) and at the level of a PC cluster. This environment will include the tools needed to describe and build…
-
FLAME-GPU accelerated agent-based modelling of material response to environmental and operational loading | EPSRC CDT in Developing National Capability for Materials 4.0, with the Henry Royce…
-
…Language Model (LLM) GPU cluster to ensure stable and reliable operation of training tasks; (b) handle GPU node failures, InfiniBand (IB) network anomalies, CUDA/NCCL errors, and Kubernetes scheduling failures, perform…