-
scientific computing, parallel storage and file systems. Excellent communication and project management skills, and the ability to navigate a complex academic environment and work collaboratively with individuals
-
Lab conducts research on a variety of computer systems topics including HPC resilience, data center power management, large-scale job scheduling and performance tuning, parallel storage systems and scientific
-
-throughput parallelism for a uniquely transparent large-scale LLM inference service. Service autodeployment for public scientific use on NCSA HPC GPU clusters. Robustness and monitoring for reliable serving
-
in developing the National Deep Inference Fabric, an open-source deep learning interpretability research computing infrastructure project. You will be responsible for full stack development, doing both