CGW has officially announced Puget Systems’ Custom Generative AI and Machine Learning server as a recipient of a CGW Silver Edge Award for SIGGRAPH 2023. Selected by CGW's editorial team, the Silver Edge Awards recognize the most innovative and impressive technologies announced or on display at the annual conference.
Out of the many exhibitors participating in SIGGRAPH’s 50th annual conference this year, CGW selected 10 companies for their outstanding technological achievements. These best-of-show selections demonstrate remarkable advancements in the field of computer graphics.
About Puget Systems’ Custom Generative AI and Machine Learning Server
Puget Systems debuted a custom Generative AI and Machine Learning server at SIGGRAPH 2023. Configured with four NVIDIA RTX 6000 Ada graphics cards, the specialized new server is designed to handle intensive generative AI and machine learning and to effectively manage real-time rendering, graphics, AR/MR/VR/XR, compute, and deep learning processing.
The AI Training and Inference server is a rackmount workstation capable of hosting a web-based chat server for multiple simultaneous users, using state-of-the-art (SOTA) large language models (LLMs) such as Meta's Llama-2-70b. Puget Systems Labs tested this configuration extensively with Llama-2-70b and Falcon-40b. (Falcon-40b requires less memory and can run on only two RTX 6000 Ada GPUs.) In addition to running a chat interface, this hardware is also suitable for base model fine-tuning within the available GPU memory limits.
For these tests, the Puget Systems Labs team used a full set of four NVIDIA RTX 6000 Ada graphics cards, running Meta’s Llama-2-70b-chat-hf with the HuggingFace Text-Generation-Inference (TGI) server and HuggingFace ChatUI. The test model used approximately 130GB of video memory (VRAM), and the team confirmed that the system should work well with other LLMs that fit within available GPU memory (192GB with four cards installed).
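The memory figures above can be sanity-checked with some quick arithmetic. The following Python sketch is illustrative only and rests on assumptions the article does not state: 16-bit weights (2 bytes per parameter), nominal parameter counts taken from the model names (70 billion and 40 billion), and weights only, ignoring the key-value cache and other runtime overhead. The per-card figure of 48GB is simply 192GB divided by four cards.

```python
# Rough VRAM sizing sketch. Assumptions NOT stated in the article:
# 16-bit weights, nominal parameter counts, weights only (no KV cache).
BYTES_PER_PARAM = 2            # assumed fp16/bf16 weights
TOTAL_VRAM_GB = 192            # from the article: four cards installed
CARDS_INSTALLED = 4
VRAM_PER_CARD_GB = TOTAL_VRAM_GB / CARDS_INSTALLED  # 48 GB per card


def weight_bytes(params_billions: float) -> float:
    """Approximate size of the model weights in bytes."""
    return params_billions * 1e9 * BYTES_PER_PARAM


def fits_in_cards(params_billions: float, cards: int) -> bool:
    """True if the weights alone fit in the combined VRAM of `cards` GPUs."""
    return weight_bytes(params_billions) <= cards * VRAM_PER_CARD_GB * 1e9


llama_gib = weight_bytes(70) / 2**30  # binary gigabytes (GiB)
print(f"Llama-2-70b weights: about {llama_gib:.0f} GiB")
print("Llama-2-70b on 4 cards:", fits_in_cards(70, 4))
print("Falcon-40b on 2 cards:", fits_in_cards(40, 2))
```

Under these assumptions, the 70-billion-parameter weights come to roughly 140 billion bytes, which is in the same range as the approximately 130GB reported, and a 40-billion-parameter model fits within two cards, consistent with the Falcon-40b note. Real usage also depends on context length and the number of concurrent users, so this is a lower bound rather than a prediction.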
Notable performance stats from the testing:
Measured response under typical usage:
- Validation Time = 0.59673 ms
- Queue Time = 0.17409 ms
- Time per Token = 54.558 ms
Stress tested with multiple concurrent users:
- Data below is from a session with 114 prompts (20-30 users) over 5 minutes
Average prompt response under multi-user load:
- Validation Time = 3.0312 ms
- Queue Time = 4687.9 ms
- Time per Token = 68.076 ms
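To put these timings in more familiar terms, the short Python sketch below converts time per token into tokens per second and estimates how long a full reply might take. It is an illustration, not part of Puget Systems' testing: the additive latency model (validation plus queue plus generation) and the 200-token reply length are assumptions chosen for the example.

```python
# Timings reported in the article, in milliseconds.
TYPICAL = {"validation_ms": 0.59673, "queue_ms": 0.17409, "ms_per_token": 54.558}
LOADED = {"validation_ms": 3.0312, "queue_ms": 4687.9, "ms_per_token": 68.076}

REPLY_TOKENS = 200  # hypothetical reply length, not from the article


def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency into generation throughput."""
    return 1000.0 / ms_per_token


def response_time_ms(validation_ms: float, queue_ms: float,
                     ms_per_token: float, n_tokens: int) -> float:
    """Assumed additive model: validation + queue + token generation."""
    return validation_ms + queue_ms + n_tokens * ms_per_token


for label, stats in (("typical", TYPICAL), ("multi-user", LOADED)):
    rate = tokens_per_second(stats["ms_per_token"])
    total_s = response_time_ms(**stats, n_tokens=REPLY_TOKENS) / 1000.0
    print(f"{label}: {rate:.1f} tokens/s, "
          f"about {total_s:.1f} s for a {REPLY_TOKENS}-token reply")
```

Read this way, generation speed drops only modestly under load (from roughly 18 to roughly 15 tokens per second), while most of the added delay comes from the time a prompt spends waiting in the queue.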