GPT-IN-A-BOX Performance Evaluation: Nutanix Kubernetes Engine (NKE) on AMD EPYC 9004 Series
Nutanix Kubernetes Engine (NKE) is a platform that enables the deployment and management of Kubernetes clusters on Nutanix's hyper-converged infrastructure. It provides a streamlined, scalable, and secure environment for running containerized applications, integrating with Nutanix's suite of tools for infrastructure management, storage, and networking.
Kubernetes Version: 1.26.8-0
The AMD EPYC 9004 series, known as "Genoa," represents AMD's latest lineup of high-performance server processors, designed for the most demanding workloads across various industries.
AMD EPYC 9654: 4th Gen processor with 96 cores and 192 threads
This document provides a performance evaluation of a GPT-IN-A-BOX solution running on Nutanix Kubernetes Engine, powered by AMD EPYC 9004 series CPUs, for serving Large Language Model (LLM) inference with the open-source LLM Llama2. It outlines the key findings from Infobell IT's comprehensive and independent performance evaluation, conducted using Infobell IT's benchmarking tool, EchoSwift.
| LLM Model | SUT Configuration | Input and Output Token Combinations |
|---|---|---|
| Llama2 7B (int8) | 24-core CPU and 48 GB memory | |
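For context, the SUT allocation shown above can be expressed as Kubernetes resource requests on the pod that serves the model. The following is a minimal sketch using the official Kubernetes Python client; the deployment name, namespace, labels, and container image are hypothetical placeholders and are not details taken from the evaluation.

```python
# Minimal sketch: request 24 CPU cores and 48 GiB of memory for an LLM-serving pod.
# The deployment name, namespace, labels, and image below are hypothetical placeholders.
from kubernetes import client, config

def build_llm_deployment() -> client.V1Deployment:
    resources = client.V1ResourceRequirements(
        requests={"cpu": "24", "memory": "48Gi"},
        limits={"cpu": "24", "memory": "48Gi"},
    )
    container = client.V1Container(
        name="llama2-7b-int8",
        image="example.registry/llm-serving:latest",  # placeholder image
        resources=resources,
    )
    template = client.V1PodTemplateSpec(
        metadata=client.V1ObjectMeta(labels={"app": "llama2-7b"}),
        spec=client.V1PodSpec(containers=[container]),
    )
    spec = client.V1DeploymentSpec(
        replicas=1,
        selector=client.V1LabelSelector(match_labels={"app": "llama2-7b"}),
        template=template,
    )
    return client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="llama2-7b-int8"),
        spec=spec,
    )

if __name__ == "__main__":
    config.load_kube_config()  # uses the local kubeconfig pointing at the NKE cluster
    client.AppsV1Api().create_namespaced_deployment(
        namespace="default", body=build_llm_deployment()
    )
```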
EchoSwift is a specialized tool for benchmarking inference in Large Language Models (LLMs). It provides comprehensive performance and scalability assessments, measuring key metrics such as latency and throughput under varying parallel request loads. The insights it produces offer clear, actionable guidance, helping customers choose the optimal configuration for their LLMs and identify potential bottlenecks in their LLM deployments under load.
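EchoSwift's internals are not shown here. As an illustration of the kind of measurement it performs, the following is a minimal sketch that sends parallel requests to a hypothetical HTTP inference endpoint and records per-request latency and aggregate throughput; the endpoint URL, payload schema, prompt, token counts, and concurrency sweep are assumptions made purely for illustration.

```python
# Minimal sketch of latency/throughput measurement under parallel request load.
# This is NOT EchoSwift itself; the endpoint URL and payload schema are
# hypothetical assumptions used purely for illustration.
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://llm-service.example.local/generate"  # hypothetical endpoint

def send_request(prompt: str, max_tokens: int) -> float:
    """Send one inference request and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    requests.post(
        ENDPOINT,
        json={"prompt": prompt, "max_tokens": max_tokens},  # assumed payload schema
        timeout=300,
    )
    return time.perf_counter() - start

def run_load(parallel_users: int, requests_per_user: int) -> None:
    """Fire parallel_users concurrent streams and report latency and throughput."""
    prompts = ["Summarize the benefits of hyper-converged infrastructure."] * (
        parallel_users * requests_per_user
    )
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallel_users) as pool:
        latencies = list(pool.map(lambda p: send_request(p, max_tokens=256), prompts))
    wall_time = time.perf_counter() - wall_start

    print(f"parallel users     : {parallel_users}")
    print(f"mean latency (s)   : {statistics.mean(latencies):.2f}")
    print(f"p95 latency (s)    : {sorted(latencies)[int(0.95 * len(latencies)) - 1]:.2f}")
    print(f"throughput (req/s) : {len(latencies) / wall_time:.2f}")

if __name__ == "__main__":
    for users in (1, 3, 10, 30):  # example concurrency sweep
        run_load(parallel_users=users, requests_per_user=5)
```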