AI-in-a-Box Performance Evaluation of Large Language Models (LLMs) using NVIDIA NIM on AWS EKS with NVIDIA A10G GPUs
This document presents performance benchmark results for the NVIDIA NIM Llama 3 8B model running on an Amazon EKS cluster powered by NVIDIA A10G Tensor Core GPUs. It outlines the key findings from Infobell IT's independent performance benchmarking of the model, conducted with Infobell IT's benchmarking tool, EchoSwift.
NVIDIA NIM
NVIDIA NIM is a set of easy-to-use microservices that accelerates the deployment of foundation models on any cloud or data center while keeping your data secure. NIM microservices ship with production-grade runtimes, including ongoing security updates.
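A deployed NIM for LLMs exposes an OpenAI-compatible HTTP API, so it can be queried with standard tooling once the service is reachable. The sketch below is a minimal illustration only; the endpoint URL, port, and model identifier are assumptions and should be replaced with the values reported by your own deployment.

```python
# Minimal sketch of querying a deployed NIM endpoint over its OpenAI-compatible
# HTTP API. The host, port (8000), and model name are illustrative assumptions;
# substitute the values exposed by your own deployment (see its /v1/models endpoint).
import requests

NIM_URL = "http://localhost:8000/v1/chat/completions"  # assumed local port-forward

payload = {
    "model": "meta/llama3-8b-instruct",  # assumed model id
    "messages": [{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
    "max_tokens": 128,
}

response = requests.post(NIM_URL, json=payload, timeout=60)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```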
NVIDIA A10G Tensor Core GPU
Built on the NVIDIA Ampere architecture, the NVIDIA A10G GPU is designed to accelerate AI, machine learning, and data-intensive workloads, offering strong performance and efficiency. It is well suited for businesses that need high-speed AI applications and energy-efficient virtualized environments.
Amazon Elastic Kubernetes Service (EKS)
Amazon Elastic Kubernetes Service (EKS) is a managed Kubernetes service for running Kubernetes workloads on AWS infrastructure, taking advantage of its security, availability, and scalability. It integrates with other AWS services such as EC2 for compute instances, VPC for networking, and IAM for role and permission management.
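Before deploying the NIM workload, the EKS cluster and its GPU node group can be inspected programmatically with boto3. The sketch below is illustrative; the cluster name and region are hypothetical placeholders, not the cluster used in this benchmark.

```python
# Illustrative check of an existing EKS cluster with boto3. The cluster name
# "nim-benchmark-cluster" and the region are hypothetical placeholders.
import boto3

CLUSTER = "nim-benchmark-cluster"
eks = boto3.client("eks", region_name="us-west-2")

cluster = eks.describe_cluster(name=CLUSTER)["cluster"]
print("Cluster status:", cluster["status"])        # expect "ACTIVE"
print("Kubernetes version:", cluster["version"])

# List managed node groups, e.g. the GPU node group backing the NIM pods.
for ng in eks.list_nodegroups(clusterName=CLUSTER)["nodegroups"]:
    detail = eks.describe_nodegroup(clusterName=CLUSTER, nodegroupName=ng)["nodegroup"]
    print(ng, detail["instanceTypes"], detail["status"])
```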
| LLM Model | SUT Config | Input and Output Token Combinations |
|---|---|---|
| NVIDIA NIM Llama3 8B | NVIDIA A10G GPU, 8 CPU cores, 32 GB memory | |
EchoSwift
EchoSwift is a specialized tool for benchmarking inference in Large Language Models (LLMs). It provides comprehensive performance and scalability assessments, measuring key metrics such as latency and throughput under varying parallel request loads. These insights give customers clear, actionable guidance for choosing the optimal configuration for their LLM deployments and help identify potential bottlenecks under load.
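The sketch below is not EchoSwift itself; it is a simplified illustration of the kind of measurement the tool performs: driving an LLM endpoint with varying numbers of parallel requests and recording latency and request throughput. The endpoint URL, model name, prompt, and load levels are assumptions for illustration only.

```python
# Not EchoSwift itself: a simplified sketch of the measurement it performs --
# issuing parallel requests against an LLM endpoint and recording latency and
# throughput. The endpoint URL, model name, and load levels are assumptions.
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests

ENDPOINT = "http://localhost:8000/v1/completions"  # assumed NIM endpoint
MODEL = "meta/llama3-8b-instruct"                  # assumed model id

def one_request(prompt: str, max_tokens: int) -> float:
    """Send a single completion request and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    r = requests.post(
        ENDPOINT,
        json={"model": MODEL, "prompt": prompt, "max_tokens": max_tokens},
        timeout=120,
    )
    r.raise_for_status()
    return time.perf_counter() - start

def run_load(parallel_users: int, requests_per_user: int = 4) -> None:
    """Measure mean latency and request throughput at a given concurrency level."""
    prompt = "Explain Kubernetes in a few sentences."
    total = parallel_users * requests_per_user
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallel_users) as pool:
        latencies = list(pool.map(lambda _: one_request(prompt, 128), range(total)))
    elapsed = time.perf_counter() - start
    print(
        f"{parallel_users:>3} users | mean latency {statistics.mean(latencies):.2f}s | "
        f"throughput {total / elapsed:.2f} req/s"
    )

for users in (1, 3, 10, 30):  # varying parallel request loads
    run_load(users)
```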