Overview
Join us for our latest vLLM meetup, where we'll explore the forefront of efficient large language model (LLM) inference and serving. As an open-source powerhouse for high-throughput, memory-efficient LLM deployment, vLLM continues to evolve rapidly, and this event is your opportunity to stay ahead of the curve.
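If you're new to the project, here is a minimal offline-inference sketch using vLLM's documented Python API. The model name below is just an illustrative small checkpoint, and a GPU-backed environment with vLLM installed is assumed:

```python
# Minimal vLLM offline-inference sketch (assumes `pip install vllm` and a GPU).
from vllm import LLM, SamplingParams

# Small checkpoint chosen purely for illustration; swap in your own model.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The key to efficient LLM serving is",
    "High-throughput inference requires",
]

# generate() batches all prompts together and returns one result per prompt.
for output in llm.generate(prompts, sampling_params):
    print(f"Prompt: {output.prompt!r}")
    print(f"Completion: {output.outputs[0].text!r}")
```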
Course Description & Learning Outcomes
Key Highlights
Latest Developments in vLLM: The core vLLM team will showcase recent advances in the vLLM library, spotlighting new features for throughput optimization, enhanced memory management, and seamless integration with emerging hardware ecosystems that push the boundaries of LLM performance.
AMD's Talk on Optimizing vLLM Inference: AMD will deliver a deep dive into best practices for optimizing vLLM inference on AMD Data Center GPUs, sharing insights and techniques to maximize performance and efficiency in AI model deployment.
WEKA's Augmented Memory Grid (AMG): WEKA will present their innovative AMG, a high-performance, persistent key-value cache layer leveraging GPUDirect Storage (GDS). AMG integrates with WEKA's NeuralMesh infrastructure to deliver unparalleled performance for AI workloads.
What You'll Walk Away With
By the end of this meetup, you'll have a deeper understanding of the evolving vLLM ecosystem, practical knowledge of integrating advanced KV cache solutions like AMG, and fresh ideas for improving your LLM serving infrastructure. This is more than a technical deep dive: it's a chance to network with fellow developers, maintainers, and industry leaders, fostering collaborations that drive open-source innovation forward.
Don't miss out—RSVP today and be part of the conversation shaping the future of efficient AI inference!
The agenda is subject to change.
Schedule
Date: Wednesday, 27 Aug 2025
Time: 6:00 PM - 8:30 PM (GMT+8:00, Kuala Lumpur/Singapore)
Location: 32 Carpenter Street, Singapore 059911
Agenda
| Time | Activity / Description |
|---|---|
| 6:00 PM - 6:15 PM | Registration |
| 6:15 PM - 6:45 PM | Introduction to vLLM & vLLM v1 Development Insights. An insightful introduction to vLLM, featuring a major vLLM contributor sharing their experiences and key developments from vLLM v1. |
| 6:45 PM - 7:00 PM | Optimizing vLLM Inference on AMD Data Center GPUs: Best Practices & Deep Dive. Advanced techniques and best practices for maximizing the performance of vLLM inference workloads on AMD Data Center GPUs. |
| 7:00 PM - 7:30 PM | WEKA Presents: Augmented Memory Grid (AMG). Discover WEKA's AMG, a high-performance, persistent KV cache layer powered by GPUDirect Storage, designed to optimize your data workflows. |
| 7:30 PM - 8:30 PM | Q&A, Dinner & Networking. Cap off the evening with an interactive Q&A session, followed by dinner and an excellent opportunity to network with fellow attendees and speakers. |
Skills Covered
PROFICIENCY LEVEL GUIDE
Beginner: Introduces the subject without requiring any prerequisites.
Proficient: Requires learners to have prior knowledge of the subject.
Expert: Involves an advanced, more complex understanding of the subject.
- Cloud Computing (Proficiency level: Proficient)
- Machine Learning (Proficiency level: Proficient)
- Computer Science (Proficiency level: Proficient)
- Deep Learning (Proficiency level: Proficient)
- DevOps (Proficiency level: Proficient)
- Software Development (Proficiency level: Proficient)
Partners