Overview
Join us for our latest vLLM meetup, where we'll explore the forefront of efficient large language model (LLM) inference and serving. As an open-source powerhouse for high-throughput, memory-efficient LLM deployment, vLLM continues to evolve rapidly, and this event is your opportunity to stay ahead of the curve.
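If you're new to the project, here is a minimal offline-inference sketch using vLLM's documented Python API. The model name below is just an illustrative small checkpoint, and a GPU-backed environment with vLLM installed is assumed:

```python
# Minimal vLLM offline-inference sketch (assumes `pip install vllm` and a GPU).
from vllm import LLM, SamplingParams

# Small checkpoint chosen purely for illustration; swap in your own model.
llm = LLM(model="facebook/opt-125m")

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The key to efficient LLM serving is",
    "High-throughput inference requires",
]

# generate() batches all prompts together and returns one result per prompt.
for output in llm.generate(prompts, sampling_params):
    print(f"Prompt: {output.prompt!r}")
    print(f"Completion: {output.outputs[0].text!r}")
```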
Course Description & Learning Outcomes
Key Highlights
Latest Developments in vLLM: The core vLLM team will showcase recent advances in the vLLM library, spotlighting new features for throughput optimization, enhanced memory management, and seamless integration with emerging hardware ecosystems that push the boundaries of LLM performance.
AMD's Talk on Optimizing vLLM Inference: AMD will deliver a deep dive into best practices for optimizing vLLM inference on AMD Data Center GPUs, sharing insights and techniques to maximize performance and efficiency in AI model deployment.
WEKA's Augmented Memory Grid (AMG): WEKA will present their innovative AMG, a high-performance, persistent key-value cache layer leveraging GPUDirect Storage (GDS). AMG integrates with WEKA's NeuralMesh infrastructure to deliver unparalleled performance for AI workloads.
What You'll Walk Away With
By the end of this meetup, you'll have a deeper understanding of the evolving vLLM ecosystem, practical knowledge of integrating advanced KV cache solutions like AMG, and fresh ideas for improving your LLM serving infrastructure. This is more than a technical deep dive: it's a chance to network with fellow developers, maintainers, and industry leaders, fostering collaborations that drive open-source innovation forward.
Don't miss out—RSVP today and be part of the conversation shaping the future of efficient AI inference!
The agenda is subject to change.
Schedule
Date: Wednesday, 27 Aug 2025
Time: 6:00 PM - 8:30 PM (GMT+8:00, Kuala Lumpur/Singapore)
Location: 32 Carpenter Street, Singapore 059911
Agenda
| Time | Activity / Description |
|---|---|
| 6:00 PM - 6:15 PM | Registration |
| 6:15 PM - 6:45 PM | Introduction to vLLM & vLLM v1 Development Insights. An insightful introduction to vLLM, featuring a major vLLM contributor sharing their experiences and key developments from vLLM v1. |
| 6:45 PM - 7:00 PM | Optimizing vLLM Inference on AMD Data Center GPUs: Best Practices & Deep Dive. Advanced techniques and best practices for maximizing the performance of vLLM inference workloads on AMD Data Center GPUs. |
| 7:00 PM - 7:30 PM | WEKA Presents: Augmented Memory Grid (AMG). Discover WEKA's AMG, a high-performance, persistent KV cache layer powered by GPUDirect Storage, designed to optimize your data workflows. |
| 7:30 PM - 8:30 PM | Q&A, Dinner & Networking. Cap off the evening with an interactive Q&A session, followed by dinner and an excellent opportunity to network with fellow attendees and speakers. |
Skills Covered
PROFICIENCY LEVEL GUIDE
Beginner: Introduces the subject without requiring any prerequisites.
Proficient: Requires learners to have prior knowledge of the subject.
Expert: Involves an advanced, more complex understanding of the subject.
- Cloud Computing (Proficiency level: Proficient)
- Machine Learning (Proficiency level: Proficient)
- Computer Science (Proficiency level: Proficient)
- Deep Learning (Proficiency level: Proficient)
- DevOps (Proficiency level: Proficient)
- Software Development (Proficiency level: Proficient)
Partners