Why it matters
This is production agent plumbing, not demos: memory, vector retrieval, and caching are what make agents fast and reliable at scale. A support-agent reference design handling hundreds of MCP tools is a concrete template for shipping on AWS.
The tokenmaxxing angle
Semantic caching is one of the cleanest token-cost levers there is: repeat and near-duplicate questions are served from Redis instead of re-running the LLM. A talk explicitly about cutting redundant LLM calls is squarely about inference economics.
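To make the idea concrete, here is a minimal sketch of a semantic cache, not the speaker's or Redis's implementation. It assumes a toy bag-of-words embedding and an in-memory list in place of a real embedding model and a Redis vector index. The names `SemanticCache`, `CountingLLM`, and `answer`, and the 0.8 similarity threshold, are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding (assumption): lowercase word counts.
    A real system would call an embedding model instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """In-memory stand-in for a vector store such as Redis (assumption)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # hypothetical cutoff for a "close enough" match
        self.entries = []           # list of (embedding, cached answer)

    def lookup(self, question):
        """Return the cached answer for the most similar stored question,
        or None if nothing reaches the threshold."""
        query = embed(question)
        best_score, best_answer = 0.0, None
        for vector, cached in self.entries:
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_answer = score, cached
        return best_answer if best_score >= self.threshold else None

    def store(self, question, response):
        self.entries.append((embed(question), response))


class CountingLLM:
    """Stub LLM that counts how often it is actually invoked."""

    def __init__(self):
        self.calls = 0

    def __call__(self, question):
        self.calls += 1
        return f"answer to: {question}"


def answer(cache, llm, question):
    """Check the cache first; call the LLM only on a miss."""
    cached = cache.lookup(question)
    if cached is not None:
        return cached
    response = llm(question)
    cache.store(question, response)
    return response
```

With this sketch, asking "How do I reset my password?" and then "how do i reset my password" triggers only one LLM call, because the second question matches the first above the threshold. An unrelated question such as "What is your refund policy?" misses the cache and goes to the model. The threshold is the main tuning knob: set it too low and users get stale or wrong answers, too high and the cache rarely hits.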
From the organizers
June 24, 4:30-6 PM EDT at 12 W 39th St, NYC; the talk runs 4:45-5:45 PM. The speaker is Yusuf Bahdur, Senior Partner Solutions Architect at Redis. Attendees must arrive 15 minutes early for security and need their full names on their Meetup profiles.