Why it matters
Running agents locally with open-source models removes per-token API charges, leaving hardware, power, and operations as the remaining costs. For high-throughput workloads, that fixed-cost profile can undercut cloud API pricing, in favorable cases by an order of magnitude.
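The comparison comes down to a break-even calculation, sketched below. Every number in it is a hypothetical placeholder, not a quoted price or a figure from the talk, and the function names are illustrative.

```python
def monthly_cloud_cost(tokens_per_month: int, price_per_million_tokens: float) -> float:
    """Cloud spend for a month, given a flat per-million-token price."""
    return tokens_per_month / 1_000_000 * price_per_million_tokens


def break_even_tokens(fixed_monthly_cost: float, price_per_million_tokens: float) -> float:
    """Monthly token volume at which a fixed local cost equals cloud spend."""
    return fixed_monthly_cost / price_per_million_tokens * 1_000_000


# Hypothetical inputs: substitute your own hardware and API figures.
LOCAL_FIXED_MONTHLY = 500.0   # assumed amortized hardware + power, per month
CLOUD_PRICE_PER_M = 2.0       # assumed blended price per million tokens

threshold = break_even_tokens(LOCAL_FIXED_MONTHLY, CLOUD_PRICE_PER_M)
print(f"Local inference breaks even above {threshold:,.0f} tokens/month")
```

Below the threshold the cloud API is cheaper under these assumptions. Above it, each additional token widens the gap in favor of local inference.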
The tokenmaxxing angle
The talk addresses the build-vs-buy tradeoff for inference. Moving repetitive agent tasks from a cloud API to a self-hosted vLLM endpoint can eliminate the per-token charges on that traffic, which makes local-first routing a core FinOps strategy.
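A minimal sketch of what local-first routing could look like, not the speaker's implementation. The endpoint URLs, the task categories, and the `choose_endpoint` helper are all assumptions made for illustration. The local URL assumes a vLLM OpenAI-compatible server listening on port 8000.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    base_url: str
    cost_per_million_tokens: float  # hypothetical marginal cost


# Assumed endpoints: adjust to your own deployment.
LOCAL = Endpoint("local-vllm", "http://localhost:8000/v1", 0.0)
CLOUD = Endpoint("cloud-api", "https://api.example.com/v1", 2.0)

# Assumed set of repetitive task kinds that a local model handles well.
REPETITIVE_TASKS = frozenset({"classification", "extraction", "summarization"})


def choose_endpoint(task_kind: str, local_healthy: bool) -> Endpoint:
    """Prefer the local endpoint for repetitive tasks; otherwise use the cloud."""
    if local_healthy and task_kind in REPETITIVE_TASKS:
        return LOCAL
    return CLOUD


print(choose_endpoint("extraction", local_healthy=True).name)
```

Routing on task kind keeps the policy easy to audit. A production router would likely also weigh latency, context length, and output quality.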
From the organizers
Speaker Legare Kerrison is a Developer Advocate on Red Hat's AI team who works with OpenShift AI and vLLM, per the event page. The talk is titled 'Running Agents Locally with Open Source: From Laptop to Kubernetes.'