Liquid AI unveils LFM2.5-8B-A1B, an 8B-parameter MoE edge model with a 128K context window, 38T tokens of pretraining, an expanded tokenizer, and strong on-device benchmark results and tool-calling capabilities.
Thinking Machines unveils 'Interaction Models', a real-time, multimodal, two-model architecture with 200ms micro-turns and encoder-free fusion. The design aims to build interactivity directly into the model and is benchmarked against multiple rivals.
A technical blog post reports a 16% throughput gain and a ~11% end-to-end latency improvement in SGLang multimodal inference from caching CUDA IPC pool handles in a Python dict, which reduces host-side overhead.
SGLang has spun out as RadixArk at a $400M valuation, backed by Accel; the company builds on Ion Stoica’s open-source work at UC Berkeley.
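The handle-caching item above describes a general memoization pattern: pay the cost of opening a shared-memory handle once per pool, then serve later requests from a dict lookup. The following is a minimal sketch of that pattern only, not SGLang's actual code. `HandleCache`, `fake_open_pool_handle`, and the `b"pool-0"` key are hypothetical names chosen for illustration, and the "open" call is a plain-Python stand-in so the example runs without a GPU.

```python
from typing import Callable, Dict, Tuple


class HandleCache:
    """Memoize an expensive 'open handle' call, keyed by the handle bytes.

    Hypothetical helper for illustration; not part of SGLang or PyTorch.
    """

    def __init__(self, open_fn: Callable[[bytes], object]) -> None:
        self._open_fn = open_fn
        self._cache: Dict[bytes, object] = {}
        self.misses = 0  # how many times the expensive call actually ran

    def get(self, key: bytes) -> object:
        try:
            # Fast path: one dict lookup, no driver or IPC work.
            return self._cache[key]
        except KeyError:
            # Slow path: open the handle once and remember the result.
            self.misses += 1
            handle = self._open_fn(key)
            self._cache[key] = handle
            return handle


def fake_open_pool_handle(key: bytes) -> Tuple[str, bytes]:
    # Stand-in (assumption) for the costly call that maps a remote
    # memory pool into this process.
    return ("mapped-pool", key)


cache = HandleCache(fake_open_pool_handle)

# Many requests referencing the same pool trigger only one real open.
handles = [cache.get(b"pool-0") for _ in range(1000)]
```

A real implementation would also need to evict or invalidate entries when the producer process frees a pool, which this sketch leaves out.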