Multimodal is a topic tracked in our intelligence system with 6 linked articles.
Gemma 4 12B is presented as a unified, encoder-free multimodal model.
Thinking Machines unveils 'Interaction Models', a real-time, multimodal, two-model architecture with 200 ms micro-turns and encoder-free fusion. It aims to embed interactivity directly into the model and is benchmarked against multiple rivals.
A technical blog post shows a 16% throughput and ~11% end-to-end latency improvement in multimodal inference by caching CUDA IPC pool handles in a Python dict, reducing host-side overhead in SGLang.
Parlor demonstrates real-time, on-device AI on Apple M3 Pro using Gemma 4 E2B and Kokoro TTS with end-to-end latency of ~2.5–3.0 s and ~2.6 GB model size, highlighting low server reliance and open-source licensing.
Recall is an open-source, local multimodal memory search that stores vectors on-device using Gemini Embedding 2 and ChromaDB, with a setup wizard and Raycast UI, and optional Google embedding API calls.
Google's AI Edge Gallery adds Gemma 4 support for on-device LLMs on iPhone with offline privacy, while detailing data collection and EU compliance posture.
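The handle-caching idea in the SGLang item can be illustrated with a minimal sketch. It is not the blog post's code and makes no real CUDA or SGLang calls: `HandleCache` and `fake_open_ipc_handle` are hypothetical names, and the opener is a stand-in for whatever expensive host-side call opens an IPC handle. The point is only that a plain dict lookup replaces repeated opens of the same handle.

```python
from typing import Callable, Dict, Hashable


class HandleCache:
    """Memoize an expensive 'open handle' call in a plain dict."""

    def __init__(self, opener: Callable[[Hashable], object]) -> None:
        self._opener = opener
        self._cache: Dict[Hashable, object] = {}
        self.misses = 0  # counts how often the expensive opener actually ran

    def get(self, key: Hashable) -> object:
        try:
            # Fast path: the handle was opened before, reuse it.
            return self._cache[key]
        except KeyError:
            # Slow path: open once, then remember the result.
            self.misses += 1
            handle = self._opener(key)
            self._cache[key] = handle
            return handle


def fake_open_ipc_handle(key: Hashable) -> dict:
    # Hypothetical stand-in for the costly host-side open call.
    return {"opened": key}


cache = HandleCache(fake_open_ipc_handle)
first = cache.get(("device0", "pool-a"))
second = cache.get(("device0", "pool-a"))  # served from the dict, no reopen
```

A real implementation would also need to decide when cached handles become invalid and must be evicted; this sketch never evicts.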