🏷️Topic

Multimodal

10 articles

First tracked: Jan 26, 2026

Last updated: Jun 21, 2026

Overview

Multimodal is a topic tracked in our intelligence system with 6 linked articles.

Latest Coverage

Launch HN: Poly (YC S22) – Cursor for Files

YC S22-backed Poly aims to replace traditional file explorers with AI-powered, cross-platform search and an agent, offering a 100GB free tier and 2TB for $10/month (plus small per-GB charges) and a public waitlist/demo.

Nov 20, 2025 · 1%

Gemma 4 12B: A unified, encoder-free multimodal model

Gemma 4 12B is presented as a unified, encoder-free multimodal model.

Jun 3, 2026 · 1%

Interaction Models

Thinking Machines unveils 'Interaction Models', a real-time, multimodal, two-model architecture with 200ms micro-turns and encoder-free fusion. It aims to embed interactivity directly into the model and is benchmarked against multiple rivals.

May 12, 2026 · 1%

Boosting multimodal inference performance by >10% with a single Python dict

A technical blog post shows a 16% throughput and ~11% end-to-end latency improvement in multimodal inference by caching CUDA IPC pool handles in a Python dict, reducing host-side overhead in SGLang.

May 9, 2026 · 1%

Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B

Parlor demonstrates real-time, on-device AI on an Apple M3 Pro using Gemma 4 E2B and Kokoro TTS, with end-to-end latency of ~2.5–3.0s and a ~2.6 GB model size, highlighting low server reliance and open-source licensing.

Apr 6, 2026 · 1%

Recall – local multimodal semantic search for your files

Recall is an open-source, local multimodal memory search that stores vectors on-device using Gemini Embedding 2 and ChromaDB, with a setup wizard and Raycast UI, and optional Google embedding API calls.

Apr 6, 2026 · 1%
