Inference is a topic tracked in our intelligence system with 5 linked articles.
A DIY install of a Tesla V100 SXM2 datacenter GPU in a gaming PC, using an SXM2-to-PCIe adapter, yields 32GB of total VRAM and ~32 tokens/sec of local LLM inference for ~£200, with caveats on fan noise and software compatibility.
Liquid AI unveils LFM2.5-8B-A1B, an 8B-parameter MoE edge model with 128K context, 38T-token pretraining, an expanded tokenizer, and strong on-device benchmark results and tool-calling capabilities.
TechCrunch pieces lay out the hard funding and data-center economics of AI compute, highlighting XCENA’s memory-centric MX1 chip and multiple seed and Series B rounds, and point to a shift in AI inference architecture alongside RSI/AGI discourse.
XCENA raises $135M Series B at a $570M valuation to back memory-centric AI hardware (MX1) with mass production expected by late 2026 and revenue in 2027.
AI-driven agent workloads are forcing a redesign of cloud infrastructure, as seen in AWS OpenSearch Serverless for agentic tasks, massive AI-chip and storage deals (Snowflake/AWS), and record funding rounds (Anthropic), all signaling a shift toward machine-generated internet traffic.
The piece argues Anthropic and OpenAI have achieved product-market fit, citing rising enterprise spend, API-pricing shifts, and large-scale inference budgets, with IPO pressures influencing pricing and sales strategy.