DeepSeek 4 Flash local inference engine for Metal
ds4.c is a Metal-only local inference engine for DeepSeek V4 Flash. It supports a 1M-token context, 2-bit quantization, and a disk-backed KV cache, and exposes OpenAI- and Anthropic-compatible local APIs. The code is alpha quality and requires a very large amount of RAM on macOS.
May 7, 2026