Particle.news

Google Rolls Out Gemini 3.1 Flash‑Lite in Preview for Low‑Cost, Low‑Latency AI Workloads

Built for high‑volume developer tasks, it introduces thinking‑level controls with Google‑reported speed gains over Gemini 2.5 Flash.

Overview

  • The model is available in preview through the Gemini API in Google AI Studio and for enterprises via Vertex AI, with early users including Latitude, Cartwheel and Whering.
  • Pricing is set at $0.25 per 1 million input tokens and $1.50 per 1 million output tokens.
  • Google reports a 2.5x faster time to first token and a 45% increase in output speed versus Gemini 2.5 Flash, citing the Artificial Analysis benchmark.
  • Published results list an Arena.ai Elo score of 1432, 86.9% on GPQA Diamond and 76.8% on MMMU Pro.
  • Developers can adjust built‑in thinking levels to balance latency, cost and depth for tasks such as translation, content moderation, UI generation and simulations.