Milestone launches vision language model

Milestone Systems, a world leader in data-driven video technology, has released an advanced vision language model (VLM) specialising in traffic understanding and powered by Nvidia Cosmos Reason.

The VLM powers two new products: a Video Summarization tool for Xprotect Video Management Software and a VLM as a Service for third-party integrations.
Video Summarization for Xprotect allows users to search summaries of visual data and automates reporting.

Today’s video systems capture vast amounts of data, yet reviewing footage remains time-consuming and largely manual. With Milestone Systems’ new Video Summarization tool – a generative AI-powered

The Video Summarization tool is free to download and takes only a few minutes to install directly in the Xprotect Smart Client, and users pay only when they prompt the VLM. With Milestone’s Hafnia VLM as a Service (VLMaaS), developers, integrators and partners get API access to production-ready video intelligence built on Nvidia’s latest technology and fine-tuned on responsibly sourced data.

The VLMaaS helps developers create AI-powered solutions quickly without needing to set up, fine-tune or manage their own AI systems. It enhances any existing solution with generative AI, regardless of the level of analytics already in place. This makes it fast and simple to add advanced video intelligence features to applications, whether testing a minimum viable product (MVP) or scaling a platform.

With VLMaaS, AI and analytics development can be accelerated significantly, requiring up to 70 times less effort than fine-tuning a VLM to do the same work.

Key capabilities include:
● Access to a high-accuracy vision language model, fine-tuned on traffic-optimised data and built on Nvidia Cosmos Reason
● Follows prompt-based instructions for traffic-related operations
● API-first delivery – simple integration via HTTPS
● Fine-tuned models for US and EU markets, with more regions to follow
● Designed for building standalone solutions or integrating with the Milestone product portfolio
● Fine-tuned on 100% responsibly sourced training data with auditable data lineage, compliant with GDPR and the EU AI Act

Pricing for the VLMaaS is pay-per-use (based on API calls), meaning no large upfront investments or custom training costs.

Andrew Burnett, Acting Chief Technology Officer, Milestone Systems, said: “With the Vision Language Model as a Service and Video Summarization for Xprotect, we’re tackling some of the most challenging bottlenecks: video overload and time-consuming manual work. Operators get immediate insight directly within Xprotect; builders get API‑first access to production‑ready intelligence without bespoke training or heavy infrastructure.

Because this model is specialised for real-world traffic video and fine-tuned on responsibly sourced data, customers can trust the results, deploy with confidence, and enhance all existing solutions in place. It’s the fastest, most advanced and impactful path to turning video into actionable outcomes.”