Milestone: successful AI will depend on ethically sourced data

As AI continues its global adoption, more organisations are realising the need for data that’s accurate, up-to-date, and trusted by stakeholders. Data is the fuel for our business decisions, our strategies, and even for training AI models themselves. The choice of source and quality of that data is therefore essential for the successful implementation of AI technology.

Having a rich source of data is a critical differentiator for businesses today. Traditionally, according to Milestone Systems, the data used to train AI models has been scraped from somewhere, be that publicly available sources such as the Internet, or elsewhere. Of course, that can lead to untrusted, questionable outcomes based on the quality of data being inputted.

Plus, in the race to create the very best models, AI developers can face legal and ethical issues around the use of that data. In some cases, this has led to significant court cases around copyright and data consent.

That’s where video data and its metadata are an invaluable asset, offering on-the-ground insights into movement patterns, dwell times, behaviour, peak times, and more, that impact everything from sales and promotions to maintenance and cleaning.

A recent project by Milestone Systems, called Project Hafnia, showcases what’s possible for training machine learning and deep learning models in complex activities and objects. Project Hafnia carefully curates and tags vast swathes of data and delivers it as a ready-made package for AI model training, streamlining the process of developing AI models. The data is recycled, anonymised and tidied to provide high-quality, traceable data.

The key with Project Hafnia data is that it is responsibly sourced, meaning users can rest assured that their training meets the various regulations required of data controllers and processors across Europe, the UK and beyond. As Søren Rågård Jensen, Executive Product Enablement Manager at Milestone Systems, explains, “The foundation of Hafnia is a fully compliant, responsibly sourced video data library… It’s a video data library at a scale that is more data than any solution developer of advanced video analytics can source themselves.”

Four considerations for data usage
Project Hafnia highlights several areas that business leaders must be aware of when using data:
1. Data quality and timeliness: If the input isn’t accurate or up-to-date with current conditions, then the training of the AI model, and the results it produces, will be unreliable. Decisions cannot be made on untrustworthy results, and this will impact ongoing trust in AI and data use in your organisation. Inaccuracies can quickly compound, snowballing until decision-making and operations are threatened.
2. Sensitive information: Datasets scraped from questionable sources may inadvertently include sensitive information, harmful content or misinformation. If the dataset isn’t broad and diverse enough, it can lead to biased results. The dataset may not, by default, protect the privacy of individuals whose data has been used, with their permission or otherwise.
3. Resource-intensive: The broader the dataset, the more manual curation and cleaning it has traditionally required. The more humans involved in creating a dataset, the greater the risk of human error causing quality issues and inaccuracies.
4. Legislation: Organisations need to ensure that they comply with GDPR and other region-specific legislation, including the incoming EU AI Act, both in terms of sensitive information and data lineage. Emerging regulations may require companies to disclose dataset provenance, at least in broad terms. This can be difficult for AI developers to do.

When everything aligns, and business leaders have the right data sources at their disposal, the results can be staggering. In the smart city of Genoa, extensive use of cameras is supporting traffic management, road safety, civil protection, and emergency response. The city has entered into a data licensing agreement with Milestone Hafnia and has provided a large amount of traffic video footage. This data fine-tunes the largest vision-language model for traffic, enabling full end-to-end text summarisation of events in a video clip without immediate human intervention. The data is always protected, with videos remaining on the Milestone platform.

Applied to other industries, this could see video becoming a key asset for retailers, in planning store layout, promotional messaging, product placement and even opening hours. Video insights can help leaders understand footfall at specific times, to inform staffing, as well as movement within a store, so high-profit items are placed in the busiest areas. Anonymisation and annotation within the video management platform mean sensitive data isn’t shared, and data provenance can be tracked.

Being a frontrunner in this century will be determined by the data you use and the AI models trained on it. If every decision and automation is to be based on data, you must ensure that the data is high-quality, trustworthy, and ethical.

“In the future, I think we will have a rubber stamp saying, ‘ethically trained,’” states Søren. “That is really high up on our agenda and part of the responsible technology agenda being driven by Milestone Systems. I think this can be a new direction for technology, not just because it can be done, but because it is creating a better world.”