The benefits and challenges of in-camera audio analytics for surveillance solutions


Audio is often overlooked in the security and video surveillance industry. Audio plays a key role in some intercom installations, but it is rarely considered as part of security and event management. Audio takes a back seat in many security systems because recording sound from a surveillance camera raises different privacy concerns for the people being monitored than recording video does.

Audio surveillance is therefore subject to strict laws that vary from state to state. Many states require a clearly posted sign indicating that audio recording is taking place before a person enters the area. Analytic information derived from audio can be a useful tool and, when implemented correctly, can largely remove concerns over privacy and legal compliance.

Audio analytics processed in the camera has been a niche, specialised area for many installers and end users. This may be due to state laws governing audio recording; however, audio analytics at the edge can avoid many of those legal challenges because the audio itself never leaves the camera.

Processing audio analytics in-camera provides strong privacy protection because the audio is analysed internally by algorithms that only compare and assess its content; only the results leave the camera. Processing audio analytics at the edge also reduces latency compared with any system that must send raw audio to an on-premises or cloud server for analysis. Audio analytics can quickly pinpoint zones that security staff should focus on, which can dramatically shorten response times to incidents. Audio-derived data also provides a secondary layer of verification that an event is taking place, which can help prioritise responses from police and emergency personnel.
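
The privacy argument rests on what actually leaves the camera. The sketch below is a hypothetical illustration, not any vendor's firmware: `analyse_on_edge`, the `AudioEvent` fields and the 0.8 confidence cut-off are all assumptions. It classifies a buffer locally with a caller-supplied classifier and returns only event metadata, never the samples themselves.

```python
import time
from dataclasses import asdict, dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class AudioEvent:
    """Hypothetical metadata-only message sent off the camera: no raw samples."""
    label: str
    confidence: float
    timestamp: float


def analyse_on_edge(
    samples: Sequence[float],
    classify: Callable[[Sequence[float]], Tuple[str, float]],
    min_confidence: float = 0.8,  # assumed cut-off, tuned per installation
) -> Optional[dict]:
    """Classify one audio buffer locally and return only event metadata.

    The samples are used inside this function and then discarded; the
    returned dict (or None) is the only thing that would be sent to the VMS.
    """
    label, confidence = classify(samples)
    if label == "background" or confidence < min_confidence:
        return None  # nothing leaves the camera
    return asdict(AudioEvent(label, confidence, time.time()))
```

For example, `analyse_on_edge(buffer, my_classifier)` would yield something like `{"label": "glass_break", "confidence": 0.93, "timestamp": ...}` when a confident match occurs, and `None` otherwise.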

Many IP cameras have small microphones embedded in the housing, while some have a jack for connecting an external microphone. Microphones on indoor cameras work well because the housing can include a small hole that lets sound waves reach the microphone. Outdoor cameras certified to IP66 against water and dust ingress typically have less sensitive built-in microphones, because the microphone is sealed inside the housing rather than exposed. In these cases, a strategically placed outdoor microphone can significantly improve analytic accuracy.

Several companies make excellent directional microphones for outdoor use, some of which can also combat wind noise. A high-quality external microphone should generally outperform a camera’s internal microphone in terms of analytic accuracy, so it is worth considering in areas where gathering audio information is most important.

In recent years, surveillance cameras with a dedicated system on chip (SoC) have become available with built-in video and audio analytics. These cameras can detect and classify audio events such as gunshots, screams, breaking glass and explosions, and send alerts to staff and emergency services. A dedicated SoC allows a manufacturer to set aside processing resources for specialised features. For audio analytics, the camera also needs a database of reference sounds to compare against.

The camera extracts characteristics of the audio captured by its internal or externally connected microphone and calculates how likely the sound is to match each entry in the predefined database. If a known sound such as a gunshot, explosion, breaking glass or a scream is matched, an event is triggered and a message is passed to the VMS (video management system).
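
The extract-and-compare step described above can be sketched with three simple time-domain features and nearest-neighbour matching against a reference database. This is an illustrative assumption, not Hanwha's algorithm: the feature set, the `reference_db` layout and the `max_distance` cut-off are hypothetical, and production systems are likely to use richer spectral features and trained models.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Features = Tuple[float, float, float]


def extract_features(samples: Sequence[float]) -> Features:
    """Return (RMS level, zero-crossing rate, crest factor) for one buffer."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    zcr = crossings / (len(samples) - 1)
    peak = max(abs(x) for x in samples)
    crest = peak / rms if rms > 0 else 0.0  # high for impulsive sounds
    return (rms, zcr, crest)


def match(
    features: Features,
    reference_db: Dict[str, List[Features]],  # hypothetical label -> templates
    max_distance: float,
) -> Optional[Tuple[str, float]]:
    """Find the closest reference template; return (label, score) or None."""
    best_label, best_dist = None, math.inf
    for label, templates in reference_db.items():
        for template in templates:
            dist = math.dist(features, template)
            if dist < best_dist:
                best_label, best_dist = label, dist
    if best_label is None or best_dist > max_distance:
        return None  # no known sound is close enough: no event
    # Map distance to a 0..1 score: 1.0 is an exact match, 0.0 is at the cut-off.
    return best_label, 1.0 - best_dist / max_distance
```

Because the raw features sit on very different scales, a real system would normalise them before computing distances; the unnormalised version is kept here for brevity.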

Audio detection
The first job of a well-configured camera, or camera and microphone pair, is to detect sounds of interest while rejecting incidental sounds and noise below a preset threshold. Each camera must be configured for its particular environment so that it detects audio levels exceeding a user-defined threshold. Because audio levels are typically higher in abnormal situations, any level that exceeds the threshold set above the baseline is treated as a potential security event, and operators can be notified through event signals so they can take suitable action. The first step is therefore to measure the baseline of background noise and set an appropriate threshold above it.
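
The baseline-then-threshold procedure can be sketched as follows. The frame-level RMS measure in dBFS, the median as the baseline statistic and the 15 dB margin are assumptions chosen for illustration; a camera's own level scale and defaults may differ.

```python
import math
import statistics
from typing import List, Sequence


def frame_level_dbfs(frame: Sequence[float]) -> float:
    """RMS level of one frame in dB relative to full scale (samples in -1.0..1.0)."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20 * math.log10(max(rms, 1e-10))  # floor avoids log10(0) on digital silence


def calibrate_baseline(quiet_frames: List[Sequence[float]]) -> float:
    """Baseline = median frame level recorded while the scene is normal."""
    return statistics.median([frame_level_dbfs(f) for f in quiet_frames])


def detect_events(
    frames: List[Sequence[float]],
    baseline_db: float,
    margin_db: float = 15.0,  # assumed margin above the baseline
) -> List[int]:
    """Return the indices of frames whose level exceeds baseline + margin."""
    threshold_db = baseline_db + margin_db
    return [i for i, f in enumerate(frames) if frame_level_dbfs(f) > threshold_db]
```

With a quiet scene at about -40 dBFS, the threshold sits at -25 dBFS, so a frame at -6 dBFS is flagged while ordinary background frames are not.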

Depending on where a camera or microphone is installed, a simple threshold may not be enough to keep false alarms down. Noise reduction is a camera feature that can suppress background noise at levels above 55 dB to 65 dB to improve detection accuracy. Installers should be able to enable or disable noise reduction and review the results during setup to confirm the best configuration. With noise reduction enabled, however, the system analyses an attenuated version of the audio, which can degrade audio source classification or cause classification errors. Noise reduction should therefore be used sparingly.
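
Why noise reduction can hurt classification is easiest to see with a crude example. The sketch below applies broadband power subtraction to one frame: the gain is chosen so that the output power is roughly the frame power minus the estimated noise power, with a floor of `min_gain`. This is a swapped-in textbook technique for illustration only; real cameras typically work per frequency band, and the parameter names here are hypothetical.

```python
import math
from typing import List, Sequence


def power(frame: Sequence[float]) -> float:
    """Mean power of one frame."""
    return sum(x * x for x in frame) / len(frame)


def suppress_noise(
    frame: Sequence[float],
    noise_power: float,      # estimated from a noise-only recording
    min_gain: float = 0.1,   # assumed floor so output is never fully muted
) -> List[float]:
    """Broadband power subtraction: scale the frame so its power is ~P - N."""
    p = power(frame)
    if p <= 0:
        return [0.0 for _ in frame]
    gain = max(min_gain, math.sqrt(max(0.0, 1.0 - noise_power / p)))
    return [gain * x for x in frame]
```

A loud event passes almost unchanged, but a weak event only slightly above the noise floor is attenuated by several decibels, so the classifier receives a weaker version of exactly the sounds that were already hardest to recognise.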

Audio source classification
To reduce the chance of false alarms under normal conditions, it is important to give the analytic algorithm a good audio level and a high signal-to-noise ratio. Installers should experiment to find the best placement for both video and audio. A ceiling corner might seem an ideal location for a camera, but reflections from the adjoining walls and ceiling can also amplify background noise at the microphone. Many cameras provide a graph of audio source levels so that installers can check noise cancellation and detection levels at a glance.
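
One way to compare candidate placements is to record a short test sound and a stretch of normal background at each position and compute the signal-to-noise ratio, $\mathrm{SNR_{dB}} = 20\log_{10}(\mathrm{RMS_{event}}/\mathrm{RMS_{background}})$. The sketch below assumes that workflow; the 10 dB acceptance figure is a hypothetical value, not a manufacturer recommendation, and because the event recording also contains background noise the result is an approximation.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a recording."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def snr_db(event: Sequence[float], background: Sequence[float]) -> float:
    """Approximate SNR in dB between a test-sound recording and background noise."""
    return 20 * math.log10(rms(event) / max(rms(background), 1e-10))


def placement_ok(
    event: Sequence[float],
    background: Sequence[float],
    min_snr_db: float = 10.0,  # hypothetical acceptance threshold
) -> bool:
    """True if this microphone position gives the classifier enough margin."""
    return snr_db(event, background) >= min_snr_db
```

A position where the test sound is ten times the background amplitude gives about 20 dB and passes; one where it is only twice the background gives about 6 dB and would suggest moving the camera or adding an external microphone.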

It is important to choose a VMS that has correctly integrated the camera’s API (application programming interface) so that it receives complete audio analytic events, including the classification ID (explosion, glass break, gunshot, scream). A standard VMS that supports only generic alarms may not be able to interpret all of this information, whereas more advanced VMS solutions can distinguish between the different event messages from the camera.
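
The difference between a generic and a fully integrated VMS can be sketched on the receiving side. The JSON message shape, the `classification` field name and the priority table below are hypothetical; a real integration would follow the camera manufacturer's documented API or event standard.

```python
import json
from typing import Dict, Tuple

# Hypothetical priority per classification ID (1 = most urgent).
PRIORITY: Dict[str, int] = {"gunshot": 1, "explosion": 1, "scream": 2, "glass_break": 3}
GENERIC_PRIORITY = 4


def route_event(raw: str) -> Tuple[str, int]:
    """Map an incoming camera event to (event type, priority).

    A VMS that understands the classification ID can prioritise by sound
    type; anything it cannot resolve falls back to a generic audio alarm.
    """
    msg = json.loads(raw)
    label = msg.get("classification")
    if label in PRIORITY:
        return label, PRIORITY[label]
    return "generic_audio_alarm", GENERIC_PRIORITY
```

A VMS limited to generic alarms effectively always takes the fallback branch, so a gunshot and a breaking window would reach the operator with the same priority.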

Well-configured audio analytics can deliver critical information about a security event, accelerating response times and providing details that video-only surveillance cannot. Because the audio is analysed in the camera, analytics largely take privacy concerns out of the equation and allow installers and end users to use camera audio responsibly. Hanwha Techwin’s audio source classification technology, available in its X Series cameras, offers three customisable settings (category, noise cancellation and detection level) for optimum performance in a variety of installation environments.