The Deep Learning phenomenon continues to excite the IT world, with computing power now at the level where it can be properly used in practical applications. Hikvision has been at the forefront of applying the technology in the surveillance industry and beyond, and has already released its first set of products that harness the power of Artificial Intelligence (AI).
The concept of Deep Learning takes inspiration from the way the human brain works. Our brains can be seen as a very complex deep learning model. Brain neural networks are comprised of billions of interconnected neurons; deep learning simulates this structure. These multi-layer networks can collect information and perform corresponding actions according to an analysis of that information.
In the past two years, the technology has excelled in speech recognition, computer vision, voice translation, and much more. It has even surpassed human capabilities in the areas of facial verification and image classification; hence, it has been highly regarded in the field of video surveillance in the security industry. Its ability to enhance the recognition of human beings – distinguishing them from animals, for example – makes the technology a great addition to the security arsenal. This is especially relevant in a world where false alarms account for 94%-99% of all alarms, according to police and fire service statistics!
How deep learning works
Deep learning is intrinsically different from other algorithms, and it overcomes the shortcomings of traditional algorithms in several ways. The algorithmic model for deep learning has a much deeper structure than traditional algorithms. Sometimes the number of layers can exceed a hundred, enabling it to process large amounts of data in complex classifications. Deep learning is very similar to the human learning process, and has a layer-by-layer feature-abstraction process. Each layer has a different “weighting,” and this weighting reflects what was learnt about the images’ “components.” The higher the layer level, the more specific the components. Just as in the human brain, an original signal in deep learning passes through layers of processing, moving from a partial (shallow) understanding to an overall (deep) abstraction at which the object can be perceived.
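To make the layer-by-layer idea concrete, here is a minimal sketch of a signal passing through a small stack of weighted layers. It is illustrative only: the layer sizes, the random (untrained) weights and the helper names `make_layer`, `dense` and `forward` are assumptions made for this example, not a description of any Hikvision model.

```python
import random

def relu(values):
    """Keep positive responses and zero out the rest."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def make_layer(n_in, n_out, rng):
    """Hypothetical layer with random weights; a real model learns these."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(signal, layers):
    """Pass the signal through every layer, keeping each intermediate result."""
    activations = [signal]
    for weights, biases in layers:
        signal = relu(dense(signal, weights, biases))
        activations.append(signal)
    return activations

rng = random.Random(0)
layer_sizes = [8, 6, 4, 2]  # assumed sizes: raw input -> two hidden layers -> output
layers = [make_layer(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]
pixels = [rng.random() for _ in range(layer_sizes[0])]  # stand-in for image data

activations = forward(pixels, layers)
for depth, values in enumerate(activations):
    print(f"layer {depth}: {len(values)} values")
```

Each layer here condenses the output of the previous one into fewer values, which is a toy version of the shallow-to-deep abstraction described above. Production networks use far more layers and weights learnt from training data.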
Deep learning does not require features to be designed by hand, but relies on the computer to extract features by itself. This way, it is able to extract as many features from the target as possible, including abstract features that are difficult or impossible to describe. The more features there are, the more accurate the recognition and classification will be. Some of the most direct benefits that deep learning algorithms can bring include pattern recognition accuracy comparable to, or even better than, that of humans, strong anti-interference capabilities, and the ability to classify and recognize thousands of features.
Challenges of existing systems
Conventional surveillance systems mostly detect moving targets, without further analysis. Even smart IP cameras can only map individual points on a shape one by one, making it difficult to calibrate some features (e.g. forehead or cheek), thus decreasing accuracy.
For perimeter security, for example, other technologies can be (and are) used to provide more comprehensive security. But they all have their downsides. Infrared emission detectors can be ‘jumped over’ and are also prone to false alarms caused by animals. Electronic fences can be a safety hazard, and are limited in certain areas. Some of these solutions can also be expensive and complicated to install.
Objects such as animals, leaves, or even light can cause false alarms, so being able to identify the presence of a human shape really improves the accuracy of perimeter VCA functions. Frequent false alarms are always an issue for end-users, who need to spend time investigating each one, potentially delaying any necessary response and generally affecting efficiency.
Imagine, for example, a scenario where it’s relatively quiet – a location at night where there are few cars and people around. Even here, there could be 50 false alarms in a night. Assume that it takes 2-3 minutes to check out a false alarm, and that just 3 out of the 50 warrant more attention – say 15 minutes each. For each alarm, a guard either needs to check the system and look back at the alert, or someone needs to be dispatched to the location to look around and check whether anyone has indeed ‘entered without permission’. In most organizations, these checks would need to be reported and recorded too, adding to the overall time spent on each false alarm. So, in that scenario, those 50 false alarms could cost more than two hours of wasted time each night.
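The arithmetic behind that estimate can be checked with a few lines of Python. This is a sketch under one reading of the scenario: the 3 more serious alarms are assumed to take 15 minutes each instead of (not in addition to) the 2-3 minute check, and reporting time is left out. The constant and function names are made up for this example.

```python
TOTAL_ALARMS = 50            # false alarms per night in the scenario
SERIOUS_ALARMS = 3           # alarms that warrant more attention
QUICK_CHECK_MINUTES = (2, 3) # low and high estimate for a routine check
SERIOUS_CHECK_MINUTES = 15   # time spent on each of the serious alarms

def wasted_minutes(quick_check_minutes):
    """Total minutes per night for a given routine check time."""
    quick_alarms = TOTAL_ALARMS - SERIOUS_ALARMS
    return (quick_alarms * quick_check_minutes
            + SERIOUS_ALARMS * SERIOUS_CHECK_MINUTES)

low = wasted_minutes(QUICK_CHECK_MINUTES[0])   # 47 * 2 + 3 * 15
high = wasted_minutes(QUICK_CHECK_MINUTES[1])  # 47 * 3 + 3 * 15
print(f"{low}-{high} minutes per night ({low / 60:.1f}-{high / 60:.1f} hours)")
# prints "139-186 minutes per night (2.3-3.1 hours)"
```

Under these assumptions the total comes to between roughly 2.3 and 3.1 hours a night, which is consistent with the "more than two hours" figure.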
Deep Learning, however, makes a big difference. Drawing on a large amount of good-quality data from cameras and other sources, such as the Hikvision Research Institute, and with a data-cleaning team of over a hundred people labelling the video images, sample data covering millions of categories within the industry has been accumulated. With this large amount of quality training data, human, vehicle, and object pattern recognition models become more and more accurate for video surveillance use.
Based on a series of experiments, solutions using the Deep Learning algorithm improved recognition accuracy by 38%. Applied to the previous example, that’s a saving of nearly one hour each night. This makes Deep Learning technology a great advantage in a perimeter security solution, with much more accurate line crossing, intrusion, entrance and exit detection.
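As a rough check on the saving, the sketch below assumes that a 38% accuracy improvement translates directly into 38% less time spent on false alarms, which the experiments cited above do not state explicitly. The baseline is taken from the earlier scenario: 47 routine checks at 2-3 minutes plus 3 checks at 15 minutes. All names are illustrative.

```python
IMPROVEMENT = 0.38  # accuracy gain, assumed here to cut wasted time by the same share

# Baseline minutes per night from the scenario (low and high estimates).
baseline_minutes = [47 * 2 + 3 * 15, 47 * 3 + 3 * 15]

# Minutes saved per night under the assumption above, rounded to whole minutes.
saved_minutes = [round(m * IMPROVEMENT) for m in baseline_minutes]
print(f"saved: {saved_minutes[0]}-{saved_minutes[1]} minutes per night")
# prints "saved: 53-71 minutes per night"
```

On that reading, the saving lands at around an hour a night, in line with the figure quoted.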
Other uses
The value of Deep Learning technology stretches further than traditional security. For example, tracking the movement patterns of individuals can reveal whether they are ‘loitering’ and could pose a threat. A threshold could be set at a five-meter radius of movement, or ten seconds of staying in the same place. If the person passes either threshold, an alarm could be triggered. The solution tracks the individual and compares this behaviour to a database to see if it recognizes a pattern.
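A threshold rule of this kind could be sketched as follows. This is one possible interpretation, not the product's actual logic: it combines the two thresholds and flags a person who stays within five meters of a starting point for more than ten seconds. The track format (time in seconds, x and y in meters) and the function name `is_loitering` are assumptions for this example.

```python
import math

RADIUS_M = 5.0   # radius of movement treated as "the same place"
DWELL_S = 10.0   # time allowed inside that radius before an alarm

def is_loitering(track, radius_m=RADIUS_M, dwell_s=DWELL_S):
    """track: list of (t_seconds, x_m, y_m) samples sorted by time."""
    if not track:
        return False
    anchor_t, anchor_x, anchor_y = track[0]
    for t, x, y in track[1:]:
        if math.hypot(x - anchor_x, y - anchor_y) > radius_m:
            # The person has moved on: restart the dwell timer from here.
            anchor_t, anchor_x, anchor_y = t, x, y
        elif t - anchor_t > dwell_s:
            return True
    return False

# Someone walking past at 1.5 m/s, and someone shuffling on the spot.
walking = [(t, 1.5 * t, 0.0) for t in range(20)]
lingering = [(t, 0.2 * (t % 3), 0.0) for t in range(20)]

print(is_loitering(walking))    # False
print(is_loitering(lingering))  # True
```

In a real deployment the positions would come from the camera's tracking output, and the comparison against known behaviour patterns mentioned above would sit on top of a simple rule like this one.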
Another application would be in a scenario where ‘falling down’ could be a danger, such as an elderly care home. If a height threshold was set at 0.5m and the duration at 10 seconds, for example, the solution would be able to see that a person has fallen down (as they go below 0.5m) and might be in trouble (if they ‘stay down’ for longer than 10 seconds). The solution compares what it sees against these parameters and its database, and raises an alarm.
With features and benefits like these, it’s easy to see how many smart applications could be catered for by Deep Learning technology. To sum up, a 10,000-strong R&D Centre is pushing the boundaries of surveillance solutions and bringing even more benefits to them. Artificial Intelligence has massive potential, and Hikvision is always exploring new ways to apply this exciting technology throughout the security industry and beyond.