A significant challenge to VCA technology is not simply providing a correct result, but providing the desired result.
Figure 5: Providing correct and useful results requires intersection of user intentions, VCA interpretation, and results provided by VCA tool

The term Video Content Analysis (VCA) is often used to describe sophisticated video analytics technologies designed to assist analysts in classifying or detecting events of interest in videos. These events may include the appearance of a particular object, class of objects, or action. VCA technology employs a complex mix of algorithms, typically encompassing the fields of Computer Vision, Machine Learning, and Information Retrieval. This inherent complexity of VCA problems makes success difficult to demonstrate.

Over the past four years, System Planning Corporation, of Arlington, Virginia, has performed several technology surveys and evaluated nine VCA technologies for object classification and video search. Through discussions with potential users and detailed technical interchanges with developers, it has become clear that negative perceptions represent a significant obstacle to wider adoption of VCA technology.

Despite a number of challenges, we believe that VCA does have the potential to help solve real-world problems.

Challenge: Fiction vs. reality

The complexity, robustness, and maturity of VCA technology are rapidly advancing. This, coupled with fictional portrayals of VCA technology and the widespread use of narrow implementations, sometimes makes it difficult to know where reality ends and fantasy begins. What was impossible a few years ago is now commonplace.

Movies and TV shows like the CSI and NCIS franchises blur the line between fact and fiction by portraying video search capabilities that do exist, but with considerably more speed, automation, accuracy, and robustness than is currently achievable. Meanwhile, licence plate readers (LPR) are used routinely at toll booths and in parking garages, and facial recognition software can log you into your laptop or unlock your smartphone. Potential users therefore see the ubiquity of video and image analytics tools, but don’t necessarily appreciate the operational constraints required to make such systems work. Together, these factors can contribute to unrealistic expectations regarding the capabilities of state-of-the-art VCA technology.

Challenge: Agreeing on the question

Video search is inherently ambiguous due to the complexity and depth of information contained in an image. Each image chip in Figure 2 represents a potential match to the query image shown in Figure 1. Whether a chip represents a true positive (i.e. a right answer) depends on the mission at hand. Stated more technically, whether a potential match is correct depends on the level of fidelity (i.e. precision) required; a right answer for one user may represent a wrong answer for another (Figure 3).

Figure 1. Example of highly-ambiguous query image, in which a single image can represent many concepts
Figure 2. Range of possible matches for the query image in Figure 1. What constitutes a right answer depends on the germane features in the query image
Figure 3. Required precision depends on the mission at hand
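
To make the point behind Figure 3 concrete, here is a minimal sketch in Python. The detection records and attribute names are hypothetical; the sketch only shows how the same returned results yield different precision scores depending on what the mission treats as a true positive.

```python
# A minimal sketch with hypothetical detection records: the same set of
# returned matches scores very differently depending on which attributes
# the mission treats as relevant.
results = [
    {"type": "sedan", "colour": "silver"},
    {"type": "sedan", "colour": "red"},
    {"type": "truck", "colour": "silver"},
    {"type": "sedan", "colour": "silver"},
]

def precision(matches, is_relevant):
    """Fraction of returned matches the user would call a right answer."""
    return sum(1 for m in matches if is_relevant(m)) / len(matches)

# Mission 1: any vehicle is a correct answer.
print(precision(results, lambda m: True))  # 1.0
# Mission 2: only silver sedans count as correct answers.
print(precision(results, lambda m: m["type"] == "sedan" and m["colour"] == "silver"))  # 0.5
```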

Imagine a law enforcement scenario involving an overnight crime in the vicinity of a low-quality CCTV security camera. Initially, investigators might use VCA to find vehicles passing through the camera’s field of view. At this stage, any detected vehicle constitutes a correct answer. After viewing all vehicles in the video and correlating with other information, an investigator decides that the suspect vehicle is a silver sedan. A nearby higher-quality video source is then queried to find all silver sedans – thus any detected silver sedan will represent a true positive. Finally, after additional investigation, the suspect vehicle is identified. Now, any subsequent searches will accept only make/model matches as right answers.
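
The staged narrowing in this scenario can be pictured as a series of filters over detection metadata. The records and field names below (colour, body_style, make, model) are purely illustrative and are not drawn from any particular VCA product.

```python
# Hypothetical detection records as a VCA tool might return them; the field
# names are illustrative, not a real schema.
detections = [
    {"colour": "silver", "body_style": "sedan", "make": "Honda",  "model": "Accord"},
    {"colour": "silver", "body_style": "sedan", "make": "Toyota", "model": "Camry"},
    {"colour": "black",  "body_style": "suv",   "make": "Ford",   "model": "Explorer"},
]

# Stage 1: any detected vehicle is a right answer.
stage1 = detections

# Stage 2: only silver sedans are right answers.
stage2 = [d for d in stage1 if d["colour"] == "silver" and d["body_style"] == "sedan"]

# Stage 3: only the identified make/model is a right answer.
stage3 = [d for d in stage2 if (d["make"], d["model"]) == ("Toyota", "Camry")]

print(len(stage1), len(stage2), len(stage3))  # 3 2 1
```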

Challenge: The black box

The previous section presented examples in which search results are technically correct, yet inconsistent with the user’s expectations and requirements. This challenge is inherent in VCA algorithms. Because computer vision algorithms represent objects using numerical models (descriptors), their interpretations of an image are not readily understood by human operators. The result is that the software cannot easily ask “Is this what you meant?” to clarify features of interest. The cartoon in Figure 4 shows descriptors associated with the Histogram of Oriented Gradients (HOG) algorithm and possible resulting matches. In such a case, the only way to present the user with a choice is to provide the answers on the right, 50% of which are likely incorrect.

Figure 4. Numerical descriptors affect how an object is interpreted, yet are difficult to convey to a user
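
For readers who want to see what such a descriptor looks like, the sketch below uses the HOG implementation in scikit-image. The image chips are random placeholders standing in for detector crops; the point is only that the representation is a long numeric vector, not something a user could inspect directly.

```python
import numpy as np
from skimage.feature import hog

rng = np.random.default_rng(0)
chip_a = rng.random((64, 64))   # placeholder 64x64 greyscale image chips
chip_b = rng.random((64, 64))

# Each chip collapses to a vector of gradient-orientation histograms;
# meaningful to the matcher, but opaque to a human operator.
desc_a = hog(chip_a, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
desc_b = hog(chip_b, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))

# "Similarity" is just a distance between these vectors; the software cannot
# explain the comparison in terms a user would recognise (shape, colour, make).
print(desc_a.shape, np.linalg.norm(desc_a - desc_b))
```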

Closing the confidence gap

A significant challenge to VCA technology is not simply providing a correct result, but providing the desired result. As Figure 5 shows, providing desired results requires the convergence of three concepts: user intentions, VCA interpretation, and VCA results. A perfectly implemented algorithm would achieve complete overlap between interpretation and results, but this alone is not sufficient to satisfy user requirements. To convince potential users of the value of VCA tools, the overlap between user intentions and the software’s interpretation of those intentions must be significant.

There are two ways to increase the overlap between user intentions and VCA interpretation of those intentions:

1. Train the software

2. Train the user

We believe that the strengths of user training make this the preferred path toward improving confidence in VCA technology. This means that users must take the time to understand the subtleties and nuances of presented results and experiment with query images to learn how to limit undesirable answers.

Path forward

While we believe that user training is the best path toward improving acceptance, it is incumbent on VCA developers to provide users with useful tools and information. Without insight into what caused the VCA software to return a result, users cannot effectively modify their queries to improve performance. At a minimum, VCA tools should provide the following information:

  • What was detected (bounding box)
  • Why it was deemed a match (parametric scores)

Such information would provide insight into the VCA tool’s “thought process”, thereby allowing the user to understand which image features are driving the results. In addition to helping users understand search results, the software should also give them a means to modify those results by accentuating or de-emphasising certain image features.
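
As a sketch of what this might look like in practice, a result could carry a bounding box plus per-feature scores, and a user-adjustable weighting could re-rank results to accentuate or de-emphasise particular features. The Match structure, field names, and weights below are hypothetical, not an existing VCA interface.

```python
from dataclasses import dataclass

@dataclass
class Match:
    bounding_box: tuple   # (x, y, width, height) in pixels: what was detected
    feature_scores: dict  # e.g. {"shape": 0.9, "colour": 0.4}: why it matched

def rerank(matches, weights):
    """Re-score matches with user-supplied weights that accentuate or
    de-emphasise individual image features."""
    def combined(m):
        return sum(weights.get(f, 1.0) * s for f, s in m.feature_scores.items())
    return sorted(matches, key=combined, reverse=True)

matches = [
    Match((120, 40, 60, 30), {"shape": 0.9, "colour": 0.3, "texture": 0.6}),
    Match((300, 80, 55, 28), {"shape": 0.5, "colour": 0.9, "texture": 0.4}),
]

# A user chasing a silver sedan might boost colour and down-weight texture.
for m in rerank(matches, {"colour": 2.0, "texture": 0.5}):
    print(m.bounding_box, m.feature_scores)
```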

VCA technology is not ready to autonomously provide perfect, error-free results, but with the right training and user experience, VCA is ready to make significant improvements in video analysts’ workflow.


Author profile

Gary Rubin, Director for Analysis and Support, System Planning Corporation
