ROI-WEIGHTED PRIMARY FRAME FILTERING FOR EFFICIENT DEEP VIDEO SURVEILLANCE

Authors

  • Shohruh Begmatov
  • Mukhriddin Arabboev
  • Akhram Nishanov

DOI:

https://doi.org/10.5281/zenodo.21072156

Keywords:

video surveillance; frame skipping; ROI-weighted difference; object-aware filtering; multi-object tracking; edge AI; computational load reduction

Abstract

Real-time video surveillance systems frequently rely on deep neural detectors and multi-object trackers.
Although these models provide high detection and tracking quality, processing every frame is computationally expensive,
especially for long-term multi-camera deployments. This paper presents an ROI-weighted primary filtering algorithm that
decides whether an incoming video frame should be processed by a full deep model or skipped and propagated using
previously computed results. The method is motivated by the observation that the ordinary mean absolute frame difference (MAFD)
treats all pixels equally, whereas surveillance decisions are usually more sensitive to changes in object regions
than to background fluctuations. The proposed estimator constructs a spatial importance map from previously detected
bounding boxes, expands these regions by a dilation margin, and computes a normalized weighted frame-difference
score. A cost-minimization rule converts this score into a binary processing gate. Experiments on MOT20 sequences
show that the ROI-weighted score amplifies object-related changes by 1.15x to 1.87x relative to global MAFD. Practical
deployment scenarios demonstrate that more than 65% of frames can be handled by lightweight filtering, reducing computational
load by 56.69% and 61.31% in two surveillance cases. These findings indicate that object-aware frame filtering
is a simple and effective pre-inference mechanism for resource-efficient video analytics.
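
The pipeline described in the abstract (importance map from previous boxes, dilation margin, normalized weighted difference, binary gate) can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything below is an assumption made for illustration: the function names, the two-level weights `w_roi` and `w_bg`, the half-open box convention `(x1, y1, x2, y2)`, and the gate rule "process when `c_miss * score > c_infer`" are hypothetical stand-ins, not the authors' implementation. Frames are plain nested lists of grayscale values to keep the example dependency-free.

```python
def importance_map(height, width, boxes, margin, w_roi=1.0, w_bg=0.1):
    """Weight map: w_roi inside boxes dilated by `margin`, w_bg elsewhere.

    Boxes are (x1, y1, x2, y2) with exclusive x2/y2 (assumed convention).
    """
    weights = [[w_bg] * width for _ in range(height)]
    for x1, y1, x2, y2 in boxes:
        # Expand the box by the dilation margin and clip it to the frame.
        xa, ya = max(0, x1 - margin), max(0, y1 - margin)
        xb, yb = min(width, x2 + margin), min(height, y2 + margin)
        for y in range(ya, yb):
            for x in range(xa, xb):
                weights[y][x] = w_roi
    return weights


def mafd(prev, curr):
    """Global mean absolute frame difference: every pixel counts equally."""
    total = sum(abs(a - b) for rp, rc in zip(prev, curr) for a, b in zip(rp, rc))
    return total / (len(prev) * len(prev[0]))


def roi_weighted_score(prev, curr, weights):
    """Weighted absolute difference, normalized by the sum of the weights."""
    num = den = 0.0
    for rp, rc, rw in zip(prev, curr, weights):
        for a, b, w in zip(rp, rc, rw):
            num += w * abs(a - b)
            den += w
    return num / den


def should_process(score, c_infer=1.0, c_miss=0.5):
    """Hypothetical cost rule: run the deep model only when the expected
    cost of skipping (c_miss * score) exceeds the inference cost."""
    return c_miss * score > c_infer


# Toy 8x8 frames with one previously detected box at (2, 2)-(4, 4).
H = W = 8
prev = [[0] * W for _ in range(H)]
weights = importance_map(H, W, [(2, 2, 4, 4)], margin=1)

# Case 1: four pixels change inside the object region.
obj_change = [row[:] for row in prev]
for y in range(2, 4):
    for x in range(2, 4):
        obj_change[y][x] = 40

# Case 2: four pixels change in the background, far from the box.
bg_change = [row[:] for row in prev]
for x in range(4):
    bg_change[7][x] = 40

for name, frame in (("object", obj_change), ("background", bg_change)):
    score = roi_weighted_score(prev, frame, weights)
    print(name, mafd(prev, frame), round(score, 3), should_process(score))
```

Both toy frames have the same global MAFD, yet the weighted score is higher for the change inside the dilated box and lower for the background change, so under these assumed costs only the first frame would be sent to the deep model. The size of the amplification depends entirely on the chosen weights and margin, so it should not be compared with the factors reported for MOT20.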

Author Biographies

Shohruh Begmatov

Tashkent University of Information
Technologies named after Muhammad al-Khwarizmi
Doctor of Philosophy (PhD) in Technical Sciences

Mukhriddin Arabboev

Tashkent University of Information
Technologies named after Muhammad al-Khwarizmi
Doctor of Philosophy (PhD) in Technical Sciences

Akhram Nishanov

Tashkent University of Information
Technologies named after Muhammad al-Khwarizmi
Doctor of Science in Technical Sciences, Professor


Published

2026-06-01