Abstract:
Accurate detection of workers in wide-angle surveillance images is important for intelligent surveillance in port terminals. However, the standard YOLOv7 algorithm has limitations in recognizing workers in such images, including weak feature extraction and low detection accuracy. To address these limitations, a terminal worker detection algorithm based on an improved YOLOv7 is proposed. First, a task-specific context decoupling (TSCODE) structure is designed to balance the classification and localization tasks, and the gather-and-distribute (GD) mechanism is applied to improve the fusion of multi-scale features; together they improve the performance and robustness of multi-scale feature detection across varied images of workers. Second, to strengthen feature extraction for small targets, a vision transformer with bi-level routing attention (BRA-ViT) is introduced at the end of the backbone network to capture the position, direction, and cross-channel information of small objects. Third, the slim-neck is used to lighten the neck of the network, reducing the number of parameters and the computational complexity, which increases detection speed while maintaining detection accuracy. Fourth, a loss function based on minimum-point-distance intersection over union (MPDIoU) is used to calculate the bounding-box prediction loss, reducing the rates of false negatives and false positives. To validate the proposed algorithm, wide-angle surveillance images from different areas of the port (quay, yard, chokepoint, and other locations) at different times (day and night) are collected and annotated to form a dataset, and ablation and comparison experiments are conducted. The results show that the average precision (AP) and average detection speed of the proposed algorithm are 90.6% and 39 fps, respectively.
Compared with Faster R-CNN, SSD, YOLOv3, YOLOv5, YOLOv7, and YOLOv8, the AP of the proposed algorithm is higher by 13.8%, 15.8%, 8.5%, 5.2%, 2.7%, and 3.5%, respectively, while its detection speed (FPS) is comparable to that of the baseline YOLOv7 algorithm. In summary, the proposed algorithm achieves a higher AP than existing algorithms at a reasonable detection speed, making it suitable for real-time safety and security surveillance in port terminals.
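For readers unfamiliar with the MPDIoU bounding-box loss mentioned above, the following minimal sketch illustrates the idea for a single pair of boxes. It assumes the commonly cited formulation, in which the IoU is penalized by the squared distances between the two top-left corners and between the two bottom-right corners, each normalized by the squared diagonal of the input image. The abstract does not state the exact variant used in the proposed algorithm, so the function name `mpdiou_loss`, the `(x1, y1, x2, y2)` box format, and the `eps` stabilizer are illustrative assumptions rather than the authors' implementation.

```python
def mpdiou_loss(pred, target, img_w, img_h, eps=1e-9):
    """Sketch of an MPDIoU-style loss for one box pair.

    Boxes are (x1, y1, x2, y2) with x1 < x2 and y1 < y2 (assumed format).
    img_w and img_h are the input image width and height in pixels.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target

    # Intersection area (zero when the boxes do not overlap).
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h

    # Union area and plain IoU; eps guards against division by zero.
    area_p = (px2 - px1) * (py2 - py1)
    area_t = (tx2 - tx1) * (ty2 - ty1)
    iou = inter / (area_p + area_t - inter + eps)

    # Squared distances between matching corners.
    d1_sq = (px1 - tx1) ** 2 + (py1 - ty1) ** 2  # top-left corners
    d2_sq = (px2 - tx2) ** 2 + (py2 - ty2) ** 2  # bottom-right corners

    # Normalize by the squared image diagonal (assumed normalization).
    norm = img_w ** 2 + img_h ** 2
    mpdiou = iou - d1_sq / norm - d2_sq / norm

    return 1.0 - mpdiou


# Hypothetical boxes in a 640x640 image, for illustration only.
perfect = mpdiou_loss((10, 10, 50, 50), (10, 10, 50, 50), 640, 640)
shifted = mpdiou_loss((20, 20, 60, 60), (10, 10, 50, 50), 640, 640)
disjoint = mpdiou_loss((200, 200, 240, 240), (10, 10, 50, 50), 640, 640)
```

Under this formulation the loss is close to zero for a perfect match and exceeds 1 for non-overlapping boxes, so the corner-distance terms still provide a training signal when the IoU alone is zero.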