WO2023070955A1 - Method and apparatus for detecting tiny target in port operation area on basis of computer vision - Google Patents

Method and apparatus for detecting tiny target in port operation area on basis of computer vision

Info

Publication number
WO2023070955A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
detection
data set
picture
missed
Prior art date
Application number
PCT/CN2022/072005
Other languages
French (fr)
Chinese (zh)
Inventor
王硕
郑智辉
闫威
唐波
郭宸瑞
董昊天
闫涛
李钊
Original Assignee
北京航天自动控制研究所
Priority date
Filing date
Publication date
Application filed by 北京航天自动控制研究所
Publication of WO2023070955A1

Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting (Pattern recognition)
    • G06F18/253 Fusion techniques of extracted features (Pattern recognition)
    • G06N3/045 Combinations of networks (Neural networks)
    • G06N3/08 Learning methods (Neural networks)

Definitions

  • the present application relates to the technical field of security monitoring in port operation areas, in particular to a method and device for detecting tiny targets in port operation areas based on computer vision.
  • the port operation area is a place with clear boundaries that can accommodate the complete container loading and unloading operation process, including water areas such as harbor basins, anchorages, approach channels, and berths, as well as land areas such as freight stations, storage yards, wharf fronts, and office and living areas. It is the hub station of land and water transport, a buffer place for container cargo when switching modes of transportation, and a handover point for cargo, occupying an important position in the entire container transportation process.
  • the most important security system is the video surveillance system.
  • the traditional video surveillance system can monitor all locations in the port operation area 24 hours a day.
  • the target detection algorithm based on computer vision is introduced into the video surveillance system to detect people, vehicles, operating machinery and other objects in the port work area in real time, and to monitor violations in the port work area.
  • The traditional computer vision target detection algorithm performs detection based on traditional digital image features such as object edges. Because the scene of the port operation area is complex, detection with traditional digital image features suffers from serious false detections; and because the port operation area is extremely large, with lateral distances of more than 300 meters, the relative size of objects in the area is very small, so tiny objects are difficult to detect with traditional digital image features.
  • Deep learning technology has been applied to computer vision technology, demonstrating its powerful functions. Deep learning technology gradually extracts high-level semantic features from the low-level feature representation of pictures through multi-layer processing. Based on high-level semantic features, deep learning technology can accurately identify objects in digital images.
  • The existing method collects images with purpose-built cameras, obtains a training set for training, a validation set for validation, and a test set for testing, and labels the data of each picture set. At the same time, k-means clustering is applied to the labeling results of the data set to obtain the sizes of the pre-selection (anchor) boxes. The yolo-tiny framework, combined with a genetic algorithm, is then used for model training and target detection.
  • Yolo-tiny's backbone is shallow and its network width is small, so the algorithm cannot obtain enough semantic information, which in turn leads to low detection accuracy for tiny objects.
  • the embodiment of the present application aims to provide a method and device for detecting small objects in port operation areas based on computer vision, so as to solve the problems of relatively limited existing training data and low detection accuracy of small objects.
  • An embodiment of the present application provides a computer vision-based target detection method, including: performing motion information detection on the historical surveillance video of the port operation area, taking screenshots according to the motion information, and marking the targets to create a data set; establishing a neural network Yolov5x and using one part of the data set to preliminarily train the neural network Yolov5x to obtain a target detection model; using the target detection model to perform target detection on another part of the data set to analyze falsely detected targets and missed targets; based on the falsely detected targets and/or the missed targets, updating the data set through data augmentation and using the updated data set to carry out intensive training of the target detection model; and taking screenshots of the current surveillance video of the port operation area at predetermined intervals to obtain pictures to be detected and using the enhanced target detection model to perform target detection on the pictures to be detected.
  • The beneficial effects of the above technical solution are as follows: based on the falsely detected targets and/or missed targets, the data set is updated through data augmentation to obtain sufficient training data for the port operation area, and the updated data set is used to strengthen the training of the target detection model and improve its robustness; the enhanced target detection model is used to perform target detection on the pictures to be detected, and a small-target detection network for the port operation area is designed, which improves the detection accuracy for small targets.
  • Performing motion information detection on the historical surveillance video of the port operation area, taking screenshots according to the motion information, and marking the targets to create a data set further includes: obtaining the historical surveillance video of the port operation area from a database; using a Gaussian mixture model, according to the inter-frame information of the historical surveillance video, to classify the video into static pixels and moving pixels so as to determine the moving-pixel areas in the historical surveillance video; and taking screenshots of pictures that contain moving-pixel areas and marking the targets in those areas to generate the data set.
  • The targets include large targets of relatively large size and small targets of relatively small size to be detected; before the data set is used to train the neural network Yolov5x to obtain the target detection model, the method further includes: determining the input size of each picture frame in the data set according to the size of the small targets in the historical surveillance video and the current surveillance video of the port operation area.
  • Establishing the neural network Yolov5x also includes: the base network of the neural network Yolov5x uses the CSP network architecture; and, on the basis of the pyramid feature maps of the neural network Yolov5x with moving step sizes of 8, 16 and 32, a pyramid feature map with a moving step size of 4 is added to detect the small targets.
  • Analyzing the falsely detected targets and the missed targets further includes: comparing the targets detected by the target detection model with the marked targets in the corresponding pictures of the data set to determine the falsely detected targets and the missed targets, wherein a target that is detected by the target detection model in a corresponding marked first picture that contains no such target is determined to be a falsely detected target, and a target that is present in a corresponding marked second picture but is not detected by the target detection model is determined to be a missed target.
  • Based on the falsely detected targets, the data set is updated through data augmentation, wherein using the updated data set to improve the robustness of the target detection model further includes: when the falsely detected target is a stable false detection, correctly labeling the first picture in which the falsely detected target was detected and adding it to the data set; randomly cropping the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set; and inputting pictures of targets with different relative sizes, together with the negative-sample pictures, into the target detection model, so that the target detection model learns the difference between the targets and the falsely detected targets.
  • Based on the missed targets, the data set is updated through data augmentation, wherein using the updated data set to improve the robustness of the target detection model further includes: performing data augmentation by changing the brightness and contrast of the second picture and/or by randomly cutting the missed target out of the second picture and pasting it to other positions of the second picture, so as to update the data set, wherein the second picture contains a missed target; and inputting the second picture with the pasted missed target into the target detection model, so that the target detection model can detect the missed target.
  • the large target includes port operation machinery; the small target includes port personnel; the falsely detected target includes a camera; and the missed detected target includes squatting personnel.
  • An embodiment of the present application provides a computer vision-based target detection device, including: a data set generation module, configured to perform motion information detection on the historical surveillance video of the port operation area, take screenshots according to the motion information, and mark the targets to create a data set; a model building module, configured to establish a neural network Yolov5x and use one part of the data set to preliminarily train the neural network Yolov5x to obtain a target detection model; a target analysis module, configured to use the target detection model to perform target detection on another part of the data set to analyze falsely detected targets and missed targets; a data set update module, configured to update the data set through data augmentation based on the falsely detected targets and/or the missed targets; an intensive training module, configured to use the updated data set to carry out intensive training of the target detection model; and a target detection module, configured to take screenshots of the current surveillance video of the port operation area at predetermined intervals to obtain pictures to be detected and use the enhanced target detection model to perform target detection on the pictures to be detected.
  • The data set generation module is used to: obtain the historical surveillance video of the port operation area from the database; use a Gaussian mixture model, according to the inter-frame information of the historical surveillance video, to classify the video into static pixels and moving pixels so as to determine the moving-pixel areas in the historical surveillance video; and take screenshots of pictures that contain moving-pixel areas and mark the targets in those areas to generate the data set.
  • The computer vision-based target detection device also includes an input size determination module, which is used to determine the input size of each picture frame in the data set according to the size of the small targets in the historical surveillance video and the current surveillance video of the port operation area.
  • The model building module is also used to: use the CSP network architecture as the base network of the neural network Yolov5x; and, on the basis of the pyramid feature maps of the neural network Yolov5x with moving step sizes of 8, 16 and 32, add a pyramid feature map with a moving step size of 4 to detect the small targets.
  • The target analysis module is used to compare the targets detected by the target detection model with the marked targets in the corresponding pictures of the data set, so as to determine the falsely detected targets and the missed targets, wherein the detection analysis module includes a false detection target analysis submodule and a missed detection target analysis submodule: the false detection target analysis submodule determines a target that is detected by the target detection model in a corresponding marked first picture containing no such target to be a falsely detected target; and the missed detection target analysis submodule determines a target that is present in a corresponding marked second picture but is not detected by the target detection model to be a missed target.
  • The data set update module is configured to: when the falsely detected target is a stable false detection, correctly label the first picture in which the falsely detected target was detected and add it to the data set; randomly crop the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set; and input pictures of targets with different relative sizes, together with the negative-sample pictures, into the target detection model, so that the target detection model learns the difference between the targets and the falsely detected targets.
  • The data set update module is further configured to: perform data augmentation by changing the brightness and contrast of the second picture and/or by randomly cutting the missed target out of the second picture and pasting it to other positions of the second picture, so as to update the data set, wherein the second picture contains a missed target; and input the second picture with the pasted missed target into the target detection model, so that the target detection model can detect the missed target.
  • the large target includes port operation machinery; the small target includes port personnel; the falsely detected target includes a camera; and the missed detected target includes squatting personnel.
  • the present application can achieve at least one of the following beneficial effects:
  • data augmentation is carried out in a targeted manner to improve the detection ability of small target objects in the port operation area.
  • targeted data augmentation can not only reduce the amount of useless training as much as possible, but also increase the recognition accuracy from the original 92% to the current 95%.
  • FIG. 1 is a flowchart of a computer vision-based object detection method according to an embodiment of the present application.
  • FIG. 2 is a diagram of a pyramid feature map used for predicting objects of different scales according to an embodiment of the present application.
  • FIG. 3 is a diagram of a detection result of a running object in a video using a Gaussian mixture model according to an embodiment of the present application.
  • Fig. 4 is an illustration of a picture intercepted from a video of a port operation area according to an embodiment of the present application.
  • Fig. 5 is a diagram showing falsely detected targets when using a target detection model for target detection according to an embodiment of the present application.
  • Fig. 6 is a diagram showing missed detection targets when using a target detection model for target detection according to an embodiment of the present application.
  • FIG. 7A and FIG. 7B are respectively the original image of the missed detection object and its random cutout image according to the embodiment of the present application.
  • FIG. 8A and FIG. 8B are respectively the original image and the image after adjusting the contrast according to the embodiment of the present application.
  • FIG. 9A and FIG. 9B are respectively the original image and the view after adding the missing persons according to the embodiment of the present application.
  • FIG. 10 is a block diagram of an object detection device based on computer vision according to an embodiment of the present application.
  • The computer vision-based target detection method comprises: in step S102, performing motion information detection on the historical surveillance video of the port operation area, taking screenshots according to the motion information, and marking targets to create a data set; in step S104, establishing the neural network Yolov5x and using one part of the data set to preliminarily train the neural network Yolov5x to obtain a target detection model; in step S106, using the target detection model to perform target detection on another part of the data set to analyze falsely detected targets and missed targets; in step S108, based on the falsely detected targets and/or the missed targets, updating the data set through data augmentation and using the updated data set to carry out intensive training of the target detection model; and in step S110, taking screenshots of the current surveillance video of the port operation area at predetermined intervals to obtain pictures to be detected and using the enhanced target detection model to perform target detection on them.
  • Based on the falsely detected targets and/or missed targets, the data set is updated through data augmentation to obtain sufficient training data for the port operation area, and the target detection model is then intensively trained with the updated data set to improve its robustness.
  • steps S102 to S110 of the method for object detection based on computer vision will be described in detail.
  • step S102 motion information detection is performed on the historical surveillance video of the port operation area, screenshots are taken according to the motion information, and targets are marked to create a data set.
  • targets include objects such as people, vehicles, and work machinery.
  • the motion information detection is performed on the historical surveillance video of the port operation area, and screenshots are taken according to the motion information and the target is marked to make a data set.
  • the input size of each picture frame in the data set is determined. Specifically, in order to stably detect tiny objects, we stipulate that the object is at least 1×1 pixel on the feature map with a stride of 16.
  • a neural network Yolov5x is established, and a part of the data set is used to perform preliminary training on the neural network Yolov5x to obtain a target detection model.
  • The establishment of the neural network Yolov5x also includes: the base network of the neural network Yolov5x uses the CSP network architecture, and on the basis of the pyramid feature maps with moving step sizes of 8, 16 and 32, a pyramid feature map with a moving step size of 4 is added for detecting small objects.
  • The fusion of high-semantic features with high-spatial-scale features is used to suppress non-target information and highlight target information.
  • the large target is port operation machinery; the small target includes port personnel or other persons appearing in the surveillance video.
  • the target detection model is used to perform target detection on another part of the data set to analyze falsely detected targets and missed detected targets.
  • false detections include cameras; missed detections include squatting people.
  • Analyzing the falsely detected targets and the missed targets further includes: comparing the targets detected by the target detection model with the marked targets in the corresponding pictures of the data set, so as to determine the falsely detected targets and the missed targets, wherein a target that is detected by the target detection model in a corresponding marked first picture containing no such target is determined to be a falsely detected target, and a target that is present in a corresponding marked second picture but is not detected by the target detection model is determined to be a missed target.
  • step S108 based on the falsely detected object and/or the missed detected object, the data set is updated through data augmentation to improve the robustness of the object detection model, and the updated data set is used to strengthen the training of the object detection model.
  • Updating the data set through data augmentation further includes: when the falsely detected target is a stable false detection, the first picture in which the falsely detected target was detected is correctly labeled and added to the data set; the falsely detected target is randomly cropped from the first picture as negative-sample pictures for data augmentation to update the data set; and pictures of targets with different relative sizes, together with the negative-sample pictures, are input into the target detection model so that it learns the difference between the targets and the falsely detected targets.
  • Updating the data set through data augmentation further includes: performing data augmentation by changing the brightness and contrast of the second picture and/or by randomly cutting the missed target out of the second picture and pasting it to other positions of the second picture, so as to update the data set, wherein the second picture contains missed targets; the second picture with the pasted missed target is input into the target detection model so that the model can detect the missed target.
  • step S110 screenshots are taken of the current surveillance video of the port operation area at predetermined intervals to obtain pictures to be detected, and target detection is performed on the pictures to be detected by using an enhanced target detection model.
  • The video taken by a surveillance camera usually has 25 to 30 frames per second, and one picture is captured out of every 2 to 3 frames, that is, one or two frames are skipped between captures.
  • the predetermined time is 0.004 seconds or 0.006 seconds.
  • the computer vision-based target detection device includes: a data set generation module 1002, an input size determination module, a model building module 1004, a target analysis module 1006, a data set update module 1008, an intensive training module 1010 and a target detection module 1012.
  • the data set generation module 1002 is used to detect motion information on the historical surveillance video of the port operation area, take screenshots and mark targets according to the motion information to create a data set.
  • The data set generation module 1002 obtains the historical surveillance video of the port operation area from the database (specifically, the surveillance video is taken by cameras and stored in the database); according to the inter-frame information of the historical surveillance video, the Gaussian mixture model is used to classify the historical surveillance video into static pixels and moving pixels so as to determine the moving-pixel areas in the historical surveillance video; and screenshots are taken of pictures that contain moving-pixel areas, the targets in the moving-pixel areas are marked to generate the data set, and the data set is stored in the database of the storage server.
  • the input size determination module determines the input size of each picture frame in the data set according to the historical surveillance video of the port operation area and the size of the small target in the current surveillance video.
  • the model establishment module 1004 is used to establish the neural network Yolov5x, and use a part of the data set to perform preliminary training on the neural network Yolov5x to obtain a target detection model.
  • the model building module is also used for: the base network of the neural network Yolov5x uses a CSP network architecture; and on the basis of the pyramid feature maps with a moving step size of 8, 16 and 32 of the neural network Yolov5x, adding a pyramid feature with a moving step size of 4 map to detect small objects.
  • the target analysis module 1006 is configured to use the target detection model to perform target detection on another part of the data set to analyze falsely detected targets and missed detected targets.
  • the target analysis module 1006 is used to compare the target detected by using the target detection model with the marked target in the corresponding picture of the data set, so as to determine the falsely detected target and the missed detected target.
  • false detection targets include cameras; and missed detection targets include squatting people.
  • the detection analysis module includes a false detection target analysis sub-module and a missed detection target analysis sub-module.
  • the false detection target analysis sub-module is used to determine the target detected by the target detection model as a false detection target if there is no target in the corresponding marked first picture.
  • the missed target analysis sub-module is configured to determine the target that has a target in the corresponding labeled second picture but is not detected by the target detection model as a missed target.
  • the data set updating module 1008 is configured to update the data set through data augmentation based on the falsely detected object and/or the missed detected object.
  • The data set update module is used to: when the falsely detected target is a stable false detection, correctly label the first picture in which the falsely detected target was detected and add it to the data set; and randomly crop the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set.
  • The data set update module is also used to perform data augmentation by changing the brightness and contrast of the second picture and/or by randomly cutting the missed target out of the second picture and pasting it to other positions of the second picture, so as to update the data set, wherein the second picture contains missed targets.
  • the intensive training module 1010 is configured to use the updated data set to perform intensive training on the target detection model.
  • the images of objects with different relative sizes and the images of negative samples are fed into the object detection model, so that the object detection model recognizes the difference between objects and falsely detected objects.
  • The target detection module 1012 is configured to take screenshots of the current surveillance video of the port operation area at predetermined intervals to obtain pictures to be detected, and to use the enhanced target detection model to perform target detection on the pictures to be detected. Specifically, the targets detected in real time by the target detection model are shown on a display, where the targets include objects such as people, vehicles, and operating machinery, and violations in the port operation area are displayed in real time. Additionally, security personnel are notified of violations via loudspeaker. Violations include personnel not wearing safety helmets and/or work clothes, personnel squatting, and so on, as well as vehicles speeding, driving in the wrong direction, and so on.
  • the small target detection algorithm in the port operation area proposed in this application is an improved algorithm based on the latest single-stage detection neural network YOLOv5.
  • The improvements to YOLOv5, the latest single-stage detection neural network, are as follows: first, the input size of the detection network is determined according to the on-site detection requirements, so that the detailed information of the input image can be analyzed and the working efficiency of the detection network is improved; then, on the basis of the original pyramid-shaped image features of YOLOv5, a shallower feature map is added as a detection output to improve the detection performance for small objects; finally, after an initial version of the network has been trained, a variety of data augmentation methods are applied in a targeted manner according to the detection results on new data, and iterative training is carried out on the augmented data under the premise of limited image data, to improve the robustness of the detection network.
  • Data augmentation is one of the commonly used techniques in deep learning. It is mainly used to increase the training data set and make the data set as diverse as possible, so that the trained model has stronger generalization ability.
  • Step 1 Based on the small target detection requirements, the Yolov5x network is initially selected as the small target detection network in the port operation area.
  • the Yolov5x base network uses the CSP architecture, which can reduce the amount of computation while slightly improving the accuracy.
  • The largest channel width of Yolov5x is 1280. More channels mean a more complex semantic expression ability, that is, more accurate detection can be obtained on shallow feature maps, which improves the detection accuracy.
  • Yolov5 uses a three-level feature pyramid, which covers the detection of objects at more scales.
  • Step 2 Determine the input size of the small target detection network according to the actual needs of on-site small target detection.
  • the method for selecting the size is:
  • The detection coordinates of Yolov5 are determined through the pyramid-shaped feature maps (the principle of the pyramid feature map is explained in detail in Step 3). According to scale invariance, if an object is to be detected, it must have salient features at the corresponding location of the feature map at the corresponding scale, so the size of the object on the corresponding feature map must be at least 1×1 pixel.
  • For tiny objects, the requirement is that the object occupies at least 1×1 pixel on the feature map with a stride of 16, that is, at least 16×16 pixels in the resized input picture. If the input size of the model is set to w_input × w_input, this requirement gives a lower bound on w_input.
  • Moreover, if the detection model is given a square input of size w_input, there will be unused areas above and below the wide video frame, which wastes computation. Therefore, the input aspect ratio is kept as close as possible to the aspect ratio of the video frame picture.
  • Step 3 On the basis of the original pyramid feature maps with moving steps of 8, 16, and 32, a pyramid feature map with moving steps of 4 is added.
  • The feature pyramid network is a method that uses a conventional convolutional neural network to efficiently extract features at every scale of a picture. It works as follows: first, the forward pass of the neural network is carried out, and features with a high semantic level but a low spatial scale are obtained at the top layers of the network; next, these top-level features are upsampled and fused with shallower features (low semantic level, high spatial scale), so that the fused features have both high semantics and high resolution; finally, the feature maps of different scales obtained in the first two steps are used to predict objects of different scales (see FIG. 2).
  • For a micro-sized object, its size is only about 2×2 even on the feature map with the largest spatial size among the original levels (the feature map with a stride of 8), and it is relatively difficult to judge an object from features of this size. Therefore, a pyramid feature map with a larger spatial scale, that is, a feature map with a stride of 4, needs to be added (a minimal sketch of such a four-level pyramid is given below).
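As an illustration of the top-down fusion with an extra stride-4 level described above, the following PyTorch sketch builds a four-level pyramid (P2 to P5) from four backbone stages. It is a minimal stand-in, not the Yolov5x implementation itself, and the channel widths are assumed for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFPN(nn.Module):
    def __init__(self, c2=128, c3=256, c4=512, c5=1024, out_channels=256):
        super().__init__()
        # 1x1 lateral convolutions project backbone features to a common width.
        self.lat2 = nn.Conv2d(c2, out_channels, 1)
        self.lat3 = nn.Conv2d(c3, out_channels, 1)
        self.lat4 = nn.Conv2d(c4, out_channels, 1)
        self.lat5 = nn.Conv2d(c5, out_channels, 1)
        # 3x3 convolutions smooth each fused map.
        self.smooth = nn.ModuleList(
            nn.Conv2d(out_channels, out_channels, 3, padding=1) for _ in range(4))

    def forward(self, c2, c3, c4, c5):
        # Top-down pathway: upsample the semantically strong map and add the lateral.
        p5 = self.lat5(c5)                                        # stride 32
        p4 = self.lat4(c4) + F.interpolate(p5, scale_factor=2.0)  # stride 16
        p3 = self.lat3(c3) + F.interpolate(p4, scale_factor=2.0)  # stride 8
        p2 = self.lat2(c2) + F.interpolate(p3, scale_factor=2.0)  # stride 4 (added level)
        return [s(p) for s, p in zip(self.smooth, (p2, p3, p4, p5))]

# With a 1024x576 input, the stride-4 map is 256x144, the size quoted later in the text.
feats = [torch.randn(1, c, 576 // s, 1024 // s)
         for c, s in ((128, 4), (256, 8), (512, 16), (1024, 32))]
p2, p3, p4, p5 = TinyFPN()(*feats)
print(p2.shape)  # torch.Size([1, 256, 144, 256])
```

In YOLOv5 itself this kind of extra shallow detection level is added by editing the model configuration file rather than by writing a new module, as noted in Step 3 of the embodiment below.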
  • Step 4 Use part of the surveillance video, mark screenshots based on motion information, and create a data set.
  • Since the surveillance video obtained from the port operation area is actual on-site operation video, and the site is characterized by few personnel who appear only briefly, the effective duration of the surveillance video obtained from the on-site hard disk video recorder is very small (about 5%). If frames were captured at a fixed time interval, not only would a large number of invalid pictures (containing no object to be detected) be captured, but a series of important moments when a person appears would also be missed. At the same time, when there is no object, the scene background does not change significantly over time, whereas when an object appears it is generally in motion. Therefore, screenshots must be taken according to motion information, that is, when some area of the video is moving, the video frame at that moment is captured.
  • The full name of the mixture Gaussian model is Gaussian Mixture Model (GMM).
  • The Gaussian mixture model is a model composed of multiple Gaussian (i.e. normal) distribution components, and its probability density is P(x|θ) = Σ_{k=1}^{K} α_k φ(x|θ_k), where K is the number of Gaussian components.
  • Here θ = (α_1, …, α_K; θ_1, …, θ_K) denotes the parameters to be estimated.
  • Given the sample set X = {x_1, x_2, …, x_N}, where each x_j is a sample drawn from the population with distribution P(x|θ), the probability of drawing the N samples is the joint probability of all samples x_j in the sample set X, L(θ) = Π_{j=1}^{N} P(x_j|θ).
  • The Gaussian mixture model is used for classification: it judges which pixels in the video are moving and which are still, and in this way the positions of the moving pixels in the video are determined.
  • the parameters of the mixed Gaussian model need to be updated in real time according to the motion of the video.
  • the parameter update uses the EM (Expectation Maximization) algorithm.
  • the steps of the EM algorithm are as follows:
  • E-step: according to the current parameters, calculate the responsibility r_jk that data point x_j comes from sub-model k, r_jk = α_k φ(x_j|θ_k) / Σ_{l=1}^{K} α_l φ(x_j|θ_l), where α_k is the weight of each Gaussian component and φ(x_j|θ_k) is the probability density function of the k-th Gaussian sub-model with parameters θ_k.
  • M-step: re-estimate the weights, means, and covariances of the sub-models from the responsibilities r_jk (the standard update formulas are written out below).
  • Step 5 Use the data set produced in step 4 to initially train an improved version of the small target detection model.
  • Step 6 Use the small target detection model trained in step 5 to detect untrained port operation area videos, and analyze false detections and missed detections.
  • False detection: a region that contains no object is judged by the detection network to be some kind of object.
  • Missed detection: objects present in the picture are not detected by the detection neural network.
  • Step 7 Based on the false detections and missed detections, the data are updated and augmented, and reinforced training is carried out to improve the robustness of the detection model.
  • For the false detections found in step 6, it is necessary to analyze whether the false detection is stable. If it is stable, pictures of the scene from multiple time periods can be added to the data set as negative samples; if it occurs only occasionally, it does not need to be added to the training set as a negative sample.
  • For the missed detections found in step 6, if the size of the missed person does not meet the previously set standard for small-size persons, or if the person's surrounding environment is too complicated, the person is not added as training data; otherwise, the picture is correctly labeled and put into the detection data set.
  • Random cropping: parts of the picture are randomly cropped out for detection. This method changes the relative sizes of objects, so the model can extract object features from objects of different relative scales, which improves the detection accuracy for this type of object.
  • Random horizontal flip: the picture is flipped horizontally. Since objects in surveillance footage never appear vertically flipped (i.e. upside down), only horizontal flipping is used to augment the training data.
  • Random adjustment of brightness, contrast, and saturation: the surveillance video obtained from the port operation area cannot cover all natural conditions on site, such as overcast sky, backlight, and shadow occlusion; if such conditions occur, the trained model may detect unstably. Therefore, the brightness, contrast, and saturation of the pictures are randomly adjusted to improve the robustness of the trained model (a code sketch of these augmentations follows below).
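The following sketch shows one possible implementation of these three augmentations with OpenCV and NumPy. The value ranges and the synthetic input frame are illustrative assumptions, and the box labels, which must be cropped or shifted along with the image in a real pipeline, are omitted for brevity.

```python
import random
import cv2
import numpy as np

def random_crop(img, min_scale=0.6):
    """Randomly crop a sub-region, which changes the relative size of objects."""
    h, w = img.shape[:2]
    scale = random.uniform(min_scale, 1.0)
    ch, cw = int(h * scale), int(w * scale)
    y0, x0 = random.randint(0, h - ch), random.randint(0, w - cw)
    return img[y0:y0 + ch, x0:x0 + cw]

def random_hflip(img, p=0.5):
    """Horizontal flip only; vertical flips would look unrealistic in surveillance."""
    return cv2.flip(img, 1) if random.random() < p else img

def random_color_jitter(img):
    """Randomly adjust contrast, brightness, and saturation."""
    alpha = random.uniform(0.7, 1.3)   # contrast gain (assumed range)
    beta = random.uniform(-30, 30)     # brightness offset (assumed range)
    img = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] = np.clip(hsv[..., 1] * random.uniform(0.7, 1.3), 0, 255)  # saturation
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

# Synthetic stand-in for a captured video frame.
frame = np.random.randint(0, 256, (576, 1024, 3), dtype=np.uint8)
augmented = random_color_jitter(random_hflip(random_crop(frame)))
print(augmented.shape)
```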
  • Step 1 Select the Yolov5x detection network as the small target detection network in the port operation area.
  • Step 2 Determine the input size of the small target detection network according to the actual needs of on-site small target detection.
  • w_input > 960; here, the number of the form 2^m greater than 960 is taken, that is, 1024.
  • According to the requirement that the input aspect ratio fit the actual video frame, the input size of the model is determined to be 1024×576 (a worked calculation is sketched below).
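A worked version of this sizing calculation is sketched below; the 1920×1080 source resolution and the 32-pixel width assumed for the smallest person are illustrative assumptions chosen so that the bound reproduces the 960, 1024, and 576 values quoted above.

```python
# The smallest target must cover at least 1x1 pixel on the stride-16 feature map,
# i.e. at least 16 pixels after the frame is resized to the network input width.
video_w, video_h = 1920, 1080   # assumed source resolution (16:9)
smallest_target_px = 32         # assumed width of the smallest person in the source frame
stride = 16

# After resizing to width w_input, the target spans smallest_target_px * w_input / video_w
# pixels; requiring that to be >= stride gives the lower bound on w_input.
w_min = stride * video_w / smallest_target_px            # = 960
w_input = 1 << (int(w_min) - 1).bit_length()             # next power of two -> 1024
h_input = round(w_input * video_h / video_w / 32) * 32   # keep 16:9, multiple of 32 -> 576
print(w_min, w_input, h_input)                           # 960.0 1024 576
```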
  • Step 3 By modifying the model configuration file of Yolov5x, on the basis of the original pyramid feature maps with moving step sizes of 8, 16, and 32, a pyramid feature map with a moving step size of 4 (size 256×144) is added.
  • Step 4 Use part of the surveillance video, mark screenshots based on motion information, and create a data set.
  • The duration of this video is 1 minute, and people are present for only 20 seconds of it. If a frame were captured every fixed number of frames as usual, about 67% of the captured frames would contain no people or would duplicate neighboring frames. Such frames are invalid and cannot be put into the training data set.
  • In OpenCV, this Gaussian-mixture background subtraction is provided by the function named createBackgroundSubtractorMOG2.
  • Using it, the required video frames are captured, and the training, validation, and test data sets required for training the detection network are produced (see the sketch below).
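A minimal sketch of this motion-triggered capture with OpenCV's MOG2 background subtractor is shown below; the video path, the MOG2 parameters, and the moving-pixel threshold are illustrative assumptions.

```python
import os
import cv2

os.makedirs("dataset", exist_ok=True)
cap = cv2.VideoCapture("port_area.mp4")   # hypothetical surveillance video file
# Gaussian-mixture background model; detectShadows=False keeps the mask binary (0/255).
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=False)
saved = frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = subtractor.apply(frame)     # 255 = moving pixel, 0 = static background
    # Save the frame only when enough pixels are moving (threshold is an assumption).
    if cv2.countNonZero(fg_mask) > 500:
        cv2.imwrite(f"dataset/frame_{frame_idx:06d}.jpg", frame)
        saved += 1
    frame_idx += 1
cap.release()
print(f"saved {saved} candidate frames for labeling")
```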
  • Step 5 Use the data set produced in step 4 to initially train an improved version of the small target detection model.
  • Step 6 Use the small target detection model trained in step 5 to detect untrained port operation area videos, and analyze false detections and missed detections.
  • The trained detection network accurately detects a person and a piece of harbor machinery in the center of the video.
  • The network falsely detected a surveillance camera dome mounted on a rail crane as a person. Since this video was not included in the data set during model training, it can be inferred that the detection network has not yet fully adapted to the complex environment and that such frames need to be added to the training set for adaptive training. In addition, at small scales the camera dome looks very similar to a person, so this camera needs to be used as a negative sample so that the network learns features that distinguish it from humans.
  • The missed detections have two causes: the person is backlit, which makes the person's features less obvious; and the person is squatting, whose appearance differs from that of a normally standing person.
  • For the backlight problem, the data can be augmented during training by changing the brightness and contrast of the pictures; for unstable detection of the squatting state, squatting persons can be randomly cut out and pasted into other pictures of the training set, so that the network learns the features of squatting persons (a sketch of this copy-paste augmentation follows below).
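A sketch of this copy-paste augmentation is given below; the function name, the box format (pixel corners in, normalized YOLO-style label out), and the class id are illustrative assumptions rather than anything specified by the patent.

```python
import random
import numpy as np

def paste_object(src_img, src_box, dst_img, cls_id=0):
    """Cut src_box=(x1, y1, x2, y2) out of src_img, paste it at a random spot in
    dst_img, and return the augmented image plus a YOLO-format label tuple."""
    x1, y1, x2, y2 = src_box
    patch = src_img[y1:y2, x1:x2]
    ph, pw = patch.shape[:2]
    dh, dw = dst_img.shape[:2]
    nx = random.randint(0, dw - pw)   # random top-left corner where the patch
    ny = random.randint(0, dh - ph)   # still fits entirely inside the image
    out = dst_img.copy()
    out[ny:ny + ph, nx:nx + pw] = patch
    # YOLO label: class, x_center, y_center, width, height (all normalized to [0, 1]).
    label = (cls_id, (nx + pw / 2) / dw, (ny + ph / 2) / dh, pw / dw, ph / dh)
    return out, label

# Tiny demo on synthetic images (a real pipeline would load labeled video frames).
src = np.full((576, 1024, 3), 255, dtype=np.uint8)
dst = np.zeros((576, 1024, 3), dtype=np.uint8)
aug, label = paste_object(src, (100, 200, 140, 280), dst)   # a 40x80 "person" patch
print(label)
```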
  • Step 7 Based on the false detections and missed detections, the data are updated and augmented, and reinforced training is carried out to improve the robustness of the detection model.
  • Figure 8A and Figure 8B show the effect of the first augmentation method (taking contrast adjustment as an example: Figure 8A is the original image, and Figure 8B is the image after adjusting the contrast).
  • Fig. 9A and Fig. 9B show the effect of the second augmentation method (Fig. 9A is the original image, and Fig. 9B adds the missed persons to other positions of the image).
  • the present application can achieve at least one of the following beneficial effects:
  • This application improves the detection ability of small target objects in the port operation area by redesigning the network structure of Yolov5x.
  • the detection accuracy of the detection model for port personnel is increased from 90% of Yolo-tiny to 95%.
  • The Yolov5x network, deployed on the Huawei Atlas-300 accelerator card, reduces the running time for the same input size from the original 120 ms to the current 50 ms.
  • This application uses a motion detection algorithm to effectively obtain real-time monitoring data sets in port operation areas. Accurately obtain the image data of moving objects in the real-time monitoring video of the port operation area, and filter out the video frames with static objects (the front and rear frames remain unchanged) and no objects, which improves the efficiency of obtaining monitoring data sets.
  • data augmentation is carried out in a targeted manner to improve the detection ability of small target objects in the port operation area.
  • targeted data augmentation can not only reduce the amount of useless training as much as possible, but also increase the recognition accuracy from the original 92% to the current 95%.
  • The target detection apparatus encompasses all kinds of apparatuses, devices, and machines for processing data; for example, the target detection apparatus includes a programmable processor, a computer, multiple processors or multiple computers, and the like.
  • The apparatus may include code that creates an execution environment for the computer program in question, for example code constituting processor firmware, a protocol stack, a database management system, an operating system, a runtime environment, or a combination of one or more of them.
  • The methods and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs to perform these functions by operating on surveillance video and generating target detection results.
  • The computer also includes, or is operably coupled to, one or more mass storage devices (e.g., magnetic, magneto-optical, or optical disks) for storing historical video data and data sets, to receive data from the mass storage devices, to transmit data to them, or both.
  • Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media and storage devices including, for example: semiconductor memory devices such as EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically erasable programmable read-only memory) and flash memory devices; magnetic disks, such as internal or removable hard disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
  • the processes of the methods in the above embodiments can be implemented by instructing related hardware through a computer program, and the program can be stored in a computer-readable storage medium.
  • the computer-readable storage medium is a magnetic disk, an optical disk, a read-only memory or a random access memory, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The present application belongs to the technical field of security monitoring of a port operation area, and relates to a method and apparatus for detecting a tiny target in a port operation area on the basis of computer vision, which solve the problems of existing training data being relatively limited and the accuracy of tiny target detection being low. The method comprises: performing motion information detection on a historical monitoring video of a port operation area, taking screenshots according to the motion information, and labeling targets to make a data set; establishing a neural network Yolov5x, and performing preliminary training on the neural network Yolov5x by using one part of the data set, so as to acquire a target detection model; performing target detection on the other part of the data set by using the target detection model, so as to analyze false detection and missed detection targets; updating the data set on the basis of the false detection and/or missed detection targets by means of data augmentation, so as to perform reinforcement training on the target detection model; and performing, by using the reinforced target detection model, target detection on pictures to be detected, thereby improving the accuracy of small target detection.

Description

A Method and Apparatus for Detecting Tiny Targets in a Port Operation Area Based on Computer Vision

Technical Field

The present application relates to the technical field of security monitoring in port operation areas, and in particular to a method and apparatus for detecting tiny targets in port operation areas based on computer vision.
Background Art

The port operation area is a place with clear boundaries that can accommodate the complete container loading and unloading process, including water areas such as harbor basins, anchorages, approach channels, and berths, as well as land areas such as freight stations, storage yards, wharf fronts, and office and living areas. It is the hub of land and water transport, a buffer for container cargo when switching modes of transportation, and a handover point for cargo, occupying an important position in the entire container transportation process.

Because of the important position of the port operation area, its requirements for safety and order are high, so the port operation area needs to be equipped with a strict security system. The most important security system is the video surveillance system. A traditional video surveillance system can monitor all locations in the port operation area 24 hours a day. However, because a traditional monitoring system has no complex video analysis capability, and because there are many monitoring points while security personnel have limited attention, violations cannot be responded to in time with the existing video surveillance system. Therefore, a computer vision-based target detection algorithm is introduced into the video surveillance system to detect people, vehicles, operating machinery, and other objects in the port operation area in real time, and thereby to monitor violations in the port operation area.

The traditional computer vision target detection algorithm performs detection based on traditional digital image features such as object edges. Because the scene of the port operation area is complex, detection with traditional digital image features suffers from serious false detections; and because the port operation area is extremely large, with lateral distances of more than 300 meters, the relative size of objects in the area is very small, so tiny objects are difficult to detect with traditional digital image features.

In recent years, deep learning has been applied to computer vision and has demonstrated powerful capabilities. Through multi-layer processing, deep learning gradually extracts high-level semantic features from the low-level feature representation of a picture, and based on these high-level semantic features it can accurately identify objects in digital images. The existing method collects images with purpose-built cameras, obtains a training set for training, a validation set for validation, and a test set for testing, and labels the data of each picture set. At the same time, k-means clustering is applied to the labeling results of the data set to obtain the sizes of the pre-selection (anchor) boxes. The yolo-tiny framework, combined with a genetic algorithm, is then used for model training and target detection.
The existing deep learning detection methods have the following problems:

1. Dedicated cameras need to be set up to collect images. However, in the port operation area the security surveillance cameras have already been installed and there are many camera points, so ideal image information cannot be obtained at every monitoring point.

2. Yolo-tiny's backbone is shallow and its network width is small, so the algorithm cannot obtain enough semantic information, which in turn leads to low detection accuracy for tiny objects.
发明内容Contents of the invention
鉴于上述的分析,本申请实施例旨在提供一种基于计算机视觉的港口作业区微小目标检测方法和装置,用以解决现有训练数据相对有限以及微小物体的检测准确度低的问题。In view of the above analysis, the embodiment of the present application aims to provide a method and device for detecting small objects in port operation areas based on computer vision, so as to solve the problems of relatively limited existing training data and low detection accuracy of small objects.
一方面,本申请实施例提供了一种基于计算机视觉的目标检测方法,包括:对港口作业区的历史监控视频进行运动信息检测,并根据所述运动信息进行截图并对目标进行标注以制作数据集;建立神经网络Yolov5x,利用所述数据集中的一部分对所述神经网络Yolov5x进行初步训练以获取目标检测模型;使用所述目标检测模型对所述数据集中的另一部分进行目标检测以分析误检目标和漏检目标;基于所述误检目标和/或所述漏检目标,通过数据增广更新所述数据集,并利用更新的数据集对所述目标检测模型进行强化训练;以及每隔预定时间对所述港口作业区的当前监控视频进行截图以获得待检测图片并利用强化的目标检测模型对所述待检测图片进行目标检测。On the one hand, the embodiment of the present application provides a computer vision-based target detection method, including: detecting motion information on the historical surveillance video of the port operation area, and taking a screenshot according to the motion information and marking the target to create data set; set up a neural network Yolov5x, utilize a part of the data set to carry out preliminary training to the neural network Yolov5x to obtain a target detection model; use the target detection model to perform target detection on another part of the data set to analyze false detections target and missed detection target; based on the false detection target and/or the missed detection target, update the data set through data augmentation, and use the updated data set to carry out intensive training on the target detection model; and every Taking a screenshot of the current surveillance video of the port operation area at a predetermined time to obtain a picture to be detected, and using an enhanced target detection model to perform target detection on the picture to be detected.
上述技术方案的有益效果如下:基于误检目标和/或漏检目标,通过数据增广更新数据集以获取足够的港口作业区训练数据,并利用更新的数据集对目标检测模型进行强化训练以提升目标检测模型的鲁棒性。利用强化的目标检测模型对待检测图片进行目标检测,设计港口作业区小目标检测网络,提升小目标的检测准确率。The beneficial effects of the above-mentioned technical solution are as follows: Based on the false detection target and/or the missed detection target, the data set is updated through data augmentation to obtain sufficient training data in the port operation area, and the updated data set is used to strengthen the training of the target detection model to Improve the robustness of object detection models. Use the enhanced target detection model to perform target detection on the pictures to be detected, design a small target detection network in the port operation area, and improve the detection accuracy of small targets.
基于上述方法的进一步改进,对港口作业区的历史监控视频进行运动信息检测,并根据所述运动信息进行截图并对目标进行标注以制作数据集进一步包括:从数据库中获取所述港口作业区的历史监控视频;根据所述历史监控视频的图片帧间信息使用高斯混合模型将所述历史监控视频分类为静止像素和运动像素以判断出所述历史监控视频中的运动像素区域;以及对所述历史监控视频中存在运动像素区域的图片进行截图并且对所述运动像素区域中的目标进行标注以生成所述数据集。Based on the further improvement of the above method, the historical monitoring video of the port operation area is detected for motion information, and taking a screenshot according to the motion information and marking the target to make a data set further includes: obtaining the port operation area from the database. Historical monitoring video; using Gaussian mixture model to classify the historical monitoring video into static pixels and moving pixels according to the picture frame information of the historical monitoring video to determine the moving pixel area in the historical monitoring video; and for the Screenshots are taken of pictures with moving pixel areas in historical surveillance videos, and objects in the moving pixel areas are marked to generate the data set.
基于上述方法的进一步改进,所述目标包括要检测的尺寸相对较大的大目标和尺寸相对较小的小目标,在利用所述数据集对所述神经网络Yolov5x进行训练以获取目标检测模型之前,还包括:根据所述港口作业区的所述历史监控视频和所述当前监控视频中的小目标的尺寸,确定所述数据集中的每个图片帧的输入尺寸。Based on the further improvement of the above method, the target includes a large target with a relatively large size and a small target with a relatively small size to be detected, before using the data set to train the neural network Yolov5x to obtain a target detection model , further comprising: determining the input size of each picture frame in the data set according to the size of the small target in the historical surveillance video of the port operation area and the current surveillance video.
基于上述方法的进一步改进,建立神经网络Yolov5x还包括:所述神经网络Yolov5x的基网使用CSP网络架构;以及在所述神经网络Yolov5x的移动步长为8、16和32的金字塔特征图的基础上,增加移动步长为4的金字塔特征图以检测所述小目标。Based on the further improvement of the above method, the establishment of the neural network Yolov5x also includes: the base network of the neural network Yolov5x uses a CSP network architecture; Above, a pyramid feature map with a moving step size of 4 is added to detect the small object.
基于上述方法的进一步改进,分析误检目标和漏检目标进一步包括:将使用所述目标检测模型检测到的目标与所述数据集的对应的图片中的标注目标进行比较,以确定所述误检目标和所述漏检目标,其中,将所述对应的被标注的第一图片中本来没有 目标,而使用所述目标检测模型检测到的目标确定为所述误检目标;以及将所述对应的被标注的第二图片中具有目标,而使用所述目标检测模型没有检测到的目标确定为所述漏检目标。Based on a further improvement of the above method, analyzing the falsely detected object and the missed object further includes: comparing the object detected by the object detection model with the marked object in the corresponding picture of the data set to determine the falsely detected object The detection target and the missed detection target, wherein, the target detected by the target detection model without the target in the corresponding marked first picture is determined as the false detection target; and the There is a target in the corresponding marked second picture, and the target that is not detected by using the target detection model is determined as the missed detection target.
In a further refinement of the above method, updating the data set through data augmentation based on the falsely detected target, wherein the updated data set is used to improve the robustness of the target detection model, further comprises: when the falsely detected target is a stable false detection, correctly labeling the first picture in which the false detection occurred and adding it to the data set; randomly cropping the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set; and feeding pictures of targets with different relative sizes together with the negative-sample pictures into the target detection model, so that the target detection model learns to distinguish the real targets from the falsely detected target.
In a further refinement of the above method, updating the data set through data augmentation based on the missed target, wherein the updated data set is used to improve the robustness of the target detection model, further comprises: performing data augmentation by changing the brightness and contrast of the second picture, and/or by randomly cutting the missed target out of the second picture and pasting it at other positions of the second picture, so as to update the data set, the second picture being a picture that contains a missed target; and feeding the second picture with the pasted missed target into the target detection model, so that the target detection model becomes able to detect the missed target.
基于上述方法的进一步改进,所述大目标包括港口作业机械;所述小目标包括港口人员;所述误检目标包括摄像机;以及所述漏检目标包括下蹲人员。Based on a further improvement of the above method, the large target includes port operation machinery; the small target includes port personnel; the falsely detected target includes a camera; and the missed detected target includes squatting personnel.
In another aspect, an embodiment of the present application provides a computer-vision-based target detection apparatus, comprising: a data set generation module configured to detect motion information in historical surveillance video of a port operation area, take screenshots according to the motion information and label the targets to create a data set; a model building module configured to build a neural network Yolov5x and perform preliminary training of the neural network Yolov5x with one part of the data set to obtain a target detection model; a target analysis module configured to perform target detection on another part of the data set with the target detection model to analyze falsely detected targets and missed targets; a data set update module configured to update the data set through data augmentation based on the falsely detected targets and/or the missed targets; an intensive training module configured to further train the target detection model with the updated data set; and a target detection module configured to take a screenshot of the current surveillance video of the port operation area at predetermined intervals to obtain a picture to be detected and to perform target detection on the picture to be detected with the strengthened target detection model.
In a further refinement of the above apparatus, the data set generation module is configured to: obtain the historical surveillance video of the port operation area from a database; classify the historical surveillance video into static pixels and moving pixels with a Gaussian mixture model according to the inter-frame information of the video, so as to determine the moving-pixel regions in the historical surveillance video; and take screenshots of the pictures in the historical surveillance video that contain moving-pixel regions and label the targets in those regions to generate the data set.
In a further refinement of the above apparatus, the computer-vision-based target detection apparatus further comprises an input size determination module configured to determine the input size of each picture frame in the data set according to the size of the small targets in the historical surveillance video and the current surveillance video of the port operation area.
In a further refinement of the above apparatus, the model building module is further configured to: use the CSP network architecture as the backbone of the neural network Yolov5x; and, on top of the pyramid feature maps of the neural network Yolov5x with strides of 8, 16 and 32, add a pyramid feature map with a stride of 4 to detect the small targets.
In a further refinement of the above apparatus, the target analysis module is configured to compare the targets detected with the target detection model against the labeled targets in the corresponding pictures of the data set to determine the falsely detected targets and the missed targets, wherein the target analysis module comprises a false-detection analysis submodule and a missed-detection analysis submodule; the false-detection analysis submodule is configured to determine a target that is detected by the target detection model in a corresponding labeled first picture containing no such target as a falsely detected target; and the missed-detection analysis submodule is configured to determine a target that is present in a corresponding labeled second picture but is not detected by the target detection model as a missed target.
In a further refinement of the above apparatus, the data set update module is configured to: when the falsely detected target is a stable false detection, correctly label the first picture in which the false detection occurred and add it to the data set; randomly crop the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set; and feed pictures of targets with different relative sizes together with the negative-sample pictures into the target detection model, so that the target detection model learns to distinguish the real targets from the falsely detected target.
In a further refinement of the above apparatus, the data set update module is configured to: perform data augmentation by changing the brightness and contrast of the second picture, and/or by randomly cutting the missed target out of the second picture and pasting it at other positions of the second picture, so as to update the data set, the second picture being a picture that contains a missed target; and feed the second picture with the pasted missed target into the target detection model, so that the target detection model becomes able to detect the missed target.
基于上述装置的进一步改进,所述大目标包括港口作业机械;所述小目标包括港口人员;所述误检目标包括摄像机;以及所述漏检目标包括下蹲人员。Based on a further improvement of the above device, the large target includes port operation machinery; the small target includes port personnel; the falsely detected target includes a camera; and the missed detected target includes squatting personnel.
与现有技术相比,本申请至少可实现如下有益效果之一:Compared with the prior art, the present application can achieve at least one of the following beneficial effects:
1. Based on the falsely detected targets and/or missed targets, the data set is updated through data augmentation to obtain sufficient training data for the port operation area, and the updated data set is used for intensive training of the target detection model to improve its robustness. The strengthened target detection model is then used to detect targets in the pictures to be detected, and a small-target detection network is designed for the port operation area, which improves the detection accuracy for small targets.
2、通过重新设计Yolov5x的网络结构,提升了港口作业区小目标物体的检测能力。通过***设计监测模型输入尺寸以及添加步长为4的金字塔特征图,将检测模型对港口人员的检测准确率由Yolo-tiny的90%提升到92%。2. By redesigning the network structure of Yolov5x, the detection ability of small target objects in the port operation area has been improved. By systematically designing the input size of the monitoring model and adding a pyramid feature map with a step size of 4, the detection accuracy of the detection model for port personnel is increased from 90% of Yolo-tiny to 92%.
3、使用运动检测算法,有效获取港口作业区实时监控数据集。准确获取港口作业区实时监控视频中有物体运动的图片数据,而将物体静止(前后帧保持不变)、没有物体的视频帧过滤掉,提升了获取监控数据集的效率。3. Use the motion detection algorithm to effectively obtain the real-time monitoring data set of the port operation area. Accurately obtain the image data of moving objects in the real-time monitoring video of the port operation area, and filter out the video frames with static objects (the front and rear frames remain unchanged) and no objects, which improves the efficiency of obtaining monitoring data sets.
4、根据检测的具体情况,有针对性的进行数据增广,提升港口作业区小目标物 体的检测能力。根据检测具体情况,针对性进行数据增广,不仅可以尽量减少无用的训练量,还可以将识别的准确率由原先的92%提升到现在的95%。4. According to the specific situation of the detection, data augmentation is carried out in a targeted manner to improve the detection ability of small target objects in the port operation area. According to the specific situation of the detection, targeted data augmentation can not only reduce the amount of useless training as much as possible, but also increase the recognition accuracy from the original 92% to the current 95%.
本申请中,上述各技术方案之间还可以相互组合,以实现更多的优选组合方案。本申请的其他特征和优点将在随后的说明书中阐述,并且,部分优点可从说明书中变得显而易见,或者通过实施本申请而了解。本申请的目的和其他优点可通过说明书以及附图中所特别指出的内容中来实现和获得。In the present application, the above technical solutions can also be combined with each other to realize more preferred combination solutions. Additional features and advantages of the application will be set forth in the description which follows, and some of the advantages will be apparent from the description, or may be learned by practice of the application. The objectives and other advantages of the application will be realized and attained by the matter particularly pointed out in the written description and accompanying drawings.
附图说明Description of drawings
附图仅用于示出具体实施例的目的,而并不认为是对本申请的限制,在整个附图中,相同的参考符号表示相同的部件。The drawings are for the purpose of illustrating specific embodiments only and are not to be considered limiting of the application, and like reference numerals refer to like parts throughout the drawings.
图1为根据本申请实施例的基于计算机视觉的目标检测方法的流程图。FIG. 1 is a flowchart of a computer vision-based object detection method according to an embodiment of the present application.
图2为根据本申请实施例的用于预测不同尺度的目标的金字塔特征图的示图。FIG. 2 is a diagram of a pyramid feature map used for predicting objects of different scales according to an embodiment of the present application.
FIG. 3 is a diagram of the detection result of using a Gaussian mixture model to detect moving objects in a video according to an embodiment of the present application.
图4为根据本申请实施例的港口作业区的视频中截取的图片示图。Fig. 4 is an illustration of a picture intercepted from a video of a port operation area according to an embodiment of the present application.
图5为根据本申请实施例的在使用目标检测模型进行目标检测时,存在误检目标的示图。Fig. 5 is a diagram showing falsely detected targets when using a target detection model for target detection according to an embodiment of the present application.
图6为根据本申请实施例的在使用目标检测模型进行目标检测时,存在漏检目标的示图。Fig. 6 is a diagram showing missed detection targets when using a target detection model for target detection according to an embodiment of the present application.
图7A和图7B分别为根据本申请实施例的漏检目标的原图及其随机抠图。FIG. 7A and FIG. 7B are respectively the original image of the missed detection object and its random cutout image according to the embodiment of the present application.
图8A和图8B分别为根据本申请实施例的原图和调节对比度后的示图。FIG. 8A and FIG. 8B are respectively the original image and the image after adjusting the contrast according to the embodiment of the present application.
图9A和图9B分别为根据本申请实施例的原图和添加漏检人员后的示图。FIG. 9A and FIG. 9B are respectively the original image and the view after adding the missing persons according to the embodiment of the present application.
图10为根据本申请实施例的基于计算机视觉的目标检测装置的框图。FIG. 10 is a block diagram of an object detection device based on computer vision according to an embodiment of the present application.
Detailed Description of the Embodiments
下面结合附图来具体描述本申请的优选实施例,其中,附图构成本申请一部分,并与本申请的实施例一起用于阐释本申请的原理,并非用于限定本申请的范围。Preferred embodiments of the application are described in detail below in conjunction with the accompanying drawings, wherein the drawings constitute a part of the application and together with the embodiments of the application are used to explain the principles of the application and are not intended to limit the scope of the application.
A specific embodiment of the present application discloses a computer-vision-based target detection method. Referring to FIG. 1, the computer-vision-based target detection method comprises: in step S102, detecting motion information in historical surveillance video of a port operation area, taking screenshots according to the motion information and labeling the targets to create a data set; in step S104, building a neural network Yolov5x and performing preliminary training of the neural network Yolov5x with one part of the data set to obtain a target detection model; in step S106, performing target detection on another part of the data set with the target detection model to analyze falsely detected targets and missed targets; in step S108, updating the data set through data augmentation based on the falsely detected targets and/or the missed targets, and performing intensive training of the target detection model with the updated data set; and in step S110, taking a screenshot of the current surveillance video of the port operation area at predetermined intervals to obtain a picture to be detected and performing target detection on the picture to be detected with the strengthened target detection model.
与现有技术相比,本实施例提供的基于计算机视觉的目标检测方法中,基于误检目标和/或漏检目标,通过数据增广更新数据集以获取足够的港口作业区训练数据,并利用更新的数据集对目标检测模型进行强化训练以提升目标检测模型的鲁棒性。利用强化的目标检测模型对待检测图片进行目标检测,设计港口作业区小目标检测网络,提升小目标的检测准确率。Compared with the prior art, in the target detection method based on computer vision provided by this embodiment, based on false detection targets and/or missed detection targets, the data set is updated through data augmentation to obtain sufficient training data in the port operation area, and The object detection model is intensively trained with the updated data set to improve the robustness of the object detection model. Use the enhanced target detection model to perform target detection on the pictures to be detected, design a small target detection network in the port operation area, and improve the detection accuracy of small targets.
下文中,参考图1,对根据本申请实施例的基于计算机视觉的目标检测方法的步骤S102至S110进行详细描述。Hereinafter, with reference to FIG. 1 , steps S102 to S110 of the method for object detection based on computer vision according to an embodiment of the present application will be described in detail.
In step S102, motion information is detected in the historical surveillance video of the port operation area, screenshots are taken according to the motion information and the targets are labeled to create a data set. The targets include, for example, people, vehicles and operating machinery. Specifically, this step further comprises: obtaining the historical surveillance video of the port operation area from a database; classifying the historical surveillance video into static pixels and moving pixels with a Gaussian mixture model according to the inter-frame information of the video, so as to determine the moving-pixel regions in the historical surveillance video; and taking screenshots of the pictures in the historical surveillance video that contain moving-pixel regions and labeling the targets in those regions to generate the data set.
然后,根据港口作业区的历史监控视频和当前监控视频中的小目标的尺寸,确定数据集中的每个图片帧的输入尺寸。具体地,为了稳定检测到微小物体,我们规定物体在步长为16的特征图上至少为1×1像素。Then, according to the historical monitoring video of the port operation area and the size of the small target in the current monitoring video, the input size of each picture frame in the data set is determined. Specifically, in order to stably detect tiny objects, we stipulate that the object is at least 1×1 pixel on the feature map with a stride of 16.
In step S104, the neural network Yolov5x is built, and one part of the data set is used for preliminary training of the neural network Yolov5x to obtain a target detection model. Specifically, building the neural network Yolov5x further comprises: using the CSP network architecture as the backbone of the neural network Yolov5x; and, on top of the pyramid feature maps of the neural network Yolov5x with strides of 8, 16 and 32, adding a pyramid feature map with a stride of 4 to detect the small targets. A feature pyramid network is used to efficiently extract features of every dimension: highly semantic, low-spatial-scale features obtained from the top of the feature pyramid network are upsampled and fused with the weakly semantic, high-spatial-scale features in the network, so that irrelevant information is suppressed and the target information is highlighted. For example, the large targets are port operating machinery, and the small targets include port personnel or other people appearing in the surveillance video.
In step S106, the target detection model is used to perform target detection on another part of the data set to analyze falsely detected targets and missed targets. For example, the falsely detected targets include cameras and the missed targets include squatting personnel. Specifically, analyzing the falsely detected targets and the missed targets further comprises: comparing the targets detected with the target detection model against the labeled targets in the corresponding pictures of the data set to determine the falsely detected targets and the missed targets, wherein a target that is detected by the target detection model in a corresponding labeled first picture that contains no such target is determined to be a falsely detected target, and a target that is present in a corresponding labeled second picture but is not detected by the target detection model is determined to be a missed target.
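As a non-limiting illustration, the comparison of detections against labeled targets described above can be sketched in Python as follows; the [x1, y1, x2, y2] box format, the 0.5 IoU matching threshold and the function names are assumptions made for this sketch and are not specified in the present application (class labels are ignored for brevity):

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def analyze_detections(detections, annotations, thr=0.5):
    """Split model outputs into false detections and missed labeled targets."""
    matched = set()
    false_dets = []
    for det in detections:
        best = max(range(len(annotations)),
                   key=lambda j: iou(det, annotations[j]),
                   default=None)
        if best is not None and iou(det, annotations[best]) >= thr:
            matched.add(best)          # detection explained by a labeled target
        else:
            false_dets.append(det)     # detected where nothing was labeled
    missed = [annotations[j] for j in range(len(annotations)) if j not in matched]
    return false_dets, missed
```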
In step S108, the data set is updated through data augmentation based on the falsely detected targets and/or the missed targets so as to improve the robustness of the target detection model, and the updated data set is used for intensive training of the target detection model. Specifically, updating the data set through data augmentation based on a falsely detected target further comprises: when the falsely detected target is a stable false detection, correctly labeling the first picture in which the false detection occurred and adding it to the data set; randomly cropping the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set; and feeding pictures of targets with different relative sizes together with the negative-sample pictures into the target detection model, so that the model learns to distinguish the real targets from the falsely detected target. Updating the data set through data augmentation based on a missed target further comprises: performing data augmentation by changing the brightness and contrast of the second picture, and/or by randomly cutting the missed target out of the second picture and pasting it at other positions of the second picture, so as to update the data set, the second picture being a picture that contains a missed target; and feeding the second picture with the pasted missed target into the target detection model, so that the model becomes able to detect the missed target.
In step S110, a screenshot of the current surveillance video of the port operation area is taken at predetermined intervals to obtain a picture to be detected, and the strengthened target detection model is used to perform target detection on the picture to be detected. For example, surveillance cameras usually record at 25 to 30 frames per second, and one picture is captured out of every two to three frames, i.e., a screenshot is taken every one or two frames. When the surveillance video is recorded at 25 frames per second, the predetermined interval is therefore about 0.08 seconds or 0.12 seconds.
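As an illustration only, the periodic screenshot-and-detect loop of step S110 might look like the following Python sketch; the frame stride of 2 and the detector.detect() interface are assumptions made for this sketch, not part of the present application:

```python
import cv2

def run_detection(video_source, detector, frame_stride=2):
    # video_source: RTSP URL or file path; detector: wrapper around the trained model.
    cap = cv2.VideoCapture(video_source)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_stride == 0:           # keep one frame out of every 2-3 frames
            results = detector.detect(frame)  # boxes for people / vehicles / machinery
            # ...push results to the monitoring display / alarm logic here...
        idx += 1
    cap.release()
```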
本申请的另一个具体实施例,公开了一种基于计算机视觉的目标检测装置。参考图10,基于计算机视觉的目标检测装置包括:数据集生成模块1002、输入尺寸确定模块、模型建立模块1004、目标分析模块1006、数据集更新模块1008、强化训练模块1010和目标检测模块1012。Another specific embodiment of the present application discloses a computer vision-based object detection device. Referring to FIG. 10 , the computer vision-based target detection device includes: a data set generation module 1002, an input size determination module, a model building module 1004, a target analysis module 1006, a data set update module 1008, an intensive training module 1010 and a target detection module 1012.
数据集生成模块1002,用于对港口作业区的历史监控视频进行运动信息检测,并根据运动信息进行截图并对目标进行标注以制作数据集。数据集生成模块1002从数据库中获取港口作业区的历史监控视频,具体地,由摄像机拍摄监控视频并存储在数据库中;根据历史监控视频的图片帧间信息使用高斯混合模型将历史监控视频分类为静止像素和运动像素以判断出历史监控视频中的运动像素区域;以及对历史监控视频中存在运动像素区域的图片进行截图并且对运动像素区域中的目标进行标注以生成数据集,并将数据集存储在存储服务器的数据库中。The data set generation module 1002 is used to detect motion information on the historical surveillance video of the port operation area, take screenshots and mark targets according to the motion information to create a data set. The data set generation module 1002 obtains the historical monitoring video of the port operation area from the database, specifically, the monitoring video is taken by the camera and stored in the database; according to the information between the picture frames of the historical monitoring video, the Gaussian mixture model is used to classify the historical monitoring video into Static pixels and moving pixels to determine the moving pixel area in the historical surveillance video; and take a screenshot of the picture with the moving pixel area in the historical monitoring video and mark the target in the moving pixel area to generate a data set, and the data set Stored in the database of the storage server.
输入尺寸确定模块根据港口作业区的历史监控视频和当前监控视频中的小目标的尺寸,确定数据集中的每个图片帧的输入尺寸。The input size determination module determines the input size of each picture frame in the data set according to the historical surveillance video of the port operation area and the size of the small target in the current surveillance video.
The model building module 1004 is configured to build the neural network Yolov5x and to perform preliminary training of the neural network Yolov5x with one part of the data set to obtain a target detection model. The model building module is further configured to: use the CSP network architecture as the backbone of the neural network Yolov5x; and, on top of the pyramid feature maps of the neural network Yolov5x with strides of 8, 16 and 32, add a pyramid feature map with a stride of 4 to detect the small targets. A feature pyramid network is used to efficiently extract features of every dimension: highly semantic, low-spatial-scale features obtained from the top of the feature pyramid network are upsampled and fused with the weakly semantic, high-spatial-scale features in the network, so that irrelevant information is suppressed and the target information is highlighted. The large targets include port operating machinery; the small targets include port personnel.
目标分析模块1006,用于使用目标检测模型对数据集中的另一部分进行目标检测以分析误检目标和漏检目标。目标分析模块1006用于将使用目标检测模型检测到的目标与数据集的对应的图片中的标注目标进行比较,以确定误检目标和漏检目标。例如,误检目标包括摄像机;以及漏检目标包括下蹲人员。检测分析模块包括误检目标分析子模块和漏检目标分析子模块。误检目标分析子模块,用于将对应的被标注的第一图片中本来没有目标,而使用目标检测模型检测到的目标确定为误检目标。漏检目标分析子模块,用于将对应的被标注的第二图片中具有目标,而使用目标检测模型没有检测到的目标确定为漏检目标。The target analysis module 1006 is configured to use the target detection model to perform target detection on another part of the data set to analyze falsely detected targets and missed detected targets. The target analysis module 1006 is used to compare the target detected by using the target detection model with the marked target in the corresponding picture of the data set, so as to determine the falsely detected target and the missed detected target. For example, false detection targets include cameras; and missed detection targets include squatting people. The detection analysis module includes a false detection target analysis sub-module and a missed detection target analysis sub-module. The false detection target analysis sub-module is used to determine the target detected by the target detection model as a false detection target if there is no target in the corresponding marked first picture. The missed target analysis sub-module is configured to determine the target that has a target in the corresponding labeled second picture but is not detected by the target detection model as a missed target.
The data set update module 1008 is configured to update the data set through data augmentation based on the falsely detected targets and/or the missed targets. The data set update module is configured to: when a falsely detected target is a stable false detection, correctly label the first picture in which the false detection occurred and add it to the data set, and randomly crop the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set. The data set update module is further configured to perform data augmentation by changing the brightness and contrast of the second picture and/or by randomly cutting the missed target out of the second picture and pasting it at other positions of the second picture, so as to update the data set, the second picture being a picture that contains a missed target.
强化训练模块1010,用于利用更新的数据集对目标检测模型进行强化训练。将具有不同相对尺寸的目标的图片与负样本的图片输入目标检测模型,使得目标检测模型识别出目标与误检目标之间的区别。将粘贴有漏检目标的第二图片输入目标检测模型,使得目标检测模型能够检测出漏检目标。The intensive training module 1010 is configured to use the updated data set to perform intensive training on the target detection model. The images of objects with different relative sizes and the images of negative samples are fed into the object detection model, so that the object detection model recognizes the difference between objects and falsely detected objects. Inputting the second picture pasted with the missed target into the target detection model, so that the target detection model can detect the missed target.
目标检测模块1012,用于每隔预定时间对港口作业区的当前监控视频进行截图以获得待检测图片并利用强化的目标检测模型对待检测图片进行目标检测。具体地,在显示器上显示利用目标检测模型实时检测到的目标,其中,目标包括人员、车辆和作业机械等物体,并实时显示港口作业区的违规行为。另外,通过扬声器将目标的违规行为通知安保人员。违规行为包括人员未带安全帽和/或没有穿工作服、人员下蹲等;车辆超速、逆行等。The target detection module 1012 is configured to take screenshots of the current surveillance video of the port operation area at predetermined intervals to obtain pictures to be detected, and use the enhanced target detection model to perform target detection on the pictures to be detected. Specifically, the targets detected by the target detection model in real time are displayed on the display, where the targets include objects such as people, vehicles, and operating machinery, and the violations in the port operation area are displayed in real time. Additionally, security personnel are notified of target violations via loudspeaker. Violations include personnel not wearing safety helmets and/or work clothes, personnel squatting, etc.; vehicles speeding, retrograde, etc.
下文中,以具体实例的方式对基于计算机视觉的目标检测方法进行详细描述。Hereinafter, the object detection method based on computer vision will be described in detail by way of specific examples.
本申请提出的港口作业区小目标检测算法是以最新的单阶段检测神经网络YOLOv5为原型进行改进的算法。首先,通过增大网络输入尺寸并修正输入尺寸比例, 将输入图像的细节信息进行分析同时,提升检测网络的工作效率;之后,基于YOLOv5原来的金字塔形图像特征,添加更浅层的特征图作为检测输出,提升微小物体检测性能;随后,初步训练一版网络后,根据新数据的检测情况,针对性使用多种数据增广方法,在图片数据有限的前提下,增广数据,进行迭代训练,提升检测网络的鲁棒性。数据增广是深度学习中常用的技巧之一,主要用于增加训练数据集,让数据集尽可能的多样化,使得训练的模型具有更强的泛化能力。The small target detection algorithm in the port operation area proposed in this application is an improved algorithm based on the latest single-stage detection neural network YOLOv5. First of all, by increasing the network input size and correcting the input size ratio, the detailed information of the input image is analyzed and the work efficiency of the detection network is improved; then, based on the original pyramid-shaped image features of YOLOv5, a shallower feature map is added as Detection output to improve the detection performance of small objects; then, after the initial training of a version of the network, according to the detection situation of new data, a variety of data augmentation methods are used in a targeted manner, and under the premise of limited image data, data augmentation is carried out for iterative training , to improve the robustness of the detection network. Data augmentation is one of the commonly used techniques in deep learning. It is mainly used to increase the training data set and make the data set as diverse as possible, so that the trained model has stronger generalization ability.
本申请方法是通过下述技术方案实现的:The application method is achieved through the following technical solutions:
步骤一:基于小目标检测需求,初步选择Yolov5x网络为港口作业区小目标检测网络。Step 1: Based on the small target detection requirements, the Yolov5x network is initially selected as the small target detection network in the port operation area.
使用Yolov5x作为检测网络原因如下:The reasons for using Yolov5x as the detection network are as follows:
1、相比于Yolo-tiny的基网使用的ResNet架构,Yolov5x的基网使用了CSP架构,可以在准确率略微提升的前提下,减小运算量。1. Compared with the ResNet architecture used by the Yolo-tiny base network, the Yolov5x base network uses the CSP architecture, which can reduce the amount of computation while slightly improving the accuracy.
Table 1: Performance comparison between existing methods and Yolov5x
Method | Backbone | Size | FPS | #Parameter | AP50 | AP75 | APS
YoloV3 | Darknet53 | 608 | 30 | 62.3M | 57.9 | 34.4 | 18.3
YoloV3 (SPP) | Darknet53 | 608 | 30 | 62.9M | 60.6 | 38.2 | 20.6
PANet (SPP) | CSPResNeXt50 | 608 | 35 | 56.9M | 60.6 | 41.6 | 22.1
2. Compared with the 1024 channels of Yolo-tiny, Yolov5 has 1280 channels. A larger number of channels means a richer semantic representation, i.e., more precise detection can already be obtained on the shallow feature maps, which improves detection accuracy.
3、相比于Yolo-tiny的两层金字塔特征,Yolov5为三层金字塔特征,可以满足更多尺度物体的检测能力。3. Compared with the two-layer pyramid feature of Yolo-tiny, Yolov5 is a three-layer pyramid feature, which can meet the detection ability of more scale objects.
步骤二:根据现场小目标检测的实际需求,确定小目标检测网络的输入尺寸。Step 2: Determine the input size of the small target detection network according to the actual needs of on-site small target detection.
选择尺寸的方法为:The method for selecting the size is:
1. Determine the resolution of the video, w_origin × w_origin (if the frame is rectangular, w_origin is the longer of the video's width and height), and the minimum size w_min × w_min of the objects that need to be detected.
2. Since the detection coordinates of Yolov5 are determined on pyramid feature maps (the principle of the pyramid feature maps is explained in detail in step 3), scale invariance implies that, for an object to be detectable, it must leave a distinct feature at the corresponding position of the feature map of the corresponding scale; the object therefore needs to occupy at least 1×1 pixel on that feature map.
3. To detect tiny objects stably, we require the object to cover at least 1×1 pixel on the feature map with a stride of 16. If the input size of the model is w_input × w_input, the input size must satisfy the following condition:
\frac{w_{min} \cdot w_{input}}{16 \cdot w_{origin}} \geq 1
Rearranging this condition gives:
w_{input} \geq \frac{16 \cdot w_{origin}}{w_{min}}
Since video frames are generally not square, if the detection model were given a square input size w_input × w_input, a band at the top and bottom of the input could not be used effectively, which wastes computation. We therefore set the aspect ratio of the input size as close as possible to the aspect ratio of the video frames.
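A minimal Python sketch of this input-size rule is given below; rounding the width up to a power of two and the height to a multiple of 32 are assumptions that follow the worked example later in this description, not requirements stated here:

```python
def compute_input_size(w_origin, h_origin, w_min, stride=16):
    # The smallest object must cover at least 1x1 pixel on the stride-16 feature map,
    # so the input width must be at least stride * w_origin / w_min.
    lower_bound = stride * w_origin / w_min
    w_input = 1
    while w_input <= lower_bound:                 # take the next power of two
        w_input *= 2
    # Keep the aspect ratio of the video, rounded to a multiple of 32 (assumed here
    # because Yolov5 downsamples by up to 32).
    h_input = round(w_input * h_origin / w_origin / 32) * 32
    return w_input, h_input
```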
步骤三:在原有的移动步长为8,16,32的金字塔特征图基础上,添加了移动步长为4的金字塔特征图。Step 3: On the basis of the original pyramid feature maps with moving steps of 8, 16, and 32, a pyramid feature map with moving steps of 4 is added.
Definition of the feature pyramid network and the reason for introducing a stride-4 feature map: a feature pyramid network is a way of using a conventional convolutional neural network to efficiently extract features of every scale from a picture. It works as follows: first, the forward pass of the neural network is run, and highly semantic, low-spatial-scale features are obtained at the top of the network; these features are then upsampled and fused with the shallow features of the network (weakly semantic, high-spatial-scale features), so that the fused features have both strong semantics and high resolution; finally, the feature maps of different scales obtained in the first two steps are used to predict objects of different scales (see FIG. 2).
With the network designed according to step 2, a tiny object occupies only about 2×2 pixels even on the feature map with the largest spatial size (the stride-8 feature map), and it is still difficult to judge an object from features of this size. A pyramid feature map with a larger spatial scale, i.e., a stride-4 feature map, therefore needs to be added.
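For illustration, such a stride-4 pyramid level can be sketched in PyTorch roughly as follows; the channel numbers are illustrative assumptions, and in Yolov5x the same effect is obtained by editing the model configuration file rather than by writing a new module:

```python
import torch
import torch.nn as nn

class ExtraP2Level(nn.Module):
    """Fuse the stride-4 backbone feature with the upsampled stride-8 pyramid feature."""
    def __init__(self, c2_channels=160, p3_channels=320, out_channels=160):
        super().__init__()
        self.lateral = nn.Conv2d(c2_channels, out_channels, kernel_size=1)
        self.reduce_p3 = nn.Conv2d(p3_channels, out_channels, kernel_size=1)
        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        self.smooth = nn.Conv2d(2 * out_channels, out_channels, kernel_size=3, padding=1)

    def forward(self, c2, p3):
        # c2: stride-4 backbone feature (high resolution, weak semantics)
        # p3: stride-8 pyramid feature (lower resolution, strong semantics)
        fused = torch.cat([self.lateral(c2), self.reduce_p3(self.upsample(p3))], dim=1)
        return self.smooth(fused)   # stride-4 feature used by an extra detection head
```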
步骤四:使用部分监控视频,根据运动信息进行截图标注,制作数据集。Step 4: Use part of the surveillance video, mark screenshots based on motion information, and create a data set.
Why and how screenshots are taken based on motion information:
由于获取的港口作业区监控视频为现场实际作业视频,现场特点为人员出现数量少,人员出现时间短,因此现场硬盘录像机获取的港口作业区监控视频有效时长极少(5%左右)。如果隔特定时间内进行截取,不仅会截取到大量无效图片(即没有待检测物体),也会错过人员出现时的一系列重要信息。同时,由于没有物体时,现场背景随时间变化不明显;物体出现时,物体大体上是运动的。因此,截图时需要根据运动信息截图,即当视频中有区域进行运动时,截取当时的视频帧。Since the monitoring video of the port operation area obtained is the actual operation video on site, the site is characterized by a small number of personnel and a short time of personnel appearance, so the effective duration of the monitoring video of the port operation area obtained by the on-site hard disk video recorder is very small (about 5%). If the interception is performed at a specific time interval, not only will a large number of invalid pictures be intercepted (that is, there is no object to be detected), but also a series of important information when the person appears will be missed. At the same time, when there is no object, the scene background does not change significantly with time; when the object appears, the object is generally in motion. Therefore, when taking a screenshot, it is necessary to take a screenshot according to the motion information, that is, when there is an area in the video that is moving, the video frame at that time is intercepted.
A Gaussian mixture model (GMM) is used to detect motion in the video, and, based on the detection result (see FIG. 3), it is decided whether to capture the corresponding video frames. A Gaussian mixture model is a model composed of several Gaussian distribution models (i.e., normal distributions), and its formula is as follows:
P(X \mid \theta) = \prod_{j=1}^{N} P(x_j \mid \theta), \qquad P(x_j \mid \theta) = \sum_{k=1}^{K} \alpha_k \, \varphi(x_j \mid \theta_k)
where θ denotes the parameters to be estimated, X = {x_1, x_2, ..., x_N} is the sample set and x_j is a sample in X; the formula expresses the probability of drawing the N samples from a population with distribution P(x | θ), i.e., the joint probability of the individual samples x_j in the sample set X.
根据视频帧间信息特征使用混合高斯进行归类,判断出视频中哪些像素是运动的,哪些像素是静止的。从而判断出视频中运动的像素位置。According to the inter-frame information features of the video, Gaussian mixture is used to classify, and it is judged which pixels in the video are moving and which pixels are still. In this way, the position of the moving pixel in the video can be determined.
混合高斯模型的参数需要根据视频的运动情况实时进行更新,参数更新使用EM(Expectation Maximization)算法,EM算法的步骤如下:The parameters of the mixed Gaussian model need to be updated in real time according to the motion of the video. The parameter update uses the EM (Expectation Maximization) algorithm. The steps of the EM algorithm are as follows:
1、初始化参数θ1. Initialize parameter θ
2、E-step:依据当前参数,计算每个数据j来自子模型k的可能性r jk2. E-step: Calculate the possibility r jk that each data j comes from sub-model k according to the current parameters.
r_{jk} = \frac{\alpha_k \, \varphi(x_j \mid \theta_k)}{\sum_{k'=1}^{K} \alpha_{k'} \, \varphi(x_j \mid \theta_{k'})}
其中,α k是各高斯分布的权重,φ是各高斯子模型概率函数,其模型参数为θ kAmong them, α k is the weight of each Gaussian distribution, φ is the probability function of each Gaussian sub-model, and its model parameter is θ k .
3、M-step:计算新一轮迭代的模型参数3. M-step: Calculate the model parameters of a new round of iteration
\alpha_k^{new} = \frac{1}{N} \sum_{j=1}^{N} r_{jk}

\mu_k^{new} = \frac{\sum_{j=1}^{N} r_{jk} \, x_j}{\sum_{j=1}^{N} r_{jk}}

\Sigma_k^{new} = \frac{\sum_{j=1}^{N} r_{jk} \, (x_j - \mu_k^{new})(x_j - \mu_k^{new})^T}{\sum_{j=1}^{N} r_{jk}}
4、重复计算E-step和M-step直至收敛(||θ i+1i||<ε,其中ε是一个很小的正数,表示经过一次迭代之后参数变化非常小)。 4. Repeat the calculation of E-step and M-step until convergence (||θ i+1i ||<ε, where ε is a small positive number, indicating that the parameter changes after one iteration is very small).
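The E-step and M-step above can be made concrete with the following self-contained numpy sketch for a one-dimensional Gaussian mixture; it is only meant to illustrate the update formulas, whereas the per-pixel background model used later relies on OpenCV's own implementation:

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_gmm_1d(x, k=3, iters=100, eps=1e-6):
    x = np.asarray(x, dtype=float)
    n = len(x)
    alpha = np.full(k, 1.0 / k)                  # component weights alpha_k
    mu = np.random.choice(x, k)                  # initialise the means from the data
    var = np.full(k, x.var() + 1e-6)             # initialise variances to the data variance
    for _ in range(iters):
        # E-step: responsibility r[j, k] of component k for sample x_j
        dens = alpha * gaussian_pdf(x[:, None], mu, var)
        r = dens / (dens.sum(axis=1, keepdims=True) + 1e-12)
        # M-step: re-estimate weights, means and variances from the responsibilities
        nk = r.sum(axis=0) + 1e-12
        new_mu = (r * x[:, None]).sum(axis=0) / nk
        converged = np.abs(new_mu - mu).max() < eps   # simplified test on the means only
        alpha = nk / n
        var = (r * (x[:, None] - new_mu) ** 2).sum(axis=0) / nk + 1e-6
        mu = new_mu
        if converged:
            break
    return alpha, mu, var
```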
步骤五:使用步骤四制作的数据集初步训练一个改进版的小目标检测模型。Step 5: Use the data set produced in step 4 to initially train an improved version of the small target detection model.
步骤六:使用步骤五训练的小目标检测模型,对尚未训练的港口作业区视频进行检测,分析误检、漏检情况。Step 6: Use the small target detection model trained in step 5 to detect untrained port operation area videos, and analyze false detections and missed detections.
误检情况:本来没有物体的区域,经过检测网络分析,认为是某种物体。False detection: The area where there is no object is considered to be some kind of object after analysis of the detection network.
漏检情况:本来图片中有若干物体,没有被检测神经网络检测出。Missed detection: There are several objects in the picture that were not detected by the detection neural network.
Step 7: Based on the false detections and missed detections, update the data and perform data augmentation, then retrain the model intensively to improve the robustness of the detection model.
更新数据的步骤:Steps to update data:
1、对于步骤六的误检情况,需要分析误检情况是否稳定,如果稳定,可以添加多个时段的该场景图片,作为负样本加入数据集。如果只是偶发情况,可以不用作为负样本添加进训练集。1. For the false detection situation in step 6, it is necessary to analyze whether the false detection situation is stable. If it is stable, you can add pictures of the scene in multiple time periods and add them to the data set as negative samples. If it is only an occasional situation, it does not need to be added to the training set as a negative sample.
2. For the missed detections from step 6: if the scale of the missed person does not meet the previously defined tiny-person size criterion, or the surroundings of the person are too complex, the person is not added as training data; otherwise, the picture containing the missed person is labeled and added to the detection data set.
由于采集的样本视频数据量过少,现有的检测网络训练集较难以完全针对港口的各种环境进行稳定检测,鲁棒性不高,需要对训练数据进行数据增广。针对港口作业区监控视频的实际情况,确定如下增广数据方法:Due to the small amount of sample video data collected, it is difficult for the existing detection network training set to perform stable detection for various environments of the port, and the robustness is not high, so data augmentation of the training data is required. According to the actual situation of the monitoring video in the port operation area, the following data augmentation methods are determined:
Random cropping: some parts of a picture are cropped out at random for training. This changes the relative size of the objects, so the model can extract the features of an object at different relative scales, which improves the detection accuracy for that type of object.
随机水平翻转:即将图片沿水平方向翻转。由于物体在监控中不会垂直方向对称(即上下颠倒),因此水平翻转可以成为增加训练数据的方法。Random Horizontal Flip: Flip the picture horizontally. Since objects are not vertically symmetrical (i.e. upside down) in monitoring, horizontal flipping can be a way to augment training data.
随机调节亮度、对比度、饱和度:由于港口作业区获取监控视频,无法涵盖现场各种自然条件情况,例如阴天、逆光、阴影遮挡等情况。如果出现上述天气情况,可能会导致训练的模型检测不稳定。因此需要通过随机调节图片的亮度、对比度、饱和度来提升训练模型鲁棒性。Random adjustment of brightness, contrast, and saturation: Since the monitoring video obtained in the port operation area cannot cover various natural conditions on site, such as cloudy sky, backlight, and shadow occlusion, etc. If the above weather conditions occur, it may cause the trained model to detect instability. Therefore, it is necessary to randomly adjust the brightness, contrast, and saturation of the image to improve the robustness of the training model.
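The three augmentations listed above might be sketched with plain OpenCV/numpy as follows; the parameter ranges are illustrative assumptions rather than values taken from the present application, and box labels must of course be transformed consistently with the images:

```python
import random
import cv2
import numpy as np

def random_crop(img, min_scale=0.6):
    h, w = img.shape[:2]
    s = random.uniform(min_scale, 1.0)
    ch, cw = int(h * s), int(w * s)
    y, x = random.randint(0, h - ch), random.randint(0, w - cw)
    return img[y:y + ch, x:x + cw]           # boxes must be shifted/clipped the same way

def random_hflip(img, p=0.5):
    return cv2.flip(img, 1) if random.random() < p else img

def random_color_jitter(img, brightness=0.3, contrast=0.3, saturation=0.3):
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 1] *= 1 + random.uniform(-saturation, saturation)   # saturation
    hsv[..., 2] *= 1 + random.uniform(-brightness, brightness)   # brightness
    out = cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)
    out = out.astype(np.float32)
    mean = out.mean()
    out = (out - mean) * (1 + random.uniform(-contrast, contrast)) + mean  # contrast
    return np.clip(out, 0, 255).astype(np.uint8)
```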
港口作业区的视频监测的实施例Example of Video Monitoring in Port Operation Area
步骤一:选择Yolov5x检测网络为港口作业区小目标检测网络。Step 1: Select the Yolov5x detection network as the small target detection network in the port operation area.
步骤二:根据现场小目标检测的实际需求,确定小目标检测网络的输入尺寸。Step 2: Determine the input size of the small target detection network according to the actual needs of on-site small target detection.
Based on the input-size rule of step 2 of the scheme, we can calculate the final input size of the detection model. Here, the minimum size w_min of a tiny object in the surveillance video is set to 32, and the surveillance video size w_origin is 1920 (the original resolution is 1920×1080). According to the model input-size formula in step 2 of the scheme:
w_{input} \geq \frac{16 \cdot w_{origin}}{w_{min}} = \frac{16 \times 1920}{32} = 960
we obtain w_input ≥ 960; here, a power of two larger than 960 is taken, i.e., 1024. In addition, following the requirement of the scheme that the input aspect ratio should fit the actual video frames, the input size of the model is determined to be 1024×576.
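Using the helper sketched in step 2 above, this result can be reproduced as a quick check (an illustration only):

```python
assert compute_input_size(1920, 1080, 32) == (1024, 576)   # width 1024, height 576
```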
步骤三:通过修改Yolov5x的模型配置文件,在原有的移动步长为8,16,32的金字塔特征图基础上,添加了移动步长为4的金字塔特征图(尺寸为256×144)。Step 3: By modifying the model configuration file of Yolov5x, on the basis of the original pyramid feature map with a moving step size of 8, 16, and 32, a pyramid feature map with a moving step size of 4 (size 256×144) is added.
步骤四:使用部分监控视频,根据运动信息进行截图标注,制作数据集。Step 4: Use part of the surveillance video, mark screenshots based on motion information, and create a data set.
以一段视频举例,进行数据集制作(参考图4)。Take a video as an example to make a data set (refer to Figure 4).
本段视频的时长为1分钟,其中有人视频的时长为20秒。如果按照平常的每隔特定帧截一张图,则67%的视频帧都是没有人或前后重复的视频帧。这些视频帧就是无效的视频帧,无法放入训练数据集中。为此,我们使用了opencv自带的混合高斯 模型(opencv的函数名为createBackgroundSubtractorMOG2)判断视频运动情况,通过运动情况判断是否截取视频帧。使用该方法,截取的视频帧不仅确保有物体,而且可以确保截取的物体是运动变化的。The duration of this video is 1 minute, and the duration of someone's video is 20 seconds. If you take a picture every specific frame as usual, 67% of the video frames are video frames without people or repeated before and after. These video frames are invalid video frames and cannot be put into the training dataset. To this end, we use the mixed Gaussian model that comes with opencv (the function of opencv is named createBackgroundSubtractorMOG2) to judge the video motion, and judge whether to intercept the video frame through the motion. Using this method, the intercepted video frame not only ensures that there is an object, but also ensures that the intercepted object is changing in motion.
使用该方法截取了需要的视频帧,并制作训练检测网络所需的训练、验证、测试数据集。Using this method, the required video frames are intercepted, and the training, verification, and test data sets required for training the detection network are produced.
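For illustration, the motion-triggered screenshot step built around createBackgroundSubtractorMOG2 might look like the sketch below; the motion-area threshold (0.1% of the frame) and the output file naming are assumptions made for this sketch:

```python
import cv2

def extract_moving_frames(video_path, out_dir, min_motion_ratio=0.001):
    cap = cv2.VideoCapture(video_path)
    subtractor = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    idx, saved = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                # per-pixel moving / static decision
        fg = cv2.threshold(mask, 200, 255, cv2.THRESH_BINARY)[1]   # drop shadow pixels
        if cv2.countNonZero(fg) > min_motion_ratio * mask.size:    # enough motion -> keep
            cv2.imwrite(f"{out_dir}/frame_{idx:06d}.jpg", frame)
            saved += 1
        idx += 1
    cap.release()
    return saved
```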
步骤五:使用步骤四制作的数据集初步训练一个改进版的小目标检测模型。Step 5: Use the data set produced in step 4 to initially train an improved version of the small target detection model.
步骤六:使用步骤五训练的小目标检测模型,对尚未训练的港口作业区视频进行检测,分析误检、漏检情况。Step 6: Use the small target detection model trained in step 5 to detect untrained port operation area videos, and analyze false detections and missed detections.
误检情况示例:Examples of false positives:
As shown in FIG. 5, the trained detection network accurately detects a person and a piece of port machinery in the middle of the video. However, the network falsely detects the surveillance dome camera mounted on a rail crane as a person. Since this video was not included in the data set during model training, it can be inferred that the detection network has not yet fully adapted to this complex environment and that the video needs to be added to the training set for adaptive training; in addition, because the camera looks very similar to a person at small relative scales, the camera needs to be used as a negative sample so that the network is trained to extract features that distinguish it from a person.
漏检情况示例:Examples of missing cases:
如图6所示,视频右下角有四个人,其中一个人无法通过训练的网络检测到。通过分析可以推断出漏检的原因:人员背光,导致人员特征不明显;人员处于下蹲状态,与正常人员的特征不同。针对背光问题,训练时可以通过改变图片亮度、对比度等方式增广数据;针对处于下蹲状态检测不稳定的问题,可以通过随机抠图、扣出下蹲人员并粘贴到训练集其他图片等方式,让网络学习到下蹲人员特征。As shown in Figure 6, there are four people in the lower right corner of the video, one of whom cannot be detected by the trained network. Through the analysis, we can infer the reasons for the missed detection: the backlight of the personnel makes the characteristics of the personnel not obvious; the personnel are in a squatting state, which is different from the characteristics of normal personnel. For the backlight problem, you can increase the data by changing the brightness and contrast of the picture during training; for the problem of unstable detection in the squatting state, you can randomly cut out the picture, deduct the squatting person and paste it to other pictures in the training set, etc. , let the network learn the characteristics of squatting personnel.
Step 7: Based on the false detections and missed detections, update the data and perform data augmentation, then retrain the model intensively to improve the robustness of the detection model.
针对误检情况:For false positives:
Taking the false-detection example picture from step 6 as an example, according to the previous analysis the video frame needs to be correctly labeled and put into the data set. At the same time, the camera on the rail crane in the picture is randomly cropped out as negative samples. By feeding samples and negative samples of different relative scales into the network, the detection network can learn the difference between a person and the camera (FIG. 7A below is the original picture, and FIG. 7B is a randomly cropped picture).
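A possible sketch of harvesting such multi-scale negative crops around a known false-positive box is given below; the crop scales are illustrative assumptions:

```python
def negative_crops(img, box, scales=(1.5, 2.5, 4.0)):
    # box: [x1, y1, x2, y2] of the falsely detected object (e.g. the dome camera).
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    h, w = img.shape[:2]
    crops = []
    for s in scales:
        half_w, half_h = s * (x2 - x1) / 2, s * (y2 - y1) / 2
        nx1, ny1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
        nx2, ny2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
        crops.append(img[ny1:ny2, nx1:nx2])   # each crop is labeled with no "person" target
    return crops
```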
针对漏检情况:For missed detection:
Taking the missed-detection example picture from step 6 as an example, according to the previous analysis two kinds of data augmentation can be applied: augmentation by changing the brightness and contrast of the pictures, and augmentation by randomly cutting out the squatting person and pasting him into other pictures of the training set.
图8A和图8B是第一种增广方式效果(以更改对比度为例,图8A为原图,图8B为调节对比度图)。Figure 8A and Figure 8B are the effects of the first augmentation method (taking changing the contrast as an example, Figure 8A is the original image, and Figure 8B is the image for adjusting the contrast).
图9A和图9B为第二种增广方式效果(图9A为原图,图9B将漏检人员添加到图像的其他位置)。Fig. 9A and Fig. 9B are the effect of the second augmentation method (Fig. 9A is the original image, and Fig. 9B adds the missing persons to other positions of the image).
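The cut-and-paste augmentation can be sketched as follows; overlap checks against existing labels and blending at the patch border are omitted, and the function name is an assumption made for this sketch:

```python
import random

def paste_target(src_img, src_box, dst_img):
    # Cut the labeled patch (e.g. the squatting person) out of the source image and
    # paste it at a random position of another training image.
    x1, y1, x2, y2 = [int(v) for v in src_box]
    patch = src_img[y1:y2, x1:x2]
    ph, pw = patch.shape[:2]
    dh, dw = dst_img.shape[:2]
    if pw >= dw or ph >= dh:                  # patch does not fit; skip this augmentation
        return dst_img, None
    px, py = random.randint(0, dw - pw), random.randint(0, dh - ph)
    out = dst_img.copy()
    out[py:py + ph, px:px + pw] = patch
    new_box = [px, py, px + pw, py + ph]      # add this box to the destination labels
    return out, new_box
```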
与现有技术相比,本申请至少可实现如下有益效果之一:Compared with the prior art, the present application can achieve at least one of the following beneficial effects:
1. By redesigning the network structure of Yolov5x, the present application improves the ability to detect small target objects in the port operation area. By systematically designing the input size of the detection model and adding a pyramid feature map with a stride of 4, the detection accuracy of the model for port personnel is raised from the 90% of Yolo-tiny to 92%. At the same time, the introduced Yolov5x network runs on a Huawei Atlas-300 accelerator card, and the running time for the same input size is reduced from the original 120 ms to 50 ms.
2、本申请使用运动检测算法,有效获取港口作业区实时监控数据集。准确获取港口作业区实时监控视频中有物体运动的图片数据,而将物体静止(前后帧保持不变)、没有物体的视频帧过滤掉,提升了获取监控数据集的效率。2. This application uses a motion detection algorithm to effectively obtain real-time monitoring data sets in port operation areas. Accurately obtain the image data of moving objects in the real-time monitoring video of the port operation area, and filter out the video frames with static objects (the front and rear frames remain unchanged) and no objects, which improves the efficiency of obtaining monitoring data sets.
3、根据检测的具体情况,有针对性的进行数据增广,提升港口作业区小目标物体的检测能力。根据检测具体情况,针对性进行数据增广,不仅可以尽量减少无用的训练量,还可以将识别的准确率由原先的92%提升到现在的95%。3. According to the specific situation of the detection, data augmentation is carried out in a targeted manner to improve the detection ability of small target objects in the port operation area. According to the specific situation of the detection, targeted data augmentation can not only reduce the amount of useless training as much as possible, but also increase the recognition accuracy from the original 92% to the current 95%.
The term "target detection apparatus" encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. In addition to hardware, the apparatus may include code that creates an execution environment for the computer program in question, for example code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a runtime environment, or a combination of one or more of them.
The methods and logic flows described in this specification can be performed by one or more programmable processors executing one or more computer programs, which carry out these functions by operating on surveillance video and generating target detection results.
Generally, a computer also includes, or is operatively coupled to, one or more mass storage devices (for example, magnetic disks, magneto-optical disks, or optical disks) for storing historical video data and data sets, so as to receive data from the mass storage devices, transfer data to them, or both. Devices suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and storage devices, including by way of example: semiconductor memory devices such as EPROM (erasable programmable read-only memory), EEPROM (electrically erasable programmable read-only memory), and flash memory devices; magnetic disks such as internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.
Those skilled in the art will understand that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing related hardware, and the program can be stored in a computer-readable storage medium such as a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The above is only a preferred embodiment of the present application, but the protection scope of the present application is not limited thereto. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application.

Claims (16)

  1. A computer-vision-based target detection method, characterized by comprising:
    performing motion information detection on historical surveillance video of a port operation area, and taking screenshots according to the motion information and annotating targets to produce a data set;
    building a Yolov5x neural network, and using one part of the data set to preliminarily train the Yolov5x neural network to obtain a target detection model;
    using the target detection model to perform target detection on another part of the data set to analyze falsely detected targets and missed targets;
    updating the data set through data augmentation based on the falsely detected targets and/or the missed targets, and using the updated data set to perform reinforced training on the target detection model; and
    taking a screenshot of the current surveillance video of the port operation area at predetermined intervals to obtain a picture to be detected, and using the reinforced target detection model to perform target detection on the picture to be detected.
  2. The computer-vision-based target detection method according to claim 1, wherein performing motion information detection on the historical surveillance video of the port operation area, and taking screenshots according to the motion information and annotating targets to produce the data set further comprises:
    obtaining the historical surveillance video of the port operation area from a database;
    classifying the historical surveillance video into static pixels and moving pixels using a Gaussian mixture model according to inter-frame information of the historical surveillance video, so as to determine moving-pixel regions in the historical surveillance video; and
    taking screenshots of pictures in the historical surveillance video that contain moving-pixel regions and annotating the targets in the moving-pixel regions to generate the data set.
  3. The computer-vision-based target detection method according to claim 2, wherein the targets comprise large targets of relatively large size and small targets of relatively small size to be detected, and before the data set is used to train the Yolov5x neural network to obtain the target detection model, the method further comprises:
    determining the input size of each picture frame in the data set according to the sizes of the small targets in the historical surveillance video and the current surveillance video of the port operation area.
  4. The computer-vision-based target detection method according to claim 3, wherein building the Yolov5x neural network further comprises:
    using a CSP network architecture as the backbone of the Yolov5x neural network; and
    adding, to the pyramid feature maps of the Yolov5x neural network with strides of 8, 16 and 32, a pyramid feature map with a stride of 4 to detect the small targets.
  5. The computer-vision-based target detection method according to claim 3, wherein analyzing falsely detected targets and missed targets further comprises:
    comparing the targets detected by the target detection model with the annotated targets in the corresponding pictures of the data set to determine the falsely detected targets and the missed targets, wherein
    a target that is detected by the target detection model in a corresponding annotated first picture that originally contains no such target is determined to be a falsely detected target; and
    a target that is present in a corresponding annotated second picture but is not detected by the target detection model is determined to be a missed target.
  6. The computer-vision-based target detection method according to claim 5, wherein the data set is updated through data augmentation based on the falsely detected targets, and using the updated data set to improve the robustness of the target detection model further comprises:
    when a falsely detected target is a stable falsely detected target, correctly annotating the first picture in which the falsely detected target was detected and adding it to the data set;
    randomly cropping the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set; and
    inputting pictures of targets with different relative sizes together with the negative-sample pictures into the target detection model, so that the target detection model learns the difference between the targets and the falsely detected targets.
  7. The computer-vision-based target detection method according to claim 5, wherein the data set is updated through data augmentation based on the missed targets, and using the updated data set to improve the robustness of the target detection model further comprises:
    performing data augmentation by changing the brightness and contrast of the second picture and/or by randomly cutting the missed target out of the second picture and pasting the missed target at other positions in the second picture, so as to update the data set, wherein the second picture contains a missed target; and
    inputting the second picture with the pasted missed target into the target detection model, so that the target detection model can detect the missed target.
  8. The computer-vision-based target detection method according to claim 3, wherein
    the large targets comprise port operation machinery;
    the small targets comprise port personnel;
    the falsely detected targets comprise cameras; and
    the missed targets comprise squatting persons.
  9. A computer-vision-based target detection apparatus, characterized by comprising:
    a data set generation module, configured to perform motion information detection on historical surveillance video of a port operation area, and to take screenshots according to the motion information and annotate targets to produce a data set;
    a model building module, configured to build a Yolov5x neural network and use one part of the data set to preliminarily train the Yolov5x neural network to obtain a target detection model;
    a target analysis module, configured to use the target detection model to perform target detection on another part of the data set to analyze falsely detected targets and missed targets;
    a data set update module, configured to update the data set through data augmentation based on the falsely detected targets and/or the missed targets;
    a reinforced training module, configured to use the updated data set to perform reinforced training on the target detection model; and
    a target detection module, configured to take a screenshot of the current surveillance video of the port operation area at predetermined intervals to obtain a picture to be detected, and to use the reinforced target detection model to perform target detection on the picture to be detected.
  10. The computer-vision-based target detection apparatus according to claim 9, wherein the data set generation module is configured to:
    obtain the historical surveillance video of the port operation area from a database;
    classify the historical surveillance video into static pixels and moving pixels using a Gaussian mixture model according to inter-frame information of the historical surveillance video, so as to determine moving-pixel regions in the historical surveillance video; and
    take screenshots of pictures in the historical surveillance video that contain moving-pixel regions and annotate the targets in the moving-pixel regions to generate the data set.
  11. The computer-vision-based target detection apparatus according to claim 10, further comprising an input size determination module, configured to determine the input size of each picture frame in the data set according to the sizes of the small targets in the historical surveillance video and the current surveillance video of the port operation area.
  12. The computer-vision-based target detection apparatus according to claim 11, wherein the model building module is further configured to:
    use a CSP network architecture as the backbone of the Yolov5x neural network; and
    add, to the pyramid feature maps of the Yolov5x neural network with strides of 8, 16 and 32, a pyramid feature map with a stride of 4 to detect the small targets.
  13. The computer-vision-based target detection apparatus according to claim 11, wherein the target analysis module is configured to compare the targets detected by the target detection model with the annotated targets in the corresponding pictures of the data set to determine the falsely detected targets and the missed targets, and the target analysis module comprises a falsely-detected-target analysis submodule and a missed-target analysis submodule,
    the falsely-detected-target analysis submodule being configured to determine, as a falsely detected target, a target that is detected by the target detection model in a corresponding annotated first picture that originally contains no such target; and
    the missed-target analysis submodule being configured to determine, as a missed target, a target that is present in a corresponding annotated second picture but is not detected by the target detection model.
  14. The computer-vision-based target detection apparatus according to claim 13, wherein the data set update module is configured to:
    when a falsely detected target is a stable falsely detected target, correctly annotate the first picture in which the falsely detected target was detected and add it to the data set;
    randomly crop the falsely detected target from the first picture as negative-sample pictures for data augmentation to update the data set; and
    input pictures of targets with different relative sizes together with the negative-sample pictures into the target detection model, so that the target detection model learns the difference between the targets and the falsely detected targets.
  15. The computer-vision-based target detection apparatus according to claim 13, wherein the data set update module is configured to:
    perform data augmentation by changing the brightness and contrast of the second picture and/or by randomly cutting the missed target out of the second picture and pasting the missed target at other positions in the second picture, so as to update the data set, wherein the second picture contains a missed target; and
    input the second picture with the pasted missed target into the target detection model, so that the target detection model can detect the missed target.
  16. The computer-vision-based target detection apparatus according to claim 11, wherein
    the large targets comprise port operation machinery;
    the small targets comprise port personnel;
    the falsely detected targets comprise cameras; and
    the missed targets comprise squatting persons.
PCT/CN2022/072005 2021-10-29 2022-01-14 Method and apparatus for detecting tiny target in port operation area on basis of computer vision WO2023070955A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111291144.3A CN114120220A (en) 2021-10-29 2021-10-29 Target detection method and device based on computer vision
CN202111291144.3 2021-10-29

Publications (1)

Publication Number Publication Date
WO2023070955A1 true WO2023070955A1 (en) 2023-05-04

Family

ID=80380337

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072005 WO2023070955A1 (en) 2021-10-29 2022-01-14 Method and apparatus for detecting tiny target in port operation area on basis of computer vision

Country Status (2)

Country Link
CN (1) CN114120220A (en)
WO (1) WO2023070955A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114998749B (en) * 2022-07-28 2023-04-07 北京卫星信息工程研究所 SAR data amplification method for target detection

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6573226B2 (en) * 2017-12-15 2019-09-11 オムロン株式会社 DATA GENERATION DEVICE, DATA GENERATION METHOD, AND DATA GENERATION PROGRAM
CN111582093A (en) * 2020-04-27 2020-08-25 北京工业大学 Automatic small target detection method in high-resolution image based on computer vision and deep learning
CN111797905A (en) * 2020-06-12 2020-10-20 杭州电子科技大学 Target detection optimization method based on positive and negative sample sampling ratio and model fine tuning
CN112560649A (en) * 2020-12-09 2021-03-26 广州云从鼎望科技有限公司 Behavior action detection method, system, equipment and medium
CN113361588B (en) * 2021-06-03 2024-06-25 北京文安智能技术股份有限公司 Image training set generation method and model training method based on image data enhancement
CN113553977B (en) * 2021-07-30 2023-02-10 国电汉川发电有限公司 Improved YOLO V5-based safety helmet detection method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886359A (en) * 2019-03-25 2019-06-14 西安电子科技大学 Small target detecting method and detection model based on convolutional neural networks
US20200327679A1 (en) * 2019-04-12 2020-10-15 Beijing Moviebook Science and Technology Co., Ltd. Visual target tracking method and apparatus based on deeply and densely connected neural network
CN112633308A (en) * 2020-09-15 2021-04-09 北京华电天仁电力控制技术有限公司 Detection method and detection system for whether power plant operating personnel wear safety belts
CN112395957A (en) * 2020-10-28 2021-02-23 连云港杰瑞电子有限公司 Online learning method for video target detection
CN112949508A (en) * 2021-03-08 2021-06-11 咪咕文化科技有限公司 Model training method, pedestrian detection method, electronic device and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CUI YI: "yolov5 summary", ZHUANLAN, 17 September 2021 (2021-09-17), XP093061160, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/406254328> [retrieved on 20230705] *

Also Published As

Publication number Publication date
CN114120220A (en) 2022-03-01

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22884875

Country of ref document: EP

Kind code of ref document: A1