CN111832513A - Real-time football target detection method based on neural network

Info

Publication number
CN111832513A
Authority
CN
China
Prior art keywords
layer
target detection
network
module
yolov4
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010705052.4A
Other languages
Chinese (zh)
Other versions
CN111832513B (en)
Inventor
段育松
姬红兵
张文博
李晓颖
李林
臧博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Broccoli Education Technology Co ltd
Xidian University
Original Assignee
Xi'an Broccoli Education Technology Co ltd
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Broccoli Education Technology Co ltd, Xidian University filed Critical Xi'an Broccoli Education Technology Co ltd
Priority to CN202010705052.4A priority Critical patent/CN111832513B/en
Publication of CN111832513A publication Critical patent/CN111832513A/en
Application granted granted Critical
Publication of CN111832513B publication Critical patent/CN111832513B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/20 Scenes; Scene-specific elements in augmented reality scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Abstract

The invention discloses a real-time football target detection method based on a neural network, which mainly addresses the low speed and low accuracy of existing football target detection. The scheme is as follows: 1) construct the football target detection network YOLOv4; 2) construct a football target training data set; 3) obtain the prior (anchor) box sizes of the constructed training data set and substitute them for the prior box sizes in the target detection network YOLOv4; 4) perform data augmentation on the training data set; 5) train the target detection network YOLOv4 with the augmented data set; 6) input the football video to be detected into the trained YOLOv4 football target detection network for detection and labeling, and output the football target detection result. The invention strengthens the recognition and localization capability of the network, improves the detection speed and accuracy for football targets, guarantees real-time football target detection, and can be used for human-computer interaction, sports events, live broadcasting and motion analysis.

Description

Real-time football target detection method based on neural network
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a football target detection method that can be used for human-computer interaction, sports events and motion analysis.
Background
Football target detection uses computer vision techniques to judge whether a football target is present in an image or video sequence and to give its accurate location. The technique can be applied to human-computer interaction, sports events, live broadcasting and motion analysis. After more than ten years of development, the training libraries of football target detection systems have grown large-scale, their detection accuracy has approached practical levels, and their detection speed has approached real time. Traditional football target detection methods focus mainly on hand-crafted feature extraction, feature-classifier learning and post-processing. Because football match videos are diverse and complex (the ball moves fast, recording quality cannot be guaranteed, the ball undergoes large forces and travels long distances, footballs in the video vary in size, and the ball is easily occluded by players and referees), the traditional methods achieve low accuracy. Moreover, limited by the computer hardware of the past, football target detection was mainly image-based: it only had to detect whether a football was present in an image, and it could hardly address real-time football target detection.
Existing real-time neural-network football target detection methods are divided into one-stage and two-stage approaches. The one-stage target detection algorithm SSD was proposed by Wei Liu in the paper "SSD: Single Shot MultiBox Detector" published at ECCV 2016. The method discretizes the output space of bounding boxes into a set of default boxes with different aspect ratios at each feature map location. During prediction, the network generates a score for each object class in each default box and adjusts the box to better match the object shape. In addition, the network combines predictions from multiple feature maps of different resolutions, naturally handling objects of various sizes. The drawback is that during SSD training, a prior (default) box is put into the network for training only if its IoU with the ground-truth box reaches 0.5. The region of a large target is much bigger, so it contains more prior boxes and can be trained adequately; conversely, a small target yields far fewer prior boxes for training and cannot be trained sufficiently. Therefore the SSD has insufficient detection accuracy and inaccurate localization for small targets.
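For reference, the IoU used in this matching criterion is the ratio of the intersection area to the union area of two boxes. Below is a minimal sketch; the [x1, y1, x2, y2] corner format is an assumption made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# A prior box is matched to a ground-truth box only when IoU >= 0.5:
print(iou([0, 0, 10, 10], [5, 5, 15, 15]))  # ~0.143, so no match
```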
The two-stage target detection algorithm Faster R-CNN was proposed in the IEEE paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" published in 2017. The method comprises two modules: the first is a deep fully convolutional Region Proposal Network (RPN) used to generate region proposals; the second is the Fast R-CNN detector, which uses the region proposals generated by the RPN for detection. The RPN shares full-image convolutional features with the detection network, achieving nearly cost-free region proposals. The drawback is that Faster R-CNN trains in two stages, so its target detection speed is low and the real-time performance of target detection cannot be guaranteed.
Disclosure of Invention
The invention aims, in view of the shortcomings of the prior art, to provide a real-time football target detection method based on a neural network, so as to improve the accuracy and speed of football target detection and guarantee its real-time performance.
To achieve this purpose, the technical scheme of the invention comprises the following steps:
(1) constructing the football target detection network YOLOv4:
(1a) constructing the backbone network CSPDarknet53 of the target detection network YOLOv4;
(1b) constructing the neck network PANet of the target detection network YOLOv4;
(1c) constructing the Head network YOLO Head of the target detection network YOLOv4;
(2) constructing a training data set:
(2a) collecting at least 3000 images that contain a football target and have a resolution of not less than 608 × 608;
(2b) manually marking the football bounding box in each image containing a football, generating annotation files in one-to-one correspondence with the collected images;
(2c) forming the training data set from the collected images and the annotation files;
(3) training the target detection network YOLOv4:
(3a) configuring the target detection network YOLOv4 environment;
(3b) downloading the pre-training weight file YOLOv4.conv.137 of the target detection network YOLOv4;
(3c) obtaining the prior (anchor) box sizes of the constructed data set with the k-means clustering method, and updating the prior box sizes in the target detection network YOLOv4;
(3d) inputting the training data set and augmenting it with the CutMix method;
(3e) loading the pre-training weight file YOLOv4.conv.137 onto the target detection network YOLOv4 constructed in step (1) with a transfer learning method, obtaining the loaded target detection network YOLOv4;
(3f) training on the training data set constructed in step (2) with the loaded target detection network YOLOv4, obtaining the trained YOLOv4 football target detection network;
(4) collecting a video containing a football target, inputting it into the trained YOLOv4 football target detection network for detection and labeling, and outputting the video with the football pixels labeled, obtaining the football target detection result.
Compared with the prior art, the invention has the following advantages:
Firstly, because the invention constructs a YOLOv4 football target detection network that combines the backbone network CSPDarknet53, the neck network PANet and the Head network YOLO Head, and introduces a feature pyramid module into the neck network PANet, it detects targets on three feature maps of different scales by sampling and fusing features from different layers, exploiting the high resolution of low-level features and the semantic information of high-level features, and assigns accurate anchor boxes to the feature maps of each scale, thereby improving the accuracy of football target detection.
Secondly, the invention augments the input data with the CutMix method, which improves network training efficiency, strengthens the recognition and localization capability of the network, and further improves its generalization.
Thirdly, the constructed data set is trained with a transfer learning method, so that the target detection network can quickly learn the high-dimensional features of the data set; this reduces the time complexity of football target detection, further improves the detection speed, and guarantees real-time football target detection.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a schematic diagram of the YOLOv4 network framework in the present invention;
fig. 3 is a schematic structural diagram of the CSPDarknet53 network in the present invention;
FIG. 4 is a graph of the results of training on a constructed data set using the present invention;
FIG. 5 is a graph of the results of an experiment using the present invention to perform soccer detection on a soccer video;
FIG. 6 shows the frames-per-second (FPS) test result for football video detection using the present invention.
Detailed Description
The embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps of this embodiment are as follows.
Step 1, constructing a football target detection network YOLOv4.
Referring to fig. 2, the football target detection network YOLOv4 comprises a backbone network CSPDarknet53, a neck network PANet and a Head network YOLO Head, and is implemented as follows:
1.1) building the backbone network CSPDarknet53 of the target detection network YOLOv4:
Referring to fig. 3, the structural relationship of the backbone network CSPDarknet53 is: input layer → first convolutional layer → second convolutional layer → first combination module → third convolutional layer → fourth convolutional layer → second combination module → fifth convolutional layer → sixth convolutional layer → third combination module → seventh convolutional layer → eighth convolutional layer → fourth combination module → ninth convolutional layer → tenth convolutional layer → fifth combination module → eleventh convolutional layer. Wherein:
the first to eleventh convolutional layers are CBM convolutional layers, i.e. each consists of a Conv convolutional layer, a Bn batch normalization layer and a Mish activation function layer;
the first and second convolutional layers have 32 and 64 channels respectively, 3 × 3 convolution kernels, and strides 1 and 2 respectively;
the third and fourth convolutional layers have 64 and 128 channels respectively, 1 × 1 and 3 × 3 convolution kernels respectively, and strides 1 and 2 respectively;
the fifth and sixth convolutional layers have 128 and 256 channels respectively, 1 × 1 and 3 × 3 convolution kernels respectively, and strides 1 and 2 respectively;
the seventh and eighth convolutional layers have 256 and 512 channels respectively, 1 × 1 and 3 × 3 convolution kernels respectively, and strides 1 and 2 respectively;
the ninth and tenth convolutional layers have 512 and 1024 channels respectively, 1 × 1 and 3 × 3 convolution kernels respectively, and strides 1 and 2 respectively;
the eleventh convolutional layer has 1024 channels, a 3 × 3 convolution kernel, and stride 2;
the first combination module is formed by splicing three CBM convolutional layers and one CSP residual module; each CBM convolutional layer consists of a convolutional layer with 64 channels, a 1 × 1 kernel and stride 1, a batch normalization layer and a Mish activation function layer; the CSP residual module consists of two sequentially connected CBM convolutional layers with 32 and 64 channels respectively, 1 × 1 and 3 × 3 kernels respectively, and stride 1;
the second combination module is formed by splicing three CBM convolutional layers and two CSP residual modules; each CBM convolutional layer consists of a convolutional layer with 64 channels, a 1 × 1 kernel and stride 1, a batch normalization layer and a Mish activation function layer; each CSP residual module consists of two sequentially connected CBM convolutional layers with 64 channels, 1 × 1 and 3 × 3 kernels respectively, and stride 1;
the third combination module is formed by splicing three CBM convolutional layers and eight CSP residual modules; each CBM convolutional layer consists of a convolutional layer with 128 channels, a 1 × 1 kernel and stride 1, a batch normalization layer and a Mish activation function layer; each CSP residual module consists of two sequentially connected CBM convolutional layers with 128 channels, 1 × 1 and 3 × 3 kernels respectively, and stride 1;
the fourth combination module is formed by splicing three CBM convolutional layers and eight CSP residual modules; each CBM convolutional layer consists of a convolutional layer with 256 channels, a 1 × 1 kernel and stride 1, a batch normalization layer and a Mish activation function layer; each CSP residual module consists of two sequentially connected CBM convolutional layers with 256 channels, 1 × 1 and 3 × 3 kernels respectively, and stride 1;
the fifth combination module is formed by splicing three CBM convolutional layers and four CSP residual modules; each CBM convolutional layer consists of a convolutional layer with 512 channels, a 1 × 1 kernel and stride 1, a batch normalization layer and a Mish activation function layer; each CSP residual module consists of two sequentially connected CBM convolutional layers with 512 channels, 1 × 1 and 3 × 3 kernels respectively, and stride 1;
the Mish activation function is expressed as: mix ═ x × tanh (ln (1+ e)x) In which x represents the output of the previous layer, exX to the power of e, ln (1+ e)x) Denotes base e (1+ e)x) Tan h is a hyperbolic tangent function. The Mish activation function allows small negative gradient flow when the value is negative, so that the flow of information is ensured, and the problem of gradient saturation does not exist.
1.2) building the neck network PANet of the target detection network YOLOv4:
the structure of the neck network PANet is, in sequence: 1st combination module → 2nd combination module → 3rd combination module → 4th combination module;
the parameters of each module are set as follows:
the 1st combination module consists of the 1st stacking module and a convolutional layer; the 1st stacking module stacks the 38 × 38 × 256 effective feature layer obtained by convolving and upsampling the output of the spatial pyramid structure with the 38 × 38 × 256 effective feature layer of the backbone network; after stacking, the feature map size is unchanged and the number of channels doubles; the convolutional layer is five alternating 1 × 1 and 3 × 3 convolutions;
the 2nd combination module consists of the 2nd stacking module and a convolutional layer; the 2nd stacking module stacks the 76 × 76 × 128 feature layer obtained by convolving and upsampling the output of the 1st combination module with the 76 × 76 × 128 effective feature layer of the backbone network; after stacking, the feature map size is unchanged and the number of channels doubles; the convolutional layer is five alternating 1 × 1 and 3 × 3 convolutions;
the 3rd combination module consists of the 3rd stacking module and a convolutional layer; the 3rd stacking module stacks the 38 × 38 × 256 feature layer output by the 1st combination module with the downsampled 38 × 38 × 256 feature layer output by the 2nd combination module; after stacking, the feature map size is unchanged and the number of channels doubles; the convolutional layer is five alternating 1 × 1 and 3 × 3 convolutions;
the 4th combination module consists of the 4th stacking module and a convolutional layer; the 4th stacking module stacks the downsampled 19 × 19 × 512 feature layer output by the 3rd combination module with the 19 × 19 × 512 feature layer output by the spatial pyramid structure; after stacking, the feature map size is unchanged and the number of channels doubles; the convolutional layer is five alternating 1 × 1 and 3 × 3 convolutions;
the CBL convolutional layer consists of a Conv convolution, Bn batch normalization and a Leaky_ReLU activation function;
the spatial pyramid structure performs multi-scale fusion by max pooling with 1 × 1, 5 × 5, 9 × 9 and 13 × 13 kernels, as sketched below.
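A sketch of such a spatial pyramid pooling structure follows; it assumes stride-1 max pooling with matching padding, so the feature map size is preserved and only the channel count grows.

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Concatenate max-pool results with 1x1, 5x5, 9x9 and 13x13 kernels."""
    def __init__(self, pool_sizes=(1, 5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes])

    def forward(self, x):
        return torch.cat([pool(x) for pool in self.pools], dim=1)

x = torch.randn(1, 512, 19, 19)   # e.g. the deepest 19 x 19 feature map
print(SPP()(x).shape)             # torch.Size([1, 2048, 19, 19])
```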
1.3) building the Head network YOLO Head of the target detection network YOLOv4:
the structure of the Head network YOLO Head is, in sequence: a 3 × 3 convolutional layer and a 1 × 1 convolutional layer;
the 3 × 3 convolutional layer consists of a Conv convolutional layer, a Bn batch normalization layer and a Leaky_ReLU activation function layer;
the 1 × 1 convolutional layer consists of a Conv convolutional layer only.
The 3 × 3 convolutional layer integrates the obtained features, and the 1 × 1 convolutional layer uses those features to produce the final output.
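A sketch of one such YOLO Head, assuming three anchors per scale and a single "football" class (both assumptions made for illustration):

```python
import torch.nn as nn

def yolo_head(in_ch, num_anchors=3, num_classes=1):
    """3x3 CBL (Conv-BN-LeakyReLU) to integrate features, then a bare 1x1
    conv producing (x, y, w, h, objectness, class scores) per anchor at
    every grid cell: num_anchors * (5 + num_classes) output channels."""
    out_ch = num_anchors * (5 + num_classes)   # 18 for one football class
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch * 2, 3, 1, 1, bias=False),
        nn.BatchNorm2d(in_ch * 2),
        nn.LeakyReLU(0.1),
        nn.Conv2d(in_ch * 2, out_ch, 1, 1, 0),
    )
```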
Step 2, constructing the training data set.
2.1) collecting at least 3000 images that contain a football target and have a resolution of not less than 608 × 608;
2.2) manually marking the football bounding box in each image containing a football, generating annotation files in one-to-one correspondence with the collected images;
2.3) forming the training data set from the collected images and the annotation files.
Step 3, training the target detection network YOLOv4:
3.1) configuring the target detection network YOLOv4 environment, including CUDA 10.2, cuDNN 7.6.5, Python 3.7, Visual Studio 2019 and OpenCV 3.4;
3.2) downloading the pre-training weight file YOLOv4.conv.137 of the target detection network YOLOv4 to the local hard disk;
3.3) obtaining the prior (anchor) box sizes of the constructed training data set with the k-means clustering method, and updating the prior box sizes in the target detection network YOLOv4, as sketched below;
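YOLO-style anchor clustering typically uses a 1 - IoU distance on box (width, height) pairs rather than a Euclidean distance; the sketch below makes that assumption, with k = 9 matching the usual three anchors at each of the three scales.

```python
import numpy as np

def kmeans_anchors(wh, k=9, iters=100):
    """Cluster labelled box sizes into k prior (anchor) boxes.
    wh: (N, 2) array of ground-truth box widths and heights in pixels."""
    def iou_wh(boxes, centroids):
        # IoU of boxes anchored at a common corner: overlap of w/h ranges.
        inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
                 np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
        union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
                (centroids[:, 0] * centroids[:, 1])[None, :] - inter
        return inter / union

    rng = np.random.default_rng(0)
    centroids = wh[rng.choice(len(wh), k, replace=False)]
    for _ in range(iters):
        assign = np.argmax(iou_wh(wh, centroids), axis=1)  # min 1 - IoU
        new = np.array([wh[assign == j].mean(axis=0)
                        if np.any(assign == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]   # sort by area
```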
3.4) inputting the constructed training data set and augmenting it with the CutMix method, i.e. randomly cutting a rectangular region from one image of the data set and using its pixels to replace the corresponding rectangular region in another image, forming a new combined image in which no uninformative pixels appear; a sketch follows below;
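A minimal sketch of this CutMix operation on one image pair; label mixing and bounding-box adjustment, which the full method also requires, are omitted here.

```python
import numpy as np

def cutmix(img_a, img_b, beta=1.0):
    """Cut a random rectangle out of img_b and paste it into img_a, so the
    combined image contains no uninformative (e.g. zero-padded) pixels.
    img_a, img_b: HxWx3 arrays of the same size."""
    h, w = img_a.shape[:2]
    lam = np.random.beta(beta, beta)              # area kept from img_a
    cut_w = int(w * np.sqrt(1 - lam))
    cut_h = int(h * np.sqrt(1 - lam))
    cx, cy = np.random.randint(w), np.random.randint(h)
    x1, x2 = np.clip(cx - cut_w // 2, 0, w), np.clip(cx + cut_w // 2, 0, w)
    y1, y2 = np.clip(cy - cut_h // 2, 0, h), np.clip(cy + cut_h // 2, 0, h)
    out = img_a.copy()
    out[y1:y2, x1:x2] = img_b[y1:y2, x1:x2]       # replace the rectangle
    return out
```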
3.5) loading the pre-training weight file YOLOv4.conv.137 onto the target detection network YOLOv4 with a transfer learning method, i.e. taking the parameters of shallow layers already trained on a large data set and loading them onto the target detection network YOLOv4, so that the network can already recognize generic low-level features; this saves training time and reduces the risk of under-fitting and over-fitting;
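The embodiment loads the darknet file YOLOv4.conv.137 directly. As an illustration of the same idea, here is a PyTorch-style sketch that copies only the pretrained entries whose names and shapes match, leaving the detection heads randomly initialized; it assumes the pretrained weights have already been converted to a PyTorch state dict.

```python
import torch

def load_pretrained_backbone(model, weight_path):
    """Partially initialize a model from pretrained shallow-layer weights,
    so the backbone starts from generic low-level features while the
    YOLO heads are trained from scratch."""
    pretrained = torch.load(weight_path, map_location="cpu")
    state = model.state_dict()
    matched = {k: v for k, v in pretrained.items()
               if k in state and v.shape == state[k].shape}
    state.update(matched)          # keep random init for unmatched layers
    model.load_state_dict(state)
    return model
```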
3.6) training on the training data set with the loaded target detection network YOLOv4, implemented as follows:
3.6.1) inputting the training data set augmented by the CutMix method into the target detection network YOLOv4;
3.6.2) continuously optimizing the network training parameters in a layer-by-layer training mode until the loss function of the target detection network YOLOv4 converges, as shown in fig. 4, where the lower curve is the training loss of the target detection network YOLOv4 and the upper curve is the mean average precision mAP, obtaining the trained target detection network YOLOv4; a minimal sketch of this loop follows.
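In the sketch below, compute_yolo_loss and train_loader are hypothetical placeholders for the YOLOv4 loss and the CutMix-augmented data pipeline, and the hyperparameters are illustrative.

```python
import torch

def train(model, train_loader, compute_yolo_loss, epochs=100, device="cuda"):
    """Minimal training loop: optimize until the loss converges."""
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=1e-3,
                          momentum=0.9, weight_decay=5e-4)
    for epoch in range(epochs):
        total = 0.0
        for images, targets in train_loader:
            opt.zero_grad()
            loss = compute_yolo_loss(model(images.to(device)), targets)
            loss.backward()
            opt.step()
            total += loss.item()
        print(f"epoch {epoch}: mean loss {total / len(train_loader):.4f}")
```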
Step 4, collecting a video containing a football target, inputting it into the trained YOLOv4 football target detection network for detection and labeling, and outputting the video with the football pixels labeled, obtaining the football target detection result.
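A sketch of such a video detection loop using OpenCV's darknet importer. The file names are hypothetical, and cv2.dnn_DetectionModel with YOLOv4 cfg support requires a newer OpenCV (4.4 or later) than the 3.4 listed in the training environment, which is an assumption made here.

```python
import time
import cv2

# Hypothetical file names; the trained network is exported from darknet.
net = cv2.dnn.readNetFromDarknet("yolov4-football.cfg",
                                 "yolov4-football.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(608, 608), scale=1 / 255.0, swapRB=True)

cap = cv2.VideoCapture("match.mp4")
frames, start = 0, time.time()
while True:
    ok, frame = cap.read()
    if not ok:
        break
    _, scores, boxes = model.detect(frame, confThreshold=0.5,
                                    nmsThreshold=0.4)
    for score, (x, y, w, h) in zip(scores, boxes):
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, f"football {float(score):.2f}", (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
    frames += 1
    cv2.imshow("detection", frame)
    if cv2.waitKey(1) == 27:      # Esc to quit
        break
print(f"FPS: {frames / (time.time() - start):.1f}")
cap.release()
cv2.destroyAllWindows()
```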
The effect of the present invention will be further described with reference to simulation experiments.
1. The experimental conditions are as follows:
the hardware test platform of the simulation experiment of the invention is as follows: the CPU is Intel (R) core (TM) i7-7800X, the main frequency is 3.6GHz, the memory is 80GB, the GPU is double GeForce RTX 2080Ti, and the software platform is as follows: windows10 system.
2. Experimental content and result analysis:
simulation 1, training a target detection network YOLOv4 on a constructed football target training data set by using the method of the present invention, detecting and labeling the input video containing the football target by using the trained target detection network YOLOv4, and outputting the video labeled with football pixels to obtain the detection result of the football target, as shown in fig. 5. Wherein:
fig. 5(a) shows the result of detection in the normal state of the soccer ball, fig. 5(b) shows the result of detection when the soccer ball is blocked by a player, fig. 5(c) shows the result of detection when the soccer ball is blocked by a soccer net, and fig. 5(d) shows the result of detection when the soccer ball is moving at high speed.
As can be seen from FIG. 5, the target detection network YOLOv4 has good robustness to the problems of football occlusion, high-speed motion and blurring in the football video.
Simulation 2: after training the target detection network YOLOv4 on the constructed football target training data set with the method of the present invention, the trained network is used to measure the frames per second (FPS) achieved on an input football video, giving the result shown in fig. 6.
As can be seen from fig. 6, the experiment runs at 31.3 FPS, i.e. the invention can detect football targets in 31.3 frames of images per second; the detection speed is high and real-time detection is guaranteed.
In conclusion, the real-time football target detection method based on a neural network can locate and recognize football targets well under complex conditions, enhances the representation of local detail information, effectively improves the generalization of the network, further increases the speed and accuracy of football target detection, and guarantees its real-time performance.
The above description is only one specific example of the present invention, given to help those skilled in the art understand the invention, and the invention is not limited to the scope of this specific example. It will be apparent to those skilled in the art that various changes may be made without departing from the spirit and scope of the present invention as defined in the appended claims, and everything produced using the inventive concept is protected by the present invention.

Claims (8)

1. A real-time football target detection method based on a neural network is characterized by comprising the following steps:
(1) constructing the football target detection network YOLOv4:
(1a) constructing the backbone network CSPDarknet53 of the target detection network YOLOv4;
(1b) constructing the neck network PANet of the target detection network YOLOv4;
(1c) constructing the Head network YOLO Head of the target detection network YOLOv4;
(2) constructing a training data set:
(2a) collecting at least 3000 images that contain a football target and have a resolution of not less than 608 × 608;
(2b) manually marking the football bounding box in each image containing a football, generating annotation files in one-to-one correspondence with the collected images;
(2c) forming the training data set from the collected images and the annotation files;
(3) training the target detection network YOLOv4:
(3a) configuring the target detection network YOLOv4 environment;
(3b) downloading the pre-training weight file YOLOv4.conv.137 of the target detection network YOLOv4;
(3c) obtaining the prior (anchor) box sizes of the constructed data set with the k-means clustering method, and updating the prior box sizes in the target detection network YOLOv4;
(3d) inputting the training data set and augmenting it with the CutMix method;
(3e) loading the pre-training weight file YOLOv4.conv.137 onto the target detection network YOLOv4 constructed in step (1) with a transfer learning method, obtaining the loaded target detection network YOLOv4;
(3f) training on the training data set constructed in step (2) with the loaded target detection network YOLOv4, obtaining the trained YOLOv4 football target detection network;
(4) collecting a video containing a football target, inputting it into the trained YOLOv4 football target detection network for detection and labeling, and outputting the video with the football pixels labeled, obtaining the football target detection result.
2. The method according to claim 1, wherein the backbone network CSPDarknet53 of the target detection network YOLOv4 built in (1a) has the following structural relationship:
input layer → first convolutional layer → second convolutional layer → first combination module → third convolutional layer → fourth convolutional layer → second combination module → fifth convolutional layer → sixth convolutional layer → third combination module → seventh convolutional layer → eighth convolutional layer → fourth combination module → ninth convolutional layer → tenth convolutional layer → fifth combination module → eleventh convolutional layer, wherein:
the first to eleventh convolutional layers are CBM convolutional layers, i.e. each consists of a Conv convolutional layer, a Bn batch normalization layer and a Mish activation function layer;
the first combination module is formed by splicing three CBM convolutional layers with 64 channels and one CSP residual module;
the second combination module is formed by splicing three CBM convolutional layers with 64 channels and two CSP residual modules;
the third combination module is formed by splicing three CBM convolutional layers with 128 channels and eight CSP residual modules;
the fourth combination module is formed by splicing three CBM convolutional layers with 256 channels and eight CSP residual modules;
the fifth combination module is formed by splicing three CBM convolutional layers with 512 channels and four CSP residual modules;
the CSP residual module consists of two CBM convolutional layers, with the output of the second CBM convolutional layer added to the input of the first CBM convolutional layer;
the Mish activation function is expressed as: Mish(x) = x × tanh(ln(1 + e^x)), where x denotes the output of the previous layer, e^x denotes e raised to the power x, ln(1 + e^x) denotes the natural logarithm of (1 + e^x), and tanh is the hyperbolic tangent function.
3. The method according to claim 1, wherein the neck network PANet of the target detection network YOLOv4 built in (1b) has the structure: 1st combination module → 2nd combination module → 3rd combination module → 4th combination module, wherein:
the 1st combination module consists of the 1st stacking module and a convolutional layer; the 1st stacking module stacks the feature layer obtained by passing the output of the spatial pyramid structure through a CBL layer and upsampling with the 38 × 38 × 512 effective feature layer of the backbone network; the convolutional layer is alternating 1 × 1 and 3 × 3 convolutions;
the 2nd combination module consists of the 2nd stacking module and a convolutional layer; the 2nd stacking module stacks the feature layer obtained by passing the output of the 1st combination module through a CBL layer and upsampling with the 76 × 76 × 256 effective feature layer of the backbone network; the convolutional layer is alternating 1 × 1 and 3 × 3 convolutions;
the 3rd combination module consists of the 3rd stacking module and a convolutional layer; the 3rd stacking module stacks the output feature layer of the 1st combination module with the downsampled output feature layer of the 2nd combination module; the convolutional layer is alternating 1 × 1 and 3 × 3 convolutions;
the 4th combination module consists of the 4th stacking module and a convolutional layer; the 4th stacking module stacks the downsampled output feature layer of the 3rd combination module with the output feature layer of the spatial pyramid structure; the convolutional layer is alternating 1 × 1 and 3 × 3 convolutions;
the CBL convolutional layer consists of a Conv convolution, Bn batch normalization and a Leaky_ReLU activation function;
the spatial pyramid structure performs multi-scale fusion by max pooling with 1 × 1, 5 × 5, 9 × 9 and 13 × 13 kernels.
4. The method of claim 1, wherein the Head network YOLO Head of the target detection network YOLOv4 constructed in (1c) has the structure: a 3 × 3 convolutional layer connected to a 1 × 1 convolutional layer;
the 3 × 3 convolutional layer consists of a Conv convolutional layer, a Bn batch normalization layer and a Leaky_ReLU activation function layer;
the 1 × 1 convolutional layer consists of a Conv convolutional layer only.
5. The method according to claim 1, wherein the target detection network YOLOv4 environment configured in (3a) comprises CUDA 10.2, cuDNN 7.6.5, Python 3.7, Visual Studio 2019 and OpenCV 3.4.
6. The method of claim 1, wherein in (3d) the input training set is augmented with the CutMix method by randomly cutting a rectangular region from one image and using its pixels to replace the corresponding rectangular region in another image, forming a new combined image.
7. The method of claim 1, wherein in (3e) the pre-training weight file YOLOv4.conv.137 is loaded onto the target detection network YOLOv4 with a transfer learning method, i.e. the parameters of shallow layers already trained on a large data set are obtained and loaded onto the target detection network YOLOv4, so that the network can recognize generic low-level features, saving training time and reducing the risk of under-fitting and over-fitting.
8. The method of claim 1, wherein in (3f) the training data set is trained with the loaded target detection network YOLOv4, as follows:
(3f1) inputting the data set augmented by the CutMix method into the target detection network YOLOv4;
(3f2) in a layer-by-layer training mode, first learning to recognize generic low-level features, then quickly learning the high-dimensional features of the augmented data set, and continuously optimizing the network training parameters until the loss function of the target detection network YOLOv4 converges, obtaining the trained target detection network YOLOv4.
CN202010705052.4A 2020-07-21 2020-07-21 Real-time football target detection method based on neural network Active CN111832513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010705052.4A CN111832513B (en) 2020-07-21 2020-07-21 Real-time football target detection method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010705052.4A CN111832513B (en) 2020-07-21 2020-07-21 Real-time football target detection method based on neural network

Publications (2)

Publication Number Publication Date
CN111832513A true CN111832513A (en) 2020-10-27
CN111832513B CN111832513B (en) 2024-02-09

Family

ID=72924506

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010705052.4A Active CN111832513B (en) 2020-07-21 2020-07-21 Real-time football target detection method based on neural network

Country Status (1)

Country Link
CN (1) CN111832513B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019028725A1 (en) * 2017-08-10 2019-02-14 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110348376A (en) * 2019-07-09 2019-10-18 华南理工大学 A kind of pedestrian's real-time detection method neural network based
CN111126359A (en) * 2019-11-15 2020-05-08 西安电子科技大学 High-definition image small target detection method based on self-encoder and YOLO algorithm
AU2020100705A4 (en) * 2020-05-05 2020-06-18 Chang, Jiaying Miss A helmet detection method with lightweight backbone based on yolov3 network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Guan Junlin; Zhi Xin: "Mask-wearing detection method based on the YOLOv4 convolutional neural network", Modern Information Technology (现代信息科技), no. 11 *
Chen Cong; Yang Zhong; Song Jiarong; Han Jiaming: "An improved convolutional neural network method for pedestrian recognition", Applied Science and Technology (应用科技), no. 03 *
Huang Guoxin; Liang Binbin; Zhang Jianwei: "Real-time small-target detection in airport scenes based on maintaining high resolution", Modern Computer (现代计算机), no. 05 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364734A (en) * 2020-10-30 2021-02-12 福州大学 Abnormal dressing detection method based on yolov4 and CenterNet
CN113205108A (en) * 2020-11-02 2021-08-03 哈尔滨理工大学 YOLOv 4-based multi-target vehicle detection and tracking method
CN112347943A (en) * 2020-11-09 2021-02-09 哈尔滨理工大学 Anchor optimization safety helmet detection method based on YOLOV4
CN112381806A (en) * 2020-11-18 2021-02-19 上海北昂医药科技股份有限公司 Double centromere aberration chromosome analysis and prediction method based on multi-scale fusion method
CN112508001A (en) * 2020-12-03 2021-03-16 安徽理工大学 Coal gangue positioning method based on multispectral waveband screening and improved U-Net
CN112651326A (en) * 2020-12-22 2021-04-13 济南大学 Driver hand detection method and system based on deep learning
CN112907660B (en) * 2021-01-08 2022-10-04 浙江大学 Underwater laser target detector for small sample
CN112907660A (en) * 2021-01-08 2021-06-04 浙江大学 Underwater laser target detector for small sample
CN112766188B (en) * 2021-01-25 2024-05-10 浙江科技学院 Small target pedestrian detection method based on improved YOLO algorithm
CN112766188A (en) * 2021-01-25 2021-05-07 浙江科技学院 Small-target pedestrian detection method based on improved YOLO algorithm
CN112927297A (en) * 2021-02-20 2021-06-08 华南理工大学 Target detection and visual positioning method based on YOLO series
CN113052184A (en) * 2021-03-12 2021-06-29 电子科技大学 Target detection method based on two-stage local feature alignment
CN112781634A (en) * 2021-04-12 2021-05-11 南京信息工程大学 BOTDR distributed optical fiber sensing system based on YOLOv4 convolutional neural network
CN113822844A (en) * 2021-05-21 2021-12-21 国电电力宁夏新能源开发有限公司 Unmanned aerial vehicle inspection defect detection method and device for blades of wind turbine generator system and storage medium
CN113239842A (en) * 2021-05-25 2021-08-10 三门峡崤云信息服务股份有限公司 Image recognition-based swan detection method and device
CN113239845A (en) * 2021-05-26 2021-08-10 青岛以萨数据技术有限公司 Infrared target detection method and system for embedded platform
CN113420607A (en) * 2021-05-31 2021-09-21 西南电子技术研究所(中国电子科技集团公司第十研究所) Multi-scale target detection and identification method for unmanned aerial vehicle
CN113487551A (en) * 2021-06-30 2021-10-08 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving performance of dense target based on deep learning
CN113487551B (en) * 2021-06-30 2024-01-16 佛山市南海区广工大数控装备协同创新研究院 Gasket detection method and device for improving dense target performance based on deep learning
CN113592825A (en) * 2021-08-02 2021-11-02 安徽理工大学 YOLO algorithm-based real-time coal gangue detection method
CN113486865A (en) * 2021-09-03 2021-10-08 国网江西省电力有限公司电力科学研究院 Power transmission line suspended foreign object target detection method based on deep learning
CN113763356A (en) * 2021-09-08 2021-12-07 国网江西省电力有限公司电力科学研究院 Target detection method based on visible light and infrared image fusion
CN114492625A (en) * 2022-01-23 2022-05-13 北京工业大学 Solution of target detection network search model based on migration to detection problem of intelligent vehicle marker
CN114495029A (en) * 2022-01-24 2022-05-13 中国矿业大学 Traffic target detection method and system based on improved YOLOv4

Also Published As

Publication number Publication date
CN111832513B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN111832513B (en) Real-time football target detection method based on neural network
Dvornik et al. On the importance of visual context for data augmentation in scene understanding
CN111126472B (en) SSD (solid State disk) -based improved target detection method
Dvornik et al. Modeling visual context is key to augmenting object detection datasets
Huang et al. Mask R-CNN with pyramid attention network for scene text detection
CN109241982B (en) Target detection method based on deep and shallow layer convolutional neural network
CN111210443B (en) Deformable convolution mixing task cascading semantic segmentation method based on embedding balance
CN114202672A (en) Small target detection method based on attention mechanism
CN113591795B (en) Lightweight face detection method and system based on mixed attention characteristic pyramid structure
US8744168B2 (en) Target analysis apparatus, method and computer-readable medium
CN110059558A (en) A kind of orchard barrier real-time detection method based on improvement SSD network
CN111401293B (en) Gesture recognition method based on Head lightweight Mask scanning R-CNN
CN109711407B (en) License plate recognition method and related device
Zhou et al. Water-Filling: a novel way for image structural feature extraction
CN111191649A (en) Method and equipment for identifying bent multi-line text image
KR101618996B1 (en) Sampling method and image processing apparatus for estimating homography
CN110705566B (en) Multi-mode fusion significance detection method based on spatial pyramid pool
US20210256707A1 (en) Learning to Segment via Cut-and-Paste
Wang et al. Multiscale deep alternative neural network for large-scale video classification
Ma et al. Mdcn: Multi-scale, deep inception convolutional neural networks for efficient object detection
CN112800955A (en) Remote sensing image rotating target detection method and system based on weighted bidirectional feature pyramid
CN112085017A (en) Tea tender shoot image segmentation method based on significance detection and Grabcut algorithm
Yang et al. Real-time pedestrian detection via hierarchical convolutional feature
CN114943729A (en) Cell counting method and system for high-resolution cell image
Zhang et al. Multi-scale salient object detection with pyramid spatial pooling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant