CN111553406B - Target detection system, method and terminal based on improved YOLO-V3 - Google Patents

Target detection system, method and terminal based on improved YOLO-V3

Info

Publication number
CN111553406B
Authority
CN
China
Prior art keywords
image
feature
module
yolo
convolution layers
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010333517.8A
Other languages
Chinese (zh)
Other versions
CN111553406A (en)
Inventor
田鹏程
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Kaike Intelligent Technology Co ltd
Original Assignee
Shanghai Kaike Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Kaike Intelligent Technology Co ltd filed Critical Shanghai Kaike Intelligent Technology Co ltd
Priority to CN202010333517.8A priority Critical patent/CN111553406B/en
Publication of CN111553406A publication Critical patent/CN111553406A/en
Application granted granted Critical
Publication of CN111553406B publication Critical patent/CN111553406B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/232 - Non-hierarchical techniques
    • G06F18/2321 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 - Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection system based on improved YOLO-V3, comprising an image acquisition module, an image preprocessing module, a darknet-39 backbone network module, a multi-scale convolution layer feature combination module, a weighted feature fusion module and a prediction module. The darknet-39 backbone network module adopts a darknet-39 backbone network model to extract image features, obtaining feature maps of 5 convolution layers of different scales; the multi-scale convolution layer feature combination module optimally combines the feature maps of the 5 different-scale convolution layers to obtain combined feature maps; the weighted feature fusion module performs weighted feature fusion on the combined feature maps; the prediction module performs regression prediction on the fused feature maps with a YOLO-V3 algorithm to obtain the target detection result. The system has a smaller network model, accelerates target detection, enhances the network feature fusion effect and achieves better detection results.

Description

Target detection system, method and terminal based on improved YOLO-V3
Technical Field
The invention relates to the technical field of computer vision, and in particular to a target detection system, method and terminal based on improved YOLO-V3.
Background
YOLO (You Only Look Once)-V3 is currently a popular target detection algorithm that is fast and stable, but its backbone network uses the Darknet-53 structure, which requires 65.86 BFLOPs (billion floating-point operations) and has a large number of model parameters, so the algorithm slows down greatly when run on embedded devices and cannot achieve real-time detection. With a 416×416 input, the smallest feature map from which YOLO-V3 extracts features is 13×13, which is still relatively large, so the YOLO-V3 algorithm detects medium- and large-size objects poorly. YOLOv3 uses multi-scale feature maps from different layers to predict targets of different sizes and fuses high- and low-level feature information; although this improves detection accuracy to some extent, the feature maps of different layers usually contribute differently, so the feature fusion effect is poor.
Disclosure of Invention
Aiming at the defects in the prior art, the target detection system, method, terminal and medium based on improved YOLO-V3 provided by the embodiments of the invention offer a high target detection speed, improve the detection of medium- and large-size objects, improve the effect with which YOLO-V3 fuses feature maps from different layers, and raise the mAP index of target detection.
In a first aspect, an embodiment of the present invention provides a target detection system based on improved YOLO-V3, including: an image acquisition module, an image preprocessing module, a darknet-39 backbone network module, a multi-scale convolution layer feature combination module, a weighted feature fusion module and a prediction module, wherein
the image acquisition module is used for acquiring an image to be identified;
the image preprocessing module is used for preprocessing an image to be identified to obtain a preprocessed image;
the darknet-39 backbone network module obtains a darknet-39 backbone network model by improving the darknet-53 backbone network, and extracts image features with the darknet-39 backbone network model to obtain feature maps of 5 convolution layers of different scales;
the multi-scale convolution layer feature combination module is used for optimally combining the feature maps of the 5 different-scale convolution layers to obtain a combined feature map;
the weighted feature fusion module is used for carrying out weighted feature fusion on the combined feature map;
the prediction module is used for carrying out regression prediction on the fused feature maps by using a YOLO-V3 algorithm to obtain a target detection result.
In a second aspect, an embodiment of the present invention provides a target detection method based on improved YOLO-V3, including:
acquiring an image to be identified;
preprocessing an image to be identified to obtain a preprocessed image;
extracting image features with a trained darknet-39 backbone network model to obtain feature maps of 5 convolution layers of different scales;
optimally combining the feature maps of the 5 different-scale convolution layers to obtain a combined feature map;
carrying out weighted feature fusion on the combined feature map;
and carrying out regression prediction on the fused feature maps with a YOLO-V3 algorithm to obtain a target detection result.
In a third aspect, an embodiment of the present invention provides an intelligent terminal, including a processor, an input device, an output device, and a memory, where the processor, the input device, the output device, and the memory are connected to each other, and the memory is configured to store a computer program, where the computer program includes program instructions, and the processor is configured to invoke the program instructions to execute the method steps described in the foregoing embodiments.
In a fourth aspect, embodiments of the present invention provide a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method steps described in the above embodiments.
The invention has the beneficial effects that:
According to the target detection system, method, terminal and medium based on improved YOLO-V3, feature extraction with the darknet-39 backbone network reduces model size and increases target detection speed; extracting feature maps from 5 convolution layers of different scales fully fuses shallow and deep feature information and improves the detection of medium- and large-size objects; and combining and weighting the feature maps of different convolution layers according to their different contributions enhances the network feature fusion effect and achieves better detection results.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. Like elements or portions are generally identified by like reference numerals throughout the several figures. In the drawings, elements or portions thereof are not necessarily drawn to scale.
FIG. 1 is a block diagram of a target detection system based on improved YOLO-V3 according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a target detection method based on improved YOLO-V3 according to a second embodiment of the present invention;
FIG. 3 is a block diagram of an intelligent terminal according to a third embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings; it is evident that the described embodiments are some, but not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
It should be understood that the terms "comprises" and "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
It is noted that unless otherwise indicated, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this invention pertains.
Referring to FIG. 1, a block diagram of a target detection system based on improved YOLO-V3 according to a first embodiment of the present invention is shown. The system comprises an image acquisition module 101, an image preprocessing module 102, a darknet-39 backbone network module 103, a multi-scale convolution layer feature combination module 104, a weighted feature fusion module 105 and a prediction module 106. The image acquisition module 101 acquires the image to be identified; the image preprocessing module 102 preprocesses the image to be identified to obtain a preprocessed image; the darknet-39 backbone network module 103 obtains a darknet-39 backbone network model by improving the darknet-53 backbone network and extracts image features with it, obtaining feature maps of 5 convolution layers of different scales; the multi-scale convolution layer feature combination module 104 optimally combines the feature maps of the 5 different-scale convolution layers to obtain combined feature maps, where the optimal combination differs by layer: the first and last layers are combined in pairs, and the middle layers are combined in threes; the weighted feature fusion module 105 performs weighted feature fusion on the combined feature maps; and the prediction module 106 performs regression prediction on the fused feature maps with the YOLO-V3 algorithm to obtain the target detection result.
The image preprocessing module 102 comprises an image rotation unit and a scaling unit. The image rotation unit randomly flips, rotates and crops the image to be identified; the scaling unit performs a scale transformation on the image to be identified.
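A minimal sketch of this preprocessing, assuming OpenCV and NumPy are available; the crop ratio and the 90-degree rotation choice are illustrative rather than taken from the patent, and in training the bounding box annotations would have to undergo the same transforms (omitted here). The multi-scale size set is the one used later in this embodiment.

import random
import cv2
import numpy as np

# Candidate input sizes for multi-scale training (see the embodiment below).
MULTI_SCALE_SIZES = [256, 320, 384, 448, 512, 576, 640, 704, 768]

def preprocess(image: np.ndarray) -> np.ndarray:
    """Randomly flip, rotate and crop an image, then rescale it to a fixed size."""
    if random.random() < 0.5:
        image = cv2.flip(image, 1)  # horizontal flip
    if random.random() < 0.5:
        image = cv2.flip(image, 0)  # vertical flip
    # Rotation by a random multiple of 90 degrees (illustrative choice).
    image = np.ascontiguousarray(np.rot90(image, random.randint(0, 3)))
    # Random crop keeping at least 80% of each side (illustrative ratio).
    h, w = image.shape[:2]
    ch, cw = int(h * random.uniform(0.8, 1.0)), int(w * random.uniform(0.8, 1.0))
    y0, x0 = random.randint(0, h - ch), random.randint(0, w - cw)
    image = image[y0:y0 + ch, x0:x0 + cw]
    # Scale transform: resize to a size drawn from the multi-scale set.
    size = random.choice(MULTI_SCALE_SIZES)
    return cv2.resize(image, (size, size))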
The darknet-39 backbone network module 103 performs channel clipping on the darknet-53 network, which reduces the number of model parameters and improves operating efficiency while still fully extracting image features; the improved YOLO-V3 algorithm reduces the amount of computation by 80% compared with the original and runs 4 times faster. The structure of the darknet-39 backbone network in the darknet-39 backbone network module is shown in Table 1.
[Table 1: structure of the darknet-39 backbone network (reproduced only as an image in the original publication)]
The darknet-39 backbone network module comprises a darknet-39 backbone network training unit, which adds 2 convolution layers to the backbone network of the traditional YOLO-V3 algorithm and detects targets with feature maps from 5 convolution layers of different scales; it acquires a data set, divides the data set into a training set, a test set and a validation set, re-clusters the bounding box coordinates on the training set with a k-means clustering algorithm, and calculates 15 bounding box coordinates for the feature maps of the 5 different-scale convolution layers.
The darknet-39 backbone network module reasonably prunes the darknet-53 backbone network, optimizes the network structure and removes some redundant convolution operations to obtain the darknet-39 backbone network. Specifically, the number of channels of the Level 5 layer is halved, and the Level 5 layer is also used as a feature output layer with a stride of 4 at this point, which helps to improve the detection rate of small target objects. The Level 4, Level 3 and Level 2 layers halve both the number of channels and the number of operations, with strides of 8, 16 and 32 respectively. Finally, a 3×3 convolution layer with a stride of 64 is added, which enhances the feature extraction effect while adding almost no parameters. The darknet-39 network obtained this way cannot directly load the weight parameters of the original darknet-53 and needs to be retrained. In this embodiment, classification training is performed on the ImageNet LSVRC 2012 data set for 90 epochs, with an initial learning rate of 1e-03, the learning rate divided by ten at steps 170000 and 350000, a batch_size of 128 and a weight decay coefficient of 5e-04.
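Table 1 appears only as an image in the original publication, so the exact darknet-39 layout cannot be reproduced here. As an illustration of the channel-halving idea described above, a minimal sketch of a darknet-style building block in PyTorch follows; the class names and the stage widths in the closing comment are assumptions, not taken from Table 1.

import torch
import torch.nn as nn

class ConvBNLeaky(nn.Module):
    """Darknet-style unit: convolution + batch norm + leaky ReLU."""
    def __init__(self, in_ch: int, out_ch: int, k: int, stride: int = 1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.1, inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

class Residual(nn.Module):
    """Darknet residual block: 1x1 bottleneck, 3x3 conv, skip connection."""
    def __init__(self, ch: int):
        super().__init__()
        self.conv1 = ConvBNLeaky(ch, ch // 2, 1)
        self.conv2 = ConvBNLeaky(ch // 2, ch, 3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv2(self.conv1(x))

# Darknet-53 uses stage widths (64, 128, 256, 512, 1024); halving the channels
# as described would give stages of (32, 64, 128, 256, 512) built from the
# blocks above, plus one extra 3x3 convolution layer at the end.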
Taking the coco data set as an example, the coco 2017 detection data set has 118287 training images, 5000 validation images and 40670 test images covering 80 categories in total. Furthermore, because the pictures in the training set have different sizes, this process is normalized. In the field of target detection, the similarity between two bounding boxes is usually measured by the IOU (intersection over union), where DetectionResult denotes the predicted rectangular box area and GroundTruth denotes the ground-truth rectangular box area:
IOU = area(DetectionResult ∩ GroundTruth) / area(DetectionResult ∪ GroundTruth)
Then, for target detection, the distance metric can be calculated as follows:
d(box,centroid)=1-IOU(box,centroid)
Here centroid denotes the center of a bounding box cluster; the greater the IOU value between two bounding boxes, the smaller the distance between them. Before the image to be identified is input into the darknet-39 backbone network module, the image preprocessing module preprocesses it and transforms it to a fixed size. This embodiment adopts a multi-scale training method, randomly selecting one size from the set {256,320,384,448,512,576,640,704,768} as the current input size. Taking an input image size of 448×448 as an example, the 15 bounding box coordinates calculated for the feature maps of the 5-scale convolution layers are as follows:
(4,6),(7,16),(14,9),(22,17),(13,30),(28,37),(46,23),(25,70),(49,58),(86,39),(56,124),(99,83),(114,205),(199,124),(294,275)。
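The re-clustering step above can be sketched as standard anchor clustering: k-means over box widths and heights with d(box, centroid) = 1 - IOU(box, centroid) as the distance. A minimal NumPy version follows, assuming boxes are given as (w, h) pairs already rescaled to the input resolution; function names are illustrative.

import numpy as np

def iou_wh(boxes: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """IOU between (w, h) boxes and (w, h) centroids, both treated as if
    centered at the origin, as is usual for anchor clustering."""
    inter = (np.minimum(boxes[:, None, 0], centroids[None, :, 0]) *
             np.minimum(boxes[:, None, 1], centroids[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes: np.ndarray, k: int = 15, iters: int = 100) -> np.ndarray:
    """k-means with d(box, centroid) = 1 - IOU(box, centroid)."""
    rng = np.random.default_rng(0)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(1.0 - iou_wh(boxes, centroids), axis=1)
        new = np.array([boxes[assign == i].mean(axis=0) if np.any(assign == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids[np.argsort(centroids.prod(axis=1))]  # sorted by area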
A 448×448 image to be identified is input, an image pyramid of the image is created, and the pyramid levels are input into the corresponding parts of the network, so that target detection is performed on feature maps of different depths. The feature map of a later layer is up-sampled and reused so that the current feature map also obtains information from the later layer, organically fusing low-level and high-level semantic information and thereby improving detection accuracy. The feature sizes of the pyramid network layers are 7×7, 14×14, 28×28, 56×56 and 112×112, numbered layers 1 to 5 from the smallest feature map to the largest. The first four layers perform an up-sampling operation on the feature pyramid with a step of 2 and are fused with the deep residual network of the next layer to form a deeply fused fast detection model, which strengthens the expressive capacity of the feature pyramid; compared with the traditional YOLOv3 network it covers a wider range of scales, so the detection of small targets and of larger objects improves markedly. To avoid increasing the amount of computation, this embodiment replaces the 3×3 convolution layers in the pyramid network with depthwise separable convolutions, which reduces computation significantly. With the feature pyramid network there are in total 5 feature maps from convolution layers of different scales, and selecting the optimal combination according to the experimental results can greatly reduce model parameters. For example, to fuse the 7×7 and 14×14 feature maps, the 14×14 feature map is first down-sampled, giving two feature maps of size 7×7, denoted L1 and L2. To fuse the features better, this embodiment adopts weighted feature fusion: with fused feature F1, weighting coefficient w1 for L1 and weighting coefficient w2 for L2, then:
F1 = w1·L1 + w2·L2
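A minimal sketch of this weighted fusion in PyTorch. The text does not state how w1 and w2 are obtained, so treating them as learnable scalars normalized with a softmax is an assumption, as is the use of nearest-neighbour interpolation for the down-sampling step.

import torch
import torch.nn as nn
import torch.nn.functional as F

def weighted_fuse(l1: torch.Tensor, l2: torch.Tensor,
                  w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    """F1 = w1*L1 + w2*L2 on two feature maps brought to a common size."""
    if l2.shape[-2:] != l1.shape[-2:]:
        # Down-sample the larger map (e.g. 14x14) to the smaller (e.g. 7x7).
        l2 = F.interpolate(l2, size=l1.shape[-2:], mode="nearest")
    return w1 * l1 + w2 * l2

# Illustrative usage with learnable, softmax-normalized weights.
w = nn.Parameter(torch.ones(2))
w1, w2 = torch.softmax(w, dim=0)
l1 = torch.randn(1, 256, 7, 7)      # L1: already 7x7
l2 = torch.randn(1, 256, 14, 14)    # L2: 14x14, down-sampled inside the call
f1 = weighted_fuse(l1, l2, w1, w2)  # fused feature F1, shape (1, 256, 7, 7)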
the prediction module adopts YOLO-V3 to perform regression prediction on the weighted and fused feature graphs, the YOLO-V3 divides the feature graphs into n×n grids (feature graphs with different scales, N is different in size, 5 scales are shared in this embodiment, N is 7,14, 28,56 and 112 respectively, 3 different bounding boxes are predicted by each grid, the target detection result can be expressed as n×n× [3× (c+con+b) ], C represents the number of categories, con represents the confidence level, and B represents the coordinates of the bounding box.
To make the detection network converge quickly, the pruned darknet-39 network structure is pre-trained on the ImageNet data set, and the resulting weight file is loaded directly into the detection network as initialization weights. The hyper-parameters for pre-training the darknet-39 network are: 120 training epochs, an initial learning rate of 1e-04 with a cosine_decay schedule down to a final learning rate of 1e-06, momentum of 0.9, a batch_size of 32, and a weight decay coefficient of 5e-04 with l2 regularization.
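A sketch of such a cosine_decay schedule; the total number of decay steps is an assumption, since the text fixes only the initial and final learning rates. Framework schedulers such as tf.keras.optimizers.schedules.CosineDecay implement the same idea.

import math

def cosine_decay_lr(step: int, total_steps: int,
                    base_lr: float = 1e-4, final_lr: float = 1e-6) -> float:
    """Decay the learning rate from base_lr to final_lr along a half cosine."""
    progress = min(step, total_steps) / total_steps
    cos = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (base_lr - final_lr) * cos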
According to the target detection system based on improved YOLO-V3, feature extraction with the darknet-39 backbone network reduces model size and increases target detection speed; extracting feature maps from 5 convolution layers of different scales fully fuses shallow and deep feature information and improves the detection of medium- and large-size objects; and combining and weighting the feature maps of different convolution layers according to their different contributions enhances the network feature fusion effect and achieves better detection results.
The first embodiment above provides a target detection system based on improved YOLO-V3; in correspondence with it, the present application also provides a target detection method based on improved YOLO-V3. Please refer to fig. 2, a flowchart of the target detection method based on improved YOLO-V3 according to a second embodiment of the present invention. Since the method embodiments are substantially similar to the system embodiments, they are described relatively simply; for relevant points, refer to the description of the system embodiments. The method embodiments described below are merely illustrative.
Fig. 2 shows a flowchart of the target detection method based on improved YOLO-V3 according to the second embodiment of the present invention; the method includes:
s201, acquiring an image to be identified.
In the present embodiment, the input image to be recognized has a size of 448×448.
S202, preprocessing an image to be identified to obtain a preprocessed image.
Specifically, the specific method for preprocessing the image to be identified comprises the following steps:
randomly flipping the image to be identified horizontally/vertically and cropping it;
and performing scale transformation on the image to be identified.
And S203, extracting image features with a trained darknet-39 backbone network model to obtain feature maps of 5 convolution layers of different scales.
Specifically, the method further includes a step of training the darknet-39 backbone network model; the specific training method comprises the following steps:
adding 2 convolution layers to the backbone network of the traditional YOLO-V3 algorithm, and detecting targets with feature maps from 5 convolution layers of different scales.
Specifically, the darknet-53 network is reasonably pruned, the network structure is optimized, and some redundant convolution operations are removed to obtain the darknet-39 network. The number of channels of the Level 5 layer is halved, the Level 5 layer is also used as a feature output layer, and the stride is 4 at this point, which helps to improve the detection rate of small target objects. The Level 4, Level 3 and Level 2 layers halve both the number of channels and the number of operations, with strides of 8, 16 and 32 respectively. Finally, a 3×3 convolution layer with a stride of 64 is added, which enhances the feature extraction effect while adding almost no parameters. The darknet-39 network obtained this way cannot directly load the weight parameters of the original darknet-53 and needs to be retrained. In this embodiment, classification training is performed on the ImageNet LSVRC 2012 data set for 120 epochs, with an initial learning rate of 1e-03, the learning rate divided by ten at steps 170000 and 350000, a batch_size of 128 and a weight decay coefficient of 5e-04.
Acquiring a data set, and dividing the data set into a training set, a testing set and a verification set;
and re-clustering the coordinates of the boundary frames on the training set by adopting a k-means clustering algorithm, and calculating 15 boundary frame coordinates of the characteristic diagrams of the convolution layers with 5 different scales.
And S204, optimally combining the feature maps of the 5 different-scale convolution layers to obtain combined feature maps.
And S205, carrying out weighted feature fusion on the combined feature maps.
S206, carrying out regression prediction on the fused feature maps with a YOLO-V3 algorithm to obtain a target detection result.
According to the target detection method based on improved YOLO-V3, feature extraction with the darknet-39 backbone network reduces model size and increases target detection speed; extracting feature maps from 5 convolution layers of different scales fully fuses shallow and deep feature information and improves the detection of medium- and large-size objects; and combining and weighting the feature maps of different convolution layers according to their different contributions enhances the network feature fusion effect and achieves better detection results.
As shown in fig. 3, a schematic structural diagram of an intelligent terminal according to a third embodiment of the present invention is shown, where the terminal includes a processor 301, an input device 302, an output device 303, and a memory 304, where the processor 301, the input device 302, the output device 303, and the memory 304 are connected to each other, and the memory 304 is used to store a computer program, where the computer program includes program instructions, and the processor 301 is configured to invoke the program instructions to execute the method described in the second embodiment.
It should be appreciated that in embodiments of the present invention, the processor 301 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The input device 302 may include a touch pad, a fingerprint sensor (for collecting fingerprint information of a user and direction information of a fingerprint), a microphone, etc., and the output device 303 may include a display (LCD, etc.), a speaker, etc.
The memory 304 may include read only memory and random access memory and provides instructions and data to the processor 301. A portion of memory 304 may also include non-volatile random access memory. For example, the memory 304 may also store information of device type.
In a specific implementation, the processor 301, the input device 302, and the output device 303 described in the embodiments of the present invention may perform an implementation described in the method embodiments provided in the embodiments of the present invention, or may perform an implementation described in the system embodiments of the present invention, which are not described herein again.
In a further embodiment of the invention, a computer-readable storage medium is provided, which stores a computer program comprising program instructions that, when executed by a processor, cause the processor to perform the method described in the above embodiment.
The computer readable storage medium may be an internal storage unit of the terminal according to the foregoing embodiments, for example, a hard disk or a memory of the terminal. The computer readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a SmartMedia Card (SMC), a Secure Digital (SD) card or a flash memory card (Flash Card) provided on the terminal. Further, the computer readable storage medium may include both an internal storage unit and an external storage device of the terminal. The computer readable storage medium is used to store the computer program and other programs and data required by the terminal, and may also be used to temporarily store data that has been output or is to be output.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps described in connection with the embodiments disclosed herein may be embodied in electronic hardware, in computer software, or in a combination of the two, and that the elements and steps of the examples have been generally described in terms of function in the foregoing description to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working procedures of the terminal and the unit described above may refer to the corresponding procedures in the foregoing method embodiments, which are not repeated herein.
In several embodiments provided in the present application, it should be understood that the disclosed terminal and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices, or elements, or may be an electrical, mechanical, or other form of connection.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention, and are intended to be included within the scope of the appended claims and description.

Claims (6)

1. An improved YOLO-V3 based target detection system, comprising: an image acquisition module, an image preprocessing module, a darknet-39 backbone network module, a multi-scale convolution layer feature combination module, a weighted feature fusion module and a prediction module, wherein
the image acquisition module is used for acquiring an image to be identified;
the image preprocessing module is used for preprocessing an image to be identified to obtain a preprocessed image;
the darknet-39 backbone network module obtains a darknet-39 backbone network model by improving the darknet-53 backbone network, and extracts image features with the darknet-39 backbone network model to obtain feature maps of 5 convolution layers of different scales;
the multi-scale convolution layer feature combination module is used for optimally combining the feature maps of the 5 different-scale convolution layers to obtain a combined feature map;
the weighted feature fusion module is used for carrying out weighted feature fusion on the combined feature map;
the prediction module is used for carrying out regression prediction on the fused feature maps by using a YOLO-V3 algorithm to obtain a target detection result;
the darknet-39 backbone network module comprises a darknet-39 backbone network training unit, which adds 2 convolution layers to the backbone network of the traditional YOLO-V3 algorithm and detects targets with feature maps from 5 convolution layers of different scales;
acquires a data set and divides the data set into a training set, a test set and a validation set,
and re-clusters the bounding box coordinates on the training set with a k-means clustering algorithm, and calculates 15 bounding box coordinates for the feature maps of the 5 different-scale convolution layers.
2. The improved YOLO-V3 based target detection system of claim 1, wherein the image preprocessing module comprises an image rotation unit and a scaling unit, the image rotation unit randomly flipping the image to be identified horizontally/vertically and cropping it, and the scaling unit performing a scale transformation on the image to be identified.
3. An improved YOLO-V3-based target detection method, comprising:
acquiring an image to be identified;
preprocessing an image to be identified to obtain a preprocessed image;
extracting image features with a trained darknet-39 backbone network model to obtain feature maps of 5 convolution layers of different scales;
optimally combining the feature maps of the 5 different-scale convolution layers to obtain a combined feature map;
carrying out weighted feature fusion on the combined feature map;
carrying out regression prediction on the fused feature maps with a YOLO-V3 algorithm to obtain a target detection result;
the method further comprising training the darknet-39 backbone network model, the specific training method comprising:
adding 2 convolution layers to the backbone network of the traditional YOLO-V3 algorithm, and detecting targets with feature maps from 5 convolution layers of different scales;
acquiring a data set and dividing the data set into a training set, a test set and a validation set,
and re-clustering the bounding box coordinates on the training set with a k-means clustering algorithm, and calculating 15 bounding box coordinates for the feature maps of the 5 different-scale convolution layers.
4. The improved YOLO-V3 based object detection method of claim 3, wherein said specific method of preprocessing the image to be identified comprises:
randomly flipping the image to be identified horizontally/vertically and cropping it;
and performing scale transformation on the image to be identified.
5. A smart terminal comprising a processor, an input device, an output device and a memory, the processor, the input device, the output device and the memory being interconnected, the memory being for storing a computer program, the computer program comprising program instructions, characterized in that the processor is configured to invoke the program instructions to perform the method of any of claims 3-4.
6. A computer readable storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any of claims 3-4.
CN202010333517.8A 2020-04-24 2020-04-24 Target detection system, method and terminal based on improved YOLO-V3 Active CN111553406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010333517.8A CN111553406B (en) 2020-04-24 2020-04-24 Target detection system, method and terminal based on improved YOLO-V3

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010333517.8A CN111553406B (en) 2020-04-24 2020-04-24 Target detection system, method and terminal based on improved YOLO-V3

Publications (2)

Publication Number Publication Date
CN111553406A CN111553406A (en) 2020-08-18
CN111553406B (en) 2023-04-28

Family

ID=72007656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010333517.8A Active CN111553406B (en) 2020-04-24 2020-04-24 Target detection system, method and terminal based on improved YOLO-V3

Country Status (1)

Country Link
CN (1) CN111553406B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183255A (en) * 2020-09-15 2021-01-05 西北工业大学 Underwater target visual identification and attitude estimation method based on deep learning
CN112132032A (en) * 2020-09-23 2020-12-25 平安国际智慧城市科技股份有限公司 Traffic sign detection method and device, electronic equipment and storage medium
CN112200201A (en) * 2020-10-13 2021-01-08 上海商汤智能科技有限公司 Target detection method and device, electronic equipment and storage medium
CN112380921A (en) * 2020-10-23 2021-02-19 西安科锐盛创新科技有限公司 Road detection method based on Internet of vehicles
CN112307976B (en) * 2020-10-30 2024-05-10 北京百度网讯科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN112633066A (en) * 2020-11-20 2021-04-09 苏州浪潮智能科技有限公司 Aerial small target detection method, device, equipment and storage medium
CN112507896B (en) * 2020-12-14 2023-11-07 大连大学 Method for detecting cherry fruits by adopting improved YOLO-V4 model
CN112801169B (en) * 2021-01-25 2024-02-06 中国人民解放军陆军工程大学 Camouflage target detection method, system, device and storage medium based on improved YOLO algorithm
CN112949692A (en) * 2021-02-03 2021-06-11 歌尔股份有限公司 Target detection method and device
CN112966565A (en) * 2021-02-05 2021-06-15 深圳市优必选科技股份有限公司 Object detection method and device, terminal equipment and storage medium
CN112668560B (en) * 2021-03-16 2021-07-30 中国矿业大学(北京) Pedestrian detection method and system for pedestrian flow dense area
CN113838021A (en) * 2021-09-18 2021-12-24 长春理工大学 Pulmonary nodule detection system based on improved YOLOv5 network
CN114170421B (en) * 2022-02-10 2022-06-17 卡奥斯工业智能研究院(青岛)有限公司 Image detection method, device, equipment and storage medium
CN117960839B (en) * 2024-03-29 2024-06-04 山西建投临汾建筑产业有限公司 Steel structural member welding deformation correcting device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 A kind of vehicle target detection method, system and equipment based on YOLOv2
WO2019232830A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 Method and device for detecting foreign object debris at airport, computer apparatus, and storage medium
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
CN110991311A (en) * 2019-11-28 2020-04-10 江南大学 Target detection method based on dense connection deep network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9881234B2 (en) * 2015-11-25 2018-01-30 Baidu Usa Llc. Systems and methods for end-to-end object detection

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019232830A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 Method and device for detecting foreign object debris at airport, computer apparatus, and storage medium
CN109614985A (en) * 2018-11-06 2019-04-12 华南理工大学 A kind of object detection method based on intensive connection features pyramid network
CN109685152A (en) * 2018-12-29 2019-04-26 北京化工大学 A kind of image object detection method based on DC-SPP-YOLO
CN110443208A (en) * 2019-08-08 2019-11-12 南京工业大学 A kind of vehicle target detection method, system and equipment based on YOLOv2
CN110766098A (en) * 2019-11-07 2020-02-07 中国石油大学(华东) Traffic scene small target detection method based on improved YOLOv3
CN110991311A (en) * 2019-11-28 2020-04-10 江南大学 Target detection method based on dense connection deep network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Dai Weicong; Jin Longxu; Li Guoning; Zheng Zhiqiang. Improved YOLOv3 real-time detection algorithm for aircraft in remote sensing images. Opto-Electronic Engineering, 2018, (12), full text. *
Zhu Peng; Chen Hu; Li Ke; Cheng Binyang. A lightweight multi-scale feature face detection method. Computer Technology and Development, (04), full text. *

Also Published As

Publication number Publication date
CN111553406A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111553406B (en) Target detection system, method and terminal based on improved YOLO-V3
CN110647817B (en) Real-time face detection method based on MobileNet V3
CN109671020B (en) Image processing method, device, electronic equipment and computer storage medium
CN112733749B (en) Real-time pedestrian detection method integrating attention mechanism
Zhang et al. A dense u-net with cross-layer intersection for detection and localization of image forgery
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
CN111860398B (en) Remote sensing image target detection method and system and terminal equipment
CN111723786A (en) Method and device for detecting wearing of safety helmet based on single model prediction
CN109714526B (en) Intelligent camera and control system
CN111274999B (en) Data processing method, image processing device and electronic equipment
CN111062854A (en) Method, device, terminal and storage medium for detecting watermark
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN112749576B (en) Image recognition method and device, computing equipment and computer storage medium
CN111967478B (en) Feature map reconstruction method, system, storage medium and terminal based on weight overturn
CN116309612B (en) Semiconductor silicon wafer detection method, device and medium based on frequency decoupling supervision
CN116343159B (en) Unstructured scene passable region detection method, device and storage medium
CN111985487A (en) Remote sensing image target extraction method, electronic equipment and storage medium
TWI803243B (en) Method for expanding images, computer device and storage medium
CN114092813B (en) Industrial park image extraction method and system, electronic equipment and storage medium
CN115953454A (en) Water level obtaining method, device and equipment based on image restoration and storage medium
CN114529828A (en) Method, device and equipment for extracting residential area elements of remote sensing image
CN116543246A (en) Training method of image denoising model, image denoising method, device and equipment
CN114155524A (en) Single-stage 3D point cloud target detection method and device, computer equipment and medium
CN112541535B (en) Three-dimensional point cloud classification method based on complementary multi-branch deep learning
CN116311086B (en) Plant monitoring method, training method, device and equipment for plant monitoring model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant