CN114927236A - Detection method and system for multiple target images

Detection method and system for multiple target images

Info

Publication number
CN114927236A
Authority
CN
China
Prior art keywords
target area
target
image
detection
original image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210655674.XA
Other languages
Chinese (zh)
Inventor
梁浩
费伦科
苏建澎
江巧娴
梁立斌
张诗乔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210655674.XA
Publication of CN114927236A
Legal status: Pending (Current)

Classifications

    • G16H 50/80 - ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics, e.g. flu
    • G06N 3/045 - Neural network architectures; combinations of networks
    • G06N 3/08 - Neural networks; learning methods
    • G06V 10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/762 - Recognition or understanding using pattern recognition or machine learning: clustering, e.g. of similar faces in social networks
    • G06V 10/774 - Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/806 - Fusion of extracted features
    • G06V 10/82 - Recognition or understanding using neural networks
    • G06V 20/69 - Microscopic objects, e.g. biological cells or cellular parts
    • G06V 20/695 - Microscopic objects: preprocessing, e.g. image segmentation
    • G06V 20/698 - Microscopic objects: matching; classification
    • G16H 50/70 - ICT specially adapted for mining of medical data, e.g. analysing previous cases of other patients
    • G06V 2201/07 - Indexing scheme: target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Public Health (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a detection method and a detection system for multiple target images, relating to the technical field of image detection. An original image data set is acquired, each original image in the data set comprising a large target area and a small target area, and a target detection model is constructed comprising a YOLO network model, an FPN network model, a PAN network model and a detection network model connected in sequence. The target detection model is trained with the preprocessed original image data set to obtain a trained target detection model. An image to be detected is then acquired and input into the trained target detection model, and the detection results for each target area of the image to be detected are output. The method improves the positioning accuracy for small targets and the effect of feature extraction on small targets.

Description

Detection method and system for multiple target images
Technical Field
The invention relates to the technical field of image detection, in particular to a detection method and a detection system for multiple target images.
Background
In recent years, demand for COVID-19 test reagents that detect the novel coronavirus has grown, yet classifying and counting the detection results of these reagents still requires manual work, including classifying and counting the reagent (a large target) and the reagent result (a small target) in the acquired reagent images.
Target detection refers to image segmentation based on the geometric and statistical characteristics of targets. It combines target extraction with target recognition, can process multiple targets in real time in complex scenes, and automatically extracts and identifies the required targets.
Conventional target detection methods are built on deep neural networks, with a convolutional network as the foundation and a classification network as the backbone. Because a small target is tiny relative to the size of the image to be detected, and the convolutional network downsamples the image several times, the small target occupies few pixels in the feature map output after feature extraction; the classification network therefore classifies small targets poorly, and small-target detection suffers. To address this, the prior art proposes a target detection method that, based on a YOLO network model, reduces the downsampling factor applied to the image by increasing the number of feature maps output by the feature extraction module of the YOLO network model, thereby strengthening small-target detection. However, detection of rectangular small targets, typified by reagent-result detection, generally uses images of higher resolution, and the YOLO network model cannot fully extract feature information from such high-resolution images. Moreover, these images contain many small targets of widely varying sizes, so the YOLO network model locates small targets with low accuracy and extracts their features poorly.
Disclosure of Invention
In order to solve the problems that existing target detection methods position multiple large and small targets in an image with low accuracy and, in particular, extract the features of small targets poorly, the invention provides a detection method and a detection system for multiple target images.
In order to achieve the technical effects, the technical scheme of the invention is as follows:
A detection method for multiple target images comprises the following steps:
S1, acquiring an original image data set, wherein each original image in the original image data set comprises a large target area and a small target area;
S2, preprocessing the original image data set to obtain a preprocessed image data set, and dividing the image data set into a training set, a verification set and a test set;
S3, constructing a target detection model, wherein the target detection model comprises a YOLO network model, an FPN network model, a PAN network model and a detection network model connected in sequence;
S4, training the target detection model by using the training set, evaluating the target detection model during training by using the verification set, and testing the effectiveness of the target detection model by using the test set to obtain the trained target detection model;
S5, acquiring an image to be detected, wherein the image to be detected comprises multiple target areas formed by a large target area and a small target area, inputting the image to be detected into the trained target detection model, and outputting the detection result of each target area of the image to be detected.
Preferably, each original image in the original image data set is captured with a high-definition camera of a mobile phone.
Preferably, the process of preprocessing the original image data set includes:
labeling each original image of the original image data set, annotating the ground-truth box of the large target area and the ground-truth box of the small target area in each original image, and obtaining the image annotation data corresponding to each original image.
Preferably, the process of preprocessing the original image data set further includes:
performing a flipping operation, a scaling operation and a data enhancement operation on each original image in the original image data set, and changing the numerical information of the corresponding image annotation data in the image annotation data set according to those operations, wherein the numerical information comprises the coordinate information of the ground-truth box of the large target area and of the ground-truth box of the small target area in the image (a code sketch of these operations follows);
and splicing a plurality of original images in the original image data set into one image.
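As an illustration of the flipping, scaling and splicing operations above, a minimal Python sketch follows. It assumes normalized [class, x_center, y_center, width, height] annotations and four equally sized square images; the function names and the 640-pixel tile size are illustrative assumptions, not details from the patent.

```python
import numpy as np

def hflip_with_labels(image: np.ndarray, labels: np.ndarray):
    """Horizontally flip an HxWxC image and mirror the box x-centers."""
    flipped = image[:, ::-1, :].copy()
    labels = labels.copy()
    labels[:, 1] = 1.0 - labels[:, 1]  # x_center -> 1 - x_center
    return flipped, labels

def mosaic_stitch(images, labels_list, size=640):
    """Stitch four size x size images into one 2x2 image and remap
    their normalized box annotations into the stitched frame."""
    assert len(images) == 4 and len(labels_list) == 4
    canvas = np.zeros((2 * size, 2 * size, 3), dtype=images[0].dtype)
    offsets = [(0, 0), (0, size), (size, 0), (size, size)]  # (row, col)
    merged = []
    for img, labels, (r, c) in zip(images, labels_list, offsets):
        canvas[r:r + size, c:c + size] = img
        remapped = labels.copy()
        remapped[:, 1] = (labels[:, 1] * size + c) / (2 * size)  # x_center
        remapped[:, 2] = (labels[:, 2] * size + r) / (2 * size)  # y_center
        remapped[:, 3:5] = labels[:, 3:5] / 2.0                  # w, h halve
        merged.append(remapped)
    return canvas, np.concatenate(merged, axis=0)
```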
Preferably, the YOLO network model includes a CSPDarknet53 network and an SPPF module connected in sequence, and the network parameters and weights of the CSPDarknet53 network are those obtained by pre-training on the general ImageNet image classification dataset.
Preferably, in step S4, a feature extraction operation is performed on the preprocessed original image data set through the CSPDarknet53 network to obtain a first feature map, and a pooling operation and a feature fusion operation are performed on the first feature map through the SPPF module to obtain a second feature map. The second feature map is input into the FPN network model for multi-scale feature learning to obtain a third feature map, and the third feature map is input into the PAN network model for feature-size localization learning to obtain a fourth feature map. The fourth feature map is input into the detection network model, which performs automatic labeling and classification prediction based on the fourth feature map to obtain the predicted bounding box of the large target area, the predicted bounding box of the small target area, and the classification probabilities corresponding to each. When the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area is less than or equal to a preset difference, and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area is less than or equal to the preset difference, training is finished and the trained target detection model is obtained.
Preferably, the detection network model automatically labels the large target prediction area and the small target prediction area using predefined anchor boxes; the predefined anchor boxes are adaptive, and their adaptive computation proceeds as follows (a code sketch follows these steps):
setting the width and height of the initial anchor boxes used to label the large target prediction area and the small target prediction area;
scaling each feature image in the fourth feature map by a preset ratio according to its width and height to obtain scaled feature images;
introducing a K-means clustering algorithm and setting its cluster centers according to the scaled feature images, each cluster center being a rectangular box;
determining the intersection area and the union area of the initial anchor box and each cluster center, and updating the clustering result of the K-means clustering algorithm according to the ratio of the intersection area to the union area;
and updating the width and height of the initial anchor boxes according to the clustering result to obtain the predicted bounding box of the large target area and the predicted bounding box of the small target area.
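A minimal sketch of this clustering follows, using the common k-means-with-IoU-distance recipe over (width, height) pairs. It is one plausible reading of the steps above under that assumption; the function names and defaults are illustrative, not taken from the patent.

```python
import numpy as np

def wh_iou(boxes: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """IoU between every (w, h) box and every (w, h) anchor, treating
    both as rectangles that share the same top-left corner."""
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0]) *
             np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (anchors[:, 0] * anchors[:, 1])[None, :] - inter
    return inter / union

def kmeans_anchors(boxes, k=9, iters=100, seed=0):
    """Cluster (w, h) pairs with distance 1 - IoU; the cluster centers
    become the adaptive anchor-box widths and heights."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    anchors = boxes[rng.choice(len(boxes), k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmax(wh_iou(boxes, anchors), axis=1)  # max IoU = min 1-IoU
        for j in range(k):
            members = boxes[assign == j]
            if len(members):
                anchors[j] = members.mean(axis=0)  # update width and height
    return anchors
```

The IoU of two width-height pairs is computed as if the boxes shared a corner, so the ratio of intersection area to union area drives the cluster assignment, matching the update rule described above.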
Preferably, the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area, and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area, are both evaluated by the value of a Loss function: the value of Loss represents the difference, and a preset value of Loss represents the preset difference. The specifics are as follows:
setting parts in a large target area and a small target area in an original image as a foreground, setting parts outside the large target area and the small target area as a background, equally dividing the original image into a plurality of grids, and introducing a loss function formula as follows:
Loss = λ1·L_cls + λ2·L_obj + λ3·L_loc    (1)
wherein λ1, λ2 and λ3 are hyperparameters, L_cls is the error produced by classifying the original image, L_obj is the error produced by judging whether a target is a foreground target, and L_loc is the error produced by localizing the bounding boxes of the large target area and the small target area;
L_cls is given by formula (2), which is rendered as an image in the original publication. Its terms are: B, the number of ground-truth boxes of the large target area and the small target area; 1^obj_ij, an indicator of whether the j-th predicted bounding box in the i-th grid cell is a foreground target, taking the value 1 if so and 0 otherwise; p_i(c), the classification probability; p'_i(c) = 1 - p_i(c); and log(), the logarithmic function;
L_obj is given by formula (3), likewise rendered as an image in the original publication. Its terms are: 1^noobj_ij, an indicator of whether the j-th predicted bounding box in the i-th grid cell is a background target, taking the value 1 if so and 0 otherwise; c_i, the true confidence, which is 1 for a foreground target and 0 for a background target; and c'_i, the predicted confidence, which is 1 for a foreground target and 0 for a background target;
L_loc is given by formulas (4) to (7):
L_loc = L_CIoU = 1 - CIoU    (4)
CIoU = IoU - (ρ²(b, b^gt)/c² + αv)    (5)
α = v / ((1 - IoU) + v)    (6)
v = (4/π²)·(arctan(w^gt/h^gt) - arctan(w/h))²    (7)
wherein IoU is the ratio of the intersection area to the union area of the ground-truth box and the predicted bounding box, ρ²(b, b^gt) is the squared distance between the center points of the predicted bounding box and the ground-truth box, c² is the square of the diagonal length of the minimum closure region that can contain both the predicted bounding box and the ground-truth box, w^gt/h^gt is the aspect ratio of the ground-truth box, w/h is the aspect ratio of the predicted bounding box, and arctan() is the arctangent function.
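For concreteness, a small Python sketch of formulas (4) to (7) follows. Boxes are assumed to be given as [x1, y1, x2, y2] corner coordinates; the function name and the epsilon guard are illustrative additions.

```python
import math

def ciou_loss(pred, gt, eps=1e-9):
    """L_loc = 1 - CIoU for one predicted box and one ground-truth box."""
    # IoU: intersection area over union area
    iw = max(0.0, min(pred[2], gt[2]) - max(pred[0], gt[0]))
    ih = max(0.0, min(pred[3], gt[3]) - max(pred[1], gt[1]))
    inter = iw * ih
    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    iou = inter / (area_p + area_g - inter + eps)

    # rho^2: squared distance between the two box centers
    rho2 = (((pred[0] + pred[2]) - (gt[0] + gt[2])) ** 2 +
            ((pred[1] + pred[3]) - (gt[1] + gt[3])) ** 2) / 4.0

    # c^2: squared diagonal of the smallest region enclosing both boxes
    cw = max(pred[2], gt[2]) - min(pred[0], gt[0])
    ch = max(pred[3], gt[3]) - min(pred[1], gt[1])
    c2 = cw ** 2 + ch ** 2 + eps

    # v (eq. 7): aspect-ratio consistency; alpha (eq. 6): trade-off weight
    v = (4 / math.pi ** 2) * (math.atan((gt[2] - gt[0]) / (gt[3] - gt[1] + eps)) -
                              math.atan((pred[2] - pred[0]) / (pred[3] - pred[1] + eps))) ** 2
    alpha = v / ((1.0 - iou) + v + eps)

    ciou = iou - (rho2 / c2 + alpha * v)   # eq. (5)
    return 1.0 - ciou                      # eq. (4)
```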
The invention also provides a detection system for multiple target images, which comprises:
an acquisition unit, configured to acquire an original image data set, each original image in the original image data set including a large target area and a small target area;
a preprocessing unit, configured to preprocess the original image data set to obtain a preprocessed image data set and divide the image data set into a training set, a verification set and a test set;
a construction unit, configured to construct a target detection model, the target detection model comprising a YOLO network model, an FPN network model, a PAN network model and a detection network model connected in sequence;
a training unit, configured to train the target detection model with the training set, evaluate the target detection model during training with the verification set, and test the effectiveness of the target detection model with the test set to obtain the trained target detection model;
a detection unit, configured to acquire an image to be detected, the image to be detected comprising multiple target areas formed by a large target area and a small target area, input the image to be detected into the trained target detection model, and output the detection result of each target area of the image to be detected.
Preferably, the YOLO network model includes a CSPDarknet53 network and an SPPF module connected in sequence, and the network parameters and weights of the CSPDarknet53 network are those obtained by pre-training on the general ImageNet image classification dataset. The training unit is specifically configured to: perform a feature extraction operation on the preprocessed original image data set through the CSPDarknet53 network to obtain a first feature map; perform a pooling operation and a feature fusion operation on the first feature map through the SPPF module to obtain a second feature map; input the second feature map into the FPN network model for multi-scale feature learning to obtain a third feature map; input the third feature map into the PAN network model for feature-size localization learning to obtain a fourth feature map; input the fourth feature map into the detection network model, which performs automatic labeling and classification prediction based on the fourth feature map to obtain the predicted bounding box of the large target area, the predicted bounding box of the small target area, and the classification probabilities corresponding to each; and, when the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area is less than or equal to a preset difference and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area is less than or equal to the preset difference, finish training to obtain the trained target detection model.
The detection system for the multiple target images is used for executing the detection method for the multiple target images.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a detection method and a detection system for multiple target images, wherein when a target detection model is constructed, an FPN network model and a PAN network model are added on the basis of a YOLO network model, the FPN network model learns feature information with different sizes through a top-down structure, and the PAN network model strengthens the positioning effect on a small target through a bottom-up structure, so that the positioning precision of the small target and the feature extraction effect on the small target can be improved.
Drawings
FIG. 1 is a schematic flow chart of a detection method for multiple target images according to the present invention;
FIG. 2 is a schematic diagram of a CSP module in the CSPDarknet53 network according to the present invention;
FIG. 3 is a schematic diagram of an SPPF module according to the present invention;
FIG. 4 is a diagram illustrating an FPN network model and a PAN network model according to the present invention;
FIG. 5 is a diagram illustrating the parameters of the loss function proposed by the present invention;
FIG. 6 is a schematic diagram of a multi-object image-oriented detection system according to the present invention;
FIG. 7 is a schematic diagram showing an example of the preprocessing process in COVID-19 test reagent target detection according to the present invention;
fig. 8 is a schematic diagram showing an example of the COVID-19 test reagent target detection process according to the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
for better illustration of the present embodiment, some parts of the drawings may be omitted, enlarged or reduced, and do not represent actual sizes;
it will be understood by those skilled in the art that certain descriptions of well-known structures in the drawings may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
The positional relationships depicted in the drawings are for illustrative purposes only and are not to be construed as limiting the present patent;
example 1
Considering that existing target detection methods position multiple large and small targets with low accuracy and, in particular, extract the features of small targets poorly, this embodiment provides a detection method for multiple target images: by constructing an improved target detection model, it strengthens the localization of small targets and improves both the positioning accuracy for small targets and the effect of feature extraction on them. Taking current COVID-19 test reagent target detection as an example, the method is described with reference to the flow diagram of fig. 1 and comprises the following steps:
s1, acquiring an original image data set, wherein each original image in the original image data set comprises a large target area and a small target area;
In this step, each original image in the original image data set is obtained by photographing a COVID-19 test reagent and its corresponding reagent result with a high-definition mobile phone camera. Each original image includes a large target area and a small target area: the large target area is the region of the original image where the reagent is located, and the small target area is the region where the reagent result is located. The large and small targets come in a variety of sizes and together form the multiple targets; 'large' and 'small' are relative to each other.
S2, preprocessing an original image data set to obtain a preprocessed image data set, and dividing the image data set into a training set, a verification set and a test set;
In this step, the specific preprocessing process is as follows:
Each original image of the original image data set is labeled manually: the ground-truth box of the large target area and the ground-truth box of the small target area are annotated in each original image to obtain the image annotation data corresponding to each original image, and the original image data set together with the corresponding image annotation data is divided into a training set, a verification set and a test set.
Optionally, the training set, the verification set and the test set are divided in the ratio 6:2:2 (a brief code sketch of such a split follows).
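A brief sketch of such a 6:2:2 split, with hypothetical names:

```python
import random

def split_dataset(samples, seed=0):
    """Shuffle and split into 60% train, 20% verification, 20% test."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n_train = int(0.6 * len(samples))
    n_val = int(0.2 * len(samples))
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])
```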
A flipping operation, a scaling operation and a data enhancement operation are carried out on each original image in the original image data set, and the numerical information of each corresponding image annotation data in the image annotation data set is changed according to those operations; the numerical information comprises the coordinate information of the ground-truth box of the large target area and of the ground-truth box of the small target area in the image. In addition, a plurality of original images in the original image data set are spliced into one image.
S3, constructing a target detection model, wherein the target detection model comprises a YOLO network model, an FPN network model, a PAN network model and a detection network model connected in sequence;
Referring to fig. 2, 3 and 4, in this step the YOLO network model comprises a CSPDarknet53 network and an SPPF module connected in sequence, and the network parameters and weights of the CSPDarknet53 network are those obtained by pre-training on the general ImageNet image classification dataset. The CSPDarknet53 network comprises CSP modules and the Darknet53 model; the design concept of the CSP module is shown in fig. 2, that of the SPPF module in fig. 3, and that of the FPN network model and the PAN network model in fig. 4 (a minimal sketch of the SPPF module follows).
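The sketch below is a minimal PyTorch rendering of that SPPF idea: three chained 5x5 max-pools whose outputs are concatenated with the input and fused by a 1x1 convolution. The channel sizes and the omission of normalization and activation layers are simplifying assumptions, not details from the patent.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Spatial pyramid pooling (fast): chained max-pools plus 1x1 fusion."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 5):
        super().__init__()
        hidden = in_ch // 2
        self.cv1 = nn.Conv2d(in_ch, hidden, kernel_size=1)
        self.cv2 = nn.Conv2d(hidden * 4, out_ch, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        y1 = self.pool(x)    # receptive field grows with each pooling
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))  # feature fusion

# e.g.: second_feature_map = SPPF(1024, 1024)(first_feature_map)
```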
S4, training the target detection model by using the training set, evaluating the target detection model in the training process by using the verification set, and testing the effectiveness of the target detection model by using the test set to obtain the trained target detection model;
In this step, a feature extraction operation is performed on the preprocessed original image data set through the CSPDarknet53 network to obtain a first feature map, and a pooling operation and a feature fusion operation are performed on the first feature map through the SPPF module to obtain a second feature map. The second feature map is input into the FPN network model for multi-scale feature learning to obtain a third feature map, and the third feature map is input into the PAN network model for feature-size localization learning to obtain a fourth feature map. The fourth feature map is input into the detection network model, which performs automatic labeling and classification prediction based on the fourth feature map to obtain the predicted bounding box of the large target area, the predicted bounding box of the small target area, and the classification probabilities corresponding to each. When the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area is less than or equal to a preset difference, and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area is less than or equal to the preset difference, training finishes and the trained target detection model is obtained (a simplified training-loop sketch follows).
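The simplified training-loop sketch below illustrates the stopping criterion just described, with the mean epoch loss standing in for the 'difference' and a preset loss value for the preset difference. The model, data loader and loss function are assumed to exist elsewhere; all names and hyperparameters are illustrative.

```python
import torch

def train(model, loader, loss_fn, preset_loss=0.05, max_epochs=300, lr=1e-3):
    """Optimize the composite loss of equation (1); stop once the mean
    epoch loss reaches the preset value (the 'preset difference')."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(max_epochs):
        running = 0.0
        for images, targets in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), targets)  # lambda-weighted sum
            loss.backward()
            opt.step()
            running += loss.item()
        if running / max(len(loader), 1) <= preset_loss:
            break
    return model
```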
The detection network model automatically labels the large target prediction area and the small target prediction area using predefined anchor boxes; the predefined anchor boxes are adaptive, and their adaptive computation is as follows:
setting the width and height of the initial anchor boxes used to label the large target prediction area and the small target prediction area;
scaling each feature image in the fourth feature map by a preset ratio according to its width and height to obtain scaled feature images;
introducing a K-means clustering algorithm and setting its cluster centers according to the scaled feature images, each cluster center being a rectangular box;
determining the intersection area and the union area of the initial anchor box and each cluster center, and updating the clustering result of the K-means clustering algorithm according to the ratio of the intersection area to the union area;
and updating the width and height of the initial anchor boxes according to the clustering result to obtain the predicted bounding box of the large target area and the predicted bounding box of the small target area.
S5, obtaining an image to be detected, wherein the image to be detected comprises multiple target areas formed by a large target area and a small target area, inputting the image to be detected into the trained target detection model, and outputting the detection result of each target area of the image to be detected.
In this embodiment, overall, when the target detection model is constructed, the FPN network model and the PAN network model are added on top of the YOLO network model; the FPN network model learns feature information at different sizes through its top-down structure, and the PAN network model strengthens the localization of small targets through its bottom-up structure, so the positioning accuracy for small targets and the effect of feature extraction on small targets are improved.
Example 2
In this embodiment, building on embodiment 1, the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area, and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area, mentioned in embodiment 1, are evaluated.
Both differences are evaluated through the value of a Loss function: the value of Loss represents the difference, and a preset value of Loss represents the preset difference, specifically as follows:
setting parts in a large target area and a small target area in an original image as a foreground, setting parts outside the large target area and the small target area as a background, equally dividing the original image into a plurality of grids, and introducing a loss function formula as follows:
Loss = λ1·L_cls + λ2·L_obj + λ3·L_loc    (1)
wherein λ1, λ2 and λ3 are hyperparameters, L_cls is the error produced by classifying the original image, L_obj is the error produced by judging whether a target is a foreground target, and L_loc is the error produced by localizing the bounding boxes of the large target area and the small target area;
L_cls is given by formula (2), which is rendered as an image in the original publication. Its terms are: B, the number of ground-truth boxes of the large target area and the small target area; 1^obj_ij, an indicator of whether the j-th predicted bounding box in the i-th grid cell is a foreground target, taking the value 1 if so and 0 otherwise; p_i(c), the classification probability; p'_i(c) = 1 - p_i(c); and log(), the logarithmic function;
L_obj is given by formula (3), likewise rendered as an image in the original publication. Its terms are: 1^noobj_ij, an indicator of whether the j-th predicted bounding box in the i-th grid cell is a background target, taking the value 1 if so and 0 otherwise; c_i, the true confidence, which is 1 for a foreground target and 0 for a background target; and c'_i, the predicted confidence, which is 1 for a foreground target and 0 for a background target;
Referring to fig. 5, L_loc is given by formulas (4) to (7):
L_loc = L_CIoU = 1 - CIoU    (4)
CIoU = IoU - (ρ²(b, b^gt)/c² + αv)    (5)
α = v / ((1 - IoU) + v)    (6)
v = (4/π²)·(arctan(w^gt/h^gt) - arctan(w/h))²    (7)
wherein IoU is the ratio of the intersection area to the union area of the ground-truth box and the predicted bounding box; ρ²(b, b^gt) is the squared distance between the center points of the predicted bounding box and the ground-truth box, i.e. the square of the value d shown in fig. 5; c² is, as shown in fig. 5, the square of the diagonal length of the minimum closure region that can contain both the predicted bounding box and the ground-truth box; w^gt/h^gt is the aspect ratio of the ground-truth box; w/h is the aspect ratio of the predicted bounding box; and arctan() is the arctangent function.
In this embodiment, the difference between the predicted bounding box of the large target area and its ground-truth box and the difference between the predicted bounding box of the small target area and its ground-truth box are evaluated with the CIoU loss function, which improves the performance of the network.
Example 3
Referring to fig. 6, the present embodiment describes a detection system for multiple target images in the present invention, and the detection system for multiple target images in the present embodiment includes:
an acquisition unit 601 configured to acquire an original image data set, each original image in the original image data set including a large target region and a small target region;
a preprocessing unit 602, configured to preprocess an original image data set to obtain a preprocessed image data set, and divide the image data set into a training set, a verification set, and a test set;
a construction unit 603, configured to construct a target detection model, where the target detection model includes a YOLO network model, an FPN network model, a PAN network model and a detection network model connected in sequence;
the training unit 604 is configured to train a target detection model using a training set, evaluate the target detection model in the training process using a validation set, and test the effectiveness of the target detection model using a test set to obtain a trained target detection model;
the detection unit 605 is configured to obtain an image to be detected, where the image to be detected includes multiple target regions formed by a large target region and a small target region, input the image to be detected into a trained target detection model, and output a detection result of each target region of the image to be detected.
The YOLO network model includes a CSPDarknet53 network and an SPPF module connected in sequence, where the network parameters and weights of the CSPDarknet53 network are those obtained by pre-training on the general ImageNet image classification dataset. The training unit 604 is specifically configured to: perform a feature extraction operation on the preprocessed original image data set through the CSPDarknet53 network to obtain a first feature map; perform a pooling operation and a feature fusion operation on the first feature map through the SPPF module to obtain a second feature map; input the second feature map into the FPN network model for multi-scale feature learning to obtain a third feature map; input the third feature map into the PAN network model for feature-size localization learning to obtain a fourth feature map; input the fourth feature map into the detection network model, which performs automatic labeling and classification prediction based on the fourth feature map to obtain the predicted bounding box of the large target area, the predicted bounding box of the small target area, and the classification probabilities corresponding to each; and, when the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area is less than or equal to a preset difference and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area is less than or equal to the preset difference, finish training to obtain the trained target detection model.
Example 4
In this embodiment, current COVID-19 test reagent target detection is taken as an example, and the detection process is described with reference to the schematic diagrams of fig. 7 and fig. 8. Referring to fig. 7, a high-definition mobile phone camera first photographs a COVID-19 test reagent and its corresponding reagent result to obtain an original image data set. The original image data set is then preprocessed: a flipping operation, a scaling operation and a data enhancement operation are applied to each original image, and the large target area and the small target area of each original image are labeled, the large target area being the region of the image where the reagent is located and the small target area being the region where the reagent result is located (the large and small targets come in various sizes, together form the multiple targets, and are 'large' and 'small' only relative to each other). This yields the preprocessed image data set, which is divided into a training set, a verification set and a test set. The target detection model is trained with the training set, evaluated during training with the verification set, and tested for effectiveness with the test set to obtain the trained target detection model. Referring to fig. 8, the image to be detected is input into the trained target detection model, and inference with the trained model yields the final detection result, namely the predicted bounding box of the large target area and the predicted bounding box of the small target area in the image to be detected.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to enumerate all embodiments exhaustively here. Any modification, equivalent replacement or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A detection method for multiple target images, characterized by comprising the following steps:
S1, acquiring an original image data set, wherein each original image in the original image data set comprises a large target area and a small target area;
S2, preprocessing the original image data set to obtain a preprocessed image data set, and dividing the image data set into a training set, a verification set and a test set;
S3, constructing a target detection model, wherein the target detection model comprises a YOLO network model, an FPN network model, a PAN network model and a detection network model connected in sequence;
S4, training the target detection model by using the training set, evaluating the target detection model during training by using the verification set, and testing the effectiveness of the target detection model by using the test set to obtain the trained target detection model;
S5, acquiring an image to be detected, wherein the image to be detected comprises multiple target areas formed by a large target area and a small target area, inputting the image to be detected into the trained target detection model, and outputting the detection result of each target area of the image to be detected.
2. The method for detecting multiple target images according to claim 1, wherein each original image in the original image data set is captured by a high-definition camera of a mobile phone.
3. The method for detecting multiple target images according to claim 2, wherein preprocessing the original image data set comprises:
labeling each original image of the original image data set, annotating the ground-truth box of the large target area and the ground-truth box of the small target area in each original image, and obtaining the image annotation data corresponding to each original image.
4. The method for detecting multiple target images according to claim 3, wherein preprocessing the original image data set further comprises:
performing a flipping operation, a scaling operation and a data enhancement operation on each original image in the original image data set, and changing the numerical information of the corresponding image annotation data in the image annotation data set according to those operations, wherein the numerical information comprises the coordinate information of the ground-truth box of the large target area and of the ground-truth box of the small target area in the image;
and splicing a plurality of original images in the original image data set into one image.
5. The method for detecting multiple target images according to claim 4, wherein the YOLO network model comprises a CSPDarknet53 network and an SPPF module connected in sequence, and the network parameters and weights of the CSPDarknet53 network are those obtained by pre-training on the general ImageNet image classification data set.
6. The method for detecting multiple target images according to claim 5, wherein, in step S4:
a feature extraction operation is performed on the preprocessed original image data set through the CSPDarknet53 network to obtain a first feature map; a pooling operation and a feature fusion operation are performed on the first feature map through the SPPF module to obtain a second feature map; the second feature map is input into the FPN network model for multi-scale feature learning to obtain a third feature map; the third feature map is input into the PAN network model for feature-size localization learning to obtain a fourth feature map; the fourth feature map is input into the detection network model, which performs automatic labeling and classification prediction based on the fourth feature map to obtain the predicted bounding box of the large target area, the predicted bounding box of the small target area, and the classification probabilities corresponding to each; and when the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area is less than or equal to a preset difference, and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area is less than or equal to the preset difference, training is finished to obtain the trained target detection model.
7. The method for detecting multiple target images according to claim 6, wherein the detection network model automatically labels the large target prediction area and the small target prediction area using predefined anchor boxes, the predefined anchor boxes are adaptive, and the adaptive computation of the adaptive anchor boxes is as follows:
setting the width and height of the initial anchor boxes used to label the large target prediction area and the small target prediction area;
scaling each feature image in the fourth feature map by a preset ratio according to its width and height to obtain scaled feature images;
introducing a K-means clustering algorithm and setting its cluster centers according to the scaled feature images, each cluster center being a rectangular box;
determining the intersection area and the union area of the initial anchor box and each cluster center, and updating the clustering result of the K-means clustering algorithm according to the ratio of the intersection area to the union area;
and updating the width and height of the initial anchor boxes according to the clustering result to obtain the predicted bounding box of the large target area and the predicted bounding box of the small target area.
8. The method as claimed in claim 7, wherein the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area are both evaluated by the value of a Loss function: the value of Loss represents the difference, and a preset value of Loss represents the preset difference, specifically as follows:
setting the parts in the large target area and the small target area in the original image as the foreground, setting the parts outside the large target area and the small target area as the background, equally dividing the original image into a plurality of grids, and introducing a loss function formula as follows:
Loss = λ1·L_cls + λ2·L_obj + λ3·L_loc    (1)
wherein λ1, λ2 and λ3 are hyperparameters, L_cls is the error produced by classifying the original image, L_obj is the error produced by judging whether a target is a foreground target, and L_loc is the error produced by localizing the bounding boxes of the large target area and the small target area;
L_cls is given by formula (2), which is rendered as an image in the original publication. Its terms are: B, the number of ground-truth boxes of the large target area and the small target area; 1^obj_ij, an indicator of whether the j-th predicted bounding box in the i-th grid cell is a foreground target, taking the value 1 if so and 0 otherwise; p_i(c), the classification probability; p'_i(c) = 1 - p_i(c); and log(), the logarithmic function;
L_obj is given by formula (3), likewise rendered as an image in the original publication. Its terms are: 1^noobj_ij, an indicator of whether the j-th predicted bounding box in the i-th grid cell is a background target, taking the value 1 if so and 0 otherwise; c_i, the true confidence, which is 1 for a foreground target and 0 for a background target; and c'_i, the predicted confidence, which is 1 for a foreground target and 0 for a background target;
L_loc is given by formulas (4) to (7):
L_loc = L_CIoU = 1 - CIoU    (4)
CIoU = IoU - (ρ²(b, b^gt)/c² + αv)    (5)
α = v / ((1 - IoU) + v)    (6)
v = (4/π²)·(arctan(w^gt/h^gt) - arctan(w/h))²    (7)
wherein IoU is the ratio of the intersection area to the union area of the ground-truth box and the predicted bounding box, ρ²(b, b^gt) is the squared distance between the center points of the predicted bounding box and the ground-truth box, c² is the square of the diagonal length of the minimum closure region that can contain both the predicted bounding box and the ground-truth box, w^gt/h^gt is the aspect ratio of the ground-truth box, w/h is the aspect ratio of the predicted bounding box, and arctan() is the arctangent function.
9. A detection system for multiple target images, characterized by comprising:
an acquisition unit, configured to acquire an original image data set, each original image in the original image data set including a large target area and a small target area;
a preprocessing unit, configured to preprocess the original image data set to obtain a preprocessed image data set and divide the image data set into a training set, a verification set and a test set;
a construction unit, configured to construct a target detection model, the target detection model comprising a YOLO network model, an FPN network model, a PAN network model and a detection network model connected in sequence;
a training unit, configured to train the target detection model with the training set, evaluate the target detection model during training with the verification set, and test the effectiveness of the target detection model with the test set to obtain the trained target detection model;
a detection unit, configured to acquire an image to be detected, the image to be detected comprising multiple target areas formed by a large target area and a small target area, input the image to be detected into the trained target detection model, and output the detection result of each target area of the image to be detected.
10. The detection system for multiple target images according to claim 9, wherein the YOLO network model comprises a CSPDarknet53 network and an SPPF module connected in sequence, and the network parameters and weights of the CSPDarknet53 network are those obtained by pre-training on the general ImageNet image classification data set; the training unit is specifically configured to: perform feature extraction on the preprocessed original image data set through the CSPDarknet53 network to obtain a first feature map; perform a pooling operation and a feature fusion operation on the first feature map through the SPPF module to obtain a second feature map; input the second feature map into the FPN network model for multi-scale feature learning to obtain a third feature map; input the third feature map into the PAN network model for feature-size localization learning to obtain a fourth feature map; input the fourth feature map into the detection network model, which performs automatic labeling and classification prediction based on the fourth feature map to obtain the predicted bounding box of the large target area, the predicted bounding box of the small target area, and the classification probabilities corresponding to each; and, when the difference between the predicted bounding box of the large target area and the ground-truth box of the large target area is less than or equal to a preset difference and the difference between the predicted bounding box of the small target area and the ground-truth box of the small target area is less than or equal to the preset difference, finish training to obtain the trained target detection model.
CN202210655674.XA 2022-06-10 2022-06-10 Detection method and system for multiple target images Pending CN114927236A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210655674.XA CN114927236A (en) 2022-06-10 2022-06-10 Detection method and system for multiple target images

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210655674.XA CN114927236A (en) 2022-06-10 2022-06-10 Detection method and system for multiple target images

Publications (1)

Publication Number Publication Date
CN114927236A true CN114927236A (en) 2022-08-19

Family

ID=82814623

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210655674.XA Pending CN114927236A (en) 2022-06-10 2022-06-10 Detection method and system for multiple target images

Country Status (1)

Country Link
CN (1) CN114927236A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116994116A (en) * 2023-08-04 2023-11-03 北京泰策科技有限公司 Target detection method and system based on self-attention model and yolov5
CN116994116B (en) * 2023-08-04 2024-04-16 北京泰策科技有限公司 Target detection method and system based on self-attention model and yolov5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination