CN111241947A - Training method and device of target detection model, storage medium and computer equipment


Info

Publication number
CN111241947A
Authority
CN
China
Prior art keywords
sample image
target
detection frame
prediction
position information
Prior art date
Legal status
Granted
Application number
CN201911422532.3A
Other languages
Chinese (zh)
Other versions
CN111241947B (en)
Inventor
岑俊毅
李立赛
傅东生
Current Assignee
Miracle Intelligent Network Co., Ltd.
Original Assignee
Miracle Intelligent Network Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Miracle Intelligent Network Co., Ltd.
Priority to CN201911422532.3A
Publication of CN111241947A
Application granted
Publication of CN111241947B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a training method and device for a target detection model, a computer-readable storage medium and a computer device. The method comprises: acquiring a feature map of a sample image during training, and determining initial detection frames in the feature map according to a preset rotation angle, a preset scale and a preset target aspect ratio; adjusting the position of each initial detection frame to obtain the position information of a prediction detection frame, and adjusting the network parameters of the regression network according to the position information and the real position information in the annotation information of the sample image; predicting the probability that the target in each target detection area determined according to the position information of the prediction detection frames corresponds to each preset category; and, after the network parameters of the classification network are adjusted according to the real category information in the annotation information of the sample image and the prediction probability, obtaining a target detection model for performing target detection on an image. The scheme provided by the application enables the target detection model to identify the rotation angle of a target in an image, so that the located target detection frame is more accurate.

Description

Training method and device of target detection model, storage medium and computer equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for training a target detection model, a computer-readable storage medium, and a computer device.
Background
Target detection, also called target extraction, is an image segmentation technique in the field of computer vision: it not only segments a target from an image, that is, locates the target's position, but also identifies the target's category.
When the position regression network of a target detection model is trained, a sliding window method is usually adopted to traverse regions in an image, and the regions are then screened to obtain candidate rectangular regions for target detection. However, the inventors have realized that these candidate rectangular regions are usually horizontal rectangles. Such candidate regions can accurately and effectively locate targets that are placed horizontally and regularly in the image, but when a rotated target or an irregularly shaped target is present, the target detection frame determined from these candidate regions is not accurate enough. For example, when a long, thin object (such as a pencil) appears in the image at an angle to the horizontal direction, a horizontal rectangular frame used for labeling may enclose a background area far larger than the area of the target itself, so that target localization is not accurate enough and the target recognition rate is low.
Disclosure of Invention
Therefore, it is necessary to provide a training method and apparatus for a target detection model, a computer-readable storage medium and a computer device, to solve the technical problem that existing target detection models locate rotated or irregularly shaped targets in an image inaccurately and with a low recognition rate.
A method of training an object detection model, comprising:
acquiring a sample image and annotation information, wherein the annotation information comprises real position information and real category information of a target in the sample image, and the real position information comprises a rotation angle of a rectangular surrounding frame corresponding to the target;
obtaining a feature map of the sample image through a feature extraction network of an initial model;
determining, through a region generation network of the initial model, an initial detection frame in the feature map according to a preset rotation angle, a preset scale and a preset target aspect ratio;
adjusting the position of each initial detection frame through a regression network of an initial model to obtain the position information of a prediction detection frame, and adjusting the network parameters of the regression network according to the real position information in the labeling information and the position information of the prediction detection frame;
predicting, through a classification network of the initial model, the prediction probability that the target in the target detection area determined according to the position information of each prediction detection frame corresponds to each preset category;
and after network parameters of the classification network are adjusted according to the real category information and the prediction probability in the labeling information, a target detection model for carrying out target detection on the image is obtained.
An apparatus for training an object detection model, the apparatus comprising:
a sample image acquisition module, configured to acquire a sample image and annotation information, wherein the annotation information comprises real position information and real category information of a target in the sample image, and the real position information comprises a rotation angle of a rectangular surrounding frame corresponding to the target;
a feature map acquisition module, configured to obtain a feature map of the sample image through a feature extraction network of an initial model;
an initial detection frame generation module, configured to determine, through a region generation network of the initial model, an initial detection frame in the feature map according to a preset rotation angle, a preset scale and a preset target aspect ratio;
a position regression module, configured to adjust the position of each initial detection frame through a regression network of the initial model to obtain the position information of a prediction detection frame, and adjust the network parameters of the regression network according to the real position information in the annotation information and the position information of the prediction detection frame;
and a classification module, configured to predict, through a classification network of the initial model, the prediction probability that the target in the target detection area determined according to the position information of each prediction detection frame corresponds to each preset category, and, after the network parameters of the classification network are adjusted according to the real category information in the annotation information and the prediction probability, obtain a target detection model for performing target detection on an image.
In one embodiment, the acquiring the sample image comprises: obtaining an original sample image; judging whether the aspect ratio of the original sample image is 1; if so, scaling the original sample image to a preset size in an equal proportion to obtain a sample image; if not, the original sample image is subjected to equal-scale scaling and then image pixels are supplemented, and a sample image with a preset size is obtained.
In one embodiment, the acquiring the sample image comprises: obtaining an original sample image; performing rotation processing on the original sample image according to a preset angle to obtain a sample image, and obtaining real labeling information of the sample image according to the rotation angle of a rectangular surrounding frame in the original sample image and the preset angle; or, performing vertical mirror image processing on the original sample image to obtain a sample image, and obtaining real annotation information of the sample image according to the rotation angle of the rectangular surrounding frame in the original sample image; or carrying out horizontal mirror image processing on the original sample image to obtain a sample image, and obtaining the real annotation information of the sample image according to the rotation angle of the rectangular surrounding frame in the original sample image.
In one embodiment, the step of determining the target aspect ratio comprises: obtaining sample images and the width and height information of the rectangular surrounding frame corresponding to the target in each sample image; counting the aspect ratio of each rectangular surrounding frame according to the width and height information; and clustering the counted aspect ratios to obtain the target aspect ratio from the clustering result.
In one embodiment, the adjusting the position of each of the initial detection frames to obtain the position information of the prediction detection frame includes: calculating the position offset of each initial detection frame according to the current network parameters of the regression network; obtaining the position information of a prediction detection frame according to the initial detection frame and the position offset; the position information includes coordinates of a geometric center point of the prediction detection frame, a width and a height of the prediction detection frame, and a rotation angle of the prediction detection frame.
In one embodiment, the method further comprises: determining the prediction detection frame according to the position information of the prediction detection frame; determining a rectangular surrounding frame corresponding to a target in the sample image according to the real position information; calculating the intersection ratio between the prediction detection frame and the rectangular surrounding frame; calculating a rotation angle difference between the prediction detection frame and the rectangular enclosure frame; when the intersection ratio is larger than a first threshold value and the rotation angle difference is smaller than a second threshold value, marking the sample image as a positive sample image; when the intersection ratio is smaller than a third threshold value or the rotation angle difference is larger than a second threshold value, the sample image is marked as a negative sample image.
In one embodiment, predicting the prediction probability that the target in the target detection area determined according to the position information of each prediction detection frame corresponds to each preset category includes: determining the target detection areas on the feature map according to the position information of each prediction detection frame; adjusting the target detection areas to the same preset scale and then acquiring the feature vector corresponding to each target detection area; and determining, according to the feature vectors, the prediction probability that each target detection area corresponds to each preset category.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, causes the processor to carry out the steps of the method of training an object detection model as described above.
A computer device comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to carry out the steps of the above-mentioned method of training an object detection model.
According to the method and the device for training the target detection model, on one hand, when the target detection model is trained, the labeling information of the sample image comprises the real position information and the real category information, and the real position information comprises the rotation angle, so that the trained target detection model can have the capability of identifying the rotation angle of the target in the image, and the positioned target detection frame is more accurate. On the other hand, in the process of training the target detection model, the rotation angle, the scale and the target aspect ratio which are used for generating the initial detection frame in the area generation network are initialized, the generation mode of the initial detection frame is enriched, the trained target detection model is more stable, and the generated initial detection frame is closer to the real target detection frame due to the fact that the initial detection frame is determined according to the preset rotation angle. Therefore, after the position of the initial detection frame is adjusted through the regression network, the prediction detection frame is obtained, the target detection area on the feature map is obtained according to the prediction detection frame, the network parameters of the regression network are adjusted according to the real position information in the labeling information and the position information of the prediction detection frame, and after the class probability of the target detection area is predicted through the classification network, the network parameters of the classification network can be adjusted according to the real class information and the prediction probability in the labeling information, so that the target detection model which can carry out target detection on the rotating target in the image and can position the target more accurately is obtained.
Drawings
FIG. 1 is a diagram of an exemplary environment in which a method for training a target detection model may be implemented;
FIG. 2 is a schematic flow chart diagram illustrating a method for training a target detection model according to one embodiment;
FIG. 3 is a schematic diagram of labeling a sample image in one embodiment;
FIG. 4 is a schematic flow chart illustrating labeling of a sample image according to an embodiment;
FIG. 5 is a diagram illustrating an embodiment of enhancing an original sample image to obtain a sample image;
FIG. 6 is a diagram illustrating an initial detection box determined from a feature map in one embodiment;
FIG. 7 is a schematic flow chart diagram illustrating a method for training a target detection model in an exemplary embodiment;
FIG. 8 is a block diagram showing the structure of a training apparatus for an object detection model according to an embodiment;
FIG. 9 is a block diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
FIG. 1 is a diagram of an exemplary implementation environment of a method for training a target detection model. Referring to FIG. 1, the training method of the target detection model is applied to a training system of the target detection model. The training system may include the terminal 110 and the server 120. The terminal 110 and the server 120 may be connected via a network. The terminal 110 may specifically be a desktop terminal or a mobile terminal, and the mobile terminal may specifically be at least one of a mobile phone, a tablet computer, a notebook computer, and the like. The server 120 may be implemented as a stand-alone server or as a server cluster composed of a plurality of servers.
Specifically, the terminal 110 may take a sample image and communicate the sample image to the server 120. After obtaining the sample image, the server 120 trains the initial model by using the sample image to obtain a target detection model for performing target detection on the image.
In one embodiment, the server 120 may obtain the sample image and the annotation information, where the annotation information includes the real position information and real category information of the target in the sample image, and the real position information includes the rotation angle of the rectangular bounding box corresponding to the target; obtain the feature map of the sample image through the feature extraction network of the initial model; determine, through the region generation network of the initial model, initial detection frames in the feature map according to the preset rotation angle, preset scale and preset target aspect ratio; adjust the position of each initial detection frame through the regression network of the initial model to obtain the position information of the prediction detection frames, and adjust the network parameters of the regression network according to the real position information in the annotation information and the position information of the prediction detection frames; predict, through the classification network of the initial model, the prediction probability that the target in each target detection area determined according to the position information of the prediction detection frames corresponds to each preset category; and, after the network parameters of the classification network are adjusted according to the real category information in the annotation information and the prediction probability, obtain a target detection model for performing target detection on images.
In one embodiment, as shown in FIG. 2, a method of training an object detection model is provided. The method is described as applied to a computer device (such as the terminal 110 or the server 120 in fig. 1) as an example. The method may include the following steps S202 to S212.
S202, a sample image and annotation information are obtained, wherein the annotation information comprises real position information and real category information of a target in the sample image, and the real position information comprises a rotation angle of a rectangular surrounding frame corresponding to the target.
The sample image is an image used for training the initial model, and the model obtained through training on sample images has the capability of performing target detection on images. Target detection requires not only segmenting the target from the image, i.e., locating the target position, but also identifying the target class. The class information of the target in the sample image may be one or more of a plurality of preset classification classes, and the preset classification classes may be set in advance according to actual application requirements, such as human faces, vehicles, animals, and the like. The position information of the target in the sample image may be represented by the position information of a rectangular enclosure frame surrounding the target, such as the x-coordinate and y-coordinate of the geometric center point of the rectangular enclosure frame, the width w of the rectangular enclosure frame, and the height h of the rectangular enclosure frame; the geometric center point of the rectangular enclosure frame does not change after the frame is rotated around the geometric center point. In addition, in the embodiments provided in the present application, the position information further includes a rotation angle θ of the rectangular bounding box corresponding to the target, that is, the annotation information of the sample image can be represented by a set of data including x, y, w, h, and θ. The rotation angle θ may be the offset angle of the rectangular bounding box relative to its horizontal placement, for example, the included angle between a long side of the rectangular bounding box and the positive direction of the x-axis of the sample image, the included angle between a long side and the positive direction of the y-axis, the included angle between a short side and the positive direction of the x-axis, or the included angle between a short side and the positive direction of the y-axis. The value of the rotation angle may be any value between 0 degrees and 360 degrees. It can be understood that, since the annotation information of the sample image includes the rotation angle, the model obtained by training on the annotated sample images also has the capability of identifying the rotation angle of the target in an image, so that the target can be located more accurately according to the rotation angle.
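For concreteness, the annotation described above can be pictured as a small data structure. The following Python sketch is illustrative only (the names RotatedBox and Annotation and the example values are not from the patent); it simply records the (x, y, w, h, θ) position information together with the category information:

```python
from dataclasses import dataclass

@dataclass
class RotatedBox:
    x: float      # x-coordinate of the geometric center point
    y: float      # y-coordinate of the geometric center point
    w: float      # width of the rectangular bounding box
    h: float      # height of the rectangular bounding box
    theta: float  # rotation angle, any value in [0, 360) degrees

@dataclass
class Annotation:
    box: RotatedBox   # real position information
    category: str     # real category information

# A hypothetical pencil lying at 45 degrees to the x-axis of the image.
sample_label = Annotation(RotatedBox(x=240.0, y=180.0, w=300.0, h=24.0, theta=45.0),
                          category="pencil")
```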
Fig. 3 is a schematic diagram illustrating labeling of a sample image in an embodiment. Referring to FIG. 3, the target in the sample image is a pencil. The left side of FIG. 3 illustrates labeling of the target in the conventional technology: the rectangular bounding box is a horizontal rectangle that includes a large amount of background information, even more background information than target information, which may reduce the recognition rate and make target localization inaccurate. The right side of FIG. 3 illustrates labeling of the target according to an embodiment of the present application: the rectangular bounding box is a rectangle with a rotation angle, which describes the position of the target in the image more accurately.
The initial model may be a machine learning model that may learn from the sample images to provide the ability to identify the images. In embodiments provided herein, a computer device may learn, from a sample image, the ability to perform object detection on the image. In one embodiment, the computer device may set a model structure of the machine learning model in advance to obtain an initial model, and train the initial model through the sample image to obtain model parameters of the machine learning model. When the image needs to be subjected to target detection, the computer equipment can obtain model parameters obtained by training in advance, and then the model parameters are imported into the initial model to obtain a target detection model with the capability of performing target detection on the image.
In one embodiment, before labeling the sample image, the training sample may be expanded, and obtaining the sample image includes: obtaining an original sample image; judging whether the aspect ratio of the original sample image is 1; if so, scaling the original sample image to a preset size in an equal proportion to obtain a sample image; if not, the original sample image is subjected to equal-scale scaling and then image pixels are supplemented, and a sample image with a preset size is obtained.
Because the input images of the feature extraction network need to have the same size, and the input and output of the whole network are fixed, the sample images need to be preprocessed first. Specifically, it is determined whether the width and the height of the original sample image are the same; if so, the original sample image is scaled to a preset size, for example S × S. If the width and the height of the original sample image are different, then when the width is larger than the height, the width of the original sample image is first scaled to the preset size S, the height is then scaled to S' according to the aspect ratio of the original sample image, and pixels are supplemented to the upper area or the lower area of the image to make its height S, so that a sample image of size S × S is obtained; when the height is larger than the width, the height of the original sample image is first scaled to the preset size S, the width is then scaled to S' according to the aspect ratio, and pixels are supplemented to the left area or the right area of the image to make its width S, so that a sample image of size S × S is obtained. The proportional scaling ensures that the target in the sample image is not deformed; when an image, after scaling, is not long enough or wide enough to satisfy the requirement of a uniform sample image size, its length or width is supplemented with pixels.
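A minimal sketch of this preprocessing follows, assuming the Pillow library and padding toward the lower or right area (the function name and the default S are illustrative, not from the patent):

```python
from PIL import Image

def preprocess(img: Image.Image, S: int = 512) -> Image.Image:
    """Proportionally scale an original sample image and, if needed, pad it
    with pixels so that the result is exactly S x S."""
    w, h = img.size
    if w == h:
        return img.resize((S, S))          # aspect ratio 1: scale directly
    canvas = Image.new("RGB", (S, S))      # supplemented pixels default to black
    if w > h:
        s_prime = round(h * S / w)         # height after proportional scaling
        canvas.paste(img.resize((S, s_prime)), (0, 0))   # pad the lower area
    else:
        s_prime = round(w * S / h)         # width after proportional scaling
        canvas.paste(img.resize((s_prime, S)), (0, 0))   # pad the right area
    return canvas
```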
Fig. 4 is a schematic flow chart illustrating labeling of a sample image in an embodiment. Referring to fig. 4, the method includes the following steps:
S402, obtaining an original sample image;
S404, judging whether the aspect ratio of the original sample image is 1; if yes, go to step S406; if not, go to step S408;
S406, proportionally scaling the original sample image to the preset size S × S;
S408, judging whether the width of the original sample image is larger than the height; if yes, go to step S410a; if not, go to step S410b;
S410a, scaling the width of the original sample image to the preset size S, and then scaling the height of the sample image to S' according to the aspect ratio of the original sample image;
S412a, supplementing pixels to the upper area or the lower area of the original sample image to make its height S, obtaining a sample image of S × S;
S410b, scaling the height of the original sample image to the preset size S, and then scaling the width of the original sample image to S' according to the aspect ratio of the original sample image;
S412b, supplementing pixels to the left area or the right area of the original sample image to make its width S, obtaining a sample image of S × S;
S414, labeling the adjusted sample image.
In one embodiment, the method further includes the step of obtaining the target aspect ratio by counting the aspect ratio of the rectangular bounding box labeled in the sample image: obtaining sample images and width and height information of rectangular surrounding frames corresponding to targets in the sample images; counting the width-to-height ratio of each rectangular surrounding frame according to the width-to-height information; and clustering the counted aspect ratio to obtain the target aspect ratio in the clustering result.
Specifically, after the sample images are labeled, the computer device may obtain the width and height information of the rectangular bounding boxes labeled in the sample images, count the aspect ratios, and cluster the counted aspect ratios using a clustering algorithm. The number of clusters may be set as needed; for example, the K-means algorithm may be used to cluster the ratios into 3 classes, yielding 3 aspect ratio values, so that the obtained target aspect ratios are w1:h1, w2:h2 and w3:h3, for example 1:2, 1:3 and 1:4.
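A minimal 1-D K-means sketch of this statistic, assuming NumPy (the function name and its defaults are illustrative; the patent does not prescribe a particular implementation):

```python
import numpy as np

def cluster_aspect_ratios(wh_pairs, k=3, iters=100, seed=0):
    """Cluster the labeled boxes' aspect ratios (w / h) into k classes and
    return the cluster centers as the preset target aspect ratios."""
    ratios = np.array([w / h for w, h in wh_pairs], dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = rng.choice(ratios, size=k, replace=False)
    for _ in range(iters):
        # assign every ratio to its nearest center, then recompute the centers
        assign = np.argmin(np.abs(ratios[:, None] - centers[None, :]), axis=1)
        new = np.array([ratios[assign == j].mean() if np.any(assign == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return sorted(centers)

# The returned centers play the role of w1:h1, w2:h2 and w3:h3 above.
```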
It should be noted that the target aspect ratios obtained by the computer device are used to determine the initial detection frames from the feature map of the sample image in step S206; the more target aspect ratios there are, the more initial detection frames are determined.
In one embodiment, the method further comprises the step of enhancing the sample image, that is, the step of obtaining the sample image comprises: obtaining an original sample image; carrying out rotation processing on the original sample image according to a preset angle to obtain a sample image, and obtaining real labeling information of the sample image according to the rotation angle of a rectangular surrounding frame in the original sample image and the preset angle; or, carrying out vertical mirror image processing on the original sample image to obtain a sample image, and obtaining real annotation information of the sample image according to the rotation angle of the rectangular surrounding frame in the original sample image; or carrying out horizontal mirror image processing on the original sample image to obtain a sample image, and obtaining the real annotation information of the sample image according to the rotation angle of the rectangular surrounding frame in the original sample image.
Specifically, the computer device may rotate the original sample image by a preset angle, where the preset angle may be, for example, 30° or 60°. The computer device may also perform vertical mirroring or horizontal mirroring on the original sample image, and may further perform vertical or horizontal mirroring on a sample image that has already been rotated by the preset angle, to obtain a new sample image. The rotation angle in the annotation information of the processed sample image needs to be modified correspondingly, so that the obtained new sample image can be added to the training sample library for training the initial model.
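The angle bookkeeping can be sketched as follows, under the assumption that θ is measured counterclockwise from the positive x-axis. The patent allows several angle conventions, so the mirror formulas below are one plausible choice rather than the definitive ones, and the corresponding update of the box center coordinates is omitted:

```python
def rotate_label(theta: float, preset_angle: float) -> float:
    # Rotating the whole image by preset_angle rotates each box by the same amount.
    return (theta + preset_angle) % 360.0

def vertical_mirror_label(theta: float) -> float:
    # A top-to-bottom flip reflects the angle about the x-axis.
    return (-theta) % 360.0

def horizontal_mirror_label(theta: float) -> float:
    # A left-to-right flip reflects the angle about the y-axis.
    return (180.0 - theta) % 360.0
```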
Fig. 5 is a schematic diagram illustrating an original sample image being subjected to enhancement processing to obtain sample images in one embodiment. Referring to FIG. 5, enhanced sample images may be obtained by rotating the original sample image, mirroring it, and further mirroring the rotated picture.
In the embodiment, the sample images are subjected to image enhancement processing, so that the richness of the sample images can be improved, and the sample images are adopted to train the initial model, so that a more accurate and more stable target detection model can be obtained.
And S204, obtaining a characteristic diagram of the sample image through a characteristic extraction network of the initial model.
The feature map may be used to reflect the characteristics of the sample image. According to the characteristics of the sample image, the target in the sample image can be located, and the class to which the target belongs can be determined. The initial model comprises a feature extraction network, a region generation network, a regression network and a classification network. In the process of training the initial model, the computer device may input the sample image into the feature extraction network of the initial model and extract the image features of the sample image through the feature extraction network to obtain the feature map. The network parameters of the feature extraction network can be determined by training in advance and are kept unchanged during training. The feature extraction network may be, for example, a convolutional neural network. In addition, the initial model can be built based on the network architecture of Faster R-CNN.
And S206, determining, through the region generation network of the initial model, initial detection frames in the feature map according to the preset rotation angles, preset scales and preset target aspect ratios.
The region generation network extracts initial detection frames with rotation angles from the feature map. Specifically, for each position point belonging to the foreground on the feature map, corresponding initial detection frames are generated according to the preset rotation angles, preset scales and preset target aspect ratios. It can be understood that if there are m preset rotation angles, n preset scales and k target aspect ratios, then m × n × k initial detection frames can be generated for each position point by combination.
For example, the preset rotation angles include 8 rotation angles, which are {0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°}. The preset scales include 3 scales, which are 128 × 128, 256 × 256 and 512 × 512, respectively. The target aspect ratios are determined by counting the aspect ratios of the rectangular bounding boxes labeled in the sample images; because the target aspect ratios are obtained by analyzing the real aspect ratios of the targets in the sample images, the aspect ratio of an initial detection frame determined according to a target aspect ratio fits the real aspect ratio of the target better, which can accelerate the convergence of the detection network and improve its accuracy. For example, there may be 3 target aspect ratios, {w1:h1, w2:h2, w3:h3}, so that 8 × 3 × 3 = 72 different initial detection frames may be generated for each location point.
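A sketch of this combinatorial generation follows. It assumes, as one common convention, that each preset scale fixes the anchor's area while the target aspect ratio fixes its shape; the names and that convention are illustrative, not from the patent:

```python
from itertools import product

PRESET_ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]   # m = 8
PRESET_SCALES = [128, 256, 512]                        # n = 3
TARGET_RATIOS = [(1, 2), (1, 3), (1, 4)]               # k = 3, from clustering

def initial_boxes_at(x0: float, y0: float):
    """Generate the m * n * k = 72 initial detection frames for one
    foreground position point, as (x, y, w, h, theta) tuples."""
    boxes = []
    for theta, s, (rw, rh) in product(PRESET_ANGLES, PRESET_SCALES, TARGET_RATIOS):
        w = s * (rw / rh) ** 0.5   # keep area s*s while imposing ratio rw:rh
        h = s * (rh / rw) ** 0.5
        boxes.append((x0, y0, w, h, float(theta)))
    return boxes

assert len(initial_boxes_at(0.0, 0.0)) == 8 * 3 * 3    # 72 frames per point
```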
FIG. 6 is a diagram illustrating initial detection boxes determined from a feature map in one embodiment. Referring to FIG. 6, the size of the feature map is S × S. For a certain position point (X0, Y0) on the feature map, the region generation network extracts 72 initial detection boxes, of which FIG. 6 shows only 6; the rotation angle, scale and aspect ratio corresponding to these 6 initial detection boxes are:
0°, 128 × 128, 1;
45°, 128 × 128, 1;
90°, 128 × 128, 1;
45°, 256 × 128, 2;
90°, 256 × 256, 1;
45°, 256 × 512, 1/2.
In one embodiment, for an input sample image, the pixel points belonging to the foreground are obtained through a classification function in the region generation network, the position points on the feature map corresponding to those foreground pixel points are thereby determined, and initial detection frames are generated for each determined position point.
S208, adjusting the position of each initial detection frame through the regression network of the initial model to obtain the position information of the prediction detection frame, and adjusting the network parameters of the regression network according to the real position information in the labeling information and the position information of the prediction detection frame.
The regression network is used for adjusting the position of the generated initial detection frame according to the current network parameters to obtain the position information of the adjusted prediction detection frame. The position information of the prediction detection frame also includes the coordinates of the geometric center point of the prediction detection frame, the width and height of the prediction detection frame, and the rotation angle. The initial detection frame generally cannot accurately position the target in the sample image, the position information of the initial detection frame is adjusted through the current network parameters in the regression network, and the obtained prediction detection frame is closer to the target detection frame.
In one embodiment, adjusting the position of each initial detection frame to obtain the position information of the prediction detection frame includes: calculating the position offset of each initial detection frame according to the current network parameters of the regression network; obtaining the position information of a prediction detection frame according to the initial detection frame and the position offset; the position information includes coordinates of a geometric center point of the prediction detection frame, a width and a height of the prediction detection frame, and a rotation angle of the prediction detection frame.
Following the above example, the region generation network generates 72 initial detection frames for each foreground position point, and the position information of each prediction detection frame obtained after the regression network adjusts the position of an initial detection frame consists of five values: the coordinates x and y of the geometric center point, the width w and height h of the prediction detection frame, and the rotation angle θ. The regression network therefore has 72 × 5 = 360 output values for each point on the feature map.
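The patent does not spell out how the five offsets are applied; the sketch below follows the common Faster R-CNN box parametrization, extended with an additive angle term, and should be read as an assumption rather than the patented formula:

```python
import math

def apply_offsets(anchor, deltas):
    """Decode a prediction detection frame from an initial detection frame
    plus the regression network's offsets (dx, dy, dw, dh, dtheta)."""
    xa, ya, wa, ha, ta = anchor
    dx, dy, dw, dh, dt = deltas
    x = xa + dx * wa              # shift the center, scaled by the anchor size
    y = ya + dy * ha
    w = wa * math.exp(dw)         # rescale the width and height
    h = ha * math.exp(dh)
    theta = (ta + dt) % 360.0     # adjust the rotation angle
    return (x, y, w, h, theta)
```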
In an embodiment, the training method of the target detection model further includes: determining a prediction detection frame according to the position information of the prediction detection frame; determining a rectangular surrounding frame corresponding to the target in the sample image according to the real position information; calculating the intersection and parallel ratio between the prediction detection frame and the rectangular surrounding frame; calculating the rotation angle difference between the prediction detection frame and the rectangular surrounding frame; when the intersection ratio is larger than a first threshold value and the rotation angle difference is smaller than a second threshold value, marking the sample image as a positive sample image; and when the intersection ratio is smaller than a third threshold value or the rotation angle difference is larger than a second threshold value, marking the sample image as a negative sample image.
The intersection ratio (intersection over union) refers to the ratio of the overlapping area of the prediction detection frame and the real rectangular bounding box to their merged area. The overlapping area may be represented by the number of position points included in the overlapping region of the prediction detection frame and the real rectangular bounding box, and similarly, the merged area may be represented by the number of position points included in the merged region of the two. As mentioned above, the regression network outputs the position information of the prediction detection frame, which includes its rotation angle, so the difference between the rotation angle of the prediction detection frame and that of the real rectangular bounding box can be determined. The intersection ratio and the rotation angle difference reflect the accuracy of the prediction detection frame to a certain extent: the larger the intersection ratio, the higher the overlap between the two frames; the smaller the rotation angle difference, the closer the orientation of the prediction detection frame is to that of the real bounding box. If the intersection ratio between the prediction detection frame and the real rectangular bounding box is greater than the first threshold and their rotation angle difference is smaller than the second threshold, the prediction detection frame is close to the real rectangular bounding box, and the sample image can be marked as a positive sample image. When the intersection ratio is less than the third threshold or the rotation angle difference is greater than the second threshold, the sample image is marked as a negative sample image. The first threshold may be, for example, 0.7, the second threshold 22.5°, and the third threshold 0.3. If the intersection ratio or rotation angle difference between the prediction detection frame and the real rectangular bounding box satisfies any other condition, the sample image belongs to neither the positive nor the negative sample images and is not used for training.
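The intersection ratio of two rotated rectangles can be computed from their corner polygons. The sketch below uses the shapely geometry library for the polygon intersection and the thresholds quoted above (0.7, 22.5°, 0.3); the corner construction assumes θ is measured counterclockwise about the box center:

```python
import math
from shapely.geometry import Polygon

def corners(box):
    """Corner points of a rotated rectangle given as (x, y, w, h, theta)."""
    x, y, w, h, t = box
    c, s = math.cos(math.radians(t)), math.sin(math.radians(t))
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    return [(x + c * px - s * py, y + s * px + c * py) for px, py in local]

def label_sample(pred, gt, iou_hi=0.7, angle_max=22.5, iou_lo=0.3):
    """Mark a sample 'positive', 'negative' or None (unused for training)."""
    p, g = Polygon(corners(pred)), Polygon(corners(gt))
    iou = p.intersection(g).area / p.union(g).area
    d = abs(pred[4] - gt[4]) % 360.0
    angle_diff = min(d, 360.0 - d)        # smallest angular distance
    if iou > iou_hi and angle_diff < angle_max:
        return "positive"
    if iou < iou_lo or angle_diff > angle_max:
        return "negative"
    return None
```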
In this embodiment, since the number of foreground position points is large, the number of initial detection frames determined according to each foreground position point is large, the number of prediction detection frames obtained by regression is large, and in order to reduce the data amount in the training process, the sample images may be screened according to the above method, and the model is trained only by using the screened sample images.
In one embodiment, after obtaining the prediction detection boxes, the computer device may further filter all of them according to their degree of mutual overlap in order to reduce the calculation amount of the training process. The computer device may also cull prediction detection boxes that exceed the image boundaries.
Further, after obtaining the position information of the prediction detection frame, the computer device also obtains the position information of the target in the sample image, and the computer device may adjust the network parameter of the regression network according to a difference between the actual position information of the target in the annotation information and the position information of the prediction detection frame.
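The patent only states that the regression parameters are adjusted according to this difference; a common concrete choice, shown here as an assumption, is a smooth-L1 loss over the five box parameters (PyTorch):

```python
import torch
import torch.nn.functional as F

def regression_loss(pred_boxes: torch.Tensor, true_boxes: torch.Tensor) -> torch.Tensor:
    """Smooth-L1 loss between predicted and real (x, y, w, h, theta) rows;
    minimizing it adjusts the regression network's parameters."""
    return F.smooth_l1_loss(pred_boxes, true_boxes)

# e.g. regression_loss(torch.randn(8, 5, requires_grad=True), torch.randn(8, 5))
```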
S210, predicting, through the classification network of the initial model, the prediction probability that the target in each target detection area determined according to the position information of the prediction detection frames corresponds to each preset category.
Specifically, the input of the classification network includes a feature map and the determined position information of the prediction detection frame, and the computer device may determine a target detection region from the feature map according to the position information of the prediction detection frame, and predict the category of the sample image based on the target detection region.
In one embodiment, predicting the prediction probability that the target in the target detection area determined according to the position information of each prediction detection frame corresponds to each preset category includes: determining the target detection areas on the feature map according to the position information of each prediction detection frame; adjusting the target detection areas to the same preset scale and then obtaining the feature vector corresponding to each target detection area; and determining, according to the feature vectors, the prediction probability that each target detection area corresponds to each preset category.
Specifically, the computer device may cut target detection regions of different sizes from the feature map according to the position information of each prediction detection frame, adjust each target detection region to the same preset scale through ROI Pooling (Region of Interest Pooling), obtain the feature vector corresponding to each target detection region, and determine, through the fully connected layer and the normalization layer, the probability vector of each target detection region belonging to each preset category, thereby obtaining the prediction probability corresponding to each preset category.
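A simplified PyTorch sketch of this classification stage is shown below. It assumes the rotated target detection regions have already been cropped from the feature map as axis-aligned tensors (a production model would use a rotated ROI pooling operator), and all layer sizes are illustrative:

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Pool each target detection region to the same preset scale, flatten it
    into a feature vector, and output per-category probabilities."""
    def __init__(self, channels: int = 256, pooled: int = 7, num_classes: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveMaxPool2d(pooled)          # same preset scale
        self.fc = nn.Linear(channels * pooled * pooled, num_classes)

    def forward(self, regions):                           # list of (C, h_i, w_i)
        vecs = torch.stack([self.pool(r).flatten() for r in regions])
        return torch.softmax(self.fc(vecs), dim=1)        # prediction probabilities
```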
S212, after network parameters of the classification network are adjusted according to the real category information and the prediction probability in the labeling information, a target detection model for carrying out target detection on the image is obtained.
Finally, after the class probabilities of the preset categories corresponding to the prediction detection frames in the sample image are determined, a loss function of the classification network is constructed from the class probabilities and the real class information of the target in the sample image, and the network parameters of the classification network are adjusted in the direction that minimizes the loss function. The computer device may repeat the above steps S202 to S212 on the current model over all sample images until a target detection model capable of performing target detection on images is obtained.
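One training update of the classification network might look like the following sketch; a negative log-likelihood loss over the predicted probabilities is assumed, since the patent does not name the loss function:

```python
import torch
import torch.nn.functional as F

def classification_step(head, optimizer, regions, true_labels):
    """Build the loss from the prediction probabilities and the real category
    labels, then adjust the network parameters in the direction that
    decreases (minimizes) the loss."""
    probs = head(regions)                            # (N, num_classes)
    loss = F.nll_loss(torch.log(probs + 1e-9), true_labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```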
According to the training method of the target detection model, on one hand, when the target detection model is trained, the labeling information of the sample image comprises real position information and real category information, and the real position information comprises the rotation angle, so that the trained target detection model can have the capability of identifying the rotation angle of the target in the image, and the positioned target detection frame is more accurate. On the other hand, in the process of training the target detection model, the rotation angle, the scale and the target aspect ratio which are used for generating the initial detection frame in the area generation network are initialized, the generation mode of the initial detection frame is enriched, the trained target detection model is more stable, and the generated initial detection frame is closer to the real target detection frame due to the fact that the initial detection frame is determined according to the preset rotation angle. Therefore, after the position of the initial detection frame is adjusted through the regression network, the prediction detection frame is obtained, the target detection area on the feature map is obtained according to the prediction detection frame, the network parameters of the regression network are adjusted according to the real position information in the labeling information and the position information of the prediction detection frame, and after the class probability of the target detection area is predicted through the classification network, the network parameters of the classification network can be adjusted according to the real class information and the prediction probability in the labeling information, so that the target detection model which can carry out target detection on the rotating target in the image and can position the target more accurately is obtained.
In a specific embodiment, as shown in fig. 7, the method for training the target detection model includes the following steps:
s702, acquiring an original sample image.
S704, the original sample images and the width and height information of the rectangular surrounding frame corresponding to the target in each original sample image are obtained.
And S706, counting the aspect ratio of each rectangular surrounding frame according to the width and height information.
S708, clustering the counted aspect ratio to obtain the target aspect ratio in the clustering result.
S710, judging whether the aspect ratio of the original sample image is 1; if so, scaling the original sample image to a preset size in an equal proportion to obtain a sample image; if not, the original sample image is subjected to equal-scale scaling and then image pixels are supplemented, and a sample image with a preset size is obtained.
And S712, performing rotation processing on the sample image according to a preset angle to obtain a newly added sample image.
And S714, performing vertical mirror image processing on the sample image to obtain a newly added sample image.
And S716, performing horizontal mirror image processing on the sample image to obtain a newly added sample image.
And S718, obtaining the real labeling information of the newly added sample image according to the rotation angle of the rectangular surrounding frame in the sample image.
And S720, obtaining a characteristic diagram of the sample image through the characteristic extraction network of the initial model.
And S722, determining, through the region generation network of the initial model, initial detection frames in the feature map according to the preset rotation angles, preset scales and preset target aspect ratios.
S724, calculating the position offset of each initial detection frame according to the current network parameters of the regression network through the regression network of the initial model; obtaining the position information of a prediction detection frame according to the initial detection frame and the position offset; the position information includes coordinates of a geometric center point of the prediction detection frame, a width and a height of the prediction detection frame, and a rotation angle of the prediction detection frame.
And S726, adjusting the network parameters of the regression network according to the real position information in the labeling information and the position information of the prediction detection frame.
S728, the prediction detection frame is determined based on the position information of the prediction detection frame.
And S730, determining a rectangular surrounding frame corresponding to the target in the sample image according to the real position information.
S732, calculating the intersection ratio and the rotation angle difference between the prediction detection frame and the rectangular bounding frame.
S734, when the intersection ratio is greater than the first threshold and the rotation angle difference is less than the second threshold, the sample image is marked as a positive sample image.
And S736, when the intersection ratio is smaller than a third threshold value or the rotation angle difference is larger than a second threshold value, marking the sample image as a negative sample image.
S738, determining a target detection area on the feature map according to the position information of each prediction detection frame through the classification network of the initial model.
And S740, adjusting the target detection areas to the same preset scale, and then obtaining the feature vectors corresponding to the target detection areas.
And S742, determining the prediction probability of each preset category corresponding to the target detection area according to the feature vector.
S744, after network parameters of the classification network are adjusted according to the real category information and the prediction probability in the labeling information, a target detection model for carrying out target detection on the image is obtained.
FIG. 7 is a flowchart illustrating a method for training a target detection model according to an embodiment. It should be understood that although the steps in the flowchart of FIG. 7 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the order of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least a portion of the steps in FIG. 7 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and the order of their performance is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 8, a training apparatus 800 for an object detection model is provided, the apparatus including a sample image acquisition module 802, a feature map acquisition module 804, an initial detection frame generation module 806, a position regression module 808, and a classification module 810, wherein:
the sample image obtaining module 802 is configured to obtain a sample image and annotation information, where the annotation information includes real position information and real category information of a target in the sample image.
And the feature map obtaining module 804 is configured to obtain a feature map of the sample image through a feature extraction network of the initial model.
And an initial detection frame generation module 806, configured to determine, through the region generation network of the initial model, initial detection frames in the feature map according to a preset rotation angle, a preset scale and a preset target aspect ratio.
And the position regression module 808 is configured to adjust the position of each initial detection frame through the regression network of the initial model to obtain the position information of the predicted detection frame, and adjust the network parameters of the regression network according to the actual position information in the labeling information and the position information of the predicted detection frame.
The classification module 810 is configured to predict, through the classification network of the initial model, the prediction probability that the target in each target detection region determined according to the position information of the prediction detection frames corresponds to each preset category, and, after the network parameters of the classification network are adjusted according to the real category information in the annotation information and the prediction probability, obtain a target detection model for performing target detection on images.
In one embodiment, the sample image acquisition module 802 is further configured to acquire an original sample image; judging whether the aspect ratio of the original sample image is 1; if so, scaling the original sample image to a preset size in an equal proportion to obtain a sample image; if not, the original sample image is subjected to equal-scale scaling and then image pixels are supplemented, and a sample image with a preset size is obtained.
In one embodiment, the sample image acquisition module 802 is further configured to acquire an original sample image; carrying out rotation processing on the original sample image according to a preset angle to obtain a sample image, and obtaining real labeling information of the sample image according to the rotation angle of a rectangular surrounding frame in the original sample image and the preset angle; or, carrying out vertical mirror image processing on the original sample image to obtain a sample image, and obtaining real annotation information of the sample image according to the rotation angle of the rectangular surrounding frame in the original sample image; or carrying out horizontal mirror image processing on the original sample image to obtain a sample image, and obtaining the real annotation information of the sample image according to the rotation angle of the rectangular surrounding frame in the original sample image.
In one embodiment, the apparatus further includes a statistical module configured to obtain the sample images and the width and height information of the rectangular bounding box corresponding to the target in each sample image; compute the aspect ratio of each rectangular bounding box from the width and height information; and cluster the computed aspect ratios to obtain the target aspect ratios from the clustering result.
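For example, the clustering can be a one-dimensional k-means over the collected aspect ratios. The sample values and the choice of three clusters below are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans

# Aspect ratios (w / h) computed from the annotated rectangular bounding
# boxes across the sample set; these values are made up for illustration.
ratios = np.array([[0.31], [0.35], [0.48], [0.52], [0.97], [1.02], [2.9], [3.1]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(ratios)
target_aspect_ratios = sorted(float(c) for c in kmeans.cluster_centers_.ravel())
print(target_aspect_ratios)  # roughly [0.4, 1.0, 3.0]; used as the preset target aspect ratios
```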
In one embodiment, the apparatus further includes a screening module configured to determine the prediction detection frame according to the position information of the prediction detection frame; determine the rectangular bounding box corresponding to the target in the sample image according to the real position information; calculate the intersection-over-union (IoU) between the prediction detection frame and the rectangular bounding box; calculate the rotation angle difference between the prediction detection frame and the rectangular bounding box; mark the sample image as a positive sample image when the IoU is greater than a first threshold and the rotation angle difference is smaller than a second threshold; and mark the sample image as a negative sample image when the IoU is smaller than a third threshold or the rotation angle difference is larger than the second threshold.
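A sketch of this screening logic, computing the rotated IoU via polygon intersection with the shapely library. The thresholds (first 0.7, second 15 degrees, third 0.3) and the treatment of the in-between case as ignored are illustrative assumptions:

```python
import numpy as np
from shapely.geometry import Polygon

def rect_polygon(cx, cy, w, h, theta_deg):
    """Corner polygon of a rotated rectangle (cx, cy, w, h, theta)."""
    rad = np.deg2rad(theta_deg)
    rot = np.array([[np.cos(rad), -np.sin(rad)],
                    [np.sin(rad),  np.cos(rad)]])
    corners = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                        [w / 2,  h / 2], [-w / 2,  h / 2]])
    return Polygon(corners @ rot.T + np.array([cx, cy]))

def label_sample(pred, gt, iou_hi=0.7, angle_tol=15.0, iou_lo=0.3):
    """Positive when IoU > first threshold and angle difference < second
    threshold; negative when IoU < third threshold or angle difference >
    second threshold. Threshold values are illustrative assumptions."""
    p, g = rect_polygon(*pred), rect_polygon(*gt)
    inter = p.intersection(g).area
    iou = inter / (p.area + g.area - inter)
    diff = abs(pred[4] - gt[4]) % 180
    angle_diff = min(diff, 180 - diff)          # smallest difference on a 180-degree cycle
    if iou > iou_hi and angle_diff < angle_tol:
        return "positive"
    if iou < iou_lo or angle_diff > angle_tol:
        return "negative"
    return "ignored"                            # neither rule fires: left out of this update

print(label_sample((100, 100, 80, 40, 10), (102, 98, 82, 38, 12)))  # positive
```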
In one embodiment, the classification module is further configured to determine the target detection areas on the feature map according to the position information of each prediction detection frame; adjust the target detection areas to the same preset scale and then obtain the feature vector corresponding to each target detection area; and determine, from each feature vector, the prediction probability that the corresponding target detection area belongs to each preset category.
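A toy sketch of this step: each rotated target detection area is warped to the same preset scale, flattened into a feature vector, and passed through a linear-softmax stand-in for the classification network. The 7 x 7 pooled size, the single-channel feature map, and the random parameters are illustration-only assumptions; the real classification network is learned during training.

```python
import cv2
import numpy as np

def extract_region(feature_map, box, preset=7):
    """Warp the rotated target detection area to a preset x preset patch
    (a crude stand-in for rotated ROI pooling on the feature map)."""
    cx, cy, w, h, theta = box
    src = cv2.boxPoints(((cx, cy), (w, h), theta))[:3].astype(np.float32)
    dst = np.float32([[0, preset - 1], [0, 0], [preset - 1, 0]])
    m = cv2.getAffineTransform(src, dst)        # maps three box corners to patch corners
    return cv2.warpAffine(feature_map, m, (preset, preset))

def classify(region, weights, bias):
    """Flatten the pooled region into a feature vector and softmax it into
    per-preset-category prediction probabilities."""
    logits = weights @ region.reshape(-1) + bias
    e = np.exp(logits - logits.max())           # numerically stable softmax
    return e / e.sum()

fmap = np.random.rand(64, 64).astype(np.float32)
region = extract_region(fmap, (32, 32, 20, 10, 30.0))
probs = classify(region, np.random.randn(5, 49), np.zeros(5))  # 5 preset categories
print(probs.sum())  # 1.0
```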
In the training apparatus 800, the annotation information of the sample image includes both real position information and real category information, and the real position information carries the rotation angle of the target's bounding box. The trained target detection model therefore learns to recognize the rotation angle of a target in an image, so the detection frames it localizes are more accurate. In addition, the region generation network is initialized with the preset rotation angle, scale, and target aspect ratio used to generate the initial detection frames; this enriches how initial detection frames are generated, makes training more stable, and, because the initial detection frames are determined according to preset rotation angles, brings them closer to the real target detection frames. After the regression network adjusts the positions of the initial detection frames to produce the prediction detection frames, the target detection areas on the feature map are obtained from those frames, and the regression network's parameters are adjusted according to the real position information in the annotation information and the position information of the prediction detection frames. After the classification network predicts the category probabilities of the target detection areas, its parameters are adjusted according to the real category information in the annotation information and the prediction probabilities. The result is a target detection model that can detect rotated targets in images and localize them more accurately.
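The position adjustment that turns an initial detection frame into a prediction detection frame can be sketched as decoding predicted offsets against the five-parameter box (centre coordinates, width, height, rotation angle). The offset parameterization below follows common rotated-detector practice and is an assumption of this sketch, not a quotation of this application.

```python
import numpy as np

def apply_deltas(anchors, deltas):
    """Decode predicted offsets (dx, dy, dw, dh, dtheta) against initial
    detection frames (cx, cy, w, h, theta in degrees)."""
    cx, cy, w, h, theta = anchors.T
    dx, dy, dw, dh, dt = deltas.T
    pred_cx = cx + dx * w             # centre shift, relative to anchor size
    pred_cy = cy + dy * h
    pred_w = w * np.exp(dw)           # log-space width/height offsets
    pred_h = h * np.exp(dh)
    pred_theta = (theta + dt) % 180   # rotation offset, wrapped to [0, 180)
    return np.stack([pred_cx, pred_cy, pred_w, pred_h, pred_theta], axis=1)

anchors = np.array([[100.0, 100.0, 80.0, 40.0, 30.0]])
deltas = np.array([[0.05, -0.02, 0.1, 0.0, 5.0]])
print(apply_deltas(anchors, deltas))  # [[104., 99.2, ~88.4, 40., 35.]]
```

A smooth-L1 loss between the decoded prediction detection frames (or the raw offsets) and the annotated rectangular bounding boxes could then drive the adjustment of the regression network's parameters.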
FIG. 9 is a diagram illustrating the internal structure of a computer device in one embodiment. The computer device may specifically be the computer device of FIG. 1. As shown in FIG. 9, the computer device includes a processor, a memory, and a network interface connected by a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the training method of the target detection model. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the training method of the target detection model.
Those skilled in the art will appreciate that the architecture shown in FIG. 9 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the present solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, the training apparatus for the target detection model provided in the present application may be implemented in the form of a computer program, and the computer program may be run on a computer device as shown in FIG. 9. The memory of the computer device may store the program modules constituting the training apparatus of the target detection model, such as the sample image acquisition module 802, the feature map acquisition module 804, the initial detection frame generation module 806, the position regression module 808, and the classification module 810 shown in FIG. 8. These program modules constitute a computer program that causes the processor to execute the steps of the training method of the target detection model of the embodiments of the present application described in this specification.
For example, the computer device shown in FIG. 9 may execute step S202 through the sample image acquisition module 802 in the training apparatus of the target detection model shown in FIG. 8. The computer device may execute step S204 through the feature map acquisition module 804, step S206 through the initial detection frame generation module 806, step S208 through the position regression module 808, and steps S210 and S212 through the classification module 810.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the steps of the above training method of the target detection model. The steps here may be the steps in the training method of the target detection model of the above embodiments.
In one embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when executed by a processor, causes the processor to carry out the steps of the above training method of the target detection model. The steps here may be the steps in the training method of the target detection model of the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium; when executed, the program may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A method of training a target detection model, comprising:
acquiring a sample image and annotation information, wherein the annotation information comprises real position information and real category information of a target in the sample image, and the real position information comprises a rotation angle of a rectangular bounding box corresponding to the target;
obtaining a feature map of the sample image through a feature extraction network of an initial model;
determining, through a region generation network of the initial model, an initial detection frame in the feature map according to a preset rotation angle, a preset scale and a preset target aspect ratio;
adjusting the position of each initial detection frame through a regression network of the initial model to obtain position information of a prediction detection frame, and adjusting network parameters of the regression network according to the real position information in the annotation information and the position information of the prediction detection frame;
predicting, through a classification network of the initial model, the prediction probability that the target corresponds to each preset category according to the target detection area determined from the position information of each prediction detection frame; and
obtaining, after adjusting network parameters of the classification network according to the real category information in the annotation information and the prediction probability, a target detection model for performing target detection on images.
2. The method of claim 1, wherein the obtaining a sample image comprises:
obtaining an original sample image;
determining whether the aspect ratio of the original sample image is 1; if so, proportionally scaling the original sample image to a preset size to obtain a sample image; and if not, proportionally scaling the original sample image and then padding image pixels to obtain a sample image of the preset size.
3. The method of claim 1, wherein the obtaining a sample image comprises:
obtaining an original sample image;
performing rotation processing on the original sample image by a preset angle to obtain a sample image, and obtaining the real annotation information of the sample image according to the rotation angle of a rectangular bounding box in the original sample image and the preset angle; or,
performing vertical mirror processing on the original sample image to obtain a sample image, and obtaining the real annotation information of the sample image according to the rotation angle of the rectangular bounding box in the original sample image; or,
performing horizontal mirror processing on the original sample image to obtain a sample image, and obtaining the real annotation information of the sample image according to the rotation angle of the rectangular bounding box in the original sample image.
4. The method of claim 1, wherein the step of determining the target aspect ratio comprises:
obtaining the sample images and width and height information of the rectangular bounding box corresponding to the target in each sample image;
computing the aspect ratio of each rectangular bounding box from the width and height information; and
clustering the computed aspect ratios to obtain the target aspect ratio from the clustering result.
5. The method of claim 1, wherein adjusting the position of each initial detection frame to obtain the position information of the prediction detection frame comprises:
calculating the position offset of each initial detection frame according to the current network parameters of the regression network;
obtaining the position information of a prediction detection frame according to the initial detection frame and the position offset; the position information includes coordinates of a geometric center point of the prediction detection frame, a width and a height of the prediction detection frame, and a rotation angle of the prediction detection frame.
6. The method of claim 5, further comprising:
determining the prediction detection frame according to the position information of the prediction detection frame;
determining the rectangular bounding box corresponding to the target in the sample image according to the real position information;
calculating the intersection-over-union between the prediction detection frame and the rectangular bounding box;
calculating the rotation angle difference between the prediction detection frame and the rectangular bounding box;
marking the sample image as a positive sample image when the intersection-over-union is greater than a first threshold and the rotation angle difference is smaller than a second threshold; and
marking the sample image as a negative sample image when the intersection-over-union is smaller than a third threshold or the rotation angle difference is larger than the second threshold.
7. The method according to claim 1, wherein predicting the prediction probability that the target corresponds to each preset category according to the target detection area determined from the position information of each prediction detection frame comprises:
determining a target detection area on the feature map according to the position information of each prediction detection frame;
after the target detection areas are adjusted to the same preset scale, acquiring a feature vector corresponding to each target detection area;
and determining the prediction probability of each preset category corresponding to the target detection area according to the feature vector.
8. An apparatus for training a target detection model, the apparatus comprising:
a sample image acquisition module, configured to acquire a sample image and annotation information, wherein the annotation information comprises real position information and real category information of a target in the sample image;
a feature map acquisition module, configured to obtain a feature map of the sample image through a feature extraction network of an initial model;
an initial detection frame generation module, configured to determine, through a region generation network of the initial model, an initial detection frame in the feature map according to a preset rotation angle, a preset scale and a preset target aspect ratio;
a position regression module, configured to adjust the position of each initial detection frame through a regression network of the initial model to obtain position information of a prediction detection frame, and to adjust network parameters of the regression network according to the real position information in the annotation information and the position information of the prediction detection frame; and
a classification module, configured to predict, through a classification network of the initial model, the prediction probability that the target corresponds to each preset category according to the target detection area determined from the position information of each prediction detection frame; and to obtain, after adjusting network parameters of the classification network according to the real category information in the annotation information and the prediction probability, a target detection model for performing target detection on images.
9. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to carry out the steps of the method according to any one of claims 1 to 7.
10. A computer device comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the steps of the method according to any one of claims 1 to 7.
CN201911422532.3A 2019-12-31 2019-12-31 Training method and device for target detection model, storage medium and computer equipment Active CN111241947B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911422532.3A CN111241947B (en) 2019-12-31 2019-12-31 Training method and device for target detection model, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911422532.3A CN111241947B (en) 2019-12-31 2019-12-31 Training method and device for target detection model, storage medium and computer equipment

Publications (2)

Publication Number Publication Date
CN111241947A true CN111241947A (en) 2020-06-05
CN111241947B CN111241947B (en) 2023-07-18

Family

ID=70874291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911422532.3A Active CN111241947B (en) 2019-12-31 2019-12-31 Training method and device for target detection model, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN111241947B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108376235A (en) * 2018-01-15 2018-08-07 深圳市易成自动驾驶技术有限公司 Image detecting method, device and computer readable storage medium
CN109815868A (en) * 2019-01-15 2019-05-28 腾讯科技(深圳)有限公司 A kind of image object detection method, device and storage medium
CN109961040A (en) * 2019-03-20 2019-07-02 深圳市华付信息技术有限公司 Identity card area positioning method, device, computer equipment and storage medium
CN110232311A (en) * 2019-04-26 2019-09-13 平安科技(深圳)有限公司 Dividing method, device and the computer equipment of hand images
CN110097018A (en) * 2019-05-08 2019-08-06 深圳供电局有限公司 Transformer substation instrument detection method and device, computer equipment and storage medium
CN110298298A (en) * 2019-06-26 2019-10-01 北京市商汤科技开发有限公司 Target detection and the training method of target detection network, device and equipment

Cited By (70)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111680680A (en) * 2020-06-09 2020-09-18 创新奇智(合肥)科技有限公司 Object code positioning method and device, electronic equipment and storage medium
CN111680680B (en) * 2020-06-09 2023-10-13 创新奇智(合肥)科技有限公司 Target code positioning method and device, electronic equipment and storage medium
CN111797737A (en) * 2020-06-22 2020-10-20 重庆高新区飞马创新研究院 Remote sensing target detection method and device
CN113836977B (en) * 2020-06-24 2024-02-23 顺丰科技有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN113836977A (en) * 2020-06-24 2021-12-24 顺丰科技有限公司 Target detection method and device, electronic equipment and storage medium
CN111862001A (en) * 2020-06-28 2020-10-30 微医云(杭州)控股有限公司 Semi-automatic labeling method and device for CT image, electronic equipment and storage medium
CN111862001B (en) * 2020-06-28 2023-11-28 微医云(杭州)控股有限公司 Semi-automatic labeling method and device for CT images, electronic equipment and storage medium
WO2022000855A1 (en) * 2020-06-29 2022-01-06 魔门塔(苏州)科技有限公司 Target detection method and device
CN111898659A (en) * 2020-07-16 2020-11-06 北京灵汐科技有限公司 Target detection method and system
CN112001247A (en) * 2020-07-17 2020-11-27 浙江大华技术股份有限公司 Multi-target detection method, equipment and storage device
CN111814905A (en) * 2020-07-23 2020-10-23 上海眼控科技股份有限公司 Target detection method, target detection device, computer equipment and storage medium
CN112149684A (en) * 2020-08-19 2020-12-29 北京豆牛网络科技有限公司 Image processing method and image preprocessing method for target detection
CN112149684B (en) * 2020-08-19 2024-06-07 北京豆牛网络科技有限公司 Image processing method and image preprocessing method for target detection
CN112052787B (en) * 2020-09-03 2021-07-30 腾讯科技(深圳)有限公司 Target detection method and device based on artificial intelligence and electronic equipment
CN112052787A (en) * 2020-09-03 2020-12-08 腾讯科技(深圳)有限公司 Target detection method and device based on artificial intelligence and electronic equipment
CN111931864A (en) * 2020-09-17 2020-11-13 南京甄视智能科技有限公司 Method and system for multiple optimization of target detector based on vertex distance and cross-over ratio
CN112396122A (en) * 2020-09-17 2021-02-23 南京甄视智能科技有限公司 Method and system for multiple optimization of target detector based on vertex distance and cross-over ratio
CN112396122B (en) * 2020-09-17 2022-11-22 小视科技(江苏)股份有限公司 Method and system for multiple optimization of target detector based on vertex distance and cross-over ratio
CN112183529A (en) * 2020-09-23 2021-01-05 创新奇智(北京)科技有限公司 Quadrilateral object detection method, quadrilateral object model training method, quadrilateral object detection device, quadrilateral object model training device and storage medium
CN112115898A (en) * 2020-09-24 2020-12-22 深圳市赛为智能股份有限公司 Multi-pointer instrument detection method and device, computer equipment and storage medium
CN112461130A (en) * 2020-11-16 2021-03-09 北京平恒智能科技有限公司 Positioning method for visual inspection tool frame of adhesive product
CN112464785A (en) * 2020-11-25 2021-03-09 浙江大华技术股份有限公司 Target detection method and device, computer equipment and storage medium
CN112489011B (en) * 2020-11-27 2023-01-31 上海航天控制技术研究所 Intelligent assembling and adjusting method for star sensor optical machine component
CN112489011A (en) * 2020-11-27 2021-03-12 上海航天控制技术研究所 Intelligent assembling and adjusting method for star sensor optical machine component
CN112418344B (en) * 2020-12-07 2023-11-21 汇纳科技股份有限公司 Training method, target detection method, medium and electronic equipment
CN112418344A (en) * 2020-12-07 2021-02-26 汇纳科技股份有限公司 Training method, target detection method, medium and electronic device
CN112488118A (en) * 2020-12-18 2021-03-12 哈尔滨工业大学(深圳) Target detection method and related device
CN112488118B (en) * 2020-12-18 2023-08-08 哈尔滨工业大学(深圳) Target detection method and related device
CN112508975A (en) * 2020-12-21 2021-03-16 上海眼控科技股份有限公司 Image identification method, device, equipment and storage medium
CN112799055A (en) * 2020-12-28 2021-05-14 深圳承泰科技有限公司 Method and device for detecting detected vehicle and electronic equipment
CN112613570B (en) * 2020-12-29 2024-06-11 深圳云天励飞技术股份有限公司 Image detection method, image detection device, equipment and storage medium
WO2022142783A1 (en) * 2020-12-29 2022-07-07 华为云计算技术有限公司 Image processing method and related device
CN112613570A (en) * 2020-12-29 2021-04-06 深圳云天励飞技术股份有限公司 Image detection method, image detection device, equipment and storage medium
CN112686162A (en) * 2020-12-31 2021-04-20 北京每日优鲜电子商务有限公司 Method, device, equipment and storage medium for detecting clean state of warehouse environment
CN112686162B (en) * 2020-12-31 2023-12-15 鄂尔多斯市空港大数据运营有限公司 Method, device, equipment and storage medium for detecting clean state of warehouse environment
CN112801164A (en) * 2021-01-22 2021-05-14 北京百度网讯科技有限公司 Training method, device and equipment of target detection model and storage medium
CN112801164B (en) * 2021-01-22 2024-02-13 北京百度网讯科技有限公司 Training method, device, equipment and storage medium of target detection model
CN112766418A (en) * 2021-03-02 2021-05-07 阳光财产保险股份有限公司 Image text direction classification method, device, equipment and storage medium
CN113128485A (en) * 2021-03-17 2021-07-16 北京达佳互联信息技术有限公司 Training method of text detection model, text detection method and device
CN113283345A (en) * 2021-05-27 2021-08-20 新东方教育科技集团有限公司 Blackboard writing behavior detection method, training method, device, medium and equipment
CN113283345B (en) * 2021-05-27 2023-11-24 新东方教育科技集团有限公司 Blackboard writing behavior detection method, training device, medium and equipment
CN113313111A (en) * 2021-05-28 2021-08-27 北京百度网讯科技有限公司 Text recognition method, device, equipment and medium
CN113343853A (en) * 2021-06-08 2021-09-03 深圳格瑞健康管理有限公司 Intelligent screening method and device for child dental caries
CN113343853B (en) * 2021-06-08 2024-06-14 深圳格瑞健康科技有限公司 Intelligent screening method and device for dental caries of children
CN113269188A (en) * 2021-06-17 2021-08-17 华南农业大学 General method for detecting mark points and pixel coordinates thereof
CN113269188B (en) * 2021-06-17 2023-03-14 华南农业大学 Mark point and pixel coordinate detection method thereof
CN113748430B (en) * 2021-06-28 2024-05-24 商汤国际私人有限公司 Training and detecting method, device, equipment and storage medium of object detection network
CN113748430A (en) * 2021-06-28 2021-12-03 商汤国际私人有限公司 Object detection network training and detection method, device, equipment and storage medium
CN113643323B (en) * 2021-08-20 2023-10-03 中国矿业大学 Target detection system under urban underground comprehensive pipe rack dust fog environment
CN113643323A (en) * 2021-08-20 2021-11-12 中国矿业大学 Target detection system under dust and fog environment of urban underground comprehensive pipe gallery
CN113744213A (en) * 2021-08-23 2021-12-03 上海明略人工智能(集团)有限公司 Method and system for detecting regularity of food balance, computer equipment and storage medium
CN113920068A (en) * 2021-09-23 2022-01-11 北京医准智能科技有限公司 Body part detection method and device based on artificial intelligence and electronic equipment
CN114022695A (en) * 2021-10-29 2022-02-08 北京百度网讯科技有限公司 Training method and device for detection model, electronic equipment and storage medium
CN114220063B (en) * 2021-11-17 2023-04-07 浙江大华技术股份有限公司 Target detection method and device
CN114220063A (en) * 2021-11-17 2022-03-22 浙江大华技术股份有限公司 Target detection method and device
TWI822282B (en) * 2021-12-02 2023-11-11 美商萬國商業機器公司 Computer-implemented method, computer system and computer program product for object detection considering tendency of object location
CN114462469A (en) * 2021-12-20 2022-05-10 浙江大华技术股份有限公司 Training method of target detection model, target detection method and related device
CN114462469B (en) * 2021-12-20 2023-04-18 浙江大华技术股份有限公司 Training method of target detection model, target detection method and related device
CN114862810A (en) * 2022-05-19 2022-08-05 苏州大学 Fruit counting method, device and storage medium
CN114862810B (en) * 2022-05-19 2024-07-02 苏州大学 Fruit counting method, device and storage medium
CN115482417B (en) * 2022-09-29 2023-08-08 珠海视熙科技有限公司 Multi-target detection model, training method, device, medium and equipment thereof
CN115482417A (en) * 2022-09-29 2022-12-16 珠海视熙科技有限公司 Multi-target detection model and training method, device, medium and equipment thereof
CN115375917A (en) * 2022-10-25 2022-11-22 杭州华橙软件技术有限公司 Target edge feature extraction method, device, terminal and storage medium
CN115375917B (en) * 2022-10-25 2023-03-24 杭州华橙软件技术有限公司 Target edge feature extraction method, device, terminal and storage medium
CN117611513A (en) * 2022-11-08 2024-02-27 郑州英视江河生态环境科技有限公司 Microscopic biological image processing method, device and system
CN116128954B (en) * 2022-12-30 2023-12-05 上海强仝智能科技有限公司 Commodity layout identification method, device and storage medium based on generation network
CN116128954A (en) * 2022-12-30 2023-05-16 上海强仝智能科技有限公司 Commodity layout identification method, device and storage medium based on generation network
CN116862980B (en) * 2023-06-12 2024-01-23 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge
CN116862980A (en) * 2023-06-12 2023-10-10 上海玉贲智能科技有限公司 Target detection frame position optimization correction method, system, medium and terminal for image edge
CN117831082A (en) * 2023-12-29 2024-04-05 广电运通集团股份有限公司 Palm area detection method and device

Also Published As

Publication number Publication date
CN111241947B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN111241947B (en) Training method and device for target detection model, storage medium and computer equipment
CN110245662B (en) Detection model training method and device, computer equipment and storage medium
CN110569721B (en) Recognition model training method, image recognition method, device, equipment and medium
CN111950329B (en) Target detection and model training method, device, computer equipment and storage medium
CN109035299B (en) Target tracking method and device, computer equipment and storage medium
CN109271870B (en) Pedestrian re-identification method, device, computer equipment and storage medium
CN110852285B (en) Object detection method and device, computer equipment and storage medium
CN111523414B (en) Face recognition method, device, computer equipment and storage medium
CN110348294B (en) Method and device for positioning chart in PDF document and computer equipment
CN111079632A (en) Training method and device of text detection model, computer equipment and storage medium
US11900676B2 (en) Method and apparatus for detecting target in video, computing device, and storage medium
CN112989962B (en) Track generation method, track generation device, electronic equipment and storage medium
CN112241952B (en) Brain midline identification method, device, computer equipment and storage medium
WO2022134354A1 (en) Vehicle loss detection model training method and apparatus, vehicle loss detection method and apparatus, and device and medium
CN113706481A (en) Sperm quality detection method, sperm quality detection device, computer equipment and storage medium
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
Guo et al. Machine vision-based intelligent manufacturing using a novel dual-template matching: a case study for lithium battery positioning
CN112241705A (en) Target detection model training method and target detection method based on classification regression
CN112199984B (en) Target rapid detection method for large-scale remote sensing image
CN114419370A (en) Target image processing method and device, storage medium and electronic equipment
CN114005052A (en) Target detection method and device for panoramic image, computer equipment and storage medium
CN110472656B (en) Vehicle image classification method, device, computer equipment and storage medium
CN112164090A (en) Data processing method and device, electronic equipment and machine-readable storage medium
CN112926610A (en) Construction method of license plate image screening model and license plate image screening method
CN113780131B (en) Text image orientation recognition method, text content recognition method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant