CN112419310A - Target detection method based on intersection and fusion frame optimization - Google Patents

Target detection method based on intersection and fusion frame optimization

Info

Publication number
CN112419310A
CN112419310A (application CN202011447204.1A)
Authority
CN
China
Prior art keywords
frame
prediction
height
width
calibration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011447204.1A
Other languages
Chinese (zh)
Other versions
CN112419310B (en)
Inventor
惠国保
田万勇
王瑜
郭褚冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 20 Research Institute
Original Assignee
CETC 20 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 20 Research Institute filed Critical CETC 20 Research Institute
Priority to CN202011447204.1A priority Critical patent/CN112419310B/en
Publication of CN112419310A publication Critical patent/CN112419310A/en
Application granted granted Critical
Publication of CN112419310B publication Critical patent/CN112419310B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Geometry (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a target detection method based on intersection-and-fusion frame optimization. The correct mask is selected by means of an intersection-and-fusion frame coincidence-rate calculation, the offset-vector position of the prediction frame closest to the calibration frame on the feature map is determined, and the error between the prediction-frame offsets and the reference offsets is computed in a targeted manner, improving the target prediction accuracy and localization accuracy of the network. The method screens more reliable prediction frames, calculates the back-propagation error more accurately and preserves the back-propagation gradient, so the finally trained network model is more accurate. The intersection-and-fusion coincidence rate is invariant to scale change, and its values map into a fixed interval of which the traditional IOU is only a subset, so error regression can be genuinely optimized and the room for improving prediction-frame accuracy is enlarged.

Description

Target detection method based on intersection and fusion frame optimization
Technical Field
The invention relates to image target detection technology, in particular to a target detection method that optimizes the training of a network by computing the back-propagation error from the frame coincidence rate.
Background
An important aspect of detecting and identifying targets in an image is accurate frame prediction: for an input image, the target of interest is first framed and the attributes of the target are then identified. The predicted bounding box should enclose the target as completely as possible. The main information source of a predicted frame is the feature vector of its region; the image feature regions are divided in a preset manner, each determined by a pixel position on the feature map and by the width and height of an anchor frame.
Each feature region corresponds to one prediction box. There are many feature regions divided on the feature map; not every feature region covers a target, and even those that do rarely cover it exactly, so a prediction box inferred from the feature information contains errors.
Therefore, a machine-learning method is needed to improve the accuracy of the prediction frame during learning. In the learning process the calibration frame serves as the target: the prediction frames or anchor frames with a high coincidence rate with the calibration frame are screened out, and the corresponding prediction frame and its offset error values are selected as the errors for back-propagation. After multiple iterations of optimization learning, the prediction frame approaches the calibration frame.
On the feature map the number of prediction frames is large, and each prediction frame corresponds to several masks; the key to calculating the back-propagation error is to select the mask closest to the calibration frame. Selecting the mask involves calculating, in large quantity, the coincidence rates between prediction frames or anchor frames and the calibration frame, sorting these coincidence rates, and screening out the mask corresponding to the frame with the highest coincidence rate.
Frame coincidence-rate calculation is an important link in the training and learning of the network model. The conventional coincidence rate is generally expressed as the intersection-over-union ratio (IOU), but this brings some problems. The IOU, as a measure of the coincidence of two frames, can be applied directly in back-propagation for objective-function optimization, so it is often preferred as the objective function for two-dimensional target detection tasks.
The IOU can serve as a loss function either directly or indirectly, but either way there is an important issue: if the two frames do not overlap, the IOU value is 0 and cannot reflect how far apart the two frames are. In such a non-overlapping situation, using the IOU as a loss function yields a gradient value of 0 and no optimization takes place.
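As an illustration of this limitation, the following sketch (the helper function and box coordinates are illustrative, not taken from the patent) computes the plain IOU of two axis-aligned boxes; for any pair of non-overlapping boxes it returns 0 regardless of how far apart they are, so a loss built directly on it gives the optimizer no signal in that case.

```python
def iou(box_a, box_b):
    """Plain intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two disjoint box pairs at very different separations give the same IOU of 0,
# so an IOU loss cannot tell the optimizer which prediction is closer.
print(iou((0, 0, 10, 10), (11, 0, 21, 10)))    # 0.0 (1 pixel away)
print(iou((0, 0, 10, 10), (100, 0, 110, 10)))  # 0.0 (90 pixels away)
```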
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a target detection method based on intersection-and-fusion frame optimization. To address the weakness of the traditional IOU, the IOU is extended to the non-overlapping case and an intersection-and-fusion coincidence-rate calculation method is proposed. This coincidence-rate calculation serves as the core module of target detection and further improves localization accuracy, because the frame regression loss is not represented directly by the coincidence rate; instead, it is measured by the error values of the frame position, the width/height offsets and the attribute probabilities.
The technical scheme adopted by the invention for solving the technical problem comprises the following steps:
1) extracting characteristics;
Because the structure of the neural network and the scale of each layer are preset, the width and height of the input initial image are first adjusted to fit the width and height of the network input port, i.e. the image is scaled to the input-port width and height; specifically:
a. first calculate the two ratios of the network input-port width and height (W', H') to the width and height (W, H) of the initial image, W'/W and H'/H;
b. take the smaller of the two ratios as the reference ratio and scale the initial image by it, so that one side is scaled exactly to the network input width or height while a margin remains along the other side;
c. fill the remaining blank region with a fixed pixel value, namely half of the maximum gray value, i.e. 0.5 × 256 = 128;
After the width and height of the input initial image are adjusted, the final feature map is obtained through the multi-layer feature extraction of the neural network; the final feature map carries the feature quantities required for predicting frames, and its structure is shown in Fig. 1.
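The preprocessing of step 1 can be sketched as follows (an illustrative Python sketch, assuming min-ratio letterbox scaling, centered padding and a 416×416 input port; the function name and example sizes are assumptions, and a real implementation would normally use a library resize):

```python
import numpy as np

def letterbox(image, net_w, net_h, pad_value=128):
    """Scale an HxWx3 image by the smaller of (net_w/W, net_h/H) and pad the rest
    with a fixed gray value, as described in step 1 a-c."""
    h, w = image.shape[:2]
    r = min(net_w / w, net_h / h)            # reference ratio: one side fits exactly
    new_w, new_h = int(round(w * r)), int(round(h * r))
    # nearest-neighbour resize in plain NumPy to keep the sketch dependency-free
    ys = (np.arange(new_h) / r).astype(int).clip(0, h - 1)
    xs = (np.arange(new_w) / r).astype(int).clip(0, w - 1)
    resized = image[ys][:, xs]
    canvas = np.full((net_h, net_w, 3), pad_value, dtype=image.dtype)
    top, left = (net_h - new_h) // 2, (net_w - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = resized
    return canvas, r, left, top

img = np.random.randint(0, 256, (375, 500, 3), dtype=np.uint8)
padded, ratio, dx, dy = letterbox(img, 416, 416)
print(padded.shape, ratio)   # (416, 416, 3) 0.832
```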
2) Obtaining a prediction frame;
On the final feature map obtained in step 1), the feature vector of each feature point is divided into three equal-length segments; each segment corresponds to a mask, i.e. a film of an anchor frame laid over the image; each feature point on the feature map serves as an anchor-frame center, the same anchor-frame masks are constructed for all feature points, and the three anchor frames give three masks, i.e. three prediction frames;
Each segment of the feature vector consists of two parts, a prediction-frame offset part and a prediction-frame attribute-probability part; the prediction-frame offsets contain four components, namely 2 horizontal/vertical position offsets and 2 width/height offsets, and the prediction-frame attributes contain a target-presence judgment probability (one component) and target-type prediction probabilities (one component per target type); the prediction-frame position, width, height and attributes are converted from the corresponding component values:
the horizontal and vertical coordinates of the anchor-frame mask center on the final feature map are added to the corresponding horizontal and vertical position offsets to obtain the prediction-frame position; the anchor-frame width and height are multiplied by the width/height offsets to obtain the prediction-frame width and height; the attribute of the prediction frame is judged from the target-presence probability: if it is greater than a threshold the prediction frame is considered to contain a target, otherwise no target is present; if a target is present, the type corresponding to the largest component of the type-prediction probabilities is taken as the attribute of the prediction frame; in this way a prediction frame (position, width, height and attribute) is converted for every feature-point vector on the final feature map.
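A minimal sketch of the conversion described above, assuming one 8-value mask segment (4 offsets, 1 target-presence probability, 3 type probabilities) and taking the text literally, i.e. adding the position offsets to the cell coordinates and multiplying the anchor size by the width/height offsets; YOLO-style implementations usually also apply sigmoid/exponential transforms to the raw network outputs, which the text does not spell out:

```python
import numpy as np

def decode_prediction(feature_vec, cell_x, cell_y, anchor_w, anchor_h,
                      obj_threshold=0.5):
    """Convert one mask segment (4 offsets + 1 objectness + 3 class probabilities)
    into a prediction frame, following step 2 of the method."""
    bx, by, bw, bh = feature_vec[0:4]          # position and width/height offsets
    b_obj = feature_vec[4]                     # target-presence probability
    class_probs = feature_vec[5:8]             # one probability per target type
    # position: cell coordinates plus the horizontal/vertical offsets
    px, py = cell_x + bx, cell_y + by
    # size: anchor width/height multiplied by the width/height offsets
    pw, ph = anchor_w * bw, anchor_h * bh
    if b_obj <= obj_threshold:
        return None                            # no target in this prediction frame
    cls = int(np.argmax(class_probs))          # attribute = largest type probability
    return (px, py, pw, ph), cls, float(b_obj)

vec = np.array([0.3, 0.6, 1.2, 0.8, 0.9, 0.1, 0.7, 0.2])
print(decode_prediction(vec, cell_x=7, cell_y=5, anchor_w=3.0, anchor_h=4.5))
```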
3) Calculating the coincidence rate of the prediction frame and the calibration frame;
The prediction frame of each feature point on the final feature map is obtained from step 2), and the prediction frames are then screened: the coincidence rate between each feature-point prediction frame and the calibration frame is calculated on the final feature map, the prediction frames are sorted by coincidence rate, and the prediction frame with the maximum coincidence rate, i.e. the one closest to the calibration frame, is selected; the coincidence of the two frames is illustrated in Fig. 3;
4) calculating a back propagation error;
During model training, the errors of the prediction-frame offsets and attribute probability values are calculated with the calibration frame as the reference: the position-offset error, the width/height-offset error, the target presence/absence judgment error and the type prediction error; the meaning of each error corresponds to the feature-point vector of step 2); the errors are calculated as follows:
a) calculate the offset (t_x, t_y) of the calibration frame relative to its corresponding feature-point position on the feature map and the variation (t_w, t_h) relative to the corresponding anchor-frame width and height (A_w, A_h); the corresponding feature point is the feature point on the feature map closest to the mapping of the calibration frame's top-left point (G_x, G_y) from the original image, and the anchor frame is the one corresponding to the prediction frame with the maximum coincidence rate with the calibration frame;
b) calculating an error value between the predicted value and the reference value;
in the feature vector at feature-point coordinate (i, j) on the feature map, the four leading variables are the predicted position and width/height variations (b_x, b_y, b_w, b_h); the error values are then:
Δ_i = s(t_i − b_i), i ∈ {x, y, w, h}
where s is a scale factor that balances the contribution of small boxes,
s = 2 − g_w·g_h, with g_w and g_h the calibration-frame width and height normalized by the original-image width and height (G_w/W and G_h/H);
then the target presence/absence judgment error Δ_obj and the type prediction errors Δ_ct (t ∈ {1,2,3}) of the prediction frame are calculated;
Finally, the back-propagation error is the sum of the position-offset, width/height-offset and attribute-judgment error values between the prediction frame and the calibration frame.
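A sketch of the step-4 error terms for one matched prediction frame is given below; the small-box scale factor is assumed to be s = 2 − g_w·g_h as reconstructed above, and the calibration-frame attribute reference is assumed to be a one-hot class vector:

```python
import numpy as np

def backprop_errors(pred, t_xywh, true_class, g_w, g_h, num_classes=3):
    """Error terms of step 4 for one matched prediction frame.

    pred     : 8-vector (b_x, b_y, b_w, b_h, b_obj, b_c1, b_c2, b_c3)
    t_xywh   : reference offsets (t_x, t_y, t_w, t_h) from the calibration frame
    g_w, g_h : calibration-frame width/height normalized by the image size
               (used for the assumed small-box scale factor s = 2 - g_w*g_h)
    """
    s = 2.0 - g_w * g_h                                # boosts small boxes
    delta_box = [s * (t - b) for t, b in zip(t_xywh, pred[:4])]
    delta_obj = 1.0 - pred[4]                          # a target is present here
    target = np.eye(num_classes)[true_class]           # one-hot class reference
    delta_cls = target - np.asarray(pred[5:5 + num_classes])
    return delta_box, delta_obj, delta_cls.tolist()

pred = [0.2, 0.4, 0.1, -0.3, 0.8, 0.6, 0.3, 0.1]
print(backprop_errors(pred, t_xywh=(0.3, 0.5, 0.2, -0.1), true_class=0,
                      g_w=0.1, g_h=0.2))
```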
The feature extraction of the neural network is a convolution and pooling alternating operation.
The specific steps of selecting the prediction frame closest to the calibration frame are as follows:
firstly, calculate the minimum bounding box of the prediction frame and the calibration frame, i.e. the fusion region of the two frames P and GT in Fig. 3: the rectangular region bounded by the smallest left and largest right horizontal coordinates and the smallest top and largest bottom vertical coordinates of the two frames; this bounding region is called the fusion region and is denoted U;
secondly, calculate the intersection region of the prediction frame and the calibration frame, i.e. the intersection of the P region and the GT region, denoted I;
then the ratio I/U of the intersection region of the prediction frame and the calibration frame to the fusion region is the coincidence rate, whose range is [0, 1];
for one calibration frame, all the prediction frames are sorted by coincidence rate and the prediction frame with the maximum coincidence rate is taken as the one closest to the calibration frame.
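The selection procedure above can be sketched as follows (illustrative helper names and box coordinates; boxes are given as (x1, y1, x2, y2)):

```python
def fusion_coincidence_rate(box_a, box_b):
    """Intersection-and-fusion coincidence rate of two boxes (x1, y1, x2, y2):
    intersection area divided by the area of their minimum bounding (fusion) box."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)           # region I
    fx1, fy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    fx2, fy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    fusion = (fx2 - fx1) * (fy2 - fy1)                          # fusion region
    return inter / fusion

def closest_prediction(calibration, predictions):
    """Return the prediction frame with the highest coincidence rate."""
    return max(predictions, key=lambda p: fusion_coincidence_rate(calibration, p))

gt = (10, 10, 50, 50)
preds = [(12, 8, 48, 52), (60, 60, 90, 90), (0, 0, 30, 30)]
print(closest_prediction(gt, preds))   # (12, 8, 48, 52)
```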
The step of mapping the coordinate points of the calibration frame to the final feature map comprises the following steps:
the first step: normalize the original width and height (G_w, G_h) of the calibration frame by the width and height (W, H) of the original image, i.e. divide the calibration-frame width and height by the original-image width and height respectively, to obtain the normalized width and height of the calibration frame:
g_w = G_w / W,  g_h = G_h / H
the second step: multiply the normalized position coordinates of the calibration frame by the feature-map width and height (F_w, F_h) to obtain the horizontal and vertical coordinates (g_x, g_y) of the calibration frame on the feature map, which are floating-point numbers:
g_x = (G_x / W) · F_w
g_y = (G_y / H) · F_h
the third step: find the feature point on the feature map closest to the coordinates (g_x, g_y) and take it as the reference point with coordinates (i, j); coordinate values on the feature map are integers, so i and j are integers; then calculate:
t_x = g_x − i
t_y = g_y − j
t_w = ln(G_w / A_w)
t_h = ln(G_h / A_h)
where the difference between the position coordinates of the calibration frame mapped onto the feature map and the coordinates of the closest feature point is used as the position reference offset (t_x, t_y), and the logarithm of the ratio between the calibration-frame width and height and the width and height of the anchor frame with the maximum coincidence rate is used as the width/height reference offset (t_w, t_h).
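A sketch of the three mapping steps, using the reconstructed formulas above and assuming that the "closest" feature point is obtained by rounding to the nearest integer cell:

```python
import math

def reference_offsets(G_x, G_y, G_w, G_h, W, H, F_w, F_h, A_w, A_h):
    """Map a calibration frame onto the feature map and derive its reference offsets
    (t_x, t_y, t_w, t_h), following the three steps above."""
    # step 1: normalize the calibration-frame size by the original image size
    g_w, g_h = G_w / W, G_h / H
    # step 2: scale the normalized position to feature-map coordinates (floats)
    g_x, g_y = (G_x / W) * F_w, (G_y / H) * F_h
    # step 3: nearest feature point (integer grid cell) and the residual offsets
    i, j = int(round(g_x)), int(round(g_y))
    t_x, t_y = g_x - i, g_y - j
    # width/height reference offsets: log ratio of calibration frame to anchor frame
    t_w, t_h = math.log(G_w / A_w), math.log(G_h / A_h)
    return (i, j), (t_x, t_y, t_w, t_h), (g_w, g_h)

cell, t, g = reference_offsets(G_x=120, G_y=200, G_w=60, G_h=90,
                               W=416, H=416, F_w=13, F_h=13, A_w=50, A_h=80)
print(cell, t, g)   # e.g. cell (4, 6), t_x=-0.25, t_y=0.25, t_w=ln(1.2), t_h=ln(1.125)
```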
The target presence/absence judgment error Δ_obj and the type prediction errors Δ_ct (t ∈ {1,2,3}) are calculated as follows:
the presence/absence judgment error Δ_obj is the difference between the target-presence judgment probability components of the attribute parts of the prediction frame and of the calibration frame on the feature-point vector, and the type prediction errors Δ_ct (t ∈ {1,2,3}) are the corresponding differences between the target-type prediction probability components of the attribute parts of the prediction frame and of the calibration frame on the feature-point vector.
The threshold value is 0.5.
The method has the advantage of extending the IOU method: the proposed intersection-and-fusion coincidence-rate calculation is compatible with the case where the frames do not overlap, so more reliable prediction frames can be screened, the back-propagation error can be calculated more accurately and the back-propagation gradient is preserved, making the finally trained network model more accurate.
The intersection-and-fusion coincidence-rate calculation respects the translation and scaling properties of a frame-distance measure and is invariant to scale change; its values map into a fixed interval of which the conventional IOU is only a subset.
A margin, or blind area, remains between the traditional IOU value and regression-error optimization; the intersection-and-fusion coincidence rate, obtained by extending the IOU method, covers this blind area, so error regression can be genuinely optimized and the room for improving prediction-frame accuracy is enlarged.
Drawings
Fig. 1 is a schematic diagram of a final layer feature diagram.
FIG. 2 is a diagram of a feature map storage format in system memory.
FIG. 3 is a schematic diagram of the intersection and fusion of a prediction box and a calibration box.
Detailed Description
The invention is further illustrated with reference to the following figures and examples.
Specifically, the quantities to be adjusted are mapped onto the feature vectors of a feature map; each feature vector consists of several mask segments, each mask carrying a frame position, width/height offsets and attribute probability values, and the errors of these quantities are back-propagated, as shown in Fig. 1. The intersection-and-fusion coincidence rate is used to select, among the several masks, the mask to be adjusted. Because the back-propagation error function is a linear function, the IOU or the intersection-and-fusion coincidence rate can be used indirectly as the error to optimize the deep neural network model.
However, in all non-overlapping cases the IOU has a gradient of 0, which harms training quality and convergence speed. The intersection-and-fusion coincidence rate, by contrast, has a gradient in both overlapping and non-overlapping situations while retaining the characteristics of the IOU; even when the IOU value is high, the intersection-and-fusion coincidence rate keeps the behavior of the IOU.
The intersection-and-fusion coincidence rate proposed by the invention is defined as follows: for two arbitrary rectangles, find the smallest box enclosing both frames; the region inside this enclosing box is called the fusion region C. Then determine the overlapping part of the two frames, called the intersection region I. The proportion I/C of the intersection area within the fusion area is the intersection-and-fusion coincidence rate.
The error value based on the intersection-and-fusion coincidence rate is 1 − I/C. It is closely related to the traditional IOU, and the relation between the two can be derived from the following decomposition:
1 − I/C = 1 − (I/U)·(U/C) = (C − U)/C + (U/C)·(1 − IOU)
where U is the union region of the two frames, U = P + GT − I, and U is not a quantity that is calculated directly.
As the equation shows, the intersection-and-fusion error value equals the proportion of the fusion region not covered by the union region, plus (1 − IOU) scaled by the coefficient U/C.
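A quick numeric check of this decomposition (the box coordinates are illustrative): for any pair of boxes, 1 − I/C computed directly equals (C − U)/C + (U/C)·(1 − IOU).

```python
def regions(box_a, box_b):
    """Areas of the intersection I, union U and minimal bounding (fusion) region C."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    I = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    U = area(box_a) + area(box_b) - I                   # U = P + GT - I
    C = (max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])) * \
        (max(box_a[3], box_b[3]) - min(box_a[1], box_b[1]))
    return I, U, C

P, GT = (0, 0, 40, 40), (20, 20, 70, 60)
I, U, C = regions(P, GT)
iou = I / U
lhs = 1 - I / C                                         # intersection-and-fusion error
rhs = (C - U) / C + (U / C) * (1 - iou)                 # decomposition via the IOU
print(round(lhs, 6), round(rhs, 6))                     # both 0.904762 (identical)
```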
Rectangular frames on the image are axis-aligned with the image; they change only in position, width and height, not in rotation. Any two rectangular frames therefore have a bounding box that minimally encloses them, namely the box limited by the smallest and largest horizontal and vertical coordinates of the two frames.
The intersection-and-fusion coincidence rate is calculated so that the following general properties hold: 1) it remains consistent with the definitions underlying the IOU, including the shape and area descriptions; 2) like the IOU, it is invariant to scale change; 3) it is compatible with the two-frame overlapping case handled by the IOU.
A frame coincidence-rate calculation method based on intersection and fusion is thus designed and implemented, providing guidance for selecting a suitable mask: the correct mask is selected, the offset-vector position of the prediction frame closest to the calibration frame is determined on the feature map, and the error between the prediction-frame offsets and the reference offsets is calculated in a targeted manner, improving the target prediction accuracy and localization accuracy of the network. The embodiment provided by the invention comprises the following steps:
step 1, taking a feature vector at a feature point (i, j) on a feature map (shown in figure 1), wherein the feature vector is composed of 3 segments of masks (shown in figure 2), and each segment of mask corresponds to a prediction frame (p)x,py,pw,ph) Position, width and height offset value (b)x,by,bw,bh) And attribute prediction probability (b)obj,bc1,bc2,bc3) Where x, y, w, h correspond to the center position coordinates and width and height of the frame, and obj, c1, c2, and c3 correspond to probability values (in the present embodiment, the object types are set to three types) that the frame contains the object probability and the object belongs to (c1, c2, and c3), respectively.
Step 2: enumerate all calibration frames on the input image; for each calibration frame and the prediction frame P obtained in step 1, calculate the coincidence rate as follows:
1) determine the fusion region of the prediction frame and the calibration frame and compute its area U;
2) then compute the intersection region of the prediction frame and the calibration frame and its area I;
3) the intersection-and-fusion coincidence rate of the prediction frame and the calibration frame is then I/U; sort by coincidence rate and record the number of the calibration frame with the maximum coincidence rate.
Step 3: set the prediction-frame error values. If the maximum intersection-and-fusion coincidence rate obtained in step 2 is greater than the ignore threshold (0.5), the target-probability error Δ_obj of the prediction frame obtained in step 1 is set to 0, meaning it is disregarded; otherwise Δ_obj = 0 − b_obj. If the maximum coincidence rate obtained in step 2 is greater than the true-target threshold (0.9), the following settings are made:
1) the target-probability error of the prediction frame: Δ_obj = 1 − b_obj;
2) according to the calibration-frame number obtained in step 2-3), find from all calibration data the target-type value c_t (t ∈ {1,2,3}) of that calibration frame; the prediction-frame type prediction error is Δ_ct = 1 − b_ct for the type matching c_t and Δ_ct = 0 − b_ct for the other types; the position and width/height of the calibration frame are converted into reference offsets.
3) according to the calibration-frame number obtained in step 2, find from all calibration data the reference position offset (t_x, t_y) and the reference width/height offset (t_w, t_h):
the reference position offset is the difference between the position of the calibration frame mapped onto the feature map and the feature point (i, j);
the reference width/height offset is the logarithm of the ratio of the calibration-frame width and height to the anchor-frame width and height;
the prediction-frame position and width/height offset error values are the term-by-term differences between (t_x, t_y, t_w, t_h) and (b_x, b_y, b_w, b_h).
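A sketch of the step-3 error assignment for one mask segment, using the 0.5 ignore threshold and 0.9 true-target threshold stated above (function and variable names are illustrative):

```python
import numpy as np

def assign_errors(pred, best_rate, gt_class, t_ref,
                  ignore_thresh=0.5, truth_thresh=0.9, num_classes=3):
    """Step-3 error assignment for one mask segment.

    pred      : (b_x, b_y, b_w, b_h, b_obj, b_c1, b_c2, b_c3)
    best_rate : maximum intersection-and-fusion coincidence rate over calibration frames
    gt_class  : class index of that best-matching calibration frame
    t_ref     : its reference offsets (t_x, t_y, t_w, t_h)
    """
    b_obj, b_cls = pred[4], np.asarray(pred[5:5 + num_classes])
    # objectness: ignored above the ignore threshold, otherwise pushed towards 0
    d_obj = 0.0 if best_rate > ignore_thresh else 0.0 - b_obj
    d_cls = np.zeros(num_classes)
    d_box = np.zeros(4)
    if best_rate > truth_thresh:                 # treated as a true positive
        d_obj = 1.0 - b_obj
        d_cls = np.eye(num_classes)[gt_class] - b_cls
        d_box = np.asarray(t_ref) - np.asarray(pred[:4])
    return d_box, d_obj, d_cls

pred = [0.1, 0.2, -0.1, 0.3, 0.7, 0.2, 0.6, 0.2]
print(assign_errors(pred, best_rate=0.95, gt_class=1,
                    t_ref=(0.15, 0.25, 0.0, 0.2)))
```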
Step 4: execute steps 1 to 3 in a loop over all feature points on the feature map, finally obtaining the prediction-error value of every mask segment of the feature vector at every feature point.
Step 5: for each calibration frame, determine the position (k, l) of the frame mapped onto the feature map, rounded to integers; at the same center point, calculate the coincidence rate between the frame and each anchor frame in the same way as in step 2, and record the anchor-frame number with the maximum coincidence rate.
Step 6: according to the anchor-frame number obtained in step 5, look for the corresponding number among the 3 mask segments of the feature vector at feature point (k, l); if it exists, record the mask number, otherwise return and exit.
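Steps 5 and 6 can be sketched as follows; for two boxes sharing a center, the intersection region is min(w)·min(h) and the fusion region is max(w)·max(h), so the same-center coincidence rate reduces to their ratio. The anchor sizes and the mask-to-anchor assignment below are illustrative assumptions:

```python
def anchor_overlap(w1, h1, w2, h2):
    """Coincidence rate of two boxes sharing the same center: intersection over the
    minimal bounding box of the two (the fusion region of step 2)."""
    inter = min(w1, w2) * min(h1, h2)
    fusion = max(w1, w2) * max(h1, h2)
    return inter / fusion

def best_anchor_mask(G_w, G_h, anchors, mask_anchor_ids):
    """Steps 5-6: pick the anchor with the highest coincidence rate for a calibration
    frame, then look it up among the anchor ids assigned to this layer's 3 masks."""
    rates = [anchor_overlap(G_w, G_h, a_w, a_h) for a_w, a_h in anchors]
    best = max(range(len(anchors)), key=lambda k: rates[k])
    return mask_anchor_ids.index(best) if best in mask_anchor_ids else None

anchors = [(10, 14), (23, 27), (37, 58), (81, 82), (135, 169), (344, 319)]
print(best_anchor_mask(60, 70, anchors, mask_anchor_ids=[3, 4, 5]))   # mask 0
```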
Step 7: extract the prediction-frame offsets (b′_x, b′_y, b′_w, b′_h) and attribute probability values (b′_obj, b′_c1, b′_c2, b′_c3) corresponding to the mask number obtained in step 6, and calculate their error values as follows:
1) the target-probability error of the mask's prediction frame: Δ′_obj = 1 − b′_obj;
2) the classification error of the mask's prediction frame is calculated as in step 3-2): find from all calibration data the target-type value c′_t (t ∈ {1,2,3}) of the calibration frame; the type prediction error is Δ′_ct = 1 − b′_ct for the type matching c′_t and Δ′_ct = 0 − b′_ct for the other types;
3) the prediction-frame offset error values are calculated as in step 3-3), except that the feature-point position is (k, l) when computing the reference position offset, and the anchor frame recorded in step 5 is used when computing the reference width/height offset.
Step 8: execute steps 5 to 7 in a loop until all calibration frames have been processed, finally obtaining the vector error values of the masks corresponding to all calibration frames on the feature map.
Step 9: using the feature-vector errors obtained above, perform back-propagation by gradient descent and adjust the network weights.

Claims (6)

1. A target detection method based on intersection and fusion frame optimization is characterized by comprising the following steps:
1) extracting characteristics;
because the structure of the neural network and the scale of each layer are preset, the width and height of the input initial image are first adjusted to fit the width and height of the network input port, i.e. the image is scaled to the input-port width and height; specifically:
a. first calculate the two ratios of the network input-port width and height (W', H') to the width and height (W, H) of the initial image, W'/W and H'/H;
b. take the smaller of the two ratios as the reference ratio and scale the initial image by it, so that one side is scaled exactly to the network input width or height while a margin remains along the other side;
c. fill the remaining blank region with a fixed pixel value, namely half of the maximum gray value;
after the width and height of an input initial image are adjusted, a final feature mapping chart is obtained through multi-layer feature extraction of a neural network;
2) obtaining a prediction frame;
on the final feature map obtained in step 1), the feature vector of each feature point is divided into three equal-length segments; each segment corresponds to a mask, i.e. a film of an anchor frame laid over the image; each feature point on the feature map serves as an anchor-frame center, the same anchor-frame masks are constructed for all feature points, and the three anchor frames give three masks, i.e. three prediction frames;
each segment of the feature vector consists of two parts, a prediction-frame offset part and a prediction-frame attribute-probability part; the prediction-frame offsets contain four components, namely 2 horizontal/vertical position offsets and 2 width/height offsets, and the prediction-frame attributes contain a target-presence judgment probability value and target-type prediction probability values; the prediction-frame position, width, height and attributes are converted from the corresponding component values:
the horizontal and vertical coordinates of the anchor-frame mask center on the final feature map are added to the corresponding horizontal and vertical position offsets to obtain the prediction-frame position; the anchor-frame width and height are multiplied by the width/height offsets to obtain the prediction-frame width and height; the attribute of the prediction frame is judged from the target-presence probability: if it is greater than a threshold the prediction frame is considered to contain a target, otherwise no target is present; if a target is present, the type corresponding to the largest component of the type-prediction probabilities is taken as the attribute of the prediction frame; in this way a prediction frame is converted for every feature-point vector on the final feature map;
3) calculating the coincidence rate of the prediction frame and the calibration frame;
the prediction frame of each feature point on the final feature map is obtained from step 2), and the prediction frames are then screened: the coincidence rate between each feature-point prediction frame and the calibration frame is calculated on the final feature map, the prediction frames are sorted by coincidence rate, and the prediction frame with the maximum coincidence rate, i.e. the one closest to the calibration frame, is selected;
4) calculating a back propagation error;
during model training, the errors of the prediction-frame offsets and attribute probability values are calculated with the calibration frame as the reference: the position-offset error, the width/height-offset error, the target presence/absence judgment error and the type prediction error; the meaning of each error corresponds to the feature-point vector of step 2); the errors are calculated as follows:
a) calculate the offset (t_x, t_y) of the calibration frame relative to its corresponding feature-point position on the feature map and the variation (t_w, t_h) relative to the corresponding anchor-frame width and height (A_w, A_h); the corresponding feature point is the feature point on the feature map closest to the mapping of the calibration frame's top-left point (G_x, G_y) from the original image, and the anchor frame is the one corresponding to the prediction frame with the maximum coincidence rate with the calibration frame;
b) calculating an error value between the predicted value and the reference value;
in the feature vector at feature-point coordinate (i, j) on the feature map, the four leading variables are the predicted position and width/height variations (b_x, b_y, b_w, b_h); the error values are then:
Δ_i = s(t_i − b_i), i ∈ {x, y, w, h}
where s is a scale factor that balances the contribution of small boxes,
s = 2 − g_w·g_h, with g_w and g_h the calibration-frame width and height normalized by the original-image width and height (G_w/W and G_h/H);
then, the presence or absence of a target determination error Δ in the prediction frame is calculatedobjAnd the class prediction error Δct(ct(t∈{1,2,3});
and finally, the back-propagation error is the sum of the position-offset, width/height-offset and attribute-judgment error values between the prediction frame and the calibration frame.
2. The target detection method based on intersection and fusion frame optimization as claimed in claim 1, wherein:
the feature extraction of the neural network is a convolution and pooling alternating operation.
3. The target detection method based on intersection and fusion frame optimization as claimed in claim 1, wherein:
the specific steps of selecting the prediction frame closest to the calibration frame are as follows:
firstly, calculate the minimum bounding box of the prediction frame and the calibration frame, i.e. the fusion region of the two frames: the rectangular region bounded by the smallest left and largest right horizontal coordinates and the smallest top and largest bottom vertical coordinates of the two frames; this bounding region is called the fusion region and is denoted U;
secondly, calculate the intersection region of the prediction frame (P) and the calibration frame (GT), denoted I;
then the ratio I/U of the intersection region to the fusion region is the coincidence rate, whose range is [0, 1];
for one calibration frame, all the prediction frames are sorted by coincidence rate and the prediction frame with the maximum coincidence rate is taken as the one closest to the calibration frame.
4. The target detection method based on intersection and fusion frame optimization as claimed in claim 1, wherein:
the step of mapping the coordinate points of the calibration frame to the final feature map comprises the following steps:
the first step: normalize the original width and height (G_w, G_h) of the calibration frame by the width and height (W, H) of the original image, i.e. divide the calibration-frame width and height by the original-image width and height respectively, to obtain the normalized width and height of the calibration frame:
g_w = G_w / W,  g_h = G_h / H
the second step: multiply the normalized position coordinates of the calibration frame by the feature-map width and height (F_w, F_h) to obtain the horizontal and vertical coordinates (g_x, g_y) of the calibration frame on the feature map, which are floating-point numbers:
g_x = (G_x / W) · F_w
g_y = (G_y / H) · F_h
the third step: find the feature point on the feature map closest to the coordinates (g_x, g_y) and take it as the reference point with coordinates (i, j); coordinate values on the feature map are integers, so i and j are integers; then calculate:
t_x = g_x − i
t_y = g_y − j
t_w = ln(G_w / A_w)
t_h = ln(G_h / A_h)
where the difference between the position coordinates of the calibration frame mapped onto the feature map and the coordinates of the closest feature point is used as the position reference offset (t_x, t_y), and the logarithm of the ratio between the calibration-frame width and height and the width and height of the anchor frame with the maximum coincidence rate is used as the width/height reference offset (t_w, t_h).
5. The target detection method based on intersection and fusion frame optimization as claimed in claim 1, wherein:
the target presence/absence judgment error Δ_obj and the type prediction errors Δ_ct (t ∈ {1,2,3}) are calculated as follows:
the presence/absence judgment error Δ_obj is the difference between the target-presence judgment probability components of the attribute parts of the prediction frame and of the calibration frame on the feature-point vector, and the type prediction errors Δ_ct (t ∈ {1,2,3}) are the corresponding differences between the target-type prediction probability components of the attribute parts of the prediction frame and of the calibration frame on the feature-point vector.
6. The target detection method based on intersection and fusion frame optimization as claimed in claim 1, wherein:
the threshold value is 0.5.
CN202011447204.1A 2020-12-08 2020-12-08 Target detection method based on cross fusion frame optimization Active CN112419310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011447204.1A CN112419310B (en) 2020-12-08 2020-12-08 Target detection method based on cross fusion frame optimization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011447204.1A CN112419310B (en) 2020-12-08 2020-12-08 Target detection method based on cross fusion frame optimization

Publications (2)

Publication Number Publication Date
CN112419310A true CN112419310A (en) 2021-02-26
CN112419310B CN112419310B (en) 2023-07-07

Family

ID=74776093

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011447204.1A Active CN112419310B (en) 2020-12-08 2020-12-08 Target detection method based on cross fusion frame optimization

Country Status (1)

Country Link
CN (1) CN112419310B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10304191B1 (en) * 2016-10-11 2019-05-28 Zoox, Inc. Three dimensional bounding box estimation from two dimensional images
CN108304758A (en) * 2017-06-21 2018-07-20 腾讯科技(深圳)有限公司 Facial features tracking method and device
CN110400332A (en) * 2018-04-25 2019-11-01 杭州海康威视数字技术股份有限公司 A kind of target detection tracking method, device and computer equipment
CN108764228A (en) * 2018-05-28 2018-11-06 嘉兴善索智能科技有限公司 Word object detection method in a kind of image
CN110517224A (en) * 2019-07-12 2019-11-29 上海大学 A kind of photovoltaic panel defect inspection method based on deep neural network
CN110427915A (en) * 2019-08-14 2019-11-08 北京百度网讯科技有限公司 Method and apparatus for output information
CN110532920A (en) * 2019-08-21 2019-12-03 长江大学 Smallest number data set face identification method based on FaceNet method
CN110766058A (en) * 2019-10-11 2020-02-07 西安工业大学 Battlefield target detection method based on optimized RPN (resilient packet network)
CN110909800A (en) * 2019-11-26 2020-03-24 浙江理工大学 Vehicle detection method based on fast R-CNN improved algorithm
CN111062282A (en) * 2019-12-05 2020-04-24 武汉科技大学 Transformer substation pointer type instrument identification method based on improved YOLOV3 model
CN111461110A (en) * 2020-03-02 2020-07-28 华南理工大学 Small target detection method based on multi-scale image and weighted fusion loss

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Y. Sun, C. et al.: "An Object Detection Network for Embedded System", DSCI *
袁汉钦 et al.: "A multi-class missile-borne image target segmentation algorithm based on mask combination", Ship Electronic Engineering *
韩兴 et al.: "Robot picking method in complex scenes based on deep neural networks", Journal of Beijing University of Posts and Telecommunications *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118124A (en) * 2021-09-29 2022-03-01 北京百度网讯科技有限公司 Image detection method and device
CN114118124B (en) * 2021-09-29 2023-09-12 北京百度网讯科技有限公司 Image detection method and device

Also Published As

Publication number Publication date
CN112419310B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
CN110298298B (en) Target detection and target detection network training method, device and equipment
CN113362329B (en) Method for training focus detection model and method for recognizing focus in image
CN111191566B (en) Optical remote sensing image multi-target detection method based on pixel classification
CN112800964B (en) Remote sensing image target detection method and system based on multi-module fusion
US7668373B2 (en) Pattern evaluation method, method of manufacturing semiconductor, program and pattern evaluation apparatus
CN111489357A (en) Image segmentation method, device, equipment and storage medium
CN111242026B (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN110838145B (en) Visual positioning and mapping method for indoor dynamic scene
CN113177592B (en) Image segmentation method and device, computer equipment and storage medium
US20220277434A1 (en) Measurement System, Method for Generating Learning Model to Be Used When Performing Image Measurement of Semiconductor Including Predetermined Structure, and Recording Medium for Storing Program for Causing Computer to Execute Processing for Generating Learning Model to Be Used When Performing Image Measurement of Semiconductor Including Predetermined Structure
CN111239684A (en) Binocular fast distance measurement method based on YoloV3 deep learning
CN111353440A (en) Target detection method
US20200327686A1 (en) Methods, systems, articles of manufacture, and apparatus to enhance image depth confidence maps
EP3376468A1 (en) Object detection device and object detection method
CN110598711B (en) Target segmentation method combined with classification task
CN109993728B (en) Automatic detection method and system for deviation of thermal transfer glue
CN112419310A (en) Target detection method based on intersection and fusion frame optimization
CN114926498A (en) Rapid target tracking method based on space-time constraint and learnable feature matching
CN112613462B (en) Weighted intersection ratio method
CN117635421A (en) Image stitching and fusion method and device
CN111144466B (en) Image sample self-adaptive depth measurement learning method
JPH08335268A (en) Area extracting method
CN115656991A (en) Vehicle external parameter calibration method, device, equipment and storage medium
CN115719414A (en) Target detection and accurate positioning method based on arbitrary quadrilateral regression
CN115909347A (en) Instrument reading identification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant