CN109784476A - A method of improving DSOD network - Google Patents
- Publication number
- CN109784476A (CN201910029814.0A)
- Authority
- CN
- China
- Prior art keywords
- network
- dsod
- image
- feature
- convolutional layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
Abstract
The present invention relates to a method for improving a DSOD network. The input image is first preprocessed, and the preprocessed image is fed to the DSOD feature extraction sub-network. An RFB_a network module is added after the second transition layer of the feature extraction sub-network, and features with different receptive fields are extracted by Atrous convolutions with different sampling rates in the RFB_a network. An Atrous convolutional layer with a sampling rate of 6 is added after the feature extraction sub-network, and the features it produces are fed to the multi-scale prediction layers. The outputs of the multi-scale prediction layers are fed to the loss function, to which an IOG penalty term is added to prevent predicted boxes of the same class from overlapping when densely packed objects of the same type are predicted. In the training stage, the learning rate is set with a warm-up strategy, and a suitably chosen batch size lowers the hardware requirements for training the network. Compared with the original DSOD algorithm, the invention achieves higher detection accuracy and better detection of small objects while lowering the hardware requirements for training.
Description
Technical field
The present invention relates to the field of computer vision, and in particular to a method for improving a DSOD network.
Background
Object detection is one of the most important research topics in computer vision. Its main task is to locate the objects of interest in a given image and accurately determine the specific position of each object. Object detection algorithms based on convolutional neural networks fall into two kinds: those based on region proposals and those based on regression. Region-proposal-based algorithms achieve high detection accuracy, but because they must extract candidate regions, their detection speed can hardly reach real time. Regression-based algorithms such as SSD and DSOD remove the region proposal step and so detect in real time. However, the DSOD algorithm detects small objects poorly and places high demands on hardware when the network is trained.
Summary of the invention
In view of this, the purpose of the present invention is to propose a method for improving a DSOD network that improves both the detection of small objects and the detection accuracy.
The present invention is realized by the following scheme: a method for improving a DSOD network, comprising the following steps:
Step S1: An image from the data set is taken as the input image and fed to the input layer. The input image is preprocessed by cropping, mirroring, and mean subtraction to obtain the preprocessed image, and the absolute coordinates in the preprocessed image are converted to relative coordinates by normalization.
Step S2: An RFB_a network module is added after the second transition layer of the feature extraction sub-network in the DSOD network. The image preprocessed in step S1 is fed to the feature extraction sub-network of the DSOD network for feature extraction. The feature map of the second transition layer of the feature extraction sub-network is fed to the RFB_a network module, where Atrous (dilated) convolutions with different sampling rates extract features with different receptive fields. The extracted features are fed to a 3 × 3 convolutional layer to form the first scale prediction layer of the DSOD network.
Step S3: An Atrous convolutional layer with a preset sampling rate (6 in the present invention) is added after the feature extraction sub-network of the DSOD network, and the feature map of the feature extraction sub-network described in step S2 is fed to this Atrous convolutional layer to enlarge the receptive field of the feature map. The features produced by the Atrous convolutional layer are then fed to the multi-scale prediction layers of the DSOD network to form 5 scale prediction layers.
Step S4: The features of the first scale prediction layer described in step S2 and of the 5 scale prediction layers described in step S3 are fed to the multi-task loss function L with an added IOG penalty term.
Step S5: The learning rate is set with a warm-up strategy, and the weights of all network layers in the DSOD network are optimized by gradient descent. A suitable batch size (16 in the present invention) is set to reduce the hardware requirements for training the DSOD network.
Further, the image preprocessing in step S1 is specifically as follows:
Step S11: Crop the input image. First randomly select the width and height of the cropped image, then randomly select one of 0.1, 0.3, 0.7, and 0.9 as the Jaccard coefficient threshold, and use the Jaccard coefficient to measure the similarity between every ground-truth box in the original image and the cropped image.
Step S12: Determine whether the Jaccard coefficient between a ground-truth box and the cropped image exceeds the threshold randomly selected in step S11. If at least one ground-truth box has a Jaccard coefficient with the cropped image greater than the selected threshold, and the centre of that ground-truth box falls inside the cropped image, the cropped image meets the requirement; otherwise return to step S11. The Jaccard coefficient is calculated as follows:
[formula not reproduced in this text]
where N is the number of ground-truth boxes in the image, box_i is the area of the i-th ground-truth box, box_cut is the area of the cropped image, and the operator ∩ denotes the overlapping area.
Step S13: Mirror the cropped image left to right with a preset probability T (T = 0.5 in the present invention), and resize the mirrored image to a resolution of 300 × 300.
Step S14: Apply mean subtraction to the mirrored image to obtain the mean-subtracted image.
Further, step S2 is specifically as follows. Each branch of the RFB_a network first uses a 1 × 1 convolutional layer to reduce the number of feature channels. The first branch uses a convolutional layer with stride 1 and a 3 × 3 kernel to obtain features with a 3 × 3 receptive field. The second branch uses a 1 × 3 convolutional layer and an Atrous convolutional layer with sampling rate 3 to obtain features with a 1 × 7 receptive field. The third branch uses a 3 × 1 convolutional layer and an Atrous convolutional layer with sampling rate 3 to obtain features with a 7 × 1 receptive field. The fourth branch uses a 3 × 3 convolutional layer and an Atrous convolutional layer with sampling rate 5 to obtain features with an 11 × 11 receptive field. The features extracted by the branches are fused by channel concatenation followed by a 1 × 1 convolutional layer. Finally, the fused features are combined with the features produced by the second transition layer of the DSOD network through a residual connection to form the final output features.
Further, the Atrous convolutional layer with the preset sampling rate described in step S3 is added as follows. First, the number of output channels C of the feature extraction sub-network in the DSOD network is increased to extract richer feature information. Then an Atrous convolutional layer with a given sampling rate r is added, whose number of output channels equals the number of output channels of the original DSOD feature extraction sub-network, so that the Atrous convolution can be embedded into the DSOD network. Finally, a 1 × 1 convolutional layer is added for feature fusion.
Further, the multi-task loss function L with the added IOG penalty term described in step S4 is specifically as follows:
Step S41: For each predicted box p output by the DSOD network, find among all ground-truth boxes G the ground-truth box g_iou_max whose intersection over union (IoU) with p is largest. The formula is as follows:
[formula not reproduced in this text]
where g denotes a ground-truth box, G the set of all ground-truth boxes, p a predicted box, P the set of all predicted boxes, box_g the area of a ground-truth box, and box_p the area of a predicted box.
Step S42: Remove the ground-truth box with the largest IoU found in step S41, then compute the maximum IOG penalty term between the predicted box and the remaining ground-truth boxes, and take this maximum IOG penalty term as the loss function L_iog. The formula is as follows:
[formula not reproduced in this text]
Step S43: Fuse the loss function L_iog with the localization loss function L_loc and the classification loss function L_conf by weighting to form the final multi-task loss function L. The formula is as follows:
[formula not reproduced in this text]
where N is the number of detected positive samples and α is the weight of the localization loss L_loc. The localization loss L_loc uses the smooth_L1 loss, and the classification loss L_conf is computed with cross-entropy. The localization loss L_loc is calculated as follows:
[formula not reproduced in this text]
where the indicator function indicates that the i-th default box is matched to the j-th ground-truth box of category k, l denotes the position coordinates of the predicted box, Pos denotes the default boxes that are positive samples, and N denotes the number of positive samples. smooth_L1 is calculated as follows:
[formula not reproduced in this text]
The classification loss L_conf is calculated as follows:
[formula not reproduced in this text]
where c denotes the confidence of each category, Neg denotes the negative samples, p denotes the category, and 0 denotes the background category. The probability that the i-th predicted box belongs to category p is calculated as follows:
[formula not reproduced in this text]
Further, setting the learning rate with the warm-up strategy described in step S5 is specifically as follows. The initial learning rate is set to 10^-5 and increased linearly to 10^-2 over the first 5 epochs. The learning rate is divided by 10 at the 75th, 125th, and 175th epochs, and training finishes at the 200th epoch. The batch normalization weights are initialized to 0.5 and the biases to 0. All convolutions are initialized with the Xavier method. With this improved training strategy, the training batch size is reduced from 128 to 16, lowering the hardware requirements for training the network.
Compared with the prior art, the invention has the following beneficial effects:
The invention adds an efficient network structure to the lower layers of the network to extract more global feature information, improving the detection of small objects. A penalty term added to the loss function prevents predicted boxes of the same class from overlapping when objects are densely packed, which would otherwise cause missed detections during non-maximum suppression post-processing, and thereby improves detection accuracy. In addition, the improved training strategy lowers the hardware requirements for training the network.
Brief description of the drawings
Fig. 1 is a structural diagram of the embodiment of the present invention.
Fig. 2 shows one-dimensional feature extraction by a convolutional layer and by an Atrous convolutional layer in the embodiment of the present invention.
Fig. 3 shows the RFB_a network structure of the embodiment of the present invention.
Fig. 4 shows the dense sampling of the embodiment of the present invention.
Fig. 5 compares the detection results of specific embodiment 1 with those of the original DSOD.
Specific embodiment
The present invention is further described below with reference to the drawings and embodiments.
As shown in Fig. 1, this embodiment provides a method for improving a DSOD network, comprising the following steps:
Step S1: An image from the data set is taken as the input image and fed to the input layer. The input image is preprocessed by cropping, mirroring, and mean subtraction to obtain the preprocessed image, and the absolute coordinates in the preprocessed image are converted to relative coordinates by normalization.
Step S2: An RFB_a network module is added after the second transition layer of the feature extraction sub-network in the DSOD network. The image preprocessed in step S1 is fed to the feature extraction sub-network of the DSOD network for feature extraction. The feature map of the second transition layer of the feature extraction sub-network is fed to the RFB_a network module, where Atrous (dilated) convolutions with different sampling rates extract features with different receptive fields. The extracted features are fed to a 3 × 3 convolutional layer to form the first scale prediction layer of the DSOD network.
Step S3: An Atrous convolutional layer with a preset sampling rate (6 in the present invention) is added after the feature extraction sub-network of the DSOD network, and the feature map of the feature extraction sub-network described in step S2 is fed to this Atrous convolutional layer to enlarge the receptive field of the feature map. The features produced by the Atrous convolutional layer are then fed to the multi-scale prediction layers of the DSOD network to form 5 scale prediction layers.
Step S4: The features of the first scale prediction layer described in step S2 and of the 5 scale prediction layers described in step S3 are fed to the multi-task loss function L with an added IOG penalty term.
Step S5: The learning rate is set with a warm-up strategy, and the weights of all network layers in the DSOD network are optimized by gradient descent. A suitable batch size (16 in the present invention) is set to reduce the hardware requirements for training the DSOD network.
In this embodiment, the image preprocessing in step S1 is specifically as follows:
Step S11: Crop the input image. First randomly select the width and height of the cropped image, then randomly select one of 0.1, 0.3, 0.7, and 0.9 as the Jaccard coefficient threshold, and use the Jaccard coefficient to measure the similarity between every ground-truth box in the original image and the cropped image.
Step S12: Determine whether the Jaccard coefficient between a ground-truth box and the cropped image exceeds the threshold randomly selected in step S11. If at least one ground-truth box has a Jaccard coefficient with the cropped image greater than the selected threshold, and the centre of that ground-truth box falls inside the cropped image, the cropped image meets the requirement; otherwise return to step S11. The Jaccard coefficient is calculated as follows:
[formula not reproduced in this text]
where N is the number of ground-truth boxes in the image, box_i is the area of the i-th ground-truth box, box_cut is the area of the cropped image, and the operator ∩ denotes the overlapping area.
Step S13: Mirror the cropped image left to right with a preset probability T (T = 0.5 in the present invention), and resize the mirrored image to a resolution of 300 × 300.
Step S14: Apply mean subtraction to the mirrored image to obtain the mean-subtracted image.
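The crop test in steps S11 and S12 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: boxes are assumed to be (xmin, ymin, xmax, ymax) tuples, the Jaccard coefficient is taken to be the usual intersection area divided by union area, and the crop size range, the retry limit `max_tries`, and the fallback to the full image are hypothetical choices that the text does not specify.

```python
import random

JACCARD_THRESHOLDS = (0.1, 0.3, 0.7, 0.9)  # candidate thresholds named in step S11


def area(box):
    """Area of an (xmin, ymin, xmax, ymax) box; zero if the box is empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def jaccard(box_a, box_b):
    """Intersection area divided by union area (standard Jaccard coefficient)."""
    inter = area((max(box_a[0], box_b[0]), max(box_a[1], box_b[1]),
                  min(box_a[2], box_b[2]), min(box_a[3], box_b[3])))
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0


def crop_is_valid(crop, gt_boxes, threshold):
    """Step S12: some ground-truth box must exceed the threshold and
    have its centre inside the crop."""
    for box in gt_boxes:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        centre_inside = crop[0] <= cx <= crop[2] and crop[1] <= cy <= crop[3]
        if centre_inside and jaccard(box, crop) > threshold:
            return True
    return False


def sample_crop(width, height, gt_boxes, rng=random, max_tries=50):
    """Steps S11-S12: draw random crops until one passes the test.
    The size range, max_tries and the full-image fallback are assumptions."""
    for _ in range(max_tries):
        threshold = rng.choice(JACCARD_THRESHOLDS)
        w = rng.uniform(0.3 * width, width)
        h = rng.uniform(0.3 * height, height)
        x0 = rng.uniform(0, width - w)
        y0 = rng.uniform(0, height - h)
        crop = (x0, y0, x0 + w, y0 + h)
        if crop_is_valid(crop, gt_boxes, threshold):
            return crop
    return (0.0, 0.0, float(width), float(height))


def to_relative(box, crop):
    """Step S1: express absolute box coordinates relative to the crop."""
    w, h = crop[2] - crop[0], crop[3] - crop[1]
    return ((box[0] - crop[0]) / w, (box[1] - crop[1]) / h,
            (box[2] - crop[0]) / w, (box[3] - crop[1]) / h)
```

Mirroring with probability T and mean subtraction (steps S13 and S14) act on pixel data and are left out of this sketch.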
In this embodiment, step S2 is specifically as follows. Each branch of the RFB_a network first uses a 1 × 1 convolutional layer to reduce the number of feature channels. The first branch uses a convolutional layer with stride 1 and a 3 × 3 kernel to obtain features with a 3 × 3 receptive field. The second branch uses a 1 × 3 convolutional layer and an Atrous convolutional layer with sampling rate 3 to obtain features with a 1 × 7 receptive field. The third branch uses a 3 × 1 convolutional layer and an Atrous convolutional layer with sampling rate 3 to obtain features with a 7 × 1 receptive field. The fourth branch uses a 3 × 3 convolutional layer and an Atrous convolutional layer with sampling rate 5 to obtain features with an 11 × 11 receptive field. The features extracted by the branches are fused by channel concatenation followed by a 1 × 1 convolutional layer. Finally, the fused features are combined with the features produced by the second transition layer of the DSOD network through a residual connection to form the final output features.
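The receptive-field sizes quoted for the branches (3, 7, and 11) match the span covered by a single 3-tap kernel whose taps are spaced by the sampling rate. A small sketch of that arithmetic follows; the function name `effective_kernel_size` is illustrative, and the calculation considers one Atrous layer in isolation rather than the full stack of layers in a branch.

```python
def effective_kernel_size(kernel_size, rate):
    """Number of input positions spanned along one axis by a kernel
    whose taps are `rate` positions apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Sampling rates used by the RFB_a branches described above.
for rate in (1, 3, 5):
    k = effective_kernel_size(3, rate)
    print(f"3-tap kernel, rate {rate}: spans {k} input positions")
```

With rate 1 the span is the ordinary kernel size, which is why the first branch is described as a plain 3 × 3 convolution.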
In this embodiment, the Atrous convolutional layer with the preset sampling rate described in step S3 is added as follows. First, the number of output channels C of the feature extraction sub-network in the DSOD network is increased to extract richer feature information. Then an Atrous convolutional layer with a given sampling rate r is added, whose number of output channels equals the number of output channels of the original DSOD feature extraction sub-network, so that the Atrous convolution can be embedded into the DSOD network. Finally, a 1 × 1 convolutional layer is added for feature fusion.
In this embodiment, the multi-task loss function L with the added IOG penalty term described in step S4 is specifically as follows:
Step S41: For each predicted box p output by the DSOD network, find among all ground-truth boxes G the ground-truth box g_iou_max whose intersection over union (IoU) with p is largest. The formula is as follows:
[formula not reproduced in this text]
where g denotes a ground-truth box, G the set of all ground-truth boxes, p a predicted box, P the set of all predicted boxes, box_g the area of a ground-truth box, and box_p the area of a predicted box.
Step S42: Remove the ground-truth box with the largest IoU found in step S41, then compute the maximum IOG penalty term between the predicted box and the remaining ground-truth boxes, and take this maximum IOG penalty term as the loss function L_iog. The formula is as follows:
[formula not reproduced in this text]
Step S43: Fuse the loss function L_iog with the localization loss function L_loc and the classification loss function L_conf by weighting to form the final multi-task loss function L. The formula is as follows:
[formula not reproduced in this text]
where N is the number of detected positive samples and α is the weight of the localization loss L_loc. The localization loss L_loc uses the smooth_L1 loss, and the classification loss L_conf is computed with cross-entropy. The localization loss L_loc is calculated as follows:
[formula not reproduced in this text]
where the indicator function indicates that the i-th default box is matched to the j-th ground-truth box of category k, l denotes the position coordinates of the predicted box, Pos denotes the default boxes that are positive samples, and N denotes the number of positive samples. smooth_L1 is calculated as follows:
[formula not reproduced in this text]
The classification loss L_conf is calculated as follows:
[formula not reproduced in this text]
where c denotes the confidence of each category, Neg denotes the negative samples, p denotes the category, and 0 denotes the background category. The probability that the i-th predicted box belongs to category p is calculated as follows:
[formula not reproduced in this text]
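Steps S41 and S42 can be illustrated with a short Python sketch. Several points here are assumptions rather than statements from the text: IOG is taken to mean intersection area divided by the area of the ground-truth box, boxes are (xmin, ymin, xmax, ymax) tuples, and the per-box penalties are averaged over the predicted boxes, since the exact form of L_iog is not reproduced above. The function name `iog_penalty` is illustrative.

```python
def area(box):
    """Area of an (xmin, ymin, xmax, ymax) box; zero if the box is empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a, b):
    """Overlapping area of two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(a, b):
    """Intersection over union of two boxes."""
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def iog(pred, gt):
    """Intersection over ground-truth area (assumed meaning of IOG)."""
    gt_area = area(gt)
    return intersection(pred, gt) / gt_area if gt_area > 0 else 0.0


def iog_penalty(pred_boxes, gt_boxes):
    """For each predicted box, drop its best-IoU ground-truth box (S41),
    then take the largest IOG with the remaining boxes (S42).
    Averaging over predictions is an assumption of this sketch."""
    if len(gt_boxes) < 2 or not pred_boxes:
        return 0.0
    total = 0.0
    for p in pred_boxes:
        matched = max(range(len(gt_boxes)), key=lambda j: iou(p, gt_boxes[j]))
        others = [g for j, g in enumerate(gt_boxes) if j != matched]
        total += max(iog(p, g) for g in others)
    return total / len(pred_boxes)
```

A predicted box that sits exactly on its own target and does not touch any neighbouring ground-truth box contributes zero, while a box that drifts toward a neighbour is penalized in proportion to how much of that neighbour it covers.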
In this embodiment, setting the learning rate with the warm-up strategy described in step S5 is specifically as follows. The initial learning rate is set to 10^-5 and increased linearly to 10^-2 over the first 5 epochs. The learning rate is divided by 10 at the 75th, 125th, and 175th epochs, and training finishes at the 200th epoch. The batch normalization weights are initialized to 0.5 and the biases to 0. All convolutions are initialized with the Xavier method. With this improved training strategy, the training batch size is reduced from 128 to 16, lowering the hardware requirements for training the network.
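The schedule described above can be written as a single function of the epoch index. The sketch below assumes zero-based epoch numbering and a per-epoch (rather than per-iteration) warm-up step, neither of which is stated in the text; the function and parameter names are illustrative.

```python
def learning_rate(epoch, warmup_epochs=5, start_lr=1e-5, base_lr=1e-2,
                  milestones=(75, 125, 175)):
    """Learning rate for a given zero-based epoch index.

    Linear warm-up from start_lr to base_lr over the first warmup_epochs,
    then a division by 10 at each milestone epoch.
    """
    if epoch < warmup_epochs:
        return start_lr + (base_lr - start_lr) * epoch / warmup_epochs
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / (10 ** drops)


# Print the rate at a few representative epochs of the 200-epoch run.
for epoch in (0, 2, 5, 74, 75, 125, 175, 199):
    print(epoch, learning_rate(epoch))
```

Starting from a very small rate is a common way to keep early updates stable when the batch size is small, which fits the stated goal of training with a batch size of 16.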
Preferably, in this embodiment the input image is preprocessed by cropping, mirroring, and mean subtraction. The preprocessed image is fed to the DSOD feature extraction sub-network, and an RFB_a network module is added after the second transition layer of the feature extraction sub-network. Atrous convolutions with different sampling rates in the RFB_a network extract features with different receptive fields, providing features with more global information for the small-object detection step. An Atrous convolutional layer with a sampling rate of 6 is added after the feature extraction sub-network to enlarge the receptive field of the feature map and provide richer semantic information for the subsequent multi-scale prediction layers. The features produced by the Atrous convolutional layer are fed to the multi-scale prediction layers, whose outputs are fed to the loss function. An IOG penalty term is added to the loss function to prevent predicted boxes of the same class from overlapping when densely packed objects of the same type are predicted, thereby avoiding missed detections after non-maximum suppression. In the training stage, the learning rate is set with a warm-up strategy, and a suitable batch size lowers the hardware requirements for training the network. Experimental results show that, compared with the original DSOD algorithm, the invention achieves higher detection accuracy and better detection of small objects while lowering the hardware requirements for training.
Fig. 2 shows one-dimensional feature extraction by a standard convolutional layer and by an Atrous convolutional layer. When the sampling rate r is 1, the Atrous convolution is simply a standard convolution. When the sampling rate r is 2 and the padding is 2, r − 1 zeros are inserted between the input signals, and after the Atrous convolution 3 input signals produce 5 signal responses. The figure shows that an Atrous convolutional layer enlarges the receptive field of the convolution kernel. The two-dimensional Atrous convolution used in this embodiment has the same effect as the one-dimensional Atrous convolution.
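A one-dimensional Atrous convolution is easy to write out directly. The sketch below follows the common formulation in which the kernel taps are spaced r positions apart, which is one way to read the description of Fig. 2; the function name and the example signal are illustrative and do not come from the figure.

```python
def atrous_conv1d(signal, kernel, rate=1, padding=0):
    """1-D convolution (cross-correlation form) with kernel taps spaced
    `rate` positions apart and zero padding on both ends."""
    padded = [0.0] * padding + list(signal) + [0.0] * padding
    span = (len(kernel) - 1) * rate + 1  # input positions covered by the kernel
    out = []
    for start in range(len(padded) - span + 1):
        out.append(sum(w * padded[start + i * rate]
                       for i, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5]
kernel = [1, 1, 1]
print(atrous_conv1d(signal, kernel, rate=1, padding=1))  # standard convolution
print(atrous_conv1d(signal, kernel, rate=2, padding=2))  # same kernel, wider view
```

In both calls the output has the same length as the input, but with rate 2 each output value draws on a window of 5 input positions instead of 3, with no extra kernel weights.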
Fig. 3 shows the network structure of a standard RFB_a. The RFB_a network module is a multi-branch convolutional structure. In the different branches, features with receptive fields of different sizes, extracted by Atrous convolutions with different sampling rates, are fused by channel concatenation, producing a dense sampling effect on the original feature map, as shown in Fig. 4. In this embodiment, a ReLU activation function is added to the last Atrous convolution of each branch of the standard RFB_a structure in order to extract higher-level features. In addition, to stay consistent with the DSOD network structure, this embodiment moves the batch normalization (BN) and ReLU activation function in the RFB_a network to before the convolutional layers.
Embodiment 1: As shown in Fig. 5, an object detection analysis tool is used to analyze how well DSOD and the improved method detect extra-small (XS) objects. Fig. 5 shows that, except for the table category, the detection accuracy of the improved DSOD method increases to varying degrees in categories such as aeroplane, bicycle, and bird. In images of the table category, items such as cups placed on the table partially occlude it, which strongly affects detection by the improved DSOD, so its accuracy there is lower than that of the original DSOD algorithm. Overall, the improved method of this embodiment detects small objects more accurately.
Embodiment 2: On the PASCAL VOC2007 test set, the improved DSOD is compared with several other typical regression-based object detection algorithms in detection accuracy and detection speed. The main metrics are mAP (mean Average Precision) and FPS (Frames Per Second), and * marks results measured in the experimental environment of this embodiment. The data in the table show that the improved DSOD model is more accurate, with detection accuracy rising from 77.4% to 79.0%. Compared with DSSD, the improved DSOD is better in both detection accuracy and detection speed. Because RFBNet300 uses multiple RFB network blocks, it can extract more global features, and its accuracy is roughly the same as that of the improved method of this embodiment. However, the computational complexity of RFBNet300 is higher, so the improved method has better real-time performance.
The foregoing describes only preferred embodiments of the present invention. All equivalent changes and modifications made within the scope of the claims of the present invention shall be covered by the present invention.
Claims (6)
1. A method for improving a DSOD network, characterized by comprising the following steps:
Step S1: An image from the data set is taken as the input image and fed to the input layer. The input image is preprocessed by cropping, mirroring, and mean subtraction to obtain the preprocessed image, and the absolute coordinates in the preprocessed image are converted to relative coordinates by normalization.
Step S2: An RFB_a network module is added after the second transition layer of the feature extraction sub-network in the DSOD network. The image preprocessed in step S1 is fed to the feature extraction sub-network of the DSOD network for feature extraction. The feature map of the second transition layer of the feature extraction sub-network is fed to the RFB_a network module, where Atrous (dilated) convolutions with different sampling rates extract features with different receptive fields. The extracted features are fed to a 3 × 3 convolutional layer to form the first scale prediction layer of the DSOD network.
Step S3: An Atrous convolutional layer with a preset sampling rate is added after the feature extraction sub-network of the DSOD network, and the feature map of the feature extraction sub-network described in step S2 is fed to this Atrous convolutional layer to enlarge the receptive field of the feature map. The features produced by the Atrous convolutional layer are then fed to the multi-scale prediction layers of the DSOD network to form 5 scale prediction layers.
Step S4: The features of the first scale prediction layer described in step S2 and of the 5 scale prediction layers described in step S3 are fed to the multi-task loss function L with an added IOG penalty term.
Step S5: The learning rate is set with a warm-up strategy, and the weights of all network layers in the DSOD network are optimized by gradient descent. A suitable batch size is set to reduce the hardware requirements for training the DSOD network.
2. The method for improving a DSOD network according to claim 1, characterized in that the image preprocessing in step S1 is specifically as follows:
Step S11: Crop the input image. First randomly select the width and height of the cropped image, then randomly select one of 0.1, 0.3, 0.7, and 0.9 as the Jaccard coefficient threshold, and use the Jaccard coefficient to measure the similarity between every ground-truth box in the original image and the cropped image.
Step S12: Determine whether the Jaccard coefficient between a ground-truth box and the cropped image exceeds the threshold randomly selected in step S11. If at least one ground-truth box has a Jaccard coefficient with the cropped image greater than the selected threshold, and the centre of that ground-truth box falls inside the cropped image, the cropped image meets the requirement; otherwise return to step S11. The Jaccard coefficient is calculated as follows:
[formula not reproduced in this text]
where N is the number of ground-truth boxes in the image, box_i is the area of the i-th ground-truth box, box_cut is the area of the cropped image, and the operator ∩ denotes the overlapping area.
Step S13: Mirror the cropped image left to right with a preset probability T, and resize the mirrored image to a resolution of 300 × 300.
Step S14: Apply mean subtraction to the mirrored image to obtain the mean-subtracted image.
3. The method for improving a DSOD network according to claim 1, characterized in that step S2 is specifically as follows: each branch of the RFB_a network first uses a 1 × 1 convolutional layer to reduce the number of feature channels; the first branch uses a convolutional layer with stride 1 and a 3 × 3 kernel to obtain features with a 3 × 3 receptive field; the second branch uses a 1 × 3 convolutional layer and an Atrous convolutional layer with sampling rate 3 to obtain features with a 1 × 7 receptive field; the third branch uses a 3 × 1 convolutional layer and an Atrous convolutional layer with sampling rate 3 to obtain features with a 7 × 1 receptive field; the fourth branch uses a 3 × 3 convolutional layer and an Atrous convolutional layer with sampling rate 5 to obtain features with an 11 × 11 receptive field; the features extracted by the branches are fused by channel concatenation followed by a 1 × 1 convolutional layer; finally, the fused features are combined with the features produced by the second transition layer of the DSOD network through a residual connection to form the final output features.
4. The method for improving a DSOD network according to claim 1, characterized in that the Atrous convolutional layer with the preset sampling rate described in step S3 is added as follows: first, the number of output channels C of the feature extraction sub-network in the DSOD network is increased to extract richer feature information; then an Atrous convolutional layer with a given sampling rate r is added, whose number of output channels equals the number of output channels of the original DSOD feature extraction sub-network, so that the Atrous convolution can be embedded into the DSOD network; finally, a 1 × 1 convolutional layer is added for feature fusion.
5. The method for improving a DSOD network according to claim 1, characterized in that the multi-task loss function L with the IOG penalty term added in step S4 is specifically constructed as follows:
Step S41: for each prediction box p of a default sample output by the DSOD network, compute its intersection-over-union (IoU) with all ground-truth boxes G and take the ground-truth box with the largest IoU, $g_{iou\_max}$; the formula is as follows:
Where g denotes a ground-truth box, G denotes the set of all ground-truth boxes, p denotes a prediction box, P denotes the set of all prediction boxes, $box_g$ denotes the area of the ground-truth box, and $box_p$ denotes the area of the prediction box;
Step S42: remove the ground-truth box with the largest IoU found in step S41, then compute the maximum IOG penalty term between the prediction box and the remaining ground-truth boxes, and use this maximum IOG penalty term as the loss function $L_{iog}$; the calculation formula is as follows:
Step S43: combine the loss function $L_{iog}$ with the localization loss function $L_{loc}$ and the classification loss function $L_{conf}$ by weighted fusion to form the final multi-task loss function L; the formula is as follows:
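Where N denotes the number of detected positive samples and α denotes the weight of the localization loss $L_{loc}$; the localization loss function $L_{loc}$ uses the smooth L1 loss; the classification loss function $L_{conf}$ is computed with cross-entropy; the localization loss function $L_{loc}$ is calculated as follows:
$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^{k}\, \mathrm{smooth}_{L1}\left(l_i^{m} - \hat{g}_j^{m}\right)$$
Where the indicator function $x_{ij}^{k}$ indicates that the i-th default box is matched to the j-th ground-truth box of class k, l denotes the position coordinates of the prediction box, Pos denotes the default boxes that are positive samples, and N denotes the number of positive samples; $\mathrm{smooth}_{L1}$ is calculated as follows:
$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5\,x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$
The classification loss $L_{conf}$ is calculated as follows:
$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^{p} \log\left(\hat{c}_i^{p}\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^{0}\right)$$
Where c denotes the confidence of each class, Neg denotes the negative samples, p denotes the class, and 0 denotes the background class. $\hat{c}_i^{p}$ denotes the probability that the i-th prediction box belongs to class p, and is calculated as follows:
$$\hat{c}_i^{p} = \frac{\exp\left(c_i^{p}\right)}{\sum_{p} \exp\left(c_i^{p}\right)}$$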
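Steps S41 and S42 of claim 5 can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the patent's formula: boxes are assumed to be axis-aligned tuples `(x1, y1, x2, y2)`, IoG is assumed to mean intersection area divided by ground-truth box area, and all function names are hypothetical. How the resulting term is weighted inside L is not shown here.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(pred: Box, gt: Box) -> float:
    inter = intersection(pred, gt)
    union = area(pred) + area(gt) - inter
    return inter / union if union > 0 else 0.0


def iog(pred: Box, gt: Box) -> float:
    # Assumption: IoG = intersection area / ground-truth area.
    gt_area = area(gt)
    return intersection(pred, gt) / gt_area if gt_area > 0 else 0.0


def iog_penalty(pred: Box, gts: Sequence[Box]) -> float:
    """S41: find the ground-truth box with the largest IoU.
    S42: drop it and return the largest IoG with the remaining boxes."""
    if len(gts) < 2:
        return 0.0  # no other ground-truth box to overlap with
    best = max(range(len(gts)), key=lambda i: iou(pred, gts[i]))
    return max(iog(pred, g) for i, g in enumerate(gts) if i != best)


if __name__ == "__main__":
    prediction = (0.0, 0.0, 2.0, 2.0)
    truths = [(0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0)]
    print(iog_penalty(prediction, truths))
```

A prediction that sits on its own target but also covers half of a neighbouring ground-truth box receives a non-zero penalty, which is the overlap the IOG term is meant to discourage for densely packed objects of the same class.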
6. The method for improving a DSOD network according to claim 1, characterized in that setting the learning rate with the warm-up strategy in step S5 specifically comprises: setting the initial learning rate to 10^-5 and increasing the learning rate linearly to 10^-2 over the first 5 epochs; dividing the learning rate by 10 at the 75th, 125th, and 175th epochs; and completing training at the 200th epoch; the initial value of the batch normalization weights is set to 0.5 and the bias values are set to 0; all convolutions are initialized with the Xavier method; by improving the training strategy, the training batch size is reduced from 128 to 16, thereby lowering the hardware requirements for training the network.
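The schedule in claim 6 can be written as a small function of the epoch index. This is a sketch under assumptions the claim does not settle: epochs are counted from 0, the learning rate is updated once per epoch rather than per iteration, and the names `learning_rate`, `WARMUP_EPOCHS`, and `MILESTONES` are hypothetical.

```python
START_LR = 1e-5        # initial learning rate from claim 6
PEAK_LR = 1e-2         # learning rate reached after warm-up
WARMUP_EPOCHS = 5      # length of the linear warm-up
MILESTONES = (75, 125, 175)  # epochs at which the rate is divided by 10
TOTAL_EPOCHS = 200     # training ends here


def learning_rate(epoch: int) -> float:
    """Learning rate for a 0-based epoch index (assumed convention)."""
    if not 0 <= epoch < TOTAL_EPOCHS:
        raise ValueError("epoch must be in [0, TOTAL_EPOCHS)")
    if epoch < WARMUP_EPOCHS:
        # Linear interpolation from START_LR up to PEAK_LR.
        fraction = epoch / WARMUP_EPOCHS
        return START_LR + fraction * (PEAK_LR - START_LR)
    # Count how many milestones have been passed and divide by 10 for each.
    drops = sum(epoch >= m for m in MILESTONES)
    return PEAK_LR / (10 ** drops)


if __name__ == "__main__":
    for e in (0, 2, 5, 74, 75, 125, 175, 199):
        print(e, learning_rate(e))
```

Starting from a very small rate and ramping up is what lets training remain stable with the reduced batch size of 16 that the claim describes.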
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910029814.0A CN109784476B (en) | 2019-01-12 | 2019-01-12 | Method for improving DSOD network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109784476A (en) | 2019-05-21 |
CN109784476B (en) | 2022-08-16 |
Family
ID=66500412
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910029814.0A Active CN109784476B (en) | 2019-01-12 | 2019-01-12 | Method for improving DSOD network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109784476B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110348390A (en) * | 2019-07-12 | 2019-10-18 | 创新奇智(重庆)科技有限公司 | A kind of training method, computer-readable medium and the system of fire defector model |
CN110348423A (en) * | 2019-07-19 | 2019-10-18 | 西安电子科技大学 | A kind of real-time face detection method based on deep learning |
CN110378232A (en) * | 2019-06-20 | 2019-10-25 | 陕西师范大学 | The examination hall examinee position rapid detection method of improved SSD dual network |
CN110443172A (en) * | 2019-07-25 | 2019-11-12 | 北京科技大学 | A kind of object detection method and system based on super-resolution and model compression |
CN110503112A (en) * | 2019-08-27 | 2019-11-26 | 电子科技大学 | A kind of small target deteection of Enhanced feature study and recognition methods |
CN110580445A (en) * | 2019-07-12 | 2019-12-17 | 西北工业大学 | Face key point detection method based on GIoU and weighted NMS improvement |
CN110647817A (en) * | 2019-08-27 | 2020-01-03 | 江南大学 | Real-time face detection method based on MobileNet V3 |
CN110852330A (en) * | 2019-10-23 | 2020-02-28 | 天津大学 | Behavior identification method based on single stage |
CN111027512A (en) * | 2019-12-24 | 2020-04-17 | 北方工业大学 | Remote sensing image shore-approaching ship detection and positioning method and device |
CN111079753A (en) * | 2019-12-20 | 2020-04-28 | 长沙千视通智能科技有限公司 | License plate recognition method and device based on deep learning and big data combination |
CN111539434A (en) * | 2020-04-10 | 2020-08-14 | 南京理工大学 | Infrared weak and small target detection method based on similarity |
CN111680556A (en) * | 2020-04-29 | 2020-09-18 | 平安国际智慧城市科技股份有限公司 | Method, device and equipment for identifying vehicle type at traffic gate and storage medium |
CN112525919A (en) * | 2020-12-21 | 2021-03-19 | 福建新大陆软件工程有限公司 | Wood board defect detection system and method based on deep learning |
CN112614107A (en) * | 2020-12-23 | 2021-04-06 | 北京澎思科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112819008A (en) * | 2021-01-11 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Method, device, medium and electronic equipment for optimizing instance detection network |
CN113096023A (en) * | 2020-01-08 | 2021-07-09 | 字节跳动有限公司 | Neural network training method, image processing method and device, and storage medium |
CN115760990A (en) * | 2023-01-10 | 2023-03-07 | 华南理工大学 | Identification and positioning method of pineapple pistil, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170046616A1 (en) * | 2015-08-15 | 2017-02-16 | Salesforce.Com, Inc. | Three-dimensional (3d) convolution with 3d batch normalization |
CN108564097A (en) * | 2017-12-05 | 2018-09-21 | 华南理工大学 | A kind of multiscale target detection method based on depth convolutional neural networks |
CN108921196A (en) * | 2018-06-01 | 2018-11-30 | 南京邮电大学 | A kind of semantic segmentation method for improving full convolutional neural networks |
CN109035184A (en) * | 2018-06-08 | 2018-12-18 | 西北工业大学 | A kind of intensive connection method based on the deformable convolution of unit |
Non-Patent Citations (3)
Title |
---|
LIANG-CHIEH CHEN ET AL.: "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 * |
ZHIQIANG SHEN ET AL.: "DSOD: Learning Deeply Supervised Object Detectors from Scratch", 《2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 * |
WU JIANYAO ET AL.: "An Improved DSOD Object Detection Algorithm", 《SEMICONDUCTOR OPTOELECTRONICS》 * |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110378232B (en) * | 2019-06-20 | 2022-12-27 | 陕西师范大学 | Improved test room examinee position rapid detection method of SSD dual-network |
CN110378232A (en) * | 2019-06-20 | 2019-10-25 | 陕西师范大学 | The examination hall examinee position rapid detection method of improved SSD dual network |
CN110580445B (en) * | 2019-07-12 | 2023-02-07 | 西北工业大学 | Face key point detection method based on GIoU and weighted NMS improvement |
CN110580445A (en) * | 2019-07-12 | 2019-12-17 | 西北工业大学 | Face key point detection method based on GIoU and weighted NMS improvement |
CN110348390A (en) * | 2019-07-12 | 2019-10-18 | 创新奇智(重庆)科技有限公司 | A kind of training method, computer-readable medium and the system of fire defector model |
CN110348423A (en) * | 2019-07-19 | 2019-10-18 | 西安电子科技大学 | A kind of real-time face detection method based on deep learning |
CN110443172A (en) * | 2019-07-25 | 2019-11-12 | 北京科技大学 | A kind of object detection method and system based on super-resolution and model compression |
CN110647817A (en) * | 2019-08-27 | 2020-01-03 | 江南大学 | Real-time face detection method based on MobileNet V3 |
CN110503112A (en) * | 2019-08-27 | 2019-11-26 | 电子科技大学 | A kind of small target deteection of Enhanced feature study and recognition methods |
CN110503112B (en) * | 2019-08-27 | 2023-02-03 | 电子科技大学 | Small target detection and identification method for enhancing feature learning |
CN110647817B (en) * | 2019-08-27 | 2022-04-05 | 江南大学 | Real-time face detection method based on MobileNet V3 |
CN110852330A (en) * | 2019-10-23 | 2020-02-28 | 天津大学 | Behavior identification method based on single stage |
CN111079753B (en) * | 2019-12-20 | 2023-08-22 | 长沙千视通智能科技有限公司 | License plate recognition method and device based on combination of deep learning and big data |
CN111079753A (en) * | 2019-12-20 | 2020-04-28 | 长沙千视通智能科技有限公司 | License plate recognition method and device based on deep learning and big data combination |
CN111027512B (en) * | 2019-12-24 | 2023-04-18 | 北方工业大学 | Remote sensing image quayside ship detection and positioning method and device |
CN111027512A (en) * | 2019-12-24 | 2020-04-17 | 北方工业大学 | Remote sensing image shore-approaching ship detection and positioning method and device |
CN113096023A (en) * | 2020-01-08 | 2021-07-09 | 字节跳动有限公司 | Neural network training method, image processing method and device, and storage medium |
CN113096023B (en) * | 2020-01-08 | 2023-10-27 | 字节跳动有限公司 | Training method, image processing method and device for neural network and storage medium |
CN111539434B (en) * | 2020-04-10 | 2022-09-20 | 南京理工大学 | Infrared weak and small target detection method based on similarity |
CN111539434A (en) * | 2020-04-10 | 2020-08-14 | 南京理工大学 | Infrared weak and small target detection method based on similarity |
CN111680556A (en) * | 2020-04-29 | 2020-09-18 | 平安国际智慧城市科技股份有限公司 | Method, device and equipment for identifying vehicle type at traffic gate and storage medium |
CN111680556B (en) * | 2020-04-29 | 2024-06-07 | 平安国际智慧城市科技股份有限公司 | Method, device, equipment and storage medium for identifying traffic gate vehicle type |
CN112525919A (en) * | 2020-12-21 | 2021-03-19 | 福建新大陆软件工程有限公司 | Wood board defect detection system and method based on deep learning |
CN112614107A (en) * | 2020-12-23 | 2021-04-06 | 北京澎思科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112819008A (en) * | 2021-01-11 | 2021-05-18 | 腾讯科技(深圳)有限公司 | Method, device, medium and electronic equipment for optimizing instance detection network |
CN115760990A (en) * | 2023-01-10 | 2023-03-07 | 华南理工大学 | Identification and positioning method of pineapple pistil, electronic equipment and storage medium |
CN115760990B (en) * | 2023-01-10 | 2023-04-21 | 华南理工大学 | Pineapple pistil identification and positioning method, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN109784476B (en) | 2022-08-16 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109784476A (en) | A method of improving DSOD network | |
CN111259930B (en) | General target detection method of self-adaptive attention guidance mechanism | |
CN111401201B (en) | Aerial image multi-scale target detection method based on spatial pyramid attention drive | |
CN107609525B (en) | Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy | |
CN108416378A (en) | A kind of large scene SAR target identification methods based on deep neural network | |
CN111696128B (en) | High-speed multi-target detection tracking and target image optimization method and storage medium | |
CN108898047B (en) | Pedestrian detection method and system based on blocking and shielding perception | |
CN111091105A (en) | Remote sensing image target detection method based on new frame regression loss function | |
CN109948415A (en) | Remote sensing image object detection method based on filtering background and scale prediction | |
CN106980858A (en) | The language text detection of a kind of language text detection with alignment system and the application system and localization method | |
CN111027481B (en) | Behavior analysis method and device based on human body key point detection | |
JPH06333054A (en) | System for detecting target pattern within image | |
CN105844627A (en) | Sea surface object image background inhibition method based on convolution nerve network | |
CN112396619B (en) | Small particle segmentation method based on semantic segmentation and internally complex composition | |
CN109840524A (en) | Kind identification method, device, equipment and the storage medium of text | |
CN110264444A (en) | Damage detecting method and device based on weak segmentation | |
CN111983676A (en) | Earthquake monitoring method and device based on deep learning | |
CN110111370A (en) | A kind of vision object tracking methods based on TLD and the multiple dimensioned space-time characteristic of depth | |
CN106127754A (en) | CME detection method based on fusion feature and space-time expending decision rule | |
CN104851102B (en) | A kind of infrared small target detection method based on human visual system | |
CN116542912A (en) | Flexible body bridge vibration detection model with multi-target visual tracking function and application | |
Duan et al. | An anchor box setting technique based on differences between categories for object detection | |
CN115410102A (en) | SAR image airplane target detection method based on combined attention mechanism | |
Yang et al. | Immature Yuzu citrus detection based on DSSD network with image tiling approach | |
CN112801955B (en) | Plankton detection method under unbalanced population distribution condition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||