CN109784476A - A method of improving DSOD network - Google Patents

Info

Publication number
CN109784476A
Authority
CN
China
Prior art keywords
network
dsod
image
feature
convolutional layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910029814.0A
Other languages
Chinese (zh)
Other versions
CN109784476B (en)
Inventor
程树英
吴建耀
郑茜颖
林培杰
陈志聪
吴丽君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University
Priority to CN201910029814.0A
Publication of CN109784476A
Application granted
Publication of CN109784476B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to a method for improving the DSOD network. The input image is first preprocessed, and the preprocessed image is fed into the DSOD feature extraction sub-network. An RFB_a network module is added after the second transition layer of the feature extraction sub-network, and Atrous convolutions with different sampling rates in the RFB_a network extract features with different receptive fields. An Atrous convolutional layer with a sampling rate of 6 is added after the feature extraction sub-network, the features it generates are fed into the multi-scale prediction layers, and the multi-scale prediction layers are fed into the loss function. An IOG penalty term is added to the loss function to prevent predicted boxes of the same class from overlapping when dense targets of the same class are predicted. In addition, the learning rate in the training stage is set with a warm-up strategy, and a suitable batch size is chosen, which reduces the hardware requirements for training the network. Compared with the original DSOD algorithm, the present invention achieves higher detection accuracy, improves the detection of small targets, and reduces the hardware requirements for training the network.

Description

A method of improving DSOD network
Technical field
The present invention relates to the field of computer vision, and in particular to a method for improving the DSOD network.
Background art
Target detection is one of the most important research topics in computer vision. Its main task is to locate the targets of interest in a given image and to determine the specific position of each target accurately. Target detection algorithms based on convolutional neural networks fall into two kinds: those based on region extraction and those based on regression. Algorithms based on region extraction achieve high detection accuracy, but because they must extract candidate regions, their detection speed can hardly reach real time. Regression-based algorithms such as SSD and DSOD remove the region extraction step and therefore detect in real time. However, the DSOD algorithm detects small targets poorly and places high demands on hardware when the network is trained.
Summary of the invention
In view of this, the purpose of the present invention is to propose a method for improving the DSOD network that improves the detection of small targets and raises target detection accuracy.
The present invention is realized by the following scheme: a method for improving the DSOD network, comprising the following steps:
Step S1: Take an image from the data set as the input image and feed it to the input layer; preprocess the input image by cropping, mirroring, and mean subtraction to obtain the preprocessed image, and use normalization to convert the absolute coordinates in the preprocessed image into relative coordinates;
Step S2: Add an RFB_a network module after the second transition layer of the feature extraction sub-network in the DSOD network; feed the image preprocessed in step S1 into the feature extraction sub-network of the DSOD network for feature extraction; feed the feature map of the second transition layer of the feature extraction sub-network into the RFB_a network module, where Atrous (dilated) convolutions with different sampling rates extract features with different receptive fields; feed the extracted features into a 3 × 3 convolutional layer to form the first scale prediction layer of the DSOD network;
Step S3: Add an Atrous convolutional layer with a preset sampling rate (a sampling rate of 6) after the feature extraction sub-network in the DSOD network, and feed the feature map of the feature extraction sub-network described in step S2 into this Atrous convolutional layer to enlarge the receptive field of the feature map; then feed the features generated by the Atrous convolutional layer into the multi-scale prediction layers of the DSOD network to form 5 scale prediction layers;
Step S4: Feed the features of the first scale prediction layer of the DSOD network described in step S2 and of the 5 scale prediction layers described in step S3 into the multi-task loss function L to which the IOG penalty term has been added;
Step S5: Set the learning rate with a warm-up strategy and optimize the weights of all network layers in the DSOD network with a gradient descent algorithm; set a suitable batch size (16 in the present invention) to reduce the hardware requirements for training the DSOD network.
Further, the image preprocessing in step S1 is specifically:
Step S11: Crop the input image: first randomly select the length and height of the crop taken from the input image, then randomly select one number from 0.1, 0.3, 0.7, 0.9 as the Jaccard coefficient threshold, and use the Jaccard coefficient to compute the similarity between every ground-truth box in the original image and the cropped image;
Step S12: Determine whether the Jaccard coefficient between a ground-truth box and the cropped image is greater than the Jaccard threshold randomly selected in step S11; if the Jaccard coefficient between at least one ground-truth box and the cropped image is greater than the selected threshold, and the center of that ground-truth box falls inside the cropped image, the cropped image meets the requirement; otherwise return to step S11. The Jaccard coefficient is calculated as follows:
Here N denotes the number of ground-truth boxes in the image, box_i denotes the area of the i-th ground-truth box, box_cut denotes the area of the cropped image, and the operator ∩ denotes the overlapping area.
Step S13: Mirror the cropped image left to right with a preset probability T (T = 0.5 in the present invention), and resize the result to a resolution of 300 × 300 to obtain the mirrored image.
Step S14: Apply mean subtraction to the mirrored image to obtain the mean-subtracted image.
Further, the specific content of step S2 is: each branch of the RFB_a network first uses a 1 × 1 convolutional layer to reduce the number of feature channels; the first branch uses a convolutional layer with a stride of 1 and a 3 × 3 kernel to obtain features with a 3 × 3 receptive field; the second branch uses a 1 × 3 convolutional layer and an Atrous convolutional layer with a sampling rate of 3 to obtain features with a 1 × 7 receptive field; the third branch uses a 3 × 1 convolutional layer and an Atrous convolutional layer with a sampling rate of 3 to obtain features with a 7 × 1 receptive field; the fourth branch uses a 3 × 3 convolutional layer and an Atrous convolutional layer with a sampling rate of 5 to obtain features with an 11 × 11 receptive field; the features extracted by the branches are fused by channel concatenation and a 1 × 1 convolutional layer; finally, the fused features and the features generated by the second transition layer of the DSOD network are combined through a residual connection to form the final output features.
Further, the Atrous convolutional layer with a preset sampling rate described in step S3 is added as follows: first, the number of output channels C of the feature extraction sub-network in the DSOD network is increased so as to extract richer feature information; then an Atrous convolutional layer with a given sampling rate r is added, whose number of output channels equals the number of output channels of the original DSOD feature extraction sub-network, so that the Atrous convolution is embedded into the DSOD network; finally, a 1 × 1 convolutional layer is added for feature fusion.
Further, the multi-task loss function L with the IOG penalty term described in step S4 is specifically:
Step S41: Compute the box g_iou_max that has the largest area intersection over union (IoU) between a predicted box p output by the DSOD network and all ground-truth boxes G of the default samples; the formula is as follows:
Here g denotes a ground-truth box, G denotes the set of all ground-truth boxes, p denotes a predicted box, P denotes the set of all predicted boxes, box_g denotes the area of the ground-truth box, and box_p denotes the area of the predicted box;
Step S42: Remove the ground-truth box with the largest IoU found in step S41, then compute the maximum IOG penalty term between the predicted box and the remaining ground-truth boxes, and use this maximum IOG penalty term as the loss function L_iog; the calculation formula is as follows:
Step S43: Combine the loss function L_iog with the localization loss function L_loc and the classification loss function L_conf by weighted fusion to form the final multi-task loss function L; the formula is as follows:
Here N denotes the number of detected positive samples and α denotes the weight of the localization loss L_loc; the localization loss function L_loc uses the smooth_L1 loss; the classification loss function L_conf is computed with cross entropy; the localization loss function L_loc is calculated as follows:
Here the indicator function denotes that the i-th default box is matched to the j-th ground-truth box of class k, l denotes the position coordinates of the predicted box, pos denotes the positive default boxes, and N denotes the number of positive samples; smooth_L1 is calculated as follows:
The classification loss L_conf is calculated as follows:
Here c denotes the confidence of each class, Neg denotes the negative samples, p denotes the class, and 0 denotes the background class. The probability that the i-th predicted box belongs to class p is calculated as follows:
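The formula images referenced in steps S41 to S43 are not present in this text. The symbol definitions above match the SSD multibox loss, so the standard forms below are offered as a plausible reading rather than as the patent's own equations. The IoU expression, the symbols x_{ij}^{k}, \hat{g}, and \hat{c}, and the weight \beta on L_iog are all assumptions; the text only names α, so how L_iog enters the sum cannot be recovered from it.

```latex
\mathrm{IoU}(p, g) = \frac{box_p \cap box_g}{box_p + box_g - box_p \cap box_g},
\qquad
g_{iou\_max} = \arg\max_{g \in G} \mathrm{IoU}(p, g)

smooth_{L1}(x) =
\begin{cases}
0.5\,x^{2} & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}

L_{loc} = \sum_{i \in pos} \; \sum_{m \in \{cx,\, cy,\, w,\, h\}} x_{ij}^{k}\, smooth_{L1}\!\left(l_{i}^{m} - \hat{g}_{j}^{m}\right)

L_{conf} = -\sum_{i \in pos} x_{ij}^{p} \log \hat{c}_{i}^{p} \; - \sum_{i \in Neg} \log \hat{c}_{i}^{0},
\qquad
\hat{c}_{i}^{p} = \frac{\exp\!\left(c_{i}^{p}\right)}{\sum_{q} \exp\!\left(c_{i}^{q}\right)}

L = \frac{1}{N}\left(L_{conf} + \alpha\, L_{loc}\right) + \beta\, L_{iog}
```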
Further, setting the learning rate with the warm-up strategy described in step S5 is specifically: the initial learning rate is set to 10^-5 and increased linearly to 10^-2 over the first 5 epochs; the learning rate is divided by 10 at the 75th, 125th, and 175th epochs, and training finishes at the 200th epoch; the initial value of the batch normalization weights is set to 0.5 and the biases are set to 0; all convolutions are initialized with the Xavier method; with the improved training strategy, the training batch size is reduced from 128 to 16, which lowers the hardware requirements for training the network.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention adds an efficient network structure to the lower network layers to extract more global feature information, which improves the detection of small targets. A penalty term added to the loss function prevents predicted boxes of the same class from overlapping when targets are dense, which would otherwise cause missed detections during non-maximum suppression post-processing, and so improves target detection accuracy. In addition, the improved training strategy reduces the hardware requirements for training the network.
Description of the drawings
Fig. 1 is a structure diagram of an embodiment of the present invention.
Fig. 2 is a diagram of one-dimensional feature extraction by a convolutional layer and an Atrous convolutional layer in an embodiment of the present invention.
Fig. 3 is the RFB_a network structure of an embodiment of the present invention.
Fig. 4 is the dense sampling diagram of an embodiment of the present invention.
Fig. 5 compares the detection results of specific embodiment 1 with the detection results of the original DSOD.
Specific embodiments
The present invention is further described below with reference to the drawings and embodiments.
As shown in Fig. 1, the present embodiment provides a method for improving the DSOD network, comprising the following steps:
Step S1: Take an image from the data set as the input image and feed it to the input layer; preprocess the input image by cropping, mirroring, and mean subtraction to obtain the preprocessed image, and use normalization to convert the absolute coordinates in the preprocessed image into relative coordinates;
Step S2: Add an RFB_a network module after the second transition layer of the feature extraction sub-network in the DSOD network; feed the image preprocessed in step S1 into the feature extraction sub-network of the DSOD network for feature extraction; feed the feature map of the second transition layer of the feature extraction sub-network into the RFB_a network module, where Atrous (dilated) convolutions with different sampling rates extract features with different receptive fields; feed the extracted features into a 3 × 3 convolutional layer to form the first scale prediction layer of the DSOD network;
Step S3: Add an Atrous convolutional layer with a preset sampling rate (a sampling rate of 6) after the feature extraction sub-network in the DSOD network, and feed the feature map of the feature extraction sub-network described in step S2 into this Atrous convolutional layer to enlarge the receptive field of the feature map; then feed the features generated by the Atrous convolutional layer into the multi-scale prediction layers of the DSOD network to form 5 scale prediction layers;
Step S4: Feed the features of the first scale prediction layer of the DSOD network described in step S2 and of the 5 scale prediction layers described in step S3 into the multi-task loss function L to which the IOG penalty term has been added;
Step S5: Set the learning rate with a warm-up strategy and optimize the weights of all network layers in the DSOD network with a gradient descent algorithm; set a suitable batch size (16 in the present invention) to reduce the hardware requirements for training the DSOD network.
In the present embodiment, the image preprocessing in step S1 is specifically:
Step S11: Crop the input image: first randomly select the length and height of the crop taken from the input image, then randomly select one number from 0.1, 0.3, 0.7, 0.9 as the Jaccard coefficient threshold, and use the Jaccard coefficient to compute the similarity between every ground-truth box in the original image and the cropped image;
Step S12: Determine whether the Jaccard coefficient between a ground-truth box and the cropped image is greater than the Jaccard threshold randomly selected in step S11; if the Jaccard coefficient between at least one ground-truth box and the cropped image is greater than the selected threshold, and the center of that ground-truth box falls inside the cropped image, the cropped image meets the requirement; otherwise return to step S11. The Jaccard coefficient is calculated as follows:
Here N denotes the number of ground-truth boxes in the image, box_i denotes the area of the i-th ground-truth box, box_cut denotes the area of the cropped image, and the operator ∩ denotes the overlapping area.
Step S13: Mirror the cropped image left to right with a preset probability T (T = 0.5 in the present invention), and resize the result to a resolution of 300 × 300 to obtain the mirrored image.
Step S14: Apply mean subtraction to the mirrored image to obtain the mean-subtracted image.
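The crop selection in steps S11 and S12, together with the coordinate normalization of step S1, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the Jaccard coefficient is taken to be the usual overlap area divided by union area (the formula itself is missing from this text), boxes are (x_min, y_min, x_max, y_max) tuples, and the names `sample_crop`, `max_tries`, and the 0.3 to 1.0 crop size range are hypothetical. Mirroring, resizing, and mean subtraction are left out.

```python
import random

THRESHOLDS = (0.1, 0.3, 0.7, 0.9)  # candidate Jaccard thresholds from step S11


def area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max); 0 if empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def jaccard(box_a, box_b):
    """Overlap area divided by union area (assumed standard definition)."""
    inter = area((max(box_a[0], box_b[0]), max(box_a[1], box_b[1]),
                  min(box_a[2], box_b[2]), min(box_a[3], box_b[3])))
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0


def center_inside(box, crop):
    """True if the center of `box` lies inside `crop`."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    return crop[0] <= cx <= crop[2] and crop[1] <= cy <= crop[3]


def crop_is_valid(crop, gt_boxes, threshold):
    """Step S12: some ground-truth box overlaps enough and is centered in the crop."""
    return any(jaccard(b, crop) > threshold and center_inside(b, crop)
               for b in gt_boxes)


def sample_crop(width, height, gt_boxes, rng, max_tries=50):
    """Steps S11-S12: redraw random crops until one is valid."""
    for _ in range(max_tries):  # max_tries is a hypothetical safety limit
        threshold = rng.choice(THRESHOLDS)
        w = rng.uniform(0.3, 1.0) * width   # crop size range is an assumption
        h = rng.uniform(0.3, 1.0) * height
        x0 = rng.uniform(0, width - w)
        y0 = rng.uniform(0, height - h)
        crop = (x0, y0, x0 + w, y0 + h)
        if crop_is_valid(crop, gt_boxes, threshold):
            return crop
    return (0.0, 0.0, float(width), float(height))  # fall back to the full image


def to_relative(box, crop):
    """Step S1: express absolute box coordinates relative to the crop."""
    w, h = crop[2] - crop[0], crop[3] - crop[1]
    return ((box[0] - crop[0]) / w, (box[1] - crop[1]) / h,
            (box[2] - crop[0]) / w, (box[3] - crop[1]) / h)
```

Passing a seeded `random.Random` instance as `rng` keeps the sampling reproducible, which helps when comparing augmentation settings.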
In the present embodiment, the specific content of step S2 is: each branch of the RFB_a network first uses a 1 × 1 convolutional layer to reduce the number of feature channels; the first branch uses a convolutional layer with a stride of 1 and a 3 × 3 kernel to obtain features with a 3 × 3 receptive field; the second branch uses a 1 × 3 convolutional layer and an Atrous convolutional layer with a sampling rate of 3 to obtain features with a 1 × 7 receptive field; the third branch uses a 3 × 1 convolutional layer and an Atrous convolutional layer with a sampling rate of 3 to obtain features with a 7 × 1 receptive field; the fourth branch uses a 3 × 3 convolutional layer and an Atrous convolutional layer with a sampling rate of 5 to obtain features with an 11 × 11 receptive field; the features extracted by the branches are fused by channel concatenation and a 1 × 1 convolutional layer; finally, the fused features and the features generated by the second transition layer of the DSOD network are combined through a residual connection to form the final output features.
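The 7 and 11 in the receptive field sizes quoted above match the extent covered by a 3 × 3 kernel at sampling rates 3 and 5. The sketch below shows that arithmetic using the standard relation k + (k − 1)(r − 1) for a dilated kernel. The function name is hypothetical, the 3 × 3 kernel size of the Atrous layers is an assumption, and the receptive field of a whole branch also depends on the layers stacked before the Atrous convolution.

```python
def effective_kernel(kernel_size, rate):
    """Extent covered by a kernel whose taps are spaced `rate` positions apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


# Rates 3 and 5 appear in the RFB_a branches; rate 6 is the layer added in step S3.
for rate in (1, 3, 5, 6):
    size = effective_kernel(3, rate)
    print(f"3 x 3 kernel, sampling rate {rate}: covers {size} x {size}")
```

The same relation gives 13 × 13 for the rate 6 layer of step S3, which is one way to see why that layer enlarges the receptive field without adding kernel weights.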
In the present embodiment, the Atrous convolutional layer with a preset sampling rate described in step S3 is added as follows: first, the number of output channels C of the feature extraction sub-network in the DSOD network is increased so as to extract richer feature information; then an Atrous convolutional layer with a given sampling rate r is added, whose number of output channels equals the number of output channels of the original DSOD feature extraction sub-network, so that the Atrous convolution is embedded into the DSOD network; finally, a 1 × 1 convolutional layer is added for feature fusion.
In the present embodiment, the multi-task loss function L with the IOG penalty term described in step S4 is specifically:
Step S41: Compute the box g_iou_max that has the largest area intersection over union (IoU) between a predicted box p output by the DSOD network and all ground-truth boxes G of the default samples; the formula is as follows:
Here g denotes a ground-truth box, G denotes the set of all ground-truth boxes, p denotes a predicted box, P denotes the set of all predicted boxes, box_g denotes the area of the ground-truth box, and box_p denotes the area of the predicted box;
Step S42: Remove the ground-truth box with the largest IoU found in step S41, then compute the maximum IOG penalty term between the predicted box and the remaining ground-truth boxes, and use this maximum IOG penalty term as the loss function L_iog; the calculation formula is as follows:
Step S43: Combine the loss function L_iog with the localization loss function L_loc and the classification loss function L_conf by weighted fusion to form the final multi-task loss function L; the formula is as follows:
Here N denotes the number of detected positive samples and α denotes the weight of the localization loss L_loc; the localization loss function L_loc uses the smooth_L1 loss; the classification loss function L_conf is computed with cross entropy; the localization loss function L_loc is calculated as follows:
Here the indicator function denotes that the i-th default box is matched to the j-th ground-truth box of class k, l denotes the position coordinates of the predicted box, pos denotes the positive default boxes, and N denotes the number of positive samples; smooth_L1 is calculated as follows:
The classification loss L_conf is calculated as follows:
Here c denotes the confidence of each class, Neg denotes the negative samples, p denotes the class, and 0 denotes the background class. The probability that the i-th predicted box belongs to class p is calculated as follows:
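Steps S41 and S42 can be illustrated for a single predicted box with the sketch below. It is one reading of the text, not the patent's formula, which is missing here: IOG is assumed to mean the overlap area divided by the area of the ground-truth box, g_iou_max is taken to be the ground-truth box with the largest IoU, and the function names are hypothetical. How the per-box penalty is aggregated over all predicted boxes into L_iog is not stated in the text and is left out.

```python
def area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max); 0 if empty."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a, b):
    """Overlap area of two boxes."""
    return area((max(a[0], b[0]), max(a[1], b[1]),
                 min(a[2], b[2]), min(a[3], b[3])))


def iou(p, g):
    """Intersection over union of predicted box p and ground-truth box g."""
    inter = intersection(p, g)
    union = area(p) + area(g) - inter
    return inter / union if union > 0 else 0.0


def iog(p, g):
    """Assumed IOG: overlap area divided by the area of the ground-truth box g."""
    return intersection(p, g) / area(g) if area(g) > 0 else 0.0


def iog_penalty(p, gt_boxes):
    """Largest IOG between p and every ground-truth box except its best IoU match."""
    if len(gt_boxes) < 2:
        return 0.0  # nothing remains once the best match is removed
    best = max(range(len(gt_boxes)), key=lambda i: iou(p, gt_boxes[i]))  # step S41
    return max(iog(p, g) for i, g in enumerate(gt_boxes) if i != best)   # step S42
```

A predicted box that sits squarely on its own target but also covers half of a neighboring target receives a penalty of 0.5 under this reading, which pushes the prediction away from the neighbor.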
In the present embodiment, setting the learning rate with the warm-up strategy described in step S5 is specifically: the initial learning rate is set to 10^-5 and increased linearly to 10^-2 over the first 5 epochs; the learning rate is divided by 10 at the 75th, 125th, and 175th epochs, and training finishes at the 200th epoch; the initial value of the batch normalization weights is set to 0.5 and the biases are set to 0; all convolutions are initialized with the Xavier method; with the improved training strategy, the training batch size is reduced from 128 to 16, which lowers the hardware requirements for training the network.
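The schedule described above can be written as a small function of the epoch index. This is a sketch under assumptions: epochs are counted from 0, the ramp is evaluated at the epoch (or fractional epoch) passed in, and each division by 10 takes effect from the stated epoch index onward. The function and parameter names are hypothetical.

```python
def learning_rate(epoch, base_lr=1e-2, warmup_start=1e-5, warmup_epochs=5,
                  milestones=(75, 125, 175)):
    """Learning rate for a 0-based, possibly fractional, epoch index."""
    if epoch < warmup_epochs:
        # Linear ramp from warmup_start to base_lr over the warm-up epochs.
        return warmup_start + (base_lr - warmup_start) * epoch / warmup_epochs
    # Divide by 10 once for every milestone that has been reached.
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr / (10 ** drops)


# Print the rate at a few points of the 200-epoch run.
for epoch in (0, 2.5, 5, 74, 75, 125, 175, 199):
    print(epoch, learning_rate(epoch))
```

Starting at a very small rate and ramping up is a common way to keep the first updates stable when the batch size is small, which fits the reduction from 128 to 16 described in the text.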
Preferably, the present embodiment preprocesses the input image by cropping, mirroring, and mean subtraction. The preprocessed image is fed into the DSOD feature extraction sub-network, and the RFB_a network module is added after the second transition layer of the feature extraction sub-network; Atrous convolutions with different sampling rates in the RFB_a network extract features with different receptive fields, which gives the small-target detection step features that carry more global information. An Atrous convolutional layer with a sampling rate of 6 is added after the feature extraction sub-network to enlarge the receptive field of the feature map and to provide richer semantic information to the subsequent multi-scale prediction layers. The features generated by the Atrous convolutional layer are fed into the multi-scale prediction layers, and the multi-scale prediction layers are fed into the loss function. An IOG penalty term is added to the loss function to prevent predicted boxes of the same class from overlapping when dense targets of the same class are predicted, which avoids missed detections after non-maximum suppression. In the training stage, the learning rate is set with the warm-up strategy and a suitable batch size is used, which reduces the hardware requirements for training the network. Experimental results show that, compared with the original DSOD algorithm, the present invention achieves higher detection accuracy, improves the detection of small targets, and reduces the hardware requirements for training the network.
Fig. 2 shows one-dimensional feature extraction by a standard convolutional layer and by an Atrous convolutional layer. When the sampling rate r is 1, the Atrous convolution is simply a standard convolution. When the sampling rate r is 2 and the padding is 2, r-1 zeros are inserted between the input signals, and after the Atrous convolution 3 input signals produce 5 signal responses. The figure shows that the Atrous convolutional layer enlarges the receptive field of the convolution kernel. The two-dimensional Atrous convolution used in the present embodiment has the same effect as the one-dimensional Atrous convolution.
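The one-dimensional case can be reproduced with a few lines of plain Python. This is an illustrative sketch rather than the code behind Fig. 2: it spaces the kernel taps r positions apart, which is the usual way a dilated convolution is computed, and it zero-pads so that the output keeps the input length (for a 3-tap kernel at rate 2 this gives a padding of 2 on each side, in line with the text). The function name is hypothetical.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """1-D Atrous convolution (cross-correlation form), zero padded to keep the length."""
    span = (len(kernel) - 1) * rate           # distance between the first and last tap
    pad = span // 2
    padded = [0.0] * pad + list(signal) + [0.0] * (span - pad)
    return [sum(w * padded[i + k * rate] for k, w in enumerate(kernel))
            for i in range(len(signal))]


impulse = [0, 0, 0, 1, 0, 0, 0]
print(atrous_conv1d(impulse, [1, 1, 1], rate=1))  # response covers 3 positions
print(atrous_conv1d(impulse, [1, 1, 1], rate=2))  # response spans 5 positions
```

With rate 2 the three kernel taps reach across five input positions, so the receptive field grows while the number of kernel weights stays at three.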
Fig. 3 shows the network structure of a standard RFB_a. The RFB_a network module is a multi-branch convolutional structure. Features with receptive fields of different sizes, extracted in the different branches by Atrous convolutions with different sampling rates, are fused by channel concatenation, which produces a dense sampling effect on the original feature map, as shown in Fig. 4. In the standard RFB_a network structure, the present embodiment adds a ReLU activation function to the last Atrous convolution of each branch in order to extract higher-level features. In addition, to keep the DSOD network structure consistent, the present embodiment moves batch normalization (BN) and the ReLU activation function in the RFB_a network to before the convolutional layer.
Embodiment 1: As shown in Fig. 5, a target detection analysis tool is used to analyze the detection ability of DSOD and of the improved method on extra-small (XS) targets. Fig. 5 shows that, except for the table class, the improved DSOD target detection method improves detection accuracy to varying degrees in classes such as aeroplane, bicycle, and bird. In images of the table class, cups and other items placed on the table partially occlude it, which strongly affects the improved DSOD detection, so its accuracy is lower than that of the original DSOD algorithm. Overall, the improved method of the present embodiment detects small targets more accurately.
Embodiment 2: On the PASCAL VOC2007 test set, the improved DSOD is compared with several other typical regression-based target detection algorithms in detection accuracy and detection speed; the main indicators are mAP (mean Average Precision) and FPS (Frames Per Second). Here * marks results measured in the experimental environment of the present embodiment. The data in the table show that the improved DSOD model has higher accuracy: detection accuracy rises from 77.4% to 79.0%. Compared with DSSD, the improved DSOD is better in both detection accuracy and detection speed. Because RFBNet300 uses several RFB network blocks, it can extract more global features, and its accuracy is roughly the same as that of the improved method of the present embodiment. However, RFBNet300 has higher computational complexity than the improved method of the present embodiment, so the improved method has better real-time performance.
The above are only preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the scope of the present invention.

Claims (6)

1. A method for improving the DSOD network, characterized by comprising the following steps:
Step S1: Take an image from the data set as the input image and feed it to the input layer; preprocess the input image by cropping, mirroring, and mean subtraction to obtain the preprocessed image, and use normalization to convert the absolute coordinates in the preprocessed image into relative coordinates;
Step S2: Add an RFB_a network module after the second transition layer of the feature extraction sub-network in the DSOD network; feed the image preprocessed in step S1 into the feature extraction sub-network of the DSOD network for feature extraction; feed the feature map of the second transition layer of the feature extraction sub-network into the RFB_a network module, where Atrous (dilated) convolutions with different sampling rates extract features with different receptive fields; feed the extracted features into a 3 × 3 convolutional layer to form the first scale prediction layer of the DSOD network;
Step S3: Add an Atrous convolutional layer with a preset sampling rate after the feature extraction sub-network in the DSOD network, and feed the feature map of the feature extraction sub-network described in step S2 into this Atrous convolutional layer to enlarge the receptive field of the feature map; then feed the features generated by the Atrous convolutional layer into the multi-scale prediction layers of the DSOD network to form 5 scale prediction layers;
Step S4: Feed the features of the first scale prediction layer of the DSOD network described in step S2 and of the 5 scale prediction layers described in step S3 into the multi-task loss function L to which the IOG penalty term has been added;
Step S5: Set the learning rate with a warm-up strategy and optimize the weights of all network layers in the DSOD network with a gradient descent algorithm; set a suitable batch size to reduce the hardware requirements for training the DSOD network.
2. The method for improving the DSOD network according to claim 1, characterized in that the image preprocessing in step S1 is specifically:
Step S11: Crop the input image: first randomly select the length and height of the crop taken from the input image, then randomly select one number from 0.1, 0.3, 0.7, 0.9 as the Jaccard coefficient threshold, and use the Jaccard coefficient to compute the similarity between every ground-truth box in the original image and the cropped image;
Step S12: Determine whether the Jaccard coefficient between a ground-truth box and the cropped image is greater than the Jaccard threshold randomly selected in step S11; if the Jaccard coefficient between at least one ground-truth box and the cropped image is greater than the selected threshold, and the center of that ground-truth box falls inside the cropped image, the cropped image meets the requirement; otherwise return to step S11. The Jaccard coefficient is calculated as follows:
Here N denotes the number of ground-truth boxes in the image, box_i denotes the area of the i-th ground-truth box, box_cut denotes the area of the cropped image, and the operator ∩ denotes the overlapping area.
Step S13: Mirror the cropped image left to right with a preset probability T, and resize the result to a resolution of 300 × 300 to obtain the mirrored image.
Step S14: Apply mean subtraction to the mirrored image to obtain the mean-subtracted image.
3. The method for improving the DSOD network according to claim 1, characterized in that the specific content of step S2 is: each branch of the RFB_a network first uses a 1 × 1 convolutional layer to reduce the number of feature channels; the first branch uses a convolutional layer with a stride of 1 and a 3 × 3 kernel to obtain features with a 3 × 3 receptive field; the second branch uses a 1 × 3 convolutional layer and an Atrous convolutional layer with a sampling rate of 3 to obtain features with a 1 × 7 receptive field; the third branch uses a 3 × 1 convolutional layer and an Atrous convolutional layer with a sampling rate of 3 to obtain features with a 7 × 1 receptive field; the fourth branch uses a 3 × 3 convolutional layer and an Atrous convolutional layer with a sampling rate of 5 to obtain features with an 11 × 11 receptive field; the features extracted by the branches are fused by channel concatenation and a 1 × 1 convolutional layer; finally, the fused features and the features generated by the second transition layer of the DSOD network are combined through a residual connection to form the final output features.
4. The method for improving the DSOD network according to claim 1, characterized in that the Atrous convolutional layer with a preset sampling rate described in step S3 is added as follows: first, the number of output channels C of the feature extraction sub-network in the DSOD network is increased so as to extract richer feature information; then an Atrous convolutional layer with a given sampling rate r is added, whose number of output channels equals the number of output channels of the original DSOD feature extraction sub-network, so that the Atrous convolution is embedded into the DSOD network; finally, a 1 × 1 convolutional layer is added for feature fusion.
5. The method for improving a DSOD network according to claim 1, characterized in that the multi-task loss function L with the added IOG penalty term described in step S4 is specifically constructed as follows:
Step S41: for each prediction box p of a default sample output by the DSOD network, compute its area intersection-over-union (IoU) with all ground-truth boxes G and find the ground-truth box with the largest IoU, g_iou_max; the formula is as follows:
where g denotes a ground-truth box, G denotes the set of all ground-truth boxes, p denotes a prediction box, P denotes the set of all prediction boxes, box_g denotes the area of the ground-truth box, and box_p denotes the area of the prediction box;
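The formula for step S41 is not reproduced in this text. One form consistent with the symbols defined above, assuming the usual definition of intersection-over-union, would be:

```latex
\mathrm{IoU}(g, p) = \frac{box_{g} \cap box_{p}}{box_{g} \cup box_{p}},
\qquad
g_{iou\_max} = \arg\max_{g \in G} \mathrm{IoU}(g, p)
```

Here the intersection and union are read as the areas of the overlap and of the combined region of the two boxes.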
Step S42: remove the ground-truth box with the largest IoU found in step S41, then compute the maximum IOG penalty term between the prediction box and the remaining ground-truth boxes, and use this maximum IOG penalty term as the loss function L_iog; the calculation formula is as follows:
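Steps S41 and S42 can be sketched for a single prediction box as follows. This is an illustration under stated assumptions, not the patented formula: IOG is taken to be the intersection area divided by the ground-truth area (the usual intersection-over-ground-truth definition), boxes are assumed to be `(x_min, y_min, x_max, y_max)` tuples, and how the per-box values are aggregated over all prediction boxes is not specified here.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def intersection(a: Box, b: Box) -> float:
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(overlap)


def iou(pred: Box, gt: Box) -> float:
    inter = intersection(pred, gt)
    union = area(pred) + area(gt) - inter
    return inter / union if union > 0 else 0.0


def iog(pred: Box, gt: Box) -> float:
    """Intersection over ground-truth area (assumed IOG definition)."""
    gt_area = area(gt)
    return intersection(pred, gt) / gt_area if gt_area > 0 else 0.0


def max_iog_penalty(pred: Box, gts: List[Box]) -> float:
    """Step S41: find the best-IoU ground truth; step S42: max IOG over the rest."""
    if len(gts) < 2:
        return 0.0  # no other ground-truth box to overlap with
    best = max(range(len(gts)), key=lambda i: iou(pred, gts[i]))
    return max(iog(pred, gt) for i, gt in enumerate(gts) if i != best)


# Hypothetical boxes: the prediction matches the first ground truth exactly
# and covers half of the second one.
penalty = max_iog_penalty((0.0, 0.0, 2.0, 2.0), [(0.0, 0.0, 2.0, 2.0), (1.0, 0.0, 3.0, 2.0)])
```

The penalty grows as a prediction drifts onto a neighbouring object of the same class, which is the overlap between similar prediction boxes that the abstract says the term is meant to prevent.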
Step S43: combine the loss function L_iog with the localization loss function L_loc and the classification loss function L_conf by weighted fusion to form the final multi-task loss function L; the formula is as follows:
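The exact fusion formula is not reproduced in this text, so the following is only one plausible reading. It assumes the SSD-style combination of L_conf and L_loc normalised by the number of positive samples N, with weight α on L_loc, and adds L_iog with a hypothetical weight `iog_weight` that does not appear in the source.

```python
def multitask_loss(l_conf, l_loc, l_iog, num_pos, alpha=1.0, iog_weight=1.0):
    """Weighted fusion of the three losses (assumed form, see the note above).

    `iog_weight` is a hypothetical parameter; the source does not say how
    L_iog is weighted or whether it is also divided by N.
    """
    if num_pos == 0:
        return 0.0  # SSD convention: no positive samples, no loss
    return (l_conf + alpha * l_loc) / num_pos + iog_weight * l_iog


# Hypothetical loss values for two positive samples.
total = multitask_loss(l_conf=4.0, l_loc=2.0, l_iog=0.5, num_pos=2)
```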
where N denotes the number of detected positive samples and α denotes the weight of the localization loss L_loc; the localization loss function L_loc uses the smooth L1 loss; the classification loss function L_conf is computed with cross-entropy; the localization loss function L_loc is computed as follows:
where the indicator function indicates whether the i-th default box matches the j-th ground-truth box of class k, l denotes the position coordinates of the prediction box, Pos denotes the default boxes that are positive samples, and N denotes the number of positive samples; smooth L1 is computed as follows:
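The smooth L1 function has a standard piecewise form, 0.5·x² for |x| < 1 and |x| − 0.5 otherwise, which the sketch below implements. The `localization_loss` helper is an assumption for illustration: it sums smooth L1 over the offsets of boxes that have already been matched as positive samples, and the four-value offset encoding is hypothetical.

```python
def smooth_l1(x: float) -> float:
    """0.5 * x^2 when |x| < 1, otherwise |x| - 0.5."""
    abs_x = abs(x)
    return 0.5 * abs_x * abs_x if abs_x < 1.0 else abs_x - 0.5


def localization_loss(pred_offsets, target_offsets):
    """Sum smooth L1 over every offset of every matched (positive) default box."""
    return sum(
        smooth_l1(p - t)
        for pred, target in zip(pred_offsets, target_offsets)
        for p, t in zip(pred, target)
    )


# One hypothetical positive sample with four box offsets.
loss = localization_loss([(0.5, 0.0, 0.0, 2.0)], [(0.0, 0.0, 0.0, 0.0)])
```

Small errors are penalised quadratically and large errors linearly, so a few badly placed boxes do not dominate the gradient.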
The classification loss L_conf is computed as follows:
where c denotes the confidence of each class, Neg denotes the negative samples, p denotes the class, and 0 denotes the background class. The probability that the i-th prediction box belongs to class p is computed as follows:
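A cross-entropy classification loss of this kind can be sketched as below, assuming the SSD-style formulation in which class probabilities come from a softmax over the confidences, positive samples are scored against their matched class, and negative samples are scored against the background class 0. The function names and the list-based inputs are chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one box's class confidences."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_loss(pos_scores, pos_labels, neg_scores):
    """Cross-entropy: positives against their class, negatives against background (index 0)."""
    loss = 0.0
    for scores, label in zip(pos_scores, pos_labels):
        loss -= math.log(softmax(scores)[label])
    for scores in neg_scores:
        loss -= math.log(softmax(scores)[0])
    return loss


# Hypothetical two-class confidences (background, object) for one positive
# and one negative sample.
loss = confidence_loss(pos_scores=[[0.0, 0.0]], pos_labels=[1], neg_scores=[[0.0, 0.0]])
```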
6. The method for improving a DSOD network according to claim 1, characterized in that setting the learning rate with the warm-up strategy described in step S5 specifically comprises: setting the initial learning rate to 10^-5 and increasing it linearly to 10^-2 over the first 5 epochs; dividing the learning rate by 10 at the 75th, 125th, and 175th epochs; and completing training at the 200th epoch; the initial value of the batch-normalization weight is set to 0.5 and the bias is set to 0; all convolutions are initialized with the Xavier method; by improving the training strategy, the training batch size is reduced from 128 to 16, thereby lowering the hardware requirements for training the network.
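The learning-rate schedule in this claim can be written as a small function. The sketch assumes zero-based epoch numbering and a warm-up that steps once per epoch; the claim does not say whether the increase is applied per epoch or per iteration, and the function and parameter names are chosen for illustration.

```python
def learning_rate(epoch, base_lr=1e-2, warmup_start=1e-5, warmup_epochs=5,
                  milestones=(75, 125, 175), gamma=0.1):
    """Linear warm-up followed by step decay at the given milestone epochs."""
    if epoch < warmup_epochs:
        fraction = epoch / warmup_epochs
        return warmup_start + (base_lr - warmup_start) * fraction
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** drops


# One value per epoch for the 200 training epochs named in the claim.
schedule = [learning_rate(e) for e in range(200)]
```

Starting from a very small rate lets a network trained from scratch with a batch size of 16 settle before the full rate of 10^-2 is applied.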
CN201910029814.0A 2019-01-12 2019-01-12 Method for improving DSOD network Active CN109784476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910029814.0A CN109784476B (en) 2019-01-12 2019-01-12 Method for improving DSOD network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910029814.0A CN109784476B (en) 2019-01-12 2019-01-12 Method for improving DSOD network

Publications (2)

Publication Number Publication Date
CN109784476A true CN109784476A (en) 2019-05-21
CN109784476B CN109784476B (en) 2022-08-16

Family

ID=66500412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910029814.0A Active CN109784476B (en) 2019-01-12 2019-01-12 Method for improving DSOD network

Country Status (1)

Country Link
CN (1) CN109784476B (en)

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348390A (en) * 2019-07-12 2019-10-18 创新奇智(重庆)科技有限公司 A kind of training method, computer-readable medium and the system of fire defector model
CN110348423A (en) * 2019-07-19 2019-10-18 西安电子科技大学 A kind of real-time face detection method based on deep learning
CN110378232A (en) * 2019-06-20 2019-10-25 陕西师范大学 The examination hall examinee position rapid detection method of improved SSD dual network
CN110443172A (en) * 2019-07-25 2019-11-12 北京科技大学 A kind of object detection method and system based on super-resolution and model compression
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 A kind of small target deteection of Enhanced feature study and recognition methods
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 Face key point detection method based on GIoU and weighted NMS improvement
CN110647817A (en) * 2019-08-27 2020-01-03 江南大学 Real-time face detection method based on MobileNet V3
CN110852330A (en) * 2019-10-23 2020-02-28 天津大学 Behavior identification method based on single stage
CN111027512A (en) * 2019-12-24 2020-04-17 北方工业大学 Remote sensing image shore-approaching ship detection and positioning method and device
CN111079753A (en) * 2019-12-20 2020-04-28 长沙千视通智能科技有限公司 License plate recognition method and device based on deep learning and big data combination
CN111539434A (en) * 2020-04-10 2020-08-14 南京理工大学 Infrared weak and small target detection method based on similarity
CN111680556A (en) * 2020-04-29 2020-09-18 平安国际智慧城市科技股份有限公司 Method, device and equipment for identifying vehicle type at traffic gate and storage medium
CN112525919A (en) * 2020-12-21 2021-03-19 福建新大陆软件工程有限公司 Wood board defect detection system and method based on deep learning
CN112614107A (en) * 2020-12-23 2021-04-06 北京澎思科技有限公司 Image processing method and device, electronic equipment and storage medium
CN112819008A (en) * 2021-01-11 2021-05-18 腾讯科技(深圳)有限公司 Method, device, medium and electronic equipment for optimizing instance detection network
CN113096023A (en) * 2020-01-08 2021-07-09 字节跳动有限公司 Neural network training method, image processing method and device, and storage medium
CN115760990A (en) * 2023-01-10 2023-03-07 华南理工大学 Identification and positioning method of pineapple pistil, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170046616A1 (en) * 2015-08-15 2017-02-16 Salesforce.Com, Inc. Three-dimensional (3d) convolution with 3d batch normalization
CN108564097A (en) * 2017-12-05 2018-09-21 华南理工大学 A kind of multiscale target detection method based on depth convolutional neural networks
CN108921196A (en) * 2018-06-01 2018-11-30 南京邮电大学 A kind of semantic segmentation method for improving full convolutional neural networks
CN109035184A (en) * 2018-06-08 2018-12-18 西北工业大学 A kind of intensive connection method based on the deformable convolution of unit

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIANG-CHIEH CHEN ET AL.: "DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
ZHIQIANG SHEN ET AL.: "DSOD: Learning Deeply Supervised Object Detectors from Scratch", 《2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》 *
吴建耀 等: "一种改进的DSOD目标检测算法", 《半导体光电》 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378232B (en) * 2019-06-20 2022-12-27 陕西师范大学 Improved test room examinee position rapid detection method of SSD dual-network
CN110378232A (en) * 2019-06-20 2019-10-25 陕西师范大学 The examination hall examinee position rapid detection method of improved SSD dual network
CN110580445B (en) * 2019-07-12 2023-02-07 西北工业大学 Face key point detection method based on GIoU and weighted NMS improvement
CN110580445A (en) * 2019-07-12 2019-12-17 西北工业大学 Face key point detection method based on GIoU and weighted NMS improvement
CN110348390A (en) * 2019-07-12 2019-10-18 创新奇智(重庆)科技有限公司 A kind of training method, computer-readable medium and the system of fire defector model
CN110348423A (en) * 2019-07-19 2019-10-18 西安电子科技大学 A kind of real-time face detection method based on deep learning
CN110443172A (en) * 2019-07-25 2019-11-12 北京科技大学 A kind of object detection method and system based on super-resolution and model compression
CN110647817A (en) * 2019-08-27 2020-01-03 江南大学 Real-time face detection method based on MobileNet V3
CN110503112A (en) * 2019-08-27 2019-11-26 电子科技大学 A kind of small target deteection of Enhanced feature study and recognition methods
CN110503112B (en) * 2019-08-27 2023-02-03 电子科技大学 Small target detection and identification method for enhancing feature learning
CN110647817B (en) * 2019-08-27 2022-04-05 江南大学 Real-time face detection method based on MobileNet V3
CN110852330A (en) * 2019-10-23 2020-02-28 天津大学 Behavior identification method based on single stage
CN111079753B (en) * 2019-12-20 2023-08-22 长沙千视通智能科技有限公司 License plate recognition method and device based on combination of deep learning and big data
CN111079753A (en) * 2019-12-20 2020-04-28 长沙千视通智能科技有限公司 License plate recognition method and device based on deep learning and big data combination
CN111027512B (en) * 2019-12-24 2023-04-18 北方工业大学 Remote sensing image quayside ship detection and positioning method and device
CN111027512A (en) * 2019-12-24 2020-04-17 北方工业大学 Remote sensing image shore-approaching ship detection and positioning method and device
CN113096023A (en) * 2020-01-08 2021-07-09 字节跳动有限公司 Neural network training method, image processing method and device, and storage medium
CN113096023B (en) * 2020-01-08 2023-10-27 字节跳动有限公司 Training method, image processing method and device for neural network and storage medium
CN111539434B (en) * 2020-04-10 2022-09-20 南京理工大学 Infrared weak and small target detection method based on similarity
CN111539434A (en) * 2020-04-10 2020-08-14 南京理工大学 Infrared weak and small target detection method based on similarity
CN111680556A (en) * 2020-04-29 2020-09-18 平安国际智慧城市科技股份有限公司 Method, device and equipment for identifying vehicle type at traffic gate and storage medium
CN111680556B (en) * 2020-04-29 2024-06-07 平安国际智慧城市科技股份有限公司 Method, device, equipment and storage medium for identifying traffic gate vehicle type
CN112525919A (en) * 2020-12-21 2021-03-19 福建新大陆软件工程有限公司 Wood board defect detection system and method based on deep learning
CN112614107A (en) * 2020-12-23 2021-04-06 北京澎思科技有限公司 Image processing method and device, electronic equipment and storage medium
CN112819008A (en) * 2021-01-11 2021-05-18 腾讯科技(深圳)有限公司 Method, device, medium and electronic equipment for optimizing instance detection network
CN115760990A (en) * 2023-01-10 2023-03-07 华南理工大学 Identification and positioning method of pineapple pistil, electronic equipment and storage medium
CN115760990B (en) * 2023-01-10 2023-04-21 华南理工大学 Pineapple pistil identification and positioning method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN109784476B (en) 2022-08-16

Similar Documents

Publication Publication Date Title
CN109784476A (en) A method of improving DSOD network
CN111259930B (en) General target detection method of self-adaptive attention guidance mechanism
CN111401201B (en) Aerial image multi-scale target detection method based on spatial pyramid attention drive
CN107609525B (en) Remote sensing image target detection method for constructing convolutional neural network based on pruning strategy
CN108416378A (en) A kind of large scene SAR target identification methods based on deep neural network
CN111696128B (en) High-speed multi-target detection tracking and target image optimization method and storage medium
CN108898047B (en) Pedestrian detection method and system based on blocking and shielding perception
CN111091105A (en) Remote sensing image target detection method based on new frame regression loss function
CN109948415A (en) Remote sensing image object detection method based on filtering background and scale prediction
CN106980858A (en) The language text detection of a kind of language text detection with alignment system and the application system and localization method
CN111027481B (en) Behavior analysis method and device based on human body key point detection
JPH06333054A (en) System for detecting target pattern within image
CN105844627A (en) Sea surface object image background inhibition method based on convolution nerve network
CN112396619B (en) Small particle segmentation method based on semantic segmentation and internally complex composition
CN109840524A (en) Kind identification method, device, equipment and the storage medium of text
CN110264444A (en) Damage detecting method and device based on weak segmentation
CN111983676A (en) Earthquake monitoring method and device based on deep learning
CN110111370A (en) A kind of vision object tracking methods based on TLD and the multiple dimensioned space-time characteristic of depth
CN106127754A (en) CME detection method based on fusion feature and space-time expending decision rule
CN104851102B (en) A kind of infrared small target detection method based on human visual system
CN116542912A (en) Flexible body bridge vibration detection model with multi-target visual tracking function and application
Duan et al. An anchor box setting technique based on differences between categories for object detection
CN115410102A (en) SAR image airplane target detection method based on combined attention mechanism
Yang et al. Immature Yuzu citrus detection based on DSSD network with image tiling approach
CN112801955B (en) Plankton detection method under unbalanced population distribution condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant