CN110532859A

CN110532859A - Remote sensing image target detection method based on a deep evolutionary pruning convolutional network

Info

Publication number: CN110532859A
Application number: CN201910648586.5A
Authority: CN
Inventors: 焦李成; 李玲玲; 姜升; 郭雨薇; 程曦娜; 丁静怡; 张梦璇; 杨淑媛; 侯彪
Original assignee: Xian University of Electronic Science and Technology
Current assignee: Xian University of Electronic Science and Technology
Priority date: 2019-07-18
Filing date: 2019-07-18
Publication date: 2019-12-03
Anticipated expiration: 2039-07-18
Also published as: CN110532859B

Abstract

The invention discloses a remote sensing image target detection method based on a deep evolutionary pruning convolutional network. It addresses the problem that existing remote sensing target detection methods do not optimize detection speed and detection accuracy jointly and globally. The specific steps are: process the data sets; construct a deep convolutional feature extraction sub-network; construct fully convolutional (FCN) detection sub-networks; construct and train a deep convolutional target detection network; construct and train the target detection network based on the deep evolutionary pruning convolutional network; perform target detection on the test data set with the trained model; output the detection results. The invention builds an inverted residual structure from depthwise separable convolutions, which greatly reduces the number of model parameters while keeping detection accuracy high, and combines the target detection network with evolutionary pruning to achieve global acceleration. The invention greatly reduces the amount of computation, significantly improves detection speed, and achieves high detection accuracy. It is used for fast and accurate detection of small targets such as aircraft and ships in remote sensing images.

Description

Remote sensing image target detection method based on a deep evolutionary pruning convolutional network

Technical field

The invention belongs to the technical field of image processing and further relates to remote sensing image target detection, specifically a remote sensing image target detection method based on a deep evolutionary pruning convolutional network. It can be applied to detecting ground-object targets such as aircraft and ships in different regions of remote sensing images.

Background technique

Target detection is one of the core problems in computer vision. Remote sensing image target detection takes images captured by remote sensing satellites as the data source and uses image processing techniques to locate and classify the targets of interest in the image. As a key technology in remote sensing image applications, it can capture targets in high-technology military confrontation and provide accurate position and category information. It has a major impact on the military field and has important application and research value.

In the prior art, remote sensing images are large in size and low in resolution, targets are small, and target edges are blurred. As a result, existing methods often fail to learn target features well when detecting targets in remote sensing images, which leads to low detection accuracy. In addition, the huge data volume of remote sensing images and the huge number of parameters in the network models greatly limit detection speed.

Existing target detection techniques often cannot achieve both efficiency and accuracy. Two-stage detection models such as Faster R-CNN are highly accurate but bring a huge amount of computation; one-stage detection models such as YOLO and SSD compute faster, but their accuracy is unsatisfactory.

In their paper "Focal Loss for Dense Object Detection" (CVPR2017), Tsung-Yi Lin et al. proposed RetinaNet, a general one-stage target detection model. The model uses the residual network ResNet for preliminary extraction of image features and adds a feature pyramid network (FPN) to fuse the feature maps produced by different layers of the residual network. This strengthens the semantic information of the output features, makes small targets easier to recognize, and improves detection performance. Classification and regression prediction are then performed at each pyramid level. Finally, the Focal Loss function is used to solve the class imbalance caused by excessive background, which had limited the accuracy of one-stage detection models; with it, a one-stage model for the first time outperformed the then state-of-the-art two-stage models on the COCO data set. The remaining shortcoming of this method is that the residual network ResNet and the feature pyramid network FPN contain a large amount of redundant information, so the parameter count and the amount of computation are large. This affects the model's computational complexity and speed and does not meet the requirements of deployment on embedded devices.

In their paper "Accelerating Convolutional Networks via Global & Dynamic Filter Pruning" (IJCAI2018), Shaohui Lin et al. proposed a global dynamic pruning method, GDP. It first defines a global discriminant function based on prior knowledge of each filter and prunes low-saliency filters across all layers globally. It then dynamically updates the saliency of the filters in the whole pruned sparse network, recovers wrongly pruned filters, and retrains to improve model accuracy. Global optimization is carried out with a stochastic gradient descent method based on a greedy algorithm. The remaining shortcoming of this method is that the global discriminant function based on filter prior knowledge has to be designed for the specific task; using the same global discriminant function in different applications may introduce discrimination bias and cause a loss of overall accuracy.

When current target detection algorithms are applied to large-scale, low-resolution remote sensing images, they are limited by the huge data volume and model parameter count, and also face small target sizes and blurred target edges. As a result, the prior art cannot achieve the best detection accuracy and detection speed at the same time, and it is difficult to detect targets in remote sensing images both quickly and accurately.

Summary of the invention

In view of the shortcomings and defects of the prior art, the purpose of the present invention is to propose a remote sensing image target detection method based on a deep evolutionary pruning convolutional network that maintains high accuracy while greatly reducing computational complexity and greatly increasing the overall speed of the network.

The present invention is a remote sensing image target detection method based on a deep evolutionary pruning convolutional network, characterized in that it includes the following steps:

(1) Process the training data set and validation data set: select several remote sensing images containing multiple kinds of targets and process them into image blocks of 512 × 512 pixels; 70% of the image blocks form the training data set and 30% form the validation data set; apply data augmentation to the training data set;

(2) Process the test data set: input several other remote sensing images containing multiple kinds of targets and process them into image blocks of 512 × 512 pixels to form the test data set;

(3) Construct the deep convolutional feature extraction sub-network: construct the depthwise separable convolution inverted residual block and the feature pyramid convolution module respectively; then use a 7 × 7 convolutional layer and a max pooling layer in sequence, and connect the depthwise separable convolution inverted residual blocks and the feature pyramid convolution modules alternately to form the deep convolutional feature extraction sub-network;

The specific structure of the deep convolutional feature extraction sub-network is: original image input layer → 7 × 7 convolutional layer → first max pooling layer → first depthwise separable convolution inverted residual block C1 → second depthwise separable convolution inverted residual block C2 → first feature pyramid convolution module P1 → third depthwise separable convolution inverted residual block C3 → second feature pyramid convolution module P2 → fourth depthwise separable convolution inverted residual block C4 → third feature pyramid convolution module P3 → second max pooling layer → fourth feature pyramid convolution module P4 → third max pooling layer → fifth feature pyramid convolution module P5 → current-stage feature map output layer;

(4) Construct the fully convolutional (FCN) detection sub-networks:

(4a) Construct the fully convolutional FCN classification subnet: its structure is classification subnet input layer → first 3 × 3 convolutional layer → second 3 × 3 convolutional layer → third 3 × 3 convolutional layer → fourth 3 × 3 convolutional layer → fifth 3 × 3 convolutional layer → classification subnet output layer; the classification subnet input layer takes the feature map of each feature pyramid convolution module in turn as the input of the classification subnet, and classification detection is performed on each in sequence;

(4b) Construct the fully convolutional FCN regression subnet: its structure is regression subnet input layer → first 3 × 3 convolutional layer → second 3 × 3 convolutional layer → third 3 × 3 convolutional layer → fourth 3 × 3 convolutional layer → fifth 3 × 3 convolutional layer → regression subnet output layer; the regression subnet input layer takes the feature map of each feature pyramid convolution module in turn as the input of the regression subnet, and regression detection is performed on each in sequence;

(5) Construct and train the deep convolutional target detection network:

(5a) Construct the deep convolutional target detection network: assemble the deep convolutional feature extraction sub-network, the fully convolutional FCN classification subnet and the fully convolutional FCN regression subnet in sequence to form the deep convolutional target detection network; its structure is original image input layer → deep convolutional feature extraction sub-network → fully convolutional FCN classification and regression subnets;

(5b) Train the deep convolutional target detection network: train the network with the training data set and validation data set as input to obtain the trained deep convolutional target detection network, and save the weight file of the trained network;

(6) Construct and train the target detection network based on the deep evolutionary pruning convolutional network:

(6a) Encode, layer by layer, the convolution filters that take part in pruning in the trained deep convolutional target detection network as DNA codes; the code is denoted DNA_{1,...,l-1,l};

(6b) Optimize the code DNA_{1,...,l-1,l} with an evolutionary algorithm to obtain the final optimized code DNA'_{1,...,l-1,l};

(6c) Combine the optimized code DNA'_{1,...,l-1,l} with the pruning rule to construct the target detection network based on the deep evolutionary pruning convolutional network. The pruning rule is: a code of 0 means the convolution filter is finally pruned, and a code of 1 means the convolution filter is finally retained. Fine-tune the network with the training data set to obtain the trained target detection network based on the deep evolutionary pruning convolutional network, i.e. the trained model, and save the trained model weight file;

(7) Perform target detection on the test data set with the trained model:

(7a) Input the data blocks of the test data set in sequence into the trained target detection network based on the deep evolutionary pruning convolutional network to obtain, for each data block in the test data set, the candidate boxes, the classification confidence score of each candidate box and the target category of each candidate box;

(7b) Discard all candidate boxes whose classification confidence score for the target category is below the threshold 0.3, and apply non-maximum suppression to the remaining candidate boxes;

(7c) Map the coordinates of all retained candidate boxes onto the remote sensing image before cutting, and apply non-maximum suppression a second time to obtain the final detection result map of the remote sensing image.
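The coordinate mapping in step (7c) can be sketched as a simple offset, assuming each image block records the top-left corner it was cut from. The function name, the `(x1, y1, x2, y2)` box layout and the `tile_origin` argument are illustrative assumptions, not part of the patent text.

```python
def tile_box_to_image(box, tile_origin):
    """Shift a box (x1, y1, x2, y2) from image-block coordinates to
    coordinates in the uncut remote sensing image."""
    x1, y1, x2, y2 = box
    ox, oy = tile_origin  # assumed: top-left corner of the block in the uncut image
    return (x1 + ox, y1 + oy, x2 + ox, y2 + oy)


# A box found in the block whose top-left corner sits at (412, 824).
print(tile_box_to_image((10, 20, 60, 90), (412, 824)))
```

Because neighbouring blocks overlap, the same target can be reported by more than one block after this shift, which is consistent with the second non-maximum suppression pass that step (7c) applies on the full image.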

The remote sensing image target detection method based on a deep evolutionary pruning convolutional network disclosed by the invention mainly solves the problem that existing remote sensing target detection techniques do not optimize detection speed and detection accuracy jointly and globally.

Compared with the prior art, the present invention has the following advantages:

It provides an optimization scheme: the prior art generally finds it difficult to achieve both accuracy and speed. The method of the invention optimizes model accuracy and computing speed at the same time, has a clear advantage in computational complexity and computing speed, and improves model accuracy over the prior art.

It combines the target detection network with a global dynamic pruning method based on an evolutionary algorithm to accelerate the network: a new global and dynamic pruning scheme based on an evolutionary algorithm is proposed, which removes redundant filters by pruning and thereby accelerates the CNN. Most previous methods prune filters layer by layer in a fixed manner. Such methods cannot dynamically restore filters that were removed earlier and may ignore the complex dependencies between filters; they are inflexible and can significantly degrade network performance. The present method jointly encodes the filters to be pruned in all layers and optimizes the network to be pruned with an evolutionary algorithm, using the network's performance on the test data set as the fitness of the evolutionary algorithm. The iterative optimization of the network structure is completed through retraining, so that the model reaches the desired acceleration while its performance is preserved.

It greatly reduces the number of model parameters: the method replaces the standard convolutions in the ResNet network with depthwise separable convolutions, and in the depthwise separable convolution unit no ReLU activation function is used after the 1 × 1 pointwise convolutional layer; a linear activation function is used instead to prevent feature information from being destroyed. This reduces the parameters and computation needed to fit the data while preserving detection accuracy, and speeds up the convergence of the network model. It overcomes the prior-art shortcoming that a large number of network parameters slows the model down, and makes the method suitable for computing devices with limited computing and storage resources.

It preserves accuracy while reducing the number of parameters: an inverted residual block is proposed. A conventional residual connection structure uses 1 × 1 convolutional layers to first reduce and then restore the channel dimension of the input feature map, but the features are compressed after the reduction, part of the useful feature information in the image is removed, and the reduced target feature information lowers detection accuracy. The invention designs an inverted residual block that first doubles the number of channels of the input feature map to obtain more image feature information, and then feeds the result to the depthwise separable unit for feature extraction and channel reduction. This reduces the number of parameters and increases network speed while maintaining high accuracy.

Brief description of the drawings

Fig. 1 is flow chart of the invention；

Fig. 2 is the structure diagram of the depthwise separable convolution unit of the invention.

Specific embodiment

The present invention is described in detail below with reference to the accompanying drawings.

Embodiment 1:

Remote sensing image target detection is a closely watched application in the field of remote sensing image processing and analysis, for example determining whether targets such as aircraft and ships are present in a remote sensing image and identifying, classifying and precisely locating them. With the continuous development of satellite technology, the volume of available remote sensing image data keeps growing. Over vast sea areas, aircraft and ship targets are small and sparse, so detecting targets quickly and accurately from massive remote sensing imagery is a challenging task. Existing remote sensing target detection techniques usually focus on how to learn the target's feature information better and thereby improve detection accuracy. However, because of the huge data volume of remote sensing images and the huge number of parameters in the network models, current detection speed is greatly limited.

The invention addresses this situation and proposes a remote sensing image target detection method that takes both detection accuracy and detection speed into account, specifically a remote sensing image target detection method based on a deep evolutionary pruning convolutional network. Referring to Fig. 1, it includes the following steps:

(1) Process the training data set and validation data set: select several remote sensing images containing multiple kinds of targets and cut them into image blocks of 512 × 512 pixels; 70% of the image blocks form the training data set and 30% form the validation data set; apply data augmentation to the training data set.

(1a) Input several large-format remote sensing images containing multiple kinds of targets to be processed.

(1b) Annotate the targets in these large-format remote sensing images with an annotation tool.

(1c) Cut the remote sensing images into image blocks of 512 × 512 pixels centered on each target.

(1d) Name each cut image block according to the data set naming rule, and form the training data set and the validation data set from all named image blocks, with the training data set accounting for 70% and the validation data set for 30%; apply data augmentation to the training data set.

Here, the data set naming rule means joining the file name of each remote sensing image to be cut and the sliding-window step number of the corresponding cut data block with the English underscore character "_", and producing a file in .jpg format.
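The naming rule and the 70%/30% division of step (1d) can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, and the seeded random shuffle is an assumption, since the text gives only the proportions and not how blocks are assigned to each set.

```python
import random


def block_name(image_filename, window_step):
    """Join the source image name and the sliding-window step number with "_"."""
    stem = image_filename.rsplit(".", 1)[0]  # drop the original extension
    return f"{stem}_{window_step}.jpg"


def split_blocks(block_names, train_fraction=0.7, seed=0):
    """Divide block names into training and validation lists (70% / 30%)."""
    names = sorted(block_names)
    random.Random(seed).shuffle(names)  # assumed: random assignment with a fixed seed
    cut = round(len(names) * train_fraction)
    return names[:cut], names[cut:]


names = [block_name("scene01.tif", step) for step in range(10)]
train, val = split_blocks(names)
print(len(train), len(val), names[0])
```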

(2) Process the test data set: input several other remote sensing images containing multiple kinds of targets and cut them into image blocks of 512 × 512 pixels to form the test data set.

(2a) Input several other large-format remote sensing images containing multiple kinds of targets to be processed.

(2b) Annotate the targets in the large-format remote sensing images to be tested with an annotation tool.

(2c) Using an overlapping sliding window with the overlap set to 100 pixels, cut each image in sequence into image blocks of 512 × 512 pixels.

(2d) Name each cut image block according to the data set naming rule, and form the test data set from all named image blocks.
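The overlapping sliding window of step (2c) can be sketched as below: a 512-pixel window with a 100-pixel overlap advances by 412 pixels per step. How the image border is handled is not stated in the text; shifting the last window back so that it ends at the border is an assumption of this sketch, as are the function names.

```python
def window_origins(length, tile=512, overlap=100):
    """Top-left offsets of overlapping windows along one image axis."""
    stride = tile - overlap  # 412 pixels for the values in step (2c)
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    if origins[-1] != length - tile:
        origins.append(length - tile)  # assumed: last window is shifted back to the border
    return origins


def tile_grid(width, height, tile=512, overlap=100):
    """Top-left corners (x, y) of all image blocks, row by row."""
    return [(x, y)
            for y in window_origins(height, tile, overlap)
            for x in window_origins(width, tile, overlap)]


print(window_origins(1336))
print(len(tile_grid(1336, 1000)))
```

The position of a corner in this list can serve as the sliding-window step number used by the naming rule, which also makes it possible to recover the block's offset later.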

(3) Construct the deep convolutional feature extraction sub-network: construct the depthwise separable convolution inverted residual block and the feature pyramid convolution module respectively; then use a 7 × 7 convolutional layer and a max pooling layer in sequence, and connect the depthwise separable convolution inverted residual blocks and the feature pyramid convolution modules alternately to form the deep convolutional feature extraction sub-network.

The specific structure of the deep convolutional feature extraction sub-network is: original image input layer → 7 × 7 convolutional layer → first max pooling layer → first depthwise separable convolution inverted residual block C1 → second depthwise separable convolution inverted residual block C2 → first feature pyramid convolution module P1 → third depthwise separable convolution inverted residual block C3 → second feature pyramid convolution module P2 → fourth depthwise separable convolution inverted residual block C4 → third feature pyramid convolution module P3 → second max pooling layer → fourth feature pyramid convolution module P4 → third max pooling layer → fifth feature pyramid convolution module P5 → current-stage feature map output layer.

(4) Construct the fully convolutional (FCN) detection sub-networks:

(4a) Construct the fully convolutional FCN classification subnet: the feature map of each feature pyramid convolution module is taken in turn as the input, and the structure is classification subnet input layer → first 3 × 3 convolutional layer → second 3 × 3 convolutional layer → third 3 × 3 convolutional layer → fourth 3 × 3 convolutional layer → fifth 3 × 3 convolutional layer → classification subnet output layer. Classification detection is performed on each input feature map in sequence; the input feature map sizes are 64 × 64, 32 × 32, 16 × 16, 8 × 8 and 4 × 4 respectively.

The classification features output by the fifth 3 × 3 convolutional layer are used to compute the classification confidence of each default box for each category: the feature values of the feature map output by the fifth 3 × 3 convolutional layer are fed into the sigmoid function, which outputs the probability that the default box belongs to the corresponding category, i.e. the classification confidence of the default box for each class. The sigmoid function is calculated as follows:

sigmoid(x) = 1 / (1 + e^{-x})

where x denotes a feature value of the feature map fed into the sigmoid function.
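As a small illustration of how the classification subnet output becomes a confidence, the sketch below applies the standard sigmoid to a list of feature values. The function names are illustrative, and the two-branch form is only a numerical-stability choice that avoids overflow for large negative inputs.

```python
import math


def sigmoid(x):
    """Standard logistic function 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def class_confidences(feature_values):
    """Map the feature values of one default box to per-class confidences."""
    return [sigmoid(v) for v in feature_values]


print(class_confidences([-2.0, 0.0, 3.0]))
```

Each class is scored independently, so the confidences of one default box do not have to sum to 1.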

(4b) Construct the fully convolutional FCN regression subnet: the feature map of each feature pyramid convolution module is taken in turn as the input, and the structure is regression subnet input layer → first 3 × 3 convolutional layer → second 3 × 3 convolutional layer → third 3 × 3 convolutional layer → fourth 3 × 3 convolutional layer → fifth 3 × 3 convolutional layer → regression subnet output layer. Regression detection is performed on each input feature map in sequence; the input feature map sizes are 64 × 64, 32 × 32, 16 × 16, 8 × 8 and 4 × 4 respectively.

(5) Construct and train the deep convolutional target detection network:

(5a) Construct the deep convolutional target detection network: assemble the deep convolutional feature extraction sub-network, the fully convolutional FCN classification subnet and the fully convolutional FCN regression subnet in sequence to form the deep convolutional target detection network; its structure is original image input layer → deep convolutional feature extraction sub-network → fully convolutional FCN classification and regression subnets.

(5b) Train the deep convolutional target detection network: train the network with the training data set and validation data set as input to obtain the trained deep convolutional target detection network, and save the weight file of the trained network.

(6a) Encode, layer by layer, the convolution filters that take part in pruning in the trained deep convolutional target detection network as DNA codes; the code is denoted DNA_{1,...,l-1,l}.

(6b) Optimize the code DNA_{1,...,l-1,l} with an evolutionary algorithm to obtain the final optimized code DNA'_{1,...,l-1,l}.

(6c) Combine the optimized code DNA'_{1,...,l-1,l} with the pruning rule to construct the target detection network based on the deep evolutionary pruning convolutional network. The pruning rule is: a code of 0 means the convolution filter is finally pruned, and a code of 1 means the convolution filter is finally retained. Fine-tune the network with the training data set to obtain the trained target detection network based on the deep evolutionary pruning convolutional network, i.e. the trained model, and save the trained model weight file.
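Steps (6a) to (6c) can be sketched with a toy evolutionary search over a binary code in which each bit stands for one convolution filter (1 = retained, 0 = pruned). This is only a sketch under stated assumptions: the text does not specify the evolutionary operators here, so the truncation selection, single-point crossover, bit-flip mutation, population size and generation count are all illustrative choices. The `toy_fitness` function and its `importance` scores are hypothetical stand-ins; in the patent the fitness is the performance of the pruned, retrained network on the data set.

```python
import random


def evolve_pruning_code(filters_per_layer, fitness, generations=30,
                        population=20, mutation_rate=0.05, seed=0):
    """Search for a binary keep/prune code covering the filters of all layers."""
    rng = random.Random(seed)
    length = sum(filters_per_layer)  # one bit per filter, all layers concatenated
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(population)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: population // 2]  # the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < population:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]      # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)


def split_by_layer(code, filters_per_layer):
    """Cut the flat code back into one keep/prune mask per layer."""
    masks, start = [], 0
    for count in filters_per_layer:
        masks.append(code[start:start + count])
        start += count
    return masks


# Hypothetical stand-in fitness: reward useful filters, charge a cost per kept filter.
importance = [0.9, 0.1, 0.8, 0.2, 0.7, 0.05, 0.6, 0.3]


def toy_fitness(code):
    kept_value = sum(w for w, bit in zip(importance, code) if bit)
    return kept_value - 0.4 * sum(code)


best = evolve_pruning_code([3, 5], toy_fitness)
print(split_by_layer(best, [3, 5]))
```

Because the whole code is evolved at once, a filter that was switched off in one generation can be switched back on in a later one, which is the dynamic and global behaviour the description contrasts with fixed layer-by-layer pruning.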

(7) Perform target detection on the test data set with the trained model:

(7a) Input the data blocks of the test data set in sequence into the trained target detection network based on the deep evolutionary pruning convolutional network to obtain, for each data block in the test data set, the candidate boxes, the classification confidence score of each candidate box and the target category of each candidate box.

(7b) Discard all candidate boxes whose classification confidence score for the target category is below the threshold 0.3, and apply non-maximum suppression to the remaining candidate boxes.
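Step (7b) can be sketched as a score filter followed by greedy non-maximum suppression. The 0.3 score threshold comes from the text; the IoU threshold of 0.5, the per-category suppression and the `(box, score, label)` tuple layout are assumptions made for this illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(detections, score_threshold=0.3, iou_threshold=0.5):
    """detections: list of (box, score, label). Returns the boxes that survive."""
    candidates = sorted((d for d in detections if d[1] >= score_threshold),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, score, label in candidates:
        # Assumed: a box is suppressed only by a higher-scoring box of the same category.
        if all(label != k[2] or iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, score, label))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9, "aircraft"),
    ((1, 1, 11, 11), 0.8, "aircraft"),    # overlaps the first box heavily
    ((1, 1, 11, 11), 0.7, "ship"),        # same place, different category
    ((50, 50, 60, 60), 0.2, "aircraft"),  # below the 0.3 score threshold
]
print(filter_and_nms(detections))
```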

The difficulty of achieving both accuracy and speed, common in the prior art, has long remained unresolved. The invention optimizes the detection accuracy and computing speed of the target detection network simultaneously and provides an optimization scheme that brings accuracy and speed to their best at the same time, has a clear advantage in computational complexity and computing speed, and improves model accuracy over the prior art.

The idea of the invention is as follows: first build an inverted residual network from depthwise separable convolution units, which greatly reduce the number of model parameters, to extract the basic features of the input image; use these features as the input of the feature pyramid convolutional network for finer feature extraction; perform detection with the fully convolutional FCN classification subnet and the fully convolutional FCN regression subnet; and finally apply global evolutionary pruning optimization to accelerate the network. The features extracted by the invention are better suited to the remote sensing image target detection task; they improve detection accuracy and at the same time greatly increase network speed.

Embodiment 2:

The remote sensing image target detection method based on a deep evolutionary pruning convolutional network is the same as in Embodiment 1. The construction of the deep convolutional feature extraction sub-network in step (3) comprises the following specific steps:

(3a) Construct the depthwise separable convolution inverted residual block: its structure is previous-stage feature map input layer → 1 × 1 convolutional layer → depthwise separable convolution unit → element-wise addition layer → current-stage feature map output layer.

The 1 × 1 convolutional layer and the depthwise separable convolution unit appear as a pair in the inverted residual block. The element-wise addition layer is a feature processing layer that adds, element by element, the output feature map of the preceding depthwise separable convolution unit and the feature map from the input layer of the inverted residual block.

The point of the inverted residual block is this: a conventional residual connection structure first reduces and then increases the channel dimension of the input feature map, whereas the inverted residual block first increases and then reduces it. The 1 × 1 convolutional layer doubles the number of channels of the input feature map; the depthwise separable convolution unit then performs feature extraction and halves the number of channels, so that the number of channels of the output feature map of the constructed block matches that of the input feature map.

(3b) Construct the feature pyramid convolution module: the module has two inputs and one output. Its structure is: input feature map 1 → first convolutional layer of input feature map 1 → 2× upsampling layer → output feature map 1; input feature map 2 → first convolutional layer of input feature map 2 → output feature map 2; then element-wise addition layer → second convolutional layer → current-stage feature map output layer.

Here, input feature map 1 is a stage feature map of a depthwise separable convolution inverted residual block whose input and output feature maps are the same size, and input feature map 2 is a feature map in the inverted residual blocks that has the same spatial size as output feature map 1. The element-wise addition layer is a feature processing layer that adds output feature map 1 and output feature map 2 element by element. The 2× upsampling layer enlarges the scale of input feature map 1, after it has been processed by the first convolutional layer, using bilinear interpolation.
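The two-input fusion of step (3b) can be sketched in NumPy as below. Several details are assumptions of this sketch rather than statements of the patent: the kernel sizes of the module's convolutional layers are not given, so 1 × 1 convolutions are used throughout; the bilinear interpolation uses half-pixel sampling positions with border clamping; and feature maps are laid out as (height, width, channels).

```python
import numpy as np


def conv1x1(x, weight):
    """Pointwise convolution: (H, W, C_in) map times a (C_in, C_out) weight."""
    return x @ weight


def upsample2x_bilinear(x):
    """Double the height and width of an (H, W, C) map by bilinear interpolation."""
    def axis_weights(n):
        src = (np.arange(2 * n) + 0.5) / 2.0 - 0.5  # assumed half-pixel sampling
        src = np.clip(src, 0, n - 1)
        lo = np.floor(src).astype(int)
        hi = np.minimum(lo + 1, n - 1)
        return lo, hi, src - lo

    h, w, _ = x.shape
    ylo, yhi, yf = axis_weights(h)
    xlo, xhi, xf = axis_weights(w)
    rows = x[ylo] * (1 - yf)[:, None, None] + x[yhi] * yf[:, None, None]
    return rows[:, xlo] * (1 - xf)[None, :, None] + rows[:, xhi] * xf[None, :, None]


def pyramid_module(feat1, feat2, w1, w2, w_out):
    """Fuse a coarse map (feat1) with a map of twice its spatial size (feat2)."""
    out1 = upsample2x_bilinear(conv1x1(feat1, w1))  # output feature map 1
    out2 = conv1x1(feat2, w2)                       # output feature map 2
    return conv1x1(out1 + out2, w_out)              # element-wise add, then second conv


coarse = np.ones((4, 4, 8))
fine = np.ones((8, 8, 16))
fused = pyramid_module(coarse, fine, np.ones((8, 4)), np.ones((16, 4)), np.ones((4, 4)))
print(fused.shape)
```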

(3c) Use a 7 × 7 convolutional layer and a max pooling layer in sequence, and connect the depthwise separable convolution inverted residual blocks and the feature pyramid convolution modules alternately to construct the deep convolutional feature extraction sub-network. Its specific structure is: original image input layer → 7 × 7 convolutional layer → first max pooling layer → first depthwise separable convolution inverted residual block C1 → second depthwise separable convolution inverted residual block C2 → first feature pyramid convolution module P1 → third depthwise separable convolution inverted residual block C3 → second feature pyramid convolution module P2 → fourth depthwise separable convolution inverted residual block C4 → third feature pyramid convolution module P3 → second max pooling layer → fourth feature pyramid convolution module P4 → third max pooling layer → fifth feature pyramid convolution module P5 → current-stage feature map output layer.

Accuracy is preserved while the number of parameters is reduced: the invention proposes an inverted residual block. A conventional residual connection structure uses 1 × 1 convolutional layers to first reduce and then restore the channel dimension of the input feature map, but the features are compressed after the reduction, part of the useful feature information in the image is removed, and the reduced target feature information lowers detection accuracy. The invention designs an inverted residual block that first doubles the number of channels of the input feature map to obtain more image feature information, and then feeds the result to the depthwise separable unit for feature extraction and channel reduction. This reduces the number of parameters and increases network speed while maintaining high detection accuracy.

Embodiment 3:

The remote sensing target detection method based on the deep evolutionary pruning convolutional network is the same as in Embodiments 1-2. For the depthwise separable convolution unit described in step (3a), referring to Fig. 2, the unit structure is: previous-stage feature map input layer → 3 × 3 depthwise convolutional layer → first batch normalization layer → ReLU activation function layer → 1 × 1 pointwise convolutional layer → second batch normalization layer → linear activation function layer → output feature map layer.

The depthwise separable convolution unit splits a standard convolution into a depthwise convolution and a pointwise convolution, so that the spatial and channel dimensions of the features are processed separately, which greatly reduces the parameter count and the computational complexity.

The activation function after the 1 × 1 pointwise convolutional layer is a linear activation function rather than ReLU. This prevents ReLU from causing a large information loss on tensors with few channels and thereby destroying feature information.

Assume the input feature map has size $H_{in} \times W_{in} \times C_{in}$, where $H_{in}$, $W_{in}$, and $C_{in}$ are its height, width, and number of channels; the convolution kernel has size $K \times K$; and the output feature map has size $H_{out} \times W_{out} \times C_{out}$, where $H_{out}$, $W_{out}$, and $C_{out}$ are its height, width, and number of channels. A standard convolution considers the spatial and channel information of the input feature map at the same time, and its computational cost is:

$$K \times K \times C_{in} \times H_{out} \times W_{out} \times C_{out}$$

The depthwise separable convolution uses a 3 × 3 depthwise convolution and a 1 × 1 pointwise convolution to separate the channel and spatial dimensions of the input feature map and process them individually. Its computational cost is:

$$K \times K \times C_{in} \times H_{out} \times W_{out} + C_{in} \times H_{out} \times W_{out} \times C_{out}$$

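Relative to the standard convolution, the computational cost of the depthwise separable convolution of the invention is reduced to the fraction

$$\frac{K \times K \times C_{in} \times H_{out} \times W_{out} + C_{in} \times H_{out} \times W_{out} \times C_{out}}{K \times K \times C_{in} \times H_{out} \times W_{out} \times C_{out}} = \frac{1}{C_{out}} + \frac{1}{K^{2}}$$

For a 3 × 3 convolution kernel, the computational cost is therefore reduced by a factor of about 9.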
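The two cost expressions above can be checked numerically. The following is a minimal sketch, not the patented implementation; the layer dimensions are hypothetical values chosen only for illustration.

```python
def standard_conv_cost(k, c_in, h_out, w_out, c_out):
    """Multiply-accumulate count of a standard K x K convolution."""
    return k * k * c_in * h_out * w_out * c_out


def separable_conv_cost(k, c_in, h_out, w_out, c_out):
    """Depthwise K x K convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in * h_out * w_out
    pointwise = c_in * h_out * w_out * c_out
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 64 x 64 output map.
k, c_in, h_out, w_out, c_out = 3, 64, 64, 64, 128
std = standard_conv_cost(k, c_in, h_out, w_out, c_out)
sep = separable_conv_cost(k, c_in, h_out, w_out, c_out)
ratio = sep / std        # equals 1 / c_out + 1 / k**2
speedup = std / sep      # approaches k**2 = 9 as c_out grows
```

Because of the 1 / C_out term, the reduction stays slightly below a factor of 9 for any finite number of output channels.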

Embodiment 4:

The remote sensing target detection method based on the deep evolutionary pruning convolutional network is the same as in Embodiments 1-3. The layer-by-layer DNA encoding, described in step (6a), of the convolution filters that participate in pruning in the trained deep convolutional target detection network proceeds as follows:

(6a1) Express the output feature map as a convolution: for layer $l$ of the trained deep convolutional target detection network, denote by $Z_l$ the output feature map with height $H_l$, width $W_l$, and $C_l$ channels, and denote by $Z_l^{(k)}$ the feature sub-map of its $k$-th channel. $Z_l^{(k)}$ is obtained by the convolution operation ($*$) of the parameters $W_l^{(k)}$ of the corresponding convolution filter with the feature map $Z_{l-1}$ of the preceding layer, where $f$ denotes the activation function:

$$Z_l^{(k)} = f\left(Z_{l-1} * W_l^{(k)}\right)$$

In common deep learning frameworks such as TensorFlow and Caffe, the convolution of tensors is converted into a matrix multiplication by transforming the dimensions of the input and transposing the convolution filters. The output feature map $Z_l^{*}$ of layer $l$ after this conversion is given by:

$$Z_l^{*} = f\left(Z_{l-1}^{*} \times W_l^{*}\right)$$

where $Z_{l-1}^{*}$ is the converted feature map of layer $l-1$, and $W_l^{*}$ is the converted parameter matrix of the convolution filters corresponding to the feature map of layer $l$.

(6a2) Apply mask coding to the convolution filters to be pruned or retained: for the $C_l$ output feature maps of layer $l$ of the trained deep convolutional target detection network, a mask code is introduced for each convolution filter to be pruned or retained. A code of 0 means the filter is pruned, and a code of 1 means the filter is retained. With ⊙ denoting the inner product, the matrix-form convolution formula becomes, under global feature-channel pruning, the following:

(6a3) Encode the convolution filters that participate in pruning layer by layer: using the trained target detection network based on depthwise separable convolution, the convolution filters that participate in pruning are encoded layer by layer, and the code covering all layers to be pruned is denoted $DNA_{1,\ldots,l-1,l}$.

Here $DNA_{1,\ldots,l-1,l}$ denotes the DNA encoding of layers 1 through $l$ that are to be pruned; its components are the code symbols of the individual layers, such as those of the layer $l-1$ feature map.
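One way to picture the joint encoding is as per-layer 0/1 masks joined into a single code. The sketch below is illustrative only: the flat-list layout and the helper names `concat_dna` and `split_dna` are assumptions, not taken from the patent.

```python
def concat_dna(layer_masks):
    """Join per-layer 0/1 masks into one flat code covering layers 1..l."""
    return [bit for mask in layer_masks for bit in mask]


def split_dna(dna, filters_per_layer):
    """Recover the per-layer masks from the flat code."""
    masks, start = [], 0
    for n in filters_per_layer:
        masks.append(dna[start:start + n])
        start += n
    return masks


# Hypothetical masks for three layers (0 = prune the filter, 1 = keep it).
layer_masks = [[1, 0, 1, 1], [0, 1], [1, 1, 0]]
dna = concat_dna(layer_masks)
restored = split_dna(dna, [len(m) for m in layer_masks])
```

Holding all layers in one code is what allows a later search step to change pruning decisions in several layers at once.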

Because the invention is a remote sensing target detection method based on a deep evolutionary pruning convolutional network, a pruning algorithm is applied to the trained deep convolutional target detection network to remove the large number of redundant convolution filters present in it. This lowers the network's risk of overfitting and substantially simplifies the network structure; the smaller parameter count makes the network easier to deploy on embedded devices and markedly speeds up inference.

Most previous methods prune filters layer by layer in a fixed order. Such an approach cannot dynamically restore filters that were removed earlier, may overlook complex dependencies between filters, is inflexible, and can cause a significant drop in the network's evaluation performance. The invention jointly encodes the filters to be pruned across all layers, which is flexible and makes full use of the correlations between filters, so that network performance improves while the network is accelerated.

Embodiment 5:

The remote sensing target detection method based on the deep evolutionary pruning convolutional network is the same as in Embodiments 1-4. The specific method, described in step (6b), of optimizing the code $DNA_{1,\ldots,l-1,l}$ with an evolutionary algorithm to obtain the final optimized code $DNA'_{1,\ldots,l-1,l}$ is:

(6b1) Initialization: set the generation counter $t = 0$, the maximum number of generations $T$, and the pruning ratio $ratio_{cut} = 0.5$; according to $ratio_{cut}$, randomly generate $M$ individuals encoded as $DNA_{1,\ldots,l-1,l}$ to form the initial population.

Each member of the initial population, such as that of the $(m-1)$-th individual, is a DNA encoding over the filters of layers 1 to $l$.

(6b2) Adjust the network parameters with the training dataset: for the population $P_t$ of generation $t$, retrain the generated networks using the training dataset and the convolution formula with global feature-channel pruning after mask coding, adjusting the network parameters.

(6b3) Compute fitness: using the validation dataset and the convolution formula with global feature-channel pruning after mask coding, compute the fitness of each individual in the population $P_t$ from the loss on the validation dataset.

(6b4) Generate new individuals: using the fitness of the individuals, select those with higher fitness for crossover and mutation to generate new individuals. The crossover operation is applied to parent individuals at random with crossover probability $p_m = 0.9$, and the mutation operation is applied to parent individuals at random with mutation probability $p_c = 0.9$. Through steps (6b1) to (6b4), the population $P_t$ yields the next-generation population $P_{t+1}$ after selection, crossover, and mutation.

(6b5) Decide whether to stop evolving: if $t = T$, the individual with the highest fitness obtained during evolution is output as the optimal solution and the computation terminates; its code is denoted $DNA'_{1,\ldots,l-1,l}$, and step (6c) is executed to build the target detection network based on the deep evolutionary pruning convolutional network. Otherwise, if $t < T$, return to step (6b2) and repeat steps (6b2) to (6b5) to continue the evolutionary optimization of the code.
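Steps (6b1) to (6b5) can be sketched as a small genetic-algorithm loop. This is a toy sketch under stated assumptions: single-point crossover, single-bit mutation, and truncation selection are assumed operator choices, and `toy_fitness` is a hypothetical stand-in for the validation-based fitness, since retraining a detector is out of scope here.

```python
import random


def init_population(n_genes, m, ratio_cut, rng):
    """(6b1) M random individuals; a fraction ratio_cut of genes is 0 (pruned)."""
    n_zero = int(n_genes * ratio_cut)
    population = []
    for _ in range(m):
        genes = [0] * n_zero + [1] * (n_genes - n_zero)
        rng.shuffle(genes)
        population.append(genes)
    return population


def crossover(a, b, rng):
    """Single-point crossover (operator type is an assumption)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(genes, rng):
    """Flip one randomly chosen gene (operator type is an assumption)."""
    i = rng.randrange(len(genes))
    return genes[:i] + [1 - genes[i]] + genes[i + 1:]


def evolve(fitness, n_genes, m=10, t_max=20, ratio_cut=0.5,
           p_cross=0.9, p_mut=0.9, seed=0):
    rng = random.Random(seed)
    population = init_population(n_genes, m, ratio_cut, rng)
    best = max(population, key=fitness)
    for _ in range(t_max):                       # (6b5) stop when t reaches T
        # (6b2) retraining of each pruned network would happen here.
        ranked = sorted(population, key=fitness, reverse=True)  # (6b3)
        parents = ranked[:max(2, m // 2)]        # (6b4) keep the fitter half
        children = []
        while len(children) < m:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng) if rng.random() < p_cross else list(a)
            if rng.random() < p_mut:
                child = mutate(child, rng)
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


# Hypothetical target mask used only to give the toy fitness something to match.
target = [1, 0] * 8


def toy_fitness(genes):
    return -sum(g != t for g, t in zip(genes, target))


best = evolve(toy_fitness, n_genes=len(target))
```

The loop tracks the best individual seen across all generations, mirroring the requirement in step (6b5) that the fittest individual obtained during evolution is output.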

The invention combines the target detection network with a global dynamic pruning method based on an evolutionary algorithm to accelerate the network: redundant filters are removed by pruning, which accelerates the CNN. Unlike Reference 2, the method does not require a prior function to be designed in advance; the pruning process is optimized by a global dynamic evolutionary algorithm, which makes the algorithm easier to implement. The invention jointly encodes the filters to be pruned across all layers and optimizes the network to be pruned with the evolutionary algorithm, using the network's performance on the test set as the fitness and completing the iterative optimization of the network structure through retraining, so that the model finally reaches the desired acceleration while preserving its performance.

A more complete and detailed example is given below to further describe the invention.

Embodiment 6:

The remote sensing target detection method based on the deep evolutionary pruning convolutional network is the same as in Embodiments 1-5. Referring to Fig. 1:

Step 1, process and determine the training dataset and the validation dataset:

Input several large-format optical remote sensing images to be processed, each containing multiple targets, and label the targets in them with an annotation tool. Centered on each target, cut the optical remote sensing image into image blocks of 512 × 512 pixels. Name each cut image block according to the dataset naming rule, and divide all named image blocks into a training dataset (70%) and a validation dataset (30%). Then apply data augmentation to the image blocks of the training dataset in sequence, including image scaling, translation, rotation, mirroring, contrast and brightness adjustment, and noise addition, to form the final training dataset.

Step 2, process and determine the test dataset:

Input several other large-format optical remote sensing images to be processed, each containing multiple targets, and label the targets in these test images with an annotation tool. Using an overlapping sliding window with the overlap set to 100 pixels, cut each image in sequence into image blocks of 512 × 512 pixels. Name each cut image block according to the dataset naming rule, and form the test dataset from all named image blocks.
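The overlapping sliding-window cut can be sketched as follows. The tile size of 512 and the overlap of 100 pixels come from the text; the handling of the image border (adding a final tile flush with the edge) is an assumption, since the text does not specify it, and the function names are illustrative.

```python
def tile_origins(length, tile=512, overlap=100):
    """Start offsets of overlapping tiles along one image axis."""
    stride = tile - overlap
    if length <= tile:
        return [0]
    origins = list(range(0, length - tile + 1, stride))
    # Assumption: add a last tile flush with the border so no pixels are skipped.
    if origins[-1] + tile < length:
        origins.append(length - tile)
    return origins


def tile_boxes(width, height, tile=512, overlap=100):
    """(x0, y0, x1, y1) boxes of all tiles covering a width x height image."""
    return [(x, y, x + tile, y + tile)
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]


# Hypothetical 1200 x 900 image.
boxes = tile_boxes(1200, 900)
```

With a 100-pixel overlap the window advances 412 pixels per step, so a target split by one tile border lies wholly inside a neighbouring tile as long as it is smaller than the overlap.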

Step 3, build the deep convolutional feature extraction sub-network. The specific steps are:

The depthwise separable convolution unit splits a standard convolution into a depthwise convolution and a pointwise convolution, separating the channel and spatial dimensions of the features and thereby greatly reducing the computational complexity and the model parameter count. The unit structure is: previous-stage feature map input layer → 3 × 3 depthwise convolutional layer → first batch normalization layer → ReLU activation function layer → 1 × 1 pointwise convolutional layer → second batch normalization layer → linear activation function layer → output feature map layer. The activation function after the 1 × 1 pointwise convolutional layer is a linear activation function rather than ReLU, to prevent ReLU from destroying feature information.

Assume the input feature map has size $H_{in} \times W_{in} \times C_{in}$, where $H_{in}$, $W_{in}$, and $C_{in}$ are its height, width, and number of channels; the convolution kernel has size $K \times K$; and the output feature map has size $H_{out} \times W_{out} \times C_{out}$, where $H_{out}$, $W_{out}$, and $C_{out}$ are its height, width, and number of channels. A standard convolution considers the spatial and channel information of the input feature map at the same time, and its computational cost is:

$$K \times K \times C_{in} \times H_{out} \times W_{out} \times C_{out}$$

The computational cost of the depthwise separable convolution is:

$$K \times K \times C_{in} \times H_{out} \times W_{out} + C_{in} \times H_{out} \times W_{out} \times C_{out}$$

For a 3 × 3 convolution kernel, the computational cost is reduced by a factor of about 9, so a speedup of 7 to 9 times can be reached.

The significance of the inverted residual connection module is as follows. A conventional residual connection structure first reduces and then restores the channel dimension of the input feature map, whereas the inverted residual connection module first expands and then reduces it. The 1 × 1 convolutional layer expands the channels of the input feature map by a factor of 2, the 3 × 3 depthwise convolutional layer in the depthwise separable convolution unit performs feature extraction, and the 1 × 1 pointwise convolutional layer in the depthwise separable convolution unit reduces the channels by a factor of 2, so that the number of output channels of the constructed depthwise separable convolution inverted residual connection module matches that of the input feature map.
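The channel flow through the inverted residual connection module can be traced with a few lines of arithmetic. This is a rough sketch: the function names are illustrative, and the weight count ignores biases and batch normalization parameters, which the text does not quantify.

```python
def inverted_residual_channels(c_in, expansion=2):
    """Channel count after each stage of the inverted residual module."""
    expanded = c_in * expansion        # 1 x 1 convolution: 2x channel expansion
    after_depthwise = expanded         # 3 x 3 depthwise convolution keeps channels
    projected = expanded // expansion  # 1 x 1 pointwise convolution: 2x reduction
    return [c_in, expanded, after_depthwise, projected]


def inverted_residual_weights(c_in, k=3, expansion=2):
    """Convolution weight count (biases and batch norm ignored)."""
    expanded = c_in * expansion
    return c_in * expanded + k * k * expanded + expanded * c_in


def standard_conv_weights(c_in, c_out, k=3):
    """Weight count of a plain K x K convolution, for comparison."""
    return k * k * c_in * c_out


# Hypothetical module with 64 input channels.
trace = inverted_residual_channels(64)
block_weights = inverted_residual_weights(64)
plain_weights = standard_conv_weights(64, 64)
```

For 64 channels the module holds 17,536 convolution weights, against 36,864 for a single plain 3 × 3 convolution with the same input and output width.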

(3b) Build the feature pyramid convolution module: the module has a two-input, single-output structure. Input feature map 1 → first convolutional layer for input feature map 1 → 2× upsampling layer → output feature map 1; input feature map 2 → first convolutional layer for input feature map 2 → output feature map 2; the two outputs then pass through: pointwise addition layer → second convolutional layer → current-stage feature map output layer.

The parameters of the feature pyramid convolution module are set as follows: the first convolutional layer for input feature map 1 has a filter size of 1 × 1 and a stride of 1; the first convolutional layer for input feature map 2 has a filter size of 1 × 1 and a stride of 1; the second convolutional layer has a filter size of 3 × 3 and a stride of 1.

The feature pyramid convolution module takes as input the features extracted stage by stage by the depthwise separable convolution inverted residual modules and merges in the semantic features from higher levels by upsampling. This lets the network combine deep and shallow features effectively and bridge the semantic gap between feature maps of different stages, so that deep and shallow features are both applied to classification and regression. Overall, it improves the detection accuracy for small targets in remote sensing images, such as small aircraft and ships.
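The upsample-and-add core of the module can be illustrated on small 2-D maps. The sketch below omits the 1 × 1 and 3 × 3 convolutions and assumes nearest-neighbour interpolation for the 2× upsampling, which the text does not specify; it also assumes that input feature map 1 is the coarser of the two, since it is the one that is upsampled.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D map (interpolation mode assumed)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Point-by-point addition of two equally sized maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


deep = [[1, 2], [3, 4]]                 # input feature map 1 (coarser)
shallow = [[10] * 4 for _ in range(4)]  # input feature map 2 (finer)
fused = add_maps(upsample2x(deep), shallow)
```

Each value of the coarse map is spread over a 2 × 2 patch before the addition, which is how higher-level semantic features reach the finer map.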

Step 4, build the fully convolutional (FCN) detection sub-networks:

Build the FCN classification subnet: the feature map of each feature pyramid convolution module is taken in turn as the classification subnet input layer → first 3 × 3 convolutional layer → second 3 × 3 convolutional layer → third 3 × 3 convolutional layer → fourth 3 × 3 convolutional layer → fifth 3 × 3 convolutional layer → classification subnet output layer.

The classification subnet input layer takes the feature maps of the feature pyramid convolution modules one at a time as the input of the classification subnet and performs classification on each in turn; the input feature map sizes are 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4.

The parameters of the FCN classification subnet are set as follows:

The first four 3 × 3 convolutional layers each have a stride of 1.

A further convolution is applied to the output of the fourth 3 × 3 convolutional layer to obtain the classification features, with a stride of 1 and 9 × 2 filters, where "9" is the number of default boxes for each pixel of the feature map output by the fourth 3 × 3 convolutional layer, and "2" is the number of classes of the classification subnet.

Build the FCN regression subnet: the feature map of each feature pyramid convolution module is taken in turn as the regression subnet input layer → first 3 × 3 convolutional layer → second 3 × 3 convolutional layer → third 3 × 3 convolutional layer → fourth 3 × 3 convolutional layer → fifth 3 × 3 convolutional layer → regression subnet output layer.

The regression subnet input layer takes the feature maps of the feature pyramid convolution modules one at a time as the input of the regression subnet and performs regression on each in turn; the input feature map sizes are 64 × 64, 32 × 32, 16 × 16, 8 × 8, and 4 × 4.

The parameters of the FCN regression subnet are set as follows:

The first four convolutional layers each have a stride of 1.

A further convolution is applied to the output of the fourth 3 × 3 convolutional layer to obtain the position offsets of the default boxes, with a stride of 1 and 9 × 4 filters, where "9" is the number of default boxes for each pixel of the feature map output by the fourth 3 × 3 convolutional layer, and "4" is the number of position offsets, one for each of the four coordinate values of the upper-left and lower-right corners of a default box.

The two FCN sub-networks are independent of each other and do not share parameters.
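The output dimensions implied by these settings follow from simple arithmetic. The sketch below derives the filter counts of the last layer of each subnet and the total number of default boxes for one 512 × 512 image block; the constant names are illustrative, and the strides are inferred from the listed map sizes rather than stated in the text.

```python
ANCHORS_PER_PIXEL = 9              # default boxes per feature-map pixel
NUM_CLASSES = 2                    # classes of the classification subnet
BOX_COORDS = 4                     # offsets for two corners (x, y each)
MAP_SIZES = [64, 32, 16, 8, 4]     # feature-map side lengths fed to both subnets

cls_filters = ANCHORS_PER_PIXEL * NUM_CLASSES   # 9 x 2 filters
reg_filters = ANCHORS_PER_PIXEL * BOX_COORDS    # 9 x 4 filters

# Default boxes per pyramid level and in total for one image block.
boxes_per_level = [s * s * ANCHORS_PER_PIXEL for s in MAP_SIZES]
total_boxes = sum(boxes_per_level)

# Downsampling factor of each level relative to a 512 x 512 input block.
strides = [512 // s for s in MAP_SIZES]
```

Most default boxes come from the 64 × 64 level, which is the level best suited to small targets.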

Step 5, build and train the deep convolutional target detection network:

Build the deep convolutional target detection network: the deep convolutional feature extraction sub-network, the FCN classification subnet, and the FCN regression subnet are assembled in sequence to form the deep convolutional target detection network, whose structure is: original image input layer → deep convolutional feature extraction sub-network → FCN classification and regression sub-networks.

Train the deep convolutional target detection network: the network is trained with the training dataset and the validation dataset as input to obtain the trained deep convolutional target detection network, and the weight file of the trained network is saved.

Step 6, build and train the target detection network based on the deep evolutionary pruning convolutional network:

(6a) Perform layer-by-layer DNA encoding of the convolution filters that participate in pruning in the trained deep convolutional target detection network; the code is denoted $DNA_{1,\ldots,l-1,l}$. The specific method is:

$$Z_l^{(k)} = f\left(Z_{l-1} * W_l^{(k)}\right)$$

In common deep learning frameworks such as TensorFlow and Caffe, the convolution of tensors is converted into a matrix multiplication by transforming the dimensions of the input and transposing the convolution filters. The output feature map $Z_l^{*}$ of layer $l$ after this conversion is given by:

$$Z_l^{*} = f\left(Z_{l-1}^{*} \times W_l^{*}\right)$$

where $Z_{l-1}^{*}$ is the converted feature map of layer $l-1$, and $W_l^{*}$ is the converted parameter matrix of the convolution filters corresponding to the feature map of layer $l$.

(6a3) Encode the convolution filters that participate in pruning layer by layer: using the trained deep convolutional target detection network, the convolution filters that participate in pruning are encoded layer by layer, and the code covering all layers to be pruned is denoted $DNA_{1,\ldots,l-1,l}$.

Here $DNA_{1,\ldots,l-1,l}$ denotes the DNA encoding of layers 1 through $l$ that are to be pruned; its components are the code symbols of the individual layers, such as those of layer $l-1$.

(6b) Optimize the code $DNA_{1,\ldots,l-1,l}$ with the evolutionary algorithm to obtain the final optimized code $DNA'_{1,\ldots,l-1,l}$. The specific method is:

(6b5) Decide whether to stop evolving: if $t = T$, the individual with the highest fitness obtained during evolution is output as the optimal solution and the computation terminates; its code is denoted $DNA'_{1,\ldots,l-1,l}$, and step (6c) is executed to build the target detection network based on the deep evolutionary pruning convolutional network. Otherwise, if $t < T$, return to step (6b2) and repeat steps (6b2) to (6b5) to continue the evolutionary optimization of the code.

(6c) Combine the optimized code $DNA'_{1,\ldots,l-1,l}$ with the pruning rule to build the target detection network based on the deep evolutionary pruning convolutional network. The pruning rule is: a code of 0 means the convolution filter is finally pruned, and a code of 1 means the convolution filter is finally retained. The network is then fine-tuned on the training dataset to obtain the trained target detection network based on the deep evolutionary pruning convolutional network, that is, the trained model, and the trained model weight file is saved.
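The pruning rule in step (6c) amounts to filtering each layer's filters by its code. A minimal sketch follows, with filters represented by hypothetical string labels; in a real network the same selection would be applied to the weight tensors, which is not shown here.

```python
def prune_filters(filters, codes):
    """Keep filters whose code is 1; drop filters whose code is 0."""
    if len(filters) != len(codes):
        raise ValueError("one code is required per filter")
    return [f for f, c in zip(filters, codes) if c == 1]


# Hypothetical layer with four filters and its optimized code.
layer_filters = ["f0", "f1", "f2", "f3"]
layer_codes = [1, 0, 0, 1]
kept = prune_filters(layer_filters, layer_codes)
```

After such a selection, the layer that consumes this output also needs its input channels reduced to match, and the fine-tuning mentioned in step (6c) then recovers accuracy.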

Step 7, perform target detection on the test dataset with the trained model:

The data blocks of the test dataset are fed in sequence into the trained target detection network based on the deep evolutionary pruning convolutional network to obtain, for each data block, the candidate boxes, the classification confidence score of each candidate box, and the target class of each candidate box.

All candidate boxes whose classification confidence score for the target class is below the threshold 0.3 are discarded, and non-maximum suppression is applied to the remaining candidate boxes. Non-maximum suppression means: all detection boxes are sorted by classification confidence score from high to low; candidate boxes that overlap little with other detection boxes and have high scores are kept, and candidate boxes that overlap heavily with other detection boxes and have low scores are discarded; this continues until the detection box with the lowest classification confidence in the current sequence has been traversed. As a result, the detection results are more accurate and the false alarm rate is lower. The threshold can be adjusted according to the actual situation.
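The score filtering and non-maximum suppression described above can be sketched as a greedy procedure. The score threshold of 0.3 comes from the text; the overlap measure (intersection over union) and its threshold of 0.5 are assumptions, since the text does not give them.

```python
def iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, score_thresh=0.3, iou_thresh=0.5):
    """Drop low-score boxes, then greedily suppress overlapping ones."""
    order = sorted((i for i, s in enumerate(scores) if s >= score_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Hypothetical candidate boxes and classification confidence scores.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (0, 0, 5, 5)]
scores = [0.9, 0.8, 0.7, 0.2]
keep = filter_and_nms(boxes, scores)
```

In this example the second box is suppressed by the first, which overlaps it heavily and scores higher, and the fourth is removed by the score threshold.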

The coordinates of all retained candidate boxes are mapped back onto the optical remote sensing image as it was before cutting, and a second non-maximum suppression is applied to obtain the final detection result image of the optical remote sensing image.
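Mapping a box from an image block back to the full image is a translation by the block's origin. The sketch below assumes each detection is stored together with the upper-left corner of the block it came from; that bookkeeping and the function name are illustrative, not taken from the text.

```python
def to_image_coords(box, tile_origin):
    """Shift a (x0, y0, x1, y1) box from block coordinates to image coordinates."""
    ox, oy = tile_origin
    x0, y0, x1, y1 = box
    return (x0 + ox, y0 + oy, x1 + ox, y1 + oy)


# Hypothetical detections, each paired with the origin of its image block.
detections = [((10, 20, 60, 80), (412, 0)), ((5, 5, 40, 40), (0, 388))]
mapped = [to_image_coords(box, origin) for box, origin in detections]
```

Because neighbouring blocks overlap, the same target can be reported twice after mapping, which is why a second non-maximum suppression follows.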

By building the inverted residual connection structure from depthwise separable convolutions, the invention greatly reduces the model parameter count and the computational cost while maintaining a relatively high detection accuracy. It also applies global evolutionary pruning on top of the deep convolutional target detection network to accelerate the network, which greatly increases the overall detection speed of the model while effectively improving the detection accuracy for aircraft and ships in optical remote sensing images.

The effect of the invention is further described below with reference to simulation experiments:

Simulation conditions:

The simulation experiments of the invention were carried out in a hardware environment of Intel Xeon E5-2697 v4 × 2 with a clock frequency of 2.4 GHz, 64 GB of memory, and GeForce GTX 1080 × 2, and in a software environment of Darknet under a Linux system.

Simulation content and result analysis:

The simulation experiments apply, respectively, the method of the invention and the prior-art RetinaNet method combined with global dynamic pruning (GDP) to detect targets in remote sensing images of the Hong Kong International Airport area acquired by the QuickBird satellite.

Two metrics, precision and mean average precision (mAP), are used below to evaluate the optical remote sensing image target detection results of the invention and of the prior-art RetinaNet+GDP. The precision and the mAP of the two sets of detection results are computed with the following formulas:

Recall = total number of correctly detected targets / total number of actual targets

Precision = total number of correctly detected targets / total number of detected targets

A precision-recall curve is plotted, the detection precision AP of target detection is obtained from the area under the curve, and the APs of the classes are averaged to obtain the mean average precision mAP.
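The metric definitions above can be sketched as follows. The step-wise integration of the precision-recall curve is an assumption, since the text only states that AP is the area under the curve; the scored detections are hypothetical, while the two AP values passed to `mean_average_precision` are those reported for the method of the invention in Table 1.

```python
def precision_recall(scores, is_correct, num_targets):
    """(recall, precision) pairs obtained by sweeping the score threshold."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points, correct = [], 0
    for rank, i in enumerate(order, start=1):
        correct += is_correct[i]
        points.append((correct / num_targets, correct / rank))
    return points


def average_precision(points):
    """Area under the precision-recall curve (step-wise integration, assumed)."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def mean_average_precision(ap_per_class):
    """Average of the per-class AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Hypothetical detections: 1 marks a correct detection, 0 a false alarm.
points = precision_recall([0.9, 0.8, 0.6, 0.4], [1, 0, 1, 1], num_targets=4)
ap = average_precision(points)

# Aircraft and ship AP of the method of the invention, as listed in Table 1.
m_ap = mean_average_precision([0.9575, 0.6508])
```

Averaging the two per-class values gives 0.80415, consistent with the mAP of 0.8042 listed for the method of the invention.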

Table 1 lists the aircraft detection precision, the ship detection precision, and the mAP of the invention and of the prior-art RetinaNet+GDP.

Table 1. Detection precision in the simulation experiment

Method	RetinaNet+GDP	Method of the invention
Aircraft detection precision	0.9236	0.9575
Ship detection precision	0.6319	0.6508
Mean average precision (mAP)	0.7778	0.8042

As Table 1 shows, the mAP of the prior-art RetinaNet+GDP is 77.78% and the mAP of the method of the invention is 80.42%; the method of the invention achieves higher detection precision on both aircraft and ship targets. Existing remote sensing target detection techniques have reached a bottleneck in improving detection precision, and the detection precision for small targets in remote sensing images, especially aircraft and ships, is difficult to raise effectively. The method of the invention greatly increases detection speed while also effectively improving the detection precision for small targets such as aircraft and ships in remote sensing images.

Table 2 lists the detection speed, in detected frames per second (FPS), of the invention and of the prior-art RetinaNet+GDP.

Table 2. FPS results of the simulation

Method	RetinaNet+GDP	Method of the invention
Detected frames per second (FPS)	23	35

As Table 2 shows, the detection speed of the prior-art RetinaNet+GDP is 23 FPS and that of the method of the invention is 35 FPS; the method of the invention is faster when detecting aircraft and ship targets. Existing remote sensing target detection techniques often sacrifice detection speed, yet in practical applications, especially in the military field, fast real-time detection on remote sensing images is essential. The method of the invention greatly increases detection speed while maintaining high accuracy.

In conclusion a kind of Remote Sensing Target detection side based on depth evolution beta pruning convolution net disclosed by the invention Method mainly solves not carry out full detection speed and target detection precision simultaneously in existing Remote Sensing Target detection acceleration technique The problem of office effectively optimizes.The specific steps of the present invention are as follows: construction training dataset and validation data set；Construct test data Collection；Construct depth convolution feature extraction sub-network；Construct full convolution FCN detection sub-network network；It constructs and trains depth convolution target Detect network；It constructs and trains the target detection network based on depth evolution beta pruning convolution net；Using trained model to survey It tries data set and carries out target detection；Output test result.The present invention separates convolution using depth and constructs anti-residual error connection structure, Model parameter amount and calculation amount is greatly reduced while maintaining compared with high detection accuracy rate；By target detection network and based on evolution The global Dynamical Pruning method of algorithm combines, and realizes network acceleration.The present invention, which has, is greatly reduced computation complexity and model Parameter amount, the target detection speed for significantly improving remote sensing image, while target detection advantage with high accuracy, can be used for distant The ground object target for feeling aircraft and naval vessel in the different zones of image is fast and accurately detected.

Claims

1. A remote sensing target detection method based on a deep evolutionary pruning convolutional network, characterized by comprising the following steps:

(1) Process the training dataset and the validation dataset: select several optical remote sensing images containing multiple targets, cut the images into image blocks of 512 × 512 pixels, form the training dataset from 70% of the image blocks and the validation dataset from the remaining 30%, and apply data augmentation to the training dataset;

(2) Process the test dataset: input several other optical remote sensing images containing multiple targets, cut the images into image blocks of 512 × 512 pixels, and form the test dataset;

(3) Build the deep convolutional feature extraction sub-network: build the depthwise separable convolution inverted residual connection module and the feature pyramid convolution module respectively; use a 7 × 7 convolutional layer and a max pooling layer in sequence, and connect the depthwise separable convolution inverted residual connection modules and the feature pyramid convolution modules alternately to form the deep convolutional feature extraction sub-network;

The specific structure of the deep convolutional feature extraction sub-network is: original image input layer → 7 × 7 convolutional layer → first max pooling layer → first depthwise separable convolution inverted residual connection module C1 → second depthwise separable convolution inverted residual connection module C2 → first feature pyramid convolution module P1 → third depthwise separable convolution inverted residual connection module C3 → second feature pyramid convolution module P2 → fourth depthwise separable convolution inverted residual connection module C4 → third feature pyramid convolution module P3 → second max pooling layer → fourth feature pyramid convolution module P4 → third max pooling layer → fifth feature pyramid convolution module P5 → current-stage feature map output layer;

(4) Build the fully convolutional (FCN) detection sub-networks:

(4a) Build the FCN classification subnet: its structure is classification subnet input layer → first 3 × 3 convolutional layer → second 3 × 3 convolutional layer → third 3 × 3 convolutional layer → fourth 3 × 3 convolutional layer → fifth 3 × 3 convolutional layer → classification subnet output layer; the classification subnet input layer takes the feature maps of the feature pyramid convolution modules one at a time as the input of the classification subnet and performs classification on each in turn;

(4b) Build the FCN regression subnet: its structure is regression subnet input layer → first 3 × 3 convolutional layer → second 3 × 3 convolutional layer → third 3 × 3 convolutional layer → fourth 3 × 3 convolutional layer → fifth 3 × 3 convolutional layer → regression subnet output layer; the regression subnet input layer takes the feature maps of the feature pyramid convolution modules one at a time as the input of the regression subnet and performs regression on each in turn;

(5) Construct and train the deep convolutional target detection network:

(5a) Construct the deep convolutional target detection network: the deep convolutional feature extraction sub-network, the fully convolutional FCN classification subnet and the fully convolutional FCN regression subnet are assembled in sequence to form the deep convolutional target detection network, whose structure is original image input layer → deep convolutional feature extraction sub-network → fully convolutional FCN classification and regression sub-networks;

(5b) Train the deep convolutional target detection network: the training dataset and the validation dataset are used as input to train the deep convolutional target detection network, yielding the trained deep convolutional target detection network, and the weight file of the trained network is saved;

(6a) Perform layer-by-layer DNA encoding of the convolution filters that participate in pruning in the trained deep convolutional target detection network; the code is denoted $DNA_{1,\ldots,l-1,l}$;

(6c) Combine the optimized code $DNA'_{1,\ldots,l-1,l}$ with the pruning rule to construct the target detection network based on the deep evolutionary pruning convolutional network; the pruning rule is that a code of 0 means the convolution filter is finally pruned and a code of 1 means the convolution filter is finally retained; the network is fine-tuned on the training dataset to obtain the trained target detection network based on the deep evolutionary pruning convolutional network, i.e. the trained model, and the trained model weight file is saved;
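The pruning rule of step (6c) can be illustrated with a minimal Python sketch. It assumes each layer is represented as a plain list of filters and the optimized code as one list of 0/1 bits per layer; the function names `apply_pruning_rule` and `prune_network` are hypothetical and are not taken from the claims.

```python
from typing import List

Filter = List[float]  # a flattened filter; the real shape is not relevant here


def apply_pruning_rule(filters: List[Filter], code: List[int]) -> List[Filter]:
    """Keep the filters whose code bit is 1 and drop those whose bit is 0."""
    if len(filters) != len(code):
        raise ValueError("one code bit is required per filter")
    return [f for f, bit in zip(filters, code) if bit == 1]


def prune_network(layers: List[List[Filter]], dna: List[List[int]]) -> List[List[Filter]]:
    """Apply the rule layer by layer (hypothetical list-of-layers representation)."""
    return [apply_pruning_rule(fs, code) for fs, code in zip(layers, dna)]


layers = [[[1.0], [2.0], [3.0]], [[4.0], [5.0]]]
dna = [[1, 0, 1], [0, 1]]
pruned = prune_network(layers, dna)
```

In an actual network, removing a filter from one layer also removes the matching input channel of the following layer; the sketch leaves that bookkeeping out.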

(7) Perform target detection on the test dataset using the trained model:

(7a) The data blocks of the test dataset are input in sequence into the trained target detection network based on the deep evolutionary pruning convolutional network, to obtain, for each data block of the test dataset, the candidate boxes, the classification confidence score of each candidate box and the target category of each candidate box;

(7b) Discard all candidate boxes whose classification confidence score for the target category is lower than the threshold 0.3, and apply non-maximum suppression to the remaining retained candidate boxes;

(7c) Map the coordinates of all retained candidate boxes onto the remote sensing image before cropping, and apply non-maximum suppression a second time to obtain the final detection result map of the remote sensing image.
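Steps (7b) and (7c) can be sketched as follows. Only the confidence threshold 0.3 comes from the text; the IoU threshold of 0.5, the per-category greedy suppression, the `(x1, y1, x2, y2)` box layout and the per-tile `(x, y)` offsets are assumptions made for this illustration, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]   # (x1, y1, x2, y2), assumed layout
Det = Tuple[Box, float, str]              # (box, confidence score, category)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Det], iou_thr: float = 0.5) -> List[Det]:
    """Greedy non-maximum suppression within each category (assumed variant)."""
    kept: List[Det] = []
    for det in sorted(dets, key=lambda d: d[1], reverse=True):
        if all(det[2] != k[2] or iou(det[0], k[0]) <= iou_thr for k in kept):
            kept.append(det)
    return kept


def detect_tile(dets: List[Det], offset: Tuple[float, float],
                score_thr: float = 0.3, iou_thr: float = 0.5) -> List[Det]:
    """Step (7b) on one cropped block, then shift boxes to full-image coordinates."""
    kept = nms([d for d in dets if d[1] >= score_thr], iou_thr)
    ox, oy = offset
    return [((b[0] + ox, b[1] + oy, b[2] + ox, b[3] + oy), s, c) for b, s, c in kept]


def merge_tiles(tiles: List[Tuple[List[Det], Tuple[float, float]]],
                score_thr: float = 0.3, iou_thr: float = 0.5) -> List[Det]:
    """Step (7c): collect mapped boxes from all blocks and suppress a second time."""
    mapped: List[Det] = []
    for dets, offset in tiles:
        mapped.extend(detect_tile(dets, offset, score_thr, iou_thr))
    return nms(mapped, iou_thr)


tiles = [
    ([((10, 10, 50, 50), 0.9, "plane"), ((12, 12, 52, 52), 0.6, "plane"),
      ((100, 100, 120, 120), 0.2, "ship")], (0, 0)),
    ([((6, 10, 46, 50), 0.8, "plane")], (5, 0)),
]
final = merge_tiles(tiles)
```

The second suppression matters for targets that lie in the overlap of neighbouring blocks, as in the example, where the same aircraft is reported once by each block.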

2. The remote sensing target detection method based on a deep evolutionary pruning convolutional network according to claim 1, characterized in that constructing the deep convolutional feature extraction sub-network in step (3) comprises the following specific steps:

(3a) Construct the depthwise separable convolution inverted residual connection module: its structure is previous-stage feature map input layer → 1 × 1 convolutional layer → depthwise separable convolution unit → pointwise addition layer → current-stage feature map output layer;

The 1 × 1 convolutional layer and the depthwise separable convolution unit appear in pairs in the inverted residual connection module; the pointwise addition layer is the feature processing layer formed by adding, element by element, the output feature map of the preceding depthwise separable convolution unit and the feature map at the input layer of the inverted residual connection module;

The inverted residual connection module first expands and then reduces the channel dimension of the input feature map: the 1 × 1 convolutional layer expands the channels of the input feature map by a factor of 2, and the depthwise separable convolution unit then performs feature extraction and reduces the channels by a factor of 2, so that the number of channels of the output feature map of the constructed depthwise separable convolution inverted residual connection module equals that of the input feature map;
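The channel bookkeeping of the inverted residual connection module can be traced with a short Python sketch. The helper name `inverted_residual_channels` and the dictionary keys are hypothetical; only the factor of 2 is taken from the text.

```python
def inverted_residual_channels(c_in: int, expand: int = 2) -> dict:
    """Trace channel counts: expansion by the 1x1 layer, reduction by the separable unit."""
    c_mid = c_in * expand      # after the 1 x 1 convolutional layer
    c_out = c_mid // expand    # after the depthwise separable convolution unit
    # The pointwise addition layer needs matching channel counts.
    assert c_out == c_in
    return {"input": c_in, "after_1x1": c_mid, "output": c_out}


trace = inverted_residual_channels(64)
```

Because the output channel count equals the input channel count, the element-wise addition with the module input is well defined without any extra projection.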

(3b) Construct the feature pyramid convolution module: the module has two inputs and a single output, and its structure is: input feature map 1 → first convolutional layer of input feature map 1 → 2× upsampling layer → output feature map 1; input feature map 2 → first convolutional layer of input feature map 2 → output feature map 2; pointwise addition layer → second convolutional layer → current-stage feature map output layer;

where input feature map 1 is the stage feature map of a depthwise separable convolution inverted residual connection module whose input and output feature maps have the same size; input feature map 2 is the feature map of an inverted residual connection module that has the same spatial size as output feature map 1; the pointwise addition layer is the feature processing layer formed by adding output feature map 1 and output feature map 2 element by element; and the 2× upsampling layer enlarges, by bilinear interpolation, the scale of input feature map 1 after it has been processed by its first convolutional layer;
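The upsample-and-add path of the feature pyramid convolution module can be sketched on single-channel maps stored as nested lists. The half-pixel sampling convention with border clamping is an assumption (the text only names bilinear interpolation), the convolutional layers of the module are omitted, and the names `upsample2x_bilinear` and `fuse` are hypothetical.

```python
from typing import List

Map2D = List[List[float]]


def upsample2x_bilinear(fm: Map2D) -> Map2D:
    """Double height and width by bilinear interpolation (half-pixel centres, clamped)."""
    h, w = len(fm), len(fm[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(2 * h):
        y = min(max((i + 0.5) / 2 - 0.5, 0.0), h - 1)   # source row coordinate
        y0 = int(y)
        y1 = min(y0 + 1, h - 1)
        dy = y - y0
        for j in range(2 * w):
            x = min(max((j + 0.5) / 2 - 0.5, 0.0), w - 1)  # source column coordinate
            x0 = int(x)
            x1 = min(x0 + 1, w - 1)
            dx = x - x0
            top = fm[y0][x0] * (1 - dx) + fm[y0][x1] * dx
            bottom = fm[y1][x0] * (1 - dx) + fm[y1][x1] * dx
            out[i][j] = top * (1 - dy) + bottom * dy
    return out


def fuse(coarse: Map2D, fine: Map2D) -> Map2D:
    """Upsample the coarse map 2x and add it element by element to the fine map."""
    up = upsample2x_bilinear(coarse)
    if len(up) != len(fine) or len(up[0]) != len(fine[0]):
        raise ValueError("fine map must be exactly twice the size of the coarse map")
    return [[a + b for a, b in zip(ru, rf)] for ru, rf in zip(up, fine)]


fused = fuse([[0.0, 4.0]], [[1.0] * 4, [1.0] * 4])
```

The size check mirrors the requirement in the text that input feature map 2 has the same spatial size as output feature map 1.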

(3c) Using the 7 × 7 convolutional layer and the max pooling layers, connect the depthwise separable convolution inverted residual connection modules and the feature pyramid convolution modules alternately in sequence to construct the deep convolutional feature extraction sub-network.

3. The remote sensing target detection method based on a deep evolutionary pruning convolutional network according to claim 2, characterized in that the depthwise separable convolution unit in step (3a) has the unit structure: previous-stage feature map input layer → 3 × 3 depthwise convolutional layer → first batch normalization layer → ReLU activation function layer → 1 × 1 pointwise convolutional layer → second batch normalization layer → linear activation function layer → output feature map layer;

The depthwise separable convolution unit splits a standard convolution into a depthwise convolution and a pointwise convolution, so that the spatial and channel dimensions of the features are processed separately;

The activation function after the 1 × 1 pointwise convolutional layer is not the ReLU activation function but a linear activation function.
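The saving obtained by splitting a standard convolution into a depthwise and a pointwise convolution can be checked with a small weight count. This is a minimal sketch that ignores biases and batch normalization parameters; the function names are hypothetical and the channel numbers in the example are chosen for illustration only.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution mapping c_in to c_out channels."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a k x k depthwise convolution followed by a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


std = standard_conv_params(3, 128, 64)
sep = depthwise_separable_params(3, 128, 64)
ratio = sep / std   # equals 1 / c_out + 1 / k**2
```

For a 3 × 3 kernel the separable form needs roughly one eighth to one ninth of the weights of the standard convolution when the number of output channels is large.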

4. The remote sensing target detection method based on a deep evolutionary pruning convolutional network according to claim 1, characterized in that the layer-by-layer DNA encoding, in step (6a), of the convolution filters that participate in pruning in the trained deep convolutional target detection network refers to:

(6a1) Perform the convolution operation on the output feature map: let the layer-$l$ feature map of the trained deep convolutional target detection network, with height $H_l$, width $W_l$ and $C_l$ channels, be the output feature map $Z_l \in \mathbb{R}^{H_l \times W_l \times C_l}$, and let the feature sub-map of its $k$-th channel be $Z_l^{(k)}$; then $Z_l^{(k)}$ is obtained by the convolution operation (*) between the parameters $W_l^{(k)}$ of the corresponding convolution filter and the feature map $Z_{l-1}$ of the preceding layer, where $f$ denotes the activation function, and $Z_l^{(k)}$ is computed as follows:

$$Z_l^{(k)} = f\left(Z_{l-1} * W_l^{(k)}\right)$$

In common deep learning frameworks such as TensorFlow and Caffe, the convolution of tensors is converted into a matrix multiplication by reshaping the input and transposing the convolution filters; after this conversion, the layer-$l$ output feature map $Z_l^{*}$ is given by:

$$Z_l^{*} = f\left(Z_{l-1}^{*} \times W_l^{*}\right)$$

where $Z_{l-1}^{*}$ is the layer-$(l-1)$ feature map after the conversion, and $W_l^{*}$ denotes the parameter matrix of the convolution filters corresponding to the layer-$l$ feature map after the conversion;
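The conversion of a convolution into a matrix multiplication can be sketched in pure Python. The sketch assumes stride 1, no padding, an H × W × C input stored as nested lists, and ReLU as the activation $f$ (the text does not fix $f$); the function names are hypothetical and do not correspond to any TensorFlow or Caffe API.

```python
from typing import List

Matrix = List[List[float]]


def im2col(x: List[List[List[float]]], kh: int, kw: int) -> Matrix:
    """Unfold an H x W x C input into one row per output position (stride 1, no padding)."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    return [
        [x[i + a][j + b][ch] for a in range(kh) for b in range(kw) for ch in range(c)]
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]


def filters_to_matrix(filters: List[List[List[List[float]]]]) -> Matrix:
    """Flatten K filters of shape kh x kw x C into a (kh*kw*C) x K parameter matrix."""
    flat = [[v for plane in f for row in plane for v in row] for f in filters]
    return [list(col) for col in zip(*flat)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in m]


def conv_as_matmul(x, filters) -> Matrix:
    """Activation applied to (unfolded input) x (filter matrix); column k is channel k."""
    kh, kw = len(filters[0]), len(filters[0][0])
    return relu(matmul(im2col(x, kh, kw), filters_to_matrix(filters)))


x = [[[float(3 * i + j + 1)] for j in range(3)] for i in range(3)]   # 3 x 3 x 1 input
filters = [
    [[[1.0], [0.0]], [[0.0], [1.0]]],    # sums the two diagonal entries of each window
    [[[-1.0], [0.0]], [[0.0], [0.0]]],   # negative response, zeroed by ReLU
]
z = conv_as_matmul(x, filters)
```

Each column of the result is one output channel, which is what makes a per-filter mask equivalent to a per-column mask on the parameter matrix.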

(6a2) Mask-encode the convolution filters to be pruned or retained: for the layer-$l$ output feature map, with $C_l$ channels, of the trained deep convolutional target detection network, a mask code is introduced for the convolution filters to be pruned or retained; a code of 0 means the convolution filter is pruned and a code of 1 means the convolution filter is retained; ⊙ denotes the inner product, and the matrix-form convolution formula is modified with the mask to give the convolution operation formula with global feature-channel pruning;

(6a3) Encode the convolution filters participating in pruning layer by layer: using the trained deep convolutional target detection network, the convolution filters participating in pruning are encoded layer by layer, and the code covering all layers to be pruned is denoted $DNA_{1,\ldots,l-1,l}$,

where $DNA_{1,\ldots,l-1,l}$ denotes the DNA code of layers 1 to $l$, and its component for layer $l-1$ denotes the code symbols in layer $l-1$.
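The mask coding and the layer-by-layer code can be illustrated with plain lists. The sketch assumes that the code of each layer is a list with one 0/1 bit per filter and that masking multiplies each column of the layer's parameter matrix by its bit; the exact masked formula is not reproduced in the text, and all names here are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def encode_layers(filter_counts: List[int]) -> List[List[int]]:
    """Initial code with every filter retained: one bit per filter, one list per layer."""
    return [[1] * n for n in filter_counts]


def flatten(dna: List[List[int]]) -> List[int]:
    """Concatenate the per-layer codes into a single bit string."""
    return [bit for layer in dna for bit in layer]


def unflatten(bits: List[int], filter_counts: List[int]) -> List[List[int]]:
    """Split a bit string back into per-layer codes."""
    if len(bits) != sum(filter_counts):
        raise ValueError("bit string length must equal the total number of filters")
    dna, start = [], 0
    for n in filter_counts:
        dna.append(bits[start:start + n])
        start += n
    return dna


def apply_mask(weights: Matrix, code: List[int]) -> Matrix:
    """Zero column k of the parameter matrix when code[k] is 0 (filter k pruned)."""
    return [[w * bit for w, bit in zip(row, code)] for row in weights]


counts = [3, 2]
dna = unflatten([1, 0, 1, 0, 1], counts)
masked = apply_mask([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], dna[0])
```

Masking leaves the matrix shape unchanged, so candidate codes can be evaluated on the same network before any filter is physically removed.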

5. The remote sensing target detection method based on a deep evolutionary pruning convolutional network according to claim 1, characterized in that the specific method of optimizing the $DNA_{1,\ldots,l-1,l}$ code with an evolutionary algorithm in step (6b) to obtain the final optimization result $DNA'_{1,\ldots,l-1,l}$ is:

(6b1) Initialization: set the evolution generation counter $t = 0$, the maximum number of generations $T$, and the pruning ratio $ratio_{cut} = 0.5$; according to $ratio_{cut}$, randomly generate $M$ individuals encoded as $DNA_{1,\ldots,l-1,l}$ as the initial population $P_0$,

where each element of the initial population is the DNA code of one individual over the filters of layers 1 to $l$, for example that of the $(m-1)$-th individual;
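The initialization of step (6b1) can be sketched as follows. The text only says that individuals are generated randomly according to the pruning ratio; this sketch assumes that each layer of each individual has a fixed fraction `ratio_cut` of its bits set to 0, rounded down so that at least one filter survives. The names `random_individual` and `init_population` and the `seed` parameter are hypothetical.

```python
import random
from typing import List


def random_individual(filter_counts: List[int], ratio_cut: float,
                      rng: random.Random) -> List[List[int]]:
    """One DNA code: in each layer, a fraction ratio_cut of the bits is 0 (pruned)."""
    dna = []
    for n in filter_counts:
        n_pruned = min(int(n * ratio_cut), n - 1)   # always keep at least one filter
        bits = [0] * n_pruned + [1] * (n - n_pruned)
        rng.shuffle(bits)                           # random choice of pruned filters
        dna.append(bits)
    return dna


def init_population(filter_counts: List[int], ratio_cut: float = 0.5,
                    m: int = 10, seed: int = 0) -> List[List[List[int]]]:
    """Initial population of m randomly generated individuals."""
    rng = random.Random(seed)
    return [random_individual(filter_counts, ratio_cut, rng) for _ in range(m)]


population = init_population([8, 4], ratio_cut=0.5, m=6)
```

Fixing the seed only makes the illustration reproducible; it is not part of the described method.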

(6b2) Adjust the network parameters using the training dataset: for the population $P_t$ of generation $t$, the networks generated with the mask-encoded convolution operation formula for global feature-channel pruning are retrained on the training dataset to adjust the network parameters;

(6b3) Fitness calculation: using the validation dataset and the mask-encoded convolution operation formula for global feature-channel pruning, compute the fitness of each individual in the population $P_t$, where the fitness is computed from the loss on the validation dataset;

(6b4) Generate new individuals: using the fitness of the individuals, select the individuals with higher fitness for crossover and mutation in order to generate new individuals, where the crossover operator randomly applies the crossover operation to parent individuals with crossover probability $p_m = 0.9$, and the mutation operator randomly applies the mutation operation to parent individuals with mutation probability $p_c = 0.9$; through steps (6b1) to (6b4), the population $P_t$ yields the next-generation population $P_{t+1}$ after selection, crossover and mutation;

(6b5) Decide whether to terminate the evolution: if $t = T$, the individual with the maximum fitness obtained during the evolution is output as the optimal solution and the computation terminates; its code is denoted $DNA'_{1,\ldots,l-1,l}$, and step (6c) is executed to construct the target detection network based on the deep evolutionary pruning convolutional network. Otherwise, if $t < T$, return to step (6b2) and repeat steps (6b2) to (6b5) to continue the evolutionary optimization of the code.
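The loop of steps (6b2) to (6b5) can be sketched on flattened bit strings. Several choices below are assumptions that the text leaves open: truncation selection of the better half, single-point crossover, a single bit flip as mutation, and a caller-supplied `fitness` function standing in for the validation-loss-based fitness. The retraining of step (6b2) is omitted, and the names `crossover`, `mutate`, `evolve`, `p_cross` and `p_mut` are hypothetical.

```python
import random
from typing import Callable, List

Individual = List[int]   # flattened DNA code, one bit per filter


def crossover(a: Individual, b: Individual, rng: random.Random) -> Individual:
    """Single-point crossover (assumed operator)."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(ind: Individual, rng: random.Random) -> Individual:
    """Flip one randomly chosen bit (assumed operator)."""
    child = ind[:]
    i = rng.randrange(len(child))
    child[i] = 1 - child[i]
    return child


def evolve(population: List[Individual], fitness: Callable[[Individual], float],
           generations: int, p_cross: float = 0.9, p_mut: float = 0.9,
           seed: int = 0) -> Individual:
    """Return the fittest individual seen over all generations."""
    rng = random.Random(seed)
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, len(ranked) // 2)]   # keep the better half
        children = []
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng) if rng.random() < p_cross else a[:]
            if rng.random() < p_mut:
                child = mutate(child, rng)
            children.append(child)
        population = children
        gen_best = max(population, key=fitness)
        if fitness(gen_best) > fitness(best):
            best = gen_best
    return best


# Toy fitness for illustration only: agreement with an arbitrary target code.
target = [1, 0, 1, 1, 0, 0, 1, 0]
toy_fitness = lambda ind: sum(int(x == y) for x, y in zip(ind, target))
rng0 = random.Random(1)
start = [[rng0.randint(0, 1) for _ in range(8)] for _ in range(6)]
best = evolve(start, toy_fitness, generations=20)
```

These simple operators do not preserve the pruning ratio of an individual; a practical implementation would need either a repair step or a fitness term that accounts for the ratio, which the text does not specify.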