CN108460341B - Optical remote sensing image target detection method based on integrated depth convolution network - Google Patents

Optical remote sensing image target detection method based on integrated depth convolution network

Info

Publication number
CN108460341B
CN108460341B (application CN201810113862.3A)
Authority
CN
China
Prior art keywords
target
network
image
remote sensing
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810113862.3A
Other languages
Chinese (zh)
Other versions
CN108460341A (en)
Inventor
焦李成
唐旭
李阁
冯捷
张丹
陈璞花
古晶
张梦旋
丁静怡
杨淑媛
侯彪
屈嵘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201810113862.3A priority Critical patent/CN108460341B/en
Publication of CN108460341A publication Critical patent/CN108460341A/en
Application granted granted Critical
Publication of CN108460341B publication Critical patent/CN108460341B/en
Legal status: Active (granted)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Astronomy & Astrophysics (AREA)
  • Remote Sensing (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an optical remote sensing image target detection method based on an integrated deep convolutional network, which mainly addresses the large number of falsely detected targets and the complicated, tedious testing procedure of the prior art. The method comprises the following specific steps: (1) building a multi-branch deep network; (2) generating a training data set containing the target area; (3) training the integrated deep convolutional network for the first time; (4) generating the all-region training data set; (5) training the integrated deep convolutional network for the second time; (6) generating a test data set; (7) obtaining a detection result graph; (8) calculating the average precision. The method extracts all target candidate boxes of non-target areas as negative samples, makes full use of the information in the optical remote sensing image, better distinguishes targets from the complex background in the optical remote sensing image, and has the advantages of a simple testing procedure and few falsely detected targets in the detection result.

Description

Optical remote sensing image target detection method based on integrated depth convolution network
Technical Field
The invention belongs to the technical field of image processing, and further relates to an optical remote sensing image target detection method based on an integrated deep convolutional network within the field of target detection image processing. The method can be used to detect ground-object targets such as airplanes and ships in optical remote sensing images.
Background
Optical remote sensing images play an irreplaceable role in national defense and civil applications. Because the imaging mechanism of an optical remote sensing image differs greatly from that of an ordinary visible-light image, processing algorithms tailored to the characteristics of such images are particularly important. Target detection in optical remote sensing images is one of the important applications and fundamental problems of computer vision and image processing in the remote sensing field. With the continuous development of remote sensing imaging technology, optical remote sensing images are moving towards higher temporal, spatial, and spectral resolution, and the corresponding range of applications and demand keep growing. Existing aerospace reconnaissance takes many forms and produces massive amounts of optical remote sensing data, so research on target detection in optical remote sensing images is highly valuable. Detecting aircraft and ship targets is significant for national defense, military affairs, and production practice, for example in monitoring airports, ports, or specific airspaces and sea areas, managing transportation, and combating illegal activities such as unlicensed fishing and smuggling.
In the patent document "A robust ship target detection method based on deep learning" (application number 201710677418.X, publication number CN107563303A), the Chinese Academy of Sciences proposed a robust deep-learning ship target detection method. The method first processes the training samples and trains a classifier, then acquires and preprocesses the optical remote sensing image to be processed; next, it performs sea-land segmentation on the preprocessed image to obtain sea-land segmented regions, masks shore areas without ships, and uses a feature extraction network to extract rotation-invariant deep features of the segmented regions, obtaining a feature map; finally, it obtains a class response map for ships by class activation of the feature maps, computes connected components of the response map to obtain preliminary detection boxes, and estimates the ship parameters to obtain a result graph with detection boxes. Although this method can reduce the false alarm rate and improve target detection precision by distinguishing water areas through sea-land segmentation, it still has the shortcoming that the optical remote sensing image must undergo sea-land segmentation, which requires multiple steps such as region segmentation and feature extraction, making the testing procedure complicated and tedious.
Shaoqing Ren et al., in the paper "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" (Neural Information Processing Systems, Montreal, 2015), proposed the Faster R-CNN target detection method, which first uses a region proposal network (RPN) to extract target candidate regions, then classifies the candidate regions and regresses their bounding boxes, and finally applies non-maximum suppression to all target boxes to obtain the final detection result. However, with this method the background is easily falsely detected as a target, resulting in low target detection accuracy.
Disclosure of Invention
Aiming at the shortcomings of the prior art described above, the invention provides an optical remote sensing image target detection method based on an integrated deep convolutional network for detecting aircraft and ship targets. Compared with other existing optical remote sensing image target detection methods, this method makes fuller use of the information in the optical remote sensing image, has a simple testing procedure, and achieves higher target detection precision.
The method extracts all target candidate boxes of non-target areas from the optical remote sensing image as negative samples, uses these negative samples to train the constructed integrated deep convolutional network, and classifies the target candidate boxes with two classification sub-networks of the integrated deep convolutional network: the first classification sub-network mainly distinguishes targets from the surrounding background and from incomplete targets, while the second classification sub-network mainly distinguishes targets from the complex background.
The method comprises the following specific steps:
(1) building an integrated deep convolutional network:
building an integrated deep convolution network consisting of a basic network, an area generation sub-network and two classification sub-networks;
(2) generating a training data set containing a target area:
(2a) randomly selecting, as training images, 7 non-contiguous optical remote sensing images containing airplane and ship targets from the optical remote sensing images received in real time from a remote sensing satellite;
(2b) labeling the categories and position coordinates of all airplane and ship targets in the training images to obtain the labeled targets;
(2c) cropping all target-containing regions of each training image into blocks with a 400 × 400 window, and sequentially applying rotation, flipping, and random exposure adjustment to the cropped image blocks, yielding 5850 image blocks in total;
(2d) transforming the position coordinates of each labeled target in the training image into its position coordinates within the image block, combining the categories and position coordinates of all labeled targets in each image block to obtain the class labels, and taking all image blocks together with their class labels as the training data set containing the target area;
(3) training the integrated deep convolutional network for the first time:
inputting the training data set containing the target area into the integrated deep convolutional network, with the second classification sub-network excluded, for iterative training until the loss function of the network converges, obtaining the integrated deep convolutional network after the first training;
(4) generating all-region training data sets:
(4a) generating all target candidate boxes of the training images, together with their confidences and categories, using a cut-and-merge method;
(4b) designating all target candidate boxes that intersect a labeled target as target-area candidate boxes, and all target candidate boxes that do not intersect any labeled target as non-target-area candidate boxes;
(4c) calculating, for every target-area candidate box, the ratio of the intersection area to the union area with each labeled target to obtain its intersection-over-union (IoU) with each labeled target, selecting all candidate boxes whose IoU with some labeled target exceeds 0.5, setting the category of each selected box to the category of the labeled target with which it has the maximum IoU, and forming the target-area training data set from these boxes, their categories, and the image blocks they belong to;
(4d) setting three background classes for the non-target-area candidate boxes, assigning the background class to a box whose class is background, the airplane-background class to a box whose class is airplane, and the ship-background class to a box whose class is ship, and forming the non-target-area training data set from these boxes, their classes, and the image blocks they belong to;
(5) training the integrated deep convolutional network for the second time:
taking the target-area training data set as positive samples and the non-target-area training data set as negative samples, inputting both into the integrated deep convolutional network, with the first classification sub-network excluded, for iterative training until the loss function of the integrated deep convolutional network converges, obtaining the integrated deep convolutional network after the second training;
(6) generating a test data set:
randomly selecting, as test images, 4 non-contiguous optical remote sensing images containing airplane and ship targets from the optical remote sensing images received in real time from a remote sensing satellite;
(7) obtaining a detection result graph:
(7a) cropping each test image into blocks with a 400 × 400 window at a stride of 100, inputting the cropped image blocks sequentially into the trained integrated deep convolutional network, outputting for each target in each image block its bounding box, category, and two confidences, namely a three-class confidence and a five-class confidence, calculating the final confidence of each target, and transforming the bounding-box coordinates of the target into its coordinates in the corresponding test image;
(7b) applying non-maximum suppression to all target boxes and their confidences to obtain the final detection result;
(7c) drawing the bounding box of each target at the corresponding position of the optical remote sensing image to obtain the detection result graph;
(8) Calculating the average precision:
respectively calculating the average precision of each category and the multi-class average precision, and evaluating the detection result accordingly.
Compared with the prior art, the invention has the following advantages:
First, the invention extracts all target candidate boxes of non-target areas from the optical remote sensing image as negative samples, trains the constructed integrated deep convolutional network with these negative samples, and classifies the target candidate boxes with the two classification sub-networks of the integrated deep convolutional network. This overcomes the prior-art problem of the background being easily falsely detected as a target, enables the invention to make full use of the information in the optical remote sensing image, and better distinguishes targets from the complex background, which suits the characteristics of optical remote sensing images.
Second, the invention builds an integrated deep convolutional network consisting of a basic network, a region-generation sub-network, and two classification sub-networks; test data input into the trained network directly yields the test result. This overcomes the complicated testing procedure caused by the sea-land segmentation of the optical remote sensing image in the prior art, simplifies the testing procedure, and improves the efficiency of target detection.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a partial detection result graph obtained with the prior-art Faster R-CNN target detection method;
FIG. 3 is a partial detection result graph obtained with the method of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
The steps of the present invention will be described in further detail with reference to fig. 1.
Step 1, building an integrated deep convolutional network.
An integrated deep convolutional network consisting of a basic network, a region-generation sub-network, and two classification sub-networks is constructed.
In this integrated deep convolutional network, the region-generation sub-network and the two classification sub-networks are arranged in parallel, and each sub-network is connected to the basic network.
The basic network has 18 layers, structured in sequence as: input layer → first convolutional layer → second convolutional layer → first pooling layer → third convolutional layer → fourth convolutional layer → second pooling layer → fifth convolutional layer → sixth convolutional layer → seventh convolutional layer → third pooling layer → eighth convolutional layer → ninth convolutional layer → tenth convolutional layer → fourth pooling layer → eleventh convolutional layer → twelfth convolutional layer → thirteenth convolutional layer; the parameters of each layer of the basic network are set as follows:
The total number of feature maps of the input layer is set to 3.
The total numbers of feature maps of the first through thirteenth convolutional layers are set, in order, to 64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, and 512, and the scale of each convolution kernel is set to 3 × 3 nodes.
The region-generation sub-network has 3 layers, structured in sequence as: input layer → convolutional layer → classification regression layer; the parameters of each layer of the region-generation sub-network are set as follows:
The total number of feature maps of the input layer is set to 512.
The total number of feature maps of the convolutional layer is set to 512, and the scale of the convolution kernel is set to 3 × 3 nodes.
The total number of feature maps for classification in the classification regression layer is set to 18, and the total number for regression is set to 36.
The first classification sub-network has 5 layers, structured in sequence as: input layer → region pooling layer → first fully connected layer → second fully connected layer → classification regression layer; the parameters of each layer of the first classification sub-network are set as follows:
The total number of feature maps of each of the first and second fully connected layers is set to 4096.
The total number of feature maps for classification in the classification regression layer is set to 3, and the total number for regression is set to 12.
The second classification sub-network has 5 layers, structured in sequence as: input layer → region pooling layer → first fully connected layer → second fully connected layer → softmax classifier layer; the parameters of each layer of the second classification sub-network are set as follows:
The total number of feature maps of each of the first and second fully connected layers is set to 4096.
The total number of feature maps for classification in the first classification regression layer is set to 3, and the total number for regression is set to 12.
The total number of feature maps of the softmax classifier layer is set to 5.
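Purely for illustration, the layer layout just described can be sketched in code. The patent's experiments used Caffe and give no source; the following PyTorch sketch is an assumption-laden illustration, and in particular the per-layer channel progression, the reading of the 18 classification maps as 2 scores for each of 9 anchors (and the 36 regression maps as 4 offsets for each of 9 anchors), and all identifiers are assumptions of this sketch, not part of the patent.

    # Minimal PyTorch sketch of the basic network plus the region-generation
    # head (illustrative only; not the patent's implementation).
    import torch
    import torch.nn as nn

    def make_basic_network():
        # 13 convolutional layers (3 x 3 kernels) with 4 pooling layers placed
        # after the 2nd, 4th, 7th and 10th convolutions, as described above.
        cfg = [64, 64, 'P', 128, 128, 'P', 256, 256, 256, 'P',
               512, 512, 512, 'P', 512, 512, 512]
        layers, in_ch = [], 3  # input layer: 3 feature maps
        for v in cfg:
            if v == 'P':
                layers.append(nn.MaxPool2d(2, 2))
            else:
                layers += [nn.Conv2d(in_ch, v, kernel_size=3, padding=1),
                           nn.ReLU(inplace=True)]
                in_ch = v
        return nn.Sequential(*layers)

    basic = make_basic_network()
    # Region-generation sub-network: a 3 x 3 convolution with 512 feature maps,
    # then 18 classification maps and 36 regression maps (assumed here to be
    # 2 scores and 4 box offsets for each of 9 anchors).
    rpn_conv = nn.Conv2d(512, 512, kernel_size=3, padding=1)
    rpn_cls = nn.Conv2d(512, 18, kernel_size=1)
    rpn_reg = nn.Conv2d(512, 36, kernel_size=1)

    x = torch.randn(1, 3, 400, 400)   # one 400 x 400 image block
    feat = basic(x)                   # shared features for all sub-networks
    h = torch.relu(rpn_conv(feat))
    scores, deltas = rpn_cls(h), rpn_reg(h)
    print(feat.shape, scores.shape, deltas.shape)

The two classification sub-networks attach to the same shared features in parallel, which is what allows a single forward pass to serve both classifiers at test time.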
Step 2, generating a training data set containing the target area.
As training images, 7 non-contiguous optical remote sensing images containing airplane and ship targets are randomly selected from the optical remote sensing images received in real time from a remote sensing satellite.
The categories and position coordinates of all airplane and ship targets in the training images are labeled to obtain the labeled targets.
All target-containing regions of each training image are cropped into blocks with a 400 × 400 window, and rotation, flipping, and random exposure adjustment are applied in sequence to the cropped image blocks, yielding 5850 image blocks.
The rotation, flipping, and random exposure adjustment of the cropped image blocks proceed as follows:
First, each cropped image block is rotated by 90 degrees clockwise and, separately, by 90 degrees counterclockwise to obtain the rotated image blocks.
Second, each rotated image block is flipped horizontally and then flipped vertically to obtain the flipped image blocks.
Third, the randomly exposure-adjusted value of each pixel in each flipped image block is calculated according to the following formula:
g(i,j) = a * f(i,j) + b
where g(i,j) denotes the value, after random exposure adjustment, of the jth pixel in the ith image block; a denotes a random number in the interval [0.8, 1.2]; * denotes multiplication; f(i,j) denotes the original value of the jth pixel in the ith image block; and b denotes a random number in the interval [-20, 20].
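As a concrete illustration of this augmentation chain, the following is a minimal NumPy sketch. Treating the rotations and flips as separate augmented copies, and clipping the exposure-adjusted values to [0, 255], are assumptions of the sketch and are not stated in the patent.

    # NumPy sketch of the step-2 augmentation: 90-degree rotations, flips,
    # and the random exposure rule g(i,j) = a * f(i,j) + b.
    import numpy as np

    rng = np.random.default_rng(0)

    def augment(block):
        variants = [block,
                    np.rot90(block, k=-1),  # rotated 90 degrees clockwise
                    np.rot90(block, k=1),   # rotated 90 degrees counterclockwise
                    np.fliplr(block),       # flipped horizontally
                    np.flipud(block)]       # flipped vertically
        a = rng.uniform(0.8, 1.2)           # a in [0.8, 1.2]
        b = rng.uniform(-20.0, 20.0)        # b in [-20, 20]
        exposed = np.clip(a * block.astype(np.float32) + b, 0, 255)
        variants.append(exposed.astype(np.uint8))
        return variants

    block = rng.integers(0, 256, size=(400, 400, 3), dtype=np.uint8)
    print(len(augment(block)), "variants generated from one image block")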
The position coordinates of each labeled target in the training image are then transformed into its position coordinates within the image block; the categories and position coordinates of all labeled targets in each image block are combined to obtain the class labels; and all image blocks together with their class labels are taken as the training data set containing the target area.
Step 3, training the integrated deep convolutional network for the first time.
The training data set containing the target area is input into the integrated deep convolutional network, with the second classification sub-network excluded, for iterative training until the loss function of the network converges, obtaining the integrated deep convolutional network after the first training.
and 4, generating a training data set of all the regions.
And generating all target candidate frames of the training image and the confidence degrees and the categories of the target candidate frames by using a cutting and merging method.
The cutting and combining method comprises the following steps:
firstly, using a matrix window with a cutting interval of 100 and a size of 400 multiplied by 400 to perform the block processing on all areas in each training image to obtain the image block of each training image after the block processing.
Secondly, sequentially inputting the image blocks after the block cutting processing of each training image into the trained integrated depth convolution network, and outputting a target candidate frame of each image block and the confidence coefficient and the category of the target candidate frame, wherein the category comprises three categories of airplane and ship backgrounds;
and thirdly, transforming the coordinates of the target candidate frame in each image block into the coordinates of the target candidate frame in the corresponding training image, and combining the confidence degrees and the categories of the target candidate frames and the target candidate frames of all the image blocks after the cutting processing of each training image to obtain the confidence degrees and the categories of all the target candidate frames and the target candidate frames of the training image.
All target candidate boxes that intersect a labeled target are designated target-area candidate boxes, and all target candidate boxes that do not intersect any labeled target are designated non-target-area candidate boxes.
For every target-area candidate box, the ratio of the intersection area to the union area with each labeled target is calculated to obtain its intersection-over-union (IoU) with each labeled target; all candidate boxes whose IoU with some labeled target exceeds 0.5 are selected; the category of each selected box is set to the category of the labeled target with which it has the maximum IoU; and the target-area training data set is formed from these boxes, their categories, and the image blocks they belong to (a small sketch of this IoU rule follows below).
Three background classes are set for the non-target-area candidate boxes: a box whose class is background is assigned the background class, a box whose class is airplane is assigned the airplane-background class, and a box whose class is ship is assigned the ship-background class; the non-target-area training data set is formed from these boxes, their classes, and the image blocks they belong to.
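The sketch below illustrates the IoU selection rule of this step under an assumed (x1, y1, x2, y2) box format; it is an illustration under those assumptions, not the patent's implementation.

    # IoU rule: a candidate box becomes a positive sample when its
    # intersection-over-union with some labeled target exceeds 0.5, and it
    # inherits the category of the max-IoU labeled target.
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union > 0 else 0.0

    def assign_category(candidate, labeled):
        # labeled: list of ((x1, y1, x2, y2), category) pairs
        best_iou, best_cat = max(((iou(candidate, box), cat) for box, cat in labeled),
                                 key=lambda t: t[0])
        return best_cat if best_iou > 0.5 else None  # None: not a positive sample

    targets = [((0, 0, 100, 100), "airplane"), ((200, 200, 300, 300), "ship")]
    print(assign_category((10, 10, 110, 110), targets))  # -> "airplane"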
Step 5, training the integrated deep convolutional network for the second time.
The target-area training data set is taken as positive samples and the non-target-area training data set as negative samples; both are input into the integrated deep convolutional network, with the first classification sub-network excluded, for iterative training until the loss function of the integrated deep convolutional network converges, obtaining the integrated deep convolutional network after the second training.
Step 6, generating a test data set.
As test images, 4 non-contiguous optical remote sensing images containing airplane and ship targets are randomly selected from the optical remote sensing images received in real time from a remote sensing satellite.
Step 7, obtaining the detection result graph.
Each test image is cropped into blocks with a 400 × 400 window at a stride of 100; the cropped image blocks are input in sequence into the trained integrated deep convolutional network, which outputs for each target in each image block its bounding box, category, and two confidences, namely a three-class confidence and a five-class confidence; the final confidence of each target is then calculated, and the bounding-box coordinates of each target are transformed into its coordinates in the corresponding test image.
The final confidence of each target is calculated as follows:
First, the background-class confidence is removed from the three-class confidences, and the three background-class confidences are removed from the five-class confidences.
Second, the two remaining confidences of each target are multiplied to obtain its final confidence.
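As a worked illustration of this rule, the sketch below multiplies the foreground entries of the two score vectors; the class orderings shown are assumptions chosen for the example, not taken from the patent.

    # Final-confidence rule: drop the background entry of the three-class
    # scores, drop the three background entries of the five-class scores,
    # then multiply the remaining scores per foreground category.
    import numpy as np

    p3 = np.array([0.7, 0.2, 0.1])            # [airplane, ship, background]
    p5 = np.array([0.6, 0.1, 0.1, 0.1, 0.1])  # [airplane, ship, background,
                                              #  airplane-background, ship-background]
    fg3 = p3[:2]        # background class removed
    fg5 = p5[:2]        # three background classes removed
    final = fg3 * fg5   # final confidence for [airplane, ship]
    print(final)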
Non-maximum suppression is then applied to all target boxes and their confidences to obtain the final detection result.
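A minimal non-maximum suppression sketch follows; the 0.5 overlap threshold is an assumption, as the patent does not state the NMS threshold it uses.

    # Greedy non-maximum suppression: keep the highest-confidence box and
    # discard any remaining box that overlaps a kept box too strongly.
    def nms(boxes, thresh=0.5):
        def iou(a, b):
            ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
            ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
            inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
            union = ((a[2] - a[0]) * (a[3] - a[1])
                     + (b[2] - b[0]) * (b[3] - b[1]) - inter)
            return inter / union if union > 0 else 0.0
        kept = []
        for b in sorted(boxes, key=lambda b: b[4], reverse=True):
            if all(iou(b[:4], k[:4]) <= thresh for k in kept):
                kept.append(b)
        return kept

    dets = [(0, 0, 100, 100, 0.9), (5, 5, 105, 105, 0.8), (200, 200, 300, 300, 0.7)]
    print(nms(dets))  # the second box is suppressed by the first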
Finally, the bounding box of each target is drawn at the corresponding position of the optical remote sensing image to obtain the detection result graph.
Step 8, calculating the average precision.
The average precision of each category and the multi-class average precision are calculated respectively to evaluate the detection result.
The average precision of each category and the multi-class average precision are calculated as follows:
First, the total number of correctly detected airplane targets is divided by the total number of detected airplane targets to obtain the precision of the airplane class, and divided by the total number of labeled airplane targets to obtain the recall of the airplane class; a curve is drawn with recall as the abscissa and precision as the ordinate, and the area enclosed by the curve and the coordinate axes is taken as the average precision of the airplane class.
Second, the total number of correctly detected ship targets is divided by the total number of detected ship targets to obtain the precision of the ship class, and divided by the total number of labeled ship targets to obtain the recall of the ship class; a curve is drawn with recall as the abscissa and precision as the ordinate, and the area enclosed by the curve and the coordinate axes is taken as the average precision of the ship class.
Third, the average precisions of the airplane and ship classes are averaged to obtain the multi-class average precision.
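For illustration, the sketch below computes a per-class average precision as the area under the precision-recall curve; the detection flags and counts are fabricated demonstration values, not results from the patent.

    # Per-class average precision: sort detections by confidence, accumulate
    # precision and recall, and integrate the precision-recall curve.
    import numpy as np

    def average_precision(confidences, correct, num_labeled):
        order = np.argsort(confidences)[::-1]   # sort detections by confidence
        correct = np.asarray(correct, dtype=float)[order]
        tp = np.cumsum(correct)                 # correct detections so far
        fp = np.cumsum(1.0 - correct)           # false detections so far
        precision = tp / (tp + fp)              # correct / all detections
        recall = tp / num_labeled               # correct / labeled targets
        # trapezoidal area under the PR curve, starting from recall 0
        recall = np.concatenate(([0.0], recall))
        precision = np.concatenate(([precision[0]], precision))
        return float(np.sum(np.diff(recall) * (precision[1:] + precision[:-1]) / 2.0))

    ap_airplane = average_precision([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1], num_labeled=3)
    ap_ship = average_precision([0.9, 0.5], [1, 1], num_labeled=2)
    print(ap_airplane, ap_ship, (ap_airplane + ap_ship) / 2)  # multi-class average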
The effect of the present invention is further illustrated by the following simulation experiments.
1. Simulation conditions:
The simulation experiments were run on an Intel(R) Xeon(R) E5-2630 CPU at 2.40 GHz × 16, in a hardware environment with 64 GB of memory, using the Caffe software environment.
2. Simulation content and result analysis:
Under the above simulation conditions, simulation experiments were carried out, following the steps of the present invention, with both the method of the present invention and the prior-art Faster R-CNN target detection method, and the two methods were evaluated on the test set to obtain their detection result graphs and average precisions.
Fig. 2 is a partial detection result graph of the third test image obtained with the prior-art Faster R-CNN target detection method, and Fig. 3 is a partial detection result graph of the third test image obtained with the method of the present invention. The average precisions of the two methods are compared in Table 1.
Comparing Fig. 2 and Fig. 3, it can be seen that, compared with the Faster R-CNN target detection method, the method of the present invention produces fewer falsely detected targets.
Table 1. Comparison of the average precision of the method of the present invention and the prior-art method
[Table 1 appears as an embedded image in the original publication and is not reproduced here.]
As can be seen from Table 1, compared with the Faster R-CNN target detection method, the average precision of every class obtained with the method of the present invention is improved, especially for the ship class, in which false detections easily occur.
In summary, the invention performs target detection on optical remote sensing images with an integrated deep convolutional network trained in turn on two generated data sets, so it can exploit not only the information of target areas in the training images but also the complex background information of non-target areas, making full use of the information in the optical remote sensing image; the two classification sub-networks realize different classification functions, reducing the number of falsely detected targets and improving the overall detection effect. Because testing is realized by a single network, the testing procedure is simplified and target detection efficiency is improved.

Claims (10)

1. An optical remote sensing image target detection method based on an integrated deep convolutional network, characterized in that the method extracts all target candidate boxes of non-target areas from the optical remote sensing image as negative samples, trains the constructed integrated deep convolutional network with these negative samples, and classifies the target candidate boxes respectively with two classification sub-networks of the integrated deep convolutional network, the method specifically comprising the following steps:
(1) building an integrated deep convolutional network:
building an integrated deep convolution network consisting of a basic network, an area generation sub-network and two classification sub-networks;
(2) generating a training data set containing a target area:
(2a) randomly selecting, as training images, 7 non-contiguous optical remote sensing images containing airplane and ship targets from the optical remote sensing images received in real time from a remote sensing satellite;
(2b) labeling the categories and position coordinates of all airplane and ship targets in the training images to obtain the labeled targets;
(2c) cropping all regions containing labeled targets in each training image into blocks with a 400 × 400 window, and sequentially applying rotation, flipping, and random exposure adjustment to the cropped image blocks, yielding 5850 image blocks;
(2d) transforming the position coordinates of each labeled target in the training image into its position coordinates within the image block, combining the categories and position coordinates of all labeled targets in each image block to obtain the class labels, and taking all image blocks together with their class labels as the training data set containing the target area;
(3) training the integrated deep convolutional network for the first time:
inputting the training data set containing the target area into the integrated deep convolutional network, with the second classification sub-network excluded, for iterative training until the loss function of the network converges, obtaining the integrated deep convolutional network after the first training;
(4) generating all-region training data sets:
(4a) generating all target candidate boxes of the training images, together with their confidences and categories, using a cut-and-merge method;
(4b) designating all target candidate boxes that intersect a labeled target as target-area candidate boxes, and all target candidate boxes that do not intersect any labeled target as non-target-area candidate boxes;
(4c) calculating, for every target-area candidate box, the ratio of the intersection area to the union area with each labeled target to obtain its intersection-over-union (IoU) with each labeled target, selecting all candidate boxes whose IoU with some labeled target exceeds 0.5, setting the category of each selected box to the category of the labeled target with which it has the maximum IoU, and forming the target-area training data set from these boxes, their categories, and the image blocks they belong to;
(4d) setting three background classes for the non-target-area candidate boxes, assigning the background class to a box whose class is background, the airplane-background class to a box whose class is airplane, and the ship-background class to a box whose class is ship, and forming the non-target-area training data set from these boxes, their classes, and the image blocks they belong to;
(5) training the integrated deep convolutional network for the second time:
taking the target-area training data set as positive samples and the non-target-area training data set as negative samples, inputting both into the integrated deep convolutional network, with the first classification sub-network excluded, for iterative training until the loss function of the integrated deep convolutional network converges, obtaining the integrated deep convolutional network after the second training;
(6) generating a test data set:
randomly selecting, as test images, 4 non-contiguous optical remote sensing images containing airplane and ship targets from the optical remote sensing images received in real time from a remote sensing satellite;
(7) obtaining a detection result graph:
(7a) cropping each test image into blocks with a 400 × 400 window at a stride of 100, inputting the cropped image blocks sequentially into the trained integrated deep convolutional network, outputting for each target in each image block its bounding box, category, and two confidences, namely a three-class confidence and a five-class confidence, calculating the final confidence of each target, and transforming the bounding-box coordinates of the target into its coordinates in the corresponding test image;
(7b) applying non-maximum suppression to all target boxes and their confidences to obtain the final detection result;
(7c) drawing the bounding box of each target at the corresponding position of the optical remote sensing image to obtain the detection result graph;
(8) calculating the average precision:
and respectively calculating the average precision of each category and the multi-class average precision, and evaluating the detection result.
2. The method for detecting the target of the optical remote sensing image based on the integrated deep convolutional network as claimed in claim 1, wherein the integrated deep convolutional network composed of the base network, the area generation sub-network and the two classification sub-networks in step (1) is characterized in that the area generation sub-network and the two classification sub-networks are arranged in parallel, and each sub-network is respectively connected with the base network.
3. The method for detecting the optical remote sensing image target based on the integrated deep convolutional network as claimed in claim 1, wherein the basic network in step (1) has 18 layers, structured in sequence as: input layer → first convolutional layer → second convolutional layer → first pooling layer → third convolutional layer → fourth convolutional layer → second pooling layer → fifth convolutional layer → sixth convolutional layer → seventh convolutional layer → third pooling layer → eighth convolutional layer → ninth convolutional layer → tenth convolutional layer → fourth pooling layer → eleventh convolutional layer → twelfth convolutional layer → thirteenth convolutional layer; the parameters of each layer of the basic network are set as follows:
setting the total number of feature maps of the input layer to 3;
setting the total numbers of feature maps of the first through thirteenth convolutional layers, in order, to 64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, and 512, and the scale of each convolution kernel to 3 × 3 nodes.
4. The method for detecting the target of the optical remote sensing image based on the integrated deep convolutional network as claimed in claim 1, wherein the region-generation sub-network in step (1) has 3 layers, structured in sequence as: input layer → convolutional layer → classification regression layer; the parameters of each layer of the region-generation sub-network are set as follows:
setting the total number of feature maps of the input layer to 512;
setting the total number of feature maps of the convolutional layer to 512, and the scale of the convolution kernel to 3 × 3 nodes;
setting the total number of feature maps for classification in the classification regression layer to 18, and the total number for regression to 36.
5. The method for detecting the target of the optical remote sensing image based on the integrated deep convolutional network as claimed in claim 1, wherein the first classification sub-network in step (1) has 5 layers, structured in sequence as: input layer → region pooling layer → first fully connected layer → second fully connected layer → classification regression layer; the parameters of each layer of the first classification sub-network are set as follows:
setting the total number of feature maps of each of the first and second fully connected layers to 4096;
setting the total number of feature maps for classification in the classification regression layer to 3, and the total number for regression to 12.
6. The method for detecting the target of the optical remote sensing image based on the integrated deep convolutional network as claimed in claim 1, wherein the second classification sub-network in step (1) has 5 layers, structured in sequence as: input layer → region pooling layer → first fully connected layer → second fully connected layer → softmax classifier layer; the parameters of each layer of the second classification sub-network are set as follows:
setting the total number of feature maps of each of the first and second fully connected layers to 4096;
setting the total number of feature maps for classification in the first classification regression layer to 3, and the total number for regression to 12;
setting the total number of feature maps of the softmax classifier layer to 5.
7. The method for detecting the target of the optical remote sensing image based on the integrated deep convolutional network as claimed in claim 1, wherein the steps of sequentially rotating, flipping, and randomly adjusting the exposure of the cropped image blocks in step (2c) are as follows:
first, rotating each cropped image block by 90 degrees clockwise and, separately, by 90 degrees counterclockwise to obtain the rotated image blocks;
second, flipping each rotated image block horizontally and then vertically to obtain the flipped image blocks;
third, calculating the randomly exposure-adjusted value of each pixel in each flipped image block according to the following formula:
g(i,j) = a * f(i,j) + b
where g(i,j) denotes the value, after random exposure adjustment, of the jth pixel in the ith image block; a denotes a random number in the interval [0.8, 1.2]; * denotes multiplication; f(i,j) denotes the original value of the jth pixel in the ith image block; and b denotes a random number in the interval [-20, 20].
8. The method for detecting the target of the optical remote sensing image based on the integrated deep convolutional network as claimed in claim 1, wherein the cut-and-merge method in step (4a) comprises the following steps:
first, cropping every region of each training image into blocks with a 400 × 400 window at a stride of 100 to obtain the cropped image blocks of each training image;
second, inputting the cropped image blocks of each training image in sequence into the trained integrated deep convolutional network, and outputting the target candidate boxes of each image block together with their confidences and categories, where each category is one of three classes: airplane, ship, or background;
third, transforming the coordinates of the target candidate boxes in each image block into their coordinates in the corresponding training image, and merging the candidate boxes, confidences, and categories of all cropped image blocks of each training image to obtain all target candidate boxes of the training image together with their confidences and categories.
9. The method for detecting the optical remote sensing image target based on the integrated deep convolutional network as claimed in claim 1, wherein the final confidence of each target in step (7a) is calculated as follows:
first, removing the background-class confidence from the three-class confidences, and removing the three background-class confidences from the five-class confidences;
second, multiplying the two remaining confidences of each target to obtain its final confidence.
10. The method for detecting the optical remote sensing image target based on the integrated deep convolutional network as claimed in claim 1, wherein the average precision of each category and the multi-class average precision in step (8) are calculated as follows:
first, dividing the total number of correctly detected airplane targets by the total number of detected airplane targets to obtain the precision of the airplane class, and by the total number of labeled airplane targets to obtain the recall of the airplane class; drawing a curve with recall as the abscissa and precision as the ordinate, and taking the area enclosed by the curve and the coordinate axes as the average precision of the airplane class;
second, dividing the total number of correctly detected ship targets by the total number of detected ship targets to obtain the precision of the ship class, and by the total number of labeled ship targets to obtain the recall of the ship class; drawing a curve with recall as the abscissa and precision as the ordinate, and taking the area enclosed by the curve and the coordinate axes as the average precision of the ship class;
third, averaging the average precisions of the airplane and ship classes to obtain the multi-class average precision.
CN201810113862.3A 2018-02-05 2018-02-05 Optical remote sensing image target detection method based on integrated depth convolution network Active CN108460341B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810113862.3A CN108460341B (en) 2018-02-05 2018-02-05 Optical remote sensing image target detection method based on integrated depth convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810113862.3A CN108460341B (en) 2018-02-05 2018-02-05 Optical remote sensing image target detection method based on integrated depth convolution network

Publications (2)

Publication Number Publication Date
CN108460341A CN108460341A (en) 2018-08-28
CN108460341B true CN108460341B (en) 2020-04-07

Family

ID=63239720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810113862.3A Active CN108460341B (en) 2018-02-05 2018-02-05 Optical remote sensing image target detection method based on integrated depth convolution network

Country Status (1)

Country Link
CN (1) CN108460341B (en)

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299688B (en) * 2018-09-19 2021-10-01 厦门大学 Ship detection method based on deformable fast convolution neural network
CN109377479B (en) * 2018-09-27 2021-10-22 中国电子科技集团公司第五十四研究所 Butterfly satellite antenna target detection method based on remote sensing image
CN109446961B (en) 2018-10-19 2020-10-30 北京达佳互联信息技术有限公司 Gesture detection method, device, equipment and storage medium
CN109584208B (en) * 2018-10-23 2021-02-02 西安交通大学 Inspection method for intelligent identification model of industrial structure defects
EP3647994B1 (en) * 2018-10-30 2024-01-17 Airbus Defence and Space GmbH Automated generation of training images
CN109697459A (en) * 2018-12-04 2019-04-30 云南大学 One kind is towards optical coherence tomography image patch Morphology observation method
CN109961107B (en) * 2019-04-18 2022-07-19 北京迈格威科技有限公司 Training method and device for target detection model, electronic equipment and storage medium
CN110458119B (en) * 2019-08-15 2020-08-18 中国水利水电科学研究院 Non-contact measurement concrete aggregate gradation rapid identification method
CN110826485B (en) * 2019-11-05 2023-04-18 中国人民解放军战略支援部队信息工程大学 Target detection method and system for remote sensing image
CN111191566B (en) * 2019-12-26 2022-05-17 西北工业大学 Optical remote sensing image multi-target detection method based on pixel classification
CN111339893B (en) * 2020-02-21 2022-11-22 哈尔滨工业大学 Pipeline detection system and method based on deep learning and unmanned aerial vehicle
CN111814662B (en) * 2020-07-07 2022-06-24 北京航空航天大学 Visible light image airplane rapid detection method based on miniature convolutional neural network
CN112098997B (en) * 2020-09-18 2021-10-15 欧必翼太赫兹科技(北京)有限公司 Three-dimensional holographic imaging security inspection radar image foreign matter detection method
CN112150442A (en) * 2020-09-25 2020-12-29 帝工(杭州)科技产业有限公司 New crown diagnosis system based on deep convolutional neural network and multi-instance learning
CN112348758B (en) * 2020-11-12 2022-09-02 中国电子科技集团公司第五十四研究所 Optical remote sensing image data enhancement method and target identification method
CN113095316B (en) * 2021-04-15 2023-04-07 西安电子科技大学 Image rotation target detection method based on multilevel fusion and angular point offset
CN116342616B (en) * 2023-03-15 2023-10-27 大连海事大学 Remote sensing image sea-land segmentation method based on double-branch integrated learning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106408030A (en) * 2016-09-28 2017-02-15 武汉大学 SAR image classification method based on middle lamella semantic attribute and convolution neural network
CN107247930A (en) * 2017-05-26 2017-10-13 西安电子科技大学 SAR image object detection method based on CNN and Selective Attention Mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Peng Wang et al., "Tracking Maneuvering Targets Using a Modified Rao-Blackwellised Particle Filter," 20th International Conference on Information Fusion, 2017-07-13, pp. 1-9. *

Also Published As

Publication number Publication date
CN108460341A (en) 2018-08-28

Similar Documents

Publication Publication Date Title
CN108460341B (en) Optical remote sensing image target detection method based on integrated depth convolution network
CN109271856B (en) Optical remote sensing image target detection method based on expansion residual convolution
Wang et al. SAR target detection based on SSD with data augmentation and transfer learning
CN108427912B (en) Optical remote sensing image target detection method based on dense target feature learning
CN108510467B (en) SAR image target identification method based on depth deformable convolution neural network
CN108491854B (en) Optical remote sensing image target detection method based on SF-RCNN
Li et al. Object detection using convolutional neural networks in a coarse-to-fine manner
CN108416378B (en) Large-scene SAR target recognition method based on deep neural network
CN108898065B (en) Deep network ship target detection method with candidate area rapid screening and scale self-adaption
Karuppusamy Building detection using two-layered novel convolutional neural networks
CN110472658A (en) A kind of the level fusion and extracting method of the detection of moving-target multi-source
Jin et al. A salient region detection and pattern matching-based algorithm for center detection of a partially covered tropical cyclone in a SAR image
CN114266977B (en) Multi-AUV underwater target identification method based on super-resolution selectable network
Fu et al. Aircraft recognition in SAR images based on scattering structure feature and template matching
CN112348758B (en) Optical remote sensing image data enhancement method and target identification method
Sun et al. Object-oriented land cover classification using HJ-1 remote sensing imagery
He et al. Ship detection without sea-land segmentation for large-scale high-resolution optical satellite images
CN110110618A (en) A kind of SAR target detection method based on PCA and global contrast
Zhang et al. Contextual squeeze-and-excitation mask r-cnn for sar ship instance segmentation
Zhang et al. Oriented ship detection based on soft thresholding and context information in SAR images of complex scenes
Wei et al. Texture feature analysis in oil spill monitoring by SAR image
Walder et al. Neural network based methods for cloud classification on AVHRR images
Gao Design and implementation of marine automatic target recognition system based on visible remote sensing images
CN113887652B (en) Remote sensing image weak and small target detection method based on morphology and multi-example learning
Hou et al. Ship detection from optical remote sensing image based on size-adapted CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant