CN116524174A - Marine organism detection method and structure of multiscale attention-fused Faster RCNN - Google Patents

Marine organism detection method and structure of multiscale attention-fused Faster RCNN Download PDF

Info

Publication number
CN116524174A
CN116524174A CN202310236356.4A CN202310236356A CN116524174A CN 116524174 A CN116524174 A CN 116524174A CN 202310236356 A CN202310236356 A CN 202310236356A CN 116524174 A CN116524174 A CN 116524174A
Authority
CN
China
Prior art keywords
image
scale
feature
attention
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310236356.4A
Other languages
Chinese (zh)
Inventor
陈小毛 (Chen Xiaomao)
张健 (Zhang Jian)
王立成 (Wang Licheng)
赵金润 (Zhao Jinrun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202310236356.4A priority Critical patent/CN116524174A/en
Publication of CN116524174A publication Critical patent/CN116524174A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a marine organism detection method of a multi-scale attention-fused Faster RCNN, which comprises the following steps: 1) performing enhancement processing on an input image with the MSRCR image enhancement algorithm; 2) extracting the features of the enhanced image with an MSAFPN network to obtain a multi-scale feature map; 3) inputting the multi-scale feature map into an RPN network to obtain candidate frames; 4) mapping the candidate frames onto the multi-scale feature map, intercepting the corresponding regions, fixing the intercepted feature maps to a uniform size, flattening them into one-dimensional vectors, and performing target classification and frame regression through a full-connection layer, thereby realizing accurate positioning detection. The invention improves the quality of marine organism images with an image enhancement algorithm, obtains more accurate target feature information through the multi-scale fusion and attention modules in the MSAFPN, detects more marine organisms, and effectively improves target detection accuracy.

Description

Marine organism detection method and structure of multiscale attention-fused Faster RCNN
Technical Field
The invention belongs to the field of target detection, and particularly relates to a marine organism detection method and structure of a multi-scale attention-fused Faster RCNN.
Background
The ocean is vast and holds abundant biological resources. Marine organisms such as sea cucumbers and scallops are rich in easily digestible proteins and amino acids, including the nine amino acids essential to the human body, and are readily absorbed. Because marine organisms have high nutritional value, demand for them is high; to make full use of these resources, underwater robots are needed to replace humans in underwater fishing operations.
The main function of an underwater robot is to obtain the category and position of surrounding marine organism targets through a target detection algorithm and to capture them successfully. Accurate marine organism detection effectively improves the robot's underwater operating efficiency, and the quality of the detection algorithm directly affects fishing efficiency. Traditional target detection methods design the features of the target object in advance using common feature descriptors, so simple targets can be detected accurately. However, owing to the diversity of marine organisms and the peculiarity of the underwater environment, traditional target detection generalizes poorly and lacks robustness. In recent years, deep learning has achieved excellent results in image recognition, target detection, and related fields. Convolutional neural networks have strong feature extraction capability and can effectively extract target features, greatly improving the detection precision and speed of deep-learning-based target detection networks.
Faster R-CNN is a representative deep convolutional neural network. First, a feature extraction network extracts target features to generate a feature map. The feature map is then fed to a Region Proposal Network (RPN): a sliding window moves over the feature map, 1×1 convolutions produce region scores and frame regression parameters, and non-maximum suppression yields the final candidate frames. The candidate frames are mapped onto the feature map and passed through an ROI pooling operation that fixes feature maps of different sizes to 7×7; after flattening into one-dimensional vectors, the full-connection layer performs the subsequent target classification and frame regression. However, mainstream target detection networks such as Faster R-CNN detect common, larger targets well but small targets poorly. Owing to the peculiarity of the underwater environment, marine organism images suffer from low contrast, heavy noise, and a bluish-green cast, and most marine organisms are small targets, so detection accuracy is low and fishing efficiency suffers; marine organism targets cannot be detected reliably without further processing. To address these problems, the invention provides a marine organism detection method and structure of a multi-scale attention-fused Faster RCNN.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present invention aims to provide a multi-scale attention-fused Faster RCNN marine organism detection method and structure, to solve the prior-art problems of poor image quality for small marine organism targets and low capture efficiency caused by the low detection accuracy of existing detection networks.
To achieve the above and other related objects, the present invention provides a marine organism detection method of a multi-scale attention-fused Faster RCNN, the detection method at least comprising:
1) Performing enhancement processing on an input image by adopting an MSRCR image enhancement algorithm;
2) Extracting the characteristics of the image after the enhancement treatment by adopting an MSAFPN network to obtain a multi-scale characteristic map;
3) Inputting the multi-scale feature map into an RPN network to obtain candidate frames;
4) Mapping the candidate frames onto the multi-scale feature map, intercepting the multi-scale feature map, fixing the intercepted feature map to be uniform in size, flattening the feature map to be a one-dimensional vector, and carrying out target classification and frame regression through a full-connection layer, so that accurate positioning detection is realized.
Preferably, in step 1), the MSRCR image enhancement algorithm is developed from Retinex theory, which essentially decomposes an image into a reflected image and a luminance image, as shown in the following formula (1):
S(x,y)=R(x,y)·L(x,y) (1)
where S(x,y) is the input image, R(x,y) is the reflected image, and L(x,y) is the luminance image. Taking the logarithm of formula (1) yields the single-scale Retinex (SSR) output, formula (2):
R_ssr(x,y)=log S(x,y)−log(F(x,y)*S(x,y)) (2)
where R_ssr(x,y) is the processed output image and F(x,y) is a Gaussian function; convolving the Gaussian function with the input image gives the luminance image. The image obtained by MSR processing is shown in formula (3):
R_msr(x,y)=∑_{n=1}^{N} w_n [log S(x,y)−log(F_n(x,y)*S(x,y))] (3)
where N in the MSR is 3. The MSRCR adds a color recovery factor C on the basis of the MSR to correct the defect of color distortion, as shown in formulas (4) and (5):
R_msrcr(x,y)=C(x,y)·R_msr(x,y) (4)
C_i(x,y)=β·log[α·I_i(x,y)/∑_{j=1}^{3} I_j(x,y)] (5)
where β is the gain constant, set to 1; α controls the nonlinear strength, set to 128; and I_i(x,y) is the image of the i-th channel. After the MSRCR algorithm processes the image, pixels typically take negative values, so the image is corrected with a gain G and an Offset, set to 5 and 25 respectively, as shown in formula (6):
R_msrcr(x,y)'=G·R_msrcr(x,y)+Offset (6).
Preferably, in step 2), the step of extracting the features of the enhanced image with an MSAFPN network to obtain a multi-scale feature map includes:
first, four feature maps { C2, C3, C4, C5} are output by ResNet 50;
then, FPN operates on the three feature maps { C2, C3, C4}, obtaining the fused feature maps { P2, P3, P4} through top-down upsampling and lateral connections;
finally, the channel attention and the space attention are sequentially operated on the feature graphs { P2, P3 and P4} through an attention module, so that the multi-scale feature graphs { N2, N3 and N4} with rich feature information and position information are obtained.
Preferably, in step 3), the step of inputting the multi-scale feature map into an RPN network to obtain a candidate box includes:
and setting anchor frames of multiple sizes, removing anchor frames that cross the boundary, and adjusting the remaining anchor frames into candidate frames using the frame regression parameters obtained by the convolution operation; since a large amount of overlap exists among the candidate frames, they are filtered by foreground/background probability scores and non-maximum suppression, and the remaining candidate frames are the final candidate frames.
Preferably, in step 4), the intercepted feature maps are fixed to a uniform size by RoiAlign, flattened into one-dimensional vectors, and passed through the full-connection layer for target classification and frame regression, thereby realizing accurate positioning detection.
The invention also provides a marine organism detection structure of the multi-scale attention-fused Faster RCNN, which at least comprises:
the MSRCR image enhancement module is used for enhancing the input image;
the MSAFPN network is used for extracting the characteristics of the image after the enhancement treatment and obtaining a multi-scale characteristic image;
the RPN is used for processing the multi-scale feature map and obtaining candidate frames;
and Fast RCNN is used for mapping the candidate frames to the multi-scale feature images, intercepting the multi-scale feature images, fixing the intercepted feature images to be uniform in size, flattening the feature images to be one-dimensional vectors, and carrying out target classification and frame regression through a full-connection layer so as to realize accurate positioning detection.
Preferably, the MSAFPN network at least includes:
ResNet50 for outputting four feature maps { C2, C3, C4, C5};
FPN, for operating on the three feature maps { C2, C3, C4} and obtaining the fused feature maps { P2, P3, P4} through top-down upsampling and lateral connections;
and the attention module is used for sequentially carrying out channel attention and space attention operation on the feature graphs { P2, P3 and P4} to obtain the multi-scale feature graphs { N2, N3 and N4} with rich feature information and position information.
Preferably, the ResNet50 comprises at least a convolutional layer, a pooled layer, and a residual block.
As described above, the marine organism detection method and structure of the multiscale attention-fused Faster RCNN of the invention have the following beneficial effects:
1. after the marine organism image is enhanced by the MSRCR image enhancement algorithm, compared with the original image, the shape of each target can be obviously seen, the image is clearer, and the subsequent detection precision can be improved.
2. For the situation that most marine organisms show small targets, the MSAFPN network can obtain more accurate characteristic information of the target organisms through multi-scale fusion and attention modules, so that more marine organisms can be detected when the targets are detected.
3. In the target classification and frame regression of the Fast RCNN part, the RoiAlign operation keeps the deviation between the frame-regression position and the target's real position small, effectively improving the accuracy of target detection.
Drawings
FIG. 1 is a schematic diagram of the marine organism detection structure of the multi-scale attention fused Faster RCNN of the present invention.
Fig. 2 is a schematic diagram of the MSAFPN network structure of the present invention.
Fig. 3 is a schematic diagram of a residual block structure according to the present invention.
Fig. 4 is a schematic diagram of the top-down upsampling and cross connect architecture of the present invention.
FIG. 5 is a schematic diagram of the attention mechanism of the present invention.
FIG. 6 is a photograph showing the contrast of the MSRCR enhancement effect of the present invention.
Description of element reference numerals
1 MSRCR image enhancement module
2 MSAFPN network
201 ResNet50
202 FPN
203. Attention module
3 RPN network
4 Fast RCNN
5. Input image
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the invention with reference to specific examples. The invention may also be practiced or applied in other, different embodiments, and the details in this description may be modified or varied on the basis of different viewpoints and applications without departing from the spirit and scope of the invention.
Please refer to the accompanying drawings. It should be noted that the illustrations provided with the embodiments merely illustrate the basic concept of the invention schematically: the drawings show only the components related to the invention rather than the number, shape, and size of components in actual implementation, where the form, quantity, and proportion of each component may change arbitrarily and the component layout may be more complex.
The deep feature maps produced by the feature extraction network of Faster RCNN have large receptive fields and prominent features, making them suitable for detecting larger targets, whereas the shallow feature maps suited to detecting smaller targets carry weak semantic information and cannot serve as detection feature maps on their own. Moreover, the ROI pooling layer in Faster RCNN performs two quantization roundings when computing coordinate positions, so the subsequent frame regression positioning cannot map accurately back to the corresponding pixels of the original image. The first quantization rounding occurs when mapping the candidate region onto the feature map, to align the candidate region with the cells of the feature map for interception; the second occurs when fixing the intercepted feature map to a uniform size. The errors introduced by these two roundings are amplified during the mapping back; the effect on large targets is small but on small targets is large, and when the target area is small the target cannot be positioned accurately and its detection may be affected. Aiming at the poor image quality and inaccurate positioning of marine organism detection targets caused by the special underwater environment, the invention provides a multi-scale attention-fused Faster RCNN marine organism detection method and structure, detailed as follows.
Example 1
The embodiment provides a marine organism detection method of a multiscale attention fused Faster RCNN, which at least comprises the following steps:
s1, performing enhancement processing on an input image by adopting an MSRCR image enhancement algorithm;
s2, extracting the characteristics of the image after the enhancement treatment by adopting an MSAFPN network to obtain a multi-scale characteristic diagram;
s3, inputting the multi-scale feature map into an RPN network to obtain a candidate frame;
and S4, mapping the candidate frames to the multi-scale feature map, intercepting the multi-scale feature map, fixing the intercepted feature map to be uniform in size, flattening the feature map to be a one-dimensional vector, and carrying out target classification and frame regression through a full-connection layer, so that accurate positioning detection is realized.
The following describes in detail the marine organism detection method of the multi-scale attention-fused fast RCNN of this embodiment with reference to the accompanying drawings.
As shown in fig. 1, step S1 is first performed, and enhancement processing is performed on the input image 5 using the MSRCR image enhancement algorithm.
MSRCR was developed from the Retinex theory proposed by Land, which essentially decomposes an image into a reflected image and a luminance image, as shown in formula (1):
S(x,y)=R(x,y)·L(x,y) (1)
where S(x,y) is the input image, R(x,y) is the reflected image, and L(x,y) is the luminance image. Taking the logarithm of formula (1) yields the single-scale Retinex (SSR) output, formula (2):
R_ssr(x,y)=log S(x,y)−log(F(x,y)*S(x,y)) (2)
where R_ssr(x,y) is the processed output image and F(x,y) is a Gaussian function. Convolving the Gaussian function with the input image gives the luminance image. The image obtained by MSR processing is shown in formula (3):
R_msr(x,y)=∑_{n=1}^{N} w_n [log S(x,y)−log(F_n(x,y)*S(x,y))] (3)
where N in the MSR is 3. The MSRCR adds a color recovery factor C on the basis of the MSR to correct color distortion. As shown in formulas (4), (5):
R_msrcr(x,y)=C(x,y)·R_msr(x,y) (4)
C_i(x,y)=β·log[α·I_i(x,y)/∑_{j=1}^{3} I_j(x,y)] (5)
where β is the gain constant, set to 1; α controls the nonlinear strength, set to 128; and I_i(x,y) is the image of the i-th channel. After the MSRCR algorithm processes the image, pixels typically take negative values. The image is corrected with a gain G and an Offset, set to 5 and 25 respectively. As shown in formula (6):
R_msrcr(x,y)'=G·R_msrcr(x,y)+Offset (6)
therefore, before the marine organism image is sent to the characteristic extraction network, the MSRCR image enhancement algorithm is used for enhancing the input image 5, so that the image quality can be improved, and the subsequent positioning detection of the image target is facilitated.
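For illustration, the following is a minimal Python sketch of the MSRCR pipeline in formulas (1)-(6). The Gaussian scale values are assumptions chosen for demonstration (the patent does not list its sigmas), and the +1 offset before taking logarithms is a common numerical-stability convention rather than part of the patent's description.

```python
import cv2
import numpy as np

def single_scale_retinex(img, sigma):
    # Formula (2): R_ssr = log S - log(F * S), with F a Gaussian surround function
    blur = cv2.GaussianBlur(img, (0, 0), sigma)
    return np.log(img) - np.log(blur)

def msrcr(bgr, sigmas=(15, 80, 200), alpha=128.0, beta=1.0, G=5.0, offset=25.0):
    img = bgr.astype(np.float64) + 1.0  # offset to avoid log(0)
    # Formula (3): MSR with N=3 scales and equal weights w_n = 1/N
    msr = sum(single_scale_retinex(img, s) for s in sigmas) / len(sigmas)
    # Formula (5): color recovery factor per channel
    crf = beta * np.log(alpha * img / np.sum(img, axis=2, keepdims=True))
    # Formulas (4) and (6): apply color recovery, then gain and offset correction
    out = G * (crf * msr) + offset
    # Stretch to the displayable 8-bit range (pixels can be negative before this)
    out = (out - out.min()) / (out.max() - out.min()) * 255.0
    return out.astype(np.uint8)

enhanced = msrcr(cv2.imread("underwater.jpg"))
```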
As shown in fig. 1, step S2 is then performed: the features of the enhanced image are extracted with the MSAFPN network 2 (Multi-Scale Attention Feature Pyramid Network) to obtain a multi-scale feature map.
As an example, as shown in fig. 2, this step specifically includes:
first, four feature maps { C2, C3, C4, C5} are output through the res net50 (residual network) 201.
Further, this step is to output four feature maps { C2, C3, C4, C5} through the residual block of the ResNet 50.
Then, since the deep feature map C5 is not suitable for detecting small marine organism targets, and in order to enrich the feature information of the shallow feature maps that are suited to detecting small targets, the FPN (Feature Pyramid Network) 202 operates on the three feature maps { C2, C3, C4}, obtaining the fused feature maps { P2, P3, P4} through top-down upsampling and lateral connections.
Finally, the attention module 203 sequentially performs channel attention and spatial attention operations on the feature maps { P2, P3, P4} to obtain the multi-scale feature maps { N2, N3, N4} with rich feature information and location information.
In this step, channel attention operations are performed on { P2, P3, P4} separately to obtain finer features of interest. Because the target feature information obtained from a single pass of channel attention is insufficient, addition and downsampling operations fuse each channel-attention output with the previous one to gather more feature information; the repeatedly fused feature maps are then fed to a spatial attention mechanism to obtain the multi-scale feature maps { N2, N3, N4}, which not only carry finer features but also concentrate on the spatial positions of marine organisms. The MSAFPN network 2 completes feature fusion mainly through repeated upsampling and downsampling; it adds a channel attention mechanism that can repeatedly extract features and uses a spatial attention mechanism to obtain accurate positions. Through these operations, a multi-scale feature map rich in feature and position information is obtained for subsequent detection.
For ease of understanding, the ResNet50, FPN, and attention module that make up the MSAFPN network 2 are described below. The structure of the MSAFPN network 2 is shown in FIG. 2; it is composed of the ResNet50 201, the FPN 202, and the attention module 203.
The ResNet50 201 is a ResNet50 with the full-connection layer removed; it mainly comprises a convolution layer, a pooling layer, and residual blocks (ResBlocks). The input data first undergoes a 7×7 convolution, batch normalization (BN), a ReLU activation function, and 3×3 maximum pooling, and then passes through four stages of residual blocks, numbering 3, 4, 6, and 3 respectively, to obtain the four feature maps { C2, C3, C4, C5}; the features become more and more prominent while the spatial size is halved at each stage. Each stage contains two types of residual structure, one named Convblock and the other Identityblock, arranged as one Convblock followed by several Identityblocks; the residual block structures are shown in FIG. 3.
In FIG. 3, (a) is the Convblock structure and (b) is the Identityblock structure. The input and output dimensions of a Convblock differ, so Convblocks cannot be chained directly; a convolution layer added in the skip connection changes the channel dimension to match the output of the three stacked convolutions, and the two are added to change the output dimension of the feature map. The input and output dimensions of an Identityblock are the same, so several Identityblocks can be connected in series to deepen the network. A sketch of both structures follows.
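As an illustration only, the following PyTorch sketch expresses the two residual structures as one bottleneck module; the class and parameter names are assumptions, not taken from the patent.

```python
import torch.nn as nn

class Bottleneck(nn.Module):
    """conv_shortcut=True gives a Convblock (the projection shortcut changes the
    channel dimension); conv_shortcut=False gives an Identityblock."""
    def __init__(self, in_ch, mid_ch, out_ch, stride=1, conv_shortcut=False):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, stride=stride, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        self.shortcut = (
            nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                          nn.BatchNorm2d(out_ch))
            if conv_shortcut else nn.Identity()
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # add the (possibly projected) input back to the convolution branch
        return self.relu(self.body(x) + self.shortcut(x))
```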
The FPN network performs feature fusion on the feature map through operations of top-down upsampling and cross-connect convolution, and the top-down upsampling and cross-connect structure is shown in fig. 4.
The size of each ResNet50 feature map is half that of the previous layer's, so shallow feature maps have small receptive fields suited to detecting small targets, while deep feature maps have large receptive fields suited to detecting large targets. Features become more prominent after each residual stage, so shallow feature maps carry weak semantic information and deep feature maps carry strong semantic information. Since most marine organisms are small targets, shallow feature maps with smaller receptive fields are needed, but their feature semantic information is weak. The FPN therefore reduces the channels of the deepest used feature map to obtain the fused feature map P4, upsamples it and adds it to the channel-reduced next-shallower feature map to obtain the fused feature map P3, and similarly upsamples that result and adds it to the channel-reduced shallowest feature map to obtain the fused feature map P2. By fusing shallow and deep features, the FPN makes the feature information of the shallow feature maps richer so they can be used in subsequent detection tasks, while the multi-scale feature maps, including the deeper third-layer map, can still detect any large targets that appear, avoiding missed detections.
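One plausible reading of this top-down fusion, sketched in PyTorch under the assumption of standard ResNet50 channel counts for C2-C4 (256, 512, 1024) and a common 256-channel output; the class name is hypothetical.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024), out_ch=256):
        super().__init__()
        # 1x1 lateral convolutions reduce each C-level to a common channel count
        self.laterals = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_channels])

    def forward(self, c2, c3, c4):
        p4 = self.laterals[2](c4)
        # upsample 2x and add to the channel-reduced shallower map
        p3 = self.laterals[1](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        p2 = self.laterals[0](c2) + F.interpolate(p3, scale_factor=2, mode="nearest")
        return p2, p3, p4
```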
To increase the network's attention to marine organism features and suppress the influence of the background, an attention module is used to obtain finer features of the marine organisms. Attention modules come in three types: channel attention, spatial attention, and mixed attention; common examples are the channel-attention SENet and the mixed-attention CBAM. The invention adopts two types, channel attention and spatial attention, as shown in FIG. 5, where (a) is the channel attention mechanism and (b) is the spatial attention mechanism.
For the channel attention module, the first step performs global max pooling (GlobalMaxPool) and global average pooling (GlobalAvgPool) on the input feature map along the spatial dimensions, yielding two different spatial semantic descriptors. Both pass through a shared perceptron composed of two full-connection layers, which fits the complex correlations between channels well while greatly reducing the parameter count and computation. The two channel attention feature vectors are then added and fused, the sigmoid activation function produces the final channel attention weight vector Mc, and multiplying this weight vector with the input feature map gives the adjusted feature map.
For the spatial attention module, global max pooling (GlobalMaxPool) and global average pooling (GlobalAvgPool) are performed on the input feature map along the channel dimension, yielding two different channel feature descriptors. These are spliced together, convolutions with 7×7 and 3×3 kernels are applied separately, the two convolution results are added and fused, the sigmoid activation function produces the final spatial attention weight vector Ms, and multiplying this weight vector with the feature map gives the adjusted final output feature map.
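A minimal PyTorch sketch of both mechanisms as described; the reduction ratio of 16 in the shared perceptron is an assumed default, as the patent does not state one.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # shared perceptron: two full-connection layers
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))             # global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))              # global max pooling
        mc = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # channel weight vector Mc
        return x * mc

class SpatialAttention(nn.Module):
    def __init__(self):
        super().__init__()
        # 7x7 and 3x3 convolutions over the spliced pooled maps, results added
        self.conv7 = nn.Conv2d(2, 1, 7, padding=3)
        self.conv3 = nn.Conv2d(2, 1, 3, padding=1)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        ms = torch.sigmoid(self.conv7(pooled) + self.conv3(pooled))  # weight map Ms
        return x * ms
```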
As shown in fig. 1, step S3 is then performed, and the multi-scale feature map is input into the RPN network 3 to obtain candidate boxes.
Specifically, the multi-scale feature map obtained in step S2 serves as input to the RPN network (the region proposal network). A sliding 3×3 convolution extracts the features of each 3×3 region of the feature map; the size of the feature map is unchanged by this operation while its channel dimension becomes 512. Nine anchors are generated for each pixel of the feature map, and two parallel 1×1 convolutions follow, one with 18 output channels and one with 36. The classification layer with 18 output channels uses a softmax activation function to obtain 18 probability scores, representing the foreground and background probabilities predicted for the 9 anchors at each pixel. The regression layer with 36 output channels produces 36 frame regression parameters, representing the frame regression of the 9 anchors at each pixel, with 4 parameters per anchor: the center coordinates (x, y), the width w, and the height h. A sketch of such a head follows.
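Illustratively, this head can be sketched as below; the 256 input channels are an assumption matching typical FPN outputs, and the names are hypothetical.

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    def __init__(self, in_ch=256, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, 512, 3, padding=1)  # 3x3 sliding window, 512 channels
        self.cls = nn.Conv2d(512, num_anchors * 2, 1)    # 18 channels: fg/bg scores
        self.reg = nn.Conv2d(512, num_anchors * 4, 1)    # 36 channels: (x, y, w, h) deltas

    def forward(self, x):
        h = torch.relu(self.conv(x))
        return self.cls(h), self.reg(h)
```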
The 9 anchors in Faster R-CNN are generated from three fixed rectangle sizes combined with the three aspect ratios {1:1, 1:2, 2:1}. Anchors crossing the boundary are ignored, and the remaining anchors are adjusted into candidate frames using the frame regression parameters. Frame regression brings the positions of the remaining anchors closer to the real target positions, so a transformation must be found that adjusts each anchor toward the true target position; the specific transformation is as follows:
P_x = A_w·d_x(A) + A_x (7)
P_y = A_h·d_y(A) + A_y (8)
P_w = A_w·exp(d_w(A)) (9)
P_h = A_h·exp(d_h(A)) (10)
where A_x, A_y, A_w, A_h are the center coordinates (x, y), width, and height of the anchor; P_x, P_y, P_w, P_h are the center coordinates (x, y), width, and height of the candidate frame; and d_x, d_y, d_w, d_h are the frame regression parameters obtained by training the regression layer. According to formulas (7)-(10), the anchor positions are adjusted by the frame regression parameters to obtain candidate frames. A large amount of overlap exists between the adjusted candidate frames, so the foreground/background probability scores and non-maximum suppression (NMS) are used to obtain the final candidate frames.
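A small numpy sketch of the decoding step in formulas (7)-(10), provided for illustration:

```python
import numpy as np

def decode(anchors, deltas):
    """anchors: (N, 4) array of (Ax, Ay, Aw, Ah); deltas: (N, 4) of (dx, dy, dw, dh)."""
    ax, ay, aw, ah = anchors.T
    dx, dy, dw, dh = deltas.T
    px = aw * dx + ax      # formula (7): shift the centre x by a width-scaled offset
    py = ah * dy + ay      # formula (8): shift the centre y by a height-scaled offset
    pw = aw * np.exp(dw)   # formula (9): rescale the width
    ph = ah * np.exp(dh)   # formula (10): rescale the height
    return np.stack([px, py, pw, ph], axis=1)
```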
The anchor setting is as follows: the RPN in the original Faster RCNN assigns 9 anchors to each point of the input feature map through a sliding window, formed from three base anchor sizes (128, 256, 512) combined with three aspect ratios (1:1, 1:2, 2:1). When a feature pyramid is used, the conventional network outputs 5 feature maps for generating candidate frames, with base anchor sizes {32, 64, 128, 256, 512} and the same three aspect ratios, giving 15 anchors. The MSAFPN outputs a multi-scale feature map of three sizes, { N2, N3, N4}, so the RPN anchors are set with base sizes {32, 64, 128} and the three aspect ratios (1:1, 1:2, 2:1), giving 9 anchors in total, distributed three per feature map by size: N2, being suited to detecting small targets, is assigned the base anchor size 32 with aspect ratios (1:1, 1:2, 2:1); N3 is assigned the base size 64 and N4 the base size 128, each likewise with aspect ratios (1:1, 1:2, 2:1).
As shown in fig. 1, step S4 is finally executed, the candidate frame is mapped onto the multi-scale feature map and intercepted, the intercepted feature map is fixed to a uniform size, the feature map is flattened to a one-dimensional vector, and the target classification and the frame regression are performed through the full connection layer, so that the accurate positioning detection is realized.
In this step, the intercepted feature map is fixed to a uniform size by RoiAlign, flattened into a one-dimensional vector, and passed through the full-connection layer for target classification and frame regression, realizing accurate positioning detection.
In the Fast RCNN 4 part of the Faster RCNN, during acquisition of the Region of Interest (RoI), the RoI pooling layer performs two quantization roundings when computing coordinate positions, so the subsequent frame regression positioning cannot map accurately back to the corresponding pixels of the original image. The first quantization rounding occurs when mapping the candidate region onto the feature map, to align it with the cells of the feature map for interception; the second occurs when fixing the intercepted feature map to a uniform size.
Assume the candidate region on the input picture has width w=300 and height h=300, and the feature map generated by the feature extraction network has step size stride=16. The candidate region mapped onto the feature map then has width w/stride=18.75 and height h/stride=18.75; after the first quantization rounding both become 18, producing the first error. The candidate frames are intercepted on the feature map, and because candidate frames of different sizes yield intercepted feature maps of different sizes, they are fixed to a uniform 7×7 for convenient network training. The feature map must therefore be divided into 49 cells, each of width 18/7≈2.57 and height 18/7≈2.57; a second quantization rounding is performed and maximum pooling is applied to each 2×2 region to obtain the final result. In the subsequent frame regression positioning, however, the coordinates must be mapped back to the original image, and the errors introduced by the two quantization roundings are amplified in this mapping, with an amplification factor determined by the stride and the fixed 7×7 size; in the example above, the error is amplified up to 112 times. When the target area is small, the target cannot be positioned accurately and its detection may even be affected. RoiAlign does not round the floating-point numbers as the RoI pooling layer does: it retains them and obtains the pixel values at floating-point coordinate positions by bilinear interpolation. The arithmetic, together with torchvision's RoIAlign, is sketched below.
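The quantization arithmetic above, next to torchvision's RoIAlign for comparison; the tensor shapes are illustrative assumptions.

```python
import torch
from torchvision.ops import roi_align

w, stride = 300, 16
print(w / stride)  # 18.75 -> RoI pooling rounds to 18 (first quantization error)
print(18 / 7)      # ~2.57 -> rounded again per 7x7 cell (second quantization error)

# RoIAlign keeps the fractional coordinates and bilinearly interpolates instead
feat = torch.randn(1, 256, 50, 50)                     # a dummy feature map
boxes = torch.tensor([[0, 10.0, 10.0, 310.0, 310.0]])  # (batch_idx, x1, y1, x2, y2)
pooled = roi_align(feat, boxes, output_size=7, spatial_scale=1 / 16)
print(pooled.shape)                                    # torch.Size([1, 256, 7, 7])
```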
The invention improves image quality with MSRCR image enhancement; the enhancement comparison is shown in FIG. 6. The original image in FIG. 6 shows that, compared with ordinary images, marine organism images suffer from low contrast and a bluish-green cast, which makes manual labeling and detection difficult. FIG. 6 also shows that the MSRCR-enhanced image is greatly improved over the original, and targets that previously could not be seen clearly become clearly visible after enhancement.
Average Precision (AP) is an evaluation index for network detection performance: the AP value represents the detection accuracy of a single class, and mAP (mean Average Precision) is the average of the APs over all classes; the mAP value is used to judge the performance of the network model. To verify the effect of image enhancement on detection, Faster RCNN with ResNet50 as the feature extraction network was taken as the baseline, detecting sea cucumber, sea urchin, scallop, and starfish, and experiments were run without enhancement and with MSRCR enhancement; the comparison of experimental results is shown in Table 1 below.
Table 1 comparison of experimental results with or without image enhancement
Table 1 shows that the mAP value after MSRCR image enhancement improves by 6.8% over the original images, indicating that image enhancement can raise the network's detection accuracy. The AP value for sea cucumber improves by 12.5%: sea cucumbers are green, so the bluish-green image cast hinders their detection, and after enhancement the influence of the background recedes, making them easier to detect. The AP value for scallop improves by 8.9%: most scallops are covered by sediment whose color resembles the background, and enhancement suppresses the background so the scallops stand out for detection. Sea urchins are black and little affected by the background color, so their AP value changes little. The AP value for starfish improves by 4.1%: like sea cucumbers, starfish are affected by the blue cast, but because the image background is mainly green, their improvement is less pronounced than the sea cucumber's.
In addition, to demonstrate the beneficial effects of this scheme, it is compared with other algorithms for detecting marine organisms. The experimental comparison is shown in Table 2 below.
TABLE 2 comparison of detection results of different detection algorithms on marine biological targets
Table 2 shows that the proposed scheme achieves the best detection results among the compared algorithms.
Example two
As shown in fig. 1, this embodiment provides a marine organism detection structure of a multi-scale attention-fused fast RCNN, which may be used to implement the detection method in the first embodiment, where the detection structure at least includes:
the MSRCR image enhancement module 1 is used for enhancing the input image 5;
the MSAFPN network 2 is used for extracting the characteristics of the image after the enhancement treatment and obtaining a multi-scale characteristic diagram;
an RPN network 3 for processing the multi-scale feature map and obtaining candidate boxes;
fast RCNN4, which is used to map the candidate frame to the multi-scale feature map and intercept the multi-scale feature map, fix the intercepted feature map to a uniform size, flatten the feature map to a one-dimensional vector, and conduct target classification and frame regression through the full connection layer, so as to realize accurate positioning detection.
As an example, as shown in fig. 2, the MSAFPN network 2 includes at least:
ResNet50201 for outputting four feature maps { C2, C3, C4, C5};
FPN 202 for operating the three feature maps { C2, C3, C4} and obtaining a fused feature map { P2, P3, P4} by up-sampling from bottom to top and connecting with the transverse direction;
the attention module 203 is configured to sequentially perform channel attention and spatial attention operations on the feature maps { P2, P3, P4}, so as to obtain the multi-scale feature maps { N2, N3, N4}, which have rich feature information and location information.
As an example, the res net50201 comprises at least a convolutional layer, a pooling layer, and a residual block.
The detection structure in the second embodiment is described in detail in the first embodiment, and will not be described here again.
In summary, after the image enhancement algorithm of this scheme enhances the marine organism images, target detection precision improves. Since most marine organisms appear as small targets, the MSAFPN obtains more accurate target feature information through multi-scale fusion and the attention module, allowing more marine organisms to be detected. In the target classification and frame regression of the Fast RCNN part, the RoiAlign operation keeps the deviation between the frame-regression position and the target's real position very small, effectively improving the accuracy of target detection.
Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The above embodiments merely illustrate the principles and effects of the present invention and are not intended to limit it. Anyone familiar with this technology may modify or vary the above embodiments without departing from the spirit and scope of the invention. Accordingly, all equivalent modifications and variations completed by those of ordinary skill in the art without departing from the spirit and technical ideas disclosed herein shall be covered by the claims of the invention.

Claims (8)

1. A method for detecting marine organisms by using a multi-scale attention-fused Faster RCNN, the method comprising at least:
1) Performing enhancement processing on an input image by adopting an MSRCR image enhancement algorithm;
2) Extracting the characteristics of the image after the enhancement treatment by adopting an MSAFPN network to obtain a multi-scale characteristic map;
3) Inputting the multi-scale feature map into an RPN network to obtain candidate frames;
4) Mapping the candidate frames onto the multi-scale feature map, intercepting the multi-scale feature map, fixing the intercepted feature map to be uniform in size, flattening the feature map to be a one-dimensional vector, and carrying out target classification and frame regression through a full-connection layer, so that accurate positioning detection is realized.
2. The method for marine organism detection by multi-scale attention-fused Faster RCNN according to claim 1, characterized in that: in step 1), the MSRCR image enhancement algorithm is developed from Retinex theory, which essentially decomposes an image into a reflected image and a luminance image, as shown in the following formula (1):
S(x,y)=R(x,y)·L(x,y) (1)
where S(x,y) is the input image, R(x,y) is the reflected image, and L(x,y) is the luminance image. Taking the logarithm of formula (1) yields the single-scale Retinex (SSR) output, formula (2):
R_ssr(x,y)=log S(x,y)−log(F(x,y)*S(x,y)) (2)
where R_ssr(x,y) is the processed output image and F(x,y) is a Gaussian function; convolving the Gaussian function with the input image gives the luminance image. The image obtained by MSR processing is shown in formula (3):
R_msr(x,y)=∑_{n=1}^{N} w_n [log S(x,y)−log(F_n(x,y)*S(x,y))] (3)
where N in the MSR is 3. The MSRCR adds a color recovery factor C on the basis of the MSR to correct the defect of color distortion, as shown in formulas (4) and (5):
R_msrcr(x,y)=C(x,y)·R_msr(x,y) (4)
C_i(x,y)=β·log[α·I_i(x,y)/∑_{j=1}^{3} I_j(x,y)] (5)
where β is the gain constant, set to 1; α controls the nonlinear strength, set to 128; and I_i(x,y) is the image of the i-th channel. After the MSRCR algorithm processes the image, pixels typically take negative values, so the image is corrected with a gain G and an Offset, set to 5 and 25 respectively, as shown in formula (6):
R_msrcr(x,y)'=G·R_msrcr(x,y)+Offset (6).
3. The method for marine organism detection by multi-scale attention-fused Faster RCNN according to claim 1, characterized in that: in step 2), the step of extracting the features of the enhanced image with the MSAFPN network to obtain the multi-scale feature map comprises the following steps:
first, four feature maps { C2, C3, C4, C5} are output by ResNet 50;
then, FPN operates on the three feature maps { C2, C3, C4}, obtaining the fused feature maps { P2, P3, P4} through top-down upsampling and lateral connections;
finally, the channel attention and the space attention are sequentially operated on the feature graphs { P2, P3 and P4} through an attention module, so that the multi-scale feature graphs { N2, N3 and N4} with rich feature information and position information are obtained.
4. The method for marine organism detection by multi-scale attention-fused Faster RCNN according to claim 1, characterized in that: in step 3), the step of inputting the multi-scale feature map into the RPN network to obtain candidate frames includes:
setting anchor frames of multiple sizes, removing anchor frames that cross the boundary, and adjusting the remaining anchor frames into candidate frames using the frame regression parameters obtained by the convolution operation; since a large amount of overlap exists among the candidate frames, they are filtered by foreground/background probability scores and non-maximum suppression, and the remaining candidate frames are the final candidate frames.
5. The method for marine organism detection by multi-scale attention-fused Faster RCNN according to claim 1, characterized in that: in step 4), the intercepted feature maps are fixed to a uniform size by RoiAlign, then flattened into one-dimensional vectors, and target classification and frame regression are performed through the full-connection layer, realizing accurate positioning detection.
6. A marine organism detection structure of a multi-scale attention-fused Faster RCNN, characterized in that the detection structure comprises at least:
the MSRCR image enhancement module is used for enhancing the input image;
the MSAFPN network is used for extracting the characteristics of the image after the enhancement treatment and obtaining a multi-scale characteristic image;
the RPN is used for processing the multi-scale feature map and obtaining candidate frames;
and Fast RCNN is used for mapping the candidate frames to the multi-scale feature images, intercepting the multi-scale feature images, fixing the intercepted feature images to be uniform in size, flattening the feature images to be one-dimensional vectors, and carrying out target classification and frame regression through a full-connection layer so as to realize accurate positioning detection.
7. The multi-scale attention-fused Faster RCNN marine organism detection structure according to claim 6, wherein the MSAFPN network at least comprises:
ResNet50 for outputting four feature maps { C2, C3, C4, C5};
FPN, for operating on the three feature maps { C2, C3, C4} and obtaining the fused feature maps { P2, P3, P4} through top-down upsampling and lateral connections;
and the attention module is used for sequentially carrying out channel attention and space attention operation on the feature graphs { P2, P3 and P4} to obtain the multi-scale feature graphs { N2, N3 and N4} with rich feature information and position information.
8. The multi-scale attention-fused Faster RCNN marine organism detection structure according to claim 7, wherein the ResNet50 includes at least a convolutional layer, a pooling layer, and a residual block.
CN202310236356.4A 2023-03-13 2023-03-13 Marine organism detection method and structure of multiscale attention-fused Faster RCNN Pending CN116524174A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310236356.4A CN116524174A (en) 2023-03-13 2023-03-13 Marine organism detection method and structure of multiscale attention-fused Faster RCNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310236356.4A CN116524174A (en) 2023-03-13 2023-03-13 Marine organism detection method and structure of multiscale attention-fused Faster RCNN

Publications (1)

Publication Number Publication Date
CN116524174A true CN116524174A (en) 2023-08-01

Family

ID=87400086

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310236356.4A Pending CN116524174A (en) 2023-03-13 2023-03-13 Marine organism detection method and structure of multiscale attention-fused Faster RCNN

Country Status (1)

Country Link
CN (1) CN116524174A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117911679A (en) * 2024-03-15 2024-04-19 青岛国实科技集团有限公司 Hull identification system and method based on image enhancement and tiny target identification
CN117911679B (en) * 2024-03-15 2024-05-31 青岛国实科技集团有限公司 Hull identification system and method based on image enhancement and tiny target identification

Similar Documents

Publication Publication Date Title
Li et al. Underwater image enhancement via medium transmission-guided multi-color space embedding
CN108229490B (en) Key point detection method, neural network training method, device and electronic equipment
CN111523521B (en) Remote sensing image classification method for double-branch fusion multi-scale attention neural network
US9405960B2 (en) Face hallucination using convolutional neural networks
CN111640157B (en) Checkerboard corner detection method based on neural network and application thereof
CN112862792B (en) Wheat powdery mildew spore segmentation method for small sample image dataset
CN111242026B (en) Remote sensing image target detection method based on spatial hierarchy perception module and metric learning
CN108921817B (en) Data enhancement method for skin disease image
CN115797225A (en) Unmanned ship acquisition image enhancement method for underwater topography measurement
CN111369605A (en) Infrared and visible light image registration method and system based on edge features
CN117253154A (en) Container weak and small serial number target detection and identification method based on deep learning
CN116524174A (en) Marine organism detection method and structure of multiscale attention-fused Faster RCNN
CN111310609A (en) Video target detection method based on time sequence information and local feature similarity
CN110969182A (en) Convolutional neural network construction method and system based on farmland image
CN116681636A (en) Light infrared and visible light image fusion method based on convolutional neural network
CN117392627A (en) Corn row line extraction and plant missing position detection method
Huang et al. Underwater image enhancement based on color restoration and dual image wavelet fusion
CN111832508B (en) DIE _ GA-based low-illumination target detection method
CN113516071A (en) Weight estimation method for pigs
CN114708423A (en) Underwater target detection method based on improved Faster RCNN
CN114219757A (en) Vehicle intelligent loss assessment method based on improved Mask R-CNN
Chen et al. GADO-Net: an improved AOD-Net single image dehazing algorithm
CN112529095B (en) Single-stage target detection method based on convolution region re-registration
CN112419227B (en) Underwater target detection method and system based on small target search scaling technology
CN112381106B (en) Target detection method based on global area prior attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination