CN117372829A - Marine vessel target identification method, device, electronic equipment and readable medium - Google Patents

Marine vessel target identification method, device, electronic equipment and readable medium

Info

Publication number
CN117372829A
Authority
CN
China
Prior art keywords
feature
target
navigation image
ship
marine vessel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311387377.2A
Other languages
Chinese (zh)
Inventor
张婷
黄滔
徐驰骋
赵思恒
陆晓玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
711th Research Institute of CSIC
Original Assignee
711th Research Institute of CSIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 711th Research Institute of CSIC filed Critical 711th Research Institute of CSIC
Priority to CN202311387377.2A priority Critical patent/CN117372829A/en
Publication of CN117372829A publication Critical patent/CN117372829A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a marine vessel target identification method, device, electronic equipment and readable medium. The marine vessel target identification method comprises: acquiring a ship navigation image to be identified; extracting features of the ship navigation image to obtain feature maps on different layers; for the feature map on each layer, extracting the feature representation of the feature map in a multi-scale manner by applying atrous (dilated) convolution and pooling operations at a set dilation rate; fusing the feature representations of different layers by up-sampling and down-sampling operations to obtain fused features; and, based on the fused features, predicting bounding boxes and categories of targets using a convolution layer and a fully connected layer. The method and device can enhance the feature extraction capability on the image, learn richer and more representative features, and effectively improve feature expression under different scales and in complex scenes. By means of the multi-scale feature representations of the feature maps, information from different receptive fields can be captured, and the detection capability and robustness for multi-scale targets are improved.

Description

Marine vessel target identification method, device, electronic equipment and readable medium
Technical Field
The application relates to the technical field of image processing, in particular to a marine ship target identification method and device and electronic equipment.
Background
With the rapid development of intelligent ships, using computer vision to assist scene perception during ship navigation has become the choice of many vessels. Machine vision methods that include target recognition algorithms can provide crews with intelligent auxiliary judgment information, effectively improving safety during navigation and providing important technical support for future unmanned ship development. However, the actual offshore environment is complex and changeable: a visible light camera can only cover the recognition task in the daytime, and an infrared camera must be introduced to assist at night, but the resolution of infrared cameras is generally not high, which increases the difficulty of algorithmic recognition. In addition, sea fog causes adverse conditions such as low visibility during navigation, which seriously degrades image quality; moreover, marine vessel targets differ greatly in scale, and distant vessel targets are very small, which easily leads to problems such as missed recognition.
Therefore, how to apply a marine vessel target recognition method in actual scenes so as to effectively accomplish the ship target recognition task is a hot spot of current research.
Disclosure of Invention
The technical purpose of the present application is to provide a method, an apparatus, an electronic device, and a readable medium for identifying a marine vessel target, which can improve the accuracy of marine vessel target identification.
To achieve the above technical purpose, the present application adopts the following technical solutions.
In a first aspect, the present application provides a method for identifying a marine vessel target, comprising:
acquiring a ship navigation image to be identified;
extracting features of the ship navigation image to obtain feature maps on different layers;
for the feature map on each layer, extracting a feature representation of the feature map in a multi-scale manner by applying atrous convolution and pooling operations at a set dilation rate;
fusing the feature representations of different layers by up-sampling and down-sampling operations to obtain fused features;
based on the fused features, predicting bounding boxes and categories of targets using a convolution layer and a fully connected layer.
Further, the method further comprises:
performing defogging processing based on the ship navigation image.
Still further, before the defogging processing is performed based on the ship navigation image, the method comprises:
enhancing the brightness variation amplitude of the edge parts in the ship navigation image to obtain an enhanced ship navigation image.
Further, predicting the bounding boxes and categories of targets using a convolution layer and a fully connected layer based on the fused features comprises:
spatially dividing the fused features according to a preset spatial division size to obtain a plurality of sub-feature maps; rearranging the pixels in each sub-feature map according to the channel order to form new feature maps; and concatenating the feature maps of all the sub-feature maps in a set order to obtain a finally optimized fused feature;
and predicting the bounding boxes and categories of the targets using a convolution layer and a fully connected layer based on the finally optimized fused feature.
Further, the feature extraction of the ship navigation image to obtain feature maps on different layers, the multi-scale extraction of the feature representation of the feature map on each layer by applying atrous convolution and pooling operations at a set dilation rate, the fusion of the feature representations of different layers by up-sampling and down-sampling operations to obtain fused features, and the prediction of the bounding boxes and categories of targets using a convolution layer and a fully connected layer based on the fused features are all performed through a target recognition model;
the target recognition model is obtained by training the following method:
respectively acquiring a visible light ship navigation image training set and an infrared ship navigation image training set, wherein the visible light ship navigation image training set and the infrared ship navigation image training set are marked with the types of targets;
Training the target recognition model based on the visible light ship navigation image training set until the loss function of the target recognition model converges to obtain a visible light ship target recognition model, so as to call the visible light ship target recognition model to recognize the marine ship target under the daytime condition;
and training the target recognition model based on the infrared ship navigation image training set until the loss function of the target recognition model is converged to obtain an infrared ship target recognition model so as to call the infrared ship target recognition model to recognize the marine ship target under the condition of the night.
Still further, the method further comprises judging whether the current condition is daytime or night, thereby determining whether to select the visible light ship target recognition model or the infrared ship target recognition model for marine vessel target recognition;
judging whether the current condition is daytime or night comprises:
taking out the r channel, g channel and b channel of the ship navigation image to be identified separately, and calculating the average value of each channel respectively;
adding the average values of the three channels and dividing by 3 to obtain the average rgb value of the ship navigation image;
comparing the average rgb value with a preset value to determine whether it is daytime or night.
In a second aspect, embodiments of the present application provide a marine vessel target identification apparatus, the apparatus comprising:
the image acquisition module is used for acquiring a ship navigation image to be identified;
the target recognition model is used for extracting features of the ship navigation image to obtain feature maps on different layers; extracting, for the feature map on each layer, a feature representation of the feature map in a multi-scale manner by applying atrous convolution and pooling operations at a set dilation rate; fusing the feature representations of different layers by up-sampling and down-sampling operations to obtain fused features; and predicting, based on the fused features, bounding boxes and categories of targets using a convolution layer and a fully connected layer.
Further, the target recognition model comprises a feature map extraction module, an atrous spatial pyramid pooling module, a feature fusion module, a target detection module and a loss function which are connected in sequence;
the feature map extraction module is used for extracting features of the ship navigation image to obtain feature maps on different layers;
the atrous spatial pyramid pooling module is used for extracting, for the feature map on each layer, the feature representation of the feature map in a multi-scale manner by applying atrous convolution and pooling operations at a set dilation rate;
The feature fusion module is used for fusing the feature representations of different layers by utilizing up-sampling and down-sampling operations to obtain fusion features;
the target detection module is used for predicting a target bounding box and a target category by using a convolution layer and a fully connected layer based on the fused features;
the loss function is used for calculating the differences between the predicted target bounding box and target category and the real labels, and for optimizing the parameters of the target recognition model.
Still further, the atrous spatial pyramid pooling module comprises 1 convolution layer, 3 pooling pyramids and 1 ASPP pooling layer, wherein the convolution layer is connected with the 3 pooling pyramids respectively, and the 3 pooling pyramids are connected with the ASPP pooling layer respectively.
Further, the device also comprises a feature optimization module, wherein the feature optimization module is provided with an input end and an output end, the input end is connected with the feature fusion module, and the output end is connected with the target detection module;
the feature optimization module is used for spatially dividing the fused features according to a preset spatial division size to obtain a plurality of sub-feature maps; rearranging the pixels in each sub-feature map according to the channel order to form new feature maps; and concatenating the feature maps of all the sub-feature maps in a set order to obtain the finally optimized fused feature.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory;
the memory is used for storing computer operation instructions;
the processor is configured to execute the marine vessel target identification method according to any one of the first aspect by calling the computer operation instruction.
In a fourth aspect, embodiments of the present application provide a computer readable medium having stored thereon a computer program for execution by a processor to implement a method of marine vessel target identification according to any of the first aspects.
Compared with the prior art, the marine vessel target identification method provided by the embodiments of the present application extracts feature maps on multiple layers from the ship navigation image, and by applying atrous convolution and pooling operations at a set dilation rate to the feature map on each layer, the feature extraction capability on the image can be enhanced, richer and more representative features can be learned, and feature expression under different scales and in complex scenes can be effectively improved. Through the semantic features of the multi-scale feature maps, information from different receptive fields can be captured, the detection capability for multi-scale targets is improved, and the performance and robustness of target detection can be improved.
The marine vessel target recognition apparatus provided by the embodiments of the present application can alleviate the problem of low picture resolution and improve the accuracy of small target recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a method for identifying a target of a marine vessel according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a marine vessel target identification method according to a second embodiment of the present disclosure;
FIG. 3 is an effect diagram before a defogging step in a marine vessel target recognition method provided in an embodiment of the present application;
FIG. 4 is an effect diagram of a defogging step in a marine vessel target recognition method provided in an embodiment of the present application;
FIG. 5 is a schematic view of a marine vessel object recognition device according to a third embodiment of the present application;
FIG. 6 is a schematic view of a marine vessel object recognition device according to a fourth embodiment of the present application;
Fig. 7 is a schematic diagram of a visible light recognition effect according to an embodiment of the present application;
FIG. 8 is a schematic diagram of an infrared recognition effect according to an embodiment of the present application;
FIG. 9 is a schematic view of a marine vessel object recognition device according to an embodiment of the present application;
FIG. 10 is a schematic diagram of an electronic device according to an embodiment of the present application;
reference numerals:
the marine vessel target recognition device comprises a 1-marine vessel target recognition device, a 10-image acquisition module, a 20-target recognition model, a 21-feature map extraction module, a 22-cavity space convolution pooling pyramid, a 23-feature fusion module, a 24-target detection module, a 25-loss function, a 26-feature optimization module, a 30-image defogging module, a 40-image enhancement module, 2-electronic equipment, a 210-processor, a 220-memory, a 230-bus and a 240-communication module.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
There are vessels of various types and sizes at sea, with different appearances, sizes, speeds and behaviour patterns, which makes vessel identification complex and difficult. In addition, the distance to a vessel observed during marine navigation may be large, which may lead to blurred or indistinct vessel details. Remote observation may also be affected by factors such as weather, sea conditions and atmospheric conditions, further increasing the challenges of vessel identification.
Therefore, a marine vessel target identification method and device with higher identification precision are required to be provided.
According to the marine vessel target identification method provided by the embodiments of the present application, feature maps on multiple layers are extracted from the ship navigation image, and by applying atrous convolution and pooling operations at a set dilation rate to the feature map on each layer, the feature extraction capability on the image can be enhanced, richer and more representative features can be learned, and feature expression under different scales and in complex scenes can be effectively improved. Through the semantic features of the multi-scale feature maps, information from different receptive fields can be captured, the detection capability for multi-scale targets is improved, and the performance and robustness of target detection can be improved. In addition, in the embodiments of the present application, the fused features are spatially divided according to a preset spatial division size to obtain a plurality of sub-feature maps, and new feature maps are formed based on the sub-feature maps, so that the complexity of the whole method can be reduced, and the memory occupation and computational cost are reduced.
The marine vessel target recognition apparatus provided by the embodiments of the present application can alleviate the problem of low picture resolution and improve the accuracy of small target recognition.
The present application is further described below with reference to the drawings and specific examples.
Example 1
As shown in fig. 1, the present embodiment provides a marine vessel target recognition method, which is applicable to an electronic device, and includes:
step 101: and acquiring a ship navigation image to be identified.
It will be appreciated that in particular embodiments, a dual-spectrum (visible/infrared) camera may be set up to complete the data acquisition. Optionally, the camera is a Hikvision DS-2TD2667-25 dual-spectrum camera, in which the visible light resolution is 2688×1520 and the infrared resolution is 640×512. After the distance between the two camera lenses is determined, the camera is installed and fixed on a special aluminum profile bracket. The bracket is fixed to the railing in front of the wheelhouse of a tugboat, and video data acquisition during the navigation process is completed as the tugboat goes out to sea for operations.
Step 102: extracting features of the ship navigation image to obtain feature maps on different layers.
In particular embodiments, feature extraction of images may be achieved by convolutional neural networks (Convolutional Neural Network, CNN), where convolutional operations are responsible for extracting features, such as multi-layer convolutional neural networks or multi-scale convolutional neural networks, etc.
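By way of illustration only, the following Python (PyTorch) sketch shows how a small convolutional backbone can return feature maps on several layers for one navigation image; the stage layout, activation function and channel numbers are assumptions for demonstration and do not reproduce the backbone actually used in this application.

```python
import torch
import torch.nn as nn

class TinyBackbone(nn.Module):
    """Illustrative multi-level feature extractor (not the patented backbone)."""
    def __init__(self):
        super().__init__()
        # Each stage halves the spatial size and increases the channel count.
        self.stage1 = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.SiLU())
        self.stage2 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.SiLU())
        self.stage3 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.SiLU())

    def forward(self, x):
        c1 = self.stage1(x)   # high-resolution, low-level features
        c2 = self.stage2(c1)  # intermediate level
        c3 = self.stage3(c2)  # low-resolution, high-level semantic features
        return [c1, c2, c3]   # feature maps on different layers

image = torch.randn(1, 3, 640, 640)   # a ship navigation image tensor (assumed size)
features = TinyBackbone()(image)      # three feature maps at different scales
```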
Step 103: for the feature map on each layer, extracting the feature representation of the feature map in a multi-scale manner by applying atrous convolution and pooling operations at a set dilation rate.
Step 104: fusing the feature representations of different layers by using up-sampling and down-sampling operations to obtain fused features. Down-sampling can be used to extract higher-level semantic features, and up-sampling can restore the spatial resolution of a feature map for fusion with features of other levels. In a specific embodiment, each up-sampled feature map is laterally connected with the corresponding down-sampled feature map, and the laterally connected feature maps are fused by an element-wise summation operation. In this way, a fused representation of the feature maps at different scales can be obtained, providing a multi-level, multi-scale feature representation.
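By way of illustration only, the following Python (PyTorch) sketch shows such an FPN-style fusion, in which lateral 1×1 convolutions align the channels and up-sampled deeper maps are added element-wise to the corresponding shallower level; the channel numbers are assumptions for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFusion(nn.Module):
    """Illustrative fusion: lateral 1x1 convs align channels, then up-sampled
    maps are fused by element-wise summation with the corresponding level."""
    def __init__(self, channels=(64, 128, 256), out_channels=128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in channels)

    def forward(self, feats):
        c1, c2, c3 = [l(f) for l, f in zip(self.lateral, feats)]
        # top-down path: up-sample deeper maps and add them element-wise
        p3 = c3
        p2 = c2 + F.interpolate(p3, size=c2.shape[-2:], mode="nearest")
        p1 = c1 + F.interpolate(p2, size=c1.shape[-2:], mode="nearest")
        return [p1, p2, p3]   # multi-level, multi-scale fused representation
```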
Step 105: based on the fused features, bounding boxes and categories of targets are predicted using the convolution layer and the fully connected layer.
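By way of illustration only, the following Python (PyTorch) sketch follows the convolution-plus-fully-connected structure recited in this step in its simplest possible form, predicting a single bounding box and class scores per image; the layer sizes and class count are assumptions for demonstration and this is not the actual detection head of the application.

```python
import torch
import torch.nn as nn

class SimpleDetectionHead(nn.Module):
    """Illustrative head: a convolution layer followed by fully connected layers
    that output bounding-box coordinates (x, y, w, h) and class scores."""
    def __init__(self, in_channels=128, num_classes=5):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv2d(in_channels, 64, 3, padding=1), nn.SiLU(),
                                  nn.AdaptiveAvgPool2d(1))
        self.box_fc = nn.Linear(64, 4)            # bounding-box regression branch
        self.cls_fc = nn.Linear(64, num_classes)  # category prediction branch

    def forward(self, fused):
        x = self.conv(fused).flatten(1)
        return self.box_fc(x), self.cls_fc(x)

boxes, classes = SimpleDetectionHead()(torch.randn(1, 128, 20, 20))
```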
In a specific example, the atrous spatial pyramid pooling module 22 can be applied to the feature map on each layer, and the feature representation of the feature map can be extracted in a multi-scale manner by applying atrous convolution and pooling operations at different dilation rates. The atrous spatial pyramid pooling module 22 is ASPP, i.e. Atrous Spatial Pyramid Pooling, and can also be called a dilated spatial pyramid pooling module.
According to the embodiments of the present application, by applying atrous convolution and pooling operations at different dilation rates to the feature maps, the feature extraction capability on the image can be enhanced, richer and more representative features can be learned, and feature expression under different scales and in complex scenes can be effectively improved. Through the semantic features of the multi-scale feature maps, information from different receptive fields can be captured, the detection capability for multi-scale targets is improved, the performance and robustness of target detection can be improved, the problem of low picture resolution is alleviated, and the accuracy of small target recognition is improved.
Example two
The method for identifying a target of a marine vessel according to this embodiment, as shown in fig. 2, includes:
step 201: and acquiring a ship navigation image to be identified.
Step 202: and extracting features of the ship navigation image to obtain feature images on different layers.
Step 203: for the feature map on each layer, the feature representation of the feature map is extracted in a multi-scale manner by applying a hole rolling and pooling operation at a set hole rate.
Step 204: and fusing the feature representations of different layers by utilizing upsampling and downsampling operations to obtain fused features.
Step 205: carrying out space division on the fusion features according to a preset space division size to obtain a plurality of sub-feature images; rearranging pixels in each sub-feature map according to the channel sequence to form a new feature map; and connecting the feature graphs of all the sub-feature graphs together according to a set sequence to obtain the final optimized fusion feature.
Step 206: based on the final optimized fusion characteristics, a boundary box and a category of the target are predicted by using a convolution layer and a full connection layer.
According to this embodiment, the fused features are spatially divided according to the preset spatial division size to obtain a plurality of sub-feature maps, and new feature maps are formed based on the sub-feature maps, so that the complexity of the whole method can be reduced, and the memory occupation and computational cost are reduced.
In some embodiments, steps 102-105 or steps 201-206 may be implemented using the object recognition model 20. The object recognition model 20 can be seen in particular as described in the following examples.
In addition to acquiring the ship navigation image to be identified with the camera, steps 102 to 105 or steps 201 to 206 in this example may be implemented using the NVIDIA Jetson AGX Orin 32GB developer kit. As an edge computing module, the Jetson AGX Orin 32GB provides an AI computing capability of up to 275 TOPS, which can effectively meet the model computation and application development requirements of this example.
On the basis of the first or second embodiment, the marine vessel target identification method provided in other embodiments further includes a defogging step of defogging based on the vessel navigation image.
In a specific embodiment, a dark channel algorithm flow may be adopted: first, the dark channel of the original hazy image is calculated; the atmospheric light component A is calculated from the pixel values of the dark channel sorted in ascending order; then the transmittance t of each pixel is calculated; and the haze-free image is recovered from the atmospheric light component A and the transmittance t. The parameters of the dark channel algorithm are tested under sea fog conditions, the main test parameters being the minimum-value filter radius r of the dark channel and the guided filter radius R.
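By way of illustration only, the following Python sketch (using OpenCV and NumPy) follows the dark channel flow described above; the guided-filter refinement is omitted, and the values of omega and t0 as well as the fraction of pixels used to estimate A are assumptions for demonstration rather than the tuned parameters of this application.

```python
import cv2
import numpy as np

def dehaze_dark_channel(img, r=4, omega=0.95, t0=0.1):
    """Illustrative dark-channel-prior defogging (parameter values are examples)."""
    img = img.astype(np.float32) / 255.0
    # 1. dark channel: per-pixel minimum over channels, then a minimum filter of radius r
    kernel = np.ones((2 * r + 1, 2 * r + 1), np.uint8)
    dark = cv2.erode(img.min(axis=2), kernel)
    # 2. atmospheric light A: mean colour of the pixels with the largest dark-channel values
    flat = dark.reshape(-1)
    idx = np.argsort(flat)[-max(1, flat.size // 1000):]   # brightest 0.1% of the dark channel
    A = img.reshape(-1, 3)[idx].mean(axis=0)
    # 3. transmittance t of each pixel
    t = 1.0 - omega * cv2.erode((img / A).min(axis=2), kernel)
    t = np.clip(t, t0, 1.0)
    # 4. recover the haze-free image J = (I - A) / t + A
    J = (img - A) / t[..., None] + A
    return np.clip(J * 255.0, 0, 255).astype(np.uint8)
```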
In some examples, in order to save computing resources and improve computing speed, the defogging step is set as optional: it is not enabled under weather conditions with good visibility and is enabled under heavy fog. Therefore, a judging module is provided to decide whether to execute the defogging process.
In some examples, before the defogging processing based on the ship navigation image, the method comprises: enhancing, for the ship navigation image, the brightness variation amplitude of the edge parts in the image to obtain an enhanced ship navigation image.
In a specific example, the preprocessing is performed using a USM sharpening enhancement algorithm. The USM sharpening enhancement algorithm low-pass filters the original image to generate a blurred image, and the blurred image is subtracted from the original image to obtain an image in which the high-frequency components are retained. The high-frequency image is amplified by a parameter and then superimposed on the original image to obtain the enhanced image.
USM is a sharpening algorithm that improves the visual effect mainly by enhancing the brightness variation amplitude of the edge parts in an image. Assuming that the original image is I, the Gaussian-blurred image is G and the sharpened image is D, the sharpening formula is:

D = (I − ω·G) / (1 − ω)

where ω represents a weight with a value in the range of 0.1 to 0.9, typically ω = 0.6.
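By way of illustration only, the following Python sketch applies the above sharpening formula with OpenCV; the Gaussian kernel size and sigma are assumptions for demonstration.

```python
import cv2
import numpy as np

def usm_sharpen(img, omega=0.6, ksize=(5, 5), sigma=3.0):
    """Illustrative USM sharpening: blend the original image against its
    Gaussian-blurred version, enhancing brightness changes at edges."""
    I = img.astype(np.float32)
    G = cv2.GaussianBlur(I, ksize, sigma)      # low-pass (blurred) image
    D = (I - omega * G) / (1.0 - omega)        # sharpened image per the formula above
    return np.clip(D, 0, 255).astype(np.uint8)
```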
After the dark channel image is obtained, guided filtering is used for optimization, and the defogging effect is adjusted by adjusting the minimum-value filter radius of the dark channel. The optimal parameters obtained by testing in this example are a dark channel minimum-value filter radius r = 4 and a guided filter mean-filter radius R = 32. The effects before and after defogging are shown in fig. 3 and fig. 4.
Example III
The present embodiment provides a marine vessel target identification apparatus 1, as shown in fig. 5, the apparatus including:
the image acquisition module 10 is used for acquiring a ship navigation image to be identified;
the target recognition model 20 is used for extracting features of the ship navigation image to obtain feature maps on different layers; extracting, for each feature map, the feature representation of the feature map in a multi-scale manner by applying atrous convolution and pooling operations at different dilation rates; fusing the feature representations of different levels by up-sampling and down-sampling operations to obtain fused features; and predicting, based on the fused features, bounding boxes and categories of targets using the convolution layer and the fully connected layer.
The marine vessel target identification apparatus 1 provided by the embodiments of the present application adopts the target recognition model 20, so that the problems of low resolution and small-target recognition can be alleviated.
In some embodiments, as shown in fig. 5, the object recognition model 20 includes a feature map extraction module 21, a hole space convolution pooling pyramid 22, a feature fusion module 23, an object detection module 24, and a loss function 25, which are connected in sequence.
The feature map extraction module 21 is configured to extract features of the ship navigation image to obtain feature maps on different levels. The atrous spatial pyramid pooling module 22 is configured to extract, for the feature map on each level, a feature representation of the feature map in a multi-scale manner by applying atrous convolution and pooling operations at a set dilation rate. The feature fusion module 23 is configured to fuse the feature representations of different levels using up-sampling and down-sampling operations to obtain fused features. The target detection module 24 is configured to predict the target bounding boxes and target classes using the convolution layer and the fully connected layer based on the fused features. The loss function 25 is used to calculate the differences between the predicted target bounding boxes and target classes and the real labels, and to optimize the parameters of the target recognition model 20.
In some embodiments, the target recognition model 20 is an improved YOLOv7 network structure, and the feature map extraction module 21 of the prior-art YOLOv7 network structure is used to perform feature extraction on the ship navigation image, obtaining feature maps on different levels. YOLOv7 uses Darknet-53 as its backbone network; Darknet-53 is a 53-layer convolutional neural network used to extract features from the input image.
The feature fusion module 23 is implemented using the Neck network of the prior-art YOLOv7 network structure. YOLOv7 introduces a Neck network called PANet (Path Aggregation Network) for fusing feature maps of different scales. PANet fuses the feature maps of different levels through up-sampling and down-sampling operations to detect targets at different scales.
The target detection module 24 is implemented using the Head network of the prior-art YOLOv7 network structure, which is responsible for predicting the bounding boxes and categories of targets. It comprises a series of convolution layers and fully connected layers, and finally outputs the predicted bounding box coordinates and class probabilities.
YOLOv7 trains the network using a loss function 25 called the YOLOv3 loss. The loss function 25 includes the target box regression loss, the target class loss and the target confidence loss, and is used to optimize the network parameters.
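By way of illustration only, the following Python (PyTorch) sketch shows how such a composite loss with box-regression, confidence and class terms can be assembled; the weights and the use of a smooth L1 term for the boxes are assumptions for demonstration and this is not the exact YOLOv7 loss implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompositeDetectionLoss(nn.Module):
    """Illustrative composite loss: box regression + confidence + classification."""
    def __init__(self, w_box=0.05, w_obj=1.0, w_cls=0.5):
        super().__init__()
        self.w_box, self.w_obj, self.w_cls = w_box, w_obj, w_cls

    def forward(self, pred_box, true_box, pred_obj, true_obj, pred_cls, true_cls):
        box_loss = F.smooth_l1_loss(pred_box, true_box)                     # box regression term
        obj_loss = F.binary_cross_entropy_with_logits(pred_obj, true_obj)   # target confidence term
        cls_loss = F.binary_cross_entropy_with_logits(pred_cls, true_cls)   # target class term
        return self.w_box * box_loss + self.w_obj * obj_loss + self.w_cls * cls_loss
```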
In this embodiment, an atrous spatial pyramid pooling module 22 (ASPP module) is added to the backbone network in the improved YOLOv7 network structure. The atrous spatial pyramid pooling module contains one convolution layer (1×1), 3 pooling pyramids and 1 ASPP pooling layer. Atrous convolution differs from ordinary convolution (Conv) in its dilation rate; ASPP performs four atrous convolutions on the input feature map to obtain feature maps at four scales, thereby obtaining multi-scale information of the input image. The four atrous convolutions are respectively: (1×1), (3×3, stride 1, padding = 6), (3×3, stride 1, padding = 12) and (3×3, stride 1, padding = 18), so the ASPP module can effectively expand the receptive field of the convolutional neural network, where padding is the filling parameter used to control the padding of the convolution operation.
An atrous spatial pyramid pooling module 22 is added after the ELAN module of the backbone network. The feature map input to the atrous spatial pyramid pooling module 22 is 20×20×1024; after the module's own feature extraction and combination, all features are cascaded through a 1×1 convolution layer and output to the SPPCSPC module of the head network. Compared with the feature map before the modification, the difference is that the multi-scale information is obviously enhanced after the feature map passes through the atrous convolutions of the atrous spatial pyramid pooling module 22.
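By way of illustration only, the following Python (PyTorch) sketch shows an ASPP block of the kind described above, with a 1×1 branch, three 3×3 atrous branches (padding 6, 12 and 18, with the dilation assumed equal to the padding), an image-level pooling branch, and a 1×1 projection that cascades all features; the per-branch channel split is an assumption for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """Illustrative atrous spatial pyramid pooling: 1x1 conv, three 3x3 atrous
    convs (dilation 6/12/18), an image-level pooling branch, and a 1x1 projection."""
    def __init__(self, in_ch=1024, out_ch=1024):
        super().__init__()
        mid = out_ch // 4
        self.branch1 = nn.Conv2d(in_ch, mid, 1)
        self.branch2 = nn.Conv2d(in_ch, mid, 3, padding=6, dilation=6)
        self.branch3 = nn.Conv2d(in_ch, mid, 3, padding=12, dilation=12)
        self.branch4 = nn.Conv2d(in_ch, mid, 3, padding=18, dilation=18)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, mid, 1))
        self.project = nn.Conv2d(mid * 5, out_ch, 1)   # cascade all features via a 1x1 conv

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = F.interpolate(self.pool(x), size=(h, w), mode="nearest")
        feats = [self.branch1(x), self.branch2(x), self.branch3(x), self.branch4(x), pooled]
        return self.project(torch.cat(feats, dim=1))

y = ASPP()(torch.randn(1, 1024, 20, 20))   # e.g. a 20x20x1024 feature map after the ELAN module
```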
Example IV
As shown in fig. 6, in this embodiment, the marine vessel target identification apparatus 1 further includes a feature optimization module 26, whose input end is connected to the feature fusion module 23 and whose output end is connected to the target detection module 24. The feature optimization module 26 is configured to spatially divide the fused features according to a preset spatial division size to obtain a plurality of sub-feature maps; rearrange the pixels in each sub-feature map according to the channel order to form new feature maps; and concatenate the feature maps of all the sub-feature maps in a set order to obtain the finally optimized fused feature.
In this embodiment, the feature optimization module 26 is implemented using an SPD-Conv module. The SPD-Conv module consists of a space-to-depth (SPD) slicing step followed by a non-strided convolution. For a feature map of size S×S×C1, the SPD part down-samples by a scale factor of 2: slicing yields 4 sub-feature maps of size (S/2)×(S/2)×C1, which are concatenated along the channel dimension to obtain a feature map of size (S/2)×(S/2)×4C1; a non-strided convolution layer (i.e. stride = 1) is then used so that more feature information is retained. The SPD-Conv module is placed in front of the detection head of the head network, which can reduce excessive down-sampling to a certain extent and well retain the detailed information of low-resolution images and small objects.
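By way of illustration only, the following Python (PyTorch) sketch implements the space-to-depth slicing and non-strided convolution described above; the kernel size and channel numbers are assumptions for demonstration.

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Illustrative space-to-depth followed by a non-strided convolution:
    an SxSxC1 map is sliced into 4 sub-maps of (S/2)x(S/2)xC1, concatenated
    along the channel dimension to (S/2)x(S/2)x4C1, then convolved with stride 1."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(4 * in_ch, out_ch, 3, stride=1, padding=1)

    def forward(self, x):
        # slice by a scale factor of 2 and stack the four sub-feature maps on the channel axis
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(x)

out = SPDConv(256, 256)(torch.randn(1, 256, 40, 40))   # -> (1, 256, 20, 20)
```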
The SPD-Conv module is added before the RepConv module of the head network; the addition position is after the cascade of the last ELAN-W module, so that the detection capability for low resolution and small targets is enhanced before the YOLO detection head.
In the method, the YOLOv7 network structure is improved: an ASPP module (Atrous Spatial Pyramid Pooling, atrous spatial pyramid pooling module 22) is added to the backbone network, where the ASPP module can expand the receptive field without losing resolution and optimizes the feature extraction function of the original backbone network. Considering that the recognition effect of the infrared camera needs to be further improved, the method further adds an SPD-Conv module (Space To Depth Convolution) to the head network, further enhancing the recognition of low-resolution images and small targets. SPD-Conv consists of a space-to-depth (SPD) layer and a non-strided convolution (Conv) layer. The role of the space-to-depth (SPD) layer is to fold the spatial dimensions of the input feature map into the channel dimension while preserving the information within the channels. This can be achieved by mapping each pixel or feature of the input feature map to a channel: in this process, the size of the spatial dimensions decreases while the size of the channel dimension increases. The non-strided convolution (Conv) layer is a standard convolution operation performed after the SPD layer. Unlike strided convolution, the non-strided convolution does not skip positions on the feature map, but convolves every position of the feature map. This helps reduce the over-down-sampling problem that may occur in the SPD layer and retains more fine-grained information.
SPD-Conv combines the SPD layer and the Conv layer in series: the input feature map is first transformed by the SPD layer, and the output result is then convolved by the Conv layer. This combination reduces the size of the spatial dimensions without losing information while retaining the information within the channels, so the detection performance of a CNN on low-resolution images and small objects can be improved.
Based on the practical difficulties of marine vessel target recognition, the embodiments of the present application propose an improved YOLOv7 network structure to realize the target recognition model 20, innovatively combining the SPD-Conv module and the ASPP module: the advantages of down-sampling the feature maps are obtained while the discriminative feature information is retained, and ASPP helps capture image context information at multiple scales, further improving recognition accuracy. The improved network structure is applicable to both the visible light and infrared recognition models and, compared with the network structure before modification, can improve the accuracy of the visible light and infrared recognition models respectively, showing a certain universality.
In a specific example, implementing the above improved YOLOv7 network structure includes the following operational steps:
(1) First, the new module classes are built in common.py.
(2) The new modules (i.e. the atrous spatial pyramid pooling module 22 ASPP and the SPD-Conv module) are registered in yolo.py. The original network structure configuration file yolov7.yaml is modified and renamed yolov7-custom.yaml.
common.py is a file in the source code of YOLOv5; it contains the commonly used functions and classes of the network structure, and the YOLOv7 code is organized in the same way.
yolo.py is the file used to implement the main logic of the YOLO algorithm. This file typically contains the definition of the YOLO model, the calculation of the loss function 25, and the core forward propagation and backward propagation functions.
The yolo.py file typically covers the following functions:
Model definition: yolo.py defines the structure of the YOLO model, including modules such as the feature extraction backbone network, the feature pyramid and the prediction head (bounding-box regression, class prediction, etc.). These modules are typically defined as functions or classes and provide a forward propagation method.
Calculation of the loss function 25: yolo.py may contain the code that calculates the YOLO loss function 25. The loss function 25 is used to measure the gap between the model predictions and the real labels, and typically includes the bounding-box regression loss, the category loss, the target presence loss, and so on.
Forward propagation: yolo.py implements the forward propagation of the YOLO model, passing the input image through the model to obtain the predicted bounding boxes, class confidences, etc. This involves the preprocessing of the image, feature extraction, feature fusion and prediction.
Backward propagation: yolo.py may include the update algorithm of the model parameters, which improves the performance and accuracy of the model by calculating the gradient of the loss function 25 and updating the model parameters using gradient descent or other optimization methods.
(3) The ASPP and SPD-Conv layers are added in yolov7-custom.yaml, and finally the new network structure is used in the training script train.py.
To add the ASPP module to the backbone network, the common.py file is modified in the first step, adding the ASPP class to it; in the second step, the yolo.py file is modified and the ASPP module is added in the parse_model method; in the third step, yolov7.yaml is modified into yolov7-custom.yaml, ASPP is added, and it is inserted after the ELAN module of the backbone network.
To add the SPD-Conv module to the head network, the common.py file is modified in the first step, adding the SPD-Conv class to it; in the second step, the yolo.py file is modified and the SPD-Conv module is added in the parse_model method; in the third step, yolov7-custom.yaml is further modified, SPD-Conv is added, and it is inserted into the cascade after the ELAN-W module.
In practical applications, the target recognition model 20 needs to be trained in advance. The model yaml file specified in train.py is yolov7-custom.yaml, so the network structure constructed by the above method is used for training.
The training data set may be constructed based on real ship acquisition data and the public data set.
Optionally, a visible light image dataset for ship target recognition is constructed based on the visible light video data in combination with other public datasets. Based on the infrared video data, an infrared image dataset for ship target recognition is constructed in combination with other public datasets (such as the iRay public infrared dataset).
It will be appreciated that the visible light image dataset is used to train the target recognition model 20, and training is stopped when the model accuracy meets the requirements, resulting in a visible light ship target recognition model 20. The obtained visible light ship target recognition model 20 may perform ship target recognition based on the visible light image data.
The infrared image dataset is used to train the target recognition model 20; training is stopped when the model accuracy meets the requirements, and the obtained infrared ship target recognition model 20 can perform ship target recognition based on nighttime image data.
When manual labeling is used for data annotation, it should be noted that when screening the training dataset, the proportions of the various types of original data in the dataset need to be balanced, so as to avoid too many large targets or too many small targets in the training data and to ensure the balance of the training samples.
As an example, the object recognition model 20 may comprise a visible light ship object recognition model 20 and an infrared ship object recognition model 20, both of which have the same network structure as the object recognition model 20 described above, and the training data sets of both of which are different.
The visible light ship target recognition model 20 is trained using the visible light dataset for ship target recognition, and the visible light ship target recognition model 20 adopts the target recognition model 20 described in the above embodiments. The convergence results of the target recognition model 20 (improved YOLOv7 network structure) are shown in Table 1 below, and the average recognition accuracy is 0.982. Under the same conditions, if the prior-art YOLOv7 network structure is used for training, the average recognition accuracy is 0.976. It can be seen that the target recognition model 20 provided in the embodiments of the present application can effectively improve the visible light ship recognition accuracy.
Table 1 below compares the experimental results of the improved YOLOv7 network structure and the prior-art YOLOv7 network structure when trained with the visible light dataset.

Table 1
Similarly, the infrared ship target recognition model 20 is trained using the infrared dataset for ship target recognition, and the infrared ship target recognition model 20 adopts the target recognition model 20 described in the above embodiments. The model convergence results are shown in Table 2 below, and the average recognition accuracy is 0.920. Under the same conditions, if the prior-art YOLOv7 network structure is used for training, the average recognition accuracy is 0.911. It can be seen that the target recognition model 20 provided in the embodiments of the present application can effectively improve the infrared ship recognition accuracy.
Table 2 below compares the experimental results of the improved YOLOv7 network structure and the prior-art YOLOv7 network structure when trained with the infrared dataset.

Table 2
In some embodiments, a day-and-night judging module (not shown) may be added to the target recognition apparatus; the calculation logic of the day-and-night judging module is determined by a statistical method in order to simplify the computation. First, two datasets each containing 2000 pictures are created from the videos captured by the visible light camera: one is a set of pictures taken by the visible light camera in the daytime, and the other is a set of pictures taken by the visible light camera at night. The pictures of the two datasets are processed in batches using the Pillow library: the r, g and b channels of a single picture are taken out separately, their average values are calculated, added together and divided by 3 to obtain the average rgb value of that picture. All daytime pictures are processed to obtain the daytime average rgb value Rd; similarly, the same method is applied to the night pictures to obtain the night average rgb value Rn, and the judgment standard (i.e. the preset value) is set as Rz = (Rd + Rn)/2. In practical application, the average rgb value Rl of the acquired visible light image is first calculated; when Rl > Rz it is judged to be daytime, otherwise it is judged to be night.
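By way of illustration only, the following Python sketch (using the Pillow library and NumPy) computes the average rgb statistic and the threshold Rz described above; the file paths are placeholders and the dataset sizes are reduced for demonstration.

```python
from PIL import Image
import numpy as np

def average_rgb(path):
    """Average of the per-channel means of one picture (the Rd/Rn/Rl statistic)."""
    r, g, b = Image.open(path).convert("RGB").split()
    return (np.mean(np.asarray(r)) + np.mean(np.asarray(g)) + np.mean(np.asarray(b))) / 3.0

day_set, night_set = ["day_0001.jpg"], ["night_0001.jpg"]   # placeholder file lists
Rd = np.mean([average_rgb(p) for p in day_set])              # daytime average rgb value
Rn = np.mean([average_rgb(p) for p in night_set])            # night average rgb value
Rz = (Rd + Rn) / 2.0                                         # judgment standard (preset value)

def is_daytime(path):
    return average_rgb(path) > Rz    # Rl > Rz -> daytime, otherwise night
```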
According to the above judgment result, the visible light ship target recognition model 20 is called for real-time video detection and target recognition under daytime conditions; if it is judged to be night, the infrared ship target recognition model 20 is called for real-time video detection and target recognition, and the detection result is marked with a recognition frame in the video. The visible light and infrared recognition effects are shown in fig. 7 and fig. 8.
By setting the day-and-night judging module, the embodiments of the present application call the visible light ship target recognition model 20 for real-time video detection and target recognition in the daytime, framing the detected ships in the real-time video, and call the infrared ship target recognition model 20 at night for real-time video detection and target recognition, framing the detected ships. According to the embodiments of the present application, the dual-spectrum camera can be used for scene perception during real ship navigation, and the ship target recognition models for daytime and night conditions are trained respectively based on the visible light and infrared videos, so that all-day target detection is realized and the efficiency of ship target recognition is improved.
In some embodiments, as shown in fig. 9, the marine vessel target identification apparatus 1 further includes an image enhancement module 40 for preprocessing with a USM sharpening enhancement algorithm: the USM algorithm low-pass filters the original image to generate a blurred image, and the blurred image is subtracted from the original image to obtain an image in which the high-frequency components are retained. The high-frequency image is amplified by a parameter and then superimposed on the original image to obtain the enhanced image.
In some embodiments, the marine vessel target identification apparatus 1 further comprises an image defogging module 30. Following the dark channel algorithm flow, the image defogging module 30 calculates the dark channel of the original hazy image, calculates the atmospheric light component A from the pixel values of the dark channel sorted in ascending order, then calculates the transmittance t of each pixel, and recovers the haze-free image from the atmospheric light component A and the transmittance t. The parameters of the dark channel algorithm are tested under sea fog conditions, the main test parameters being the minimum-value filter radius r of the dark channel and the guided filter radius R.
Optionally, in order to save computing resources and improve computing speed, the defogging module is made optional: it is not enabled under weather conditions with good visibility and is enabled when the fog is heavy. Therefore, a judging module is provided to decide whether to execute the defogging process.
It should be noted that, the functions of each functional module of the marine vessel target identification apparatus 1 according to the embodiment of the present application may be specifically implemented according to the method in the embodiment of the method, and the specific implementation process may refer to the related description of the embodiment of the method, which is not repeated herein.
In the embodiments of the present application, the image defogging module 30 is constructed by combining the dark channel defogging method with the USM (Unsharp Mask) sharpening enhancement algorithm, providing a dark channel defogging method better suited to sea fog conditions. Because sea fog involves a high similarity between the sky and the sea water, directly using the dark channel defogging algorithm can fail in large low-contrast sky areas and can cause problems such as blocking artifacts in distant sky regions. Therefore, the method proposes to first use the USM sharpening enhancement algorithm under sea fog conditions to remove interfering details and noise, sharpen the edge features and improve the distinctness of the sea-sky line. The sharpening effect of this method is natural and basically has no adverse effect on the subsequent dark channel algorithm. The image defogging module 30 serves as a data pre-processing module in front of the target detection algorithm and is set as an optional function that is enabled when the sea fog is heavy, which can further improve recognition accuracy.
As an example, a schematic structural diagram of an electronic device 2 to which the scheme of the embodiment of the present invention is applied is shown in fig. 10, and as shown in fig. 10, the electronic device 2 may include a processor 210 and a memory 220. Wherein the processor 210 is coupled to the memory 220, such as via a bus 230. The processor 210 is configured to perform the marine vessel target identification method as described in any of the method embodiments above by invoking the computer operating instructions.
Optionally, the electronic device 2 may further comprise a communication module 240. It should be noted that, in practical application, the communication module 240 is not limited to one, and the structure of the electronic device 2 is not limited to the embodiment of the present invention.
Note that references in the specification to "one embodiment," "an embodiment," "example embodiments," "some embodiments," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A method of marine vessel target identification, comprising:
acquiring a ship navigation image to be identified;
extracting features of the ship navigation image to obtain feature maps on different layers;
extracting a feature representation of the feature map on each layer in a multi-scale manner by applying hole convolution and pooling operations at a set hole rate;
fusing the feature representations of the different layers by up-sampling and down-sampling operations to obtain fused features;
predicting bounding boxes and categories of targets by using a convolution layer and a fully connected layer based on the fused features.
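For illustration, a minimal PyTorch sketch of the multi-scale hole (dilated) convolution and the up-/down-sampling fusion steps of this claim is given below; the channel widths, dilation rates and the bilinear resampling are assumptions, not values specified in the patent.

```python
# Hedged sketch of the multi-scale feature extraction and fusion in claim 1;
# channel widths, dilation rates and the bilinear resampling are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DilatedPyramid(nn.Module):
    """Parallel hole (dilated) convolutions plus a global pooling branch."""
    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates]
        )
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, out_ch, 1))
        self.project = nn.Conv2d(out_ch * (len(rates) + 1), out_ch, 1)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.pool(x), size=x.shape[-2:], mode="bilinear",
                               align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))

def fuse_levels(level_feats, out_size):
    """Fuse per-level feature representations by resampling them to one resolution."""
    resized = [F.interpolate(f, size=out_size, mode="bilinear", align_corners=False)
               for f in level_feats]
    return torch.cat(resized, dim=1)    # fused features fed to the prediction head
```

The prediction step of the claim would then apply a convolution layer and a fully connected layer to these fused features to output bounding boxes and categories.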
2. The marine vessel target identification method of claim 1, further comprising: performing defogging processing based on the ship navigation image.
3. The marine vessel target identification method according to claim 2, wherein, before the defogging processing based on the ship navigation image, the method further comprises:
enhancing the brightness variation amplitude of edge portions in the ship navigation image to obtain an enhanced ship navigation image.
4. The marine vessel target identification method of claim 1, wherein predicting bounding boxes and categories of targets by using a convolution layer and a fully connected layer based on the fused features comprises:
carrying out space division on the fused features according to a preset space division size to obtain a plurality of sub-feature maps; rearranging pixels in each sub-feature map according to a channel sequence to form a new feature map; and connecting the feature maps of all the sub-feature maps together in a set sequence to obtain finally optimized fused features;
and predicting the bounding boxes and categories of the targets by using a convolution layer and a fully connected layer based on the finally optimized fused features.
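For illustration, the space division and pixel rearrangement in this claim (and in the feature optimization module of claim 10) can be read as a space-to-channel shuffle; a minimal PyTorch sketch under that assumption, with an assumed split size of 2, follows.

```python
# Hedged sketch: reads the claimed space division / pixel rearrangement as a
# space-to-channel shuffle; the split size s=2 is an assumption, not a patent value.
import torch

def space_to_channel(fused, s=2):
    """Split a (B, C, H, W) map into s*s sub-maps and stack them along channels."""
    b, c, h, w = fused.shape
    assert h % s == 0 and w % s == 0, "spatial size must be divisible by the split size"
    # Every s-th pixel from each of the s*s offsets forms one sub-feature map.
    subs = [fused[:, :, i::s, j::s] for i in range(s) for j in range(s)]
    # Concatenate the sub-feature maps along the channel axis in a fixed order.
    return torch.cat(subs, dim=1)       # (B, C*s*s, H/s, W/s)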
5. The marine vessel target identification method according to claim 1, wherein the feature extraction of the ship navigation image to obtain feature maps on different layers, the multi-scale extraction of the feature representation of the feature map on each layer by applying hole convolution and pooling operations at a set hole rate, the fusion of the feature representations of the different layers by up-sampling and down-sampling operations to obtain the fused features, and the prediction of the bounding boxes and categories of the targets by using a convolution layer and a fully connected layer based on the fused features are performed by a target recognition model;
the target recognition model is obtained by training in the following manner:
respectively acquiring a visible light ship navigation image training set and an infrared ship navigation image training set, wherein the visible light ship navigation image training set and the infrared ship navigation image training set are annotated with the categories of targets;
training the target recognition model based on the visible light ship navigation image training set until the loss function of the target recognition model converges, to obtain a visible light ship target recognition model, so that the visible light ship target recognition model can be invoked to recognize marine vessel targets under daytime conditions;
and training the target recognition model based on the infrared ship navigation image training set until the loss function of the target recognition model converges, to obtain an infrared ship target recognition model, so that the infrared ship target recognition model can be invoked to recognize marine vessel targets under nighttime conditions.
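A minimal sketch of this two-model training scheme is shown below, assuming a PyTorch-style training loop; `make_model`, the data loaders, the `compute_loss` model method and all hyperparameter values are illustrative assumptions, not APIs defined by the patent.

```python
# Hedged sketch: the same network definition is trained twice, once per modality.
import torch

def train_modality_model(make_model, loader, epochs=50, lr=1e-3):
    model = make_model()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):   # fixed epoch count stands in for "until the loss converges"
        for images, targets in loader:
            loss = model.compute_loss(images, targets)   # assumed model API
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model

# visible_model  = train_modality_model(make_model, visible_light_loader)   # daytime
# infrared_model = train_modality_model(make_model, infrared_loader)        # nighttime
```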
6. The marine vessel target identification method according to claim 5, further comprising determining whether the current condition is daytime or nighttime, so as to determine whether to select the visible light ship target recognition model or the infrared ship target recognition model for marine vessel target identification;
wherein determining whether the current condition is daytime or nighttime comprises:
separately extracting the r channel, g channel and b channel of the ship navigation image to be identified, and calculating the average value of each channel;
adding the average values of the three channels and dividing the sum by 3 to obtain an average rgb value of the ship navigation image;
and comparing the average rgb value with a preset value to determine whether it is currently daytime or nighttime.
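A minimal sketch of this brightness test follows, assuming a NumPy RGB image; the threshold value is an assumption rather than a value given in the claims.

```python
# Hedged sketch of the day/night test in claim 6: the threshold is an assumed value.
import numpy as np

def is_daytime(image_rgb, threshold=90.0):
    """Average the per-channel means and compare against a preset value."""
    r_mean = image_rgb[..., 0].mean()
    g_mean = image_rgb[..., 1].mean()
    b_mean = image_rgb[..., 2].mean()
    avg_rgb = (r_mean + g_mean + b_mean) / 3.0
    return avg_rgb > threshold      # above the preset value -> daytime, else nighttime

# model = visible_model if is_daytime(frame) else infrared_model
```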
7. A marine vessel object identification device, the device comprising:
the image acquisition module is used for acquiring a ship navigation image to be identified;
the target recognition model is used for extracting features of the ship navigation image to obtain feature maps on different layers; extracting a feature representation of the feature map on each layer in a multi-scale manner by applying hole convolution and pooling operations at a set hole rate; fusing the feature representations of the different layers by up-sampling and down-sampling operations to obtain fused features; and predicting bounding boxes and categories of targets by using a convolution layer and a fully connected layer based on the fused features.
8. The marine vessel target identification device according to claim 7, wherein the target recognition model comprises a feature map extraction module, a cavity space convolution pooling pyramid (ASPP), a feature fusion module, a target detection module and a loss function which are connected in sequence;
the feature map extraction module is used for extracting features of the ship navigation image to obtain feature maps on different layers;
the cavity space convolution pooling pyramid is used for extracting the feature representation of the feature map on each layer in a multi-scale manner by applying cavity convolution and pooling operations at the set cavity rate;
the feature fusion module is used for fusing the feature representations of the different layers by up-sampling and down-sampling operations to obtain the fused features;
the target detection module is used for predicting target bounding boxes and target categories by using a convolution layer and a fully connected layer based on the fused features;
the loss function is used for calculating the differences between the predicted target bounding boxes and target categories and the real labels, and for optimizing the parameters of the target recognition model.
9. The marine vessel target identification device of claim 8, wherein the cavity space convolution pooling pyramid comprises 1 convolution layer, 3 pooling pyramids and 1 ASPP pooling layer, wherein the convolution layer is connected to each of the 3 pooling pyramids, and each of the 3 pooling pyramids is connected to the ASPP pooling layer.
10. The marine vessel target identification device of claim 8, further comprising a feature optimization module having an input connected to the feature fusion module and an output connected to the target detection module;
the feature optimization module is used for carrying out space division on the fused features according to a preset space division size to obtain a plurality of sub-feature maps; rearranging pixels in each sub-feature map according to a channel sequence to form a new feature map; and connecting the feature maps of all the sub-feature maps together in a set sequence to obtain the finally optimized fused features.
11. An electronic device, comprising: a processor and a memory;
the memory is used for storing computer operation instructions;
the processor is configured to execute the marine vessel target identification method according to any one of claims 1 to 6 by invoking the computer operation instructions.
12. A computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the marine vessel target identification method according to any one of claims 1-6.
CN202311387377.2A 2023-10-25 2023-10-25 Marine vessel target identification method, device, electronic equipment and readable medium Pending CN117372829A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311387377.2A CN117372829A (en) 2023-10-25 2023-10-25 Marine vessel target identification method, device, electronic equipment and readable medium

Publications (1)

Publication Number Publication Date
CN117372829A CN117372829A (en) 2024-01-09

Family

ID=89394284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311387377.2A Pending CN117372829A (en) 2023-10-25 2023-10-25 Marine vessel target identification method, device, electronic equipment and readable medium

Country Status (1)

Country Link
CN (1) CN117372829A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118115952A (en) * 2024-04-28 2024-05-31 中国民航大学 All-weather detection method and system for unmanned aerial vehicle image under urban low-altitude complex background

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464883A (en) * 2020-12-11 2021-03-09 武汉工程大学 Automatic detection and identification method and system for ship target in natural scene
CN113627310A (en) * 2021-08-04 2021-11-09 中国电子科技集团公司第十四研究所 Background and scale perception SAR ship target detection method
KR20220157167A (en) * 2021-05-20 2022-11-29 한국전자기술연구원 Method distant ship detection and detailed type identification using images
CN116403127A (en) * 2023-03-06 2023-07-07 华南理工大学 Unmanned aerial vehicle aerial image target detection method, device and storage medium


Similar Documents

Publication Publication Date Title
CN112132156B (en) Image saliency target detection method and system based on multi-depth feature fusion
CN107274445B (en) Image depth estimation method and system
CN109145747B (en) Semantic segmentation method for water surface panoramic image
CN110796009A (en) Method and system for detecting marine vessel based on multi-scale convolution neural network model
CN111832443B (en) Construction method and application of construction violation detection model
CN113762209A (en) Multi-scale parallel feature fusion road sign detection method based on YOLO
CN110807384A (en) Small target detection method and system under low visibility
CN117372829A (en) Marine vessel target identification method, device, electronic equipment and readable medium
CN114220126A (en) Target detection system and acquisition method
CN115861380A (en) End-to-end unmanned aerial vehicle visual target tracking method and device in foggy low-light scene
CN112633274A (en) Sonar image target detection method and device and electronic equipment
CN114037938A (en) NFL-Net-based low-illumination target detection method
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
CN116883868A (en) Unmanned aerial vehicle intelligent cruising detection method based on adaptive image defogging
CN115034997A (en) Image processing method and device
CN114862707A (en) Multi-scale feature recovery image enhancement method and device and storage medium
CN117115770A (en) Automatic driving method based on convolutional neural network and attention mechanism
CN117115616A (en) Real-time low-illumination image target detection method based on convolutional neural network
CN111062384A (en) Vehicle window accurate positioning method based on deep learning
CN116363072A (en) Light aerial image detection method and system
CN116363064A (en) Defect identification method and device integrating target detection model and image segmentation model
CN112446292B (en) 2D image salient object detection method and system
CN113723181B (en) Unmanned aerial vehicle aerial photographing target detection method and device
CN115035429A (en) Aerial photography target detection method based on composite backbone network and multiple measuring heads
CN111008555B (en) Unmanned aerial vehicle image small and weak target enhancement extraction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination