CN118097342A - Sonar-based model training method, estimation method, device, equipment and storage medium

Info

Publication number
CN118097342A
Authority
CN
China
Prior art keywords
sonar
images
training
image
simulation
Prior art date
Legal status
Pending
Application number
CN202410525266.1A
Other languages
Chinese (zh)
Inventor
张志春
赵远飞
陈国军
夏细明
陈梦醒
吕荣贤
洪道玉
陈巍
郭铁铮
王玉珍
Current Assignee
Guangdong Institute Of Safety Production Science And Technology
Original Assignee
Guangdong Institute Of Safety Production Science And Technology
Application filed by Guangdong Institute Of Safety Production Science And Technology filed Critical Guangdong Institute Of Safety Production Science And Technology
Priority to CN202410525266.1A priority Critical patent/CN118097342A/en
Publication of CN118097342A publication Critical patent/CN118097342A/en


Landscapes

  • Image Analysis (AREA)

Abstract

The application provides a sonar-based model training method, an estimation method, a device, equipment and a storage medium. The sonar-based model training method acquires a real sonar image dataset and generates a simulated image dataset through a sonar simulator, which reduces the difficulty of data acquisition and makes it convenient to obtain more image data for subsequent training. A number of first simulated images of different underwater rock scenes and a number of second simulated images of different underwater scenes at the bottom of a ship are respectively classified and combined to obtain a plurality of simulated image groups, which helps provide richer image information for model training. Training a composite pose detection network and a trajectory neural network through the real sonar image dataset and the simulated image groups can therefore improve the accuracy of network training and the recognition accuracy of the networks, and in turn the accuracy with which the pose parameters of a target and the motion trajectory of the target are estimated.

Description

Sonar-based model training method, estimation method, device, equipment and storage medium
Technical Field
The application relates to the field of deep learning, and in particular to a sonar-based model training method, an estimation method, a device, equipment and a storage medium.
Background
In recent years, deep learning (DL) techniques have been used in image processing to identify and locate objects in images, for example recognizing pedestrians or traffic signs in autonomous driving, and to accurately classify the many different categories of objects contained in an image into their corresponding target categories, for example recognizing diseases in medical images. However, when sonar images are acquired in diversified and complex underwater scenes, for example by mounting a sonar device on a vehicle (such as an underwater robot) to acquire the sonar images, and target pose detection is performed using those images, recognition accuracy remains low and the image processing effect poor.
Disclosure of Invention
The embodiments of the application provide a sonar-based model training method, an estimation method, a device, an electronic device and a storage medium, which are intended to solve at least one of the problems in the related art. The technical solution is as follows:
In a first aspect, an embodiment of the present application provides a method for sonar-based model training, including:
Acquiring a real sonar image dataset, and generating a simulated image dataset through a sonar simulator, wherein the simulated image dataset comprises a number of first simulated images of different underwater rock scenes and a number of second simulated images of different underwater scenes at the bottom of a ship, and the real sonar image dataset, the first simulated images and the second simulated images all carry label information, the label information comprising pose information and motion information;
respectively classifying and combining the number of first simulated images of different underwater rock scenes and the number of second simulated images of different underwater scenes at the bottom of the ship to obtain a plurality of simulated image groups;
Training a composite pose detection network and a trajectory neural network through the real sonar image dataset and the simulated image groups;
wherein the composite pose detection network is used for estimating pose parameters of a target, and the trajectory neural network is used for estimating the motion trajectory of the target.
In one embodiment, the generating, by the sonar simulator, the simulated image dataset comprises one of:
acquiring, in real time through a sonar sensor, first sensing data of different underwater rock scenes and second sensing data of different underwater scenes at the bottom of a ship, and inputting the first sensing data and the second sensing data into the sonar simulator to generate the simulated image dataset;
Generating a first sonar image of different underwater rock scenes and a second sonar image of different underwater scenes at the bottom of a ship through a sonar model, and inputting the first sonar image and the second sonar image into a sonar simulator to generate a simulated image data set;
acquiring first historical sensing data of different underwater rock scenes and second historical sensing data of different underwater scenes at the bottom of a ship, and inputting the first historical sensing data and the second historical sensing data into a sonar simulator to generate a simulation image data set.
In one embodiment, the classifying and combining the number of first simulated images of different underwater rock scenes and the number of second simulated images of different underwater scenes at the bottom of the ship to obtain a plurality of simulated image groups includes:
sequentially arranging a plurality of first simulation images corresponding to each underwater rock scene according to time or space, and sequentially arranging a plurality of second simulation images corresponding to each underwater scene at the bottom of the ship according to time or space;
respectively carrying out first grouping on a plurality of first simulation images which are sequentially arranged in each underwater rock scene, and respectively carrying out second grouping on a plurality of second simulation images which are sequentially arranged in each underwater scene at the bottom of the ship;
And determining a plurality of simulation image groups according to the first grouping result and the second grouping result.
In one embodiment, the method further comprises:
acquiring pixel point information of each real sonar image in the real sonar image data set, wherein the pixel point information comprises sector angle information of each pixel point;
according to the sector angle information of each pixel, sorting the pixels of the real sonar image by angle;
Based on the sector shape, stitching the pixels in each real sonar image using the sorting result to obtain a number of new real sonar images.
In one embodiment, the composite pose detection network includes a CNN network, a preset number of QuadPoseNet subnetworks, a preset number of first fully connected layers, and a second fully connected layer, and training the composite pose detection network through the real sonar image dataset and the simulated image groups includes:
Generating a training set according to the real sonar image data set and the simulated image data set, wherein the training set comprises a plurality of training images, and the training images comprise real sonar images and images in a simulated image group;
dividing each training image into a preset number of sub-images through the CNN network, and connecting corresponding quadrants of each sub-image through space coordinates of pixel points;
inputting the sub-images into the QuadPoseNet subnetworks respectively for processing, inputting the output of each QuadPoseNet subnetwork into the corresponding first fully connected layer, and inputting the outputs of the first fully connected layers into the second fully connected layer to obtain the pose parameters of the target;
Determining a first loss value according to the pose parameters, the pose information and a loss function;
Training the composite pose detection network according to the first loss value until a first termination condition is reached.
In one embodiment, training the trajectory neural network through the real sonar image dataset and the simulated image groups includes:
Generating a training set according to the real sonar image data set and the simulated image data set, wherein the training set comprises a plurality of training images, and the training images comprise real sonar images and images in a simulated image group;
Dividing each training image according to the trajectory neural network to obtain a preset number of sub-images corresponding to each training image;
Performing compression filtering processing on the sub-images, and stitching the preset number of compression-filtered sub-images to obtain stitched training images;
Performing an inference operation on all the stitched training images to obtain the motion trajectory of the target;
Calculating a second loss value according to the motion information and the motion trajectory;
Training the trajectory neural network according to the second loss value until a second termination condition is reached.
In a second aspect, an embodiment of the present application provides an estimation method, including:
collecting a real-time sonar image;
And inputting the real-time sonar images into a composite pose detection network and a trajectory neural network, and estimating the pose parameters of a target and the motion trajectory of the target.
In a third aspect, an embodiment of the present application provides a sonar-based model training device, including:
an acquisition module, used for acquiring a real sonar image dataset and generating a simulated image dataset through a sonar simulator, wherein the simulated image dataset comprises a number of first simulated images of different underwater rock scenes and a number of second simulated images of different underwater scenes at the bottom of the ship, and the real sonar image dataset, the first simulated images and the second simulated images all carry label information, the label information comprising pose information and motion information;
a combination module, used for respectively classifying and combining the number of first simulated images of different underwater rock scenes and the number of second simulated images of different underwater scenes at the bottom of the ship to obtain a plurality of simulated image groups;
a training module, used for training a composite pose detection network and a trajectory neural network through the real sonar image dataset and the simulated image groups;
wherein the composite pose detection network is used for estimating pose parameters of a target, and the trajectory neural network is used for estimating the motion trajectory of the target.
In a fourth aspect, an embodiment of the present application provides an electronic device, including: a processor and a memory in which instructions are stored, the instructions being loaded and executed by the processor to implement the method of any of the embodiments of the above aspects.
In a fifth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program, which when executed implements a method in any one of the embodiments of the above aspects.
The beneficial effects of the technical scheme at least include the following:
The real sonar image dataset is acquired and the simulated image dataset is generated through a sonar simulator, which reduces the difficulty of data acquisition and makes it convenient to obtain more image data for subsequent training. The number of first simulated images of different underwater rock scenes and the number of second simulated images of different underwater scenes at the bottom of a ship are respectively classified and combined to obtain a plurality of simulated image groups, which helps provide richer image information for model training. Training the composite pose detection network and the trajectory neural network through the real sonar image dataset and the simulated image groups can therefore improve the accuracy of network training and the recognition accuracy of the networks, and in turn the accuracy with which the pose parameters of a target and the motion trajectory of the target are estimated.
The foregoing summary is for the purpose of the specification only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments, and features described above, further aspects, embodiments, and features of the present application will become apparent by reference to the drawings and the following detailed description.
Drawings
In the drawings, the same reference numerals refer to the same or similar parts or elements throughout the several views unless otherwise specified. The figures are not necessarily drawn to scale. It is appreciated that these drawings depict only some embodiments according to the disclosure and are not therefore to be considered limiting of its scope.
FIG. 1 is a flowchart illustrating steps of a sonar-based model training method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of sonar parameters under different scenarios according to an embodiment of the present application;
FIG. 3 is another schematic diagram of sonar parameters under different scenarios according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a composite pose detection network according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a QuadPoseNet subnetwork according to an embodiment of the present application;
FIG. 6 is a diagram illustrating confidence levels according to an embodiment of the present application;
FIG. 7 is a graph illustrating a precision confidence curve according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a test result according to an embodiment of the application;
FIG. 9 is a schematic diagram illustrating steps of an estimation method according to an embodiment of the present application;
FIG. 10 is a schematic diagram of a trajectory according to an embodiment of the present application;
FIG. 11 is a block diagram of a sonar-based model training device according to an embodiment of the present application;
Fig. 12 is a block diagram of an electronic device according to an embodiment of the application.
Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present application. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
Explanation of terms:
Trajectory estimation: inferring, by means of an algorithm model, the motion trajectory of an observed target or object in time and space from known observation data. This process typically involves analyzing and processing sensor data (e.g., radar, camera, GPS) to predict the likely location and movement path of the target without observing it directly.
Underwater micro navigation: underwater micro-navigation refers to the process of accurately navigating and performing tasks in a limited underwater space by using various sensors and navigation techniques in an underwater environment. Such navigation systems are commonly used in the fields of underwater exploration, marine research, resource exploration, underwater engineering, and the like.
In the related art, the generation of simulated images and their processing and use suffer from drawbacks such as the following:
1. Limited simulation realism: simulated data may not fully capture the complexity and diversity of a real underwater environment. A model trained on simulated data may therefore face generalization problems in real scenes.
2. Mismatched data distribution: the distribution difference between simulated and real data may degrade the trained model's performance in a real environment. In practical applications, the features of a sonar image vary with underwater conditions, water quality and other factors that are difficult to simulate fully. The model may also overfit to the specific scenes and noise distribution of the simulation and fail to adapt to changes in real scenes, degrading performance in actual underwater tasks.
3. Lack of real-world complexity: a sonar simulator may not accurately reproduce complex conditions in the underwater environment, such as ocean currents, tides and sea-floor terrain. These factors have a significant impact on navigation trajectory estimation, and the simulation may not fully account for them.
4. Difficulty simulating sensor noise: in a real underwater environment, a sonar system suffers various disturbances and noise, such as multipath effects and the uncertainty of underwater propagation. A sonar simulator can hardly reproduce these complex sensor noises accurately, which affects the model's robustness to real noise.
5. Computational complexity: generating high-quality sonar simulation data may require significant computational resources and time. This can be a bottleneck in training, especially in deep learning tasks that require large-scale data.
6. Lack of diversity: simulated data may be confined to fixed scenes and conditions. In real underwater tasks, robots face various unknowns, and the limitations of simulated data may make it difficult for models to cope with them.
In the related art, the computation time for visual image processing on an underwater robot is long and real-time performance is lacking, so the image dataset cannot be processed in real time to estimate the pose and trajectory of the underwater platform.
In the related art, to address the single-color and low-resolution nature of real sonar images, combined sonar image annotations are constructed by an annotation method based on reconstructed trajectories. To overcome this challenge, a deep learning (DL) framework is improved to raise sonar image processing performance and trajectory estimation accuracy.
In the related art, the existing CNN-4b network has the following disadvantages:
(1) Greater data demand: CNN-4b networks typically require extensively labeled datasets for training to ensure that the network learns feature representations that generalize broadly. When data are limited or unbalanced, network performance may suffer;
(2) Object boundary handling challenges: in a quadrant segmentation task, the network must handle the boundaries of the quadrants, which may be limited by the fixed receptive field size of the convolution operation, resulting in insufficiently accurate localization of quadrant edges. Additional boundary handling mechanisms are needed to improve accuracy;
(3) Computing resource requirements: deep CNN models such as CNN-4b typically require extensive computational resources for training and inference. This poses challenges in resource-limited environments and limits the practical availability of the network in certain application scenarios;
(4) Poor interpretability: the feature representations inside a CNN model are often difficult to interpret because of its deep structure and complexity.
The existing GeoNet-PoseNet has the following disadvantages:
(1) High computational resource requirements: because GeoNet-PoseNet employs a deep hierarchy, its computational resource requirements for training and inference are relatively high. This may limit the application of the model in resource-constrained environments, especially on embedded systems or mobile devices.
(2) Large data requirements: deep models typically require extensively labeled datasets for training to obtain sufficient generalization capability. The cost of data collection and labeling may be a limiting factor for some particular fields or tasks.
(3) Poor interpretability: the internal decision process of GeoNet-PoseNet is difficult to interpret because of the complexity of the deep learning model. This may be inadequate in scenarios with high interpretability requirements.
(4) Limited generalization to specific scenarios: a deep learning model may overfit certain scenes, reducing its generalization ability. Its performance may be challenged especially when facing scenes or lighting conditions significantly different from the training dataset.
When GeoNet-PoseNet is applied in practice, its advantages and disadvantages should be fully considered, and appropriate adjustment and optimization carried out in light of the requirements of the specific application scenario and task, so as to exploit the model's strengths to the greatest extent. Future research may focus on improving model efficiency, reducing the reliance on computing resources, and exploring more interpretable deep learning models.
Referring to FIG. 1, which is a flowchart of a sonar-based model training method according to an embodiment of the present application, the method may include at least steps S100-S300:
S100, acquiring a real sonar image dataset, and generating a simulated image dataset through a sonar simulator.
Optionally, the real sonar image dataset is collected by a sonar device and includes a number of real sonar images. For example, after an experimental field is selected and confirmed according to preset test requirements, a specific target object (such as a dummy) is placed in the field in advance, the sonar device is carried on an underwater robot to perform sonar image collection, and the sonar device captures real sonar images of the specific target object (the underwater dummy or the like) during the test.
Optionally, the simulated image dataset comprises an underwater rock scene dataset (the "Rocky Field" dataset) comprising a number of first simulated images of different underwater rock scenes, and a vessel dataset comprising a number of second simulated images of different underwater scenes at the bottom of the ship. The real sonar images in the real sonar image dataset, the first simulated images and the second simulated images all carry label information.
S200, respectively classifying and combining the number of first simulated images of different underwater rock scenes and the number of second simulated images of different underwater scenes at the bottom of the ship to obtain a plurality of simulated image groups.
And S300, training a composite pose detection network and a trajectory neural network through the real sonar image dataset and the simulated image groups.
The composite pose detection network is used for estimating pose parameters of a target, and the trajectory neural network is used for estimating the motion trajectory of the target.
The sonar-based model training method can be executed by an electronic control unit, a controller or a processor of a computer, a mobile phone, a tablet, a vehicle-mounted terminal or the like, or by a cloud server.
According to the above technical scheme, the real sonar image dataset is acquired and the simulated image dataset is generated through the sonar simulator, which reduces the difficulty of data acquisition and makes it convenient to obtain more image data for subsequent training. The number of first simulated images of different underwater rock scenes and the number of second simulated images of different underwater scenes at the bottom of the ship are respectively classified and combined to obtain a plurality of simulated image groups, which helps provide richer image information for model training. Training the composite pose detection network and the trajectory neural network through the real sonar image dataset and the simulated image groups can therefore improve the accuracy of network training and the recognition accuracy of the networks, and in turn the accuracy with which the pose parameters of the target and the motion trajectory of the target are estimated.
In one embodiment, the generation of the simulated image dataset by the sonar simulator in step S100 includes at least one of steps S110-S130:
S110, acquiring, in real time through a sonar sensor, first sensing data of different underwater rock scenes and second sensing data of different underwater scenes at the bottom of the ship, and inputting the first sensing data and the second sensing data into a sonar simulator to generate a simulated image dataset.
In an embodiment of the application, the different underwater rock scenes include, but are not limited to, rocks of different shapes, sizes and densities present in the underwater environment, under different water quality, illumination, water depth, transparency, reflection, scattering, underwater vegetation, sand and mud, rock bedding, and underwater structures such as cavities, grooves or other topographic features; the different underwater scenes at the bottom of the ship comprise a number of scene types constructed for the ship bottom based on factors such as shoal areas, rocky bottoms, seaweed areas, deep-water areas, structural simulation, substrate diversity and tidal changes.
Optionally, the first sensing data of different underwater rock scenes and the second sensing data of different underwater scenes at the bottom of the ship can be acquired in real time through the sonar sensor and input into the sonar simulator; after processing by the sonar simulator, the underwater rock scene dataset (the "Rocky Field" dataset) and the vessel dataset are correspondingly generated, forming the simulated image dataset and thus producing data that can be used for training.
S120, generating a first sonar image of different underwater rock scenes and a second sonar image of different underwater scenes at the bottom of the ship through a sonar model, and inputting the first sonar image and the second sonar image into a sonar simulator to generate a simulated image data set.
Optionally, the working principle of the sonar sensor can be simulated through a sonar model trained in advance, the first sonar image of different underwater rock scenes and the second sonar image of different underwater scenes at the bottom of the ship are generated through inputting relevant parameters of different underwater rock scenes and different underwater scenes at the bottom of the ship in the sonar model, and then the first sonar image and the second sonar image are input into a sonar simulator for processing, so that a simulated image data set is generated.
S130, acquiring first historical sensing data of different underwater rock scenes and second historical sensing data of different underwater scenes at the bottom of the ship, and inputting the first historical sensing data and the second historical sensing data into a sonar simulator to generate a simulation image data set.
Optionally, the first historical sensing data of different underwater rock scenes and the second historical sensing data of different underwater scenes at the bottom of the ship, which are acquired by the sonar equipment, can be collected, and then the first historical sensing data and the second historical sensing data are input into the sonar simulator for processing, so that a simulated image data set is finally generated.
Optionally, the pose information includes, but is not limited to, a rotation matrix, a translation (displacement) vector and scaling parameters, or a rotation axis, a rotation angle, a translation vector and scaling parameters; the motion information includes, but is not limited to, the translational and rotational degrees of freedom of the x, y and z axes, such as motion samples of the angular offsets about the x, y and z axes. The label information may also include image information, such as category labels of the different underwater rock scenes and the different underwater scenes at the bottom of the ship, object labels of objects in a scene, and information such as image depth range, blur and focus position. Label information can be annotated manually or generated automatically from the acquired image information. Distances and displacements in the label information can be expressed in meters and rotation angles in degrees; motion information can be annotated manually along the traveled path and then fed into the neural network model in the sonar simulator environment to generate finer, denser path labels that guide underwater path travel.
It should be noted that, since the path labels so generated are discontinuous, which is disadvantageous for operating underwater equipment, linear transformations are used to convert the three parameter labels into parameters measured in the same unit; that is, the minimum and maximum values of each type of label are quantized and the original values are mapped to new ranges, thereby forming a continuous path. For example, the label conversion operation on the input images may be performed as follows:
First, the minimum and maximum values are acquired: the minimum and maximum values of displacement and rotation are obtained from the label information generated by the simulator. The displacement parameters are then converted by linear normalization using these minimum and maximum values. Finally, the rotation parameters undergo the same angular linear conversion, normalized with their minimum and maximum values:

$$v_{\mathrm{norm}} = \frac{v - v_{\min}}{v_{\max} - v_{\min}}$$

where $v_{\mathrm{norm}}$ is the normalized value, $v$ is the original value, and $v_{\min}$ and $v_{\max}$ are respectively the minimum and maximum of the original values.
If the label conversion process introduces factors adverse to model building and training, the quantized label information can be fuzzily classified: the final quantized value of each label can be divided into three grades, small, medium and large, according to its magnitude, reducing the amount of computation.
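As an illustration, the following is a minimal Python/NumPy sketch of the normalization and fuzzy-classification steps described above. The function names and the bin edges at thirds are assumptions for illustration; the text does not specify them.

```python
import numpy as np

def normalize_labels(values, v_min, v_max):
    """Min-max linear normalization of displacement/rotation labels to [0, 1]."""
    values = np.asarray(values, dtype=np.float64)
    return (values - v_min) / (v_max - v_min)

def fuzzy_classify(norm_values, edges=(1/3, 2/3)):
    """Coarse small/medium/large grading of the normalized labels to reduce
    computation; the bin edges are assumed, not given in the text."""
    return np.digitize(norm_values, edges)  # 0 = small, 1 = medium, 2 = large

# Example: displacement labels in meters mapped to a common unit and graded.
disp = [0.2, 1.5, 3.8]
norm = normalize_labels(disp, v_min=0.0, v_max=4.0)  # [0.05, 0.375, 0.95]
grades = fuzzy_classify(norm)                        # [0, 1, 2]
```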
As shown in Fig. 2 and Fig. 3, in the embodiment of the present application, the following is set in advance when processing sonar images in this study: the motion of the sonar sensor requires six degrees of freedom, i.e., translation along the X, Y and Z axes and rotation about each axis. During sonar detection, the altitude above the ocean floor is assumed constant and the rotations about the $x$ and $y$ axes are negligible, so DL techniques are adopted to estimate the motion of the sonar images with three degrees of freedom:

$$\hat{m} = \left(\hat{t}_x,\; \hat{t}_y,\; \hat{\theta}_z\right)$$

where $\hat{t}_x$ denotes the x-axis offset, $\hat{t}_y$ the y-axis offset, and $\hat{\theta}_z$ the z-axis angular offset, i.e., the angle of rotation about the $z$ axis. As shown in Fig. 2 and Fig. 3, the forward/backward motion of the platform corresponds to the y-axis, while lateral motion corresponds to the x-axis; the height above the sea floor is constant, and rotation about the z-axis corresponds to the parameter $\hat{\theta}_z$. For the different underwater rock scenes of Fig. 2, the sonar height and pitch angle are 2.5 m and 35°, respectively; for the different underwater scenes at the bottom of the vessel of Fig. 3, they are 0.32 m and 0.4°. That is, different height and angle parameters are used in different scenes to generate the simulated images.
In one embodiment, step S200 includes steps S210-S230:
S210, sequentially arranging a plurality of first simulation images corresponding to each underwater rock scene according to time or space, and sequentially arranging a plurality of second simulation images corresponding to each underwater scene at the bottom of the ship according to time or space.
Optionally, there may be a certain number of first simulated images in each underwater rock scene and a certain number of second simulated images in each underwater scene at the bottom of the ship. The first and second simulated images have specific generation times, and the first simulated images in each underwater rock scene and the second simulated images in each underwater scene at the bottom of the ship may carry spatial characteristics, such as an ordering or coordinate relationships. Therefore, the first simulated images corresponding to each underwater rock scene can be arranged sequentially in time or space, as can the second simulated images corresponding to each underwater scene at the bottom of the ship, so that the continuity of the first and second simulated images is maintained and the training effect of the network improved.
S220, respectively performing first grouping on a plurality of first simulation images which are sequentially arranged in each underwater rock scene, and respectively performing second grouping on a plurality of second simulation images which are sequentially arranged in each underwater scene at the bottom of the ship.
Optionally, after the ordering, a first grouping is performed on the sequentially arranged first simulated images in each underwater rock scene, and a second grouping is performed on the sequentially arranged second simulated images in each underwater scene at the bottom of the ship. It should be noted that the number of images in the first grouping result and the second grouping result may be set as required, and each may contain multiple groups of images. For example, if there are 5 first simulated images in each underwater rock scene and 5 second simulated images in each underwater scene at the bottom of the ship, the 5 images form a sequence, and grouping one sequence yields 4 image pairs, i.e., 4 groups of images, as sketched after step S230 below.
S230, determining a plurality of simulation image groups according to the first grouping result and the second grouping result.
Therefore, based on the first grouping result and the second grouping result, a plurality of groups of images can be constituted, each group of images is denoted as a simulated image group, and a plurality of simulated image groups can be obtained.
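As an illustration of the grouping in steps S210-S230, the following Python sketch slides a window over a time- or space-ordered sequence; a 5-image sequence yields 4 pairs, matching the example above. The function name and the pairwise group size are assumptions for illustration.

```python
def make_image_groups(ordered_images, group_size=2):
    """Slide a window of length `group_size` over an ordered sequence of
    simulated images; consecutive frames end up in the same group."""
    return [ordered_images[i:i + group_size]
            for i in range(len(ordered_images) - group_size + 1)]

# Example: 5 ordered frames from one underwater rock scene -> 4 pairs.
frames = ["frame_0", "frame_1", "frame_2", "frame_3", "frame_4"]
groups = make_image_groups(frames)
# [['frame_0', 'frame_1'], ['frame_1', 'frame_2'],
#  ['frame_2', 'frame_3'], ['frame_3', 'frame_4']]
```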
In one implementation, the method of the embodiment of the present application may further include steps S200A-S200C:
S200A, acquiring pixel point information of each real sonar image in the real sonar image data set, wherein the pixel point information comprises sector angle information of each pixel point.
S200B, sorting the pixels of each real sonar image by angle according to the sector angle information of each pixel.
And S200C, based on the sector shape, stitching the pixels in each real sonar image using the sorting result to obtain a number of new real sonar images.
Optionally, when each real sonar image in the real sonar image dataset is processed, the pixel information of the image, such as the sector angle of each pixel, is acquired first, and the pixels of the real sonar image are then sorted by angle based on this sector angle information. Finally, based on the sector shape of the sonar image and its sector layout, the pixels in each real sonar image are stitched using the sorting result to obtain a number of new real sonar images, i.e., new sector sonar images. In this way, even if one real sonar image captures the sonar data of only one object or target, a complete image of the object or target can be stitched together.
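As an illustration of this angle-based reordering, the following NumPy sketch re-sorts flat arrays of sonar returns by their per-pixel sector angle and re-grids them onto a fan-shaped (beam x range) raster. The gridding scheme, array layout and function name are assumptions for illustration, not the patent's exact procedure.

```python
import numpy as np

def stitch_fan_image(intensities, angles, ranges, n_beams=256, n_bins=512):
    """Sort raw sonar pixels by sector angle, then place them on a
    fan-layout (beam x range-bin) grid to form a new sector image."""
    intensities, angles, ranges = map(np.asarray, (intensities, angles, ranges))
    order = np.argsort(angles)                 # sort pixels by fan angle
    a, r, v = angles[order], ranges[order], intensities[order]
    span_a = max(a.max() - a.min(), 1e-9)
    span_r = max(r.max() - r.min(), 1e-9)
    beam = ((a - a.min()) / span_a * (n_beams - 1)).astype(int)
    rbin = ((r - r.min()) / span_r * (n_bins - 1)).astype(int)
    fan = np.zeros((n_beams, n_bins), dtype=v.dtype)
    fan[beam, rbin] = v                        # stitch sorted pixels into the sector layout
    return fan
```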
It should be noted that stitching to form the simulated image groups benefits information fusion and multi-modal learning: stitching images of different viewing angles or channels provides richer information and enhances the understanding of targets, which is very useful for recognizing complex scenes or objects and for adapting to varied environments, and improves the training effect of the network.
It should be noted that, before step S200A or after step S200C, the acquired real sonar images may be preprocessed, including but not limited to median filtering and Gaussian filtering, to remove spurious signals and outliers from the collected real sonar images, after which sonar images with high-quality features are extracted. During this processing, on average 80% of the noise can be removed, and contrast and brightness adjustment make the images clearer, so that the final real sonar image dataset has, on average, 97% usability and clarity.
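The filtering mentioned above can be sketched with OpenCV as follows; the kernel sizes and the contrast/brightness gains are placeholder values, not taken from the text.

```python
import cv2

def preprocess_sonar_image(img):
    """Denoise a raw sonar image: median filtering removes spurious returns
    and outliers, Gaussian filtering smooths residual noise, and a linear
    gain/bias adjusts contrast and brightness (values are placeholders)."""
    img = cv2.medianBlur(img, 5)
    img = cv2.GaussianBlur(img, (5, 5), 0)
    return cv2.convertScaleAbs(img, alpha=1.3, beta=10)
```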
As shown in Fig. 4 and Fig. 5, in one embodiment, the composite pose detection network is a CNN-4QuadPoseNet network modified from existing CNNs and PoseNet, including a CNN network (containing a split layer), a preset number of QuadPoseNet subnetworks (e.g., QuadPoseNet1, QuadPoseNet2, QuadPoseNet3 and QuadPoseNet4), a preset number of first fully connected layers, and a second fully connected layer.
In one embodiment, training the composite pose detection network in step S300 with the real sonar image dataset and the simulated image groups includes steps S310-S350:
S310, generating a training set according to the real sonar image data set and the simulation image data set.
Optionally, the real sonar image dataset and the simulated image dataset are divided into a training set, a validation set and a test set according to a certain ratio, for example 9:0.5:0.5. The training set comprises a number of training images, the training images including real sonar images and images from the simulated image groups. During training, a portion of the data, referred to as a mini-batch (batch size), may be selected at a time; the goal is to reduce the value of the loss function on the selected mini-batch, and one such pass may be counted as an iteration.
It should be noted that the training set serves to update the parameters (weights and biases) of the model during model generation so as to obtain better performance; through it the model grasps the knowledge and rules contained in the data. The validation set serves to check and reinforce what is learned during training and to see whether the training effect is heading in a bad direction; it exists to select the best-performing model from a set of candidate models, and can be used to choose hyperparameters and evaluate their performance. The test set is used to evaluate the final performance and generalization ability of the final model, but must not be used as a basis for choices such as parameter tuning or feature selection; its role is realized during testing.
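A minimal sketch of such a 9:0.5:0.5 split is shown below; the function name, the shuffling and the fixed seed are assumptions for illustration.

```python
import random

def split_dataset(samples, ratios=(0.9, 0.05, 0.05), seed=0):
    """Shuffle the pooled real + simulated samples and split them into
    train / validation / test sets at the 9:0.5:0.5 ratio above."""
    samples = list(samples)
    random.Random(seed).shuffle(samples)
    n = len(samples)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    return (samples[:n_train],
            samples[n_train:n_train + n_val],
            samples[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(1000))
# 900 / 50 / 50 samples respectively
```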
S320, dividing each training image into a preset number of sub-images through a CNN network, and connecting corresponding quadrants of each sub-image through space coordinates of pixel points.
In the embodiment of the present application, each training image is divided into a preset number of sub-images through the CNN network. Taking a preset number of 4 as an example, each training image is divided into 4 sub-images, and the corresponding quadrants of each sub-image are then connected through the spatial coordinates of the pixels, for example to obtain sub-images of 320×320 pixels as the input.
S330, inputting the sub-images into the QuadPoseNet subnetworks respectively for processing, inputting the output of each QuadPoseNet subnetwork into the corresponding first fully connected layer, and inputting the outputs of the first fully connected layers into the second fully connected layer to obtain the pose parameters of the target.
Optionally, the QuadPoseNet subnetworks are QuadPoseNet1, QuadPoseNet2, QuadPoseNet3 and QuadPoseNet4. Each QuadPoseNet subnetwork includes, in order, eight convolution layers (denoted the first to eighth convolution layers), a first average pooling layer, a second average pooling layer, a third fully connected layer and a fourth fully connected layer. For example, the first convolution layer has a 5×5 filter size and 6 convolution kernels and uses a Sigmoid activation function; the pooling after it can alternatively be max pooling with a 2×2 filter and a stride of 2, while the first average pooling layer uses 4×4 average pooling; the second to eighth convolution layers have 5×5 filter sizes with 16 or 6 convolution kernels; and the third and fourth fully connected layers use Sigmoid activation functions.
Thus, the sub-images are input into the QuadPoseNet subnetworks respectively. After processing by the first convolution layer, a ReLU output is passed into the first average pooling layer and then into the third fully connected layer (FC1) to obtain a first result. The output of the first convolution layer is further processed by the second to eighth convolution layers, the output of the eighth convolution layer is fed into the second average pooling layer, and the output of the second average pooling layer is fed into the fourth fully connected layer (FC2) to obtain a second result; the output of each QuadPoseNet subnetwork comprises the first result and the second result. The first result and the second result are then input into the corresponding first fully connected layer (2×1024) to obtain its output, and the outputs of all the first fully connected layers are input into the second fully connected layer (8×1024, a regression layer) to obtain the pose parameters of the target and a classification result (e.g., whether a certain scene or a certain object is contained, given as a segmentation mask, a binary image whose pixel values indicate whether the object is present in the image). Optionally, the pose parameters of the target may include 3 degrees of freedom in the three directions of the x, y and z axes (Motion 3DoF) and may comprise translation, rotation and scaling parameters of the target. These parameters can be represented formally as a pose matrix or a pose vector: the pose matrix may include a rotation matrix, a translation vector and scaling parameters, providing a comprehensive description of the pose transformation of the target in three dimensions, while the pose vector may include a rotation axis, a rotation angle, a translation vector and scaling parameters.
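The branch structure described above can be sketched in PyTorch as follows. This is a minimal sketch reconstructed from the description, not the patent's exact architecture: the size-preserving padding, the channel plan where the text says "16 or 6", and the 1024-dimensional heads are assumptions.

```python
import torch
import torch.nn as nn

class QuadPoseNetBranch(nn.Module):
    """One QuadPoseNet subnetwork: eight 5x5 conv layers, two average
    pooling layers and two fully connected heads (FC1 on the early
    features, FC2 on the deep ones), as described above."""
    def __init__(self, out_dim=1024):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(1, 6, 5, padding=2), nn.Sigmoid())
        self.pool1 = nn.AvgPool2d(4)                    # 4x4 average pooling
        widths = [6, 16, 16, 16, 16, 6, 6, 6]           # assumed 16/6 channel plan
        self.conv2to8 = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(widths[i], widths[i + 1], 5, padding=2),
                          nn.ReLU())
            for i in range(7)                           # second to eighth conv layers
        ])
        self.pool2 = nn.AvgPool2d(4)
        self.fc1 = nn.Sequential(nn.LazyLinear(out_dim), nn.Sigmoid())  # FC1
        self.fc2 = nn.Sequential(nn.LazyLinear(out_dim), nn.Sigmoid())  # FC2

    def forward(self, x):                               # x: (N, 1, 320, 320)
        f1 = self.conv1(x)
        first = self.fc1(self.pool1(f1).flatten(1))     # "first result"
        deep = self.conv2to8(f1)
        second = self.fc2(self.pool2(deep).flatten(1))  # "second result"
        return torch.cat([first, second], dim=1)        # fed to the 2x1024 FC layer

branch = QuadPoseNetBranch()
out = branch(torch.randn(2, 1, 320, 320))               # shape: (2, 2048)
```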
And S340, determining a first loss value according to the pose parameters, the pose information and the loss function.
Optionally, a regression output layer with a mean squared error (MSE) loss function may be used to measure the estimation error during training. Since a conventional regression layer may not take the physical constraints of the motion parameters into account during training, the estimation result may, in some extreme cases, deviate from the actual motion range. To cope with this, the method modifies the regression layer to introduce a penalty mechanism over a predefined range of the motion parameters. The first loss value $E$ is calculated as:

$$E = \frac{1}{S}\sum_{k=1}^{S}\sum_{r=1}^{R}\left(\hat{y}_{k,r} - y_{k,r}\right)^{2}$$

where $\hat{y}_{k,r}$ is the estimate of each degree of freedom in the pose parameters (the x-axis, y-axis and z-axis degrees of freedom of the $k$-th sample), $y_{k,r}$ is the quantized value of the corresponding degree of freedom in the pose information, $k$ is the index of the training sample in the mini-batch, $S$ is the size of the mini-batch, and $R$ is the number of parameters to be estimated, with $R = 3$.
It should be noted that the motion penalty, like weight decay, is a regularization technique used to constrain the model's weights during neural network training. By adding an extra term to the loss function, it limits the values of the network parameters, preventing overfitting, improving the generalization performance of the model, controlling model complexity and increasing robustness on shared features. The motion penalty is an effective regularization means that helps balance model complexity against generalization and improves performance on unseen data.
In addition, the regression layer is modified to penalize the estimate when the motion parameters fall outside a predefined range, and each degree of freedom can be penalized separately. The penalty $p_j$ describing each degree of freedom is:

$$p_j = \alpha\left(\max\left(0,\, B_1 - \hat{y}_j\right) + \max\left(0,\, \hat{y}_j - B_2\right)\right)$$

where $\alpha$ is a penalty parameter set to 100, and $B_1$ and $B_2$ are the boundaries defining the non-penalized interval, set to -10 and 10 respectively. The loss equation with the penalty interval is then:

$$E_j = \frac{1}{S}\sum_{k=1}^{S}\left(\hat{y}_{k,j} - y_{k,j}\right)^{2} + p_j$$

where $j$ is $x$, $y$ or $z$, representing the penalty for the degree of freedom of the $j$-th coordinate axis; that is, in some embodiments $E_j$ may be used as the loss function to calculate the first loss value.
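A minimal PyTorch sketch of such a penalized regression loss is given below, written under the assumption that the out-of-interval penalty grows linearly with the violation; the exact penalty form is not spelled out in the text.

```python
import torch

def penalized_mse_loss(pred, target, b1=-10.0, b2=10.0, alpha=100.0):
    """MSE over the S x R (mini-batch x 3-DoF) estimates, plus a penalty
    whenever a predicted motion parameter leaves the [B1, B2] interval."""
    mse = torch.mean((pred - target) ** 2)
    below = torch.clamp(b1 - pred, min=0.0)   # violation below the lower bound
    above = torch.clamp(pred - b2, min=0.0)   # violation above the upper bound
    return mse + alpha * torch.mean(below + above)

pred = torch.tensor([[0.5, -12.0, 3.0]])      # one sample, 3 DoF; -12 is out of range
target = torch.zeros(1, 3)
loss = penalized_mse_loss(pred, target)       # MSE term plus penalty on the -12 entry
```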
And S350, training the composite pose detection network according to the first loss value until a first termination condition is reached.
Optionally, assuming that training once over all the training images constitutes one iteration, the first termination condition may be a loss threshold. For example, when the first loss value is not yet smaller than the loss threshold, a second iteration is performed over all the training images, and so on, until the first loss value is smaller than the loss threshold; the trained composite pose detection network is then determined from the network parameters at that point.
During training, an approximate range may be set for network hyperparameters, a hyperparameter value sampled randomly from that range, and the sampled value used to evaluate the recognition accuracy; this operation is then repeated several times, the recognition accuracy results are observed, and the required hyperparameter range is narrowed according to the results. By repeating this operation, a suitable hyperparameter range can be determined gradually and an optimal model obtained.
As shown in Fig. 6 and Fig. 7, taking the recognized object as an underwater human body as an example, the training images include human bodies in different scenes; Fig. 6 and Fig. 7 are, respectively, a schematic diagram of the confidence output by the trained composite pose detection network and a schematic diagram of the precision-confidence curve. The two curves are visualizations of target recognition evaluation indices used to analyze model performance at different confidence levels (Confidence); they differ in the index being evaluated.
(1) F1-Confidence curve:
The F1 score is the harmonic mean of Precision and Recall, taking into account both the accuracy and the completeness of the model:

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
confidence in target detection represents the degree of confidence of the model to the detection result. Typically, the model will give a confidence score when predicting the target, representing the probability that the model considers the prediction to be correct.
The F1-Confidence curve takes the confidence as the abscissa and the F1 score as the ordinate, and shows the change condition of the F1 score of the model under different confidence. By observing this curve, the overall performance of the model at different confidence levels can be appreciated, and appropriate confidence thresholds can be selected to balance accuracy and recall according to requirements.
On the curve there may be some fluctuation at the beginning, then gradually flattening and finally possibly dropping. Such shapes typically represent that at higher confidence, the performance of the model may begin to drop, as the model may filter out some real targets resulting in a reduced recall.
(2) Precision-Confidence curve:
The precision rate indicates how many of the samples predicted as positive by the model are truly positive, and is one of the important indices for evaluating model accuracy. It is calculated as:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

where $TP$ is the number of true positives and $FP$ the number of false positives.
the Precision-Confidence curve takes the confidence as the abscissa and the accuracy as the ordinate, and shows the change condition of the accuracy of the model under different confidence. By observing this curve, the prediction accuracy of the model at different confidence levels can be known, helping to select the appropriate confidence level threshold to meet the required accuracy requirement. The curve generally rises rapidly and then tends to plateau. In this case, the model tends to have a higher accuracy with a higher confidence. Eventually, the accuracy may tend to be 1, which means that the model has few false positives at high confidence.
In summary, the F1-Confidence curve focuses more on balancing the accuracy and completeness of the model (the combination of Precision and Recall), while the Precision-Confidence curve focuses more on the accuracy of the model (Precision).
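A sketch of how points on these two curves can be computed from per-detection confidences is shown below; the function name and evaluation conventions (per-detection true-positive flags and a global ground-truth count) are assumptions for illustration.

```python
import numpy as np

def confidence_curves(scores, tp_flags, n_gt, thresholds):
    """For each confidence threshold, keep detections scoring at least the
    threshold, then compute Precision and F1; `tp_flags` marks which
    detections match a ground-truth target and `n_gt` counts the targets."""
    scores = np.asarray(scores, dtype=float)
    tp_flags = np.asarray(tp_flags, dtype=bool)
    precisions, f1s = [], []
    for t in thresholds:
        keep = scores >= t
        tp = float(np.sum(tp_flags & keep))
        fp = float(np.sum(keep)) - tp
        precision = tp / max(tp + fp, 1.0)
        recall = tp / max(float(n_gt), 1.0)
        f1 = 2 * precision * recall / max(precision + recall, 1e-9)
        precisions.append(precision)
        f1s.append(f1)
    return np.array(precisions), np.array(f1s)
```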
In the embodiment of the application, to improve the robustness of the model, some noise, such as 10% partial occlusion and 20% background complexity, is introduced into the training set to simulate complex real-world situations and ensure that the model generalizes well.
In the embodiment of the application, appropriate evaluation indices, such as accuracy, precision, recall and F1 score, are selected according to the nature of the task. Other possible parameters are set, such as batch size or whether evaluation mode is enabled. The test data are input into the neural network, and the output of the model is computed by forward propagation, ensuring that no gradient updates are made during testing. The selected evaluation indices are calculated using the true labels of the test data and the predictions of the model; these indices provide a quantitative assessment of model performance. Depending on the nature of the task, the model's predictions may also be visualized, which helps in understanding its performance across different categories more fully. As shown in Fig. 8, taking human body recognition as an example, the average recognition confidence is 80%; the test results show that the network recognizes targets in sonar images with high accuracy and a good recognition effect. The results show that the CNN-4QuadPoseNet network model performs remarkably well in sonar image target recognition: the model can accurately locate and recognize underwater target objects in complex environments, and CNN-4QuadPoseNet shows clear improvements in accuracy and robustness.
In one embodiment, training the trajectory neural network in step S300 through the real sonar image dataset and the simulated image groups includes steps S301-S306:
S301, generating a training set according to a real sonar image data set and a simulation image data set.
It should be noted that the training set includes a plurality of training images, and the training images include real sonar images and images in the analog image group.
S302, dividing each training image according to the trajectory neural network to obtain a preset number of sub-images corresponding to each training image.
For example, the preset number is 4, and each training image is segmented through the trajectory neural network to obtain 4 sub-images.
It should be noted that, in the embodiment of the application, the independently designed PoseNet-Normx network is adopted as the trajectory neural network. Its design improvements and innovations can be expected to yield better performance on this specific task: it is better suited to inference on images acquired by underwater sonar and increases the accuracy of target detection and pose estimation. PoseNet-Normx is also more efficient, which helps accelerate training and inference and suits applications with high real-time requirements.
Meanwhile, the PoseNet-Normx neural network complements the CNN-4QuadPoseNet neural network: CNN-4QuadPoseNet exhibits lower computational efficiency and accuracy in underwater three-dimensional environments, so PoseNet-Normx is introduced to perform inference on the underwater three-dimensional environment and obtain typical label points of underwater motion.
PoseNet-Normx is composed of neurons, layers and weights, and performs its task by learning the mapping from input to output. Its main components are as follows. Input layer: accepts the raw input data, each neuron corresponding to one feature of the input. Hidden layers: the layers between the input and output layers, each containing a number of neurons connected by weights to the neurons of the previous layer; the number of hidden layers determines the depth of the network. Output layer: produces the final output; the number of its neurons typically corresponds to the number of task categories or the dimension of the regression target. Weights: each connection has an associated weight that determines the strength of the signal transmitted in the network; during training these weights are updated by the back-propagation algorithm. Activation functions: applied after each neuron to introduce nonlinearity; common activation functions include ReLU, Sigmoid and Tanh.
S303, performing compression filtering processing on the sub-images, and splicing the preset number of sub-images subjected to the compression filtering processing to obtain spliced training images.
S304, performing reasoning operation on all the spliced training images to obtain the driving track of the target.
In the embodiment of the application, compression filtering processing is performed on the sub-images through PoseNet-Normx, and the preset number of processed sub-images are spliced: for example, the 4 sub-images are processed by an image compression algorithm and a filtering algorithm and then spliced to obtain a spliced training image. Inference is then performed on the spliced training images to obtain motion data of the underwater moving target, including but not limited to the angle offsets about the x, y and z axes, and the driving track of the target is generated from these motion data. It should be noted that the target may be an object in the training image; alternatively, since the sonar device is mounted at a fixed position on the underwater robot (platform), the target may also be the underwater robot itself, whose motion data can be calculated from the information in the acquired images so as to estimate its running track.
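For illustration only, a sketch of this processing chain is given below. The application does not name the compression or filtering algorithms, so plain subsampling and a median filter are used here as stand-ins:

```python
import numpy as np
from scipy.ndimage import median_filter

def compress_and_filter(sub: np.ndarray, factor: int = 2) -> np.ndarray:
    """Stand-in compression (subsampling) followed by a stand-in speckle filter."""
    compressed = sub[::factor, ::factor]
    return median_filter(compressed, size=3)

def stitch(quadrants: list) -> np.ndarray:
    """Reassemble 4 processed quadrant sub-images into one spliced training image.
    Assumes all quadrants have matching shapes (even-sized source image)."""
    top = np.hstack([quadrants[0], quadrants[1]])
    bottom = np.hstack([quadrants[2], quadrants[3]])
    return np.vstack([top, bottom])
```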
S305, calculating a second loss value according to the motion information and the running track.
Similarly, taking the underwater robot as the target, the actual running track of the robot can be obtained from the motion information of the different images, and the second loss value can be calculated by comparing this actual running track with the estimated running track of the robot.
And S306, training the trajectory neural network according to the second loss value until a second termination condition is reached.
Similarly, the second termination condition may be a loss threshold. One pass over all the spliced training images counts as one iteration; if the second loss value does not satisfy the loss threshold, a further iteration over all the spliced training images is performed, until the second loss value is smaller than the loss threshold, at which point the trained trajectory neural network (PoseNet-Normx) is determined from the current network parameters.
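A minimal PyTorch sketch of this termination logic, with placeholder names for the network, loader and loss function:

```python
import torch

def train_until_threshold(net, loader, optimizer, loss_fn, loss_threshold, max_epochs=100):
    """One pass over all spliced training images is one iteration; stop once
    the mean second loss value falls below the loss threshold."""
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, motion_targets in loader:
            optimizer.zero_grad()
            loss = loss_fn(net(images), motion_targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < loss_threshold:
            break   # second termination condition reached; keep current parameters
    return net
```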
In the embodiment of the application, the trained PoseNet-Normx network is subjected to simulation verification:
using the sonar simulator, a new data set is generated for verification of the trajectory estimator.
A scene is selected to create a track of 904 noiseless images; the pitch angle and the height above the sea bottom are kept constant, the movement direction of the sonar is lateral, and the sonar moves along the track at a constant acceleration of 0.105-0.378 m/s². Each point of the trajectory is represented by global Cartesian coordinates (x_i, y_i) and a platform direction θ_i relative to the global scene direction, where θ_i = 0 corresponds to facing the y-axis direction. The first point of the track is placed at the origin (x_1 = 0, y_1 = 0, θ_1 = 0), and the coordinates are assumed to lie on a plane parallel to the sea floor. The track points are calculated recursively as follows:

x_{i+1} = x_i + Δx_i·cos θ_i - Δy_i·sin θ_i
y_{i+1} = y_i + Δx_i·sin θ_i + Δy_i·cos θ_i
θ_{i+1} = θ_i + Δθ_i

where Δx_i and Δy_i are the translation estimates and Δθ_i the rotation estimate obtained for each pair of images i and i+1, and θ_i is the platform direction relative to the global scene direction.
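For illustration, a short Python sketch of this recursive accumulation; the dead-reckoning form follows the reconstruction above and the names are illustrative:

```python
import math

def accumulate_trajectory(deltas):
    """deltas: per-image-pair (dx, dy, dtheta) estimates in platform coordinates.
    Returns global (x, y, theta) trajectory points, starting at the origin."""
    x, y, theta = 0.0, 0.0, 0.0
    points = [(x, y, theta)]
    for dx, dy, dtheta in deltas:
        # rotate the per-pair translation into the global frame, then advance
        x += dx * math.cos(theta) - dy * math.sin(theta)
        y += dx * math.sin(theta) + dy * math.cos(theta)
        theta += dtheta
        points.append((x, y, theta))
    return points
```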
The images are mapped to global coordinates according to the reconstructed trajectory. To generate the labels, the pixel intensity at each point is interpolated from the image pixels, and the intensities are averaged wherever images overlap. For clarity, only the pixels corresponding to the 24 center beams of each image are used for interpolation, except for the first and last images.
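A minimal numpy sketch of the overlap-averaging step, using nearest-neighbor accumulation as a stand-in for the interpolation described above; the grid shape and resolution are assumptions:

```python
import numpy as np

def accumulate_global_map(points_xy, intensities, grid_shape, resolution=0.05):
    """Accumulate pixel intensities on a global grid and average where
    several images contribute to the same cell."""
    acc = np.zeros(grid_shape)
    count = np.zeros(grid_shape)
    for (x, y), val in zip(points_xy, intensities):
        i, j = int(round(y / resolution)), int(round(x / resolution))
        if 0 <= i < grid_shape[0] and 0 <= j < grid_shape[1]:
            acc[i, j] += val
            count[i, j] += 1
    return np.divide(acc, count, out=np.zeros_like(acc), where=count > 0)
```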
The performance of the DL method in pose and trajectory estimation was verified using a ship-bottom dataset obtained with a real sonar. The PoseNet-Normx network is trained using simulated ship hulls. The network is trained three times, with different levels of noise applied to the images: no noise, low-level noise and high-level noise, the levels matching those measured in the real dataset.
As shown in fig. 9, the embodiment of the present application further provides an estimation method, including steps S410 to S420:
s410, collecting real-time sonar images.
Sonar images are acquired in real time, for example by the sonar equipment on an underwater robot.
S420, inputting the real-time sonar images into a composite gesture detection network and a trajectory neural network, and estimating to obtain gesture parameters of the target and a driving trajectory of the target.
The composite gesture detection network and the trajectory neural network are obtained by the training method of step S300.
By the method provided by the embodiment of the application, at least the following effects can be achieved:
1) The images generated by the sonar simulator participate in the training of the deep learning model. In this process, by learning the characteristics of the synthesized dataset, the model acquires the ability to extract information from sonar images and to estimate the navigation track of the platform accurately; this trajectory estimation and reconstruction process is of key significance for real navigation tasks, especially in complex environments and under limited observation conditions;
2) After the navigation track is estimated and restored, the sonar images can be further annotated, which further improves the interpretability of the sonar images and the accuracy of the navigation information, and enhances the ability to understand and analyze sonar data in real scenes;
3) The method has remarkable academic and application value in processing monochromatic, low-resolution sonar images. Unlike previous optical image estimation, sonar images are single-colored and of low resolution, so they carry less motion estimation information, and a new architecture is established to process them. The algorithm improves the safety and reliability of autonomous underwater navigation and of remotely operated underwater vehicles;
4) Aiming at the shortcomings of the existing CNN and GeoNet-PoseNet approaches, a CNN-4QuadPoseNet structure is provided. Compared with the CNN-4b network, the CNN-4QuadPoseNet network model proposed in this research occupies fewer inference resources, significantly improves data processing and computation speed, and identifies underwater targets more accurately. In general, CNN-4QuadPoseNet: first, reduces model complexity by reducing the number of network layers, reducing the number of parameters or adopting smaller convolution kernels, thereby cutting the amount of computation and achieving a lightweight network design; second, uses parallel computing to decompose the model's computation into multiple parts executed in parallel, improving computational efficiency; third, designs a feature extraction method better suited to underwater object identification according to the characteristics of the underwater environment, for example by considering the influence of underwater illumination and water quality on object features and applying corresponding data enhancement. The specific improvements are as follows:
① Principle of network structure design
CNN-4QuadPoseNet is a deep neural network based on a four-branch QuadPose structure, designed to make full use of the spatial position and posture information of the target object to improve the accuracy of posture estimation. The QuadPose structure contains four parallel convolutional neural network branches, which extract the position (P), rotation (R), scale (S) and features (F) of the target, respectively. This design aims to learn and represent the position and posture information of the target object effectively and separately, so as to estimate the target's posture more accurately.
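For illustration only, a sketch of such a four-branch head is given below; the branch layouts and output dimensions (3D position, quaternion rotation, scalar scale, 16-dimensional feature) are assumptions, not the disclosed architecture:

```python
import torch
import torch.nn as nn

class QuadPoseHead(nn.Module):
    """Four parallel branches for position (P), rotation (R), scale (S), feature (F)."""
    def __init__(self, in_channels: int = 64):
        super().__init__()
        def branch(out_dim):
            return nn.Sequential(
                nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(32, out_dim),
            )
        self.position = branch(3)    # P: x, y, z
        self.rotation = branch(4)    # R: quaternion
        self.scale = branch(1)       # S: scalar scale
        self.feature = branch(16)    # F: feature descriptor

    def forward(self, x):
        return self.position(x), self.rotation(x), self.scale(x), self.feature(x)
```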
② Selection of an optimization algorithm
In training CNN-4QuadPoseNet, stochastic gradient descent with momentum (SGD with momentum), an optimization algorithm well suited to deep neural networks, is selected to speed up the convergence of the model and avoid becoming trapped in local minima. In addition, a learning rate scheduling strategy, learning rate decay, is employed to better balance training speed and model performance.
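This combination maps directly onto standard deep learning tooling; a minimal PyTorch sketch follows, in which the stand-in model and the hyperparameter values are placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)   # stand-in for CNN-4QuadPoseNet
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for epoch in range(90):
    x, y = torch.randn(8, 10), torch.randn(8, 2)   # dummy batch
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()    # SGD with momentum update
    scheduler.step()    # learning rate decay: lr shrinks by 10x every 30 epochs
```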
③ Improvement of feature extraction method
CNN-4QuadPoseNet uses an improved feature extraction method to better capture the position and attitude information of underwater objects. The improvements include introducing multi-scale convolution kernels and nonlinear activation functions (e.g., Leaky ReLU) matched to the characteristics of underwater target objects, to enhance the network's ability to represent complex objects. At the same time, by introducing residual connections and an attention mechanism, the network can better focus on the important parts of the target object, thereby improving the accuracy of pose estimation.
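A minimal sketch of a block combining these ideas (multi-scale kernels, Leaky ReLU, a residual connection) is given below; the attention mechanism is omitted and the channel split is an assumption:

```python
import torch
import torch.nn as nn

class MultiScaleResidualBlock(nn.Module):
    """Parallel 3x3 and 5x5 kernels, Leaky ReLU, and a residual connection.
    Assumes an even channel count so the two branches concatenate back exactly."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels // 2, kernel_size=3, padding=1)
        self.conv5 = nn.Conv2d(channels, channels // 2, kernel_size=5, padding=2)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x):
        multi = torch.cat([self.conv3(x), self.conv5(x)], dim=1)  # multi-scale features
        return self.act(multi) + x                                # residual connection
```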
5) The PoseNet-Normx network can infer the pose and track of the underwater robot in noise-free, low-level-noise and high-level-noise environments, improving the safety and reliability of autonomous underwater navigation and of remotely operated underwater vehicles, as shown in fig. 10. With the trained network, a good annotation can be produced even when the sonar characteristics differ from those of the training set. The path of the underwater robot is recorded every 1 m of forward travel along its preset route, and a track curve is then generated. It should be noted that the sonar is designed to remain facing forward as it moves along the track, ensuring a consistent sonar direction across the sequence of successive images; this design helps maintain consistency of the observations. The data collected by the sonar serve as the input to PoseNet-Normx, and the network's inference result is taken as the motion path of the underwater robot: in fig. 10 the connecting line is the predicted motion path, and black arrows indicate the sonar direction every 50 images.
TABLE 1
As shown in table 1, the recognition rate and the accuracy after the improvement rise significantly, by 10% and 13% respectively, compared with before the improvement, showing that the improved algorithm effectively strengthens the system's ability to recognize underwater targets accurately. The recognition speed is also clearly faster, dropping from 5 seconds per image to 3 seconds per image; this improved real-time performance makes the system better suited to underwater sonar monitoring in complex environments. The video frame rate increases from 12 fps to 18 fps, indicating that the system is more responsive and can process a continuous sonar image stream more quickly.
Referring to FIG. 11, there is shown a block diagram of a sonar-based model training device of an embodiment of the present application, which may include:
The acquisition module is used for acquiring a real sonar image data set and generating a simulation image data set through a sonar simulator, wherein the simulation image data set comprises a plurality of first simulation images of different underwater rock scenes and a plurality of second simulation images of different underwater scenes at the bottom of the ship, and the real sonar image data set, the first simulation images and the second simulation images all have tag information which comprises gesture information and motion information;
The combination module is used for respectively classifying and combining a plurality of first simulation images of different underwater rock scenes and a plurality of second simulation images of different underwater scenes at the bottom of the ship to obtain a plurality of simulation image groups;
the training module is used for training the composite gesture detection network and the track neural network through the real sonar image data set and the simulation image set;
The composite gesture detection network is used for estimating gesture parameters of the target, and the trajectory neural network is used for estimating the driving trajectory of the target.
In the embodiment of the application, the training module is further used for:
acquiring pixel point information of each real sonar image in the real sonar image data set, wherein the pixel point information comprises sector angle information of each pixel point;
according to the sector angle information of each pixel point, the pixel points of the real sonar image are ordered according to angles;
Based on the sectors, the pixel points in each real sonar image are spliced using the sorting result, so as to obtain a plurality of new real sonar images.
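For illustration only, a minimal numpy sketch of the angle ordering step, assuming per-pixel beam angles are available; the names and data layout are illustrative:

```python
import numpy as np

def reorder_by_sector_angle(intensities: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Sort pixel intensities by sector (beam) angle so they can be
    spliced back into an angle-consistent fan image."""
    order = np.argsort(angles)
    return intensities[order]

angles = np.array([0.3, -0.3, 0.0, 0.1])     # radians, one per pixel
pixels = np.array([10, 20, 30, 40])
print(reorder_by_sector_angle(pixels, angles))   # [20 30 40 10]
```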
The functions of each module in each device of the embodiments of the present application may be referred to the corresponding descriptions in the above methods, and are not described herein again.
Referring to fig. 12, a block diagram of an electronic device according to an embodiment of the present application is shown, the electronic device including: memory 310 and processor 320, memory 310 stores instructions executable on processor 320, and processor 320 loads and executes the instructions to implement the sonar-based model training method or the estimation method in the above embodiments. Wherein the number of memory 310 and processors 320 may be one or more.
In one embodiment, the electronic device further includes a communication interface 330 for communicating with an external device for data interactive transmission. If the memory 310, the processor 320 and the communication interface 330 are implemented independently, they may be connected to each other and communicate with each other through buses. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, or an Extended Industry Standard Architecture (EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown in fig. 12, but this does not mean there is only one bus or one type of bus.
Alternatively, in a specific implementation, if the memory 310, the processor 320, and the communication interface 330 are integrated on a chip, the memory 310, the processor 320, and the communication interface 330 may communicate with each other through internal interfaces.
An embodiment of the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the sonar-based model training method or the estimating method provided in the above embodiment.
The embodiment of the application also provides a chip, which comprises a processor and is used for calling the instructions stored in the memory from the memory and running the instructions stored in the memory, so that the communication equipment provided with the chip executes the method provided by the embodiment of the application.
The embodiment of the application also provides a chip, which comprises: the input interface, the output interface, the processor and the memory are connected through an internal connection path, the processor is used for executing codes in the memory, and when the codes are executed, the processor is used for executing the method provided by the application embodiment.
It should be appreciated that the processor may be a central processing unit (Central Processing Unit, CPU), another general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor or any conventional processor. It is noted that the processor may be a processor supporting the advanced RISC machines (Advanced RISC Machines, ARM) architecture.

Further, optionally, the memory may include a read-only memory and a random access memory, and may further include a nonvolatile random access memory. The memory may be volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may include a read-only memory (read-only memory, ROM), a programmable ROM (programmable ROM, PROM), an erasable programmable ROM (erasable PROM, EPROM), an electrically erasable programmable ROM (electrically EPROM, EEPROM), or a flash memory. Volatile memory may include random access memory (random access memory, RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, for example static random access memory (static RAM, SRAM), dynamic random access memory (dynamic RAM, DRAM), synchronous dynamic random access memory (synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (synchlink DRAM, SLDRAM), and direct Rambus random access memory (direct Rambus RAM, DR RAM).
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions in accordance with the present application are fully or partially produced. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable devices. Computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another.
In the description of the present specification, a description referring to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include at least one such feature. In the description of the present application, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code which includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application also includes implementations in which functions are performed in a substantially simultaneous manner or in the reverse order from that shown or discussed, depending on the functions involved.

The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, a processor-containing system, or another system that can fetch the instructions from the instruction execution system, apparatus, or device and execute them.
It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. All or part of the steps of the methods of the embodiments described above may be performed by a program that, when executed, comprises one or a combination of the steps of the method embodiments, instructs the associated hardware to perform the method.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules described above, if implemented in the form of software functional modules and sold or used as a stand-alone product, may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic or optical disk, or the like.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that various changes and substitutions are possible within the scope of the present application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (10)

1. A sonar-based model training method, characterized by comprising the following steps:
Acquiring a real sonar image data set, and generating a simulation image data set through a sonar simulator, wherein the simulation image data set comprises a plurality of first simulation images of different underwater rock scenes and a plurality of second simulation images of different underwater scenes at the bottom of a ship, and the real sonar image data set, the first simulation images and the second simulation images all have tag information, and the tag information comprises gesture information and motion information;
respectively classifying and combining a plurality of first simulation images of different underwater rock scenes and a plurality of second simulation images of different underwater scenes at the bottom of the ship to obtain a plurality of simulation image groups;
Training a composite gesture detection network and a trajectory neural network through the real sonar image dataset and the simulated image set;
the composite gesture detection network is used for estimating gesture parameters of the target, and the trajectory neural network is used for estimating the driving trajectory of the target.
2. The sonar-based model training method of claim 1, wherein: the generating, by the sonar simulator, a simulated image dataset includes one of:
The method comprises the steps of inputting first sensing data of different underwater rock scenes and second sensing data of different underwater scenes at the bottom of a ship, which are acquired in real time through a sonar sensor, into a sonar simulator to generate a simulated image data set;
Generating a first sonar image of different underwater rock scenes and a second sonar image of different underwater scenes at the bottom of a ship through a sonar model, and inputting the first sonar image and the second sonar image into a sonar simulator to generate a simulated image data set;
acquiring first historical sensing data of different underwater rock scenes and second historical sensing data of different underwater scenes at the bottom of a ship, and inputting the first historical sensing data and the second historical sensing data into a sonar simulator to generate a simulation image data set.
3. The sonar-based model training method of claim 1, wherein: the classifying and combining the first simulation images of different underwater rock scenes and the second simulation images of different underwater scenes at the bottom of the ship respectively to obtain a plurality of simulation image groups comprises:
sequentially arranging a plurality of first simulation images corresponding to each underwater rock scene according to time or space, and sequentially arranging a plurality of second simulation images corresponding to each underwater scene at the bottom of the ship according to time or space;
respectively carrying out first grouping on a plurality of first simulation images which are sequentially arranged in each underwater rock scene, and respectively carrying out second grouping on a plurality of second simulation images which are sequentially arranged in each underwater scene at the bottom of the ship;
And determining a plurality of simulation image groups according to the first grouping result and the second grouping result.
4. A sonar-based model training method according to any of claims 1-3, characterized in that: the method further comprises the steps of:
acquiring pixel point information of each real sonar image in the real sonar image data set, wherein the pixel point information comprises sector angle information of each pixel point;
according to the sector angle information of each pixel point, the pixel points of the real sonar image are ordered according to angles;
Based on the sector, the pixel points in each real sonar image are respectively spliced by using the sequencing result, so that a plurality of new real sonar images are obtained.
5. A sonar-based model training method according to any of claims 1-3, characterized in that: the composite gesture detection network includes a CNN network, a preset number of four-in-one networks, a preset number of first fully connected layers and a second fully connected layer, and training the composite gesture detection network through the real sonar image dataset and the simulated image groups includes:
Generating a training set according to the real sonar image data set and the simulated image data set, wherein the training set comprises a plurality of training images, and the training images comprise real sonar images and images in a simulated image group;
dividing each training image into a preset number of sub-images through the CNN network, and connecting corresponding quadrants of each sub-image through space coordinates of pixel points;
inputting the sub-images into the four-in-one networks respectively for processing, inputting the output result of each four-in-one network into the corresponding first fully connected layer, and inputting the output results of the first fully connected layers into the second fully connected layer to obtain the gesture parameters of the target;
Determining a first loss value according to the attitude parameter, the attitude information and the loss function;
Training the composite gesture detection network according to the first loss value until a first termination condition is reached.
6. A sonar-based model training method according to any of claims 1-3, characterized in that: the training of the trajectory neural network through the real sonar image dataset and the simulated image groups comprises:
Generating a training set according to the real sonar image data set and the simulated image data set, wherein the training set comprises a plurality of training images, and the training images comprise real sonar images and images in a simulated image group;
Dividing each training image according to a track neural network to obtain a preset number of sub-images corresponding to each training image;
Performing compression filtering processing on the sub-images, and splicing the preset number of sub-images subjected to the compression filtering processing to obtain spliced training images;
Carrying out reasoning operation on all the spliced training images to obtain a driving track of the target;
Calculating a second loss value according to the motion information and the running track;
training the trajectory neural network according to the second loss value until a second termination condition is reached.
7. An estimation method, comprising:
collecting a real-time sonar image;
Inputting the real-time sonar image into the composite gesture detection network and the trajectory neural network trained by the method of any one of claims 1-6, and estimating the gesture parameters of the target and the driving trajectory of the target.
8. A sonar-based model training device, characterized by comprising:
the acquisition module is used for acquiring a real sonar image data set and generating a simulated image data set through a sonar simulator, wherein the simulated image data set comprises a plurality of first simulated images of different underwater rock scenes and a plurality of second simulated images of different underwater scenes at the bottom of the ship, and the real sonar image data set, the first simulated images and the second simulated images all have tag information which comprises gesture information and motion information;
The combination module is used for respectively classifying and combining a plurality of first simulation images of different underwater rock scenes and a plurality of second simulation images of different underwater scenes at the bottom of the ship to obtain a plurality of simulation image groups;
The training module is used for training a composite gesture detection network and a trajectory neural network through the real sonar image data set and the simulation image set;
the composite gesture detection network is used for estimating gesture parameters of the target, and the trajectory neural network is used for estimating the driving trajectory of the target.
9. An electronic device, comprising: a processor and a memory in which instructions are stored, the instructions being loaded and executed by the processor to implement the method of any one of claims 1-7.
10. A computer readable storage medium having stored therein a computer program which when executed implements the method of any of claims 1-7.
CN202410525266.1A 2024-04-29 2024-04-29 Sonar-based model training method, estimating device, device and storage medium Pending CN118097342A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410525266.1A CN118097342A (en) 2024-04-29 2024-04-29 Sonar-based model training method, estimating device, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410525266.1A CN118097342A (en) 2024-04-29 2024-04-29 Sonar-based model training method, estimating device, device and storage medium

Publications (1)

Publication Number Publication Date
CN118097342A true CN118097342A (en) 2024-05-28

Family

ID=91147900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410525266.1A Pending CN118097342A (en) 2024-04-29 2024-04-29 Sonar-based model training method, estimating device, device and storage medium

Country Status (1)

Country Link
CN (1) CN118097342A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842506A (en) * 2022-04-19 2022-08-02 中国科学院声学研究所 Human body posture estimation method and system
CN115660027A (en) * 2022-10-10 2023-01-31 北京邮电大学 Multi-device sea area target data generation method and system supporting small samples
CN116152651A (en) * 2023-02-16 2023-05-23 哈尔滨工程大学 Sonar target identification and detection method based on image identification technology and applied to ocean
CN116740490A (en) * 2023-05-25 2023-09-12 武汉大学 Side-scan sonar image sample amplification and target detection method based on sonar equation



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination