CN112329615A - Environment situation evaluation method for autonomous underwater visual target grabbing - Google Patents

Environment situation evaluation method for autonomous underwater visual target grabbing

Info

Publication number
CN112329615A
Authority
CN
China
Prior art keywords
network
image
underwater
target
source image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011214592.9A
Other languages
Chinese (zh)
Other versions
CN112329615B (en)
Inventor
王楠
杨学文
崔燕妮
胡文杰
辛国玲
张兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202011214592.9A
Publication of CN112329615A
Application granted
Publication of CN112329615B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J15/00Gripping heads and other end effectors
    • B25J15/08Gripping heads and other end effectors having finger members
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666Avoiding collision or forbidden zones
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of computer vision, and particularly discloses an environment situation evaluation method for autonomous grabbing of an underwater visual target. The method calibrates dangerous object position information and risk coefficient evaluation grade information of the underwater environment in advance to train a target detection and recognition network N1. The trained network N1 can identify the positions of dangerous objects and their risk coefficient evaluation grades in an underwater source image shot by any monocular camera, and a corresponding environment situation evaluation map is generated in combination with the depth estimation image; the loss between this map and the environment situation evaluation truth image is used to optimize an information fusion network N2. The environment situation evaluation map generated by the optimized information fusion network N2 can serve as an important support for subsequent underwater operation tasks such as path planning, autonomous obstacle avoidance and grabbing, thereby guiding the robot to realize optimal behavior at a higher level.

Description

Environment situation evaluation method for autonomous underwater visual target grabbing
Technical Field
The invention relates to the technical field of computer vision, in particular to an environment situation evaluation method for autonomous underwater visual target grabbing.
Background
A mechanical arm that performs operation tasks in a complex and dynamic underwater environment must be able to intelligently analyze the external environment. Risk assessment of the surrounding environment is an important guarantee of the safety of the mechanical arm and the underwater carrier and of the operation completion rate. Meanwhile, environmental situation assessment can also be regarded as the process by which the robot recognizes and models its surroundings, and it is an important support for subsequent path planning. Compared with the traditional artificial potential field method, behavior decomposition method, optimization-based algorithms and the like, environmental situation assessment can guide the robot to realize optimal behavior at a higher level. At present, for underwater mechanical arm operation, the factors considered are not comprehensive enough, and there is no relatively complete environmental situation assessment strategy that can reasonably plan a path for the robot and achieve autonomous obstacle avoidance and grabbing.
Disclosure of Invention
The invention provides an environmental situation evaluation method for autonomous underwater visual target grabbing, which solves the following technical problem: at present, for underwater mechanical arm operation, the factors considered are not comprehensive enough, and there is no relatively complete environmental situation assessment strategy that can reasonably plan a path for the robot and achieve autonomous obstacle avoidance and grabbing.
In order to solve this technical problem, the invention provides an environmental situation assessment method for autonomous underwater visual target grabbing, which comprises the following steps:
S1: collecting underwater source images of various underwater dynamic scenes with a monocular camera to generate an underwater source image data set;
S2: performing distance estimation on each underwater source image in the underwater source image data set to obtain a corresponding depth estimation image;
S3: determining the dangerous object position information and danger coefficient evaluation grade information of each underwater source image, and generating an environment situation evaluation truth image in combination with the depth estimation image;
S4: training a target detection and recognition network N1 by using each underwater source image and the corresponding dangerous object position information and danger coefficient evaluation grade information;
S5: using the information generated by the trained target detection and recognition network N1, generating an environment situation evaluation map in combination with the depth estimation image corresponding to each underwater source image, and optimizing an information fusion network N2 by using the loss between the environment situation evaluation map and the environment situation evaluation truth image.
Further, the step S2 specifically includes the steps of:
S21: carrying out restoration processing on the underwater source image I by adopting a maximum attenuation identification algorithm to obtain a restored image J;
S22: extracting the red channel J_R of the restored image J and the red channel I_R of the underwater source image I respectively, and calculating a distance coefficient d from J_R and I_R:
[Equation image: distance coefficient d computed from J_R and I_R]
S23: normalizing the distance coefficient d to obtain the depth estimation image.
Further, in step S21, the maximum attenuation identification algorithm specifically comprises the steps of:
S211: estimating the global background light A; specifically:
1) filtering the R channel of the underwater source image I by using a maximum filter with an adjustable window size to obtain a corresponding depth image;
2) dividing the image into v × w image blocks; for each image block, finding the 10% of pixel points with the lowest brightness in the corresponding depth image, locating them in the underwater source image I, and obtaining the background light of that image block from these pixel points;
3) integrating the background lights of all image blocks to estimate the global background light A_R(x) of the R channel;
4) estimating the global background light A_G(x) of the G channel and the global background light A_B(x) of the B channel of the underwater source image I by the same principle and steps as for the R channel;
S212: estimating the propagation coefficient ξ; specifically:
1) estimating the propagation coefficient ξ_R(x) of the R channel of the underwater source image according to
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x)),
where Ω(x) represents a local area, y represents the position of a pixel point in the local area, and I_R(y) represents the pixel value of that pixel point in the R channel;
2) calculating the propagation coefficient ξ_G(x) of the G channel and the propagation coefficient ξ_B(x) of the B channel of the underwater source image by the same principle and steps as for the R channel;
S213: obtaining the restored image J according to the underwater light propagation model I(x) = J(x)ξ(x) + A(1 − ξ(x)), where x is the position of a pixel in the underwater source image I and the restored image J.
Further, in the step S3:
the position information of the dangerous object adopts the coordinates of the central point of the target frame and the length and width description of the target frame;
the risk coefficient evaluation grade information calibrates the risk of each target into at least 5 risk grades, ordered from high to low risk, according to the degree to which the target in the underwater source image data set influences the autonomous operation, and the corresponding risk grade is used as the classification label of that target;
the environment situation evaluation truth image of each underwater source image is calculated according to GT = D · Σ_i η_i N(i | μ_i, σ), where D is the depth estimation image generated in step S2, η_i is the classification label of the i-th target in the underwater source image, N(i | μ_i, σ) is the i-th two-dimensional Gaussian distribution, μ_i is the center coordinate of the i-th target frame, and σ is the standard deviation, which can be set adaptively according to the image size.
Further, in the step S4, the target detection and identification network N1 adopts Faster RCNN; the Faster RCNN network mainly comprises a region generation network and a Fast RCNN network; the region generation network, namely the RPN network, consists of a feature extraction network, a parallel first classification network and a first bounding box regression network; the Fast RCNN network consists of the feature extraction network, an ROI Pooling network, a parallel second classification network and a second bounding box regression network; the RPN network and the Fast RCNN network share the feature extraction network;
the working process of the RPN network is as follows:
1) the feature extraction network adopts a pre-training network model, inputs the underwater source image and outputs a feature map extracted from the underwater source image;
2) each position in the feature map corresponds to 9 prior frames in the underwater source image; the first classification network performs binary classification on all the prior frames by adopting a first softmax network to judge whether each prior frame contains a target;
3) the first bounding box regression network generates bounding box regression parameters of a prior frame containing a target so as to correct the prior frame to obtain a candidate frame;
the working process of the Fast RCNN network comprises the following steps:
1) according to the coordinates of each candidate frame, the ROI Pooling network extracts the feature area corresponding to that candidate frame from the feature map, divides the feature area into n × n parts, and adjusts it to an n × n size by using a max pooling method;
2) the second classification network classifies the risk coefficient of the dangerous object by adopting a second softmax network;
3) and the second bounding box regression network generates the bounding box regression parameters of the candidate box.
Further, in the step S4, training the target detection and recognition network N1 specifically comprises the steps of:
s41: training the RPN network; the method specifically comprises the following steps:
1) screening the candidate frames obtained by the RPN network, removing the candidate frames near the boundary, and then screening by non-maximum suppression;
2) extracting N_cls candidate boxes from the screened candidate boxes for calculating the loss function L_RPN of the RPN network; the loss function L_RPN consists of the first classification loss L_cls and the first bounding box regression loss L_loc:
L_RPN({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_loc(t_i, t_i*)
where i denotes the index of the candidate box, p_i is the probability that the i-th candidate box is predicted to be a target region, p_i* is the classification truth value of whether the candidate box contains a target, t_i is the bounding box regression parameter of the i-th candidate box, and t_i* is the bounding box regression parameter truth value corresponding to the i-th candidate box; N_cls, the number of extracted candidate boxes, is set to 256, and N_reg is 2400; λ is a balance coefficient: in practice N_cls and N_reg differ too much, so the two terms are balanced by the parameter λ (λ = 10); the first classification loss L_cls is described by a cross-entropy loss and trains the network to classify whether the candidate box contains a target (positive or negative); the first bounding box regression loss L_loc is described by a smooth L1 loss and trains the first bounding box regression network;
S42: fixing the parameters of the first classification network and the first bounding box regression network in the RPN network, initializing the feature extraction network with pre-trained network model parameters, training the Fast RCNN network with the candidate boxes provided by the RPN network, and calculating the loss function L_Fast-RCNN of the Fast RCNN network from the classification result of the second softmax network and the bounding box regression parameters of the second bounding box regression network, the calculation formula of L_Fast-RCNN being the same as that of the loss function L_RPN of the RPN network;
s43: fixing the parameters of the feature extraction network in the step S42, and fine-tuning the parameters of the first classification network and the first bounding box regression network in the RPN network;
s44: fixing the parameters of the feature extraction network in the step S42, and fine-tuning the parameters of the ROI Pooling network, the second classification network, and the second bounding box regression network in the Fast RCNN network.
Further, the step S5 specifically includes the steps of:
S51: using the trained target detection and recognition network N1 to output new dangerous object position information and risk coefficient evaluation grade information, and obtaining a mask image from them; specifically, the mask image is initialized with 0 pixel values, the pixel value of each predicted target area in the mask image is set to the number corresponding to the predicted risk level, and the result is normalized;
S52: inputting the mask image obtained in step S51 and the depth estimation image obtained in step S2 into an information fusion network N2 for feature extraction and fusion, to generate an environment situation evaluation map;
S53: calculating a situation evaluation loss function L between the environment situation evaluation map generated in step S52 and the environment situation evaluation truth image generated in step S3, and updating the parameters of the information fusion network N2 by gradient back propagation; the situation assessment loss function L is:
[Equation image: situation assessment loss L accumulated pixel-wise over all p ∈ P from y(p) and gt(p)]
where P denotes the coordinate positions of all pixel points in the environment situation evaluation map, p denotes the p-th pixel point in P, y(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation map, and gt(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation truth image.
The invention provides an environmental situation assessment method for autonomous underwater visual target grabbing. First, a monocular camera is used to collect underwater source images of various underwater dynamic scenes as a data set (step S1); distance estimation is then performed on each underwater source image in the data set to obtain a corresponding depth estimation image (step S2); next, the dangerous object position information and danger coefficient evaluation grade information of each underwater source image are determined (step S3) and used to train the target detection and recognition network N1 (step S4), and an environment situation evaluation truth image is generated in combination with the depth estimation image (step S3) for optimizing the information fusion network N2 (step S5).
Aiming at underwater mechanical arm operation, the invention calibrates the dangerous object position information and danger coefficient evaluation grade information of the underwater environment in advance to train the target detection and recognition network N1. The trained network N1 can identify the positions of dangerous objects and their risk coefficient evaluation grades in an underwater source image shot by any monocular camera; a corresponding environment situation evaluation map is generated in combination with the depth estimation image, and the loss with respect to the environment situation evaluation truth image is formed, so that the information fusion network N2 is optimized. The environment situation evaluation map generated by the optimized information fusion network N2 can serve as an important support for subsequent underwater operation tasks such as path planning, autonomous obstacle avoidance and grabbing, thereby guiding the robot to realize optimal behavior at a higher level.
Drawings
FIG. 1 is a flowchart illustrating steps of an environmental situation assessment method for autonomous underwater visual target grabbing according to an embodiment of the present invention;
FIG. 2 is a data processing flow chart of an environmental situation assessment method for autonomous underwater visual target grabbing according to an embodiment of the present invention;
fig. 3 is a comparison diagram of an underwater source image and an environmental situation evaluation diagram thereof provided by the embodiment of the invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings, which are given solely for the purpose of illustration and are not to be construed as limitations of the invention, since many variations thereof are possible without departing from the spirit and scope of the invention.
In order to perform more complete situation assessment on the autonomous underwater visual target grabbing of the robot, the embodiment of the invention provides an environment situation assessment method for the autonomous underwater visual target grabbing, which specifically includes steps S1 to S5 as shown in a step flow chart shown in fig. 1. The processing procedure of various data in this embodiment is shown in fig. 2.
(1) Step S1
S1: and collecting underwater source images under various underwater dynamic scenes by adopting a monocular camera to generate an underwater source image data set.
(2) Step S2
S2: and performing distance estimation on each underwater source image in the underwater source image data set to obtain a corresponding depth estimation image.
Step S2 specifically includes the steps of:
s21: carrying out restoration processing on the underwater source image I by adopting a maximum attenuation recognition algorithm to obtain a restored image J;
the maximum attenuation identification algorithm specifically comprises the following steps:
s211: estimating global background light A; the method specifically comprises the following steps:
1) filtering an R channel of an underwater source image I by using a maximum filter with adjustable window size to obtain a corresponding depth image;
2) dividing the image into v × w image blocks (2 × 2 in this embodiment); for each image block, finding the 10% of pixel points with the lowest brightness in the corresponding depth image, locating them in the underwater source image I, and obtaining the background light of that image block from these pixel points;
3) integrating the background lights of all image blocks to estimate the global background light A_R(x) of the R channel;
4) estimating the global background light A_G(x) of the G channel and the global background light A_B(x) of the B channel of the underwater source image I by the same principle and steps as for the R channel;
S212: estimating the propagation coefficient ξ; specifically:
1) estimating the propagation coefficient ξ_R(x) of the R channel of the underwater source image according to
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x)),
where Ω(x) represents a local area, y represents the position of a pixel point in the local area, and I_R(y) represents the pixel value of that pixel point in the R channel;
2) calculating the propagation coefficient ξ_G(x) of the G channel and the propagation coefficient ξ_B(x) of the B channel of the underwater source image by the same principle and steps as for the R channel;
S213: obtaining the restored image J according to the underwater light propagation model I(x) = J(x)ξ(x) + A(1 − ξ(x)), where x is the position of a pixel in the underwater source image I and the restored image J.
It should also be noted that the estimate
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x))
is obtained as follows.
The classical light scattering model is I(x) = J(x)ξ(x) + A(x)(1 − ξ(x)). Since the absorption and scattering coefficients of water differ for light of different colors, the underwater attenuation also differs, so the propagation coefficients of the R, G and B channels are considered separately, yielding:
I_R(x) = J_R(x)ξ_R(x) + A_R(x)(1 − ξ_R(x))
I_G(x) = J_G(x)ξ_G(x) + A_G(x)(1 − ξ_G(x))
I_B(x) = J_B(x)ξ_B(x) + A_B(x)(1 − ξ_B(x))
Taking the maximum value over the local region Ω(x) on both sides of each equation, and assuming that the propagation coefficient ξ(x) and the background light A(x) are constant within Ω(x), gives:
max_{y∈Ω(x)} I_R(y) = ξ_R(x) max_{y∈Ω(x)} J_R(y) + A_R(x)(1 − ξ_R(x))
max_{y∈Ω(x)} I_G(y) = ξ_G(x) max_{y∈Ω(x)} J_G(y) + A_G(x)(1 − ξ_G(x))
max_{y∈Ω(x)} I_B(y) = ξ_B(x) max_{y∈Ω(x)} J_B(y) + A_B(x)(1 − ξ_B(x))
Taking the R channel as an example and rearranging:
max_{y∈Ω(x)} I_R(y) − A_R(x) = ξ_R(x) (max_{y∈Ω(x)} J_R(y) − A_R(x))
Dividing both sides by 1 − A_R(x) gives:
(max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x)) = ξ_R(x) (max_{y∈Ω(x)} J_R(y) − A_R(x)) / (1 − A_R(x))
and a further transformation gives:
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (max_{y∈Ω(x)} J_R(y) − A_R(x))
In view of the attenuation over even short distances, the underwater background light A is usually dark, especially in the deep sea, while, for a suitable window size, the closer an object is to the camera, the brighter its area, so the local maximum of J is approximately 1. Therefore:
ξ_R(x) ≈ (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x))
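By way of illustration only, the depth-estimation step S2 described above can be sketched in Python/NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the maximum-filter window size, the use of a single scalar background light per channel, the 2 × 2 block layout and, in particular, the distance coefficient d = J_R − I_R are assumptions made here for readability (the patent gives the distance-coefficient formula only as an image), and all function names are hypothetical.

import numpy as np
from scipy.ndimage import maximum_filter

def estimate_background_light(channel, v=2, w=2, win=15):
    """Estimate a global background light for one channel from the darkest 10% pixels per block (S211)."""
    h, wd = channel.shape
    depth_like = maximum_filter(channel, size=win)      # max-filtered "depth image"; window size is an assumption
    lights = []
    for i in range(v):
        for j in range(w):
            blk = channel[i * h // v:(i + 1) * h // v, j * wd // w:(j + 1) * wd // w]
            blk_d = depth_like[i * h // v:(i + 1) * h // v, j * wd // w:(j + 1) * wd // w]
            thresh = np.quantile(blk_d, 0.10)            # lowest-10%-brightness pixels of the block
            lights.append(blk[blk_d <= thresh].mean())   # block background light from those pixels
    return float(np.mean(lights))                        # integrate the block lights into a global estimate

def estimate_propagation(channel, A, win=15):
    """Propagation coefficient xi(x) = (max_local I(y) - A) / (1 - A)  (S212)."""
    local_max = maximum_filter(channel, size=win)
    return np.clip((local_max - A) / (1.0 - A + 1e-6), 0.05, 1.0)

def depth_estimation_image(img):
    """img: H x W x 3 RGB image scaled to [0, 1]; returns a normalized depth-estimation image (S21-S23)."""
    J = np.empty_like(img)
    for c in range(3):                                   # restore each channel via I = J*xi + A*(1 - xi) (S213)
        A = estimate_background_light(img[..., c])
        xi = estimate_propagation(img[..., c], A)
        J[..., c] = (img[..., c] - A * (1.0 - xi)) / xi
    d = J[..., 0] - img[..., 0]                          # assumed distance coefficient from J_R and I_R (S22)
    return (d - d.min()) / (d.max() - d.min() + 1e-6)    # normalization (S23)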
(3) step S3
S3: and determining the position information of the dangerous objects and the risk coefficient evaluation grade information of each underwater source image, and generating an environment situation evaluation truth-valued image by combining the depth estimation image.
The dangerous object position information is described by the coordinates of the center point of the target frame and the length and width of the target frame, and is generated by manual annotation or by other image processing means.
The risk coefficient evaluation grade information comprises the classification label (risk grade) of each target in the underwater source images. According to the degree to which a target in the underwater source image data set influences the autonomous operation, the risk of the target is calibrated, manually or by other image processing means, into at least 5 risk grades ordered from high to low risk, and the corresponding risk grade is used as the classification label of that target. Different underwater objects influence the operation differently: for example, of the two frequently encountered organisms, fish and aquatic weeds, fish basically have no influence on the operation, whereas aquatic weeds are easily entangled and their range of influence varies with the weed species and the water flow. The method grades the danger of underwater objects and qualitatively divides it into 5 levels: very dangerous, relatively dangerous, normal, less dangerous and safe, corresponding to the numbers [4, 3, 2, 1, 0] respectively.
The environment situation evaluation truth image (GT) of each underwater source image is calculated according to GT = D · Σ_i η_i N(i | μ_i, σ), where D is the depth estimation image generated in step S2, η_i is the classification label of the i-th target in the underwater source image, N(i | μ_i, σ) is the i-th two-dimensional Gaussian distribution, μ_i is the center coordinate of the i-th target frame, and σ is the standard deviation, which can be set adaptively according to the image size.
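As an illustration of how such a truth image might be assembled, the short Python sketch below builds GT = D · Σ_i η_i N(i | μ_i, σ) from a depth-estimation image and a list of annotated targets. Peak-normalized Gaussians, the scaling of the labels to [0, 1] and the particular choice of σ are assumptions made here for readability; the function name is hypothetical.

import numpy as np

def situation_ground_truth(depth, targets, sigma=None):
    """depth: H x W depth-estimation image in [0, 1];
    targets: list of (cx, cy, risk_label) with risk_label in {0, ..., 4}."""
    h, w = depth.shape
    if sigma is None:
        sigma = 0.05 * max(h, w)                 # sigma set adaptively from the image size (assumed factor)
    ys, xs = np.mgrid[0:h, 0:w]
    risk_sum = np.zeros((h, w), dtype=np.float32)
    for cx, cy, label in targets:                # one two-dimensional Gaussian per target, centered at mu_i
        gauss = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        risk_sum += (label / 4.0) * gauss        # eta_i * N(i | mu_i, sigma), label scaled to [0, 1]
    return depth * risk_sum                      # modulate the risk field by the estimated distance D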
(4) Step S4
S4: training target detection and recognition network N by using each underwater source image and corresponding dangerous object position information and danger coefficient evaluation grade information1
Target detection and identification network N1Adopting fast RCNN; the network of the Fast RCNN mainly comprises an area generation network and a Fast RCNN network; region generation network (RPN) network feature extraction networkThe system comprises a CNN, a parallel first classification network softmax-1 and a first boundary frame regression network Regressor-1; the Fast RCNN network consists of a feature extraction network CNN, an ROI Pooling network, a parallel second classification network softmax-2 and a second bounding box regression network Regressor-2; the RPN and Fast RCNN share the characteristic extraction network CNN;
the work flow of the RPN network is as follows:
1) the feature extraction network CNN adopts a pre-trained network model; it takes the underwater source image as input and outputs the feature map extracted from the underwater source image;
2) each position in the feature map corresponds to 9 prior frames in the underwater source image (the prior-frame generation is sketched after this workflow); the first classification network softmax-1 performs binary classification on all the prior frames to judge whether each prior frame contains a target;
3) the first bounding box regression network Regressor-1 generates bounding box regression parameters of a prior frame containing a target so as to correct the prior frame to obtain a candidate frame;
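The 9 prior frames per feature-map position mentioned in item 2) can be generated as in the following Python sketch. The three scales, three aspect ratios and the feature stride used below are illustrative assumptions only; the patent does not specify them, and the function name is hypothetical.

import numpy as np

def generate_prior_frames(feat_h, feat_w, stride=16,
                          scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) prior frames as (cx, cy, w, h) in image coordinates."""
    priors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride    # center of this feature-map position
            for s in scales:
                for r in ratios:                               # 3 scales x 3 ratios = 9 priors per position
                    priors.append((cx, cy, s * np.sqrt(r), s / np.sqrt(r)))
    return np.asarray(priors, dtype=np.float32)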
the working flow of the Fast RCNN network is as follows:
1) according to the coordinates of each candidate frame, the ROI Pooling network extracts the feature area corresponding to that candidate frame from the feature map, divides the feature area into n × n parts, and adjusts it to an n × n size by using a max pooling method (a pooling sketch follows this workflow);
2) the second classification network softmax-2 classifies the risk coefficient of the dangerous object by adopting a second softmax network; the number of risk classes is set to 5 in the invention;
3) and generating the boundary frame regression parameters of the candidate frame by the second boundary frame regression network Regressor-2.
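A minimal sketch of the ROI max-pooling step referred to in item 1) is given below; the feature stride of 16 and n = 7 are illustrative assumptions rather than values taken from the patent, and the function name is hypothetical.

import numpy as np

def roi_max_pool(feature_map, box, n=7, stride=16):
    """feature_map: (C, H, W) array; box: (x1, y1, x2, y2) candidate frame in image coordinates."""
    x1, y1, x2, y2 = [int(round(v / stride)) for v in box]      # project the candidate frame onto the feature map
    region = feature_map[:, y1:max(y2, y1 + 1), x1:max(x2, x1 + 1)]
    c, rh, rw = region.shape
    pooled = np.zeros((c, n, n), dtype=feature_map.dtype)
    for i in range(n):                                          # divide the feature area into n x n parts
        for j in range(n):
            ys, ye = i * rh // n, max((i + 1) * rh // n, i * rh // n + 1)
            xs, xe = j * rw // n, max((j + 1) * rw // n, j * rw // n + 1)
            pooled[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))   # max pooling within each part
    return pooled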
In the present step S4, training the target detection and recognition network N1 specifically comprises the following steps:
s41: training an RPN network; the method specifically comprises the following steps:
1) screening the candidate frames obtained by the RPN network, removing the candidate frames near the boundary, and then screening by non-maximum suppression;
2) extracting N_cls candidate boxes from the screened candidate boxes for calculating the loss function L_RPN of the RPN network; the loss function L_RPN consists of the first classification loss L_cls and the first bounding box regression loss L_loc (a sketch of this loss computation is given after step S44):
L_RPN({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_loc(t_i, t_i*)
where i denotes the index of the candidate box, p_i is the probability that the i-th candidate box is predicted to be a target region, p_i* is the classification truth value of whether the candidate box contains a target, t_i is the bounding box regression parameter of the i-th candidate box, and t_i* is the bounding box regression parameter truth value corresponding to the i-th candidate box; N_cls, the number of extracted candidate boxes, is set to 256, and N_reg is 2400; λ is a balance coefficient: in practice N_cls and N_reg differ too much, so the two terms are balanced by the parameter λ (λ = 10); the first classification loss L_cls is described by a cross-entropy loss and trains the network to classify whether the candidate box contains a target (positive or negative); the first bounding box regression loss L_loc is described by a smooth L1 loss and trains the first bounding box regression network Regressor-1;
S42: fixing the parameters of the first classification network softmax-1 and the first bounding box regression network Regressor-1 in the RPN network, initializing the feature extraction network CNN with pre-trained network model parameters, training the Fast RCNN network with the candidate boxes provided by the RPN network, and calculating the loss function L_Fast-RCNN of the Fast RCNN network from the classification result of the second softmax network and the bounding box regression parameters of the second bounding box regression network Regressor-2; the calculation formula of L_Fast-RCNN is the same as that of the loss function L_RPN of the RPN network;
s43: fixing the parameters of the feature extraction network CNN in the step S42, and finely adjusting the parameters of a first classification network softmax-1 and a first boundary frame regression network Regressor-1 in the RPN;
s44: and fixing the parameters of the feature extraction network CNN in the step S42, and finely adjusting the parameters of the ROI Pooling network, the second classification network softmax-2 and the second bounding box regression network Regressor-2 in the Fast RCNN network.
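The multi-task loss used in steps S41 and S42 can be sketched as follows in plain Python/NumPy, assuming the cross-entropy and smooth L1 forms stated above. In practice the loss would be computed inside a deep-learning framework with automatic differentiation; the function names are hypothetical.

import numpy as np

def smooth_l1(x):
    """Smooth L1 loss, elementwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls=256, n_reg=2400, lam=10.0):
    """p: predicted object probabilities; p_star: 0/1 classification truth values;
    t, t_star: (N, 4) bounding box regression parameters and their truth values."""
    eps = 1e-7
    l_cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps)).sum() / n_cls
    l_loc = (p_star[:, None] * smooth_l1(t - t_star)).sum() / n_reg    # regression term counts positive boxes only
    return l_cls + lam * l_loc                                         # lambda balances the two terms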
(5) Step S5
Step S5 specifically includes the steps of:
S51: using the trained target detection and recognition network N1 to output new dangerous object position information (bounding box regression parameters) and risk coefficient evaluation grade information (risk coefficients), and obtaining a mask image from them; specifically, the mask image is initialized with 0 pixel values, the pixel value of each predicted target area in the mask image is set to the number corresponding to the predicted risk level, and the result is normalized;
S52: inputting the mask image obtained in step S51 and the depth estimation image obtained in step S2 into the information fusion network N2 for feature extraction and fusion, to generate an environment situation evaluation map;
S53: calculating a situation evaluation loss function L between the environment situation evaluation map generated in step S52 and the environment situation evaluation truth image generated in step S3, and updating the parameters of the information fusion network N2 by gradient back propagation; the situation assessment loss function L is:
[Equation image: situation assessment loss L accumulated pixel-wise over all p ∈ P from y(p) and gt(p)]
where P denotes the coordinate positions of all pixel points in the environment situation evaluation map, p denotes the p-th pixel point in P, y(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation map, and gt(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation truth image.
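Steps S51 to S53 can be sketched as below with Python/PyTorch. The two-channel input layout, the pixel-wise squared error used for L and the structure of the training step are assumptions made for illustration (the patent gives the loss formula only as an image and does not specify the architecture of N2); the function names are hypothetical.

import torch

def build_risk_mask(h, w, detections):
    """detections: list of (x1, y1, x2, y2, risk_level) with risk_level in {0, ..., 4}  (S51)."""
    mask = torch.zeros(h, w)                       # initialize the mask image with 0 pixel values
    for x1, y1, x2, y2, level in detections:
        mask[y1:y2, x1:x2] = level / 4.0           # set each predicted target area to its normalized risk level
    return mask

def fusion_train_step(fusion_net, optimizer, mask, depth, gt):
    """mask, depth, gt: (H, W) tensors; fusion_net (N2) maps the 2-channel input to an H x W map."""
    x = torch.stack([mask, depth]).unsqueeze(0)    # 1 x 2 x H x W input of mask image and depth estimation (S52)
    y = fusion_net(x).squeeze()                    # environment situation evaluation map
    loss = ((y - gt) ** 2).mean()                  # assumed pixel-wise form of the situation loss L (S53)
    loss.backward()                                # gradient back propagation
    optimizer.step()                               # update the parameters of N2
    optimizer.zero_grad()
    return loss.item()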
After the underwater source image shown in fig. 3 is subjected to the environmental situation assessment method of the embodiment, not only is the target region clearly divided, but also the target region and the background region are strictly distinguished, the situation assessment is relatively complete, and the robot can be better guided to realize autonomous grabbing operation.
To sum up, the environmental situation assessment method for autonomous underwater visual target grabbing provided by the embodiment of the invention first uses a monocular camera to collect underwater source images of various underwater dynamic scenes as a data set (step S1); distance estimation is then performed on each underwater source image in the data set to obtain a corresponding depth estimation image (step S2); next, the dangerous object position information and danger coefficient evaluation grade information of each underwater source image are determined (step S3) and used to train the target detection and recognition network N1 (step S4), and an environment situation evaluation truth image is generated in combination with the depth estimation image (step S3) for optimizing the information fusion network N2 (step S5).
Aiming at underwater mechanical arm operation, the embodiment of the invention calibrates the dangerous object position information and danger coefficient evaluation grade information of the underwater environment in advance to train the target detection and recognition network N1. The trained network N1 can identify the positions of dangerous objects and their risk coefficient evaluation grades in an underwater source image shot by any monocular camera; a corresponding environment situation evaluation map is generated in combination with the depth estimation image, and the loss with respect to the environment situation evaluation truth image is formed, so that the information fusion network N2 is optimized. The environment situation evaluation map generated by the optimized information fusion network N2 can serve as an important support for subsequent underwater operation tasks such as path planning, autonomous obstacle avoidance and grabbing, thereby guiding the robot to realize optimal behavior at a higher level.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (8)

1. An environmental situation assessment method for autonomous underwater visual target grabbing, comprising the steps of:
S1: collecting underwater source images of various underwater dynamic scenes with a monocular camera to generate an underwater source image data set;
S2: performing distance estimation on each underwater source image in the underwater source image data set to obtain a corresponding depth estimation image;
S3: determining the dangerous object position information and danger coefficient evaluation grade information of each underwater source image, and generating an environment situation evaluation truth image in combination with the depth estimation image;
S4: training a target detection and recognition network N1 by using each underwater source image and the corresponding dangerous object position information and danger coefficient evaluation grade information;
S5: using the information generated by the trained target detection and recognition network N1, generating an environment situation evaluation map in combination with the depth estimation image corresponding to each underwater source image, and optimizing an information fusion network N2 by using the loss between the environment situation evaluation map and the environment situation evaluation truth image.
2. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 1, wherein the step S2 specifically comprises the steps of:
s21: carrying out restoration processing on the underwater source image I by adopting a maximum attenuation recognition algorithm to obtain a restored image J;
S22: extracting the red channel J_R of the restored image J and the red channel I_R of the underwater source image I respectively, and calculating a distance coefficient d from J_R and I_R:
[Equation image: distance coefficient d computed from J_R and I_R]
S23: normalizing the distance coefficient d to obtain the depth estimation image.
3. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 2, wherein in said step S21, said maximum attenuation identification algorithm specifically comprises the steps of:
s211: estimating global background light A; the method specifically comprises the following steps:
1) filtering the R channel of the underwater source image I by using a maximum filter with adjustable window size to obtain a corresponding depth image;
2) dividing the image into v × w image blocks; for each image block, finding the 10% of pixel points with the lowest brightness in the corresponding depth image, locating them in the underwater source image I, and obtaining the background light of that image block from these pixel points;
3) integrating the background lights of all image blocks to estimate the global background light A_R(x) of the R channel;
4) estimating the global background light A_G(x) of the G channel and the global background light A_B(x) of the B channel of the underwater source image I by the same principle and steps as for the R channel;
S212: estimating the propagation coefficient ξ; specifically:
1) estimating the propagation coefficient ξ_R(x) of the R channel of the underwater source image according to
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x)),
where Ω(x) represents a local area, y represents the position of a pixel point in the local area, and I_R(y) represents the pixel value of that pixel point in the R channel;
2) calculating the propagation coefficient ξ_G(x) of the G channel and the propagation coefficient ξ_B(x) of the B channel of the underwater source image by the same principle and steps as for the R channel;
S213: obtaining the restored image J according to the underwater light propagation model I(x) = J(x)ξ(x) + A(1 − ξ(x)), where x is the position of a pixel in the underwater source image I and the restored image J.
4. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 2, wherein in the step S3:
the dangerous object position information is described by the coordinates of the center point of the target frame and the length and width of the target frame;
the risk coefficient evaluation grade information calibrates the risk of each target into at least 5 risk grades, ordered from high to low risk, according to the degree to which the target in the underwater source image data set influences the autonomous operation, and the corresponding risk grade is used as the classification label of that target;
the environment situation evaluation truth image of each underwater source image is calculated according to GT = D · Σ_i η_i N(i | μ_i, σ), where D is the depth estimation image generated in the step S2, η_i is the classification label of the i-th target in the underwater source image, N(i | μ_i, σ) is the i-th two-dimensional Gaussian distribution, μ_i is the center coordinate of the i-th target frame, and σ is the standard deviation.
5. The environmental situation assessment method for autonomous underwater visual target grabbing according to any one of claims 1-4, wherein in the step S4, the target detection and identification network N1 adopts Faster RCNN; the Faster RCNN network mainly comprises a region generation network and a Fast RCNN network; the region generation network, namely the RPN network, consists of a feature extraction network, a parallel first classification network and a first bounding box regression network; the Fast RCNN network consists of the feature extraction network, an ROI Pooling network, a parallel second classification network and a second bounding box regression network; the RPN network and the Fast RCNN network share the feature extraction network;
the working process of the RPN network is as follows:
1) the feature extraction network adopts a pre-training network model, inputs the underwater source image and outputs a feature map extracted from the underwater source image;
2) each position in the feature map corresponds to 9 prior frames in the underwater source image; the first classification network performs binary classification on all the prior frames by adopting a first softmax network to judge whether each prior frame contains a target;
3) the first bounding box regression network generates bounding box regression parameters of a prior frame containing a target so as to correct the prior frame to obtain a candidate frame;
the working process of the Fast RCNN network comprises the following steps:
1) according to the coordinates of each candidate frame, the ROI Pooling network extracts the feature area corresponding to that candidate frame from the feature map, divides the feature area into n × n parts, and adjusts it to an n × n size by using a max pooling method;
2) the second classification network classifies the risk coefficient of the dangerous object by adopting a second softmax network;
3) and the second bounding box regression network generates the bounding box regression parameters of the candidate box.
6. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 5, wherein in the step S4, training the target detection and recognition network N1 specifically comprises the steps of:
s41: training the RPN network; the method specifically comprises the following steps:
1) screening the candidate frames obtained by the RPN network, removing the candidate frames near the boundary, and then screening by non-maximum suppression;
2) extracting N_cls candidate boxes from the screened candidate boxes for calculating the loss function L_RPN of the RPN network, the loss function L_RPN consisting of the first classification loss L_cls and the first bounding box regression loss L_loc:
L_RPN({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_loc(t_i, t_i*)
where i denotes the index of the candidate box, p_i is the probability that the i-th candidate box is predicted to be a target region, p_i* is the classification truth value of whether the candidate box contains a target, t_i is the bounding box regression parameter of the i-th candidate box, and t_i* is the bounding box regression parameter truth value corresponding to the i-th candidate box; N_cls is the number of extracted candidate boxes, N_reg is 2400, and λ is a balance coefficient; the first classification loss L_cls is described by a cross-entropy loss and trains the network to classify whether the candidate box contains a target; the first bounding box regression loss L_loc is described by a smooth L1 loss and trains the first bounding box regression network;
S42: fixing the parameters of the first classification network and the first bounding box regression network in the RPN network, initializing the feature extraction network with pre-trained network model parameters, training the Fast RCNN network with the candidate boxes provided by the RPN network, and calculating the loss function L_Fast-RCNN of the Fast RCNN network from the classification result of the second softmax network and the bounding box regression parameters of the second bounding box regression network, the calculation formula of L_Fast-RCNN being the same as that of the loss function L_RPN of the RPN network;
s43: fixing the parameters of the feature extraction network in the step S42, and fine-tuning the parameters of the first classification network and the first bounding box regression network in the RPN network;
s44: fixing the parameters of the feature extraction network in the step S42, and fine-tuning the parameters of the ROI Pooling network, the second classification network, and the second bounding box regression network in the Fast RCNN network.
7. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 6, wherein in the step S41, N_cls = 256 and λ = 10.
8. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 7, wherein the step S5 specifically comprises the steps of:
S51: using the trained target detection and recognition network N1 to output new dangerous object position information and risk coefficient evaluation grade information, and obtaining a mask image from them; specifically, the mask image is initialized with 0 pixel values, the pixel value of each predicted target area in the mask image is set to the number corresponding to the predicted risk level, and the result is normalized;
S52: inputting the mask image obtained in step S51 and the depth estimation image obtained in step S2 into the information fusion network N2 for feature extraction and fusion, to generate an environment situation evaluation map;
S53: calculating a situation evaluation loss function L between the environment situation evaluation map generated in step S52 and the environment situation evaluation truth image generated in step S3, and updating the parameters of the information fusion network N2 by gradient back propagation; the situation assessment loss function L is:
[Equation image: situation assessment loss L accumulated pixel-wise over all p ∈ P from y(p) and gt(p)]
where P denotes the coordinate positions of all pixel points in the environment situation evaluation map, p denotes the p-th pixel point in P, y(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation map, and gt(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation truth image.
CN202011214592.9A 2020-11-04 2020-11-04 Environment situation evaluation method for autonomous underwater visual target grabbing Active CN112329615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011214592.9A CN112329615B (en) 2020-11-04 2020-11-04 Environment situation evaluation method for autonomous underwater visual target grabbing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011214592.9A CN112329615B (en) 2020-11-04 2020-11-04 Environment situation evaluation method for autonomous underwater visual target grabbing

Publications (2)

Publication Number Publication Date
CN112329615A true CN112329615A (en) 2021-02-05
CN112329615B CN112329615B (en) 2022-04-15

Family

ID=74323483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011214592.9A Active CN112329615B (en) 2020-11-04 2020-11-04 Environment situation evaluation method for autonomous underwater visual target grabbing

Country Status (1)

Country Link
CN (1) CN112329615B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104134093A (en) * 2014-04-02 2014-11-05 贵州省交通规划勘察设计研究院股份有限公司 Surface line and point multilayer integrated forecasting method for highway geological hazards
US20150332103A1 (en) * 2014-05-19 2015-11-19 Soichiro Yokota Processing apparatus, computer program product, and processing method
CN106804118A (en) * 2015-09-24 2017-06-06 奥林巴斯株式会社 Information acquisition device, information reproduction apparatus, information acquisition method, information regeneration method, information obtain program and information regeneration program
CN108319234A (en) * 2017-12-31 2018-07-24 分众安环(北京)科技有限公司 Safety management system, method, equipment, storage medium, information processing cloud platform
CN108596853A (en) * 2018-04-28 2018-09-28 上海海洋大学 Underwater picture Enhancement Method based on bias light statistical model and transmission map optimization
CN108877267A (en) * 2018-08-06 2018-11-23 武汉理工大学 A kind of intersection detection method based on vehicle-mounted monocular camera
CN110866887A (en) * 2019-11-04 2020-03-06 深圳市唯特视科技有限公司 Target situation fusion sensing method and system based on multiple sensors
CN110991502A (en) * 2019-11-21 2020-04-10 北京航空航天大学 Airspace security situation assessment method based on category activation mapping technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAN ZHOU ET AL.: "Underwater Moving Target Detection Based on Image Enhancement", 《ISNN 2017》 *
程锦盛 et al.: "Research on Underwater Target Recognition Technology Based on Deep Learning Methods", China Excellent Master's and Doctoral Dissertations Full-text Database (Master), Social Sciences I *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052885A (en) * 2021-03-29 2021-06-29 中国海洋大学 Underwater environment safety assessment method based on optical flow and depth estimation
CN113688825A (en) * 2021-05-17 2021-11-23 海南师范大学 AI intelligent garbage recognition and classification system and method
CN113591854A (en) * 2021-08-12 2021-11-02 中国海洋大学 Low-redundancy quick reconstruction method of plankton hologram
CN113591854B (en) * 2021-08-12 2023-09-26 中国海洋大学 Low-redundancy rapid reconstruction method of plankton hologram
CN115890639A (en) * 2022-11-17 2023-04-04 浙江荣图智能科技有限公司 Robot vision guide positioning and grabbing control system

Also Published As

Publication number Publication date
CN112329615B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN112329615B (en) Environment situation evaluation method for autonomous underwater visual target grabbing
CN109800824B (en) Pipeline defect identification method based on computer vision and machine learning
US20210089895A1 (en) Device and method for generating a counterfactual data sample for a neural network
CN110569837B (en) Method and device for optimizing damage detection result
CN110246151B (en) Underwater robot target tracking method based on deep learning and monocular vision
CN113469177A (en) Drainage pipeline defect detection method and system based on deep learning
CN112819772A (en) High-precision rapid pattern detection and identification method
EP3671555A1 (en) Object shape regression using wasserstein distance
CN110874590B (en) Training and visible light infrared visual tracking method based on adapter mutual learning model
CN111461213A (en) Training method of target detection model and target rapid detection method
Panetta et al. Logarithmic Edge Detection with Applications.
Pramunendar et al. A Robust Image Enhancement Techniques for Underwater Fish Classification in Marine Environment.
Qi et al. Micro-concrete crack detection of underwater structures based on convolutional neural network
CN110570361B (en) Sonar image structured noise suppression method, system, device and storage medium
Verma et al. FCNN: fusion-based underwater image enhancement using multilayer convolution neural network
CN116703895A (en) Small sample 3D visual detection method and system based on generation countermeasure network
CN114463676A (en) Safety helmet wearing detection method based on implicit expression
Afonso et al. Underwater object recognition: A domain-adaption methodology of machine learning classifiers
Vijayarani et al. An efficient algorithm for facial image classification
Beknazarova et al. Machine learning algorithms are used to detect and track objects on video images
CN116503406B (en) Hydraulic engineering information management system based on big data
Alves et al. Vision-based navigation solution for autonomous underwater vehicles
Yu Pavement surface distress detection and evaluation using image processing technology
Qu Image defogging algorithm based on physical prior and contrast learning
Karimi et al. A Framework for Generating Disparity Map from Stereo Images using Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant