CN112329615A - Environment situation evaluation method for autonomous underwater visual target grabbing - Google Patents

Environment situation evaluation method for autonomous underwater visual target grabbing

Info

Publication number
CN112329615A
Authority
CN
China
Prior art keywords
network
image
underwater
target
source image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011214592.9A
Other languages
Chinese (zh)
Other versions
CN112329615B (en)
Inventor
王楠
杨学文
崔燕妮
胡文杰
辛国玲
张兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202011214592.9A
Publication of CN112329615A
Application granted
Publication of CN112329615B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J15/00Gripping heads and other end effectors
    • B25J15/08Gripping heads and other end effectors having finger members
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1664Programme controls characterised by programming, planning systems for manipulators characterised by motion, path, trajectory planning
    • B25J9/1666Avoiding collision or forbidden zones
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Mechanical Engineering (AREA)
  • Robotics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of computer vision, and particularly discloses an environment situation evaluation method for autonomous grabbing of an underwater visual target. The method calibrates dangerous object position information and risk coefficient evaluation grade information of the underwater environment in advance to train a target detection and recognition network N1. The trained network N1 can identify the positions of dangerous objects and their risk coefficient evaluation grades in an underwater source image shot by any monocular camera, and a corresponding environment situation evaluation map is generated in combination with the depth estimation image; the loss between this map and the environment situation evaluation truth image is used to optimize an information fusion network N2. The environment situation evaluation map generated by the optimized information fusion network N2 can serve as an important support for subsequent underwater operation tasks such as path planning, autonomous obstacle avoidance and grabbing, thereby guiding the robot to realize optimal behavior at a higher level.

Description

Environment situation evaluation method for autonomous underwater visual target grabbing
Technical Field
The invention relates to the technical field of computer vision, in particular to an environment situation evaluation method for autonomous underwater visual target grabbing.
Background
A mechanical arm that performs operation tasks in a complex and dynamic underwater environment must be able to intelligently analyze the external environment. Risk assessment of the surrounding environment is an important guarantee of the safety of the mechanical arm and the underwater carrier and of the operation completion rate. Meanwhile, environmental situation assessment can also be regarded as the process by which the robot recognizes and models its surroundings, and it is an important support for subsequent path planning. Compared with the traditional artificial potential field method, behavior decomposition method, optimization-based algorithms and the like, environmental situation assessment can guide the robot to realize optimal behavior at a higher level. At present, for underwater mechanical arm operation, the factors considered are not comprehensive enough, and there is no relatively complete environmental situation assessment strategy that can reasonably plan a path for the robot and achieve autonomous obstacle avoidance and grabbing.
Disclosure of Invention
The invention provides an environmental situation evaluation method for autonomous underwater visual target grabbing, which solves the following technical problem: at present, for underwater mechanical arm operation, the factors considered are not comprehensive enough, and there is no relatively complete environmental situation assessment strategy that can reasonably plan a path for the robot and achieve autonomous obstacle avoidance and grabbing.
In order to solve this technical problem, the invention provides an environmental situation assessment method for autonomous underwater visual target grabbing, which comprises the following steps:
S1: collecting underwater source images of various underwater dynamic scenes with a monocular camera to generate an underwater source image data set;
S2: performing distance estimation on each underwater source image in the underwater source image data set to obtain a corresponding depth estimation image;
S3: determining the dangerous object position information and danger coefficient evaluation grade information of each underwater source image, and generating an environment situation evaluation truth image in combination with the depth estimation image;
S4: training a target detection and recognition network N1 by using each underwater source image and the corresponding dangerous object position information and danger coefficient evaluation grade information;
S5: using the information generated by the trained target detection and recognition network N1, generating an environment situation evaluation map in combination with the depth estimation image corresponding to each underwater source image, and optimizing an information fusion network N2 by using the loss between the environment situation evaluation map and the environment situation evaluation truth image.
Further, the step S2 specifically includes the steps of:
S21: carrying out restoration processing on the underwater source image I by adopting a maximum attenuation identification algorithm to obtain a restored image J;
S22: extracting the red channel J_R of the restored image J and the red channel I_R of the underwater source image I respectively, and calculating a distance coefficient d from J_R and I_R:
[Equation image: distance coefficient d computed from J_R and I_R]
S23: normalizing the distance coefficient d to obtain the depth estimation image.
Further, in step S21, the maximum attenuation identification algorithm specifically comprises the steps of:
S211: estimating the global background light A; specifically:
1) filtering the R channel of the underwater source image I by using a maximum filter with an adjustable window size to obtain a corresponding depth image;
2) dividing the image into v × w image blocks; for each image block, finding the 10% of pixel points with the lowest brightness in the corresponding depth image, locating them in the underwater source image I, and obtaining the background light of that image block from these pixel points;
3) integrating the background lights of all image blocks to estimate the global background light A_R(x) of the R channel;
4) estimating the global background light A_G(x) of the G channel and the global background light A_B(x) of the B channel of the underwater source image I by the same principle and steps as for the R channel;
S212: estimating the propagation coefficient ξ; specifically:
1) estimating the propagation coefficient ξ_R(x) of the R channel of the underwater source image according to
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x)),
where Ω(x) represents a local area, y represents the position of a pixel point in the local area, and I_R(y) represents the pixel value of that pixel point in the R channel;
2) calculating the propagation coefficient ξ_G(x) of the G channel and the propagation coefficient ξ_B(x) of the B channel of the underwater source image by the same principle and steps as for the R channel;
S213: obtaining the restored image J according to the underwater light propagation model I(x) = J(x)ξ(x) + A(1 − ξ(x)), where x is the position of a pixel in the underwater source image I and the restored image J.
Further, in the step S3:
the position information of the dangerous object adopts the coordinates of the central point of the target frame and the length and width description of the target frame;
the risk coefficient evaluation grade information calibrates the risk of each target into at least 5 risk grades, ordered from high to low risk, according to the degree to which the target in the underwater source image data set influences the autonomous operation, and the corresponding risk grade is used as the classification label of that target;
the environment situation evaluation truth image of each underwater source image is calculated according to GT = D · Σ_i η_i N(i | μ_i, σ), where D is the depth estimation image generated in step S2, η_i is the classification label of the i-th target in the underwater source image, N(i | μ_i, σ) is the i-th two-dimensional Gaussian distribution, μ_i is the center coordinate of the i-th target frame, and σ is the standard deviation, which can be set adaptively according to the image size.
Further, in the step S4, the target detection and identification network N1 adopts Faster RCNN; the Faster RCNN network mainly comprises a region generation network and a Fast RCNN network; the region generation network, namely the RPN network, consists of a feature extraction network, a parallel first classification network and a first bounding box regression network; the Fast RCNN network consists of the feature extraction network, an ROI Pooling network, a parallel second classification network and a second bounding box regression network; the RPN network and the Fast RCNN network share the feature extraction network;
the working process of the RPN network is as follows:
1) the feature extraction network adopts a pre-training network model, inputs the underwater source image and outputs a feature map extracted from the underwater source image;
2) each position in the feature map corresponds to 9 prior frames in the underwater source image; the first classification network performs binary classification on all the prior frames by adopting a first softmax network to judge whether each prior frame contains a target;
3) the first bounding box regression network generates bounding box regression parameters of a prior frame containing a target so as to correct the prior frame to obtain a candidate frame;
the working process of the Fast RCNN network comprises the following steps:
1) according to the coordinates of each candidate frame, the ROI Pooling network extracts the feature area corresponding to that candidate frame from the feature map, divides the feature area into n × n parts, and adjusts it to an n × n size by using a max pooling method;
2) the second classification network classifies the risk coefficient of the dangerous object by adopting a second softmax network;
3) and the second bounding box regression network generates the bounding box regression parameters of the candidate box.
Further, in the step S4, training the target detection and recognition network N1 specifically comprises the steps of:
s41: training the RPN network; the method specifically comprises the following steps:
1) screening the candidate frames obtained by the RPN network, removing the candidate frames near the boundary, and then screening by non-maximum suppression;
2) extracting N_cls candidate boxes from the screened candidate boxes for calculating the loss function L_RPN of the RPN network; the loss function L_RPN consists of the first classification loss L_cls and the first bounding box regression loss L_loc:
L_RPN({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_loc(t_i, t_i*)
where i denotes the index of the candidate box, p_i is the probability that the i-th candidate box is predicted to be a target region, p_i* is the classification truth value of whether the candidate box contains a target, t_i is the bounding box regression parameter of the i-th candidate box, and t_i* is the bounding box regression parameter truth value corresponding to the i-th candidate box; N_cls, the number of extracted candidate boxes, is set to 256, and N_reg is 2400; λ is a balance coefficient: in practice N_cls and N_reg differ too much, so the two terms are balanced by the parameter λ (λ = 10); the first classification loss L_cls is described by a cross-entropy loss and trains the network to classify whether the candidate box contains a target (positive or negative); the first bounding box regression loss L_loc is described by a smooth L1 loss and trains the first bounding box regression network;
S42: fixing the parameters of the first classification network and the first bounding box regression network in the RPN network, initializing the feature extraction network with pre-trained network model parameters, training the Fast RCNN network with the candidate boxes provided by the RPN network, and calculating the loss function L_Fast-RCNN of the Fast RCNN network from the classification result of the second softmax network and the bounding box regression parameters of the second bounding box regression network, the calculation formula of L_Fast-RCNN being the same as that of the loss function L_RPN of the RPN network;
s43: fixing the parameters of the feature extraction network in the step S42, and fine-tuning the parameters of the first classification network and the first bounding box regression network in the RPN network;
s44: fixing the parameters of the feature extraction network in the step S42, and fine-tuning the parameters of the ROI Pooling network, the second classification network, and the second bounding box regression network in the Fast RCNN network.
Further, the step S5 specifically includes the steps of:
S51: using the trained target detection and recognition network N1 to output new dangerous object position information and risk coefficient evaluation grade information, and obtaining a mask image from them; specifically, the mask image is initialized with 0 pixel values, the pixel value of each predicted target area in the mask image is set to the number corresponding to the predicted risk level, and the result is normalized;
S52: inputting the mask image obtained in step S51 and the depth estimation image obtained in step S2 into an information fusion network N2 for feature extraction and fusion, to generate an environment situation evaluation map;
S53: calculating a situation evaluation loss function L between the environment situation evaluation map generated in step S52 and the environment situation evaluation truth image generated in step S3, and updating the parameters of the information fusion network N2 by gradient back propagation; the situation assessment loss function L is:
[Equation image: situation assessment loss L accumulated pixel-wise over all p ∈ P from y(p) and gt(p)]
where P denotes the coordinate positions of all pixel points in the environment situation evaluation map, p denotes the p-th pixel point in P, y(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation map, and gt(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation truth image.
The invention provides an environmental situation assessment method for autonomous underwater visual target grabbing. First, a monocular camera is used to collect underwater source images of various underwater dynamic scenes as a data set (step S1); distance estimation is then performed on each underwater source image in the data set to obtain a corresponding depth estimation image (step S2); next, the dangerous object position information and danger coefficient evaluation grade information of each underwater source image are determined (step S3) and used to train the target detection and recognition network N1 (step S4), and an environment situation evaluation truth image is generated in combination with the depth estimation image (step S3) for optimizing the information fusion network N2 (step S5).
Aiming at underwater mechanical arm operation, the invention calibrates the dangerous object position information and danger coefficient evaluation grade information of the underwater environment in advance to train the target detection and recognition network N1. The trained network N1 can identify the positions of dangerous objects and their risk coefficient evaluation grades in an underwater source image shot by any monocular camera; a corresponding environment situation evaluation map is generated in combination with the depth estimation image, and the loss with respect to the environment situation evaluation truth image is formed, so that the information fusion network N2 is optimized. The environment situation evaluation map generated by the optimized information fusion network N2 can serve as an important support for subsequent underwater operation tasks such as path planning, autonomous obstacle avoidance and grabbing, thereby guiding the robot to realize optimal behavior at a higher level.
Drawings
FIG. 1 is a flowchart illustrating steps of an environmental situation assessment method for autonomous underwater visual target grabbing according to an embodiment of the present invention;
FIG. 2 is a data processing flow chart of an environmental situation assessment method for autonomous underwater visual target grabbing according to an embodiment of the present invention;
fig. 3 is a comparison diagram of an underwater source image and an environmental situation evaluation diagram thereof provided by the embodiment of the invention.
Detailed Description
The embodiments of the present invention will be described in detail below with reference to the accompanying drawings, which are given solely for the purpose of illustration and are not to be construed as limitations of the invention, since many variations thereof are possible without departing from the spirit and scope of the invention.
In order to perform more complete situation assessment on the autonomous underwater visual target grabbing of the robot, the embodiment of the invention provides an environment situation assessment method for the autonomous underwater visual target grabbing, which specifically includes steps S1 to S5 as shown in a step flow chart shown in fig. 1. The processing procedure of various data in this embodiment is shown in fig. 2.
(1) Step S1
S1: and collecting underwater source images under various underwater dynamic scenes by adopting a monocular camera to generate an underwater source image data set.
(2) Step S2
S2: and performing distance estimation on each underwater source image in the underwater source image data set to obtain a corresponding depth estimation image.
Step S2 specifically includes the steps of:
s21: carrying out restoration processing on the underwater source image I by adopting a maximum attenuation recognition algorithm to obtain a restored image J;
the maximum attenuation identification algorithm specifically comprises the following steps:
s211: estimating global background light A; the method specifically comprises the following steps:
1) filtering an R channel of an underwater source image I by using a maximum filter with adjustable window size to obtain a corresponding depth image;
2) dividing the image into v × w image blocks (2 × 2 in this embodiment); for each image block, finding the 10% of pixel points with the lowest brightness in the corresponding depth image, locating them in the underwater source image I, and obtaining the background light of that image block from these pixel points;
3) integrating the background lights of all image blocks to estimate the global background light A_R(x) of the R channel;
4) estimating the global background light A_G(x) of the G channel and the global background light A_B(x) of the B channel of the underwater source image I by the same principle and steps as for the R channel;
S212: estimating the propagation coefficient ξ; specifically:
1) estimating the propagation coefficient ξ_R(x) of the R channel of the underwater source image according to
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x)),
where Ω(x) represents a local area, y represents the position of a pixel point in the local area, and I_R(y) represents the pixel value of that pixel point in the R channel;
2) calculating the propagation coefficient ξ_G(x) of the G channel and the propagation coefficient ξ_B(x) of the B channel of the underwater source image by the same principle and steps as for the R channel;
S213: obtaining the restored image J according to the underwater light propagation model I(x) = J(x)ξ(x) + A(1 − ξ(x)), where x is the position of a pixel in the underwater source image I and the restored image J.
It should also be noted that the estimate
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x))
is obtained as follows.
The classical light scattering model is I(x) = J(x)ξ(x) + A(x)(1 − ξ(x)). Since the absorption and scattering coefficients of water differ for light of different colors, the underwater attenuation also differs, so the propagation coefficients of the R, G and B channels are considered separately, yielding:
I_R(x) = J_R(x)ξ_R(x) + A_R(x)(1 − ξ_R(x))
I_G(x) = J_G(x)ξ_G(x) + A_G(x)(1 − ξ_G(x))
I_B(x) = J_B(x)ξ_B(x) + A_B(x)(1 − ξ_B(x))
Taking the maximum value over the local region Ω(x) on both sides of each equation, and assuming that the propagation coefficient ξ(x) and the background light A(x) are constant within Ω(x), gives:
max_{y∈Ω(x)} I_R(y) = ξ_R(x) max_{y∈Ω(x)} J_R(y) + A_R(x)(1 − ξ_R(x))
max_{y∈Ω(x)} I_G(y) = ξ_G(x) max_{y∈Ω(x)} J_G(y) + A_G(x)(1 − ξ_G(x))
max_{y∈Ω(x)} I_B(y) = ξ_B(x) max_{y∈Ω(x)} J_B(y) + A_B(x)(1 − ξ_B(x))
Taking the R channel as an example and rearranging:
max_{y∈Ω(x)} I_R(y) − A_R(x) = ξ_R(x) (max_{y∈Ω(x)} J_R(y) − A_R(x))
Dividing both sides by 1 − A_R(x) gives:
(max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x)) = ξ_R(x) (max_{y∈Ω(x)} J_R(y) − A_R(x)) / (1 − A_R(x))
and a further transformation gives:
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (max_{y∈Ω(x)} J_R(y) − A_R(x))
In view of the attenuation over even short distances, the underwater background light A is usually dark, especially in the deep sea, while, for a suitable window size, the closer an object is to the camera, the brighter its area, so the local maximum of J is approximately 1. Therefore:
ξ_R(x) ≈ (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x))
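By way of illustration only, the depth-estimation step S2 described above can be sketched in Python/NumPy as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the maximum-filter window size, the use of a single scalar background light per channel, the 2 × 2 block layout and, in particular, the distance coefficient d = J_R − I_R are assumptions made here for readability (the patent gives the distance-coefficient formula only as an image), and all function names are hypothetical.

import numpy as np
from scipy.ndimage import maximum_filter

def estimate_background_light(channel, v=2, w=2, win=15):
    """Estimate a global background light for one channel from the darkest 10% pixels per block (S211)."""
    h, wd = channel.shape
    depth_like = maximum_filter(channel, size=win)      # max-filtered "depth image"; window size is an assumption
    lights = []
    for i in range(v):
        for j in range(w):
            blk = channel[i * h // v:(i + 1) * h // v, j * wd // w:(j + 1) * wd // w]
            blk_d = depth_like[i * h // v:(i + 1) * h // v, j * wd // w:(j + 1) * wd // w]
            thresh = np.quantile(blk_d, 0.10)            # lowest-10%-brightness pixels of the block
            lights.append(blk[blk_d <= thresh].mean())   # block background light from those pixels
    return float(np.mean(lights))                        # integrate the block lights into a global estimate

def estimate_propagation(channel, A, win=15):
    """Propagation coefficient xi(x) = (max_local I(y) - A) / (1 - A)  (S212)."""
    local_max = maximum_filter(channel, size=win)
    return np.clip((local_max - A) / (1.0 - A + 1e-6), 0.05, 1.0)

def depth_estimation_image(img):
    """img: H x W x 3 RGB image scaled to [0, 1]; returns a normalized depth-estimation image (S21-S23)."""
    J = np.empty_like(img)
    for c in range(3):                                   # restore each channel via I = J*xi + A*(1 - xi) (S213)
        A = estimate_background_light(img[..., c])
        xi = estimate_propagation(img[..., c], A)
        J[..., c] = (img[..., c] - A * (1.0 - xi)) / xi
    d = J[..., 0] - img[..., 0]                          # assumed distance coefficient from J_R and I_R (S22)
    return (d - d.min()) / (d.max() - d.min() + 1e-6)    # normalization (S23)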
(3) step S3
S3: and determining the position information of the dangerous objects and the risk coefficient evaluation grade information of each underwater source image, and generating an environment situation evaluation truth-valued image by combining the depth estimation image.
The dangerous object position information is described by the coordinates of the center point of the target frame and the length and width of the target frame, and is generated by manual annotation or by other image processing means.
The risk coefficient evaluation grade information comprises the classification label (risk grade) of each target in the underwater source images. According to the degree to which a target in the underwater source image data set influences the autonomous operation, the risk of the target is calibrated, manually or by other image processing means, into at least 5 risk grades ordered from high to low risk, and the corresponding risk grade is used as the classification label of that target. Different underwater objects influence the operation differently: for example, of the two frequently encountered organisms, fish and aquatic weeds, fish basically have no influence on the operation, whereas aquatic weeds are easily entangled and their range of influence varies with the weed species and the water flow. The method grades the danger of underwater objects and qualitatively divides it into 5 levels: very dangerous, relatively dangerous, normal, less dangerous and safe, corresponding to the numbers [4, 3, 2, 1, 0] respectively.
The environment situation evaluation truth image (GT) of each underwater source image is calculated according to GT = D · Σ_i η_i N(i | μ_i, σ), where D is the depth estimation image generated in step S2, η_i is the classification label of the i-th target in the underwater source image, N(i | μ_i, σ) is the i-th two-dimensional Gaussian distribution, μ_i is the center coordinate of the i-th target frame, and σ is the standard deviation, which can be set adaptively according to the image size.
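As an illustration of how such a truth image might be assembled, the short Python sketch below builds GT = D · Σ_i η_i N(i | μ_i, σ) from a depth-estimation image and a list of annotated targets. Peak-normalized Gaussians, the scaling of the labels to [0, 1] and the particular choice of σ are assumptions made here for readability; the function name is hypothetical.

import numpy as np

def situation_ground_truth(depth, targets, sigma=None):
    """depth: H x W depth-estimation image in [0, 1];
    targets: list of (cx, cy, risk_label) with risk_label in {0, ..., 4}."""
    h, w = depth.shape
    if sigma is None:
        sigma = 0.05 * max(h, w)                 # sigma set adaptively from the image size (assumed factor)
    ys, xs = np.mgrid[0:h, 0:w]
    risk_sum = np.zeros((h, w), dtype=np.float32)
    for cx, cy, label in targets:                # one two-dimensional Gaussian per target, centered at mu_i
        gauss = np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
        risk_sum += (label / 4.0) * gauss        # eta_i * N(i | mu_i, sigma), label scaled to [0, 1]
    return depth * risk_sum                      # modulate the risk field by the estimated distance D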
(4) Step S4
S4: training target detection and recognition network N by using each underwater source image and corresponding dangerous object position information and danger coefficient evaluation grade information1
Target detection and identification network N1Adopting fast RCNN; the network of the Fast RCNN mainly comprises an area generation network and a Fast RCNN network; region generation network (RPN) network feature extraction networkThe system comprises a CNN, a parallel first classification network softmax-1 and a first boundary frame regression network Regressor-1; the Fast RCNN network consists of a feature extraction network CNN, an ROI Pooling network, a parallel second classification network softmax-2 and a second bounding box regression network Regressor-2; the RPN and Fast RCNN share the characteristic extraction network CNN;
the work flow of the RPN network is as follows:
1) the feature extraction network CNN adopts a pre-trained network model; it takes the underwater source image as input and outputs the feature map extracted from the underwater source image;
2) each position in the feature map corresponds to 9 prior frames in the underwater source image (the prior-frame generation is sketched after this workflow); the first classification network softmax-1 performs binary classification on all the prior frames to judge whether each prior frame contains a target;
3) the first bounding box regression network Regressor-1 generates bounding box regression parameters of a prior frame containing a target so as to correct the prior frame to obtain a candidate frame;
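The 9 prior frames per feature-map position mentioned in item 2) can be generated as in the following Python sketch. The three scales, three aspect ratios and the feature stride used below are illustrative assumptions only; the patent does not specify them, and the function name is hypothetical.

import numpy as np

def generate_prior_frames(feat_h, feat_w, stride=16,
                          scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) prior frames as (cx, cy, w, h) in image coordinates."""
    priors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride    # center of this feature-map position
            for s in scales:
                for r in ratios:                               # 3 scales x 3 ratios = 9 priors per position
                    priors.append((cx, cy, s * np.sqrt(r), s / np.sqrt(r)))
    return np.asarray(priors, dtype=np.float32)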
the working flow of the Fast RCNN network is as follows:
1) according to the coordinates of each candidate frame, the ROI Pooling network extracts the feature area corresponding to that candidate frame from the feature map, divides the feature area into n × n parts, and adjusts it to an n × n size by using a max pooling method (a pooling sketch follows this workflow);
2) the second classification network softmax-2 classifies the risk coefficient of the dangerous object by adopting a second softmax network; the number of risk classes is set to 5 in the invention;
3) and generating the boundary frame regression parameters of the candidate frame by the second boundary frame regression network Regressor-2.
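A minimal sketch of the ROI max-pooling step referred to in item 1) is given below; the feature stride of 16 and n = 7 are illustrative assumptions rather than values taken from the patent, and the function name is hypothetical.

import numpy as np

def roi_max_pool(feature_map, box, n=7, stride=16):
    """feature_map: (C, H, W) array; box: (x1, y1, x2, y2) candidate frame in image coordinates."""
    x1, y1, x2, y2 = [int(round(v / stride)) for v in box]      # project the candidate frame onto the feature map
    region = feature_map[:, y1:max(y2, y1 + 1), x1:max(x2, x1 + 1)]
    c, rh, rw = region.shape
    pooled = np.zeros((c, n, n), dtype=feature_map.dtype)
    for i in range(n):                                          # divide the feature area into n x n parts
        for j in range(n):
            ys, ye = i * rh // n, max((i + 1) * rh // n, i * rh // n + 1)
            xs, xe = j * rw // n, max((j + 1) * rw // n, j * rw // n + 1)
            pooled[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))   # max pooling within each part
    return pooled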
In the present step S4, training the target detection and recognition network N1 specifically comprises the following steps:
s41: training an RPN network; the method specifically comprises the following steps:
1) screening the candidate frames obtained by the RPN network, removing the candidate frames near the boundary, and then screening by non-maximum suppression;
2) extracting N_cls candidate boxes from the screened candidate boxes for calculating the loss function L_RPN of the RPN network; the loss function L_RPN consists of the first classification loss L_cls and the first bounding box regression loss L_loc (a sketch of this loss computation is given after step S44):
L_RPN({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_loc(t_i, t_i*)
where i denotes the index of the candidate box, p_i is the probability that the i-th candidate box is predicted to be a target region, p_i* is the classification truth value of whether the candidate box contains a target, t_i is the bounding box regression parameter of the i-th candidate box, and t_i* is the bounding box regression parameter truth value corresponding to the i-th candidate box; N_cls, the number of extracted candidate boxes, is set to 256, and N_reg is 2400; λ is a balance coefficient: in practice N_cls and N_reg differ too much, so the two terms are balanced by the parameter λ (λ = 10); the first classification loss L_cls is described by a cross-entropy loss and trains the network to classify whether the candidate box contains a target (positive or negative); the first bounding box regression loss L_loc is described by a smooth L1 loss and trains the first bounding box regression network Regressor-1;
S42: fixing the parameters of the first classification network softmax-1 and the first bounding box regression network Regressor-1 in the RPN network, initializing the feature extraction network CNN with pre-trained network model parameters, training the Fast RCNN network with the candidate boxes provided by the RPN network, and calculating the loss function L_Fast-RCNN of the Fast RCNN network from the classification result of the second softmax network and the bounding box regression parameters of the second bounding box regression network Regressor-2; the calculation formula of L_Fast-RCNN is the same as that of the loss function L_RPN of the RPN network;
s43: fixing the parameters of the feature extraction network CNN in the step S42, and finely adjusting the parameters of a first classification network softmax-1 and a first boundary frame regression network Regressor-1 in the RPN;
s44: and fixing the parameters of the feature extraction network CNN in the step S42, and finely adjusting the parameters of the ROI Pooling network, the second classification network softmax-2 and the second bounding box regression network Regressor-2 in the Fast RCNN network.
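The multi-task loss used in steps S41 and S42 can be sketched as follows in plain Python/NumPy, assuming the cross-entropy and smooth L1 forms stated above. In practice the loss would be computed inside a deep-learning framework with automatic differentiation; the function names are hypothetical.

import numpy as np

def smooth_l1(x):
    """Smooth L1 loss, elementwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def rpn_loss(p, p_star, t, t_star, n_cls=256, n_reg=2400, lam=10.0):
    """p: predicted object probabilities; p_star: 0/1 classification truth values;
    t, t_star: (N, 4) bounding box regression parameters and their truth values."""
    eps = 1e-7
    l_cls = -(p_star * np.log(p + eps) + (1 - p_star) * np.log(1 - p + eps)).sum() / n_cls
    l_loc = (p_star[:, None] * smooth_l1(t - t_star)).sum() / n_reg    # regression term counts positive boxes only
    return l_cls + lam * l_loc                                         # lambda balances the two terms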
(5) Step S5
Step S5 specifically includes the steps of:
S51: using the trained target detection and recognition network N1 to output new dangerous object position information (bounding box regression parameters) and risk coefficient evaluation grade information (risk coefficients), and obtaining a mask image from them; specifically, the mask image is initialized with 0 pixel values, the pixel value of each predicted target area in the mask image is set to the number corresponding to the predicted risk level, and the result is normalized;
S52: inputting the mask image obtained in step S51 and the depth estimation image obtained in step S2 into the information fusion network N2 for feature extraction and fusion, to generate an environment situation evaluation map;
S53: calculating a situation evaluation loss function L between the environment situation evaluation map generated in step S52 and the environment situation evaluation truth image generated in step S3, and updating the parameters of the information fusion network N2 by gradient back propagation; the situation assessment loss function L is:
[Equation image: situation assessment loss L accumulated pixel-wise over all p ∈ P from y(p) and gt(p)]
where P denotes the coordinate positions of all pixel points in the environment situation evaluation map, p denotes the p-th pixel point in P, y(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation map, and gt(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation truth image.
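Steps S51 to S53 can be sketched as below with Python/PyTorch. The two-channel input layout, the pixel-wise squared error used for L and the structure of the training step are assumptions made for illustration (the patent gives the loss formula only as an image and does not specify the architecture of N2); the function names are hypothetical.

import torch

def build_risk_mask(h, w, detections):
    """detections: list of (x1, y1, x2, y2, risk_level) with risk_level in {0, ..., 4}  (S51)."""
    mask = torch.zeros(h, w)                       # initialize the mask image with 0 pixel values
    for x1, y1, x2, y2, level in detections:
        mask[y1:y2, x1:x2] = level / 4.0           # set each predicted target area to its normalized risk level
    return mask

def fusion_train_step(fusion_net, optimizer, mask, depth, gt):
    """mask, depth, gt: (H, W) tensors; fusion_net (N2) maps the 2-channel input to an H x W map."""
    x = torch.stack([mask, depth]).unsqueeze(0)    # 1 x 2 x H x W input of mask image and depth estimation (S52)
    y = fusion_net(x).squeeze()                    # environment situation evaluation map
    loss = ((y - gt) ** 2).mean()                  # assumed pixel-wise form of the situation loss L (S53)
    loss.backward()                                # gradient back propagation
    optimizer.step()                               # update the parameters of N2
    optimizer.zero_grad()
    return loss.item()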
After the underwater source image shown in fig. 3 is subjected to the environmental situation assessment method of the embodiment, not only is the target region clearly divided, but also the target region and the background region are strictly distinguished, the situation assessment is relatively complete, and the robot can be better guided to realize autonomous grabbing operation.
To sum up, the environmental situation assessment method for autonomous underwater visual target grabbing provided by the embodiment of the invention first uses a monocular camera to collect underwater source images of various underwater dynamic scenes as a data set (step S1); distance estimation is then performed on each underwater source image in the data set to obtain a corresponding depth estimation image (step S2); next, the dangerous object position information and danger coefficient evaluation grade information of each underwater source image are determined (step S3) and used to train the target detection and recognition network N1 (step S4), and an environment situation evaluation truth image is generated in combination with the depth estimation image (step S3) for optimizing the information fusion network N2 (step S5).
Aiming at underwater mechanical arm operation, the embodiment of the invention calibrates the dangerous object position information and danger coefficient evaluation grade information of the underwater environment in advance to train the target detection and recognition network N1. The trained network N1 can identify the positions of dangerous objects and their risk coefficient evaluation grades in an underwater source image shot by any monocular camera; a corresponding environment situation evaluation map is generated in combination with the depth estimation image, and the loss with respect to the environment situation evaluation truth image is formed, so that the information fusion network N2 is optimized. The environment situation evaluation map generated by the optimized information fusion network N2 can serve as an important support for subsequent underwater operation tasks such as path planning, autonomous obstacle avoidance and grabbing, thereby guiding the robot to realize optimal behavior at a higher level.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (8)

1. An environmental situation assessment method for autonomous underwater visual target grabbing, comprising the steps of:
S1: collecting underwater source images of various underwater dynamic scenes with a monocular camera to generate an underwater source image data set;
S2: performing distance estimation on each underwater source image in the underwater source image data set to obtain a corresponding depth estimation image;
S3: determining the dangerous object position information and danger coefficient evaluation grade information of each underwater source image, and generating an environment situation evaluation truth image in combination with the depth estimation image;
S4: training a target detection and recognition network N1 by using each underwater source image and the corresponding dangerous object position information and danger coefficient evaluation grade information;
S5: using the information generated by the trained target detection and recognition network N1, generating an environment situation evaluation map in combination with the depth estimation image corresponding to each underwater source image, and optimizing an information fusion network N2 by using the loss between the environment situation evaluation map and the environment situation evaluation truth image.
2. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 1, wherein the step S2 specifically comprises the steps of:
s21: carrying out restoration processing on the underwater source image I by adopting a maximum attenuation recognition algorithm to obtain a restored image J;
S22: extracting the red channel J_R of the restored image J and the red channel I_R of the underwater source image I respectively, and calculating a distance coefficient d from J_R and I_R:
[Equation image: distance coefficient d computed from J_R and I_R]
S23: normalizing the distance coefficient d to obtain the depth estimation image.
3. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 2, wherein in said step S21, said maximum attenuation identification algorithm specifically comprises the steps of:
s211: estimating global background light A; the method specifically comprises the following steps:
1) filtering the R channel of the underwater source image I by using a maximum filter with adjustable window size to obtain a corresponding depth image;
2) dividing the image into v × w image blocks; for each image block, finding the 10% of pixel points with the lowest brightness in the corresponding depth image, locating them in the underwater source image I, and obtaining the background light of that image block from these pixel points;
3) integrating the background lights of all image blocks to estimate the global background light A_R(x) of the R channel;
4) estimating the global background light A_G(x) of the G channel and the global background light A_B(x) of the B channel of the underwater source image I by the same principle and steps as for the R channel;
S212: estimating the propagation coefficient ξ; specifically:
1) estimating the propagation coefficient ξ_R(x) of the R channel of the underwater source image according to
ξ_R(x) = (max_{y∈Ω(x)} I_R(y) − A_R(x)) / (1 − A_R(x)),
where Ω(x) represents a local area, y represents the position of a pixel point in the local area, and I_R(y) represents the pixel value of that pixel point in the R channel;
2) calculating the propagation coefficient ξ_G(x) of the G channel and the propagation coefficient ξ_B(x) of the B channel of the underwater source image by the same principle and steps as for the R channel;
S213: obtaining the restored image J according to the underwater light propagation model I(x) = J(x)ξ(x) + A(1 − ξ(x)), where x is the position of a pixel in the underwater source image I and the restored image J.
4. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 2, wherein in the step S3:
the dangerous object position information is described by the coordinates of the center point of the target frame and the length and width of the target frame;
the risk coefficient evaluation grade information calibrates the risk of each target into at least 5 risk grades, ordered from high to low risk, according to the degree to which the target in the underwater source image data set influences the autonomous operation, and the corresponding risk grade is used as the classification label of that target;
the environment situation evaluation truth image of each underwater source image is calculated according to GT = D · Σ_i η_i N(i | μ_i, σ), where D is the depth estimation image generated in the step S2, η_i is the classification label of the i-th target in the underwater source image, N(i | μ_i, σ) is the i-th two-dimensional Gaussian distribution, μ_i is the center coordinate of the i-th target frame, and σ is the standard deviation.
5. The environmental situation assessment method for autonomous underwater visual target grabbing according to any one of claims 1-4, wherein in the step S4, the target detection and identification network N1 adopts Faster RCNN; the Faster RCNN network mainly comprises a region generation network and a Fast RCNN network; the region generation network, namely the RPN network, consists of a feature extraction network, a parallel first classification network and a first bounding box regression network; the Fast RCNN network consists of the feature extraction network, an ROI Pooling network, a parallel second classification network and a second bounding box regression network; the RPN network and the Fast RCNN network share the feature extraction network;
the working process of the RPN network is as follows:
1) the feature extraction network adopts a pre-training network model, inputs the underwater source image and outputs a feature map extracted from the underwater source image;
2) each position in the feature map corresponds to 9 prior frames in the underwater source image; the first classification network performs binary classification on all the prior frames by adopting a first softmax network to judge whether each prior frame contains a target;
3) the first bounding box regression network generates bounding box regression parameters of a prior frame containing a target so as to correct the prior frame to obtain a candidate frame;
the working process of the Fast RCNN network comprises the following steps:
1) according to the coordinates of each candidate frame, the ROI Pooling network extracts the feature area corresponding to that candidate frame from the feature map, divides the feature area into n × n parts, and adjusts it to an n × n size by using a max pooling method;
2) the second classification network classifies the risk coefficient of the dangerous object by adopting a second softmax network;
3) and the second bounding box regression network generates the bounding box regression parameters of the candidate box.
6. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 5, wherein in the step S4, training the target detection and recognition network N1 specifically comprises the steps of:
s41: training the RPN network; the method specifically comprises the following steps:
1) screening the candidate frames obtained by the RPN network, removing the candidate frames near the boundary, and then screening by non-maximum suppression;
2) extracting N_cls candidate boxes from the screened candidate boxes for calculating the loss function L_RPN of the RPN network, the loss function L_RPN consisting of the first classification loss L_cls and the first bounding box regression loss L_loc:
L_RPN({p_i}, {t_i}) = (1/N_cls) Σ_i L_cls(p_i, p_i*) + λ (1/N_reg) Σ_i p_i* L_loc(t_i, t_i*)
where i denotes the index of the candidate box, p_i is the probability that the i-th candidate box is predicted to be a target region, p_i* is the classification truth value of whether the candidate box contains a target, t_i is the bounding box regression parameter of the i-th candidate box, and t_i* is the bounding box regression parameter truth value corresponding to the i-th candidate box; N_cls is the number of extracted candidate boxes, N_reg is 2400, and λ is a balance coefficient; the first classification loss L_cls is described by a cross-entropy loss and trains the network to classify whether the candidate box contains a target; the first bounding box regression loss L_loc is described by a smooth L1 loss and trains the first bounding box regression network;
S42: fixing the parameters of the first classification network and the first bounding box regression network in the RPN network, initializing the feature extraction network with pre-trained network model parameters, training the Fast RCNN network with the candidate boxes provided by the RPN network, and calculating the loss function L_Fast-RCNN of the Fast RCNN network from the classification result of the second softmax network and the bounding box regression parameters of the second bounding box regression network, the calculation formula of L_Fast-RCNN being the same as that of the loss function L_RPN of the RPN network;
s43: fixing the parameters of the feature extraction network in the step S42, and fine-tuning the parameters of the first classification network and the first bounding box regression network in the RPN network;
s44: fixing the parameters of the feature extraction network in the step S42, and fine-tuning the parameters of the ROI Pooling network, the second classification network, and the second bounding box regression network in the Fast RCNN network.
7. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 6, wherein in the step S41, N_cls = 256 and λ = 10.
8. The environmental situation assessment method for autonomous underwater visual target grabbing according to claim 7, wherein the step S5 specifically comprises the steps of:
S51: using the trained target detection and recognition network N1 to output new dangerous object position information and risk coefficient evaluation grade information, and obtaining a mask image from them; specifically, the mask image is initialized with 0 pixel values, the pixel value of each predicted target area in the mask image is set to the number corresponding to the predicted risk level, and the result is normalized;
S52: inputting the mask image obtained in step S51 and the depth estimation image obtained in step S2 into the information fusion network N2 for feature extraction and fusion, to generate an environment situation evaluation map;
S53: calculating a situation evaluation loss function L between the environment situation evaluation map generated in step S52 and the environment situation evaluation truth image generated in step S3, and updating the parameters of the information fusion network N2 by gradient back propagation; the situation assessment loss function L is:
[Equation image: situation assessment loss L accumulated pixel-wise over all p ∈ P from y(p) and gt(p)]
where P denotes the coordinate positions of all pixel points in the environment situation evaluation map, p denotes the p-th pixel point in P, y(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation map, and gt(p) denotes the environment situation evaluation value of pixel point p in the environment situation evaluation truth image.
CN202011214592.9A 2020-11-04 2020-11-04 Environment situation evaluation method for autonomous underwater visual target grabbing Active CN112329615B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011214592.9A CN112329615B (en) 2020-11-04 2020-11-04 Environment situation evaluation method for autonomous underwater visual target grabbing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011214592.9A CN112329615B (en) 2020-11-04 2020-11-04 Environment situation evaluation method for autonomous underwater visual target grabbing

Publications (2)

Publication Number Publication Date
CN112329615A true CN112329615A (en) 2021-02-05
CN112329615B CN112329615B (en) 2022-04-15

Family

ID=74323483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011214592.9A Active CN112329615B (en) 2020-11-04 2020-11-04 Environment situation evaluation method for autonomous underwater visual target grabbing

Country Status (1)

Country Link
CN (1) CN112329615B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104134093A (en) * 2014-04-02 2014-11-05 贵州省交通规划勘察设计研究院股份有限公司 Surface line and point multilayer integrated forecasting method for highway geological hazards
US20150332103A1 (en) * 2014-05-19 2015-11-19 Soichiro Yokota Processing apparatus, computer program product, and processing method
CN106804118A (en) * 2015-09-24 2017-06-06 奥林巴斯株式会社 Information acquisition device, information reproduction apparatus, information acquisition method, information regeneration method, information obtain program and information regeneration program
CN108319234A (en) * 2017-12-31 2018-07-24 分众安环(北京)科技有限公司 Safety management system, method, equipment, storage medium, information processing cloud platform
CN108596853A (en) * 2018-04-28 2018-09-28 上海海洋大学 Underwater picture Enhancement Method based on bias light statistical model and transmission map optimization
CN108877267A (en) * 2018-08-06 2018-11-23 武汉理工大学 A kind of intersection detection method based on vehicle-mounted monocular camera
CN110866887A (en) * 2019-11-04 2020-03-06 深圳市唯特视科技有限公司 Target situation fusion sensing method and system based on multiple sensors
CN110991502A (en) * 2019-11-21 2020-04-10 北京航空航天大学 Airspace security situation assessment method based on category activation mapping technology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YAN ZHOU ET AL.: "Underwater Moving Target Detection Based on Image Enhancement", 《ISNN 2017》 *
程锦盛 et al.: "Research on Underwater Target Recognition Technology Based on Deep Learning Methods", China Excellent Master's and Doctoral Dissertations Full-text Database (Master), Social Sciences I *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113052885A (en) * 2021-03-29 2021-06-29 中国海洋大学 Underwater environment safety assessment method based on optical flow and depth estimation
CN113688825A (en) * 2021-05-17 2021-11-23 海南师范大学 AI intelligent garbage recognition and classification system and method
CN113591854A (en) * 2021-08-12 2021-11-02 中国海洋大学 Low-redundancy quick reconstruction method of plankton hologram
CN113591854B (en) * 2021-08-12 2023-09-26 中国海洋大学 Low-redundancy rapid reconstruction method of plankton hologram
CN115890639A (en) * 2022-11-17 2023-04-04 浙江荣图智能科技有限公司 Robot vision guide positioning and grabbing control system

Also Published As

Publication number Publication date
CN112329615B (en) 2022-04-15

Similar Documents

Publication Publication Date Title
CN112329615B (en) Environment situation evaluation method for autonomous underwater visual target grabbing
CN109800824B (en) Pipeline defect identification method based on computer vision and machine learning
US20210089895A1 (en) Device and method for generating a counterfactual data sample for a neural network
CN110569837B (en) Method and device for optimizing damage detection result
CN110246151B (en) Underwater robot target tracking method based on deep learning and monocular vision
CN113469177A (en) Drainage pipeline defect detection method and system based on deep learning
CN112819772A (en) High-precision rapid pattern detection and identification method
EP3671555A1 (en) Object shape regression using wasserstein distance
CN110874590B (en) Training and visible light infrared visual tracking method based on adapter mutual learning model
CN111461213A (en) Training method of target detection model and target rapid detection method
Panetta et al. Logarithmic Edge Detection with Applications.
Pramunendar et al. A Robust Image Enhancement Techniques for Underwater Fish Classification in Marine Environment.
Qi et al. Micro-concrete crack detection of underwater structures based on convolutional neural network
CN110570361B (en) Sonar image structured noise suppression method, system, device and storage medium
Verma et al. FCNN: fusion-based underwater image enhancement using multilayer convolution neural network
CN116703895A (en) Small sample 3D visual detection method and system based on generation countermeasure network
CN114463676A (en) Safety helmet wearing detection method based on implicit expression
Afonso et al. Underwater object recognition: A domain-adaption methodology of machine learning classifiers
Vijayarani et al. An efficient algorithm for facial image classification
Beknazarova et al. Machine learning algorithms are used to detect and track objects on video images
CN116503406B (en) Hydraulic engineering information management system based on big data
Alves et al. Vision-based navigation solution for autonomous underwater vehicles
Yu Pavement surface distress detection and evaluation using image processing technology
Qu Image defogging algorithm based on physical prior and contrast learning
Karimi et al. A Framework for Generating Disparity Map from Stereo Images using Deep Learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant