CN110826575A - Underwater target identification method based on machine learning - Google Patents

Underwater target identification method based on machine learning

Info

Publication number
CN110826575A
Authority
CN
China
Prior art keywords
target
underwater
box
scale
machine learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910950105.6A
Other languages
Chinese (zh)
Inventor
魏延辉
姜瑶瑶
蒋志龙
贺佳林
***强
马博也
牛家乐
刘东东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Engineering University
Original Assignee
Harbin Engineering University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Engineering University filed Critical Harbin Engineering University
Priority to CN201910950105.6A priority Critical patent/CN110826575A/en
Publication of CN110826575A publication Critical patent/CN110826575A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An underwater target identification method based on machine learning belongs to the technical field of underwater machine vision detection and processing. The core of the underwater target recognition algorithm is the SSD target detection algorithm: a feed-forward convolutional network structure is adopted, convolution kernels of different sizes and scales are applied between different layers to obtain feature maps of different scales, regression is performed on these feature maps, and the final result is obtained through a non-maximum suppression algorithm. Detection of the target object is realized through the non-maximum suppression algorithm, the recognition accuracy of underwater targets is greatly improved, effective visual information is provided for an underwater robot observing and operating on underwater targets, and the intelligent recognition capability for underwater targets is improved.

Description

Underwater target identification method based on machine learning
Technical Field
The invention belongs to the technical field of underwater machine vision detection, and particularly relates to an underwater target identification method based on machine learning.
Background
The ocean covers most of the earth's surface and holds vast resources and many unexplored secrets. In recent years, with progress in marine equipment technology, people have begun to further understand and develop marine resources, and countries and regions around the world are sparing no effort to develop marine equipment and exploit underwater resources. China has a coastline of nearly 20,000 kilometers; its coastal economic sea areas contain abundant ocean resources, good conditions for ocean development, and strong demand, and computer vision equipment, as one kind of underwater sensing equipment, is increasingly deployed in ocean detection. A complete vision system covers disciplines and technologies such as optics, computer science, and control theory, and is widely carried on underwater detection, operation, and manned equipment; among these, vision-based underwater tracking and identification technology has very important research value.
Equipping an underwater robot with vision equipment allows the underwater environment to be sensed better, and a monocular installation saves more space on underwater equipment, providing effective conditions for precise and maneuverable operation. The related research of the invention aims to provide target and spatial information for underwater equipment operation, underwater robot obstacle avoidance, path planning, and the like, so underwater monocular vision technology has practical research and application value.
The main defects of current underwater target recognition algorithms are as follows:
first, the precision of current underwater vision detection is insufficient, and the positioning precision of traditional recognition algorithms for the target is inadequate, which greatly affects target detection.
At present, detection algorithms based on machine learning fall mainly into two types: two-stage methods such as R-CNN, and one-stage methods, to which SSD belongs. Compared with Yolo, the SSD algorithm performs considerably better in both accuracy and speed. Unlike Yolo, SSD detects directly with convolutional layers rather than after a fully connected layer as Yolo does. Beyond this one difference of detecting directly by convolution, SSD makes two important changes. First, SSD extracts feature maps of different scales for detection: a large-scale feature map (closer to the front of the network) can be used to detect small objects, and a small-scale feature map (closer to the back) is used to detect large objects. Second, SSD uses prior boxes (also called default boxes; called anchors in Faster R-CNN) of different scales and aspect ratios. The Yolo algorithm has difficulty detecting small targets and positions them inaccurately, and these important improvements allow SSD to overcome those disadvantages to some extent.
A neural network, that is, a network with an input layer, an output layer, and several hidden layers, belongs to a sub-field of machine learning. Its main principle is that after training data enter the input layer, they are processed by the hidden layers, and through a large amount of training certain input features are extracted; the trained model is then used for classification or prediction in practice. Compared with CNN-based methods, traditional methods have several problems in underwater target identification, such as low processing speed and low recognition rate.
SSD, a single-shot detector for multiple classes, is faster than previous state-of-the-art single-shot detectors (Yolo) and achieves accuracy comparable to identification networks such as Faster R-CNN at higher speed. The core of SSD is to predict the class scores and box offsets for a set of default bounding boxes using small convolutional filters applied to the feature maps. These design characteristics lead to simple end-to-end training and high precision even on low-resolution input images, which suits the unclear, low-resolution imaging conditions of an underwater environment and further improves the balance between speed and accuracy.
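To make this prediction mechanism concrete, the following PyTorch-style sketch shows a small 3 x 3 convolution over a feature map emitting class scores and four box offsets for each default box at every location. It is illustrative only, not the patent's implementation; the channel count, k = 6 boxes per location, and 21 classes are assumptions.

```python
# Illustrative sketch: a small 3x3 convolution predicting class scores and
# box offsets for k default boxes per feature-map cell (SSD-style head).
import torch
import torch.nn as nn

num_classes = 21   # assumption: 20 object classes + background, as in PASCAL VOC
k = 6              # assumption: default boxes per feature-map location

head = nn.Conv2d(in_channels=512,
                 out_channels=k * (num_classes + 4),  # class scores + 4 offsets per box
                 kernel_size=3, padding=1)

feature_map = torch.randn(1, 512, 38, 38)             # e.g. a 38x38x512 feature map
pred = head(feature_map)
print(pred.shape)  # torch.Size([1, 150, 38, 38]), since 6 * (21 + 4) = 150
```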
Disclosure of Invention
The invention aims to provide an underwater target identification method based on machine learning, which can detect underwater target information more accurately.
The purpose of the invention is realized as follows:
an underwater target identification method based on machine learning comprises the following steps:
Step 1: preprocessing an underwater target image and performing retinex restoration on the image to obtain an input image (a minimal retinex sketch follows this list);
Step 2: carrying out convolution with convolution kernels of different sizes and scales between different layers to obtain feature maps of different scales;
Step 3: performing regression according to the feature maps obtained in step 2, normalizing each feature map, and obtaining default boxes of different scales with offsets through detectors and classifiers of different sizes;
Step 4: obtaining the final result through the non-maximum suppression (NMS) algorithm.
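As a concrete illustration of the retinex preprocessing named in step 1: the patent does not specify which retinex variant is used, so a single-scale retinex with a Gaussian surround is assumed in the minimal Python sketch below.

```python
# Minimal single-scale retinex sketch for step 1 (the retinex variant is an
# assumption; the patent only states that retinex restoration is performed).
import cv2
import numpy as np

def single_scale_retinex(image_bgr, sigma=80):
    img = image_bgr.astype(np.float64) + 1.0          # avoid log(0)
    illumination = cv2.GaussianBlur(img, (0, 0), sigma)
    reflectance = np.log(img) - np.log(illumination)  # R = log I - log(I * G)
    # stretch the reflectance back to a displayable 0-255 range
    out = cv2.normalize(reflectance, None, 0, 255, cv2.NORM_MINMAX)
    return out.astype(np.uint8)

restored = single_scale_retinex(cv2.imread("underwater.jpg"))
```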
In step 2, the region results obtained by RPN calculation in the region selection algorithm are adopted; the feature maps of different sizes are obtained by dividing according to different scales; then the RPN convolution kernel moves the region over the feature map and obtains the confidence value of the region, and a matrix with confidences is obtained by continuously moving default boxes of different scales.
The width and height of the default boxes in step 3 are:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r}$$

where $w_k^a$ and $h_k^a$ are the width and height of the default box respectively, the aspect ratios are $a_r \in \{1, 2, 1/2, 3, 1/3\}$, and

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

is the scale of the different layers, where m is the number of feature maps: the default scales are realized by predictions from m feature maps of different scales, $s_{min}$ is the smallest scale, and $s_{max}$ is the scale of the largest feature map.
The center position of the selection box in step 3 is:

$$\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right)$$

where x and y denote the center point on the horizontal and vertical axes of the box, $|f_k|$ denotes the scale of the k-th feature map, and $i, j \in [0, |f_k|]$.
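The two formulas above can be combined into a small default-box generator. The Python sketch below is illustrative only; the values s_min = 0.2 and s_max = 0.95, and the extra square box of scale s'_k = sqrt(s_k * s_{k+1}) for a_r = 1, follow the values given later in the description.

```python
# Sketch of default-box generation from the formulas above (illustrative only).
import math

s_min, s_max, m = 0.2, 0.95, 6
aspect_ratios = [1, 2, 1/2, 3, 1/3]

def scale(k):
    # s_k = s_min + (s_max - s_min)/(m - 1) * (k - 1)
    return s_min + (s_max - s_min) / (m - 1) * (k - 1)

def default_boxes(k, f_k):
    """All default boxes (cx, cy, w, h) for the k-th feature map of size f_k x f_k."""
    s_k = scale(k)
    boxes = []
    for i in range(f_k):
        for j in range(f_k):
            cx, cy = (i + 0.5) / f_k, (j + 0.5) / f_k
            for a_r in aspect_ratios:
                boxes.append((cx, cy, s_k * math.sqrt(a_r), s_k / math.sqrt(a_r)))
            # extra box for a_r = 1 with rescaled scale s'_k = sqrt(s_k * s_{k+1})
            s_extra = math.sqrt(s_k * scale(k + 1))
            boxes.append((cx, cy, s_extra, s_extra))
    return boxes

print(len(default_boxes(k=2, f_k=19)))   # 19*19*6 = 2166 boxes for the 19x19 map
```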
The NMS suppression method in step 4 comprises the following steps (a code sketch of these steps follows the list):
Step 4.1: arranging the elements in the matrix by confidence (conf) value;
Step 4.2: in descending order of the result of step 4.1, computing IoU for all overlapping regions, setting a threshold Th, comparing each IoU against Th in order, and sorting and classifying according to the comparison;
Step 4.3: resuming execution of step 4.2 at the position of the second-largest box in the column;
Step 4.4: repeating step 4.3 until all default boxes in the row have been processed;
Step 4.5: performing the traversal over the matrix, i.e., performing NMS for all categories;
Step 4.6: performing further screening of the remainder, and selecting all final remaining categories according to confidence.
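A hedged Python sketch of steps 4.1 to 4.6 follows: greedy per-class NMS, where Th is the IoU threshold; the (x1, y1, x2, y2) box format is an assumption for illustration.

```python
# Greedy NMS sketch following steps 4.1-4.6 (illustrative, single class shown).
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, th=0.5):
    order = np.argsort(scores)[::-1]          # step 4.1: sort by confidence
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        # steps 4.2-4.4: drop boxes whose IoU with the current best exceeds Th,
        # then resume from the next-largest remaining box
        order = np.array([k for k in order[1:] if iou(boxes[best], boxes[k]) <= th])
    return keep   # step 4.6: surviving boxes, highest confidence first
```

Running this routine once per class column of the confidence matrix corresponds to step 4.5.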
The loss function of the whole model in step 4 is

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where x is used to determine whether a designed feature-capture box has a corresponding target: $x_{ij}^p \in \{0, 1\}$ indicates whether the i-th box matches the j-th target box of the p-th class, with 1 for a match and 0 otherwise; if $\sum_i x_{ij}^p \geq 1$, at least one box matches the j-th target bounding box; N denotes the number of matches; $L_{conf}(x, c)$ measures recognition performance; $L_{loc}(x, l, g)$ measures bounding-box prediction performance; and $\hat{g}_j^m$ denotes the deviation between the real target box of the j-th target and the feature-capture box, with $m \in \{cx, cy, w, h\}$, where (cx, cy) denotes the coordinates of the center point of the box and (w, h) denote its width and height.
The invention has the beneficial effects that:
(1) The method adopts multi-scale and anchor-point schemes to solve the low precision of region proposals; the multi-scale feature vectors adopted greatly improve the results on both small and large targets, help raise the overall identification accuracy, and yield more accurate position information than previous proposal methods;
(2) the algorithm runs 20,000 iterations and computes an error curve from the loss function; the initial error is defined as 500, i.e., the convergence range is (0, 500), the final convergence is around 20, the error rate is below about one percent, and the accuracy of the algorithm is thus greatly improved.
Drawings
FIG. 1 is a diagram of the multi-scale detection implementation;
FIG. 2 is a diagram of the region selection method;
FIG. 3 is a flow chart of the NMS algorithm;
FIG. 4 is a diagram of the overlap-rate (IoU) calculation;
FIG. 5 is a view of individual fish and multiple fish detection at multiple angles;
fig. 6 is a graph of error variation.
Detailed Description
The detailed embodiments and effects of the present invention are illustrated by the following examples in conjunction with the summary of the invention.
Aiming at the defects in the prior art, the invention provides an underwater target detection algorithm with high reliability and good real-time performance, so that underwater target information can be detected more accurately. The algorithm meets the requirements of underwater observation and operation and provides accurate identification of underwater targets for the underwater robot. The method can greatly improve the recognition accuracy of underwater targets, provide effective visual information for an underwater robot observing and operating on underwater targets, and improve the intelligent recognition capability for underwater targets.
Implementation 1: as shown in FIG. 1, the invention realizes an underwater target recognition algorithm based on machine learning according to the requirements of underwater observation and operation. The core of the algorithm is the SSD target detection algorithm, a method that detects targets with very high precision. It adopts a feed-forward convolutional network structure: the underwater target image is first preprocessed and restored with retinex to obtain the input image; convolution is then carried out with kernels of different sizes and scales between different layers to obtain feature maps of different scales; regression is performed on the obtained feature maps; and the final result is obtained by the NMS (non-maximum suppression) method. The SSD detection method adopts multi-scale and anchor-point schemes to solve the low precision of region proposals; the multi-scale feature vectors adopted greatly improve the results on both small and large targets, help raise the overall identification accuracy, and yield more accurate position information than conventional proposal methods. The subsequent convolutional network uses convolution templates of different scales for feature fusion to obtain default boxes of different scales with offsets, and finally an extreme-value (non-maximum) suppression algorithm produces the final detection and classification result. Specifically, a standard VGG-16 network is adopted; feature detection through convolution templates of different sizes yields different feature maps; each feature map is normalized; default boxes of different scales with offsets are then obtained through detectors and classifiers of different sizes; and detection of the target object is finally realized through the non-maximum suppression algorithm.
The SSD multi-scale implementation process is shown in FIG. 1: a standard convolutional neural network, VGG-16, is adopted as the front-end network for feature extraction, and the subsequent convolutional network performs feature fusion using convolution templates of different scales to obtain default boxes of different scales with offsets; finally the extreme-value suppression algorithm produces the final detection and classification result.
Implementation 2: as shown in FIG. 2, in the SSD convolutional network, the RPN (Region Proposal Network) can be used to calculate the region results, which are divided according to different scales to obtain multiple feature maps of sizes 38 × 38 × 512, 19 × 19 × 1024, 10 × 10 × 512, 5 × 5 × 256, 3 × 3 × 256, 1 × 1 × 256, etc. The RPN process is illustrated in FIG. 2, taking the 5 × 5 × 256 map as an example: k default boxes are generated at each location by different scales and ratios, and with k = 6, convolution over the 5 × 5 × 256 map generates default boxes with 4 offsets and corresponding confidence values for the classes.
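As a worked check, the total number of default boxes implied by these feature maps can be computed directly. The per-layer box counts below follow the standard SSD300 configuration (an assumption, since the patent does not list them per layer); the total matches the 8732 × 21 matrix used in the NMS step later.

```python
# Worked check of the total default-box count implied by the feature maps above.
# Per-layer boxes-per-cell counts follow the standard SSD300 setup (assumption).
feature_maps = [(38, 4), (19, 6), (10, 6), (5, 6), (3, 4), (1, 4)]  # (size, boxes/cell)
total = sum(f * f * k for f, k in feature_maps)
print(total)  # 8732, matching the 8732 x 21 matrix used in the NMS step
```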
The default scales are realized here by predictions from m feature maps of distinct scales, where the smallest scale is set to $s_{min} = 0.2$ and the largest feature-map scale is set to $s_{max} = 0.95$. The scales of all layers can then be obtained in a recursive manner from the corresponding dimensions, specifically:

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

Different aspect ratios $a_r$ are used, with $a_r \in \{1, 2, 1/2, 3, 1/3\}$, and from $a_r$ the width and height of the corresponding default box can be found:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r}$$

where $w_k^a$ and $h_k^a$ are the width and height of the default box respectively. In the case of a ratio equal to 1, it is necessary to add a box rescaled to

$$s_k' = \sqrt{s_k s_{k+1}}$$

Thus, default boxes with 6 different dimensions can be obtained.
Meanwhile, the center position of the selection box can be obtained by formula (3):

$$\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right) \qquad (3)$$

where $|f_k|$ denotes the scale of the k-th feature map, x and y denote the center point on the horizontal and vertical axes of the box respectively, and $i, j \in [0, |f_k|]$.
The convolutional neural network alone cannot accurately locate the true position among the many overlapping candidate boxes, so a non-maximum suppression (NMS) method needs to be adopted to alleviate this situation.
Implementation 3: as shown in FIGS. 3 and 4,
Step 1: arrange the elements in the 8732 × 21 matrix by confidence (conf) value.
Step 2: in descending order of the result of step 1, compute IoU for all overlapping regions, set a threshold Th, compare each IoU against Th in order, and sort and classify according to the comparison.
Step 3: resume execution of step 2 at the position of the second-largest box in the column.
Step 4: repeat step 3 until all default boxes in the row have been processed.
Step 5: perform the traversal over the 8732 × 21 matrix, i.e., perform NMS for all categories.
Step 6: perform further screening of the remainder, selecting all final remaining categories according to confidence.
A feature-capture box is designed at each coordinate point of each feature map to extract features, which are used to predict the class and bounding box of the target. A 3 × 3 convolution kernel extracts the features in each feature-capture box; the convolution applied to each feature map produces 6 × (classes + 4) outputs per coordinate point, where 6 is the number of capture boxes at each feature-map coordinate point and 4 is the deviation between the predicted target boundary and the target bounding box. If a feature map is of size m × n with 6 boxes per coordinate, an output of m × n × 6 × (classes + 4) is ultimately produced.
The loss function is then calculated; the loss function of the whole model is:

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right) \qquad (5)$$

where x is used to determine whether a designed feature-capture box has a corresponding target: $x_{ij}^p \in \{0, 1\}$ indicates whether the i-th box matches the j-th target box of the p-th class, with 1 for a match and 0 otherwise. If $\sum_i x_{ij}^p \geq 1$, at least one box matches the j-th target bounding box. N denotes the number of matches. The first part of equation (5) measures recognition performance and is essentially a softmax loss function; in detail:

$$L_{conf}(x, c) = -\sum_{i \in Pos}^{N} x_{ij}^p \log\left(\hat{c}_i^p\right) - \sum_{i \in Neg} \log\left(\hat{c}_i^0\right) \qquad (6)$$

where

$$\hat{c}_i^p = \frac{\exp(c_i^p)}{\sum_p \exp(c_i^p)}$$
The second part of equation (5) measures the prediction performance of the bounding box; the loss function used is the smooth-L1 localization loss:

$$L_{loc}(x, l, g) = \sum_{i \in Pos}^{N} \sum_{m \in \{cx, cy, w, h\}} x_{ij}^k \, \mathrm{smooth}_{L1}\left(l_i^m - \hat{g}_j^m\right)$$

where $\hat{g}_j^m$ denotes the deviation between the real target box of the j-th target and the feature-capture box, with $m \in \{cx, cy, w, h\}$, where (cx, cy) denotes the coordinates of the center point of the box and (w, h) denote its width and height. The position information of the box can be expressed by formula (7):

$$\hat{g}_j^{cx} = \frac{g_j^{cx} - d_i^{cx}}{d_i^{w}}, \quad \hat{g}_j^{cy} = \frac{g_j^{cy} - d_i^{cy}}{d_i^{h}}, \quad \hat{g}_j^{w} = \log\frac{g_j^{w}}{d_i^{w}}, \quad \hat{g}_j^{h} = \log\frac{g_j^{h}}{d_i^{h}} \qquad (7)$$

Meanwhile, with the piecewise definition

$$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$$

the confidence loss integrated over the multiple regions is the $L_{conf}(x, c)$ of equation (6), and together with $L_{loc}$ it forms the total loss of equation (5).
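A simplified NumPy sketch of the loss in equation (5) follows. It is illustrative only: hard-negative mining and the weight alpha are omitted, and the inputs are assumed to be already-matched boxes with encoded offsets.

```python
# Illustrative sketch of equation (5): softmax confidence loss plus smooth-L1
# localization loss over matched default boxes (hard-negative mining omitted).
import numpy as np

def smooth_l1(x):
    ax = np.abs(x)
    return np.where(ax < 1, 0.5 * x**2, ax - 0.5)

def ssd_loss(cls_logits, cls_targets, loc_preds, loc_targets, pos_mask):
    """cls_logits: (N, classes); loc_preds/loc_targets: (N, 4) encoded offsets;
    pos_mask: boolean (N,) marking matched (positive) default boxes."""
    # confidence term: negative log softmax probability of the target class
    z = cls_logits - cls_logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    l_conf = -log_probs[np.arange(len(cls_targets)), cls_targets].sum()
    # localization term: smooth-L1 over positive (matched) boxes only
    l_loc = smooth_l1(loc_preds[pos_mask] - loc_targets[pos_mask]).sum()
    n = max(pos_mask.sum(), 1)                 # N = number of matched boxes
    return (l_conf + l_loc) / n
```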
Implementation 5: as shown in FIG. 5, for multi-angle recognition in an underwater environment, 200 underwater photos of fishes and crabs in different forms were collected from the internet and from experimental pool tests. The network was trained, and experiments were carried out on the loss function, recognition accuracy, and positioning accuracy (IoU). The confidence (conf) of a target is set to 0.5 or more, and recursive detection with exclusion is performed on the central 1/2 region of the image, so that detection and recognition are achieved for complex realities such as dense fish distribution and crossing targets. Target fishes in different postures and at different positions are accurately identified; far and near multi-angle cases and the complicated parallel distribution of multiple fishes are accurately recognized; multi-target identification is realized; and some partially occluded fishes are also accurately identified. The error curve calculated from the loss function over 20,000 iterations is shown in FIG. 6: the initial error is defined as 500, i.e., the convergence range is (0, 500), the final convergence is around 20, and the error rate is below about 1%. In the positioning-accuracy plot, the positioning accuracy is found to converge in an oscillating manner that runs through the whole training process, which is caused by the blurred, hazy images of underwater imaging. Finally, the method can well overcome the false recognition caused by the low recognition accuracy due to underwater imaging blur.

Claims (6)

1. An underwater target identification method based on machine learning is characterized by comprising the following steps:
Step 1, preprocessing an underwater target image and performing retinex restoration on the image to obtain an input image;
Step 2, carrying out convolution with convolution kernels of different sizes and scales between different layers to obtain feature maps of different scales;
Step 3, performing regression according to the feature maps obtained in step 2, normalizing each feature map, and obtaining default boxes of different scales with offsets through detectors and classifiers of different sizes;
Step 4, obtaining the final result through the non-maximum suppression (NMS) algorithm.
2. The machine learning-based underwater target recognition method according to claim 1, characterized in that: in step 2, the region results obtained by RPN calculation in the region selection algorithm are adopted; the feature maps of different sizes are obtained by dividing according to different scales; then the RPN convolution kernel moves the region over the feature map and obtains the confidence value of the region, and a matrix with confidences is obtained by continuously moving default boxes of different scales.
3. The machine learning-based underwater target recognition method according to claim 1, characterized in that: the width and height of the default boxes in step 3 are:

$$w_k^a = s_k\sqrt{a_r}, \qquad h_k^a = s_k/\sqrt{a_r}$$

where $w_k^a$ and $h_k^a$ are the width and height of the default box respectively, the aspect ratios are $a_r \in \{1, 2, 1/2, 3, 1/3\}$, and

$$s_k = s_{min} + \frac{s_{max} - s_{min}}{m - 1}(k - 1), \quad k \in [1, m]$$

is the scale of the different layers, where m is the number of feature maps: the default scales are realized by predictions from m feature maps of different scales, $s_{min}$ is the smallest scale, and $s_{max}$ is the scale of the largest feature map.
4. The machine learning-based underwater target recognition method according to claim 1, characterized in that: the center position of the selection box in step 3 is:

$$\left(\frac{i + 0.5}{|f_k|}, \frac{j + 0.5}{|f_k|}\right)$$

where x and y denote the center point on the horizontal and vertical axes of the box, $|f_k|$ denotes the scale of the k-th feature map, and $i, j \in [0, |f_k|]$.
5. The machine learning-based underwater target recognition method according to claim 1, characterized in that: the NMS suppression method in step 4 comprises:
Step 4.1: arranging the elements in the matrix by confidence (conf) value;
Step 4.2: in descending order of the result of step 4.1, computing IoU for all overlapping regions, setting a threshold Th, comparing each IoU against Th in order, and sorting and classifying according to the comparison;
Step 4.3: resuming execution of step 4.2 at the position of the second-largest box in the column;
Step 4.4: repeating step 4.3 until all default boxes in the row have been processed;
Step 4.5: performing the traversal over the matrix, i.e., performing NMS for all categories;
Step 4.6: performing further screening of the remainder, and selecting all final remaining categories according to confidence.
6. The machine learning-based underwater target recognition method according to claim 1, characterized in that: the loss function of the whole model in step 4 is

$$L(x, c, l, g) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha L_{loc}(x, l, g)\right)$$

where x is used to determine whether a designed feature-capture box has a corresponding target: $x_{ij}^p \in \{0, 1\}$ indicates whether the i-th box matches the j-th target box of the p-th class, with 1 for a match and 0 otherwise; if $\sum_i x_{ij}^p \geq 1$, at least one box matches the j-th target bounding box; N denotes the number of matches; $L_{conf}(x, c)$ measures recognition performance; $L_{loc}(x, l, g)$ measures bounding-box prediction performance; and $\hat{g}_j^m$ denotes the deviation between the real target box of the j-th target and the feature-capture box, with $m \in \{cx, cy, w, h\}$, where (cx, cy) denotes the coordinates of the center point of the box and (w, h) denote its width and height.
CN201910950105.6A 2019-12-13 2019-12-13 Underwater target identification method based on machine learning Pending CN110826575A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910950105.6A CN110826575A (en) 2019-12-13 2019-12-13 Underwater target identification method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910950105.6A CN110826575A (en) 2019-12-13 2019-12-13 Underwater target identification method based on machine learning

Publications (1)

Publication Number Publication Date
CN110826575A true CN110826575A (en) 2020-02-21

Family

ID=69548686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910950105.6A Pending CN110826575A (en) 2019-12-13 2019-12-13 Underwater target identification method based on machine learning

Country Status (1)

Country Link
CN (1) CN110826575A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666822A (en) * 2020-05-13 2020-09-15 飒铂智能科技有限责任公司 Low-altitude unmanned aerial vehicle target detection method and system based on deep learning
CN112906458A (en) * 2021-01-08 2021-06-04 浙江大学 Swarm intelligent optimized underwater laser multi-target end-to-end automatic identification system
CN113313116A (en) * 2021-06-20 2021-08-27 西北工业大学 Vision-based accurate detection and positioning method for underwater artificial target
CN114882346A (en) * 2021-01-22 2022-08-09 中国科学院沈阳自动化研究所 Underwater robot target autonomous identification method based on vision
CN114882346B (en) * 2021-01-22 2024-07-09 中国科学院沈阳自动化研究所 Underwater robot target autonomous identification method based on vision

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423760A (en) * 2017-07-21 2017-12-01 西安电子科技大学 Based on pre-segmentation and the deep learning object detection method returned
CN110458160A (en) * 2019-07-09 2019-11-15 北京理工大学 A kind of unmanned boat waterborne target recognizer based on depth-compression neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107423760A (en) * 2017-07-21 2017-12-01 西安电子科技大学 Based on pre-segmentation and the deep learning object detection method returned
CN110458160A (en) * 2019-07-09 2019-11-15 北京理工大学 A kind of unmanned boat waterborne target recognizer based on depth-compression neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WEI LIU et al.: "SSD: Single Shot MultiBox Detector", Computer Vision - ECCV 2016 *
YTUSDC: "SSD算法详解" (Detailed Explanation of the SSD Algorithm), https://blog.csdn.net/ytusdc/article/details/86577939 *
徐岩 et al.: "基于卷积神经网络的水下图像增强方法" (Underwater Image Enhancement Method Based on Convolutional Neural Network), Journal of Jilin University (Engineering and Technology Edition) *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111666822A (en) * 2020-05-13 2020-09-15 飒铂智能科技有限责任公司 Low-altitude unmanned aerial vehicle target detection method and system based on deep learning
CN112906458A (en) * 2021-01-08 2021-06-04 浙江大学 Swarm intelligent optimized underwater laser multi-target end-to-end automatic identification system
CN112906458B (en) * 2021-01-08 2022-07-05 浙江大学 Group intelligent optimized underwater laser multi-target end-to-end automatic identification system
CN114882346A (en) * 2021-01-22 2022-08-09 中国科学院沈阳自动化研究所 Underwater robot target autonomous identification method based on vision
CN114882346B (en) * 2021-01-22 2024-07-09 中国科学院沈阳自动化研究所 Underwater robot target autonomous identification method based on vision
CN113313116A (en) * 2021-06-20 2021-08-27 西北工业大学 Vision-based accurate detection and positioning method for underwater artificial target

Similar Documents

Publication Publication Date Title
Yang et al. Real-time face detection based on YOLO
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN110232350B (en) Real-time water surface multi-moving-object detection and tracking method based on online learning
CN110147743B (en) Real-time online pedestrian analysis and counting system and method under complex scene
CN110135243B (en) Pedestrian detection method and system based on two-stage attention mechanism
CN106127204B (en) A kind of multi-direction meter reading Region detection algorithms of full convolutional neural networks
CN107067415B (en) A kind of object localization method based on images match
CN108805906A (en) A kind of moving obstacle detection and localization method based on depth map
CN107481315A (en) A kind of monocular vision three-dimensional environment method for reconstructing based on Harris SIFT BRIEF algorithms
CN111753682B (en) Hoisting area dynamic monitoring method based on target detection algorithm
CN110991274B (en) Pedestrian tumbling detection method based on Gaussian mixture model and neural network
CN110826575A (en) Underwater target identification method based on machine learning
CN109242019B (en) Rapid detection and tracking method for optical small target on water surface
CN113743385A (en) Unmanned ship water surface target detection method and device and unmanned ship
CN113888461A (en) Method, system and equipment for detecting defects of hardware parts based on deep learning
CN112883850A (en) Multi-view aerospace remote sensing image matching method based on convolutional neural network
CN111998862A (en) Dense binocular SLAM method based on BNN
Han et al. Research on remote sensing image target recognition based on deep convolution neural network
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
Zhang et al. Nearshore vessel detection based on Scene-mask R-CNN in remote sensing image
CN117197676A (en) Target detection and identification method based on feature fusion
CN114689038A (en) Fruit detection positioning and orchard map construction method based on machine vision
Cheng et al. C2-YOLO: Rotating Object Detection Network for Remote Sensing Images with Complex Backgrounds
Zhang et al. Lidar odometry and mapping based on two-stage feature extraction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200221