CN114241031B - Fish body size measurement and weight prediction method and device based on dual-view fusion - Google Patents

Fish body size measurement and weight prediction method and device based on dual-view fusion

Info

Publication number
CN114241031B
CN114241031B (application CN202111579640.9A)
Authority
CN
China
Prior art keywords
fish
image
mask
view
points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111579640.9A
Other languages
Chinese (zh)
Other versions
CN114241031A (en)
Inventor
郑婵
薛月菊
胡俊茹
黄龙
丁成章
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China Agricultural University
Original Assignee
South China Agricultural University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China Agricultural University
Priority to CN202111579640.9A
Publication of CN114241031A
Application granted
Publication of CN114241031B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/60 Analysis of geometric attributes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30181 Earth observation
    • G06T 2207/30188 Vegetation; Agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and a device for fish body size measurement and weight prediction based on dual-view fusion. The method acquires the intrinsic matrix and distortion coefficients of the capture devices at different viewing angles; collects image sequences of fish at different viewing angles and under different conditions and builds an image set; preprocesses the image set; builds a Mask R-CNN network model, extracts features of the target region, and trains and validates the model; outputs pixel-level target masks for the different views from the Mask branch; obtains keypoint heat maps from the keypoint detection branch; tests the image set with the trained Mask R-CNN model to obtain segmentation masks and detected keypoints of the fish, converts the keypoint coordinates into real-world distances after refraction correction to obtain the fish's body size data, and computes the median of each body size index over the multi-frame image sequence; trains a second, regression neural network model for weight prediction on the training set and the samples' true weights; and visually displays the body size indexes and the predicted weight.

Description

Fish body size measurement and weight prediction method and device based on dual-view fusion
Technical Field
The invention relates to the technical field of aquatic product measurement, and in particular to a method and a device for measuring fish body size and predicting body weight based on dual-view fusion.
Background
Fish breeders need to continuously track growth data and evaluate the growth and development of fish populations in order to adjust breeding strategies. Growth information such as body size and weight records the condition of the fish during cultivation, and body size records play an important role in fish genetic analysis, breeding research and aquaculture. Accurate monitoring of fish length and efficient estimation of fish weight are therefore critical for scientific research, feeding management and harvesting.
The most common methods for measuring fish body size are contact and non-contact. In contact measurement, a breeder captures and anesthetizes a small sample of fish, keeps each sampled fish still, manually measures several indexes such as body length and total length (figure 2), and uses the statistics to evaluate the growth of the whole shoal. This is time-consuming, laborious and difficult to perform, and it induces physiological stress in the fish, with adverse effects on the sampled individuals such as growth retardation and death caused by nervous excitation and loss of appetite. Non-contact measurement instead uses a vision device and measurement software in place of manual operation. Computer vision and image processing have developed rapidly over the last decades. In some methods, a camera is mounted underwater in the culture environment and the projection of pixels into the world coordinate system is estimated from globally or semi-globally matched binocular images; however, because visual conditions underwater are poor (insufficient and uneven illumination, frequent floating particles, turbid water, and severe occlusion in dense shoals), such pixel-level matching algorithms cannot produce accurate body size measurements and yield large estimation errors. In terms of measurement dimensionality, machine-vision fish body size measurement falls into two categories: two-dimensional methods, which acquire an image of a plane with a single camera, and three-dimensional methods, which use a depth camera or synchronized multi-view fusion of several cameras.
Multi-view fusion measurement is currently attracting increasing attention and achieves higher precision.
A non-contact, harmless method that confines the underwater target, corrects the refraction of underwater keypoints, and measures point locations to obtain multiple body length indexes is therefore a problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a method and a device for fish body size measurement and weight prediction based on dual-view fusion, which automatically measure the fish body size lengths through instance segmentation, keypoint detection and a deep convolutional neural network, and estimate and predict the weight with a regression network.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a fish body ruler measurement and weight prediction method based on double-view fusion comprises the following steps:
S1, acquiring an internal reference matrix and distortion coefficients of acquisition equipment with different view angles, marking four points of water areas in different views according to an acquisition actual environment, and recording three-dimensional coordinates of the real world of the four points;
S2, acquiring image sequences of fish under different visual angles and different conditions, establishing an image set, and dividing the image set into a training set, a verification set and a test set according to a proportion;
s3, preprocessing an image set;
s4, building a Mask R-CNN network model, extracting characteristics of a target area by adopting a ResNet50 0+FPN backbone network, training and verifying the model by using a divided training set and verification set, and outputting a Mask branch and a key point branch by a network;
S5, branching out pixel-level target masks of different view angles from the Mask;
S6, obtaining a key point heat map of the view with different view angles from the key point detection branch;
s7, testing a test set in the image set by using a trained Mask R-CNN network model to obtain a segmentation Mask and detection key points of the fish in the test set, converting coordinates of the key points into real world distances after refraction correction to obtain body scale data information of the fish, and calculating a median index of each body scale according to a multi-frame image sequence;
S8, training a second regression neural network model based on the training set and the sample real weight value for weight prediction;
And S9, visually displaying the body size index and the predicted weight value which are measured visually.
Preferably, step S1 specifically comprises:
S11, with a fixed focal length, shooting a group of 25 images of a planar calibration plate from different angles, positions and distances, the capture device moving more or less randomly in front of the plate;
S12, locating the inner corner points of the checkerboard on the calibration plate and solving for the intrinsic matrix of the capture device;
S13, performing several mapping calculations with the computed intrinsics, correcting the captured images, and solving for the distortion coefficients of the capture device.
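The quantities estimated in steps S11-S13 (the intrinsic matrix and the distortion coefficients) plug into the standard pinhole projection model with radial distortion. The following is a minimal sketch of that model, not the patent's implementation (real calibration would use a library routine such as OpenCV's calibrateCamera); the values of K and the distortion coefficients are hypothetical:

```python
import numpy as np

def project_point(K, dist, p_cam):
    """Project a 3-D point in camera coordinates to pixel coordinates
    using the pinhole model with two radial distortion terms (k1, k2)."""
    x, y = p_cam[0] / p_cam[2], p_cam[1] / p_cam[2]  # normalized image plane
    k1, k2 = dist
    r2 = x * x + y * y
    scale = 1.0 + k1 * r2 + k2 * r2 * r2             # radial distortion factor
    xd, yd = x * scale, y * scale
    u = K[0, 0] * xd + K[0, 2]                       # fx * x + cx
    v = K[1, 1] * yd + K[1, 2]                       # fy * y + cy
    return np.array([u, v])

# Hypothetical intrinsics: fx = fy = 800 px, principal point (320, 240)
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
uv = project_point(K, (0.0, 0.0), np.array([0.1, 0.2, 1.0]))
print(uv)  # no distortion: (800*0.1+320, 800*0.2+240) = (400, 400)
```

Calibration is the inverse problem: given many such projections of known checkerboard corners, solve for K and the distortion coefficients.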
Preferably, in step S2 the viewing angles comprise a top view and a side view, and the conditions comprise different species, sizes, lighting conditions and water quality conditions.
Preferably, the preprocessing in step S3 includes: image enhancement, illumination equalization, lens distortion correction, and in-water refraction correction.
Preferably, step S4 specifically comprises:
Adding to each region of interest, in parallel with the object detection branches (classification layer and bounding-box regression layer), a branch called the Mask branch, which applies a small fully convolutional network (FCN) to each single RoI and predicts the segmentation mask by pixel-to-pixel alignment of the output mask with the input image;
Detecting each individual RoI with the object detection network, and performing keypoint detection on each region with the network's keypoint branch;
Since both semantic segmentation and keypoint detection include an object detection task, the network adopts ResNet50 plus the feature pyramid network (FPN) as the backbone for feature extraction, followed by three parallel branches: the detection branch, the Mask branch and the keypoint branch.
Preferably, step S5 specifically comprises: after RoIAlign there is a convolutional head with 1×1 kernels that expands the dimensionality of the RoIAlign output; the region selected by the RoI classifier is deconvolved to generate a low-resolution soft mask; in the training stage the annotated mask is scaled down to 28×28 to compute the loss, and in the inference stage the predicted soft mask is scaled up to the size of the RoI bounding box and binarized with a threshold.
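A minimal sketch of the inference-stage mask post-processing described above, assuming a 28×28 soft mask; nearest-neighbour upscaling is used here for brevity (the actual head upscales by deconvolution/bilinear interpolation), and the sizes and threshold are illustrative:

```python
import numpy as np

def mask_to_roi(soft_mask, roi_size, thresh=0.5):
    """Upscale a low-resolution soft mask to the RoI bounding-box size
    by nearest-neighbour sampling, then binarize with a threshold."""
    H, W = roi_size
    h, w = soft_mask.shape
    rows = np.arange(H) * h // H       # source row for each target row
    cols = np.arange(W) * w // W       # source column for each target column
    up = soft_mask[np.ix_(rows, cols)]
    return (up >= thresh).astype(np.uint8)

soft = np.zeros((28, 28))
soft[:, 14:] = 0.9                     # right half predicted as foreground
mask = mask_to_roi(soft, (56, 112))    # hypothetical 56 x 112 RoI
print(mask.shape, int(mask.sum()))
```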
Preferably, step S6 specifically comprises: mapping the RoIs generated by the RPN onto the feature map through the RoIAlign module and extracting the corresponding fish-region features; feeding the features of each RoI into 8 consecutive convolutional layers with 3×3 kernels to extract keypoint features; obtaining a 28×28 heat map by deconvolution; bilinearly upsampling the deconvolved heat map to size K×56×56; and finally upsampling and mapping back to the original image size to obtain the keypoint positions. The keypoint branch outputs the positions of the K keypoints in the original image.
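The mapping from predicted heat maps back to keypoint positions in the original image can be sketched as follows; this is an illustrative simplification (plain per-map argmax plus linear scaling into a hypothetical RoI box), not the exact Mask R-CNN post-processing:

```python
import numpy as np

def heatmap_to_keypoints(heatmaps, roi_box):
    """Convert K heat maps (K x H x W) to keypoint positions in
    original-image coordinates: take the argmax of each map and scale
    it into the RoI bounding box (x1, y1, x2, y2)."""
    K, H, W = heatmaps.shape
    x1, y1, x2, y2 = roi_box
    pts = []
    for k in range(K):
        idx = np.argmax(heatmaps[k])
        hy, hx = divmod(idx, W)                         # row, column of the peak
        # map the heat-map cell centre into the RoI, then into the image
        px = x1 + (hx + 0.5) / W * (x2 - x1)
        py = y1 + (hy + 0.5) / H * (y2 - y1)
        pts.append((px, py))
    return np.array(pts)

hm = np.zeros((1, 56, 56))
hm[0, 28, 14] = 1.0                                     # peak at row 28, col 14
pts = heatmap_to_keypoints(hm, (100, 50, 212, 162))     # hypothetical 112x112 RoI
print(pts)
```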
Preferably, the step S7 specifically includes:
Testing the test set with the trained Mask R-CNN model first yields the segmentation mask and several detected keypoints for the fish in each dual-view image pair; the obtained keypoint coordinates are two-dimensional image coordinates. The conversion from image coordinates to real-world coordinates is described using the side-view and top-view coordinates P 1 and P 2 of a given point as an example. A corner point of the fish tank is chosen as the origin (0, 0, 0) of the world coordinate system, and the direction vector of the ray emanating from the optical center, i.e. the representation of the camera's principal optical axis in the world coordinate system, is solved. The intersection points I 1 (side-view direction) and I 2 (top-view direction) between the principal axis rays and the medium interface plane (the plane separating air and water) are then computed; the medium plane can be calculated from three of the four marked points recorded in step S1. With the ratio of the refractive indices of air and water known, and using the remaining one of the four marked points, the cosines of the incident and refracted rays relative to the plane normal follow from Snell's law of refraction, and the refracted-ray directions are obtained by solving the resulting equations simultaneously. Finally, the closest points M 1 and M 2 where the side-view and top-view refracted rays (nearly) intersect are computed, and their midpoint P is taken as the optimal real-world three-dimensional projection point.
The keypoints in each dual-view image pair are converted into real-world distances after refraction correction, the fish's body size index data are calculated, and the median of each body size index is computed over the multi-frame image sequence. Since body size measurements of live fish contain errors due to posture differences and other causes, repeated measurements are taken and the values between the 1/3 and 2/3 quantiles are averaged, i.e. the middle segment of the data is kept and then averaged. Taking this mid-segment statistic over all acquired body size data reduces both camera imaging errors and errors of the body size measurement algorithm.
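The mid-segment statistic described above (keep only the values between the 1/3 and 2/3 quantiles of the multi-frame measurements, then average) can be sketched as follows; the per-frame length values are hypothetical:

```python
import numpy as np

def mid_segment_mean(measurements, lo=1/3, hi=2/3):
    """Average only the values between the 1/3 and 2/3 quantiles of a
    multi-frame body-size measurement sequence, discarding extreme
    readings caused by posture changes or imaging error."""
    v = np.sort(np.asarray(measurements, dtype=float))
    q_lo, q_hi = np.quantile(v, lo), np.quantile(v, hi)
    mid = v[(v >= q_lo) & (v <= q_hi)]
    return float(mid.mean())

# Hypothetical per-frame body length readings (cm) with two outliers
lengths_cm = [19.8, 20.1, 20.0, 25.3, 20.2, 15.1, 20.0]
print(mid_segment_mean(lengths_cm))  # averages only 20.0, 20.0, 20.1
```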
Preferably, the step S8 specifically includes:
The constructed neural network has a 7-15-1 topology: 7 nodes in the input layer, 15 nodes in the hidden layer and 1 node in the output layer; the hidden-layer activation is the ReLU linear unit function and the output-layer activation is a linear function. The numbers of neurons in the input, hidden and output layers are indexed by the symbols i, j and k respectively; φ and ψ denote the activation functions of the hidden layer and the output layer; w_ij and w_jk are the weights of the hidden-layer and output-layer neurons; and θ_j and θ_k are the thresholds of the hidden-layer and output-layer neurons;
In the forward propagation process, the input and output of the hidden layer neuron node and the input and output of the output layer neuron node are respectively defined as:
net_j = Σ_i w_ij·x_i + θ_j
o_j = φ(net_j) = φ(Σ_i w_ij·x_i + θ_j)
net_k = Σ_j w_jk·o_j + θ_k = Σ_j w_jk·φ(Σ_i w_ij·x_i + θ_j) + θ_k
where x_i are the body size parameters and image feature values of the fish: 4 body size parameters and 3 image feature values, 7 values in total, form the input vector; net_j and o_j are the input and output of the hidden-layer neurons, and net_k and o_k are the input and output of the output-layer neurons;
During model training, the error signal is back-propagated and the weights w_ij and w_jk and thresholds θ_j and θ_k of each layer are adjusted to minimize the error. When the prediction model reaches the target error or the iteration limit, the training phase ends; the body size information and image features extracted from newly acquired image samples are then fed into the trained fish mass prediction model to estimate the mass of the imaged fish.
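A minimal sketch of the 7-15-1 forward pass defined by the formulas above, with randomly initialized weights standing in for a trained model (the input feature values are hypothetical):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def predict_weight(x, w_ij, theta_j, w_jk, theta_k):
    """Forward pass of the 7-15-1 regression network: 7 inputs
    (4 body-size parameters + 3 image features), ReLU hidden layer,
    linear output neuron giving the predicted weight."""
    net_j = x @ w_ij + theta_j        # hidden-layer input, shape (15,)
    o_j = relu(net_j)                 # hidden-layer output
    net_k = o_j @ w_jk + theta_k      # output-layer input (scalar)
    return net_k                      # linear output activation

rng = np.random.default_rng(0)        # untrained, random weights
w_ij = rng.normal(scale=0.1, size=(7, 15))
theta_j = np.zeros(15)
w_jk = rng.normal(scale=0.1, size=(15,))
theta_k = 0.0
x = np.array([20.3, 5.1, 6.2, 4.8, 0.31, 0.55, 0.12])  # hypothetical features
y = predict_weight(x, w_ij, theta_j, w_jk, theta_k)
print(float(y))
```

Training would back-propagate the squared error of y against the sample's true weight to adjust w_ij, w_jk, θ_j and θ_k, as the text describes.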
The device comprises a fish tank with a confinement device, capture devices and a data processing and computing device. The capture devices are mounted on the fish tank, collect image information of the tank, and are connected to the data processing and computing device, to which the collected image information is sent.
A surface light source panel is placed behind each capture device, with white waxed tracing paper placed in front of the panel for the top-view camera; the three side walls of the fish tank, except the face on which a capture device is mounted, are covered with semi-transparent white acrylic panels; and a movable, opaque, coated plexiglass checkerboard calibration plate is placed inside the fish tank facing the side-view camera.
Compared with the prior art, the invention discloses a method and a device for fish body size measurement and weight prediction based on dual-view fusion, which automatically measure the fish body size lengths through instance segmentation, keypoint detection and a deep convolutional neural network, and estimate and predict the weight with a regression network.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of the dual-view-fusion body size measurement and weight estimation process provided by the invention.
Fig. 2 shows the conventional contact manner of measuring fish body size data.
Fig. 3 shows the dual-view fish body size image-capture device.
Fig. 4 illustrates the refraction correction of a target point in water provided by the invention.
Fig. 5 shows the overall architecture of the network provided by the invention.
Fig. 6 shows the two output branch structures provided by the invention: (a) the Mask branch and (b) the keypoint branch.
Fig. 7 is a schematic diagram of the side-view and top-view keypoint settings provided by the invention.
Fig. 8 shows the network structure of the fish weight prediction model provided by the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, embodiments of the invention. All other embodiments obtained by those skilled in the art from the embodiments of the invention without inventive effort fall within the scope of the invention.
As shown in fig. 1, the embodiment of the invention discloses a fish body size measurement and weight prediction method based on dual-view fusion; fig. 2 shows the conventional contact manner of measuring fish body size data. The method of the embodiment comprises the following steps:
S1, acquiring the intrinsic matrix and distortion coefficients of the capture devices at different viewing angles, marking four points of the water surface in the different views according to the actual acquisition environment, and recording the real-world three-dimensional coordinates of the four points. Specifically:
The intrinsic parameters of a camera differ at different focal lengths, so the cameras must be calibrated first. Before the body size measurement experiment starts, with the tank not yet filled with water, no target fish introduced, and the two-dimensional checkerboard calibration plate taken out of the tank, camera images are acquired and calibrated with Zhang's calibration algorithm. The calibration process is as follows: first, with a fixed focal length, the camera shoots a group of 25 images of the planar calibration plate from different angles, positions and distances, moving more or less randomly in front of the plate; second, the inner corner points of the checkerboard on the plate are located and the camera's intrinsic matrix is solved; finally, several mapping calculations are performed with the computed intrinsics, correcting the captured images and solving for the camera's distortion coefficients.
According to the actual inner length and width of the fish tank and the water level, four points of the water surface in the top view (the top-left, top-right, bottom-left and bottom-right corners of the water surface as seen from above), such as points A, B, F and E in fig. 4, are marked manually; the top-left water surface point of the tank is defined as the real-world three-dimensional origin (0, 0, 0) and the real-world three-dimensional coordinates of the four points are recorded. The same is done for the side view, where four points of the front water surface (such as A, B, D and C in fig. 4) are selected and marked manually.
S2, acquiring image sequences of fish at different viewing angles and under different conditions, establishing an image set, and dividing it proportionally into a training set, a validation set and a test set. Specifically:
With the fish body size image acquisition device, the fish to be measured are scooped into a glass water tank, and two cameras mounted in front of and above the tank acquire image sequences. A large number of top-view and side-view image sequences of fish of different species and sizes under different lighting and water quality conditions are collected, an image database for fish body size measurement is established, and the image set is divided into training, validation and test sets in the proportion 5:2:3.
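The 5:2:3 split can be sketched as follows; the helper name and seed are illustrative:

```python
import random

def split_dataset(items, ratios=(5, 2, 3), seed=42):
    """Shuffle the image list and split it into training, validation
    and test subsets in the given proportion (here 5:2:3)."""
    items = list(items)
    random.Random(seed).shuffle(items)   # deterministic shuffle
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train, val, test = split_dataset(range(100))   # 100 stand-in image IDs
print(len(train), len(val), len(test))
```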
S3, preprocessing the image set. Specifically:
First, contrast-limited adaptive histogram equalization (CLAHE) is applied to enhance the contrast of the image and make it look more natural. Then, in the spatial domain, median filtering is applied to suppress image noise, the image is sharpened with the Laplacian operator, and the result is superimposed on the original image to enhance the edge contours.
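A minimal pure-NumPy sketch of the median filtering and Laplacian sharpening steps (CLAHE is omitted; in practice library routines such as OpenCV's createCLAHE, medianBlur and Laplacian would be used):

```python
import numpy as np

def median3x3(img):
    """3x3 median filter built from a stack of shifted copies (edge-padded)."""
    p = np.pad(img, 1, mode='edge')
    stack = [p[i:i + img.shape[0], j:j + img.shape[1]]
             for i in range(3) for j in range(3)]
    return np.median(np.stack(stack), axis=0)

def laplacian_sharpen(img, alpha=1.0):
    """Sharpen by subtracting the 4-neighbour Laplacian from the image,
    i.e. superimpose the edge response on the original."""
    p = np.pad(img, 1, mode='edge')
    lap = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]
           - 4.0 * p[1:-1, 1:-1])
    return img - alpha * lap

img = np.zeros((5, 5))
img[2, 2] = 100.0               # a single noise spike
print(median3x3(img)[2, 2])     # spike suppressed by the median filter
```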
The cameras are calibrated with the 25 captured calibration plate images, yielding the intrinsic matrices and correction coefficients of the top-view and side-view cameras. These are captured and calibrated in air; they characterize properties of the cameras themselves and are independent of refraction at the water surface. Then, four manually annotated points are selected (taken at the intersections between the water surface and the corners of the fish tank, as shown by points A-F in fig. 4). With the three-dimensional coordinates of the four points and their two-dimensional projections known, an iterative method based on Levenberg-Marquardt optimization minimizes the reprojection error of the four points in each camera and computes the camera pose, i.e. the rotation matrix and translation vector of the camera extrinsics. Since the reprojection error is computed with the camera intrinsics, this step depends on them. The extrinsic matrix describes how points with three-dimensional coordinates in the world coordinate system are converted into three-dimensional coordinates in the camera coordinate system, from which the world-coordinate position of the camera's optical center can be obtained. Finally, with both the extrinsics and intrinsics known, the conversion formula between the camera's 3D coordinates and the image's 2D coordinates is obtained.
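The reprojection error that the Levenberg-Marquardt pose refinement minimizes can be sketched as follows; the intrinsics and pose below are hypothetical, and the "observations" are generated by the projection itself, so the error evaluates to exactly zero:

```python
import numpy as np

def reproject(K, R, t, pw):
    """Project a world point into the image via the extrinsics (R, t)
    and intrinsics K: x = K (R pw + t), then perspective division."""
    pc = R @ pw + t                    # world -> camera coordinates
    uvw = K @ pc
    return uvw[:2] / uvw[2]

def reprojection_error(K, R, t, world_pts, image_pts):
    """Mean Euclidean distance between observed 2-D points and the
    reprojection of their known 3-D positions: the quantity the
    Levenberg-Marquardt pose refinement minimizes."""
    errs = [np.linalg.norm(reproject(K, R, t, pw) - uv)
            for pw, uv in zip(world_pts, image_pts)]
    return float(np.mean(errs))

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1.0]])
R, t = np.eye(3), np.array([0.0, 0.0, 2.0])       # camera 2 m from origin
world = [np.array([0.0, 0.0, 0.0]), np.array([0.2, 0.0, 0.0])]
image = [reproject(K, R, t, p) for p in world]    # perfect observations
print(reprojection_error(K, R, t, world, image))  # → 0.0
```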
The calculation and correction of refracted rays in water is a more complex problem. As shown in fig. 4, given the top-view and side-view image coordinates P 1 and P 2 and the previously obtained camera intrinsics and extrinsics, the world coordinates of the refracted target point P must be computed taking water surface refraction into account. With the world origin at (0, 0, 0) as in fig. 4, the direction vector of the ray emanating from the optical center, i.e. the representation of the camera's principal optical axis in the world coordinate system, is first solved, as in formula (1):
r(λ) = r_0 + λ·d⃗  (1)
where r_0 is the three-dimensional position of the camera's optical center in the world coordinate system, which is also a point on the principal axis ray, and λ is a free variable; different values of λ give the three-dimensional coordinates of different points on the ray.
The ray direction d⃗ is computed from the inverse R⁻¹ of the rotation matrix in the camera extrinsics, the inverse K⁻¹ of the camera matrix in the intrinsics, and the image coordinates [x, y] of the projected target point P 1, as in formula (2); r_0 is computed from R⁻¹ and the translation vector t as in formula (3):
d⃗ = R⁻¹·K⁻¹·[x, y, 1]ᵀ  (2)
r_0 = −R⁻¹·t  (3)
As shown in fig. 4, the intersection point I between the principal axis ray and the medium interface plane (the plane separating air and water) must first be found, considering the general case in which the camera's principal axis is not perpendicular to the medium plane (the camera not shooting exactly head-on). Since the lines connecting the four manually marked corner points of the water surface in the fish tank are perpendicular in three dimensions to the normal of the medium plane, three of the four points are taken; the vectors v⃗_1 and v⃗_2 obtained by connecting pairs of these three points in a given view are given by formula (4), and their cross product determines the normal direction n⃗ of the medium plane:
n⃗ = v⃗_1 × v⃗_2  (4)
Let p be the intersection of the ray with the plane and p_0 be the remaining one of the four annotated points in the top or side view; then formula (5) holds:
n⃗ · (p − p_0) = 0  (5)
The value λ_0 at which the ray meets the plane is then solved from formula (6), substituted into formula (1), and the three-dimensional intersection point I of the principal axis ray with the plane is obtained as in formula (7):
λ_0 = n⃗ · (p_0 − r_0) / (n⃗ · d⃗)  (6)
I = r_0 + λ_0·d⃗  (7)
after obtaining the coordinates of the three-dimensional intersection point I of planes among different media, the cosine value of the incoming and outgoing rays and the plane normal can be obtained by Snell's law (refraction law), and the direction of the refracted rays can be obtained by simultaneous solution Wherein/>Is the ratio of the refractive index of air to water. Wherein, θ 1 is incident ray/>Surface normal to interface with Medium/>Included angle, θ 2 is refractive ray/>From normal lineAn included angle between the two.
From the ray directions and the three-dimensional intersection points of the incident rays with the interface plane (I1 and I2 for the two view angles), the refracted rays can be expressed as line equations. Suppose v is the vector between the two refracted rays of the different view angles that is perpendicular to both of them, as in equations (11) and (12), where d1 and d2 are the post-refraction ray directions of the side-shot and top-shot view angles respectively. Solving simultaneously, as in equations (13) and (14), gives M1, the closest point on the top-shot ray, and M2, the closest point on the side-shot ray; the optimal three-dimensional projection point P is the midpoint of the segment connecting the two points M1 and M2, as shown in equation (15).
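The construction of equations (11) to (15) is the standard midpoint triangulation of two skew rays; a minimal sketch with made-up rays:

```python
import numpy as np

def midpoint_of_skew_rays(I1, u, I2, v):
    """Closest points M1 and M2 on two (generally skew) refracted rays
    I1 + s*u and I2 + t*v, and their midpoint P (equations (11)-(15))."""
    w0 = I1 - I2
    a, b, c = u @ u, u @ v, v @ v
    d, e = u @ w0, v @ w0
    denom = a * c - b * b              # zero only for parallel rays
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    M1, M2 = I1 + s * u, I2 + t * v
    return (M1 + M2) / 2.0

# Two perpendicular rays whose closest points straddle (1, 1, 0):
P = midpoint_of_skew_rays(np.array([0.0, 1.0, 0.5]), np.array([1.0, 0.0, 0.0]),
                          np.array([1.0, 0.0, -0.5]), np.array([0.0, 1.0, 0.0]))
```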
S4, building a Mask R-CNN network model, as shown in FIG. 5, extracting the features of the target area with a ResNet50+FPN backbone network, training and verifying the model with the divided training set and verification set, the network outputting a Mask branch and a key point branch; the method specifically comprises the following steps:
An end-to-end network framework is adopted to complete the instance segmentation task and the key point detection task simultaneously. Instance segmentation must correctly identify the classification and location of objects in the image while accurately distinguishing which object each pixel belongs to; it comprises object detection and foreground semantic segmentation. First, a branch that predicts the segmentation mask, called the Mask branch, is added to each region of interest, parallel to the object-detection branches formed by the classification layer and the bounding-box regression layer. The Mask branch applies a small fully convolutional network (FCN) to each RoI, predicting the segmentation mask through pixel-to-pixel alignment of the output mask and the input image. In addition, the key point detection branch first detects each individual RoI through the target detection network and then performs key point detection on each region separately; this is the top-down key point detection approach adopted in Mask R-CNN. In this method, both semantic segmentation and key point detection include a target detection task. Therefore, in the network structure, ResNet50 + the feature pyramid network FPN serves as the backbone for feature extraction, after which three parallel branches are connected (fig. 5): the detection branch, the Mask branch, and the key point branch. These complete three tasks simultaneously: target detection, instance segmentation, and key point detection.
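A toy PyTorch sketch (not the patent's actual network) of the three parallel heads attached to per-RoI features; channel counts and feature sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinyMaskKeypointHeads(nn.Module):
    """Toy illustration of Mask R-CNN's three parallel per-RoI heads:
    a detection head (class + box), a small-FCN mask head, and a keypoint
    head. Sizes are illustrative, not the patent's actual configuration."""
    def __init__(self, in_ch=256, num_classes=2, num_keypoints=9):
        super().__init__()
        # detection head: classification scores + box regression
        self.cls = nn.Linear(in_ch * 14 * 14, num_classes)
        self.box = nn.Linear(in_ch * 14 * 14, num_classes * 4)
        # mask head: conv stack + deconv -> 28x28 soft mask per class
        self.mask = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(),
            nn.Conv2d(256, num_classes, 1))
        # keypoint head: conv stack + deconv -> one heat map per keypoint
        self.kpt = nn.Sequential(
            nn.Conv2d(in_ch, 256, 3, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(256, num_keypoints, 2, stride=2))

    def forward(self, roi_feats):               # roi_feats: (N, C, 14, 14)
        flat = roi_feats.flatten(1)
        return (self.cls(flat), self.box(flat),
                self.mask(roi_feats), self.kpt(roi_feats))

heads = TinyMaskKeypointHeads()
scores, boxes, masks, heatmaps = heads(torch.randn(3, 256, 14, 14))
```

In practice one would start from a library implementation such as torchvision's `maskrcnn_resnet50_fpn`, which already wires a ResNet50+FPN backbone, RPN, and RoIAlign in front of heads of this kind.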
S5, obtaining pixel-level target masks for the different view angles from the Mask branch; the method specifically comprises the following steps:
Mask R-CNN extends Faster R-CNN to pixel-level instance segmentation. Compared with the structure of Faster R-CNN, a lightweight FCN Mask branch is added that predicts a segmentation mask for each RoI, parallel to the existing classification and bounding-box regression branches. The Mask branch is shown in FIG. 6-a. After RoIAlign there is a convolutional layer with a 1 × 1 kernel, whose purpose is to expand the dimension of the RoIAlign output and make the predicted mask more accurate. The region selected by the RoI classifier is then passed through a deconvolution network to produce a low-resolution soft mask (28 × 28 pixels). Representing the soft mask with floating-point numbers at this small size keeps the mask branch lightweight. In the training phase, the annotated mask is scaled down to 28 × 28 to calculate the loss. In the inference phase, the predicted soft mask is scaled up to the size of the RoI bounding box and binarized with a threshold. The branch ultimately outputs one binary mask per object, which determines whether each pixel is part of the specified object.
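The inference step described above, scaling the 28 × 28 soft mask up to the RoI box and thresholding it, can be sketched as follows (a simplified bilinear resize; real Mask R-CNN implementations differ in sampling details):

```python
import numpy as np

def paste_mask_in_box(soft_mask, box_w, box_h, thresh=0.5):
    """Scale an m x m floating-point soft mask up to the RoI box size with
    bilinear interpolation, then binarize it with a threshold."""
    m = soft_mask.shape[0]
    # Sample the soft mask at the centre of every output pixel.
    ys = (np.arange(box_h) + 0.5) * m / box_h - 0.5
    xs = (np.arange(box_w) + 0.5) * m / box_w - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, m - 2)
    x0 = np.clip(np.floor(xs).astype(int), 0, m - 2)
    wy = np.clip(ys - y0, 0.0, 1.0)[:, None]
    wx = np.clip(xs - x0, 0.0, 1.0)[None, :]
    top = (1 - wx) * soft_mask[y0][:, x0] + wx * soft_mask[y0][:, x0 + 1]
    bot = (1 - wx) * soft_mask[y0 + 1][:, x0] + wx * soft_mask[y0 + 1][:, x0 + 1]
    up = (1 - wy) * top + wy * bot
    return (up >= thresh).astype(np.uint8)

soft = np.zeros((28, 28))
soft[7:21, 7:21] = 0.9                     # toy soft mask: a confident square
binary = paste_mask_in_box(soft, 56, 56)   # upscale to a 56 x 56 RoI box
```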
S6, obtaining key point heat maps of the views at the different view angles from the key point detection branch; the method specifically comprises the following steps:
As shown in FIG. 7, nine key points are detected in the side view (the important ones being the fish mouth, the highest point of the back below the dorsal fin, the lowest point of the belly before the ventral fin, the end of the vertebral column, and the end of the caudal fin) and eight in the top view (the important ones being the leftmost and rightmost points of the belly at its widest). These key points are then used for three-dimensional body size measurement of the fish in the next stage. The key points and the body size indices they make measurable are described in Table 1:
Table 1 Description of the key points in the side and top views of the fish
The network structure of the key point branch is shown in fig. 6-b. When constructing the key point detection branch, only minor changes to the segmentation mask branch are needed. For each instance with K key points, a one-hot mask in which only one pixel is labeled foreground is fed into the training model, converting the key point regression task into a mask regression task. During inference, the key point prediction model outputs a K × m × m binary mask heat map, where K is the number of key point classes in the fish image (9 key points in the side view and 8 in the top view) and m × m is the heat-map resolution, preferably m = 56. The key point branch is similar in network structure to the mask branch. The specific process is as follows: the RoIs generated by the RPN network are mapped onto the feature map through the RoIAlign module, and the corresponding fish region features are extracted; the features of each RoI are then fed through eight consecutive convolutional layers with 3 × 3 kernels to extract key point features; a 28 × 28 heat map is then obtained by deconvolution; to increase its size, bilinear upsampling is applied to the deconvolved heat map, yielding a K × 56 × 56 heat map; finally, upsampling and mapping back to the original image size gives the key point positions. The key point branch outputs the positions of the K key points in the original image.
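Converting a K × m × m heat-map stack into key point coordinates in the original image reduces to a per-channel argmax followed by rescaling into the RoI box; a minimal sketch with one synthetic peak:

```python
import numpy as np

def heatmap_to_keypoints(heatmaps, box):
    """Take the per-channel peak of a K x m x m heat-map stack and map it
    back to (x, y) positions in the original image inside the RoI box."""
    k, m, _ = heatmaps.shape
    x1, y1, x2, y2 = box
    flat = heatmaps.reshape(k, -1).argmax(axis=1)
    ys, xs = np.divmod(flat, m)            # row/col of each channel's peak
    # Map heat-map cell centres back to image coordinates inside the box.
    px = x1 + (xs + 0.5) * (x2 - x1) / m
    py = y1 + (ys + 0.5) * (y2 - y1) / m
    return np.stack([px, py], axis=1)

hm = np.zeros((9, 56, 56))
hm[0, 28, 14] = 1.0                        # synthetic peak for keypoint 0
pts = heatmap_to_keypoints(hm, box=(100.0, 50.0, 212.0, 162.0))
```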
S7, testing the test set in the image set with the trained Mask R-CNN network model to obtain the segmentation masks and detected key points of the fish in the test set, converting the key point coordinates into real-world distances after refraction correction to obtain the body size data of the fish, and calculating the median of each body size index over the multi-frame image sequence; specifically:
As described above, after camera calibration the intrinsic matrices and distortion coefficients of the side-view and top-view cameras are obtained and the camera focus is kept fixed; the process of locating and measuring target coordinates can then begin. After step S6 is performed based on instance segmentation and key point detection, the 2D image coordinates of several key points in the side and top views are obtained. To recover the 3D real-world coordinates of these key points, the camera intrinsics and distortion coefficients from calibration, the camera pose estimate (the extrinsics), and the correction of refraction in water are required, realizing the transformation from the camera coordinate system to the world coordinate system.
If refraction correction is not considered, equation (16) gives the conversion between the [u, v] coordinates on the two-dimensional imaging plane (i.e., image coordinates) and [xw, yw, zw] in the world coordinate system (i.e., real-world coordinates). The 3 × 3 matrix in the middle is the camera intrinsic matrix (abbreviated K in equation (17)), determined by the parameters of the sensor and the camera lens. The 3 × 4 matrix is formed by concatenating the 3 × 3 rotation matrix R and the translation vector t; these camera extrinsics are determined by the position and pose of the camera.
Equation (16) can be abbreviated as equation (17), where pi is the 2D image coordinate of a point, Pw is the 3D world coordinate of the point, and the intrinsic matrix K, the extrinsic rotation matrix R, and the translation vector t are as explained before.
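A sketch of the projection of equations (16)/(17) in the refraction-free case; the intrinsic values are illustrative placeholders:

```python
import numpy as np

def project(Pw, K, R, t):
    """Pinhole projection of equations (16)/(17): world point -> pixel
    coordinates, ignoring lens distortion and refraction."""
    pc = R @ Pw + t                  # world -> camera coordinates
    uvw = K @ pc                     # camera -> homogeneous image coordinates
    return uvw[:2] / uvw[2]          # perspective divide -> [u, v]

K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])      # illustrative intrinsics
uv = project(np.array([0.1, -0.05, 2.0]), K, np.eye(3), np.zeros(3))
```

Going the other way, from [u, v] back to a world point, is exactly the back-projection plus refraction correction described in step S3.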
The two-dimensional coordinate information of the side view and the top view is processed separately and fused to form the three-dimensional real-world coordinates of the target; if refraction in water is considered, the calculation follows equation (15) in step S3. Finally, the fish body size indices are calculated as shown in Table 2:
Table 2 Description and calculation of the fish body size indices
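Each body size index of this kind is in essence a Euclidean distance between a pair of refraction-corrected 3D key points, and step S7 reports the median over the multi-frame sequence; a minimal sketch with made-up coordinates (the actual key point pairings are those listed in Table 2):

```python
import numpy as np

def body_index(p_a, p_b):
    """A body size index as the Euclidean distance between a pair of
    refraction-corrected 3D key points (coordinates below are made up)."""
    return float(np.linalg.norm(np.asarray(p_b) - np.asarray(p_a)))

full_length = body_index([0.0, 0.0, 0.0], [0.20, 0.0, 0.05])  # e.g. mouth to caudal fin end

# The median over the multi-frame sequence is robust to occasional bad
# frames (one outlier below):
per_frame = [20.4, 20.7, 20.5, 25.9, 20.6]
median_index = float(np.median(per_frame))
```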
S8, training a second, regression neural network model, based on the training set and the true sample weight values, for weight prediction; specifically:
An estimation model of fish mass is constructed based on the BPNN (back-propagation neural network) prediction algorithm. This neural network trains a multi-layer feedforward network through error back-propagation, has strong nonlinear mapping capability, and performs well in many prediction fields. The neural network model for fish weight estimation consists of three layers, input, hidden, and output, connected by nodes. The visually measured fish body size parameters and image feature values serve as the input-layer parameters, and the fish mass is the output-layer result. Balancing computational complexity against model accuracy, the model uses three layers, i.e., a single hidden layer. The topology of the constructed neural network model is 7-15-1: 7 nodes in the input layer, 15 nodes in the hidden layer, and 1 node in the output layer. The activation function of the hidden layer is the ReLU rectified linear unit function, and the activation function of the output layer is linear. The topology is shown in FIG. 8, in which the neurons of the input, hidden, and output layers are indexed by the symbols i, j, and k respectively, φ and ψ are the activation functions of the hidden layer and the output layer respectively, w_ij and w_jk are respectively the weights of the hidden-layer and output-layer neurons, and θ_j and θ_k are respectively the thresholds of the hidden-layer and output-layer neurons.
In the forward propagation process, the input and output of the hidden layer neuron node and the input and output of the output layer neuron node are respectively defined as:
net_j = Σ_i w_ij·x_i (18)
o_j = φ(net_j) = φ(Σ_i w_ij·x_i) (19)
net_k = Σ_j w_jk·o_j + θ_k = Σ_j w_jk·φ(Σ_i w_ij·x_i) + θ_k (20)
where x_i are the body size parameters and image feature values of the fish (4 body size parameters and 3 image feature values, 7 in total, form the input vector), net_j and o_j are respectively the input and output of the hidden-layer neurons, and net_k and o_k are respectively the input and output of the output-layer neurons.
During model training, back-propagation of the error signal adjusts the weights (w_ij and w_jk) and thresholds (θ_j and θ_k) of each layer to minimize the error; the training phase ends when the prediction model reaches the target error or the maximum number of iterations. The body size information and image features extracted from image samples collected online are then fed into the trained fish mass prediction model to estimate the fish mass of each sample.
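A minimal NumPy sketch of the 7-15-1 forward pass of equations (18) to (20); the weights below are random placeholders rather than trained values:

```python
import numpy as np

rng = np.random.default_rng(0)

# 7-15-1 topology: 7 inputs (4 body size parameters + 3 image features),
# 15 hidden ReLU units, 1 linear output (the fish mass). Placeholder
# weights; in the patent these are learned by error back-propagation.
w_ij = rng.normal(0.0, 0.1, (15, 7))   # input -> hidden weights
th_j = np.zeros(15)                    # hidden thresholds (biases)
w_jk = rng.normal(0.0, 0.1, (1, 15))   # hidden -> output weights
th_k = np.zeros(1)                     # output threshold

def relu(x):
    return np.maximum(x, 0.0)

def predict_mass(x):
    """Forward pass of equations (18)-(20): net_j = sum_i w_ij x_i,
    o_j = phi(net_j), net_k = sum_j w_jk o_j + theta_k, linear output."""
    net_j = w_ij @ x + th_j
    o_j = relu(net_j)              # hidden activation phi = ReLU
    net_k = w_jk @ o_j + th_k      # linear output = predicted mass
    return float(net_k[0])

mass = predict_mass(np.ones(7))        # 7-element feature vector
```

Training these weights amounts to gradient descent on the squared prediction error, which is what the back-propagation procedure described above performs.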
S9, displaying the index value in a visual mode.
The device comprises a fish tank with a limiting device, an acquisition device, and a data processing and computing device. The acquisition device is mounted on the fish tank, collects image information of the fish tank, is connected with the data processing and computing device, and sends the collected image information to it;
A surface light source panel is placed behind each acquisition device; white waxed paper is placed in front of the top-view (downward-shooting) light source panel; except for the side on which the acquisition device is mounted, the three side walls of the fish tank are covered with semi-transparent white acrylic panels; and a movable, opaque, coated organic-glass checkerboard calibration plate is placed inside the fish tank, roughly in the frontal direction of the side-view camera.
Specifically, a surface light source is placed behind each camera; the light panel is 30 × 30 cm and provides a stable, continuous area source of 670 lumens at a colour temperature of 4000 K. In particular, white waxed paper is placed in front of the top-view light source panel to scatter the light more evenly: because the bottom of the fish tank is roughly perpendicular to the main optical axis of the top-view camera, the top-view light source would otherwise be reflected by the tank bottom over a large area of the camera's field of view; with the waxed paper added, the highlight spots in the field of view become softer and the image is clearer and more natural.
The inner dimensions of the fish tank are 30 × 30 cm, with a water depth of about 20 cm. The captured picture is larger than the entire bottom surface of the tank; in the top-view picture the effective tank area occupies about 1200 × 1200 pixels, and in the side-view picture about 1800 × 900 pixels. In addition, three side walls of the tank are covered with semi-transparent white acrylic panels to reduce the specular reflection caused by the different media inside and outside the panels; since the side view requires light transmission, one side still has a specular-reflection problem. The camera is placed on the opposite side; specular reflection can also occur at the water surface because of the media difference there, but this can be reduced by adjusting the shooting angle and mounting height of the camera so that its viewing direction is parallel to the water surface.
A movable, opaque, coated organic-glass checkerboard calibration plate (checkerboard size 5 × 5 cm) is placed inside the fish tank, approximately in the frontal direction of the side-view camera, as shown in fig. 3.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1) The refraction of light in water is taken into account: two-dimensional image points are converted and mapped to real-world three-dimensional coordinates, and refraction correction improves the accuracy of in-water fish body size measurement;
2) Incorporating the deep convolutional network Mask R-CNN makes the method more adaptable to the illumination and scene conditions of the application environment;
3) Based on the principle of object instance segmentation, the contour edges of the fish can be segmented accurately, and the resulting object mask has higher segmentation precision and a better segmentation effect than traditional thresholding;
4) Based on the idea of key point detection, specific body-size-related key points on the fish (such as the mouth, dorsal fin, ventral fin, and caudal fin) are explicitly attended to and their spatial positions obtained, improving the accuracy of underwater matching and yielding more than ten fine-grained body size measurements;
5) The device is simple and the system easy to operate: only a glass tank for measuring the fish, a checkerboard calibration plate movable back and forth that doubles as a water-retaining baffle, two cameras (side view and top view), a back end with computing capability, and a display computer as the software interface are needed to perform on-line body size calculation and weight estimation of fish in field-captured images, with high accuracy and execution efficiency;
6) With respect to scene changes, changes of fish species, and the like, the model generalizes well to the body size calculation task when the data set is changed or expanded;
7) The stress and physiological injury caused to fish by anesthetic injection and out-of-water weighing are avoided.
In the present specification, the embodiments are described in a progressive manner, each embodiment focusing on its differences from the others; for the identical and similar parts, the embodiments may be referred to one another. Since the device disclosed in the embodiments corresponds to the method disclosed therein, its description is relatively brief; for relevant details, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. The fish body ruler measurement and weight prediction method based on double-view fusion is characterized by comprising the following steps of:
S1, acquiring an internal reference matrix and distortion coefficients of acquisition equipment with different view angles, marking four points of water areas in different views according to an acquisition actual environment, and recording three-dimensional coordinates of the real world of the four points;
S2, acquiring image sequences of fish under different visual angles and different conditions, establishing an image set, and dividing the image set into a training set, a verification set and a test set according to a proportion;
s3, preprocessing an image set;
s4, building a Mask R-CNN network model, extracting characteristics of a target area by adopting a ResNet50 0+FPN backbone network, training and verifying the model by using a divided training set and verification set, and outputting a Mask branch and a key point branch by a network;
S5, obtaining pixel-level target masks for the different view angles from the Mask branch;
S6, obtaining key point heat maps of the views at the different view angles from the key point detection branch;
s7, testing a test set in the image set by using a trained Mask R-CNN network model to obtain a segmentation Mask and detection key points of the fish in the test set, converting coordinates of the key points into real world distances after refraction correction to obtain body scale data information of the fish, and calculating a median index of each body scale according to a multi-frame image sequence;
S8, training a second regression neural network model based on the training set and the sample real weight value for weight prediction; the method specifically comprises the following steps:
The constructed neural network model topology is 7-15-1, namely 7 nodes in the input layer, 15 nodes in the hidden layer, and 1 node in the output layer; the activation function of the hidden layer is the ReLU rectified linear unit function, and the activation function of the output layer is a linear function; the neurons of the input layer, the hidden layer, and the output layer are indexed by the symbols i, j, and k respectively, φ and ψ are the activation functions of the hidden layer and the output layer respectively, w_ij and w_jk are respectively the weights of the hidden-layer and output-layer neurons, and θ_j and θ_k are respectively the thresholds of the hidden-layer and output-layer neurons;
In the forward propagation process, the input and output of the hidden layer neuron node and the input and output of the output layer neuron node are respectively defined as:
net_j = Σ_i w_ij·x_i
o_j = φ(net_j) = φ(Σ_i w_ij·x_i)
net_k = Σ_j w_jk·o_j + θ_k = Σ_j w_jk·φ(Σ_i w_ij·x_i) + θ_k
wherein x_i are the body size parameters and image feature values of the fish, of which 4 body size parameters and 3 image feature values, 7 in total, form the input vector; net_j and o_j are respectively the input and output of the hidden-layer neurons, and net_k and o_k are respectively the input and output of the output-layer neurons;
during model training, back-propagation of the error signal adjusts the weights w_ij and w_jk and the thresholds θ_j and θ_k of each layer to minimize the error; when the prediction model reaches the target error or the number of iterations, the training phase ends; body size information and image features obtained from collected image samples are then input into the trained fish mass prediction model to realize the estimation of the fish mass of the image samples;
And S9, visually displaying the body size index and the predicted weight value which are measured visually.
2. The method for fish body ruler measurement and weight prediction based on double-view fusion according to claim 1, wherein the step S1 specifically comprises:
S11, randomly shooting a group of 25 images of the plane calibration plate from different angles, different positions and different distances by the acquisition equipment with a fixed focal length, wherein the acquisition equipment moves relatively randomly in front of the plane calibration plate;
S12, searching for the inner corner points of the checkerboard on the plane calibration plate, and solving the intrinsic matrix of the acquisition equipment;
S13, carrying out mapping calculation for the camera a plurality of times according to the calculated intrinsic parameters of the acquisition equipment, correcting the captured images, and solving the distortion coefficients of the acquisition equipment.
3. The method for measuring the body size and predicting the body weight of fish based on the double-view fusion according to claim 1, wherein in the step S2, the view angles comprise top view and side view, and the conditions comprise different types, different sizes, different illumination conditions and different water quality conditions.
4. The method for fish body ruler measurement and weight prediction based on double vision fusion according to claim 1, wherein the preprocessing in step S3 comprises: image enhancement, illumination equalization, lens distortion correction, and in-water refraction correction.
5. The method for fish body ruler measurement and weight prediction based on double-view fusion according to claim 1, wherein the step S4 specifically comprises:
Adding a branch to each region of interest, called Mask branch, parallel to the object detection branches of the classification layer and the bounding box regression layer, which applies a small full convolution network FCN to a single RoI, predicting the segmentation Mask by pixel-to-pixel alignment of the output Mask and the input image;
The key point detection branch is used for detecting each individual RoI through a target detection network, and the key point detection is respectively carried out on each region by utilizing the key point branch of the network;
Both semantic segmentation and key point detection include a target detection task; therefore, in the network structure, ResNet50 + the feature pyramid network FPN is adopted as the backbone network for feature extraction, and three parallel branches are connected: the detection branch, the Mask branch, and the key point branch.
6. The method for fish body ruler measurement and weight prediction based on double-view fusion according to claim 1, wherein the step S5 specifically comprises: after RoIAlign, a convolutional layer with a 1 × 1 kernel expands the dimension of the RoIAlign output; the region selected by the RoI classifier is deconvolved by a deconvolution network to generate a low-resolution soft mask; in the training stage the annotated mask is scaled down to 28 × 28 to calculate the loss, and in the inference stage the predicted soft mask is scaled up to the size of the RoI bounding box and binarized using a threshold.
7. The method for fish body ruler measurement and weight prediction based on double-view fusion according to claim 1, wherein the step S6 specifically comprises: mapping the RoIs generated by the RPN network onto the feature map through the RoIAlign module, and extracting the corresponding fish region features; then feeding the features of each RoI through eight consecutive convolutional layers with 3 × 3 kernels to extract key point features; then obtaining a 28 × 28 heat map by deconvolution; performing bilinear upsampling on the deconvolved heat map to obtain a heat map of size K × 56 × 56, wherein K is the number of key points; finally, upsampling and mapping to the original image size to obtain the key point positions; the key point branch outputs the positions of the K key points in the original image.
8. The method for fish body ruler measurement and weight prediction based on double-view fusion according to claim 1, wherein the step S7 specifically comprises:
testing the test set in the image set with the trained Mask R-CNN network model, first obtaining the segmentation mask of the fish in the double-view images of the test set and a plurality of detected key points, the obtained key point coordinates being two-dimensional image coordinates; in the conversion of image coordinates into real-world coordinates, the coordinates of a point in the side-view image are denoted P1 and its coordinates in the top-view image P2;
selecting a corner point of the fish tank as the origin (0, 0, 0) of the world coordinate system, and solving the ray direction vector emanating from the optical centre, i.e., the representation in the world coordinate system of the ray along the camera's main optical axis;
solving the intersection points I1 and I2 between the main-optical-axis rays and the medium interface plane, i.e., the plane separating air and water, the medium plane being calculated from three of the four annotated points recorded in step S1, wherein I1 is the side-view intersection point and I2 is the top-view intersection point;
according to Snell's law of refraction, knowing the ratio of the refractive index of air to that of water and the remaining one of the four annotated points, obtaining the cosines of the angles between the incident and refracted rays and the plane normal, and solving simultaneously for the directions of the refracted rays; finally, from the side-view and top-view refracted rays, calculating the closest points M1 and M2 at their near-intersection, whose midpoint P serves as the optimal real-world three-dimensional projection point;
the key points in each pair of double-view images are converted into real world distances after refraction correction, body scale index data of the fish are calculated, and the median index of each body scale index is calculated according to the multi-frame image sequence.
9. A device for measuring fish body scales and predicting weight based on double-view fusion, which is applied to the method for measuring fish body scales and predicting weight based on double-view fusion as claimed in any one of claims 1-8, and is characterized by comprising a fish tank with a limiting device, a collecting device and a data processing computing device, wherein the collecting device is arranged on the fish tank, collects image information of the fish tank, is connected with the data processing computing device and sends the collected image information to the data processing computing device;
The device is characterized in that a surface light source panel is placed behind each acquisition device, white waxed paper is placed in front of the top-view light source panel, the three side walls of the fish tank other than the mounting side of the acquisition device are covered with semi-transparent white acrylic panels, and a movable, opaque, coated organic-glass checkerboard calibration plate is placed inside the fish tank in the frontal direction of the side-view camera.
CN202111579640.9A 2021-12-22 2021-12-22 Fish body ruler measurement and weight prediction method and device based on double-view fusion Active CN114241031B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111579640.9A CN114241031B (en) 2021-12-22 2021-12-22 Fish body ruler measurement and weight prediction method and device based on double-view fusion


Publications (2)

Publication Number Publication Date
CN114241031A CN114241031A (en) 2022-03-25
CN114241031B true CN114241031B (en) 2024-05-10

Family

ID=80761174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111579640.9A Active CN114241031B (en) 2021-12-22 2021-12-22 Fish body ruler measurement and weight prediction method and device based on double-view fusion

Country Status (1)

Country Link
CN (1) CN114241031B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114723965B (en) * 2022-06-10 2022-09-09 武汉古奥基因科技有限公司 Fish key point phenotype data measurement method
CN115100683A (en) * 2022-06-22 2022-09-23 马上消费金融股份有限公司 Duplication estimating method, duplication estimating device, duplication estimating equipment and storage medium
CN115396576B (en) * 2022-08-24 2023-08-08 南京农业大学 Device and method for automatically measuring sheep body ruler from side view and overlook double-view images
CN116883828B (en) * 2023-08-22 2023-11-24 中国科学院水生生物研究所 Intelligent fish growth performance identification method and analysis system
CN117346685A (en) * 2023-10-11 2024-01-05 华中农业大学 Catfish phenotype characteristic measurement device and catfish phenotype characteristic measurement method
CN117522951B (en) * 2023-12-29 2024-04-09 深圳市朗诚科技股份有限公司 Fish monitoring method, device, equipment and storage medium
CN117876377B (en) * 2024-03-13 2024-05-28 浙江荷湖科技有限公司 Microscopic imaging general nerve extraction method based on large model

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960456A (en) * 2017-03-28 2017-07-18 长沙全度影像科技有限公司 A kind of method that fisheye camera calibration algorithm is evaluated
WO2019232247A1 (en) * 2018-06-01 2019-12-05 Aquabyte, Inc. Biomass estimation in an aquaculture environment
WO2020046524A1 (en) * 2018-08-27 2020-03-05 Aquabyte, Inc. Automatic feed pellet monitoring based on camera footage in an aquaculture environment
WO2020046523A1 (en) * 2018-08-27 2020-03-05 Aquabyte, Inc. Optimal feeding based on signals in an aquaculture environment
CN111862048A (en) * 2020-07-22 2020-10-30 浙大城市学院 Automatic fish posture and length analysis method based on key point detection and deep convolutional neural network
CN112766274A (en) * 2021-02-01 2021-05-07 长沙市盛唐科技有限公司 Automatic water level reading method and system for water gauge images based on the Mask R-CNN algorithm

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021178537A1 (en) * 2020-03-04 2021-09-10 Magic Leap, Inc. Systems and methods for efficient floorplan generation from 3d scans of indoor scenes


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Surface defect detection of inner grooves in plunger-type brake master cylinders based on LabVIEW and Mask R-CNN; Jin Ying; Wang Xueying; Duan Linmao; Modern Manufacturing Engineering; 2020-05-18 (05); pp. 131-138 *

Also Published As

Publication number Publication date
CN114241031A (en) 2022-03-25

Similar Documents

Publication Publication Date Title
CN114241031B (en) Fish body ruler measurement and weight prediction method and device based on double-view fusion
CN111862048B (en) Automatic fish posture and length analysis method based on key point detection and deep convolutional neural network
Zhou et al. Underwater vision enhancement technologies: A comprehensive review, challenges, and recent trends
Costa et al. Extracting fish size using dual underwater cameras
CN113112504A (en) Plant point cloud data segmentation method and system
CN113435282B (en) Unmanned aerial vehicle image ear recognition method based on deep learning
Lou et al. Accurate multi-view stereo 3D reconstruction for cost-effective plant phenotyping
CN110335245A (en) Cage netting damage monitoring method and system based on monocular spatiotemporally continuous images
CN109165658A (en) Strong-negative-sample underwater target detection method based on Faster R-CNN
CN112465778A (en) Underwater fish shoal observation device and method
CN112561996A (en) Target detection method in autonomous underwater robot recovery docking
CN115512215A (en) Underwater biological monitoring method and device and storage medium
CN110349209A (en) Vibrating spear localization method based on binocular vision
CN116883483A (en) Fish body measuring method based on laser camera system
CN116824454A (en) Fish behavior identification method and system based on spatial pyramid attention
CN114037737B (en) Neural network-based offshore submarine fish detection and tracking statistical method
Yu et al. Visual Perception and Control of Underwater Robots
CN116778310A (en) Acoustic-optical image fusion monitoring method and system for aquaculture
CN114120129A (en) Three-dimensional identification method for landslide slip surface based on unmanned aerial vehicle image and deep learning
TW202309776A (en) Systems and methods for intelligent aquaculture estimation of the number of fish
Zhang et al. Research on Binocular Stereo Vision Ranging Based on Improved YOLOv5s
CN112215874B (en) Behavior monitoring system for bay area fry
CN117953361B (en) Density-map-based robust counting method for small targets in underwater fish shoals
Mendu et al. Cattle detection occlusion problem
CN117975254B (en) Binocular image-based method and system for monitoring living algae

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant