CN116703835A - Intelligent reinforcement detection method and system based on convolutional neural network and binocular vision - Google Patents
- Publication number
- CN116703835A (application CN202310578442.3A)
- Authority
- CN
- China
- Prior art keywords
- mask
- reinforcement
- module
- neural network
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/0004 — Industrial image inspection
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/096 — Transfer learning
- G06T7/11 — Region-based segmentation
- G06T7/66 — Analysis of geometric attributes of image moments or centre of gravity
- G06T7/68 — Analysis of geometric attributes of symmetry
- G06V10/25 — Determination of region of interest [ROI] or a volume of interest [VOI]
- G06V10/267 — Segmentation of patterns by performing operations on regions, e.g. growing, shrinking or watersheds
- G06V10/28 — Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
- G06V10/75 — Organisation of the matching processes, e.g. simultaneous or sequential comparisons; coarse-fine approaches
- G06V10/82 — Image or video recognition or understanding using neural networks
- Y02P90/30 — Computing systems specially adapted for manufacturing
Abstract
The application discloses an intelligent reinforcement detection method and system based on a convolutional neural network and binocular vision. The method comprises the following steps: S1: acquiring RGB and depth image data of the steel bars with a depth camera; S2: inputting the RGB images into a convolutional neural network for steel bar recognition to obtain predicted bounding boxes and masks of the bars; S3: performing reinforcement detection with binocular vision technology based on the recognition result, and outputting a visualized reinforcement quality acceptance result. Using a convolutional neural network for steel bar recognition improves the accuracy of steel bar detection and segmentation; combined with binocular vision technology, intelligent reinforcement detection runs in real time and outputs visualized results, assisting staff in quality inspection and acceptance of concealed reinforcement work, greatly improving work efficiency and reducing labor cost.
Description
Technical Field
The application relates to the technical field of deep learning and computer vision, and in particular to an intelligent reinforcement detection method and system based on a convolutional neural network and binocular vision.
Background
Reinforced concrete is the most widely used structural form in existing engineering structures because its materials are readily available, it is highly mouldable, its stresses are distributed rationally, its construction process is simple, and its cost is low. During structural design, the bearing capacity of members is guaranteed by controlling the diameter and spacing of the reinforcing bars, and concealed reinforcement work must be accepted before concrete pouring, i.e., checking whether the binding specification, number, and spacing of the bars meet the design requirements. Traditional reinforcement detection relies mainly on manual measurement, which limits detection precision and range and poses safety hazards on construction sites. Against a background of labor shortage and an ageing workforce, traditional reinforcement detection needs an intelligent transformation.
Steel bar detection methods based on digital image processing are easily affected by illumination, background, occlusion, and other factors; their precision cannot meet the requirements of actual engineering, and their real-time performance is poor. With the development of laser scanning technology and equipment, high-precision measurement using three-dimensional point clouds has been widely applied in civil engineering, but the high price of laser scanning equipment and the complexity of data acquisition and computation limit its practical application.
In recent years, owing to their strong feature learning capability, object detection and instance segmentation algorithms based on convolutional neural networks have been widely applied, with good results in tasks such as prefabricated-part recognition, rebar tie-point localization, and rebar cross-section counting.
The patent specification with publication number CN113269718A discloses a deep-learning-based crack detection method for precast concrete members: crack image data are collected, the samples are preprocessed and manually annotated; the annotated samples are augmented and divided into training, validation, and test sets; a convolutional neural network model is built, then trained, validated, and tested to obtain the final model; and the model is used to detect crack images and produce detection results.
The patent specification with publication number CN115222652A discloses a method for recognizing, counting, and locating the centers of the end faces of bundled steel bars, comprising the following steps: S1, images of the bar end faces are captured and processed into images to be recognized; S2, a first preset algorithm applies data enhancement to the images; S3, a second preset algorithm with a lightweight convolutional neural network forms final detection boxes in the images and counts them; S4, a counting result is generated.
Disclosure of Invention
To address the shortcomings of existing steel bar detection technology, the application provides an intelligent reinforcement detection method based on a convolutional neural network and binocular vision. It uses an improved Mask R-CNN instance segmentation model to improve the accuracy of steel bar recognition and combines binocular vision technology to output visualized reinforcement detection results, assisting staff in quality inspection and acceptance of concealed reinforcement work, greatly improving work efficiency and reducing labor cost.
An intelligent reinforcement detection method based on convolutional neural network and binocular vision comprises the following steps:
s1: acquiring RGB image and depth image data of the steel bar by using a depth camera;
s2: inputting RGB images of the reinforcing steel bars into a convolutional neural network for reinforcing steel bar identification, and obtaining a prediction boundary frame and a mask of the reinforcing steel bars;
s3: based on the reinforcement recognition result, reinforcement detection is performed by utilizing a binocular vision technology, and a visual reinforcement quality acceptance result is output.
In a preferred embodiment, in step S1, depth image data of the steel bar is obtained by the following steps:
s1.1: performing stereo matching on left and right eye images of a depth camera to obtain a parallax image;
s1.2: according to the relation between depth and parallax, converting the parallax image into a depth image, wherein the depth z is calculated as:

z = f·b / d = f·b / (x_l − x_r)

where f is the focal length of the depth camera, b is the baseline length of the depth camera, d is the disparity between the left and right images, x_l is the abscissa of the projection point in the left camera, and x_r is the abscissa of the projection point in the right camera.
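The disparity-to-depth conversion above can be sketched in a few lines of NumPy. The focal length and baseline values below are illustrative assumptions, not parameters from the patent:

```python
import numpy as np

def disparity_to_depth(disparity, f, b):
    """Convert a disparity map (in pixels) to a depth map via z = f * b / d.

    f: focal length in pixels, b: stereo baseline in metres.
    Pixels with zero or negative disparity (no stereo match) get depth 0.
    """
    depth = np.zeros_like(disparity, dtype=np.float64)
    valid = disparity > 0
    depth[valid] = f * b / disparity[valid]
    return depth

# Toy 2x2 disparity map with assumed f = 600 px and b = 0.05 m
z = disparity_to_depth(np.array([[30.0, 15.0], [0.0, 60.0]]), f=600.0, b=0.05)
# first pixel: z = 600 * 0.05 / 30 = 1.0 m
```

Larger disparities map to smaller depths, and unmatched pixels are left at zero rather than producing a division error.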
In a preferred embodiment, the step S2 specifically includes the steps of:
s2.1: the method comprises the steps of collecting original pictures of reinforcing steel bars by using camera equipment, manufacturing reinforcing steel bar mask labels by using a manual labeling method, dividing a data set into a training set and a testing set, and amplifying the data set through data enhancement;
s2.2: pre-training the improved Mask R-CNN model on the public dataset COCO2017, and initializing the network parameters based on the transfer learning principle;
s2.3: training the improved Mask R-CNN model established in the step S2.2 through the data set in the step S2.1, and constructing a reinforcement instance segmentation model;
s2.4: and inputting the RGB image of the steel bar acquired by the depth camera into a steel bar example segmentation model to acquire a prediction boundary frame and a mask of the steel bar.
In step S2.1, the data enhancement comprises geometric transformations (random translation, rotation, mirroring, and affine transformation) and pixel transformations (random brightness adjustment, contrast adjustment, HSV adjustment, Gaussian noise, and salt-and-pepper noise).
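A minimal NumPy sketch of three of the listed operations, applied to a grayscale image. The function names and parameter values are illustrative, not from the patent:

```python
import numpy as np

def mirror(img):
    """Horizontal mirror (one of the geometric transformations)."""
    return img[:, ::-1]

def adjust_brightness(img, factor):
    """Scale intensities by `factor` and clip to [0, 255] (pixel transformation)."""
    return np.clip(img.astype(np.float64) * factor, 0, 255).astype(np.uint8)

def salt_pepper_noise(img, amount, rng):
    """Set a random fraction `amount` of pixels to 0 or 255 (salt-and-pepper)."""
    out = img.copy()
    noisy = rng.random(img.shape) < amount
    salt = rng.random(img.shape) < 0.5
    out[noisy & salt] = 255
    out[noisy & ~salt] = 0
    return out

# Chain the operations on one synthetic 4x4 image
rng = np.random.default_rng(0)
img = np.full((4, 4), 128, dtype=np.uint8)
aug = salt_pepper_noise(adjust_brightness(mirror(img), 1.1), 0.25, rng)
```

In practice a library such as Albumentations or torchvision would supply these transforms (including HSV adjustment and affine warps); the sketch only shows the principle of enlarging the dataset by randomized perturbations.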
Further preferably, in step S2.2, the improved Mask R-CNN model includes an optimized feature extraction module, an RPN module, an ROI alignment module, and an output branch;
the optimized feature extraction module is formed by adding a bottom-up propagation path and a CA-SA module, which combines a channel attention (CA) module and a spatial attention (SA) module, to the residual-network-based feature pyramid structure ResNet-FPN; in the CA module, a feature map of height H, width W, and C channels is passed through a global average pooling layer that compresses the spatial dimensions W and H to 1, the resulting 1×1×C feature map undergoes a convolution operation, and softmax processing makes the channel weights sum to 1; the output is then the attention weight of each channel, which is multiplied channel-wise with the input feature map to obtain the output feature map; in the SA module, the feature map undergoes 1×1 convolution and softmax processing that compresses the channel dimension to 1, and the SA module obtains a weight matrix over the H×W plane that corresponds to the spatial attention weight of each pixel and represents the importance of spatial position information; applying this weight matrix to the input feature map amplifies important features and suppresses background information, achieving feature screening and enhancement; the bottom-up propagation path specifically refers to: starting from the {P2, P3, P4, P5} feature maps obtained from ResNet-FPN, the feature information of P2 is passed to N2, N2 undergoes a 3×3 convolution that downsamples its height and width to the size of P3, the result is added element-wise to P3 and sent to a CA-SA module to obtain N3, and N4 and N5 are extracted analogously from the P4 and P5 feature maps, yielding the {N2, N3, N4, N5} feature maps;
the image passes through the optimized feature extraction module to generate feature maps; the RPN module then generates strongly prior anchor boxes for each point on the feature map and obtains each anchor box's classification score and bounding-box regression through 1×1 convolution, screening out a set of high-quality candidate boxes that are input to the ROI alignment module; the ROI alignment module transforms the feature maps generated by the optimized feature extraction module and the candidate boxes screened by the RPN module to the same dimensions to meet the input requirements of the subsequent fully convolutional network; finally, the features obtained by the ROI alignment module are fed into fully connected layers, and the classification branch, bounding-box regression branch, and mask branch respectively output the predicted class score, bounding-box regression, and pixel mask of the object, completing the whole detection and segmentation task.
Further preferably, the step S2.3 specifically includes the steps of:
s2.3.1: setting the evaluation index of the improved Mask R-CNN model to the mAP defined by the COCO2017 dataset, i.e., the average precision (AP) averaged over intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05, where AP is the area under the precision-recall curve:

AP = ∫₀¹ P(R) dR

where the precision P is the proportion of correct predictions among all predicted rebar targets, and the recall R is the proportion of actual rebar targets that are correctly predicted;
s2.3.2: feeding the rebar pictures and rebar mask labels in the training set into the improved Mask R-CNN model as input and output respectively to obtain the weight parameters of the rebar instance segmentation model; using the obtained weights to predict bounding boxes and masks for the rebar pictures in the test set, computing the loss and mAP against the ground-truth mask labels, and adjusting the weights; at the end of training, the weights corresponding to the maximum mAP are retained for subsequent rebar recognition.
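AP at a single IoU threshold can be sketched as follows. This is a simplified step integration of the precision-recall curve; COCO additionally interpolates precision at 101 recall points and averages the result over the ten IoU thresholds, which is omitted here:

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP as the area under the precision-recall curve.

    scores: confidence of each detection; is_tp: 1 if the detection matches
    a ground-truth rebar at the chosen IoU threshold, else 0; num_gt: number
    of actual rebar targets.
    """
    order = np.argsort(scores)[::-1]                 # rank by confidence
    hits = np.asarray(is_tp, dtype=float)[order]
    tp = np.cumsum(hits)
    fp = np.cumsum(1.0 - hits)
    precision = tp / (tp + fp)
    recall = tp / num_gt
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):              # sum P * delta-R steps
        ap += p * (r - prev_r)
        prev_r = r
    return ap

perfect = average_precision([0.9, 0.8], [1, 1], num_gt=2)   # both correct
half = average_precision([0.9, 0.8], [1, 0], num_gt=2)      # one false positive
```

Two correct detections covering both ground truths give AP = 1.0; a false positive in second place halves it, matching the precision/recall definitions in step S2.3.1.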
In a preferred embodiment, the step S3 specifically includes the steps of:
s3.1: based on the rebar recognition result, binarizing each mask image, extracting the pixel coordinates of each mask's edges with an edge detection algorithm, and then computing the pixel coordinates of each mask's centerline using the medial axis transform;
s3.2: calculating the normal vector of each mask centerline with a k-nearest-neighbour algorithm; masks whose centerline normal has a horizontal component smaller than its vertical component are classified as the up-down direction, and masks whose centerline normal has a horizontal component greater than or equal to its vertical component are classified as the left-right direction; the centerline coordinates are then used to automatically order the rebar masks from top to bottom and from left to right;
s3.3: extending along the normal vector of each mask centerline pixel to the two edges of the mask, extracting paired edge pixels by linear interpolation, and using them to calculate the bar diameter; similarly, extending the normal vector of each centerline pixel to the adjacent mask centerline and extracting paired centerline pixels for calculating the bar spacing;
s3.4: aligning the RGB image acquired by the depth camera with the depth image to obtain the depth of each pixel in the RGB image, and converting the paired edge and centerline pixels extracted in step S3.3 from the pixel coordinate system to the camera coordinate system using the camera intrinsic matrix;
s3.5: substituting the camera coordinates of the paired rebar edge and centerline pixels obtained in step S3.4 into the spatial distance formula, calculating the actual diameter and spacing of the bars, and outputting a visualized rebar quality acceptance result.
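The orientation classification and ordering in step S3.2 can be sketched as follows. As a simplification, the dominant direction of each centerline is taken from its coordinate spread rather than from the k-nearest-neighbour normal estimate in the text, and the function name is illustrative:

```python
import numpy as np

def classify_and_sort(centerlines):
    """centerlines: list of (N_i, 2) arrays of (row, col) centerline pixels.

    Centerlines that are wider than tall are treated as left-right bars and
    sorted top to bottom; the rest are up-down bars sorted left to right.
    """
    left_right, up_down = [], []
    for line in centerlines:
        spread = line.max(axis=0) - line.min(axis=0)   # (row span, col span)
        if spread[1] >= spread[0]:
            left_right.append(line)
        else:
            up_down.append(line)
    left_right.sort(key=lambda l: l[:, 0].mean())      # top to bottom
    up_down.sort(key=lambda l: l[:, 1].mean())         # left to right
    return left_right, up_down

lines = [np.array([[5, 0], [5, 9]]),    # horizontal bar at row 5
         np.array([[0, 0], [0, 9]]),    # horizontal bar at row 0
         np.array([[0, 3], [9, 3]])]    # vertical bar at column 3
lr, ud = classify_and_sort(lines)
```

The two horizontal centerlines come back ordered with the row-0 bar first, matching the top-to-bottom, left-to-right ordering required for spacing checks.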
Further preferably, in step S3.4, the camera intrinsic matrix is:

M = [ f_x   0    u_0 ]
    [ 0    f_y   v_0 ]
    [ 0     0     1  ]

where M is the camera intrinsic matrix, (u_0, v_0) are the pixel coordinates of the RGB image centre, d_x and d_y are the physical size of a single pixel along the x and y axes, f is the focal length of the depth camera, and f_x = f/d_x and f_y = f/d_y are the focal lengths expressed in pixels along the x and y axes of the imaging plane.
Further preferably, in step S3.5, the spatial distance formula is:

D = √[(x_2 − x_1)² + (y_2 − y_1)² + (z_2 − z_1)²]

where D is the distance between any two points in space, (x_1, y_1, z_1) are the coordinates of point 1, and (x_2, y_2, z_2) are the coordinates of point 2.
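Steps S3.4 and S3.5 combine the pinhole back-projection with the spatial distance formula. A minimal sketch, with assumed intrinsics (f_x = f_y = 600 px, principal point (320, 240)) used purely for illustration:

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, u0, v0):
    """Back-project pixel (u, v) with depth z into camera coordinates using
    the pinhole model: x = (u - u0) * z / fx, y = (v - v0) * z / fy."""
    return np.array([(u - u0) * z / fx, (v - v0) * z / fy, z])

def spatial_distance(p1, p2):
    """Euclidean distance between two 3-D points."""
    return float(np.linalg.norm(p2 - p1))

# A point on the optical axis and one 600 px to its right, both at 1 m depth
a = pixel_to_camera(320, 240, 1.0, 600, 600, 320, 240)   # -> (0, 0, 1)
b = pixel_to_camera(920, 240, 1.0, 600, 600, 320, 240)   # -> (1, 0, 1)
```

Applied to the paired edge pixels this distance gives the bar diameter, and applied to paired centerline pixels it gives the bar spacing.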
The application also provides an intelligent reinforcement detection system based on the convolutional neural network and the binocular vision, and the system can execute the intelligent reinforcement detection method based on the convolutional neural network and the binocular vision.
Compared with the prior art, the application has the beneficial effects that:
(1) The application improves Mask R-CNN by adding a bottom-up propagation path to the feature extraction module and embedding a CA-SA module that combines channel attention and spatial attention, strengthening the fusion of shallow and deep feature information; the channel attention mechanism gives larger weight coefficients to channels with high target response, while the spatial attention mechanism makes target pixels the focus of feature extraction, improving the accuracy of rebar detection and segmentation.
(2) The application combines binocular vision technology so that intelligent reinforcement detection runs in real time and outputs visualized results, assisting staff in quality inspection and acceptance of concealed reinforcement work, greatly improving work efficiency and reducing labor cost.
Drawings
FIG. 1 is a flow chart of an intelligent reinforcement detection method based on convolutional neural network and binocular vision;
fig. 2 is image data of a steel bar acquired by a depth camera according to an embodiment, wherein (a) is an RGB diagram of the steel bar and (b) is a depth diagram of the steel bar;
FIG. 3 is a schematic diagram of a Mask R-CNN network architecture;
FIG. 4 is a diagram of a CA module architecture for a channel attention mechanism of an embodiment;
FIG. 5 is a diagram of a spatial attention mechanism SA module according to one embodiment;
FIG. 6 is a bottom-up attention mechanism path block diagram of an embodiment;
fig. 7 is the output of the intelligent reinforcement detection method in the embodiment, wherein (a) is the rebar prediction result based on the improved Mask R-CNN and (b) is the visualization of rebar quality detection, intuitively showing the bar diameters and spacing.
Detailed Description
The application will be further elucidated with reference to the drawings and to specific embodiments. It is to be understood that these examples are illustrative of the present application and are not intended to limit the scope of the present application.
As shown in fig. 1, an intelligent reinforcement detection method based on convolutional neural network and binocular vision includes the steps:
s1: the RGB image and the depth image data of the steel bar are acquired by using a depth camera, and the depth image data of the steel bar is acquired specifically through the following steps:
s1.1: performing stereo matching on left and right eye images of a depth camera to obtain a parallax image;
s1.2: according to the relation between depth and parallax, converting the parallax image into a depth image, wherein the depth z is calculated as:

z = f·b / d = f·b / (x_l − x_r)

where f is the focal length of the depth camera, b is the baseline length of the depth camera, d is the disparity between the left and right images, x_l is the abscissa of the projection point in the left camera, and x_r is the abscissa of the projection point in the right camera.
By way of example, fig. 2 (a) is an RGB diagram of a rebar and fig. 2 (b) is a depth diagram of a rebar.
S2: the RGB image of the reinforcing steel bar is input into a convolutional neural network for reinforcing steel bar identification, and a prediction boundary frame and a mask of the reinforcing steel bar are obtained, which specifically comprises the following steps:
s2.1: producing the rebar dataset; original pictures of the steel bars are collected with camera equipment, rebar mask labels are produced by manual annotation, the data are randomly divided into a training set and a test set at a ratio of 7:3, and the dataset is enlarged through data enhancement. The data enhancement comprises geometric transformations (random translation, rotation, mirroring, and affine transformation) and pixel transformations (random brightness adjustment, contrast adjustment, HSV adjustment, Gaussian noise, and salt-and-pepper noise).
S2.2: pre-training on an improved Mask R-CNN model using a public dataset COCO2017, initializing network parameters based on the principles of transfer learning.
The network structure of Mask R-CNN is shown in fig. 3; the model comprises a feature extraction module, an RPN module, an ROI alignment module, and output branches. The feature extraction module is the residual-network-based feature pyramid structure ResNet-FPN, and the image passes through it to generate feature maps; the RPN module then generates strongly prior anchor boxes for each point on the feature map and obtains each anchor box's classification score and bounding-box regression through 1×1 convolution, screening out a set of high-quality candidate boxes that are input to the ROI alignment module; the ROI alignment module transforms the feature maps generated by the feature extraction module and the candidate boxes screened by the RPN module to the same dimensions to meet the input requirements of the subsequent fully convolutional network; finally, the features obtained by the ROI alignment module are fed into fully connected layers, and the classification branch, bounding-box regression branch, and mask branch respectively output the predicted class score, bounding-box regression, and pixel mask of the object, completing the whole detection and segmentation task.
In this embodiment, the feature extraction module of the Mask R-CNN network structure is optimized to form the improved Mask R-CNN model. The optimized feature extraction module adds, to the residual-network-based feature pyramid structure ResNet-FPN, a bottom-up propagation path and a CA-SA module formed by a channel attention (CA) module and a spatial attention (SA) mechanism module. In the CA module, a feature map of height H, width W and channel number C is input into a global average pooling layer so that the spatial dimensions W and H are compressed to 1; a convolution operation is then performed on the resulting 1×1×C feature map, and softmax processing makes the channel weights sum to 1. The output at this point is the attention mechanism weight of each channel, and multiplying the input feature map channel-wise by these weights gives the output feature map. In the SA module, the feature map is subjected to a 1×1 convolution and softmax processing so that the channel dimension is compressed to 1; the SA module thus learns, on the two-dimensional plane, a weight matrix of size H×W corresponding to the spatial attention weight of each pixel point, which represents the importance of spatial position information. Applying this weight matrix to the input feature map amplifies important features and weakens background information, realizing feature screening and enhancement. The structure of the bottom-up propagation path is shown in fig. 6: based on the {P2, P3, P4, P5} feature maps obtained by ResNet-FPN, the feature information of P2 is transferred to N2; N2 is subjected to a 3×3 convolution to downsample its height and width to the size of P3, then added to P3 element by element and sent to the CA-SA module to obtain N3; by analogy, N4 and N5 are further extracted on the P4 and P5 feature maps, obtaining the {N2, N3, N4, N5} feature maps.
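The channel and spatial attention computations described above can be sketched with plain numpy; this is a simplified illustration of the weighting arithmetic (the 1×1 convolutions are stood in for by learned weight vectors `w`, which are assumptions of this sketch, not the embodiment's trained layers):

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax; output sums to 1."""
    e = np.exp(x - x.max())
    return e / e.sum()

def channel_attention(fmap, w):
    """CA sketch: global average pool -> 1x1 conv (here a per-channel weight
    w) -> softmax over channels -> rescale each input channel."""
    pooled = fmap.mean(axis=(1, 2))           # (C,): W and H compressed to 1
    attn = softmax(w * pooled)                # channel weights, sum to 1
    return fmap * attn[:, None, None]

def spatial_attention(fmap, w):
    """SA sketch: 1x1 conv collapses channels -> softmax over H*W gives a
    per-pixel weight matrix applied to every channel."""
    C, H, W = fmap.shape
    squeezed = np.tensordot(w, fmap, axes=1)  # (H, W): channel dim collapsed
    attn = softmax(squeezed.ravel()).reshape(H, W)
    return fmap * attn[None, :, :]

# Toy C=3, H=4, W=4 feature map passed through CA then SA
fmap = np.arange(48.0).reshape(3, 4, 4)
out = spatial_attention(channel_attention(fmap, np.ones(3)), np.ones(3))
```

In the actual CA-SA module both attentions would be trainable convolution layers; the sketch only shows how the softmax weights rescale the feature map without changing its shape.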
S2.3: training the improved Mask R-CNN model established in the step S2.2 through the data set in the step S2.1, and constructing a reinforcement instance segmentation model, wherein the training batch size is set to be 4, the initial learning rate is set to be 0.0005, and the training round (Epoch) is set to be 50, and the method specifically comprises the following steps:
S2.3.1: setting the evaluation index of the improved Mask R-CNN model to mAP as defined for the COCO2017 data set, namely the mean of the average precisions (AP) over the different intersection-over-union thresholds (0.50:0.05:0.95), where the AP is the area under the precision-recall curve:

AP = ∫₀¹ P(R) dR
wherein P (precision) represents the proportion of all predicted rebar targets that are predicted correctly, and R (recall) represents the proportion of all actually correct rebar targets that are predicted as positive samples;
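For illustration, the AP integral above can be evaluated numerically from a set of precision-recall points; the following sketch uses the common monotone-envelope integration (the exact interpolation used by the COCO toolkit differs in detail, so this is an approximation, not the embodiment's evaluator):

```python
import numpy as np

def average_precision(recall, precision):
    """AP sketch: make precision monotonically non-increasing in recall,
    then integrate P over R (area under the precision-recall curve)."""
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([0.0], precision, [0.0]))
    # Envelope: replace each precision by the max precision at higher recall
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas where recall actually changes
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Two detections: P=1.0 at R=0.5, P=0.5 at R=1.0 -> AP = 0.75
ap = average_precision(np.array([0.5, 1.0]), np.array([1.0, 0.5]))
```

mAP then averages this AP over the IoU thresholds 0.50, 0.55, ..., 0.95.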
S2.3.2: the reinforcement pictures and reinforcement mask labels in the training set are passed into the improved Mask R-CNN model as input and output, respectively, to obtain the weight parameters of the reinforcement instance segmentation model; bounding-box and mask prediction is carried out on the reinforcement pictures in the test set with the obtained weight parameters, the loss and the mAP index are calculated against the reinforcement mask label ground truth, and the weight parameters are adjusted; when training ends, the weight parameters corresponding to the maximum mAP are retained and used for subsequent reinforcement recognition.
S2.4: the RGB image of the steel bar acquired by the depth camera is input into the steel bar example segmentation model, and the prediction boundary box and the mask of the steel bar are obtained, and the result is shown in fig. 7 (a).
S3: based on the reinforcement recognition result, the reinforcement detection is carried out by utilizing a binocular vision technology, and a visual reinforcement quality acceptance result is output, and the method specifically comprises the following steps:
S3.1: based on the reinforcement recognition result, binarizing each mask image, extracting the pixel coordinates of each mask edge with an edge detection algorithm, and further calculating the pixel coordinates of each mask center line by the medial axis transform, wherein the upper left corner of the image is the pixel coordinate origin (0, 0) and rightward and downward are the positive directions of the u axis and v axis;
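The edge-extraction part of this step can be sketched with a minimal morphological operation: the edge of a binary mask is the mask minus its erosion. This numpy sketch stands in for a full edge-detection algorithm, and the medial axis transform for the center line would in practice use a skeletonization routine (e.g. `skimage.morphology.medial_axis`, assumed available, is one option):

```python
import numpy as np

def mask_edge(mask):
    """Edge pixels of a binary mask: mask minus its 4-neighbourhood erosion.

    Returns (u, v) coordinates with the origin (0, 0) at the top-left,
    u increasing rightward and v increasing downward, as in step S3.1.
    """
    m = mask.astype(bool)
    core = m.copy()
    core[1:, :]  &= m[:-1, :]    # require the upper neighbour
    core[:-1, :] &= m[1:, :]     # require the lower neighbour
    core[:, 1:]  &= m[:, :-1]    # require the left neighbour
    core[:, :-1] &= m[:, 1:]     # require the right neighbour
    edge = m & ~core             # mask pixels that lost a neighbour
    v, u = np.nonzero(edge)
    return np.stack([u, v], axis=1)

# Toy mask: a filled 6x4 rectangle
mask = np.zeros((10, 10), dtype=np.uint8)
mask[2:8, 3:7] = 1
pts = mask_edge(mask)
```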
S3.2: calculating the normal vector of each mask center line with a k-nearest-neighbor algorithm; masks whose center line normal vector has a horizontal component smaller than its vertical component are classified as the up-down direction, and masks whose horizontal component is greater than or equal to the vertical component are classified as the left-right direction; the center line coordinates are then used to automatically order the reinforcement masks from top to bottom and from left to right: the average of the (u, v) coordinates of all pixel points in each center line is calculated, the up-down masks are sorted from top to bottom in ascending order of the average v coordinate, and the left-right masks are sorted from left to right in ascending order of the average u coordinate;
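The classification-and-ordering logic above can be sketched as follows; for brevity the per-point k-nearest-neighbor normal estimation is replaced by a single rough line direction per center line (an assumption of this sketch, not the embodiment's method):

```python
import numpy as np

def order_masks(centerlines):
    """Classify each center line by the dominant component of its normal,
    then order up-down masks top-to-bottom by mean v and left-right masks
    left-to-right by mean u. Each centerline is an (N, 2) array of (u, v)."""
    updown, leftright = [], []
    for i, cl in enumerate(centerlines):
        d = cl.max(axis=0) - cl.min(axis=0)   # rough line direction (du, dv)
        normal = np.array([-d[1], d[0]])      # perpendicular to the line
        if abs(normal[0]) < abs(normal[1]):   # horizontal comp. < vertical
            updown.append(i)
        else:
            leftright.append(i)
    updown.sort(key=lambda i: centerlines[i][:, 1].mean())     # mean v
    leftright.sort(key=lambda i: centerlines[i][:, 0].mean())  # mean u
    return updown, leftright

# Two horizontal bars (at v=8 and v=2) and one vertical bar (at u=5)
cl_a = np.array([[u, 8] for u in range(10)])
cl_b = np.array([[5, v] for v in range(10)])
cl_c = np.array([[u, 2] for u in range(10)])
updown, leftright = order_masks([cl_a, cl_b, cl_c])
```

A horizontal bar has a vertical normal, so it falls into the up-down group and is ordered by its v position, matching the convention in the text.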
S3.3: paired edge pixel points are extracted by linear interpolation along the normal vectors of the center line pixel points in each mask to calculate the steel bar diameters; specifically, 20 pixel points are selected uniformly on each center line and extended along their normal vectors to the edges on both sides, the paired edge pixel points are extracted by linear interpolation, and the diameter of each steel bar is represented by the average length of the 20 paired-point connecting lines; similarly, the normal vectors of the center line pixel points are extended to the adjacent mask center line, and the paired center line pixel points are extracted for calculating the spacing between steel bars;
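The diameter measurement of this step can be sketched by marching from sampled center line points along the normal in both directions until leaving the mask; the fixed 0.5-pixel step stands in for the sub-pixel linear interpolation described above, so the result is approximate:

```python
import numpy as np

def bar_diameter(mask, centerline, normals, n_samples=20):
    """Pixel-space diameter sketch: from n_samples center line points, step
    along +/- normal until leaving the mask; the diameter is the mean
    distance between the two exit points of each sample."""
    H, W = mask.shape
    idx = np.linspace(0, len(centerline) - 1, n_samples).astype(int)
    widths = []
    for i in idx:
        p = centerline[i].astype(float)
        n = normals[i] / np.linalg.norm(normals[i])
        ends = []
        for sign in (+1.0, -1.0):
            t = 0.0
            while True:
                q = p + sign * t * n
                u, v = int(round(q[0])), int(round(q[1]))
                if not (0 <= u < W and 0 <= v < H and mask[v, u]):
                    break                     # first point outside the mask
                t += 0.5
            ends.append(q)
        widths.append(np.linalg.norm(ends[0] - ends[1]))
    return float(np.mean(widths))

# Toy horizontal bar, 5 pixels thick (rows 3..7), with vertical normals
mask = np.zeros((12, 20), dtype=bool)
mask[3:8, :] = True
centerline = np.array([[u, 5] for u in range(2, 18)])
normals = np.array([[0.0, 1.0]] * len(centerline))
diameter_px = bar_diameter(mask, centerline, normals)
```

Bar spacing would reuse the same marching, but from one mask's center line to the adjacent mask's center line.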
s3.4: aligning the RGB image of the steel bar acquired by the depth camera with the depth image, and acquiring the depth information of each pixel point in the RGB image; converting the paired pixel points of the edge and the center line extracted in the step S3.3 from a pixel coordinate system to a camera coordinate system by using a camera internal reference matrix; the specific formula of the camera internal reference matrix is as follows:
M = [ f_x  0  u_0 ; 0  f_y  v_0 ; 0  0  1 ], with f_x = f/d_x, f_y = f/d_y

wherein M represents the camera internal reference (intrinsic) matrix, (u_0, v_0) is the coordinate of the center point of the RGB image in the pixel coordinate system, d_x and d_y are the physical lengths of a single pixel along the x axis and y axis, f is the focal length of the depth camera, and f_x and f_y are the numbers of pixels per focal length f along the x axis and y axis of the imaging plane;
S3.5: substituting the camera coordinates of the paired edge and center line pixel points of the steel bars obtained in step S3.4 into the spatial distance formula, calculating the actual diameter and spacing of the steel bars, and outputting a visualized steel bar quality acceptance result, as shown in fig. 7(b); the spatial distance formula is specifically:
D = √((x_2 − x_1)² + (y_2 − y_1)² + (z_2 − z_1)²)

wherein D is the distance between any two points in space, x_1, y_1, z_1 are the x, y, z coordinates of point 1, and x_2, y_2, z_2 are the x, y, z coordinates of point 2.
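Steps S3.4 and S3.5 together amount to back-projecting each pixel with its depth into camera coordinates and taking the Euclidean distance between paired points. A minimal sketch, with purely illustrative intrinsic parameters (fx, fy, u0, v0 are assumptions, not calibrated values):

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, u0, v0):
    """Back-project pixel (u, v) with depth z into camera coordinates using
    the intrinsics: x = (u - u0) * z / fx, y = (v - v0) * z / fy."""
    return np.array([(u - u0) * z / fx, (v - v0) * z / fy, z])

def spatial_distance(p1, p2):
    """Euclidean distance D between two 3-D camera-coordinate points."""
    return float(np.linalg.norm(np.asarray(p2, float) - np.asarray(p1, float)))

# Hypothetical intrinsics and a pair of edge pixels across a bar at depth 1 m
fx = fy = 600.0
u0, v0 = 320.0, 240.0
p1 = pixel_to_camera(310.0, 240.0, 1.0, fx, fy, u0, v0)
p2 = pixel_to_camera(330.0, 240.0, 1.0, fx, fy, u0, v0)
diameter_m = spatial_distance(p1, p2)   # 20 px at 1 m with fx=600
```

With these assumed numbers the measured diameter is 20 · 1.0 / 600 ≈ 0.0333 m; the same computation applied to paired center line points yields the bar spacing.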
Further, it is to be understood that various changes and modifications of the present application may be made by those skilled in the art after reading the above description of the application, and that such equivalents are intended to fall within the scope of the application as defined in the appended claims.
Claims (9)
1. An intelligent reinforcement detection method based on convolutional neural network and binocular vision is characterized by comprising the following steps:
s1: acquiring RGB image and depth image data of the steel bar by using a depth camera;
s2: inputting RGB images of the reinforcing steel bars into a convolutional neural network for reinforcing steel bar identification, and obtaining a prediction boundary frame and a mask of the reinforcing steel bars;
s3: based on the reinforcement recognition result, reinforcement detection is performed by utilizing a binocular vision technology, and a visual reinforcement quality acceptance result is output.
2. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 1, wherein in step S1, depth image data of the reinforcement is obtained through the following steps:
s1.1: performing stereo matching on left and right eye images of a depth camera to obtain a parallax image;
s1.2: according to the relation between depth and parallax, converting the parallax image into a depth image, wherein the calculation formula of the depth z is as follows:
z = f·b/d, with d = x_l − x_r

wherein f is the focal length of the depth camera, b is the baseline length of the depth camera, d is the parallax of the left and right eye images, x_l is the abscissa of the projection point of the left-eye camera, and x_r is the abscissa of the projection point of the right-eye camera.
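By way of illustration, the depth relation of claim 2 is a one-line computation; the numeric values below are hypothetical examples, not calibration data from the disclosure:

```python
def depth_from_disparity(f, b, x_l, x_r):
    """Depth from the stereo relation z = f * b / d, with parallax
    d = x_l - x_r (pixel abscissas of the left and right projections)."""
    d = x_l - x_r
    if d <= 0:
        raise ValueError("parallax must be positive for a finite depth")
    return f * b / d

# Hypothetical: focal length 700 px, baseline 0.06 m, parallax 35 px
z = depth_from_disparity(700.0, 0.06, 400.0, 365.0)  # -> 1.2 m
```

Applying this per pixel of the disparity map produced by stereo matching (step S1.1) yields the depth image of step S1.2.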
3. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 1, wherein the step S2 specifically comprises the steps of:
s2.1: the method comprises the steps of collecting original pictures of reinforcing steel bars by using camera equipment, manufacturing reinforcing steel bar mask labels by using a manual labeling method, dividing a data set into a training set and a testing set, and amplifying the data set through data enhancement;
s2.2: pre-training on an improved Mask R-CNN model by using a public data set COCO2017, and initializing network parameters based on a migration learning principle;
s2.3: training the improved Mask R-CNN model established in the step S2.2 through the data set in the step S2.1, and constructing a reinforcement instance segmentation model;
s2.4: and inputting the RGB image of the steel bar acquired by the depth camera into a steel bar example segmentation model to acquire a prediction boundary frame and a mask of the steel bar.
4. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 3, wherein in step S2.2, the improved Mask R-CNN model comprises an optimized feature extraction module, an RPN module, an ROI alignment module and an output branch;
the optimized feature extraction module adds, to the residual-network-based feature pyramid structure ResNet-FPN, a bottom-up propagation path and a CA-SA module formed by a channel attention (CA) module and a spatial attention (SA) mechanism module; in the CA module, a feature map of height H, width W and channel number C is input into a global average pooling layer so that the spatial dimensions W and H are compressed to 1, a convolution operation is then performed on the resulting 1×1×C feature map, and softmax processing makes the channel weights sum to 1; the output at this point is the attention mechanism weight of each channel, and multiplying the input feature map channel-wise by these weights gives the output feature map; in the SA module, the feature map is subjected to a 1×1 convolution and softmax processing so that the channel dimension is compressed to 1, and the SA module acquires, on the two-dimensional plane, a weight matrix of size H×W corresponding to the spatial attention weight of each pixel point, which represents the importance of spatial position information; applying this weight matrix to the input feature map amplifies important features and weakens background information, realizing feature screening and enhancement; the bottom-up propagation path specifically refers to: based on the {P2, P3, P4, P5} feature maps obtained by ResNet-FPN, the feature information of P2 is transferred to N2, N2 is subjected to a 3×3 convolution to downsample its height and width to the size of P3, then added to P3 element by element and sent to the CA-SA module to obtain N3, and N4 and N5 are further extracted on the P4 and P5 feature maps, obtaining the {N2, N3, N4, N5} feature maps;
the image passes through the optimized feature extraction module to generate a feature map; the RPN module generates strong-prior anchor boxes for each point on the feature map and obtains the classification score and bounding-box regression quantity of each anchor box through a 1×1 convolution, thereby screening out a group of better candidate boxes that are input into the ROI alignment module; the ROI alignment module transforms the feature map generated by the optimized feature extraction module and the candidate boxes screened by the RPN module to the same dimension to meet the requirement of the subsequent fully convolutional network on the input features; finally, the features obtained by the ROI alignment module are input into a fully connected layer, and the prediction category score, bounding-box regression quantity and pixel mask of the object are respectively output at the classification branch, the bounding-box regression branch and the mask branch, completing the whole detection and segmentation task.
5. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 3, wherein the step S2.3 specifically comprises the steps of:
S2.3.1: setting the evaluation index of the improved Mask R-CNN model to mAP as defined for the COCO2017 data set, namely the mean of the average precisions (AP) over the different intersection-over-union thresholds (0.50:0.05:0.95), where the AP is the area under the precision-recall curve:

AP = ∫₀¹ P(R) dR
wherein P (precision) represents the proportion of all predicted rebar targets that are predicted correctly, and R (recall) represents the proportion of all actually correct rebar targets that are predicted as positive samples;
S2.3.2: the reinforcement pictures and reinforcement mask labels in the training set are passed into the improved Mask R-CNN model as input and output, respectively, to obtain the weight parameters of the reinforcement instance segmentation model; bounding-box and mask prediction is carried out on the reinforcement pictures in the test set with the obtained weight parameters, the loss and the mAP index are calculated against the reinforcement mask label ground truth, and the weight parameters are adjusted; when training ends, the weight parameters corresponding to the maximum mAP are retained and used for subsequent reinforcement recognition.
6. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 1, wherein the step S3 specifically comprises the steps of:
S3.1: based on the reinforcement recognition result, binarizing each mask image, extracting the pixel coordinates of each mask edge with an edge detection algorithm, and further calculating the pixel coordinates of each mask center line by the medial axis transform;
S3.2: calculating the normal vector of each mask center line with a k-nearest-neighbor algorithm; masks whose center line normal vector has a horizontal component smaller than its vertical component are classified as the up-down direction, and masks whose horizontal component is greater than or equal to the vertical component are classified as the left-right direction; the center line coordinates are then used to automatically order the reinforcement masks from top to bottom and from left to right;
S3.3: extending the normal vectors of the center line pixel points in each mask to the edges on both sides, extracting the paired edge pixel points by linear interpolation, and calculating the steel bar diameters; similarly, extending the normal vectors of the center line pixel points to the adjacent mask center line, and extracting the paired center line pixel points for calculating the spacing between steel bars;
s3.4: aligning the RGB image of the steel bar acquired by the depth camera with the depth image, and acquiring the depth information of each pixel point in the RGB image; converting the paired pixel points of the edge and the center line extracted in the step S3.3 from a pixel coordinate system to a camera coordinate system by using a camera internal reference matrix;
s3.5: substituting the camera coordinates of the paired pixel points of the edge and the central line of the steel bar obtained in the step S3.4 into a space distance formula, calculating the actual diameter and the actual distance of the steel bar, and outputting a visualized steel bar quality acceptance result.
7. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 6, wherein in step S3.4, the specific formula of the camera internal reference matrix is:
M = [ f_x  0  u_0 ; 0  f_y  v_0 ; 0  0  1 ], with f_x = f/d_x, f_y = f/d_y

wherein M represents the camera internal reference (intrinsic) matrix, (u_0, v_0) is the coordinate of the center point of the RGB image in the pixel coordinate system, d_x and d_y are the physical lengths of a single pixel along the x axis and y axis, f is the focal length of the depth camera, and f_x and f_y are the numbers of pixels per focal length f along the x axis and y axis of the imaging plane.
8. The intelligent reinforcement detection method based on convolutional neural network and binocular vision according to claim 6, wherein in step S3.5, the spatial distance formula is specifically:
D = √((x_2 − x_1)² + (y_2 − y_1)² + (z_2 − z_1)²)

wherein D is the distance between any two points in space, x_1, y_1, z_1 are the x, y, z coordinates of point 1, and x_2, y_2, z_2 are the x, y, z coordinates of point 2.
9. An intelligent reinforcement detection system based on convolutional neural network and binocular vision, which is characterized in that the system can execute the intelligent reinforcement detection method based on convolutional neural network and binocular vision as set forth in any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310578442.3A CN116703835A (en) | 2023-05-22 | 2023-05-22 | Intelligent reinforcement detection method and system based on convolutional neural network and binocular vision |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116703835A true CN116703835A (en) | 2023-09-05 |
Family
ID=87840174
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||