CN112212828A - Locator gradient measuring method based on binocular vision - Google Patents

Locator gradient measuring method based on binocular vision

Info

Publication number
CN112212828A
CN112212828A CN201910625502.6A
Authority
CN
China
Prior art keywords
camera
matrix
network
neural network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910625502.6A
Other languages
Chinese (zh)
Inventor
Wang Ruifeng (王瑞锋)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Tangyuan Electric Co Ltd
Original Assignee
Chengdu Tangyuan Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Tangyuan Electric Co Ltd filed Critical Chengdu Tangyuan Electric Co Ltd
Priority to CN201910625502.6A priority Critical patent/CN112212828A/en
Publication of CN112212828A publication Critical patent/CN112212828A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01C MEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C 9/00 Measuring inclination, e.g. by clinometers, by levels
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/85 Stereo camera calibration

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a locator gradient measuring method based on binocular vision, relating to the technical field of contact network (catenary) detection. The method comprises an image acquisition step, in which images are acquired with a binocular vision camera; a positioning tube and pantograph key point extraction step, in which a deep convolutional neural network (CNN) target positioning algorithm extracts key points from the images captured by the two cameras at the same moment; a feature extraction and neural network matching step, in which a neural network performs automatic feature matching, taking the CNN output of the previous step as the input of a matching network structured as a 5-layer fully-connected network; a three-dimensional coordinate calculation step; and a locator gradient calculation step. The method photographs the same locator with two cameras, integrates deep learning, and measures the locator gradient according to the binocular vision principle; the measured gradient has a precision error of ±0.1°, and the method operates in both daytime and nighttime scenes.

Description

Locator gradient measuring method based on binocular vision
Technical Field
The invention relates to the technical field of contact network detection, in particular to a locator gradient measuring method based on binocular vision.
Background
The locator is the key functional unit of the contact network positioning device, and it plays a crucial role in whether a train can accelerate safely and run at high speed. If the locator is installed improperly, it may collide with the pantograph slide plate when the pantograph of an electric locomotive passes, with serious consequences. The gradient of the contact network locator is therefore an important parameter of the pantograph-catenary relationship.
Conventional manual inspection is labor-intensive, time-consuming, and dangerous, so monitoring the locator gradient with machine vision is of great significance. With an image acquisition device installed on an inspection vehicle, the locator gradient can be calculated in real time from the machine vision principle while the train is running.
At present, when machine vision is used to calculate the locator gradient, single-camera approaches do not achieve sufficient calculation accuracy, while two-camera approaches suffer from insufficient image feature matching accuracy.
In view of this, the invention photographs the same locator with two cameras, combines this with deep learning, and calculates the locator gradient according to the binocular vision principle.
Disclosure of Invention
In order to overcome the defects and shortcomings of the prior art, the invention provides a locator gradient measuring method based on binocular vision, aiming to solve the problems of low calculation accuracy and low matching accuracy in the prior art.
In order to solve the problems in the prior art, the invention is realized by the following technical scheme:
a locator gradient measuring method based on binocular vision is characterized in that: the method comprises the following steps:
an image acquisition step: the method comprises the following steps of (1) acquiring images by using binocular vision cameras, wherein the two cameras are parallelly arranged on a straight line, and shooting images of a positioner and a pantograph;
extracting key points of the positioning tube and the pantograph: respectively extracting key points of images collected by two cameras of the binocular vision camera at the same moment, and extracting the key points of the images of the binocular vision camera at the same moment by adopting a depth Convolution Neural Network (CNN) target positioning algorithm;
the deep convolutional neural network (CNN) target positioning algorithm comprises convolutional layers and sampling layers, wherein a convolutional layer performs a dot-product operation between a convolution kernel and the upper-layer output, the convolution operation being
y = f(W · x + B)
where y is the convolution output of the current-layer network, x is the output of the upper-layer network, W is the weight matrix, B is the bias vector whose initial value is a random value between 0 and 1, and f is the activation function, generally a Sigmoid function of the form
Sigmoid(x) = 1 / (1 + e^(-x));
the sampling-layer calculation means that, as the sampling kernel slides over an image region, the maximum value of the points within the sliding window is taken;
directly regressing, at multiple positions of the input image, the target frame and the target category at each position; specifically comprising the following steps:
a training stage step: collecting samples, each sample being annotated with the target category and the target's center position in the image; dividing the image into 7 × 7 grid cells and converting the center of the target's ground-truth rectangular frame in the full image into the corresponding 7 × 7 grid-cell position: assuming x is the center coordinate and w is the picture width, the grid index of the target is index_x = x / (w / 7), the offset in the x direction is the remainder of x / (w / 7), and the y direction is calculated in the same way; sending the prepared training samples into the neural network for training and saving the trained weight file;
an identification stage step: loading the weight file and configuration file of the neural network, sending the input image into the neural network model for calculation, dividing the image into 7 × 7 grid cells, taking for each grid cell 2 candidate rectangular frames with fixed aspect ratios, and judging the object category and confidence within the 2 rectangular frames; if the threshold is met, the rectangular frames are taken as positioning targets, and if two targets are of the same category the rectangular frames are merged; the position and category information of the rectangular frames is then annotated;
a feature extraction and neural network matching step: automatic feature matching is realized with a neural network; the input of the matching network is the output of the CNN in the previous step, and the network structure is a 5-layer fully-connected network whose input is the concatenated features of the two camera images, where fc1 = 512 indicates 512 neurons and fc5 is the output layer with 2 neurons indicating matching success or failure; the network is trained with backpropagation (BP) and stochastic gradient descent;
a three-dimensional coordinate calculation step: calculating the essential matrix E and recovering the camera matrix P;
calculating the essential matrix E: the essential matrix E is the special form taken by the fundamental matrix under normalized image coordinates, and satisfies
x2ᵀ · E · x1 = 0
where x1 and x2 are the coordinates of a pair of matching points in the left and right cameras respectively; E is computed with epipolar geometry and the random sample consensus (RANSAC) algorithm;
recovering the camera matrix P, denoted P = K[R | t], where K is the camera intrinsic parameter matrix, R is the rotation matrix between the two cameras, and t is the translation between the two cameras; the invention assumes that the rotation matrix of the left camera P1 is the identity matrix I and that its t is 0; meanwhile, E = [t]× · R, where [t]× is the skew-symmetric matrix of t, so t and R can be obtained from the SVD of E; the matrix P2 of the right camera then follows;
the three-dimensional coordinates are then calculated according to the projection relations
x1 = P1 · Pw, x2 = P2 · Pw
from which the world coordinates of a space point Pw = P(x, y, z) can be solved by linear triangulation, where f, t and r are the internal and external parameters of the camera, obtained through calibration together with P1 and P2, and X, Y are the coordinates of the matching points of the left and right cameras in image coordinates;
a locator gradient calculation step: the world coordinates P1(x1, y1, z1) and P2(x2, y2, z2) form one straight line in space, the key points P3 and P4 of the pantograph plane form another straight line, and the included angle between the two straight lines is the locator gradient; the included angle between two spatial straight lines is obtained with the cosine formula between vectors.
Furthermore, the convolutional network model has 17 layers in total, the last two of which are fully-connected layers.
Further, the sampling kernel is a 3 × 3 sliding window.
Compared with the prior art, the beneficial technical effects brought by the invention are as follows:
1. The method photographs the same locator with two cameras, integrates deep learning, and measures the locator gradient according to the binocular vision principle; the measured gradient has a precision error of ±0.1°, and the method operates in both daytime and nighttime scenes.
2. The invention performs the calculation on binocular vision images: two cameras are mounted in parallel on a straight line and simultaneously photograph images of the locator and the pantograph. Considering the running-speed requirement, full-image matching would be time-consuming; since the method calculates only the locator gradient, only the positioning tube and the pantograph need to be matched, so key points of the positioning tube and the pantograph are extracted first, with a CNN target positioning algorithm as the extraction algorithm.
Drawings
FIG. 1 is a flow chart of the binocular vision-based locator gradient measurement of the present invention;
FIG. 2 is a diagram of a Sigmoid function of the present invention;
FIG. 3 is a diagram of a convolution network model of the present invention;
fig. 4 is a network structure diagram of the feature extraction and matching neural network of the present invention.
Detailed Description
The technical scheme of the invention is further elaborated below with reference to the drawings of the specification.
Example 1
Referring to fig. 1-4 of the specification, this embodiment discloses:
a locator gradient measuring method based on binocular vision is characterized in that: the method comprises the following steps:
an image acquisition step: the method comprises the following steps of (1) acquiring images by using binocular vision cameras, wherein the two cameras are parallelly arranged on a straight line, and shooting images of a positioner and a pantograph;
extracting key points of the positioning tube and the pantograph: respectively extracting key points of images collected by two cameras of the binocular vision camera at the same moment, and extracting the key points of the images of the binocular vision camera at the same moment by adopting a depth Convolution Neural Network (CNN) target positioning algorithm;
the deep convolutional neural network (CNN) target positioning algorithm comprises convolutional layers and sampling layers, wherein a convolutional layer performs a dot-product operation between a convolution kernel and the upper-layer output, the convolution operation being
y = f(W · x + B)
where y is the convolution output of the current-layer network, x is the output of the upper-layer network, W is the weight matrix, B is the bias vector whose initial value is a random value between 0 and 1, and f is the activation function, generally a Sigmoid function of the form
Sigmoid(x) = 1 / (1 + e^(-x));
the sampling-layer calculation means that, as the sampling kernel slides over an image region, the maximum value of the points within the sliding window is taken;
directly regressing, at multiple positions of the input image, the target frame and the target category at each position; specifically comprising the following steps:
a training stage step: collecting samples, each sample being annotated with the target category and the target's center position in the image; dividing the image into 7 × 7 grid cells and converting the center of the target's ground-truth rectangular frame in the full image into the corresponding 7 × 7 grid-cell position: assuming x is the center coordinate and w is the picture width, the grid index of the target is index_x = x / (w / 7), the offset in the x direction is the remainder of x / (w / 7), and the y direction is calculated in the same way; sending the prepared training samples into the neural network for training and saving the trained weight file;
an identification stage step: loading the weight file and configuration file of the neural network, sending the input image into the neural network model for calculation, dividing the image into 7 × 7 grid cells, taking for each grid cell 2 candidate rectangular frames with fixed aspect ratios, and judging the object category and confidence within the 2 rectangular frames; if the threshold is met, the rectangular frames are taken as positioning targets, and if two targets are of the same category the rectangular frames are merged; the position and category information of the rectangular frames is then annotated;
a feature extraction and neural network matching step: automatic feature matching is realized with a neural network; the input of the matching network is the output of the CNN in the previous step, and the network structure is a 5-layer fully-connected network whose input is the concatenated features of the two camera images, where fc1 = 512 indicates 512 neurons and fc5 is the output layer with 2 neurons indicating matching success or failure; the network is trained with backpropagation (BP) and stochastic gradient descent;
a three-dimensional coordinate calculation step: calculating the essential matrix E and recovering the camera matrix P;
calculating the essential matrix E: the essential matrix E is the special form taken by the fundamental matrix under normalized image coordinates, and satisfies
x2ᵀ · E · x1 = 0
where x1 and x2 are the coordinates of a pair of matching points in the left and right cameras respectively; E is computed with epipolar geometry and the random sample consensus (RANSAC) algorithm;
recovering the camera matrix P, denoted P = K[R | t], where K is the camera intrinsic parameter matrix, R is the rotation matrix between the two cameras, and t is the translation between the two cameras; the invention assumes that the rotation matrix of the left camera P1 is the identity matrix I and that its t is 0; meanwhile, E = [t]× · R, where [t]× is the skew-symmetric matrix of t, so t and R can be obtained from the SVD of E; the matrix P2 of the right camera then follows;
the three-dimensional coordinates are then calculated according to the projection relations
x1 = P1 · Pw, x2 = P2 · Pw
from which the world coordinates of a space point Pw = P(x, y, z) can be solved by linear triangulation, where f, t and r are the internal and external parameters of the camera, obtained through calibration together with P1 and P2, and X, Y are the coordinates of the matching points of the left and right cameras in image coordinates;
a locator gradient calculation step: the world coordinates P1(x1, y1, z1) and P2(x2, y2, z2) form one straight line in space, the key points P3 and P4 of the pantograph plane form another straight line, and the included angle between the two straight lines is the locator gradient; the included angle between two spatial straight lines is obtained with the cosine formula between vectors.
Example 2
Referring to figs. 1-4 of the specification, this embodiment, as another preferred embodiment of the invention, discloses:
as shown in fig. 1, left and right camera images: the invention adopts binocular vision images for calculation, two cameras are parallelly arranged on a straight line, and the images of the positioner and the pantograph are shot simultaneously.
Extraction of key points of the positioning tube and the pantograph: considering the running-speed requirement, full-image matching would be time-consuming; since the method calculates only the locator gradient, only the positioning tube and the pantograph need to be matched. Key points of the positioning tube and the pantograph are therefore extracted first, with a CNN target positioning algorithm as the extraction algorithm.
(1) Convolutional Neural Network (CNN) architecture
Convolutional neural networks are an efficient recognition method developed in recent years that has attracted extensive attention. In the 1960s, Hubel and Wiesel, studying neurons responsible for local sensitivity and direction selection in the cat's cerebral cortex, discovered that a unique network structure can effectively reduce the complexity of feedback neural networks; the convolutional neural network (CNN) was proposed on this basis. A convolutional neural network is generally formed by alternating convolutional layers and sampling layers: the convolutional layers extract image features, and the sampling layers keep local features stable. The CNN is used as the neural network in the invention.
(2) Convolutional neural network operations
CNN generally consists of convolutional layers and sampling layers. A convolutional layer performs a dot-product operation between a convolution kernel and the upper-layer output, the convolution operation being
y = f(W · x + B)
where y is the convolution output of the current-layer network, x is the output of the upper-layer network, W is the weight matrix, B is the bias vector whose initial value is a random value between 0 and 1, and f is the activation function, generally a Sigmoid function of the form
Sigmoid(x) = 1 / (1 + e^(-x)).
The sampling-layer calculation means that, as the sampling kernel slides over the image region, the maximum value of the points within the sliding window is taken; the sampling kernel is a 3 × 3 sliding window.
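For illustration, the following NumPy sketch implements the two operations just described: a single-channel convolution y = f(W · x + B) with a Sigmoid activation, followed by 3 × 3 max-pooling. The kernel values, stride and array names are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def sigmoid(x):
    # Sigmoid(x) = 1 / (1 + e^(-x)), the activation f used above
    return 1.0 / (1.0 + np.exp(-x))

def conv2d(image, W, B):
    # Dot product of the kernel W with each image patch, plus bias B, then Sigmoid
    kh, kw = W.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(W * image[i:i + kh, j:j + kw]) + B
    return sigmoid(out)

def max_pool(feature, k=3):
    # Slide a k x k sampling kernel and keep the maximum value in each window
    oh, ow = feature.shape[0] - k + 1, feature.shape[1] - k + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = feature[i:i + k, j:j + k].max()
    return out

image = np.random.rand(32, 32)   # placeholder single-channel image
W = np.random.rand(3, 3)         # weight matrix W (assumed 3 x 3 kernel)
B = np.random.rand()             # bias B, random initial value between 0 and 1
features = max_pool(conv2d(image, W, B))
```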
(3) Convolutional neural network model
The convolutional network model is key to the invention. The invention adopts the network model shown in fig. 3, with 17 layers in total, the last two of which are fully-connected layers.
(4) Part target regression
The method of the invention uses the idea of regression: given an input image, the target frame and target category at each position are regressed directly at multiple positions of the image. This is the innovation point of the invention.
A training stage step:
collecting massive samples, wherein each sample needs to mark the type of a target and the central position of the target in an image; dividing the image into 7 × 7 grids, converting the center of the target in the real rectangular frame of the full image into the center of the grid position corresponding to 7 × 7: assuming that x is the center and w is the picture width, the target has the several grid numbers index _ x = x/(w/7), the offset in the x direction is the remainder of x/(w/7), and the y direction is calculated identically; thirdly, the prepared training sample is sent into a neural network for training; fourthly, storing the trained weight file.
The identification stage comprises the following steps:
firstly, loading a weight file and a configuration file of a neural network; secondly, the input image is sent into a neural network model for calculation; dividing the image into 7 × 7 grids; for each grid, randomly taking two rectangular frames with fixed length-width ratios; judging the object type and confidence in 2 rectangular frames, if meeting a threshold, considering two rectangular frames as positioning targets, if two targets are in the same type, combining the rectangular frames, and ranging the position and type information of the rectangular frames.
Feature extraction and matching neural network: the invention adopts a novel automatic feature extraction and automatic feature matching algorithm, realizing automatic feature matching with a neural network. The input of the matching network is the output of the CNN of the previous step, and the network structure is a 5-layer fully-connected network, as shown in fig. 4, where the input is the concatenated features of the two camera images and fc1 = 512 indicates 512 neurons. fc5 is the output layer indicating matching success or failure, with 2 neurons. The network is trained with backpropagation (BP) and stochastic gradient descent.
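A sketch of such a matching network in PyTorch; fc1 = 512, the 2-neuron fc5 output and the BP/stochastic-gradient-descent training follow the text, while the input dimension, the widths of fc2 to fc4 and the ReLU activations are assumptions:

```python
import torch
import torch.nn as nn

class MatchNet(nn.Module):
    # 5-layer fully-connected matching network: input is the concatenated
    # features of the two camera images, output is match / no-match (2 neurons)
    def __init__(self, in_dim=1024):               # in_dim is an assumed feature size
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),     # fc1 = 512, as in the text
            nn.Linear(512, 256), nn.ReLU(),        # fc2-fc4 widths are assumptions
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, 2),                      # fc5: 2 neurons, match or not
        )

    def forward(self, feat_left, feat_right):
        return self.net(torch.cat([feat_left, feat_right], dim=1))

model = MatchNet()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # BP + stochastic gradient descent
loss_fn = nn.CrossEntropyLoss()
```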
Three-dimensional coordinate calculation: comprising the calculation of the essential matrix E and the calculation of the recovered camera matrix P.
(1) Calculation of the essential matrix E
The essential matrix is the special form taken by the fundamental matrix under normalized image coordinates, and satisfies the following formula:
x2ᵀ · E · x1 = 0
where x1 and x2 are the coordinates of a pair of matching points in the left and right cameras respectively. E is computed with epipolar geometry and the random sample consensus (RANSAC) algorithm.
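This computation is available off the shelf; a sketch with OpenCV, where the matched points and the intrinsic matrix K are placeholder assumptions:

```python
import cv2
import numpy as np

# pts_left, pts_right: N x 2 arrays of matched key-point coordinates (placeholders)
pts_left = np.random.rand(20, 2) * [640.0, 480.0]
pts_right = pts_left + [5.0, 0.0]
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])  # assumed intrinsics

# essential matrix from epipolar geometry, with RANSAC rejecting bad matches
E, inlier_mask = cv2.findEssentialMat(pts_left, pts_right, K,
                                      method=cv2.RANSAC, prob=0.999, threshold=1.0)
```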
(2) Calculation of the recovered camera matrix P
P = K[R | t], where K is the camera intrinsic parameter matrix, R is the rotation matrix between the two cameras, and t is the translation between the two cameras. The invention assumes that the rotation matrix of the left camera P1 is the identity matrix I and that its t is 0; meanwhile, E = [t]× · R, where [t]× is the skew-symmetric matrix of t, so t and R can be obtained from the SVD of E. The matrix P2 of the right camera then follows.
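Continuing the OpenCV sketch above, R and t can be recovered from E (OpenCV performs the SVD-based decomposition and chirality check internally), and the two camera matrices assembled as described:

```python
# decompose E into R and t; OpenCV does the SVD and chirality check internally
_, R, t, _ = cv2.recoverPose(E, pts_left, pts_right, K)

# left camera: rotation = identity I, translation = 0; right camera: K [R | t]
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
```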
(3) Three-dimensional coordinate calculation: according to the projection relations
x1 = P1 · Pw, x2 = P2 · Pw
the world coordinates of a space point Pw = P(x, y, z) can be solved by linear triangulation, where f, t and r are the internal and external parameters of the camera, obtained through calibration together with P1 and P2, and X, Y are the coordinates of the matching points of the left and right cameras in image coordinates.
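With P1 and P2 in hand, the world coordinates follow by linear triangulation; continuing the same sketch:

```python
# linear triangulation: image points go in as 2 x N arrays, one per camera
pts4d = cv2.triangulatePoints(P1, P2, pts_left.T, pts_right.T)
pts3d = (pts4d[:3] / pts4d[3]).T   # divide by the homogeneous w to get (x, y, z)
```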
Locator gradient calculation: the key points of the positioning tube, with world coordinates P1(x1, y1, z1) and P2(x2, y2, z2), form one straight line in space; the key points P3 and P4 of the pantograph plane form another straight line, and the included angle between the two straight lines is the locator gradient. The included angle between two straight lines in space is obtained with the cosine formula between vectors.
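The angle computation reduces to the cosine formula between the two direction vectors; a NumPy sketch with hypothetical key-point coordinates:

```python
import numpy as np

def locator_gradient(p1, p2, p3, p4):
    # angle in degrees between the line p1-p2 and the line p3-p4 in space
    v1 = np.asarray(p2, dtype=float) - np.asarray(p1, dtype=float)
    v2 = np.asarray(p4, dtype=float) - np.asarray(p3, dtype=float)
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# hypothetical positioning-tube key points (p1, p2) and pantograph-plane points (p3, p4)
print(locator_gradient((0, 0, 5.2), (1.2, 0, 5.05), (0, 0, 5.0), (1.2, 0, 5.0)))
```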

Claims (3)

1. A locator gradient measuring method based on binocular vision, characterized by comprising the following steps:
an image acquisition step: acquiring images with a binocular vision camera, the two cameras being mounted in parallel on a straight line and simultaneously photographing images of the locator and the pantograph;
a positioning tube and pantograph key point extraction step: extracting key points from the images collected by the two cameras of the binocular vision camera at the same moment, using a deep convolutional neural network (CNN) target positioning algorithm;
the deep convolutional neural network (CNN) target positioning algorithm comprises convolutional layers and sampling layers, wherein a convolutional layer performs a dot-product operation between a convolution kernel and the upper-layer output, the convolution operation being
y = f(W · x + B)
where y is the convolution output of the current-layer network, x is the output of the upper-layer network, W is the weight matrix, B is the bias vector whose initial value is a random value between 0 and 1, and f is the activation function, generally a Sigmoid function of the form
Sigmoid(x) = 1 / (1 + e^(-x));
the sampling-layer calculation means that, as the sampling kernel slides over an image region, the maximum value of the points within the sliding window is taken;
directly regressing, at multiple positions of the input image, the target frame and the target category at each position; specifically comprising the following steps:
a training stage step: collecting samples, each sample being annotated with the target category and the target's center position in the image; dividing the image into 7 × 7 grid cells and converting the center of the target's ground-truth rectangular frame in the full image into the corresponding 7 × 7 grid-cell position: assuming x is the center coordinate and w is the picture width, the grid index of the target is index_x = x / (w / 7), the offset in the x direction is the remainder of x / (w / 7), and the y direction is calculated in the same way; sending the prepared training samples into the neural network for training and saving the trained weight file;
an identification stage step: loading the weight file and configuration file of the neural network, sending the input image into the neural network model for calculation, dividing the image into 7 × 7 grid cells, taking for each grid cell 2 candidate rectangular frames with fixed aspect ratios, and judging the object category and confidence within the 2 rectangular frames; if the threshold is met, the rectangular frames are taken as positioning targets, and if two targets are of the same category the rectangular frames are merged; the position and category information of the rectangular frames is then annotated;
a feature extraction and neural network matching step: automatic feature matching is realized with a neural network; the input of the matching network is the output of the CNN in the previous step, and the network structure is a 5-layer fully-connected network whose input is the concatenated features of the two camera images, where fc1 = 512 indicates 512 neurons and fc5 is the output layer with 2 neurons indicating matching success or failure; the network is trained with backpropagation (BP) and stochastic gradient descent;
a three-dimensional coordinate calculation step: calculating the essential matrix E and recovering the camera matrix P;
calculating the essential matrix E: the essential matrix E is the special form taken by the fundamental matrix under normalized image coordinates, and satisfies
x2ᵀ · E · x1 = 0
where x1 and x2 are the coordinates of a pair of matching points in the left and right cameras respectively; E is computed with epipolar geometry and the random sample consensus (RANSAC) algorithm;
recovering the camera matrix P, denoted P = K[R | t], where K is the camera intrinsic parameter matrix, R is the rotation matrix between the two cameras, and t is the translation between the two cameras; the invention assumes that the rotation matrix of the left camera P1 is the identity matrix I and that its t is 0; meanwhile, E = [t]× · R, where [t]× is the skew-symmetric matrix of t, so t and R can be obtained from the SVD of E; the matrix P2 of the right camera then follows;
the three-dimensional coordinates are then calculated according to the projection relations
x1 = P1 · Pw, x2 = P2 · Pw
from which the world coordinates of a space point Pw = P(x, y, z) can be solved by linear triangulation, where f, t and r are the internal and external parameters of the camera, obtained through calibration together with P1 and P2, and X, Y are the coordinates of the matching points of the left and right cameras in image coordinates;
a locator gradient calculation step: the world coordinates P1(x1, y1, z1) and P2(x2, y2, z2) form one straight line in space, the key points P3 and P4 of the pantograph plane form another straight line, and the included angle between the two straight lines is the locator gradient; the included angle between two spatial straight lines is obtained with the cosine formula between vectors.
2. The binocular vision-based locator gradient measuring method of claim 1, wherein: the convolutional network model has 17 layers in total, the last two of which are fully-connected layers.
3. The binocular vision-based locator gradient measuring method of claim 1 or 2, wherein: the sampling kernel is a 3 × 3 sliding window.
CN201910625502.6A 2019-07-11 2019-07-11 Locator gradient measuring method based on binocular vision Pending CN112212828A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910625502.6A CN112212828A (en) 2019-07-11 2019-07-11 Locator gradient measuring method based on binocular vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910625502.6A CN112212828A (en) 2019-07-11 2019-07-11 Locator gradient measuring method based on binocular vision

Publications (1)

Publication Number Publication Date
CN112212828A true CN112212828A (en) 2021-01-12

Family

ID=74048117

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910625502.6A Pending CN112212828A (en) 2019-07-11 2019-07-11 Locator gradient measuring method based on binocular vision

Country Status (1)

Country Link
CN (1) CN112212828A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112945244A (en) * 2021-02-03 2021-06-11 西华大学 Rapid navigation system and navigation method suitable for complex overpass


Similar Documents

Publication Publication Date Title
CN104978580B (en) A kind of insulator recognition methods for unmanned plane inspection transmission line of electricity
CN107680133A (en) A kind of mobile robot visual SLAM methods based on improvement closed loop detection algorithm
CN107481315A (en) A kind of monocular vision three-dimensional environment method for reconstructing based on Harris SIFT BRIEF algorithms
CN109241913A (en) In conjunction with the ship detection method and system of conspicuousness detection and deep learning
CN107392964A (en) The indoor SLAM methods combined based on indoor characteristic point and structure lines
CN109859266B (en) Pre-transformation-based visual simultaneous positioning and drawing method under large visual angle change
CN105631899B (en) A kind of ultrasound image motion target tracking method based on gray scale textural characteristics
CN104794737B (en) A kind of depth information Auxiliary Particle Filter tracking
CN105740899A (en) Machine vision image characteristic point detection and matching combination optimization method
US11688139B1 (en) System for estimating a three dimensional pose of one or more persons in a scene
CN112419317B (en) Visual loop detection method based on self-coding network
CN111998862B (en) BNN-based dense binocular SLAM method
CN104517289A (en) Indoor scene positioning method based on hybrid camera
WO2022095514A1 (en) Image detection method and apparatus, electronic device, and storage medium
CN112045676A (en) Method for grabbing transparent object by robot based on deep learning
CN111914615A (en) Fire-fighting area passability analysis system based on stereoscopic vision
CN112288776A (en) Target tracking method based on multi-time step pyramid codec
CN116152928A (en) Drowning prevention early warning method and system based on lightweight human body posture estimation model
CN115909396A (en) Dynamic target tracking method for foot type robot
CN115100744A (en) Badminton game human body posture estimation and ball path tracking method
CN112212828A (en) Locator gradient measuring method based on binocular vision
CN113888629A (en) RGBD camera-based rapid object three-dimensional pose estimation method
CN114529583A (en) Power equipment tracking method and tracking system based on residual regression network
CN111950524B (en) Orchard local sparse mapping method and system based on binocular vision and RTK
CN110059658B (en) Remote sensing satellite image multi-temporal change detection method based on three-dimensional convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210112