CN107813310A - Multi-gesture robot control method based on binocular vision - Google Patents

Multi-gesture robot control method based on binocular vision

Info

Publication number
CN107813310A
Authority
CN
China
Prior art keywords
gesture
image
camera
target
rectangular frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711176221.4A
Other languages
Chinese (zh)
Other versions
CN107813310B (en)
Inventor
卫作龙
夏晗
林伟阳
于兴虎
佟明斯
李湛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yu Xinghu
Original Assignee
Zhejiang Youmai De Intelligent Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Youmai De Intelligent Equipment Co Ltd filed Critical Zhejiang Youmai De Intelligent Equipment Co Ltd
Priority to CN201711176221.4A priority Critical patent/CN107813310B/en
Publication of CN107813310A publication Critical patent/CN107813310A/en
Application granted granted Critical
Publication of CN107813310B publication Critical patent/CN107813310B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • B25J13/08Controls for manipulators by means of sensing devices, e.g. viewing or touching devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of robot control methods, and in particular to a multi-gesture robot control method based on binocular vision. The invention aims to solve the defects that existing vision-based robot control methods are inconvenient to operate, that hand recognition is strongly affected by illumination and background color, and that the off-line teaching method is computationally intensive and places very high precision requirements on the robot model and on the determination of the coordinate system. The proposed binocular-vision-based multi-gesture robot control method comprises: setting up a binocular camera; manually selecting a rectangular frame containing the gesture; training classifiers with a training sample set; detecting the target with the classifiers; tracking the target and fusing the tracking and detection results; calculating the offset distance of the target center point from the initial point to the target point and outputting a speed control command so that the robot performs a translational motion; and extracting feature points in the target frame and solving the rotation matrix corresponding to the feature points. The present invention is applicable to the control of spray-painting robots.

Description

Multi-gesture robot control method based on binocular vision
Technical Field
The invention relates to a robot control method, in particular to a binocular vision-based multi-gesture robot control method.
Background
In mainstream industrial-robot applications, an operator uses a teach pendant to manually control the joint motion of the robot, moves it to a preset position, records that position and transmits it to the robot controller; the robot can then automatically repeat the task according to the recorded instructions.
At present, two methods are mainly used for teaching industrial robots: manual teaching and off-line teaching. In manual teaching, a human guides the robot's end effector, or a mechanical simulation device, or uses a teach pendant, to take the robot through the expected action; the programming is realized by a real-time on-line teaching program, and the robot operates from memory, so it can reproduce the motion repeatedly. The off-line teaching method builds a model, performs simulation programming on a computer, plans the trajectory and generates the motion track automatically.
The teach pendant is still the most widely used tool in the field of industrial robots, but this control mode is inefficient and not intuitive. Existing vision-based robot control methods offer only a single control mode; when fine adjustment is needed, position and posture commands interfere with each other, and the control space is not easily unified with the robot workspace, which makes operation inconvenient [1]. Bare-hand recognition is mainly based on color-space segmentation, which is strongly affected by illumination and background color. The off-line teaching method involves a large amount of computation and complex algorithms, handles irregular edges poorly, and places high precision requirements on the robot model and on the determination of the robot coordinate system.
Disclosure of Invention
The invention provides a binocular-vision-based multi-gesture robot control method, aiming at solving the defects that the existing vision-based robot control method is inconvenient to operate, that hand recognition is greatly influenced by illumination and background color, that the off-line teaching method requires a large amount of computation, and that the robot model and the robot coordinate system must be determined with high precision. The binocular-vision-based multi-gesture robot control method comprises the following steps:
step one, setting a binocular camera, and calibrating and correcting.
And step two, performing gesture demonstration in the visual field range of the binocular camera by an operator, manually selecting a rectangular frame containing a gesture in a video shot by a left video camera of the binocular camera, and adding the rectangular frame into the training sample set.
And step three, training the nearest neighbor classifier and the Bayes classifier by using the training sample set.
Step four, the operator shows the gesture from step two in the field of view of the binocular camera; the processor detects the target in the left-camera image using a cascaded variance classifier, a Bayes classifier based on a random forest and a nearest-neighbor classifier; the target is then tracked, the tracking result and the detection result are fused, and the samples in the gesture template are updated; after tracking succeeds in the left-camera image, detection and tracking are performed on the epipolar line of the right-camera image, and if tracking succeeds in both the left and right views, a target rectangular frame is output.
Step five, tracking the central point of the target rectangular frame; and calculating the offset distance of the central point from the initial point to the target point, and outputting a speed control instruction to enable the robot to perform translational motion.
And step six, extracting characteristic points for describing the gesture outline in the target rectangular frame, and solving a rotation matrix corresponding to the characteristic points.
Preferably, the step one specifically includes:
the method comprises the steps of setting the distance between a left camera and a right camera in a binocular camera to be 20cm and placing the left camera and the right camera horizontally.
And step two, calibrating the binocular camera by using a Zhang Zhengyou calibration method, and eliminating distortion and line alignment of the views of the left camera and the right camera so as to enable the imaging origin coordinates of the views of the left camera and the right camera to be consistent, the optical axes to be parallel, the imaging planes to be coplanar and the polar lines to be aligned.
Preferably, the second step specifically comprises:
and step two, performing gesture demonstration by an operator in the visual field range of the binocular camera.
And step two, manually selecting a rectangular frame containing the gesture from the video shot by the left camera of the binocular camera.
Secondly, scaling, rotating and affine processing are carried out on the image blocks in the selected rectangular frame, and the scaled, rotated and affine images are normalized into image blocks with the same size to form a positive sample set; selecting a preset number of image blocks which are more than a preset threshold value away from the selected image blocks in the original image to form a negative sample set; the positive sample set and the negative sample set together constitute a training sample set.
Preferably, step three specifically includes: calculating the posterior probability of the foreground class of the Bayesian classifier by the following formula:
p(y_1 | x_i) = p(x_i | y_1) · p(y_1) / p(x_i)
where y_1 denotes the foreground label: y_1 = 0 indicates that there is no target in the image and y_1 = 1 indicates that the image contains the target; x_i denotes the i-th feature of the image; each feature of the image is the grey-value order relation of two randomly selected points in the image, represented by 0 or 1.
Preferably, in step three:
the number of Bayesian classifiers is 10; if, for feature x_i, the number of corresponding foreground samples is #p, the number of corresponding background samples is #n and the total number of samples is #m, then
p(y_1 = 1 | x_i) = #p / (#p + #n)
p(y_1 | x_i) is solved for each Bayesian classifier and the results are averaged; if the average value is larger than a preset threshold, the target is considered to be present in the image.
Preferably, in step three, the nearest neighbor classifier is used to calculate the similarity between two image blocks, and the calculation formula is:
S(P_1, P_2) = (1/n) Σ_x (P_1(x) − μ_1)(P_2(x) − μ_2) / (σ_1 σ_2)
where μ_1, μ_2 and σ_1, σ_2 are the means and standard deviations of the image blocks P1 and P2; the more similar the two images, the closer the result is to 1. The distance between the two images is defined as d(P_1, P_2) = 1 − S(P_1, P_2); the image patch is considered to contain the object when the distance between the two images is smaller than a predetermined threshold.
Preferably, the step four is specifically:
and step four, the operator appears in the visual field of the binocular camera according to the gesture in the step two.
And step two, manually selecting an initial rectangular frame by an operator.
And step two, the processor generates a sliding rectangular frame, filters the rectangular frame which does not meet the variance threshold condition by using a cascade variance classifier, then obtains the image block which possibly contains the foreground by screening through a Bayesian classifier, and calculates the similarity between the sliding rectangular frame and the manually selected initial rectangular frame through a nearest neighbor classifier.
And step three, selecting the rectangular frame with the highest overlapping degree as a sample rectangular frame, and calculating the Shi-Tomasi corner points in the sample rectangular frame as the feature points.
And fourthly, calculating a forward prediction error, a backward prediction error and the similarity in the sample rectangular frame, and screening out the feature points which are smaller than the average value of the forward prediction error and the backward prediction error and are larger than a preset similarity threshold.
And step four, calculating the average displacement of the feature points screened from the current frame and the corresponding feature points of the previous frame to obtain the position of the target frame of the current frame, and obtaining the size of the target frame in the current frame according to the ratio of the Euclidean distance of the feature points in the previous frame and the current frame.
And step four, carrying out normalization processing on the target frame obtained in the step four, calculating the similarity between the normalized target frame and all images in the positive sample set, if one similarity is greater than a specified threshold value, effectively tracking, adding the obtained target frame into the sample set, and if not, considering that the tracking is invalid and discarding.
Preferably, step five specifically includes:
Step 5.1, taking the center point of the target frame obtained in step 4.6 as the gesture center point, and calculating the spatial coordinate value of the gesture center by the parallax ranging method of the stereoscopic-vision principle, specifically:
Z = f·d / (u_1 − u_2), X = (u_1 − u_0)·Z / f, Y = (v_1 − v_0)·Z / f
where X, Y, Z are the coordinates of the gesture center point in space, u_1 is the x coordinate of the tracked marker point in the left-camera image coordinate system, u_0 is the x origin of the left-camera image coordinate system, u_2 is the x coordinate of the marker point in the right-camera image coordinate system, d is the translation distance (baseline) between the two cameras, v_1 is the y coordinate of the marker point in the left-camera image coordinate system, v_0 is the y origin of the left-camera image coordinate system, and f is the camera focal length.
Step 5.2, the operator sets an origin at an arbitrary position by means of the calibration button; when the processor detects that the gesture center point has left the sphere whose radius is the preset control threshold, a speed control command is output, calculated as:
V=kd
where V is the output speed control command, k is the control coefficient, and d is the distance of the gesture center from the initial position; the speed control command is used to make the robot end effector perform a translational motion.
Preferably, the sixth step specifically includes:
Step 6.1, obtaining the contour of the gesture in the target frame obtained in step four by a method combining skin-color detection with a background difference method, and obtaining 5 feature points, namely the fingertips of the index finger, the middle finger and the ring finger, the valley between the index finger and the middle finger, and the valley between the middle finger and the ring finger, through convex hull detection and convexity-defect detection; the spatial coordinates of the 5 feature points are obtained with the formula of step 5.1.
Step 6.2, defining a coordinate system on the palm: the root of the middle finger is the origin, the direction pointing to the fingertip of the middle finger is the positive y direction, the line parallel to the line connecting the two valleys is the x axis, and the direction pointing to the little finger is the positive x direction.
Step 6.3, solving the rotation matrix from the 5 feature points according to the Cayley theorem.
Step 6.4, converting the rotation matrix into pitch-yaw-roll Euler angles, obtaining the relative rotation angle between the current gesture and the original state, and outputting an Euler angular-velocity command according to the relative rotation angle to control the posture change of the robot.
Preferably, step 6.3 specifically comprises:
Step 6.3.1, any rotation matrix R that does not have −1 as an eigenvalue and an antisymmetric matrix S_b satisfy the following relations:
R = (I − S_b)^(−1) (I + S_b)
S_b = (R + I)^(−1) (R − I)
where I is the identity matrix and b = (b_1, b_2, b_3)^T is the Cayley vector, with b_1, b_2, b_3 its first, second and third components; S_b is the antisymmetric matrix corresponding to b:
S_b = [ [0, −b_3, b_2], [b_3, 0, −b_1], [−b_2, b_1, 0] ]
Step 6.3.2, let p_i be the spatial coordinate value of feature point i and q_i the coordinate value of feature point i in the palm coordinate system. The rotation matrix equation to be solved is:
p_i = R·q_i
An identity transformation of this formula gives:
S_{u_i}·b = −v_i
where:
v_i = p_i − q_i
u_i = p_i + q_i
and S_{u_i} is the antisymmetric matrix corresponding to u_i.
Stacking these equations for the five feature points gives:
A·b = c
where A = [S_{u_1}; S_{u_2}; …; S_{u_5}] and c = [−v_1; −v_2; …; −v_5].
Step 6.3.3, solving the equation A·b = c yields the Cayley vector b, from which the rotation matrix R is calculated.
The invention has the following beneficial effects: 1. the position and posture commands do not interfere with each other, and the control space and the robot workspace are easy to unify, so operation is simple and convenient; 2. hand recognition is based on feature points and a rotation matrix and is only slightly affected by illumination; 3. the off-line teaching part has a small computational load and an uncomplicated algorithm; 4. the requirements on the accuracy of the robot model and on the determination of the robot coordinate system are not high.
Drawings
FIG. 1 is a schematic diagram of a gesture robot control apparatus according to the present invention;
FIG. 2 is a schematic diagram of a control gesture, wherein FIG. 2 (a) is a schematic diagram of a gesture in a gesture control mode; FIG. 2 (b) is a schematic diagram of a gesture in a position control mode;
fig. 3 is a flowchart of the binocular vision based multi-gesture robot control method of the present invention.
Detailed Description
The binocular-vision-based multi-gesture robot control method is realized with the device shown in fig. 1, where 101 is an upper computer comprising a processor that performs the computation and controls the robot; 102 is a painting robot; 103 is the left camera of the binocular camera and 104 is the right camera; 105 is the operator's hand. A pair of cameras arranged in parallel serves as the gesture detection unit, and the control signals computed by the computer are sent to the robot. The operator only needs to ensure that the hand appears in the field of view of both cameras.
To prevent the coupling of the position control and the attitude control, two gestures, one of which is the control of the position and one of which is the control of the attitude, can be recorded and learned in advance. The present invention defines that the posture control is performed when the palm is open, as shown in fig. 2 (a); the position control is performed when the thumb, the index finger and the middle finger are pinched together, as shown in fig. 2 (b).
Fig. 3 is a binocular vision-based multi-gesture robot control method, which specifically includes:
step one, setting a binocular camera, and calibrating and correcting.
And step two, performing gesture demonstration in the visual field range of the binocular camera by an operator, manually selecting a rectangular frame containing a gesture in a video shot by a left video camera of the binocular camera, and adding the rectangular frame into the training sample set.
And step three, training the nearest neighbor classifier and the Bayes classifier by using the training sample set.
Step four, the operator appears in the visual field of the binocular camera according to the gesture in the step two; the processor utilizes a cascade variance classifier according to the image of the left camera, and a Bayes classifier based on a random forest and a nearest neighbor classifier are used for detecting to obtain a target; and tracking the target, fusing a tracking result and a detection result, updating samples in the gesture template, detecting and tracking on the epipolar line of the right camera image after the tracking is successful in the left camera image, and outputting a target rectangular frame if the tracking is successful in the left and right views.
Step five, tracking the central point of the target rectangular frame; and calculating the offset distance of the central point from the initial point to the target point, and outputting a speed control instruction to enable the robot to perform translational motion.
And step six, extracting characteristic points for describing the gesture outline in the target rectangular frame, and solving a rotation matrix corresponding to the characteristic points.
The first step may specifically be: the distance between the left camera and the right camera of the binocular camera is set to 20 cm, and the two cameras are placed as parallel as possible. The Zhang Zhengyou calibration method is used to obtain the intrinsic and extrinsic parameters of the cameras; stereo rectification is then performed, eliminating distortion and aligning the rows of the left and right views, so that the imaging origin coordinates of the two views are consistent, the optical axes of the two cameras are parallel, the left and right imaging planes are coplanar, and the epipolar lines are row-aligned.
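For illustration, the following is a minimal OpenCV sketch of this calibration and rectification step. It assumes chessboard corner detections objpoints, imgpoints_l, imgpoints_r and the image size are already available; these names and the fixed-intrinsic flag are assumptions, not details taken from the patent.

```python
import cv2
import numpy as np

def calibrate_and_rectify(objpoints, imgpoints_l, imgpoints_r, image_size):
    """Calibrate each camera from chessboard views, then stereo-calibrate and
    rectify so that epipolar lines become image rows (a minimal sketch)."""
    # Intrinsics of each camera from the planar chessboard target.
    _, K1, D1, _, _ = cv2.calibrateCamera(objpoints, imgpoints_l, image_size, None, None)
    _, K2, D2, _, _ = cv2.calibrateCamera(objpoints, imgpoints_r, image_size, None, None)

    # Extrinsics (rotation R and translation T) between the two cameras.
    _, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
        objpoints, imgpoints_l, imgpoints_r, K1, D1, K2, D2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # Rectification transforms: after remapping, the two image planes are
    # coplanar and the epipolar lines are row-aligned.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)
    map_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    map_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)
    return map_l, map_r, Q   # apply per frame with cv2.remap(frame, *map_l, cv2.INTER_LINEAR)
```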
Steps two and three establish the sample set and perform training, which may specifically be: the training process trains the Bayes classifier and the nearest-neighbor classifier. The image block selected with the mouse in the initial frame is scaled, rotated and affine-warped, and the results are normalized into image blocks of the same size to form the positive sample set; a number of image blocks far away from the selected block are chosen as the negative sample set, and this sample set is used to train the nearest-neighbor classifier. From the sample set, positive and negative samples of 2-bit BP features are extracted to train the Bayes classifier and obtain the Bayesian posterior probability formula. In the tracking mode of the next step, the sample set is updated on line and the two classifiers are trained iteratively.
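A rough sketch of the positive-sample generation is given below; the warp ranges, the patch size and the helper name make_positive_samples are illustrative assumptions rather than values taken from the patent.

```python
import cv2
import numpy as np

def make_positive_samples(frame_gray, box, n_warps=20, out_size=(15, 15), seed=0):
    """Scale, rotate and shift the selected gesture patch a number of times and
    normalise every warp to a fixed-size block, forming the positive sample set."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_warps):
        angle = rng.uniform(-10, 10)              # small in-plane rotation (degrees)
        scale = rng.uniform(0.95, 1.05)           # small scale change
        M = cv2.getRotationMatrix2D((cx, cy), angle, scale)
        M[:, 2] += rng.uniform(-2, 2, size=2)     # small translation (affine part)
        warped = cv2.warpAffine(frame_gray, M, (frame_gray.shape[1], frame_gray.shape[0]))
        patch = warped[y:y + h, x:x + w]
        samples.append(cv2.resize(patch, out_size))
    return samples   # negative samples would be patches taken far from `box`
```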
The device detects gestures with a cascade of classifiers comprising a variance classifier, a Bayesian classifier based on a random forest and a nearest-neighbor classifier. The variance classifier computes the variance of the image inside each sliding rectangle to be examined; because the variance of the tracked target region is generally larger than that of the background, the variance filter discards most of the scanning rectangles. The random forest contains 10 Bayes classifiers. The feature used by the Bayes classifier is the 2-bit BP feature, i.e., the grey-value order relation of any two points, taking only the values 0 and 1. The class to which the image belongs is denoted y_i (i = 1, 2); the detection problem can then be regarded as a classification problem with only two classes, foreground and background: y_1 = 0 indicates that there is no target in the image and y_1 = 1 indicates that the image contains the target. x_i (i = 1, 2, 3, …, 2^13) denotes the feature set of the image, namely the 2-bit BP features described above. The posterior probability of the foreground class given by the Bayesian classifier is p(y_1 = 1 | x_i), estimated as follows.
Let the number of foreground samples corresponding to x_i in the sample set be #p, the number of background samples be #n, and the total number of samples be #m. Then
p(y_1 = 1 | x_i) = #p / (#p + #n)
The random forest as a whole thus yields 10 posterior probabilities; these are averaged, and if the average is greater than the threshold, the image patch is considered to contain the foreground object.
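A minimal sketch of such a random forest of pixel-comparison Bayes classifiers follows. The class and function names (FernBayes, forest_posterior), the patch size and the acceptance threshold are assumptions for illustration; the count-based posterior #p/(#p+#n), the 13 comparisons giving 2^13 feature values, and the averaging over 10 classifiers follow the description above.

```python
import numpy as np

class FernBayes:
    """One Bayes classifier over random pixel comparisons (2-bit BP style):
    each feature is the grey-value order of two randomly chosen pixels."""
    def __init__(self, n_comparisons=13, patch_size=(15, 15), rng=None):
        rng = rng or np.random.default_rng()
        n_pix = patch_size[0] * patch_size[1]
        self.pairs = rng.integers(0, n_pix, size=(n_comparisons, 2))
        n_codes = 2 ** n_comparisons
        self.pos = np.zeros(n_codes)   # #p: foreground samples per feature code
        self.neg = np.zeros(n_codes)   # #n: background samples per feature code

    def code(self, patch):
        flat = patch.reshape(-1)
        bits = flat[self.pairs[:, 0]] > flat[self.pairs[:, 1]]
        return int(sum(int(b) << k for k, b in enumerate(bits)))

    def train(self, patch, is_target):
        c = self.code(patch)
        (self.pos if is_target else self.neg)[c] += 1

    def posterior(self, patch):
        c = self.code(patch)
        denom = self.pos[c] + self.neg[c]
        return self.pos[c] / denom if denom > 0 else 0.0   # p = #p / (#p + #n)

def forest_posterior(forest, patch):
    """Average the posteriors of the classifiers; compare the mean to a threshold."""
    return float(np.mean([f.posterior(patch) for f in forest]))

# forest = [FernBayes() for _ in range(10)]
# detected = forest_posterior(forest, patch) > 0.6   # threshold value is an assumption
```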
The nearest-neighbor classifier has two functions: it matches the similarity of each image block to the on-line model in turn using the NCC algorithm, and it updates the positive sample space of the on-line model. To compare two image blocks P_1 and P_2, their similarity is characterized with the NCC algorithm:
S(P_1, P_2) = (1/n) Σ_x (P_1(x) − μ_1)(P_2(x) − μ_2) / (σ_1 σ_2)
where μ_1, μ_2 and σ_1, σ_2 are the means and standard deviations of the images P1 and P2. The more similar the two images, the closer the result is to 1. The distance between the two images is defined as:
d(P_1, P_2) = 1 − S(P_1, P_2)
if the distance is less than the threshold, the image slice is considered to contain the object.
In the left-camera video, a rectangular frame is drawn around the gesture with the mouse. Training then yields the threshold of the variance classifier and the parameters of the Bayesian classifier, and the templates of the nearest-neighbor classifier are saved. During this step the operator must keep the gesture unchanged, but may moderately move and rotate the hand through multiple angles to simulate the rotations that may occur during control. After learning is completed, the operator clicks Save, and the device stores templates at multiple scales and under multiple transformations.
The fourth step is the tracking step. Its aim is first to identify the rectangular frame containing the hand in the picture shot by the camera, and then to follow the motion track of the hand. Specifically: the operator shows one of the gestures of fig. 2 in the camera's field of view, and the device detects automatically, using the trained cascaded variance classifier, the random-forest-based Bayes classifier and the nearest-neighbor classifier, to obtain the target (i.e., the process of identifying the rectangular frame containing the hand). After the target is detected, a tracking-and-detection loop is entered (i.e., the process of following the hand's motion track):
and tracking the Shi-Tomasi corner points in the target frame by using a pyramid LK optical flow method, and removing a part of poorly tracked points by using a forward error and a backward error and an NCC algorithm in order to optimize a tracking effect. The tracking process is as follows:
1. and according to the initially selected rectangular frame, performing template matching in the other camera by using an NCC algorithm to obtain an initial rectangular frame in the other camera.
2. And entering a tracking loop, finding out the rectangular frame with the highest overlapping degree of the target tracking frame from the generated sliding rectangular frames as an optimal tracking sample, and then calculating the Shi-Tomasi corner point in the rectangular frame as a characteristic point.
3. Forward and backward errors and matching similarity, and selecting points which meet the conditions (the points meet the characteristic points which are smaller than the given average forward and backward error and larger than the specified matching similarity), and filtering about half of characteristic points after finishing.
4. And predicting the size and the position of the target frame in the current frame by using the residual characteristic points, obtaining the position of the target frame in the current frame according to the average translation of the characteristic points successfully tracked and the characteristic points corresponding to the previous frame, and obtaining the size of the target frame in the current frame according to the ratio of the Euclidean distances corresponding to the characteristic points in the front and back two frames of images. Invalid if the position exceeds the image.
5. And calculating the similarity of the normalized image slice to the online model, and if the similarity is greater than a specified threshold value, finally considering that the tracking is effective at the time and storing the similarity into a positive sample set. Otherwise it is considered invalid and discarded.
6. And returning to the step 2.
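The sketch below illustrates one iteration of steps 2-4 with OpenCV's pyramidal LK flow and a forward-backward error filter. The corner-detector settings, the minimum point count and the function name track_box are assumptions, and the NCC-similarity filter of step 3 is omitted for brevity.

```python
import cv2
import numpy as np

def track_box(prev_gray, curr_gray, box):
    """One tracking iteration: Shi-Tomasi corners inside the previous box are
    pushed through forward and backward pyramidal LK flow, points with an
    above-average forward-backward error are dropped, and the box is moved and
    rescaled by the surviving points. Returns the new (x, y, w, h) or None."""
    x, y, w, h = box
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.01,
                                  minDistance=3, mask=mask)
    if pts is None:
        return None
    fwd, st_f, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    bwd, st_b, _ = cv2.calcOpticalFlowPyrLK(curr_gray, prev_gray, fwd, None)
    fb_err = np.linalg.norm((bwd - pts).reshape(-1, 2), axis=1)
    keep = (st_f.ravel() == 1) & (st_b.ravel() == 1) & (fb_err < fb_err.mean())
    p0, p1 = pts.reshape(-1, 2)[keep], fwd.reshape(-1, 2)[keep]
    if len(p0) < 4:
        return None                      # too few reliable points
    dx, dy = np.mean(p1 - p0, axis=0)    # average translation of the points
    d0 = np.linalg.norm(p0[None] - p0[:, None], axis=2)
    d1 = np.linalg.norm(p1[None] - p1[:, None], axis=2)
    scale = np.mean(d1[d0 > 0] / d0[d0 > 0])   # ratio of point-pair distances
    nw, nh = w * scale, h * scale
    cx, cy = x + w / 2 + dx, y + h / 2 + dy
    new_box = (int(cx - nw / 2), int(cy - nh / 2), int(nw), int(nh))
    H, W = curr_gray.shape[:2]
    if not (0 <= new_box[0] < W and 0 <= new_box[1] < H):
        return None                      # box left the image: invalid
    return new_box
```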
Tracking is computed in the left and right cameras in parallel, starting from the target position in the initial frame, and the detection loop using the classifiers runs in parallel with the tracking loop. In the image from the left camera the whole frame is searched for detection, the tracking and detection results are fused into the final result, and the samples in the gesture template are updated. After tracking and detection succeed in the left-camera image, detection is performed only along the corresponding epipolar line of the right-camera image and fused with the right camera's tracking result, which reduces the computational load of the detection algorithm. If tracking succeeds in the left and right views at the same time, the target rectangular box is output.
Step five, determining the space coordinates of the end position and the space coordinates of the initial position of the gesture recognized in the step four, and then determining how the robot should move, specifically:
the operator appears in the field of view of the camera according to the gesture in the right side of fig. 2 (b), and the device is pointed to for detection and tracking according to the method in the fourth step. The center of the rectangular frame that is always tracked can be regarded as the control point.
And calculating the space coordinate value of the gesture center by utilizing a stereoscopic vision principle parallax ranging method. The formula is as follows:
Z = f·d / (u_1 − u_2), X = (u_1 − u_0)·Z / f, Y = (v_1 − v_0)·Z / f
where X, Y, Z are the coordinates of the gesture center in space, u_1 is the x coordinate of the tracked marker point in the left-camera image coordinate system, u_0 is the x origin of the left-camera image coordinate system, u_2 is the x coordinate of the marker point in the right-camera image coordinate system, d is the translation distance (baseline) between the two cameras, v_1 is the y coordinate of the marker point in the left-camera image coordinate system, v_0 is the y origin of the left-camera image coordinate system, and f is the camera focal length.
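A direct transcription of this parallax-ranging relation, assuming rectified images and a focal length expressed in pixels (the function name is illustrative):

```python
def gesture_center_3d(u1, v1, u2, u0, v0, f, d):
    """Depth from disparity for a rectified stereo pair: Z = f*d/(u1 - u2),
    then back-projection of the left-image pixel to X and Y."""
    disparity = u1 - u2          # horizontal pixel offset between the two views
    Z = f * d / disparity
    X = (u1 - u0) * Z / f
    Y = (v1 - v0) * Z / f
    return X, Y, Z
```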
When the calibration button is clicked with the mouse, the current point is taken as the origin. When the control point leaves the sphere whose radius is the control threshold, a speed command is output whose magnitude is proportional to the offset distance; the calculation formula is
V=kd
where V is the output speed command, k is the control coefficient, and d is the distance by which the gesture center deviates from the initial position. The command drives the robot end effector in a translational motion.
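A minimal sketch of this dead-zone-plus-proportional rule; the gain, the sphere radius and the choice of directing the velocity along the offset vector are assumptions.

```python
import numpy as np

def velocity_command(center, origin, k=0.5, radius=0.03):
    """Return a 3-D velocity command: zero inside the calibration sphere,
    proportional to the offset (V = k * d) once the gesture centre leaves it."""
    offset = np.asarray(center, dtype=float) - np.asarray(origin, dtype=float)
    dist = float(np.linalg.norm(offset))
    if dist <= radius:
        return np.zeros(3)               # inside the dead zone: no motion
    return k * offset                    # magnitude k*d, directed along the offset
```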
Step six determines the initial posture and the final posture of the gesture recognized in step four, and from these determines how the robot should adjust its posture. The pose is represented by a rotation matrix or by Euler angles. Step six is specifically as follows:
the operator appears in the field of view of the camera according to the gesture in the left side of the figure 2, and the device performs detection and tracking according to the method in the step 3. And obtaining the outline of the gesture in the target rectangular frame by using a background modeling method, and obtaining the connecting dents of the index finger, the middle finger, the ring finger and the corresponding fingers by using convex hull detection and convex hull defect detection algorithms to obtain five feature points in total. And obtaining the space coordinates of the five characteristic points according to the positioning method in the step 4.
For hand segmentation, skin-color detection is combined with a background difference method. Skin-color detection converts the color space from RGB to HSV to obtain a better segmentation; to cope with the influence of illumination on skin color, a background difference method based on a Gaussian mixture model is added to obtain a more complete segmentation. After the binary image of the hand is obtained, noise is filtered out with the morphological opening operation and contour-area filtering. The Graham scan is then used to find the convex points of the hand contour, and the three highest points give the positions of the index, middle and ring fingertips. The points between adjacent convex points that are farthest from them, i.e., the valleys where the fingers join, are then computed, giving five feature points in total. The spatial coordinates of the five feature points are obtained with the positioning method of step 4.
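A rough OpenCV sketch of this segmentation and feature-point extraction is shown below. The HSV skin range, the morphology kernel, the "two deepest defects" rule and the assumption that fgmask comes from a Gaussian-mixture background subtractor (e.g., cv2.createBackgroundSubtractorMOG2) are illustrative choices, not values from the patent.

```python
import cv2
import numpy as np

def hand_feature_points(frame_bgr, fgmask):
    """Segment the hand with an HSV skin mask ANDed with a background-subtraction
    mask, then take finger tips and valleys from the convex hull and its defects.
    Returns up to five (x, y) points: three tips followed by two valleys."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, (0, 30, 60), (25, 180, 255))      # rough skin range
    mask = cv2.bitwise_and(skin, fgmask)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)      # remove small noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    hand = max(contours, key=cv2.contourArea)                  # largest blob = hand
    hull_pts = cv2.convexHull(hand)                            # convex points of the contour
    tips = sorted({tuple(p[0]) for p in hull_pts}, key=lambda p: p[1])[:3]  # 3 highest points
    hull_idx = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull_idx)
    valleys = []
    if defects is not None:
        # Keep the two deepest defects: the valleys between adjacent fingers.
        for s, e, f, depth in sorted(defects[:, 0], key=lambda d: -d[3])[:2]:
            valleys.append(tuple(hand[f][0]))
    return list(tips) + valleys
```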
A coordinate system is defined on the palm for the gesture on the left of fig. 2: the root of the middle finger is the origin, the direction pointing to the fingertip of the middle finger is the positive y direction, the line parallel to the line connecting the two valleys is the x axis, and the direction pointing to the little finger is the positive x direction.
The rotation matrix is solved according to the Cayley theorem. The Cayley-vector representation of a rotation matrix is as follows: any rotation matrix R that does not have −1 as an eigenvalue and an antisymmetric matrix S_b satisfy
R = (I − S_b)^(−1) (I + S_b)
S_b = (R + I)^(−1) (R − I)
where I is the identity matrix and b = (b_1, b_2, b_3)^T is the Cayley vector, S_b being the antisymmetric matrix corresponding to b.
Let p_i be the spatial coordinate value of feature point i and q_i the coordinate value of feature point i in the palm coordinate system. The rotation matrix equation to be solved is then:
p_i = R·q_i
The above equation can be converted into:
S_{u_i}·b = −v_i
where:
v_i = p_i − q_i
u_i = p_i + q_i
and S_{u_i} is the antisymmetric matrix corresponding to u_i.
Stacking the equations for the five feature points gives:
A·b = c
where A = [S_{u_1}; S_{u_2}; …; S_{u_5}] and c = [−v_1; −v_2; …; −v_5].
solving the equation can obtain a Carley vector, and then calculating a rotation matrix R. The rotation matrix is then converted to a pitch-yaw-roll euler angle. And after clicking the calibration button by the mouse, taking the current posture as the original posture, and outputting an Euler angular velocity instruction according to the relative rotation angle to control the posture change of the robot.
Finally, if the operator moves the hand out of view or makes a gesture that is not in the template, control ends. Repeating step five and step six continues the control.
< example >
The specific process of one embodiment of the invention is as follows:
(1) Placing a binocular camera to keep parallel as much as possible, and performing stereo calibration and correction by using a Zhang Zhengyou checkerboard calibration method.
(2) The device enters the template-learning mode. A rectangular frame is drawn with the mouse around the gesture in the left-camera video, and the device starts learning and stores the template. The operator must keep the gesture unchanged during this step, but may moderately move and rotate the hand through multiple angles to simulate the rotations that may occur during control. After learning is completed, the operator clicks Save, and the device stores templates at multiple scales and under multiple transformations. If several different gesture templates are required, the above steps may be repeated.
(3) Binocular vision tracking and detection. The operator shows the gesture of step (2) in the camera's field of view, and the device detects it automatically. After the target is detected by the cascaded variance classifier, the ensemble (random-forest) classifier and the nearest-neighbor classifier, the Shi-Tomasi corner points in the target frame are tracked with the pyramidal LK optical-flow method, the tracking and detection results are fused, and the samples in the gesture template are updated; after tracking succeeds in the left-camera image, detection and tracking are performed on the epipolar line of the right-camera image, and if tracking succeeds in both the left and right views, the target rectangular frame is output.
(4) In the position control mode, the operator appears in the visual field of the camera according to the gesture in the step 2, and the device performs detection and tracking according to the method in the step 3. The center of the rectangular frame that is always tracked can be regarded as the control point. And obtaining the space coordinate value of the gesture center by using a stereoscopic vision three-dimensional reconstruction principle. After the mouse clicks the calibration button, the point is used as an origin, when the control point leaves a sphere with a control threshold value as a radius, a speed instruction is output, and the size of the speed instruction is in direct proportion to the offset distance: v = kd.
(5) In the attitude control mode, the operator shows the gesture of step 2 in the camera's field of view, and the device performs detection and tracking according to the method of step 3. The contour of the gesture in the target rectangular frame is obtained with a background modeling method, and convex hull detection and convexity-defect detection yield the fingertips of the index, middle and ring fingers and the valleys where the corresponding fingers join. For the coordinate system defined on the palm, the rotation matrix is solved according to the Cayley theorem and then converted into pitch-yaw-roll Euler angles. After the calibration button is clicked with the mouse, the current posture is taken as the original posture, and an Euler angular-velocity command is output according to the relative rotation angle to control the posture change of the robot.
(6) If the operator removes the hand from view or makes a gesture that is not in the template, control ends. Repeating step 3 and step 4 continues the control.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (9)

1. A multi-gesture robot control method based on binocular vision is characterized by comprising
Step one, setting a binocular camera, and calibrating and correcting;
secondly, performing gesture demonstration in the visual field range of the binocular camera by an operator, manually selecting a rectangular frame containing a gesture in a video shot by a left video camera of the binocular camera, and adding the rectangular frame into a training sample set;
step three, training a nearest neighbor classifier and a Bayes classifier by using a training sample set;
step four, the operator appears in the visual field of the binocular camera according to the gesture in the step two; the processor utilizes a cascade variance classifier according to the image of the left camera, and a Bayes classifier based on a random forest and a nearest neighbor classifier are used for detecting to obtain a target; tracking the target, fusing a tracking result and a detection result, updating samples in the gesture template, detecting and tracking on the polar line of the right camera image after the tracking is successful in the left camera image, and outputting a target rectangular frame if the tracking is successful in the left and right views;
step five, tracking the central point of the target rectangular frame; calculating the offset distance of the central point from the initial point to the target point, and outputting a speed control instruction to enable the robot to perform translational motion;
and step six, extracting characteristic points for describing the gesture outline from the target rectangular frame, and solving a rotation matrix corresponding to the characteristic points to enable the robot to perform posture conversion.
2. The binocular vision-based multi-gesture robot control method according to claim 1, wherein the first step specifically comprises:
step 1.1, setting the distance between the left camera and the right camera of the binocular camera to 20 cm, and placing the two cameras horizontally;
step 1.2, calibrating the binocular camera with the Zhang Zhengyou calibration method, and performing distortion elimination and row alignment on the views of the left and right cameras, so that the imaging origin coordinates of the two views are consistent, the optical axes are parallel, the imaging planes are coplanar and the epipolar lines are aligned.
3. The binocular vision-based multi-gesture robot control method according to claim 1, wherein the second step specifically comprises:
step 2.1, performing a gesture demonstration by the operator within the field of view of the binocular camera;
step 2.2, manually selecting a rectangular frame containing the gesture from the video shot by the left camera of the binocular camera;
step 2.3, scaling, rotating and affine-warping the image blocks in the selected rectangular frame and normalizing the results into image blocks of the same size to form a positive sample set; selecting a preset number of image blocks in the original image whose distance from the selected image block is greater than a preset threshold to form a negative sample set; the positive sample set and the negative sample set together constituting the training sample set.
4. The binocular vision-based multi-gesture robot control method of claim 1, wherein in step three,
calculating the posterior probability of the foreground class of the Bayesian classifier by the following formula:
p(y_1 | x_i) = p(x_i | y_1) · p(y_1) / p(x_i)
wherein y_1 denotes the foreground label: y_1 = 0 indicates that there is no target in the image and y_1 = 1 indicates that the image contains the target; x_i denotes the i-th feature of the image; each feature of the image is the grey-value order relation of two randomly selected points in the image, represented by 0 or 1.
5. The binocular vision-based multi-gesture robot control method of claim 4, wherein in step three,
the number of the Bayesian classifiers is 10; for feature x_i, the corresponding number of positive samples is #p, the number of negative samples is #n, and the total number of samples is #m; then
p(y_1 = 1 | x_i) = #p / (#p + #n)
p(y_1 | x_i) is solved for each Bayesian classifier and the results are averaged; if the average value is larger than a preset threshold value, the target is determined to exist in the image.
6. The binocular vision-based multi-gesture robot control method according to claim 1 or 4, wherein in step three, the nearest neighbor classifier is used for calculating the similarity of two image blocks, and the calculation formula is as follows:
S(P_1, P_2) = (1/n) Σ_x (P_1(x) − μ_1)(P_2(x) − μ_2) / (σ_1 σ_2)
wherein μ_1, μ_2 and σ_1, σ_2 represent the means and standard deviations of images P1 and P2; the more similar the two images, the closer the result is to 1; the distance between the two images is defined as d(P_1, P_2) = 1 − S(P_1, P_2); the image patch is considered to contain the object when the distance between the two images is less than a predetermined threshold.
7. The binocular vision based multi-gesture robot control method according to claim 6, wherein the fourth step is specifically:
step 4.1, the operator appearing in the field of view of the binocular camera with the gesture of step two;
step 4.2, manually selecting an initial rectangular frame by the operator;
step 4.3, the processor generating sliding rectangular frames, filtering out with the cascaded variance classifier the frames that do not meet the variance threshold condition, screening with the Bayesian classifier the image blocks that may contain the foreground, and then calculating with the nearest-neighbor classifier the similarity between each sliding rectangular frame and the manually selected initial rectangular frame;
step 4.4, selecting the rectangular frame with the highest overlap as the sample rectangular frame, and calculating the Shi-Tomasi corner points in the sample rectangular frame as the feature points;
step 4.5, calculating the forward prediction error, the backward prediction error and the similarity in the sample rectangular frame, and retaining the feature points whose error is smaller than the average of the forward and backward prediction errors and whose similarity is larger than a preset similarity threshold;
step 4.6, calculating the average displacement between the feature points retained in the current frame and the corresponding feature points of the previous frame to obtain the position of the target frame in the current frame, and obtaining the size of the target frame in the current frame from the ratio of the Euclidean distances between the feature points in the previous and current frames;
step 4.7, normalizing the target frame obtained in step 4.6 and calculating its similarity to all images in the positive sample set; if any similarity is greater than the specified threshold, the tracking is valid and the obtained target frame is added to the sample set; otherwise the tracking is considered invalid and the frame is discarded.
8. The binocular vision-based multi-gesture robot control method according to claim 7, wherein the step five specifically comprises:
step 5.1, taking the center point of the target frame obtained in step 4.6 as the gesture center point, and calculating the spatial coordinate value of the gesture center by the parallax ranging method of the stereoscopic-vision principle, specifically:
Z = f·d / (u_1 − u_2), X = (u_1 − u_0)·Z / f, Y = (v_1 − v_0)·Z / f
wherein X, Y, Z are the coordinates of the gesture center point in space, u_1 is the x coordinate of the tracked marker point in the left-camera image coordinate system, u_0 is the x origin of the left-camera image coordinate system, u_2 is the x coordinate of the marker point in the right-camera image coordinate system, d is the translation distance (baseline) between the two cameras, v_1 is the y coordinate of the marker point in the left-camera image coordinate system, v_0 is the y origin of the left-camera image coordinate system, and f is the camera focal length;
step 5.2, the operator setting an origin at an arbitrary position by means of the calibration button; when the processor detects that the gesture center point has left the sphere whose radius is the preset control threshold, outputting a speed control command calculated as:
V=kd
wherein V is the output speed control command, k is the control coefficient, and d is the distance of the gesture center from the initial position; the speed control command is used to make the robot perform a translational motion.
9. The binocular vision based multi-gesture robot control method according to claim 8, wherein the sixth step specifically includes:
step 6.1, obtaining the contour of the gesture in the target frame obtained in step four by a method combining skin-color detection with a background difference method, and obtaining 5 feature points, namely the fingertips of the index finger, the middle finger and the ring finger, the valley between the index finger and the middle finger, and the valley between the middle finger and the ring finger, through convex hull detection and convexity-defect detection algorithms; obtaining the spatial coordinates of the 5 feature points with the formula of step 5.1;
step 6.2, defining a coordinate system on the palm: the root of the middle finger is the origin, the direction pointing to the fingertip of the middle finger is the positive y direction, the line parallel to the line connecting the two valleys is the x axis, and the direction pointing to the little finger is the positive x direction;
step 6.3, solving the rotation matrix from the 5 feature points according to the Cayley theorem;
step 6.4, converting the rotation matrix into pitch-yaw-roll Euler angles, obtaining the relative rotation angle between the current gesture and the original state, and outputting an Euler angular-velocity command according to the relative rotation angle to control the posture change of the robot.
CN201711176221.4A 2017-11-22 2017-11-22 Multi-gesture robot control method based on binocular vision Active CN107813310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711176221.4A CN107813310B (en) 2017-11-22 2017-11-22 Multi-gesture robot control method based on binocular vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711176221.4A CN107813310B (en) 2017-11-22 2017-11-22 Multi-gesture robot control method based on binocular vision

Publications (2)

Publication Number Publication Date
CN107813310A true CN107813310A (en) 2018-03-20
CN107813310B CN107813310B (en) 2020-10-20

Family

ID=61609771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711176221.4A Active CN107813310B (en) 2017-11-22 2017-11-22 Multi-gesture robot control method based on binocular vision

Country Status (1)

Country Link
CN (1) CN107813310B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120062736A1 (en) * 2010-09-13 2012-03-15 Xiong Huaixin Hand and indicating-point positioning method and hand gesture determining method used in human-computer interaction system
CN102350700A (en) * 2011-09-19 2012-02-15 华南理工大学 Method for controlling robot based on visual sense
CN104463191A (en) * 2014-10-30 2015-03-25 华南理工大学 Robot visual processing method based on attention mechanism
US20160170481A1 (en) * 2014-11-07 2016-06-16 Eye Labs, LLC Visual stabilization system for head-mounted displays
CN104680127A (en) * 2014-12-18 2015-06-03 闻泰通讯股份有限公司 Gesture identification method and gesture identification system
CN104821010A (en) * 2015-05-04 2015-08-05 清华大学深圳研究生院 Binocular-vision-based real-time extraction method and system for three-dimensional hand information
CN106502418A (en) * 2016-11-09 2017-03-15 南京阿凡达机器人科技有限公司 A kind of vision follower method based on monocular gesture identification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
孔欣: "Research on Gesture Recognition Based on Binocular Stereo Vision", China Master's Theses Full-text Database, Information Science and Technology *
李红英: "Research on Key Technologies of Vision-Based Gesture Recognition", China Master's Theses Full-text Database, Information Science and Technology *
王辉: "Vision-Based Real-Time Gesture Tracking and Recognition and Its Application in Human-Computer Interaction", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101872A (en) * 2018-06-20 2018-12-28 济南大学 A kind of generation method of 3D gesture mouse
CN109101872B (en) * 2018-06-20 2023-04-18 济南大学 Method for generating 3D gesture mouse
CN109271931A (en) * 2018-09-14 2019-01-25 辽宁奇辉电子***工程有限公司 It is a kind of that gesture real-time identifying system is pointed sword at based on edge analysis
CN109635648A (en) * 2018-11-05 2019-04-16 上海鲸鱼机器人科技有限公司 Robot and its control method
CN109460077A (en) * 2018-11-19 2019-03-12 深圳博为教育科技有限公司 A kind of automatic tracking method, automatic tracking device and automatic tracking system
CN109460077B (en) * 2018-11-19 2022-05-17 深圳博为教育科技有限公司 Automatic tracking method, automatic tracking equipment and automatic tracking system
CN112917470A (en) * 2019-12-06 2021-06-08 鲁班嫡系机器人(深圳)有限公司 Teaching method, device and system of manipulator, storage medium and equipment
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN110916577A (en) * 2019-12-17 2020-03-27 小狗电器互联网科技(北京)股份有限公司 Robot static state judgment method and device and robot
CN111015657A (en) * 2019-12-19 2020-04-17 佛山科学技术学院 Adaptive control method, device and system of industrial robot
CN111216133A (en) * 2020-02-05 2020-06-02 广州中国科学院先进技术研究所 Robot demonstration programming method based on fingertip identification and hand motion tracking
CN111216133B (en) * 2020-02-05 2022-11-22 广州中国科学院先进技术研究所 Robot demonstration programming method based on fingertip identification and hand motion tracking
CN111367415B (en) * 2020-03-17 2024-01-23 北京明略软件***有限公司 Equipment control method and device, computer equipment and medium
CN111367415A (en) * 2020-03-17 2020-07-03 北京明略软件***有限公司 Equipment control method and device, computer equipment and medium
CN111462240B (en) * 2020-04-08 2023-05-30 北京理工大学 Target positioning method based on multi-monocular vision fusion
CN111462240A (en) * 2020-04-08 2020-07-28 北京理工大学 Target positioning method based on multi-monocular vision fusion
CN111539979A (en) * 2020-04-27 2020-08-14 天津大学 Human body front tracking method based on deep reinforcement learning
CN111539979B (en) * 2020-04-27 2022-12-27 天津大学 Human body front tracking method based on deep reinforcement learning
CN113741550A (en) * 2020-05-15 2021-12-03 北京机械设备研究所 Mobile robot following method and system
CN113741550B (en) * 2020-05-15 2024-02-02 北京机械设备研究所 Mobile robot following method and system
CN111949925B (en) * 2020-06-30 2023-08-29 中国资源卫星应用中心 Image relative orientation method and device based on Rodriger matrix and maximum convex hull
CN111949925A (en) * 2020-06-30 2020-11-17 中国资源卫星应用中心 Image relative orientation method and device based on Reed-Solomon matrix and maximum convex hull
CN112560592A (en) * 2020-11-30 2021-03-26 深圳市商汤科技有限公司 Image processing method and device, and terminal control method and device
CN112749664A (en) * 2021-01-15 2021-05-04 广东工贸职业技术学院 Gesture recognition method, device, equipment, system and storage medium
CN113255612A (en) * 2021-07-05 2021-08-13 智道网联科技(北京)有限公司 Preceding vehicle starting reminding method and system, electronic device and storage medium
CN113822251B (en) * 2021-11-23 2022-02-08 齐鲁工业大学 Ground reconnaissance robot gesture control system and control method based on binocular vision
CN113822251A (en) * 2021-11-23 2021-12-21 齐鲁工业大学 Ground reconnaissance robot gesture control system and control method based on binocular vision
CN117340914A (en) * 2023-10-24 2024-01-05 哈尔滨工程大学 Humanoid robot human body feeling control method and control system
CN117340914B (en) * 2023-10-24 2024-05-14 哈尔滨工程大学 Humanoid robot human body feeling control method and control system

Also Published As

Publication number Publication date
CN107813310B (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN107813310B (en) Multi-gesture robot control method based on binocular vision
CN112476434B (en) Visual 3D pick-and-place method and system based on cooperative robot
CN108369643B (en) Method and system for 3D hand skeleton tracking
CN113524194B (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
Tzionas et al. Capturing hands in action using discriminative salient points and physics simulation
CN109993073B (en) Leap Motion-based complex dynamic gesture recognition method
JP5812599B2 (en) Information processing method and apparatus
CN110842914A (en) Hand-eye calibration parameter identification method, system and medium based on differential evolution algorithm
CN109559341B (en) Method and device for generating mechanical arm grabbing scheme
Schröder et al. Real-time hand tracking using synergistic inverse kinematics
CN111897349A (en) Underwater robot autonomous obstacle avoidance method based on binocular vision
CN112906797A (en) Plane grabbing detection method based on computer vision and deep learning
Hao et al. Vision-based surgical tool pose estimation for the da vinci® robotic surgical system
JP2016099982A (en) Behavior recognition device, behaviour learning device, method, and program
CN113393524B (en) Target pose estimation method combining deep learning and contour point cloud reconstruction
CN115816460B (en) Mechanical arm grabbing method based on deep learning target detection and image segmentation
JP2017123087A (en) Program, device and method for calculating normal vector of planar object reflected in continuous photographic images
CN114387513A (en) Robot grabbing method and device, electronic equipment and storage medium
CN116766194A (en) Binocular vision-based disc workpiece positioning and grabbing system and method
Jaiswal et al. Deep learning based command pointing direction estimation using a single RGB camera
JP3822482B2 (en) Face orientation calculation method and apparatus
Guðmundsson et al. Model-based hand gesture tracking in tof image sequences
CN112712030A (en) Three-dimensional attitude information restoration method and device
CN116188540A (en) Target identification and pose estimation method based on point cloud information
KR101868520B1 (en) Method for hand-gesture recognition and apparatus thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211213

Address after: 315200 No. 2, Dongyuan, Hengda Shanshui City, Jiulonghu Town, Zhenhai District, Ningbo City, Zhejiang Province

Patentee after: Yu Xinghu

Address before: 325035 k604, scientific research and entrepreneurship building, Huazhong academy, No. 225, Chaoyang new street, Chashan street, Ouhai District, Wenzhou City, Zhejiang Province

Patentee before: ZHEJIANG YOUMAIDE INTELLIGENT EQUIPMENT Co.,Ltd.

TR01 Transfer of patent right