CN107813310A - Multi-gesture robot control method based on binocular vision - Google Patents

Multi-gesture robot control method based on binocular vision

Info

Publication number
CN107813310A
Authority
CN
China
Prior art keywords
gesture
image
camera
target
rectangular frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711176221.4A
Other languages
Chinese (zh)
Other versions
CN107813310B (en)
Inventor
卫作龙
夏晗
林伟阳
于兴虎
佟明斯
李湛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yu Xinghu
Original Assignee
Zhejiang Youmai De Intelligent Equipment Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Youmai De Intelligent Equipment Co Ltd filed Critical Zhejiang Youmai De Intelligent Equipment Co Ltd
Priority to CN201711176221.4A priority Critical patent/CN107813310B/en
Publication of CN107813310A publication Critical patent/CN107813310A/en
Application granted granted Critical
Publication of CN107813310B publication Critical patent/CN107813310B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • B25J13/08Controls for manipulators by means of sensing devices, e.g. viewing or touching devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Human Computer Interaction (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the field of robot control methods, and in particular to a multi-gesture robot control method based on binocular vision. The invention aims to solve the defects that existing vision-based robot control methods are inconvenient to operate, that hand recognition is strongly affected by illumination and background color, and that the off-line teaching method is computationally intensive and places very high precision requirements on the robot model and on the determination of the coordinate system. The proposed binocular-vision-based multi-gesture robot control method comprises: setting up a binocular camera; manually selecting a rectangular frame containing the gesture; training classifiers with a training sample set; detecting the target with the classifiers; tracking the target and fusing the tracking and detection results; calculating the offset distance of the target center point from the initial point to the target point and outputting a speed control command so that the robot performs a translational motion; and extracting feature points in the target frame and solving the rotation matrix corresponding to the feature points. The present invention is applicable to the control of spray-painting robots.

Description

Multi-gesture robot control method based on binocular vision
Technical Field
The invention relates to a robot control method, in particular to a binocular vision-based multi-gesture robot control method.
Background
In mainstream industrial-robot applications, an operator uses a teach pendant to manually control the joint motion of the robot, moves it to a preset position, records that position and transmits it to the robot controller; the robot can then automatically repeat the task according to the recorded instructions.
At present, two methods are mainly used for teaching industrial robots: manual teaching and off-line teaching. In manual teaching, a human guides the robot's end effector, or a mechanical simulation device, or uses a teach pendant, to take the robot through the expected action; the programming is realized by a real-time on-line teaching program, and the robot operates from memory, so it can reproduce the motion repeatedly. The off-line teaching method builds a model, performs simulation programming on a computer, plans the trajectory and generates the motion track automatically.
The teach pendant is still the most widely used tool in the field of industrial robots, but this control mode is inefficient and not intuitive. Existing vision-based robot control methods offer only a single control mode; when fine adjustment is needed, position and posture commands interfere with each other, and the control space is not easily unified with the robot workspace, which makes operation inconvenient [1]. Bare-hand recognition is mainly based on color-space segmentation, which is strongly affected by illumination and background color. The off-line teaching method involves a large amount of computation and complex algorithms, handles irregular edges poorly, and places high precision requirements on the robot model and on the determination of the robot coordinate system.
Disclosure of Invention
The invention provides a binocular-vision-based multi-gesture robot control method, aiming at solving the defects that the existing vision-based robot control method is inconvenient to operate, that hand recognition is greatly influenced by illumination and background color, that the off-line teaching method requires a large amount of computation, and that the robot model and the robot coordinate system must be determined with high precision. The binocular-vision-based multi-gesture robot control method comprises the following steps:
step one, setting a binocular camera, and calibrating and correcting.
And step two, performing gesture demonstration in the visual field range of the binocular camera by an operator, manually selecting a rectangular frame containing a gesture in a video shot by a left video camera of the binocular camera, and adding the rectangular frame into the training sample set.
And step three, training the nearest neighbor classifier and the Bayes classifier by using the training sample set.
Step four, the operator shows the gesture from step two in the field of view of the binocular camera; the processor detects the target in the left-camera image using a cascaded variance classifier, a Bayes classifier based on a random forest and a nearest-neighbor classifier; the target is then tracked, the tracking result and the detection result are fused, and the samples in the gesture template are updated; after tracking succeeds in the left-camera image, detection and tracking are performed on the epipolar line of the right-camera image, and if tracking succeeds in both the left and right views, a target rectangular frame is output.
Step five, tracking the central point of the target rectangular frame; and calculating the offset distance of the central point from the initial point to the target point, and outputting a speed control instruction to enable the robot to perform translational motion.
And step six, extracting characteristic points for describing the gesture outline in the target rectangular frame, and solving a rotation matrix corresponding to the characteristic points.
Preferably, the step one specifically includes:
the method comprises the steps of setting the distance between a left camera and a right camera in a binocular camera to be 20cm and placing the left camera and the right camera horizontally.
And step two, calibrating the binocular camera by using a Zhang Zhengyou calibration method, and eliminating distortion and line alignment of the views of the left camera and the right camera so as to enable the imaging origin coordinates of the views of the left camera and the right camera to be consistent, the optical axes to be parallel, the imaging planes to be coplanar and the polar lines to be aligned.
Preferably, the second step specifically comprises:
and step two, performing gesture demonstration by an operator in the visual field range of the binocular camera.
And step two, manually selecting a rectangular frame containing the gesture from the video shot by the left camera of the binocular camera.
Secondly, scaling, rotating and affine processing are carried out on the image blocks in the selected rectangular frame, and the scaled, rotated and affine images are normalized into image blocks with the same size to form a positive sample set; selecting a preset number of image blocks which are more than a preset threshold value away from the selected image blocks in the original image to form a negative sample set; the positive sample set and the negative sample set together constitute a training sample set.
Preferably, step three specifically includes: calculating the posterior probability of the foreground class of the Bayesian classifier by the following formula:
p(y_1 | x_i) = p(x_i | y_1) · p(y_1) / p(x_i)
where y_1 denotes the foreground label: y_1 = 0 indicates that there is no target in the image and y_1 = 1 indicates that the image contains the target; x_i denotes the i-th feature of the image; each feature of the image is the grey-value order relation of two randomly selected points in the image, represented by 0 or 1.
Preferably, in step three:
the number of Bayesian classifiers is 10; if, for feature x_i, the number of corresponding foreground samples is #p, the number of corresponding background samples is #n and the total number of samples is #m, then
p(y_1 = 1 | x_i) = #p / (#p + #n)
p(y_1 | x_i) is solved for each Bayesian classifier and the results are averaged; if the average value is larger than a preset threshold, the target is considered to be present in the image.
Preferably, in step three, the nearest neighbor classifier is used to calculate the similarity between two image blocks, and the calculation formula is:
S(P_1, P_2) = (1/n) Σ_x (P_1(x) − μ_1)(P_2(x) − μ_2) / (σ_1 σ_2)
where μ_1, μ_2 and σ_1, σ_2 are the means and standard deviations of the image blocks P1 and P2; the more similar the two images, the closer the result is to 1. The distance between the two images is defined as d(P_1, P_2) = 1 − S(P_1, P_2); the image patch is considered to contain the object when the distance between the two images is smaller than a predetermined threshold.
Preferably, the step four is specifically:
and step four, the operator appears in the visual field of the binocular camera according to the gesture in the step two.
And step two, manually selecting an initial rectangular frame by an operator.
And step two, the processor generates a sliding rectangular frame, filters the rectangular frame which does not meet the variance threshold condition by using a cascade variance classifier, then obtains the image block which possibly contains the foreground by screening through a Bayesian classifier, and calculates the similarity between the sliding rectangular frame and the manually selected initial rectangular frame through a nearest neighbor classifier.
And step three, selecting the rectangular frame with the highest overlapping degree as a sample rectangular frame, and calculating the Shi-Tomasi corner points in the sample rectangular frame as the feature points.
And fourthly, calculating a forward prediction error, a backward prediction error and the similarity in the sample rectangular frame, and screening out the feature points which are smaller than the average value of the forward prediction error and the backward prediction error and are larger than a preset similarity threshold.
And step four, calculating the average displacement of the feature points screened from the current frame and the corresponding feature points of the previous frame to obtain the position of the target frame of the current frame, and obtaining the size of the target frame in the current frame according to the ratio of the Euclidean distance of the feature points in the previous frame and the current frame.
And step four, carrying out normalization processing on the target frame obtained in the step four, calculating the similarity between the normalized target frame and all images in the positive sample set, if one similarity is greater than a specified threshold value, effectively tracking, adding the obtained target frame into the sample set, and if not, considering that the tracking is invalid and discarding.
Preferably, step five specifically includes:
Step 5.1, taking the center point of the target frame obtained in step 4.6 as the gesture center point, and calculating the spatial coordinate value of the gesture center by the parallax ranging method of the stereoscopic-vision principle, specifically:
Z = f·d / (u_1 − u_2), X = (u_1 − u_0)·Z / f, Y = (v_1 − v_0)·Z / f
where X, Y, Z are the coordinates of the gesture center point in space, u_1 is the x coordinate of the tracked marker point in the left-camera image coordinate system, u_0 is the x origin of the left-camera image coordinate system, u_2 is the x coordinate of the marker point in the right-camera image coordinate system, d is the translation distance (baseline) between the two cameras, v_1 is the y coordinate of the marker point in the left-camera image coordinate system, v_0 is the y origin of the left-camera image coordinate system, and f is the camera focal length.
Step 5.2, the operator sets an origin at an arbitrary position by means of the calibration button; when the processor detects that the gesture center point has left the sphere whose radius is the preset control threshold, a speed control command is output, calculated as:
V=kd
where V is the output speed control command, k is the control coefficient, and d is the distance of the gesture center from the initial position; the speed control command is used to make the robot end effector perform a translational motion.
Preferably, the sixth step specifically includes:
Step 6.1, obtaining the contour of the gesture in the target frame obtained in step four by a method combining skin-color detection with a background difference method, and obtaining 5 feature points, namely the fingertips of the index finger, the middle finger and the ring finger, the valley between the index finger and the middle finger, and the valley between the middle finger and the ring finger, through convex hull detection and convexity-defect detection; the spatial coordinates of the 5 feature points are obtained with the formula of step 5.1.
Step 6.2, defining a coordinate system on the palm: the root of the middle finger is the origin, the direction pointing to the fingertip of the middle finger is the positive y direction, the line parallel to the line connecting the two valleys is the x axis, and the direction pointing to the little finger is the positive x direction.
Step 6.3, solving the rotation matrix from the 5 feature points according to the Cayley theorem.
Step 6.4, converting the rotation matrix into pitch-yaw-roll Euler angles, obtaining the relative rotation angle between the current gesture and the original state, and outputting an Euler angular-velocity command according to the relative rotation angle to control the posture change of the robot.
Preferably, step 6.3 specifically comprises:
Step 6.3.1, any rotation matrix R that does not have −1 as an eigenvalue and an antisymmetric matrix S_b satisfy the following relations:
R = (I − S_b)^(−1) (I + S_b)
S_b = (R + I)^(−1) (R − I)
where I is the identity matrix and b = (b_1, b_2, b_3)^T is the Cayley vector, with b_1, b_2, b_3 its first, second and third components; S_b is the antisymmetric matrix corresponding to b:
S_b = [ [0, −b_3, b_2], [b_3, 0, −b_1], [−b_2, b_1, 0] ]
Step 6.3.2, let p_i be the spatial coordinate value of feature point i and q_i the coordinate value of feature point i in the palm coordinate system. The rotation matrix equation to be solved is:
p_i = R·q_i
An identity transformation of this formula gives:
S_{u_i}·b = −v_i
where:
v_i = p_i − q_i
u_i = p_i + q_i
and S_{u_i} is the antisymmetric matrix corresponding to u_i.
Stacking these equations for the five feature points gives:
A·b = c
where A = [S_{u_1}; S_{u_2}; …; S_{u_5}] and c = [−v_1; −v_2; …; −v_5].
Step 6.3.3, solving the equation A·b = c yields the Cayley vector b, from which the rotation matrix R is calculated.
The invention has the following beneficial effects: 1. the position and posture commands do not interfere with each other, and the control space and the robot workspace are easy to unify, so operation is simple and convenient; 2. hand recognition is based on feature points and a rotation matrix and is only slightly affected by illumination; 3. the off-line teaching part has a small computational load and an uncomplicated algorithm; 4. the requirements on the accuracy of the robot model and on the determination of the robot coordinate system are not high.
Drawings
FIG. 1 is a schematic diagram of a gesture robot control apparatus according to the present invention;
FIG. 2 is a schematic diagram of a control gesture, wherein FIG. 2 (a) is a schematic diagram of a gesture in a gesture control mode; FIG. 2 (b) is a schematic diagram of a gesture in a position control mode;
fig. 3 is a flowchart of the binocular vision based multi-gesture robot control method of the present invention.
Detailed Description
The binocular-vision-based multi-gesture robot control method is realized with the device shown in fig. 1, where 101 is an upper computer comprising a processor that performs the computation and controls the robot; 102 is a painting robot; 103 is the left camera of the binocular camera and 104 is the right camera; 105 is the operator's hand. A pair of cameras arranged in parallel serves as the gesture detection unit, and the control signals computed by the computer are sent to the robot. The operator only needs to ensure that the hand appears in the field of view of both cameras.
To prevent the coupling of the position control and the attitude control, two gestures, one of which is the control of the position and one of which is the control of the attitude, can be recorded and learned in advance. The present invention defines that the posture control is performed when the palm is open, as shown in fig. 2 (a); the position control is performed when the thumb, the index finger and the middle finger are pinched together, as shown in fig. 2 (b).
Fig. 3 is a binocular vision-based multi-gesture robot control method, which specifically includes:
step one, setting a binocular camera, and calibrating and correcting.
And step two, performing gesture demonstration in the visual field range of the binocular camera by an operator, manually selecting a rectangular frame containing a gesture in a video shot by a left video camera of the binocular camera, and adding the rectangular frame into the training sample set.
And step three, training the nearest neighbor classifier and the Bayes classifier by using the training sample set.
Step four, the operator appears in the visual field of the binocular camera according to the gesture in the step two; the processor utilizes a cascade variance classifier according to the image of the left camera, and a Bayes classifier based on a random forest and a nearest neighbor classifier are used for detecting to obtain a target; and tracking the target, fusing a tracking result and a detection result, updating samples in the gesture template, detecting and tracking on the epipolar line of the right camera image after the tracking is successful in the left camera image, and outputting a target rectangular frame if the tracking is successful in the left and right views.
Step five, tracking the central point of the target rectangular frame; and calculating the offset distance of the central point from the initial point to the target point, and outputting a speed control instruction to enable the robot to perform translational motion.
And step six, extracting characteristic points for describing the gesture outline in the target rectangular frame, and solving a rotation matrix corresponding to the characteristic points.
The first step may specifically be: the distance between the left camera and the right camera of the binocular camera is set to 20 cm, and the two cameras are placed as parallel as possible. The Zhang Zhengyou calibration method is used to obtain the intrinsic and extrinsic parameters of the cameras; stereo rectification is then performed, eliminating distortion and aligning the rows of the left and right views, so that the imaging origin coordinates of the two views are consistent, the optical axes of the two cameras are parallel, the left and right imaging planes are coplanar, and the epipolar lines are row-aligned.
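For illustration, the following is a minimal OpenCV sketch of this calibration and rectification step. It assumes chessboard corner detections objpoints, imgpoints_l, imgpoints_r and the image size are already available; these names and the fixed-intrinsic flag are assumptions, not details taken from the patent.

```python
import cv2
import numpy as np

def calibrate_and_rectify(objpoints, imgpoints_l, imgpoints_r, image_size):
    """Calibrate each camera from chessboard views, then stereo-calibrate and
    rectify so that epipolar lines become image rows (a minimal sketch)."""
    # Intrinsics of each camera from the planar chessboard target.
    _, K1, D1, _, _ = cv2.calibrateCamera(objpoints, imgpoints_l, image_size, None, None)
    _, K2, D2, _, _ = cv2.calibrateCamera(objpoints, imgpoints_r, image_size, None, None)

    # Extrinsics (rotation R and translation T) between the two cameras.
    _, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
        objpoints, imgpoints_l, imgpoints_r, K1, D1, K2, D2, image_size,
        flags=cv2.CALIB_FIX_INTRINSIC)

    # Rectification transforms: after remapping, the two image planes are
    # coplanar and the epipolar lines are row-aligned.
    R1, R2, P1, P2, Q, _, _ = cv2.stereoRectify(K1, D1, K2, D2, image_size, R, T)
    map_l = cv2.initUndistortRectifyMap(K1, D1, R1, P1, image_size, cv2.CV_32FC1)
    map_r = cv2.initUndistortRectifyMap(K2, D2, R2, P2, image_size, cv2.CV_32FC1)
    return map_l, map_r, Q   # apply per frame with cv2.remap(frame, *map_l, cv2.INTER_LINEAR)
```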
Steps two and three establish the sample set and perform training, which may specifically be: the training process trains the Bayes classifier and the nearest-neighbor classifier. The image block selected with the mouse in the initial frame is scaled, rotated and affine-warped, and the results are normalized into image blocks of the same size to form the positive sample set; a number of image blocks far away from the selected block are chosen as the negative sample set, and this sample set is used to train the nearest-neighbor classifier. From the sample set, positive and negative samples of 2-bit BP features are extracted to train the Bayes classifier and obtain the Bayesian posterior probability formula. In the tracking mode of the next step, the sample set is updated on line and the two classifiers are trained iteratively.
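A rough sketch of the positive-sample generation is given below; the warp ranges, the patch size and the helper name make_positive_samples are illustrative assumptions rather than values taken from the patent.

```python
import cv2
import numpy as np

def make_positive_samples(frame_gray, box, n_warps=20, out_size=(15, 15), seed=0):
    """Scale, rotate and shift the selected gesture patch a number of times and
    normalise every warp to a fixed-size block, forming the positive sample set."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(n_warps):
        angle = rng.uniform(-10, 10)              # small in-plane rotation (degrees)
        scale = rng.uniform(0.95, 1.05)           # small scale change
        M = cv2.getRotationMatrix2D((cx, cy), angle, scale)
        M[:, 2] += rng.uniform(-2, 2, size=2)     # small translation (affine part)
        warped = cv2.warpAffine(frame_gray, M, (frame_gray.shape[1], frame_gray.shape[0]))
        patch = warped[y:y + h, x:x + w]
        samples.append(cv2.resize(patch, out_size))
    return samples   # negative samples would be patches taken far from `box`
```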
The device detects gestures with a cascade of classifiers comprising a variance classifier, a Bayesian classifier based on a random forest and a nearest-neighbor classifier. The variance classifier computes the variance of the image inside each sliding rectangle to be examined; because the variance of the tracked target region is generally larger than that of the background, the variance filter discards most of the scanning rectangles. The random forest contains 10 Bayes classifiers. The feature used by the Bayes classifier is the 2-bit BP feature, i.e., the grey-value order relation of any two points, taking only the values 0 and 1. The class to which the image belongs is denoted y_i (i = 1, 2); the detection problem can then be regarded as a classification problem with only two classes, foreground and background: y_1 = 0 indicates that there is no target in the image and y_1 = 1 indicates that the image contains the target. x_i (i = 1, 2, 3, …, 2^13) denotes the feature set of the image, namely the 2-bit BP features described above. The posterior probability of the foreground class given by the Bayesian classifier is p(y_1 = 1 | x_i), estimated as follows.
Let the number of foreground samples corresponding to x_i in the sample set be #p, the number of background samples be #n, and the total number of samples be #m. Then
p(y_1 = 1 | x_i) = #p / (#p + #n)
The random forest as a whole thus yields 10 posterior probabilities; these are averaged, and if the average is greater than the threshold, the image patch is considered to contain the foreground object.
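A minimal sketch of such a random forest of pixel-comparison Bayes classifiers follows. The class and function names (FernBayes, forest_posterior), the patch size and the acceptance threshold are assumptions for illustration; the count-based posterior #p/(#p+#n), the 13 comparisons giving 2^13 feature values, and the averaging over 10 classifiers follow the description above.

```python
import numpy as np

class FernBayes:
    """One Bayes classifier over random pixel comparisons (2-bit BP style):
    each feature is the grey-value order of two randomly chosen pixels."""
    def __init__(self, n_comparisons=13, patch_size=(15, 15), rng=None):
        rng = rng or np.random.default_rng()
        n_pix = patch_size[0] * patch_size[1]
        self.pairs = rng.integers(0, n_pix, size=(n_comparisons, 2))
        n_codes = 2 ** n_comparisons
        self.pos = np.zeros(n_codes)   # #p: foreground samples per feature code
        self.neg = np.zeros(n_codes)   # #n: background samples per feature code

    def code(self, patch):
        flat = patch.reshape(-1)
        bits = flat[self.pairs[:, 0]] > flat[self.pairs[:, 1]]
        return int(sum(int(b) << k for k, b in enumerate(bits)))

    def train(self, patch, is_target):
        c = self.code(patch)
        (self.pos if is_target else self.neg)[c] += 1

    def posterior(self, patch):
        c = self.code(patch)
        denom = self.pos[c] + self.neg[c]
        return self.pos[c] / denom if denom > 0 else 0.0   # p = #p / (#p + #n)

def forest_posterior(forest, patch):
    """Average the posteriors of the classifiers; compare the mean to a threshold."""
    return float(np.mean([f.posterior(patch) for f in forest]))

# forest = [FernBayes() for _ in range(10)]
# detected = forest_posterior(forest, patch) > 0.6   # threshold value is an assumption
```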
The nearest-neighbor classifier has two functions: it matches the similarity of each image block to the on-line model in turn using the NCC algorithm, and it updates the positive sample space of the on-line model. To compare two image blocks P_1 and P_2, their similarity is characterized with the NCC algorithm:
S(P_1, P_2) = (1/n) Σ_x (P_1(x) − μ_1)(P_2(x) − μ_2) / (σ_1 σ_2)
where μ_1, μ_2 and σ_1, σ_2 are the means and standard deviations of the images P1 and P2. The more similar the two images, the closer the result is to 1. The distance between the two images is defined as:
d(P_1, P_2) = 1 − S(P_1, P_2)
if the distance is less than the threshold, the image slice is considered to contain the object.
In the left-camera video, a rectangular frame is drawn around the gesture with the mouse. Training then yields the threshold of the variance classifier and the parameters of the Bayesian classifier, and the templates of the nearest-neighbor classifier are saved. During this step the operator must keep the gesture unchanged, but may moderately move and rotate the hand through multiple angles to simulate the rotations that may occur during control. After learning is completed, the operator clicks Save, and the device stores templates at multiple scales and under multiple transformations.
The fourth step is the tracking step. Its aim is first to identify the rectangular frame containing the hand in the picture shot by the camera, and then to follow the motion track of the hand. Specifically: the operator shows one of the gestures of fig. 2 in the camera's field of view, and the device detects automatically, using the trained cascaded variance classifier, the random-forest-based Bayes classifier and the nearest-neighbor classifier, to obtain the target (i.e., the process of identifying the rectangular frame containing the hand). After the target is detected, a tracking-and-detection loop is entered (i.e., the process of following the hand's motion track):
and tracking the Shi-Tomasi corner points in the target frame by using a pyramid LK optical flow method, and removing a part of poorly tracked points by using a forward error and a backward error and an NCC algorithm in order to optimize a tracking effect. The tracking process is as follows:
1. and according to the initially selected rectangular frame, performing template matching in the other camera by using an NCC algorithm to obtain an initial rectangular frame in the other camera.
2. And entering a tracking loop, finding out the rectangular frame with the highest overlapping degree of the target tracking frame from the generated sliding rectangular frames as an optimal tracking sample, and then calculating the Shi-Tomasi corner point in the rectangular frame as a characteristic point.
3. Forward and backward errors and matching similarity, and selecting points which meet the conditions (the points meet the characteristic points which are smaller than the given average forward and backward error and larger than the specified matching similarity), and filtering about half of characteristic points after finishing.
4. And predicting the size and the position of the target frame in the current frame by using the residual characteristic points, obtaining the position of the target frame in the current frame according to the average translation of the characteristic points successfully tracked and the characteristic points corresponding to the previous frame, and obtaining the size of the target frame in the current frame according to the ratio of the Euclidean distances corresponding to the characteristic points in the front and back two frames of images. Invalid if the position exceeds the image.
5. And calculating the similarity of the normalized image slice to the online model, and if the similarity is greater than a specified threshold value, finally considering that the tracking is effective at the time and storing the similarity into a positive sample set. Otherwise it is considered invalid and discarded.
6. And returning to the step 2.
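The sketch below illustrates one iteration of steps 2-4 with OpenCV's pyramidal LK flow and a forward-backward error filter. The corner-detector settings, the minimum point count and the function name track_box are assumptions, and the NCC-similarity filter of step 3 is omitted for brevity.

```python
import cv2
import numpy as np

def track_box(prev_gray, curr_gray, box):
    """One tracking iteration: Shi-Tomasi corners inside the previous box are
    pushed through forward and backward pyramidal LK flow, points with an
    above-average forward-backward error are dropped, and the box is moved and
    rescaled by the surviving points. Returns the new (x, y, w, h) or None."""
    x, y, w, h = box
    mask = np.zeros_like(prev_gray)
    mask[y:y + h, x:x + w] = 255
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=100, qualityLevel=0.01,
                                  minDistance=3, mask=mask)
    if pts is None:
        return None
    fwd, st_f, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, pts, None)
    bwd, st_b, _ = cv2.calcOpticalFlowPyrLK(curr_gray, prev_gray, fwd, None)
    fb_err = np.linalg.norm((bwd - pts).reshape(-1, 2), axis=1)
    keep = (st_f.ravel() == 1) & (st_b.ravel() == 1) & (fb_err < fb_err.mean())
    p0, p1 = pts.reshape(-1, 2)[keep], fwd.reshape(-1, 2)[keep]
    if len(p0) < 4:
        return None                      # too few reliable points
    dx, dy = np.mean(p1 - p0, axis=0)    # average translation of the points
    d0 = np.linalg.norm(p0[None] - p0[:, None], axis=2)
    d1 = np.linalg.norm(p1[None] - p1[:, None], axis=2)
    scale = np.mean(d1[d0 > 0] / d0[d0 > 0])   # ratio of point-pair distances
    nw, nh = w * scale, h * scale
    cx, cy = x + w / 2 + dx, y + h / 2 + dy
    new_box = (int(cx - nw / 2), int(cy - nh / 2), int(nw), int(nh))
    H, W = curr_gray.shape[:2]
    if not (0 <= new_box[0] < W and 0 <= new_box[1] < H):
        return None                      # box left the image: invalid
    return new_box
```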
Tracking is computed in the left and right cameras in parallel, starting from the target position in the initial frame, and the detection loop using the classifiers runs in parallel with the tracking loop. In the image from the left camera the whole frame is searched for detection, the tracking and detection results are fused into the final result, and the samples in the gesture template are updated. After tracking and detection succeed in the left-camera image, detection is performed only along the corresponding epipolar line of the right-camera image and fused with the right camera's tracking result, which reduces the computational load of the detection algorithm. If tracking succeeds in the left and right views at the same time, the target rectangular box is output.
Step five, determining the space coordinates of the end position and the space coordinates of the initial position of the gesture recognized in the step four, and then determining how the robot should move, specifically:
the operator appears in the field of view of the camera according to the gesture in the right side of fig. 2 (b), and the device is pointed to for detection and tracking according to the method in the fourth step. The center of the rectangular frame that is always tracked can be regarded as the control point.
And calculating the space coordinate value of the gesture center by utilizing a stereoscopic vision principle parallax ranging method. The formula is as follows:
Z = f·d / (u_1 − u_2), X = (u_1 − u_0)·Z / f, Y = (v_1 − v_0)·Z / f
where X, Y, Z are the coordinates of the gesture center in space, u_1 is the x coordinate of the tracked marker point in the left-camera image coordinate system, u_0 is the x origin of the left-camera image coordinate system, u_2 is the x coordinate of the marker point in the right-camera image coordinate system, d is the translation distance (baseline) between the two cameras, v_1 is the y coordinate of the marker point in the left-camera image coordinate system, v_0 is the y origin of the left-camera image coordinate system, and f is the camera focal length.
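A direct transcription of this parallax-ranging relation, assuming rectified images and a focal length expressed in pixels (the function name is illustrative):

```python
def gesture_center_3d(u1, v1, u2, u0, v0, f, d):
    """Depth from disparity for a rectified stereo pair: Z = f*d/(u1 - u2),
    then back-projection of the left-image pixel to X and Y."""
    disparity = u1 - u2          # horizontal pixel offset between the two views
    Z = f * d / disparity
    X = (u1 - u0) * Z / f
    Y = (v1 - v0) * Z / f
    return X, Y, Z
```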
When the calibration button is clicked with the mouse, the current point is taken as the origin. When the control point leaves the sphere whose radius is the control threshold, a speed command is output whose magnitude is proportional to the offset distance; the calculation formula is
V=kd
where V is the output speed command, k is the control coefficient, and d is the distance by which the gesture center deviates from the initial position. The command drives the robot end effector in a translational motion.
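A minimal sketch of this dead-zone-plus-proportional rule; the gain, the sphere radius and the choice of directing the velocity along the offset vector are assumptions.

```python
import numpy as np

def velocity_command(center, origin, k=0.5, radius=0.03):
    """Return a 3-D velocity command: zero inside the calibration sphere,
    proportional to the offset (V = k * d) once the gesture centre leaves it."""
    offset = np.asarray(center, dtype=float) - np.asarray(origin, dtype=float)
    dist = float(np.linalg.norm(offset))
    if dist <= radius:
        return np.zeros(3)               # inside the dead zone: no motion
    return k * offset                    # magnitude k*d, directed along the offset
```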
Step six determines the initial posture and the final posture of the gesture recognized in step four, and from these determines how the robot should adjust its posture. The pose is represented by a rotation matrix or by Euler angles. Step six is specifically as follows:
the operator appears in the field of view of the camera according to the gesture in the left side of the figure 2, and the device performs detection and tracking according to the method in the step 3. And obtaining the outline of the gesture in the target rectangular frame by using a background modeling method, and obtaining the connecting dents of the index finger, the middle finger, the ring finger and the corresponding fingers by using convex hull detection and convex hull defect detection algorithms to obtain five feature points in total. And obtaining the space coordinates of the five characteristic points according to the positioning method in the step 4.
For hand segmentation, skin-color detection is combined with a background difference method. Skin-color detection converts the color space from RGB to HSV to obtain a better segmentation; to cope with the influence of illumination on skin color, a background difference method based on a Gaussian mixture model is added to obtain a more complete segmentation. After the binary image of the hand is obtained, noise is filtered out with the morphological opening operation and contour-area filtering. The Graham scan is then used to find the convex points of the hand contour, and the three highest points give the positions of the index, middle and ring fingertips. The points between adjacent convex points that are farthest from them, i.e., the valleys where the fingers join, are then computed, giving five feature points in total. The spatial coordinates of the five feature points are obtained with the positioning method of step 4.
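A rough OpenCV sketch of this segmentation and feature-point extraction is shown below. The HSV skin range, the morphology kernel, the "two deepest defects" rule and the assumption that fgmask comes from a Gaussian-mixture background subtractor (e.g., cv2.createBackgroundSubtractorMOG2) are illustrative choices, not values from the patent.

```python
import cv2
import numpy as np

def hand_feature_points(frame_bgr, fgmask):
    """Segment the hand with an HSV skin mask ANDed with a background-subtraction
    mask, then take finger tips and valleys from the convex hull and its defects.
    Returns up to five (x, y) points: three tips followed by two valleys."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    skin = cv2.inRange(hsv, (0, 30, 60), (25, 180, 255))      # rough skin range
    mask = cv2.bitwise_and(skin, fgmask)
    kernel = np.ones((5, 5), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)      # remove small noise
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return []
    hand = max(contours, key=cv2.contourArea)                  # largest blob = hand
    hull_pts = cv2.convexHull(hand)                            # convex points of the contour
    tips = sorted({tuple(p[0]) for p in hull_pts}, key=lambda p: p[1])[:3]  # 3 highest points
    hull_idx = cv2.convexHull(hand, returnPoints=False)
    defects = cv2.convexityDefects(hand, hull_idx)
    valleys = []
    if defects is not None:
        # Keep the two deepest defects: the valleys between adjacent fingers.
        for s, e, f, depth in sorted(defects[:, 0], key=lambda d: -d[3])[:2]:
            valleys.append(tuple(hand[f][0]))
    return list(tips) + valleys
```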
A coordinate system is defined on the palm for the gesture on the left of fig. 2: the root of the middle finger is the origin, the direction pointing to the fingertip of the middle finger is the positive y direction, the line parallel to the line connecting the two valleys is the x axis, and the direction pointing to the little finger is the positive x direction.
The rotation matrix is solved according to the Cayley theorem. The Cayley-vector representation of a rotation matrix is as follows: any rotation matrix R that does not have −1 as an eigenvalue and an antisymmetric matrix S_b satisfy
R = (I − S_b)^(−1) (I + S_b)
S_b = (R + I)^(−1) (R − I)
where I is the identity matrix and b = (b_1, b_2, b_3)^T is the Cayley vector, S_b being the antisymmetric matrix corresponding to b.
Let p_i be the spatial coordinate value of feature point i and q_i the coordinate value of feature point i in the palm coordinate system. The rotation matrix equation to be solved is then:
p_i = R·q_i
The above equation can be converted into:
S_{u_i}·b = −v_i
where:
v_i = p_i − q_i
u_i = p_i + q_i
and S_{u_i} is the antisymmetric matrix corresponding to u_i.
Stacking the equations for the five feature points gives:
A·b = c
where A = [S_{u_1}; S_{u_2}; …; S_{u_5}] and c = [−v_1; −v_2; …; −v_5].
solving the equation can obtain a Carley vector, and then calculating a rotation matrix R. The rotation matrix is then converted to a pitch-yaw-roll euler angle. And after clicking the calibration button by the mouse, taking the current posture as the original posture, and outputting an Euler angular velocity instruction according to the relative rotation angle to control the posture change of the robot.
Finally, if the operator moves the hand out of view or makes a gesture that is not in the template, control ends. Repeating step five and step six continues the control.
< example >
The specific process of one embodiment of the invention is as follows:
(1) Placing a binocular camera to keep parallel as much as possible, and performing stereo calibration and correction by using a Zhang Zhengyou checkerboard calibration method.
(2) The device enters the template-learning mode. A rectangular frame is drawn with the mouse around the gesture in the left-camera video, and the device starts learning and stores the template. The operator must keep the gesture unchanged during this step, but may moderately move and rotate the hand through multiple angles to simulate the rotations that may occur during control. After learning is completed, the operator clicks Save, and the device stores templates at multiple scales and under multiple transformations. If several different gesture templates are required, the above steps may be repeated.
(3) Binocular vision tracking and detection. The operator shows the gesture of step (2) in the camera's field of view, and the device detects it automatically. After the target is detected by the cascaded variance classifier, the ensemble (random-forest) classifier and the nearest-neighbor classifier, the Shi-Tomasi corner points in the target frame are tracked with the pyramidal LK optical-flow method, the tracking and detection results are fused, and the samples in the gesture template are updated; after tracking succeeds in the left-camera image, detection and tracking are performed on the epipolar line of the right-camera image, and if tracking succeeds in both the left and right views, the target rectangular frame is output.
(4) In the position control mode, the operator appears in the visual field of the camera according to the gesture in the step 2, and the device performs detection and tracking according to the method in the step 3. The center of the rectangular frame that is always tracked can be regarded as the control point. And obtaining the space coordinate value of the gesture center by using a stereoscopic vision three-dimensional reconstruction principle. After the mouse clicks the calibration button, the point is used as an origin, when the control point leaves a sphere with a control threshold value as a radius, a speed instruction is output, and the size of the speed instruction is in direct proportion to the offset distance: v = kd.
(5) In the attitude control mode, the operator shows the gesture of step 2 in the camera's field of view, and the device performs detection and tracking according to the method of step 3. The contour of the gesture in the target rectangular frame is obtained with a background modeling method, and convex hull detection and convexity-defect detection yield the fingertips of the index, middle and ring fingers and the valleys where the corresponding fingers join. For the coordinate system defined on the palm, the rotation matrix is solved according to the Cayley theorem and then converted into pitch-yaw-roll Euler angles. After the calibration button is clicked with the mouse, the current posture is taken as the original posture, and an Euler angular-velocity command is output according to the relative rotation angle to control the posture change of the robot.
(6) If the operator removes the hand from view or makes a gesture that is not in the template, control ends. Repeating step 3 and step 4 continues the control.
The present invention is capable of other embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and scope of the present invention.

Claims (9)

1. A multi-gesture robot control method based on binocular vision is characterized by comprising
Step one, setting a binocular camera, and calibrating and correcting;
secondly, performing gesture demonstration in the visual field range of the binocular camera by an operator, manually selecting a rectangular frame containing a gesture in a video shot by a left video camera of the binocular camera, and adding the rectangular frame into a training sample set;
step three, training a nearest neighbor classifier and a Bayes classifier by using a training sample set;
step four, the operator appears in the visual field of the binocular camera according to the gesture in the step two; the processor utilizes a cascade variance classifier according to the image of the left camera, and a Bayes classifier based on a random forest and a nearest neighbor classifier are used for detecting to obtain a target; tracking the target, fusing a tracking result and a detection result, updating samples in the gesture template, detecting and tracking on the polar line of the right camera image after the tracking is successful in the left camera image, and outputting a target rectangular frame if the tracking is successful in the left and right views;
step five, tracking the central point of the target rectangular frame; calculating the offset distance of the central point from the initial point to the target point, and outputting a speed control instruction to enable the robot to perform translational motion;
and step six, extracting characteristic points for describing the gesture outline from the target rectangular frame, and solving a rotation matrix corresponding to the characteristic points to enable the robot to perform posture conversion.
2. The binocular vision-based multi-gesture robot control method according to claim 1, wherein the first step specifically comprises:
step 1.1, setting the distance between the left camera and the right camera of the binocular camera to 20 cm, and placing the two cameras horizontally;
step 1.2, calibrating the binocular camera with the Zhang Zhengyou calibration method, and performing distortion elimination and row alignment on the views of the left and right cameras, so that the imaging origin coordinates of the two views are consistent, the optical axes are parallel, the imaging planes are coplanar and the epipolar lines are aligned.
3. The binocular vision-based multi-gesture robot control method according to claim 1, wherein the second step specifically comprises:
step 2.1, performing a gesture demonstration by the operator within the field of view of the binocular camera;
step 2.2, manually selecting a rectangular frame containing the gesture from the video shot by the left camera of the binocular camera;
step 2.3, scaling, rotating and affine-warping the image blocks in the selected rectangular frame and normalizing the results into image blocks of the same size to form a positive sample set; selecting a preset number of image blocks in the original image whose distance from the selected image block is greater than a preset threshold to form a negative sample set; the positive sample set and the negative sample set together constituting the training sample set.
4. The binocular vision-based multi-gesture robot control method of claim 1, wherein in step three,
calculating the posterior probability of the foreground class of the Bayesian classifier by the following formula:
p(y_1 | x_i) = p(x_i | y_1) · p(y_1) / p(x_i)
wherein y_1 denotes the foreground label: y_1 = 0 indicates that there is no target in the image and y_1 = 1 indicates that the image contains the target; x_i denotes the i-th feature of the image; each feature of the image is the grey-value order relation of two randomly selected points in the image, represented by 0 or 1.
5. The binocular vision-based multi-gesture robot control method of claim 4, wherein in step three,
the number of the Bayesian classifiers is 10; for feature x_i, the corresponding number of positive samples is #p, the number of negative samples is #n, and the total number of samples is #m; then
p(y_1 = 1 | x_i) = #p / (#p + #n)
p(y_1 | x_i) is solved for each Bayesian classifier and the results are averaged; if the average value is larger than a preset threshold value, the target is determined to exist in the image.
6. The binocular vision-based multi-gesture robot control method according to claim 1 or 4, wherein in step three, the nearest neighbor classifier is used for calculating the similarity of two image blocks, and the calculation formula is as follows:
S(P_1, P_2) = (1/n) Σ_x (P_1(x) − μ_1)(P_2(x) − μ_2) / (σ_1 σ_2)
wherein μ_1, μ_2 and σ_1, σ_2 represent the means and standard deviations of images P1 and P2; the more similar the two images, the closer the result is to 1; the distance between the two images is defined as d(P_1, P_2) = 1 − S(P_1, P_2); the image patch is considered to contain the object when the distance between the two images is less than a predetermined threshold.
7. The binocular vision based multi-gesture robot control method according to claim 6, wherein the fourth step is specifically:
step 4.1, the operator appearing in the field of view of the binocular camera with the gesture of step two;
step 4.2, manually selecting an initial rectangular frame by the operator;
step 4.3, the processor generating sliding rectangular frames, filtering out with the cascaded variance classifier the frames that do not meet the variance threshold condition, screening with the Bayesian classifier the image blocks that may contain the foreground, and then calculating with the nearest-neighbor classifier the similarity between each sliding rectangular frame and the manually selected initial rectangular frame;
step 4.4, selecting the rectangular frame with the highest overlap as the sample rectangular frame, and calculating the Shi-Tomasi corner points in the sample rectangular frame as the feature points;
step 4.5, calculating the forward prediction error, the backward prediction error and the similarity in the sample rectangular frame, and retaining the feature points whose error is smaller than the average of the forward and backward prediction errors and whose similarity is larger than a preset similarity threshold;
step 4.6, calculating the average displacement between the feature points retained in the current frame and the corresponding feature points of the previous frame to obtain the position of the target frame in the current frame, and obtaining the size of the target frame in the current frame from the ratio of the Euclidean distances between the feature points in the previous and current frames;
step 4.7, normalizing the target frame obtained in step 4.6 and calculating its similarity to all images in the positive sample set; if any similarity is greater than the specified threshold, the tracking is valid and the obtained target frame is added to the sample set; otherwise the tracking is considered invalid and the frame is discarded.
8. The binocular vision-based multi-gesture robot control method according to claim 7, wherein the step five specifically comprises:
step 5.1, taking the center point of the target frame obtained in step 4.6 as the gesture center point, and calculating the spatial coordinate value of the gesture center by the parallax ranging method of the stereoscopic-vision principle, specifically:
Z = f·d / (u_1 − u_2), X = (u_1 − u_0)·Z / f, Y = (v_1 − v_0)·Z / f
wherein X, Y, Z are the coordinates of the gesture center point in space, u_1 is the x coordinate of the tracked marker point in the left-camera image coordinate system, u_0 is the x origin of the left-camera image coordinate system, u_2 is the x coordinate of the marker point in the right-camera image coordinate system, d is the translation distance (baseline) between the two cameras, v_1 is the y coordinate of the marker point in the left-camera image coordinate system, v_0 is the y origin of the left-camera image coordinate system, and f is the camera focal length;
step 5.2, the operator setting an origin at an arbitrary position by means of the calibration button; when the processor detects that the gesture center point has left the sphere whose radius is the preset control threshold, outputting a speed control command calculated as:
V=kd
wherein V is the output speed control command, k is the control coefficient, and d is the distance of the gesture center from the initial position; the speed control command is used to make the robot perform a translational motion.
9. The binocular vision based multi-gesture robot control method according to claim 8, wherein the sixth step specifically includes:
step 6.1, obtaining the contour of the gesture in the target frame obtained in step four by a method combining skin-color detection with a background difference method, and obtaining 5 feature points, namely the fingertips of the index finger, the middle finger and the ring finger, the valley between the index finger and the middle finger, and the valley between the middle finger and the ring finger, through convex hull detection and convexity-defect detection algorithms; obtaining the spatial coordinates of the 5 feature points with the formula of step 5.1;
step 6.2, defining a coordinate system on the palm: the root of the middle finger is the origin, the direction pointing to the fingertip of the middle finger is the positive y direction, the line parallel to the line connecting the two valleys is the x axis, and the direction pointing to the little finger is the positive x direction;
step 6.3, solving the rotation matrix from the 5 feature points according to the Cayley theorem;
step 6.4, converting the rotation matrix into pitch-yaw-roll Euler angles, obtaining the relative rotation angle between the current gesture and the original state, and outputting an Euler angular-velocity command according to the relative rotation angle to control the posture change of the robot.
CN201711176221.4A 2017-11-22 2017-11-22 Multi-gesture robot control method based on binocular vision Active CN107813310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711176221.4A CN107813310B (en) 2017-11-22 2017-11-22 Multi-gesture robot control method based on binocular vision

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711176221.4A CN107813310B (en) 2017-11-22 2017-11-22 Multi-gesture robot control method based on binocular vision

Publications (2)

Publication Number Publication Date
CN107813310A true CN107813310A (en) 2018-03-20
CN107813310B CN107813310B (en) 2020-10-20

Family

ID=61609771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711176221.4A Active CN107813310B (en) 2017-11-22 2017-11-22 Multi-gesture robot control method based on binocular vision

Country Status (1)

Country Link
CN (1) CN107813310B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120062736A1 (en) * 2010-09-13 2012-03-15 Xiong Huaixin Hand and indicating-point positioning method and hand gesture determining method used in human-computer interaction system
CN102350700A (en) * 2011-09-19 2012-02-15 华南理工大学 Method for controlling robot based on visual sense
CN104463191A (en) * 2014-10-30 2015-03-25 华南理工大学 Robot visual processing method based on attention mechanism
US20160170481A1 (en) * 2014-11-07 2016-06-16 Eye Labs, LLC Visual stabilization system for head-mounted displays
CN104680127A (en) * 2014-12-18 2015-06-03 闻泰通讯股份有限公司 Gesture identification method and gesture identification system
CN104821010A (en) * 2015-05-04 2015-08-05 清华大学深圳研究生院 Binocular-vision-based real-time extraction method and system for three-dimensional hand information
CN106502418A (en) * 2016-11-09 2017-03-15 南京阿凡达机器人科技有限公司 A kind of vision follower method based on monocular gesture identification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
孔欣: "Research on Gesture Recognition Based on Binocular Stereo Vision", China Master's Theses Full-text Database, Information Science and Technology *
李红英: "Research on Key Technologies of Vision-Based Gesture Recognition", China Master's Theses Full-text Database, Information Science and Technology *
王辉: "Vision-Based Real-Time Gesture Tracking and Recognition and Its Application in Human-Computer Interaction", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109101872A (en) * 2018-06-20 2018-12-28 济南大学 A kind of generation method of 3D gesture mouse
CN109101872B (en) * 2018-06-20 2023-04-18 济南大学 Method for generating 3D gesture mouse
CN109271931A (en) * 2018-09-14 2019-01-25 辽宁奇辉电子***工程有限公司 It is a kind of that gesture real-time identifying system is pointed sword at based on edge analysis
CN109635648A (en) * 2018-11-05 2019-04-16 上海鲸鱼机器人科技有限公司 Robot and its control method
CN109460077A (en) * 2018-11-19 2019-03-12 深圳博为教育科技有限公司 A kind of automatic tracking method, automatic tracking device and automatic tracking system
CN109460077B (en) * 2018-11-19 2022-05-17 深圳博为教育科技有限公司 Automatic tracking method, automatic tracking equipment and automatic tracking system
CN112917470A (en) * 2019-12-06 2021-06-08 鲁班嫡系机器人(深圳)有限公司 Teaching method, device and system of manipulator, storage medium and equipment
CN111046796A (en) * 2019-12-12 2020-04-21 哈尔滨拓博科技有限公司 Low-cost space gesture control method and system based on double-camera depth information
CN110916577A (en) * 2019-12-17 2020-03-27 小狗电器互联网科技(北京)股份有限公司 Robot static state judgment method and device and robot
CN111015657A (en) * 2019-12-19 2020-04-17 佛山科学技术学院 Adaptive control method, device and system of industrial robot
CN111216133A (en) * 2020-02-05 2020-06-02 广州中国科学院先进技术研究所 Robot demonstration programming method based on fingertip identification and hand motion tracking
CN111216133B (en) * 2020-02-05 2022-11-22 广州中国科学院先进技术研究所 Robot demonstration programming method based on fingertip identification and hand motion tracking
CN111367415B (en) * 2020-03-17 2024-01-23 北京明略软件***有限公司 Equipment control method and device, computer equipment and medium
CN111367415A (en) * 2020-03-17 2020-07-03 北京明略软件***有限公司 Equipment control method and device, computer equipment and medium
CN111462240B (en) * 2020-04-08 2023-05-30 北京理工大学 Target positioning method based on multi-monocular vision fusion
CN111462240A (en) * 2020-04-08 2020-07-28 北京理工大学 Target positioning method based on multi-monocular vision fusion
CN111539979A (en) * 2020-04-27 2020-08-14 天津大学 Human body front tracking method based on deep reinforcement learning
CN111539979B (en) * 2020-04-27 2022-12-27 天津大学 Human body front tracking method based on deep reinforcement learning
CN113741550A (en) * 2020-05-15 2021-12-03 北京机械设备研究所 Mobile robot following method and system
CN113741550B (en) * 2020-05-15 2024-02-02 北京机械设备研究所 Mobile robot following method and system
CN111949925B (en) * 2020-06-30 2023-08-29 中国资源卫星应用中心 Image relative orientation method and device based on Rodriger matrix and maximum convex hull
CN111949925A (en) * 2020-06-30 2020-11-17 中国资源卫星应用中心 Image relative orientation method and device based on Reed-Solomon matrix and maximum convex hull
CN112560592A (en) * 2020-11-30 2021-03-26 深圳市商汤科技有限公司 Image processing method and device, and terminal control method and device
CN112749664A (en) * 2021-01-15 2021-05-04 广东工贸职业技术学院 Gesture recognition method, device, equipment, system and storage medium
CN113255612A (en) * 2021-07-05 2021-08-13 智道网联科技(北京)有限公司 Preceding vehicle starting reminding method and system, electronic device and storage medium
CN113822251B (en) * 2021-11-23 2022-02-08 齐鲁工业大学 Ground reconnaissance robot gesture control system and control method based on binocular vision
CN113822251A (en) * 2021-11-23 2021-12-21 齐鲁工业大学 Ground reconnaissance robot gesture control system and control method based on binocular vision
CN117340914A (en) * 2023-10-24 2024-01-05 哈尔滨工程大学 Humanoid robot human body feeling control method and control system
CN117340914B (en) * 2023-10-24 2024-05-14 哈尔滨工程大学 Humanoid robot human body feeling control method and control system

Also Published As

Publication number Publication date
CN107813310B (en) 2020-10-20

Similar Documents

Publication Publication Date Title
CN107813310B (en) Multi-gesture robot control method based on binocular vision
CN112476434B (en) Visual 3D pick-and-place method and system based on cooperative robot
CN108369643B (en) Method and system for 3D hand skeleton tracking
CN113524194B (en) Target grabbing method of robot vision grabbing system based on multi-mode feature deep learning
Tzionas et al. Capturing hands in action using discriminative salient points and physics simulation
CN109993073B (en) Leap Motion-based complex dynamic gesture recognition method
JP5812599B2 (en) Information processing method and apparatus
CN110842914A (en) Hand-eye calibration parameter identification method, system and medium based on differential evolution algorithm
CN109559341B (en) Method and device for generating mechanical arm grabbing scheme
Schröder et al. Real-time hand tracking using synergistic inverse kinematics
CN111897349A (en) Underwater robot autonomous obstacle avoidance method based on binocular vision
CN112906797A (en) Plane grabbing detection method based on computer vision and deep learning
Hao et al. Vision-based surgical tool pose estimation for the da vinci® robotic surgical system
JP2016099982A (en) Behavior recognition device, behaviour learning device, method, and program
CN113393524B (en) Target pose estimation method combining deep learning and contour point cloud reconstruction
CN115816460B (en) Mechanical arm grabbing method based on deep learning target detection and image segmentation
JP2017123087A (en) Program, device and method for calculating normal vector of planar object reflected in continuous photographic images
CN114387513A (en) Robot grabbing method and device, electronic equipment and storage medium
CN116766194A (en) Binocular vision-based disc workpiece positioning and grabbing system and method
Jaiswal et al. Deep learning based command pointing direction estimation using a single RGB camera
JP3822482B2 (en) Face orientation calculation method and apparatus
Guðmundsson et al. Model-based hand gesture tracking in tof image sequences
CN112712030A (en) Three-dimensional attitude information restoration method and device
CN116188540A (en) Target identification and pose estimation method based on point cloud information
KR101868520B1 (en) Method for hand-gesture recognition and apparatus thereof

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211213

Address after: 315200 No. 2, Dongyuan, Hengda Shanshui City, Jiulonghu Town, Zhenhai District, Ningbo City, Zhejiang Province

Patentee after: Yu Xinghu

Address before: 325035 k604, scientific research and entrepreneurship building, Huazhong academy, No. 225, Chaoyang new street, Chashan street, Ouhai District, Wenzhou City, Zhejiang Province

Patentee before: ZHEJIANG YOUMAIDE INTELLIGENT EQUIPMENT Co.,Ltd.

TR01 Transfer of patent right