CN114419154A - Mechanical arm dual-mode control method and system based on vision and man-machine cooperation - Google Patents

Mechanical arm dual-mode control method and system based on vision and man-machine cooperation

Info

Publication number
CN114419154A
CN114419154A (application number CN202210051511.0A)
Authority
CN
China
Prior art keywords
unit
information
scene
dual
mechanical arm
Prior art date
Legal status
Pending
Application number
CN202210051511.0A
Other languages
Chinese (zh)
Inventor
解仑
左利钢
Current Assignee
University of Science and Technology Beijing USTB
Original Assignee
University of Science and Technology Beijing USTB
Priority date
Filing date
Publication date
Application filed by University of Science and Technology Beijing USTB filed Critical University of Science and Technology Beijing USTB
Priority to CN202210051511.0A
Publication of CN114419154A

Classifications

    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06F 18/253: Fusion techniques of extracted features
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 5/70: Denoising; Smoothing
    • G06T 2207/10028: Range image; Depth image; 3D point clouds


Abstract

The invention discloses a mechanical arm dual-mode control method and system based on vision and man-machine cooperation, relating to the technical field of complex operation of industrial robots. The system comprises the following units: a three-dimensional reconstruction unit for constructing a virtual scene; a visual identification unit for identifying the category of a target object in the remote operation scene and estimating its pose; an operation terminal for receiving the visual information and state information fed back by the information interaction unit and sending control instructions to the information interaction unit; an information interaction unit for receiving the control instructions issued by the operation terminal and passing them on to the dual-mode control unit; a dual-mode control unit for receiving the control instructions issued by the operation terminal and issuing driving instructions to the bottom layer driving unit; and a bottom layer driving unit for receiving the driving instructions issued by the dual-mode control unit, processing them through a smoothing algorithm and driving the mechanical arm to complete flexible motion. The invention can solve problems such as the insufficient intelligence and automation of existing intelligent robot technology in high-risk industries.

Description

Mechanical arm dual-mode control method and system based on vision and man-machine cooperation
Technical Field
The invention relates to the technical field of complex operation of industrial robots, in particular to a mechanical arm dual-mode control method and system based on vision and man-machine cooperation.
Background
With the development of science and technology, all industries are becoming automated, intelligent and digitalized, and intelligent robots have become an important component of the industrial field. The development level of robot technology is an important measure of a country's technological level and industrial intelligence. With the arrival of the artificial intelligence era, various intelligent robots have been developed and brought into factories to replace people in high-difficulty, high-risk and high-intensity work. At the present stage, because factory environments are harsh, the intelligence and automation capabilities of robots are still weak, and robots may even be unable to operate automatically in such environments; two working modes, intelligent operation and man-machine cooperation, are therefore needed to jointly complete high-difficulty tasks. The visual interaction system is a core component of the robot system: the pose of a target object can be calculated in real time through a deep learning algorithm, the remote operation environment can be fed back to the operator, and the operator can complete high-difficulty operation tasks efficiently and quickly.
In industries such as aerospace, the nuclear industry and the chemical industry, operating environments are harsh, risk coefficients are high, accidents occur frequently and people's life safety is seriously threatened; the wide application of intelligent robots in these industrial fields is therefore an inevitable trend.
Disclosure of Invention
To address problems such as the insufficient intelligence and automation of existing intelligent robot technology in high-risk industries, the invention provides a mechanical arm dual-mode control method and system based on vision and man-machine cooperation.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, the invention provides a mechanical arm dual-mode control system based on vision and man-machine cooperation, used for implementing a mechanical arm dual-mode control method based on vision and man-machine cooperation; the system comprises a three-dimensional reconstruction unit, a visual identification unit, an operation terminal, an information interaction unit, a dual-mode control unit and a bottom layer driving unit, wherein:
The three-dimensional reconstruction unit is used for carrying out point cloud registration on the acquired point cloud data to obtain an initial scene point cloud; segmenting and rendering the initial scene point cloud according to a point cloud segmentation algorithm to obtain a scene point cloud, and constructing a virtual scene according to the scene point cloud; feeding back the acquired real scene information to the information interaction unit; and feeding back the scene point cloud to the dual-mode control unit to complete the obstacle avoidance operation.
The visual identification unit is used for identifying the type of a target object in a remote operation scene, estimating the pose of the target object, feeding back the information of the remote operation scene to the information interaction unit and the dual-mode control unit in real time and assisting an operation handle to complete man-machine cooperative operation; the remote operation scene information comprises pose information and visual information.
And the operation terminal is used for receiving the visual information and the state information fed back by the information interaction unit, monitoring the running state of the mechanical arm in real time, displaying the virtual scene, and issuing a control instruction to the information interaction unit according to an automatic or man-machine cooperative operation mode.
The information interaction unit is used for receiving the control instruction issued by the operation terminal and issuing the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; and receiving real scene information and remote operation scene information in real time.
The dual-mode control unit is used for receiving a control instruction issued by the operation terminal and issuing a driving instruction to the bottom layer driving unit according to the automatic control instruction or the man-machine cooperative control instruction; and receiving scene point cloud and pose information.
And the bottom layer driving unit is used for receiving the driving instruction issued by the dual-mode control unit, converting the driving instruction into joint angle, speed and torque information, processing it through a smooth interpolation algorithm and driving the mechanical arm to flexibly complete complex operation tasks.
Optionally, the manner of acquiring the point cloud data and the real scene information includes:
and acquiring point cloud data and real scene information by adopting a laser radar and a depth camera.
Optionally, the operation terminal includes a security monitoring unit and a virtual scene unit.
And the safety monitoring unit is used for detecting whether the mechanical arm normally operates or not through the state information received by the information interaction unit, and actively early warning and alarming if the mechanical arm fails in operation.
And the virtual scene unit is used for displaying the remote operation process in real time according to the virtual scene of the three-dimensional reconstruction unit and the state information fed back by the information interaction unit.
Optionally, the state information received by the information interaction unit is used for detecting whether the mechanical arm operates normally, and if the mechanical arm fails in operation, the active early warning and the alarm are performed, wherein the active early warning and the alarm comprise:
and inputting the state information received by the information interaction unit into a trained prediction model to obtain the operation state of the mechanical arm, and actively issuing an abnormal control instruction and alarming if the mechanical arm fails in operation.
Optionally, the state information received by the information interaction unit includes: joint angle, velocity, pressure, power.
Optionally, the training process of the prediction model includes:
an initial prediction model is built through a convolutional neural network, a data set is collected according to an operation scene, and the initial prediction model is trained according to the data set to obtain the trained prediction model.
Optionally, the alarm comprises a voice alarm and a picture alarm.
Optionally, the information interaction unit is configured to receive a control instruction issued by the operation terminal and issue the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; the real scene information and the remote operation scene information are received in real time, and the method comprises the following steps:
the information interaction unit adopts a 5G communication module to receive a control instruction issued by the operation terminal and issues the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; and receiving real scene information and remote operation scene information in real time.
Optionally, the vision recognition unit comprises a pose estimator; the pose estimator comprises a semantic segmentation network, a feature extraction network, a feature fusion network, a pose estimation network and a pose fine adjustment network.
And the semantic segmentation network is used for identifying and segmenting the target object to form an RGB (red, green and blue) image, a depth image and a mask image containing the target object.
And the feature extraction network is used for respectively extracting features of the RGB image, the depth image and the mask image of the target object through three parallel feature extraction channels, and converting the extracted features into features with the same format through the feature compression network.
And the feature fusion network is used for performing feature fusion on the features with the same format to obtain pixel-level fusion features.
And the pose estimation network is used for predicting the pose information of the target object according to the pixel-level fusion characteristics.
And the pose fine tuning network is used for adjusting the pose information according to an iterative algorithm to obtain the corrected pose information.
Optionally, the training process of the semantic segmentation network includes:
and constructing an initial semantic segmentation network based on the convolutional neural network, and training the initial semantic segmentation network through the obtained semantic segmentation image data set to obtain the trained semantic segmentation network.
The acquisition method of the semantic segmentation image data set comprises the steps of acquiring images of a target object under different illumination and backgrounds, and labeling different object types by using an object type labeling tool to obtain the semantic segmentation image data set.
In another aspect, the invention provides a mechanical arm dual-mode control method based on vision and man-machine cooperation, which is realized by a mechanical arm dual-mode control system based on vision and man-machine cooperation, wherein the system comprises a three-dimensional reconstruction unit, a visual identification unit, an operation terminal, an information interaction unit, a dual-mode control unit and a bottom layer driving unit;
the method comprises an automatic control method and a man-machine cooperative control method.
The automatic control method comprises the following steps: the visual recognition unit calculates the pose of the target object in the space; outputting a scene point cloud by a three-dimensional reconstruction unit; the dual-mode control unit generates an obstacle avoidance operation path according to the position and the scene point cloud of the target object in the space and issues a driving instruction to the bottom layer driving unit; and the bottom driving unit calculates a smooth path through a smooth interpolation algorithm and drives the mechanical arm to complete the remote operation task.
The man-machine cooperative control method comprises the following steps: an operator issues driving instructions through the remote operation process displayed in real time by the operation terminal and through the operating handle; the bottom layer driving unit parses the control instruction into joint angle, speed and torque information and drives the mechanical arm to complete the man-machine cooperative remote operation task.
Optionally, the manner of acquiring the point cloud data and the real scene information includes:
and acquiring point cloud data and real scene information by adopting a laser radar and a depth camera.
Optionally, the operation terminal includes a security monitoring unit and a virtual scene unit.
And the safety monitoring unit is used for detecting whether the mechanical arm normally operates or not through the state information received by the information interaction unit, and actively early warning and alarming if the mechanical arm fails in operation.
And the virtual scene unit is used for displaying the remote operation process in real time according to the virtual scene of the three-dimensional reconstruction unit and the state information fed back by the information interaction unit.
Optionally, the state information received by the information interaction unit is used for detecting whether the mechanical arm operates normally, and if the mechanical arm fails in operation, the active early warning and the alarm are performed, wherein the active early warning and the alarm comprise:
and inputting the state information received by the information interaction unit into a trained prediction model to obtain the operation state of the mechanical arm, and actively issuing an abnormal control instruction and alarming if the mechanical arm fails in operation.
Optionally, the state information received by the information interaction unit includes: joint angle, velocity, pressure, power.
Optionally, the training process of the prediction model includes:
an initial prediction model is built through a convolutional neural network, a data set is collected according to an operation scene, and the initial prediction model is trained according to the data set to obtain the trained prediction model.
Optionally, the alarm comprises a voice alarm and a picture alarm.
Optionally, the information interaction unit is configured to receive a control instruction issued by the operation terminal and issue the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; the real scene information and the remote operation scene information are received in real time, and the method comprises the following steps:
the information interaction unit adopts a 5G communication module to receive a control instruction issued by the operation terminal and issues the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; and receiving real scene information and remote operation scene information in real time.
Optionally, the vision recognition unit comprises a pose estimator; the pose estimator comprises a semantic segmentation network, a feature extraction network, a feature fusion network, a pose estimation network and a pose fine adjustment network.
And the semantic segmentation network is used for identifying and segmenting the target object to form an RGB (red, green and blue) image, a depth image and a mask image containing the target object.
And the feature extraction network is used for respectively extracting features of the RGB image, the depth image and the mask image of the target object through three parallel feature extraction channels, and converting the extracted features into features with the same format through the feature compression network.
And the feature fusion network is used for performing feature fusion on the features with the same format to obtain pixel-level fusion features.
And the pose estimation network is used for predicting the pose information of the target object according to the pixel-level fusion characteristics.
And the pose fine tuning network is used for adjusting the pose information according to an iterative algorithm to obtain the corrected pose information.
Optionally, the training process of the semantic segmentation network includes:
and constructing an initial semantic segmentation network based on the convolutional neural network, and training the initial semantic segmentation network through the obtained semantic segmentation image data set to obtain the trained semantic segmentation network.
The acquisition method of the semantic segmentation image data set comprises the steps of acquiring images of a target object under different illumination and backgrounds, and labeling different object types by using an object type labeling tool to obtain the semantic segmentation image data set.
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
In the above scheme, the control mode is selected according to whether the motion planning algorithm finds a solution and according to the on-site working environment. In the automatic mode: the pose of the target object is calculated in real time by the visual identification unit; the three-dimensional reconstruction unit receives the point clouds of the laser radar and the depth camera and reconstructs the scene point cloud; the control unit receives the object pose and scene point cloud information, automatically generates an obstacle avoidance operation path, and issues a driving instruction to the bottom layer driving unit; the bottom layer driving unit calculates a smooth path through a smooth interpolation algorithm and drives the mechanical arm to move flexibly and complete the operation task. In the man-machine cooperation mode: according to the visual information of the virtual scene unit and the visual identification unit, mechanical arm motion control instructions are issued through the operating handle; the bottom layer driving unit parses the control instructions into joint angle, speed, torque and other information and drives the mechanical arm to complete the man-machine cooperative operation. In both control modes the safety monitoring unit monitors the running state of the system in real time through the data fed back by the information interaction unit, guaranteeing safe and reliable operation of the system.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a block diagram of a dual-mode control system of a robot arm based on vision and man-machine interaction provided by an embodiment of the invention;
FIG. 2 is a schematic diagram illustrating an operation principle of a three-dimensional reconstruction unit according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the operation of a visual identification unit according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an operating principle of a security monitoring unit according to an embodiment of the present invention;
fig. 5 is a schematic diagram of an operating principle of a virtual scene unit according to an embodiment of the present invention;
FIG. 6 is a schematic diagram illustrating an operation principle of an information interaction unit according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the operating principle of the dual-mode control unit provided by the embodiment of the invention;
FIG. 8 is a schematic diagram of the operation of the bottom drive unit provided by the embodiment of the present invention;
FIG. 9 is a structural schematic diagram of a dual-mode control system of a robot arm based on vision and man-machine interaction provided by the embodiment of the invention;
FIG. 10 is a flowchart of a dual-mode control method for a robot arm based on vision and human-machine interaction according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, an embodiment of the present invention provides a mechanical arm dual-mode control system based on vision and man-machine cooperation, where the system comprises a three-dimensional reconstruction unit, a visual identification unit, an operation terminal, an information interaction unit, a dual-mode control unit and a bottom layer driving unit, wherein:
1. the three-dimensional reconstruction unit is used for carrying out point cloud registration on the acquired point cloud data to obtain an initial scene point cloud; according to a point cloud segmentation algorithm, segmenting and rendering initial scene point cloud to obtain scene point cloud, and constructing a virtual scene according to the scene point cloud; feeding back the acquired real scene information to an information interaction unit; and feeding back the scene point cloud to the dual-mode control unit to complete the obstacle avoidance task.
In a feasible implementation, as shown in fig. 2, for scene reconstruction, point cloud information can be output by the laser radar and the depth camera, and multiple pieces of point cloud data are spliced by applying RANSAC (Random Sample Consensus) and ICP (Iterative Closest Point) point cloud registration algorithms to form a whole-scene point cloud. The obstacle point cloud and the target object point cloud are then segmented according to a region-growing point cloud segmentation algorithm, and obstacle avoidance operation in complex scenes is realized by combining the motion planning algorithm with the point cloud data.
Two types of sensors, a laser radar and a depth camera, are used to obtain the point cloud data: the laser radar obtains large-scale sparse point clouds, and the depth camera obtains small-scale dense point clouds. By registering the two types of point cloud data, a large-scale scene point cloud can be obtained; the dense point cloud data compensates for the sparseness of the large-scale point cloud, and an evenly distributed point cloud is obtained. The embodiment of the application can adopt an RS-LiDAR-16 series laser radar with ranging capability 0.1 m-40 m, measurement range 290°, measurement accuracy ±20 mm at 0.1-20 m and ±30 mm at 20-40 m, angular resolution 0.25° (360°/1,440 steps), and scan time 20 ms; the depth camera can adopt a RealSense D455, which outputs close-range point cloud data.
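The registration pipeline described above can be illustrated with a short sketch (not part of the patent) that assumes the Open3D library: the dense camera cloud is coarsely aligned to the sparse lidar cloud with RANSAC over FPFH feature correspondences and then refined with ICP. The voxel sizes and distance thresholds are illustrative assumptions.

```python
# Minimal sketch (not the patent's implementation): RANSAC coarse alignment
# plus ICP refinement of a dense depth-camera cloud onto a sparse lidar cloud.
import open3d as o3d

def preprocess(pcd, voxel):
    # Downsample, estimate normals, and compute FPFH descriptors for matching
    down = pcd.voxel_down_sample(voxel)
    down.estimate_normals(o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
    fpfh = o3d.pipelines.registration.compute_fpfh_feature(
        down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
    return down, fpfh

def register_scene(lidar_pcd, camera_pcd, voxel=0.05):
    src, src_fpfh = preprocess(camera_pcd, voxel)   # dense, small-scale cloud
    tgt, tgt_fpfh = preprocess(lidar_pcd, voxel)    # sparse, large-scale cloud
    # Coarse alignment: RANSAC over FPFH feature correspondences
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src, tgt, src_fpfh, tgt_fpfh, True, voxel * 1.5,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
        [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(voxel * 1.5)],
        o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))
    # Fine alignment: point-to-plane ICP starting from the coarse transform
    fine = o3d.pipelines.registration.registration_icp(
        src, tgt, voxel * 0.4, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    camera_pcd.transform(fine.transformation)       # apply alignment in place
    return camera_pcd + lidar_pcd                   # merged whole-scene cloud
```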
2. And the visual identification unit is used for identifying the category of the target object in the remote operation scene, estimating the pose of the target object and feeding back the information of the remote operation scene to the information interaction unit and the dual-mode control unit in real time.
The remote operation scene information comprises pose information and visual information.
In one possible embodiment, as shown in fig. 3, the visual identification unit can assist the operating handle in completing man-machine cooperative operation.
Optionally, the vision recognition unit comprises a pose estimator; the pose estimator comprises a semantic segmentation network, a feature extraction network, a feature fusion network, a pose estimation network and a pose fine adjustment network.
1) And the semantic segmentation network is used for identifying and segmenting the target object to form an RGB (red, green and blue) image, a depth image and a mask image containing the target object.
Optionally, the training process of the semantic segmentation network includes:
building a RefineNet semantic segmentation network based on a convolutional neural network, and training the RefineNet semantic segmentation network through the obtained semantic segmentation image data set to obtain the trained semantic segmentation network.
The method for acquiring the semantic segmentation image data set comprises the steps of acquiring images of the target object under different illumination and backgrounds and labeling the different object types with a scribble-style semantic segmentation labeling tool to obtain the semantic segmentation image data set.
In a possible implementation, the target object can be identified through the semantic segmentation model and extracted from the scene to form an RGB image, a depth image and a mask image containing only the target object, reducing the irrelevant features input into the model and thereby improving the accuracy of pose prediction.
Further, inputting an image into the trained semantic segmentation network outputs all pixel points of the target object in the image. The specific segmentation process can be as follows: first, a frame of RGB image is obtained and input into the semantic segmentation network to obtain the mask information of the target object; all pixel points of the target object in the image are then extracted through the mask information; finally, an RGB image and a depth image containing only the target object are obtained.
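As a hedged illustration of the segmentation step just described, the sketch below (not from the patent) applies a predicted per-pixel class mask to the RGB and depth images so that only the target object's pixels remain; the array shapes and the function name are assumptions.

```python
# Illustrative sketch: keep only the target object's pixels using the mask
# predicted by the segmentation network.
import numpy as np

def extract_target(rgb, depth, mask, target_class_id):
    """rgb: HxWx3, depth: HxW, mask: HxW array of per-pixel class ids."""
    keep = (mask == target_class_id)
    rgb_obj = np.where(keep[..., None], rgb, 0)    # zero out non-target pixels
    depth_obj = np.where(keep, depth, 0)
    ys, xs = np.nonzero(keep)                       # bounding box of the object
    bbox = (ys.min(), ys.max() + 1, xs.min(), xs.max() + 1) if ys.size else None
    return rgb_obj, depth_obj, keep.astype(np.uint8), bbox
```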
2) And the feature extraction network is used for respectively extracting features of the RGB image, the depth image and the mask image of the target object through the three parallel feature extraction channels, and converting the extracted features into features with the same format through the feature compression network.
In a feasible implementation mode, the feature extraction network converts the extracted features into features with the same format, so that the feature fusion network can conveniently fuse the image features and the point cloud features of the target object, and introduce the spatial features of the target object to weaken the influence of illumination and object color.
Specifically, the feature extraction performed by the three parallel feature extraction channels may be as follows: image features (the texture and color features of the target object) are extracted with a convolutional ResNet (Residual Network) + PSPNet (Pyramid Scene Parsing Network) structure; depth features are spatial features extracted from the target object based on a PointNet (point cloud feature extraction network) structure; and mask features are extracted from the target object based on a VGG (Visual Geometry Group) network structure.
3) And the characteristic fusion network is used for carrying out characteristic fusion on the characteristics with the same format to obtain pixel-level fusion characteristics.
In a feasible implementation manner, the feature fusion network is constructed by a convolutional neural network, the point cloud feature, the image feature and the mask feature of the target object are subjected to pixel-level fusion, and the pose of the target object is estimated by using the fused pixel-level feature.
4) And the pose estimation network is used for predicting the pose information of the target object according to the pixel-level features.
In one possible implementation, an object pose estimation network is constructed with a convolutional neural network and trained on an object pose data set. The specific pose estimation process is as follows: features are extracted separately from the RGB image, mask image, depth image and point cloud model of the target object obtained by the semantic segmentation network; the extracted features are fused at pixel level; and the fused features are input into the pose estimation network, which predicts the pose of the target object in space from them.
The object pose data set can be a pose estimation data set made by acquiring RGB images of the target object under different illumination and backgrounds and labeling the pose information of the target object with a 3D labeling tool.
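A hedged sketch of the pixel-level fusion and pose regression idea described above is given below, in the spirit of DenseFusion-style estimators; the layer sizes, channel counts and head design are illustrative assumptions, not the patent's actual network.

```python
# Sketch of pixel-level feature fusion followed by per-pixel pose regression;
# the most confident pixel's prediction is taken as the object pose.
import torch
import torch.nn as nn

class PixelFusionPoseNet(nn.Module):
    def __init__(self, img_dim=32, geo_dim=32, mask_dim=32):
        super().__init__()
        fused = img_dim + geo_dim + mask_dim
        self.fuse = nn.Sequential(nn.Conv1d(fused, 128, 1), nn.ReLU(),
                                  nn.Conv1d(128, 256, 1), nn.ReLU())
        self.rot_head = nn.Conv1d(256, 4, 1)    # per-pixel quaternion
        self.trans_head = nn.Conv1d(256, 3, 1)  # per-pixel translation
        self.conf_head = nn.Conv1d(256, 1, 1)   # per-pixel confidence

    def forward(self, img_feat, geo_feat, mask_feat):
        # Each input: (batch, channels, n_pixels), already brought to the same
        # format by the feature compression network described above.
        x = self.fuse(torch.cat([img_feat, geo_feat, mask_feat], dim=1))
        quat = nn.functional.normalize(self.rot_head(x), dim=1)
        trans = self.trans_head(x)
        conf = torch.sigmoid(self.conf_head(x))
        best = conf.argmax(dim=2, keepdim=True)         # most confident pixel
        q = torch.gather(quat, 2, best.expand(-1, 4, -1)).squeeze(2)
        t = torch.gather(trans, 2, best.expand(-1, 3, -1)).squeeze(2)
        return q, t, conf
```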
5) And the pose fine tuning network is used for adjusting the pose information according to an iterative algorithm to obtain the corrected pose information.
In a feasible implementation mode, the pose fine tuning network is constructed through the convolutional neural network, and the pose is finely tuned by adopting an iterative algorithm, so that the result output by the pose estimation network can be corrected, and the accuracy of the whole pose estimator is improved.
In a feasible implementation, the RealSense D455 depth camera can be used to acquire images at a resolution of 1280 × 800, generating RGB and depth maps simultaneously at a frame rate of 90 fps over a USB 3.0 interface, with a depth perception range of 0.4 m to 6 m; the camera is compact and easy to install and suits factory operation scenes. The whole network model is deployed in a real-time Linux system, meeting the real-time requirements of the whole system.
3. And the operation terminal is used for receiving the visual information and the state information fed back by the information interaction unit and sending a control command to the information interaction unit.
Optionally, the operation terminal includes a security monitoring unit and a virtual scene unit.
1) And the safety monitoring unit is used for detecting whether the mechanical arm normally operates or not through the state information received by the information interaction unit, and actively early warning and alarming if the mechanical arm fails in operation.
The active early warning mechanism is a prediction algorithm built with a convolutional neural network; the acquired data set is expanded through random augmentation so as to cover, as far as possible, the different conditions encountered in industrial production, giving the prediction model stronger generalization ability and allowing a prediction result to be output in a short time. The prediction model can predict the operation state of the mechanical arm system in real time and actively issue abnormal control instructions, guaranteeing the safe operation of the mechanical arm system in real time.
The prediction algorithm has a plurality of feature extraction channels that extract features separately from the state data generated by the mechanical arm system; the features are then fused, and the operation state of the mechanical arm is predicted through a multi-data feature fusion framework.
The alarm function is to carry out accurate alarm according to the result output by the prediction model, wherein two alarm modes of voice and image are adopted, the voice mode can rapidly remind an operator of abnormal occurrence, the image mode can precisely prompt a specific abnormal point, and the combination of the two modes can realize accurate and rapid alarm.
Optionally, the training process of the prediction model includes:
an initial prediction model is built through a convolutional neural network, a data set is collected according to an operation scene, and the initial prediction model is trained according to the data set to obtain the trained prediction model.
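A minimal sketch of such a prediction model is shown below, assuming one convolutional feature extraction channel per state signal (joint angle, speed, pressure, power) followed by feature fusion and a state classifier; the layer sizes, window length and number of output states are assumptions for illustration only.

```python
# Sketch of a multi-channel convolutional prediction model over arm state data.
import torch
import torch.nn as nn

class ArmStatePredictor(nn.Module):
    def __init__(self, n_states=2):  # e.g. normal / fault
        super().__init__()
        def channel():
            # One 1D-convolutional feature extraction channel per state signal
            return nn.Sequential(nn.Conv1d(1, 16, 5, padding=2), nn.ReLU(),
                                 nn.Conv1d(16, 32, 5, padding=2), nn.ReLU(),
                                 nn.AdaptiveAvgPool1d(1))
        self.channels = nn.ModuleList([channel() for _ in range(4)])
        self.classifier = nn.Sequential(nn.Linear(4 * 32, 64), nn.ReLU(),
                                        nn.Linear(64, n_states))

    def forward(self, angle, speed, pressure, power):
        # Each input: (batch, 1, window) time series fed back by the
        # information interaction unit.
        feats = [c(x).flatten(1) for c, x in
                 zip(self.channels, (angle, speed, pressure, power))]
        return self.classifier(torch.cat(feats, dim=1))  # state logits
```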
Optionally, the alarm comprises a voice alarm and a picture alarm.
The voice mode can quickly remind an operator of abnormity, the image mode can accurately prompt a specific abnormity point, and the combination of the two modes can realize accurate and quick alarm.
Further, the state information received by the information interaction unit is input into a trained prediction model to obtain the operation state of the mechanical arm, and if the mechanical arm fails in operation, an abnormal control instruction and an alarm are actively issued.
In a feasible implementation, as shown in fig. 4, the safety monitoring terminal is developed on Internet technology; data are exchanged over the Internet, and the data fed back by the information interaction unit are presented to the operator through a web page, making it convenient to monitor the operation state of the mechanical arm system and realizing the active early warning and alarm functions.
Specifically, the safety terminal in web page form is a monitoring page developed on a WEB framework; it presents the data fed back by the information interaction unit to the operator as graphs, together with visual data of the remote working scene. The operation state of the mechanical arm system can be checked anytime and anywhere through the web page, and abnormal data generated during operation are stored in a database for convenient analysis and review. The monitoring unit establishes a data communication interface over TCP/IP, receives the data fed back by the information interaction unit in real time, presents data such as joint angle, speed, pressure and power to the operator as real-time line graphs, sets threshold constraints on each quantity, raises alarms in picture and voice form, and can actively send a brake signal to stop system operation. An early warning prediction model is built on a convolutional neural network; data such as pressure, speed and joint angle are input into the prediction model, and the system operation state is analyzed through multi-data feature fusion for early warning. The prediction model runs in a Linux operating system, the safety terminal web page runs in a Windows system, and an information interaction channel is established between them through TCP/IP.
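The TCP/IP receiving interface with threshold-based alarms described above might look like the following minimal sketch; the port, message format (one JSON record per line) and threshold values are assumptions, not the patent's protocol.

```python
# Minimal sketch: receive state records over TCP/IP and raise threshold alarms.
import json
import socket

THRESHOLDS = {"joint_angle": 175.0, "speed": 2.0, "pressure": 50.0, "power": 800.0}

def monitor(host="0.0.0.0", port=9000):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn, conn.makefile("r") as stream:
            for line in stream:                     # one JSON record per line
                sample = json.loads(line)
                for key, limit in THRESHOLDS.items():
                    if abs(sample.get(key, 0.0)) > limit:
                        print(f"ALARM: {key}={sample[key]} exceeds {limit}")
                        # The real system would trigger voice and picture
                        # alarms here and could send back a brake command.
```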
2) And the virtual scene unit is used for displaying the remote operation process in real time according to the virtual scene of the three-dimensional reconstruction unit and the state information fed back by the information interaction unit.
In a feasible implementation manner, as shown in fig. 5, the virtual scene unit reconstructs a real scene in a point cloud manner by combining a laser radar and a depth camera, constructs a virtual scene by combining scene point cloud segmentation, and develops a virtual scene unit for displaying a remote operation process in real time by combining feedback data of the information interaction unit. The virtual scene is viewed through the VR head-mounted display, and the control unit is assisted to complete complex operation of the far-end mechanical arm.
The scene construction performs point cloud registration on the point cloud data of the laser radar and the depth camera through point cloud registration and point cloud segmentation algorithms to form a whole-scene point cloud, extracts the core components through the point cloud segmentation algorithm, and constructs the virtual scene in combination with Unity3D software.
Specifically, the remote operation scene is reconstructed one-to-one in Unity3D through the three-dimensional reconstruction technology; an information receiving interface is developed on TCP/IP to receive the system state and visual data fed back by the information interaction unit; multiple kinds of information such as running state, collision information and visual information are fused, realizing information fusion of the virtual scene and the real scene, and complex operation is completed in combination with the mechanical arm. Unity3D is development software for constructing three-dimensional virtual scenes; the virtual objects constructed by the three-dimensional reconstruction technology are imported into Unity3D to build the virtual scene for remote operation, which runs in a Windows operating system. The TCP/IP communication protocol is used to establish a communication link between the two devices and ensure safe and reliable data transmission between them.
The remote operation scene and operation process are displayed in real time: the joint angle data of the real mechanical arm are fed back through the information interaction unit to drive the virtual model, realizing synchronized motion of the real and virtual scenes.
An augmented display effect is achieved by combining the visual data of the virtual scene and the real scene to assist the operator in completing high-difficulty, complex operation tasks.
4. The information interaction unit is used for receiving, through the 5G communication module, the control instruction issued by the operation terminal and issuing it to the dual-mode control unit; receiving the state information fed back by the dual-mode control unit in real time and feeding it back to the operation terminal; and receiving real scene information and remote operation scene information in real time.
In one possible implementation, as shown in fig. 6, the information interaction unit assists the virtual scene unit in achieving real-time augmented reality. An information interaction channel between the operation terminal and the remote mechanical arm is established through 5G communication technology. 5G communication has the characteristics of ultra-high speed, low latency and massive connectivity; it can transmit data such as vision, state information and control instructions safely, reliably and quickly, greatly improving the response efficiency of the whole system.
The embodiment of the application can adopt a Quectel RM500Q series 5G communication module; the module interfaces with the real-time Linux system, its maximum communication rate is 2.1 Gbps, it covers the currently common frequency bands, and its latency is as low as 1 ms. The ultra-high reliability and low latency readily satisfy the real-time requirements of remote communication: data such as vision, state information and control instructions can be transmitted safely, reliably and quickly, the synchronism of the virtual and real operation scenes is maintained, real-time augmented reality operation is realized, and the response efficiency of the whole system is greatly improved.
5. The dual-mode control unit is used for receiving a control instruction issued by the operation terminal and issuing a driving instruction to the bottom layer driving unit; and receiving scene point cloud and pose information.
In one possible embodiment, as shown in fig. 7, the dual-mode control unit has two control modes, namely automatic control and man-machine cooperation, the automatic control is used for assisting the visual recognition unit to complete automatic operation, and the man-machine cooperation control is used for completing remote man-machine cooperation by combining the operating handle with the virtual scene unit. The dual-mode Control unit has a high requirement on real-time performance of a Control period, the dual-mode Control unit can operate in a Kurt-Linux real-time system, an EtherCAT (Ethernet Control Automation Technology) communication protocol is adopted for a communication line, the EtherCAT is a field bus based on Ethernet, and the dual-mode Control unit has the characteristics of high speed, high efficiency, short refresh period and the like, supports various equipment connection topological structures, meets a 5ms Control period, is stable in communication performance, and can issue a Control instruction to the drive unit in real time.
Automatic control and man-machine cooperative control: the control mode is decided on the basis of path planning and the real scene. Specifically, a ROS (Robot Operating System) based motion planning algorithm attempts to solve for a path in the real operation scene; if a solution exists and the operation environment is good, the automatic mode is selected by default, otherwise the operator uses the man-machine cooperation mode to complete the complex operation task.
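A hedged sketch of this mode-selection logic, assuming the ROS MoveIt Python interface (moveit_commander), is shown below; the function name and the handling of the planner's return value are assumptions, not the patent's implementation, and node initialization is omitted.

```python
# Sketch: choose the automatic mode only if the planner finds a solution.
import moveit_commander

def choose_mode(group: moveit_commander.MoveGroupCommander, target_pose):
    """Return 'auto' if a collision-free plan to target_pose exists, else 'manual'."""
    group.set_pose_target(target_pose)
    result = group.plan()  # Noetic returns (success, trajectory, time, error_code);
                           # older releases return a RobotTrajectory message
    if isinstance(result, tuple):
        success = result[0]
    else:
        success = bool(result.joint_trajectory.points)
    group.clear_pose_targets()
    return "auto" if success else "manual"
```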
the automatic control calculates the pose of the target object through the visual identification unit, controls the mechanical arm to move to the position near the target object to carry out an operation task according to the ROS robot operating system, and realizes motion planning obstacle avoidance based on the combination of scene point cloud data output by the three-dimensional reconstruction unit and a motion planning algorithm.
And in man-machine cooperative control, the mechanical arm is directly controlled by an operating handle to complete single joint movement and multi-joint movement, and information of a virtual environment and a real scene is checked by a VR head-mounted display to complete complex operation with high difficulty.
For single-joint motion, the joint to be moved is enabled through the operation terminal and the rotation angle of the servo motor is controlled with the operating handle; single-joint motion can drive the mechanical arm into regions of the workspace for which no kinematic solution is available, so that complex operations can still be completed.
For multi-joint linkage, the end pose of the mechanical arm is controlled through the operating handle; the motion path is solved from the end pose points output by the handle by combining forward kinematics, inverse kinematics and the motion planning algorithm, so that multi-joint linkage of the mechanical arm can be controlled and the effect of man-machine cooperative operation achieved.
6. And the bottom layer driving unit is used for receiving the driving instruction issued by the dual-mode control unit, converting the driving instruction into joint angle, speed and torque information, processing it through a smooth interpolation algorithm and driving the mechanical arm to complete flexible motion.
In a possible embodiment, as shown in fig. 8, the bottom layer driving unit analyzes the control command into information such as a rotation angle, a speed, a moment and the like through a bottom layer driver of the ROS robot operating system, and drives the servo motor to perform precise flexible motion.
The flexible motion is to smooth the analyzed driving command through a smooth interpolation algorithm, so that the mechanical arm can achieve the flexible motion.
Interpolation algorithms interpolate sparse points to denser points by mathematical functions. The mathematical functions comprise linear interpolation, circular interpolation and the like, and interpolation processing is carried out on pose, speed and other information, so that the mechanical arm can move smoothly.
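As a hedged illustration of the interpolation step, the sketch below densifies sparse joint waypoints with a cubic spline so the commanded motion is smooth; it assumes NumPy and SciPy, and the waypoint values and control period are made up.

```python
# Sketch: interpolate sparse joint waypoints to a dense, smooth trajectory.
import numpy as np
from scipy.interpolate import CubicSpline

def smooth_trajectory(waypoints, dt_in=0.1, dt_out=0.005):
    """waypoints: (n_points, n_joints) joint angles sampled every dt_in seconds."""
    waypoints = np.asarray(waypoints)
    t_sparse = np.arange(waypoints.shape[0]) * dt_in
    spline = CubicSpline(t_sparse, waypoints, axis=0)
    t_dense = np.arange(0.0, t_sparse[-1] + 1e-9, dt_out)
    # Return time stamps, interpolated angles, and velocities (first derivative)
    return t_dense, spline(t_dense), spline(t_dense, 1)

# Example: three waypoints for a 2-joint arm, resampled to a 5 ms control period
_, q, qd = smooth_trajectory([[0.0, 0.0], [0.3, -0.2], [0.5, 0.1]])
```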
In a possible implementation, the sensor in the embodiment of the present application may include: the system comprises a laser radar, a depth camera, a pressure sensor, a relative encoder and the like, and is used for assisting a far-end mechanical arm to complete high-difficulty complex operation.
As shown in fig. 9, the mechanical arm dual-mode control system based on vision and man-machine cooperation of the embodiment of the present application runs on a cluster composed of two computers: the operation terminal, composed of the virtual scene unit and the safety monitoring unit, runs on a Windows operating system, while the visual identification unit, three-dimensional reconstruction unit, dual-mode control unit and driving unit run on a Linux operating system. The hardware can adopt an Intel LGA 3647 socket 6248R series processor with a main frequency of 3.2 GHz, which offers excellent performance and can run multiple processes simultaneously; the memory is DDR5 4000 MHz with a disk capacity of 10 TB; and the graphics card is an NVIDIA Quadro RTX 8000 with 48 GB, offering strong performance and fast computation. This equipment meets the real-time requirements of the visual identification unit, three-dimensional reconstruction unit, virtual scene unit and safety monitoring unit, supports various IO extensions, and can receive various sensor data in real time, guaranteeing safe and reliable operation of the whole system.
In the embodiment of the invention, the control mode is selected according to whether the motion planning algorithm of the ROS robot operating system finds a solution and according to the on-site operation environment. In the automatic mode: the pose of the target object is calculated in real time by the visual identification unit; the three-dimensional reconstruction unit receives the point clouds of the laser radar and the depth camera and reconstructs the scene point cloud; the control unit receives the object pose and scene point cloud information, automatically generates an obstacle avoidance operation path, and issues a driving instruction to the bottom layer driving unit; the bottom layer driving unit calculates a smooth path through a smooth interpolation algorithm and drives the mechanical arm to move flexibly and complete the operation task. In the man-machine cooperation mode: according to the visual information of the virtual scene unit and the visual identification unit, mechanical arm motion control instructions are issued through the operating handle; the bottom layer driving unit parses the control instructions into joint angle, speed, torque and other information and drives the mechanical arm to complete the man-machine cooperative operation. In both control modes the safety monitoring unit monitors the running state of the system in real time through the data fed back by the information interaction unit, guaranteeing safe and reliable operation of the system.
As shown in fig. 10, the embodiment of the invention provides a mechanical arm dual-mode control method based on vision and man-machine cooperation, which is implemented by a mechanical arm dual-mode control system based on vision and man-machine cooperation; the system comprises a three-dimensional reconstruction unit, a visual identification unit, an operation terminal, an information interaction unit, a dual-mode control unit and a bottom layer driving unit. The processing flow of the method, shown in the flow chart of fig. 10, comprises an automatic control method and a man-machine cooperative control method.
The automatic control method comprises the following steps: the visual recognition unit calculates the pose of the target object in the space; outputting a scene point cloud by a three-dimensional reconstruction unit; the dual-mode control unit generates an obstacle avoidance operation path according to the position and the scene point cloud of the target object in the space and issues a driving instruction to the bottom layer driving unit; and the bottom driving unit calculates a smooth path through a smooth interpolation algorithm and drives the mechanical arm to complete the remote operation task.
The man-machine cooperative control method comprises the following steps: an operator issues driving instructions through the remote operation process displayed in real time by the operation terminal and through the operating handle; the bottom layer driving unit parses the control instruction into joint angle, speed and torque information and drives the mechanical arm to complete the man-machine cooperative remote operation task.
Optionally, the manner of acquiring the point cloud data and the real scene information includes:
and acquiring point cloud data and real scene information by adopting a laser radar and a depth camera.
Optionally, the operation terminal includes a security monitoring unit and a virtual scene unit.
And the safety monitoring unit is used for detecting whether the mechanical arm normally operates or not through the state information received by the information interaction unit, and actively early warning and alarming if the mechanical arm fails in operation.
And the virtual scene unit is used for displaying the remote operation process in real time according to the virtual scene of the three-dimensional reconstruction unit and the state information fed back by the information interaction unit.
Optionally, the state information received by the information interaction unit is used for detecting whether the mechanical arm operates normally, and if the mechanical arm fails in operation, the active early warning and the alarm are performed, wherein the active early warning and the alarm comprise:
and inputting the state information received by the information interaction unit into a trained prediction model to obtain the operation state of the mechanical arm, and actively issuing an abnormal control instruction and alarming if the mechanical arm fails in operation.
And the state information received by the information interaction unit comprises joint angle, speed, pressure and power.
Optionally, the training process of the prediction model includes:
an initial prediction model is built through a convolutional neural network, a data set is collected according to an operation scene, and the initial prediction model is trained according to the data set to obtain the trained prediction model.
Optionally, the alarm comprises a voice alarm and a picture alarm.
Optionally, the information interaction unit is configured to receive a control instruction issued by the operation terminal and issue the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; the real scene information and the remote operation scene information are received in real time, and the method comprises the following steps:
the information interaction unit adopts a 5G communication module to receive a control instruction issued by the operation terminal and issues the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; and receiving real scene information and remote operation scene information in real time.
Optionally, the vision recognition unit comprises a pose estimator; the pose estimator comprises a semantic segmentation network, a feature extraction network, a feature fusion network, a pose estimation network and a pose fine adjustment network.
And the semantic segmentation network is used for identifying and segmenting the target object to form an RGB (red, green and blue) image, a depth image and a mask image containing the target object.
And the feature extraction network is used for respectively extracting features of the RGB image, the depth image and the mask image of the target object through three parallel feature extraction channels, and converting the extracted features into features with the same format through the feature compression network.
And the feature fusion network is used for performing feature fusion on the features with the same format to obtain pixel-level fusion features.
And the pose estimation network is used for predicting the pose information of the target object according to the pixel-level fusion characteristics.
And the pose fine tuning network is used for adjusting the pose information according to an iterative algorithm to obtain the corrected pose information.
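The following sketch illustrates, under assumed channel sizes and toy backbones, the overall data flow of such a pose estimator: three parallel feature branches for the RGB, depth and mask crops, pixel-level fusion, and a pose regression head. The iterative pose fine-tuning stage is only indicated by a comment; none of the sizes below are fixed by the patent.

```python
# Rough structural sketch of the pose estimator described above. Channel
# sizes and the tiny backbones are assumptions made for illustration.
import torch
import torch.nn as nn

def branch(in_ch):
    # shared shape for all three feature-extraction channels ("same format")
    return nn.Sequential(
        nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
        nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
    )

class PoseEstimator(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb_branch = branch(3)       # RGB crop of the target object
        self.depth_branch = branch(1)     # depth crop
        self.mask_branch = branch(1)      # segmentation mask crop
        self.fuse = nn.Conv2d(64 * 3, 128, 1)          # pixel-level fusion
        self.pose_head = nn.Linear(128, 7)             # quaternion (4) + translation (3)

    def forward(self, rgb, depth, mask):
        f = torch.cat([self.rgb_branch(rgb),
                       self.depth_branch(depth),
                       self.mask_branch(mask)], dim=1)
        fused = self.fuse(f).mean(dim=(2, 3))          # global pooling over pixels
        pose = self.pose_head(fused)
        # A pose fine-tuning stage (e.g. a few iterative refinements against
        # the scene point cloud) would correct this initial estimate.
        return pose

net = PoseEstimator()
pose = net(torch.randn(1, 3, 64, 64), torch.randn(1, 1, 64, 64), torch.randn(1, 1, 64, 64))
```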
Optionally, the training process of the semantic segmentation network includes:
and constructing an initial semantic segmentation network based on the convolutional neural network, and training the initial semantic segmentation network through the obtained semantic segmentation image data set to obtain the trained semantic segmentation network.
The acquisition method of the semantic segmentation image data set comprises the steps of acquiring images of a target object under different illumination and backgrounds, and labeling different object types by using an object type labeling tool to obtain the semantic segmentation image data set.
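As an illustrative sketch only, the collected images and their per-class annotation masks could be paired for training roughly as follows; the directory layout and file naming are assumptions introduced for the example, not a requirement of the patent.

```python
# Minimal sketch of a segmentation data set built from collected images and
# their annotated class masks. File layout is assumed for illustration.
import os
import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class SegmentationDataset(Dataset):
    def __init__(self, image_dir, mask_dir):
        self.image_dir, self.mask_dir = image_dir, mask_dir
        self.names = sorted(os.listdir(image_dir))      # image and mask share a file name

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        image = np.array(Image.open(os.path.join(self.image_dir, name)).convert("RGB"))
        mask = np.array(Image.open(os.path.join(self.mask_dir, name)))   # pixel value = class id
        image = torch.from_numpy(image).permute(2, 0, 1).float() / 255.0
        return image, torch.from_numpy(mask).long()
```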
In the embodiment of the invention, a control mode is selected according to the solving conditions of the motion planning algorithm and the on-site operation environment. In the automatic mode, the pose of the target object is calculated in real time by the visual identification unit; the three-dimensional reconstruction unit receives the point clouds of the laser radar and the depth camera and reconstructs a scene point cloud; the control unit receives the object pose and the scene point cloud information, automatically generates an obstacle avoidance operation path, and issues a driving instruction to the bottom layer driving unit; the bottom layer driving unit calculates a smooth path through a smooth interpolation algorithm and drives the mechanical arm to move flexibly and complete the operation task. In the man-machine cooperation mode, a mechanical arm movement control instruction is issued through the operating handle according to the visual information of the virtual scene unit and the visual identification unit; the bottom layer driving unit parses the control instruction into information such as rotation angle, speed and torque, and drives the mechanical arm to move so as to complete the man-machine cooperative operation. In both control modes, the safety monitoring unit monitors the running state of the system in real time through the data fed back by the information interaction unit, so that safe and reliable operation of the system is guaranteed.
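The control flow of this embodiment can be summarised in the following high-level sketch; every function name is a placeholder for the corresponding unit described above, not an actual interface of the system.

```python
# High-level sketch of the two control branches. All names are placeholders
# for the units of this embodiment, not an actual API.
def run_automatic_mode(vision, reconstruction, controller, driver):
    pose = vision.estimate_target_pose()              # visual identification unit
    scene_cloud = reconstruction.build_scene_cloud()  # laser radar + depth camera point clouds
    path = controller.plan_obstacle_avoiding_path(pose, scene_cloud)
    driver.execute(smooth=True, path=path)            # smooth interpolation in the driving unit

def run_cooperative_mode(vision, virtual_scene, handle, driver):
    view = virtual_scene.render(vision.current_view())
    command = handle.read_operator_command(view)      # operating handle input
    driver.execute_joint_command(command)             # parsed into angle / speed / torque

def control_loop(mode, safety_monitor, **units):
    if not safety_monitor.status_ok():                # runs in both modes
        safety_monitor.raise_alarm()
        return
    if mode == "automatic":
        run_automatic_mode(units["vision"], units["reconstruction"],
                           units["controller"], units["driver"])
    else:
        run_cooperative_mode(units["vision"], units["virtual_scene"],
                             units["handle"], units["driver"])
```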
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (10)

1. A mechanical arm dual-mode control system based on vision and man-machine cooperation is characterized by comprising a three-dimensional reconstruction unit, a vision recognition unit, an operation terminal, an information interaction unit, a dual-mode control unit and a bottom layer driving unit; wherein:
the three-dimensional reconstruction unit is used for carrying out point cloud registration on the acquired point cloud data to obtain an initial scene point cloud; according to a point cloud segmentation algorithm, segmenting and rendering the initial scene point cloud to obtain a scene point cloud, and constructing a virtual scene according to the scene point cloud; feeding back the acquired real scene information to the information interaction unit; feeding back scene point clouds to the dual-mode control unit to complete obstacle avoidance operation;
the visual identification unit is used for identifying the type of a target object in a remote operation scene, estimating the pose of the target object, feeding back the information of the remote operation scene to the information interaction unit and the dual-mode control unit in real time and assisting an operation handle to complete man-machine cooperative operation; the remote operation scene information comprises pose information and visual information;
the operation terminal is used for receiving the visual information and the state information fed back by the information interaction unit, monitoring the running state of the mechanical arm and displaying a virtual scene in real time, and issuing a control instruction to the information interaction unit according to an automatic or man-machine cooperative operation mode;
the information interaction unit is used for receiving the control instruction sent by the operation terminal and sending the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; real scene information and remote operation scene information are received in real time;
the dual-mode control unit is used for receiving a control instruction issued by the operation terminal and issuing a driving instruction to the bottom layer driving unit according to an automatic control instruction or a man-machine cooperative control instruction; receiving the scene point cloud and pose information;
and the bottom layer driving unit is used for receiving a driving instruction issued by the dual-mode control unit, converting the driving instruction into rotation angle, speed and torque information, processing the driving instruction through a smooth interpolation algorithm and driving the mechanical arm to flexibly complete a complex operation task.
2. The system of claim 1, wherein the point cloud data and the real scene information are obtained in a manner that includes:
and acquiring point cloud data and real scene information by adopting a laser radar and a depth camera.
3. The system of claim 1, wherein the operation terminal comprises a security monitoring unit and a virtual scene unit;
the safety monitoring unit is used for detecting whether the mechanical arm operates normally through the state information received by the information interaction unit, and actively issuing an early warning and an alarm if the mechanical arm fails during operation;
and the virtual scene unit is used for displaying the remote operation process in real time according to the virtual scene of the three-dimensional reconstruction unit and the state information fed back by the information interaction unit.
4. The system of claim 3, wherein the detecting whether the mechanical arm operates normally through the state information received by the information interaction unit, and performing active early warning and alarm if the mechanical arm fails during operation, comprises:
inputting the state information received by the information interaction unit into a trained prediction model to obtain the operation state of the mechanical arm, and actively issuing an abnormal control instruction and an alarm if the mechanical arm fails during operation;
and the state information received by the information interaction unit comprises joint angle, speed, pressure and power.
5. The system of claim 4, wherein the training process of the predictive model comprises:
an initial prediction model is built through a convolutional neural network, a data set is collected according to an operation scene, and the initial prediction model is trained according to the data set to obtain a trained prediction model.
6. The system of claim 3, wherein the alarm comprises a voice alarm and a graphical alarm.
7. The system of claim 1, wherein the information interaction unit is configured to receive a control instruction issued by the operation terminal and issue the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; the real scene information and the remote operation scene information are received in real time, and the method comprises the following steps:
the information interaction unit adopts a 5G communication module to receive a control instruction issued by the operation terminal and issues the control instruction to the dual-mode control unit; receiving state information fed back by the dual-mode control unit in real time, and feeding back the state information to the operation terminal; and receiving real scene information and remote operation scene information in real time.
8. The system of claim 1, wherein the vision recognition unit comprises a pose estimator; the pose estimator comprises a semantic segmentation network, a feature extraction network, a feature fusion network, a pose estimation network and a pose fine tuning network;
the semantic segmentation network is used for identifying and segmenting the target object to form an RGB (red, green and blue) image, a depth image and a mask image containing the target object;
the feature extraction network is used for respectively extracting features of the RGB image, the depth image and the mask image of the target object through three parallel feature extraction channels, and converting the extracted features into features with the same format through the feature compression network;
the feature fusion network is used for performing feature fusion on features in the same format to obtain pixel-level fusion features;
the pose estimation network is used for predicting pose information of the target object according to the pixel-level fusion characteristics;
and the pose fine tuning network is used for adjusting the pose information according to an iterative algorithm to obtain the corrected pose information.
9. The system of claim 1, wherein the training process of the semantic segmentation network comprises:
building an initial semantic segmentation network based on a convolutional neural network, and training the initial semantic segmentation network through an obtained semantic segmentation image data set to obtain a trained semantic segmentation network;
the acquisition method of the semantic segmentation image data set comprises the steps of acquiring images of a target object under different illumination and backgrounds, and labeling different object types by using an object type labeling tool to obtain the semantic segmentation image data set.
10. A mechanical arm dual-mode control method based on vision and man-machine cooperation is characterized in that the method is realized by a mechanical arm dual-mode control system based on vision and man-machine cooperation, and the system comprises a three-dimensional reconstruction unit, a vision recognition unit, an operation terminal, an information interaction unit, a dual-mode control unit and a bottom layer driving unit;
the method comprises an automatic control method and a man-machine cooperative control method;
the automatic control method comprises the following steps: the vision recognition unit calculates the pose of the target object in space; the three-dimensional reconstruction unit outputs a scene point cloud; the dual-mode control unit generates an obstacle avoidance operation path according to the pose of the target object in space and the scene point cloud, and issues a driving instruction to the bottom layer driving unit; the bottom layer driving unit calculates a smooth path through a smooth interpolation algorithm and drives the mechanical arm to complete the remote operation task;
the man-machine cooperative control method comprises the following steps: an operator issues a driving instruction to the bottom layer driving unit through the operating handle according to the remote operation process displayed in real time by the operation terminal, and the bottom layer driving unit parses the control instruction into rotation angle, speed and torque information and drives the mechanical arm to complete the man-machine cooperative remote operation task.
CN202210051511.0A 2022-01-17 2022-01-17 Mechanical arm dual-mode control method and system based on vision and man-machine cooperation Pending CN114419154A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210051511.0A CN114419154A (en) 2022-01-17 2022-01-17 Mechanical arm dual-mode control method and system based on vision and man-machine cooperation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210051511.0A CN114419154A (en) 2022-01-17 2022-01-17 Mechanical arm dual-mode control method and system based on vision and man-machine cooperation

Publications (1)

Publication Number Publication Date
CN114419154A true CN114419154A (en) 2022-04-29

Family

ID=81273250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210051511.0A Pending CN114419154A (en) 2022-01-17 2022-01-17 Mechanical arm dual-mode control method and system based on vision and man-machine cooperation

Country Status (1)

Country Link
CN (1) CN114419154A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114932555A (en) * 2022-06-14 2022-08-23 如你所视(北京)科技有限公司 Mechanical arm cooperative operation system and mechanical arm control method
CN114932555B (en) * 2022-06-14 2024-01-05 如你所视(北京)科技有限公司 Mechanical arm collaborative operation system and mechanical arm control method
CN116664681A (en) * 2023-07-26 2023-08-29 长春工程学院 Semantic perception-based intelligent collaborative augmented reality system and method for electric power operation
CN116664681B (en) * 2023-07-26 2023-10-10 长春工程学院 Semantic perception-based intelligent collaborative augmented reality system and method for electric power operation

Similar Documents

Publication Publication Date Title
CN110561432B (en) Safety cooperation method and device based on man-machine co-fusion
US8989876B2 (en) Situational awareness for teleoperation of a remote vehicle
CN111055281B (en) ROS-based autonomous mobile grabbing system and method
CN107914272B (en) Method for grabbing target object by seven-degree-of-freedom mechanical arm assembly
US10046459B2 (en) Three-dimensional visual servoing for robot positioning
CN114419154A (en) Mechanical arm dual-mode control method and system based on vision and man-machine cooperation
CN111421539A (en) Industrial part intelligent identification and sorting system based on computer vision
CN112634318B (en) Teleoperation system and method for underwater maintenance robot
CN113814986B (en) Method and system for controlling SCARA robot based on machine vision
EP4246437A1 (en) Method and apparatus for detecting motion information of target, and device and medium
CN115556112A (en) Robot teleoperation method and system based on digital twins
GB2598345A (en) Remote operation of robotic systems
CN114800524B (en) System and method for actively preventing collision of man-machine interaction cooperative robot
CN111975776A (en) Robot movement tracking system and method based on deep learning and Kalman filtering
CN110666820A (en) High-performance industrial robot controller
Zhou et al. Visual servo control system of 2-DOF parallel robot
Wang et al. A novel human-robot interaction system based on 3D mapping and virtual reality
CN113510699A (en) Mechanical arm motion trajectory planning method based on improved ant colony optimization algorithm
RU2685996C1 (en) Method and system for predictive avoidance of manipulator collision with human being
CN116214532B (en) Autonomous obstacle avoidance grabbing system and grabbing method for submarine cable mechanical arm
Sukumar et al. Augmented reality-based tele-robotic system architecture for on-site construction
Utintu et al. 6D Valves Pose Estimation based on YOLACT and DenseFusion for the Offshore Robot Application
Huang et al. Telepresence augmentation for visual and haptic guided immersive teleoperation of industrial manipulator
Liu et al. Workpiece Segmentation Based on Improved YOLOv5 and SAM
CN118024255A (en) Remote interaction twin system for mechanical arm vision grabbing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination