CN111062364B

CN111062364B - Method and device for monitoring assembly operation based on deep learning

Info

Publication number: CN111062364B
Application number: CN201911383314.3A
Authority: CN
Inventors: 陈成军; 宋怡仁; 王天诺; 洪军; 李正浩; 郑帅; 李东年
Original assignee: Xian Jiaotong University; Qingdao University of Technology; Chongqing Institute of Green and Intelligent Technology of CAS
Current assignee: Xian Jiaotong University; Qingdao University of Technology; Chongqing Institute of Green and Intelligent Technology of CAS
Priority date: 2019-12-28
Filing date: 2019-12-28
Publication date: 2023-06-30
Anticipated expiration: 2039-12-28
Also published as: CN111062364A

Abstract

The invention relates to a method and a device for monitoring assembly operation based on deep learning, wherein the method for monitoring assembly action comprises the following steps: establishing an assembly tool sample set: shooting images of assembling actions of experimenters, collecting images of different assembling tools in the images, labeling each image, generating image labels, and collecting all the image labels as a sample set; training an assembly tool monitoring model: sending the assembly tool sample set into a target detection deep learning network model for training, and generating an assembly tool monitoring model; and (3) real-time monitoring: identifying an image assembled by an assembly worker on an assembly site, and identifying an assembly tool used by the assembly worker in the image; monitoring the assembly action times: after the assembly tool in the image is identified, the image is sent into a human body gesture recognition network model for recognition, human body joint movement information is identified, and the times of assembly actions of an assembly worker are judged by using the human body joint movement information.

Description

Method and device for monitoring assembly operation based on deep learning

Technical Field

The invention relates to a method and a device for monitoring assembly operation based on deep learning, and belongs to the technical field of intelligent manufacturing and assembly monitoring.

Background

In a mass custom production model, a factory often needs to fully utilize existing resource reassembly assembly lines to produce different products to meet the personalized needs of users. However, the assembler often has difficulty adapting to the changeable production mode, so that the key operation steps are forgotten or the operation degree is not up to standard, and the quality of the product is affected.

Therefore, the assembly actions of workers are monitored, the phenomena that the assembly workers forget key operation steps and the operation degree is not up to standard and the like can be prevented, and the product quality problem is avoided. In the past industry field, the monitoring emphasis has been placed on fault diagnosis, surface defect etc. detection, can not timely avoid the emergence of problem like this, and monitor the assembly action of workman in the assembly process then can in time discover the problem and remind assembly workman to reduce product disqualification rate and manufacturing cost. Therefore, how to monitor the assembly actions of workers in the assembly process with low cost and high efficiency is the development direction of the future assembly manufacturing industry.

Computer vision technology is to make a computer realize analysis and understanding of images or videos like human beings through a certain algorithm. In recent years, with the wide application of deep learning algorithms in the field of computer vision, tasks such as pattern recognition, target detection, motion recognition, human body posture estimation and the like have all obtained breakthrough progress. The deep learning can self-adaptively learn the characteristics, and the end-to-end training can be performed only by designing a proper neural network model, so that the characteristic engineering in the traditional machine learning is avoided, and the complex manual design characteristics are not needed. Therefore, the assembly action monitoring by utilizing the computer vision technology based on deep learning can effectively avoid the product quality problem.

Disclosure of Invention

In order to solve the technical problems, the method and the device for monitoring the assembly operation based on the deep learning utilize the target detection and gesture recognition technology to realize the recognition of the assembly actions of workers and the judgment of the action times, so that the monitoring of the assembly process of products is realized, the intelligent degree is improved, the operation cost is reduced, and the production period is shortened.

The technical scheme adopted by the invention is as follows:

the technical scheme is as follows:

a method of monitoring assembly operations based on deep learning, comprising the steps of:

establishing an assembly tool sample set: the method comprises the steps that an experimenter performs assembly actions by using different assembly tools, images of the assembly actions performed by the experimenter are shot, images of the different assembly tools in the images are collected, each image is marked, image labels containing tool type information and position information in the images are generated, and all the image labels are collected to be used as an assembly tool sample set;

training an assembly tool monitoring model: sending the assembly tool sample set into a target detection deep learning network model for training, and generating an assembly tool monitoring model;

and (3) real-time monitoring: installing a camera on an assembly site to shoot an image assembled by an assembly worker in real time, and sending the image into an assembly tool monitoring model for identification, so as to identify an assembly tool used by the assembly worker in the image;

monitoring the assembly action times: after the assembly tool used by the assembly worker in the image is identified, the image is sent into a human body gesture identification network model for identification, human body joint movement information of the assembly worker is identified, and the number of times that the assembly worker uses the assembly tool in the current image to carry out assembly action is judged by using the human body joint movement information;

error early warning: comparing the sequence of the assembling tools operated by the assembling workers and the times of assembling actions by using the assembling tools with the assembling flow of the original assembling production line, and prompting whether the assembling is correct or not to the assembling workers according to the comparison result.

Further, the step of establishing the assembly tool sample set specifically includes:

shooting assembly actions of a plurality of experimenters by adopting an RGB camera, shooting images of different assembly actions of each experimenter, shooting for 10 seconds by each assembly action, and extracting 10 images per second;

marking the assembly tool in each image by using image marking software, generating an image tag comprising the coordinate information of the assembly tool and the category information of the assembly tool, and discarding the image if the image has no tool information;

and carrying out data set division on all the image labels, and dividing the image labels into a training set, a verification set and a test set.

Further, the training and assembling tool monitoring model comprises the following steps:

setting super parameters of a target detection deep learning network model, inputting images in a training set and a verification set into the target detection deep learning network model for training, and continuously optimizing network weights according to the result of the verification set in the training process;

during training, according to the change conditions of an accuracy rate curve and a loss curve in the training process, determining the optimal training iteration times, stopping training, and storing the trained network weight;

selecting different IoU threshold values, comparing the change of the index of the accuracy and the recall rate under the different threshold values, and determining an optimal IoU threshold value;

and (3) testing on a test set by storing the trained network weight and setting an optimal IoU threshold value, and if the test result meets the expectations and no fitting phenomenon occurs, completing training of the training assembly tool monitoring model.

Further, the step of monitoring the number of assembly actions specifically includes:

using an RGB camera to shoot RGB images of assembly operations of assembly workers in real time on an assembly operation site, using an assembly tool monitoring model to monitor the shot images in real time, and judging that the assembly workers are about to use the assembly tools to perform corresponding assembly actions when an assembly tool appears in the images;

after detecting an assembly tool, immediately inputting the RGB image into a human body gesture recognition network model, and acquiring the coordinate information of each joint point and each joint point of a human body in the RGB image;

counting the upper limit and the lower limit of human body articulation point coordinates according to the action range of the human body articulation point for assembly action, carrying out data cleaning on articulation point coordinate information according to the upper limit and the lower limit of the human body articulation point coordinates, and eliminating articulation point coordinate information which does not accord with the range;

and drawing a curve of the change of the coordinate of the joint point along with time by using the coordinate information of all the joint points after data cleaning, and judging the number of assembly actions through the alternating change of wave peaks and wave troughs in the curve.

Further, the average value of all the coordinate information of the nodes after the data cleaning is subtracted.

The second technical scheme is as follows:

an apparatus for deep learning based monitoring of assembly operations, comprising a memory and a processor, the memory storing instructions adapted to be loaded by the processor and to perform the steps of:

The invention has the following beneficial effects:

1. according to the method and the device for monitoring the assembly operation based on the deep learning, the target detection is used for replacing the action detection, the type of the assembly operation (such as a hammer, a nut, a screw nut and the like) can be judged by using one frame of image, the repetition number of the assembly operation can be judged, and the monitoring of the assembly process is realized.

2. The invention relates to a method and a device for monitoring assembly operation based on deep learning, which are used for recognizing assembly action as tool object detection based on the characteristics of repeatability of assembly action and tool dependence. Compared with the traditional action recognition algorithm, the method has higher recognition accuracy and higher recognition speed.

3. According to the method and the device for monitoring the assembly operation based on the deep learning, the human body gesture recognition network model is utilized for gesture estimation and coordinate data of key joint points are extracted; finally, through cleaning and analyzing the coordinate data, the judgment of the repeated times of the assembly actions is realized according to the rule of the change of the coordinate information along with time. After the monitoring video of the assembly operation site is processed by the method, a manager can quickly judge the repeated times of actions of workers by only one image, and compared with the common human supervision, the method has the advantages of saving manpower input to a great extent and improving monitoring quality.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention;

FIG. 2 is an exemplary diagram of an assembly tool monitoring model detecting an assembly tool;

FIGS. 3 and 4 are example diagrams of identifying node data using a human gesture recognition network model;

FIG. 5 is a hammer action curve plotted in the example;

FIG. 6 is a file action curve plotted in the example;

fig. 7 is a graph of the action of screwing nuts plotted in the example.

Detailed Description

The invention will now be described in detail with reference to the drawings and to specific embodiments.

Example 1

Referring to fig. 1, a method for monitoring assembly operations based on deep learning includes the following steps:

establishing an assembly tool sample set: the method comprises the steps that an experimenter performs assembly actions by using different assembly tools, images of the assembly actions performed by the experimenter are shot, images of the different assembly tools in the images are collected, each image is marked (the images can be marked through image marking software, such as LableImage software), image labels containing tool type information and position information in the images are generated, and all the image labels are collected to serve as an assembly tool sample set.

Training an assembly tool monitoring model: sending the assembly tool sample set into a target detection deep learning network model for training, and generating an assembly tool monitoring model; when the target detection deep learning network model is trained, the appearance and the approximate outline of the currently-marked tool class information are learned according to the position information marked in the image tag, then the trained target detection deep learning network model is taken to analyze the image video, and the corresponding target in the video can be quickly found according to the self-learned parameters and is framed.

And (3) real-time monitoring: installing a camera on an assembly site to shoot an image assembled by an assembly worker in real time, and sending the image into an assembly tool monitoring model for identification, so as to identify an assembly tool used by the assembly worker in the image; and judging what assembling action is performed by the current assembling worker according to the assembling tool, and judging that the assembling worker is performing the action of hammering if the tool identified in the image is a hammer.

Monitoring the assembly action times: after the assembly tool used by the assembly worker in the image is identified, the image is sent into a human body gesture identification network model for identification, human body joint movement information of the assembly worker is identified, and the number of times that the assembly worker uses the assembly tool in the current image to carry out assembly action is judged by using the human body joint movement information; for example: when the assembly tool monitoring model recognizes that the assembly tool, namely a hammer, appears in an image shot by the camera in real time, the assembly tool monitoring model immediately starts to recognize the action times of assembly workers through the human body gesture recognition network model until the assembly tool monitoring model cannot monitor that the hammer exists in the camera image, and the action times recorded in the process (for example, the hammer is 5 times) are counted.

Error early warning: comparing the sequence of the assembling tools operated by the assembling workers and the times of assembling actions by using the assembling tools with the assembling flow of the original assembling production line, and prompting whether the assembling is correct or not to the assembling workers according to the comparison result. For example, the assembly process of the original assembly production line is to hammer the production product for five times, file for five times and screw the nut for three times; and comparing the monitored actual information with the original flow, wherein the actual assembly worker firstly hammers five times during assembly, then files four times, and then the assembly tool monitoring model monitors that files in the image disappear, the file is judged to be finished, the action times are not consistent with the original times, an alarm is sent to the assembly worker, and the alarm can be automatically sent in a voice or caption mode.

In this embodiment, the object detection replaces the motion detection, and the type of the assembly operation (such as hammering, contusion, screwing, etc.) can be determined by using one frame of image, and the repetition number of the assembly operation can also be determined, so as to realize the monitoring of the assembly process.

Example two

The present embodiment not only has the advantages of the first embodiment, but also proposes that the step of creating the assembly tool sample set specifically includes:

adopt RGB camera to shoot the assembly action of many experimenters, shoot every experimenter and carry out the image of different assembly actions, if: basic assembly actions such as hammers, files, brushes, saws, nuts and the like; shooting for 10 seconds in each assembly action, and extracting 10 images in each second;

setting super parameters of a target detection deep learning network model: the momentum of gradient decline, the initial learning rate and the attenuation coefficient, the images in a training set and a verification set are input into a target detection deep learning network model (the embodiment adopts a YOLOv3 target detection deep learning network model) for training, and in the training process, the network weight is continuously optimized according to the result of the verification set;

Referring specifically to fig. 2, further, the step of monitoring the number of assembly actions specifically includes:

using an RGB camera to shoot an RGB image of assembly operation of an assembly worker in real time on an assembly operation site, using a trained assembly tool monitoring model to detect tool information of an input image in real time, outputting tool type and position information detected by a current frame, judging that the assembly worker is about to use the assembly tool to perform corresponding assembly action when detecting that an assembly tool appears in the current frame image, and judging that the current assembly action is a hammer when detecting that the tool in the image is a hammer as shown in fig. 2;

referring specifically to fig. 3 and fig. 4, after detecting an assembly tool, immediately inputting an RGB image into a human body posture recognition network model (in this embodiment, the human body posture recognition network model adopts an openPose posture estimation network model), and acquiring coordinate information of each joint point and each joint point of a human body in the RGB image; and collecting the coordinate information of each joint point and each joint point of the human body in the image frame by frame until the assembly tool cannot be detected (namely, the current assembly action is finished), and ending the input of the coordinate information of each joint point and each joint point of the human body.

In the present embodiment, the number of motions is estimated for the time-varying condition of the motion hammer by the longitudinal (Y-direction) coordinate information, for the time-varying condition of the motion file by the transverse (X-direction) coordinate information, and for the joint-varying condition of the nut screw by X, Y directionThe situation estimates the number of actions. The output format of OpenPose to the human body joint point coordinates is 'phase_keypoints_2d': x ₁ ，y ₁ ，c ₁ ，…，x _n ，y _n ，c _n ]Wherein x is _n ，y _n Is the human body joint point coordinate, c _n The accuracy of the joint position prediction is determined. The joint point coordinates of each frame are extracted in time sequence, and in this embodiment, only the wrist joint point of the arm of the grasping tool is extracted.

For model and system performance reasons, it is difficult to achieve one hundred percent accuracy. In some frames, the positions of joints cannot be detected or predicted joints are seriously inconsistent with the real situation, so that the condition that the coordinates are 0 or the jumping performance is extremely high exists, and data cleaning is required, the step is to count the upper limit and the lower limit of the coordinates of the joints of the human body according to the action range of the joints of the human body for assembly action, and the data cleaning is carried out on the coordinate information of the joints according to the upper limit and the lower limit of the coordinates of the joints of the human body, so that the coordinate information of the joints which do not accord with the range is removed.

And drawing a curve of the change of the coordinates of the joint points along with time by using the coordinate information of all the joint points after data cleaning (in the embodiment, drawing by using a Python third party library matplotlib), and judging the number of assembly actions through the alternating change of wave peaks and wave troughs in the curve.

Further, the average value subtracting processing is performed on all the coordinate information of the joint points after the data cleaning, if the Y coordinate information of the hammer is changed between 900 and 950, one average value 900 or 890 can be subtracted from all the Y coordinates, so that the action fluctuation can be more visual.

In this embodiment, the curve drawing result refers to fig. 5-7, where fig. 5 is a hammer action curve, the ordinate represents the Y-direction coordinate value of the articulation point in the actual space, and the abscissa represents the frame number; FIG. 6 is a file action curve, with the ordinate representing the X-direction coordinate values of the articulation point in real space and the abscissa representing the number of frames; FIG. 7 is a graph of a nut tightening motion plotted with X and Y coordinate values in real space of the joint point as the abscissa and the ordinate, and with the frame number as the Z-axis coordinate; the action of screwing the nut is performed 3 times by judging that the hammer and the file are performed 5 times through the curve wave crest.

The present embodiment regards the recognition of the assembly action as tool object detection in the present invention based on the characteristics of the repeatability of the assembly action and the tool dependency. Compared to conventional action recognition algorithms, the YOLOv3 object detection algorithm is more suitable for assembly action recognition in repeatability and tool-dependent assembly actions. Compared with a conventional action recognition algorithm, the YOLOv3 model has higher recognition accuracy and higher recognition speed.

Carrying out attitude estimation by using OpenPose and extracting coordinate data of key nodes; finally, through cleaning and analyzing the coordinate data, the judgment of the repeated times of the assembly actions is realized according to the rule of the change of the coordinate information along with time. After the monitoring video of the assembly operation site is processed by the method provided by the embodiment, a manager can quickly judge the repeated times of actions of workers by only one image, and compared with the common human supervision, the method has the advantages of saving manpower input to a great extent and improving monitoring quality.

Example III

Example IV

The third embodiment has the advantages of the third embodiment, and further proposes that the step of creating the assembly tool sample set specifically includes:

In this embodiment, the number of operations is estimated for the case where the operating hammer is changed with time by the longitudinal (Y-direction) coordinate information, for the case where the operating file is changed with time by the transverse (X-direction) coordinate information, and for the case where the nut is screwed with the screw nut, for the case where the nut is changed with time by the X, Y direction. The output format of OpenPose to the human body joint point coordinates is 'phase_keypoints_2d': x ₁ ，y ₁ ，c ₁ ，…，x _n ，y _n ，c _n ]Wherein x is _n ，y _n Is the human body joint point coordinate, c _n The accuracy of the joint position prediction is determined. The joint point coordinates of each frame are extracted in time sequence, and in this embodiment, only the wrist joint point of the arm of the grasping tool is extracted.

The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims

1. A method for monitoring assembly operations based on deep learning, comprising the steps of:

error early warning: comparing the sequence of assembling tools operated by an assembling worker and the times of assembling actions by using each assembling tool with the assembling flow of the original assembling production line, and prompting whether the assembling is correct or not to the assembling worker according to the comparison result;

the assembly action frequency monitoring step specifically comprises the following steps:

2. The method for monitoring assembly operations based on deep learning of claim 1, wherein the step of creating an assembly tool sample set is specifically:

3. The method for monitoring assembly operations based on deep learning of claim 2, wherein the training assembly tool monitoring model step specifically comprises:

4. A method of monitoring assembly operations based on deep learning as claimed in claim 1, wherein: and carrying out mean value reduction processing on the coordinate information of all the nodes after the data cleaning.

5. An apparatus for deep learning based monitoring of assembly operations, comprising a memory and a processor, the memory storing instructions adapted to be loaded by the processor and to perform the steps of:

6. The apparatus for deep learning based monitoring of assembly operations of claim 5, wherein the step of creating an assembly tool sample set is specifically:

7. The apparatus for deep learning based monitoring of assembly operations of claim 6, wherein the training assembly tool monitoring model step is specifically:

8. The apparatus for deep learning based assembly operation monitoring of claim 5, wherein the means for reducing the average of all the node coordinate information after the data cleaning is performed.