CN111062364B - Method and device for monitoring assembly operation based on deep learning - Google Patents

Method and device for monitoring assembly operation based on deep learning

Info

Publication number
CN111062364B
Authority
CN
China
Prior art keywords
assembly
image
tool
training
monitoring
Prior art date
Legal status
Active
Application number
CN201911383314.3A
Other languages
Chinese (zh)
Other versions
CN111062364A (en)
Inventor
陈成军
宋怡仁
王天诺
洪军
李正浩
郑帅
李东年
Current Assignee
Xian Jiaotong University
Qingdao University of Technology
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Xian Jiaotong University
Qingdao University of Technology
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date
Filing date
Publication date
Application filed by Xian Jiaotong University, Qingdao University of Technology, Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN201911383314.3A
Publication of CN111062364A
Application granted
Publication of CN111062364B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20: Movements or behaviour, e.g. gesture recognition
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/217: Validation; Performance evaluation; Active pattern learning techniques
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/30: Computing systems specially adapted for manufacturing


Abstract

The invention relates to a method and a device for monitoring assembly operations based on deep learning, wherein the method comprises the following steps. Establishing an assembly tool sample set: images of experimenters performing assembly actions are captured, images of the different assembly tools appearing in them are collected, each image is labeled to generate an image label, and all image labels are gathered as the sample set. Training an assembly tool monitoring model: the assembly tool sample set is fed into an object detection deep learning network model for training to generate the assembly tool monitoring model. Real-time monitoring: images of an assembly worker assembling on the assembly site are analyzed to identify the assembly tool the worker is using. Monitoring the number of assembly actions: after the assembly tool in the image is identified, the image is fed into a human body gesture recognition network model, the worker's joint movement information is extracted, and the number of assembly actions performed by the worker is judged from that information.

Description

Method and device for monitoring assembly operation based on deep learning
Technical Field
The invention relates to a method and a device for monitoring assembly operation based on deep learning, and belongs to the technical field of intelligent manufacturing and assembly monitoring.
Background
In a mass-customization production model, a factory often needs to make full use of existing resources and reconfigure its assembly lines to produce different products that meet users' individual needs. However, assembly workers often have difficulty adapting to such changeable production modes, so key operation steps may be forgotten or performed below standard, which affects product quality.
Monitoring workers' assembly actions can therefore prevent workers from forgetting key operation steps or performing them below standard, and so avoid product quality problems. In the industrial field, monitoring has traditionally focused on fault diagnosis, surface-defect detection, and the like, which cannot prevent problems in time; monitoring workers' assembly actions during the assembly process, by contrast, allows problems to be discovered promptly and the worker to be reminded, reducing the product rejection rate and production cost. How to monitor workers' assembly actions during assembly at low cost and with high efficiency is therefore a development direction of the future assembly and manufacturing industry.
Computer vision technology enables a computer, through suitable algorithms, to analyze and understand images or videos as humans do. In recent years, with the wide application of deep learning algorithms in computer vision, tasks such as pattern recognition, object detection, action recognition, and human pose estimation have all made breakthrough progress. Deep learning can learn features adaptively, and end-to-end training only requires designing a suitable neural network model, which avoids the feature engineering of traditional machine learning and the need for complex hand-crafted features. Monitoring assembly actions with deep-learning-based computer vision can therefore effectively avoid product quality problems.
Disclosure of Invention
In order to solve the above technical problems, the method and device for monitoring assembly operations based on deep learning use object detection and pose recognition technology to recognize workers' assembly actions and judge the number of actions, thereby monitoring the product assembly process, improving the degree of intelligence, reducing operating cost, and shortening the production cycle.
The technical solutions adopted by the invention are as follows:
Technical solution 1 is as follows:
a method of monitoring assembly operations based on deep learning, comprising the steps of:
establishing an assembly tool sample set: the method comprises the steps that an experimenter performs assembly actions by using different assembly tools, images of the assembly actions performed by the experimenter are shot, images of the different assembly tools in the images are collected, each image is marked, image labels containing tool type information and position information in the images are generated, and all the image labels are collected to be used as an assembly tool sample set;
training an assembly tool monitoring model: sending the assembly tool sample set into a target detection deep learning network model for training, and generating an assembly tool monitoring model;
real-time monitoring: installing a camera on an assembly site to shoot an image assembled by an assembly worker in real time, and sending the image into an assembly tool monitoring model for identification, so as to identify an assembly tool used by the assembly worker in the image;
monitoring the assembly action times: after the assembly tool used by the assembly worker in the image is identified, the image is sent into a human body gesture identification network model for identification, human body joint movement information of the assembly worker is identified, and the number of times that the assembly worker uses the assembly tool in the current image to carry out assembly action is judged by using the human body joint movement information;
error early warning: comparing the sequence of the assembling tools operated by the assembling workers and the times of assembling actions by using the assembling tools with the assembling flow of the original assembling production line, and prompting whether the assembling is correct or not to the assembling workers according to the comparison result.
Further, the step of establishing the assembly tool sample set specifically includes:
shooting assembly actions of a plurality of experimenters by adopting an RGB camera, shooting images of different assembly actions of each experimenter, shooting for 10 seconds by each assembly action, and extracting 10 images per second;
marking the assembly tool in each image by using image marking software, generating an image tag comprising the coordinate information of the assembly tool and the category information of the assembly tool, and discarding the image if the image has no tool information;
and carrying out data set division on all the image labels, and dividing the image labels into a training set, a verification set and a test set.
Further, the step of training the assembly tool monitoring model comprises the following steps:
setting super parameters of a target detection deep learning network model, inputting images in a training set and a verification set into the target detection deep learning network model for training, and continuously optimizing network weights according to the result of the verification set in the training process;
during training, according to the change conditions of an accuracy rate curve and a loss curve in the training process, determining the optimal training iteration times, stopping training, and storing the trained network weight;
selecting different IoU thresholds, comparing how the precision and recall metrics change under the different thresholds, and determining an optimal IoU threshold;
testing on the test set with the saved trained network weights and the optimal IoU threshold; if the test results meet expectations and no overfitting occurs, training of the assembly tool monitoring model is complete.
Further, the step of monitoring the number of assembly actions specifically includes:
using an RGB camera to shoot RGB images of assembly operations of assembly workers in real time on an assembly operation site, using an assembly tool monitoring model to monitor the shot images in real time, and judging that the assembly workers are about to use the assembly tools to perform corresponding assembly actions when an assembly tool appears in the images;
after detecting an assembly tool, immediately inputting the RGB image into a human body gesture recognition network model, and acquiring the coordinate information of each joint point and each joint point of a human body in the RGB image;
counting the upper limit and the lower limit of human body articulation point coordinates according to the action range of the human body articulation point for assembly action, carrying out data cleaning on articulation point coordinate information according to the upper limit and the lower limit of the human body articulation point coordinates, and eliminating articulation point coordinate information which does not accord with the range;
and drawing a curve of the change of the coordinate of the joint point along with time by using the coordinate information of all the joint points after data cleaning, and judging the number of assembly actions through the alternating change of wave peaks and wave troughs in the curve.
Further, mean subtraction is performed on all the joint point coordinate information after the data cleaning.
Technical solution 2 is as follows:
an apparatus for deep learning based monitoring of assembly operations, comprising a memory and a processor, the memory storing instructions adapted to be loaded by the processor and to perform the steps of:
establishing an assembly tool sample set: the method comprises the steps that an experimenter performs assembly actions by using different assembly tools, images of the assembly actions performed by the experimenter are shot, images of the different assembly tools in the images are collected, each image is marked, image labels containing tool type information and position information in the images are generated, and all the image labels are collected to be used as an assembly tool sample set;
training an assembly tool monitoring model: sending the assembly tool sample set into a target detection deep learning network model for training, and generating an assembly tool monitoring model;
real-time monitoring: installing a camera on an assembly site to shoot an image assembled by an assembly worker in real time, and sending the image into an assembly tool monitoring model for identification, so as to identify an assembly tool used by the assembly worker in the image;
monitoring the assembly action times: after the assembly tool used by the assembly worker in the image is identified, the image is sent into a human body gesture identification network model for identification, human body joint movement information of the assembly worker is identified, and the number of times that the assembly worker uses the assembly tool in the current image to carry out assembly action is judged by using the human body joint movement information;
error early warning: comparing the sequence of the assembling tools operated by the assembling workers and the times of assembling actions by using the assembling tools with the assembling flow of the original assembling production line, and prompting whether the assembling is correct or not to the assembling workers according to the comparison result.
Further, the step of establishing the assembly tool sample set specifically includes:
shooting assembly actions of a plurality of experimenters by adopting an RGB camera, shooting images of different assembly actions of each experimenter, shooting for 10 seconds by each assembly action, and extracting 10 images per second;
marking the assembly tool in each image by using image marking software, generating an image tag comprising the coordinate information of the assembly tool and the category information of the assembly tool, and discarding the image if the image has no tool information;
and carrying out data set division on all the image labels, and dividing the image labels into a training set, a verification set and a test set.
Further, the step of training the assembly tool monitoring model comprises the following steps:
setting super parameters of a target detection deep learning network model, inputting images in a training set and a verification set into the target detection deep learning network model for training, and continuously optimizing network weights according to the result of the verification set in the training process;
during training, according to the change conditions of an accuracy rate curve and a loss curve in the training process, determining the optimal training iteration times, stopping training, and storing the trained network weight;
selecting different IoU thresholds, comparing how the precision and recall metrics change under the different thresholds, and determining an optimal IoU threshold;
testing on the test set with the saved trained network weights and the optimal IoU threshold; if the test results meet expectations and no overfitting occurs, training of the assembly tool monitoring model is complete.
Further, the step of monitoring the number of assembly actions specifically includes:
using an RGB camera to shoot RGB images of assembly operations of assembly workers in real time on an assembly operation site, using an assembly tool monitoring model to monitor the shot images in real time, and judging that the assembly workers are about to use the assembly tools to perform corresponding assembly actions when an assembly tool appears in the images;
after detecting an assembly tool, immediately inputting the RGB image into a human body gesture recognition network model, and acquiring the coordinate information of each joint point and each joint point of a human body in the RGB image;
counting the upper limit and the lower limit of human body articulation point coordinates according to the action range of the human body articulation point for assembly action, carrying out data cleaning on articulation point coordinate information according to the upper limit and the lower limit of the human body articulation point coordinates, and eliminating articulation point coordinate information which does not accord with the range;
and drawing a curve of the change of the coordinate of the joint point along with time by using the coordinate information of all the joint points after data cleaning, and judging the number of assembly actions through the alternating change of wave peaks and wave troughs in the curve.
Further, mean subtraction is performed on all the joint point coordinate information after the data cleaning.
The invention has the following beneficial effects:
1. In the method and device for monitoring assembly operations based on deep learning, object detection replaces action detection: the type of assembly operation (such as hammering, filing, or tightening a nut) can be judged from a single frame of image, and the number of repetitions of the assembly action can also be judged, thereby monitoring the assembly process.
2. The method and device for monitoring assembly operations based on deep learning treat assembly action recognition as tool object detection, based on the repeatability of assembly actions and their dependence on tools. Compared with traditional action recognition algorithms, this gives higher recognition accuracy and faster recognition speed.
3. In the method and device for monitoring assembly operations based on deep learning, the human body gesture recognition network model is used for pose estimation and the coordinate data of key joint points are extracted; finally, by cleaning and analyzing the coordinate data, the number of repetitions of the assembly action is judged from how the coordinate information changes over time. After the surveillance video of the assembly site is processed in this way, a manager can quickly judge how many times a worker repeated an action from just one image; compared with ordinary human supervision, this greatly reduces labor input and improves monitoring quality.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention;
FIG. 2 is an exemplary diagram of an assembly tool monitoring model detecting an assembly tool;
FIGS. 3 and 4 are example diagrams of identifying node data using a human gesture recognition network model;
FIG. 5 is a hammer action curve plotted in the example;
FIG. 6 is a file action curve plotted in the example;
fig. 7 is a graph of the action of screwing nuts plotted in the example.
Detailed Description
The invention will now be described in detail with reference to the drawings and to specific embodiments.
Example 1
Referring to fig. 1, a method for monitoring assembly operations based on deep learning includes the following steps:
Establishing an assembly tool sample set: an experimenter performs assembly actions using different assembly tools; images of the experimenter performing the assembly actions are captured; images of the different assembly tools appearing in them are collected; each image is labeled (for example with image labeling software such as LabelImg), generating an image label containing the tool category information and the tool's position in the image; and all image labels are gathered as the assembly tool sample set.
Training an assembly tool monitoring model: the assembly tool sample set is fed into an object detection deep learning network model for training, generating the assembly tool monitoring model. During training, the object detection network learns the appearance and approximate outline of each labeled tool category from the position information marked in the image labels; the trained network is then used to analyze video images and, using the learned parameters, quickly find and box the corresponding targets in the video.
Real-time monitoring: a camera is installed on the assembly site to capture images of the assembly worker in real time, and each image is fed into the assembly tool monitoring model to identify the assembly tool the worker is using; the current assembly action is then judged from the tool, e.g., if the tool identified in the image is a hammer, the worker is judged to be hammering.
Monitoring the number of assembly actions: after the assembly tool used by the worker in the image is identified, the image is fed into the human body gesture recognition network model, the worker's joint movement information is extracted, and the number of times the worker performs the assembly action with the current tool is judged from that information. For example: when the assembly tool monitoring model recognizes that an assembly tool, e.g. a hammer, appears in the image captured by the camera in real time, it immediately starts counting the worker's actions through the human body gesture recognition network model, until the hammer can no longer be detected in the camera image; the number of actions recorded during this period (for example, 5 hammer blows) is then counted.
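For illustration, a minimal sketch of this per-frame monitoring loop is given below. The helper callables detect_tool (standing in for the trained assembly tool monitoring model), wrist_coord (the joint coordinate returned by the pose recognition model), and count_actions are hypothetical names introduced here, not part of the patent.

```python
# Minimal sketch of the monitoring loop: watch for a tool, collect joint
# coordinates while it is visible, report the action count when it disappears.
import cv2

def monitor(video_source, detect_tool, wrist_coord, count_actions):
    cap = cv2.VideoCapture(video_source)
    current_tool, trajectory = None, []
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        tool = detect_tool(frame)              # e.g. "hammer", "file", or None
        if tool is not None:
            if current_tool is None:           # a new assembly action begins
                current_tool, trajectory = tool, []
            trajectory.append(wrist_coord(frame))
        elif current_tool is not None:         # tool disappeared: action finished
            print(current_tool, count_actions(trajectory))
            current_tool, trajectory = None, []
    cap.release()
```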
Error early warning: the sequence of assembly tools operated by the worker and the number of assembly actions performed with each tool are compared with the assembly process of the original assembly line, and the worker is prompted whether the assembly is correct according to the comparison result. For example, suppose the assembly process of the original assembly line is to hammer the product five times, file five times, and tighten the nut three times. The monitored information is compared with this original process: if during assembly the worker first hammers five times and then files only four times before the assembly tool monitoring model detects that the file has disappeared from the image, filing is judged to be finished, the action count does not match the required count, and an alarm is issued to the worker; the alarm can be given automatically by voice or by subtitle.
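A minimal sketch of this comparison step, assuming both the reference process and the monitored result are represented as ordered (tool, count) pairs; the data values are the illustrative numbers from the example above.

```python
# Compare the monitored (tool, count) sequence with the reference assembly
# process and return a warning for the first deviation found.
def check_assembly(reference, observed):
    for step, (expected, actual) in enumerate(zip(reference, observed), start=1):
        if expected != actual:
            return (f"step {step}: expected {expected[0]} x{expected[1]}, "
                    f"observed {actual[0]} x{actual[1]}")
    if len(observed) < len(reference):
        return f"missing steps: {reference[len(observed):]}"
    return "assembly correct"

reference = [("hammer", 5), ("file", 5), ("nut", 3)]   # original assembly process
observed = [("hammer", 5), ("file", 4)]                # monitored result so far
print(check_assembly(reference, observed))             # warns about the filing count
```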
In this embodiment, object detection replaces action detection: the type of assembly operation (such as hammering, filing, or tightening a nut) can be judged from a single frame of image, and the number of repetitions of the assembly action can also be judged, thereby monitoring the assembly process.
Example two
The present embodiment not only has the advantages of the first embodiment, but also proposes that the step of creating the assembly tool sample set specifically includes:
an RGB camera is used to capture the assembly actions of several experimenters, with images of each experimenter performing different basic assembly actions, such as hammering, filing, brushing, sawing, and tightening nuts; each assembly action is captured for 10 seconds, and 10 images are extracted per second;
marking the assembly tool in each image by using image marking software, generating an image tag comprising the coordinate information of the assembly tool and the category information of the assembly tool, and discarding the image if the image has no tool information;
and carrying out data set division on all the image labels, and dividing the image labels into a training set, a verification set and a test set.
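As a sketch of the data set division, assuming one label file per image (as produced by LabelImg-style tools) and an 8:1:1 ratio; the ratio and the *.xml label format are assumptions, since the patent only requires three subsets.

```python
# Randomly divide the labeled images into training, validation and test sets.
import random
from pathlib import Path

def split_dataset(label_dir, seed=0, ratios=(0.8, 0.1, 0.1)):
    labels = sorted(Path(label_dir).glob("*.xml"))   # one label file per image
    random.Random(seed).shuffle(labels)
    n_train = int(ratios[0] * len(labels))
    n_val = int(ratios[1] * len(labels))
    return {
        "train": labels[:n_train],
        "val": labels[n_train:n_train + n_val],
        "test": labels[n_train + n_val:],
    }
```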
Further, the step of training the assembly tool monitoring model comprises the following steps:
setting the hyperparameters of the object detection deep learning network model (the gradient descent momentum, the initial learning rate, and the decay coefficient); the images in the training set and the validation set are input into the object detection deep learning network model (this embodiment uses the YOLOv3 object detection network) for training, and during training the network weights are continuously optimized according to the results on the validation set;
during training, the optimal number of training iterations is determined from the changes in the accuracy curve and the loss curve; training is then stopped and the trained network weights are saved;
selecting different IoU thresholds, comparing how the precision and recall metrics change under the different thresholds, and determining an optimal IoU threshold;
testing on the test set with the saved trained network weights and the optimal IoU threshold; if the test results meet expectations and no overfitting occurs, training of the assembly tool monitoring model is complete.
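The IoU threshold selection can be illustrated with the following sketch, which computes precision and recall of the detections against ground-truth boxes at several candidate thresholds. The (x1, y1, x2, y2) box format, the single-class greedy matching, and the toy boxes are simplifying assumptions, not details given in the patent.

```python
# Sweep IoU thresholds and compare precision/recall; boxes are (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, thr):
    tp = sum(1 for d in detections if any(iou(d, g) >= thr for g in ground_truth))
    fp = len(detections) - tp
    fn = sum(1 for g in ground_truth if not any(iou(d, g) >= thr for d in detections))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# toy example: one ground-truth box and one detection overlapping it at ~0.68 IoU
gts = [(10, 10, 110, 110)]
dets = [(20, 20, 120, 120)]
for thr in (0.3, 0.5, 0.7):
    print(thr, precision_recall(dets, gts, thr))
```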
Referring specifically to fig. 2, further, the step of monitoring the number of assembly actions specifically includes:
an RGB camera captures RGB images of the worker's assembly operation in real time on the assembly site; the trained assembly tool monitoring model detects tool information in each input image in real time and outputs the tool category and position detected in the current frame; when an assembly tool is detected in the current frame, the worker is judged to be about to use that tool to perform the corresponding assembly action; as shown in fig. 2, when the tool detected in the image is a hammer, the current assembly action is judged to be hammering;
referring to fig. 3 and fig. 4, after an assembly tool is detected, the RGB image is immediately input into the human body posture recognition network model (this embodiment uses the OpenPose pose estimation network), and the coordinate information of each joint point of the human body in the RGB image is acquired; the joint point coordinate information is collected frame by frame until the assembly tool can no longer be detected (i.e. the current assembly action has ended), at which point the collection of joint point coordinate information ends.
In this embodiment, the number of actions is estimated from the change over time of the vertical (Y-direction) coordinate for the hammering action, from the change over time of the horizontal (X-direction) coordinate for the filing action, and from the joint change in both the X and Y directions for the nut-tightening action. OpenPose outputs the human joint coordinates in the format "pose_keypoints_2d": [x_1, y_1, c_1, ..., x_n, y_n, c_n], where x_n, y_n are the joint point coordinates and c_n is the confidence of the joint position prediction. The joint point coordinates of each frame are extracted in time order; in this embodiment only the wrist joint of the arm grasping the tool is extracted.
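A minimal sketch of extracting the wrist trajectory from OpenPose's per-frame JSON output is shown below. It assumes a keypoint layout (such as BODY_25 or COCO) in which index 4 is the right wrist, and one JSON file per frame; these are assumptions about the OpenPose configuration, not statements from the patent.

```python
# Extract the right-wrist (x, y, confidence) triple from per-frame OpenPose JSON.
import json
from pathlib import Path

RIGHT_WRIST = 4  # keypoint index in the BODY_25 / COCO layouts (assumed)

def wrist_trajectory(json_dir):
    trajectory = []
    for path in sorted(Path(json_dir).glob("*_keypoints.json")):
        people = json.loads(path.read_text()).get("people", [])
        if not people:
            trajectory.append((0.0, 0.0, 0.0))     # no person detected in this frame
            continue
        k = people[0]["pose_keypoints_2d"]         # [x_1, y_1, c_1, ..., x_n, y_n, c_n]
        x, y, c = k[3 * RIGHT_WRIST: 3 * RIGHT_WRIST + 3]
        trajectory.append((x, y, c))
    return trajectory
```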
Due to model and system performance limitations, it is difficult to achieve one hundred percent accuracy: in some frames the joint positions cannot be detected, or the predicted joints deviate seriously from the real situation, so some coordinates are 0 or jump sharply, and data cleaning is required. In this step, the upper and lower limits of the human joint point coordinates are determined from the range of motion of the joint during the assembly action, the joint coordinate information is cleaned according to these limits, and joint coordinate information outside the range is removed.
The cleaned joint point coordinate information is then used to plot the joint coordinates against time (in this embodiment with the Python third-party library matplotlib), and the number of assembly actions is judged from the alternation of peaks and troughs in the curve.
Further, mean subtraction is applied to all the joint point coordinate information after data cleaning; for example, if the Y coordinate of the hammering action varies between 900 and 950, a value such as 900 or 890 can be subtracted from all Y coordinates, making the fluctuation of the action easier to see.
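The cleaning, mean subtraction, plotting, and peak counting described above might look like the following sketch. The coordinate bounds, the use of scipy.signal.find_peaks, and the prominence value are assumptions for illustration; the patent itself only requires counting the alternating peaks and troughs of the curve.

```python
# Clean a wrist coordinate series, subtract its mean, plot it, and count the
# assembly actions from the peaks of the resulting curve.
import matplotlib.pyplot as plt
import numpy as np
from scipy.signal import find_peaks

def count_actions(coords, lower, upper, prominence=5.0):
    ys = np.asarray(coords, dtype=float)
    ys = ys[(ys >= lower) & (ys <= upper)]   # data cleaning: drop zeros and jumps
    ys = ys - ys.mean()                      # mean subtraction for clearer fluctuation
    peaks, _ = find_peaks(ys, prominence=prominence)  # one peak per action
    plt.plot(ys)
    plt.xlabel("frame")
    plt.ylabel("wrist coordinate (mean-subtracted)")
    plt.show()
    return len(peaks)

# e.g. hammering, with the wrist Y coordinate varying roughly between 900 and 950
# repetitions = count_actions(y_series, lower=900, upper=950)
```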
The plotted curves of this embodiment are shown in figs. 5-7. Fig. 5 is the hammering action curve, where the ordinate is the Y-direction coordinate of the joint point in real space and the abscissa is the frame number; fig. 6 is the filing action curve, where the ordinate is the X-direction coordinate of the joint point in real space and the abscissa is the frame number; fig. 7 is the nut-tightening action curve, plotted with the X and Y coordinates of the joint point in real space as the abscissa and ordinate and the frame number as the Z-axis coordinate. From the curve peaks it is judged that the hammering and filing actions were each performed 5 times and the nut-tightening action 3 times.
Based on the repeatability of assembly actions and their dependence on tools, this embodiment treats assembly action recognition as tool object detection. Compared with conventional action recognition algorithms, the YOLOv3 object detection algorithm is better suited to such repetitive, tool-dependent assembly actions, and the YOLOv3 model achieves higher recognition accuracy and faster recognition speed.
OpenPose is used for pose estimation and the coordinate data of the key joint points are extracted; finally, by cleaning and analyzing the coordinate data, the number of repetitions of the assembly action is judged from how the coordinate information changes over time. After the surveillance video of the assembly site is processed by the method of this embodiment, a manager can quickly judge how many times a worker repeated an action from just one image; compared with ordinary human supervision, this greatly reduces labor input and improves monitoring quality.
Example III
An apparatus for deep learning based monitoring of assembly operations, comprising a memory and a processor, the memory storing instructions adapted to be loaded by the processor and to perform the steps of:
Establishing an assembly tool sample set: an experimenter performs assembly actions using different assembly tools; images of the experimenter performing the assembly actions are captured; images of the different assembly tools appearing in them are collected; each image is labeled (for example with image labeling software such as LabelImg), generating an image label containing the tool category information and the tool's position in the image; and all image labels are gathered as the assembly tool sample set.
Training an assembly tool monitoring model: the assembly tool sample set is fed into an object detection deep learning network model for training, generating the assembly tool monitoring model. During training, the object detection network learns the appearance and approximate outline of each labeled tool category from the position information marked in the image labels; the trained network is then used to analyze video images and, using the learned parameters, quickly find and box the corresponding targets in the video.
Real-time monitoring: a camera is installed on the assembly site to capture images of the assembly worker in real time, and each image is fed into the assembly tool monitoring model to identify the assembly tool the worker is using; the current assembly action is then judged from the tool, e.g., if the tool identified in the image is a hammer, the worker is judged to be hammering.
Monitoring the number of assembly actions: after the assembly tool used by the worker in the image is identified, the image is fed into the human body gesture recognition network model, the worker's joint movement information is extracted, and the number of times the worker performs the assembly action with the current tool is judged from that information. For example: when the assembly tool monitoring model recognizes that an assembly tool, e.g. a hammer, appears in the image captured by the camera in real time, it immediately starts counting the worker's actions through the human body gesture recognition network model, until the hammer can no longer be detected in the camera image; the number of actions recorded during this period (for example, 5 hammer blows) is then counted.
Error early warning: the sequence of assembly tools operated by the worker and the number of assembly actions performed with each tool are compared with the assembly process of the original assembly line, and the worker is prompted whether the assembly is correct according to the comparison result. For example, suppose the assembly process of the original assembly line is to hammer the product five times, file five times, and tighten the nut three times. The monitored information is compared with this original process: if during assembly the worker first hammers five times and then files only four times before the assembly tool monitoring model detects that the file has disappeared from the image, filing is judged to be finished, the action count does not match the required count, and an alarm is issued to the worker; the alarm can be given automatically by voice or by subtitle.
In this embodiment, object detection replaces action detection: the type of assembly operation (such as hammering, filing, or tightening a nut) can be judged from a single frame of image, and the number of repetitions of the assembly action can also be judged, thereby monitoring the assembly process.
Example IV
This embodiment not only has the advantages of the third embodiment, but also proposes that the step of creating the assembly tool sample set specifically includes:
an RGB camera is used to capture the assembly actions of several experimenters, with images of each experimenter performing different basic assembly actions, such as hammering, filing, brushing, sawing, and tightening nuts; each assembly action is captured for 10 seconds, and 10 images are extracted per second;
marking the assembly tool in each image by using image marking software, generating an image tag comprising the coordinate information of the assembly tool and the category information of the assembly tool, and discarding the image if the image has no tool information;
and carrying out data set division on all the image labels, and dividing the image labels into a training set, a verification set and a test set.
Further, the step of training the assembly tool monitoring model comprises the following steps:
setting the hyperparameters of the object detection deep learning network model (the gradient descent momentum, the initial learning rate, and the decay coefficient); the images in the training set and the validation set are input into the object detection deep learning network model (this embodiment uses the YOLOv3 object detection network) for training, and during training the network weights are continuously optimized according to the results on the validation set;
during training, the optimal number of training iterations is determined from the changes in the accuracy curve and the loss curve; training is then stopped and the trained network weights are saved;
selecting different IoU thresholds, comparing how the precision and recall metrics change under the different thresholds, and determining an optimal IoU threshold;
testing on the test set with the saved trained network weights and the optimal IoU threshold; if the test results meet expectations and no overfitting occurs, training of the assembly tool monitoring model is complete.
Referring specifically to fig. 2, further, the step of monitoring the number of assembly actions specifically includes:
an RGB camera captures RGB images of the worker's assembly operation in real time on the assembly site; the trained assembly tool monitoring model detects tool information in each input image in real time and outputs the tool category and position detected in the current frame; when an assembly tool is detected in the current frame, the worker is judged to be about to use that tool to perform the corresponding assembly action; as shown in fig. 2, when the tool detected in the image is a hammer, the current assembly action is judged to be hammering;
referring to fig. 3 and fig. 4, after an assembly tool is detected, the RGB image is immediately input into the human body posture recognition network model (this embodiment uses the OpenPose pose estimation network), and the coordinate information of each joint point of the human body in the RGB image is acquired; the joint point coordinate information is collected frame by frame until the assembly tool can no longer be detected (i.e. the current assembly action has ended), at which point the collection of joint point coordinate information ends.
In this embodiment, the number of actions is estimated from the change over time of the vertical (Y-direction) coordinate for the hammering action, from the change over time of the horizontal (X-direction) coordinate for the filing action, and from the joint change in both the X and Y directions for the nut-tightening action. OpenPose outputs the human joint coordinates in the format "pose_keypoints_2d": [x_1, y_1, c_1, ..., x_n, y_n, c_n], where x_n, y_n are the joint point coordinates and c_n is the confidence of the joint position prediction. The joint point coordinates of each frame are extracted in time order; in this embodiment only the wrist joint of the arm grasping the tool is extracted.
Due to model and system performance limitations, it is difficult to achieve one hundred percent accuracy: in some frames the joint positions cannot be detected, or the predicted joints deviate seriously from the real situation, so some coordinates are 0 or jump sharply, and data cleaning is required. In this step, the upper and lower limits of the human joint point coordinates are determined from the range of motion of the joint during the assembly action, the joint coordinate information is cleaned according to these limits, and joint coordinate information outside the range is removed.
The cleaned joint point coordinate information is then used to plot the joint coordinates against time (in this embodiment with the Python third-party library matplotlib), and the number of assembly actions is judged from the alternation of peaks and troughs in the curve.
Further, mean subtraction is applied to all the joint point coordinate information after data cleaning; for example, if the Y coordinate of the hammering action varies between 900 and 950, a value such as 900 or 890 can be subtracted from all Y coordinates, making the fluctuation of the action easier to see.
The plotted curves of this embodiment are shown in figs. 5-7. Fig. 5 is the hammering action curve, where the ordinate is the Y-direction coordinate of the joint point in real space and the abscissa is the frame number; fig. 6 is the filing action curve, where the ordinate is the X-direction coordinate of the joint point in real space and the abscissa is the frame number; fig. 7 is the nut-tightening action curve, plotted with the X and Y coordinates of the joint point in real space as the abscissa and ordinate and the frame number as the Z-axis coordinate. From the curve peaks it is judged that the hammering and filing actions were each performed 5 times and the nut-tightening action 3 times.
Based on the repeatability of assembly actions and their dependence on tools, this embodiment treats assembly action recognition as tool object detection. Compared with conventional action recognition algorithms, the YOLOv3 object detection algorithm is better suited to such repetitive, tool-dependent assembly actions, and the YOLOv3 model achieves higher recognition accuracy and faster recognition speed.
OpenPose is used for pose estimation and the coordinate data of the key joint points are extracted; finally, by cleaning and analyzing the coordinate data, the number of repetitions of the assembly action is judged from how the coordinate information changes over time. After the surveillance video of the assembly site is processed by the method of this embodiment, a manager can quickly judge how many times a worker repeated an action from just one image; compared with ordinary human supervision, this greatly reduces labor input and improves monitoring quality.
The foregoing description is only illustrative of the present invention and is not intended to limit the scope of the invention, and all equivalent structures or equivalent processes or direct or indirect application in other related technical fields are included in the scope of the present invention.

Claims (8)

1. A method for monitoring assembly operations based on deep learning, comprising the steps of:
establishing an assembly tool sample set: the method comprises the steps that an experimenter performs assembly actions by using different assembly tools, images of the assembly actions performed by the experimenter are shot, images of the different assembly tools in the images are collected, each image is marked, image labels containing tool type information and position information in the images are generated, and all the image labels are collected to be used as an assembly tool sample set;
training an assembly tool monitoring model: sending the assembly tool sample set into a target detection deep learning network model for training, and generating an assembly tool monitoring model;
real-time monitoring: installing a camera on an assembly site to shoot an image assembled by an assembly worker in real time, and sending the image into an assembly tool monitoring model for identification, so as to identify an assembly tool used by the assembly worker in the image;
monitoring the assembly action times: after the assembly tool used by the assembly worker in the image is identified, the image is sent into a human body gesture identification network model for identification, human body joint movement information of the assembly worker is identified, and the number of times that the assembly worker uses the assembly tool in the current image to carry out assembly action is judged by using the human body joint movement information;
error early warning: comparing the sequence of assembling tools operated by an assembling worker and the times of assembling actions by using each assembling tool with the assembling flow of the original assembling production line, and prompting whether the assembling is correct or not to the assembling worker according to the comparison result;
the assembly action frequency monitoring step specifically comprises the following steps:
using an RGB camera to shoot RGB images of assembly operations of assembly workers in real time on an assembly operation site, using an assembly tool monitoring model to monitor the shot images in real time, and judging that the assembly workers are about to use the assembly tools to perform corresponding assembly actions when an assembly tool appears in the images;
after detecting an assembly tool, immediately inputting the RGB image into a human body gesture recognition network model, and acquiring the coordinate information of each joint point and each joint point of a human body in the RGB image;
counting the upper limit and the lower limit of human body articulation point coordinates according to the action range of the human body articulation point for assembly action, carrying out data cleaning on articulation point coordinate information according to the upper limit and the lower limit of the human body articulation point coordinates, and eliminating articulation point coordinate information which does not accord with the range;
and drawing a curve of the change of the coordinate of the joint point along with time by using the coordinate information of all the joint points after data cleaning, and judging the number of assembly actions through the alternating change of wave peaks and wave troughs in the curve.
2. The method for monitoring assembly operations based on deep learning of claim 1, wherein the step of creating an assembly tool sample set is specifically:
shooting assembly actions of a plurality of experimenters by adopting an RGB camera, shooting images of different assembly actions of each experimenter, shooting for 10 seconds by each assembly action, and extracting 10 images per second;
marking the assembly tool in each image by using image marking software, generating an image tag comprising the coordinate information of the assembly tool and the category information of the assembly tool, and discarding the image if the image has no tool information;
and carrying out data set division on all the image labels, and dividing the image labels into a training set, a verification set and a test set.
3. The method for monitoring assembly operations based on deep learning of claim 2, wherein the training assembly tool monitoring model step specifically comprises:
setting super parameters of a target detection deep learning network model, inputting images in a training set and a verification set into the target detection deep learning network model for training, and continuously optimizing network weights according to the result of the verification set in the training process;
during training, according to the change conditions of an accuracy rate curve and a loss curve in the training process, determining the optimal training iteration times, stopping training, and storing the trained network weight;
selecting different IoU thresholds, comparing how the precision and recall metrics change under the different thresholds, and determining an optimal IoU threshold;
testing on the test set with the saved trained network weights and the optimal IoU threshold; if the test results meet expectations and no overfitting occurs, training of the assembly tool monitoring model is complete.
4. A method of monitoring assembly operations based on deep learning as claimed in claim 1, wherein mean subtraction is performed on the coordinate information of all the joint points after the data cleaning.
5. An apparatus for deep learning based monitoring of assembly operations, comprising a memory and a processor, the memory storing instructions adapted to be loaded by the processor and to perform the steps of:
establishing an assembly tool sample set: the method comprises the steps that an experimenter performs assembly actions by using different assembly tools, images of the assembly actions performed by the experimenter are shot, images of the different assembly tools in the images are collected, each image is marked, image labels containing tool type information and position information in the images are generated, and all the image labels are collected to be used as an assembly tool sample set;
training an assembly tool monitoring model: sending the assembly tool sample set into a target detection deep learning network model for training, and generating an assembly tool monitoring model;
real-time monitoring: installing a camera on an assembly site to shoot an image assembled by an assembly worker in real time, and sending the image into an assembly tool monitoring model for identification, so as to identify an assembly tool used by the assembly worker in the image;
monitoring the assembly action times: after the assembly tool used by the assembly worker in the image is identified, the image is sent into a human body gesture identification network model for identification, human body joint movement information of the assembly worker is identified, and the number of times that the assembly worker uses the assembly tool in the current image to carry out assembly action is judged by using the human body joint movement information;
error early warning: comparing the sequence of assembling tools operated by an assembling worker and the times of assembling actions by using each assembling tool with the assembling flow of the original assembling production line, and prompting whether the assembling is correct or not to the assembling worker according to the comparison result;
the assembly action frequency monitoring step specifically comprises the following steps:
using an RGB camera to shoot RGB images of assembly operations of assembly workers in real time on an assembly operation site, using an assembly tool monitoring model to monitor the shot images in real time, and judging that the assembly workers are about to use the assembly tools to perform corresponding assembly actions when an assembly tool appears in the images;
after detecting an assembly tool, immediately inputting the RGB image into a human body gesture recognition network model, and acquiring the coordinate information of each joint point and each joint point of a human body in the RGB image;
counting the upper limit and the lower limit of human body articulation point coordinates according to the action range of the human body articulation point for assembly action, carrying out data cleaning on articulation point coordinate information according to the upper limit and the lower limit of the human body articulation point coordinates, and eliminating articulation point coordinate information which does not accord with the range;
and drawing a curve of the change of the coordinate of the joint point along with time by using the coordinate information of all the joint points after data cleaning, and judging the number of assembly actions through the alternating change of wave peaks and wave troughs in the curve.
6. The apparatus for deep learning based monitoring of assembly operations of claim 5, wherein the step of creating an assembly tool sample set is specifically:
shooting assembly actions of a plurality of experimenters by adopting an RGB camera, shooting images of different assembly actions of each experimenter, shooting for 10 seconds by each assembly action, and extracting 10 images per second;
marking the assembly tool in each image by using image marking software, generating an image tag comprising the coordinate information of the assembly tool and the category information of the assembly tool, and discarding the image if the image has no tool information;
and carrying out data set division on all the image labels, and dividing the image labels into a training set, a verification set and a test set.
7. The apparatus for deep learning based monitoring of assembly operations of claim 6, wherein the step of training the assembly tool monitoring model specifically comprises:
setting the hyperparameters of a target detection deep learning network model, inputting the images of the training set and the validation set into the target detection deep learning network model for training, and continuously optimizing the network weights during training according to the results on the validation set;
during training, determining the optimal number of training iterations from the behaviour of the accuracy curve and the loss curve, stopping training, and saving the trained network weights;
selecting different IoU thresholds, comparing how the accuracy and recall indices change under the different thresholds, and determining an optimal IoU threshold;
and testing on the test set with the saved trained network weights and the optimal IoU threshold; if the test results meet expectations and no overfitting occurs, training of the assembly tool monitoring model is complete.
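As an illustrative sketch only, not part of the claims, the IoU threshold comparison can be expressed as follows; the box format, the matched IoU values and the counts of predictions and ground-truth boxes are assumptions:

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter)

def precision_recall(matched_ious, num_predictions, num_ground_truth, threshold):
    """Treat a prediction as a true positive when its best IoU reaches the threshold."""
    tp = sum(1 for v in matched_ious if v >= threshold)
    precision = tp / num_predictions if num_predictions else 0.0
    recall = tp / num_ground_truth if num_ground_truth else 0.0
    return precision, recall

print(round(iou((10, 10, 50, 50), (20, 20, 60, 60)), 2))   # 0.39 for these two boxes

# Best IoU of each prediction against its matched ground-truth box (hypothetical values).
best_ious = [0.82, 0.66, 0.48, 0.91, 0.30]
for t in (0.5, 0.6, 0.7):
    print(t, precision_recall(best_ious, num_predictions=5, num_ground_truth=6, threshold=t))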
8. The apparatus for deep learning based monitoring of assembly operations of claim 5, wherein noise reduction is performed by averaging all the joint point coordinate information after the data cleaning.
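If the averaging in claim 8 is read as a running mean over the cleaned joint coordinates (an assumption), a minimal sketch, not part of the claims, is the following; the window length is likewise assumed:

def moving_average(values, window=5):
    """Smooth the cleaned joint coordinate sequence with a running mean."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

# Hypothetical cleaned wrist y-coordinates; the smoothed sequence would then be
# used to draw the peak/trough curve described in claim 5.
print(moving_average([300, 340, 390, 360, 310], window=3))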
CN201911383314.3A 2019-12-28 2019-12-28 Method and device for monitoring assembly operation based on deep learning Active CN111062364B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911383314.3A CN111062364B (en) 2019-12-28 2019-12-28 Method and device for monitoring assembly operation based on deep learning

Publications (2)

Publication Number Publication Date
CN111062364A CN111062364A (en) 2020-04-24
CN111062364B (en) 2023-06-30

Family

ID=70304349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911383314.3A Active CN111062364B (en) 2019-12-28 2019-12-28 Method and device for monitoring assembly operation based on deep learning

Country Status (1)

Country Link
CN (1) CN111062364B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112651306A (en) * 2020-12-10 2021-04-13 深兰人工智能(四川)有限公司 Tool taking monitoring method and device
CN114155610B (en) * 2021-12-09 2023-01-24 中国矿业大学 Panel assembly key action identification method based on upper half body posture estimation
CN116385442B (en) * 2023-06-06 2023-08-18 青岛理工大学 Virtual assembly defect detection method based on deep learning

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017133009A1 (en) * 2016-02-04 2017-08-10 广州新节奏智能科技有限公司 Method for positioning human joint using depth image of convolutional neural network
CN107451568A (en) * 2017-08-03 2017-12-08 重庆邮电大学 Use the attitude detecting method and equipment of depth convolutional neural networks
CN108764066A (en) * 2018-05-08 2018-11-06 南京邮电大学 A kind of express delivery sorting working specification detection method based on deep learning
CN109299659A (en) * 2018-08-21 2019-02-01 中国农业大学 A kind of human posture recognition method and system based on RGB camera and deep learning
CN109271938A (en) * 2018-09-19 2019-01-25 上海鸢安智能科技有限公司 A kind of gas station's emptying Safety Monitoring Control method based on intelligent video analysis technology
CN109492581A (en) * 2018-11-09 2019-03-19 中国石油大学(华东) A kind of human motion recognition method based on TP-STG frame
CN109711320A (en) * 2018-12-24 2019-05-03 兴唐通信科技有限公司 A kind of operator on duty's unlawful practice detection method and system
CN110110613A (en) * 2019-04-19 2019-08-09 北京航空航天大学 A kind of rail traffic exception personnel's detection method based on action recognition

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王天诺; 陈成军; 李东年; 洪军. Assembly action recognition based on 3D convolutional neural network. 组合机床与自动化加工技术 (Modular Machine Tool & Automatic Manufacturing Technique). 2019, (008), full text. *

Also Published As

Publication number Publication date
CN111062364A (en) 2020-04-24

Similar Documents

Publication Publication Date Title
CN111062364B (en) Method and device for monitoring assembly operation based on deep learning
CN107194559B (en) Workflow identification method based on three-dimensional convolutional neural network
CN109159113B (en) Robot operation method based on visual reasoning
CN111400040B (en) Industrial Internet system based on deep learning and edge calculation and working method
CN110253577B (en) Weak-rigidity part assembling system and method based on robot operation technology
CN110287583B (en) Industrial equipment residual life prediction method based on cyclic neural network
Zamora-Hernandez et al. Deep learning-based visual control assistant for assembly in industry 4.0
CN110633738B (en) Rapid classification method for industrial part images
CN112434666B (en) Repetitive motion recognition method, device, medium, and apparatus
CN112464882B (en) Method, apparatus, medium, and device for recognizing continuous motion
CN111507261A (en) Process operation quality monitoring method based on visual target positioning
CN116708038B (en) Industrial Internet enterprise network security threat identification method based on asset mapping
CN115331002A (en) Method for realizing remote processing of heating power station fault based on AR glasses
CN115146798A (en) Assembly robot full-process monitoring and assisting method and system based on body data
CN113570580B (en) Power equipment component loosening identification system and method based on machine vision
CN111045415A (en) Multi-modal process fault detection method based on local probability density double subspace
CN117114420B (en) Image recognition-based industrial and trade safety accident risk management and control system and method
CN112967335A (en) Bubble size monitoring method and device
CN112613476A (en) Method for automatically detecting unsafe behaviors of workers based on machine vision
CN110727669A (en) Device and method for cleaning sensor data of power system
CN116361191A (en) Software compatibility processing method based on artificial intelligence
CN115937675A (en) Target and defect identification method in substation inspection environment
CN112936342A (en) System and method for evaluating actions of entity robot based on human body posture recognition algorithm
CN107515596B (en) Statistical process control method based on image data variable window defect monitoring
Zhang et al. Prediction of human actions in assembly process by a spatial-temporal end-to-end learning model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant