CN116540880A - Enhanced training management of operators performing industrial processes - Google Patents

Enhanced training management of operators performing industrial processes

Info

Publication number
CN116540880A
CN116540880A (application CN202310745641.9A)
Authority
CN
China
Prior art keywords
operator
gesture
gestures
hand
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310745641.9A
Other languages
Chinese (zh)
Inventor
M·卢茨
V·博福尔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Compagnie Generale des Etablissements Michelin SCA
Original Assignee
Compagnie Generale des Etablissements Michelin SCA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Compagnie Generale des Etablissements Michelin SCA filed Critical Compagnie Generale des Etablissements Michelin SCA
Publication of CN116540880A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G09EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09BEDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B9/00Simulators for teaching or training purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/06Recognition of objects for industrial automation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention relates to enhanced training management for operators performing industrial processes. The present invention relates to a system comprising one or more processors capable of executing programming instructions stored in a memory operatively connected to the one or more processors to perform a method for training an operator in a predetermined process having a sequence of characteristic gestures, characterized in that the method comprises the steps of: -an acquisition step comprising: a hand tracking step for detecting a position of a start or end of a sequence of characteristic gestures corresponding to a current predetermined procedure, -a hand motion tracking step for monitoring trajectories and motions during the sequence of characteristic gestures, -a step of identifying the position of the operator's hand, -a comparison step for determining any differences between the identified gestures and the correct gestures of the respective predetermined procedure, such that the gestures of the operator performing the predetermined procedure are recorded for use in the current training program.

Description

Enhanced training management of operators performing industrial processes
Technical Field
The present invention relates to instructions given to operators performing industrial processes. More particularly, the present invention relates to an augmented management system based on a combination of augmented reality tools and measurements of an operator's finger movement during training for tasks related to one or more industrial processes.
Background
In the field of industrial processes, assembly, disassembly, operation and maintenance work involves complex actions. Each action requires specific knowledge and technology for each machine and device according to the relevant task. Due to the variety of machines and equipment, industrial companies need to provide periodic training for operators, users, workers, and technicians (collectively, "operators"). Thus, it would be beneficial to make the operator more autonomous in managing the skills required for different roles with novel training means.
In the known training method, the field training is performed directly in the working environment. This training is most effective because the operator learns through experience by directly following instructions provided during the training. On the other hand, this type of training may require a longer training time. Some machines and services may have to be stopped due to the absence of an operator. In addition, any damage to the equipment during training adds additional maintenance costs. Furthermore, industrial processes often include dangerous tasks that are risky for operators.
In industrial processes involving hand intervention, gestures are a form of non-verbal communication between a person and a machine. Gesture recognition is therefore considered an important aspect of human-computer interaction (HCI), which enables a computer to capture and understand gestures and then perform actions. Standard gesture recognition systems are vision-based and include preprocessing (including hand region detection), feature extraction, and labeling. A common acquisition technique uses surface electromyography (sEMG) sensors to capture gestures. Machine learning (ML) is used to solve the problem of recognizing gestures from sEMG signals. Classifiers for gesture recognition include, but are not limited to, support vector machines (SVMs), k-nearest neighbours (k-NN), decision trees, random forests, linear discriminant analysis, artificial neural networks (ANNs), convolutional neural networks (CNNs), and equivalents.
Gestures are commonly used in HCI and augmented reality systems because of their interpretability and communication speed. Gesture variation and diversity have a significant impact on the recognition rate and effectiveness of gesture recognition methods. Machine learning methods may be used to increase the speed of gesture recognition.
Various techniques may enable an operator to train in a combination of the real world and the virtual world (e.g., elements that combine both). For example, gesture recognition may be achieved by measuring differences between hand shapes with a depth sensor such as a Kinect sensor (see Robust Part-Based Hand Gesture Recognition Using Kinect Sensor, Zhou Ren, Junsong Yuan, Jingjing Meng, Zhengyou Zhang, IEEE Transactions on Multimedia, Volume 15, Issue 5 (August 2013) (reference "Zhou"); Kinect is a registered trademark owned by Microsoft Corporation). Instead of wearing data gloves, the depth sensor may robustly detect and segment the hand to establish a viable basis for gesture recognition.
In addition, real-time gesture recognition can be obtained from sEMG signals (see Real-Time Surface EMG Pattern Recognition for Hand Gestures Based on an Artificial Neural Network, Zhen Zhang, Kuo Yang, Jinwu Qian and Lunwei Zhang, www.mdpi.com/journal/sensors (2019)). An armband is used to acquire the sEMG signals, and a sliding-window method is employed to segment the data as features are extracted.
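As a purely illustrative sketch of the sliding-window segmentation mentioned above (the window length, overlap, and time-domain feature set are assumptions, not values taken from the cited work), feature extraction from an sEMG recording might look as follows:

```python
import numpy as np

def sliding_windows(semg, window=200, step=50):
    """Split a multi-channel sEMG recording (samples x channels) into overlapping windows."""
    return [semg[i:i + window] for i in range(0, len(semg) - window + 1, step)]

def window_features(win):
    """Classic time-domain sEMG features: mean absolute value, RMS, waveform length per channel."""
    mav = np.mean(np.abs(win), axis=0)
    rms = np.sqrt(np.mean(win ** 2, axis=0))
    wl = np.sum(np.abs(np.diff(win, axis=0)), axis=0)
    return np.concatenate([mav, rms, wl])

# Example: 2 seconds of synthetic 8-channel sEMG sampled at 1 kHz (illustrative only)
semg = np.random.randn(2000, 8)
X = np.array([window_features(w) for w in sliding_windows(semg)])
print(X.shape)  # (num_windows, 24) feature matrix ready for an SVM/k-NN classifier
```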
In another example, some display devices (e.g., various head mounted display devices) may have a transparent display that superimposes a displayed image onto a real-world background environment. When viewed through a transparent display, the image may appear in front of the real-world background environment. In particular, the image may be displayed on a transparent screen in a manner that appears to be mixed with elements of the real world background environment, which may be referred to as augmented reality.
In an enhanced environment, augmented reality technology can make training on tasks related to an industrial production process better and more efficient, enabling a "learning by doing" mode of training. For example, patent US 9,643,314 discloses a method of controlling a robot to manipulate objects in a virtual world by capturing the actions of an operator. In this case, the capture of the operator's gesture is transcribed to the robot, rather than compared to an expected gesture.
In another example, patent US 10,952,488 discloses a welding interface device that utilizes an optical sensor and a time-of-flight sensor to collect images of a welding environment. The apparatus further includes an augmented reality controller for determining a simulated object to be presented in the field of view, a position of the simulated object in the field of view, and a perspective of the simulated object in the field of view. The translucent display of the apparatus presents the rendered simulated object within the field of view based on the location determined by the augmented reality controller, wherein at least a portion of the welding environment is viewable through the translucent display when the display presents the rendered simulated object.
Thus, training may be performed virtually to prevent potential injury to the operator during training, but this does not give the operator any insight into the actual workplace, the tools used, or the actual working process. Thus, the operator does not gain a deep understanding of the appropriate gestures.
The present invention thus relates to an automatic recognition and assessment system for training tasks related to industrial processes. The invention proposes a solution for preparing training sequences based on augmented reality, focusing on the acquisition, evaluation and comparison of different gestures. The problem addressed by the present invention is automatic task identification to identify tasks associated with a predetermined process and the order in which the tasks are performed. The present invention thus differs from the prior art in that it provides a real-time analysis of the work performed by the operator, thereby enabling the operator to correct and verify the work performed.
Disclosure of Invention
The present invention relates to a method for training an operator in a predetermined process having a sequence of characteristic gestures, the method being implemented by a computer system comprising one or more processors capable of executing programming instructions stored in a memory operatively connected to the one or more processors to perform the method, characterized in that the method comprises the steps of:
-an acquisition step performed by a receiver module of a computer system intended to receive data corresponding to a first-person view of an operator, in which step gestures of the operator are captured in real time in the first-person view during a training program (training session) performed by the operator, and in which step the computer system acquires data of the gestures related to a predetermined procedure in which the operator participates, said step comprising:
a hand tracking step utilizing a display device of a computer system comprising an augmented reality tracking assembly, wherein an operator's hand is tracked to detect a position corresponding to a beginning or end of a sequence of characteristic gestures of a current predetermined process,
a hand movement tracking step, performed by an extraction module of the computer system, to extract a set of characteristic gestures found in a current predetermined process, wherein movement tracking with a display device of the computer system enables monitoring of the trajectory and movement of the operator's fingers during a sequence of characteristic gestures, in which step the display device detects all gestures in the sequence that are related to the predetermined process,
a step of identifying the position of the hand of the operator with respect to the characteristic gesture of the predetermined process, in which step the characteristic gesture is identified with the movement parameters performed by the operator and recorded in the movement tracking step, which step is performed by an extraction analysis module of the computer system to identify the characteristic gesture related to the predetermined process, and
A comparison step in which the identified characteristic gesture is compared with a characteristic gesture of a predetermined process to determine any difference between the two in the predetermined process,
such that gestures of an operator performing a predetermined procedure are recorded for use in the current training program.
In an embodiment of the method according to the invention, the step of obtaining comprises the step of capturing gestures made by the operator in a field of view of a display device of the computer system.
In an embodiment of the method according to the invention, the comparing step comprises:
generating a gesture quality verification to indicate when the recognized gesture is performed correctly,
generating a warning message to indicate when the recognized gesture is performed incorrectly,
-wherein the verification and warning messages are generated in real time.
In an embodiment of the method according to the invention, the gesture quality verification and the warning message comprise visual instructions in a field of view of a display device worn by the operator during the training program.
In an embodiment of the method according to the invention, the method further comprises the step of providing one or more visual instructions in a text field and/or an image field of a field of view of a display device of the computer system.
In an embodiment of the method according to the invention, generating the warning message comprises generating one or more suggestions prompting the operator to correct the incorrect gesture before performing the subsequent gesture.
In an embodiment of the method according to the invention, the hand movement tracking step comprises the step of detecting the end of the gesture sequence.
In an embodiment of the method according to the invention, in the comparing step, the one or more gesture recognition models utilize:
-machine learning algorithm, or
-a deep learning algorithm.
In an embodiment of the method according to the invention, the method further comprises the step of applying a time synchronization between the gesture made by the operator and the intended characteristic gesture during the training program.
In an embodiment of the method according to the invention, in the step of comparing, a time synchronization of the application is used when sending the authentication and/or warning message,
such that the operator receives an indication of success or failure of the recognized gesture in real-time.
Other aspects of the invention will become apparent from the detailed description that follows.
Drawings
The nature and various advantages of the present invention will become more apparent from the following detailed description and the accompanying drawings in which like reference numerals refer to like parts throughout, and in which:
FIG. 1 is a flow chart of an embodiment of a method of training an operator based on finger movement in accordance with the present invention.
FIG. 2 illustrates an example of a grip gesture involving a static hand gesture.
FIG. 3 illustrates a video screenshot including gestures made by an operator and instructions received while performing a predetermined procedure.
Fig. 4 shows the operator's field of view during a training program of an example flat knot (reef knot) process.
Fig. 5 shows an exploded view of the steps of an example flat junction process.
Detailed Description
Referring to the drawings, wherein like reference numbers refer to like elements, fig. 1 is a flow chart of one embodiment of a method 100 according to the present invention. The method 100 relates to a method of training an operator based on finger movement. The method 100 detects proper execution of one or more gestures related to one or more particular processes based on detection of the gesture of the operator. In this case, any reference to a "hand" (singular or plural) may include one or more corresponding fingers (including thumbs) and/or corresponding wrists.
In this case, the "operator" refers to a single user as an individual participant in a predetermined process. Operators may be individual members of a team, or groups participating in a predetermined process. The operator may include a machine (e.g., a robot or a collaborative robot) with built-in fingers that perform human-like actions. A human (or non-human) operator may operate such a machine by directly remotely managing the predetermined process. An operator (human or machine) may use one or more electronic systems and/or devices designed to receive control inputs and automatically send data to at least one other operator (e.g., a human operator managing a process before or after a current predetermined process).
In this document, the terms "method" or "process" may include one or more steps performed by at least one computer system having one or more processors to execute instructions that identify an activity. All sequences of tasks are given as examples and the method described is not limited to any particular sequence.
The method 100 according to the present invention is implemented by a computer system (or "system") for instantaneously displaying a gesture of an operator performing a predetermined procedure in a first person view of the operator. As part of the operator training, the method according to the invention is able to compare the actions of the operator with the actions related to one or more predetermined processes to ensure correct execution of the processes. In this document, a set of gestures constitutes an established way of performing a process in case a correct execution of the gesture is required. A "feature gesture" is a gesture that is related to a predetermined process that includes the gesture.
The method 100 according to the present invention utilizes a method of decomposing gestures to correlate visual gestures with the process performed, based on a first-person view of the trained operator. The algorithm used to perform the method 100 can be continually improved over all training sessions, ensuring that the system performing the method improves from experience gained by correcting or verifying the operator, in particular in the selection of the sequence of gestures to be detected and/or performed in a predetermined process. Thus, the processing of gestures according to the present invention makes it possible to distinguish the recognition of gestures related to each process, gestures related to a specific process, and the recognition of the entire process. An artificial intelligence (AI)-based tool may also be used to perform the method 100.
The input of operator gesture data including at least one video containing one or more first-person operator views may facilitate recognition of gestures in the first-person view of the operator. The gesture analysis process is performed manually in the method 100 and the recognition parameters are determined by trial and error. In an embodiment of the method 100, the gesture performed by the operator and identified in the operator's view may be associated with one or more predetermined processes. Operator gesture data inputs captured during one or more operator training items may include an operator starting position (or "neutral position") derived from a first person view of the operator and associated with a particular hand (and/or finger) of the operator. The captured gesture data may also include operator finger positions associated with at least some predetermined processes that the operator wishes to train.
Thus, the method 100 involves extracting features from a first-person view of an operator. In one embodiment, the features are based at least in part on feature gestures associated with predetermined processes identified during the training program in which the operator is engaged. The present method utilizes a machine learning method (e.g., neural network training) to identify characteristic gestures related to an executed process and to correct gestures that would prevent correct execution of one or more related processes.
Referring to FIG. 1, a computer system (or "system") executing a method 100 records gestures of an operator during a training program performed by the operator. The system obtains data from gestures that may be related to a particular process (or "predetermined process") in which the operator is engaged. The acquired gesture data may be associated with a characteristic gesture (or sequence of characteristic gestures) of a predetermined process that the operator is training.
The computer system (or "system") performing the method 100 includes a communication network that includes one or more communication servers (or "servers") that manage data entering the system from various sources. The communication network may comprise a wired or wireless connection and may use any data transmission protocol known to those skilled in the art. Examples of wireless connections may be packagedIncluding but not limited to Radio Frequency (RF), satellite, cell phone (analog or digital), bluetoothWi-Fi, infrared, zigBee, local Area Network (LAN), wireless Local Area Network (WLAN), wide Area Network (WAN), near Field Communication (NFC), other wireless communication configurations and standards, equivalents thereof, and combinations of these elements.
The term "processor" (or, alternatively, the term "programmable logic circuit") refers to one or more devices (e.g., one or more integrated circuits, one or more controllers, one or more microcontrollers, one or more microcomputers, one or more Programmable Logic Controllers (PLCs), one or more application specific integrated circuits, one or more neural networks, and/or one or more other known equivalent programmable circuits, as known to those skilled in the art) capable of processing and analyzing data and including one or more software programs for processing the data. The processor includes software for processing data captured by elements associated with the computer system (and corresponding data acquired) and for identifying and locating discrepancies and their sources for correction.
The computer system includes at least one display device worn on the head of the operator during the training program. The display device contains an optical system that provides the operator with a truly direct first-person view of the space in front of the operator. In this case, a "true direct" view refers to a representation that enables the operator's hand (including the wrist) to be directly seen with the human eye in a physical environment, rather than a created image (e.g., a video of the fingers of the hand being viewed on a screen is not a true direct view of the fingers). The display device may be a mobile network device (including "augmented reality" and/or "virtual reality" devices, and/or any combination and/or equivalent device) in communication with a communication network of a computer system, for example, see the head mounted display device in patent US11,017,231.
The display means may comprise at least one camera (or "camera") facing the physical environment in which the training program is performed by the operator. The camera may capture video and still images of the operator's gestures during training. In this document, "image" (singular or plural) refers to a still image and an image captured in a video sequence, respectively. The camera may be any commercially available camera including, but not limited to, a depth camera and a visible light camera (RGB camera). In the following description, the terms "camera," "sensor," and "optical sensor" may be used interchangeably and may refer to one or more devices designed to capture video and still images.
The display device used records a first person view containing gestures made by the operator during one or more training items. The recorded gestures include actions of an operator trained in a predetermined process. The display device may include various electronic components (including, for example, speakers, inertial sensors, GPS transceivers, and/or temperature sensors). However, during the training program, there is no need to map the operator's hand in the physical environment or to identify the operator's hand with respect to the geographic location of other objects in the operator's view.
The display device includes an integrated tracking component to track gestures made by the operator during the training program. The tracking assembly may take into account the perception of hand movement (including movement of individual fingers and/or wrists), which is primarily defined by movement monitoring from the perspective of the operator. The tracking component may be a commercially available component.
In one embodiment, the display device comprises an augmented reality tracking component (e.g., a commercially available component from Microsoft). In this type of device, having the sensor built into the device eliminates the need for a fixed camera that must be positioned by the operator. Further, a display device including an augmented reality tracking component enables the first-person view to provide the operator with an immediate projection of the gesture. A display built into such a display device can issue a quick warning (to correct a gesture immediately, before other gestures in the predetermined process are performed) and/or an immediate verification (to reinforce the correct gesture and enable continuous execution of the predetermined procedure).
With the use of augmented reality technology, specifications of correct gestures and grips in a predetermined process are taken into account. For example, FIG. 2 shows that a grip gesture includes any static hand gesture that can firmly hold an object regardless of the orientation of the hand (FIG. 2 corresponds to An Approach to Preparing Hand Gesture Maintenance Training Based on Virtual Reality, Manoch Numfu, Université Grenoble Alpes, https://tel.archives-ouvertes.fr/tel-03158358 (3 March 2021), Figure 16 (reference "Numfu"), citing Huagen, Shaoming, & Qunsheng, 2004). Grip gestures may be categorized according to basic shape (e.g., cuboid, cylinder, sphere, triangle, etc.) (citing Liu, Cui, Song, & Xu, 2014). Thus, in the real world, the hand is oriented in real space with no motion limits.
The predetermined procedure selected for training comprises a sequence of steps marked by specific gestures required for correctly performing the predetermined procedure. An embodiment of a model of a generic process may be used to specify a training sequence that an operator may follow, provided that a corresponding gesture sequence is recorded in advance (e.g., by filming an expert performing the sequence). As part of the sequence of each step of the process, it is necessary to specify the gestures (and, if necessary, one or more tools) required to properly perform the process, depending on the characteristics of the selected process. The gestures interact with the operator's hand to form a model describing the gestures for each step of the predetermined process.
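A minimal sketch of how such a per-step model could be encoded is given below; the field names, tolerance value, and flat-knot step labels are illustrative assumptions rather than part of the claimed system:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GestureStep:
    name: str                      # human-readable label shown in the operator's field of view
    hand: str                      # which hand performs the characteristic gesture
    expected_gesture: str          # identifier of the characteristic gesture to recognise
    tools: List[str] = field(default_factory=list)   # tools required for the step, if any
    tolerance_mm: float = 20.0     # assumed positional tolerance accepted for the step

# Illustrative model of the four-step flat-knot process described further below
FLAT_KNOT = [
    GestureStep("cross from above", "right", "cross_over"),
    GestureStep("first hook", "left", "hook_over"),
    GestureStep("cross from below", "right", "cross_under"),
    GestureStep("second hook", "left", "hook_under"),
]

for i, step in enumerate(FLAT_KNOT, 1):
    print(f"step {i}: {step.name} ({step.hand} hand, expects '{step.expected_gesture}')")
```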
The display device may be connected through the communication network of the computer system by a wired or wireless connection (e.g., by Wi-Fi, Bluetooth, infrared, RFID transmission, universal serial bus (USB), mobile telephony, and/or other wireless communication means).
The communication network of the computer system may include one or more other communication devices (or "devices") that capture and transmit data collected from the physical environment in which the operator is exercising the project. The one or more communication devices may include one or more portable devices, such as mobile network devices (e.g., cell phones, laptops, one or more portable network connection devices, including "augmented reality" and/or "virtual reality" devices, and/or any combination and/or equivalent device) of an identified operator or trainer of an operator. The one or more communication devices may also include one or more remote computers capable of transmitting data over a communications network.
The computer system includes one or more modules implemented by the computer system and capable of being executed by one or more processors. The module includes a receiver module designed to receive data corresponding to a first person view of an operator captured by a display device. The module further includes an extraction module designed to extract a set of characteristic gestures found in a predetermined process. The set of feature gestures may include feature gestures that appear in an operator view. The modules further include an analysis module designed to identify a particular feature gesture associated with a predetermined process (or task of the predetermined process) based at least in part on a set of feature gestures associated with a current predetermined process during the training program, thereby identifying proper execution of the feature gestures in a predetermined combination and/or order.
A display device mounted on the head of the operator may display virtual content to the operator. Referring to fig. 3 and 4, the head-mounted display device may have a field of view in which hand landmarks (including, but not limited to, the tip of the right index finger D, the joint D1 of the right phalanx, and the respective left and right wrists PG, PD) are represented by respective axes (see fig. 3). The field of view of the display device defines the space in which the operator can see a first-person view of the hand that is performing the gesture correctly.
Referring to fig. 3 and 4, other indicators may be used to mark pixels corresponding to different parts of the hand, including the wrist. For example, pixels representing the left hand MG may be marked with a different indicator than pixels representing the right hand MD. In another example, the pixels representing the right index finger D may be marked with a different indicator than the pixels representing the thumb T.
Gesture recognition may also be accomplished by marking pixels in any suitable manner. For example, an artificial neural network may be trained to classify each pixel with an appropriate indicator/label. In this way, different features of one hand or the other can be computationally recognized, enabling different gestures to be detected. For example, the three-dimensional position of a finger may be tracked from one training item to the next, enabling detection of parameters such as finger position, finger angle, finger speed, finger acceleration, and finger proximity.
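By way of illustration only, the finger parameters listed above (speed, acceleration, proximity) could be derived from a sequence of tracked 3D fingertip positions as sketched below; the frame rate and the synthetic track are assumptions:

```python
import numpy as np

def finger_kinematics(positions, fps=60.0):
    """Derive speed and acceleration from a sequence of 3D fingertip positions (frames x 3)."""
    dt = 1.0 / fps
    velocity = np.diff(positions, axis=0) / dt
    speed = np.linalg.norm(velocity, axis=1)
    acceleration = np.diff(speed) / dt
    return speed, acceleration

def finger_proximity(tip_a, tip_b):
    """Distance between two fingertips in the same frame, e.g. thumb T and index finger D."""
    return float(np.linalg.norm(np.asarray(tip_a) - np.asarray(tip_b)))

# Synthetic track: index fingertip moving along x at roughly 0.3 m/s, sampled at 60 Hz
track = np.column_stack([np.linspace(0, 0.3, 60), np.zeros(60), np.zeros(60)])
speed, accel = finger_kinematics(track)
print(round(speed.mean(), 3), round(finger_proximity(track[0], track[-1]), 3))
```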
This information is continuously processed to detect a particular position or gesture in the gesture step, and motion is tracked to check whether the sequence of gestures achieves the goal of the gesture. These steps and the associated learning methods are defined by experts and trainers in the field. In the example of a flat knot provided below, the yarn is taken from the right hand MD to the left hand MG, and the criteria for this step of the gesture need to be applied correctly. There are different ways to achieve the goal (holding two yarns with the left hand), but a gesture is not considered "good" if it is not suitable for all situations (e.g., small yarns, large cables, tension, etc.). The tolerances are adapted to different human morphologies, and therefore freedom of execution must be allowed without affecting the target of the gesture. This is therefore specific to each application.
Example combinations of gestures, and checks with respect to flat knots
Fig. 3, 4 and 5 illustrate example combinations of gestures, gestures and checks related to a process relating to a flat knot. Each gesture in the process includes a sequence of steps that are broken down as shown in fig. 5. Fig. 5 shows the detection area relative to the monitored hand, where some joints of the hand are located in space near some markers (e.g., the intersection of the plane of the hand and the axis of the finger or the direction of the wrist).
Referring to fig. 5, in the proof of concept of the flat-knot process (here performed by a right-handed person), there is a combination of four (4) basic steps for tying the knot:
the process starts from a starting position where each hand holds one end of the yarn and the hands are spaced apart by a predetermined minimum distance. Referring to fig. 5, the steps defined are as follows:
the right hand performs a first crossing step ("step 1") during which the yarn held in the right hand is brought into the left hand by crossing from above. In performing this step, the system recognizes the correct gesture. Such identification is expressed to the operator in the form of an enhanced and/or encouraged message displayed in the field of view of the display device (see, for example, text field 50 shown in fig. 4). The message may contain instructions to proceed to the next step. If this step is performed incorrectly (e.g., yarn held in the right hand is brought into the left hand by crossing from below), the display device contains a guide message (e.g., a message read as "gesture incorrect, right hand is passed over yarn" in text field 50) that identifies the error and encourages correction.
The right hand performs a second hooking step ("step 2") during which the intersection of the yarns is held between the right thumb and the right index finger of the right hand. At the same time, the left hand brings the yarn from above to form a first hook without knots. If this step is completed successfully, the display means contains a message confirming completion and instructing the operator to proceed to the next step. If this step is performed incorrectly (e.g. the left hand brings the yarn from below rather than above), the display device contains a guide message (e.g. a message read as "no knot, retry once" in text field 50 or a message giving an indication) identifying the error and encouraging correction. Thus, the system focuses the operator's attention on the correct gestures to ensure completion of the desired knot and prevents the operator from learning gestures that prevent such completion.
The right hand performs a third crossing step ("step 3") during which the yarn held in the right hand is brought into the left hand by crossing from below. If this step is completed successfully, the display means contains a message confirming completion and instructing the operator to proceed to the next step. If this step is performed incorrectly (e.g., the left hand brings the yarn from above rather than below), the display device contains a guide message (e.g., a message in text field 50 prompting the operator to perform a quality check) that identifies the error and encourages correction.
The right hand performs a fourth and final hooking step ("step 4") during which the intersection of the yarns is gripped between the right thumb and the right index finger of the right hand. At the same time, the left hand brings the yarn from below to form a second hook (and thus a knot). If this step is completed successfully, the display means contains a message confirming completion and prompting the operator to perform a quality check. If this step is performed incorrectly (e.g., the left hand brings the yarn from above rather than below), the display device contains a guide message (e.g., a message reading "above or below?" in text field 50) that identifies the error and encourages correction.
Referring again to example combinations of gestures with respect to a flat knot, an operator may receive a monitoring instruction for successful completion of an intended feature gesture. In the example shown in fig. 4, the instructions that appear in the field of view of the display device may include, for example:
recognition of the gesture "detected loop 1", which triggers a sequence of characteristic gestures (indicated by the instruction "expected loop 2"),
quality inspection of the recognized gestures ("checking the quality of the knots")
-an image of the correct knot for comparison with the knot made by the operator during the training program.
The display device may include a reference image (see, e.g., fig. 4, where the field of view of the display device contains image fields 52 representing the correct junction and intersecting from above and below). The system may provide instructions to the operator for each step of the process (the instructions being shown in text field 50 and/or as visual instructions in image 52) when performing the gesture. These instructions may be combined with one or more other alerts (e.g., including audible alerts) indicating that an action needs to be modified.
Each step may be performed correctly or incorrectly, but an incorrect combination of correct steps may result in an incorrect knot or no knot at all. Therefore, the combination must also be checked. Among the possible combinations there are thus "forbidden" combinations that cause quality problems, correct combinations, and combinations that cannot produce a knot (in the last case there is no risk, as the operator notices the error himself). This decomposition is valid for right-handed persons and is (left-right) symmetrical for left-handed persons.
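A minimal sketch of this combination check is shown below; the step labels and the single "forbidden" sequence are illustrative assumptions, not an exhaustive encoding of the flat-knot process:

```python
# Classify a detected sequence of flat-knot steps as correct, quality-risk ("forbidden"),
# or simply not producing a knot (step labels are illustrative).
CORRECT = [("cross_over", "hook_over", "cross_under", "hook_under")]
FORBIDDEN = [("cross_over", "hook_over", "cross_over", "hook_over")]   # e.g. a granny-knot-like combination

def check_combination(steps):
    seq = tuple(steps)
    if seq in CORRECT:
        return "correct knot"
    if seq in FORBIDDEN:
        return "forbidden combination: quality risk, warn the operator"
    return "no knot produced: operator will notice, suggest retry"

print(check_combination(["cross_over", "hook_over", "cross_under", "hook_under"]))
print(check_combination(["cross_over", "hook_over", "cross_over", "hook_over"]))
```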
By using a display device that includes an augmented reality tracking component, tracking of an operator's hand enables the operator to track the execution of movements, check intersections, check wrist and finger directions, and check combinations of gestures to verify that the executed gestures match a predetermined process. The display means may indicate errors and/or risks to the operator in real time, provide advice, repeat guidance messages, teach how to perform quality control by alert criteria, and monitor performance (e.g., by setting a "challenge" related to the number of correct gestures performed in succession).
Referring again to fig. 1-5, a detailed description of an example of a method 100 of training an operator based on finger movement in accordance with the present invention ("method") is provided. The method 100 is implemented by the computer system described above (or by a site and/or facility that includes the computer system). Each gesture is divided into several steps and each step is checked according to the flowchart shown in fig. 1. In method 100, the monitoring of gestures includes the following options:
the examination of the posture is carried out,
the tracking of the hand-gestures,
-instruction display, and
-a warning display.
Instructions and warnings are displayed in the field of view of a display device worn by the operator during a training program of a predetermined procedure.
When the method 100 according to the invention starts, the method comprises an acquisition step 102 during which the gesture of the operator is captured in a first person view during a training program performed by the operator. More specifically, the gesture of the operator is captured in a field of view of a real-time display device corresponding to the execution of the gesture. In this step, the system obtains data from gestures that may be related to a predetermined process in which the operator is engaged.
The acquisition step 102 includes a hand tracking step by a display device that includes an augmented reality tracking component. Hand tracking enables detection of a position corresponding to the beginning or end of a sequence of characteristic gestures. In this step, the position of one or both hands of the operator is tracked to determine whether the position is within a predetermined tolerance range for beginning the current predetermined process. This step marks the start of the gesture sequence, in the desired order, from the initial position of the hands.
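A minimal sketch of such a start-position check is given below; the tolerance and minimum hand separation values are assumptions chosen for illustration:

```python
import math

def within_tolerance(detected, expected, tol=0.05):
    """True if a detected 3D hand position is within `tol` metres of the expected start position."""
    return math.dist(detected, expected) <= tol

def ready_to_start(left, right, expected_left, expected_right, min_separation=0.20):
    """Both hands near their start positions and at least `min_separation` metres apart."""
    return (within_tolerance(left, expected_left)
            and within_tolerance(right, expected_right)
            and math.dist(left, right) >= min_separation)

print(ready_to_start((0.0, 0.0, 0.5), (0.3, 0.0, 0.5),
                     (0.01, 0.0, 0.5), (0.29, 0.0, 0.5)))  # True: the sequence may begin
```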
The method 100 further comprises a hand movement tracking step 104 during which the display device detects all gestures in the sequence related to the predetermined procedure selected for the current training. Motion tracking enables the trajectory and motion of the operator's fingers to be monitored during the sequence of characteristic gestures. The step includes a sequence end detection step. The tracking of hand movements in this step eliminates the need for the operator to take a specific initial position when starting the gesture sequence of the predetermined procedure in step 102.
The operator may use one or both hands according to the training selected and the predetermined procedure.
Gestures and related sequences are detected in the field of view of the display device. The detected gestures may be recorded for use in the current training, and the operator's performance reviewed at the end of the training. For robust gesture recognition, one or more gesture recognition models primarily utilize machine learning algorithms (e.g., k-nearest neighbours (k-NN) and support vector machines (SVMs)) and deep learning algorithms (e.g., recurrent neural networks (RNNs) and deep convolutional neural networks (CNNs)). These models are used to detect the characteristic gestures of a predetermined process from one or more gestures made by the operator during a training program.
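As a hedged illustration of one of the model families mentioned (k-NN on flattened hand-keypoint features, here built with scikit-learn on synthetic data), a gesture classifier might be sketched as follows; the 21-keypoint layout and class labels are assumptions:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for recorded reference gestures: 21 hand keypoints x 3 coordinates,
# flattened to a 63-dimensional feature vector per frame, two gesture classes.
X_train = np.vstack([rng.normal(0.0, 0.1, (50, 63)),      # class 0: "cross_over"
                     rng.normal(1.0, 0.1, (50, 63))])     # class 1: "hook_over"
y_train = np.array([0] * 50 + [1] * 50)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# A new frame captured during training would be classified the same way
frame = rng.normal(1.0, 0.1, (1, 63))
print(model.predict(frame))   # -> [1], i.e. the "hook_over" characteristic gesture
```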
In the method 100, a Graph CNN may be utilized to reconstruct a complete 3D mesh of the hand surface containing richer information about the shape and pose of the hand. To train the network under full supervision, a synthetic dataset containing both the shape and pose of the hand may be created. This enables an end-to-end trained deep neural network to retrieve the 3D mesh of a hand directly from a single RGB image (e.g., an image of one hand or both hands of an expert regularly performing the correct sequence (including the correct gestures) related to the predetermined procedure) (see 3D Hand Shape and Pose Estimation from a Single RGB Image, Liuhao Ge et al., Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019)) (reference "Ge"). Other methods may be used, including learning nonlinear changes in hand shape.
The method 100 further comprises a step 106 of identifying the position of the operator's hand relative to a characteristic gesture of the predetermined procedure. In this step, key parameters of the motion performed by the operator (recorded in the motion tracking step 104) are used to identify the position of the hand relative to the characteristic gesture of the predetermined procedure.
The method 100 further includes a comparison step 108 during which the characteristics of the gesture identified in step 106 (e.g., the position of the tip of the index finger D, the position of the joint of the thumb T, and the positions of the wrists PG, PD) are determined in order to verify the gesture according to the predetermined procedure. In this step, the recognized gesture is compared to the characteristic gesture of the predetermined process to determine any differences between the recognized gesture and the correct gesture of the corresponding predetermined process. The comparison generates a verification of gesture quality 108a or a warning message 108b to alert the operator. The verification 108a and warning message 108b are provided in real time, such that the verification/correction aspects of the gesture are associated with the sequence of characteristic gestures of the predetermined process to be performed during training. The verification 108a and warning message 108b comprise written messages in the field of view of the display device (as represented by instructions 110 in fig. 5). These written messages may be accompanied by audio messages and/or haptic signals.
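A minimal sketch of the comparison logic that could emit verification 108a or warning 108b is given below; the message wording and the motion tolerance are assumptions:

```python
def compare_gesture(recognised, expected, deviation_mm, tolerance_mm=20.0):
    """Return a (status, message) pair mimicking verification 108a / warning 108b."""
    if recognised != expected:
        return "warning", f"Gesture incorrect: expected '{expected}', correct it before continuing"
    if deviation_mm > tolerance_mm:
        return "warning", f"Right gesture but outside motion tolerance ({deviation_mm:.0f} mm)"
    return "verification", "Gesture correct, proceed to the next step"

print(compare_gesture("cross_over", "cross_over", 8.0))
print(compare_gesture("cross_under", "cross_over", 8.0))
```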
In a comparison step 108, the verification of the gesture 108a generates an additional verification 108a' to verify that the identified gesture is compatible with the established motion tolerance for the predetermined process. This additional verification may be performed before the next gesture in the sequence of feature gestures, enabling the operator to learn the current gesture.
If the additional verification 108a 'indicates incorrect execution of the recognized gesture, a warning message 108b' appears in the field of view of the display device. If the warning message 108b, 108b' indicates incorrect execution of the recognized gesture, the operator is alerted before the subsequent gesture is executed. Sending a warning message indicates a risk of incorrect execution of the predetermined procedure, enabling the operator to correct the gesture in real time during the current training. The warning message 108b, 108b' may include a suggestion (e.g., one or more readable instructions in the text field 50 as shown in fig. 4, and/or one or more visual instructions in the image field 52 as shown in fig. 4) prompting the operator to correct the incorrect gesture before performing the subsequent gesture. The warning messages 108b, 108b' may be resent until the operator correctly performs the recognized gestures (including performing the gestures in the correct order of the sequence of characteristic gestures of the predetermined process).
Thus, a system implementing the method 100 according to the invention monitors the adequacy of the operator's work in real time and verifies the quality of the gesture (and thus the likelihood of a positive result) against the sequence of steps. This sequence of steps may be used to avoid situations where positive results are obtained by unsuitable gestures (e.g., for ergonomic, quality, repeatability, and/or transfer-to-other-process reasons). The system instructs the operator to compare his work with the expected work. These aspects enable the operator to perform an inspection based on a validated procedure understood by the operator. Thus, operators are encouraged to engage in self-criticism and to verify their ability to perform the actual process autonomously after acquiring self-monitoring ability.
In all embodiments of the method 100, a time synchronization is applied between the gesture made by the operator and the training item performed by the operator (e.g., to check whether the gesture is performed at the correct speed of the predetermined process). The time synchronization is applied by using time synchronization points recorded by the display device, which time synchronization points correspond in time to a predetermined course of the training program. The characteristic gesture and time synchronization point of the predetermined process are used to time align the gesture made by the operator with the gesture in the training program to the same clock. The operator's gestures are then time synchronized with the training program so that each gesture can be matched with a characteristic gesture at the appropriate time of the clock. Since the predetermined process involves a time sequence, the time may be used as a distinct signal identifying the relationship between the gesture in the field of view of the display device and the intended characteristic gesture. In the comparison step 108 of the method 100, this synchronization may be used when sending the verification 108a, 108a 'and/or the warning message 108b, 108b', such that the operator receives a real-time indication of success or failure of the gesture. The operator may then make an appropriate response to check the correct execution of the currently scheduled procedure.
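A minimal sketch of this time alignment is given below; the shared-clock event format and the lag threshold are assumptions made for illustration:

```python
def align_to_reference(operator_events, reference_events, max_lag_s=2.0):
    """Match each expected characteristic gesture to the operator gesture closest in time.

    `reference_events` and `operator_events` are lists of (timestamp_s, gesture) pairs
    sharing the same clock after synchronisation; the lag threshold is an assumption.
    """
    report = []
    for t_ref, expected in reference_events:
        candidates = [(abs(t - t_ref), t, g) for t, g in operator_events]
        lag, t_op, got = min(candidates) if candidates else (None, None, None)
        ok = lag is not None and lag <= max_lag_s and got == expected
        report.append((expected, "ok" if ok else "check timing/gesture", lag))
    return report

reference = [(0.0, "cross_over"), (3.0, "hook_over")]
operator = [(0.4, "cross_over"), (6.5, "hook_over")]
for row in align_to_reference(operator, reference):
    print(row)
```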
In all embodiments of the method 100, the neural network may be trained to recognize feature gestures in a training program and to create bounding boxes around the recognized feature gestures. The bounding box may represent a correctly performed gesture, to reinforce good habits when performing the predetermined procedure that is the subject of the current training program (e.g., a bounding box around a poorly articulated thumb T). During this training, the coordinates of the bounding box of the feature gesture are correlated with the coordinates of the hand position. These coordinates are used to calculate characteristics of the gesture, including but not limited to the completion time of the gesture and the completion time of the sequence that includes the gesture. The bounding box and the feature gestures are sent to a neural network (e.g., one or more CNNs) to jointly learn representations of the feature gestures performed by different operators during training.
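By way of illustration, a bounding box and a completion time could be derived from tracked hand keypoints as sketched below; the 21-joint layout and frame rate are assumptions:

```python
import numpy as np

def gesture_bounding_box(keypoints):
    """Axis-aligned 3D bounding box around hand keypoints (frames x joints x 3)."""
    pts = keypoints.reshape(-1, 3)
    return pts.min(axis=0), pts.max(axis=0)

def gesture_duration(num_frames, fps=60.0):
    """Completion time of a gesture captured over `num_frames` frames."""
    return num_frames / fps

frames = np.random.rand(90, 21, 3)            # 1.5 s of synthetic 21-joint hand tracks
lo, hi = gesture_bounding_box(frames)
print(np.round(lo, 2), np.round(hi, 2), gesture_duration(len(frames)), "s")
```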
In any embodiment of the method 100 according to the invention, one or more steps may be performed iteratively.
The computer system may include pre-programming of management information. For example, the settings of the method 100 may be associated with parameters of a typical physical environment in which the system operates. In an embodiment of the method 100, a computer system (and/or a facility including the computer system) may receive voice commands or other audio data representing, for example, a step or stop of video capture. The command may include a request for the current state of the predetermined process. The generated response may be audible, visual, tactile (e.g., using a haptic interface), and/or virtual and/or augmented. The response and corresponding data may be stored in a neural network.
The present invention aims to minimize the disturbance to the trainee (operator). Thus, support should be provided to ensure that the gesture is smooth and automatic, to provide information about quality, accuracy and examination to be performed, and to repeat the guide message from the initial training (see flat-junction example above). A warning message following the detected error alerts the operator that a systematic self-test is required. By focusing on the characteristics of the gesture in real time, the operator training remains automatic regardless of personal characteristics (e.g., hand size, skin tone, whether the hand is diseased or disabled, the sex of the operator, etc.).
The terms "at least one" and "one or more" can be used interchangeably. The range expressed as "between a and b" includes the values "a" and "b".
Although particular embodiments of the disclosed apparatus have been illustrated and described, various changes, additions and modifications can be made without departing from the spirit or scope of the invention. Accordingly, no limitation should be imposed on the scope of the described invention except as set forth in the appended claims.

Claims (10)

1. A system comprising one or more processors capable of executing programming instructions stored in a memory operably connected to the one or more processors to perform a method (100) for training an operator in a predetermined process having a sequence of characteristic gestures, characterized in that the method performed by the system comprises the steps of:
-an acquisition step (102) performed by a receiver module of a computer system intended to receive data corresponding to a first-person view of an operator, in which acquisition step (102) gestures of the operator are captured in real time in the first-person view during a training program performed by the operator, and in which acquisition step (102) the computer system acquires data of gestures related to a predetermined procedure in which the operator participates, the acquisition step (102) comprising:
a hand tracking step utilizing a display device of a computer system comprising an augmented reality tracking assembly, wherein an operator's hand is tracked to detect a position corresponding to a beginning or end of a sequence of characteristic gestures of a current predetermined process,
a hand motion tracking step (104) performed by an extraction module of the computer system to extract a set of characteristic gestures found in a currently predetermined process, wherein motion tracking with a display device of the computer system enables monitoring of the trajectory and motion of an operator's finger during a sequence of characteristic gestures, in the hand motion tracking step (104) the display device detects all gestures in the sequence that are related to the predetermined process,
-a step (106) of identifying the position of the hand of the operator with respect to the characteristic gesture of the predetermined procedure, wherein the characteristic gesture is identified with the movement parameters performed by the operator and recorded in the movement tracking step (104), said steps being performed by an extraction analysis module of the computer system to identify the characteristic gesture related to the predetermined procedure, and
A comparison step (108) in which the identified characteristic gesture is compared with a characteristic gesture of a predetermined process to determine any difference between the two in the predetermined process,
such that gestures of an operator performing a predetermined procedure are recorded for use in the current training program.
2. The system of claim 1, wherein the display device of the system includes a field of view that captures gestures made by an operator in the acquiring step (102).
3. The system of claim 2, wherein the comparing step (108) of the method performed by the system comprises:
generating a gesture quality verification (108 a,108 a') to indicate when the recognized gesture is performed correctly,
generating a warning message (108 b,108 b') to indicate when the recognized gesture is performed incorrectly,
wherein the verification (108 a) and the warning message (108 b) are generated in real time.
4. A system according to claim 3, wherein the display means of the system comprises a display means worn by an operator during a training program, and wherein the visual instructions in the field of view of the display means comprise a gesture quality verification (108 a,108a ') and a warning message (108 b,108 b') generated in the comparison step of the method performed by the system.
5. The system of claim 4, wherein the field of view of the display device of the system comprises a text field (50) and/or an image field (52) provided with one or more visual instructions.
6. The system of any of claims 3 to 5, wherein generating a warning message (108 b,108 b') in a method performed by the system comprises generating one or more suggestions that prompt an operator to correct an incorrect gesture before performing a subsequent gesture.
7. The system according to any of the preceding claims, wherein the hand movement tracking step (104) of the method performed by the system comprises the step of detecting the end of the gesture sequence.
8. The system of any one of the preceding claims, wherein in the comparing step (108) of the method performed by the system, one or more gesture recognition models utilize:
-machine learning algorithm, or
-a deep learning algorithm.
9. The system of any of the preceding claims, wherein the method performed by the system further comprises the step of applying a time synchronization between the gesture made by the operator and the expected characteristic gesture during the training program.
10. The system according to claim 9, wherein during the comparison step (108) of the method performed by the system, a time synchronization of the application is used when sending the authentication (108 a,108a ') and/or the warning message (108 b,108 b'),
such that the operator receives an indication of success or failure of the recognized gesture in real-time.
CN202310745641.9A 2022-06-22 2023-06-21 Enhanced training management of operators performing industrial processes Pending CN116540880A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
FR2206122
FR2206122A FR3137205A1 (en) 2022-06-22 2022-06-22 Augmented Training Management of Operators Who Carry Out Industrial Processes

Publications (1)

Publication Number Publication Date
CN116540880A true CN116540880A (en) 2023-08-04

Family

ID=83996266

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310745641.9A Pending CN116540880A (en) 2022-06-22 2023-06-21 Enhanced training management of operators performing industrial processes

Country Status (2)

Country Link
CN (1) CN116540880A (en)
FR (1) FR3137205A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7308112B2 (en) * 2004-05-14 2007-12-11 Honda Motor Co., Ltd. Sign based human-machine interaction
US9643314B2 (en) 2015-03-04 2017-05-09 The Johns Hopkins University Robot control, training and collaboration in an immersive virtual reality environment
WO2016144741A1 (en) 2015-03-06 2016-09-15 Illinois Tool Works Inc. Sensor assisted head mounted displays for welding
EP3321843A1 (en) * 2016-11-09 2018-05-16 Bombardier Transportation GmbH A centralized traffic control system, and a method in relation with the system
US11017231B2 (en) 2019-07-10 2021-05-25 Microsoft Technology Licensing, Llc Semantically tagged virtual and physical objects

Also Published As

Publication number Publication date
FR3137205A1 (en) 2023-12-29

Similar Documents

Publication Publication Date Title
US20230078968A1 (en) Systems and Methods for Monitoring and Evaluating Body Movement
US9669544B2 (en) Vision-guided robots and methods of training them
CN107428004B (en) Automatic collection and tagging of object data
US20190126484A1 (en) Dynamic Multi-Sensor and Multi-Robot Interface System
CN105252532A (en) Method of cooperative flexible attitude control for motion capture robot
WO2011065034A1 (en) Method for controlling action of robot, and robot system
CN106462242A (en) User interface control using gaze tracking
CN112614399A (en) Dance teaching equipment based on virtual reality and teaching method thereof
CN104428732A (en) Multimodal interaction with near-to-eye display
US10296096B2 (en) Operation recognition device and operation recognition method
JP2018512980A (en) Frameworks, devices and methods configured to enable delivery of interactive skill training content, including content with multiple expert knowledge variations
CN113228070A (en) Method and system for automatic repeat step and loop detection for manual assembly line operations
JP6849312B2 (en) Work motion recognition system
CN112528957A (en) Human motion basic information detection method and system and electronic equipment
JP2020141806A (en) Exercise evaluation system
Ovur et al. Naturalistic robot-to-human bimanual handover in complex environments through multi-sensor fusion
CN116540880A (en) Enhanced training management of operators performing industrial processes
CN113051973A (en) Method and device for posture correction and electronic equipment
Guha et al. AI Virtual Mouse Using Hand Gesture Recognition
JP2021056922A (en) Human movement support system and method thereof
US20230286159A1 (en) Remote control system
CN117747101A (en) Cognitive rehabilitation robot system and control method thereof
CN210119873U (en) Supervision device based on VR equipment
Wagh et al. Quantifying similarities between MediaPipe and a known standard for tracking 2D hand trajectories
CN111476135B (en) Human body abnormal activity recognition system using wearable electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination