CN111860209A - Hand recognition method and device, electronic equipment and storage medium

Hand recognition method and device, electronic equipment and storage medium

Info

Publication number
CN111860209A
Authority
CN
China
Prior art keywords: hand, recognition, image, hand recognition, pixel
Prior art date
Legal status: Granted
Application number
CN202010606707.2A
Other languages
Chinese (zh)
Other versions
CN111860209B (en)
Inventor
卢艺帆
Current Assignee: Beijing ByteDance Network Technology Co Ltd
Original Assignee: Beijing ByteDance Network Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202010606707.2A
Publication of CN111860209A
Application granted
Publication of CN111860209B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107 Static hand or arm
    • G06V40/117 Biometrics derived from hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

According to the hand recognition method and device, electronic equipment and storage medium provided herein, an image to be recognized is acquired; the image to be recognized is processed by a plurality of image-based hand recognition models, and the hand recognition frame output by each model is obtained; a tracking network model then judges the accuracy of the hand image in each hand recognition frame, and the frames that pass the judgment are output as the hand image recognition result. Because different hand recognition models are each effective in different parts of a complex shooting scene, combining their outputs and validating them with the tracking network model yields an accurate hand image recognition result.

Description

Hand recognition method and device, electronic equipment and storage medium
Technical Field
The embodiment of the disclosure relates to the field of image processing, and in particular, to a hand recognition method and apparatus, an electronic device, and a storage medium.
Background
Identifying the types of objects in an image is an indispensable step in image tracking.
When tracking and recognizing a hand in an image, the distance between the hand and the capture device varies widely because of the complexity of the shooting environment. The related art, however, offers no reliable way to recognize hands at such varied distances, so the hand positions it obtains from the image are inaccurate, which seriously hampers subsequent hand tracking.
Disclosure of Invention
In order to solve the above problem, embodiments of the present disclosure provide a hand recognition method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present disclosure provides a hand recognition method, including:
acquiring an image to be identified;
recognizing the image to be identified with a plurality of image-based hand recognition models, and obtaining the hand recognition frame output by each hand recognition model;
and performing hand image accuracy judgment on each hand recognition frame by using the tracking network model, and outputting the hand recognition frames passing judgment as hand image recognition results.
In a second aspect, an embodiment of the present disclosure provides a hand recognition device, including:
the acquisition module is used for acquiring an image to be identified;
the identification module is used for respectively identifying the images to be identified by utilizing a plurality of hand identification models based on images and respectively obtaining hand identification frames output by each hand identification model;
the judging module is used for judging the accuracy of the hand image of each hand recognition frame by utilizing the tracking network model;
and the output module is used for outputting the hand recognition frame passing the judgment as a hand image recognition result.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory to cause the at least one processor to perform the hand recognition method as set forth in the first aspect above and in various possible designs of the first aspect.
In a fourth aspect, the present disclosure provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the hand recognition method according to the first aspect and various possible designs of the first aspect is implemented.
According to the hand recognition method and device, electronic equipment and storage medium provided herein, an image to be recognized is acquired and recognized with a plurality of image-based hand recognition models, and the hand recognition frame output by each model is obtained; the tracking network model then judges the accuracy of the hand image in each hand recognition frame, and the frames that pass the judgment are output as the hand image recognition result. Because different hand recognition models are each effective in different parts of a complex shooting scene, combining their outputs and validating them with the tracking network model yields an accurate hand image recognition result.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and for those skilled in the art, other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic diagram of a network architecture on which the embodiments of the present disclosure are based;
fig. 2 is a schematic flow chart of a hand recognition method according to an embodiment of the present disclosure;
fig. 3a is a schematic connection diagram in a hand recognition method according to an embodiment of the disclosure;
fig. 3b is a schematic output diagram of a human joint model in a hand recognition method according to an embodiment of the disclosure;
fig. 4 is a schematic output diagram of a pixel recognition model in a hand recognition method according to an embodiment of the present disclosure;
fig. 5 is a schematic flow chart of another hand recognition method according to the embodiment of the present disclosure;
fig. 6 is a schematic output diagram of a tracking network model in a hand recognition method according to an embodiment of the present disclosure;
Fig. 7 is a block diagram of a hand recognition device according to an embodiment of the present disclosure;
fig. 8 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
With the development of science and technology, images are used ever more frequently and widely in daily life, and the demand for image processing keeps growing. Identifying the types of objects in an image is an indispensable step in image tracking. With the development of machine learning algorithms, it has become possible to automatically recognize hands appearing in images using neural network models.
In the prior art, hand recognition is realized based on pixel detection or human joint detection.
Specifically, pixel detection processes each pixel in an image with a neural network model to determine which object each pixel belongs to; the pixels are then clustered, and each resulting cluster represents one object in the image. The object type is then determined from the pixels constituting each object so as to find the pixels whose object type is hand, and the position of the hand in the image is determined from these pixels. However, when a hand in the scene is too far from the shooting device, it occupies too few pixels in the image, and the pixel detection technique cannot identify and cluster the pixels constituting the hand, so the position of the hand in the image cannot be determined.
Human joint detection uses a neural network model to identify and locate human joints in the image, and then estimates the position of the hand from the joint positions using the relative positional relationship between the joints and the hand. However, when a person in the scene is too close to the imaging device, the model cannot recognize the positions of all joints and therefore cannot determine the position of the hand in the image.
In particular, in some gesture-based interaction scenarios, such as human-computer interaction with a television or with a mobile terminal, there may be several people in the scene at different distances from the shooting device. When the device needs to track the hands of multiple people located at different distances, a single recognition mode cannot meet the requirement.
In view of the above problems, the present disclosure provides a hand recognition method, device, electronic device, and storage medium. The idea is to recognize the hands of people at different distances in the same image with several hand recognition models, and then to aggregate and validate the recognition results with a tracking network model to obtain the hand image recognition result.
Referring to fig. 1, fig. 1 is a schematic diagram of a network architecture based on which the present disclosure is based, and the network architecture shown in fig. 1 may specifically include a hand recognition device 2 and a terminal 1.
The terminal 1 may be a hardware device capable of capturing images, such as a user's mobile phone, a desktop computer, a smart home device, or a tablet computer. The hand recognition device 2 is hardware or software that can interact with each terminal 1 through a network; it is configured to perform the hand recognition method described in the examples below, perform hand recognition on the image data obtained from each terminal 1, obtain a hand recognition result, and output that result back to the corresponding terminal 1.
In the network architecture shown in fig. 1, when the hand recognition device 2 is hardware, it may include a cloud server with computing capability; when the hand recognition device 2 is software, it can be installed in an electronic device with computing capability, where the electronic device includes, but is not limited to, a laptop computer, a desktop computer, and the like.
That is, the hand recognition method based on the present disclosure may be specifically based on the embodiment shown in fig. 1, and is applicable to various application scenarios, including but not limited to: a target tracking scene based on human hands, a game interaction scene under a television scene, an equipment control scene based on gestures and the like.
In a target tracking scene of a human hand, the terminal 1 may be a tracking device including hardware such as a camera and an image radar acquisition device, and after the hand recognition device obtains a hand image recognition result, the hand image recognition result is returned to the tracking device, so that the tracking device presents the image and the hand image recognition result to a user.
In a gesture-based device control scene, in order to accurately acquire a gesture of a user, firstly, image positioning needs to be performed on a hand of the user, that is, acquired image data is processed by a hand recognition device to obtain a hand image recognition result, and then, the control device further analyzes an image of the hand therein based on the hand image recognition result to obtain a control instruction presented by the gesture, so as to control a controlled device based on the control instruction.
In particular, in a game interaction scenario on a television there are usually several interacting users, whose hands are at different distances from the television. To accurately capture the gestures of all these users, every hand appearing in the acquired image data must be processed by the hand recognition device to obtain a hand image recognition result; the game device then further analyzes the hand images in that result to obtain the interaction instructions expressed by the gestures, interacts with the game progress accordingly, and presents the interaction result to the users.
In a first aspect, referring to fig. 2, fig. 2 is a schematic flowchart of a hand recognition method according to an embodiment of the present disclosure. The hand recognition method provided by the embodiment of the disclosure comprises the following steps:
step 101, obtaining an image to be identified.
The execution subject of the method provided in this example is the hand recognition device described above, which can interact with the terminal to obtain images captured by the terminal while it performs its own tasks. These images are preprocessed into image data of the image to be recognized that can be used for hand recognition. Preprocessing includes, but is not limited to, segmentation, denoising, and conversion into matrices.
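As an illustration only, the Python sketch below shows the kind of preprocessing described above; the function name, the fixed input size, and the use of OpenCV-style denoising are assumptions made for illustration and are not prescribed by this disclosure.

```python
# Minimal preprocessing sketch (hypothetical helper; the disclosure does not fix a concrete pipeline).
import cv2
import numpy as np

def preprocess(frame: np.ndarray, size: tuple = (512, 512)) -> np.ndarray:
    """Turn a raw captured frame into image data usable by the recognition models."""
    denoised = cv2.GaussianBlur(frame, (3, 3), 0)   # simple denoising
    resized = cv2.resize(denoised, size)            # crop/scale to a fixed input size
    matrix = resized.astype(np.float32) / 255.0     # conversion into a normalized matrix
    return matrix
```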
Step 102, recognizing the image to be recognized with a plurality of image-based hand recognition models, and obtaining the hand recognition frame output by each hand recognition model.
The plurality of hand recognition models that may be employed in embodiments provided by the present disclosure include at least two of a human joint recognition model, a pixel recognition model, and a thermal imaging-based hand position recognition model.
Of course, in other implementation manners, the multiple hand recognition models may further include models that can be used for recognizing the hand based on other principles, which is not limited by the embodiments of the present disclosure.
In a specific implementation, the hand in the image can be recognized in at least two ways, so as to obtain the hand recognition frames recognized in at least two ways.
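As a non-authoritative sketch of this fan-out step, the snippet below runs every image-based hand recognition model on the same image to be recognized and collects the hand recognition frames they output; the (x1, y1, x2, y2) box format and the callable model interface are assumptions made for illustration.

```python
from typing import Callable, List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) hand recognition frame (assumed format)

def recognize_hands(image: np.ndarray,
                    models: List[Callable[[np.ndarray], List[Box]]]) -> List[Box]:
    """Run every image-based hand recognition model on the same image and
    gather all hand recognition frames for later verification."""
    boxes: List[Box] = []
    for model in models:
        boxes.extend(model(image))  # each model may output zero or more frames
    return boxes
```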
Specifically, when the plurality of hand recognition models include the human joint recognition model, the hand recognition frame for this recognition mode can be obtained as follows: processing the image to be recognized with the human joint recognition model to obtain the positions of the joint points of the human body in the image; determining the position of the hand in the image from the positions of those joint points; and outputting a hand recognition frame identifying the hand position according to the position of the hand in the image.
Further, the determining the position of the hand in the image by using the positions of the joints of the human body in the image may specifically be: determining the position of the elbow joint and the wrist joint in the image; establishing a connecting line between the elbow joint and the wrist joint according to the positions of the elbow joint and the wrist joint, and determining the position of the hand in the image according to the connecting line; wherein the position of the hand in the image is on an extension of the line.
Fig. 3a is a schematic connection diagram in a hand recognition method according to an embodiment of the disclosure, and the scheme is further described with reference to fig. 3 a.
The human joint recognition model can recognize the image positions of various joint types, and the recognition device selects from these the positions of the elbow joint and the wrist joint in the image. A connecting line a is then established from the image positions of the elbow joint and the wrist joint, with one end at the image position of the elbow joint and the other at the image position of the wrist joint. The line is then extended beyond the image position of the wrist joint along its own direction to obtain an extension line b. One end of the extension line b is the image position of the wrist joint, and the position of the hand in the image lies on this extension line.
Generally, the distance between the hand and the wrist joint is smaller than the distance between the wrist joint and the elbow joint, and therefore, with this characteristic, the recognition apparatus can determine the position of the hand in the image on the extended line b using the length of the connection line a.
Finally, fig. 3b is a schematic diagram illustrating an output of a human body joint model in a hand recognition method according to an embodiment of the present disclosure, and as shown in fig. 3b, after obtaining the position of the image of the hand, the position of the hand in the image is labeled on the image to be recognized with the position of the image of the hand as a labeling center, so as to obtain and output a hand recognition frame.
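The following sketch illustrates the elbow-wrist extrapolation described above under stated assumptions: the fraction of the connecting line a used along the extension line b and the size of the output frame are hypothetical heuristics, since the disclosure only states that the hand-to-wrist distance is smaller than the wrist-to-elbow distance.

```python
import numpy as np

def hand_box_from_joints(elbow: np.ndarray, wrist: np.ndarray,
                         ratio: float = 0.5, box_scale: float = 0.6):
    """Estimate a hand recognition frame from the elbow and wrist joint positions.

    The hand centre is placed on the extension line b of the elbow-wrist
    connecting line a, at ratio * |a| beyond the wrist. Both the ratio and
    the box size are assumed heuristics, not values fixed by this disclosure.
    """
    a = wrist - elbow                                   # connecting line a
    length = float(np.linalg.norm(a))
    if length == 0.0:                                   # degenerate joint positions
        return None
    center = wrist + (a / length) * (ratio * length)    # point on extension line b
    half = 0.5 * box_scale * length                     # frame size tied to forearm length
    x1, y1 = center - half
    x2, y2 = center + half
    return (int(x1), int(y1), int(x2), int(y2))
```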
When the plurality of hand recognition models include the pixel recognition model, the hand recognition frame corresponding to the hand recognition mode can be obtained in the following mode: processing the image to be recognized by using a pixel recognition model, and determining an object to which each pixel in the image to be recognized belongs; determining the pixel position of each pixel corresponding to the hand in each pixel corresponding to each object; and generating and outputting a hand recognition frame according to the pixel position corresponding to the hand.
Specifically, each pixel in the image to be recognized is clustered by using the pixel recognition model to obtain a plurality of clustered pixels, and each clustered pixel corresponds to an object, wherein the object includes but is not limited to a target object such as a hand in a human body, and objects such as a table, a chair, a stool and the like in a scene.
Subsequently, fig. 4 is a schematic output diagram of the pixel recognition model in a hand recognition method provided by the embodiment of the present disclosure. As shown in fig. 4, the pixel recognition model determines the cluster of pixels corresponding to the hand from the clusters, and generates the corresponding hand recognition frame from the pixel positions or pixel coordinates of that cluster, thereby labeling the pixel position of the hand.
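A minimal sketch of how a hand recognition frame could be derived from the pixel recognition model's per-pixel output is given below; the per-pixel label mask format and the hand class id are assumptions for illustration.

```python
import numpy as np

HAND_LABEL = 1  # assumed class id for "hand" in the per-pixel output

def hand_box_from_mask(label_mask: np.ndarray):
    """Derive a hand recognition frame from the pixel recognition model output.

    label_mask is assumed to assign one object/class label per pixel; the
    frame is the bounding rectangle of the pixels labelled as hand.
    """
    ys, xs = np.nonzero(label_mask == HAND_LABEL)
    if xs.size == 0:               # hand too far away: no hand pixels found
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```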
That is, when the hand is very close to the shooting device, the pixel recognition model can effectively identify and label the image position of the hand, and when the hand is far away from the shooting device, the human joint recognition model can effectively identify and label it; in a television interaction scene in particular, this gives good accuracy when recognizing the hands of multiple users.
When the plurality of hand recognition models include the thermal-imaging-based hand position recognition model, the hand recognition frame for this recognition mode can be obtained as follows: processing the image to be recognized with the thermal-imaging-based hand position recognition model to obtain the thermal image corresponding to the image to be recognized; determining the image position corresponding to the hand according to the heat distribution in the thermal image; and generating and outputting a hand recognition frame according to the image position corresponding to the hand.
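Purely as an illustration of the thermal-imaging branch, the sketch below thresholds the heat distribution and takes the bounding rectangle of the hot region as the hand recognition frame; the normalization and the threshold value are assumptions, not details fixed by this disclosure.

```python
import numpy as np

def hand_box_from_thermal(thermal: np.ndarray, threshold: float = 0.8):
    """Locate a candidate hand region from a thermal image.

    Pixels whose heat exceeds threshold * max heat (an assumed heuristic) are
    treated as skin-temperature regions, and their bounding rectangle is
    returned as the hand recognition frame.
    """
    hot = thermal >= threshold * thermal.max()
    ys, xs = np.nonzero(hot)
    if xs.size == 0:
        return None
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
```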
Of course, other hand recognition models can be used to perform multiple recognition on the hand in the complex shooting environment, which is not described herein again in the embodiments of the present disclosure.
Step 103, performing hand image accuracy judgment on each hand recognition frame by using the tracking network model, and outputting the hand recognition frames that pass the judgment as the hand image recognition result.
Specifically, after the various hand recognition models are processed for the same image to be recognized, a plurality of hand recognition frames are output. At this time, the accuracy of the hand recognition boxes can be judged by using the tracking network model.
The recognition modes differ. For example, the human joint recognition model infers the position of the hand from the positions of the human joints; when the hand is behind the person's back, no hand image exists in the image, yet the human joint recognition model still outputs a hand recognition frame. The tracking network model is therefore needed to handle such cases effectively: only the recognition results in which a hand image actually appears in the hand recognition frame are retained, and the recognition results for the case above are discarded. Using the tracking network model ensures that the output hand image recognition result is genuine and valid.
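A minimal sketch of this verification step is shown below, assuming the tracking network model is exposed as a callable that classifies a cropped region as hand or not; this interface is hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]

def verify_boxes(image: np.ndarray, boxes: List[Box],
                 tracker_is_hand: Callable[[np.ndarray], bool]) -> List[Box]:
    """Keep only the hand recognition frames whose cropped content the
    tracking network model judges to actually contain a hand."""
    kept: List[Box] = []
    for (x1, y1, x2, y2) in boxes:
        crop = image[y1:y2, x1:x2]
        if crop.size and tracker_is_hand(crop):  # discard frames with no hand image
            kept.append((x1, y1, x2, y2))
    return kept
```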
According to the hand recognition method provided by this embodiment, an image to be recognized is acquired; the image is recognized with a plurality of image-based hand recognition models, and the hand recognition frame output by each model is obtained; the tracking network model then judges the accuracy of the hand image in each hand recognition frame, and the frames that pass the judgment are output as the hand image recognition result. Because different hand recognition models are each effective in different parts of a complex shooting scene, combining their outputs and validating them with the tracking network model yields an accurate hand image recognition result.
On the basis of the above embodiment, fig. 5 is a schematic flow chart of another hand recognition method provided by the embodiment of the present disclosure, and as shown in fig. 5, the method further includes:
step 201, acquiring an image to be identified.
Step 2021a, processing the image to be recognized by using the human joint recognition model to obtain the positions of the joint points of the human body in the image;
step 2022a, determining the position of the hand in the image by using the positions of the joint points of the human body in the image;
step 2023a, outputting a hand recognition frame for identifying the position of the hand according to the position of the hand in the image. Step 203 is performed.
Step 2021b, processing the image to be recognized by using a pixel recognition model, and determining an object to which each pixel in the image to be recognized belongs;
step 2022b, determining a pixel position of each pixel corresponding to the hand in each pixel corresponding to each object; and generating and outputting a hand recognition frame according to the pixel position corresponding to the hand. Step 203 is performed.
Step 203, intercepting corresponding hand images to be judged according to each hand recognition frame;
step 204, judging whether each hand image to be judged comprises a hand by utilizing a tracking network model to obtain a judgment result;
Step 205, determining a target hand image according to the judgment result, and performing coordinate regression processing on the target hand image to obtain corresponding hand recognition frame coordinates; the target hand image is a hand image to be determined including a hand;
and step 206, outputting the hand recognition frame coordinates corresponding to the target hand image as a hand image recognition result.
It should be noted that, in the present embodiment, the plurality of hand recognition models are the human joint recognition model and the pixel recognition model; when other types of hand recognition models are used, the implementation is similar and is not described exhaustively here.
On the basis of the foregoing embodiment, the present embodiment further includes a process of processing each hand recognition box by using a tracking network model. In addition, specific implementation manners of respectively obtaining the hand recognition frames output by each hand recognition model by respectively recognizing the images to be recognized by using multiple hand recognition models based on images can be seen in the foregoing embodiments.
Specifically, when a plurality of hand recognition frames for the image to be recognized are generated in steps 2023a and 2022b and output to the tracking network model, the tracking network model first crops, for each hand recognition frame, the image region corresponding to the position of that frame; these cropped regions constitute the hand images to be determined.
Subsequently, the tracking network model determines, for each hand image to be determined, whether its content includes a hand, obtaining a determination result (hand present or not) for each hand image to be determined.
Fig. 6 is a schematic diagram illustrating an output of a tracking network model in a hand recognition method according to an embodiment of the present disclosure, as shown in fig. 6, a recognition device selects a hand image including a part of a hand from hand images to be determined as a target hand image according to a determination result, and performs coordinate regression processing according to a hand recognition frame corresponding to the target hand image, so as to obtain hand recognition frame coordinates which can be used for outputting and forming a hand image recognition result.
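The sketch below strings steps 203 to 206 together, assuming a hypothetical tracker interface that returns, for each cropped hand image to be determined, whether it contains a hand, a judgment accuracy score, and regressed box coordinates relative to the crop; the interface and coordinate convention are assumptions for illustration only.

```python
from typing import Callable, List, Tuple
import numpy as np

Box = Tuple[int, int, int, int]

def track_and_refine(image: np.ndarray, boxes: List[Box],
                     tracker: Callable[[np.ndarray], Tuple[bool, float, Box]]):
    """Steps 203-206: crop each frame, classify it with the tracking network,
    and regress refined coordinates for the frames that contain a hand."""
    results = []
    for (x1, y1, x2, y2) in boxes:
        crop = image[y1:y2, x1:x2]                           # step 203: hand image to be determined
        is_hand, score, (rx1, ry1, rx2, ry2) = tracker(crop)  # steps 204-205 (hypothetical interface)
        if is_hand:                                           # keep only target hand images
            # regressed coordinates are relative to the crop; map them back to the full image
            results.append((x1 + rx1, y1 + ry1, x1 + rx2, y1 + ry2, score))
    return results                                            # step 206: hand image recognition result
```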
Finally, the hand image recognition result is output to a terminal, such as a television, so that the terminal can perform subsequent processing based on the result.
Optionally, in some cases a hand at the same position in the image to be recognized is recognized by several of the hand recognition models, each of which outputs a hand recognition frame to the tracking network model. In that case the tracking network model also performs a deduplication operation.
Specifically, the determination result obtained by judging, with the tracking network model, whether each hand image to be determined includes a hand indicates both whether the hand image to be determined includes a hand and the judgment accuracy.
When outputting the hand recognition frame coordinates corresponding to the target hand image as a hand image recognition result, the following operations are also executed: judging the overlapping degree of the coordinates of each hand recognition frame; when any two hand recognition frame coordinates are overlapped, one hand recognition frame coordinate is selected and output according to the judgment accuracy corresponding to the any two hand recognition frame coordinates.
For example, one of the hand recognition box coordinates may be selected and output by a non-maximum suppression algorithm based on the determination accuracy corresponding to the two arbitrary hand recognition box coordinates.
That is to say, in the embodiment of the present disclosure, when a hand at the same position in the image to be recognized is recognized by several hand recognition models, each outputting a hand recognition frame to the tracking network model, the tracking network model first still judges each hand recognition frame; it then checks the degree of overlap among the hand recognition frames that pass the judgment, that is, whether any of them overlap. When frames do overlap, one of them is selected for output by a non-maximum suppression algorithm according to the judgment accuracy.
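For reference, a standard non-maximum suppression routine of the kind referred to above is sketched below; the IoU overlap threshold is an assumed value and the box/score representation is chosen only for illustration.

```python
from typing import List, Tuple

def iou(a: Tuple[int, int, int, int], b: Tuple[int, int, int, int]) -> float:
    """Intersection-over-union of two hand recognition frames."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def nms(boxes: List[Tuple[int, int, int, int]], scores: List[float],
        overlap_thresh: float = 0.5) -> List[int]:
    """Non-maximum suppression: among overlapping frames, keep the one with
    the highest judgment accuracy (the threshold value is an assumption)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < overlap_thresh]
    return keep
```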
On the basis of the foregoing embodiment, the embodiment effectively improves the accuracy of the output hand image recognition result by adopting the hand image accuracy determination including the authenticity determination, the deduplication determination, and the like.
Fig. 7 is a block diagram of a hand recognition apparatus according to an embodiment of the present disclosure, corresponding to the hand recognition method according to the above embodiment. For ease of illustration, only portions that are relevant to embodiments of the present disclosure are shown. Referring to fig. 7, the hand recognition device includes: the device comprises an acquisition module 10, an identification module 20, a judgment module 30 and an output module 40.
The acquisition module 10 is used for acquiring an image to be recognized;
the recognition module 20 is configured to respectively recognize the images to be recognized by using multiple hand recognition models based on the images, and respectively obtain hand recognition frames output by each hand recognition model;
the judging module 30 is used for judging the accuracy of the hand images of the hand recognition frames by utilizing the tracking network model;
and the output module 40 is used for outputting the hand recognition frame which passes the judgment as a hand image recognition result.
In alternative embodiments provided by the present disclosure, the plurality of hand recognition models includes at least two of a human joint recognition model, a pixel recognition model, and a thermal imaging based hand position recognition model.
In an optional embodiment provided by the present disclosure, when the plurality of hand recognition models includes a human joint recognition model, the recognition module 20 is specifically configured to: processing the recognition image by using the human body joint recognition model to obtain the positions of all joint points of the human body in the image; determining the position of the hand in the image by using the positions of all the joint points of the human body in the image; and outputting a hand recognition frame for identifying the hand position according to the position of the hand in the image.
In an optional embodiment provided by the present disclosure, the identification module 20 is specifically configured to: determining the position of the elbow joint and the wrist joint in the image; establishing a connecting line between the elbow joint and the wrist joint according to the positions of the elbow joint and the wrist joint, and determining the position of the hand in the image according to the connecting line; wherein the position of the hand in the image is on an extension of the line.
In an optional embodiment provided by the present disclosure, when the hand recognition model comprises a pixel recognition model, the recognition module 20 is specifically configured to: processing the image to be recognized by using a pixel recognition model, and determining an object to which each pixel in the image to be recognized belongs; determining the pixel position of each pixel corresponding to the hand in each pixel corresponding to each object; and generating and outputting a hand recognition frame according to the pixel position corresponding to the hand.
In an optional embodiment provided by the present disclosure, the determining module 30 is specifically configured to: intercepting a corresponding hand image to be judged according to each hand recognition frame; judging whether each hand image to be judged comprises a hand by utilizing a tracking network model to obtain a judgment result; determining a target hand image according to the judgment result, and performing coordinate regression processing on the target hand image to obtain corresponding hand recognition frame coordinates; the target hand image is a hand image to be determined including a hand; and outputting the hand recognition frame coordinates corresponding to the target hand image as a hand image recognition result.
In an optional embodiment provided by the present disclosure, the determination result is used to indicate whether a hand is included in the hand image to be determined, and is also used to indicate a determination accuracy;
the determination module 30 is specifically configured to: judging the overlapping degree of the coordinates of each hand recognition frame; when any two hand recognition frame coordinates are overlapped, one of the hand recognition frame coordinates is selected to be output by the output module 40 according to the judgment accuracy corresponding to the any two hand recognition frame coordinates.
In an optional embodiment provided by the present disclosure, the determining module 30 is specifically configured to: based on the judgment accuracy corresponding to the two arbitrary hand recognition box coordinates, one of the hand recognition box coordinates is selected by using a non-maximum suppression algorithm for output by the output module 40.
The hand recognition device provided by the embodiment acquires an image to be recognized; respectively identifying the images to be identified by utilizing a plurality of hand identification models based on images to respectively obtain hand identification frames output by each hand identification model; the tracking network model is used for judging the accuracy of the hand image of each hand recognition frame, and the hand recognition frames passing judgment are output as hand image recognition results, so that a scheme that different hand recognition models can effectively recognize hands in a complex shooting scene is utilized, the tracking network model is combined to judge the results, and then the accurate hand image recognition results are obtained.
The electronic device provided in this embodiment may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
Referring to fig. 8, a schematic structural diagram of an electronic device 900 suitable for implementing the embodiment of the present disclosure is shown, where the electronic device 900 may be a terminal device or a server. Among them, the terminal Device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a Digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), a car terminal (e.g., car navigation terminal), etc., and a fixed terminal such as a Digital TV, a desktop computer, etc. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 8, the electronic device 900 may include a hand recognition device (e.g., a central processing unit, a graphics processor, etc.) 901, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage device 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are also stored. The hand recognition apparatus 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
Generally, the following devices may be connected to the I/O interface 905: input devices 906 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 907 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 908 including, for example, magnetic tape, hard disk, etc.; and a communication device 909. The communication device 909 may allow the electronic apparatus 900 to perform wireless or wired communication with other apparatuses to exchange data. While fig. 8 illustrates an electronic device 900 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 909, or installed from the storage device 908, or installed from the ROM 902. The above-described functions defined in the method of the embodiments of the present disclosure are performed when the computer program is executed by the hand recognition apparatus 901.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. Where the name of a unit does not in some cases constitute a limitation of the unit itself, for example, the first retrieving unit may also be described as a "unit for retrieving at least two internet protocol addresses".
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The following are some embodiments of the disclosure.
In a first aspect, according to one or more embodiments of the present disclosure, a hand recognition method includes:
acquiring an image to be identified;
respectively identifying the images to be identified by utilizing a plurality of hand identification models based on images to respectively obtain hand identification frames output by each hand identification model;
and performing hand image accuracy judgment on each hand recognition frame by using the tracking network model, and outputting the hand recognition frames passing judgment as hand image recognition results.
In alternative embodiments provided by the present disclosure, the plurality of hand recognition models includes at least two of a human joint recognition model, a pixel recognition model, and a thermal imaging based hand position recognition model.
In an optional embodiment provided by the present disclosure, when the plurality of hand recognition models include a human joint recognition model, the recognizing the image to be recognized with the plurality of image-based hand recognition models and obtaining the hand recognition frame output by each hand recognition model includes:
processing the recognition image by using the human body joint recognition model to obtain the positions of all joint points of the human body in the image;
Determining the position of the hand in the image by using the positions of all the joint points of the human body in the image;
and outputting a hand recognition frame for identifying the hand position according to the position of the hand in the image.
In an optional embodiment provided by the present disclosure, the determining the position of the hand in the image by using the positions of the joint points of the human body in the image includes:
determining the position of the elbow joint and the wrist joint in the image;
establishing a connecting line between the elbow joint and the wrist joint according to the positions of the elbow joint and the wrist joint, and determining the position of the hand in the image according to the connecting line; wherein the position of the hand in the image is on an extension of the line.
In an optional embodiment provided by the present disclosure, when the plurality of hand recognition models include a pixel recognition model, the recognizing the image to be recognized with the plurality of image-based hand recognition models and obtaining the hand recognition frame output by each hand recognition model includes:
processing the image to be recognized by using a pixel recognition model, and determining an object to which each pixel in the image to be recognized belongs; determining the pixel position of each pixel corresponding to the hand in each pixel corresponding to each object; and generating and outputting a hand recognition frame according to the pixel position corresponding to the hand.
In an optional embodiment provided by the present disclosure, the performing, by using a tracking network model, a hand image accuracy determination on each hand recognition frame, and outputting the hand recognition frame that has passed the determination as a hand image recognition result includes:
intercepting a corresponding hand image to be judged according to each hand recognition frame;
judging whether each hand image to be judged comprises a hand by utilizing a tracking network model to obtain a judgment result;
determining a target hand image according to the judgment result, and performing coordinate regression processing on the target hand image to obtain corresponding hand recognition frame coordinates; the target hand image is a hand image to be determined including a hand;
and outputting the hand recognition frame coordinates corresponding to the target hand image as a hand image recognition result.
In an optional embodiment provided by the present disclosure, the determination result is used to indicate whether a hand is included in the hand image to be determined, and is also used to indicate a determination accuracy;
correspondingly, the hand recognition frame coordinates corresponding to the target hand image are output as a hand image recognition result, and the method comprises the following steps:
judging the overlapping degree of the coordinates of each hand recognition frame;
When any two hand recognition frame coordinates are overlapped, one hand recognition frame coordinate is selected and output according to the judgment accuracy corresponding to the any two hand recognition frame coordinates.
In an optional embodiment provided by the present disclosure, the selecting and outputting one of the hand recognition frame coordinates according to the determination accuracy corresponding to the two arbitrary hand recognition frame coordinates includes:
and based on the judgment accuracy corresponding to the coordinates of any two hand recognition boxes, selecting one hand recognition box coordinate according to a non-maximum suppression algorithm and outputting the hand recognition box coordinate.
In a second aspect, according to one or more embodiments of the present disclosure, a hand recognition device comprises:
the acquisition module is used for acquiring an image to be identified;
the identification module is used for respectively identifying the images to be identified by utilizing a plurality of hand identification models based on images and respectively obtaining hand identification frames output by each hand identification model;
the judging module is used for judging the accuracy of the hand image of each hand recognition frame by utilizing the tracking network model;
and the output module is used for outputting the hand recognition frame passing the judgment as a hand image recognition result.
In alternative embodiments provided by the present disclosure, the plurality of hand recognition models includes at least two of a human joint recognition model, a pixel recognition model, and a thermal imaging based hand position recognition model.
In an optional embodiment provided by the present disclosure, when the plurality of hand recognition models includes a human joint recognition model, the recognition module is specifically configured to: processing the recognition image by using the human body joint recognition model to obtain the positions of all joint points of the human body in the image; determining the position of the hand in the image by using the positions of all the joint points of the human body in the image; and outputting a hand recognition frame for identifying the hand position according to the position of the hand in the image.
In an optional embodiment provided by the present disclosure, the identification module is specifically configured to: determining the position of the elbow joint and the wrist joint in the image; establishing a connecting line between the elbow joint and the wrist joint according to the positions of the elbow joint and the wrist joint, and determining the position of the hand in the image according to the connecting line; wherein the position of the hand in the image is on an extension of the line.
In an optional embodiment provided by the present disclosure, when the plurality of hand recognition models comprises a pixel recognition model, the recognition module is specifically configured to: processing the image to be recognized by using a pixel recognition model, and determining an object to which each pixel in the image to be recognized belongs; determining the pixel position of each pixel corresponding to the hand in each pixel corresponding to each object; and generating and outputting a hand recognition frame according to the pixel position corresponding to the hand.
In an optional embodiment provided by the present disclosure, the determining module is specifically configured to: intercepting a corresponding hand image to be judged according to each hand recognition frame; judging whether each hand image to be judged comprises a hand by utilizing a tracking network model to obtain a judgment result; determining a target hand image according to the judgment result, and performing coordinate regression processing on the target hand image to obtain corresponding hand recognition frame coordinates; the target hand image is a hand image to be determined including a hand; and outputting the hand recognition frame coordinates corresponding to the target hand image as a hand image recognition result.
In an optional embodiment provided by the present disclosure, the determination result is used to indicate whether a hand is included in the hand image to be determined, and is also used to indicate a determination accuracy;
the determination module is specifically configured to: judging the overlapping degree of the coordinates of each hand recognition frame; when any two hand recognition frame coordinates are overlapped, one hand recognition frame coordinate is selected to be output by the output module according to the judgment accuracy corresponding to the any two hand recognition frame coordinates.
In an optional embodiment provided by the present disclosure, the determining module is specifically configured to: and based on the judgment accuracy corresponding to the coordinates of any two hand recognition boxes, selecting one hand recognition box coordinate for the output module to output by using a non-maximum suppression algorithm.
In a third aspect, in accordance with one or more embodiments of the present disclosure, an electronic device comprises: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executing the memory-stored computer-executable instructions causes the at least one processor to perform the hand recognition method of any one of the preceding claims.
In a fourth aspect, according to one or more embodiments of the present disclosure, a computer-readable storage medium has stored therein computer-executable instructions that, when executed by a processor, implement the hand recognition method described above.
The foregoing description is merely illustrative of the preferred embodiments of the present disclosure and of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combinations of features described above, and also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (18)

1. A hand recognition method, comprising:
acquiring an image to be recognized;
recognizing the image to be recognized respectively by using a plurality of image-based hand recognition models to obtain a hand recognition frame output by each hand recognition model; and
performing hand image accuracy judgment on each hand recognition frame by using a tracking network model, and outputting the hand recognition frames that pass the judgment as a hand image recognition result.
2. The hand recognition method of claim 1, wherein the plurality of hand recognition models comprises at least two of a human joint recognition model, a pixel recognition model, and a thermal imaging-based hand position recognition model.
3. The hand recognition method according to claim 2, wherein when the plurality of hand recognition models include a human joint recognition model, the recognizing the image to be recognized respectively by using the plurality of image-based hand recognition models to obtain the hand recognition frame output by each hand recognition model comprises:
processing the image to be recognized by using the human joint recognition model to obtain the positions of the joint points of the human body in the image;
determining the position of the hand in the image by using the positions of all the joint points of the human body in the image;
and outputting a hand recognition frame for identifying the hand position according to the position of the hand in the image.
4. The hand recognition method of claim 3, wherein the determining the position of the hand in the image by using the positions of the joint points of the human body in the image comprises:
determining the positions of the elbow joint and the wrist joint in the image;
establishing a connecting line between the elbow joint and the wrist joint according to the positions of the elbow joint and the wrist joint, and determining the position of the hand in the image according to the connecting line, wherein the position of the hand in the image lies on an extension of the connecting line.
5. The hand recognition method according to claim 1, wherein when the plurality of hand recognition models include a pixel recognition model, the recognizing the image to be recognized respectively by using the plurality of image-based hand recognition models to obtain the hand recognition frame output by each hand recognition model comprises:
processing the image to be recognized by using the pixel recognition model to determine the object to which each pixel in the image to be recognized belongs;
determining, among the pixels corresponding to each object, the pixel positions of the pixels corresponding to the hand; and generating and outputting a hand recognition frame according to the pixel positions corresponding to the hand.
6. The hand recognition method according to any one of claims 1 to 5, wherein the performing hand image accuracy judgment on each hand recognition frame by using the tracking network model, and outputting the hand recognition frames that pass the judgment as the hand image recognition result comprises:
cropping out a corresponding hand image to be judged according to each hand recognition frame;
judging whether each hand image to be judged includes a hand by utilizing the tracking network model to obtain a judgment result;
determining a target hand image according to the judgment result, and performing coordinate regression processing on the target hand image to obtain corresponding hand recognition frame coordinates, the target hand image being a hand image to be judged that includes a hand; and
outputting the hand recognition frame coordinates corresponding to the target hand image as a hand image recognition result.
7. The hand recognition method according to claim 6, wherein the judgment result is used to indicate whether a hand is included in the hand image to be judged, and is also used to indicate a judgment accuracy;
correspondingly, the outputting the hand recognition frame coordinates corresponding to the target hand image as the hand image recognition result comprises:
judging the degree of overlap between the coordinates of the hand recognition frames; and
when any two sets of hand recognition frame coordinates overlap, selecting and outputting one set of hand recognition frame coordinates according to the judgment accuracies corresponding to the two sets of hand recognition frame coordinates.
8. The hand recognition method according to claim 7, wherein the selecting and outputting one set of hand recognition frame coordinates according to the judgment accuracies corresponding to the two sets of hand recognition frame coordinates comprises:
selecting, based on the judgment accuracies corresponding to the two sets of hand recognition frame coordinates, one set of hand recognition frame coordinates according to a non-maximum suppression algorithm, and outputting the selected hand recognition frame coordinates.
9. A hand recognition device, comprising:
the acquisition module is used for acquiring an image to be recognized;
the recognition module is used for respectively recognizing the image to be recognized by using a plurality of image-based hand recognition models to obtain a hand recognition frame output by each hand recognition model;
the judging module is used for performing hand image accuracy judgment on each hand recognition frame by using a tracking network model; and
the output module is used for outputting the hand recognition frames that pass the judgment as a hand image recognition result.
10. A hand recognition device according to claim 9, wherein the plurality of hand recognition models comprises at least two of a human joint recognition model, a pixel recognition model and a thermal imaging based hand position recognition model.
11. The hand recognition device of claim 10, wherein when the plurality of hand recognition models comprises a human joint recognition model, the recognition module is specifically configured to: process the image to be recognized by using the human joint recognition model to obtain the positions of the joint points of the human body in the image; determine the position of the hand in the image by using the positions of the joint points of the human body in the image; and output a hand recognition frame identifying the hand position according to the position of the hand in the image.
12. The hand recognition device of claim 11, wherein the recognition module is specifically configured to: determine the positions of the elbow joint and the wrist joint in the image; and establish a connecting line between the elbow joint and the wrist joint according to their positions, and determine the position of the hand in the image according to the connecting line, wherein the position of the hand in the image lies on an extension of the connecting line.
13. The hand recognition device of claim 10, wherein when the plurality of hand recognition models comprises a pixel recognition model, the recognition module is specifically configured to: process the image to be recognized by using the pixel recognition model to determine the object to which each pixel in the image to be recognized belongs; determine, among the pixels corresponding to each object, the pixel positions of the pixels corresponding to the hand; and generate and output a hand recognition frame according to the pixel positions corresponding to the hand.
14. The hand recognition device according to any one of claims 9 to 13, wherein the judging module is specifically configured to: crop out a corresponding hand image to be judged according to each hand recognition frame; judge whether each hand image to be judged includes a hand by using the tracking network model to obtain a judgment result; determine a target hand image according to the judgment result, and perform coordinate regression processing on the target hand image to obtain corresponding hand recognition frame coordinates, the target hand image being a hand image to be judged that includes a hand; and output the hand recognition frame coordinates corresponding to the target hand image as a hand image recognition result.
15. The hand recognition device according to claim 14, wherein the judgment result is used to indicate whether a hand is included in the hand image to be judged, and is also used to indicate a judgment accuracy;
the judging module is specifically configured to: judge the degree of overlap between the coordinates of the hand recognition frames; and when any two sets of hand recognition frame coordinates overlap, select one set of hand recognition frame coordinates, according to the judgment accuracies corresponding to the two sets of hand recognition frame coordinates, for the output module to output.
16. The hand recognition device of claim 15, wherein the judging module is specifically configured to: select, based on the judgment accuracies corresponding to the two sets of hand recognition frame coordinates and by using a non-maximum suppression algorithm, one set of hand recognition frame coordinates for the output module to output.
17. An electronic device, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the hand recognition method according to any one of claims 1 to 8.
18. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the hand recognition method of any one of claims 1-8.
CN202010606707.2A 2020-06-29 2020-06-29 Hand recognition method, device, electronic equipment and storage medium Active CN111860209B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010606707.2A CN111860209B (en) 2020-06-29 2020-06-29 Hand recognition method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010606707.2A CN111860209B (en) 2020-06-29 2020-06-29 Hand recognition method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111860209A true CN111860209A (en) 2020-10-30
CN111860209B CN111860209B (en) 2024-04-26

Family

ID=72989371

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010606707.2A Active CN111860209B (en) 2020-06-29 2020-06-29 Hand recognition method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111860209B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104793620A (en) * 2015-04-17 2015-07-22 中国矿业大学 Obstacle avoidance robot based on visual feature binding and reinforcement learning theory
CN108009558A (en) * 2016-10-31 2018-05-08 北京君正集成电路股份有限公司 Object detection method and device based on multi-model
CN108399367A (en) * 2018-01-31 2018-08-14 深圳市阿西莫夫科技有限公司 Hand motion recognition method, apparatus, computer equipment and readable storage medium storing program for executing
CN108596092A (en) * 2018-04-24 2018-09-28 亮风台(上海)信息科技有限公司 Gesture identification method, device, equipment and storage medium
CN109427060A (en) * 2018-10-30 2019-03-05 腾讯科技(深圳)有限公司 A kind of method, apparatus, terminal device and the medical system of image identification
CN109993108A (en) * 2019-03-29 2019-07-09 济南大学 Gesture error correction method, system and device under a kind of augmented reality environment
US20190231432A1 (en) * 2016-04-27 2019-08-01 Arthrology Consulting, Llc Methods for augmenting a surgical field with virtual guidance and tracking and adapting to deviation from a surgical plan
CN111178170A (en) * 2019-12-12 2020-05-19 青岛小鸟看看科技有限公司 Gesture recognition method and electronic equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104793620A (en) * 2015-04-17 2015-07-22 中国矿业大学 Obstacle avoidance robot based on visual feature binding and reinforcement learning theory
US20190231432A1 (en) * 2016-04-27 2019-08-01 Arthrology Consulting, Llc Methods for augmenting a surgical field with virtual guidance and tracking and adapting to deviation from a surgical plan
CN108009558A (en) * 2016-10-31 2018-05-08 北京君正集成电路股份有限公司 Object detection method and device based on multi-model
CN108399367A (en) * 2018-01-31 2018-08-14 深圳市阿西莫夫科技有限公司 Hand motion recognition method, apparatus, computer equipment and readable storage medium storing program for executing
CN108596092A (en) * 2018-04-24 2018-09-28 亮风台(上海)信息科技有限公司 Gesture identification method, device, equipment and storage medium
CN109427060A (en) * 2018-10-30 2019-03-05 腾讯科技(深圳)有限公司 A kind of method, apparatus, terminal device and the medical system of image identification
CN109993108A (en) * 2019-03-29 2019-07-09 济南大学 Gesture error correction method, system and device under a kind of augmented reality environment
CN111178170A (en) * 2019-12-12 2020-05-19 青岛小鸟看看科技有限公司 Gesture recognition method and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Vasilij Kiselev, Maxim Khlamov, "Hand Gesture Recognition with Multiple Leap Motion Devices", 2019 24th Conference of Open Innovation Association (FRUCT) *

Also Published As

Publication number Publication date
CN111860209B (en) 2024-04-26

Similar Documents

Publication Publication Date Title
CN109584276B (en) Key point detection method, device, equipment and readable medium
CN109583391B (en) Key point detection method, device, equipment and readable medium
WO2020062493A1 (en) Image processing method and apparatus
CN111783626B (en) Image recognition method, device, electronic equipment and storage medium
WO2020220809A1 (en) Action recognition method and device for target object, and electronic apparatus
US8965051B2 (en) Method and apparatus for providing hand detection
CN110660102B (en) Speaker recognition method, device and system based on artificial intelligence
CN110427915B (en) Method and apparatus for outputting information
CN110059623B (en) Method and apparatus for generating information
CN111368668B (en) Three-dimensional hand recognition method and device, electronic equipment and storage medium
CN112907628A (en) Video target tracking method and device, storage medium and electronic equipment
CN112085733B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111783632B (en) Face detection method and device for video stream, electronic equipment and storage medium
CN111601129B (en) Control method, control device, terminal and storage medium
CN110765304A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN113642493B (en) Gesture recognition method, device, equipment and medium
CN111860209B (en) Hand recognition method, device, electronic equipment and storage medium
CN111340813B (en) Image instance segmentation method and device, electronic equipment and storage medium
CN110263743B (en) Method and device for recognizing images
CN110807728B (en) Object display method and device, electronic equipment and computer-readable storage medium
CN113963000A (en) Image segmentation method, device, electronic equipment and program product
CN112115740A (en) Method and apparatus for processing image
WO2022194157A1 (en) Target tracking method and apparatus, device and medium
WO2022194061A1 (en) Target tracking method, apparatus and device, and medium
CN117148957A (en) Interface display method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant