CN113158774A - Hand segmentation method, device, storage medium and equipment


Info

Publication number
CN113158774A
Authority
CN
China
Prior art keywords
hand
numerical value
mask
output result
image
Prior art date
Legal status: Granted
Application number
CN202110245345.3A
Other languages
Chinese (zh)
Other versions
CN113158774B (en)
Inventor
古迎冬
李骊
Current Assignee
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd
Priority to CN202110245345.3A
Publication of CN113158774A
Application granted
Publication of CN113158774B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/113Recognition of static hand signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/117Biometrics derived from hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a hand segmentation method, device, storage medium and equipment. An image input by a user is acquired and input into a segmentation network to obtain the network's output result, which includes a left-hand mask, a right-hand mask, a first numerical value (the probability that the left hand was successfully recognized) and a second numerical value (the probability that the right hand was successfully recognized). Whether both values are greater than a preset threshold is then judged. If they are, the left-hand mask and the right-hand mask are sent to the user; otherwise a preset step is repeatedly executed to iteratively process the output result until both values indicated by the iteratively processed output result exceed the threshold, and the left-hand and right-hand masks contained in that output result are sent to the user. Compared with the prior art, the computation time of the method is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements and can be widely adopted by most individuals and teams.

Description

Hand segmentation method, device, storage medium and equipment
Technical Field
The present application relates to the field of image processing, and in particular, to a hand segmentation method, apparatus, storage medium, and device.
Background
How to accurately segment the hands (both left and right) in an image is a problem of great concern to teams and enterprises researching gesture recognition. At present, hand segmentation is generally realized with a deep learning network. However, to guarantee accurate segmentation results, existing deep learning networks usually require long computation times, so hand segmentation is inefficient; they also demand substantial hardware resources, making them impractical for most individuals and teams. The resulting narrow range of application hinders the research and development of gesture recognition.
Disclosure of Invention
The application provides a hand segmentation method, a hand segmentation device, a storage medium and equipment, which are used for improving the efficiency of hand segmentation under the condition of ensuring the accuracy of a hand segmentation result.
In order to achieve the above object, the present application provides the following technical solutions:
a hand segmentation method comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates the probability that the left hand was successfully recognized, and the second numerical value indicates the probability that the right hand was successfully recognized;
judging whether the first numerical value and the second numerical value are both larger than a preset threshold value;
sending the left-hand mask and the right-hand mask to the user when the first numerical value and the second numerical value are both greater than the preset threshold;
when the first numerical value and the second numerical value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
Optionally, the segmentation network includes:
the down-sampling structure is used for down-sampling the image to obtain a down-sampled image;
the characteristic identification structure is used for identifying and obtaining a characteristic image from the down-sampled image; the feature images comprise a left-hand feature image and a right-hand feature image;
the up-sampling structure is used for up-sampling the left-hand feature image to obtain a mask of the left hand and the probability of successful identification of the left hand; and performing up-sampling on the right-hand feature image to obtain the mask of the right hand and the probability of successful identification of the right hand.
Optionally, the down-sampling structure includes:
a standard convolutional layer, a normalization layer, an activation layer, and a downsampling layer.
Optionally, the feature recognition structure includes:
the system comprises a depth convolution layer, a normalization layer, an activation layer and a three-dimensional point cloud operation layer.
Optionally, the upsampling structure includes:
a standard convolutional layer, a normalization layer, an activation layer, and a transposed convolutional layer.
Optionally, the segmentation network further includes:
a skip connection structure configured to assist the up-sampling structure in up-sampling the feature images.
Optionally, the generating a new image based on the output result includes:
multiplying the left-hand mask by the first numerical value to obtain a first product;
multiplying the right-hand mask by the second numerical value to obtain a second product;
and channel-merging the first product and the second product to obtain the new image.
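As a non-authoritative illustration of the claimed new-image generation step, the pure-Python sketch below scales each mask by its confidence value and merges the two products along the channel axis. The nested-list representation and the two-channel layout are assumptions made only for illustration; the patent does not specify an image format.

```python
def generate_new_image(left_mask, right_mask, p_left, p_right):
    """Return a 2-channel image: [left_mask * p_left, right_mask * p_right].

    Mirrors the claimed step: first product, second product, channel merge.
    """
    first_product = [[v * p_left for v in row] for row in left_mask]
    second_product = [[v * p_right for v in row] for row in right_mask]
    # Channel merging: stack the two products as separate channels.
    return [first_product, second_product]

# Tiny 2x2 binary masks, purely illustrative.
left = [[0, 1], [1, 1]]
right = [[1, 0], [0, 1]]
new_img = generate_new_image(left, right, 0.6, 0.9)
```

The merged result can then be fed back into the segmentation network as the next iteration's input.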
A hand segmentation device comprising:
an acquisition unit configured to acquire an image input by a user;
the segmentation unit is used for inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates the probability that the left hand was successfully recognized, and the second numerical value indicates the probability that the right hand was successfully recognized;
the judging unit is used for judging whether the first numerical value and the second numerical value are both larger than a preset threshold value;
a sending unit, configured to send the left-hand mask and the right-hand mask to the user when both the first numerical value and the second numerical value are greater than the preset threshold;
the iteration unit is used for repeatedly executing a preset step when the first numerical value and the second numerical value are not both greater than the preset threshold, iteratively processing the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
A computer-readable storage medium comprising a stored program, wherein the program performs the hand segmentation method.
A hand segmentation apparatus comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;
the memory is used for storing a program, and the processor is used for running the program, wherein the program executes the hand segmentation method during running.
According to the technical scheme, an image input by the user is acquired and input into the pre-constructed segmentation network to obtain the network's output result. The output result includes a left-hand mask, a right-hand mask, a first value and a second value; the first value indicates the probability that the left hand was successfully recognized, and the second value the probability that the right hand was. Whether both values are greater than a preset threshold is then judged. If they are, the left-hand mask and the right-hand mask are sent to the user. Otherwise, a preset step is repeatedly executed to iteratively process the output result until both values indicated by the iteratively processed output result exceed the threshold, and the left-hand and right-hand masks contained in that output result are sent to the user. The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result. By comparing the first value and the second value against the preset threshold, the number of iterations applied to the segmentation network's output can be planned; that is, the hand segmentation effect is quantified by an index (the preset threshold, which plans the number of iterations), avoiding redundant computation.
Therefore, compared with the prior art, the computation time of the method is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements, can be widely adopted by most individuals and teams, and has high applicability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic diagram of a hand segmentation method according to an embodiment of the present disclosure;
fig. 1b is a schematic diagram of a network structure of a segmentation network according to an embodiment of the present application;
fig. 1c is a schematic network structure diagram of another segmentation network provided in the embodiment of the present application;
FIG. 2 is a schematic diagram of another hand segmentation method provided in the embodiments of the present application;
fig. 3 is a schematic structural diagram of a hand segmentation apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As shown in fig. 1a, a schematic diagram of a hand segmentation method provided in an embodiment of the present application includes the following steps:
s101: an image input by a user is acquired.
The image includes, but is not limited to, a color image, an infrared image, a depth image, and the like.
S102: and inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network.
The output result of the segmentation network comprises a first segmentation result, a second segmentation result, a first numerical value and a second numerical value.
The first segmentation result indicates a mask (mask) for the left hand, the second segmentation result indicates a mask for the right hand, the first value indicates a probability of success of the left-hand recognition, and the second value indicates a probability of success of the right-hand recognition.
In an embodiment of the present application, the segmentation network includes a down-sampling structure, a feature recognition structure, an up-sampling structure, and a skip connection structure.
Specifically, according to the network structure shown in fig. 1b, the process by which the segmentation network processes the image includes:
1. the image is input into a down-sampling structure to obtain a first result.
It should be noted that the down-sampling structure down-samples the image to obtain a down-sampled image (i.e., the first result). The down-sampling structure includes a standard convolution layer (commonly referred to as standard Conv), a normalization layer (commonly referred to as a BN layer), an activation layer (commonly referred to as swish), and a down-sampling layer (commonly referred to as pooling). In the embodiment of the present application, the number of standard convolution layers and the size of the convolution kernel can be set by a skilled person according to actual conditions.
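The down-sampling layer is only named, not specified, in the patent. As an illustrative stand-in, the pure-Python sketch below implements 2×2 average pooling; the kernel size and the choice of average (rather than max) pooling are assumptions for the example.

```python
def avg_pool_2x2(image):
    """Downsample a 2-D grid by averaging non-overlapping 2x2 blocks.

    Assumes the height and width of `image` are even.
    """
    h, w = len(image), len(image[0])
    return [
        [
            (image[i][j] + image[i][j + 1] +
             image[i + 1][j] + image[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
pooled = avg_pool_2x2(img)  # 4x4 input becomes a 2x2 output
```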
2. And inputting the first result into the feature recognition structure to obtain a feature image.
It should be noted that the feature recognition structure functions as: and identifying and obtaining a characteristic image from the down-sampled image. The feature images comprise a left-hand feature image and a right-hand feature image, and the feature recognition structure comprises a depth convolution layer (commonly known as DepthConv), a normalization layer, an activation layer and a three-dimensional point cloud operation layer (commonly known as PointConv).
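The DepthConv/PointConv pairing described above resembles a depthwise separable convolution: a per-channel spatial filter followed by a 1×1 cross-channel mix. The 1-D pure-Python sketch below illustrates that interpretation; the reading of PointConv as a pointwise convolution is an assumption, and the kernels are invented for the example.

```python
def depthwise_conv1d(signal, kernels):
    """Apply one kernel per channel ('valid' padding, stride 1)."""
    out = []
    for ch, k in zip(signal, kernels):
        n = len(k)
        out.append([
            sum(ch[i + t] * k[t] for t in range(n))
            for i in range(len(ch) - n + 1)
        ])
    return out

def pointwise_conv1d(signal, weights):
    """1x1 convolution: mix channels at each position."""
    length = len(signal[0])
    return [
        [sum(w * signal[c][i] for c, w in enumerate(row))
         for i in range(length)]
        for row in weights
    ]

x = [[1, 2, 3, 4], [4, 3, 2, 1]]              # 2 channels, length 4
dw = depthwise_conv1d(x, [[1, -1], [1, -1]])  # per-channel finite-difference filter
y = pointwise_conv1d(dw, [[0.5, 0.5]])        # mix down to 1 output channel
```

Splitting the spatial and cross-channel steps this way is what keeps the parameter count, and hence the hardware requirement, low.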
3. The left-hand feature image is input into the up-sampling structure through the skip connection structure to obtain the left-hand mask and the probability that the left hand was successfully recognized.
4. The right-hand feature image is input into the up-sampling structure through the skip connection structure to obtain the right-hand mask and the probability that the right hand was successfully recognized.
It should be noted that the skip connection structure assists the up-sampling structure in sampling the feature images, which improves the training speed of the segmentation network. The skip connection structure includes a channel merging layer (commonly referred to as concat), a standard convolution layer, and a 1 × 1 convolution layer (commonly referred to as 1 × 1 Conv). In the embodiment of the present application, the respective numbers of channel merging layers, standard convolution layers, and 1 × 1 convolution layers can be set by a skilled person according to actual situations.
The effect of the up-sampling structure is: the feature image is up-sampled (specifically, the left-hand feature image is up-sampled to obtain a left-hand mask and a probability of successful left-hand recognition, and the right-hand feature image is up-sampled to obtain a right-hand mask and a probability of successful right-hand recognition). The upsampling structure includes a standard convolutional layer, a normalization layer, an activation layer, and a transposed convolutional layer (commonly referred to as TransConv).
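The transposed convolution layer (TransConv) performs the learned up-sampling. The minimal 1-D pure-Python sketch below shows the scatter-add mechanics of a stride-2 transposed convolution; the kernel values and stride are illustrative assumptions, not parameters from the patent.

```python
def transposed_conv1d(signal, kernel, stride=2):
    """Stride-2 transposed convolution: each input value scatters a
    kernel-scaled copy into the output, and overlaps are summed."""
    out_len = (len(signal) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, v in enumerate(signal):
        for t, k in enumerate(kernel):
            out[i * stride + t] += v * k
    return out

# With a length-2 kernel of ones, this doubles the resolution by
# repeating each value, a common intuition for learned upsampling.
up = transposed_conv1d([1.0, 2.0, 3.0], [1.0, 1.0])  # length 3 -> 6
```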
It is emphasized that the down-sampling structure, feature recognition structure, up-sampling structure, and skip connection structure mentioned above together form the segmentation network, which can also be seen in fig. 1c.
S103: and judging whether the first numerical value and the second numerical value are both larger than a preset threshold value.
If the first value and the second value are both greater than the preset threshold, S104 is executed, otherwise S105 is executed.
S104: the left-handed mask, and the right-handed mask are sent to the user.
It should be noted that, if both the first numerical value and the second numerical value are greater than the preset threshold, it is determined that the effect of hand segmentation meets the preset requirement, that is, the accuracy of the hand segmentation result can be ensured.
S105: the left-handed mask is multiplied by the first value to obtain a first product.
S106: and multiplying the mask of the right hand by the second numerical value to obtain a second product.
Wherein S105 and S106 are executed concurrently.
It should be noted that the specific implementation principle of multiplying the left-hand mask and the right-hand mask by the numerical value is common knowledge familiar to those skilled in the art, and is not described herein again.
S107: and merging the channels of the first product and the second product to obtain a new image, and returning to execute S102.
The specific implementation principle of channel merging is common knowledge familiar to those skilled in the art, and is not described herein again.
The new output result obtained by processing the new image in S102 achieves a better hand segmentation effect than the original output result.
In summary, by comparing the first value and the second value against the preset threshold, the number of iterations applied to the segmentation network's output can be planned; that is, the hand segmentation effect is quantified by an index (the preset threshold, which plans the number of iterations), avoiding redundant computation. Therefore, compared with the prior art, the computation time of the method in this embodiment is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements, can be widely adopted by most individuals and teams, and has high applicability.
It should be noted that S105 and S106 in the above embodiment constitute an optional specific implementation of the hand segmentation method described in this application, and S107 is likewise an optional specific implementation. For this reason, the flow shown in the above embodiment can be summarized as the method shown in fig. 2.
As shown in fig. 2, a schematic diagram of another hand segmentation method provided in the embodiment of the present application includes the following steps:
s201: an image input by a user is acquired.
S202: and inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network.
The output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value. The first value indicates the probability of success of the left-hand recognition and the second value indicates the probability of success of the right-hand recognition.
S203: and judging whether the first numerical value and the second numerical value are both larger than a preset threshold value.
If the first value and the second value are both greater than the predetermined threshold, S204 is performed, otherwise S205 is performed.
S204: the left-hand mask, and the right-hand mask are sent to the user.
S205: and repeatedly executing the preset step, performing iterative processing on the output result until the first numerical value and the second numerical value indicated by the output result after the iterative processing are both greater than the preset threshold value, and sending the left-hand mask and the right-hand mask contained in the output result after the iterative processing to the user.
Wherein, predetermine the step and include: and generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
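The S201-S205 control flow can be sketched end to end as follows, with a stub standing in for the segmentation network. `fake_segmentation_net`, its rising confidences, and the 0.8 threshold are invented for illustration; only the loop logic follows the method described above.

```python
THRESHOLD = 0.8  # illustrative preset threshold, not from the patent

def fake_segmentation_net(image, _step=[0]):
    """Stub network: ignores its input; confidences rise by 0.3 per call,
    mimicking iterative refinement. The mutable default is a deliberate
    trick to keep call-count state for this one-shot demo."""
    _step[0] += 1
    conf = min(1.0, 0.3 * _step[0])
    left_mask, right_mask = [[1]], [[1]]  # trivial 1x1 masks
    return left_mask, right_mask, conf, conf

def segment_hands(image):
    left, right, p_left, p_right = fake_segmentation_net(image)
    iterations = 0
    while not (p_left > THRESHOLD and p_right > THRESHOLD):
        # Preset step: build a new image from the output, re-run the network.
        new_image = [[[v * p_left for v in row] for row in left],
                     [[v * p_right for v in row] for row in right]]
        left, right, p_left, p_right = fake_segmentation_net(new_image)
        iterations += 1
    return left, right, iterations

masks_l, masks_r, n_iter = segment_hands([[0]])
```

With confidences of 0.3, 0.6 and 0.9 on successive calls, the loop exits after two extra iterations, illustrating how the threshold bounds the amount of computation.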
In summary, by comparing the first value and the second value against the preset threshold, the number of iterations applied to the segmentation network's output can be planned; that is, the hand segmentation effect is quantified by an index (the preset threshold, which plans the number of iterations), avoiding redundant computation. Therefore, compared with the prior art, the computation time of the method in this embodiment is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements, can be widely adopted by most individuals and teams, and has high applicability.
Corresponding to the hand segmentation method shown in the embodiment of the present application, the embodiment of the present application further provides a hand segmentation device.
As shown in fig. 3, a schematic structural diagram of a hand segmentation apparatus provided in an embodiment of the present application includes:
an acquiring unit 100 for acquiring an image input by a user.
And a segmentation unit 200, configured to input the image into a pre-constructed segmentation network and obtain an output result of the segmentation network. The output result includes a left-hand mask, a right-hand mask, a first value, and a second value. The first value indicates the probability of success of the left-hand recognition and the second value indicates the probability of success of the right-hand recognition.
The segmentation network includes: a down-sampling structure for down-sampling the image to obtain a down-sampled image; a feature recognition structure for recognizing a feature image from the down-sampled image, the feature image comprising a left-hand feature image and a right-hand feature image; an up-sampling structure for up-sampling the left-hand feature image to obtain the left-hand mask and the probability of successful left-hand recognition, and for up-sampling the right-hand feature image to obtain the right-hand mask and the probability of successful right-hand recognition; and a skip connection structure for assisting the up-sampling structure in up-sampling the feature images.
The downsampling structure includes a standard convolution layer, a normalization layer, an activation layer, and a downsampling layer.
The feature recognition structure comprises a depth convolution layer, a normalization layer, an activation layer and a three-dimensional point cloud operation layer.
The upsampling structure includes a standard convolutional layer, a normalization layer, an active layer, and a transposed convolutional layer.
The determining unit 300 is configured to determine whether the first value and the second value are both greater than a preset threshold.
A sending unit 400, configured to send the left-hand mask and the right-hand mask to the user when both the first value and the second value are greater than the preset threshold.
And an iteration unit 500, configured to repeatedly execute a preset step when the first value and the second value are not both greater than the preset threshold, iteratively processing the output result until the first value and the second value indicated by the iteratively processed output result are both greater than the preset threshold, and to send the left-hand mask and the right-hand mask contained in that output result to the user. The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
The process by which the iteration unit 500 generates a new image based on the output result includes: multiplying the left-hand mask by the first value to obtain a first product; multiplying the right-hand mask by the second value to obtain a second product; and channel-merging the first product and the second product to obtain the new image.
In summary, by comparing the first value and the second value against the preset threshold, the number of iterations applied to the segmentation network's output can be planned; that is, the hand segmentation effect is quantified by an index (the preset threshold, which plans the number of iterations), avoiding redundant computation. Therefore, compared with the prior art, the computation time of the scheme in this embodiment is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements, can be widely adopted by most individuals and teams, and has high applicability.
The present application also provides a computer-readable storage medium comprising a stored program, wherein the program performs the hand segmentation method provided herein above.
The present application further provides a hand segmentation apparatus, including: a processor, a memory, and a bus. The processor is connected with the memory through the bus; the memory is used for storing a program, and the processor is used for running the program, wherein the hand segmentation method provided by the present application is executed when the program runs, the method comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates the probability that the left hand was successfully recognized, and the second numerical value indicates the probability that the right hand was successfully recognized;
judging whether the first numerical value and the second numerical value are both larger than a preset threshold value;
sending the left-hand mask and the right-hand mask to the user when the first numerical value and the second numerical value are both greater than the preset threshold;
when the first numerical value and the second numerical value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
Optionally, the segmentation network includes:
the down-sampling structure, which is used for down-sampling the image to obtain a down-sampled image;
the feature recognition structure, which is used for recognizing a feature image from the down-sampled image; the feature images include a left-hand feature image and a right-hand feature image;
the up-sampling structure, which is used for up-sampling the left-hand feature image to obtain the left-hand mask and the probability that the left hand is successfully recognized, and for up-sampling the right-hand feature image to obtain the right-hand mask and the probability that the right hand is successfully recognized.
Optionally, the down-sampling structure includes:
a standard convolutional layer, a normalization layer, an activation layer, and a downsampling layer.
Optionally, the feature recognition structure includes:
the system comprises a depth convolution layer, a normalization layer, an activation layer and a three-dimensional point cloud operation layer.
Optionally, the upsampling structure includes:
a standard convolutional layer, a normalization layer, an activation layer, and a transposed convolutional layer.
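For illustration only, the layer types named for the three structures can be mimicked with plain NumPy; a real implementation would use a trained deep-learning model, and the transposed convolutional layer is approximated here by nearest-neighbour upsampling:

```python
import numpy as np

def conv3x3(x, k):
    """A 'standard convolutional layer': 3x3 correlation with zero padding."""
    h, w = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def normalize(x, eps=1e-5):
    """Normalization layer: shift to zero mean, scale to unit variance."""
    return (x - x.mean()) / (x.std() + eps)

def relu(x):
    """Activation layer (ReLU)."""
    return np.maximum(x, 0.0)

def downsample2x(x):
    """Downsampling layer: 2x2 max pooling."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(x):
    """Stand-in for the transposed convolutional layer: 2x nearest-neighbour
    upsampling (a learned transposed convolution would interpolate instead)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)
```

Chaining `conv3x3 -> normalize -> relu -> downsample2x` gives one down-sampling stage; `upsample2x` undoes the resolution change on the decoder side.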
Optionally, the segmentation network further includes:
a skip connection structure, used for assisting the up-sampling structure in up-sampling the feature images.
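A skip connection of this kind is commonly realized as a channel-wise concatenation of same-resolution encoder and decoder features; the sketch below shows that idea and is not the application's exact design:

```python
import numpy as np

def skip_connect(up_features, encoder_features):
    """Concatenate same-resolution encoder features onto the upsampled
    decoder features along the channel axis, so the up-sampling structure
    can recover spatial detail lost during down-sampling."""
    return np.concatenate([up_features, encoder_features], axis=-1)
```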
Optionally, generating a new image based on the output result includes:
multiplying the left-hand mask by the first numerical value to obtain a first product;
multiplying the right-hand mask by the second numerical value to obtain a second product;
and performing channel merging on the first product and the second product to obtain the new image.
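In NumPy terms, with hypothetical mask and probability values (the real ones come from the segmentation network), the two products and the channel merge look like:

```python
import numpy as np

# Hypothetical outputs of the segmentation network, for illustration only.
left_mask = np.array([[1.0, 0.0], [1.0, 0.0]])
right_mask = np.array([[0.0, 1.0], [0.0, 1.0]])
p_left, p_right = 0.4, 0.3  # the first and second numerical values

first_product = left_mask * p_left     # left-hand mask weighted by its probability
second_product = right_mask * p_right  # right-hand mask weighted by its probability

# Channel merging: stack the two products along a new channel axis,
# yielding a two-channel image to feed back into the segmentation network.
new_image = np.stack([first_product, second_product], axis=-1)
assert new_image.shape == (2, 2, 2)
```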
The functions described in the methods of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, the part of the embodiments of the present application that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A hand segmentation method, comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates a probability that the left hand is successfully recognized, and the second numerical value indicates a probability that the right hand is successfully recognized;
judging whether the first numerical value and the second numerical value are both greater than a preset threshold;
sending the left-hand mask and the right-hand mask to the user when the first numerical value and the second numerical value are both greater than the preset threshold;
in a case where the first numerical value and the second numerical value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
2. The method of claim 1, wherein the segmentation network comprises:
the down-sampling structure is used for down-sampling the image to obtain a down-sampled image;
the feature recognition structure is used for recognizing a feature image from the down-sampled image; the feature images comprise a left-hand feature image and a right-hand feature image;
the up-sampling structure is used for up-sampling the left-hand feature image to obtain the left-hand mask and the probability that the left hand is successfully recognized, and for up-sampling the right-hand feature image to obtain the right-hand mask and the probability that the right hand is successfully recognized.
3. The method of claim 2, wherein the downsampling structure comprises:
a standard convolutional layer, a normalization layer, an activation layer, and a downsampling layer.
4. The method of claim 2, wherein the feature recognition structure comprises:
a depthwise convolution layer, a normalization layer, an activation layer, and a three-dimensional point cloud operation layer.
5. The method of claim 2, wherein the upsampling structure comprises:
a standard convolutional layer, a normalization layer, an activation layer, and a transposed convolutional layer.
6. The method of claim 2, wherein the segmentation network further comprises:
a skip connection structure for assisting the up-sampling structure in up-sampling the feature images.
7. The method of claim 1, wherein generating a new image based on the output result comprises:
multiplying the left-hand mask by the first numerical value to obtain a first product;
multiplying the right-hand mask by the second numerical value to obtain a second product;
and performing channel merging on the first product and the second product to obtain the new image.
8. A hand segmentation device, comprising:
an acquisition unit configured to acquire an image input by a user;
the segmentation unit is used for inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates a probability that the left hand is successfully recognized, and the second numerical value indicates a probability that the right hand is successfully recognized;
the judging unit is used for judging whether the first numerical value and the second numerical value are both larger than a preset threshold value;
a sending unit, configured to send the left-hand mask and the right-hand mask to the user when both the first numerical value and the second numerical value are greater than the preset threshold;
the iteration unit is used for, in a case where the first numerical value and the second numerical value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
9. A computer-readable storage medium comprising a stored program, wherein the program performs the hand segmentation method of any one of claims 1-7.
10. A hand segmentation apparatus, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;
the memory is used for storing a program, and the processor is used for executing the program, wherein the program executes the hand segmentation method of any one of claims 1 to 7.
CN202110245345.3A 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment Active CN113158774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245345.3A CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment


Publications (2)

Publication Number Publication Date
CN113158774A true CN113158774A (en) 2021-07-23
CN113158774B CN113158774B (en) 2023-12-29

Family

ID=76884338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245345.3A Active CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113158774B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110299774A1 (en) * 2008-04-22 2011-12-08 Corey Mason Manders Method and system for detecting and tracking hands in an image
US20130343601A1 (en) * 2012-06-22 2013-12-26 Charles Jia Gesture based human interfaces
CN108491752A (en) * 2018-01-16 2018-09-04 Beihang University A hand pose estimation method based on a hand segmentation convolutional network
CN109190559A (en) * 2018-08-31 2019-01-11 Shenzhen Institutes of Advanced Technology A gesture recognition method, gesture recognition device and electronic equipment
CN109977834A (en) * 2019-03-19 2019-07-05 Tsinghua University Method and apparatus for segmenting human hands and interacting objects from a depth image
CN111448581A (en) * 2017-10-24 2020-07-24 L'Oréal System and method for image processing using deep neural networks
CN111539288A (en) * 2020-04-16 2020-08-14 Sun Yat-sen University Real-time detection method for gestures of both hands
WO2020199593A1 (en) * 2019-04-04 2020-10-08 Ping An Technology (Shenzhen) Co., Ltd. Image segmentation model training method and apparatus, image segmentation method and apparatus, and device and medium
WO2020215565A1 (en) * 2019-04-26 2020-10-29 Ping An Technology (Shenzhen) Co., Ltd. Hand image segmentation method and apparatus, and computer device
US20200372246A1 (en) * 2019-05-21 2020-11-26 Magic Leap, Inc. Hand pose estimation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BETANCOURT, A. et al.: "Left/right hand segmentation in egocentric videos", Computer Vision and Image Understanding, no. 154, pages 73-81, XP029831518, DOI: 10.1016/j.cviu.2016.09.005 *
TAN, Taizhe; HAN, Yawei; SHAO, Yang: "Gesture recognition method based on RGB-D images", Computer Engineering and Design, no. 02, pages 511-515 *

Also Published As

Publication number Publication date
CN113158774B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN113887701B (en) Method, system and storage medium for generating output for neural network output layer
CN116415637A (en) Implementation of neural networks in fixed-point computing systems
CN104866478B (en) Malicious text detection and identification method and device
CN113435196B (en) Intention recognition method, device, equipment and storage medium
CN111984845B (en) Website wrongly written word recognition method and system
CN113313083A (en) Text detection method and device
CN105224283B (en) A kind of floating number processing method and processing device
US11699435B2 (en) System and method to interpret natural language requests and handle natural language responses in conversation
CN111368066A (en) Method, device and computer readable storage medium for acquiring dialogue abstract
CN111461302A (en) Data processing method, device and storage medium based on convolutional neural network
CN115240203A (en) Service data processing method, device, equipment and storage medium
CN113158774A (en) Hand segmentation method, device, storage medium and equipment
CN109684632B (en) Natural semantic understanding method, device and computing equipment
CN109829048B (en) Electronic device, interview assisting method, and computer-readable storage medium
CN112232361B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111209391A (en) Information identification model establishing method and system and interception method and system
CN109325234B (en) Sentence processing method, sentence processing device and computer readable storage medium
CN112801045B (en) Text region detection method, electronic equipment and computer storage medium
CN114662688A (en) Model training method, data processing method, device, electronic device and medium
CN113344200A (en) Method for training separable convolutional network, road side equipment and cloud control platform
CN109285559B (en) Role transition point detection method and device, storage medium and electronic equipment
CN112906621A (en) Hand detection method, device, storage medium and equipment
CN109165097B (en) Data processing method and data processing device
CN113744278A (en) Text detection method and device
CN111858839B (en) Processing device and processing method for responding to user side request

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant