CN113158774A - Hand segmentation method, device, storage medium and equipment


Info

Publication number
CN113158774A
Authority
CN
China
Prior art keywords
hand
numerical value
mask
output result
image
Prior art date
Legal status: Granted
Application number
CN202110245345.3A
Other languages
Chinese (zh)
Other versions
CN113158774B (en)
Inventor
古迎冬
李骊
Current Assignee
Beijing HJIMI Technology Co Ltd
Original Assignee
Beijing HJIMI Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing HJIMI Technology Co Ltd
Priority to CN202110245345.3A
Publication of CN113158774A
Application granted
Publication of CN113158774B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/113Recognition of static hand signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/107Static hand or arm
    • G06V40/117Biometrics derived from hands

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a hand segmentation method, device, storage medium and equipment. An image input by a user is acquired and input into a segmentation network to obtain the network's output result, which includes a left-hand mask, a right-hand mask, a first numerical value (the probability that the left hand was successfully recognized) and a second numerical value (the probability that the right hand was successfully recognized). Whether both values are greater than a preset threshold is then judged. If they are, the left-hand mask and the right-hand mask are sent to the user; otherwise a preset step is repeatedly executed to iteratively process the output result until both values indicated by the iteratively processed output result exceed the threshold, and the left-hand and right-hand masks contained in that output result are sent to the user. Compared with the prior art, the computation time of the method is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements and can be widely adopted by most individuals and teams.

Description

Hand segmentation method, device, storage medium and equipment
Technical Field
The present application relates to the field of image processing, and in particular, to a hand segmentation method, apparatus, storage medium, and device.
Background
How to accurately segment the hands (both left and right) in an image is a problem of great concern to teams and enterprises researching gesture recognition. At present, hand segmentation is generally realized with a deep learning network. However, to guarantee accurate segmentation results, existing deep learning networks usually require long computation times, so hand segmentation is inefficient; they also demand substantial hardware resources, making them impractical for most individuals and teams. The resulting narrow range of application hinders the research and development of gesture recognition.
Disclosure of Invention
The application provides a hand segmentation method, a hand segmentation device, a storage medium and equipment, which are used for improving the efficiency of hand segmentation under the condition of ensuring the accuracy of a hand segmentation result.
In order to achieve the above object, the present application provides the following technical solutions:
a hand segmentation method comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates the probability that the left hand was successfully recognized, and the second numerical value indicates the probability that the right hand was successfully recognized;
judging whether the first numerical value and the second numerical value are both larger than a preset threshold value;
sending the left-hand mask and the right-hand mask to the user when the first numerical value and the second numerical value are both greater than the preset threshold;
when the first numerical value and the second numerical value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
Optionally, the segmentation network includes:
the down-sampling structure is used for down-sampling the image to obtain a down-sampled image;
the characteristic identification structure is used for identifying and obtaining a characteristic image from the down-sampled image; the feature images comprise a left-hand feature image and a right-hand feature image;
the up-sampling structure is used for up-sampling the left-hand feature image to obtain a mask of the left hand and the probability of successful identification of the left hand; and performing up-sampling on the right-hand feature image to obtain the mask of the right hand and the probability of successful identification of the right hand.
Optionally, the down-sampling structure includes:
a standard convolutional layer, a normalization layer, an activation layer, and a downsampling layer.
Optionally, the feature recognition structure includes:
the system comprises a depth convolution layer, a normalization layer, an activation layer and a three-dimensional point cloud operation layer.
Optionally, the upsampling structure includes:
a standard convolutional layer, a normalization layer, an activation layer, and a transposed convolutional layer.
Optionally, the segmentation network further includes:
a skip connection structure configured to assist the up-sampling structure in up-sampling the feature images.
Optionally, the generating a new image based on the output result includes:
multiplying the left-hand mask by the first numerical value to obtain a first product;
multiplying the right-hand mask by the second numerical value to obtain a second product;
and channel-merging the first product and the second product to obtain the new image.
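As a non-authoritative illustration of the claimed new-image generation step, the pure-Python sketch below scales each mask by its confidence value and merges the two products along the channel axis. The nested-list representation and the two-channel layout are assumptions made only for illustration; the patent does not specify an image format.

```python
def generate_new_image(left_mask, right_mask, p_left, p_right):
    """Return a 2-channel image: [left_mask * p_left, right_mask * p_right].

    Mirrors the claimed step: first product, second product, channel merge.
    """
    first_product = [[v * p_left for v in row] for row in left_mask]
    second_product = [[v * p_right for v in row] for row in right_mask]
    # Channel merging: stack the two products as separate channels.
    return [first_product, second_product]

# Tiny 2x2 binary masks, purely illustrative.
left = [[0, 1], [1, 1]]
right = [[1, 0], [0, 1]]
new_img = generate_new_image(left, right, 0.6, 0.9)
```

The merged result can then be fed back into the segmentation network as the next iteration's input.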
A hand segmentation device comprising:
an acquisition unit configured to acquire an image input by a user;
the segmentation unit is used for inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates the probability that the left hand was successfully recognized, and the second numerical value indicates the probability that the right hand was successfully recognized;
the judging unit is used for judging whether the first numerical value and the second numerical value are both larger than a preset threshold value;
a sending unit, configured to send the left-hand mask and the right-hand mask to the user when both the first numerical value and the second numerical value are greater than the preset threshold;
the iteration unit is used for repeatedly executing a preset step when the first numerical value and the second numerical value are not both greater than the preset threshold, iteratively processing the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
A computer-readable storage medium comprising a stored program, wherein the program performs the hand segmentation method.
A hand segmentation apparatus comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;
the memory is used for storing a program, and the processor is used for running the program, wherein the program executes the hand segmentation method during running.
According to the technical scheme, an image input by the user is acquired and input into the pre-constructed segmentation network to obtain the network's output result. The output result includes a left-hand mask, a right-hand mask, a first value and a second value; the first value indicates the probability that the left hand was successfully recognized, and the second value the probability that the right hand was. Whether both values are greater than a preset threshold is then judged. If they are, the left-hand mask and the right-hand mask are sent to the user. Otherwise, a preset step is repeatedly executed to iteratively process the output result until both values indicated by the iteratively processed output result exceed the threshold, and the left-hand and right-hand masks contained in that output result are sent to the user. The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result. By comparing the first value and the second value against the preset threshold, the number of iterations applied to the segmentation network's output can be planned; that is, the hand segmentation effect is quantified by an index (the preset threshold, which plans the number of iterations), avoiding redundant computation.
Therefore, compared with the prior art, the computation time of the method is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements, can be widely adopted by most individuals and teams, and has high applicability.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application or in the prior art, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a schematic diagram of a hand segmentation method according to an embodiment of the present disclosure;
fig. 1b is a schematic diagram of a network structure of a segmentation network according to an embodiment of the present application;
fig. 1c is a schematic network structure diagram of another segmentation network provided in the embodiment of the present application;
FIG. 2 is a schematic diagram of another hand segmentation method provided in the embodiments of the present application;
fig. 3 is a schematic structural diagram of a hand segmentation apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As shown in fig. 1a, a schematic diagram of a hand segmentation method provided in an embodiment of the present application includes the following steps:
s101: an image input by a user is acquired.
The image includes, but is not limited to, a color image, an infrared image, a depth image, and the like.
S102: and inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network.
The output result of the segmentation network comprises a first segmentation result, a second segmentation result, a first numerical value and a second numerical value.
The first segmentation result indicates a mask (mask) for the left hand, the second segmentation result indicates a mask for the right hand, the first value indicates a probability of success of the left-hand recognition, and the second value indicates a probability of success of the right-hand recognition.
In an embodiment of the present application, the segmentation network includes a down-sampling structure, a feature recognition structure, an up-sampling structure, and a skip connection structure.
Specifically, according to the network structure shown in fig. 1b, the process by which the segmentation network processes the image includes:
1. the image is input into a down-sampling structure to obtain a first result.
It should be noted that the down-sampling structure down-samples the image to obtain a down-sampled image (i.e., the first result). The down-sampling structure includes a standard convolution layer (commonly referred to as standard Conv), a normalization layer (commonly referred to as a BN layer), an activation layer (commonly referred to as swish), and a down-sampling layer (commonly referred to as pooling). In the embodiment of the present application, the number of standard convolution layers and the size of the convolution kernel can be set by a skilled person according to actual conditions.
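The down-sampling layer is only named, not specified, in the patent. As an illustrative stand-in, the pure-Python sketch below implements 2×2 average pooling; the kernel size and the choice of average (rather than max) pooling are assumptions for the example.

```python
def avg_pool_2x2(image):
    """Downsample a 2-D grid by averaging non-overlapping 2x2 blocks.

    Assumes the height and width of `image` are even.
    """
    h, w = len(image), len(image[0])
    return [
        [
            (image[i][j] + image[i][j + 1] +
             image[i + 1][j] + image[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
pooled = avg_pool_2x2(img)  # 4x4 input becomes a 2x2 output
```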
2. And inputting the first result into the feature recognition structure to obtain a feature image.
It should be noted that the feature recognition structure functions as: and identifying and obtaining a characteristic image from the down-sampled image. The feature images comprise a left-hand feature image and a right-hand feature image, and the feature recognition structure comprises a depth convolution layer (commonly known as DepthConv), a normalization layer, an activation layer and a three-dimensional point cloud operation layer (commonly known as PointConv).
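The DepthConv/PointConv pairing described above resembles a depthwise separable convolution: a per-channel spatial filter followed by a 1×1 cross-channel mix. The 1-D pure-Python sketch below illustrates that interpretation; the reading of PointConv as a pointwise convolution is an assumption, and the kernels are invented for the example.

```python
def depthwise_conv1d(signal, kernels):
    """Apply one kernel per channel ('valid' padding, stride 1)."""
    out = []
    for ch, k in zip(signal, kernels):
        n = len(k)
        out.append([
            sum(ch[i + t] * k[t] for t in range(n))
            for i in range(len(ch) - n + 1)
        ])
    return out

def pointwise_conv1d(signal, weights):
    """1x1 convolution: mix channels at each position."""
    length = len(signal[0])
    return [
        [sum(w * signal[c][i] for c, w in enumerate(row))
         for i in range(length)]
        for row in weights
    ]

x = [[1, 2, 3, 4], [4, 3, 2, 1]]              # 2 channels, length 4
dw = depthwise_conv1d(x, [[1, -1], [1, -1]])  # per-channel finite-difference filter
y = pointwise_conv1d(dw, [[0.5, 0.5]])        # mix down to 1 output channel
```

Splitting the spatial and cross-channel steps this way is what keeps the parameter count, and hence the hardware requirement, low.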
3. The left-hand feature image is input into the up-sampling structure through the skip connection structure to obtain the left-hand mask and the probability that the left hand was successfully recognized.
4. The right-hand feature image is input into the up-sampling structure through the skip connection structure to obtain the right-hand mask and the probability that the right hand was successfully recognized.
It should be noted that the skip connection structure assists the up-sampling structure in sampling the feature images, which improves the training speed of the segmentation network. The skip connection structure includes a channel merging layer (commonly referred to as concat), a standard convolution layer, and a 1 × 1 convolution layer (commonly referred to as 1 × 1 Conv). In the embodiment of the present application, the respective numbers of channel merging layers, standard convolution layers, and 1 × 1 convolution layers can be set by a skilled person according to actual situations.
The effect of the up-sampling structure is: the feature image is up-sampled (specifically, the left-hand feature image is up-sampled to obtain a left-hand mask and a probability of successful left-hand recognition, and the right-hand feature image is up-sampled to obtain a right-hand mask and a probability of successful right-hand recognition). The upsampling structure includes a standard convolutional layer, a normalization layer, an activation layer, and a transposed convolutional layer (commonly referred to as TransConv).
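The transposed convolution layer (TransConv) performs the learned up-sampling. The minimal 1-D pure-Python sketch below shows the scatter-add mechanics of a stride-2 transposed convolution; the kernel values and stride are illustrative assumptions, not parameters from the patent.

```python
def transposed_conv1d(signal, kernel, stride=2):
    """Stride-2 transposed convolution: each input value scatters a
    kernel-scaled copy into the output, and overlaps are summed."""
    out_len = (len(signal) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, v in enumerate(signal):
        for t, k in enumerate(kernel):
            out[i * stride + t] += v * k
    return out

# With a length-2 kernel of ones, this doubles the resolution by
# repeating each value, a common intuition for learned upsampling.
up = transposed_conv1d([1.0, 2.0, 3.0], [1.0, 1.0])  # length 3 -> 6
```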
It is emphasized that the down-sampling structure, feature recognition structure, up-sampling structure, and skip connection structure mentioned above together form the segmentation network, which can also be seen in fig. 1c.
S103: and judging whether the first numerical value and the second numerical value are both larger than a preset threshold value.
If the first value and the second value are both greater than the preset threshold, S104 is executed, otherwise S105 is executed.
S104: the left-handed mask, and the right-handed mask are sent to the user.
It should be noted that, if both the first numerical value and the second numerical value are greater than the preset threshold, it is determined that the effect of hand segmentation meets the preset requirement, that is, the accuracy of the hand segmentation result can be ensured.
S105: the left-handed mask is multiplied by the first value to obtain a first product.
S106: and multiplying the mask of the right hand by the second numerical value to obtain a second product.
Wherein S105 and S106 are executed concurrently.
It should be noted that the specific implementation principle of multiplying the left-hand mask and the right-hand mask by the numerical value is common knowledge familiar to those skilled in the art, and is not described herein again.
S107: and merging the channels of the first product and the second product to obtain a new image, and returning to execute S102.
The specific implementation principle of channel merging is common knowledge familiar to those skilled in the art, and is not described herein again.
The new output result obtained by processing the new image in S102 achieves a better hand segmentation effect than the original output result.
In summary, by comparing the first value and the second value against the preset threshold, the number of iterations applied to the segmentation network's output can be planned; that is, the hand segmentation effect is quantified by an index (the preset threshold, which plans the number of iterations), avoiding redundant computation. Therefore, compared with the prior art, the computation time of the method in this embodiment is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements, can be widely adopted by most individuals and teams, and has high applicability.
It should be noted that S105 and S106 in the above embodiment constitute an optional specific implementation of the hand segmentation method described in this application, and S107 is likewise an optional specific implementation. For this reason, the flow shown in the above embodiment can be summarized as the method shown in fig. 2.
As shown in fig. 2, a schematic diagram of another hand segmentation method provided in the embodiment of the present application includes the following steps:
s201: an image input by a user is acquired.
S202: and inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network.
The output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value. The first value indicates the probability of success of the left-hand recognition and the second value indicates the probability of success of the right-hand recognition.
S203: and judging whether the first numerical value and the second numerical value are both larger than a preset threshold value.
If the first value and the second value are both greater than the predetermined threshold, S204 is performed, otherwise S205 is performed.
S204: the left-hand mask, and the right-hand mask are sent to the user.
S205: and repeatedly executing the preset step, performing iterative processing on the output result until the first numerical value and the second numerical value indicated by the output result after the iterative processing are both greater than the preset threshold value, and sending the left-hand mask and the right-hand mask contained in the output result after the iterative processing to the user.
Wherein, predetermine the step and include: and generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
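The S201-S205 control flow can be sketched end to end as follows, with a stub standing in for the segmentation network. `fake_segmentation_net`, its rising confidences, and the 0.8 threshold are invented for illustration; only the loop logic follows the method described above.

```python
THRESHOLD = 0.8  # illustrative preset threshold, not from the patent

def fake_segmentation_net(image, _step=[0]):
    """Stub network: ignores its input; confidences rise by 0.3 per call,
    mimicking iterative refinement. The mutable default is a deliberate
    trick to keep call-count state for this one-shot demo."""
    _step[0] += 1
    conf = min(1.0, 0.3 * _step[0])
    left_mask, right_mask = [[1]], [[1]]  # trivial 1x1 masks
    return left_mask, right_mask, conf, conf

def segment_hands(image):
    left, right, p_left, p_right = fake_segmentation_net(image)
    iterations = 0
    while not (p_left > THRESHOLD and p_right > THRESHOLD):
        # Preset step: build a new image from the output, re-run the network.
        new_image = [[[v * p_left for v in row] for row in left],
                     [[v * p_right for v in row] for row in right]]
        left, right, p_left, p_right = fake_segmentation_net(new_image)
        iterations += 1
    return left, right, iterations

masks_l, masks_r, n_iter = segment_hands([[0]])
```

With confidences of 0.3, 0.6 and 0.9 on successive calls, the loop exits after two extra iterations, illustrating how the threshold bounds the amount of computation.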
In summary, by comparing the first value and the second value against the preset threshold, the number of iterations applied to the segmentation network's output can be planned; that is, the hand segmentation effect is quantified by an index (the preset threshold, which plans the number of iterations), avoiding redundant computation. Therefore, compared with the prior art, the computation time of the method in this embodiment is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements, can be widely adopted by most individuals and teams, and has high applicability.
Corresponding to the hand segmentation method shown in the embodiment of the present application, the embodiment of the present application further provides a hand segmentation device.
As shown in fig. 3, a schematic structural diagram of a hand segmentation apparatus provided in an embodiment of the present application includes:
an acquiring unit 100 for acquiring an image input by a user.
And a segmentation unit 200, configured to input the image into a pre-constructed segmentation network and obtain an output result of the segmentation network. The output result includes a left-hand mask, a right-hand mask, a first value, and a second value. The first value indicates the probability of success of the left-hand recognition and the second value indicates the probability of success of the right-hand recognition.
The segmentation network includes: a down-sampling structure for down-sampling the image to obtain a down-sampled image; a feature recognition structure for recognizing a feature image from the down-sampled image, the feature image comprising a left-hand feature image and a right-hand feature image; an up-sampling structure for up-sampling the left-hand feature image to obtain the left-hand mask and the probability of successful left-hand recognition, and for up-sampling the right-hand feature image to obtain the right-hand mask and the probability of successful right-hand recognition; and a skip connection structure for assisting the up-sampling structure in up-sampling the feature images.
The downsampling structure includes a standard convolution layer, a normalization layer, an activation layer, and a downsampling layer.
The feature recognition structure comprises a depth convolution layer, a normalization layer, an activation layer and a three-dimensional point cloud operation layer.
The upsampling structure includes a standard convolutional layer, a normalization layer, an active layer, and a transposed convolutional layer.
The determining unit 300 is configured to determine whether the first value and the second value are both greater than a preset threshold.
A sending unit 400, configured to send the left-hand mask and the right-hand mask to the user when both the first value and the second value are greater than the preset threshold.
And an iteration unit 500, configured to repeatedly execute a preset step when the first value and the second value are not both greater than the preset threshold, iteratively processing the output result until the first value and the second value indicated by the iteratively processed output result are both greater than the preset threshold, and to send the left-hand mask and the right-hand mask contained in that output result to the user. The preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
The process by which the iteration unit 500 generates a new image based on the output result includes: multiplying the left-hand mask by the first value to obtain a first product; multiplying the right-hand mask by the second value to obtain a second product; and channel-merging the first product and the second product to obtain the new image.
In summary, by comparing the first value and the second value against the preset threshold, the number of iterations applied to the segmentation network's output can be planned; that is, the hand segmentation effect is quantified by an index (the preset threshold, which plans the number of iterations), avoiding redundant computation. Therefore, compared with the prior art, the computation time of the scheme in this embodiment is significantly reduced, improving the efficiency of hand segmentation. In addition, owing to its network structure, the segmentation network has low hardware-resource requirements, can be widely adopted by most individuals and teams, and has high applicability.
The present application also provides a computer-readable storage medium comprising a stored program, wherein the program performs the hand segmentation method provided herein above.
The present application further provides a hand segmentation apparatus, including: a processor, a memory, and a bus. The processor is connected with the memory through the bus; the memory is used for storing a program, and the processor is used for running the program, wherein the hand segmentation method provided by the present application is executed when the program runs, the method comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates the probability that the left hand was successfully recognized, and the second numerical value indicates the probability that the right hand was successfully recognized;
judging whether the first numerical value and the second numerical value are both larger than a preset threshold value;
sending the left-hand mask and the right-hand mask to the user when the first numerical value and the second numerical value are both greater than the preset threshold;
when the first numerical value and the second numerical value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
Optionally, the segmentation network includes:
the down-sampling structure, which is used for down-sampling the image to obtain a down-sampled image;
the feature recognition structure, which is used for recognizing a feature image from the down-sampled image; the feature images include a left-hand feature image and a right-hand feature image;
the up-sampling structure, which is used for up-sampling the left-hand feature image to obtain the left-hand mask and the probability that the left hand is successfully recognized, and for up-sampling the right-hand feature image to obtain the right-hand mask and the probability that the right hand is successfully recognized.
Optionally, the down-sampling structure includes:
a standard convolutional layer, a normalization layer, an activation layer, and a downsampling layer.
Optionally, the feature recognition structure includes:
the system comprises a depth convolution layer, a normalization layer, an activation layer and a three-dimensional point cloud operation layer.
Optionally, the upsampling structure includes:
a standard convolutional layer, a normalization layer, an activation layer, and a transposed convolutional layer.
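For illustration only, the layer types named for the three structures can be mimicked with plain NumPy; a real implementation would use a trained deep-learning model, and the transposed convolutional layer is approximated here by nearest-neighbour upsampling:

```python
import numpy as np

def conv3x3(x, k):
    """A 'standard convolutional layer': 3x3 correlation with zero padding."""
    h, w = x.shape
    xp = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * k)
    return out

def normalize(x, eps=1e-5):
    """Normalization layer: shift to zero mean, scale to unit variance."""
    return (x - x.mean()) / (x.std() + eps)

def relu(x):
    """Activation layer (ReLU)."""
    return np.maximum(x, 0.0)

def downsample2x(x):
    """Downsampling layer: 2x2 max pooling."""
    h, w = x.shape
    return x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample2x(x):
    """Stand-in for the transposed convolutional layer: 2x nearest-neighbour
    upsampling (a learned transposed convolution would interpolate instead)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)
```

Chaining `conv3x3 -> normalize -> relu -> downsample2x` gives one down-sampling stage; `upsample2x` undoes the resolution change on the decoder side.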
Optionally, the segmentation network further includes:
a skip connection structure, used for assisting the up-sampling structure in up-sampling the feature images.
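A skip connection of this kind is commonly realized as a channel-wise concatenation of same-resolution encoder and decoder features; the sketch below shows that idea and is not the application's exact design:

```python
import numpy as np

def skip_connect(up_features, encoder_features):
    """Concatenate same-resolution encoder features onto the upsampled
    decoder features along the channel axis, so the up-sampling structure
    can recover spatial detail lost during down-sampling."""
    return np.concatenate([up_features, encoder_features], axis=-1)
```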
Optionally, generating a new image based on the output result includes:
multiplying the left-hand mask by the first numerical value to obtain a first product;
multiplying the right-hand mask by the second numerical value to obtain a second product;
and performing channel merging on the first product and the second product to obtain the new image.
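In NumPy terms, with hypothetical mask and probability values (the real ones come from the segmentation network), the two products and the channel merge look like:

```python
import numpy as np

# Hypothetical outputs of the segmentation network, for illustration only.
left_mask = np.array([[1.0, 0.0], [1.0, 0.0]])
right_mask = np.array([[0.0, 1.0], [0.0, 1.0]])
p_left, p_right = 0.4, 0.3  # the first and second numerical values

first_product = left_mask * p_left     # left-hand mask weighted by its probability
second_product = right_mask * p_right  # right-hand mask weighted by its probability

# Channel merging: stack the two products along a new channel axis,
# yielding a two-channel image to feed back into the segmentation network.
new_image = np.stack([first_product, second_product], axis=-1)
assert new_image.shape == (2, 2, 2)
```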
The functions described in the methods of the embodiments of the present application, if implemented in the form of software functional units and sold or used as independent products, may be stored in a storage medium readable by a computing device. Based on such understanding, the part of the embodiments of the present application that contributes to the prior art, or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computing device (which may be a personal computer, a server, a mobile computing device or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts among the embodiments, reference may be made to one another.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A hand segmentation method, comprising:
acquiring an image input by a user;
inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates a probability that the left hand is successfully recognized, and the second numerical value indicates a probability that the right hand is successfully recognized;
judging whether the first numerical value and the second numerical value are both greater than a preset threshold;
sending the left-hand mask and the right-hand mask to the user when the first numerical value and the second numerical value are both greater than the preset threshold;
in a case where the first numerical value and the second numerical value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
2. The method of claim 1, wherein the segmentation network comprises:
the down-sampling structure is used for down-sampling the image to obtain a down-sampled image;
the feature recognition structure is used for recognizing a feature image from the down-sampled image; the feature images comprise a left-hand feature image and a right-hand feature image;
the up-sampling structure is used for up-sampling the left-hand feature image to obtain the left-hand mask and the probability that the left hand is successfully recognized, and for up-sampling the right-hand feature image to obtain the right-hand mask and the probability that the right hand is successfully recognized.
3. The method of claim 2, wherein the downsampling structure comprises:
a standard convolutional layer, a normalization layer, an activation layer, and a downsampling layer.
4. The method of claim 2, wherein the feature recognition structure comprises:
a depthwise convolution layer, a normalization layer, an activation layer, and a three-dimensional point cloud operation layer.
5. The method of claim 2, wherein the upsampling structure comprises:
a standard convolutional layer, a normalization layer, an activation layer, and a transposed convolutional layer.
6. The method of claim 2, wherein the segmentation network further comprises:
a skip connection structure for assisting the up-sampling structure in up-sampling the feature images.
7. The method of claim 1, wherein generating a new image based on the output result comprises:
multiplying the left-hand mask by the first numerical value to obtain a first product;
multiplying the right-hand mask by the second numerical value to obtain a second product;
and performing channel merging on the first product and the second product to obtain the new image.
8. A hand segmentation device, comprising:
an acquisition unit configured to acquire an image input by a user;
the segmentation unit is used for inputting the image into a pre-constructed segmentation network to obtain an output result of the segmentation network; the output result comprises a left-hand mask, a right-hand mask, a first numerical value and a second numerical value; the first numerical value indicates a probability that the left hand is successfully recognized, and the second numerical value indicates a probability that the right hand is successfully recognized;
the judging unit is used for judging whether the first numerical value and the second numerical value are both larger than a preset threshold value;
a sending unit, configured to send the left-hand mask and the right-hand mask to the user when both the first numerical value and the second numerical value are greater than the preset threshold;
the iteration unit is used for, in a case where the first numerical value and the second numerical value are not both greater than the preset threshold, repeatedly executing a preset step to iteratively process the output result until the first numerical value and the second numerical value indicated by the iteratively processed output result are both greater than the preset threshold, and sending the left-hand mask and the right-hand mask contained in the iteratively processed output result to the user; wherein the preset step comprises: generating a new image based on the output result, and inputting the new image into the segmentation network to obtain a new output result.
9. A computer-readable storage medium comprising a stored program, wherein the program performs the hand segmentation method of any one of claims 1-7.
10. A hand segmentation apparatus, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;
the memory is used for storing a program, and the processor is used for executing the program, wherein the program executes the hand segmentation method of any one of claims 1 to 7.
CN202110245345.3A 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment Active CN113158774B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110245345.3A CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment


Publications (2)

Publication Number Publication Date
CN113158774A true CN113158774A (en) 2021-07-23
CN113158774B CN113158774B (en) 2023-12-29

Family

ID=76884338

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110245345.3A Active CN113158774B (en) 2021-03-05 2021-03-05 Hand segmentation method, device, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN113158774B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110299774A1 (en) * 2008-04-22 2011-12-08 Corey Mason Manders Method and system for detecting and tracking hands in an image
US20130343601A1 (en) * 2012-06-22 2013-12-26 Charles Jia Gesture based human interfaces
CN108491752A (en) * 2018-01-16 2018-09-04 Beihang University A hand pose estimation method based on a hand segmentation convolutional network
CN109190559A (en) * 2018-08-31 2019-01-11 Shenzhen Institutes of Advanced Technology A gesture recognition method, gesture recognition device and electronic equipment
CN109977834A (en) * 2019-03-19 2019-07-05 Tsinghua University Method and apparatus for segmenting human hands and interacting objects from a depth image
CN111448581A (en) * 2017-10-24 2020-07-24 L'Oréal System and method for image processing using deep neural networks
CN111539288A (en) * 2020-04-16 2020-08-14 Sun Yat-sen University Real-time detection method for gestures of both hands
WO2020199593A1 (en) * 2019-04-04 2020-10-08 Ping An Technology (Shenzhen) Co., Ltd. Image segmentation model training method and apparatus, image segmentation method and apparatus, and device and medium
WO2020215565A1 (en) * 2019-04-26 2020-10-29 Ping An Technology (Shenzhen) Co., Ltd. Hand image segmentation method and apparatus, and computer device
US20200372246A1 (en) * 2019-05-21 2020-11-26 Magic Leap, Inc. Hand pose estimation


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BETANCOURT, A. et al.: "Left/right hand segmentation in egocentric videos", Computer Vision and Image Understanding, no. 154, pages 73-81, XP029831518, DOI: 10.1016/j.cviu.2016.09.005 *
TAN, Taizhe; HAN, Yawei; SHAO, Yang: "Gesture recognition method based on RGB-D images", Computer Engineering and Design, no. 02, pages 511-515 *

Also Published As

Publication number Publication date
CN113158774B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN113887701B (en) Method, system and storage medium for generating output for neural network output layer
CN116415637A (en) Implementation of neural networks in fixed-point computing systems
CN104866478B (en) Malicious text detection and identification method and device
CN113435196B (en) Intention recognition method, device, equipment and storage medium
CN111984845B (en) Website wrongly written word recognition method and system
CN113313083A (en) Text detection method and device
CN105224283B (en) A kind of floating number processing method and processing device
US11699435B2 (en) System and method to interpret natural language requests and handle natural language responses in conversation
CN111368066A (en) Method, device and computer readable storage medium for acquiring dialogue abstract
CN111461302A (en) Data processing method, device and storage medium based on convolutional neural network
CN115240203A (en) Service data processing method, device, equipment and storage medium
CN113158774A (en) Hand segmentation method, device, storage medium and equipment
CN109684632B (en) Natural semantic understanding method, device and computing equipment
CN109829048B (en) Electronic device, interview assisting method, and computer-readable storage medium
CN112232361B (en) Image processing method and device, electronic equipment and computer readable storage medium
CN111209391A (en) Information identification model establishing method and system and interception method and system
CN109325234B (en) Sentence processing method, sentence processing device and computer readable storage medium
CN112801045B (en) Text region detection method, electronic equipment and computer storage medium
CN114662688A (en) Model training method, data processing method, device, electronic device and medium
CN113344200A (en) Method for training separable convolutional network, road side equipment and cloud control platform
CN109285559B (en) Role transition point detection method and device, storage medium and electronic equipment
CN112906621A (en) Hand detection method, device, storage medium and equipment
CN109165097B (en) Data processing method and data processing device
CN113744278A (en) Text detection method and device
CN111858839B (en) Processing device and processing method for responding to user side request

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant