WO2020228279A1 - Method and device for extracting the palm region of an image - Google Patents

Method and device for extracting the palm region of an image (图像手掌区域提取方法及装置)

Info

Publication number
WO2020228279A1
WO2020228279A1 (PCT/CN2019/117713)
Authority
WO
WIPO (PCT)
Prior art keywords
palm
human hand
semantic segmentation
segmentation model
image
Prior art date
Application number
PCT/CN2019/117713
Other languages
English (en)
French (fr)
Inventor
惠慧
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020228279A1 publication Critical patent/WO2020228279A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/60 Analysis of geometric attributes
    • G06T7/62 Analysis of geometric attributes of area, perimeter, diameter or volume
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1347 Preprocessing; Feature extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Definitions

  • This application relates to the technical field of biometric identification, and in particular to a method and device for extracting the palm region of an image.
  • Palmprint recognition is an important branch of biometric identification technology. At present, palmprint recognition is performed on fixed devices: the palm is placed against a plain background for recognition, so cropping the palm does not require searching for its position. However, the inventor found that such fixed devices limit the scope of application of palmprint recognition, while purchasing and installing the equipment incurs considerable cost. In addition, when palmprint equipment is applied to a portable terminal device such as a mobile phone, the palm photo will contain a large amount of background information, which seriously interferes with the palmprint recognition process.
  • The purpose of the embodiments of this application is to provide a method and device for extracting the palm region of an image, which can be applied on a portable terminal device to remove the background information of a palmprint image and extract the palm region, providing a guarantee for subsequent palmprint recognition operations.
  • One aspect of the embodiments of the present application provides a method for extracting the palm region of an image, including: acquiring a human hand image to be recognized; determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input; and extracting the palm region from the human hand image according to the palm contour. Determining the palm contour of the palm region in the human hand image based on the semantic segmentation model includes: segmenting closed image contours in the human hand image based on the semantic segmentation model; and, when there are multiple closed image contours, calculating the area enclosed by each of the multiple closed image contours and determining the closed image contour with the largest enclosed area as the palm contour.
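  • For illustration only, the following minimal Python sketch shows this three-step flow (acquire the image, segment it, keep the largest closed contour); the segmentation call (`model.predict`) and the OpenCV post-processing are assumptions, not part of the original disclosure:

```python
import cv2
import numpy as np

def extract_palm_region(hand_image: np.ndarray, model) -> np.ndarray:
    """Return the hand image with everything outside the palm contour removed."""
    mask = model.predict(hand_image)                      # hypothetical: binary palm/background mask
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    palm_contour = max(contours, key=cv2.contourArea)     # largest closed contour -> palm contour
    palm_mask = np.zeros(mask.shape, dtype=np.uint8)
    cv2.drawContours(palm_mask, [palm_contour], -1, 255, thickness=cv2.FILLED)
    return cv2.bitwise_and(hand_image, hand_image, mask=palm_mask)
```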
  • Another aspect provides a device for extracting the palm region of an image, including: an acquisition unit for acquiring a human hand image to be recognized; a palm contour determination unit for determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training palmprint region images that have different shooting backgrounds and annotated palm contours as input; and a palm region extraction unit for extracting the palm region from the human hand image according to the palm contour. The palm contour determination unit is also used to segment closed image contours in the human hand image based on the semantic segmentation model and, when there are multiple closed image contours, to calculate the area enclosed by each of the multiple closed image contours and determine the closed image contour with the largest enclosed area as the palm contour.
  • Another aspect of the embodiments of the present application provides a non-volatile readable storage medium on which computer-readable instructions are stored, wherein the computer-readable instructions implement the steps of the above method of the present application when executed by a processor.
  • Through the above technical solution, the semantic segmentation model trained with training palmprint region images annotated with palm contours is used to determine the palm contour of the palm region in the human hand image, and the palm region is then extracted from the human hand image according to that palm contour.
  • This application combines image contour/texture techniques with neural network techniques to quickly and accurately extract the palm contour and the corresponding palm region from a human hand image.
  • The training human hand images have different shooting backgrounds, so the palm region extraction operation can be completed for human hand images with different backgrounds based on this semantic segmentation model.
  • The semantic segmentation model has low data-volume requirements and low memory consumption for feature sample pictures, so the technical solution can be implemented with ordinary general-purpose processors and cameras; it has a very wide range of application scenarios and can be widely used in general-purpose terminals such as mobile phones, providing a basis for the market promotion of palmprint recognition technology.
  • FIG. 1 is a flowchart of an image palm region extraction method according to an embodiment of the present application
  • FIG. 2A shows a schematic diagram of a human hand image of the first example
  • FIG. 2B shows a schematic diagram of a human hand contour determined for the human hand image in FIG. 2A by applying the image palm region extraction method according to an embodiment of the present application;
  • FIG. 3A shows a schematic diagram of a human hand image of the second example
  • FIG. 3B shows a schematic diagram of a palm area extracted from the human hand image of FIG. 3A by applying the image palm area extraction method according to an embodiment of the present application;
  • FIG. 4A shows a schematic diagram of multiple palm regions extracted by applying an image palm region extraction method according to an embodiment of the present application
  • FIG. 4B shows a schematic diagram of the palm area after the holes in FIG. 4A have been filled by applying the image palm region extraction method according to an embodiment of the present application;
  • FIG. 5 shows a flowchart of a training process for the semantic segmentation model in a method for extracting an image palm region according to an embodiment of the present application
  • FIG. 6 shows a principle flowchart of an image palm region extraction method according to an embodiment of the present application
  • FIG. 7 is a structural block diagram of an image palm region extraction device according to an embodiment of the present application.
  • FIG. 8 is a structural block diagram of an image palm region extraction device according to another embodiment of the present application.
  • FIG. 9 is a structural block diagram of a physical device equipped with an image palm region extraction device according to an embodiment of the present application.
  • The method for extracting the palm region of an image according to an embodiment of the present application includes: S11, acquiring a human hand image to be recognized; S12, determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image; and S13, extracting the palm region from the human hand image according to the palm contour.
  • Regarding the entity executing the method, on the one hand it may be a dedicated integrated component, dedicated server, or dedicated terminal dedicated to palmprint recognition or palm region extraction; on the other hand, it may also be a general-purpose server or terminal (such as a smartphone or a tablet computer) on which a module for palmprint recognition or palm region extraction is installed, or on which program code for image palm region extraction is configured; all of the above fall within the protection scope of this application.
  • Regarding the acquisition of the human hand image, the camera of the terminal may be called to capture the human hand image, or the terminal or server may receive a human hand image uploaded from a lower layer.
  • The semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input.
  • The shooting backgrounds of the training human hand images may be diverse; for example, they may correspond to the hand being photographed against an indoor background, an outdoor background, or a near-skin-tone background.
  • The process of annotating the palm contours in the training human hand images used by the semantic segmentation model (a neural network model) may be manual, semi-automatic, or fully automatic; for example, an existing image contour extraction tool (a magic wand tool) may be used with adaptive adjustment and optimization, so as to accurately extract the contours of the training human hand images.
  • The semantic segmentation model does not place high demands on image pixels or hardware memory, so it can be used in smart terminals, such as mobile phones integrating a specific APP; in addition, because it is trained with human hand images with different backgrounds, the trained semantic segmentation model can determine palm contours against various backgrounds.
  • FIG. 2A shows an example of a human hand image, and FIG. 2B shows an example of the hand contour determined for the human hand image of FIG. 2A.
  • FIG. 3A shows an example of a human hand image, and FIG. 3B shows an example of the palm area extracted from the human hand image of FIG. 3A. It can be seen that, whether against a general background (FIGS. 2A-2B) or against a near-skin-tone background such as a human face background (FIGS. 3A-3B), the background information in the hand image can be well eliminated and the palm area extracted.
  • The palm contour may also be determined as follows: based on the semantic segmentation model, closed image contours are segmented in the human hand image; when there are multiple closed image contours, the area enclosed by each closed image contour is calculated, and the closed image contour with the largest enclosed area is determined as the palm contour.
  • A valid hand region is a complete closed region, and the largest closed region in the hand image is the palm contour corresponding to the palm area; the semantic segmentation model can therefore be trained and palm contours screened based on this rule.
  • FIG. 4A shows an example of multiple palm regions extracted by applying the image palm region extraction method of an embodiment of the present application. Due to illumination and other reasons, some small regions are not effectively extracted and appear as holes (see the example marked by the circle in the figure).
  • The embodiment of the present application therefore also proposes detecting and filling holes as follows: detect whether there are other closed image contours inside the palm contour; if there are, determine those other closed image contours as hole regions; and fill the hole regions according to the content of the human hand image within the palm area, for example by directly using the image content near a hole region to fill it.
  • FIG. 4B shows an example after the holes in FIG. 4A have been filled; a complete palm area is thus extracted, which solves the problem that palmprint recognition cannot proceed normally because of holes in the palm area.
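  • A hedged sketch of the hole-fill step, assuming the model output is a binary palm mask (palm = 255, background = 0); OpenCV's contour hierarchy makes the nested (hole) contours easy to find. For filling holes with nearby image content, as the text suggests, cv2.inpaint with the hole mask is one option:

```python
import cv2
import numpy as np

def fill_palm_holes(palm_mask: np.ndarray) -> np.ndarray:
    """Fill closed contours nested inside the palm contour (holes) in a binary mask."""
    # RETR_CCOMP yields a two-level hierarchy: outer contours and the holes inside them.
    contours, hierarchy = cv2.findContours(palm_mask, cv2.RETR_CCOMP,
                                           cv2.CHAIN_APPROX_SIMPLE)
    filled = palm_mask.copy()
    for i, contour in enumerate(contours):
        if hierarchy[0][i][3] != -1:          # has a parent contour -> it is a hole
            cv2.drawContours(filled, [contour], -1, 255, thickness=cv2.FILLED)
    return filled
```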
  • The training process for the semantic segmentation model in the image palm region extraction method of an embodiment of the present application includes:
  • S51. Acquire multiple training human hand images, where the images include training images corresponding to different photographing backgrounds.
  • S52. Extract the palm contours corresponding to each of the multiple training human hand images. The palm contour in a human hand image can be extracted or annotated automatically or semi-automatically.
  • S53. Input the multiple training human hand images with the palm contours into the semantic segmentation model to train it, so that the trained model can segment palm contours from human hand images with different backgrounds.
  • The semantic segmentation model may include an encoder network and a decoder network.
  • The specific training process may be: based on the encoder network in the semantic segmentation model, extract the palm contour features of the input training human hand images; based on the decoder network, perform non-linear upsampling using the pooling indices calculated in the max-pooling steps of the corresponding encoder network, so as to map the extracted palm contour features to full-input-resolution feature maps for pixel classification, thereby training the semantic segmentation model.
  • The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps.
  • The role of the decoder network is to map the low-resolution encoder feature maps to full-input-resolution feature maps for pixel classification.
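  • A minimal PyTorch sketch of this index-based upsampling, as a generic illustration rather than the applicant's exact network: the decoder reuses the argmax indices saved by the encoder's max-pooling, so no upsampling weights need to be learned, and a trainable convolution then densifies the sparse map:

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
densify = nn.Conv2d(64, 64, kernel_size=3, padding=1)   # trainable filter -> dense map

x = torch.randn(1, 64, 32, 32)      # an encoder feature map
pooled, indices = pool(x)           # encoder: downsample and remember argmax positions
sparse = unpool(pooled, indices)    # decoder: non-linear upsampling, nothing to learn
dense = densify(sparse)             # convolve the sparse map into a dense feature map
```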
  • The training process for the semantic segmentation model also includes: configuring a batch normalization layer for each convolutional layer in the model, and setting a rectified linear unit (ReLU) activation layer after the batch normalization layer, where each convolutional layer processes the training human hand images with palm contours corresponding to different photographing backgrounds. During forward propagation in training, the batch normalization layer only stores the mean and variance of the input weights, and the weight output returned to the convolutional layer is still the original convolved weight; during backward propagation in training, chain-rule differentiation is performed through each convolutional layer and its ReLU activation layer, according to the mean and variance in the batch normalization layer, to obtain the gradient and the current training rate. ReLU is an improvement over the traditional sigmoid activation function and better solves the vanishing-gradient problem during training.
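  • A sketch of the per-layer pattern just described (convolution, then batch normalization, then ReLU), written here in PyTorch as an assumption about how such a layer could be composed:

```python
import torch.nn as nn

def conv_bn_relu(in_channels: int, out_channels: int) -> nn.Sequential:
    """Convolution followed by batch normalization and then ReLU activation."""
    return nn.Sequential(
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(out_channels),   # tracks the mean/variance of its inputs during training
        nn.ReLU(inplace=True),          # avoids sigmoid's vanishing-gradient problem
    )
```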
  • The semantic segmentation model may be a Segnet model. Table 1 in the description below shows the specific structural composition and parameters of the Segnet model: the segmentation module SegNet includes successive multi-layer convolutional layers and multi-layer deconvolutional layers (the 5 layers shown in Table 1).
  • Palm images and their corresponding labels are used to train the SegNet network; preferably, more than 2,000 labeled images should be selected for training.
  • The input for model training is multiple palmprint region images annotated with the user's palm boundary, and the output classifies palm region pixels versus background region pixels, so that when the SegNet model is applied, the corresponding palm region and background region can be extracted from a palm image with background, separating the palm from the background.
  • The specific training process may be: input five frames of palm images pre-annotated with palm boundaries and associated with the same application scenario (for example, with the same background information) into the SegNet network, and iteratively train the SegNet network with the momentum-based Adam algorithm to obtain the parameters of the SegNet network.
  • In the specific implementation, the momentum is set to 0.9, with a total of 250,000 iterations, a learning rate of 0.0001, and a batch size of 4.
  • After training, the parameters of the deep neural network are saved.
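  • A hedged PyTorch sketch of this training recipe; `SegNetModel` and `train_loader` are hypothetical names, and the stated momentum of 0.9 is read here as Adam's first-moment coefficient (beta1):

```python
import torch
import torch.nn as nn

model = SegNetModel(num_classes=2)          # hypothetical SegNet: palm vs. background
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
criterion = nn.CrossEntropyLoss()           # per-pixel two-class classification

for step, (images, labels) in enumerate(train_loader):   # hypothetical loader, batch size 4
    optimizer.zero_grad()
    logits = model(images)                  # (N, 2, H, W) full-resolution scores
    loss = criterion(logits, labels)        # labels: (N, H, W) with values in {0, 1}
    loss.backward()
    optimizer.step()
    if step == 250_000:                     # 250,000 iterations, as stated above
        break

torch.save(model.state_dict(), "segnet_palm.pth")   # save the trained parameters
```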
  • In SegNet's encoder stage, the role of convolution is to extract features.
  • The convolution used by SegNet is 'same' convolution, i.e., convolution does not change the image size; in the decoder stage, 'same' convolution is also used, but there its role is to enrich the information of the image enlarged by upsampling, so that information lost during pooling can be recovered by learning in the decoder.
  • The role of the decoder network is to map the low-resolution encoder feature maps to full-input-resolution feature maps for pixel classification.
  • The SegNet network upsamples its lower-resolution input feature maps in the decoder.
  • Specifically, the decoder performs non-linear upsampling using the pooling indices calculated in the max-pooling step of the corresponding encoder, which eliminates the need to learn upsampling.
  • The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. As a result, the trade-off between memory and accuracy involved in good segmentation performance is achieved, giving this technical solution a very wide range of application scenarios; for example, it can be applied to photos taken with mobile phones.
  • Each convolutional layer in SegNet is followed by a bn (batch normalization) layer, and the bn layer is followed by a ReLU (rectified linear unit) activation layer.
  • During forward propagation in training, the bn layer standardizes the convolved feature values (weights) but the output remains unchanged; that is, the bn layer only stores the mean and variance of the input weights, and the weight output returned to the convolutional layer is still the original convolved weight. During backward propagation in training, chain-rule differentiation is performed through each convolutional layer and the ReLU layer according to the mean and variance in the bn layer, obtaining the gradient and thus computing the current learning rate.
  • ReLU is an improvement over the traditional sigmoid activation function, mainly providing a good solution to the vanishing-gradient problem.
  • As shown in FIG. 6, the principle flowchart of the image palm region extraction method of an embodiment of the present application, the embodiment provides a method for extracting the palm part from a palm image with an FCN-based Segnet network to eliminate palm background information.
  • When a semantic segmentation method processes an image, it operates at the pixel level: each pixel in the image is assigned to some object category, and the boundary of each object is also marked. Therefore, unlike classification, the model must have pixel-level dense prediction capability.
  • The method mainly involves a training phase based on the Segnet semantic segmentation model and an application phase based on the Segnet semantic segmentation model.
  • First, palm images need to be acquired and then manually annotated.
  • Regarding the acquisition of palm images, images related to human hands produced by a camera (such as a mobile phone camera) may be collected; they may be taken manually or obtained by keyword search and download from the Internet. These photographed or downloaded palm images contain a background area and a palm area.
  • Regarding the manual annotation of the palm area of a palm image, it may be based on purely manual annotation operations.
  • Alternatively, an existing pixel-adaptive matching segmentation tool (such as the magic wand tool) may be used to identify the boundary, after which the adaptively recognized boundary is annotated and adjusted, so that the palm area and background area are marked in the photo image and the training palm images are labeled.
  • As an example, about 3,000 mobile phone photos of 400 people in total may be collected, taken both indoors and outdoors, with backgrounds including desks, computers, trees, buildings, and other scenes; this helps the Segnet model distinguish and recognize different backgrounds.
  • Further, since both the face and the palm are skin-colored, a major difficulty in current palm region extraction is how to accurately extract the palm region from a palm image with a human face background; pictures with face backgrounds and near-skin-tone backgrounds can therefore also be specially collected as training samples.
  • The application based on the Segnet semantic segmentation model can effectively separate the palm from the background area.
  • The Segnet semantic segmentation model has very low data-volume requirements and memory consumption for feature sample pictures and can be applied in mobile phones; for example, it may be integrated into a mobile phone APP.
  • S63. The user calls the camera module to photograph the palm, producing a human palm image.
  • S63 may correspond to different application scenarios; for example, the user may open the mobile phone APP and call the corresponding camera module through a specific user operation.
  • S64. Call the Segnet semantic segmentation model to segment the background area and the palm area in the human palm image.
  • Based on the Segnet semantic segmentation model, the palm area and background area in the human palm image are derived end to end, achieving rapid segmentation of the palm area. Because the Segnet semantic segmentation model's data requirements and memory consumption for feature sample images are both relatively low, the solution of the embodiment of the present application can be implemented with a general-purpose processor and camera and has a very wide range of application scenarios; for example, it can be applied in an APP. At present, palmprint recognition technology is still based on ordinary codec technology, which places very high demands on image pixels and image region specifications, so it can generally only be applied on fixed devices.
  • This technical solution can extract the palm area from an image with a cluttered background, is not limited to a standardized palm placement, and the Segnet-based training and recognition process (as described above) also works for low image pixel counts, so this technical solution can be widely applied on mobile terminals, for example ported into a mobile phone APP.
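  • As an illustration of the application phase, the following sketch runs a trained model on a phone photo and removes the background by a per-pixel argmax; the 256x256 input size, the file name, and the preprocessing are assumptions:

```python
import cv2
import numpy as np
import torch

model.eval()                                             # trained SegNet from the sketch above
photo = cv2.imread("palm_photo.jpg")                     # frame captured by the camera module
inp = cv2.resize(photo, (256, 256)).astype(np.float32) / 255.0
inp = torch.from_numpy(inp).permute(2, 0, 1).unsqueeze(0)    # (1, 3, 256, 256)
with torch.no_grad():
    logits = model(inp)                                  # (1, 2, 256, 256)
mask = (logits.argmax(dim=1).squeeze(0).numpy() * 255).astype(np.uint8)
mask = cv2.resize(mask, (photo.shape[1], photo.shape[0]),
                  interpolation=cv2.INTER_NEAREST)
palm_only = cv2.bitwise_and(photo, photo, mask=mask)     # background removed
```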
  • Because Segnet gives considerable recognition weight to color and brightness during recognition, the color or brightness on the palm in the image may be uneven, leaving unrecognized holes on the palm. In this case, Segnet may be used only to find the contour of the palm boundary, thereby locating the palm boundary in the image; the contour is then annotated at the corresponding position in the original image, and the contour region of the original image is segmented, which prevents holes from appearing during palm recognition.
  • The image palm region extraction device of an embodiment of the present application includes: an acquisition unit 701, configured to acquire a human hand image to be recognized; a palm contour determination unit 702, configured to determine, based on a semantic segmentation model, the palm contour of the palm area in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input; and a palm area extraction unit 703, configured to extract the palm area from the human hand image according to the palm contour.
  • The palm contour determination unit 702 is further configured to segment closed image contours in the human hand image based on the semantic segmentation model and, when there are multiple closed image contours, to calculate the area enclosed by each of the multiple closed image contours and determine the closed image contour with the largest enclosed area as the palm contour.
  • The palm area extraction unit 703 is further configured to detect whether there are other closed image contours inside the palm contour, determine those other closed image contours, if present, as hole areas, and fill the hole areas according to the content of the human hand image within the palm area.
  • The device further includes a training unit 704 for acquiring multiple training hand images, wherein the multiple training hand images include images corresponding to different photographing backgrounds.
  • The training unit 704 is further configured to extract, based on the encoder network in the semantic segmentation model, the palm contour features of the input training human hand images, and, based on the decoder network in the semantic segmentation model, to perform non-linear upsampling operations using the pooling indices calculated in the max-pooling steps of the corresponding encoder network, so as to map the extracted palm contour features to full-input-resolution feature maps for pixel classification, thereby training the semantic segmentation model.
  • The semantic segmentation model may be a convolutional neural network.
  • The training unit 704 is further configured to configure a batch normalization layer for each convolutional layer in the semantic segmentation model and to set a ReLU activation layer after the batch normalization layer, where each convolutional layer processes the training human hand images with palm contours corresponding to different photographing backgrounds; based on the batch normalization layer, during forward propagation in training, the batch normalization layer only stores the mean and variance of the input weights.
  • The semantic segmentation model may be a Segnet model.
  • The embodiments of the present application also provide a non-volatile readable storage medium on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions implement the image palm region extraction method shown in FIGS. 1-6.
  • An embodiment of the present application also provides a computer device 90.
  • The device 90 includes a storage device 901 and a processor 902; the storage device 901 is used to store computer-readable instructions, and the processor 902 is used to execute the computer-readable instructions to implement the image palm region extraction method shown in FIGS. 1-6.
  • By applying this technical solution, the semantic segmentation model trained with training palmprint region images annotated with palm contours is used to determine the palm contour of the palm area in the human hand image, and the palm area is then extracted from the human hand image according to that palm contour; this application thus combines image contour/texture techniques with neural network techniques to quickly and accurately extract palm contours and the corresponding palm areas from human hand images.
  • The training human hand images have different shooting backgrounds, so the palm region extraction operation can be completed for human hand images with different backgrounds based on this semantic segmentation model.
  • The semantic segmentation model has low data requirements and memory consumption for feature sample pictures, so the technical solution can be implemented with general-purpose processors and cameras; it has a very wide range of application scenarios and can be widely used in general-purpose terminals such as mobile phones, providing a basis for the market promotion of palmprint recognition technology.
  • The software product can be stored in a non-volatile storage medium (which may be a CD-ROM, USB flash drive, removable hard disk, etc.) and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in each implementation scenario of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

This application relates to the technical field of biometric identification. Embodiments of this application provide a method and device for extracting the palm region of an image, wherein the method includes: acquiring a human hand image to be recognized; determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input; and extracting the palm region from the human hand image according to the palm contour. By combining image contour/texture techniques with neural network techniques, the palm contour and the corresponding palm region can be extracted quickly and accurately from human hand images with different backgrounds, and the solution also has broad market application prospects.

Description

Method and device for extracting the palm region of an image
This application claims priority to the Chinese patent application filed with the China Patent Office on May 10, 2019, with application number 2019103902895 and titled "Method and device for extracting the palm region of an image", the entire contents of which are incorporated into this application by reference.
Technical Field
This application relates to the technical field of biometric identification, and in particular to a method and device for extracting the palm region of an image.
Background
As an emerging identity authentication technology, biometric identification is currently one of the most promising high technologies internationally and a frontier research topic. Palmprint recognition is an important branch of biometric identification technology. At present, palmprint recognition is performed on fixed devices: the palm is placed against a plain background for recognition, so cropping the palm does not require searching for its position. However, the inventor found that such fixed devices limit the scope of application of palmprint recognition, while purchasing and installing the equipment incurs considerable cost. In addition, when palmprint equipment is applied to a portable terminal device such as a mobile phone, the palm photo will contain a large amount of background information, and this background information will seriously interfere with the palmprint recognition process.
Therefore, a technical solution that can be applied on portable terminal devices to remove the background information of palmprint images is currently a hot research direction in the industry.
Summary
The purpose of the embodiments of this application is to provide a method and device for extracting the palm region of an image, which can be applied on a portable terminal device to remove the background information of a palmprint image and extract the palm region, providing a guarantee for subsequent palmprint recognition operations.
To achieve the above purpose, one aspect of the embodiments of this application provides a method for extracting the palm region of an image, including: acquiring a human hand image to be recognized; determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input; and extracting the palm region from the human hand image according to the palm contour. Determining the palm contour of the palm region in the human hand image based on the semantic segmentation model includes: segmenting closed image contours in the human hand image based on the semantic segmentation model; and, when there are multiple closed image contours, calculating the area enclosed by each of the multiple closed image contours and determining the closed image contour with the largest enclosed area as the palm contour.
Another aspect of the embodiments of this application provides a device for extracting the palm region of an image, including: an acquisition unit for acquiring a human hand image to be recognized; a palm contour determination unit for determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training palmprint region images that have different shooting backgrounds and annotated palm contours as input; and a palm region extraction unit for extracting the palm region from the human hand image according to the palm contour. The palm contour determination unit is further configured to segment closed image contours in the human hand image based on the semantic segmentation model and, when there are multiple closed image contours, to calculate the area enclosed by each of the multiple closed image contours and determine the closed image contour with the largest enclosed area as the palm contour.
Another aspect of the embodiments of this application provides a computer device, including a memory and a processor, the memory storing computer-readable instructions, wherein the processor implements the steps of the above method of this application when executing the computer-readable instructions.
Another aspect of the embodiments of this application provides a non-volatile readable storage medium on which computer-readable instructions are stored, wherein the computer-readable instructions implement the steps of the above method of this application when executed by a processor.
Through the above technical solution, the semantic segmentation model trained with training palmprint region images annotated with palm contours is used to determine the palm contour of the palm region in the human hand image, and the palm region is then extracted from the human hand image according to that palm contour. This application thus combines image contour/texture techniques with neural network techniques to quickly and accurately extract the palm contour and the corresponding palm region from a human hand image. Furthermore, since the training human hand images have different shooting backgrounds, the palm region extraction operation can be completed for human hand images with different backgrounds based on this semantic segmentation model. In addition, the semantic segmentation model has low data-volume requirements and low memory consumption for feature sample pictures, so the technical solution can be implemented with ordinary general-purpose processors and cameras; it has a very wide range of application scenarios and can be widely used in general-purpose terminals such as mobile phones, providing a basis for the market promotion of palmprint recognition technology.
Other features and advantages of the embodiments of this application will be described in detail in the following detailed description.
Brief Description of the Drawings
The drawings are provided for further understanding of the embodiments of this application, constitute a part of the specification, and together with the following detailed description serve to explain the embodiments of this application, but do not limit them. In the drawings:
FIG. 1 is a flowchart of a method for extracting the palm region of an image according to an embodiment of this application;
FIG. 2A is a schematic diagram of a human hand image of a first example;
FIG. 2B is a schematic diagram of the hand contour determined for the human hand image of FIG. 2A by applying the image palm region extraction method of an embodiment of this application;
FIG. 3A is a schematic diagram of a human hand image of a second example;
FIG. 3B is a schematic diagram of the palm region extracted from the human hand image of FIG. 3A by applying the image palm region extraction method of an embodiment of this application;
FIG. 4A is a schematic diagram of multiple palm regions extracted by applying the image palm region extraction method of an embodiment of this application;
FIG. 4B is a schematic diagram of the palm region after the holes in FIG. 4A have been filled by applying the image palm region extraction method of an embodiment of this application;
FIG. 5 is a flowchart of the training process for the semantic segmentation model in the image palm region extraction method of an embodiment of this application;
FIG. 6 is a principle flowchart of an image palm region extraction method according to an embodiment of this application;
FIG. 7 is a structural block diagram of a device for extracting the palm region of an image according to an embodiment of this application;
FIG. 8 is a structural block diagram of a device for extracting the palm region of an image according to another embodiment of this application;
FIG. 9 is a structural block diagram of a physical device equipped with an image palm region extraction device according to an embodiment of this application.
Detailed Description
Specific implementations of the embodiments of this application are described in detail below with reference to the drawings. It should be understood that the specific implementations described here are only used to illustrate and explain the embodiments of this application and are not intended to limit them. As shown in FIG. 1, a method for extracting the palm region of an image according to an embodiment of this application includes:
S11. Acquire a human hand image to be recognized. Regarding the entity executing the method of the embodiments of this application, on the one hand it may be a dedicated integrated component, dedicated server, or dedicated terminal dedicated to palmprint recognition or palm region extraction; on the other hand, it may also be a general-purpose server or terminal, where the general-purpose server or terminal (such as a smartphone or a tablet computer) may be installed with a module for palmprint recognition or palm region extraction or configured with program code for image palm region extraction, and all of the above fall within the protection scope of this application. Regarding the acquisition of the human hand image, the camera of the terminal may be called to capture the human hand image, or the terminal or server may receive a human hand image uploaded from a lower layer.
S12. Determine, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, where the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input. The shooting backgrounds of the training human hand images may be diverse; for example, they may correspond to the hand being photographed against an indoor background, an outdoor background, a near-skin-tone background, and so on. In addition, the process of annotating the palm contours in the training human hand images used by the semantic segmentation model (a neural network model) may be manual, semi-automatic, or fully automatic; for example, an existing image contour extraction tool (a magic wand tool) may be used with adaptive adjustment and optimization, so as to accurately extract the contours of the training human hand images.
S13. Extract the palm region from the human hand image according to the palm contour. On the one hand, the range of the human hand image covered by the palm contour may be extracted directly as the palm region. On the other hand, the range covered by the palm contour may also be adjusted and optimized (for example, by filling holes) to obtain the final palm region, and all of the above implementations fall within the protection scope of this application. In the embodiments of this application, the semantic segmentation model does not place high demands on image pixels or hardware memory, so it can be used in smart terminals, such as mobile phones integrating a specific APP; in addition, a semantic segmentation model trained with human hand images with different backgrounds can determine palm contours against various backgrounds. FIG. 2A shows an example of a human hand image, and FIG. 2B shows an example of the hand contour determined for the human hand image of FIG. 2A; FIG. 3A shows an example of a human hand image, and FIG. 3B shows an example of the palm region extracted from the human hand image of FIG. 3A. It can be seen that, whether against a general background (FIGS. 2A-2B) or against a near-skin-tone background such as a human face background (FIGS. 3A-3B), the background information in the human hand image can be well eliminated and the palm region extracted. In some implementations, the palm contour may also be determined as follows: based on the semantic segmentation model, closed image contours are segmented in the human hand image; when there are multiple closed image contours, the area enclosed by each closed image contour is calculated, and the closed image contour with the largest enclosed area is determined as the palm contour.
It can be understood that a valid hand region is a complete closed region, and the largest closed region in the human hand image should be the palm contour corresponding to the palm region; the semantic segmentation model can therefore be trained and palm contours screened based on this rule. FIG. 4A shows an example of multiple palm regions extracted by applying the image palm region extraction method of an embodiment of this application; due to illumination and other reasons, some small regions are not effectively extracted and appear as holes (see the example marked by the circle in the figure). In view of this, the embodiments of this application also propose detecting and filling holes as follows: detect whether there are other closed image contours inside the palm contour; if there are, determine those other closed image contours as hole regions; and fill the hole regions according to the content of the human hand image within the palm region, for example by directly using the image content near a hole region to fill it. FIG. 4B shows an example after the holes in FIG. 4A have been filled; a complete palm region is thus extracted, which solves the problem that subsequent palmprint recognition cannot proceed normally because of holes in the palm region.
As shown in FIG. 5, the training process for the semantic segmentation model in the image palm region extraction method of an embodiment of this application includes:
S51. Acquire multiple training human hand images, where the multiple training human hand images include training human hand images corresponding to different photographing backgrounds. As described above, for the semantic segmentation model to be sufficiently robust, it needs training inputs of human hand images taken against various photographing backgrounds, so that palm contours can be extracted from human hand images with indoor, outdoor, and near-skin-tone backgrounds.
S52. Extract the palm contours corresponding to each of the multiple training human hand images. As described above, the palm contours in the human hand images may be extracted or annotated automatically or semi-automatically.
S53. Input the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model, so that the trained semantic segmentation model can segment palm contours from human hand images with different backgrounds.
In some implementations, the semantic segmentation model may include an encoder network and a decoder network. The specific training process may be: based on the encoder network in the semantic segmentation model, extract the palm contour features of each of the input training human hand images; based on the decoder network in the semantic segmentation model, use the pooling indices calculated in the max-pooling steps of the corresponding encoder network to perform non-linear upsampling operations, so as to map the extracted palm contour features to full-input-resolution feature maps for pixel classification, thereby training the semantic segmentation model. Since the upsampled maps are sparse, they are then convolved with trainable filters to produce dense feature maps, and the role of the decoder network is to map the low-resolution encoder feature maps to full-input-resolution feature maps for pixel classification. This achieves the trade-off among memory, accuracy, and image pixels involved in good segmentation performance: full-input-resolution feature maps can be produced even for low-resolution image inputs, which broadens the application scope of the embodiments of this application.
In some implementations, batch normalization (bn) may also be used during training to speed up training. Specifically, the training process for the semantic segmentation model further includes: configuring a batch normalization layer for each convolutional layer in the semantic segmentation model, and setting a rectified linear unit (ReLU) activation layer after the batch normalization layer, where each convolutional layer processes the training human hand images with palm contours corresponding to different photographing backgrounds; based on the batch normalization layer, during forward propagation in training, the batch normalization layer only stores the mean and variance of the input weights, and the weight output returned to the convolutional layer is still the original convolved weight; and, during backward propagation in training, chain-rule differentiation is performed through each convolutional layer and the ReLU activation layer according to the mean and variance in the batch normalization layer, to obtain the gradient and the current training rate. ReLU is an improvement over the traditional sigmoid activation function and better solves the vanishing-gradient problem during training.
Illustratively, the semantic segmentation model may be a Segnet model. Table 1 below shows the specific structural composition and parameters of the Segnet model; the segmentation module SegNet includes successive multi-layer convolutional layers and multi-layer deconvolutional layers (the 5 layers shown in Table 1 below).
[Table 1 (structural composition and parameters of the SegNet model): image not reproduced in this text]
Table 1
The SegNet network is trained with palm images and their corresponding labels; preferably, more than 2,000 annotated images should be selected to train the SegNet network. Specifically, the input for model training is multiple palmprint region images annotated with the user's palm boundary, and the output classifies palm region pixels versus background region pixels, so that when the SegNet model is applied, the corresponding palm region and background region can be extracted from a palm image with background, separating the palm from the background. The specific training process may be: input five frames of palm images pre-annotated with palm boundaries and associated with the same application scenario (for example, with the same background information) into the SegNet network, and iteratively train the SegNet network with the momentum-based Adam algorithm to obtain the parameters of the SegNet network. In the specific implementation of the momentum-based Adam algorithm, the momentum is set to 0.9, with a total of 250,000 iterations, a learning rate of 0.0001, and a batch size of 4. After training, the parameters of the deep neural network are saved. The encoded images are used to train the encoder network and the decoder network. In SegNet's encoder stage, the role of convolution is to extract features; SegNet uses 'same' convolution, i.e., convolution does not change the image size. In the decoder stage, 'same' convolution is also used, but there its role is to enrich the information of the image enlarged by upsampling, so that information lost during pooling can be recovered by learning in the decoder. The role of the decoder network is to map the low-resolution encoder feature maps to full-input-resolution feature maps for pixel classification. The SegNet network upsamples its lower-resolution input feature maps in the decoder: specifically, the decoder performs non-linear upsampling using the pooling indices calculated in the max-pooling step of the corresponding encoder, which eliminates the need to learn upsampling. The upsampled maps are sparse and are then convolved with trainable filters to produce dense feature maps. This achieves the trade-off between memory and accuracy involved in good segmentation performance, giving this technical solution a very wide range of application scenarios; for example, it can be applied to photos taken with mobile phones.
More preferably, batch normalization may also be used during training to speed up training. The main role of batch normalization is to accelerate learning; it is applied before the activation function. In SegNet, each convolutional layer is followed by a bn (batch normalization) layer, and the bn layer is followed by a ReLU (rectified linear unit) activation layer. Based on the bn layer, during forward propagation in training, the bn layer standardizes the convolved feature values (weights) but the output remains unchanged; that is, the bn layer only stores the mean and variance of the input weights, and the weight output returned to the convolutional layer is still the original convolved weight. During backward propagation in training, chain-rule differentiation is performed through each convolutional layer and the ReLU layer according to the mean and variance in the bn layer, obtaining the gradient and thus computing the current learning rate. ReLU is an improvement over the traditional sigmoid activation function, mainly providing a good solution to the vanishing-gradient problem. As shown in FIG. 6, the principle flowchart of the image palm region extraction method of an embodiment of this application, the embodiments of this application provide a method for extracting the palm part from a palm image with an FCN-based Segnet network to eliminate palm background information. When a semantic segmentation method processes an image, it operates at the pixel level: each pixel in the image is assigned to some object category, and the boundary of each object is also marked. Therefore, unlike classification, the model must have pixel-level dense prediction capability. The method of the embodiments of this application mainly involves a training phase based on the Segnet semantic segmentation model and an application phase based on the Segnet semantic segmentation model.
1) Training phase based on the Segnet semantic segmentation model
S61. Acquire palm images annotated with the palm boundary.
First, palm images need to be acquired, and then the palm images need to be manually annotated. Regarding the acquisition of palm images, images related to human hands produced by a camera (for example, a mobile phone camera) may be collected; they may be taken manually or obtained by keyword search and download from the Internet. These photographed or downloaded palm images contain a background region and a palm region. Regarding the manual annotation of the palm region of a palm image, it may be based on purely manual annotation operations; alternatively, an existing pixel-adaptive matching segmentation tool (for example, a magic wand tool) may be used to identify the boundary, after which the adaptively recognized boundary is annotated and adjusted, so that the palm region and the background region are marked in the photo image and the training palm images are labeled. As an example, about 3,000 mobile phone photos of 400 people in total may be collected, taken both indoors and outdoors, with backgrounds including desks, computers, trees, buildings, and other scenes; this helps the Segnet model distinguish and recognize different backgrounds. Further, since both the face and the palm are skin-colored, a major difficulty in current palm region extraction is how to accurately extract the palm region from a palm image with a human face background; this technical solution therefore also proposes specially collecting some pictures with face backgrounds and some pictures with near-skin-tone backgrounds as training samples for training the Segnet model, so as to recognize the palm region in images with face or near-skin-tone backgrounds.
S62. Input the palm images annotated with the palm boundary into the vgg Segnet model to train the model.
2) Application phase based on the Segnet semantic segmentation model
The application based on the Segnet semantic segmentation model can effectively separate the palm from the background region. Moreover, the Segnet semantic segmentation model has very low data-volume requirements and memory consumption for feature sample pictures and is suitable for use in mobile phones; for example, it may be integrated into a mobile phone APP.
S63. The user calls the camera module to photograph the palm, producing a human palm image.
Specifically, S63 may correspond to different application scenarios; for example, the user may open a mobile phone APP and call the corresponding camera module through a specific user operation.
S64. Call the Segnet semantic segmentation model to segment the background region and the palm region in the human palm image.
Based on the Segnet semantic segmentation model, the palm region and the background region in the human palm image are derived end to end, achieving fast segmentation of the palm region. Moreover, since the Segnet semantic segmentation model has low data-volume requirements and low memory consumption for feature sample pictures, the solution of the embodiments of this application can be implemented with ordinary general-purpose processors and cameras and has a very wide range of application scenarios; for example, it can be applied in an APP. At present, palmprint recognition technology is still based on ordinary codec technology, which places very high demands on image pixels and image region specifications, and therefore can generally only be applied on fixed devices. This technical solution can extract the palm region from images with cluttered backgrounds, is not limited to a standardized palm placement, and the Segnet-based training and recognition process (as described above) also works for low image pixel counts, so this technical solution can be widely applied on mobile terminals, for example ported into a mobile phone APP.
In addition, because Segnet gives considerable recognition weight to color and brightness during recognition, when the palm is photographed the color or brightness on the palm in the image may be uneven, leaving unrecognized holes on the palm. In this case, we propose that Segnet may be used only to find the contour of the palm boundary, thereby locating the palm boundary in the image; the contour is then annotated at the corresponding position in the original image, and the contour region of the original image is segmented, which prevents holes from appearing during palm recognition.
As shown in FIG. 7, a device for extracting the palm region of an image according to an embodiment of this application includes: an acquisition unit 701 for acquiring a human hand image to be recognized; a palm contour determination unit 702 for determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input; and a palm region extraction unit 703 for extracting the palm region from the human hand image according to the palm contour.
Preferably, the palm contour determination unit 702 is further configured to segment closed image contours in the human hand image based on the semantic segmentation model and, when there are multiple closed image contours, calculate the area enclosed by each of the multiple closed image contours and determine the closed image contour with the largest enclosed area as the palm contour.
Preferably, the palm region extraction unit 703 is further configured to detect whether there are other closed image contours inside the palm contour, determine those other closed image contours, if present, as hole regions, and fill the hole regions according to the content of the human hand image within the palm region.
In a specific application scenario, as shown in FIG. 8, the device further includes a training unit 704 for acquiring multiple training human hand images, where the multiple training human hand images include training human hand images corresponding to different photographing backgrounds; extracting the palm contours corresponding to each of the multiple training human hand images; and inputting the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model, so that the trained semantic segmentation model can segment palm contours from human hand images with different backgrounds.
Preferably, the training unit 704 is further configured to extract, based on the encoder network in the semantic segmentation model, the palm contour features of each of the input training human hand images, and, based on the decoder network in the semantic segmentation model, to perform non-linear upsampling operations using the pooling indices calculated in the max-pooling steps of the corresponding encoder network, so as to map the extracted palm contour features to full-input-resolution feature maps for pixel classification, thereby training the semantic segmentation model.
Preferably, the semantic segmentation model is a convolutional neural network, and the training unit 704 is further configured to configure a batch normalization layer for each convolutional layer in the semantic segmentation model and set a ReLU activation layer after the batch normalization layer, where each convolutional layer processes the training human hand images with palm contours corresponding to different photographing backgrounds; based on the batch normalization layer, during forward propagation in training, the batch normalization layer only stores the mean and variance of the input weights, and the weight output returned to the convolutional layer is still the original convolved weight; and, during backward propagation in training, chain-rule differentiation is performed through each convolutional layer and the ReLU activation layer according to the mean and variance in the batch normalization layer, to obtain the gradient and the current training rate. In some implementations, the semantic segmentation model is a Segnet model.
It should be noted that for other corresponding descriptions of the functional units involved in the device for extracting the palm region of an image provided by the embodiments of this application, reference may be made to the corresponding descriptions of FIGS. 1-6, which will not be repeated here.
Based on the methods shown in FIGS. 1-6, correspondingly, the embodiments of this application also provide a non-volatile readable storage medium on which computer-readable instructions are stored; when executed by a processor, the computer-readable instructions implement the image palm region extraction method shown in FIGS. 1-6. Based on the embodiments of the methods shown in FIGS. 1-6 and the virtual devices shown in FIGS. 7 and 8, to achieve the above purpose, as shown in FIG. 9, the embodiments of this application also provide a computer device 90, which includes a storage device 901 and a processor 902; the storage device 901 is used to store computer-readable instructions, and the processor 902 is used to execute the computer-readable instructions to implement the image palm region extraction method shown in FIGS. 1-6.
By applying the technical solution of this application, the semantic segmentation model trained with training palmprint region images annotated with palm contours is used to determine the palm contour of the palm region in the human hand image, and the palm region is then extracted from the human hand image according to that palm contour. This application thus combines image contour/texture techniques with neural network techniques to quickly and accurately extract the palm contour and the corresponding palm region from a human hand image. Furthermore, since the training human hand images have different shooting backgrounds, the palm region extraction operation can be completed for human hand images with different backgrounds based on this semantic segmentation model. In addition, the semantic segmentation model has low data-volume requirements and low memory consumption for feature sample pictures, so the technical solution can be implemented with ordinary general-purpose processors and cameras; it has a very wide range of application scenarios and can be widely used in general-purpose terminals such as mobile phones, providing a basis for the market promotion of palmprint recognition technology.
From the above description of the implementations, those skilled in the art can clearly understand that this application can be implemented by hardware, or by software plus the necessary general-purpose hardware platform. Based on this understanding, the technical solution of this application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, USB flash drive, removable hard disk, etc.) and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the various implementation scenarios of this application.
Those skilled in the art can understand that the drawings are only schematic diagrams of preferred implementation scenarios, and the modules or processes in the drawings are not necessarily required for implementing this application. Those skilled in the art can understand that the modules in the devices of an implementation scenario may be distributed among the devices of the implementation scenario as described, or may be located, with corresponding changes, in one or more devices different from this implementation scenario. The modules of the above implementation scenarios may be combined into one module or further split into multiple sub-modules. The above serial numbers of this application are for description only and do not indicate the superiority of any implementation scenario. The above discloses only a few specific implementation scenarios of this application, but this application is not limited thereto, and any changes conceivable by those skilled in the art shall fall within the protection scope of this application.

Claims (20)

  1. A method for extracting the palm region of an image, comprising:
    acquiring a human hand image to be recognized;
    determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input;
    extracting the palm region from the human hand image according to the palm contour;
    wherein determining the palm contour of the palm region in the human hand image based on the semantic segmentation model comprises:
    segmenting closed image contours in the human hand image based on the semantic segmentation model;
    when there are multiple closed image contours, calculating the area enclosed by each of the multiple closed image contours, and determining the closed image contour with the largest enclosed area as the palm contour.
  2. The method according to claim 1, wherein extracting the palm region from the human hand image according to the palm contour comprises:
    detecting whether there are other closed image contours inside the palm contour;
    if there are other closed image contours, determining the other closed image contours as hole regions; and
    filling the hole regions according to the content of the human hand image within the palm region.
  3. The method according to claim 1, further comprising a training process for the semantic segmentation model, the training process comprising:
    acquiring multiple training human hand images, wherein the multiple training human hand images include training human hand images corresponding to different photographing backgrounds;
    extracting the palm contours corresponding to each of the multiple training human hand images;
    inputting the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model, so that the trained semantic segmentation model can segment palm contours from human hand images with different backgrounds.
  4. The method according to claim 3, wherein inputting the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model comprises:
    extracting, based on the encoder network in the semantic segmentation model, the palm contour features of each of the input training human hand images;
    performing, based on the decoder network in the semantic segmentation model, non-linear upsampling operations using the pooling indices calculated in the max-pooling steps of the corresponding encoder network, so as to map the extracted palm contour features to full-input-resolution feature maps for pixel classification, thereby training the semantic segmentation model.
  5. The method according to claim 3, wherein the semantic segmentation model is a convolutional neural network, and the training process for the semantic segmentation model further comprises:
    configuring a batch normalization layer for each convolutional layer in the semantic segmentation model, and setting a rectified linear unit (ReLU) activation layer after the batch normalization layer, wherein each convolutional layer corresponds to the training human hand images with palm contours for different photographing backgrounds;
    based on the batch normalization layer, propagating forward during training, wherein the batch normalization layer only stores the mean and variance of the input weights, and the weight output returned to the convolutional layer is still the original convolved weight; and
    propagating backward during training, performing chain-rule differentiation through each convolutional layer and the ReLU activation layer according to the mean and variance in the batch normalization layer, to obtain the gradient and the current training rate.
  6. The method according to claim 1, wherein the semantic segmentation model is a Segnet model, the Segnet model comprising multiple groups of convolutional layers, all using 'same' convolution, and corresponding deconvolutional layers, wherein the convolutional layers form an encoder network for extracting features and the deconvolutional layers form a decoder network for performing non-linear upsampling operations.
  7. A device for extracting the palm region of an image, comprising:
    an acquisition unit for acquiring a human hand image to be recognized;
    a palm contour determination unit for determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input;
    a palm region extraction unit for extracting the palm region from the human hand image according to the palm contour;
    wherein the palm contour determination unit is further configured to segment closed image contours in the human hand image based on the semantic segmentation model and, when there are multiple closed image contours, to calculate the area enclosed by each of the multiple closed image contours and determine the closed image contour with the largest enclosed area as the palm contour.
  8. The device according to claim 7, wherein the palm region extraction unit is further configured to detect whether there are other closed image contours inside the palm contour; if there are other closed image contours, to determine the other closed image contours as hole regions; and to fill the hole regions according to the content of the human hand image within the palm region.
  9. The device according to claim 8, further comprising a training unit;
    the training unit is configured to acquire multiple training human hand images, wherein the multiple training human hand images include training human hand images corresponding to different photographing backgrounds; to extract the palm contours corresponding to each of the multiple training human hand images; and to input the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model, so that the trained semantic segmentation model can segment palm contours from human hand images with different backgrounds.
  10. The device according to claim 9, wherein the training unit is further configured to extract, based on the encoder network in the semantic segmentation model, the palm contour features of each of the input training human hand images, and, based on the decoder network in the semantic segmentation model, to perform non-linear upsampling operations using the pooling indices calculated in the max-pooling steps of the corresponding encoder network, so as to map the extracted palm contour features to full-input-resolution feature maps for pixel classification, thereby training the semantic segmentation model.
  11. The device according to claim 9, wherein the semantic segmentation model is a convolutional neural network, and the training unit is further configured to configure a batch normalization layer for each convolutional layer in the semantic segmentation model and to set a rectified linear unit (ReLU) activation layer after the batch normalization layer, wherein each convolutional layer corresponds to the training human hand images with palm contours for different photographing backgrounds; based on the batch normalization layer, during forward propagation in training, the batch normalization layer only stores the mean and variance of the input weights, and the weight output returned to the convolutional layer is still the original convolved weight; and, during backward propagation in training, chain-rule differentiation is performed through each convolutional layer and the ReLU activation layer according to the mean and variance in the batch normalization layer, to obtain the gradient and the current training rate.
  12. The device according to claim 7, wherein the semantic segmentation model is a Segnet model, the Segnet model comprising multiple groups of convolutional layers, all using 'same' convolution, and corresponding deconvolutional layers, wherein the convolutional layers form an encoder network for extracting features and the deconvolutional layers form a decoder network for performing non-linear upsampling operations.
  13. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, wherein the processor, when executing the computer-readable instructions, implements a method for extracting the palm region of an image, comprising: acquiring a human hand image to be recognized; determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input; and extracting the palm region from the human hand image according to the palm contour;
    wherein the processor, when executing the computer-readable instructions, implements determining the palm contour of the palm region in the human hand image based on the semantic segmentation model by: segmenting closed image contours in the human hand image based on the semantic segmentation model; and, when there are multiple closed image contours, calculating the area enclosed by each of the multiple closed image contours and determining the closed image contour with the largest enclosed area as the palm contour.
  14. The computer device according to claim 13, wherein the processor, when executing the computer-readable instructions, implements extracting the palm region from the human hand image according to the palm contour by: detecting whether there are other closed image contours inside the palm contour; if there are other closed image contours, determining the other closed image contours as hole regions; and filling the hole regions according to the content of the human hand image within the palm region.
  15. The computer device according to claim 13, wherein the method implemented by the processor when executing the computer-readable instructions further comprises a training process for the semantic segmentation model, the training process comprising: acquiring multiple training human hand images, wherein the multiple training human hand images include training human hand images corresponding to different photographing backgrounds; extracting the palm contours corresponding to each of the multiple training human hand images; and inputting the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model, so that the trained semantic segmentation model can segment palm contours from human hand images with different backgrounds.
  16. The computer device according to claim 15, wherein the processor, when executing the computer-readable instructions, implements inputting the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model by: extracting, based on the encoder network in the semantic segmentation model, the palm contour features of each of the input training human hand images; and performing, based on the decoder network in the semantic segmentation model, non-linear upsampling operations using the pooling indices calculated in the max-pooling steps of the corresponding encoder network, so as to map the extracted palm contour features to full-input-resolution feature maps for pixel classification, thereby training the semantic segmentation model.
  17. A non-volatile readable storage medium on which computer-readable instructions are stored, wherein the computer-readable instructions, when executed by a processor, implement a method for extracting the palm region of an image, comprising: acquiring a human hand image to be recognized; determining, based on a semantic segmentation model, the palm contour of the palm region in the human hand image, wherein the semantic segmentation model is trained with training human hand images that have different shooting backgrounds and annotated palm contours as input; and extracting the palm region from the human hand image according to the palm contour;
    wherein determining the palm contour of the palm region in the human hand image based on the semantic segmentation model comprises: segmenting closed image contours in the human hand image based on the semantic segmentation model; and, when there are multiple closed image contours, calculating the area enclosed by each of the multiple closed image contours and determining the closed image contour with the largest enclosed area as the palm contour.
  18. The storage medium according to claim 17, wherein extracting the palm region from the human hand image according to the palm contour comprises: detecting whether there are other closed image contours inside the palm contour; if there are other closed image contours, determining the other closed image contours as hole regions; and filling the hole regions according to the content of the human hand image within the palm region.
  19. The storage medium according to claim 17, wherein the method further comprises a training process for the semantic segmentation model, the training process comprising: acquiring multiple training human hand images, wherein the multiple training human hand images include training human hand images corresponding to different photographing backgrounds; extracting the palm contours corresponding to each of the multiple training human hand images; and inputting the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model, so that the trained semantic segmentation model can segment palm contours from human hand images with different backgrounds.
  20. The storage medium according to claim 19, wherein inputting the multiple training human hand images with the palm contours into the semantic segmentation model to train the semantic segmentation model comprises: extracting, based on the encoder network in the semantic segmentation model, the palm contour features of each of the input training human hand images; and performing, based on the decoder network in the semantic segmentation model, non-linear upsampling operations using the pooling indices calculated in the max-pooling steps of the corresponding encoder network, so as to map the extracted palm contour features to full-input-resolution feature maps for pixel classification, thereby training the semantic segmentation model.
PCT/CN2019/117713 2019-05-10 2019-11-12 Method and device for extracting the palm region of an image WO2020228279A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910390289.5A 2019-05-10 2019-05-10 Method and device for extracting the palm region of an image
CN201910390289.5 2019-05-10

Publications (1)

Publication Number Publication Date
WO2020228279A1 (zh)

Family

ID=68001533

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/117713 2019-05-10 2019-11-12 Method and device for extracting the palm region of an image WO2020228279A1 (zh)

Country Status (2)

Country Link
CN (1) CN110287771A (zh)
WO (1) WO2020228279A1 (zh)


Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110287771A (zh) 2019-05-10 2019-09-27 平安科技(深圳)有限公司 Method and device for extracting the palm region of an image
CN111046835A (zh) 2019-12-24 2020-04-21 杭州求是创新健康科技有限公司 Multi-disease detection system for fundus photographs based on a regional-feature-set neural network
CN111563477A (zh) 2020-05-21 2020-08-21 苏州沃柯雷克智能***有限公司 Method, device, equipment and storage medium for acquiring a qualified hand photo
CN111782219B (zh) 2020-07-16 2024-03-22 矩阵元技术(深圳)有限公司 TensorFlow-based custom type implementation method and device
CN113808151A (zh) 2021-09-09 2021-12-17 广州方硅信息技术有限公司 Weak-semantic contour detection method, device, equipment and storage medium for live-stream images


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9213890B2 (en) * 2010-09-17 2015-12-15 Sony Corporation Gesture recognition system for TV control
CN105701513B (zh) * 2016-01-14 2019-06-07 深圳市未来媒体技术研究院 Method for quickly extracting the palmprint region of interest
CN106845388B (zh) * 2017-01-18 2020-04-14 北京交通大学 Method for extracting the palmprint region of interest on a mobile terminal in complex scenes
CN107256395A (zh) * 2017-06-12 2017-10-17 成都芯软科技股份公司 Palm vein extraction method and device
CN107808143B (zh) * 2017-11-10 2021-06-01 西安电子科技大学 Dynamic gesture recognition method based on computer vision

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104680123A (zh) * 2013-11-26 2015-06-03 富士通株式会社 Object recognition device, object recognition method, and program
US20180307911A1 (en) * 2017-04-21 2018-10-25 Delphi Technologies, Llc Method for the semantic segmentation of an image
CN109426825A (zh) * 2017-08-31 2019-03-05 北京图森未来科技有限公司 Method and device for detecting the closed contour of an object
CN108876791A (zh) * 2017-10-23 2018-11-23 北京旷视科技有限公司 Image processing method, device and system, and storage medium
CN108389210A (zh) * 2018-02-28 2018-08-10 深圳天琴医疗科技有限公司 Medical image segmentation method and device
CN110287771A (zh) * 2019-05-10 2019-09-27 平安科技(深圳)有限公司 Method and device for extracting the palm region of an image

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496482A (zh) * 2021-05-21 2021-10-12 郑州大学 Drug-driving test strip image segmentation model, localization and segmentation method, and portable device
CN113496482B (zh) * 2021-05-21 2022-10-04 郑州大学 Drug-driving test strip image segmentation model, localization and segmentation method, and portable device
CN113592885A (zh) * 2021-06-29 2021-11-02 中南大学 Large obstacle contour segmentation method based on the SegNet-RS network
CN113592885B (zh) * 2021-06-29 2024-03-12 中南大学 Large obstacle contour segmentation method based on the SegNet-RS network

Also Published As

Publication number Publication date
CN110287771A (zh) 2019-09-27

Similar Documents

Publication Publication Date Title
WO2020228279A1 (zh) Method and device for extracting the palm region of an image
US11532154B2 (en) System and method for providing dominant scene classification by semantic segmentation
CN110662484B (zh) System and method for full-body measurement extraction
CN103617432B (zh) Scene recognition method and device
AU2017261537B2 (en) Automated selection of keeper images from a burst photo captured set
EP3338217B1 (en) Feature detection and masking in images based on color distributions
JP6905602B2 (ja) Image illumination method, device, electronic apparatus, and storage medium
CN106056064B (zh) Face recognition method and face recognition device
Bako et al. Removing shadows from images of documents
JP2017531950A (ja) Method and device for building a photographing template database and providing photographing recommendation information
WO2021078001A1 (zh) Image enhancement method and device
US9256950B1 (en) Detecting and modifying facial features of persons in images
WO2022021029A1 (zh) Detection model training method and device, detection model usage method, and storage medium
CN109637664A (zh) BMI evaluation method, device, and computer-readable storage medium
CN108198177A (zh) Image acquisition method, device, terminal, and storage medium
CN107395960A (zh) Photographing method and device, computer device, and computer-readable storage medium
CN109274891B (zh) Image processing method, device, and storage medium
CN110807759A (zh) Photo quality evaluation method and device, electronic device, and readable storage medium
CN110415212A (zh) Abnormal cell detection method, device, and computer-readable storage medium
WO2024021742A1 (zh) Gaze point estimation method and related device
CN108492301A (zh) Scene segmentation method, terminal, and storage medium
CN109325903A (zh) Method and device for stylized image reconstruction
CN105247606B (zh) Photo display method and user terminal
CN110298327A (zh) Visual effects processing method and device, storage medium, and terminal
CN112651333A (zh) Silent liveness detection method and device, terminal device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19928666

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19928666

Country of ref document: EP

Kind code of ref document: A1