CN114511910A - Face brushing payment intention identification method, device and equipment - Google Patents

Face brushing payment intention identification method, device and equipment

Info

Publication number
CN114511910A
Authority
CN
China
Prior art keywords
image
face
brushing
candidate
face brushing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210180456.5A
Other languages
Chinese (zh)
Inventor
尹英杰 (Yin Yingjie)
丁菁汀 (Ding Jingting)
李亮 (Li Liang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202210180456.5A priority Critical patent/CN114511910A/en
Publication of CN114511910A publication Critical patent/CN114511910A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00Payment architectures, schemes or protocols
    • G06Q20/38Payment protocols; Details thereof
    • G06Q20/40Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401Transaction verification
    • G06Q20/4014Identity check for transactions
    • G06Q20/40145Biometric identity checks

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Accounting & Taxation (AREA)
  • Computer Security & Cryptography (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The embodiments of this specification disclose a method, an apparatus, and a device for recognizing the face brushing payment willingness. The scheme comprises the following steps: acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image; determining candidates to be identified in the face brushing 2D image, and generating a corresponding mask map for the first located area of each candidate in the face brushing 2D image, so as to distinguish the first located area from other areas in the face brushing 2D image; extracting the features of the face brushing 2D image, and obtaining a first fusion feature according to the features of the face brushing 2D image and the mask map; extracting the features of the face brushing 3D image, and obtaining a second fusion feature according to the first fusion feature and the features of the face brushing 3D image; and identifying whether each candidate has the face brushing payment willingness according to the second fusion feature. The security of face brushing payment can thereby be improved.

Description

Face brushing payment intention identification method, device and equipment
Technical Field
The specification relates to the technical field of machine learning, in particular to a method, a device and equipment for recognizing willingness to pay by brushing face.
Background
With the development of computer and Internet technologies, many services can be handled online, which has promoted the development of various online service platforms. Among them, face brushing payment is a new payment mode based on technologies such as artificial intelligence, machine vision, 3D sensing, and big data. By adopting face recognition for identity authentication, it brings great convenience to users and is widely welcomed.
At present, in a face-brushing payment scene, a user to be paid needs to stand in front of equipment with a face-brushing payment function to perform face recognition after starting face-brushing payment. However, during the face brushing process, a plurality of users may stand in front of the device, which may cause the plurality of users to appear in the face brushing image captured by the device. At this time, when the device performs face recognition on the face brushing image, it is difficult to determine which user is the current user to be paid, that is, which user has a will of face brushing payment. In other words, only the current user to be paid has a willingness to swipe a face, and the other users do not have a willingness to swipe a face.
Based on this, face brushing payment willingness recognition is an important link in safeguarding face brushing security in the payment system and helps improve the safe face brushing experience. However, if the device mistakenly recognizes another user and authenticates that user, an erroneous face brushing payment will occur, reducing the security of face brushing payment.
Based on this, a more secure identification scheme is required for face-brushing payments.
Disclosure of Invention
One or more embodiments of the present specification provide a method, an apparatus, a device, and a storage medium for recognizing a willingness to pay by swiping a face, so as to solve the following technical problems: a more secure identification scheme is needed for face-brushing payments.
To solve the above technical problem, one or more embodiments of the present specification are implemented as follows:
one or more embodiments of the present specification provide a method for recognizing a willingness to pay by swiping a face, including:
acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image;
determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image;
extracting the features of the face brushing 2D image, and obtaining a first fusion feature according to the features of the face brushing 2D image and the mask image;
extracting the features of the face brushing 3D image, and obtaining second fusion features according to the first fusion features and the features of the face brushing 3D image;
and identifying whether each candidate has a willingness to swipe face according to the second fusion characteristic.
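To make the flow of these steps more concrete, the following is a minimal Python-style sketch of the overall pipeline. Every helper on the `modules` object (e.g. `detect_candidates`, `backbone_2d`, `fuse_mask`, `head`) is an illustrative assumption introduced here for exposition, not part of the disclosed implementation.

```python
import torch

def recognize_payment_willingness(rgb_image, depth_image, modules):
    """Hypothetical end-to-end flow mirroring the five steps above."""
    # Step 1: the face brushing 2D image (rgb_image) and its corresponding
    # 3D/depth image are assumed to be captured at the same moment.
    # Step 2: detect candidates and build one binary mask map per candidate.
    boxes = modules.detect_candidates(rgb_image)
    masks = [modules.make_mask(rgb_image.shape[-2:], b) for b in boxes]

    feat_2d = modules.backbone_2d(rgb_image)    # Step 3a: 2D-image features
    feat_3d = modules.backbone_3d(depth_image)  # Step 4a: 3D-image features

    results = []
    for mask in masks:
        fused_1 = modules.fuse_mask(feat_2d, mask)           # Step 3b: first fusion feature
        fused_2 = modules.fuse_depth(fused_1, feat_3d)       # Step 4b: second fusion feature
        prob = torch.softmax(modules.head(fused_2), dim=-1)  # Step 5: willingness probability
        results.append(prob)
    return results
```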
One or more embodiments of the present specification provide a device for recognizing a willingness to pay by swiping a face, including:
the acquisition module is used for acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image;
the generating module is used for determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image;
the first extraction module is used for extracting the features of the face brushing 2D image and obtaining first fusion features according to the features of the face brushing 2D image and the mask image;
the second extraction module is used for extracting the features of the face brushing 3D image and obtaining second fusion features according to the first fusion features and the features of the face brushing 3D image;
and the identification module identifies whether each candidate has a face brushing willingness to pay according to the second fusion characteristics.
One or more embodiments of the present specification provide a device for recognizing a willingness to pay by swiping a face, including:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image;
determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image;
extracting the features of the face brushing 2D image, and obtaining a first fusion feature according to the features of the face brushing 2D image and the mask image;
extracting the features of the face brushing 3D image, and obtaining second fusion features according to the first fusion features and the features of the face brushing 3D image;
and identifying whether each candidate has a willingness to swipe face according to the second fusion characteristic.
One or more embodiments of the present specification provide a non-volatile computer storage medium having stored thereon computer-executable instructions configured to perform the above method for recognizing the face brushing payment willingness.
at least one technical scheme adopted by one or more embodiments of the specification can achieve the following beneficial effects:
by respectively generating corresponding mask maps for the first areas of the candidates in the face brushing 2D image, the feature information of the candidates can be clearer, the difference between the face brushing willingness and the face brushing willingness is increased, the image comparison effect can be enhanced through the first fusion feature, so that the attention is focused on the candidates with the face brushing willingness, whether each candidate has the face brushing willingness or not is identified through the second fusion feature, the features of the corresponding candidates in the face brushing 3D image and the features of the candidates in the face brushing 2D image are combined and mutually supplement the facial features of the same candidate, the accuracy of face recognition can be further improved, and the candidates with the face brushing willingness and the candidates without the face brushing willingness in the face brushing image can be more accurately distinguished, the face brushing payment willingness of the candidate in the face brushing image is identified in a more targeted manner, so that the face brushing safety experience can be enhanced.
Drawings
In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, it is obvious that the drawings in the following description are only some embodiments described in the present specification, and for those skilled in the art, other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic flow chart of a method for recognizing a willingness to pay by swiping a face according to one or more embodiments of the present disclosure;
fig. 2 is a schematic diagram of a framework of a system for recognizing a willingness to swipe payment provided in one or more embodiments of the present specification;
fig. 3 is a schematic flowchart of a method for recognizing a willingness to pay by swiping based on end-to-end learning of a deep convolutional neural network, according to one or more embodiments of the present disclosure;
fig. 4 is a schematic structural diagram of a device for recognizing a willingness to pay by brushing face according to one or more embodiments of the present disclosure;
fig. 5 is a schematic structural diagram of a device for recognizing a willingness to swipe payment according to one or more embodiments of the present disclosure.
Detailed Description
The embodiment of the specification provides a method, a device, equipment and a storage medium for recognizing a brushing willingness-to-pay.
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any inventive step based on the embodiments of the present disclosure, shall fall within the scope of protection of the present application.
Fig. 1 is a schematic flowchart of a method for recognizing a willingness to pay by swiping a face according to one or more embodiments of the present disclosure. The process can be executed by an electronic device with a face-brushing payment function, the electronic device can be a terminal with an image data processing function, for example, the electronic device can be a mobile terminal such as a mobile phone, a tablet, a notebook, and the like, and can also be a fixed terminal such as a desktop, or a server, and some input parameters or intermediate results in the process allow manual intervention and adjustment to help improve accuracy.
In one or more embodiments of the present specification, a candidate to be identified refers to a user who needs to pay a related fee. It should be noted that if a candidate wants to use face brushing payment, the candidate needs to register identity information on a corresponding client and enroll face information, so that when the candidate starts face brushing payment and is recognized as having the face brushing payment willingness, the candidate can be authenticated against the pre-registered face information.
That is, the face information of the candidate is included in the face brushing image, whether the candidate has a willingness to brush the face can be obtained by identifying the face information, and then the candidate can be authenticated through the face information. The brushing face image includes at least one of a brushing face 2D image and a brushing face 3D image.
The electronic device may acquire the face brushing image through the pre-installed image acquisition device after receiving the face brushing payment instruction, or the electronic device may generate the face brushing payment instruction according to the payment order and acquire the face brushing image through the image acquisition device.
The number of candidates in front of the image capturing device may be one or more, and when the number of candidates is plural, the brushing image includes plural candidates, and when the number of candidates is one, the brushing image includes one candidate. Meanwhile, the face brushing image not only includes face information of the candidate, but also includes other characteristic information of the candidate, such as trunk information and limb information, and also includes other objects which do not need to be identified, such as tables and chairs, hanging objects and the like included in the environment where the candidate is located.
In addition, in a normal case, when the electronic device executes a single face-brushing payment instruction, the specific candidate currently starting face-brushing payment is subjected to identity authentication, and the specific candidate generally has a face-brushing willingness, that is, when the single face-brushing payment instruction is executed, even if a plurality of candidates are included in a face-brushing image, the plurality of candidates do not all have the face-brushing willingness, only the specific candidate has the face-brushing willingness, and the specific candidate can be considered as safe in the willingness to pay, and other candidates are not safe in the willingness to pay.
For example, in public places, offline Internet of Things (IoT) face brushing machines are often used for face brushing payments. A face brushing IoT machine is a machine with a face brushing function deployed in public consumption scenarios such as supermarkets, convenience stores, restaurants, hotels, campus education, and medical facilities. A face brushing IoT machine generally integrates various sensing information, such as 2D vision and 3D vision, so as to support face recognition in the face brushing payment system.
If candidate A clicks face brushing payment and starts it, the IoT face brushing machine receives a face brushing payment instruction and acquires a face brushing image through the image acquisition device. In an open public place, multiple candidates may be queuing to pay, so the face brushing image acquired by the IoT machine may include multiple candidates. Among these candidates, however, only candidate A actually has the face brushing payment willingness; the other candidates do not. In this case, only on the premise that candidate A is determined to have the face brushing payment willingness should candidate A be authenticated through candidate A's face information.
Further, if candidate B is queued behind candidate A or stands side by side with candidate A, the image acquisition device may capture candidate B together with candidate A during the face brushing payment authentication even though candidate B has no face brushing willingness, so that the captured face brushing image includes both candidate A and candidate B. If, when recognizing the face brushing image, the electronic device does not regard candidate A as the face brushing user but mistakenly recognizes candidate B, and directly authenticates candidate B's identity without identifying whether candidate B has the face brushing payment willingness, then after the authentication passes, the payment is made through candidate B's account. Candidate B's assets are charged by mistake and candidate B suffers a loss; that is, user A starts the face brushing but user B is charged. Based on this, a safer recognition scheme for face brushing payment is provided to improve the safe face brushing experience, as described in detail below with reference to Fig. 1 and related contents.
The process in fig. 1 may include the following steps:
s102: and acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image.
In one or more embodiments of the present disclosure, the face brushing 2D image is a planar image: it carries no three-dimensional information and has no stereoscopic effect; that is, a 2D image lies in a plane and is a two-dimensional image. The face brushing 3D image carries three-dimensional information. It should be noted that the face brushing 3D image may represent a stereoscopic image directly or indirectly; a typical face brushing 3D image is a depth image, which represents 3D by combining 2D information with depth information, and some embodiments below mainly take the depth image as an example. A depth image, also known as a range image, is an image whose pixel values are the distances (depths) from the image capture device to points in the scene, and it directly reflects the geometry of the visible surface of the scene. The face brushing 2D image, by contrast, corresponds to a projection of the candidate to be identified onto a plane.
It should be noted that, when a face brushing 2D image is collected, mainly RGB color images of a candidate to be identified are obtained, only planar RGB image information is obtained, and depth information is not required. When the face brushing 3D image is collected, one-dimensional depth information is added compared with a face brushing 2D image, namely an RGB image and a depth image D are required to be obtained, and the RGB image and the depth image D together are an RGBD image.
However, since the face brushing 2D image is equivalent to a projection of the candidate to be recognized onto a plane, face recognition on the face brushing 2D image is disturbed by external factors such as illumination, posture, and expression, which affects the accuracy of face recognition and therefore the accuracy of face brushing payment willingness recognition; the face brushing 3D image, on the other hand, is more stereoscopic. Therefore, the two-dimensional and three-dimensional information of the face should be fully utilized. Multi-modal recognition refers to recognition by fusing multi-modal information, so recognition accuracy can be improved by using multi-modal visual information, i.e., the face brushing 2D image and the face brushing 3D image.
Based on this, an image acquisition device that provides both color and depth data is used to collect face information; such a device can output a color image and depth data simultaneously.
That is, for each face brushing 2D image there is a face brushing 3D image captured at the same time, i.e., each face brushing 2D image has its own corresponding face brushing 3D image.
The electronic device may acquire the face brushing 2D image and the face brushing 3D image through the pre-installed image acquisition device after receiving the face brushing payment instruction, or the electronic device may generate the face brushing payment instruction according to the payment order, and acquire the face brushing 2D image and the face brushing 3D image through the image acquisition device.
S104: determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image.
In one or more embodiments of the present specification, a location area of the candidate in the face brushing 2D image is referred to as a first location area, and the first location area may include appearance feature information of the candidate, such as face information, torso information, and limb information, but in order to increase accuracy of the recognition result, the first location area mainly includes face information of the candidate. Meanwhile, the first located area can be determined according to the position information of the candidate in the face brushing 2D image.
The masking operation on an image recalculates the value of each pixel in the image through a mask kernel: the mask kernel describes the degree to which the pixels in a neighborhood influence the new pixel value, and the pixels are weighted and averaged according to the weight factors in the mask operator. Image masking operations are commonly used in areas such as image smoothing, edge detection, and feature analysis. Therefore, the first located region of a candidate in the face brushing 2D image can be distinguished from the other regions in the face brushing 2D image by the masking operation.
It should be noted that, a first region of a single candidate in the face brushing 2D image corresponds to a single mask map, that is, if there are multiple candidates in the face brushing 2D image, a corresponding mask map is generated for each candidate, and multiple mask maps are finally obtained.
That is, in the single mask map, the first region can be distinguished from other regions, for example, the first region filling value is 1, and the other region filling value is 0. That is, by generating a mask map corresponding to each candidate, the feature information of the candidate can be made clearer, and the difference between the willingness to swipe a face and the willingness to swipe no face can be increased.
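As a minimal illustration of the "1 inside, 0 outside" filling, the sketch below builds one binary mask per candidate with numpy. It assumes rectangular located regions given as (x1, y1, x2, y2) boxes; a later embodiment instead uses a circular first filling area, for which a separate sketch is given further below.

```python
import numpy as np

def make_rect_masks(image_hw, boxes):
    """Build one binary mask per candidate: 1 inside the candidate's first
    located area, 0 elsewhere.
    image_hw: (height, width) of the face brushing 2D image.
    boxes: list of (x1, y1, x2, y2) located regions, one per candidate."""
    h, w = image_hw
    masks = []
    for (x1, y1, x2, y2) in boxes:
        mask = np.zeros((h, w), dtype=np.float32)
        mask[int(y1):int(y2), int(x1):int(x2)] = 1.0  # first located area filled with 1
        masks.append(mask)
    return masks
```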
S106: and extracting the features of the face brushing 2D image, and obtaining a first fusion feature according to the features of the face brushing 2D image and the mask image.
In one or more embodiments of the present specification, how to extract the features of the brushed face 2D image is not limited herein, for example, the features of the brushed face 2D image are extracted by the first feature extraction model. The features of the brushed 2D image may include face features, torso features, and extremity features of each candidate. The face features can be global features of faces of the candidate persons, and the accuracy of the recognition result can be improved by recognizing the faces through the global features.
Of course, after the mask map is obtained, the first fused feature may be obtained by inputting the feature of the brushed 2D image and the mask map into the first fused feature extraction model.
By extracting the features of the face brushing 2D image and combining those features with the mask map, a channel is effectively added to the features of the face brushing 2D image, i.e., the number of feature channels is increased, thereby obtaining the first fusion feature. In the first fusion feature, more attention is paid to the face features of the corresponding candidate in the face brushing 2D image, so candidates with the face brushing payment willingness can be distinguished more accurately from candidates without it.
S108: and extracting the features of the face brushing 3D image, and obtaining a second fusion feature according to the first fusion feature and the features of the face brushing 3D image.
In one or more embodiments of the present specification, how to extract the features of the brushed face 3D image is not limited herein, and for example, the features of the brushed face 3D image may be extracted by the second feature extraction model. The feature of the brushed face 3D image may be extracted by performing feature detection on the brushed face 2D image corresponding to the brushed face 3D image, and then corresponding the detection result of the brushed face 2D image to the brushed face 3D image.
The face brushing 3D image corresponds to the face brushing 2D image, so that the features of the face brushing 3D image can also comprise face features, trunk features and limb features of each candidate. The face features can be global features of faces of the candidate persons, and the accuracy of the recognition result can be improved by recognizing the faces through the global features.
Of course, the second fused feature may be obtained by inputting the first fused feature and the feature of the brushed 3D image into the second fused feature extraction model.
By combining the first fusion feature with the features of the face brushing 3D image, a channel is effectively added to the first fusion feature, i.e., the number of its channels is increased, resulting in the second fusion feature. In the second fusion feature, the spatial differences between the face brushing 2D image and the face brushing 3D image are combined: the face features of the corresponding candidate in the face brushing 2D image are supplemented by the features of the face brushing 3D image, and vice versa, so the two modalities complement each other's face features for the same candidate. This improves face recognition accuracy and realizes multi-modal recognition of the face brushing payment willingness, finally allowing candidates with the face brushing payment willingness to be accurately distinguished from candidates without it.
S110: and identifying whether each candidate has a willingness to swipe face according to the second fusion characteristic.
In one or more embodiments of the present specification, feature information of the candidate is determined according to the second fused feature, and whether the candidate has a willingness to swipe a face is identified according to the feature information of the candidate.
Since there may be multiple candidates in the face brushing 2D image, a corresponding mask map is generated for each candidate and multiple mask maps are finally obtained. Therefore, one recognition pass is performed per mask map to identify whether the candidate corresponding to that mask map has the face brushing willingness; that is, when there are multiple mask maps, the face brushing payment willingness recognition process is performed multiple times.
It should be noted that, a preset rule may be combined to identify whether the corresponding candidate has a willingness to swipe a face, for example, if it is identified that the face region of the candidate is located in the middle region, the candidate is considered to have the willingness to swipe a face, or if it is identified that the face region of the candidate occupies a large part of the area of the face-swiped image, and the face angle meets a preset angle threshold, the candidate is considered to have the willingness to swipe a face.
Further, the second fusion feature can be input into the willingness-to-pay recognition model, feature information of the candidate is recognized through the willingness-to-pay recognition model, a processing result is output according to a preset rule, then a face-brushing willingness-to-pay probability value is generated according to the processing result, and whether the candidate has the face-brushing willingness-to-pay can be judged through the face-brushing willingness-to-pay probability value.
The processing result can be a vector which is generated by the willingness-to-pay recognition model and used for representing the probability value of the willingness-to-pay by brushing the face.
For example, if the probability value is greater than the preset probability threshold, the candidate can be considered to have the face brushing willingness, that is, the face brushing payment instruction of the electronic device was generated after this candidate started face brushing payment. If the probability value is less than or equal to the preset probability threshold, the candidate can be considered not to have the face brushing willingness; that is, the face brushing payment instruction of the electronic device was generated not after this candidate started face brushing payment, but after some other candidate did.
Further, if more than one candidate has a probability value greater than the preset probability threshold, the result of the face brushing payment willingness recognition is not credible, and an authentication failure is prompted. Conversely, if no candidate has a probability value greater than the preset probability threshold, the result of the face brushing payment willingness recognition is likewise not credible, and an authentication failure is prompted.
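A minimal decision sketch consistent with this rule is given below; the threshold value, return convention, and function name are assumptions introduced for illustration.

```python
def decide_willingness(probabilities, threshold=0.5):
    """probabilities: one face brushing payment willingness probability per candidate.
    Returns the index of the single candidate judged to have the willingness,
    or None when the result is not credible (zero or multiple candidates pass)."""
    passing = [i for i, p in enumerate(probabilities) if p > threshold]
    if len(passing) != 1:
        return None  # in both cases an authentication failure is prompted
    return passing[0]
```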
Through the method of Fig. 1, a corresponding mask map is generated for the first located area of each candidate in the face brushing 2D image, so the feature information of the candidates becomes clearer and the difference between candidates with and without the face brushing payment willingness is increased; the first fusion feature enhances the image comparison effect so that attention is focused on the candidate with the face brushing payment willingness; and by identifying whether each candidate has the face brushing payment willingness through the second fusion feature, the features of the corresponding candidate in the face brushing 3D image and in the face brushing 2D image are combined and mutually supplement the face features of the same candidate, further improving the accuracy of face recognition. Candidates with the face brushing payment willingness can thus be distinguished more accurately from candidates without it, and the face brushing payment willingness of the candidates in the face brushing image is recognized in a more targeted manner, so the safe face brushing experience can be enhanced.
Based on the process of fig. 1, some specific embodiments and embodiments of the process are also provided in the present specification, and the description is continued below.
In one or more embodiments of the present specification, after the candidates are determined, the face region of each candidate in the face brushing 2D image is extracted; for example, the face of a candidate in the face brushing 2D image is first extracted through a face extraction model, and the candidate's face region is then determined from the position information of the face. The face region is then processed to determine a face region selection frame. The face region selection frame may take various forms, for example a circular frame, a rectangular frame, or an irregular polygonal frame, with the precondition that, to ensure the accuracy of the recognition result, the face region selection frame must completely enclose the candidate's face region.
After the face region selection frame is obtained, a first filling region of a mask image corresponding to the candidate is determined according to the face region selection frame of the candidate. The shape of the first filling area may have various display manners, which are not limited herein, such as a circular area, a rectangular area, an irregular polygonal area, and the like. However, there is a precondition that, in order to ensure the accuracy of the recognition result, the face region selection box is combined to be as close to the actual face region as possible when determining the first filling region.
After the first filling area is determined, a second filling area other than the first filling area is continuously determined in the brush face image, and different filling values are given to the first filling area and the second filling area.
Note that, in order to make the first filled region in the mask map coincide with the face region in the brush face 2D image as much as possible, the mask map having a resolution that coincides with the resolution of the brush face 2D image is generated after different filling values are assigned to the first filled region and the second filled region.
Further, since the face area is mostly a circular area or an elliptical area, in order to make the first filling area more fit to the face area of the candidate, the first filling area is taken as a circular area.
Specifically, when the face region is processed and the face region selection frame is determined, the face region selection frame is determined as a rectangular frame. After the face area of the rectangular frame is obtained, the face frame width and the face frame height are calculated according to the position of the rectangular frame in the face brushing image, and the radius of the circular area is calculated according to the face frame width and the face frame height.
When the radius of the circular area is calculated, because the face area is a circular area or an elliptical area, when a rectangular frame is initially generated, the face area is similar to an inscribed circle of the rectangular frame, so that the face area is restored to the maximum extent, and the first filling area is ensured to include all the face area as much as possible, and the maximum value between the half length of the width of the rectangular frame and the half length of the height of the rectangular frame is taken as the radius of the circular area.
Therefore, the center of the rectangular frame is used as the center of circle, the half length of the longest side of the rectangular frame is determined as the radius, and then the first filling area of the mask map corresponding to the candidate is determined based on the circular area formed by the center of circle and the radius.
For example, assume that the position of the candidate's face rectangular frame in the face brushing image is (x1, y1, x2, y2), where x1 and x2 are the position coordinates of the rectangular frame width on the x axis, and y1 and y2 are the position coordinates of the rectangular frame height on the y axis.
The face frame width is then computed as w = x2 - x1, where w is the face frame width and the position coordinate x1 is smaller than x2.
The face frame height is computed as h = y2 - y1, where h is the face frame height and the position coordinate y1 is smaller than y2.
The radius of the circular area is determined as R = max(w/2, h/2), where R is the radius of the circular area.
Based on this, the position coordinates of the center of the rectangular frame, which serves as the center of the circle, are ((x1 + x2)/2, (y1 + y2)/2).
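Under these formulas, a circular mask map for one candidate could be generated as in the following sketch (numpy is assumed, and the mask resolution is taken to match the face brushing 2D image as described above; the function name is illustrative).

```python
import numpy as np

def make_circular_mask(image_hw, box):
    """box: (x1, y1, x2, y2) face rectangular frame of one candidate."""
    h, w = image_hw
    x1, y1, x2, y2 = box
    bw, bh = x2 - x1, y2 - y1                   # face frame width and height
    radius = max(bw / 2.0, bh / 2.0)            # R = max(w/2, h/2)
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # center of the rectangular frame
    ys, xs = np.mgrid[0:h, 0:w]
    mask = ((xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2).astype(np.float32)
    return mask  # first filling area = 1, second filling area = 0
```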
Based on the foregoing description, the present solution is explained in more detail and more intuitively below in conjunction with Figs. 2-3.
Fig. 2 is a schematic diagram of a framework of a system for recognizing a willingness to swipe according to one or more embodiments of the present disclosure.
In one or more embodiments of the present specification, in order to recognize the face brushing payment willingness of a candidate more accurately, a deep convolutional neural network is used to perform end-to-end learning on the multi-modal information, i.e., the face brushing 2D image and the face brushing 3D image obtained during the face brushing payment process, so that the face brushing payment willingness of the candidate can be detected safely. In the end-to-end learning network, a candidate located-region attention mechanism is introduced into the feature learning of the face brushing 2D image, so that the mask map is integrated into the network learning and the network can recognize the face brushing payment willingness of the candidate in the face brushing 2D image in a more targeted way, thereby enhancing the safe face brushing experience. Further, the face brushing 3D image is processed with the depth value of its face area as a reference, so that the network can effectively perceive the face area of the face brushing 3D image. The network features of the face brushing 2D image and of the face brushing 3D image are then fused, realizing multi-modal recognition of the face brushing payment willingness.
As shown in fig. 2, the face brushing willingness-to-pay recognition system is implemented in a manner of performing multi-modal information end-to-end learning by using a deep convolutional neural network, and includes a face brushing 2D image, a mask map generated by a candidate in a first region of the face brushing 2D image, a third convolutional network module, a candidate first region attention mechanism implementation module, a face brushing 3D image, a processed face brushing 3D image, a first convolutional network module, a multi-modal feature fusion module, a fifth convolutional network module, and a network output.
It should be noted that the third convolutional network and the fifth convolutional network have a specific correspondence. That is, the network type of the third and fifth convolutional networks is not limited herein, but they are not independent of each other; for example, the third and fifth convolutional networks belong to different parts of the same recognition network.
The face brushing 2D image, the generated mask image and the processed face brushing 3D image serve as input data of a deep convolutional neural network, and network output is face brushing willingness probability values, namely willingness safety probability and willingness non-safety probability.
It should be noted that, since the depth map is an image in which the distance (depth) from the image capture device to each point in the scene is taken as a pixel value, the farther each pixel point in the face brushing 3D image is from the image capture device, the larger the depth value of the corresponding pixel point is. When the candidate performs the recognition of the brushing will-pay, the candidate is usually closest to the image capturing device relative to other candidates.
Therefore, when the face brushing 3D image contains multiple candidates, the candidates without the face brushing payment willingness are usually farther from the image acquisition device, so the depth values of their corresponding pixel points in the face brushing 3D image are larger than those of the candidate with the face brushing payment willingness. It should be noted that, instead of simply discarding such pixel points completely, they may be filtered in a uniform, coarse-grained way, for example by setting the values representing them to a single specified value. The purpose is to reduce the differences among these pixel points and reduce their contribution to model training and inference, so that the model concentrates on the pixel points within the preset threshold, focusing computation on the more valuable pixel points, improving efficiency, and reducing interference. This improves the accuracy of face brushing payment willingness recognition for the candidate who started the face brushing payment.
Based on the above, in the face brushing payment willingness recognition process, the features of the face brushing 2D image are extracted through the third convolution network. The features of the face brushing 2D image and the mask map are input into the candidate first-located-region attention mechanism implementation module, which processes them and outputs the first fusion feature. The features of the face brushing 3D image are extracted from the processed face brushing 3D image through the first convolution network.
As shown in fig. 2, the candidate first-location region attention mechanism implementation module includes a feature of a face brushing 2D image, a mask image after resolution reduction processing, a fourth convolution network module, and a first fusion feature.
When the candidate first-located-region attention mechanism implementation module processes the features of the face brushing 2D image and the mask map, the mask map is first subjected to resolution reduction so that it matches the features of the face brushing 2D image. The features of the face brushing 2D image and the resolution-reduced mask map are then fused through the fourth convolution network, and the first fusion feature is output.
And then, inputting the features of the brushed 3D images and the first fusion features into a multi-mode feature fusion module, processing the features of the brushed 3D images and the first fusion features by the multi-mode feature fusion module, and outputting second fusion features.
As shown in fig. 2, the multi-modal feature fusion module includes a first fusion feature, a feature of the brushed 3D image, and a second convolution network module.
When the multi-mode feature fusion module is used for processing the features of the face brushing 3D image and the first fusion features, the features of the face brushing 3D image and the first fusion features are fused through a second convolution network, and therefore second fusion features are output.
And finally, inputting the second fusion characteristic into a fifth convolution network, and processing the second fusion characteristic through the fifth convolution network to obtain a processing result.
Next, how to process the face brushing 3D image to filter the pixel points that are beyond the preset threshold from the image capturing device before extracting the features of the face brushing 3D image is described.
Specifically, the depth value of the second located region of each candidate in the face brushing 3D image is first calculated. The region where the candidate is located in the face brushing 3D image is referred to as a second located region. The second located region includes a face region.
Then, among all the pixel points in the second located region, the pixel points whose distance from the image acquisition device exceeds a preset threshold are filtered according to the depth value of the second located region. The preset threshold is set empirically so as to effectively ensure that the depth values of the pixel points corresponding to the candidate with the face brushing payment willingness are smaller than the preset threshold.
Further, the second located region of each candidate in the face brushing 3D image needs to be obtained before its depth value can be calculated. Since the face brushing payment willingness recognition system has already obtained the face brushing 2D image and extracted its features, in order to reduce the computational load of the system, the second located region is obtained by mapping the detection result of the face brushing 2D image into the face brushing 3D image, instead of directly extracting the features of the face brushing 3D image through the second feature extraction model.
Specifically, the face area of each candidate in the brushing face 3D image is determined according to the face area selection box of each candidate in the brushing face 2D image. For example, the position coordinates of the face region selection frame in the face brushing 2D image are extracted, and then the face region selection frame is mapped to the face brushing 3D image according to the position coordinates, so as to obtain the face region of each candidate in the face brushing 3D image.
Finally, the depth value of each candidate's face area in the face brushing 3D image is determined from the average of the depth values within that face area. For example, the average of the depth values is used as the depth value of the face area; that is, the depth measurements within the face region are averaged to obtain the depth value of the face.
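A sketch of this mapping and averaging follows, assuming the depth map is pixel-aligned with the face brushing 2D image and stored as a numpy array; treating zero depth as a missing measurement is an assumption added here.

```python
import numpy as np

def face_region_depth(depth_map, box):
    """Map the face selection box from the 2D image onto the aligned depth map
    and use the mean depth inside the box as the face region depth value."""
    x1, y1, x2, y2 = [int(v) for v in box]
    region = depth_map[y1:y2, x1:x2]
    valid = region[region > 0]  # assumption: zero depth marks a missing measurement
    return float(valid.mean()) if valid.size else 0.0
```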
Furthermore, when the pixel points whose distance from the image acquisition device exceeds the preset threshold are filtered according to the depth value of the second located region, for convenience of data processing the depth value of the second located region is taken as a reference, and the depth values of all the pixel points are mapped into a required specified range after processing. This better reflects the differences among the pixel points and facilitates subsequent calculation; the processed depth values can then be filtered so that the pixel points whose distance from the image acquisition device exceeds the preset threshold are removed.
Specifically, the ratio of the depth value of each pixel point to the depth value of the face area is calculated, the depth value of the face area having been obtained as the average of the depth values within the face area. Therefore, the difference between the depth values of the pixel points and the depth value of the face area is not large; that is, if the depth values are all positive, the distribution of the ratios will mostly lie between 0 and 2.
Then, according to the reference value and the ratio of the depth values of the face area, a plurality of depth values of all pixel points are respectively processed to be within the vicinity range of the reference value. The reference value of the depth value of the face area may be set according to actual needs, for example, the reference value of the depth value of the face area is set to 127, that is, if the depth values are positive numbers, then the data distribution of the depth values of all the processed pixel points is, for example, mostly between 0 and 254.
For example, in order to reduce the calculation pressure of the system, the product of the reference value and the ratio of the depth values of the face area may be calculated first, and then the depth values of all the pixel points are respectively processed to be within the vicinity of the reference value according to the product.
Finally, the pixel points whose processed depth values are larger than the preset reference threshold are filtered, where the larger the processed depth value of a pixel point, the farther the pixel point is from the image acquisition device. The preset reference threshold may be set according to actual needs; for example, the reference threshold is set to 127 + 30 = 157, where 30 is an exemplary value that may be adjusted as needed, and the larger the value, the more distant pixel points are taken into consideration.
Further, when the candidate is very close to the image capturing device, the face area of the candidate may not be complete, that is, the image capturing device does not completely capture the face area of the candidate, which may also affect the result of the recognition of the candidate's willingness to pay to brush face. Therefore, in order to ensure the accuracy of the identification result, the pixel points far away from the image acquisition equipment can be filtered, and meanwhile, the pixel points very close to the image acquisition equipment are filtered.
Specifically, a first preset reference threshold and a second preset reference threshold are preset, wherein the first preset reference threshold is smaller than the reference value, and the second preset threshold is larger than the reference value. When the second preset threshold is set, the adjustable interval may be set by using the reference value as a standard.
Filtering the pixel points whose processed depth values are larger than the preset reference threshold, i.e., larger than the second preset reference threshold, removes the pixel points that are very far from the image acquisition device; filtering the pixel points smaller than the first preset reference threshold removes the pixel points that are very close to the image acquisition device. That is, the pixel points whose values lie between the first and second preset reference thresholds are kept as faithfully as possible, while the other pixel points can be generalized or ignored.
Based on the above, the maximum values between the multiple depth values of all the processed pixel points and the first preset reference threshold value can be respectively extracted, the minimum values between the maximum values and the second preset reference threshold value are extracted, the processed pixel points with the depth values larger than the second preset reference threshold value are filtered according to the maximum values and the minimum values, and the processed pixel points with the depth values smaller than the first preset reference threshold value are filtered.
Specifically, for example, the following expression is adopted, in which the max operation handles the pixel points whose values are smaller than the first preset reference threshold and the min operation handles the pixel points whose values are larger than the second preset reference threshold.
For example, the expression is D' = min(max(127 · d / D, 0), 127 + 30), where d is the depth value of a pixel point in the face brushing 3D image, D is the depth value of the face area, 127 is the reference value of the depth value of the face area, 0 is the first preset reference threshold, 30 is the adjustable margin for pixel points far from the image acquisition device (an interval containing 30 can be set as the adjustable interval), 127 + 30 is the second preset reference threshold, min takes the minimum value, and max takes the maximum value.
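A direct numpy transcription of this expression is sketched below; the reconstruction with d as the per-pixel depth and D as the face region depth, as well as the function and parameter names, are assumptions.

```python
import numpy as np

def normalize_depth(depth_map, face_depth, ref=127.0, margin=30.0):
    """Rescale depths so the face region maps to the reference value, then clip:
    D' = min(max(ref * d / D, 0), ref + margin)."""
    scaled = ref * depth_map / face_depth
    return np.clip(scaled, 0.0, ref + margin)
```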
Through the system of Fig. 2, the multi-modal information collected during the face brushing payment process, i.e., the face brushing 2D image and the face brushing 3D image, is used in an end-to-end deep convolutional neural network learning mode. Through the candidate located-region attention mechanism implementation module, the face brushing payment willingness of a candidate in the face brushing image can be judged, and a multi-modal fusion mechanism is introduced into the end-to-end network, so the network can make full use of the multiple modalities to effectively judge the face brushing willingness of the candidate identified in the face brushing image. This realizes multi-modal willingness recognition for candidates in face brushing payment and, especially in public places, effectively prevents the situation where candidate A starts the face brushing but candidate B's assets are charged by mistake, thereby enhancing the safe face brushing experience.
Based on the system in fig. 2, more intuitively, fig. 3 is a schematic flowchart of a method for recognizing a willingness to pay by swiping face based on end-to-end learning of a deep convolutional neural network according to one or more embodiments of the present disclosure.
The flow in fig. 3 may include the following steps:
s302: and acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image.
S304: and calculating the depth value of a second region of each candidate in the face brushing 3D image, and filtering pixel points which are far away from image acquisition equipment and exceed a preset threshold value in all pixel points in the second region according to the depth value of the second region.
S306: and extracting the characteristics of the face brushing 2D image through a third convolution network, and performing resolution reduction processing on the mask image so as to adapt to the characteristics of the face brushing 2D image. It should be noted that the third convolutional network is obtained through supervised training in advance.
For example, by using nearest-neighbor sampling, the resolution of the mask map is reduced to generate a mask map having the same resolution as the features of the face brushing 2D image, so that the mask map is adapted to those features and can be processed by the fourth convolution network.
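For instance, with PyTorch this resolution reduction can be sketched as follows; the N×C×H×W tensor layout and the function name are assumptions.

```python
import torch.nn.functional as F

def downsample_mask(mask, feature_hw):
    """Nearest-neighbor resize of the mask map to the spatial size of the
    face brushing 2D image features, so it can be concatenated as an extra channel."""
    # mask: (N, 1, H, W) binary tensor; feature_hw: (h, w) of the 2D feature map
    return F.interpolate(mask, size=feature_hw, mode="nearest")
```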
S308: and fusing the features of the face brushing 2D image and the mask image after resolution reduction processing through a fourth convolution network to obtain a first fused feature. It should be noted that the fourth convolutional network is obtained through supervised training in advance.
In one or more embodiments of the present specification, in the process of obtaining the first fusion feature, the feature of the face brushing 2D image and the mask map after the resolution reduction processing are connected according to the channel dimension, and the connected feature is input to a fourth convolution network for processing, so as to obtain the first fusion feature.
The number of convolution layers of the fourth convolution network is not specifically limited herein. That is, the fourth convolution network may consist of one or more convolution layers.
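As a hedged illustration of step S308, the sketch below stands in for the fourth convolution network with a single convolution layer; the channel counts and layer depth are assumptions, not the configuration of the embodiments.

```python
import torch
import torch.nn as nn

feat_2d = torch.randn(1, 64, 30, 40)    # assumed features of the face brushing 2D image
mask_small = torch.zeros(1, 1, 30, 40)  # mask map after resolution reduction

# Stand-in for the fourth convolution network (one or more convolution layers).
fourth_conv = nn.Sequential(
    nn.Conv2d(64 + 1, 64, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
)

# Connect along the channel dimension, then fuse to obtain the first fusion feature.
first_fused = fourth_conv(torch.cat([feat_2d, mask_small], dim=1))
```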
S310: and extracting the characteristics of the brushed 3D image through a first convolution network. It should be noted that the first convolutional network is obtained through supervised training in advance.
The number of convolution layers of the first convolution network is not specifically limited herein. That is, the first convolution network may consist of one or more convolution layers.
S312: connecting the first fusion feature with the feature of the face brushing 3D image according to the channel dimension; and inputting the features obtained by the connection into a second convolution network for processing to obtain second fusion features. It should be noted that the second convolutional network is obtained through supervised training in advance.
The second fusion feature has the same number of feature channels and the same resolution as the first fusion feature.
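A corresponding sketch of steps S310 and S312 is given below; the single-layer first and second convolution networks and the matching spatial sizes are assumptions made only so that the channel-dimension connection goes through.

```python
import torch
import torch.nn as nn

depth_img = torch.randn(1, 1, 30, 40)     # assumed (already filtered) face brushing 3D image
first_fused = torch.randn(1, 64, 30, 40)  # first fusion feature from the previous step

first_conv = nn.Conv2d(1, 16, kernel_size=3, padding=1)          # stand-in for the first convolution network
second_conv = nn.Conv2d(64 + 16, 64, kernel_size=3, padding=1)   # stand-in for the second convolution network

feat_3d = first_conv(depth_img)
# The second fusion feature keeps the same channel count and resolution as the first fusion feature.
second_fused = second_conv(torch.cat([first_fused, feat_3d], dim=1))
```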
S314: and inputting the second fusion feature into a fifth convolution network corresponding to the third convolution network for processing to obtain a processing result, wherein the third convolution network and the fifth convolution network are obtained by splitting the same convolution network in advance.
In one or more embodiments of the present specification, the third convolutional network and the fifth convolutional network are obtained by splitting the same convolutional network in advance, for example a backbone network such as ResNet or ShuffleNet V2.
In the splitting process, the third convolutional network and the fifth convolutional network can be respectively used as the front part and the rear part of the same convolutional network.
The position for splitting may be determined by the size of the resolution of the features of the brushed 2D image. Therefore, the target resolution needs to be determined so as to be the resolution of the features of the brushed face 2D image.
When determining the target resolution, the resolution of the face brushing 2D image needs to be taken into account. For example, the resolution of the features of the face brushing 2D image is a preset fraction of the resolution of the face brushing 2D image, and the target resolution is set to that same fraction of the resolution of the face brushing 2D image.
Then, among the convolutional layers in the same convolutional network, the convolutional layer matching the target resolution is determined. Finally, the matched convolutional layer is taken as the splitting point, and the same convolutional network is split into a front part and a rear part, wherein the front part is used as the third convolutional network and the rear part is used as the fifth convolutional network.
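The following PyTorch sketch shows one way such a split could be expressed; the toy backbone and the split index are assumptions, not the ResNet/ShuffleNet V2 configuration of the embodiments.

```python
import torch.nn as nn

# Toy backbone standing in for "the same convolutional network".
backbone = nn.Sequential(
    nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),    # 1/2 resolution
    nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),   # 1/4 resolution
    nn.Conv2d(64, 64, 3, stride=2, padding=1), nn.ReLU(),   # 1/8 resolution
    nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),  # 1/16 resolution
)

split_at = 6  # assumed index of the layer whose output matches the target resolution
third_conv_net = backbone[:split_at]   # front part: extracts features of the face brushing 2D image
fifth_conv_net = backbone[split_at:]   # rear part: processes the second fusion feature
```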
S316: and generating a probability value according to the processing result to indicate whether the corresponding candidate has the willingness to pay by brushing the face.
For example, the probability value is compared with a set threshold probability, and if the probability value is greater than the set threshold probability, it is determined that the will of the candidate is safe, that is, the candidate has a will of face brushing payment. And if the probability value is smaller than or equal to the set threshold probability, determining that the willingness of the candidate is unsafe, namely the candidate does not have the willingness to pay by brushing the face.
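A one-line decision rule in the spirit of this step might look as follows; the 0.5 default is only an assumed example of "the set threshold probability".

```python
def has_face_brushing_willingness(prob: float, threshold: float = 0.5) -> bool:
    # Greater than the set threshold probability -> the candidate's willingness is judged safe.
    return prob > threshold
```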
In light of the foregoing, it is further described how supervised training of the first, second, third, fourth, and fifth convolutional networks is performed before identifying whether each candidate has willingness to swipe a face.
In one or more embodiments of the present description, a training data set needs to be established first, and then network training is performed through the training data set.
Specifically, when the training data set is established, a face brushing 2D sample image containing a face brushing user and a face brushing 3D sample image corresponding to the face brushing 2D sample image are first acquired. That is, when a candidate enables face brushing payment, a face brushing 2D image and a face brushing 3D image are acquired through the image acquisition device; no matter whether a plurality of candidates to be identified exist in the face brushing 2D image, the candidate who enables the payment is used as the face brushing user, and the face brushing 2D image is used as the face brushing 2D sample image. Similarly, no matter whether a plurality of candidates to be identified exist in the face brushing 3D image, that candidate is regarded as the face brushing user in the face brushing 3D image, and the face brushing 3D image is used as the face brushing 3D sample image.
For example, a face brushing image is collected by a camera on an offline IoT face brushing machine. If candidate A enables face brushing payment, candidate A is used as the face brushing user in the face brushing 2D image, and the face brushing 2D image is used as the face brushing 2D sample image.
It should be noted that each time face brushing payment is started, a corresponding face brushing 2D image and face brushing 3D image are collected once. For example, if candidate A enables face brushing payment, the IoT face brushing machine captures images of nearby users to obtain one face brushing 2D image and one face brushing 3D image; if candidate B then enables face brushing payment, the IoT face brushing machine captures images of nearby users again to obtain another face brushing 2D image and face brushing 3D image.
After the face brushing 2D sample image is obtained, the face brushing user is marked as having a face brushing willingness to pay, and a corresponding mask image is generated, so that a positive sample is obtained, namely the positive sample comprises the face brushing 2D sample image, the corresponding mask image and the corresponding face brushing 3D sample image.
When the corresponding mask map is generated, the position of the face region selection frame of the face brushing user is determined in the face brushing 2D image, and the corresponding mask map is then generated from this position.
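A hedged NumPy sketch of one way to turn a face region selection frame into a mask map is given below; it uses the circular fill variant described later for the generating module, and the fill values are assumptions.

```python
import numpy as np

def make_mask(height: int, width: int, box: tuple,
              inside: float = 1.0, outside: float = 0.0) -> np.ndarray:
    """box = (x1, y1, x2, y2) is the face region selection frame in the 2D image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # centre of the rectangular frame
    radius = max(x2 - x1, y2 - y1) / 2.0        # half of the longest side
    ys, xs = np.mgrid[0:height, 0:width]
    mask = np.full((height, width), outside, dtype=np.float32)
    mask[(xs - cx) ** 2 + (ys - cy) ** 2 <= radius ** 2] = inside   # first filling area
    return mask
```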
In addition, when filtering out the pixel points in the face brushing 3D image whose distance from the image acquisition device exceeds the preset threshold, the position of the face region selection frame in the face brushing 2D image is first mapped onto the face brushing 3D depth image, the depth measurement results within the face region of the face brushing 3D depth image are then averaged to obtain the depth value of the face region, and the pixel points whose distance from the image acquisition device exceeds the preset threshold are filtered out according to this depth value.
And when the face brushing user is marked as having the face brushing willingness in the face brushing 2D sample image and the face brushing 3D sample image, the face brushing user can be marked by a willingness mark label, for example, the willingness label is {0,1}, where 1 represents that the face brushing willingness is present, and 0 represents that the face brushing willingness is not present.
After the face brushing 2D sample image and the face brushing 3D sample image are obtained, because a plurality of candidates to be identified may exist in a face brushing sample image, if other users who are incidentally captured are also included in the face brushing 2D sample image and the face brushing 3D sample image, those other users are marked as not having the willingness to pay by brushing the face, and corresponding mask maps are generated, so that negative samples are obtained. For example, if candidate A enables the face brushing payment, candidate B is marked as not having the face brushing willingness in the face brushing 2D sample image and the face brushing 3D sample image.
And finally, performing supervised training on the first convolutional network, the second convolutional network, the third convolutional network, the fourth convolutional network and the fifth convolutional network according to the obtained positive sample and negative sample.
When supervised training is performed, the positive samples and negative samples in the training data set are first randomly sampled to generate training batches and their corresponding willingness labels. The training batches and their labels are then input into the initial deep convolutional neural network, which comprises the untrained first, second, third, fourth and fifth convolution networks.
The initial deep convolutional neural network then outputs a probability value, a loss function is calculated from the probability value and the corresponding label, and the loss function is continuously optimized through gradient descent to train the network, thereby completing the supervised training and obtaining the deep convolutional neural network. The rules for identifying whether a candidate has the willingness to pay by brushing the face are learned through this training. For example, if the deep convolutional neural network recognizes that the face region of a candidate is located in the middle region, the candidate is considered to have the willingness to pay by brushing the face.
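As an illustration of this training procedure, the sketch below assumes a `model` wrapping the five convolution networks, binary cross-entropy as the loss function, and a standard optimizer; all of these names and choices are assumptions made for the sketch, not details fixed by the embodiments.

```python
import torch
import torch.nn as nn

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               batch_2d, batch_mask, batch_3d, labels) -> float:
    """One gradient-descent step on a randomly sampled training batch."""
    prob = model(batch_2d, batch_mask, batch_3d)               # predicted face brushing willingness
    loss = nn.functional.binary_cross_entropy(prob, labels)    # compare with the {0, 1} willingness labels
    optimizer.zero_grad()
    loss.backward()                                            # optimize the loss function by gradient descent
    optimizer.step()
    return loss.item()
```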
Based on the same idea, one or more embodiments of the present specification further provide apparatuses and devices corresponding to the above-described method, as shown in fig. 4 and 5.
Fig. 4 is a schematic structural diagram of a device for recognizing a willingness to pay by brushing face according to one or more embodiments of the present disclosure, where the device includes:
an obtaining module 402, configured to obtain a 2D face brushing image and a 3D face brushing image corresponding to the 2D face brushing image;
a generating module 404, configured to determine candidate persons to be identified in the brushing face 2D image, and generate corresponding mask maps according to a first located region of each candidate person in the brushing face 2D image to distinguish the first located region from other regions in the brushing face 2D image;
the first extraction module 406 is configured to extract features of the face brushing 2D image, and obtain a first fusion feature according to the features of the face brushing 2D image and the mask image;
the second extraction module 408 is configured to extract features of the brushed 3D image, and obtain second fusion features according to the first fusion features and the features of the brushed 3D image;
the identification module 410 identifies whether each candidate has a willingness to swipe face according to the second fusion feature.
Optionally, the apparatus further includes a filtering module, which calculates the depth value of a second region of each candidate in the face brushing 3D image, and filters out, from all pixel points in the second region, the pixel points whose distance from the image acquisition device exceeds a preset threshold, according to the depth value of the second region.
Optionally, the filtering module determines, according to a face region selection frame of each candidate in the brushing 2D image, a face region of each candidate in the brushing 3D image;
calculating a plurality of depth values of all pixel points in the face region;
and determining the depth value of the face area of each candidate in the face brushing 3D image according to the average value of the depth values.
Optionally, the filtering module is configured to calculate ratios between a plurality of depth values of all the pixel points and the depth value of the face region respectively;
according to the reference value and the ratio of the depth values of the face area, respectively processing the depth values of all the pixel points to be in the range near the reference value;
filtering the pixel points corresponding to the processed depth values of all the pixel points which are larger than a preset reference threshold; wherein the larger the processed depth value of a pixel point is, the farther the pixel point is from the image acquisition device.
Optionally, the filtering module calculates a product of a reference value of the depth value of the face region and the ratio;
and respectively processing the depth values of all the pixel points to be in the range near the reference value according to the product.
Optionally, a first preset reference threshold and a second preset reference threshold are set, where the first preset reference threshold is smaller than the reference value, and the second preset reference threshold is larger than the reference value; the filtering module is used for respectively extracting the maximum values between the plurality of depth values of all the processed pixel points and a first preset reference threshold value;
extracting a minimum value between the maximum value and a second preset reference threshold value;
and filtering the processed pixel points with the depth values larger than the second preset reference threshold value and filtering the processed pixel points with the depth values smaller than the first preset reference threshold value according to the maximum value and the minimum value.
Optionally, the second extraction module 408 is configured to extract features of the brushed 3D image through a first convolution network;
connecting the first fusion feature with the feature of the face brushing 3D image according to the channel dimension; and inputting the features obtained by the connection into a second convolution network for processing to obtain second fusion features.
Optionally, the first extraction module 406 is configured to extract features of the brushed 2D image through a third convolutional network;
performing resolution reduction processing on the mask image to adapt to the characteristics of the face brushing 2D image;
and fusing the features of the face brushing 2D image and the mask image after resolution reduction processing through a fourth convolution network to obtain a first fused feature.
Optionally, the first extraction module 406 is configured to connect, according to a channel dimension, the feature of the brushed 2D image and the mask map after the resolution reduction processing;
and inputting the features obtained by the connection into the fourth convolution network for processing to obtain a first fusion feature.
Optionally, the identifying module 410 inputs the second fusion feature into a fifth convolutional network corresponding to the third convolutional network for processing, so as to obtain a processing result, where the third convolutional network and the fifth convolutional network are obtained by splitting the same convolutional network in advance;
and generating a probability value according to the processing result to indicate whether the corresponding candidate has the willingness to pay by brushing the face.
Optionally, the recognition module 410, determining a target resolution as a resolution of a feature of the brushed 2D image;
determining a convolutional layer matched with the target resolution in convolutional layers in the same convolutional network;
and taking the matched convolutional layer as a splitting point, splitting the same convolutional network into a former part and a latter part, wherein the former part is taken as the third convolutional network, and the latter part is taken as the fifth convolutional network.
Optionally, the system further comprises a supervised training module, which acquires a face brushing 2D sample image containing a face brushing user confirmed to brush the face and a face brushing 3D sample image corresponding to the face brushing 2D sample image;
marking the face brushing user as having a face brushing willingness to pay, and generating a corresponding mask map to obtain a positive sample;
if the brushing 2D sample image and the brushing 3D sample image also contain other users which are taken by the way, marking the other users as not having the brushing willingness to pay and generating corresponding mask maps to obtain negative samples;
and carrying out supervised training on the first convolutional network, the second convolutional network, the third convolutional network, the fourth convolutional network and the fifth convolutional network according to the obtained samples.
Optionally, the generating module 404 is configured to, for each determined candidate, perform:
determining a first filling area of a mask image corresponding to the candidate and a second filling area outside the first filling area according to the face area selection box of the candidate;
generating the mask map having a resolution identical to that of the face brushing 2D image by assigning different fill values to the first fill region and the second fill region.
Optionally, the face region selection frame is a rectangular frame; the generating module 404 determines the center of the rectangular frame as a circle center, and determines a half length of the longest side of the rectangular frame as a radius;
and determining a circular area formed based on the circle center and the radius as a first filling area of the mask map corresponding to the candidate.
Optionally, the brushed 2D image includes at least two human faces.
Optionally, the device is applied to an offline IoT face brushing tool, and the face brushing image is acquired by the IoT face brushing tool for nearby users.
Fig. 5 is a schematic structural diagram of a device for recognizing a willingness to pay by brushing face according to one or more embodiments of the present specification, where the device includes:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to:
acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image;
determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image;
extracting the features of the face brushing 2D image, and obtaining a first fusion feature according to the features of the face brushing 2D image and the mask image;
extracting the features of the face brushing 3D image, and obtaining second fusion features according to the first fusion features and the features of the face brushing 3D image;
and identifying whether each candidate has a willingness to swipe face according to the second fusion characteristic.
Based on the same idea, one or more embodiments of the present specification further provide a non-volatile computer storage medium for identifying willingness-to-swipe payment, corresponding to the above method, and storing computer-executable instructions configured to:
acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image;
determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image;
extracting the features of the face brushing 2D image, and obtaining a first fusion feature according to the features of the face brushing 2D image and the mask image;
extracting the features of the face brushing 3D image, and obtaining second fusion features according to the first fusion features and the features of the face brushing 3D image;
and identifying whether each candidate has a willingness to pay by brushing the face according to the second fusion features.
the above description is merely one or more embodiments of the present disclosure and is not intended to limit the present disclosure. Various modifications and alterations to one or more embodiments of the present description will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement or the like made within the spirit and principle of one or more embodiments of the present specification should be included in the scope of the claims of the present specification.

Claims (18)

1. A face brushing willingness-to-pay recognition method comprises the following steps:
acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image;
determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image;
extracting the features of the face brushing 2D image, and obtaining a first fusion feature according to the features of the face brushing 2D image and the mask image;
extracting the features of the face brushing 3D image, and obtaining second fusion features according to the first fusion features and the features of the face brushing 3D image;
and identifying whether each candidate has a willingness to swipe face according to the second fusion characteristic.
2. The method of claim 1, prior to extracting features of the brushed 3D image, the method further comprising:
and calculating the depth value of a second region of each candidate in the face brushing 3D image, and filtering out, from the pixel points in the second region, the pixel points whose distance from the image acquisition device exceeds a preset threshold, according to the depth value of the second region.
3. The method according to claim 2, wherein the calculating the depth value of the second region of each candidate in the 3D face brushing image specifically comprises:
determining the face area of each candidate in the face brushing 3D image according to the face area selection frame of each candidate in the face brushing 2D image;
calculating a plurality of depth values of all pixel points in the face region;
and determining the depth value of the face area of each candidate in the face brushing 3D image according to the average value of the depth values.
4. The method according to claim 3, wherein the filtering, according to the depth value of the second located region, a pixel point which is farther than a preset threshold from the image capturing device includes:
respectively calculating the ratio of the depth values of all the pixel points to the depth value of the face area;
according to the reference value and the ratio of the depth values of the face area, respectively processing the depth values of all the pixel points to be in the range near the reference value;
filtering the pixel points corresponding to the processed depth values of all the pixel points which are larger than a preset reference threshold; wherein the larger the processed depth value of a pixel point is, the farther the pixel point is from the image acquisition device.
5. The method according to claim 4, wherein the processing of the depth values of all the pixels to be within the vicinity of the reference value respectively comprises:
calculating the product of the reference value of the depth value of the face area and the ratio;
and respectively processing the depth values of all the pixel points to be in the range near the reference value according to the product.
6. The method according to claim 4, wherein a first preset reference threshold value and a second preset reference threshold value are set, the first preset reference threshold value is smaller than the reference value, and the second preset reference threshold value is larger than the reference value;
the filtering the processed pixel points corresponding to the depth values of all the pixel points larger than the preset reference threshold specifically includes:
respectively extracting the maximum values between the plurality of depth values of all the processed pixel points and a first preset reference threshold value;
extracting a minimum value between the maximum value and a second preset reference threshold value;
and filtering the processed pixel points with the depth values larger than the second preset reference threshold value and filtering the processed pixel points with the depth values smaller than the first preset reference threshold value according to the maximum value and the minimum value.
7. The method according to claim 1, wherein the extracting features of the brushed 3D image and obtaining a second fused feature according to the first fused feature and the features of the brushed 3D image specifically include:
extracting the characteristics of the face brushing 3D image through a first convolution network;
connecting the first fusion feature with the feature of the face brushing 3D image according to the channel dimension; and inputting the features obtained by the connection into a second convolution network for processing to obtain second fusion features.
8. The method according to claim 1, wherein the extracting the feature of the brushed 2D image and obtaining a first fused feature according to the feature of the brushed 2D image and the mask map specifically include:
extracting the features of the brushed 2D image through a third convolutional network;
performing resolution reduction processing on the mask image to adapt to the characteristics of the face brushing 2D image;
and fusing the characteristics of the brushing face 2D image and the mask image after resolution reduction processing through a fourth convolution network to obtain a first fusion characteristic.
9. The method according to claim 8, wherein fusing, by a fourth convolution network, the feature of the brushed 2D image and the mask map after the resolution reduction processing to obtain a first fused feature specifically includes:
according to the channel dimension, connecting the characteristics of the face brushing 2D image with the mask image after resolution reduction processing; and inputting the connected features into the fourth convolution network for processing to obtain first fusion features.
10. The method of claim 8, wherein identifying whether each of the candidates has a willingness to swipe according to the second fused feature comprises:
inputting the second fusion feature into a fifth convolution network corresponding to the third convolution network for processing to obtain a processing result, wherein the third convolution network and the fifth convolution network are obtained by splitting the same convolution network in advance;
and generating a probability value according to the processing result to indicate whether the corresponding candidate has the willingness to pay by brushing the face.
11. The method according to claim 10, wherein the splitting specifically comprises:
determining a target resolution as a resolution of a feature of the brushed 2D image;
determining a convolutional layer matched with the target resolution in convolutional layers in the same convolutional network;
and taking the matched convolutional layer as a splitting point, splitting the same convolutional network into a former part and a latter part, wherein the former part is taken as the third convolutional network, and the latter part is taken as the fifth convolutional network.
12. The method of claim 10, prior to identifying whether each of the candidates has a willingness to swipe according to the second fused feature, the method further comprising:
acquiring a brushing 2D sample image containing a confirmed brushing user and a brushing 3D sample image corresponding to the brushing 2D sample image;
marking the face brushing user as having a face brushing willingness to pay, and generating a corresponding mask map to obtain a positive sample;
if the brushing 2D sample image and the brushing 3D sample image also contain other users which are taken by the way, marking the other users as not having the brushing willingness to pay and generating corresponding mask maps to obtain negative samples;
and carrying out supervised training on the first convolutional network, the second convolutional network, the third convolutional network, the fourth convolutional network and the fifth convolutional network according to the obtained samples.
13. The method according to claim 1, wherein the generating corresponding mask maps according to the first located region of each candidate in the 2D face brushing image respectively includes:
respectively aiming at each determined candidate, executing:
determining a first filling area of a mask image corresponding to the candidate and a second filling area outside the first filling area according to the face area selection box of the candidate;
generating the mask map having a resolution identical to that of the brushed 2D image by assigning different fill values to the first and second fill regions.
14. The method of claim 13, wherein the face region selection box is a rectangular box;
the determining, according to the face area selection box of the candidate, a first filling area of a mask map corresponding to the candidate specifically includes:
determining the center of the rectangular frame as the circle center, and determining the half length of the longest edge of the rectangular frame as the radius; and determining a circular area formed based on the circle center and the radius as a first filling area of the mask map corresponding to the candidate.
15. A method as claimed in any one of claims 1 to 14, wherein the brushed 2D images comprise at least two human faces.
16. The method of any of claims 1 to 14, applied to an offline IoT face brushing tool, the face brushing image being acquired by the IoT face brushing tool for nearby users.
17. A brushing will-of-payment recognition device, comprising:
the acquisition module is used for acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image;
the generating module is used for determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image;
the first extraction module is used for extracting the features of the face brushing 2D image and obtaining first fusion features according to the features of the face brushing 2D image and the mask image;
the second extraction module is used for extracting the features of the face brushing 3D image and obtaining second fusion features according to the first fusion features and the features of the face brushing 3D image;
and the identification module identifies whether each candidate has a face brushing willingness to pay according to the second fusion characteristics.
18. A brushing willingness-to-pay recognition device, comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to cause the at least one processor to:
acquiring a face brushing 2D image and a face brushing 3D image corresponding to the face brushing 2D image;
determining candidate persons to be identified in the face brushing 2D image, and respectively generating corresponding mask maps according to first located areas of the candidate persons in the face brushing 2D image so as to distinguish the first located areas from other areas in the face brushing 2D image;
extracting the features of the face brushing 2D image, and obtaining a first fusion feature according to the features of the face brushing 2D image and the mask image;
extracting the features of the face brushing 3D image, and obtaining second fusion features according to the first fusion features and the features of the face brushing 3D image;
and identifying whether each candidate has a willingness to swipe face according to the second fusion characteristic.
CN202210180456.5A 2022-02-25 2022-02-25 Face brushing payment intention identification method, device and equipment Pending CN114511910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210180456.5A CN114511910A (en) 2022-02-25 2022-02-25 Face brushing payment intention identification method, device and equipment


Publications (1)

Publication Number Publication Date
CN114511910A true CN114511910A (en) 2022-05-17

Family

ID=81553900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210180456.5A Pending CN114511910A (en) 2022-02-25 2022-02-25 Face brushing payment intention identification method, device and equipment

Country Status (1)

Country Link
CN (1) CN114511910A (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016110005A1 (en) * 2015-01-07 2016-07-14 深圳市唯特视科技有限公司 Gray level and depth information based multi-layer fusion multi-modal face recognition device and method
WO2019114782A1 (en) * 2017-12-14 2019-06-20 徐明德 Cloud biometrics authentication payment and retail management system, and payment method
CN110175514A (en) * 2019-04-15 2019-08-27 阿里巴巴集团控股有限公司 A kind of brush face payment reminding method, device and equipment
US20200364722A1 (en) * 2019-05-16 2020-11-19 Alclear, Llc Biometric payment processing that configures payment processing for a determined merchant of record
CN111611934A (en) * 2020-05-22 2020-09-01 北京华捷艾米科技有限公司 Face detection model generation and face detection method, device and equipment
CN113762033A (en) * 2021-04-20 2021-12-07 腾讯科技(深圳)有限公司 Face recognition method, device, equipment and medium
CN113657903A (en) * 2021-08-16 2021-11-16 支付宝(杭州)信息技术有限公司 Face-brushing payment method and device, electronic equipment and storage medium
CN113591823A (en) * 2021-10-08 2021-11-02 北京的卢深视科技有限公司 Depth prediction model training and face depth image generation method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yeongnam Chae et al., "Seamless Payment System Using Face and Low-Energy Bluetooth", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 30 June 2020 *

Similar Documents

Publication Publication Date Title
JP4755202B2 (en) Face feature detection method
CN102663444B (en) Method for preventing account number from being stolen and system thereof
CN107484428B (en) Method for displaying objects
CN104898832B (en) Intelligent terminal-based 3D real-time glasses try-on method
JP4597391B2 (en) Facial region detection apparatus and method, and computer-readable recording medium
CN106981078B (en) Sight line correction method and device, intelligent conference terminal and storage medium
CN104881114B (en) A kind of angular turn real-time matching method based on 3D glasses try-in
EP4033458A2 (en) Method and apparatus of face anti-spoofing, device, storage medium, and computer program product
JP2009245338A (en) Face image collating apparatus
CN110532965A (en) Age recognition methods, storage medium and electronic equipment
AU2020309094B2 (en) Image processing method and apparatus, electronic device, and storage medium
CN103034330A (en) Eye interaction method and system for video conference
CN110532992A (en) A kind of face identification method based on visible light and near-infrared
CN115039150A (en) Determination method, determination device, and determination program
CN113221767A (en) Method for training living body face recognition model and method for recognizing living body face and related device
KR102316165B1 (en) Apparatus and method for generating attack image of deep learning based face recognition system
CN112396050A (en) Image processing method, device and storage medium
KR101053253B1 (en) Apparatus and method for face recognition using 3D information
CN113538315B (en) Image processing method and device
CN110189350B (en) Method and device for determining pupil edge and storage medium
JP3416666B2 (en) Head posture measurement device and CG character control device
CN114511910A (en) Face brushing payment intention identification method, device and equipment
CN113837019B (en) Cosmetic progress detection method, device, equipment and storage medium
CN113093907B (en) Man-machine interaction method, system, equipment and storage medium
CN113837018A (en) Cosmetic progress detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination