CN113205138A - Human face and human body matching method, equipment and storage medium - Google Patents

Human face and human body matching method, equipment and storage medium

Info

Publication number
CN113205138A
CN113205138A (application CN202110489603.2A)
Authority
CN
China
Prior art keywords
human
human face
face
human body
matching
Prior art date
Legal status
Pending
Application number
CN202110489603.2A
Other languages
Chinese (zh)
Inventor
施志祥
郑腾飞
阮宇艨
Current Assignee
Sichuan Yuncong Tianfu Artificial Intelligence Technology Co Ltd
Original Assignee
Sichuan Yuncong Tianfu Artificial Intelligence Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Sichuan Yuncong Tianfu Artificial Intelligence Technology Co Ltd filed Critical Sichuan Yuncong Tianfu Artificial Intelligence Technology Co Ltd
Priority to CN202110489603.2A
Publication of CN113205138A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the technical field of image processing and provides a human face and human body matching method, equipment and a storage medium, aiming at solving the low accuracy of face-body matching detection when only the position information of the face frame and the human body frame is used. To this end, the method of the invention comprises: acquiring a human body image and a face region in an image; inputting the human body image into a pre-trained human face and human body matching model to obtain a confidence map, wherein the confidence map contains semantic information and position information of the face region; calculating the matching confidence of the human face and the human body from the confidence map and the face region; and judging that the human face matches the human body when the matching confidence is greater than or equal to a face-body matching confidence threshold. Because the invention uses the semantic information and the position information between the human face and the human body jointly in the matching judgment, the accuracy of face-body matching detection can be effectively improved.

Description

Human face and human body matching method, equipment and storage medium
Technical Field
The invention belongs to the technical field of image processing, and particularly provides a human face and human body matching method, human face and human body matching equipment and a storage medium.
Background
Face image recognition technology has been widely applied in many fields such as finance, public security, border inspection and education. It usually requires a relatively clear face image; however, when pedestrians do not actively cooperate, a clear face image is difficult to obtain in most cases owing to factors such as camera angle and environment, so identity recognition through the face image becomes impossible. Human body image recognition technology is much less affected by camera angle and environment: it can link up the motion trajectory of the same pedestrian across multiple cameras, after which face recognition only requires finding one clear frontal face image among the many trajectory images of that pedestrian.
The key step of performing face recognition through the cooperation of human body images and face images is finding the target face on the human body image. However, owing to pedestrian crowding and to the shooting direction and angle of the camera, one human body image usually contains multiple faces, or the single face on the human body image is not the target face corresponding to that human body, which makes finding the target face difficult. Existing methods usually compute a matching degree between the face frame and the human body frame from their coordinate information to judge whether the face matches the human body. However, such methods ignore the semantic information between the face and the human body, which often leads to low matching accuracy; tracking errors of the face frame and the human body frame reduce the accuracy further.
Accordingly, there is a need in the art for a new solution to the above-mentioned problems.
Disclosure of Invention
The invention aims to solve the prior-art problem that matching accuracy is low when the matching of a human face and a human body is determined solely from the position information of the face frame and the human body frame in a single image. In a first aspect, the present invention provides a training method for a human face and human body matching model, where the method includes:
taking the sample image as the input of the human face and human body matching model and obtaining a sample confidence map as the output;
obtaining a supervision signal according to the face area in the sample image;
calculating a loss value by a loss function according to the sample confidence map and the supervisory signal;
updating parameters of the human face and human body matching model according to the loss value and a back propagation algorithm;
wherein the sample image is an image from a public data set in which the human body and the face regions have been identified automatically or manually.
In an embodiment of the above training method for a human face and human body matching model, the human face and human body matching model includes:
a neural network of U-shaped structure;
a sigmoid function, which maps the values of the confidence map output by the neural network into the range 0 to 1;
wherein the confidence map and the human body image have the same size.
In an embodiment of the above training method for a human face and human body matching model, the step of obtaining a supervision signal according to a human face region in the sample image specifically includes:
obtaining a first supervision signal map according to the face regions in the sample image;
dividing the first supervision signal map into three categories: a target face region, a non-target face region and a background region;
retracting the target face region and the non-target face region each toward its center according to a first retraction coefficient and a second retraction coefficient, to obtain a second supervision signal map;
wherein the supervision signal is the second supervision signal map, and the supervision signal and the sample confidence map have the same size.
In an embodiment of the above training method for a human face and human body matching model, the method for calculating the loss value includes:
the loss function selects a cross entropy loss function at a pixel level, and the expression of the loss function corresponding to each pixel is as follows:
$$ L = -\sum_{c=1}^{M} y_c \log(p_c) $$
wherein L is the loss value and M is the number of categories in the supervision signal; $y_c$ is a one-hot vector whose elements take only the values 0 and 1, being 1 when the category is the same as that of the sample and 0 otherwise; and $p_c$ is the probability, predicted in the sample confidence map, that the sample belongs to category c;
the category includes the target face region and the non-target face region, and does not include the background region.
In a second aspect, the present invention provides a human face and body matching method, including:
acquiring a human body image and a human face area in an actual image;
inputting the human body image into a human face and human body matching model trained according to the training method described above, to obtain a confidence map;
calculating to obtain the matching confidence of the human face and the human body according to the confidence map and the human face region;
comparing the matching confidence with a human face and human body matching confidence threshold;
and when the matching confidence is greater than or equal to the human face and human body matching confidence threshold, judging that the human face of the actual image is matched with the human body, otherwise, judging that the human face of the actual image is not matched with the human body.
In an embodiment of the above human face and human body matching method, "calculating a matching confidence of a human face and a human body according to the confidence map and the human face region" includes:
in the confidence map, performing center indentation calculation on the face region according to a first indentation coefficient and a second indentation coefficient to obtain a face indentation region;
and calculating the response mean within the face indentation region of the confidence map as the matching confidence of the human face and the human body.
In one embodiment of the above-mentioned human face and body matching method,
when calculating the matching confidence of the human face and the human body, if multiple face regions exist, the matching confidence of every face region is calculated;
and when the matching confidence is compared with the human face and human body matching confidence threshold, the highest matching confidence is selected for the comparison.
In an embodiment of the above human face and human body matching method, the response mean value calculation method includes:
$$ P = \frac{1}{N} \sum_{i=1}^{N} p_i $$
wherein P is the response mean, N is the number of pixels in the face indentation region in the confidence map, and $p_i$ is the probability, predicted for each pixel, of belonging to the target face.
In a third aspect, the present invention provides a human face and body matching device, comprising a processor and a storage device, wherein the storage device is adapted to store a plurality of program codes, and wherein the program codes are adapted to be loaded and run by the processor to execute the human face and body matching method according to any one of the above aspects.
In a fourth aspect, the present invention provides a storage medium adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by a processor to execute the human face and body matching method according to any one of the above aspects.
The technical scheme for matching the human face and the human body of the invention designs a supervision signal capable of reflecting both the semantic relationship and the position relationship between the human face and the human body, trains the human face and human body matching model with this supervision signal, and judges whether the human face matches the human body from the confidence map output by the model. In addition, the input of the method is a human body image, and the confidence map directly output by the human face and human body matching model serves as the basis for predicting whether the face region matches.
Drawings
Embodiments of the invention are described below with reference to the accompanying drawings, in which:
FIG. 1 is a flow chart of the main steps of an embodiment of the present invention.
Fig. 2 is a schematic view of the main frame structure of the embodiment of the present invention.
Fig. 3 is a schematic diagram of supervisory signal generation for an embodiment of the present invention.
Fig. 4 is a flowchart of main steps of a training method of a human face and human body matching model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Preferred embodiments of the present invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
Referring to fig. 1, fig. 1 is a flow chart of main steps of an embodiment of the present invention, and as shown in fig. 1, a human face and body matching method includes:
step S101: acquiring a human body image and a human face area in an actual image;
step S102: inputting the human body image into a human face and human body matching model which is trained in advance to obtain a confidence map;
step S103: calculating the matching confidence of the human face and the human body according to the confidence map and the face region;
step S104: judging that the human face matches the human body when the matching confidence is greater than or equal to the human face and human body matching confidence threshold, and otherwise judging that they do not match.
With reference to fig. 2, fig. 2 is a schematic diagram of a main frame structure of an embodiment of the present invention, and an implementation flow of the present invention is described with reference to fig. 1 and fig. 2.
In this embodiment, the method for implementing step S101 is as follows: firstly, a human body image to be recognized is obtained from a video stream through a pedestrian detection module 21, and then the human body image is input into a face detection module 22 to obtain a face area, wherein the face area can be identified by a rectangular frame. The face detection module 22 will recognize all face information in the human body image, so the number of face regions is not less than 1, as shown in fig. 3, there are 2 face regions in the figure.
In this embodiment, the image processing is at the pixel level, preferably using a pixel coordinate system whose origin may be chosen at the upper left corner of the image. The position of a face region in the image can be identified by the coordinates of the top-left and bottom-right corners of its rectangular frame; as shown in the supervision signal diagram of Fig. 3, the top-left corner of the rectangular frame of the target face region 33 has coordinates $(x_1, y_1)$ and the bottom-right corner has coordinates $(x_2, y_2)$.
In step S101, the methods of human body detection and face detection are well known in the art; they may be the same or different, and the embodiment of the present invention is not limited thereto. For example, the Faster R-CNN model, the SSD model or the YOLO model can be used for human body detection and face detection, and those skilled in the art can select a suitable technical scheme according to the actual situation.
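Purely as a non-authoritative illustration of step S101, the sketch below obtains human body frames with a pre-trained torchvision Faster R-CNN (assuming torchvision 0.13 or later); face detection would be performed analogously by a dedicated face detector, since the COCO-trained model has no face class. All function names are illustrative, not from the patent.

```python
# Sketch of step S101: detect person boxes with a pre-trained Faster R-CNN.
# Face detection (module 22) would run on each cropped person image afterwards.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def detect_person_boxes(image, score_thresh=0.8):
    """Return (x1, y1, x2, y2) boxes whose COCO label is 1 ('person')."""
    with torch.no_grad():
        out = detector([to_tensor(image)])[0]
    keep = (out["labels"] == 1) & (out["scores"] >= score_thresh)
    return out["boxes"][keep].tolist()
```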
In the present embodiment, human body detection and face detection are performed by the pedestrian detection module 21 and the face detection module 22, respectively, and the face detection builds on the human body detection result. Those skilled in the art may implement human body detection and face detection with other schemes according to the actual situation, for example combining the pedestrian detection module 21 and the face detection module 22 into one module whose single algorithm detects both the human body and the face. Such modifications and substitutions fall within the scope of the invention.
It should be noted that in this embodiment the detected image is an image identified directly in a video stream; the method of the present invention is equally applicable to an image captured or snapshotted from a video, or an image obtained in another manner.
In step S102, the human body image obtained in step S101 is input into the trained human face and human body matching model in the confidence map generation module 23, so as to obtain the confidence map of the human body image to be detected.
The confidence map generation module 23 in Fig. 2 generally comprises the human face and human body matching model, the supervision signal and the loss function; the supervision signal and the loss function are mainly used for training the model and can be disabled when detecting whether a human face and a human body match.
In this embodiment, the structure of the human face and human body matching model mainly comprises a U-shaped deep neural network and a sigmoid function. The U-shaped deep neural network consists of an Encoder part and a Decoder part: the Encoder captures contextual semantic information in the image through successive convolution and downsampling operations to obtain a high-level semantic encoding of the image, and the Decoder maps this high-level semantic information to the confidence map through successive convolution and upsampling operations. If the size of the human body image input to the model is (h, w), the size of the confidence map output after the Encoder and Decoder is also (h, w), so the confidence map has the same size as the input human body image. At this point the confidence map output by the human face and human body matching model already contains the semantic information and position information of the face region.
The values of the confidence map are then mapped into the range 0 to 1 by a sigmoid function to obtain the final confidence map used for the confidence calculation. The sigmoid function is:
$$ S(x) = \frac{1}{1 + e^{-x}} $$
wherein S is the sigmoid function and x is the output value of the U-shaped deep neural network at each pixel of the confidence map.
The U-shaped deep neural network is chosen because, in the course of image processing, semantic information can be mapped into the confidence map through the Encoder and Decoder stages, so that the confidence map contains both the semantic information and the position information of the face in the processed image; using it as the basis for the matching judgment can therefore greatly improve the matching accuracy of the human face and the human body.
Within the confidence map generation module 23, the U-shaped deep neural network is a well-known technology in the art, and the embodiment of the present invention is not limited thereto. Illustratively, the U-shaped deep neural network may adopt a U-Net network, a SegNet network or a UNet++ network, and those skilled in the art can select a suitable technical scheme according to the actual situation.
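Purely as an illustration of the structure described above, the following is a minimal sketch of a U-shaped encoder-decoder with one skip connection and a 1-channel sigmoid head. It is not the patent's exact network (U-Net, SegNet or UNet++ may equally be used); it assumes PyTorch and an input whose height and width are even, and all names are illustrative.

```python
# Minimal U-shaped encoder-decoder sketch: input (B, 3, h, w) body image,
# output (B, 1, h, w) confidence map squashed into [0, 1] by a sigmoid.
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.ReLU(inplace=True),
    )

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc1 = conv_block(3, 32)            # Encoder: capture context
        self.enc2 = conv_block(32, 64)
        self.pool = nn.MaxPool2d(2)              # downsampling
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.dec1 = conv_block(64 + 32, 32)      # Decoder with skip connection
        self.head = nn.Conv2d(32, 1, 1)          # 1-channel confidence logits

    def forward(self, x):
        e1 = self.enc1(x)                        # (B, 32, h, w)
        e2 = self.enc2(self.pool(e1))            # (B, 64, h/2, w/2)
        d1 = self.dec1(torch.cat([self.up(e2), e1], dim=1))
        return torch.sigmoid(self.head(d1))      # same spatial size as the input
```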
In this embodiment, generating the supervision signal means acquiring prior knowledge of the semantic information and position information of the human face: supervision signals are produced for a batch of human body images with known face regions, and these serve as the training set of the model. Optionally, images are selected from the ReID public data set Market-1501, and the target human body and the face regions in each image are identified manually or automatically, such as the face frame 31 and the face frame 32 shown in Fig. 3, where the face frame 31 is the target face region corresponding to the target human body; such images are used as sample images for model training.
Continuing with FIG. 3, a first supervisory signal map is generated from the sample images with identified face frames. In order to embody the semantic relationship and the position relationship of the human face, the first supervision signal graph is divided into three categories: a target face region, a non-target face region and a background region. For ease of presentation in a computer, different categories in the supervisory signal map may be represented in different colors. As an example, a red region in the supervision signal map represents a target face region, a green region represents a non-target face region, and a black region represents a background region, and corresponds one-to-one to each region in the human body image on the position coordinates. As in fig. 3, the target face region 33 in the supervision signal map corresponds to the face region 31 in the sample image, the non-target face region 35 in the supervision signal map corresponds to the face region 32 in the sample image, and the other regions in the supervision signal map and the sample image are both background regions.
Referring further to Fig. 3, in order to reduce the adverse effect that the face boundary region may have on the prediction result, the target face region and the non-target face region in the first supervision signal map are each retracted toward their centers, yielding the second supervision signal map. As shown in Fig. 3, the non-indented target face region 33 has top-left corner $(x_1, y_1)$ and bottom-right corner $(x_2, y_2)$, while the indented target face region 34 has top-left corner $(m_1, n_1)$ and bottom-right corner $(m_2, n_2)$.
Thus the supervision signal is the second supervision signal map, comprising the indented target face region and the indented non-target face regions. The second supervision signal map has the same size as the first supervision signal map, which is obtained directly from the sample image; and since the human body image and the confidence map it yields through the human face and human body matching model also share the same size, the supervision signal and the sample confidence map are of equal size.
The target face region and the non-target face region are retracted toward their centers by linear retraction, with a first retraction coefficient α in the x-axis direction and a second retraction coefficient β in the y-axis direction. As an example, the indentation of the target face frame is calculated as follows:
$$ \begin{aligned} m_1 &= x_1 + \alpha\,(x_2 - x_1) \\ n_1 &= y_1 + \beta\,(y_2 - y_1) \\ m_2 &= x_2 - \alpha\,(x_2 - x_1) \\ n_2 &= y_2 - \beta\,(y_2 - y_1) \end{aligned} $$
the reduction coefficients α and β are usually set by a person skilled in the art according to the characteristics of the images in the sample library, and if α is 0.8 and β is 0.85, the values of α and β may be the same or different.
In this embodiment, the loss is calculated from the sample confidence map output by the human face and human body matching model and the supervision signal, both obtained from the same sample image. The loss function is a pixel-level cross-entropy function, which examines each pixel in turn and compares the class prediction for that pixel with the supervision signal. The closeness of the model's actual output to the desired output can thus be measured by the cross-entropy function.
The loss function for each pixel is expressed as:
$$ L = -\sum_{c=1}^{M} y_c \log(p_c) $$
wherein L is the loss value and M is the number of categories in the supervision signal; $y_c$ is a one-hot vector whose elements take only the values 0 and 1, being 1 when the category is the same as that of the sample and 0 otherwise; and $p_c$ is the probability that the sample belongs to category c. In this embodiment, the value computed by the sigmoid function for each pixel of the sample confidence map may be used as $p_c$.
It should be noted that, in the present embodiment, since the task is matching a human face to a human body and the coordinates of the face regions are known, the purpose is only to distinguish whether a face region is the target face region or a non-target face region. Therefore, when the loss is calculated, only the loss generated by these two categories needs to be computed, and the loss generated by the background region can be ignored.
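As an illustration only, the pixel-level loss can be sketched as below: with a 1-channel sigmoid confidence map, the two face categories reduce to binary cross-entropy with p(target) = conf and p(non-target) = 1 - conf, and background pixels are masked out exactly as described above. PyTorch is assumed and the names are illustrative.

```python
# Sketch of the pixel-level cross-entropy loss, ignoring background pixels.
import torch

def matching_loss(conf_map, label_map, eps=1e-7):
    # conf_map: (B, 1, h, w) in [0, 1]; label_map: (B, h, w) with 0/1/2 labels.
    conf = conf_map.squeeze(1).clamp(eps, 1 - eps)
    mask = label_map > 0                      # keep only the two face categories
    target = (label_map == 1).float()         # 1 for target face, 0 for non-target
    bce = -(target * conf.log() + (1 - target) * (1 - conf).log())
    return bce[mask].mean()                   # assumes at least one face pixel
```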
Referring to Fig. 4, Fig. 4 is a flowchart of the main steps of the training method of the human face and human body matching model according to an embodiment of the present invention. As an example, the U-shaped deep neural network UNet++ is selected as the example network of the human face and human body matching model, and the main steps for training it are:
step S401: taking the sample image as the input of the UNet++ network and obtaining a sample confidence map through the sigmoid function;
step S402: obtaining a supervision signal according to the face regions in the sample image;
step S403: calculating a loss value through the cross-entropy loss function according to the sample confidence map and the supervision signal obtained from the same sample image;
step S404: updating the parameters of the UNet++ network according to the loss value and a back propagation algorithm.
After multiple iterations of training over the sample images, training may be considered complete when the calculated loss value is below a set loss-value threshold. The size of this threshold is related to the number and quality of the sample images, the number of iterations and other factors, and can be set by skilled persons according to the actual situation; for example, it can be set to 0.00005. The smaller the loss value, the closer the model prediction is to the actual situation.
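Steps S401 to S404 can be sketched as the loop below, reusing the TinyUNet and matching_loss sketches above; the data loader yielding (image, label_map) pairs, the learning rate and the stopping rule are all illustrative assumptions rather than the patent's prescription.

```python
# Sketch of the training loop of Fig. 4 (S401-S404).
import torch

model = TinyUNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def train(loader, epochs=100, loss_thresh=0.00005):
    for _ in range(epochs):
        for images, labels in loader:           # labels built as in S402
            conf = model(images)                # S401: sample confidence map
            loss = matching_loss(conf, labels)  # S403: pixel-level CE loss
            opt.zero_grad()
            loss.backward()                     # S404: backpropagation
            opt.step()
        if loss.item() < loss_thresh:           # stop once the last-batch loss
            return                              # falls below the threshold
```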
The input of the confidence calculation module 24 in Fig. 2 is the face region obtained in step S101 and the confidence map obtained in step S102. In step S103, in order likewise to reduce the adverse effect that the face boundary region may have on the matching result, all face regions in the confidence map to be processed are each retracted toward their centers to obtain the face indentation regions. The indentation method is the same as that used to generate the supervision signal when training the human face and human body matching model, and the first and second indentation coefficients take the same values as the coefficients α and β used there.
The response mean of each face indentation region in the confidence map output by the human face and human body matching model is then calculated, yielding one or more response means; each calculated response mean is a matching confidence. The response mean is calculated as:
$$ P = \frac{1}{N} \sum_{i=1}^{N} p_i $$
wherein P is the response mean, N is the number of pixels in the face indentation region in the confidence map, and $p_i$ is the probability, predicted for each pixel, of belonging to the target face; the values of the confidence map directly reflect these probabilities.
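For illustration, the response-mean computation of step S103 can be sketched as follows, reusing shrink_box from the earlier sketch and assuming a NumPy confidence map; the names are illustrative.

```python
# Sketch of step S103: average the confidence responses inside the
# shrunken (indented) face region, using the training-time alpha and beta.
def match_confidence(conf_map, face_box, alpha, beta):
    # conf_map: (h, w) array in [0, 1]; face_box: (x1, y1, x2, y2).
    x1, y1, x2, y2 = (int(round(v)) for v in shrink_box(face_box, alpha, beta))
    return float(conf_map[y1:y2, x1:x2].mean())   # P = (1/N) * sum_i p_i
```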
Step S104 can be implemented by the matching decision module 25: if the matching confidence obtained in step S103 is greater than or equal to the preset human face and human body matching confidence threshold, the human face in the actual image is judged to match the human body; otherwise it is judged not to match. When the human body image contains multiple face regions, only the face region with the highest matching confidence is used for the matching judgment.
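The decision of step S104, including the highest-confidence selection when several faces are present, might then look like this sketch (illustrative names, reusing match_confidence above):

```python
# Sketch of step S104: pick the highest-confidence face and threshold it.
def decide_match(conf_map, face_boxes, alpha, beta, threshold):
    scores = [match_confidence(conf_map, b, alpha, beta) for b in face_boxes]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best] >= threshold, best, scores[best]
```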
It should be noted that the method is also applicable to matching the multiple human faces in an image with human bodies. As an example, the image may be divided into a plurality of human body images, each given an index, and the method of the invention applied to the indexed human body images in turn, thereby matching the human bodies with the multiple faces in the image. Those skilled in the art can select such technical schemes according to the actual situation without departing from the principle of the present invention, and the technical schemes after such modifications or replacements will fall within the protection scope of the present invention.
Furthermore, the invention also provides equipment for matching the human face with the human body. In an embodiment of the apparatus for matching a human face with a human body according to the present invention, the apparatus for matching a human face with a human body includes a processor and a storage device, the storage device may be configured to store and execute a program of the method for matching a human face with a human body of the above-mentioned method embodiment, and the processor may be configured to execute a program in the storage device, the program including but not limited to a program of the method for matching a human face with a human body of the above-mentioned method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The device for matching the human face with the human body can be a control device formed by various electronic devices.
Further, the invention also provides a storage medium. In one embodiment of the storage medium according to the present invention, the storage medium may be configured to store a program for executing the method for matching a human face with a human body of the above-described method embodiment, and the program may be loaded and executed by a processor to implement the above-described method for matching a human face with a human body. For convenience of explanation, only the parts related to the embodiments of the present invention are shown, and details of the specific techniques are not disclosed. The storage medium may be a storage device formed by various electronic apparatuses, and optionally, the storage medium in the embodiment of the present invention is a non-transitory computer-readable storage medium.
Those of skill in the art will appreciate that the method steps of the examples described in connection with the embodiments disclosed herein may be embodied in electronic hardware, computer software, or combinations of both, and that the components and steps of the examples have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of electronic hardware and software. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing or implying any particular order or sequence. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A training method of a human face and human body matching model is characterized by comprising the following steps:
taking the sample image as the input of the human face and human body matching model and obtaining a sample confidence map as the output;
obtaining a supervision signal according to the face area in the sample image;
calculating a loss value by a loss function according to the sample confidence map and the supervisory signal;
updating parameters of the human face and human body matching model according to the loss value and a back propagation algorithm;
wherein the sample image is an image from a public data set in which the human body and the face regions have been identified automatically or manually.
2. The training method of the human face and body matching model according to claim 1, wherein the human face and body matching model comprises:
a neural network of U-shaped structure;
a sigmoid function, which maps the values of the confidence map output by the neural network into the range 0 to 1;
wherein the confidence map and the human body image have the same size.
3. The training method of the human face and human body matching model according to claim 1, wherein the step of obtaining the supervision signal according to the human face region in the sample image specifically comprises:
obtaining a first supervision signal map according to the face regions in the sample image;
dividing the first supervision signal map into three categories: a target face region, a non-target face region and a background region;
retracting the target face region and the non-target face region each toward its center according to a first retraction coefficient and a second retraction coefficient, to obtain a second supervision signal map;
wherein the supervision signal is the second supervision signal map, and the supervision signal and the sample confidence map have the same size.
4. The training method of the human face and human body matching model according to claim 3, wherein the calculation method of the loss value comprises the following steps:
the loss function selects a cross entropy loss function at a pixel level, and the expression of the loss function corresponding to each pixel is as follows:
$$ L = -\sum_{c=1}^{M} y_c \log(p_c) $$
wherein L is the loss value and M is the number of categories in the supervision signal; $y_c$ is a one-hot vector whose elements take only the values 0 and 1, being 1 when the category is the same as that of the sample and 0 otherwise; and $p_c$ is the probability, predicted in the sample confidence map, that the sample belongs to category c;
the category includes the target face region and the non-target face region, and does not include the background region.
5. A human face and human body matching method is characterized by comprising the following steps:
acquiring a human body image and a human face area in an actual image;
inputting the human body image into a human face and human body matching model trained according to the method of any one of claims 1 to 4 to obtain a confidence map;
calculating to obtain the matching confidence of the human face and the human body according to the confidence map and the human face region;
comparing the matching confidence with a human face and human body matching confidence threshold;
and when the matching confidence is greater than or equal to the human face and human body matching confidence threshold, judging that the human face in the actual image is matched with the human body, otherwise, judging that the human face in the actual image is not matched with the human body.
6. The human face and human body matching method according to claim 5, wherein the specific step of calculating the matching confidence of the human face and the human body according to the confidence map and the human face region comprises:
in the confidence map, performing center indentation calculation on the face region according to a first indentation coefficient and a second indentation coefficient to obtain a face indentation region;
and calculating the response mean within the face indentation region of the confidence map as the matching confidence of the human face and the human body.
7. The human face and body matching method of claim 6,
when calculating the matching confidence of the human face and the human body, if multiple face regions exist, the matching confidence of every face region is calculated;
and when the matching confidence is compared with the human face and human body matching confidence threshold, the highest matching confidence is selected for the comparison.
8. The human face and body matching method according to claim 6, wherein the response mean value is calculated by:
$$ P = \frac{1}{N} \sum_{i=1}^{N} p_i $$
wherein P is the response mean, N is the number of pixels in the face indentation region in the confidence map, and $p_i$ is the probability, predicted for each pixel, of belonging to the target face.
9. A human face and body matching device comprising a processor and a storage means adapted to store a plurality of program codes, characterized in that said program codes are adapted to be loaded and run by said processor to perform the human face and body matching method according to any one of claims 5 to 8.
10. A storage medium adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by a processor to perform the human face and body matching method of any one of claims 5 to 8.
CN202110489603.2A 2021-04-30 2021-04-30 Human face and human body matching method, equipment and storage medium Pending CN113205138A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110489603.2A CN113205138A (en) 2021-04-30 2021-04-30 Human face and human body matching method, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110489603.2A CN113205138A (en) 2021-04-30 2021-04-30 Human face and human body matching method, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113205138A true CN113205138A (en) 2021-08-03

Family

ID=77028718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110489603.2A Pending CN113205138A (en) 2021-04-30 2021-04-30 Human face and human body matching method, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113205138A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080193020A1 (en) * 2005-02-21 2008-08-14 Mitsubishi Electric Coporation Method for Facial Features Detection
US20180068198A1 (en) * 2016-09-06 2018-03-08 Carnegie Mellon University Methods and Software for Detecting Objects in an Image Using Contextual Multiscale Fast Region-Based Convolutional Neural Network
CN110427905A (en) * 2019-08-08 2019-11-08 北京百度网讯科技有限公司 Pedestrian tracting method, device and terminal
WO2021051857A1 (en) * 2019-09-18 2021-03-25 北京市商汤科技开发有限公司 Target object matching method and apparatus, electronic device and storage medium
CN110796069A (en) * 2019-10-28 2020-02-14 广州博衍智能科技有限公司 Behavior detection method, system, equipment and machine readable medium
CN111639616A (en) * 2020-06-05 2020-09-08 上海一由科技有限公司 Heavy identity recognition method based on deep learning
CN112581500A (en) * 2020-12-21 2021-03-30 上海立可芯半导体科技有限公司 Method and device for matching pedestrians and human faces in target tracking

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
G. LIANG: "Cross-View Person Identification Based on Confidence-Weighted Human Pose Matching", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 28, no. 8, pages 3821 - 3835, XP011729766, DOI: 10.1109/TIP.2019.2899782 *
JI Deyi (纪德益): "Cross-camera multi-target tracking algorithm based on joint face recognition and global trajectory pattern consistency", China Master's Theses Full-text Database (Information Science & Technology), no. 6, pages 138-794 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591785A (en) * 2021-08-12 2021-11-02 北京爱笔科技有限公司 Human body part matching method, device, equipment and storage medium
CN113591783A (en) * 2021-08-12 2021-11-02 北京爱笔科技有限公司 Human body and human face matching method, device, equipment and storage medium
CN113591786A (en) * 2021-08-12 2021-11-02 北京爱笔科技有限公司 Human body and human face matching method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107872644B (en) Video monitoring method and device
CN113205138A (en) Human face and human body matching method, equipment and storage medium
US20180129919A1 (en) Apparatuses and methods for semantic image labeling
CN108171196B (en) Face detection method and device
CN111738258A (en) Pointer instrument reading identification method based on robot inspection
Chetverikov et al. Dynamic texture as foreground and background
CN109993770B (en) Target tracking method for adaptive space-time learning and state recognition
CN110348475A (en) It is a kind of based on spatial alternation to resisting sample Enhancement Method and model
CN110827265B (en) Image anomaly detection method based on deep learning
CN110705412A (en) Video target detection method based on motion history image
CN104683802A (en) H.264/AVC compressed domain based moving target tracking method
JP2019117556A (en) Information processing apparatus, information processing method and program
CN109145738B (en) Dynamic video segmentation method based on weighted non-convex regularization and iterative re-constrained low-rank representation
CN111914627A (en) Vehicle identification and tracking method and device
CN111931572B (en) Target detection method for remote sensing image
CN113129336A (en) End-to-end multi-vehicle tracking method, system and computer readable medium
Al-Shammri et al. A combined method for object detection under rain conditions using deep learning
CN115830505A (en) Video target segmentation method and system for removing background interference through semi-supervised learning
US11657608B1 (en) Method and system for video content analysis
CN111488882B (en) High-precision image semantic segmentation method for industrial part measurement
CN113762027B (en) Abnormal behavior identification method, device, equipment and storage medium
JP5241687B2 (en) Object detection apparatus and object detection program
CN106530300A (en) Flame identification algorithm of low-rank analysis
Zin et al. Background modeling using special type of Markov Chain
CN112884730A (en) Collaborative significance object detection method and system based on collaborative learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination