CN113158867A - Method and device for determining human face features and computer-readable storage medium - Google Patents


Info

Publication number
CN113158867A
CN113158867A (Application CN202110403314.6A)
Authority
CN
China
Prior art keywords: image, video, determining, segmented, target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110403314.6A
Other languages
Chinese (zh)
Inventor
简汉斌
黄沛杰
李佳奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Weima Technology Co ltd
Original Assignee
Weima Technology Co ltd
Application filed by Weima Technology Co ltd filed Critical Weima Technology Co ltd
Priority to CN202110403314.6A priority Critical patent/CN113158867A/en
Publication of CN113158867A publication Critical patent/CN113158867A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/161: Detection; Localisation; Normalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/40: Scenes; Scene-specific elements in video content
    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a device for determining human face features and a computer-readable storage medium. The method comprises the following steps: segmenting a target video to obtain a plurality of segmented videos, and determining a first processing parameter corresponding to the first images in the segmented videos according to a face detection model; aligning and rearranging each first image of the segmented video according to the first processing parameter to obtain second images, and detecting each second image with the face detection model to obtain third images and the first screening parameters corresponding to the third images; determining target images from the third images whose first screening parameters meet a first preset condition, determining a second processing parameter of the target images according to a feature extraction model, and aligning and rearranging each target image according to the second processing parameter to obtain fourth images; and performing feature extraction on each fourth image with the feature extraction model to obtain the face features. The invention shortens the extraction time of the human face features.

Description

Method and device for determining human face features and computer-readable storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a method and an apparatus for determining human face features, and a computer-readable storage medium.
Background
In places requiring a secure environment, such as banks, shopping malls and residential communities, the people entering and exiting need to be monitored and tracked. This monitoring and tracking of crowds is realized by extracting the face features from the acquired images.
In the prior art, the face features are extracted from the collected video as a whole; because the video comprises a large number of images, the extraction of the face features takes a long time.
Disclosure of Invention
The invention mainly aims to provide a method and a device for determining human face features and a computer readable storage medium, and aims to solve the problem of long extraction time of the human face features.
In order to achieve the above object, the present invention provides a method for determining a face feature, wherein the method for determining a face feature comprises the following steps:
segmenting a target video to obtain a plurality of segmented videos, and determining a first processing parameter corresponding to a first image in the segmented videos according to a face detection model;
aligning and rearranging each first image of the segmented video according to the first processing parameter to obtain a second image, and detecting each second image by adopting the face detection model to obtain each third image and a first screening parameter corresponding to the third image;
determining a target image according to a third image corresponding to a first screening parameter meeting a first preset condition, determining a second processing parameter of the target image according to a feature extraction model, and aligning and rearranging each target image according to the second processing parameter to obtain each fourth image;
and extracting the features of each fourth image by adopting the feature extraction model to obtain the face features, and storing the face features and the segmented video in a correlation manner.
In an embodiment, the step of determining the target image according to the third image corresponding to the first screening parameter meeting the first preset condition includes:
determining a third image corresponding to the first screening parameter meeting a first preset condition as a fifth image, and determining a third processing parameter of the fifth image according to a feature prediction model;
aligning and rearranging the fifth images according to the third processing parameters to obtain sixth images;
performing feature prediction on each sixth image by using the feature prediction model to obtain a seventh image and a second screening parameter of the seventh image;
and determining a target image according to a seventh image corresponding to the second screening parameter meeting a second preset condition.
In an embodiment, the step of determining a target image according to a seventh image corresponding to the second screening parameter meeting a second preset condition includes:
determining a seventh image corresponding to the second screening parameter meeting a second preset condition as an eighth image, and determining a fourth processing parameter of the eighth image according to a feature optimization model;
aligning and rearranging the eighth images according to the fourth processing parameters to obtain ninth images;
performing feature optimization on each ninth image by using the feature optimization model to obtain a tenth image and a third screening parameter of the tenth image;
and determining a tenth image corresponding to the third screening parameter meeting a third preset condition as a target image.
In an embodiment, the processing parameters are determined according to the number of image channels, the number of image rearrangements, and the image scaling, and the processing parameters include the first processing parameter, the second processing parameter, the third processing parameter, and the fourth processing parameter.
In an embodiment, the step of segmenting the target video to obtain a plurality of segmented videos includes:
determining the segmentation quantity of the target video, and segmenting the target video according to the segmentation quantity to obtain a segmented video to be determined;
decoding each segmented video to be determined to obtain a first time point corresponding to the completion of the decoding of the segmented video to be determined;
acquiring a second time point corresponding to the segmentation of each segmented video to be determined, and storing the segmented video to be determined in association with its corresponding first time point and second time point to obtain a segmented video;
the step of storing the facial features in association with the segmented video comprises:
and storing the facial features and the first time point and the second time point of the segmented video corresponding to the facial features in an associated manner so as to finish the storage of the facial features and the segmented video in an associated manner.
In an embodiment, each step in the face feature determination process is performed by a module corresponding to that step, and a module may perform one or more instances of the same step in parallel.
In one embodiment, the number of instances of the same step executed by a module is determined according to the ratio between a target duration and the duration the module requires to execute the step, and the target duration is determined according to the least common multiple of the durations required by the modules to execute their steps.
In an embodiment, the face feature determining device determines the face features of each segmented video through a plurality of threads, the number of threads being smaller than the number of segmented videos; when a thread finishes the face feature determination of its current segmented video, the face feature determining device allocates a target segmented video to the thread so as to determine the face features of the target segmented video through that thread, wherein the target segmented video is a segmented video on which face feature determination has not been performed, or a video segment split from a segmented video on which face feature determination is in progress, the images in the split video segment not yet having been processed.
In order to achieve the above object, the present invention further provides a facial feature determination apparatus, including a memory, a processor, and a determination program stored in the memory and executable on the processor, wherein the determination program, when executed by the processor, implements the facial feature determination method as described above.
To achieve the above object, the present invention also provides a computer-readable storage medium storing a determination program which, when executed by a processor, implements the determination method of the human face feature as described above.
The invention provides a method and a device for determining human face features and a computer-readable storage medium. The device for determining human face features segments a target video to obtain a plurality of segmented videos and determines a first processing parameter corresponding to the first images in the segmented videos based on a face detection model; it aligns and rearranges each first image of the segmented video according to the first processing parameter to obtain second images, then detects each second image with the face detection model to obtain third images and their screening parameters; it determines the target images from the third images whose screening parameters meet the preset condition, determines a second processing parameter of the target images based on a feature extraction model, and aligns and rearranges the target images to obtain fourth images; finally, it extracts the features of each fourth image with the feature extraction model to obtain the face features, and stores the face features in association with the segmented video. Because the invention divides a long video into a plurality of segmented videos, the device can extract the face features of the segmented videos simultaneously instead of extracting them from the whole video, which shortens the extraction time of the face features; furthermore, when the face features of a segmented video are extracted, the images of the segmented video are aligned and rearranged, so that the device can process the images rapidly in batches and the models can conveniently extract the face features in the images, further shortening the extraction time.
Drawings
Fig. 1 is a schematic diagram of a hardware structure of a face feature determination apparatus according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of an embodiment of a method for determining human face features according to the present invention;
FIG. 3 is a schematic flow chart illustrating another embodiment of a method for determining human face features according to the present invention;
fig. 4 is a flowchart illustrating a method for determining a face feature according to another embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiment of the invention is as follows: segmenting a target video to obtain a plurality of segmented videos, and determining a first processing parameter corresponding to a first image in the segmented videos according to a face detection model; aligning and rearranging each first image of the segmented video according to the first processing parameter to obtain a second image, and detecting each second image by adopting the face detection model to obtain each third image and a first screening parameter corresponding to the third image; determining a target image according to a third image corresponding to a first screening parameter meeting a first preset condition, determining a second processing parameter of the target image according to a feature extraction model, and aligning and rearranging each target image according to the second processing parameter to obtain each fourth image; and extracting the features of each fourth image by adopting the feature extraction model to obtain the face features, and storing the face features and the segmented video in a correlation manner.
Because the invention divides a long video into a plurality of segmented videos, the device can extract the face features of the segmented videos simultaneously instead of extracting them from the whole video, which shortens the extraction time of the face features; furthermore, when the face features of a segmented video are extracted, the images of the segmented video are aligned and rearranged, so that the device can process the images rapidly in batches and the models can conveniently extract the face features in the images, further shortening the extraction time.
As an implementation, the determination device of the human face features may be as shown in fig. 1.
The face feature determination device in the embodiment of the invention comprises: a processor 1001, such as a CPU, a memory 1002, and a communication bus 1003, wherein the communication bus 1003 enables connection and communication between these components.
The memory 1002 may be a random-access memory (RAM) or a non-volatile memory (e.g., a disk memory). As shown in fig. 1, the memory 1002, as a kind of computer storage medium, may store a determination program; and the processor 1001 may be arranged to call the determination program in the memory 1002 and perform the following operations:
segmenting a target video to obtain a plurality of segmented videos, and determining a first processing parameter corresponding to a first image in the segmented videos according to a face detection model;
aligning and rearranging each first image of the segmented video according to the first processing parameter to obtain a second image, and detecting each second image by adopting the face detection model to obtain each third image and a first screening parameter corresponding to the third image;
determining a target image according to a third image corresponding to a first screening parameter meeting a first preset condition, determining a second processing parameter of the target image according to a feature extraction model, and aligning and rearranging each target image according to the second processing parameter to obtain each fourth image;
and extracting the features of each fourth image by adopting the feature extraction model to obtain the face features, and storing the face features and the segmented video in a correlation manner.
In one embodiment, the processor 1001 may be configured to call a determination program stored in the memory 1002 and perform the following operations:
determining a third image corresponding to the first screening parameter meeting a first preset condition as a fifth image, and determining a third processing parameter of the fifth image according to a feature prediction model;
aligning and rearranging the fifth images according to the third processing parameters to obtain sixth images;
performing feature prediction on each sixth image by using the feature prediction model to obtain a seventh image and a second screening parameter of the seventh image;
and determining a target image according to a seventh image corresponding to the second screening parameter meeting a second preset condition.
In one embodiment, the processor 1001 may be configured to call a determination program stored in the memory 1002 and perform the following operations:
determining a seventh image corresponding to the second screening parameter meeting a second preset condition as an eighth image, and determining a fourth processing parameter of the eighth image according to a feature optimization model;
aligning and rearranging the eighth images according to the fourth processing parameters to obtain ninth images;
performing feature optimization on each ninth image by using the feature optimization model to obtain a tenth image and a third screening parameter of the tenth image;
and determining a tenth image corresponding to the third screening parameter meeting a third preset condition as a target image.
In an embodiment, the processing parameters are determined according to the number of image channels, the number of image rearrangements, and the image scaling, and the processing parameters include the first processing parameter, the second processing parameter, the third processing parameter, and the fourth processing parameter.
In one embodiment, the processor 1001 may be configured to call a determination program stored in the memory 1002 and perform the following operations:
determining the segmentation quantity of the target video, and segmenting the target video according to the segmentation quantity to obtain a segmented video to be determined;
decoding each segmented video to be determined to obtain a first time point corresponding to the completion of the decoding of the segmented video to be determined;
acquiring a second time point corresponding to the segmentation of each segmented video to be determined, and storing the segmented video to be determined in association with its corresponding first time point and second time point to obtain a segmented video;
the step of storing the facial features in association with the segmented video comprises:
and storing the facial features and the first time point and the second time point of the segmented video corresponding to the facial features in an associated manner so as to finish the storage of the facial features and the segmented video in an associated manner.
In one embodiment, the processor 1001 may be configured to call a determination program stored in the memory 1002 and perform the following operations:
each step in the face feature determination process is executed by a module corresponding to that step, and a module may execute one or more instances of the same step in parallel.
In one embodiment, the processor 1001 may be configured to call a determination program stored in the memory 1002 and perform the following operations:
the number of instances of the same step executed by a module is determined according to the ratio of a target duration to the duration the module requires to execute the step, and the target duration is determined according to the least common multiple of the durations required by the modules to execute their steps.
In one embodiment, the processor 1001 may be configured to call a determination program stored in the memory 1002 and perform the following operations:
the determining device of the human face features determines the face features of each segmented video through a plurality of threads, the number of threads being smaller than the number of segmented videos; when a thread finishes the face feature determination of its current segmented video, the device allocates a target segmented video to the thread so as to determine the face features of the target segmented video through that thread, wherein the target segmented video is a segmented video on which face feature determination has not been performed, or a video segment split from a segmented video on which face feature determination is in progress, the images in the split video segment not yet having been processed.
According to the scheme, the device for determining the face features segments the target video to obtain a plurality of segmented videos, determines the first processing parameter corresponding to the first images in the segmented videos based on the face detection model, and aligns and rearranges each first image of the segmented video according to the first processing parameter to obtain the second images; the face detection model is then adopted to detect each second image to obtain the third images and their screening parameters, the target images are determined from the third images whose screening parameters meet the preset condition, the second processing parameter of the target images is determined based on the feature extraction model, and the target images are aligned and rearranged to obtain the fourth images; finally, the features of each fourth image are extracted with the feature extraction model to obtain the face features, which are then stored in association with the segmented video. Because the invention divides a long video into a plurality of segmented videos, the device can extract the face features of the segmented videos simultaneously instead of extracting them from the whole video, which shortens the extraction time of the face features; furthermore, when the face features of a segmented video are extracted, the images of the segmented video are aligned and rearranged, so that the device can process the images rapidly in batches and the models can conveniently extract the face features in the images, further shortening the extraction time.
Based on the hardware architecture, the embodiment of the face feature determination method is provided.
Referring to fig. 2, fig. 2 is an embodiment of determining a face feature of the present invention, where the method for determining a face feature includes the following steps:
step S10, acquiring a target video, segmenting the target video to obtain a plurality of segmented videos, and determining a first processing parameter corresponding to a first image in the segmented videos according to a face detection model; in the present embodiment, the determination means whose subject is a human face feature is performed, and for convenience of description, the determination means of a human face feature is referred to as a means hereinafter.
The device first acquires the video from which face features need to be extracted; this video is the target video. To analyze a video rapidly and obtain the face features, a sufficiently fast data input speed is required, i.e. the data must be decoded at high speed in advance; therefore the target video is segmented and the segmented videos are then decoded, so that the segmented videos can be read in parallel and the input speed of the data as a whole is multiplied. The device may segment the target video according to a fixed duration of the segmented videos; for example, if the fixed duration is 2 min, a 9-min target video is segmented into 5 segmented videos. Alternatively, the device may divide the video into a fixed number of segmented videos; for example, with a fixed number of 5, the target video is divided into 5 segmented videos.
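As an illustrative sketch only (an assumption, not the disclosed implementation), the two segmentation strategies described above, fixed segment duration and fixed segment count, can be expressed as follows; the 2-min and 5-segment figures reuse the example above:

```python
from math import ceil

def segment_by_duration(total_s: float, seg_s: float) -> list[tuple[float, float]]:
    """Split [0, total_s) into segments of at most seg_s seconds each."""
    n = ceil(total_s / seg_s)
    return [(i * seg_s, min((i + 1) * seg_s, total_s)) for i in range(n)]

def segment_by_count(total_s: float, n: int) -> list[tuple[float, float]]:
    """Split [0, total_s) into n equal segments."""
    return [(i * total_s / n, (i + 1) * total_s / n) for i in range(n)]

# A 9-min target video with a fixed duration of 2 min yields 5 segmented videos
# (the last one shorter); a fixed count of 5 also yields 5 segmented videos.
print(segment_by_duration(9 * 60, 2 * 60))
print(segment_by_count(9 * 60, 5))
```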
The device is provided with a face detection model for identifying face features in an image and marking them in the image. After the segmented videos are obtained, the face detection model identifies the face features of the first images in each segmented video in turn. Before the recognition is performed, the device needs to align and rearrange the first images. Specifically, the device acquires the first processing parameter of the first images, which is used for aligning and rearranging them. The first processing parameter includes the number of image channels of the first image, the scale to which the first image needs to be scaled, and the number of first images aligned and rearranged together. The scaling of the first image and the number of aligned and rearranged first images are determined according to the face detection model, i.e. the face detection model has size requirements for the images it recognizes and number requirements for the aligned images. The first processing parameter may be characterized as (W, H, C, N), where (W, H) is the scaled image size of the first image, C is the number of image channels, and N is the number of first images processed simultaneously by the face detection model; for example, the first processing parameter may be (W, H, C, N) = (416, 416, 3, 2).
Step S20, aligning and rearranging each first image of the segmented video according to the first processing parameter to obtain a second image, and detecting each second image by using the face detection model to obtain each third image and a first screening parameter corresponding to the third image;
After determining the first processing parameter corresponding to the first images, the device aligns and rearranges each first image in the segmented video according to the first processing parameter to obtain the second images; the alignment and rearrangement of the first images facilitates batch processing by the GPU (Graphics Processing Unit) in the device.
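A minimal sketch of this alignment and rearrangement, assuming OpenCV/NumPy and frames already decoded as arrays (the function name is illustrative, not from the patent):

```python
import cv2
import numpy as np

def align_and_batch(frames, W, H, C, N):
    """Resize each frame to (W, H) and stack groups of N into (N, H, W, C) batches."""
    resized = [cv2.resize(f, (W, H)) for f in frames]  # assumes C-channel frames
    batches = []
    for i in range(0, len(resized), N):
        chunk = resized[i:i + N]
        while len(chunk) < N:                     # pad a short final batch
            chunk.append(np.zeros((H, W, C), dtype=np.uint8))
        batches.append(np.stack(chunk))           # one GPU-ready batch
    return batches

# e.g. the detection-stage parameters above: (W, H, C, N) = (416, 416, 3, 2)
```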
After the second images are obtained, the device performs feature detection on each second image with the face detection model to obtain the third images and the first screening parameters corresponding to them. A third image is marked with the locations of face features, for example by framing each face with a box. Not every image in the segmented video contains a face, and not every image allows the face features to be extracted clearly, so the images containing faces need to be screened; the first screening parameter is used to screen out the third images that contain a face, and may be the confidence of the third image.
Step S30, determining a target image according to a third image corresponding to a first screening parameter meeting a first preset condition, determining a second processing parameter of the target image according to a feature extraction model, and aligning and rearranging each target image according to the second processing parameter to obtain each fourth image;
the device is provided with a first preset condition, and after each third image is obtained, the device judges whether the first screening parameters of the third image meet the first preset condition. The first preset condition may be that the confidence is greater than a first preset threshold. Therefore, the device selects the third image corresponding to the confidence coefficient greater than the first preset threshold value to determine the third image as the target image, and the confidence coefficient greater than the first preset threshold value is the first screening parameter meeting the first preset condition.
The device is also provided with a feature extraction model, which is used for extracting the human face features in the target image, namely the face features within the box marked in the target image.
Before face feature extraction, the target images need to be aligned and rearranged, i.e. a second processing parameter of the target images is obtained according to the feature extraction model. The second processing parameters include the number of image channels of the target image, the scale to which the target image needs to be scaled, and the number of aligned and rearranged target images. The scaling of the target image and the number of aligned and rearranged target images are determined according to the feature extraction model, i.e. the feature extraction model has size requirements for image recognition and number requirements for the aligned target images. The second processing parameter may also be characterized as (W, H, C, N), for example (W, H, C, N) = (112, 112, 3, 4).
After the second processing parameter corresponding to the target images is determined, the device aligns and rearranges each target image according to the second processing parameter to obtain the fourth images; the alignment and rearrangement of the fourth images facilitates batch processing by the GPU (Graphics Processing Unit) in the device. Any area of the scaled image left blank is filled with 0, where 0 is a pixel value.
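A sketch of the zero-filled scaling just described; the aspect-preserving resize is an assumption, as the description only states that blank areas are filled with the pixel value 0:

```python
import cv2
import numpy as np

def scale_and_pad(img, W, H):
    """Scale a 3-channel image into a (H, W) canvas, filling the blank area with 0."""
    h, w = img.shape[:2]
    s = min(W / w, H / h)
    nw, nh = int(w * s), int(h * s)
    canvas = np.zeros((H, W, img.shape[2]), dtype=img.dtype)  # pixel value 0
    canvas[:nh, :nw] = cv2.resize(img, (nw, nh))
    return canvas
```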
And step S40, performing feature extraction on each fourth image by using the feature extraction model to obtain face features, and storing the face features and the segmented video in a related manner.
After the fourth images are obtained, the device extracts features from each fourth image with the feature extraction model, thereby obtaining the face features, which may exist in the form of feature vectors. The device can process the face features of the segmented videos in turn, or extract the face features of all segmented videos simultaneously, and then store the face features in association with the segmented videos they correspond to. In addition, if the device needs to monitor and track immediately, people are tracked based on the acquired face features.
It should be noted that the target video may be an offline video or a real-time video. When the device extracts face features from a real-time video, it captures only a small number of images for extraction, for example the currently received image and the last received image, or even only the currently received image. The device therefore extracts face features from an offline video more efficiently than from a real-time video. Accordingly, when the device receives a real-time video and the monitoring and tracking is not currently time-critical, the images received in real time can be stored so as to be converted into an offline video, and feature extraction is then performed on the offline video, improving the extraction efficiency of the face features.
In the technical solution provided in this embodiment, the determining device for face features segments a target video to obtain a plurality of segmented videos, determines a first processing parameter corresponding to the first images in the segmented videos based on a face detection model, and aligns and rearranges each first image of the segmented video according to the first processing parameter to obtain the second images; the face detection model is then adopted to detect each second image to obtain the third images and their screening parameters, the target images are determined from the third images whose screening parameters meet the preset condition, the second processing parameter of the target images is determined based on the feature extraction model, and the target images are aligned and rearranged to obtain the fourth images; finally, the features of each fourth image are extracted with the feature extraction model to obtain the face features, which are then stored in association with the segmented video. Because the invention divides a long video into a plurality of segmented videos, the device can extract the face features of the segmented videos simultaneously instead of extracting them from the whole video, which shortens the extraction time of the face features; furthermore, when the face features of a segmented video are extracted, the images of the segmented video are aligned and rearranged, so that the device can process the images rapidly in batches and the models can conveniently extract the face features in the images, further shortening the extraction time.
Referring to fig. 3, fig. 3 is another embodiment of the method for determining a facial feature according to the present invention, and based on the above embodiment, the step S30 of "determining a target image according to a third image corresponding to a first filtering parameter satisfying a first preset condition" includes:
step S31, determining a third image corresponding to the first screening parameter meeting a first preset condition as a fifth image, and determining a third processing parameter of the fifth image according to a feature prediction model;
step S32, aligning and rearranging each fifth image according to the third processing parameter to obtain each sixth image;
step S33, performing feature prediction on each sixth image by using the feature prediction model to obtain a seventh image and a second screening parameter of the seventh image;
and step S34, determining a target image according to the seventh image corresponding to the second screening parameter meeting a second preset condition.
The box framed in a third image obtained by detecting the second image with the face detection model covers a large area, so when the feature extraction model extracts the face features of a third image taken directly as the target image, it consumes a long time and the extracted face features are not accurate. To address this, a feature prediction model is set in the device: the box in the third image is further narrowed through the feature prediction model, which reduces the extraction time of the face features and makes the extracted face features more accurate.
In this regard, the device determines a third image corresponding to the first screening parameter meeting the first preset condition as a fifth image, and then determines a third processing parameter of the fifth image according to the feature prediction model. The third processing parameters include the number of image channels of the fifth image, the scale to which the fifth image needs to be scaled, and the number of aligned and rearranged fifth images. The scaling of the fifth image and the number of aligned and rearranged fifth images are determined from the feature prediction model, i.e. the feature prediction model has size requirements for image recognition and number requirements for the aligned fifth images. The third processing parameter may also be characterized as (W, H, C, N), for example (W, H, C, N) = (48, 48, 3, any number).
The device aligns and rearranges the fifth images based on the third processing parameters to obtain the sixth images, and predicts each sixth image with the feature prediction model to obtain the seventh images and the second screening parameters of the seventh images. The second screening parameter may be the confidence of the seventh image. The seventh image is still an image marked with a box, and the size of the box in the seventh image is smaller than the size of the box in the third image.
The device is provided with a second preset condition; after the seventh images are obtained, the device judges whether the second screening parameter of each seventh image meets it. The second preset condition may be that the confidence is greater than a second preset threshold. The device therefore determines the seventh images whose confidence is greater than the second preset threshold as the target images; a confidence greater than the second preset threshold is a second screening parameter meeting the second preset condition.
In the technical scheme provided by this embodiment, the device determines a third image corresponding to a first screening parameter meeting a first preset condition as a fifth image, determines a third processing parameter of the fifth image according to a feature prediction model, aligns and rearranges the fifth images according to the third processing parameter to obtain a sixth image, predicts the sixth image by using the feature prediction model to obtain a seventh image and a second screening parameter of the seventh image, and finally uses the seventh image corresponding to a second screening parameter meeting a second preset condition as a target image, so that the target image can be subjected to rapid face feature extraction by the feature extraction model, and the extracted face features are accurate.
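The detection, prediction and optimization stages all follow the same align/infer/screen pattern. A hedged sketch of one such stage, reusing align_and_batch from the earlier sketch and assuming a model callable that returns outputs with scores (an illustrative interface, not the patent's):

```python
def run_stage(images, model, params, threshold):
    """One cascade stage: align/rearrange, run the model, keep passing results."""
    W, H, C, N = params
    batches = align_and_batch(images, W, H, C, N)
    outputs, scores = model(batches)          # assumed model interface
    return [o for o, s in zip(outputs, scores) if s > threshold]
```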
Referring to fig. 4, fig. 4 is a further embodiment of the method for extracting facial features of the present invention, and based on the embodiment shown in fig. 3, the step S34 includes:
step S341, determining a seventh image corresponding to the second screening parameter that meets a second preset condition as an eighth image, and determining a fourth processing parameter of the eighth image according to a feature optimization model;
step S342, aligning and rearranging each eighth image according to the fourth processing parameter to obtain each ninth image;
step S343, performing feature optimization on each ninth image by using the feature optimization model to obtain a tenth image and a third screening parameter of the tenth image;
in step S344, the tenth image corresponding to the third screening parameter that meets the third preset condition is determined as the target image.
The box marked in the seventh image is formed by feature points. To further improve the accuracy of face feature extraction, a feature optimization model can be set in the device, i.e. the feature points of the box in the seventh image are optimized through the feature optimization model. The optimized feature points are the feature points other than those lying on a straight line (the straight line being the line that the device fits to the feature points in the seventh image).
In this regard, the device determines a seventh image corresponding to the second screening parameter meeting the second preset condition as an eighth image, and then determines a fourth processing parameter of the eighth image according to the feature optimization model. The fourth processing parameters include the number of image channels of the eighth image, the scale to which the eighth image needs to be scaled, and the number of aligned and rearranged eighth images. The scaling of the eighth image and the number of aligned and rearranged eighth images are determined according to the feature optimization model, i.e. the feature optimization model has size requirements for image recognition and number requirements for the aligned eighth images. The fourth processing parameter may also be characterized as (W, H, C, N), for example (W, H, C, N) = (24, 24, 15, any number).
The device aligns and rearranges the eighth images based on the fourth processing parameters to obtain the ninth images, and then performs feature optimization on each ninth image with the feature optimization model to obtain the tenth images and the third screening parameters of the tenth images. The third screening parameter may be the confidence of the tenth image. The tenth image is still an image marked with a box, and the size of the box in the tenth image is smaller than the size of the box in the seventh image.
The device is provided with a third preset condition; after the tenth images are obtained, the device judges whether the third screening parameter of each tenth image meets it. The third preset condition may be that the confidence is greater than a third preset threshold. The device therefore determines the tenth images whose confidence is greater than the third preset threshold as the target images; a confidence greater than the third preset threshold is a third screening parameter meeting the third preset condition.
As can be seen from the above embodiments, the processing parameters are determined according to the number of image channels, the number of image rearrangements, and the image scaling, and the processing parameters include a first processing parameter, a second processing parameter, a third processing parameter, and a fourth processing parameter.
In the technical scheme provided by this embodiment, the device determines a seventh image corresponding to the second screening parameter meeting the second preset condition as an eighth image, determines a fourth processing parameter of the eighth image according to the feature optimization model, aligns and rearranges the eighth images according to the fourth processing parameter to obtain a ninth image, predicts the ninth image by using the feature optimization model to obtain a tenth image and a third screening parameter of the tenth image, and finally uses a tenth image corresponding to the third screening parameter meeting the third preset condition as a target image, so that the extracted face features are more accurate.
In an embodiment, the step of segmenting the target video to obtain a plurality of segmented videos includes:
A. determining the segmentation quantity of the target video, and segmenting the target video according to the segmentation quantity to obtain a segmented video to be determined;
B. decoding each segmented video to be determined to obtain a first time point corresponding to the completion of the decoding of the segmented video to be determined;
C. acquiring a second time point corresponding to the segmentation of each segmented video to be determined, and storing the segmented video to be determined in association with its corresponding first time point and second time point to obtain a segmented video;
in this embodiment, the number of segments n is set in the device. After the device obtains the target video, the target video is segmented based on the number of segments to obtain a segmented video to be determined. And determining the segmented video to be unmarked. The device can obtain the starting time point of each segmented video to be determined for segmentation, wherein the starting time point is the second time point. For example, if the duration of the target video is T, the second time point T1 of the first segmented video to be determined is 0, the second time point T2 of the second segmented video to be determined is T/n, and the second time point tn of the nth segmented video to be determined is T ((n-1)/n) of ….
After the segmented videos to be determined are obtained, the device decodes them all simultaneously and records the time at which the decoding of each completes as its first time point: the device jumps to the second time points t1, t2, ..., tn, reads the corresponding segmented video to be determined from there and decodes it until the decoding succeeds, the decoding-completion time points being s1, s2, ..., sn. Therefore, the second and first time points corresponding to the first segmented video to be determined are (t1, s1), those of the second are (t2, s2), ..., and those of the n-th are (tn, sn).
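A sketch of this time-point bookkeeping, with the second time points computed as above and the first time points recorded when decoding completes (the time source and decoder interface are assumptions):

```python
import time

def second_time_points(T: float, n: int) -> list[float]:
    """t1 = 0, t2 = T/n, ..., tn = T(n-1)/n."""
    return [T * i / n for i in range(n)]

def mark_segment(segment_decoder, t_i: float):
    """Jump to t_i, decode until success, and pair t_i with the completion time s_i."""
    segment_decoder.seek(t_i)                  # assumed decoder interface
    segment_decoder.decode_until_success()
    s_i = time.monotonic()
    return (t_i, s_i)
```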
The device stores each segmented video to be determined in association with its corresponding first time point and second time point, thereby obtaining the segmented videos. The stored first and second time points can be understood as a mark of the segmented video. Therefore, after the face features are obtained, storing them in association with the first and second time points of the corresponding segmented video amounts to storing the face features in association with the segmented video, i.e. the step of storing the face features in association with the segmented video comprises:
and storing the facial features and the first time point and the second time point of the segmented video corresponding to the facial features in an associated manner so as to finish the storage of the facial features and the segmented video in an associated manner.
In the embodiment, the segmented video is marked based on the segmentation time point and the decoding time point of the segmented video, so that the determined face features can be conveniently associated and stored with the segmented video.
In one embodiment, each step in the face feature determination process is performed by a module corresponding to that step, and a module may perform one or more instances of the same step in parallel. Specifically, there are fourteen steps in the extraction process of the face features, respectively:
1. video segmentation;
2. aligning and rearranging the initial image;
3. the face detection model detects the aligned and rearranged images to obtain an image A;
4. performing result screening on the A image (result screening means selecting the results whose screening parameters meet the preset conditions) to obtain a B image;
5. aligning and rearranging the B image;
6. aligning the rearranged B image by using a characteristic prediction model to predict to obtain a C image;
7. screening the result of the image C to obtain an image D;
8. aligning and rearranging the D image;
9. performing feature optimization on the aligned and rearranged D images by using a feature optimization model to obtain an E image;
10. screening the result of the E image to obtain an F image;
11. aligning and rearranging the F image;
12. performing feature extraction on the aligned and rearranged F images by adopting a feature extraction model to obtain human face features;
13. carrying out feature tracking by adopting the human face features to obtain a result;
14. and analyzing the result and storing the analysis result.
The fourteen steps described above are performed by different modules within the device, and since there are multiple images in a segmented video, modules can perform one or more of the same steps in parallel. For example, the module may simultaneously perform an alignment rearrangement on 3 sets of F images.
Furthermore, the number of identical steps executed by a module is determined according to the ratio of the target duration to the duration that module needs to execute its step, and the target duration is determined according to the least common multiple of the durations required by all the modules. For example, suppose the device executes four steps 1, 2, 3 and 4, taking 20 s, 30 s, 10 s and 40 s respectively to complete; the least common multiple of 20 s, 30 s, 10 s and 40 s is 120 s, so module 1 can execute 120 s / 20 s = 6 instances of step 1 simultaneously, module 2 can execute 120 s / 30 s = 4 instances of step 2, module 3 can execute 120 s / 10 s = 12 instances of step 3, and module 4 can execute 120 s / 40 s = 3 instances of step 4. The number of instances for the other modules is calculated in the same way.
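This load-balancing rule reduces to a least-common-multiple computation; a minimal sketch of the example above (Python 3.9+ for math.lcm):

```python
from math import lcm

step_durations = {"step1": 20, "step2": 30, "step3": 10, "step4": 40}  # seconds
target = lcm(*step_durations.values())                                  # 120 s
replicas = {step: target // d for step, d in step_durations.items()}
print(target, replicas)  # 120 {'step1': 6, 'step2': 4, 'step3': 12, 'step4': 3}
```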
By adopting the mode, the CPU and the GPU in the device can balance the load, and the optimization of the extraction speed of the human face features is realized.
In one embodiment, since the speed of extracting the face features of a segmented video depends on the content of the assigned video, the extraction time estimated from the file duration of the segmented video is not necessarily optimal. This embodiment therefore provides a method for dynamically splitting a segmented video: when one analysis instance finishes extracting the face features of its segmented video earlier than the other instances, the device invokes the dynamic splitting method, splits off a new segmented video and assigns it to that instance, so that the idle instance resumes analysis until the whole video is analyzed.
Specifically, the device determines the face features of each segmented video through a plurality of threads (a thread can be regarded as an analysis instance), and the number of threads is less than or equal to the number of segmented videos, so that no thread is left without a segmented video from which to extract face features. When a thread finishes the face feature determination of its current segmented video, the device allocates a target segmented video to that thread so as to determine the face features of the target segmented video through it, wherein the target segmented video is a segmented video on which face feature determination has not been performed, or a video segment split from a segmented video on which face feature determination is in progress, the images in the split video segment not yet having been processed.
The following description takes the target segmented video to be a video segment split from a segmented video on which face feature determination is being performed.
The device finds an idle analysis instance; it checks the face feature extraction progress of the segmented videos, finds the longest unprocessed video segment among them, inserts a video reading start point in the middle of that unprocessed segment, and modifies the end time point of the original segmented video; the span from the video reading start point to the end time point of the segment is the target segmented video. The split-off video segment is allocated to the analysis instance for face feature extraction, which proceeds as in steps S10 to S40 and is not repeated here. For example, if the segmented video runs from 10 min 10 s to 20 min 10 s and has been processed up to 15 min 30 s, a video reading point is chosen within 15 min 31 s to 20 min 10 s; if 16 min 10 s is selected, the end time point of the original segmented video is changed from 20 min 10 s to 16 min 9 s, and the target video segment is 16 min 10 s to 20 min 10 s.
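A sketch of the dynamic split in the example above, with all times in seconds (the helper and its signature are illustrative assumptions):

```python
def split_segment(seg_end: int, processed_up_to: int, read_point: int):
    """Truncate the busy segment just before read_point; the tail becomes
    the target segmented video handed to the idle analysis instance."""
    assert processed_up_to < read_point <= seg_end
    new_end_of_original = read_point - 1       # e.g. 16 min 9 s
    target_segment = (read_point, seg_end)     # e.g. 16 min 10 s .. 20 min 10 s
    return new_end_of_original, target_segment

# Segment 10:10..20:10 processed to 15:30, read point chosen at 16:10:
print(split_segment(20 * 60 + 10, 15 * 60 + 30, 16 * 60 + 10))
```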
The present invention further provides a facial feature determination apparatus, which includes a memory, a processor, and a determination program stored in the memory and executable on the processor, and when the determination program is executed by the processor, the determination method of facial features as described in the above embodiments is implemented.
The present invention also provides a computer-readable storage medium storing a determination program that, when executed by a processor, implements the determination method of the face feature according to the above embodiment.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the embodiments described above can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a television, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only an alternative embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for determining human face features is characterized by comprising the following steps:
segmenting a target video to obtain a plurality of segmented videos, and determining a first processing parameter corresponding to a first image in the segmented videos according to a face detection model;
aligning and rearranging each first image of the segmented video according to the first processing parameter to obtain a second image, and detecting each second image by adopting the face detection model to obtain each third image and a first screening parameter corresponding to the third image;
determining a target image according to a third image corresponding to a first screening parameter meeting a first preset condition, determining a second processing parameter of the target image according to a feature extraction model, and aligning and rearranging each target image according to the second processing parameter to obtain each fourth image;
and extracting features from each fourth image by using the feature extraction model to obtain the facial features, and storing the facial features in association with the segmented video.
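As a hedged, non-limiting illustration of claim 1's flow, the sketch below wires the four steps together; the detector and extractor objects, their methods, the dict keys, and the 0.9 threshold are all hypothetical stand-ins, since the claim names no concrete models or values.

```python
import numpy as np

def align_and_rearrange(image: np.ndarray, params: dict) -> np.ndarray:
    """Placeholder alignment/rearrangement: a real system would rescale,
    reorder channels, and tile according to `params`."""
    h, w = params["size"]
    ys = np.arange(h) * image.shape[0] // h   # nearest-neighbour row indices
    xs = np.arange(w) * image.shape[1] // w   # nearest-neighbour column indices
    return image[ys][:, xs]

def determine_face_features(first_images, detector, extractor, threshold=0.9):
    # Step 1: per-image processing parameters from the detection model.
    params = [detector.processing_params(img) for img in first_images]
    # Step 2: align/rearrange into second images, then detect to obtain
    # third images plus their first screening parameter (a confidence score).
    second = [align_and_rearrange(img, p)
              for img, p in zip(first_images, params)]
    detections = [detector.detect(img) for img in second]
    # Step 3: keep third images whose screening parameter satisfies the
    # first preset condition; these become the target images.
    targets = [d["face"] for d in detections if d["score"] >= threshold]
    # Step 4: align/rearrange into fourth images and extract the features.
    fourth = [align_and_rearrange(t, extractor.processing_params(t))
              for t in targets]
    return [extractor.extract(img) for img in fourth]
```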
2. The method for determining the facial features as claimed in claim 1, wherein the step of determining the target image according to the third image corresponding to the first screening parameter satisfying the first preset condition includes:
determining a third image corresponding to the first screening parameter meeting a first preset condition as a fifth image, and determining a third processing parameter of the fifth image according to a feature prediction model;
aligning and rearranging the fifth images according to the third processing parameters to obtain sixth images;
performing feature prediction on each sixth image by using the feature prediction model to obtain a seventh image and a second screening parameter of the seventh image;
and determining a target image according to a seventh image corresponding to the second screening parameter meeting a second preset condition.
3. The method for determining the facial features as claimed in claim 2, wherein the step of determining the target image according to the seventh image corresponding to the second screening parameter satisfying the second preset condition includes:
determining a seventh image corresponding to the second screening parameter meeting a second preset condition as an eighth image, and determining a fourth processing parameter of the eighth image according to a feature optimization model;
aligning and rearranging the eighth images according to the fourth processing parameters to obtain ninth images;
performing feature optimization on each ninth image by using the feature optimization model to obtain a tenth image and a third screening parameter of the tenth image;
and determining a tenth image corresponding to the third screening parameter meeting a third preset condition as a target image.
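Claims 2 and 3 repeat one pattern with different models (feature prediction, then feature optimization), so the cascade can be expressed once. The sketch below is an assumption about how such stages might compose, with `stages` as hypothetical (model, threshold) pairs rather than anything named in the claims.

```python
def cascade_screen(images, stages):
    """Run images through successive (model, threshold) screening stages.

    Each stage aligns/rearranges its inputs, applies its model, and keeps
    only outputs whose screening parameter satisfies that stage's preset
    condition -- mirroring the fifth-through-tenth image progression above.
    """
    for model, threshold in stages:
        prepared = [model.align_and_rearrange(img) for img in images]
        results = [model.run(img) for img in prepared]
        images = [r["image"] for r in results if r["score"] >= threshold]
    return images  # the surviving images are the target images
```

With stages = [(prediction_model, t2), (optimization_model, t3)], the returned images correspond to the tenth images that satisfy the third preset condition.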
4. The method of claim 3, wherein the processing parameters are determined according to the number of image channels, the number of image rearrangements, and the image scaling, and the processing parameters include the first processing parameter, the second processing parameter, the third processing parameter, and the fourth processing parameter.
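Claim 4 only states which quantities the processing parameters depend on; the concrete container below is an illustrative assumption about how they might be carried through the pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProcessingParams:
    channels: int        # number of image channels, e.g. 3 for RGB
    rearrangements: int  # how many rearranged copies the model consumes
    scale: float         # scaling factor applied before inference

# Example: a 1920-pixel-wide frame feeding a model with a 640-pixel input
# would carry scale = 640 / 1920 (about 0.33).
params = ProcessingParams(channels=3, rearrangements=1, scale=640 / 1920)
```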
5. The method for determining human face features according to any one of claims 1 to 4, wherein the step of segmenting the target video to obtain a plurality of segmented videos comprises:
determining a number of segments for the target video, and segmenting the target video according to that number to obtain segmented videos to be determined;
decoding each segmented video to be determined to obtain a first time point corresponding to the completion of the decoding of the segmented video to be determined;
acquiring a second time point corresponding to the segmentation of each segmented video to be determined, and storing the segmented video to be determined, the first time point corresponding to it, and the second time point in an associated manner to obtain a segmented video;
the step of storing the facial features in association with the segmented video comprises:
and storing the facial features in association with the first time point and the second time point of the corresponding segmented video, thereby completing the associated storage of the facial features and the segmented video.
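A minimal sketch of claim 5's bookkeeping, assuming wall-clock timestamps for the two time points, a stub decoder, and a plain dict as the associated store; none of this is mandated by the claim.

```python
import time

def decode(start: float, end: float) -> None:
    """Stand-in for decoding the segment between start and end seconds."""
    pass

def segment_and_timestamp(total_duration: float, n_segments: int):
    """Split [0, total_duration) into n segments and record, per segment,
    the second time point (when it was cut) and the first time point
    (when its decoding completed)."""
    seg_len = total_duration / n_segments
    records = []
    for i in range(n_segments):
        second_tp = time.time()                 # time point of segmentation
        decode(i * seg_len, (i + 1) * seg_len)
        first_tp = time.time()                  # time point decoding finished
        records.append({"start": i * seg_len, "end": (i + 1) * seg_len,
                        "first_tp": first_tp, "second_tp": second_tp})
    return records

def store_features(store: dict, features, record: dict) -> None:
    # Keying the features by the segment's two time points completes the
    # associated storage described in the claim.
    store[(record["first_tp"], record["second_tp"])] = features
```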
6. The method for determining facial features according to any one of claims 1-4, wherein each step in the process of determining the facial features is performed by a module corresponding to that step, and a plurality of modules may perform the same step in parallel.
7. The method of claim 6, wherein the number of modules performing a given step is determined by the ratio of a target duration to the duration a module requires to perform that step, the target duration being the least common multiple of the durations required to perform each of the steps.
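Claim 7's sizing rule is easiest to see with numbers. Suppose, purely for illustration, that three steps take 2 s, 3 s, and 4 s per unit of work: the target duration is lcm(2, 3, 4) = 12, so the steps receive 12/2 = 6, 12/3 = 4, and 12/4 = 3 modules, which equalizes throughput across the pipeline.

```python
from math import lcm

def module_counts(step_durations: list[int]) -> list[int]:
    """Modules per step = target duration / step duration, where the
    target duration is the LCM of all step durations."""
    target = lcm(*step_durations)
    return [target // d for d in step_durations]

print(module_counts([2, 3, 4]))  # -> [6, 4, 3]
```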
8. The method for determining facial features according to any one of claims 1-4, wherein a facial feature determination device determines the facial features of each segmented video through a plurality of threads, the number of threads being smaller than the number of segmented videos; when a thread finishes the face feature determination of its current segmented video, the facial feature determination device allocates a target segmented video to that thread so as to determine the facial features of the target segmented video through the thread, wherein the target segmented video is either a segmented video for which face feature determination has not yet begun, or a video segment split from a segmented video whose face feature determination is in progress, the images in the split video segment being not yet processed.
9. An apparatus for determining facial features, the apparatus comprising a memory, a processor and a determination program stored in the memory and executable on the processor, the determination program when executed by the processor implementing the method for determining facial features as claimed in any one of claims 1-8.
10. A computer-readable storage medium characterized in that the computer-readable storage medium stores a determination program which, when executed by a processor, implements the determination method of the facial features according to any one of claims 1 to 8.
CN202110403314.6A 2021-04-15 2021-04-15 Method and device for determining human face features and computer-readable storage medium Pending CN113158867A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110403314.6A CN113158867A (en) 2021-04-15 2021-04-15 Method and device for determining human face features and computer-readable storage medium


Publications (1)

Publication Number Publication Date
CN113158867A (en) 2021-07-23

Family

ID=76890600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110403314.6A Pending CN113158867A (en) 2021-04-15 2021-04-15 Method and device for determining human face features and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN113158867A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014001610A1 (en) * 2012-06-25 2014-01-03 Nokia Corporation Method, apparatus and computer program product for human-face features extraction
CN109284729A (en) * 2018-10-08 2019-01-29 北京影谱科技股份有限公司 Method, apparatus and medium based on video acquisition human face recognition model training data
CN109685018A (en) * 2018-12-26 2019-04-26 深圳市捷顺科技实业股份有限公司 A kind of testimony of a witness method of calibration, system and relevant device
CN109815868A (en) * 2019-01-15 2019-05-28 腾讯科技(深圳)有限公司 A kind of image object detection method, device and storage medium
CN110363129A (en) * 2019-07-05 2019-10-22 昆山杜克大学 Autism early screening system based on smile normal form and audio-video behavioural analysis
WO2019242416A1 (en) * 2018-06-20 2019-12-26 腾讯科技(深圳)有限公司 Video image processing method and apparatus, computer readable storage medium and electronic device
CN110866490A (en) * 2019-11-13 2020-03-06 复旦大学 Face detection method and device based on multitask learning
CN111209897A (en) * 2020-03-09 2020-05-29 腾讯科技(深圳)有限公司 Video processing method, device and storage medium
CN111274965A (en) * 2020-01-20 2020-06-12 上海眼控科技股份有限公司 Face recognition method and device, computer equipment and storage medium
CN111507138A (en) * 2019-01-31 2020-08-07 北京奇虎科技有限公司 Image recognition method and device, computer equipment and storage medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Y. Wong et al.: "Patch-based probabilistic image quality assessment for face selection and improved video-based face recognition", CVPR 2011 Workshops, 25 June 2011, pages 74-81, XP031926599, DOI: 10.1109/CVPRW.2011.5981881 *
Xu Wangwang: "Research on the Implementation and Application of a Fast Face Detection Algorithm Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, No. 2020, 15 July 2020, pages 138-1028 *

Similar Documents

Publication Publication Date Title
CN109670437B (en) Age estimation model training method, facial image recognition method and device
CN110399526B (en) Video title generation method and device and computer readable storage medium
CN111160275B (en) Pedestrian re-recognition model training method, device, computer equipment and storage medium
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN111783712A (en) Video processing method, device, equipment and medium
CN111899246A (en) Slide digital information quality detection method, device, equipment and medium
CN110930434A (en) Target object tracking method and device, storage medium and computer equipment
CN113139403A (en) Violation behavior identification method and device, computer equipment and storage medium
CN112507860A (en) Video annotation method, device, equipment and storage medium
CN111199186A (en) Image quality scoring model training method, device, equipment and storage medium
CN111091146B (en) Picture similarity obtaining method and device, computer equipment and storage medium
US9699501B2 (en) Information processing device and method, and program
CN113158867A (en) Method and device for determining human face features and computer-readable storage medium
CN111599382A (en) Voice analysis method, device, computer equipment and storage medium
CN111539390A (en) Small target image identification method, equipment and system based on Yolov3
CN116612498A (en) Bird recognition model training method, bird recognition method, device and equipment
CN110019951B (en) Method and equipment for generating video thumbnail
WO2019150649A1 (en) Image processing device and image processing method
CN113296723B (en) Method, system, device and medium for synchronously displaying target frame and video frame
CN116091862A (en) Picture quality identification method, device, equipment, storage medium and product
CN115424253A (en) License plate recognition method and device, electronic equipment and storage medium
CN112199131B (en) Page detection method, device and equipment
CN114463242A (en) Image detection method, device, storage medium and device
CN114697761B (en) Processing method, processing device, terminal equipment and medium
CN113709563B (en) Video cover selecting method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination