CN111950507B - Data processing and model training method, device, equipment and medium - Google Patents

Data processing and model training method, device, equipment and medium

Info

Publication number
CN111950507B
Authority
CN
China
Prior art keywords
sample
detection frame
head
value
frame
Prior art date
Legal status
Active
Application number
CN202010863364.8A
Other languages
Chinese (zh)
Other versions
CN111950507A (en)
Inventor
朱宏吉
张彦刚
Current Assignee
Beijing Orion Star Technology Co Ltd
Original Assignee
Beijing Orion Star Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Orion Star Technology Co Ltd
Priority to CN202010863364.8A
Publication of CN111950507A
Application granted
Publication of CN111950507B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53 Recognition of crowd images, e.g. recognition of crowd congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a data processing and model training method, device, equipment and medium, which are used to solve the problems of heavy computation, low efficiency and large occupied storage space in existing processes for determining passenger flow data. According to the embodiment of the invention, the position information of each target detection frame containing a human head in an image to be identified, together with the information of whether each target detection frame contains a human face, can be obtained through a single joint detection model. The image to be identified therefore only needs to be input into the joint detection model once, which avoids repeatedly extracting feature vectors for the regions of the target detection frames in the image to be identified, reduces the required computation, and improves the efficiency of the passenger flow data determination process.

Description

Data processing and model training method, device, equipment and medium
Technical Field
The present invention relates to the field of data analysis technologies, and in particular to a data processing method and a training method of a joint detection model, as well as corresponding devices, equipment and media.
Background
At present, the effectiveness of a device in attracting customers in a scene can be evaluated by analyzing statistical passenger flow data, namely a passenger flow value parameter and an attention number parameter, which provides technical support for merchants to make better business decisions. How to collect passenger flow statistics has therefore received increasing attention in recent years.
In the prior art, when determining the attention number parameter, the position information of each target detection frame containing a human head in an image to be identified is generally obtained through a pre-trained human head detection model, and then, for each target detection frame, a pre-trained face classification model identifies, based on the region of the target detection frame in the image to be identified, whether the target detection frame contains a human face. The identification information of each target detection frame is then determined, and when the identification information of a target detection frame meets a preset update condition and the target detection frame contains a human face, the attention number parameter in the currently stored passenger flow data is updated.
With this method of determining the attention number parameter in the passenger flow data, a large amount of storage space is wasted because the human head detection model and the face classification model must be stored separately. Moreover, in the process of obtaining each target detection frame contained in the image to be identified and then identifying whether each target detection frame contains a human face, the feature vectors of the regions of the target detection frames in the image to be identified are inevitably extracted repeatedly, so the computation required to determine the attention number parameter is large and the efficiency is low.
Disclosure of Invention
The embodiment of the invention provides a data processing method and a training method of a joint detection model, as well as corresponding devices, equipment and media, which are used to solve the problems of heavy computation, low efficiency and large occupied storage space in existing processes for determining passenger flow data.
The embodiment of the invention provides a data processing method, which comprises the following steps:
Acquiring the position information of each target detection frame containing a human head in an image to be identified and the information of whether each target detection frame contains a human face or not through a joint detection model;
and determining passenger flow data according to the position information of the target detection frame and/or whether the target detection frame contains the information of the human face.
The embodiment of the invention provides a training method of a joint detection model, which comprises the following steps:
Acquiring any sample image in a sample set, wherein the sample image is annotated with first position information of each sample head frame and with a first identification value and a second identification value corresponding to each sample head frame, the first identification value identifying whether the sample head frame contains a human head, and the second identification value identifying whether the sample head frame contains a human face;
acquiring second position information of each sample detection frame containing a human head in a sample image and information of whether each sample detection frame contains a human face or not through an original joint detection model;
and training the original joint detection model according to the second position information of each sample detection frame and the first position information of the corresponding sample head frame, the information of whether the sample detection frame contains a human head and the first identification value of the corresponding sample head frame, and the information of whether the sample detection frame contains a human face and the second identification value of the corresponding sample head frame.
The embodiment of the invention provides a data processing device, which comprises:
The acquisition unit is used for acquiring the position information of each target detection frame containing the head of the person in the image to be identified and the information of whether each target detection frame contains the face or not through the joint detection model;
And the processing unit is used for determining the passenger flow data according to the position information of the target detection frame and/or the information of whether the target detection frame contains a human face or not.
The embodiment of the invention provides a training device of a joint detection model, which comprises the following components:
The first acquisition module is used for acquiring any sample image in the sample set, wherein the sample image is annotated with first position information of each sample head frame and with a first identification value and a second identification value corresponding to each sample head frame, the first identification value identifying whether the sample head frame contains a human head, and the second identification value identifying whether the sample head frame contains a human face;
The second acquisition module is used for acquiring second position information of each sample detection frame containing the human head in the sample image and information of whether each sample detection frame contains the human face or not through the original joint detection model;
The training module is used for training the original joint detection model according to the second position information of each sample detection frame and the first position information of the corresponding sample head frame, the information of whether the sample detection frame contains a human head and the first identification value of the corresponding sample head frame, and the information of whether the sample detection frame contains a human face and the second identification value of the corresponding sample head frame.
The embodiment of the invention also provides electronic equipment, which at least comprises a processor and a memory, wherein the processor is used for realizing the steps of the data processing method or the training method of the joint detection model when executing the computer program stored in the memory.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, the computer program realizing the steps of the data processing method or the training method of the joint detection model when being executed by a processor.
According to the embodiment of the invention, the position information of each target detection frame containing a human head in the image to be identified, together with the information of whether each target detection frame contains a human face, can be obtained through a single joint detection model. The image to be identified therefore only needs to be input into the joint detection model once, which avoids repeatedly extracting feature vectors for the regions of the target detection frames in the image to be identified, reduces the required computation, and improves the efficiency of the passenger flow data determination process.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing process according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of each angle value of a face according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a specific data processing flow according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a training process of a joint detection model according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of each key point of a sample human head frame according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a joint detection model according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data processing apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a training device for a joint detection model according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of promoting an understanding of the principles and advantages of the invention, reference will now be made in detail to the drawings and specific examples, some but not all of which are illustrated in the accompanying drawings. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1: fig. 1 is a schematic diagram of a data processing process according to an embodiment of the present invention, including:
s101: and acquiring the position information of each target detection frame containing the head of the person in the image to be identified and the information of whether each target detection frame contains the face or not through the joint detection model.
The embodiment of the invention is applied to electronic equipment, which can be intelligent equipment, such as intelligent robots, intelligent screens, intelligent monitoring systems and the like, and can also be servers and the like.
After the electronic equipment acquires the image to be identified, the image to be identified is analyzed based on the data processing method provided by the embodiment of the invention, so that passenger flow data is determined. The image to be identified may be acquired by the electronic device itself, or may be sent by other image acquisition devices, which is not limited herein.
In order to improve the efficiency of determining passenger flow data, in the embodiment of the invention, a joint detection model is trained in advance. After the electronic equipment acquires the image to be identified, the image to be identified is processed through the joint detection model which is trained in advance, so that the position information of each target detection frame containing the head of a person in the image to be identified and the information of whether each target detection frame contains the face of the person are acquired.
The position information of a target detection frame consists of the coordinates (e.g., pixel coordinates) of the region in the image to be identified where the corresponding human head is located. The region of each target detection frame in the image to be identified can therefore be determined from its position information, for example from the coordinate value of the pixel point at the upper left corner of the target detection frame and the coordinate value of the pixel point at the lower right corner of the target detection frame in the image to be identified.
It should be noted that, whether each obtained target detection frame includes information about a human face may be whether the target detection frame includes an identification value of the human face, for example, the identification value of the human face included in the target detection frame is "1", and the identification value of the human face not included in the target detection frame is "0"; or the probability of whether the target detection frame contains a human face or not.
In one possible implementation, if the joint detection model outputs the probability that each target detection frame contains a face, a decision threshold is preset for deciding whether a target detection frame contains a face. After the probability corresponding to each target detection frame is obtained based on the above embodiment, it is compared with the decision threshold for that target detection frame: if the probability is greater than the decision threshold, the target detection frame is determined to contain a face; otherwise it is determined not to contain a face. For example, if the decision threshold is 0.8 and the probability for a certain target detection frame is 0.9, then 0.9 is greater than 0.8 and the target detection frame is determined to contain a face.
The decision threshold may be set empirically, or set to different values in different scenarios. For example, if high accuracy is required for the information of whether a target detection frame contains a face, the decision threshold can be set larger; if target detection frames containing faces should be recognized as exhaustively as possible, the decision threshold can be set smaller. It can be set flexibly according to actual requirements and is not specifically limited herein.
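For illustration only, the following is a minimal sketch of this post-processing step; the output format (a box with pixel coordinates plus a face probability), the threshold value and all names are assumptions introduced here rather than the patent's actual interface.

    # Minimal sketch of applying a decision threshold to joint-detection outputs.
    # The output format (box coordinates plus a face probability) and all names
    # here are illustrative assumptions, not the patent's actual interface.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Detection:
        box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) pixel coordinates of the head region
        face_prob: float                 # probability that the head region contains a visible face

    def classify_faces(detections: List[Detection], threshold: float = 0.8) -> List[bool]:
        """Return, for each target detection frame, whether it is judged to contain a face."""
        return [d.face_prob > threshold for d in detections]

    # Example: a frame with probability 0.9 exceeds the threshold 0.8, so it counts as a face.
    dets = [Detection((120, 40, 180, 110), 0.9), Detection((300, 60, 350, 120), 0.35)]
    print(classify_faces(dets))  # [True, False]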
S102: and determining passenger flow data according to the position information of the target detection frame and/or whether the target detection frame contains the information of the human face.
After the position information of each target detection frame and the information of whether each target detection frame contains a face are acquired based on the above embodiment, subsequent processing is performed to determine the passenger flow data, that is, to determine the passenger flow value parameter and/or the attention number parameter in the passenger flow data. The attention number parameter counts the number of pedestrians who have looked at the intelligent device. For example, in an office hall, the counted number of pedestrians looking at an intelligent robot can be analyzed to estimate how likely pedestrians in the current scene are to interact with the robot; in a shopping mall, the number of people looking at an intelligent screen can be counted and analyzed to evaluate how effectively advertisements played on the screen attract customers in the current scene. The attention number may be counted periodically, for example per day, per week or per month.
In one possible implementation, since the target detection frame including the head of a person in the image to be identified is generally attributed to pedestrians currently entering the scene, the pedestrians need to be counted in the passenger flow value parameters in the passenger flow data. Therefore, in the embodiment of the invention, corresponding processing can be directly performed according to the acquired position information of the target detection frame to determine the passenger flow data, namely the passenger flow value parameter.
In another possible implementation manner, when the acquired target detection frame contains a human face, the pedestrian to which the target detection frame belongs may be looking at the intelligent device, and the number of times that the pedestrian looks at the intelligent device is counted in the attention number parameter in the passenger flow data. Therefore, in the embodiment of the invention, the passenger flow data can be determined according to whether the target detection frame contains the information of the human face, namely, the attention frequency parameter in the passenger flow data is determined. For example, the attention number parameter in the guest flow value is determined according to the number of target detection frames including the face.
In addition, because the attention number parameter counts the number of pedestrians looking at the intelligent device, under the practical application scene, the situation that the same person looks at the intelligent device all the time or the same person looks at the intelligent device many times may occur. If the attention frequency parameters in the currently stored passenger flow data are updated directly according to the target detection frames containing the human faces in each image to be identified, the same person is counted into the attention frequency parameters for multiple times, so that the counted attention frequency parameters are not high in accuracy. Therefore, in order to improve the accuracy of the determined passenger flow data, in the embodiment of the invention, corresponding processing can be performed according to the position information of the target detection frame and the information of whether the target detection frame contains a face, so as to determine the passenger flow data, namely, determine the attention frequency parameter in the passenger flow data.
Example 2: in order to accurately determine the passenger flow data, on the basis of the above embodiment, in the embodiment of the present invention, determining the passenger flow data includes:
Determining identification information of a target detection frame; if the identification information meets a preset first updating condition and the target detection frame contains a human face, updating the attention frequency parameter in the currently stored passenger flow data.
In order to avoid counting the same person looking at the intelligent device multiple times into the attention number parameter, in the embodiment of the invention, the identification information of each target detection frame is determined after the target detection frame is acquired. To determine the identification information, a tracking queue is preset, which stores tracked head frames and the identification information corresponding to each tracked head frame. After the position information of each target detection frame containing a human head in the image to be identified is obtained, the region of each target detection frame in the image to be identified is extracted according to its position information. The electronic device then performs corresponding processing based on that region and the tracked head frames stored in the current tracking queue to determine the identification information of the target detection frame, and can subsequently determine the passenger flow data based on this identification information. The tracked head frames are the head frames of pedestrians in the surrounding environment that are currently being tracked. The initial tracking queue may be empty, and in the subsequent data processing it is updated in real time according to each target detection frame to which identification information has been assigned.
In one possible implementation, the process of determining the identification information of a target detection frame includes: respectively determining the similarity between the target detection frame and each tracked head frame in the current tracking queue; determining, through the Hungarian algorithm and the similarities, the identification information of the tracked head frame most similar to the target detection frame; if the similarity between any tracked head frame corresponding to that identification information and the target detection frame is greater than a set threshold, determining that identification information to be the identification information of the target detection frame; otherwise, allocating new identification information to the target detection frame and determining the newly allocated identification information to be the identification information of the target detection frame.
When determining the similarity between the target detection frame and each tracked head frame in the current tracking queue, the similarity may be computed as the overlap ratio between the target detection frame and the tracked head frame, that is, the overlap rate between any tracked head frame and the region of the target detection frame in the image to be identified, or as a spatial distance between the region of the target detection frame in the image to be identified and the tracked head frame, such as the Euclidean distance or the Chebyshev distance. The specific way of determining the similarity is not limited herein.
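As a hedged illustration of this matching step, the sketch below computes the overlap ratio (intersection over union) between boxes and solves the assignment with the Hungarian algorithm as implemented by scipy's linear_sum_assignment; the box format, the similarity threshold of 0.3 and all names are assumptions for illustration only.

    # Illustrative sketch: match target detection frames to tracked head frames by
    # overlap ratio (IoU) and the Hungarian algorithm. Box format (x1, y1, x2, y2),
    # the 0.3 threshold and all names are assumptions, not the patent's actual code.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def iou(a, b):
        """Overlap ratio (intersection over union) of two boxes (x1, y1, x2, y2)."""
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter) if inter else 0.0

    def assign_ids(detections, tracks, next_id, sim_threshold=0.3):
        """tracks: dict id -> box. Returns one id per detection and the updated next_id."""
        track_ids = list(tracks)
        ids = [None] * len(detections)
        if track_ids and detections:
            sim = np.array([[iou(d, tracks[t]) for t in track_ids] for d in detections])
            # The Hungarian algorithm maximises total similarity (minimises its negative).
            rows, cols = linear_sum_assignment(-sim)
            for r, c in zip(rows, cols):
                if sim[r, c] > sim_threshold:       # accept the match only above the set threshold
                    ids[r] = track_ids[c]
        for i, d in enumerate(detections):
            if ids[i] is None:                      # unmatched: allocate new identification information
                ids[i] = next_id
                next_id += 1
            tracks[ids[i]] = d                      # update the tracking queue in real time
        return ids, next_id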
It should be noted that, the identification information of the target detection frame is used for uniquely identifying the identity information of the object to which the target detection frame belongs, and the identification information may be numbers, letters, special symbols, character strings, etc., or may be in other forms, so long as the identity information capable of uniquely identifying the object to which the target detection frame belongs can be used as the identification information in the embodiment of the present invention.
In the embodiment of the invention, in order to improve the accuracy of the determined attention frequency parameter, a first updating condition is preset, and the first updating condition may be whether the obtained identification information of the target detection frame is not matched with any statistical identification stored in the current statistical queue. The matching means that the current statistics queue stores the statistics identifier identical to the identifier information of the target detection frame.
Specifically, after determining the identification information of the target detection frame based on the method of the embodiment, matching the identification information of the target detection frame with any one of the statistical identifications stored in the current statistical queue, and if it is determined that the identification information of the target detection frame is not matched with any one of the statistical identifications stored in the current statistical queue, indicating that the identification information of the target detection frame meets a preset first updating condition; otherwise, the identification information of the target detection frame is not satisfied with the preset first updating condition.
Wherein the statistics queue includes identification information of pedestrians in the surrounding environment that have been counted into the attention number parameter. When an initial statistics queue is set, the statistics queue can be empty, and in the subsequent process of determining the attention frequency parameters, the statistics queue is updated in real time according to each target detection frame allocated with the identification information. In specific implementation, when the target detection frame is determined to contain a human face and the identification information of the target detection frame is determined to meet a preset first updating condition, the attention frequency parameter in the currently stored passenger flow data is updated.
Further, after the attention number parameter in the currently stored passenger flow data is updated, the identification information of the target detection frame is added to the current statistics queue, so that target detection frames with the same identification information in subsequent images to be identified are not counted into the attention number parameter again.
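A minimal sketch of this first update condition follows, assuming the statistics queue is held as a set of counted identification values and the passenger flow data as a simple dictionary; the names are hypothetical.

    # Illustrative sketch of the first update condition: the attention count is
    # increased only for frames that contain a face and whose identification
    # information is not yet in the statistics queue. Names are assumptions.
    def update_attention_count(track_id, contains_face, counted_ids, traffic_data):
        """counted_ids: set of identification values already counted (the statistics queue).
        traffic_data: dict holding the currently stored passenger flow data."""
        if contains_face and track_id not in counted_ids:   # preset first update condition
            traffic_data["attention_count"] = traffic_data.get("attention_count", 0) + 1
            counted_ids.add(track_id)                        # update the statistics queue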
In another possible embodiment, the method further comprises: acquiring a first angle identification value corresponding to each target detection frame through a joint detection model, wherein the first angle identification value is used for identifying the angle value of a face in the target detection frame containing the face or identifying that the target detection frame does not contain the face;
Determining that the identification information meets a preset first updating condition comprises the following steps: if the identification information is not matched with any statistical identification stored in the current statistical queue, and the first angle identification value corresponding to the target detection frame corresponding to the identification information meets a preset second updating condition, determining that the identification information meets the first updating condition.
In the practical application process, although a target detection frame in the acquired image to be identified contains a face, the object to which the face belongs is not necessarily looking at the intelligent device; for example, the object to which a target detection frame A containing a face belongs may be looking up at the sky, and the object to which a target detection frame B containing a face belongs may have turned to look elsewhere. In general, however, a face that is looking at the intelligent device has an angle within a certain range. Therefore, in order to further improve the accuracy of the counted attention number parameter, in the embodiment of the present invention, the preset first update condition may also be that the identification information of the target detection frame does not match any statistical identification stored in the current statistics queue and that the angle identification value (recorded as the first angle identification value) of the target detection frame corresponding to the identification information meets the preset second update condition, where the second update condition is that the angle value of the target detection frame corresponding to the identification information is smaller than a preset angle threshold.
The first angle identification value is used for identifying an angle value of a face in a target detection frame containing the face, or identifying that the target detection frame does not contain the face. For example, the first angle identification value corresponding to the target detection frame a including the face is 30, the pitch angle of the face in the target detection frame a is 30 degrees, and the first angle identification value corresponding to the target detection frame B not including the face is null.
It should be noted that the angle value may be a pitch angle value and/or a yaw angle value of the face. Fig. 2 is a schematic diagram of each angle value of a face according to an embodiment of the present invention, where the angle value corresponding to Yaw is a Yaw angle value, and the angle value corresponding to Pitch is a Pitch angle value.
It should be noted that, the first angle identification value used for identifying that the target detection frame does not include a human face may be any angle value greater than 180 degrees, or may be any other form such as a character, so long as the first angle identification value can be distinguished from the angle value of the human face in the target detection frame including the human face, and the identification values for identifying that the target detection frame does not include the human face may all be used in the embodiments of the present invention.
In the embodiment of the invention, the image to be identified is input into the joint detection model, and through the joint detection model, not only the position information of each target detection frame containing the head of a person in the image to be identified and the information of whether each target detection frame contains the face of the person or not can be obtained, but also the first angle identification value corresponding to each target detection frame can be obtained.
For each obtained target detection frame, the identification information of the target detection frame is determined based on the method of the above embodiment according to its position information. It is then judged whether this identification information does not match any statistical identification stored in the current statistics queue, and whether the first angle identification value corresponding to the target detection frame corresponding to this identification information meets the preset second update condition, that is, whether that first angle identification value is smaller than the preset angle threshold. If the identification information does not match any statistical identification stored in the current statistics queue and the first angle identification value corresponding to the target detection frame corresponding to the identification information meets the preset second update condition, it is determined that the identification information meets the first update condition.
The angle value identified by the first angle identification value corresponding to a target detection frame containing a face may include a pitch angle value and a yaw angle value. The preset angle thresholds corresponding to the pitch angle value and the yaw angle value may be the same or different, and can be set flexibly according to requirements without specific limitation herein. In either case, the first angle identification value corresponding to the target detection frame corresponding to the identification information is determined to meet the preset second update condition only when both the pitch angle value and the yaw angle value of the face in that target detection frame are smaller than their respective preset angle thresholds.
For example, the preset angle thresholds corresponding to the pitch angle value and the yaw angle value are respectively 30 and 35, the pitch angle value 6 and the yaw angle value 12 of the face in any target detection frame corresponding to certain identification information a are determined, the pitch angle value 6 of the target detection frame is determined to be smaller than the corresponding angle threshold 30, the yaw angle value 12 of the target detection frame is determined to be smaller than the corresponding angle threshold 35, and then the first angle identification value corresponding to the target detection frame corresponding to the identification information a is determined to meet the preset second updating condition.
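A minimal sketch of this angle check is given below; representing the first angle identification value as a (pitch, yaw) pair, using None to mark a frame without a face, and the 30/35 degree thresholds are assumptions.

    # Illustrative sketch of the second update condition based on face angles.
    # Representing the first angle identification value as a (pitch, yaw) pair,
    # using None for "no face", and the 30/35 degree thresholds are assumptions.
    def angle_within_threshold(angle_id_value, pitch_threshold=30.0, yaw_threshold=35.0):
        """Return True if the frame contains a face whose pitch and yaw are both below their thresholds."""
        if angle_id_value is None:                 # frame does not contain a face
            return False
        pitch, yaw = angle_id_value
        return pitch < pitch_threshold and yaw < yaw_threshold

    print(angle_within_threshold((6.0, 12.0)))     # True: 6 < 30 and 12 < 35
    print(angle_within_threshold((40.0, 12.0)))    # False: pitch exceeds its threshold
    print(angle_within_threshold(None))            # False: no face in the frame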
In practice, an acquired image to be identified may contain the face of a pedestrian who is merely turning their head and is not actually looking at the intelligent device. After the position information of the target detection frame containing this pedestrian's head, the information of whether the frame contains a face and the corresponding first angle identification value are acquired and processed based on the method of the above embodiment, the first update condition may nevertheless be satisfied, making the attention number parameter in the determined passenger flow data inaccurate. Therefore, in order to improve the accuracy of the attention number parameter in the determined passenger flow data, in the embodiment of the present invention, the preset second update condition may be that target detection frames corresponding to a certain piece of identification information exist in a consecutive set number of images to be identified, and that the first angle identification values corresponding to those target detection frames are all smaller than the preset angle threshold. Specifically, determining that the first angle identification value corresponding to the target detection frame corresponding to the identification information meets the preset second update condition includes:
if the target detection frames corresponding to the identification information exist in the continuously set number of images to be identified, and the first angle identification values corresponding to the target detection frames corresponding to the identification information in the continuously set number of images to be identified are smaller than the preset angle threshold, determining that the first angle identification values corresponding to the target detection frames corresponding to the identification information meet the second updating condition.
In specific implementation, after determining the identification information of a certain target detection frame, determining whether the target detection frames corresponding to the identification information exist in the obtained continuous set number of images to be identified, and whether the first angle identification values corresponding to the target detection frames corresponding to the identification information in the continuous set number of images to be identified are smaller than a preset angle threshold value. If yes, the object to which the target detection frame corresponding to the identification information belongs is most likely to be seen to the intelligent device, and the first angle identification value corresponding to the target detection frame corresponding to the identification information is determined to meet the second updating condition; otherwise, the object to which the target detection frame corresponding to the identification information belongs is not seen to the intelligent device, and it is determined that the first angle identification value corresponding to the target detection frame corresponding to the identification information does not meet the second updating condition.
For example, suppose the preset angle thresholds corresponding to the pitch angle value and the yaw angle value are both 30, the set number is 3, and the identification information of a certain target detection frame is determined to be A. The three consecutively acquired images to be identified 1, 2 and 3 each contain a target detection frame with identification information A, with pitch and yaw angle values of 17 and 18 in image 1, 24 and 18 in image 2, and 27 and 28 in image 3, respectively. Since the pitch and yaw angle values corresponding to the target detection frames with identification information A in the 3 consecutive images to be identified are all smaller than the preset angle threshold, the first angle identification value corresponding to the target detection frame with identification information A is determined to meet the second update condition.
Wherein, when setting the set number, different values can be set according to different scenes. The set number may be set smaller in order to count pedestrians looking at the smart device as much as possible, and may be set larger in order to improve the accuracy of the determined attention count parameter. In the implementation process, the method can be flexibly set according to actual requirements, and is not particularly limited herein.
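As a hedged illustration of this refined second update condition, the sketch below keeps a short per-identification history of angle values and tests whether the last consecutive set number of frames all stay below the thresholds; the history structure, the window size of 3 and the names are assumptions.

    # Illustrative sketch of the refined second update condition: the identification
    # must appear in a consecutive set number of frames, with every angle value below
    # the threshold. Per-id angle histories, window size 3 and names are assumptions.
    from collections import defaultdict, deque

    SET_NUMBER = 3
    angle_history = defaultdict(lambda: deque(maxlen=SET_NUMBER))  # id -> recent (pitch, yaw) values

    def second_update_condition_met(track_id, angle_id_value,
                                    pitch_threshold=30.0, yaw_threshold=35.0):
        """Record this frame's angle value and test the consecutive-frame condition."""
        history = angle_history[track_id]
        if angle_id_value is None:        # no face in this frame: the consecutive run is broken
            history.clear()
            return False
        history.append(angle_id_value)
        return (len(history) == SET_NUMBER and
                all(p < pitch_threshold and y < yaw_threshold for p, y in history))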
Example 3: in order to accurately determine the passenger flow value parameter in the passenger flow data, on the basis of the above embodiments, in the embodiment of the present invention, determining the passenger flow data includes:
Determining identification information of a target detection frame; if the identification information of the target detection frame meets a preset third updating condition, updating the passenger flow value parameter in the currently stored passenger flow data.
In general, the images to be identified are acquired by the electronic device at preset time intervals, such as 50 ms or 100 ms, and the interval is not set large, so that images of everyone entering the shooting range can be acquired in real time. In practice, however, the time a person spends from entering the shooting range to leaving it is usually longer than the preset time interval, so multiple acquired images to be identified will all contain target detection frames of the same person. If the passenger flow value parameter in the passenger flow data were determined directly from the number of target detection frames obtained in each image to be identified, the same person would be counted multiple times, so the accuracy of the determined passenger flow value parameter would not be high.
Therefore, in the embodiment of the invention, before the passenger flow value parameter in the passenger flow data is determined, the identification information of the target detection frame is determined. For each target detection frame, determining whether a preset third updating condition is met or not based on the identification information of the target detection frame, so as to determine whether to update the passenger flow value parameter in the currently stored passenger flow data. The specific method for determining the identification information of the target detection frame is the same as that described in the above embodiment, and the repetition is not repeated.
In one possible implementation manner, the preset third updating condition may be whether the identification information of the target detection frame is the same as any one of the identification information stored in the current tracking queue. If the identification information of the target detection frame is identical to any one of the identification information stored in the current tracking queue, which indicates that the object to which the target detection frame belongs is counted into the passenger flow value parameter, the identification information of the target detection frame is determined not to meet a preset third updating condition, and the passenger flow value parameter in the currently stored passenger flow data is not updated. If the identification information of the target detection frame is determined to be different from any one of the identification information stored in the current tracking queue, which indicates that the object to which the target detection frame belongs is not counted into the passenger flow value parameter, the identification information of the target detection frame is determined to meet a preset third updating condition, and the passenger flow value parameter in the currently stored passenger flow data is updated.
In another possible implementation, the images to be identified may be acquired at the entrance of an application scene such as a hall, a shopping mall or a bus. Images acquired at such an entrance contain the target detection frames of objects entering the scene, but can easily also contain the target detection frames of objects merely passing by the entrance. For example, when counting passenger flow data for a bus, the images to be identified are generally acquired at the bus door, and they contain not only the target detection frames of objects boarding the bus but also those of objects walking past the door. If whether the identification information of a target detection frame meets the preset third update condition were determined solely by whether identification information identical to it is stored in the current tracking queue, the determined passenger flow data would easily be inaccurate.
Therefore, the preset third update condition may also be that the number of tracked head frames corresponding to the identification information is greater than a set number threshold, and that the sum of the distances determined from the image information of each pair of adjacent tracked head frames corresponding to the identification information is greater than a set distance threshold. Specifically, after the identification information of the target detection frame is obtained based on the above embodiment, it is judged whether tracked head frames corresponding to this identification information are stored in the current tracking queue. If so, these tracked head frames are obtained, and for every two tracked head frames adjacent in acquisition time, the corresponding moving distance is determined from their image information. The sum of the distances is then computed from the obtained moving distances, and it is judged whether the number of tracked head frames corresponding to the identification information is greater than the number threshold and whether the sum of the distances is greater than the distance threshold.
Further, if the number of the head frames corresponding to the identification information is greater than a number threshold and the sum of the acquired distances is greater than a distance threshold, the identification information is indicated to meet a preset third updating condition, and then the passenger flow value parameter in the currently stored passenger flow data is updated; if the number of the head frames of the tracked person corresponding to the identification information is not greater than a number threshold value or the sum of the acquired distances is not greater than a distance threshold value, which indicates that the identification information does not meet a preset third updating condition, the processing of updating the passenger flow value parameter in the currently stored passenger flow data is not executed.
When determining the moving distance corresponding to two tracked head frames, the distance can be determined from the position information, in the image to be identified, of pixel points at set positions of the two frames, for example the distance between the coordinate values of the pixel points at the upper left corners of the two tracked head frames, the distance between the coordinate values of the pixel points at their lower right corners, or the distance between the coordinate values of the pixel points at the midpoints of their diagonals.
The number threshold is generally no greater than the quotient of the maximum shooting distance within the preset shooting range divided by the product of the average walking speed and the acquisition time interval. To improve the accuracy of the passenger flow value parameter in the determined passenger flow data, the number threshold should not be too small; to avoid failing to count an object because it walks too fast, the number threshold should not be too large. In implementation, it can be set flexibly according to actual requirements and is not specifically limited herein.
The distance threshold is generally no greater than the maximum shooting distance within the preset shooting range. To improve the accuracy of the passenger flow value parameter in the determined passenger flow data, the distance threshold should not be too small; to avoid failing to count an object because it walks too fast, the distance threshold should not be too large. It can likewise be set flexibly according to actual requirements and is not specifically limited herein.
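For illustration, a minimal sketch of this third update condition is given below; measuring the moving distance between box centres, and the particular number and distance thresholds, are assumptions rather than values prescribed by the patent.

    # Illustrative sketch of the third update condition for the passenger flow value
    # parameter: enough tracked head frames for this identification, and a total
    # moving distance above a threshold. Using box centres and these threshold
    # values is an assumption for illustration.
    import math

    def third_update_condition_met(tracked_boxes, number_threshold=5, distance_threshold=200.0):
        """tracked_boxes: head frames (x1, y1, x2, y2) for one identification, ordered by acquisition time."""
        if len(tracked_boxes) <= number_threshold:
            return False
        centres = [((x1 + x2) / 2.0, (y1 + y2) / 2.0) for x1, y1, x2, y2 in tracked_boxes]
        total = sum(math.dist(centres[i], centres[i + 1]) for i in range(len(centres) - 1))
        return total > distance_threshold   # sum of distances between adjacent frames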
Example 4: in order to achieve the above-mentioned embodiments, the method for obtaining the position information of each target detection frame containing a human head in an image to be identified and the information of whether each target detection frame contains a human face only by using a joint detection model includes:
Extracting a network layer through the characteristics of the joint detection model to obtain the characteristic vector of the image to be identified; the feature vectors are respectively input into a position detection layer, a human head classification layer and a human face classification layer of the joint detection model to obtain first position vectors corresponding to detection frames, first human head classification vectors corresponding to the detection frames and first human face classification vectors corresponding to the detection frames which are recognized in the image to be recognized; and acquiring the position information of each target detection frame containing the human head in the image to be identified and the information of whether each target detection frame contains the human face or not based on the first position vector, the first human head classification vector and the first human face classification vector through an output layer of the joint detection model.
In the embodiment of the invention, the network structure of the joint detection model includes a feature extraction network layer, a position detection layer, a human head classification layer, a human face classification layer and an output layer. Fig. 6 is a schematic diagram of the network structure of the joint detection model according to an embodiment of the present invention; the position detection layer, the human head classification layer and the human face classification layer are each connected to the feature extraction network layer and to the output layer. Specifically, after the image to be identified is obtained, it is input into the pre-trained joint detection model, and the feature vector of the image to be identified is obtained through the feature extraction network layer. To keep feature extraction cheap, the feature extraction network layer is generally a network structure with a small amount of computation, few parameters and good quantization behavior, such as FBNet or a quantization-friendly (QF) MobileNet.
In the embodiment of the invention, the feature extraction network layer is connected simultaneously to several network layers, namely the position detection layer, the human head classification layer and the human face classification layer. In implementation, after the feature extraction network layer obtains the feature vector of the image to be identified, the feature vector is input into the position detection layer, the human head classification layer and the human face classification layer of the joint detection model respectively. The position detection layer obtains, based on the feature vector, the first position vector corresponding to each detection frame identified in the image to be identified; the human head classification layer obtains the first human head classification vector corresponding to each detection frame; and the human face classification layer obtains the first face classification vector corresponding to each detection frame. The output layer of the joint detection model then performs corresponding processing based on the obtained first position vector, first human head classification vector and first face classification vector to obtain the position information of each target detection frame containing a human head in the image to be identified and the information of whether each target detection frame contains a human face.
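For orientation, the sketch below shows one possible shape of such a multi-branch network in PyTorch; the stand-in backbone, the channel sizes, the anchor count and the flat output vectors are illustrative assumptions, not the patent's actual architecture.

    # Hedged sketch of the joint detection model's structure: one shared feature
    # extraction network feeding a position detection head, a human-head
    # classification head and a face classification head. The backbone, channel
    # sizes and anchor count are illustrative assumptions only.
    import torch
    import torch.nn as nn

    class JointDetectionModel(nn.Module):
        def __init__(self, num_anchors=100, feat_dim=128):
            super().__init__()
            # Stand-in feature extraction network layer (the patent suggests a light
            # backbone such as FBNet or a quantization-friendly MobileNet).
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            self.position_head = nn.Linear(feat_dim, num_anchors * 4)   # 4 coordinates per detection frame
            self.head_cls_head = nn.Linear(feat_dim, num_anchors * 2)   # head / no head
            self.face_cls_head = nn.Linear(feat_dim, num_anchors * 2)   # face / no face

        def forward(self, image):
            feat = self.backbone(image)                 # shared feature vector of the image
            return (self.position_head(feat),           # first position vector
                    self.head_cls_head(feat),           # first human head classification vector
                    self.face_cls_head(feat))           # first face classification vector

    # Example: a single 3x224x224 image produces the three flat output vectors.
    model = JointDetectionModel()
    pos_vec, head_vec, face_vec = model(torch.randn(1, 3, 224, 224))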
In one possible implementation manner, the obtaining, by the output layer of the joint detection model, the position information of each target detection frame including the head in the image to be identified and the information of whether each target detection frame includes the face based on the first position vector, the first head classification vector, and the first face classification vector includes:
Through the output layer of the joint detection model, the position information corresponding to each detection frame and a first index value are sequentially determined in order from the first position vector, the first index value identifying the position of that detection frame's position information within the first position vector; the information of whether each detection frame contains a human head and a second index value are sequentially determined from the first human head classification vector, the second index value identifying the position of that information within the first human head classification vector; the information of whether each detection frame contains a human face and a third index value are sequentially determined from the first face classification vector, the third index value identifying the position of that information within the first face classification vector; a target second index value corresponding to a target detection frame containing a human head is determined from the second index values, a target first index value identical to the target second index value is determined from the first index values, and a target third index value identical to the target second index value is determined from the third index values; and the position information of the detection frame corresponding to the target first index value and the information of whether the detection frame corresponding to the target third index value contains a face are determined to be the position information of the target detection frame and the information of whether the target detection frame contains a face.
In the embodiment of the present invention, the first position vector includes the position information of each detection frame identified in the image to be identified, and the first number of elements in the first position vector required for determining the position information of each detection frame is preconfigured, for example, 4. Therefore, through the output layer of the joint detection model, the input first position vector can be divided into a plurality of sub position vectors according to a preset first quantity, and an index value (marked as a first index value) corresponding to each sub position vector is determined, wherein a first quantity of elements contained in each sub position vector is position information of one detection frame, and the first index value corresponding to any sub position vector is used for identifying the position of the sub position vector in the first position vector, namely, the position of the position information corresponding to the detection frame in the first position vector.
The first human head classification vector comprises information about whether human heads are contained in each detection frame identified in the image to be identified, and the second number of elements in the first human head classification vector required for determining whether the human heads are contained in each detection frame is also preconfigured. Therefore, through the output layer of the joint detection model, the input first head classification vector can be divided into a plurality of sub head classification vectors according to a preset second number, and an index value (marked as a second index value) corresponding to each sub head classification vector is determined, wherein the second number of elements contained in each sub head classification vector is information about whether a head is contained in one detection frame, and the second index value corresponding to any sub head classification vector is used for identifying the position of the sub head classification vector in the first head classification vector, namely, the position of the information about whether the head is contained in the detection frame in the first head classification vector.
Similarly, the first face classification vector includes the information of whether each detection frame identified in the image to be identified contains a face, and the third number of elements in the first face classification vector required for determining whether each detection frame contains a face is also preconfigured. Therefore, through the output layer of the joint detection model, the input first face classification vector can be divided in turn into a plurality of sub face classification vectors according to the preset third number, and an index value (for convenience of description, denoted as a third index value) corresponding to each sub face classification vector is determined, where the third number of elements included in each sub face classification vector is the information of whether one detection frame contains a face, and the third index value corresponding to any sub face classification vector is used for identifying the position of the sub face classification vector in the first face classification vector, that is, the position, in the first face classification vector, of the information of whether the detection frame contains a face.
For each obtained second index value, if it is determined that the detection frame corresponding to the second index value contains a human head, the second index value is determined as a target second index value, the detection frame corresponding to the target second index value is determined as a target detection frame, a first index value identical to the target second index value (marked as a target first index value) is searched for from the obtained first index values, a third index value identical to the target second index value (marked as a target third index value) is searched for from the obtained third index values, and finally the position information of the detection frame corresponding to the target first index value and the information of whether the detection frame corresponding to the target third index value contains a face are determined as the position information of the target detection frame and the information of whether the target detection frame contains a face.
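A minimal plain-Python sketch of this splitting and index-matching logic is given below, assuming each detection frame contributes four elements to the first position vector and one score element each to the head and face classification vectors (the preconfigured first, second and third numbers); the helper names and the 0.5 decision threshold are assumptions made only for illustration.

```python
from typing import List, Tuple

def split_with_index(vector: List[float], size: int) -> List[Tuple[int, List[float]]]:
    """Divide a flat vector into per-detection-frame sub-vectors and record the
    index value identifying each sub-vector's position in the original vector."""
    return [(i, vector[i * size:(i + 1) * size])
            for i in range(len(vector) // size)]

def match_target_frames(position_vec, head_vec, face_vec,
                        first_number=4, second_number=1, third_number=1):
    positions = dict(split_with_index(position_vec, first_number))   # first index values
    heads = dict(split_with_index(head_vec, second_number))          # second index values
    faces = dict(split_with_index(face_vec, third_number))           # third index values

    targets = []
    for idx, head_score in heads.items():
        if head_score[0] > 0.5:            # target second index value: frame contains a head
            # The target first and third index values equal the target second index value.
            targets.append({
                "position": positions[idx],            # position information of the target frame
                "contains_face": faces[idx][0] > 0.5,  # whether the target frame contains a face
            })
    return targets
```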
In another possible implementation manner, in order to obtain the first angle identification value corresponding to each target detection frame containing a human head in the image to be identified, the joint detection model further includes an angle detection layer, where the angle detection layer is connected to the feature extraction network layer and the output layer in the joint detection model respectively. When the feature extraction network layer in the joint detection model obtains the feature vector of the image to be identified, the feature vector is also input to the angle detection layer of the joint detection model so as to obtain a first angle identification vector corresponding to each detection frame identified in the image to be identified. Therefore, when the feature vector is respectively input to the position detection layer, the head classification layer and the face classification layer of the joint detection model to obtain the first position vector, the first head classification vector and the first face classification vector corresponding to each detection frame identified in the image to be identified, the method further comprises: inputting the feature vector to the angle detection layer of the joint detection model to obtain the first angle identification vector corresponding to each detection frame;
Based on the first position vector, the first human head classification vector and the first human face classification vector, the method for acquiring the position information of each target detection frame containing the human head in the image to be identified and the information of whether each target detection frame contains the human face or not through the output layer of the joint detection model comprises the following steps: and acquiring the position information of each target detection frame containing the head in the image to be identified, the information of whether each target detection frame contains the face or not and the first angle identification value corresponding to each target detection frame based on the first position vector, the first head classification vector, the first face classification vector and the first angle identification vector through an output layer of the joint detection model.
In the embodiment of the invention, after the angle detection layer of the joint detection model acquires the first angle identification vector, the first angle identification vector is also input to the output layer of the joint detection model so as to acquire a first angle identification value corresponding to each target detection frame containing the human head in the image to be identified.
Specifically, the first angle identification vector includes first angle identification values corresponding to the detection frames identified in the image to be identified, and the fourth number of elements in the first angle identification vector required for determining the first angle identification values corresponding to the detection frames is also preconfigured. Therefore, through the output layer of the joint detection model, the input first angle identification vector can be divided into a plurality of sub-angle identification vectors according to a preset fourth number, and an index value (marked as a fourth index value) corresponding to each sub-angle identification vector is determined, wherein the fourth number of elements contained in each sub-angle identification vector is the first angle identification value corresponding to one detection frame, and the fourth index value corresponding to any sub-angle identification vector is used for identifying the position of the sub-angle identification vector in the first angle identification vector, namely, the position of the first angle identification value corresponding to the detection frame in the first angle identification vector.
Further, after the target second index values corresponding to the target detection frames containing a human head are determined from the second index values based on the above embodiment, for each target second index value, a fourth index value identical to the target second index value (denoted as a target fourth index value) is determined from the obtained fourth index values, and the position information of the detection frame corresponding to the target first index value, the information of whether the detection frame corresponding to the target third index value contains a face, and the first angle identification value corresponding to the detection frame corresponding to the target fourth index value are determined as the position information of the target detection frame, the information of whether the target detection frame contains a face, and the first angle identification value corresponding to the target detection frame.
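Building on the sketch above (and reusing its split_with_index helper), the angle detection branch simply adds a fourth set of index values, so a matched target detection frame also carries its first angle identification value; the fourth number of 1 is again an illustrative assumption.

```python
def match_target_frames_with_angle(position_vec, head_vec, face_vec, angle_vec,
                                   first_number=4, second_number=1,
                                   third_number=1, fourth_number=1):
    positions = dict(split_with_index(position_vec, first_number))
    heads = dict(split_with_index(head_vec, second_number))
    faces = dict(split_with_index(face_vec, third_number))
    angles = dict(split_with_index(angle_vec, fourth_number))   # fourth index values

    targets = []
    for idx, head_score in heads.items():
        if head_score[0] > 0.5:                     # target detection frame containing a head
            targets.append({
                "position": positions[idx],
                "contains_face": faces[idx][0] > 0.5,
                "first_angle_id_value": angles[idx][0],   # via the target fourth index value
            })
    return targets
```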
Fig. 3 is a schematic diagram of a specific data processing flow provided in an embodiment of the present invention, where the flow includes:
s301: and respectively acquiring the position information of each target detection frame containing the head of the person in the image to be identified, the information of whether each target detection frame contains the face or not and the first angle identification value corresponding to each target detection frame through the joint detection model.
The number of target detection frames included in the image to be identified acquired in the above step may be plural, and the following processing is performed for each acquired target detection frame:
S302: and determining the identification information of the target detection frame.
If it is determined that the passenger flow data includes the attention number parameter, continuing to execute S303 to S307; if it is determined that the passenger flow data includes the passenger flow value parameter, S308 to S310 are continuously executed.
S303: and judging whether the target detection frame contains a human face, if so, executing S304, otherwise, executing S307.
S304: and judging that the identification information is not matched with any statistical identification stored in the current statistical queue, if yes, executing S305, otherwise, executing S307.
S305: and judging whether target detection frames corresponding to the identification information exist in the continuously set number of images to be identified, and whether first angle identification values corresponding to the target detection frames corresponding to the identification information in the continuously set number of images to be identified are smaller than a preset angle threshold value, if yes, executing S306, otherwise, executing S307.
S306: and updating the attention frequency parameter in the currently stored passenger flow data.
S307: the attention number parameter in the currently stored passenger flow data is not updated.
S308: and judging whether the identification information of the target detection frame is the same as the identification information of any tracking head frame in the current tracking queue, if so, executing S309, otherwise, executing S310.
For each piece of identification information stored in the tracking queue, the motion trail of the object to which the tracking head frame with that identification information belongs is determined according to each image to be identified that contains the tracking head frame with that identification information, thereby realizing short-time tracking.
S309: and updating the passenger flow value parameter in the passenger flow data stored currently.
S310: and not updating the passenger flow value parameter in the currently stored passenger flow data.
Example 5: in order to improve the efficiency of data processing, the embodiment of the invention also provides a training method of the joint detection model. As shown in fig. 4, the method includes:
S401: any sample image in the sample set is obtained, the sample image is marked with first position information of each sample head frame, a first identification value corresponding to each sample head frame and a second identification value, wherein the first identification value is used for identifying whether the sample head frame contains a head, and the second identification value is used for identifying whether the sample head frame contains a face.
S402: and acquiring second position information of each sample detection frame containing the head of the person in the sample image and information of whether each sample detection frame contains the face or not through the original joint detection model.
S403: and training the original joint detection model according to the second position information of the sample detection frame and the first position information of the corresponding sample head frame, whether the sample detection frame contains the first identification value corresponding to the head information and the corresponding sample head frame and whether the sample detection frame contains the second identification value corresponding to the face information and the corresponding sample head frame.
The training method of the joint detection model provided by the embodiment of the invention is applied to the electronic equipment, and the electronic equipment can be a server and the like. The device for training the original joint detection model may be the same as or different from the electronic device for performing data processing in the above embodiment.
In order to improve the efficiency of data processing, the original joint detection model can be trained according to any sample image in a pre-acquired sample set. Each sample head frame in the sample image is marked with its position information (marked as first position information), an identification value (marked as a first identification value) of whether the sample head frame contains a head, and an identification value (marked as a second identification value) of whether the sample head frame contains a face.
The first identification value and the second identification value may be represented by numerals, for example, the second identification value for containing a face is "1" and the second identification value for not containing a face is "0", or they may be represented in other forms such as character strings. However, in order to distinguish the first identification value from the second identification value, their specific content or representation forms are different, for example, the first identification value for containing a head is "a", the second identification value for containing a face is "1", and the second identification value for not containing a face is "0". In specific implementation, this can be set flexibly according to actual requirements and is not specifically limited herein.
In addition, in order to increase the diversity of the sample images, for a plurality of sample images containing the same person, the angles of that person's face in the sample images should be as different as possible; for example, the face of a person x in a sample image a is the front face, the face of the person x in a sample image b is the side face turned 45 degrees to the right, and the face of the person x in a sample image c is the side face turned 45 degrees to the left.
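For reference, the annotations of a single sample image described in S401 might be organised as below; the field names follow the description above, while the concrete structure, coordinate convention and file name are assumptions made only for illustration.

```python
# Hypothetical annotation record for one sample image.
sample_annotation = {
    "image": "sample_a.jpg",
    "sample_head_frames": [
        {
            "first_position": [120, 60, 220, 180],  # first position information (x1, y1, x2, y2 assumed)
            "first_id_value": "a",                  # first identification value: contains a head
            "second_id_value": "1",                 # second identification value: "1" face, "0" no face
        },
        {
            "first_position": [300, 80, 380, 170],
            "first_id_value": "a",
            "second_id_value": "0",                 # head present, but no visible face
        },
    ],
}
```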
In the embodiment of the invention, the position information (marked as second position information) of each sample detection frame containing a human head in the sample image and the information of whether each sample detection frame contains a face can be obtained through the original joint detection model. Specifically, the feature vector of the sample image is obtained through the feature extraction network layer of the original joint detection model, and then the feature vector is respectively input to the position detection layer, the head classification layer and the face classification layer of the original joint detection model to obtain the position vector (marked as a second position vector) corresponding to each detection frame, the head classification vector (marked as a second head classification vector) corresponding to each detection frame and the face classification vector (marked as a second face classification vector) corresponding to each detection frame identified in the sample image. Finally, the second position information of each sample detection frame containing a human head in the sample image and the information of whether each sample detection frame contains a face are acquired through the output layer of the original joint detection model based on the second position vector, the second head classification vector and the second face classification vector.
The original joint detection model is then trained according to the second position information of the sample detection frame and the first position information of the corresponding sample head frame, the information of whether the sample detection frame contains a head and the first identification value corresponding to the corresponding sample head frame, and the information of whether the sample detection frame contains a face and the second identification value corresponding to the corresponding sample head frame.
It should be noted that, for convenience of explanation, the training process of the original joint detection model is described in the embodiment of the present invention by taking the training process for any one acquired sample image as an example. In the actual training process, the above operations are performed for each sample image in the sample set, and when a preset convergence condition is met, training of the joint detection model is determined to be completed.
Meeting the preset convergence condition may mean that the number of sample images in the sample set correctly identified by the joint detection model is greater than a set number, or that the number of iterations of training the joint detection model reaches a set maximum number of iterations, and so on. This can be set flexibly in specific implementation and is not particularly limited herein.
In one possible implementation manner, when the original joint detection model is trained, the sample images in the sample set can be divided into training sample images and test sample images, the original joint detection model is trained based on the training sample images, and then the trained joint detection model is verified based on the test sample images.
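Putting the per-sample training step, the convergence condition and the train/test split together, the outer loop might look roughly like the skeleton below, assuming a PyTorch-style optimizer and a tensor-valued loss; compute_loss and evaluate are hypothetical callables standing in for the forward pass plus loss computation and for the accuracy check on the test sample images.

```python
import random

def train_joint_detection_model(model, sample_set, optimizer, compute_loss, evaluate,
                                max_iterations=100_000, accuracy_target=0.95,
                                eval_every=1_000, train_fraction=0.8):
    random.shuffle(sample_set)
    split = int(train_fraction * len(sample_set))          # assumed 80/20 split
    train_samples, test_samples = sample_set[:split], sample_set[split:]

    for iteration in range(max_iterations):                # convergence: maximum iteration count
        sample_image = random.choice(train_samples)        # any sample image in the sample set
        loss = compute_loss(model, sample_image)           # forward pass + loss on one sample
        optimizer.zero_grad()
        loss.backward()                                    # back-propagate gradients
        optimizer.step()                                   # gradient-descent parameter update

        if (iteration + 1) % eval_every == 0:
            # convergence: enough test sample images correctly identified
            if evaluate(model, test_samples) >= accuracy_target:
                break
    return model
```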
Example 6: in order to obtain an angle identification value corresponding to a target detection frame through a trained joint detection model, whether a person to whom the target detection frame belongs looks at intelligent equipment or not is accurately determined according to the angle value of the face in the target detection frame containing the face, and further accuracy of determined passenger flow data is improved;
the method further comprises the steps of: acquiring a second angle identification value corresponding to the sample detection frame through the original joint detection model, wherein the second angle identification value is used for identifying the angle value of the face in the sample detection frame containing the face or identifying that the sample detection frame does not contain the face;
Training the original joint detection model, further comprising: and training the original joint detection model according to the second angle identification value corresponding to the sample detection frame and the third angle identification value corresponding to the corresponding sample human head frame.
In the above embodiment, when determining whether the identification information of the target detection frame meets the preset first update condition, the first angle identification value corresponding to the target detection frame corresponding to the identification information needs to be acquired. Therefore, the angle identification value (marked as a third angle identification value) corresponding to each sample head frame is also marked in any sample image in the sample set. The third angle identification value is used for identifying the angle value of the face in a sample head frame containing a face, or for identifying that the sample head frame does not contain a face.
In order to further acquire a joint detection model with higher precision, in the embodiment of the invention, the angle value includes at least one of a yaw angle value, a pitch angle value and a roll angle value, for example, the yaw angle value, the pitch angle value and the roll angle value of a face in the sample head frame can be used as supervision information to train the original joint detection model. As shown in fig. 2, the angle identification value corresponding to Roll is a Roll angle value.
After any sample image in the sample set is acquired, the angle identification value (marked as a second angle identification value) corresponding to the sample detection frame in the sample image can also be acquired through the original joint detection model. The second angle identification value is used for identifying the angle value of the human face in the sample detection frame containing the human face or identifying that the sample detection frame does not contain the human face. And when the original joint detection model is trained, training the original joint detection model according to the second angle identification value corresponding to the acquired sample detection frame and the third angle identification value corresponding to the corresponding sample human head frame.
Specifically, the process of obtaining the second angle identification value corresponding to the sample detection frame through the original joint detection model includes: after the feature vector of the sample image is obtained through the feature extraction network layer of the original joint detection model, the feature vector is also input to the angle detection layer of the original joint detection model so as to obtain a second angle identification vector corresponding to each detection frame identified in the sample image. Then, through the output layer of the original joint detection model, the second position information of each sample detection frame containing a human head in the sample image, the information of whether each sample detection frame contains a face, and the second angle identification value corresponding to each sample detection frame are acquired based on the second position vector, the second head classification vector, the second face classification vector and the second angle identification vector.
And training the original joint detection model according to the second angle identification value corresponding to the sample detection frame and the third angle identification value corresponding to the corresponding sample head frame.
Example 7: because the human face in the target detection frame containing the human face necessarily contains key points on the human face, and the position information of the key points on the human face is favorable for determining the angle value of the human face in the target detection frame containing the human face, in order to obtain the key point position vector corresponding to each sample detection frame through the joint detection model, training the original joint detection model according to the sample key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the corresponding pre-marked sample head frame, so as to further improve the accuracy of the trained joint detection model by adding the supervision information for training the joint detection model;
Obtaining a second angle identification value corresponding to each sample detection frame in the sample image through the original joint detection model, and further comprising: acquiring a key point position vector corresponding to each sample detection frame in a sample image through an original joint detection model;
training the original joint detection model, further comprising: and training the original joint detection model according to the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the corresponding sample human head frame.
In the practical application process, a detection frame containing a face necessarily contains key points of the face, and the position information of these key points, for example the positions of key points such as the nose tip, lip peaks and inner eye corners, helps determine the angle value of the face in the detection frame containing the face; the accuracy of the angle value of the face is strongly correlated with the position information of the extracted key points. Therefore, in order to further improve the accuracy of the trained joint detection model, in the embodiment of the invention, the original joint detection model can be trained by taking the position information of the key points of the face in the detection frame containing the face as supervision information.
In the specific implementation process, sample key point position vectors corresponding to each sample head frame are also marked in the sample image, and the sample key point position vectors comprise position information of key points of faces in the sample head frames. In order to obtain the key point position vector corresponding to each detection frame in the sample image through the original joint detection model, the original joint detection model further comprises a key point detection layer. As shown in fig. 6, the keypoint detection layer is respectively connected with the feature extraction network layer and the output layer in the original joint detection model, so that the keypoint position vector corresponding to each sample detection frame in the sample image is obtained through the original detection model, thereby improving the accuracy of training the original joint detection model.
Specifically, the process of obtaining the key point position vector corresponding to each sample detection frame in the sample image through the original joint detection model includes: after the feature vector of the sample image is obtained through the feature extraction network layer of the original joint detection model, the feature vector is also input to the key point detection layer of the original joint detection model so as to obtain the total key point position vector corresponding to each detection frame identified in the sample image. Then, through the output layer of the original joint detection model, the second position information of each sample detection frame containing a human head in the sample image, the information of whether each sample detection frame contains a face, the second angle identification value corresponding to each sample detection frame and the key point position vector corresponding to each sample detection frame are acquired based on the second position vector, the second head classification vector, the second face classification vector, the second angle identification vector and the total key point position vector.
Fig. 5 is a schematic diagram of the key points of a sample head frame provided in an embodiment of the present invention; key points are marked at positions such as the nose tip, lip peaks, mouth corners and inner eye corners of the face in the sample head frame, and the sample key point position vector corresponding to the sample head frame can be determined according to the position information of these key points.
The number of key points marked on a sample head frame containing a face can be set to different values according to different scenes. If it is desired to reduce the time taken to train the original detection model, the number of key points may be set smaller; if a higher-accuracy original detection model is desired, the number of key points may be set larger. However, the number should not be too large; otherwise the calculation amount during training of the original detection model is very large, and a well-trained original detection model is difficult to obtain. Preferably, the number of key points is generally 96, 106, or the like.
And in the subsequent training of the original joint detection model, the original joint detection model can be trained by the obtained key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the corresponding sample human head frame.
In one possible implementation, training the original joint detection model includes:
for each sample detection frame, if the sample detection frame is matched with any sample human head frame, determining a position loss value according to second position information of the sample detection frame and first position information of the matched sample human head frame; determining a first head loss value according to whether the sample detection frame contains head information and a first identification value corresponding to the matched sample head frame; determining a face loss value according to whether the sample detection frame contains the information of the face and a second identification value corresponding to the matched sample head frame; determining an angle loss value according to a second angle identification value corresponding to the sample detection frame and a third angle identification value corresponding to the matched sample head frame; determining a key point loss value according to the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the matched sample human head frame; determining a sub-loss value according to the position loss value, the first head loss value, the face loss value, the angle loss value and the key point loss value; if the sample detection frame is not matched with any sample human head frame, determining a second human head loss value according to whether the sample detection frame contains human head information and a preset first numerical value; determining a sub-loss value according to the second head loss value; and training the original joint detection model according to the sum of the sub-loss values corresponding to each sample detection frame.
In the embodiment of the invention, when the sub-loss value is determined according to the determined position loss value, first head loss value, face loss value, angle loss value and key point loss value, the sub-loss value can be determined directly as the sum of the determined position loss value, first head loss value, face loss value, angle loss value and key point loss value, or the sub-loss value can be determined after performing certain algorithmic processing on these loss values; for example, the product of the position loss value and its corresponding weight value, the product of the first head loss value and its corresponding weight value, the product of the face loss value and its corresponding weight value, the product of the angle loss value and its corresponding weight value, and the product of the key point loss value and its corresponding weight value are determined, and the sub-loss value is determined according to the sum of these products.
Because the training influence of different information on the original joint detection model is different, the weight values respectively corresponding to the position loss value, the first head loss value, the face loss value, the angle loss value and the key point loss value can be the same or different. Each weight value may be set by an adaptive algorithm, or may be set by an artificial experience value, which is not particularly limited herein.
Because the third angle identification value corresponding to a sample head frame that does not contain a face is only used for identifying that the sample head frame does not contain a face, when it is determined that the matched sample head frame does not contain a face, the angle loss value does not need to be determined according to the second angle identification value corresponding to the sample detection frame and the third angle identification value corresponding to the matched sample head frame; the preset second value can be directly determined as the angle loss value. Specifically, determining the angle loss value according to the second angle identification value corresponding to the sample detection frame and the third angle identification value corresponding to the matched sample head frame includes:
if the matched sample head frame does not contain a face, determining a preset second numerical value as the angle loss value.
In order to improve the accuracy of the trained joint detection model, the preset second value is generally a smaller value such as "0", "0.1", and the like.
Similarly, the sample key point position vector corresponding to a sample head frame that does not contain a face is only used for identifying that the sample head frame does not contain a face. Therefore, when it is determined that the sample head frame matched with a certain sample detection frame does not contain a face, the key point loss value does not need to be determined according to the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the matched sample head frame; the preset third value can be directly determined as the key point loss value. Specifically, determining the key point loss value according to the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the matched sample head frame includes: if the matched sample head frame does not contain a face, determining a preset third numerical value as the key point loss value.
In order to improve the accuracy of the trained joint detection model, the third preset value is generally a smaller value such as "0", "0.1", and the like.
In another possible implementation manner, for each sample detection frame, matching the sample detection frame with any sample head frame, if the sample detection frame is not matched with any sample head frame, indicating that the sample detection frame is a detection frame without a head, determining a head loss value (for convenience of description, denoted as a second head loss value) directly according to information about whether the sample detection frame includes a head and a preset first value; and determining a sub-loss value according to the second head loss value.
In order to improve the accuracy of the trained joint detection model, the first value is generally a smaller value such as "0", "0.1", etc. In the embodiment of the present invention, the first value, the second value, and the third value may be the same or different, and are not specifically limited herein.
When the sub-loss value is determined according to the second head loss value, the second head loss value may be directly determined as the sub-loss value, or the sub-loss value may be determined after certain algorithmic processing of the second head loss value, for example according to the product of the second head loss value and its corresponding weight value. The original joint detection model is then trained according to the sum of the obtained sub-loss values so as to update the values of the parameters in the original joint detection model. In a specific implementation, when the original joint detection model is trained according to the sum of the sub-loss values, a gradient descent algorithm can be adopted to back-propagate the gradients of the parameters in the original joint detection model, thereby updating the values of the parameters of the original joint detection model.
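The per-frame sub-loss composition described in this embodiment can be sketched as follows; the weight dictionary, the already-computed individual loss values and the preset first, second and third values are supplied by the caller, and all of the names are illustrative assumptions rather than terms defined by this disclosure.

```python
def sub_loss_for_frame(frame_losses, matched, matched_head_has_face, weights,
                       preset_second_value=0.0, preset_third_value=0.0,
                       second_head_loss=None):
    """frame_losses: position/head/face/angle/keypoint loss values computed
    against the matched sample head frame's annotations."""
    if not matched:
        # Frame matches no sample head frame: only the second head loss value
        # (computed against the preset first value) contributes.
        return weights["head"] * second_head_loss

    # If the matched sample head frame contains no face, the preset second and
    # third values stand in for the angle loss and the key point loss.
    angle_loss = frame_losses["angle"] if matched_head_has_face else preset_second_value
    keypoint_loss = frame_losses["keypoint"] if matched_head_has_face else preset_third_value

    return (weights["position"] * frame_losses["position"]
            + weights["head"] * frame_losses["head"]
            + weights["face"] * frame_losses["face"]
            + weights["angle"] * angle_loss
            + weights["keypoint"] * keypoint_loss)


def total_loss(per_frame_sub_losses):
    # The model is trained on the sum of the sub-loss values over all sample detection frames.
    return sum(per_frame_sub_losses)
```

This weighted form corresponds to the variant in which each loss value is multiplied by its own weight value; setting every weight to 1 recovers the plain sum described first.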
Example 8: fig. 7 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention, where the apparatus includes:
An obtaining unit 71, configured to obtain, through a joint detection model, position information of each target detection frame including a head of a person in an image to be identified, and information whether each target detection frame includes a face of the person;
the processing unit 72 is configured to determine passenger flow data according to the position information of the target detection frame and/or whether the target detection frame contains the information of the face.
In one possible implementation, the processing unit 72 is specifically configured to: determining identification information of a target detection frame; if the identification information meets a preset first updating condition and the target detection frame contains a human face, updating the attention frequency parameter in the currently stored passenger flow data.
In a possible implementation manner, the obtaining unit 71 is further configured to obtain, through the joint detection model, a first angle identification value corresponding to each target detection frame, where the first angle identification value is used to identify an angle value of a face in a target detection frame that includes a face, or identify that the target detection frame does not include a face;
The processing unit 72 is specifically configured to: if the identification information is not matched with any statistical identification stored in the current statistical queue, and the first angle identification value corresponding to the target detection frame corresponding to the identification information meets a preset second updating condition, determining that the identification information meets the first updating condition.
In one possible implementation, the processing unit 72 is specifically configured to: if the target detection frames corresponding to the identification information exist in the continuously set number of images to be identified, and the first angle identification values corresponding to the target detection frames corresponding to the identification information in the continuously set number of images to be identified are smaller than the preset angle threshold, determining that the first angle identification values corresponding to the target detection frames corresponding to the identification information meet the second updating condition.
In a possible embodiment, the obtaining unit 71 is specifically configured to: extracting a network layer through the characteristics of the joint detection model to obtain the characteristic vector of the image to be identified; the feature vectors are respectively input into a position detection layer, a human head classification layer and a human face classification layer of the joint detection model to obtain first position vectors corresponding to detection frames, first human head classification vectors corresponding to the detection frames and first human face classification vectors corresponding to the detection frames which are recognized in the image to be recognized; and acquiring the position information of each target detection frame containing the human head in the image to be identified and the information of whether each target detection frame contains the human face or not based on the first position vector, the first human head classification vector and the first human face classification vector through an output layer of the joint detection model.
In a possible embodiment, the obtaining unit 71 is specifically configured to:
Through the output layer of the joint detection model, sequentially determining, based on the first position vector, the position information corresponding to each detection frame and a first index value, wherein the first index value is used for identifying the position, in the first position vector, of the position information corresponding to the detection frame; based on the first head classification vector, sequentially determining the information of whether each detection frame contains a head and a second index value, wherein the second index value is used for identifying the position, in the first head classification vector, of the information of whether the detection frame contains a head; based on the first face classification vector, sequentially determining the information of whether each detection frame contains a face and a third index value, wherein the third index value is used for identifying the position, in the first face classification vector, of the information of whether the detection frame contains a face; determining a target second index value corresponding to a target detection frame containing a human head from the second index values, determining a target first index value identical to the target second index value from the first index values, and determining a target third index value identical to the target second index value from the third index values; and determining the position information of the detection frame corresponding to the target first index value and the information of whether the detection frame corresponding to the target third index value contains a face as the position information of the target detection frame and the information of whether the target detection frame contains a face.
In a possible implementation manner, the processing unit 72 is further configured to determine identification information of the target detection frame; if the identification information of the target detection frame meets a preset third updating condition, updating the passenger flow value parameter in the currently stored passenger flow data.
Example 9: fig. 8 is a schematic structural diagram of a training device for a joint detection model according to an embodiment of the present invention, where the device includes:
The first obtaining module 81 is configured to obtain any sample image in the sample set, where the sample image is marked with first position information of each sample head frame, a first identification value corresponding to each sample head frame, and a second identification value, where the first identification value is used to identify whether the sample head frame contains a head, and the second identification value is used to identify whether the sample head frame contains a face;
a second obtaining module 82, configured to obtain, through the original joint detection model, second position information of each sample detection frame including a head of a person in the sample image, and information whether each sample detection frame includes a face of the person;
The training module 83 is configured to train the original joint detection model according to the second position information of the sample detection frame and the first position information of the corresponding sample head frame, the information of whether the sample detection frame contains a head and the first identification value corresponding to the corresponding sample head frame, and the information of whether the sample detection frame contains a face and the second identification value corresponding to the corresponding sample head frame.
In a possible implementation manner, the sample image is further marked with a third angle identification value corresponding to each sample head frame, and the third angle identification value is used for identifying an angle value of a face in the sample head frame containing the face or identifying that the sample head frame does not contain the face;
The second obtaining module 82 is further configured to obtain, through the original joint detection model, a second angle identification value corresponding to the sample detection frame, where the second angle identification value is used to identify an angle value of a face in the sample detection frame that includes the face, or identify that the sample detection frame does not include the face;
The training module 83 is further configured to train the original joint detection model according to the second angle identification value corresponding to the sample detection frame and the third angle identification value corresponding to the corresponding sample head frame.
In one possible implementation manner, the sample image is also marked with a sample key point position vector corresponding to each sample human head frame;
The second obtaining module 82 is further configured to obtain, by using the original joint detection model, a key point position vector corresponding to each sample detection frame in the sample image;
the training module 83 is further configured to train the original joint detection model according to the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the corresponding sample human head frame.
In one possible implementation, the training module 83 is specifically configured to:
For each sample detection frame, if the sample detection frame is matched with any sample human head frame, determining a position loss value according to second position information of the sample detection frame and first position information of the matched sample human head frame; determining a first head loss value according to whether the sample detection frame contains head information and a first identification value corresponding to the matched sample head frame; determining a face loss value according to whether the sample detection frame contains face information and a second identification value corresponding to the matched sample head frame; determining an angle loss value according to a second angle identification value corresponding to the sample detection frame and a third angle identification value corresponding to the matched sample human head frame; determining a key point loss value according to the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the matched sample human head frame; determining a sub-loss value according to the position loss value, the first head loss value, the face loss value, the angle loss value and the key point loss value; if the sample detection frame is not matched with any sample human head frame, determining a second human head loss value according to whether the sample detection frame contains human head information and a preset first numerical value; determining a sub-loss value according to the second head loss value; and training the original joint detection model according to the sum of the sub-loss values corresponding to each sample detection frame.
In one possible implementation, the training module 83 is specifically configured to:
if the matched sample head frame does not contain a face, determine a preset second numerical value as the angle loss value.
In one possible implementation, the training module 83 is specifically configured to:
if the matched sample head frame does not contain a human face, determine a preset third numerical value as the key point loss value.
Example 10: fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where, based on the above embodiments, the electronic device includes: the processor 91, the communication interface 92, the memory 93 and the communication bus 94, wherein the processor 91, the communication interface 92 and the memory 93 complete communication with each other through the communication bus 94; the memory 93 has stored therein a computer program which, when executed by the processor 91, causes the processor 91 to perform the steps of:
Acquiring the position information of each target detection frame containing a human head in an image to be identified and the information of whether each target detection frame contains a human face or not through a joint detection model; and determining passenger flow data according to the position information of the target detection frame and/or whether the target detection frame contains the information of the human face.
Since the principle of the electronic device for solving the problem is similar to that of the data processing method, the implementation of the electronic device can refer to the implementation of the method, and the repetition is omitted.
Example 11: fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where, based on the above embodiments, the electronic device includes: processor 1001, communication interface 1002, memory 1003 and communication bus 1004, wherein processor 1001, communication interface 1002, memory 1003 accomplish the mutual communication through communication bus 1004;
The memory 1003 stores a computer program which, when executed by the processor 1001, causes the processor 1001 to perform the steps of: acquiring any sample image in a sample set, wherein the sample image is marked with first position information of each sample head frame, a first identification value corresponding to each sample head frame and a second identification value, wherein the first identification value is used for identifying whether the sample head frame contains a head, and the second identification value is used for identifying whether the sample head frame contains a face; acquiring second position information of each sample detection frame containing a human head in the sample image and information of whether each sample detection frame contains a face through an original joint detection model; and training the original joint detection model according to the second position information of the sample detection frame and the first position information of the corresponding sample head frame, the information of whether the sample detection frame contains a head and the first identification value corresponding to the corresponding sample head frame, and the information of whether the sample detection frame contains a face and the second identification value corresponding to the corresponding sample head frame.
Because the principle of the electronic device for solving the problem is similar to that of the training method of the joint detection model, the implementation of the electronic device can be referred to the implementation of the method, and the repetition is omitted.
The communication bus mentioned for the electronic device in the above embodiments may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figures, but this does not mean that there is only one bus or only one type of bus. The communication interface 1002 is used for communication between the above-described electronic device and other devices. The memory may include a random access memory (RAM) or a non-volatile memory (NVM), for example at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor. The processor may be a general-purpose processor, including a central processing unit, a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
Example 12: on the basis of the above embodiments, the embodiments of the present invention further provide a computer readable storage medium, in which a computer program executable by a processor is stored, which when executed on the processor causes the processor to implement the steps of:
Acquiring the position information of each target detection frame containing a human head in an image to be identified and the information of whether each target detection frame contains a human face or not through a joint detection model; and determining passenger flow data according to the position information of the target detection frame and/or whether the target detection frame contains the information of the human face.
Since the principle of solving the problem by the computer readable storage medium is similar to that of the data processing method in the above embodiment, the specific implementation may refer to the implementation of the data processing method, and the repetition is omitted.
Example 13: on the basis of the above embodiments, the embodiments of the present invention further provide a computer readable storage medium, in which a computer program executable by a processor is stored, which when executed on the processor causes the processor to implement the steps of:
Acquiring any sample image in a sample set, wherein the sample image is marked with first position information of each sample head frame, a first identification value corresponding to each sample head frame and a second identification value, wherein the first identification value is used for identifying whether the sample head frame contains a head, and the second identification value is used for identifying whether the sample head frame contains a face; acquiring second position information of each sample detection frame containing a human head in the sample image and information of whether each sample detection frame contains a face through an original joint detection model; and training the original joint detection model according to the second position information of the sample detection frame and the first position information of the corresponding sample head frame, the information of whether the sample detection frame contains a head and the first identification value corresponding to the corresponding sample head frame, and the information of whether the sample detection frame contains a face and the second identification value corresponding to the corresponding sample head frame.
Since the principle of solving the problem by the computer-readable storage medium is similar to that of the training method of the joint detection model in the above-described embodiment, the specific implementation may refer to the implementation of the training method of the joint detection model, and the repetition is omitted.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (23)

1. A method of data processing, the method comprising:
acquiring, through a joint detection model, position information of each target detection frame containing a human head in an image to be identified, information on whether each target detection frame contains a human face, and a first angle identification value corresponding to each target detection frame, wherein the first angle identification value is used for identifying an angle value of the human face in a target detection frame containing a human face, or for identifying that the target detection frame does not contain a human face;
determining passenger flow data according to the position information of the target detection frame, the information on whether the target detection frame contains a human face, and the first angle identification value corresponding to the target detection frame;
wherein the determining of the passenger flow data comprises:
determining the identification information of the target detection frame;
if the identification information does not match any statistical identification stored in the current statistical queue, and the first angle identification value corresponding to the target detection frame corresponding to the identification information meets a preset second updating condition, determining that the identification information meets a first updating condition;
if the identification information meets the first updating condition and the target detection frame contains a human face, updating the attention frequency parameter in the currently stored passenger flow data.
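As a hedged illustration of the updating logic just recited, the two conditions and the attention-count update could be organized as below; track_id, stats_queue, angle_history, passenger_flow and the final queue update are assumptions made for this sketch, and second_update_condition_met is filled in after claim 2.

```python
def update_passenger_flow(detection, stats_queue, angle_history, passenger_flow):
    """Sketch of the attention-count update in claim 1 (all names assumed)."""
    track_id = detection.track_id  # identification information of the target detection frame

    # First updating condition: the id matches no statistical identification stored in
    # the current statistical queue, and its first angle identification values meet the
    # preset second updating condition (see the sketch after claim 2).
    first_condition = (
        track_id not in stats_queue
        and second_update_condition_met(angle_history[track_id])
    )

    if first_condition and detection.has_face:
        passenger_flow["attention_count"] += 1   # attention frequency parameter
        stats_queue.add(track_id)                # assumed: the id is then recorded in the queue
    return passenger_flow
```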
2. The method of claim 1, wherein determining that the first angle identification value corresponding to the target detection frame corresponding to the identification information meets a preset second update condition comprises:
If target detection frames corresponding to the identification information exist in a set number of consecutive images to be identified, and the first angle identification values corresponding to those target detection frames in the set number of consecutive images to be identified are all smaller than a preset angle threshold, determining that the first angle identification value corresponding to the target detection frame corresponding to the identification information meets the second updating condition.
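Claim 2 pins down the second updating condition: the same identification appears in a set number of consecutive images and every corresponding first angle identification value stays under a preset angle threshold. A minimal sketch, assuming an angle history kept per identification; the threshold and window size are illustrative values, not taken from the patent.

```python
ANGLE_THRESHOLD = 30.0    # preset angle threshold in degrees (value assumed for illustration)
CONSECUTIVE_FRAMES = 5    # continuously set number of images to be identified (value assumed)

def second_update_condition_met(angle_history):
    """angle_history: first angle identification values for one identification,
    one entry per image to be identified, None when no matching frame was found."""
    recent = angle_history[-CONSECUTIVE_FRAMES:]
    if len(recent) < CONSECUTIVE_FRAMES:
        return False
    # The frame must exist in every one of the consecutive images, and every
    # first angle identification value must be below the threshold.
    return all(angle is not None and angle < ANGLE_THRESHOLD for angle in recent)
```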
3. The method according to claim 1, wherein the obtaining, by using a joint detection model, the position information of each target detection frame containing a head of a person in the image to be identified and the information of whether each target detection frame contains a face of the person, includes:
acquiring a feature vector of the image to be identified through a feature extraction network layer of the joint detection model;
inputting the feature vector into a position detection layer, a human head classification layer and a human face classification layer of the joint detection model respectively, to obtain first position vectors, first human head classification vectors and first human face classification vectors corresponding to the detection frames recognized in the image to be identified;
and acquiring, through an output layer of the joint detection model, the position information of each target detection frame containing a human head in the image to be identified and the information on whether each target detection frame contains a human face, based on the first position vector, the first human head classification vector and the first human face classification vector.
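A minimal PyTorch-style sketch of the layout recited in claim 3 — a shared feature extraction network feeding a position detection layer, a human head classification layer and a human face classification layer. The channel counts and layer choices are assumptions for illustration, not the patented architecture.

```python
import torch.nn as nn

class JointDetectionSketch(nn.Module):
    """Illustrative layout only: backbone plus position / head / face branches."""
    def __init__(self, num_anchors: int = 1):
        super().__init__()
        # Feature extraction network layer (backbone; architecture assumed).
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Position detection layer: 4 box coordinates per anchor.
        self.position_head = nn.Conv2d(64, num_anchors * 4, 1)
        # Human head classification layer: head / not-head score per anchor.
        self.head_cls = nn.Conv2d(64, num_anchors * 2, 1)
        # Human face classification layer: face / not-face score per anchor.
        self.face_cls = nn.Conv2d(64, num_anchors * 2, 1)

    def forward(self, image):
        features = self.backbone(image)         # feature vector of the image to be identified
        return (self.position_head(features),   # first position vectors
                self.head_cls(features),        # first human head classification vectors
                self.face_cls(features))        # first human face classification vectors
```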
4. The method according to claim 3, wherein the obtaining, by the output layer of the joint detection model, the position information of each target detection frame containing a human head in the image to be identified and the information of whether each target detection frame contains a human face based on the first position vector, the first human head classification vector, and the first human face classification vector includes:
sequentially determining, by an output layer of the joint detection model, position information corresponding to each detection frame and a first index value based on the first position vector, wherein the first index value is used to identify the position, in the first position vector, of the position information corresponding to the detection frame; sequentially determining, based on the first head classification vector, information on whether each detection frame contains a head and a second index value, wherein the second index value is used to identify the position, in the first head classification vector, of the information on whether the detection frame contains a head; sequentially determining, based on the first face classification vector, information on whether each detection frame contains a face and a third index value, wherein the third index value is used to identify the position, in the first face classification vector, of the information on whether the detection frame contains a face;
determining a target second index value corresponding to a target detection frame containing a human head from the second index values, determining a target first index value identical to the target second index value from the first index values, and determining a target third index value identical to the target second index value from the third index values;
and determining the position information of the detection frame corresponding to the target first index value and the information on whether the detection frame corresponding to the target third index value contains a human face as, respectively, the position information of the target detection frame and the information on whether the target detection frame contains a human face.
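The index bookkeeping of claim 4 amounts to keeping the three output vectors aligned and, for every index whose head classification says "head", selecting the box and the face flag at that same index. A plain-Python sketch under that reading; all names are assumed for the example.

```python
def select_target_frames(positions, head_flags, face_flags):
    """positions[i], head_flags[i] and face_flags[i] share the same index i,
    which plays the role of the first, second and third index values."""
    targets = []
    for second_index, has_head in enumerate(head_flags):
        if not has_head:
            continue  # only detection frames containing a head become target detection frames
        first_index = second_index   # target first index value equals the target second index value
        third_index = second_index   # target third index value likewise
        targets.append((positions[first_index], bool(face_flags[third_index])))
    return targets

# Example: with head_flags = [1, 0, 1], only the frames at indices 0 and 2 are kept.
```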
5. The method of claim 1, wherein the determining passenger flow data further comprises:
and if the identification information of the target detection frame meets a preset third updating condition, updating the passenger flow value parameter in the currently stored passenger flow data.
6. The method of claim 1, wherein the training process of the joint detection model comprises:
Acquiring any sample image in a sample set, wherein the sample image is marked with first position information of each sample head frame, a first identification value, a second identification value and a sample key point position vector corresponding to each sample head frame, the first identification value is used for identifying whether the sample head frame contains a head, and the second identification value is used for identifying whether the sample head frame contains a face;
Acquiring second position information of each sample detection frame containing a human head in the sample image, information of whether each sample detection frame contains a human face or not and a key point position vector corresponding to each sample detection frame through an original joint detection model;
and training the original joint detection model according to the second position information of the sample detection frame and the first position information of the corresponding sample head frame, the information on whether the sample detection frame contains a head and the first identification value corresponding to the corresponding sample head frame, the information on whether the sample detection frame contains a face and the second identification value corresponding to the corresponding sample head frame, and the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the corresponding sample head frame.
7. The method of claim 6, wherein a third angle identification value corresponding to each sample head frame is further marked in the sample image, and the third angle identification value is used for identifying an angle value of a face in a sample head frame containing the face or identifying that the sample head frame does not contain the face;
The method further comprises the steps of: acquiring a second angle identification value corresponding to the sample detection frame through the original joint detection model, wherein the second angle identification value is used for identifying the angle value of the face in the sample detection frame containing the face or identifying that the sample detection frame does not contain the face;
the training of the original joint detection model further comprises: training the original joint detection model according to the second angle identification value corresponding to the sample detection frame and the third angle identification value corresponding to the corresponding sample human head frame.
8. The method of claim 6, wherein training the raw joint detection model comprises:
for each sample detection frame, if the sample detection frame is matched with any sample human head frame, determining a position loss value according to second position information of the sample detection frame and first position information of the matched sample human head frame; determining a first head loss value according to whether the sample detection frame contains head information and a first identification value corresponding to the matched sample head frame; determining a face loss value according to whether the sample detection frame contains the information of the face and a second identification value corresponding to the matched sample head frame; determining an angle loss value according to a second angle identification value corresponding to the sample detection frame and a third angle identification value corresponding to the matched sample head frame; determining a key point loss value according to the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the matched sample human head frame; determining a sub-loss value according to the position loss value, the first head loss value, the face loss value, the angle loss value and the key point loss value; if the sample detection frame is not matched with any sample human head frame, determining a second human head loss value according to whether the sample detection frame contains human head information and a preset first numerical value; determining a sub-loss value according to the second head loss value;
and training the original joint detection model according to the sum of the sub-loss values corresponding to each sample detection frame.
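Read together, claims 8 to 10 describe one composite sub-loss per sample detection frame, summed over all frames. The hedged sketch below only shows how the terms are combined; position_loss, head_loss, face_loss, angle_loss and keypoint_loss stand for unspecified individual loss functions, and the preset first, second and third numerical values are illustrative defaults — all of these are assumptions rather than the patented formulation.

```python
def sample_frame_sub_loss(pred, matched_head_frame,
                          first_value=0.0, second_value=0.0, third_value=0.0):
    """Sub-loss for one sample detection frame (claims 8-10; names and defaults assumed).

    The individual *_loss functions are placeholders and must be supplied by the caller.
    """
    if matched_head_frame is None:
        # Frame matches no sample head frame: only a second head loss against
        # the preset first numerical value.
        return head_loss(pred.head_score, first_value)

    position = position_loss(pred.box, matched_head_frame.box)
    head = head_loss(pred.head_score, matched_head_frame.has_head)
    face = face_loss(pred.face_score, matched_head_frame.has_face)
    # Claims 9 and 10: when the matched sample head frame contains no face,
    # the angle and key point losses fall back to preset constants.
    angle = (angle_loss(pred.angle, matched_head_frame.angle)
             if matched_head_frame.has_face else second_value)
    keypoints = (keypoint_loss(pred.keypoints, matched_head_frame.keypoints)
                 if matched_head_frame.has_face else third_value)
    return position + head + face + angle + keypoints

def total_loss(sub_losses):
    """The model is trained on the sum of the sub-loss values over all sample detection frames."""
    return sum(sub_losses)
```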
9. The method of claim 8, wherein determining the angle loss value based on the second angle identification value corresponding to the sample detection frame and the third angle identification value corresponding to the matched sample human head frame comprises:
and if the matched sample head frame does not contain a face, determining a preset second numerical value as the angle loss value.
10. The method of claim 8, wherein determining the keypoint loss value from the keypoint vector corresponding to the sample detection frame and the sample keypoint vector corresponding to the matched sample human head frame comprises:
and if the matched sample head frame does not contain a human face, determining a preset third numerical value as the key point loss value.
11. The method of claim 7, wherein the angle values comprise at least one of yaw angle values, pitch angle values, and roll angle values.
12. A data processing apparatus, the apparatus comprising:
an acquisition unit, used for acquiring, through a joint detection model, position information of each target detection frame containing a human head in an image to be identified, information on whether each target detection frame contains a human face, and a first angle identification value corresponding to each target detection frame, wherein the first angle identification value is used for identifying an angle value of the human face in a target detection frame containing a human face, or for identifying that the target detection frame does not contain a human face;
a processing unit, used for determining passenger flow data according to the position information of the target detection frame, the information on whether the target detection frame contains a human face, and the first angle identification value corresponding to the target detection frame;
wherein the processing unit is specifically configured to: determine the identification information of the target detection frame; if the identification information does not match any statistical identification stored in the current statistical queue, and the first angle identification value corresponding to the target detection frame corresponding to the identification information meets a preset second updating condition, determine that the identification information meets a first updating condition; and if the identification information meets the first updating condition and the target detection frame contains a human face, update the attention frequency parameter in the currently stored passenger flow data.
13. The apparatus according to claim 12, wherein the processing unit is specifically configured to:
If target detection frames corresponding to the identification information exist in a set number of consecutive images to be identified, and the first angle identification values corresponding to those target detection frames in the set number of consecutive images to be identified are all smaller than a preset angle threshold, determine that the first angle identification value corresponding to the target detection frame corresponding to the identification information meets the second updating condition.
14. The apparatus according to claim 12, wherein the acquisition unit is specifically configured to:
acquiring a feature vector of the image to be identified through a feature extraction network layer of the joint detection model; inputting the feature vector into a position detection layer, a human head classification layer and a human face classification layer of the joint detection model respectively, to obtain first position vectors, first human head classification vectors and first human face classification vectors corresponding to the detection frames recognized in the image to be identified; and acquiring, through an output layer of the joint detection model, the position information of each target detection frame containing a human head in the image to be identified and the information on whether each target detection frame contains a human face, based on the first position vector, the first human head classification vector and the first human face classification vector.
15. The apparatus according to claim 14, wherein the acquisition unit is specifically configured to:
sequentially determining, by an output layer of the joint detection model, position information corresponding to each detection frame and a first index value based on the first position vector, wherein the first index value is used to identify the position, in the first position vector, of the position information corresponding to the detection frame; sequentially determining, based on the first head classification vector, information on whether each detection frame contains a head and a second index value, wherein the second index value is used to identify the position, in the first head classification vector, of the information on whether the detection frame contains a head; sequentially determining, based on the first face classification vector, information on whether each detection frame contains a face and a third index value, wherein the third index value is used to identify the position, in the first face classification vector, of the information on whether the detection frame contains a face; determining a target second index value corresponding to a target detection frame containing a human head from the second index values, determining a target first index value identical to the target second index value from the first index values, and determining a target third index value identical to the target second index value from the third index values; and determining the position information of the detection frame corresponding to the target first index value and the information on whether the detection frame corresponding to the target third index value contains a human face as, respectively, the position information of the target detection frame and the information on whether the target detection frame contains a human face.
16. The apparatus of claim 12, wherein the processing unit is further configured to:
and if the identification information of the target detection frame meets a preset third updating condition, updating the passenger flow value parameter in the currently stored passenger flow data.
17. The apparatus of claim 12, wherein the apparatus further comprises:
The first acquisition module is used for acquiring any sample image in a sample set, wherein the sample image is marked with first position information of each sample head frame, a first identification value corresponding to each sample head frame, a second identification value and a sample key point position vector, the first identification value is used for identifying whether the sample head frame contains a head, and the second identification value is used for identifying whether the sample head frame contains a face;
the second acquisition module is used for acquiring second position information of each sample detection frame containing the head of a person in the sample image, information of whether each sample detection frame contains the face of the person or not and a key point position vector corresponding to each sample detection frame through the original joint detection model;
The training module is used for training the original joint detection model according to the second position information of the sample detection frame and the first position information of the corresponding sample head frame, the information on whether the sample detection frame contains a head and the first identification value corresponding to the corresponding sample head frame, the information on whether the sample detection frame contains a face and the second identification value corresponding to the corresponding sample head frame, and the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the corresponding sample head frame.
18. The apparatus of claim 17, wherein a third angle identification value corresponding to each sample head frame is further marked in the sample image, and the third angle identification value is used for identifying an angle value of a face in a sample head frame containing the face, or identifying that the sample head frame does not contain the face;
the second acquisition module is further configured to acquire, through the original joint detection model, a second angle identification value corresponding to the sample detection frame, wherein the second angle identification value is used for identifying an angle value of the face in a sample detection frame containing a face, or for identifying that the sample detection frame does not contain a face;
The training module is further configured to train the original joint detection model according to the second angle identification value corresponding to the sample detection frame and the third angle identification value corresponding to the corresponding sample human head frame.
19. The apparatus of claim 17, wherein the training module is specifically configured to:
for each sample detection frame, if the sample detection frame is matched with any sample human head frame, determining a position loss value according to second position information of the sample detection frame and first position information of the matched sample human head frame; determining a first head loss value according to whether the sample detection frame contains head information and a first identification value corresponding to the matched sample head frame; determining a face loss value according to whether the sample detection frame contains the information of the face and a second identification value corresponding to the matched sample head frame; determining an angle loss value according to a second angle identification value corresponding to the sample detection frame and a third angle identification value corresponding to the matched sample head frame; determining a key point loss value according to the key point position vector corresponding to the sample detection frame and the sample key point position vector corresponding to the matched sample human head frame; determining a sub-loss value according to the position loss value, the first head loss value, the face loss value, the angle loss value and the key point loss value; if the sample detection frame is not matched with any sample human head frame, determining a second human head loss value according to whether the sample detection frame contains human head information and a preset first numerical value; determining a sub-loss value according to the second head loss value;
and training the original joint detection model according to the sum of the sub-loss values corresponding to each sample detection frame.
20. The apparatus of claim 19, wherein the training module is specifically configured to:
and if the matched sample head frame does not contain a face, determining a preset second numerical value as the angle loss value.
21. The apparatus of claim 19, wherein the training module is specifically configured to:
and if the matched sample head frame does not contain a human face, determining a preset third numerical value as the key point loss value.
22. An electronic device comprising at least a processor and a memory, the processor being adapted to implement the steps of the data processing method according to any of claims 1-11 when executing a computer program stored in the memory.
23. A computer-readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the data processing method according to any of claims 1-11.
CN202010863364.8A 2020-08-25 2020-08-25 Data processing and model training method, device, equipment and medium Active CN111950507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010863364.8A CN111950507B (en) 2020-08-25 2020-08-25 Data processing and model training method, device, equipment and medium

Publications (2)

Publication Number Publication Date
CN111950507A CN111950507A (en) 2020-11-17
CN111950507B (en) 2024-06-11

Family

ID=73367614

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010863364.8A Active CN111950507B (en) 2020-08-25 2020-08-25 Data processing and model training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN111950507B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113822111B (en) * 2021-01-19 2024-05-24 北京京东振世信息技术有限公司 Crowd detection model training method and device and crowd counting method and device
CN113627403B (en) * 2021-10-12 2022-03-08 深圳市安软慧视科技有限公司 Method, system and related equipment for selecting and pushing picture

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222673A (en) * 2019-06-21 2019-09-10 杭州宇泛智能科技有限公司 A kind of passenger flow statistical method based on head detection
CN111274848A (en) * 2018-12-04 2020-06-12 北京嘀嘀无限科技发展有限公司 Image detection method and device, electronic equipment and storage medium
CN111444850A (en) * 2020-03-27 2020-07-24 北京爱笔科技有限公司 Picture detection method and related device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5391951B2 (en) * 2009-09-10 2014-01-15 大日本印刷株式会社 Face detection result analysis system, face detection result analysis apparatus, and computer program
CN103021059A (en) * 2012-12-12 2013-04-03 天津大学 Video-monitoring-based public transport passenger flow counting method
TW201501044A (en) * 2013-06-24 2015-01-01 Utechzone Co Ltd Apparatus, method and computer readable recording medium of generating signal by detecting facial action

Also Published As

Publication number Publication date
CN111950507A (en) 2020-11-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant