CN114373203A - Picture archiving method and device, terminal equipment and computer readable storage medium

Publication number: CN114373203A
Application number: CN202111564466.0A
Authority: CN (China)
Prior art keywords: human body, face, picture, pictures, information
Legal status: Pending
Original language: Chinese (zh)
Inventors: 尹义, 冷鹏宇, 张宁, 刁俊
Current assignee: Shenzhen Intellifusion Technologies Co Ltd
Application filed by Shenzhen Intellifusion Technologies Co Ltd, priority to CN202111564466.0A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/50: Information retrieval of still image data
    • G06F 16/51: Indexing; Data structures therefor; Storage structures
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/22: Matching criteria, e.g. proximity measures


Abstract

The application provides a picture archiving method and apparatus, a terminal device, and a computer-readable storage medium. The method comprises the following steps: acquiring a plurality of pictures to be archived, wherein each picture to be archived comprises a human face and/or a human body; for a picture to be archived including a face, detecting whether the included face image meets a preset quality condition; when a picture to be archived including a first face image meeting the quality condition is detected, extracting face feature information of the first face image; for a picture to be archived including a human body, recognizing the head of the human body through target detection processing to obtain a plurality of pieces of first head key point information; recognizing the human body through bottom-up recognition processing of the picture to be archived to obtain one or more pieces of second human body posture information; and determining the human body feature information of the picture to be archived according to the first head key point information and the second human body posture information. The method and apparatus effectively improve the reliability of picture archiving results.

Description

Picture archiving method and device, terminal equipment and computer readable storage medium
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to a method and an apparatus for archiving an image, a terminal device, and a computer-readable storage medium.
Background
Picture archiving is a process of classifying the snapshot pictures collected by shooting devices into a plurality of archives. The aim of picture archiving is that, after archiving, the pictures in each archive include the same photographic subject. Picture archiving is widely used in fields such as identity recognition and trajectory tracking. For example, when tracking a plurality of target persons, the captured pictures are archived so that the pictures in each archive include the same target person.
An existing picture archiving method generally extracts the face feature information of the persons in the snapshot pictures, calculates the similarity between two snapshot pictures according to the face feature information, and determines whether to group the two snapshot pictures into the same archive according to the magnitude of the similarity. When the face shooting angle changes or the face features are blurred, the reliability of the calculated similarity is low, and the picture archiving effect is poor.
Disclosure of Invention
The embodiment of the application provides a picture archiving method, a picture archiving device, terminal equipment and a computer readable storage medium, and the reliability of picture archiving results can be effectively improved.
In a first aspect, an embodiment of the present application provides a picture archiving method, including: acquiring a plurality of pictures to be archived, wherein each picture to be archived comprises a human face and/or a human body; for a picture to be archived including a face, detecting whether the included face image meets a preset quality condition; when a picture to be archived including a first face image meeting the quality condition is detected, extracting face characteristic information of the first face image; for a picture to be filed including a human body, identifying the head of the human body through target detection processing to obtain a plurality of pieces of first head key point information; identifying the human body by performing bottom-up identification processing on the picture to be filed to obtain one or more second human body posture information; determining the human body characteristic information of the picture to be filed according to the first head key point information and the second human body posture information; filing the pictures to be filed according to the face characteristic information to obtain at least one face file, and filing the pictures to be filed according to the human body characteristic information to obtain at least one human body file; and matching the at least one face file and the at least one body file according to the track information to which the picture to be filed belongs, and combining the matched face file and the matched body file into one file, wherein the two pictures to be filed belonging to the same group of track information comprise the same shooting object.
In the embodiments of the application, the quality of the face image is screened before the face feature information is extracted, and the face feature information is extracted only for the face image meeting the preset quality condition, so that the face similarity in the two images can be calculated more accurately subsequently, and the images are filed according to the face feature information. In addition, the head key point information and the human body posture information of the human body picture are determined by different methods, and the human body characteristic information of the human body picture is determined based on the head key point information and the human body posture information, so that the identification accuracy of the human body characteristic information is improved, and the reliability of picture filing is improved.
In a second aspect, an embodiment of the present application provides a picture archiving apparatus, including: the picture acquiring unit is used for acquiring a plurality of pictures to be archived, and each picture to be archived comprises a human face and/or a human body; the quality detection unit is used for detecting whether the included face image meets a preset quality condition or not for the picture to be archived including the face; the face feature recognition unit is used for extracting face feature information of the first face image when detecting a picture to be filed comprising the first face image meeting the quality condition; a human body feature recognition unit for: for a picture to be filed including a human body, identifying the head of the human body through target detection processing to obtain a plurality of pieces of first head key point information; identifying the human body by performing bottom-up identification processing on the picture to be filed to obtain one or more second human body posture information; determining the human body characteristic information of the picture to be filed according to the first head key point information and the second human body posture information; the first filing unit is used for filing the pictures to be filed according to the face characteristic information to obtain at least one face file, and filing the pictures to be filed according to the human body characteristic information to obtain at least one human body file; and the second filing unit is used for matching the at least one face file and the at least one body file according to the track information to which the picture to be filed belongs, and combining the matched face file and the matched body file into one file, wherein the two pictures to be filed belonging to the same group of track information comprise the same shooting object.
In a third aspect, an embodiment of the present application provides a terminal device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the picture archiving method according to any one of the above first aspects.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the picture archiving method according to any one of the above first aspects.
In a fifth aspect, an embodiment of the present application provides a computer program product, which, when running on a terminal device, causes the terminal device to execute the picture archiving method according to any one of the above first aspects.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be referred to the related description of the first aspect, and are not described herein again.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art based on these drawings without inventive effort.
Fig. 1 is a schematic flowchart of a picture archiving method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of corresponding human key points at the front side according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of corresponding human key points on the left side according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a face angle provided in an embodiment of the present application;
FIG. 5 is a schematic position diagram of a camera provided in an embodiment of the present application;
FIG. 6 is a diagram illustrating an image archiving process according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a picture archiving apparatus according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a terminal device provided in an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining", or "in response to detecting".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used for distinguishing between descriptions and not necessarily for describing or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like in various places throughout this specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless specifically stated otherwise.
Referring to fig. 1, which is a schematic flow chart of a picture archiving method provided in the embodiment of the present application, by way of example and not limitation, the method may include the following steps:
s101, acquiring a plurality of pictures to be archived, wherein each picture to be archived comprises a human face and/or a human body.
In the embodiment of the application, a plurality of pictures to be filed can be shot by different shooting devices, and can also be shot by the same shooting device.
Illustratively, in an application scenario, 10 cameras are installed in a shopping mall, and each camera takes 100 snapshots, so the 10 cameras take 1000 snapshots in total. Some of the 1000 snapshots include only a face portion, some include only a body portion, and some include the entire person (both the face portion and the body portion). These 1000 snapshots constitute the plurality of pictures to be archived. In the subsequent embodiments of the application, a picture to be archived that includes a human face portion is recorded as a face picture, and a picture to be archived that includes a human body portion is recorded as a human body picture.
In order to accurately extract feature information subsequently, the picture including the whole person can be further divided into a face picture including only the face part and a human body picture including only the human body part, and feature extraction is subsequently performed on the divided face picture and the human body picture respectively.
In practical application, the pictures to be archived can be sorted manually, i.e., manually divided into face pictures and human body pictures, or a whole-figure picture can be manually divided into a face picture and a human body picture. A trained recognition model can also be used for sorting the pictures to be archived, which is not specifically limited here.
S102, detecting whether the included face image meets a preset quality condition or not for the picture to be archived including the face.
The quality condition is a preset condition for preliminarily judging the quality of the face image and is used as a basis for preliminarily screening the face image.
In one example, whether the face image meets the quality condition can be determined according to the lighting condition of the image. For example, detecting whether the included face image meets a predetermined quality condition may include:

acquiring a light score of the face image, wherein the light score characterizes the brightness of the face image; and

detecting whether the face image meets the quality condition according to the light score of the face image.

Whether the face image meets the quality condition can be detected in an adaptive manner, for example by comparing the light score of the face image with a light score threshold. If the light score of the face image is greater than or equal to the light score threshold, the face image meets the quality condition; if the light score of the face image is below the light score threshold, the face image does not meet the quality condition. The light score threshold may be a preset value, determined according to at least one of the parameters of the shooting device that captures the picture to be archived or the shooting environment parameters; the embodiment of the present application is not limited thereto.
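As an illustration, a minimal sketch of such a light-score check follows, assuming the score is simply the mean luminance of the face crop scaled to 0-100; the patent does not fix the exact score computation, and the threshold value 40.0 is a placeholder.

```python
import numpy as np

def light_score(gray_face: np.ndarray) -> float:
    # Assumed score: mean luminance of an 8-bit grayscale face crop,
    # scaled to the range 0-100.
    return float(gray_face.mean()) / 255.0 * 100.0

def meets_quality_condition(gray_face: np.ndarray, threshold: float = 40.0) -> bool:
    # Adaptive check described above: compare the light score of the
    # face image with a preset light score threshold.
    return light_score(gray_face) >= threshold
```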
S103, when the picture to be filed including the first face image meeting the quality condition is detected, extracting the face feature information of the first face image.
In one example, when a picture to be archived including a first face image meeting a quality condition is detected, the detection of whether the face image of the picture to be archived meets the quality condition is stopped, and the extraction of the face feature information of the first face image is started.
A predetermined quality condition, such as the light score threshold, is used as a criterion for determining whether to extract face feature information from the first face image. If the first face image meets the predetermined quality condition, the overall quality of the first face image is good enough to meet the quality requirement for face recognition, and the reliability of subsequent picture archiving will not be adversely affected, so the face feature information of the first face image can be extracted. The predetermined quality condition (e.g., the light score threshold) may be set according to experience or experimental data in combination with the actual situation; the embodiment of the present application does not limit the manner or basis of setting the quality condition.
And if the first face image does not accord with the preset quality condition, detecting whether the next picture to be archived, which comprises the face, accords with the quality condition or processing other flows.
In the embodiments of the application, the quality of the face image is screened before the face feature information is extracted, and the face feature information is extracted only for the face image meeting the quality condition, so that the face similarity of the two images can be calculated more accurately subsequently, and the images are filed according to the face feature information.
In this example, the face feature information may be extracted using an existing image feature extraction method, for example by a trained neural network model, by the scale-invariant feature transform (SIFT), or by a histogram of oriented gradients (HOG), which is not specifically limited here.
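For example, a short sketch of the HOG option using scikit-image is given below; the crop size and HOG parameters are illustrative choices, not values from the patent.

```python
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def face_feature_hog(gray_face: np.ndarray) -> np.ndarray:
    # Resize the face crop to a fixed size so descriptors are comparable,
    # then compute a histogram-of-oriented-gradients feature vector.
    face = resize(gray_face, (128, 128))
    return hog(face, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2))
```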
It should be noted that, when a certain picture to be archived includes both a human face and a human body, the picture to be archived is both a human face picture and a human body picture, and it is necessary to extract human face feature information and human body feature information from the picture to be archived, respectively.
In one example, after the facial feature information of one face picture to be archived is extracted, the steps S102-S103 are repeated for the next face picture to be archived until all the face pictures to be archived are processed. Optionally, after processing a face picture to be archived, another flow may be inserted, for example, a flow for processing a body picture to be archived is inserted, and after the other flow is processed, processing of the next face picture to be archived is continued according to S102-S103.
S104, for the picture to be filed including the human body, the head of the human body is identified through target detection processing so as to obtain a plurality of pieces of first head key point information.
And the picture including the human body in the picture to be archived is called a human body picture. In the embodiment of the application, the picture to be archived is archived according to the human face and also according to the human body characteristic information. The human body feature information is a human body feature that characterizes a certain person, such as a human body posture.
The human body posture recognition technology in the prior art mainly comprises two major categories:
one type is a top-down identification method, namely: the approximate position of the human body is positioned, and then the posture is specifically recognized. The most common method is to firstly obtain the position frame of each person in the image by adopting a target detection method, then perform human skeleton key point detection on a single person on the basis of the detection frame, and finally obtain the whole human posture, and the methods mainly comprise CPM, RMPE, mask-RCNN, GRMI and the like.
The other type is the bottom-up recognition method: first find all limbs and key points, then combine them, i.e., detect all key points in an image and cluster the key points into different individuals through an association strategy. Typical representatives: calculating the information of each key point of the human body posture using human-body-posture thermodynamic diagrams (heatmaps) or by regressing key point coordinates; connecting the calculated key points using Part Affinity Fields (PAF); and, when a plurality of persons are recognized, obtaining the human body posture information of each person using a bipartite-graph matching method from graph theory.
However, both of the above methods have the problem of low recognition accuracy. In the embodiment of the present application, when the human body posture is recognized, the head key point information and the human body posture information of the human body picture are determined by different methods, and the human body feature information of the human body picture is determined based on the head key point information and the human body posture information, so that the recognition accuracy of the human body feature information is improved, and the reliability of picture archiving is improved.
In one example, recognizing a head of a human body by a target detection process to obtain a plurality of first head keypoint information includes:
s201, determining one or more head detection frames in the picture to be archived through target detection processing; and
s202, carrying out key point calculation processing on the first head detection frame to obtain a plurality of pieces of first head key point information.
In this example, a first head detection frame of the human body in the image to be processed is acquired by the target detection processing (for example, by Faster-RCNN or SSD); the first head detection frame is a rectangular frame that bounds the head of the corresponding human body as tightly as possible. When only one person is included in the image, one first head detection frame is obtained by the target detection processing; when a plurality of persons are included in the image, a plurality of first head detection frames are obtained. Target detection obtains related information of the targets in the image, including the category (such as person or object) and position of each target as well as the number of targets; the position information is usually represented by a bounding box (detection box). In this embodiment, a bounding box of each human head is obtained by the target detection processing.
Specifically, the target detection method adopted in this embodiment may be a two-stage detection method based on a convolutional neural network, represented by Faster RCNN, R-FCN, and the like; this kind of method first extracts candidate regions through candidate windows and then classifies the corresponding regions with a deep-learning-based classifier. An end-to-end regression method based on deep learning, represented by YOLO, SSD, and the like, may also be adopted; this kind of method divides the image into a number of small grids in advance and extracts features within the grids. In addition, this embodiment may also adopt a conventional detection method to perform the target detection processing, all of which are within the protection scope of the present invention.
After obtaining the first head detection frame, this example may obtain a plurality of pieces of first head key point information through key point calculation processing. The head key point information mainly refers to the position coordinate value of the pixel corresponding to the head key point (i.e., the position coordinate value of the head key point) and the position coordinate values of the pixels in the pixel region corresponding to the head key point. The pixel region corresponding to a head key point refers to all pixels, other than the central pixel, contained in a circular region of radius R centered on the pixel corresponding to the head key point. The value of R is not limited; for example, R may be three times the standard deviation of a Gaussian function, or a certain proportion of the long-side pixels of the image to be processed, such as 1/10.
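A small sketch of the "pixel region" bookkeeping described above, assuming integer pixel coordinates with the origin at the top-left corner of the image:

```python
import numpy as np

def head_keypoint_region(center, R, img_h, img_w):
    # All pixels inside the circle of radius R centered on the key point's
    # pixel, excluding the central pixel itself, as described above.
    cx, cy = center  # integer (x, y) coordinates of the key point pixel
    ys, xs = np.mgrid[0:img_h, 0:img_w]
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 <= R ** 2
    mask[cy, cx] = False  # the key point pixel is stored separately
    return np.argwhere(mask)  # (row, col) coordinates of the region
```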
In one specific example, the keypoint computation process may include the steps of:
s301, calculating each key point information of the human body posture by adopting a human body posture thermodynamic diagram or regression key point coordinates;
s302, acquiring the head posture of the human body according to the calculated information of each key point;
s303, when the head posture of the human body is the front side or the back side, respectively taking the middle points of all sides of the first head detection frame as four head key points;
s304, when the head posture of the human body is the left side, taking the midpoint of the right longitudinal side, the left lower vertex and the midpoint of the upper lateral side of the first head detection frame as three head key points;
s305, when the head posture of the human body is the right side, the middle point of the left longitudinal side, the right lower vertex and the middle point of the upper lateral side of the first head detection frame are taken as three head key points.
Step S301 is executed first: the information of each key point of the human body posture is calculated using a human-body-posture thermodynamic diagram or regression of key point coordinates. Both techniques are well known to those skilled in the art and are not described here in detail. In one example, 16 pieces of key point information (when the human body posture is the front side; the human body key points in this case are shown in fig. 2) or 15 pieces of key point information (when the human body posture is the left side; the head key points in this case are shown in fig. 3, where 1 is the top of the head, 2 is the left ear, and 3 is the chin) are obtained through calculation. The number obtained is related to the calculation model adopted, and the training of the calculation model is related to manual labeling; the specific manner of manual labeling is described in detail later.
Then, step S302 is executed to obtain the head pose of the human body according to the calculated key point information. In this example, the calculated key points can be connected through the PAF; when a plurality of persons are recognized, the bipartite-graph matching method of graph theory (such as the Hungarian algorithm) is adopted to obtain the body posture information of each person, and when only one person is recognized, the posture information is obtained from the connected key points using deep learning. PAF and bipartite-graph matching are well known to those skilled in the art and are not described here. Specifically, when step S301 finds that the head includes four key points, the head posture of the human body is the front or the back; when step S301 finds that the head includes three key points and the key point at a vertex of the head detection frame (i.e., the chin) is at the leftmost side of the frame, the head posture is the left side; when the head includes three key points and the chin key point is at the rightmost side of the frame, the head posture is the right side.
So far, the head pose of each human body in the image to be processed can be obtained, and the head pose comprises the following steps: front, back, left side, or right side. Then, different head postures are respectively processed in different modes: when the head posture of the human body is the front side or the back side, respectively taking the middle points of all sides of the first head detection frame as four first head key points; when the head posture of the human body is the left side, taking the midpoint of the right longitudinal side, the left lower vertex and the midpoint of the upper lateral side of the first head detection frame as three first head key points; and when the head posture of the human body is the right side, taking the midpoint of the left longitudinal side, the right lower vertex and the midpoint of the upper lateral side of the first head detection frame as three first head key points.
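Steps S303-S305 can be written down directly; the sketch below assumes image coordinates with y growing downward and a box given as (x1, y1, x2, y2):

```python
def head_keypoints_from_box(box, pose):
    # Derive head key points from a head detection frame according to the
    # head pose, following steps S303-S305 above.
    x1, y1, x2, y2 = box
    xm, ym = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    if pose in ("front", "back"):
        # midpoints of all four sides -> four head key points
        return [(xm, y1), (xm, y2), (x1, ym), (x2, ym)]
    if pose == "left":
        # midpoint of right vertical side, lower-left vertex, midpoint of top side
        return [(x2, ym), (x1, y2), (xm, y1)]
    if pose == "right":
        # midpoint of left vertical side, lower-right vertex, midpoint of top side
        return [(x1, ym), (x2, y2), (xm, y1)]
    raise ValueError(f"unknown head pose: {pose}")
```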
In another specific example, the keypoint computation process may comprise the following steps:
s311, calculating each key point information of the human body posture by adopting a human body posture thermodynamic diagram or regression key point coordinates;
s312, acquiring the head posture of the human body according to the calculated information of each key point;
s313, when the head posture of the human body is a side face, performing transverse expansion processing on the first head detection frame to obtain an expanded first head detection frame;
s314, when the head posture of the human body is the front side or the back side, respectively taking the middle points of all sides of the first head detection frame as four first head key points;
s315, when the head posture of the human body is the left side, taking the midpoint of the right longitudinal side, the left lower vertex and the midpoint of the upper lateral side of the expanded first head detection frame as three pieces of first preliminary head key point information;
s316, when the head posture of the human body is the right side, the middle point of the left longitudinal side, the right lower vertex and the middle point of the upper lateral side of the expanded first head detection frame are used as three pieces of first preliminary head key point information;
and S317, when the head posture of the human body is the left side or the right side, performing transverse converging and contracting processing corresponding to the transverse expanding processing on the first preliminary head key point information to obtain first head key point information.
Compared with the previous example (S301-S305), step S311, step S312, step S314, step S315, and step S316 correspond to step S301, step S302, step S303, step S304, and step S305, respectively; what is mainly added is step S313 and step S317. When the head pose of the human body is known to be the left side or the right side, step S313 is executed to perform lateral extension processing on the first head detection frame, that is, the first head detection frame is extended in the horizontal direction of the human face by a certain ratio (see fig. 3, where the middle rectangular frame is the first head detection frame and the extended rectangular frame is the extended first head detection frame). The value range of the extension ratio may be 1.2 to 1.5. For example, with the center point of the first head detection frame and its vertical extent kept unchanged, the first head detection frame is laterally extended by 1.2, 1.3, 1.4, or 1.5 times, i.e., the lateral coordinates are extended by 1.2, 1.3, 1.4, or 1.5 times with the center of the first head detection frame as the origin. In this way the extended first head detection frame can completely cover the head of the person, which improves the accuracy of subsequent recognition.
After step S315 or step S316 is completed, three pieces of first preliminary head key point information corresponding to one human body are obtained, and step S317 is then executed: the obtained three pieces of first preliminary head key point information are subjected to lateral convergence processing corresponding to the lateral extension processing of step S313. That is, the vertical coordinates of the three pieces of first preliminary head key point information are unchanged, and the lateral coordinates are converged with the center of the first head detection frame as the origin. For example, when the extension ratio in step S313 is 1.2 times, the convergence ratio is 1/1.2, and the three converged pieces of first preliminary head key point information are taken as the three pieces of first head key point information.
When the head posture of the human body is a side face (not a front face or a back face), the information of the first head key points can be more in line with the actual situation through reasonable extension of the first head detection frame and convergence of the head key points obtained according to the reasonable extension, and finally the recognition accuracy of the human body posture is improved. It should be emphasized again that the keypoint information includes the position coordinate values of all pixels contained in a circular region with radius R, centered on the pixel corresponding to the keypoint.
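A minimal sketch of the extend/converge pair (steps S313 and S317), under the same coordinate assumptions as before; converging with the inverse ratio maps key points taken on the extended frame back into the scale of the original frame:

```python
def extend_box_laterally(box, ratio):
    # Step S313: widen the frame about its center by `ratio`,
    # keeping the vertical extent unchanged.
    x1, y1, x2, y2 = box
    cx = (x1 + x2) / 2.0
    half = (x2 - x1) / 2.0 * ratio
    return (cx - half, y1, cx + half, y2)

def converge_keypoints_laterally(points, cx, ratio):
    # Step S317: converge the preliminary key points horizontally about
    # the frame center by 1/ratio, undoing the lateral extension.
    return [((x - cx) / ratio + cx, y) for x, y in points]
```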
Thus, the first head key point information is known.
And S105, identifying the human body by performing bottom-up identification processing on the picture to be filed to obtain one or more second human body posture information.
The bottom-up recognition process in this embodiment may include the following steps:
(1) calculating each key point information of the human body posture by adopting a human body posture thermodynamic diagram or regression key point coordinates, wherein the specific implementation mode can refer to the step S301, and the information of the step S301 can be directly obtained in a specific example, so that repeated execution is not needed;
(2) The calculated key points are connected using a part affinity field; when a plurality of persons are recognized, the human body posture information of each person is obtained using a bipartite-graph matching method from graph theory (such as the Hungarian algorithm). The specific implementation can refer to step S302, and in a specific example the information of step S302 can be obtained directly, so repeated execution is not needed. Part affinity fields are commonly used in human body posture recognition technology and are a non-parametric representation of torso key point association. They preserve position and orientation information over the support region of a limb. The part affinity field is a two-dimensional vector field for each limb: for each pixel in the region belonging to a limb (arm, leg, or torso), the two-dimensional vector encodes the direction pointing from one part of the limb to the other. Each limb has a corresponding affinity field linking its body parts. For details of part affinity fields, see the article "Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" (https://arxiv.org/abs/1611.08050) or other publications, which are not described herein.
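To make the association step concrete, here is a sketch of the standard PAF line-integral score for one candidate limb between two key points (this scoring rule comes from the cited paper, not from the patent text); high-scoring connections are then resolved per person by bipartite matching:

```python
import numpy as np

def paf_connection_score(paf_x, paf_y, p1, p2, n_samples=10):
    # Sample points along the segment p1 -> p2 and accumulate the dot
    # product between the PAF vectors and the segment's unit direction.
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    v = p2 - p1
    norm = np.linalg.norm(v)
    if norm < 1e-6:
        return 0.0
    u = v / norm  # unit direction of the candidate limb
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = np.rint(p1 + t * v).astype(int)  # assumes points lie inside the maps
        score += paf_x[y, x] * u[0] + paf_y[y, x] * u[1]
    return score / n_samples
```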
It should be noted that, in other embodiments of the present invention, other bottom-up recognition methods may be adopted to obtain the second body posture information, which are all within the protection scope of the present invention.
At this point, the second body posture information is obtained, and then S106 is performed.
And S106, determining the human body characteristic information of the picture to be filed according to the first head key point information and the second human body posture information.
In one example, determining the human body feature information of the picture to be filed according to the first head key point information and the second human body posture information comprises:
s401, extracting second head key point information and first trunk key point information from second human body posture information;
s402, fusing the first head key point information and the second head key point information to obtain fused third head key point information; and
and S403, taking the third head key point information and the first trunk key point information as the human body feature information of the picture to be archived.
In this example, extracting the second head key point information and the first torso key point information from the second human body posture information (S401) is performed first; this is well known to those skilled in the art and is not described here in detail. In this example there are 12 first torso key points, three per limb, as shown in fig. 2. There are 3 second head key points (when the head posture is the left side or the right side, see fig. 3) or 4 (when the head posture is the front side or the back side, see fig. 2). It should be noted that in this embodiment it is necessary to extract not only the position coordinate value of the pixel corresponding to each head key point and each torso key point, but also the position coordinate values of the pixels in the circular region of radius R centered on each head key point and each torso key point.
At this point, the second head key point information is obtained.
Next, the first head key point information and the second head key point information are fused to obtain fused third head key point information (S402).
In this example, the Gaussian distribution value of each pixel of the whole image to be processed corresponding to each second head key point may first be obtained through calculation. The Gaussian distribution value of each pixel corresponding to each fused third head key point is then calculated according to the following formulas, and the calculated Gaussian distribution values are finally converted into the position coordinate value of the pixel corresponding to each fused third head key point and the position coordinate values of the pixels in the circular region of radius R around it.
After the connection problem of the PAF is solved by the bipartite-graph matching method of graph theory, all key point connection vector fields of the human body postures have already been obtained. To improve positioning accuracy and further remove erroneous and redundant connections (which may arise at overlapping or occluded parts of the human body that were relatively difficult to recognize in the previous processing, e.g., step S105), the existing posture information is fused with the head position information of the bounding box, thereby improving the accuracy of information positioning.
The fusion of this embodiment is computed by two formulas that appear only as images (BDA0003421388160000131 and BDA0003421388160000132) in the original publication and cannot be fully recovered here. In general form, the Gaussian distribution value of each pixel of the whole image to be processed corresponding to each fused third head key point is calculated from the Gaussian responses of the corresponding first and second head key points via bilinear interpolation, where f_k(x_i) is the Gaussian distribution value of the i-th pixel corresponding to the fused k-th third head key point, G is a bilinear interpolation function, R is the radius of the pixel region corresponding to a head key point, l_k is the position coordinate value of the pixel corresponding to the k-th second head key point, x_i is the Gaussian distribution value of the i-th pixel corresponding to the k-th second head key point, x_j is the position coordinate value of the j-th pixel corresponding to the k-th first head key point, and L_k is the position coordinate value of the pixel corresponding to the k-th first head key point. The value range of i is 1 to M, where M is the number of pixels in a circular region of radius R; the value range of j is 1 to N, where N is the total number of pixels in the image to be processed. In practical application, the value of R can be determined according to the Gaussian distribution values, which in turn determines the specific value of R used in the previous steps.
In this example, when the head pose of the human body is the front or the back, the number of the first head key points and the second head key points is four, so that the value of k is 1, 2, 3, and 4. When the head posture of the human body is the left side face or the right side face, the number of the first head key points and the second head key points is three, so that the value of k is 1, 2 and 3.
Three or four head key points are thus calculated, yielding the position coordinate value of the pixel corresponding to each fused head key point and the position coordinate values of the pixels in its corresponding region, namely the third head key point information.
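Since the exact fusion formulas are not reproduced above, the sketch below simply averages the two Gaussian response maps and reads the fused key point off the peak; it illustrates the fusion idea, not the patent's precise formula:

```python
import numpy as np

def gaussian_heatmap(h, w, center, sigma):
    # Gaussian response map for one key point at center = (x, y).
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fuse_head_keypoint(hm_first, hm_second):
    # Combine the detection-based (first) and bottom-up (second) response
    # maps, then take the argmax as the fused third head key point.
    fused = 0.5 * (hm_first + hm_second)
    y, x = np.unravel_index(np.argmax(fused), fused.shape)
    return (x, y), fused
```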
Next, S403 is executed, and the third head key point information and the first torso key point information are used as the human body feature information of the picture to be archived.
In S403, the three or four pieces of third head keypoint information, for example, obtained in S402 and the twelve pieces of first torso keypoint information, for example, obtained in S105 are used as the human body feature information of the picture to be archived.
In S104-S106, the head key point information and the human body posture information of the human body picture are determined by different methods, and the human body feature information of the human body picture is determined based on both, so that the recognition accuracy of the human body feature information is improved and the reliability of picture archiving is improved. Specifically, in the embodiment of S104-S106, when the human body posture is recognized, the first head key point information of the image to be processed is obtained through target detection processing and key point calculation processing, the second head key point information and the first torso key point information are obtained through bottom-up recognition processing, the third head key point information is then obtained by fusing the first head key point information and the second head key point information, and finally the third head key point information and the first torso key point information are used as the human body posture recognition result. These embodiments thus effectively combine the data labeling scheme with the human body posture prediction algorithm and draw on the idea of top-down recognition methods, extracting the easily extracted facial features as the main target detection and positioning features, which improves the recognition accuracy of the model.
At this point, face feature information has been extracted from the face pictures to be archived and human body feature information from the human body pictures to be archived.
In the above embodiment, the steps S102 to S106 are performed by dividing the picture to be filed into the picture of the face to be filed and the picture of the body to be filed, but it is understood that the detection may be performed during the processing of the steps S102 to S106 without being classified in advance. For example, after step S101, for each picture to be archived, it is first detected whether the picture to be archived includes a human face and/or a human body, and if the picture to be archived includes a human face, the process proceeds to step S102, and if the picture to be archived includes a human body, the process proceeds to step S104. Also, for a picture including both a human face and a human body, the processing of steps S102 to S103 and the processing of steps S104 to S106 are performed, and the order of performing these two processes is not limited.
S107, filing the plurality of pictures to be filed according to the face characteristic information to obtain at least one face file, and filing the plurality of pictures to be filed according to the human body characteristic information to obtain at least one human body file.
In S107, the face picture and the body picture in the picture to be archived are respectively archived. In one embodiment, the step of archiving the facial picture comprises:
calculating the face feature similarity between the face feature information of every two face pictures; and, if the face feature similarity is greater than a first preset threshold, grouping the two face pictures into the same face archive.
The distance between the face feature information of two face pictures can be calculated using a distance metric such as the Euclidean distance or the Mahalanobis distance, and the distance can then be subtracted from a preset value to obtain the face feature similarity. Alternatively, the face feature similarity between the face feature information of two face pictures can be calculated using similarity measures such as the cosine similarity, the Pearson correlation coefficient, or the Jaccard similarity coefficient. The method for calculating the face feature similarity is not specifically limited in this application.
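For illustration, a sketch using the cosine-similarity option follows; the greedy grouping against one representative feature per archive is an assumption for brevity (the patent only specifies pairwise comparison against the first preset threshold):

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def archive_faces(features, threshold):
    # Assign each picture to the first archive whose representative
    # feature it matches above the threshold; otherwise open a new archive.
    archives, reps = [], []
    for i, f in enumerate(features):
        for k, r in enumerate(reps):
            if cosine_similarity(f, r) > threshold:
                archives[k].append(i)
                break
        else:
            archives.append([i])
            reps.append(f)
    return archives
```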
In practical applications, it often happens that a shot picture of a human face is a side face. Because the side face contains less face feature information than the front face, the face feature similarity between the two side face pictures may be greater than the face feature similarity between the two front face pictures. The face feature similarity calculated in this case is inaccurate.
To solve the above problem, in one embodiment, the step of archiving the face picture includes:
calculating respective image quality values of a plurality of face pictures; calculating the face fusion similarity between every two face pictures according to the image quality value and the face characteristic information; and filing the plurality of face pictures according to the face fusion similarity to obtain at least one face file.
The face picture usually has a plurality of states, including angle, picture size, whether wearing a mask, picture definition and the like. And taking each state of the face picture as a quality parameter for evaluating the image quality, counting the parameter value of each quality parameter, and calculating the image quality value of the face picture according to the parameter values.
Taking the angle as an example, see fig. 4, which is a schematic diagram of a face angle provided in the embodiment of the present application. As shown in fig. 4, the face angles include roll, pitch, and yaw: roll is rotation about the front-to-back axis of the head (tilting the head sideways), pitch is rotation about the left-to-right axis (nodding up and down), and yaw is rotation about the vertical axis (turning left and right). These three angles can represent the rotation of the face relative to the camera.
Optionally, the manner of calculating the image quality value includes:
for each face picture, acquiring respective parameter values of a plurality of image quality parameters of the face picture; and weighting and summing the parameter values to obtain the image quality value of the face picture.
Specifically, the image quality value can be calculated by the formula

Q = Σ_{i=1}^{n} ω_i · x_i

where Q represents the image quality value, n is the number of quality parameters, x_i is the parameter value of the i-th quality parameter, and ω_i is the weight corresponding to the parameter value of the i-th quality parameter.
Taking the angle of the above-mentioned embodiment of fig. 4 as an example, three angle values of roll, pitch and yaw may be respectively used as a parameter value. For the picture size, the length and the width of the picture can be respectively used as a parameter value; the area of the picture may also be taken as a parameter value. For whether or not to wear the mask, the parameter values may be set for the case of wearing the mask and the case of not wearing the mask, for example, the parameter value for the case of wearing the mask is set to 1, and the parameter value for the case of not wearing the mask is set to 0. For picture sharpness, the resolution of a picture may be taken as a parameter value. It should be noted that, besides several quality parameters listed in the embodiment of the present application, other quality parameters that affect the picture quality may also be selected, and are not specifically limited herein. The angle value of the face picture can be obtained by identifying the existing face angle identification model, and is not particularly limited herein. The face wearing mask condition can be identified by the existing face mask identification model, and is not specifically limited herein.
The larger the influence of a certain quality parameter on the image quality, the larger the corresponding weight. For example: the angle of the face in the picture has a large influence on the image quality, and the size of the picture has a small influence on the image quality, so that the weight corresponding to the angle is increased, and the weight corresponding to the size of the picture is reduced.
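The weighted sum above is straightforward; a one-function sketch, assuming the parameter values have already been normalized to a comparable scale:

```python
def image_quality_value(param_values, weights):
    # Q = sum_i w_i * x_i over the quality parameters
    # (angle, picture size, mask flag, sharpness, ...).
    assert len(param_values) == len(weights)
    return sum(w * x for x, w in zip(param_values, weights))
```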
In order to improve the calculation accuracy of the image quality value, the weight may be continuously learned. For example, the weights are learned using the feature similarity between pictures and the parameter values of a plurality of quality parameters for each picture.
By the method, the picture quality factor of the face picture is considered in the calculation of the face feature similarity, and the accuracy of the face similarity is effectively improved.
Based on the description of the image quality value, optionally, the calculation method of the face fusion similarity may include: and weighting and summing the face feature similarity between every two face pictures and the respective image quality values of every two face pictures to obtain the face fusion similarity between every two face pictures.
Optionally, another calculation method of the face fusion similarity includes:
calculating the similarity of the human face features between every two human face pictures according to the human face feature information; dividing a plurality of face pictures into a plurality of picture groups according to the image quality values; acquiring a preset coefficient matrix, wherein the coefficient matrix comprises a weight coefficient between picture groups to which every two human face pictures belong; and calculating the face fusion similarity between every two face pictures according to the face feature similarity between every two face pictures and the weight coefficient.
The picture group may be divided in the following manner: presetting a division range of an image quality value; and dividing the face picture corresponding to the image quality value in the division range into a picture group.
Illustratively, assume the image quality values are divided into the ranges 0–50, 50–80, and 80–100. The image quality values of face pictures A, B, C, and D are 30, 60, 70, and 90, respectively. Then face picture A belongs to the first picture group, face pictures B and C belong to the second picture group, and face picture D belongs to the third picture group. It should be noted that the above is only an example of picture group division, and the division ranges of the image quality value are not specifically limited.
In the embodiment of the application, the coefficient matrix may be manually set in advance, may also be calculated according to actual experience, and may also be continuously adjusted along with the actual application process.
In the embodiment of the application, the face similarity and the weight coefficient can be multiplied to obtain the face fusion similarity. Illustratively, assuming there are A, B, C face pictures, the image quality values of A, B, C are 40, 70, and 90, respectively. The three face pictures are divided into two picture groups according to the image quality value, specifically, the face pictures with the image quality value larger than 50 are divided into one picture group (high-quality picture group), and the face pictures with the image quality value smaller than 50 are divided into one picture group (low-quality picture group). I.e., a belongs to a low quality group of pictures and B and C belong to a high quality group of pictures. In the preset coefficient matrix, the weight coefficient between the high-quality picture group and the low-quality picture group is 0.9, the weight coefficient between the high-quality picture group and the high-quality picture group is 0.4, and the weight coefficient between the low-quality picture group and the low-quality picture group is 0.5.
Calculating the face fusion similarity between A and B: calculate the face feature similarity between A and B; the weight coefficient between the low-quality picture group to which A belongs and the high-quality picture group to which B belongs is 0.9; multiply the face feature similarity between A and B by 0.9 to obtain the face fusion similarity between A and B.

Calculating the face fusion similarity between A and C: calculate the face feature similarity between A and C; the weight coefficient between the low-quality picture group to which A belongs and the high-quality picture group to which C belongs is 0.9; multiply the face feature similarity between A and C by 0.9 to obtain the face fusion similarity between A and C.

Calculating the face fusion similarity between B and C: calculate the face feature similarity between B and C; the weight coefficient between the high-quality picture group to which B belongs and the high-quality picture group to which C belongs is 0.4; multiply the face feature similarity between B and C by 0.4 to obtain the face fusion similarity between B and C.
As the above example shows, correcting the face feature similarity within each image quality interval unifies the measurement of face feature similarity across the different quality intervals. The face pictures can therefore be archived with a single uniform threshold, avoiding unreasonable archiving results caused by inconsistent similarity scales.
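A minimal sketch of this grouping-and-weighting scheme follows. The cosine similarity over normalized feature vectors, the two-group split at a quality value of 50, and all names (quality_group, COEFF, face_fusion_similarity) are illustrative assumptions rather than the patent's prescribed implementation:

```python
import numpy as np

def quality_group(quality_value):
    """Map an image quality value (0~100) to a picture group index."""
    return 1 if quality_value > 50 else 0  # 1: high-quality group, 0: low-quality group

# COEFF[i][j] is the weight coefficient between group i and group j,
# matching the illustrative values above (low-low 0.5, low-high 0.9, high-high 0.4).
COEFF = np.array([[0.5, 0.9],
                  [0.9, 0.4]])

def face_fusion_similarity(feat_a, feat_b, quality_a, quality_b):
    """Cosine face feature similarity corrected by the inter-group weight."""
    feat_a = feat_a / np.linalg.norm(feat_a)
    feat_b = feat_b / np.linalg.norm(feat_b)
    feature_sim = float(feat_a @ feat_b)
    weight = COEFF[quality_group(quality_a), quality_group(quality_b)]
    return feature_sim * weight
```

With the example values, a pair drawn from A (quality 40) and B (quality 70) is corrected by 0.9, while a pair drawn from B and C is corrected by 0.4.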
In one embodiment, the step of archiving the human body picture comprises:
calculating the human body feature similarity between every two human body pictures according to their respective human body feature information; if the human body feature similarity is greater than a second preset threshold, the two human body pictures are summarized into the same human body file.
In practical application, however, two different persons may wear similar clothes, which may result in a high human body feature similarity between their human body pictures and an inaccurate final archiving result.
To solve the above problem, in one embodiment, the step of archiving the human body picture includes:
calculating the human body feature similarity between every two human body pictures according to the human body feature information; calculating the spatio-temporal similarity between every two human body pictures; calculating the human body fusion similarity between every two human body pictures according to the human body feature similarity and the spatio-temporal similarity; and archiving the plurality of human body pictures according to the human body fusion similarity to obtain at least one human body file.
The spatio-temporal similarity includes similarity in time information and similarity in space information. The time in the embodiment of the application may refer to the time at which a shooting device captures the target object and obtains the human body picture. Since an actual scene may contain multiple shooting devices installed at different positions, there is an actual distance between any two shooting devices, and this actual distance constitutes the spatial information.
For example, refer to fig. 5, which is a schematic position diagram of shooting devices provided in an embodiment of the present application. As shown in fig. 5, the position points of cameras A and B are acquired: the position point of camera A is the intersection O1 of the center of the field of view of camera A and the center line of the road it covers, and the position point of camera B is the intersection O2 of the center of the field of view of camera B and the center line of the road it covers. The actual distance between cameras A and B is the actual distance from O1 to O2 (line segments O1M, MN, and NO2 shown in fig. 5).
Optionally, the calculation method of the spatio-temporal similarity includes:
calculating the actual distance between the shooting devices corresponding to every two human body pictures; calculating the time similarity between the respective shooting times of every two human body pictures; and calculating the spatio-temporal similarity between every two human body pictures according to the actual distance and the time similarity.
The actual distance may be calculated as follows: determine the position points of the two shooting devices in an application scene map, determine a path between the two position points in the map, calculate the actual length of the path, and take the actual length as the actual distance.
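One hedged sketch of this path-length computation models the scene map as a weighted graph and uses networkx's shortest_path_length; the node names and segment lengths are made-up stand-ins for the layout of fig. 5:

```python
import networkx as nx

def actual_distance(road_graph, point_a, point_b):
    """Actual length of the path between two device position points in a scene map."""
    return nx.shortest_path_length(road_graph, point_a, point_b, weight="length")

# Hypothetical layout mirroring fig. 5; segment lengths (metres) are illustrative.
g = nx.Graph()
g.add_edge("O1", "M", length=120.0)
g.add_edge("M", "N", length=80.0)
g.add_edge("N", "O2", length=150.0)
print(actual_distance(g, "O1", "O2"))  # 350.0
```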
The time similarity may be calculated as follows: multiply the respective shooting times of every two human body pictures to obtain the time similarity between them. Alternatively, a similarity between the two shooting times may be computed as the time similarity; for example, the difference between the two shooting times may be taken as the time similarity, or the cosine similarity between the two shooting times may be used. Of course, other similarity measures may also be adopted, such as the Euclidean distance or the Mahalanobis distance, which are not specifically limited here.
Optionally, one implementation of calculating the spatio-temporal similarity between two human body pictures according to the actual distance and the time similarity is as follows:
normalize the calculated actual distance into the interval [0, 1], and subtract the normalized distance from 1 to obtain the spatial similarity; then multiply the spatial similarity and the time similarity between every two human body pictures to obtain the spatio-temporal similarity between them.
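The following sketch illustrates one such reading; the normalization bound and the exponential time-decay form are assumptions (the text above leaves the exact time similarity open):

```python
import math

def spatial_similarity(distance_m, max_distance_m=5000.0):
    """1 minus the actual distance normalized into [0, 1]; closer devices score higher."""
    return 1.0 - min(distance_m, max_distance_m) / max_distance_m

def time_similarity(t1_s, t2_s, scale_s=600.0):
    """One possible time similarity: exponential decay of the shooting-time gap."""
    return math.exp(-abs(t1_s - t2_s) / scale_s)

def spatiotemporal_similarity(distance_m, t1_s, t2_s):
    """Product of spatial and time similarity between two human body pictures."""
    return spatial_similarity(distance_m) * time_similarity(t1_s, t2_s)
```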
One implementation of calculating the human body fusion similarity between every two human body pictures according to the human body feature similarity and the spatio-temporal similarity is as follows: multiply the human body feature similarity between every two human body pictures by the spatio-temporal similarity to obtain the human body fusion similarity between them.
Further, archiving the plurality of human body pictures according to the human body fusion similarity to obtain at least one human body file includes: if the human body fusion similarity between two human body pictures is greater than the second preset threshold, summarizing the two human body pictures into the same human body file.
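Because this merge rule is transitive (if A matches B and B matches C, all three land in one file), it can be sketched with a disjoint-set (union-find) structure; fusion_sim and threshold are placeholder names, not terms from the patent:

```python
def archive_by_threshold(n_pictures, fusion_sim, threshold):
    """Group picture indices whose pairwise fusion similarity exceeds a threshold.

    fusion_sim(i, j) -> float. Returns a list of archives (lists of indices).
    """
    parent = list(range(n_pictures))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for i in range(n_pictures):
        for j in range(i + 1, n_pictures):
            if fusion_sim(i, j) > threshold:
                union(i, j)

    archives = {}
    for i in range(n_pictures):
        archives.setdefault(find(i), []).append(i)
    return list(archives.values())
```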
In S107, a community-based graph partitioning method (e.g., the Infomap or Louvain algorithm) may further be used for post-processing to improve the accuracy of archiving.
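If such post-processing is adopted, one possible sketch builds a similarity graph and runs Louvain community detection via networkx's louvain_communities (available in networkx 2.8 and later); the edge threshold is an assumed parameter, not one specified above:

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities  # networkx >= 2.8

def refine_archives(n_pictures, fusion_sim, edge_threshold=0.5):
    """Post-process archives via community detection on a similarity graph."""
    g = nx.Graph()
    g.add_nodes_from(range(n_pictures))
    for i in range(n_pictures):
        for j in range(i + 1, n_pictures):
            s = fusion_sim(i, j)
            if s > edge_threshold:  # keep only sufficiently similar pairs as edges
                g.add_edge(i, j, weight=s)
    # Each detected community becomes one refined archive.
    return [sorted(c) for c in louvain_communities(g, weight="weight", seed=0)]
```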
In S108, the at least one face file and the at least one human body file are matched according to the track information to which the pictures to be archived belong, and the matched face file and human body file are combined into one file, wherein two pictures to be archived that belong to the same group of track information include the same shooting object.
In one embodiment, the implementation of matching the face profile and the body profile includes:
for any human body file, respectively calculate a matching value between the human body file and each face file, and combine the face file corresponding to the maximum matching value with the human body file into one file.
The matching value represents the number of human body pictures in the human body file that belong to the track information corresponding to the face file.
Optionally, the face files corresponding to the N largest matching values may also be combined with the human body file into one file. In practical application, the value of N can be chosen according to the required archiving accuracy: the larger N is, the lower the archiving accuracy; the smaller N is, the higher the archiving accuracy.
In the embodiment of the application, track information can be used when matching human body files with face files. A plurality of snapshot pictures of the same shooting object captured by a single shooting device belong to the same group of track information. The track information can be obtained using existing object tracking technology, which is not limited here.
In practical application, a shooting device can perform tracking shooting of a shooting object by using a tracking algorithm. During tracking shooting, a plurality of snapshot pictures are obtained, and these snapshot pictures form a group of track information. The snapshot pictures of one group of track information may include both face pictures and human body pictures; a face picture and a human body picture belonging to the same group of track information are matched pictures.
Illustratively, a human body file A is matched against each face file. Assume there are two face files B and C, where the human body file A includes 10 human body pictures, and the face files B and C each include 10 face pictures. 8 human body pictures in the human body file A match 8 face pictures in the face file B, that is, 8 human body pictures in A belong to the track information corresponding to B, so the matching value between A and B is 8. 3 human body pictures in the human body file A match 3 face pictures in the face file C, that is, 3 human body pictures in A belong to the track information corresponding to C, so the matching value between A and C is 3. The face file corresponding to the maximum matching value (i.e., 8) is combined with the human body file A into one file; that is, the face file B and the human body file A are combined into one file.
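The matching-value counting in this example might be sketched as follows; representing each picture by a track ID is an assumed simplification of "group of track information":

```python
def match_value(body_tracks, face_tracks):
    """Number of body pictures whose track ID also appears in the face file."""
    face_set = set(face_tracks)
    return sum(1 for t in body_tracks if t in face_set)

# Hypothetical track IDs chosen to reproduce the matching values 8 and 3 above.
body_a = ["t1"] * 8 + ["t2"] * 3   # track IDs of the pictures in body file A
face_b = ["t1"] * 10               # pictures in face file B share track t1 with A
face_c = ["t2"] * 3 + ["t3"] * 7   # 3 pictures in face file C share track t2 with A

scores = {"B": match_value(body_a, face_b), "C": match_value(body_a, face_c)}
best = max(scores, key=scores.get)  # "B", matching value 8
# Body file A is merged with face file B, as in the example above.
```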
In the above embodiment, the pictures to be archived are divided in advance into face pictures and human body pictures before steps S102 to S106 are performed. It is understood, however, that the detection may instead be performed during the processing of steps S102 to S106 without pre-classification. For example, after step S101, for each picture to be archived, it is first detected whether the picture includes a human face and/or a human body; if it includes a human face, the process proceeds to step S102, and if it includes a human body, the process proceeds to step S104. It should be noted that a picture including both a human face and a human body undergoes both the processing of steps S102 to S103 and the processing of steps S104 to S106, and the order in which these two processes are performed is not limited.
Fig. 6 is a schematic diagram of a picture archiving process provided in an embodiment of the present application. As shown in fig. 6, the face fusion similarity is obtained from the picture quality and the face picture features (i.e., the face feature information), and the face pictures are archived according to the face fusion similarity to obtain face files. The human body fusion similarity is obtained from the spatio-temporal similarity and the human body picture features (i.e., the human body feature information), and the human body pictures are archived according to the human body fusion similarity to obtain human body files. Finally, file aggregation is performed on the face files and the human body files according to the track information.
In this way, human body feature information is considered in addition to the face feature information; moreover, picture quality is considered during face archiving and spatio-temporal information is considered during human body archiving. This increases the dimensionality of the feature information, avoids inaccurate similarity caused by relying on feature information of a single dimension, and effectively improves the reliability of the picture archiving result.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Fig. 7 is a block diagram of a picture archiving apparatus according to the embodiment of the present application, which corresponds to the picture archiving method according to the foregoing embodiment, and only the parts related to the embodiment of the present application are shown for convenience of illustration.
Referring to fig. 7, the apparatus includes:
the picture acquiring unit 71 is configured to acquire a plurality of pictures to be archived, where each picture to be archived includes a human face and/or a human body;
a quality detection unit 72, configured to detect whether a face image included in a picture to be archived, which includes a face, meets a predetermined quality condition;
a face feature recognition unit 73, configured to extract face feature information of the first face image when detecting a to-be-archived picture including the first face image that meets the quality condition;
a human feature recognition unit 74 for: for a picture to be filed including a human body, identifying the head of the human body through target detection processing to obtain a plurality of pieces of first head key point information; identifying the human body by performing bottom-up identification processing on the picture to be filed to obtain one or more second human body posture information; determining the human body characteristic information of the picture to be filed according to the first head key point information and the second human body posture information;
a first filing unit 75, configured to file the multiple pictures to be filed according to the facial feature information to obtain at least one facial file, and file the multiple pictures to be filed according to the human body feature information to obtain at least one human body file;
and a second filing unit 76, configured to match the at least one face archive and the at least one body archive according to the track information to which the picture to be filed belongs, and combine the matched face archive and body archive into one archive, where two pictures to be filed belonging to the same group of track information include the same shooting object.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
The picture archiving apparatus shown in fig. 7 may be a software unit, a hardware unit, or a combined software and hardware unit built into an existing terminal device, may be integrated into the terminal device as an independent component, or may exist as a separate terminal device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Fig. 8 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 8, the terminal device 8 of this embodiment includes: at least one processor 80 (only one shown in fig. 8), a memory 81, and a computer program 82 stored in the memory 81 and executable on the at least one processor 80, the processor 80 implementing the steps in any of the various picture archiving method embodiments described above when executing the computer program 82.
The terminal device can be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. The terminal device may include, but is not limited to, a processor and a memory. Those skilled in the art will appreciate that fig. 8 is merely an example of the terminal device 8 and does not constitute a limitation of the terminal device 8, which may include more or fewer components than those shown, or combine some components, or use different components, such as an input-output device, a network access device, and the like.
The Processor 80 may be a Central Processing Unit (CPU), and may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 81 may in some embodiments be an internal storage unit of the terminal device 8, such as a hard disk or a memory of the terminal device 8. In other embodiments, the memory 81 may also be an external storage device of the terminal device 8, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the terminal device 8. Further, the memory 81 may also include both an internal storage unit and an external storage device of the terminal device 8. The memory 81 is used for storing an operating system, an application program, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer program. The memory 81 may also be used to temporarily store data that has been output or is to be output.
The embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in the above-mentioned method embodiments.
The embodiments of the present application further provide a computer program product which, when run on a terminal device, enables the terminal device to implement the steps in the above method embodiments.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to an apparatus/terminal device, a recording medium, a computer Memory, a Read-Only Memory (ROM), a Random-Access Memory (RAM), an electrical carrier signal, a telecommunications signal, and a software distribution medium, for example, a USB flash disk, a removable hard disk, a magnetic disk, or an optical disk. In certain jurisdictions, in accordance with legislation and patent practice, the computer-readable medium may not be an electrical carrier signal or a telecommunications signal.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the above-described embodiments of the apparatus/terminal device are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A picture archiving method, comprising:
acquiring a plurality of pictures to be archived, wherein each picture to be archived comprises a human face and/or a human body;
for a picture to be archived including a face, detecting whether the included face image meets a preset quality condition;
when a picture to be archived including a first face image meeting the quality condition is detected, extracting face characteristic information of the first face image;
for a picture to be filed including a human body, identifying the head of the human body through target detection processing to obtain a plurality of pieces of first head key point information;
identifying the human body by performing bottom-up identification processing on the picture to be filed to obtain one or more second human body posture information;
determining the human body characteristic information of the picture to be filed according to the first head key point information and the second human body posture information;
filing the pictures to be filed according to the face characteristic information to obtain at least one face file, and filing the pictures to be filed according to the human body characteristic information to obtain at least one human body file;
and matching the at least one face file and the at least one body file according to the track information to which the picture to be filed belongs, and combining the matched face file and the matched body file into one file, wherein the two pictures to be filed belonging to the same group of track information comprise the same shooting object.
2. The picture archiving method according to claim 1, wherein said recognizing the head of the human body by the object detection process to obtain a plurality of first head keypoint information comprises:
determining one or more head detection frames in the picture to be archived through target detection processing; and
performing key point calculation processing on the first head detection frame to obtain the plurality of pieces of first head key point information.
3. The picture archiving method according to claim 1, wherein said determining the body characteristic information of the picture to be archived according to the first head key point information and the second body pose information comprises:
extracting second head key point information and first trunk key point information from the second human body posture information;
fusing the first head key point information and the second head key point information to obtain fused third head key point information; and
taking the third head key point information and the first trunk key point information as the human body characteristic information of the picture to be filed.
4. The picture archiving method according to claim 1, wherein the key point calculation process includes:
calculating each piece of key point information of the human body posture by using a human body posture heatmap or by regressing key point coordinates;
acquiring the head posture of the human body according to the calculated key point information;
when the head posture of the human body is the front side or the back side, respectively taking the middle points of all the edges of the head detection frame as four head key points;
when the head posture of the human body is the left side, taking the midpoint of the right longitudinal side, the left lower vertex and the midpoint of the upper lateral side of the head detection frame as three head key points;
and when the head posture of the human body is the right side, taking the midpoint of the left longitudinal side, the right lower vertex and the midpoint of the upper lateral side of the head detection frame as three head key points.
5. The picture archiving method according to claim 1, wherein the archiving the plurality of pictures to be archived according to the facial feature information to obtain at least one facial archive comprises:
calculating respective image quality values of a plurality of face pictures, wherein the face pictures are to-be-archived pictures including faces;
calculating the face fusion similarity between every two face pictures according to the image quality value and the face feature information;
and archiving the face pictures among the plurality of pictures to be archived according to the face fusion similarity to obtain the at least one face file.
6. The picture archiving method according to claim 5, wherein said calculating the face fusion similarity between every two face pictures according to the image quality values and the face feature information comprises:
calculating the face feature similarity between every two face pictures according to the face feature information;
dividing the plurality of face pictures into a plurality of picture groups according to the image quality values;
acquiring a preset coefficient matrix, wherein the coefficient matrix comprises a weight coefficient between the picture groups to which every two face pictures belong;
and calculating the face fusion similarity between every two face pictures according to the face feature similarity and the weight coefficient.
7. The picture archiving method according to claim 1, wherein the archiving the plurality of pictures to be archived according to the human body characteristic information to obtain at least one human body archive comprises:
calculating the human body feature similarity between every two human body pictures according to the human body feature information, wherein the human body pictures are pictures to be archived that include a human body;
calculating the spatio-temporal similarity between every two human body pictures;
calculating the human body fusion similarity between every two human body pictures according to the human body feature similarity and the spatio-temporal similarity;
and archiving the human body pictures in the plurality of pictures to be archived according to the human body fusion similarity to obtain the at least one human body file.
8. A picture archiving apparatus, comprising:
the picture acquiring unit is used for acquiring a plurality of pictures to be archived, and each picture to be archived comprises a human face and/or a human body;
the quality detection unit is used for detecting whether the included face image meets a preset quality condition or not for the picture to be archived including the face;
the face feature recognition unit is used for extracting face feature information of the first face image when detecting a picture to be filed comprising the first face image meeting the quality condition;
a human body feature recognition unit for: for a picture to be filed including a human body, identifying the head of the human body through target detection processing to obtain a plurality of pieces of first head key point information; identifying the human body by performing bottom-up identification processing on the picture to be filed to obtain one or more second human body posture information; determining the human body characteristic information of the picture to be filed according to the first head key point information and the second human body posture information;
the first filing unit is used for filing the pictures to be filed according to the face characteristic information to obtain at least one face file, and filing the pictures to be filed according to the human body characteristic information to obtain at least one human body file;
and the second filing unit is used for matching the at least one face file and the at least one body file according to the track information to which the picture to be filed belongs, and combining the matched face file and the matched body file into one file, wherein the two pictures to be filed belonging to the same group of track information comprise the same shooting object.
9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN202111564466.0A 2021-12-20 2021-12-20 Picture archiving method and device, terminal equipment and computer readable storage medium Pending CN114373203A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111564466.0A CN114373203A (en) 2021-12-20 2021-12-20 Picture archiving method and device, terminal equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111564466.0A CN114373203A (en) 2021-12-20 2021-12-20 Picture archiving method and device, terminal equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN114373203A true CN114373203A (en) 2022-04-19

Family

ID=81139969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111564466.0A Pending CN114373203A (en) 2021-12-20 2021-12-20 Picture archiving method and device, terminal equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN114373203A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115017359A (en) * 2022-05-27 2022-09-06 浙江大华技术股份有限公司 Method and device for searching picture and electronic equipment


Similar Documents

Publication Publication Date Title
CN109657631B (en) Human body posture recognition method and device
CN108960211B (en) Multi-target human body posture detection method and system
Sidla et al. Pedestrian detection and tracking for counting applications in crowded situations
CN103207898B (en) A kind of similar face method for quickly retrieving based on local sensitivity Hash
CN109635686B (en) Two-stage pedestrian searching method combining human face and appearance
CN109685013B (en) Method and device for detecting head key points in human body posture recognition
Leibe et al. An implicit shape model for combined object categorization and segmentation
CN109559330B (en) Visual tracking method and device for moving target, electronic equipment and storage medium
Liu et al. A contrario comparison of local descriptors for change detection in very high spatial resolution satellite images of urban areas
US20220180534A1 (en) Pedestrian tracking method, computing device, pedestrian tracking system and storage medium
CN108564598B (en) Improved online Boosting target tracking method
CN113160276B (en) Target tracking method, target tracking device and computer readable storage medium
Denman et al. Multi-spectral fusion for surveillance systems
CN111709317B (en) Pedestrian re-identification method based on multi-scale features under saliency model
CN113255608A (en) Multi-camera face recognition positioning method based on CNN classification
CN113918510A (en) Picture archiving method and device, terminal equipment and computer readable storage medium
CN109146913B (en) Face tracking method and device
CN114333023A (en) Face gait multi-mode weighting fusion identity recognition method and system based on angle estimation
CN114373203A (en) Picture archiving method and device, terminal equipment and computer readable storage medium
CN111104857A (en) Identity recognition method and system based on gait energy diagram
Deboeverie et al. Face analysis using curve edge maps
Ren et al. SAR image matching method based on improved SIFT for navigation system
Yu et al. Research on video face detection based on AdaBoost algorithm training classifier
CN113837091A (en) Identification method, identification device, electronic equipment and computer-readable storage medium
CN112884804A (en) Action object tracking method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination