CN112241666A - Target identification method, device and storage medium - Google Patents

Target identification method, device and storage medium Download PDF

Info

Publication number
CN112241666A
CN112241666A (application CN201910650008.5A)
Authority
CN
China
Prior art keywords
similarity
face
target
picture
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910650008.5A
Other languages
Chinese (zh)
Inventor
Huang Yaohai (黄耀海)
Tan Cheng (谭诚)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to CN201910650008.5A priority Critical patent/CN112241666A/en
Publication of CN112241666A publication Critical patent/CN112241666A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides a target identification method, apparatus, and storage medium. The method and apparatus fully consider the differences between the target pictures of different targets in a target picture library and set a separate similarity-comparison threshold for the target pictures of each target, thereby improving the accuracy of target identification.

Description

Target identification method, device and storage medium
Technical Field
The present disclosure relates to the field of target recognition, and in particular, to a technique for setting a suitable similarity threshold for a target picture in a target picture library to perform accurate target recognition.
Background
Currently, target recognition techniques, especially face recognition techniques, are widely used in daily life and work. For example, in crowded scenes such as sports games, concerts, and train stations, where the number of people may reach tens or even hundreds of thousands, face recognition can assist staff in tasks such as security checks. As another example, in an office access-control application, face recognition can be used to determine who is entering the office.
A face recognition process generally includes preprocessing (e.g., face detection and facial feature point detection), face feature extraction, and postprocessing (e.g., face feature matching and ranking). In the face feature matching of the postprocessing step, the face features of the picture to be recognized are compared for similarity against the face features of the pictures stored in the face picture library; if the matching similarity is higher than a preset similarity threshold, the corresponding picture in the library is taken as the recognition result for the picture to be recognized.
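The matching step described above can be sketched as follows. This is a minimal illustration of the conventional approach with a single fixed threshold; the cosine-similarity metric, gallery layout, and threshold value are illustrative assumptions, not the patent's implementation.

```python
# Minimal sketch of conventional face feature matching with one fixed
# threshold. The feature vectors, gallery structure, and threshold
# value are illustrative assumptions, not the patent's implementation.
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_fixed_threshold(query_feat, gallery, threshold=0.5):
    """gallery: dict mapping a person id to that person's feature vector."""
    best_id, best_sim = None, -1.0
    for pid, feat in gallery.items():
        sim = cosine_similarity(query_feat, feat)
        if sim > best_sim:
            best_id, best_sim = pid, sim
    if best_sim > threshold:
        return best_id, best_sim
    return None, best_sim  # rejected: treated as "not in the library"
```

The single `threshold` parameter here is exactly what the next section identifies as problematic: one value must serve every person in the library.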
In current face feature matching, a single fixed threshold is usually set for the entire face picture library. In this case, if the similarity threshold is set too high, a face picture to be recognized may be rejected even though it belongs to a person stored in the library; if it is set too low, a face picture of a person not in the library may be falsely recognized.
Disclosure of Invention
Because current target identification methods set a similarity threshold only for the target picture library as a whole, without considering the differences between targets, the identification results are not good enough. The present disclosure is directed to solving this technical problem and avoiding the loss of recognition accuracy caused by an unreasonably set similarity threshold.
According to an aspect of the present disclosure, there is provided a target identification method, the method including: performing similarity matching between a target picture to be identified and the target pictures in a target picture library; determining, from the target picture library, a target picture satisfying a similarity matching condition, and determining the similarity threshold of that target picture; and determining a target identification result according to whether the similarity between the target picture to be identified and the target picture satisfying the similarity matching condition is higher than the determined similarity threshold; wherein a similarity threshold is set for the target picture of each of at least some of the targets in the target picture library.
According to another aspect of the present disclosure, there is provided a target identification apparatus, the apparatus including: a similarity matching unit configured to perform similarity matching between a target picture to be identified and the target pictures in a target picture library; a threshold setting unit configured to set a similarity threshold for the target picture of each of at least some of the targets in the target picture library; a threshold determination unit configured to determine, from the target picture library, a target picture satisfying a similarity matching condition and to determine the similarity threshold of that target picture; and a target identification unit configured to determine a target identification result according to whether the similarity between the target picture to be identified and the target picture satisfying the similarity matching condition is higher than the determined similarity threshold.
According to another aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the steps of the above-mentioned object recognition method.
Other features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the disclosure and together with the description of the embodiments, serve to explain the principles of the disclosure.
Fig. 1(a) to 1(c) show recognition results at different similarity thresholds.
Fig. 2(a) and 2(b) show examples of the distribution of human face features in the feature space.
Fig. 3 shows an example of setting a similarity density threshold for face pictures of respective persons in the face picture library.
Fig. 4 illustrates a hardware environment in which the face recognition system of the present disclosure operates.
Fig. 5 is a schematic diagram of a face recognition flow according to a first exemplary embodiment of the present disclosure.
Fig. 6 shows examples of a face picture library for training and a face picture library for query.
Fig. 7(a) and 7(b) show an example of the similarity comparison process.
Fig. 8 is a schematic diagram of a face recognition flow according to a second exemplary embodiment of the present disclosure.
Fig. 9(a) and 9(b) show the similarity density threshold values when the face pose is not considered and when the face pose is considered, respectively.
Fig. 10 shows the similarity density threshold when age is considered.
Fig. 11 shows the similarity density threshold when illumination is considered.
Fig. 12 shows an example of similarity transformation.
Fig. 13 shows an example of recognizing a plurality of face pictures to be recognized contained in a trajectory.
Fig. 14 is a schematic structural diagram of a face recognition apparatus according to an eighth exemplary embodiment of the present disclosure.
Detailed Description
Taking face recognition as an example of target recognition: a face recognition method based on a confidence model has been proposed in industry. In this method, the similarity of every pair of face pictures in the face picture library is scored in advance, and a Gaussian distribution curve for the confidence model is computed from the similarity distribution of those pictures. At recognition time, the face picture to be recognized is matched against each picture in the library according to its facial features, and the maximum similarity is found. A confidence is then calculated from this maximum similarity and the confidence model. Finally, the calculated confidence is compared with a preset confidence threshold; if it is greater than the threshold, the face recognition result is determined from the picture corresponding to the maximum similarity.
In the above confidence-model-based face recognition method, the Gaussian confidence model is built from the distribution of the pictures in the face picture library. However, if the library contains few pictures or its contents are unbalanced, the model may deviate from the actual situation, which harms the accuracy of the final recognition result. For example, if the library contains mostly face pictures of boys and only a few of girls, the resulting model is less accurate for girls' pictures; when the picture to be recognized is a girl's picture and the corresponding person is not in the library, false recognition may occur because the model cannot adapt to this situation. As another example, if the library contains very similar pictures, such as pictures of two brothers, a picture of one brother used as the query may be misrecognized as the other brother.
In current face recognition methods, a fixed similarity threshold is set for the face picture library, for example by building a confidence model. During face feature matching, a library picture whose similarity to the picture to be recognized exceeds this threshold is taken as the recognition result, so the setting of the threshold is particularly important. In the similarity comparison results shown in figs. 1(a) to 1(c), circles represent top-ranked results for query pictures that are in the face picture library, and triangles represent top-ranked results for query pictures that are not. The threshold in fig. 1(a) is well chosen: although some results above it are not in the library, the false recognition rate is low, so the recognition result is still good. Here, the false recognition rate measures the ability to reject pictures that are not in the library; for example, out of 1000 query pictures not in the library, only 1 is falsely recognized. If the threshold is set too high, as in fig. 1(b), pictures that are in the library are wrongly rejected, resulting in a low recall rate. If the threshold is set too low, as in fig. 1(c), pictures that are not in the library are falsely recognized, increasing the false recognition rate.
Beyond face recognition, the same problem exists when recognizing other kinds of targets. For example, when the target picture library is a library of fruit pictures, a single fixed similarity threshold makes it difficult to achieve both a low false recognition rate and a high recall rate.
Exemplary embodiments of the present disclosure will be described hereinafter with reference to the accompanying drawings. In the interest of clarity and conciseness, not all features of an embodiment have been described in the specification. It should be appreciated, however, that in the development of any such actual embodiment, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with device-related and business-related constraints, which may vary from one implementation to another. Moreover, it will be appreciated that such a development effort might be complex and time-consuming, but would nevertheless be a routine undertaking for those of ordinary skill in the art having the benefit of this disclosure.
Here, it should also be noted that, in order to avoid obscuring the present disclosure with unnecessary detail, only process steps and/or system structures germane to at least the scheme according to the present disclosure are shown in the drawings, and other details not germane to the present disclosure are omitted.
Taking the case where the target picture library is a face picture library as an example, the facial features of different face pictures are not uniformly distributed in the feature space. Some facial features have no other facial features nearby in the feature space; as shown in fig. 2(a), their similarity density is low and they are easily distinguished from other facial features. Here, similarity density refers to the degree to which a face resembles other faces in a certain dimension (e.g., appearance). Other facial features do have similar facial features nearby in the feature space; as shown in fig. 2(b), their similarity density is high and they are not easily distinguished from other facial features.
Of course, if the target picture library is not a face picture library but, say, a fruit picture library, the situation is analogous: according to the distribution of fruit features in the feature space, some fruits (e.g., bananas) are easily distinguished from other fruits and have a low similarity density, while other fruits (e.g., oranges) are not easily distinguished and have a high similarity density.
In view of this, the present disclosure provides a new target identification method that sets a similarity threshold for the target pictures of each target in the target picture library according to the similarity density of those pictures in the feature space. An optional setting strategy is: set a lower similarity threshold for target pictures whose features have a low similarity density, and a higher similarity threshold for target pictures whose features have a high similarity density.
The present disclosure describes a scheme of setting a similarity threshold for a target picture of each target in a target picture library to perform target identification, taking a similarity threshold based on similarity density as an example. For convenience of description, the following exemplary embodiments are described taking a similarity threshold based on the similarity density (referred to simply as a similarity density threshold) as an example. In addition, the present disclosure is described taking face recognition as an example of target recognition, and the following exemplary embodiments all take face recognition as an example. However, the present disclosure is not limited to face recognition, and other object recognition such as fruit recognition, vehicle type recognition, etc. may be applied in the present disclosure.
In the example shown in fig. 3, the similarity density of the facial features extracted from person A's picture is low (no other facial features are nearby in the feature space), and the similarity density threshold set for person A's picture is 0.55. The similarity density of the features extracted from person B's picture is high (other facial features are nearby in the feature space), and the threshold set for person B's picture is 0.83, higher than person A's. During recognition, the result is determined based on the similarity density threshold of the matched person's picture. By setting a separate similarity density threshold for each person's pictures, this scheme fully accounts for the differences between people's pictures, greatly improves the discriminative power of face similarity in subsequent recognition, and thus improves recognition accuracy.
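The per-person decision in the fig. 3 example can be written as a tiny sketch. The threshold values 0.55 and 0.83 come from the description above; the dictionary-based lookup itself is an illustrative assumption.

```python
# Per-person similarity density thresholds from the Fig. 3 example.
# The values 0.55 (person A) and 0.83 (person B) come from the text;
# the dictionary lookup is an illustrative assumption.
PER_PERSON_THRESHOLD = {"A": 0.55, "B": 0.83}

def accept(person_id, similarity):
    # Accept the match only if it clears that person's own threshold.
    return similarity > PER_PERSON_THRESHOLD[person_id]
```

A match with similarity 0.60 would be accepted for person A but rejected for person B, reflecting B's denser neighborhood in the feature space.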
Fig. 4 illustrates a hardware environment in which the face recognition system of the present disclosure operates, including: a processor unit 10, an internal memory unit 11, a network interface unit 12, an input unit 13, an external memory 14, and a bus unit 15.
The processor unit 10 may be a CPU or a GPU. The internal memory unit 11 includes random access memory (RAM) and read-only memory (ROM). The RAM serves as the main memory and work area of the processor unit 10. The ROM stores the control program for the processor unit 10, as well as files and other data used when the control program runs. The network interface unit 12 connects to a network and implements network communication. The input unit 13 handles input from a keyboard, a mouse, and the like. The external memory 14 stores the boot program, various applications, and so on. The bus unit 15 connects the above units.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings.
< first exemplary embodiment >
Fig. 5 depicts a flowchart of steps of face recognition according to the first exemplary embodiment of the present disclosure. In the first embodiment, the processing flow of face recognition shown in fig. 5 is implemented by using the RAM as a work memory and causing the CPU 10 to execute a program stored in the ROM and/or the external memory 14.
Step S101: set a similarity density threshold for the face picture of each of at least some of the persons in the face picture library, the threshold being determined according to the similarity density of the picture's facial features in the feature space.
The face picture library used by a face recognition system may comprise a library for training and a library for query. In general, to estimate similarity density well, the training library should use a data set with broad coverage of face pictures across ethnicities, ages, and so on. A data set with more than a million face pictures can usually be considered large enough, and setting similarity density thresholds with such a data set effectively reduces the system's false recognition rate. The training library may contain multiple face pictures of some persons (e.g., celebrities or politicians); in that case, the same similarity density threshold can be set for all pictures of such a person. Of course, different thresholds may also be set for different pictures of the same person, as described in the second through fifth embodiments below. For example, when a picture is stored in the library, the person's name can be stored with it as attribute information, so that multiple pictures of the same person can be grouped by name. The disclosure is not limited to this; any feasible manner may be applied.
Because the training library contains a large number of face pictures while a face recognition request usually has a specific purpose, after similarity density thresholds have been set for all persons in the training library, a smaller query library can be determined from it according to the purpose of the current recognition service or the query scope of the logged-in user, and face recognition is then performed against the query library.
Taking fig. 6 as an example, the following describes how a query library and the similarity density thresholds of its pictures are determined from the training library. The left side of fig. 6 shows the mapping between person identifiers and the similarity density thresholds set for their face pictures.
A similarity density threshold is set for each person's face picture in the training library. Specifically, iterate over the face picture of each person in the library, score its similarity against the face pictures of all other persons, and sort the scores. Since the purpose of the threshold is to distinguish persons, an optional approach is to determine the threshold of a person's picture from the maximum similarity score, as in formula (1):

Threshold = max(similarity) + α    (1)

where Threshold is the similarity density threshold set for the current person's face picture; max(similarity) is the maximum of the similarity scores between the current person's picture and other persons' pictures; and α is a constant that can be set from empirical values, e.g., α = 0.1. Alternatively, α may be the mean deviation corresponding to the maximum similarity score, or different α values may be set for different persons' pictures according to a user-defined person importance. For example, to keep an important person from being falsely recognized, a large α value can be used when setting that person's threshold and a small α value for non-important persons.
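Formula (1) can be sketched in a few lines. The max-plus-α rule follows the formula above; the cosine-similarity scoring and the toy feature vectors are illustrative assumptions.

```python
# Sketch of formula (1): for each person, the similarity density
# threshold is the maximum similarity to any other person's picture
# plus a constant alpha. Cosine scoring is an illustrative assumption.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def set_density_thresholds(gallery, alpha=0.1):
    """gallery: dict person id -> feature vector; returns id -> threshold."""
    thresholds = {}
    for pid, feat in gallery.items():
        # Maximum similarity score against all *other* persons' pictures.
        others = [cosine(feat, f) for qid, f in gallery.items() if qid != pid]
        thresholds[pid] = max(others) + alpha
    return thresholds
```

A person whose features sit far from everyone else's receives a low threshold, while a person with a close neighbor receives a high one, matching the density-based strategy described above.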
Besides the above similarity-scoring approach, the present disclosure does not limit how the similarity density threshold of a person's picture is determined; other approaches may be used. For example, iterate over the face picture of each person in the library, compute the distance in the feature space between its facial features and those of other persons' pictures, and sort by distance. An optional approach is then to determine the threshold from the smallest distance. The principle is similar to that of the similarity-scoring approach and is not repeated here.
After a similarity density threshold has been set for each person's picture, when a user logs into the recognition system, the system identifies the current login person from the training library according to the user information of the login user (e.g., the login user's face picture and user name). The query library corresponding to the current login person, together with the similarity density thresholds set for each person's picture in that query library, is then determined. If the system does not recognize the current login person in the training library, the similarity density threshold of the most similar picture can be used as the threshold for the current login person.
Taking a company's office access-control system as an example: first, the employees' face pictures are put into the training library, and a similarity density threshold is set for each employee's picture. Then, when the administrator (the login user) logs into the face recognition system, recognizing the administrator determines the query library usable by the company's access-control system (a library containing the employees' face pictures) and the thresholds set for each employee's picture, as shown on the right side of fig. 6.
Step S101 is a preprocessing step; afterwards, face recognition processing can be performed as needed. Note that setting a similarity density threshold for each person in a training library and then deriving a query library from it is an optional way to implement the present disclosure, not a required one. The disclosure also covers performing the whole face recognition process with a single face picture library, without distinguishing libraries for training and for query. For example, if a face picture library built for a sports event already contains a large number of pictures, a similarity density threshold can be set directly for each person's picture in it, and recognition can be performed directly against it. Likewise, setting a threshold for every person's picture is a preferred but not required implementation: since the library may contain pictures of a very large number of people, thresholds may be set for only some of them, for example only for important persons. The present disclosure does not limit which persons' pictures in the library are given similarity density thresholds.
Step S102: and carrying out similarity matching on the face picture to be recognized and the face pictures in the face picture library.
In step S102, when a face recognition request is received from outside, the similarity between the face picture to be recognized contained in the request and each picture in the face picture library is calculated, yielding a similarity ranking. Optionally, when the library contains many pictures, an index can be built over them so that similarity matching is performed only on the subset of pictures that could plausibly be the recognition result, reducing computation time and enabling fast matching.
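Step S102's ranking can be sketched with a brute-force scan; the index-based acceleration mentioned above is omitted, and the cosine scoring is an illustrative assumption.

```python
# Sketch of step S102: score the query picture against every gallery
# picture and return a ranking from highest to lowest similarity.
# The brute-force scan stands in for the index-based fast matching.
import numpy as np

def rank_by_similarity(query_feat, gallery):
    """gallery: dict person id -> feature vector."""
    sims = {pid: float(np.dot(query_feat, f)
                       / (np.linalg.norm(query_feat) * np.linalg.norm(f)))
            for pid, f in gallery.items()}
    return sorted(sims.items(), key=lambda kv: kv[1], reverse=True)
```

The resulting sorted list is what the matching condition in step S103 (highest similarity, or top N) operates on.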
Step S103: and determining the face picture meeting the similarity matching condition and the similarity density threshold corresponding to the face picture from the face picture library.
In the present disclosure, the library picture with the highest similarity to the picture to be recognized can be taken as the picture satisfying the similarity matching condition; alternatively, the library pictures can be sorted from highest to lowest similarity and the top N pictures (N being a positive integer greater than or equal to 1) taken as the pictures satisfying the condition. The disclosure does not limit the similarity matching condition; it can be set arbitrarily according to user needs. Once the pictures satisfying the condition are determined, their similarity density thresholds can be looked up from the correspondence between pictures and thresholds established in step S101.
If no face picture satisfying the similarity matching condition is determined in step S103, step S106 is executed to output a "cannot recognize" result.
Step S104: and judging whether the similarity between the face picture to be recognized and the face picture meeting the similarity matching condition is higher than a corresponding similarity density threshold value or not. If yes, go to step S105; otherwise, step S106 is executed.
Taking the highest-similarity picture as the one satisfying the matching condition, as shown in fig. 7(a): suppose the similarity between the picture to be recognized and person A's picture a1 is 0.56, with person B's picture b1 is 0.42, and with person C's picture c1 is 0.33; then picture a1 is the picture satisfying the matching condition. Looking up the correspondence between pictures and thresholds shows that the similarity density threshold of picture a1 is 0.54. Since the similarity 0.56 is higher than this threshold, step S105 is executed, and person A shown in picture a1 is taken as the recognition result. If instead the similarity with picture a1 were lower than its threshold, step S106 would be executed to output a "cannot recognize" result.
Taking the three face pictures (a1, b1, and c1) of the three people with the highest similarity as an example, the face recognition process is as follows. First, it is judged which of these face pictures have a similarity to the face picture to be recognized that is higher than the corresponding similarity density threshold. If only one face picture (e.g., the face picture a1) has a similarity to the face picture to be recognized greater than its corresponding similarity density threshold, step S105 is executed to take the person shown in the face picture a1 as the recognition result. If a plurality of face pictures (e.g., the face pictures a1, b1, and c1) have similarities to the face picture to be recognized that are higher than their corresponding similarity density thresholds, predicting the face recognition result from the maximum similarity alone may be risky, so it is further judged whether any two of the three similarities are close (e.g., the difference between them is less than a set value, such as 10%). If so, the recognition results of the multiple persons are difficult to distinguish, and step S106 is executed to output a result that cannot be recognized; if not, the multiple persons are clearly distinguishable, and step S105 is executed to take the person shown in the face picture with the highest similarity as the recognition result.
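This decision logic could be sketched as follows (all names are hypothetical; the closeness margin of 0.10 stands in for the "10%" criterion above):

```python
def recognize(candidates, closeness=0.10):
    """candidates: list of (person, similarity, density_threshold) for the
    top-N library pictures, sorted by similarity in descending order."""
    # Keep only candidates whose similarity exceeds their own threshold.
    passed = [c for c in candidates if c[1] > c[2]]
    if not passed:
        return None  # recognition failed (step S106)
    if len(passed) == 1:
        return passed[0][0]  # unique match (step S105)
    # Several candidates passed: reject if any two similarities are too close.
    sims = [s for _, s, _ in passed]
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            if abs(sims[i] - sims[j]) < closeness:
                return None  # indistinguishable (step S106)
    return max(passed, key=lambda c: c[1])[0]  # highest similarity wins
```

With the fig. 7(b) values below, only person B would be returned, since all three similarities pass their thresholds and no two are close.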
As shown in fig. 7(b), assume that the similarities between the face picture to be recognized and the face pictures a1, b1, and c1 of the three people with the highest similarity are 0.56, 0.85, and 0.46, respectively, and that the similarity density thresholds corresponding to the face pictures a1, b1, and c1 are 0.54, 0.83, and 0.45, respectively. The similarity between the face picture to be recognized and each of the face pictures a1, b1, and c1 is higher than the corresponding similarity density threshold. Meanwhile, no two of the three similarities 0.56, 0.85, and 0.46 are close to each other, so person B shown in the face picture b1 with the highest similarity can be taken as the final recognition result.
Step S105: recognition succeeds, and the person shown in the determined face picture is taken as the recognition result.
Step S106: recognition fails, and a result that cannot be recognized is output.
With the scheme of the first exemplary embodiment of the present disclosure, the similarity density of each person's face picture in the face picture library in the feature space is set as the corresponding similarity density threshold, so that persons are effectively distinguished and the accuracy of face recognition is improved.
The first exemplary embodiment above sets the similarity density threshold using only the similarity density of the person's face pictures in the feature space. However, the face pose, the age of the person, or the illumination state may differ between face pictures, and these factors may affect the face recognition result. Therefore, on the basis of the first exemplary embodiment, the present disclosure further sets similarity density thresholds for a plurality of face pictures of the same person based on the target states shown in those pictures. This is described below through the second to fifth exemplary embodiments.
< second exemplary embodiment >
In the second embodiment, after the similarity densities of the face features in the feature space are set for a plurality of face pictures of the same person, the similarity density threshold of each face picture is adjusted based on the face poses shown in the different face pictures of that person, thereby optimizing the similarity density threshold of each face picture. The face pose here refers to the viewing angle of the face in the face picture. In particular, the face pose may be customized by the user, for example defined as a front face, a half side face, and a full side face; if the mirror-image factor is considered, these may be further divided into left and right faces. As another example, the face pose may be defined as looking up, looking down, and the like. The present disclosure does not limit the definition of the face pose, as long as the defined poses are favorable for improving the accuracy of face recognition. The face pose shown in a face picture can be stored in advance in the face picture library together with the face picture as picture attribute information, and when the face picture to be recognized is received, its face pose information can be received at the same time as attribute information of the picture. Of course, the present embodiment is not limited to this; the face pose may also be determined in other manners, such as by techniques like facial feature point detection.
Fig. 8 depicts a flowchart of steps of face recognition according to a second exemplary embodiment of the present disclosure, which is specifically described as follows.
Step S201: set a similarity density threshold for the face picture of each person in the face picture library.
This step S201 is the same as step S101. Taking person A shown in fig. 9(a) as an example, in this step S201, if the face pose is not considered, the similarity density threshold set for the face pictures a1 (front face), a2 (half side face), and a3 (full side face) of person A is 0.54.
Step S202: determine a similarity density threshold for each face pose pair, where a face pose pair consists of the face pose shown in a face picture of the same person and a face pose that may appear in the face picture to be recognized.
Taking the person shown in fig. 9(b) as an example, the face poses are now considered in addition to the case shown in fig. 9(a). The face poses corresponding to the face pictures a1 to a3 of person A are a front face, a half side face, and a full side face, respectively, and during subsequent face recognition the face pose of the face picture to be recognized may be any one of these three poses. Considering all possible permutations and combinations, the face pose pairs formed by the face picture to be recognized and the face pictures a1 to a3 cover the 6 cases shown in fig. 9(b): a full side face-full side face pair, a full side face-half side face pair, a full side face-front face pair, a half side face-half side face pair, a half side face-front face pair, and a front face-front face pair. Starting from the similarity density threshold of 0.54 set for the face pictures a1 to a3 when the face pose is not considered (shown in fig. 9(a)), the similarity density threshold of each face pose pair is adjusted so that the adjusted threshold better suits the similarity comparison for that pose pair.
Specifically, taking the full side face-half side face pair as an example, the similarity density threshold is reduced from 0.54 (shown in fig. 9(a)) to 0.51, because the face pose in the face picture to be recognized is a full side face, which differs from the half side face of the face picture a2; when comparing the similarity of face features, the features extracted from the two face pictures differ greatly. Therefore, for face pose pairs with different poses, the similarity density threshold needs to be slightly lowered to reduce the possibility of rejection. Taking the half side face-half side face pair as an example, the threshold remains 0.54, because the face pose in the face picture to be recognized is a half side face, the same as that of the face picture a2, and more face features are visible in a half side face picture; to reduce the possibility of false recognition, the similarity density threshold should not be lowered, so 0.54 is used directly as the threshold of the half side face-half side face pair. For the same reason, the similarity density threshold of the front face-front face pair is also 0.54. The similarity density threshold of the full side face-full side face pair is 0.53, slightly lower than those of the front face-front face and half side face-half side face pairs, because fewer face features are visible in a full side face picture, which is unfavorable for feature extraction and similarity comparison; the threshold is therefore set to 0.53, slightly below 0.54.
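A pose-pair threshold lookup following this example could be sketched as follows (only the thresholds stated above are filled in; the unordered-pair keying via `frozenset` is an assumption):

```python
# Pose-pair similarity density thresholds from the fig. 9(b) example.
# Pairs are unordered, so each key is a frozenset; the remaining pairs
# (e.g. full side-front) would be filled in analogously.
POSE_PAIR_THRESHOLDS = {
    frozenset(["full side", "full side"]): 0.53,
    frozenset(["full side", "half side"]): 0.51,
    frozenset(["half side", "half side"]): 0.54,
    frozenset(["front", "front"]): 0.54,
}

def pose_pair_threshold(pose_query, pose_library):
    # Lookup is symmetric: (full side, half side) == (half side, full side).
    return POSE_PAIR_THRESHOLDS[frozenset([pose_query, pose_library])]
```

Storing the pair as a frozenset makes the lookup order-independent, matching the fact that a pose pair has no inherent direction.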
Note that although different poses of the same face are used in fig. 9(b), this is merely to represent the pose combinations concisely and does not imply that the face pictures are exactly the same.
Step S203: perform similarity matching between the input face picture to be recognized and the face pictures in the face picture library.
This step S203 is the same as step S102 in the first exemplary embodiment, and is not described here again.
Step S204: determine, from the face picture library, the face picture satisfying the similarity matching condition and its similarity density threshold.
This step S204 differs from step S103 in the first exemplary embodiment in that, in step S204, the similarity density threshold corresponding to the face picture is the threshold determined for the face pose pair in step S202. Again taking the case shown in fig. 9(b) as an example, assuming that the face picture a1 (full side face) of person A is the face picture satisfying the similarity matching condition and the face picture to be recognized is also a full side face, the similarity density threshold used for the face recognition processing is determined to be 0.53, based on the threshold corresponding to the full side face-full side face pair in fig. 9(b). In step S103, by contrast, assuming likewise that the face picture a1 of person A satisfies the similarity matching condition, the similarity density threshold used for the face recognition processing would be 0.54, since in the first embodiment the threshold is not adjusted for face pose.
Step S205: judge whether the similarity between the face picture to be recognized and the face picture satisfying the similarity matching condition is higher than the corresponding similarity density threshold. If yes, execute step S206; otherwise, execute step S207.
This step S205 is the same as step S104 in the first exemplary embodiment.
Step S206: recognition succeeds, and the person shown in the determined face picture is taken as the recognition result.
Step S207: recognition fails, and a result that cannot be recognized is output.
< third exemplary embodiment >
In the third embodiment, after the similarity densities of the face features in the feature space are set for a plurality of face pictures of the same person, the similarity density threshold of each face picture is adjusted based on the ages of the persons in the different face pictures. Different ages place different requirements on the similarity density threshold for face recognition, and the face picture to be recognized and the face pictures in the face picture library may show the same person at different ages. Therefore, the similarity density thresholds of different face pictures of the same person need to be adjusted according to the age factor. The definition of age here can be customized by the user, for example defining the ages as children, young adults, and the elderly, or directly as 0-18 years, 18-60 years, and over 60 years. The age of the person shown in a face picture in the library may be stored together with the face picture as picture attribute information. Of course, the present embodiment is not limited to this; the age of the person shown in a face picture may also be determined by techniques such as facial feature point detection.
The face recognition process of the third embodiment is basically the same as the face recognition process of the second embodiment, and only the differences between the two embodiments are described below.
In the third embodiment, first, in the same manner as in the first and second embodiments, a similarity density threshold is set for the face picture of each person in the face picture library. Then, taking the age factor into account, a similarity density threshold is determined for every possible age pair. Taking the case shown in fig. 10 as an example, the face pictures a1 to a3 of person A correspond to the three age groups 0-18 years, 18-60 years, and over 60 years, respectively. Assume that the similarity density threshold set for the face pictures a1 to a3 of person A without considering the age factor is 0.54. When the age factor is further considered, the face pictures a1 to a3 cover the three age groups 0-18 years, 18-60 years, and over 60 years, and the age of the person in the face picture to be recognized may also fall into any one of these three groups, so the age pairs formed by the face picture to be recognized and the face pictures a1 to a3 are obtained by permutation and combination and, as shown in fig. 10, include 6 cases: 0-18 years-0-18 years, 0-18 years-18-60 years, 0-18 years-over 60 years, 18-60 years-18-60 years, 18-60 years-over 60 years, and over 60 years-over 60 years. Starting from the similarity density threshold of 0.54 set for the face pictures a1 to a3 (without considering the age factor), the similarity density threshold of each age pair is adjusted so that the adjusted threshold better suits the similarity comparison for that age pair.
Specifically, taking the 0-18 years-18-60 years pair as an example, the similarity density threshold is reduced from 0.54 to 0.51, because a person's face changes greatly between childhood and adulthood, which is unfavorable for similarity comparison; for comparison across the childhood and adult periods, the similarity density threshold therefore needs to be slightly lowered to reduce the possibility of rejection. Taking the 18-60 years-18-60 years pair as an example, the threshold remains 0.54, because facial change between 18 and 60 years of age is small and similarity comparison within the same age group is easy; to reduce the possibility of false recognition, the similarity density threshold is not lowered. Considering that changes in the facial bones after adulthood are small, the similarity density threshold of the over 60 years-over 60 years pair is also 0.54. Since bones and other features change greatly during a child's growth, the similarity density threshold of the 0-18 years-0-18 years pair can be slightly reduced to 0.53.
Thereafter, when the face picture to be recognized is received, the recognition result can be obtained in the same manner as in steps S203 to S207 of the second embodiment.
< fourth exemplary embodiment >
In the fourth embodiment, after the similarity densities of the face features in the feature space are set for a plurality of face pictures of the same person, the similarity density threshold of each face picture is adjusted based on the illumination conditions under which the different face pictures of that person were taken. Because the face features in a face picture are affected by the illumination at the time of shooting, and the illumination states of different face pictures of the same person differ, the similarity density thresholds of different face pictures of the same person need to be adjusted according to the illumination state. Here, the illumination of a picture may be customized by the user, for example defined as weak light, normal light, and strong light, or, as another example, as white light, yellow light, blue light, and the like. The illumination information of a face picture can be stored together with the face picture as picture attribute information. Of course, the present embodiment is not limited to this; the illumination information of a face picture may also be determined by techniques such as illumination detection.
The face recognition process of the fourth embodiment is basically the same as the face recognition process of the second embodiment, and only the differences between the two embodiments are described below.
In the fourth embodiment, first, in the same manner as in the first and second embodiments, similarity density thresholds are set for the plurality of face pictures of the same person in the face picture library. Then, taking the illumination factor of the face pictures into account, a similarity density threshold is determined for every possible illumination pair. Taking the case shown in fig. 11 as an example, the illumination conditions corresponding to the face pictures a1 to a3 of person A are weak light, normal light, and strong light, respectively. Assume that the similarity density threshold set for the face pictures a1 to a3 of person A without considering the illumination factor is 0.54. When the illumination factor is further considered, the face pictures a1 to a3 cover weak light, normal light, and strong light, and the illumination of the face picture to be recognized may be any one of these three cases. Therefore, by permutation and combination, the illumination pairs formed by the face picture to be recognized and the face pictures a1 to a3 cover the 6 cases shown in fig. 11: a weak light-weak light pair, a weak light-normal light pair, a weak light-strong light pair, a normal light-normal light pair, a normal light-strong light pair, and a strong light-strong light pair. Starting from the similarity density threshold of 0.54 set for the face pictures a1 to a3 (without considering the illumination factor), the similarity density threshold of each illumination pair is adjusted so that the adjusted threshold better suits the similarity comparison for that illumination pair.
Specifically, taking the weak light-normal light pair as an example, the similarity density threshold is reduced from 0.54 to 0.51, because the accuracy of face features extracted under weak light is poorer than under normal or strong light, and the difference between features determined under weak light and those determined under normal light is larger, which is unfavorable for similarity comparison; the similarity density threshold can therefore be slightly reduced to lower the possibility of rejection. Taking the normal light-normal light pair as an example, the threshold remains 0.54, because the normal light state is favorable for face feature extraction and identical illumination states are favorable for similarity comparison; keeping the threshold at 0.54 reduces the possibility of false recognition. Since the weak light state is less favorable for similarity comparison than the normal and strong light states, the similarity density threshold of the weak light-weak light pair may be slightly reduced to 0.53.
Thereafter, when the face picture to be recognized is received, the recognition result can be obtained in the same manner as in steps S203 to S207 of the second embodiment.
< fifth exemplary embodiment >
In the second to fourth embodiments, the similarity density thresholds of face pictures are made more suitable for face recognition by adjusting the thresholds of the various factor pairs based on the face pose, the age group of the person, and the illumination of the pictures of the same person. In the fifth embodiment, at least two of the factors of the second to fourth embodiments are combined to adjust the similarity density threshold corresponding to a face picture.
Taking the combination of the second and third embodiments as an example: first, the correspondence between the different face poses of the same person and the similarity density thresholds is determined in the manner of the second embodiment, as shown in fig. 9(b), and the correspondence between the different ages of the same person and the similarity density thresholds is determined in the manner of the third embodiment, as shown in fig. 10. During face recognition, similarity matching is performed between the face picture to be recognized and the face pictures in the face picture library, the face picture satisfying the similarity matching condition is determined, and then the similarity density threshold used for face recognition is determined based on both the pose-pair threshold and the age-pair threshold of the determined face picture. One option is to use the larger of the two similarity density thresholds as the threshold ultimately used for face recognition.
For example, after similarity matching is performed between the received face picture to be recognized and the pictures in the face picture library, assume that the face picture a1 (full side face, 18-60 years old) of person A is the face picture satisfying the similarity matching condition, and that the face picture to be recognized shows a full side face of a person aged 0-18 years. By looking up the correspondence between face pose pairs and similarity density thresholds shown in fig. 9(b), the threshold corresponding to the full side face-full side face pair is determined to be 0.53; by looking up the correspondence between age pairs and similarity density thresholds shown in fig. 10, the threshold corresponding to the 0-18 years-18-60 years pair is determined to be 0.51. The larger of the two, 0.53, can be used as the similarity density threshold for face recognition; that is, if the similarity between the face picture to be recognized and the face picture a1 is higher than 0.53, person A shown in the face picture a1 is taken as the recognition result; otherwise, a result that cannot be recognized is output.
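This "take the larger threshold" rule could be sketched as follows (function names are hypothetical):

```python
def combined_threshold(pose_pair_threshold, age_pair_threshold):
    # Use the stricter (larger) of the two factor-pair thresholds, as in
    # the full side-full side (0.53) vs 0-18 years-18-60 years (0.51) example.
    return max(pose_pair_threshold, age_pair_threshold)

def is_recognized(similarity, pose_pair_threshold, age_pair_threshold):
    # The match succeeds only if the similarity clears the combined threshold.
    return similarity > combined_threshold(pose_pair_threshold, age_pair_threshold)
```

Taking the larger value is the conservative choice: the candidate must satisfy the stricter of the two factor-specific requirements.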
The combination of the second and third embodiments is described above as an example, but the present disclosure is not limited to it; other combinations are also possible, for example combining the second, third, and fourth embodiments for face recognition. Details are not repeated here.
< sixth exemplary embodiment >
In the solutions of the first to fifth embodiments, the similarity density thresholds set for the face pictures of a person may differ from picture to picture. However, to make the face recognition system more user-friendly, it is desirable that the similarity density threshold used in face recognition be a value the user expects. Therefore, in the sixth exemplary embodiment of the present disclosure, on the basis of any one of the first to fifth embodiments, the similarity density threshold set for a face picture is linearly transformed into the user's expected value, and, using the same transformation ratio, the similarities between the face picture to be recognized and the face pictures in the face picture library are also linearly transformed. The ordering of these similarities is thus unchanged before and after the transformation, so performing face recognition by comparing the transformed similarities with the user's expected value does not affect the result of the face recognition.
The face recognition process of the sixth embodiment is basically the same as those of the first to fifth exemplary embodiments, except that, after similarity matching is performed between the face picture to be recognized and the face pictures in the face picture library and the face picture satisfying the similarity matching condition is determined, the similarity density threshold corresponding to the face picture with the highest similarity is transformed into the user's expected value; at the same time, the similarities between the face picture to be recognized and the other face pictures are transformed by the same ratio, and the face recognition process shown in step S104 is executed using the transformed similarities and the user's expected value.
Taking the case shown in fig. 12 as an example, assume that the similarities between the face picture to be recognized and the face picture a1 of person A, the face picture b1 of person B, and the face picture c1 of person C are 0.46, 0.42, and 0.38, respectively; the similarity density threshold of the face picture a1 with the highest similarity is 0.45, and the expected similarity density value preset by the user is 0.55. The similarity density threshold of the face picture a1 is transformed from 0.45 to 0.55, giving a transformation ratio of 0.55/0.45, approximately 1.22; according to this ratio, the similarity between the face picture to be recognized and the face picture a1 changes from 0.46 to 0.56. Similarly, the similarity between the face picture to be recognized and the face picture b1 changes from 0.42 to 0.51, and the similarity between the face picture to be recognized and the face picture c1 changes from 0.38 to 0.46. The specific transformation is given by the following formula (2).
Transformed similarity = similarity before transformation × Zoom_Rate, where Zoom_Rate = user expected value / similarity density threshold corresponding to the face picture with the highest similarity    (2)
Referring to the situation shown on the left side of fig. 12, when no transformation is performed, the similarity between the face picture to be recognized and the face picture a1 is higher than the similarity density threshold of the face picture a1, and person A can be determined as the result of face recognition. Referring to the right side of fig. 12, after the similarity density threshold of the face picture is transformed into the fixed expected value, the ordering of the transformed similarities of the face pictures a1 to c1 does not change, and only the transformed similarity of the face picture a1 with the face picture to be recognized is higher than the expected value; the result of the face recognition is therefore still person A, unchanged from the result before the transformation.
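Formula (2) and the equal-ratio transformation can be sketched as follows (rounding to two decimals, matching the figures in the example, is an assumption):

```python
def transform_similarities(similarities, top_threshold, expected_value):
    """Scale every similarity so that the threshold of the top-ranked
    picture maps onto the user's expected value. The ordering is preserved
    because every similarity is multiplied by the same positive constant."""
    zoom_rate = expected_value / top_threshold  # formula (2)
    return [round(s * zoom_rate, 2) for s in similarities]
```

With the fig. 12 values, `transform_similarities([0.46, 0.42, 0.38], 0.45, 0.55)` yields the transformed similarities 0.56, 0.51, and 0.46.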
According to the scheme of the sixth exemplary embodiment of the present disclosure, the similarity density threshold used in face recognition can be transformed into the user's expected value, making the operation feel more friendly to the user, while, compared with the first to fifth exemplary embodiments, such an operation changes neither the recognition result nor the accuracy of recognition. Fig. 12 illustrates the transformation using the similarity density thresholds set per face picture in the first exemplary embodiment as an example; the sixth embodiment likewise applies to the transformation of the similarity density thresholds for the face pose, age group, and illumination factors of the second to fifth embodiments, which is not described again here.
< seventh exemplary embodiment >
The first to sixth exemplary embodiments above describe face recognition processing for a single face picture to be recognized, but the present disclosure also applies to face recognition processing for video data. In the continuous frames of a video, the movement of a person is continuous, and the face recognition processing of video data only needs to give one face recognition result for the trajectory of the same person. The trajectory of a person is a series of pictures of that person; because of motion and other causes, these pictures differ in face pose, picture size, illumination state, sharpness, and so on, and therefore their individual face recognition results may also differ. In view of this, the seventh exemplary embodiment of the present disclosure provides a method for recognizing the trajectory of a person in a video, determining the final recognition result of the trajectory by comprehensively judging the recognition results of the face pictures to be recognized in the trajectory, for example by a peak method or a voting method. The face recognition process of the seventh exemplary embodiment of the present disclosure is described in detail below.
The manner of setting the similarity density threshold value for the face pictures in the face picture library in advance in the seventh exemplary embodiment of the present disclosure is the same as that in the first to sixth exemplary embodiments.
In the seventh exemplary embodiment of the present disclosure, a plurality of pictures from the series of pictures in a person's trajectory may be taken, by sampling or the like, as face pictures to be recognized and subjected to face recognition processing. When face recognition processing is performed on any one of these face pictures, the face picture satisfying the similarity matching condition and its corresponding similarity density threshold may be determined in the same manner as in the first to sixth exemplary embodiments, and the face recognition result may be determined according to whether the similarity between the face picture to be recognized and the face picture satisfying the similarity matching condition is higher than that threshold. After the plurality of face pictures to be recognized have been processed, if the obtained recognition results are all the same, that result is taken as the recognition result of the person's trajectory; if the obtained recognition results are not all the same, the final recognition result is determined through comprehensive judgment. The approach of the seventh exemplary embodiment of the present disclosure is described below taking the peak method as an example.
After each face picture to be recognized has been recognized, the face picture with the highest similarity is determined from among the obtained face pictures satisfying the similarity matching condition. If that highest similarity is higher than the similarity threshold corresponding to the determined face picture, the person shown in that face picture is taken as the face recognition result of the trajectory; otherwise, a result indicating that recognition is not possible is output. Taking the case shown in fig. 13 as an example, assume that face pictures 1 to 3 to be recognized are sampled from the trajectory of a person, and that after similarity matching, the face picture c1 (similarity 0.38) satisfying the similarity matching condition and having the highest similarity with face picture 1 to be recognized, the face picture b1 (similarity 0.42) having the highest similarity with face picture 2 to be recognized, and the face picture a1 (similarity 0.46) having the highest similarity with face picture 3 to be recognized are determined. The face picture a1 with the highest similarity is then selected from among the face pictures a1 to c1; the similarity threshold corresponding to the face picture a1 is 0.45. Since the similarity between face picture 3 to be recognized and the face picture a1 is higher than the similarity threshold corresponding to the face picture a1, it can be determined that the person a shown in the face picture a1 is the recognition result of the trajectory. It should be noted that fig. 13 is described for the case where, as in the sixth exemplary embodiment, the similarity threshold corresponding to the face picture with the highest similarity is transformed into the expected value 0.55 set by the user and the similarities between the face picture to be recognized and the face pictures in the face picture library are transformed in equal proportion. As can be seen from fig. 13, only the similarity of the face picture a1 is higher than the expected value, the similarities of the other face pictures are lower than the expected value, and the ordering of the similarities is the same as before the equal-proportion transformation.
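The peak method described above can be summarized in a minimal Python sketch. The function name and the tuple layout are illustrative assumptions, not taken from the patent; the numeric example mirrors the fig. 13 scenario (similarities 0.38, 0.42, 0.46, with a threshold of 0.45 for the top match).

```python
def peak_method(best_matches):
    """Decide a trajectory's identity by the single highest-similarity match.

    best_matches: list of (person_id, similarity, per_picture_threshold),
    one entry per sampled face picture of the trajectory (each entry is that
    picture's best gallery match). Returns the person id, or None when the
    peak similarity does not exceed the matched picture's own threshold.
    """
    person, sim, thr = max(best_matches, key=lambda m: m[1])
    return person if sim > thr else None  # None = "cannot be identified"

# Example mirroring fig. 13: pictures 1-3 best-match persons c, b, a.
matches = [("c", 0.38, 0.50), ("b", 0.42, 0.50), ("a", 0.46, 0.45)]
print(peak_method(matches))  # -> a  (0.46 exceeds a1's threshold 0.45)
```

Because each gallery picture carries its own threshold, the comparison is always made against the threshold of the picture that produced the peak, not against a single global value.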
As a possible situation, if the similarities between a plurality of face pictures satisfying the similarity matching condition and the face picture to be recognized are each higher than the corresponding similarity threshold, it is determined whether the similarities are close (for example, whether the difference between them is less than 10%) according to the method shown in fig. 7(b). If so, the recognition results of the plurality of persons are difficult to distinguish, and a result indicating that recognition is not possible is output; if not, the plurality of persons can be clearly distinguished, and the person shown in the face picture with the highest similarity is taken as the recognition result.
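This tie-breaking rule can be sketched as follows. Whether "less than 10%" means an absolute or a relative difference is not specified here, so the sketch assumes an absolute gap of 0.10 between similarity scores; the function name and the default are illustrative.

```python
def resolve_candidates(candidates, closeness=0.10):
    """Pick a single person from candidates that all passed their thresholds.

    candidates: list of (person_id, similarity) pairs, each of which already
    exceeded its own per-picture similarity threshold. If any two similarities
    differ by less than `closeness`, the persons are considered too close to
    distinguish and None ("cannot be identified") is returned.
    """
    sims = [s for _, s in candidates]
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            if abs(sims[i] - sims[j]) < closeness:
                return None  # ambiguous: results difficult to distinguish
    # Clearly separated: return the person with the highest similarity.
    return max(candidates, key=lambda c: c[1])[0]
```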
The seventh exemplary embodiment of the present disclosure is not limited to the peak method; other manners are also possible. For example, after recognition processing has been performed on a plurality of face pictures to be recognized, even if the recognition results are not all the same, if at least two recognition results are identical, the identical result occurring most often may be taken as the recognition result of the trajectory. For example, if the recognition result of face picture 1 to be recognized is person a, the recognition result of face picture 2 to be recognized is "unrecognizable", and the recognition result of face picture 3 to be recognized is also person a, person a may be taken as the recognition result of the trajectory.
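The majority-vote alternative above amounts to a frequency count over the per-picture results. A minimal sketch, with the convention (an assumption here) that `None` represents an "unrecognizable" per-picture result:

```python
from collections import Counter

def trajectory_vote(results):
    """Return the most frequent identified person across a trajectory.

    results: per-picture recognition results; None marks a picture whose
    result was "unrecognizable". Returns None only if no picture at all
    produced an identification.
    """
    votes = Counter(r for r in results if r is not None)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

print(trajectory_vote(["a", None, "a"]))  # -> a
```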
< eighth exemplary embodiment >
The eighth exemplary embodiment of the present disclosure describes a target recognition apparatus belonging to the same inventive concept as the first to seventh exemplary embodiments. As shown in fig. 14, the target recognition apparatus includes a similarity matching unit 20, a threshold setting unit 21, a threshold determining unit 22, and a target recognition unit 23. The similarity matching unit 20 performs similarity matching between the target picture to be recognized and the target pictures in the target picture library; the threshold setting unit 21 sets a similarity threshold for the target picture of each target of at least some targets in the target picture library; the threshold determining unit 22 determines the target pictures satisfying the similarity matching condition from the target picture library and determines the similarity thresholds of those target pictures; and the target recognition unit 23 determines a target recognition result according to whether the similarity between the target picture to be recognized and a target picture satisfying the similarity matching condition is higher than the determined similarity threshold.
Preferably, the threshold setting unit 21 determines similarity between the target picture of the target and the target pictures of other targets in the target picture library, and sets a similarity threshold for the target picture of the target according to the maximum value of the determined similarity and the importance of the target.
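The threshold-setting policy just described can be sketched as below. The patent only states that the threshold depends on the maximum similarity to other targets and on the target's importance; the margin, the default values, and the way importance modulates the margin are assumptions made purely for illustration.

```python
def set_similarity_threshold(max_other_sim, importance=1.0, base_margin=0.05):
    """Set a per-picture similarity threshold for a gallery target picture.

    max_other_sim: the picture's highest similarity to any OTHER target in
    the library. Placing the threshold just above this value means that only
    the true target is likely to exceed it. How importance enters is an
    assumed policy: here a more important target (importance > 1) gets a
    smaller margin, i.e. a slightly more permissive threshold.
    """
    return max_other_sim + base_margin / importance

# A picture whose closest impostor scores 0.40 gets a threshold of 0.45.
print(set_similarity_threshold(0.40))
```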
Preferably, taking face recognition as an example, the threshold setting unit 21 sets a similarity threshold for each face pose pair, a face pose pair being a possible combination of a face pose shown by a face picture of the same person and a face pose shown by a face picture to be recognized; the threshold determining unit 22 determines the similarity threshold of the face pose pair formed by the face pose shown by the face picture satisfying the similarity matching condition and the face pose shown by the face picture to be recognized; and the target recognition unit 23 determines a face recognition result according to whether the similarity between the face picture to be recognized and the face picture satisfying the similarity matching condition is higher than the similarity threshold of the determined face pose pair.
Preferably, taking face recognition as an example, the threshold setting unit 21 sets a similarity threshold for each age pair, an age pair being a possible combination of an age shown by a face picture of the same person and an age shown by a face picture to be recognized; the threshold determining unit 22 determines the similarity threshold of the age pair formed by the age shown by the face picture satisfying the similarity matching condition and the age shown by the face picture to be recognized; and the target recognition unit 23 determines a face recognition result according to whether the similarity between the face picture to be recognized and the face picture satisfying the similarity matching condition is higher than the similarity threshold of the determined age pair.
Preferably, taking face recognition as an example, the threshold setting unit 21 sets a similarity threshold for each illumination pair, an illumination pair being a possible combination of the illumination of a face picture of the same person and the illumination of a face picture to be recognized; the threshold determining unit 22 determines the similarity threshold of the illumination pair formed by the illumination of the face picture satisfying the similarity matching condition and the illumination of the face picture to be recognized; and the target recognition unit 23 determines a face recognition result according to whether the similarity between the face picture to be recognized and the face picture satisfying the similarity matching condition is higher than the similarity threshold of the determined illumination pair.
Preferably, the apparatus further includes a similarity transformation unit 24 that transforms a similarity threshold of a target picture, which satisfies a similarity matching condition and has the highest similarity with the target picture to be recognized, into a predetermined expected value, and transforms the similarity between the target picture satisfying the similarity matching condition and the target picture to be recognized according to a transformation ratio between the expected value and the similarity threshold; the target recognition unit 23 determines a target recognition result according to whether the converted similarity is higher than the expected value.
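The similarity transformation performed by unit 24 is a simple proportional rescaling, which can be sketched as follows. The function name and the default expected value are illustrative; 0.55 is the user-set expected value used in the fig. 13 example. Because every similarity is multiplied by the same ratio, the ordering of the candidates is preserved.

```python
def rescale_similarities(similarities, top_threshold, expected=0.55):
    """Rescale similarities so the top match's threshold maps to `expected`.

    similarities: similarity scores of the pictures satisfying the matching
    condition, in any order. top_threshold: the per-picture similarity
    threshold of the highest-similarity match. All scores are scaled by the
    same ratio expected / top_threshold, so their ordering is unchanged and
    the recognition decision becomes "is the transformed score > expected?".
    """
    ratio = expected / top_threshold
    return [s * ratio for s in similarities]

# Fig. 13 example: threshold 0.45 maps to 0.55; 0.46 * (0.55/0.45) ~ 0.562,
# so only the top match ends up above the expected value.
print(rescale_similarities([0.46, 0.42, 0.38], top_threshold=0.45))
```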
Preferably, when a plurality of target pictures to be recognized contained in the same person trajectory are recognized, the threshold determining unit 22 determines, for each target picture to be recognized, the target pictures satisfying the similarity matching condition from the target picture library; the similarity transformation unit 24 transforms the similarity threshold of the target picture which satisfies the similarity matching condition and has the highest similarity with the target picture to be recognized into a predetermined expected value, and transforms the similarities between the target pictures satisfying the similarity matching condition and the target picture to be recognized according to the transformation ratio between the expected value and the similarity threshold; and the target recognition unit 23 determines a target recognition result according to whether the transformed similarity is higher than the expected value.
Preferably, when recognizing a plurality of target pictures to be recognized contained in the same target track, the target recognition unit 23 determines a target recognition result for each target picture to be recognized, and takes the same result with the largest number in the recognition results as the recognition result of the track.
Preferably, when the threshold determining unit 22 determines that at least two target pictures meet the similarity matching condition and the target identifying unit 23 determines that the similarity between the target picture to be identified and each of the at least two target pictures is higher than the similarity threshold of the target picture, the target identifying unit 23 determines whether the difference between each two similarities between the target picture to be identified and each of the at least two target pictures is smaller than a set value, and outputs the result that the target picture cannot be identified if the difference is smaller than the set value; and if not, taking the target shown by the target picture with the highest similarity with the target picture to be identified in the at least two target pictures as a target identification result.
Other embodiments
Embodiments of the present disclosure may also be implemented by a computer of a system or apparatus that reads and executes computer-executable instructions (e.g., one or more programs) recorded on a storage medium (also may be more fully referred to as a "non-transitory computer-readable storage medium") to perform the functions of one or more of the above-described embodiments and/or includes one or more circuits (e.g., an application-specific integrated circuit (ASIC)) for performing the functions of one or more of the above-described embodiments, and by a method performed by a computer of a system or apparatus by, for example, reading and executing computer-readable instructions from a storage medium to perform the functions of one or more of the above-described embodiments and/or controlling one or more circuits to perform the functions of one or more of the above-described embodiments. The computer may include one or more processors (e.g., a Central Processing Unit (CPU), Micro Processing Unit (MPU)) and may include a separate computer or a network of separate processors to read out and execute computer-executable instructions. The computer-executable instructions may be provided to the computer from, for example, a network or a storage medium. The storage medium may include, for example, one or more of a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), storage of a distributed computing system, an optical disk such as a Compact Disk (CD), a Digital Versatile Disk (DVD), or a blu-ray disk (BD) (registered trademark), a flash memory device, a memory card, and the like.
The embodiments of the present disclosure can also be realized by a method in which software (programs) performing the functions of the above-described embodiments is supplied to a system or an apparatus through a network or various storage media, and a computer, or a central processing unit (CPU) or micro processing unit (MPU), of the system or the apparatus reads out and executes the programs.
While the present disclosure has been described with reference to exemplary embodiments, it is to be understood that the disclosure is not limited to the disclosed exemplary embodiments. The scope of the following claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

Claims (19)

1. A method of object recognition, the method comprising:
carrying out similarity matching on a target picture to be identified and a target picture in a target picture library;
determining a target picture meeting a similarity matching condition from a target picture library and determining a similarity threshold of the target picture;
determining a target identification result according to whether the similarity between the target picture to be identified and the target picture meeting the similarity matching condition is higher than a determined similarity threshold value;
and setting a similarity threshold for the target picture of each target in at least part of targets in the target picture library.
2. The method according to claim 1, wherein setting a similarity threshold for the target picture of each of at least some of the targets in the target picture library specifically comprises:
determining the similarity between the target picture of the target and the target pictures of other targets in the target picture library;
and setting a similarity threshold value for the target picture of the target according to the determined maximum value of the similarity and the importance of the target.
3. The method of claim 1, wherein the target is a person, the target recognition is a human face recognition, and the target picture is a human face picture;
the method further comprises the following steps:
setting a similarity threshold for each face pose pair, a face pose pair being a possible combination of a face pose shown by a face picture of the same person and a face pose shown by the face picture to be recognized;
determining the similarity threshold of the face pose pair formed by the face pose shown by the face picture satisfying the similarity matching condition and the face pose shown by the face picture to be recognized;
and determining a face recognition result according to whether the similarity between the face picture to be recognized and the face picture meeting the similarity matching condition is higher than the similarity threshold of the determined face pose pair.
4. The method of claim 1, wherein the target is a person, the target recognition is a human face recognition, and the target picture is a human face picture;
the method further comprises the following steps:
setting a similarity threshold for each age pair, an age pair being a possible combination of an age shown by a face picture of the same person and an age shown by the face picture to be recognized;
determining the similarity threshold of the age pair formed by the age shown by the face picture satisfying the similarity matching condition and the age shown by the face picture to be recognized;
and determining a face recognition result according to whether the similarity between the face picture to be recognized and the face picture meeting the similarity matching condition is higher than the similarity threshold of the determined age pair.
5. The method of claim 1, wherein the target is a person, the target recognition is a human face recognition, and the target picture is a human face picture;
the method further comprises the following steps:
setting a similarity threshold for each illumination pair, an illumination pair being a possible combination of the illumination of a face picture of the same person and the illumination of the face picture to be recognized;
determining the similarity threshold of the illumination pair formed by the illumination of the face picture satisfying the similarity matching condition and the illumination of the face picture to be recognized;
and determining a face recognition result according to whether the similarity between the face picture to be recognized and the face picture meeting the similarity matching condition is higher than the similarity threshold of the determined illumination pair.
6. The method of claim 1, wherein the method further comprises:
converting the similarity threshold of the target picture which meets the similarity matching condition and has the highest similarity with the target picture to be identified into a preset expected value, and converting the similarity between the target picture which meets the similarity matching condition and the target picture to be identified according to the conversion ratio between the expected value and the similarity threshold;
and determining a target recognition result according to whether the converted similarity is higher than the expected value.
7. The method according to claim 1, wherein when a plurality of target pictures to be identified contained in the same target track are identified, for each target picture to be identified, a target picture satisfying a similarity matching condition is determined from a target picture library;
converting the similarity threshold of the target picture which meets the similarity matching condition and has the highest similarity with the target picture to be identified into a preset expected value, and converting the similarity between the target picture which meets the similarity matching condition and the target picture to be identified according to the conversion ratio between the expected value and the similarity threshold;
and determining a target recognition result according to whether the converted similarity is higher than the expected value.
8. The method according to claim 1, wherein when a plurality of target pictures to be identified contained in the same target track are identified, a target identification result is determined for each target picture to be identified;
and taking the same result with the largest number in the recognition results as the recognition result of the track.
9. The method according to any one of claims 1 to 8, wherein when at least two target pictures meet the similarity matching condition and the similarity between the target picture to be identified and each target picture in the at least two target pictures is higher than the similarity threshold of the target picture, the method further comprises:
determining whether the difference value between every two similarity degrees of the target picture to be identified and each of the at least two target pictures is smaller than a set value;
under the condition that the difference value is smaller than the set value, outputting a result which cannot be identified; and if not, taking the target shown by the target picture with the highest similarity with the target picture to be identified in the at least two target pictures as a target identification result.
10. An object recognition apparatus, characterized in that the apparatus comprises:
the similarity matching unit is configured to perform similarity matching on the target picture to be identified and the target pictures in the target picture library;
a threshold setting unit configured to set a similarity threshold for a target picture of each of at least some of the targets in the target picture library;
a threshold determination unit configured to determine a target picture satisfying a similarity matching condition from a target picture library and determine a similarity threshold of the target picture;
and the target identification unit is configured to determine a target identification result according to whether the similarity between the target picture to be identified and the target picture meeting the similarity matching condition is higher than the determined similarity threshold value.
11. The apparatus of claim 10, wherein,
the threshold setting unit determines the similarity between the target picture of the target and the target pictures of other targets in the target picture library, and sets a similarity threshold for the target picture of the target according to the maximum value of the determined similarity and the importance of the target.
12. The apparatus of claim 10, wherein the object is a person, the object recognition is a human face recognition, and the object picture is a human face picture;
the threshold setting unit sets a similarity threshold for each face pose pair, a face pose pair being a possible combination of a face pose shown by a face picture of the same person and a face pose shown by a face picture to be recognized;
the threshold determining unit determines the similarity threshold of the face pose pair formed by the face pose shown by the face picture satisfying the similarity matching condition and the face pose shown by the face picture to be recognized;
and the target recognition unit determines a face recognition result according to whether the similarity between the face picture to be recognized and the face picture meeting the similarity matching condition is higher than the similarity threshold of the determined face posture pair.
13. The apparatus of claim 10, wherein the object is a person, the object recognition is a human face recognition, and the object picture is a human face picture;
the threshold setting unit sets a similarity threshold for each age pair, an age pair being a possible combination of an age shown by a face picture of the same person and an age shown by a face picture to be recognized;
the threshold determining unit determines the similarity threshold of the age pair formed by the age shown by the face picture satisfying the similarity matching condition and the age shown by the face picture to be recognized;
and the target identification unit determines a face identification result according to whether the similarity between the face picture to be identified and the face picture meeting the similarity matching condition is higher than the similarity threshold of the determined age pair.
14. The apparatus of claim 10, wherein the object is a person, the object recognition is a human face recognition, and the object picture is a human face picture;
the threshold setting unit sets a similarity threshold for each illumination pair, an illumination pair being a possible combination of the illumination of a face picture of the same person and the illumination of a face picture to be recognized;
the threshold determining unit determines the similarity threshold of the illumination pair formed by the illumination of the face picture satisfying the similarity matching condition and the illumination of the face picture to be recognized;
and the target recognition unit determines a face recognition result according to whether the similarity between the face picture to be recognized and the face picture meeting the similarity matching condition is higher than the similarity threshold of the determined illumination pair.
15. The apparatus of claim 10, wherein the apparatus further comprises:
a similarity transformation unit configured to transform a similarity threshold of a target picture, which satisfies a similarity matching condition and has the highest similarity with a target picture to be recognized, to a predetermined expected value, and transform a similarity between the target picture satisfying the similarity matching condition and the target picture to be recognized according to a transformation ratio between the expected value and the similarity threshold;
the target recognition unit determines a target recognition result according to whether the transformed similarity is higher than the expected value.
16. The apparatus of claim 10, wherein the apparatus further comprises a similarity transformation unit;
when a plurality of target pictures to be identified contained in the same target track are identified, the threshold value determining unit determines a target picture meeting the similarity matching condition from a target picture library aiming at each target picture to be identified;
the similarity transformation unit is configured to transform a similarity threshold of a target picture which satisfies a similarity matching condition and has the highest similarity with a target picture to be recognized into a predetermined expected value, and transform the similarity between the target picture satisfying the similarity matching condition and the target picture to be recognized according to a transformation ratio between the expected value and the similarity threshold;
the target recognition unit determines a target recognition result according to whether the transformed similarity is higher than the expected value.
17. The apparatus of claim 10, wherein,
when the target recognition unit recognizes a plurality of target pictures to be recognized contained in the same target track, a target recognition result is determined for each target picture to be recognized, and the same result with the largest number in the recognition results is used as the recognition result of the track.
18. The apparatus of any one of claims 10-17,
when the threshold determining unit determines that at least two target pictures meet the similarity matching condition and the target identifying unit determines that the similarity between the target picture to be identified and each of the at least two target pictures is higher than the similarity threshold of the target picture, the target identifying unit determines whether the difference between every two similarities between the target picture to be identified and each of the at least two target pictures is smaller than a set value or not, and outputs an unidentifiable result when the difference is smaller than the set value; and if not, taking the target shown by the target picture with the highest similarity with the target picture to be identified in the at least two target pictures as a target identification result.
19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to carry out the steps of the method of object recognition according to any one of claims 1 to 9.
CN201910650008.5A 2019-07-18 2019-07-18 Target identification method, device and storage medium Pending CN112241666A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910650008.5A CN112241666A (en) 2019-07-18 2019-07-18 Target identification method, device and storage medium

Publications (1)

Publication Number Publication Date
CN112241666A true CN112241666A (en) 2021-01-19

Family

ID=74168059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910650008.5A Pending CN112241666A (en) 2019-07-18 2019-07-18 Target identification method, device and storage medium

Country Status (1)

Country Link
CN (1) CN112241666A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688708A (en) * 2021-08-12 2021-11-23 北京数美时代科技有限公司 Face recognition method, system and storage medium based on probability characteristics
US11455391B2 (en) * 2020-10-28 2022-09-27 International Business Machines Corporation Data leakage and misuse detection

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136533A (en) * 2011-11-28 2013-06-05 汉王科技股份有限公司 Face recognition method and device based on dynamic threshold value
US20130216107A1 (en) * 2012-02-20 2013-08-22 Chih-Hsung Huang Method of surveillance by face recognition
CN106295672A (en) * 2015-06-12 2017-01-04 ***(深圳)有限公司 A kind of face identification method and device
CN106295571A (en) * 2016-08-11 2017-01-04 深圳市赛为智能股份有限公司 The face identification method of illumination adaptive and system
CN107818308A (en) * 2017-10-31 2018-03-20 平安科技(深圳)有限公司 A kind of recognition of face intelligence comparison method, electronic installation and computer-readable recording medium
CN108009465A (en) * 2016-10-31 2018-05-08 杭州海康威视数字技术股份有限公司 A kind of face identification method and device
CN108197250A (en) * 2017-12-29 2018-06-22 深圳云天励飞技术有限公司 Picture retrieval method, electronic equipment and storage medium
CN108241836A (en) * 2016-12-23 2018-07-03 同方威视技术股份有限公司 For the method and device of safety check
CN109711357A (en) * 2018-12-28 2019-05-03 北京旷视科技有限公司 A kind of face identification method and device


Similar Documents

Publication Publication Date Title
US11113587B2 (en) System and method for appearance search
US10176367B2 (en) Computer implemented method for sign language characterization
US8358837B2 (en) Apparatus and methods for detecting adult videos
KR20200000824A (en) Method for recognizing facial expression based on deep-learning model using center-dispersion loss function
Becker et al. Evaluating open-universe face identification on the web
CN104504362A (en) Face detection method based on convolutional neural network
KR101175597B1 (en) Method, apparatus, and computer-readable recording medium for detecting location of face feature point using adaboost learning algorithm
JP6557592B2 (en) Video scene division apparatus and video scene division program
Haurilet et al. Naming TV characters by watching and analyzing dialogs
CN110232331B (en) Online face clustering method and system
CN112241666A (en) Target identification method, device and storage medium
Nguyen et al. Multi-level detector for pornographic content using CNN models
Shaker et al. Human Gender and Age Detection Based on Attributes of Face.
KR100545559B1 (en) Method for recognizing a face using Haar-like feature/LDA and apparatus thereof
CN104751144A (en) Frontal face quick evaluation method for video surveillance
Nguyen et al. Towards recognizing facial expressions at deeper level: Discriminating genuine and fake smiles from a sequence of images
Sánchez-Oro et al. URJC&UNED at ImageCLEF 2013 Photo Annotation Task.
CN110490027A (en) A kind of face characteristic extraction training method and system for recognition of face
Ghahramani et al. Unseen family member classification using mixture of experts
Xie et al. Mining representative actions for actor identification
CN109034040B (en) Character recognition method, device, equipment and medium based on cast
Orfanidis et al. Facial image clustering in stereo videos using local binary patterns and double spectral analysis
Strat et al. Bags of Trajectory Words for video indexing
Orfanidis et al. Stereo facial image clustering using double spectral analysis
Nawaz et al. Emdedded Large Scale Face Recognition in the Wild

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination