WO2022105130A1 - Composite expression recognition method, apparatus, terminal device and storage medium - Google Patents

Composite expression recognition method, apparatus, terminal device and storage medium

Info

Publication number
WO2022105130A1
Authority
WO
WIPO (PCT)
Prior art keywords
composite
expression
probability
recognized
compound
Prior art date
Application number
PCT/CN2021/091094
Other languages
English (en)
French (fr)
Inventor
Yi Miao (易苗)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Priority date: 2020-11-19
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2022105130A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification

Definitions

  • The present application belongs to the technical field of artificial intelligence, and in particular relates to a composite expression recognition method, apparatus, terminal device and storage medium.
  • One of the purposes of the embodiments of the present application is to provide a composite expression recognition method, apparatus, terminal device and storage medium, aiming to solve the technical problem in the prior art that it is difficult to accurately identify the primary expression and the secondary expression in a composite facial expression.
  • In a first aspect, an embodiment of the present application provides a composite expression recognition method, including:
  • recognizing the composite expression in an image to be recognized using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of composite expressions;
  • determining a predicted composite expression according to the maximum of the first probability values, determining a corresponding first target model in a set of second expression recognition models based on the predicted composite expression, and inputting the image to be recognized into the first target model to obtain a first composite probability value predicting that the image shows a first composite expression, each first target model corresponding to two predicted composite expressions;
  • inputting the image to be recognized into a third expression recognition model to predict a plurality of target single expressions contained in the image;
  • determining a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and inputting the image to be recognized into the second target model to obtain a second composite probability value predicting that the image shows a second composite expression, the two predicted composite expressions corresponding to each second target model being obtainable by pairwise combination of the plurality of target single expressions;
  • acquiring a first misclassification probability corresponding to the first expression recognition model, a first composite misclassification probability corresponding to each second expression recognition model, and a second composite misclassification probability corresponding to each second expression recognition model; and
  • obtaining a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
  • In a second aspect, an embodiment of the present application provides a composite expression recognition apparatus, including:
  • a first prediction module, configured to recognize the composite expression in the image to be recognized using the first expression recognition model, obtaining first probability values in one-to-one correspondence with the plurality of composite expressions;
  • a first composite prediction module, configured to determine the predicted composite expression according to the maximum of the first probability values, determine the corresponding first target model in the set of second expression recognition models based on the predicted composite expression, and input the image to be recognized into the first target model to obtain the first composite probability value predicting that the image shows a first composite expression, each first target model corresponding to two predicted composite expressions;
  • a single expression prediction module, configured to input the image to be recognized into the third expression recognition model and predict the plurality of target single expressions contained in the image;
  • a second composite prediction module, configured to determine the corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and input the image to be recognized into the second target model to obtain the second composite probability value predicting that the image shows a second composite expression, the two predicted composite expressions corresponding to each second target model being obtainable by pairwise combination of the plurality of target single expressions;
  • an acquisition module, configured to acquire the first misclassification probability corresponding to the first expression recognition model, the first composite misclassification probability corresponding to each second expression recognition model, and the second composite misclassification probability corresponding to each second expression recognition model; and
  • a recognition module, configured to obtain the target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
  • In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the composite expression recognition method of the first aspect to obtain the target classification result of the image to be recognized.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the composite expression recognition method of the first aspect to obtain the target classification result of the image to be recognized.
  • In a fifth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to implement the steps of the composite expression recognition method of the first aspect to obtain the target classification result of the image to be recognized.
  • Compared with the prior art, the embodiments of the present application offer the following advantages:
  • The first expression recognition model predicts a first probability value for each composite expression as one classification result. Based on this prediction, a first target model is determined and used to predict the image to be recognized again, and the resulting first composite probability value serves as another classification result. A third expression recognition model, which only predicts the target single expressions in the image, then performs single-expression recognition; based on its prediction, a second target model is determined and used to predict the image once more, and the resulting second composite probability value serves as a further classification result.
  • For the composite expression recognition task, the predictions of these multiple expression recognition models can thus be integrated as the basis for an initial recognition of the composite facial expression, and the misclassification probability associated with each prediction can then be used as correction information to refine the result into the target classification, further improving the accuracy of composite facial expression recognition.
  • FIG. 1 is a flowchart of an implementation of a composite expression recognition method provided by an embodiment of the present application;
  • FIG. 2 is a schematic diagram of an implementation of S105 of the composite expression recognition method provided by an embodiment of the present application;
  • FIG. 3 is a schematic diagram of an implementation of S106 of the composite expression recognition method provided by an embodiment of the present application;
  • FIG. 4 is a schematic diagram of another implementation of S106 of the composite expression recognition method provided by an embodiment of the present application;
  • FIG. 5 is a schematic diagram of a further implementation of S106 of the composite expression recognition method provided by an embodiment of the present application;
  • FIG. 6 is a structural block diagram of a composite expression recognition apparatus provided by an embodiment of the present application;
  • FIG. 7 is a structural block diagram of a terminal device provided by an embodiment of the present application.
  • The composite expression recognition method provided by the embodiments of the present application can be applied to terminal devices such as tablet computers, notebook computers and ultra-mobile personal computers (UMPCs); the embodiments of the present application place no restriction on the specific type of terminal device.
  • FIG. 1 shows an implementation flowchart of a composite expression recognition method provided by an embodiment of the present application. The method includes the following steps:
  • S101: Recognize the composite expression in the image to be recognized using the first expression recognition model, obtaining first probability values in one-to-one correspondence with the plurality of composite expressions.
  • The image to be recognized is a face image containing a human face, from which the facial expression can be identified.
  • A composite expression in a face image can be understood as a facial expression containing at least two expressions at the same time, and a composite expression can be regarded as a master-slave expression pair with a primary-secondary relationship.
  • The master-slave relationship expresses which of the expressions contained in the face image is the primary expression and which is the secondary expression. Understandably, when there are two expressions, one is primary and one is secondary.
  • When there are more than two expressions, the primary expression in the face image is clearly distinguishable from the others, and all remaining expressions can be regarded as secondary.
  • For ease of explanation, this embodiment is described using images to be recognized that contain two expressions.
  • Facial expressions include, but are not limited to, happiness, surprise, disgust, fear, sadness, joy and neutral, and any combination of two of these facial expressions can be regarded as a composite expression.
  • It should be noted, however, that a composite expression whose primary expression is happiness and whose secondary expression is surprise, and a composite expression whose primary expression is surprise and whose secondary expression is happiness, are two different master-slave (composite) expressions.
  • In application, the first expression recognition model is a model trained on first training data.
  • The first training data can be regarded as face images containing composite expressions composed of the above eight single expressions. For the seven non-neutral expressions, 42 composite expressions can be obtained by ordered combination.
  • In addition, the first training data may also include face images of the eight single expressions, forming a set of face images covering 50 composite expressions. In this case, a single expression can be regarded as a composite expression whose primary and secondary expressions are identical; this is not limited here.
  • In a specific application, the first training data can be obtained from a composite expression competition data set. The data set contains 31,250 expression pictures from 125 individuals, with 50 kinds of composite expression pictures per individual and 5 pictures per expression.
  • The data set can be divided into 83 individuals with a total of 20,650 images as the training set (the first training data), 9 individuals with a total of 2,250 images as the validation set, and the remaining 33 individuals as the test set.
  • A residual network model is used as the base network, and the first training data are input into the residual network model for training.
  • During training, the residual network extracts features from the face region to obtain a 512-dimensional face feature vector.
  • This face feature is combined with the normalized coordinates of 136-dimensional facial key points to form the composite expression feature of the expression picture.
  • The composite expression feature is fed into the classification layer, the classification result is output, the classification loss is computed against the actual composite expression label, and the residual network model is iteratively updated according to this loss to obtain the first expression recognition model.
  • Cross entropy can be used as the loss function. Since cross entropy measures the distance between the actual output (probability) and the expected output (probability), a smaller cross-entropy value means the two probability distributions are closer; on this basis, the iteratively updated first expression recognition model achieves high accuracy in recognizing composite expressions.
  • For the first training data, the true label of each composite expression picture can be annotated in advance, i.e., the probability (expected output) of the specific composite expression class to which each picture belongs.
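  • The training setup just described maps naturally onto a standard deep-learning stack. Below is a minimal sketch in PyTorch, assuming a ResNet-18 backbone; the class name, the 68-landmark layout (68 x 2 = 136 coordinates) and the hyperparameters are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn
from torchvision import models

class CompositeExpressionNet(nn.Module):
    """Sketch: 512-d ResNet features concatenated with 136-d normalized
    key-point coordinates, classified into 50 composite expressions."""
    def __init__(self, num_classes: int = 50):
        super().__init__()
        self.backbone = models.resnet18(weights=None)
        self.backbone.fc = nn.Identity()              # expose the 512-d feature
        self.classifier = nn.Linear(512 + 136, num_classes)

    def forward(self, face: torch.Tensor, landmarks: torch.Tensor):
        feat = self.backbone(face)                    # (B, 512)
        feat = torch.cat([feat, landmarks], dim=1)    # (B, 648) composite feature
        return self.classifier(feat)                  # (B, 50) logits

model = CompositeExpressionNet()
criterion = nn.CrossEntropyLoss()                     # cross entropy, as in the text
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# one illustrative training step on dummy tensors
face = torch.randn(8, 3, 224, 224)
landmarks = torch.rand(8, 136)                        # normalized key-point coordinates
labels = torch.randint(0, 50, (8,))                   # pre-annotated composite labels
loss = criterion(model(face, landmarks), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```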
  • After the first expression recognition model is obtained, the image to be recognized is input into it, and multiple first probability values are obtained: for the 50 composite expressions, the first expression recognition model outputs 50 first probability values, each being the model's predicted probability that the image to be recognized belongs to that composite expression class.
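  • Continuing the sketch above, obtaining the 50 first probability values and the predicted composite expression used in S102 could look as follows; `face` and `landmarks` are the dummy tensors from the previous block.

```python
import torch
import torch.nn.functional as F

model.eval()
with torch.no_grad():
    logits = model(face[:1], landmarks[:1])       # one image to be recognized
    first_probs = F.softmax(logits, dim=1)[0]     # the 50 first probability values
predicted_composite = int(first_probs.argmax())   # class used to pick the first target model
```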
  • S102: Determine the predicted composite expression according to the maximum of the first probability values, determine the corresponding first target model in the set of second expression recognition models based on the predicted composite expression, and input the image to be recognized into the first target model to obtain a first composite probability value predicting that the image shows a first composite expression; each first target model corresponds to two predicted composite expressions.
  • In application, each second expression recognition model is a model trained on second training data, which may be derived from the first training data.
  • Specifically, for the primary and secondary expressions of each composite expression, if another composite expression has the same two expressions with the primary-secondary roles reversed, the training data corresponding to these two opposite composite expressions can be used to train one second expression recognition model. In this way, a set containing multiple second expression recognition models is obtained.
  • For example, for the composite expression with happiness as primary and surprise as secondary, and the composite expression with surprise as primary and happiness as secondary, the expression pictures corresponding to the two composite expressions can be obtained from the composite expression competition data set, and the specific master-slave label of each picture (happy-surprised or surprised-happy) is determined, forming one set of second training data; a binary classification model (second expression recognition model) for the happy/surprised composite expressions can then be trained.
  • The resulting second expression recognition model is used only to predict the master-slave relationship between happiness and surprise in the composite expression of the image to be recognized.
  • Specifically, for the 42 composite expressions obtained above, the 42 classes can be grouped into 21 sets of second training data, from which 21 second expression recognition models can be trained. Each second expression recognition model is a binary classifier used to predict the primary and secondary expressions in an expression picture.
  • For example, the composite expression pictures composed of primary expression A and secondary expression B, together with those composed of primary expression B and secondary expression A, can serve as one set of second training data for training a second expression recognition model that classifies the AB master-slave expressions.
  • When this second expression recognition model is then used to recognize the image to be recognized, it yields the first composite probability value that the composite expression in the image has primary expression A and secondary expression B, and/or the first composite probability value that it has primary expression B and secondary expression A.
  • In addition, for the eight single expressions, a single expression can likewise be treated as a composite expression whose primary and secondary expressions are identical, giving 29 second expression recognition models in total.
  • In application, given the first probability values obtained for the multiple composite expressions, the composite expression corresponding to the largest first probability value is taken as the predicted composite expression.
  • Each second expression recognition model recognizes one class of composite expression in the image to be recognized; the predicted composite expression therefore falls within the class handled by some second expression recognition model, and that model is taken as the first target model.
  • The first composite expression predicted by the first target model may agree with the predicted composite expression of the first expression recognition model, or may be the opposite; this is not limited. Note that when the first target model predicts the image to be recognized, the prediction result obtained is the first composite expression, and the value predicting the image as that first composite expression is the first composite probability value.
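  • One way to organize the routing from the predicted composite expression to its pairwise binary model is a lookup keyed by unordered expression pairs. This is a hypothetical bookkeeping sketch; `class_to_pair` and `pair_models` are assumed tables built alongside training, and the stub linear layer stands in for a trained binary model.

```python
import torch
import torch.nn.functional as F

# Assumed lookup tables (not specified by the patent):
# class_to_pair: composite class index (0..49) -> unordered expression pair
# pair_models:   unordered pair -> trained binary (master/slave) model
class_to_pair: dict[int, frozenset] = {3: frozenset({"happy", "surprised"})}
pair_models = {frozenset({"happy", "surprised"}): torch.nn.Linear(648, 2)}  # stub

predicted_composite = 3                           # from the first model's argmax
composite_feature = torch.randn(1, 648)           # 512-d face + 136-d key points

first_target = pair_models[class_to_pair[predicted_composite]]
with torch.no_grad():
    pair_probs = F.softmax(first_target(composite_feature), dim=1)[0]
first_composite_prob = float(pair_probs.max())    # the first composite probability value
```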
  • In application, the third expression recognition model is a model trained on third training data, which may likewise be derived from the first training data.
  • Specifically, if the primary and secondary expressions of one composite expression are the reverse of those of another, the two composite expressions are merged into one new composite expression category, yielding multiple new composite expression categories.
  • For example, the composite expression with happiness as primary and surprise as secondary, and the composite expression with surprise as primary and happiness as secondary, can together serve as one category of third training data. When training the third expression recognition model, the master-slave relationship within the composite expressions is ignored; each expression picture is labeled only with the multiple single expressions it contains.
  • In this way, third training data covering 29 categories that disregard the master-slave relationship in composite expressions can be obtained, i.e., the 21 types of training data from S102 combined with the 8 types whose primary and secondary expressions are identical.
  • The expression pictures corresponding to each new composite expression category can then be obtained again from the composite expression competition data set as the third training data to train the third expression recognition model.
  • There is exactly one trained third expression recognition model, and it is used only to predict the target single expressions contained in the image to be recognized.
  • Understandably, when the third expression recognition model performs expression recognition on the image to be recognized, it predicts a third probability value for each composite expression formed by two single expressions. That is, it only outputs the probability that the image is a composite of a given pair of single expressions, without considering the master-slave relationship between them; this differs from the first probability values, which encode the specific master-slave relationship of each composite expression. The third expression recognition model therefore outputs 29 third probability values, and the single expressions contained in the composite expression with the largest third probability value are taken as the target single expressions. For example, if the model predicts that the happy-surprised combination has the largest third probability value, then happiness and surprise are each taken as a target single expression.
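  • The 21 + 8 = 29 unordered categories of the third model can be enumerated programmatically. A sketch follows, with an illustrative label set (the exact eight labels in the source translation are ambiguous) and a stub linear layer standing in for the trained model.

```python
import torch
import torch.nn.functional as F

BASIC = ["happy", "surprised", "disgusted", "fearful", "sad", "joyful", "angry"]  # illustrative
PAIRS = [frozenset({e}) for e in BASIC + ["neutral"]]            # 8 same-same categories
PAIRS += [frozenset({BASIC[i], BASIC[j]})                        # 21 distinct unordered pairs
          for i in range(len(BASIC)) for j in range(i + 1, len(BASIC))]
assert len(PAIRS) == 29

third_model = torch.nn.Linear(648, 29)            # stub for the single trained third model
composite_feature = torch.randn(1, 648)
with torch.no_grad():
    third_probs = F.softmax(third_model(composite_feature), dim=1)[0]   # 29 third probability values
target_single_expressions = sorted(PAIRS[int(third_probs.argmax())])    # e.g. ["happy", "surprised"]
```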
  • S104: Determine the corresponding second target model in the set of second expression recognition models according to the multiple target single expressions, and input the image to be recognized into the second target model to obtain a second composite probability value predicting that the image shows a second composite expression; the two predicted composite expressions corresponding to each second target model can be obtained by pairwise combination of the multiple target single expressions.
  • As explained for S102, each second expression recognition model recognizes one class of composite expression in the image to be recognized. Therefore, from the composite expression formed by the target single expressions, the matching class among those recognized by the second expression recognition models can be determined, and the corresponding model is taken as the second target model. In other words, the two predicted composite expressions that each second target model distinguishes can be obtained by pairwise combination of the multiple target single expressions.
  • For example, when the target single expressions are determined to be surprise and happiness, the second expression recognition model that performs binary classification of the happy/surprised master-slave expressions is the second target model. Using the second target model to recognize the image again, the output probability that the primary expression is happiness and the secondary expression is surprise is the second composite probability value; alternatively, the output probability that the primary expression is surprise and the secondary expression is happiness is the second composite probability value.
  • S105: Acquire the first misclassification probability corresponding to the first expression recognition model, the first composite misclassification probability corresponding to each second expression recognition model, and the second composite misclassification probability corresponding to each second expression recognition model.
  • In application, the first misclassification probability is the probability that the first expression recognition model errs when predicting the master-slave expressions of each composite expression.
  • Specifically, after the first expression recognition model is trained, the data in the test set can be used for this determination. For example, for the 50 composite expressions with 5 expression pictures each, the first expression recognition model predicts the 5 pictures of each composite expression, and the number of correctly predicted pictures per class is counted. The misclassification probability of the first expression recognition model for each composite expression is then computed from the number of correct predictions and the total number (5). The misclassification probability can be written as y = 1 - a_ij, where i = 1, 2, 3 and j ranges over 1 to 50: when i = 1, a_1j is the classification accuracy of the first expression recognition model for the j-th composite expression; when i = 2, a_2j is the classification accuracy with which a second expression recognition model predicts the j-th composite expression on the basis of the first model's prediction; and when i = 3, a_3j is the classification accuracy with which a second expression recognition model predicts the j-th composite expression on the basis of the third model's prediction. Here, classification accuracy = number of correctly predicted samples / total number of samples.
  • Understandably, the first composite misclassification probability is obtained by first predicting the test images of a composite expression with the first expression recognition model and then predicting again with the corresponding second model on the basis of that result; from this, the first composite misclassification probability of each second expression recognition model for its composite expression is computed. Similarly, the second composite misclassification probability of each second expression recognition model is computed on the basis of the third expression recognition model's prediction, which is not described in detail again.
  • The first misclassification probability, the first composite misclassification probability and the second composite misclassification probability can all be obtained after training of the above expression recognition models is completed, and stored in the terminal device so that they can be called at any time.
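  • Given per-stage predictions on the held-out test set, the stored misclassification probabilities y_j = 1 - a_j can be computed once after training. A sketch with NumPy; the flat array layout is an assumption for illustration.

```python
import numpy as np

def misclassification_probabilities(predictions: np.ndarray,
                                    labels: np.ndarray,
                                    num_classes: int = 50) -> np.ndarray:
    """Per-class misclassification probability y_j = 1 - a_j, where a_j is
    the accuracy on test images of composite class j (5 images per class
    in the data set described above)."""
    miss = np.zeros(num_classes)
    for j in range(num_classes):
        mask = labels == j
        acc = (predictions[mask] == j).mean() if mask.any() else 0.0
        miss[j] = 1.0 - acc
    return miss

# illustrative use with dummy test-set outputs for one of the three stages
labels = np.repeat(np.arange(50), 5)               # 5 test pictures per class
predictions = labels.copy()
predictions[::7] = 0                               # inject some errors
first_misclassification = misclassification_probabilities(predictions, labels)
```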
  • S106: Obtain the target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
  • In application, the target classification result is the final prediction for the image to be recognized, i.e., the final prediction of the master-slave expressions in the composite expression.
  • Specifically, the classification value is computed as l = (1 - a_1j)·l_1j + (1 - a_2j)·l_2j + (1 - a_3j)·l_3j, for j = 1, 2, ..., 50, where the a_ij (i = 1, 2, 3) are as explained in S105, and the index i of l_ij corresponds to that of a_ij. For example, for i = 1, l_1j is the first probability value with which the first expression recognition model predicts that the image to be recognized belongs to the j-th composite expression; the meanings of l_2j and l_3j follow accordingly and are not described in detail again.
  • It should be noted that when the binary classification model for a composite expression (the first target model) recognizes the image to be recognized, the prediction it yields is only the first composite probability value that the image belongs to that class of master-slave composite expression. For example, if the first target model predicts the composite expression AB with a first composite probability value of 1, then the first composite probability value for the composite expression BA is 0.
  • Since the first target model does not output first composite probability values for the other 48 composite expression classes, those classes (AC, CA, AD, DA, ...) participate in the above calculation with a first composite probability value of 0.
  • The second composite probability values l_3j are handled in the same way as the first composite probability values l_2j, which is not described in detail again.
  • In this way, 50 predicted probability values corresponding to the 50 composite expression classes are obtained from the three expression recognition models. The maximum of the 50 predicted values is then taken as the target probability value, and the composite expression class (a composite expression with a master-slave relationship) corresponding to it is taken as the final target classification result of the image to be recognized.
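  • Putting S106 together, below is a sketch of the fusion and final argmax, following the coefficients (1 - a_ij) exactly as the formula above states them; all array shapes and names are illustrative.

```python
import numpy as np

def fuse(first_probs, first_pair, second_pair, miss1, miss2, miss3):
    """Score each of the 50 composite classes with
    l_j = (1-a_1j)*l_1j + (1-a_2j)*l_2j + (1-a_3j)*l_3j,
    using the (1 - a_ij) coefficients as stated in the text. first_pair
    and second_pair are 50-d vectors in which every class the binary
    models did not score is padded with the preset value 0 (see S1061)."""
    scores = miss1 * first_probs + miss2 * first_pair + miss3 * second_pair
    return int(scores.argmax())                    # target classification result

# dummy inputs with the stated shapes
rng = np.random.default_rng(0)
first_probs = rng.dirichlet(np.ones(50))           # 50 first probability values
first_pair = np.zeros(50)
first_pair[3] = 0.9                                # only the scored class is non-zero
second_pair = np.zeros(50)
second_pair[3] = 0.8
m1, m2, m3 = rng.uniform(0.0, 0.3, (3, 50))        # stored misclassification probabilities
target_class = fuse(first_probs, first_pair, second_pair, m1, m2, m3)
```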
  • In this embodiment, the first expression recognition model predicts the first probability value of each composite expression as one classification result; based on that prediction, the first target model is determined and predicts the image to be recognized again, giving the first composite probability value as another classification result; then the third expression recognition model, which predicts only the target single expressions in the image, performs single-expression recognition, and based on its prediction the second target model predicts the image once more, giving the second composite probability value as a further classification result.
  • For the composite expression recognition task, the predictions of these multiple expression recognition models are integrated as the basis for an initial recognition of the composite facial expression, and the misclassification probability of each prediction is used as correction information to refine the result into the target classification, further improving recognition accuracy.
  • In another embodiment, the first expression recognition model is obtained by training on the first training data, and the training data need not include the first misclassification probability.
  • For each composite expression, composite expression recognition can be performed on the multiple images of that expression in the test set of the data set described in S101.
  • A prediction error here means that, when the first expression recognition model predicts a test image of a composite expression, the prediction result (the predicted composite expression) is inconsistent with the actual composite expression of that image. The number of test images per composite expression may be equal or unequal; in this embodiment, to make the first misclassification probability fair across composite expressions, the number of test images per composite expression can be kept the same.
  • In another embodiment, S106 (obtaining the target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability) further includes the following sub-steps S1061-S1063, detailed as follows:
  • S1061: The first target model and the second target model are both binary classification models and can each output probability values for only two composite expressions.
  • Consequently, no first composite probability values are output for the remaining 48 composite expressions (those other than the first composite expression). To facilitate computing the predicted probability value of every composite expression, the first composite probability values of the remaining 48 composite expressions are set to 0 (the preset value). The preset value can be set by the user according to the actual situation; for details, refer to the example for the first composite probability value in S106.
  • Similarly, the second composite probability values with which the second target model predicts the image to be recognized as anything other than the second composite expression are all adjusted to the preset value. On this basis, the first composite probability values associated with the first target model (the value for the first composite expression plus the values for the non-first composite expressions) number 50, one per composite expression class; likewise, the second composite probability values associated with the second target model also number 50, one per composite expression class.
  • S1062: Calculate the classification value corresponding to each composite expression in the image to be recognized from the probability values, the misclassification probabilities and the preset value.
  • The calculation formula and its explanation are those given in S106 and are not repeated here; the classification value corresponding to each composite expression is the value l in the formula of S106.
  • S1063: The classification value is the value predicted comprehensively from the three kinds of expression recognition models. Therefore, the maximum can be determined among the multiple classification values, and the composite expression corresponding to that maximum is taken as the target classification result closest to the true composite expression of the image, improving the accuracy of recognizing the composite expression of the image to be recognized.
  • In another embodiment, the image to be recognized comprises multiple images all belonging to the same composite expression category; S106 (obtaining the target classification result according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability) further includes the following sub-steps S1064-S1066, detailed as follows:
  • S1064: Acquire the target classification result of each of the multiple images to be recognized of the same composite expression.
  • The multiple images of the same composite expression may be consecutive video frames from a video clip, or continuously captured pictures of a person. In practice, for a video containing a person, the person's expression changes very little across several consecutive frames, so the expressions in those frames can generally be regarded as the same composite expression class. The target classification result of each such image can then be obtained by the composite expression recognition method described above.
  • The number of consecutive frames can be set by the user according to the actual situation, for example five images to be recognized of the same composite expression.
  • S1065-S1066: Among the target classification results of the multiple images, the result that occurs most often is determined as the final target classification result, so that applying the composite expression recognition method to multiple images of the same composite expression category further improves the accuracy of recognizing the person's composite expression.
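  • A sketch of the per-frame voting in S1064-S1066: collect each frame's target classification result and keep the most frequent one.

```python
from collections import Counter

def final_classification(per_frame_results: list[int]) -> int:
    """S1064-S1066 sketch: given the per-frame target classification
    results of images of the same composite expression, return the most
    frequent result as the final target classification result."""
    return Counter(per_frame_results).most_common(1)[0][0]

# e.g. five consecutive frames, one outlier
assert final_classification([17, 17, 4, 17, 17]) == 17
```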
  • In another embodiment, before S1064, the method further includes S1064a: Perform key-point clustering on multiple images to be recognized containing multiple classes of composite expressions, to obtain key-point feature information of each image.
  • Here the multiple images to be recognized contain several classes of composite expressions, and each class may correspond to several images.
  • The key points can be understood as the eyes, nose, mouth and similar parts of the person in each image, serving as the key points of the face image; each image can be processed with face detection technology to determine the coordinate information and feature information of its key points.
  • Clustering can be understood as follows: after the coordinate information and feature information of the key points in each image are obtained, whether two images belong to the same composite expression class is determined according to whether the difference between their coordinate information and feature information exceeds a preset value.
  • In this way, the multiple images belonging to the same composite expression category can be determined from all of the images by key-point clustering. The multiple images of each same composite expression category can then be processed through steps S1064-S1066 described above, which is not explained again.
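  • The key-point clustering of S1064a can be sketched as a greedy grouping on normalized landmark vectors; the distance threshold is an assumed parameter, since the text only says the difference must not exceed a preset value.

```python
import numpy as np

def group_by_keypoints(keypoints: np.ndarray, threshold: float = 0.1) -> list[list[int]]:
    """Greedy sketch of S1064a: images whose normalized key-point
    coordinates differ by less than a preset value are grouped as the
    same composite expression class. `threshold` is illustrative."""
    groups: list[list[int]] = []
    centers: list[np.ndarray] = []
    for idx, kp in enumerate(keypoints):
        for g, center in enumerate(centers):
            if np.linalg.norm(kp - center) < threshold:
                groups[g].append(idx)
                break
        else:                                  # no existing group matched
            groups.append([idx])
            centers.append(kp)
    return groups

frames = np.random.rand(10, 136)               # 10 images, 136-d normalized key points
clusters = group_by_keypoints(frames)          # each cluster then feeds S1064-S1066
```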
  • In another embodiment, before acquiring the target classification results of the multiple images of the same composite expression, the method further includes: monitoring a preset video for the initial video image in which a face first appears, and determining that image together with several subsequent consecutive frames as the multiple images to be recognized of the same composite expression.
  • The preset video may be a video pre-cached in a storage path designated by the terminal device, or a video uploaded by the user to the terminal device; this is not limited.
  • Specifically, the terminal device can play the video and monitor for the initial video image in which a face image first appears.
  • The frame rate of video playback is usually 24 frames per second, so the next four consecutive frames can be regarded as images to be recognized having the same composite expression as the initial video image.
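  • A sketch of collecting the initial face frame plus the following four frames with OpenCV; the file path and the Haar-cascade face detector are assumptions for illustration, as the text does not name a specific detector.

```python
import cv2

# One convenient off-the-shelf face detector (assumed, not mandated here).
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("preset_video.mp4")     # hypothetical pre-cached video
batch, collecting = [], False
while cap.isOpened() and len(batch) < 5:       # initial frame + 4 consecutive frames
    ok, frame = cap.read()
    if not ok:
        break
    if not collecting:                         # wait for the first frame with a face
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        collecting = len(detector.detectMultiScale(gray)) > 0
    if collecting:
        batch.append(frame)                    # the same-composite-expression batch
cap.release()
```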
  • In another embodiment, after the corresponding target classification result is obtained by the terminal device, the target classification result can be uploaded to a blockchain.
  • Uploading the target classification result to the blockchain ensures its security as well as fairness and transparency to users, and the user equipment can download the target classification result from the blockchain in order to verify whether it has been tampered with.
  • The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated by cryptographic methods, each data block containing a batch of network transaction information used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain can comprise a blockchain underlying platform, a platform product service layer and an application service layer.
  • Referring to FIG. 6, FIG. 6 is a structural block diagram of a composite expression recognition apparatus provided by an embodiment of the present application. Each unit of the apparatus is used to execute the steps in the embodiments corresponding to FIG. 1 to FIG. 5.
  • The composite expression recognition apparatus 600 includes: a first prediction module 610, a first composite prediction module 620, a single expression prediction module 630, a second composite prediction module 640, an acquisition module 650 and a recognition module 660, wherein:
  • The first prediction module 610 is configured to recognize the composite expression in the image to be recognized using the first expression recognition model, obtaining first probability values in one-to-one correspondence with the plurality of composite expressions.
  • The first composite prediction module 620 is configured to determine the predicted composite expression according to the maximum of the first probability values, determine the corresponding first target model in the set of second expression recognition models based on the predicted composite expression, and input the image to be recognized into the first target model to obtain the first composite probability value predicting that the image shows a first composite expression; each first target model corresponds to two predicted composite expressions.
  • The single expression prediction module 630 is configured to input the image to be recognized into the third expression recognition model and predict the plurality of target single expressions contained in the image.
  • The second composite prediction module 640 is configured to determine the corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and input the image to be recognized into the second target model to obtain the second composite probability value predicting that the image shows a second composite expression; the two predicted composite expressions corresponding to each second target model can be obtained by pairwise combination of the plurality of target single expressions.
  • The acquisition module 650 is configured to acquire the first misclassification probability corresponding to the first expression recognition model, the first composite misclassification probability corresponding to each second expression recognition model, and the second composite misclassification probability corresponding to each second expression recognition model.
  • The recognition module 660 is configured to obtain the target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
  • In one embodiment, the acquisition module 650 is further configured to compute the first misclassification probability of the first expression recognition model for each composite expression.
  • In one embodiment, the recognition module 660 is further configured to: adjust to the preset value the first composite probability values with which the first target model predicts the image to be recognized as anything other than the first composite expression, and the second composite probability values with which the second target model predicts the image as anything other than the second composite expression; calculate the classification value corresponding to each composite expression in the image according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, the second composite misclassification probability and the preset value; and determine the maximum among the multiple classification values, taking the composite expression corresponding to the maximum as the target classification result of the image to be recognized.
  • In one embodiment, the images to be recognized comprise multiple images all belonging to the same composite expression category, and the recognition module 660 is further configured to determine, among the target classification results of the multiple images, the most frequent result as the final target classification result; it is further configured to perform the key-point clustering and same-expression frame grouping described above.
  • In one embodiment, the composite expression recognition apparatus 600 further includes an uploading module, configured to upload the target classification result to the blockchain.
  • It should be noted that each unit/module above is used to execute the steps in the embodiments corresponding to FIG. 1 to FIG. 5; these steps have been explained in detail in the above embodiments and are not repeated here.
  • Referring to FIG. 7, FIG. 7 is a structural block diagram of a terminal device provided by another embodiment of the present application.
  • The terminal device 700 in this embodiment includes: a processor 701, a memory 702, and a computer program 703 stored in the memory 702 and executable on the processor 701, such as a program for a composite expression recognition method.
  • When the processor 701 executes the computer program 703, the steps in each of the foregoing embodiments of the composite expression recognition method are implemented, for example S101 to S106 shown in FIG. 1; alternatively, the functions of the modules in the embodiment corresponding to FIG. 6 are implemented, for example the functions of modules 610 to 660 shown in FIG. 6. Specifically as follows:
  • A terminal device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, and the processor, when executing the computer program, implements the steps of the composite expression recognition method described above to obtain the target classification result of the image to be recognized.
  • When the processor executes the computer program, it further implements: computing the first misclassification probability with which the first expression recognition model predicts each composite expression.
  • When the processor executes the computer program, it further implements: adjusting to the preset value the first composite probability values with which the first target model predicts the image to be recognized as anything other than the first composite expression, and the second composite probability values with which the second target model predicts the image as anything other than the second composite expression; calculating the classification value corresponding to each composite expression in the image according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, the second composite misclassification probability and the preset value; and determining the maximum among the multiple classification values, taking the composite expression corresponding to the maximum as the target classification result of the image to be recognized.
  • The images to be recognized comprise multiple images all belonging to the same composite expression category; when the processor executes the computer program, it further implements: determining the most frequent target classification result as the final target classification result of the multiple images.
  • When the processor executes the computer program, it further implements: taking multiple images with the same key-point feature information as multiple images of the same composite expression, to obtain the multiple images of each same composite expression class.
  • When the processor executes the computer program, it further implements: determining multiple frames of video images as multiple images to be recognized of the same composite expression.
  • When the processor executes the computer program, it further implements: uploading the target classification result to the blockchain.
  • A computer-readable storage medium stores a computer program which, when executed by a processor, implements the steps of the composite expression recognition method described above to obtain the target classification result of the image to be recognized.
  • When executed by the processor, the computer program further implements: computing the first misclassification probability with which the first expression recognition model predicts each composite expression.
  • When executed by the processor, the computer program further implements: adjusting to the preset value the first composite probability values with which the first target model predicts the image to be recognized as anything other than the first composite expression, and the second composite probability values with which the second target model predicts the image as anything other than the second composite expression; calculating the classification value corresponding to each composite expression in the image according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value, the second composite misclassification probability and the preset value; and determining the maximum among the multiple classification values, taking the composite expression corresponding to the maximum as the target classification result of the image to be recognized.
  • The images to be recognized comprise multiple images all belonging to the same composite expression category; when executed by the processor, the computer program further implements: determining the most frequent target classification result as the final target classification result of the multiple images.
  • When executed by the processor, the computer program further implements: taking multiple images with the same key-point feature information as multiple images of the same composite expression, to obtain the multiple images of each same composite expression class.
  • When executed by the processor, the computer program further implements: determining multiple frames of video images as multiple images to be recognized of the same composite expression.
  • When executed by the processor, the computer program further implements: uploading the target classification result to the blockchain.
  • The computer program 703 may be divided into one or more units, which are stored in the memory 702 and executed by the processor 701 to complete the present application. The one or more units may be a series of computer program instruction segments capable of accomplishing specific functions, the instruction segments being used to describe the execution process of the computer program 703 in the terminal device 700.
  • For example, the computer program 703 can be divided into a first prediction module, a first composite prediction module, a single expression prediction module, a second composite prediction module, an acquisition module and a recognition module, with the specific functions of each module as described above.
  • The terminal device may include, but is not limited to, the processor 701 and the memory 702. Those skilled in the art will understand that FIG. 7 is only an example of the terminal device 700 and does not constitute a limitation of it; the terminal device may include more or fewer components than shown, combine certain components, or use different components, and may also include input and output devices, network access devices, buses and the like.
  • The so-called processor 701 may be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor, or any conventional processor.
  • The memory 702 may be an internal storage unit of the terminal device 700, such as a hard disk or memory of the terminal device 700; it may also be an external storage device of the terminal device 700, such as a plug-in hard disk, a smart memory card or a flash memory card equipped on the terminal device 700. Further, the memory 702 may include both an internal storage unit and an external storage device of the terminal device 700.
  • The computer-readable storage medium may be an internal storage unit of the terminal device described in the foregoing embodiments, such as a hard disk or memory of the terminal device. The computer-readable storage medium may be non-volatile or volatile. It may also be an external storage device of the terminal device, for example a pluggable hard disk, a smart memory card, a secure digital card or a flash memory card equipped on the terminal device.


Abstract

The present application is applicable to the technical field of artificial intelligence and provides a composite expression recognition method, apparatus, terminal device and storage medium. The method includes: recognizing an image to be recognized with a first expression recognition model, a first target model and a second target model to obtain, for each composite expression, a first probability value, a first composite probability value and a second composite probability value respectively; acquiring the first misclassification probability of the first expression recognition model, the first composite misclassification probability of each first target model and the second composite misclassification probability of each second target model; and obtaining the target classification result according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability. With this composite expression recognition method, when the image to be recognized contains a composite expression, the predictions of multiple expression recognition models and their misclassification probabilities can be combined to accurately predict the master-slave expressions in the image.

Description

Composite expression recognition method, apparatus, terminal device and storage medium
This application claims priority to the Chinese patent application No. 202011304521.8, entitled "Composite expression recognition method, apparatus, terminal device and storage medium", filed with the China National Intellectual Property Administration on November 19, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application belongs to the technical field of artificial intelligence, and in particular relates to a composite expression recognition method, apparatus, terminal device and storage medium.
Background
As an important area of human-computer interaction, expression recognition has been developed for decades and is widely applied in many fields. However, owing to the diversity of facial expression features and the differences between individual expressions, expression recognition remains a major challenge in computer vision. Composite expression recognition further requires the primary and secondary expressions to be identified simultaneously, and the diversity of expression combinations together with the difficulty of distinguishing primary from secondary expressions makes composite expression recognition harder still.
Most existing methods treat expression recognition as a classification task on face pictures, performing feature extraction and expression classification. On single-image expression recognition, such methods achieve good results for expressions with distinctive features, such as happiness and surprise. However, the inventor has realized that composite expressions with similar features, such as sadness and disgust, are difficult to distinguish, and it is difficult to accurately identify the primary and secondary expressions within a composite expression.
Technical Problem
One of the purposes of the embodiments of the present application is to provide a composite expression recognition method, apparatus, terminal device and storage medium, aiming to solve the technical problem in the prior art that it is difficult to accurately identify the primary and secondary expressions in a composite expression.
Technical Solution
To solve the above technical problem, the technical solutions adopted in the embodiments of the present application are as follows:
In a first aspect, an embodiment of the present application provides a composite expression recognition method, including:
recognizing the composite expression in an image to be recognized using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of composite expressions;
determining a predicted composite expression according to the maximum of the first probability values, determining a corresponding first target model in a set of second expression recognition models based on the predicted composite expression, and inputting the image to be recognized into the first target model to obtain a first composite probability value predicting that the image shows a first composite expression, each first target model corresponding to two predicted composite expressions;
inputting the image to be recognized into a third expression recognition model to predict a plurality of target single expressions contained in the image;
determining a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and inputting the image to be recognized into the second target model to obtain a second composite probability value predicting that the image shows a second composite expression, the two predicted composite expressions corresponding to each second target model being obtainable by pairwise combination of the plurality of target single expressions;
acquiring a first misclassification probability corresponding to the first expression recognition model, a first composite misclassification probability corresponding to each second expression recognition model, and a second composite misclassification probability corresponding to each second expression recognition model; and
obtaining a target classification result of the image to be recognized according to the first probability value, the first misclassification probability, the first composite probability value, the first composite misclassification probability, the second composite probability value and the second composite misclassification probability.
In a second aspect, an embodiment of the present application provides a composite expression recognition apparatus, comprising a first prediction module, a first composite prediction module, a single expression prediction module, a second composite prediction module, an acquisition module and a recognition module, which respectively perform the steps of the method of the first aspect as set out above.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
In a fifth aspect, an embodiment of the present application further provides a computer program product which, when run on a terminal device, causes the terminal device to implement the steps of the method of the first aspect.
Beneficial Effects
Compared with the prior art, the embodiments of the present application offer the following advantages:
In the embodiments of the present application, the first expression recognition model predicts the first probability value of each composite expression as one classification result; based on that prediction, the first target model is determined and predicts the image to be recognized again, giving the first composite probability value as another classification result; then the third expression recognition model, which predicts only the target single expressions in the image, performs single-expression recognition, and based on its prediction the second target model predicts the image once more, giving the second composite probability value as a further classification result. Finally, the three classification results and the misclassification probability corresponding to each composite expression are combined to compute the predicted probability value of each composite expression, and the target classification result is determined among the multiple composite expressions from these predicted values. For the composite expression recognition task, the predictions of the multiple expression recognition models are thus integrated as the basis for an initial recognition of the composite facial expression, and the misclassification probability of each prediction is used as correction information to refine the result into the target classification, further improving the accuracy of composite facial expression recognition.
Brief Description of the Drawings
To explain the technical solutions in the embodiments of the present application more clearly, the drawings required for the description of the embodiments or the exemplary art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; other drawings can be derived from them by those of ordinary skill in the art without creative effort. FIG. 1 to FIG. 7 are as listed in the figure description above.
Detailed Description of the Embodiments
To make the purposes, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are used only to explain the present application and are not intended to limit it.
本申请实施例提供的复合表情识别方法可以应用于平板电脑、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)等终端设备上,本申请实施例对终端设备的具体类型不作任何限制。
请参阅图1,图1示出了本申请实施例提供的一种复合表情识别方法的实现流程图,该方法包括如下步骤:
S101、利用第一表情识别模型对待识别图像中的复合表情进行识别,得到多种复合表情分别一一对应的第一概率值。
上述待识别图像为包含人脸的人脸图像,因此,可从人脸图像中识别人脸表情。其中,人脸图像中的复合表情可以理解为,人脸表情同时包含至少两种表情,且复合表情可以认为是具有主从关系的主从表情。上述主从表情表示为人脸图像包含的多种表情中,主要表情是何种表情,以及次要表情是何种表情。可以理解的是,对于两种表情而言,主要表情与次要表情各占一种。而对于两种以上的表情而言,可以认为人脸图像中的主要表情相比于其余表情可明显区分,此时,可将其余多种表情均可认为是次要表情。为便于解释说明,本实施例以包含两种表情的待识别图像进行解释说明。
需要补充的是,人脸表情包括但不限于开心、惊讶、厌恶、恐惧、悲伤、快乐、惊讶以及自然,任一上述两种人脸表情进行组合均可认为是复合表情。然而,需要说明的是,对于主表情为开心,从表情为惊讶的复合表情,与主表情为惊讶(与上述从表情对应),从表情为开心(与上述主表情对应)的复合表情,两者属于不同的主从表情(复合表情)。
在应用中,第一表情识别模型为根据第一训练数据进行训练得到的模型。其中,第一训练数据可以认为是包含上述八种单一表情组成的复合表情的人脸图像。另外,对于开心、惊讶、厌恶、恐惧、悲伤、快乐以及惊讶这7种表情,可对应组合得到42种复合表情。此外,第一训练数据还可包括8种单一表情的人脸图像,组成包含50种复合表情的人脸图像。此时,单一表情也可以认为是主表情与从表情均一致的复合表情,对此不作限定。
在具体应用中,上述第一训练数据可以从复合表情竞赛数据集中进行获取,复合表情竞赛数据集中包含31250张表情图片,由125个个体,每个个体的50种复合表情图片构成,且每种表情包括5张图片。对于上述数据集,可将数据集划分为83个个体共20650张图片作为训练集(第一训练数据),9个个体共2250张图片作为验证集,剩余33个个体作为测试集。而后,采用残差网络模型作为基础网络,将第一训练数据输入至残差网络模型进行训练,在整个训练过程中,残差网络可对人脸区域进行特征提取,得到512维的人脸特征,将此人脸特征与136维关键点(人脸的眼睛、鼻子等关键点)坐标归一化值进行集合,作为该复合表情图片中的复合表情特征。而后,将复合表情特征送入分类层,输出分类结果,并根据实际的复合表情结果计算分类损失,根据分类损失迭代更新残差网络模型,得到第一表情识别模型。其中,计算分类损失时,可采用交叉熵作为损失函数。因交叉熵可 表示实际输出(概率)与期望输出(概率)的距离,也即可认为交叉熵的值越小,两个概率分布就越接近,基于此,可使迭代更新后的第一表情识别模型对复合表情进行识别时的准确率高。对于第一训练数据而言,可预先标记每张复合表情图片的真实标签,也即每张复合表情具体属于哪一类复合表情的概率(期望输出)。
In application, after the first expression recognition model is obtained, the to-be-recognized image is input into the first expression recognition model, yielding multiple first probability values. That is, for the 50 compound expressions, the first expression recognition model outputs 50 first probability values, each being the value with which the model predicts that the to-be-recognized image belongs to the corresponding compound expression class.
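For inference, the 50 first probability values and the index of the predicted compound expression could be obtained along these lines (continuing the sketch above; treating a softmax over the logits as the first probability values is an assumption):

```python
import torch

model.eval()
with torch.no_grad():
    logits = model(images[:1], landmarks[:1])               # one to-be-recognized image
    first_probs = torch.softmax(logits, dim=1).squeeze(0)   # 50 first probability values
predicted_compound = int(first_probs.argmax())              # class with the maximum first probability
```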
S102: determine a predicted compound expression according to the maximum of the first probability values, determine a corresponding first target model in a set of second expression recognition models based on the predicted compound expression, and input the to-be-recognized image into the first target model to obtain a first compound probability value predicting that the to-be-recognized image is a first compound expression; each first target model corresponds to two predicted compound expressions.
In application, each second expression recognition model is a model trained on second training data, which can be derived from the first training data. Specifically, considering the main and secondary expressions of each compound expression, if the main and secondary expressions of one compound expression are the reverse of those of another, a second expression recognition model can be trained on the training data corresponding to these two reversed compound expressions. In this way, a set of second expression recognition models containing multiple such models is obtained. For example, for the compound expression whose main expression is happiness and whose secondary expression is surprise, and the compound expression whose main expression is surprise and whose secondary expression is happiness, the expression images corresponding to these two compound expressions can be obtained from the compound expression competition dataset, and the specific main-secondary label of each image (happiness-surprise or surprise-happiness) determined, forming one kind of second training data; a binary classification model for the happiness and surprise compound expressions (a second expression recognition model) can then be trained. The resulting second expression recognition model is used only to predict the main-secondary relationship between happiness and surprise in the compound expression of the to-be-recognized image.
Specifically, for the 42 compound expressions obtained by combination, the above procedure groups them into 21 kinds of second training data, from which 21 second expression recognition models can be trained. Each second expression recognition model is a binary classification model used to predict the main expression and the secondary expression in an expression image. For example, compound expression images composed of main expression A and secondary expression B, together with compound expression images composed of main expression B and secondary expression A, can serve as one kind of second training data to train a second expression recognition model for AB main-secondary classification. When this second expression recognition model recognizes the to-be-recognized image, it yields the first compound probability value that the compound expression in the image has main expression A and secondary expression B, and/or the first compound probability value that it has main expression B and secondary expression A. In addition, each of the eight single expressions can also be regarded as a compound expression whose main and secondary expressions are identical, giving 29 second expression recognition models in total.
In application, given the first probability values obtained for the compound expressions, the compound expression corresponding to the largest first probability value is taken as the predicted compound expression. Each second expression recognition model is used to recognize one class of compound expression in the to-be-recognized image; the predicted compound expression therefore falls into the class recognized by some second expression recognition model, which can be taken as the first target model. The first compound expression predicted by the first target model may agree with the predicted compound expression of the first expression recognition model or may be its reverse; this is not limited here. Note that the prediction produced by the first target model for the to-be-recognized image is the first compound expression, and the value with which it predicts the to-be-recognized image to be the first compound expression is the first compound probability value.
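A hypothetical sketch of how the first target model might be looked up follows; the class table, the (main, secondary) tuple encoding, and the stub classifier are illustrative assumptions, not the data structures of this application.

```python
# Hypothetical layout: class index j -> (main, secondary) pair; only three of the
# 50 entries are shown.
CLASSES = [
    ("happiness", "surprise"),
    ("surprise", "happiness"),
    ("happiness", "happiness"),  # a single expression as an identical main/secondary pair
]

# One binary main/secondary classifier per unordered pair of single expressions
# (29 in total). A real entry would be a trained model; a stub stands in here.
second_models = {
    frozenset({"happiness", "surprise"}): lambda image: {"happiness/surprise": 0.8,
                                                         "surprise/happiness": 0.2},
}

def first_target_model(predicted_class_index):
    """Select the first target model from the predicted compound expression."""
    main, secondary = CLASSES[predicted_class_index]
    return second_models[frozenset({main, secondary})]
```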
S103: input the to-be-recognized image into a third expression recognition model to predict a plurality of target single expressions contained in the to-be-recognized image.
In application, the third expression recognition model is a model trained on third training data, which can be derived from the first training data. Specifically, if the main and secondary expressions of one compound expression are the reverse of those of another, the two compound expressions are treated together as a new compound expression category, yielding multiple new compound expression categories. For example, consider the compound expression whose main expression is happiness and whose secondary expression is surprise, and the compound expression whose main expression is surprise and whose secondary expression is happiness. These two compound expressions can together serve as third training data; when training the third expression recognition model, the main-secondary relationship within the compound expressions is ignored, and each expression image is labeled only with the multiple single expressions it contains. This yields third training data covering 29 categories that disregard the main-secondary relationship within compound expressions, i.e. the 21 types of training data in S102 above combined with the 8 types whose main and secondary expressions are identical. The expression images corresponding to each new compound expression category can then again be obtained from the compound expression competition dataset as third training data to train the third expression recognition model. There is one and only one third expression recognition model, and it is used only to predict the target single expressions contained in the to-be-recognized image.
Understandably, when performing expression recognition on the to-be-recognized image, the third expression recognition model predicts a third probability value for each compound expression composed of two single expressions. That is, the third expression recognition model outputs only the probability that the to-be-recognized image is a compound expression composed of two particular single expressions, without considering the main-secondary relationship between the two single expressions; this differs from the first probability values, which distinguish each compound expression (the specific main-secondary relationship between main and secondary expressions). The third expression recognition model thus outputs 29 third probability values, and from these the single expressions contained in the compound expression with the largest third probability value are taken as the target single expressions. For example, if the third expression recognition model assigns the largest third probability value to the happiness and surprise compound, happiness and surprise are each taken as a target single expression.
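The mapping from the third model's 29 order-free outputs to the target single expressions could look like the sketch below; the emotion names and the pair enumeration order are assumptions, while the 21 + 8 = 29 class layout follows the text:

```python
import itertools

import torch

SINGLES = ["happiness", "surprise", "disgust", "fear", "sadness", "anger", "contempt", "neutral"]
# 21 unordered pairs of the 7 non-neutral expressions + 8 single-expression classes = 29.
ORDER_FREE_CLASSES = [set(p) for p in itertools.combinations(SINGLES[:7], 2)] \
                     + [{s} for s in SINGLES]

def target_single_expressions(third_logits: torch.Tensor) -> list:
    """Return the one or two target single expressions from the 29 third probability values."""
    third_probs = torch.softmax(third_logits, dim=-1)
    return sorted(ORDER_FREE_CLASSES[int(third_probs.argmax())])
```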
S104: determine a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and input the to-be-recognized image into the second target model to obtain a second compound probability value predicting that the to-be-recognized image is a second compound expression; the two predicted compound expressions corresponding to each second target model can each be obtained by combining the plurality of target single expressions in one-to-one correspondence.
In application, S102 above has explained that each second expression recognition model is used to recognize one class of compound expression in the to-be-recognized image. Therefore, based on the compound expression formed by the target single expressions, the matching compound expression class can be identified among the classes recognized by the second expression recognition models, and the corresponding model is taken as the second target model. That is, the two predicted compound expressions that each second target model recognizes in the to-be-recognized image can be obtained by combining the multiple target single expressions in one-to-one correspondence.
For example, when the target single expressions are determined to be surprise and happiness, the second expression recognition model that performs binary main-secondary classification between happiness and surprise in the compound expression is determined to be the second target model. Recognizing the to-be-recognized image again with the second target model yields, as the second compound probability value, either the output probability that the main expression is happiness and the secondary expression is surprise, or the output probability that the main expression is surprise and the secondary expression is happiness.
S105: obtain a first misclassification probability corresponding to the first expression recognition model, obtain a first compound misclassification probability corresponding to each second expression recognition model, and obtain a second compound misclassification probability corresponding to each second expression recognition model.
In application, the first misclassification probability is the probability that the first expression recognition model mispredicts the main-secondary expressions of each compound expression. Specifically, after the first expression recognition model is trained, the probability can be determined using the training data in the test set. For example, for the 50 compound expressions with 5 expression images each, the first expression recognition model predicts the 5 expression images of each compound expression, and the number of correctly predicted images per compound expression is counted. The misclassification probability of the first expression recognition model for each compound expression is then computed from the number of correctly predicted images and the total number (5). Thus, once the first model has been trained, the misclassification probability of the first expression recognition model for each compound expression can be determined in this way. The specific formula is y = 1 - a_ij, where i is 1, 2, or 3 and j is a value from 1 to 50. When i equals 1, a_1j denotes the classification accuracy of the first expression recognition model when predicting the j-th compound expression class; when i equals 2, a_2j denotes the classification accuracy of the second expression recognition model when predicting the j-th compound expression class on the basis of the first expression recognition model's prediction; and when i equals 3, a_3j denotes the classification accuracy of the second expression recognition model when predicting the j-th compound expression class on the basis of the third expression recognition model's prediction. Here, classification accuracy = number of correctly predicted samples / total number of samples.
Understandably, the first compound misclassification probability is obtained by first predicting, with the first expression recognition model, the expressions of to-be-recognized images of a given compound expression class, and then, on the basis of the first expression recognition model's prediction, predicting again with the second model corresponding to that compound expression; from this, the first compound misclassification probability of each second expression recognition model when recognizing the corresponding compound expression is computed. Likewise, the second compound misclassification probability with which each second expression recognition model predicts the corresponding compound expression on the basis of the third expression recognition model's prediction can be obtained; this is not described in detail again. In addition, the first misclassification probability, the first compound misclassification probability, and the second compound misclassification probability can all be obtained after training of the above expression recognition models ends and stored inside the terminal device, so that the terminal device can call them at any time.
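The per-class misclassification probability y = 1 - a_ij can be estimated from held-out predictions roughly as follows (a sketch; the flat list-of-labels interface is an assumption):

```python
from collections import Counter

def misclassification_probabilities(predicted, actual):
    """For each compound expression class j, return y_j = 1 - a_j,
    where a_j = correctly predicted samples / total test samples of class j."""
    total, correct = Counter(), Counter()
    for p, a in zip(predicted, actual):
        total[a] += 1
        correct[a] += int(p == a)
    return {j: 1.0 - correct[j] / total[j] for j in total}

# e.g. with 5 test images per compound expression and 4 predicted correctly, y_j = 0.2
```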
S106: obtain a target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability.
In application, the target classification result is the final prediction for the to-be-recognized image, i.e. the finally predicted main-secondary expressions of the compound expression. Specifically, the target classification result is computed with the formula: l_j = (1 - a_1j)·l_1j + (1 - a_2j)·l_2j + (1 - a_3j)·l_3j, for j = 1, 2, ..., 50, where a_ij (i = 1, 2, 3) is explained in S105 above, and the index i in l_ij (i = 1, 2, 3) has the same meaning as the index i in a_ij. For example, when i = 1, l_1j denotes the first probability value with which the first expression recognition model predicts the to-be-recognized image to be the j-th compound expression class; the meanings of l_2j and l_3j follow accordingly and are not detailed again. Note that when the binary classification model for the corresponding compound expression (the first target model) recognizes the to-be-recognized image, its prediction is only the first compound probability value that the to-be-recognized image belongs to that main-secondary compound expression class. For example, if the first target model predicts compound expression AB with a first compound probability value of 1, the first compound probability value for compound expression BA is 0. Since the first target model does not output first compound probability values for the remaining 48 compound expression classes (AC, CA, AD, DA, ...), those first compound probability values all enter the above computation as 0. The second compound probability values l_3j are handled in the same way as the first compound probability values l_2j and are not described in detail again.
Understandably, with the above formula, the three expression recognition models yield 50 predicted probability values, one for each of the 50 compound expression classes of the to-be-recognized image. The maximum of the 50 predicted probability values can then be taken as the target probability value, and the compound expression class corresponding to the target probability value (a compound expression with a main-secondary relationship) is taken as the final target classification result of the to-be-recognized image.
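Putting the pieces together, the fused score and the final argmax can be sketched as follows; here l is assumed to be a 3 x 50 table of the three branches' probabilities (with the 48 missing entries of each binary branch already padded with 0), and a the matching 3 x 50 accuracy table:

```python
def target_classification(l, a, num_classes=50):
    """l[i][j]: probability from branch i for compound class j;
    a[i][j]: classification accuracy of branch i on class j.
    Returns the class index with the largest fused score
    l_j = sum over i of (1 - a[i][j]) * l[i][j]."""
    scores = [sum((1 - a[i][j]) * l[i][j] for i in range(3))
              for j in range(num_classes)]
    return max(range(num_classes), key=scores.__getitem__)
```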
In this embodiment, the first expression recognition model predicts a first probability value for each compound expression, which serves as one classification result; then, based on the prediction of the first expression recognition model, a first target model is determined and used to predict the to-be-recognized image again, and the resulting first compound probability value serves as another classification result; after that, a third expression recognition model, which predicts only the target single expressions in the to-be-recognized image, performs single-expression recognition, and based on its prediction a second target model is determined and used to predict the to-be-recognized image once more, the resulting second compound probability value serving as yet another classification result. Finally, the three classification results, together with the misclassification probability corresponding to each compound expression under each of the three results, are combined to compute a predicted probability value for each compound expression, and the target classification result is determined among the plurality of compound expressions according to the predicted probability values. For a compound expression recognition task, the predictions of the above multiple expression recognition models can thus be combined as the basis for a preliminary recognition of the compound facial expression, and on this basis the misclassification probability corresponding to each prediction serves as correction information to revise the predictions and obtain the target classification result, further improving the accuracy of recognizing compound facial expressions.
Referring to Fig. 2, in a specific embodiment, obtaining in S105 the first misclassification probability with which the first expression recognition model predicts each compound expression class further includes the following sub-steps S1051-S1053, detailed as follows:
S1051: obtain a plurality of training images corresponding to a plurality of compound expressions in training data, and input the plurality of training images into the first expression recognition model to obtain a prediction result for each training image.
In application, since the first expression recognition model is trained on the first training data, to ensure the accuracy of the first misclassification probability when the first expression recognition model predicts each compound expression, the training data here may exclude the first training data. That is, compound expression recognition can be performed using the multiple training images of each compound expression in the test set of the dataset in S101 above.
S1052: count, for each compound expression, the number of erroneous prediction results.
In application, an erroneous prediction result means that when the first expression recognition model predicts a training image of a given compound expression in the test set, the prediction result (the predicted compound expression) does not match the actual compound expression of the training image. Note that the number of training images per compound expression may or may not be equal; in this embodiment, to make the first misclassification probability with which the first expression recognition model predicts each compound expression fairer, the number of training images per compound expression can be equal.
S1053: compute, based on the total number of training images corresponding to each compound expression and the number of errors, the first misclassification probability of the first expression recognition model when predicting each compound expression.
In application, for computing the first misclassification probability of the first expression recognition model when predicting each compound expression, refer to the formula and explanation of the first misclassification probability in S105 above; this is not described in detail again.
Referring to Fig. 3, in a specific embodiment, obtaining in S106 the target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability further includes the following sub-steps S1061-S1063, detailed as follows:
S1061: adjust to a preset value both the first compound probability values with which the first target model predicts the to-be-recognized image to be a non-first compound expression, and the second compound probability values with which the second target model predicts the to-be-recognized image to be a non-second compound expression.
In application, it has been explained above that the first target model and the second target model are both binary classification models and can output probability values for only two compound expressions. The first target model therefore outputs no corresponding first compound probability values for the remaining 48 compound expressions (non-first compound expressions). To facilitate computing the predicted probability value of each compound expression, the first compound probability values of the remaining 48 compound expressions can all be set to 0 (the preset value). The preset value can be set by the user according to the actual situation; see the example for the first compound probability value in S106 above. Likewise, the second compound probability values with which the second target model predicts the to-be-recognized image to be a non-second compound expression are all adjusted to the preset value. On this basis, the first target model yields 50 first compound probability values (the first compound probability value corresponding to the first compound expression plus those corresponding to the non-first compound expressions), one per compound expression class; similarly, the second target model yields 50 second compound probability values (the second compound probability value corresponding to the second compound expression plus those corresponding to the non-second compound expressions), one per compound expression class.
S1062: compute a classification value corresponding to each compound expression in the to-be-recognized image according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, the second compound misclassification probability, and the preset value corresponding to the same compound expression class.
In application, for the formula and explanation for computing the classification value corresponding to each compound expression in the to-be-recognized image, refer to the formula and corresponding explanation in S106; this is not described in detail again. Understandably, the classification value corresponding to each compound expression is the value l_j in the formula of S106 above.
S1063: determine the maximum among the classification values, and take the compound expression corresponding to the maximum as the target classification result of the to-be-recognized image.
In application, the classification value is a value jointly predicted on the basis of the three kinds of expression recognition models. For the classification values corresponding to the predicted compound expressions, the maximum can be determined among them, and the compound expression corresponding to the maximum is taken as the target classification result closest to the true compound expression of the to-be-recognized image, improving the accuracy of recognizing the compound expression of the to-be-recognized image.
Referring to Fig. 4, in a specific embodiment, there are multiple to-be-recognized images, all belonging to the same compound expression category; obtaining in S106 the target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability further includes the following sub-steps S1064-S1066, detailed as follows:
S1064: obtain the target classification result of each of the multiple to-be-recognized images of the same compound expression.
In application, the multiple to-be-recognized images of the same compound expression may be consecutive video frames of a video clip, or continuously captured pictures of a person. In practice, for a video containing a person, the person's expression changes very little across consecutive video frames; the person's expression in consecutive video frames can therefore usually be regarded as the same compound expression class. The target classification result of each of the multiple to-be-recognized images of the same compound expression can then be obtained with the compound expression recognition method above.
In application, the number of consecutive frames can be set by the user according to the actual situation, for example, 5 to-be-recognized images of the same compound expression. Based on the invariance of the person's expression across consecutive video frames, together with the misclassification probabilities jointly predicted by the above multiple expression recognition models, the compound expression recognition method can further improve the accuracy of recognizing a person's compound expression when recognizing multiple images of the same compound expression category.
S1065: obtain, from the multiple target classification results, the number of identical target classification results.
S1066: determine the target classification result with the largest count as the final target classification result of the multiple to-be-recognized images.
In application, although the person's expression changes very little across the consecutive video frames, after processing by the above compound expression recognition method, each video frame may be predicted as a different compound expression. Therefore, for multiple to-be-recognized images of the same compound expression class, the counts of identical target classification results can be tallied, and the target classification result with the largest count is determined as the final target classification result (the finally predicted compound expression) of the multiple to-be-recognized images. This improves the prediction accuracy for multiple to-be-recognized images of the same compound expression category.
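A sketch of the majority vote over the per-frame results follows; tie-breaking by first occurrence (the behavior of collections.Counter) is an assumption here:

```python
from collections import Counter

def final_target_classification(frame_results):
    """frame_results: per-frame target classification results, e.g. 5 class indices.
    Returns the target classification result with the largest count."""
    return Counter(frame_results).most_common(1)[0][0]

print(final_target_classification([3, 3, 7, 3, 12]))  # -> 3
```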
Referring to Fig. 5, in a specific embodiment, before obtaining in S1064 the classification result of each of the multiple to-be-recognized images of the same compound expression, the method further includes the following steps S1064a-S1064b, detailed as follows:
S1064a: perform key-point clustering on multiple to-be-recognized images that include multiple compound expression classes, to obtain key-point feature information of each to-be-recognized image.
S1064b: take the to-be-recognized images whose key-point feature information is identical as multiple to-be-recognized images of the same compound expression, obtaining multiple to-be-recognized images for each identical compound expression class.
In application, the multiple to-be-recognized images are images containing multiple compound expression classes, and each compound expression class may correspond to multiple images. The key points can be understood as the person's eyes, nose, mouth, and so on in each to-be-recognized image, taken as key points of the face image; each to-be-recognized image can be detected by face detection techniques to determine the coordinate information and feature information of each key point in the image. Clustering can be understood as follows: after the coordinate information and feature information of the key points of each to-be-recognized image are obtained, whether two to-be-recognized images belong to the same compound expression class is determined by whether the difference between their coordinate information and feature information exceeds a preset value.
In practice, if two to-be-recognized images belong to the same compound expression class, the differences between the feature information and coordinate information of the same key points in the two images are very small. Multiple to-be-recognized images belonging to the same compound expression category can therefore be identified among the images by key-point clustering. The multiple to-be-recognized images of each identical compound expression category can then be processed through steps S1064-S1066 above; this is not explained again.
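One plausible reading of the key-point clustering step is sketched below; the greedy grouping strategy and the distance threshold are illustrative assumptions:

```python
import numpy as np

def group_by_keypoints(keypoint_vectors, threshold=0.05):
    """Group to-be-recognized images whose normalized key-point feature vectors
    differ by less than `threshold`; each group is treated as one compound expression."""
    groups = []  # each group is a list of image indices
    for idx, vec in enumerate(keypoint_vectors):
        for group in groups:
            if np.linalg.norm(vec - keypoint_vectors[group[0]]) < threshold:
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups
```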
In an embodiment, before obtaining in S1064 the classification result of each of the multiple to-be-recognized images of the same compound expression, the method further includes:
continuously obtaining multiple adjacent video frames from a preset video; and
determining the multiple video frames as multiple to-be-recognized images of the same compound expression.
In application, S1064 above has explained why the person's expression in consecutive video frames is regarded as the same compound expression class, which is not repeated here. The preset video may be a video cached in advance under a designated storage path of the terminal device, or a video uploaded by the user to the terminal device; this is not limited here. For the preset video, the terminal device can play the video and detect the initial video frame in which a face image first appears. Under normal circumstances, video appears continuous to the human eye only when the frame rate is no lower than 24 frames per second (fps), so videos typically play at 24 fps. The 4 consecutive video frames played thereafter, together with the initial video frame, can therefore be regarded as multiple to-be-recognized images of the same compound expression.
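Reading the initial frame plus the next four adjacent frames could be done along these lines (an OpenCV sketch; taking the detected initial face frame as the start index is an assumption):

```python
import cv2

def consecutive_frames(video_path, start_index, count=5):
    """Return `count` adjacent frames starting at `start_index`; at a playback
    rate of 24 fps they are treated as the same compound expression."""
    cap = cv2.VideoCapture(video_path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, start_index)
    frames = []
    for _ in range(count):
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```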
In an embodiment, after obtaining in S106 the target classification result according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability, the method further includes:
uploading the target classification result to a blockchain.
Specifically, in all embodiments of the present application, the corresponding target classification result is obtained based on the terminal device; specifically, the target classification result is obtained through processing by the terminal tool. Uploading the target classification result to the blockchain ensures its security and its fairness and transparency to users. A user device can download the target classification result from the blockchain to verify whether the target classification result has been tampered with. The blockchain referred to in this example is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association with one another using cryptographic methods; each data block contains information about a batch of network transactions, used to verify the validity of its information (anti-counterfeiting) and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
Referring to Fig. 6, Fig. 6 is a structural block diagram of a compound expression recognition apparatus provided by an embodiment of the present application. The units included in the terminal device in this embodiment are used to execute the steps of the embodiments corresponding to Figs. 1 to 5. For details, refer to Figs. 1 to 5 and the related descriptions of the embodiments corresponding to Figs. 1 to 5. For ease of description, only the parts relevant to this embodiment are shown. Referring to Fig. 6, the compound expression recognition apparatus 600 includes: a first prediction module 610, a first compound prediction module 620, a single expression prediction module 630, a second compound prediction module 640, an obtaining module 650, and a recognition module 660, where:
the first prediction module 610 is configured to recognize compound expressions in a to-be-recognized image by using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
the first compound prediction module 620 is configured to determine a predicted compound expression according to the maximum of the first probability values, determine a corresponding first target model in a set of second expression recognition models based on the predicted compound expression, and input the to-be-recognized image into the first target model to obtain a first compound probability value predicting that the to-be-recognized image is a first compound expression, wherein each first target model corresponds to two predicted compound expressions;
the single expression prediction module 630 is configured to input the to-be-recognized image into a third expression recognition model to predict a plurality of target single expressions contained in the to-be-recognized image;
the second compound prediction module 640 is configured to determine a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and input the to-be-recognized image into the second target model to obtain a second compound probability value predicting that the to-be-recognized image is a second compound expression, wherein the two predicted compound expressions corresponding to each second target model can each be obtained by combining the plurality of target single expressions in one-to-one correspondence;
the obtaining module 650 is configured to obtain a first misclassification probability corresponding to the first expression recognition model, obtain a first compound misclassification probability corresponding to each second expression recognition model, and obtain a second compound misclassification probability corresponding to each second expression recognition model; and
the recognition module 660 is configured to obtain a target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability.
In an embodiment, the obtaining module 650 is further configured to:
obtain a plurality of training images corresponding to a plurality of compound expressions in training data, and input the plurality of training images into the first expression recognition model to obtain a prediction result for each training image; count, for each compound expression, the number of erroneous prediction results; and compute, based on the total number of training images corresponding to each compound expression and the number of errors, the first misclassification probability of the first expression recognition model when predicting each compound expression.
In an embodiment, the recognition module 660 is further configured to:
adjust to a preset value both the first compound probability values with which the first target model predicts the to-be-recognized image to be a non-first compound expression and the second compound probability values with which the second target model predicts the to-be-recognized image to be a non-second compound expression; compute a classification value corresponding to each compound expression in the to-be-recognized image according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, the second compound misclassification probability, and the preset value corresponding to the same compound expression class; and determine the maximum among the classification values, and take the compound expression corresponding to the maximum as the target classification result of the to-be-recognized image.
In an embodiment, there are multiple to-be-recognized images, all belonging to the same compound expression category; the recognition module 660 is further configured to:
obtain the target classification result of each of the multiple to-be-recognized images of the same compound expression; obtain, from the multiple target classification results, the number of identical target classification results; and determine the target classification result with the largest count as the final target classification result of the multiple to-be-recognized images.
In an embodiment, the recognition module 660 is further configured to:
perform key-point clustering on multiple to-be-recognized images that include multiple compound expression classes, to obtain key-point feature information of each to-be-recognized image; and take the to-be-recognized images whose key-point feature information is identical as multiple to-be-recognized images of the same compound expression, obtaining multiple to-be-recognized images for each identical compound expression class.
In an embodiment, the recognition module 660 is further configured to:
continuously obtain multiple adjacent video frames from a preset video, and determine the multiple video frames as multiple to-be-recognized images of the same compound expression.
In an embodiment, the compound expression recognition apparatus 600 further includes:
an upload module, configured to upload the target classification result to a blockchain.
It should be understood that, in the structural block diagram of the compound expression recognition apparatus shown in Fig. 6, the units/modules are used to execute the steps of the embodiments corresponding to Figs. 1 to 5, and those steps have been explained in detail in the above embodiments; for details, refer to Figs. 1 to 5 and the related descriptions of the embodiments corresponding to Figs. 1 to 5, which are not repeated here.
Fig. 7 is a structural block diagram of a terminal device provided by another embodiment of the present application. As shown in Fig. 7, the terminal device 700 of this embodiment includes: a processor 701, a memory 702, and a computer program 703 stored in the memory 702 and runnable on the processor 701, for example, a program of the compound expression recognition method. When executing the computer program 703, the processor 701 implements the steps of the embodiments of the compound expression recognition method above, for example, S101 to S106 shown in Fig. 1. Alternatively, when executing the computer program 703, the processor 701 implements the functions of the modules of the embodiment corresponding to Fig. 6, for example, the functions of the modules 610 to 660 shown in Fig. 6. Details are as follows:
A terminal device includes a memory, a processor, and a computer program stored in the memory and runnable on the processor, where the processor, when executing the computer program, implements:
recognizing compound expressions in a to-be-recognized image by using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
determining a predicted compound expression according to the maximum of the first probability values, determining a corresponding first target model in a set of second expression recognition models based on the predicted compound expression, and inputting the to-be-recognized image into the first target model to obtain a first compound probability value predicting that the to-be-recognized image is a first compound expression, wherein each first target model corresponds to two predicted compound expressions;
inputting the to-be-recognized image into a third expression recognition model to predict a plurality of target single expressions contained in the to-be-recognized image;
determining a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and inputting the to-be-recognized image into the second target model to obtain a second compound probability value predicting that the to-be-recognized image is a second compound expression, wherein the two predicted compound expressions corresponding to each second target model can each be obtained by combining the plurality of target single expressions in one-to-one correspondence;
obtaining a first misclassification probability corresponding to the first expression recognition model, obtaining a first compound misclassification probability corresponding to each second expression recognition model, and obtaining a second compound misclassification probability corresponding to each second expression recognition model; and
obtaining a target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability.
In an embodiment, the processor, when executing the computer program, further implements:
obtaining a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the plurality of training images into the first expression recognition model to obtain a prediction result for each training image;
counting, for each compound expression, the number of erroneous prediction results; and
computing, based on the total number of training images corresponding to each compound expression and the number of errors, the first misclassification probability of the first expression recognition model when predicting each compound expression.
In an embodiment, the processor, when executing the computer program, further implements:
adjusting to a preset value both the first compound probability values with which the first target model predicts the to-be-recognized image to be a non-first compound expression and the second compound probability values with which the second target model predicts the to-be-recognized image to be a non-second compound expression;
computing a classification value corresponding to each compound expression in the to-be-recognized image according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, the second compound misclassification probability, and the preset value corresponding to the same compound expression class; and
determining the maximum among the classification values, and taking the compound expression corresponding to the maximum as the target classification result of the to-be-recognized image.
In an embodiment, there are multiple to-be-recognized images, all belonging to the same compound expression category; the processor, when executing the computer program, further implements:
obtaining the target classification result of each of the multiple to-be-recognized images of the same compound expression;
obtaining, from the multiple target classification results, the number of identical target classification results; and
determining the target classification result with the largest count as the final target classification result of the multiple to-be-recognized images.
In an embodiment, the processor, when executing the computer program, further implements:
performing key-point clustering on multiple to-be-recognized images that include multiple compound expression classes, to obtain key-point feature information of each to-be-recognized image; and
taking the to-be-recognized images whose key-point feature information is identical as multiple to-be-recognized images of the same compound expression, obtaining multiple to-be-recognized images for each identical compound expression class.
In an embodiment, the processor, when executing the computer program, further implements:
continuously obtaining multiple adjacent video frames from a preset video; and
determining the multiple video frames as multiple to-be-recognized images of the same compound expression.
In an embodiment, the processor, when executing the computer program, further implements:
uploading the target classification result to a blockchain.
A computer-readable storage medium stores a computer program which, when executed by a processor, implements:
recognizing compound expressions in a to-be-recognized image by using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
determining a predicted compound expression according to the maximum of the first probability values, determining a corresponding first target model in a set of second expression recognition models based on the predicted compound expression, and inputting the to-be-recognized image into the first target model to obtain a first compound probability value predicting that the to-be-recognized image is a first compound expression, wherein each first target model corresponds to two predicted compound expressions;
inputting the to-be-recognized image into a third expression recognition model to predict a plurality of target single expressions contained in the to-be-recognized image;
determining a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and inputting the to-be-recognized image into the second target model to obtain a second compound probability value predicting that the to-be-recognized image is a second compound expression, wherein the two predicted compound expressions corresponding to each second target model can each be obtained by combining the plurality of target single expressions in one-to-one correspondence;
obtaining a first misclassification probability corresponding to the first expression recognition model, obtaining a first compound misclassification probability corresponding to each second expression recognition model, and obtaining a second compound misclassification probability corresponding to each second expression recognition model; and
obtaining a target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability.
In an embodiment, the computer program, when executed by a processor, further implements:
obtaining a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the plurality of training images into the first expression recognition model to obtain a prediction result for each training image;
counting, for each compound expression, the number of erroneous prediction results; and
computing, based on the total number of training images corresponding to each compound expression and the number of errors, the first misclassification probability of the first expression recognition model when predicting each compound expression.
In an embodiment, the computer program, when executed by a processor, further implements:
adjusting to a preset value both the first compound probability values with which the first target model predicts the to-be-recognized image to be a non-first compound expression and the second compound probability values with which the second target model predicts the to-be-recognized image to be a non-second compound expression;
computing a classification value corresponding to each compound expression in the to-be-recognized image according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, the second compound misclassification probability, and the preset value corresponding to the same compound expression class; and
determining the maximum among the classification values, and taking the compound expression corresponding to the maximum as the target classification result of the to-be-recognized image.
In an embodiment, there are multiple to-be-recognized images, all belonging to the same compound expression category; the computer program, when executed by a processor, further implements:
obtaining the target classification result of each of the multiple to-be-recognized images of the same compound expression;
obtaining, from the multiple target classification results, the number of identical target classification results; and
determining the target classification result with the largest count as the final target classification result of the multiple to-be-recognized images.
In an embodiment, the computer program, when executed by a processor, further implements:
performing key-point clustering on multiple to-be-recognized images that include multiple compound expression classes, to obtain key-point feature information of each to-be-recognized image; and
taking the to-be-recognized images whose key-point feature information is identical as multiple to-be-recognized images of the same compound expression, obtaining multiple to-be-recognized images for each identical compound expression class.
In an embodiment, the computer program, when executed by a processor, further implements:
continuously obtaining multiple adjacent video frames from a preset video; and
determining the multiple video frames as multiple to-be-recognized images of the same compound expression.
In an embodiment, the computer program, when executed by a processor, further implements:
uploading the target classification result to a blockchain.
Exemplarily, the computer program 703 can be divided into one or more units, which are stored in the memory 702 and executed by the processor 701 to complete the present application. The one or more units may be a series of computer program instruction segments capable of completing specific functions, and the instruction segments are used to describe the execution process of the computer program 703 in the terminal device 700. For example, the computer program 703 can be divided into a first prediction module, a first compound prediction module, a single expression prediction module, a second compound prediction module, an obtaining module, and a recognition module, whose specific functions are as described above.
The terminal device may include, but is not limited to, the processor 701 and the memory 702. Those skilled in the art can understand that Fig. 7 is only an example of the terminal device 700 and does not constitute a limitation on the terminal device 700; the terminal device may include more or fewer components than shown, or combine certain components, or have different components; for example, it may also include input and output devices, network access devices, buses, and the like.
The so-called processor 701 may be a central processing unit, or another general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 702 may be an internal storage unit of the terminal device 700, such as a hard disk or memory of the terminal device 700. The memory 702 may also be an external storage device of the terminal device 700, such as a plug-in hard disk, smart media card, or flash memory card equipped on the terminal device 700. Further, the memory 702 may include both an internal storage unit and an external storage device of the terminal device 700.
The computer-readable storage medium may be an internal storage unit of the terminal device described in the foregoing embodiments, such as a hard disk or memory of the terminal device. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium may also be an external storage device of the terminal device, such as a plug-in hard disk, smart media card, secure digital card, or flash memory card equipped on the terminal device.
The above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments, or make equivalent replacements of some of the technical features therein; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and shall all be included within the protection scope of the present application.

Claims (20)

  1. A compound expression recognition method, comprising:
    recognizing compound expressions in a to-be-recognized image by using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
    determining a predicted compound expression according to the maximum of the first probability values, determining a corresponding first target model in a set of second expression recognition models based on the predicted compound expression, and inputting the to-be-recognized image into the first target model to obtain a first compound probability value predicting that the to-be-recognized image is a first compound expression, wherein each first target model corresponds to two predicted compound expressions;
    inputting the to-be-recognized image into a third expression recognition model to predict a plurality of target single expressions contained in the to-be-recognized image;
    determining a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and inputting the to-be-recognized image into the second target model to obtain a second compound probability value predicting that the to-be-recognized image is a second compound expression, wherein the two predicted compound expressions corresponding to each second target model can each be obtained by combining the plurality of target single expressions in one-to-one correspondence;
    obtaining a first misclassification probability corresponding to the first expression recognition model, obtaining a first compound misclassification probability corresponding to each second expression recognition model, and obtaining a second compound misclassification probability corresponding to each second expression recognition model; and
    obtaining a target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability.
  2. The compound expression recognition method according to claim 1, wherein obtaining the first misclassification probability corresponding to the first expression recognition model comprises:
    obtaining a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the plurality of training images into the first expression recognition model to obtain a prediction result for each training image;
    counting, for each compound expression, the number of erroneous prediction results; and
    computing, based on the total number of training images corresponding to each compound expression and the number of errors, the first misclassification probability of the first expression recognition model when predicting each compound expression.
  3. The compound expression recognition method according to claim 1, wherein obtaining the target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability comprises:
    adjusting to a preset value both the first compound probability values with which the first target model predicts the to-be-recognized image to be a non-first compound expression and the second compound probability values with which the second target model predicts the to-be-recognized image to be a non-second compound expression;
    computing a classification value corresponding to each compound expression in the to-be-recognized image according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, the second compound misclassification probability, and the preset value corresponding to the same compound expression class; and
    determining the maximum among the classification values, and taking the compound expression corresponding to the maximum as the target classification result of the to-be-recognized image.
  4. The compound expression recognition method according to claim 3, wherein there are multiple to-be-recognized images, all belonging to the same compound expression category; and
    obtaining the target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability comprises:
    obtaining the target classification result of each of the multiple to-be-recognized images of the same compound expression;
    obtaining, from the multiple target classification results, the number of identical target classification results; and
    determining the target classification result with the largest count as the final target classification result of the multiple to-be-recognized images.
  5. The compound expression recognition method according to claim 4, wherein before obtaining the classification result of each of the multiple to-be-recognized images of the same compound expression, the method further comprises:
    performing key-point clustering on multiple to-be-recognized images that include multiple compound expression classes, to obtain key-point feature information of each to-be-recognized image; and
    taking the to-be-recognized images whose key-point feature information is identical as multiple to-be-recognized images of the same compound expression, obtaining multiple to-be-recognized images for each identical compound expression class.
  6. The compound expression recognition method according to claim 4, wherein before obtaining the classification result of each of the multiple to-be-recognized images of the same compound expression, the method further comprises:
    continuously obtaining multiple adjacent video frames from a preset video; and
    determining the multiple video frames as multiple to-be-recognized images of the same compound expression.
  7. The compound expression recognition method according to any one of claims 1 to 6, wherein after obtaining the target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability, the method further comprises:
    uploading the target classification result to a blockchain.
  8. A compound expression recognition apparatus, comprising:
    a first prediction module, configured to recognize compound expressions in a to-be-recognized image by using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
    a first compound prediction module, configured to determine a predicted compound expression according to the maximum of the first probability values, determine a corresponding first target model in a set of second expression recognition models based on the predicted compound expression, and input the to-be-recognized image into the first target model to obtain a first compound probability value predicting that the to-be-recognized image is a first compound expression, wherein each first target model corresponds to two predicted compound expressions;
    a single expression prediction module, configured to input the to-be-recognized image into a third expression recognition model to predict a plurality of target single expressions contained in the to-be-recognized image;
    a second compound prediction module, configured to determine a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and input the to-be-recognized image into the second target model to obtain a second compound probability value predicting that the to-be-recognized image is a second compound expression, wherein the two predicted compound expressions corresponding to each second target model can each be obtained by combining the plurality of target single expressions in one-to-one correspondence;
    an obtaining module, configured to obtain a first misclassification probability corresponding to the first expression recognition model, obtain a first compound misclassification probability corresponding to each second expression recognition model, and obtain a second compound misclassification probability corresponding to each second expression recognition model; and
    a recognition module, configured to obtain a target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability.
  9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements:
    recognizing compound expressions in a to-be-recognized image by using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
    determining a predicted compound expression according to the maximum of the first probability values, determining a corresponding first target model in a set of second expression recognition models based on the predicted compound expression, and inputting the to-be-recognized image into the first target model to obtain a first compound probability value predicting that the to-be-recognized image is a first compound expression, wherein each first target model corresponds to two predicted compound expressions;
    inputting the to-be-recognized image into a third expression recognition model to predict a plurality of target single expressions contained in the to-be-recognized image;
    determining a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and inputting the to-be-recognized image into the second target model to obtain a second compound probability value predicting that the to-be-recognized image is a second compound expression, wherein the two predicted compound expressions corresponding to each second target model can each be obtained by combining the plurality of target single expressions in one-to-one correspondence;
    obtaining a first misclassification probability corresponding to the first expression recognition model, obtaining a first compound misclassification probability corresponding to each second expression recognition model, and obtaining a second compound misclassification probability corresponding to each second expression recognition model; and
    obtaining a target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability.
  10. The terminal device according to claim 9, wherein the processor, when executing the computer program, further implements:
    obtaining a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the plurality of training images into the first expression recognition model to obtain a prediction result for each training image;
    counting, for each compound expression, the number of erroneous prediction results; and
    computing, based on the total number of training images corresponding to each compound expression and the number of errors, the first misclassification probability of the first expression recognition model when predicting each compound expression.
  11. The terminal device according to claim 9, wherein the processor, when executing the computer program, further implements:
    adjusting to a preset value both the first compound probability values with which the first target model predicts the to-be-recognized image to be a non-first compound expression and the second compound probability values with which the second target model predicts the to-be-recognized image to be a non-second compound expression;
    computing a classification value corresponding to each compound expression in the to-be-recognized image according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, the second compound misclassification probability, and the preset value corresponding to the same compound expression class; and
    determining the maximum among the classification values, and taking the compound expression corresponding to the maximum as the target classification result of the to-be-recognized image.
  12. The terminal device according to claim 11, wherein there are multiple to-be-recognized images, all belonging to the same compound expression category; and the processor, when executing the computer program, further implements:
    obtaining the target classification result of each of the multiple to-be-recognized images of the same compound expression;
    obtaining, from the multiple target classification results, the number of identical target classification results; and
    determining the target classification result with the largest count as the final target classification result of the multiple to-be-recognized images.
  13. The terminal device according to claim 12, wherein the processor, when executing the computer program, further implements:
    performing key-point clustering on multiple to-be-recognized images that include multiple compound expression classes, to obtain key-point feature information of each to-be-recognized image; and
    taking the to-be-recognized images whose key-point feature information is identical as multiple to-be-recognized images of the same compound expression, obtaining multiple to-be-recognized images for each identical compound expression class.
  14. The terminal device according to claim 12, wherein the processor, when executing the computer program, further implements:
    continuously obtaining multiple adjacent video frames from a preset video; and
    determining the multiple video frames as multiple to-be-recognized images of the same compound expression.
  15. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements:
    recognizing compound expressions in a to-be-recognized image by using a first expression recognition model, to obtain first probability values in one-to-one correspondence with a plurality of compound expressions;
    determining a predicted compound expression according to the maximum of the first probability values, determining a corresponding first target model in a set of second expression recognition models based on the predicted compound expression, and inputting the to-be-recognized image into the first target model to obtain a first compound probability value predicting that the to-be-recognized image is a first compound expression, wherein each first target model corresponds to two predicted compound expressions;
    inputting the to-be-recognized image into a third expression recognition model to predict a plurality of target single expressions contained in the to-be-recognized image;
    determining a corresponding second target model in the set of second expression recognition models according to the plurality of target single expressions, and inputting the to-be-recognized image into the second target model to obtain a second compound probability value predicting that the to-be-recognized image is a second compound expression, wherein the two predicted compound expressions corresponding to each second target model can each be obtained by combining the plurality of target single expressions in one-to-one correspondence;
    obtaining a first misclassification probability corresponding to the first expression recognition model, obtaining a first compound misclassification probability corresponding to each second expression recognition model, and obtaining a second compound misclassification probability corresponding to each second expression recognition model; and
    obtaining a target classification result of the to-be-recognized image according to the first probability values, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, and the second compound misclassification probability.
  16. The computer-readable storage medium according to claim 15, wherein the computer program, when executed by a processor, further implements:
    obtaining a plurality of training images corresponding to a plurality of compound expressions in training data, and inputting the plurality of training images into the first expression recognition model to obtain a prediction result for each training image;
    counting, for each compound expression, the number of erroneous prediction results; and
    computing, based on the total number of training images corresponding to each compound expression and the number of errors, the first misclassification probability of the first expression recognition model when predicting each compound expression.
  17. The computer-readable storage medium according to claim 15, wherein the computer program, when executed by a processor, further implements:
    adjusting to a preset value both the first compound probability values with which the first target model predicts the to-be-recognized image to be a non-first compound expression and the second compound probability values with which the second target model predicts the to-be-recognized image to be a non-second compound expression;
    computing a classification value corresponding to each compound expression in the to-be-recognized image according to the first probability value, the first misclassification probability, the first compound probability value, the first compound misclassification probability, the second compound probability value, the second compound misclassification probability, and the preset value corresponding to the same compound expression class; and
    determining the maximum among the classification values, and taking the compound expression corresponding to the maximum as the target classification result of the to-be-recognized image.
  18. The computer-readable storage medium according to claim 17, wherein there are multiple to-be-recognized images, all belonging to the same compound expression category; and the computer program, when executed by a processor, further implements:
    obtaining the target classification result of each of the multiple to-be-recognized images of the same compound expression;
    obtaining, from the multiple target classification results, the number of identical target classification results; and
    determining the target classification result with the largest count as the final target classification result of the multiple to-be-recognized images.
  19. The computer-readable storage medium according to claim 18, wherein the computer program, when executed by a processor, further implements:
    performing key-point clustering on multiple to-be-recognized images that include multiple compound expression classes, to obtain key-point feature information of each to-be-recognized image; and
    taking the to-be-recognized images whose key-point feature information is identical as multiple to-be-recognized images of the same compound expression, obtaining multiple to-be-recognized images for each identical compound expression class.
  20. The computer-readable storage medium according to claim 18, wherein the computer program, when executed by a processor, further implements:
    continuously obtaining multiple adjacent video frames from a preset video; and
    determining the multiple video frames as multiple to-be-recognized images of the same compound expression.
PCT/CN2021/091094 2020-11-19 2021-04-29 复合表情识别方法、装置、终端设备及存储介质 WO2022105130A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011304521.8 2020-11-19
CN202011304521.8A CN112381019B (zh) 2020-11-19 2020-11-19 复合表情识别方法、装置、终端设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022105130A1 true WO2022105130A1 (zh) 2022-05-27

Family

ID=74584463

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/091094 WO2022105130A1 (zh) 2020-11-19 2021-04-29 复合表情识别方法、装置、终端设备及存储介质

Country Status (2)

Country Link
CN (1) CN112381019B (zh)
WO (1) WO2022105130A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112381019B (zh) * 2020-11-19 2021-11-09 平安科技(深圳)有限公司 复合表情识别方法、装置、终端设备及存储介质
CN113158788B (zh) * 2021-03-12 2024-03-08 中国平安人寿保险股份有限公司 人脸表情识别方法、装置、终端设备及存储介质
CN113920575A (zh) * 2021-12-15 2022-01-11 深圳佑驾创新科技有限公司 一种人脸表情识别方法、装置及存储介质


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363079A (zh) * 2019-06-05 2019-10-22 平安科技(深圳)有限公司 表情交互方法、装置、计算机装置及计算机可读存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010133661A1 (en) * 2009-05-20 2010-11-25 Tessera Technologies Ireland Limited Identifying facial expressions in acquired digital images
CN108921061A (zh) * 2018-06-20 2018-11-30 腾讯科技(深圳)有限公司 一种表情识别方法、装置和设备
CN109325422A (zh) * 2018-08-28 2019-02-12 深圳壹账通智能科技有限公司 表情识别方法、装置、终端及计算机可读存储介质
CN112381019A (zh) * 2020-11-19 2021-02-19 平安科技(深圳)有限公司 复合表情识别方法、装置、终端设备及存储介质

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUE YULI: "The Research Advance of Facial Expression Recognition in Human Computer Interaction", JOURNAL OF IMAGE AND GRAPHICS, ZHONGGUO TUXIANG TUXING XUEHUI, CN, vol. 14, no. 5, 31 May 2009 (2009-05-31), CN , pages 764 - 772, XP055931691, ISSN: 1006-8961 *

Also Published As

Publication number Publication date
CN112381019A (zh) 2021-02-19
CN112381019B (zh) 2021-11-09

Similar Documents

Publication Publication Date Title
WO2022105130A1 (zh) 复合表情识别方法、装置、终端设备及存储介质
US11032585B2 (en) Real-time synthetically generated video from still frames
TWI788529B (zh) 基於lstm模型的信用風險預測方法及裝置
CN109583332B (zh) 人脸识别方法、人脸识别***、介质及电子设备
WO2020248841A1 (zh) 图像的au检测方法、装置、电子设备及存储介质
CN111931795B (zh) 基于子空间稀疏特征融合的多模态情感识别方法及***
CN112395979B (zh) 基于图像的健康状态识别方法、装置、设备及存储介质
WO2021237907A1 (zh) 基于多分类器的风险识别方法、装置、计算机设备及存储介质
WO2021164317A1 (zh) 序列挖掘模型的训练方法、序列数据的处理方法及设备
WO2020244174A1 (zh) 人脸识别方法、装置、设备及计算机可读存储介质
CN111126347B (zh) 人眼状态识别方法、装置、终端及可读存储介质
Akbari et al. Distribution cognisant loss for cross-database facial age estimation with sensitivity analysis
WO2023179429A1 (zh) 一种视频数据的处理方法、装置、电子设备及存储介质
WO2020252903A1 (zh) Au检测方法、装置、电子设备及存储介质
Wang et al. Unleash the black magic in age: a multi-task deep neural network approach for cross-age face verification
WO2023109631A1 (zh) 数据处理方法、装置、设备、存储介质及程序产品
CN112365007A (zh) 模型参数确定方法、装置、设备及存储介质
WO2021068613A1 (zh) 面部识别方法、装置、设备及计算机可读存储介质
WO2023019927A1 (zh) 一种人脸识别方法、装置、存储介质及电子设备
WO2021147404A1 (zh) 依存关系分类方法及相关设备
CN114723652A (zh) 细胞密度确定方法、装置、电子设备及存储介质
US20230080048A1 (en) Method and apparatus for generating a contagion prevention health assessment
JP6947460B1 (ja) プログラム、情報処理装置、及び方法
CN107403199A (zh) 数据处理方法和装置
Saraswathi et al. Detection of synthesized videos using cnn

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21893286

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21893286

Country of ref document: EP

Kind code of ref document: A1