WO2022042043A1 - Training method and apparatus for a machine learning model, and electronic device - Google Patents
Training method and apparatus for a machine learning model, and electronic device
- Publication number
- WO2022042043A1 (PCT/CN2021/104517)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- machine learning
- learning model
- loss function
- image sample
- image
- Prior art date
Classifications
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06N20/00—Machine learning
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/09—Supervised learning
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning, using neural networks
- G06N3/048—Activation functions
- G06V40/178—Human faces: estimating age from face image; using age information for improving recognition
- Y02T10/40—Engine management systems
Definitions
- the present application is based on CN application number 202010878794.7, filed on August 27, 2020, and claims its priority.
- the disclosure of that CN application is hereby incorporated into the present application in its entirety.
- the present disclosure relates to the technical field of artificial intelligence, and in particular to a method for training a machine learning model, an apparatus for training a machine learning model, a method for recognizing age in a face image, an apparatus for recognizing age in a face image, an electronic device, and a non-volatile computer-readable storage medium.
- Deep machine learning is one of the most important breakthroughs in the field of artificial intelligence in the past decade. It has achieved great success in speech recognition, natural language processing, computer vision, image and video analysis, multimedia and many other fields.
- face image processing technology based on deep machine learning is a very important research direction in computer vision tasks.
- age estimation based on face images refers to the application of computer technology to model how face images change with age, so that a machine can infer a person's approximate age, or the age range to which they belong, from a face image.
- this technology has many applications, such as video surveillance, product recommendation, human-computer interaction, market analysis, user profiling, and age progression. If face-image-based age estimation is solved, various human-computer interaction systems based on age information will have great application demand in daily life.
- in the related art, the machine learning model is trained using the output results of the machine learning model itself and the pre-labeled results.
- a method for training a machine learning model, comprising: inputting an image sample into a regression machine learning model; extracting a feature map of the image sample using the regression machine learning model, and determining a recognition result of the image sample according to the feature map; inputting the feature map into a classification machine learning model; using the classification machine learning model to determine, according to the feature map, the membership probability that the image sample belongs to each classification; calculating a first loss function according to the recognition result and the labeling result of the image sample, and a second loss function according to the membership probability and the labeling result of the image sample; and training the regression machine learning model using the first loss function and the second loss function.
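The claimed training flow can be sketched as a single joint step in Python. This is a minimal illustration only: the function names and the toy feature extractor and heads below are stand-ins invented for the sketch, not components described in the patent.

```python
import math

def joint_step(image, extract, regress_head, classify_head, y_age, y_class):
    """One joint forward pass: the regression model extracts a shared feature
    map and an age estimate; the classification model turns the same feature
    map into per-class membership probabilities; two losses are computed."""
    fmap = extract(image)              # shared feature map
    age_pred = regress_head(fmap)      # recognition result (predicted age)
    probs = classify_head(fmap)        # membership probability per class
    loss1 = abs(y_age - age_pred)           # regression (MAE-style) term
    loss2 = -math.log(probs[y_class])       # classification term on the probs
    return loss1, loss2

# Toy usage: identity "extractor", mean-based age head,
# and a classifier that always outputs uniform probabilities.
l1, l2 = joint_step([20, 30],
                    extract=lambda x: x,
                    regress_head=lambda f: sum(f) / len(f),
                    classify_head=lambda f: [0.5, 0.5],
                    y_age=25, y_class=0)
```

The key design point the claim makes is that both heads consume the *same* feature map, so the classification loss can shape the features the regression head relies on.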
- using the first loss function and the second loss function to train the regression machine learning model includes: first training the regression machine learning model with the first loss function alone, and then training it with a weighted sum of the first loss function and the second loss function.
- training with the first loss function and the second loss function further includes: first training the classification machine learning model with the second loss function alone, and then training it with a weighted sum of the first loss function and the second loss function.
- calculating the second loss function according to the membership probability and the labeling result of the image sample includes: calculating the second loss function according to the proportion of the number of samples in the correct classification of the image sample to the total number of samples, the second loss function being negatively correlated with this proportion.
- using the regression machine learning model to extract the feature map of the image sample includes: using the regression machine learning model to extract the channel features of the image sample for each image channel, and combining the channel features into the feature map of the image sample.
- using the regression machine learning model to extract the channel features for each image channel includes: using the regression machine learning model to convolve the image sample separately for different image channels, so as to extract the features of each channel.
- using the classification machine learning model to determine, according to the feature map, the membership probability that the image sample belongs to each category includes: using the classification machine learning model to determine the association information between the image channels in the feature map; updating the feature map according to the association information; and determining, according to the updated feature map, the membership probability that the image sample belongs to each category.
- updating the feature map according to the association information includes: determining the weight of each channel feature according to the association information; weighting the corresponding channel feature with its weight; and updating the feature map according to the weighted channel features.
- the image sample is a face image sample
- the recognition result is the age of the face in the face image sample
- each classification is a classification of each age group.
- an apparatus for training a machine learning model, comprising at least one processor configured to perform the steps of: inputting an image sample into a regression machine learning model; extracting a feature map of the image sample using the regression machine learning model, and determining a recognition result of the image sample according to the feature map; inputting the feature map into a classification machine learning model, and using the classification machine learning model to determine, according to the feature map, the membership probability that the image sample belongs to each classification; calculating a first loss function according to the recognition result and the labeling result of the image sample, and a second loss function according to the membership probability and the labeling result of the image sample; and training the regression machine learning model using the first loss function and the second loss function.
- using the first loss function and the second loss function to train the regression machine learning model includes: first training the regression machine learning model with the first loss function alone, and then training it with a weighted sum of the first loss function and the second loss function.
- training with the first loss function and the second loss function further includes: first training the classification machine learning model with the second loss function alone, and then training it with a weighted sum of the first loss function and the second loss function.
- calculating the second loss function according to the membership probability and the labeling result of the image sample includes: calculating the second loss function according to the proportion of the number of samples in the correct classification of the image sample to the total number of samples, the second loss function being negatively correlated with this proportion.
- using the regression machine learning model to extract the feature map of the image sample includes: using the regression machine learning model to extract the channel features of the image sample for each image channel, and combining the channel features into the feature map of the image sample.
- using the regression machine learning model to extract the channel features for each image channel includes: using the regression machine learning model to convolve the image sample separately for different image channels, so as to extract the features of each channel.
- using the classification machine learning model to determine, according to the feature map, the membership probability that the image sample belongs to each category includes: using the classification machine learning model to determine the association information between the image channels in the feature map; updating the feature map according to the association information; and determining, according to the updated feature map, the membership probability that the image sample belongs to each category.
- updating the feature map according to the association information includes: determining the weight of each channel feature according to the association information; weighting the corresponding channel feature with its weight; and updating the feature map according to the weighted channel features.
- the image sample is a face image sample
- the recognition result is the age of the face in the face image sample
- each classification is a classification of each age group.
- a method for recognizing the age of a face image comprising: recognizing the age of the face in the face image using a regression machine learning model trained by the training method in any of the above embodiments.
- an apparatus for recognizing the age of a face image, comprising at least one processor configured to recognize the age of the face in the face image using a regression machine learning model trained by the training method in any of the above embodiments.
- an electronic device comprising: a memory; and a processor coupled to the memory, the processor configured to execute, based on instructions stored in the memory, the training method or the age recognition method in any of the above embodiments.
- a non-volatile computer-readable storage medium on which a computer program is stored which, when executed by a processor, implements the method for training a machine learning model or the method for recognizing the age of a face image in any of the foregoing embodiments.
- FIG. 1 shows a flowchart of some embodiments of the training method of the machine learning model of the present disclosure
- FIG. 2 shows a flowchart of some embodiments of step 110 in FIG. 1;
- FIG. 3 shows a flowchart of some embodiments of step 120 in FIG. 1;
- FIG. 4 shows a schematic diagram of some embodiments of the training method of the machine learning model of the present disclosure
- FIG. 5 shows a flowchart of some embodiments of the apparatus for training a machine learning model of the present disclosure;
- FIG. 6 illustrates a block diagram of some embodiments of electronic devices of the present disclosure
- FIG. 7 illustrates a block diagram of further embodiments of the electronic device of the present disclosure.
- the inventors of the present disclosure have found that the above-mentioned related technologies have the following problems: the training effect cannot meet the task requirements, resulting in low processing capability of the machine learning model.
- the present disclosure proposes a technical solution for training a machine learning model, which can use a classification model to assist in training a regression model, thereby improving the processing capability of the machine learning model.
- a regression machine learning model (such as one for age recognition) can be constructed using a convolutional network with fewer parameters (such as a ShuffleNet model), which improves processing speed while maintaining processing accuracy.
- a classification machine learning model with fine processing granularity (such as an attention network) is used to assist in training. This makes it possible, for example, to distinguish faces of different ages by features such as facial complexion.
- the technical solutions of the present disclosure can be realized through the following embodiments.
- FIG. 1 shows a flowchart of some embodiments of the training method of the machine learning model of the present disclosure.
- the training method includes: step 110, determining the recognition result of the image sample; step 120, determining each membership probability of the image sample; step 130, calculating the first and second loss functions; and step 140, training the regression machine learning model.
- in step 110, the image sample is input into the regression machine learning model; the feature map of the image sample is extracted using the regression machine learning model, and the recognition result of the image sample is determined according to the feature map.
- the feature map may be extracted by the embodiment in FIG. 2 .
- FIG. 2 shows a flowchart of some embodiments of step 110 in FIG. 1 .
- step 110 includes: step 1110 , extracting features of each channel; and step 1120 , combining feature maps.
- in step 1110, a regression machine learning model is used to extract the channel features of the image sample for each image channel.
- a regression machine learning model is used to convolve image samples according to different image channels to extract features of each channel.
- in step 1120, the channel features are combined into a feature map of the image sample.
- in step 120, the feature map is input into the classification machine learning model, which is used to determine, according to the feature map, the membership probability that the image sample belongs to each classification.
- membership probabilities may be determined by the embodiment in FIG. 3 .
- FIG. 3 shows a flowchart of some embodiments of step 120 in FIG. 1 .
- step 120 includes: step 1210 , determining the associated information of each image channel; step 1220 , updating the feature map; and step 1230 , determining each membership probability.
- in step 1210, the classification machine learning model is used to determine the correlation information between the image channels in the feature map.
- the correlation information between the channel features in the feature map can be extracted as the correlation information between each image channel.
- in step 1220, the feature map is updated according to the association information.
- the weight of each channel feature is determined according to the correlation information; the feature map is updated according to the weighted channel feature.
- in step 1230, the membership probability that the image sample belongs to each category is determined according to the updated feature map.
- in step 130, a first loss function is calculated according to the recognition result and the labeling result of the image sample.
- a second loss function is calculated according to the membership probability and the labeling result of the image sample.
- the first loss function may be implemented as MAE loss (Mean Absolute Error loss).
- the first loss function can be: L1 = (1/m) · Σ_{i=1..m} |y_i − ŷ_i|, where m is the number of image samples
- y_i is the labeling result of the i-th image sample (such as the real age value), and ŷ_i is the recognition result (such as the predicted age value) output by the regression machine learning model.
- MAE loss is less sensitive to outliers, which improves the robustness of the machine learning model.
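The MAE loss described above can be written directly in Python (a minimal sketch over a flat batch of ages, not the patent's network code):

```python
def mae_loss(labels, preds):
    """Mean Absolute Error over a batch: (1/m) * sum(|y_i - y_hat_i|)."""
    assert len(labels) == len(preds)
    return sum(abs(y - p) for y, p in zip(labels, preds)) / len(labels)

# Two samples, each off by 2 years -> MAE of 2.0
print(mae_loss([20, 30], [22, 28]))
```

Unlike a squared-error loss, each sample contributes linearly, so a single wildly mislabeled age cannot dominate the gradient.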
- the second loss function is calculated according to the proportion of the number of samples in the correct classification to which the image samples belong to the total number of samples.
- the second loss function is negatively related to the proportion.
- for example, if the correct classification of the current image sample is class i, the number of samples in class i is n_i, and the total number of samples over all classes is N, then the second loss function is negatively correlated with the proportion of n_i to N.
- in practice, sample datasets are often unevenly distributed across age groups; for example, very young children and adults over 65 are underrepresented. Treating each age group equally when calculating the loss function would therefore weaken the training effect.
- Focal loss can be used to solve the problem of imbalanced proportions of different types of samples.
- the second loss function can be determined as: L2 = −Σ_i class_weight_i · y_i_label · (1 − y_i′)^γ · log(y_i′)
- y_i′ is the membership probability of the current image sample for category i.
- y_i_label is the labeling result of the current image sample for category i: if the correct classification of the current image sample is class i, then y_i_label is 1; otherwise it is 0.
- γ > 0 is an adjustable hyperparameter that reduces the loss contribution of easy-to-classify samples, making the training process focus more on difficult and misclassified samples.
- class_weight_i is the proportion parameter of class i: class_weight_i = N/(n_class × n_i), where n_class is the number of all classes.
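The class-weighted focal loss above can be sketched in a few lines of Python (an illustration assuming the standard focal-loss form with γ = 2 as a default; the patent leaves γ adjustable):

```python
import math

def class_weights(counts):
    """class_weight_i = N / (n_class * n_i): rarer classes get larger weights."""
    total, n_class = sum(counts), len(counts)
    return [total / (n_class * n_i) for n_i in counts]

def focal_loss(probs, label_idx, weights, gamma=2.0):
    """Weighted focal loss for a single sample. Because the label is one-hot,
    only the correct class contributes:
    L2 = -class_weight_i * (1 - y_i')**gamma * log(y_i')."""
    p = probs[label_idx]
    return -weights[label_idx] * (1 - p) ** gamma * math.log(p)

# A balanced two-class dataset gives uniform weights of 1.0.
w = class_weights([50, 50])
loss = focal_loss([0.9, 0.1], label_idx=0, weights=w)
```

Note how both mechanisms work together: `class_weights` counters dataset imbalance between age groups, while the `(1 - p)**gamma` factor shrinks the loss of samples the model already classifies confidently.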
- in step 140, the regression machine learning model is trained using the first loss function and the second loss function.
- the regression machine learning model is trained using the first loss function, and then the regression machine learning model is trained using a weighted sum of the first loss function and the second loss function.
- the classification machine learning model is trained using the second loss function, and then the classification machine learning model is trained using a weighted sum of the first loss function and the second loss function.
- the weighted sum of the first loss function and the second loss function can be used as the comprehensive loss function L for training the regression machine learning model and the classification machine learning model: L = α·L1 + β·L2, where α and β are the respective weights.
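The two-phase schedule described in this section can be sketched as follows. The warm-up length and the weights α = 1.0, β = 0.5 are illustrative assumptions for the sketch; the patent does not specify their values.

```python
def combined_loss(l1, l2, alpha=1.0, beta=0.5):
    """Comprehensive loss L = alpha * L1 + beta * L2 (weights are
    illustrative hyperparameters, not values from the patent)."""
    return alpha * l1 + beta * l2

def schedule_loss(step, warmup_steps, l1, l2):
    """Two-phase schedule: warm up on the first (regression) loss alone,
    then switch to the weighted sum of both losses."""
    if step < warmup_steps:
        return l1
    return combined_loss(l1, l2)
```

Warming up on the regression loss first lets the shared feature extractor stabilize before the classification loss starts reshaping it.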
- the image sample may be a face image sample
- the recognition result is the age of the face in the face image sample
- each classification is a classification of each age group.
- the regression machine learning model is used to estimate the age of the face
- the classification machine learning model is used to determine the membership probability that the face belongs to each age category (eg, age group).
- the regression machine learning model trained by the training method in any of the above embodiments can be used to identify the age of the face in the face image.
- FIG. 4 shows a schematic diagram of some embodiments of the training method of the machine learning model of the present disclosure.
- the entire network model can be divided into two parts: a regression machine learning model for extracting features and age estimation; a classification machine learning model with an attention mechanism module for calculating the membership probability of each classification.
- a regression machine learning model may be constructed using the grouped convolution (Group Convolution) and channel shuffle (Channel Shuffle) modules of ShuffleNet V2.
- the grouped convolution module may group the feature maps of the input layer by image channel, and then convolve each group with a different convolution kernel.
- the grouped convolution module can be implemented as a depthwise convolution, in which the number of groups equals the number of input channels.
- this sparse channel connection reduces the computational cost of the convolution.
- the output is the convolution result of each group, that is, the feature of each channel.
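A toy 1-D version of the depthwise case makes the "channels do not mix" property concrete (a sketch on plain lists, not the 2-D convolution used in the actual network):

```python
def depthwise_conv1d(channels, kernels):
    """Depthwise (per-channel) convolution sketch: the number of groups equals
    the number of channels, so each channel is convolved with its own kernel
    and channels never mix -- the sparse connection that saves computation."""
    assert len(channels) == len(kernels)
    out = []
    for ch, k in zip(channels, kernels):
        out.append([sum(ch[i + j] * k[j] for j in range(len(k)))
                    for i in range(len(ch) - len(k) + 1)])
    return out
```

Each output row depends on exactly one input channel, which is why a shuffle step is needed afterwards to let information cross channels.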
- however, grouped convolution alone does not allow features to communicate between channels.
- the channel shuffling module can be used to "recombine" the features of each channel, so that the recombined feature map can contain the components in the features of each channel.
- the grouped convolution module taking the restructured feature map as input can continue to perform feature extraction based on information from different channels. Therefore, this information can flow between different groups, improving the processing power of the machine learning model.
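The channel shuffle itself is a fixed permutation. A minimal sketch on a flat list of per-channel features (real implementations do the same transpose on tensors):

```python
def channel_shuffle(channels, groups):
    """ShuffleNet-style channel shuffle: view the channel list as
    (groups, per_group), transpose it to (per_group, groups), and flatten,
    so every new group mixes features coming from every original group."""
    per_group, rem = divmod(len(channels), groups)
    assert rem == 0, "channel count must be divisible by the group count"
    return [channels[g * per_group + i]
            for i in range(per_group) for g in range(groups)]

# Six channels in two groups [0,1,2 | 3,4,5] interleave to [0,3,1,4,2,5].
print(channel_shuffle([0, 1, 2, 3, 4, 5], groups=2))
```

After the shuffle, the next grouped convolution sees channels drawn from every original group, which is exactly the cross-group information flow the text describes.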
- a regression machine learning model can include the Conv1_BR module.
- the Conv1_BR module can include a convolutional layer (such as 16 3×3 convolution kernels with a stride of 2 and padding of 1) and a BR (Batch Normalization + ReLU activation) layer.
- multiple grouped convolution modules and multiple channel reorganization modules can be alternately connected for extracting feature maps.
- the Conv5_BR module can be connected after multiple grouped convolution modules and multiple channel reorganization modules.
- the Conv5_BR module can include convolutional layers (such as 32 1 ⁇ 1 convolutions with stride of 1 and padding of 0) and BR layers.
- the Conv5_BR module can be followed by a Flatten layer, a fully connected layer Fc1 (such as one whose dimension is the number of age categories), a Softmax layer, and a fully connected layer Fc2 (such as one of dimension 1).
- the output of Fc2 can be an age estimate.
- in some embodiments, the CAM (Channel Attention Module) of DANet (Dual Attention Network) may be used in the classification machine learning model.
- the CAM module is used to extract the relationships (association information) between the features of each channel. For example, each channel feature can be weighted according to the association information to update it.
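The weight-then-update idea can be illustrated with a deliberately simplified channel reweighting. This toy scores each channel by its mean activation and is *not* DANet's exact CAM (which builds a full channel-by-channel attention matrix); it only demonstrates the "derive a weight per channel, then scale that channel" step:

```python
import math

def reweight_channels(feature_map):
    """Toy channel-attention reweighting: score each channel by its mean
    activation, softmax the scores into weights, and scale each channel
    feature by its weight to produce the updated feature map."""
    scores = [sum(ch) / len(ch) for ch in feature_map]
    m = max(scores)                      # stabilize the softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [[w * v for v in ch] for w, ch in zip(weights, feature_map)]
```

Channels with stronger responses receive larger weights, so their features dominate the updated map fed to the classifier.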
- a classification machine learning model can include a Conv6_BR layer connected after a CAM module.
- the Conv6_BR layer can include convolutional layers (such as 32 1 ⁇ 1 convolutions with stride of 1 and padding of 0) and BR layers.
- the Conv6_BR layer can be followed by a Flatten layer, a fully connected layer Fc_fl (such as one whose dimension equals the number of age values), and a Softmax layer.
- the final output is the membership probability that the face belongs to each age value.
- a regression machine learning model may be trained according to a first loss function; a classification machine learning model may be trained according to a second loss function; and a regression machine learning model may be trained with a comprehensive loss function.
- the classification machine learning model shares the feature map extracted by the regression machine learning model and assists in training it.
- the machine learning model can be trained by combining classification processing and regression processing, thereby improving the processing capability of the machine learning model.
- Figure 5 shows a flowchart of some embodiments of the apparatus for training a machine learning model of the present disclosure.
- the training device 5 of the machine learning model includes at least one processor 51 .
- the processor 51 is configured to perform the training method in any of the above-described embodiments.
- FIG. 6 illustrates a block diagram of some embodiments of electronic devices of the present disclosure.
- the electronic device 6 of this embodiment includes: a memory 61 and a processor 62 coupled to the memory 61; the processor 62 is configured to execute, based on instructions stored in the memory 61, the training method of the machine learning model or the age recognition method of the face image in any embodiment of the present disclosure.
- the memory 61 may include, for example, a system memory, a fixed non-volatile storage medium, and the like.
- the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), a database, and other programs.
- FIG. 7 illustrates a block diagram of further embodiments of the electronic device of the present disclosure.
- the electronic device 7 of this embodiment includes: a memory 710 and a processor 720 coupled to the memory 710. The processor 720 is configured to execute, based on instructions stored in the memory 710, the machine learning model training method or the face image age recognition method in any of the foregoing embodiments.
- Memory 710 may include, for example, system memory, fixed non-volatile storage media, and the like.
- the system memory stores, for example, an operating system, an application program, a boot loader (Boot Loader), and other programs.
- the electronic device 7 may also include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, as well as the memory 710 and the processor 720, can be connected, for example, through a bus 760.
- the input/output interface 730 provides a connection interface for input and output devices such as a display, a mouse, a keyboard, a touch screen, a microphone, and a speaker.
- Network interface 740 provides a connection interface for various networked devices.
- the storage interface 750 provides a connection interface for external storage devices such as SD cards and USB flash drives.
- embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
- the methods and systems of the present disclosure may be implemented in many ways.
- the methods and systems of the present disclosure may be implemented in software, hardware, firmware, or any combination of software, hardware, and firmware.
- the above-described order of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise.
- the present disclosure can also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing methods according to the present disclosure.
- the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Software Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Multimedia (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Data Mining & Analysis (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Evolutionary Biology (AREA)
- Biodiversity & Conservation Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (14)
- A method for training a machine learning model, comprising: inputting an image sample into a regression machine learning model, using the regression machine learning model to extract a feature map of the image sample, and determining a recognition result of the image sample according to the feature map; inputting the feature map into a classification machine learning model, and determining, according to the feature map and using the classification machine learning model, a membership probability of the image sample belonging to each classification; calculating a first loss function according to the recognition result and a labeling result of the image sample, and calculating a second loss function according to the membership probability and the labeling result of the image sample; and training the regression machine learning model using the first loss function and the second loss function.
- The training method according to claim 1, wherein training the regression machine learning model using the first loss function and the second loss function comprises: training the regression machine learning model using the first loss function, and then training the regression machine learning model using a weighted sum of the first loss function and the second loss function.
- The training method according to claim 1, wherein training the regression machine learning model using the first loss function and the second loss function comprises: training the classification machine learning model using the second loss function, and then training the classification machine learning model using a weighted sum of the first loss function and the second loss function.
- The training method according to claim 1, wherein calculating the second loss function according to the membership probability and the labeling result of the image sample comprises: calculating the second loss function according to the proportion of the number of samples in the correct classification to which the image sample belongs in the total number of samples, the second loss function being negatively correlated with the proportion.
- The training method according to claim 1, wherein using the regression machine learning model to extract the feature map of the image sample comprises: using the regression machine learning model to extract channel features of the image sample for each image channel; and combining the channel features into the feature map of the image sample.
- The training method according to claim 5, wherein using the regression machine learning model to extract the channel features of the image sample for each image channel comprises: using the regression machine learning model to convolve the image sample separately according to different image channels, so as to extract the channel features.
- The training method according to claim 1, wherein determining, according to the feature map and using the classification machine learning model, the membership probability of the image sample belonging to each classification comprises: using the classification machine learning model to determine association information between the image channels in the feature map; updating the feature map according to the association information; and determining, according to the updated feature map, the membership probability of the image sample belonging to each classification.
- The training method according to claim 7, wherein updating the feature map according to the association information comprises: determining weights of the channel features according to the association information; weighting the corresponding channel features using the weights; and updating the feature map according to the weighted channel features.
- The training method according to any one of claims 1-8, wherein the image sample is a face image sample, the recognition result is the age of the face in the face image sample, and the classifications are age-group classifications.
- A method for recognizing the age of a face image, comprising: recognizing the age of a face in a face image using a regression machine learning model trained by the training method according to any one of claims 1-9.
- An apparatus for training a machine learning model, comprising at least one processor configured to perform the following steps: inputting an image sample into a regression machine learning model, using the regression machine learning model to extract a feature map of the image sample, and determining a recognition result of the image sample according to the feature map; inputting the feature map into a classification machine learning model, and determining, according to the feature map and using the classification machine learning model, a membership probability of the image sample belonging to each classification; calculating a first loss function according to the recognition result and a labeling result of the image sample, and calculating a second loss function according to the membership probability and the labeling result of the image sample; and training the regression machine learning model using the first loss function and the second loss function.
- An apparatus for recognizing the age of a face image, comprising at least one processor configured to perform the following step: recognizing the age of a face in a face image using a regression machine learning model trained by the training method according to any one of claims 1-9.
- An electronic device, comprising: a memory; and a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the machine learning model training method according to any one of claims 1-9 or the face image age recognition method according to claim 10.
- A non-volatile computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the machine learning model training method according to any one of claims 1-9 or the face image age recognition method according to claim 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/788,608 US20230030419A1 (en) | 2020-08-27 | 2021-07-05 | Machine Learning Model Training Method and Device and Electronic Equipment |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010878794.7A CN112016450B (zh) | 2020-08-27 | 2020-08-27 | Machine learning model training method and apparatus, and electronic device |
CN202010878794.7 | 2020-08-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022042043A1 (zh) | 2022-03-03 |
Family
ID=73502724
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/104517 WO2022042043A1 (zh) | 2020-08-27 | 2021-07-05 | 机器学习模型的训练方法、装置和电子设备 |
Country Status (3)
Country | Link |
---|---|
US (1) | US20230030419A1 (zh) |
CN (1) | CN112016450B (zh) |
WO (1) | WO2022042043A1 (zh) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114714145A (zh) * | 2022-05-07 | 2022-07-08 | Jiaxing Nanhu University | Gramian angular field enhanced contrastive learning monitoring method for tool wear state |
CN114743043A (zh) * | 2022-03-15 | 2022-07-12 | Beijing Megvii Technology Co., Ltd. | Image classification method, electronic device, storage medium and program product |
CN115049851A (zh) * | 2022-08-15 | 2022-09-13 | Shenzhen Aishen Yingtong Information Technology Co., Ltd. | Target detection method, apparatus and device terminal based on YOLOv5 network |
CN116564556A (zh) * | 2023-07-12 | 2023-08-08 | Peking University | Method, apparatus, device and storage medium for predicting adverse drug reactions |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112016450B (zh) * | 2020-08-27 | 2023-09-05 | BOE Technology Group Co., Ltd. | Machine learning model training method and apparatus, and electronic device |
CN115482422B (zh) * | 2022-09-20 | 2023-10-17 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Deep learning model training method, image processing method and apparatus |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110084216A (zh) * | 2019-05-06 | 2019-08-02 | Suzhou Keda Technology Co., Ltd. | Face recognition model training and face recognition method, ***, device, and medium |
CN110197099A (zh) * | 2018-02-26 | 2019-09-03 | Tencent Technology (Shenzhen) Co., Ltd. | Cross-age face recognition and model training method and apparatus |
CN110287942A (zh) * | 2019-07-03 | 2019-09-27 | Chengdu Kuangshi Jinzhi Technology Co., Ltd. | Age estimation model training method, age estimation method, and corresponding apparatus |
US20200012884A1 (en) * | 2018-07-03 | 2020-01-09 | General Electric Company | Classification based on annotation information |
CN111259967A (zh) * | 2020-01-17 | 2020-06-09 | Beijing SenseTime Technology Development Co., Ltd. | Image classification and neural network training method, apparatus, device and storage medium |
CN112016450A (zh) * | 2020-08-27 | 2020-12-01 | BOE Technology Group Co., Ltd. | Machine learning model training method and apparatus, and electronic device |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111061889B (zh) * | 2018-10-16 | 2024-03-29 | BOE Art Cloud (Hangzhou) Technology Co., Ltd. | Automatic multi-label image recognition method and apparatus |
CN111461155A (zh) * | 2019-01-18 | 2020-07-28 | Fujitsu Limited | Apparatus and method for training a classification model |
CN109871909B (zh) * | 2019-04-16 | 2021-10-01 | BOE Technology Group Co., Ltd. | Image recognition method and apparatus |
CN110033332A (zh) * | 2019-04-23 | 2019-07-19 | Hangzhou Zhiqu Intelligent Information Technology Co., Ltd. | Face recognition method, ***, electronic device, and storage medium |
CN111368672A (zh) * | 2020-02-26 | 2020-07-03 | Suzhou Chaoyun Life Intelligence Industry Research Institute Co., Ltd. | Method and apparatus for constructing a facial recognition model for genetic diseases |
- 2020
- 2020-08-27 CN CN202010878794.7A patent CN112016450B (zh), Active
- 2021
- 2021-07-05 WO PCT/CN2021/104517 patent WO2022042043A1 (zh), Application Filing
- 2021-07-05 US US17/788,608 patent US20230030419A1 (en), Pending
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197099A (zh) * | 2018-02-26 | 2019-09-03 | Tencent Technology (Shenzhen) Co., Ltd. | Cross-age face recognition and model training method and apparatus |
US20200012884A1 (en) * | 2018-07-03 | 2020-01-09 | General Electric Company | Classification based on annotation information |
CN110084216A (zh) * | 2019-05-06 | 2019-08-02 | Suzhou Keda Technology Co., Ltd. | Face recognition model training and face recognition method, ***, device, and medium |
CN110287942A (zh) * | 2019-07-03 | 2019-09-27 | Chengdu Kuangshi Jinzhi Technology Co., Ltd. | Age estimation model training method, age estimation method, and corresponding apparatus |
CN111259967A (zh) * | 2020-01-17 | 2020-06-09 | Beijing SenseTime Technology Development Co., Ltd. | Image classification and neural network training method, apparatus, device and storage medium |
CN112016450A (zh) * | 2020-08-27 | 2020-12-01 | BOE Technology Group Co., Ltd. | Machine learning model training method and apparatus, and electronic device |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114743043A (zh) * | 2022-03-15 | 2022-07-12 | Beijing Megvii Technology Co., Ltd. | Image classification method, electronic device, storage medium and program product |
CN114743043B (zh) * | 2022-03-15 | 2024-04-26 | Beijing Megvii Technology Co., Ltd. | Image classification method, electronic device, storage medium and program product |
CN114714145A (zh) * | 2022-05-07 | 2022-07-08 | Jiaxing Nanhu University | Gramian angular field enhanced contrastive learning monitoring method for tool wear state |
CN114714145B (zh) * | 2022-05-07 | 2023-05-12 | Jiaxing Nanhu University | Gramian angular field enhanced contrastive learning monitoring method for tool wear state |
CN115049851A (zh) * | 2022-08-15 | 2022-09-13 | Shenzhen Aishen Yingtong Information Technology Co., Ltd. | Target detection method, apparatus and device terminal based on YOLOv5 network |
CN116564556A (zh) * | 2023-07-12 | 2023-08-08 | Peking University | Method, apparatus, device and storage medium for predicting adverse drug reactions |
CN116564556B (zh) * | 2023-07-12 | 2023-11-10 | Peking University | Method, apparatus, device and storage medium for predicting adverse drug reactions |
Also Published As
Publication number | Publication date |
---|---|
CN112016450B (zh) | 2023-09-05 |
CN112016450A (zh) | 2020-12-01 |
US20230030419A1 (en) | 2023-02-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2022042043A1 (zh) | Machine learning model training method and apparatus, and electronic device | |
Wen et al. | Generalized incomplete multiview clustering with flexible locality structure diffusion | |
Zhao et al. | Robust lightweight facial expression recognition network with label distribution training | |
US11537884B2 (en) | Machine learning model training method and device, and expression image classification method and device | |
CN107766787B (zh) | Face attribute recognition method, apparatus, terminal and storage medium | |
Deng et al. | Image aesthetic assessment: An experimental survey | |
US11222196B2 (en) | Simultaneous recognition of facial attributes and identity in organizing photo albums | |
Guo et al. | Face recognition based on convolutional neural network and support vector machine | |
Ali et al. | Boosted NNE collections for multicultural facial expression recognition | |
Do et al. | Deep neural network-based fusion model for emotion recognition using visual data | |
CN109063719B (zh) | Image classification method combining structural similarity and class information | |
CN112395979B (zh) | Image-based health state recognition method, apparatus, device and storage medium | |
Salunke et al. | A new approach for automatic face emotion recognition and classification based on deep networks | |
Ren et al. | Semantic facial descriptor extraction via axiomatic fuzzy set | |
Yi et al. | Multi-modal learning for affective content analysis in movies | |
Yang et al. | Robust discriminant feature selection via joint L2, 1-norm distance minimization and maximization | |
Meng et al. | Few-shot image classification algorithm based on attention mechanism and weight fusion | |
Das et al. | Determining attention mechanism for visual sentiment analysis of an image using svm classifier in deep learning based architecture | |
Chauhan et al. | Analysis of Intelligent movie recommender system from facial expression | |
Lu et al. | Domain-aware se network for sketch-based image retrieval with multiplicative euclidean margin softmax | |
Deeb et al. | Human facial emotion recognition using improved black hole based extreme learning machine | |
CN112200260B (zh) | Person attribute recognition method based on a dropout loss function | |
Dong et al. | A supervised dictionary learning and discriminative weighting model for action recognition | |
Yap et al. | Neural information processing | |
Zhao et al. | Multi-view dimensionality reduction via subspace structure agreement |
Legal Events
| Date | Code | Title | Description |
| --- | --- | --- | --- |
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21859883; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21859883; Country of ref document: EP; Kind code of ref document: A1 |
| | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 25-09.2023) |
| | 122 | Ep: pct application non-entry in european phase | Ref document number: 21859883; Country of ref document: EP; Kind code of ref document: A1 |