CN114495229A - Image recognition processing method and device, equipment, medium and product - Google Patents


Info

Publication number
CN114495229A
Authority
CN
China
Prior art keywords
image
image recognition
sample face
face image
classification loss
Prior art date
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202210097300.0A
Other languages
Chinese (zh)
Inventor
黄泽斌
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202210097300.0A priority Critical patent/CN114495229A/en
Publication of CN114495229A publication Critical patent/CN114495229A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image recognition processing method, apparatus, device, medium, and product, relating to the field of artificial intelligence, in particular deep learning and computer vision, and applicable to scenarios such as face recognition. The implementation comprises the following steps: using a sample face image as input data of an image recognition model to obtain an image recognition result that is associated with the sample face image and based on at least one image region; calculating a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label that is associated with the sample face image and based on the at least one image region; and adjusting model parameters of the image recognition model based on the classification loss value to obtain an adjusted image recognition model.

Description

Image recognition processing method and device, equipment, medium and product
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to deep learning and computer vision, and can be applied to scenarios such as face recognition.
Background
During model training, a classification loss value associated with a deep learning model can be determined from the model's recognition result for sample data and preset label data. The model parameters of the deep learning model are then adjusted by back-propagating the classification loss value, yielding an optimized deep learning model. In some scenarios, however, optimization of the deep learning model is inefficient and the resulting model lacks robustness.
Disclosure of Invention
The present disclosure provides a method, apparatus, device, medium, and product for processing image recognition.
According to an aspect of the present disclosure, there is provided an image recognition processing method, including: using a sample face image as input data of an image recognition model to obtain an image recognition result that is associated with the sample face image and based on at least one image region; calculating a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label that is associated with the sample face image and based on the at least one image region; and adjusting model parameters of the image recognition model based on the classification loss value to obtain an adjusted image recognition model.
According to another aspect of the present disclosure, there is provided an image recognition processing apparatus, including: a first processing module configured to use a sample face image as input data of an image recognition model to obtain an image recognition result that is associated with the sample face image and based on at least one image region; a second processing module configured to calculate a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label associated with the sample face image and based on the at least one image region; and a third processing module configured to adjust model parameters of the image recognition model based on the classification loss value to obtain an adjusted image recognition model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor and a memory communicatively coupled to the at least one processor. Wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the image recognition processing method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the above-described image recognition processing method.
According to another aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the processing method of image recognition described above.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 schematically shows a system architecture of a processing method and apparatus for image recognition according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a processing method of image recognition according to an embodiment of the present disclosure;
fig. 3 schematically shows a schematic diagram of a processing method of image recognition according to another embodiment of the present disclosure;
FIG. 4 schematically shows a diagram of image recognition results according to an embodiment of the present disclosure;
fig. 5 schematically shows a block diagram of a processing apparatus of image recognition according to an embodiment of the present disclosure;
fig. 6 schematically shows a block diagram of an electronic device for performing a process of image recognition according to an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, such a construction is in general intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include, but not be limited to, systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together).
An embodiment of the present disclosure provides an image recognition processing method, which includes: using a sample face image as input data of an image recognition model to obtain an image recognition result that is associated with the sample face image and based on at least one image region; calculating a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label that is associated with the sample face image and based on the at least one image region; and adjusting model parameters of the image recognition model based on the classification loss value to obtain an adjusted image recognition model.
Fig. 1 schematically shows a system architecture of a processing method and apparatus for image recognition according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which the embodiments of the present disclosure may be applied to help those skilled in the art understand the technical content of the present disclosure, and does not mean that the embodiments of the present disclosure may not be applied to other devices, systems, environments or scenarios.
The system architecture 100 according to this embodiment may include an image database 101, a network 102, and a server 103. Network 102 is the medium used to provide a communication link between image database 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The server 103 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud computing, network services, middleware services, and the like.
The image database 101 interacts with a server 103 via a network 102 to receive or transmit face image data and the like. The image database 101 may be used to receive, store, or transmit facial image data.
The server 103 may be a server providing various services, such as a background processing server (for example only) that recognizes or processes face image data from the image database 101. The background processing server can identify and process the received data such as images, videos, audios and the like.
For example, the server 103 may obtain a sample face image from the image database 101, and the server 103 is configured to use the sample face image as input data of an image recognition model to obtain an image recognition result based on at least one image area associated with the sample face image. The server 103 is further configured to calculate a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label associated with the sample face image and based on at least one image region, and adjust a model parameter of the image recognition model based on the classification loss value to obtain an adjusted image recognition model.
It should be noted that the processing method for image recognition provided by the embodiment of the present disclosure may be executed by the server 103. Accordingly, the processing device for image recognition provided by the embodiment of the present disclosure may be disposed in the server 103. The processing method for image recognition provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 103 and is capable of communicating with the image database 101 and/or the server 103. Accordingly, the processing device for image recognition provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 103 and capable of communicating with the image database 101 and/or the server 103.
It should be understood that the number of image databases, networks, and servers in FIG. 1 is merely illustrative. There may be any number of each, as required by the implementation.
The embodiment of the present disclosure provides a processing method for image recognition, and the processing method for image recognition according to an exemplary embodiment of the present disclosure is described below with reference to fig. 2 to 4 in conjunction with the system architecture of fig. 1. The processing method of image recognition of the embodiment of the present disclosure may be executed by the server 103 shown in fig. 1, for example.
Fig. 2 schematically shows a flowchart of a processing method of image recognition according to an embodiment of the present disclosure.
As shown in fig. 2, the processing method 200 of image recognition of the embodiment of the present disclosure may include, for example, operations S210 to S230.
In operation S210, the sample face image is used as input data of the image recognition model, and an image recognition result based on at least one image region associated with the sample face image is obtained.
In operation S220, a classification loss value associated with the image recognition model is calculated according to the image recognition result and a preset occlusion label based on at least one image region associated with the sample face image.
In operation S230, model parameters of the image recognition model are adjusted based on the classification loss value, resulting in an adjusted image recognition model.
An exemplary flow of each operation of the processing method of image recognition of the present embodiment is illustrated below.
Illustratively, the sample face image may be obtained in various public and legally compliant ways, for example from a public image database, or through an image acquisition terminal after authorization has been obtained from the associated user. The sample face image is not image data of a specific user and does not reflect the personal information of any specific user.
The execution subject of the image recognition processing method may use the sample face image as input data of the image recognition model to obtain an image recognition result that is associated with the sample face image and based on at least one image region. The sample face image may contain an occluded facial region, caused, for example, by a worn item such as sunglasses or a mask, or by a stain or an external occluding object.
Before the sample face image is recognized, it may be aligned to obtain an aligned sample face image. During alignment, facial feature recognition may be performed on the sample face image to obtain at least one facial feature point, and the recognized facial feature points are aligned with the facial feature points of a preset face template by affine transformation, yielding the aligned sample face image. The facial feature points may be feature points associated with the facial organ regions, for example the left pupil, right pupil, nose tip, left mouth corner, and right mouth corner.
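The affine alignment step described above can be sketched as a least-squares fit of a 2x3 affine matrix mapping the detected feature points onto the template points. This is an illustrative sketch under that formulation, not the patent's actual implementation; the function names `estimate_affine` and `align_points` are assumptions.

```python
import numpy as np

def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine transform mapping src_pts onto dst_pts.

    src_pts: detected facial feature points, shape (N, 2), N >= 3.
    dst_pts: corresponding template feature points, shape (N, 2).
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Homogeneous source coordinates [x, y, 1]
    A = np.hstack([src, np.ones((src.shape[0], 1))])
    # Solve A @ X = dst in the least-squares sense; M = X.T is 2x3
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return X.T

def align_points(pts, M):
    """Apply the 2x3 affine matrix M to (N, 2) points."""
    pts = np.asarray(pts, dtype=float)
    return np.hstack([pts, np.ones((pts.shape[0], 1))]) @ M.T
```

With five landmarks (two pupils, nose tip, two mouth corners) the system is overdetermined, so the least-squares solution also smooths out detection noise.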
The sample face image is recognized by the image recognition model, yielding an image recognition result that is associated with the sample face image and based on at least one image region. The image recognition result may indicate the facial occlusion condition of the sample face image, for example whether a facial occlusion exists and the position of the occluded region within the image.
The image recognition model may be an initial recognition model whose parameters are randomly initialized, or a preset recognition model obtained by iteratively training such an initial model. The image recognition model may recognize the sample face image to obtain an occlusion recognition result associated with it, from which recognition-result labels based on the at least one image region are derived to form the image recognition result. Alternatively, the image recognition model may recognize the at least one image region of the sample face image directly, producing an image recognition result associated with each image region.
A classification loss value associated with the image recognition model is calculated from the image recognition result and a preset occlusion label that is associated with the sample face image and based on the at least one image region. The occlusion label indicates the facial occlusion condition of the corresponding image region and may include, for example, a first label indicating that a facial occlusion exists in the region and a second label indicating that none exists. The first label may be represented by "1" and the second label by "0".
To determine the preset occlusion label for the at least one image region of the sample image, occlusion recognition may be performed on the sample face image by a preset standard facial-occlusion recognition method to obtain a standard recognition result, and the at least one image region is then labeled according to that result to yield the preset occlusion label. The standard recognition method may be, for example, manual recognition, in which case the preset occlusion label is a ground-truth value obtained by manual annotation.
Illustratively, an initial label image of the same size as the sample face image may be constructed as an all-zero single-channel image associated with the at least one image region. According to the standard recognition result for the sample face image, the occlusion label of each image region with a facial occlusion is set to 1 and that of each region without one is set to 0, producing a label image that contains the preset occlusion label associated with each of the at least one image region of the sample face image.
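The label-image construction just described can be sketched in a few lines of plain Python. The function name `build_label_image` and the (top, left, bottom, right) box format are assumptions made for illustration.

```python
def build_label_image(height, width, occluded_regions):
    """Build a single-channel 0/1 label image (list of pixel rows) of
    the same size as the sample face image.

    occluded_regions: iterable of (top, left, bottom, right) boxes judged
    occluded by the standard (e.g. manual) recognition result.
    """
    # All-zero initial label image
    label = [[0] * width for _ in range(height)]
    for top, left, bottom, right in occluded_regions:
        for r in range(top, bottom):
            for c in range(left, right):
                label[r][c] = 1  # occluded region -> label 1
    return label
```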
The classification loss value associated with the image recognition model is calculated from the image recognition result and the preset occlusion label that is associated with the sample face image and based on the at least one image region; it indicates the degree of difference between the image recognition result and the preset occlusion label. For example, a classification loss may be calculated for each image region from the image recognition result and the corresponding preset occlusion label, and a comprehensive classification loss evaluation value may then be computed from the per-region losses as the classification loss value associated with the image recognition model.
The classification loss value associated with the image recognition model may be calculated using a loss function, for example the L1 (mean absolute error) regression loss. Illustratively, the classification loss value L may be calculated as:
L = \frac{1}{n} \sum_{i=1}^{n} \left| f(x_i) - y_i \right|
where n denotes the number of image regions in the sample face image, y_i denotes the preset occlusion label of the i-th image region, f(x_i) denotes the occlusion recognition result for the i-th image region, and |f(x_i) - y_i| is the classification loss associated with the i-th image region. L is the comprehensive classification loss evaluation value over the n image regions, that is, the classification loss value associated with the image recognition model. The preset occlusion label and the occlusion recognition result each take the value 0 or 1, where 0 indicates that no facial occlusion exists in the corresponding image region and 1 indicates that a facial occlusion exists.
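The L1 formula above translates directly into code. A minimal sketch follows; the function name is an assumption.

```python
def l1_classification_loss(predictions, labels):
    """Mean absolute error between per-region occlusion recognition
    results f(x_i) and preset occlusion labels y_i (each 0 or 1)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have equal length")
    n = len(labels)
    return sum(abs(p - y) for p, y in zip(predictions, labels)) / n
```

For example, if exactly one of four regions is misclassified, the loss is 1/4.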
Model parameters of the image recognition model are adjusted based on the classification loss value to obtain the adjusted image recognition model. For example, the model parameters may be adjusted iteratively until the classification loss value converges, the number of training iterations reaches a preset maximum, or the classification loss value falls below a preset threshold.
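The three stopping criteria just listed (loss convergence, a maximum iteration count, a loss threshold) can be sketched as a generic training loop. Everything here, including the toy `step` function standing in for one back-propagation update, is illustrative and not part of the patent.

```python
def train(model_step, max_iters=1000, loss_threshold=1e-3, tol=1e-6):
    """Iterate parameter updates until the classification loss falls
    below a preset threshold, stops changing (convergence), or the
    preset maximum number of iterations is reached.

    model_step: callable performing one parameter update and returning
    the current classification loss value.
    """
    loss = float("inf")
    prev_loss = float("inf")
    for _ in range(max_iters):
        loss = model_step()
        if loss < loss_threshold:        # below preset threshold
            break
        if abs(prev_loss - loss) < tol:  # loss has converged
            break
        prev_loss = loss
    return loss

# Toy stand-in for one back-propagation step: each call halves the loss.
state = {"loss": 1.0}
def step():
    state["loss"] *= 0.5
    return state["loss"]
```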
Through the embodiments of the present disclosure, an image recognition result that is associated with the sample face image and based on at least one image region is determined, and a classification loss value associated with the image recognition model is calculated from that result and the preset occlusion label based on the at least one image region. The model parameters of the image recognition model are then adjusted through back-propagation of the classification loss value, yielding the adjusted image recognition model.
Determining the classification loss value from the per-region image recognition result and the preset occlusion label, and tuning the image recognition model backward based on that loss value, improves both the efficiency and the effect of training. Refining the occlusion recognition result and the preset occlusion label during training improves the robustness of the image recognition model and effectively safeguards the accuracy of facial occlusion recognition.
Fig. 3 schematically shows a schematic diagram of a processing method of image recognition according to another embodiment of the present disclosure.
As shown in fig. 3, operation S210 may include, for example, operations S310 to S330.
In operation S310, feature extraction is performed on the sample face image to obtain sample image features.
In operation S320, an occlusion recognition result for the sample face image is output through the image recognition model based on the sample image feature.
In operation S330, recognition-result labeling based on at least one image region is performed according to the occlusion recognition result to obtain the image recognition result.
An exemplary flow of each operation of the processing method of image recognition of the present embodiment is illustrated below.
Illustratively, feature extraction is performed on the sample face image by the image recognition model to obtain sample image features, and an occlusion recognition result for the sample face image is output based on those features. The occlusion recognition result may indicate whether a facial occlusion exists in the sample face image and the position of the occluded facial region. Recognition-result labels based on the at least one image region are then derived from the occlusion recognition result to form the image recognition result. The occlusion recognition result associated with any image region may be, for example, 0 or 1, where 0 indicates that no facial occlusion exists in the corresponding image region and 1 indicates that a facial occlusion exists.
By way of example, at least one facial feature region in the sample face image may be identified. According to the occlusion recognition result for the sample face image, recognition-result labeling based on the at least one facial feature region is performed to obtain the image recognition result, which may include a recognized occlusion label associated with each of the at least one facial feature region.
The image recognition result for the sample face image is obtained from the occlusion recognition results associated with the at least one facial feature region. Because it reflects the occlusion condition of each facial feature region in the sample face image, training the image recognition model on this result improves the robustness of the model and the accuracy of facial occlusion recognition.
To identify the at least one facial feature region in the sample face image, facial feature recognition may first be performed on the sample face image to obtain at least one facial feature point, and the at least one facial feature region is then identified from the facial feature points.
The facial feature points are feature points associated with the facial feature regions; a facial feature recognition algorithm, for example the ERT, SDM, MTCNN, or AFLW algorithm, may be applied to the sample face image to obtain at least one facial feature point. The at least one facial feature region, for example the eye, mouth, nose, forehead, chin, or cheek region, is then determined from the recognized facial feature points.
A classification loss value associated with each facial feature region is determined from the image recognition result and the preset occlusion label that is associated with the sample face image and based on the at least one facial feature region. A comprehensive classification loss evaluation value for the at least one facial feature region, for example the arithmetic mean or a weighted mean of the per-region losses, is then determined as the classification loss value associated with the image recognition model; this embodiment does not limit the choice of aggregation.
In another example, the sample face image may be partitioned to obtain at least one image segmentation region. For example, based on preset segmentation parameters, the sample face image may be partitioned into an m x n grid of image segmentation regions. According to the occlusion recognition result for the sample face image, recognition-result labeling based on the at least one image segmentation region is performed to obtain the image recognition result, which includes a recognized occlusion label associated with each of the at least one image segmentation region.
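Grid-shaped partitioning into m x n regions can be sketched as below. The function name `grid_partition` and the list-of-rows image format are assumptions for illustration.

```python
def grid_partition(image, m, n):
    """Partition an H x W image (a list of pixel rows) into an m x n
    grid of regions, returned in row-major order.

    Boundaries are distributed as evenly as possible when H or W is not
    divisible by m or n.
    """
    h, w = len(image), len(image[0])
    row_cuts = [round(i * h / m) for i in range(m + 1)]
    col_cuts = [round(j * w / n) for j in range(n + 1)]
    regions = []
    for i in range(m):
        for j in range(n):
            regions.append([row[col_cuts[j]:col_cuts[j + 1]]
                            for row in image[row_cuts[i]:row_cuts[i + 1]]])
    return regions
```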
A classification loss value associated with each image segmentation region is determined from the image recognition result and the preset occlusion label that is associated with the sample face image and based on the at least one image segmentation region. A comprehensive classification loss evaluation value for the at least one image segmentation region, for example the arithmetic mean or a weighted mean of the per-region losses, is then determined as the classification loss value associated with the image recognition model; this embodiment does not limit the choice of aggregation.
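The comprehensive evaluation value (arithmetic or weighted mean of the per-region losses) can be sketched as follows; the function name is an assumption.

```python
def comprehensive_loss(region_losses, weights=None):
    """Aggregate per-region classification losses into one evaluation
    value: an arithmetic mean, or a weighted mean if weights are given."""
    if weights is None:
        return sum(region_losses) / len(region_losses)
    if len(weights) != len(region_losses):
        raise ValueError("weights and losses must have equal length")
    return sum(w * l for w, l in zip(weights, region_losses)) / sum(weights)
```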
The image recognition result for the sample face image is obtained from the occlusion recognition results associated with the at least one image segmentation region. Because it reflects the facial occlusion condition of each segmentation region in the sample face image, training the image recognition model on this result improves the training efficiency and the occlusion recognition performance of the model.
According to another example mode, according to the occlusion recognition result aiming at the sample face image, the recognition result labeling based on at least one pixel in the sample face image is carried out, and an image recognition result is obtained, wherein the image recognition result comprises a recognition occlusion label associated with each pixel in the at least one pixel.
And determining a classification loss value associated with each pixel according to the image recognition result and a preset shielding label which is associated with the sample face image and is based on at least one pixel. And determining a classification loss comprehensive evaluation value corresponding to at least one pixel according to the classification loss value associated with each pixel as the classification loss value associated with the image recognition model. The classification loss comprehensive evaluation value may be, for example, an arithmetic mean value, a weighted mean value, or the like of the classification loss values associated with at least one pixel, and this embodiment does not limit this.
An image recognition result for the sample face image is obtained according to the occlusion recognition result associated with the at least one pixel. The image recognition result reflects the occlusion condition of each pixel in the sample face image, and training the image recognition model on this basis effectively improves the robustness of the model and enables high-performance recognition of facial occlusion.
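As a concrete instance of the per-pixel loss described above, binary cross-entropy is one common per-pixel classification loss. The patent does not fix a particular loss function, so the helper below is only a sketch under that assumption:

```python
import numpy as np

def pixel_occlusion_loss(pred_probs, occlusion_labels, eps=1e-7):
    """Mean per-pixel binary cross-entropy between predicted occlusion
    probabilities and preset 0/1 occlusion labels; the arithmetic mean
    acts as the classification loss comprehensive evaluation value."""
    p = np.clip(np.asarray(pred_probs, dtype=float), eps, 1.0 - eps)
    y = np.asarray(occlusion_labels, dtype=float)
    per_pixel = -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
    return float(per_pixel.mean())
```

A confident, correct prediction yields a loss near zero, while an uninformative prediction of 0.5 for every pixel yields log 2 ≈ 0.693.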
In another example, feature extraction is performed on at least one pixel in the sample face image to obtain sample image features. Based on the sample image features, an occlusion recognition result for the at least one pixel in the sample face image is output through the image recognition model. Recognition-result labeling is then performed on the at least one pixel according to the occlusion recognition result to obtain the image recognition result, which includes a recognized occlusion label based on the at least one pixel in the sample face image.
In another example, the sample face image is used as input data of the image recognition model, and feature extraction is performed on at least one facial feature region in the sample face image by the image recognition model to obtain sample image features. Based on the sample image features, an occlusion recognition result for the at least one facial feature region is output, and the occlusion recognition results for the at least one facial feature region form the image recognition result associated with the sample face image.
A classification loss value associated with each facial feature region is determined according to the occlusion recognition result associated with that region and the preset occlusion label. A classification loss comprehensive evaluation value corresponding to the at least one facial feature region is then calculated from the per-region classification loss values and used as the classification loss value associated with the image recognition model.
In another example, the sample face image is used as input data of the image recognition model, and feature extraction is performed on at least one pixel in the sample face image by the image recognition model to obtain sample image features. Based on the sample image features, an occlusion recognition result for the at least one pixel is output, and the occlusion recognition results for the at least one pixel form the image recognition result associated with the sample face image.
A classification loss value associated with each pixel is determined according to the occlusion recognition result associated with that pixel and the preset occlusion label. A classification loss comprehensive evaluation value corresponding to the at least one pixel is then calculated from the per-pixel classification loss values and used as the classification loss value associated with the image recognition model.
Based on a back-propagation algorithm and a gradient descent algorithm, the model parameters of the image recognition model are iteratively adjusted according to the classification loss value associated with the model to obtain an optimized image recognition model. The model is trained iteratively until the classification loss value converges or falls below a preset threshold, or until the number of training iterations reaches a preset iteration threshold.
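The stopping criteria above (loss convergence, loss below a preset threshold, or a preset iteration count) can be illustrated with a toy gradient-descent loop. This sketch trains a tiny logistic classifier with NumPy rather than the patent's unspecified network; the function name and hyperparameters are illustrative assumptions.

```python
import numpy as np

def train_until_stopped(X, y, lr=0.5, loss_threshold=1e-3,
                        conv_tol=1e-9, max_iters=5000):
    """Iteratively adjust parameters by gradient descent until the loss
    converges, drops below loss_threshold, or max_iters is reached."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])
    prev_loss = float("inf")
    loss = prev_loss
    it = 0
    for it in range(max_iters):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))      # forward pass (sigmoid)
        loss = float(-np.mean(y * np.log(p + 1e-9)
                              + (1 - y) * np.log(1 - p + 1e-9)))
        if loss < loss_threshold or abs(prev_loss - loss) < conv_tol:
            break                               # converged or below threshold
        grad = X.T @ (p - y) / len(y)           # gradient via back-propagation
        w -= lr * grad                          # gradient-descent update
        prev_loss = loss
    return w, loss, it
```

On a small separable toy problem the loop drives the classification loss close to zero well before the iteration cap.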
According to an embodiment of the present disclosure, feature extraction is performed on the sample face image to obtain sample image features; based on the sample image features, an occlusion recognition result for the sample face image is output through the image recognition model; and, according to the occlusion recognition result, recognition-result labeling based on at least one image region is performed to obtain the image recognition result.
An occlusion recognition result associated with at least one image region of the sample face image is determined to obtain the image recognition result for the sample face image. The image recognition result reflects richer and more detailed facial occlusion conditions in the sample face image. Iteratively training the image recognition model according to this image recognition result and the preset occlusion label associated with the at least one image region improves both the training efficiency and the robustness of the model.
Fig. 4 schematically illustrates an image recognition result according to an embodiment of the present disclosure.
As shown in fig. 4, in the schematic diagram 400, the image recognition result associated with the face sample image 4a may include an occlusion recognition result for at least one facial feature region (as shown in 4b), an occlusion recognition result for at least one image segmentation region (as shown in 4c), or an occlusion recognition result for at least one pixel.
The face sample image 4a may be recognized by the image recognition model to obtain an occlusion recognition result associated with the face sample image 4a. Recognition-result labeling based on at least one facial feature region is then performed on the basis of that occlusion recognition result to obtain the image recognition result 4b associated with the face sample image 4a. The facial feature regions may include, for example, region 4b1 (forehead), region 4b2 (right eye), region 4b3 (right cheek), region 4b4 (chin) and region 4b5 (mouth); the occlusion recognition results associated with regions 4b1, 4b2, 4b3, 4b4 and 4b5 are 4b1-0, 4b2-1, 4b3-0, 4b4-0 and 4b5-0, respectively, where 0 indicates no facial occlusion in the corresponding region and 1 indicates facial occlusion in the corresponding region.
The face sample image 4a may be recognized by the image recognition model to obtain an occlusion recognition result associated with the face sample image 4a. Recognition-result labeling based on at least one image segmentation region is then performed on the basis of that occlusion recognition result to obtain the image recognition result 4c associated with the face sample image 4a. The image segmentation regions may include, for example, regions 4c1 and 4c2, whose associated occlusion recognition results are 4c1-0 and 4c2-1, respectively, where 0 indicates no facial occlusion in the corresponding region and 1 indicates facial occlusion in the corresponding region.
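The 0/1 labeling scheme of fig. 4 can be represented, for example, as plain dictionaries. The region names below are illustrative stand-ins for regions 4b1-4b5; comparing recognized labels against preset labels yields the per-region classification errors that feed the loss computation:

```python
# Hypothetical recognized occlusion labels (0 = no occlusion, 1 = occluded),
# mirroring the fig. 4 example where the right-eye region is occluded.
recognized = {"forehead": 0, "right_eye": 1, "right_cheek": 0,
              "chin": 0, "mouth": 0}
# Preset occlusion labels for the same sample face image.
preset = {"forehead": 0, "right_eye": 1, "right_cheek": 1,
          "chin": 0, "mouth": 0}

# Per-region 0/1 classification error and its arithmetic mean.
errors = {region: int(recognized[region] != preset[region]) for region in preset}
mean_error = sum(errors.values()) / len(errors)   # one mismatch in five regions
```

Here only the right-cheek prediction disagrees with its preset label, giving a mean error of 0.2.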
During model training, the occlusion recognition result is refined, the classification loss value associated with the image recognition model is determined from the refined image recognition result and the preset occlusion label, and the model is optimized in reverse based on that loss value. This improves the training efficiency of the image recognition model, effectively improves its robustness, and increases the accuracy of facial occlusion recognition.
Fig. 5 schematically shows a block diagram of an image recognition processing apparatus according to an embodiment of the present disclosure.
As shown in fig. 5, the processing apparatus 500 for image recognition of the embodiment of the present disclosure includes, for example, a first processing module 510, a second processing module 520, and a third processing module 530.
A first processing module 510, configured to use the sample face image as input data of an image recognition model, and obtain an image recognition result based on at least one image region and associated with the sample face image; a second processing module 520, configured to calculate a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label associated with the sample face image and based on at least one image region; and a third processing module 530, configured to adjust model parameters of the image recognition model based on the classification loss value, to obtain an adjusted image recognition model.
According to the embodiments of the present disclosure, an image recognition result associated with the sample face image and based on at least one image region is determined, a classification loss value associated with the image recognition model is calculated according to the image recognition result and the preset occlusion label associated with the sample face image and based on the at least one image region, and the model parameters of the image recognition model are adjusted through back-propagation according to the classification loss value to obtain the adjusted image recognition model.
A classification loss value associated with the image recognition model is determined according to the image recognition result associated with at least one image region of the sample face image and the preset occlusion label, and the image recognition model is tuned in reverse based on that loss value, which improves both the training efficiency and the training effect of the model. Refining the occlusion recognition result and the preset occlusion label during training improves the robustness of the image recognition model and effectively guarantees the accuracy of facial occlusion recognition.
According to an embodiment of the present disclosure, the first processing module includes: a first processing submodule for performing feature extraction on the sample face image to obtain sample image features; a second processing submodule for outputting, based on the sample image features, an occlusion recognition result for the sample face image through the image recognition model; and a third processing submodule for labeling the recognition result based on at least one image region according to the occlusion recognition result to obtain the image recognition result.
According to an embodiment of the present disclosure, the first processing module further includes a fourth processing submodule for identifying at least one facial feature region in the sample face image, and the third processing submodule includes a first processing unit for labeling the recognition result based on the at least one facial feature region according to the occlusion recognition result to obtain the image recognition result.
According to an embodiment of the present disclosure, the fourth processing submodule includes: a second processing unit for performing facial feature recognition on the sample face image to obtain at least one facial feature point; and a third processing unit for identifying at least one facial feature region in the sample face image according to the facial feature points.
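As a sketch of how a facial feature region might be derived from facial feature points, an axis-aligned bounding box over each region's landmarks is one simple choice. The patent does not specify the geometry, so this helper is purely illustrative:

```python
def region_from_points(points, margin=0):
    """Axis-aligned bounding box (x_min, y_min, x_max, y_max) enclosing
    the facial feature points of one region, optionally padded by margin."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs) - margin, min(ys) - margin,
            max(xs) + margin, max(ys) + margin)

# e.g. three hypothetical landmarks around a mouth region:
mouth_region = region_from_points([(2, 3), (5, 1), (4, 6)])  # -> (2, 1, 5, 6)
```

The resulting rectangle can then be labeled with its own 0/1 occlusion label during training.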
According to an embodiment of the present disclosure, the first processing module further includes a fifth processing submodule for performing segmentation processing on the sample face image to obtain at least one image segmentation region, and the third processing submodule includes a fourth processing unit for labeling the recognition result based on the at least one image segmentation region according to the occlusion recognition result to obtain the image recognition result.
According to an embodiment of the present disclosure, the third processing submodule includes a fifth processing unit for labeling the recognition result based on at least one pixel in the sample face image according to the occlusion recognition result to obtain the image recognition result.
According to an embodiment of the present disclosure, the second processing module includes: a sixth processing submodule for calculating, according to the image recognition result and the occlusion label associated with the sample face image and based on the at least one image region, a classification loss value based on each image region; and a seventh processing submodule for calculating a classification loss comprehensive evaluation value from the per-region classification loss values as the classification loss value associated with the image recognition model.
It should be noted that, in the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision and disclosure of the information involved all comply with the relevant laws and regulations and do not violate public order and good customs.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 6 schematically shows a block diagram of an electronic device for performing an image recognition processing method according to an embodiment of the present disclosure.
FIG. 6 illustrates a schematic block diagram of an example electronic device 600 that can be used to implement embodiments of the present disclosure. The electronic device 600 is intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601, which can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 can also be stored. The computing unit 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
A number of components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, a mouse, or the like; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 601 executes the respective methods and processes described above, such as the image recognition processing method. For example, in some embodiments, the image recognition processing method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the image recognition processing method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured by any other suitable means (e.g., by means of firmware) to perform the image recognition processing method.
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (17)

1. A processing method of image recognition, comprising:
taking a sample face image as input data of an image recognition model to obtain an image recognition result which is associated with the sample face image and is based on at least one image area;
calculating a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label which is associated with the sample face image and is based on the at least one image area; and
adjusting the model parameters of the image recognition model based on the classification loss value to obtain the adjusted image recognition model.
2. The method of claim 1, wherein the using the sample face image as input data of an image recognition model to obtain an image recognition result based on at least one image region associated with the sample face image comprises:
performing feature extraction on the sample face image to obtain sample image features;
outputting an occlusion recognition result for the sample face image through the image recognition model based on the sample image features; and
labeling the recognition result based on the at least one image area according to the occlusion recognition result to obtain the image recognition result.
3. The method of claim 2, further comprising:
identifying at least one facial feature region in the sample face image,
wherein the labeling the recognition result based on the at least one image area according to the occlusion recognition result to obtain the image recognition result comprises:
labeling the recognition result based on the at least one facial feature region according to the occlusion recognition result to obtain the image recognition result.
4. The method of claim 3, wherein said identifying at least one facial feature region in said sample face image comprises:
performing facial feature recognition on the sample face image to obtain at least one facial feature point; and
identifying at least one facial feature region in the sample face image according to the facial feature points.
5. The method of claim 2, further comprising:
performing segmentation processing on the sample face image to obtain at least one image segmentation area;
wherein the labeling the recognition result based on the at least one image area according to the occlusion recognition result to obtain the image recognition result comprises:
labeling the recognition result based on the at least one image segmentation area according to the occlusion recognition result to obtain the image recognition result.
6. The method according to claim 2, wherein the labeling the recognition result based on the at least one image area according to the occlusion recognition result to obtain the image recognition result comprises:
labeling the recognition result based on at least one pixel in the sample face image according to the occlusion recognition result to obtain the image recognition result.
7. The method according to any one of claims 1 to 6, wherein the calculating a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label which is associated with the sample face image and is based on the at least one image area comprises:
calculating a classification loss value based on each image area according to the image recognition result and the occlusion label which is associated with the sample face image and is based on the at least one image area; and
calculating a classification loss comprehensive evaluation value according to the classification loss value based on each image area, the classification loss comprehensive evaluation value serving as the classification loss value associated with the image recognition model.
8. A processing apparatus of image recognition, comprising:
the first processing module is used for taking the sample face image as input data of an image recognition model to obtain an image recognition result which is related to the sample face image and is based on at least one image area;
a second processing module, configured to calculate a classification loss value associated with the image recognition model according to the image recognition result and a preset occlusion label associated with the sample face image and based on the at least one image region; and
and the third processing module is used for adjusting the model parameters of the image recognition model based on the classification loss value to obtain the adjusted image recognition model.
9. The apparatus of claim 8, wherein the first processing module comprises:
the first processing submodule is used for performing feature extraction on the sample face image to obtain sample image features;
the second processing submodule is used for outputting, based on the sample image features, an occlusion recognition result for the sample face image through the image recognition model; and
the third processing submodule is used for labeling the recognition result based on the at least one image area according to the occlusion recognition result to obtain the image recognition result.
10. The apparatus of claim 9, the first processing module further comprising:
a fourth processing sub-module for identifying at least one facial feature region in the sample face image,
the third processing sub-module comprises:
the first processing unit is used for labeling the recognition result based on the at least one facial feature region according to the occlusion recognition result to obtain the image recognition result.
11. The apparatus of claim 10, wherein the fourth processing submodule comprises:
the second processing unit is used for carrying out facial feature recognition on the sample face image to obtain at least one facial feature point; and
and the third processing unit is used for identifying at least one facial feature area in the sample face image according to the facial feature points.
12. The apparatus of claim 9, the first processing module further comprising:
the fifth processing submodule is used for carrying out segmentation processing on the sample face image to obtain at least one image segmentation area;
the third processing sub-module comprises:
the fourth processing unit is used for labeling the recognition result based on the at least one image segmentation area according to the occlusion recognition result to obtain the image recognition result.
13. The apparatus of claim 9, wherein the third processing sub-module comprises:
the fifth processing unit is used for labeling the recognition result based on at least one pixel in the sample face image according to the occlusion recognition result to obtain the image recognition result.
14. The apparatus of any of claims 8 to 13, wherein the second processing module comprises:
a sixth processing submodule, configured to calculate a classification loss value based on each image area according to the image recognition result and the occlusion label which is associated with the sample face image and is based on the at least one image area; and
a seventh processing submodule, configured to calculate a classification loss comprehensive evaluation value according to the classification loss value based on each image area, the classification loss comprehensive evaluation value serving as the classification loss value associated with the image recognition model.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 7.
17. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 7.
CN202210097300.0A 2022-01-26 2022-01-26 Image recognition processing method and device, equipment, medium and product Pending CN114495229A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210097300.0A CN114495229A (en) 2022-01-26 2022-01-26 Image recognition processing method and device, equipment, medium and product

Publications (1)

Publication Number Publication Date
CN114495229A true CN114495229A (en) 2022-05-13

Family

ID=81476885

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210097300.0A Pending CN114495229A (en) 2022-01-26 2022-01-26 Image recognition processing method and device, equipment, medium and product

Country Status (1)

Country Link
CN (1) CN114495229A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110135406A (en) * 2019-07-09 2019-08-16 北京旷视科技有限公司 Image-recognizing method, device, computer equipment and storage medium
CN111695463A (en) * 2020-05-29 2020-09-22 深圳数联天下智能科技有限公司 Training method of face impurity detection model and face impurity detection method
CN112633359A (en) * 2020-12-18 2021-04-09 成都艾特能电气科技有限责任公司 Multi-class model training method, medium and equipment based on gradient balance
CN113361363A (en) * 2021-05-31 2021-09-07 北京百度网讯科技有限公司 Training method, device and equipment for face image recognition model and storage medium
CN113392699A (en) * 2021-04-30 2021-09-14 深圳市安思疆科技有限公司 Multi-label deep convolution neural network method and device for face occlusion detection and electronic equipment

Similar Documents

Publication Publication Date Title
CN113379627A (en) Training method of image enhancement model and method for enhancing image
CN113657289B (en) Training method and device of threshold estimation model and electronic equipment
CN113657269A (en) Training method and device for face recognition model and computer program product
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN113361710A (en) Student model training method, picture processing device and electronic equipment
CN114565513A (en) Method and device for generating confrontation image, electronic equipment and storage medium
CN113326773A (en) Recognition model training method, recognition method, device, equipment and storage medium
CN113378855A (en) Method for processing multitask, related device and computer program product
CN114120414A (en) Image processing method, image processing apparatus, electronic device, and medium
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
CN113902696A (en) Image processing method, image processing apparatus, electronic device, and medium
CN113378958A (en) Automatic labeling method, device, equipment, storage medium and computer program product
CN113627361B (en) Training method and device for face recognition model and computer program product
CN114494747A (en) Model training method, image processing method, device, electronic device and medium
CN113033373A (en) Method and related device for training face recognition model and recognizing face
CN114926322B (en) Image generation method, device, electronic equipment and storage medium
CN114663980B (en) Behavior recognition method, and deep learning model training method and device
CN114078184B (en) Data processing method, device, electronic equipment and medium
CN115482443A (en) Image feature fusion and model training method, device, equipment and storage medium
CN114495229A (en) Image recognition processing method and device, equipment, medium and product
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN114707638A (en) Model training method, model training device, object recognition method, object recognition device, object recognition medium and product
CN115147306A (en) Image processing method, image processing device, electronic equipment and storage medium
CN113903071A (en) Face recognition method and device, electronic equipment and storage medium
CN114445682A (en) Method, device, electronic equipment, storage medium and product for training model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination