CN116012656A - Sample image generation method and image processing model training method and device

Sample image generation method and image processing model training method and device

Info

Publication number
CN116012656A
Authority
CN
China
Prior art keywords
image
category
categories
predetermined
processing model
Prior art date
Legal status
Granted
Application number
CN202310098229.2A
Other languages
Chinese (zh)
Other versions
CN116012656B (en)
Inventor
谢泽柯
何峥
孙明明
鲁楠
杨朔
李平
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310098229.2A priority Critical patent/CN116012656B/en
Publication of CN116012656A publication Critical patent/CN116012656A/en
Application granted granted Critical
Publication of CN116012656B publication Critical patent/CN116012656B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Landscapes

  • Image Analysis (AREA)

Abstract

The disclosure provides a sample image generation method, an image processing model training method, a sample image generation apparatus, an image processing model training apparatus, an electronic device and a storage medium. It relates to the field of artificial intelligence, in particular to the technical fields of computer vision, deep learning and image fusion, and can be applied to scenes such as automatic driving, security and industrial automation. The sample image generation method is implemented as follows: fusing at least two images each annotated with a category to obtain a fused image; determining, according to the categories annotated on the at least two images, a probability vector of the fused image for a plurality of predetermined categories, where the probability vector includes a probability value of the fused image belonging to each predetermined category and the categories annotated on the at least two images belong to the plurality of predetermined categories; and determining annotation information of the fused image according to the probability vector to obtain a sample image, the annotation information indicating the category to which the fused image belongs.

Description

Sample image generation method and image processing model training method and device
Technical Field
The disclosure relates to the field of artificial intelligence, in particular to the technical fields of computer vision, deep learning, image fusion and the like, and can be applied to scenes such as automatic driving, security protection, industrial automation and the like.
Background
With the development of computer and electronic technology, deep learning makes it possible to acquire knowledge from massive data and information. Deep learning has made continuous breakthroughs in fields such as computer vision, natural language processing and speech signal processing, shows promising application prospects, and is widely used in automatic driving, security, industrial automation and other directions.
The effectiveness of deep learning depends on a large amount of high-quality annotated data. If the quality of the annotated data is low, the information learned by the deep learning model is biased, which affects the precision and generalization capability of the model.
Disclosure of Invention
The present disclosure aims to provide a sample image generation method, an image processing model training method, corresponding apparatuses, an electronic device and a storage medium, which help improve the quality of annotated data and reduce the cost of generating it.
According to a first aspect of the present disclosure, there is provided a method of generating a sample image, including: fusing at least two images each annotated with a category to obtain a fused image; determining, according to the categories annotated on the at least two images, a probability vector of the fused image for a plurality of predetermined categories, where the probability vector includes a probability value of the fused image belonging to each predetermined category, and the categories annotated on the at least two images belong to the plurality of predetermined categories; and determining annotation information of the fused image according to the probability vector to obtain a sample image, where the annotation information indicates the category to which the fused image belongs.
According to a second aspect of the present disclosure, there is provided a training method for an image processing model, including: inputting a first sample image in a training set into the image processing model to obtain a feature vector of the first sample image for a plurality of predetermined categories, where the feature vector includes feature data of the first sample image for each of the plurality of predetermined categories and the first sample image is annotated with a first category; determining a loss value of the image processing model according to the feature data for the first category in the feature vector; and training the image processing model according to the loss value of the image processing model, where the first sample image in the training set includes a sample image generated by the method provided in the first aspect of the disclosure.
According to a third aspect of the present disclosure, there is provided an apparatus for generating a sample image, including: an image fusion module, configured to fuse at least two images each annotated with a category to obtain a fused image; a probability determination module, configured to determine, according to the categories annotated on the at least two images, a probability vector of the fused image for a plurality of predetermined categories, where the probability vector includes a probability value of the fused image belonging to each predetermined category and the categories annotated on the at least two images belong to the plurality of predetermined categories; and a category annotation module, configured to determine annotation information of the fused image according to the probability vector to obtain a sample image, where the annotation information indicates the category to which the fused image belongs.
According to a fourth aspect of the present disclosure, there is provided a training apparatus for an image processing model, including: a feature vector obtaining module, configured to input a first sample image in a training set into the image processing model to obtain a feature vector of the first sample image for a plurality of predetermined categories, where the feature vector includes feature data of the first sample image for each of the plurality of predetermined categories and the first sample image is annotated with a first category; a loss value determination module, configured to determine a loss value of the image processing model according to the feature data for the first category in the feature vector; and a model training module, configured to train the image processing model according to the loss value of the image processing model, where the first sample image in the training set is a sample image generated by the apparatus provided in the third aspect of the disclosure.
According to a fifth aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of generating a sample image and/or the method of training an image processing model provided by the present disclosure.
According to a sixth aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to execute the generation method of the sample image and/or the training method of the image processing model provided by the present disclosure.
According to a seventh aspect of the present disclosure, there is provided a computer program product comprising computer programs/instructions stored on at least one of a readable storage medium and an electronic device, which when executed by a processor, implement the method of generating a sample image and/or the method of training an image processing model provided by the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is an application scenario schematic diagram of a method for generating a sample image and a training method and apparatus for an image processing model according to an embodiment of the present disclosure;
FIG. 2 is a flow diagram of a method of generating a sample image according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a method of generating a sample image according to an embodiment of the present disclosure;
FIG. 4 is a flow diagram of a training method of an image processing model according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of determining loss values for an image processing model according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of a training method of an image processing model according to an embodiment of the present disclosure;
fig. 7 is a block diagram of a structure of a generating apparatus of a sample image according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a training apparatus of an image processing model according to an embodiment of the present disclosure; and
FIG. 9 is a schematic block diagram of an example electronic device for implementing a method of generating a sample image and/or a method of training an image processing model in accordance with an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The effectiveness of deep learning depends on a large amount of high-quality annotated data. If the quality of the annotated data is low, for example if the annotation is incorrect or the data itself contains noisy or ambiguous information, the information learned by the deep learning model may deviate, which reduces the model's generalization ability on unseen data. However, data in real scenes often carries some degree of ambiguity. For example, annotated image data frequently exhibits the following ambiguous situations:
First, images of some objects are highly similar to one another and may be misclassified without sufficient prior knowledge. For example, images of some dog breeds are quite similar to images of wolves and, in difficult cases, are hard even for professionals to distinguish.
Second, due to lighting conditions, shooting angle or similar issues, an image may show only part of an object, which makes the image itself hard to identify. For example, an image of a whale may contain only the tail exposed above the water, making it difficult to determine which kind of whale the image shows.
Third, because the annotation information involves a semantic hierarchy, the annotation of image data may be inaccurate or incomplete. For example, an image of an automobile is usually annotated with "automobile"; when the predetermined categories also include a category such as "tire", the tire category is often ignored during annotation.
Fourth, because an object may belong to several different categories at the same level, the annotation of image data may be inaccurate or incomplete. For example, an image of a notebook computer could be classified both as a handheld computer and as a mobile device, and completeness of the annotation is often not guaranteed during labeling.
When the features or annotation information of an image are ambiguous as described above, accurate manual annotation is particularly challenging. Manual annotation typically assigns a single label: the annotator selects the label considered most accurate from a series of candidate labels. When the data is highly ambiguous, the annotator may not be able to select the label accurately. Moreover, because images in image classification datasets usually carry a single label, such single labels cannot well reflect the ambiguous information present in the data. These problems lead to inaccurate annotation of the training set used to train an image processing model, so that the accuracy of the trained model cannot meet online requirements.
In order to solve the above problems, the present disclosure provides a sample image generation method and an image processing model training method, apparatus, device and medium. An application scenario of the method and apparatus provided in the present disclosure is described below with reference to fig. 1.
Fig. 1 is an application scenario schematic diagram of a method for generating a sample image and a training method and apparatus for an image processing model according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110 and a database 120. Wherein the electronic device 110 may be a variety of electronic devices having processing functionality including, but not limited to, laptop portable computers, desktop computers, servers, and the like.
The database 120 may have a number of annotated images stored therein. The electronic device 110 may, for example, read the annotated image from the database 120, generate a blurred image from the annotated image, and determine annotation information for the blurred image from the annotation information for the annotated image. In which a blurred image may be understood as an image having features of a plurality of objects. The electronic device 110 may use the blurred image with the labeling information to expand the training set of the image processing model 130, and increase the proportion of the blurred image in the training set, so that the image processing model 130 may learn the information of the blurred image, so as to improve the generalization capability and classification precision of the image processing model.
In an embodiment, as shown in fig. 1, the application scenario 100 may further include a terminal device 140. The terminal device 140 may be a variety of electronic devices with processing capabilities including, but not limited to, smart phones, tablet computers, portable computers, desktop computers, and the like. The terminal device 140 may be installed with a client application such as an image processing class application, an instant messaging class application, and the like. The terminal device 140 may be communicatively coupled to the electronic device 110 via a network, for example. Accordingly, the electronic device 110 may be a background management server that provides support for the running of the image processing application installed in the terminal device, or may be a cloud server or a blockchain server, which is not limited in this disclosure.
In an embodiment, the electronic device 110 may respond to the request of the terminal device 140, and send the trained image processing model to the terminal device 140, so that the terminal device 140 processes the image 150 acquired by the image acquisition device according to the image processing model to determine the category of the image 150, and obtain the image category 160.
It should be noted that, the method for generating a sample image provided in the present disclosure may be executed by the electronic device 110, or may be executed by any other electronic device communicatively connected to the database 120. Accordingly, the generating device of the sample image provided by the present disclosure may be disposed in the electronic device 110, or may be disposed in any other electronic device communicatively connected to the database 120. The training method of the image processing model provided by the present disclosure may be performed by the electronic device 110. Accordingly, the training apparatus of the image processing model provided by the present disclosure may be provided in the electronic device 110.
It will be appreciated that the electronic device may also generate fuzzy data of text or fuzzy data of speech, etc. based on similar principles, and train the text processing model or speech processing model based on the generated fuzzy data to improve the accuracy and generalization ability of the text processing model or speech processing model, for example.
It should be understood that the number and types of electronic devices 110, databases 120, and terminal devices in fig. 1 are merely illustrative. There may be any number and type of electronic devices 110, databases 120, and terminal devices as desired for implementation.
The method of generating a sample image provided by the present disclosure will be described in detail below with reference to fig. 2 to 3.
Fig. 2 is a flow chart of a method of generating a sample image according to an embodiment of the present disclosure.
As shown in fig. 2, the sample image generating method 200 of this embodiment may include operations S210 to S230.
In operation S210, at least two images labeled with a category are fused to obtain a fused image.
According to embodiments of the present disclosure, the at least two category-annotated images may be images randomly extracted from a public image dataset, and the number of images may be set according to actual requirements, for example any natural number greater than 1, such as 2, 3 or 4.
In this embodiment, for example, the central region may first be cropped from each of the at least two images, and the cropped central regions may be stitched together to obtain the fused image. The central region is cropped because the important content of an image is usually located there. After stitching, the stitched image may, for example, be scaled, and the scaled image is used as the fused image, so that fused images have a fixed size and are more suitable as training data.
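As a non-normative sketch of this crop-stitch-scale strategy (the crop ratio, the side-by-side stitching layout and the output size are assumptions chosen for illustration, not values from the disclosure):

```python
from PIL import Image

def center_crop(img: Image.Image, frac: float = 0.6) -> Image.Image:
    """Crop the central region of an image; `frac` is an assumed crop ratio."""
    w, h = img.size
    cw, ch = int(w * frac), int(h * frac)
    left, top = (w - cw) // 2, (h - ch) // 2
    return img.crop((left, top, left + cw, top + ch))

def stitch_and_scale(images, out_size=(224, 224)) -> Image.Image:
    """Stitch the cropped central regions side by side, then scale to a fixed size."""
    crops = [center_crop(im) for im in images]
    h = min(c.height for c in crops)
    crops = [c.resize((int(c.width * h / c.height), h)) for c in crops]
    canvas = Image.new("RGB", (sum(c.width for c in crops), h))
    x = 0
    for c in crops:
        canvas.paste(c, (x, 0))
        x += c.width
    return canvas.resize(out_size)  # fixed size makes the result easier to use as training data
```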
In an embodiment, the at least two images may instead be fused by weighting them according to a predetermined weight coefficient group, and the weighted result is used as the fused image. The predetermined weight coefficient group includes weight coefficients corresponding respectively to the at least two images. For example, if the at least two images are m images of the same size, the pixel value of a pixel in the fused image may be calculated by formula (1) below. In this way the degree of blurring of the fused image can be increased, making it closer to a real scene.

x̂ = α_1·x_1 + α_2·x_2 + … + α_m·x_m        formula (1)

where x̂ denotes the pixel value of the pixel in the i-th row and j-th column of the fused image; x_1, x_2, …, x_m denote the pixel values of the pixels in the i-th row and j-th column of the 1st, 2nd, …, m-th of the m same-sized images; and α_1, α_2, …, α_m denote the weight coefficients corresponding to the 1st, 2nd, …, m-th images, respectively. It will be appreciated that the sum of α_1, α_2, …, α_m may be 1.
In operation S220, a probability vector of the fused image for a plurality of predetermined categories is determined according to the categories of the at least two image labels, the probability vector including a probability value of the fused image belonging to each predetermined category.
In an embodiment, the probability value of the fused image belonging to the category annotated on each image may be determined according to the proportion that the central region of that image occupies in the fused image. For example, if the proportion is 0.2, the probability value of the fused image belonging to that image's annotated category may be determined to be 0.2. In this way, a probability value for each of the categories annotated on the at least two images can be obtained, and these probability values form the probability vector. It will be appreciated that for the other categories among the plurality of predetermined categories, i.e., those not annotated on the at least two images, the probability value of the fused image belonging to them may be 0. If the number of predetermined categories is c, the resulting probability vector is a c-dimensional vector.
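As an illustrative sketch of computing the probability vector from such area proportions (assuming each source image contributes one stitched region and is annotated with a single category index; helper names are assumptions):

```python
import numpy as np

def proportion_probability_vector(region_areas, labels, num_classes):
    """region_areas[k]: pixel area contributed by image k; labels[k]: its annotated class index."""
    total = float(sum(region_areas))
    p = np.zeros(num_classes, dtype=np.float32)
    for area, c in zip(region_areas, labels):
        p[c] += area / total          # categories not annotated on any source image stay at 0
    return p                          # c-dimensional probability vector
```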
In one embodiment, if an image is annotated with a certain predetermined category, the value 1 may be used to indicate that the image is annotated with that category; correspondingly, for a category that the image is not annotated with, the value 0 may be used. In this embodiment, the probability value of the fused image belonging to each predetermined category may be obtained by weighting, category by category, the values indicating the categories annotated on the at least two images according to the predetermined weight coefficient group. For example, for a certain category, if the values indicating whether the m images are annotated with that category are y_1, y_2, …, y_m, the probability value of the fused image belonging to that category can be calculated using formula (2) below.

s = α_1·y_1 + α_2·y_2 + … + α_m·y_m        formula (2)

Based on a similar principle, the probability value of the fused image belonging to each of the plurality of predetermined categories can be obtained, and these probability values are arranged in order to obtain the probability vector of the fused image for the plurality of predetermined categories. It will be appreciated that each of y_1, y_2, …, y_m takes the value 0 or 1, and that the categories annotated on the images randomly extracted from the public image dataset belong to the plurality of predetermined categories.
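Formula (2), applied category by category, is a weighted sum of the 0/1 label indicators of the m source images. A minimal sketch under the assumption that the labels are held as an (m, c) indicator matrix (names are illustrative):

```python
import numpy as np

def fused_probability_vector(label_indicators: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """
    label_indicators: (m, c) array of 0/1 values; row k marks the categories image k is annotated with.
    weights: (m,) weight coefficients alpha_1..alpha_m used for the fusion.
    Returns the c-dimensional probability vector of the fused image: s_j = sum_k alpha_k * y_kj.
    """
    return weights @ label_indicators
```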
In operation S230, labeling information of the fused image is determined according to the probability vector, and a sample image is obtained.
For example, this embodiment may take the predetermined categories corresponding to at least two of the larger probability values in the probability vector as the categories of the fused image, and annotate the fused image with the determined categories to obtain the sample image. In this way the generated sample image carries the information of at least two categories and also carries at least two labels. A blurred image with multiple labels can thus be obtained without manual multi-label annotation, which reduces the cost of image annotation and the cost of acquiring blurred images.
In one embodiment, a soft label may be provided for the image. A soft label means that, for the category indicated by the label, the image takes not only the value 0 or 1 but any real number in [0, 1], representing the degree to which the image matches the category. For example, the probability vector determined in operation S220, together with the predetermined categories corresponding to the probability values in it, may be used as the annotation information of the fused image. If the ordering of the plurality of predetermined categories is fixed, the annotation information of the fused image may be represented as s = (s_1, s_2, …, s_c), where c is the number of predetermined categories and s_1, s_2, …, s_c are the probability values in the probability vector corresponding to the 1st, 2nd, …, c-th of the c predetermined categories. The annotation information of the resulting sample image thus fits the information actually expressed by the sample image more closely; using it as the supervision signal to train the image processing model makes the classification information predicted by the model closer to the actual information, improving the accuracy and learning ability of the image processing model.
According to the sample image generation method of the embodiments of the present disclosure, at least two category-annotated images are fused and the probability vector of the fused image is determined by combining their annotated categories, so a blurred image and accurate annotation information for it can be obtained. Compared with related approaches, this avoids inaccurate sample images caused by the inability to manually judge the category of a blurred image, and also avoids having too few blurred images for training the image processing model, so the accuracy of an image processing model trained with the sample images generated based on the embodiments of the present disclosure can be improved.
Fig. 3 is a schematic diagram of a method of generating a sample image according to an embodiment of the present disclosure.
As shown in fig. 3, in this embodiment 300, at least two images (e.g., the 1 st image 311, the 2 nd image 312, the …, the m-th image 313) may be fused to obtain a fused image 330 when generating a sample image. Meanwhile, the probability vector 340 may be determined from at least two categories of image annotations (e.g., the 1 st annotation category 321 of the 1 st image 311, the 2 nd annotation categories 322, … of the 2 nd image 312, the m-th annotation category 323 of the m-th image 313).
Subsequently, this embodiment can map the soft labels described above to single labels, i.e., make quantization labels, resulting in a category for labeling the fused image 330. For example, the embodiment 300 may sample a plurality of predetermined categories 350 according to probability values in the probability vector 340 to obtain a sampling category 351, and annotate the fused image 330 with the sampling category 351 as annotation information of the fused image 330 to obtain the sample image 360.
For example, when the embodiment samples a plurality of predetermined categories, the probability value of each of the plurality of predetermined categories being sampled may be the probability value of the probability vector corresponding to each of the predetermined categories.
With such quantized labeling, even though each sample image carries a single label, the label distribution over the plurality of sample images better matches the distribution of image categories in real scenes; training the image processing model on these sample images can therefore improve its generalization capability.
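A minimal sketch of this quantized-labeling (sampling) step; the function name and the renormalization guard are assumptions:

```python
import numpy as np

def sample_single_label(prob_vector: np.ndarray, rng=None) -> int:
    """Sample one predetermined-category index with probability given by the probability vector."""
    rng = rng or np.random.default_rng()
    p = prob_vector / prob_vector.sum()   # guard against small numerical drift
    return int(rng.choice(len(p), p=p))
```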
In an embodiment, when the values indicating the categories annotated on the at least two images are weighted category by category according to the predetermined weight coefficient group, the target images whose annotated categories include each predetermined category may first be determined according to the categories annotated on the at least two images. Specifically, if the number of predetermined categories is c, this embodiment may group the m images according to the 1st annotation category 321 through the m-th annotation category 323 to obtain c image groups corresponding respectively to the c predetermined categories. It will be appreciated that some of the c image groups may contain no image, and that if an image is annotated with at least two categories, the image may be assigned to at least two image groups. The images in the image group corresponding to each predetermined category are thus the target images annotated with that category.
Then, the embodiment may weight the value of each predetermined category labeled by the target image according to the weight coefficient corresponding to the target image in the predetermined weight coefficient group, so as to obtain a probability value of the fused image belonging to each predetermined category. The principle of deriving a probability value for a fused image belonging to each predetermined class in this embodiment may be similar to that of equation (2) described above, except that equation (2) weights the values indicating whether all images are labeled for each predetermined class, whereas this embodiment weights only the values indicating the predetermined class for which the target image is labeled. By this embodiment, meaningless numerical calculations can be reduced, which is advantageous for improving the efficiency of determining the probability value.
In an embodiment, a probability value calculated using a principle similar to the formula (2) may be used as the weighted probability value, and then a normalization process may be performed on the weighted probability values of the predetermined categories, and the normalized probability value may be used as the probability value that the fused image belongs to each of the predetermined categories. Therefore, the situation that the sum of a plurality of probability values of a plurality of preset categories is larger than 1 due to the fact that a certain image is marked with at least two preset categories can be avoided, and the accuracy and the rationality of the determined probability values and probability vectors are improved.
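A short sketch of this grouping-and-normalization variant, using the same (m, c) 0/1 indicator representation as the earlier sketch; the helper name and the handling of an all-zero result are assumptions:

```python
import numpy as np

def grouped_normalized_probabilities(label_indicators: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Weight only the target images of each category, then normalize across categories."""
    m, c = label_indicators.shape
    weighted = np.zeros(c, dtype=np.float32)
    for j in range(c):
        targets = np.nonzero(label_indicators[:, j])[0]   # images annotated with category j
        weighted[j] = weights[targets].sum()              # weighted probability value for category j
    total = weighted.sum()
    return weighted / total if total > 0 else weighted    # normalized probability vector
```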
Based on the method for generating the sample image provided by the disclosure, the disclosure also provides a training method of the image processing model. The training method of the image processing model will be described in detail with reference to fig. 4 to 6.
Fig. 4 is a flow diagram of a training method of an image processing model according to an embodiment of the present disclosure.
As shown in fig. 4, the training method 400 of the image processing model of this embodiment may include operations S410 to S430.
In operation S410, a first sample image in a training set is input into an image processing model, resulting in feature vectors of the first sample image for a plurality of predetermined categories.
According to embodiments of the present disclosure, the image processing model may include, for example, an image classification model, an image segmentation model, an image recognition model, or the like, having the ability to classify an image or an object in an image. The first sample image in this embodiment may be all sample images in the training set, and the first sample image may include a sample image generated by the sample image generation method described above. Accordingly, the first sample image may be labeled with a first category. For example, the first category of labels may be represented, for example, via the value 1 described above. Alternatively, the first category of labels may be represented via the labeled soft labels described above, which is not limiting of the present disclosure.
In this embodiment, the first sample image may be input to an image processing model, and feature vectors may be output from a processing layer preceding an activation layer in the image processing model that outputs classification information. For example, the activation layer may normalize the feature vector by using an activation function sigmoid or softmax, and output a probability vector representing the classification information. In this embodiment, the feature vector includes feature data of the first sample image for each of a plurality of predetermined categories. It will be appreciated that if the number of the plurality of predetermined categories is c, the feature vector may be represented by a c-dimensional vector, which is different from the probability vector output by the activation layer only in that the probability vector is a normalized vector, and the feature vector is not a normalized vector.
In an embodiment, the image processing model may include at least a feature extraction network, which may employ an encoding-decoding structure, and an activation network, which may be the activation layer described above. The feature extraction network is connected with the activation network in sequence, and output data of the feature extraction network is input data of the activation network. In this embodiment, the first sample image may be input to a feature extraction network in the image processing model, and the feature vector may be output from the feature extraction network. For example, the feature extraction network may be composed of a convolution layer for extracting features of the first sample image and a full connection layer for mapping the extracted features to spatial dimensions in which a plurality of predetermined categories are located. It will be appreciated that the architecture of the feature extraction network described above is merely exemplary to facilitate an understanding of the present disclosure, which is not limited thereto.
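For illustration, a minimal PyTorch-style sketch of a feature extraction network whose output is the un-normalized c-dimensional feature vector; the backbone layers and sizes are assumptions, not the network of the disclosure:

```python
import torch
import torch.nn as nn

class FeatureExtractionNet(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.backbone = nn.Sequential(                    # assumed small convolutional backbone
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, num_classes)              # maps features to the c predetermined categories

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.backbone(x))                  # feature vector g(x); no sigmoid/softmax applied

# the probability vector would be the activation network applied to this output, e.g. torch.sigmoid(logits)
```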
In operation S420, a loss value of the image processing model is determined according to the feature data for the first category in the feature vector.
In this embodiment, the value of the predetermined loss function may be determined according to the feature data for the first category, and the value of the predetermined loss function may be used as the loss value of the image processing model. The predetermined loss function may be a loss in training the support vector machine. The present disclosure is not limited in this regard.
In operation S430, the image processing model is trained according to the loss value of the image processing model.
In this embodiment, the image processing model may be trained using a back propagation algorithm with the goal of minimizing the loss value until the accuracy of the image processing model reaches a predetermined accuracy or the loss value of the image processing model reaches a convergence condition.
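A minimal training-loop sketch consistent with the above; the optimizer, learning rate and loop structure are assumptions:

```python
import torch

def train(model, loader, loss_fn, epochs=10, lr=1e-3):
    """Back-propagation training that minimizes the model's loss value."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            feature_vectors = model(images)        # c-dimensional feature vector per sample
            loss = loss_fn(feature_vectors, labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
```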
According to the embodiment of the disclosure, the generated blurred image is used as a sample image to train the image processing model, so that the generalization capability of the image processing model can be improved.
Fig. 5 is a schematic diagram of determining a loss value of an image processing model according to an embodiment of the present disclosure.
According to embodiments of the present disclosure, it is considered that a first sample image in the training set may be annotated with only a single category even though, being a blurred image, it may also contain information of unannotated categories. So that the image processing model can learn the association between the unannotated categories and the first sample image, this embodiment may set the loss function according to the principle of PU learning (learning from positive and unlabeled data), such that the image processing model trained with this loss function can distinguish whether each of the plurality of predetermined categories is related to the input image.
Thus, as shown in fig. 5, in this embodiment 500, for each predetermined category, a first sub-classification loss value 540 may be determined for the positive samples 511 of that category contained in the training set 510, according to the feature data 530 for that category obtained after the image processing model 520 processes the positive samples; this value reflects the loss incurred on the labels of the positive sample data. A positive sample 511 is a first image, i.e., a first sample image that contains the information of that predetermined category and whose annotated first category includes that predetermined category. Meanwhile, the samples in the training set other than the positive samples 511 of that category, i.e., the second images other than the first images, may be used as the negative samples 512 for that category. According to the feature data 550 of the negative samples for that category, the loss value caused by the image processing model classifying the negative sample data into that category may be determined as the second sub-classification loss value 560. Subsequently, a total loss value 570 of the image processing model for that predetermined category may be determined from the first sub-classification loss value 540 and the second sub-classification loss value 560. As shown in fig. 5, this embodiment can obtain c total loss values for the c predetermined categories. Finally, the embodiment 500 may determine the loss value 580 of the image processing model from the plurality of total loss values for the plurality of predetermined categories.
For example, the weighted sum of the first sub-classification loss value 540 and the second sub-classification loss value 560 of each predetermined category may be taken as the total loss value of the image processing model for that category, and the sum of the plurality of total loss values for the plurality of predetermined categories may be taken as the loss value 580.
For example, the total loss value R_pu(g_j) for the j-th of the c predetermined categories can be calculated by formula (1) below.

R_pu(g_j) = π_p·R_p^+(g_j) + π_n·R_n^-(g_j)        formula (1)

where g_j denotes the feature data for the j-th predetermined category in the feature vector output by the image processing model, R_p^+(g_j) denotes the first sub-classification loss value obtained for that feature data from the positive samples of the j-th predetermined category, and R_n^-(g_j) denotes the second sub-classification loss value obtained for that feature data from the negative samples of the j-th predetermined category. π_p and π_n are hyper-parameters, which may be related, for example, to the proportions of data belonging and not belonging to the j-th predetermined category in the image data of the actual scene. The sum of π_p and π_n may be, for example, 1.
In one embodiment, there may be a plurality of first images (i.e., positive samples 511) of each predetermined category. The embodiment may determine, from the feature data of each first image for that category, the loss value with which the image processing model classifies that first image into the category, and then take the mean of these loss values over the plurality of first images as the first sub-classification loss value. For example, if the number of first images of the j-th predetermined category is n_p, R_p^+(g_j) can be calculated using formula (2) below.

R_p^+(g_j) = (1/n_p) · Σ_{i=1..n_p} ℓ(g_j(x_i^P), +1)        formula (2)

where ℓ(·, +1) denotes the predetermined loss function for positive samples, in which +1 indicates that the feature data g_j(x_i^P) is obtained from a positive sample, and x_i^P denotes the i-th of the n_p first images. The predetermined loss function ℓ(·, +1) may, for example, take the form given by formula (3).
Similarly, there may be a plurality of second images (i.e., negative samples 512) for each predetermined category. The embodiment may determine, from the feature data of each second image for that category, the loss value with which the image processing model classifies that second image into the category, and take the mean of these loss values over the plurality of second images as the second sub-classification loss value. For example, if the number of second images of the j-th predetermined category is n_n, R_n^-(g_j) can be calculated using formula (4) below.

R_n^-(g_j) = (1/n_n) · Σ_{k=1..n_n} ℓ(g_j(x_k^N), -1)        formula (4)

where ℓ(·, -1) denotes the predetermined loss function for negative samples, in which -1 indicates that the feature data g_j(x_k^N) is obtained from a negative sample, and x_k^N denotes the k-th of the n_n second images. The predetermined loss function ℓ(·, -1) may, for example, take the form given by formula (5).
It will be appreciated that the types of predetermined loss functions described above are merely examples to facilitate an understanding of the present disclosure, and the present disclosure is not limited to such types of predetermined loss functions.
For example, for the j-th of the c predetermined categories, R_u^-(g_j) may be defined as the loss value caused by the image processing model classifying the images not annotated with the j-th predetermined category into that category. Since the images not annotated with a given predetermined category may in fact include images that contain the information of that category as well as images that do not, the images not annotated with the j-th predetermined category contain both unmarked positive samples and negative samples of that category. R_u^-(g_j) therefore includes the loss of classifying the unmarked positive samples among those images into the j-th predetermined category and the loss of classifying the negative samples among them into the j-th predetermined category. Taking the images annotated with the j-th predetermined category in the training set as the positive samples of that category, R_u^-(g_j) can be expressed as formula (6) below.

R_u^-(g_j) = π_n·R_n^-(g_j) + π_p·R_p^-(g_j)        formula (6)

where π_n·R_n^-(g_j) represents the loss caused by classifying the negative samples among the images not annotated with the j-th predetermined category into that category, and π_p·R_p^-(g_j) represents the loss caused by classifying the positive samples among the images not annotated with the j-th predetermined category into that category. If only the loss caused by classifying the negative samples among the images not annotated with the j-th predetermined category into that category is to be considered, formula (6) can be substituted into formula (1) above, giving formula (7) for the total loss value for the j-th predetermined category.

R_pu(g_j) = π_p·R_p^+(g_j) + R_u^-(g_j) − π_p·R_p^-(g_j)        formula (7)
Accordingly, when determining the second sub-classification loss value 560 for the j-th predetermined category, the loss value R_u^-(g_j) with which the image processing model classifies the plurality of second images (i.e., the images not annotated with the j-th predetermined category) into that category may be determined from the feature data of those second images for the j-th predetermined category, and taken as a first value. Meanwhile, the loss value R_p^-(g_j) with which the image processing model classifies the target images among the plurality of second images into the j-th predetermined category may be determined from the feature data of those target images for that category, and taken as a second value; a target image is a second image that contains the information of the j-th predetermined category. The second sub-classification loss value may then be determined from the first value and the second value. Specifically, the second sub-classification loss value may be expressed as R_u^-(g_j) − π_p·R_p^-(g_j), i.e., it is determined from the first value and a first predetermined multiple π_p of the second value.
In an embodiment, the second sub-classification loss value may be constrained to be greater than or equal to 0, so that the second sub-classification loss term remains meaningful. This is because, in practice, the assumed proportion π_p of positive samples among the images not annotated with the category may contain an error, in which case R_u^-(g_j) − π_p·R_p^-(g_j) may be less than 0. For example, formula (7) can then be rewritten as formula (8) below.

R_pu(g_j) = π_p·R_p^+(g_j) + max(0, R_u^-(g_j) − π_p·R_p^-(g_j))        formula (8)

That is, if the difference between the first value and the first predetermined multiple of the second value is greater than or equal to zero, the difference is taken as the second sub-classification loss value; if the difference is less than zero, zero is taken as the second sub-classification loss value.
In one embodiment, R_u^-(g_j) and R_p^-(g_j) are obtained on the same principle as R_p^+(g_j), i.e., as the mean of the loss values over multiple images. For example, R_u^-(g_j) and R_p^-(g_j) can be expressed as formulas (9) and (10) below, respectively.

R_u^-(g_j) = (1/n_u) · Σ_{k=1..n_u} ℓ(g_j(x_k^U), -1)        formula (9)

R_p^-(g_j) = (1/n_p') · Σ_{o=1..n_p'} ℓ(g_j(x_o^P'), -1)        formula (10)

where n_u denotes the number of images in the training set not annotated with the j-th predetermined category, n_p' denotes the number of target images among the images not annotated with the j-th predetermined category, x_k^U denotes the k-th of the images not annotated with the j-th predetermined category, and x_o^P' denotes the o-th of the target images.
In one embodiment, different coefficients can be assigned in formula (8) above to the term π_p·R_p^+(g_j) and to the non-negative term, so that formula (8) can be rewritten, for example, as formula (11) below.

R_pu(g_j) = λ_1·π_p·R_p^+(g_j) + λ_2·max(0, R_u^-(g_j) − π_p·R_p^-(g_j))        formula (11)

where λ_1 and λ_2 denote two different hyper-parameters. That is, when determining the total loss value for the j-th predetermined category, the sum of a second predetermined multiple of the first sub-classification loss value and the second sub-classification loss value may be taken as the total loss value of the image processing model for that category. The values of λ_1 and λ_2 can be set according to actual requirements; for example, λ_1 may be related to the proportion of positive samples of the j-th predetermined category in the training set. By setting, and separately adjusting, these two hyper-parameters, the negative impact of class imbalance among the samples in the training set on the image processing model can be reduced.
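Putting formulas (2), (9), (10) and (8)/(11) together, the per-category loss can be sketched as below. This is an interpretive sketch rather than the patent's reference code: the sigmoid surrogate loss, the default hyper-parameter values, and the use of the labeled positives as a stand-in when estimating R_p^- on the unlabeled data's positive part are assumptions; `labels[:, j]` is taken as the 0/1 indicator of whether a sample is annotated with the j-th category.

```python
import torch

def surrogate_loss(scores: torch.Tensor, target_sign: int) -> torch.Tensor:
    """Assumed surrogate for the predetermined loss l(g_j(x), +/-1): sigmoid loss."""
    return torch.sigmoid(-target_sign * scores)

def per_category_pu_loss(g_j, labeled_j, pi_p=0.3, lam1=1.0, lam2=1.0):
    """
    g_j: feature data of all samples for category j (1-D tensor).
    labeled_j: 0/1 tensor, 1 if the sample is annotated with category j (positive), 0 otherwise (unlabeled).
    Assumes the batch contains at least one positive and one unlabeled sample for category j.
    """
    pos, unl = g_j[labeled_j == 1], g_j[labeled_j == 0]
    r_p_pos = surrogate_loss(pos, +1).mean()                 # R_p^+(g_j), formula (2)
    r_u_neg = surrogate_loss(unl, -1).mean()                 # R_u^-(g_j), formula (9)
    r_p_neg = surrogate_loss(pos, -1).mean()                 # stand-in estimate of R_p^-(g_j), formula (10)
    second = torch.clamp(r_u_neg - pi_p * r_p_neg, min=0.0)  # non-negative second sub-loss, formula (8)
    return lam1 * pi_p * r_p_pos + lam2 * second             # weighted total, formula (11)

def total_pu_loss(feature_vectors, labels, **kw):
    """Sum the per-category losses over the c predetermined categories."""
    c = feature_vectors.shape[1]
    return sum(per_category_pu_loss(feature_vectors[:, j], labels[:, j], **kw) for j in range(c))
```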
Fig. 6 is a schematic diagram of a training method of an image processing model according to an embodiment of the present disclosure.
As shown in fig. 6, in this embodiment 600, an image processing model 610 may include a feature extraction network 611 and an activation network 612.
In this embodiment 600, the first sample image 601 in the training set may be input into the feature extraction network 611, so as to obtain the feature vector 602 output by the feature extraction network 611, and the feature vector is used as the feature vector of the first sample image 601 for a plurality of predetermined categories. Then, a loss value 603 of the image processing model may be determined according to the feature vector 602, and the image processing model 610 may be trained according to the loss value 603, to obtain a trained image processing model 620.
In an embodiment, after training of the image processing model 610 is completed from the first sample image 601 in the training set, a verification set may also be employed to verify the accuracy of the trained image processing model 620, for example. For example, the second sample image 604 in the validation set may be input into the trained image processing model 620 resulting in the probability vector 605 of the activation network output in the trained image processing model 620. Wherein the second sample image 604 is labeled with a second class, the probability vector 605 includes a probability that the trained image processing model 620 classifies the second sample image 604 into each of a plurality of predetermined classes.
Subsequently, the embodiment may determine a classification loss 606 of the trained image processing model 620 based on the probability that the trained image processing model 620 classifies the second sample image 604 into the second category. For example, a cross entropy loss function may be employed to calculate the classification loss 606. For example, in the case where the second sample image is plural, the expected value of the classification loss 606 determined from the plural second sample images may be taken as the loss value. Then, the embodiment can determine that the accuracy of the image processing model meets the requirement and complete the training of the image processing model in the case that the expected value is determined to be smaller than the predetermined threshold. The predetermined threshold may be set according to actual requirements, which is not limited in this disclosure.
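A brief sketch of this validation step; the threshold value and helper names are assumptions, and the model is assumed to output the probability vector of the activation network:

```python
import torch

@torch.no_grad()
def validate(trained_model, val_loader, threshold=0.5):
    """Expected cross-entropy of the trained model over second sample images in the validation set."""
    losses = []
    for images, second_labels in val_loader:           # second_labels: class index of the annotated second category
        probs = trained_model(images)                  # probability vector output by the activation network
        picked = probs.gather(1, second_labels.unsqueeze(1)).squeeze(1)
        losses.append(-torch.log(picked.clamp_min(1e-12)).mean())   # cross-entropy on the second category
    expected = torch.stack(losses).mean()
    return expected.item() < threshold                 # below the predetermined threshold: accuracy requirement met
```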
In an embodiment, if it is determined that the expected value is not less than the predetermined threshold, it may be determined that the accuracy of the image processing model does not meet the requirement, and the first sample image may be extracted from the training set again, and the trained image processing model 620 may be trained.
Based on the method for generating the sample image provided by the disclosure, the disclosure also provides a device for generating the sample image. The device will be described in detail below in connection with fig. 7.
Fig. 7 is a block diagram of a structure of a sample image generating apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the generating apparatus 700 of the sample image of this embodiment includes an image fusion module 710, a probability determination module 720, and a category labeling module 730.
The image fusion module 710 is configured to fuse at least two images labeled with a category to obtain a fused image. In an embodiment, the image fusion module 710 may be configured to perform the operation S210 described above, which is not described herein.
The probability determination module 720 is configured to determine, according to the at least two image-labeled categories, probability vectors of the fused image for a plurality of predetermined categories, where the probability vectors include probability values of the fused image belonging to each of the predetermined categories, and the at least two image-labeled categories belong to the plurality of predetermined categories. In an embodiment, the probability determination module 720 may be configured to perform the operation S220 described above, which is not described herein.
The category labeling module 730 is configured to determine labeling information of the fused image according to the probability vector, and obtain a sample image. The annotation information indicates the category to which the fusion image belongs. In an embodiment, the category labeling module 730 may be configured to perform the operation S230 described above, which is not described herein.
According to an embodiment of the disclosure, the above-mentioned category labeling module 730 may be specifically configured to determine a probability vector and a predetermined category corresponding to a probability value in the probability vector as labeling information of the fused image, so as to obtain a sample image.
The category labeling module 730 may include a sampling sub-module and a labeling sub-module, according to embodiments of the present disclosure. The sampling sub-module is used for sampling a plurality of preset categories according to the probability value of the probability vector to obtain sampling categories. The labeling sub-module is used for taking the sampling category as labeling information of the fusion image to obtain a sample image.
According to an embodiment of the present disclosure, the image fusion module 710 may be specifically configured to perform weighted fusion on at least two images according to a predetermined weight coefficient set, so as to obtain a fused image; the predetermined weight coefficient group includes a plurality of weight coefficients corresponding to at least two images, respectively.
According to an embodiment of the present disclosure, the probability determining module 720 may be specifically configured to weight, by category, the numerical values indicating the categories of the at least two image labels according to the predetermined weight coefficient set, so as to obtain a probability value of the fused image belonging to each predetermined category.
According to embodiments of the present disclosure, the probability determination module 720 may include an image determination sub-module, a weighting sub-module, and a normalization sub-module. The image determining sub-module is used for determining that the marked category comprises target images of each preset category according to the categories of at least two image marks. The weighting sub-module is used for weighting the numerical value of each preset category indicating the labeling of the target image according to the weight coefficient corresponding to the target image in the preset weight coefficient group, so as to obtain the weighted probability value of each preset category. The normalization submodule is used for normalizing a plurality of weighted probability values of a plurality of preset categories to obtain probability values of each preset category in the fused image.
Based on the training method of the image processing model provided by the disclosure, the disclosure also provides a training device of the image processing model. The device will be described in detail below in connection with fig. 8.
Fig. 8 is a block diagram of a training apparatus of an image processing model according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 of the image processing model of this embodiment includes a feature vector obtaining module 810, a loss value determining module 820, and a model training module 830.
The feature vector obtaining module 810 is configured to input a first sample image in the training set into the image processing model, and obtain feature vectors of the first sample image for a plurality of predetermined categories. Wherein the feature vector includes feature data of the first sample image for each of a plurality of predetermined categories; the first sample image is labeled with a first class. The first sample image in the training set comprises the sample image generated by the sample image generating means described above. In an embodiment, the feature vector obtaining module 810 may be configured to perform the operation S410 described above, which is not described herein.
The loss value determining module 820 is configured to determine a loss value of the image processing model according to the feature data for the first category in the feature vector. In an embodiment, the loss value determining module 820 may be configured to perform the operation S420 described above, which is not described herein.
The model training module 830 is configured to train the image processing model according to the loss value of the image processing model. In an embodiment, the model training module 830 may be configured to perform the operation S430 described above, which is not described herein.
According to an embodiment of the present disclosure, the training set includes a plurality of first sample images, and the loss value determination module 820 includes a first loss determination sub-module, a second loss determination sub-module, a third loss determination sub-module, and a fourth loss determination sub-module. The first loss determination sub-module is configured to determine, according to the feature data of a first image in the plurality of first sample images for each predetermined category, a first sub-classification loss value of the image processing model classifying the first image into said each predetermined category; the first image is an image containing information of said each predetermined category, and the first category labeled on the first image is said each predetermined category. The second loss determination sub-module is configured to determine, according to the feature data of a second image in the plurality of first sample images for said each predetermined category, a second sub-classification loss value of the image processing model classifying the second image into said each predetermined category; the second image is an image in the plurality of first sample images whose labeled first category does not include said each predetermined category. The third loss determination sub-module is configured to determine a total loss value of the image processing model for said each predetermined category according to the first sub-classification loss value and the second sub-classification loss value. The fourth loss determination sub-module is configured to determine the loss value of the image processing model according to a plurality of total loss values of the image processing model for the plurality of predetermined categories.
According to an embodiment of the present disclosure, there are a plurality of the first images, and the first loss determination sub-module may include a first loss determination unit and a mean determination unit. The first loss determination unit is configured to determine, for each first image, the loss value of the image processing model classifying the first image into said each predetermined category according to the feature data of the first image for said each predetermined category. The mean determination unit is configured to determine the mean of the loss values of the image processing model classifying the plurality of first images into said each predetermined category, to obtain the first sub-classification loss value.
According to an embodiment of the present disclosure, the second loss determination sub-module may include a first value determination unit, a second value determination unit, and a second loss determination unit. The first value determination unit is configured to determine, as a first value, the loss value of the image processing model classifying the plurality of second images into said each predetermined category according to the feature data of the plurality of second images for said each predetermined category. The second value determination unit is configured to determine, as a second value, the loss value of the image processing model classifying a target image in the plurality of second images into said each predetermined category according to the feature data of the target image for said each predetermined category; the target image is a second image that contains information of said each predetermined category. The second loss determination unit is configured to determine the second sub-classification loss value according to the first value and the second value.
According to an embodiment of the present disclosure, the second loss determination unit may be specifically configured to determine the second sub-classification loss value according to the difference between the first value and a first predetermined multiple of the second value.
According to an embodiment of the present disclosure, the above-described second loss determination unit may be specifically configured to: in response to the difference between the first value and the first predetermined multiple of the second value being greater than or equal to zero, determine the difference as the second sub-classification loss value; and in response to the difference being less than zero, determine zero as the second sub-classification loss value.
According to an embodiment of the present disclosure, the third loss determination sub-module may be specifically configured to determine the sum of a second predetermined multiple of the first sub-classification loss value and the second sub-classification loss value, to obtain the total loss value of the image processing model for said each predetermined category. The first predetermined multiple and the second predetermined multiple are two hyper-parameters, and the first predetermined multiple is related to the proportion of positive samples for said each predetermined category in the training set.
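A rough sketch of this per-category total loss is given below, under assumptions. It presumes that per-image classification losses have already been computed from the feature data (for example, with a binary cross-entropy term), and that the losses over the second images are aggregated by summation; `alpha` and `beta` stand for the first and second predetermined multiples. All names are illustrative and not taken from the disclosure.

```python
import torch

def per_category_loss(pos_losses, neg_losses, target_losses, alpha, beta):
    """pos_losses:    losses of first images (labeled with the category);
    neg_losses:    losses of second images (not labeled with the category);
    target_losses: losses of those second images that nevertheless contain the category."""
    first_sub = pos_losses.mean()                              # first sub-classification loss
    first_value = neg_losses.sum()                             # assumed aggregation over second images
    second_value = target_losses.sum()                         # assumed aggregation over target images
    second_sub = torch.clamp(first_value - alpha * second_value, min=0.0)
    return beta * first_sub + second_sub                       # total loss for this category
```

The total training loss would then combine the per-category values over all predetermined categories, for example by summation, which the disclosure leaves to the fourth loss determination sub-module.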
According to an embodiment of the present disclosure, an image processing model includes a feature extraction network and an activation network. The above feature vector obtaining module may specifically be configured to input the first sample image into a feature extraction network, to obtain a feature vector output by the feature extraction network, as a feature vector of the first sample image for a plurality of predetermined categories.
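A minimal sketch of such a model follows, using a small stand-in convolutional backbone purely for illustration; the actual feature extraction network is not specified here, and the class and parameter names are assumptions.

```python
import torch
from torch import nn

class ImageProcessingModel(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        # feature extraction network: a tiny CNN stands in for any backbone
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_classes),   # one feature value per predetermined category
        )
        self.activation = nn.Sigmoid()    # activation network mapping features to probabilities

    def forward(self, x, return_probs: bool = False):
        features = self.feature_extraction(x)     # feature vector used during training
        return self.activation(features) if return_probs else features
```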
According to an embodiment of the present disclosure, the training apparatus 800 of the image processing model may further include a probability vector obtaining module, a classification loss obtaining module, and a training completion determination module. The probability vector obtaining module is configured to input a second sample image in the verification set into the trained image processing model, to obtain the probability vector output by the activation network. The second sample image is labeled with a second category, and the probability vector includes the probability that the trained image processing model classifies the second sample image into each of the plurality of predetermined categories. The classification loss obtaining module is configured to determine the classification loss of the trained image processing model according to the probability of classifying the second sample image into the second category. The training completion determination module is configured to determine that training of the image processing model is complete in response to the expected value of the classification loss being less than a predetermined threshold.
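An illustrative sketch of this verification check is shown below, assuming the model interface sketched above and an assumed threshold value; the negative-log-probability classification loss is one possible choice and is not mandated by the disclosure.

```python
import torch

@torch.no_grad()
def training_complete(model, val_loader, threshold=0.05):
    """Average the classification loss over the verification set and compare to the threshold."""
    losses = []
    for images, second_category in val_loader:            # second_category: int64 class index per image
        probs = model(images, return_probs=True)           # probabilities from the activation network
        p = probs.gather(1, second_category.unsqueeze(1)).squeeze(1)
        losses.append(-torch.log(p.clamp_min(1e-8)))        # loss from the second-category probability
    return torch.cat(losses).mean().item() < threshold     # expected value below the threshold?
```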
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the user's personal information all comply with the provisions of relevant laws and regulations, necessary security measures are taken, and public order and good custom are not violated. In the technical solutions of the present disclosure, the user's authorization or consent is obtained before the user's personal information is obtained or collected.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that may be used to implement the methods of generating sample images and/or training image processing models of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the apparatus 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the respective methods and processes described above, for example, a sample image generation method and/or an image processing model training method. For example, in some embodiments, the method of generating the sample image and/or the method of training the image processing model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the above-described generation method of a sample image and/or training method of an image processing model may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the method of generating the sample image and/or the method of training the image processing model in any other suitable way (e.g. by means of firmware).
Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be a special purpose or general-purpose programmable processor, and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability existing in conventional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (20)

1. A method of generating a sample image, comprising:
fusing at least two images marked with the categories to obtain a fused image;
determining a probability vector of the fused image for a plurality of predetermined categories according to the categories labeled on the at least two images, wherein the probability vector comprises a probability value of the fused image belonging to each of the predetermined categories, and the categories labeled on the at least two images belong to the plurality of predetermined categories; and
Determining the labeling information of the fusion image according to the probability vector to obtain a sample image,
and the labeling information indicates the category to which the fused image belongs.
2. The method of claim 1, wherein the determining labeling information of the fused image according to the probability vector, obtaining a sample image comprises:
determining the probability vector and the predetermined categories corresponding to the probability values in the probability vector as the labeling information of the fused image, to obtain the sample image.
3. The method of claim 1, wherein the determining labeling information of the fused image according to the probability vector, obtaining a sample image comprises:
sampling the plurality of preset categories according to the probability values of the probability vectors to obtain sampling categories; and
and taking the sampling category as the labeling information of the fusion image to obtain a sample image.
4. The method of claim 3, wherein the fusing at least two images labeled with categories to obtain a fused image comprises:
weighting and fusing the at least two images according to a preset weight coefficient group to obtain a fused image; the predetermined weight coefficient group includes a plurality of weight coefficients respectively corresponding to the at least two images.
5. The method of claim 4, wherein the determining the probability vector of the fused image for the plurality of predetermined categories from the categories of the at least two image annotations comprises:
and weighting the numerical values indicating the categories of the at least two image labels according to the preset weight coefficient groups to obtain probability values of the fused image belonging to each preset category.
6. The method of claim 5, wherein said weighting the values indicative of the categories of the at least two image labels by category according to the set of predetermined weight coefficients to obtain a probability value for the fused image belonging to each of the predetermined categories comprises:
determining, according to the categories labeled on the at least two images, a target image whose labeled categories comprise said each predetermined category;
weighting the value indicating said each predetermined category labeled on the target image according to the weight coefficient corresponding to the target image in the predetermined weight coefficient set, to obtain a weighted probability value of said each predetermined category; and
normalizing the weighted probability values of the plurality of predetermined categories to obtain the probability value of the fused image belonging to said each predetermined category.
7. A method of training an image processing model, comprising:
inputting a first sample image in a training set into an image processing model to obtain feature vectors of the first sample image aiming at a plurality of preset categories; the feature vector includes feature data of the first sample image for each of the plurality of predetermined categories; the first sample image is marked with a first category;
determining a loss value of the image processing model according to the characteristic data aiming at the first category in the characteristic vector; and
training the image processing model according to the loss value of the image processing model,
wherein the first sample image in the training set comprises the sample image generated by the method of any one of claims 1-6.
8. The method of claim 7, wherein the training set comprises a plurality of first sample images; the determining a loss value of the image processing model according to the feature data for the first category in the feature vector comprises: for each of the predetermined categories:
determining a first sub-classification loss value of the image processing model for classifying a first image of the plurality of first sample images into said each predetermined category according to the feature data of the first image for said each predetermined category; wherein the first image comprises information of said each predetermined category, and the first category labeled on the first image is said each predetermined category;
determining a second sub-classification loss value of the image processing model for classifying a second image of the plurality of first sample images into said each predetermined category according to the feature data of the second image for said each predetermined category; wherein the second image is an image in the plurality of first sample images whose labeled first category does not comprise said each predetermined category;
determining a total loss value of the image processing model for each predetermined category according to the first sub-category loss value and the second sub-category loss value; and
determining the loss value of the image processing model according to a plurality of total loss values of the image processing model for the plurality of predetermined categories.
9. The method of claim 8, wherein there are a plurality of the first images; the determining the first sub-classification loss value of the image processing model classifying the first image into said each predetermined category comprises:
determining, for said each predetermined category, a loss value of the image processing model classifying each of the first images into said each predetermined category based on the feature data of each of the first images; and
determining an average of the loss values of the image processing model classifying the plurality of first images into said each predetermined category, to obtain the first sub-classification loss value.
10. The method of claim 8, wherein there are a plurality of the second images; the determining the second sub-classification loss value of the image processing model classifying the second image into said each predetermined category comprises:
determining, as a first value, a loss value of the image processing model classifying the plurality of second images into said each predetermined category based on the feature data of the plurality of second images for said each predetermined category;
determining, as a second value, a loss value of the image processing model classifying the target image into each of the predetermined categories, based on the feature data of the target image in the plurality of second images for each of the predetermined categories; the target image includes information of each predetermined category; and
and determining the second sub-classification loss value according to the first value and the second value.
11. The method of claim 10, wherein the determining the second sub-class loss value from the first value and the second value comprises:
determining the second sub-classification loss value according to the difference between the first value and a first predetermined multiple of the second value.
12. The method of claim 11, wherein the determining the second sub-classification loss value according to the difference between the first value and the first predetermined multiple of the second value comprises:
determining, in response to the difference between the first value and the first predetermined multiple of the second value being greater than or equal to zero, the difference as the second sub-classification loss value;
and determining, in response to the difference between the first value and the first predetermined multiple of the second value being less than zero, zero as the second sub-classification loss value.
13. The method of claim 9 or 10, wherein the determining a total loss value of the image processing model for said each predetermined category according to the first sub-classification loss value and the second sub-classification loss value comprises:
determining a sum of a second predetermined multiple of the first sub-classification loss value and the second sub-classification loss value, to obtain the total loss value of the image processing model for said each predetermined category,
wherein the first predetermined multiple and the second predetermined multiple are two hyper-parameters, the first predetermined multiple being related to the proportion of positive samples for said each predetermined category in the training set.
14. The method of claim 7, wherein the image processing model includes a feature extraction network and an activation network; inputting a first sample image in a training set into an image processing model, and obtaining feature vectors of the first sample image aiming at a plurality of preset categories comprises:
And inputting the first sample image into the feature extraction network to obtain feature vectors output by the feature extraction network, wherein the feature vectors are used as feature vectors of the first sample image aiming at a plurality of preset categories.
15. The method of claim 14, further comprising:
inputting a second sample image in the verification set into the trained image processing model to obtain a probability vector output by the activation network; the second sample image is marked with a second category; the probability vector includes a probability that the trained image processing model classifies the second sample image into each of the plurality of predetermined categories;
determining a classification loss of the trained image processing model according to the probability that the trained image processing model classifies the second sample image into the second category; and
determining that training of the image processing model is complete in response to the expected value of the classification loss being less than a predetermined threshold.
16. A sample image generation apparatus comprising:
the image fusion module is used for fusing at least two images marked with the categories to obtain a fused image;
the probability determining module is used for determining a probability vector of the fused image for a plurality of predetermined categories according to the categories labeled on the at least two images, wherein the probability vector comprises a probability value of the fused image belonging to each of the predetermined categories, and the categories labeled on the at least two images belong to the plurality of predetermined categories; and
A category labeling module for determining labeling information of the fusion image according to the probability vector to obtain a sample image,
and the labeling information indicates the category to which the fused image belongs.
17. A training apparatus for an image processing model, comprising:
the feature vector obtaining module is used for inputting a first sample image in the training set into the image processing model to obtain feature vectors of the first sample image aiming at a plurality of preset categories; the feature vector includes feature data of the first sample image for each of the plurality of predetermined categories; the first sample image is marked with a first category;
a loss value determining module, configured to determine a loss value of the image processing model according to the feature data for the first category in the feature vector; and
a model training module for training the image processing model according to the loss value of the image processing model,
wherein the first sample image in the training set comprises the sample image generated by the apparatus of claim 15.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 15.
19. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-15.
20. A computer program product comprising computer programs/instructions stored on at least one of a readable storage medium and an electronic device, which when executed by a processor, implement the steps of the method according to any one of claims 1 to 15.
CN202310098229.2A 2023-01-20 2023-01-20 Sample image generation method and image processing model training method and device Active CN116012656B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310098229.2A CN116012656B (en) 2023-01-20 2023-01-20 Sample image generation method and image processing model training method and device


Publications (2)

Publication Number Publication Date
CN116012656A (en) 2023-04-25
CN116012656B CN116012656B (en) 2024-02-13

Family

ID=86037318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310098229.2A Active CN116012656B (en) 2023-01-20 2023-01-20 Sample image generation method and image processing model training method and device

Country Status (1)

Country Link
CN (1) CN116012656B (en)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111382758A (en) * 2018-12-28 2020-07-07 杭州海康威视数字技术股份有限公司 Training image classification model, image classification method, device, equipment and medium
CN111260665A (en) * 2020-01-17 2020-06-09 北京达佳互联信息技术有限公司 Image segmentation model training method and device
CN111898577A (en) * 2020-08-10 2020-11-06 腾讯科技(深圳)有限公司 Image detection method, device, equipment and computer readable storage medium
CN111814810A (en) * 2020-08-11 2020-10-23 Oppo广东移动通信有限公司 Image recognition method and device, electronic equipment and storage medium
CN112232384A (en) * 2020-09-27 2021-01-15 北京迈格威科技有限公司 Model training method, image feature extraction method, target detection method and device
US20230004581A1 (en) * 2021-06-29 2023-01-05 Naver Corporation Computer-Implemented Method for Improving Classification of Labels and Categories of a Database
CN114723987A (en) * 2022-03-17 2022-07-08 Oppo广东移动通信有限公司 Training method of image label classification network, image label classification method and device
CN114444619A (en) * 2022-04-02 2022-05-06 北京百度网讯科技有限公司 Sample generation method, training method, data processing method and electronic device
CN114821066A (en) * 2022-05-23 2022-07-29 北京地平线信息技术有限公司 Model training method and device, electronic equipment and computer readable storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TSUNG-YI LIN et al.: "Focal Loss for Dense Object Detection", arXiv, pages 1-10 *
DENG Jianguo et al.: "Analysis of Loss Functions Based on Single-Label Image Annotation", Journal of Taiyuan University of Science and Technology, vol. 41, no. 6, pages 433-448 *

Also Published As

Publication number Publication date
CN116012656B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN108550065B (en) Comment data processing method, device and equipment
CN110377733B (en) Text-based emotion recognition method, terminal equipment and medium
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
CN115565177B (en) Character recognition model training, character recognition method, device, equipment and medium
CN114692778B (en) Multi-mode sample set generation method, training method and device for intelligent inspection
CN111950647A (en) Classification model training method and device
CN114495113A (en) Text classification method and training method and device of text classification model
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
CN116975400B (en) Data classification and classification method and device, electronic equipment and storage medium
CN116758280A (en) Target detection method, device, equipment and storage medium
CN116343233B (en) Text recognition method and training method and device of text recognition model
CN116089586B (en) Question generation method based on text and training method of question generation model
CN115482436B (en) Training method and device for image screening model and image screening method
CN116012656B (en) Sample image generation method and image processing model training method and device
CN116383382A (en) Sensitive information identification method and device, electronic equipment and storage medium
CN113705489B (en) Remote sensing image fine-granularity airplane identification method based on priori regional knowledge guidance
CN115690816A (en) Text element extraction method, device, equipment and medium
CN115359322A (en) Target detection model training method, device, equipment and storage medium
CN112541557B (en) Training method and device for generating countermeasure network and electronic equipment
CN115359468A (en) Target website identification method, device, equipment and medium
CN114764874A (en) Deep learning model training method, object recognition method and device
CN115471717B (en) Semi-supervised training and classifying method device, equipment, medium and product of model
CN112990145B (en) Group-sparse-based age estimation method and electronic equipment
CN116071608B (en) Target detection method, device, equipment and storage medium
CN116071628B (en) Image processing method, device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant