CN113128588B - Model training method, device, computer equipment and computer storage medium

Info

Publication number
CN113128588B
Authority
CN
China
Prior art keywords
image
category
expansion
annotation
images
Legal status
Active
Application number
CN202110416258.XA
Other languages
Chinese (zh)
Other versions
CN113128588A
Inventor
艾长青
周大军
赖勇辉
张先震
Current Assignee
Shenzhen Tencent Domain Computer Network Co Ltd
Original Assignee
Shenzhen Tencent Domain Computer Network Co Ltd
Application filed by Shenzhen Tencent Domain Computer Network Co Ltd
Priority to CN202110416258.XA
Publication of CN113128588A
Application granted
Publication of CN113128588B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model training method, a model training device, computer equipment and a computer storage medium. The method includes: acquiring a sample set of an object recognition model, where the sample set includes a plurality of annotation images and the annotation information of each annotation image; if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects (N being a positive integer greater than 1) and that the categories have not reached object-proportion balance, acquiring the object expansion ratio of each category; determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expanded images; and taking the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples, so that the performance of the object recognition model can be improved.

Description

Model training method, device, computer equipment and computer storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a model training method, a model training apparatus, a computer device, and a computer storage medium.
Background
Object recognition generally refers to the use of a machine to identify and analyze objects of certain categories in an image. It typically involves two tasks, classification and detection: classification determines whether an image contains an object of a given category, while detection marks the position and size of that object. At present, object recognition on images is usually realized with an object recognition model. Because the accuracy of the recognition result depends on the performance of the object recognition model, and that performance in turn depends on how the model is trained, how to train an object recognition model so as to improve its performance has become a research hotspot.
Disclosure of Invention
The application provides a model training method, a model training device, computer equipment and a computer storage medium, which can improve the performance of an object recognition model.
In one aspect, the present application provides a model training method, including:
acquiring a sample set of an object recognition model, where the sample set includes a plurality of annotation images and the annotation information of each annotation image; the annotation information of any annotation image indicates: one or more annotation objects in that annotation image, and the category to which each annotation object belongs;
if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects and that the categories have not reached object-proportion balance, acquiring the object expansion ratio of each category, where N is a positive integer greater than 1;
determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expanded images; the associated annotation image of any category refers to an annotation image that contains an annotation object of that category;
and taking the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In one aspect, the present application provides a model training apparatus, comprising:
an acquisition unit, configured to acquire a sample set of an object recognition model, where the sample set includes a plurality of annotation images and the annotation information of each annotation image; the annotation information of any annotation image indicates one or more annotation objects in that image and the category to which each annotation object belongs;
the acquisition unit is further configured to acquire the object expansion ratio of each category if the sample set is detected, according to the annotation information of each annotation image, to include N categories of annotation objects whose object proportions are not balanced, where N is a positive integer greater than 1;
a processing unit, configured to determine the associated annotation image of each category from the sample set, and to perform sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expanded images; the associated annotation image of any category refers to an annotation image that contains an annotation object of that category;
and a training unit, configured to take the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and to perform model training on the object recognition model with the training samples.
In one aspect, the present application provides a computer device comprising:
a processor adapted to implement one or more computer programs;
a computer storage medium storing one or more computer programs, the one or more computer programs adapted to be loaded and executed by the processor to perform:
acquiring a sample set of an object recognition model, where the sample set includes a plurality of annotation images and the annotation information of each annotation image, the annotation information of any annotation image indicating one or more annotation objects in that image and the category to which each annotation object belongs; if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects and that the categories have not reached object-proportion balance, acquiring the object expansion ratio of each category, where N is a positive integer greater than 1; determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expanded images, where the associated annotation image of any category refers to an annotation image that contains an annotation object of that category; and taking the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In one aspect, the present application provides a computer storage medium storing one or more instructions adapted to be loaded and executed by a processor to perform:
acquiring a sample set of an object recognition model, where the sample set includes a plurality of annotation images and the annotation information of each annotation image, the annotation information of any annotation image indicating one or more annotation objects in that image and the category to which each annotation object belongs; if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects and that the categories have not reached object-proportion balance, acquiring the object expansion ratio of each category, where N is a positive integer greater than 1; determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expanded images, where the associated annotation image of any category refers to an annotation image that contains an annotation object of that category; and taking the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In one aspect, embodiments of the present application provide a computer program product or computer program, the computer program product comprising a computer program stored in a computer storage medium; a processor reads the computer program from the computer storage medium and executes it, causing a computer device to perform:
acquiring a sample set of an object recognition model, where the sample set includes a plurality of annotation images and the annotation information of each annotation image, the annotation information of any annotation image indicating one or more annotation objects in that image and the category to which each annotation object belongs; if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects and that the categories have not reached object-proportion balance, acquiring the object expansion ratio of each category, where N is a positive integer greater than 1; determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expanded images, where the associated annotation image of any category refers to an annotation image that contains an annotation object of that category; and taking the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In the present application, when the object proportions of the various categories of annotation objects in the sample set are unbalanced, the associated annotation images of each category are expanded based on the object expansion ratio of that category, thereby expanding the annotation objects of each category. This effectively mitigates the object-proportion imbalance among the categories of annotation objects in the sample set; training the object recognition model on the expanded sample set can then effectively improve the performance of the object recognition model.
Drawings
To describe the technical solutions of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below illustrate some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1a is a schematic illustration of an annotated image provided herein;
FIG. 1b is a schematic flow chart of a model training method provided in the present application;
FIG. 2 is a schematic diagram of a model training method provided herein;
FIG. 3a is a schematic illustration of an annotated image provided herein;
FIG. 3b is a schematic illustration of an augmented image provided herein;
FIG. 3c is a schematic illustration of yet another augmented image provided herein;
FIG. 4 is a schematic diagram of a model training method provided herein;
FIG. 5a is a schematic flow chart of one sample expansion provided herein;
FIG. 5b is a schematic diagram of a sample expansion process provided in the present application;
FIG. 5c is a schematic flow chart of determining a target area provided in the present application;
FIG. 5d is a schematic diagram of another sample expansion process provided in the present application;
FIG. 5e is a schematic illustration of the partial-region-misalignment type of splash-screen expansion provided herein;
FIG. 6 is a schematic structural diagram of a model training device provided in the present application;
fig. 7 is a schematic architecture diagram of a computer device provided in the present application.
Detailed Description
With the rapid development of computer technology, artificial intelligence (AI) technology has also made great progress. AI refers to the theory, methods, techniques and application systems that use a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive branch of computer science: by studying the essence of intelligence, it seeks to produce new intelligent machines that can react in a manner similar to human intelligence, giving machines capabilities such as perception, reasoning and decision-making. Accordingly, AI technology is a comprehensive discipline that mainly includes computer vision (CV), speech processing, natural language processing, and machine learning (ML)/deep learning.
Machine learning is a multi-field interdiscipline involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills and to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of AI and the fundamental way to endow computer devices with intelligence; deep learning is a machine-learning technique based on deep neural networks. Machine learning/deep learning generally covers a variety of techniques, including artificial neural networks, reinforcement learning (RL), supervised learning, unsupervised learning, and so on.
Based on the machine learning/deep learning techniques in AI, the present application provides a model training scheme for training an object recognition model so as to improve its performance, for example the accuracy with which the model recognizes objects. Here, "target recognition" means identifying one or more objects in an image with a deep neural network algorithm; the image may be a game image, a photograph of a person, a commodity image, and so on. For a game image, target recognition refers to recognizing virtual character objects, prop objects, health-bar objects and the like; for a photograph of a person, it may refer to recognizing accessory objects, facial-feature objects (e.g., eyes, nose, eyebrows) and the like. In a specific implementation, the model training method may be performed by a computer device, which may be a terminal or a server. The terminal may include, but is not limited to, smart phones, tablet computers, notebook computers, desktop computers, smart televisions, etc.; a wide variety of clients (APPs) may run on the terminal, such as multimedia playback clients, social clients, browser clients, information-flow clients, education clients, and so on. The server may be an independent physical server, a server cluster or distributed system formed by multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms.
In a specific implementation, the general principle of the model training scheme is as follows. First, the computer device acquires a plurality of annotation images for training the object recognition model. These may be manually collected pictures with annotation information; each annotation image is one sample (i.e., picture data used to train the model) and contains one or more annotation objects. As shown at 11 in FIG. 1a, an annotation image may contain a single annotation object 111; as shown at 12 in FIG. 1a, an annotation image may contain multiple annotation objects of the same category (e.g., two annotation objects of the category "tower"); and as shown at 13 and 14 in FIG. 1a, an annotation image may contain multiple annotation objects of different categories, such as two annotation objects 131 of the category "tower" and one annotation object 132 of the category "person", or several annotation objects 141 of the category "tower" and several annotation objects 142 of the category "person".
Because annotation images are collected in different ways, the numbers of the various annotation objects in the collected images may differ. For example, if the annotation images are extracted from frames of a shooting game, the number of "person" annotation objects may far exceed the number of "house" annotation objects; if they are extracted from frames of a building game, the number of "house" annotation objects may far exceed the number of "person" annotation objects. When the numbers of the various annotation objects differ, their proportions may be unbalanced. In this case, the computer device can determine the proportion by which each category of annotation object needs to be expanded; from this it can determine the number of times (called the target number of times) each annotation image needs to be expanded, and then perform that many sample expansions based on the annotation image, so as to balance the proportions of the various annotation objects across the annotation images. The general principle of the model training scheme is shown in FIG. 1b. It can be seen that the scheme does not require much attention to picture content during the collection stage of the annotation images, which saves human effort; meanwhile, because the computer device performs sample expansion based on the object expansion ratio of each category of annotation object to obtain the training samples, the object-proportion imbalance among the categories is mitigated, and the generalization of the object recognition model trained with these samples is strengthened.
Based on the principle illustration of the model training scheme, the application provides a model training method which can be executed by the computer equipment; referring to fig. 2, the model training method includes the following steps S201 to S204:
s201, a sample set of the object recognition model is obtained.
The sample set includes a plurality of annotation images and the annotation information of each annotation image; the annotation information of any annotation image indicates one or more annotation objects in that image and the category to which each annotation object belongs.
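For illustration only, a minimal Python sketch of one possible in-memory layout of such a sample set is given below; the field names and the COCO-style bounding boxes are assumptions of this sketch, not something prescribed by the present application.

    # A hypothetical sample-set layout: each entry is one annotation image plus
    # its annotation information (the objects and the category each belongs to).
    sample_set = [
        {
            "image_path": "images/0001.png",
            "annotations": [
                {"category": "tower", "bbox": [120, 40, 64, 180]},   # x, y, w, h
                {"category": "person", "bbox": [300, 210, 40, 90]},
            ],
        },
        # ... more annotation images
    ]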
S202, if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects and that the categories have not reached object-proportion balance, acquire the object expansion ratio of each category.
Here N is a positive integer greater than 1; that is, the sample set includes multiple categories of annotation objects, where "multiple" means at least two.
In one embodiment, the computer device may obtain the number of annotation objects in each category from the annotation information of each annotation image, and then judge, based on these per-category counts, whether the categories have reached object-proportion balance. In a specific embodiment, object-proportion balance may mean that the number of annotation objects of each category reaches an expected value. For example, suppose the sample set includes 3 categories of annotation objects, "person", "tree" and "flame", and the expected count for each of them is 5000; then if the number of "person" objects is 5000, the number of "tree" objects is 5056 and the number of "flame" objects is 5600, the object proportions of the categories in the sample set are considered balanced.
Optionally, object-proportion balance may also mean that the differences between the per-category counts fall within a threshold, for example: if the difference between the numbers of annotation objects of any two categories is no more than 100, the categories are considered balanced. In this case, suppose again that the sample set includes the 3 categories "person", "tree" and "flame": if the number of "person" objects is 5000, the number of "tree" objects is 4000 and the number of "flame" objects is 5600, the object proportions are considered unbalanced and the computer device needs to acquire the object expansion ratio of each category; if the number of "person" objects is 5000, the number of "tree" objects is 5056 and the number of "flame" objects is 5060, the object proportions are considered balanced and the computer device does not need to acquire the object expansion ratios.
In another embodiment, the computer device may instead obtain the number of associated annotation images of each category from the annotation information of each annotation image, and then judge, based on these counts, whether the categories have reached object-proportion balance. The associated annotation image of a category is an annotation image that contains an annotation object of that category. As shown in FIG. 3a, image 301 can be understood as an associated annotation image of the "tower" category, while image 302 can be understood both as an associated annotation image of the "tower" category and as an associated annotation image of the "person" category. The number of associated annotation images of a category is therefore the number of annotation images containing at least one annotation object of that category. For example, suppose annotation image 1 includes 1 annotation object a of category A and 2 annotation objects b of category B, and annotation image 2 includes 1 annotation object a of category A and 5 annotation objects c of category C; the annotation images corresponding to category A are then "annotation image 1" and "annotation image 2", so the number of associated annotation images of category A is 2. Similarly, the numbers for category B and category C are each 1. In this embodiment, object-proportion balance may mean that the differences between the per-category numbers of associated annotation images fall within a threshold; optionally, it may also mean that the number of associated annotation images of each category reaches an expected value.
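For illustration, the balance check of step S202 might be sketched as follows in Python, assuming the hypothetical sample_set layout above and the 100-object threshold from the example; the same check could be applied to per-category counts of associated annotation images instead.

    from collections import Counter

    def is_object_proportion_balanced(sample_set, max_diff=100):
        # Count annotation objects per category across all annotation images.
        counts = Counter(
            ann["category"]
            for image in sample_set
            for ann in image["annotations"]
        )
        # Balanced when the largest per-category gap stays within the threshold.
        return max(counts.values()) - min(counts.values()) <= max_diff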
S203, determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expanded images.
In a specific embodiment, one category may correspond to one or more associated annotation images, so N categories correspond to at least N associated annotation images. Moreover, the computer device may obtain one or more expanded images from a single associated annotation image. Performing sample expansion based on the annotation objects of each category can thus be understood as performing sample expansion based on the associated annotation images of each category; in other words, the computer device may perform sample expansion for every category in the sample set, yielding a plurality of expanded images, where "a plurality" means at least N. For example, suppose associated annotation image 1 and associated annotation image 2 both contain category-A annotation objects; the computer device may expand associated annotation image 1 into one or more expanded images a, and expand associated annotation image 2 into one or more expanded images b, and the plurality of expanded images can then be understood as including the one or more expanded images a and the one or more expanded images b.
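A minimal sketch of collecting the associated annotation images of each category, again assuming the hypothetical sample_set layout above, might look as follows; the helper name is illustrative.

    def associated_images(sample_set):
        # An image is an associated annotation image of every category for
        # which it contains at least one annotation object.
        by_category = {}
        for image in sample_set:
            for category in {ann["category"] for ann in image["annotations"]}:
                by_category.setdefault(category, []).append(image)
        return by_category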
In one embodiment, the ways in which the computer device performs sample expansion based on an associated annotation image include, but are not limited to: copying the associated annotation image, and applying white-screen expansion, black-screen expansion, splash-screen expansion, mirroring, rotation, scaling, color jittering and so on to it. Assuming the image shown at 31 in FIG. 3b is a normal annotation image a, white-screen expansion means that the computer device generates from annotation image a an expanded image with white-screen regions (as shown at 32 in FIG. 3b, where 321 and 322 are white-screen regions); black-screen expansion means that the computer device generates from annotation image a an expanded image with black-screen regions (as shown at 33 in FIG. 3b, where 331 and 332 are black-screen regions); and splash-screen expansion means that the computer device generates from annotation image a an expanded image with a splash-screen picture (as shown in FIG. 3c), where the splash-screen picture may be of the misalignment type (as shown at 34 in FIG. 3c, where regions 341 and 342 are misaligned), the color-jitter type (as shown at 35 and 36 in FIG. 3c, where regions 351, 352 and 361 exhibit color jitter), or other types.
S204, taking the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
It can be understood that the training samples of the object recognition model contain more annotation objects of each category than the original sample set does, so the computer device can achieve a better training effect when training the object recognition model with these samples.
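Schematically, step S204 might be sketched as follows, where expansion_plan, expand_sample() and train() are placeholders for the allocation produced in the expansion step, one of the expansion operations described above, and any detector training routine; none of these names come from the present application.

    # Merge the original annotation images with the expanded images, then train.
    training_samples = list(sample_set)
    for image, times in expansion_plan:     # (annotation image, target expansion times)
        for _ in range(times):
            training_samples.append(expand_sample(image))
    object_recognition_model = train(training_samples)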
According to the model training method above, the annotation information of each annotation image in the sample set is used to judge whether the object proportions of the categories of annotation objects are balanced; when they are not, sample expansion can be performed based on the object expansion ratio of each category and its associated annotation images so as to balance the proportions of the categories. Then, when the object recognition model is trained on the plurality of expanded images together with each annotation image in the sample set, the model can, to a certain extent, learn the features of the various categories of annotation objects evenly, which helps improve the training effect and thus the generalization, robustness and other performance of the object recognition model.
Referring to FIG. 4, which illustrates a model training method provided in the present application; the method may include the following steps S401 to S406:
s401, a sample set of the object recognition model is acquired.
In an embodiment, for the specific implementation of step S401, reference may be made to the description of step S201, which is not repeated here.
S402, if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects and that the categories have not reached object-proportion balance, acquire the object expansion ratio of each category.
In one embodiment, when acquiring the object expansion ratio of each category, the computer device may first obtain the total number of annotation images in the sample set and the expected number of images of the sample set, where the expected number of images is the number of annotation images the sample set is expected to contain; for example, it may be the minimum number of samples (annotation images) required for model training.
In an alternative embodiment, if the expected number of images is greater than the total number of images, the computer device acquires a first reference expansion ratio for each category and calculates the object expansion ratio of each category from it. For example, suppose the sample set acquired by the computer device includes 10 annotation images containing 3 categories (A, B and C) of annotation objects. If the expected number of images is 20, the computer device acquires the first reference expansion ratios of categories A, B and C respectively, and then calculates the object expansion ratio of each category based on its first reference expansion ratio. Specifically, when the expected number of images is greater than the total number of images, the computer device may first calculate a scale amplification coefficient for each category, and then amplify the first reference expansion ratio of each category by that coefficient to obtain the object expansion ratio of that category. For example, the amplification of the first reference expansion ratio of the i-th category can be written as equation 1:

    minRatio_i = ratio_i × (minSampleNum / sampleNum)    (Equation 1)

where ratio_i denotes the first reference expansion ratio of the i-th category, sampleNum denotes the total number of annotation images in the sample set, minSampleNum denotes the expected number of images of the sample set, and minRatio_i denotes the object expansion ratio of the i-th category; the scale amplification coefficient can thus be understood as minSampleNum / sampleNum.
Correspondingly, if the expected number of images is less than or equal to the total number of images, the computer device acquires a second reference expansion ratio for each category and uses it directly as the object expansion ratio of that category, as in equation 2:

    minRatio_i = ratio'_i    (Equation 2)

where ratio'_i denotes the second reference expansion ratio of the i-th category. Combining the two cases, the way the computer device obtains the object expansion ratio of each category can be written as equation 3:

    minRatio_i = ratio_i × (minSampleNum / sampleNum),  if minSampleNum > sampleNum
    minRatio_i = ratio'_i,                              if minSampleNum ≤ sampleNum    (Equation 3)

where ratio_i denotes the first reference expansion ratio of the i-th category, ratio'_i denotes its second reference expansion ratio, sampleNum denotes the total number of annotation images in the sample set, minSampleNum denotes the expected number of images, and minRatio_i denotes the object expansion ratio of the i-th category.
In another alternative embodiment, if the expected number of images is greater than the total number of images, the computer device acquires the first reference expansion ratio of each category and uses it as the object expansion ratio of that category; if the expected number of images is less than or equal to the total number of images, the computer device acquires the second reference expansion ratio of each category and uses it as the object expansion ratio of that category. In this embodiment, the object expansion ratio of each category is determined as in equation 4:

    minRatio_i = ratio_i,   if minSampleNum > sampleNum
    minRatio_i = ratio'_i,  if minSampleNum ≤ sampleNum    (Equation 4)

where the symbols have the same meanings as in equation 3.
In one embodiment, the first and second reference expansion ratios of each category may be set from empirical values, for example according to the application scenario of the object recognition model. When the object recognition model is used to recognize objects in game frames, the reference expansion ratios may be set according to how frequently the various objects appear in the frames, for example: the first and second reference expansion ratios of frequently appearing objects are set to 0.2, and those of rarely appearing objects are set to 0.1.
Optionally, the first and second reference expansion ratios of the same category may be equal. For example, suppose the game frames of game A mainly contain "tower", "hero", "monster" and "road", with "tower" and "hero" appearing most frequently and "monster" rarely; then, when the object recognition model is used to recognize objects in game A's frames, the first and second reference expansion ratios of "tower" and "hero" may both be set to 0.2, and those of "monster" and "road" may both be set to 0.1.
Optionally, the first and second reference expansion ratios of the same category may also differ. Specifically, as described above, when the total number of annotation images is smaller than the expected number, the computer device performs sample expansion based on the first reference expansion ratio of each category, the purpose being to expand out more training samples and thereby guarantee the training effect. In other words, when annotation images are scarce, the computer device needs the larger first reference expansion ratio to obtain more training samples, whereas when annotation images are plentiful it only needs to expand at the basic ratio (i.e., the second reference expansion ratio). The two reference ratios of the same category may therefore differ; in particular, the first reference expansion ratio of the n-th category may be greater than its second reference expansion ratio. For example, suppose commodity images mainly contain "clothes", "daily necessities", "pets" and "flowers", and the objects appear with frequency ordered clothes > daily necessities > flowers > pets; then, when the object recognition model is used to recognize objects in the commodity images of an online shopping platform, the first reference expansion ratio of "clothes" may be set to 0.5 and its second reference expansion ratio to 0.3, while the first reference expansion ratio of "pets" may be set to 0.2 and its second to 0.1.
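For illustration, equations 1 to 4 can be sketched in Python as follows; the amplify flag selecting between the amplified branch (equation 3) and the plain branch (equation 4) is an illustrative device of this sketch, not part of the application.

    def object_expand_ratio(first_ref, second_ref, sample_num, min_sample_num,
                            amplify=True):
        # first_ref / second_ref: first and second reference expansion ratios.
        if min_sample_num > sample_num:
            if amplify:
                # Equations 1 and 3: amplify the first reference expansion ratio
                # by the scale amplification coefficient minSampleNum / sampleNum.
                return first_ref * (min_sample_num / sample_num)
            return first_ref                # equation 4, first branch
        return second_ref                   # equation 2: second reference ratio

    # e.g. object_expand_ratio(0.2, 0.1, sample_num=10, min_sample_num=20) -> 0.4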
S403, determining the associated annotation image of each category from the sample set, and calculating the target expansion times of each associated annotation image in the n-th category according to the object expansion ratio of the n-th category.
Here n ∈ [1, N]. The computer device can count the total number of annotation objects in the sample set (i.e., the sum of the numbers of annotation objects over the N categories) from the annotation information of each annotation image; it then calculates the object expansion number of the n-th category from this total and the object expansion ratio of the n-th category. The object expansion number can be understood as the total number of sample expansions the computer device must perform based on the annotation images having that category of annotation objects. Specifically, it can be calculated as in equation 5:

    extendSampleTimes_i = ceil(totalObjNum × minRatio_i)    (Equation 5)

where extendSampleTimes_i is the object expansion number of the i-th category, ceil is the ceiling (round-up) function, totalObjNum is the total number of annotation objects contained in the sample set, and minRatio_i is the object expansion ratio of the i-th category. For example, if the total number of objects is 10 and the object expansion ratio of the i-th category is 0.21, then totalObjNum × minRatio_i = 10 × 0.21 = 2.1, and rounding 2.1 up gives extendSampleTimes_i = 3; the computer device therefore needs to expand the associated annotation images of the i-th category 3 times to obtain 3 expanded images. Further, after obtaining the object expansion number of the n-th category, the computer device can calculate the target expansion times of each associated annotation image in the n-th category from that number and the annotation information of each associated annotation image in the n-th category.
Optionally, when the computer device counts the annotation objects of each category from the annotation information and judges balance based on those per-category counts, the target expansion times of each associated annotation image in the n-th category may be calculated as in the following example. Suppose the 3 annotation images containing the i-th category of annotation object are associated annotation image A1 (containing 2 such objects), associated annotation image A2 (containing 1) and associated annotation image A3 (containing 3). If the object expansion number of the i-th category is 3, the target expansion times of the associated annotation images may be: 1 for A1, 1 for A2 and 0 for A3; or 3 for A1, 0 for A2 and 0 for A3; or 0 for A1, 0 for A2 and 1 for A3. Note that the allocation of target expansion times across the associated annotation images is not specially limited, as long as the expansion of the associated annotation images of the i-th category reaches the object expansion number (3 here) of newly added annotation objects of that category.
Correspondingly, if the computer device counts the associated annotation images of each category from the annotation information and judges balance based on those counts, the target expansion times of each associated annotation image in the n-th category may be calculated as in the following example. Again suppose the 3 annotation images containing the i-th category of annotation object are associated annotation image A1 (containing 2 such objects), A2 (containing 1) and A3 (containing 3). If the object expansion number of the i-th category is 3, the target expansion times may be: 1 for A1, 1 for A2 and 1 for A3; or 2 for A1, 1 for A2 and 0 for A3; or 0 for A1, 3 for A2 and 0 for A3; and so on. The allocation is again not specially limited, as long as the computer device obtains 3 (the object expansion number) newly added associated annotation images of the i-th category from the expansion.
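For illustration, equation 5 together with one possible allocation of target expansion times is sketched below; since the text leaves the allocation unrestricted, the round-robin scheme used here is just one admissible choice.

    import math

    def plan_expansions(total_obj_num, min_ratio_i, associated_images_i):
        # Equation 5: extendSampleTimes_i = ceil(totalObjNum * minRatio_i).
        extend_sample_times = math.ceil(total_obj_num * min_ratio_i)
        # Spread the expansions over the associated annotation images in turn;
        # any other split summing to extend_sample_times would do as well.
        times = [0] * len(associated_images_i)
        for k in range(extend_sample_times):
            times[k % len(associated_images_i)] += 1
        return list(zip(associated_images_i, times))

    # e.g. with totalObjNum = 10 and minRatio_i = 0.21, ceil(2.1) = 3 expansions.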
S404, traversing each associated annotation image in the nth category, and taking the currently traversed associated annotation image as a target image.
Traversing each associated annotation image in the n-th category means that the computer device accesses, in turn, each of the one or more associated annotation images corresponding to the n-th category of annotation object; the access may specifically be acquiring the annotation information of the associated annotation image. The currently traversed associated annotation image (also called the target image) can then be understood as the annotation image whose annotation information is currently being acquired.
S405, performing sample expansion on the target image based on its target expansion times to obtain one or more expanded images corresponding to the target image.
In one embodiment, to perform one sample expansion on the target image, the computer device may copy the target image to obtain a copied image, determine a target area in the copied image, adjust the color of every pixel in the target area to a target color, and use the adjusted copied image as an expanded image corresponding to the target image.
In an alternative embodiment, the computer device may randomly determine the size and position of the target area in the copied image, and then adjust the color of every pixel in that area to black or white; the specific flow of this sample expansion operation is shown in FIG. 5a. As shown in FIG. 5a, the computer device repeatedly performs the steps of "generating a target area and filling it with black or white" until it has generated and filled M target areas in the copied image, where M is an integer; the target areas (the "rectangular boxes" in FIG. 5a) may overlap. For example, the result of performing "generate a target area and fill it with black or white" for the first time may be as shown at 51 in FIG. 5b, and the result of performing it a second time may be as shown at 52 in FIG. 5b. In a specific application, the number of black- or white-screen areas can be chosen by random number generation; to keep the expanded image valid, the computer device should not generate too many target areas, and the proportion of the copied image covered by the target areas (e.g., black- and white-screen areas) should not be too large. For example, the computer device may limit the area occupied by the black- and white-screen regions to within 1/3 of the total area of the copied image, with 1 ≤ M ≤ 5, M an integer.
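A minimal sketch of this black-/white-screen expansion, assuming Pillow images, might look as follows; the 1 ≤ M ≤ 5 region count and the 1/3 area cap follow the constraints just described, while the region-size bounds are assumptions of the sketch.

    import random
    from PIL import Image, ImageDraw

    def blank_region_expand(image: Image.Image) -> Image.Image:
        copy = image.copy()                       # work on a copied image
        draw = ImageDraw.Draw(copy)
        w, h = copy.size
        budget = w * h / 3                        # target areas cover at most 1/3
        for _ in range(random.randint(1, 5)):     # 1 <= M <= 5, overlap allowed
            rw = random.randint(1, max(1, w // 3))
            rh = random.randint(1, max(1, h // 3))
            if rw * rh > budget:
                break
            budget -= rw * rh
            x = random.randint(0, w - rw)
            y = random.randint(0, h - rh)
            fill = random.choice([(0, 0, 0), (255, 255, 255)])  # black or white
            draw.rectangle([x, y, x + rw, y + rh], fill=fill)
        return copy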
In another alternative embodiment, if the copied image contains, besides the annotation objects of the n-th category, other annotation objects of other categories, the computer device may determine the target area in the copied image by taking the display areas of those other annotation objects as the target areas. For example, referring to FIG. 5c, suppose the copied image is as shown at 53 in FIG. 5c and the n-th category is "tree"; the computer device may then determine the region where the "person" is located as the target area.
In yet another embodiment, the computer device may copy the target image to obtain a copied image; then generate a splash-screen image block for the copied image according to a splash-screen type and a splash-screen color, and determine the display position of the splash-screen image block in the copied image; the computer device then adds the splash-screen image block at that display position, overlaying the regional image there, to obtain an expanded image corresponding to the target image, as shown for example in FIG. 5d. The splash-screen type may include regional snowflakes, horizontal stripes, vertical stripes, garbled text, partial-region misalignment, etc.; each splash-screen type may correspond to one or more splash-screen image blocks, and each splash-screen image block may correspond to one or more splash-screen colors. For example, the regional-snowflake type may correspond to a circular snowflake region or a hexagonal snowflake region. The computer device can thus generate the splash-screen image block matching the splash-screen type and fill it with the matching splash-screen color to obtain the final expanded image. For example, as shown in FIG. 5e, when the splash-screen type is partial-region misalignment, the splash-screen image block is the shape and size of the misaligned region (as shown at 54 in FIG. 5e), and the splash-screen color is the image content of the misplaced region (as shown at 55 in FIG. 5e).
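A minimal sketch of the partial-region-misalignment type of splash-screen expansion, again assuming Pillow images, might look as follows; the strip-height and shift ranges are assumptions of the sketch.

    import random
    from PIL import Image

    def misalignment_expand(image: Image.Image, max_shift: int = 40) -> Image.Image:
        copy = image.copy()
        w, h = copy.size
        top = random.randint(0, h // 2)
        strip_h = random.randint(max(1, h // 10), max(1, h // 4))
        # The strip plays the role of the splash-screen image block (FIG. 5e, 54);
        # its content is the image content of the misplaced region (FIG. 5e, 55).
        strip = copy.crop((0, top, w, top + strip_h))
        shift = random.randint(-max_shift, max_shift)
        copy.paste(strip, (shift, top))           # overlay at a shifted position
        return copy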
S406, taking the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model with the training samples.
In an embodiment, for the specific implementation of step S406, reference may be made to the description of step S204, which is not repeated here.
According to the model training method above, when the number of annotation images in the sample set is small, the annotation images are expanded based on the object expansion ratios, increasing the number of annotation images and enlarging the sample set, so that the object recognition model trained on it is more robust. Meanwhile, the computer device can judge from the annotation information of each sample whether the categories of annotation objects in the sample set have reached object-proportion balance; when they have not, it acquires the object expansion ratio of each category and performs sample expansion accordingly, which balances the proportions of the categories in the sample set, so that the object recognition model trained on the balanced sample set generalizes better. In addition, by expanding each category in multiple ways based on its object expansion ratio, the computer device effectively increases the diversity of the training samples, which safeguards the training effect at the level of the data source.
Based on the above description of the model training method embodiments, an embodiment of the present application further discloses a model training apparatus, which may be a computer program (including program code) running in the computer device mentioned above. The model training apparatus can perform the method shown in FIG. 2 or FIG. 4. Referring to FIG. 6, the model training apparatus includes at least an acquisition unit 601, a processing unit 602 and a training unit 603.
an acquisition unit 601, configured to acquire a sample set of an object recognition model, where the sample set includes a plurality of annotation images and the annotation information of each annotation image; the annotation information of any annotation image indicates one or more annotation objects in that image and the category to which each annotation object belongs;
the acquisition unit 601 is further configured to acquire the object expansion ratio of each category if the sample set is detected, according to the annotation information of each annotation image, to include N categories of annotation objects whose object proportions are not balanced, where N is a positive integer greater than 1;
the processing unit 602 is configured to determine the associated annotation image of each category from the sample set, and to perform sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expanded images; the associated annotation image of any category refers to an annotation image that contains an annotation object of that category;
and a training unit 603, configured to take the plurality of expanded images and each annotation image in the sample set as training samples of the object recognition model, and to perform model training on the object recognition model with the training samples.
In one embodiment, when acquiring the object expansion ratio of each category, the acquisition unit 601 is specifically configured to perform:
acquiring the total number of annotation images in the sample set and the expected number of images of the sample set, where the expected number of images refers to the number of annotation images the sample set is expected to include;
if the expected number of images is greater than the total number of images, acquiring a first reference expansion ratio of each category, and calculating the object expansion ratio of each category from its first reference expansion ratio;
and if the expected number of images is less than or equal to the total number of images, acquiring the second reference expansion ratio of each category as the object expansion ratio of each category.
In yet another embodiment, when calculating the object expansion ratio of each category according to the first reference expansion ratio of each category, the obtaining unit 601 is specifically configured to execute:
calculating a proportional amplification coefficient corresponding to each category according to the total number of images and the expected number of images;
and amplifying the first reference expansion ratio of each category by its proportional amplification coefficient to obtain the object expansion ratio of each category.
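To make this concrete, the following Python sketch (not taken from the patent; the function name, the dictionary layout, and the concrete amplification rule are all assumptions) shows one plausible way the two branches above could be realized:

```python
def object_expansion_ratios(total_images, expected_images,
                            first_reference, second_reference):
    """Hypothetical sketch of the ratio-selection embodiments above.

    first_reference / second_reference: dicts mapping category -> reference
    expansion ratio; the shared amplification coefficient below is an assumed
    concrete rule (expected image count over actual image count).
    """
    if expected_images > total_images:
        coefficient = expected_images / total_images  # proportional amplification
        return {category: ratio * coefficient
                for category, ratio in first_reference.items()}
    # Expected count already met: take the second reference ratio directly.
    return dict(second_reference)
```

For example, with 100 annotation images, an expectation of 300, and a first reference expansion ratio of 0.1 for some category, this sketch would yield an object expansion ratio of 0.3 for that category.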
In yet another embodiment, when performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category, the processing unit 602 is specifically configured to execute:
calculating the target expansion times of each associated annotation image in the nth category according to the object expansion ratio of the nth category, where n ∈ [1, N];
traversing each associated annotation image in the nth category, and taking the currently traversed associated annotation image as a target image;
and performing sample expansion on the target image based on the target expansion times of the target image to obtain one or more expansion images corresponding to the target image.
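Sketched in the same illustrative Python (every name is an assumption; `expand_once` stands in for the concrete expansion embodiments described further below), this traversal might look as follows:

```python
def expand_category_samples(category_images, expansion_times, expand_once):
    """Traverse one category's associated annotation images and expand each.

    category_images: the associated annotation images of the nth category;
    expansion_times: the target expansion times computed for each image;
    expand_once: a callable producing one expansion image from a target image.
    """
    expansion_images = []
    for target_image, times in zip(category_images, expansion_times):
        # The currently traversed image is the target image; expand it the
        # computed number of times.
        expansion_images.extend(expand_once(target_image) for _ in range(times))
    return expansion_images
```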
In yet another embodiment, when calculating the target expansion times of each associated annotation image in the nth category according to the object expansion ratio of the nth category, the processing unit 602 specifically executes:
counting the total number of annotation objects in the sample set according to the annotation information of each annotation image of the sample set;
calculating the number of object expansion of the nth category based on the counted total number of objects and the object expansion proportion of the nth category;
and calculating the target expansion times of each associated annotation image in the nth category according to the object expansion number of the nth category and the annotation information of each associated annotation image in the nth category.
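One possible reading of this computation, again as an illustrative Python sketch (the patent fixes the inputs but not the exact formulas, so the multiplicative rule and the proportional distribution below are assumptions):

```python
def target_expansion_times(total_objects, category_ratio, objects_per_image):
    """Distribute the nth category's object expansion number over its images.

    total_objects: total count of annotation objects in the whole sample set;
    category_ratio: the nth category's object expansion ratio;
    objects_per_image: how many nth-category objects each associated image has.
    """
    # Object expansion number of the nth category (multiplicative rule assumed).
    expansion_number = round(total_objects * category_ratio)
    total_in_category = sum(objects_per_image) or 1
    # Assumed distribution: expand each image in proportion to its share of
    # the category's objects, so copies contribute objects roughly evenly.
    return [round(expansion_number * count / total_in_category)
            for count in objects_per_image]
```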
In yet another embodiment, when performing one sample expansion on the target image, the processing unit 602 specifically executes:
performing image copying processing on the target image to obtain a copy image;
determining a target area in the copy image, and adjusting the color of each pixel point in the target area to a target color;
and taking the adjusted copy image as an expansion image corresponding to the target image.
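As a minimal sketch of this copy-and-recolor expansion (using NumPy; the box-shaped target area and the default black color are assumptions for illustration):

```python
import numpy as np

def recolor_expansion(target_image, target_area, target_color=(0, 0, 0)):
    """Copy the target image and paint every pixel of the target area one color.

    target_image: an H x W x 3 uint8 array; target_area: (x0, y0, x1, y1).
    """
    copy_image = target_image.copy()            # image copying processing
    x0, y0, x1, y1 = target_area
    copy_image[y0:y1, x0:x1] = target_color     # adjust each pixel in the area
    return copy_image                           # expansion image for the target
```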
In yet another embodiment, if the copy image includes an annotation object under the nth category and other annotation objects under other categories, then when determining the target area in the copy image, the processing unit 602 specifically executes:
determining the display areas of the other annotation objects in the copy image, and taking the display areas of the other annotation objects as the target area.
In yet another embodiment, when performing one sample expansion on the target image, the processing unit 602 specifically executes:
performing image copying processing on the target image to obtain a copy image;
generating a screen-glitch image block for the copy image according to a glitch type and a glitch color, and determining the display position of the screen-glitch image block in the copy image;
and adding the screen-glitch image block at the display position so as to cover the area of the image at that position, thereby obtaining an expansion image corresponding to the target image.
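One way such a screen-glitch expansion could be realized is sketched below with NumPy; the stripe-style glitch, its size, and the random placement are all assumptions, since the patent leaves the glitch type and color open:

```python
import numpy as np

def glitch_expansion(target_image, rng=None):
    """Copy the target image and cover part of it with a screen-glitch block."""
    rng = rng or np.random.default_rng()
    copy_image = target_image.copy()
    h, w = copy_image.shape[:2]
    # Assumed glitch type: horizontal stripes in one random saturated color.
    gh, gw = max(h // 4, 1), max(w // 2, 1)
    block = np.zeros((gh, gw, 3), dtype=copy_image.dtype)
    block[::2] = rng.integers(0, 256, size=3)     # color every other row
    # Choose a display position and cover the area image at that position.
    y = int(rng.integers(0, h - gh + 1))
    x = int(rng.integers(0, w - gw + 1))
    copy_image[y:y + gh, x:x + gw] = block
    return copy_image                             # the expansion image
```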
According to one embodiment of the present application, the steps involved in the methods shown in fig. 2 and fig. 4 may be performed by the respective units of the model training apparatus shown in fig. 6. For example, steps S201 and S202 shown in fig. 2 may be performed by the obtaining unit 601 in the model training apparatus shown in fig. 6, step S203 may be performed by the processing unit 602, and step S204 may be performed by the training unit 603. For another example, steps S401 to S402 shown in fig. 4 may be performed by the obtaining unit 601, steps S403 to S405 may be performed by the processing unit 602, and step S406 may be performed by the training unit 603.
According to another embodiment of the present application, the units of the model training apparatus shown in fig. 6 are divided based on logical functions. Individual units may be kept separate, or combined wholly or partly into one or several other units, or some unit(s) may be further split into functionally smaller units, all without affecting the technical effects of the embodiments of the present application. In other embodiments of the present application, the model training apparatus may also include other units; in practical applications, these functions may be implemented with the assistance of other units and through the cooperation of multiple units.
According to another embodiment of the present application, the model training apparatus shown in fig. 6 may be constructed, and the model training method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the method shown in fig. 2 or fig. 4 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer storage medium, and loaded into and run on the above computing device through the computer storage medium.
In the model training apparatus described above, the obtaining unit acquires the annotation information of each annotation object in the sample set so that it can be judged whether the object proportions of the various categories of annotation objects in the sample set are balanced. When they are not, the processing unit is invoked to perform sample expansion based on the object expansion ratio of each category and the associated annotation images of each category, so as to balance the proportions of the annotation objects. Then, when the training unit is invoked to train the object recognition model using the plurality of expansion images together with each annotation image in the sample set, the model can, to a certain extent, learn the features of the various annotation objects evenly, which helps improve the learning and training effect of the object recognition model and further improves model properties such as generalization and robustness.
Based on the descriptions of the method embodiment and the apparatus embodiment above, an embodiment of the present application further provides a computer device. Referring to fig. 7, the computer device includes at least a processor 701, an input interface 702, and a computer storage medium 703, which may be connected by a bus or in other ways.
The computer storage medium 703 is a memory device in the computer device for storing programs and data. It is understood that the computer storage medium 703 here may include both a built-in storage medium of the computer device and an extended storage medium that the computer device supports. The computer storage medium 703 provides storage space that stores the operating system of the computer device. One or more instructions, which may be one or more computer programs (including program code), are also stored in this storage space and are adapted to be loaded and executed by the processor 701. The computer storage medium here may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory; optionally, it may also be at least one computer storage medium located remotely from the processor. The processor 701, or central processing unit (CPU), is the computing core and control core of the computer device; it is adapted to implement one or more instructions, and in particular to load and execute one or more instructions so as to implement the corresponding method flow or function.
In one embodiment, one or more instructions stored in the computer storage medium 703 may be loaded and executed by the processor 701 to implement the corresponding steps of the method embodiments shown in fig. 2 and fig. 4; in a specific implementation, one or more instructions in the computer storage medium 703 are loaded by the processor 701 to perform the following steps:
acquiring a sample set of an object recognition model, where the sample set includes a plurality of annotation images and the annotation information of each annotation image, the annotation information of any annotation image being used to indicate one or more annotation objects in that annotation image and the category to which each annotation object belongs; if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects and the object proportions among the categories are not balanced, acquiring the object expansion ratio of each category, where N is a positive integer greater than 1; determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expansion images, where an associated annotation image of any category refers to an annotation image that contains annotation objects under that category; and taking the plurality of expansion images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model using the training samples.
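As an illustrative aside (not taken from the patent), the imbalance check that gates this flow could be sketched in Python as follows; the data layout and the tolerance-based balance test are assumptions, since the patent does not fix a concrete balance criterion:

```python
from collections import Counter

def object_proportions(annotation_objects):
    """annotation_objects: iterable of (image_id, category) pairs collected
    from the annotation information of every annotation image (layout assumed)."""
    counts = Counter(category for _, category in annotation_objects)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

def proportions_balanced(proportions, tolerance=0.1):
    """Assumed balance test: every category within `tolerance` of the mean share."""
    mean_share = 1.0 / len(proportions)
    return all(abs(share - mean_share) <= tolerance
               for share in proportions.values())
```

For example, `proportions_balanced(object_proportions([(1, "a"), (1, "a"), (2, "b")]))` returns False for a 2/3-versus-1/3 split under the default tolerance, which would trigger the expansion path above.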
In one embodiment, when acquiring the object expansion ratio of each category, the processor 701 specifically loads and executes:
acquiring the total number of annotation images in the sample set and the expected number of images of the sample set, where the expected number of images refers to the number of annotation images that the sample set is expected to include;
if the expected number of images is larger than the total number of images, acquiring a first reference expansion ratio of each category, and calculating the object expansion ratio of each category according to its first reference expansion ratio;
and if the expected number of images is smaller than or equal to the total number of images, acquiring a second reference expansion ratio of each category as the object expansion ratio of each category.
In yet another embodiment, when calculating the object expansion ratio of each category according to the first reference expansion ratio of each category, the processor 701 specifically loads and executes:
calculating a proportional amplification coefficient corresponding to each category according to the total number of images and the expected number of images;
and amplifying the first reference expansion ratio of each category by its proportional amplification coefficient to obtain the object expansion ratio of each category.
In yet another embodiment, when performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expansion images, the processor 701 specifically loads and executes:
calculating the target expansion times of each associated annotation image in the nth category according to the object expansion ratio of the nth category, where n ∈ [1, N];
traversing each associated annotation image in the nth category, and taking the currently traversed associated annotation image as a target image;
and performing sample expansion on the target image based on the target expansion times of the target image to obtain one or more expansion images corresponding to the target image.
In yet another embodiment, when calculating the target expansion times of each associated annotation image in the nth category according to the object expansion ratio of the nth category, the processor 701 specifically loads and executes:
counting the total number of annotation objects in the sample set according to the annotation information of each annotation image of the sample set;
calculating the number of object expansion of the nth category based on the counted total number of objects and the object expansion proportion of the nth category;
and calculating the target expansion times of each associated annotation image in the nth category according to the object expansion number of the nth category and the annotation information of each associated annotation image in the nth category.
In yet another embodiment, when performing one sample expansion on the target image, the processor 701 specifically loads and executes:
performing image copying processing on the target image to obtain a copy image;
determining a target area in the copy image, and adjusting the color of each pixel point in the target area to a target color;
and taking the adjusted copy image as an expansion image corresponding to the target image.
In yet another embodiment, if the copy image includes an annotation object under the nth category and other annotation objects under other categories, then when determining the target area in the copy image, the processor 701 specifically loads and executes:
determining the display areas of the other annotation objects in the copy image, and taking the display areas of the other annotation objects as the target area.
In yet another embodiment, when performing one sample expansion on the target image, the processor 701 specifically loads and executes:
performing image copying processing on the target image to obtain a copy image;
generating a screen-glitch image block for the copy image according to a glitch type and a glitch color, and determining the display position of the screen-glitch image block in the copy image;
and adding the screen-glitch image block at the display position so as to cover the area of the image at that position, thereby obtaining an expansion image corresponding to the target image.
With the computer device provided by the present application, the annotation information of each annotation object in the sample set is acquired so that it can be judged whether the object proportions of the various categories of annotation objects in the sample set are balanced. When they are not, sample expansion can be performed based on the object expansion ratio of each category and the associated annotation images of each category, so as to balance the proportions of the annotation objects. Then, in the process of training the object recognition model using the plurality of expansion images together with each annotation image in the sample set, the model can, to a certain extent, learn the features of the various annotation objects evenly, which helps improve the learning and training effect of the object recognition model and further improves model properties such as generalization and robustness.
An embodiment of the present application further provides a computer storage medium in which a computer program of the model training method is stored. The computer program includes program instructions, and when one or more processors load and execute the program instructions, the description of the model training method in the foregoing embodiments can be implemented; it is not repeated here, and neither is the description of the advantageous effects of the same method. It will be appreciated that the program instructions may be deployed to be executed on one device, or on multiple devices capable of communicating with one another.
It should be noted that, according to an aspect of the present application, there is also provided a computer program product or computer program comprising computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, thereby enabling the computer device to perform the methods provided in the various alternatives of the model training method embodiments shown in fig. 2 and fig. 4.
Those skilled in the art will appreciate that all or part of the flows of the above embodiment methods may be accomplished by a computer program stored on a computer-readable storage medium; when executed, the program may include the flows of the embodiments of the model training method described above. The computer-readable storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
The foregoing disclosure is merely a partial embodiment of the present application and is not intended to limit the scope of the claims of the present application.

Claims (10)

1. A method of model training, comprising:
acquiring a sample set of an object recognition model, wherein the sample set comprises a plurality of annotation images and annotation information of each annotation image; the annotation information of any annotation image is used for indicating: one or more annotation objects in that annotation image and the category to which each annotation object belongs;
if it is detected, according to the annotation information of each annotation image, that the sample set comprises N categories of annotation objects and the object proportion balance among the categories is not achieved, acquiring the object expansion ratio of each category, wherein N is a positive integer greater than 1;
determining the associated annotation image of each category from the sample set, and performing sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expansion images, wherein the associated annotation image of any category refers to an annotation image that contains annotation objects under that category; calculating, according to the object expansion ratio of the nth category, the target expansion times of each associated annotation image in the nth category, wherein n ∈ [1, N]; traversing each associated annotation image in the nth category, and taking the currently traversed associated annotation image as a target image; and performing sample expansion on the target image based on the target expansion times of the target image, so as to obtain one or more expansion images corresponding to the target image;
and taking the plurality of expansion images and each annotation image in the sample set as training samples of the object recognition model, and performing model training on the object recognition model using the training samples.
2. The method of claim 1, wherein the obtaining the object expansion ratio for each category comprises:
acquiring the total number of annotation images in the sample set and the expected number of images of the sample set, wherein the expected number of images refers to the number of annotation images that the sample set is expected to include;
if the expected number of images is larger than the total number of images, acquiring a first reference expansion ratio of each category, and calculating the object expansion ratio of each category according to its first reference expansion ratio;
and if the expected number of images is smaller than or equal to the total number of images, acquiring a second reference expansion ratio of each category as the object expansion ratio of each category.
3. The method of claim 2, wherein the calculating the object expansion ratio of each category according to the first reference expansion ratio of each category includes:
calculating a proportional amplification coefficient corresponding to each category according to the total number of images and the expected number of images;
and amplifying the first reference expansion ratio of each category by its proportional amplification coefficient to obtain the object expansion ratio of each category.
4. The method of claim 1, wherein calculating the target number of expansions for each associated annotation image in the nth category based on the object expansion ratio of the nth category comprises:
counting the total number of annotation objects in the sample set according to the annotation information of each annotation image of the sample set;
calculating the number of object expansion of the nth category based on the counted total number of objects and the object expansion proportion of the nth category;
and calculating the target expansion times of each associated annotation image in the nth category according to the object expansion number of the nth category and the annotation information of each associated annotation image in the nth category.
5. The method of claim 1, wherein performing a sample expansion on the target image comprises:
performing image copying processing on the target image to obtain a copy image;
determining a target area in the copy image, and adjusting the color of each pixel point in the target area to a target color;
and taking the adjusted copy image as an expansion image corresponding to the target image.
6. The method of claim 5, wherein if the copy image includes an annotation object under the nth category and other annotation objects under other categories, the determining a target area in the copy image comprises:
determining the display areas of the other annotation objects in the copy image, and taking the display areas of the other annotation objects as the target area.
7. The method of claim 1, wherein performing a sample expansion on the target image comprises:
performing image copying processing on the target image to obtain a copy image;
generating a screen-glitch image block for the copy image according to a glitch type and a glitch color, and determining the display position of the screen-glitch image block in the copy image;
and adding the screen-glitch image block at the display position so as to cover the area of the image at that position, thereby obtaining an expansion image corresponding to the target image.
8. A model training device, comprising:
an obtaining unit, configured to acquire a sample set of an object recognition model, wherein the sample set comprises a plurality of annotation images and annotation information of each annotation image; the annotation information of any annotation image is used for indicating: one or more annotation objects in that annotation image and the category to which each annotation object belongs;
the obtaining unit is further configured to acquire the object expansion ratio of each category if it is detected, according to the annotation information of each annotation image, that the sample set includes N categories of annotation objects and the object proportions among the categories are not balanced, wherein N is a positive integer greater than 1;
a processing unit, configured to determine the associated annotation image of each category from the sample set, and to perform sample expansion based on the object expansion ratio of each category and the associated annotation images of each category to obtain a plurality of expansion images, wherein the associated annotation image of any category refers to an annotation image that contains annotation objects under that category; to calculate, according to the object expansion ratio of the nth category, the target expansion times of each associated annotation image in the nth category, wherein n ∈ [1, N]; to traverse each associated annotation image in the nth category, taking the currently traversed associated annotation image as a target image; and to perform sample expansion on the target image based on the target expansion times of the target image, so as to obtain one or more expansion images corresponding to the target image;
and a training unit, configured to take the plurality of expansion images and each annotation image in the sample set as training samples of the object recognition model, and to perform model training on the object recognition model using the training samples.
9. A computer device, comprising:
a processor adapted to implement one or more computer programs;
a computer storage medium storing one or more computer programs adapted to be loaded by the processor to perform the model training method according to any one of claims 1-7.
10. A computer storage medium storing one or more instructions adapted to be loaded by a processor and to perform the model training method of any of claims 1-7.
CN202110416258.XA 2021-04-16 2021-04-16 Model training method, device, computer equipment and computer storage medium Active CN113128588B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110416258.XA CN113128588B (en) 2021-04-16 2021-04-16 Model training method, device, computer equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN113128588A CN113128588A (en) 2021-07-16
CN113128588B true CN113128588B (en) 2024-03-26

Family

ID=76777581

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110416258.XA Active CN113128588B (en) 2021-04-16 2021-04-16 Model training method, device, computer equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN113128588B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113947771B (en) * 2021-10-15 2023-06-27 北京百度网讯科技有限公司 Image recognition method, apparatus, device, storage medium, and program product
CN113989592A (en) * 2021-10-28 2022-01-28 三一建筑机器人(西安)研究院有限公司 Expansion method and device for semantically segmenting image sample and electronic equipment
CN117095257A (en) * 2023-10-16 2023-11-21 珠高智能科技(深圳)有限公司 Multi-mode large model fine tuning method, device, computer equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188824A (en) * 2019-05-31 2019-08-30 重庆大学 A kind of small sample plant disease recognition methods and system
CN110704590A (en) * 2019-09-27 2020-01-17 支付宝(杭州)信息技术有限公司 Method and apparatus for augmenting training samples
CN111400499A (en) * 2020-03-24 2020-07-10 网易(杭州)网络有限公司 Training method of document classification model, document classification method, device and equipment
CN111461209A (en) * 2020-03-30 2020-07-28 深圳市凯立德科技股份有限公司 Model training device and method
CN111739017A (en) * 2020-07-22 2020-10-02 湖南国科智瞳科技有限公司 Cell identification method and system of microscopic image under sample unbalance condition
CN112241715A (en) * 2020-10-23 2021-01-19 北京百度网讯科技有限公司 Model training method, expression recognition method, device, equipment and storage medium
WO2021037280A2 (en) * 2020-06-30 2021-03-04 深圳前海微众银行股份有限公司 Rnn-based anti-money laundering model training method, apparatus and device, and medium
CN112560912A (en) * 2020-12-03 2021-03-26 北京百度网讯科技有限公司 Method and device for training classification model, electronic equipment and storage medium
CN112613575A (en) * 2020-12-30 2021-04-06 清华大学 Data set expansion method, training method and device of image classification model
CN112651317A (en) * 2020-12-18 2021-04-13 中国电子科技集团公司信息科学研究院 Hyperspectral image classification method and system for sample relation learning

Also Published As

Publication number Publication date
CN113128588A (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN113128588B (en) Model training method, device, computer equipment and computer storage medium
CN110990631A (en) Video screening method and device, electronic equipment and storage medium
US20210390319A1 (en) Scene change method and system combining instance segmentation and cycle generative adversarial networks
Kim et al. CityCraft: 3D virtual city creation from a single image
CN107003834B (en) Pedestrian detection device and method
CN112052837A (en) Target detection method and device based on artificial intelligence
CN114331829A (en) Countermeasure sample generation method, device, equipment and readable storage medium
CN112989085B (en) Image processing method, device, computer equipment and storage medium
CN113518256A (en) Video processing method and device, electronic equipment and computer readable storage medium
CN113158554B (en) Model optimization method and device, computer equipment and storage medium
CN112183672A (en) Image classification method, and training method and device of feature extraction network
CN114610910A (en) Group perception-oriented teacher-student group privacy protection method and system
CN116701706B (en) Data processing method, device, equipment and medium based on artificial intelligence
CN106503174A (en) A kind of environment Visualization method and system modeled based on Network Three-dimensional
CN117746015A (en) Small target detection model training method, small target detection method and related equipment
CN115965736B (en) Image processing method, device, equipment and storage medium
CN113570615A (en) Image processing method based on deep learning, electronic equipment and storage medium
KR102348368B1 (en) Device, method, system and computer readable storage medium for generating training data of machine learing model and generating fake image using machine learning model
CN112749364B (en) Webpage generation method, device, equipment and storage medium based on artificial intelligence
Gao Design and Implementation of 3D Animation Data Processing Development Platform Based on Artificial Intelligence
CN113516735A (en) Image processing method, image processing device, computer readable medium and electronic equipment
CN113592765A (en) Image processing method, device, equipment and storage medium
CN113568983A (en) Scene graph generation method and device, computer readable medium and electronic equipment
CN116340552B (en) Label ordering method, device, equipment and storage medium
CN113254635B (en) Data processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40050520

Country of ref document: HK

GR01 Patent grant