CN112070137B - Training data set generation method, target object detection method and related equipment - Google Patents

Training data set generation method, target object detection method and related equipment

Info

Publication number
CN112070137B
CN112070137B (application CN202010892590.9A)
Authority
CN
China
Prior art keywords
image
target
transformation
detection
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010892590.9A
Other languages
Chinese (zh)
Other versions
CN112070137A
Inventor
周驰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202010892590.9A
Publication of CN112070137A
Application granted
Publication of CN112070137B
Legal status: Active


Classifications

    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • H04N21/23418: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/235: Processing of additional data, e.g. scrambling of additional data or processing content descriptors
    • H04N21/8153: Monomedia components involving graphical data, comprising still images, e.g. texture, background image
    • H04N21/83: Generation or processing of protective or descriptive data associated with content; content structuring
    • G06V2201/07: Indexing scheme relating to image or video recognition or understanding; target detection


Abstract

The application relates to the technical field of artificial intelligence and provides a training data set generation method, a target object detection method and related equipment. The training data set is used for training a target detection model, and the target detection model is used for detecting a target object in an image. The training data set generation method comprises the following steps: acquiring a first image comprising a detection object, wherein the detection object comprises a positive example corresponding to the target object; performing image processing on the first image to obtain at least one second image, wherein the image processing comprises at least one of pixel transformation, non-similar geometric transformation and object boundary occlusion processing; synthesizing a background image and at least one target image to obtain a third image, wherein the target image is the first image or the second image; and adding the third image and the position information of the target image in the third image to the training data set in association, thereby realizing automatic generation of the training data set.

Description

Training data set generation method, target object detection method and related equipment
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a training data set generation method, a target object detection method and related equipment.
Background
In video programs, a brand LOGO (logotype/icon) needs to be added to the video image, i.e., the LOGO is coded into the video. In order to ensure the accuracy of LOGO coding, after LOGO coding is completed, the video image to which the LOGO has been added needs to be detected, so as to identify whether the LOGO coded into the video image is accurate. In the related art, LOGO detection is performed on the video image by a target detection model, so as to determine whether the LOGO coded into the video image is the LOGO that actually needs to be coded into the video image.
Before the target detection model is used to detect the LOGO in the video image, the target detection model needs to be trained by a training data set to ensure that the target detection model can accurately detect and identify the LOGO in the image after training. Therefore, a training data set needs to be constructed before training the target detection model.
In the related art, a training data set is constructed by manually collecting images that include the LOGO. To make the number of training samples in the training data set reach a required scale, this manual collection approach involves a large workload, takes a long time and is inefficient. Therefore, how to improve the efficiency of generating the training data set is a technical problem to be solved.
Disclosure of Invention
The embodiments of the application provide a training data set generation method, a target object detection method and related equipment, which at least to a certain extent solve the problem of low efficiency in generating a training data set in the related art.
Other features and advantages of the application will be apparent from the following detailed description, or may be learned by the practice of the application.
According to an aspect of an embodiment of the present application, there is provided a method of generating a training data set for training a target detection model for detecting a target object in an image, the method comprising:
acquiring a first image comprising a detection object, the detection object comprising a positive example corresponding to the target object;
performing image processing on the first image to obtain at least one second image, wherein the image processing comprises at least one of pixel transformation, non-similar geometric transformation and object boundary occlusion processing;
synthesizing a background image and at least one target image to obtain a third image, wherein the target image is at least one of the first image and the second image;
and adding the third image and the position information of the target image in the third image to the training data set in association.
According to an aspect of an embodiment of the present application, there is provided a target object detection method, including: acquiring an image to be detected; inputting the image to be detected into a target detection model, wherein the target detection model is obtained by model training with a training data set, and the training data set is obtained according to the above generation method of a training data set; and performing target object detection on the image to be detected by the target detection model, and outputting a detection result, wherein the detection result at least indicates whether the image to be detected includes the target object.

According to an aspect of an embodiment of the present application, there is provided a generating apparatus of a training data set, the training data set being used for training a target detection model, the target detection model being used for detecting a target object in an image, the apparatus including:
the acquisition module is used for acquiring a first image comprising a detection object, wherein the detection object comprises a positive example corresponding to the target object;
the processing module is used for performing image processing on the first image to obtain at least one second image, wherein the image processing comprises at least one of pixel transformation, non-similar geometric transformation and object boundary occlusion processing;
the synthesizing module is used for synthesizing a background image and at least one target image to obtain a third image, wherein the target image is the first image or the second image;
the adding module is used for adding the third image and the position information of the target image in the third image to the training data set in association.
According to an aspect of an embodiment of the present application, there is provided a target object detection apparatus, including: an image acquisition module used for acquiring an image to be detected; an input module used for inputting the image to be detected into a target detection model, wherein the target detection model is obtained by model training with a training data set, and the training data set is obtained according to the above generation method of a training data set; and a detection module used for performing target object detection on the image to be detected by the target detection model and outputting a detection result, wherein the detection result at least indicates whether the image to be detected includes the target object.

According to an aspect of an embodiment of the present application, there is provided an electronic apparatus, including: a processor; and a memory having stored thereon computer-readable instructions which, when executed by the processor, implement the method of generating a training data set or the method of target object detection as described above.
According to an aspect of an embodiment of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement a method of generating a training data set or a method of target object detection as described above.
In the scheme of the application, the first image comprising the detection object is subjected to image processing to obtain at least one second image, which is equivalent to expanding the set of images comprising the detection object. An image selected from the first image and the second image is then taken as a target image, and the target image is synthesized with a background image to obtain a third image. Finally, the third image and the position information of the target image in the third image are added to the training data set in association. In this way, the training data set for training the target detection model is automatically generated based on a limited number of first images, without completely relying on manual collection and construction of training data, which shortens the time for constructing the training data set and improves the construction efficiency of the training data set.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application. It is evident that the drawings in the following description are only some embodiments of the present application and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art. In the drawings:
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the application may be applied;
FIG. 2 is a flow chart illustrating a method of generating a training data set, according to one embodiment;
FIG. 3 is a flow chart of step 230 in an embodiment corresponding to the embodiment of FIG. 2;
FIG. 4 is a flow diagram illustrating generating a training data set, according to a specific embodiment;
FIG. 5 is a schematic diagram illustrating a generated third image according to an embodiment;
FIG. 6 is a flow chart illustrating a method of target object detection according to one embodiment;
FIG. 7 is a block diagram of an apparatus for generating a training data set, according to an embodiment;
FIG. 8 is a block diagram of a target object detection apparatus according to an embodiment;
Fig. 9 shows a schematic diagram of a computer system suitable for use in implementing an embodiment of the application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Artificial Intelligence (AI) is the theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, artificial intelligence is an integrated technology of computer science that attempts to understand the essence of intelligence and to produce a new intelligent machine that can react in a way similar to human intelligence. Artificial intelligence is the study of the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making. Artificial intelligence technology is a comprehensive subject and relates to a wide range of fields, including both hardware-level technologies and software-level technologies. Artificial intelligence infrastructure technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing (NLP) technology, machine learning/deep learning, and other directions.
With the development of artificial intelligence technology, the artificial intelligence technology has been widely applied in the fields of image recognition and detection, for example, detection and recognition of a target object in an image by adopting a deep learning mode. Specifically, a target detection model is built through a neural network, then the detection model is trained through sample data, and then a target object in an image is automatically identified through the trained target detection model.
In video programs, a brand LOGO (logotype/icon) needs to be added to the video image, i.e., the LOGO is coded into the video. In order to ensure the accuracy of LOGO coding, after LOGO coding is completed, the video image to which the LOGO has been added needs to be detected, so as to identify whether the LOGO coded into the video image is accurate. In the related art, the LOGO in the video image is detected by the target detection model to determine whether the LOGO coded into the video image is the LOGO that actually needs to be coded into the video image.
Before the target detection model is used for identifying the LOGO in the video image, the target detection model needs to be trained with a training data set, so as to ensure that the trained target detection model can accurately detect and identify the LOGO in an image. Therefore, the training data set needs to be constructed before the target detection model is trained.
In the related art, a training data set is generated by manually collecting images that include the LOGO. To make the number of training samples in the training data set reach a required scale, this manual collection approach involves a large workload, takes a long time and is inefficient. Therefore, how to improve the generation efficiency of the training data set is a technical problem to be solved.
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of an embodiment of the present application may be applied.
As shown in FIG. 1, the system architecture 100 may include terminal devices (such as one or more of the smartphone 101, the tablet 102 and the portable computer 103 shown in FIG. 1, and of course also desktop computers and the like), a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices and the server 105. The network 104 may include various connection types, such as wired communication links, wireless communication links, and the like.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
In one embodiment of the present application, the server 105 may acquire a first image including the detection object uploaded by the terminal device, and then perform image processing on the first image to obtain at least one second image, wherein the image processing may be at least one of pixel transformation, non-similar geometric transformation and object boundary occlusion processing. At least one of the first image and the second image is then synthesized with a background image to obtain a third image. Thus, the third image including the detection object is used as a training sample in the training data set, and the training data set is automatically generated.
Further, in the task of detecting the target object, the target detection model needs to detect which target object is included in the image and also needs to determine the position of the detected target object in the image. Therefore, after the third image is obtained, the position information of the target image in the third image is added to the training data set in association with the third image, and the position information of the target image in the third image is used as a label of the third image. In this way, when the target detection model is trained with the third image, it can be determined whether the position of the detection object in the third image detected by the target detection model matches the actual position (namely, the position indicated by the position information of the target image in the third image), and the parameters of the target detection model are then adjusted according to the determination result.
In one embodiment of the present application, after obtaining the second image or the third image, the server 105 may further feed back the second image or the third image to the terminal device, where the user performs screening of the second image and the third image based on the terminal device, and then, the server performs a subsequent operation based on the second image or the third image selected by the user, so as to generate the training data set based on the second image and the third image selected by the user.
In one embodiment of the application, server 105 may also train the target detection model based on the generated training data set. Namely: and inputting the third image comprising the detection object into a target detection model, and detecting the target object of the third image by the target detection model to obtain a detection result. It will be appreciated that, since the third image may be a composite of a plurality of target images and a background image, there may be target objects at a plurality of positions in the third image, and of course, the target objects at a plurality of positions in the third image may be the same or different. Thus, the detection result output by the object detection model for the third image is position dependent, i.e. the detection result indicates at which position in the third image which object is detected.
After the detection result of the third image is obtained, whether the target object detected at the corresponding position is accurate is judged according to the label information corresponding to the third image and the position information associated with the third image. Specifically, the detection result corresponding to the position indicated by the position information is first determined according to the position information associated with the third image. If there is no detection result corresponding to the position indicated by the position information, the parameters of the target detection model are adjusted, and target object detection is then performed on the third image again by the target detection model with the adjusted parameters. Otherwise, if there is a detection result corresponding to the position indicated by the position information, the detection result corresponding to the position information is combined with the label information corresponding to the third image to judge whether the target object indicated by the detection result is consistent with the target object indicated by the label information; if not, the parameters of the target detection model are adjusted, and if so, the target detection model is trained with the next third image in the training data set.
In an embodiment of the present application, the server 105 may further perform model training through the generated training data set to obtain a target detection model, and perform target object detection on the image to be detected through the target detection model, and output a corresponding detection result. The target object detection is performed by detecting whether the image to be detected comprises the target object or not. Wherein the output detection result indicates at least whether the image to be detected includes the target object. In some embodiments of the present application, if the output detection result indicates that the image to be detected includes the target object, the detection result may further include location information of the detected target object in the image to be detected.
It should be noted that, the method for generating a training data set and the method for detecting a target object according to the embodiments of the present application are generally executed by the server 105, and accordingly, the device for generating a training data set and the device for detecting a target object are generally disposed in the server 105. However, in other embodiments of the present application, the terminal device may also have a similar function as the server, so as to perform the method for generating the training data set or the method for detecting the target object provided by the embodiments of the present application.
The implementation details of the technical scheme of the embodiment of the application are described in detail below:
FIG. 2 is a flow chart illustrating a method of generating a training data set for training a target detection model for detecting a target object in an image, which method may be performed by the server shown in FIG. 1, according to one embodiment, and includes at least the following steps 210-240, described in detail below.
Step 210, a first image is acquired that includes a detection object that includes a positive example corresponding to the target object.
The target object generally refers to an object to be detected and identified by the target detection model. It can be appreciated that if the detection tasks of the target detection models are different, there is a difference in the target objects to be detected.
For example, if the detection task of the target detection model is to detect the LOGO in an image, the LOGO to be detected and confirmed is the target object. Specifically, if it is required to detect whether the LOGOs of "Tencent Video", "WeChat Pay" and "QQ Game" in a video or image are accurate, the LOGO corresponding to "Tencent Video", the LOGO corresponding to "WeChat Pay" and the LOGO corresponding to "QQ Game" are the target objects.
For another example, if the detection task of the target detection model is to detect animals in an image, the animal to be detected and confirmed is the target object. Specifically, if it is desired to detect monkeys, pandas and horses in a video or image, the monkeys, pandas and horses are the target objects.
For any detection task, the target detection model may have one or more target objects to be detected and confirmed, which is not specifically limited herein.
A positive example corresponding to a target object refers to a real image of the target object. It will be appreciated that generating the training data set with the positive examples corresponding to the target object may ensure that the target detection model learns the characteristics of the target object during training based on the training data set, thereby enabling the target detection model to be used to detect the target object in the image.
In some embodiments of the present application, only the detection object is included in the first image, which avoids introducing images of other irrelevant things that would increase the amount of subsequent processing of the first image.
Step 220, performing image processing on the first image to obtain at least one second image, where the image processing includes at least one of pixel transformation, non-similar geometric transformation and object boundary occlusion processing.
The second image generally refers to an image obtained by performing image processing on the first image.
The pixel transformation may be at least one of a median filtering process, a transparency adjustment, a brightness adjustment.
A non-similar geometric transformation refers to a geometric transformation that makes the detection object in the second image no longer similar to the detection object in the first image. Correspondingly, a similar geometric transformation refers to a geometric transformation that keeps the detection object in the second image similar to the detection object in the first image.
The non-similar geometric transformation may be at least one of a transverse stretching transformation, a longitudinal stretching transformation, a perspective transformation and an affine transformation. The transverse stretching transformation and the longitudinal stretching transformation are each performed in only one direction, so they may be collectively referred to as unidirectional stretching transformations.
Object boundary occlusion processing refers to cutting out a boundary portion of the detection object in the first image so that the boundary of the detection object in the second image is incomplete.
Next, the above-mentioned image processing method will be specifically described.
The median filtering process is a nonlinear smoothing technique that sets the gray value of each pixel to the median of the gray values of all pixels within a certain neighborhood window of that point. In other words, in an image subjected to median filtering, the gray value of a pixel is the median of the gray values of the pixels within a window of size step x step centered on that pixel. The window size step may be selected according to actual needs and is not specifically limited herein. In one embodiment, the value interval of step may be set to [4, 6] or [10, 12].
Compared with the first image, the image obtained after median filtering has less salt-and-pepper noise, and the definition of the second image is better. In addition, part of the image obtained after median filtering has blurred edges, because when median filtering is performed on pixel points located at the image boundary, some of the surrounding window positions contain no pixel points. Thus, a second image in which the edges of the detection object are blurred can be obtained through the median filtering process.
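As an illustration, a minimal sketch of the median filtering step using OpenCV is given below; the file names are hypothetical, and since OpenCV's medianBlur requires an odd window size, the kernel value shown is illustrative rather than one of the step intervals mentioned above.

```python
# Minimal sketch of the median filtering step (assumes OpenCV; file names are hypothetical).
import cv2

first_image = cv2.imread("first_image.png")          # first image containing the detection object
# a ksize x ksize window is centred on each pixel; OpenCV requires an odd ksize
second_image = cv2.medianBlur(first_image, ksize=5)  # each pixel becomes the median of its window
cv2.imwrite("second_image_median.png", second_image)
```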
In some embodiments of the present application, transparency adjustment of the first image may be achieved by: acquiring the transparency parameter of each pixel point in the first image in a transparency channel; and adjusting the transparency parameter of the corresponding pixel point according to the first adjustment parameter of the pixel point to obtain the second image.
For an image in RGBA format, the pixel value for each pixel includes four component values, R, G, B and A components, respectively. Wherein R represents Red (Red), B represents Blue (Blue), G represents Green (Green), and A represents Alpha (transparency channel). Generally, the image is represented in RGB format, so in order to obtain the transparency parameter of each pixel point in the first image on the transparency channel, the RGB format image needs to be converted into the RGBA format image first, so that the a component of the pixel point is extracted, that is, the transparency parameter of the pixel point on the transparency channel.
In an RGBA-format image, R, G and B are each an integer between 0 and 255 or a percentage between 0% and 100%, and these three component values describe the amounts of the three primary colors red, green and blue in the intended color. The value of the A component ranges from 0.0 to 1.0 and indicates the transparency/opacity of the color of the pixel, where 1 indicates complete opacity and 0 indicates complete transparency.
In some embodiments of the present application, the first adjustment parameter may be a scaling factor set for the transparency parameter, that is, the transparency parameter of the extracted pixel point is multiplied by the first adjustment parameter to obtain the adjusted transparency parameter.
In some embodiments of the present application, the first adjustment parameter may be a parameter variation set for the transparency parameter, that is, the transparency parameter of the extracted pixel point is added or subtracted from the first adjustment parameter to obtain the adjusted transparency parameter.
It should be noted that the first adjustment parameters set for each pixel point in the first image may be the same or different, and are not specifically limited herein.
In some embodiments of the present application, the transparency parameter of each pixel point in the first image is 1, and in order to perform transparency adjustment on the first image, the transparency parameter of each pixel point in the first image is randomly multiplied by an adjustment parameter, where the value range of the adjustment parameter may be 0.85-0.95.
By adjusting the transparency of the first image, an image having a transparency different from that of the first image can be obtained. Of course, in other embodiments, in order to obtain a second image with more diversity, multiple sets of different adjustment parameters may be set, and then transparency adjustment is performed on the first image according to the multiple sets of different adjustment parameters, so that multiple second images with different transparency from the first image may be obtained.
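A minimal sketch of the transparency adjustment described above, assuming OpenCV and NumPy, is given below; the per-pixel scaling range 0.85-0.95 follows the embodiment above, the file names are hypothetical, and the alpha channel is handled on the 0-255 scale used by 8-bit RGBA images.

```python
# Minimal sketch of transparency adjustment (assumes OpenCV/NumPy; file names are hypothetical).
import cv2
import numpy as np

img = cv2.imread("first_image.png", cv2.IMREAD_UNCHANGED)
if img.shape[2] == 3:                                      # add a transparency channel if the image has none
    img = cv2.cvtColor(img, cv2.COLOR_BGR2BGRA)

alpha = img[:, :, 3].astype(np.float32)                    # transparency parameters (A component)
factors = np.random.uniform(0.85, 0.95, size=alpha.shape)  # first adjustment parameter per pixel
img[:, :, 3] = np.clip(alpha * factors, 0, 255).astype(np.uint8)
cv2.imwrite("second_image_alpha.png", img)
```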
In some embodiments of the present application, the brightness adjustment of the first image may be implemented by the following procedure, which specifically includes: converting the first image from RGB space to HSV space; acquiring brightness components of each pixel point of the first image in an HSV space; adjusting the brightness component according to a second adjustment parameter; and inversely transforming the first image to an RGB space according to the adjusted brightness component to obtain the second image.
For an HSV-format image, the pixel value of each pixel includes three components: H, S and V. H (Hue) represents the hue component, S (Saturation) represents the saturation component, and V (Value) represents the brightness component. Specifically, the hue component H is measured as an angle with a value range of 0° to 360°, calculated counterclockwise starting from red: red is 0°, green is 120° and blue is 240°. The saturation component S represents how close the color is to a pure spectral color, with a value range of 0 to 100%; the larger the value, the more saturated the color. The brightness component V represents the brightness of the color, and its value ranges from 0 (black) to 100% (white).
As described above, the image is generally expressed in RGB format, that is, in RGB space, in which each pixel includes three components, respectively: r represents Red (Red), B represents Blue (Blue), and G represents Green (Green).
Therefore, in order to obtain the luminance component of each pixel point in the first image, the first image needs to be converted from the RGB space to the HSV space first.
Specifically, the first image may be converted from the RGB space to the HSV space by the following formulas (1) to (5), and the luminance components of the respective pixels of the first image in the HSV space are thereby determined. Let R' = R/255, G' = G/255 and B' = B/255. Then:

Cmax = max(R', G', B'), (1)

Cmin = min(R', G', B'), (2)

H = 0° if Δ = 0; 60° x ((G' - B')/Δ mod 6) if Cmax = R'; 60° x ((B' - R')/Δ + 2) if Cmax = G'; 60° x ((R' - G')/Δ + 4) if Cmax = B', where Δ = Cmax - Cmin, (3)

S = 0 if Cmax = 0, otherwise S = Δ/Cmax, (4)

V = Cmax. (5)
In some embodiments of the present application, the second adjustment parameter may be a scaling factor set for the luminance component, similar to the transparency adjustment, that is, the second adjustment parameter is multiplied by the luminance component of the extracted pixel point to obtain the adjusted luminance component.
In some embodiments of the present application, the second adjustment parameter may be a parameter variation set for the luminance component, that is, the luminance component of the pixel is added or subtracted from the second adjustment parameter corresponding to the pixel, so as to obtain the adjusted luminance component.
It should be noted that the second adjustment parameters set for each pixel point in the first image may be the same or different, and are not specifically limited herein.
In some embodiments of the present application, a value range may be set for a second adjustment parameter corresponding to each pixel point in the first image, for example, when the second adjustment parameter is a scaling factor set for a luminance component, the value range of the second adjustment parameter is set to be 0.8-1.2, and then the luminance component of each pixel point is randomly multiplied by a number in the value range, so as to obtain the luminance component after adjustment of each pixel point.
After the adjusted luminance component is obtained, the first image is inversely transformed into the RGB space according to the adjusted luminance component, and a second image in RGB format can be correspondingly obtained.
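A minimal sketch of the brightness adjustment described above, assuming OpenCV and NumPy, is given below; the 0.8-1.2 scaling range follows the embodiment above, and the file names are hypothetical.

```python
# Minimal sketch of brightness adjustment in HSV space (assumes OpenCV/NumPy; file names are hypothetical).
import cv2
import numpy as np

img = cv2.imread("first_image.png")                           # first image in RGB (BGR) space
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)

v = hsv[:, :, 2]                                              # brightness component V of each pixel
factors = np.random.uniform(0.8, 1.2, size=v.shape)           # second adjustment parameter per pixel
hsv[:, :, 2] = np.clip(v * factors, 0, 255)

second_image = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)  # inverse transform back to RGB space
cv2.imwrite("second_image_brightness.png", second_image)
```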
Perspective transformation uses the condition that the perspective center, the image point and the target point are collinear and, according to the law of perspective rotation, rotates the image-bearing plane (perspective plane) around the trace line (perspective axis) by a certain angle. This destroys the original bundle of projection rays while keeping the projected geometric figure on the bearing plane unchanged.
The general transformation formula of the perspective transformation is as follows:

[x', y', w']ᵀ = A · [u, v, 1]ᵀ, (6)

where u and v are the coordinates of a pixel point in the image before the perspective transformation, and the coordinates of the corresponding pixel point in the image obtained by the perspective transformation are x and y, with

x = x'/w', (7)

y = y'/w', (8)

and where the matrix

A = [[a11, a12, a13], [a21, a22, a23], [a31, a32, a33]]

in formula (6) is the perspective transformation matrix, in which a33 is equal to 1.
In some embodiments of the present application, the perspective transformation of the first image may be performed by the following procedure, specifically including: acquiring coordinates of four first feature points in the first image and coordinates of a first target point appointed for each first feature point; calculating according to the coordinates of the four first feature points and the coordinates of the corresponding first target point to obtain a perspective transformation matrix; and performing perspective transformation on the first image according to the perspective transformation matrix to obtain a second image.
The first target point is a corresponding point of the designated first feature point after perspective transformation. That is, in the present embodiment, the perspective transformation matrix is calculated based on coordinates of four points corresponding to the known transformation.
Combining the above formulas (6) to (8) gives:

x = (a11·u + a12·v + a13) / (a31·u + a32·v + 1), (9)

y = (a21·u + a22·v + a23) / (a31·u + a32·v + 1), (10)

so the coordinates of the pixel points in the image obtained after the transformation can be determined by formula (9) and formula (10).
In this embodiment, the coordinates of the four first feature points in the acquired first image and the coordinates of the first target point corresponding to each first feature point are substituted into the above formulas (9) and (10), so that the values of the unknown parameters a11, a12, a13, a21, a22, a23, a31 and a32 in the perspective transformation matrix can be calculated. The perspective transformation matrix is thus determined.
Then, based on the obtained perspective transformation matrix, the coordinates of each pixel point in the first image are substituted into the above formulas (9) and (10), so that the coordinates after the perspective transformation can be obtained, and the coordinates of each pixel point in the second image obtained by the perspective transformation are correspondingly determined.
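A minimal sketch of the four-point perspective transformation, assuming OpenCV, is given below; the feature point and target point coordinates are made-up illustrative values, not values prescribed by this application.

```python
# Minimal sketch of the four-point perspective transformation (assumes OpenCV; coordinates are illustrative).
import cv2
import numpy as np

img = cv2.imread("first_image.png")
h, w = img.shape[:2]

# four first feature points (here the image corners) ...
src_pts = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
# ... and the first target points designated for them after the transformation
dst_pts = np.float32([[10, 5], [w - 20, 0], [w - 1, h - 1], [0, h - 15]])

M = cv2.getPerspectiveTransform(src_pts, dst_pts)   # 3x3 perspective transformation matrix
second_image = cv2.warpPerspective(img, M, (w, h))
cv2.imwrite("second_image_perspective.png", second_image)
```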
Affine transformation, also called affine mapping, is a linear transformation of an image in one vector space followed by a translation into another vector space. It is a linear mapping from two-dimensional coordinates to two-dimensional coordinates that preserves the "straightness" (straight lines remain straight after the transformation) and "parallelism" (the relative positional relationship between two-dimensional figures is unchanged: parallel lines remain parallel and the order of points on a straight line is unchanged) of two-dimensional figures. The affine transformation can be expressed by the following formula:

x' = a·x + b·y + c, y' = d·x + e·y + f, (11)

In formula (11), x and y are the coordinates of a pixel before the affine transformation, x' and y' are the coordinates of the pixel after the affine transformation, and the matrix

M = [[a, b, c], [d, e, f]]

is the affine transformation matrix.
In some embodiments of the present application, affine transformation of the first image may be performed by the following procedure, specifically including: acquiring coordinates of three second feature points in the first image, and acquiring coordinates of a second target point designated for each second feature point; calculating to obtain an affine transformation matrix according to the coordinates of the second characteristic points and the coordinates of the corresponding second target points; and carrying out affine transformation on the first image according to the affine transformation matrix to obtain the second image.
Wherein the second target point is a corresponding point of the second feature point after affine transformation is specified. That is, in the present embodiment, the affine transformation matrix is calculated based on coordinates of three points corresponding to the known affine transformation.
After the coordinates of the three second feature points and the coordinates of the second target point corresponding to each second feature point are obtained, they are substituted into the above formula (11), and the parameters a, b, c, d, e and f in the affine transformation matrix can be calculated correspondingly; the affine transformation matrix is thus determined.
After determining the affine transformation matrix, substituting the coordinates of each pixel point in the first image into the formula (11), and calculating to obtain the coordinates of each pixel point after affine transformation, namely determining the coordinates of each pixel point in the second image obtained by affine transformation.
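Analogously, a minimal sketch of the three-point affine transformation, assuming OpenCV, is given below; the point coordinates are made-up illustrative values.

```python
# Minimal sketch of the three-point affine transformation (assumes OpenCV; coordinates are illustrative).
import cv2
import numpy as np

img = cv2.imread("first_image.png")
h, w = img.shape[:2]

src_pts = np.float32([[0, 0], [w - 1, 0], [0, h - 1]])       # three second feature points
dst_pts = np.float32([[5, 10], [w - 10, 5], [15, h - 1]])    # designated second target points

M = cv2.getAffineTransform(src_pts, dst_pts)                 # 2x3 affine transformation matrix
second_image = cv2.warpAffine(img, M, (w, h))
cv2.imwrite("second_image_affine.png", second_image)
```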
In some embodiments of the present application, the non-similar geometric transformation may also be a unidirectional stretching transformation, which may be performed on the first image by the following procedure: obtaining a stretching coefficient in a target stretching direction, wherein the target stretching direction is a height direction or a width direction of the first image; and carrying out stretching transformation on the first image in the target stretching direction according to the stretching coefficient.
The unidirectional stretching transformation can be a transverse stretching transformation or a longitudinal stretching transformation. The lateral stretching transformation is to stretch the width according to a stretching coefficient while keeping the height of the image unchanged, in which case the target stretching direction is the width direction of the image. The longitudinal stretching transformation is to stretch the height according to a stretching coefficient while keeping the width of the image unchanged, in which case the target stretching direction is the height direction of the image.
The target stretching direction and the stretching coefficient in the target stretching direction may be set according to actual needs, and are not particularly limited herein. It will be appreciated that in order to obtain a plurality of second images, a plurality of sets of stretch coefficients for achieving either longitudinal stretching or transverse stretching may be set.
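A minimal sketch of a unidirectional (here transverse) stretching transformation, assuming OpenCV, is given below; the stretching coefficient 1.3 is an illustrative value.

```python
# Minimal sketch of a transverse (width-direction) stretching transformation (assumes OpenCV).
import cv2

img = cv2.imread("first_image.png")
h, w = img.shape[:2]

stretch = 1.3                                           # stretching coefficient in the width direction
second_image = cv2.resize(img, (int(w * stretch), h))   # height unchanged, width stretched
cv2.imwrite("second_image_stretch.png", second_image)
```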
In some embodiments of the present application, the image processing includes an object boundary occlusion process, and performing the object boundary occlusion process on the first image may be implemented by: performing edge detection on the detection object in the first image, and determining the position of the edge of the detection object in the first image; determining a target removal area in the first image according to the position of the determined boundary, wherein the boundary part of the detection object is positioned in the target removal area; and removing the target removing area from the first image to obtain the second image.
By performing edge detection, the position of the edge of the detection object in the first image can be correspondingly determined. Thus, the cutting line for cutting the detection object in the first image is determined with the position of the edge of the detection object as a reference.
The cutting line may be a line set at a distance from the boundary to be removed in the detection object. In other embodiments, the cutting line may also be a straight line determined from at least two points on the boundary to be removed in the detection object. Of course, in other embodiments, the cutting line may also be a curve, a broken line, etc., which is not specifically limited herein.
After determining the cutting line, an area enclosed between the cutting line and a boundary to be removed in the detection object is a target removal area. Therefore, the first image is cut directly according to the cutting line, and the target removing area can be removed from the first image.
Since the target removal area containing part of the edge of the detection object is removed from the first image, after the removal the edge of the detection object in the first image is incomplete, and the edge of the detection object can thus be regarded as partially occluded. Therefore, the second image obtained after removing the target removal area from the first image can simulate the situation in which an object is partially occluded in practice. After the target detection model is trained based on a training data set constructed from such second images, the target detection model can detect and recognize target objects whose edges are partially occluded.
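A minimal sketch of the object boundary occlusion processing, assuming OpenCV and NumPy, is given below; it locates the detection object's edges with Canny edge detection and then cuts away a strip containing part of the right boundary. The thresholds, the chosen boundary and the strip width are illustrative assumptions.

```python
# Minimal sketch of object boundary occlusion processing (assumes OpenCV/NumPy; parameters are illustrative).
import cv2
import numpy as np

img = cv2.imread("first_image.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, 50, 150)                        # edge detection on the detection object

ys, xs = np.nonzero(edges)                              # positions of the detected edge pixels
right_edge = xs.max()                                   # right-most boundary of the detection object

# cutting line placed a small distance inside the right boundary; the area between the
# cutting line and that boundary is the target removal area
cut_x = max(right_edge - 10, 1)
second_image = img[:, :cut_x]                           # remove the target removal area from the first image
cv2.imwrite("second_image_occluded.png", second_image)
```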
It is noted that in step 220, the image processing performed on the first image may be at least one of the above-listed pixel transformations (median filtering, transparency adjustment, brightness adjustment), non-similar geometric transformations (perspective transformation, affine transformation, transverse stretching transformation, longitudinal stretching transformation) and object boundary occlusion processing.
It will be appreciated that different image processing means yield correspondingly different second images. Thus, by processing the first image with any one of the above image processing modes, or with a combination of two or more of them, a plurality of different second images can be obtained. In other words, the process of step 220 corresponds to expanding the set of images including the detection object, and each expanded second image still includes the detection object. Thus, a training data set may be generated based on the second images obtained by this image augmentation and the initially obtained first image.
From another perspective, in practice, differences in shooting angles, light rays, brightness, and the like may cause differences in acquired images including the target object, so in general, in order to ensure that the target detection model can identify the target object in the images under different scenes, the images acquired under different scenes including the target object need to be acquired to train the target detection model.
Through the above procedure of step 220, the second image obtained by performing image processing on the first image may be equivalent to an image acquired under a different scene and including the target object. That is, the image (i.e., the second image) containing the target object in a plurality of scenes is obtained through image processing based on the same image (i.e., the first image) containing the target object, without the need of manually performing actual scene simulation and image acquisition in each scene, thereby greatly saving the time of image acquisition and reducing the workload of image acquisition.
With continued reference to fig. 2, in step 230, the background image and at least one target image are combined to obtain a third image, where the target image is the first image or the second image.
In the image composition, a background image may be combined with a target image, or a background image may be combined with two or more target images, in other words, a third image may include one target image, or two or more target images.
And in the process of image synthesis, the background image is used as a background, and the target image is attached to the background image to obtain a third image.
In some embodiments of the application, the size of the target image is smaller than the background image to ensure that the target image does not completely cover the background image after fitting the target image into the background image.
In some embodiments of the present application, a background image library may be pre-constructed to select an image from the background image library as a background image when image synthesis is required. The images in the background image library can be derived from a public database and can also be screenshot in video.
In a specific embodiment of the present application, the target detection model is used to detect the LOGO in an image. In this application scenario, the images in the background image library may be derived from the OpenLogo database, or may be screenshots taken from various kinds of synthetic videos; for example, the background image library is constructed from pictures in the OpenLogo database and screenshots from such videos at a ratio of 1:1.
In some embodiments of the present application, to facilitate selection of the target image, a target image library may be further constructed, where the target image library is used to store the first image and the second image obtained by performing image processing on the first image. Thus, a plurality of images are selected from the target image library as target images for image synthesis.
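A minimal sketch of synthesizing a third image, assuming OpenCV, is given below; a target image selected from the target image library is attached to a background image at a random position, and the position information to be associated with the third image is recorded. The file names and the rectangular paste operation are assumptions.

```python
# Minimal sketch of synthesizing a third image from a background image and one target image
# (assumes OpenCV, hypothetical file names; the target image is assumed smaller than the background).
import random
import cv2

background = cv2.imread("background.jpg")               # background image
target = cv2.imread("target_image.png")                 # target image (a first image or a second image)

bh, bw = background.shape[:2]
th, tw = target.shape[:2]

x = random.randint(0, bw - tw)                          # random top-left corner for the target image
y = random.randint(0, bh - th)
third_image = background.copy()
third_image[y:y + th, x:x + tw] = target                # attach the target image onto the background

position_info = (x, y, x + tw, y + th)                  # position of the target image in the third image
cv2.imwrite("third_image.jpg", third_image)
```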
In some embodiments of the application, the detection object further comprises a counterexample corresponding to the target object. A counterexample corresponding to the target object refers to an image that is similar to the target object and is easily mistaken for the target object. A training data set is generated from first images including counterexamples corresponding to the target object together with first images including positive examples corresponding to the target object, so that the target detection model trained with this training data set can recognize not only the target object in an image but also the counterexamples corresponding to the target object. This reduces the probability that the target detection model erroneously detects an image similar to the target object as the target object, thereby improving the detection accuracy of the target detection model for the target object.
In some embodiments of the present application, in a scenario where the detection object further includes counterexamples corresponding to the target object, the target image library may also be divided into a positive example library and a counterexample library. The positive example library is used for storing first images that include positive examples corresponding to the target object and the second images obtained from those first images, and the counterexample library is used for storing first images that include counterexamples corresponding to the target object and the second images obtained from those first images. For convenience of distinction, the images in the positive example library are referred to as positive example images, and the images in the counterexample library are referred to as counterexample images.
In some embodiments of the present application, during application of the target detection model, an image that the model mistakenly identifies as the target object may be stored as a first image in the counterexample library, the newly stored first image is subjected to the image processing of step 220 above to obtain a plurality of second images, and a third image is then generated based on the newly stored first image and the newly generated second images, so that the target detection model can be updated and trained based on the newly obtained third images.
In some embodiments of the present application, in the process of synthesis with a background image, only positive example images may be selected to be synthesized with the background image, some positive example images and some counterexample images may be selected to be synthesized with the background image, or only counterexample images may be selected to be synthesized with the background image.
For the case where some positive example images and some counterexample images are selected to be synthesized with the background image, the background image and the selected counterexample images may first be synthesized, the resulting image is used as the background image again, and the selected positive example images are then synthesized with the image obtained by the previous synthesis, so that a third image that includes both counterexample images and positive example images is obtained.
Step 240, adding the third image and the position information of the target image in the third image to the training dataset in association.
Through the above procedure of steps 210-240, enrichment of the training data set may be achieved, and further, the target detection model may be trained with a third image in the training data set and position information of the target image in the third image, where the position information of the target image in the third image may be used to characterize the position of the detection object in the third image.
In some embodiments of the present application, an annotation file may further be provided in the training dataset, the annotation file storing the position information of the target image in the third image.
As described above, since a third image may be a composite of a background image and a plurality of target images, the position information associated with the third image may include a plurality of position information items, one for each target image.
In some embodiments of the present application, since a counterexample image includes a counterexample corresponding to the target object, when a counterexample image is used as a target image, the position information of the counterexample image in the third image generated from it may be omitted from the training data set, and only the position information of the positive example images used as target images in that third image is stored in the training data set.
In some embodiments of the present application, for a third image comprising a positive example image, the third image is labeled at the same time as it is stored in the training data set, where the label is used to indicate the target object included in the third image.
When the data in the training data set reaches the set sample size, the third images in the training data set and the position information associated with them can be used as training samples to train the target detection model. Specifically, a third image is input into the target detection model, the target detection model extracts the content features of the third image and determines a detection result based on the extracted content features; the detection result indicates whether the target object is included in the third image, and if so, the detection result further indicates the position of the target object in the third image.
After the detection result corresponding to the third image is obtained, whether the parameters of the target detection model need to be adjusted is determined according to the detection result of the third image, the label of the third image, and the position information associated with the third image. Specifically, if it is determined from the detection result and the associated position information that the target detection model has misidentified, the parameters of the target detection model are adjusted and the third image is detected again by the adjusted model; otherwise, if the detection result generated by the target detection model is accurate, training continues with the next training sample until the loss function of the target detection model converges.
Misidentification here includes identifying another, non-target object in the third image as the target object, and identifying the target object at a position in the third image that differs from the position indicated by the position information associated with the third image.
In some embodiments of the application, the target detection model may be constructed from one or more neural networks, for example, convolutional neural networks, long short-term memory (LSTM) networks, gated recurrent neural networks, and the like.
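As an illustration of the training procedure described above, the following is a minimal sketch of one training step, assuming a torchvision Faster R-CNN is used as the target detection model; the model choice, the box format, and the variable names are assumptions for illustration only and are not prescribed by the application.

```python
# Minimal sketch of one training step on a third image, assuming a torchvision
# Faster R-CNN detector with one foreground class (the target object).
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=2)  # background + target object
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)

def train_step(third_image, boxes):
    """third_image: CxHxW float tensor in [0, 1];
    boxes: Nx4 float tensor of [x1, y1, x2, y2] positions of the target images."""
    model.train()
    targets = [{"boxes": boxes,
                "labels": torch.ones(len(boxes), dtype=torch.int64)}]
    loss_dict = model([third_image], targets)   # classification and box-regression losses
    loss = sum(loss_dict.values())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

If the loss does not decrease (i.e., the model still misidentifies the third image), the step is repeated with the adjusted parameters; otherwise training proceeds with the next sample, as described above.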
In the scheme of the application, the first image comprising the detection object is subjected to image processing to obtain at least one second image, which is equivalent to expanding the set of images comprising the detection object. An image selected from the first image and the second image is then used as a target image and synthesized with a background image to obtain a third image, and finally the third image and the position information of the target image in the third image are added to the training data set in association. In this way, the training data set for training the target detection model is generated automatically from a limited number of first images, without relying entirely on manual collection and construction of training data, which shortens the time for constructing the training data set and improves its construction efficiency.
In one embodiment of the present application, as shown in FIG. 3, step 230 includes:
In step 310, synthesis indication information is obtained, where the synthesis indication information includes similar geometric transformation parameters and a fitting number.

The fitting number refers to the number of target images to be attached to a background image.

Step 320, selecting a corresponding number of target images according to the fitting number.

Step 330, performing a similar geometric transformation on each selected target image according to the similar geometric transformation parameters.
In some embodiments of the application, the similar geometric transformation includes scaling, and the similar geometric transformation parameters include a map scale; step 330 includes: determining the target size of the selected target image according to the map scale and the size of the background image; and scaling the selected target image so that its size reaches the target size.
The map scale indicates the dimensional scale relationship of the target image to the background image in the third image.
In some embodiments of the present application, the width of the background image may be used as a reference, and then the target width of the target image may be calculated according to the width of the background image and the mapping proportion; and calculating a scaling factor according to the actual width of the target image and the calculated target width, so as to scale the target image according to the scaling factor, and enabling the width of the scaled target image to reach the target width. In other embodiments, the target image may be scaled with the height of the background image as a reference, and the process is similar to scaling with the width of the background image as a reference, which is not described herein.
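As a concrete illustration of the width-referenced scaling described above, the following sketch computes the target width from the map scale and the background width and scales the target image accordingly; OpenCV is an assumed dependency, and the function and variable names are illustrative rather than taken from the application.

```python
# Hedged sketch of width-referenced scaling according to the map scale.
import cv2

def scale_to_map_ratio(target_img, background_img, map_ratio):
    bg_h, bg_w = background_img.shape[:2]
    tgt_h, tgt_w = target_img.shape[:2]
    target_width = int(bg_w * map_ratio)        # target width derived from the map scale
    scale_factor = target_width / tgt_w         # scaling factor: target width vs. actual width
    target_height = int(tgt_h * scale_factor)   # keep the aspect ratio of the target image
    return cv2.resize(target_img, (target_width, target_height))
```

Height-referenced scaling works the same way with the roles of width and height exchanged.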
In some embodiments of the application, the similarity geometric transformation comprises a rotation transformation, and the similarity geometric transformation parameters comprise rotation parameters; step 330 includes: and rotating the selected target image according to the rotation angle indicated by the rotation parameter and the indicated rotation direction.
In some embodiments of the present application, the proportion of target images to be rotation-transformed may be preset, so that during the pasting process some target images are rotated and others are not, with the proportion of rotated target images among all pasted target images satisfying the preset proportion. For example, if the rotation proportion is set to 1/6, any target image to be attached is rotated with a probability of 1/6.
In some embodiments of the present application, the rotation angle indicated by the rotation parameter and the indicated rotation direction may be determined according to a set rule. For example, if the rotation angle is set in advance to a range of 10 ° to 45 °, and the optional rotation directions are set to be clockwise and counterclockwise, an angle may be selected from the rotation angle range as the rotation angle, and a direction may be selected from the optional rotation directions as the rotation direction.
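The rotation rule above can be sketched as follows, using the example values given in the text (a 1/6 rotation proportion, a 10° to 45° angle range, and a clockwise or counterclockwise direction); OpenCV is again an assumed dependency, and the random sampling policy is only one possible realization of a "set rule".

```python
# Illustrative sketch of the optional rotation transformation of a target image.
import random
import cv2

def maybe_rotate(target_img, rotate_prob=1/6, angle_range=(10, 45)):
    if random.random() >= rotate_prob:
        return target_img                           # most target images are pasted unrotated
    angle = random.uniform(*angle_range)            # rotation angle from the preset range
    if random.choice(["clockwise", "counterclockwise"]) == "clockwise":
        angle = -angle                              # OpenCV treats positive angles as counterclockwise
    h, w = target_img.shape[:2]
    matrix = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(target_img, matrix, (w, h))
```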
In some embodiments of the present application, candidate fitting ratios may also be set for each fitting number, and when fitting is required, one of the candidate fitting ratios is selected as the fitting ratio. The fitting ratio may be selected by a user, according to a set rule, or at random, which is not limited here.
In some embodiments of the present application, the fitting number may be 1 (i.e., one target image is attached to a background image) or 4 (i.e., four target images are attached to a background image). The candidate fitting ratios set for the fitting number 1 include 0.3, 0.35, 0.4, 0.45, 0.5, and 0.6. The candidate fitting ratios set for the fitting number 4 include 0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2, and 0.25.
In some embodiments of the present application, when fitting target images to the same background image, the similar geometric transformation may be performed on the target images according to a single fitting ratio or according to multiple fitting ratios. For example, some target images may be transformed according to a fitting ratio selected from the candidates corresponding to the fitting number 4, and other target images according to a fitting ratio selected from the candidates corresponding to the fitting number 1. In this way, target images of multiple sizes can appear in the same background image, which facilitates better training of the target detection model.
In some embodiments of the present application, candidate fitting ratios may also be set separately for fitting target images corresponding to positive examples of the target object and for fitting target images corresponding to counterexamples of the target object. For example, for the fitting number 4, the candidate fitting ratios set for target images corresponding to positive examples include 0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2, and 0.25, while the candidate fitting ratios set for target images corresponding to counterexamples include 0.06, 0.08, 0.1, and 0.12.
Of course, the above is merely an illustrative example of candidate fitting ratios and fitting numbers, and in other embodiments they may be set according to actual needs.
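For illustration, a small sketch of selecting a fitting number and one of its candidate fitting ratios is given below; the random-choice policy is only one of the selection options mentioned above (user selection and rule-based selection are equally valid), and the dictionary layout is an assumption.

```python
# Sketch of choosing the fitting number and a candidate fitting ratio,
# using the example values from the text.
import random

CANDIDATE_RATIOS = {
    1: [0.3, 0.35, 0.4, 0.45, 0.5, 0.6],                    # one target image per background
    4: [0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2, 0.25],   # four target images per background
}

def pick_fit_config():
    fit_number = random.choice(list(CANDIDATE_RATIOS))
    fit_ratio = random.choice(CANDIDATE_RATIOS[fit_number])
    return fit_number, fit_ratio
```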
With continued reference to fig. 3, in step 340, the target image after performing the similar geometric transformation is attached to the background image, so as to obtain the third image.
In some embodiments of the present application, step 340 comprises: selecting, from the background image, a number of map areas equal to the fitting number as target areas, where the map areas are determined by dividing the background image; and attaching the target images subjected to the similar geometric transformation to the target areas respectively to obtain the third image, where one target area is used for attaching one target image.
In some embodiments of the present application, the background image may be divided into equal regions according to a set size; for example, the background image is divided into a 2×2 grid, with each grid cell used as a map area. The background image may also, for example, be divided into a 3×3 grid.
In some embodiments of the present application, in the process of fitting, the center of the target image subjected to the similar geometric transformation is aligned with the center of the target region, and then the target image subjected to the similar geometric transformation is fitted in the target region.
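A minimal sketch of this map-area division and centre-aligned attachment might look as follows; the 2×2 grid, the direct pixel copy (no alpha blending), and the assumption that each scaled target image fits inside its map area are simplifications for illustration.

```python
# Sketch of dividing the background into map areas and pasting one target
# image at the centre of each chosen area (assumes each target fits its area).
import random

def paste_into_grid(background, target_imgs, grid=(2, 2)):
    bg_h, bg_w = background.shape[:2]
    cell_h, cell_w = bg_h // grid[0], bg_w // grid[1]
    cells = [(r, c) for r in range(grid[0]) for c in range(grid[1])]
    third_image = background.copy()
    for (r, c), tgt in zip(random.sample(cells, len(target_imgs)), target_imgs):
        t_h, t_w = tgt.shape[:2]
        cy = r * cell_h + cell_h // 2               # centre of the map area
        cx = c * cell_w + cell_w // 2
        y0, x0 = cy - t_h // 2, cx - t_w // 2       # align the centres of area and target
        third_image[y0:y0 + t_h, x0:x0 + t_w] = tgt
    return third_image
```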
In some embodiments of the present application, after the target image subjected to the similar geometric transformation is attached to the target area of the background image, the resulting third image may further undergo image enhancement processing.
The scheme of the application is further described below in connection with a specific embodiment.
In this embodiment, the target detection model is used to detect a LOGO in an image. FIG. 4 is a flow chart illustrating generation of a training data set according to a specific embodiment. As shown in fig. 4, a LOGO image and a counterexample image for the LOGO are input, the LOGO image and the counterexample image are subjected to pixel transformation and then to unidirectional stretching transformation, the transformed LOGO image is attached to a background image, and the transformed counterexample image is attached to the background image at random.
In this process, the image obtained by attaching the counterexample image to the background image may itself be used as the background image again, to which the transformed counterexample image and the transformed LOGO image are then attached.
It should be noted that, in the image processing performed here, the pixel transformation is performed first and then the unidirectional stretching transformation, and the order of the two transformations may also be exchanged; the pixel transformation may be at least one of the pixel transformations listed above, and likewise the unidirectional stretching transformation may be at least one of those listed above.
In this embodiment, when the LOGO image, or a second image obtained by performing image processing on the LOGO image, is attached to the background image, it may be scaled and attached according to at least one of the candidate fitting ratios (0.3, 0.35, 0.4, 0.45, 0.5, 0.6) set for the fitting number 1 (for convenience of distinction, this is referred to as large-size fitting), or scaled and attached according to at least one of the candidate fitting ratios (0.05, 0.075, 0.1, 0.125, 0.15, 0.175, 0.2, 0.25) set for the fitting number 4 (referred to as small-size fitting).
In this embodiment, when the counterexample image is attached to the background image, it is scaled and attached according to at least one of the candidate fitting ratios (0.06, 0.08, 0.1, 0.12) set for the fitting number 4.
In this embodiment, after the target image is attached to the background image to obtain the third image, enhancement processing is further performed on the third image, and the enhanced third image is then added to the training dataset.
Fig. 5 shows a schematic representation of a third image obtained by the method according to the application. As shown in fig. 5, the target images fitted into the background image include a positive example image of the LOGO (for ease of identification, the positive example image fitted into the background image is marked with a box in fig. 5) and a negative example image.
Assuming that the number of third images in the training dataset is t, the number of selected background images is k = 0.01t, and the number of collected LOGO images is p. Among the obtained third images, the number of third images to which four target images are attached is 0.95t, and the number of third images to which one target image is attached is 0.05t.
Further, assuming that the number of positive example images obtained by expanding the LOGO images must be not less than 50, the expansion multiple epoch of the LOGO images can be calculated to satisfy:
epoch≥50/p, (12)
Since the number of third images to which four target images are attached is 0.95t, the number of positive example images used (counting repetitions) is 0.95t*4; and since each expanded positive example image is used no more than 200 times, it follows that:
(1+epoch)*p ≥ (0.95t*4)/200, (13)
Approximating equation (13) (taking 1+epoch ≈ epoch) gives:

epoch ≥ (0.95t*4)/(200p) = 0.95t/(50p), (14)
Combining formulas (12) and (14), the expansion multiple epoch of the LOGO images satisfies:

epoch ≥ max(0.95t/(50p), 50/p), (15)
Correspondingly, the number of positive example images obtained by expanding the LOGO images is (1+epoch)*p. It can be seen that the method according to the present disclosure can greatly enrich the number of training samples (i.e., the number of third images) in the training dataset.
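As a worked example of formula (15), with illustrative values for t and p (these numbers are assumptions, not values given in the text):

```python
# Worked example of formula (15): minimum expansion multiple for t third images
# and p collected LOGO images (illustrative numbers only).
t, p = 100_000, 20
epoch = max(0.95 * t / (50 * p), 50 / p)   # formula (15)
print(epoch)                               # 95.0 -> at least (1 + 95) * 20 = 1920 positive example images
```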
It should be noted that the above limits on the number of times each expanded positive example image is used and on the number of positive example images obtained by expanding the LOGO images are merely illustrative; in other embodiments they may be set according to actual needs.
FIG. 6 is a flow chart illustrating a target object detection method according to one embodiment, which may be performed by the server shown in FIG. 1 and includes at least the following steps 610-630, described in detail below.
In step 610, an image to be detected is acquired.
Step 620, inputting the image to be detected into a target detection model, wherein the target detection model is obtained by model training through a training data set, and the training data set is obtained according to the method for generating the training data set in any embodiment.
And step 630, performing target object detection on the image to be detected by the target detection model, and outputting a detection result, wherein the detection result at least indicates whether the image to be detected comprises a target object.
As described above, the target detection model corresponds to a detection task indicating a target object to be detected. Because the training data set is generated by image expansion from the first image comprising the detection object, after model training is performed on the generated training data set to obtain the target detection model, the target detection model has learned the image characteristics of the target object and can correspondingly detect whether the image to be detected includes the target object.
The target detection model may be constructed by one or more neural networks such as a convolutional neural network and a recurrent neural network, and is not specifically limited herein.
In some embodiments of the present application, if the output detection result indicates that the image to be detected includes the target object, the detection result may further include position information of the target object in the image to be detected, where the position information is used to indicate a position of the target object in the image to be detected.
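A hedged sketch of this detection flow is given below, again assuming a torchvision-style detector whose predictions contain boxes and scores; the score threshold and the returned field names are illustrative assumptions.

```python
# Sketch of running the trained target detection model on an image to be detected.
import torch

@torch.no_grad()
def detect(model, image_to_detect, score_threshold=0.5):
    model.eval()
    prediction = model([image_to_detect])[0]          # dict with "boxes", "labels", "scores"
    keep = prediction["scores"] >= score_threshold
    return {
        "contains_target": bool(keep.any()),          # whether the target object is present
        "positions": prediction["boxes"][keep],       # positions of the target object, if any
    }
```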
In the scheme of the application, the training data set obtained according to the method for generating the training data set comprises the third image and the position information of the target image in the third image, wherein the position information of the target image in the third image indicates the position of the target object in the third image, so that the target detection model trained by the training data set can accurately detect and determine the position of the target object in the image to be detected, and the position information is output.
According to the scheme, the training data set is constructed based on the first image comprising the target object and the second image obtained by performing image processing on the first image, so that the target detection model obtained by training through the training data set can identify the target object in images in different states (for example, under different shooting angles), and the accuracy of target object detection can be ensured. The target detection model obtained through training can automatically and quickly detect whether the image to be detected comprises the target object, and the efficiency and the accuracy of target object detection are greatly improved.
The following describes embodiments of the apparatus of the present application that may be used to perform the methods of the above-described embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
The present application provides a training data set generating device 700, where the training data set generating device 700 may be configured in a server shown in fig. 1, where the training data set is used to train a target detection model, and the target detection model is used to detect a target object in an image, as shown in fig. 7, and the training data set generating device 700 includes:
An acquisition module 710 is configured to acquire a first image including a detection object, where the detection object includes a positive example corresponding to the target object.
A processing module 720, configured to perform image processing on the first image to obtain at least one second image, where the image processing includes at least one of pixel transformation, dissimilar geometry transformation, and object boundary occlusion processing.
And a synthesizing module 730, configured to synthesize the background image with at least one target image to obtain a third image, where the target image is the first image or the second image.
An adding module 740 is configured to add the third image and the positional information of the target image in the third image to the training dataset in association with each other.
In some embodiments of the application, the non-similar geometric transformations include perspective transformations, and the processing module 720 is configured to: acquiring coordinates of four first feature points in the first image and coordinates of a first target point appointed for each first feature point; calculating to obtain a perspective transformation matrix according to the coordinates of the four first characteristic points and the coordinates of the corresponding first target point; and performing perspective transformation on the first image according to the perspective transformation matrix to obtain the second image.
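A possible realization of this perspective-transformation step, assuming OpenCV, is sketched below; the function name and the point format are illustrative.

```python
# Sketch of perspective transformation: four first feature points and their
# designated first target points determine a 3x3 perspective matrix.
import numpy as np
import cv2

def perspective_second_image(first_image, feature_pts, target_pts):
    """feature_pts, target_pts: lists of four (x, y) points."""
    matrix = cv2.getPerspectiveTransform(np.float32(feature_pts),
                                         np.float32(target_pts))
    h, w = first_image.shape[:2]
    return cv2.warpPerspective(first_image, matrix, (w, h))
```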
In some embodiments of the application, the non-similar geometric transformation comprises an affine transformation, and the processing module 720 is configured to: acquiring coordinates of three second feature points in the first image, and acquiring coordinates of a second target point designated for each second feature point; calculating to obtain an affine transformation matrix according to the coordinates of the second characteristic points and the coordinates of the corresponding second target points; and carrying out affine transformation on the first image according to the affine transformation matrix to obtain the second image.
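The affine case can be sketched analogously, again assuming OpenCV; three second feature points and their designated second target points determine a 2×3 affine matrix.

```python
# Sketch of affine transformation of the first image into a second image.
import numpy as np
import cv2

def affine_second_image(first_image, feature_pts, target_pts):
    """feature_pts, target_pts: lists of three (x, y) points."""
    matrix = cv2.getAffineTransform(np.float32(feature_pts),
                                    np.float32(target_pts))
    h, w = first_image.shape[:2]
    return cv2.warpAffine(first_image, matrix, (w, h))
```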
In some embodiments of the application, the pixel transformation includes transparency adjustment, and the processing module 720 is configured to: acquiring transparency parameters of each pixel point in the first image in a transparency channel; and adjusting the transparency parameter of the corresponding pixel point according to the first adjusting parameter of the pixel point to obtain the second image.
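One way to realize this transparency adjustment, assuming a BGRA image with an 8-bit alpha channel, is sketched below; the value of the first adjustment parameter is an illustrative assumption.

```python
# Sketch of transparency adjustment: scale each pixel's alpha channel by the
# first adjustment parameter and clip to the valid 8-bit range.
import numpy as np

def adjust_transparency(first_image_bgra, first_adjust_param=0.6):
    second_image = first_image_bgra.copy()
    alpha = second_image[:, :, 3].astype(np.float32) * first_adjust_param
    second_image[:, :, 3] = np.clip(alpha, 0, 255).astype(np.uint8)
    return second_image
```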
In some embodiments of the application, the pixel transformation includes brightness adjustment, and the processing module 720 is configured to: converting the first image from RGB space to HSV space; acquiring brightness components of each pixel point of the first image in an HSV space; adjusting the brightness component according to a second adjustment parameter; and inversely transforming the first image to an RGB space according to the adjusted brightness component to obtain the second image.
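A sketch of this brightness adjustment using OpenCV's HSV conversion is given below; note that OpenCV stores color images as BGR rather than RGB, and the value of the second adjustment parameter is illustrative.

```python
# Sketch of brightness adjustment: convert to HSV, scale the V (brightness)
# component by the second adjustment parameter, and convert back.
import numpy as np
import cv2

def adjust_brightness(first_image_bgr, second_adjust_param=1.2):
    hsv = cv2.cvtColor(first_image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[:, :, 2] = np.clip(hsv[:, :, 2] * second_adjust_param, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
```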
In some embodiments of the application, the non-similar geometric transformations include unidirectional stretching transformations, and the processing module 720 is configured to: obtaining a stretching coefficient in a target stretching direction, wherein the target stretching direction is a height direction or a width direction of the first image; and carrying out stretching transformation on the first image in the target stretching direction according to the stretching coefficient.
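The unidirectional stretching transformation can be sketched as follows, assuming OpenCV; the stretch coefficient and the choice of the width direction as the target stretch direction are illustrative.

```python
# Sketch of unidirectional stretching: apply the stretch coefficient only
# along the chosen target stretch direction.
import cv2

def stretch(first_image, coeff=1.5, direction="width"):
    fx, fy = (coeff, 1.0) if direction == "width" else (1.0, coeff)
    return cv2.resize(first_image, None, fx=fx, fy=fy)
```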
In some embodiments of the application, the image processing includes object boundary occlusion processing, and the processing module 720 is configured to: performing edge detection on the detection object in the first image, and determining the position of the edge of the detection object in the first image; determining a target removal area in the first image according to the position of the determined boundary, wherein the boundary part of the detection object is positioned in the target removal area; and removing the target removing area from the first image to obtain the second image.
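A rough sketch of this object-boundary occlusion step is given below; the Canny thresholds, the size of the removal region, the choice of boundary point, and the white fill are all illustrative assumptions rather than details specified by the application.

```python
# Sketch of boundary occlusion: detect the detection object's edges, pick a
# removal region covering part of the boundary, and blank it out.
import numpy as np
import cv2

def occlude_boundary(first_image, region_size=20):
    gray = cv2.cvtColor(first_image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200)
    ys, xs = np.nonzero(edges)                      # positions of boundary pixels
    if len(xs) == 0:
        return first_image                          # no boundary found; leave unchanged
    y, x = ys[0], xs[0]                             # one boundary point to occlude
    second_image = first_image.copy()
    second_image[y:y + region_size, x:x + region_size] = 255   # remove the target removal area
    return second_image
```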
In some embodiments of the application, the synthesis module 730 is configured to: obtaining synthesis indication information, wherein the synthesis indication information comprises similar geometric transformation parameters and fitting quantity; selecting a corresponding number of target images according to the fitting number; respectively carrying out similar geometric transformation on the selected target images according to the similar geometric transformation parameters; and attaching the target image subjected to the similar geometric transformation to the background image to obtain the third image.
In some embodiments of the present application, in the step of fitting the target image after performing the similar geometric transformation to the background image to obtain the third image, the synthesis module 730 is further configured to: selecting the same number of mapping areas as the lamination number from the background image as target areas, wherein the mapping areas are determined by dividing the background image; and respectively attaching the target images subjected to the similar geometric transformation to the target areas to obtain the third image, wherein one target area is used for attaching one target image.
In some embodiments of the application, the similarity geometric transformation includes scaling, and the similarity geometric transformation parameters include a map scale; in the step of performing the similar geometric transformation on the selected target images according to the similar geometric transformation parameters, the synthesizing module 730 is configured to: determining the target size of the selected target image according to the mapping proportion and the size of the background image; and scaling the selected target image according to the target size until the size of the target image reaches the target size.
In some embodiments of the application, the similarity geometric transformation comprises a rotation transformation, and the similarity geometric transformation parameters comprise rotation parameters; in the step of performing the similar geometric transformation on the selected target images according to the similar geometric transformation parameters, the synthesizing module 730 is configured to: and rotating the selected target image according to the rotation angle indicated by the rotation parameter and the indicated rotation direction.
In some embodiments of the application, the detection object further comprises a counterexample corresponding to the target object.
The present application provides a target object detection apparatus 800, the target object detection apparatus 800 may be configured in the server shown in fig. 1, as shown in fig. 8, the target object detection apparatus 800 includes:
the image acquisition module 810 is configured to acquire an image to be detected.
The input module 820 is configured to input the image to be detected into a target detection model, where the target detection model is obtained by performing model training through a training data set, and the training data set is obtained according to the method for generating the training data set in any of the foregoing embodiments.
The detection module 830 is configured to detect a target object in the image to be detected by using the target detection model, and output a detection result, where the detection result indicates at least whether the image to be detected includes the target object. The implementation process of the functions and roles of each module/unit in the above device is specifically shown in the implementation process of the corresponding steps in the above method, and will not be repeated here.
It is to be understood that these modules may be implemented in hardware, software, or a combination of both. When implemented in hardware, these modules may be implemented as one or more hardware modules, such as one or more application specific integrated circuits. When implemented in software, the modules may be implemented as one or more computer programs executing on one or more processors.
Fig. 9 shows a schematic diagram of a computer system suitable for use in implementing an embodiment of the application.
It should be noted that, the computer system 900 of the electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
As shown in fig. 9, the computer system 900 includes a central processing unit (Central Processing Unit, CPU) 901 which can perform various appropriate actions and processes, such as performing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 902 or a program loaded from a storage portion 908 into a random access Memory (Random Access Memory, RAM) 903. In the RAM 903, various programs and data required for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An Input/Output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory is mounted on the drive 910 as needed, so that a computer program read therefrom is installed into the storage portion 908 as needed.
In particular, according to embodiments of the present application, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. When the computer program is executed by a Central Processing Unit (CPU) 901, various functions defined in the system of the present application are performed.
It should be noted that, the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-Only Memory (ROM), an erasable programmable read-Only Memory (Erasable Programmable Read Only Memory, EPROM), a flash Memory, an optical fiber, a portable compact disc read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present application, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Where each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be provided in a processor. Wherein the names of the units do not constitute a limitation of the units themselves in some cases.
According to an aspect of the present application, there is also provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement the method of generating a training data set or the method of detecting a target object as in any of the embodiments above.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the application. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a touch terminal, or a network device, etc.) to perform the method according to the embodiments of the present application.
References herein to "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a exists alone, A and B exist together, and B exists alone. The character "/" generally indicates that the context-dependent object is an "or" relationship.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (14)

1. A method for generating a training data set, wherein the training data set is used for training a target detection model, the target detection model is used for detecting a target object in a third image to obtain a detection result of the third image, the detection result of the third image indicates whether the third image includes a target object, when the detection result of the third image indicates that the third image includes the target object, the detection result of the third image also indicates a position of the target object in the third image, so as to determine whether adjustment of parameters of the target detection model is needed based on the detection result of the third image, a label of the third image, and position information associated with the third image, and the target object includes a brand corresponding to a video program, the method comprises:
Acquiring a first image including only a detection object; wherein the acquiring a first image including a detection object includes: acquiring a first image comprising a positive example corresponding to the target object from a positive example library, and acquiring a first image comprising a negative example corresponding to the target object from a negative example library;
Performing image processing on the first image to obtain at least one second image, wherein the image processing comprises pixel transformation, dissimilar geometric transformation and object boundary shielding processing;
Obtaining a background image from a background image library, wherein the background image library comprises images obtained from a OpenLogo database and images obtained from screenshots respectively corresponding to various types of variety videos;
Synthesizing the background image and at least one target image to obtain a third image, wherein the target image is the first image or the second image;
adding a positional information association of the third image and the target image in the third image to the training dataset;
the synthesizing the background image and at least one target image to obtain a third image comprises the following steps:
Obtaining synthesis indication information, wherein the synthesis indication information comprises similar geometric transformation parameters and fitting quantity, the similar geometric transformation comprises scaling, and the similar geometric transformation parameters comprise mapping proportions;
Selecting a corresponding number of target images according to the fitting number;
Determining a target width of the selected target image according to the mapping proportion and the width of the background image, and determining a target height of the selected target image according to the mapping proportion and the height of the background image;
Scaling the selected target image according to the target width and the target height;
Attaching the scaled target image to the background image to obtain the third image; wherein the scaled target image has a size smaller than the size of the background image.
2. The method of claim 1, wherein the non-similar geometric transformation comprises a perspective transformation, and wherein the image processing the first image to obtain at least one second image comprises:
acquiring coordinates of four first feature points in the first image and coordinates of a first target point appointed for each first feature point;
Calculating to obtain a perspective transformation matrix according to the coordinates of the four first characteristic points and the coordinates of the corresponding first target point;
and performing perspective transformation on the first image according to the perspective transformation matrix to obtain the second image.
3. The method of claim 1, wherein the non-similar geometric transformation comprises an affine transformation, and wherein the image processing the first image to obtain at least one second image comprises:
acquiring coordinates of three second feature points in the first image, and acquiring coordinates of a second target point designated for each second feature point;
Calculating to obtain an affine transformation matrix according to the coordinates of the second characteristic points and the coordinates of the corresponding second target points;
and carrying out affine transformation on the first image according to the affine transformation matrix to obtain the second image.
4. The method of claim 1, wherein the pixel transformation includes transparency adjustment, and wherein the image processing the first image to obtain at least one second image comprises:
acquiring transparency parameters of each pixel point in the first image in a transparency channel;
And adjusting the transparency parameter of the corresponding pixel point according to the first adjusting parameter of the pixel point to obtain the second image.
5. The method of claim 1, wherein the pixel transformation includes brightness adjustment, and wherein the image processing the first image to obtain at least one second image comprises:
converting the first image from RGB space to HSV space;
acquiring brightness components of each pixel point of the first image in an HSV space;
adjusting the brightness component according to a second adjustment parameter;
And inversely transforming the first image to an RGB space according to the adjusted brightness component to obtain the second image.
6. The method of claim 1, wherein the non-similar geometric transformation comprises a unidirectional stretching transformation, and wherein the image processing the first image to obtain at least one second image comprises:
obtaining a stretching coefficient in a target stretching direction, wherein the target stretching direction is a height direction or a width direction of the first image;
and carrying out stretching transformation on the first image in the target stretching direction according to the stretching coefficient.
7. The method of claim 1, wherein the performing image processing on the first image to obtain at least one second image comprises:
performing edge detection on the detection object in the first image, and determining the position of the edge of the detection object in the first image;
determining a target removal area in the first image according to the position of the determined boundary, wherein the boundary part of the object to be detected is positioned in the target removal area;

and removing the target removal area from the first image to obtain the second image.
8. The method of any one of claims 1 to 7, wherein the similar geometric transformation further comprises a rotation transformation, and wherein the similar geometric transformation parameters further comprise rotation parameters;
And attaching the scaled target image to the background image to obtain the third image, including:
rotating the scaled target image according to the rotation angle indicated by the rotation parameter and the indicated rotation direction;
and attaching the rotated target image to the background image to obtain the third image.
9. A target object detection method, characterized by comprising:
Acquiring an image to be detected;
inputting the image to be detected into a target detection model, the target detection model being model trained by a training dataset, the training dataset being obtained according to the method of any of claims 1-8;
and detecting the target object of the image to be detected by the target detection model, and outputting a detection result, wherein the detection result at least indicates whether the image to be detected comprises the target object.
10. A training data set generating apparatus, wherein the training data set is used for training a target detection model, the target detection model is used for detecting a target object in a third image to obtain a detection result of the third image, the detection result of the third image indicates whether the third image includes a target object, when the detection result of the third image indicates that the third image includes the target object, the detection result of the third image also indicates a position of the target object in the third image, so as to determine whether adjustment of parameters of the target detection model is needed based on the detection result of the third image, a label of the third image, and position information associated with the third image, the target object includes a brand corresponding to a video program, the apparatus comprising:
An acquisition module for acquiring a first image including only a detection object; wherein the acquiring a first image including a detection object includes: acquiring a first image comprising a positive example corresponding to the target object from a positive example library, and acquiring a first image comprising a negative example corresponding to the target object from a negative example library;
The processing module is used for carrying out image processing on the first image to obtain at least one second image, wherein the image processing comprises pixel transformation, dissimilar geometric transformation and object boundary shielding processing;
The synthesis module is used for obtaining a background image from a background image library, wherein the background image library comprises images obtained from a OpenLogo database and images obtained from screenshots respectively corresponding to various types of variety videos; synthesizing the background image and at least one target image to obtain a third image, wherein the target image is the first image or the second image;
An adding module, configured to add the third image and the location information of the target image in the third image to the training dataset in association;
the synthesizing the background image and at least one target image to obtain a third image comprises the following steps:
Obtaining synthesis indication information, wherein the synthesis indication information comprises similar geometric transformation parameters and fitting quantity, the similar geometric transformation comprises scaling, and the similar geometric transformation parameters comprise mapping proportions;
Selecting a corresponding number of target images according to the fitting number;
Determining a target width of the selected target image according to the mapping proportion and the width of the background image, and determining a target height of the selected target image according to the mapping proportion and the height of the background image;
Scaling the selected target image according to the target width and the target height;
Attaching the scaled target image to the background image to obtain the third image; wherein the scaled target image has a size smaller than the size of the background image.
11. The apparatus of claim 10, wherein the non-similar geometric transformation comprises a perspective transformation, the processing module configured to:
acquiring coordinates of four first feature points in the first image and coordinates of a first target point appointed for each first feature point;
Calculating to obtain a perspective transformation matrix according to the coordinates of the four first characteristic points and the coordinates of the corresponding first target point;
and performing perspective transformation on the first image according to the perspective transformation matrix to obtain the second image.
12. The apparatus of claim 10, wherein the non-similar geometric transformation comprises an affine transformation, the processing module configured to:
acquiring coordinates of three second feature points in the first image, and acquiring coordinates of a second target point designated for each second feature point;
Calculating to obtain an affine transformation matrix according to the coordinates of the second characteristic points and the coordinates of the corresponding second target points;
and carrying out affine transformation on the first image according to the affine transformation matrix to obtain the second image.
13. An electronic device, comprising:
A processor; and
A memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any of claims 1 to 9.
14. A computer readable storage medium having stored thereon computer readable instructions which, when executed by a processor, implement the method of any of claims 1 to 9.
CN202010892590.9A 2020-08-27 2020-08-27 Training data set generation method, target object detection method and related equipment Active CN112070137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010892590.9A CN112070137B (en) 2020-08-27 2020-08-27 Training data set generation method, target object detection method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010892590.9A CN112070137B (en) 2020-08-27 2020-08-27 Training data set generation method, target object detection method and related equipment

Publications (2)

Publication Number Publication Date
CN112070137A CN112070137A (en) 2020-12-11
CN112070137B (en) 2024-06-28

Family

ID=73664756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010892590.9A Active CN112070137B (en) 2020-08-27 2020-08-27 Training data set generation method, target object detection method and related equipment

Country Status (1)

Country Link
CN (1) CN112070137B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112560872A (en) * 2020-12-16 2021-03-26 北京曲奇智能科技有限公司 Multimode perception mahjong assisting method based on artificial intelligence
CN113536960A (en) * 2021-06-23 2021-10-22 浙江吉利控股集团有限公司 Training image generation, positioning detection network training and positioning detection method and device
US20240212239A1 (en) * 2021-06-29 2024-06-27 Boe Technology Group Co., Ltd. Logo Labeling Method and Device, Update Method and System of Logo Detection Model, and Storage Medium
CN113688887A (en) * 2021-08-13 2021-11-23 百度在线网络技术(北京)有限公司 Training and image recognition method and device of image recognition model

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509916A (en) * 2018-03-30 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating image

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10719742B2 (en) * 2018-02-15 2020-07-21 Adobe Inc. Image composites using a generative adversarial neural network
CN109255767B (en) * 2018-09-26 2021-03-12 北京字节跳动网络技术有限公司 Image processing method and device
CN109377467A (en) * 2018-09-28 2019-02-22 阿里巴巴集团控股有限公司 Generation method, object detection method and the device of training sample
CN109727264A (en) * 2019-01-10 2019-05-07 南京旷云科技有限公司 Image generating method, the training method of neural network, device and electronic equipment

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509916A (en) * 2018-03-30 2018-09-07 百度在线网络技术(北京)有限公司 Method and apparatus for generating image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
[Technical Review] Data Augmentation in Deep Learning (Part 1); 有三AI学院 (Yousan AI Academy); https://zhuanlan.zhihu.com/p/38345420; pp. 1-14 *

Also Published As

Publication number Publication date
CN112070137A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN112070137B (en) Training data set generation method, target object detection method and related equipment
RU2680765C1 (en) Automated determination and cutting of non-singular contour of a picture on an image
CN109191424B (en) Breast mass detection and classification system and computer-readable storage medium
US10937237B1 (en) Reconstructing three-dimensional scenes using multi-view cycle projection
CN108701234A (en) Licence plate recognition method and cloud system
CN111369550A (en) Image registration and defect detection method, model, training method, device and equipment
CN109919209A (en) A kind of domain-adaptive deep learning method and readable storage medium storing program for executing
Spizhevoi et al. OpenCV 3 Computer Vision with Python Cookbook: Leverage the power of OpenCV 3 and Python to build computer vision applications
Tangsakul et al. Single image haze removal using deep cellular automata learning
CN110969641A (en) Image processing method and device
CN114444565A (en) Image tampering detection method, terminal device and storage medium
Rahman et al. Diverse image enhancer for complex underexposed image
Xu et al. Generative image completion with image-to-image translation
Gao et al. Let you see in haze and sandstorm: Two-in-one low-visibility enhancement network
CN114240770A (en) Image processing method, device, server and storage medium
CN113706400A (en) Image correction method, image correction device, microscope image correction method, and electronic apparatus
CN117173439A (en) Image processing method and device based on GPU, storage medium and electronic equipment
CN112712527A (en) Medical image segmentation method based on DR-Unet104
Stival et al. Survey on video colorization: Concepts, methods and applications
Englert et al. Enhancing the ar experience with machine learning services
Li et al. Multi-scale fusion framework via retinex and transmittance optimization for underwater image enhancement
CN115272527A (en) Image coloring method based on color disc countermeasure network
WO2020192262A1 (en) Physical object image generation method and apparatus, and device
CN116962817B (en) Video processing method, device, electronic equipment and storage medium
Qiu et al. Perception-oriented UAV Image Dehazing Based on Super-Pixel Scene Prior

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant