CN117541770A - Data enhancement method and device and electronic equipment - Google Patents

Data enhancement method and device and electronic equipment

Info

Publication number
CN117541770A
Authority
CN
China
Prior art keywords
original image
image
category
original
categories
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210904226.9A
Other languages
Chinese (zh)
Inventor
吕永春
朱徽
周迅溢
蒋宁
吴海英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mashang Consumer Finance Co Ltd
Original Assignee
Mashang Consumer Finance Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mashang Consumer Finance Co Ltd filed Critical Mashang Consumer Finance Co Ltd
Priority to CN202210904226.9A priority Critical patent/CN117541770A/en
Priority to PCT/CN2023/107709 priority patent/WO2024022149A1/en
Publication of CN117541770A publication Critical patent/CN117541770A/en
Pending legal-status Critical Current


Classifications

    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/803: Fusion of input or preprocessed data
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a data enhancement method, a data enhancement device, and an electronic device. The data enhancement method comprises the following steps: acquiring N original images and N background image sets, wherein one original image corresponds to one background image set, N is an integer greater than 1, and any background image set comprises at least one background image; performing target detection on each original image in the N original images according to M target detection networks to obtain M target detection frames of each original image, wherein M is an integer greater than 1; determining a first detection frame of each original image, wherein the first detection frame of an original image is a detection frame determined from the M target detection frames of that original image; and fusing the region corresponding to the first detection frame in each original image with at least one background image in the background image set corresponding to that original image to obtain at least one enhanced image corresponding to each original image. In this way, the data enhancement effect can be improved.

Description

Data enhancement method and device and electronic equipment
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a data enhancement method and apparatus, and an electronic device.
Background
In recent years, deep learning has been widely used in fields such as image processing and computer vision. However, as the depth of neural networks increases, the overfitting phenomenon of large-scale deep neural networks becomes more serious, which may cause performance degradation. An important cause of the overfitting problem is an insufficient amount of training-set data, and various data enhancement techniques suitable for image data have been widely proposed to expand the available training-set data.
At present, common image enhancement schemes obtain new images by flipping, cropping, translation, color conversion, and the like, so as to expand the image data.
Disclosure of Invention
The embodiment of the application provides a data enhancement method, a data enhancement device and electronic equipment, so as to solve the problem of poor data enhancement effect.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, an embodiment of the present application provides a data enhancement method, including:
acquiring N original images and N background image sets, wherein one original image corresponds to one background image set, N is an integer greater than 1, and any background image set comprises at least one background image;
Performing target detection on each original image in the N original images according to M target detection networks to obtain M target detection frames of each original image, wherein M is an integer greater than 1;
determining a first detection frame of each original image, wherein the first detection frame of each original image is a detection frame determined by M target detection frames of the original image;
and fusing the region corresponding to the first detection frame in each original image and at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to each original image.
It can be seen that, in the data enhancement method of this embodiment, M target detection frames of an original image are obtained through M different target detection networks, and the first detection frame of the original image is determined from these M target detection frames, which improves the accuracy of the first detection frame of the original image. The region corresponding to the first detection frame in the original image is then fused with at least one background image in the background image set corresponding to that original image, so as to obtain at least one enhanced image corresponding to the original image and thereby realize data enhancement of the original image. Compared with the original image, the obtained enhanced image changes substantially and can provide more additional information, so the image enhancement effect can be improved.
In a second aspect, embodiments of the present application further provide a data enhancement device, including:
the first acquisition module is used for acquiring N original images and N background image sets, wherein one original image corresponds to one background image set, N is an integer greater than 1, and any background image set comprises at least one background image;
the target detection module is used for carrying out target detection on each original image in the N original images according to M target detection networks to obtain M target detection frames of each original image, wherein M is an integer greater than 1;
the first determining module is used for determining a first detection frame of each original image, wherein the first detection frame of each original image is a detection frame determined by M target detection frames of the original image;
and the fusion module is used for fusing the region corresponding to the first detection frame in each original image and at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to each original image.
In a third aspect, embodiments of the present application further provide an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the data enhancement method described above when executing the computer program.
In a fourth aspect, embodiments of the present application further provide a computer readable storage medium having a computer program stored thereon, which when executed by a processor, implements the steps of the data enhancement method described above.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a data enhancement method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of convolutional neural network training provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a data enhancement method provided by an embodiment of the present application;
fig. 4 is an application scenario diagram of a data enhancement method provided in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a data enhancement device according to an embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
In the process of training a network with image data, the image data used for training are also referred to as image training samples, and they directly affect the quality of the network training. Overfitting is a common problem in network training, and an important cause of overfitting is an insufficient number of image training samples. Image data enhancement is an important way to increase the sample size. However, the enhanced image data obtained by enhancement modes such as flipping, cropping, translation, and color conversion provide only a limited amount of additional information relative to the original images, so the image data enhancement effect is poor; even if the obtained enhanced images and the original images are both used for network training, overfitting is still likely to occur, because the enhanced images change little relative to the original images and can provide little additional information. Based on this, the embodiments of the present application provide a data enhancement method in which the region of an original image corresponding to a first detection frame, obtained by performing target detection on the original image, is fused with an additional background image to obtain an enhanced image. The enhanced image thus changes substantially relative to the original image and can provide more additional information, which improves the image enhancement effect; performing network training with the original images and such enhanced images can reduce the occurrence of overfitting.
It should be noted that the method may be applied to an electronic device, and the method may be performed by the electronic device, where the electronic device may be any device that may be used to implement data enhancement, for example, and may include, but is not limited to, a terminal device or a server device, etc.
Referring to fig. 1, fig. 1 is a flowchart of a data enhancement method provided in an embodiment of the present application, as shown in fig. 1, including the following steps:
step 101: acquiring N original images and N background image sets, wherein one original image corresponds to one background image set;
N is an integer greater than 1, and any background image set comprises at least one background image.
The original image may be understood as an image to be enhanced by data, and N original images are in one-to-one correspondence with N background image sets.
Step 102: and carrying out target detection on each original image in the N original images according to the M target detection networks to obtain M target detection frames of each original image, wherein M is an integer greater than 1.
It should be noted that target detection can be understood as detecting the position of a target in an image; the detected position is represented by a detection frame, which may also be referred to as a bounding box and is a rectangular box, and the detected target in the image is located within the corresponding target detection frame.
In addition, the M target detection networks may be networks obtained by corresponding M iteration rounds in the iterative training process of the initial detection network, where a minimum iteration round of the M iteration rounds is greater than a target round, and the target round may be preset according to an actual situation, or a maximum training round set by model training, which is not specifically limited in this embodiment. For example, the target round may be set to a round between half of the maximum training round and the maximum training round, with the largest iteration round of the M iteration rounds being less than or equal to the maximum training round of the model training.
For example, if the maximum training round of model training is 40, the target round may be set to 20 and M may be 20: the initial detection network is first trained for 20 iterations; then, starting from the 21st iteration, the network obtained after each iteration is recorded until the 40th iteration is completed. In this way 20 networks are recorded, that is, the 20 target detection networks may be the networks obtained at the 21st through 40th rounds, respectively.
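For illustration, the network-recording scheme described above could be sketched as follows in Python; the helper train_one_round, the model object, and the round numbers are assumptions used only for this sketch, not part of the disclosure.

```python
import copy

def record_detection_networks(model, train_one_round, max_round=40, target_round=20):
    """Illustrative sketch: train for max_round rounds and keep a deep copy of the
    network after every round beyond target_round, so that
    M = max_round - target_round detectors are recorded (rounds 21..40 here)."""
    snapshots = []
    for round_idx in range(1, max_round + 1):
        train_one_round(model)              # one full pass over the training set
        if round_idx > target_round:        # only the later, more accurate rounds
            snapshots.append(copy.deepcopy(model))
    return snapshots                        # the M target detection networks
```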
Step 103: the first detection frame of each original image is determined, and the first detection frame of the original image is a detection frame determined by M target detection frames of the original image.
Each original image corresponds to M target detection frames; it should be noted that a target detection frame may be a target rectangular frame. The M target detection frames of an original image may be processed to obtain the first detection frame of that original image. In one example, for each of the N original images, the M target detection frames of the original image are averaged to obtain the first detection frame of the original image. For example, a target detection frame may be represented by its four vertex coordinates, and the four vertex coordinates of the M target detection frames of an original image may be averaged to obtain the first detection frame of that original image; that is, each vertex coordinate of the first detection frame is the average of the corresponding vertex coordinates of the M target detection frames. Any vertex coordinate includes two component coordinates, and averaging means that the same component coordinate of that vertex is averaged over the M target detection frames. For example, suppose the M target detection frames of one original image include a first target detection frame and a second target detection frame, the four vertices of the first target detection frame are J11 (X11, Y11), J12 (X12, Y12), J13 (X13, Y13) and J14 (X14, Y14), and the four vertices of the second target detection frame are J21 (X21, Y21), J22 (X22, Y22), J23 (X23, Y23) and J24 (X24, Y24), where J11, J12, J13 and J14 are respectively the upper-left, upper-right, lower-left and lower-right vertices of the first target detection frame, and J21, J22, J23 and J24 are respectively the upper-left, upper-right, lower-left and lower-right vertices of the second target detection frame. Averaging then gives the four vertex coordinates of the first detection frame of the original image as ((X11+X21)/2, (Y11+Y21)/2), ((X12+X22)/2, (Y12+Y22)/2), ((X13+X23)/2, (Y13+Y23)/2) and ((X14+X24)/2, (Y14+Y24)/2). A similar process is performed on the M target detection frames of each original image to obtain the first detection frame of each original image.
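The vertex-averaging described above could, for example, be sketched as follows; the box representation (four (x, y) vertices in a fixed order) and the numeric values are illustrative assumptions, not values from the disclosure.

```python
def average_boxes(boxes):
    """Average the four vertex coordinates of M detection boxes.
    `boxes` is a list of M boxes, each a list of four (x, y) vertices in a fixed
    order (upper-left, upper-right, lower-left, lower-right)."""
    m = len(boxes)
    return [
        (sum(b[k][0] for b in boxes) / m, sum(b[k][1] for b in boxes) / m)
        for k in range(4)
    ]

# Illustrative example with two boxes:
first_box = average_boxes([
    [(10, 10), (50, 10), (10, 60), (50, 60)],
    [(14, 12), (54, 12), (14, 66), (54, 66)],
])
# -> [(12.0, 11.0), (52.0, 11.0), (12.0, 63.0), (52.0, 63.0)]
```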
Step 104: and fusing the region corresponding to the first detection frame in each original image and at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to each original image.
After the first detection frame of each original image is determined, the region corresponding to the first detection frame can be extracted from the original image according to its first detection frame, and the extracted region is then fused with each of the at least one background image in the background image set corresponding to that original image, so that at least one enhanced image corresponding to the original image can be obtained and image data enhancement is realized. The number of enhanced images corresponding to an original image is the same as the number of background images in the background image set corresponding to that original image. A similar fusion process is performed on each original image to obtain at least one enhanced image corresponding to each original image.
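A minimal sketch of this fusion step is given below, assuming the fusion is a direct copy of the region inside the first detection frame onto a background image of the same size; the disclosure does not fix the exact fusion operator, so this is only one plausible reading.

```python
import numpy as np

def fuse_region_into_background(original, background, box):
    """Copy the region of `original` inside `box` = (x0, y0, x1, y1) onto
    `background` at the same location. Both images are assumed to be
    H x W x 3 uint8 arrays of the same size."""
    x0, y0, x1, y1 = box
    enhanced = background.copy()
    enhanced[y0:y1, x0:x1] = original[y0:y1, x0:x1]
    return enhanced

def enhance(original, background_set, box):
    # One enhanced image per background image in the set corresponding to
    # this original image (step 104).
    return [fuse_region_into_background(original, bg, box) for bg in background_set]
```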
It should be noted that obtaining at least one enhanced image corresponding to each of the N original images can be understood as obtaining an enhanced image set corresponding to each original image, that is, obtaining N enhanced image sets, where the enhanced image set corresponding to any original image includes at least one enhanced image corresponding to that original image. The N original images and the N enhanced image sets can be used for training a subsequent deep neural network; this not only increases the training sample size but, because the enhanced images obtained in the embodiments of the present application change substantially relative to the original images and can provide more additional information, also reduces the occurrence of overfitting during training.
In the data enhancement method of this embodiment, M target detection frames of an original image can be obtained through M different target detection networks, and the first detection frame of the original image is determined from these M target detection frames, which improves the accuracy of the first detection frame of the original image; the region corresponding to the first detection frame in the original image is then fused with at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to the original image, so that data enhancement of the original image is realized. Compared with the original image, the obtained enhanced image changes substantially and can provide more additional information, so the image enhancement effect can be improved.
In one embodiment, the object detection network comprises a convolutional neural network;
the performing target detection on each original image in the N original images according to the M target detection networks to obtain the M target detection frames of each original image includes:
inputting the N original images into the M convolutional neural networks for feature extraction to obtain M feature maps of each original image;
normalizing the M feature maps of each original image to obtain M heat maps of each original image;
and calculating a rectangular detection frame for each heat map to obtain M rectangular detection frames of each original image.
It will be appreciated that the convolutional neural network may include a plurality of convolutional layers, and the feature map may be the feature map output by the last convolutional layer in the convolutional neural network. Extracting features of the original image with a convolutional neural network makes it possible to extract more detailed features of the original image and to obtain a feature map that better characterizes the original image. Then, the M feature maps of the original image are normalized to obtain M heat maps of the original image, from which the target detection frame in the image is later determined. In one example, a feature map may be normalized to a heat map with pixel values in the range [0, 1].
In one example, the convolutional neural network may be obtained by iterative training with the SimSiam self-supervision method. This self-supervised approach directly maximizes the similarity of two views of an image without using negative samples and without requiring a momentum encoder. As shown in FIG. 2, an image x is randomly augmented twice (e.g., rotated, color processed, etc.) to obtain two different views x_1 and x_2 as inputs. The two views x_1 and x_2 each pass through an encoding network to obtain a corresponding first vector z_1 and second vector z_2; z_1 is processed by a projection layer (which may be a multi-layer perceptron) to obtain a third vector p_1, and z_2 is processed by the projection layer to obtain a fourth vector p_2. The encoding network is, for example, the encoder f in FIG. 2, and the projection layer is, for example, the projector h in FIG. 2. The stop-gradient operation shown in FIG. 2 (i.e., stop-grad in FIG. 2) is critical to prevent collapse of the model. The negative cosine similarity is then minimized:

D(p_1, z_2) = - (p_1 / ||p_1||_2) · (z_2 / ||z_2||_2)

where D(p_1, z_2) is the negative cosine similarity between p_1 and z_2 (the similarity in FIG. 2).

The loss function L takes a symmetric form:

L = D(p_1, z_2) / 2 + D(p_2, z_1) / 2

where D(p_2, z_1) is the negative cosine similarity between p_2 and z_1, defined in the same way as above.
model n (e.g., 40) runs were trained by a random gradient descent (SGD) optimizer using the loss function described above. It should be noted that, the above-mentioned coding network may be a convolutional neural network (may include a feature extraction network and a conversion layer, where the last layer of the feature extraction network outputs a feature map, and the conversion layer converts the feature map into a vector), and after the training is completed, the convolutional neural network training is completed. It should be noted that, in the training process, the training set used may include the N original images.
Self-supervised learning can capture the approximate position information of the target, and the embodiments of the present application use this property to estimate the object frame in the feature map corresponding to an image. Since the position information in the initial stage of self-supervised training is inaccurate, the convolutional neural networks from the M-th (e.g., 20th) round of iterative training to the convergence round (e.g., the training round at the end of training, such as the n-th round) are used to perform feature extraction on an original image A; that is, the convolutional neural networks obtained at the (M+1)-th through n-th training rounds are used to extract features from the original image A, so that (n-M) feature maps (i.e., the feature maps output by the last convolutional layer of each convolutional neural network) can be obtained. Each feature map is normalized to [0, 1] to generate a heat map, and the target detection frame in each heat map is then calculated as follows:
B = K(1[R > i])
where R denotes the heat map, i denotes the threshold of the activation points (i.e., the preset pixel threshold), and 1 is the indicator function: 1[R > i] takes the value 1 for each pixel of R whose value is greater than i and 0 otherwise, which binarizes R pixel by pixel to obtain a binarized image; K is a function that computes the rectangular closure, i.e., computes the target detection frame, and the K function returns the target detection frame of the binarized image of the heat map R.
Because M convolutional neural networks are recorded and each of them yields one heat map, M heat maps of the original image A are obtained; a target detection frame is calculated for each heat map to obtain the M target detection frames of the original image A, and the M target detection frames are then averaged, that is, their four vertex coordinates are averaged in turn, to obtain the final first detection frame of the original image A.
In one embodiment, calculating the rectangular detection frame of each heat map to obtain the M rectangular detection frames of each original image includes:
for each heat map, performing binarization processing on the heat map according to a preset pixel threshold to obtain a binarized image of the heat map;
and calculating a rectangular detection frame of the heat map based on the binarized image of the heat map.
The binarization process may adjust the value of a pixel in the heat map that is greater than the preset pixel threshold to a first value, e.g., 1, and adjust the value of a pixel in the heat map that is less than or equal to the preset pixel threshold to a second value, e.g., 0, so that the value of any pixel in the resulting binarized image is either the first value or the second value. Because only two pixel values exist in the binarized image, performing the target detection frame calculation on the binarized image can improve the accuracy of the detection frame calculation.
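A minimal sketch of the heat-map thresholding and rectangle computation described above is given below; the min-max normalization and the axis-aligned bounding rectangle used for K are assumptions, since the text does not fix the exact normalization or the form of the K function.

```python
import numpy as np

def heatmap_from_feature_map(feature_map):
    """Normalize a 2-D feature map to a heat map R with values in [0, 1]
    (a simple min-max normalization is assumed here)."""
    r = feature_map.astype(np.float64)
    return (r - r.min()) / (r.max() - r.min() + 1e-12)

def detection_frame_from_heatmap(R, threshold=0.5):
    """B = K(1[R > i]): binarize the heat map with threshold i, then return the
    bounding rectangle (x0, y0, x1, y1) of the activated pixels."""
    binary = R > threshold                  # 1 where the pixel exceeds i, else 0
    ys, xs = np.nonzero(binary)
    if len(xs) == 0:
        return None                         # no activation above the threshold
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())
```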
In one embodiment, acquiring N background image sets includes:
for each original image, determining at least one category matched with the category of the original image according to the category of the original image, wherein the similarity between each category in the at least one category and the category of the original image is larger than a preset threshold value;
acquiring at least one reference image corresponding to each category in at least one category to obtain a reference image set corresponding to an original image;
and obtaining a background image in each image of the reference image set to obtain a background image set corresponding to the original image.
The category of an original image can be understood as the category of the target in the original image. A plurality of categories can be preset, and the category of the original image is one of the plurality of categories; the plurality of categories may include, for example, but are not limited to, person, pig, sheep, cat, dog, deer, horse, bird, and the like, and the category of the original image can be obtained in advance. A background image is understood to be an image with the target removed, and may be, for example, the background region remaining after the target is removed from a reference image.
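One possible way to derive such a background image from a reference image can be sketched as follows, assuming the target region of the reference image is available as a rectangle and is simply blanked out; inpainting or another fill strategy could be used instead, and this is an assumption rather than something fixed by the text.

```python
import numpy as np

def background_from_reference(reference, target_box, fill_value=0):
    """Derive a background image by removing the target region
    (x0, y0, x1, y1) from a reference image; here the region is simply
    filled with a constant value."""
    x0, y0, x1, y1 = target_box
    background = reference.copy()
    background[y0:y1, x0:x1] = fill_value
    return background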
In this embodiment, the background image of the reference image is fused with the area of the first detection frame, and the reference image is an image with a category matched with that of the original image, so that the difference between the area of the first detection frame and the background image of the reference image can be reduced, and the rationality of the enhanced image obtained by fusion can be improved.
In one embodiment, before determining at least one category matching the category of the original image according to the category of the original image, for each original image, the method further comprises:
determining the similarity between every two categories in a plurality of categories, wherein the plurality of categories comprise the category of the original image and the at least one category;
wherein determining at least one category matching the category of the original image according to the category of the original image comprises:
and determining at least one category matched with the category of the original image from the rest categories according to the similarity between the category of the original image and the rest categories in the plurality of categories, wherein the rest categories are the categories of the plurality of categories except the category of the original image, and the similarity between each category in the at least one category and the category of the original image is larger than a preset threshold value.
At least one category matching the category of the original image is selected from the remaining categories by means of the similarity between categories. That is, in this embodiment, the background image of a reference image is fused with the region of the first detection frame, and the reference image corresponds to at least one category whose similarity to the category of the original image is greater than a preset threshold, determined from the similarity between categories; this reduces the difference between the region of the first detection frame and the background image of the reference image and improves the rationality of the fused enhanced image.
As an example, the at least one category may be a single category, i.e., the matched category is the category having the greatest similarity with the category of the original image among the plurality of categories.
For example, as shown in FIG. 3 and FIG. 4, an original image A is input into the M target detection networks and M target detection frames of the original image A are extracted; a first detection frame J of the original image A is obtained by using these M target detection frames. The category with the highest similarity to the category of the original image A is determined according to the similarity between every two of the recorded plurality of categories, the background region D in at least one reference image C corresponding to this most similar category is obtained, and the region of the first detection frame J is merged into the background region D of the at least one reference image C to obtain at least one enhanced image Q.
In one embodiment, determining the similarity between each two of the plurality of categories includes:
inputting a plurality of categories into a semantic model for semantic analysis to obtain semantic vector representation of each category in the plurality of categories;
cosine similarity between semantic vector representations of each two of the plurality of categories is calculated.
The semantic model is not particularly limited in this embodiment; for example, a GloVe model, a word2vec model (a word vector model), or the like may be used as the semantic model. In this embodiment, the semantic model performs semantic analysis on a category so that a semantic vector representation (also called a word vector representation) of the category can be extracted. It can be understood that the semantic vector representation of a category represents the semantic information of that category, and the similarity between two categories, also called semantic similarity, can be calculated from their semantic vector representations; for example, the cosine similarity between the semantic vector representations of two categories is used as the similarity between the two categories, which can improve the accuracy of the similarity between categories.
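For illustration, selecting the most similar category from precomputed semantic vectors could be sketched as follows; the category_vectors mapping is assumed to have been looked up beforehand from a semantic model such as GloVe or word2vec, and the helper names are illustrative.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two semantic (word) vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def most_similar_category(category, category_vectors):
    """Pick the remaining category whose semantic vector is most similar to that
    of `category`. `category_vectors` maps category name -> vector."""
    others = (c for c in category_vectors if c != category)
    return max(others, key=lambda c: cosine_similarity(category_vectors[category],
                                                       category_vectors[c]))
```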
It should be noted that, by using the semantic similarity between categories, a category similar to the category of the original image is obtained, and the region of the first detection frame of the original image is fused with the background image of a reference image corresponding to at least one category similar to that of the original image, so as to realize image reconstruction and obtain the enhanced image of the original image; this constrains and, to a certain extent, reduces the possibility that the reconstructed enhanced image is ambiguous, and improves the rationality of the enhanced image. For example, for an image of a dog, the background may be grass, furniture, and the like, and for its most similar category the background is likely to be similar as well; with the dog as the foreground, merging it into the background of an image of a cat (the category most similar to dog) yields an enhanced image that can be used as an enhanced sample, which improves the rationality of the enhanced image obtained by fusion. The data enhancement method provided by the embodiments of the present application modifies the original image in a way that is simple and reasonable yet substantial, can provide a larger amount of additional information, and therefore has the potential to resist the overfitting problem.
It should be noted that the available original images may include, but are not limited to, expression data, facial images, natural biological categories, and the like, and original images with rich background information are preferable. Before the data enhancement of the original images, the original images are used for self-supervised learning, so that a position estimate of the target in each original image can be obtained. The semantic similarity between categories is calculated from the categories to obtain the most similar category. Data enhancement is then carried out according to the original image, the first detection frame, the most similar category, and the like, and the result is used in the subsequent deep neural network training process. In this process, for a given image, the background region of a reference image whose target is similar to the target of that image is mixed in and reconstructed into more sample images (enhanced images).
The data enhancement method of the embodiments of the present application is suitable for enhancing image data in the training of a deep neural network. For example, during the training of a deep neural network, an insufficient number of original images easily leads to overfitting, and to reduce training overfitting the original images need to be enhanced. It can be understood that fusing an original image with an additional background image to obtain an enhanced image allows the enhanced image to provide more additional information relative to the original image; training the deep neural network with both the original images and the enhanced images increases the training sample size, and because the enhanced images obtained in the embodiments of the present application change substantially relative to the original images and can provide more additional information, the occurrence of overfitting during training can be reduced.
Referring to fig. 5, fig. 5 is a block diagram of a data enhancement device according to an embodiment of the present application, which can implement details of the data enhancement method in the foregoing embodiment and achieve the same effects. As shown in fig. 5, the data enhancement device 500 includes:
a first obtaining module 501, configured to obtain N original images and N background image sets, where one original image corresponds to one background image set, N is an integer greater than 1, and any background image set includes at least one background image;
the target detection module 502 is configured to perform target detection on each original image in the N original images according to M target detection networks, to obtain M target detection frames of each original image, where M is an integer greater than 1;
a first determining module 503, configured to determine a first detection frame of each original image, where the first detection frame of the original image is a detection frame determined by M target detection frames of the original image;
and the fusion module 504 is configured to fuse an area corresponding to the first detection frame in each original image with at least one background image in the background image set corresponding to the original image, so as to obtain at least one enhanced image corresponding to each original image.
In one embodiment, the object detection network comprises a convolutional neural network;
A target detection module, comprising:
the extraction module is used for inputting the N original images into the M convolutional neural networks for feature extraction to obtain M feature maps of each original image;
the normalization processing module is used for normalizing the M feature maps of each original image to obtain M heat maps of each original image;
the detection frame determining module is used for calculating a rectangular detection frame for each heat map to obtain M rectangular detection frames of each original image.
In one embodiment, the detection frame determination module includes:
the binarization processing module is used for performing binarization processing on each heat map according to a preset pixel threshold to obtain a binarized image of the heat map;
and the detection frame calculation module is used for calculating a rectangular detection frame of the heat map based on the binarized image of the heat map.
In one embodiment, a first acquisition module includes:
the category determining module is used for determining at least one category matched with the category of the original image according to the category of the original image for each original image;
the first image acquisition module is used for acquiring at least one reference image corresponding to each type in at least one type so as to obtain a reference image set corresponding to the original image;
And the second image acquisition module is used for acquiring the background image in each image of the reference image set so as to obtain a background image set corresponding to the original image.
In one embodiment, the apparatus 500 further comprises:
the similarity determining module is used for determining the similarity between every two categories in a plurality of categories, wherein the plurality of categories comprise the category of the original image and the at least one category;
wherein, the category determination module is used for:
and determining at least one category matched with the category of the original image from the rest categories according to the similarity between the category of the original image and the rest categories in the plurality of categories, wherein the rest categories are the categories of the plurality of categories except the category of the original image, and the similarity between each category in the at least one category and the category of the original image is larger than a preset threshold value.
In one embodiment, the similarity determination module includes:
the vector representation acquisition module is used for inputting a plurality of categories into the semantic model for semantic analysis to acquire semantic vector representation of each category in the plurality of categories;
and the similarity calculation module is used for calculating cosine similarity between semantic vector representations of every two categories in the plurality of categories.
In one embodiment, the first determining module 503 is configured to:
for each original image, average the M target detection frames of the original image to obtain the first detection frame of the original image.
The data enhancement device provided in this embodiment of the present application can implement each process implemented by the data enhancement method in the foregoing embodiment, and technical features are in one-to-one correspondence, so that repetition is avoided, and details are not repeated here.
Fig. 6 is a schematic hardware structure of an electronic device implementing various embodiments of the present application.
The electronic device 600 includes, but is not limited to: radio frequency unit 601, network module 602, audio output unit 603, input unit 604, sensor 605, display unit 606, user input unit 607, interface unit 608, memory 609, processor 610, and power supply 611. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 6 is not limiting of the electronic device and that the electronic device may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. In the embodiment of the application, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer and the like.
Wherein the processor 610 is configured to:
acquiring N original images and N background image sets, wherein one original image corresponds to one background image set, N is an integer greater than 1, and any background image set comprises at least one background image;
performing target detection on each original image in the N original images according to M target detection networks to obtain M target detection frames of each original image, wherein M is an integer greater than 1;
determining a first detection frame of each original image, wherein the first detection frame of each original image is a detection frame determined by M target detection frames of the original image;
and fusing the region corresponding to the first detection frame in each original image and at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to each original image.
In one embodiment, the object detection network comprises a convolutional neural network;
the processor 610 is specifically configured to:
inputting the N original images into the M convolutional neural networks for feature extraction to obtain M feature maps of each original image;
normalizing the M feature maps of each original image to obtain M heat maps of each original image;
and calculating a rectangular detection frame for each heat map to obtain M rectangular detection frames of each original image.
In one embodiment, the processor 610 is specifically configured to:
for each heat map, performing binarization processing on the heat map according to a preset pixel threshold to obtain a binarized image of the heat map;
and calculating a rectangular detection frame of the heat map based on the binarized image of the heat map.
In one embodiment, the processor 610 is specifically configured to:
for each original image, determining at least one category matched with the category of the original image according to the category of the original image;
acquiring at least one reference image corresponding to each category in at least one category to obtain a reference image set corresponding to an original image;
and obtaining a background image in each image of the reference image set to obtain a background image set corresponding to the original image.
In one embodiment, the processor 610 is further configured to:
determining the similarity between every two categories in a plurality of categories, wherein the plurality of categories comprise the category of the original image and the at least one category;
the processor 610 is further specifically configured to:
and determining at least one category matched with the category of the original image from the rest categories according to the similarity between the category of the original image and the rest categories in the plurality of categories, wherein the rest categories are the categories of the plurality of categories except the category of the original image, and the similarity between each category in the at least one category and the category of the original image is larger than a preset threshold value.
In one embodiment, the processor 610 is further specifically configured to:
inputting a plurality of categories into a semantic model for semantic analysis to obtain semantic vector representation of each category in the plurality of categories;
cosine similarity between semantic vector representations of each two of the plurality of categories is calculated.
In one embodiment, the processor 610 is further specifically configured to:
and for each original image, carrying out average processing on M target detection frames of the original image to obtain a first detection frame of the original image.
The embodiments of the present application have the same beneficial technical effects as the embodiments of the data enhancement method, and are not described herein in detail.
It should be understood that, in the embodiment of the present application, the radio frequency unit 601 may be configured to receive and send information or signals during a call, specifically, receive downlink data from a base station, and then process the downlink data with the processor 610; and, the uplink data is transmitted to the base station. Typically, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 601 may also communicate with networks and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 602, such as helping the user to send and receive e-mail, browse web pages, and access streaming media, etc.
The audio output unit 603 may convert audio data received by the radio frequency unit 601 or the network module 602 or stored in the memory 609 into an audio signal and output as sound. Also, the audio output unit 603 may also provide audio output (e.g., a call signal reception sound, a message reception sound, etc.) related to a specific function performed by the electronic device 600. The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.
The input unit 604 is used for receiving audio or video signals. The input unit 604 may include a graphics processor (Graphics Processing Unit, GPU) 6041 and a microphone 6042, the graphics processor 6041 processing image data of still pictures or video obtained by an image capturing apparatus (such as a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 606. The image frames processed by the graphics processor 6041 may be stored in the memory 609 (or other storage medium) or transmitted via the radio frequency unit 601 or the network module 602. Microphone 6042 may receive sound and can process such sound into audio data. The processed audio data may be converted into a format output that can be transmitted to the mobile communication base station via the radio frequency unit 601 in the case of a telephone call mode.
The electronic device 600 also includes at least one sensor 605, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor and a proximity sensor, wherein the ambient light sensor can adjust the brightness of the display panel 6061 according to the brightness of ambient light, and the proximity sensor can turn off the display panel 6061 and/or the backlight when the electronic device 600 moves to the ear. As one of the motion sensors, the accelerometer sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and direction when stationary, and can be used for recognizing the gesture of the electronic equipment (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; the sensor 605 may also include a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, an infrared sensor, etc., which are not described herein.
The display unit 606 is used to display information input by a user or information provided to the user. The display unit 606 may include a display panel 6061, and the display panel 6061 may be configured in the form of a liquid crystal display (Liquid Crystal Display, LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 607 may be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. Touch panel 6071, also referred to as a touch screen, may collect touch operations thereon or thereabout by a user (e.g., operations of the user on touch panel 6071 or thereabout using any suitable object or accessory such as a finger, stylus, or the like). The touch panel 6071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 610, and receives and executes commands sent from the processor 610. In addition, the touch panel 6071 may be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave. The user input unit 607 may include other input devices 6072 in addition to the touch panel 6071. Specifically, other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a track ball, a mouse, and a joystick, which are not described herein.
Further, the touch panel 6071 may be overlaid on the display panel 6061, and when the touch panel 6071 detects a touch operation thereon or thereabout, the touch operation is transmitted to the processor 610 to determine a type of a touch event, and then the processor 610 provides a corresponding visual output on the display panel 6061 according to the type of the touch event. Although in fig. 6, the touch panel 6071 and the display panel 6061 are two independent components for implementing the input and output functions of the electronic device, in some embodiments, the touch panel 6071 and the display panel 6061 may be integrated to implement the input and output functions of the electronic device, which is not limited herein.
The interface unit 608 is an interface to which an external device is connected to the electronic apparatus 600. For example, the external devices may include a wired or wireless headset port, an external power (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 608 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the electronic apparatus 600 or may be used to transmit data between the electronic apparatus 600 and an external device.
The memory 609 may be used to store software programs as well as various data. The memory 609 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and an application program required for at least one function (such as a sound playing function or an image playing function); the data storage area may store data created according to the use of the electronic device (such as audio data and a phonebook). In addition, the memory 609 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The processor 610 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 609, and calling data stored in the memory 609, thereby performing overall monitoring of the electronic device. The processor 610 may include one or more processing units; preferably, the processor 610 may integrate an application processor that primarily handles operating systems, user interfaces, applications, etc., with a modem processor that primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 610.
The electronic device 600 may also include a power supply 611 (e.g., a battery) for powering the various components, and preferably the power supply 611 may be logically coupled to the processor 610 via a power management system that performs functions such as managing charging, discharging, and power consumption.
In addition, the electronic device 600 includes some functional modules, which are not shown, and will not be described herein.
Preferably, an embodiment of the present application further provides an electronic device, including a processor 610, a memory 609, and a computer program stored in the memory 609 and executable on the processor 610. When executed by the processor 610, the computer program implements each process of the above data enhancement method embodiment and can achieve the same technical effects; to avoid repetition, details are not described herein.
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above data enhancement method embodiment and can achieve the same technical effects; to avoid repetition, details are not described herein. The computer-readable storage medium may be, for example, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disk) and including several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above embodiments, which are merely illustrative and not restrictive. Under the teaching of the present application, those of ordinary skill in the art may make many other forms without departing from the spirit of the present application and the scope of the claims, all of which fall within the protection of the present application.

Claims (10)

1. A method of data enhancement, comprising:
acquiring N original images and N background image sets, wherein one original image corresponds to one background image set, N is an integer greater than 1, and any background image set comprises at least one background image;
performing target detection on each original image in the N original images according to M target detection networks to obtain M target detection frames of each original image, wherein M is an integer greater than 1;
determining a first detection frame of each original image, wherein the first detection frame of each original image is a detection frame determined by M target detection frames of the original image;
and fusing the region corresponding to the first detection frame in each original image and at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to each original image.
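As an editorial illustration only (not part of the claim language), the following Python sketch shows one possible reading of the pipeline in claim 1; the detector callables, the coordinate-wise averaging of the M target detection frames, and the simple paste-style fusion are assumptions made for clarity rather than the claimed implementation.

```python
# Illustrative sketch only; the detector callables, the averaging strategy, and the
# paste-style fusion are assumptions, not requirements taken from the claim text.
import numpy as np

def enhance_dataset(original_images, background_sets, detectors):
    """original_images: list of N HxWx3 arrays; background_sets: list of N lists of
    background images (assumed to have the same size as the corresponding original);
    detectors: list of M callables, each returning an (x1, y1, x2, y2) detection frame."""
    enhanced = []
    for image, backgrounds in zip(original_images, background_sets):
        # M target detection frames for this original image
        frames = np.array([det(image) for det in detectors], dtype=np.float32)
        # first detection frame determined from the M frames (coordinate-wise mean, cf. claim 7)
        x1, y1, x2, y2 = frames.mean(axis=0).astype(int)
        region = image[y1:y2, x1:x2]
        for bg in backgrounds:
            fused = bg.copy()
            fused[y1:y2, x1:x2] = region  # paste the detected region onto the background
            enhanced.append(fused)
    return enhanced
```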
2. The method of claim 1, wherein the target detection network comprises a convolutional neural network;
performing target detection on each original image in the N original images according to the M target detection networks to obtain the M target detection frames of each original image in the N original images, including:
inputting the N original images into M convolutional neural networks for feature extraction to obtain M feature maps of each original image;
carrying out normalization processing on the M feature maps of each original image to obtain M heat maps of each original image;
and calculating the rectangular detection frame of each heat map to obtain M rectangular detection frames of each original image.
3. The method according to claim 2, wherein calculating the rectangular detection frame of each heat map to obtain the M rectangular detection frames of each original image comprises:
for each heat map, carrying out binarization processing on the heat map according to a preset pixel threshold value to obtain a binarized image of the heat map;
and calculating a rectangular detection frame of the heat map based on the binarized image of the heat map.
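Purely as a hedged sketch of claims 2 and 3, the Python snippet below normalizes a single feature map into a heat map, binarizes it against a preset pixel threshold, and takes the minimal enclosing rectangle as the rectangular detection frame; the min-max normalization to [0, 1] and the fall-back to the full image when no pixel exceeds the threshold are assumptions, since the claims do not fix these details.

```python
import numpy as np

def heatmap_to_frame(feature_map, pixel_threshold=0.5):
    """feature_map: HxW activation map produced by one convolutional neural network.
    Returns an (x1, y1, x2, y2) rectangular detection frame."""
    # normalization processing: scale the feature map into a heat map in [0, 1] (assumed min-max)
    fmin, fmax = feature_map.min(), feature_map.max()
    heatmap = (feature_map - fmin) / (fmax - fmin + 1e-8)
    # binarization processing against the preset pixel threshold
    mask = heatmap >= pixel_threshold
    ys, xs = np.nonzero(mask)
    if xs.size == 0:  # nothing above threshold: fall back to the full image (assumption)
        h, w = heatmap.shape
        return 0, 0, w, h
    # minimal axis-aligned rectangle enclosing all foreground pixels
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1
```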
4. The method of claim 1, wherein acquiring the N background image sets comprises:
for each original image, determining, according to the category of the original image, at least one category matching the category of the original image;
acquiring at least one reference image corresponding to each category in the at least one category to obtain a reference image set corresponding to the original image;
and obtaining a background image from each image of the reference image set to obtain a background image set corresponding to the original image.
5. The method of claim 4, wherein for each original image, before determining at least one category matching the category of the original image based on the category of the original image, the method further comprises:
determining a similarity between every two categories of a plurality of categories, wherein the plurality of categories comprise the category of the original image and the at least one category;
wherein the determining at least one category matching the category of the original image according to the category of the original image comprises:
and determining at least one category matched with the category of the original image from the rest categories according to the similarity between the category of the original image and the rest categories of the plurality of categories, wherein the rest categories are categories of the plurality of categories except the category of the original image, and the similarity between each category of the at least one category and the category of the original image is larger than a preset threshold value.
6. The method of claim 5, wherein determining the similarity between each two of the plurality of categories comprises:
inputting the multiple categories into a semantic model for semantic analysis to obtain semantic vector representation of each category in the multiple categories;
cosine similarity between semantic vector representations of each two of the plurality of categories is calculated.
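The sketch below illustrates claims 5 and 6 under stated assumptions: `embed` stands in for any semantic model that maps a category name to a semantic vector representation (the claims do not name a particular model), and the preset threshold value of 0.7 is an arbitrary placeholder.

```python
import numpy as np

def cosine_similarity(a, b):
    # cosine similarity between two semantic vector representations (claim 6)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def matching_categories(original_category, other_categories, embed, threshold=0.7):
    """Return the categories whose similarity to the category of the original image
    is greater than the preset threshold (claim 5). `embed` is an assumed semantic
    model mapping a category name to a vector."""
    anchor = embed(original_category)
    return [c for c in other_categories if cosine_similarity(anchor, embed(c)) > threshold]
```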
7. The method of claim 1, wherein determining the first detection box for each original image comprises:
and for each original image, carrying out average processing on M target detection frames of the original image to obtain a first detection frame of the original image.
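A minimal sketch of the averaging in claim 7, assuming the M target detection frames of an original image are axis-aligned rectangles given as (x1, y1, x2, y2) coordinates:

```python
import numpy as np

def average_frame(frames):
    """frames: M target detection frames for one original image.
    The first detection frame is their coordinate-wise average."""
    return tuple(int(v) for v in np.mean(np.asarray(frames, dtype=np.float32), axis=0))

# e.g. average_frame([(10, 12, 100, 110), (14, 10, 96, 114)]) -> (12, 11, 98, 112)
```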
8. A data enhancement device, comprising:
the first acquisition module is used for acquiring N original images and N background image sets, wherein one original image corresponds to one background image set, N is an integer greater than 1, and any background image set comprises at least one background image;
the target detection module is used for carrying out target detection on each original image in the N original images according to M target detection networks to obtain M target detection frames of each original image, wherein M is an integer greater than 1;
The first determining module is used for determining a first detection frame of each original image, wherein the first detection frame of each original image is a detection frame determined by M target detection frames of the original image;
and the fusion module is used for fusing the region corresponding to the first detection frame in each original image and at least one background image in the background image set corresponding to the original image to obtain at least one enhanced image corresponding to each original image.
9. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps in the data enhancement method according to any one of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps in the data enhancement method according to any of claims 1 to 7.
CN202210904226.9A 2022-07-29 2022-07-29 Data enhancement method and device and electronic equipment Pending CN117541770A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210904226.9A CN117541770A (en) 2022-07-29 2022-07-29 Data enhancement method and device and electronic equipment
PCT/CN2023/107709 WO2024022149A1 (en) 2022-07-29 2023-07-17 Data enhancement method and apparatus, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210904226.9A CN117541770A (en) 2022-07-29 2022-07-29 Data enhancement method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN117541770A true CN117541770A (en) 2024-02-09

Family

ID=89705367

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210904226.9A Pending CN117541770A (en) 2022-07-29 2022-07-29 Data enhancement method and device and electronic equipment

Country Status (2)

Country Link
CN (1) CN117541770A (en)
WO (1) WO2024022149A1 (en)


Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10600171B2 (en) * 2018-03-07 2020-03-24 Adobe Inc. Image-blending via alignment or photometric adjustments computed by a neural network
CN110855876B (en) * 2018-08-21 2022-04-05 中兴通讯股份有限公司 Image processing method, terminal and computer storage medium
CN113012054B (en) * 2019-12-20 2023-12-05 舜宇光学(浙江)研究院有限公司 Sample enhancement method and training method based on matting, system and electronic equipment thereof
CN112348765A (en) * 2020-10-23 2021-02-09 深圳市优必选科技股份有限公司 Data enhancement method and device, computer readable storage medium and terminal equipment
CN112258504B (en) * 2020-11-13 2023-12-08 腾讯科技(深圳)有限公司 Image detection method, device and computer readable storage medium
CN112581522B (en) * 2020-11-30 2024-05-07 平安科技(深圳)有限公司 Method and device for detecting position of target in image, electronic equipment and storage medium
CN113688957A (en) * 2021-10-26 2021-11-23 苏州浪潮智能科技有限公司 Target detection method, device, equipment and medium based on multi-model fusion

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117809138A (en) * 2024-02-23 2024-04-02 中国电子科技集团公司第二十九研究所 Method and system for enhancing redundant detection image data set
CN117809138B (en) * 2024-02-23 2024-05-14 中国电子科技集团公司第二十九研究所 Method and system for enhancing redundant detection image data set

Also Published As

Publication number Publication date
WO2024022149A1 (en) 2024-02-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination