CN113222874B - Data enhancement method, device, equipment and storage medium applied to target detection - Google Patents

Data enhancement method, device, equipment and storage medium applied to target detection

Info

Publication number
CN113222874B
CN113222874B
Authority
CN
China
Prior art keywords
image
mask
matrix
images
area
Prior art date
Legal status
Active
Application number
CN202110610137.9A
Other languages
Chinese (zh)
Other versions
CN113222874A (en)
Inventor
韦嘉楠
周超勇
刘玉宇
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110610137.9A priority Critical patent/CN113222874B/en
Publication of CN113222874A publication Critical patent/CN113222874A/en
Application granted granted Critical
Publication of CN113222874B publication Critical patent/CN113222874B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/90Dynamic range modification of images or parts thereof
    • G06T5/94Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a data enhancement method, device, equipment and storage medium applied to target detection, wherein the method comprises the following steps: arranging four images, each having target bounding boxes, in a final image frame; randomly generating a corresponding vertical dividing line in the final image frame through each vertical overlapping area; multiplying the mask matrix of each image on either side of a vertical dividing line by that image's image matrix and superposing the results to form a one-side fusion image E and an other-side fusion image F; randomly generating a transverse dividing line in the final image frame through at least one transverse overlapping area, and multiplying the mask matrices of the one-side fusion image E and the other-side fusion image F by their corresponding image matrices and superposing the results to form the final image; and taking the product of the classification vector of each target's class and the average of the mask values within its target bounding box as the classification vector of that target in the final image. The invention reduces the loss of target bounding boxes while accomplishing data enhancement.

Description

Data enhancement method, device, equipment and storage medium applied to target detection
Technical Field
The present invention relates to the field of data enhancement technologies, and in particular, to a data enhancement method, apparatus, device, and storage medium applied to target detection.
Background
Machine learning is a multidisciplinary field and a fundamental approach to making computers intelligent; it is applied throughout the various domains of artificial intelligence. Machine learning algorithms attempt to mine the rules underlying large amounts of data and use them for prediction or classification. It should be noted that the goal of machine learning is for the learned model to perform well on "new samples", not merely on the training samples; this ability is called generalization. Obtaining models with good generalization ability through machine learning usually requires a large amount of training data.
The task of object detection is to find all objects of interest in an image and determine their locations and sizes; it is one of the core problems in the machine vision field. The training data required by an object detection model must be labeled with the classification of each object and the coordinates of its bounding box.
At present, data enhancement is widely used in computer vision: similar but different training samples are generated by making a series of random changes to training images, which enlarges the training data set and improves the generalization ability of the model. A widely used data enhancement method first applies operations such as flipping, cropping and color-gamut changes to several pictures and then splices them together. During splicing, the overlapping areas of the pictures are cut: only the content of the uppermost picture is retained in an overlapping area, and the picture covered underneath it is cut away. New training data are generated from such combinations of pictures, thereby providing a large amount of training data.
However, this data enhancement method also creates a new problem. Since the spliced pictures have already undergone operations such as flipping, cropping and color-gamut changes, cutting the overlapping areas again easily causes excessive loss of the pictures and of the bounding boxes, so that part of a target image may be lost. Existing remedies include directly deleting a bounding box that has been partly cut off, taking the cut edge as the new boundary of the bounding box, or keeping the size of the original bounding box unchanged and recalculating its coordinates in the new picture. However, these operations change the accuracy of the bounding box and thereby affect the training effect.
Disclosure of Invention
The invention provides a data enhancement method, device, electronic equipment and storage medium applied to target detection, aiming to reduce the loss of target bounding boxes while achieving data enhancement.
In order to achieve the above object, the present invention provides a data enhancement method applied to object detection, including:
arranging four images in a partially overlapping manner at the four corner positions in a final image frame, wherein each image has target bounding boxes labeling the targets therein, and each target bounding box is labeled with the class of its target and the center coordinates of the bounding box;
randomly generating a corresponding vertical dividing line in the final image frame through each vertical overlapping area, wherein a vertical overlapping area is an overlapping area formed between two images on the same side in the up-down direction in the final image frame;
multiplying the mask matrix of each image on either side of each vertical dividing line by that image's image matrix and superposing the results, so as to form a one-side fusion image E and an other-side fusion image F;
randomly generating a transverse dividing line running through the final image frame through at least one transverse overlapping area, and multiplying the mask matrices of the one-side fusion image E and the other-side fusion image F by their corresponding image matrices and superposing the results to form a final image, wherein a transverse overlapping area is an overlapping area formed by two images on the same side in the left-right direction in the final image frame;
updating the center coordinates of each target bounding box in the final image frame, and taking the product of the classification vector of each target's class and the average of the mask values within its target bounding box as the classification vector of that target in the final image.
Optionally, the mask matrix includes an own image area mask and a non-own image area mask. The own image area mask means that, within an overlapping area, the mask value decreases gradually from a first set value at the edge of one image to a second set value at the dividing line, and from the second set value at the dividing line to 0 at the edge of the other image; in non-overlapping areas the mask value is set to the first set value.
The non-own image area mask refers to a mask filled with the value 0 everywhere outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line; the own image area mask and the non-own image area mask together form the mask matrix of the image.
The image matrix comprises own image area elements and non-own image area elements, wherein the own image area elements are the element values (pixels) of the image itself, and the non-own image area elements are values of 0 filled outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line.
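For illustration, the following is a minimal Python/NumPy sketch of the one-dimensional mask profile described above for two images fused left-right, assuming a first set value of 1, a second set value of 0.5, and hypothetical widths and dividing-line column:

    import numpy as np

    # Hypothetical geometry: frame width 10, image A occupies columns 0..6,
    # image B occupies columns 4..9, dividing line at column 5.
    frame_w, w_a, w_b, split_col = 10, 7, 6, 5
    overlap_left, overlap_right = frame_w - w_b, w_a   # overlap spans columns 4..6
    cols = np.arange(frame_w)

    # Profile of image A: 1 in its non-overlapping part, 1 -> 0.5 up to the
    # dividing line, 0.5 -> 0 up to A's right edge, 0 outside A's own area.
    mask_a = np.interp(cols, [0, overlap_left, split_col, overlap_right - 1],
                             [1.0, 1.0, 0.5, 0.0])
    mask_a[overlap_right:] = 0.0

    # Profile of image B: the mirror image, 0 -> 0.5 -> 1 across the overlap.
    mask_b = np.interp(cols, [overlap_left, split_col, overlap_right - 1, frame_w - 1],
                             [0.0, 0.5, 1.0, 1.0])
    mask_b[:overlap_left] = 0.0

    print(np.round(mask_a + mask_b, 2))   # sums to 1 at every column

Repeating this profile along every row of the overlap gives the two-dimensional own image area masks used below.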
Optionally, each image is pre-processed by at least one of flipping, scaling, and tone transforming before being placed at the four corner locations in the final image frame.
Optionally, at least one corner of each of the four images is aligned with a corresponding corner of the final image frame.
Optionally, the classification vector is represented by a one-hot vector.
Optionally, the image matrix A_e of the one-side fusion image E is obtained by the following formula:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
where M_a and A_a are respectively the mask matrix and the image matrix of one image in the one-side fusion image E;
M_b and A_b are the mask matrix and the image matrix of the other image in the one-side fusion image E.
The image matrix A_f of the other-side fusion image F is obtained by the following formula:
A_f = M_c ⊙ A_c + M_d ⊙ A_d
where M_c and A_c are respectively the mask matrix and the image matrix of one image in the other-side fusion image F;
M_d and A_d are the mask matrix and the image matrix of the other image in the other-side fusion image F.
The image matrix A_mix of the final image is obtained by the following formula:
A_mix = M_e ⊙ A_e + M_f ⊙ A_f
where M_e is the mask matrix of the one-side fusion image E;
M_f is the mask matrix of the other-side fusion image F.
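For illustration, the following is a minimal NumPy sketch of how the three formulas combine; the matrices here are random placeholders standing in for the gradient masks and image matrices, chosen only so that the element-wise products and sums can be executed:

    import numpy as np

    rng = np.random.default_rng(0)
    shape = (4, 6)                       # toy size of the common envelope rectangle

    # Placeholder image matrices and complementary masks for the two top images
    A_a, A_b = rng.random(shape), rng.random(shape)
    M_a = rng.random(shape)
    M_b = 1.0 - M_a                      # stand-in for the ramp masks
    A_e = M_a * A_a + M_b * A_b          # one-side fusion image E

    # Same construction for the other-side fusion image F
    A_c, A_d = rng.random(shape), rng.random(shape)
    M_c = rng.random(shape)
    M_d = 1.0 - M_c
    A_f = M_c * A_c + M_d * A_d

    # Final image: fuse E and F with masks M_e, M_f across the transverse line
    M_e = rng.random(shape)
    M_f = 1.0 - M_e
    A_mix = M_e * A_e + M_f * A_f
    print(A_mix.shape)

Here the NumPy operator * plays the role of ⊙ (element-wise multiplication).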
Optionally, the first set value is 1, and the second set value is 0.5.
The invention also provides a data enhancement device applied to target detection, which comprises:
an image arrangement module, configured to arrange four images in a partially overlapping manner at the four corner positions in a final image frame, wherein each image has target bounding boxes labeling the targets therein, and each target bounding box is labeled with the class of its target and the center coordinates of the bounding box;
a dividing line generating module, configured to randomly generate a corresponding vertical dividing line in the final image frame through each vertical overlapping area, where a vertical overlapping area refers to an overlapping area formed between two images on the same side in the up-down direction in the final image frame, and to randomly generate a transverse dividing line in the final image frame through at least one transverse overlapping area, where a transverse overlapping area refers to an overlapping area formed by two images on the same side in the left-right direction in the final image frame;
a first image fusion module, configured to multiply the mask matrix of each image on either side of each vertical dividing line by that image's image matrix and superpose the results, so as to form a one-side fusion image E and an other-side fusion image F;
a second image fusion module, configured to multiply the mask matrices of the one-side fusion image E and the other-side fusion image F by their corresponding image matrices and superpose the results to form a final image;
and a target updating module, configured to update the center coordinates of the target bounding boxes in the final image frame, and to take the product of the classification vector of each target's class and the average of the mask values within its bounding box as the classification vector of that target in the final image.
The invention also provides an electronic device comprising:
at least one processor; and,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a data enhancement method for object detection as described above.
The present invention also provides a computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements a data enhancement method as described above for application to object detection.
According to the invention, the arranged images are re-partitioned by the vertical dividing lines and the transverse dividing lines, and the overlapping areas are then fused with different weights. Because the multiple images are fused and spliced in the overlapping areas with gradually changing weights, the loss of target bounding boxes is effectively reduced while data enhancement of the images used for training the target detection model is achieved.
Drawings
FIG. 1 is a flowchart illustrating an embodiment of a data enhancement method for target detection according to the present invention;
fig. 2a is a schematic layout view of an image A, B, C, D of a first embodiment provided by the present invention;
fig. 2b is a schematic diagram of the fusion of the image A, B of the first embodiment into a single-side fusion image E;
fig. 2c is a schematic diagram showing the fusion of the image C, D of the first embodiment into the other side fusion image F according to the present invention;
fig. 3a is a schematic layout view of an image A, B, C, D of a second embodiment provided by the present invention;
FIG. 3b is a schematic view of a parting line according to a second embodiment of the present invention;
fig. 4a is a schematic layout view of an image A, B, C, D of a third embodiment provided by the present invention;
FIG. 4b is a schematic view of a third embodiment of a parting line according to the present invention;
Fig. 5a is a schematic layout view of an image A, B, C, D of a fourth embodiment provided by the present invention;
FIG. 5b is a schematic view of a fourth embodiment of a parting line according to the present invention;
fig. 6 is a schematic layout view of an image A, B, C, D of a fifth embodiment provided by the present invention;
FIG. 7 is a schematic block diagram illustrating an embodiment of a data enhancement device for object detection according to the present invention;
FIG. 8 is a schematic structural diagram of an embodiment of an electronic device implementing a data enhancement method applied to object detection according to the present invention;
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Referring to fig. 1, a flow chart of a data enhancement method applied to object detection according to an embodiment of the present invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
First embodiment
In this embodiment, the data enhancement method applied to target detection includes:
and S10, arranging four images at four corner positions in a final image frame in a partially overlapped mode, wherein each image has a target boundary frame marked with a target therein, and the target boundary frame is marked with the belonging classification of the target and the center coordinate of the target boundary frame.
As shown in fig. 2a, the final image frame is rectangular, with a width W and a height h. The four corner positions mean that the four preprocessed images are placed at the upper-left, upper-right, lower-left and lower-right positions of the final image frame, and preferably one corner of each image is aligned with the corresponding corner of the final image frame, so that the final fusion and splicing forms the final image within the final image frame. Of course, the invention does not exclude that an image may have several corners aligned with corresponding corners of the final image frame; the calculation method is the same, and only the single-corner-alignment case is described below as an example. In fig. 2a the first image is at the top left, the second at the top right, the third at the bottom left and the fourth at the bottom right. For convenience, the four images are denoted A, B, C and D, and the widths and heights of their image areas are w_a, w_b, w_c, w_d and h_a, h_b, h_c, h_d respectively. For clarity, in fig. 2a the left-hand images are outlined with a dash-dot line and the right-hand images with a two-dot-dash line. It can be seen that the right side of image A overlaps the left side of image B, the lower side of image A overlaps both images C and D, the lower side of image B does not overlap images C and D, and the upper edges of images C and D are flush.
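For illustration, the following is a minimal Python sketch of this corner placement under hypothetical frame and image sizes; it computes the column extent of each vertical overlapping area that a dividing line must pass through:

    W, H = 640, 480                                    # hypothetical final-frame size
    sizes = {"A": (400, 300), "B": (380, 280),         # (width, height) of each image
             "C": (360, 260), "D": (420, 270)}

    # Top-left corner of each image when pinned to its corner of the frame
    origins = {
        "A": (0, 0),                                   # upper left
        "B": (W - sizes["B"][0], 0),                   # upper right
        "C": (0, H - sizes["C"][1]),                   # lower left
        "D": (W - sizes["D"][0], H - sizes["D"][1]),   # lower right
    }

    def x_overlap(left, right):
        # Column range shared by a left image and a right image (vertical overlap)
        lo = origins[right][0]
        hi = origins[left][0] + sizes[left][0]
        return (lo, hi) if lo < hi else None

    print("vertical overlap of A and B (columns):", x_overlap("A", "B"))
    print("vertical overlap of C and D (columns):", x_overlap("C", "D"))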
Each image is labeled with targets; each target has its class and a target bounding box, and the center coordinates of each target bounding box are already labeled in the image. Object detection differs from image classification in that image classification only needs to assign the image to a class, whereas object detection must not only classify each target but also locate it, i.e. obtain the position information of the target in the image, draw the target out with a target bounding box, and give the center coordinates of that bounding box. Data enhancement of the images therefore requires the position information of each target's bounding box to be given anew.
Step S20, through each vertical overlapping area, randomly generating a corresponding vertical dividing line in the final image frame. The vertical overlap region refers to an overlap region formed between images disposed left and right.
As shown in fig. 2a, images A and B form a vertical overlapping area, and images C and D also form a vertical overlapping area. A single common vertical dividing line may be provided through both vertical overlapping areas; the vertical dotted line in fig. 2a is this vertical dividing line, and its column coordinate is W_mixed. It is of course also possible to provide a separate vertical dividing line for each area, as will be described further below.
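For illustration, the following is a minimal sketch, under assumed overlap extents, of drawing the column coordinate W_mixed of a common vertical dividing line at random from the columns shared by both vertical overlapping areas:

    import numpy as np

    # Column ranges of the two vertical overlapping areas (hypothetical values,
    # e.g. from the placement sketch above)
    ab_overlap = (260, 400)      # overlap of images A and B
    cd_overlap = (220, 360)      # overlap of images C and D

    # A single common vertical dividing line must lie inside both ranges;
    # if the ranges did not intersect, one line per area would be used instead.
    lo = max(ab_overlap[0], cd_overlap[0])
    hi = min(ab_overlap[1], cd_overlap[1])
    W_mixed = np.random.randint(lo, hi)      # random column of the dividing line
    print(W_mixed)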
Step S30, multiplying the mask matrix of each image on either side of each vertical dividing line by that image's image matrix and superposing the results to form a one-side fusion image E and an other-side fusion image F.
The mask matrix comprises an own image area mask and a non-own image area mask. The own image area mask means that, within an overlapping area, the mask value decreases gradually from a first set value at the edge of one image to a second set value at the dividing line, and from the second set value at the dividing line to 0 at the edge of the other image; in non-overlapping areas the mask value is set to the first set value. The first set value is preferably 1 and the second set value is preferably 0.5.
The non-own image area mask is a mask filled with the value 0 everywhere outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line; the own image area mask and the non-own image area mask together form the mask matrix of the image.
The image matrix comprises own image area elements and non-own image area elements: the own image area elements are the element values of the image itself, and the non-own image area elements are values of 0 filled outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line. The elements of the image are its pixels.
Taking the left-right fusion of the two images A and B as an example, with image A on the left and image B on the right, the fusion effect to be realized is as follows:
in the non-overlapping areas of image A and image B, the corresponding areas are retained as they are, i.e. the mask values at the corresponding positions are set to 1;
in the overlapping area of image A and image B, on the vertical dividing line at column W_mixed, images A and B each carry a 50% weight, i.e. the mask values of both A and B are 0.5.
For image A, the own image area mask is constructed as follows: from the left edge of image A to the left edge of image B, and in the part of image A below the vertical overlapping area, the area is non-overlapping and the mask value is 1; from the left edge of image B to column W_mixed the mask value decreases gradually from 1 to 0.5; and from column W_mixed to the right edge of image A the mask value decreases gradually from 0.5 to 0.
This yields the own image area mask of image A. The mask is then filled with the value 0 in the region of the common minimum envelope rectangle of image A and image B that lies outside image A's own area, giving a mask matrix M_a whose size matches the common minimum envelope rectangle of image A and image B.
For image B, the mask value is 1 from the right edge of image B to the right edge of image A; from the right edge of image A to column W_mixed the mask value decreases gradually from 1 to 0.5; and from column W_mixed to the left edge of image B the mask value decreases gradually from 0.5 to 0.
This yields the own image area mask of image B. The mask is then filled with the value 0 in the region of the common minimum envelope rectangle of image A and image B that lies outside image B's own area, giving the mask matrix M_b of image B.
For image A, the own image area elements are the pixel values of the area covered by the image, and all areas of the common minimum envelope rectangle of image A and image B outside image A's own area are set to 0, giving the image matrix A_a. For image B, likewise, the own image area elements take the pixel values of the area covered by the image, and all areas of the common minimum envelope rectangle outside image B's own area are set to 0, giving the image matrix A_b.
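For illustration, the following is a minimal NumPy sketch, under assumed image sizes, of assembling the mask matrices M_a, M_b and the image matrices A_a, A_b over the common minimum envelope rectangle of images A and B and fusing them into E (grayscale example; for RGB the same mask is applied to each channel):

    import numpy as np

    def build_ab(img_a, img_b, frame_w, split_col):
        # img_a is pinned to the upper-left and img_b to the upper-right of a
        # frame of width frame_w; both are fused at column split_col.
        h_a, w_a = img_a.shape
        h_b, w_b = img_b.shape
        H = max(h_a, h_b)                          # height of the envelope rectangle
        overlap_left, overlap_right = frame_w - w_b, w_a

        cols = np.arange(frame_w)
        ramp_a = np.interp(cols, [0, overlap_left, split_col, overlap_right - 1],
                                 [1.0, 1.0, 0.5, 0.0])
        ramp_b = np.interp(cols, [overlap_left, split_col, overlap_right - 1, frame_w - 1],
                                 [0.0, 0.5, 1.0, 1.0])

        M_a = np.zeros((H, frame_w)); A_a = np.zeros((H, frame_w))
        M_a[:h_a, :w_a] = ramp_a[:w_a]             # ramp inside A's own area
        M_a[h_b:h_a, :w_a] = 1.0                   # part of A below the overlap: mask 1
        A_a[:h_a, :w_a] = img_a

        M_b = np.zeros((H, frame_w)); A_b = np.zeros((H, frame_w))
        M_b[:h_b, overlap_left:] = ramp_b[overlap_left:]
        A_b[:h_b, overlap_left:] = img_b

        A_e = M_a * A_a + M_b * A_b                # one-side fusion image E
        return M_a, A_a, M_b, A_b, A_e

    a = np.full((5, 7), 0.2)                       # hypothetical image A
    b = np.full((4, 6), 0.8)                       # hypothetical image B
    M_a, A_a, M_b, A_b, A_e = build_ab(a, b, frame_w=10, split_col=5)
    print(np.round(A_e, 2))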
The mask matrices M_a and M_b have the same size as the image matrices of images A and B, so element-wise multiplication can be applied. The one-side fusion image E obtained after the gradual fusion of image A and image B is thus given by the formula:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
where ⊙ denotes element-wise multiplication, A_a and A_b are the image matrices of image A and image B respectively, and A_e is the image matrix of image E.
The two images A and B are thus fused and spliced into the one-side fusion image E, whose size matches the common minimum envelope rectangle of image A and image B, as shown in fig. 2b.
The mask matrices and image matrices of image C and image D are obtained in the same way, giving the mask matrix M_c and image matrix A_c of image C and the mask matrix M_d and image matrix A_d of image D.
Image C and image D are fused and spliced to obtain the other-side fusion image F, whose size matches the common minimum envelope rectangle of image C and image D, as shown in fig. 2c; the image matrix of the obtained image F is A_f = M_c ⊙ A_c + M_d ⊙ A_d.
Step S40, randomly generating, through the transverse overlapping area, a transverse dividing line with row coordinate h_mixed in the final image frame, and multiplying the mask matrices of the one-side fusion image E and the other-side fusion image F by their corresponding image matrices and superposing the results to form the final image. A transverse overlapping area is an overlapping area formed between images arranged one above the other.
In the transverse overlapping area, the mask value decreases gradually from the first set value at the edge of the one-side fusion image E to the second set value at the transverse dividing line, and from the second set value at the transverse dividing line to 0 at the edge of the other-side fusion image F; in the non-overlapping areas the mask value is the first set value. This gives the own image area masks. The first set value is preferably 1 and the second set value is preferably 0.5.
Outside the own image area within the common minimum envelope rectangle of images E and F (i.e. the final image frame), the mask is filled with the value 0 to obtain the non-own image area mask; the own image area mask and the non-own image area mask together form the mask matrix of the image, giving the mask matrix M_e of the one-side fusion image E.
Likewise, the mask matrix M_f of the other-side fusion image F is obtained.
After the mask matrices M_e and M_f are obtained, they are multiplied element-wise by the corresponding image matrices A_e and A_f and the results are superposed, giving the final image matrix A_mix = M_e ⊙ A_e + M_f ⊙ A_f.
Step S50, updating the coordinates of the target bounding boxes of each image in the final image frame, and taking the product of the classification vector of each target's class and the average of the mask values within its bounding box as the classification vector of that target in the final image.
Specifically, each of the images A, B, C and D has target bounding boxes for target detection, but each bounding box is expressed in the coordinates of its original image, and its actual coordinates change after stitching with the other images; therefore, after the final image is formed, each bounding box must be converted into its coordinate representation in the final image.
Typically the position of a target bounding box is expressed as (P_xi, P_yi, P_wi, P_hi), where P_xi is the X-axis coordinate of the center point of the bounding box in the original image, P_yi is the Y-axis coordinate of the center point in the original image, P_wi is the width of the bounding box and P_hi is its height. For example, taking the upper-left corner of the final image as the origin, if the position of a target bounding box in the original image B is (P_xi-B, P_yi-B, P_wi-B, P_hi-B), then its position in the final image after fusion and splicing is ((W − w_b + P_xi-B), P_yi-B, P_wi-B, P_hi-B). Likewise, the new coordinates of every target bounding box can be obtained by calculation. Since the coordinate conversion is simple, it is not described in further detail here.
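For illustration, the following is a minimal sketch of this coordinate conversion under hypothetical sizes; the helper to_final_frame is an assumed name, and the offset (W − w_b, 0) corresponds to image B being pinned to the upper-right corner:

    def to_final_frame(box, origin):
        # box = (P_x, P_y, P_w, P_h): centre coordinates and size in the original
        # image; origin = (x0, y0): top-left corner of that image in the final frame.
        P_x, P_y, P_w, P_h = box
        x0, y0 = origin
        return (x0 + P_x, y0 + P_y, P_w, P_h)

    # Hypothetical values: final frame of width W = 640, image B of width
    # w_b = 380 pinned to the upper-right corner, i.e. its origin is (W - w_b, 0)
    W, w_b = 640, 380
    box_in_B = (120, 90, 60, 40)
    print(to_final_frame(box_in_B, (W - w_b, 0)))   # -> (380, 90, 60, 40)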
Further, each image is pre-processed by at least one of flipping, scaling and tone transformation before being placed at its position in the final image frame. Specifically, flipping refers to at least one of horizontal flipping and vertical flipping of the image. Taking an image in portrait orientation as an example, horizontal flipping mirrors the image about a vertical axis in the image plane, and vertical flipping mirrors it about a horizontal axis in the image plane.
Specifically, scaling refers to enlarging or reducing the image by a certain proportion.
In addition, the tone transformation includes operations such as changing the brightness and saturation of the image, which are not described in detail herein.
Further, the classification vector may be expressed in the form of a one-hot vector. For example, if the average value of the mask matrix over the region within a certain target bounding box in image A is 0.8, and the classification vector corresponding to that bounding box is represented by the one-hot vector (0, 1, 0, 0), then the classification vector after fusion and splicing is (0, 0.8, 0, 0).
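For illustration, the following is a minimal NumPy sketch, with a hypothetical mask layout, reproducing the (0, 1, 0, 0) to (0, 0.8, 0, 0) example above:

    import numpy as np

    # Hypothetical mask matrix of the final image and a target bounding box in it
    mask = np.full((100, 100), 1.0)
    mask[:, 70:] = 0.5                    # part of the frame carries reduced weight
    x0, y0, x1, y1 = 40, 20, 90, 60       # box covering columns 40..89, rows 20..59

    mean_mask = mask[y0:y1, x0:x1].mean() # average mask value inside the box: 0.8
    one_hot = np.array([0.0, 1.0, 0.0, 0.0])
    print(mean_mask * one_hot)            # -> [0.  0.8 0.  0. ]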
It should be noted that in step S30 and step S40 the images on the two sides of each vertical dividing line are fused first, and the resulting images on the two sides of the transverse dividing line are then fused together. This order is only exemplary: the transverse dividing line may instead be generated first and the images on either side of each transverse dividing line fused respectively, after which the vertical dividing line is formed and the images on the two sides of the vertical dividing line are fused together.
Second embodiment
Whereas figs. 2a to 2c show the case of a single vertical dividing line, the vertical overlapping areas may also be arranged as shown in figs. 3a and 3b. Fig. 3a shows another arrangement of the images A, B, C and D, and the section lines in fig. 3b show the vertical and transverse overlapping areas that are formed: there are two vertical overlapping areas and one transverse overlapping area. Since a single vertical dividing line cannot pass through both vertical overlapping areas, two vertical dividing lines are provided. The images on the two sides of each vertical dividing line are fused, a transverse dividing line is then generated, and the fused images on the two sides of the transverse dividing line are fused. The specific fusion method is the same as that of the first embodiment and is not repeated here.
Third embodiment
Fig. 4a shows another arrangement of the images A, B, C and D; the vertical and transverse overlapping areas that are formed can be seen from the section lines in fig. 4b. There are two vertical overlapping areas and two transverse overlapping areas, and a transverse dividing line cannot pass through both transverse overlapping areas at the same time, so the transverse dividing line passes through only one transverse overlapping area. The specific image fusion method is the same as that of the first embodiment and is not repeated here.
Fourth embodiment
Fig. 5a shows a further arrangement of the images A, B, C and D; the vertical and transverse overlapping areas that are formed can be seen from the section lines in fig. 5b. There are again two vertical overlapping areas and two transverse overlapping areas, and in this case the transverse dividing line can pass through both transverse overlapping areas at the same time, or through only one of them. The specific image fusion method is the same as that of the first embodiment and is not repeated here.
Fifth embodiment
In addition, the above description assumes that a corner of each of the images A, B, C and D is aligned with a corner of the final image frame, but the present application is not limited to this arrangement; the corners need not be aligned. For example, in fig. 6 the corner of image C is not exactly aligned with the lower-left corner of the frame. Image A and image B are still fused into the one-side fusion image E, image C and image D are fused into the other-side fusion image F, and the fusion is carried out according to the method described above. The specific image fusion method is the same as that of the first embodiment and is not repeated here.
Fig. 7 is a schematic functional block diagram of an embodiment of a data enhancement device for object detection according to the present invention.
The data enhancement device 100 for object detection of the present invention may be installed in an electronic apparatus. Depending on the implemented functions, the data enhancement device 100 applied to object detection may include an image arrangement module 101, a segmentation line generation module 102, a first image fusion module 103, a second image fusion module 104, and an object update module 105, where the modules refer to a series of computer program segments capable of being executed by a processor of an electronic device and of performing a fixed function, and stored in a memory of the electronic device.
In the present embodiment, the functions concerning the respective modules are as follows:
wherein the image arrangement module 101 is configured to arrange four images in a partially overlapping manner at four corner positions in a final image frame, wherein each image has a target bounding box labeling a target therein, the target bounding box being labeled with the belonging class of the target, and the center coordinates of the target bounding box.
As shown in fig. 2a, the final image frame is rectangular, with a width W and a height h. The four corner positions mean that the four preprocessed images are placed at the upper-left, upper-right, lower-left and lower-right positions in the final image frame, and the corners of the images are preferably aligned with the corners of the final image frame, so that the final fusion and splicing forms the final image within the final image frame. In fig. 2a the first image is at the top left, the second at the top right, the third at the bottom left and the fourth at the bottom right. For convenience, the four images are denoted A, B, C and D, and the widths and heights of their image areas are w_a, w_b, w_c, w_d and h_a, h_b, h_c, h_d respectively. For clarity, in fig. 2a the left-hand images are outlined with a dash-dot line and the right-hand images with a two-dot-dash line. It can be seen that the right side of image A overlaps the left side of image B, the lower side of image A overlaps both images C and D, the lower side of image B does not overlap images C and D, and the upper edges of images C and D are flush.
Each image is labeled with targets; each target has its class and a target bounding box, and the center coordinates of each target bounding box are already labeled in the image. Object detection differs from image classification in that image classification only needs to assign the image to a class, whereas object detection must not only classify each target but also locate it, i.e. obtain the position information of the target in the image, draw the target out with a target bounding box, and give the center coordinates of that bounding box. Data enhancement of the images therefore requires the position information of each target's bounding box to be given anew.
The dividing line generating module 102 is configured to randomly generate a corresponding vertical dividing line in the final image frame through each vertical overlapping area, and to generate a transverse dividing line through at least one transverse overlapping area. A vertical overlapping area is an overlapping area formed between images arranged left and right in the final image frame; a transverse overlapping area is an overlapping area formed between images arranged one above the other in the final image frame.
If, as in fig. 2a, images A and B form a vertical overlapping area and images C and D also form a vertical overlapping area, a single common vertical dividing line may be provided through both vertical overlapping areas; the vertical dotted line in fig. 2a is this vertical dividing line, and its column coordinate is W_mixed. It is of course also possible to provide a separate vertical dividing line for each area, as described in the foregoing embodiments.
The first image fusion module 103 is configured to multiply and superimpose the mask matrices of the images on both sides of each vertical dividing line with the image matrix of the image to form a one-side fusion image E and another-side fusion image F.
The mask matrix comprises an own image area mask and a non-own image area mask. The own image area mask means that, within an overlapping area, the mask value decreases gradually from a first set value at the edge of one image to a second set value at the dividing line, and from the second set value at the dividing line to 0 at the edge of the other image; in non-overlapping areas the mask value is set to the first set value. The first set value is preferably 1 and the second set value is preferably 0.5.
The non-own image area mask is a mask filled with the value 0 everywhere outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line; the own image area mask and the non-own image area mask together form the mask matrix of the image.
The image matrix comprises own image area elements and non-own image area elements: the own image area elements are the element values of the image itself, and the non-own image area elements are values of 0 filled outside the image's own area within the common minimum envelope rectangle of the images on the two sides of the dividing line. The elements of the image are its pixels.
Taking the left-right fusion of the two images A and B as an example, with image A on the left and image B on the right, the fusion effect to be realized is as follows:
in the non-overlapping areas of image A and image B, the corresponding areas are retained as they are, i.e. the mask values at the corresponding positions are set to 1;
in the overlapping area of image A and image B, on the vertical dividing line at column W_mixed, images A and B each carry a 50% weight, i.e. the mask values of both A and B are 0.5.
For image A, the own image area mask is constructed as follows: from the left edge of image A to the left edge of image B, and in the part of image A below the vertical overlapping area, the area is non-overlapping and the mask value is 1; from the left edge of image B to column W_mixed the mask value decreases gradually from 1 to 0.5; and from column W_mixed to the right edge of image A the mask value decreases gradually from 0.5 to 0.
This yields the own image area mask of image A. The mask is then filled with the value 0 in the region of the common minimum envelope rectangle of image A and image B that lies outside image A's own area, giving a mask matrix M_a whose size matches the common minimum envelope rectangle of image A and image B.
For image B, the mask value is 1 from the right edge of image B to the right edge of image A; from the right edge of image A to column W_mixed the mask value decreases gradually from 1 to 0.5; and from column W_mixed to the left edge of image B the mask value decreases gradually from 0.5 to 0.
This yields the own image area mask of image B. The mask is then filled with the value 0 in the region of the common minimum envelope rectangle of image A and image B that lies outside image B's own area, giving the mask matrix M_b of image B.
For image A, the own image area elements are the pixel values of the area covered by the image, and all areas of the common minimum envelope rectangle of image A and image B outside image A's own area are set to 0, giving the image matrix A_a. For image B, likewise, the own image area elements take the pixel values of the area covered by the image, and all areas of the common minimum envelope rectangle outside image B's own area are set to 0, giving the image matrix A_b.
The mask matrices M_a and M_b have the same size as the image matrices of images A and B, so element-wise multiplication can be applied. The one-side fusion image E obtained after the gradual fusion of image A and image B is thus given by the formula:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
where ⊙ denotes element-wise multiplication, A_a and A_b are the image matrices of image A and image B respectively, and A_e is the image matrix of image E.
The two images A and B are thus fused and spliced into the one-side fusion image E, whose size matches the common minimum envelope rectangle of image A and image B, as shown in fig. 2b.
The mask matrices and image matrices of image C and image D are obtained in the same way, giving the mask matrix M_c and image matrix A_c of image C and the mask matrix M_d and image matrix A_d of image D.
Image C and image D are fused and spliced to obtain the other-side fusion image F, whose size matches the common minimum envelope rectangle of image C and image D, as shown in fig. 2c; the image matrix of the obtained image F is A_f = M_c ⊙ A_c + M_d ⊙ A_d.
The second image fusion module 104 is configured to multiply the mask matrices of the one-side fusion image E and the other-side fusion image F by their corresponding image matrices and superpose the results to form the final image. A transverse overlapping area is an overlapping area formed between the upper and lower images.
In the transverse overlapping area, the mask value decreases gradually from the first set value at the edge of the one-side fusion image E to the second set value at the transverse dividing line, and from the second set value at the transverse dividing line to 0 at the edge of the other-side fusion image F; in the non-overlapping areas the mask value is the first set value. This gives the own image area masks. The first set value is preferably 1 and the second set value is preferably 0.5.
Outside the own image area within the common minimum envelope rectangle of images E and F (i.e. the final image frame), the mask is filled with the value 0 to obtain the non-own image area mask; the own image area mask and the non-own image area mask together form the mask matrix of the image, giving the mask matrix M_e of the one-side fusion image E.
Likewise, the mask matrix M_f of the other-side fusion image F is obtained.
After the mask matrices M_e and M_f are obtained, they are multiplied element-wise by the corresponding image matrices A_e and A_f and the results are superposed, giving the final image matrix A_mix = M_e ⊙ A_e + M_f ⊙ A_f.
The target updating module 105 is configured to update the coordinates of the target bounding boxes of each image in the final image frame, and to take the product of the classification vector of each target's class and the average of the mask values within its bounding box as the classification vector of that target in the final image.
Specifically, each of the images A, B, C and D has target bounding boxes for target detection, but each bounding box is expressed in the coordinates of its original image, and its actual coordinates change after stitching with the other images; therefore, after the final image is formed, each bounding box must be converted into its coordinate representation in the final image.
Typically the position of a target bounding box is expressed as (P_xi, P_yi, P_wi, P_hi), where P_xi is the X-axis coordinate of the center point of the bounding box in the original image, P_yi is the Y-axis coordinate of the center point in the original image, P_wi is the width of the bounding box and P_hi is its height. For example, taking the upper-left corner of the final image as the origin, if the position of a target bounding box in the original image B is (P_xi-B, P_yi-B, P_wi-B, P_hi-B), then its position in the final image after fusion and splicing is ((W − w_b + P_xi-B), P_yi-B, P_wi-B, P_hi-B). Likewise, the new coordinates of every target bounding box can be obtained by calculation. Since the coordinate conversion is simple, it is not described in further detail here.
Further, each image is pre-processed by at least one of flipping, scaling and tone transformation before being placed at its position in the final image frame. Specifically, flipping refers to at least one of horizontal flipping and vertical flipping of the image. Taking an image in portrait orientation as an example, horizontal flipping mirrors the image about a vertical axis in the image plane, and vertical flipping mirrors it about a horizontal axis in the image plane.
Specifically, scaling refers to enlarging or reducing the image by a certain proportion.
In addition, the tone transformation includes operations such as changing the brightness and saturation of the image, which are not described in detail herein.
Further, the classification vector may be expressed in the form of a one-hot vector. For example, if the average value of the mask matrix over the region within a certain target bounding box in image A is 0.8, and the classification vector corresponding to that bounding box is represented by the one-hot vector (0, 1, 0, 0), then the classification vector after fusion and splicing is (0, 0.8, 0, 0).
It should be noted that the images on the two sides of each vertical dividing line are fused first, and the resulting images on the two sides of the transverse dividing line are then fused together. This order is only exemplary: the transverse dividing line may instead be generated first and the images on either side of each transverse dividing line fused respectively, after which the vertical dividing line is formed and the images on the two sides of the vertical dividing line are fused together.
Fig. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present invention for implementing a data enhancement method applied to object detection.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program stored in the memory 11 and executable on the processor 10, such as a data enhancement program 12 for object detection.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various types of data, such as code of a data enhancement program applied to object detection, but also for temporarily storing data that has been output or is to be output.
The processor 10 may be comprised of integrated circuits in some embodiments, for example, a single packaged integrated circuit, or may be comprised of multiple integrated circuits packaged with the same or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device; it connects the respective components of the entire electronic device using various interfaces and lines, executes programs or modules stored in the memory 11 (e.g., the data enhancement program applied to object detection, etc.), and calls data stored in the memory 11 to perform the various functions of the electronic device 1 and process the data.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 8 shows only an electronic device with components, and it will be appreciated by a person skilled in the art that the structure shown in fig. 8 does not constitute a limitation of the electronic device 1, and may comprise fewer or more components than shown, or may combine certain components, or may be arranged in different components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and optionally, the power source may be logically connected to the at least one processor 10 through a power management device, so as to implement functions of charge management, discharge management, and power consumption management through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may further comprise a network interface, which may optionally comprise a wired interface and/or a wireless interface (e.g. a Wi-Fi interface, a Bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further comprise a user interface, which may be a display, an input unit such as a keyboard, or a standard wired or wireless interface. Alternatively, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch screen, or the like. The display may also be referred to as a display screen or display unit, as appropriate, for displaying information processed in the electronic device 1 and for displaying a visual user interface.
It should be understood that the described embodiments are for illustrative purposes only and do not limit the scope of the patent application to this configuration.
The data enhancement program 12 stored in the memory 11 of the electronic device 1 and applied to object detection is a combination of a plurality of instructions, which when run in the processor 10, can implement:
arranging four images in a partially overlapping manner at the four corner positions in a final image frame, wherein each image has target bounding boxes labeling the targets therein, and each target bounding box is labeled with the class of its target and the center coordinates of the bounding box;
randomly generating a corresponding vertical dividing line in the final image frame through each vertical overlapping area, wherein a vertical overlapping area is an overlapping area formed between two images on the same side in the up-down direction in the final image frame;
multiplying the mask matrix of each image on either side of each vertical dividing line by that image's image matrix and superposing the results, so as to form a one-side fusion image E and an other-side fusion image F;
randomly generating a transverse dividing line running through the final image frame through at least one transverse overlapping area, and multiplying the mask matrices of the one-side fusion image E and the other-side fusion image F by their corresponding image matrices and superposing the results to form a final image, wherein a transverse overlapping area is an overlapping area formed by two images on the same side in the left-right direction in the final image frame;
updating the center coordinates of each target bounding box in the final image frame, and taking the product of the classification vector of each target's class and the average of the mask values within its target bounding box as the classification vector of that target in the final image.
The specific operation flow is shown in fig. 1; for details, reference may be made to the foregoing description of the data enhancement method applied to target detection, which is not repeated here.
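As an informal illustration of the above flow (not part of the patent disclosure), the following minimal NumPy sketch fuses four placeholder images with soft masks along randomly chosen dividing lines. The function names, the frame size, the simple linear ramp that spans the whole frame in place of the edge-to-dividing-line gradient, and the random stand-in images are all assumptions made for brevity.

```python
import numpy as np

def soft_mask(shape, split, axis, near_side=True, first=1.0, second=0.5):
    """Hypothetical helper: a 1-D ramp broadcast over the frame that falls from
    `first` to `second` at the dividing line and on to 0 (or the reverse when
    near_side=False). The patent defines the ramp between image edges and the
    dividing line; here it spans the whole frame for simplicity."""
    length = shape[axis]
    ramp = np.interp(np.arange(length),
                     [0, split, length - 1],
                     [first, second, 0.0] if near_side else [0.0, second, first])
    return ramp[:, None] * np.ones(shape) if axis == 0 else ramp[None, :] * np.ones(shape)

def fuse_pair(img_a, img_b, split, axis):
    """Multiply each (zero-padded) image by its mask and superpose the results."""
    m_a = soft_mask(img_a.shape, split, axis, near_side=True)
    m_b = soft_mask(img_b.shape, split, axis, near_side=False)
    return m_a * img_a + m_b * img_b

# Usage sketch: top pair -> E, bottom pair -> F, then E and F -> final image.
H, W = 640, 640
rng = np.random.default_rng(0)
imgs = [rng.random((H, W)) for _ in range(4)]      # stand-ins for the 4 padded images
E = fuse_pair(imgs[0], imgs[1], rng.integers(200, 440), axis=1)   # vertical dividing line
F = fuse_pair(imgs[2], imgs[3], rng.integers(200, 440), axis=1)   # vertical dividing line
final = fuse_pair(E, F, rng.integers(200, 440), axis=0)           # transverse dividing line
```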
Further, the modules integrated in the electronic device 1, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, or a Read-Only Memory (ROM).
In the several embodiments provided by the present invention, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the modules is merely a logical function division, and other manners of division may be adopted in actual implementation.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.
Finally, it should be noted that the above-mentioned embodiments are merely for illustrating the technical solution of the present invention and not for limiting the same, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made to the technical solution of the present invention without departing from the spirit and scope of the technical solution of the present invention.

Claims (8)

1. A data enhancement method for target detection, comprising:
arranging four images at four corner positions in a final image frame in a partially overlapped manner, wherein each image has a target bounding box marking a target therein, and each target bounding box is labelled with the classification to which the target belongs and with the center coordinates of the target bounding box;
randomly generating a corresponding vertical dividing line in the final image frame through each vertical overlapping area, wherein a vertical overlapping area is an overlapping area formed between two images on the same side in the up-down direction in the final image frame;
multiplying the mask matrices of the images on the two sides of each vertical dividing line by the image matrices of those images respectively and superposing the results, so as to form a one-side fusion image E and an other-side fusion image F;
randomly generating, through at least one transverse overlapping area, a transverse dividing line penetrating through the final image frame, and multiplying the mask matrices of the one-side fusion image E and the other-side fusion image F by the corresponding image matrices respectively and superposing the results to form a final image, wherein a transverse overlapping area is an overlapping area formed by two images on the same side in the left-right direction in the final image frame;
updating the center coordinates of each target bounding box in the final image frame, and taking the product of the classification vector of the classification to which each target belongs and the average value of the mask values within its target bounding box as the classification vector of each target of the final image,
wherein the mask matrix comprises an own-image-area mask and a non-own-image-area mask; in the own-image-area mask, the mask values decrease gradually from a first set value at the edge of one image to a second set value at the dividing line, and decrease gradually from the second set value at the dividing line to 0 at the edge of the other image, while the mask values in the remainder of the own image area are set to the first set value,
the non-own-image-area mask refers to a mask in which the area other than the own image area within the common minimum envelope rectangle of the images on the two sides of the dividing line is filled with 0 values, and the own-image-area mask and the non-own-image-area mask together form the mask matrix of the image,
wherein the image matrix comprises own-image-area elements and non-own-image-area elements, the own-image-area elements being the element values of the own image, and the non-own-image-area elements being the 0 values filled outside the own image area within the common minimum envelope rectangle of the images on the two sides of the dividing line,
the formula for obtaining the image matrix A_e of the one-side fusion image E is as follows:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
wherein M_a and A_a are respectively the mask matrix and the image matrix of one image in the one-side fusion image E;
M_b and A_b are respectively the mask matrix and the image matrix of the other image in the one-side fusion image E;
the formula for obtaining the image matrix A_f of the other-side fusion image F is as follows:
A_f = M_c ⊙ A_c + M_d ⊙ A_d
wherein M_c and A_c are respectively the mask matrix and the image matrix of one image in the other-side fusion image F;
M_d and A_d are respectively the mask matrix and the image matrix of the other image in the other-side fusion image F;
the formula for obtaining the image matrix A_mix of the final image is as follows:
A_mix = M_e ⊙ A_e + M_f ⊙ A_f
wherein M_e is the mask matrix of the one-side fusion image E;
and M_f is the mask matrix of the other-side fusion image F.
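Purely as an illustrative aside (not part of claim 1), the three formulas can be transcribed directly in NumPy. The random matrices below are placeholders for the gradient mask matrices and zero-padded image matrices defined above, and single-channel arrays are used so that the element-wise product ⊙ maps onto *.

```python
import numpy as np

H, W = 480, 640
rng = np.random.default_rng(0)
# Placeholder mask matrices (in the claim these are the gradient masks, not random values).
M_a, M_b, M_c, M_d, M_e, M_f = (rng.random((H, W)) for _ in range(6))
# Placeholder zero-padded single-channel image matrices of the four source images.
A_a, A_b, A_c, A_d = (rng.random((H, W)) for _ in range(4))

A_e = M_a * A_a + M_b * A_b      # A_e = M_a ⊙ A_a + M_b ⊙ A_b : one-side fusion image E
A_f = M_c * A_c + M_d * A_d      # A_f = M_c ⊙ A_c + M_d ⊙ A_d : other-side fusion image F
A_mix = M_e * A_e + M_f * A_f    # A_mix = M_e ⊙ A_e + M_f ⊙ A_f : final image
```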
2. The data enhancement method for target detection according to claim 1,
the images are pre-processed by at least one of flipping, scaling, and tone transforming before being arranged at the four corner locations in the final image frame.
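As an aside on the pre-processing named in claim 2, one possible implementation is sketched below; the use of OpenCV, the probability and parameter ranges, and the hue shift standing in for a tone transform are assumptions, and the corresponding bounding-box updates are omitted.

```python
import cv2
import numpy as np

def preprocess(img, rng):
    if rng.random() < 0.5:                                   # random horizontal flip
        img = cv2.flip(img, 1)
    scale = rng.uniform(0.5, 1.5)                            # random scaling
    img = cv2.resize(img, None, fx=scale, fy=scale)
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 0] = (hsv[..., 0] + rng.uniform(-10.0, 10.0)) % 180.0   # hue shift as a simple tone transform
    return cv2.cvtColor(np.clip(hsv, 0, 255).astype(np.uint8), cv2.COLOR_HSV2BGR)

rng = np.random.default_rng(0)
dummy = (rng.random((240, 320, 3)) * 255).astype(np.uint8)   # stand-in BGR image
out = preprocess(dummy, rng)
```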
3. The data enhancement method for target detection according to claim 1,
at least one corner of each of the four images is aligned with a corresponding corner of the final image frame.
4. The data enhancement method for target detection according to claim 1,
the classification vector is represented by a one-hot vector.
5. The data enhancement method for target detection according to claim 1,
the first set value is 1, and the second set value is 0.5.
6. A data enhancement device for target detection, comprising:
the image arrangement module is used for arranging four images at four corner positions in a final image frame in a partially overlapped manner, wherein each image has a target bounding box marking a target therein, and each target bounding box is labelled with the classification to which the target belongs and with the center coordinates of the target bounding box;
the dividing line generating module is used for randomly generating a corresponding vertical dividing line in the final image frame through each vertical overlapping area, wherein a vertical overlapping area is an overlapping area formed between two images on the same side in the up-down direction in the final image frame, and for randomly generating a transverse dividing line in the final image frame through at least one transverse overlapping area, wherein a transverse overlapping area is an overlapping area formed by two images on the same side in the left-right direction in the final image frame;
the first image fusion module is used for multiplying the mask matrices of the images on the two sides of each vertical dividing line by the image matrices of those images respectively and superposing the results, so as to form a one-side fusion image E and an other-side fusion image F;
the second image fusion module is used for multiplying the mask matrices of the one-side fusion image E and the other-side fusion image F by the corresponding image matrices respectively and superposing the results to form a final image;
the target updating module is used for updating the center coordinates of each target bounding box in the final image frame, and for taking the product of the classification vector of the classification to which each target belongs and the average value of the mask values within its target bounding box as the classification vector of each target of the final image,
wherein the mask matrix comprises an own-image-area mask and a non-own-image-area mask; in the own-image-area mask, the mask values decrease gradually from a first set value at the edge of one image to a second set value at the dividing line, and decrease gradually from the second set value at the dividing line to 0 at the edge of the other image, while the mask values in the remainder of the own image area are set to the first set value,
the non-own-image-area mask refers to a mask in which the area other than the own image area within the common minimum envelope rectangle of the images on the two sides of the dividing line is filled with 0 values, and the own-image-area mask and the non-own-image-area mask together form the mask matrix of the image,
wherein the image matrix comprises own-image-area elements and non-own-image-area elements, the own-image-area elements being the element values of the own image, and the non-own-image-area elements being the 0 values filled outside the own image area within the common minimum envelope rectangle of the images on the two sides of the dividing line,
the formula for obtaining the image matrix A_e of the one-side fusion image E is as follows:
A_e = M_a ⊙ A_a + M_b ⊙ A_b
wherein M_a and A_a are respectively the mask matrix and the image matrix of one image in the one-side fusion image E;
M_b and A_b are respectively the mask matrix and the image matrix of the other image in the one-side fusion image E;
the formula for obtaining the image matrix A_f of the other-side fusion image F is as follows:
A_f = M_c ⊙ A_c + M_d ⊙ A_d
wherein M_c and A_c are respectively the mask matrix and the image matrix of one image in the other-side fusion image F;
M_d and A_d are respectively the mask matrix and the image matrix of the other image in the other-side fusion image F;
the formula for obtaining the image matrix A_mix of the final image is as follows:
A_mix = M_e ⊙ A_e + M_f ⊙ A_f
wherein M_e is the mask matrix of the one-side fusion image E;
and M_f is the mask matrix of the other-side fusion image F.
7. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the data enhancement method for target detection according to any one of claims 1 to 5.
8. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the data enhancement method for target detection according to any one of claims 1 to 5.
CN202110610137.9A 2021-06-01 2021-06-01 Data enhancement method, device, equipment and storage medium applied to target detection Active CN113222874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110610137.9A CN113222874B (en) 2021-06-01 2021-06-01 Data enhancement method, device, equipment and storage medium applied to target detection


Publications (2)

Publication Number Publication Date
CN113222874A CN113222874A (en) 2021-08-06
CN113222874B 2024-02-02

Family

ID=77082418

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110610137.9A Active CN113222874B (en) 2021-06-01 2021-06-01 Data enhancement method, device, equipment and storage medium applied to target detection

Country Status (1)

Country Link
CN (1) CN113222874B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113744161B (en) * 2021-09-16 2024-03-29 北京顺势兄弟科技有限公司 Enhanced data acquisition method and device, data enhancement method and electronic equipment
CN114004840A (en) * 2021-10-29 2022-02-01 北京百度网讯科技有限公司 Image processing method, training method, detection method, device, equipment and medium
CN115187950B (en) * 2022-09-13 2022-11-22 安徽中科星驰自动驾驶技术有限责任公司 Novel balance mask secondary sampling method for deep learning image data enhancement

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464230A (en) * 2017-08-23 2017-12-12 京东方科技集团股份有限公司 Image processing method and device
CN109359684A (en) * 2018-10-17 2019-02-19 苏州大学 Fine granularity model recognizing method based on Weakly supervised positioning and subclass similarity measurement
CN111563516A (en) * 2020-07-16 2020-08-21 浙江大华技术股份有限公司 Method, terminal and storage medium for fusion display of pedestrian mask and three-dimensional scene
CN111709951A (en) * 2020-08-20 2020-09-25 成都数之联科技有限公司 Target detection network training method and system, network, device and medium
CN112330574A (en) * 2020-11-30 2021-02-05 深圳市慧鲤科技有限公司 Portrait restoration method and device, electronic equipment and computer storage medium
CN112348828A (en) * 2020-10-27 2021-02-09 浙江大华技术股份有限公司 Example segmentation method and device based on neural network and storage medium
KR102225024B1 (en) * 2019-10-24 2021-03-08 연세대학교 산학협력단 Apparatus and method for image inpainting
CN112766221A (en) * 2021-02-01 2021-05-07 福州大学 Ship direction and position multitask-based SAR image ship target detection method



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant