CN113743219B - Moving object detection method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113743219B
CN113743219B (application number CN202110888181.6A)
Authority
CN
China
Prior art keywords
frame
image
target
detection
target object
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110888181.6A
Other languages
Chinese (zh)
Other versions
CN113743219A (en)
Inventor
付源梓
赵洋洋
赵勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Gelingshentong Information Technology Co ltd
Original Assignee
Beijing Gelingshentong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Gelingshentong Information Technology Co ltd filed Critical Beijing Gelingshentong Information Technology Co ltd
Priority to CN202110888181.6A priority Critical patent/CN113743219B/en
Publication of CN113743219A publication Critical patent/CN113743219A/en
Application granted granted Critical
Publication of CN113743219B publication Critical patent/CN113743219B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/24: Classification techniques
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods


Abstract

The embodiment of the present application provides a moving object detection method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring an image to be input, the image to be input comprising a plurality of consecutive frame images, the last of which is the current frame image; inputting the image to be input into a target detection model to obtain a heat map corresponding to the current frame image; combining the detection boxes of the target object in the consecutive frame images preceding the current frame image, and determining the target detection box of the target object in the current frame image from the heat map; and determining the object in the target detection box as the target object. When the target object is occluded in the current frame image, its position can still be predicted from the consecutive multi-frame images, so that missed detection is avoided.

Description

Moving object detection method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer vision, and in particular, to a method and apparatus for detecting a moving object, an electronic device, and a storage medium.
Background
Target detection refers to finding a target of interest in an image and determining its position. Currently, target detection can be performed with traditional detection methods or with deep learning methods.
Traditional detection methods are mostly based on a sliding-window framework: the whole image is first traversed to select candidate regions, features such as Histogram of Oriented Gradients (HOG) or Scale-Invariant Feature Transform (SIFT) descriptors are then extracted from those regions, and finally the regions are classified with a classifier. Deep learning methods typically use a two-stage or one-stage object detector to detect the positions of objects present in an image.
However, when the above methods are used to detect a moving object such as a moving ball, the ball may be completely occluded by a player in a certain frame, so that it cannot be detected, resulting in a missed detection.
Disclosure of Invention
The embodiments of the present application provide a moving object detection method and apparatus, an electronic device, and a storage medium, which can effectively alleviate the problem of missed detection when detecting a moving object.
According to a first aspect of the embodiments of the present application, there is provided a moving object detection method comprising: acquiring an image to be input, the image to be input comprising a plurality of consecutive frame images, the last of which is the current frame image; inputting the image to be input into a target detection model to obtain a heat map corresponding to the current frame image; combining the detection boxes of the target object in the consecutive frame images preceding the current frame image, and determining the target detection box of the target object in the current frame image from the heat map; and determining the object in the target detection box as the target object.
According to a second aspect of the embodiments of the present application, there is provided a moving object detection apparatus comprising: an acquisition module, configured to acquire an image to be input, the image to be input comprising a plurality of consecutive frame images, the last of which is the current frame image; a detection module, configured to input the image to be input into a target detection model to obtain a heat map corresponding to the current frame image; a detection box determining module, configured to combine the detection boxes of the target object in the consecutive frame images preceding the current frame image and determine the target detection box of the target object in the current frame image from the heat map; and a target object determining module, configured to determine the object in the target detection box as the target object.
According to a third aspect of the embodiments of the present application, there is provided an electronic device comprising: one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method described above as applied to an electronic device.
According to a fourth aspect of embodiments of the present application, an embodiment of the present application provides a computer readable storage medium having program code stored therein, wherein the above-described method is performed when the program code is run.
With the target detection method provided by the embodiment of the present application, an image to be input is acquired, the image to be input comprising a plurality of consecutive frame images, the last of which is the current frame image; the image to be input is input into a target detection model to obtain a heat map corresponding to the current frame image; the detection boxes of the target object in the consecutive frame images preceding the current frame image are combined, and the target detection box of the target object in the current frame image is determined from the heat map; and the object in the target detection box is determined as the target object. When the target object is occluded in the current frame image, its position can still be predicted from the consecutive multi-frame images, so that missed detection is avoided. Determining the target detection box in combination with the detection boxes from the preceding consecutive frames also makes the detection box in the current frame image more stable, so that the target object is determined accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
fig. 1 is a schematic diagram of an application environment of a moving object detection method according to an embodiment of the present application;
FIG. 2 is a flowchart of a moving object detection method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a moving object detection model according to an embodiment of the present application;
FIG. 4 is a flowchart of a moving object detection method according to another embodiment of the present application;
FIG. 5 is a flowchart of a moving object detection method according to still another embodiment of the present application;
FIG. 6 is a functional block diagram of a moving object detecting device according to an embodiment of the present application;
fig. 7 is a block diagram of an electronic device for performing a moving object detection method according to an embodiment of the present application.
Detailed Description
Object detection is a branch of the current computer vision field, and two types of methods are generally used for object detection: traditional detection methods and deep learning methods.
The traditional target detection method is mostly based on a sliding-window framework: the whole image is first traversed to select candidate regions, features such as Histogram of Oriented Gradients (HOG) or Scale-Invariant Feature Transform (SIFT) descriptors are then extracted from those regions, and finally a classifier is used for classification.
In recent years, however, traditional detection methods have struggled to meet expectations for target detection quality, while deep learning has come to dominate the field of computer vision, and deep-learning-based target detection has made a qualitative leap. Existing deep-learning-based methods use a two-stage or one-stage detector to locate the targets present in an image. Two-stage algorithms such as R-CNN use a selective-search algorithm to propose a small number of candidate regions based on texture, edge, and color information, which ensures a high recall rate but falls short of real-time performance; one-stage algorithms such as SSD achieve high-precision detection while maintaining detection speed. Currently, moving object detection is usually implemented on top of a deep-learning target detection method.
However, the above methods all detect moving objects from a single 2D frame. Because a moving object such as a ball may be occluded by players on the court, the ball may be completely hidden in the current frame image, causing a missed detection. A moving object is also subject to motion blur: when a blurred target is encountered, its position cannot be detected correctly, again causing missed detections, and because other objects may appear in the scene, false detections can also occur.
In view of the above problems, an embodiment of the present application provides a moving object detection method in which an image to be input is acquired, the image to be input comprising a plurality of consecutive frame images, the last of which is the current frame image; the image to be input is input into a target detection model to obtain a heat map corresponding to the current frame image; the detection boxes of the target object in the consecutive frame images preceding the current frame image are combined, and the target detection box of the target object in the current frame image is determined from the heat map; and the object in the target detection box is determined as the target object. When the target object is occluded in the current frame image, its position can still be predicted from the consecutive multi-frame images, so that missed detection is avoided. Determining the target detection box in combination with the detection boxes from the preceding consecutive frames also makes the detection box in the current frame image more stable, so that the target object is determined accurately.
The solutions in the embodiments of the present application can be implemented in various computer languages, for example Java, an object-oriented programming language; JavaScript, an interpreted scripting language; Python; and so on.
In order to make the technical solutions and advantages of the embodiments of the present application more apparent, exemplary embodiments of the present application are described in detail below in conjunction with the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. It should be noted that, where there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other.
Referring to fig. 1, an application environment 10 of the moving object detection method provided by the present application is shown, where the application environment 10 includes an electronic device 20, an image capturing apparatus 30 and a playing field 40. The image acquisition device 30 is in communication connection with the electronic device 20, the image acquired by the image acquisition device 30 is sent to the electronic device 20, and the electronic device 20 can process the received image.
The image capturing device 30 may be a mobile device with an image capturing function, such as a smart phone or a tablet computer. The image capturing device 30 is disposed at the playing field 40 to capture environmental information there, and its shooting field of view may cover the entire playing field 40.
The image capturing device 30 may send the captured image to the electronic device 20 through a network. The network may be a 5G network, a 4G network, a Wi-Fi network, or the like. The electronic device 20 may be a server, an intelligent terminal, a computer, or the like.
Thus, the electronic device 20 may acquire an image to be input, the image to be input comprising a plurality of consecutive frame images, the last of which is the current frame image; input the image to be input into a target detection model to obtain a heat map corresponding to the current frame image; combine the detection boxes of the target object in the consecutive frame images preceding the current frame image and determine the target detection box of the target object in the current frame image from the heat map; and determine the object in the target detection box as the target object.
Referring to fig. 2, an embodiment of the present application provides a moving object detection method, which can be applied to the electronic device in the application environment 10, where the electronic device may be a smart phone, a computer, a server, or the like.
Step 110, an image to be input is obtained, wherein the image to be input comprises continuous multi-frame images, and the last frame of image in the multi-frame images is the current frame of image.
The image to be input can be captured in real time by the image capturing device, which transmits each frame to the electronic device as soon as it is captured, so that the electronic device receives every frame the device captures.
The electronic device can acquire the image to be input from the images captured by the image capturing device; the image to be input comprises a plurality of consecutive frame images, the last of which is the current frame image. It is understood that the consecutive multi-frame image may consist of, for example, three, four, or five consecutive frames. The following description takes three consecutive frames as an example.
The electronic device may buffer two frames of images; when the current frame image is obtained, it is combined with the two buffered frames to form the image to be input. For example, the electronic device may buffer the 1st and 2nd frame images; when the 3rd frame image is obtained, it is combined with the buffered 1st and 2nd frames to form the image to be input, the 3rd frame being the current frame image. When the 4th frame image is received, the electronic device may buffer the 2nd and 3rd frame images and combine the 4th, 2nd, and 3rd frames into an image to be input, the 4th frame being the current frame image.
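The buffering scheme above can be sketched with a small sliding window. This is an illustrative stand-in, not the patent's implementation: frames are placeholder strings rather than decoded images, and the window size of three is just the example used in the text.

```python
from collections import deque

# Keep the two most recent frames; each new frame then forms a
# 3-frame "image to be input" whose last element is the current frame.
buffer = deque(maxlen=2)
windows = []
for frame in ["frame1", "frame2", "frame3", "frame4", "frame5"]:
    if len(buffer) == 2:
        windows.append(list(buffer) + [frame])  # current frame is last
    buffer.append(frame)

print(windows[0])   # → ['frame1', 'frame2', 'frame3']
print(windows[-1])  # → ['frame3', 'frame4', 'frame5']
```

Because `deque(maxlen=2)` silently discards the oldest frame on append, the buffer never grows, matching the description of temporarily storing only two frames.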
Step 120, inputting the image to be input into the target detection model to obtain the heat map corresponding to the current frame image.
After the image to be input is obtained, it is fed into the target detection model, which outputs the heat map corresponding to the current frame image.
Because the input of the target detection model is three consecutive frame images and each frame has 3 channels, the 3 frames are concatenated into 9 channels. Since an excessive computation load in the U-Net network would slow down detection, a 1×1 convolution layer can be placed before each convolution layer in the U-Net network to reduce the number of channels to a quarter of the previous value; this reduces the computation of the U-Net network and improves target detection speed. Moreover, inputting consecutive multi-frame images provides more information than a single frame, which reduces the probability of false detection.
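To illustrate why the 1×1 bottleneck helps, the multiply-accumulate counts of a plain 3×3 convolution and of the same convolution preceded by a 1×1 layer that quarters the channels can be compared. The layer sizes below are illustrative assumptions, not the patent's actual architecture.

```python
def conv3x3_macs(h, w, c_in, c_out):
    """Multiply-accumulate count of a 3x3 convolution with 'same' padding."""
    return h * w * 9 * c_in * c_out

def bottleneck_macs(h, w, c_in, c_out, reduce=4):
    """A 1x1 conv first shrinks channels to c_in // reduce, then the 3x3
    conv runs on the reduced channel count."""
    c_mid = c_in // reduce
    return h * w * (c_in * c_mid + 9 * c_mid * c_out)

# Hypothetical 256x256 feature map, 64 channels in and out.
plain = conv3x3_macs(256, 256, 64, 64)
reduced = bottleneck_macs(256, 256, 64, 64, reduce=4)
print(plain, reduced, round(plain / reduced, 2))  # → 2415919104 671088640 3.6
```

Even counting the extra 1×1 layer, the bottleneck cuts the arithmetic by roughly 3.6× at these sizes, which is consistent with the speed motivation stated above.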
It will be appreciated that the input layer of the target detection model supports 9 channels when the image to be input consists of three consecutive frames, and 12 channels when it consists of four consecutive frames. In general, the number of channels of the input layer of the target detection model may be determined by the number of frames in the image to be input, which is not specifically limited here.
The target detection model is obtained by training a neural network model in advance. The neural network model comprises a U-Net network, and a 1×1 convolution is placed before each convolution layer in the U-Net, which reduces the computation of the U-Net network and effectively improves target detection speed.
When training the neural network model to obtain the target detection model, a sample set can first be obtained, comprising a plurality of sample images with consecutive capture times stored as video clips, together with labels for the sample images; each label comprises the center position, width, and height of the annotation box of the target object. The neural network model is then trained on these samples to obtain the target detection model. The structure of the target detection model is shown in fig. 3: assuming the image to be input consists of three consecutive frames, a width × height × 9 tensor is obtained and passed through the input layer into the U-Net network, which finally outputs a width × height × 1 heat map. The U-Net network may include convolution layers, ReLU activation functions, BN layers, max-pooling layers, and deconvolution layers, with a 1×1 convolution placed before each convolution layer.
Step 130, combining the detection boxes of the target object in the consecutive frame images preceding the current frame image, and determining the target detection box of the target object in the current frame image from the heat map.
After the heat map of the current frame image is obtained, the detection boxes of the target object in the preceding consecutive frame images can be combined with it, and the target detection box of the target object in the current frame image determined from the heat map on that basis.
Specifically, when determining the target detection box of the target object in the current frame image: a target rectangular box of the target object is determined from the heat map; the detection boxes of the target object in the consecutive frame images preceding the current frame image are acquired; the mean of the widths of those detection boxes and the target rectangular box, and the mean of their heights, are taken as the target size; and the target rectangular box is set to the target size to obtain the target detection box.
The number of consecutive frame images may be, for example, 4 or 5, and is not particularly limited here.
Step 140, determining the object in the target detection box as the target object.
After the target detection box is determined, the object within it can be directly determined to be the target object.
With the moving object detection method provided by the embodiment of the present application, an image to be input is acquired, comprising a plurality of consecutive frame images, the last of which is the current frame image; the image to be input is input into a target detection model to obtain a heat map corresponding to the current frame image; the detection boxes of the target object in the consecutive frame images preceding the current frame image are combined, and the target detection box of the target object in the current frame image is determined from the heat map; and the object in the target detection box is determined as the target object. When the target object is occluded in the current frame image, its position can still be predicted from the consecutive multi-frame images, avoiding missed detection; combining the detection boxes from the preceding consecutive frames also makes the target detection box in the current frame image more stable, so that the target object is determined accurately.
Referring to fig. 4, another embodiment of the present application provides a moving object detection method, which is mainly described on the basis of the foregoing embodiments, and specifically includes the following steps.
Step 210, obtaining an image to be input, where the image to be input includes continuous multi-frame images, and a last frame image in the multi-frame images is a current frame image.
Step 220, inputting the image to be input into the target detection model to obtain the heat map corresponding to the current frame image.
Step 210 and step 220 may refer to the corresponding parts of the foregoing embodiments, and are not described herein.
Step 230, determining a target rectangular box of the target object from the heat map.
After the heat map is obtained, a target rectangular box of the target object may be determined from it. Specifically, the electronic device may find all contours in the heat map and, for each contour, obtain the minimal rectangular box containing it, i.e. the rectangle of smallest area that encloses the contour.
If the heat map contains multiple contours, multiple minimal rectangular boxes are obtained; the area of each can then be computed, and the box with the largest area selected as the target rectangular box. For example, among 3 minimal rectangular boxes A, B, and C, if box B has the largest area, box B is determined to be the target rectangular box.
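The contour-to-box selection can be sketched as follows. In practice the contours would come from thresholding the heat map and extracting contours with an image library; here they are hypothetical point lists, and boxes are axis-aligned (x, y, width, height) tuples.

```python
def bounding_box(contour):
    """Axis-aligned minimal rectangle (x, y, w, h) enclosing a contour,
    given as a list of (x, y) points."""
    xs = [p[0] for p in contour]
    ys = [p[1] for p in contour]
    return min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)

def target_rectangle(contours):
    """Pick the enclosing rectangle with the largest area."""
    boxes = [bounding_box(c) for c in contours]
    return max(boxes, key=lambda b: b[2] * b[3])

contours = [
    [(0, 0), (4, 0), (4, 3)],        # box A: 4 x 3
    [(10, 10), (18, 10), (18, 16)],  # box B: 8 x 6 (largest area)
    [(2, 2), (3, 5)],                # box C: 1 x 3
]
print(target_rectangle(contours))  # → (10, 10, 8, 6)
```

As in the example above, box B wins because its area (48) exceeds those of A (12) and C (3).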
Step 240, obtaining the detection boxes of the target object in the consecutive frame images preceding the current frame image.
Step 250, determining the mean of the widths of the detection boxes and the target rectangular box, and the mean of their heights, as the target size.
After the target rectangular box is determined, the detection boxes of the target object in the preceding consecutive frame images can be acquired and used to determine the target detection box. This avoids the problem of unstable box sizes, in which the boxes detected in adjacent frames differ greatly in size and the visual effect suffers.
Specifically, each preceding frame image contributes one detection box: acquiring the boxes from 3 consecutive frames yields 3 detection boxes, and from 4 consecutive frames, 4 detection boxes.
After the detection boxes are obtained, the width and height of each can be read, and the mean of the widths of the detection boxes and the target rectangular box, together with the mean of their heights, computed to give the target size.
For example, suppose there are 3 detection boxes: the 1st has width x1 and height y1, the 2nd width x2 and height y2, the 3rd width x3 and height y3, and the target rectangular box has width x4 and height y4. The mean width is X = (x1 + x2 + x3 + x4) / 4 and the mean height is Y = (y1 + y2 + y3 + y4) / 4; the target size is then width X and height Y.
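The averaging step can be written directly; box sizes are (width, height) pairs, and the numbers below are made up for illustration.

```python
def target_size(prev_boxes, target_rect):
    """Mean width and height over the previous detection boxes and the
    rectangle found in the current heat map; each box is (width, height)."""
    boxes = list(prev_boxes) + [target_rect]
    mean_w = sum(w for w, _ in boxes) / len(boxes)
    mean_h = sum(h for _, h in boxes) / len(boxes)
    return mean_w, mean_h

# Three previous detection boxes plus the current target rectangle.
print(target_size([(40, 40), (44, 42), (42, 38)], (46, 44)))  # → (43.0, 41.0)
```

This mirrors the formula in the text: X = (40 + 44 + 42 + 46) / 4 = 43.0 and Y = (40 + 42 + 38 + 44) / 4 = 41.0.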
Step 260, setting the target rectangular box to the target size to obtain the target detection box.
After the target size is determined, the target rectangular box may be set to the target size, yielding the target detection box.
In some embodiments, the center point of the target rectangular box may first be determined, and the box expanded about that center by half the mean width on each side horizontally and half the mean height on each side vertically, to obtain the target detection box.
In other embodiments, the target rectangular box may be expanded or shrunk in a preset direction until its size equals the target size, giving the target detection box.
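A minimal sketch of the center-preserving resize described in the first of these embodiments, assuming boxes are (x, y, width, height) tuples with (x, y) the top-left corner:

```python
def centered_box(rect, size):
    """Place a box of the given (width, height) about the center of
    rect = (x, y, w, h); returns the resized (x, y, w, h)."""
    x, y, w, h = rect
    cx, cy = x + w / 2, y + h / 2        # center of the original rectangle
    new_w, new_h = size
    return cx - new_w / 2, cy - new_h / 2, new_w, new_h

# A 40x40 rectangle at (100, 100) resized to the averaged 43x41 target size.
print(centered_box((100, 100, 40, 40), (43, 41)))  # → (98.5, 99.5, 43, 41)
```

The center (120, 120) is unchanged; only the width and height move to the averaged target size, which is what keeps the box from jumping between frames.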
Step 270, determining the object in the target detection box as the target object.
Step 270 may refer to the corresponding parts of the foregoing embodiments, and will not be described herein.
With the moving object detection method provided by this embodiment of the present application, after the target rectangular box is determined from the heat map, the target size is determined in combination with the detection boxes of the target object in the consecutive frame images preceding the current frame image, and the target rectangular box is set to the target size to obtain the target detection box. This prevents the detection box from jittering sharply between frames, yielding a stably sized detection box and a good visual effect.
Referring to fig. 5, a method for detecting a moving object according to still another embodiment of the present application is provided, and a process for obtaining a target detection model is described with emphasis on the foregoing embodiments.
Step 310, an image to be input is obtained, wherein the image to be input comprises continuous multi-frame images, and the last frame of image in the multi-frame images is the current frame of image.
Step 310 may refer to the corresponding parts of the foregoing embodiments, and will not be described herein.
Step 320, obtaining a sample set comprising a plurality of image groups, each image group comprising a plurality of consecutive sample frame images and labels corresponding to those sample images; each label comprises the center position, width, and height of the annotation box of the target object in the sample image.
After the image to be input is obtained, it must be input into the target detection model for target detection. Before the target detection model can be used, the neural network model must be trained to obtain it.
In training a neural network model, it is generally necessary to construct a sample set that includes a plurality of image groups, each image group including a plurality of consecutive sample images, and a label corresponding to the sample image, the label including a center position of a labeling frame of a target object in the sample image and a width and a height of the labeling frame of the target object in the sample image.
The image group may come from a video clip composed of sample images with consecutive shooting times: each frame of sample image in the video clip can be acquired, and consecutive multi-frame sample images combined into an image group. The number of sample images included in each image group is the same as the number of images in the image to be input.
For example, each image group may include consecutive 3 frames of sample images, and then consecutive 3 frames of images are also included in the image to be input.
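As an illustrative sketch (the frame representation and function name are assumptions, not from the patent), grouping a clip's frames into consecutive windows of three can look like:

```python
def make_image_groups(frames, group_size=3):
    """Slide a window of `group_size` consecutive frames over the clip.
    The last frame of each group plays the role of the current frame."""
    return [frames[i:i + group_size]
            for i in range(len(frames) - group_size + 1)]
```

Each group's last element is then the "current sample image" for which the label heatmap is built during training.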
And step 330, training a neural network model by using the sample set to obtain the target detection model.
When training the neural network model by using the sample set, an image group in the sample set may be input into the neural network model, and a thermodynamic diagram corresponding to a current sample image may be obtained, where the current sample image is the last frame image in the image group.
Because the large calculation amount of the U-Net network affects the model speed, a 1×1 convolution layer is arranged before each convolution layer in the U-Net network to reduce the number of channels, which reduces the amount of calculation and increases the model speed. In addition, using a thermodynamic diagram to predict the detection frame can alleviate the problem of motion blur.
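The saving from a channel-reducing 1×1 convolution can be illustrated with a rough multiply-accumulate (MAC) count; the feature-map size and channel numbers below are assumptions for illustration only, not values from the patent:

```python
def conv_macs(h, w, k, c_in, c_out):
    # MACs for a k x k convolution applied to an h x w feature map
    return h * w * k * k * c_in * c_out

# Plain 3x3 convolution, 256 -> 256 channels, on a 56x56 feature map.
baseline = conv_macs(56, 56, 3, 256, 256)

# Bottleneck: a 1x1 convolution first reduces 256 -> 64 channels,
# then the 3x3 convolution maps 64 -> 256 channels.
bottleneck = conv_macs(56, 56, 1, 256, 64) + conv_macs(56, 56, 3, 64, 256)
```

With these example numbers the bottleneck variant needs roughly 3-4 times fewer MACs than the plain convolution, which is the kind of speed-up the channel reduction is aiming for.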
After obtaining the thermodynamic diagram corresponding to the current sample image, the label of the sample image can be obtained, and the label thermal value calculated from the label. The label includes the center position, width, and height of the labeling frame of the target object in the sample image. Specifically, the label thermal value can be calculated according to the following formula:

g_xy = exp( −((x₁ − x₀)² + (y₁ − y₀)²) / (2σ²) )

where g_xy represents the label thermal value, (x₁, y₁) represents the coordinates of each pixel point in the sample image, (x₀, y₀) represents the coordinates of the center position of the annotation frame, and σ² represents the variance of the Gaussian blur. In the embodiment of the application, σ² takes the value 16, and it can be set as needed.
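A minimal NumPy sketch of the Gaussian label heatmap described above (the grid shape and function name are illustrative assumptions):

```python
import numpy as np

def label_heatmap(height, width, cx, cy, sigma2=16.0):
    """Gaussian label heatmap: value 1 at the annotation-frame center
    (cx, cy), decaying with squared pixel distance over variance sigma2."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma2))
```

The peak sits exactly at the labeled center, and pixels farther from the center receive smoothly decreasing supervision values.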
After the label thermal value is obtained, a predicted value corresponding to the thermodynamic diagram can be obtained, and a target loss determined from the predicted value and the label thermal value. Specifically, the target loss is a weighted binary cross entropy between the predicted value and the label thermal value:

L = −Σ_(x₂,y₂) [ g_xy · log(p_xy) + (1 − g_xy) · log(1 − p_xy) ]

where (x₂, y₂) represents the coordinates of each pixel point in the thermodynamic diagram, g_xy represents the label thermal value, and p_xy represents the predicted value.
Therefore, whether the neural network model is converged or not can be determined according to the target loss, and the target detection model is obtained when the neural network model is converged.
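Under the assumption that the "weighted binary cross entropy" mentioned above reduces to a per-pixel binary cross entropy between the predicted heatmap and the label heatmap (the exact weighting scheme is not spelled out in this excerpt), a sketch is:

```python
import numpy as np

def heatmap_bce(pred, label, eps=1e-7):
    """Per-pixel binary cross entropy between predicted heatmap values
    p_xy and label thermal values g_xy, summed over the whole map.
    Predictions are clipped away from 0/1 for numerical stability."""
    p = np.clip(pred, eps, 1.0 - eps)
    return -np.sum(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))
```

During training this loss is monitored until it stops decreasing, at which point the model is considered converged.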
And step 340, inputting the image to be input into a target detection model to obtain a thermodynamic diagram corresponding to the current frame image.
Step 350, combining the detection frames of the target object in the continuous frame image before the current frame image, and determining the target detection frame of the target object in the current frame image from the thermodynamic diagram.
And step 360, determining the object in the target detection frame as the target object.
Steps 340 to 360 may refer to the corresponding parts of the foregoing embodiments, and are not described herein.
It should be noted that, step 320 and step 330 are steps for obtaining the target detection model, and thus, step 320 and step 330 may be performed before step 340 or before step 310, which is not limited herein.
According to the moving object detection method provided by the embodiment of the application, a neural network model is trained to obtain the target detection model; the target detection model includes a U-Net network, and a 1×1 convolution layer is arranged before each convolution layer in the U-Net network. On the basis of the original U-Net network, arranging a 1×1 convolution before each convolution layer reduces the calculation amount of the U-Net network and effectively improves the detection speed of the moving object.
Referring to fig. 6, an embodiment of the present application provides a moving object detecting apparatus 400, where the moving object detecting apparatus 400 includes an obtaining module 410, a detecting module 420, a detecting frame determining module 430 and a target object determining module 440. The obtaining module 410 is configured to obtain an image to be input, where the image to be input includes a continuous multi-frame image, and a last frame image in the multi-frame image is a current frame image; the detection module 420 is configured to input the image to be input into a target detection model, and obtain a thermodynamic diagram corresponding to the current frame image; the detection frame determining module 430 is configured to determine, from the thermodynamic diagram, a target detection frame of the target object in the current frame image in combination with a detection frame of the target object in a consecutive frame image preceding the current frame image; the target object determining module 440 is configured to determine that the object in the target detection frame is the target object.
Further, the object detection model includes a U-Net network, each convolution layer in the U-Net network being preceded by a 1×1 convolution.
Further, the detection frame determining module 430 is further configured to determine a target rectangular frame of the target object from the thermodynamic diagram; acquiring a detection frame of a target object in continuous frame images before the current frame image; determining the average value of the widths of the detection frame and the target rectangular frame and the average value of the heights of the detection frame and the target rectangular frame as a target size; and setting the target rectangular frame as the target size to obtain the target detection frame.
Further, the detection frame determining module 430 is further configured to determine all contours in the thermodynamic diagram; obtaining a minimum rectangular frame containing each contour based on each contour; and acquiring the rectangular frame with the largest area in the smallest rectangular frame as the target rectangular frame.
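The contour-to-box selection performed by this module can be sketched in plain Python, assuming the contours have already been extracted from the thresholded heatmap (e.g. via an OpenCV-style contour finder) as lists of (x, y) points; the names and data layout are illustrative:

```python
def largest_bounding_rect(contours):
    """Compute the axis-aligned minimum bounding rectangle of each
    contour, then return the one with the largest area as (x, y, w, h)."""
    best, best_area = None, -1
    for pts in contours:
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        x, y = min(xs), min(ys)
        w, h = max(xs) - x, max(ys) - y
        if w * h > best_area:
            best_area, best = w * h, (x, y, w, h)
    return best
```

Picking the largest-area rectangle corresponds to keeping the strongest response region in the heatmap as the target rectangular frame.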
Further, the detection module 420 is further configured to obtain a sample set, where the sample set includes a plurality of image groups, each of the image groups includes a continuous multi-frame sample image and a label corresponding to the sample image, where the label includes a center position of a labeling frame of a target object in the sample image and a width and a height of the labeling frame of the target object in the sample image; and training the neural network model by using the sample set to obtain the target detection model.
Further, the detection module 420 is further configured to input the image group into the neural network model to obtain a thermodynamic diagram corresponding to a current sample image, where the current sample image is a last frame image in the image group; calculating a label thermodynamic value according to the label corresponding to the current sample image; obtaining a predicted value corresponding to the thermodynamic diagram, and determining target loss according to the predicted value and the label thermodynamic value; and when the neural network model is determined to be converged according to the target loss, obtaining the target detection model.
Further, the detection module 420 is further configured to calculate the label thermal value according to the following formula:

g_xy = exp( −((x₁ − x₀)² + (y₁ − y₀)²) / (2σ²) )

where g_xy represents the label thermal value, (x₁, y₁) represents the coordinates of each pixel point in the sample image, (x₀, y₀) represents the coordinates of the center position of the annotation frame, and σ² represents the variance of the Gaussian blur.
Further, the detection module 420 is further configured to determine, according to the predicted value and the label thermal value, the target loss as:

L = −Σ_(x₂,y₂) [ g_xy · log(p_xy) + (1 − g_xy) · log(1 − p_xy) ]

where (x₂, y₂) represents the coordinates of each pixel point in the thermodynamic diagram, g_xy represents the label thermal value, and p_xy represents the predicted value.
Further, the target object is a sphere.
The moving object detection device provided by the embodiment of the application acquires an image to be input, where the image to be input includes continuous multi-frame images and the last frame image in the multi-frame images is the current frame image; inputs the image to be input into a target detection model to obtain a thermodynamic diagram corresponding to the current frame image; determines, from the thermodynamic diagram and in combination with the detection frames of the target object in the consecutive frame images preceding the current frame image, the target detection frame of the target object in the current frame image; and determines the object in the target detection frame as the target object. When the target object is occluded in the current frame image, its position can still be predicted by combining the continuous multi-frame images, so that missed detection is avoided. Determining the target detection frame in combination with the detection frames of the target object in the preceding consecutive frame images makes the target detection frame in the current frame image more stable, so that the target object is determined accurately.
It should be noted that, for convenience and brevity of description, specific working processes of the apparatus described above may refer to corresponding processes in the foregoing method embodiments, which are not repeated herein.
Referring to fig. 7, an embodiment of the present application provides a block diagram of an electronic device 500, where the electronic device 500 includes a processor 510, a memory 520, and one or more application programs, where the one or more application programs are stored in the memory 520 and configured to be executed by the one or more processors 510, the one or more programs configured to perform the moving object detection method described above.
The electronic device 500 may be a terminal device such as a smart phone, a tablet computer, etc. capable of running an application program, or may be a server. The electronic device 500 of the present application may include one or more of the following components: a processor 510, a memory 520, and one or more application programs, wherein the one or more application programs may be stored in the memory 520 and configured to be executed by the one or more processors 510, the one or more program(s) configured to perform the method as described in the foregoing method embodiments.
Processor 510 may include one or more processing cores. The processor 510 uses various interfaces and lines to connect the parts of the electronic device 500, and performs the various functions of the electronic device 500 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 520 and by invoking data stored in the memory 520. Optionally, the processor 510 may be implemented in hardware in at least one of digital signal processing (Digital Signal Processing, DSP), field programmable gate array (Field-Programmable Gate Array, FPGA), and programmable logic array (Programmable Logic Array, PLA). The processor 510 may integrate one or a combination of a central processing unit (Central Processing Unit, CPU), a graphics processor (Graphics Processing Unit, GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; the modem handles wireless communication. It can be understood that the modem may also not be integrated into the processor 510 and may instead be implemented by a separate communication chip.
The memory 520 may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). Memory 520 may be used to store instructions, programs, code sets, or instruction sets. The memory 520 may include a stored-program area and a stored-data area, where the stored-program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the foregoing method embodiments, and the like. The stored-data area may store data created by the electronic device 500 in use (such as a phonebook, audiovisual data, and chat log data) and the like.
The electronic device provided by the embodiment of the application acquires an image to be input, where the image to be input includes continuous multi-frame images and the last frame image in the multi-frame images is the current frame image; inputs the image to be input into a target detection model to obtain a thermodynamic diagram corresponding to the current frame image; determines, from the thermodynamic diagram and in combination with the detection frames of the target object in the consecutive frame images preceding the current frame image, the target detection frame of the target object in the current frame image; and determines the object in the target detection frame as the target object. When the target object is occluded in the current frame image, its position can still be predicted by combining the continuous multi-frame images, so that missed detection is avoided. Determining the target detection frame in combination with the detection frames of the target object in the preceding consecutive frame images makes the target detection frame in the current frame image more stable, so that the target object is determined accurately.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (9)

1. A moving object detection method, characterized in that the method comprises:
acquiring an image to be input, wherein the image to be input comprises continuous multi-frame images, and the last frame of image in the multi-frame images is a current frame of image;
inputting the image to be input into a target detection model to obtain a thermodynamic diagram corresponding to the current frame image;
combining detection frames of the target object in the continuous frame image before the current frame image, and determining the target detection frame of the target object in the current frame image from the thermodynamic diagram;
determining an object in the target detection frame as the target object;
the target detection model is obtained through the following steps:
acquiring a sample set, wherein the sample set comprises a plurality of image groups, each image group comprises a plurality of continuous multi-frame sample images and a label corresponding to the sample images, and the label comprises the center position of a labeling frame of a target object in the sample images and the width and the height of the labeling frame of the target object in the sample images;
training a neural network model by using the sample set to obtain the target detection model;
training the neural network model by using the sample set, and obtaining the target detection model comprises the following steps:
inputting the image group into the neural network model to obtain a thermodynamic diagram corresponding to a current sample image, wherein the current sample image is the last frame image in the image group;
calculating a label thermodynamic value according to the label corresponding to the current sample image;
obtaining a predicted value corresponding to the thermodynamic diagram, and determining target loss according to the predicted value and the label thermodynamic value;
and when the neural network model is determined to be converged according to the target loss, obtaining the target detection model.
2. The method of claim 1, wherein the object detection model comprises a U-Net network, each convolution layer in the U-Net network being preceded by a 1×1 convolution.
3. The method of claim 1, wherein said determining the target detection frame of the target object in the current frame image from the thermodynamic diagram in combination with the detection frame of the target object in the successive frame images preceding the current frame image comprises:
determining a target rectangular frame of the target object from the thermodynamic diagram;
acquiring a detection frame of a target object in continuous frame images before the current frame image;
determining the average value of the widths of the detection frame and the target rectangular frame and the average value of the heights of the detection frame and the target rectangular frame as a target size;
and setting the target rectangular frame as the target size to obtain the target detection frame.
4. A method according to claim 3, wherein said determining a target rectangular box of said target object from said thermodynamic diagram comprises:
determining all contours in the thermodynamic diagram;
obtaining a minimum rectangular frame containing each contour based on each contour;
and acquiring the rectangular frame with the largest area in the smallest rectangular frame as the target rectangular frame.
5. The method of claim 1, wherein the label thermal value is calculated according to the formula:

g_xy = exp( −((x₁ − x₀)² + (y₁ − y₀)²) / (2σ²) )

wherein g_xy represents the label thermal value, (x₁, y₁) represents the coordinates of each pixel point in the sample image, (x₀, y₀) represents the coordinates of the center position of the annotation frame, and σ² represents the variance of the Gaussian blur.
6. The method of claim 1, wherein the target loss determined from the predicted value and the label thermal value is:

L = −Σ_(x₂,y₂) [ g_xy · log(p_xy) + (1 − g_xy) · log(1 − p_xy) ]

wherein (x₂, y₂) represents the coordinates of each pixel point in the thermodynamic diagram, g_xy represents the label thermal value, and p_xy represents the predicted value.
7. The method of any one of claims 1-6, wherein the target object is a sphere.
8. An electronic device, the electronic device comprising:
one or more processors;
a memory electrically connected to the one or more processors;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to perform the method of any of claims 1-7.
9. A computer readable storage medium having stored therein program code which is callable by a processor to perform the method of any one of claims 1 to 7.
CN202110888181.6A 2021-08-03 2021-08-03 Moving object detection method and device, electronic equipment and storage medium Active CN113743219B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110888181.6A CN113743219B (en) 2021-08-03 2021-08-03 Moving object detection method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113743219A CN113743219A (en) 2021-12-03
CN113743219B true CN113743219B (en) 2023-09-19

Family

ID=78730009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110888181.6A Active CN113743219B (en) 2021-08-03 2021-08-03 Moving object detection method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113743219B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109146924A (en) * 2018-07-18 2019-01-04 北京飞搜科技有限公司 A kind of method for tracking target and device based on thermodynamic chart
CN109684920A (en) * 2018-11-19 2019-04-26 腾讯科技(深圳)有限公司 Localization method, image processing method, device and the storage medium of object key point
CN110991283A (en) * 2019-11-21 2020-04-10 北京格灵深瞳信息技术有限公司 Re-recognition and training data acquisition method and device, electronic equipment and storage medium
CN112184688A (en) * 2020-10-10 2021-01-05 广州极飞科技有限公司 Network model training method, target detection method and related device
CN112257609A (en) * 2020-10-23 2021-01-22 重庆邮电大学 Vehicle detection method and device based on self-adaptive key point heat map
WO2021129064A1 (en) * 2019-12-24 2021-07-01 腾讯科技(深圳)有限公司 Posture acquisition method and device, and key point coordinate positioning model training method and device


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Yihan Zhang et al. Multi-scale Object Detection Model with Anchor Free Approach and Center of Gravity Prediction. 2020 IEEE 5th Information Technology and Mechatronics Engineering Conference (ITOEC 2020). IEEE, 2020, 38-45. *
Jiang Haiqiang. Research on multi-view pedestrian re-identification algorithm and data collection. Computer Programming Skills & Maintenance, No. 1, 147-149. *
Fu Yuanzi. Research on license plate recognition methods based on deep learning in natural scenes. China Master's Theses Full-text Database (Basic Sciences), No. 2, C034-910. *
Yang Shiyu. Object detection and tracking for intelligent transportation. China Doctoral Dissertations Full-text Database (Engineering Science & Technology II), No. 6, C034-12. *

Also Published As

Publication number Publication date
CN113743219A (en) 2021-12-03

Similar Documents

Publication Publication Date Title
CN110176027B (en) Video target tracking method, device, equipment and storage medium
WO2018103608A1 (en) Text detection method, device and storage medium
CN111681273B (en) Image segmentation method and device, electronic equipment and readable storage medium
GB2555136A (en) A method for analysing media content
US10062195B2 (en) Method and device for processing a picture
CN110399842B (en) Video processing method and device, electronic equipment and computer readable storage medium
CN110889824A (en) Sample generation method and device, electronic equipment and computer readable storage medium
CN110991310B (en) Portrait detection method, device, electronic equipment and computer readable medium
CN112154476A (en) System and method for rapid object detection
CN113496208B (en) Video scene classification method and device, storage medium and terminal
CN110858277A (en) Method and device for obtaining attitude classification model
Kompella et al. A semi-supervised recurrent neural network for video salient object detection
US11348254B2 (en) Visual search method, computer device, and storage medium
CN111476132A (en) Video scene recognition method and device, electronic equipment and storage medium
CN114119964A (en) Network training method and device, and target detection method and device
CN112686122B (en) Human body and shadow detection method and device, electronic equipment and storage medium
CN113129298A (en) Definition recognition method of text image
CN110689014B (en) Method and device for detecting region of interest, electronic equipment and readable storage medium
TWI732374B (en) Method and apparatus for object recognition
CN107563257B (en) Video understanding method and device
CN113743219B (en) Moving object detection method and device, electronic equipment and storage medium
CN116246298A (en) Space occupation people counting method, terminal equipment and storage medium
CN115082758A (en) Training method of target detection model, target detection method, device and medium
CN109493349B (en) Image feature processing module, augmented reality equipment and corner detection method
CN108121963B (en) Video data processing method and device and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant