CN113449634A - Video detection method and device for processing under strong light environment - Google Patents

Video detection method and device for processing under strong light environment

Info

Publication number
CN113449634A
CN113449634A (application number CN202110718258.5A)
Authority
CN
China
Prior art keywords
layer
picture
convolution
detection
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110718258.5A
Other languages
Chinese (zh)
Inventor
谢尔康 (Xie Erkang)
姜蓓蓓 (Jiang Beibei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Hansheng Information Technology Co ltd
Original Assignee
Shanghai Hansheng Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Hansheng Information Technology Co ltd filed Critical Shanghai Hansheng Information Technology Co ltd
Priority to CN202110718258.5A priority Critical patent/CN113449634A/en
Publication of CN113449634A publication Critical patent/CN113449634A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a video detection method and device for processing under a strong light environment. The method comprises the following steps: acquiring streaming data with an image acquisition unit; synthesizing the streaming data into a picture and importing the picture into a deep neural network; converting, by the deep neural network, the size of the picture into N x N to obtain a converted picture; after the converted picture is subjected to multiple detections by the convolution layers, giving the parameters of each detection target according to preset confidence parameters; and, after the multiple detections, adding a convolutional layer, an upsampling layer, a splicing layer and a convolution group, wherein the convolution group receives data from a convolutional layer in the base network. By adding the convolutional layer, upsampling layer, splicing layer and convolution group, the method and device improve the accuracy, stability and precision of prediction.

Description

Video detection method and device for processing under strong light environment
Technical Field
The present invention relates to a video detection method and apparatus, and in particular to a video detection method and apparatus for processing under a strong light environment.
Background
At present, conventional wharves are steadily developing into automated wharves, and the conditions inside the bridge crane cabs and operation cabs of a port wharf need to be detected in real time to realize automated operation.
However, the environment inside a port wharf bridge crane cab is generally complex: all-weather operation, for example on an island, produces alternating strong and weak light in the cab, and the operation cab moves with the bridge crane during handling and therefore shakes frequently. As a result, video recognition technology suffers from a high false-detection rate and high computational cost.
There is therefore a need for a method of video detection that addresses the above-mentioned problems and disadvantages.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a video detection method and device for processing under a strong light environment, in which the accuracy, stability and precision of prediction are improved by adding a convolutional layer, an upsampling layer, a splicing layer and a convolution group.
The technical scheme adopted by the invention to solve the above technical problem is to provide a video detection method for processing under a strong light environment, comprising the following steps:
acquiring streaming data by an image acquisition unit;
synthesizing the streaming data into a picture and then importing the picture into a deep neural network;
the deep neural network converts the size of the picture into N x N to obtain a converted picture;
after the converted picture is subjected to multiple detections by the convolution layers, giving the parameters of each detection target according to preset confidence parameters;
after the multiple detections, adding a convolutional layer, an upsampling layer, a splicing layer and a convolution group, wherein the convolution group is used for receiving data from a convolutional layer in the base network.
Preferably, the image acquisition unit comprises a vision camera, a lidar and a camera.
Preferably, after the streaming data is synthesized into a picture and then introduced into a neural network, the method further comprises classifying the picture according to a preset rule.
Preferably, the parameters of the detection target include a central x coordinate, a central y coordinate, and a width and a height of the detection frame.
Preferably, the receptive field in the multiple detections corresponds to 16x downsampling, and the anchor box sizes used are (10, 13), (16, 30) and (32, 33).
The present invention further provides a video detection apparatus for processing a video in a strong light environment, which comprises:
an image acquisition unit for acquiring streaming data;
the image synthesis unit is used for synthesizing the streaming data into an image and then importing the image into the deep neural network;
the image conversion unit is used for converting the size of the image into N x N through the deep neural network to obtain a converted image;
the parameter acquisition unit is used for giving the parameters of each detection target according to preset confidence parameters after the converted picture is subjected to multiple detections by the convolution layers;
and the convolution unit is used for adding a convolutional layer, an upsampling layer, a splicing layer and a convolution group after the multiple detections, wherein the convolution group is used for receiving data from a convolutional layer in the base network.
Preferably, the image acquisition unit includes: vision camera, laser radar, camera.
Preferably, after the streaming data is synthesized into a picture and then introduced into a neural network, the method further comprises classifying the picture according to a preset rule.
Preferably, the parameters of the detection target include a central x coordinate, a central y coordinate, and a width and a height of the detection frame.
Preferably, the receptive field in the multiple detections corresponds to 16x downsampling, and the anchor box sizes used are (10, 13), (16, 30) and (32, 33).
Compared with the prior art, the invention has the following beneficial effects: in the method and device for processing video detection under a strong light environment, a convolutional layer, an upsampling layer, a splicing layer and a convolution group are added, and the convolution group receives data from a convolutional layer in the base network, which improves the accuracy, stability and precision of prediction;
further, by using four levels of upsampling, deeper detection can be achieved and the probability of false detection is reduced.
Drawings
FIG. 1 is a flow chart of a video detection method for processing under a strong light environment according to an embodiment of the present invention;
FIG. 2 is a block diagram of a video detection apparatus for processing under a strong light environment according to an embodiment of the present invention;
FIG. 3 is a diagram of the neural network layers used in a video detection method for processing under a strong light environment according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a convolution group in a neural network used in a method for processing video detection in a strong light environment according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced without these specific details. Accordingly, the particular details set forth are merely exemplary, and departures may be made from them while remaining within the spirit and scope of the present invention.
Referring now to fig. 1, fig. 1 is a flow chart of a video detection method for processing under a strong light environment according to an embodiment of the present invention. The embodiment of the invention provides a video detection method for processing under a strong light environment, comprising the following steps:
step 101: acquiring streaming data by an image acquisition unit;
step 102: synthesizing the streaming data into a picture and then importing the picture into a deep neural network;
step 103: the deep neural network converts the size of the picture into N x N to obtain a converted picture;
step 104: after the converted picture is subjected to multiple detections by the convolution layers, giving the parameters of each detection target according to preset confidence parameters;
step 105: after the multiple detections, adding a convolutional layer, an upsampling layer, a splicing layer and a convolution group, wherein the convolution group is used for receiving data from a convolutional layer in the base network.
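The confidence filtering in step 104 amounts to thresholding candidate boxes by their confidence score. A minimal sketch, assuming each raw detection is a (center_x, center_y, width, height, confidence) tuple; the tuple layout and the threshold value are illustrative, not specified by the patent:

```python
def filter_detections(raw, conf_threshold=0.5):
    """Keep detections whose confidence meets the preset threshold and return
    their (center_x, center_y, width, height) parameters. `raw` is a list of
    (cx, cy, w, h, confidence) tuples; this layout is an illustrative assumption."""
    return [(cx, cy, w, h) for (cx, cy, w, h, conf) in raw if conf >= conf_threshold]
```

For example, with a threshold of 0.5, a box scored 0.9 is kept and a box scored 0.2 is discarded.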
In a specific implementation, the streaming data is RTSP (Real Time Streaming Protocol) streaming data.
The image acquisition unit comprises a vision camera, a lidar and a camera.
After the streaming data is synthesized into a picture and imported into the neural network, the picture is classified according to preset rules. Synthesizing the streaming data into a picture includes synthesizing it with the OpenCV library. The preset rules cover the operating rules in the wharf bridge crane cab, for example illegal operations such as smoking, operating electronic equipment while working, and not fastening the seat belt, and can be preset.
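The preset rules can be expressed as a lookup from detected class labels to cab-operation violations. A minimal sketch; the label strings and rule names below are illustrative placeholders, not taken from the patent:

```python
# Hypothetical mapping from detected class labels to cab-operation violations.
# Both the labels and the rule names are illustrative, not from the patent.
VIOLATION_RULES = {
    "cigarette": "smoking in the cab",
    "phone": "operating electronic equipment while working",
    "no_seatbelt": "seat belt not fastened",
}

def classify_frame(detected_labels):
    """Return the violations triggered by the class labels detected in one frame."""
    return [VIOLATION_RULES[label] for label in detected_labels if label in VIOLATION_RULES]
```

Labels with no matching rule (e.g. an ordinary "person" detection) are simply ignored.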
The deep neural network converts the size of the picture into N x N to obtain a converted picture: a given image or video frame, denoted mat, is first resized to N x N, where N can be preset and defaults to 416. The convolutional layers then detect the converted picture in sequence, and finally the parameters of all detection targets meeting the requirements are given according to the specified confidence parameters.
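The resize step can be sketched without any deep-learning framework. A minimal nearest-neighbour version in plain Python, standing in for the library resize (e.g. OpenCV's) a real implementation would use, with N defaulting to 416 as in the text:

```python
def resize_nearest(mat, n=416):
    """Resize a picture (a list of rows of pixel values) to n x n by
    nearest-neighbour sampling. n defaults to 416 as in the patent; this is a
    stand-in for a library resize such as OpenCV's cv2.resize."""
    h, w = len(mat), len(mat[0])
    return [[mat[y * h // n][x * w // n] for x in range(n)] for y in range(n)]
```

Each output pixel (x, y) simply copies the source pixel whose coordinates scale to the same relative position.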
The parameters of a detection target comprise the center x coordinate, the center y coordinate, and the width and height of the detection box. During detection, three detections are performed, corresponding to different receptive fields.
The receptive field in the multiple detections corresponds to 16x downsampling, and the anchor box sizes used are (10, 13), (16, 30) and (32, 33).
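The 16x receptive field together with small anchor boxes matches a YOLO-style detection head. A hedged sketch of how one grid-cell prediction could be decoded into pixel coordinates; the sigmoid/exponential decoding is the standard YOLO convention and is assumed here, not stated in the patent:

```python
import math

ANCHORS = [(10, 13), (16, 30), (32, 33)]  # anchor box sizes from the patent
STRIDE = 16                               # 16x receptive field / downsampling

def decode_box(grid_x, grid_y, tx, ty, tw, th, anchor_index):
    """Decode one raw prediction (tx, ty, tw, th) at grid cell (grid_x, grid_y)
    into pixel-space (cx, cy, w, h), using the standard YOLO convention:
    sigmoid offsets within the cell, exponential scaling of the anchor."""
    aw, ah = ANCHORS[anchor_index]
    sigmoid = lambda v: 1.0 / (1.0 + math.exp(-v))
    cx = (grid_x + sigmoid(tx)) * STRIDE
    cy = (grid_y + sigmoid(ty)) * STRIDE
    return cx, cy, aw * math.exp(tw), ah * math.exp(th)
```

With all raw values at zero, the box sits at the center of its cell with exactly the anchor's size.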
In a specific implementation, after the original prediction heads 1, 2 and 3, a convolutional layer, an upsampling layer and a splicing layer identical to those of the previous level are connected, together with a convolution group that receives the data of a convolutional layer in the base network. Finally, prediction 4 is computed jointly with these data; using this prediction as a cross-check further strengthens the data of the convolutional layer. That is, four levels of upsampling achieve deeper detection and reduce the probability of false detection.
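The added fourth head can be followed as simple shape bookkeeping: the upsampling layer doubles the spatial size, and the splicing layer concatenates channels from the base-network convolutional layer. The channel and grid sizes below are illustrative, not taken from the patent:

```python
def upsample_shape(shape, factor=2):
    """(channels, h, w) of a feature map after upsampling by `factor`."""
    c, h, w = shape
    return (c, h * factor, w * factor)

def concat_shape(a, b):
    """Shape after splicing two feature maps along the channel axis;
    spatial dimensions must already match."""
    assert a[1:] == b[1:], "spatial dimensions must agree before splicing"
    return (a[0] + b[0], a[1], a[2])

# After the third head (e.g. 128 x 52 x 52), upsample and splice with a
# base-network feature map (e.g. 64 x 104 x 104) to feed the fourth head.
head3 = (128, 52, 52)
fused = concat_shape(upsample_shape(head3), (64, 104, 104))
```

The fused map keeps the finer 104 x 104 grid while carrying channels from both paths, which is what allows the deeper, fourth-level detection.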
Referring now to fig. 2, fig. 2 is a block diagram of a video detection apparatus for processing under a strong light environment according to an embodiment of the present invention. The embodiment of the present invention provides a video detection apparatus 21 for processing under a strong light environment, comprising:
an image acquisition unit 211 for acquiring streaming data;
a picture synthesizing unit 212, configured to synthesize the streaming data into a picture and then import the picture into a deep neural network;
a picture conversion unit 213, configured to convert the size of the picture into N × N through the deep neural network to obtain a converted picture;
a parameter acquisition unit 214, configured to give the parameters of each detection target according to preset confidence parameters after the converted picture is subjected to multiple detections by the convolution layers;
a convolution unit 215, configured to add a convolutional layer, an upsampling layer, a splicing layer and a convolution group after the multiple detections, wherein the convolution group is used for receiving data from a convolutional layer in the base network.
In a specific implementation, the image acquisition unit 211 comprises a vision camera, a lidar and a camera.
After the streaming data is synthesized into a picture and imported into the neural network, the picture is classified according to preset rules.
The parameters of a detection target comprise the center x coordinate, the center y coordinate, and the width and height of the detection box.
The receptive field in the multiple detections corresponds to 16x downsampling, and the anchor box sizes used are (10, 13), (16, 30) and (32, 33).
Referring now to fig. 3 and 4, fig. 3 is a diagram of the neural network layers used in a video detection method for processing under a strong light environment according to an embodiment of the present invention, and fig. 4 is a schematic diagram of a convolution group in that neural network.
After the original prediction heads 1, 2 and 3, a convolutional layer, an upsampling layer and a splicing layer identical to those of the previous level are connected, together with a convolution group, wherein the convolution group is used for receiving the data of a convolutional layer in the base network.
In a specific implementation, a convolution group comprises, in order, a 1x1 convolutional layer, a 3x3 convolutional layer, a 1x1 convolutional layer, a 3x3 convolutional layer, and a 1x1 convolutional layer.
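The five-layer group alternates 1x1 and 3x3 kernels. A sketch that tracks channel widths through the group, assuming the common bottleneck pattern in which 1x1 layers halve the width and 3x3 layers double it back (the patent gives only the kernel sizes, not channel widths, so the halving/doubling rule is an assumption):

```python
# Kernel sizes of the convolution group, in order, as stated in the patent.
CONV_GROUP_KERNELS = [1, 3, 1, 3, 1]

def conv_group_channels(in_channels):
    """Channel count after each layer of the group, under the assumed
    bottleneck pattern: 1x1 halves the width, 3x3 doubles it back."""
    widths, c = [], in_channels
    for k in CONV_GROUP_KERNELS:
        c = c // 2 if k == 1 else c * 2
        widths.append(c)
    return widths
```

Starting from 512 input channels this yields 256, 512, 256, 512, 256, the familiar alternating bottleneck of YOLO-style necks.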
In summary, in the method and device for processing video detection under a strong light environment provided by the invention, a convolutional layer, an upsampling layer, a splicing layer and a convolution group are added, and the convolution group receives data from a convolutional layer in the base network, which improves the accuracy, stability and precision of prediction;
further, by using four levels of upsampling, deeper detection can be achieved and the probability of false detection is reduced.
Those of ordinary skill in the art will appreciate that the units and steps of the examples described in connection with the embodiments disclosed herein may be implemented in electronic hardware, computer software, or a combination of both, and the components and steps of each example have been described above generally in terms of their functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that various changes in form and detail may be made without departing from the spirit and scope of the invention.

Claims (10)

1. A video detection method for processing under a strong light environment is characterized by comprising the following steps:
acquiring streaming data by an image acquisition unit;
synthesizing the streaming data into a picture and then importing the picture into a deep neural network;
the deep neural network converts the size of the picture into N x N to obtain a converted picture;
after the converted picture is subjected to multiple detections by the convolution layers, giving the parameters of each detection target according to preset confidence parameters;
after the multiple detections, adding a convolutional layer, an upsampling layer, a splicing layer and a convolution group, wherein the convolution group is used for receiving data from a convolutional layer in the base network.
2. The video detection method for processing under a strong light environment according to claim 1, wherein the image acquisition unit comprises a vision camera, a lidar and a camera.
3. The method according to claim 1, further comprising classifying the pictures according to a preset rule after synthesizing the streaming data into the pictures and importing the pictures into a neural network.
4. The method of claim 1, wherein the parameters of the detection target include center x coordinate, center y coordinate, width and height of the detection frame.
5. The video detection method for processing under a strong light environment according to claim 1, wherein the receptive field in the multiple detections corresponds to 16x downsampling, and the anchor box sizes used are (10, 13), (16, 30) and (32, 33).
6. A video detection apparatus for handling high light environments, comprising:
an image acquisition unit for acquiring streaming data;
the image synthesis unit is used for synthesizing the streaming data into an image and then importing the image into the deep neural network;
the image conversion unit is used for converting the size of the image into N x N through the deep neural network to obtain a converted image;
the parameter acquisition unit is used for giving the parameters of each detection target according to preset confidence parameters after the converted picture is subjected to multiple detections by the convolution layers;
and the convolution unit is used for adding a convolutional layer, an upsampling layer, a splicing layer and a convolution group after the multiple detections, wherein the convolution group is used for receiving data from a convolutional layer in the base network.
7. The video detection apparatus for processing under a strong light environment according to claim 6, wherein the image acquisition unit comprises a vision camera, a lidar and a camera.
8. The video detection apparatus for processing under a strong light environment according to claim 6, further comprising classifying the pictures according to a preset rule after the streaming data is synthesized into pictures and imported into the neural network.
9. The video detection apparatus for processing under a strong light environment according to claim 6, wherein the parameters of the detection target include the center x coordinate, the center y coordinate, and the width and height of the detection box.
10. The video detection apparatus for processing under a strong light environment according to claim 6, wherein the receptive field in the multiple detections corresponds to 16x downsampling, and the anchor box sizes used are (10, 13), (16, 30) and (32, 33).
CN202110718258.5A 2021-06-28 2021-06-28 Video detection method and device for processing under strong light environment Pending CN113449634A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110718258.5A CN113449634A (en) 2021-06-28 2021-06-28 Video detection method and device for processing under strong light environment

Publications (1)

Publication Number Publication Date
CN113449634A 2021-09-28

Family

ID=77813285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110718258.5A Pending CN113449634A (en) 2021-06-28 2021-06-28 Video detection method and device for processing under strong light environment

Country Status (1)

Country Link
CN (1) CN113449634A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019144575A1 (en) * 2018-01-24 2019-08-01 中山大学 Fast pedestrian detection method and device
CN110210621A (en) * 2019-06-06 2019-09-06 大连理工大学 A kind of object detection method based on residual error network improvement
CN111310862A (en) * 2020-03-27 2020-06-19 西安电子科技大学 Deep neural network license plate positioning method based on image enhancement in complex environment
CN111898699A (en) * 2020-08-11 2020-11-06 海之韵(苏州)科技有限公司 Automatic detection and identification method for hull target



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination