CN113362369A - State detection method and detection device for moving object - Google Patents
- Publication number
- CN113362369A (application CN202110635008.5A)
- Authority
- CN
- China
- Prior art keywords
- moving object
- detected
- inputting
- outputting
- optical flow
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/215—Motion-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20112—Image segmentation details
- G06T2207/20132—Image cropping
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30248—Vehicle exterior or interior
- G06T2207/30252—Vehicle exterior; Vicinity of vehicle
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a state detection method for a moving object, which comprises the following steps: acquiring an image frame sequence shot by a monocular camera, wherein the image frames of the image frame sequence comprise a moving object to be detected; carrying out target detection on the image frame sequence to obtain an image block sample for representing the moving object to be detected; inputting the image block sample into a feature extraction module, and outputting optical flow information and tracking information of the moving object to be detected; and inputting the optical flow information and the tracking information into a neural network model, and outputting the state information of the moving object to be detected.
Description
Technical Field
The invention relates to the technical field of traffic safety, and in particular to a state detection method and a state detection device for a moving object.
Background
In the past few years, intelligent driving systems have developed rapidly. Sensing the dynamic environment around an autonomous vehicle is a key task in implementing autonomous driving, and relative vehicle speed estimation is a basic function required by modern intelligent driving systems. Traditionally, dynamic environmental information around a vehicle has been perceived by distance sensors (e.g., LiDAR or millimeter-wave radar).
Currently, the application of distance sensors (e.g., LiDAR or millimeter wave radar) is one of the most representative solutions in smart driving applications. These sensors can directly measure the distance and speed of other vehicles, but they are susceptible to adverse environmental factors such as rain, snow or fog.
Recent studies have shown that estimating ego-motion and disparity maps from monocular camera images via structure from motion in an autonomous driving scenario is indeed possible, but still limited. Methods based on dynamic scene flow work well, but rely on stereo image datasets. Furthermore, they come at a very high computational cost: estimating a pair of time frames on a single CPU core may take 5 to 10 minutes. In an autonomous driving scenario, computational resources are typically very limited, which currently makes object scene flow impractical.
In addition, in dynamic application scenarios, predicting optical flow over the entire image is not desirable because of the imbalance in motion distribution between the static background and moving vehicles. The flow predicted by a full-image flow network degenerates toward zero when the true flow of a moving vehicle occupies only a small fraction of the pixels. Furthermore, due to perspective projection, a distant vehicle traveling at high speed may produce only a small optical flow, while a slow vehicle at close range may produce a large flow on the image. This exacerbates the drawbacks of full-image flow networks.
Disclosure of Invention
Technical problem to be solved
The invention discloses a state detection method and a state detection device for a moving object, which aim to at least partially solve the technical problems described in the background above.
(II) technical scheme
To achieve the above object, an embodiment of the present invention provides a method for detecting a state of a moving object, including: acquiring an image frame sequence shot by a monocular camera, wherein an image frame of the image sequence comprises a moving object to be detected; carrying out target detection on the image frame sequence to obtain an image block sample for representing the moving object to be detected; inputting the image block sample into a feature extraction module, and outputting optical flow information and tracking information of the moving object to be detected; and inputting the optical flow information and the tracking information into a neural network model, and outputting the state information of the moving object to be detected.
According to an embodiment of the present invention, performing target detection on the image frame sequence to obtain image block samples for characterizing the moving object to be detected includes: inputting the image frames in the image frame sequence into a target detector, and outputting a bounding box B_i = (l_i, t_i, r_i, b_i) for representing the moving object to be detected, wherein l_i is the left boundary coordinate of the bounding box, t_i the upper boundary coordinate, r_i the right boundary coordinate, and b_i the lower boundary coordinate;
cutting the boundary frame according to a preset cutting rule to obtain an image block corresponding to the boundary frame;
and constructing the image block samples according to the image block.
According to an embodiment of the present invention, clipping the bounding box according to a preset clipping rule to obtain an image block corresponding to the bounding box includes:
for bounding box Bi=(li,ti,ri,bi) The clipping region is defined as:
wherein the content of the first and second substances,sigma is a spreading factorAnd (4) adding the active ingredients.
According to an embodiment of the present invention, the feature extraction module includes an optical flow calculation module and a tracker, inputting the image block samples into the feature extraction module, and outputting the optical flow information and the tracking information of the moving object to be detected includes: inputting the image block sample into the optical flow calculation module, and outputting the optical flow information of the moving object to be detected; and inputting the image block samples into the tracker and outputting the tracking information of the moving object to be detected.
According to an embodiment of the present invention, the optical flow information includes a magnitude and an angle of a pixel displacement vector of the moving object to be detected.
According to an embodiment of the present invention, the tracking information is obtained using the DeepSORT tracking algorithm.
According to an embodiment of the invention, the neural network model includes a long-term recursive convolutional network, a first multi-layered perceptron, and a second multi-layered perceptron.
According to an embodiment of the present invention, inputting the optical flow information and the tracking information into a neural network model and outputting the state information of the moving object to be detected includes: inputting the optical flow information into the long-term recursive convolutional network and outputting a first one-dimensional vector; inputting the tracking information into the first multi-layer perceptron and outputting a second one-dimensional vector; pooling and stacking the first one-dimensional vector and the second one-dimensional vector to obtain a 1 × n-dimensional vector, wherein n is the dimension; and inputting the 1 × n-dimensional vector into the second multi-layer perceptron and outputting the state information.
According to an embodiment of the present invention, the first multi-layer perceptron is a 6-layer multi-layer perceptron and the second multi-layer perceptron is a 3-layer multi-layer perceptron.
According to an embodiment of the present invention, a state detection apparatus of a mobile object implemented based on any one of the above methods includes: the acquisition module is used for acquiring an image frame sequence obtained by shooting by a monocular camera, wherein the image frame of the image sequence comprises a moving object to be detected; the detection module is used for carrying out target detection on the image frame sequence to obtain an image block sample for representing the moving object to be detected; the first processing module is used for inputting the image block samples into the feature extraction module and outputting optical flow information and tracking information of the moving object to be detected; and the second processing module is used for inputting the optical flow information and the tracking information into a neural network model and outputting the state information of the moving object to be detected.
(III) advantageous effects
The detection method and detection device for estimating the state of a moving object according to the invention have the following beneficial effects:
(1) Compared with conventional moving-object detection algorithms, the invention proposes a data-driven method to estimate the relative velocity of a moving object using a monocular camera, thereby eliminating the need for expensive sensors such as LiDAR.
(2) The method proposed by the invention is a lightweight architecture and depends on pre-computed optical-flow features, which are easy to compute and can be obtained in real time. The invention relies only on calibration of the camera and therefore does not depend on any additional hardware for estimating speed.
(3) The invention adopts a sampling scheme centered on the moving object, reducing the influence of imbalanced motion distribution and perspective projection on optical-flow cue estimation.
Drawings
Fig. 1 is a flowchart illustrating a method for detecting a state of a moving object according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of boundary frame sampling of a moving object, i.e., a vehicle, according to an embodiment of the present invention.
Detailed Description
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
To implement a monocular vision-based solution for ambient vehicle speed estimation, techniques for vehicle target detection and depth estimation are needed. Object detection is a technique required to predict the state (e.g., position, speed, and direction) of a surrounding vehicle.
To estimate ego-motion and surrounding vehicle velocity, optical flow is commonly used in computer vision as a depth estimation technique, since it can be used to calculate velocity relative to the ground. Conventional optical flow estimation algorithms fall into two categories: feature-based methods and variational methods. Feature-based methods find image displacements by detecting features (for example edges, corners, and other locally distinctive structures) and tracking them over a series of frames. Their main limitation is that flow is difficult to estimate in regions lacking salient features. Variational methods provide a more accurate estimate by minimizing an energy function built on brightness-constancy and spatial-smoothness assumptions.
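As a toy illustration of the feature-based idea (not the patent's method; real systems use far more robust trackers such as pyramidal Lucas-Kanade), the displacement of a small patch between two frames can be estimated by exhaustive block matching:

```python
import numpy as np

def match_patch(prev_frame, next_frame, top, left, size=8, search=5):
    """Estimate the (dy, dx) displacement of a patch between two frames
    by exhaustive block matching with a sum-of-squared-differences cost."""
    patch = prev_frame[top:top + size, left:left + size].astype(float)
    best, best_dy, best_dx = np.inf, 0, 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > next_frame.shape[0] \
                    or x + size > next_frame.shape[1]:
                continue  # candidate window falls outside the frame
            cand = next_frame[y:y + size, x:x + size].astype(float)
            ssd = np.sum((patch - cand) ** 2)
            if ssd < best:
                best, best_dy, best_dx = ssd, dy, dx
    return best_dy, best_dx

# Synthetic example: a bright square shifted by (2, 3) pixels between frames.
prev = np.zeros((32, 32)); prev[10:18, 10:18] = 1.0
nxt = np.zeros((32, 32)); nxt[12:20, 13:21] = 1.0
print(match_patch(prev, nxt, 10, 10))  # → (2, 3)
```

As the text notes, such matching fails where the patch lacks salient structure (a uniform region matches everywhere equally well), which is why variational methods add smoothness assumptions.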
Compared with distance sensors, camera sensors can provide richer scene texture and structure information even under adverse conditions, and can serve as an economical, efficient, and powerful substitute for distance sensors.
In view of economic efficiency and environmental suitability, the present invention proposes a data-based method to estimate the relative speed and position of a vehicle using a monocular camera.
An embodiment of the present invention provides a method for detecting a state of a moving object, including: acquiring an image frame sequence shot by a monocular camera, wherein an image frame of the image sequence comprises a moving object to be detected; carrying out target detection on the image frame sequence to obtain an image block sample for representing the moving object to be detected; inputting the image block sample into a feature extraction module, and outputting optical flow information and tracking information of the moving object to be detected; and inputting the optical flow information and the tracking information into a neural network model, and outputting the state information of the moving object to be detected, as shown in fig. 1.
In the embodiment of the present invention, the moving object may be, for example, a running vehicle, a crowd, or another moving object, and the monocular camera may be disposed on the moving object. Taking a running vehicle as an example, a sampling strategy centered on the vehicle can be adopted, with the monocular camera capturing the image frame sequence, to handle the influence of imbalanced motion distribution and perspective projection. In particular, an RGB image is acquired at time t1 and another RGB image at time t2; from these, the state of the moving object in the current frame can be estimated relative to the camera coordinate system.
According to an embodiment of the present invention, performing target detection on the image frame sequence to obtain image block samples for characterizing the moving object to be detected includes: inputting the image frames in the image frame sequence into a target detector, and outputting a bounding box B_i = (l_i, t_i, r_i, b_i) for representing the moving object to be detected, wherein l_i is the left boundary coordinate of the bounding box, t_i the upper boundary coordinate, r_i the right boundary coordinate, and b_i the lower boundary coordinate, as shown in FIG. 2; clipping the bounding box according to a preset clipping rule to obtain an image block corresponding to the bounding box; and constructing the image block samples from the image blocks.
In an embodiment of the present invention, the target detector may be, for example, Faster R-CNN or YOLO; l_i, t_i, r_i, b_i may be coordinates in units of pixels.
According to an embodiment of the present invention, clipping the bounding box according to a preset clipping rule to obtain an image block corresponding to the bounding box includes: for a bounding box B_i = (l_i, t_i, r_i, b_i), defining the clipping region by expanding the box by a spreading factor σ.
In the embodiment of the present invention, taking the moving object to be a traveling vehicle as an example, the bounding box B_i = (l_i, t_i, r_i, b_i) of the traveling vehicle is acquired, the bounding box is clipped according to the clipping rule to obtain the image block corresponding to the bounding box, and the image block is resized to a fixed size to obtain an image block sample; meanwhile, the corresponding image block at the same position in the previous frame is generated in the same manner and resized to the fixed size to obtain its image block sample.
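A minimal sketch of this cropping-and-resizing step. The patent's exact clipping formula is not reproduced in the source text, so the symmetric expansion by the spreading factor σ and the fixed output size below are assumptions for illustration (a real pipeline would use cv2.resize rather than this nearest-neighbour stand-in):

```python
import numpy as np

def crop_with_margin(image, box, sigma=0.25, out_size=(64, 64)):
    """Crop box (l, t, r, b) expanded by a fraction sigma of its width and
    height (an assumed clipping rule), then resize to a fixed output size."""
    l, t, r, b = box
    w, h = r - l, b - t
    H, W = image.shape[:2]
    l = max(0, int(l - sigma * w)); r = min(W, int(r + sigma * w))
    t = max(0, int(t - sigma * h)); b = min(H, int(b + sigma * h))
    patch = image[t:b, l:r]
    # Nearest-neighbour resize to a fixed size (stand-in for cv2.resize).
    ys = np.linspace(0, patch.shape[0] - 1, out_size[0]).astype(int)
    xs = np.linspace(0, patch.shape[1] - 1, out_size[1]).astype(int)
    return patch[np.ix_(ys, xs)]

img = np.arange(100 * 100).reshape(100, 100)
sample = crop_with_margin(img, (40, 40, 60, 60), sigma=0.25)
print(sample.shape)  # → (64, 64)
```

Applying the same crop region to the previous frame, as the paragraph describes, yields the paired image block sample for optical flow computation.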
According to an embodiment of the present invention, the feature extraction module includes an optical flow calculation module and a tracker, inputting the image block samples into the feature extraction module, and outputting the optical flow information and the tracking information of the moving object to be detected includes: inputting the image block sample into the optical flow calculation module, and outputting the optical flow information of the moving object to be detected; and inputting the image block samples into the tracker and outputting the tracking information of the moving object to be detected.
In the embodiment of the present invention, optical flow specifically refers to the amount by which a pixel representing the same object moves from one video frame to the next, and can be represented by a two-dimensional vector. Dense optical flow is used to extract information in the spatiotemporal domain, emphasizing the motion information of the vehicle throughout the image frame sequence.
In an embodiment of the present invention, the tracker may be, for example, a tracker with a built-in DeepSORT tracking algorithm.
According to an embodiment of the present invention, the optical flow information includes a magnitude and an angle of a pixel displacement vector of the moving object to be detected.
In an embodiment of the present invention, the moving object may be, for example, a running vehicle, and the optical flow and depth features of the vehicle in the cropped and resized image block samples may be predicted using OpenCV functions, yielding the magnitude and angle of the vehicle's pixel displacement vector; this pixel displacement serves as a basic feature of the moving object state information.
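The text does not name the OpenCV function; commonly, cv2.calcOpticalFlowFarneback returns a dense (H, W, 2) array of per-pixel displacements, and cv2.cartToPolar converts it to magnitude and angle. A NumPy equivalent of that conversion, shown here so the example is self-contained:

```python
import numpy as np

def flow_to_polar(flow):
    """Convert a dense flow field of shape (H, W, 2) holding (dx, dy) per
    pixel into per-pixel magnitude and angle (radians, in [0, 2*pi))."""
    dx, dy = flow[..., 0], flow[..., 1]
    magnitude = np.hypot(dx, dy)
    angle = np.mod(np.arctan2(dy, dx), 2 * np.pi)
    return magnitude, angle

# A toy 1x2 flow field: one pixel moving (3, 4), one moving (-1, 0).
flow = np.array([[[3.0, 4.0], [-1.0, 0.0]]])
mag, ang = flow_to_polar(flow)
print(mag)  # → [[5. 1.]]
```

The magnitude and angle arrays computed this way are the per-pixel displacement features the paragraph describes.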
According to the embodiment of the invention, the tracking information can be obtained using the DeepSORT tracking algorithm.
In an embodiment of the present invention, the tracking information obtained by the DeepSORT tracking algorithm is, for example, a state vector, defined (following the standard DeepSORT formulation) as s = (x, y, r, h, ẋ, ẏ, ṙ, ḣ), where (x, y) is the center point of the bounding box, h the height of the bounding box, r the aspect ratio of the bounding box, and (ẋ, ẏ, ṙ, ḣ) the corresponding velocities of these quantities in the image coordinate system. When the detector detects information associated with the target vehicle, the detected bounding box is used to update the target vehicle's state vector.
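DeepSORT propagates this state with a constant-velocity Kalman filter. The patent does not spell the filter out, so the following is an illustrative sketch of the predict step only (the measurement update that incorporates a detected bounding box is omitted):

```python
import numpy as np

def predict_state(state, dt=1.0):
    """Constant-velocity prediction for a DeepSORT-style state
    s = (x, y, r, h, vx, vy, vr, vh): each of the four position-like
    components advances by its velocity; velocities stay unchanged."""
    F = np.eye(8)
    F[:4, 4:] = dt * np.eye(4)  # position += dt * velocity
    return F @ state

s = np.array([100.0, 50.0, 0.5, 80.0, 2.0, -1.0, 0.0, 0.5])
print(predict_state(s)[:4])  # predicted (x, y, r, h) = (102, 49, 0.5, 80.5)
```

In the full algorithm, a Kalman gain then blends this prediction with the detector's bounding box, which is the "update" the paragraph refers to.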
According to an embodiment of the invention, the neural network model includes a long-term recursive convolutional network, a first multi-layered perceptron, and a second multi-layered perceptron.
According to an embodiment of the present invention, inputting the optical flow information and the tracking information into a neural network model and outputting the state information of the moving object to be detected includes: inputting the optical flow information into the long-term recursive convolutional network and outputting a first one-dimensional vector; inputting the tracking information into the first multi-layer perceptron and outputting a second one-dimensional vector; pooling and stacking the first one-dimensional vector and the second one-dimensional vector to obtain a 1 × n-dimensional vector, wherein n is the dimension; and inputting the 1 × n-dimensional vector into the second multi-layer perceptron and outputting the state information.
In an embodiment of the present invention, the moving object may be, for example, a vehicle in motion, and the state information output by the above method may be speed and position information of the vehicle.
According to an embodiment of the present invention, the first multi-layer perceptron is a 6-layer multi-layer perceptron and the second multi-layer perceptron is a 3-layer multi-layer perceptron.
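The fusion step described above (a first one-dimensional vector from the recurrent network over the optical flow, a second from the tracking perceptron, combined into a 1 × n vector for the final perceptron) can be sketched in NumPy. The pooling details and vector sizes are not given in the text, so the sizes below are hypothetical and the sketch simply concatenates (stacks) the two vectors:

```python
import numpy as np

def fuse_features(flow_feat, track_feat):
    """Stack the first one-dimensional vector (optical-flow branch) and the
    second one-dimensional vector (tracking branch) into a single 1 x n row
    vector for the final perceptron. Sizes are illustrative only."""
    fused = np.concatenate([flow_feat.ravel(), track_feat.ravel()])
    return fused.reshape(1, -1)  # shape (1, n)

flow_feat = np.random.rand(128)   # hypothetical optical-flow feature size
track_feat = np.random.rand(32)   # hypothetical tracking feature size
print(fuse_features(flow_feat, track_feat).shape)  # → (1, 160)
```

The resulting 1 × n row vector is exactly the shape a small fully connected network (here, the 3-layer perceptron) expects as a single input sample.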
Another embodiment of the present invention provides a state detection apparatus for a mobile object based on any one of the above methods, including: the acquisition module is used for acquiring an image frame sequence obtained by shooting by a monocular camera, wherein the image frame of the image sequence comprises a moving object to be detected; the detection module is used for carrying out target detection on the image frame sequence to obtain an image block sample for representing the moving object to be detected; the first processing module is used for inputting the image block samples into the feature extraction module and outputting optical flow information and tracking information of the moving object to be detected; and the second processing module is used for inputting the optical flow information and the tracking information into a neural network model and outputting the state information of the moving object to be detected.
It should be noted that the state detection device portion of the moving object in the embodiment of the present disclosure corresponds to the state detection method portion of the moving object in the embodiment of the present disclosure, and the description of the state detection device portion of the moving object specifically refers to the state detection method portion of the moving object, and is not repeated herein.
The method and device disclosed by the invention yield a lightweight architecture that requires no expensive hardware, reduce the influence of severe weather, imbalanced motion distribution, and perspective projection on optical-flow cue estimation, and improve the detection of the moving object's state information.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (10)
1. A state detection method of a moving object, comprising:
acquiring an image frame sequence obtained by shooting by a monocular camera, wherein the image frames of the image frame sequence comprise a moving object to be detected;
performing target detection on the image frame sequence to obtain an image block sample for representing the moving object to be detected;
inputting the image block samples into a feature extraction module, and outputting optical flow information and tracking information of the moving object to be detected; and
and inputting the optical flow information and the tracking information into a neural network model, and outputting the state information of the moving object to be detected.
2. The method of claim 1, wherein the performing target detection on the sequence of image frames to obtain image block samples for characterizing the moving object to be detected comprises:
inputting image frames in the image frame sequence into a target detector, and outputting a bounding box B_i = (l_i, t_i, r_i, b_i) for representing the moving object to be detected, wherein l_i is the left boundary coordinate of the bounding box, t_i the upper boundary coordinate, r_i the right boundary coordinate, and b_i the lower boundary coordinate;
cutting the boundary box according to a preset cutting rule to obtain an image block corresponding to the boundary box;
and constructing the image block samples according to the image blocks.
3. The method according to claim 2, wherein clipping the bounding box according to a preset clipping rule to obtain an image block corresponding to the bounding box comprises:
for a bounding box B_i = (l_i, t_i, r_i, b_i), the clipping region is defined by expanding the box by a spreading factor σ.
4. The method of claim 1, wherein the feature extraction module comprises an optical flow calculation module and a tracker, the inputting the image block samples into the feature extraction module, and the outputting the optical flow information and the tracking information of the moving object to be detected comprises:
inputting the image block samples into the optical flow calculation module, and outputting optical flow information of the moving object to be detected;
and inputting the image block samples into the tracker and outputting the tracking information of the moving object to be detected.
5. The method of claim 1, wherein the optical flow information comprises a magnitude and an angle of a pixel displacement vector of the object to be detected.
6. The method of claim 1, further comprising:
and obtaining the tracking information using the DeepSORT tracking algorithm.
7. The method of claim 1, wherein the neural network model comprises a long-term recursive convolutional network, a first multi-layered perceptron, and a second multi-layered perceptron.
8. The method of claim 7, wherein inputting the optical flow information and tracking information into a neural network model, outputting the state information of the moving object to be detected comprises:
inputting the optical flow information into the long-term recursive convolutional network, and outputting a first one-dimensional vector;
inputting the tracking information into the first multilayer perceptron, and outputting a second one-dimensional vector;
pooling and stacking the first one-dimensional vector and the second one-dimensional vector to obtain a 1 x n-dimensional vector, wherein n is a dimension;
and inputting the vector with the dimension of 1 x n into the second multilayer perceptron, and outputting the state information.
9. The method of claim 8, wherein the first multi-layer perceptron is a 6-layer multi-layer perceptron and the second multi-layer perceptron is a 3-layer multi-layer perceptron.
10. A state detection apparatus of a mobile object realized based on the method of any one of claims 1 to 9, comprising:
the system comprises an acquisition module, a detection module and a processing module, wherein the acquisition module is used for acquiring an image frame sequence obtained by shooting by a monocular camera, and the image frame of the image sequence comprises a moving object to be detected;
the detection module is used for carrying out target detection on the image frame sequence to obtain an image block sample for representing the moving object to be detected;
the first processing module is used for inputting the image block samples into the feature extraction module and outputting optical flow information and tracking information of the moving object to be detected; and
and the second processing module is used for inputting the optical flow information and the tracking information into a neural network model and outputting the state information of the moving object to be detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110635008.5A CN113362369A (en) | 2021-06-07 | 2021-06-07 | State detection method and detection device for moving object |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113362369A true CN113362369A (en) | 2021-09-07 |
Family
ID=77532989
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110635008.5A Pending CN113362369A (en) | 2021-06-07 | 2021-06-07 | State detection method and detection device for moving object |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113362369A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100033579A1 (en) * | 2008-05-26 | 2010-02-11 | Sanyo Electric Co., Ltd. | Image Shooting Device And Image Playback Device |
CN109035167A (en) * | 2018-07-17 | 2018-12-18 | 北京新唐思创教育科技有限公司 | Method, apparatus, device and medium for processing multiple faces in an image |
US20190042850A1 (en) * | 2017-08-07 | 2019-02-07 | Mitsubishi Electric Research Laboratories, Inc. | Method and System for Detecting Actions in Videos using Contour Sequences |
US20210112208A1 (en) * | 2019-10-14 | 2021-04-15 | Kt Corporation | Device, method and computer program for extracting object from video |
History
2021-06-07: CN application CN202110635008.5A (publication CN113362369A) filed; legal status: active, Pending
Non-Patent Citations (2)
Title |
---|
D. K. Jain et al.: "Relative Vehicle Velocity Estimation Using Monocular Video Stream", 2020 International Joint Conference on Neural Networks (IJCNN) * |
Liu Yuan: "Research on Deep-Learning-Based 'Micro-Action' Recognition Technology", China Masters' Theses Full-text Database, Information Science and Technology Series * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021196294A1 (en) | Cross-video person location tracking method and system, and device | |
Zhou et al. | Efficient road detection and tracking for unmanned aerial vehicle | |
US20210042929A1 (en) | Three-dimensional object detection method and system based on weighted channel features of a point cloud | |
CN108648161B (en) | Binocular vision obstacle detection system and method of asymmetric kernel convolution neural network | |
CN103325112B (en) | Moving target method for quick in dynamic scene | |
CN113286194A (en) | Video processing method and device, electronic equipment and readable storage medium | |
CN110688905B (en) | Three-dimensional object detection and tracking method based on key frame | |
US8503730B2 (en) | System and method of extracting plane features | |
CN112215074A (en) | Real-time target identification and detection tracking system and method based on unmanned aerial vehicle vision | |
CN104794737A (en) | Depth-information-aided particle filter tracking method | |
CN111738071B (en) | Inverse perspective transformation method based on motion change of monocular camera | |
CN111292369B (en) | False point cloud data generation method of laser radar | |
CN113744315B (en) | Semi-direct vision odometer based on binocular vision | |
CN113763427B (en) | Multi-target tracking method based on coarse-to-fine shielding processing | |
CN113223045A (en) | Vision and IMU sensor fusion positioning system based on dynamic object semantic segmentation | |
CN111783675A (en) | Intelligent city video self-adaptive HDR control method based on vehicle semantic perception | |
CN114038193A (en) | Intelligent traffic flow data statistical method and system based on unmanned aerial vehicle and multi-target tracking | |
Hadviger et al. | Feature-based event stereo visual odometry | |
CN116879870A (en) | Dynamic obstacle removing method suitable for low-wire-harness 3D laser radar | |
Li et al. | Weak moving object detection in optical remote sensing video with motion-drive fusion network | |
Li et al. | Vehicle object detection based on rgb-camera and radar sensor fusion | |
CN114842340A (en) | Robot binocular stereoscopic vision obstacle sensing method and system | |
CN112884803B (en) | Real-time intelligent monitoring target detection method and device based on DSP | |
Wu et al. | Registration-based moving vehicle detection for low-altitude urban traffic surveillance | |
CN116105721B (en) | Loop optimization method, device and equipment for map construction and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20210907 |