CN113920163B - Moving target detection method based on combination of traditional and deep learning - Google Patents

Moving target detection method based on combination of traditional and deep learning

Info

Publication number
CN113920163B
CN113920163B (application CN202111176760.4A)
Authority
CN
China
Prior art keywords
moving object
image
potential
camera
area
Prior art date
Legal status
Active
Application number
CN202111176760.4A
Other languages
Chinese (zh)
Other versions
CN113920163A (en)
Inventor
蒋涛
崔亚男
谢昱锐
付克昌
袁建英
吴思东
黄小燕
刘明文
段翠萍
罗鸿明
Current Assignee
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority: CN202111176760.4A
Publication of CN113920163A
Application granted
Publication of CN113920163B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses a moving target detection method based on a combination of traditional methods and deep learning, which comprises the following steps: step one, detecting adjacent frames of road images acquired by a binocular camera with an instance segmentation algorithm, so as to divide each image into potential moving object regions and a static region; step two, extracting and matching feature points in the potential moving object regions and the static region of each image, respectively; and step three, performing motion compensation based on the camera self-motion parameters, and judging the motion state by calculating the re-projection error, so as to mark the moving objects in the image based on the judgment result. The method improves the real-time performance of the algorithm and raises detection accuracy by improving the precision of the self-motion parameter estimation.

Description

Moving target detection method based on combination of traditional and deep learning
Technical Field
The invention relates to the technical field of image processing, and more particularly to a moving object detection method based on a combination of traditional methods and deep learning, for detecting potential moving objects on the road on which an intelligent vehicle is running.
Background
Today, intelligent vehicles have become a research hotspot in the field of vehicle engineering worldwide and a new driving force for the growth of the automobile industry, and many developed countries have incorporated them into intelligent transportation, a major national development priority. The running environment of an intelligent vehicle is complex, highly dynamic, and highly random. Accurate detection and trajectory prediction of dynamic targets in the environment are the basis of behavior decision and navigation planning for unmanned vehicles and the key to safe intelligent driving; they are particularly important for decision-making in scenarios such as changing lanes on a multi-lane road or merging onto a highway.
Currently, the perception of moving objects by intelligent vehicles relies mainly on lidar-based and vision-based methods. Lidar can obtain accurate distances from scene targets to the vehicle, but its limited angular resolution weakens the detection of small, distant targets; its high price is also one of the factors limiting the popularization of unmanned vehicles. In contrast, the vision sensor, considered the perception modality closest to human vision, has attracted attention for its low cost, small volume, light weight, rich information, and good algorithm reusability; a few major autonomous-driving companies even take purely visual perception as the main direction of intelligent vehicle environment perception. At present, the application of vision on intelligent vehicles focuses mainly on the recognition of lane lines, road signs, pedestrians, and vehicles. When the camera itself moves, for example when it is fixed on a moving platform, moving object detection methods designed for a stationary camera are no longer applicable; research on moving object detection with a moving camera has therefore become a hotspot in recent years.
At present, moving object detection with a stationary camera mainly adopts background subtraction, the frame difference method, the optical flow method, and the like, and is widely applied to crowd monitoring in public places. However, when the camera is fixed on a moving platform such as an intelligent vehicle, these methods are not applicable, because the motion of the camera mixes the motion of the target with the motion of the background, which greatly complicates moving target detection.
Disclosure of Invention
It is an object of the present invention to address at least the above problems and/or disadvantages and to provide at least the advantages described below.
To achieve these objects and other advantages and in accordance with the purpose of the invention, there is provided a moving object detection method based on a combination of traditional methods and deep learning, comprising:
step one, detecting adjacent frames of road images acquired by a binocular camera with an instance segmentation algorithm, so as to divide each image into potential moving object regions and a static region;
step two, extracting and matching feature points in the potential moving object regions and the static region of each image, respectively;
and step three, performing motion compensation based on the camera self-motion parameters, and judging the motion state by calculating the re-projection error, so as to mark the moving objects in the image based on the judgment result.
Preferably, in step one, the binocular camera is configured as a binocular camera mounted on a vehicle;
further, in step one, the instance segmentation algorithm SOLOv2 is adopted to mark the background pixel value in the road environment image as 0 and to mark the pixel values of the other potential moving objects in sequence from 1, 2, ..., so that the different potential moving objects in each image acquired by the binocular camera are rendered as mask images carrying different label information, dividing the road environment image into potential moving object regions and a static region.
Preferably, in step two, the feature point extraction for the potential moving object regions and the static region is configured as follows:
the feature points of the static region are obtained with the ORB feature point extraction method, together with a feature point homogenization extraction strategy, to support accurate camera self-motion parameters;
the feature points of the potential moving object regions are obtained with the Shi-Tomasi feature point extraction method.
Preferably, in step three, the motion compensation is configured to include:
based on the feature points extracted and matched in the static region, calculating the camera self-motion parameters between every two consecutive frames with the PnP method;
and performing motion compensation on the earlier frame of each pair through the camera self-motion parameters, so that the image pair becomes equivalent to the case of a stationary camera.
Preferably, in step three, the re-projection error is obtained by projecting the feature points of the current frame onto the previous frame, yielding a re-projection residual image for the adjacent frames.
Preferably, in step three, the method for judging the motion state is configured as follows:
the length l of the feature-point pair represented by the re-projection residual on a potential moving object region is compared with the feature-point motion-state judgment threshold T; when l > T, the feature point is marked in color to indicate that it is a moving point;
each potential moving object is traversed by its label, the number of moving points falling in each potential moving object region is counted, and a threshold Φ on the number of moving points is set; if the number of moving points in a potential moving object region is greater than Φ, that region is marked red to represent a moving object, otherwise green to represent a stationary object;
wherein the threshold T is configured as T = λμ, where μ denotes the mean re-projection residual length of the static region and λ denotes a number greater than 1.
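For concreteness, the decision rule of this preferred scheme can be written as a single inequality; the numerical values below are illustrative assumptions only, since the patent specifies neither λ nor the residual statistics.

```latex
l > T, \qquad T = \lambda \mu, \qquad \lambda > 1
```

For example, assuming μ = 1.2 px and λ = 2, then T = 2.4 px, and a feature point on a potential moving object region whose residual length is l = 3.1 px > T would be marked as a moving point.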
The invention provides at least the following beneficial effects. First, the mask generated by instance segmentation is used as the potential moving object region, dividing the whole image accurately into two parts: a static region and potential moving object regions. Feature points are then extracted with a different strategy for each part, which improves the real-time performance of the algorithm and the accuracy of the self-motion parameter estimation.
Second, when judging moving objects, a threshold method that accounts for the self-motion estimation error is used to judge the motion state of the target; because the error of the self-motion parameter estimation is taken into account, the accuracy is improved, and the method is practical and effective.
Additional advantages, objects, and features of the invention will be set forth in part in the description which follows and in part will become apparent to those having ordinary skill in the art upon examination of the following or may be learned from practice of the invention.
Drawings
FIG. 1 is a flow chart of a moving object detection method based on the combination of the traditional and the deep learning;
FIG. 2 is a schematic diagram of a process for determining a moving object in the fourth step;
FIG. 3 is an illustration of an unprocessed road environment obtained in an application of an embodiment of the present invention;
FIG. 4 is a schematic illustration of marking points of motion in a potential moving target region in step four of the present invention;
FIG. 5 is a schematic diagram of a moving object marked in step four of the present invention;
FIG. 6 is another schematic diagram of the moving object marked in the fourth step of the present invention;
FIG. 7 is a schematic diagram of a prior art marking of a moving object;
fig. 8 is another schematic diagram of a prior art marking a moving object.
Detailed Description
The present invention is described in further detail below with reference to the drawings to enable those skilled in the art to practice the invention by referring to the description.
The moving target detection method based on the combination of traditional methods and deep learning according to the present invention combines a deep-learning-based pedestrian and vehicle detection algorithm with traditional methods, to address the problems that traditional target detection fails under occlusion and that its detection effect is otherwise unsatisfactory. In addition, a threshold method that accounts for the self-motion estimation error is introduced to judge the motion state of the target, to address the low accuracy of existing methods that judge the motion state using the re-projection error alone while neglecting the error of the self-motion parameter estimation.
The invention detects vehicle and pedestrian instances in the left image of each road image pair with an instance segmentation algorithm, and treats the detected vehicles and pedestrians as potential moving targets.
Because the potential moving targets in a traffic environment are mainly pedestrians and vehicles, because traditional target detection fails under occlusion and its detection effect is unsatisfactory, and because deep-learning-based pedestrian and vehicle detection is by now very mature, the invention introduces a deep-learning-based pedestrian and vehicle detection algorithm: candidate regions of moving targets are first located in the image, which narrows the search range from the whole image and increases the detection speed. In consideration of the accuracy of target boundary detection, the invention adopts an instance segmentation algorithm to obtain the boundary regions of vehicles and pedestrians.
Then, feature points are detected on the four images of two consecutive frames (there are four images per frame pair because the binocular camera acquires a left view and a right view of the same scene; since the two views differ only slightly in viewpoint, the left image, or alternatively the right image, can be selected directly for analysis). The camera self-motion parameters are computed first, and the earlier frame of each pair is then motion-compensated with the obtained self-motion parameters, yielding a re-projection residual image.
The specific flow is as follows:
Camera self-motion parameter estimation mainly obtains the camera pose through a visual odometer, and current visual odometers rest on the assumption of a static scene; when dynamic targets dominate the scene, the algorithm can fail. Likewise, unevenly distributed feature points in the static region cause errors in computing the camera self-motion parameters. The invention therefore combines potential moving object extraction with the visual odometer, in two steps: first, the potential moving object regions are assumed to be moving, and feature points are extracted only in the remaining region and used for self-motion parameter estimation; second, a homogenized feature extraction strategy is used in the static region to improve the accuracy of the self-motion parameter estimation.
Finally, the motion state of each target is judged from the re-projection residual image. Given the uncertainty of target motion estimation, a threshold method that accounts for the self-motion parameter estimation error is adopted. The threshold here refers to the average length of the feature-point pairs: the average length of the static-region feature-point pairs and the lengths of the potential-moving-object-region feature-point pairs are computed separately in the re-projection residual image, where the static-region value absorbs the error of the self-motion parameter estimation. If the feature-point-pair length in a potential moving object region is larger than a certain multiple of the static-region average, that feature point is judged to be moving; otherwise it is judged stationary. A constant threshold is then set on the number of moving points: the moving points in each potential moving object region are counted, and the region is marked as a moving target if the count exceeds the threshold, or as a stationary target otherwise, the two cases being marked in different colors.
Examples:
The invention is implemented on the CLion experimental platform and mainly comprises six steps, involving potential moving object extraction by instance segmentation, feature point extraction and matching, self-motion parameter estimation, motion compensation, computation of the re-projection residual, and motion state judgment and marking with an adaptive threshold, as follows:
Step one, potential moving targets, comprising vehicles and pedestrians, are extracted from the input image. Specifically, the mature instance segmentation algorithm SOLOv2 is adopted: the background pixel value is marked as 0, and the pixel values of the other potential moving objects are marked in sequence from 1, 2, ..., yielding, for each left image of the binocular camera, a mask image in which different potential moving objects carry different label information. Compared with traditional methods, this extracts the potential moving targets more completely. A minimal sketch of how such a label mask can be split into regions is given below.
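The following sketch shows one way to turn such a label mask into the static region and per-instance potential moving object regions. It assumes the mask has already been produced by an instance-segmentation model; the function name and interface are illustrative, not taken from the patent.

```python
import numpy as np

def split_regions(label_mask: np.ndarray):
    """Split a frame into a static region and per-instance potential
    moving-object regions, given an instance label mask in which the
    background is 0 and each potential moving object is 1, 2, ...
    Producing `label_mask` (e.g. with a SOLO-family model) is assumed
    to have happened upstream."""
    static_mask = (label_mask == 0)                 # stationary region
    ids = np.unique(label_mask)
    ids = ids[ids != 0]                             # drop the background label
    instance_masks = {int(i): (label_mask == i) for i in ids}
    return static_mask, instance_masks
```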
Step two, feature points are extracted and matched in the potential moving object regions and the static region obtained in step one, using a different method in each region to meet that region's goal. The static region uses the ORB feature point extraction method with a feature point homogenization strategy, to obtain relatively accurate camera self-motion parameters. The potential moving object regions need many feature points, to avoid losing targets in the motion-state judgment for lack of them, so the Shi-Tomasi feature point extraction method is used there to obtain richer feature points. The static and potential moving object regions of the right image need not be produced by the deep network again: the corresponding feature points of the right image are obtained directly by matching against the feature points of the corresponding regions of the left image, which reduces the time cost. A sketch of both extraction strategies follows.
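A minimal OpenCV sketch of the two extraction strategies is given below. The grid bucketing used for homogenization is one plausible reading of the patent's "feature point homogenization extraction strategy", not its confirmed definition, and all parameter values are illustrative.

```python
import cv2
import numpy as np

orb = cv2.ORB_create(nfeatures=2000)

def static_features(gray, static_mask, grid=(8, 8), per_cell=10):
    """ORB keypoints in the static region, homogenized by keeping at most
    `per_cell` strongest responses per grid cell."""
    mask = static_mask.astype(np.uint8) * 255
    kps = orb.detect(gray, mask)
    h, w = gray.shape
    buckets = {}
    for kp in kps:
        cell = (int(kp.pt[1] * grid[0] / h), int(kp.pt[0] * grid[1] / w))
        buckets.setdefault(cell, []).append(kp)
    kept = []
    for cell_kps in buckets.values():
        cell_kps.sort(key=lambda k: k.response, reverse=True)
        kept.extend(cell_kps[:per_cell])
    return orb.compute(gray, kept)          # (keypoints, descriptors)

def instance_features(gray, instance_mask, max_corners=300):
    """Dense Shi-Tomasi corners inside one potential moving-object mask."""
    mask = instance_mask.astype(np.uint8) * 255
    return cv2.goodFeaturesToTrack(gray, maxCorners=max_corners,
                                   qualityLevel=0.01, minDistance=3,
                                   mask=mask)
```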
Step three, the camera self-motion parameters between every two consecutive frames are computed with the PnP method, from the feature points extracted and matched in the static region of the two frames. A hedged sketch of this step follows.
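The sketch below shows one way such a PnP step can look with OpenCV. The patent names only "the PnP method"; the use of RANSAC, the stereo-triangulated 3D points, and all parameter values are assumptions made for illustration.

```python
import cv2
import numpy as np

def self_motion(pts3d_prev, pts2d_curr, K):
    """Camera self-motion between two frames by PnP.

    pts3d_prev : Nx3 static-region points triangulated from the previous
                 stereo pair, in the previous camera's coordinates.
    pts2d_curr : Nx2 matched pixel locations in the current left image.
    K          : 3x3 camera intrinsic matrix.
    Returns (R, t) such that X_curr = R @ X_prev + t."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts3d_prev.astype(np.float64), pts2d_curr.astype(np.float64),
        K, None, reprojectionError=2.0)
    if not ok:
        raise RuntimeError("PnP failed: not enough inliers")
    R, _ = cv2.Rodrigues(rvec)
    return R, tvec.reshape(3)
```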
Step four, the earlier frame of each pair is motion-compensated with the camera self-motion parameters from step three, so that the image pair becomes equivalent to the case of a stationary camera.
Step five, re-projection is performed with the feature points from step two: the feature points of the current frame are projected onto the previous frame, and the re-projection residual image is obtained. A sketch covering steps four and five follows.
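A minimal sketch of steps four and five under the same assumptions as above: current-frame feature points, given stereo depth, are mapped back into the previous frame with the self-motion estimate, and the residual is the pixel distance to the matched previous-frame location.

```python
import numpy as np

def reprojection_residuals(pts3d_curr, pts2d_prev, R, t, K):
    """Residual lengths of feature-point pairs after motion compensation.

    pts3d_curr : Nx3 points in current-camera coordinates (stereo depth).
    pts2d_prev : Nx2 matched pixel locations in the previous left image.
    (R, t)     : self-motion with X_curr = R @ X_prev + t, inverted here
                 to carry current-frame points back to the previous frame."""
    X_prev = (pts3d_curr - t.reshape(1, 3)) @ R   # rows: R.T @ (X - t)
    proj = X_prev @ K.T                           # pinhole projection
    proj = proj[:, :2] / proj[:, 2:3]
    return np.linalg.norm(proj - pts2d_prev, axis=1)
```

For a truly static point the residual should be near zero after compensation, while a moving point's residual survives as a longer feature-point pair.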
Step six, the error in the estimated camera self-motion parameters cannot be completely eliminated (were there no such error, the re-projection residual of the static region would be zero). Therefore the mean of the re-projection residuals of the static region is computed, and each re-projection residual on a potential moving object region is compared against a threshold derived from that mean: if a residual on a potential moving object region exceeds the threshold, the corresponding feature point is marked green to represent a moving point. Each potential moving object is then traversed by its label, the number of moving points falling in each potential moving object region is counted, and a threshold Φ on the number of moving points is set: if the number of moving points in a potential moving object region is greater than Φ, the region is marked red to represent a moving object; otherwise the region is marked green, indicating a stationary object.
A feature point is marked as a moving point when l > λμ, where l denotes the length of the feature-point pair represented by the re-projection residual on the potential moving object region, μ denotes the mean re-projection residual length of the static region, λ denotes a value greater than 1, and λμ is the threshold for judging the motion state of the feature point. A sketch of this decision follows.
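A minimal sketch of the step-six decision, assuming residual lengths computed as in the previous sketches; λ and Φ are tuning constants whose concrete values the patent does not disclose.

```python
import numpy as np

def classify_instances(static_residuals, instance_residuals, lam=2.0, phi=10):
    """Mark each potential moving object as moving or static.

    static_residuals   : 1D array of residual lengths in the static region.
    instance_residuals : dict mapping instance label -> 1D array of residual
                         lengths of that instance's feature points.
    A point is a moving point when l > lam * mu (mu: mean static residual);
    an instance is moving when its moving-point count exceeds phi."""
    mu = float(np.mean(static_residuals))
    threshold = lam * mu
    labels = {}
    for inst_id, res in instance_residuals.items():
        n_moving = int(np.sum(np.asarray(res) > threshold))
        labels[inst_id] = "moving" if n_moving > phi else "static"
    return labels
```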
To illustrate the accuracy of the invention in detecting moving objects, its detection flow and effect on the same road environment data are shown in FIGS. 4 to 6: the invention marks the moving objects in a complex road environment to the maximum extent, and also marks the stationary potential moving objects in the image, with marking accuracy that meets practical requirements.
The conventional moving object detection method performs no instance segmentation and adopts no specific threshold for judging the motion state of feature points during image processing; its detection results are shown in FIGS. 7 and 8, where the boxed areas represent detected moving objects. The figures reveal the shortcomings of the prior art: (1) because of the many interference factors in road environment images, the prior art misdetects roadside telegraph poles, walls, leaves, and the like as moving targets, producing a large number of false detections; (2) the prior art cannot mark some stationary potential moving objects. Moreover, because the prior art does not partition the image during processing, its detection efficiency is low.
The above is merely illustrative of a preferred embodiment and is not limiting; in practicing the present invention, appropriate substitutions and/or modifications may be made according to the needs of the user.
The number of devices and the scale of processing described herein are intended to simplify the description of the present invention. Applications, modifications, and variations of the present invention will be readily apparent to those skilled in the art.
Although embodiments of the invention have been disclosed above, they are not limited to the uses listed in the specification and embodiments; the invention can be applied in various fields to which it is suited, and additional modifications will readily occur to those skilled in the art. The invention is therefore not limited to the specific details and illustrations shown and described herein, without departing from the general concepts defined by the claims and their equivalents.

Claims (4)

1. A moving object detection method based on a combination of traditional methods and deep learning, characterized by comprising the following steps: step one, detecting adjacent frames of road images acquired by a binocular camera with an instance segmentation algorithm, so as to divide each image into potential moving object regions and a static region;
step two, extracting and matching feature points in the potential moving object regions and the static region of each image, respectively;
step three, performing motion compensation based on the camera self-motion parameters, and judging the motion state by calculating the re-projection error, so as to mark the moving objects in the image based on the judgment result;
in step one, the binocular camera is configured as a binocular camera mounted on a vehicle;
further, in step one, the instance segmentation algorithm SOLOv2 is adopted to mark the background pixel value in the road environment image as 0 and to mark the pixel values of the other potential moving objects in sequence from 1, 2, ..., so that the different potential moving objects in each image acquired by the binocular camera are rendered as mask images carrying different label information, dividing the road environment image into potential moving object regions and a static region;
in step three, the method for judging the motion state is configured as follows:
the length l of the feature-point pair represented by the re-projection residual on a potential moving object region is compared with the feature-point motion-state judgment threshold T; when l > T, the feature point is marked in color to indicate that it is a moving point;
each potential moving object is traversed by its label, the number of moving points falling in each potential moving object region is counted, and a threshold Φ on the number of moving points is set; if the number of moving points in a potential moving object region is greater than Φ, that region is marked red to represent a moving object; otherwise, the region is marked green to represent a stationary object;
wherein the threshold T is configured as T = λμ, where μ denotes the mean re-projection residual length of the static region and λ denotes a number greater than 1.
2. The moving object detection method based on a combination of traditional methods and deep learning according to claim 1, wherein in step two the feature point extraction for the potential moving object regions and the static region is configured as follows:
the feature points of the static region are obtained with the ORB feature point extraction method, together with a feature point homogenization extraction strategy, to support accurate camera self-motion parameters;
the feature points of the potential moving object regions are obtained with the Shi-Tomasi feature point extraction method.
3. The moving object detection method based on a combination of traditional methods and deep learning according to claim 1, wherein in step three the motion compensation is configured to include:
based on the feature points extracted and matched in the static region, calculating the camera self-motion parameters between every two consecutive frames with the PnP method;
and performing motion compensation on the earlier frame of each pair through the camera self-motion parameters, so that the image pair becomes equivalent to the case of a stationary camera.
4. The moving object detection method based on a combination of traditional methods and deep learning according to claim 1, wherein in step three the re-projection error is obtained by projecting the feature points of the current frame onto the previous frame, yielding a re-projection residual image for the adjacent frames.
CN202111176760.4A 2021-10-09 2021-10-09 Moving target detection method based on combination of traditional and deep learning Active CN113920163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111176760.4A CN113920163B (en) 2021-10-09 2021-10-09 Moving target detection method based on combination of traditional and deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111176760.4A CN113920163B (en) 2021-10-09 2021-10-09 Moving target detection method based on combination of traditional and deep learning

Publications (2)

Publication Number Publication Date
CN113920163A CN113920163A (en) 2022-01-11
CN113920163B 2024-06-11

Family

ID=79238705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111176760.4A Active CN113920163B (en) 2021-10-09 2021-10-09 Moving target detection method based on combination of traditional and deep learning

Country Status (1)

Country Link
CN (1) CN113920163B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797688A (en) * 2020-06-02 2020-10-20 武汉大学 Visual SLAM method based on optical flow and semantic segmentation
CN112115889A (en) * 2020-09-23 2020-12-22 成都信息工程大学 Intelligent vehicle moving target detection method based on vision
CN113012197A (en) * 2021-03-19 2021-06-22 华南理工大学 Binocular vision odometer positioning method suitable for dynamic traffic scene

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110096933B (en) * 2018-01-30 2023-07-18 华为技术有限公司 Target detection method, device and system
US10430950B2 (en) * 2018-03-01 2019-10-01 Honda Motor Co., Ltd. Systems and methods for performing instance segmentation

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111797688A (en) * 2020-06-02 2020-10-20 武汉大学 Visual SLAM method based on optical flow and semantic segmentation
CN112115889A (en) * 2020-09-23 2020-12-22 成都信息工程大学 Intelligent vehicle moving target detection method based on vision
CN113012197A (en) * 2021-03-19 2021-06-22 华南理工大学 Binocular vision odometer positioning method suitable for dynamic traffic scene

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Jianying Yuan et al., "Independent Moving Object Detection Based on a Vehicle Mounted Binocular Camera," IEEE Sensors Journal, 2020-09-21, full text *

Also Published As

Publication number Publication date
CN113920163A (en) 2022-01-11


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant