CN112699854A - Method and device for identifying stopped vehicle - Google Patents

Method and device for identifying stopped vehicle

Info

Publication number
CN112699854A
Authority
CN
China
Prior art keywords
vehicle
video frame
position area
vehicle position
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110300499.8A
Other languages
Chinese (zh)
Other versions
CN112699854B (en)
Inventor
钟文坤
韩磊
杜威
杜虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liangfengtai Shanghai Information Technology Co ltd
Original Assignee
Liangfengtai Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liangfengtai Shanghai Information Technology Co ltd filed Critical Liangfengtai Shanghai Information Technology Co ltd
Priority to CN202110300499.8A (granted as CN112699854B)
Publication of CN112699854A
Application granted
Publication of CN112699854B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Image Analysis (AREA)

Abstract

The method comprises: an unmanned aerial vehicle shoots a road surface video while cruising and transmits it to a computing device; after receiving the road surface video, the computing device obtains a plurality of video frames corresponding to the video, detects and tracks the vehicles in the video frames, then calculates a transformation matrix between adjacent video frames and from it the displacement of each detected vehicle in the actual scene, and finally judges whether a vehicle has stopped traveling in the actual scene by counting the accumulated displacement of each vehicle over a period of time, thereby achieving convenient, efficient and flexible identification of road surface parking.

Description

Method and device for identifying stopped vehicle
Technical Field
The present application relates to the field of communications, and more particularly, to a technique for identifying a stopped vehicle.
Background
With the rapid development of science and technology, unmanned aerial vehicle (UAV) technology has gradually entered people's field of view, and UAVs are being rapidly adopted across many industries thanks to their unique advantages. As expressways become increasingly digitized and intelligent, UAVs have more and more application prospects in the expressway field. A UAV flies high and far; in particular, when an expressway is congested because of holidays, accidents or bad weather, the high-definition camera carried on the UAV can immediately transmit pictures to a monitoring platform in real time, so that the highway traffic police can deploy forces according to the aerial road conditions. Illegal parking is the most common traffic violation on expressways, and it easily causes rear-end, rollover, scraping and even pile-up accidents, which are a main cause of serious expressway traffic accidents. At present, supervising the running of vehicles on expressways still requires a large investment of manpower and material resources. Illegal parking, in particular, is mainly found through manual road patrols and fixed-camera monitoring: manual patrols require a large amount of manpower, while fixed cameras can only cover the locations where they are erected, so the monitoring range is small and flexibility is poor. Therefore, how to identify parking behavior on a road section efficiently, flexibly and accurately has become a problem that urgently needs to be solved for expressway intelligentization.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for identifying a stopped vehicle.
According to an aspect of the present application, there is provided a method of identifying a stopped vehicle, the method including:
acquiring road surface video information shot by an unmanned aerial vehicle, and acquiring a plurality of video frames corresponding to the road surface video information;
obtaining each detected vehicle and a vehicle position area in the subsequent video frame thereof by performing vehicle detection on at least one video frame in the plurality of video frames and performing vehicle tracking on the subsequent video frame corresponding to the at least one video frame;
extracting feature points on the video frames, matching the feature points on two adjacent video frames to obtain feature point matching pairs, and calculating a transformation matrix between the two adjacent video frames according to the feature point matching pairs;
for each vehicle in the video frame, if the vehicle is also present in the adjacent video frame of the video frame, obtaining the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the adjacent video frame, and the transformation matrix between the video frame and the adjacent video frame;
and for each detected vehicle, obtaining the actual movement distance of the vehicle in a preset time period before each video frame according to a plurality of actual movement vectors corresponding to the vehicle, and identifying the vehicle as a vehicle stopping running if the preset number of actual movement distances obtained by the vehicle in the continuous preset number of video frames are all smaller than or equal to a preset distance threshold value.
According to one aspect of the present application, there is provided a computing device that identifies a stopped vehicle, the device including:
a first module, configured to acquire road surface video information shot by the unmanned aerial vehicle and acquire a plurality of video frames corresponding to the road surface video information;
a second module, configured to perform vehicle detection on at least one video frame of the multiple video frames and perform vehicle tracking on a subsequent video frame corresponding to the at least one video frame, so as to obtain each detected vehicle and a vehicle position area in the subsequent video frame;
a third module, configured to extract feature points on the video frames, match the feature points on two adjacent video frames to obtain feature point matching pairs, and calculate a transformation matrix between the two adjacent video frames according to the feature point matching pairs;
a fourth module, configured to, for each vehicle in a video frame, if the vehicle is also present in an adjacent video frame of the video frame, obtain an actual motion vector of the vehicle in the video frame and the adjacent video frame relative to an actual scene according to a vehicle position area of the vehicle in the video frame, a vehicle position area of the vehicle in the adjacent video frame, and a transformation matrix between the video frame and the adjacent video frame;
and a fifth module, configured to, for each detected vehicle, obtain, according to a plurality of actual motion vectors corresponding to the vehicle, an actual motion distance of the vehicle in a predetermined time period before each video frame, and identify the vehicle as a stopped vehicle if all of the predetermined number of actual motion distances obtained by the vehicle in consecutive predetermined number of video frames are less than or equal to a preset distance threshold.
According to an aspect of the present application, there is provided an apparatus for identifying a stopped vehicle, wherein the apparatus includes:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
acquiring road surface video information shot by an unmanned aerial vehicle, and acquiring a plurality of video frames corresponding to the road surface video information;
obtaining each detected vehicle and a vehicle position area in the subsequent video frame thereof by performing vehicle detection on at least one video frame in the plurality of video frames and performing vehicle tracking on the subsequent video frame corresponding to the at least one video frame;
extracting feature points on the video frames, matching the feature points on two adjacent video frames to obtain feature point matching pairs, and calculating a transformation matrix between the two adjacent video frames according to the feature point matching pairs;
for each vehicle in the video frame, if the vehicle is also present in the adjacent video frame of the video frame, obtaining the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the adjacent video frame, and the transformation matrix between the video frame and the adjacent video frame;
and for each detected vehicle, obtaining the actual movement distance of the vehicle in a preset time period before each video frame according to a plurality of actual movement vectors corresponding to the vehicle, and identifying the vehicle as a vehicle stopping running if the preset number of actual movement distances obtained by the vehicle in the continuous preset number of video frames are all smaller than or equal to a preset distance threshold value.
According to one aspect of the application, there is provided a computer-readable medium storing instructions that, when executed, cause a system to:
acquiring road surface video information shot by an unmanned aerial vehicle, and acquiring a plurality of video frames corresponding to the road surface video information;
obtaining each detected vehicle and a vehicle position area in the subsequent video frame thereof by performing vehicle detection on at least one video frame in the plurality of video frames and performing vehicle tracking on the subsequent video frame corresponding to the at least one video frame;
extracting feature points on the video frames, matching the feature points on two adjacent video frames to obtain feature point matching pairs, and calculating a transformation matrix between the two adjacent video frames according to the feature point matching pairs;
for each vehicle in the video frame, if the vehicle is also present in the adjacent video frame of the video frame, obtaining the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the adjacent video frame, and the transformation matrix between the video frame and the adjacent video frame;
and for each detected vehicle, obtaining the actual movement distance of the vehicle in a preset time period before each video frame according to a plurality of actual movement vectors corresponding to the vehicle, and identifying the vehicle as a vehicle stopping running if the preset number of actual movement distances obtained by the vehicle in the continuous preset number of video frames are all smaller than or equal to a preset distance threshold value.
According to another aspect of the application, there is provided a computer program product comprising a computer program which, when executed by a processor, performs the method of:
acquiring road surface video information shot by an unmanned aerial vehicle, and acquiring a plurality of video frames corresponding to the road surface video information;
obtaining each detected vehicle and a vehicle position area in the subsequent video frame thereof by performing vehicle detection on at least one video frame in the plurality of video frames and performing vehicle tracking on the subsequent video frame corresponding to the at least one video frame;
extracting feature points on the video frames, matching the feature points on two adjacent video frames to obtain feature point matching pairs, and calculating a transformation matrix between the two adjacent video frames according to the feature point matching pairs;
for each vehicle in the video frame, if the vehicle is also present in the adjacent video frame of the video frame, obtaining the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the adjacent video frame, and the transformation matrix between the video frame and the adjacent video frame;
and for each detected vehicle, obtaining the actual movement distance of the vehicle in a preset time period before each video frame according to a plurality of actual movement vectors corresponding to the vehicle, and identifying the vehicle as a vehicle stopping running if the preset number of actual movement distances obtained by the vehicle in the continuous preset number of video frames are all smaller than or equal to a preset distance threshold value.
In the present application, the unmanned aerial vehicle shoots a road surface video while cruising and transmits it to the computing device; the computing device receives the road surface video, obtains a plurality of video frames corresponding to it, detects and tracks the vehicles in the video frames, then calculates a transformation matrix between adjacent video frames and from it the displacement of each detected vehicle in the actual scene, and finally judges whether a vehicle has stopped traveling in the actual scene by counting the accumulated displacement of each vehicle over a period of time. Compared with the prior art, this approach does not require monitoring equipment to be deployed on site in advance, is convenient to use, and has low deployment and management costs; it can monitor stopping behavior on an expressway over a large range, and the unmanned aerial vehicle can keep moving in flight without having to hover in the air to hold a fixed monitoring picture, so that more convenient, efficient and flexible identification of road surface parking can be realized. The present application realizes parking identification based on visual analysis of the video alone, using only the video of the road surface shot by the camera mounted on the unmanned aerial vehicle, without relying on information from other sensors (such as GPS, angle sensors or altitude sensors), so the parking identification function is realized with minimal hardware dependence.
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
Drawings
FIG. 1 illustrates a flow chart of a method of identifying a stopped vehicle according to one embodiment of the present application;
FIG. 2 illustrates a block diagram of a computing device identifying a stopped vehicle according to one embodiment of the present application;
FIG. 3 illustrates an exemplary system that can be used to implement the various embodiments described herein;
FIG. 4 illustrates a schematic view of a reduced vehicle location area in accordance with one embodiment of the present application;
FIG. 5 shows a schematic diagram of a video frame according to an embodiment of the present application;
FIG. 6 illustrates a schematic diagram of a mask image, according to one embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., Central Processing Units (CPUs)), input/output interfaces, network interfaces, and memory.
The Memory may include forms of volatile Memory, Random Access Memory (RAM), and/or non-volatile Memory in a computer-readable medium, such as Read Only Memory (ROM) or Flash Memory. Memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, Phase-Change Memory (PCM), Programmable Random Access Memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory or other memory technology, Compact Disc Read-Only Memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
The device referred to in this application includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user, such as a smart phone or a tablet computer, and the mobile electronic product may employ any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or web servers based on Cloud Computing, a kind of distributed computing in which one virtual supercomputer consists of a collection of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device with the network device, the touch terminal, or the network device with the touch terminal through a network.
Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In the description of the present application, "a plurality" means two or more unless specifically limited otherwise.
Fig. 1 shows a flowchart of a method for identifying a stopped traveling vehicle according to an embodiment of the present application, the method including step S11, step S12, step S13, step S14, and step S15. In step S11, the computing device obtains road video information captured by the unmanned aerial vehicle, and obtains a plurality of video frames corresponding to the road video information; in step S12, the computing device obtains each detected vehicle and its vehicle position area in the subsequent video frame by performing vehicle detection on at least one video frame of the plurality of video frames and performing vehicle tracking on the subsequent video frame corresponding to the at least one video frame; in step S13, the computing device extracts feature points on the video frames, matches the feature points on two adjacent video frames to obtain a feature point matching pair, and computes a transformation matrix between the two adjacent video frames according to the feature point matching pair; in step S14, for each vehicle in the video frame, if the vehicle is also present in an adjacent video frame of the video frame, the computing device obtains an actual motion vector of the vehicle in the video frame and the adjacent video frame relative to the actual scene according to a vehicle position area of the vehicle in the video frame, a vehicle position area of the vehicle in the adjacent video frame, and a transformation matrix between the video frame and the adjacent video frame; in step S15, the computing device obtains, for each detected vehicle, an actual movement distance of the vehicle in a predetermined time period before each video frame according to a plurality of actual movement vectors corresponding to the vehicle, and identifies the vehicle as a stopped vehicle if a predetermined number of actual movement distances obtained by the vehicle in a predetermined number of consecutive video frames are all less than or equal to a preset distance threshold.
In step S11, the computing device obtains road surface video information captured by the unmanned aerial vehicle, and obtains a plurality of video frames corresponding to the road surface video information. In some embodiments, the unmanned aerial vehicle can cruise automatically along a preset route, or it can be flown manually; it may cruise at a constant or varying speed and does not need to hover during cruising. In some embodiments, the computing device includes, but is not limited to, a mobile device, a PC, a cloud server, or another device with computing capability; the road surface video shot during cruising is sent to the remote control end of the unmanned aerial vehicle through the unmanned aerial vehicle's own wireless transmission module, and is then sent to the computing device through a network transmission module. In some embodiments, if the drone itself or the remote control end has computing capability, the computing device may also be a computing module mounted on the drone or a computing module of the remote control end. In some embodiments, after receiving the road surface video, the computing device decodes the video to obtain a plurality of video frames corresponding to the road surface video.
In step S12, the computing device obtains each detected vehicle and its vehicle location area in the subsequent video frames by performing vehicle detection on at least one video frame of the plurality of video frames and performing vehicle tracking on the subsequent video frames corresponding to the at least one video frame. In some embodiments, performing vehicle detection on at least one of the plurality of video frames may mean performing vehicle detection on each of the plurality of video frames, or performing vehicle detection on a specific video frame and each of its subsequent video frames, or performing vehicle detection on the current video frame at intervals of a predetermined number of frames; for example, if the predetermined number of frames is 10, vehicle detection is performed on the 1st frame, the 11th frame, the 21st frame, and so on, as sketched below. In some embodiments, the object of vehicle detection is to find the positions of all vehicles in the image, and the output is the coordinates in the image of each vehicle's bounding rectangle, i.e. the vehicle position area. In some embodiments, any object detection algorithm may be used for vehicle detection, without limitation, for example ACF (Aggregate Channel Features), DPM (Deformable Parts Model), PyramidBox (a context-assisted single shot face detector), SSD (Single Shot MultiBox Detector), MTCNN (Multi-task Cascaded Convolutional Networks), YOLOv2 or YOLOv3 (the second and third versions of the You Only Look Once series of object detection algorithms), and so on. Preferably, the YOLOv4-tiny object detection algorithm (a simplified variant of the fourth version of the You Only Look Once series) can be used for vehicle detection; the algorithm is fast, has good robustness and can run on mobile devices. In some embodiments, after a vehicle and its vehicle position area are detected in a certain video frame, the vehicle needs to be tracked in the subsequent video frames corresponding to that video frame, that is, a unique ID is given to the same vehicle across the sequence of video frame images; vehicle tracking predicts the vehicle position area in the next video frame from the vehicle position area of the vehicle in the current video frame. In some embodiments, any target tracking algorithm may be used for vehicle tracking, such as KCF (Kernelized Correlation Filter), TLD (Tracking-Learning-Detection) or MOSSE (Minimum Output Sum of Squared Error), without limitation. In some embodiments, for a certain video frame, if vehicle detection is not performed on that frame, the position area of the vehicle in that frame predicted by the vehicle tracking algorithm may be used as the vehicle position area in that frame.
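By way of illustration only (this code is not part of the disclosure; detect_vehicles and update_tracks are hypothetical placeholders for whatever detector and tracker are actually chosen), the detection cadence described above could be organized as follows in Python:

    # Illustrative sketch: run the detector every DETECT_INTERVAL frames and
    # rely on the tracker's predictions on the frames in between.
    DETECT_INTERVAL = 10  # e.g. detect on frames 1, 11, 21, ...

    def process_frames(frames, detect_vehicles, update_tracks):
        tracks = {}                                   # track_id -> last known bounding box
        for idx, frame in enumerate(frames):
            if idx % DETECT_INTERVAL == 0:
                detections = detect_vehicles(frame)   # list of (x, y, w, h) boxes
            else:
                detections = None                     # no detection: tracker prediction only
            tracks = update_tracks(tracks, frame, detections)
            yield idx, tracks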
In step S13, the computing device extracts feature points on the video frames, matches the feature points on two adjacent video frames to obtain feature point matching pairs, and computes a transformation matrix between the two adjacent video frames according to the feature point matching pairs. In some embodiments, since the drone is flying in cruise (it may hover, or it may be moving in flight all the time), a transformation matrix between two adjacent video frames needs to be computed in order to know the relative motion of the drone with respect to the actual scene. In some embodiments, the video frames from which feature points are extracted are each of the plurality of video frames corresponding to the road surface video information; in other embodiments, they are the video frame on which the first vehicle detection is performed and each subsequent video frame. In some embodiments, a plurality of feature points are extracted from the video frame, and any feature extraction algorithm may be used, without limitation, such as SURF (Speeded-Up Robust Features), SIFT (Scale-Invariant Feature Transform), Harris (Harris corner detection) or FAST (Features from Accelerated Segment Test); preferably, the ORB (Oriented FAST and Rotated BRIEF) algorithm may be used to extract feature points quickly, since the ORB algorithm is fast, has good invariance to translation, rotation, scale, brightness change, occlusion, noise and the like, and also maintains a certain degree of stability under viewpoint change and affine transformation. In some embodiments, after the feature points are extracted from the video frames, the feature points on two adjacent video frames are matched, that is, one of the video frames is taken as a reference image, and for each feature point of the reference image the closest feature point among the feature points of the other video frame is found; any feature matching algorithm may be used for feature point matching, without limitation, and preferably a Kd-tree (k-dimensional tree) algorithm is used for fast matching of the feature points. In some embodiments, after feature point matching, a plurality of feature point matching pairs are obtained, and from them a transformation matrix describing the correspondence between the matching pairs can be calculated: letting (p, p') be any one of the feature point matching pairs, the matching pair and the transformation matrix H satisfy

    p' = H · p.

In some embodiments, the transformation matrix may be a perspective transformation matrix, or it may be an affine transformation matrix. In some embodiments, the transformation matrix may be calculated by the RANSAC (Random Sample Consensus) algorithm or one of its derivatives, or by the least squares method. With the least squares method the calculation is as follows: given n feature point matching pairs (p_i, p'_i), i = 1, ..., n, each satisfying p'_i = H · p_i, least squares seeks the transformation matrix H that minimizes the sum of squared residuals, i.e. solves

    H* = argmin_H  Σ_{i=1..n} || p'_i - H · p_i ||^2 .

Writing P and P' for the matrices whose columns are the homogeneous coordinates of the points p_i and p'_i respectively, the matrix solution according to the least squares method yields

    H = P' P^T (P P^T)^(-1).
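By way of illustration only, the following Python/OpenCV sketch (not part of the disclosure) shows one way such a transformation matrix could be estimated between two adjacent frames; ORB and RANSAC are the options named above, while the brute-force Hamming matcher stands in for the Kd-tree matching the text prefers:

    import cv2
    import numpy as np

    def frame_transform(prev_gray, curr_gray):
        """Estimate the homography mapping points in prev_gray to curr_gray."""
        orb = cv2.ORB_create(nfeatures=1000)
        kp1, des1 = orb.detectAndCompute(prev_gray, None)
        kp2, des2 = orb.detectAndCompute(curr_gray, None)

        # Brute-force Hamming matching; a kd-tree/FLANN index could be used instead.
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

        src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)

        # RANSAC rejects matching pairs that do not fit the dominant motion.
        H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
        return H, inlier_mask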
in step S14, for each vehicle in the video frame, if the vehicle is also present in the neighboring video frame of the video frame, the computing device obtains the actual motion vector of the vehicle in the video frame and the neighboring video frame relative to the actual scene according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the neighboring video frame, and the transformation matrix between the video frame and the neighboring video frame. In some embodiments, if the vehicle is also present in an adjacent video frame (a previous video frame or a next video frame) of the video frame, the following description will take the adjacent video frame as the previous video frame as an example, that is, for the same vehicle (i.e., ID) in two adjacent video frames formed by the video frame and the previous video frame, a theoretical vehicle position region of the vehicle in the video frame can be obtained according to a vehicle position region of the vehicle in the previous video frame and a transformation matrix between the two adjacent video frames, that is, if the vehicle is stationary, a position of the vehicle in the video frame is different from a position of the vehicle in the previous video frame when the vehicle is stationary because the drone is moving. According to the theoretical vehicle position area of the vehicle in the video frame and the vehicle position area of the vehicle in the video frame, the actual motion vector of the vehicle in the video frame relative to the previous video frame can be obtained, that is, the actual motion vector of the vehicle in two adjacent video frames formed by the video frame and the previous video frame relative to the actual scene, if the vehicle is stopped in the actual scene, the calculated actual motion vector tends to 0.
In step S15, for each detected vehicle, the computing device obtains the actual movement distance of the vehicle in a predetermined time period before each video frame according to the plurality of actual motion vectors corresponding to the vehicle, and identifies the vehicle as a stopped vehicle if the actual movement distances obtained for the vehicle in a predetermined number of consecutive video frames are all less than or equal to a preset distance threshold. In some embodiments, for each detected vehicle, after the actual motion vectors of the vehicle in adjacent frames are obtained, a predetermined time period t is set, for example 4 to 6 seconds; the actual motion vectors corresponding to the vehicle within this time period are collected, their modulus values are calculated and summed, and the result is the actual movement distance of the vehicle within the predetermined time period. In some embodiments, for the same vehicle, if the time of the current video frame is t1, the actual movement distance S of the current video frame over the period (t1 - t, t1) is calculated from the actual motion vectors in that period, and if the S values of N consecutive frames (N >= 1) are all less than or equal to the preset distance threshold, the vehicle is identified as a stopped vehicle. For example, when N = 1, the judgment depends only on whether the S value for the period t before the current frame is less than or equal to the preset distance threshold; when N = 2, the judgment depends on whether the S value for the period t before the current frame and the S value for the period t before the next frame are both less than or equal to the preset distance threshold. In some embodiments, the computing device may output an alert upon identifying a vehicle that has stopped traveling. In some embodiments, outputting the alert may consist in the computing device sending a warning to the police/command center, informing it that there is a stopped vehicle, so that the police/command center can take the next action (e.g. dispatching officers to the scene, checking the reason for parking, and preventing a traffic accident). In some embodiments, the license plate number of the parked vehicle may be recognized from the road surface video and communicated to the police/command center. In some embodiments, the position of the parked vehicle may be derived from the road surface video together with the position information and shooting angle information of the unmanned aerial vehicle, and reported to the police/command center.
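For illustration only, the following sketch accumulates motion-vector moduli over a sliding time window and applies the consecutive-frame rule; the numeric values of WINDOW, DIST_THRESH and N_CONSECUTIVE are placeholders, not values fixed by the application:

    from collections import deque
    import numpy as np

    class StopDetector:
        """Flags a track as stopped when the distance travelled within the last
        WINDOW seconds stays below DIST_THRESH for N_CONSECUTIVE frames in a row."""
        WINDOW = 5.0          # predetermined time period t (e.g. 4-6 s)
        DIST_THRESH = 3.0     # preset distance threshold (units depend on calibration)
        N_CONSECUTIVE = 3     # predetermined number of consecutive frames

        def __init__(self):
            self.history = deque()   # (timestamp, modulus of the actual motion vector)
            self.hits = 0

        def update(self, timestamp, motion_vector):
            self.history.append((timestamp, float(np.linalg.norm(motion_vector))))
            while self.history and self.history[0][0] < timestamp - self.WINDOW:
                self.history.popleft()
            distance = sum(m for _, m in self.history)          # actual movement distance S
            self.hits = self.hits + 1 if distance <= self.DIST_THRESH else 0
            return self.hits >= self.N_CONSECUTIVE              # True -> stopped vehicle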
In this way, the unmanned aerial vehicle shoots a road surface video while cruising and transmits it to the computing device; after receiving the road surface video, the computing device obtains a plurality of video frames corresponding to it, detects and tracks the vehicles in the video frames, then calculates a transformation matrix between adjacent video frames and from it the displacement of each detected vehicle in the actual scene, and finally judges whether a vehicle has stopped traveling in the actual scene by counting the accumulated displacement of each vehicle over a period of time. This does not require monitoring equipment to be deployed on site in advance, is convenient to use, has low deployment and management costs, can monitor stopping behavior on an expressway over a large range, and allows the unmanned aerial vehicle to keep moving in flight without hovering in the air to hold a fixed monitoring picture, so that more convenient, efficient and flexible identification of road surface parking can be realized. The present application realizes parking identification based on visual analysis of the video alone, using only the video of the road surface shot by the camera mounted on the unmanned aerial vehicle, without relying on information from other sensors (such as GPS, angle sensors or altitude sensors), so the parking identification function is realized with minimal hardware dependence.
In some embodiments, the obtaining a plurality of video frames corresponding to the road surface video information includes: obtaining a plurality of original video frames corresponding to the road surface video information; and sampling the plurality of original video frames according to a preset frame number interval to obtain a plurality of video frames corresponding to the road surface video information. In some embodiments, after receiving the road surface video, decoding the video to obtain a plurality of original video frames corresponding to the road surface video, if the video transmission frame rate is relatively high, the amplitude of vehicle movement on adjacent frame images is generally not too large, in order to obtain obvious displacement of adjacent frame vehicles and save calculation resources, the plurality of original video frames may be sampled according to a predetermined frame number interval to obtain a plurality of video frames corresponding to the road surface video information, that is, one frame may be extracted every N frames (after extracting the current video frame, the next frame extracted is the nth frame after the current video frame, N > = 1), and preferably, it is required to obtain 7 to 14 frames after sampling every second.
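A minimal sketch of such sampling, assuming the decoded frames are available as a Python list and that roughly 7 to 14 sampled frames per second are desired:

    def sampling_interval(stream_fps, target_fps=10):
        """Choose the frame interval N so that roughly target_fps frames remain per
        second (the text suggests 7-14 sampled frames per second)."""
        return max(1, round(stream_fps / target_fps))

    def sample_frames(original_frames, stream_fps, target_fps=10):
        step = sampling_interval(stream_fps, target_fps)
        return original_frames[::step]   # keep one frame every `step` frames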
In some embodiments, the vehicle detection and vehicle tracking of at least one video frame of the plurality of video frames comprises: taking a designated video frame in the plurality of video frames as a current video frame, performing vehicle detection on the current video frame, obtaining each vehicle in the current video frame and its vehicle position area in the current video frame, and performing operation S for each vehicle in the current video frame, wherein operation S includes: dividing the vehicle position area corresponding to the vehicle to obtain a plurality of image blocks, obtaining a sampling point corresponding to each image block in the plurality of image blocks, obtaining displacement information of each sampling point in the next video frame of the current video frame through an optical flow algorithm, determining the displacement information of the vehicle in the next video frame relative to the current video frame according to the displacement information of each sampling point in the next video frame, and obtaining a predicted vehicle position area of the vehicle in the next video frame according to the displacement information of the vehicle and the vehicle position area of the vehicle in the current video frame; performing vehicle detection on the next video frame of the current video frame, obtaining each vehicle in the next video frame and its vehicle position area in the next video frame, matching one or more vehicle position areas in the next video frame with one or more predicted vehicle position areas in the next video frame, obtaining a vehicle tracking result of the next video frame, and executing operation S for each vehicle in the next video frame, wherein if the vehicle position area corresponding to a first vehicle detected in the next video frame matches the predicted vehicle position area corresponding to a second vehicle detected in the current video frame, it is determined that the first vehicle and the second vehicle are the same vehicle; and taking the next video frame as the new current video frame, repeatedly executing operation S, and so on for the subsequent video frames. In some embodiments, starting from a specified video frame among the plurality of video frames, vehicle detection is performed on the specified video frame and on each of its subsequent video frames, where the specified video frame may be the first frame of the plurality of video frames, or any one frame specified among them. The specified video frame is taken as the current video frame, vehicle detection is performed on the current video frame, and each vehicle in the current video frame and its vehicle position region in the current video frame, i.e. the vehicle's bounding rectangle, are obtained. Then, for each vehicle, the vehicle position region corresponding to the vehicle is divided into a plurality of image blocks, which may be uniform blocks or non-uniform blocks, e.g. uniformly divided into 3 image blocks. Then, for each image block, a sampling point corresponding to the image block is obtained; the sampling point may be the central pixel point of the image block, or it may be a strong-texture pixel in the image block. In some embodiments, one sampling point is retained for each image block. The displacement of each sampling point in the next video frame of the current video frame, relative to the current video frame, is then calculated through an optical flow algorithm (e.g., L-K optical flow). Then, for each vehicle, the displacements of the sampling points respectively corresponding to the plurality of image blocks divided from the vehicle position region of that vehicle are obtained, the average of these displacements is taken as the displacement information of the vehicle in the next video frame relative to the current video frame, and the predicted vehicle position region of the vehicle in the next video frame can be predicted from this displacement information and the vehicle position region of the vehicle in the current video frame (see the optical-flow sketch following this discussion). In some embodiments, vehicle detection is also performed on the next video frame of the current video frame, each vehicle in the next video frame and its vehicle position area in the next video frame are obtained, and the one or more vehicle position areas in the next video frame are matched pairwise with the one or more predicted vehicle position areas in the next video frame to obtain the vehicle tracking result of the next video frame. If the vehicle position area of a first vehicle detected in the next video frame matches the predicted vehicle position area corresponding to a second vehicle detected in the current video frame, the first vehicle and the second vehicle are the same vehicle and are given the same ID; that is, the second vehicle of the current video frame has been detected in the next video frame, and the tracking result of the next video frame includes the detection result of the second vehicle in the next video frame, namely the vehicle position area of the first vehicle in the next video frame is used as the vehicle position area of the second vehicle in the next video frame. If no vehicle position area in the next video frame matches the predicted vehicle position area corresponding to the second vehicle detected in the current video frame, that is, the second vehicle is not detected in the next video frame, the tracking result of the next video frame may directly consider that the second vehicle has disappeared, so that the second vehicle is no longer tracked; or it may include the prediction result of the second vehicle in the next video frame, that is, the predicted vehicle position area of the second vehicle in the next video frame is taken as the vehicle position area of the second vehicle in the next video frame, and the second vehicle continues to be tracked over a predetermined number of video frames; if the second vehicle is still not detected in that predetermined number of video frames, the second vehicle is then considered to have disappeared.
And if the predicted vehicle position area matched with the vehicle position area corresponding to the first vehicle detected in the next video frame does not exist in the next video frame, determining that the first vehicle is a new vehicle, and giving the first vehicle a new ID, wherein the tracking result of the next video frame comprises the vehicle position area of the first vehicle in the next video frame. In some embodiments, the predicted vehicle position area of each vehicle in the next video frame of the next video frame is predicted according to the tracking result of the next video frame, then the next video frame is used as a new current video frame, the steps are repeatedly executed on the next video frame of the new current video frame, and the like for the subsequent video frames. In some embodiments, a vehicle location region may be determined to match a predicted vehicle location region if the area of the overlap region between the vehicle location region and the predicted vehicle location region is greater than or equal to a predetermined area threshold, or the vehicle location region may be determined to match the predicted vehicle location region if the ratio of the area of the overlap region between the vehicle location region and the predicted vehicle location region compared to the area of the union of the two location regions is greater than or equal to a predetermined ratio threshold.
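For illustration only, the following Python/OpenCV sketch (a 3x3 grid and block-centre sampling points are assumptions; the embodiments also allow strong-texture sampling points, see below) shows one possible realization of the optical-flow-based position prediction described above:

    import cv2
    import numpy as np

    def predict_box(prev_gray, curr_gray, box, grid=(3, 3)):
        """Predict where the vehicle box (x, y, w, h) moves in the next frame by
        tracking one sample point per image block with pyramidal L-K optical flow."""
        x, y, w, h = box
        pts = []
        for i in range(grid[0]):
            for j in range(grid[1]):
                pts.append([x + (j + 0.5) * w / grid[1], y + (i + 0.5) * h / grid[0]])
        p0 = np.float32(pts).reshape(-1, 1, 2)

        p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, curr_gray, p0, None)
        good = status.ravel() == 1
        if not good.any():
            return box                                   # no reliable flow: keep the old box
        shift = (p1[good] - p0[good]).reshape(-1, 2).mean(axis=0)
        return (x + shift[0], y + shift[1], w, h)        # predicted vehicle position area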
In some embodiments, before dividing the vehicle position area corresponding to the vehicle into the plurality of image blocks, the method further includes: reducing the vehicle position area corresponding to the vehicle, so that the image information of the current video frame within the vehicle position area corresponding to the vehicle only comprises the vehicle. In some embodiments, vehicle detection is performed on the current video frame, and each vehicle in the current video frame and its initial vehicle position region, i.e. the vehicle bounding rectangle, are obtained; the initial vehicle position region is then reduced by a certain proportion (for example, to 70% of the original, as illustrated in FIG. 4) to ensure that the image information within the reduced vehicle position region of the current video frame contains only the vehicle and nothing else (for example, pedestrians, road surface, and the like). In some embodiments, the reduction scale may also be determined based on empirical values.
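A minimal sketch of such a reduction, assuming boxes are given as (x, y, w, h) and a 70% retention factor:

    def shrink_box(box, keep=0.7):
        """Shrink a detected box (x, y, w, h) about its centre so the retained
        region contains only the vehicle; keep=0.7 mirrors the 70% example."""
        x, y, w, h = box
        new_w, new_h = w * keep, h * keep
        return (x + (w - new_w) / 2, y + (h - new_h) / 2, new_w, new_h)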
In some embodiments, the obtaining the sampling points corresponding to each of the plurality of image blocks includes: for each image block in the plurality of image blocks, calculating a gradient value corresponding to each pixel point in the image block, and taking the pixel point with the maximum gradient value in the image block as the sampling point of the image block. In some embodiments, for each image block, the gradients of each pixel point in the X direction and the Y direction are calculated; assuming the gradient value of a pixel point in the X direction is dx and its gradient value in the Y direction is dy, the gradient value of the pixel point is (dx)^2 + (dy)^2. The pixel point with the maximum gradient value in the image block is then found by calculation and used as the sampling point of the image block, so that a strong-texture pixel point in the image block is extracted quickly.
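For illustration, the following sketch computes per-pixel gradients with a Sobel operator (the choice of operator is an assumption; the text only requires dx and dy) and returns the strongest-texture pixel of a block:

    import cv2
    import numpy as np

    def strongest_texture_point(gray_block):
        """Return the (row, col) of the pixel with the largest dx^2 + dy^2
        inside one image block, used as the block's sampling point."""
        dx = cv2.Sobel(gray_block, cv2.CV_32F, 1, 0, ksize=3)
        dy = cv2.Sobel(gray_block, cv2.CV_32F, 0, 1, ksize=3)
        magnitude = dx * dx + dy * dy
        return np.unravel_index(np.argmax(magnitude), magnitude.shape)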
In some embodiments, said determining the displacement information of the vehicle in the next video frame relative to the current video frame according to the displacement information of each sampling point in the next video frame comprises: and sequencing each sampling point according to the displacement information of the sampling point in the next video frame, eliminating the sampling points sequenced at the first preset percentage and the sampling points sequenced at the second preset percentage, and taking the average value of the displacement information of the rest sampling points as the displacement information of the vehicle in the next video frame relative to the current video frame. In some embodiments, since the optical flow algorithm may have a large prediction error, the plurality of sampling points corresponding to each vehicle are sorted according to the displacement size thereof, the largest and smallest predetermined percentages (e.g., 10% -20%) of the sampling points are removed, and then the average of the displacements of the remaining sampling points is taken as the displacement of the vehicle in the next video frame relative to the current video frame.
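A possible sketch of this trimmed averaging, with the trim fraction (15% per side) chosen within the 10%-20% range mentioned above:

    import numpy as np

    def trimmed_mean_shift(shifts, trim=0.15):
        """shifts: (n, 2) per-sample-point displacements. Drop the points with the
        largest and smallest displacement magnitudes (trim fraction each side)
        and average the rest to get the vehicle's displacement."""
        shifts = np.asarray(shifts, dtype=np.float32)
        order = np.argsort(np.linalg.norm(shifts, axis=1))
        k = int(len(order) * trim)
        kept = order[k:len(order) - k] if len(order) > 2 * k else order
        return shifts[kept].mean(axis=0)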
In some embodiments, the performing vehicle detection on a next video frame of the current video frame, obtaining each vehicle in the next video frame and its vehicle position area in the next video frame, matching one or more vehicle position areas in the next video frame with one or more predicted vehicle position areas in the next video frame, and obtaining the vehicle tracking result of the next video frame includes: calculating the IOU between each of the one or more vehicle position areas in the next video frame and each of the one or more predicted vehicle position areas in the next video frame, and if the IOU between the vehicle position area corresponding to the first vehicle detected in the next video frame and the predicted vehicle position area corresponding to the second vehicle detected in the current video frame is greater than or equal to a predetermined threshold value, determining that the first vehicle and the second vehicle are the same vehicle. In some embodiments, each vehicle position area in the next video frame is matched pairwise with each predicted vehicle position area, and the IOU (Intersection over Union) between a vehicle position area a and a predicted vehicle position area b is calculated as

    IOU(a, b) = area(a ∩ b) / area(a ∪ b),

where area(a) and area(b) are the areas of the vehicle position area a and the predicted vehicle position area b, area(a ∩ b) is the area of the overlapping region of the two position areas, and area(a ∪ b) is the area of the union region of the two position areas, i.e. the sum of the areas of the two position areas minus the area of their overlapping region. In some embodiments, if the IOU between the vehicle position area corresponding to the first vehicle detected in the next video frame and the predicted vehicle position area corresponding to the second vehicle detected in the current video frame is greater than or equal to a predetermined threshold (e.g., 0.5), it may be determined that the first vehicle and the second vehicle are the same vehicle, and the first vehicle and the second vehicle are given the same ID.
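For illustration, an IOU computation over boxes in (x, y, w, h) format might look as follows:

    def iou(box_a, box_b):
        """Intersection over union of two axis-aligned boxes given as (x, y, w, h)."""
        ax1, ay1, ax2, ay2 = box_a[0], box_a[1], box_a[0] + box_a[2], box_a[1] + box_a[3]
        bx1, by1, bx2, by2 = box_b[0], box_b[1], box_b[0] + box_b[2], box_b[1] + box_b[3]
        iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        ih = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = iw * ih
        union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
        return inter / union if union > 0 else 0.0

    # Matching rule from the text: same vehicle if iou(detected, predicted) >= 0.5.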
In some embodiments, determining that the first vehicle and the second vehicle are the same vehicle if the IOU between the vehicle location area corresponding to the first vehicle detected in the next video frame and the predicted vehicle location area corresponding to the second vehicle detected in the current video frame is greater than or equal to a predetermined threshold includes: and if the IOU between the vehicle position area corresponding to the first vehicle detected in the next video frame and the predicted vehicle position areas corresponding to the second vehicles detected in the current video frame is larger than or equal to a preset threshold value, determining that the target second vehicle with the largest corresponding IOU in the second vehicles is the same as the first vehicle. In some embodiments, if the vehicle position area corresponding to the first vehicle detected in the next video frame and the IOU between the plurality of predicted vehicle position areas corresponding to the plurality of second vehicles detected in the current video frame in the next video frame are both greater than or equal to the predetermined threshold, it is determined that the target second vehicle with the largest corresponding IOU among the plurality of second vehicles is the same vehicle as the first vehicle.
In some embodiments, the method further comprises at least one of: if the predicted vehicle position area matched with the vehicle position area corresponding to the third vehicle detected in the next video frame does not exist in the next video frame, determining the third vehicle as a new vehicle; if the vehicle position area matched with the predicted vehicle position area corresponding to the fourth vehicle detected in the current video frame does not exist in the next video frame, determining that the fourth vehicle is a disappeared vehicle, and stopping vehicle tracking on the fourth vehicle in the subsequent video frame; and if the vehicle position area matched with the predicted vehicle position area corresponding to the fifth vehicle detected in the current video frame does not exist in the next video frames of the continuous preset number, determining that the fifth vehicle is a disappeared vehicle, and stopping vehicle tracking on the fifth vehicle in the subsequent video frames. In some embodiments, if there is no predicted vehicle location area in the next video frame that matches the vehicle location area corresponding to the third vehicle detected in the next video frame, the third vehicle is determined to be a new vehicle, and a new ID is assigned thereto. In some embodiments, if there is no vehicle location area in the next video frame that matches the predicted vehicle location area corresponding to the fourth vehicle detected in the current video frame, the fourth vehicle is determined to be a disappearing vehicle, and the fourth vehicle is no longer tracked in the subsequent video frame corresponding to the next video frame. In some embodiments, if there is no vehicle position area matching the predicted vehicle position area corresponding to the fifth vehicle detected in the current video frame in a predetermined number of consecutive next video frames, the fifth vehicle is determined to be a disappearing vehicle, and the fifth vehicle is no longer continuously tracked in subsequent video frames corresponding to the plurality of next video frames.
In some embodiments, the feature points extracted from a video frame do not include feature points located in the vehicle position areas of that video frame. In a practical scene, a large number of moving vehicles may be present; if feature points were extracted directly from two adjacent frames and the transformation matrix between them were then calculated, some feature points would be likely to fall on the moving vehicles, because vehicle textures in the images are relatively rich, and this would introduce a large error into the calculated transformation matrix, which is intended to describe only the camera motion relative to the static scene. In some embodiments, after extracting the plurality of feature points from the video frame, the feature points located in any vehicle position area may be eliminated from the plurality of feature points.
In some embodiments, the extracting feature points on the video frame includes: generating a mask image corresponding to a video frame according to a vehicle position area in the video frame, wherein a first gray value corresponding to the vehicle position area in the mask image is different from second gray values corresponding to other areas; and extracting feature points from the video frame region corresponding to the region of the second gray value of the mask image corresponding to the video frame. In some embodiments, a mask image is generated according to the vehicle position areas in a video frame: the vehicle position areas are set to a first gray value in the mask image and the other areas are set to a second gray value, the first gray value being different from the second gray value. Feature points are then extracted only from the video frame regions corresponding to the second-gray-value regions of the mask image, so that no feature points located inside a vehicle position area are extracted. As shown in fig. 5, a certain video frame includes a plurality of vehicle position areas (i.e., vehicle bounding rectangles); the mask image corresponding to the video frame is generated according to the plurality of vehicle position areas, as shown in fig. 6, where the vehicle position areas are set to a first gray value of 0 and the other areas are set to a second gray value of 255.
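The mask-based extraction can be sketched with OpenCV as follows. ORB is used here only because the description elsewhere names it as a preferred detector, and the 0/255 gray values follow the example above; this is a hedged illustration, not the patented implementation itself.

```python
import cv2
import numpy as np

def extract_features_outside_vehicles(frame_gray, vehicle_boxes):
    """Build a mask (vehicle areas = 0, elsewhere = 255) and extract ORB features
    only from regions whose mask value is non-zero."""
    mask = np.full(frame_gray.shape, 255, dtype=np.uint8)   # second gray value
    for (x1, y1, x2, y2) in vehicle_boxes:                   # vehicle position areas
        mask[y1:y2, x1:x2] = 0                               # first gray value
    orb = cv2.ORB_create(nfeatures=1000)
    keypoints, descriptors = orb.detectAndCompute(frame_gray, mask)
    return keypoints, descriptors
```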
In some embodiments, the calculating a transformation matrix between the two adjacent video frames according to the feature point matching pairs includes: randomly selecting a preset number of feature point matching pairs from the feature point matching pairs to obtain a key matrix for describing the preset number of feature point matching pairs, and obtaining a matching inner point group corresponding to the key matrix from the plurality of feature point matching pairs; randomly selecting the preset number of feature point matching pairs from the feature point matching pairs again, and iteratively executing the operation until a preset iteration end condition is met; and determining a key matrix of which the corresponding matched inner point group meets a preset number condition from a plurality of currently obtained key matrices as a transformation matrix between the two adjacent video frames, wherein the reprojection error of the matched inner point group corresponding to the key matrix relative to the key matrix is less than or equal to a preset error threshold. In some embodiments, the key matrix is used to describe the correspondence between the randomly selected predetermined number of matching pairs of feature points, for example, let (p,
p') be any one feature point matching pair among the selected predetermined number of feature point matching pairs; then this feature point matching pair and the key matrix H satisfy p' = H · p. In some embodiments, the key matrix H may be a 3 × 3 perspective transformation matrix:

$$H = \begin{bmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{bmatrix}$$
Expanding p' = H · p yields the following formulas:

$$x' = \frac{h_{11}x + h_{12}y + h_{13}}{h_{31}x + h_{32}y + 1}, \qquad y' = \frac{h_{21}x + h_{22}y + h_{23}}{h_{31}x + h_{32}y + 1}$$
where p' = [x' y' 1]^T and p = [x y 1]^T. Since the 8 unknowns require 8 equations to solve, at least 4 feature point matching pairs need to be selected from the plurality of feature point matching pairs; each feature point matching pair provides two equations, giving 8 equations in total. The key matrix H can then be obtained by solving with the least squares method, and the matching interior point group corresponding to the key matrix H is obtained from the plurality of feature point matching pairs. In some embodiments, for each of the plurality of feature point matching pairs, the reprojection error with respect to the key matrix H is as follows:
$$e_i = \left\| p_i' - H\, p_i \right\|^2$$

where p_i and p_i' are the two feature points of the i-th feature point matching pair in the two adjacent video frames, respectively.
If the reprojection error of a feature point matching pair is less than or equal to a first preset error threshold, that feature point matching pair is regarded as belonging to the matching interior point group corresponding to the key matrix; the interior point check is performed on all feature point matching pairs, and the feature point matching pairs that pass the check are determined as the matching interior point group corresponding to the key matrix H. In some embodiments, a predetermined number of feature point matching pairs are randomly selected from the plurality of feature point matching pairs, a key matrix describing the correspondence between the randomly selected feature point matching pairs is obtained, and the matching interior point group corresponding to that key matrix is obtained from all the feature point matching pairs; then the predetermined number of feature point matching pairs are randomly selected again from all the feature point matching pairs, and the above operation is performed iteratively until a predetermined iteration end condition is satisfied; finally, from the plurality of key matrices currently obtained, a key matrix whose corresponding matching interior point group satisfies the predetermined number condition is determined as the transformation matrix between the two adjacent video frames. In some embodiments, the predetermined number condition may be that the key matrix with the largest matching interior point group is used as the transformation matrix. In some embodiments, the key matrix H may also be a 3 × 3 affine transformation matrix; in this case only at least 3 feature point matching pairs need to be selected from the plurality of feature point matching pairs, each feature point matching pair providing two equations, 6 equations in total, and the key matrix H can again be obtained by solving with the least squares method. In some embodiments, for the transformation matrix between two adjacent video frames and the matching interior point group corresponding to that transformation matrix obtained previously, the total reprojection error is as follows:
$$E = \sum_{i} \left\| p_i' - H\, p_i \right\|^2$$

where the sum runs over the feature point matching pairs in the matching interior point group corresponding to the transformation matrix.
The transformation matrix is then optimized through a nonlinear optimization algorithm so that the total reprojection error corresponding to the optimized transformation matrix reaches a minimum value, and the optimized transformation matrix is used for subsequent calculation. In some embodiments, the nonlinear optimization methods include, but are not limited to, the LM (Levenberg-Marquardt) algorithm, the Gauss-Newton algorithm, the gradient descent algorithm, and the like. In some embodiments, the predetermined iteration end condition is that the ratio of the number of feature point matching pairs in the matching interior point group corresponding to the key matrix obtained in one iteration to the total number of feature point matching pairs is greater than or equal to a predetermined ratio threshold; in this case, the determining a key matrix, of which the corresponding matching interior point group satisfies a predetermined number condition, from among the plurality of currently obtained key matrices as a transformation matrix between the two adjacent video frames includes: determining the key matrix obtained in that iteration as the transformation matrix between the two adjacent video frames. In some embodiments, if the ratio of the number of feature point matching pairs in the matching interior point group corresponding to the key matrix obtained in one iteration to the total number of feature point matching pairs is greater than or equal to the predetermined ratio threshold, the iteration is stopped, and the key matrix obtained in that iteration is determined as the transformation matrix corresponding to the adjacent frames. In some embodiments, the predetermined iteration end condition is a predetermined number of iterations; in this case, the determining a key matrix, of which the corresponding matching interior point group satisfies a predetermined number condition, from among the plurality of currently obtained key matrices as a transformation matrix between the two adjacent video frames includes: determining the key matrix with the largest matching interior point group from the plurality of currently obtained key matrices as the transformation matrix between the two adjacent video frames. In some embodiments, after a predetermined number of iterations (e.g., 10) are completed, the iterations are stopped, and the key matrix with the largest matching interior point group among the plurality of currently obtained key matrices is determined as the transformation matrix between the two adjacent video frames.
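The iterative selection, interior-point check and refinement described above follow the general RANSAC pattern. A compact sketch using OpenCV's built-in RANSAC homography estimation is given below; cv2.findHomography already performs the random sampling, inlier counting and refinement internally, so it stands in for the per-iteration logic, and the 3-pixel reprojection threshold is an illustrative assumption.

```python
import cv2
import numpy as np

def estimate_transform(pts_prev, pts_next, reproj_thr=3.0):
    """pts_prev, pts_next: matched feature point coordinates (N x 2 arrays).
    Returns a 3x3 perspective transformation matrix H and the inlier mask."""
    src = np.asarray(pts_prev, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(pts_next, dtype=np.float32).reshape(-1, 1, 2)
    # RANSAC: repeatedly sample 4 pairs, fit H, keep the H with the most inliers,
    # then refine it on the inlier set (handled internally by OpenCV).
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_thr)
    return H, inlier_mask
```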
In some embodiments, the obtaining the actual motion vector of the vehicle in the video frame and the adjacent video frame relative to the actual scene according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the adjacent video frame, and the transformation matrix between the video frame and the adjacent video frame comprises: obtaining an initial vehicle position area of the vehicle in the adjacent video frame according to the vehicle position area of the vehicle in the video frame and the transformation matrix between the video frame and the adjacent video frame; and obtaining the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the initial vehicle position area of the vehicle in the adjacent video frame and the vehicle position area in the adjacent video frame. In some embodiments, for the same vehicle (i.e. the same ID) in the two adjacent video frames formed by each video frame and an adjacent video frame (the previous video frame or the next video frame), taking the adjacent video frame as the previous video frame of the video frame as an example, assume that the center coordinates of the vehicle position area of the vehicle in the previous video frame are (x0, y0) and the center coordinates of the vehicle position area in the video frame are (x1, y1). The center coordinates (x0', y0') of the initial vehicle position area of the vehicle in the video frame are calculated according to the previously obtained transformation matrix H between the two adjacent video frames formed by the video frame and the previous video frame:

$$x_0' = \frac{h_{11}x_0 + h_{12}y_0 + h_{13}}{h_{31}x_0 + h_{32}y_0 + 1}, \qquad y_0' = \frac{h_{21}x_0 + h_{22}y_0 + h_{23}}{h_{31}x_0 + h_{32}y_0 + 1}$$
Then, according to the initial vehicle position area and the vehicle position area of the vehicle in the video frame, the actual motion vector of the vehicle in the video frame relative to the previous video frame can be obtained, that is, the actual motion vector of the vehicle in the video frame and the previous video frame relative to the actual scene:
$$\vec{v} = (x_1 - x_0',\; y_1 - y_0')$$
The actual motion vector calculated above tends to 0 if the vehicle is stopped in the actual scene.
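A small sketch of this step is given below, assuming the 3 × 3 matrix H maps points of the previous frame into the current frame as above; the helper name is illustrative.

```python
import numpy as np

def actual_motion_vector(H, center_prev, center_curr):
    """center_prev = (x0, y0) in the previous frame, center_curr = (x1, y1) in the
    current frame. Apply H to (x0, y0) to get the position a stationary vehicle
    would have in the current frame, then subtract it from the observed position."""
    x0, y0 = center_prev
    x1, y1 = center_curr
    denom = H[2, 0] * x0 + H[2, 1] * y0 + H[2, 2]
    x0p = (H[0, 0] * x0 + H[0, 1] * y0 + H[0, 2]) / denom
    y0p = (H[1, 0] * x0 + H[1, 1] * y0 + H[1, 2]) / denom
    return np.array([x1 - x0p, y1 - y0p])   # tends to (0, 0) for a stopped vehicle
```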
In some embodiments, for each detected vehicle, obtaining an actual movement distance of the vehicle in a predetermined time period before each video frame according to a plurality of actual movement vectors corresponding to the vehicle includes: and for each detected vehicle, sequencing a plurality of actual motion vectors corresponding to the vehicle in a preset time period before each video frame according to corresponding module values, eliminating the actual motion vectors sequenced in the first third preset percentage and the actual motion vectors sequenced in the last fourth preset percentage, and summing module values of the remaining actual motion vectors to obtain the actual motion distance of the vehicle in the preset time period. In some embodiments, a predetermined time period is set, such as 4-6 seconds (not limited herein), for each detected vehicle, a plurality of actual motion vectors corresponding to a plurality of adjacent frames of the vehicle within the predetermined time period before each video frame are obtained, and a modulus value of each actual motion vector is calculated:
$$\|\vec{v}\| = \sqrt{(x_1 - x_0')^2 + (y_1 - y_0')^2}$$
The actual motion vectors are sorted according to the magnitude of their modulus values. Because the preceding vehicle detection, vehicle tracking and transformation matrix calculation may each introduce errors, and the accumulation of these errors could strongly influence the final identification judgment, statistical filtering is performed on the actual motion vectors within the predetermined time period: for example, a threshold is set, such as 15% (no limitation is made here), the modulus values in the top 15% and the bottom 15% after sorting are removed, and the remaining modulus values are summed to obtain the actual movement distance of the vehicle in the predetermined time period before each video frame.
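The statistical filtering just described can be sketched as follows; the 15% trim ratio is only the example value used above.

```python
import numpy as np

def actual_movement_distance(motion_vectors, trim=0.15):
    """Sum the modulus values of the motion vectors in the time window after
    discarding the largest and smallest `trim` fractions."""
    if len(motion_vectors) == 0:
        return 0.0
    norms = np.sort(np.linalg.norm(np.asarray(motion_vectors, dtype=float), axis=1))
    k = int(len(norms) * trim)
    kept = norms[k:len(norms) - k] if len(norms) > 2 * k else norms
    return float(kept.sum())
```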
In some embodiments, the method further comprises: the computing device obtains the preset distance threshold according to the image resolution of the road surface video information and a preset sensitivity parameter. In some embodiments, assuming that the image resolution of the road surface video is (w, h), that is, the width of the image is w and the height is h, the preset distance threshold a is set as follows:

$$a = \frac{w + h}{b}$$

where b is a predetermined sensitivity parameter; preferably, b generally lies in the range 10 < b < 40, with smaller b being more sensitive, i.e. a stop is easier to identify, but false positives are also easier to cause.
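Combining this threshold formula with the consecutive-frame check used in the identification step, a hedged end-to-end decision sketch (with assumed parameter values) might look like this:

```python
def distance_threshold(width, height, b=20):
    """a = (w + h) / b; b is the sensitivity parameter, typically 10 < b < 40."""
    return (width + height) / b

def is_stopped(recent_distances, width, height, n_frames=5, b=20):
    """recent_distances: actual movement distances S computed for the latest frames.
    The vehicle is identified as stopped if the last n_frames values are all
    less than or equal to the preset distance threshold."""
    a = distance_threshold(width, height, b)
    tail = recent_distances[-n_frames:]
    return len(tail) == n_frames and all(s <= a for s in tail)
```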
Fig. 2 shows a block diagram of a computing device for identifying a stopped vehicle according to an embodiment of the present application, which includes module 11, module 12, module 13, module 14, and module 15. Module 11 is used for acquiring road surface video information shot by the unmanned aerial vehicle and acquiring a plurality of video frames corresponding to the road surface video information. Module 12 is configured to perform vehicle detection on at least one video frame of the multiple video frames and perform vehicle tracking on subsequent video frames corresponding to the at least one video frame, so as to obtain each detected vehicle and its vehicle position area in the subsequent video frames. Module 13 is configured to extract feature points on the video frames, match the feature points on two adjacent video frames to obtain feature point matching pairs, and calculate a transformation matrix between the two adjacent video frames according to the feature point matching pairs. Module 14 is configured to, for each vehicle in a video frame, if the vehicle is also present in an adjacent video frame of the video frame, obtain the actual motion vector of the vehicle in the video frame and the adjacent video frame relative to the actual scene according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the adjacent video frame, and the transformation matrix between the video frame and the adjacent video frame. Module 15 is configured to, for each detected vehicle, obtain, according to a plurality of actual motion vectors corresponding to the vehicle, the actual movement distance of the vehicle in a predetermined time period before each video frame, and identify the vehicle as a stopped vehicle if the actual movement distances obtained by the vehicle in a predetermined number of consecutive video frames are all smaller than or equal to a preset distance threshold.
Module 11 is used for acquiring the road surface video information shot by the unmanned aerial vehicle and acquiring a plurality of video frames corresponding to the road surface video information. In some embodiments, the unmanned aerial vehicle can cruise automatically according to a preset route, or can be manually controlled to cruise; it can cruise at a constant or variable speed and does not need to hover during the cruise. In some embodiments, the computing device includes, but is not limited to, a mobile device, a PC, a cloud server, and other devices with computing capability; the road surface video shot during cruising is sent to the remote control end of the unmanned aerial vehicle through the wireless transmission module of the unmanned aerial vehicle itself, and the road surface video is then sent to the computing device through a network transmission module. In some embodiments, if the drone itself or the remote control end has computing capability, the computing device may also be a computing module mounted on the drone or a computing module of the remote control end. In some embodiments, after receiving the road surface video, the computing device decodes the video to obtain a plurality of video frames corresponding to the road surface video.
Module 12 is configured to perform vehicle detection on at least one video frame of the multiple video frames and perform vehicle tracking on subsequent video frames corresponding to the at least one video frame, so as to obtain each detected vehicle and its vehicle position area in the subsequent video frames. In some embodiments, performing vehicle detection on at least one of the plurality of video frames may mean performing vehicle detection on each of the plurality of video frames, or performing vehicle detection on a specific video frame and each subsequent video frame, or performing vehicle detection on the current video frame at intervals of a predetermined number of frames; for example, if the predetermined number of frames is 10, vehicle detection is performed on the 1st frame, the 11th frame, the 21st frame, and so on. In some embodiments, the object of vehicle detection is to find the corresponding positions of all vehicles in the image, and the output is the coordinates of the vehicle bounding rectangles, i.e., the vehicle position areas, in the image. In some embodiments, any object detection algorithm may be used for vehicle detection, without limitation, for example ACF (Aggregate Channel Features), DPM (Deformable Part Model), PyramidBox (a context-assisted single shot face detector), SSD (Single Shot MultiBox Detector), MTCNN (Multi-task Cascaded Convolutional Networks), YOLOv2 (the second version of the You Only Look Once series of object detection algorithms), YOLOv3 (the third version of the You Only Look Once series), and so on. Preferably, the yolov4-tiny object detection algorithm (a simplified version of the fourth version of the You Only Look Once series) can be used for vehicle detection; this algorithm is fast, has good robustness, and can run on a mobile device. In some embodiments, after a vehicle and its vehicle position area are detected in a certain video frame, the vehicle needs to be tracked in the subsequent video frames corresponding to that video frame, that is, a unique ID is given to the same vehicle in the video frame image sequence; vehicle tracking predicts the vehicle position area of the vehicle in the next video frame from its vehicle position area in the current video frame. In some embodiments, any target tracking algorithm may be used for vehicle tracking, such as KCF (Kernelized Correlation Filter), TLD (Tracking-Learning-Detection), MOSSE (Minimum Output Sum of Squared Error), etc., without limitation. In some embodiments, for a certain video frame on which vehicle detection is not performed, the position area of the vehicle in that video frame predicted by the vehicle tracking algorithm may be used as the vehicle position area in that video frame.
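The interval-based detection plus per-frame tracking described for module 12 can be sketched as follows. The detector and tracker are placeholders (any of the algorithms listed above could be plugged in), `detect_vehicles` and `predict_boxes` are hypothetical helpers rather than functions defined by the patent, and the IOU matcher reuses the sketch shown earlier.

```python
def run_detection_and_tracking(frames, detect_vehicles, predict_boxes, detect_every=10):
    """frames: iterable of video frames.
    detect_vehicles(frame) -> list of bounding boxes (placeholder for e.g. yolov4-tiny).
    predict_boxes(prev_frame, frame, tracks) -> {vehicle_id: predicted box}
    (placeholder for the optical-flow / KCF-style tracker)."""
    tracks = {}          # vehicle_id -> current bounding box
    next_id = 0
    prev_frame = None
    results = []
    for idx, frame in enumerate(frames):
        if idx % detect_every == 0:
            detections = detect_vehicles(frame)
            predicted = predict_boxes(prev_frame, frame, tracks) if tracks else {}
            matches = match_detections(detections, predicted)   # IOU matching, see above
            new_tracks = {}
            for i, box in enumerate(detections):
                vid = matches.get(i)
                if vid is None:                                  # unmatched: new vehicle
                    vid, next_id = next_id, next_id + 1
                new_tracks[vid] = box
            tracks = new_tracks                                  # unmatched old IDs disappear
        else:
            tracks = predict_boxes(prev_frame, frame, tracks)    # tracked positions only
        results.append(dict(tracks))
        prev_frame = frame
    return results
```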
Module 13 is configured to extract feature points on the video frames, match the feature points on two adjacent video frames to obtain feature point matching pairs, and calculate a transformation matrix between the two adjacent video frames according to the feature point matching pairs. In some embodiments, since the drone is flying in cruise (it may hover, or it may be moving throughout the flight), the transformation matrix between two adjacent video frames needs to be computed in order to know the relative motion of the drone with respect to the actual scene. In some embodiments, the video frames from which feature points are extracted are each of the plurality of video frames corresponding to the road surface video information; in other embodiments, the video frames from which feature points are extracted are the video frame on which the first vehicle detection is performed and each subsequent video frame. In some embodiments, a plurality of feature points are extracted from the video frame, and any feature extraction algorithm may be used, without limitation, such as SURF (Speeded-Up Robust Features), SIFT (Scale-Invariant Feature Transform), Harris (Harris corner detection), or FAST (Features from Accelerated Segment Test); preferably, the ORB (Oriented FAST and Rotated BRIEF) algorithm may be used to extract the feature points quickly. The ORB algorithm is fast, has good invariance to translation, rotation, scale, brightness change, occlusion, noise, and the like, and also maintains a certain degree of stability under viewpoint change and affine transformation. In some embodiments, after the feature points are extracted from the video frames, the feature points on two adjacent video frames are matched, that is, one of the video frames is taken as a reference image, and for each feature point of the reference image the closest feature point is found among the feature points of the other video frame; any feature matching algorithm may be used, without limitation, and preferably a Kd-tree (k-dimensional tree) algorithm is used for fast matching of the feature points. In some embodiments, after feature point matching, a plurality of feature point matching pairs are obtained, and from these a transformation matrix describing the correspondence between the feature point matching pairs can be calculated; for example, let (p, p') be any one of the feature point matching pairs, then this matching pair and the transformation matrix H satisfy p' = H · p. In some embodiments, the transformation matrix may be a perspective transformation matrix, or it may be an affine transformation matrix. In some embodiments, the transformation matrix may be calculated by the RANSAC (Random Sample Consensus) algorithm or an algorithm derived from RANSAC, or it may be calculated by the least squares method, where a specific calculation using the least squares method is as follows: there are n feature point matching pairs, (p_i, p_i') is any one of the n feature point matching pairs, and H is the transformation matrix; we have p_i' = H · p_i, and the least squares method finds the H that minimizes the sum of squared residuals

$$\sum_{i=1}^{n} \left\| p_i' - H\, p_i \right\|^2,$$

namely solving

$$\hat{H} = \arg\min_{H} \sum_{i=1}^{n} \left\| p_i' - H\, p_i \right\|^2.$$

Writing the corresponding linear equations in the unknown entries of H as A h = b, the matrix solution according to the least squares method yields

$$h = (A^{T} A)^{-1} A^{T} b.$$
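For the least squares route just described, a minimal sketch that stacks the two equations contributed by each matching pair into a linear system and solves it is given below; it fixes h33 = 1 as in the 8-unknown formulation, and coordinate normalization for numerical conditioning is omitted for brevity.

```python
import numpy as np

def solve_homography_least_squares(pts_src, pts_dst):
    """pts_src, pts_dst: lists of matched (x, y) points, at least 4 pairs.
    Solves A h = b for the 8 unknown entries of H (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (xp, yp) in zip(pts_src, pts_dst):
        # x' * (h31*x + h32*y + 1) = h11*x + h12*y + h13, and similarly for y'
        A.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y]); b.append(xp)
        A.append([0, 0, 0, x, y, 1, -yp * x, -yp * y]); b.append(yp)
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```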
a fourth module 14, configured to, for each vehicle in the video frames, if the vehicle is also present in an adjacent video frame of the video frames, obtain actual motion vectors of the vehicle in the video frames and the adjacent video frames relative to an actual scene according to a vehicle position area of the vehicle in the video frames, a vehicle position area of the vehicle in the adjacent video frames, and a transformation matrix between the video frames and the adjacent video frames. In some embodiments, if the vehicle is also present in an adjacent video frame (a previous video frame or a next video frame) of the video frame, the following description will take the adjacent video frame as the previous video frame as an example, that is, for the same vehicle (i.e., ID) in two adjacent video frames formed by the video frame and the previous video frame, a theoretical vehicle position region of the vehicle in the video frame can be obtained according to a vehicle position region of the vehicle in the previous video frame and a transformation matrix between the two adjacent video frames, that is, if the vehicle is stationary, a position of the vehicle in the video frame is different from a position of the vehicle in the previous video frame when the vehicle is stationary because the drone is moving. According to the theoretical vehicle position area of the vehicle in the video frame and the vehicle position area of the vehicle in the video frame, the actual motion vector of the vehicle in the video frame relative to the previous video frame can be obtained, that is, the actual motion vector of the vehicle in two adjacent video frames formed by the video frame and the previous video frame relative to the actual scene, if the vehicle is stopped in the actual scene, the calculated actual motion vector tends to 0.
Module 15 is configured to, for each detected vehicle, obtain, according to a plurality of actual motion vectors corresponding to the vehicle, the actual movement distance of the vehicle in a predetermined time period before each video frame, and identify the vehicle as a stopped vehicle if the actual movement distances obtained by the vehicle in a predetermined number of consecutive video frames are all smaller than or equal to a preset distance threshold. In some embodiments, for each detected vehicle, after the actual motion vectors of the vehicle in adjacent frames are obtained, a predetermined time period t is set, for example 4 to 6 seconds; the plurality of actual motion vectors corresponding to the vehicle within the predetermined time period are obtained, their modulus values are calculated and then summed, and the actual movement distance of the vehicle in the predetermined time period is thereby obtained. In some embodiments, for the same vehicle, if the time of the current video frame is t1, the actual movement distance S of the current video frame over the (t1 − t, t1) time period is calculated from the actual motion vectors corresponding to that period, and if the S values of N consecutive frames (N ≥ 1) are all smaller than or equal to the preset distance threshold, the vehicle is identified as a stopped vehicle. For example, when N = 1, the judgment is made by whether the S value in the period of length t before the current frame is smaller than or equal to the preset distance threshold; when N = 2, the judgment is made by whether the S value in the period of length t before the current frame and the S value in the period of length t before the next frame are both smaller than or equal to the preset distance threshold. In some embodiments, the computing device may output a warning upon identifying a vehicle that has stopped traveling. In some embodiments, outputting the warning may be the computing device sending a warning to the police/command center, informing the police/command center that there is a stopped vehicle, so that the police/command center can take the next action (e.g., dispatching police to the scene to check the reason for stopping and prevent a traffic accident). In some embodiments, the license plate number of the stopped vehicle may be identified from the road surface video and communicated to the police/command center. In some embodiments, the position information of the stopped vehicle can be obtained from the road surface video according to the position information and shooting angle information of the unmanned aerial vehicle, and that position information can be communicated to the police/command center.
In this way, the unmanned aerial vehicle shoots the road surface video while cruising and transmits it to the computing device; after receiving the road surface video, the computing device obtains a plurality of video frames corresponding to it, detects and tracks the vehicles in the video frames, then calculates the transformation matrices between adjacent video frames, further calculates the displacement of each detected vehicle in the actual scene, and finally judges whether a vehicle has stopped traveling in the actual scene by accumulating the displacement of each vehicle over a period of time, thereby achieving efficient and flexible identification of vehicles stopped on the road. The present application realizes stop identification based on visual analysis of the video: only the video of the road surface shot by the camera mounted on the unmanned aerial vehicle is used, without relying on information from other sensors (such as GPS, angle sensors, or height sensors), so the stop identification function is realized with minimal hardware dependence.
In some embodiments, the obtaining a plurality of video frames corresponding to the road surface video information includes: obtaining a plurality of original video frames corresponding to the road surface video information; and sampling the plurality of original video frames according to a preset frame number interval to obtain a plurality of video frames corresponding to the road surface video information. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the vehicle detection and vehicle tracking of at least one video frame of the plurality of video frames comprises: taking a designated video frame in the plurality of video frames as a current video frame, performing vehicle detection on the current video frame, obtaining each vehicle in the current video frame and a vehicle position area thereof in the current video frame, and performing operation S for each vehicle in the current video frame, wherein operation S includes: dividing a vehicle position area corresponding to the vehicle to obtain a plurality of image blocks, obtaining sampling points corresponding to each image block in the plurality of image blocks, obtaining displacement information of each sampling point in a next video frame of the current video frame through an optical flow algorithm, determining the displacement information of the vehicle in the next video frame relative to the current video frame according to the displacement information of each sampling point in the next video frame, and obtaining a predicted vehicle position area of the vehicle in the next video frame according to the displacement information of the vehicle and the vehicle position area of the vehicle in the current video frame; performing vehicle detection on a next video frame of the current video frame, obtaining each vehicle in the next video frame and a vehicle position area in the next video frame, matching one or more vehicle position areas in the next video frame with one or more predicted vehicle position areas in the next video frame, obtaining a vehicle tracking result of the next video frame, and executing an operation S for each vehicle in the next video frame, wherein if the vehicle position area corresponding to a first vehicle detected in the next video frame is matched with the predicted vehicle position area corresponding to a second vehicle detected in the current video frame, it is determined that the first vehicle and the second vehicle are the same vehicle; and taking the next video frame as a new current video frame, repeatedly executing the step R, and repeating the steps for the subsequent video frames. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, before the dividing the vehicle position area corresponding to the vehicle into the plurality of image blocks, the method further includes: and scaling the vehicle position area corresponding to the vehicle, so that the image information of the current video frame in the vehicle position area corresponding to the vehicle only comprises the vehicle. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the obtaining the sampling points corresponding to each of the plurality of image blocks includes: and for each image block in the plurality of image blocks, calculating a gradient value corresponding to each pixel point in the image block, and taking the pixel point with the maximum corresponding gradient value in the image block as a sampling point of the image block. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, said determining the displacement information of the vehicle in the next video frame relative to the current video frame according to the displacement information of each sampling point in the next video frame comprises: and sequencing each sampling point according to the displacement information of the sampling point in the next video frame, eliminating the sampling points sequenced at the first preset percentage and the sampling points sequenced at the second preset percentage, and taking the average value of the displacement information of the rest sampling points as the displacement information of the vehicle in the next video frame relative to the current video frame. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
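The sampling-point displacement estimation summarized in the preceding paragraphs (block division, maximum-gradient sampling points, optical flow, and trimmed averaging) can be sketched as follows. The 4 × 4 grid and 10% trim are illustrative assumptions, and the pyramidal Lucas-Kanade optical flow in OpenCV stands in for whichever optical flow algorithm is actually used.

```python
import cv2
import numpy as np

def predict_box_displacement(prev_gray, next_gray, box, grid=4, trim=0.10):
    """box = (x1, y1, x2, y2) of the vehicle in the current frame.
    Returns the (dx, dy) displacement of the vehicle in the next frame."""
    x1, y1, x2, y2 = box
    pts = []
    bw, bh = (x2 - x1) // grid, (y2 - y1) // grid
    for gy in range(grid):
        for gx in range(grid):
            bx, by = x1 + gx * bw, y1 + gy * bh
            block = prev_gray[by:by + bh, bx:bx + bw]
            if block.size == 0:
                continue
            # gradient magnitude of the block; take the strongest pixel as sampling point
            gxm = cv2.Sobel(block, cv2.CV_32F, 1, 0)
            gym = cv2.Sobel(block, cv2.CV_32F, 0, 1)
            iy, ix = np.unravel_index(np.argmax(gxm ** 2 + gym ** 2), block.shape)
            pts.append([bx + ix, by + iy])
    p0 = np.asarray(pts, dtype=np.float32).reshape(-1, 1, 2)
    p1, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, p0, None)
    disp = (p1 - p0).reshape(-1, 2)[status.reshape(-1) == 1]
    # sort by displacement magnitude, drop the extremes, average the rest
    order = np.argsort(np.linalg.norm(disp, axis=1))
    k = int(len(order) * trim)
    kept = disp[order[k:len(order) - k]] if len(order) > 2 * k else disp
    return kept.mean(axis=0)
```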
In some embodiments, the performing vehicle detection on a next video frame of the current video frame, obtaining each vehicle in the next video frame and its vehicle position area in the next video frame, matching one or more vehicle position areas in the next video frame with one or more predicted vehicle position areas in the next video frame, and obtaining the vehicle tracking result of the next video frame includes: and calculating the IOU between each of the one or more vehicle position areas in the next video frame and each of the one or more predicted vehicle position areas in the next video frame, and if the IOU between the vehicle position area corresponding to the first vehicle detected in the next video frame and the predicted vehicle position area corresponding to the second vehicle detected in the current video frame is greater than or equal to a predetermined threshold value, determining that the first vehicle and the second vehicle are the same vehicle. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, determining that the first vehicle and the second vehicle are the same vehicle if the IOU between the vehicle location area corresponding to the first vehicle detected in the next video frame and the predicted vehicle location area corresponding to the second vehicle detected in the current video frame is greater than or equal to a predetermined threshold includes: and if the IOU between the vehicle position area corresponding to the first vehicle detected in the next video frame and the predicted vehicle position areas corresponding to the second vehicles detected in the current video frame is larger than or equal to a preset threshold value, determining that the target second vehicle with the largest corresponding IOU in the second vehicles is the same as the first vehicle. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further for at least one of: if the predicted vehicle position area matched with the vehicle position area corresponding to the third vehicle detected in the next video frame does not exist in the next video frame, determining the third vehicle as a new vehicle; if the vehicle position area matched with the predicted vehicle position area corresponding to the fourth vehicle detected in the current video frame does not exist in the next video frame, determining that the fourth vehicle is a disappeared vehicle, and stopping vehicle tracking on the fourth vehicle in the subsequent video frame; and if the vehicle position area matched with the predicted vehicle position area corresponding to the fifth vehicle detected in the current video frame does not exist in the next video frames of the continuous preset number, determining that the fifth vehicle is a disappeared vehicle, and stopping vehicle tracking on the fifth vehicle in the subsequent video frames. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, for a feature point extracted from a video frame, the feature point does not include a feature point on the video frame in the vehicle position area. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the extracting feature points on the video frame includes: generating a mask image corresponding to a video frame according to a vehicle position area in the video frame, wherein a first gray value corresponding to the vehicle position area in the mask image is different from second gray values corresponding to other areas; and extracting feature points from the video frame region corresponding to the region of the second gray value of the mask image corresponding to the video frame. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the calculating a transformation matrix between the two adjacent video frames according to the feature point matching pairs includes: randomly selecting a preset number of feature point matching pairs from the feature point matching pairs to obtain a key matrix for describing the preset number of feature point matching pairs, and obtaining a matching inner point group corresponding to the key matrix from the plurality of feature point matching pairs; randomly selecting the preset number of feature point matching pairs from the feature point matching pairs again, and iteratively executing the operation until a preset iteration end condition is met; and determining a key matrix of which the corresponding matched inner point group meets a preset number condition from a plurality of currently obtained key matrices as a transformation matrix between the two adjacent video frames, wherein the reprojection error of the matched inner point group corresponding to the key matrix relative to the key matrix is less than or equal to a preset error threshold. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the predetermined iteration end condition is that a ratio of the number of matched inner point groups corresponding to the key matrix obtained in one iteration to the number of matched pairs of the plurality of feature points is greater than or equal to a predetermined ratio threshold; wherein, the determining a key matrix, of which the corresponding matching interior point group satisfies a predetermined number of conditions, from among the plurality of currently obtained key matrices as a transformation matrix between the two adjacent video frames includes: and determining the key matrix obtained in the iteration as a transformation matrix between the two adjacent video frames. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the predetermined iteration end condition is a predetermined number of iterations; wherein, the determining a key matrix, of which the corresponding matching interior point group satisfies a predetermined number condition, from among the plurality of currently obtained key matrices as a transformation matrix between the two adjacent video frames includes: and determining the key matrix with the maximum number of corresponding matched inner point groups from a plurality of currently obtained key matrices as a transformation matrix between the two adjacent video frames. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, said obtaining the actual motion vector of the vehicle in the video frame and the neighboring video frame relative to the actual scene according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the neighboring video frame, the transformation matrix between the video frame and the neighboring video frame comprises: obtaining an initial vehicle position area of the vehicle in the adjacent video frames according to the vehicle position area of the vehicle in the video frames and the transformation matrix between the video frames and the adjacent video frames; and obtaining the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the initial vehicle position area of the vehicle in the adjacent video frame and the vehicle position area in the adjacent video. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, for each detected vehicle, obtaining an actual movement distance of the vehicle in a predetermined time period before each video frame according to a plurality of actual movement vectors corresponding to the vehicle includes: and for each detected vehicle, sequencing a plurality of actual motion vectors corresponding to the vehicle in a preset time period before each video frame according to corresponding module values, eliminating the actual motion vectors sequenced in the first third preset percentage and the actual motion vectors sequenced in the last fourth preset percentage, and summing module values of the remaining actual motion vectors to obtain the actual motion distance of the vehicle in the preset time period. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
In some embodiments, the apparatus is further configured to: and obtaining the preset distance threshold according to the image resolution of the road surface video information and a preset sensitivity parameter. Here, the related operations are the same as or similar to those of the embodiment shown in fig. 1, and therefore are not described again, and are included herein by reference.
FIG. 3 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
In some embodiments, as shown in FIG. 3, the system 300 can be implemented as any of the devices in the various embodiments described. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules to perform the actions described herein.
For one embodiment, system control module 310 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 305 and/or any suitable device or component in communication with system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
System memory 315 may be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, the system memory 315 may include a double data rate type four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 320 may be accessible over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) (e.g., memory controller module 330) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310 to form a system on a chip (SoC).
In various embodiments, system 300 may be, but is not limited to being: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
The present application also provides a computer readable storage medium having stored thereon computer code which, when executed, performs a method as in any one of the preceding.
The present application also provides a computer program product, which when executed by a computer device, performs the method of any of the preceding claims.
The present application further provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding claim.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
In addition, some of the present application may be implemented as a computer program product, such as computer program instructions, which when executed by a computer, may invoke or provide methods and/or techniques in accordance with the present application through the operation of the computer. Those skilled in the art will appreciate that the form in which the computer program instructions reside on a computer-readable medium includes, but is not limited to, source files, executable files, installation package files, and the like, and that the manner in which the computer program instructions are executed by a computer includes, but is not limited to: the computer directly executes the instruction, or the computer compiles the instruction and then executes the corresponding compiled program, or the computer reads and executes the instruction, or the computer reads and installs the instruction and then executes the corresponding installed program. Computer-readable media herein can be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media includes media by which communication signals, including, for example, computer readable instructions, data structures, program modules, or other data, are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules, or other data may be embodied in a modulated data signal, for example, in a wireless medium such as a carrier wave or similar mechanism such as is embodied as part of spread spectrum techniques. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); and non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); and magnetic and optical storage devices (hard disk, tape, CD, DVD); or other now known media or later developed that can store computer-readable information/data for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (18)

1. A method of identifying a stopped vehicle, wherein the method comprises:
acquiring pavement video information shot by an unmanned aerial vehicle, and acquiring a plurality of video frames corresponding to the pavement video information;
obtaining each detected vehicle and a vehicle position area in the subsequent video frame thereof by performing vehicle detection on at least one video frame in the plurality of video frames and performing vehicle tracking on the subsequent video frame corresponding to the at least one video frame;
extracting feature points on the video frames, matching the feature points on two adjacent video frames to obtain feature point matching pairs, and calculating a transformation matrix between the two adjacent video frames according to the feature point matching pairs;
for each vehicle in the video frame, if the vehicle is also present in the adjacent video frame of the video frame, obtaining the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the adjacent video frame, and the transformation matrix between the video frame and the adjacent video frame;
and for each detected vehicle, obtaining the actual movement distance of the vehicle in a preset time period before each video frame according to a plurality of actual movement vectors corresponding to the vehicle, and identifying the vehicle as a vehicle stopping running if the preset number of actual movement distances obtained by the vehicle in the continuous preset number of video frames are all smaller than or equal to a preset distance threshold value.
2. The method of claim 1, wherein the obtaining a plurality of video frames corresponding to the road surface video information comprises:
obtaining a plurality of original video frames corresponding to the road surface video information;
and sampling the plurality of original video frames according to a preset frame number interval to obtain a plurality of video frames corresponding to the road surface video information.
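A one-function sketch of the frame-sampling step in claim 2; frame_interval stands in for the preset frame number interval and its default value is an assumption.

def sample_frames(original_frames, frame_interval=5):
    """Keep every frame_interval-th original frame as a video frame to be processed further."""
    return original_frames[::frame_interval]
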
3. The method of claim 1, wherein the performing vehicle detection on at least one video frame of the plurality of video frames and performing vehicle tracking on the subsequent video frame corresponding to the at least one video frame comprises:
taking a designated video frame in the plurality of video frames as a current video frame, performing vehicle detection on the current video frame, obtaining each vehicle in the current video frame and a vehicle position area thereof in the current video frame, and performing operation S for each vehicle in the current video frame, wherein operation S includes: dividing a vehicle position area corresponding to the vehicle to obtain a plurality of image blocks, obtaining sampling points corresponding to each image block in the plurality of image blocks, obtaining displacement information of each sampling point in a next video frame of the current video frame through an optical flow algorithm, determining the displacement information of the vehicle in the next video frame relative to the current video frame according to the displacement information of each sampling point in the next video frame, and obtaining a predicted vehicle position area of the vehicle in the next video frame according to the displacement information of the vehicle and the vehicle position area of the vehicle in the current video frame;
performing vehicle detection on a next video frame of the current video frame, obtaining each vehicle in the next video frame and a vehicle position area in the next video frame, matching one or more vehicle position areas in the next video frame with one or more predicted vehicle position areas in the next video frame, obtaining a vehicle tracking result of the next video frame, and executing an operation S for each vehicle in the next video frame, wherein if the vehicle position area corresponding to a first vehicle detected in the next video frame is matched with the predicted vehicle position area corresponding to a second vehicle detected in the current video frame, it is determined that the first vehicle and the second vehicle are the same vehicle;
and taking the next video frame as a new current video frame, and repeatedly executing the preceding detection and matching step for the subsequent video frames.
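An illustrative Python/OpenCV sketch of the block-based tracking step (operation S) in claim 3: the vehicle position area is split into image blocks, one sampling point per block is tracked with pyramidal Lucas-Kanade optical flow, and the box is shifted by the estimated per-vehicle displacement. For brevity the sampling point is taken at the block centre here, whereas claim 5 uses the pixel with the largest gradient, and the median stands in for the trimmed mean of claim 6. Function and parameter names are assumptions.

import cv2
import numpy as np

def predict_box_with_optical_flow(prev_gray, next_gray, box, grid=4):
    """prev_gray/next_gray: uint8 grayscale frames; box: (x, y, w, h) vehicle position area."""
    x, y, w, h = box
    xs = (np.arange(grid) + 0.5) * (w / grid) + x          # block-centre x coordinates
    ys = (np.arange(grid) + 0.5) * (h / grid) + y          # block-centre y coordinates
    points = np.array([[px, py] for py in ys for px in xs], dtype=np.float32).reshape(-1, 1, 2)

    next_points, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, points, None)
    ok = status.flatten() == 1
    good_prev = points[ok].reshape(-1, 2)
    good_next = next_points[ok].reshape(-1, 2)
    if len(good_prev) == 0:
        return box                                         # tracking failed; keep the old box

    dx, dy = np.median(good_next - good_prev, axis=0)      # robust per-vehicle displacement
    return (x + dx, y + dy, w, h)                          # predicted vehicle position area
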
4. The method according to claim 3, wherein before dividing the vehicle position area corresponding to the vehicle into the plurality of image blocks, the method further comprises:
scaling the vehicle position area corresponding to the vehicle, so that the image information of the current video frame within the vehicle position area corresponding to the vehicle only comprises the vehicle.
5. The method of claim 3 or 4, wherein the obtaining of the sampling points corresponding to each of the plurality of image blocks comprises:
for each image block of the plurality of image blocks, calculating a gradient value corresponding to each pixel point in the image block, and taking the pixel point with the largest corresponding gradient value in the image block as the sampling point of the image block.
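A sketch of the sampling-point selection in claim 5: within each image block, take the pixel whose gradient magnitude is largest. The Sobel operator is one possible way to compute the "gradient value" the claim refers to, not necessarily the one the patent intends.

import cv2
import numpy as np

def max_gradient_point(gray_block):
    """Return the (x, y) coordinate, inside the block, of the pixel with the largest gradient magnitude."""
    gx = cv2.Sobel(gray_block, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(gray_block, cv2.CV_32F, 0, 1, ksize=3)
    magnitude = cv2.magnitude(gx, gy)
    y, x = np.unravel_index(np.argmax(magnitude), magnitude.shape)
    return int(x), int(y)
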
6. The method of claim 5, wherein the determining the displacement information of the vehicle in the next video frame relative to the current video frame according to the displacement information of each sampling point in the next video frame comprises:
sorting the sampling points according to their displacement information in the next video frame, eliminating the sampling points ranked within a leading first preset percentage and the sampling points ranked within a trailing second preset percentage, and taking the average value of the displacement information of the remaining sampling points as the displacement information of the vehicle in the next video frame relative to the current video frame.
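A sketch of the trimmed averaging in claim 6; the first and second preset percentages are given illustrative values of 10%.

import numpy as np

def trimmed_mean_displacement(displacements, first_pct=0.1, second_pct=0.1):
    """displacements: (N, 2) array of per-sampling-point (dx, dy) values in the next frame."""
    d = np.asarray(displacements, dtype=np.float32)
    order = np.argsort(np.linalg.norm(d, axis=1))          # sort by displacement magnitude
    n = len(order)
    keep = order[int(n * first_pct): n - int(n * second_pct)]
    return d[keep].mean(axis=0) if len(keep) else d.mean(axis=0)
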
7. The method of claim 3, wherein the performing vehicle detection on a next video frame of the current video frame, obtaining each vehicle in the next video frame and its vehicle position area in the next video frame, matching one or more vehicle position areas in the next video frame with one or more predicted vehicle position areas in the next video frame, and obtaining a vehicle tracking result of the next video frame comprises:
calculating the IOU between each of the one or more vehicle position areas in the next video frame and each of the one or more predicted vehicle position areas in the next video frame, and if the IOU between the vehicle position area corresponding to the first vehicle detected in the next video frame and the predicted vehicle position area corresponding to the second vehicle detected in the current video frame is greater than or equal to a predetermined threshold value, determining that the first vehicle and the second vehicle are the same vehicle.
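A sketch of the IOU-based matching in claims 7 and 8: each detected vehicle position area is compared against every predicted vehicle position area, and the predicted area with the largest IOU at or above the threshold is taken as the match. The threshold value is an assumption.

def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def match_detections(detected_boxes, predicted_boxes, iou_threshold=0.3):
    """Return {detected index: predicted index}, keeping the best IOU at or above the threshold."""
    matches = {}
    for i, det in enumerate(detected_boxes):
        best_j, best_iou = None, iou_threshold
        for j, pred in enumerate(predicted_boxes):
            score = iou(det, pred)
            if score >= best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches[i] = best_j
    return matches
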
8. The method of claim 7, wherein the determining that the first vehicle and the second vehicle are the same vehicle if the IOU between the vehicle position area corresponding to the first vehicle detected in the next video frame and the predicted vehicle position area corresponding to the second vehicle detected in the current video frame is greater than or equal to the predetermined threshold value comprises:
if the IOUs between the vehicle position area corresponding to the first vehicle detected in the next video frame and the predicted vehicle position areas corresponding to a plurality of second vehicles detected in the current video frame are all greater than or equal to the predetermined threshold value, determining that the target second vehicle with the largest corresponding IOU among the plurality of second vehicles and the first vehicle are the same vehicle.
9. The method of claim 7, wherein the method further comprises at least one of:
if the predicted vehicle position area matched with the vehicle position area corresponding to the third vehicle detected in the next video frame does not exist in the next video frame, determining the third vehicle as a new vehicle;
if the vehicle position area matched with the predicted vehicle position area corresponding to the fourth vehicle detected in the current video frame does not exist in the next video frame, determining that the fourth vehicle is a disappeared vehicle, and stopping vehicle tracking on the fourth vehicle in the subsequent video frame;
and if the vehicle position area matched with the predicted vehicle position area corresponding to the fifth vehicle detected in the current video frame does not exist in a consecutive preset number of next video frames, determining that the fifth vehicle is a disappeared vehicle, and stopping vehicle tracking on the fifth vehicle in the subsequent video frames.
10. The method according to any one of claims 1 to 3, wherein the feature points extracted from a video frame do not include feature points located within a vehicle position area on the video frame.
11. The method of claim 10, wherein said extracting feature points on a video frame comprises:
generating a mask image corresponding to a video frame according to a vehicle position area in the video frame, wherein a first gray value corresponding to the vehicle position area in the mask image is different from second gray values corresponding to other areas;
and extracting feature points from the region of the video frame corresponding to the second-gray-value region of the mask image corresponding to the video frame.
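A sketch of the masked feature extraction in claims 10 and 11: vehicle position areas are zeroed out in a mask so that corners are detected only on the static background. goodFeaturesToTrack is one possible detector; the claims do not name a specific one, and the parameter values are assumptions.

import cv2
import numpy as np

def extract_background_features(gray_frame, vehicle_boxes, max_corners=500):
    """gray_frame: uint8 grayscale frame; vehicle_boxes: iterable of (x, y, w, h) vehicle position areas."""
    mask = np.full(gray_frame.shape, 255, dtype=np.uint8)    # second gray value (background region)
    for (x, y, w, h) in vehicle_boxes:
        mask[int(y):int(y + h), int(x):int(x + w)] = 0        # first gray value (vehicle position areas)
    return cv2.goodFeaturesToTrack(gray_frame, maxCorners=max_corners,
                                   qualityLevel=0.01, minDistance=8, mask=mask)
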
12. The method according to any one of claims 1 to 3, wherein said calculating a transformation matrix between said two adjacent video frames according to said feature point matching pairs comprises:
randomly selecting a preset number of feature point matching pairs from the feature point matching pairs, obtaining a key matrix describing the preset number of feature point matching pairs, and obtaining a matched inlier group corresponding to the key matrix from the feature point matching pairs; randomly selecting the preset number of feature point matching pairs from the feature point matching pairs again, and iteratively executing the above operation until a preset iteration end condition is met; and determining, from the plurality of key matrices currently obtained, a key matrix whose corresponding matched inlier group meets a preset number condition as the transformation matrix between the two adjacent video frames, wherein the reprojection error of the matched inlier group corresponding to the key matrix relative to the key matrix is less than or equal to a preset error threshold.
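An illustrative sketch of the inter-frame transform estimation in claim 12. OpenCV's built-in RANSAC homography performs the same random-sampling-and-inlier-counting loop that the claim describes; the reprojection threshold value is an assumption, and at least four matched pairs are needed.

import cv2
import numpy as np

def estimate_frame_transform(points_prev, points_next, reproj_threshold=3.0):
    """points_prev/points_next: matched feature point coordinates in two adjacent frames."""
    src = np.asarray(points_prev, dtype=np.float32).reshape(-1, 1, 2)
    dst = np.asarray(points_next, dtype=np.float32).reshape(-1, 1, 2)
    matrix, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, reproj_threshold)
    return matrix, inlier_mask    # 3x3 transformation matrix and the matched inlier group
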
13. The method according to any one of claims 1 to 3, wherein the obtaining of the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the vehicle position area of the vehicle in the video frame, the vehicle position area of the vehicle in the adjacent video frame, and the transformation matrix between the video frame and the adjacent video frame comprises:
obtaining an initial vehicle position area of the vehicle in the adjacent video frames according to the vehicle position area of the vehicle in the video frames and the transformation matrix between the video frames and the adjacent video frames;
and obtaining the actual motion vector of the vehicle relative to the actual scene in the video frame and the adjacent video frame according to the initial vehicle position area of the vehicle in the adjacent video frame and the vehicle position area of the vehicle in the adjacent video frame.
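A sketch of the camera-motion compensation in claim 13: the previous-frame box centre is warped with the inter-frame transformation matrix to get where the vehicle would appear if only the drone had moved (the initial vehicle position area), and the offset to the actually detected box centre is taken as the vehicle's motion relative to the scene. Working on box centres rather than full areas is a simplification for this sketch.

import cv2
import numpy as np

def actual_motion_vector(box_prev, box_next, transform):
    """box_prev/box_next: (x, y, w, h) in two adjacent frames; transform: 3x3 matrix between them."""
    def centre(box):
        x, y, w, h = box
        return np.array([[[x + w / 2.0, y + h / 2.0]]], dtype=np.float32)

    projected = cv2.perspectiveTransform(centre(box_prev), transform)  # camera motion only
    return (centre(box_next) - projected).reshape(2)                   # (dx, dy) relative to the scene
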
14. The method according to any one of claims 1 to 3, wherein the obtaining, for each detected vehicle, the actual movement distance of the vehicle in the preset time period before each video frame according to the plurality of actual motion vectors corresponding to the vehicle comprises:
for each detected vehicle, sorting the plurality of actual motion vectors corresponding to the vehicle in the preset time period before each video frame according to their corresponding modulus values, eliminating the actual motion vectors ranked within a leading third preset percentage and the actual motion vectors ranked within a trailing fourth preset percentage, and summing the modulus values of the remaining actual motion vectors to obtain the actual movement distance of the vehicle in the preset time period.
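A sketch of the trimmed summation in claim 14; the third and fourth preset percentages are given illustrative values of 10%.

import numpy as np

def actual_movement_distance(motion_vectors, third_pct=0.1, fourth_pct=0.1):
    """motion_vectors: (N, 2) array of a vehicle's actual motion vectors within the preset time period."""
    norms = np.sort(np.linalg.norm(np.asarray(motion_vectors, dtype=np.float32), axis=1))
    n = len(norms)
    kept = norms[int(n * third_pct): n - int(n * fourth_pct)]
    return float(kept.sum() if len(kept) else norms.sum())
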
15. The method of any of claims 1-3, wherein the method further comprises:
obtaining the preset distance threshold value according to the image resolution of the road surface video information and a preset sensitivity parameter.
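Claim 15 does not give a formula, so the sketch below is only one plausible reading: scale the preset distance threshold with the frame diagonal, so that higher-resolution video, where the same physical movement spans more pixels, tolerates proportionally more pixel motion; the sensitivity value is an assumption.

def distance_threshold(frame_width, frame_height, sensitivity=0.005):
    """Derive the preset distance threshold from the image resolution and a sensitivity parameter."""
    diagonal = (frame_width ** 2 + frame_height ** 2) ** 0.5
    return sensitivity * diagonal
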
16. An apparatus for identifying a stopped vehicle, the apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of any of claims 1 to 15.
17. A computer-readable medium storing instructions that, when executed by a computer, cause the computer to perform operations of any of the methods of claims 1-15.
18. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method according to any one of claims 1 to 15 when executed by a processor.
CN202110300499.8A 2021-03-22 2021-03-22 Method and device for identifying stopped vehicle Active CN112699854B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110300499.8A CN112699854B (en) 2021-03-22 2021-03-22 Method and device for identifying stopped vehicle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110300499.8A CN112699854B (en) 2021-03-22 2021-03-22 Method and device for identifying stopped vehicle

Publications (2)

Publication Number Publication Date
CN112699854A true CN112699854A (en) 2021-04-23
CN112699854B CN112699854B (en) 2021-07-20

Family

ID=75515295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110300499.8A Active CN112699854B (en) 2021-03-22 2021-03-22 Method and device for identifying stopped vehicle

Country Status (1)

Country Link
CN (1) CN112699854B (en)

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102999759A (en) * 2012-11-07 2013-03-27 东南大学 Light stream based vehicle motion state estimating method
US20140328511A1 (en) * 2013-01-30 2014-11-06 International Business Machines Corporation Summarizing salient events in unmanned aerial videos
CN103617410A (en) * 2013-08-30 2014-03-05 重庆大学 Highway tunnel parking detection method based on video detection technology
CN104881650A (en) * 2015-05-29 2015-09-02 成都通甲优博科技有限责任公司 Vehicle tracking method based on unmanned aerial vehicle (UAV) dynamic platform
JP2017027363A (en) * 2015-07-22 2017-02-02 株式会社リコー Video processing device, video processing method and program
CN107301369A (en) * 2017-09-04 2017-10-27 南京航空航天大学 Road traffic congestion analysis method based on Aerial Images
CN107886523A (en) * 2017-11-01 2018-04-06 武汉大学 Vehicle target movement velocity detection method based on unmanned plane multi-source image
CN108320510A (en) * 2018-04-03 2018-07-24 深圳市智绘科技有限公司 One kind being based on unmanned plane video traffic information statistical method and system
US20200036953A1 (en) * 2018-05-22 2020-01-30 Faro Technologies, Inc. Photogrammetry system and method of operation
CN109974693A (en) * 2019-01-31 2019-07-05 中国科学院深圳先进技术研究院 Unmanned plane localization method, device, computer equipment and storage medium
CN109829445A (en) * 2019-03-01 2019-05-31 大连理工大学 A kind of vehicle checking method in video flowing
CN110738690A (en) * 2019-09-16 2020-01-31 南京理工大学 unmanned aerial vehicle video middle vehicle speed correction method based on multi-target tracking framework
CN110782673A (en) * 2019-10-26 2020-02-11 江苏看见云软件科技有限公司 Vehicle violation identification and detection system based on unmanned aerial vehicle shooting cloud computing
CN111145545A (en) * 2019-12-25 2020-05-12 西安交通大学 Road traffic behavior unmanned aerial vehicle monitoring system and method based on deep learning
CN111047626A (en) * 2019-12-26 2020-04-21 深圳云天励飞技术有限公司 Target tracking method and device, electronic equipment and storage medium
CN111341097A (en) * 2020-02-13 2020-06-26 中交第一公路勘察设计研究院有限公司 Traffic data acquisition method based on unmanned aerial vehicle video
CN111368938A (en) * 2020-03-19 2020-07-03 南京因果人工智能研究院有限公司 Multi-target vehicle tracking method based on MDP
CN111563469A (en) * 2020-05-13 2020-08-21 南京师范大学 Method and device for identifying irregular parking behaviors
CN111914664A (en) * 2020-07-06 2020-11-10 同济大学 Vehicle multi-target detection and track tracking method based on re-identification
CN112099040A (en) * 2020-09-15 2020-12-18 浙江省机电设计研究院有限公司 Whole-course continuous track vehicle tracking system and method based on laser radar network
CN112132869A (en) * 2020-11-02 2020-12-25 中远海运科技股份有限公司 Vehicle target track tracking method and device

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ALEX BEWLEY et al.: "SIMPLE ONLINE AND REALTIME TRACKING", arXiv:1602.00763v2 *
ZHANG KUN et al.: "Multi-target vehicle detection and tracking based on video", 2020 Chinese Control and Decision Conference (CCDC) *
周经美 et al.: "Optimization method for vehicle motion estimation combined with the optical flow method", Journal of Harbin Institute of Technology *
杜军平 et al.: "Research on Cross-scale Fusion of Multi-source Motion Images", 31 July 2018, Beijing University of Posts and Telecommunications Press *
杨常青 et al.: "Principles and Applications of Ship-based Image Processing Technology", 31 May 2015, National Defense Industry Press *
高新闻 et al.: "Traffic anomaly event detection based on multi-target tracking", Application Research of Computers *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115114466A (en) * 2022-08-30 2022-09-27 成都实时技术股份有限公司 Method, system, medium and electronic device for searching target information image
CN115114466B (en) * 2022-08-30 2022-12-13 成都实时技术股份有限公司 Method, system, medium and electronic device for searching target practice information image
CN117315934A (en) * 2023-09-25 2023-12-29 阜阳交通能源投资有限公司 Expressway traffic flow real-time monitoring and congestion prediction system based on unmanned aerial vehicle

Also Published As

Publication number Publication date
CN112699854B (en) 2021-07-20

Similar Documents

Publication Publication Date Title
Spencer Jr et al. Advances in computer vision-based civil infrastructure inspection and monitoring
Cheng et al. Automated detection of sewer pipe defects in closed-circuit television images using deep learning techniques
US10261574B2 (en) Real-time detection system for parked vehicles
KR102525227B1 (en) Method and apparatus for determining road information data, electronic device, storage medium and program
US8873801B2 (en) Identification of objects in a video
US20150063689A1 (en) Multi-cue object detection and analysis
CN112699854B (en) Method and device for identifying stopped vehicle
Ding et al. Fast lane detection based on bird’s eye view and improved random sample consensus algorithm
WO2014172708A1 (en) Pedestrian right of way monitoring and reporting system and method
WO2022217630A1 (en) Vehicle speed determination method and apparatus, device, and medium
WO2023124133A1 (en) Traffic behavior detection method and apparatus, electronic device, storage medium, and computer program product
Zhao et al. Automated traffic surveillance system with aerial camera arrays imagery: Macroscopic data collection with vehicle tracking
Liu et al. Vehicle detection and ranging using two different focal length cameras
CN115063454A (en) Multi-target tracking matching method, device, terminal and storage medium
Guerrieri et al. Flexible and stone pavements distress detection and measurement by deep learning and low-cost detection devices
CN113012215A (en) Method, system and equipment for space positioning
Zhang et al. Longitudinal-scanline-based arterial traffic video analytics with coordinate transformation assisted by 3D infrastructure data
Kharel et al. Potholes detection using deep learning and area estimation using image processing
EP3044734B1 (en) Isotropic feature matching
CN113869258A (en) Traffic incident detection method and device, electronic equipment and readable storage medium
Dinh et al. Development of a tracking-based system for automated traffic data collection for roundabouts
RU2534827C2 (en) Method for video surveillance of open space with fire hazard monitoring
KR102221158B1 (en) Apparatus and method for detecting vehicle type, speed and traffic using radar device and image processing
Cai et al. 3D vehicle detection based on LiDAR and camera fusion
Laureshyn et al. Automated video analysis as a tool for analysing road user behaviour

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: 201210 7th Floor, No. 1, Lane 5005, Shenjiang Road, China (Shanghai) Pilot Free Trade Zone, Pudong New Area, Shanghai

Patentee after: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.

Address before: Room 501 / 503-505, 570 shengxia Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai, 201203

Patentee before: HISCENE INFORMATION TECHNOLOGY Co.,Ltd.
