WO2021022643A1 - A video target detection and tracking method and device - Google Patents
A video target detection and tracking method and device
- Publication number
- WO2021022643A1 PCT/CN2019/108080 CN2019108080W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- detection
- video frame
- image
- frame image
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Definitions
- the present invention relates to the technical field of computer vision, in particular to a method and device for detecting and tracking video targets.
- detecting and tracking targets in videos collected by capture devices is a core task of computer vision.
- for example, in order to drive autonomously, a self-driving car needs to know the driving environment around it. It therefore needs to perform target detection and tracking on its surrounding environment through the vehicle's own collection device.
- the current target detection method only performs target detection on the target in a single frame image of the video, without considering the relationship between preceding and following frames, so the detection accuracy of target detection is low.
- the current target tracking method only tracks the targets appearing in the first frame of the video; when a new target appears in the video, the new target cannot be tracked. Therefore, there is an urgent need for a video target detection and tracking method that has high detection accuracy and can track newly appearing targets.
- the present invention provides a video target detection and tracking method and device, so as to improve the detection accuracy of target detection and track newly emerging targets.
- the specific technical solution is as follows.
- an embodiment of the present invention provides a video target detection and tracking method, the method including:
- when the position and category of a detected target are obtained and there is no detected target in the previous video frame image of the current video frame image, each detected target of the current video frame image is taken as a first detected target; for each first detected target, the rectangular image area corresponding to the first detected target in the current video frame image is determined based on its position, the width and height of the rectangular image area are scaled to the width and height of the input image of a pre-established local target detection model, the scaled rectangular image area is input into the local target detection model to obtain the position and category of a second detected target, and the method returns to the step of detecting whether a current video frame image of the surrounding environment collected by the collection device in real time is received;
- when no detected target is obtained and a detected target exists in the previous video frame image of the current video frame image, each detected target existing in the previous video frame image is taken as a third detected target; for each third detected target, the rectangular image area corresponding to the third detected target in the current video frame image is determined, the width and height of the rectangular image area are scaled to the width and height of the input image of the pre-established local target detection model, the scaled rectangular image area is input into the local target detection model to obtain the position and category of a fourth detected target, the correspondence between the fourth detected target and the third detected target is established, and the method returns to the step of detecting whether a current video frame image of the surrounding environment collected by the collection device in real time is received;
- when the position and category of a detected target are obtained and a detected target exists in the previous video frame image of the current video frame image, the detected targets of the current video frame image and the detected targets existing in the previous video frame image are together taken as fifth detected targets; for each fifth detected target, the rectangular image area corresponding to the fifth detected target in the video frame image where it is located is determined, the width and height of the rectangular image area are scaled to the width and height of the input image of the pre-established local target detection model, and the scaled rectangular image area is input into the local target detection model to obtain the position and category of a sixth detected target.
- an embodiment of the present invention provides a video target detection and tracking device, which includes:
- the detection module is used to detect whether a current video frame image of the surrounding environment collected by the collection device in real time is received;
- the judging module is used to determine, if the current video frame image is received, whether the frame number interval between the current video frame image and the video frame image of the last full-image target detection is a preset interval, and if so, to trigger the full-image target detection module;
- the full-image target detection module is configured to perform full-image target detection on the current video frame image according to a pre-established full-image target detection model;
- the first detection result module is used to, when the position and category of detected targets are obtained and there is no detected target in the previous video frame image of the current video frame image, take each detected target of the current video frame image as a first detected target; for each first detected target, determine the rectangular image area corresponding to it in the current video frame image based on its position, scale the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, input the scaled rectangular image area into the local target detection model to obtain the position and category of a second detected target, and trigger the detection module;
- the second detection result module is used to, when no detected target is obtained and a detected target exists in the previous video frame image of the current video frame image, take each detected target existing in the previous video frame image as a third detected target; for each third detected target, determine the rectangular image area corresponding to it in the current video frame image, scale the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, input the scaled rectangular image area into the local target detection model to obtain the position and category of a fourth detected target, establish the correspondence between the fourth detected target and the third detected target, and trigger the detection module;
- the third detection result module is used to, when the position and category of detected targets are obtained and a detected target exists in the previous video frame image of the current video frame image, take the detected targets of the current video frame image and the detected targets existing in the previous video frame image together as fifth detected targets; for each fifth detected target, determine the rectangular image area corresponding to it in the video frame image where it is located, scale the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, input the scaled rectangular image area into the local target detection model to obtain the position and category of a sixth detected target, perform target matching on the plurality of sixth detected targets to obtain the targets that are successfully matched and unsuccessfully matched between the current video frame image and the previous video frame image, and trigger the detection module.
- in the case where full-image target detection is performed on the current video frame image, this embodiment can combine the detection result of the previous video frame image with the detection result of the current video frame image; through the full-image/local alternating detection method, local target detection is continued after full-image target detection, which takes the relationship between preceding and following video frame images into account and improves the detection accuracy of target detection.
- the embodiment of the present invention performs target detection on each video frame image based on the full-image target detection model and the local target detection model, so that the targets existing in each video frame image can be detected; therefore, newly appearing targets in the video frames can also be detected.
- the correspondence of the same target between the previous video frame image and the current video frame image can be obtained, and the matched targets between the two images can be obtained; thus, newly appearing targets can be tracked, instead of only tracking the targets appearing in the first video frame image of the video.
- the detection result of the previous video frame image and the detection result of the current video frame image are combined, and the full-image/local alternating detection method is used: local target detection is continued after full-image target detection, which takes the relationship between preceding and following video frame images into account and improves the detection accuracy of target detection.
- the embodiment of the present invention performs target detection on each video frame image based on the full-image target detection model and the local target detection model, so that the targets existing in each video frame image, including newly appearing targets, can be detected; at the same time, the correspondence with the previous detections can be obtained after local detection.
- full-image target detection is not performed on every video frame; instead, full-image target detection is performed once every preset number of frames, and local target detection is performed on the other video frames.
- the amount of calculation for local target detection is much smaller than that for full-image target detection. Therefore, by performing full-image target detection once every preset number of frames, the embodiment of the present invention significantly reduces the amount of calculation.
- a full-image target detection model that associates the first sample image with the position and category of the target in the detection frame can be obtained, and full-image target detection can be performed on the video frame image by the full-image target detection model in order to obtain the position and category of the target in the video frame image.
- the width and height of the rectangular image area corresponding to the first detected target in the current video frame image are scaled to the width and height of the input image of the pre-established local target detection model, in preparation for subsequent local target detection.
- a local target detection model that associates the second sample image with the position and category of the target in the detection frame can be obtained.
- the detected target obtained by full-image target detection is then subjected to local target detection in order to correct its position and category and obtain the precise position and category of the target in the video frame image.
- the targets that are successfully matched and the targets that are unsuccessfully matched between the current video frame image and the previous video frame image are obtained.
- a successfully matched target corresponds one-to-one to the same target in the previous video frame image and the current video frame image, so the position of the same target in the previous video frame image and in the current video frame is known; this serves the purpose of tracking the same target as well as detecting it, while the unsuccessfully matched targets serve the purpose of target detection for different targets.
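As an illustrative sketch only (the patent does not specify a matching criterion), the target matching between the previous and current frames could be done by greedy intersection-over-union (IoU) pairing; the names `match_targets`, the box convention `(x1, y1, x2, y2)`, and the 0.5 threshold are assumptions for illustration:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def match_targets(prev_boxes, curr_boxes, iou_threshold=0.5):
    """Greedily pair detections of the previous and current frames.

    Returns (matched, unmatched_prev, unmatched_curr); each matched pair
    (prev_index, curr_index) is treated as the same physical target.
    """
    candidates = sorted(
        ((iou(p, c), i, j) for i, p in enumerate(prev_boxes)
                            for j, c in enumerate(curr_boxes)),
        reverse=True)
    matched, used_prev, used_curr = [], set(), set()
    for score, i, j in candidates:
        if score < iou_threshold:
            break                      # remaining pairs overlap too little
        if i in used_prev or j in used_curr:
            continue                   # each detection may match only once
        matched.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    unmatched_prev = [i for i in range(len(prev_boxes)) if i not in used_prev]
    unmatched_curr = [j for j in range(len(curr_boxes)) if j not in used_curr]
    return matched, unmatched_prev, unmatched_curr
```

Unmatched previous-frame targets would then be candidates for lost tracks, and unmatched current-frame targets for newly appearing ones.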
- FIG. 1 is a schematic flowchart of a video target detection and tracking method provided by an embodiment of the present invention
- FIG. 2 is a schematic structural diagram of a video target detection and tracking device provided by an embodiment of the present invention.
- the embodiment of the present invention discloses a video target detection and tracking method, which can consider the relationship between the front and rear video frames, improve the detection accuracy of target detection, and at the same time, can track newly-appearing targets.
- the embodiments of the present invention will be described in detail below.
- FIG. 1 is a schematic flowchart of a method for detecting and tracking a video target provided by an embodiment of the present invention. This method is applied to electronic equipment. The method specifically includes the following steps S110 to S160:
- step S110: detect whether a current video frame image of the surrounding environment collected by the collection device in real time is received; if so, execute step S120.
- after the collection device collects video in real time, it sends the collected video to the electronic device.
- for example, the collection device of the own vehicle collects video in real time and then sends the collected video to the electronic device of the own vehicle, which may be a processor of the vehicle.
- the electronic device detects whether a current video frame image of the surrounding environment collected by the collection device in real time is received, and performs subsequent steps according to the detection result.
- step S120: determine whether the frame number interval between the current video frame image and the video frame image of the last full-image target detection is the preset interval; if so, perform step S130.
- the embodiment of the present invention no longer performs full-image target detection for every video frame image, but adopts full-image target detection once every preset frame interval. Therefore, when the electronic device detects that a current video frame image of the vehicle's surrounding environment collected in real time by the vehicle's collection device is received, it needs to determine whether the frame number interval between the current video frame image and the previous video frame image on which full-image target detection was performed is the preset interval, and perform subsequent steps according to the result.
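The interval-based decision in this step can be sketched as follows; the function name `needs_full_image_detection` and the interval of 3 are illustrative choices, not values taken from the patent:

```python
def needs_full_image_detection(frame_index, last_full_index, preset_interval):
    """Return True when the frame number interval since the last
    full-image detection has reached the preset interval; otherwise the
    frame only receives the cheaper local target detection."""
    return frame_index - last_full_index >= preset_interval

# Usage: alternate full-image and local detection over a short video.
last_full = 0
schedule = []
for frame in range(1, 10):
    if needs_full_image_detection(frame, last_full, preset_interval=3):
        schedule.append("full")
        last_full = frame          # remember the last full-image frame
    else:
        schedule.append("local")
# with interval 3, frames 3, 6 and 9 get full-image detection
```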
- step S130: perform full-image target detection on the current video frame image according to a pre-established full-image target detection model.
- if the frame number interval between the current video frame image and the video frame image of the last full-image target detection is the preset interval, it means that the current video frame image is a video frame image requiring full-image target detection.
- the pre-established full-image target detection model is used to perform full-image target detection on the current video frame image.
- the training process of the full-image target detection model can be:
- after the training is completed, a full-image target detection model that associates the first sample image with the position and category of the target in the detection frame is obtained.
- the electronic device first needs to construct a first initial network model, and then train it to obtain a full-image target detection model.
- the Caffe tool can be used to construct a first initial network model including a first feature extraction layer, a region generation network layer, and a first regression layer.
- for example, the first initial network model may be Faster R-CNN (Faster Region-based Convolutional Neural Networks), R-FCN (Region-based Fully Convolutional Networks), the YOLO algorithm, or the SSD algorithm.
- after acquiring the first sample image in the training set and the first position and first category corresponding to the target in the detection frame contained in the first sample image, the first sample image and its annotations are input into the first initial network model for training.
- the first sample image is input to the first feature extraction layer, and the full-image feature vector of the first sample image is determined through the first model parameters of the first feature extraction layer. The determined full-image feature vector is then input to the region generation network layer, and feature calculation is performed on it through the second model parameters of the region generation network layer to obtain feature information of the candidate region containing the first reference target. The feature information is then input to the first regression layer, and regression is performed through the third model parameters of the first regression layer to obtain the first reference category to which the first reference target belongs and the first reference position of the first reference target in the first sample image.
- after the first reference category and the first reference position are obtained, they are compared with the first category and the first position respectively; the first difference value between the first reference category and the first category can be calculated through the predefined objective function, and the second difference value between the first reference position and the first position is calculated.
- during the training process, all the first sample images can be looped through, and the first model parameter, second model parameter, and third model parameter of the first initial network model are continuously adjusted.
- the number of iterations reaches the first preset number, it means that the first initial network model at this time can adapt to most of the first sample images and obtain accurate results.
- at this point, the training of the first initial network model is complete and the full-image target detection model is obtained. It is understandable that the trained full-image target detection model associates the first sample image with the position and category of the target in the detection frame; the full-image target detection model is a model that takes the full image as input and outputs the position and category of the detected target.
- in this way, a full-image target detection model that associates the first sample image with the position and category of the target in the detection frame can be obtained, and full-image target detection can be performed on the video frame image by the full-image target detection model in order to obtain the position and category of the target in the video frame image.
- each detected target of the current video frame image is taken as a first detected target; for each first detected target, the rectangular image area corresponding to it in the current video frame image is determined based on its position, the width and height of the rectangular image area are scaled to the width and height of the input image of the pre-established local target detection model, the scaled rectangular image area is input into the local target detection model to obtain the position and category of a second detected target, and the method then returns to step S110.
- the embodiment of the present invention needs to merge the detection result of the current video frame with the detection result of the previous video frame.
- when the full-image target detection model obtains the position and category of a detected target and there is no detected target in the previous video frame image, the detected target of the current video frame image is taken as the first detected target.
- in practice, a score for each detected target is also obtained; a score greater than the preset threshold indicates that the detection accuracy of the target is high. Therefore, when the position and category of detected targets are obtained and there is no detected target in the previous video frame image of the current video frame image, the detected targets of the current video frame image whose scores are greater than the preset threshold are taken as the first detected targets.
- the embodiment of the present invention proposes a full-image/local alternating detection method, that is, after full-image target detection is performed, local target detection is continued on the first detected targets.
- the local target detection method is to perform local target detection through a pre-established local target detection model.
- the size of the input image of the local target detection model is a preset size, which is usually small. Therefore, before local target detection is performed, the image on which local target detection is to be performed needs to be scaled to the preset size. That is, for each first detected target, the rectangular image area corresponding to the first detected target in the current video frame image is determined based on its position, the width and height of the rectangular image area are scaled to the width and height of the input image of the pre-established local target detection model, the scaled rectangular image area is input into the local target detection model to obtain the position and category of the second detected target, and the method then returns to step S110. Because only one scaled rectangular image area is input at a time during local target detection, the amount of calculation is small and the probability of false detection is further reduced.
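The crop-scale-detect flow around the local model can be sketched as below; the function names and the box convention `(x1, y1, x2, y2)` are assumptions for illustration, and a real implementation would resize the pixels with an image library such as OpenCV's `cv2.resize`:

```python
def crop_and_scale_params(box, model_w, model_h):
    """For a detected target's box (x1, y1, x2, y2), return the crop
    rectangle plus the horizontal/vertical scale factors that map the
    rectangular image area onto the local model's fixed input size."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1, y1, x2, y2), model_w / w, model_h / h

def local_to_frame(local_box, crop, sx, sy):
    """Map a box predicted inside the scaled crop back into the
    coordinate system of the full video frame image."""
    cx1, cy1, _, _ = crop
    lx1, ly1, lx2, ly2 = local_box
    return (cx1 + lx1 / sx, cy1 + ly1 / sy,
            cx1 + lx2 / sx, cy1 + ly2 / sy)
```

Mapping the local model's output back to frame coordinates is what lets the corrected position replace the full-image detection's coarser one.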
- inputting the scaled rectangular image area into the local target detection model to obtain the position and category of the second detected target may include: inputting the scaled rectangular image area into the local target detection model to obtain the positions, categories, and scores of candidate detected targets, and taking the candidate detected targets whose scores are greater than the preset threshold as second detected targets.
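The score-filtering step just described can be sketched as follows; the candidate structure `(box, category, score)` is assumed for illustration:

```python
def filter_by_score(candidates, preset_threshold=0.5):
    """Keep only candidate detections whose score exceeds the preset
    threshold; the survivors become the second detected targets."""
    return [(box, category) for box, category, score in candidates
            if score > preset_threshold]
```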
- the step of determining the rectangular image area corresponding to the first detected target in the current video frame image based on its position, and scaling the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, may include:
- determining the coordinates of the upper-left corner point and the lower-right corner point of the first detected target in the current video frame image based on its position, and obtaining, in the current video frame image, a rectangular image area with the upper-left corner point and the lower-right corner point as the diagonal;
- calculating the scaled coordinates of the upper-left corner point and the scaled coordinates of the lower-right corner point;
- the width and height of the rectangular image area are respectively scaled to the width of the input image of the pre-established local target detection model And height.
- once the position of the first detected target is obtained, the coordinates of the upper-left corner point and the lower-right corner point of the first detected target in the current video frame image are known; in order to perform local target detection, a rectangular image area with the upper-left corner point and the lower-right corner point as the diagonal is obtained in the current video frame image.
- the coordinates of the upper left corner point include the abscissa of the upper left corner point and the ordinate of the upper left corner point.
- the coordinates of the lower right corner point include the abscissa of the lower right corner point and the ordinate of the lower right corner point.
- the preset coordinate transformation coefficients include a first preset abscissa transformation coefficient, a first preset ordinate transformation coefficient, a second preset abscissa transformation coefficient, and a second preset ordinate transformation coefficient.
- the scaled coordinates of the upper-left corner point and the scaled coordinates of the lower-right corner point can be calculated by the following formula, where:
- a_x is the first preset abscissa transformation coefficient;
- a_y is the first preset ordinate transformation coefficient;
- d_x is the second preset abscissa transformation coefficient;
- d_y is the second preset ordinate transformation coefficient;
- x_lt is the abscissa of the upper-left corner point;
- y_lt is the ordinate of the upper-left corner point;
- x_rb is the abscissa of the lower-right corner point;
- y_rb is the ordinate of the lower-right corner point;
- F_w is the abscissa of the upper-left corner point after scaling;
- F_h is the ordinate of the upper-left corner point after scaling;
- H is the height of the input image of the local target detection model;
- W is the width of the input image of the local target detection model.
- it is first determined by how much the width and height of the rectangular image area need to be scaled to reach the width and height of the input image of the pre-established local target detection model; the width and height are then scaled by that amount. That is, scaling is performed based on the coordinates of the upper-left corner point, the coordinates of the lower-right corner point, and their scaled coordinates, so that the width and height of the rectangular image area are scaled to the width and height of the input image of the pre-established local target detection model.
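The patent's exact formula for the transformation coefficients is not reproduced in this text; one consistent reading, given the variable list above, is an affine transform whose scale coefficients (a_x, a_y) and offsets (d_x, d_y) map the rectangle's corners onto the W x H model input. The sketch below is a hypothetical reconstruction under that assumption, not the claimed formula:

```python
def corner_transform_coeffs(x_lt, y_lt, x_rb, y_rb, W, H):
    """Derive affine coefficients mapping the rectangle onto the model
    input: the upper-left corner goes to (0, 0) and the lower-right
    corner to (W, H). Hypothetical reading of a_x/a_y and d_x/d_y."""
    a_x = W / (x_rb - x_lt)   # horizontal scale
    a_y = H / (y_rb - y_lt)   # vertical scale
    d_x = -a_x * x_lt         # horizontal offset
    d_y = -a_y * y_lt         # vertical offset
    return a_x, a_y, d_x, d_y

def apply_transform(x, y, a_x, a_y, d_x, d_y):
    """Map a frame coordinate into the scaled crop's coordinate system."""
    return a_x * x + d_x, a_y * y + d_y
```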
- according to the preset coordinate transformation coefficients, the width and height of the rectangular image area corresponding to the first detected target in the current video frame image are scaled to the width and height of the input image of the pre-established local target detection model, in preparation for subsequent local target detection.
- the training process of the local target detection model can be:
- the second sample image and the second position and second category corresponding to the target in the detection frame contained in the second sample image are input into the second initial network model, where the second initial network model includes a second feature extraction layer and a second regression layer;
- the electronic device first needs to construct a second initial network model, and then train it to obtain a local target detection model.
- the caffe tool can be used to construct a second initial network model including a second feature extraction layer and a second regression layer.
- for example, the second initial network model may be Faster R-CNN (Faster Region-based Convolutional Neural Networks), R-FCN (Region-based Fully Convolutional Networks), the YOLO algorithm, or the SSD algorithm.
- the second sample image and the annotations of the target in the detection frame contained in the second sample image are input into the second initial network model for training.
- the second sample image is input to the second feature extraction layer, and the feature vector of the second sample image is determined through the fourth model parameters of the second feature extraction layer. The determined feature vector is then input to the second regression layer, and regression is performed through the fifth model parameters of the second regression layer to obtain the second reference category to which the second reference target belongs and the second reference position of the second reference target in the second sample image.
- after the second reference category and the second reference position are obtained, they are compared with the second category and the second position respectively; the third difference value between the second reference category and the second category can be calculated through the predefined objective function, and the fourth difference value between the second reference position and the second position is calculated.
- during the training process, all the second sample images can be looped through, and the fourth model parameter and fifth model parameter of the second initial network model are continuously adjusted.
- the number of iterations reaches the second preset number, it means that the second initial network model at this time can adapt to most of the second sample images and obtain accurate results.
- at this point, the training of the second initial network model is complete and the local target detection model is obtained. It is understandable that the trained local target detection model associates the second sample image with the position and category of the target in the detection frame; the local target detection model is a model that takes a local image as input and outputs the position and category of the detected target.
- a local target detection model can be obtained that associates the second sample image with the position and category of the target in the detection frame.
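The iterate, compare, and adjust cycle described above can be sketched in miniature. The toy scalar regressor below only illustrates the loop structure, with `position_difference` playing the role of the fourth difference value; all names are illustrative assumptions, and a real implementation would train a CNN with a framework such as the caffe tool mentioned in the text.

```python
def position_difference(ref_positions, positions):
    """Fourth difference value: squared error between reference and label."""
    return sum((r - p) ** 2
               for ref, lab in zip(ref_positions, positions)
               for r, p in zip(ref, lab))

def train_toy_regressor(samples, labels, lr=0.05, preset_iterations=300):
    """Fit one scalar weight (a stand-in for the fourth and fifth model
    parameters) so that prediction = w * feature matches the labels."""
    w = 0.0
    for _ in range(preset_iterations):            # the second preset number
        grad = 0.0
        for feat, lab in zip(samples, labels):
            for f, y in zip(feat, lab):
                grad += 2 * (w * f - y) * f       # d/dw of squared error
        w -= lr * grad / len(samples)             # adjust the model parameter
    return w

feats = [[1.0, 2.0], [3.0, 4.0]]
boxes = [[2.0, 4.0], [6.0, 8.0]]                  # labels are 2x the features
w = train_toy_regressor(feats, boxes)
preds = [[w * f for f in feat] for feat in feats]
```

After the preset number of iterations the difference value is driven near zero, which is the stopping criterion the text describes.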
- The local target detection model can then be used to perform local target detection on the detected targets obtained by full-image target detection, so as to correct the position and category of each detected target and obtain the precise position and category of the target in the video frame image.
- The detection target existing in the previous video frame image is taken as the third detection target.
- There are several cases, including but not limited to the following, in which the full-image target detection model detects nothing: the current video frame image may contain no target at all (for example, in the field of autonomous driving, the self-driving car is parked in the parking lot), or a target exists but the full-image target detection model fails to detect it.
- For this case, the embodiment of the present invention proposes a full-image and local alternate detection method: after full-image target detection is performed, local target detection continues to be performed on the third detection target.
- The local target detection method is to perform local target detection through a pre-established local target detection model. For the training process of the local target detection model, refer to the description in step S140, which will not be repeated here.
- For the pre-established local target detection model, the size of the input image is a preset size, and the preset size is usually small. Therefore, before performing local target detection, the image used for local target detection needs to be scaled to the preset size.
- That is, for each third detection target, the corresponding rectangular image area of the third detection target in the current video frame image is determined, the width and height of the rectangular image area are scaled to the width and height of the input image of the pre-established local target detection model, and the scaled rectangular image area is input into the local target detection model to obtain the position and category of the fourth detected target; the correspondence between the fourth detected target and the third detected target is then established, and the process returns to step S110.
- Determining the rectangular image area corresponding to the third detection target in the current video frame image may include: determining the first target position of the third detection target in the previous video frame image, determining a first reference position in the current video frame image that is the same as the first target position, and determining, based on the first reference position, the rectangular image area corresponding to the third detection target in the current video frame image.
- Since the position of the third detection target will not change much between two adjacent video frames, it can be assumed that in the current video frame the third detection target is still at its position in the previous video frame, i.e. the first target position; the rectangular image area corresponding to the first reference position (the same as the first target position) in the current video frame image is then used as the rectangular image area corresponding to the third detection target in the current video frame image.
- The width and height of the rectangular image area corresponding to the third detection target in the current video frame image are scaled to the width and height of the input image of the pre-established local target detection model, and the scaled rectangular image area is input into the local target detection model to obtain the position and category of the fourth detected target. The position of the third detection target in the current video frame image, that is, the position of the fourth detected target, is thereby obtained, and the process returns to step S110.
- Establishing the correspondence between the fourth detected target and the third detected target associates the same target across the previous and current video frame images, so that the position of the same target in both the previous video frame image and the current video frame can be known, which serves the purpose of tracking that target.
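The crop-and-scale step above can be sketched as follows, assuming the third detection target is still at its previous-frame position. The nearest-neighbour resize and all function names are illustrative stand-ins, not the patent's exact procedure; a real pipeline would use an image library's resize routine.

```python
def crop_and_scale(frame, box, model_w, model_h):
    """frame: list of pixel rows; box: (x1, y1, x2, y2) taken from the
    previous frame's first target position."""
    x1, y1, x2, y2 = box
    region = [row[x1:x2] for row in frame[y1:y2]]   # rectangular image area
    h, w = len(region), len(region[0])
    return [[region[r * h // model_h][c * w // model_w]
             for c in range(model_w)]               # nearest-neighbour resize
            for r in range(model_h)]                # to model input size

# a 10x10 single-channel test frame whose pixel value encodes its position
frame = [[10 * y + x for x in range(10)] for y in range(10)]
patch = crop_and_scale(frame, (2, 2, 8, 8), model_w=4, model_h=4)
```

The resulting `patch` has the preset input width and height and would be fed to the local target detection model.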
- The detection targets of the current video frame image and the detection targets existing in the previous video frame image are together taken as the fifth detection targets.
- For this case, the embodiment of the present invention proposes a full-image and local alternate detection method: after full-image target detection is performed, local target detection continues to be performed on the fifth detection target.
- The local target detection method is to perform local target detection through a pre-established local target detection model. For the training process of the local target detection model, refer to the description in step S140, which will not be repeated here.
- For the pre-established local target detection model, the size of the input image is a preset size, and the preset size is usually small. Therefore, before performing local target detection, the image used for local target detection needs to be scaled to the preset size.
- That is, for each fifth detection target, the corresponding rectangular image area of the fifth detection target in the video frame image where it is located is determined, the width and height of the rectangular image area are scaled to the width and height of the input image of the pre-established local target detection model, and the scaled rectangular image area is input into the local target detection model to obtain the position and category of the sixth detected target.
- For the method of determining the rectangular image area corresponding to the fifth detection target in the video frame image where it is located, and of scaling the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, refer to the description in step S140 of determining the rectangular image area corresponding to the first detected target in the current video frame image and scaling it accordingly, which will not be repeated here.
- Since the sixth detection targets include both the detection targets of the previous video frame image and those of the current video frame image, in order to detect and track the targets, after the positions and categories of the sixth detection targets are obtained, target matching is performed on the multiple sixth detection targets to obtain the successfully matched targets and the unsuccessfully matched targets between the current video frame image and the previous video frame image, and the process returns to step S110.
- The step of performing target matching on multiple sixth detection targets to obtain the successfully matched and unsuccessfully matched targets between the current video frame image and the previous video frame image may include:
- for each sixth detection target in the current video frame image, determining the intersection area and the union area between that sixth detection target and each sixth detection target in the previous video frame image, and calculating the quotient of the intersection area and the union area;
- taking the sixth detection target of the current video frame image and the sixth detection target of the previous video frame image corresponding to a quotient not less than the preset threshold as successfully matched targets, and taking the sixth detection target of the current video frame image and the sixth detection target of the previous video frame image corresponding to a quotient less than the preset threshold as unsuccessfully matched targets.
- In the embodiment of the present invention, the multiple sixth detection targets are matched by calculating IoU (Intersection over Union), which is the area of the intersection of two geometric figures divided by the area of their union. The higher the IoU, the larger the overlapping part and the more similar the two targets. Therefore, after the positions and categories of the sixth detection targets are obtained, for each sixth detection target in the current video frame image, the intersection area and the union area between that sixth detection target and each sixth detection target in the previous video frame image are determined, and the quotient of the intersection area and the union area is calculated.
- After the quotient is obtained, it is compared with the preset threshold. If the quotient is not less than the preset threshold, the two sixth detection targets are considered similar; if it is less than the preset threshold, they are considered not similar. The sixth detection target of the current video frame image and the sixth detection target of the previous video frame image corresponding to a quotient not less than the preset threshold are taken as successfully matched targets, and those corresponding to a quotient less than the preset threshold are taken as unsuccessfully matched targets.
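The IoU matching described above can be sketched as follows. Boxes are `(x1, y1, x2, y2)` tuples; `match_targets` and its greedy best-match policy are illustrative assumptions, since the text only specifies thresholding the quotient.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)    # intersection area
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter                  # union area
    return inter / union if union else 0.0

def match_targets(prev_boxes, cur_boxes, threshold=0.5):
    """Return (matched index pairs, unmatched current-frame indices)."""
    matched, unmatched = [], []
    for i, cb in enumerate(cur_boxes):
        best = max(range(len(prev_boxes)),
                   key=lambda j: iou(cb, prev_boxes[j]),
                   default=None)
        if best is not None and iou(cb, prev_boxes[best]) >= threshold:
            matched.append((best, i))                # same target, both frames
        else:
            unmatched.append(i)                      # new or lost target
    return matched, unmatched

prev = [(0, 0, 10, 10)]
cur = [(1, 1, 11, 11), (50, 50, 60, 60)]
pairs, leftovers = match_targets(prev, cur)
```

Here the first current-frame box overlaps the previous-frame box enough to match, while the second is left unmatched, mirroring the successfully and unsuccessfully matched targets in the text.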
- An unsuccessfully matched target may be a new target in the current video frame image that the full-image target detection model failed to detect, or a target that exists in both the previous video frame image and the current video frame image and was detected in the previous video frame image but could not be detected in the current video frame image by the full-image target detection model. The reason is not limited to these cases.
- The successfully matched and unsuccessfully matched targets between the current video frame image and the previous video frame image are thus obtained. The successfully matched targets place the same target in the previous video frame image and in the current video frame image in one-to-one correspondence, so that the position of the same target in both frames can be known; this serves the purpose of tracking the same target as well as detecting it. The unsuccessfully matched targets serve the purpose of target detection for different targets.
- It can be seen that, when full-image target detection is performed on the current video frame image, this embodiment can combine the detection result of the previous video frame image with that of the current video frame image. Through the full-image and local alternate detection method, local target detection is continued after full-image target detection, which takes the relationship between successive video frame images into account and improves the detection accuracy of target detection.
- Moreover, the embodiment of the present invention performs target detection on each video frame image based on the full-image target detection model and the local target detection model, so that the targets existing in each video frame image, including newly appearing targets, can be detected. The correspondence of the same target between the previous video frame image and the current video frame image, as well as the matched targets between the two images, can be obtained; thus newly appearing targets can be tracked, instead of tracking only the targets appearing in the first video frame image of the video.
- The embodiments of the present invention can be applied to automatic driving: the electronic equipment of the ego vehicle detects and tracks, in real time, the targets in the surrounding environment collected by the vehicle's collection device, so as to realize automatic driving.
- In an implementation, the video target detection and tracking method for automatic driving provided by the embodiment of the present invention may further include: if the current video frame image of the surrounding environment of the vehicle, collected in real time by the vehicle's collection device, is not received, it means that the collection device is no longer collecting images. The algorithm then ends, and the previously obtained detection and tracking results need to be output; that is, the positions and categories of the detection targets existing in the previous video frame image of the current video frame image, and the correspondence of each detection target, are output. Target detection and tracking are thereby realized.
- In an implementation, the video target detection and tracking method for automatic driving may further include: when the frame number interval is not the preset interval and a detection target exists in the previous video frame image, taking the detection target existing in the previous video frame image as the seventh detection target; for each seventh detection target, determining the corresponding rectangular image area of the seventh detection target in the current video frame image, scaling the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, inputting the scaled rectangular image area into the local target detection model to obtain the position and category of the eighth detected target, establishing the correspondence between the eighth detected target and the seventh detected target, and returning to step S110.
- If the frame number interval between the current video frame image and the video frame image of the last full-image target detection is not the preset interval, the current video frame image does not require full-image target detection. If no detection target exists in the previous video frame of the current video frame image, no processing is done; if a detection target exists in the previous video frame, it is taken as the seventh detection target.
- The local target detection method is to perform local target detection through a pre-established local target detection model. For the training process of the local target detection model, refer to the description in step S140, which will not be repeated here.
- For the pre-established local target detection model, the size of the input image is a preset size, and the preset size is usually small. Therefore, before performing local target detection, the image used for local target detection needs to be scaled to the preset size.
- That is, for each seventh detection target, the corresponding rectangular image area of the seventh detection target in the current video frame image is determined, the width and height of the rectangular image area are scaled to the width and height of the input image of the pre-established local target detection model, and the scaled rectangular image area is input into the local target detection model to obtain the position and category of the eighth detected target; the correspondence between the eighth detected target and the seventh detected target is then established, and the process returns to step S110.
- Determining the rectangular image area corresponding to the seventh detection target in the current video frame image may include: determining the second target position of the seventh detection target in the previous video frame image, determining a second reference position in the current video frame image that is the same as the second target position, and determining, based on the second reference position, the rectangular image area corresponding to the seventh detection target in the current video frame image.
- Since the position of the seventh detection target will not change much between two adjacent video frames, it can be assumed that in the current video frame the seventh detection target is still at its position in the previous video frame, i.e. the second target position; the rectangular image area corresponding to the second reference position (the same as the second target position) in the current video frame image is then used as the rectangular image area corresponding to the seventh detection target in the current video frame image.
- The width and height of the rectangular image area corresponding to the seventh detection target in the current video frame image are scaled to the width and height of the input image of the pre-established local target detection model, and the scaled rectangular image area is input into the local target detection model to obtain the position and category of the eighth detected target. The position of the seventh detection target in the current video frame image, that is, the position of the eighth detected target, is thereby obtained, and the process returns to step S110.
- Establishing the correspondence between the eighth detected target and the seventh detected target associates the same target across the previous and current video frame images, so that the position of the same target in both the previous video frame image and the current video frame can be known, which serves the purpose of tracking that target.
- It can be seen that, in the embodiment of the present invention, full-image target detection is not performed on every video frame; instead, full-image target detection is performed once at an interval of a preset number of frames, and local target detection is performed on the other video frames. Since the calculation amount of local target detection is much smaller than that of full-image target detection, performing full-image target detection only at intervals of the preset frame number can significantly reduce the calculation amount.
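The alternation scheme above can be sketched as a driver loop: full-image detection runs only every `preset_interval` frames, and the cheaper local detection (on regions taken from the previous frame's targets) runs on all other frames. The detector callables and the counter logic are illustrative stand-ins for the two pre-established models, not the patent's implementation.

```python
def detect_and_track(frames, full_detect, local_detect, preset_interval=5):
    """Run full-image detection every preset_interval frames and local
    detection (per previous-frame target) in between."""
    results, prev_targets, since_full = [], [], preset_interval
    for frame in frames:
        if since_full >= preset_interval:        # frame-number interval reached
            prev_targets = full_detect(frame)    # full-image target detection
            since_full = 0
        elif prev_targets:                       # local detection near old boxes
            prev_targets = [local_detect(frame, t) for t in prev_targets]
        since_full += 1
        results.append(list(prev_targets))
    return results

# toy detectors: the "target" is just a counter that local detection advances,
# so the output shows which frames ran full vs. local detection
full = lambda frame: [0]
local = lambda frame, t: t + 1
out = detect_and_track(range(7), full, local, preset_interval=3)
```

With `preset_interval=3`, frames 0, 3, and 6 get a full-image pass (resetting the toy counter to 0) and the frames in between get local passes, which is exactly the schedule the text describes.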
- Fig. 2 is a schematic structural diagram of a video target detection and tracking device provided by an embodiment of the present invention.
- the device may include:
- the detection module 210 is configured to detect whether the current video frame image of the surrounding environment collected by the collecting device in real time is received;
- the judging module 220 is used for judging, if the current video frame image is received, whether the frame number interval between the current video frame image and the video frame image of the last full-image target detection is a preset interval, and if so, triggering the full-image target detection module 230;
- the full-image target detection module 230 is configured to perform full-image target detection on the current video frame image according to a pre-established full-image target detection model;
- the first detection result module 240 is used for, when the position and category of a detection target are detected and no detection target exists in the previous video frame image of the current video frame image, taking the detection target of the current video frame image as the first detection target; for each first detection target, determining, based on the position of the first detection target, the corresponding rectangular image area of the first detection target in the current video frame image, scaling the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, inputting the scaled rectangular image area into the local target detection model to obtain the position and category of the second detected target, and triggering the detection module 210;
- the second detection result module 250 is used for, when no detection target is detected and a detection target exists in the previous video frame image of the current video frame image, taking the detection target existing in the previous video frame image as the third detection target; for each third detection target, determining the corresponding rectangular image area of the third detection target in the current video frame image, scaling the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, inputting the scaled rectangular image area into the local target detection model to obtain the position and category of the fourth detected target, establishing the correspondence between the fourth detected target and the third detected target, and triggering the detection module 210;
- the third detection result module 260 is used for, when the position and category of a detection target are detected and a detection target exists in the previous video frame image of the current video frame image, taking the detection target of the current video frame image and the detection target existing in the previous video frame image as the fifth detection targets; for each fifth detection target, determining the corresponding rectangular image area of the fifth detection target in the video frame image where it is located, scaling the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, inputting the scaled rectangular image area into the local target detection model to obtain the position and category of the sixth detected target, performing target matching on the multiple sixth detected targets to obtain the successfully matched and unsuccessfully matched targets between the current video frame image and the previous video frame image, and triggering the detection module 210.
- It can be seen that, when full-image target detection is performed on the current video frame image, this embodiment can combine the detection result of the previous video frame image with that of the current video frame image. Through the full-image and local alternate detection method, local target detection is continued after full-image target detection, which takes the relationship between successive video frame images into account and improves the detection accuracy of target detection.
- Moreover, the embodiment of the present invention performs target detection on each video frame image based on the full-image target detection model and the local target detection model, so that the targets existing in each video frame image, including newly appearing targets, can be detected. The correspondence of the same target between the previous video frame image and the current video frame image, as well as the matched targets between the two images, can be obtained; thus newly appearing targets can be tracked, instead of tracking only the targets appearing in the first video frame image of the video.
- the foregoing device may further include:
- the output module is used for, if the current video frame image is not received after detecting whether the current video frame image of the surrounding environment collected by the collecting device in real time is received, outputting the positions and categories of the detection targets existing in the previous video frame image of the current video frame image and the correspondence of each detection target.
- the foregoing device may further include:
- the fourth detection result module is used for, after judging whether the frame number interval between the current video frame image and the video frame image of the last full-image target detection is a preset interval, when the interval is not the preset interval and a detection target exists in the previous video frame image of the current video frame image, taking the detection target existing in the previous video frame image as the seventh detection target; for each seventh detection target, determining the corresponding rectangular image area of the seventh detection target in the current video frame image, scaling the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model, inputting the scaled rectangular image area into the local target detection model to obtain the position and category of the eighth detected target, establishing the correspondence between the eighth detected target and the seventh detected target, and triggering the detection module.
- the above-mentioned apparatus may further include a first training module configured to train to obtain the full-image target detection model, and the first training module may include:
- the first acquisition submodule is configured to acquire the first sample image in the training set and the first position and the first category corresponding to the target in the detection frame contained in the first sample image;
- the first input sub-module is used to input the first sample image and the first position and the first category corresponding to the target in the detection frame contained in the first sample image into the first initial network model, where ,
- the first initial network model includes a first feature extraction layer, a region generation network layer, and a first regression layer;
- the full-image feature vector determining sub-module is configured to determine the full-image feature vector in the first sample image through the first model parameter of the first feature extraction layer;
- the feature information determining sub-module is configured to perform feature calculation on the full image feature vector through the second model parameter of the region generation network layer to obtain feature information of the candidate region containing the first reference target;
- the first generation sub-module is used to perform regression on the feature information through the third model parameter of the first regression layer to obtain the first reference category to which the first reference target belongs and the first reference position of the first reference target in the first sample image;
- a first difference calculation sub-module configured to calculate a first difference value between the first reference category and the first category, and calculate a second difference value between the first reference position and the first position ;
- the first adjustment sub-module is configured to adjust the first model parameter, the second model parameter, and the third model parameter based on the first difference value and the second difference value, and to trigger the first acquisition sub-module;
- the first training completion sub-module is used to complete the training when the number of iterations reaches the first preset number of times to obtain a full-image target detection model that associates the first sample image with the position and category of the target in the detection frame.
- the first detection result module 240 may be specifically used for:
- determining, based on the position of the first detected target, the coordinates of the upper left corner point and the lower right corner point of the first detected target in the current video frame image, and obtaining, in the current video frame image, the rectangular image area with the upper left corner point and the lower right corner point as a diagonal;
- determining, according to the coordinates of the upper left corner point, the coordinates of the lower right corner point, the preset coordinate transformation coefficient, and the width and height of the input image of the pre-established local target detection model, the scaled coordinates of the upper left corner point and the scaled coordinates of the lower right corner point;
- thereby scaling the width and height of the rectangular image area to the width and height of the input image of the pre-established local target detection model.
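One plausible reading of the corner-based scaling above is a linear map from the rectangular region into the model's input-image coordinate system. The formula below for the "preset coordinate transformation coefficient" (the ratio of model input size to region size) is an assumption for illustration, not the patent's exact definition.

```python
def scale_corners(top_left, bottom_right, model_w, model_h):
    """Map the target's corner coordinates into the local model's
    input-image coordinates via per-axis scale coefficients."""
    x1, y1 = top_left
    x2, y2 = bottom_right
    sx = model_w / (x2 - x1)          # horizontal transformation coefficient
    sy = model_h / (y2 - y1)          # vertical transformation coefficient
    scaled_tl = (0, 0)                # region origin maps to input origin
    scaled_br = (round((x2 - x1) * sx), round((y2 - y1) * sy))
    return scaled_tl, scaled_br, (sx, sy)

# a 100x50 region scaled to a 64x64 model input
tl, br, coeff = scale_corners((20, 30), (120, 80), model_w=64, model_h=64)
```

After scaling, the corner span equals the model's input width and height, which is what the width-and-height scaling step requires.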
- the above-mentioned apparatus may further include a second training module configured to train to obtain the local target detection model, and the second training module may include:
- the second acquisition submodule is configured to acquire the second sample image in the training set and the second position and second category corresponding to the target in the detection frame contained in the second sample image;
- the second input sub-module is used to input the second sample image and the second position and second category corresponding to the target in the detection frame contained in the second sample image into the second initial network model, where
- the second initial network model includes a second feature extraction layer and a second regression layer;
- the feature vector determining sub-module is configured to determine the feature vector in the second sample image through the fourth model parameter of the second feature extraction layer;
- the second generation sub-module is used to perform regression on the feature vector through the fifth model parameter of the second regression layer to obtain the second reference category to which the second reference target belongs and the second reference position of the second reference target in the second sample image;
- a second difference calculation sub-module configured to calculate a third difference value between the second reference category and the second category, and calculate a fourth difference value between the second reference position and the second position ;
- the second adjustment sub-module is configured to adjust the fourth model parameter and the fifth model parameter based on the third difference value and the fourth difference value, and to return to the step of acquiring the second sample image in the training set and the second position and second category corresponding to the target in the detection frame contained in the second sample image;
- the second training completion sub-module is used to complete the training when the number of iterations reaches the second preset number to obtain a local target detection model that associates the second sample image with the position and category of the target in the detection frame.
- the third detection result module 260 may be specifically used for:
- for each sixth detection target in the current video frame image, determining the intersection area and the union area between that sixth detection target and each sixth detection target in the previous video frame image, and calculating the quotient of the intersection area and the union area;
- taking the sixth detection target of the current video frame image and the sixth detection target of the previous video frame image corresponding to a quotient not less than the preset threshold as successfully matched targets, and taking the sixth detection target of the current video frame image and the sixth detection target of the previous video frame image corresponding to a quotient less than the preset threshold as unsuccessfully matched targets.
- the foregoing device embodiments correspond to the method embodiments and have the same technical effect as the method embodiments.
- the device embodiments are derived from the method embodiments; for specific details, refer to the method embodiment section, which is not repeated here.
- the modules of the devices in the embodiments may be distributed in the devices as described in the embodiments, or may be located, with corresponding changes, in one or more devices different from those of the embodiments.
- the modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
Description
Claims (10)
- 1. A video target detection and tracking method, characterized by comprising: detecting whether a current video frame image of the surrounding environment collected in real time by a collection device is received; if the current video frame image is received, judging whether the frame interval between the current video frame image and the video frame image on which full-image target detection was last performed is a preset interval; if it is the preset interval, performing full-image target detection on the current video frame image according to a pre-established full-image target detection model; when positions and categories of detection targets are obtained and no detection target exists in the previous video frame image of the current video frame image, taking the detection targets of the current video frame image as first detection targets, and for each first detection target: determining, based on the position of the first detection target, a rectangular image region corresponding to the first detection target in the current video frame image, scaling the width and height of the rectangular image region to the input-image width and height of a pre-established local target detection model, inputting the scaled rectangular image region into the local target detection model to obtain the position and category of a second detection target, and returning to the step of detecting whether a current video frame image of the surrounding environment collected in real time by the collection device is received; when no detection target is detected and detection targets exist in the previous video frame image of the current video frame image, taking the detection targets existing in the previous video frame image as third detection targets, and for each third detection target: determining the rectangular image region corresponding to the third detection target in the current video frame image, scaling the width and height of the rectangular image region to the input-image width and height of the pre-established local target detection model, inputting the scaled rectangular image region into the local target detection model to obtain the position and category of a fourth detection target, establishing a correspondence between the fourth detection target and the third detection target, and returning to the step of detecting whether a current video frame image of the surrounding environment collected in real time by the collection device is received; and when positions and categories of detection targets are obtained and detection targets exist in the previous video frame image of the current video frame image, taking the detection targets of the current video frame image and the detection targets existing in the previous video frame image as fifth detection targets, and for each fifth detection target: determining the rectangular image region corresponding to the fifth detection target in the video frame image in which the fifth detection target is located, scaling the width and height of the rectangular image region to the input-image width and height of the pre-established local target detection model, inputting the scaled rectangular image region into the local target detection model to obtain positions and categories of sixth detection targets, performing target matching on the multiple sixth detection targets to obtain successfully matched targets and unsuccessfully matched targets between the current video frame image and the previous video frame image, and returning to the step of detecting whether a current video frame image of the surrounding environment collected in real time by the collection device is received.
- 2. The method according to claim 1, characterized in that, after the step of detecting whether a current video frame image of the surrounding environment collected in real time by the collection device is received, the method further comprises: if the current video frame image is not received, outputting the positions and categories of the detection targets existing in the previous video frame image of the current video frame image and the correspondences of the detection targets.
- 3. The method according to claim 1, characterized in that, after the step of judging whether the frame interval between the current video frame image and the video frame image on which full-image target detection was last performed is the preset interval, the method further comprises: if it is not the preset interval, when detection targets exist in the previous video frame image of the current video frame image, taking the detection targets existing in the previous video frame image as seventh detection targets, and for each seventh detection target: determining the rectangular image region corresponding to the seventh detection target in the current video frame image, scaling the width and height of the rectangular image region to the input-image width and height of the pre-established local target detection model, inputting the scaled rectangular image region into the local target detection model to obtain the position and category of an eighth detection target, establishing a correspondence between the eighth detection target and the seventh detection target, and returning to the step of detecting whether a current video frame image of the surrounding environment collected in real time by the collection device is received.
- 4. The method according to claim 1, characterized in that the training process of the full-image target detection model is: obtaining a first sample image in a training set and a first position and a first category corresponding to a target in a detection frame included in the first sample image; inputting the first sample image and the first position and first category corresponding to the target in the detection frame included in the first sample image into a first initial network model, wherein the first initial network model comprises a first feature extraction layer, a region generation network layer and a first regression layer; determining a full-image feature vector of the first sample image through a first model parameter of the first feature extraction layer; performing feature calculation on the full-image feature vector through a second model parameter of the region generation network layer to obtain feature information of candidate regions containing a first reference target; performing regression on the feature information through a third model parameter of the first regression layer to obtain a first reference category to which the first reference target belongs and a first reference position of the first reference target in the first sample image; calculating a first difference value between the first reference category and the first category, and calculating a second difference value between the first reference position and the first position; adjusting the first model parameter, the second model parameter and the third model parameter based on the first difference value and the second difference value, and returning to the step of obtaining the first sample image in the training set and the first position and first category corresponding to the target in the detection frame included in the first sample image; and when the number of iterations reaches a first preset number, completing the training to obtain a full-image target detection model that associates the first sample image with the position and category of the target in the detection frame.
- 5. The method according to claim 1, characterized in that the step of, for each first detection target, determining, based on the position of the first detection target, the rectangular image region corresponding to the first detection target in the current video frame image and scaling the width and height of the rectangular image region to the input-image width and height of the pre-established local target detection model comprises: for each first detection target, determining, based on the position of the first detection target, the coordinates of the upper-left corner point and the lower-right corner point of the first detection target in the current video frame image, and obtaining, in the current video frame image, a rectangular image region having the upper-left corner point and the lower-right corner point as a diagonal; calculating the scaled coordinates of the upper-left corner point and the scaled coordinates of the lower-right corner point according to the coordinates of the upper-left corner point, the coordinates of the lower-right corner point, a preset coordinate transformation coefficient, and the input-image width and height of the pre-established local target detection model; and scaling the width and height of the rectangular image region to the input-image width and height of the pre-established local target detection model based on the coordinates of the upper-left corner point, the coordinates of the lower-right corner point, the scaled coordinates of the upper-left corner point, and the scaled coordinates of the lower-right corner point.
- 6. The method according to claim 1, characterized in that the training process of the local target detection model is: obtaining a second sample image in a training set and a second position and a second category corresponding to a target in a detection frame included in the second sample image; inputting the second sample image and the second position and second category corresponding to the target in the detection frame included in the second sample image into a second initial network model, wherein the second initial network model comprises a second feature extraction layer and a second regression layer; determining a feature vector of the second sample image through a fourth model parameter of the second feature extraction layer; performing regression on the feature vector through a fifth model parameter of the second regression layer to obtain a second reference category to which a second reference target belongs and a second reference position of the second reference target in the second sample image; calculating a third difference value between the second reference category and the second category, and calculating a fourth difference value between the second reference position and the second position; adjusting the fourth model parameter and the fifth model parameter based on the third difference value and the fourth difference value, and returning to the step of obtaining the second sample image in the training set and the second position and second category corresponding to the target in the detection frame included in the second sample image; and when the number of iterations reaches a second preset number, completing the training to obtain a local target detection model that associates the second sample image with the position and category of the target in the detection frame.
- 7. The method according to claim 1, characterized in that the step of performing target matching on the multiple sixth detection targets to obtain successfully matched targets and unsuccessfully matched targets between the current video frame image and the previous video frame image comprises: for each sixth detection target of the current video frame image, determining the overlap region and the intersection region between that sixth detection target and each sixth detection target of the previous video frame image, and calculating the quotient of the area of the overlap region and the area of the intersection region; taking the sixth detection target of the current video frame image and the sixth detection target of the previous video frame image corresponding to a quotient not less than a preset threshold as successfully matched targets, and taking the sixth detection target of the current video frame image and the sixth detection target of the previous video frame image corresponding to a quotient less than the preset threshold as unsuccessfully matched targets.
- 8. A video target detection and tracking device, characterized by comprising: a detection module configured to detect whether a current video frame image of the surrounding environment collected in real time by a collection device is received; a judgment module configured to, if the current video frame image is received, judge whether the frame interval between the current video frame image and the video frame image on which full-image target detection was last performed is a preset interval, and if so, trigger a full-image target detection module; the full-image target detection module, configured to perform full-image target detection on the current video frame image according to a pre-established full-image target detection model; a first detection result module configured to, when positions and categories of detection targets are obtained and no detection target exists in the previous video frame image of the current video frame image, take the detection targets of the current video frame image as first detection targets, and for each first detection target: determine, based on the position of the first detection target, a rectangular image region corresponding to the first detection target in the current video frame image, scale the width and height of the rectangular image region to the input-image width and height of a pre-established local target detection model, input the scaled rectangular image region into the local target detection model to obtain the position and category of a second detection target, and trigger the detection module; a second detection result module configured to, when no detection target is detected and detection targets exist in the previous video frame image of the current video frame image, take the detection targets existing in the previous video frame image as third detection targets, and for each third detection target: determine the rectangular image region corresponding to the third detection target in the current video frame image, scale the width and height of the rectangular image region to the input-image width and height of the pre-established local target detection model, input the scaled rectangular image region into the local target detection model to obtain the position and category of a fourth detection target, establish a correspondence between the fourth detection target and the third detection target, and trigger the detection module; and a third detection result module configured to, when positions and categories of detection targets are obtained and detection targets exist in the previous video frame image of the current video frame image, take the detection targets of the current video frame image and the detection targets existing in the previous video frame image as fifth detection targets, and for each fifth detection target: determine the rectangular image region corresponding to the fifth detection target in the video frame image in which the fifth detection target is located, scale the width and height of the rectangular image region to the input-image width and height of the pre-established local target detection model, input the scaled rectangular image region into the local target detection model to obtain positions and categories of sixth detection targets, perform target matching on the multiple sixth detection targets to obtain successfully matched targets and unsuccessfully matched targets between the current video frame image and the previous video frame image, and trigger the detection module.
- 9. The device according to claim 8, characterized in that the device further comprises: an output module configured to, after the detection of whether a current video frame image of the surrounding environment collected in real time by the collection device is received, if the current video frame image is not received, output the positions and categories of the detection targets existing in the previous video frame image of the current video frame image and the correspondences of the detection targets.
- 10. The device according to claim 8, characterized in that the device further comprises: a fourth detection result module configured to, after the judgment of whether the frame interval between the current video frame image and the video frame image on which full-image target detection was last performed is the preset interval, if it is not the preset interval, when detection targets exist in the previous video frame image of the current video frame image, take the detection targets existing in the previous video frame image as seventh detection targets, and for each seventh detection target: determine the rectangular image region corresponding to the seventh detection target in the current video frame image, scale the width and height of the rectangular image region to the input-image width and height of the pre-established local target detection model, input the scaled rectangular image region into the local target detection model to obtain the position and category of an eighth detection target, establish a correspondence between the eighth detection target and the seventh detection target, and trigger the detection module.
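The per-frame branching that the claims above describe can be condensed into a small dispatcher. This is an illustrative sketch only: the branch labels mirror the ordinal naming of the detection targets (first, third, fifth, seventh), and the helper function is this editor's summary, not part of the claimed method.

```python
def dispatch(has_detections, prev_has_targets, is_preset_interval):
    """Select the processing branch for one incoming frame (sketch).

    has_detections:     full-image detection produced targets this frame
    prev_has_targets:   the previous frame had detection targets
    is_preset_interval: this frame falls on the full-detection cycle
    Returns a label naming which group of detection targets is formed.
    """
    if not is_preset_interval:
        # Off-cycle frame: track previous targets with the local model.
        return "seventh" if prev_has_targets else "none"
    if has_detections and not prev_has_targets:
        return "first"   # new targets: refine each with the local model
    if not has_detections and prev_has_targets:
        return "third"   # lost detections: re-detect previous targets locally
    if has_detections and prev_has_targets:
        return "fifth"   # both present: local detection, then target matching
    return "none"
```

Each label then leads to the same crop-scale-detect step on the local target detection model, differing only in where the rectangular regions come from and whether a correspondence or a matching step follows.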
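The corner-point scaling step can likewise be sketched. The claims do not give the formula for the preset coordinate transformation coefficient, so it is taken as 1 here and a plain linear rescale of the region onto the model's input canvas is assumed; this is an interpretation, not the patent's exact transformation.

```python
def scale_region(top_left, bottom_right, input_w, input_h):
    """Map a rectangular region's corner points onto the local model's
    input canvas (minimal sketch; the preset coordinate transformation
    coefficient is assumed to be 1)."""
    x1, y1 = top_left
    x2, y2 = bottom_right
    sx = input_w / (x2 - x1)     # horizontal scale factor
    sy = input_h / (y2 - y1)     # vertical scale factor
    # After scaling, the region spans the full input canvas.
    scaled_tl = ((x1 - x1) * sx, (y1 - y1) * sy)   # upper-left corner
    scaled_br = ((x2 - x1) * sx, (y2 - y1) * sy)   # lower-right corner
    return scaled_tl, scaled_br, (sx, sy)
```

For a 100 x 50 region and a 224 x 224 model input, the scale factors are 2.24 and 4.48, and the scaled corners land at (0, 0) and (224, 224).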
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910729242.7A CN112347817B (zh) | 2019-08-08 | 2019-08-08 | 一种视频目标检测与跟踪方法及装置 |
CN201910729242.7 | 2019-08-08 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021022643A1 true WO2021022643A1 (zh) | 2021-02-11 |
Family
ID=74367598
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/108080 WO2021022643A1 (zh) | 2019-08-08 | 2019-09-26 | 一种视频目标检测与跟踪方法及装置 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN112347817B (zh) |
WO (1) | WO2021022643A1 (zh) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966699A (zh) * | 2021-03-24 | 2021-06-15 | 沸蓝建设咨询有限公司 | 一种通信工程项目的目标检测*** |
CN114305317A (zh) * | 2021-12-23 | 2022-04-12 | 广州视域光学科技股份有限公司 | 一种智能辨别用户反馈视标的方法和*** |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113962141A (zh) * | 2021-09-22 | 2022-01-21 | 北京智行者科技有限公司 | 一种目标检测模型自动化迭代方法、设备及存储介质 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150116504A1 (en) * | 2008-12-04 | 2015-04-30 | Sony Corporation | Image processing device and method, image processing system, and image processing program |
CN106228571A (zh) * | 2016-07-15 | 2016-12-14 | 北京光年无限科技有限公司 | 面向机器人的目标物追踪检测方法及装置 |
CN106599836A (zh) * | 2016-12-13 | 2017-04-26 | 北京智慧眼科技股份有限公司 | 多人脸跟踪方法及跟踪*** |
CN106875425A (zh) * | 2017-01-22 | 2017-06-20 | 北京飞搜科技有限公司 | 一种基于深度学习的多目标追踪***及实现方法 |
CN107563313A (zh) * | 2017-08-18 | 2018-01-09 | 北京航空航天大学 | 基于深度学习的多目标行人检测与跟踪方法 |
CN108388879A (zh) * | 2018-03-15 | 2018-08-10 | 斑马网络技术有限公司 | 目标的检测方法、装置和存储介质 |
CN108694724A (zh) * | 2018-05-11 | 2018-10-23 | 西安天和防务技术股份有限公司 | 一种长时间目标跟踪方法 |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106650630B (zh) * | 2016-11-11 | 2019-08-23 | 纳恩博(北京)科技有限公司 | 一种目标跟踪方法及电子设备 |
CN108491816A (zh) * | 2018-03-30 | 2018-09-04 | 百度在线网络技术(北京)有限公司 | 在视频中进行目标跟踪的方法和装置 |
CN109035292B (zh) * | 2018-08-31 | 2021-01-01 | 北京智芯原动科技有限公司 | 基于深度学习的运动目标检测方法及装置 |
CN109584276B (zh) * | 2018-12-04 | 2020-09-25 | 北京字节跳动网络技术有限公司 | 关键点检测方法、装置、设备及可读介质 |
- 2019
- 2019-08-08 CN CN201910729242.7A patent/CN112347817B/zh active Active
- 2019-09-26 WO PCT/CN2019/108080 patent/WO2021022643A1/zh active Application Filing
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112966699A (zh) * | 2021-03-24 | 2021-06-15 | 沸蓝建设咨询有限公司 | 一种通信工程项目的目标检测*** |
CN114305317A (zh) * | 2021-12-23 | 2022-04-12 | 广州视域光学科技股份有限公司 | 一种智能辨别用户反馈视标的方法和*** |
CN114305317B (zh) * | 2021-12-23 | 2023-05-12 | 广州视域光学科技股份有限公司 | 一种智能辨别用户反馈视标的方法和*** |
Also Published As
Publication number | Publication date |
---|---|
CN112347817B (zh) | 2022-05-17 |
CN112347817A (zh) | 2021-02-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021196294A1 (zh) | 一种跨视频人员定位追踪方法、***及设备 | |
WO2021022643A1 (zh) | 一种视频目标检测与跟踪方法及装置 | |
CN104282020B (zh) | 一种基于目标运动轨迹的车辆速度检测方法 | |
CN111126399B (zh) | 一种图像检测方法、装置、设备及可读存储介质 | |
WO2018177026A1 (zh) | 确定道路边沿的装置和方法 | |
CN104700414B (zh) | 一种基于车载双目相机的前方道路行人快速测距方法 | |
WO2020253010A1 (zh) | 一种泊车定位中的停车场入口定位方法、装置及车载终端 | |
JP2019124683A (ja) | オブジェクト速度推定方法と装置及び画像処理機器 | |
Kuk et al. | Fast lane detection & tracking based on Hough transform with reduced memory requirement | |
WO2020237942A1 (zh) | 一种行人3d位置的检测方法及装置、车载终端 | |
CN114089330B (zh) | 一种基于深度图像修复的室内移动机器人玻璃检测与地图更新方法 | |
JP2015055875A (ja) | 移動先レーンに合流する移動アイテムの移動元レーンの判定 | |
CN112906777A (zh) | 目标检测方法、装置、电子设备及存储介质 | |
CN116310679A (zh) | 多传感器融合目标检测方法、***、介质、设备及终端 | |
JPH07262375A (ja) | 移動体検出装置 | |
CN111402293A (zh) | 面向智能交通的一种车辆跟踪方法及装置 | |
CN110147748A (zh) | 一种基于道路边缘检测的移动机器人障碍物识别方法 | |
CN101320477B (zh) | 一种人体跟踪方法及其设备 | |
CN113256683B (zh) | 目标跟踪方法及相关设备 | |
CN112347818B (zh) | 一种视频目标检测模型的困难样本图像筛选方法及装置 | |
WO2024131200A1 (zh) | 基于单目视觉的车辆3d定位方法、装置及汽车 | |
CN116958935A (zh) | 基于多视角的目标定位方法、装置、设备及介质 | |
CN114037977B (zh) | 道路灭点的检测方法、装置、设备及存储介质 | |
CN105894505A (zh) | 一种基于多摄像机几何约束的快速行人定位方法 | |
CN115249407B (zh) | 指示灯状态识别方法、装置、电子设备、存储介质及产品 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 19940628 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19940628 Country of ref document: EP Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the adressee cannot be established |
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.11.2022) |
122 | Ep: pct application non-entry in european phase |
Ref document number: 19940628 Country of ref document: EP Kind code of ref document: A1 |