CN116452631A - Multi-target tracking method, terminal equipment and storage medium - Google Patents

Multi-target tracking method, terminal equipment and storage medium

Info

Publication number
CN116452631A
CN116452631A (application number CN202310306107.8A)
Authority
CN
China
Prior art keywords
target
track
matching
tracks
previous frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310306107.8A
Other languages
Chinese (zh)
Inventor
陈龙涛
廖国兴
曾焕强
朱建清
黄德天
傅玉青
施一帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huaqiao University
Original Assignee
Huaqiao University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huaqiao University filed Critical Huaqiao University
Priority to CN202310306107.8A priority Critical patent/CN116452631A/en
Publication of CN116452631A publication Critical patent/CN116452631A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-target tracking method, terminal equipment and a storage medium. The method comprises the following steps: reading video information; performing target segmentation on each frame image in the video information to obtain pixel-level information and apparent characteristic information of each target; predicting the track in each frame image based on a Kalman filtering algorithm; based on the tracks in the previous frame image and the apparent characteristic information of the targets in the current frame image, calculating the appearance similarity between tracks and targets, extracting track-target pairs whose appearance similarity exceeds a similarity threshold as pre-matching tracks and pre-matching targets, and storing the pre-matching targets in a matching target set; calculating a cost matrix between the pre-matching tracks and pre-matching targets, calculating Mask-IoU scores between the unmatched tracks and unmatched targets, and fusing the two to obtain a final cost matrix; and obtaining a track matching result through the Hungarian algorithm. Compared with existing methods, the proposed method balances efficiency and performance.

Description

Multi-target tracking method, terminal equipment and storage medium
Technical Field
The present invention relates to the field of video signal processing, and in particular, to a multi-target tracking method, a terminal device, and a storage medium.
Background
Existing mainstream real-time tracking methods generally build the tracker on deep learning to improve association accuracy. However, deep-learning-based trackers seldom combine high tracking accuracy and fast operation with low cost; their high complexity often leaves performance and efficiency unbalanced.
Disclosure of Invention
In order to solve the problems, the invention provides a multi-target tracking method, a terminal device and a storage medium.
The specific scheme is as follows:
a multi-target tracking method comprising the steps of:
s1: reading video information to be subjected to target tracking;
s2: performing target segmentation on a frame image in the video information to obtain pixel level information of a target; obtaining apparent characteristic information of the target through a re-identification model based on the pixel level information of the target;
s3: predicting the track in each frame of image based on a Kalman filtering algorithm;
s4: based on the track in the previous frame image and the apparent characteristic information of the target in the current frame image, calculating the appearance similarity between the track and the target, extracting the track and the target with the appearance similarity greater than a similarity threshold as a pre-matching track and a pre-matching target, storing the pre-matching target into a matching target set, taking other targets except the pre-matching target in the current frame image as unmatched targets, and taking other tracks except the pre-matching track in the previous frame image as unmatched tracks;
s5: calculating Mask-IoU scores between the pre-matching track and the pre-matching target through pixel-level information of the pre-matching track and the pre-matching target; calculating the appearance similarity between the pre-matching track and the pre-matching target through the apparent characteristic information of the pre-matching track and the pre-matching target; taking a weighted summation result of the Mask-IoU score and the appearance similarity as a cost matrix of the pre-matching track and the pre-matching target; calculating Mask-IoU scores between the unmatched tracks and the unmatched objects through pixel-level information of the unmatched tracks and the unmatched objects; fusing the cost matrix of the pre-matching track and the pre-matching target with the Mask-IoU score between the non-matching track and the non-matching target to obtain a final cost matrix;
s6: based on the final cost matrix and the track set of the previous frame image, obtaining a matching result between each target and the tracks in the previous frame image through the Hungarian algorithm.
Further, the specific process of step S4 includes:
extracting all tracks with track survival time smaller than the maximum track survival time from tracks of a previous frame of image to form a survival track set;
adding the target in the current frame image to an unmatched target set;
pairing tracks in the survival track set with the targets in the unmatched target set pairwise to form candidate pairs;
and calculating the appearance similarity between the tracks and the targets in each candidate pair, and extracting the tracks and the targets in the candidate pair with the appearance similarity larger than a similarity threshold as a pre-matching track and a pre-matching target.
Further, the track survival time is initialized to 0; in each frame, the survival time of any track judged to be a pre-matching track is reset to 0, and the survival time of every other track is incremented by 1.
Further, the calculation formula of the cost matrix N is:
N=λC+(1-λ)I
wherein λ represents the weight, C represents the appearance similarity matrix, and I represents the Mask-IoU score matrix.
Further, tracks that were matched to a target in the previous frame and in each of n consecutive frames before the previous frame are taken as confirmed tracks of the previous frame, and other tracks as unconfirmed tracks of the previous frame; the tracks in the previous frame image in step S4 are the confirmed tracks of the previous frame, and the track set of the previous frame image in step S6 is the set formed by all tracks in the previous frame image.
A multi-target tracking terminal device comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the computer program.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
Compared with existing methods, the method provided by the invention balances efficiency and performance, and can meet current practical application demands for low cost and real-time operation.
Drawings
Fig. 1 is a flowchart of a first embodiment of the present invention.
Detailed Description
To further illustrate the various embodiments, the invention is described with reference to the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate embodiments and, together with the description, serve to explain their principles. With reference to these materials, one of ordinary skill in the art will understand other possible embodiments and advantages of the present invention.
The invention will now be further described with reference to the drawings and detailed description.
Embodiment one:
the embodiment of the invention provides a multi-target tracking method, as shown in fig. 1, comprising the following steps:
s1: and reading video information to be subjected to target tracking.
In this embodiment, the video information is read via the OpenCV interface.
S2: performing target segmentation on a frame image in the video information to obtain pixel level information of a target, and obtaining apparent characteristic information of the target through the re-identification model based on the pixel level information of the target.
Both the target segmentation model and the re-identification model are existing models and are not limited herein.
S3: and predicting the track in each frame of image based on a Kalman filtering algorithm.
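The Kalman prediction step can be sketched as follows. The patent does not specify the state representation, so a minimal constant-velocity model over the target centre [cx, cy, vx, vy] is assumed here; practical trackers typically also track box scale and aspect ratio:

```python
import numpy as np

def kalman_predict(x, P, dt=1.0, q=1e-2):
    """One constant-velocity Kalman predict step; returns the predicted
    state and covariance."""
    # State-transition matrix for x = [cx, cy, vx, vy]
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1,  0],
                  [0, 0, 0,  1]], dtype=float)
    Q = q * np.eye(4)           # assumed isotropic process noise
    x_pred = F @ x              # project the state forward one frame
    P_pred = F @ P @ F.T + Q    # project the uncertainty forward
    return x_pred, P_pred
```

For example, a track at (10, 20) moving with velocity (1, -1) is predicted at (11, 19) in the next frame.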
S4: based on the track in the previous frame image and the apparent characteristic information of the target in the current frame image, calculating the appearance similarity between the track and the target, extracting the track and the target with the appearance similarity greater than a similarity threshold as a pre-matching track and a pre-matching target, storing the pre-matching target into a matching target set, taking other targets except the pre-matching target in the current frame image as unmatched targets, and taking other tracks except the pre-matching track in the previous frame image as unmatched tracks.
Step S4 belongs to a pre-matching phase, the implementation of which comprises the following steps:
S401: extracting all tracks whose track survival time is smaller than the maximum track survival time from the tracks of the previous frame image, and sorting them in ascending order of survival time to form a survival track set;
s402: adding the target in the current frame image to an unmatched target set;
s403: pairing tracks in the survival track set with the targets in the unmatched target set pairwise to form candidate pairs;
s404: and calculating the appearance similarity between the tracks and the targets in each candidate pair, and extracting the tracks and the targets in the candidate pair with the appearance similarity larger than a similarity threshold as a pre-matching track and a pre-matching target.
For the appearance similarity calculation, this embodiment adopts the cosine distance; the similarity threshold and the maximum track survival time are set by a person skilled in the art according to requirements.
Each pre-matching target has only one pre-matching track; if a pre-matching target appears in two or more candidate pairs whose appearance similarity exceeds the similarity threshold, the track in the candidate pair with the largest appearance similarity is selected as its matching track.
The track survival time is initialized to 0; in each frame, the survival time of any track judged to be a pre-matching track is reset to 0, and the survival time of every other track is incremented by 1.
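Steps S401-S404 can be sketched as below, assuming each track and detection carries a ReID feature vector. The patent specifies a cosine distance; an equivalent cosine similarity is used here, and the function names and threshold value are illustrative:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def pre_match(track_feats, det_feats, sim_thresh=0.7):
    """Return {detection_index: track_index} pairs whose appearance
    similarity exceeds sim_thresh; each detection keeps only its
    best-scoring track, mirroring the one-track-per-target rule above."""
    matches = {}
    for j, d in enumerate(det_feats):
        best_i, best_s = -1, sim_thresh
        for i, t in enumerate(track_feats):
            s = cosine_similarity(t, d)
            if s > best_s:
                best_i, best_s = i, s
        if best_i >= 0:
            matches[j] = best_i
    return matches
```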
S5: calculating Mask-IoU scores between the pre-matching track and the pre-matching target through pixel-level information of the pre-matching track and the pre-matching target; calculating the appearance similarity between the pre-matching track and the pre-matching target through the apparent characteristic information of the pre-matching track and the pre-matching target; taking a weighted summation result of the Mask-IoU score and the appearance similarity as a cost matrix of the pre-matching track and the pre-matching target; calculating Mask-IoU scores between the unmatched tracks and the unmatched objects through pixel-level information of the unmatched tracks and the unmatched objects; and fusing the cost matrix of the pre-matching track and the pre-matching target with the Mask-IoU score between the non-matching track and the non-matching target to obtain a final cost matrix.
The calculation formula of the cost matrix N in this embodiment is:
N=λC+(1-λ)I
wherein λ represents the weight, C represents the appearance similarity matrix, and I represents the Mask-IoU score matrix.
The final cost matrix F is obtained by fusing N with the Mask-IoU scores between the unmatched tracks and unmatched targets; its formula (rendered only as an image in the original publication) uses H to represent a similarity threshold.
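The Mask-IoU score and the weighted fusion N = λC + (1 - λ)I can be sketched as below; the masks are boolean arrays from the segmentation step, and the value of λ is illustrative:

```python
import numpy as np

def mask_iou(m1, m2):
    """Intersection-over-union of two equally shaped boolean masks."""
    inter = np.logical_and(m1, m2).sum()
    union = np.logical_or(m1, m2).sum()
    return inter / union if union > 0 else 0.0

def fuse_cost(C, I, lam=0.6):
    """Weighted fusion of the appearance-similarity matrix C and the
    Mask-IoU score matrix I, i.e. N = lam*C + (1 - lam)*I."""
    return lam * C + (1.0 - lam) * I
```

The construction of the final matrix F from N and the unmatched-pair Mask-IoU scores follows the image-only formula of the publication and is not sketched here.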
S6: based on the final cost matrix and the track set of the previous frame image, obtaining the matching result between each target and the tracks in the previous frame image through the Hungarian algorithm.
To further improve matching efficiency, in this embodiment tracks that were matched in the previous frame and in each of n consecutive frames before it are taken as confirmed tracks of the previous frame, and the other tracks as unconfirmed tracks; the tracks of the previous frame image used in step S4 are the confirmed tracks, while the track set of the previous frame image used in step S6 contains all tracks of the previous frame image. In this way, matching efficiency is preserved, and tracks are not missed when a target corresponding to an unconfirmed track reappears in the video.
n is an integer greater than 2; in this embodiment, n is set to 3.
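With the final cost matrix in hand, step S6 reduces to a linear assignment problem; SciPy's implementation of the Hungarian algorithm can serve as a sketch. Note the negation: the patent's matrices score similarity (higher is better), while `linear_sum_assignment` minimises cost.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def hungarian_match(score_matrix):
    """Return (track_index, target_index) pairs maximising the total score."""
    rows, cols = linear_sum_assignment(-np.asarray(score_matrix, dtype=float))
    return list(zip(rows.tolist(), cols.tolist()))
```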
The embodiment of the invention provides a multi-target tracking method that combines efficiency and performance: it uses a frame-by-frame detect/segment-then-associate strategy, matches with traditional appearance and motion information, and integrates pre-matching information with a fast, selectively weighted association algorithm. The embodiment thus provides a simple, flexible tracking method that achieves good tracking results and meets current low-cost, real-time practical application requirements.
Embodiment two:
the invention also provides a multi-target tracking terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the steps in the method embodiment of the first embodiment of the invention are realized when the processor executes the computer program.
Further, as an executable scheme, the multi-target tracking terminal device may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The multi-target tracking terminal device may include, but is not limited to, a processor and a memory. It will be appreciated by those skilled in the art that the above constituent structure is merely an example of the multi-target tracking terminal device and does not constitute a limitation thereof; the device may include more or fewer components than those described above, combine certain components, or use different components. For example, the multi-target tracking terminal device may further include an input/output device, a network access device, a bus, and the like, which is not limited by the embodiment of the present invention.
Further, as an implementation, the processor may be a central processing unit (Central Processing Unit, CPU), other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, etc. The general purpose processor may be a microprocessor or the processor may be any conventional processor or the like, which is a control center of the multi-target tracking terminal device, and connects various parts of the entire multi-target tracking terminal device using various interfaces and lines.
The memory may be used to store the computer program and/or modules, and the processor implements the various functions of the multi-target tracking terminal device by running or executing the computer program and/or modules stored in the memory and invoking data stored in the memory. The memory may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and at least one application program required for a function, and the data storage area may store data created according to the use of the terminal device. In addition, the memory may include high-speed random access memory, and may also include non-volatile memory such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The present invention also provides a computer readable storage medium storing a computer program which when executed by a processor implements the steps of the above-described method of an embodiment of the present invention.
The modules/units integrated in the multi-target tracking terminal device may be stored in a computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the present invention may implement all or part of the flow of the method of the above embodiment by instructing related hardware through a computer program, which may be stored in a computer-readable storage medium; when executed by a processor, the computer program implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), a software distribution medium, and so forth.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A multi-target tracking method, comprising the steps of:
s1: reading video information to be subjected to target tracking;
s2: performing target segmentation on a frame image in the video information to obtain pixel level information of a target; obtaining apparent characteristic information of the target through a re-recognition model based on pixel level information of the target;
s3: predicting the track in each frame of image based on a Kalman filtering algorithm;
s4: based on the track in the previous frame image and the apparent characteristic information of the target in the current frame image, calculating the appearance similarity between the track and the target, extracting the track and the target with the appearance similarity greater than a similarity threshold as a pre-matching track and a pre-matching target, storing the pre-matching target into a matching target set, taking other targets except the pre-matching target in the current frame image as unmatched targets, and taking other tracks except the pre-matching track in the previous frame image as unmatched tracks;
s5: calculating Mask-IoU scores between the pre-matching track and the pre-matching target through pixel-level information of the pre-matching track and the pre-matching target; calculating the appearance similarity between the pre-matching track and the pre-matching target through the apparent characteristic information of the pre-matching track and the pre-matching target; taking a weighted summation result of the Mask-IoU score and the appearance similarity as a cost matrix of the pre-matching track and the pre-matching target; calculating Mask-IoU scores between the unmatched tracks and the unmatched objects through pixel-level information of the unmatched tracks and the unmatched objects; fusing the cost matrix of the pre-matching track and the pre-matching target with the Mask-IoU score between the non-matching track and the non-matching target to obtain a final cost matrix;
s6: based on the final cost matrix and the track set of the previous frame image, obtaining a matching result between each target and the tracks in the previous frame image through the Hungarian algorithm.
2. The multi-target tracking method of claim 1, wherein: the specific process of step S4 comprises the following steps:
extracting all tracks with track survival time smaller than the maximum track survival time from tracks of a previous frame of image to form a survival track set;
adding the target in the current frame image to an unmatched target set;
pairing tracks in the survival track set with the targets in the unmatched target set pairwise to form candidate pairs;
and calculating the appearance similarity between the tracks and the targets in each candidate pair, and extracting the tracks and the targets in the candidate pair with the appearance similarity larger than a similarity threshold as a pre-matching track and a pre-matching target.
3. The multi-target tracking method of claim 2, wherein: the track survival time is initialized to 0; in each frame, the survival time of any track judged to be a pre-matching track is reset to 0, and the survival time of every other track is incremented by 1.
4. The multi-target tracking method of claim 1, wherein: the calculation formula of the cost matrix N is as follows:
N=λC+(1-λ)I
wherein λ represents the weight, C represents the appearance similarity matrix, and I represents the Mask-IoU score matrix.
5. The multi-target tracking method of claim 1, wherein: tracks matched in the previous frame and in each of n consecutive frames before the previous frame are taken as confirmed tracks of the previous frame, and other tracks as unconfirmed tracks of the previous frame; the tracks in the previous frame image in step S4 are the confirmed tracks of the previous frame, and the track set of the previous frame image in step S6 is the set formed by all tracks in the previous frame image.
6. A multi-target tracking terminal device, characterized by: comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, carries out the steps of the method according to any one of claims 1 to 5.
7. A computer-readable storage medium storing a computer program, characterized in that: the computer program implementing the steps of the method according to any one of claims 1 to 5 when executed by a processor.
CN202310306107.8A 2023-03-27 2023-03-27 Multi-target tracking method, terminal equipment and storage medium Pending CN116452631A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310306107.8A CN116452631A (en) 2023-03-27 2023-03-27 Multi-target tracking method, terminal equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310306107.8A CN116452631A (en) 2023-03-27 2023-03-27 Multi-target tracking method, terminal equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116452631A true CN116452631A (en) 2023-07-18

Family

ID=87132976

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310306107.8A Pending CN116452631A (en) 2023-03-27 2023-03-27 Multi-target tracking method, terminal equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116452631A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117435934A (en) * 2023-12-22 2024-01-23 中国科学院自动化研究所 Matching method, device and storage medium of moving target track based on bipartite graph
CN117576167A (en) * 2024-01-16 2024-02-20 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium
CN117576167B (en) * 2024-01-16 2024-04-12 杭州华橙软件技术有限公司 Multi-target tracking method, multi-target tracking device, and computer storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination