WO2022171036A1 - Video target tracking method, video target tracking apparatus, storage medium and electronic device - Google Patents

Video target tracking method, video target tracking apparatus, storage medium and electronic device

Info

Publication number
WO2022171036A1
WO2022171036A1 · PCT/CN2022/075086 · CN2022075086W
Authority
WO
WIPO (PCT)
Prior art keywords
target
image
tracked
feature
video
Prior art date
Application number
PCT/CN2022/075086
Other languages
English (en)
Chinese (zh)
Inventor
江毅
孙培泽
袁泽寰
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司 filed Critical 北京有竹居网络技术有限公司
Publication of WO2022171036A1 publication Critical patent/WO2022171036A1/fr

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 - Image analysis
    • G06T 7/20 - Analysis of motion
    • G06T 7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/211 - Selection of the most significant subset of features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features

Definitions

  • the present application is based on the Chinese application with application number 202110179157.5 and a filing date of February 9, 2021, and claims its priority; the disclosure of that Chinese application is hereby incorporated into the present application in its entirety.
  • the present disclosure relates to the technical field of image processing, and in particular, to a video target tracking method, a video target tracking device, a storage medium, and an electronic device.
  • Video target tracking is the basis of many video application fields, such as human behavior analysis and sports video commentary, and requires high real-time performance.
  • video target tracking in the related art is usually based on a two-stage process of target detection followed by target tracking. Specifically, target detection is performed on two adjacent frames of the video, and the detected targets are then matched into pairs, so as to achieve target tracking.
  • the inventor believes that, in the related art, because target detection must be performed before target tracking, the delay is relatively high; in scenarios where there are many targets to be tracked, the delay problem is particularly obvious.
  • the present disclosure provides a video target tracking method, a video target tracking device, a storage medium and an electronic device, so as to realize end-to-end video target tracking and reduce the delay of video target tracking.
  • the present disclosure provides a video target tracking method, the method comprising: acquiring a video to be tracked; and inputting the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, wherein, for each frame of image of the video to be tracked, a feature vector corresponding to the target to be tracked in a target detection image corresponding to the image is determined; a first similarity calculation is performed between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image; a target feature vector is determined among all the feature vectors of the feature map according to the first similarity calculation result; and the target to be tracked in the image is determined according to the target feature vector.
  • the present disclosure provides a video target tracking device, the device comprising:
  • an acquisition module, configured to acquire the video to be tracked;
  • a tracking module configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the tracking module includes:
  • a first determination submodule configured to determine, for each frame of the video to be tracked, a feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, and the target detection image includes the target to be tracked;
  • a second determination submodule configured to perform a first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and to determine, according to the first similarity calculation result, the target feature vector among all the feature vectors of the feature map;
  • the third determination sub-module is configured to determine the target to be tracked in the image according to the target feature vector.
  • the present disclosure provides a computer-readable medium on which a computer program is stored, and the program, when executed by a processing apparatus, implements any of the video target tracking methods provided by the embodiments of the present disclosure.
  • the present disclosure provides an electronic device, comprising:
  • a storage device on which a computer program is stored; and a processing device configured to execute the computer program in the storage device to implement any video target tracking method provided by the embodiments of the present disclosure.
  • the present disclosure provides a computer program, comprising: instructions, when executed by a processor, the instructions cause the processor to execute any of the video object tracking methods provided by the embodiments of the present disclosure.
  • the present disclosure provides a computer program product comprising instructions, which when executed by a processor, cause the processor to execute any of the video object tracking methods provided by the embodiments of the present disclosure.
  • the target tracking model can perform the first similarity calculation between the feature vectors of each frame of image of the video to be tracked and the feature vector corresponding to the target to be tracked in the target detection image, so as to determine, according to the first similarity calculation result, the target to be tracked in each frame of image.
  • therefore, the target to be tracked in each frame of image output by the target tracking model can correspond one-to-one to the target to be tracked in the target detection image; that is, target detection and target association can be completed at the same time, thereby reducing the delay in the target tracking process.
  • FIG. 1 is a flowchart of a method for tracking a video target according to an exemplary embodiment of the present disclosure
  • FIG. 2 is a schematic diagram of a target tracking process in a video target tracking method according to an exemplary embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a target tracking process in another video target tracking method according to an exemplary embodiment of the present disclosure
  • FIG. 4 is a block diagram of a video target tracking apparatus according to an exemplary embodiment of the present disclosure.
  • FIG. 5 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • the term “including” and variations thereof are open-ended inclusions, i.e., “including but not limited to”.
  • the term “based on” is “based at least in part on.”
  • the term “one embodiment” means “at least one embodiment”; the term “another embodiment” means “at least one additional embodiment”; the term “some embodiments” means “at least some embodiments”. Relevant definitions of other terms will be given in the description below.
  • Video target tracking in the related art is usually based on a process of target detection followed by target tracking.
  • the detection module performs target detection on two adjacent frames of the video, and the association module then matches the detected targets into pairs, so as to achieve target tracking.
  • the model components of this process are relatively complex and the delay is relatively high; in scenarios where there are many targets to be tracked, the delay problem is particularly obvious.
  • the present disclosure proposes a video target tracking method, a video target tracking device, a storage medium and an electronic device, so as to realize end-to-end video target tracking and reduce the time delay of video target tracking.
  • FIG. 1 is a flowchart of a video target tracking method according to an exemplary embodiment of the present disclosure. As shown in FIG. 1, the video target tracking method includes:
  • Step 101: Acquire the video to be tracked.
  • acquiring the video to be tracked may be performed in response to a user's video input operation (acquiring the video input by the user), or by automatically acquiring, after a target tracking instruction is received, the video captured by an image acquisition device, etc., which is not limited in this embodiment of the present disclosure.
  • Step 102: Input the video to be tracked into the target tracking model to obtain the target tracking result corresponding to the video to be tracked.
  • the target tracking model is used to perform the following processing: for each frame of image of the video to be tracked, determine the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, where the target detection image includes the target to be tracked; perform the first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image; determine the target feature vector among all the feature vectors in the feature map according to the first similarity calculation result; and determine the target to be tracked in the image according to the target feature vector.
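The per-frame processing described above can be sketched as follows. This is a minimal illustration, assuming a flattened feature map and dot-product similarity; the function name, shapes, and the choice of dot product are assumptions for illustration, not details specified by the disclosure.

```python
import numpy as np

def track_frame(feature_map, target_vectors):
    """Per-frame processing: compare every feature vector of the
    frame's feature map with the feature vector of each target to be
    tracked (taken from the target detection image), and return, per
    target, the index of the most similar feature-map location.

    feature_map:    (P, D) array, one feature vector per pixel
    target_vectors: (N, D) array, one vector per target to be tracked
    """
    # First similarity calculation: vector dot product between every
    # feature-map vector and every target vector -> (P, N) scores.
    similarity = feature_map @ target_vectors.T
    # The feature-map vector with the highest similarity to a target
    # vector is taken as that target's target feature vector.
    return similarity.argmax(axis=0)
```

For a 4-pixel toy feature map whose rows are one-hot vectors, the returned indices locate each target's matching pixel in the frame.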
  • in this way, the target tracking model can perform the first similarity calculation between the feature vectors of each frame of image of the video to be tracked and the feature vector corresponding to the target to be tracked in the target detection image, so as to determine, according to the first similarity calculation result, the target to be tracked in each frame of image.
  • the video to be tracked may be input into the target tracking model.
  • each frame of images in the video to be tracked has a time sequence, so a video image sequence composed of multiple frames of images arranged in time sequence can be obtained according to the video to be tracked. Therefore, inputting the video to be tracked into the target tracking model may also be inputting the video image sequence corresponding to the video to be tracked into the target tracking model.
  • the training of the target tracking model may be performed according to the sample images and sample target information corresponding to the sample images.
  • during training, the sample image can be input into the target tracking model to obtain the predicted target information output by the target tracking model for the sample image; the loss function is then calculated according to the predicted target information and the sample target information; and finally, the parameters of the target tracking model are adjusted according to the calculation result of the loss function, so that the target tracking model outputs more accurate target information.
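The training step described above (predict, compute loss, adjust parameters) can be illustrated with a toy stand-in. The linear "model", the array shapes, and the squared-error loss are all assumptions made for illustration; they are not the model of the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 16)) * 0.1  # model parameters (toy linear model)

def train_step(sample_image, sample_target, lr=0.01):
    """One training step: predict target info for the sample image,
    compute the loss against the labelled sample target info, and
    adjust the model parameters according to the loss result."""
    global W
    pred = W @ sample_image                         # predicted target info
    loss = float(np.mean((pred - sample_target) ** 2))
    # Gradient of the mean squared error with respect to W.
    grad = 2.0 * np.outer(pred - sample_target, sample_image) / len(pred)
    W -= lr * grad                                  # adjust parameters
    return loss
```

Repeating `train_step` on the same sample drives the loss down, mirroring how the model is made to "output more accurate target information".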
  • the target detection function and the target tracking function of the target tracking model can be trained synchronously.
  • in contrast, the model training method in the related art trains the detection module and the association module step by step. In scenarios with many targets to be tracked, this training method takes a lot of time, and it is difficult to achieve the optimal training effect.
  • the method of synchronously training the target detection function and the target tracking function of the target tracking model in the embodiments of the present disclosure not only simplifies the components of the target tracking model, but also simplifies its training process, and can better satisfy the requirements of multi-target tracking scenarios.
  • specifically, for each frame of image of the video to be tracked, the target tracking model can determine the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, perform the first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, determine the target feature vector among all the feature vectors in the feature map according to the first similarity calculation result, and finally determine the target to be tracked in the image according to the target feature vector.
  • the target detection image may be a previous frame image of the image including the target to be tracked, or the target detection image may be a preset input image including the target to be tracked.
  • the video target tracking method provided by the embodiments of the present disclosure can be applied to two application scenarios where the tracking target is given and the tracking target is unknown.
  • in a scenario where the tracking target is given, the target detection image may be a preset input image including the target to be tracked. For example, if the target to be tracked is person A, the preset input image may be a full-body photo of person A captured by an image acquisition device.
  • the target detection image may be the previous frame of the image including the target to be tracked.
  • first, the feature vector corresponding to the target to be tracked in the target detection image may be determined. The feature vector may be the result of vectorizing the image feature of the center pixel of the target to be tracked, or the result of vectorizing the image feature of a pixel that distinguishes the target to be tracked from other targets, etc., which is not limited in this embodiment of the present disclosure.
  • the manner of determining the feature vector is similar to that in the related art, and details are not repeated here.
  • the feature map corresponding to the image may be obtained according to the image feature vector of each pixel in the image.
  • the feature vector corresponding to the target to be tracked in the target detection image is a pixel-level feature vector, so the first similarity calculation can be performed between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, so as to achieve target tracking according to the first similarity calculation result.
  • the first similarity calculation may be a vector dot product calculation, a Euclidean distance calculation, or the like between the feature vector corresponding to each pixel in the image and the feature vector corresponding to the target to be tracked in the target detection image; the specific method of calculating the first similarity is not limited in this embodiment of the present disclosure.
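For concreteness, the two similarity measures mentioned above might be implemented as follows. Negating the Euclidean distance is an illustrative convention (not from the disclosure) so that a larger value always means "more similar" for both measures, matching the later top-N selection.

```python
import numpy as np

def dot_similarity(u, v):
    # Vector dot product: a larger value means more similar.
    return float(np.dot(u, v))

def euclidean_similarity(u, v):
    # Euclidean distance: a smaller distance means more similar; the
    # negation makes "largest value" mean "most similar" here too.
    return -float(np.linalg.norm(np.asarray(u) - np.asarray(v)))
```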
  • the target tracking model may include an attention mechanism module, and the attention mechanism module may perform the first similarity calculation process to determine the feature vector corresponding to the target existing in both the image and the target detection image, that is, to obtain the target feature vector.
  • in the scenario where the target detection image is a preset input image including the target to be tracked, the attention mechanism module can perform the first similarity calculation between each feature vector in the feature map and the feature vector corresponding to the target to be tracked in the preset input image, and output the target feature vector according to the first similarity calculation result, so that the target tracking model can determine the target to be tracked in the frame image according to the target feature vector.
  • in a possible manner, determining the target feature vector among all the feature vectors of the feature map may be: when the target detection image includes N targets to be tracked, selecting, among all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as the target feature vectors, where N is a positive integer.
  • in the scenario where the target detection image is a preset input image including the target to be tracked, it can be determined by a target detection method in the related art that the target detection image includes N targets to be tracked, that is, the number of feature vectors corresponding to the targets to be tracked is N. In this scenario, for each frame of image of the video to be tracked, the N feature vectors with the largest first similarity calculation results may be selected, among all the feature vectors included in the feature map corresponding to the image, as the target feature vectors to determine the target to be tracked in the image.
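The top-N selection described above can be sketched as follows; the sort-based implementation is one obvious way to realize it, not the disclosure's prescribed method.

```python
import numpy as np

def select_target_vectors(feature_map, scores, n):
    """Select, among all feature vectors of the feature map, the N
    with the largest first similarity calculation results as the
    target feature vectors.

    feature_map: (P, D) array of feature vectors
    scores:      (P,) array of first similarity calculation results
    """
    # Sort ascending, take the last n indices (the largest scores),
    # and reverse so the best-scoring vector comes first.
    top = np.argsort(scores)[-n:][::-1]
    return feature_map[top], top
```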
  • for example, all the first similarity calculation results can be sorted from largest to smallest, and then, from all the feature vectors included in the feature map corresponding to the image, the N feature vectors corresponding to the top-ranked first similarity calculation results are selected as the target feature vectors.
  • the specific manner of determining the target feature vector is not limited in this embodiment of the present disclosure.
  • the targets to be tracked determined according to the selected N feature vectors are the targets existing in both each frame of image of the video to be tracked and the target detection image; that is, the target to be tracked in each frame of image output by the target tracking model can correspond one-to-one to the target to be tracked in the target detection image, so target detection and target association can be completed at the same time, thereby reducing the delay in the target tracking process.
  • the target tracking model can also be used to determine the feature vectors corresponding to all targets in the image according to the pre-trained position vector parameters.
  • in another possible manner, determining the target feature vector among all the feature vectors of the feature map may be: when the target detection image includes N targets to be tracked, selecting, among all the feature vectors of the feature map, the N feature vectors with the largest first similarity calculation results as similar feature vectors, where N is a positive integer; then performing deduplication processing on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
  • in the scenario where the target detection image is the previous frame of the image and includes the target to be tracked, the attention mechanism module can perform the first similarity calculation between each feature vector in the feature map and the feature vector corresponding to the target to be tracked in the previous frame of image, and determine the similar feature vectors according to the first similarity calculation result.
  • the attention mechanism module can also determine the feature vectors of all objects in this frame of images according to the pre-trained position vector parameters.
  • the target tracking model can perform feature vector fusion based on the similar feature vectors and the feature vectors of all targets in the frame image to obtain the target feature vector. Finally, the target tracking model can determine the target to be tracked in the current frame image according to the target feature vector.
  • the position vector parameter may include a plurality of unit position vectors.
  • the position vector parameter may include H × W or more unit position vectors, so as to cover the location of each pixel in an H × W image. It should be understood that the number of unit position vectors included in the position vector parameter can be set larger, so as to adapt to the image sizes of different scenarios.
  • the position vector parameter may be obtained by training in the following way: the predicted feature vector corresponding to the target in the sample image is determined according to the initial position vector parameter, so as to obtain the predicted target information corresponding to the sample image, where the sample image is pre-marked with corresponding sample target information; the loss function is then calculated according to the predicted target information and the sample target information, and the initial position vector parameter is adjusted according to the calculation result of the loss function.
  • the initial position vector parameter may be a random value, that is, after setting the number of unit position vectors in the position vector, the value of each unit position vector is a random value. Then, according to the result of the loss function of the target tracking model in the training process, the position vector parameter can be adjusted through the back-propagation algorithm, so that the position vector parameter can more accurately predict the position of the target in the image.
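A toy sketch of this training procedure, assuming a nearest-match assignment of targets to unit position vectors and a squared-error loss. Both assumptions, as well as the shapes and learning rate, are illustrative stand-ins for the loss-driven back-propagation update described above.

```python
import numpy as np

rng = np.random.default_rng(1)
pos_vectors = rng.normal(size=(8, 4))  # initial position vector parameter: random values

def adjust(pos_vectors, target_features, lr=0.5):
    """One illustrative adjustment step: nudge the unit position
    vectors toward the feature vectors of the targets in a sample
    image, so the parameter predicts target positions more accurately."""
    for t in target_features:
        # Nearest-match assignment (an illustrative assumption).
        i = np.argmin(np.linalg.norm(pos_vectors - t, axis=1))
        # Gradient step on a squared-error loss for the matched vector.
        pos_vectors[i] -= lr * 2.0 * (pos_vectors[i] - t)
    return pos_vectors
```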
  • for the first frame image of the video to be tracked, the feature vectors determined according to the pre-trained position vector parameter may be used as the target feature vectors. It should be understood that, in a scenario where the tracking target is unknown, the target detection image is the previous frame of the image, so the first similarity calculation cannot be performed for the first frame of image.
  • the feature vectors corresponding to all targets in the image determined according to the pre-trained position vector parameters may be used as target feature vectors to determine the target to be tracked in the image.
  • for each frame of the video to be tracked other than the first frame, there is a corresponding previous frame of image including the target to be tracked, so the target feature vector can be determined according to the corresponding feature vectors and the first similarity calculation result.
  • N feature vectors with the largest first similarity calculation result may be selected as similar feature vectors, where N is a positive integer. Then, the feature vectors corresponding to all targets in the image and N similar feature vectors can be deduplicated to obtain target feature vectors.
  • the N similar feature vectors represent the feature vectors corresponding to the targets existing in both each frame of image in the video to be tracked and the target detection image, while the feature vectors determined according to the position vector parameter correspond to all the targets in each frame of image in the video to be tracked. For example, the N similar feature vectors may be the feature vectors corresponding to targets B1 and B2, while the determined feature vectors are the feature vectors corresponding to targets B1, B2 and B3; the two sets then contain feature vectors corresponding to the same targets (B1 and B2).
  • in this case, feature vector fusion can be performed. For example, the feature vectors corresponding to all the targets in the image and the N similar feature vectors can be deduplicated to obtain the target feature vectors for determining the target to be tracked.
  • deduplicating the feature vectors corresponding to all the targets in the image and the N similar feature vectors to obtain the target feature vectors may be performed as follows: for the feature vector corresponding to each target in the image, a second similarity calculation is performed between that feature vector and the N similar feature vectors; when a second similarity calculation result is greater than or equal to the preset similarity, the feature vector corresponding to that result is deleted from the feature vectors corresponding to all targets in the image or from the N similar feature vectors. Then, the feature vectors corresponding to all the targets in the image after deletion and the remaining feature vectors among the N similar feature vectors are taken as the target feature vectors.
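The deduplication step can be sketched as follows, using cosine similarity as an illustrative second similarity measure and an assumed preset similarity of 0.9; here the duplicate is always deleted from the similar set, which is one of the two allowed choices.

```python
import numpy as np

def deduplicate(image_vectors, similar_vectors, threshold=0.9):
    """Second similarity calculation and deduplication: each feature
    vector detected in the image is compared with the N similar
    feature vectors; when the similarity reaches the preset value,
    the duplicate is deleted from the similar set."""
    keep = list(range(len(similar_vectors)))
    for d in image_vectors:
        for i in list(keep):
            s = similar_vectors[i]
            sim = np.dot(d, s) / (np.linalg.norm(d) * np.linalg.norm(s))
            if sim >= threshold:   # regarded as the same feature vector
                keep.remove(i)     # perform the deletion operation
                break
    # Remaining vectors of both sets form the target feature vectors.
    return list(image_vectors) + [similar_vectors[i] for i in keep]
```

With image vectors for targets B1, B2 and B3 and similar vectors for B1 and B2, the result contains exactly the three vectors for B1, B2 and B3, matching the worked example in the text.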
  • the second similarity calculation may be, for the feature vector corresponding to each target in the image, a vector dot product calculation, a Euclidean distance calculation, or the like between that feature vector and the N similar feature vectors; the specific method of similarity calculation is not limited in this embodiment of the present disclosure.
  • the preset similarity can be customized according to the actual situation, which is not limited in the present disclosure.
  • when the second similarity calculation result is greater than or equal to the preset similarity, the feature vector corresponding to a certain target in the image and a certain feature vector among the N similar feature vectors can be regarded as the same feature vector, so the deletion operation can be performed either among the feature vectors corresponding to all the targets in the image or among the N similar feature vectors.
  • the N similar eigenvectors are the eigenvectors corresponding to the targets B1 and B2, and the eigenvectors determined according to the position vector parameters are the eigenvectors corresponding to the targets B1, B2, and B3.
  • the feature vectors corresponding to all the targets in the image and the remaining feature vectors in the N similar feature vectors can be used as target feature vectors.
  • after the deletion, the N similar feature vectors are empty, and the feature vectors corresponding to all targets in the image are those corresponding to targets B1, B2 and B3. Therefore, the feature vectors corresponding to all the targets in the image after deletion, together with the remaining feature vectors among the N similar feature vectors, are the feature vectors corresponding to targets B1, B2 and B3; that is, the target feature vectors are the feature vectors corresponding to targets B1, B2 and B3.
  • it should be noted that the deduplication processing is only one possible method of feature vector fusion provided by the embodiments of the present disclosure. In the specific implementation of the present disclosure, other methods may also be used to fuse the feature vectors corresponding to all the targets in the image with the N similar feature vectors, which is not limited in this embodiment of the present disclosure.
  • in the above manner, the feature vectors corresponding to all the targets in the image can be determined through the position vector parameter, and the similar feature vectors corresponding to the targets existing in both the image and the target detection image can be determined according to the similarity calculation results; feature vector fusion is then performed on the similar feature vectors and the feature vectors corresponding to all the targets in the image to remove redundant feature vectors, which improves computational efficiency and yields a more accurate target to be tracked.
  • after the target feature vector is obtained, the target feature vector can be subjected to a linear feature transformation to obtain the tracking frame information corresponding to the target to be tracked in the image, where the tracking frame information includes the position information and size information of the tracking frame corresponding to the target to be tracked, so that the target to be tracked can be indicated in the image according to the tracking frame information.
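The linear feature transformation to tracking frame information might be sketched as follows. The weight matrix and bias here are random placeholders standing in for learned parameters, and the feature dimension D = 8 is an assumption for illustration.

```python
import numpy as np

# A learned weight matrix and bias would normally come from training;
# here they are random placeholders, purely for illustration.
D = 8
rng = np.random.default_rng(2)
W_box = rng.normal(size=(4, D))
b_box = np.zeros(4)

def to_tracking_frame(target_feature_vector):
    # Linear feature transformation: a D-dimensional target feature
    # vector is mapped to 4 numbers interpreted as the tracking
    # frame's position (x, y) and size (w, h).
    x, y, w, h = W_box @ target_feature_vector + b_box
    return {"position": (x, y), "size": (w, h)}
```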
  • based on the same inventive concept, an embodiment of the present disclosure also provides a video target tracking apparatus, which can become part or all of an electronic device through software, hardware, or a combination of the two. As shown in FIG. 4, the video target tracking apparatus includes:
  • an acquisition module 401, configured to acquire the video to be tracked;
  • a tracking module 402 configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the tracking module includes:
  • the first determination sub-module 4021 is configured to, for each frame of the video to be tracked, determine the feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, and the target detection image includes the target to be tracked ;
  • the second determination sub-module 4022 is configured to perform a first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and calculate the similarity according to The first similarity calculation result, the target feature vector is determined in all the feature vectors of the feature map;
  • the third determination sub-module 4023 is configured to determine the target to be tracked in the image according to the target feature vector.
  • the target detection image is an image of the previous frame of the image that includes the target to be tracked; or, the target detection image is a preset input image that includes the target to be tracked.
  • the second determination submodule 4022 is used for:
  • when the target detection image includes N targets to be tracked, select, among all the feature vectors in the feature map, the N feature vectors with the largest first similarity calculation results as the target feature vectors, where N is a positive integer.
  • the target tracking model is also used to determine the feature vectors corresponding to all targets in the image according to the pre-trained position vector parameter, and the second determination submodule is used to:
  • when the target detection image includes N targets to be tracked, select, among all the feature vectors in the feature map, the N feature vectors with the largest first similarity calculation results as similar feature vectors, where N is a positive integer;
  • Deduplication processing is performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain target feature vectors.
  • the image is a first frame image of the video to be tracked, and the device further includes:
  • the fourth determination sub-module is configured to use the feature vector determined according to the pre-trained position vector parameter as the target feature vector.
  • the second determination submodule 4022 is used for:
  • the feature vectors corresponding to all targets in the image after deletion processing and the remaining feature vectors in the N similar feature vectors are used as target feature vectors.
  • the apparatus 400 further includes the following modules for obtaining the position vector parameters through training:
  • the first training module is used to determine the predicted feature vector corresponding to the target in the sample image according to the initial position vector parameter, so as to obtain the predicted target information corresponding to the sample image, wherein the sample image is pre-marked with the corresponding sample target information;
  • the second training module is configured to calculate a loss function according to the predicted target information and the sample target information, and adjust the initial position vector parameter according to the calculation result of the loss function.
  • modules may be implemented as software components executing on one or more general-purpose processors, or as hardware that performs certain functions, such as programmable logic devices and/or application-specific integrated circuits, or as combinations thereof.
  • the modules may be embodied in the form of a software product, which may be stored in a non-volatile storage medium and which causes a computer device (e.g., a personal computer, a server, a network device, or a mobile terminal) to implement the method described in the embodiments of the present disclosure.
  • the above-mentioned modules may be implemented on a single device, or may be distributed across multiple devices. The functions of these modules may be combined with each other or further split into multiple sub-modules.
  • an embodiment of the present disclosure further provides a computer-readable medium on which a computer program is stored, where the program, when executed by a processing device, implements the steps of any of the above video target tracking methods.
  • an electronic device including:
  • a processing device configured to execute the computer program in the storage device, so as to implement the steps of any of the above video target tracking methods.
  • Terminal devices in the embodiments of the present disclosure may include, but are not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., in-vehicle navigation terminals), as well as stationary terminals such as digital TVs and desktop computers.
  • the electronic device shown in FIG. 5 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • an electronic device 500 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 501, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage device 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the electronic device 500.
  • the processing device 501, the ROM 502, and the RAM 503 are connected to each other through a bus 504.
  • An input/output (I/O) interface 505 is also connected to the bus 504.
  • The following devices may be connected to the I/O interface 505: input devices 506 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; output devices 507 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; and storage devices 508 including, for example, a magnetic tape, a hard disk, etc.
  • A communication device 509 may allow the electronic device 500 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 5 shows the electronic device 500 with various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from a network via the communication device 509, or installed from the storage device 508, or installed from the ROM 502.
  • when the computer program is executed by the processing device 501, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • an embodiment of the present disclosure provides a computer program, including: instructions, when executed by a processor, the instructions cause the processor to execute any of the above video object tracking methods.
  • an embodiment of the present disclosure provides a computer program product, including instructions, when executed by a processor, the instructions cause the processor to execute any of the above video object tracking methods.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • The communication may use any currently known or future-developed network protocol, such as HTTP (HyperText Transfer Protocol). Examples of communication networks include local area networks ("LAN"), wide area networks ("WAN"), internetworks (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future-developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • the above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device: acquires a video to be tracked; and inputs the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, where the target tracking model is used to perform the following processing: for each frame image of the video to be tracked, determining a feature vector corresponding to the target to be tracked in a target detection image corresponding to the frame image, the target detection image including the target to be tracked; performing a first similarity calculation between each feature vector in a feature map corresponding to the frame image and the feature vector corresponding to the target to be tracked in the target detection image, and determining a target feature vector from all the feature vectors in the feature map according to the first similarity calculation results; and determining the target to be tracked in the frame image according to the target feature vector.
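As an illustration of the per-frame processing described above, the following sketch locates a single tracked target in each frame by matching its feature vector against the frame's feature map. The grid-shaped (H, W, C) feature map and dot-product similarity are assumptions for the example; the disclosure does not fix the feature-map layout or the similarity measure.

```python
import numpy as np

def track_target(frames_feature_maps, target_vec):
    """For each frame's feature map of shape (H, W, C), find the cell whose
    feature vector is most similar to the tracked target's vector and return
    the per-frame (row, col) position of that cell."""
    positions = []
    for fmap in frames_feature_maps:
        h, w, c = fmap.shape
        flat = fmap.reshape(-1, c)             # one feature vector per grid cell
        sims = flat @ target_vec               # first similarity calculation
        idx = int(np.argmax(sims))             # target feature vector = best match
        positions.append((idx // w, idx % w))  # flat index back to grid coordinates
    return positions
```

The returned positions stand in for the "target to be tracked in the image"; a full model would decode a bounding box or mask from the selected target feature vector instead.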
  • Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or combinations thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., connected via the Internet using an Internet service provider).
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by combinations of dedicated hardware and computer instructions.
  • the modules involved in the embodiments of the present disclosure may be implemented in software or hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logic Devices (CPLDs), and so on.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • more specific examples of machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, compact disc read-only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • Example 1 provides a video target tracking method, the method comprising: acquiring a video to be tracked; and inputting the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the target tracking model being used to perform the following processing: for each frame image of the video to be tracked, determining a feature vector corresponding to the target to be tracked in a target detection image corresponding to the frame image, the target detection image including the target to be tracked; performing a first similarity calculation between each feature vector in a feature map corresponding to the frame image and the feature vector corresponding to the target to be tracked in the target detection image, and determining a target feature vector from all the feature vectors in the feature map according to the first similarity calculation results; and determining the target to be tracked in the frame image according to the target feature vector.
  • Example 2 provides the method of Example 1, wherein the target detection image is a previous frame image of the image including the target to be tracked; or
  • the target detection image is a preset input image including the target to be tracked.
  • Example 3 provides the method of Example 1, wherein according to the first similarity calculation result, determining a target feature vector among all feature vectors of the feature map, including:
  • when the target detection image includes N targets to be tracked, the N feature vectors with the largest first similarity calculation results among all the feature vectors in the feature map are selected as the target feature vectors, where N is a positive integer.
  • Example 4 provides the method of Example 1, wherein the target tracking model is further configured to determine feature vectors corresponding to all targets in the image according to pre-trained position vector parameters, and determining the target feature vector among all the feature vectors of the feature map according to the first similarity calculation result includes:
  • when the target detection image includes N targets to be tracked, the N feature vectors with the largest first similarity calculation results among all the feature vectors in the feature map are selected as similar feature vectors, where N is a positive integer;
  • deduplication is performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
  • Example 5 provides the method of Example 4, where the image is a first frame image of the video to be tracked, and the method further includes:
  • the feature vector determined according to the pre-trained position vector parameter is used as the target feature vector.
  • Example 6 provides the method of Example 4, wherein performing deduplication on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors includes:
  • the feature vectors corresponding to all targets in the image after the deletion processing, together with the remaining feature vectors among the N similar feature vectors, are used as the target feature vectors.
  • Example 7 provides the method of any one of Examples 4-6, wherein the position vector parameter is obtained by training in the following manner:
  • the predicted feature vector corresponding to the target in a sample image is determined according to an initial position vector parameter, so as to obtain predicted target information corresponding to the sample image, wherein the sample image is pre-marked with corresponding sample target information;
  • a loss function is calculated according to the predicted target information and the sample target information, and the initial position vector parameter is adjusted according to the calculation result of the loss function.
  • Example 8 provides a video target tracking apparatus, the apparatus comprising:
  • the acquisition module is used to acquire the video to be tracked;
  • a tracking module configured to input the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the tracking module includes:
  • a first determination submodule configured to determine, for each frame of the video to be tracked, a feature vector corresponding to the target to be tracked in the target detection image corresponding to the image, and the target detection image includes the target to be tracked;
  • the second determination submodule is configured to perform a first similarity calculation between each feature vector in the feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, and to determine the target feature vector from all the feature vectors in the feature map according to the first similarity calculation results;
  • the third determination sub-module is configured to determine the target to be tracked in the image according to the target feature vector.
  • Example 9 provides the apparatus of Example 8, and the target detection image is a previous frame image of the image including the target to be tracked; or, the target detection image is a preset input image including the target to be tracked.
  • Example 10 provides the apparatus of Example 8, the second determination submodule being used for:
  • when the target detection image includes N targets to be tracked, the N feature vectors with the largest first similarity calculation results among all the feature vectors in the feature map are selected as the target feature vectors, where N is a positive integer.
  • Example 11 provides the apparatus of Example 8, wherein the target tracking model is further configured to determine feature vectors corresponding to all targets in the image according to pre-trained position vector parameters, the second determination submodule being used for:
  • when the target detection image includes N targets to be tracked, the N feature vectors with the largest first similarity calculation results among all the feature vectors in the feature map are selected as similar feature vectors, where N is a positive integer;
  • deduplication is performed on the feature vectors corresponding to all targets in the image and the N similar feature vectors to obtain the target feature vectors.
  • Example 12 provides the apparatus of Example 11, where the image is a first frame image of the video to be tracked, and the apparatus further includes:
  • the fourth determination sub-module is configured to use the feature vector determined according to the pre-trained position vector parameter as the target feature vector.
  • Example 13 provides the apparatus of Example 11, the second determination submodule being used for:
  • the feature vectors corresponding to all targets in the image after the deletion processing, together with the remaining feature vectors among the N similar feature vectors, are used as the target feature vectors.
  • Example 14 provides the apparatus of any one of Examples 11-13, the apparatus further comprising the following module for training to obtain the position vector parameter:
  • the first training module is used to determine the predicted feature vector corresponding to the target in the sample image according to the initial position vector parameter, so as to obtain the predicted target information corresponding to the sample image, wherein the sample image is pre-marked with the corresponding sample target information;
  • the second training module is configured to calculate a loss function according to the predicted target information and the sample target information, and adjust the initial position vector parameter according to the calculation result of the loss function.
  • Example 15 provides a computer-readable medium having stored thereon a computer program that, when executed by a processing apparatus, implements the steps of the method of any one of Examples 1-7.
  • Example 16 provides an electronic device comprising:
  • a processing device configured to execute the computer program in the storage device, to implement the steps of the method in any one of Examples 1-7.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a video target tracking method, a video target tracking apparatus, a storage medium, and an electronic device, making it possible to perform end-to-end video target tracking and to reduce the time delay of video target tracking. The video target tracking method comprises: acquiring a video to be tracked; and inputting the video to be tracked into a target tracking model to obtain a target tracking result corresponding to the video to be tracked, the target tracking model being used to perform the following processing: for each frame image of the video to be tracked, determining a feature vector corresponding to a target to be tracked in a target detection image corresponding to the image, the target detection image comprising the target to be tracked; performing a first similarity calculation between each feature vector in a feature map corresponding to the image and the feature vector corresponding to the target to be tracked in the target detection image, then determining a target feature vector among all the feature vectors in the feature map according to the first similarity calculation result; and determining the target to be tracked in the image according to the target feature vector.
PCT/CN2022/075086 2021-02-09 2022-01-29 Procédé de suivi de cible vidéo, appareil de suivi de cible vidéo, support de stockage et dispositif électronique WO2022171036A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110179157.5A CN112907628A (zh) 2021-02-09 2021-02-09 视频目标追踪方法、装置、存储介质及电子设备
CN202110179157.5 2021-02-09

Publications (1)

Publication Number Publication Date
WO2022171036A1 true WO2022171036A1 (fr) 2022-08-18

Family

ID=76123159

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/075086 WO2022171036A1 (fr) 2021-02-09 2022-01-29 Procédé de suivi de cible vidéo, appareil de suivi de cible vidéo, support de stockage et dispositif électronique

Country Status (2)

Country Link
CN (1) CN112907628A (fr)
WO (1) WO2022171036A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385497A (zh) * 2023-05-29 2023-07-04 成都与睿创新科技有限公司 用于体腔内的自定义目标追踪方法及***
CN117975198A (zh) * 2024-02-02 2024-05-03 北京视觉世界科技有限公司 目标检测类数据集的自动化构建方法及其相关设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112907628A (zh) * 2021-02-09 2021-06-04 北京有竹居网络技术有限公司 视频目标追踪方法、装置、存储介质及电子设备

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109214238A (zh) * 2017-06-30 2019-01-15 百度在线网络技术(北京)有限公司 多目标跟踪方法、装置、设备及存储介质
US20200065617A1 (en) * 2018-08-24 2020-02-27 Nec Laboratories America, Inc. Unsupervised domain adaptation for video classification
CN111242973A (zh) * 2020-01-06 2020-06-05 上海商汤临港智能科技有限公司 目标跟踪方法、装置、电子设备及存储介质
CN111311635A (zh) * 2020-02-08 2020-06-19 腾讯科技(深圳)有限公司 一种目标定位方法、装置及***
CN112907628A (zh) * 2021-02-09 2021-06-04 北京有竹居网络技术有限公司 视频目标追踪方法、装置、存储介质及电子设备

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109829397B (zh) * 2019-01-16 2021-04-02 创新奇智(北京)科技有限公司 一种基于图像聚类的视频标注方法、***以及电子设备
CN110717414B (zh) * 2019-09-24 2023-01-03 青岛海信网络科技股份有限公司 一种目标检测追踪方法、装置及设备
CN111898416A (zh) * 2020-06-17 2020-11-06 绍兴埃瓦科技有限公司 视频流处理方法、装置、计算机设备和存储介质

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385497A (zh) * 2023-05-29 2023-07-04 成都与睿创新科技有限公司 用于体腔内的自定义目标追踪方法及***
CN116385497B (zh) * 2023-05-29 2023-08-22 成都与睿创新科技有限公司 用于体腔内的自定义目标追踪方法及***
CN117975198A (zh) * 2024-02-02 2024-05-03 北京视觉世界科技有限公司 目标检测类数据集的自动化构建方法及其相关设备

Also Published As

Publication number Publication date
CN112907628A (zh) 2021-06-04

Similar Documents

Publication Publication Date Title
WO2022171036A1 (fr) Procédé de suivi de cible vidéo, appareil de suivi de cible vidéo, support de stockage et dispositif électronique
CN110298413B (zh) 图像特征提取方法、装置、存储介质及电子设备
JP2023547917A (ja) 画像分割方法、装置、機器および記憶媒体
WO2022252881A1 (fr) Procédé et appareil de traitement d'image, support lisible et dispositif électronique
WO2022105779A1 (fr) Procédé de traitement d'image, procédé d'entraînement de modèle, appareil, support et dispositif
WO2023030370A1 (fr) Procédé et appareil de détection d'image d'endoscope, support de stockage et dispositif électronique
CN111784712B (zh) 图像处理方法、装置、设备和计算机可读介质
WO2022028254A1 (fr) Procédé d'optimisation de modèle de positionnement, procédé de positionnement et dispositif de positionnement
CN113033580B (zh) 图像处理方法、装置、存储介质及电子设备
CN110347875B (zh) 一种视频场景分类方法、装置、移动终端及存储介质
WO2023179310A1 (fr) Procédé et appareil de restauration d'image, dispositif, support et produit
WO2022233223A1 (fr) Procédé et appareil d'assemblage d'image, dispositif et support
WO2023030427A1 (fr) Procédé d'entraînement pour modèle génératif, procédé et appareil d'identification de polypes, support et dispositif
CN113449070A (zh) 多模态数据检索方法、装置、介质及电子设备
CN112330788A (zh) 图像处理方法、装置、可读介质及电子设备
CN113610034B (zh) 识别视频中人物实体的方法、装置、存储介质及电子设备
CN108257081B (zh) 用于生成图片的方法和装置
CN111862351B (zh) 定位模型优化方法、定位方法和定位设备
CN111311609B (zh) 一种图像分割方法、装置、电子设备及存储介质
CN110765304A (zh) 图像处理方法、装置、电子设备及计算机可读介质
WO2023016290A1 (fr) Procédé et appareil de classification de vidéo, support lisible et dispositif électronique
WO2022194145A1 (fr) Procédé et appareil de détermination de position de photographie, dispositif et support
WO2022052889A1 (fr) Procédé et appareil de reconnaissance d'image, dispositif électronique et support lisible par ordinateur
CN113435528B (zh) 对象分类的方法、装置、可读介质和电子设备
CN114004229A (zh) 文本识别方法、装置、可读介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22752197

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22752197

Country of ref document: EP

Kind code of ref document: A1