CN112418104A - Pedestrian tracking method and related equipment - Google Patents

Info

Publication number
CN112418104A
Authority
CN
China
Prior art keywords
human body
target
pedestrian
frame image
current frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011335074.2A
Other languages
Chinese (zh)
Inventor
Tang Huan (唐欢)
Hu Wenze (胡文泽)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN202011335074.2A
Publication of CN112418104A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/25: Fusion techniques
    • G06F18/253: Fusion techniques of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/52: Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53: Recognition of crowd images, e.g. recognition of crowd congestion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application provides a pedestrian tracking method and related equipment, wherein the method comprises the following steps: determining a target pedestrian in a current frame image; acquiring a target human body feature and K historical human body features, wherein the target human body feature is the human body feature of the target pedestrian, the K historical human body features are the human body features of historical pedestrians in a previous frame image, the previous frame image is the frame immediately preceding the current frame image in a video stream, and K is a positive integer; and performing feature fusion according to the target human body feature and the K historical human body features to determine whether the target pedestrian is a historical pedestrian. The embodiments of the application improve the accuracy of pedestrian tracking.

Description

Pedestrian tracking method and related equipment
Technical Field
The present application relates to the field of computer technologies, and in particular, to a pedestrian tracking method and related devices.
Background
With the rapid development of deep learning, target-tracking metrics such as Multi-Object Tracking Accuracy (MOTA) have improved quickly. In particular, a series of tracking algorithms represented by the multi-target tracking algorithm DeepSORT extract features through a human body feature model (ReID) for tracking, which has great practical value in real applications and self-evident importance. In the current target tracking field, however, human body features cannot distinguish different people with full accuracy; especially when people look similar, the human body features lack an obvious degree of discrimination, which reduces tracking accuracy and greatly affects practical application.
Disclosure of Invention
The embodiments of the application disclose a pedestrian tracking method and related equipment, which help improve the accuracy of pedestrian tracking.
The first aspect of the embodiments of the application discloses a pedestrian tracking method, which comprises the following steps: determining a target pedestrian in a current frame image; acquiring a target human body feature and K historical human body features, wherein the target human body feature is the human body feature of the target pedestrian, the K historical human body features are the human body features of historical pedestrians in a previous frame image, the previous frame image is the frame immediately preceding the current frame image in a video stream, and K is a positive integer; and performing feature fusion according to the target human body feature and the K historical human body features to determine whether the target pedestrian is a historical pedestrian.
In an exemplary embodiment, the determining the target pedestrian in the current frame image includes: performing multi-scale feature decomposition on the current frame image to obtain a low-frequency feature component and a high-frequency feature component; dividing the low-frequency feature component into a plurality of regions; determining the information entropy corresponding to each of the plurality of regions to obtain a plurality of information entropies; determining an average information entropy and a target mean square error according to the plurality of information entropies; determining a target adjustment coefficient corresponding to the target mean square error; adjusting the average information entropy according to the target adjustment coefficient to obtain a target information entropy; determining a first evaluation value corresponding to the target information entropy according to a preset mapping relation between information entropy and score; acquiring a target shooting parameter corresponding to the current frame image; determining a target low-frequency weight corresponding to the target shooting parameter according to a preset mapping relation between shooting parameter and low-frequency weight, and determining a target high-frequency weight according to the target low-frequency weight; determining a target feature point distribution density according to the high-frequency feature component; determining a second evaluation value corresponding to the target feature point distribution density according to a preset mapping relation between feature point distribution density and score; performing a weighting operation on the first evaluation value, the second evaluation value, the target low-frequency weight and the target high-frequency weight to obtain the target definition of the current frame image; if the target definition is less than a preset definition threshold, determining a target image enhancement algorithm corresponding to the target definition according to a preset mapping relation between image definition and image enhancement algorithm; performing image enhancement processing on the current frame image according to the target image enhancement algorithm to obtain an enhanced current frame image; and determining the target pedestrian in the enhanced current frame image.
A second aspect of the embodiments of the present application discloses a pedestrian tracking apparatus, comprising: a determining unit, configured to determine a target pedestrian in a current frame image; an acquisition unit, configured to acquire a target human body feature and K historical human body features, wherein the target human body feature is the human body feature of the target pedestrian, the K historical human body features are the human body features of historical pedestrians in a previous frame image, the previous frame image is the frame immediately preceding the current frame image in a video stream, and K is a positive integer; and a tracking unit, configured to perform feature fusion according to the target human body feature and the K historical human body features to determine whether the target pedestrian is a historical pedestrian.
A third aspect of embodiments of the present application discloses a server comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the programs comprising instructions for performing the steps in the method according to any one of the first aspect of embodiments of the present application.
The fourth aspect of the present embodiment discloses a chip, which includes: a processor, configured to call and run a computer program from a memory, so that a device on which the chip is installed performs the method according to any one of the first aspect of the embodiments of the present application.
A fifth aspect of embodiments of the present application discloses a computer-readable storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of the first aspect of embodiments of the present application.
A sixth aspect of embodiments of the present application discloses a computer program product, which causes a computer to execute the method according to any one of the first aspect of the embodiments of the present application.
It can be seen that, in the embodiment of the present application, when performing video tracking on a pedestrian, for each frame image (that is, the current frame image), the target pedestrian in the current frame image is determined first; then the human body feature of the target pedestrian and the human body features of K historical pedestrians in the video tracking process are acquired; feature fusion is then performed according to the human body feature of the target pedestrian and the human body features of the K historical pedestrians, and whether the target pedestrian is a historical pedestrian is determined according to the human body feature of the target pedestrian and the fusion feature obtained by the feature fusion. In the prior art, the human body feature of a target pedestrian is directly compared with the human body feature of a historical pedestrian to judge whether the target pedestrian is the historical pedestrian. In the present application, by contrast, the human body feature of the target pedestrian is compared with the fusion feature, and because the fusion feature is obtained by fusing the human body features of multiple historical pedestrians, the feature comparison is more accurate, which helps improve the accuracy of pedestrian tracking.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a pedestrian tracking method according to an embodiment of the present application;
Fig. 2 is a schematic flow chart of another pedestrian tracking method according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a pedestrian tracking apparatus according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a server according to an embodiment of the present application.
Detailed Description
The embodiments of the present application will be described below with reference to the drawings.
Referring to fig. 1, fig. 1 is a schematic flow chart of a pedestrian tracking method provided in an embodiment of the present application, where the pedestrian tracking method is applicable to a server, and the pedestrian tracking method includes, but is not limited to, the following steps.
Step 101, determining a target pedestrian in a current frame image;
The current frame image is the current frame of a video stream on which video tracking is performed; one or more pedestrians may exist in the current frame image, and the target pedestrian is one of them.
In an exemplary embodiment, the determining the target pedestrian in the current frame image includes: performing multi-scale feature decomposition on the current frame image to obtain a low-frequency feature component and a high-frequency feature component; dividing the low-frequency feature component into a plurality of regions; determining the information entropy corresponding to each of the plurality of regions to obtain a plurality of information entropies; determining an average information entropy and a target mean square error according to the plurality of information entropies; determining a target adjustment coefficient corresponding to the target mean square error; adjusting the average information entropy according to the target adjustment coefficient to obtain a target information entropy; determining a first evaluation value corresponding to the target information entropy according to a preset mapping relation between information entropy and score; acquiring a target shooting parameter corresponding to the current frame image; determining a target low-frequency weight corresponding to the target shooting parameter according to a preset mapping relation between shooting parameter and low-frequency weight, and determining a target high-frequency weight according to the target low-frequency weight; determining a target feature point distribution density according to the high-frequency feature component; determining a second evaluation value corresponding to the target feature point distribution density according to a preset mapping relation between feature point distribution density and score; performing a weighting operation on the first evaluation value, the second evaluation value, the target low-frequency weight and the target high-frequency weight to obtain the target definition of the current frame image; if the target definition is less than a preset definition threshold, determining a target image enhancement algorithm corresponding to the target definition according to a preset mapping relation between image definition and image enhancement algorithm; performing image enhancement processing on the current frame image according to the target image enhancement algorithm to obtain an enhanced current frame image; and determining the target pedestrian in the enhanced current frame image.
The multi-scale feature decomposition of the current frame image may be performed with a multi-scale decomposition algorithm to obtain the low-frequency feature component and the high-frequency feature component, where the multi-scale decomposition algorithm may be at least one of the following: a pyramid transform algorithm, a wavelet transform, a contourlet transform, a shearlet transform, and the like, which is not limited herein. The target shooting parameter may be at least one of: ISO, exposure duration, white balance parameter, focus parameter, and the like, which is not limited herein. In addition, in the embodiment of the application, the adjustment coefficient may range from -0.15 to 0.15; target information entropy = (1 + target adjustment coefficient) × average information entropy; the target low-frequency weight and the target high-frequency weight sum to 1; target feature point distribution density = total number of feature points / area of the region of the high-frequency feature component; and target definition = first evaluation value × target low-frequency weight + second evaluation value × target high-frequency weight.
Therefore, the image quality evaluation can be carried out based on the two dimensions of the low-frequency component and the high-frequency component of the current frame image, the target definition of the current frame image can be accurately obtained, and then a target pedestrian in the current frame image is determined, so that the failure of pedestrian tracking caused by low image definition can be avoided, and the accuracy of pedestrian tracking is favorably improved.
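The definition-score pipeline above can be sketched numerically as follows. This is a minimal illustration, not the patent's implementation: the box-blur low-pass split, the 4×4 region grid, the MSE-to-coefficient mapping, and the linear entropy-to-score and density-to-score mappings are all stand-ins for the preset mapping relations the text leaves unspecified.

```python
import numpy as np

def block_entropy(block, bins=32):
    """Shannon entropy of a grayscale block's intensity histogram."""
    hist, _ = np.histogram(block, bins=bins, range=(0.0, 1.0))
    p = hist / max(hist.sum(), 1)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def definition_score(img, low_w=0.6, blur=5, grid=4, adjust_scale=0.1):
    """Sketch of the two-branch definition (sharpness) score.

    img: 2-D float array in [0, 1]. low_w is the target low-frequency
    weight; the target high-frequency weight is 1 - low_w (they sum to 1,
    as in the text). All mappings below are illustrative assumptions.
    """
    # Low-frequency component via box blur; high-frequency is the residual
    # (a stand-in for the patent's multi-scale decomposition).
    k = np.ones((blur, blur)) / (blur * blur)
    pad = blur // 2
    padded = np.pad(img, pad, mode="edge")
    low = np.zeros_like(img)
    for dy in range(blur):
        for dx in range(blur):
            low += k[dy, dx] * padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    high = img - low

    # Split the low-frequency component into grid x grid regions and
    # compute per-region entropies, then their average and mean square error.
    h, w = img.shape
    ents = [block_entropy(low[i*h//grid:(i+1)*h//grid, j*w//grid:(j+1)*w//grid])
            for i in range(grid) for j in range(grid)]
    avg_ent, mse = float(np.mean(ents)), float(np.var(ents))
    # Adjustment coefficient from the MSE (assumed linear mapping),
    # clipped to the -0.15..0.15 range mentioned in the text.
    coeff = float(np.clip(adjust_scale * mse, -0.15, 0.15))
    target_ent = (1 + coeff) * avg_ent
    score1 = target_ent / np.log2(32)          # entropy -> [0, 1] score

    # Feature-point density: fraction of strong high-frequency responses
    # (stand-in for counting feature points over the region area).
    density = float((np.abs(high) > 0.1).mean())
    score2 = min(density / 0.2, 1.0)           # density -> [0, 1] score

    # Weighted combination, matching the target-definition formula.
    return low_w * score1 + (1 - low_w) * score2
```

A flat frame scores near zero on both branches, while a textured frame scores high, so thresholding this value reproduces the "enhance only low-definition frames" decision described above.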
Step 102, acquiring a target human body feature and K historical human body features, wherein the target human body feature is the human body feature of the target pedestrian, the K historical human body features are the human body features of historical pedestrians in a previous frame image, the previous frame image is the frame immediately preceding the current frame image in the video stream, and K is a positive integer;
wherein, the human body features can be represented by vectors.
The target human body feature can be obtained by inputting the human body detection frame corresponding to the target pedestrian into a human body feature model (ReID). The human body detection frame, used for pedestrian detection during pedestrian tracking, is the region image obtained after the pedestrian is framed, that is, the region of the frame image that contains the pedestrian.
The historical human body features are human body features of historical pedestrians in a video tracking process and are obtained by extracting human body features of each frame of image before a current frame of image. That is, video tracking is performed on a video stream where a current frame image is located, and in the video tracking process, when a pedestrian is tracked on each frame image, human body features of pedestrians in the frame image need to be extracted, and the extracted human body features of the pedestrians are stored and serve as the human body features of historical pedestrians.
Step 103, performing feature fusion according to the target human body feature and the K historical human body features to determine whether the target pedestrian is a historical pedestrian.
Specifically, the purpose of performing feature fusion according to the target human body features and the K historical human body features is to obtain fusion features, and then perform feature comparison by using the fusion features and the target human body features, for example, perform similarity calculation on the fusion features and the target human body features, so as to determine whether the target pedestrian is a historical pedestrian. When the characteristics are fused, K historical human body characteristics are fused, but the fusion proportion of each historical human body characteristic is related to the similarity between the historical human body characteristic and the target human body characteristic.
It can be seen that, in the embodiment of the present application, when performing video tracking on a pedestrian, for each frame image (that is, a current frame image), a target pedestrian in the current frame image is determined first; then, acquiring the human body characteristics of the target pedestrian and acquiring the human body characteristics of K historical pedestrians in the video tracking process; and then carrying out feature fusion on the human body features of the target pedestrian and the human body features of the K historical pedestrians, and determining whether the target pedestrian is a historical pedestrian according to the human body features of the target pedestrian and fusion features obtained by feature fusion. In the prior art, the human body characteristics of a target pedestrian are directly compared with the human body characteristics of a historical pedestrian so as to judge whether the target pedestrian is the historical pedestrian; the human body characteristics of the target pedestrian are compared with the fusion characteristics, and the fusion characteristics are obtained by performing characteristic fusion on the human body characteristics of a plurality of historical pedestrians, so that the method has higher accuracy in characteristic comparison, and is favorable for improving the accuracy of pedestrian tracking.
In an exemplary embodiment, the performing feature fusion according to the target human body feature and the K historical human body features to determine whether the target pedestrian is the historical pedestrian includes: respectively carrying out similarity calculation on the target human body features and the K historical human body features to obtain K first similarities; carrying out weighted average according to the K historical human body features and the K first similarities to obtain fusion features; similarity calculation is carried out on the target human body features and the fusion features to obtain second similarity; and if the second similarity is larger than a preset similarity threshold value, determining that the target pedestrian is a historical pedestrian.
The fusion feature is calculated as follows:

tensor_fusion = ( Σ_{i=1}^{K} weight_i · tensor_i ) / ( Σ_{i=1}^{K} weight_i )

In the above formula, tensor_fusion is the fusion feature; tensor_i is the i-th historical human body feature, that is, the human body feature of the i-th historical pedestrian; and weight_i is the similarity value between the target human body feature and the i-th historical human body feature, that is, the similarity value between the human body feature of the target pedestrian and the human body feature of the i-th historical pedestrian.
As can be seen, in this example, similarity calculation is performed between the human body feature of the target pedestrian and the human body features of the K historical pedestrians to obtain K first similarities; a weighted average is then taken over the human body features of the K historical pedestrians using the K first similarities to obtain the fusion feature; similarity calculation is performed between the human body feature of the target pedestrian and the fusion feature to obtain a second similarity; and if the second similarity is greater than a preset similarity threshold, the target pedestrian is determined to be a historical pedestrian. Because the fusion feature, rather than a single historical feature, is compared with the human body feature of the target pedestrian, the feature comparison is more accurate, which helps improve the accuracy of pedestrian tracking.
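The four steps of this example can be sketched as follows. Cosine similarity and the 0.8 threshold are illustrative assumptions; the text specifies only "similarity calculation" and a "preset similarity threshold".

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_by_fusion(target_feat, history_feats, sim_threshold=0.8):
    """Similarity-weighted fusion of historical features, then comparison.

    target_feat: (D,) human body feature of the target pedestrian.
    history_feats: (K, D) human body features of the K historical pedestrians.
    Returns (is_historical, fusion_feature).
    """
    # Step 1: K first similarities between the target and each historical feature.
    weights = np.array([cosine_sim(target_feat, h) for h in history_feats])
    # Step 2: weighted average of the historical features (the fusion feature).
    fusion = (weights[:, None] * history_feats).sum(axis=0) / (weights.sum() + 1e-12)
    # Step 3: second similarity between the target feature and the fusion feature.
    second_sim = cosine_sim(target_feat, fusion)
    # Step 4: threshold decision on whether the target is a historical pedestrian.
    return second_sim > sim_threshold, fusion
```

Features close to the stored history produce a fusion feature near the target and a high second similarity; an unrelated feature pulls the second similarity below the threshold.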
In an exemplary embodiment, the obtaining the target human body feature and the obtaining K historical human body features includes: comparing a target pedestrian number with a historical pedestrian number, wherein the target pedestrian number is a pedestrian number corresponding to the target pedestrian, and the historical pedestrian number is a pedestrian number corresponding to the historical pedestrian; and if the comparison fails, acquiring the target human body characteristics and acquiring the K historical human body characteristics.
It should be understood that the tracked pedestrian numbers (ids) in the DeepSORT process fall into two types. The first type is an id that has appeared before; for such an id, the current human body feature information only needs to be extracted through a ReID (Person Re-Identification) model and stored. The second type is a newly appeared id, which needs to be compared with the ids of historical pedestrians to determine whether the pedestrian corresponding to the id is a historical pedestrian. For the target pedestrian, its id can be compared with the ids of historical pedestrians in the video tracking process to preliminarily determine whether the target pedestrian is a historical pedestrian. If the comparison between the id of the target pedestrian and the ids of historical pedestrians fails, indicating that the target pedestrian may not be a historical pedestrian, the human body feature of the target pedestrian and the human body features of the K historical pedestrians need to be obtained, and whether the target pedestrian is a historical pedestrian is further determined through human body feature comparison.
It can be seen that, in this example, before the human body features of the target pedestrian are obtained and the human body features of the K historical pedestrians are obtained, the id of the target pedestrian is compared with the id of the historical pedestrian in the video tracking process to preliminarily determine whether the target pedestrian is the historical pedestrian, so as to further improve the accuracy of pedestrian tracking.
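The id-first flow of this example can be sketched as a small dispatch function. The dict-based store and the string return values are hypothetical scaffolding, not the patent's data layout; on a new id, the caller is expected to run the fusion-based feature comparison described in the text before deciding how to merge the track.

```python
def handle_track(track_id, body_feat, store):
    """Id comparison before any feature-level matching.

    store: hypothetical dict mapping known pedestrian id -> list of stored
    human body features. Returns "known" when the id comparison succeeds
    (the feature is simply extracted and stored), or "needs_feature_match"
    when the id is new and fusion-based comparison must decide whether the
    pedestrian is actually a historical one.
    """
    if track_id in store:
        # Comparison succeeds: an already-seen id, so just store the feature.
        store[track_id].append(body_feat)
        return "known"
    # Comparison fails: newly appeared id; record it provisionally and let
    # the caller run the feature-fusion check against historical features.
    store[track_id] = [body_feat]
    return "needs_feature_match"
```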
In an exemplary embodiment, the method further comprises: if the comparison is successful, acquiring the target human body characteristics and acquiring a target human body detection frame corresponding to the target pedestrian; arranging the target human body detection frame into a preset format; and storing the target human body detection frame in the preset format, the target human body characteristics and the target pedestrian number in an associated manner.
It should be understood that if the id of the target pedestrian is successfully compared with the id of the historical pedestrian in the video tracking process, which indicates that the target pedestrian is the historical pedestrian, the human body characteristics of the target pedestrian are obtained and the human body detection frame of the target pedestrian is obtained, then the human body detection frame of the target pedestrian is arranged into a preset format, and then the human body detection frame of the target pedestrian in the preset format, the human body characteristics of the target pedestrian and the id of the target pedestrian are stored in an associated manner.
Arranging the target human body detection frame into the preset format means arranging it into the (x, y, w, h) format.
As can be seen, in this example, if the id of the target pedestrian is successfully compared with the id of the historical pedestrian in the video tracking process, it is indicated that the target pedestrian is the historical pedestrian, and the related information of the target pedestrian is directly obtained and stored.
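A minimal sketch of the associated storage in this example. The class name, the corner-format input coordinates, and the list-based layout are illustrative assumptions; the text specifies only that the preset-format (x, y, w, h) detection frame, the human body feature, and the pedestrian number are stored in association with one another.

```python
from dataclasses import dataclass, field

@dataclass
class TrackRecord:
    """Associated storage of pedestrian id, (x, y, w, h) boxes, and features."""
    pedestrian_id: int
    boxes: list = field(default_factory=list)     # each box as (x, y, w, h)
    features: list = field(default_factory=list)  # one feature per stored box

    def add(self, x1, y1, x2, y2, feature):
        """Arrange a corner-format box into (x, y, w, h) and store it
        together with the human body feature (corner input is an assumption)."""
        self.boxes.append((x1, y1, x2 - x1, y2 - y1))
        self.features.append(feature)
```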
In an exemplary embodiment, the acquiring a target human body detection frame corresponding to the target pedestrian includes: inputting the current frame image into a pre-trained human body detection model to obtain N first human body detection frames corresponding to the current frame image, wherein N is a positive integer; screening out first human body detection frames with corresponding confidence degrees larger than a preset confidence degree threshold value from the N first human body detection frames to obtain M second human body detection frames, wherein M is a positive integer smaller than or equal to N; and acquiring the target human body detection frame from the M second human body detection frames.
The human body detection frame, used for detecting pedestrians in an image during pedestrian tracking, is the region image obtained after a pedestrian is framed, that is, the region of the frame image that contains the pedestrian. In addition, the pre-trained human body detection model may be a pedestrian detector.
For example, the current frame image is sent to the pedestrian detector with the confidence threshold set to 0.3; the first human body detection frames with confidence higher than 0.3 are extracted to obtain all the second human body detection frames, and the human body detection frame of the target pedestrian is then obtained from the second human body detection frames.
Therefore, in this example, when the human body detection frame in the current frame image is acquired, the human body detection frame with the confidence coefficient greater than the preset confidence coefficient threshold is used as the human body detection frame corresponding to the current frame image, which is beneficial to reducing the problem of inaccurate pedestrian tracking caused by low confidence coefficient of the human body detection frame, so that the accuracy of pedestrian tracking is improved.
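The confidence screening of this example reduces to a one-line filter. Representing detections as (box, confidence) pairs is an assumption about the detector output; the 0.3 default follows the example in the text.

```python
def filter_detections(detections, conf_threshold=0.3):
    """Keep only first detection frames whose confidence exceeds the threshold.

    detections: list of (box, confidence) pairs from the pedestrian detector.
    Returns the M second human body detection frames (M <= N).
    """
    return [box for box, conf in detections if conf > conf_threshold]
```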
In an exemplary embodiment, before the inputting the current frame image into a human body detection model trained in advance to obtain N first human body detection frames corresponding to the current frame image, the method further includes: judging whether the size of the current frame image is a preset size or not; and if the size of the current frame image is not the preset size, processing the current frame image to enable the size of the current frame image to be the preset size.
Specifically, the size of the input image of the pedestrian detector is specified to be 512 × 512, that is, the predetermined size is 512 × 512, and therefore, if the size of the current frame image is not 512 × 512, it needs to be processed so that the size thereof is 512 × 512.
It can be seen that, in this example, when the size of the current frame image does not conform to the predetermined size required by the human body detection model, the current frame image is preprocessed so that its size becomes the predetermined size, so that each frame image in the video stream can be used as input to the model, which is beneficial to improving the accuracy of the model, that is, the accuracy of pedestrian tracking.
In an exemplary embodiment, the processing the current frame image so that its size becomes the predetermined size includes: in the height dimension, if the height of the current frame image is larger than the height of the predetermined size, scaling the height of the current frame image down to the height of the predetermined size; if the height of the current frame image is smaller than the height of the predetermined size, padding h rows of zeros in the row direction of the current frame image, where h is the difference between the height of the predetermined size and the height of the current frame image; in the width dimension, if the width of the current frame image is larger than the width of the predetermined size, scaling the width of the current frame image down to the width of the predetermined size; if the width of the current frame image is smaller than the width of the predetermined size, padding w columns of zeros in the column direction of the current frame image, where w is the difference between the width of the predetermined size and the width of the current frame image.
Specifically, if the size of the current frame image is larger than the predetermined size, and the height and width of the predetermined size are equal, the processing the current frame image to make the size of the current frame image be the predetermined size includes: determining the ratio of the side length of the long edge of the current frame image to the side length of the preset size, wherein if the height of the current frame image is larger than or equal to the width, the side length of the long edge of the current frame image is the height of the current frame image; if the height of the current frame image is smaller than the width, the side length of the long edge of the current frame image is the width of the current frame image; the side length of the preset size is the height or the width of the preset size; reducing the current frame image according to the ratio to obtain a first target image, wherein the side length of a long side of the first target image is the side length of the preset size; if the side length of the short side of the first target image is equal to the side length of the preset size, the first target image is the processed current frame image; and if the side length of the short side of the first target image is smaller than the side length of the preset size, filling zero in the direction of the short side of the first target image to obtain a second target image, wherein the side length of the short side of the second target image is the side length of the preset size, and the second target image is the processed current frame image.
For example, a frame image in a video stream is generally 1920 × 1080 and needs to be converted to 512 × 512. The ratio of the long side to the target side is computed first (1920 / 512 = 3.75), and the image is scaled by this ratio, which leaves the short side at 1080 / 3.75 = 288 after scaling. Since 288 is smaller than 512, zero padding is applied, that is, the short side is padded from 288 up to 512.
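The scale-then-pad procedure for the 1920 × 1080 example can be sketched as follows; a nearest-neighbour resize stands in for whatever interpolation a real pipeline (e.g. an OpenCV resize) would use, and the function name is illustrative:

```python
import numpy as np

def resize_and_pad(img, target=512):
    """Scale the long side down to `target`, then zero-pad the short side."""
    h, w = img.shape[:2]
    ratio = max(h, w) / target                     # e.g. 1920 / 512 = 3.75
    new_h, new_w = round(h / ratio), round(w / ratio)
    # nearest-neighbour resize via index sampling (for brevity only)
    rows = np.minimum((np.arange(new_h) * ratio).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) * ratio).astype(int), w - 1)
    resized = img[rows][:, cols]
    out = np.zeros((target, target) + img.shape[2:], dtype=img.dtype)
    out[:new_h, :new_w] = resized                  # remaining rows/columns stay zero
    return out

frame = np.ones((1080, 1920, 3), dtype=np.uint8)
padded = resize_and_pad(frame)
print(padded.shape)  # (512, 512, 3): 288 content rows, 224 zero rows
```

Because both sides shrink by the same ratio, the aspect ratio of the pedestrians is preserved, and only the zero-padded region is wasted input.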
As can be seen, in this example, when the size of the current frame image does not meet the predetermined size required by the model training, if the size of the current frame image is larger than the predetermined size, the reduction processing is performed to make the size of the current frame image be the predetermined size; if the size of the current frame image is smaller than the preset size, zero filling processing is carried out to enable the size of the current frame image to be the preset size; therefore, each frame of image in the video stream can be used for training the video node structured model, and the accuracy of the model, namely the accuracy of pedestrian tracking, can be improved.
Referring to fig. 2, fig. 2 is a schematic flow chart of another pedestrian tracking method provided in the embodiment of the present application, where the pedestrian tracking method can be applied to a server, and the pedestrian tracking method includes, but is not limited to, the following steps.
201. The current frame image is obtained from the video stream.
202. The current frame image is input into the DeepSORT flow.
203. And judging whether the id of the target pedestrian is the same as that of the historical pedestrian.
204. If the id of the target pedestrian is the same as the id of a historical pedestrian, extract the human body features of the target pedestrian using a ReID model, and store the frame number of the current frame image, the human body detection frame of the target pedestrian, the human body features of the target pedestrian, and the id of the target pedestrian.
It should be understood that if the id of the target pedestrian is the same as the id of the historical pedestrian, it indicates that the target pedestrian is the historical pedestrian, and the related information of the target pedestrian may be directly stored, that is, the frame number of the current frame image, the human body detection frame of the target pedestrian, the human body characteristics of the target pedestrian, and the id of the target pedestrian are stored in the data storage (datastore) device.
The input of the pedestrian detector is generally a 512 × 512 image, while the current frame image is generally 1920 × 1080, so the current frame image is first preprocessed into a 512 × 512 image and then sent to the pedestrian detector. The confidence threshold is set to 0.3, and the human body detection frames whose confidence is higher than 0.3 are extracted, yielding all the human body detection frames of the current frame image. The human body detection frame of the target pedestrian is then input into the ReID model to extract the human body features of the target pedestrian.
205. If the id of the target pedestrian is not the same as the id of any historical pedestrian, extract the human body features of the target pedestrian using the ReID model.
It should be understood that if the id of the target pedestrian is not the same as the id of the history pedestrian, the target pedestrian may be a newly-appeared pedestrian or a history pedestrian whose tracking trajectory is broken.
206. Compute the similarity between the human body features of the target pedestrian and the human body features of each of the multiple historical pedestrians to obtain multiple first similarities; perform a weighted average over the human body features of the multiple historical pedestrians using the multiple first similarities as weights to obtain a fusion feature; and compute the similarity between the human body features of the target pedestrian and the fusion feature to obtain a second similarity.
The multiple historical pedestrians are historical pedestrians appearing in a video tracking process, namely the multiple historical pedestrians are historical pedestrians in a frame image of which the frame number in the video stream is smaller than that of the current frame image; the calculation formula for the fusion features is as described above.
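The first-similarity, fusion-feature, and second-similarity computation in step 206 can be sketched like this; cosine similarity and the weight normalisation are assumptions, since the patent only specifies a weighted average by the first similarities:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def second_similarity(target_feat, hist_feats):
    """K first similarities -> similarity-weighted fusion feature ->
    second similarity between the target feature and the fusion feature."""
    sims = [cosine(target_feat, f) for f in hist_feats]     # K first similarities
    total = sum(sims)
    weights = [s / total for s in sims]                     # normalisation is an assumption
    fused = [sum(w * f[i] for w, f in zip(weights, hist_feats))
             for i in range(len(target_feat))]              # fusion feature
    return cosine(target_feat, fused)                       # second similarity

sim = second_similarity([1.0, 0.0, 0.0],
                        [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]])
print(sim > 0.7)  # True for this toy data
```

Historical features most similar to the target contribute more to the fusion feature, so a single noisy stored feature cannot dominate the comparison.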
207. And judging whether the second similarity is larger than a preset similarity threshold value.
208. If the second similarity is larger than the preset similarity threshold, determining that the target pedestrian is a historical pedestrian, reconnecting the track of the target pedestrian, and storing the frame number of the current frame image, the human body detection frame of the target pedestrian, the human body characteristics of the target pedestrian and the id of the target pedestrian.
209. And if the second similarity is not greater than the preset similarity threshold, determining that the target pedestrian is not the historical pedestrian.
210. And storing the frame number of the current frame image, the human body detection frame of the target pedestrian, the human body characteristics of the target pedestrian and the id of the target pedestrian.
It should be understood that after the frame number of the current frame image, the human body detection frame of the target pedestrian, the human body characteristics of the target pedestrian and the id of the target pedestrian are stored, the pedestrian tracking process of the next frame image of the current frame image is executed, and the above steps are repeated until the video is finished.
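Steps 203 to 210 for a single detection can be sketched as one function; the store layout (id mapped to a list of (frame number, box, feature) records) and the threshold value are illustrative, not from the patent:

```python
import math

def _cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def process_detection(track_id, feat, frame_no, box, store, threshold=0.7):
    """If the id is known, just store (step 204); otherwise fuse each stored
    id's features, compare (steps 206-207), and reconnect on a match (step 208)."""
    if track_id not in store:
        best_id, best_sim = None, -1.0
        for hid, records in store.items():
            hist = [r[2] for r in records]
            sims = [_cos(feat, h) for h in hist]            # first similarities
            total = sum(sims)
            fused = [sum((s / total) * h[i] for s, h in zip(sims, hist))
                     for i in range(len(feat))]             # fusion feature
            second = _cos(feat, fused)                      # second similarity
            if second > best_sim:
                best_id, best_sim = hid, second
        if best_sim > threshold:
            track_id = best_id                              # reconnect the broken track
    store.setdefault(track_id, []).append((frame_no, box, feat))  # steps 204/210
    return track_id

store = {}
first = process_detection(1, [1.0, 0.0, 0.0], 0, (0, 0, 10, 20), store)
second = process_detection(7, [0.95, 0.05, 0.0], 1, (1, 1, 11, 21), store)
print(first, second)  # the new id 7 is reconnected to the stored track 1
```

Whether or not the match succeeds, the record is stored, so the next frame's comparison always has the freshest features available.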
It can be seen that, in the embodiment of the present application, the feature information of pedestrians in different time periods provides powerful auxiliary information about the human body during tracking. Features are stored for each distinct id; when a new id appears, weights are computed over all previously stored human body features of the existing ids, the features are fused, and similarity calculation and judgment are performed. This assists the pedestrian judgment in the tracking process, reduces track breaking, increases the accuracy of the tracking process, improves the key tracking indexes in actual scenes, and raises the practical value of deploying the algorithm.
The method of the embodiments of the present application is set forth above in detail and the apparatus of the embodiments of the present application is provided below.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a pedestrian tracking apparatus 300 provided in an embodiment of the present application, the pedestrian tracking apparatus is applied to a server, the pedestrian tracking apparatus 300 may include a determining unit 301, an obtaining unit 302, and a tracking unit 303, where details of each unit are as follows:
a determination unit 301, configured to determine a target pedestrian in the current frame image;
an obtaining unit 302, configured to obtain a target human body feature and K historical human body features, where the target human body feature is a human body feature of the target pedestrian, the K historical human body features are human body features of historical pedestrians in a previous frame image, the previous frame image is a previous frame image of the current frame image in a video stream, and K is a positive integer;
a tracking unit 303, configured to perform feature fusion according to the target human body features and the K historical human body features to determine whether the target pedestrian is the historical pedestrian.
In an exemplary embodiment, the tracking unit 303 is specifically configured to: respectively carrying out similarity calculation on the target human body features and the K historical human body features to obtain K first similarities; carrying out weighted average according to the K historical human body features and the K first similarities to obtain fusion features; similarity calculation is carried out on the target human body features and the fusion features to obtain second similarity; and if the second similarity is larger than a preset similarity threshold value, determining that the target pedestrian is a historical pedestrian.
In an exemplary embodiment, the obtaining unit 302 is specifically configured to: comparing a target pedestrian number with a historical pedestrian number, wherein the target pedestrian number is a pedestrian number corresponding to the target pedestrian, and the historical pedestrian number is a pedestrian number corresponding to the historical pedestrian; and if the comparison fails, acquiring the target human body characteristics and acquiring the K historical human body characteristics.
In an exemplary embodiment, the obtaining unit 302 is further configured to: if the comparison is successful, acquiring the target human body characteristics and acquiring a target human body detection frame corresponding to the target pedestrian; arranging the target human body detection frame into a preset format; and storing the target human body detection frame in the preset format, the target human body characteristics and the target pedestrian number in an associated manner.
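The arrange-and-store step can be sketched as follows; the patent does not specify the preset format, so the [x, y, w, h] layout and the record structure here are purely illustrative:

```python
def to_preset_format(box):
    """Convert an [x1, y1, x2, y2] detection frame to [x, y, w, h]
    (one common choice of format; an assumption, not from the patent)."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]

# store the formatted frame, the features, and the pedestrian number together
record = {
    "pedestrian_id": 42,
    "box": to_preset_format([10, 20, 50, 120]),
    "feature": [0.1, 0.9],
}
print(record["box"])  # [10, 20, 40, 100]
```

Keeping the detection frame, feature, and pedestrian number in one associated record is what allows the later fusion step to retrieve all stored features for a given id.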
In an exemplary embodiment, in the aspect of acquiring the target human body detection frame corresponding to the target pedestrian, the acquiring unit 302 is specifically configured to: inputting the current frame image into a pre-trained human body detection model to obtain N first human body detection frames corresponding to the current frame image, wherein N is a positive integer; screening out first human body detection frames with corresponding confidence degrees larger than a preset confidence degree threshold value from the N first human body detection frames to obtain M second human body detection frames, wherein M is a positive integer smaller than or equal to N; and acquiring the target human body detection frame from the M second human body detection frames.
In an exemplary embodiment, before the current frame image is input into a human detection model trained in advance to obtain N first human detection frames corresponding to the current frame image, the obtaining unit 302 is further configured to: judging whether the size of the current frame image is a preset size or not; and if the size of the current frame image is not the preset size, processing the current frame image to enable the size of the current frame image to be the preset size.
In an exemplary embodiment, in the aspect of processing the current frame image so that its size becomes the predetermined size, the obtaining unit 302 is specifically configured to: in the height dimension, if the height of the current frame image is larger than the height of the predetermined size, scale the height of the current frame image down to the height of the predetermined size; if the height of the current frame image is smaller than the height of the predetermined size, pad h rows of zeros in the row direction of the current frame image, where h is the difference between the height of the predetermined size and the height of the current frame image; in the width dimension, if the width of the current frame image is larger than the width of the predetermined size, scale the width of the current frame image down to the width of the predetermined size; if the width of the current frame image is smaller than the width of the predetermined size, pad w columns of zeros in the column direction of the current frame image, where w is the difference between the width of the predetermined size and the width of the current frame image.
It should be noted that the implementation of each unit may also correspond to the corresponding description of the method embodiment shown in fig. 1 or fig. 2. Of course, the pedestrian tracking apparatus 300 provided in the embodiment of the present application includes, but is not limited to, the above unit modules, such as: the pedestrian tracking device 300 may also include a memory unit 304, and the memory unit 304 may be used to store program codes and data for the pedestrian tracking device 300.
In the pedestrian tracking apparatus 300 depicted in fig. 3, when performing video tracking on a pedestrian, for each frame image (i.e., the current frame image), a target pedestrian in the current frame image is determined; then, the human body features of the target pedestrian and the human body features of K historical pedestrians in the video tracking process are acquired; feature fusion is then performed according to the human body features of the target pedestrian and the human body features of the K historical pedestrians, and whether the target pedestrian is a historical pedestrian is determined from the human body features of the target pedestrian and the fusion feature obtained by feature fusion. In the prior art, the human body features of a target pedestrian are directly compared with the human body features of a historical pedestrian to judge whether the target pedestrian is the historical pedestrian; here, instead, the human body features of the target pedestrian are compared with the fusion feature, which is obtained by fusing the human body features of multiple historical pedestrians, so the feature comparison is more accurate, which helps improve the accuracy of pedestrian tracking.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a server 410 according to an embodiment of the present disclosure, where the server 410 includes a processor 411, a memory 412, and a communication interface 413, and the processor 411, the memory 412, and the communication interface 413 are connected to each other through a bus 414.
The memory 412 includes, but is not limited to, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), or a portable read-only memory (CD-ROM), and the memory 412 is used for related computer programs and data. Communication interface 413 is used for receiving and transmitting data.
The processor 411 may be one or more Central Processing Units (CPUs), and in the case that the processor 411 is one CPU, the CPU may be a single-core CPU or a multi-core CPU.
The processor 411 in the server 410 is configured to read the computer program code stored in the memory 412, and perform the following operations: determining a target pedestrian in the current frame image; acquiring target human body features and K historical human body features, wherein the target human body features are human body features of the target pedestrians, the K historical human body features are human body features of historical pedestrians in a previous frame image, the previous frame image is a previous frame image of the current frame image in a video stream, and K is a positive integer; and performing feature fusion according to the target human body features and the K historical human body features to determine whether the target pedestrian is the historical pedestrian.
It should be noted that the implementation of each operation may also correspond to the corresponding description of the method embodiment shown in fig. 1 or fig. 2.
In the server 410 depicted in fig. 4, when performing video tracking on a pedestrian, for each frame image (i.e., the current frame image), a target pedestrian in the current frame image is determined; then, the human body features of the target pedestrian and the human body features of K historical pedestrians in the video tracking process are acquired; feature fusion is then performed according to the human body features of the target pedestrian and the human body features of the K historical pedestrians, and whether the target pedestrian is a historical pedestrian is determined from the human body features of the target pedestrian and the fusion feature obtained by feature fusion. In the prior art, the human body features of a target pedestrian are directly compared with the human body features of a historical pedestrian to judge whether the target pedestrian is the historical pedestrian; here, instead, the human body features of the target pedestrian are compared with the fusion feature, which is obtained by fusing the human body features of multiple historical pedestrians, so the feature comparison is more accurate, which helps improve the accuracy of pedestrian tracking.
The embodiment of the present application further provides a chip, where the chip includes at least one processor, a memory, and an interface circuit; the memory, the interface circuit, and the at least one processor are interconnected by a line, and the memory stores a computer program; when the computer program is executed by the processor, the method flow shown in fig. 1 or fig. 2 is implemented.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the method flow shown in fig. 1 or fig. 2 is implemented.
The embodiment of the present application further provides a computer program product, and when the computer program product runs on a computer, the method flow shown in fig. 1 or fig. 2 is implemented.
It should be understood that the Processor mentioned in the embodiments of the present Application may be a Central Processing Unit (CPU), and may also be other general purpose processors, Digital Signal Processors (DSP), Application Specific Integrated Circuits (ASIC), Field Programmable Gate Arrays (FPGA) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, and the like. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
It will also be appreciated that the memory referred to in the embodiments of the application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The non-volatile Memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash Memory. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of example, but not limitation, many forms of RAM are available, such as Static random access memory (Static RAM, SRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic random access memory (Synchronous DRAM, SDRAM), Double Data Rate Synchronous Dynamic random access memory (DDR SDRAM), Enhanced Synchronous SDRAM (ESDRAM), Synchronous link SDRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
It should be noted that when the processor is a general-purpose processor, a DSP, an ASIC, an FPGA or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, the memory (memory module) is integrated in the processor.
It should be noted that the memory described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
It should also be understood that reference herein to first, second, third, fourth, and various numerical designations is made only for ease of description and should not be used to limit the scope of the present application.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the former and latter associated objects.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the above-described division of units is only one type of division of logical functions, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The above functions, if implemented in the form of software functional units and sold or used as a separate product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device can be merged, divided and deleted according to actual needs.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A pedestrian tracking method, comprising:
determining a target pedestrian in the current frame image;
acquiring target human body features and K historical human body features, wherein the target human body features are human body features of the target pedestrians, the K historical human body features are human body features of historical pedestrians in a previous frame image, the previous frame image is a previous frame image of the current frame image in a video stream, and K is a positive integer;
and performing feature fusion according to the target human body features and the K historical human body features to determine whether the target pedestrian is the historical pedestrian.
2. The method according to claim 1, wherein the performing feature fusion according to the target human body feature and the K historical human body features to determine whether the target pedestrian is the historical pedestrian comprises:
respectively carrying out similarity calculation on the target human body features and the K historical human body features to obtain K first similarities;
carrying out weighted average according to the K historical human body features and the K first similarities to obtain fusion features;
similarity calculation is carried out on the target human body features and the fusion features to obtain second similarity;
and if the second similarity is larger than a preset similarity threshold value, determining that the target pedestrian is a historical pedestrian.
3. The method according to claim 1 or 2, wherein the obtaining of the target human body features and the obtaining of the K historical human body features comprise:
comparing a target pedestrian number with a historical pedestrian number, wherein the target pedestrian number is a pedestrian number corresponding to the target pedestrian, and the historical pedestrian number is a pedestrian number corresponding to the historical pedestrian;
and if the comparison fails, acquiring the target human body characteristics and acquiring the K historical human body characteristics.
4. The method of claim 3, further comprising:
if the comparison is successful, acquiring the target human body characteristics and acquiring a target human body detection frame corresponding to the target pedestrian;
arranging the target human body detection frame into a preset format;
and storing the target human body detection frame in the preset format, the target human body characteristics and the target pedestrian number in an associated manner.
5. The method according to claim 4, wherein the obtaining of the target human body detection frame corresponding to the target pedestrian comprises:
inputting the current frame image into a pre-trained human body detection model to obtain N first human body detection frames corresponding to the current frame image, wherein N is a positive integer;
screening out first human body detection frames with corresponding confidence degrees larger than a preset confidence degree threshold value from the N first human body detection frames to obtain M second human body detection frames, wherein M is a positive integer smaller than or equal to N;
and acquiring the target human body detection frame from the M second human body detection frames.
6. The method of claim 5, wherein before the inputting the current frame image into a pre-trained human body detection model to obtain N first human body detection boxes corresponding to the current frame image, the method further comprises:
judging whether the size of the current frame image is a preset size or not;
and if the size of the current frame image is not the preset size, processing the current frame image to enable the size of the current frame image to be the preset size.
7. The method according to claim 6, wherein the processing the current frame image to make the size of the current frame image be the predetermined size comprises:
in the height dimension, if the height of the current frame image is larger than the height of the predetermined size, scaling the height of the current frame image to the height of the predetermined size; if the height of the current frame image is smaller than the height of the predetermined size, padding h rows of zeros in the row direction of the current frame image, where h is the difference between the height of the predetermined size and the height of the current frame image;
in the width dimension, if the width of the current frame image is larger than the width of the predetermined size, scaling the width of the current frame image to the width of the predetermined size; if the width of the current frame image is smaller than the width of the predetermined size, padding w columns of zeros in the column direction of the current frame image, where w is the difference between the width of the predetermined size and the width of the current frame image.
8. A pedestrian tracking apparatus, comprising:
a determining unit, configured to determine a target pedestrian in a current frame image;
an acquisition unit, configured to acquire target human body features and K historical human body features, wherein the target human body features are human body features of the target pedestrian, the K historical human body features are human body features of historical pedestrians in a previous frame image, the previous frame image is the frame preceding the current frame image in a video stream, and K is a positive integer;
and a tracking unit, configured to perform feature fusion on the target human body features and the K historical human body features to determine whether the target pedestrian is a historical pedestrian.
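One plausible reading of the tracking unit's matching step — comparing the target feature against each of the K historical features and accepting the most similar one above a threshold — can be sketched as follows; the cosine-similarity measure, threshold, and all names are assumptions, since the claim does not define the fusion in detail:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match_pedestrian(target_feat, history_feats, sim_threshold=0.8):
    """Return the index of the best-matching historical pedestrian,
    or None if no historical feature clears the similarity threshold."""
    best_idx, best_sim = None, sim_threshold
    for i, feat in enumerate(history_feats):
        sim = cosine_similarity(target_feat, feat)
        if sim > best_sim:
            best_idx, best_sim = i, sim
    return best_idx
```

A `None` result would correspond to the target pedestrian being a new track rather than one of the K historical pedestrians.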
9. A server, comprising a processor, a memory, a communication interface, and one or more programs stored in the memory and configured to be executed by the processor, the one or more programs comprising instructions for performing the steps in the method of any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores a computer program for electronic data exchange, wherein the computer program causes a computer to perform the method according to any one of claims 1-7.
CN202011335074.2A 2020-11-24 2020-11-24 Pedestrian tracking method and related equipment Pending CN112418104A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011335074.2A CN112418104A (en) 2020-11-24 2020-11-24 Pedestrian tracking method and related equipment

Publications (1)

Publication Number Publication Date
CN112418104A true CN112418104A (en) 2021-02-26

Family

ID=74842928


Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832741A (en) * 2017-11-28 2018-03-23 北京小米移动软件有限公司 The method, apparatus and computer-readable recording medium of facial modeling
CN108229336A (en) * 2017-12-13 2018-06-29 北京市商汤科技开发有限公司 Video identification and training method and device, electronic equipment, program and medium
WO2020143179A1 (en) * 2019-01-08 2020-07-16 虹软科技股份有限公司 Article identification method and system, and electronic device
CN111460926A (en) * 2020-03-16 2020-07-28 华中科技大学 Video pedestrian detection method fusing multi-target tracking clues
CN111523383A (en) * 2020-03-19 2020-08-11 创新奇智(北京)科技有限公司 Non-perception face recognition system and method based on pedestrian ReID
CN111914756A (en) * 2020-08-03 2020-11-10 北京环境特性研究所 Video data processing method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FENGJIAN SHE et al., "Target Detection and Tracking Based on Multi-feature Fusion", IEEE, 23 November 2019, pages 526-530, XP033696883, DOI: 10.1109/iSPEC48194.2019.8975022 *
FENG Xingchen et al., "Multi-feature fusion algorithm for pedestrian tracking" (行人跟踪的多特征融合算法研究), Journal of Signal Processing (信号处理), 30 November 2016, pages 1308-1317 *
XIAO Shuang, "Research on multi-technology fusion indoor monitoring applications based on pedestrian detection and target tracking" (基于行人检测与目标跟踪的多技术融合室内监控应用研究), China Master's Theses Full-text Database, Information Science and Technology series (monthly), 15 March 2017, pages 138-5664 *

Similar Documents

Publication Publication Date Title
CN109902546B (en) Face recognition method, face recognition device and computer readable medium
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
CN109960742B (en) Local information searching method and device
US20200175062A1 (en) Image retrieval method and apparatus, and electronic device
CN110489951B (en) Risk identification method and device, computer equipment and storage medium
CN108764041B (en) Face recognition method for lower shielding face image
CN111104925B (en) Image processing method, image processing apparatus, storage medium, and electronic device
CN110689043A (en) Vehicle fine granularity identification method and device based on multiple attention mechanism
CN112668480A (en) Head attitude angle detection method and device, electronic equipment and storage medium
CN111860398A (en) Remote sensing image target detection method and system and terminal equipment
CN112613515A (en) Semantic segmentation method and device, computer equipment and storage medium
CN111401196A (en) Method, computer device and computer readable storage medium for self-adaptive face clustering in limited space
CN110765903A (en) Pedestrian re-identification method and device and storage medium
CN114519877A (en) Face recognition method, face recognition device, computer equipment and storage medium
CN111339884A (en) Image recognition method and related equipment and device
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
CN112101456A (en) Attention feature map acquisition method and device and target detection method and device
CN113139417A (en) Action object tracking method and related equipment
CN113850179A (en) Image detection method, and training method, device, equipment and medium of related model
CN115908831B (en) Image detection method and device
CN112418104A (en) Pedestrian tracking method and related equipment
CN116258873A (en) Position information determining method, training method and device of object recognition model
CN112418098A (en) Training method of video structured model and related equipment
CN113177603B (en) Training method of classification model, video classification method and related equipment
CN112733741B (en) Traffic sign board identification method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination