CN111597980A - Target object clustering method and device

Info

Publication number: CN111597980A
Authority: CN (China)
Prior art keywords: sub, image, images, category set, category
Legal status: Granted
Application number: CN202010408248.7A
Other languages: Chinese (zh)
Other versions: CN111597980B (en)
Inventors: 张修宝, 沈海峰
Current Assignee: Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee: Beijing Didi Infinity Technology and Development Co Ltd
Application filed by Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN202010408248.7A
Publication of CN111597980A
Application granted
Publication of CN111597980B
Legal status: Active

Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F 18/23213: Clustering techniques; non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions, with a fixed number of clusters, e.g. K-means clustering
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The application relates to the technical field of image processing, and in particular to a target object clustering method and device. The method comprises: acquiring a surveillance video, intercepting sub-images that contain target object information from the images of the surveillance video, and recording the frame number of each sub-image; extracting a feature vector for each sub-image, and dividing the sub-images that contain the same target object information into the same category set based on the extracted feature vectors; and determining a category set to be adjusted based on the frame numbers of the sub-images in each divided category set, and adjusting the sub-images in that category set. This method improves the accuracy of target object clustering.

Description

Target object clustering method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a target object clustering method and apparatus.
The present application is a divisional application of the patent application with application number 201811544296.8.
Background
Application fields such as video surveillance, security protection, and autonomous driving generally involve detecting a target object in a surveillance video, where the target object is, for example, a pedestrian or a vehicle appearing in the video. In some application scenarios, for example when determining the activity of a target object in the area monitored by a surveillance video, the images of all target objects that appear need to be screened out of the video, which involves clustering the images of the target objects appearing in the surveillance video.
In the prior art, common clustering algorithms mainly include K-means, KD-tree, and the like. Such algorithms require a category threshold to be determined first, that is, how many categories the target objects should be clustered into. When clustering target objects in a surveillance video, however, it is often difficult to determine in advance how many categories are needed, so the choice of category threshold may carry a large error and the clustering accuracy is consequently low.
Disclosure of Invention
In view of this, embodiments of the present application provide a target object clustering method and apparatus to improve the clustering accuracy in the target object detection process.
Mainly comprises the following aspects:
in a first aspect, an embodiment of the present application provides a target object clustering method, including:
acquiring a monitoring video, intercepting sub-images comprising target object information from an image of the monitoring video, and recording the frame number of each sub-image;
extracting a feature vector of each sub-image, and dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image;
and determining a category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, and adjusting the sub-images included in the category set to be adjusted.
In a possible embodiment, the dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image includes:
for an ith sub-image to be clustered, dividing the ith sub-image into a kth category set; i and k are positive integers;
and selecting the sub-images which meet the clustering condition of the kth class set from the sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth class set.
In a possible embodiment, the selecting, from the sub-images to be clustered except for the ith sub-image, a sub-image that meets the clustering condition of the kth class set and dividing the sub-image into the kth class set includes:
sequentially selecting sub-images from the sub-images to be clustered except for the ith sub-image, and executing a first clustering process until all the sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when the first feature similarity corresponding to the selected jth sub-image is determined to be larger than a first set threshold, dividing the jth sub-image into the kth category set;
taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth category set;
and dividing the residual sub-images of which the corresponding second feature similarity is greater than the first set threshold into the kth category set based on the corresponding second feature similarity of each residual sub-image.
In a possible embodiment, the dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image includes:
sequentially selecting sub-images to be clustered from the intercepted sub-images, and executing a second clustering process by taking the selected sub-images as clustering centers until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening out the subimages to be clustered, of which the corresponding third feature similarity is greater than a first set threshold value, based on the corresponding third feature similarity of each subimage to be clustered;
and dividing the selected sub-images and the corresponding sub-images to be clustered, of which the third feature similarity is greater than a first set threshold, into the same category set.
In a possible implementation manner, the determining a category set to be adjusted based on frame numbers of sub-images included in each divided category set includes:
determining the maximum frame number and the minimum frame number of the sub-images in each classified set;
detecting whether intermediate frame numbers between the maximum frame number and the minimum frame number respectively corresponding to each classified set are continuous or not;
and determining the class set with discontinuous intermediate frame numbers as the class set to be adjusted.
In a possible embodiment, adjusting the sub-images included in the category set to be adjusted includes:
determining missing intermediate frame numbers in an nth to-be-adjusted category set aiming at the nth to-be-adjusted category set;
determining a first candidate sub-image matched with the missing intermediate frame number in other category sets except the nth category set to be adjusted; determining a first reference sub-image matched with a frame number adjacent to the missing intermediate frame number in the nth to-be-adjusted category set;
calculating a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image;
screening out first candidate sub-images with corresponding fourth feature similarity larger than a second set threshold value based on the corresponding fourth feature similarity of each first reference sub-image;
and dividing the screened first candidate subimages into the nth class set to be adjusted.
In a possible embodiment, determining a first candidate sub-image in the category sets other than the nth category set to be adjusted, which matches the missing inter-frame number, includes:
screening out a first candidate category set with the frame number of the sub-image as the missing intermediate frame number from the other category sets;
determining a first candidate sub-image whose frame number included in the first candidate category set is the missing intermediate frame number.
In a possible implementation manner, determining a category set to be adjusted based on frame numbers of sub-images included in each divided category set includes:
aiming at the divided nth class set, determining the maximum frame number of the sub-image in the nth class set and a second reference sub-image matched with the maximum frame number;
determining a second candidate sub-image which is matched with a frame number next to the maximum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vectors of the second reference sub-images and the feature vectors of each second candidate sub-image;
when a second candidate sub-image with the fourth feature similarity larger than a second set threshold exists, determining a second candidate category set where the second candidate sub-image with the fourth feature similarity larger than the second set threshold exists and the nth category set as category sets to be adjusted.
In a possible embodiment, the adjusting the sub-images included in the category set to be adjusted includes:
dividing a second candidate sub-image included in the second candidate category set, wherein the fourth feature similarity is greater than the second set threshold, into the nth category set.
In a possible implementation manner, determining a category set to be adjusted based on frame numbers of sub-images included in each divided category set includes:
aiming at the divided nth class set, determining the minimum frame number of the sub-image in the nth class set and a third reference sub-image matched with the minimum frame number;
determining a third candidate sub-image which is matched with the frame number which is the last frame number of the minimum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vectors of the third reference sub-images and the feature vectors of each third candidate sub-image;
and when a third candidate sub-image with the fourth feature similarity larger than a second set threshold exists, determining a third candidate category set where the third candidate sub-image with the fourth feature similarity larger than the second set threshold exists and the nth category set as category sets to be adjusted.
In a possible embodiment, the adjusting the sub-images included in the category set to be adjusted includes:
and dividing a third candidate sub-image with the fourth feature similarity larger than the second set threshold included in the third candidate category set into the nth category set.
In a second aspect, an embodiment of the present application provides a target object clustering device, including:
the acquisition module is used for acquiring a monitoring video, intercepting sub-images comprising target object information from an image of the monitoring video and recording the frame number of each sub-image;
the dividing module is used for extracting the characteristic vector of each sub-image and dividing the sub-images comprising the same target object information into the same category set based on the extracted characteristic vector of each sub-image;
and the adjusting module is used for determining the category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, and adjusting the sub-images included in the category set to be adjusted.
In a possible design, when the extracted feature vector of each sub-image is used to divide the sub-images including the same target object information into the same category set, the dividing module is specifically configured to:
for an ith sub-image to be clustered, dividing the ith sub-image into a kth category set; i and k are positive integers;
and selecting the sub-images which meet the clustering condition of the kth class set from the sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth class set.
In a possible design, when selecting, from the sub-images to be clustered other than the ith sub-image, the sub-images that meet the clustering condition of the kth category set and dividing them into the kth category set, the dividing module is specifically configured to:
sequentially selecting sub-images from the sub-images to be clustered except for the ith sub-image, and executing a first clustering process until all the sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when the first feature similarity corresponding to the selected jth sub-image is determined to be larger than a first set threshold, dividing the jth sub-image into the kth category set;
taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth category set;
and dividing the residual sub-images of which the corresponding second feature similarity is greater than the first set threshold into the kth category set based on the corresponding second feature similarity of each residual sub-image.
In a possible design, when the extracted feature vector of each sub-image is used to divide the sub-images including the same target object information into the same category set, the dividing module is specifically configured to:
sequentially selecting sub-images to be clustered from the intercepted sub-images, and executing a second clustering process by taking the selected sub-images as clustering centers until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening out the subimages to be clustered, of which the corresponding third feature similarity is greater than a first set threshold value, based on the corresponding third feature similarity of each subimage to be clustered;
and dividing the selected sub-images and the corresponding sub-images to be clustered, of which the third feature similarity is greater than a first set threshold, into the same category set.
In a possible design, when determining the class set to be adjusted based on frame numbers of sub-images included in each of the divided class sets, the adjusting module is specifically configured to:
determining the maximum frame number and the minimum frame number of the sub-images in each classified set;
detecting whether intermediate frame numbers between the maximum frame number and the minimum frame number respectively corresponding to each classified set are continuous or not;
and determining the class set with discontinuous intermediate frame numbers as the class set to be adjusted.
In a possible design, when adjusting the sub-images included in the category set to be adjusted, the adjusting module is specifically configured to:
determining missing intermediate frame numbers in an nth to-be-adjusted category set aiming at the nth to-be-adjusted category set;
determining a first candidate sub-image matched with the missing intermediate frame number in other category sets except the nth category set to be adjusted; determining a first reference sub-image matched with a frame number adjacent to the missing intermediate frame number in the nth to-be-adjusted category set;
calculating a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image;
screening out first candidate sub-images with corresponding fourth feature similarity larger than a second set threshold value based on the corresponding fourth feature similarity of each first reference sub-image;
and dividing the screened first candidate subimages into the nth class set to be adjusted.
In a possible design, when determining a first candidate sub-image in the category sets other than the nth category set to be adjusted, which matches the missing inter-frame number, the adjusting module is specifically configured to:
screening out a first candidate category set with the frame number of the sub-image as the missing intermediate frame number from the other category sets;
determining a first candidate sub-image whose frame number included in the first candidate category set is the missing intermediate frame number.
In a possible design, when determining the class set to be adjusted based on frame numbers of sub-images included in each of the divided class sets, the adjusting module is specifically configured to:
aiming at the divided nth class set, determining the maximum frame number of the sub-image in the nth class set and a second reference sub-image matched with the maximum frame number;
determining a second candidate sub-image which is matched with a frame number next to the maximum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vectors of the second reference sub-images and the feature vectors of each second candidate sub-image;
when a second candidate sub-image with the fourth feature similarity larger than a second set threshold exists, determining a second candidate category set where the second candidate sub-image with the fourth feature similarity larger than the second set threshold exists and the nth category set as category sets to be adjusted.
In a possible design, when adjusting the sub-images included in the category set to be adjusted, the adjusting module is specifically configured to:
dividing a second candidate sub-image included in the second candidate category set, wherein the fourth feature similarity is greater than the second set threshold, into the nth category set.
In a possible design, when determining the class set to be adjusted based on frame numbers of sub-images included in each of the divided class sets, the adjusting module is specifically configured to:
aiming at the divided nth class set, determining the minimum frame number of the sub-image in the nth class set and a third reference sub-image matched with the minimum frame number;
determining a third candidate sub-image which is matched with the frame number which is the last frame number of the minimum frame number in other category sets except the nth category set;
calculating a fourth feature similarity between the feature vectors of the third reference sub-images and the feature vectors of each third candidate sub-image;
and when a third candidate sub-image with the fourth feature similarity larger than a second set threshold exists, determining a third candidate category set where the third candidate sub-image with the fourth feature similarity larger than the second set threshold exists and the nth category set as category sets to be adjusted.
In a possible design, when adjusting the sub-images included in the category set to be adjusted, the adjusting module is specifically configured to:
and dividing a third candidate sub-image with the fourth feature similarity larger than the second set threshold included in the third candidate category set into the nth category set.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a processor, a memory and a bus, wherein the memory stores machine-readable instructions executable by the processor, the processor and the memory communicate via the bus when the electronic device is running, and the machine-readable instructions, when executed by the processor, perform the steps of the method for clustering target objects according to the first aspect or any one of the possible embodiments of the first aspect.
In a fourth aspect, this application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the method for clustering target objects in the foregoing first aspect, or any possible implementation manner of the first aspect.
According to the target object clustering method and device provided by the embodiments of the present application, sub-images comprising target object information are intercepted from the images of the surveillance video. When these sub-images are clustered, the sub-images comprising the same target object information are first divided into the same category set based on the extracted feature vector of each sub-image, and the sub-images in each category set can then be adjusted using the frame numbers of the sub-images in that set. In this way, the sub-images are first roughly clustered using their feature vectors, and the category sets are then adjusted according to the frame numbers of their sub-images, which reduces the error produced when clustering only according to the feature vectors of the sub-images and improves the clustering accuracy.
In order to make the aforementioned objects, features and advantages of the embodiments of the present application more comprehensible, embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
Fig. 1 is a schematic flowchart illustrating a target object clustering method provided in an embodiment of the present application;
fig. 2 is a schematic flow chart illustrating a first clustering process provided in an embodiment of the present application;
fig. 3 is a flow chart illustrating a second clustering process provided by an embodiment of the present application;
FIG. 4 is a flowchart illustrating a method for determining a category set to be adjusted according to an embodiment of the present application;
fig. 5 is a flowchart illustrating a method for adjusting missing inter frame numbers in a category set to be adjusted according to an embodiment of the present application;
FIG. 6 is a flow chart of another method for determining a set of categories to be adjusted according to an embodiment of the present application;
FIG. 7 is a flow chart illustrating another method for determining a set of categories to be adjusted according to an embodiment of the present application;
fig. 8 is a schematic architecture diagram of a target object clustering apparatus 800 according to an embodiment of the present application;
fig. 9 shows a schematic structural diagram of an electronic device 900 provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The following detailed description of the embodiments of the present application is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
First, an application scenario to which the present application is applicable will be described. The method and device provided by the present application can be applied to scenarios such as determining, from a surveillance video, the activity of a target object within the area monitored by that video. Illustratively, the target object is, for example, a pedestrian or a vehicle; by recognizing the images of the pedestrian or vehicle appearing in the surveillance video, the movement trajectory of the pedestrian or vehicle within the monitored area can be deduced.
It is worth noting that, in the prior art, the common clustering algorithms used to cluster target objects are mainly K-means, KD-tree, and the like, and such algorithms usually require a clustering category threshold to be set in advance. In practical applications, however, when clustering the target objects in a surveillance video it is often difficult to determine how many categories the target objects should be divided into, so the selected category threshold may carry a large error, and the clustering accuracy is consequently low.
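For illustration only (this snippet is not part of the patent), the following Python sketch shows that an off-the-shelf K-means implementation must be given the number of clusters up front, which is exactly the category threshold that is hard to choose when the number of target objects in a surveillance video is unknown; the feature dimensions and cluster count here are arbitrary.

```python
# Illustration only: standard K-means needs the cluster count fixed in advance.
import numpy as np
from sklearn.cluster import KMeans

features = np.random.rand(100, 128)   # toy stand-in for 100 sub-image feature vectors
labels = KMeans(n_clusters=3, n_init=10).fit_predict(features)  # k must be guessed beforehand
```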
In view of the above problems, the present application provides a method and an apparatus for clustering target objects which, after intercepting the sub-images containing target object information from the surveillance video, divide the sub-images into category sets according to their feature vectors, so that sub-images containing the same target object information fall into the same category set, and then adjust the sub-images in the category sets using the frame numbers of the sub-images included in each set. In this way, the accuracy of clustering the target objects appearing in the surveillance video can be improved.
The technical solutions provided in the present application are described in detail below with reference to specific examples.
Referring to fig. 1, a schematic flow chart of a target object clustering method provided in the embodiment of the present application is shown, including the following steps:
step 101, acquiring a surveillance video, intercepting sub-images including target object information from an image of the surveillance video, and recording a frame number of each sub-image.
The target object information may be pixel information of an area where the target object is located on an image including the target object in the monitored video.
In a specific implementation, not every frame of the surveillance video contains the target object information; for example, if the target object is a certain pedestrian, that pedestrian may appear in the acquired video only during a certain period of time. Therefore, the images containing the target object information may first be identified and screened out of the surveillance video, and the sub-images containing the target object information may then be intercepted from the screened images.
In a possible implementation manner, each frame of the surveillance video may be identified to determine whether it includes the target object information, or one frame may be selected for identification every preset number of frames. When a certain frame is determined to include the target object information, the whole image may be taken as a sub-image; alternatively, the target object may be labelled in the image, for example with a rectangular frame, and the labelled rectangular region may then be cut out as the sub-image.
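As a rough illustration of step 101 (not prescribed by the patent), the sketch below crops sub-images from the frames of a surveillance video and records their frame numbers. The callable `detect_targets` is a hypothetical detector supplied by the caller and returning bounding boxes, since the patent does not specify any particular detection method.

```python
# A minimal sketch of step 101, assuming a caller-supplied detector.
import cv2

def extract_sub_images(video_path, detect_targets, frame_step=1):
    """Return a list of (frame_number, cropped sub-image) pairs.

    detect_targets(frame) is a hypothetical callable returning (x, y, w, h) boxes.
    frame_step > 1 identifies only every frame_step-th frame, as described above.
    """
    capture = cv2.VideoCapture(video_path)
    sub_images = []
    frame_number = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if frame_number % frame_step == 0:
            for (x, y, w, h) in detect_targets(frame):
                # The labelled rectangular region is cut out as the sub-image.
                sub_images.append((frame_number, frame[y:y + h, x:x + w]))
        frame_number += 1
    capture.release()
    return sub_images
```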
Step 102, extracting the feature vector of each sub-image, and dividing the sub-images comprising the same target object information into the same category set based on the extracted feature vector of each sub-image.
In an embodiment of the present application, when the sub-images including the same target object information are divided into the same category set based on the extracted feature vector of each sub-image, taking the ith sub-image to be clustered as an example, the ith sub-image may be divided into the kth category set, where i and k are positive integers; the sub-images that meet the clustering condition of the kth category set are then selected from the sub-images to be clustered other than the ith sub-image and divided into the kth category set.
For example, if the sub-images to be clustered are the 1 st sub-image, the 2 nd sub-image, the 3 rd sub-image, the 4 th sub-image, and the 5 th sub-image; the 1 st sub-image may be classified into the 1 st category set, and then sub-images meeting the clustering condition of the 1 st category set are selected from the 2 nd sub-image, the 3 rd sub-image, the 4 th sub-image, and the 5 th sub-image and classified into the 1 st category set.
Specifically, when selecting sub-images from the sub-images to be clustered other than the ith sub-image and dividing them into the kth category set, the sub-images meeting the clustering condition of the kth category set may be selected sequentially, and the first clustering process is executed until all the sub-images to be clustered have been traversed. The first clustering process is shown in fig. 2 and includes the following steps:
step 201, calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image.
The feature vector of the selected sub-image may reflect the features of the target object included in the selected sub-image, the feature vector of the ith sub-image may reflect the features of the target object included in the ith sub-image, and the similarity between the target object included in the selected sub-image and the target object included in the ith sub-image may be obtained by calculating the first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image.
In an example, when the first feature similarity is calculated, for example, the cosine similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image may be calculated, where the cosine similarity may be calculated according to the following formula:
$$\cos\theta = \frac{\sum_{t=1}^{n} A_t B_t}{\sqrt{\sum_{t=1}^{n} A_t^{2}}\;\sqrt{\sum_{t=1}^{n} B_t^{2}}}$$

where A denotes the feature vector of the ith sub-image and B denotes the feature vector of the selected sub-image (A_t and B_t being their t-th components), n denotes the number of feature values contained in a feature vector, and θ denotes the angle between the feature vector of the selected sub-image and the feature vector of the ith sub-image.
After the value of cos θ is calculated according to the above formula, the cosine value can be used to represent the first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image, and the smaller θ is, the closer cos θ is to 1, indicating that the two feature vectors are more similar.
In another example, the euclidean distance between the feature vector of the selected sub-image and the feature vector of the ith sub-image may be calculated, and the calculated euclidean distance is used as the first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image, and a specific computation method of the euclidean distance is not described here.
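For reference, a minimal NumPy sketch of the two similarity measures mentioned above (cosine similarity and Euclidean distance) between feature vectors is given below; the function names are illustrative and are reused in the later sketches.

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = A.B / (|A| |B|); values closer to 1 mean more similar vectors."""
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    """Smaller distances mean more similar feature vectors."""
    return float(np.linalg.norm(np.asarray(a) - np.asarray(b)))
```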
Step 202, when it is determined that the first feature similarity corresponding to the selected jth sub-image is greater than a first set threshold, dividing the jth sub-image into a kth category set.
In specific implementation, the jth sub-image is set as any one of the sub-images to be clustered except the ith sub-image, and when the first feature similarity between the feature vector of the jth sub-image and the feature vector of the ith sub-image is calculated to be larger than a first set threshold, the jth sub-image is divided into a category set where the ith sub-image is located.
Step 203, taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth category set.
For example, if the mth sub-image is any one of the sub-images to be clustered except the ith sub-image and the jth sub-image, a second feature similarity between the feature vector of the mth sub-image and the feature vector of the ith sub-image may be calculated, or a second feature similarity between the feature vector of the mth sub-image and the feature vector of the jth sub-image may be calculated.
In a possible embodiment, the second feature similarity may be calculated in the same way as the first feature similarity, which is not described again here; alternatively, the first feature similarity may be calculated with the cosine similarity while the second feature similarity is calculated with the Euclidean distance or a hash algorithm. The Euclidean distance and hash-based calculations are not described in detail in the embodiments of the present application.
Step 204, based on the second feature similarity corresponding to each residual sub-image, dividing the residual sub-images with the corresponding second feature similarity larger than the first set threshold into the kth category set.
Specifically, a second feature similarity between a feature vector of each sub-image except the ith sub-image and the jth sub-image and a feature vector of the ith sub-image or a feature vector of the jth sub-image in the sub-images to be clustered may be calculated, and the sub-images with the second feature similarity greater than a first set threshold may be classified into a kth category set.
For example, suppose that after the ith and jth sub-images have been divided into the kth category set, the sub-images to be clustered still include the a-th, b-th, c-th and d-th sub-images. The second feature similarities between the feature vectors of the a-th, b-th, c-th and d-th sub-images and the feature vector of the ith or jth sub-image can then be calculated respectively. When the second feature similarity between the feature vector of the a-th sub-image and the feature vector of the ith sub-image is greater than the first set threshold, the a-th sub-image is divided into the kth category set; when the second feature similarities between the feature vector of the b-th sub-image and the feature vectors of both the ith and jth sub-images are smaller than the first set threshold, the b-th sub-image is not divided into the kth category set.
The first clustering process is exemplarily described below with reference to specific implementation scenarios.
Suppose the first set threshold is H and the target object is a pedestrian in the surveillance video. A first pedestrian is selected as the pedestrian of category set 1. For all pedestrians to be clustered in the surveillance video, the first feature similarity between each pedestrian to be clustered and the selected first pedestrian is then calculated, until a second pedestrian is found whose first feature similarity with the first pedestrian is greater than the first set threshold H; the second pedestrian is divided into category set 1, which now contains the first and second pedestrians. Next, the second feature similarity between each remaining pedestrian to be clustered and the first or second pedestrian is calculated, and when the second feature similarity between an Nth pedestrian and the first or second pedestrian is greater than the first set threshold H, the Nth pedestrian is divided into category set 1. This is repeated until all remaining pedestrians to be clustered have been traversed, which completes the clustering of category set 1.
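A minimal sketch of the first clustering process (steps 201 to 204) is given below, assuming the feature vectors have already been extracted and reusing the `cosine_similarity` helper sketched earlier. The threshold value is illustrative, and comparing against every current member of the set covers both the first feature similarity (seed only) and the second feature similarity (any member) described above.

```python
def first_clustering(features, first_threshold=0.8):
    """features: list of sub-image feature vectors; returns lists of sub-image indices."""
    unassigned = list(range(len(features)))
    category_sets = []
    while unassigned:
        seed = unassigned.pop(0)           # the ith sub-image opens the kth category set
        current_set = [seed]
        remaining = []
        for idx in unassigned:
            # Before any other member is added this compares with the seed only
            # (first feature similarity); afterwards, with any member of the set
            # (second feature similarity).
            if any(cosine_similarity(features[idx], features[m]) > first_threshold
                   for m in current_set):
                current_set.append(idx)
            else:
                remaining.append(idx)
        unassigned = remaining
        category_sets.append(current_set)
    return category_sets
```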
In another embodiment of the present application, when the sub-images including the same target object information are divided into the same category set based on the extracted feature vector of each sub-image, the sub-images to be clustered may be sequentially selected from the intercepted sub-images, and the second clustering process is performed with the selected sub-images as clustering centers until all the sub-images to be clustered are traversed. Wherein, the second clustering process may be the method shown in fig. 3, and includes the following steps:
step 301, calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image.
Step 302, screening out the sub-images to be clustered, of which the corresponding third feature similarity is greater than a first set threshold value, based on the third feature similarity corresponding to each sub-image to be clustered.
Step 303, dividing the selected sub-images and the corresponding sub-images to be clustered, of which the third feature similarity is greater than a first set threshold, into the same category set.
In a specific implementation, the calculation method of the third feature similarity may be the same as the calculation method of the first feature similarity and/or the second feature similarity, and will not be described herein again.
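A minimal sketch of the second clustering process (steps 301 to 303) follows, again reusing `cosine_similarity` and an illustrative threshold. Each still-unassigned sub-image is taken in turn as a clustering centre; treating already-assigned sub-images as no longer "to be clustered" is an assumption, since the patent does not state this explicitly.

```python
def second_clustering(features, first_threshold=0.8):
    """features: list of sub-image feature vectors; returns lists of sub-image indices."""
    assigned = [False] * len(features)
    category_sets = []
    for center in range(len(features)):
        if assigned[center]:
            continue                        # assumption: each sub-image joins one set only
        current_set = [center]
        assigned[center] = True
        for idx in range(len(features)):
            if assigned[idx]:
                continue
            # Third feature similarity between the centre and the remaining sub-images.
            if cosine_similarity(features[center], features[idx]) > first_threshold:
                current_set.append(idx)
                assigned[idx] = True
        category_sets.append(current_set)
    return category_sets
```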
The two clustering processes shown above may classify sub-images whose feature vectors vary over time into the wrong category sets. For example, in a scene where a pedestrian crosses a road, taking the pedestrian as the target object, the feature similarity between the feature vector of a sub-image captured while no vehicle blocks the pedestrian and that of a sub-image captured while a vehicle blocks the pedestrian may be small. In this case, different sub-images containing the features of the same pedestrian are easily divided into different category sets, so that some category sets are missing images of the pedestrian that belong to them, while other category sets contain images of a pedestrian that do not belong to them.
In view of the above problem, the present application further provides a scheme for adjusting the sub-images included in the category set based on the frame numbers of the sub-images, referring to the following step 103:
step 103, determining a category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, and adjusting the sub-images included in the category set to be adjusted.
Considering that the movement track of a target object appearing in the surveillance video is continuous, the frame numbers of the sub-images containing that target object's information should also be continuous; it should essentially not happen that, within several consecutive frames containing the target object, an intermediate frame contains no sub-image of it. Based on this continuity of the target object's movement, the category set to be adjusted is determined by judging whether the frame numbers of the sub-images in each category set are continuous, and the sub-images in the category set to be adjusted are then adjusted.
The following two specific ways of determining the category set to be adjusted and adjusting the sub-images included in the category set to be adjusted are listed:
and the first condition is that whether the sub-image with the middle frame number in the category set is missing or not is detected.
The method shown in fig. 4 may be followed to determine the category set to be adjusted, including the following steps:
step 401, determining the maximum frame number and the minimum frame number of the sub-images in each classified set.
Step 402, detecting whether the intermediate frame numbers between the maximum frame number and the minimum frame number respectively corresponding to each classified set are continuous.
Step 403, determining the class set with discontinuous intermediate frame numbers as the class set to be adjusted.
Illustratively, if the frame numbers of the sub-images included in the category set a are respectively 1, 2, 3, 4, 5, 6, 7, 8, the maximum frame number is 8, the minimum frame number is 1, and the intermediate frame numbers between the maximum frame number and the minimum frame number are consecutive frame numbers, the category set a is a category set that does not need to be adjusted; if the frame numbers of the sub-images included in the category set B are 1, 2, 3, 4, 6, 7, 8, the maximum frame number is 8, the minimum frame number is 1, and the intermediate frame number between the maximum frame number and the minimum frame number is not a continuous frame number, the category set B is the category set to be adjusted.
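A minimal sketch of the check in fig. 4 (steps 401 to 403) is shown below, representing each category set simply by the list of frame numbers of its sub-images; it reports both which sets need adjustment and which intermediate frame numbers they are missing.

```python
def find_sets_to_adjust(category_frame_numbers):
    """category_frame_numbers: list of lists of frame numbers, one list per category set."""
    to_adjust = []
    for n, frames in enumerate(category_frame_numbers):
        lo, hi = min(frames), max(frames)
        missing = sorted(set(range(lo, hi + 1)) - set(frames))
        if missing:                         # intermediate frame numbers are not continuous
            to_adjust.append((n, missing))
    return to_adjust
```

For the category sets A and B of the example above, `find_sets_to_adjust([[1, 2, 3, 4, 5, 6, 7, 8], [1, 2, 3, 4, 6, 7, 8]])` reports only the second set, with missing frame number 5.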
Further, the method shown in fig. 5 may be adopted to adjust the missing inter-frame numbers in the category set to be adjusted, including the following steps:
step 501, determining missing intermediate frame numbers in the nth to-be-adjusted category set aiming at the nth to-be-adjusted category set.
The category set to be adjusted is a set comprising discontinuous frame numbers of the sub-images, and the missing intermediate frame numbers in the category set to be adjusted can be determined according to the frame numbers of the sub-images.
For example, if the frame numbers of the sub-images included in the category set to be adjusted are 1, 2, 3, 4, 6, 7, 8, the missing inter-frame number is 5.
Step 502, determining a first candidate sub-image matched with the missing intermediate frame number in other category sets except the nth category set to be adjusted; and determining a first reference sub-image matched with a frame number adjacent to the missing intermediate frame number in the nth to-be-adjusted category set.
Specifically, a first candidate category set with the frame number of the sub-image as the missing inter-frame number may be screened from other category sets, and then the first candidate sub-image with the frame number as the missing inter-frame number included in the first candidate category set is determined.
For example, if the frame number missing from the nth category set to be adjusted is 6, the other category sets that contain a sub-image with frame number 6 are determined to be the first candidate category sets, and the sub-images with frame number 6 are then screened out of the first candidate category sets as the first candidate sub-images.
In a possible implementation, if the missing intermediate frame number is x, the sub-image with frame number x+1 in the nth category set to be adjusted may be used as the first reference sub-image, or the sub-image with frame number x-1 may be used, or both may be used as first reference sub-images at the same time.
Step 503, calculating a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image.
The fourth feature similarity is the same as the calculation method of any one of the first feature similarity, the second feature similarity and the third feature similarity, and will not be described herein again.
Specifically, if the first reference sub-image is a sub-image, a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image may be calculated; if the number of the sub-images contained in the first reference sub-image is larger than 1, calculating fourth feature similarity between the feature vector of each sub-image in the first reference sub-image and the feature vector of each first candidate sub-image.
Step 504, based on the fourth feature similarity corresponding to each first reference sub-image, screening out the first candidate sub-images whose corresponding fourth feature similarity is greater than a second set threshold.
Step 505, dividing the screened first candidate sub-image into the nth category set to be adjusted.
In a possible case, all of the calculated fourth feature similarities are less than or equal to the second set threshold; in that case the nth category set to be adjusted is not adjusted. For example, in a surveillance video in which a pedestrian crosses a road, the pedestrian may be completely blocked by a vehicle for a long period of time. The feature information of the pedestrian cannot be obtained from the video while the pedestrian is completely blocked, which may cause the frame numbers in the category set describing that pedestrian to be discontinuous. By the above method, the sub-images corresponding to the frame numbers missing from the nth category set to be adjusted can be screened out of the other category sets, which improves the clustering accuracy.
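A minimal sketch of the adjustment in fig. 5 (steps 501 to 505) is given below, under the simplifying assumption that each category set is stored as a dict mapping a frame number to the feature vector of its sub-image (one sub-image per frame per set). `second_threshold` stands for the second set threshold and its value is illustrative; whether a matched candidate is also removed from its original set is not specified by the patent, so this sketch only adds it to the nth set.

```python
def fill_missing_frames(category_sets, n, missing_frames, second_threshold=0.85):
    """category_sets: list of dicts {frame_number: feature_vector}; n: index of set to adjust."""
    target = category_sets[n]
    for x in missing_frames:
        # First reference sub-image(s): sub-images of set n with frame numbers adjacent to x.
        references = [target[f] for f in (x - 1, x + 1) if f in target]
        # First candidate sub-images: sub-images with frame number x in the other sets.
        candidates = [s[x] for k, s in enumerate(category_sets) if k != n and x in s]
        for candidate in candidates:
            # Fourth feature similarity against the reference sub-image(s).
            if any(cosine_similarity(ref, candidate) > second_threshold
                   for ref in references):
                target[x] = candidate       # divide the screened candidate into set n
                break                       # one sub-image per frame in this simplified model
    return category_sets
```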
Case two: the frame numbers between the maximum frame number and the minimum frame number of the category set are consecutive, but the feature information after the maximum frame number or before the minimum frame number is missing; in this case the category set also needs to be adjusted.
In a possible implementation manner, when determining whether feature information after the maximum frame number is lost, the method shown in fig. 6 may be used to determine the category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, including the following steps:
step 601, aiming at the divided nth class set, determining the maximum frame number of the sub-image in the nth class set and a second reference sub-image matched with the maximum frame number.
Step 602, determining a second candidate sub-image in the other category sets except the nth category set, which is matched with the frame number next to the maximum frame number.
For example, if the maximum frame number in the nth category set is x, the sub-image with frame number x+1 in the other category sets is determined to be a second candidate sub-image.
Step 603, calculating a fourth feature similarity between the feature vectors of the second reference sub-images and the feature vectors of each second candidate sub-image.
Step 604, when there is a second candidate sub-image with a fourth feature similarity greater than a second set threshold, determining a second candidate category set where the second candidate sub-image with the fourth feature similarity greater than the second set threshold is located and an nth category set as the category set to be adjusted.
According to the method shown in fig. 6, after the category set to be adjusted is determined, the second candidate sub-images included in the second candidate category set, whose fourth feature similarity is greater than the second set threshold, may be divided into the nth category set, so as to implement adjustment of the nth category set.
In another possible implementation, when determining whether the feature information before the minimum frame number is lost, the method shown in fig. 7 may be used to determine the category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, including the following steps:
step 701, determining a minimum frame number of the sub-image in the nth class set and a third reference sub-image matched with the minimum frame number for the nth class set.
Step 702, determining a third candidate sub-image in the other category sets except the nth category set, which is matched with the frame number last to the minimum frame number.
Step 703, calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image.
Step 704, when there is a third candidate sub-image with a fourth feature similarity greater than a second set threshold, determining a third candidate category set where the third candidate sub-image with the fourth feature similarity greater than the second set threshold is located and an nth category set as the category set to be adjusted.
According to the method shown in fig. 7, after the category set to be adjusted is determined, the third candidate sub-images included in the third candidate category set, whose fourth feature similarity is greater than the second set threshold, may be divided into the nth category set, so as to implement adjustment of the nth category set.
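A combined sketch of the boundary adjustments of figs. 6 and 7 follows, using the same dict representation and illustrative threshold as the previous sketch: the sub-image at the maximum (or minimum) frame number of the nth set is compared with the candidates at the next (or previous) frame number in the other sets, and a matching candidate is divided into the nth set.

```python
def adjust_boundaries(category_sets, n, second_threshold=0.85):
    """Extend set n by one frame past its maximum (fig. 6) and before its minimum (fig. 7)."""
    target = category_sets[n]
    boundary_pairs = ((max(target), max(target) + 1),   # second reference / candidate frames
                      (min(target), min(target) - 1))   # third reference / candidate frames
    for reference_frame, neighbour_frame in boundary_pairs:
        reference = target[reference_frame]
        for k, other in enumerate(category_sets):
            if k == n or neighbour_frame not in other:
                continue
            candidate = other[neighbour_frame]
            # Fourth feature similarity between the reference and candidate sub-images.
            if cosine_similarity(reference, candidate) > second_threshold:
                target[neighbour_frame] = candidate     # divide the candidate into set n
    return category_sets
```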
By the method provided by the embodiment, the sub-images in the category set can be adjusted according to the frame numbers of the sub-images in the category set, so that the error of clustering according to the feature similarity between the feature vectors of the sub-images can be reduced, and the accuracy of clustering is improved.
In the above embodiment, the sub-images comprising target object information are intercepted from the images of the surveillance video. When these sub-images are clustered, the sub-images comprising the same target object information are first divided into the same category set based on the extracted feature vector of each sub-image, and the sub-images in each category set can then be adjusted using the frame numbers of the sub-images in that set. In this way, the sub-images are first roughly clustered using their feature vectors, and the category sets are then adjusted according to the frame numbers of their sub-images, which reduces the error produced when clustering only according to the feature vectors of the sub-images and improves the clustering accuracy.
Referring to fig. 8, an architecture schematic diagram of a target object clustering device 800 provided in the embodiment of the present application includes an obtaining module 801, a dividing module 802, and an adjusting module 803, specifically:
an obtaining module 801, configured to obtain a surveillance video, intercept sub-images including target object information from an image of the surveillance video, and record a frame number of each sub-image;
a dividing module 802, configured to extract a feature vector of each sub-image, and divide the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image;
an adjusting module 803, configured to determine a category set to be adjusted based on frame numbers of sub-images included in each divided category set, and adjust the sub-images included in the category set to be adjusted.
In one possible design, when the extracted feature vector of each sub-image is used to divide the sub-images including the same target object information into the same category set, the dividing module 802 is specifically configured to:
for an ith sub-image to be clustered, dividing the ith sub-image into a kth category set; i and k are positive integers;
and selecting the sub-images which meet the clustering condition of the kth class set from the sub-images to be clustered except the ith sub-image, and dividing the sub-images into the kth class set.
In one possible design, when selecting, from the sub-images to be clustered other than the ith sub-image, the sub-images that meet the clustering condition of the kth category set and dividing them into the kth category set, the dividing module 802 is specifically configured to:
sequentially selecting sub-images from the sub-images to be clustered except for the ith sub-image, and executing a first clustering process until all the sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when the first feature similarity corresponding to the selected jth sub-image is determined to be larger than a first set threshold, dividing the jth sub-image into the kth category set;
taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth category set;
and dividing the residual sub-images of which the corresponding second feature similarity is greater than the first set threshold into the kth category set based on the corresponding second feature similarity of each residual sub-image.
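As a non-authoritative illustration of this first clustering process, the sketch below grows one category set at a time: each candidate is first compared against the ith sub-image (first feature similarity), and once the set has more than one member, later candidates are compared against a sub-image already in the set (second feature similarity). Cosine similarity and the threshold value are assumptions; the `SubImage` type is the one sketched above.

```python
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Assumed similarity measure between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def first_clustering(sub_images: List[SubImage],
                     first_threshold: float = 0.8) -> List[List[SubImage]]:
    """Sketch of the first clustering process (greedy, order-dependent)."""
    categories: List[List[SubImage]] = []
    remaining = list(sub_images)
    while remaining:
        seed = remaining.pop(0)   # the ith sub-image to be clustered
        category = [seed]         # open a new kth category set
        leftover = []
        for candidate in remaining:
            if len(category) == 1:
                # first feature similarity: candidate vs. the ith sub-image itself
                sim = cosine_similarity(candidate.feature, seed.feature)
            else:
                # second feature similarity: candidate vs. a sub-image already in set k
                sim = cosine_similarity(candidate.feature, category[-1].feature)
            if sim > first_threshold:
                category.append(candidate)
            else:
                leftover.append(candidate)
        categories.append(category)
        remaining = leftover      # sub-images not yet clustered seed the next set
    return categories
```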
In one possible design, when the extracted feature vector of each sub-image is used to divide the sub-images including the same target object information into the same category set, the dividing module 802 is specifically configured to:
sequentially selecting sub-images to be clustered from the intercepted sub-images, and executing a second clustering process by taking the selected sub-images as clustering centers until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening out, based on the third feature similarity corresponding to each sub-image to be clustered, the sub-images to be clustered whose third feature similarity is greater than a first set threshold;
and dividing the selected sub-image and the screened sub-images to be clustered whose third feature similarity is greater than the first set threshold into the same category set.
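A similarly hedged sketch of the second clustering process, in which each still-unclustered sub-image in turn serves as a cluster centre and pulls in every other sub-image whose third feature similarity exceeds the first set threshold (again using cosine similarity as an assumed metric):

```python
def second_clustering(sub_images: List[SubImage],
                      first_threshold: float = 0.8) -> List[List[SubImage]]:
    """Sketch of the second clustering process: selected sub-images act as cluster centres."""
    categories: List[List[SubImage]] = []
    unclustered = list(sub_images)
    while unclustered:
        center = unclustered.pop(0)   # the selected sub-image used as the cluster centre
        matched, rest = [], []
        for candidate in unclustered:
            # third feature similarity: centre vs. every other sub-image to be clustered
            if cosine_similarity(center.feature, candidate.feature) > first_threshold:
                matched.append(candidate)
            else:
                rest.append(candidate)
        categories.append([center] + matched)
        unclustered = rest            # continue until all sub-images are traversed
    return categories
```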
In a possible design, when determining the category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, the adjusting module 803 is specifically configured to:
determining the maximum frame number and the minimum frame number of the sub-images in each divided category set;
detecting whether the intermediate frame numbers between the maximum frame number and the minimum frame number corresponding to each divided category set are continuous;
and determining the category sets whose intermediate frame numbers are discontinuous as the category sets to be adjusted.
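The frame-number continuity check can be expressed compactly; the sketch below flags a category set as needing adjustment when any intermediate frame number between its minimum and maximum is missing (implementation details such as the set comparison are my own):

```python
def sets_to_adjust(categories: List[List[SubImage]]) -> List[int]:
    """Return the indices of category sets whose intermediate frame numbers are not continuous."""
    to_adjust = []
    for idx, category in enumerate(categories):
        frames = {s.frame_number for s in category}
        lo, hi = min(frames), max(frames)     # minimum and maximum frame number of the set
        if frames != set(range(lo, hi + 1)):  # some intermediate frame number is missing
            to_adjust.append(idx)
    return to_adjust
```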
In a possible design, when adjusting the sub-images included in the category set to be adjusted, the adjusting module 803 is specifically configured to:
determining, for the nth category set to be adjusted, the intermediate frame numbers missing from the nth category set to be adjusted;
determining, in category sets other than the nth category set to be adjusted, first candidate sub-images matched with the missing intermediate frame numbers; and determining, in the nth category set to be adjusted, a first reference sub-image matched with a frame number adjacent to a missing intermediate frame number;
calculating a fourth feature similarity between the feature vector of the first reference sub-image and the feature vector of each first candidate sub-image;
screening out, based on the fourth feature similarity corresponding to each first candidate sub-image, the first candidate sub-images whose fourth feature similarity is greater than a second set threshold;
and dividing the screened first candidate sub-images into the nth category set to be adjusted.
In one possible design, when determining, in the category sets other than the nth category set to be adjusted, the first candidate sub-images matched with the missing intermediate frame numbers, the adjusting module 803 is specifically configured to:
screening out, from the other category sets, the first candidate category sets that contain a sub-image whose frame number is a missing intermediate frame number;
and determining, as the first candidate sub-images, the sub-images in the first candidate category sets whose frame numbers are the missing intermediate frame numbers.
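Putting the two previous designs together, one possible (assumed) realisation of the adjustment is sketched below: for every missing intermediate frame number, a reference sub-image with an adjacent frame number is taken from the nth set, candidate sub-images carrying the missing frame number are collected from the other sets, and candidates whose fourth feature similarity exceeds the second set threshold are moved into the nth set.

```python
def adjust_missing_frames(categories: List[List[SubImage]], n: int,
                          second_threshold: float = 0.7) -> None:
    """Sketch: fill frame-number gaps in category set n from the other sets (in place)."""
    target = categories[n]
    present = {s.frame_number for s in target}
    missing = [f for f in range(min(present), max(present) + 1) if f not in present]
    for frame_number in missing:
        # first reference sub-image: a member of set n with an adjacent frame number
        reference = min(target, key=lambda s: abs(s.frame_number - frame_number))
        for m, other in enumerate(categories):
            if m == n:
                continue
            # first candidate sub-images: sub-images in other sets with the missing frame number
            candidates = [s for s in other if s.frame_number == frame_number]
            for candidate in candidates:
                # fourth feature similarity between the reference and the candidate
                if cosine_similarity(reference.feature, candidate.feature) > second_threshold:
                    other.remove(candidate)
                    target.append(candidate)
```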
In a possible design, when determining the category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, the adjusting module 803 is specifically configured to:
for the divided nth category set, determining the maximum frame number of the sub-images in the nth category set and a second reference sub-image matched with the maximum frame number;
determining, in category sets other than the nth category set, second candidate sub-images matched with the frame number immediately following the maximum frame number;
calculating a fourth feature similarity between the feature vector of the second reference sub-image and the feature vector of each second candidate sub-image;
and when a second candidate sub-image whose fourth feature similarity is greater than a second set threshold exists, determining the second candidate category set containing that second candidate sub-image and the nth category set as category sets to be adjusted.
In a possible design, when adjusting the sub-images included in the category set to be adjusted, the adjusting module 803 is specifically configured to:
dividing the second candidate sub-images in the second candidate category set whose fourth feature similarity is greater than the second set threshold into the nth category set.
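The maximum-frame-number design can be sketched in the same hypothetical style: the sub-image with the largest frame number in set n is the reference, candidates in other sets at the very next frame are compared to it, and matches above the second set threshold are merged into set n (the merge step is the adjustment described just above).

```python
def extend_after_max_frame(categories: List[List[SubImage]], n: int,
                           second_threshold: float = 0.7) -> None:
    """Sketch: probe the frame following set n's maximum frame number and merge matches."""
    target = categories[n]
    # second reference sub-image: the member of set n with the maximum frame number
    reference = max(target, key=lambda s: s.frame_number)
    next_frame = reference.frame_number + 1
    for m, other in enumerate(categories):
        if m == n:
            continue
        # second candidate sub-images: frame number equal to the frame next to the maximum
        for candidate in [s for s in other if s.frame_number == next_frame]:
            # a fourth feature similarity above the second set threshold marks both sets
            # as "to be adjusted"; the candidate is then divided into set n
            if cosine_similarity(reference.feature, candidate.feature) > second_threshold:
                other.remove(candidate)
                target.append(candidate)
```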
In a possible design, when determining the category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, the adjusting module 803 is specifically configured to:
for the divided nth category set, determining the minimum frame number of the sub-images in the nth category set and a third reference sub-image matched with the minimum frame number;
determining, in category sets other than the nth category set, third candidate sub-images matched with the frame number immediately preceding the minimum frame number;
calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image;
and when a third candidate sub-image whose fourth feature similarity is greater than a second set threshold exists, determining the third candidate category set containing that third candidate sub-image and the nth category set as category sets to be adjusted.
In a possible design, when adjusting the sub-images included in the category set to be adjusted, the adjusting module 803 is specifically configured to:
and dividing the third candidate sub-images in the third candidate category set whose fourth feature similarity is greater than the second set threshold into the nth category set.
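The minimum-frame-number design is the mirror image of the sketch above; only the reference sub-image and the probed frame change (again an assumption-laden sketch, not the patented implementation):

```python
def extend_before_min_frame(categories: List[List[SubImage]], n: int,
                            second_threshold: float = 0.7) -> None:
    """Sketch: probe the frame preceding set n's minimum frame number and merge matches."""
    target = categories[n]
    # third reference sub-image: the member of set n with the minimum frame number
    reference = min(target, key=lambda s: s.frame_number)
    previous_frame = reference.frame_number - 1
    for m, other in enumerate(categories):
        if m == n:
            continue
        # third candidate sub-images: frame number equal to the frame before the minimum
        for candidate in [s for s in other if s.frame_number == previous_frame]:
            if cosine_similarity(reference.feature, candidate.feature) > second_threshold:
                other.remove(candidate)   # fourth feature similarity exceeds the threshold
                target.append(candidate)  # divide the candidate into the nth category set
```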
According to the above target object clustering device, sub-images including target object information are intercepted from the images of the surveillance video. When these sub-images are clustered, the sub-images including the same target object information are first divided into the same category set based on the extracted feature vector of each sub-image, and the sub-images in each category set can then be adjusted using their frame numbers. In this way, the sub-images are coarsely clustered by their feature vectors and subsequently adjusted according to their frame numbers, which reduces the error produced when clustering only on the feature vectors of the sub-images and improves the clustering accuracy.
Based on the same technical concept, an embodiment of the present application further provides an electronic device. Referring to fig. 9, which is a schematic structural diagram of an electronic device 900 provided in an embodiment of the present application, the electronic device includes a processor 901, a memory 902, and a bus 903. The memory 902 is used for storing execution instructions and includes a memory 9021 and an external memory 9022; the memory 9021, also referred to as an internal memory, is configured to temporarily store operation data of the processor 901 and data exchanged with the external memory 9022, such as a hard disk. The processor 901 exchanges data with the external memory 9022 through the memory 9021. When the electronic device 900 runs, the processor 901 communicates with the memory 902 through the bus 903, so that the processor 901 executes the following instructions:
acquiring a surveillance video, intercepting sub-images including target object information from the images of the surveillance video, and recording the frame number of each sub-image;
extracting a feature vector of each sub-image, and dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image;
and determining a category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, and adjusting the sub-images included in the category set to be adjusted.
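Wiring the earlier sketches together gives one hypothetical end-to-end flow corresponding to these three instructions; `detect` and `extract` stand in for the detection and feature-extraction models, which the embodiment leaves open:

```python
def cluster_target_objects(video_frames, detect, extract,
                           first_threshold: float = 0.8,
                           second_threshold: float = 0.7) -> List[List[SubImage]]:
    """End-to-end sketch: obtain sub-images, divide them into category sets, then adjust."""
    # Step 1: intercept sub-images containing target object information, recording frame numbers
    sub_images = [SubImage(frame_number, extract(crop))
                  for frame_number, frame in enumerate(video_frames)
                  for crop in detect(frame)]

    # Step 2: coarse clustering on feature vectors (either clustering process could be used)
    categories = second_clustering(sub_images, first_threshold)

    # Step 3: refine the category sets using the recorded frame numbers
    for n in sets_to_adjust(categories):
        adjust_missing_frames(categories, n, second_threshold)
    return categories
```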
The specific processing flow of the processor 901 may refer to the description of the above method embodiment, and is not described herein again.
Based on the same technical concept, embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and the computer program is executed by a processor to perform the steps of the above target object clustering method.
Specifically, the storage medium may be a general-purpose storage medium, such as a removable disk or a hard disk. When the computer program on the storage medium is executed, the above target object clustering method can be performed, thereby improving the accuracy of target object clustering.
Based on the same technical concept, embodiments of the present application further provide a computer program product, which includes a computer-readable storage medium storing program code. The instructions included in the program code may be used to execute the steps of the above target object clustering method; for specific implementation, reference may be made to the above method embodiments, which are not described herein again.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part that contributes to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. A target object clustering method is characterized by comprising the following steps:
acquiring a surveillance video, intercepting sub-images including target object information from the images of the surveillance video, and recording the frame number of each sub-image;
extracting a feature vector of each sub-image, and dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image;
determining a category set to be adjusted based on frame numbers of the sub-images included in each divided category set, and adjusting the sub-images included in the category set to be adjusted;
determining a category set to be adjusted based on frame numbers of sub-images included in each divided category set, including:
for the divided nth category set, determining the minimum frame number of the sub-images in the nth category set and a third reference sub-image matched with the minimum frame number;
determining, in category sets other than the nth category set, third candidate sub-images matched with the frame number immediately preceding the minimum frame number;
calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image;
and when a third candidate sub-image whose fourth feature similarity is greater than a second set threshold exists, determining the third candidate category set containing that third candidate sub-image and the nth category set as category sets to be adjusted.
2. The method of claim 1, wherein the dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image comprises:
for an ith sub-image to be clustered, dividing the ith sub-image into a kth category set; i and k are positive integers;
and selecting, from the sub-images to be clustered other than the ith sub-image, the sub-images meeting the clustering condition of the kth category set, and dividing them into the kth category set.
3. The method according to claim 2, wherein the selecting, from the sub-images to be clustered except the ith sub-image, the sub-images meeting the clustering condition of the kth category set and dividing them into the kth category set comprises:
sequentially selecting sub-images from the sub-images to be clustered except for the ith sub-image, and executing a first clustering process until all the sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when the first feature similarity corresponding to the selected jth sub-image is determined to be larger than a first set threshold, dividing the jth sub-image into the kth category set;
taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth category set;
and dividing the residual sub-images of which the corresponding second feature similarity is greater than the first set threshold into the kth category set based on the corresponding second feature similarity of each residual sub-image.
4. The method of claim 1, wherein the dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image comprises:
sequentially selecting sub-images to be clustered from the intercepted sub-images, and executing a second clustering process by taking the selected sub-images as clustering centers until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening out, based on the third feature similarity corresponding to each sub-image to be clustered, the sub-images to be clustered whose third feature similarity is greater than a first set threshold;
and dividing the selected sub-images and the corresponding sub-images to be clustered, of which the third feature similarity is greater than a first set threshold, into the same category set.
5. The method of claim 1, wherein the adjusting the sub-images included in the set of categories to be adjusted comprises:
and dividing a third candidate sub-image with the fourth feature similarity larger than the second set threshold included in the third candidate category set into the nth category set.
6. A target object clustering apparatus, comprising:
the acquisition module is used for acquiring a surveillance video, intercepting sub-images including target object information from the images of the surveillance video, and recording the frame number of each sub-image;
the dividing module is used for extracting the characteristic vector of each sub-image and dividing the sub-images comprising the same target object information into the same category set based on the extracted characteristic vector of each sub-image;
the adjusting module is used for determining a category set to be adjusted based on the frame numbers of the sub-images in each divided category set and adjusting the sub-images in the category set to be adjusted;
the adjusting module, when determining the category set to be adjusted based on the frame numbers of the sub-images included in the divided category sets, is specifically configured to:
for the divided nth category set, determining the minimum frame number of the sub-images in the nth category set and a third reference sub-image matched with the minimum frame number;
determining, in category sets other than the nth category set, third candidate sub-images matched with the frame number immediately preceding the minimum frame number;
calculating a fourth feature similarity between the feature vector of the third reference sub-image and the feature vector of each third candidate sub-image;
and when a third candidate sub-image whose fourth feature similarity is greater than a second set threshold exists, determining the third candidate category set containing that third candidate sub-image and the nth category set as category sets to be adjusted.
7. The apparatus according to claim 6, wherein the dividing module, when dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image, is specifically configured to:
for an ith sub-image to be clustered, dividing the ith sub-image into a kth category set; i and k are positive integers;
and selecting, from the sub-images to be clustered other than the ith sub-image, the sub-images meeting the clustering condition of the kth category set, and dividing them into the kth category set.
8. The apparatus according to claim 7, wherein the dividing module, when selecting, from the sub-images to be clustered except the ith sub-image, the sub-images meeting the clustering condition of the kth category set and dividing them into the kth category set, is specifically configured to:
sequentially selecting sub-images from the sub-images to be clustered except for the ith sub-image, and executing a first clustering process until all the sub-images to be clustered are traversed; wherein the first clustering process comprises:
calculating a first feature similarity between the feature vector of the selected sub-image and the feature vector of the ith sub-image;
when the first feature similarity corresponding to the selected jth sub-image is determined to be larger than a first set threshold, dividing the jth sub-image into the kth category set;
taking the sub-images selected after the jth sub-image as residual sub-images, and calculating a second feature similarity between the feature vector of each residual sub-image and the feature vector of any sub-image in the kth category set;
and dividing the residual sub-images of which the corresponding second feature similarity is greater than the first set threshold into the kth category set based on the corresponding second feature similarity of each residual sub-image.
9. The apparatus according to claim 6, wherein the dividing module, when dividing the sub-images including the same target object information into the same category set based on the extracted feature vector of each sub-image, is specifically configured to:
sequentially selecting sub-images to be clustered from the intercepted sub-images, and executing a second clustering process by taking the selected sub-images as clustering centers until all the sub-images to be clustered are traversed; wherein the second clustering process comprises:
calculating a third feature similarity between the feature vector of the selected sub-image and the feature vector of each sub-image to be clustered except the selected sub-image;
screening out, based on the third feature similarity corresponding to each sub-image to be clustered, the sub-images to be clustered whose third feature similarity is greater than a first set threshold;
and dividing the selected sub-images and the corresponding sub-images to be clustered, of which the third feature similarity is greater than a first set threshold, into the same category set.
10. The apparatus according to claim 6, wherein the adjusting module, when adjusting the sub-images included in the set of categories to be adjusted, is specifically configured to:
and dividing the third candidate sub-images in the third candidate category set whose fourth feature similarity is greater than the second set threshold into the nth category set.
11. An electronic device, comprising: a processor, a memory and a bus, the memory storing machine-readable instructions executable by the processor, the processor and the memory communicating via the bus when the electronic device is running, the processor executing the machine-readable instructions to perform the steps of the target object clustering method according to any one of claims 1 to 5.
12. A computer-readable storage medium, having stored thereon a computer program for performing, when being executed by a processor, the steps of the method for clustering target objects according to any one of claims 1 to 5.
CN202010408248.7A 2018-12-17 2018-12-17 Target object clustering method and device Active CN111597980B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010408248.7A CN111597980B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010408248.7A CN111597980B (en) 2018-12-17 2018-12-17 Target object clustering method and device
CN201811544296.8A CN110781710B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201811544296.8A Division CN110781710B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Publications (2)

Publication Number Publication Date
CN111597980A true CN111597980A (en) 2020-08-28
CN111597980B CN111597980B (en) 2023-04-28

Family

ID=69382905

Family Applications (3)

Application Number Title Priority Date Filing Date
CN202010408248.7A Active CN111597980B (en) 2018-12-17 2018-12-17 Target object clustering method and device
CN202010408241.5A Active CN111597979B (en) 2018-12-17 2018-12-17 Target object clustering method and device
CN201811544296.8A Active CN110781710B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Family Applications After (2)

Application Number Title Priority Date Filing Date
CN202010408241.5A Active CN111597979B (en) 2018-12-17 2018-12-17 Target object clustering method and device
CN201811544296.8A Active CN110781710B (en) 2018-12-17 2018-12-17 Target object clustering method and device

Country Status (1)

Country Link
CN (3) CN111597980B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255694B (en) * 2021-05-21 2022-11-11 北京百度网讯科技有限公司 Training image feature extraction model and method and device for extracting image features

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1766907A (en) * 2005-10-24 2006-05-03 中国电子科技集团公司第四十五研究所 Multi-target image recognition method based on cluster genetic algorithm
CN107153824A (en) * 2017-05-22 2017-09-12 中国人民解放军国防科学技术大学 Across video pedestrian recognition methods again based on figure cluster
CN108122247A (en) * 2017-12-25 2018-06-05 北京航空航天大学 A kind of video object detection method based on saliency and feature prior model
CN108615042A (en) * 2016-12-09 2018-10-02 炬芯(珠海)科技有限公司 The method and apparatus and player of video format identification
US20180349732A1 (en) * 2016-02-12 2018-12-06 Dataking. Inc Hybrid-based image clustering method and server for operating the same

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050175243A1 (en) * 2004-02-05 2005-08-11 Trw Automotive U.S. Llc Method and apparatus for classifying image data using classifier grid models
CN100403313C (en) * 2006-09-14 2008-07-16 浙江大学 Extraction method of key frame of 3d human motion data
CN101359368B (en) * 2008-09-09 2010-08-25 华为技术有限公司 Video image clustering method and system
JP2011146799A (en) * 2010-01-12 2011-07-28 Brother Industries Ltd Device and program for processing image
CN103052973B (en) * 2011-07-12 2015-12-02 华为技术有限公司 Generate method and the device of body animation
CN102609695A (en) * 2012-02-14 2012-07-25 上海博物馆 Method and system for recognizing human face from multiple angles
CN103426176B (en) * 2013-08-27 2017-03-01 重庆邮电大学 Based on the shot detection method improving rectangular histogram and clustering algorithm
GB2519348B (en) * 2013-10-18 2021-04-14 Vision Semantics Ltd Visual data mining
CN104679818B (en) * 2014-12-25 2019-03-26 上海云赛智联信息科技有限公司 A kind of video key frame extracting method and system
CN104598889B (en) * 2015-01-30 2018-02-09 北京信息科技大学 The method and apparatus of Human bodys' response
US9697599B2 (en) * 2015-06-17 2017-07-04 Xerox Corporation Determining a respiratory pattern from a video of a subject
CN106937120B (en) * 2015-12-29 2019-11-12 北京大唐高鸿数据网络技术有限公司 Object-based monitor video method for concentration
CN106446797B (en) * 2016-08-31 2019-05-07 腾讯科技(深圳)有限公司 Image clustering method and device
CN108965687B (en) * 2017-05-22 2021-01-29 阿里巴巴集团控股有限公司 Shooting direction identification method, server, monitoring method, monitoring system and camera equipment
CN107492113B (en) * 2017-06-01 2019-11-05 南京行者易智能交通科技有限公司 A kind of moving object in video sequences position prediction model training method, position predicting method and trajectory predictions method
CN107529650B (en) * 2017-08-16 2021-05-18 广州视源电子科技股份有限公司 Closed loop detection method and device and computer equipment
CN107909104B (en) * 2017-11-13 2023-07-18 腾讯数码(天津)有限公司 Face clustering method and device for pictures and storage medium
CN108460356B (en) * 2018-03-13 2021-10-29 上海海事大学 Face image automatic processing system based on monitoring system
CN108647571B (en) * 2018-03-30 2021-04-06 国信优易数据股份有限公司 Video motion classification model training method and device and video motion classification method
CN108509994B (en) * 2018-03-30 2022-04-12 百度在线网络技术(北京)有限公司 Method and device for clustering character images
CN108549857B (en) * 2018-03-30 2021-04-23 国信优易数据股份有限公司 Event detection model training method and device and event detection method
CN108596225A (en) * 2018-04-12 2018-09-28 广州杰赛科技股份有限公司 Target similarity recognition method, the residence time recording method of target and device
CN108805124B (en) * 2018-04-18 2019-10-18 北京嘀嘀无限科技发展有限公司 Image processing method and device, computer readable storage medium
CN108875834B (en) * 2018-06-22 2019-08-20 北京达佳互联信息技术有限公司 Image clustering method, device, computer equipment and storage medium
CN108921130B (en) * 2018-07-26 2022-03-01 聊城大学 Video key frame extraction method based on saliency region

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU Yu et al.: "Key Frame Extraction Based on Curvature Detection of Multi-Feature Similarity Curves", Journal of Computer Applications *

Also Published As

Publication number Publication date
CN111597979B (en) 2023-05-12
CN110781710A (en) 2020-02-11
CN111597979A (en) 2020-08-28
CN111597980B (en) 2023-04-28
CN110781710B (en) 2020-08-28

Similar Documents

Publication Publication Date Title
CN109035299B (en) Target tracking method and device, computer equipment and storage medium
US11205276B2 (en) Object tracking method, object tracking device, electronic device and storage medium
Rios-Cabrera et al. Efficient multi-camera vehicle detection, tracking, and identification in a tunnel surveillance application
KR101764845B1 (en) A video surveillance apparatus for removing overlap and tracking multiple moving objects and method thereof
CN111325769B (en) Target object detection method and device
US20060107216A1 (en) Video segmentation combining similarity analysis and classification
CN109035295B (en) Multi-target tracking method, device, computer equipment and storage medium
Bedruz et al. Real-time vehicle detection and tracking using a mean-shift based blob analysis and tracking approach
CN112016531A (en) Model training method, object recognition method, device, equipment and storage medium
CN112989962B (en) Track generation method, track generation device, electronic equipment and storage medium
CN112633255B (en) Target detection method, device and equipment
CN111401196A (en) Method, computer device and computer readable storage medium for self-adaptive face clustering in limited space
CN114708300A (en) Anti-blocking self-adaptive target tracking method and system
CN116311063A (en) Personnel fine granularity tracking method and system based on face recognition under monitoring video
CN111444817B (en) Character image recognition method and device, electronic equipment and storage medium
CN110781710B (en) Target object clustering method and device
CN113989761A (en) Object tracking method and device, electronic equipment and storage medium
CN111476059A (en) Target detection method and device, computer equipment and storage medium
CN111402185B (en) Image detection method and device
EP4332910A1 (en) Behavior detection method, electronic device, and computer readable storage medium
CN112906495B (en) Target detection method and device, electronic equipment and storage medium
CN111062294B (en) Passenger flow queuing time detection method, device and system
CN115019241A (en) Pedestrian identification and tracking method and device, readable storage medium and equipment
CN114937248A (en) Vehicle tracking method and device for cross-camera, electronic equipment and storage medium
CN109740518B (en) Method and device for determining object in video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant