CN115019078A

CN115019078A - Data clustering method and device

Info

Publication number: CN115019078A
Application number: CN202210946778.6A
Authority: CN
Inventors: 刘俊龙; 申晨; 沈旭; 黄建强
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2022-08-09
Filing date: 2022-08-09
Publication date: 2022-09-06
Anticipated expiration: 2042-08-09
Also published as: CN115019078B

Abstract

The embodiments of this specification provide a data clustering method and device. The data clustering method comprises: obtaining a data set to be clustered; clustering the data to be clustered according to the matching probability between any two data to be clustered in the data set to generate an intermediate clustering result; determining an expected value for each piece of data to be clustered in the intermediate clustering result, according to the matching probability between any two data to be clustered in that result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value; and adjusting the intermediate clustering result according to the expected value to generate a corresponding target clustering result.

Description

Data clustering method and device

Technical Field

The embodiment of the specification relates to the technical field of computers, in particular to a data clustering method and device.

Background

In the Internet field, application scenarios are numerous and complex, and the user population is huge. As a result, massive amounts of data are generated, and this data must be processed by real-time computing (also called online computing) to give users real-time responses.

Online clustering is one form of online computing. Current clustering methods generally require the number of clustering results (clusters) to be predefined. In actual service scenarios, however, online data is generated in real time and is uncertain, so the number of clustering results that clustering will produce cannot be known in advance. If the number of clustering results is set arbitrarily before clustering, the clustering results will not be accurate enough. An effective method for solving this problem is therefore urgently needed.

Disclosure of Invention

In view of this, the embodiments of the present specification provide a data clustering method. One or more embodiments of the present specification also relate to a data clustering apparatus, an image clustering method, an image clustering apparatus, a vehicle image processing method, a vehicle image processing apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical drawbacks of the prior art.

According to a first aspect of embodiments of the present specification, there is provided a data clustering method, including:

acquiring a data set to be clustered, and clustering the data to be clustered according to the matching probability between any two data to be clustered in the data set to be clustered, to generate an intermediate clustering result;

determining an expected value corresponding to each data to be clustered in the intermediate clustering result according to the matching probability between any two data to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value;

and adjusting the intermediate clustering result according to the expected value to generate a corresponding target clustering result.

According to a second aspect of embodiments herein, there is provided a data clustering apparatus including:

an acquisition module, configured to acquire a data set to be clustered, and perform clustering processing on any two data to be clustered according to the matching probability between any two data to be clustered in the data set to be clustered, to generate an intermediate clustering result;

the determining module is configured to determine an expected value corresponding to each data to be clustered in the intermediate clustering result according to the matching probability between any two data to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value;

and the adjusting module is configured to adjust the intermediate clustering result according to the expected value to generate a corresponding target clustering result.

According to a third aspect of embodiments herein, there is provided an image clustering method including:

acquiring an image set to be clustered, and clustering any two images to be clustered according to the matching probability between any two images to be clustered in the image set to be clustered, to generate an intermediate clustering result;

determining an expected value corresponding to each image to be clustered in the intermediate clustering result according to the matching probability between any two images to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value;

and adjusting the intermediate clustering result according to the expected value to generate a corresponding target clustering result.

According to a fourth aspect of embodiments herein, there is provided an image clustering apparatus including:

the clustering module is configured to acquire an image set to be clustered, and perform clustering processing on any two images to be clustered according to the matching probability between any two images to be clustered in the image set to be clustered, to generate an intermediate clustering result;

the determining module is configured to determine an expected value corresponding to each image to be clustered in the intermediate clustering result according to the matching probability between any two images to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value;

and the generating module is configured to adjust the intermediate clustering result according to the expected value to generate a corresponding target clustering result.

According to a fifth aspect of embodiments herein, there is provided a vehicle image processing method including:

acquiring a vehicle image set to be clustered, and clustering any two vehicle images to be clustered according to the matching probability between any two vehicle images to be clustered in the vehicle image set to be clustered, to generate an intermediate clustering result;

determining an expected value corresponding to each vehicle image to be clustered in the intermediate clustering result according to the matching probability between any two vehicle images to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value;

adjusting the intermediate clustering result according to the expected value to generate a corresponding target clustering result;

and determining the motion track of the target vehicle according to the target clustering result of the vehicle image to be clustered, which contains the target vehicle.

According to a sixth aspect of the embodiments herein, there is provided a vehicle image processing apparatus including:

an acquisition module, configured to acquire a vehicle image set to be clustered, and perform clustering processing on any two vehicle images to be clustered according to the matching probability between any two vehicle images to be clustered in the vehicle image set to be clustered, to generate an intermediate clustering result;

the first determining module is configured to determine an expected value corresponding to each vehicle image to be clustered in the intermediate clustering result according to the matching probability between any two vehicle images to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value;

the adjusting module is configured to adjust the intermediate clustering result according to the expected value to generate a corresponding target clustering result;

the second determination module is configured to determine the motion track of the target vehicle according to a target clustering result of the vehicle image to be clustered, which contains the target vehicle.

According to a seventh aspect of embodiments herein, there is provided a computing device comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions, and the processor is configured to execute the computer-executable instructions to implement the steps of any one of the data clustering method, the image clustering method, or the vehicle image processing method.

According to an eighth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any one of the data clustering method, the image clustering method, or the vehicle image processing method.

According to a ninth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to execute the steps of the above-described data clustering method, the image clustering method, or the vehicle image processing method.

In the embodiments of this specification, a data set to be clustered is obtained, and the data to be clustered are clustered according to the matching probability between any two of them to generate an intermediate clustering result. Next, an expected value is determined for each piece of data to be clustered in the intermediate clustering result, according to the matching probability between any two data to be clustered in that result. The expected value includes a clustering accuracy expected value and/or a clustering splitting degree expected value. Finally, the intermediate clustering result is adjusted according to the expected value to generate a corresponding target clustering result.

When clustering the data to be clustered, the embodiments of this specification do not need to specify the number of clustering results in advance. Data are clustered solely according to the matching probabilities between them. The clustering results are then adjusted in real time according to the clustering accuracy expected value and/or the clustering splitting degree expected value of the data in each clustering result, which helps ensure the accuracy of the clustering results.

Drawings

FIG. 1 is a flowchart of a data clustering method provided by an embodiment of the present specification;

FIG. 2 is a schematic diagram of a data clustering process provided by an embodiment of the present specification;

FIG. 3 is a flowchart illustrating the processing procedure of a data clustering method provided by an embodiment of the present specification;

FIG. 4 is a schematic structural diagram of a data clustering device provided by an embodiment of the present specification;

FIG. 5 is a flowchart of an image clustering method provided by an embodiment of the present specification;

FIG. 6 is a schematic structural diagram of an image clustering device provided by an embodiment of the present specification;

FIG. 7 is a flowchart of a vehicle image processing method provided by an embodiment of the present specification;

FIG. 8 is a schematic structural diagram of a vehicle image processing apparatus provided by an embodiment of the present specification;

FIG. 9 is a block diagram of a computing device provided by an embodiment of the present specification.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It will be understood that, although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of the present description, "first" may also be referred to as "second" and, similarly, "second" may also be referred to as "first". Depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".

First, the terms used in one or more embodiments of the present specification are explained.

Clustering: the process of dividing a collection of physical or abstract objects into classes consisting of similar objects is called clustering. The cluster generated by clustering is a collection of a set of data objects that are similar to objects in the same cluster and distinct from objects in other clusters.

Euclidean distance: the distance between two points in Euclidean space. Here it is used to compute the distance between the feature vectors of a given modality of two objects.

Feature vector: a one-dimensional array computed from an image. The similarity of two images can generally be obtained by computing the Euclidean distance between their feature vectors.

Cosine similarity: evaluates the similarity of two vectors by computing the cosine of the angle between them. Here it is used to compute the similarity between the feature data of a given modality of two objects.

Matching probability: probability that a pair (two) of objects belong to the same cluster class.

Online clustering: near-real-time clustering of streaming data. The delay between the time data arrives and the time its clustering completes must meet a given requirement.

Accuracy: within a cluster, the largest number of objects that belong to the same class, divided by the total number of objects in the cluster.

Degree of splitting: the number of clusters into which objects belonging to the same class are split, divided by 1.

Matching probability: the probability, computed between two objects, that they belong to the same class. For example, after feature vectors are extracted for a pair of images, the Euclidean distance between them is computed. The mapping from Euclidean distance to matching probability can then be obtained statistically from labeled data or from preset mapping rules.
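When ground-truth classes are known, the two evaluation measures defined above can be computed directly. The following Python sketch is illustrative only. The function names, the `true_class` mapping and the sample data are hypothetical. The splitting degree reads the definition's "/1" literally, so it returns the plain count of clusters across which a true class is spread.

```python
from collections import Counter


def cluster_accuracy(members, true_class):
    """Size of the largest same-class group in a cluster divided by the cluster size."""
    counts = Counter(true_class[m] for m in members)
    return max(counts.values()) / len(members)


def splitting_degree(cls, clusters, true_class):
    """Number of clusters that contain at least one object of true class `cls`."""
    # The definition's "/1" is read literally, so the count is returned unchanged.
    return sum(
        1 for members in clusters.values()
        if any(true_class[m] == cls for m in members)
    )


# Hypothetical ground truth: objects 1-5 with true classes "a" and "b".
true_class = {1: "a", 2: "a", 3: "b", 4: "a", 5: "a"}
clusters = {"J1": [1, 2, 3], "J2": [4, 5]}
print(cluster_accuracy(clusters["J1"], true_class))  # 0.666... (2 of 3 objects are "a")
print(splitting_degree("a", clusters, true_class))   # 2 (class "a" spans J1 and J2)
```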

In the present specification, a data clustering method is provided, and the present specification relates to a data clustering device, an image clustering method, an image clustering device, a vehicle image processing method, a vehicle image processing device, a computing apparatus, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following examples.

Fig. 1 shows a flowchart of a data clustering method provided in an embodiment of the present specification, which specifically includes the following steps.

Step 102: acquiring a data set to be clustered, and clustering the data to be clustered according to the matching probability between any two data to be clustered in the data set to be clustered, to generate an intermediate clustering result.

Specifically, the data set to be clustered comprises at least two pieces of data to be clustered. The data to be clustered are item data of a target item. Clustering the item data allows corresponding item processing to be performed according to the clustering result.

When the data to be clustered are images to be clustered, the data set to be clustered is an image set to be clustered.

Current data clustering methods generally require the number of clustering results (clusters) to be predefined, but in the actual application scenario of the target item this number is not necessarily available. For example, if streaming data is generated while the target item runs, it must be clustered in near real time: the delay between the time data arrives and the time its clustering completes must meet a given requirement. Online clustering is therefore generally used. In online clustering, the data to be clustered are generated in real time. As a result, neither the contents of the data set to be clustered nor the clustering results it will produce can be determined in advance. If the number of clustering results is set arbitrarily before clustering, the clustering results will not be accurate enough.

Based on this, the embodiments of the present specification take the following approach. The matching probabilities between each piece of data in the data set to be clustered and its neighboring objects are calculated, and the clustering objective is formalized as a maximum likelihood optimization. The clustering results are then decomposed and decided on by analyzing accuracy and splitting (fragmentation) degree. This meets the category-accuracy requirements of different items while achieving a better splitting degree. With this method, the number of clustering results need not be specified in advance, the method adapts to different online data windows, and the online clustering problem can be solved effectively. The method can be widely applied to various clustering scenarios. Examples include online clustering of pedestrians, motor vehicles and non-motor vehicles at city scale, online clustering of large-scale short videos, dynamic real-time clustering of content published on social platforms, and commodity classification on shopping platforms.

In a specific implementation, clustering the data to be clustered according to the matching probability between any two data to be clustered in the data set to be clustered includes:

performing feature extraction processing on at least two data to be clustered contained in the data set to be clustered to generate corresponding feature extraction results;

determining the matching probability between any two data to be clustered in the at least two data to be clustered based on the feature extraction result;

and clustering the at least two data to be clustered according to the matching probability.

Further, determining a matching probability between any two data to be clustered in the at least two data to be clustered based on the feature extraction result includes:

determining a target Euclidean distance between any two data to be clustered in the at least two data to be clustered based on the feature extraction result;

and determining, according to a preset mapping between Euclidean distance and matching probability, the target matching probability that corresponds to the target Euclidean distance, and using the target matching probability as the matching probability between the two data to be clustered.

Specifically, the data set to be clustered includes at least two data to be clustered, and in the process of clustering the data to be clustered in the data set to be clustered, feature extraction processing can be performed on each data to be clustered to obtain a feature vector corresponding to each data to be clustered, and then the matching probability between any two data to be clustered in the data set to be clustered is determined by calculating the euclidean distance or cosine similarity between the feature vectors of any two data to be clustered, so as to determine whether to cluster the two data to be clustered into the same category according to the matching probability.

A mapping between Euclidean distance and matching probability, or between cosine similarity and matching probability, can be set in advance. For example, Euclidean distances from 0 to 3 are mapped to a matching probability of 0.1, Euclidean distances from 3.1 to 5 are mapped to a matching probability of 0.2, and so on. The specific mapping may be determined according to actual requirements and is not limited herein.
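The following Python code is a minimal sketch of the step that turns a feature distance into a matching probability. It computes the Euclidean distance and cosine similarity between two feature vectors and looks the distance up in a piecewise table. The table simply reuses the illustrative ranges above, treating 0 to 3 as the first range and anything above 3 up to 5 as the second. The names `DISTANCE_TO_PROBABILITY` and `matching_probability`, and the fallback value for distances beyond the table, are assumptions rather than part of the described method. In practice the table would be derived from labeled data or preset rules.

```python
import math

# Hypothetical lookup table built from the illustrative ranges in the text:
# (inclusive upper bound of the Euclidean distance, matching probability).
DISTANCE_TO_PROBABILITY = [(3.0, 0.1), (5.0, 0.2)]


def euclidean_distance(u, v):
    """Euclidean distance between two feature vectors of equal length."""
    return math.dist(u, v)


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def matching_probability(distance, table=DISTANCE_TO_PROBABILITY, default=0.0):
    """Return the probability of the first range whose upper bound covers `distance`."""
    for upper_bound, probability in table:
        if distance <= upper_bound:
            return probability
    return default  # assumption: distances beyond the table map to `default`


d = euclidean_distance([0.0, 0.0], [3.0, 4.0])  # 5.0
print(matching_probability(d))                  # 0.2
```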

After the Euclidean distance or cosine similarity between any two data to be clustered is calculated, it can be mapped to a matching probability. Whether the two data are clustered into the same category is then decided by the magnitude of the resulting target matching probability. For example, if the target matching probability is greater than a preset probability threshold of 50%, the two data to be clustered can be clustered into the same category, generating a corresponding intermediate clustering result.

In practical applications, clustering may generate one intermediate clustering result or at least two.

For example, assume the data to be clustered are data 1, data 2, data 3 and data 4. When data 1, data 2 and data 3 are clustered, the matching probabilities between data 1 and data 2, between data 1 and data 3, and between data 2 and data 3 are all found to be greater than the preset probability threshold. Data 1, data 2 and data 3 are therefore clustered into the same intermediate clustering result. When data 4 is clustered, the three matching probabilities between data 4 and data 1, data 2 and data 3 are calculated. If more than half of them are greater than the preset probability threshold, data 4 is clustered into that intermediate clustering result. Otherwise, data 4 may first be clustered on its own into another intermediate clustering result.
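The majority rule in this example can be sketched as a single greedy pass in Python. This is an illustration built on assumptions, not the patented procedure. Items are processed in the given order, and `prob` is a hypothetical callable that returns the matching probability of two items. The text does not specify a tie-break, so when several clusters qualify, the item joins the one with the highest fraction of matching members.

```python
def cluster_by_majority(items, prob, threshold=0.5):
    """Greedy pass over `items`: each item joins the existing cluster in which
    more than half of the members match it (probability > threshold);
    otherwise it opens a new cluster."""
    clusters = []
    for item in items:
        best, best_ratio = None, 0.5  # the ratio must strictly exceed one half
        for members in clusters:
            hits = sum(prob(item, m) > threshold for m in members)
            ratio = hits / len(members)
            if ratio > best_ratio:
                best, best_ratio = members, ratio
        if best is None:
            clusters.append([item])
        else:
            best.append(item)
    return clusters


# Hypothetical pairwise matching probabilities for data 1-4.
PAIRS = {
    frozenset({1, 2}): 0.9, frozenset({1, 3}): 0.8, frozenset({2, 3}): 0.7,
    frozenset({4, 1}): 0.2, frozenset({4, 2}): 0.3, frozenset({4, 3}): 0.6,
}


def prob(a, b):
    return PAIRS.get(frozenset({a, b}), 0.0)


print(cluster_by_majority([1, 2, 3, 4], prob))  # [[1, 2, 3], [4]]
```

Data 4 matches only one of the three members (data 3), which is not more than half, so it starts a cluster of its own, as in the example.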

In the embodiments of the present specification, the number of clustering results is not preset. During clustering, the Euclidean distance or cosine similarity between any two data to be clustered is calculated and converted into a matching probability. The data are then clustered according to these matching probabilities, which helps ensure the accuracy of the generated clustering results.

Alternatively, during online operation of the target item, target data to be clustered that is generated in real time may need to be clustered online. In that case, the historical data to be clustered of the target item and the real-time target data can together form the data set to be clustered. Each piece of data is then clustered according to the matching probability between any two data to be clustered in the set to generate an intermediate clustering result. This can be implemented by the following steps:

according to a first matching probability between any two pieces of historical data to be clustered in the data set to be clustered, clustering any two pieces of historical data to be clustered to generate an initial clustering result;

determining a second matching probability between the target data to be clustered and each historical data to be clustered in the data set to be clustered;

and updating the initial clustering result according to the second matching probability to generate an intermediate clustering result.

Specifically, the data to be clustered includes historical data to be clustered and target data to be clustered, the historical data to be clustered is data generated in the process that a target item operates in a historical time interval, and the target data to be clustered is data generated in the process that the target item operates in a current time interval.

When target data to be clustered is received, the historical data to be clustered may already have been clustered into corresponding initial clustering results. The historical data are clustered according to the first matching probability between any two pieces of historical data. If the first matching probability between two pieces of historical data is greater than the preset probability threshold, they can be clustered into the same initial clustering result. Alternatively, historical data A can be added to an initial clustering result if its first matching probabilities with more than half of the historical data in that result are greater than the preset probability threshold.

Once the historical data have been clustered into initial clustering results, the target data to be clustered is clustered online. That is, it is determined whether the target data can be clustered into an existing initial clustering result. Specifically, the second matching probability between the target data and each piece of historical data in each initial clustering result is determined. On that basis, the target data is either added to that initial clustering result or clustered into a new initial clustering result, thereby generating the intermediate clustering result.
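The following Python code is a minimal sketch of this online step. It assumes that the initial clustering results are held in a dictionary keyed by integer category label. It also assumes that the target joins a result when more than half of that result's historical members match it, which is the same rule described above for historical data. The function name `assign_online`, the probability callable and the sample values are hypothetical.

```python
def assign_online(target, clusters, prob, threshold=0.5):
    """Add `target` to the labelled initial cluster in which more than half of
    the historical members match it; otherwise open a new cluster labelled with
    the next integer. Returns the label used."""
    best_label, best_ratio = None, 0.5
    for label, members in clusters.items():
        ratio = sum(prob(target, m) > threshold for m in members) / len(members)
        if ratio > best_ratio:
            best_label, best_ratio = label, ratio
    if best_label is None:
        best_label = max(clusters, default=0) + 1  # new initial clustering result
        clusters[best_label] = []
    clusters[best_label].append(target)
    return best_label


# Hypothetical second matching probabilities between data 6 and data 1-5.
second = {1: 0.2, 2: 0.1, 3: 0.3, 4: 0.8, 5: 0.7}
initial = {1: [1, 2, 3], 2: [4, 5]}  # J1 -> label 1, J2 -> label 2
print(assign_online(6, initial, lambda t, h: second[h]))  # 2
print(initial)  # {1: [1, 2, 3], 2: [4, 5, 6]}
```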

Alternatively, a class representative object can be determined for each initial clustering result, namely the historical data located at the central point of that result. The second matching probability between the target data to be clustered and each class representative object is then calculated. This determines whether to add the target data to the corresponding initial clustering result or to cluster it into a new initial clustering result, thereby generating the intermediate clustering result.

For example, suppose the historical data to be clustered include data 1 to data 5 and the target data to be clustered is data 6. Data 1, data 2 and data 3 have been clustered into initial clustering result J1, and data 4 and data 5 into initial clustering result J2. When data 6 needs to be clustered, the matching probabilities between data 6 and each of data 1 to data 5 can first be calculated. These probabilities determine whether data 6 is clustered into J1, into J2, or into a new initial clustering result J3.

Alternatively, the class representative objects of initial clustering results J1 and J2 may be determined first. Suppose the class representative object of J1 is data 1 and that of J2 is data 4. The matching probabilities between data 6 and data 1 and between data 6 and data 4 are then calculated. These determine whether to cluster data 6 into J1, into J2, or into a new initial clustering result J3, thereby updating the initial clustering results and generating the intermediate clustering result.
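The class-representative variant can be sketched in Python as follows. The text only says the representative is the historical data "at the central point". This sketch assumes a medoid: the member with the smallest total Euclidean distance to the other members. It also assumes that the target joins the representative with the highest probability when that probability exceeds the threshold. For illustration only, it uses a hypothetical probability function, `1 / (1 + distance)`, over hypothetical two-dimensional features. With these sample features, the medoids of J1 and J2 come out as data 1 and data 4, matching the example.

```python
import math


def medoid(members, features):
    """Member with the smallest total Euclidean distance to the other members
    (an assumed reading of 'the data at the central point')."""
    return min(
        members,
        key=lambda m: sum(math.dist(features[m], features[o]) for o in members),
    )


def assign_by_representative(target, clusters, features, prob, threshold=0.5):
    """Compare `target` only with each cluster's representative; join the best
    one if its probability exceeds `threshold`, else open a new cluster."""
    reps = {label: medoid(members, features) for label, members in clusters.items()}
    scores = {label: prob(target, rep) for label, rep in reps.items()}
    best = max(scores, key=scores.get, default=None)
    if best is None or scores[best] <= threshold:
        best = max(clusters, default=0) + 1
        clusters[best] = []
    clusters[best].append(target)
    return best


# Hypothetical features for data 1-6 and a hypothetical distance-to-probability map.
features = {1: (0.0, 0.0), 2: (1.0, 0.0), 3: (-1.0, 0.0),
            4: (10.0, 0.0), 5: (11.0, 0.0), 6: (10.5, 0.0)}


def prob(a, b):
    return 1.0 / (1.0 + math.dist(features[a], features[b]))


clusters = {1: [1, 2, 3], 2: [4, 5]}  # J1 -> label 1, J2 -> label 2
label = assign_by_representative(6, clusters, features, prob)
print(label)  # 2
```

Comparing only against representatives costs one probability evaluation per initial clustering result instead of one per historical item, at the price of ignoring the rest of each cluster.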

The embodiments of the present specification do not preset the number of clustering results. During clustering, the historical data to be clustered are first clustered into initial clustering results. Then, when new target data must be clustered online, the matching probabilities between the target data and the historical data in each initial clustering result are calculated. These probabilities determine whether the target data can be clustered into an existing initial clustering result, which helps ensure the accuracy of the generated clustering results.

Wherein updating the initial clustering result according to the second matching probability comprises:

determining a first category corresponding to the target data to be clustered according to the second matching probability;

and updating the initial clustering result according to the first category and the second category corresponding to the initial clustering result.

Specifically, when online clustering is performed on target data to be clustered in the data set to be clustered, an initial category label may be set for the target data to be clustered, then a second matching probability between the target data to be clustered and each historical data to be clustered in each initial clustering result is calculated, and the initial clustering results are updated according to the second matching probability.

In the process of updating the initial clustering result according to the second matching probability, a first category corresponding to the target data to be clustered can be determined according to the following formula:

wherein i represents the target data to be clustered, namely the ith target data to be clustered in the data set to be clustered; t represents the number of initial clustering results; j represents target historical data to be clustered, namely historical data to be clustered whose second matching probability with the target data to be clustered is greater than a preset probability threshold. The remaining terms of the formula represent, respectively: the initial clustering result containing j; KNN(i), the categories of the K nearest neighbors of i, namely the categories of the target historical data to be clustered; the initial category label of the ith object; the category label of j, i.e., the second category; the event that the category labels of i and j are consistent, and its probability, namely the second matching probability; the event that the category labels of i and j are inconsistent, and its probability; and the weights corresponding to the consistent and inconsistent terms, which are constants.

In practical applications, the second matching probabilities, the category labels and the weights can therefore be input into the formula to obtain the first category corresponding to the target data to be clustered.
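The exact formula is not given in this text, so the following Python sketch only illustrates the kind of computation described, under a stated assumption: a weighted log-likelihood in which each target historical data j contributes its weighted log matching probability when the candidate category equals j's label, and its weighted log mismatching probability otherwise. The candidates are the neighbors' labels plus a new category. The function name `first_category`, the weights `w_same`/`w_diff` and the `None` marker for a new category are all hypothetical.

```python
import math
from typing import Dict, Hashable, List, Optional

NEW_CATEGORY = None  # hypothetical marker meaning "open a new initial clustering result"


def first_category(
    probs: Dict[int, float],        # second matching probability of target i with each historical item j
    labels: Dict[int, Hashable],    # second category (cluster label) of each historical item j
    threshold: float,               # preset probability threshold
    w_same: float = 1.0,            # assumed constant weight of the "labels consistent" term
    w_diff: float = 1.0,            # assumed constant weight of the "labels inconsistent" term
    eps: float = 1e-12,
) -> Optional[Hashable]:
    # Target historical data: items whose second matching probability exceeds the threshold.
    neighbors = [j for j, p in probs.items() if p > threshold]
    # Candidate categories: labels seen among the neighbors, plus a brand-new category.
    candidates: List[Optional[Hashable]] = sorted({labels[j] for j in neighbors}, key=str)
    candidates.append(NEW_CATEGORY)
    best, best_score = NEW_CATEGORY, -math.inf
    for c in candidates:
        score = 0.0
        for j in neighbors:
            p = min(max(probs[j], eps), 1.0 - eps)  # clamp to avoid log(0)
            if c is not NEW_CATEGORY and c == labels[j]:
                score += w_same * math.log(p)        # labels of i and j consistent
            else:
                score += w_diff * math.log(1.0 - p)  # labels of i and j inconsistent
        if score > best_score:
            best, best_score = c, score
    return best
```

With equal weights a new category can only win if no neighbor passes the threshold; lowering `w_diff` relative to `w_same` makes opening a new clustering result more likely.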

And then determining whether to cluster the target data to be clustered to a certain initial clustering result or cluster the target data to be clustered to a new initial clustering result by judging whether the first category is consistent with the second category corresponding to each initial clustering result, so as to update the initial clustering result.

In practical applications, the second category (category label) corresponding to the initial clustering result may be determined according to the generation sequence of the initial clustering results, for example, the first initial clustering result is obtained by clustering first, the category of the first initial clustering result is category 1, the second initial clustering result is obtained by clustering next, the category of the second initial clustering result is category 2, and so on. The initial category label is added to the historical data to be clustered, and the initial category label can be realized according to the arrangement sequence of the data in the data set to be clustered, for example, the category of the data 1 is category 1, the category of the data 2 is category 2, and so on.

Continuing the above example, data 1, data 2 and data 3 are grouped into initial clustering result J1, whose category label is category 1, and data 4 and data 5 are grouped into initial clustering result J2, whose category label is category 2. When data 6 needs to be clustered, the second matching probabilities between data 6 and each of data 1, data 2, data 3, data 4 and data 5 are calculated. If the calculation shows that the second matching probabilities between data 6 and data 1, data 2 and data 5 are greater than the preset probability threshold, data 1, data 2 and data 5 are determined to be the target historical data to be clustered.

Then, a first category of data 6 can be determined based on the second matching probability between data 6 and data 1 and the formula. If the first category of data 6 is category 1, data 6 is clustered to initial clustering result J1. Otherwise, the first category of data 6 is determined based on the second matching probability between data 6 and data 2 and the formula; if it is category 1, data 6 is clustered to initial clustering result J1. Otherwise, the first category of data 6 is determined based on the second matching probability between data 6 and data 5 and the formula; if it is category 2, data 6 is clustered to initial clustering result J2. If it is not category 2 either, data 6 is clustered to a new initial clustering result J3.
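The sequential check in the data 6 example can be sketched as follows. The per-neighbor first-category computation is left as a caller-supplied function, because the formula itself is not reproduced here; the names `assign_sequentially` and `first_category_for` are hypothetical.

```python
from typing import Callable, Dict, Hashable, List


def assign_sequentially(
    neighbors: List[int],                           # target historical data, in order (e.g. data 1, 2, 5)
    cluster_of: Dict[int, str],                     # initial clustering result containing each item
    label_of: Dict[str, Hashable],                  # second category of each initial clustering result
    first_category_for: Callable[[int], Hashable],  # hypothetical: first category of the target computed from neighbor j
    new_cluster_name: str,                          # name for a new initial clustering result (e.g. "J3")
) -> str:
    for j in neighbors:
        cluster = cluster_of[j]
        # Cluster the target into the result containing j if its first category matches that result's category.
        if first_category_for(j) == label_of[cluster]:
            return cluster
    # No neighbor produced a matching category: open a new initial clustering result.
    return new_cluster_name
```

For example, with J1 = {data 1, 2, 3} labelled 1 and J2 = {data 4, 5} labelled 2, a `first_category_for` that yields category 2 only for data 5 places data 6 in J2, as in the walkthrough.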

Alternatively, to reduce clustering complexity, in the embodiment of the present specification the class representative object of each initial clustering result may be determined when calculating the second matching probability. The second matching probability between the target data to be clustered and each class representative object is then calculated, and k neighbor objects of the target data to be clustered are screened from the class representative objects according to the second matching probability, that is, k class representative objects whose second matching probability with the target data to be clustered is greater than the preset probability threshold are used as its neighbor objects. The first category corresponding to the target data to be clustered is then determined according to the second matching probabilities between the neighbor objects and the target data to be clustered.

In addition, when at least two pieces of target data to be clustered exist simultaneously, the second matching probability between any two pieces of target data to be clustered and the second matching probability between each piece of target data to be clustered and each class representative object can be calculated. For any piece of target data to be clustered, its k neighbor objects can be screened from the class representative objects and the other pieces of target data to be clustered according to these second matching probabilities. The first category corresponding to the target data to be clustered is then determined according to the second matching probabilities between the neighbor objects and the target data to be clustered, and the corresponding clustering is performed according to the first category.
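A minimal sketch of the representative-based neighbor search, assuming (hypothetically, since the text does not say how a class representative object is chosen) that the representative is the member with the highest total matching probability to the other members. The candidate pool passed to `k_neighbors` can hold class representative objects only, or, in the batch case, representatives plus the other target data. Both function names are illustrative.

```python
from typing import Dict, Hashable, List, Tuple


def class_representative(members: List[int], match: Dict[Tuple[int, int], float]) -> int:
    # Hypothetical rule: pick the member with the highest summed matching probability
    # to the other members (a medoid-style representative). Pairs are keyed (smaller, larger).
    def total(m: int) -> float:
        return sum(match[(min(m, o), max(m, o))] for o in members if o != m)
    return max(members, key=total)


def k_neighbors(target_probs: Dict[Hashable, float], k: int, threshold: float) -> List[Hashable]:
    # Keep candidates whose second matching probability exceeds the threshold,
    # then take the k with the highest probability.
    above = [(c, p) for c, p in target_probs.items() if p > threshold]
    above.sort(key=lambda cp: cp[1], reverse=True)
    return [c for c, _ in above[:k]]
```

Comparing each new item against one representative per cluster, rather than every historical item, keeps the cost proportional to the number of clusters, which matches the stated goal of reducing clustering complexity.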

In the embodiment of the present specification, the number of clustering results of data to be clustered is not preset, and in the process of performing online clustering on new target data to be clustered, whether target data to be clustered can be clustered to any initial clustering result can be determined by calculating the matching probability between the target data to be clustered and historical data to be clustered in each initial clustering result, or by calculating the matching probability between the target data to be clustered and class representative objects in each initial clustering result, which is beneficial to ensuring the accuracy of generated clustering results.

Further, updating the initial clustering result according to the first category and a second category corresponding to the initial clustering result, including:

adding the target data to be clustered to a first initial clustering result under the condition that the first category is consistent with a second category corresponding to the first initial clustering result, wherein the first initial clustering result is one of the initial clustering results;

determining a first class representative object of the first initial clustering result, and determining a third matching probability between the first class representative object and first historical data to be clustered, wherein the first historical data to be clustered belongs to a second initial clustering result, and the second initial clustering result is one of the initial clustering results;

and updating the initial clustering result according to the third matching probability.

Or, under the condition that the first category is inconsistent with the second category, clustering the target data to be clustered together with second historical data to be clustered in the first initial clustering result to generate a third initial clustering result, wherein a second matching probability between the target data to be clustered and the second historical data to be clustered is greater than a preset probability threshold;

determining a third class representative object of the third initial clustering result, and determining a fourth matching probability between the third class representative object and each historical data to be clustered in the first initial clustering result and/or the second initial clustering result;

and updating the initial clustering result according to the fourth matching probability.

Specifically, historical data to be clustered are clustered to generate a first initial clustering result and a second initial clustering result. When the category of the target data to be clustered is determined to be consistent with the category corresponding to the first initial clustering result, the target data to be clustered can be clustered to the first initial clustering result. The class representative object of the first initial clustering result is then re-determined, and a third matching probability between this class representative object and each piece of historical data to be clustered in the second initial clustering result is calculated to determine whether, now that the target data to be clustered has joined the first initial clustering result, any of that historical data also needs to be clustered to the first initial clustering result, so as to update the initial clustering result.

Or, when the category of the target data to be clustered is determined to be inconsistent with the category corresponding to the first initial clustering result, the historical data to be clustered in the first initial clustering result whose second matching probability with the target data to be clustered is greater than the preset probability threshold are determined, and these historical data and the target data to be clustered are clustered into a third initial clustering result.

Then, the class representative object of the third initial clustering result is determined, and the fourth matching probability between this class representative object and the historical data to be clustered in the first initial clustering result and/or the second initial clustering result is calculated to determine whether other historical data to be clustered in those results need to be clustered to the newly generated third initial clustering result, so as to update the initial clustering result.

Continuing the above example, a first category of data 6 can be determined based on the second matching probability between data 6 and data 1 and the formula. If the first category of data 6 is category 1, data 6 is clustered to initial clustering result J1. The class representative object of J1 is then re-determined, and a third matching probability between this class representative object and data 4 or data 5 is calculated to determine whether, with data 6 added to J1, data 4 or data 5 also needs to be clustered to initial clustering result J1.

Specifically, when the third matching probability is greater than the preset probability threshold, data 4 or data 5 can be clustered to initial clustering result J1, so that the initial clustering result is updated.

If the first category of data 6 is determined not to be category 1 and the second matching probability between data 6 and data 1 is greater than the preset probability threshold, data 1 and data 6 may be clustered to initial clustering result J3. The class representative object of J3 is then determined, and the matching probabilities between this class representative object and data 2, data 3, data 4 and data 5 are calculated to determine whether data 2, data 3, data 4 and data 5 need to be clustered to J3, so as to update the initial clustering result.
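Both update branches end the same way: re-determine the class representative object of the cluster that just changed (J1 after adding data 6, or the new J3) and pull in data from the other initial clustering results whose matching probability with that representative exceeds the threshold. A minimal sketch, in which the representative rule and the matching-probability function are hypothetical caller-supplied callables:

```python
from typing import Callable, Dict, Set


def absorb(
    cluster: Set[int],                          # updated initial clustering result (e.g. J1 with data 6, or new J3)
    others: Dict[str, Set[int]],                # the other initial clustering results, updated in place
    representative: Callable[[Set[int]], int],  # hypothetical rule choosing the class representative object
    match: Callable[[int, int], float],         # matching probability between two items
    threshold: float,                           # preset probability threshold
) -> None:
    rep = representative(cluster)  # re-determine the representative after the update
    for members in others.values():
        # Data whose third/fourth matching probability with the representative exceeds the threshold.
        moved = {x for x in members if match(rep, x) > threshold}
        members -= moved  # remove them from their old result (in-place set update)
        cluster |= moved  # and add them to the updated result
```

This sketch leaves an emptied clustering result in `others`; whether such results are dropped is not specified in the text.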

In the embodiment of the specification, the number of clustering results of data to be clustered is not preset, and in the process of performing online clustering on new target data to be clustered, the target data to be clustered is clustered according to the category corresponding to the target data to be clustered and the category corresponding to each initial clustering result, so that the initial clustering results are updated, and the accuracy of the clustering results is guaranteed.

Step 104, determining an expected value corresponding to each data to be clustered in the intermediate clustering result according to the matching probability between any two data to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value.

In specific implementation, determining an expected value corresponding to each data to be clustered in the intermediate clustering result according to the matching probability between any two data to be clustered in the intermediate clustering result includes:

determining the matching probability between the ith data to be clustered and each data to be clustered in the target intermediate clustering result, wherein the target intermediate clustering result is any one of the at least two intermediate clustering results;

according to the matching probability, determining a first probability that the ith data to be clustered belongs to the target intermediate clustering result and a second probability that the ith data to be clustered does not belong to the target intermediate clustering result;

under the condition that the ith data to be clustered is divided into the target intermediate clustering result, determining a first accuracy and a first splitting degree corresponding to the case in which the ith data to be clustered belongs to the target intermediate clustering result, and a second accuracy and a second splitting degree corresponding to the case in which it does not belong to the target intermediate clustering result;

determining a first clustering accuracy expected value corresponding to the ith data to be clustered based on the first probability, the second probability, the first accuracy and the second accuracy;

and determining a first clustering splitting degree expected value corresponding to the ith data to be clustered based on the first probability, the second probability, the first splitting degree and the second splitting degree.

Further, under the condition that the ith data to be clustered is not divided into the target intermediate clustering result, determining a third accuracy and a third splitting degree corresponding to the case in which the ith data to be clustered belongs to the target intermediate clustering result, and a fourth accuracy and a fourth splitting degree corresponding to the case in which it does not belong to the target intermediate clustering result;

determining a second clustering accuracy expected value corresponding to the ith data to be clustered based on the first probability, the second probability, the third accuracy and the fourth accuracy;

and determining a second clustering splitting degree expected value corresponding to the ith data to be clustered based on the first probability, the second probability, the third splitting degree and the fourth splitting degree.

Specifically, after clustering is performed on each data to be clustered to generate at least two intermediate clustering results, the data to be clustered included in the intermediate clustering results can be further adjusted to ensure the accuracy of the clustering results.

For each intermediate clustering result, if the intermediate clustering result contains one or at least two pieces of data to be clustered, determining a clustering accuracy expected value and/or a clustering split degree expected value corresponding to each piece of data to be clustered when the data to be clustered is clustered to the intermediate clustering result, and determining whether the clustering result of the data to be clustered needs to be adjusted or not according to the clustering accuracy expected value and/or the clustering split degree expected value, that is, whether the data to be clustered needs to be adjusted from the intermediate clustering result to other intermediate clustering results or not.

The splitting degree represents how data to be clustered that belong to the same category are distributed over different clustering results. For example, if all data to be clustered belonging to the same category are clustered to the same clustering result, the splitting degree of each data to be clustered in that category is 1; if they are clustered to two different clustering results, the splitting degree of each of them is 2; and so on.
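The definition of splitting degree can be sketched directly. This assumes reference category labels are available for the data (as in an offline evaluation); the function name is illustrative.

```python
from collections import defaultdict
from typing import Dict, Hashable, Set


def splitting_degree(category: Dict[int, Hashable], cluster: Dict[int, Hashable]) -> Dict[int, int]:
    # Collect the distinct clustering results that each category is spread over.
    spread: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for item, cat in category.items():
        spread[cat].add(cluster[item])
    # Every item of a category receives that category's splitting degree.
    return {item: len(spread[cat]) for item, cat in category.items()}
```

For instance, if category "a" sits entirely in J1 while category "b" is spread over J1 and J2, items of "a" get degree 1 and items of "b" get degree 2.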

In the embodiment of the present specification, a target intermediate clustering result in at least two intermediate clustering results is taken as an example, and an ith data to be clustered in the target intermediate clustering result is taken as an example, to explain an adjustment process of the target intermediate clustering result.

The method comprises the steps of firstly determining the matching probability between the ith data to be clustered and each piece of other data to be clustered in a target intermediate clustering result, and then determining the first probability that the ith data to be clustered belongs to the target intermediate clustering result and the second probability that the ith data to be clustered does not belong to the target intermediate clustering result according to the matching probability.

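Wherein the first probability p can be calculated by a formula over the matching probabilities between the ith data to be clustered and each other data to be clustered in the target intermediate clustering result and the corresponding mismatching probabilities; the second probability is 1-p.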

Assume that the target intermediate clustering result includes data 1, data 2 and data 3 and that the ith data to be clustered is data 1, where the matching probability between data 1 and data 2 is 0.6 (mismatching probability 0.4) and the matching probability between data 1 and data 3 is 0.7 (mismatching probability 0.3). Based on this, the first probability that data 1 belongs to the target intermediate clustering result and the second probability that data 1 does not belong to it can be determined by inputting 0.6, 0.4, 0.7 and 0.3 into the calculation of the first probability p.
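The calculation formula for p is not reproduced in this text. Purely as an illustration, the sketch below assumes a naive-Bayes-style combination that treats each pairwise match as independent evidence and normalizes "matches every member" against "matches no member"; this rule is an assumption, not the confirmed formula.

```python
import math
from typing import List


def first_probability(match_probs: List[float]) -> float:
    # Assumed combination rule (not confirmed by the text). Expects each probability
    # strictly between 0 and 1 so the denominator is non-zero.
    belong = math.prod(match_probs)                     # e.g. 0.6 * 0.7
    not_belong = math.prod(1.0 - q for q in match_probs)  # e.g. 0.4 * 0.3
    return belong / (belong + not_belong)
```

Under this assumption, the worked example gives p = 0.42 / (0.42 + 0.12), roughly 0.78, and a second probability 1-p of roughly 0.22.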

A clustering splitting degree expected value and a clustering accuracy expected value may then be calculated based on the first probability and the second probability.

In practical application, if the ith data to be clustered is clustered to a target intermediate clustering result, the probability that the ith data to be clustered belongs to the target intermediate clustering result is p, the corresponding clustering accuracy Z1 is 1.0, and the splitting degree F1 is 1.0; the probability that the ith data to be clustered does not belong to the target intermediate clustering result is 1-p, the corresponding clustering accuracy Z2 is (n-1)/n, and the splitting degree F2 is 1.0.

Based on this, if the ith data to be clustered is clustered to the target intermediate clustering result, the corresponding first clustering accuracy expected value E1 = p × Z1 + (1-p) × Z2 = p + (1-p) × (n-1)/n, and the first clustering splitting degree expected value E2 = p × F1 + (1-p) × F2 = 1.

If the ith data to be clustered is clustered to a new intermediate clustering result, namely the ith data to be clustered is not classified to a target intermediate clustering result, the probability that the ith data to be clustered belongs to the target intermediate clustering result is p, the corresponding clustering accuracy Z3 is 1.0, and the splitting degree F3 is 2.0; the probability that the ith data to be clustered does not belong to the target intermediate clustering result is 1-p, the corresponding clustering accuracy Z4 is 1, and the splitting degree F4 is 1.0.

Based on this, if the ith data to be clustered is not clustered to the target intermediate clustering result, the corresponding second clustering accuracy expected value E3 = p × Z3 + (1-p) × Z4 = 1, and the second clustering splitting degree expected value E4 = p × F3 + (1-p) × F4 = 2p + (1-p) = 1 + p.

After the clustering accuracy expected value and the clustering splitting degree expected value are calculated, whether the clustering result of the ith data to be clustered needs to be adjusted can be determined from their values. In practical applications, if either the clustering accuracy expected value or the clustering splitting degree expected value is smaller than a preset threshold, the ith data to be clustered can be separated out to form a new class on its own, so that each intermediate clustering result is adjusted and the corresponding target clustering result is generated.
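The four expected values follow directly from the accuracies Z1 to Z4 and splitting degrees F1 to F4 listed above. The sketch assumes n is the number of data in the target intermediate clustering result (including the ith data), which the text does not define explicitly. The decision rule is one literal reading of the paragraph above, applying the thresholds to the values for keeping the ith data in place; the threshold parameters and function names are hypothetical.

```python
from typing import Dict


def expected_values(p: float, n: int) -> Dict[str, float]:
    # p: first probability that the ith data belongs to the target intermediate clustering result.
    # n: assumed number of data in the target intermediate clustering result (including i).
    z1, z2, f1, f2 = 1.0, (n - 1) / n, 1.0, 1.0  # ith data kept in the target result
    z3, z4, f3, f4 = 1.0, 1.0, 2.0, 1.0          # ith data moved to a new result
    return {
        "E1": p * z1 + (1 - p) * z2,  # first clustering accuracy expected value
        "E2": p * f1 + (1 - p) * f2,  # first clustering splitting degree expected value
        "E3": p * z3 + (1 - p) * z4,  # second clustering accuracy expected value
        "E4": p * f3 + (1 - p) * f4,  # second clustering splitting degree expected value (1 + p)
    }


def should_split(e: Dict[str, float], acc_threshold: float, split_threshold: float) -> bool:
    # Hypothetical reading of the rule: form a new class when either expected value
    # for keeping the ith data in place falls below its preset threshold.
    return e["E1"] < acc_threshold or e["E2"] < split_threshold
```

For p = 0.75 and n = 4 this gives E1 = 0.9375, E2 = 1, E3 = 1 and E4 = 1.75, so splitting off the item costs splitting degree while keeping it costs accuracy, which is the tension the method balances.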

Step 106, adjusting the intermediate clustering result according to the expected value to generate a corresponding target clustering result.

Specifically, as described above, after the clustering accuracy expected value and the clustering splitting degree expected value are calculated, whether the clustering result of the ith data to be clustered needs to be adjusted can be determined according to their values.

In the actual clustering process, the clustering accuracy and the clustering splitting degree of a clustering result are always in tension: high clustering accuracy comes with a high clustering splitting degree.

A schematic diagram of a data clustering process provided in an embodiment of the present specification is shown in fig. 2. In fig. 2, the historical data to be clustered are the clustered data, and the online data are the target data to be clustered. When the online data D_t or D_{t+1} are clustered online, batch clustering is performed first and the class representative objects of each clustering result are determined. The neighbor objects of the class representative objects are then found among the clustered data, and maximum-likelihood probabilistic clustering optimization is performed on the class representative objects and their neighbor objects to obtain the corresponding probabilistic clustering results. After accuracy evaluation, a splitting decision is made for each probabilistic clustering result, and the clustering results of the clustered data are updated according to the decision to generate the target clustering result.

Specifically, the batch data in the acquired online data can first be clustered in batch, which reduces the computation of the subsequent processing steps. Alternatively, each piece of data can be taken directly as a class and its neighbor objects determined directly. Batch clustering can use either a traditional clustering method or a probabilistic clustering method, with accuracy given priority.

In the embodiment of the specification, probabilistic clustering does not require the number of categories to be specified, and regardless of the size of the batch data, as long as batch clustering keeps accuracy high, the online clustering result approximates the maximum-likelihood probabilistic objective over the full data. In addition, objects are split based on the accuracy estimate of the probabilistic clustering result, so that the splitting degree is optimized as far as possible under the category-accuracy requirements of different projects. An incremental clustering optimization between not-yet-clustered data and already-clustered data is also designed, so that the computation in the optimization process depends on the amount of not-yet-clustered data and does not grow as data accumulate, while the incremental clustering effect stays close to that of full offline clustering.

In the city brain scenario, the traffic flow and people flow obtained from video streams must be analyzed continuously and clustered (document aggregation); the latency requirement is at most 1 day, and the shorter the better. Applied to the city brain scenario, the data clustering method provided by the embodiment of the specification can achieve second-level response, which makes the product highly competitive.

In an embodiment of the present specification, a data set to be clustered is obtained, any two data to be clustered in the data set to be clustered are clustered according to a matching probability between any two data to be clustered, an intermediate clustering result is generated, an expected value corresponding to each data to be clustered in the intermediate clustering result is determined according to the matching probability between any two data to be clustered in the intermediate clustering result, where the expected value includes an expected value of clustering accuracy and/or an expected value of clustering splitting degree, and the intermediate clustering result is adjusted according to the expected value, so as to generate a corresponding target clustering result.

The following description further describes the data clustering method by taking an application of the data clustering method provided in this specification in an image clustering scene as an example, with reference to fig. 3. Fig. 3 shows a flowchart of a processing procedure of a data clustering method provided in an embodiment of the present specification, which specifically includes the following steps.

Step 302, according to a first matching probability between any two historical images to be clustered, clustering any two historical images to be clustered to generate an initial clustering result.

Step 304, determining a second matching probability between the target image to be clustered in the image set to be clustered and each historical image to be clustered.

Step 306, determining a first category corresponding to the target image to be clustered according to the second matching probability.

Step 308, adding the target image to be clustered to the first initial clustering result under the condition that the first category is consistent with the second category corresponding to the first initial clustering result.

Step 310, determining a first class representative object of the first initial clustering result, and determining a third matching probability between the first class representative object and a first historical image to be clustered, wherein the first historical image to be clustered belongs to a second initial clustering result, and the second initial clustering result is one of the initial clustering results.

Step 312, updating the initial clustering result according to the third matching probability to generate at least two intermediate clustering results.

Step 314, under the condition that the first category is inconsistent with the second category, clustering the target image to be clustered together with a second historical image to be clustered in the first initial clustering result to generate a third initial clustering result, wherein the second matching probability between the target image to be clustered and the second historical image to be clustered is greater than a preset probability threshold.

Step 316, determining a third class representative object of the third initial clustering result, and determining a fourth matching probability between the third class representative object and each historical image to be clustered in the first initial clustering result and/or the second initial clustering result.

Step 318, updating the initial clustering result according to the fourth matching probability to generate at least two intermediate clustering results.

Step 320, determining a clustering accuracy expected value and/or a clustering splitting degree expected value corresponding to each image to be clustered in each intermediate clustering result according to the matching probability between any two images to be clustered in each intermediate clustering result.

Step 322, adjusting each intermediate clustering result according to the clustering accuracy expected value and/or the clustering splitting degree expected value to generate a corresponding target clustering result.

In the process of clustering the images to be clustered, the embodiment of the present specification does not need to specify the number of clustering results, and only clusters the images to be clustered according to the matching probability among the images to be clustered, and adjusts the clustering results according to the expected value of the clustering accuracy and/or the expected value of the clustering split degree corresponding to the images to be clustered in each clustering result in real time, which is beneficial to ensuring the accuracy of the clustering results.

Corresponding to the above method embodiment, the present specification further provides an embodiment of a data clustering device, and fig. 4 shows a schematic structural diagram of the data clustering device provided in an embodiment of the present specification. As shown in fig. 4, the apparatus includes:

an obtaining module 402, configured to obtain a data set to be clustered, and perform clustering processing on any two data to be clustered according to a matching probability between any two data to be clustered in the data set to be clustered, so as to generate an intermediate clustering result;

a determining module 404, configured to determine an expected value corresponding to each to-be-clustered data in the intermediate clustering result according to a matching probability between any two to-be-clustered data in the intermediate clustering result, where the expected value includes a clustering accuracy expected value and/or a clustering splitting degree expected value;

and the adjusting module 406 is configured to adjust the intermediate clustering result according to the expected value, so as to generate a corresponding target clustering result.

Optionally, the obtaining module 402 is further configured to:

under the condition that the first category is consistent with a second category corresponding to a first initial clustering result, adding the target data to be clustered to the first initial clustering result, wherein the first initial clustering result is one of the initial clustering results;

Optionally, the obtaining module 402 is further configured to:

under the condition that the first category is inconsistent with the second category, clustering the target data to be clustered together with second historical data to be clustered in the first initial clustering result to generate a third initial clustering result, wherein a second matching probability between the target data to be clustered and the second historical data to be clustered is greater than a preset probability threshold;

Optionally, the determining module 404 is further configured to:

determining the matching probability between the ith data to be clustered and each data to be clustered in the target intermediate clustering result, wherein the target intermediate clustering result is any one of the intermediate clustering results;

Optionally, the determining module 404 is further configured to:

when the i-th data to be clustered is not assigned to the target intermediate clustering result, determine a third accuracy and a third splitting degree for the case in which the i-th data to be clustered belongs to the target intermediate clustering result, and a fourth accuracy and a fourth splitting degree for the case in which it does not belong to the target intermediate clustering result;
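One possible reading of the third/fourth accuracy and splitting degree is sketched below; the formulas are assumptions for illustration, not the embodiment's definitions. Given the matching probabilities between the i-th data and each member of the target intermediate clustering result, the "belongs" hypothesis counts each link as correct with probability p and splits nothing, while the "does not belong" hypothesis counts each separation as correct with probability 1 - p and splits a true match with probability p. The weighting parameter `split_weight` and the move rule are likewise hypothetical.

```python
from statistics import fmean
from typing import Sequence, Tuple


def placement_expectations(
    probs: Sequence[float],
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """probs[j]: matching probability between the i-th data and member j of the
    target cluster (must be non-empty). Returns ((acc_in, split_in), (acc_out, split_out))."""
    p = fmean(probs)
    # Hypothesis "belongs" (third accuracy / splitting degree):
    # each link is correct with probability p_ij and no true match is split off.
    acc_in, split_in = p, 0.0
    # Hypothesis "does not belong" (fourth accuracy / splitting degree):
    # each separation is correct with probability 1 - p_ij, and a true
    # match is split off with probability p_ij.
    acc_out, split_out = 1.0 - p, p
    return (acc_in, split_in), (acc_out, split_out)


def should_move(probs: Sequence[float], split_weight: float = 1.0) -> bool:
    """Hypothetical rule: move the i-th data into the target cluster when the
    'belongs' hypothesis scores higher on accuracy minus weighted splitting."""
    (acc_in, split_in), (acc_out, split_out) = placement_expectations(probs)
    return acc_in - split_weight * split_in > acc_out - split_weight * split_out
```

With `split_weight = 1.0` the rule reduces to moving the data when the mean matching probability exceeds 1/3; a larger weight penalizes splitting more and makes moves easier.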

Optionally, the data set to be clustered includes an image set to be clustered.

The foregoing is an illustrative scheme of the data clustering apparatus of this embodiment. It should be noted that the technical solution of the data clustering apparatus and that of the data clustering method belong to the same concept; for details of the data clustering apparatus not described here, refer to the description of the data clustering method.

Fig. 5 is a flowchart illustrating an image clustering method according to an embodiment of the present disclosure, which specifically includes the following steps.

Step 502, acquiring an image set to be clustered, and clustering any two images to be clustered according to the matching probability between any two images to be clustered in the image set to be clustered to generate an intermediate clustering result.

Step 504, determining an expected value corresponding to each image to be clustered in the intermediate clustering result according to the matching probability between any two images to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value.

Step 506, adjusting the intermediate clustering result according to the expected value to generate a corresponding target clustering result.
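For images, the claims indicate that the matching probability is determined from feature extraction results. A minimal sketch, assuming the features are fixed-length embedding vectors and that a logistic function of cosine similarity serves as the probability, is shown below; the function names and the `scale` and `bias` parameters are hypothetical, and the embodiment's actual feature extractor and mapping are not given in this excerpt.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors; returns 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def matching_probability(
    u: Sequence[float],
    v: Sequence[float],
    scale: float = 10.0,  # hypothetical steepness of the mapping
    bias: float = 0.5,  # hypothetical similarity at which the probability is 0.5
) -> float:
    """Map feature similarity into (0, 1) with a logistic function."""
    s = cosine_similarity(u, v)
    return 1.0 / (1.0 + math.exp(-scale * (s - bias)))
```

In practice `scale` and `bias` would be calibrated on labeled image pairs so that the output behaves like a probability.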

Corresponding to the above method embodiment, the present specification further provides an image clustering device embodiment, and fig. 6 shows a schematic structural diagram of an image clustering device provided in an embodiment of the present specification. As shown in fig. 6, the apparatus includes:

the clustering module 602 is configured to acquire an image set to be clustered, perform clustering processing on any two images to be clustered according to the matching probability between any two images to be clustered in the image set to be clustered, and generate an intermediate clustering result;

a determining module 604, configured to determine an expected value corresponding to each image to be clustered in the intermediate clustering result according to the matching probability between any two images to be clustered in the intermediate clustering result, where the expected value includes a clustering accuracy expected value and/or a clustering splitting degree expected value;

a generating module 606 configured to adjust the intermediate clustering result according to the expected value, and generate a corresponding target clustering result.

The above is an illustrative scheme of the image clustering apparatus of this embodiment. It should be noted that the technical solution of the image clustering apparatus and that of the image clustering method belong to the same concept; for details of the image clustering apparatus not described here, refer to the description of the image clustering method.

Fig. 7 is a flowchart illustrating a vehicle image processing method according to an embodiment of the present disclosure, which includes the following steps.

Step 702, obtaining a vehicle image set to be clustered, and clustering any two vehicle images to be clustered according to the matching probability between any two vehicle images to be clustered in the vehicle image set to be clustered to generate an intermediate clustering result.

Step 704, determining an expected value corresponding to each vehicle image to be clustered in the intermediate clustering result according to the matching probability between any two vehicle images to be clustered in the intermediate clustering result, wherein the expected value comprises a clustering accuracy expected value and/or a clustering splitting degree expected value.

Step 706, adjusting the intermediate clustering result according to the expected value, and generating a corresponding target clustering result.

Step 708, determining the motion trajectory of the target vehicle according to the target clustering result that contains the vehicle images to be clustered of the target vehicle.

Specifically, the vehicle image processing method provided in the embodiments of the present specification is applied to a vehicle track identification scene, and a specific implementation process of clustering a vehicle image set to be clustered to generate a target clustering result is similar to an implementation process of clustering data to be clustered in a data set to be clustered to generate a target clustering result in the aforementioned data clustering method, and is not described herein again.

After the vehicle images to be clustered have been clustered into target clustering results, the target clustering result containing the vehicle images of the target vehicle can be identified, and the motion trajectory of the target vehicle is then determined from those vehicle images in combination with their spatio-temporal information.
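A minimal sketch of turning one target clustering result into a trajectory is shown below. It assumes each vehicle image carries a capture location identifier and a timestamp; the `VehicleImage` fields and the `trajectory` function are hypothetical, and the embodiment may combine spatio-temporal information in a more elaborate way (for example, filtering physically implausible transitions).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VehicleImage:
    image_id: str
    camera_id: str  # hypothetical capture-location identifier
    timestamp: float  # capture time in seconds


def trajectory(cluster: List[VehicleImage]) -> List[Tuple[float, str]]:
    """Order the cluster's images by capture time and return the sequence of
    (first-seen time, location), collapsing consecutive sightings at one location."""
    ordered = sorted(cluster, key=lambda im: im.timestamp)
    path: List[Tuple[float, str]] = []
    for im in ordered:
        if not path or path[-1][1] != im.camera_id:
            path.append((im.timestamp, im.camera_id))
    return path
```

Because the input is a single target clustering result, the quality of the trajectory depends directly on the clustering accuracy discussed above: a wrongly merged image would insert a spurious location into the path.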

In the vehicle image processing method provided by the embodiments of this specification, the number of clustering results does not need to be specified in advance when clustering the vehicle images to be clustered. Clustering is performed only according to the matching probabilities among the vehicle images to be clustered, and the clustering results are adjusted in real time according to the clustering accuracy expected values and/or clustering splitting degree expected values of the vehicle images in the clustering results. This helps ensure the accuracy of the clustering results and, in turn, improves the accuracy of target vehicle trajectory reconstruction.

The foregoing is an illustrative scheme of the vehicle image processing method of this embodiment. It should be noted that the technical solution of the vehicle image processing method and that of the image clustering method belong to the same concept; for details of the vehicle image processing method not described here, refer to the description of the image clustering method.

Corresponding to the above method embodiment, the present specification further provides a vehicle image processing apparatus embodiment, and fig. 8 shows a schematic structural diagram of a vehicle image processing apparatus provided in an embodiment of the present specification. As shown in fig. 8, the apparatus includes:

the obtaining module 802 is configured to obtain a vehicle image set to be clustered, and perform clustering processing on any two vehicle images to be clustered according to a matching probability between any two vehicle images to be clustered in the vehicle image set to be clustered, so as to generate an intermediate clustering result;

a first determining module 804, configured to determine an expected value corresponding to each vehicle image to be clustered in the intermediate clustering result according to a matching probability between any two vehicle images to be clustered in the intermediate clustering result, where the expected value includes a clustering accuracy expected value and/or a clustering splitting degree expected value;

an adjusting module 806, configured to adjust the intermediate clustering result according to the expected value, and generate a corresponding target clustering result;

a second determining module 808, configured to determine a motion trajectory of the target vehicle according to the target clustering result that contains the vehicle images to be clustered of the target vehicle.

The foregoing is an illustrative scheme of the vehicle image processing apparatus of this embodiment. It should be noted that the technical solution of the vehicle image processing apparatus belongs to the same concept as that of the vehicle image processing method described above; for details of the vehicle image processing apparatus not described here, refer to the description of the vehicle image processing method.

FIG. 9 illustrates a block diagram of a computing device 900 provided in accordance with one embodiment of the present specification. Components of the computing device 900 include, but are not limited to, a memory 910 and a processor 920. The processor 920 is coupled to the memory 910 via a bus 930, and a database 950 is used to store data.

Computing device 900 also includes an access device 940 that enables computing device 900 to communicate via one or more networks 960. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the Internet. Access device 940 may include one or more network interfaces of any type, wired or wireless (e.g., a Network Interface Card (NIC)), such as an IEEE 802.11 Wireless Local Area Network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a Bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 900, as well as other components not shown in FIG. 9, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device architecture shown in FIG. 9 is for purposes of example only and is not limiting as to the scope of the description. Other components may be added or replaced as desired by those skilled in the art.

Computing device 900 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), a mobile phone (e.g., smartphone), a wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 900 may also be a mobile or stationary server.

The processor 920 is configured to execute computer-executable instructions that, when executed by the processor, implement the steps of the data clustering method, the image clustering method, or the vehicle image processing method.

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solution of the data clustering method, the image clustering method, or the vehicle image processing method, and details of the technical solution of the computing device, which are not described in detail, can be referred to the description of the technical solution of the data clustering method, the image clustering method, or the vehicle image processing method.

An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of the data clustering method, the image clustering method, or the vehicle image processing method.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium is the same as that of the data clustering method, the image clustering method or the vehicle image processing method, and details of the technical solution of the storage medium, which are not described in detail, can be referred to the description of the technical solution of the data clustering method, the image clustering method or the vehicle image processing method.

An embodiment of the present specification further provides a computer program, wherein when the computer program is executed in a computer, the computer program causes the computer to execute the steps of the data clustering method, the image clustering method, or the vehicle image processing method.

The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program is the same as the technical solution of the data clustering method, the image clustering method or the vehicle image processing method, and details not described in detail in the technical solution of the computer program can be referred to the description of the technical solution of the data clustering method, the image clustering method or the vehicle image processing method.

The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals, telecommunications signals, software distribution media, and the like. It should be noted that the content of the computer-readable medium may be expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals or telecommunications signals in accordance with legislation and patent practice.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of acts, but those skilled in the art should understand that the present embodiment is not limited by the described acts, because some steps may be performed in other sequences or simultaneously according to the present embodiment. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments and that acts and modules referred to are not necessarily required for an embodiment of the specification.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. Alternative embodiments are not exhaustive and do not limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the embodiments and the practical application, to thereby enable others skilled in the art to best understand and utilize the embodiments. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A method of clustering data, comprising:

acquiring a video stream related to a target project, analyzing the video stream to obtain a data set to be clustered, and clustering any two data to be clustered according to the matching probability between any two data to be clustered in the data set to be clustered to generate an intermediate clustering result;

2. The data clustering method according to claim 1, wherein the clustering any two data to be clustered according to the matching probability between any two data to be clustered in the data set to be clustered comprises:

3. The data clustering method according to claim 2, wherein the determining the matching probability between any two data to be clustered in the at least two data to be clustered based on the feature extraction result comprises:

4. The data clustering method according to claim 1, wherein the clustering any two data to be clustered according to the matching probability between any two data to be clustered in the data set to be clustered to generate an intermediate clustering result comprises:

5. The data clustering method of claim 4, wherein the updating the initial clustering result according to the second matching probability comprises:

6. The data clustering method according to claim 5, wherein the updating the initial clustering result according to the first category and the second category corresponding to the initial clustering result comprises:

7. The data clustering method according to claim 5 or 6, wherein the updating the initial clustering result according to the first category and the second category corresponding to the initial clustering result comprises:

8. The data clustering method according to any one of claims 1 to 6, wherein the determining an expected value corresponding to each data to be clustered in the intermediate clustering result according to the matching probability between any two data to be clustered in the intermediate clustering result comprises:

9. The data clustering method of claim 8, further comprising:

10. The data clustering method according to any one of claims 1 to 6, wherein the data set to be clustered comprises an image set to be clustered.

11. An image clustering method, comprising:

12. A vehicle image processing method, comprising:

13. A computing device, comprising:

a memory and a processor;

the memory is configured to store computer-executable instructions for execution by the processor, which computer-executable instructions, when executed by the processor, perform the steps of the data clustering method according to any one of claims 1 to 10, the image clustering method according to claim 11, or the vehicle image processing method according to claim 12.

14. A computer readable storage medium storing computer executable instructions which, when executed by a processor, implement the steps of the data clustering method according to any one of claims 1 to 10, the image clustering method according to claim 11 or the vehicle image processing method according to claim 12.