CN117131397A

CN117131397A - Load spectrum clustering method and system based on DTW distance

Info

Publication number: CN117131397A
Application number: CN202311130261.0A
Authority: CN
Inventors: 贺小帆; 李晨迪; 范天宇; 孙燕涛; 张生良; 付志忠; 纪鹏飞
Original assignee: 93208 Troops Of Chinese Pla; Beihang University
Current assignee: 93208 Troops Of Chinese Pla; Beihang University
Priority date: 2023-09-04
Filing date: 2023-09-04
Publication date: 2023-11-28

Abstract

The invention discloses a load spectrum clustering method and system based on DTW distance. The method comprises the following steps: S1, collecting load spectrum data and preprocessing it to obtain a peak-valley sequence; S2, calculating DTW distances based on the peak-valley sequences and clustering the load spectra based on the DTW distances to obtain a preliminary clustering result; and S3, evaluating the preliminary clustering result and determining a final clustering result based on the evaluation. The method clusters load spectra according to their similarity, so that after clustering the load spectra within each class are similar to a certain degree. The damage of the load spectra in each class is then calculated by the relative damage calculation method to obtain damage samples, which provides a foundation for characterizing the dispersion of load spectrum damage.

Description

Load spectrum clustering method and system based on DTW distance

Technical Field

The invention belongs to the technical field of airplane actual measurement load spectrum processing, and particularly relates to a load spectrum clustering method and system based on a DTW distance.

Background

The load-time history experienced by each individual aircraft in a fleet differs to some extent in actual service; even when the same aircraft performs the same mission, the load-time history still differs. To characterize these differences in the service load spectra of individual aircraft, the dispersion of the load spectrum needs to be studied, and the key point is the damage calculation of the load spectrum.

Accurately calculating the damage caused by a load spectrum faces many difficulties. One is the high-low load sequence effect: studies show that when the loads in a spectrum are arranged in high-low order, the critical damage value is lower than 1, and when they are arranged in low-high order, the critical damage value is higher than 1. Researchers have proposed many damage accumulation theories from different perspectives, but all have certain limitations. In 1979, Schutz pointed out in the literature that "the damage threshold is consistent for two load spectra of similar shape, so the life of one load spectrum can be determined by the life of the other load spectrum"; this is the relative damage calculation method.

The key to the relative damage calculation is how to judge the similarity between two load spectra. The simplest approach is to regard the two load spectra as coordinate points in a high-dimensional space and judge their similarity by the Euclidean distance between the two points. However, the lengths of actually measured load spectra are random and unlikely to be identical, so a Euclidean distance measure is not suitable. A load spectrum clustering method is therefore needed to solve this problem.

Disclosure of Invention

The invention aims to overcome the deficiencies of the prior art and provides a DTW distance-based load spectrum clustering method and system. The DTW distance is used as the measure of load spectrum similarity, load spectra with relatively close DTW distances are clustered sequentially according to a hierarchical clustering method, and the number of clusters is finally determined according to the Dunn index, thereby determining the load spectrum clustering result.

In order to achieve the above object, the present invention provides the following solutions:

A load spectrum clustering method based on DTW distance comprises the following steps:

S1, collecting load spectrum data, and obtaining a peak-valley value sequence based on the load spectrum data;

S2, calculating a DTW distance based on the peak-valley value sequence, and carrying out load spectrum clustering based on the DTW distance to obtain a preliminary clustering result;

S3, evaluating the preliminary clustering result, and determining a final clustering result based on the evaluated result.

Preferably, the S1 includes:

carrying out peak-valley extraction on the load spectrum data to obtain a peak-valley load sequence;

and carrying out standardization treatment on the peak-valley load sequence to obtain the peak-valley sequence.

Preferably, the S2 includes:

calculating Euclidean distances for the components of the two peak-valley sequences, constructing the Euclidean distances of the components into a distance matrix, and initializing the distance matrix;

selecting a shortest path based on the initialized distance matrix, and calculating the DTW distance of the shortest path;

constructing a DTW distance matrix based on the DTW distance, and arranging elements in the DTW distance matrix in an ascending order manner to obtain a DTW value sequence;

and clustering the load spectrum based on the DTW value sequence to obtain a preliminary clustering result.

Preferably, the S3 includes:

evaluating the preliminary clustering result by adopting the Dunn index to obtain a final class number;

and setting the final class number as a cut-off value, and obtaining a final clustering result based on the cut-off value.

The invention also provides a load spectrum clustering system based on the DTW distance, which comprises a preprocessing unit, a clustering unit and an evaluation unit;

the preprocessing unit is used for collecting load spectrum data and obtaining a peak-valley value sequence based on the load spectrum data;

the clustering unit is used for calculating a DTW distance based on the peak-valley value sequence, and carrying out load spectrum clustering based on the DTW distance to obtain a preliminary clustering result;

the evaluation unit is used for evaluating the preliminary clustering result and determining a final clustering result based on the evaluation result.

Preferably, the method for obtaining the peak-valley sequence by the preprocessing unit comprises the following steps:

Preferably, the method for obtaining the preliminary clustering result by the clustering unit comprises the following steps:

Preferably, the method for obtaining the final clustering result by the evaluation unit comprises the following steps:

Compared with the prior art, the invention has the beneficial effects that:

the load spectrum clustering method based on the DTW distance provided by the invention clusters the load spectrums according to the similarity, and after the clustering is completed, the load spectrums of each type have certain similarity, and the damage of the load spectrums of each type is calculated by adopting a relative damage calculation method, so that a damage sample is obtained, and a foundation is provided for the dispersion characterization of the damage of the load spectrums.

Drawings

In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the embodiments are briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a load spectrum clustering method based on DTW distance in an embodiment of the invention;

FIG. 2 is a schematic diagram of a DTW distance calculation path according to an embodiment of the invention;

FIG. 3 is a schematic diagram of a load sequence of peaks and valleys of a load spectrum according to an embodiment of the present invention;

FIG. 4 is a schematic diagram comparing a load sequence before and after standardization according to the embodiment of the present invention;

wherein, (a) shows the sequence before standardization; (b) shows the sequence after standardization;

FIG. 5 is a hierarchical clustering tree diagram according to an embodiment of the present invention;

FIG. 6 is a diagram showing the variation of DVI values with the number of classes M according to an embodiment of the present invention;

FIG. 7 is a comparison of similar load spectra of an embodiment of the present invention;

wherein, (a) is a graph comparing spectrum 20 with spectrum 24; (b) is a graph comparing spectrum 28 with spectrum 40;

FIG. 8 is a graph showing the comparison of different types of load spectra according to the embodiment of the present invention.

Wherein, (a) is a graph comparing spectrum 8 with spectrum 9; (b) is a graph comparing spectrum 19 with spectrum 9.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.

Example 1

As shown in fig. 1, the embodiment provides a load spectrum clustering method based on DTW distance, which includes the following steps:

S1, collecting load spectrum data, and obtaining a peak-valley value sequence based on the load spectrum data. Specifically:

firstly, carrying out peak-valley extraction on load spectrum data to obtain a peak-valley load sequence.

The peak-to-valley extraction is performed in this example using the following method:

where $\sigma_i$ indicates the current rotational speed. With this method, a load sequence containing only peak and valley values is obtained;

the load sequence is then filtered, and small variations in it are deleted, to obtain the peak-valley load sequence. In this embodiment, the filtering threshold is chosen as 8% of the maximum peak and minimum valley values.
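The extraction formula itself is not reproduced in this text, so the following is only a minimal sketch under stated assumptions: a point is treated as a peak or valley when the slope changes sign there, and the 8% threshold is taken relative to the range between the maximum peak and the minimum valley. The function names and the rule of deleting the smallest adjacent range first are illustrative choices, not the procedure stated in the patent.

```python
def extract_peaks_valleys(loads):
    """Keep the end points and every interior point where the slope changes sign."""
    # Collapse runs of equal values so that a plateau counts as a single point.
    pts = [v for k, v in enumerate(loads) if k == 0 or v != loads[k - 1]]
    if len(pts) < 3:
        return pts
    out = [pts[0]]
    for prev, cur, nxt in zip(pts, pts[1:], pts[2:]):
        if (cur - prev) * (nxt - cur) < 0:  # rising then falling, or the reverse
            out.append(cur)
    out.append(pts[-1])
    return out


def drop_small_ranges(loads, fraction=0.08):
    """Delete adjacent peak-valley pairs whose range is below the threshold.

    Assumption: threshold = fraction * (maximum peak - minimum valley).
    """
    seq = extract_peaks_valleys(loads)
    if len(seq) < 3:
        return seq
    threshold = fraction * (max(seq) - min(seq))
    while len(seq) > 2:
        ranges = [abs(b - a) for a, b in zip(seq, seq[1:])]
        k = min(range(len(ranges)), key=ranges.__getitem__)
        if ranges[k] >= threshold:
            break  # every remaining range is large enough
        del seq[k:k + 2]                  # remove both points of the small range
        seq = extract_peaks_valleys(seq)  # restore strict peak/valley alternation
    return seq
```

For instance, `drop_small_ranges([0, 10, 9.5, 10.5, 0])` removes the small 10 to 9.5 excursion and keeps the larger peak at 10.5.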

Then, the peak-valley load sequence is standardized to obtain the peak-valley sequence.

In this embodiment, the standardization is computed as:

$$x'_i = \frac{x_i - \mu}{\sigma}$$

where $x'_i$ represents the i-th data value after standardization, $x_i$ represents the i-th data value before standardization, $\mu$ represents the sample mean of the load spectrum peak-valley sequence, and $\sigma$ represents the sample standard deviation of the load spectrum peak-valley sequence.

$\mu$ and $\sigma$ are calculated from the sample data, with $\mu = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\sigma$ computed from the deviations $x_i - \bar{x}$,

where n represents the total number of data samples, $x_i$ represents the i-th data value before standardization, and $\bar{x}$ represents the mean of the sample data.
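A short sketch of this standardization, using only the Python standard library. The function name `standardize` is illustrative, and the use of the n - 1 denominator for the sample standard deviation is an assumption, since the formula for σ is not legible in this text.

```python
from statistics import mean, stdev


def standardize(seq):
    """Return the z-scores (x_i - mu) / sigma of a peak-valley load sequence."""
    mu = mean(seq)
    sigma = stdev(seq)  # assumption: sample standard deviation with n - 1 denominator
    return [(x - mu) / sigma for x in seq]
```

After this step every sequence has zero mean and unit sample standard deviation, so spectra recorded at different load levels can be compared by shape.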

S2, calculating a DTW distance based on the peak-valley value sequence, and carrying out load spectrum clustering based on the DTW distance to obtain a preliminary clustering result.

For any two peak-valley sequences (possibly of unequal length) obtained by the above processing, $X = \{x_1, x_2, \ldots, x_m\}$ and $Y = \{y_1, y_2, \ldots, y_n\}$, the distance or similarity between them is measured as follows.

First, the Euclidean distance between the components of the two peak-valley sequences is calculated: $d_{f,g} = |x_f - y_g|$, where $x_f$ and $y_g$ are components of the two peak-valley sequences, respectively. A distance matrix is constructed from these component distances and initialized.

Then, a path $\{w_1, w_2, \ldots, w_k\}$ is sought through the initialized distance matrix, where $k \in [\max\{m, n\},\ m+n-1)$, $w_1 = (1, 1)$ and $w_k = (m, n)$; the path is monotonic and continuous, and is chosen so that the accumulated DTW distance is minimized. The DTW distance is calculated as:

$$DTW_{i,j} = d(i, j) + \min\{DTW_{i-1,j-1},\ DTW_{i,j-1},\ DTW_{i-1,j}\}$$

The final accumulated value $DTW_{m,n}$ is taken as the DTW distance between the two peak-valley sequences, as shown in fig. 2.
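The recurrence can be sketched with a small dynamic-programming table. This is a minimal illustration, not the patent's implementation: the function name `dtw_distance` is hypothetical, and padding the table with an extra row and column of infinity is one assumed way of "initializing" the distance matrix so that every path starts at (1, 1).

```python
def dtw_distance(x, y):
    """Accumulated DTW distance between two non-empty numeric sequences."""
    m, n = len(x), len(y)
    inf = float("inf")
    # acc[i][j] holds the minimal accumulated cost of aligning x[:i] with y[:j].
    acc = [[inf] * (n + 1) for _ in range(m + 1)]
    acc[0][0] = 0.0  # assumed initialization: paths may only start at (1, 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d = abs(x[i - 1] - y[j - 1])  # component distance d(i, j)
            acc[i][j] = d + min(acc[i - 1][j - 1], acc[i][j - 1], acc[i - 1][j])
    return acc[m][n]
```

Because a point in one sequence may be matched to several points in the other, sequences of unequal length can be compared directly; for example, `[1, 2, 3]` and `[1, 2, 2, 3]` have a DTW distance of 0.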

An N × N DTW distance matrix is constructed from the obtained DTW distances, and its elements are arranged in ascending order to obtain a DTW value sequence.

The DTW distance matrix is as follows:

$$D = \begin{bmatrix} 0 & D_{1,2} & \cdots & D_{1,N} \\ D_{2,1} & 0 & \cdots & D_{2,N} \\ \vdots & \vdots & \ddots & \vdots \\ D_{N,1} & D_{N,2} & \cdots & 0 \end{bmatrix}$$

The elements on the main diagonal of the DTW distance matrix are all 0, and the matrix is symmetric about the main diagonal;

where $D_{N-1,N}$ represents the DTW distance between the (N-1)-th load spectrum and the N-th load spectrum.
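A sketch of how the matrix and the ascending DTW value sequence might be built. The helper names are hypothetical, and the distance function is passed in as the parameter `dist` so that the block stays self-contained; in practice it would be a DTW distance function. Only the part above the main diagonal is enumerated, since the matrix is symmetric with a zero diagonal.

```python
from itertools import combinations


def dtw_matrix(spectra, dist):
    """Symmetric N x N matrix of pairwise distances with a zero diagonal."""
    n = len(spectra)
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        D[i][j] = D[j][i] = dist(spectra[i], spectra[j])
    return D


def sorted_dtw_sequence(D):
    """Upper-triangle entries as (distance, i, j), sorted in ascending order."""
    n = len(D)
    pairs = [(D[i][j], i, j) for i, j in combinations(range(n), 2)]
    pairs.sort()  # the indices travel with each value, as the method requires
    return pairs
```

Keeping the index pair next to each value corresponds to recording the index values of each element during the reordering.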

The load spectra are then clustered based on the DTW value sequence to obtain a preliminary clustering result. The specific steps are as follows:

firstly, the elements of the DTW distance matrix D are arranged in ascending order to obtain the DTW value sequence $D' = \{d_1, d_2, \ldots, d_{N'}\}$, the index values corresponding to each element are recorded at the same time, and the initial class number is set to M = N;

where D' represents the reordered DTW value sequence, $d_j$ (j = 1, 2, ..., N') denotes the DTW values after reordering, N' denotes the length of the DTW value sequence, and N denotes the total number of load spectra to be clustered.

Then, following the ascending order, the two load spectra corresponding to the minimum DTW value $d_1$ are grouped into one class and marked as clustered; the class number is reduced by 1 at this point;

thereafter, starting from $d_2$, the following determination is made for the two load spectra corresponding to each value: if neither load spectrum has been clustered, the two are merged into one class, the class number is reduced by 1, and both are marked as clustered; if one of the two has already been assigned to a class, the other is assigned to that class, the class number is reduced by 1, and it is marked as clustered; if the two belong to different classes, those classes are merged and the class number is reduced by 1;

finally, the previous step is repeated until all DTW values have been traversed or a set condition is reached.

Through this clustering method, a hierarchical clustering tree diagram of the load spectra is obtained, and if the number of clusters (namely the class number) is set, the load spectrum numbers in each cluster can also be obtained.
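The merging procedure can be sketched with a union-find structure. This is an illustrative reading rather than the patent's code: if every unclustered spectrum is treated as a class of its own, the three cases described above all reduce to "merge the two classes when they differ", which amounts to single-linkage agglomerative clustering. The function name, the `stop_at` parameter (the set cut-off class number), and the rule of skipping a pair that already shares a class are assumptions.

```python
def cluster_by_sorted_dtw(pairs, n, stop_at=1):
    """Merge classes in ascending DTW order until `stop_at` classes remain.

    pairs: iterable of (distance, i, j) sorted by ascending distance.
    Returns (labels, merges); merges records (distance, i, j, classes_left).
    """
    parent = list(range(n))  # each spectrum starts as its own class (M = N)

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps lookups short
            a = parent[a]
        return a

    classes = n
    merges = []
    for d, i, j in pairs:
        if classes <= stop_at:
            break  # the set cut-off condition has been reached
        ri, rj = find(i), find(j)
        if ri == rj:
            continue  # assumed: a pair already in one class changes nothing
        parent[rj] = ri
        classes -= 1
        merges.append((d, i, j, classes))
    labels = [find(i) for i in range(n)]
    return labels, merges
```

The `merges` list carries the information needed to draw a tree diagram, and `labels` gives the class of each spectrum at the chosen cut-off.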

In this embodiment, the Dunn index is used to evaluate the preliminary clustering result, namely the hierarchical clustering tree diagram, so as to obtain the final class number. Specifically:

The Dunn index (DVI) is defined as the ratio of the minimum distance between any two clusters to the maximum intra-cluster distance, calculated as follows:

$$DVI = \frac{\min\limits_{1 \le m < n \le K}\ \min\limits_{x_f \in \Omega_m,\ y_g \in \Omega_n} d(x_f, y_g)}{\max\limits_{1 \le m \le K}\ \max\limits_{x_f,\ y_g \in \Omega_m} d(x_f, y_g)}$$

where K represents the number of classes obtained by the final clustering, $\Omega_m$ and $\Omega_n$ represent the m-th and n-th classes, $x_f$ and $y_g$ represent elements of the classes, and $d(x_f, y_g)$ is the distance between them.

The higher the DVI value, the larger the distance between clusters and the smaller the distance within clusters; that is, the more obvious the difference between different classes and the higher the similarity within the same class, the better the clustering effect. During clustering, the DVI values corresponding to different cluster numbers are calculated, and a cluster number with a relatively large DVI value that is small relative to the total number of samples (namely the total number of load spectra to be clustered) is selected as the final class number $M_0$.
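A minimal sketch of the Dunn index computed from a precomputed distance matrix and a list of class labels. The function name is illustrative, and the specific reading used here (smallest distance between spectra of different classes divided by the largest distance between spectra of the same class) is the common form of the index and is an assumption about the exact formula intended.

```python
from itertools import combinations


def dunn_index(D, labels):
    """Dunn index: minimum inter-class distance / maximum intra-class distance."""
    inter = []     # distances between spectra in different classes
    intra = [0.0]  # distances between spectra in the same class
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            intra.append(D[i][j])
        else:
            inter.append(D[i][j])
    if not inter:
        raise ValueError("at least two classes are required")
    max_intra = max(intra)
    # When every class holds a single spectrum the ratio is unbounded.
    return float("inf") if max_intra == 0 else min(inter) / max_intra
```

Evaluating this function on the labels obtained at each candidate class number M gives the DVI-versus-M curve from which the final class number is chosen.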

The final class number is set as the cut-off value, which serves as the set condition in S2. The result at the cut-off is taken as the final clustering result.

Example 2

This embodiment provides a load spectrum clustering system based on DTW distance, which comprises a preprocessing unit, a clustering unit and an evaluation unit.

the working process of the preprocessing unit comprises the following steps:

In this embodiment, the method for normalization processing includes:

where $x'_i$ represents the i-th data value after standardization, $x_i$ represents the i-th data value before standardization, $\mu$ represents the sample mean of the load spectrum peak-valley sequence, and $\sigma$ represents the sample standard deviation of the load spectrum peak-valley sequence.

The calculation method of mu and sigma comprises the following steps:

The clustering unit is used for calculating the DTW distance based on the peak-valley value sequence, and carrying out load spectrum clustering based on the DTW distance to obtain a preliminary clustering result;

the working process of the clustering unit comprises the following steps:

first, the Euclidean distance between the components of the two peak-valley sequences is calculated: $d_{f,g} = |x_f - y_g|$, where $x_f$ and $y_g$ are components of the two peak-valley sequences, respectively. A distance matrix is constructed from these component distances and initialized.

Then, a path $\{w_1, w_2, \ldots, w_k\}$ is sought through the initialized distance matrix, where $k \in [\max\{m, n\},\ m+n-1)$, $w_1 = (1, 1)$ and $w_k = (m, n)$; the path is monotonic and continuous, and is chosen so that the accumulated DTW distance is minimized. The DTW distance is calculated as:

$$DTW_{i,j} = d(i, j) + \min\{DTW_{i-1,j-1},\ DTW_{i,j-1},\ DTW_{i-1,j}\}$$

The final accumulated minimum distance $DTW_{m,n}$ is taken as the DTW distance between the two peak-valley sequences.

Wherein, the DTW distance matrix is as follows:

Then, following the ascending order, the two load spectra corresponding to the minimum DTW value $d_1$ are grouped into one class and marked as clustered; the class number is reduced by 1 at this point;

thereafter, starting from $d_2$, the following determination is made for the two load spectra corresponding to each value: if neither load spectrum has been clustered, the two are merged into one class, the class number is reduced by 1, and both are marked as clustered; if one of the two has already been assigned to a class, the other is assigned to that class, the class number is reduced by 1, and it is marked as clustered; if the two belong to different classes, those classes are merged and the class number is reduced by 1;

where K represents the number of classes obtained by the final clustering, and $\Omega_m$ and $\Omega_n$ represent the m-th and n-th classes.

The higher the DVI value, the larger the distance between clusters and the smaller the distance within clusters; that is, the more obvious the difference between different classes and the higher the similarity within the same class, the better the clustering effect. During clustering, the DVI values corresponding to different cluster numbers are calculated, and a cluster number with a relatively large DVI value that is small relative to the total number of samples (namely the total number of load spectra to be clustered) is selected as the final class number $M_0$.

The final class number is set as the cut-off value, which serves as the condition set in the clustering unit. The result at the cut-off is taken as the final clustering result.

Example 3

In this embodiment, a plurality of load spectrum peak-valley sequences generated randomly are selected, and the method steps of the present invention will be described in detail. In this embodiment, a total of 43 load spectra were selected.

The load spectrum data is processed first, and peak-to-valley extraction and small load deletion are performed to obtain a peak-to-valley load sequence as shown in fig. 3.

The sequence is then standardized; fig. 4 (a) and (b) show it before and after standardization.

The DTW distances between the load spectra are then calculated. Since 43 load spectra are clustered in this embodiment, the final DTW distance matrix is a 43×43 square matrix, and the length of the DTW distance sequence actually considered in clustering (the part above the main diagonal) is $43 \times 42 / 2 = 903$. Only a part of the DTW sequence and the corresponding pairs of load spectrum numbers is shown here.

The DTW value sequence is:

the corresponding load spectrum index value sequence is as follows:

The DTW value sequence calculated above is reordered from small to large, and the corresponding load spectrum index value sequence is reordered accordingly. The initial class number is set to M = 43, and the reordered result is as follows (only a part is shown because of its length).

The corresponding sequence of load spectrum index values at this time is:

in order, the two load spectra corresponding to the first value in D' are grouped into one class, namely spectrum 19 and spectrum 23, and then the two spectra are marked as clustered, and the class number is reduced by 1 (at this time, the class number is 42).

Starting from the 2 nd value in D', it is necessary to determine whether there are already clustered load spectrums, and in this embodiment, the two load spectrums 27 and 37 corresponding to the second value 1.2718 are not clustered, so that the two load spectrums are clustered into one class.

The above operation is repeated for subsequent values in D' until all values are traversed.

Through the steps, hierarchical clustering is completed, and a tree diagram of the hierarchical clustering can be drawn as shown in fig. 5.

The corresponding DVI values when m=2 to 42 are calculated, and a curve of the DVI value with the change of the M value is drawn as shown in fig. 6.

As can be seen from fig. 6, the Dunn index tends to increase with the number of clusters, because increasing the number of clusters reduces the maximum intra-cluster distance; in the extreme case where each spectrum forms its own class, the intra-cluster distance is 0 and the DVI value tends to infinity. In theory, a larger DVI value indicates a better clustering effect, but the more clusters there are relative to the total number of spectra, the less meaningful the clustering becomes. An appropriate number of clusters should therefore be selected while keeping the DVI value large. When the number of clusters is 8 (at the dotted line in the figure), the DVI value reaches an extremum; increasing the number of clusters further does not increase the DVI value significantly and only raises the class count. The load spectra can therefore be divided into 8 classes, namely the optimal class number is 8.

Taking M = 8 as the clustering cut-off condition, the load spectrum numbers belonging to each class are finally obtained, as shown in Table 1:

TABLE 1

For the above results, four sets of load spectrum data of the same class and different classes were randomly extracted and compared, and the results are shown in fig. 7 (a) (b) and fig. 8 (a) (b) (for convenience of comparison, the load spectra after normalization are plotted).

The above embodiments are merely illustrative of the preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, but various modifications and improvements made by those skilled in the art to which the present invention pertains are made without departing from the spirit of the present invention, and all modifications and improvements fall within the scope of the present invention as defined in the appended claims.

Claims

1. The load spectrum clustering method based on the DTW distance is characterized by comprising the following steps of:

2. The DTW distance-based load spectrum clustering method of claim 1, wherein S1 comprises:

3. The DTW distance-based load spectrum clustering method of claim 1, wherein S2 comprises:

4. The DTW distance-based load spectrum clustering method of claim 1, wherein S3 comprises:

5. A DTW distance-based load spectrum clustering system, comprising:

the preprocessing unit is used for acquiring load spectrum data and obtaining a peak-valley value sequence based on the load spectrum data;

6. The DTW distance-based load spectrum clustering system of claim 5, wherein the preprocessing unit obtains the peak-to-valley sequence by a method comprising:

7. The DTW distance-based load spectrum clustering system of claim 5, wherein the method for obtaining the preliminary clustering result by the clustering unit comprises:

8. The DTW distance-based load spectrum clustering system of claim 5, wherein the method for obtaining the final clustering result by the evaluation unit comprises:

evaluating the preliminary clustering result by adopting the Dunn index to obtain a final class number; and setting the final class number as a cut-off value, and obtaining a final clustering result based on the cut-off value.