CN114611604A

CN114611604A - User screening method based on electric drive assembly load characteristic fusion and clustering

Info

Publication number: CN114611604A
Application number: CN202210237252.0A
Authority: CN
Inventors: 王震; 周驰; 赵礼辉
Original assignee: Linui Shanghai Intelligent Technology Co ltd
Current assignee: Linui Shanghai Intelligent Technology Co ltd
Priority date: 2022-03-11
Filing date: 2022-03-11
Publication date: 2022-06-10

Abstract

The invention discloses a user screening method based on electric drive assembly load feature fusion and clustering. The method comprises the following steps:

- acquiring user sample load data and analyzing it to obtain the failure-dominant loads of the electric drive assembly;
- obtaining key load characteristic parameters of the electric drive assembly from these failure-dominant loads;
- processing the key load characteristic parameters and, based on the processed parameters, screening typical users associated with the reliability-critical loads of the electric drive assembly.

Because the screening is tied to the loads that govern electric drive assembly reliability, the selected typical users effectively cover the regional distribution, driving behaviors and driving conditions of the actual user population. The method can provide load inputs for compiling accelerated durability test specifications for the electric drive assembly during product development and test verification. This shortens product development and supports high-quality product development.

Description

User screening method based on electric drive assembly load characteristic fusion and clustering

Technical Field

The invention belongs to the field of load data analysis of electric drive systems, and particularly relates to a user screening method based on electric drive assembly load feature fusion and clustering.

Background

The electric drive system is a core component of a new energy vehicle. During the development and verification of an electric drive assembly product, reliability design and accelerated durability verification under actual user service conditions are very important. User data are voluminous and contain multidimensional load characteristics. Rapidly screening typical user samples from such data to serve as the basis for design and verification can shorten the development cycle and improve product reliability.

At present, random sampling can be used for fast sampling, but it has three drawbacks:

- because of its randomness, the sampling result may deviate from the characteristics of the population;
- repeated random sampling based on a single feature can hardly cover the multidimensional load information of the electric drive assembly;
- methods based on pseudo-random number sampling cannot obtain the user labels corresponding to the sampled data.

Disclosure of Invention

The invention constructs characteristic parameters from the failure-dominant loads of the electric drive assembly and provides a user screening method based on electric drive assembly load feature fusion and clustering. The method rapidly samples typical users and provides a small but representative set of load inputs for accelerated durability assessment of the electric drive assembly. This accelerates product research and development.

To achieve the above purpose, the invention provides the following scheme: a user screening method based on electric drive assembly load feature fusion and clustering, comprising the following steps:

acquiring user sample load data, and analyzing and obtaining failure dominant load of the electric drive assembly based on the user sample load data;

acquiring a key load characteristic parameter of the electric drive assembly based on the failure dominant load of the electric drive assembly;

and processing the key load characteristic parameters of the electric drive assembly, and screening typical users associated with the reliability-critical loads of the electric drive assembly based on the processed load characteristic parameters.

Preferably, the sample load data at least includes load data in different regions, different driving behaviors and different driving conditions;

the failure-dominant loads of the electric drive assembly at least comprise motor speed, motor torque, acceleration, thermal stress and electrical stress.

Preferably, the process of obtaining the key load characteristic parameters of the electric drive assembly comprises:

determining key load characteristics of the electric drive assembly through load amplitude distribution and combined distribution based on the failure dominant load of the electric drive assembly;

analyzing user characteristic differences based on the key load characteristics of the electric drive assembly to obtain typical characteristic parameters of the failure load of the associated electric drive assembly under user conditions;

the typical characteristic parameters include, but are not limited to, user run time, mileage, speed mean, speed variance, motor torque mean, motor torque variance, acceleration mean, acceleration variance.

Preferably, the processing of the critical load characteristic parameters of the electric drive assembly comprises the steps of performing multi-feature fusion analysis, user feature cluster analysis, minimum sample size determination, classified random sampling, hypothesis testing, establishment of objective functions and constraint conditions, and determination of a typical user optimal solution.

Preferably, the processing of the key load characteristic parameters of the electric drive assembly further comprises user classification by multi-feature fusion analysis and user clustering analysis; determining a sampling minimum sample size based on the total sample size and the expected estimation error; analyzing the difference between the extracted sample characteristics and the overall sample characteristics by adopting a classified random sampling method and a hypothesis testing method; and establishing an objective function and a constraint condition, and realizing typical user screening of the reliability key load of the associated electric drive assembly.

Preferably, the multi-feature fusion analysis is based on principal component analysis. The high-dimensional, mutually correlated initial feature parameters are converted into a small number of low-dimensional, mutually uncorrelated principal component feature parameters. This reduces the dimension of the key load feature parameters of the electric drive assembly and yields a dimension-reduced feature matrix. The principal component feature parameters retain most of the information in the initial feature parameters.

Preferably, the user clustering analysis is based on a dimension reduction characteristic matrix, and a clustering algorithm is adopted to classify the users to obtain a user characteristic clustering classification result;

and the classified random sampling carries out random sampling on each type of users based on the user characteristic clustering classification result.

Preferably, the hypothesis test analyzes the extracted sample using a Z test;

the Z test statistic is:

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where x̄ is the sample mean, μ₀ is the overall mean, σ is the overall standard deviation, and n is the number of samples.

The invention discloses the following technical effects:

The invention provides a user screening method based on electric drive assembly load feature fusion and clustering. In electric drive assembly development, typical user load data play an important role in product design, simulation, test verification and other stages.

The method associates the screening with the reliability-critical loads of the electric drive assembly. It applies multi-feature fusion, cluster analysis, minimum sample size determination, classified random sampling and hypothesis testing. The screened typical users effectively cover the regional distribution, driving behaviors and driving conditions of the actual user population.

The load characteristics of the screened users take the reliability-critical load information of the electric drive assembly into account. They can therefore provide load inputs for compiling accelerated durability test specifications during product development and test verification. This accelerates product development and supports high-quality product development.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a flow chart of a method according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

As shown in FIG. 1, the invention provides a user screening method based on electric drive assembly load feature fusion and clustering, which comprises the following steps:

acquiring user sample load data, and analyzing it to obtain the failure-dominant loads of the electric drive assembly;

acquiring key load characteristic parameters of the electric drive assembly based on the failure dominant load of the electric drive assembly;

The sample load data at least comprises load data in different regions, different driving behaviors and different driving working conditions;

The process of obtaining the key load characteristic parameters of the electric drive assembly includes:

determining the key load characteristics of the electric drive assembly through the load amplitude distribution and combined distribution, based on its failure-dominant loads; and

analyzing the differences between user characteristics based on these key load characteristics, to obtain the typical characteristic parameters of the failure-related loads of the electric drive assembly under user conditions.

Processing the key load characteristic parameters of the electric drive assembly includes multi-feature fusion analysis, user feature cluster analysis, minimum sample size determination, classified random sampling, hypothesis testing, establishment of the objective function and constraint conditions, and determination of the optimal typical-user solution.

Processing the key load characteristic parameters of the electric drive assembly further comprises the steps of carrying out user classification by adopting multi-characteristic fusion analysis and user clustering analysis; determining a sampling minimum sample size based on the total sample size and the expected estimation error; analyzing the difference between the extracted sample characteristics and the overall sample characteristics by adopting a classified random sampling method and a hypothesis testing method; and establishing an objective function and a constraint condition, and realizing typical user screening of the reliability key load of the associated electric drive assembly.

The multi-feature fusion analysis is based on principal component analysis. The high-dimensional, mutually correlated initial feature parameters are converted into a small number of low-dimensional, mutually uncorrelated principal component feature parameters. This reduces the dimension of the key load feature parameters of the electric drive assembly and yields a dimension-reduced feature matrix. The principal component feature parameters retain most of the information in the initial feature parameters.

The user clustering analysis applies a clustering algorithm to the dimension-reduced feature matrix to classify users, yielding a user feature clustering result. Classified random sampling then randomly samples each class of users based on this clustering result. Finally, the hypothesis test analyzes the extracted samples with a Z test.

Example one

Further, this embodiment provides a user screening method based on electric drive assembly load feature fusion and clustering, which comprises:

- analyzing the failure-dominant loads of the electric drive assembly;
- determining the key load characteristics of the electric drive assembly from the load amplitude distribution and combined distribution;
- determining the optimal typical-user solution through multi-feature fusion analysis, user feature cluster analysis, minimum sample size determination, classified random sampling, significance (hypothesis) testing, and establishment of the objective function and constraint conditions.

The specific implementation steps are as follows:

Step 1: The overall user sample data contain load data from 3,000 new energy vehicles across different regions, driving behaviors and driving conditions. Based on these data, analyze the failure-dominant loads of the electric drive assembly and determine the load factors that affect its reliability.

In this embodiment, Step 1 is based on the overall user sample data. Actual user load acquisition records many load channels. Reliability evaluation of the electric drive assembly must therefore focus on the critical loads that cause component failure, and representative characteristic parameters are constructed from those loads. The failure-dominant loads of electric drive assembly components mainly include motor speed, motor torque, acceleration, thermal stress and electrical stress.

Step 2: Determine the key load characteristics of the electric drive assembly from the load amplitude distribution and combined distribution.

For each user, the speed mean, speed variance, torque mean, torque variance, acceleration mean and acceleration variance over a given period can be computed. To reflect differences in how users operate their vehicles, the invention also uses features such as running time and driving mileage.

Finally, the differences between users are analyzed based on the amplitude distributions and combined distributions of the load features. The typical characteristic parameters related to electric drive assembly failure loads under user conditions are then selected.

In this embodiment, the amplitude distributions and combined distributions of the load features are computed across users in Step 2. These distributions reveal user behavior characteristics and differences, and provide the basis for selecting load features during typical user screening. The statistics show the following:

- over the 8-month period, total driving mileage is around 10,000 km, and more than 90% of users drove less than 28,000 km in total;
- most users operate their vehicles for about 1.4 h per day on average over the 8 months;
- the average speed of users lies between 10 km/h and 70 km/h;
- the torque mean distribution splits into positive and negative torques: the positive torque mean is mostly between 10 Nm and 25 Nm, and the negative torque mean is mostly between -40 Nm and 0 Nm;
- in terms of load fluctuation, the larger the vehicle speed variance, the larger the rear motor torque variance, with an approximately linear relationship between the two.

On this basis, 16 characteristic parameters are selected to screen typical users, as shown in Table 1.

TABLE 1

No.	Characteristic parameter	Meaning
1	day	Days of operation
2	mileage	Total mileage
3	mile_per_day	Average daily mileage
4	speed_cnt	Total running time
5	speed_mean	Speed mean
6	speed_Variance	Speed variance
7	Torque_F_cnt	Front motor running time
8	Toque_F_mean	Front motor torque mean
9	Torque_F_Variance	Front motor torque variance
10	Torque_R_cnt	Rear motor running time
11	Torque_R_mean	Rear motor torque mean
12	Torque_R_Variance	Rear motor torque variance
13	Acc_X_mean	X-direction acceleration mean
14	Acc_X_Variance	X-direction acceleration variance
15	Acc_Y_mean	Y-direction acceleration mean
16	Acc_Y_Variance	Y-direction acceleration variance
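The sketch below illustrates roughly how the Table 1 features might be computed for one user. It rests on several assumptions, none of which the source specifies:

- the signal channels are equally sampled and use hypothetical keys (`speed`, `torque_f`, `torque_r`, `acc_x`, `acc_y`);
- `dt_s` is a hypothetical sample period;
- variances are population variances;
- running time is the number of non-zero samples times the sample period.

```python
from statistics import fmean, pvariance

# Hypothetical channel keys mapped to the Table 1 names of their mean and variance
# ("Toque_F_mean" is spelled as in Table 1).
STAT_NAMES = {
    "speed": ("speed_mean", "speed_Variance"),
    "torque_f": ("Toque_F_mean", "Torque_F_Variance"),
    "torque_r": ("Torque_R_mean", "Torque_R_Variance"),
    "acc_x": ("Acc_X_mean", "Acc_X_Variance"),
    "acc_y": ("Acc_Y_mean", "Acc_Y_Variance"),
}


def run_time(signal, dt_s):
    """Assumed definition: number of non-zero samples times the sample period."""
    return sum(1 for v in signal if v != 0) * dt_s


def user_features(channels, days, mileage_km, dt_s=1.0):
    """Return the 16 Table 1 features for one user (illustrative sketch only)."""
    feats = {
        "day": days,
        "mileage": mileage_km,
        "mile_per_day": mileage_km / days,
        "speed_cnt": run_time(channels["speed"], dt_s),
        "Torque_F_cnt": run_time(channels["torque_f"], dt_s),
        "Torque_R_cnt": run_time(channels["torque_r"], dt_s),
    }
    # Mean and population variance of every load channel.
    for key, (mean_name, var_name) in STAT_NAMES.items():
        feats[mean_name] = fmean(channels[key])
        feats[var_name] = pvariance(channels[key])
    return feats
```

Applying the function to every user and stacking the resulting dictionaries gives the n x p characteristic parameter matrix used in Step 3.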

Step 3: Multi-feature fusion analysis.

The user population is large and the parameters are high-dimensional. Samples drawn on the basis of a single feature distribution can hardly cover all the feature information, while drawing samples separately for each of several feature distributions increases the computational load.

The invention therefore uses a dimension reduction method with multi-feature fusion. It replaces complex, correlated, high-dimensional feature indexes with a few low-dimensional, mutually uncorrelated principal components, while retaining most of the information in the original feature parameters.

Principal component analysis is used to reduce the dimension of the key load characteristics of the electric drive assembly. Principal components are extracted once the cumulative information contribution rate exceeds 80%, so that the extracted components cover most of the information in the original features.

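In this embodiment, the multi-feature fusion analysis of Step 3 mainly uses principal component analysis. Assume there are n users, each with p characteristic parameters, and denote the characteristic parameter matrix of all users by X. Let the random variable X_i have mean μ_i, and let Σ be the covariance matrix. Principal component analysis applies a linear transformation to the p characteristic parameters. The resulting composite indexes are the principal components, denoted y₁, y₂, …, y_p. The characteristic parameter matrix X can be expressed as:

$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{np} \end{pmatrix} = (x_1, x_2, \ldots, x_p) \qquad (1)$$

The transformed principal components are given by formula (2):

$$\begin{cases} y_1 = l_{11}x_1 + l_{21}x_2 + \cdots + l_{p1}x_p = l'_1 x \\ y_2 = l_{12}x_1 + l_{22}x_2 + \cdots + l_{p2}x_p = l'_2 x \\ \quad \vdots \\ y_p = l_{1p}x_1 + l_{2p}x_2 + \cdots + l_{pp}x_p = l'_p x \end{cases} \qquad (2)$$

where y₁, y₂, …, y_p are the principal components and x₁, x₂, …, x_p are the original characteristic parameters. The vectors l'₁, l'₂, …, l'_p hold the loading coefficients, that is, the weights of the characteristic parameters in each principal component.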

Each principal component y_j is solved from the correlation coefficient matrix or the covariance matrix. The variances and covariances of the principal components are:

$$\operatorname{var}(y_j) = l'_j \Sigma l_j, \quad j = 1, 2, \ldots, p \qquad (3)$$

$$\operatorname{cov}(y_j, y_k) = l'_j \Sigma l_k, \quad j, k = 1, 2, \ldots, p \qquad (4)$$

The results of principal component analysis depend on the units of the variables, and variables measured in different units can differ greatly in dispersion. To eliminate this unreasonable influence, each original variable is usually standardized before the eigenvalues and corresponding eigenvectors are solved from the covariance matrix. The standardized variables are:

$$z_i = \frac{x_i - \mu_i}{\sqrt{\sigma_{ii}}}, \quad i = 1, 2, \ldots, p \qquad (5)$$

where σ_ii is the i-th diagonal element of Σ.

The covariance matrix of the vector Z = (z₁, z₂, …, z_p)' is the correlation coefficient matrix ρ = [ρ(x_i, x_j)]_{p×p}. After principal component analysis, the covariance matrix of the principal components is a diagonal matrix Λ. Its diagonal elements λ₁, λ₂, …, λ_p are the eigenvalues (characteristic roots) of ρ. The entries of the correlation coefficient matrix are:

$$\rho(x_i, x_j) = \frac{\sigma_{ij}}{\sqrt{\sigma_{ii}\,\sigma_{jj}}}, \quad i, j = 1, 2, \ldots, p$$
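Formulas (3) to (5) can be checked numerically with the minimal NumPy sketch below, which assumes population (divide-by-n) statistics. After standardization, the covariance matrix of Z equals the correlation matrix ρ. Projecting Z onto the eigenvectors of ρ then gives principal components whose covariance matrix is the diagonal matrix Λ of eigenvalues. The function names are illustrative.

```python
import numpy as np


def standardize(X):
    """Formula (5): z = (x - mean) / std, column by column (population std)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def pca_from_correlation(X):
    """Eigen-decompose the correlation matrix of X.

    Returns the eigenvalues in descending order, the loading vectors l_j as
    columns, and the principal component scores y = Z @ L.
    """
    Z = standardize(X)
    rho = np.cov(Z, rowvar=False, bias=True)  # equals the correlation matrix of X
    lam, L = np.linalg.eigh(rho)              # eigh: rho is symmetric
    order = np.argsort(lam)[::-1]             # largest eigenvalue first
    lam, L = lam[order], L[:, order]
    return lam, L, Z @ L
```

Because the trace of ρ is p, the eigenvalues sum to the number of features. This is why r in formula (6) equals p when the correlation matrix is used.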

The correlation coefficient between the i-th principal component y_i and the original characteristic parameter x_j is called the factor loading. It measures how closely the original variable and the principal component are related: the closer its absolute value is to 1, the closer the relationship.
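The factor loadings do not need to be computed by correlating every pair explicitly. For standardized data, formulas (3) to (5) give corr(x_j, y_i) = sqrt(λ_i) · l_ji; this closed form is derived here and is not stated in the source. The NumPy sketch below uses illustrative names. It returns a p x p loading matrix with variables as rows and components as columns.

```python
import numpy as np


def factor_loadings(X):
    """Loadings a[j, i] = corr(x_j, y_i) = sqrt(lambda_i) * l_ji for standardized PCA.

    Returns the loading matrix and the principal component scores.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    lam, L = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    order = np.argsort(lam)[::-1]
    # Clipping guards against tiny negative eigenvalues caused by round-off.
    lam, L = np.clip(lam[order], 0.0, None), L[:, order]
    return L * np.sqrt(lam), Z @ L
```

Each row of the loading matrix has squared entries summing to 1, because the diagonal of the correlation matrix is 1.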

The total contribution of all principal components is denoted by r:

$$r = \sum_{j=1}^{p} \lambda_j \qquad (6)$$

In formula (6), the contribution rate of the j-th principal component is λ_j / r, and the cumulative variance contribution rate of the first m principal components is:

$$\frac{\sum_{j=1}^{m} \lambda_j}{r}$$

When the cumulative contribution rate exceeds 80%, the first m principal components are extracted to replace the original variables, retaining most of their characteristic information.
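The 80% rule fits in a few lines of plain Python. The sketch assumes the eigenvalues λ_j are already available, for example from a correlation-matrix eigendecomposition, and uses >= 0.80 as the stopping test. The function name is hypothetical.

```python
def select_components(eigenvalues, threshold=0.80):
    """Return (m, cumulative rate): the smallest m whose first m components reach the threshold."""
    lam = sorted(eigenvalues, reverse=True)
    r = sum(lam)                  # formula (6): total contribution
    cumulative = 0.0
    for m, value in enumerate(lam, start=1):
        cumulative += value / r   # add the contribution rate of the m-th component
        if cumulative >= threshold:
            return m, cumulative
    return len(lam), cumulative
```

For example, eigenvalues 2.0, 1.0, 0.5, 0.3 and 0.2 reach 87.5% with three components. In the embodiment, the same rule reduced the 16 Table 1 features to a 5-dimensional matrix.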

Step 4: User feature clustering analysis. Users are classified with a clustering algorithm applied to the 5-dimensional reduced feature matrix obtained in Step 3. After clustering, the distance from each user to the center of its class is used to distinguish users, so that each class can be sampled independently.

In this embodiment, the user feature clustering analysis of Step 4 divides users into classes with the K-Means clustering algorithm, applied to the principal components obtained by dimension reduction in Step 3. After clustering, the distance from each user to its cluster center is used to distinguish users, so that each class can be sampled independently.

Let the sample data be the matrix X_{n×m}, where n is the number of samples and m is the feature dimension. Initialize k cluster centers {C₁, C₂, …, C_k} and compute the Euclidean distance from each sample to each cluster center:

$$d(X_i, C_j) = \sqrt{\sum_{t=1}^{m} (X_{it} - C_{jt})^2} \qquad (7)$$

In formula (7), X_i denotes the i-th sample, 1 ≤ i ≤ n; C_j denotes the j-th cluster center, 1 ≤ j ≤ k; X_it denotes the t-th feature of the i-th sample, 1 ≤ t ≤ m; and C_jt denotes the t-th attribute of the j-th cluster center.
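For illustration, the k-means loop around formula (7) can be sketched with NumPy as follows. The farthest-first initialization, the iteration cap and the function names are assumptions; the source says only that k centers are initialized. The function also returns each user's distance to its own cluster center, which Step 6 uses for sampling.

```python
import numpy as np


def euclidean(X, C):
    """Formula (7): distance matrix d[i, j] between samples X_i and centers C_j."""
    return np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))


def kmeans(X, k, n_iter=100, seed=0):
    """Lloyd-style k-means; returns labels, centers and each sample's distance to its center."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumed farthest-first initialization: each new center is the point farthest from existing ones.
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d_min = euclidean(X, np.array(centers)).min(axis=1)
        centers.append(X[d_min.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = euclidean(X, centers).argmin(axis=1)  # assign each sample to its nearest center
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])             # move each center to its cluster mean
        if np.allclose(new, centers):
            break
        centers = new
    d = euclidean(X, centers)
    labels = d.argmin(axis=1)
    return labels, centers, d[np.arange(len(X)), labels]
```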

The number of clusters k is selected by the elbow rule. As k increases, the rule looks for the inflection point at which the decrease of the loss function flattens. The loss function is the sum of squared errors within clusters (SSE). The distances from the sample points in a cluster to its center reflect how compact the cluster is. The within-cluster SSE is computed as:

$$SSE = \sum_{i=1}^{k} \sum_{p \in C_i} \lVert p - m_i \rVert^2 \qquad (8)$$

In formula (8), C_i is the i-th cluster, p is a sample point in C_i, and m_i is the center of C_i.
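Formula (8) and the elbow rule can be sketched as follows. The elbow is taken as the k with the largest second difference of the SSE curve. That is one common heuristic assumed here, not a rule stated in the source. The `ks` list and the SSE values would come from running k-means for each candidate k.

```python
import numpy as np


def sse(X, labels):
    """Formula (8): sum over clusters of squared distances from points to their cluster mean."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        pts = X[labels == c]
        total += float(((pts - pts.mean(axis=0)) ** 2).sum())
    return total


def elbow_k(ks, sse_values):
    """Assumed heuristic: the elbow is the k where the SSE curve bends most sharply."""
    s = np.asarray(sse_values, dtype=float)
    second_diff = s[:-2] - 2 * s[1:-1] + s[2:]
    return ks[int(np.argmax(second_diff)) + 1]
```

With SSE values 100, 20, 15, 12 and 10 for k = 1 to 5, the curve bends most sharply at k = 2.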

k-means clustering divides the users into three classes. The first class contains only two users. Their total driving mileage and running time are the smallest and deviate from the overall user characteristics. In this embodiment, therefore, only the second and third classes are sampled, each separately.

Step 5: Determine the minimum sample size.

The minimum sample size depends on the sampling error. Over repeated sampling, the sample mean fluctuates around the true mean, and this deviation from the population mean is the sampling error. The confidence interval for the population mean consists of the sample mean x̄ and the estimation error. For sampling with replacement, or from an infinite population, the estimation error is:

$$E = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

z_{α/2} and the sample size n together determine the magnitude of the estimation error. Once the confidence level 1 - α is fixed, the value of z_{α/2} is determined. For given z_{α/2} and population standard deviation σ, the sample size required for any desired estimation error can then be determined. Let the expected estimation error be ε; then:

$$\varepsilon = z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

From this, the sample size formula follows:

$$n = \frac{z_{\alpha/2}^2 \, \sigma^2}{\varepsilon^2}$$

where ε is the acceptable estimation error at the given confidence level, and z_{α/2} is determined by the confidence level used in the interval estimation.
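The minimal sketch below implements the sample-size formula, using Python's `statistics.NormalDist` to obtain z_{α/2} and rounding up. The optional finite-population correction n = n₀ / (1 + (n₀ - 1)/N) is an assumption, added because the method mentions the total sample size. The source formula itself is the infinite-population one.

```python
import math
from statistics import NormalDist


def min_sample_size(sigma, epsilon, confidence=0.95, population_size=None):
    """n0 = z^2 * sigma^2 / epsilon^2, optionally corrected for a finite population of size N."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    n0 = (z * sigma / epsilon) ** 2
    if population_size is not None:
        # Assumed finite-population correction (not given in the source).
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)
```

For instance, with σ = 1.4, ε = 0.2 and 95% confidence, the infinite-population formula gives about 188.2, which rounds up to 189.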

Step 6: Classified random sampling. After clustering, the Euclidean distance from each user to the center of its cluster is known. Each class is then randomly sampled, with the mean Euclidean distance of the whole class as the target. The goal is for the mean and variance of the distances in the extracted sample to be close to those of the entire class.
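Per-class sampling with proportional allocation might be implemented as in the plain-Python sketch below, which uses largest-remainder rounding. The allocation rule matches the embodiment's split of 200 users by class ratio. The rounding method, the data layout (one list of user IDs per class) and the function names are assumptions.

```python
import random


def proportional_allocation(total, class_sizes):
    """Split `total` draws across classes in proportion to class size (largest remainder)."""
    N = sum(class_sizes)
    raw = [total * s / N for s in class_sizes]
    alloc = [int(r) for r in raw]
    # Hand out the remaining draws to the classes with the largest fractional parts.
    order = sorted(range(len(raw)), key=lambda i: raw[i] - alloc[i], reverse=True)
    for i in order[: total - sum(alloc)]:
        alloc[i] += 1
    return alloc


def classified_random_sample(classes, total, seed=None):
    """Draw a simple random sample without replacement from each class."""
    rng = random.Random(seed)
    sizes = proportional_allocation(total, [len(c) for c in classes])
    return [rng.sample(c, n) for c, n in zip(classes, sizes)]
```

For example, with hypothetical class sizes of 1,190 and 810 users, `proportional_allocation(200, [1190, 810])` returns `[119, 81]`.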

Step 7: Hypothesis testing. Based on the sample information, the difference between the population index and the sample index is inferred at a given probability level. When the population variance is known, the extracted samples can be analyzed with the Z test, whose statistic is:

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \qquad (11)$$

In formula (11), x̄ is the sample mean, μ₀ is the population mean, σ is the population standard deviation, and n is the sample size.

When each class of users is randomly sampled, two hypotheses are set up:

- the null hypothesis H0: the feature mean of the extracted sample equals the population mean;
- the alternative hypothesis H1: the feature mean of the extracted sample deviates from the population mean.

Let P be the probability that H0 holds. The larger P is, the closer the feature mean of the extracted sample is to the population feature mean.
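Formula (11) and the probability P can be computed with the standard library, as sketched below. Treating P as the two-sided p-value of the Z test is an assumption about what the source means by the probability that the hypothesis holds. In this method, the tested feature would be each user's distance to the cluster center.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test(sample, mu0, sigma):
    """Return (Z, two-sided p-value) for H0: sample mean equals mu0, with known sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))  # formula (11)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p
```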

In this embodiment, the overall variance described in step 7 is the variance of the distance from each user to the center of the cluster.

Step 8: Establish the objective function and constraint conditions. The goal is for the mean of the user sample extracted from each class to be approximately equal to that class's population mean, and for the sample variance to be close to the population variance. The probability P that the hypothesis holds in the Z test is taken as the optimization parameter. Denote the population variance of each class's user characteristic by σ² and the variance of the extracted sample by s². The objective function and constraint conditions of the random sampling process are then:

Objective function: P > 0.99 or max(P).

Constraint condition: the sample variance s² is kept close to the population variance σ².
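The repeated sampling of Steps 8 and 9 might look like the loop below. It keeps the draw with the largest P among those that satisfy a variance constraint, and stops early once P > 0.99. The relative-tolerance form of the variance constraint, the parameter values and the function names are hypothetical. P is again taken as the two-sided p-value of formula (11).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def pop_var(xs):
    """Population variance (divide by n)."""
    m = fmean(xs)
    return fmean([(x - m) ** 2 for x in xs])


def screen_class(distances, n, p_target=0.99, var_tol=0.05, max_draws=10_000, seed=0):
    """Return (P, indices) of the best sample found, or None if no draw met the constraint."""
    rng = random.Random(seed)
    mu, var = fmean(distances), pop_var(distances)
    best = None
    for _ in range(max_draws):
        idx = rng.sample(range(len(distances)), n)
        s = [distances[i] for i in idx]
        if abs(pop_var(s) - var) / var > var_tol:      # hypothetical variance constraint
            continue
        z = (fmean(s) - mu) / (sqrt(var) / sqrt(n))    # Z statistic, formula (11)
        p = 2 * (1 - NormalDist().cdf(abs(z)))         # P taken as two-sided p-value
        if best is None or p > best[0]:
            best = (p, idx)
        if p > p_target:                               # objective P > 0.99 reached
            break
    return best
```

Here `distances` would hold each class member's distance to its cluster center, and `n` the per-class sample size from the allocation step.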

Step 9: Determine the optimal typical-user solution. Based on the objective function and constraint conditions established in Step 8, each class of users is sampled repeatedly until the conditions are met.

The screened typical users must also take the regional distribution of users into account. Different regions and road conditions damage the electric drive assembly to different degrees. The screened typical users are therefore adjusted according to the users' regional distribution so as to cover users in multiple regions.

In this embodiment, the minimum sample size is 200 users. According to the ratio of second-class to third-class users after cluster analysis, 119 users are drawn from the second class and 81 from the third class. Typical user samples for each class are obtained through repeated classified sampling. The mean and variance of the second-class sample are shown in Table 2, and those of the third-class sample in Table 3.

Finally, taking the users' regional distribution into account, 200 typical users are determined. They cover 18 provinces, 4 municipalities directly under the central government and 2 autonomous regions. The selection is reasonably representative of the Northeast, Northwest, China, East China, Southwest and South China regions.

TABLE 2

Statistic	Sample	Overall	Difference rate
Mean	2.5695	2.5701	-0.023%
Variance	1.9665	1.9671	0.031%

TABLE 3

Statistic	Sample	Overall	Difference rate
Mean	2.4492	2.4509	-0.069%
Variance	1.9550	1.9575	0.128%
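The difference rates in Tables 2 and 3 are consistent with (sample - overall) / overall x 100%. The short check below reproduces the mean rates with their signs. For the variance rows the computed values are negative, so the tables appear to report the magnitude. The function and variable names are illustrative.

```python
def difference_rate(sample_value, overall_value):
    """Relative difference in percent: (sample - overall) / overall * 100."""
    return (sample_value - overall_value) / overall_value * 100


# Values copied from Tables 2 and 3 as (sample, overall) pairs.
TABLE_2 = {"mean": (2.5695, 2.5701), "variance": (1.9665, 1.9671)}
TABLE_3 = {"mean": (2.4492, 2.4509), "variance": (1.9550, 1.9575)}

for name, table in (("Table 2", TABLE_2), ("Table 3", TABLE_3)):
    for stat, (s, o) in table.items():
        print(f"{name} {stat}: {difference_rate(s, o):.3f}%")
```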

Based on the overall user sample data, the method proceeds as follows:

- it analyzes the failure-dominant loads of the electric drive assembly and determines the load factors affecting its reliability;
- it analyzes user characteristic differences from the load amplitude distribution and combined distribution to determine the key load characteristic parameters;
- it classifies users through multi-feature fusion analysis and cluster analysis;
- it determines the minimum sample size from the overall sample size and the expected estimation error;
- it screens typical users associated with the reliability-critical loads of the electric drive assembly through classified random sampling, hypothesis testing, and the established objective function and constraint conditions.

The typical users screened in this way effectively cover the regional distribution, driving behaviors and driving conditions of the actual user population. Their load characteristics take the reliability-critical load information of the electric drive assembly into account. They can therefore provide load inputs for compiling accelerated durability test specifications during product development and test verification, which accelerates product development.

The above-described embodiments are merely illustrative of the preferred embodiments of the present invention, and do not limit the scope of the present invention, and various modifications and improvements of the technical solutions of the present invention can be made by those skilled in the art without departing from the spirit of the present invention, and the technical solutions of the present invention are within the scope of the present invention defined by the claims.

Claims

1. A user screening method based on electric drive assembly load feature fusion and clustering is characterized by comprising the following steps:

2. The user screening method based on electric drive assembly load feature fusion and clustering according to claim 1, wherein

the sample load data at least comprise load data in different regions, different driving behaviors and different driving conditions;

3. The user screening method based on electric drive assembly load feature fusion and clustering according to claim 1, wherein

the typical characteristic parameters include, but are not limited to, user running time, mileage, speed mean, speed variance, motor torque mean, motor torque variance, acceleration mean, and acceleration variance.

4. The user screening method based on electric drive assembly load feature fusion and clustering according to claim 1, wherein

the processing of the key load characteristic parameters of the electric drive assembly comprises multi-feature fusion analysis, user feature cluster analysis, minimum sample size determination, classified random sampling, hypothesis testing, establishment of an objective function and constraint conditions, and determination of an optimal typical-user solution.

5. The user screening method based on electric drive assembly load feature fusion and clustering according to claim 4,

6. The user screening method based on electric drive assembly load feature fusion and clustering according to claim 5,

7. The user screening method based on electric drive assembly load feature fusion and clustering according to claim 5, wherein

the user clustering analysis classifies users with a clustering algorithm applied to a dimension-reduced feature matrix, to obtain a user feature clustering result;

8. The user screening method based on electric drive assembly load feature fusion and clustering according to claim 5, wherein

the hypothesis test analyzes the extracted sample with a Z test;

the Z test statistic is:

$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where x̄ is the sample mean, μ₀ is the overall mean, σ is the overall standard deviation, and n is the number of samples.