CN114610595A

CN114610595A - Method, device, equipment and storage medium for identifying model performance influence factors

Info

Publication number: CN114610595A
Application number: CN202210351077.8A
Authority: CN
Inventors: 王尧峰; 曹斌
Original assignee: Neusoft Reach Automotive Technology Shenyang Co Ltd
Current assignee: Neusoft Reach Automotive Technology Shenyang Co Ltd
Priority date: 2022-04-02
Filing date: 2022-04-02
Publication date: 2022-06-10

Abstract

The invention discloses a method, a device, equipment and a storage medium for identifying model performance influence factors. The method comprises the following steps: constructing an initial model, and training the initial model based on an obtained input error training set and an obtained influence factor training set to obtain a target error model; acquiring an input error data group and a target feature data group to be analyzed, wherein the target feature data group to be analyzed comprises at least one feature to be analyzed; inputting the input error data group and the target feature data group to be analyzed into the target error model to obtain an output error data group; and processing the output error data group to obtain the influence level of each feature to be analyzed on the target error model. With this technical scheme, the influence level of each feature to be analyzed on the error of the original model can be determined indirectly.

Description

Method, device, equipment and storage medium for identifying model performance influence factors

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to a method, a device, equipment and a storage medium for identifying model performance influence factors.

Background

At present, error analysis of an original model is generally performed directly on the original model itself. The analysis requires knowledge of the original model's running program and input data, requires the original model to be re-executed, and can only cover the data items that the original model uses.

However, the model currently being constructed cannot call the running program of the original model, so the original model cannot be re-executed during error analysis. In addition, the data items used by the original model may lack some input items, or some data may deviate in frequency; that is, the input data of the original model are difficult to retrieve. As a result, the error of the original model cannot currently be analyzed.

Disclosure of Invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, an object of the present invention is to provide a method, an apparatus, a device and a storage medium for identifying model performance influence factors.

In order to solve the above technical problem, an embodiment of the present invention provides the following technical solutions:

A method for identifying model performance influence factors comprises the following steps:

constructing an initial model, and training the initial model based on the obtained input error training set and the obtained influence factor training set to obtain a target error model;

acquiring an input error data group and a target feature data group to be analyzed, wherein the target feature data group to be analyzed comprises at least one feature to be analyzed; and inputting the input error data group and the target feature data group to be analyzed into the target error model to obtain an output error data group;

and processing the output error data group to obtain the influence level of each kind of feature to be analyzed on the target error model.

Optionally, the target feature data group to be analyzed includes any one of the following:

M kinds of features to be analyzed;

M-1 features plus one feature to be analyzed;

a single feature to be analyzed;

wherein M is an integer not less than 2.

Optionally, before the initial model is constructed, the method includes:

acquiring an initial feature data group to be analyzed and an initial input error data group;

and calculating the correlation between the initial feature data group to be analyzed and the initial input error data group based on a correlation algorithm, and screening the initial feature data group to be analyzed based on the correlation to obtain a first target feature data group to be analyzed.

Optionally, calculating the correlation between the initial feature data group to be analyzed and the initial input error data group based on a correlation algorithm, and screening the initial feature data group to be analyzed based on the correlation to obtain the first target feature data group to be analyzed, includes:

calculating the correlation between each feature to be analyzed and the initial input error data group;

screening the initial feature data group to be analyzed based on the correlation to obtain the first target feature data group to be analyzed; or

performing dimensionality reduction on the initial feature data group to be analyzed to obtain an initial dimensionality-reduced feature data group to be analyzed;

calculating the dimensionality-reduction correlation between the initial dimensionality-reduced feature data group to be analyzed and the initial input error data group;

and screening the initial dimensionality-reduced feature data group to be analyzed based on the dimensionality-reduction correlation to obtain the first target feature data group to be analyzed.

Optionally, constructing the initial model and training it based on the obtained input error training set and the obtained influence factor training set to obtain the target error model includes:

acquiring a first input error training set and a first influence factor training set based on the first target feature data group to be analyzed;

constructing the initial model, and training the initial model based on the first input error training set and the first influence factor training set to obtain a first target error model;

performing iterative training on the first target error model to obtain an (N-1)th target error model, wherein N is an integer not less than 2;

acquiring an Nth influence factor training set and an Nth input error training set based on an Nth target feature data group to be analyzed;

and performing an Nth round of training on the (N-1)th target error model based on the Nth influence factor training set and the Nth input error training set to obtain an Nth target error model.

Optionally, processing the output error data group to obtain the influence level of each kind of feature to be analyzed on the target error model includes:

analyzing the Nth output error data group based on an evaluation algorithm to obtain an influence value of each feature to be analyzed on the Nth target error model;

and determining the influence level of each feature to be analyzed on the Nth target error model based on the influence value.

Optionally, after the influence level of each feature to be analyzed on the Nth target error model is determined, the method further includes:

screening the Nth target feature data group to be analyzed based on the influence level of each feature to be analyzed on the Nth target error model to obtain an (N+1)th target feature data group to be analyzed;

acquiring an (N+1)th influence factor training set and an (N+1)th input error training set based on the (N+1)th target feature data group to be analyzed and an (N+1)th input error data group;

and performing an (N+1)th round of training on the Nth target error model based on the (N+1)th influence factor training set and the (N+1)th input error training set to obtain an (N+1)th target error model.

An embodiment of the present invention further provides an apparatus for identifying model performance influence factors, including:

a training module, configured to construct an initial model and train it based on the acquired input error training set and the acquired influence factor training set to obtain a target error model;

an acquisition module, configured to acquire an input error data group and a target feature data group to be analyzed, wherein the target feature data group to be analyzed comprises at least one feature to be analyzed;

an output module, configured to input the input error data group and the target feature data group to be analyzed into the target error model to obtain an output error data group;

and a calculation module, configured to process the output error data group to obtain the influence level of each kind of feature to be analyzed on the target error model.

Embodiments of the present invention also provide an electronic device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method as described above when executing the computer program.

Embodiments of the present invention also provide a computer-readable storage medium comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method as described above.

The embodiment of the invention has the following technical effects:

In the technical scheme of the invention: 1) The input error data group is obtained from the RTM and need not be consistent with the input of the original model, and the original model does not need to be re-executed; the running program of the original model therefore does not need to be known, and the operation is simple. In addition, because the original model and the target error model share the estimated value group, the influence level of each feature to be analyzed on the error of the original model can be determined indirectly by determining its influence level on the error of the target error model.

2) A target error model is constructed that is independent of the original model, so that neither affects the operation of the other; the target error model of the embodiment therefore has strong independence, little interference and high confidence.

3) The target error model is trained multiple times according to the kinds of features to be analyzed, which improves the confidence of the target error model and the accuracy of the determined influence level of each feature on the error of the original model.

4) The target feature data group to be analyzed comes in several categories, and the category can be selected according to the user's requirements, which improves the system's data processing efficiency and the accuracy of the processing results.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

FIG. 1 is a schematic flow chart of a method for identifying model performance influencing factors according to an embodiment of the present invention;

FIG. 2 is a first example of a flow of a method for identifying model performance influencing factors according to an embodiment of the present invention;

FIG. 3 is a second example of a process of a method for identifying model performance influencing factors according to an embodiment of the present invention;

FIG. 4 is a third example of a flow of a method for identifying model performance influencing factors according to an embodiment of the present invention;

FIG. 5 is a schematic structural diagram of an apparatus for identifying model performance influencing factors according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals denote the same or similar elements, or elements having the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting it.

First, in order to facilitate understanding of embodiments of the present invention by those skilled in the art, some terms are explained:

(1) RTM: Real Time Monitor, a real-time monitoring system.

(2) Pearson algorithm: the Pearson correlation coefficient, widely used to measure the degree of linear correlation between two variables; its value lies between -1 and 1.

(3) PCA algorithm: Principal Component Analysis.

(4) LASSO: Least Absolute Shrinkage and Selection Operator.

(5) RF: Random Forest.

(6) MLP: Multi-Layer Perceptron.

(7) MSE: Mean Squared Error.

(8) F1 score: the harmonic mean of precision and recall.

(9) PDP: Partial Dependence Plot.

(10) ICE: Individual Conditional Expectation.

(11) SHAP: SHapley Additive exPlanations.

(12) SOC: State of Charge.

(13) SOH: State of Health.

An embodiment of the invention provides a system for identifying model performance influence factors, which comprises:

an RTM, a model builder, a processor and a memory;

the RTM, the model builder, the processor and the memory exchange data over a network.

In a practical application scenario, the original model is built into the RTM, so a real value group and an estimated value group can be obtained from the RTM. Specifically, the original model outputs the estimated value group and sends it to the RTM for storage; the real value group and the estimated value group are then processed to obtain a calculated error group, which is also stored in the RTM. That is, the estimated value group, the real value group and the calculated error group can all be obtained from the RTM, and the input error data group is obtained from the calculated error group.
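A minimal sketch of how the calculated error group could be formed by pairing each real value with its estimated value. The source does not state the error formula; this sketch assumes a signed error (estimated minus real), and the function name and sample values are hypothetical.

```python
from typing import List, Sequence


def calculated_error_group(real: Sequence[float], estimated: Sequence[float]) -> List[float]:
    """Pair each real value with its estimate and return the signed error.

    Assumption: error = estimated - real. Absolute or relative error would be
    equally valid choices; the source does not fix the formula.
    """
    if len(real) != len(estimated):
        raise ValueError("real and estimated groups must have the same length")
    return [est - r for r, est in zip(real, estimated)]


# Hypothetical remaining-charging-time values, in minutes.
real_minutes = [42.0, 35.0, 60.0]
estimated_minutes = [40.0, 38.0, 55.0]
print(calculated_error_group(real_minutes, estimated_minutes))  # [-2.0, 3.0, -5.0]
```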

The input error data group, together with a feature data group to be analyzed that may affect the error of the original model, is input into a target error model that is constructed by the model builder and obtained through training;

the target error model is trained by the model builder on an input error training set and an influence factor training set and is independent of the original model; that is, the original model does not need to be re-executed when error analysis is performed with the target error model.

Further, after the output error data group is obtained from the target error model, it is input into the processor, which analyzes it with a built-in evaluation algorithm to obtain the influence level, or degree of influence, of each feature to be analyzed on the target error model.

Further, during system operation, the generated data can be stored in the memory through a data interface as required for subsequent retrieval.

In the embodiment of the invention, the input error data group is obtained from the RTM and need not be consistent with the input of the original model, and the original model does not need to be re-executed, so its running program does not need to be known and the operation is simple. In addition, because the original model and the target error model share the estimated value group, the influence level of each feature to be analyzed on the error of the original model can be determined indirectly by determining its influence level on the error of the target error model.

As shown in fig. 1, an embodiment of the present invention further provides a method for identifying a model performance influencing factor, which is applied to the above system, and includes:

step S1: constructing an initial model, and training the initial model based on the obtained input error training set and the influence factor training set to obtain a target error model;

In the embodiment of the invention, a real value group, an estimated value group and an input error data group are obtained from the RTM;

the real value group comprises a plurality of real values, and the estimated value group comprises a plurality of estimated values; each real value corresponds to one estimated value, forming one group of data, and by analogy a plurality of groups of data, namely the input error data group, can be obtained.

The kinds of features that may be analyzed can be obtained from business knowledge, for example: temperature, time, remaining charge capacity, maximum power of the charging pile, stability of the charging pile, ambient temperature, temperature rise rate and the like; business knowledge thus fixes an approximate range of the feature types to be analyzed.

Specifically, taking the real value group and estimated value group of the remaining charging time as an example, business knowledge yields several factors that may influence the error in the remaining charging time, and the ways in which they act:

For example: 1) High-SOC charging mode: when the battery is close to full charge, charging switches from constant current to constant voltage or multi-stage constant current; the current then decreases and charging takes longer.

2) Remaining charge capacity: the amount of charge still to be delivered, which is related to SOC and SOH;

3) Power supply limitation: including the maximum charging power of the current charging equipment and whether the current it provides is stable;

4) Battery equalization: battery equalization may consume additional energy, thereby affecting the charging time;

5) Thermal management: on the one hand, thermal management consumes additional energy; on the other hand, it affects the charging power through the power meter, thereby affecting the charging time;

6) Charging power limit: the charging power limit, which depends on SOC, temperature and SOH, is an important factor affecting the remaining charging time.

Furthermore, the original model acquires a plurality of groups of data, analyzes them, and outputs the input error data group.

Each input error data group comprises a plurality of input error data, and their number matches the number of groups of data.

In an optional embodiment of the present invention, if the real value group and the estimated value group cannot be obtained from the RTM, only an evaluation of the estimated value group can be obtained; the evaluation may be, for example, large error, small error, pass or fail.

In a practical application scenario, the estimated value group evaluation obtained may, for example, comprise a subgroup of qualified estimates and a subgroup of unqualified estimates.

Specifically, the estimated value group evaluation may be analyzed as follows:

acquiring a target feature group to be analyzed, and analyzing the correlation between the target feature group to be analyzed and the estimated value group evaluation based on a mean value test algorithm;

Further, one feature in the target feature group to be analyzed is taken as the variable: the data of all other features in the group are kept unchanged and only the data of this feature are changed. For example, if the feature to be analyzed is temperature, the temperature is varied along a gradient, in steps of 5 degrees between 20 and 70 degrees, to obtain a plurality of target feature groups to be analyzed. Each of these groups is analyzed together with the estimated value group evaluation using a mean value test algorithm (for example, the t-test), observing how the data distributions of the qualified and unqualified estimate subgroups change. In this way the correlation between each feature to be analyzed and the estimated value group evaluation, and hence between each feature and the original model, can be determined: the influence of each feature on the error of the original model is determined from the t-test value, and the influence level is determined from that influence.
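The mean value test described above can be sketched as a two-sample t-test on one feature, with the data split by whether the matching estimate was evaluated as qualified or unqualified. This sketch assumes Welch's unequal-variance t statistic (the source only says "t test"), uses only the Python standard library, and the data are hypothetical; turning t into a p-value would additionally need the t distribution, which is omitted here.

```python
import math
import statistics
from typing import Sequence


def welch_t(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Welch's t statistic for the difference between two subgroup means."""
    if len(group_a) < 2 or len(group_b) < 2:
        raise ValueError("each subgroup needs at least two values")
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    if standard_error == 0:
        raise ValueError("both subgroups are constant; t is undefined")
    return (mean_a - mean_b) / standard_error


def feature_t_value(values: Sequence[float], qualified: Sequence[bool]) -> float:
    """Split one feature's data by the qualified/unqualified evaluation and compare them."""
    passed = [v for v, ok in zip(values, qualified) if ok]
    failed = [v for v, ok in zip(values, qualified) if not ok]
    return welch_t(passed, failed)


# Hypothetical data: charging temperature (deg C) and whether each estimate was qualified.
temperatures = [25.0, 30.0, 28.0, 27.0, 45.0, 50.0, 48.0, 52.0]
qualified_flags = [True, True, True, True, False, False, False, False]
print(round(feature_t_value(temperatures, qualified_flags), 2))
```

A large absolute t suggests the feature's distribution differs between the qualified and unqualified subgroups, i.e. the feature is a candidate influence factor.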

In addition, a scatter plot can be used, as needed, to determine the correlation between each feature to be analyzed and the estimated value group evaluation.

Further, before the initial model is constructed, the method includes:

acquiring an initial feature data group to be analyzed and an initial input error data group;

Specifically, calculating the correlation between the initial feature data group to be analyzed and the initial input error data group based on a correlation algorithm, and screening the initial feature data group to be analyzed based on the correlation to obtain a first target feature data group to be analyzed, includes:

screening the initial feature data group to be analyzed based on the correlation to obtain the first target feature data group to be analyzed;

The embodiment of the invention may be implemented with the Pearson algorithm; specifically, the correlation between each feature to be analyzed and the initial input error data group can be determined from the value of the Pearson coefficient. For example, if the Pearson coefficient is 0, there is no linear correlation between the feature to be analyzed and the initial input error data group;

if the Pearson coefficient is 1, the feature to be analyzed and the initial input error data group are perfectly positively linearly correlated;

……

By analogy, the correlation between each feature to be analyzed and the initial input error data group can be determined. Further, a correlation threshold can be set: features whose Pearson coefficient does not meet the threshold are filtered out and deleted, and the features whose Pearson coefficient meets the threshold are retained to form the first target feature data group to be analyzed.
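A minimal sketch of the Pearson screening step, assuming a feature is retained when the absolute value of its coefficient reaches the threshold (so strong negative correlations are kept as well). The threshold of 0.5, the feature names and the data are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear correlation; treat it as none
    return sxy / math.sqrt(sxx * syy)


def screen_features(features: Dict[str, Sequence[float]], errors: Sequence[float],
                    threshold: float) -> Dict[str, float]:
    """Return {feature: r} for features with |r| >= threshold against the input errors."""
    scores = {name: pearson(values, errors) for name, values in features.items()}
    return {name: r for name, r in scores.items() if abs(r) >= threshold}


# Hypothetical data: one value per sample for each feature, and the matching input errors.
errors = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "temperature": [20.0, 22.0, 24.0, 26.0, 28.0],
    "time": [5.0, 3.0, 5.0, 3.0, 5.0],
    "soc": [90.0, 80.0, 70.0, 60.0, 50.0],
}
print(screen_features(features, errors, threshold=0.5))  # {'temperature': 1.0, 'soc': -1.0}
```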

It should be noted that, in the above embodiment, each kind of feature to be analyzed corresponds to multiple pieces of feature data to be analyzed.

In this way, the embodiment of the invention can delete features whose correlation with the error of the target error model is weak, avoiding wasted computing power in subsequent calculations and improving computational efficiency.

Alternatively, in the embodiment of the present invention, dimensionality reduction is performed on the initial feature data group to be analyzed to obtain an initial dimensionality-reduced feature data group to be analyzed;

The embodiment of the invention may be implemented with the PCA algorithm: PCA is used to reduce the dimensionality of the initial feature data group to be analyzed, and, based on the result, a set of features whose correlation with the initial input error data group is relatively large is obtained; this set comprises a plurality of features to be analyzed, from which the first target feature data group to be analyzed is formed.
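The PCA route can be sketched as below, under stated assumptions: the features are standardized so that PCA works on their correlation matrix, only the first principal component is computed (by power iteration, to stay within the standard library), and features whose absolute loading on it reaches a hypothetical threshold are kept. The source additionally relates the reduced data to the initial input error data group; that step (for example a Pearson check between component scores and errors) is not shown, and the data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def standardize(column: Sequence[float]) -> List[float]:
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / std if std > 0 else 0.0 for v in column]


def first_principal_component(columns: Sequence[Sequence[float]], iterations: int = 200) -> List[float]:
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    corr = [[sum(z[a][k] * z[b][k] for k in range(n)) / (n - 1) for b in range(p)] for a in range(p)]
    vec = [1.0] * p  # start vector; assumed not orthogonal to the leading eigenvector
    for _ in range(iterations):
        nxt = [sum(corr[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return vec


def select_by_loading(features: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Keep features whose absolute loading on the first component meets the threshold."""
    names = list(features)
    loadings = first_principal_component([features[name] for name in names])
    return [name for name, w in zip(names, loadings) if abs(w) >= threshold]


features = {
    "temperature": [20.0, 25.0, 30.0, 35.0, 40.0],
    "temp_rise": [1.0, 1.4, 2.1, 2.4, 3.0],
    "time": [3.0, 1.0, 4.0, 1.0, 5.0],
}
print(select_by_loading(features, threshold=0.5))
```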

In a practical application scenario, either the PCA algorithm or the Pearson algorithm can be selected as needed to screen the feature data group to be analyzed and obtain the Nth target feature data group to be analyzed.

Further, constructing the initial model and training it based on the obtained input error training set and the obtained influence factor training set to obtain the target error model includes:

In the embodiment of the present invention, to improve how well the Nth target error model matches the Nth input error data group and the Nth target feature data group to be analyzed, the current target error model is retrained whenever the total number of feature types contained in the Nth target feature data group to be analyzed changes; for example, the (N-1)th target error model is trained to obtain the Nth target error model.
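The retraining rule can be sketched as a loop in which the (N-1)th model is retrained into the Nth only when screening changes the number of feature types. The `train` and `screen` callbacks are hypothetical placeholders for model fitting and influence-based screening, and the round limit is an added assumption that guarantees termination.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple

TrainFn = Callable[[Optional[Any], List[str]], Any]
ScreenFn = Callable[[Any, List[str]], List[str]]


def iterate_error_models(initial_features: Sequence[str], train: TrainFn, screen: ScreenFn,
                         max_rounds: int = 10) -> Tuple[Any, List[str]]:
    """Retrain the target error model each time screening changes the feature-type count."""
    features = list(initial_features)
    model = train(None, features)  # first target error model
    for _ in range(max_rounds):
        next_features = screen(model, features)
        # Retrain only when the number of feature types changes, as described above.
        if len(next_features) == len(features):
            break
        features = next_features
        model = train(model, features)  # (N-1)th model -> Nth model
    return model, features


# Hypothetical callbacks: "training" just records the round and the feature list,
# and screening drops features already judged to have low influence.
def toy_train(previous, features):
    return {"round": previous["round"] + 1 if previous else 1, "features": list(features)}


def toy_screen(model, features):
    return [f for f in features if f not in {"time", "temperature"}]


model, features = iterate_error_models(["soc", "time", "temperature"], toy_train, toy_screen)
print(model["round"], features)  # 2 ['soc']
```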

According to the embodiment of the invention, a target error model is constructed that is independent of the original model, so that neither affects the operation of the other; the target error model therefore has strong independence, little interference and high confidence.

In an optional embodiment of the present invention, the Nth target error model may be a regression model or a classification model;

if the Nth output error data group needs to be obtained directly from the Nth target error model, a regression model is selected as the Nth target error model; if it is only necessary to determine whether the estimated value group of the original model is satisfactory, a classification model can be selected as the Nth target error model.

For example, a model with a LASSO, RF or MLP structure may be selected as the Nth target error model.

During the training of the (N-1)th target error model, the training result must be verified to be reliable, so as to ensure the reliability of the target error model obtained from it.

Specifically, a first threshold is set; when the (N-1)th target error model is a regression model, the confidence of the training result is verified using the MSE, and this step is repeated until the confidence of the training result exceeds the first threshold; the training result with that confidence is taken as the Nth target error model.

Similarly, a second threshold is set; when the (N-1)th target error model is a classification model, the confidence of the training result is verified using the F1 score, and this step is repeated until the confidence of the training result exceeds the second threshold; the training result with that confidence is taken as the Nth target error model.
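The acceptance check can be sketched as follows. The source states that the confidence must exceed a threshold; since a lower MSE means a better fit, this sketch assumes the regression check passes when the MSE is at or below a maximum, while the classification check passes when the F1 score is at or above a minimum. The function names, thresholds and data are hypothetical.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between true and predicted errors."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def training_result_accepted(kind: str, y_true, y_pred, threshold: float) -> bool:
    """Hypothetical acceptance rule for one training round."""
    if kind == "regression":
        return mean_squared_error(y_true, y_pred) <= threshold  # assumed: MSE must not exceed a maximum
    if kind == "classification":
        return f1_score(y_true, y_pred) >= threshold  # F1 must reach a minimum
    raise ValueError(f"unknown model kind: {kind}")


print(training_result_accepted("regression", [1.0, 2.0, 3.0], [1.0, 2.5, 2.5], threshold=0.2))  # True
print(training_result_accepted("classification", [1, 0, 1, 1], [1, 0, 0, 1], threshold=0.9))  # False
```

In a full loop, a round that fails the check would be followed by another training round, as the text describes.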

By verifying the training result, the embodiment of the invention ensures the confidence of the Nth output error data group obtained from the Nth target error model.

Step S2: acquiring an input error data group and a target feature data group to be analyzed, wherein the target feature data group to be analyzed comprises at least one feature to be analyzed;

further, the target feature data group to be analyzed includes any one of the following:

M kinds of features to be analyzed;

M-1 features plus one feature to be analyzed;

a single feature to be analyzed;

wherein M is an integer not less than 2.

In a practical application scenario, different kinds of Nth target feature data groups to be analyzed can be selected according to the actual situation;

for example: 1) When the influence of a gradient change in one feature to be analyzed on the error of the Nth target error model needs to be judged, an Nth target feature data group to be analyzed containing that single feature can be selected.

2) When the influence of one particular feature among several features to be analyzed on the error of the Nth target error model needs to be determined, an Nth target feature data group to be analyzed comprising M-1 features plus one feature to be analyzed can be selected. Specifically, the Nth target feature data group to be analyzed comprises multiple groups of data, each obtained by changing the data of that one feature; by analogy, multiple groups of feature data to be analyzed are obtained, from which the influence of that feature on the error of the Nth target error model is determined;

the above steps are repeated for each feature to obtain the influence level of each feature to be analyzed on the error of the Nth target error model.

3) When the influence of each of several features to be analyzed on the error of the Nth target error model needs to be judged simultaneously, an Nth target feature data group to be analyzed comprising the M features to be analyzed can be selected.
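Option 2) above can be sketched as generating copies of one baseline sample in which the other M-1 features stay fixed and only the feature under study changes, here along the temperature gradient of 20 to 70 degrees in 5-degree steps mentioned earlier in this description. The baseline values and feature names are hypothetical, and feeding each generated row to the Nth target error model is left out.

```python
from typing import Dict, List, Sequence


def vary_one_feature(baseline: Dict[str, float], feature: str,
                     levels: Sequence[float]) -> List[Dict[str, float]]:
    """Copies of a baseline sample in which only `feature` changes; the other M-1 features stay fixed."""
    if feature not in baseline:
        raise KeyError(f"unknown feature: {feature}")
    return [{**baseline, feature: float(level)} for level in levels]


# Hypothetical baseline sample with M = 3 features.
baseline = {"temperature": 25.0, "soc": 60.0, "pile_max_power_kw": 120.0}
temperature_levels = range(20, 71, 5)  # 20, 25, ..., 70 degrees, in 5-degree steps
rows = vary_one_feature(baseline, "temperature", temperature_levels)
print(len(rows), rows[0]["temperature"], rows[-1]["temperature"])  # 11 20.0 70.0
```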

Because the target feature data group to be analyzed comes in several categories, the embodiment of the invention allows the category to be selected according to the user's requirements, which improves the system's data processing efficiency and the accuracy of its processing results.

Step S3: inputting the input error data set and the target characteristic data set to be analyzed into the target error model to obtain an output error data set;

Specifically, inputting the input error data group and the target feature data group to be analyzed into the target error model to obtain the output error data group includes:

inputting the Nth target feature data group to be analyzed and the Nth input error data group into the Nth target error model to obtain an Nth output error data group; or

inputting the Nth dimensionality-reduced target feature data group to be analyzed and the Nth input error data group into the Nth target error model to obtain the Nth output error data group.

Step S4: and processing the output error data group to obtain the influence level of each type of the characteristic data to be analyzed on the target error model.

Specifically, processing the Nth output error data group to obtain the influence level of each kind of feature to be analyzed on the Nth target error model includes:

In the embodiment of the invention, after the Nth output error data group is obtained, it is analyzed with an evaluation algorithm; the Nth output error data group comprises a plurality of output error data, whose number is consistent with the number of input error data.

For example, the Nth output error data group is analyzed based on the PDP (partial dependence plot) algorithm or the ICE (individual conditional expectation) algorithm; specifically, the following formula may be used:

In the formula, V represents the category of the feature to be analyzed, n represents the number of data items of the feature to be analyzed of category V, and i is the serial number among those n data items; XS represents the feature to be analyzed that is being evaluated, and XC represents the remaining features to be analyzed. That is, XS and XC together cover all types of features to be analyzed contained in the Nth target feature data group to be analyzed.
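For reference, the textbook PDP estimate for the evaluated feature XS averages the model over the n samples while each sample keeps its own values of the remaining features XC, i.e. $\hat{f}_{X_S}(x_S) = \frac{1}{n}\sum_{i=1}^{n} \hat{f}\left(x_S, x_C^{(i)}\right)$; ICE keeps the n per-sample curves instead of averaging them. This generic form may differ in notation from the formula used in the embodiment. The minimal sketch below illustrates both under stated assumptions: `model` is a hypothetical callable standing in for the Nth target error model, feature rows are plain lists, and the spread (max minus min) of each PDP curve is used as one possible ranking score, not necessarily the rule used here.

```python
from typing import Callable, Dict, List, Sequence

Row = List[float]
Model = Callable[[Row], float]  # stands in for the Nth target error model


def ice_curves(model: Model, data: Sequence[Row], s: int,
               grid: Sequence[float]) -> List[List[float]]:
    """ICE: one curve per sample i; X_S is swept over `grid`, X_C^(i) is kept."""
    curves = []
    for row in data:
        curve = []
        for value in grid:
            probe = list(row)
            probe[s] = value  # overwrite only the evaluated feature X_S
            curve.append(model(probe))
        curves.append(curve)
    return curves


def partial_dependence(model: Model, data: Sequence[Row], s: int,
                       grid: Sequence[float]) -> List[float]:
    """PDP: average of the n ICE curves at each grid point."""
    curves = ice_curves(model, data, s, grid)
    n = len(curves)
    return [sum(curve[j] for curve in curves) / n for j in range(len(grid))]


def rank_by_pdp(model: Model, data: Sequence[Row],
                grids: Dict[int, Sequence[float]]) -> List[int]:
    """Priority ranking: feature indices sorted by the spread of their PDP."""
    spread = {}
    for s, grid in grids.items():
        pd = partial_dependence(model, data, s, grid)
        spread[s] = max(pd) - min(pd)
    return sorted(spread, key=spread.get, reverse=True)


if __name__ == "__main__":
    toy = lambda x: 2.0 * x[0] + 0.5 * x[1]  # hypothetical error model
    samples = [[1.0, 0.0], [2.0, 4.0], [3.0, 8.0]]
    print(partial_dependence(toy, samples, 0, [0.0, 1.0, 2.0]))  # [2.0, 4.0, 6.0]
    print(rank_by_pdp(toy, samples, {0: [0.0, 1.0, 2.0], 1: [0.0, 1.0, 2.0]}))  # [0, 1]
```

With this toy model, feature 0 gets the higher priority because its PDP curve varies four times as much as that of feature 1 over the same grid.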

In a practical application scenario, the PDP and ICE algorithms output a priority ranking of the correlation between each feature to be analyzed and the Nth input error data. From this ranking, the influence level of each feature to be analyzed on the Nth target error model can be determined, and correspondingly, the influence level of each feature to be analyzed on the original model can be obtained.

In an optional embodiment of the present invention, the Nth output error data group may be analyzed based on the SHAP (SHapley Additive exPlanations) algorithm to obtain the contribution degree of each feature to be analyzed to the Nth target error model. From the contribution degree, the influence level of each feature to be analyzed on the Nth target error model can be obtained, and correspondingly, the influence level of each feature to be analyzed on the original model can be obtained.
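The contribution degree can be illustrated with exact Shapley values, the quantity that SHAP approximates. The sketch below rests on simplifying assumptions: each sample is compared with a single baseline row rather than a background dataset, all coalitions are enumerated (so it is only practical for a handful of features), and the per-feature contribution degree is taken as the mean absolute Shapley value. `model` again stands in for the Nth target error model; for realistic feature counts the explainers in the `shap` package would be the usual choice.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]  # stands in for the Nth target error model


def shapley_values(model: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of sample `x` against one baseline row.

    A coalition S is evaluated by taking features in S from `x` and all
    other features from `baseline`. Cost grows as 2**p.
    """
    p = len(x)

    def value(coalition: set) -> float:
        probe = [x[k] if k in coalition else baseline[k] for k in range(p)]
        return model(probe)

    phi = []
    for j in range(p):
        others = [k for k in range(p) if k != j]
        total = 0.0
        for size in range(p):  # coalition sizes 0 .. p-1
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {j}) - value(s))
        phi.append(total)
    return phi


def contribution_degree(model: Model, data: Sequence[Sequence[float]],
                        baseline: Sequence[float]) -> List[float]:
    """Mean absolute Shapley value per feature over all samples."""
    per_sample = [shapley_values(model, row, baseline) for row in data]
    return [sum(abs(v[j]) for v in per_sample) / len(per_sample)
            for j in range(len(baseline))]
```

For a linear model each Shapley value reduces to weight times (x_j - baseline_j), and the values of one sample always sum to f(x) - f(baseline), which makes a convenient sanity check.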

Further, obtaining the (N+1)th target error model includes:

According to the embodiment of the invention, after the influence level of each feature to be analyzed on the original model is obtained, the features in the Nth target feature data group to be analyzed can be screened by influence level as actually needed. For example, if the influence levels show that the two influencing factors time and temperature have a low influence on the error of the Nth target error model and correlate poorly with it, these two features are deleted from the subsequent analysis of model error influencing factors and are no longer analyzed.

For example, if the Nth target feature data group to be analyzed includes 8 features to be analyzed, 6 remain after the temperature and time features are filtered out; that is, the (N+1)th target feature data group to be analyzed comprises 6 features to be analyzed. If the (N+1)th target feature data group to be analyzed were input into the Nth target error model, the confidence of the output would be low. Therefore, an (N+1)th influencing factor training set and an (N+1)th input error training set need to be obtained again, and the Nth target error model undergoes an Nth training based on them. The Nth training consists of multiple training rounds, and the confidence of the training result is checked with the MSE (mean squared error) metric or the F1 score until the confidence meets the requirement, at which point the (N+1)th target error model is obtained.
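The pruning-and-retraining step can be sketched as follows. Everything here is an illustrative assumption: the influence threshold is hypothetical, the target error model is taken to be a linear model fitted by full-batch gradient descent only so that the example stays self-contained, and the confidence check uses the MSE criterion alone (an F1-based check would suit a classification-style error model instead).

```python
from typing import Dict, List, Sequence, Tuple


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean squared error between observed and predicted errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def drop_low_influence(features: List[str], influence: Dict[str, float],
                       threshold: float) -> List[str]:
    """Keep only features whose influence level reaches `threshold` (hypothetical cut-off)."""
    return [f for f in features if influence[f] >= threshold]


def predict(w: Sequence[float], row: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, row))


def train_round(X, y, w, epochs: int, lr: float) -> List[float]:
    """One training round: full-batch gradient descent on a linear model."""
    w = list(w)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for row, target in zip(X, y):
            err = predict(w, row) - target
            for k, xk in enumerate(row):
                grad[k] += 2.0 * err * xk / n  # d(MSE)/dw_k
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


def retrain_until_confident(X, y, mse_target: float, epochs_per_round: int = 50,
                            lr: float = 0.05, max_rounds: int = 20
                            ) -> Tuple[List[float], int, float]:
    """Repeat training rounds until the MSE check passes or rounds run out."""
    w = [0.0] * len(X[0])
    err = float("inf")
    for round_no in range(1, max_rounds + 1):
        w = train_round(X, y, w, epochs_per_round, lr)
        err = mse(y, [predict(w, row) for row in X])
        if err <= mse_target:
            return w, round_no, err
    return w, max_rounds, err


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
    y = [3.0, 1.0, 4.0, 7.0]  # toy errors generated as 3*x0 + 1*x1
    w, rounds, err = retrain_until_confident(X, y, mse_target=1e-3)
    print(rounds, err, w)
```

Returning the round count alongside the weights makes it easy to see how many training rounds the Nth training needed before the confidence check passed.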

According to the embodiment of the invention, the target error model is retrained multiple times according to the types of features to be analyzed. This improves the confidence of the target error model and therefore the accuracy of the influence level of each feature to be analyzed on the error of the original model.

The embodiment of the invention can be implemented as follows:

For example, take remaining charging time data from a practical application scenario, in which the group of actual values and the group of estimated values form the input error data group:

The accuracy of the estimated remaining charging time is affected by various factors. Using the target error model together with model-agnostic influence factor analysis and error analysis, the embodiment of the invention analyzes the relationship between the error of the estimated remaining time relative to the actual charging time and the factors that may influence the charging time, so as to obtain the main factors affecting the remaining charging time error and the influence level of each factor.
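As a small illustration of how the input error data group could be formed from the actual and estimated value groups, the sketch below assumes a signed error (estimated minus actual, in minutes). The sign convention, the units and the sample values are assumptions; an absolute or relative error would be computed the same way.

```python
from typing import List, Sequence


def input_error_group(actual_minutes: Sequence[float],
                      estimated_minutes: Sequence[float]) -> List[float]:
    """Signed error per charging session: estimated minus actual (assumed convention)."""
    if len(actual_minutes) != len(estimated_minutes):
        raise ValueError("actual and estimated groups must have the same length")
    return [est - act for act, est in zip(actual_minutes, estimated_minutes)]


if __name__ == "__main__":
    actual = [60.0, 45.0, 30.0]     # hypothetical measured remaining times
    estimated = [55.0, 50.0, 30.0]  # hypothetical estimates from the original model
    print(input_error_group(actual, estimated))  # [-5.0, 5.0, 0.0]
```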

Based on business knowledge, the factors that may influence the remaining charging time are obtained; they fall into two types:

the first type is physical factors, such as the electrical, electrochemical and thermal parameters of the battery cell; data on these factors may be obtained from the RTM;

the second type is strategy factors, such as the charging current and charging voltage, which are controlled by a control unit such as the charging pile or the vehicle controller; data on these factors can be acquired from the RTM or calculated from historical data. The method of calculating strategy factor data from historical data is outside the protection scope of the invention and is not described in detail here.

Therefore, for the first target feature data group to be analyzed, the embodiment of the present invention may select at least one feature to be analyzed from the above two types of factors based on business knowledge.

Specifically, the influencing factors of the original remaining-charging-time model are identified and analyzed using three categories of Nth target feature data group to be analyzed:

1) As shown in fig. 2, the process for the first category of Nth target feature data group to be analyzed specifically includes:

acquiring an initial feature data group to be analyzed that contains only one feature to be analyzed. For example, if the feature to be analyzed is temperature, the initial feature data group to be analyzed contains a plurality of temperature values, such as temperature gradient data generated by a gradient change algorithm: 5 degrees, 10 degrees, 15 degrees, 20 degrees, 25 degrees, 30 degrees, 35 degrees, 40 degrees, …;

then, the initial feature data group to be analyzed is screened based on the Pearson correlation coefficient; the temperature data at multiple gradients, such as 10 degrees, 15 degrees, 20 degrees, …, are found to correlate strongly with the initial input error data group, and the first target feature data group to be analyzed is formed from these temperature data;

constructing an initial model, obtaining a first input error training set and a first influence factor training set based on the first target feature data group to be analyzed, and training the initial model multiple times on these training sets to obtain a first target error model;

inputting the first target characteristic data group to be analyzed and the first input error data group into a first target error model, and outputting a first output error data group;

analyzing the first output error data set based on a PDP algorithm, and determining the priority ranking of the influence of a plurality of temperature values on the error of the first target error model;

then screening the first target feature data group to be analyzed based on the priority ranking, selecting the top 20 temperature values, and obtaining a second target feature data group to be analyzed;

acquiring a second influence factor training set and a second input error training set based on the second target feature data group to be analyzed, and training the first target error model multiple times on these training sets to obtain a second target error model;

inputting the second target feature data group to be analyzed and the second input error data group into the second target error model, and outputting a second output error data group;

analyzing the second output error data group based on the PDP algorithm, and determining the priority ranking of the influence of the plurality of temperature values on the error of the second target error model;

then screening the second target feature data group to be analyzed based on the priority ranking, selecting the top 19 temperature values, and obtaining a third target feature data group to be analyzed;

……

and so on: the Nth target error model outputs the Nth output error data group, and the contribution degree of each temperature value in the Nth target feature data group to be analyzed to the Nth target error model is then obtained based on the SHAP algorithm;

in addition, the threshold value of N can be set according to actual needs;

if N is equal to the threshold value of N, the flow is terminated;

otherwise, let N = N + 1 and continue iterating, repeating the above process.
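The Pearson screening step used above to build the first target feature data group can be sketched as follows. It assumes that each candidate (a temperature gradient level here, or a whole feature in the second category) is represented by a column of values aligned sample by sample with the initial input error data group. The `min_abs_r` cut-off and the candidate names are hypothetical; Python 3.10+ also provides `statistics.correlation` for the coefficient itself.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def screen_by_pearson(candidates: Dict[str, Sequence[float]],
                      input_errors: Sequence[float],
                      min_abs_r: float) -> Dict[str, float]:
    """Keep candidates whose |r| with the input error data group reaches min_abs_r."""
    scores = {name: pearson(col, input_errors) for name, col in candidates.items()}
    return {name: r for name, r in scores.items() if abs(r) >= min_abs_r}
```

Screening on |r| rather than r keeps strongly negatively correlated candidates too, since they are just as informative about the error.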

2) As shown in fig. 3, the process for the second category of Nth target feature data group to be analyzed specifically includes:

acquiring an initial feature data group to be analyzed that contains M-1 features plus one feature to be analyzed. For example, if the features to be analyzed are temperature, time, remaining charging capacity, charging pile maximum power, charging pile stability, ambient temperature, temperature rise rate and the like, the initial feature data group to be analyzed contains multiple groups of data, specifically: a group with the other M-1 features and a varied temperature, a group with the other M-1 features and a varied time, a group with the other M-1 features and a varied remaining charging capacity, a group with the other M-1 features and a varied charging pile maximum power, a group with the other M-1 features and a varied charging pile stability, a group with the other M-1 features and a varied ambient temperature, and the like;

then, the data are screened based on the Pearson correlation coefficient or PCA (principal component analysis); features such as the remaining charging capacity, the charging pile maximum power, the charging pile stability, … are found to correlate strongly with the initial input error data group, and the first target feature data group to be analyzed is formed from the data of these features;

analyzing the first output error data group based on the PDP algorithm, and determining the priority ranking of the influence of the plurality of features to be analyzed on the error of the first target error model;

then screening the first target feature data group to be analyzed based on the priority ranking, selecting the top 20 features, and obtaining a second target feature data group to be analyzed; acquiring a second influence factor training set and a second input error training set based on the second target feature data group to be analyzed, and training the first target error model multiple times on these training sets to obtain a second target error model;

analyzing the second output error data group based on the PDP algorithm, and determining the priority ranking of the influence of features such as the remaining charging capacity, the charging pile maximum power and the charging pile stability on the error of the second target error model;

then screening the second target feature data group to be analyzed based on the priority ranking, and selecting the top 19 features to obtain a third target feature data group to be analyzed;

……

and so on: the Nth target error model outputs the Nth output error data group, and the contribution degree of each feature to be analyzed in the Nth target feature data group to be analyzed to the Nth target error model is then obtained based on the SHAP algorithm;

in addition, the threshold value of N can be set according to actual needs;

if N is equal to the threshold value of N, the flow is terminated;
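For the second category, the initial feature data group can be viewed as one group per feature, in which that feature is varied while the other M-1 features keep reference values. The sketch below assumes that baseline (reference) values and sweep grids are supplied by the user; the feature names and numbers are hypothetical.

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def one_at_a_time_groups(baseline: Row,
                         grids: Dict[str, Sequence[float]]) -> Dict[str, List[Row]]:
    """Build one data group per feature: that feature is swept over its grid,
    while the other M-1 features keep their baseline values."""
    groups: Dict[str, List[Row]] = {}
    for name, grid in grids.items():
        if name not in baseline:
            raise KeyError(f"no baseline value for feature {name!r}")
        rows = []
        for value in grid:
            row = dict(baseline)  # copy so the other features stay at baseline
            row[name] = value
            rows.append(row)
        groups[name] = rows
    return groups


if __name__ == "__main__":
    base = {"temperature": 25.0, "time": 30.0, "remaining_capacity": 40.0}
    sweeps = {"temperature": [10.0, 20.0, 30.0], "time": [15.0, 45.0]}
    for feature, rows in one_at_a_time_groups(base, sweeps).items():
        print(feature, rows)
```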

3) As shown in fig. 4, the process for the third category of Nth target feature data group to be analyzed specifically includes:

acquiring an initial feature data group to be analyzed that contains M features to be analyzed. For example, if the M features to be analyzed are temperature, time, remaining charging capacity, charging pile maximum power, charging pile stability, ambient temperature, temperature rise rate and the like, the initial feature data group to be analyzed contains the data of all of these features;

……

and so on: the Nth target error model outputs the Nth output error data group, and the contribution degree of each of the features to be analyzed in the Nth target feature data group to be analyzed to the Nth target error model is then obtained based on the SHAP algorithm;

in addition, the threshold value of N can be set according to actual needs;

if N is equal to the threshold value of N, the flow is terminated;
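The round structure shared by the three categories (score the current group, keep the top k, stop when N reaches its threshold) can be sketched as a small driver loop. `train_and_rank` is a hypothetical callable that would train the Nth target error model and return a per-feature score such as the PDP spread. The keep schedule mirrors the 20, 19, … example, and the final round's group is where the embodiment would compute the SHAP-based contribution degree; here the same callable is reused for brevity.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callable: trains the Nth target error model on `features` and
# returns one score per feature (e.g. PDP spread); higher means more influence.
TrainAndRank = Callable[[List[str]], Dict[str, float]]


def iterative_screening(features: Sequence[str], train_and_rank: TrainAndRank,
                        keep_schedule: Sequence[int], n_threshold: int
                        ) -> Tuple[List[str], Dict[str, float], List[List[str]]]:
    """Run rounds N = 1, 2, ... and stop when N reaches n_threshold.

    Round N scores the current feature group; unless it is the last round,
    only the top keep_schedule[N-1] features are passed to round N+1.
    Returns the final group, its scores and the group used in every round.
    """
    current = list(features)
    history = [current]
    n = 1
    while True:
        scores = train_and_rank(current)
        if n >= n_threshold:  # N equals the threshold: terminate the flow
            return current, scores, history
        k = min(keep_schedule[n - 1], len(current))
        current = sorted(current, key=lambda f: scores[f], reverse=True)[:k]
        history.append(current)
        n += 1  # let N = N + 1 and repeat
```

Keeping the per-round history makes it possible to trace which features were dropped at which N.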

As shown in fig. 3, an embodiment of the present invention further provides an apparatus 500 for identifying model performance influencing factors, including:

the training module 501 is configured to construct an initial model, and train the initial model based on the obtained input error training set and the obtained influence factor training set to obtain a target error model;

an obtaining module 502, configured to obtain an input error data set and a target feature data set to be analyzed; wherein the target feature data set to be analyzed comprises at least one feature to be analyzed;

an output module 503, configured to input the input error data set and the target feature data set to be analyzed to the target error model, so as to obtain an output error data set;

a calculating module 504, configured to process the output error data set, and obtain an influence level of each type of feature data to be analyzed on the target error model.
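One way to map modules 501 to 504 onto code is a single class with one method per module. This is a hypothetical sketch: it assumes the target error model is a callable that takes a feature row and its paired input error value, and it leaves the training routine and the influence scorer (for example a PDP- or SHAP-based function) as injected callables.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, float]
# Assumed signature of the target error model: (feature row, input error) -> output error.
ErrorModel = Callable[[Row, float], float]


class ModelInfluenceApparatus:
    """Hypothetical sketch of apparatus 500 with one method per module."""

    def __init__(self,
                 fit: Callable[[List[Row], List[float]], ErrorModel],
                 influence: Callable[[List[Row], List[float]], Dict[str, float]]):
        self.fit = fit              # training routine (assumed, injected)
        self.influence = influence  # influence scorer, e.g. PDP or SHAP based (assumed)
        self.model: Optional[ErrorModel] = None

    def train(self, factor_train: List[Row], error_train: List[float]) -> None:
        """Training module 501: construct and train the target error model."""
        self.model = self.fit(factor_train, error_train)

    def obtain(self, features: List[Row],
               errors: List[float]) -> Tuple[List[Row], List[float]]:
        """Obtaining module 502: pair each feature row with its input error."""
        if len(features) != len(errors):
            raise ValueError("feature rows and input errors must align one-to-one")
        return features, errors

    def output(self, features: List[Row], errors: List[float]) -> List[float]:
        """Output module 503: feed both data groups to the target error model."""
        if self.model is None:
            raise RuntimeError("train() must be called first")
        return [self.model(row, err) for row, err in zip(features, errors)]

    def calculate(self, features: List[Row],
                  output_errors: List[float]) -> Dict[str, float]:
        """Calculating module 504: derive an influence level per feature."""
        return self.influence(features, output_errors)
```

Injecting the training routine and scorer keeps the module boundaries of the apparatus while leaving the choice of model and evaluation algorithm open, as the description does.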

Optionally, the target feature data group to be analyzed includes any one of:

M kinds of the features to be analyzed;

M-1 features and one of said features to be analyzed;

one of said features to be analyzed;

wherein M is not less than 2 and is an integer.

Optionally, before the initial model is constructed, the following is included:

acquiring an (N+1)th influencing factor training set and an (N+1)th input error training set based on the (N+1)th target feature data group to be analyzed and the (N+1)th input error data group;

In addition, other configurations and functions of the apparatus according to the embodiment of the present invention are known to those skilled in the art and are not described here to avoid redundancy.

It should be noted that the logic and/or steps represented in the flowcharts or otherwise described herein, such as an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.

It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

In the description of the present invention, it is to be understood that the terms "central," "longitudinal," "lateral," "length," "width," "thickness," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," "clockwise," "counterclockwise," "axial," "radial," "circumferential," and the like are used in the orientations and positional relationships indicated in the drawings for convenience in describing the invention and to simplify the description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are therefore not to be considered limiting of the invention.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly and can, for example, be fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; they may be directly connected or indirectly connected through intervening media, or they may be interconnected within two elements or in a relationship where two elements interact with each other unless otherwise specifically limited. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to specific situations.

In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or in indirect contact through an intermediate medium. Also, a first feature being "on," "over," or "above" a second feature may mean that the first feature is directly above or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly below or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.

Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A method for identifying model performance influencing factors is characterized by comprising the following steps:

2. The method according to claim 1, wherein the target feature data set to be analyzed comprises any one of the following:

M kinds of the features to be analyzed;

M-1 features and one of said features to be analyzed;

one of said features to be analyzed;

wherein M is not less than 2 and is an integer.

3. The method of claim 1, further comprising, before constructing the initial model:

4. The method according to claim 3, wherein the calculating a correlation degree between the initial feature data group to be analyzed and the initial input error data group based on a correlation degree algorithm, and screening the initial feature data group to be analyzed based on the correlation degree to obtain a first target feature data group to be analyzed comprises:

5. The method of claim 3, wherein the constructing an initial model and training the initial model based on the obtained input error training set and the influencing factor training set to obtain a target error model comprises:

performing iterative training on the first target error model to obtain an N-1 target error model; wherein N is more than or equal to 2 and is an integer;

6. The method according to claim 5, wherein the processing the output error data set to obtain the influence level of each of the feature data to be analyzed on the target error model comprises:

analyzing the Nth output error data group based on an evaluation algorithm to obtain an influence value of each feature to be analyzed on the Nth target error model;

7. The method according to claim 6, wherein after determining the influence level of each of the features to be analyzed on the Nth target error model, the method further comprises:

8. An apparatus for identifying model performance influencing factors, comprising:

9. An electronic device comprising a processor, a memory, and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method of any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, comprising a stored computer program, wherein the computer program, when executed, controls an apparatus in which the computer-readable storage medium is located to perform the method of any of claims 1-7.