CN112800678B - Multi-task selection model construction method, multi-task selective maintenance method and system - Google Patents

Multi-task selection model construction method, multi-task selective maintenance method and system

Info

Publication number
CN112800678B
CN112800678B
Authority
CN
China
Prior art keywords
maintenance
task
function
component
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110123843.0A
Other languages
Chinese (zh)
Other versions
CN112800678A (en)
Inventor
皮德常
徐悦
陈阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110123843.0A
Publication of CN112800678A
Application granted
Publication of CN112800678B
Legal status: Active
Anticipated expiration

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00 Computing arrangements based on specific mathematical models
    • G06N 7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q 10/063 Operations research, analysis or management
    • G06Q 10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00 Administration; Management
    • G06Q 10/20 Administration of product repair or maintenance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2111/00 Details relating to CAD techniques
    • G06F 2111/08 Probabilistic or stochastic CAD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2119/00 Details relating to the type or aim of the analysis or the optimisation
    • G06F 2119/02 Reliability analysis or reliability optimisation; Failure analysis, e.g. worst case scenario performance, failure mode and effects analysis [FMEA]

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Tourism & Hospitality (AREA)
  • Operations Research (AREA)
  • Marketing (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Probability & Statistics with Applications (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Computer Hardware Design (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)

Abstract

The invention relates to a multi-task selection model construction method, a multi-task selective maintenance method and a system. A model for multi-task selective maintenance is built by combining an effective-life decay model with a risk-function adjustment model, which overcomes the limitation of prior-art models that support selective maintenance for a single task only, and the model is used to compute the reliability of the system and the maintenance cost with improved accuracy. The multi-task selective maintenance optimization problem is formulated as a discrete Markov decision process and solved within a reinforcement-learning framework with an adjusted neural-network structure, yielding the optimal maintenance strategy for each component during each interruption and the optimal running time of each maintenance task, improving maintenance efficiency, reducing maintenance cost, and effectively overcoming the problems of variable dimensionality and the curse of dimensionality.

Description

Multi-task selection model construction method, multi-task selective maintenance method and system
Technical Field
The invention relates to the interdisciplinary field of engineering applications and information science, and in particular to a multi-task selection model construction method, a multi-task selective maintenance method and a system.
Background
Maintenance activities play an important role in the economics of modern industry, accounting for about 28%, and in some cases up to 40%, of total production costs. In general, components can be maintained during interruptions (breaks between missions) to improve the reliability of subsequent task execution. However, because maintenance resources (e.g., budget, time, labor and repair facilities) are limited, it is not possible to repair all components. To overcome this problem, a decision maker selects several components during an interruption and repairs the selected components at a corresponding repair level; this process is called selective maintenance.
Research on the selective maintenance problem focuses on two aspects, model building and solution methods, and the problem belongs to the class of nonlinear integer programming problems. Existing selective maintenance models are limited to the state changes within a single task: they mainly address selective maintenance for a single task and discuss reliability evaluation and cost calculation of a component or system under multi-state, stochastic, independent, fuzzy, uncertain and similar conditions. In addition, when reliability evaluation is modeled in the prior art, only the effective-life degradation associated with component maintenance is considered, so the reliability evaluation results are inaccurate.
Based on this, there is a need in the art for a method of constructing a maintenance model that selectively maintains multiple tasks to improve the reliability assessment of the system.
Disclosure of Invention
The invention aims to provide a multi-task selection model construction method, a multi-task selective maintenance method and a system that model multiple tasks and, on the basis of the model, accurately evaluate and calculate the reliability of the system and the maintenance cost.
In order to achieve the above object, the present invention provides the following solutions:
a method for constructing a multitasking selection model comprises the following steps:
Calculating a risk rate function of each maintenance task under different maintenance strategies by adopting an effective life attenuation model and a risk function adjustment model, wherein the maintenance strategies comprise a component non-maintenance strategy, a component replacement strategy and an imperfect maintenance strategy;
Calculating a reliability function and a cost function according to the risk rate function;
establishing a multi-task selection model according to the reliability function and the cost function;
evaluating the reliability of a system where the component is located and the maintenance cost of the component according to the multi-task selection model;
and selecting a maintenance strategy for the assembly at each maintenance task according to the reliability and the maintenance cost.
A method of multitasking selective maintenance, comprising:
Constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets a reliability preset value;
Converting the target maintenance model into a Markov decision model, wherein the Markov decision model is a four-tuple function comprising a state space, an action space, a reward function and a state transfer function, the action space represents a set of all actions, and the actions comprise maintenance strategies of the component in each interruption time and running time of each maintenance task;
Training, optimizing and solving the Markov decision model by adopting a reinforcement learning framework and a neural network;
When the training times meet the predicted training times, outputting an optimal maintenance strategy for all components in each interruption time and the optimal running time of each maintenance task;
and maintaining the assembly according to the optimal maintenance strategy and the optimal running time.
A multitasking selective maintenance system comprising:
the model construction module is used for constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets the reliability preset value;
a model conversion module for converting the target maintenance model into a Markov decision model, wherein the Markov decision model is a four-tuple function comprising a state space, an action space, a reward function and a state transfer function, and the action space comprises a maintenance strategy for the component during each interruption and a running time of each maintenance task;
The solving module is used for carrying out training optimization solving on the Markov decision model by adopting a reinforcement learning framework and a neural network;
the output module is used for outputting the optimal maintenance strategies of all components and the optimal running time of the maintenance tasks when the training times meet the predicted training times;
and the maintenance module is used for maintaining the assembly according to the optimal maintenance strategy and the optimal running time.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
By constructing a multi-task selection model, the invention overcomes the limitation of prior-art models that only support selective maintenance for a single task, and therefore has broad applicability;
when the multi-task selection model is used to evaluate the reliability of the component system and the maintenance cost, combining the effective-life decay model with the risk-function adjustment model improves the accuracy of the reliability and cost calculations.
The invention adjusts the neural-network structure on the basis of a reinforcement-learning framework to solve the multi-task selection model, effectively overcoming the problems of variable dimensionality and the curse of dimensionality; on the premise of satisfying the given system resources (preset time, required reliability, limited manpower and repair facilities), it obtains the optimal maintenance strategy for each component during each interruption and the optimal running time of each maintenance task, improving maintenance efficiency and reducing maintenance cost.
After extension, the invention can be applied to selective maintenance systems with multiple tasks and of various scales and types.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings that are needed in the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a method for constructing a multi-task selection model according to embodiment 1 of the present invention;
FIG. 2 is a maintenance task sequence chart of applying a risk ratio function corresponding to an imperfect maintenance strategy to maintenance tasks provided in embodiment 1 of the present invention;
fig. 3 is a flowchart of a method for selectively maintaining multiple tasks provided in embodiment 2;
fig. 4 is a block diagram of a multitasking selective maintenance system according to embodiment 3.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention aims to provide a multi-task selection model construction method, a multi-task selective maintenance method and a system that model multiple tasks and, on the basis of the model, accurately evaluate and calculate the reliability of the system and the maintenance cost.
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
Example 1:
referring to fig. 1, the present invention provides a method for constructing a multitasking selection model, including:
step S1: calculating a risk rate function of each maintenance task under different maintenance strategies by adopting an effective life attenuation model and a risk function adjustment model, wherein the maintenance strategies comprise a component non-maintenance strategy, a component replacement strategy and an imperfect maintenance strategy;
If the decision maker takes no maintenance action on a component, i.e., adopts the non-maintenance strategy, the component state is not improved and the component remains "restored as old" (as bad as old), so the risk rate function stays unchanged; the corresponding formula is:
$h_{i,k}(t_k + x) = h_{i,k-1}(B_{i,k-1} + x)$
where $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $k$ is a positive integer greater than or equal to 1, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, $h_{i,k-1}(\cdot)$ is the risk rate function of component $i$ at the $(k-1)$th maintenance task, $t_k$ is the logical running time of the system containing component $i$, and $B_{i,k-1}$ is the effective life of component $i$ before maintenance;
If the decision maker repairs or replaces the component, i.e., adopts the component replacement strategy, the component starts from the new state, called "restored as new" (as good as new), and the risk rate function is reset to its initial form; the corresponding formula is:
$h_{i,k}(t_k + x) = h_{i,0}(x)$
where $h_{i,0}(\cdot)$ is the risk rate function of component $i$ before any maintenance;
Under the imperfect maintenance strategy (IM), the component state lies between "restored as old" and "restored as new". Fig. 2 shows the maintenance-task sequence for the risk rate function under the imperfect maintenance strategy, where $tm_k$ and $tb_k$ denote the running time of the $k$th maintenance task and the maintenance (interruption) time between tasks, and $t_k$ and $B_k$ are, respectively, the logical running time at the start and the effective life at the end of the $k$th maintenance task.
The risk rate function corresponding to the imperfect maintenance strategy is:
$h_{i,k}(t_k + x) = A_{i,k-1} \cdot h_{i,k-1}(b_{i,k-1} \cdot B_{i,k-1} + x)$
where $A_{i,k-1}$ is the accumulation of the risk rate adjustment parameters $a_{k'}$ of component $i$ over the previous $k-1$ maintenance tasks, $b_{i,k-1}$ is the effective-life decay parameter of component $i$, and $k'$ indexes the $k'$th maintenance interval.
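The three update rules above can be summarized in a short sketch. The following Python fragment is a minimal illustration, not part of the patent: the Weibull baseline hazard and the parameter values a = 1.05 and b = 0.8 are assumptions made only for the example.

```python
# Minimal sketch of the three risk-rate rules above. The Weibull baseline
# h_0(x) = (beta/eta) * (x/eta)**(beta-1) and the default a, b values are
# illustrative assumptions; the patent does not prescribe a baseline form.

def baseline_hazard(x, beta=2.0, eta=1000.0):
    return (beta / eta) * (x / eta) ** (beta - 1)

class Component:
    def __init__(self):
        self.A = 1.0   # accumulated risk-rate adjustment  A_{i,k}
        self.B = 0.0   # effective life (effective age)    B_{i,k}

    def hazard(self, x):
        # h_{i,k}(t_k + x) with the accumulated adjustment A and the decayed
        # effective age B already folded in by `update`
        return self.A * baseline_hazard(self.B + x)

    def update(self, policy, D_k, a=1.05, b=0.8):
        """Apply one maintenance decision, then run the next task for D_k hours."""
        if policy == 1:          # no maintenance: "as bad as old"
            pass                 # A and B unchanged
        elif policy == 2:        # replacement: "as good as new"
            self.A, self.B = 1.0, 0.0
        elif policy == 3:        # imperfect maintenance
            self.A *= a          # risk-rate adjustment accumulates
            self.B *= b          # effective life is shortened (decayed)
        self.B += D_k            # age accumulated during the next task
```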
Step S2: calculating a reliability function and a cost function according to the risk rate function;
the reliability function is expressed as:
where $R_i$ is the reliability of the system containing component $i$, $D_k$ is the running time of the $k$th maintenance task for component $i$, $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, and $t_k$ is the logical running time of the system containing component $i$;
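The reliability expression itself is rendered as an image in the original publication and does not survive in this text. A standard survival-function form that is consistent with the symbols defined above, offered only as a reconstruction and not as a verbatim copy of the patent's formula, is:

```latex
R_i \;=\; \exp\!\left(-\int_{0}^{D_k} h_{i,k}(t_k + x)\,\mathrm{d}x\right)
```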
The cost function is expressed as:
where $C_i$ is the maintenance cost of component $i$, which comprises a fixed repair cost of component $i$ plus $c_i(l)$, the cost of the maintenance policy adopted for component $i$: the component non-maintenance strategy is selected when $l=1$, the replacement component strategy when $l=2$, and the imperfect maintenance strategy when $l=3$.
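As with the reliability function, the cost expression appears as an image in the original. A plausible decomposition consistent with the description, in which $c_i^{0}$ is a placeholder for the fixed-cost symbol used in the patent (an assumption, not the original notation), is:

```latex
C_i \;=\; c_i^{0} + c_i(l), \qquad l \in \{1, 2, 3\}
```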
Step S3: establishing a multi-task selection model according to the reliability function and the cost function;
step S4: evaluating the reliability of a system where the component is located and the maintenance cost of the component according to the multi-task selection model;
step S5: and selecting a maintenance strategy for the assembly at each maintenance task according to the reliability and the maintenance cost.
The method solves the problem that the model in the prior art can only be selectively maintained for a single task by constructing a multi-task selection model, and has strong applicability; in the embodiment, the risk rate adjustment and the effective life attenuation are considered simultaneously, the risk rate functions corresponding to different maintenance strategies are calculated, and when the risk rate functions are further applied to the reliability evaluation of the system, the accuracy can be greatly improved.
Example 2:
different maintenance strategies can lead to changes in system status. Which repair strategy is to be taken on the component under different system conditions so that minimizing costs under specific system resources (pre-set time, required reliability, defined manpower and repair facilities) is a problem currently being investigated universally in the art.
Because of the large and complex scale of the solution space of this problem, the current main solution methods include evolutionary algorithms and reinforcement learning methods. The evolutionary algorithm, such as genetic algorithm, differential evolutionary algorithm, particle swarm optimization algorithm and the like, can well solve the optimization problem of NP-difficult (NP-Hard problem, namely, non-deterministic problem of polynomial complexity); however, when the problem dimension is variable, the conventional evolutionary algorithm cannot be directly applied to problem solving. The conventional reinforcement learning method has a dimension disaster problem, especially when faced with a large-scale complex system.
In order to overcome the defects in the prior art, referring to fig. 3, the present invention further provides a method for selectively maintaining multiple tasks, including:
Step Sa: constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets a reliability preset value;
the target maintenance model is expressed as:
where s.t. denotes the constraint, $C$ is the cost function in the target maintenance model, $i$ indexes the components and $n$ is the number of components, $V$ is the number of subsystems in the system containing the components, $v'$ indexes a subsystem and $n_{v'}$ is the number of components in subsystem $v'$, $R_0$ is the lower limit of the reliability preset value, $R$ is the reliability function in the target maintenance model, and $u'$ indexes the $u'$th component in the $v'$th subsystem.
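The optimization model itself is an image in the original publication. A constrained form consistent with the wherein clause above, reconstructed under the assumption that the objective sums the component maintenance costs over all subsystems, is:

```latex
\min \; C = \sum_{v'=1}^{V} \sum_{u'=1}^{n_{v'}} C_{u'v'}
\qquad \text{s.t.} \quad R \geq R_0
```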
Step Sb: the target maintenance model is converted into a Markov decision model, wherein the Markov decision model is a four-element function comprising a state space, an action space, a rewarding function and a state transfer function, the action space represents a set of all actions, and the actions comprise maintenance strategies of the component in each interruption time and running time of each maintenance task.
Since the transition of the system component states (e.g., effective life, risk rate functions, reliability, etc.) is independent of historical values, the target maintenance model can be converted into a Markov decision model. The four-tuple can be expressed as (S, A, P, R), where:
S is the state space, the set of all system states. For a component $i$ at the $k$th maintenance task, its state $state_i(k)$ may include the accumulated risk rate adjustment parameter $A_{i,k}$, the effective life of the component at the end of the maintenance task $B_{i,k}$, the system reliability $R_{i,k}$, and the logical running time $t_k$ of the overall system;
The component state $state_i(k)$ is expressed as:
$state_i(k) = [A_{i,k}, B_{i,k}, R_{i,k}, t_k]$;
The system state is expressed as:
$state(k) = [state_1(k), state_2(k), \ldots, state_n(k)]$;
A is the action space, the set of all actions. The action $action_i(k)$ of component $i$ at the $k$th maintenance task includes all possible maintenance policies $l_{i,k}$ at each interruption together with the run length $D_k$ of the system for the $k$th maintenance task; the whole action space comprises the maintenance policies of all components and the overall running time of the system, i.e., $action(k) = [l_{1,k}, \ldots, l_{n,k}, D_k]$, where $n$ is the number of components (a code sketch of this layout follows the four-tuple description below);
r is a reward function representing a set of all rewares, i.e. the value of the benefit obtained after an action is performed on the current state. In this embodiment, whether the cost function meets the requirement of the system reliability is taken as a reward function, and the calculation formula is as follows:
reward(k)=-(C+(R<R0)×ζ)
Zeta is a punishment coefficient when the reliability requirement is not met, and if the system reliability R is lower than the reliability value R 0 of the requirement, the punishment coefficient is added to the rewarding function;
P is the state transition function, which gives the next state after an action is executed in the current state; it reflects how the different maintenance strategies drive the computation of the component risk rate function, specifically as follows: under the non-maintenance strategy, the risk rate adjustment parameter stays unchanged, i.e., $A_{i,k+1} = A_{i,k}$, and the effective life is increased by the running time of the current maintenance task, i.e., $B_{i,k+1} = B_{i,k} + D_k$; under the component replacement strategy, the risk rate adjustment parameter is reset to its initial value, i.e., $A_{i,k+1} = 1$, and the effective life equals the running time of the current maintenance task, i.e., $B_{i,k+1} = D_k$; under the imperfect maintenance strategy, the risk rate adjustment parameter becomes $A_{i,k+1} = a_{i,k} \times A_{i,k}$ and the effective life becomes $B_{i,k+1} = b_{i,k} \times B_{i,k} + D_k$; the logical running time $t_k$ is increased by the current running time plus the interruption time.
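A compact Python sketch of the state/action layout and of one transition of the Markov decision model is given below. The class and function names and the default values (a, b, zeta, R0, break_time) are illustrative assumptions; `cost_fn` and `reliability_fn` stand in for the cost and reliability functions of Example 1 and are not defined by the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class ComponentState:
    A: float   # accumulated risk-rate adjustment      A_{i,k}
    B: float   # effective life at end of the task     B_{i,k}
    R: float   # reliability                           R_{i,k}
    t: float   # logical running time of the system    t_k

@dataclass
class SystemState:
    components: List[ComponentState]

    def as_vector(self) -> List[float]:
        # state(k) = [state_1(k), ..., state_n(k)] flattened for the networks
        return [v for c in self.components for v in (c.A, c.B, c.R, c.t)]

@dataclass
class Action:
    policies: List[int]   # l_{1,k}, ..., l_{n,k}: 1 = none, 2 = replace, 3 = imperfect
    duration: float       # D_k, run length of the k-th maintenance task

def step(state: SystemState, action: Action,
         a: float = 1.05, b: float = 0.8, zeta: float = 1e4, R0: float = 0.9,
         break_time: float = 10.0,
         cost_fn: Optional[Callable] = None,
         reliability_fn: Optional[Callable] = None):
    """One transition of the Markov decision model sketched above."""
    D_k = action.duration
    for comp, l in zip(state.components, action.policies):
        if l == 1:                        # no maintenance: A, B unchanged, then age
            comp.B += D_k
        elif l == 2:                      # replacement: reset, then age by D_k
            comp.A, comp.B = 1.0, D_k
        elif l == 3:                      # imperfect maintenance
            comp.A *= a                   # risk-rate adjustment accumulates
            comp.B = b * comp.B + D_k     # effective life decays, then ages
        comp.t += D_k + break_time        # logical runtime: run time + interruption

    C = cost_fn(state, action) if cost_fn else 0.0        # placeholder cost
    R = reliability_fn(state) if reliability_fn else 1.0  # placeholder reliability
    reward = -(C + (R < R0) * zeta)       # reward(k) = -(C + (R < R0) * zeta)
    return state, reward
```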
Step Sc: training, optimizing and solving the Markov decision model by adopting a reinforcement learning framework and a neural network, wherein the training, optimizing and solving method specifically comprises the following steps:
Initializing a first maintenance-strategy neural network $Q_{act}$ and its parameters $\theta_{act}$, a target maintenance-strategy neural network $\hat{Q}_{act}$ and its parameters $\hat{\theta}_{act}$, a first maintenance-task operation-duration neural network $Q_{dura}$ and its parameters $\theta_{dura}$, and a target maintenance-task operation-duration neural network $\hat{Q}_{dura}$ and its parameters $\hat{\theta}_{dura}$;
Obtaining, according to the four-tuple function, the inputs of the first maintenance-strategy neural network and the first maintenance-task operation-duration neural network, which specifically comprises: initializing the system state $state(k)$ at the $k$th task according to the state space S; and obtaining, from the current system state $state(k)$ and the action space A, the inputs of the first maintenance-strategy neural network and the first maintenance-task operation-duration neural network, $input_{act}(k)$ and $input_{dura}(k)$, respectively;
storing the four-tuple function into a cache area;
selecting a certain number of samples $\{state(j), action(j), reward(j+1), state(j+1)\}$ from the buffer;
performing gradient-descent updates on the maintenance-strategy neural network $Q_{act}$ and the maintenance-task operation-duration neural network $Q_{dura}$, respectively, according to the maintenance-strategy target value $target_{act}(j)$ and the maintenance-task operation-duration target value $target_{dura}(j)$ of the samples;
Further, the specific calculation of $target_{act}(j)$ and $target_{dura}(j)$ includes:
where $target_{act}(j)$ is the maintenance-strategy target value corresponding to the $j$th maintenance task, $target_{dura}(j)$ is the operation-duration target value corresponding to the $j$th maintenance task, $reward(j+1)$ is the reward corresponding to the $(j+1)$th maintenance task, $\hat{Q}_{act}(\cdot)$ is the state-action function of the target maintenance-strategy neural network, $\hat{Q}_{dura}(\cdot)$ is the state-action function of the target maintenance-task operation-duration neural network, $state(j+1)$ is the state corresponding to the $(j+1)$th maintenance task, $action_{act}(j)$ is the action of the target maintenance-strategy neural network at the $(j+1)$th maintenance task, $action_{dura}(j)$ is the action of the target maintenance-task operation-duration neural network at the $(j+1)$th maintenance task, $\hat{\theta}_{act}$ are the target maintenance-strategy network parameters, $\hat{\theta}_{dura}$ are the target maintenance-task operation-duration network parameters, $\gamma$ is the discount coefficient between the current reward and the long-term maximum return, $\arg\max_{action_{act}} \hat{Q}_{act}(\cdot)$ denotes the maintenance action that maximizes $\hat{Q}_{act}$, and $\arg\max_{action_{dura}} \hat{Q}_{dura}(\cdot)$ denotes the operation-duration action that maximizes $\hat{Q}_{dura}$;
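The two target-value expressions are images in the original publication. A standard temporal-difference target that matches the symbols explained above, given here as a reconstruction rather than the patent's exact formula, is:

```latex
\begin{aligned}
target_{act}(j)  &= reward(j+1) + \gamma \max_{action_{act}} \hat{Q}_{act}\bigl(state(j+1), action_{act}; \hat{\theta}_{act}\bigr)\\
target_{dura}(j) &= reward(j+1) + \gamma \max_{action_{dura}} \hat{Q}_{dura}\bigl(state(j+1), action_{dura}; \hat{\theta}_{dura}\bigr)
\end{aligned}
```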
updating the target maintenance-strategy neural network and the target maintenance-task operation-duration neural network every preset number of iterations, i.e., copying the parameters of the first networks into the corresponding target networks ($\hat{\theta}_{act} \leftarrow \theta_{act}$, $\hat{\theta}_{dura} \leftarrow \theta_{dura}$);
when the logical running time of the system containing the components is greater than the preset system running time T (i.e., state(j+1) is a terminal state), returning to the step of storing the four-tuple function into the buffer; otherwise, when the training times reach the predicted training times, outputting the optimal maintenance strategy for all components during each interruption and the optimal running time of each maintenance task.
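The training-optimization steps above can be assembled into a short training loop. The PyTorch sketch below is a minimal illustration only: the `env` object (with `reset()` and `step()` returning the next state, reward and a done flag), the network sizes, the discretization of the run time into `n_durations` bins, and all hyper-parameters are assumptions for the example and are not specified by the patent.

```python
import random
from collections import deque
import torch
import torch.nn as nn

def make_net(in_dim, out_dim):
    # small fully connected Q-network; size is an illustrative choice
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

def train(env, state_dim, n_policy_actions, n_durations,
          episodes=500, gamma=0.95, eps=0.1, batch=32, sync_every=50):
    q_act, q_dura = make_net(state_dim, n_policy_actions), make_net(state_dim, n_durations)
    tgt_act, tgt_dura = make_net(state_dim, n_policy_actions), make_net(state_dim, n_durations)
    tgt_act.load_state_dict(q_act.state_dict())
    tgt_dura.load_state_dict(q_dura.state_dict())
    opt = torch.optim.Adam(list(q_act.parameters()) + list(q_dura.parameters()), lr=1e-3)
    buffer, step_count = deque(maxlen=10_000), 0

    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            s = torch.as_tensor(state, dtype=torch.float32)
            if random.random() < eps:                       # epsilon-greedy exploration
                a_act = random.randrange(n_policy_actions)
                a_dura = random.randrange(n_durations)
            else:
                a_act = int(q_act(s).argmax())
                a_dura = int(q_dura(s).argmax())
            next_state, reward, done = env.step(a_act, a_dura)
            buffer.append((state, a_act, a_dura, reward, next_state, done))
            state = next_state

            if len(buffer) >= batch:
                sample = random.sample(buffer, batch)
                st, aa, ad, rw, ns, dn = map(list, zip(*sample))
                st = torch.as_tensor(st, dtype=torch.float32)
                ns = torch.as_tensor(ns, dtype=torch.float32)
                rw = torch.as_tensor(rw, dtype=torch.float32)
                dn = torch.as_tensor(dn, dtype=torch.float32)
                # TD targets computed with the *target* networks
                with torch.no_grad():
                    t_act = rw + gamma * (1 - dn) * tgt_act(ns).max(dim=1).values
                    t_dura = rw + gamma * (1 - dn) * tgt_dura(ns).max(dim=1).values
                pred_act = q_act(st).gather(1, torch.tensor(aa).unsqueeze(1)).squeeze(1)
                pred_dura = q_dura(st).gather(1, torch.tensor(ad).unsqueeze(1)).squeeze(1)
                loss = (nn.functional.mse_loss(pred_act, t_act)
                        + nn.functional.mse_loss(pred_dura, t_dura))
                opt.zero_grad(); loss.backward(); opt.step()

            step_count += 1
            if step_count % sync_every == 0:                # periodic target-network sync
                tgt_act.load_state_dict(q_act.state_dict())
                tgt_dura.load_state_dict(q_dura.state_dict())
    return q_act, q_dura
```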
Step Sd: when the training times meet the predicted training times, outputting an optimal maintenance strategy for all components in each interruption time and the optimal running time of each maintenance task;
Step Se: and maintaining the assembly according to the optimal maintenance strategy and the optimal running time.
As an optional implementation manner, storing the four-tuple function in a buffer area specifically includes:
selecting the maintenance action $action_{act}(k)$ and the operation duration $action_{dura}(k)$ corresponding to the $k$th maintenance task according to the epsilon strategy: if a random number is smaller than the value $\epsilon$, the maintenance action and the operation duration are selected randomly; otherwise, they are selected so as to maximize the state-action function, using the calculation formula below (see the reconstructed expression after this procedure);
where argmax returns the argument that maximizes the state-action function, $Q_{act}(\cdot)$ is the state-action function of the first maintenance-strategy neural network, $Q_{dura}(\cdot)$ is the state-action function of the first maintenance-task operation-duration neural network, $input_{act}(k)$ is the input of the first maintenance-strategy neural network, $input_{dura}(k)$ is the input of the first maintenance-task operation-duration neural network, $action_{act}$ is an action in the action space of the first maintenance-strategy neural network, $action_{dura}$ is an action in the action space of the first maintenance-task operation-duration neural network, $\theta_{act}$ are the first maintenance-strategy network parameters, and $\theta_{dura}$ are the first maintenance-task operation-duration network parameters;
The state-action value function $Q(state(k), action(k))$, denoted the Q value, measures the long-term return obtained by selecting actions according to the policy $\pi$ given the current state. That is, applying different maintenance actions and running times in different system states maximizes the long-term benefit (the Q value), i.e., minimizes the long-term consumed cost C, under the given system-resource constraints;
forming the action $action(k) = [action_{act}(k), action_{dura}(k)]$ corresponding to the $k$th maintenance task;
obtaining the state $state(k+1)$ of the next task from the state of the current task, and the reward $reward(k+1)$ of the next task from the reward of the current task;
and storing $\{state(k), action(k), reward(k+1), state(k+1)\}$ into the buffer D, where $state(k)$ is the system state corresponding to the $k$th maintenance task.
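The greedy-selection expression referenced in this procedure appears as an image in the original publication; a form consistent with the wherein clause above, given as a reconstruction rather than the patent's exact rendering, is:

```latex
\begin{aligned}
action_{act}(k)  &= \arg\max_{action_{act}} Q_{act}\bigl(input_{act}(k), action_{act}; \theta_{act}\bigr)\\
action_{dura}(k) &= \arg\max_{action_{dura}} Q_{dura}\bigl(input_{dura}(k), action_{dura}; \theta_{dura}\bigr)
\end{aligned}
```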
The neural-network structure is adjusted on the basis of the reinforcement-learning framework to solve the multi-task selection model, which effectively overcomes the problems of variable dimensionality and the curse of dimensionality and makes the approach suitable for multi-task, multi-type selective maintenance models.
Example 3:
referring to fig. 4, the present invention also provides a system for selectively maintaining multiple tasks, comprising:
the model construction module M1 is used for constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets the reliability preset value;
A model conversion module M2, configured to convert the target maintenance model into a markov decision model, where the markov decision model is a four-tuple function including a state space, an action space, a reward function, and a state transfer function, and the action space includes a maintenance policy for the component during each interruption and a runtime for each maintenance task;
the solving module M3 is used for carrying out training optimization solving on the Markov decision model by adopting a reinforcement learning framework and a neural network;
The output module M4 is used for outputting the optimal maintenance strategy of all components and the optimal running time of the maintenance task when the training times meet the predicted training times;
and a maintenance module M5, configured to maintain the assembly according to the optimal maintenance strategy and the optimal running time.
In the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, and identical and similar parts between the embodiments are all enough to refer to each other. For the system disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to assist in understanding the methods of the present invention and the core ideas thereof; also, it is within the scope of the present invention to be modified by those of ordinary skill in the art in light of the present teachings. In view of the foregoing, this description should not be construed as limiting the invention.

Claims (4)

1. A method of multitasking selective maintenance comprising:
Constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets a reliability preset value;
Converting the target maintenance model into a Markov decision model, wherein the Markov decision model is a four-tuple function comprising a state space, an action space, a reward function and a state transfer function, the action space represents a set of all actions, and the actions comprise maintenance strategies of the component in each interruption time and running time of each maintenance task;
Training, optimizing and solving the Markov decision model by adopting a reinforcement learning framework and a neural network;
When the training times meet the predicted training times, outputting an optimal maintenance strategy for all components in each interruption time and the optimal running time of each maintenance task;
Repairing the assembly according to the optimal repair strategy and the optimal running time;
The method for constructing the multi-task selection model comprises the following steps:
Calculating a risk rate function of each maintenance task under different maintenance strategies by adopting an effective life attenuation model and a risk function adjustment model, wherein the maintenance strategies comprise a component non-maintenance strategy, a component replacement strategy and an imperfect maintenance strategy;
Calculating a reliability function and a cost function according to the risk rate function;
establishing a multi-task selection model according to the reliability function and the cost function;
evaluating the reliability of a system where the component is located and the maintenance cost of the component according to the multi-task selection model;
Selecting a maintenance strategy for the assembly during each maintenance task according to the reliability and the maintenance cost;
the risk rate function corresponding to the component non-maintenance strategy is as follows:
$h_{i,k}(t_k + x) = h_{i,k-1}(B_{i,k-1} + x)$
where $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $k$ is a positive integer greater than or equal to 1, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, $h_{i,k-1}(\cdot)$ is the risk rate function of component $i$ at the $(k-1)$th maintenance task, $t_k$ is the logical running time of the system containing component $i$, and $B_{i,k-1}$ is the effective life of component $i$ before maintenance;
the risk rate function corresponding to the replacement component strategy is as follows:
$h_{i,k}(t_k + x) = h_{i,0}(x)$
where $h_{i,0}(\cdot)$ is the risk rate function of component $i$ before any maintenance;
the risk rate function corresponding to the imperfect maintenance strategy is as follows:
$h_{i,k}(t_k + x) = A_{i,k-1} \cdot h_{i,k-1}(b_{i,k-1} \cdot B_{i,k-1} + x)$
where $A_{i,k-1}$ is the accumulation of the risk rate adjustment parameters $a_{k'}$ of component $i$ over the previous $k-1$ maintenance tasks, $b_{i,k-1}$ is the effective-life decay parameter of component $i$, and $k'$ indexes the $k'$th maintenance interval;
the reliability function is expressed as:
where $R_i$ is the reliability of the system containing component $i$, $D_k$ is the running time of the $k$th maintenance task for component $i$, $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, and $t_k$ is the logical running time of the system containing component $i$;
the cost function is expressed as:
where $C_i$ is the maintenance cost of component $i$, which comprises a fixed repair cost of component $i$ plus $c_i(l)$, the cost of the maintenance policy adopted for component $i$: the component non-maintenance strategy is selected when $l=1$, the replacement component strategy when $l=2$, and the imperfect maintenance strategy when $l=3$;
the training optimization solution for the Markov decision model by adopting a reinforcement learning framework and a neural network specifically comprises the following steps:
Initializing a first maintenance-strategy neural network $Q_{act}$ and its parameters $\theta_{act}$, a target maintenance-strategy neural network $\hat{Q}_{act}$ and its parameters $\hat{\theta}_{act}$, a first maintenance-task operation-duration neural network $Q_{dura}$ and its parameters $\theta_{dura}$, and a target maintenance-task operation-duration neural network $\hat{Q}_{dura}$ and its parameters $\hat{\theta}_{dura}$;
Obtaining the input of the first maintenance strategy neural network and the first maintenance task operation time length neural network according to the four-tuple function;
storing the four-tuple function into a cache area;
Selecting a certain number of samples from the buffer area;
according to the maintenance strategy target value and the maintenance task operation time length target value of the sample, respectively carrying out gradient descent solution on the target maintenance strategy neural network and the target maintenance task operation time length neural network;
updating the target maintenance strategy neural network and the target maintenance task operation duration neural network under a preset number of times;
when the logic running time of the system where the component is located is larger than the preset system running time, the step of storing the four-element function into a cache area is carried out; otherwise, when the training times meet the predicted training times, outputting the optimal maintenance strategy of all components in each interruption time and the optimal running time of each maintenance task;
the storing the four-tuple function in the buffer area specifically includes:
Selecting the maintenance action $action_{act}(k)$ and the operation duration $action_{dura}(k)$ corresponding to the $k$th maintenance task according to an epsilon strategy;
forming the action $action(k) = [action_{act}(k), action_{dura}(k)]$ corresponding to the $k$th maintenance task;
obtaining a state $state(k+1)$ of a next task according to the state of the current task, and obtaining a reward $reward(k+1)$ of the next task according to the reward of the current task;
storing $\{state(k), action(k), reward(k+1), state(k+1)\}$ into the buffer D, wherein $state(k)$ represents the system state corresponding to the $k$th maintenance task;
the concrete calculation of the maintenance-strategy target value and the maintenance-task operation-duration target value of the samples comprises:
Wherein $target_{act}(j)$ represents the maintenance-strategy target value corresponding to the $j$th maintenance task, $target_{dura}(j)$ represents the operation-duration target value corresponding to the $j$th maintenance task, $reward(j+1)$ represents the reward corresponding to the $(j+1)$th maintenance task, $\hat{Q}_{act}(\cdot)$ represents the state-action function of the target maintenance-strategy neural network, $\hat{Q}_{dura}(\cdot)$ represents the state-action function of the target maintenance-task operation-duration neural network, $state(j+1)$ represents the state corresponding to the $(j+1)$th maintenance task, $action_{act}(j)$ represents the action of the target maintenance-strategy neural network at the $(j+1)$th maintenance task, $action_{dura}(j)$ represents the action of the target maintenance-task operation-duration neural network at the $(j+1)$th maintenance task, $\hat{\theta}_{act}$ represents the target maintenance-strategy neural network parameters, $\hat{\theta}_{dura}$ represents the target maintenance-task operation-duration neural network parameters, $\gamma$ represents the discount coefficient between the current reward and the long-term maximum return, $\arg\max_{action_{act}} \hat{Q}_{act}(\cdot)$ denotes the maintenance action that maximizes $\hat{Q}_{act}$, and $\arg\max_{action_{dura}} \hat{Q}_{dura}(\cdot)$ denotes the operation-duration action that maximizes $\hat{Q}_{dura}$.
2. The method of claim 1, wherein the target maintenance model is expressed as:
Wherein s.t. denotes the constraint, C denotes the cost function in the target maintenance model, $i$ indexes the components and $n$ is the number of components, V represents the number of subsystems in the system containing the components, $v'$ denotes a subsystem, $n_{v'}$ represents the number of components in subsystem $v'$, $R_0$ represents the lower limit of the reliability preset value, R represents the reliability function in the target maintenance model, and $u'$ is the $u'$th component in the $v'$th subsystem.
3. The method for selectively maintaining multiple tasks according to claim 1, wherein the selecting of the maintenance action $action_{act}(k)$ and the operation duration $action_{dura}(k)$ corresponding to the $k$th maintenance task according to the epsilon policy specifically comprises:
Selecting a random number;
Judging the sizes of the random number and the epsilon value;
if the random number is smaller than the epsilon value, randomly selecting maintenance action and operation duration; otherwise, selecting the maintenance action and the operation duration under the condition of the maximum state-action function;
the calculation formula for selecting the maintenance action and the operation duration under the condition of the maximum state-action function is as follows:
Wherein argmax returns the argument that maximizes the state-action function, $Q_{act}(\cdot)$ represents the state-action function of the first maintenance-strategy neural network, $Q_{dura}(\cdot)$ represents the state-action function of the first maintenance-task operation-duration neural network, $input_{act}(k)$ represents the input of the first maintenance-strategy neural network, $input_{dura}(k)$ represents the input of the first maintenance-task operation-duration neural network, $action_{act}$ represents an action in the action space of the first maintenance-strategy neural network, $action_{dura}$ represents an action in the action space of the first maintenance-task operation-duration neural network, $\theta_{act}$ represents the first maintenance-strategy neural network parameters, and $\theta_{dura}$ represents the first maintenance-task operation-duration neural network parameters.
4. A multi-task selective maintenance system, comprising:
the model construction module is used for constructing a target maintenance model by taking the minimum cost function as a target when the reliability function in the multi-task selection model meets the reliability preset value;
a model conversion module for converting the target maintenance model into a Markov decision model, wherein the Markov decision model is a four-tuple function comprising a state space, an action space, a reward function and a state transfer function, and the action space comprises a maintenance strategy for the component during each interruption and a running time of each maintenance task;
The solving module is used for carrying out training optimization solving on the Markov decision model by adopting a reinforcement learning framework and a neural network;
the output module is used for outputting the optimal maintenance strategies of all components and the optimal running time of the maintenance tasks when the training times meet the predicted training times;
A repair module for repairing the assembly according to the optimal repair strategy and the optimal run time;
The method for constructing the multi-task selection model comprises the following steps:
Calculating a risk rate function of each maintenance task under different maintenance strategies by adopting an effective life attenuation model and a risk function adjustment model, wherein the maintenance strategies comprise a component non-maintenance strategy, a component replacement strategy and an imperfect maintenance strategy;
Calculating a reliability function and a cost function according to the risk rate function;
establishing a multi-task selection model according to the reliability function and the cost function;
evaluating the reliability of a system where the component is located and the maintenance cost of the component according to the multi-task selection model;
Selecting a maintenance strategy for the assembly during each maintenance task according to the reliability and the maintenance cost;
the risk rate function corresponding to the component non-maintenance strategy is as follows:
$h_{i,k}(t_k + x) = h_{i,k-1}(B_{i,k-1} + x)$
where $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $k$ is a positive integer greater than or equal to 1, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, $h_{i,k-1}(\cdot)$ is the risk rate function of component $i$ at the $(k-1)$th maintenance task, $t_k$ is the logical running time of the system containing component $i$, and $B_{i,k-1}$ is the effective life of component $i$ before maintenance;
the risk rate function corresponding to the replacement component strategy is as follows:
$h_{i,k}(t_k + x) = h_{i,0}(x)$
where $h_{i,0}(\cdot)$ is the risk rate function of component $i$ before any maintenance;
the risk rate function corresponding to the imperfect maintenance strategy is as follows:
$h_{i,k}(t_k + x) = A_{i,k-1} \cdot h_{i,k-1}(b_{i,k-1} \cdot B_{i,k-1} + x)$
where $A_{i,k-1}$ is the accumulation of the risk rate adjustment parameters $a_{k'}$ of component $i$ over the previous $k-1$ maintenance tasks, $b_{i,k-1}$ is the effective-life decay parameter of component $i$, and $k'$ indexes the $k'$th maintenance interval;
the reliability function is expressed as:
where $R_i$ is the reliability of the system containing component $i$, $D_k$ is the running time of the $k$th maintenance task for component $i$, $x$ is the time from the last maintenance of component $i$ to the $k$th maintenance task, $h_{i,k}(\cdot)$ is the risk rate function of component $i$ at the $k$th maintenance task, and $t_k$ is the logical running time of the system containing component $i$;
the cost function is expressed as:
where $C_i$ is the maintenance cost of component $i$, which comprises a fixed repair cost of component $i$ plus $c_i(l)$, the cost of the maintenance policy adopted for component $i$: the component non-maintenance strategy is selected when $l=1$, the replacement component strategy when $l=2$, and the imperfect maintenance strategy when $l=3$;
the training optimization solution for the Markov decision model by adopting a reinforcement learning framework and a neural network specifically comprises the following steps:
Initializing a first maintenance-strategy neural network $Q_{act}$ and its parameters $\theta_{act}$, a target maintenance-strategy neural network $\hat{Q}_{act}$ and its parameters $\hat{\theta}_{act}$, a first maintenance-task operation-duration neural network $Q_{dura}$ and its parameters $\theta_{dura}$, and a target maintenance-task operation-duration neural network $\hat{Q}_{dura}$ and its parameters $\hat{\theta}_{dura}$;
Obtaining the input of the first maintenance strategy neural network and the first maintenance task operation time length neural network according to the four-tuple function;
storing the four-tuple function into a cache area;
Selecting a certain number of samples from the buffer area;
according to the maintenance strategy target value and the maintenance task operation time length target value of the sample, respectively carrying out gradient descent solution on the target maintenance strategy neural network and the target maintenance task operation time length neural network;
updating the target maintenance strategy neural network and the target maintenance task operation duration neural network under a preset number of times;
when the logic running time of the system where the component is located is larger than the preset system running time, the step of storing the four-element function into a cache area is carried out; otherwise, when the training times meet the predicted training times, outputting the optimal maintenance strategy of all components in each interruption time and the optimal running time of each maintenance task;
the storing the four-tuple function in the buffer area specifically includes:
Selecting the maintenance action $action_{act}(k)$ and the operation duration $action_{dura}(k)$ corresponding to the $k$th maintenance task according to an epsilon strategy;
forming the action $action(k) = [action_{act}(k), action_{dura}(k)]$ corresponding to the $k$th maintenance task;
obtaining a state $state(k+1)$ of a next task according to the state of the current task, and obtaining a reward $reward(k+1)$ of the next task according to the reward of the current task;
storing $\{state(k), action(k), reward(k+1), state(k+1)\}$ into the buffer D, wherein $state(k)$ represents the system state corresponding to the $k$th maintenance task;
the concrete calculation of the maintenance-strategy target value and the maintenance-task operation-duration target value of the samples comprises:
Wherein $target_{act}(j)$ represents the maintenance-strategy target value corresponding to the $j$th maintenance task, $target_{dura}(j)$ represents the operation-duration target value corresponding to the $j$th maintenance task, $reward(j+1)$ represents the reward corresponding to the $(j+1)$th maintenance task, $\hat{Q}_{act}(\cdot)$ represents the state-action function of the target maintenance-strategy neural network, $\hat{Q}_{dura}(\cdot)$ represents the state-action function of the target maintenance-task operation-duration neural network, $state(j+1)$ represents the state corresponding to the $(j+1)$th maintenance task, $action_{act}(j)$ represents the action of the target maintenance-strategy neural network at the $(j+1)$th maintenance task, $action_{dura}(j)$ represents the action of the target maintenance-task operation-duration neural network at the $(j+1)$th maintenance task, $\hat{\theta}_{act}$ represents the target maintenance-strategy neural network parameters, $\hat{\theta}_{dura}$ represents the target maintenance-task operation-duration neural network parameters, $\gamma$ represents the discount coefficient between the current reward and the long-term maximum return, $\arg\max_{action_{act}} \hat{Q}_{act}(\cdot)$ denotes the maintenance action that maximizes $\hat{Q}_{act}$, and $\arg\max_{action_{dura}} \hat{Q}_{dura}(\cdot)$ denotes the operation-duration action that maximizes $\hat{Q}_{dura}$.
CN202110123843.0A 2021-01-29 2021-01-29 Multi-task selection model construction method, multi-task selective maintenance method and system Active CN112800678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110123843.0A CN112800678B (en) 2021-01-29 2021-01-29 Multi-task selection model construction method, multi-task selective maintenance method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110123843.0A CN112800678B (en) 2021-01-29 2021-01-29 Multi-task selection model construction method, multi-task selective maintenance method and system

Publications (2)

Publication Number Publication Date
CN112800678A CN112800678A (en) 2021-05-14
CN112800678B true CN112800678B (en) 2024-05-28

Family

ID=75812761

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110123843.0A Active CN112800678B (en) 2021-01-29 2021-01-29 Multi-task selection model construction method, multi-task selective maintenance method and system

Country Status (1)

Country Link
CN (1) CN112800678B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106600095A (en) * 2016-07-27 2017-04-26 中国特种设备检测研究院 Reliability-based maintenance evaluation method
CN106295897A (en) * 2016-08-15 2017-01-04 南京航空航天大学 Aircaft configuration based on risk with cost analysis checks mission planning method
CN108038349A (en) * 2017-12-18 2018-05-15 北京航天测控技术有限公司 A kind of repair determining method of aircraft system health status
CN108573303A (en) * 2018-04-25 2018-09-25 北京航空航天大学 It is a kind of that recovery policy is improved based on the complex network local failure for improving intensified learning certainly
CN110427712A (en) * 2019-08-07 2019-11-08 广东工业大学 A kind of preventative maintenance method and Shop Floor based on failure effect analysis (FEA)
CN110909465A (en) * 2019-11-20 2020-03-24 北京航空航天大学 Cooperative game cluster visual maintenance method based on intelligent learning

Also Published As

Publication number Publication date
CN112800678A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
Frank et al. Reinforcement learning in the presence of rare events
CN108520152A (en) A kind of the service life distribution determination method and system of engineering equipment
CN114742250A (en) Numerical control equipment operation fault prediction system based on data analysis
CN112800678B (en) Multi-task selection model construction method, multi-task selective maintenance method and system
CN110750455B (en) Intelligent online self-updating fault diagnosis method and system based on system log analysis
CN110210531B (en) Fuzzy multi-state manufacturing system task reliability evaluation method based on extended random flow network
CN117749775A (en) Real-time communication system and method suitable for non-stationary network environment
Djelloul et al. Optimal selective maintenance policy for series-parallel systems operating missions of random durations
CN113723766A (en) Cement production line progress monitoring and early warning method based on buffer area
CN117408672A (en) Intelligent expressway maintenance system
CN109740766B (en) Industrial equipment maintenance service planning method
CN111489027A (en) Hydroelectric generating set waveform data trend prediction method and system
CN111428356A (en) Maintenance method and system for newly developed degraded equipment
CN108134687B (en) Gray model local area network peak flow prediction method based on Markov chain
CN111210361B (en) Power communication network routing planning method based on reliability prediction and particle swarm optimization
CN112488390B (en) Urban water discharge prediction method and system
Zhao et al. Selective maintenance modeling for a multi-state system considering human reliability
CN113821323A (en) Offline job task scheduling algorithm oriented to hybrid deployment data center scene
Wang et al. HARRD: Real-time software rejuvenation decision based on hierarchical analysis under weibull distribution
CN118428674A (en) System and method for analyzing selective maintenance sequence of production demand multitasking equipment
CN110580548A (en) Multi-step traffic speed prediction method based on class integration learning
Hayane et al. Optimal routing and scheduling for unreliable Markovian systems modeled with Timed Petri nets
CN117709806B (en) Cooperative multi-equipment abnormality automatic detection method and detection system
CN115718669A (en) Real-time wind speed index calculation method and system for bridge structure health monitoring
Aboelmagd SMART CRITICAL PATH METHOD AS A MODIFIED DETAILED SCHEDULING TECHNIQUE

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant