WO2021059787A1 - Control device, learning device, control method, learning method, and program - Google Patents

Control device, learning device, control method, learning method, and program Download PDF

Info

Publication number
WO2021059787A1
WO2021059787A1 PCT/JP2020/030640 JP2020030640W
Authority
WO
WIPO (PCT)
Prior art keywords
wing
control amount
load
model
input
Prior art date
Application number
PCT/JP2020/030640
Other languages
French (fr)
Japanese (ja)
Inventor
Daichi WADA
Keisuke KIMURA
Hideaki MURAYAMA
Original Assignee
Japan Aerospace Exploration Agency (JAXA)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Japan Aerospace Exploration Agency (JAXA)
Publication of WO2021059787A1 publication Critical patent/WO2021059787A1/en

Links

Images

Classifications

    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B64 - AIRCRAFT; AVIATION; COSMONAUTICS
    • B64C - AEROPLANES; HELICOPTERS
    • B64C13/00 - Control systems or transmitting systems for actuating flying-control surfaces, lift-increasing flaps, air brakes, or spoilers
    • B64C13/02 - Initiating means
    • B64C13/16 - Initiating means actuated automatically, e.g. responsive to gust detectors
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B64 - AIRCRAFT; AVIATION; COSMONAUTICS
    • B64C - AEROPLANES; HELICOPTERS
    • B64C3/00 - Wings
    • B - PERFORMING OPERATIONS; TRANSPORTING
    • B64 - AIRCRAFT; AVIATION; COSMONAUTICS
    • B64C - AEROPLANES; HELICOPTERS
    • B64C9/00 - Adjustable control surfaces or members, e.g. rudders
    • G - PHYSICS
    • G01 - MEASURING; TESTING
    • G01L - MEASURING FORCE, STRESS, TORQUE, WORK, MECHANICAL POWER, MECHANICAL EFFICIENCY, OR FLUID PRESSURE
    • G01L5/00 - Apparatus for, or methods of, measuring force, work, mechanical power, or torque, specially adapted for specific purposes
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B11/00 - Automatic controllers
    • G05B11/01 - Automatic controllers electric
    • G05B11/36 - Automatic controllers electric with provision for obtaining particular characteristics, e.g. proportional, integral, differential
    • G - PHYSICS
    • G05 - CONTROLLING; REGULATING
    • G05B - CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B13/00 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
    • G05B13/02 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
    • G05B13/04 - Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric involving the use of models or simulators
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • The present invention relates to a control device, a learning device, a control method, a learning method, and a program.
  • The present application claims priority based on Japanese Patent Application No. 2019-173265, filed on September 24, 2019, the contents of which are incorporated herein by reference.
  • For example, the techniques described in Non-Patent Documents 1 and 2 use an optical fiber sensor laid out over a wing structure to detect the strain of the structure and perform machine learning based on the detected strain, thereby identifying the load distribution with high accuracy and stability.
  • Non-Patent Document 1: Daichi WADA and Masato TAMAYAMA, "Wing Load and Angle of Attack Identification by Integrating Optical Fiber Sensing and Neural Network Approach in Wind Tunnel Test," Appl. Sci. 2019, 9(7).
  • Non-Patent Document 2: Daichi WADA, Yohei SUGIMOTO, Hideaki MURAYAMA, Hirotaka IGAWA and Toshiya NAKAMURA, "Investigation of Inverse Analysis and Neural Network Approaches for Identifying Distributed Load Using Strains," Transactions of the Japan Society for Aeronautical and Space Sciences, Vol. 62, pp. 151-161, 2019, doi:10.2322/tjsass.62.151.
  • One aspect of the present invention provides a control device, a learning device, a control method, a learning method, and a program capable of reducing a structural load on a wing.
  • One aspect of the present invention is a control device including: an acquisition unit that acquires information indicating the strain of a wing of a structure detected by an optical fiber sensor provided on the wing, and information indicating a control amount of a movable wing of the structure; a first determination unit that determines a load and an angle of attack of the wing based on the strain and the control amount indicated by the information acquired by the acquisition unit; a second determination unit that inputs, as a state variable into a model trained to output, when a state variable is input, the value of an action to be taken in accordance with the state variable or a variable indicating the action, some or all of the load and the angle of attack determined by the first determination unit and the control amount indicated by the information acquired by the acquisition unit, and determines a control amount of the movable wing based on an output result of the model to which the state variable was input; and a control unit that controls the movable wing based on the control amount determined by the second determination unit.
  • According to one aspect of the present invention, the structural load on the wing can be reduced.
  • FIG. 1 is a diagram showing an example of the configuration of an aircraft 1 including the control device 100 of the embodiment.
  • As shown in the figure, the aircraft 1 includes, for example, a main wing 10, a vertical stabilizer 12, a horizontal stabilizer 14, and a control device 100.
  • In the figure, the X, Y, and Z axes represent the aircraft-fixed coordinate system: the X axis represents the roll axis, the Y axis represents the pitch axis, and the Z axis represents the yaw axis.
  • The aircraft 1 is an example of a "structure".
  • The main wing 10 is a wing that generates the lift that supports the weight of the aircraft 1.
  • For example, the main wing 10 is provided with flaps FL1 to FL8, an optical fiber sensor SFB, and a pressure sensor SP.
  • The main wing 10 is an example of a "wing of a structure".
  • The flaps FL1 to FL8 are movable wings that increase the lift of the main wing 10.
  • Hereinafter, when the flaps FL1 to FL8 are not distinguished from one another, they are collectively referred to as flaps FL.
  • In addition to the flaps FL, the main wing 10 may be further provided with other movable wings such as ailerons (auxiliary wings) for rolling the airframe and spoilers for reducing lift.
  • The aileron may be any of the flaps FL, or may be a movable wing provided separately from the flaps FL.
  • The optical fiber sensor SFB is provided in lines at, for example, several places on at least one side (for example, the upper surface) of the main wing 10.
  • Each line of the optical fiber sensor SFB is attached, for example, along the main spar and the rear spar of the main wing 10 (along the Y-axis direction).
  • On each line of the optical fiber sensor SFB, for example, FBGs (Fiber Bragg Gratings) are formed, and strain is sensed at intervals of about several millimeters to several tens of centimeters.
  • As a result, the optical fiber sensor SFB can detect the strain of the main wing 10 at several tens to several thousand points as a discrete distribution.
  • The pressure sensor SP is, for example, a pitot-static tube, and detects the pressure applied to the main wing 10.
  • For example, the pressure sensors SP are arranged in a one-dimensional array along the X-axis direction at the center of a span segment of the main wing 10.
  • Specifically, the pressure sensors SP are installed at a dozen or so locations on the upper surface side of the main wing 10 and at a dozen or so locations on the lower surface side of the main wing 10.
  • The pressure sensor SP detects the load distribution applied to the main wing 10 by integrating the detected pressure values over the cross section of the main wing 10.
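  • As a rough illustration of that integration step, the following Python sketch integrates a pressure difference over one chord section; the tap positions, pressure values, and trapezoidal rule are illustrative assumptions, not values from the patent:

```python
import numpy as np

# Hypothetical chordwise tap positions [m] and surface pressures [Pa] for one wing section.
x = np.linspace(0.0, 1.2, 12)   # a dozen or so taps along the chord
p_upper = np.full(12, -400.0)   # suction (upper) surface, illustrative values
p_lower = np.full(12, 300.0)    # pressure (lower) surface, illustrative values

# Sectional load per unit span: integrate the pressure difference over the chord.
section_load = np.trapz(p_lower - p_upper, x)  # [N/m]
```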
  • The vertical stabilizer 12 and the horizontal stabilizer 14 are provided at positions away from the center of gravity of the aircraft 1 (for example, at the rear end of the airframe).
  • The vertical stabilizer 12 may be provided with, for example, a rudder for controlling the movement of the airframe around the Z axis.
  • The horizontal stabilizer 14 may be provided with, for example, an elevator for controlling the movement of the airframe around the Y axis.
  • FIG. 2 is a diagram showing an example of the configuration of the control device 100 of the embodiment.
  • As shown in the figure, the control device 100 includes, for example, a communication unit 102, a drive unit 104, a control unit 110, and a storage unit 130.
  • The communication unit 102 is, for example, a wireless communication module including a receiver and a transmitter, and wirelessly communicates with an external device via a network.
  • The network may include, for example, a WAN (Wide Area Network) or a LAN (Local Area Network).
  • The external device includes, for example, a learning device 200 described later.
  • The drive unit 104 is, for example, an actuator such as a servo motor.
  • The drive unit 104 drives the movable wings, such as the flaps FL, ailerons, and spoilers provided on the main wing 10.
  • The drive unit 104 may also drive the rudder provided on the vertical stabilizer 12 and the elevator provided on the horizontal stabilizer 14.
  • The control unit 110 includes, for example, an acquisition unit 112, a control amount determination unit 114, and a drive control unit 116.
  • The control amount determination unit 114 is an example of the "first determination unit" and the "second determination unit".
  • The components of the control unit 110 are realized by, for example, a processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) executing a program stored in the storage unit 130.
  • Some or all of the components of the control unit 110 may be realized by hardware such as an LSI (Large Scale Integration) circuit, an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or may be realized by cooperation of software and hardware.
  • The storage unit 130 is realized by, for example, an HDD (Hard Disk Drive), a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), a ROM (Read Only Memory), or a RAM (Random Access Memory).
  • The storage unit 130 stores the first model data D1 and the second model data D2 in addition to various programs such as firmware and application programs.
  • The first model data D1 and the second model data D2 may be installed in the storage unit 130 from the learning device 200 via a network, or may be installed in the storage unit 130 from a portable storage medium connected to a drive device of the control device 100, for example.
  • The first model data D1 is information (a program or a data structure) that defines the first model MDL1.
  • The first model MDL1 is, for example, a model trained to output the load distribution of the main wing 10 and the angle of attack α when the strain distribution of the main wing 10 and the control amounts of the movable wings of the main wing 10 are input.
  • Such a model may be realized, for example, by a model in which a plurality of neural networks are arranged in multiple stages. Each of the plurality of neural networks includes, for example, an input layer, at least one intermediate layer (hidden layer), and an output layer.
  • The control amount includes, for example, a steering angle.
  • Hereinafter, as an example, the control amount will be described as being the steering angle.
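  • As a rough illustration of this two-stage structure, the following Python sketch wires two small networks in series; the MLP class, the layer sizes, and the sensor/flap/load-point counts are assumptions made for illustration, not values from the patent:

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """A small feed-forward network: input layer, one hidden layer, output layer."""
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))
    def forward(self, x):
        return self.net(x)

N_STRAIN, N_FLAPS, N_LOAD = 100, 8, 20  # assumed numbers of FBG points, flaps, load points

mdl1_1 = MLP(N_STRAIN + N_FLAPS, N_LOAD)  # stage 1: strain ε + steering angles δ -> load F
mdl1_2 = MLP(N_LOAD + N_FLAPS, 1)         # stage 2: load F + steering angles δ -> angle of attack α

def first_model(strain, delta):
    """MDL1: two neural networks in series, as in FIG. 4."""
    f = mdl1_1(torch.cat([strain, delta], dim=-1))
    alpha = mdl1_2(torch.cat([f, delta], dim=-1))
    return f, alpha
```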
  • The second model data D2 is information (a program or a data structure) that defines the second model MDL2.
  • The second model MDL2 is, for example, a model trained as an approximation function of the action value function Q(s_t, a_t) handled in reinforcement learning.
  • The action value function Q(s_t, a_t) is a function representing the value of selecting a certain action a_t under an environmental state s_t at time t.
  • When an environmental state s_t is input, the second model MDL2 outputs the value (also called the Q value) of each of one or more actions (action variables) a_t that can be taken under that environmental state s_t.
  • The second model MDL2 may be realized by, for example, a neural network including an input layer, a plurality of intermediate layers (hidden layers), and an output layer.
  • That is, the second model MDL2 may be, for example, a DQN (Deep Q-Network).
  • The first model data D1 and the second model data D2 include, for example, connection information on how the units included in each of the layers constituting the neural network are connected to each other, and coupling coefficients between the connected units.
  • The connection information includes, for example, the number of units included in each layer, information specifying the type of unit to which each unit is connected, the activation function realizing each unit, and gates provided between units in the hidden layers.
  • The activation function realizing each unit may be, for example, a rectified linear function (ReLU function), a sigmoid function, a step function, or another function.
  • The gate selectively passes or weights the data transmitted between the units depending on, for example, the value returned by the activation function (e.g., 1 or 0).
  • The coupling coefficients include, for example, the weights applied to output data when data is output from a unit in one layer to a unit in a deeper layer in the hidden layers of the neural network.
  • The coupling coefficients may also include a bias component specific to each layer.
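  • For concreteness, a DQN head over the state variable described below could look like the following sketch, reusing the hypothetical MLP class and counts from the sketch above; the 24-output layout follows the eight-flaps-times-three-options example given later:

```python
# Continuing the earlier sketch: a DQN head over the state variable s.
N_STATE = N_FLAPS + 1 + N_LOAD + 1        # δ (8) + α (1) + F (20) + ΔFsum (1), assumed layout
N_OPTIONS = 3                             # hold / move to first surface / move to second surface
mdl2 = MLP(N_STATE, N_FLAPS * N_OPTIONS)  # outputs 8 x 3 = 24 Q values, one per (flap, option)
```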
  • FIG. 3 is a flowchart showing an example of a series of processing flows of the control unit 110 of the embodiment. The processing of this flowchart may be repeated, for example, at a predetermined cycle.
  • First, the acquisition unit 112 acquires information indicating the strain distribution of the main wing 10 (hereinafter referred to as strain information) from the optical fiber sensor SFB, and acquires information indicating the steering angles of the movable wings of the main wing 10 (hereinafter referred to as steering angle information) from the drive unit 104 (step S100).
  • The strain information and the steering angle information are, for example, multidimensional vectors.
  • Hereinafter, the strain information vector is referred to as the "strain vector ε" and the steering angle information vector as the "steering angle vector δ" (in the original notation, these symbols carry a vector arrow).
  • The strain vector ε includes as element values, for example, the strain values detected by each FBG of the optical fiber sensor SFB provided along the main spar and the strain values detected by each FBG of the optical fiber sensor SFB provided along the rear spar.
  • The steering angle vector δ includes, for example, the steering angle values of the flaps FL1 to FL8 as element values.
  • The steering angle vector δ may also include the steering angle values of other movable wings, such as ailerons and spoilers, as elements.
  • The strain information and the steering angle information are not limited to vectors, that is, first-order tensors, and may be second- or higher-order tensors.
  • Next, the control amount determination unit 114 inputs the strain vector ε and the steering angle vector δ acquired by the acquisition unit 112 into the pre-trained first model MDL1 (step S102).
  • FIG. 4 is a diagram schematically showing the first model MDL1.
  • As shown in the figure, in the first model MDL1, a model MDL1-1 and a model MDL1-2 are arranged in multiple stages.
  • The model MDL1-1 and the model MDL1-2 are each neural networks.
  • The strain vector ε and the steering angle vector δ are input to the first-stage model MDL1-1.
  • The first-stage model MDL1-1 outputs a vector whose elements are the distribution values of the load applied to the main wing 10 (hereinafter referred to as the load vector F).
  • To the second-stage model MDL1-2, in addition to the load vector F, which is the output of the first-stage model MDL1-1, the steering angle vector δ that was also input to the first-stage model MDL1-1 is input.
  • The second-stage model MDL1-2 outputs the angle of attack α of the main wing 10 as a zeroth-order tensor, that is, a scalar.
  • When the control amount determination unit 114 inputs the strain vector ε (the strain information) and the steering angle vector δ (the steering angle information) into the first model MDL1, it acquires information indicating the load distribution output by the first-stage model MDL1-1 (hereinafter referred to as load distribution information) and information indicating the angle of attack α output by the second-stage model MDL1-2 (hereinafter referred to as angle-of-attack information) (step S104).
  • Next, the control amount determination unit 114 calculates the difference between the sum of the load distribution indicated by the acquired load distribution information and the sum of a target load distribution (hereinafter referred to as the total load difference ΔFsum) (step S106).
  • The sum of the load distribution is, for example, the sum of all the load values included as elements in the load vector F.
  • The target load distribution may be, for example, the total load that the main wing 10 must bear in order for the aircraft 1 to maintain level flight.
  • Next, the control amount determination unit 114 inputs some or all (preferably all) of the steering angle vector δ, the angle of attack α, the load vector F, and the total load difference ΔFsum into the second model MDL2 as the state variable s.
  • That is, the state variable s is a multidimensional vector whose elements include some or all of the steering angle vector δ, the angle of attack α, the load vector F, and the total load difference ΔFsum.
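  • Assembled as a flat vector, the state variable s might look like the following sketch; the element ordering and the build_state helper are hypothetical:

```python
import numpy as np

def build_state(delta, alpha, f, target_f_sum):
    """Assemble the state variable s (assumed flat layout): δ, α, F, ΔFsum."""
    df_sum = np.sum(f) - target_f_sum  # total load difference ΔFsum (step S106)
    return np.concatenate([delta, [alpha], f, [df_sum]])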
  • FIG. 5 is a diagram schematically showing the second model MDL2.
  • As shown in the figure, the state variable s, which includes some or all of the steering angle vector δ, the angle of attack α, the load vector F, and the total load difference ΔFsum, is input to the second model MDL2.
  • When the state variable s is input, the second model MDL2 outputs the value Q(s, a) of each of one or more actions a that can be taken under the state variable s.
  • That is, the output of the second model MDL2 is a multidimensional vector whose elements are the values of the respective actions a.
  • Hereinafter, this multidimensional vector is referred to as the action value vector Q(s, a).
  • The action a is selected from, for example, the following three options. These three options (1) to (3) are merely examples, and some of them may be omitted or other options may be added.
  • When the movable wings to be controlled are the eight flaps FL1 to FL8, one of the options (1) to (3) is selected for each of the flaps FL1 to FL8.
  • In this case, the action value vector Q(s, a) output by the second model MDL2 is a 24-dimensional vector (see the decoding sketch after the list of options below).
  • The action of option (1) may be defined as not moving the movable wing toward either the first surface side (the plus side of the steering angle), which is one side of the wing surface, or the second surface side (the minus side of the steering angle), which is the other side, in the direction intersecting the wing surface.
  • The action of option (2) may be defined as moving the movable wing toward the first surface side in the direction intersecting the wing surface of the movable wing.
  • The action of option (3) may be defined as moving the movable wing toward the second surface side in the direction intersecting the wing surface of the movable wing.
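  • Under those three options, decoding the 24 Q values into next-cycle steering angles could look like this sketch; the per-cycle increment and the greedy choice are assumptions:

```python
import numpy as np

DELTA_STEP = 1.0  # assumed steering-angle increment per cycle [deg]

def decode_actions(q_values, delta):
    """Pick the highest-value option per flap and apply it to the current angles δ."""
    q = q_values.reshape(8, 3)   # 8 flaps x 3 options
    choice = q.argmax(axis=1)    # greedy option index per flap
    step = np.where(choice == 1, DELTA_STEP,         # option (2): toward first surface
           np.where(choice == 2, -DELTA_STEP, 0.0))  # option (3): toward second surface; option (1): hold
    return delta + step          # steering angles for the next cycle t+1
```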
  • Next, the control amount determination unit 114 acquires the action value vector Q(s, a) (step S110). The control amount determination unit 114 then determines the steering angle of each movable wing to be controlled based on the acquired action value vector Q(s, a) (step S112).
  • For example, with respect to the flap FL1, the action value vector Q(s, a) includes as element values the value when the action a of option (1) is taken, the value when the action a of option (2) is taken, and the value when the action a of option (3) is taken.
  • In this case, the control amount determination unit 114 selects, with respect to the flap FL1, the action a having the highest value from these three actions a.
  • Alternatively, as in the epsilon-greedy method, the control amount determination unit 114 may randomly select an action from all the actions a with a certain probability ε, and select the action a having the highest value with the remaining probability (1 - ε); a sketch of this selection rule follows below.
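  • A minimal sketch of that epsilon-greedy rule for one flap's three Q values (the probability ε is an assumed parameter):

```python
import random
import numpy as np

def epsilon_greedy(q_row, eps=0.1):
    """Pick an option index for one flap: explore with probability ε, else exploit."""
    if random.random() < eps:
        return random.randrange(len(q_row))  # random action (exploration)
    return int(np.argmax(q_row))             # highest-value action (exploitation)
```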
  • The control amount determination unit 114 then determines the steering angle that each movable wing should take in the next cycle t+1 based on the action a determined for that movable wing.
  • Finally, the drive control unit 116 controls each actuator included in the drive unit 104 based on the steering angles determined by the control amount determination unit 114 to drive the movable wings (step S114). Specifically, the drive control unit 116 determines the operation amount of each actuator from the steering angle (control amount) determined by the control amount determination unit 114, and drives the movable wings by controlling each actuator with the determined operation amounts. This ends the processing of this flowchart.
  • The learning device 200 may be a single device, or may be a system in which a plurality of devices connected via a network such as a WAN or a LAN operate in cooperation with each other. That is, the learning device 200 may be realized by a plurality of computers (processors) included in a system using distributed computing or cloud computing.
  • FIG. 6 is a diagram showing an example of the configuration of the learning device 200 of the embodiment.
  • As shown in the figure, the learning device 200 includes a communication unit 202, a control unit 210, and a storage unit 230.
  • The communication unit 202 is, for example, a wireless communication module including a receiver and a transmitter, and wirelessly communicates with an external device such as the control device 100 via a network.
  • The control unit 210 includes, for example, an acquisition unit 212 and a learning unit 214.
  • The components of the control unit 210 are realized, for example, by a processor such as a CPU or a GPU executing a program stored in the storage unit 230.
  • Some or all of the components of the control unit 210 may be realized by hardware such as an LSI circuit, an ASIC, or an FPGA, or may be realized by cooperation of software and hardware.
  • The storage unit 230 is realized by, for example, an HDD, a flash memory, an EEPROM, a ROM, or a RAM.
  • The storage unit 230 stores the above-described first model data D1 and second model data D2, as well as third model data D3 and teacher data D4, in addition to various programs such as firmware and application programs.
  • The third model data D3 is information (a program or a data structure) that defines the third model MDL3.
  • The third model MDL3 is a simulator for performing deep reinforcement learning, and is not used during the above-described operation.
  • For example, the third model MDL3 is a model trained to output the load vector F representing the load distribution of the main wing 10 when the steering angles of the movable wings and the angle of attack α of the main wing 10 are input.
  • Such a model may be realized, for example, by a neural network including an input layer, at least one intermediate layer (hidden layer), and an output layer.
  • Alternatively, the third model MDL3 may simply be a static data set or database representing the results of the wind tunnel test described later. For example, suppose that a tester arbitrarily determines the steering angles of the movable wings and the angle of attack α of the main wing 10, performs a wind tunnel test, and observes the load distribution of the main wing 10 during the test. In this case, the third model MDL3 may be table data in which the observed load distribution of the main wing 10 is associated with the data set of the steering angles of the movable wings and the angle of attack α of the main wing 10.
  • Alternatively, the third model MDL3 may be a numerical model in which the relationship between the input values (the steering angles of the movable wings and the angle of attack α of the main wing 10) and the output value (the load distribution of the main wing 10) is defined by a functional expression or the like.
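  • As a toy illustration of the table-data variant, a nearest-neighbor lookup over recorded wind tunnel conditions might look like this; the key layout and distance metric are assumptions:

```python
import numpy as np

# Maps (steering angles tuple, angle of attack) -> observed load vector F;
# assumed to be populated from wind tunnel records before use.
wind_tunnel_table = {}

def third_model_lookup(delta, alpha):
    """Return the load distribution recorded for the nearest tested condition."""
    key = min(wind_tunnel_table,
              key=lambda k: np.linalg.norm(np.array(k[0]) - delta) + abs(k[1] - alpha))
    return wind_tunnel_table[key]
```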
  • Like the first model data D1 and the second model data D2, the third model data D3 may include various information such as connection information and coupling coefficients.
  • The teacher data D4 is data for training the first model MDL1.
  • For example, the teacher data D4 is a data set in which the load vector F and the angle of attack α are associated, as a teacher label (also called a target), with each combination of the strain vector ε and the steering angle vector δ.
  • Such teacher data D4 may be obtained, for example, by performing a wind tunnel test.
  • For example, a wing that is the same as the main wing 10 of the aircraft 1, or a similar scale-model wing, provided with the optical fiber sensor SFB and the pressure sensor SP, is placed on a rotatable test device such as a turntable in a test chamber in which wind tunnel tests are performed. Then, while an air flow is generated in the test chamber, the movable wings are driven while the rotation angle of the rotatable test device is changed in increments of one degree.
  • As a result, the strain and the load of the wing, which can change according to the steering angles of the movable wings, are detected by the optical fiber sensor SFB and the pressure sensor SP.
  • In this way, the teacher data D4 may be generated by virtually creating, through the wind tunnel test, the same environment as when the aircraft 1 is flying.
  • FIG. 7 is a flowchart showing an example of a series of processing flows of the control unit 210 of the embodiment.
  • The processing of this flowchart may be repeated at a predetermined cycle, for example, when training the first model MDL1.
  • When the learning device 200 is realized by a plurality of computers included in a system using distributed computing or cloud computing, some or all of the processing of this flowchart may be processed in parallel by the plurality of computers.
  • First, the acquisition unit 212 acquires, from the teacher data D4 stored in the storage unit 230, the strain vector ε and the steering angle vector δ associated with a teacher label (step S200).
  • Next, the learning unit 214 inputs the strain vector ε and the steering angle vector δ acquired by the acquisition unit 212 into the untrained first model MDL1 (step S202).
  • Next, the learning unit 214 acquires the load distribution information and the angle-of-attack information from the first model MDL1 (step S204).
  • Next, the learning unit 214 calculates the difference between the load vector F associated as the teacher label with the strain vector ε and the steering angle vector δ input to the first model MDL1 in the process of S202 and the load vector F output by the first model MDL1 as the load distribution information, and calculates the difference between the angle of attack α associated as the teacher label and the angle of attack α output by the first model MDL1 as the angle-of-attack information (step S206).
  • Next, the learning unit 214 trains the first model MDL1 so that the difference between the load vectors F and the difference between the angles of attack α become small (step S208). For example, the learning unit 214 determines (updates) the weighting coefficients, bias components, and other parameters of the first model MDL1 using stochastic gradient descent or the like so that each difference becomes small.
  • The learning unit 214 stores the trained first model MDL1 in the storage unit 230 as the first model data D1.
  • The learning unit 214 repeats (iterates) the processes of S200 to S208 to train the first model MDL1. Then, the learning unit 214 transmits the first model data D1 that defines the fully trained first model MDL1 to the control device 100 via, for example, the communication unit 202. This ends the processing of this flowchart.
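  • Steps S200 to S208 amount to a standard supervised training loop; a minimal sketch, continuing the earlier MDL1 sketch and assuming a hypothetical teacher_data iterable of (strain, steering angle, target load, target angle-of-attack) tensors, could be:

```python
import torch
import torch.nn as nn

# Continuing the earlier sketch (mdl1_1, mdl1_2, first_model).
opt = torch.optim.SGD(list(mdl1_1.parameters()) + list(mdl1_2.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()

for strain, delta, target_f, target_alpha in teacher_data:      # teacher data D4 (hypothetical)
    f, alpha = first_model(strain, delta)                       # steps S202-S204
    loss = loss_fn(f, target_f) + loss_fn(alpha, target_alpha)  # step S206: the two differences
    opt.zero_grad()
    loss.backward()  # step S208: stochastic gradient descent update
    opt.step()
```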
  • FIG. 8 is a flowchart showing another example of a series of processing flow of the control unit 210 of the embodiment.
  • The processing of this flowchart may be repeated at a predetermined cycle, for example, when training the second model MDL2.
  • When the learning device 200 is realized by a plurality of computers included in a system using distributed computing or cloud computing, some or all of the processing of this flowchart may be processed in parallel by the plurality of computers.
  • First, the acquisition unit 212 acquires the state variable s(t) of the main wing 10, or of a similar scale-model wing, in a certain period (time) t (step S300).
  • The state variable s(t) here is the same as the variable input to the second model MDL2 during operation.
  • However, at the time of learning, the state variable s(t) including all of these elements (the steering angle vector δ, the angle of attack α, the load vector F, and the total load difference ΔFsum) is acquired in the process of S300.
  • Next, the learning unit 214 inputs the state variable s(t) acquired by the acquisition unit 212 into the second model MDL2 (step S302).
  • Next, the learning unit 214 acquires the action value vector Q(s, a) from the second model MDL2 (step S304).
  • Next, the learning unit 214 selects the optimum action a from the one or more actions that can be taken under the state variable s, based on the acquired action value vector Q(s, a) (step S306).
  • The optimum action a may be, for example, the action having the highest value, or an action selected based on the epsilon-greedy method.
  • When using the epsilon-greedy method, the learning unit 214 may reduce the probability ε each time the processing of this flowchart is repeated (that is, as the number of iterations increases).
  • Alternatively, the optimum action a may be selected using roulette selection as in genetic algorithms, or using the softmax method based on the Boltzmann distribution; a sketch of the softmax variant follows below.
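  • A sketch of the softmax (Boltzmann) variant for one flap's Q values (the temperature tau is an assumed parameter):

```python
import numpy as np

def softmax_select(q_row, tau=1.0):
    """Sample an option index with Boltzmann probabilities over the Q values."""
    p = np.exp((q_row - np.max(q_row)) / tau)  # subtract max for numerical stability
    p /= p.sum()
    return int(np.random.choice(len(q_row), p=p))
```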
  • Next, the learning unit 214 inputs the action variable a representing the selected action into the trained third model MDL3 (step S308). That is, the learning unit 214 inputs, to the trained third model MDL3, the steering angle vector δ whose elements are the steering angles to be taken in the next period t+1 by the movable wings to be controlled. At this time, in addition to the steering angle vector δ for the next period t+1 derived from the second model MDL2, the learning unit 214 also inputs to the third model MDL3 the angle of attack α of the current period t, that is, the angle of attack α input to the second model MDL2 as the state variable s in the process of S302.
  • Next, the learning unit 214 acquires, from the third model MDL3, the load distribution information (that is, the load vector F) for the next period t+1 as information representing the state variable s' of the main wing 10 in the next period t+1 (step S310).
  • Next, the learning unit 214 calculates the total load difference ΔFsum based on the load vector F acquired as the state variable s', and inputs some or all of the calculated total load difference ΔFsum, the steering angle vector δ, the angle of attack α, and the load vector F into the second model MDL2 as the state variable s' (step S312).
  • The angle of attack α included in the state variable s' input to the second model MDL2 may be the angle of attack α in the current period t. That is, the learning unit 214 may carry over the value of the angle of attack α in the period t acquired in the process of S300 as it is and input it to the second model MDL2 as part of the state variable s'.
  • Next, the learning unit 214 calculates the reward for the action a selected in the process of S306 (step S314).
  • For example, the learning unit 214 may calculate the reward r(t) in the current cycle t based on mathematical formula (1). In formula (1), M represents the wing root moment calculated from the load distribution, and Fsum represents the total load distributed on the main wing 10.
  • When a predetermined condition is not satisfied, the learning unit 214 sets the reward r(t) to zero.
  • The initial time is, for example, the time in the level-flight state before the control device 100 starts the control for reducing the structural load of the aircraft 1 (that is, the control during operation).
  • The predetermined condition may further include that the absolute value of the total load difference ΔFsum is less than the lower limit of a certain allowable range.
  • When the predetermined condition includes that the absolute value of the total load difference ΔFsum is less than the lower limit of the allowable range, the learning unit 214 may set the reward r(t) to a value larger than zero when it determines that the absolute value of the total load difference ΔFsum is within the allowable range.
  • Further, the learning unit 214 may apply a negative reward (penalty) to the calculated reward r(t) based on mathematical formula (2).
  • The sum Σ in the formula is obtained by taking, for each movable wing to be controlled, the difference between the steering angle of that movable wing at the previous time t-1 and its steering angle at the current time t, and summing the absolute values of these differences.
  • That is, the learning unit 214 multiplies the sum Σ of the absolute values of the steering angle differences of the movable wings by an arbitrary weighting coefficient (in formula (2), the weighting coefficient is 3 as an example), and subtracts the product of the sum Σ and the weighting coefficient from the reward r(t) calculated in the process of S314.
  • As a result, the second model MDL2 is trained so that the reward r(t) becomes smaller as the number of movable wings moved among the plurality of movable wings to be controlled increases.
  • Consequently, the control surfaces (movable wings) can be controlled efficiently without being moved frequently.
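  • Putting the pieces together, the reward shaping described above might be sketched as follows; since formulas (1) and (2) are not reproduced in this text, the moment term, tolerance, and condition below are placeholders only:

```python
import numpy as np

PENALTY_WEIGHT = 3.0  # the example weighting coefficient mentioned for formula (2)

def reward(moment_m, df_sum, delta_prev, delta_now, tol=0.05):
    """Placeholder reward: reward lower wing root moment while the total load
    stays near its target, then penalize steering-angle movement (formula (2))."""
    r = -moment_m if abs(df_sum) < tol else 0.0                   # stand-in for formula (1)
    r -= PENALTY_WEIGHT * np.sum(np.abs(delta_now - delta_prev))  # formula (2) penalty
    return r
```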
  • Next, the learning unit 214 trains the second model MDL2 based on the calculated reward r(t), the action value Q(s', a') output by the second model MDL2 when the state variable s' is input, and the action value Q(s, a) output by the second model MDL2 when the state variable s is input (step S316).
  • For example, the learning unit 214 uses the second model MDL2 to obtain the action value Q(s', a') for each of the plurality of actions a' that can be taken at the next time t+1, and selects the maximum value max Q(s', a') from the action values Q(s', a') corresponding to the respective actions a'.
  • Next, the learning unit 214 multiplies the selected action value max Q(s', a') by a weighting coefficient called the discount rate γ (a value between 0 and 1), and further adds the reward r(t).
  • Then, the learning unit 214 trains the second model MDL2 so that the difference between r(t) + γ max Q(s', a') and Q(s, a) becomes small.
  • For example, the learning unit 214 determines (updates) the weighting coefficients and bias components, which are the parameters of the second model MDL2, so that the difference between r(t) + γ max Q(s', a') and Q(s, a) becomes small.
  • Q(s, a) is an example of the "first value", and Q(s', a') is an example of the "second value".
  • The learning unit 214 stores the trained second model MDL2 in the storage unit 230 as the second model data D2.
  • The learning unit 214 repeats (iterates) the processes of S300 to S316 to train the second model MDL2. Then, the learning unit 214 transmits the second model data D2 that defines the fully trained second model MDL2 to the control device 100 via, for example, the communication unit 202. This ends the processing of this flowchart.
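  • The update in step S316 is the standard DQN-style step of moving Q(s, a) toward r(t) + γ max Q(s', a'); a minimal sketch, continuing the earlier mdl2 sketch (replay buffers and target networks omitted; γ is an assumed value):

```python
import torch

GAMMA = 0.95  # discount rate γ (assumed value between 0 and 1)
opt2 = torch.optim.SGD(mdl2.parameters(), lr=1e-3)

def dqn_update(s, a_idx, r, s_next):
    with torch.no_grad():
        target = r + GAMMA * mdl2(s_next).max()  # second value: r(t) + γ max Q(s', a')
    q_sa = mdl2(s)[a_idx]                        # first value: Q(s, a)
    loss = (q_sa - target) ** 2                  # shrink the difference between them
    opt2.zero_grad()
    loss.backward()
    opt2.step()
```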
  • As described above, the control device 100 acquires the strain vector ε, which is information indicating the strain of the main wing 10 of the aircraft 1, and the steering angle vector δ, which is information indicating the steering angles of movable wings such as the flaps FL and ailerons of the aircraft 1.
  • The control device 100 inputs the acquired strain vector ε and steering angle vector δ into the pre-trained first model MDL1, and determines the load distribution of the main wing 10 and the angle of attack α based on the output result of the first model MDL1 to which these vectors were input.
  • The control device 100 then inputs, as the state variable s into the pre-trained second model MDL2, some or all of the load vector F indicating the determined load distribution of the main wing 10, the total load difference ΔFsum of the main wing 10, the angle of attack α of the main wing 10, and the steering angle vector δ of the movable wings such as the flaps FL, and determines the control amounts of the movable wings based on the output result of the second model MDL2 to which the state variable s was input.
  • Then, the control device 100 controls the movable wings based on the determined control amounts.
  • As a result, the structural load on the main wing 10 (for example, the wing root moment M) can be reduced.
  • In the above-described embodiment, the load vector F and the angle of attack α are determined from the strain vector ε and the steering angle vector δ using the first model MDL1, but the present invention is not limited to this.
  • For example, when the relationship between these quantities can be expressed by an approximation formula or a table instead of the neural networks by which the first model MDL1 is realized, the load vector F and the angle of attack α may be determined from the strain vector ε and the steering angle vector δ using such an approximation formula or table. That is, like the third model MDL3, the first model MDL1 may be table data or an approximation formula.
  • In the above-described embodiment, the control amount input to the first model MDL1 has been described as being the steering angle of a movable wing of the main wing 10, but the present invention is not limited to this.
  • For example, the control amount may include a sweep angle, a twist angle, and the like in addition to, or instead of, the steering angle.
  • The sweep angle is the angle formed with the pitch axis (Y axis) when the movable wing is rotated around the yaw axis (Z axis).
  • The twist angle is the angle formed with the roll axis (X axis) when the movable wing is rotated around the pitch axis (Y axis).
  • Further, the second model MDL2 may be trained to output the action variable a_t itself instead of the action value Q(s, a).
  • In the above-described embodiment, the control device 100 has been described as acquiring the strain vector ε, which is information indicating the strain of the main wing 10 of the aircraft 1, and the steering angle vector δ, which is information indicating the steering angles of movable wings such as the flaps FL and ailerons of the aircraft 1, and determining the control amounts of the movable wings of the aircraft from the acquired strain vector ε and steering angle vector δ using the pre-trained first model MDL1 and second model MDL2. However, the present invention is not limited to this.
  • For example, the control device 100 may determine the control amounts of various turbine blades by applying deep reinforcement learning in order to reduce the structural load on turbine blades included in a wind power generation device or a tidal current power generation device.
  • In this case, the turbine blade is provided with the optical fiber sensor SFB in the same manner as the main wing 10.
  • Wind power generation devices and tidal current power generation devices are other examples of the "structure", and turbine blades are another example of the "wing of a structure".
  • The turbine blade may be provided with flaps for controlling the air flow or water flow.
  • In this case, in the above-described embodiment, the main wing of the aircraft 1 may be replaced with a turbine blade, and the movable wings such as the flaps FL and ailerons of the aircraft 1 may be replaced with the flaps of the turbine blade.
  • Specifically, the control device 100 acquires the strain vector ε indicating the strain of the turbine blade detected by the optical fiber sensor SFB provided on the turbine blade of the wind power generation device or tidal current power generation device, and the steering angle vector δ indicating the steering angles of the flaps of the turbine blade.
  • The control device 100 inputs the acquired strain vector ε and steering angle vector δ into the pre-trained first model MDL1, and determines the load distribution of the turbine blade and the angle of attack α based on the output result of the first model MDL1 to which these vectors were input.
  • The control device 100 then inputs, as the state variable s into the pre-trained second model MDL2, some or all of the load vector F indicating the determined load distribution of the turbine blade, the total load difference ΔFsum of the turbine blade, the angle of attack α of the turbine blade, and the steering angle vector δ of the flaps, and determines the control amount of the flaps based on the output result of the second model MDL2 to which the state variable s was input.
  • Then, the control device 100 controls the flaps based on the determined control amount.
  • As a result, the way in which the turbine blades receive the air flow or water flow can be changed appropriately, so that the structural load on the turbine blades can be reduced and the power generation efficiency of the power generation device can be increased.
  • Further, the turbine blade may be a variable-pitch blade.
  • In this case, the turbine blade itself functions as a movable wing; that is, the turbine blade has both the function of the main wing and the function of the movable wing in the aircraft 1.
  • Specifically, the acquisition unit 112 of the control device 100 acquires the strain vector ε indicating the strain distribution of the turbine blade from the optical fiber sensor SFB provided on the turbine blade, and acquires the angle vector δ# indicating the pitch angle of the turbine blade from the actuator that rotates the turbine blade around the pitch axis.
  • Next, the control amount determination unit 114 inputs the strain vector ε and the angle vector δ# into the pre-trained first model MDL1.
  • It is assumed that the first model MDL1 has been trained in advance by the learning device 200 as described in the above embodiment.
  • That is, using the teacher data, the learning unit 214 of the learning device 200 trains the first model MDL1 so that, when the strain vector ε and the angle vector δ# of the turbine blade are input, it outputs the load vector F indicating the load distribution applied to the turbine blade and the angle of attack α of the turbine blade.
  • As a result, the first model MDL1 outputs the load vector F and the angle of attack α of the turbine blade when the strain vector ε and the angle vector δ# of the turbine blade are input.
  • Next, the control amount determination unit 114 acquires the load vector F and the angle of attack α of the turbine blade from the first model MDL1 to which the strain vector ε and the angle vector δ# were input.
  • Next, the control amount determination unit 114 calculates the total load difference ΔFsum, which is the difference between the sum of the load distribution indicated by the acquired load vector F and the sum of a target load distribution.
  • Next, the control amount determination unit 114 inputs some or all (preferably all) of the angle vector δ#, the angle of attack α, the load vector F, and the total load difference ΔFsum of the turbine blade into the second model MDL2 as the state variable s. It is assumed that the second model MDL2 has been trained in advance by the learning device 200 as described in the above embodiment. That is, the second model MDL2 is trained to output the value Q(s, a) of the action to be taken in accordance with the state variable s when some or all of the angle vector δ#, the angle of attack α, the load vector F, and the total load difference ΔFsum are input as the state variable s.
  • Next, the control amount determination unit 114 acquires the action value vector Q(s, a). Then, the control amount determination unit 114 determines the pitch angle of the turbine blade based on the acquired action value vector Q(s, a).
  • Then, the drive control unit 116 controls the actuator based on the pitch angle determined by the control amount determination unit 114 to rotate the turbine blade around the pitch axis.
  • 1... Aircraft, 10... Main wing, 12... Vertical stabilizer, 14... Horizontal stabilizer, 100... Control device, 102... Communication unit, 104... Drive unit, 110... Control unit, 112... Acquisition unit, 114... Control amount determination unit, 116... Drive control unit, 130... Storage unit, 200... Learning device, 202... Communication unit, 210... Control unit, 212... Acquisition unit, 214... Learning unit, 230... Storage unit, MDL1... First model, MDL2... Second model, MDL3... Third model

Landscapes

  • Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Automation & Control Theory (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Mechanical Engineering (AREA)
  • Health & Medical Sciences (AREA)
  • Force Measurement Appropriate To Specific Purposes (AREA)
  • Feedback Control In General (AREA)
  • Aerodynamic Tests, Hydrodynamic Tests, Wind Tunnels, And Water Tanks (AREA)

Abstract

A control device of this embodiment comprises: an acquisition unit that acquires information indicating the strain of a wing of a structure and information indicating a control amount of a movable wing of the structure; a first determination unit that determines a load and an angle of attack of the wing on the basis of the strain and the control amount indicated by the information acquired by the acquisition unit; a second determination unit that inputs, as state variables into a model trained by deep reinforcement learning, some or all of the load and the angle of attack determined by the first determination unit and the control amount indicated by the information acquired by the acquisition unit, and determines a control amount of the movable wing on the basis of an output result of the model to which the state variables were input; and a control unit that controls the movable wing on the basis of the control amount determined by the second determination unit.

Description

Control device, learning device, control method, learning method, and program
The present invention relates to a control device, a learning device, a control method, a learning method, and a program.
The present application claims priority based on Japanese Patent Application No. 2019-173265 filed on September 24, 2019, the contents of which are incorporated herein by reference.
In order to identify the distribution of the load applied to structures such as aircraft wings and wind turbine blades, there is known a technique for measuring the strain of those structures and identifying the load from the measured strain. For example, the techniques described in Non-Patent Documents 1 and 2 use an optical fiber sensor laid out over a wing structure to detect the strain of the structure and perform machine learning based on the detected strain, thereby identifying the load distribution with high accuracy and stability.
With the conventional techniques, it is desired to control the load distribution applied to a wing in real time based on rational judgment, for example, in order to achieve a goal such as reducing the moment and stress applied to the wing while keeping the total lift constant. However, this point has not been sufficiently studied in the conventional techniques, and there are cases where the structural load on the wing cannot be reduced.
One aspect of the present invention provides a control device, a learning device, a control method, a learning method, and a program capable of reducing the structural load on a wing.
One aspect of the present invention is a control device including: an acquisition unit that acquires information indicating the strain of a wing of a structure detected by an optical fiber sensor provided on the wing, and information indicating a control amount of a movable wing of the structure; a first determination unit that determines a load and an angle of attack of the wing based on the strain and the control amount indicated by the information acquired by the acquisition unit; a second determination unit that inputs, as a state variable into a model trained to output, when a state variable is input, the value of an action to be taken in accordance with the state variable or a variable indicating the action, some or all of the load and the angle of attack determined by the first determination unit and the control amount indicated by the information acquired by the acquisition unit, and determines a control amount of the movable wing based on an output result of the model to which the state variable was input; and a control unit that controls the movable wing based on the control amount determined by the second determination unit.
According to one aspect of the present invention, the structural load on the wing can be reduced.
FIG. 1 is a diagram showing an example of the configuration of an aircraft including the control device of the embodiment.
FIG. 2 is a diagram showing an example of the configuration of the control device of the embodiment.
FIG. 3 is a flowchart showing an example of a series of processes of the control unit of the embodiment.
FIG. 4 is a diagram schematically showing the first model.
FIG. 5 is a diagram schematically showing the second model.
FIG. 6 is a diagram showing an example of the configuration of the learning device of the embodiment.
FIG. 7 is a flowchart showing an example of a series of processes of the control unit of the embodiment.
FIG. 8 is a flowchart showing another example of a series of processes of the control unit of the embodiment.
Hereinafter, embodiments of the control device, learning device, control method, learning method, and program of the present invention will be described with reference to the drawings. When this application is translated from Japanese into English, as used throughout this disclosure, the singular forms "a", "an", and "the" may be considered to include plural references unless the context clearly indicates otherwise.
[Aircraft Configuration]
FIG. 1 is a diagram showing an example of the configuration of an aircraft 1 including the control device 100 of the embodiment. As shown, the aircraft 1 includes, for example, a main wing 10, a vertical stabilizer 12, a horizontal stabilizer 14, and a control device 100. In the figure, the X, Y, and Z axes represent the aircraft-fixed coordinate system: the X axis represents the roll axis, the Y axis represents the pitch axis, and the Z axis represents the yaw axis. The aircraft 1 is an example of a "structure".
The main wing 10 is a wing that generates the lift that supports the weight of the aircraft 1. For example, the main wing 10 is provided with flaps FL1 to FL8, an optical fiber sensor SFB, and a pressure sensor SP. The main wing 10 is an example of a "wing of a structure".
The flaps FL1 to FL8 are movable wings that increase the lift of the main wing 10. Hereinafter, when the flaps FL1 to FL8 are not distinguished from one another, they are collectively referred to as flaps FL. In addition to the flaps FL, the main wing 10 may be further provided with other movable wings such as ailerons (auxiliary wings) for rolling the airframe and spoilers for reducing lift. The aileron may be any of the flaps FL, or may be a movable wing provided separately from the flaps FL.
The optical fiber sensor SFB is provided in lines at, for example, several places on at least one side (for example, the upper surface) of the main wing 10. Each line of the optical fiber sensor SFB is attached, for example, along the main spar and the rear spar of the main wing 10 (along the Y-axis direction). On each line of the optical fiber sensor SFB, for example, FBGs (Fiber Bragg Gratings) are formed, and strain is sensed at intervals of about several millimeters to several tens of centimeters. As a result, the optical fiber sensor SFB can detect the strain of the main wing 10 at several tens to several thousand points as a discrete distribution.
The pressure sensor SP is, for example, a pitot-static tube, and detects the pressure applied to the main wing 10. For example, the pressure sensors SP are arranged in a one-dimensional array along the X-axis direction at the center of a span segment of the main wing 10. Specifically, the pressure sensors SP are installed at a dozen or so locations on the upper surface side of the main wing 10 and at a dozen or so locations on the lower surface side of the main wing 10. The pressure sensor SP detects the load distribution applied to the main wing 10 by integrating the detected pressure values over the cross section of the main wing 10.
The vertical stabilizer 12 and the horizontal stabilizer 14 are provided at positions away from the center of gravity of the aircraft 1 (for example, at the rear end of the airframe). The vertical stabilizer 12 may be provided with, for example, a rudder for controlling the movement of the airframe around the Z axis. The horizontal stabilizer 14 may be provided with, for example, an elevator for controlling the movement of the airframe around the Y axis.
[Control Device Configuration]
FIG. 2 is a diagram showing an example of the configuration of the control device 100 of the embodiment. As shown, the control device 100 includes, for example, a communication unit 102, a drive unit 104, a control unit 110, and a storage unit 130.
The communication unit 102 is, for example, a wireless communication module including a receiver and a transmitter, and wirelessly communicates with an external device via a network. The network may include, for example, a WAN (Wide Area Network) or a LAN (Local Area Network). The external device includes, for example, a learning device 200 described later.
 The drive unit 104 is, for example, an actuator such as a servomotor. The drive unit 104 drives the movable wings provided on the main wing 10, such as the flaps FL, ailerons, and spoilers. The drive unit 104 may also drive the rudder provided on the vertical stabilizer 12 and the elevator provided on the horizontal stabilizer 14.
 The control unit 110 includes, for example, an acquisition unit 112, a control amount determination unit 114, and a drive control unit 116. The control amount determination unit 114 is an example of a "first determination unit" and a "second determination unit".
 The components of the control unit 110 are realized, for example, by a processor such as a CPU (Central Processing Unit) or a GPU (Graphics Processing Unit) executing a program stored in the storage unit 130. Some or all of the components of the control unit 110 may be realized by hardware such as an LSI (Large Scale Integration), an ASIC (Application Specific Integrated Circuit), or an FPGA (Field-Programmable Gate Array), or by the cooperation of software and hardware.
 The storage unit 130 is realized by, for example, an HDD (Hard Disk Drive), a flash memory, an EEPROM (Electrically Erasable Programmable Read Only Memory), a ROM (Read Only Memory), or a RAM (Random Access Memory). In addition to various programs such as firmware and application programs, the storage unit 130 stores first model data D1 and second model data D2. The first model data D1 and the second model data D2 may be installed in the storage unit 130 from the learning device 200 via the network, or may be installed in the storage unit 130 from a portable storage medium connected to a drive device of the control device 100.
 The first model data D1 is information (a program or a data structure) defining a first model MDL1. The first model MDL1 is, for example, a model trained to output the load distribution and the angle of attack α of the main wing 10 when the strain distribution of the main wing 10 and the control amounts of the movable wings of the main wing 10 are input. Such a model may be realized, for example, by a model in which a plurality of neural networks are arranged in multiple stages. Each of the neural networks includes, for example, an input layer, at least one intermediate layer (hidden layer), and an output layer. The control amount includes, for example, a steering angle. Hereinafter, as an example, the control amount is described as being the steering angle.
 The second model data D2 is information (a program or a data structure) defining a second model MDL2. The second model MDL2 is, for example, a model that has learned an approximation of the action-value function Q(s_t, a_t) used in reinforcement learning. The action-value function Q(s_t, a_t) expresses, as a function, the value of selecting an action a_t under an environment state s_t at a time t. Accordingly, when an environment state s_t is input, the second model MDL2 outputs the value (also referred to as the Q value) of each of the one or more actions (action variables) a_t that can be taken under the environment state s_t. The second model MDL2 may be realized by, for example, a neural network including an input layer, a plurality of intermediate layers (hidden layers), and an output layer. This technique of having a neural network learn the action-value function Q(s_t, a_t) as an approximation function is called DQN (Deep Q-Network), one of the methods of deep reinforcement learning.
 The first model data D1 and the second model data D2 include, for example, various information such as connection information indicating how the units included in each of the layers constituting the neural network are connected to one another, and coupling coefficients applied to the data input to and output from the connected units. The connection information includes, for example, the number of units included in each layer, information specifying the type of unit each unit connects to, the activation function realizing each unit, and gates provided between units in the hidden layers. The activation function realizing a unit may be, for example, a rectified linear function (ReLU function), a sigmoid function, a step function, or another function. A gate, for example, selectively passes or weights the data transmitted between units according to the value returned by the activation function (for example, 1 or 0). The coupling coefficients include, for example, the weights applied to output data when data is output from a unit in one layer to a unit in a deeper layer in the hidden layers of the neural network. The coupling coefficients may also include a bias component specific to each layer.
 [Processing flow at runtime]
 Hereinafter, a series of processes performed by the control unit 110 during operation will be described with reference to a flowchart. Operation refers to an operating state in which the movable wings of the main wing 10 are controlled using the output results of the first model MDL1 and the second model MDL2 trained in advance. FIG. 3 is a flowchart showing an example of a series of processes performed by the control unit 110 of the embodiment. The processing of this flowchart may be repeated, for example, at a predetermined cycle.
 First, the acquisition unit 112 acquires, from the optical fiber sensor S_FB, information indicating the strain distribution of the main wing 10 (hereinafter referred to as strain information), and acquires, from the drive unit 104, information indicating the steering angles of the movable wings of the main wing 10 (hereinafter referred to as steering angle information) (step S100).
 The strain information and the steering angle information are, for example, multidimensional vectors. Hereinafter, the strain information vector is referred to as the "strain vector ξ(→)", and the steering angle information vector is referred to as the "steering angle vector δ(→)". Here, (→) represents a vector symbol.
 The strain vector ξ(→) includes as element values, for example, the strain values detected by each FBG of the optical fiber sensor S_FB provided along the main spar and the strain values detected by each FBG of the optical fiber sensor S_FB provided along the rear spar.
 The steering angle vector δ(→) includes, for example, the steering angle values of the flaps FL1 to FL8 as element values. The steering angle vector δ(→) may also include as elements the steering angle values of other movable wings such as ailerons and spoilers.
 The strain information and the steering angle information are not limited to vectors, that is, first-order tensors, and may be tensors of second or higher order.
 Next, the control amount determination unit 114 inputs the strain vector ξ(→) and the steering angle vector δ(→) acquired by the acquisition unit 112 into the first model MDL1 trained in advance (step S102).
 FIG. 4 is a diagram schematically showing the first model MDL1. As shown in the figure, the first model MDL1 is configured, for example, with a model MDL1-1 and a model MDL1-2 arranged in two stages. The model MDL1-1 and the model MDL1-2 are each neural networks.
 The strain vector ξ(→) and the steering angle vector δ(→) are input to the front-stage model MDL1-1. When these vectors are input, the front-stage model MDL1-1 outputs a vector whose elements are the distributed values of the load applied to the main wing 10 (hereinafter referred to as the load vector F(→)).
 In addition to the load vector F(→), which is the output of the front-stage model MDL1-1, the steering angle vector δ(→) that was also input to the front-stage model MDL1-1 is input to the rear-stage model MDL1-2. When the load vector F(→) and the steering angle vector δ(→) are input, the rear-stage model MDL1-2 outputs the angle of attack α of the main wing 10 as a zeroth-order tensor, that is, a scalar.
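 As a concrete illustration of this two-stage structure, the following is a minimal sketch in Python (PyTorch). All layer sizes and the choice of one hidden layer per stage are assumptions; the embodiment does not fix the network dimensions.

```python
import torch
import torch.nn as nn

class FirstModelMDL1(nn.Module):
    """Minimal sketch of the two-stage first model MDL1 (sizes are assumptions)."""

    def __init__(self, n_strain=200, n_ctrl=8, n_load=30, hidden=64):
        super().__init__()
        # Front stage MDL1-1: (strain vector, steering angle vector) -> load vector
        self.mdl1_1 = nn.Sequential(
            nn.Linear(n_strain + n_ctrl, hidden), nn.ReLU(),
            nn.Linear(hidden, n_load),
        )
        # Rear stage MDL1-2: (load vector, steering angle vector) -> angle of attack
        self.mdl1_2 = nn.Sequential(
            nn.Linear(n_load + n_ctrl, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, xi, delta):
        f = self.mdl1_1(torch.cat([xi, delta], dim=-1))     # load vector F(->)
        alpha = self.mdl1_2(torch.cat([f, delta], dim=-1))  # scalar angle of attack
        return f, alpha.squeeze(-1)
```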
 Returning to the flowchart of FIG. 3: when the control amount determination unit 114 inputs the strain vector ξ(→), which is the strain information, and the steering angle vector δ(→), which is the steering angle information, into the first model MDL1, it acquires from the front-stage model MDL1-1 information indicating the load distribution that is its output (hereinafter referred to as load distribution information), and acquires from the rear-stage model MDL1-2 information indicating the angle of attack α that is its output (hereinafter referred to as angle-of-attack information) (step S104).
 Next, the control amount determination unit 114 calculates the difference between the sum of the load distribution indicated by the acquired load distribution information and the sum of a target load distribution (hereinafter referred to as the total load difference ΔF_sum) (step S106). The sum of the load distribution is, for example, the sum of all the load values included as elements in the load vector F(→). The target load distribution may be, for example, the total load that the main wing 10 must bear in order for the aircraft 1 to maintain level flight.
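 In code, step S106 reduces to a single reduction over the load vector; a minimal sketch, assuming the load vector and the target total load are held as NumPy values:

```python
import numpy as np

def total_load_difference(load_vector, target_total_load):
    """Total load difference dF_sum = sum of F(->) minus the target total load."""
    return float(np.sum(load_vector)) - target_total_load
```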
 Next, the control amount determination unit 114 inputs some or all (preferably all) of the steering angle vector δ(→), the angle of attack α, the load vector F(→), and the total load difference ΔF_sum into the second model MDL2 as a state variable s (step S108). That is, the state variable s is a multidimensional vector whose elements include some or all of the steering angle vector δ(→), the angle of attack α, the load vector F(→), and the total load difference ΔF_sum.
 FIG. 5 is a diagram schematically showing the second model MDL2. As in the illustrated example, the state variable s, including some or all of the steering angle vector δ(→), the angle of attack α, the load vector F(→), and the total load difference ΔF_sum, is input to the second model MDL2. When the state variable s is input, the second model MDL2 outputs the value Q(s, a) of each of the one or more actions a that can be taken under the state variable s. When there are a plurality of actions a that can be taken under the state variable s, each of those actions has its own value. Accordingly, the action value Q(s, a) is represented by a multidimensional vector whose number of dimensions (= number of elements) equals the number of actions a. Hereinafter, this multidimensional vector is written as Q(s, a)(→).
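 A minimal DQN-style sketch of such a network, assuming the state variable s is a flat vector and that the output holds one Q value per (movable wing, option) pair as described below; all sizes are assumptions:

```python
import torch.nn as nn

class SecondModelMDL2(nn.Module):
    """Minimal DQN-style sketch of the second model MDL2 (sizes are assumptions)."""

    def __init__(self, n_state=48, hidden=128, n_wings=8, n_options=3):
        super().__init__()
        self.q_net = nn.Sequential(
            nn.Linear(n_state, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_wings * n_options),  # one Q value per (wing, option)
        )

    def forward(self, s):
        return self.q_net(s)  # action value Q(s, a)(->)
```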
 The action a is selected from, for example, the following three options. These three options (1) to (3) are merely examples; some may be omitted, and other options may be added.
 (1) Do not change the steering angle of the movable wing.
 (2) Increase the steering angle of the movable wing by 1 degree.
 (3) Decrease the steering angle of the movable wing by 1 degree.
 For example, when the movable wings to be controlled are the eight flaps FL1 to FL8, one of the options (1) to (3) is selected for each of the flaps FL1 to FL8. In this case, the action value Q(s, a)(→) output by the second model MDL2 is a 24-dimensional vector. As a result, the steering angles of all the movable wings to be controlled are determined at once within a single processing cycle.
 The action of option (1) may be defined as not moving the movable wing toward either the first surface side (the plus side of the steering angle), which is one surface of the movable wing, or the second surface side (the minus side of the steering angle), which is the other surface, with respect to the direction intersecting the wing surface of the movable wing. The action of option (2) may be defined as moving the movable wing toward the first surface side with respect to the direction intersecting the wing surface of the movable wing. The action of option (3) may be defined as moving the movable wing toward the second surface side with respect to the direction intersecting the wing surface of the movable wing.
 Returning to the flowchart of FIG. 3: when the second model MDL2 outputs the action value Q(s, a)(→), the control amount determination unit 114 acquires that action value Q(s, a)(→) (step S110). The control amount determination unit 114 then determines the steering angle of each movable wing to be controlled based on the acquired action value Q(s, a)(→) (step S112).
 For example, suppose attention is paid to the flap FL1 as a movable wing to be controlled. In this case, the action value Q(s, a)(→) includes as element values, for the flap FL1, the value of taking action (1), the value of taking action (2), and the value of taking action (3). For example, the control amount determination unit 114 selects the action a with the highest value from these three actions a for the flap FL1. Alternatively, as in the Epsilon-Greedy method, the control amount determination unit 114 may select an action at random from all the actions a with a certain probability ε, and select the action a with the highest value with the remaining probability (1 − ε).
 The control amount determination unit 114 then determines, based on the action a determined for each movable wing, the steering angle that each movable wing should take in the next cycle t + 1.
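 Steps S110 to S112 can be sketched as follows in Python/NumPy, assuming the 24-dimensional action value is laid out as 8 wings × 3 options; the layout, the ε value, and the ±1-degree increments follow options (1) to (3) above, and the function and variable names are hypothetical:

```python
import numpy as np

STEP_DEG = np.array([0.0, +1.0, -1.0])  # options (1), (2), (3)

def decide_steering(q_values, delta_now, eps=0.1, n_wings=8, n_options=3):
    """Per-wing Epsilon-Greedy selection over Q(s, a)(->), then conversion of the
    chosen options into the steering angles for the next cycle t + 1."""
    q = np.asarray(q_values).reshape(n_wings, n_options)
    greedy = q.argmax(axis=1)                        # highest-value option per wing
    explore = np.random.rand(n_wings) < eps          # explore with probability eps
    random_a = np.random.randint(n_options, size=n_wings)
    actions = np.where(explore, random_a, greedy)
    return np.asarray(delta_now) + STEP_DEG[actions], actions
```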
 Next, the drive control unit 116 controls each actuator included in the drive unit 104 based on the steering angles determined by the control amount determination unit 114 to drive the movable wings (step S114). Specifically, the drive control unit 116 determines the operation amount of each actuator from the steering angle (control amount) determined by the control amount determination unit 114, and controls each actuator with the determined operation amount, thereby driving the movable wings. The processing of this flowchart then ends.
 [Configuration of the learning device]
 Hereinafter, the learning device 200 that trains the first model MDL1 and the second model MDL2 will be described. The learning device 200 may be a single device, or may be a system in which a plurality of devices connected via a network such as a WAN or a LAN operate in cooperation with one another. That is, the learning device 200 may be realized by a plurality of computers (processors) included in a system using distributed computing or cloud computing.
 FIG. 6 is a diagram showing an example of the configuration of the learning device 200 of the embodiment. As shown in the figure, the learning device 200 includes, for example, a communication unit 202, a control unit 210, and a storage unit 230.
 The communication unit 202 is, for example, a wireless communication module including a receiver and a transmitter, and communicates wirelessly with external devices such as the control device 100 via a network.
 The control unit 210 includes, for example, an acquisition unit 212 and a learning unit 214. The components of the control unit 210 are realized, for example, by a processor such as a CPU or a GPU executing a program stored in the storage unit 230. Some or all of the components of the control unit 210 may be realized by hardware such as an LSI, an ASIC, or an FPGA, or by the cooperation of software and hardware.
 The storage unit 230 is realized by, for example, an HDD, a flash memory, an EEPROM, a ROM, or a RAM. In addition to various programs such as firmware and application programs, the storage unit 230 stores the first model data D1 and the second model data D2 described above, third model data D3, and teacher data D4.
 The third model data D3 is information (a program or a data structure) defining a third model MDL3. The third model MDL3 is a simulator for performing deep reinforcement learning, and is not used during the operation described above.
 The third model MDL3 is, for example, a model trained to output the load vector F(→) representing the load distribution of the main wing 10 when the steering angle vector δ(→) representing the steering angles of the movable wings and the angle of attack α of the main wing 10 are input. Such a model may be realized, for example, by a neural network including an input layer, at least one intermediate layer (hidden layer), and an output layer.
 The third model MDL3 may also be a simple static data set or database representing the results of a wind tunnel test described later. For example, suppose a tester arbitrarily determines the steering angles of the movable wings and the angle of attack α of the main wing 10, performs a wind tunnel test, and observes the load distribution of the main wing 10 during the test. In this case, the third model MDL3 may be table data in which the observed load distribution of the main wing 10 is associated with each data set of the steering angles of the movable wings and the angle of attack α of the main wing 10. Instead of table data, the third model MDL3 may be a numerical model in which the relationship between the input values, namely the steering angles of the movable wings and the angle of attack α of the main wing 10, and the output value, namely the load distribution of the main wing 10, is defined by a functional expression or the like.
 When the third model MDL3 is realized by a neural network, the third model data D3 may include various information such as connection information, like the first model data D1 and the second model data D2.
 The teacher data D4 is data for training the first model MDL1. For example, the teacher data D4 is a data set in which the correct load vector F(→) and angle of attack α that the first model MDL1 should output are associated, as teacher labels (also called targets), with the strain vector ξ(→), which is the strain information, and the steering angle vector δ(→), which is the steering angle information. Such teacher data D4 may be obtained, for example, by performing a wind tunnel test.
 For example, a wing identical to the main wing 10 of the aircraft 1, or a wing of a similar model provided with the optical fiber sensor S_FB and the pressure sensors S_P, is placed on a rotatable test device such as a turntable in a test chamber in which the wind tunnel test is performed. Then, while an airflow is generated in the test chamber, the movable wings are driven while the rotation angle of the rotatable test device is changed one degree at a time. As a result, in an environment in which an airflow is generated (an environment in which a known load is applied), the strain and the load of the wing, which can change according to the steering angles of the movable wings, are detected by the optical fiber sensor S_FB and the pressure sensors S_P. In this way, the teacher data D4 may be generated by virtually creating, through the wind tunnel test, the same environment as when the aircraft 1 is flying.
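 Purely as an illustration of how such wind-tunnel records pair inputs with teacher labels, one record of the teacher data D4 might be organized as follows; all field names and array sizes are hypothetical, not taken from the source:

```python
import numpy as np

# One hypothetical wind-tunnel record of the teacher data D4.
teacher_record = {
    "strain_vector": np.zeros(200),     # xi(->): FBG strain values along the spars
    "steering_vector": np.zeros(8),     # delta(->): flap angles set during the test
    "load_vector_label": np.zeros(30),  # F(->): load distribution observed via S_P
    "alpha_label": 2.0,                 # angle of attack set on the turntable [deg]
}
```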
 [Processing flow during learning (training)]
 Hereinafter, a series of processes performed by the control unit 210 during learning will be described with reference to flowcharts. Learning refers to an operating state in which the first model MDL1 and the second model MDL2 referred to during operation are learned (trained). FIG. 7 is a flowchart showing an example of a series of processes performed by the control unit 210 of the embodiment. The processing of this flowchart may be repeated at a predetermined cycle, for example, when the first model MDL1 is trained. When the learning device 200 is realized by a plurality of computers included in a system using distributed computing or cloud computing, some or all of the processing of this flowchart may be processed in parallel by the plurality of computers.
 First, the acquisition unit 212 acquires, from the teacher data D4 stored in the storage unit 230, the strain vector ξ(→) and the steering angle vector δ(→) associated with the teacher labels (step S200).
 Next, the learning unit 214 inputs the strain vector ξ(→) and the steering angle vector δ(→) acquired by the acquisition unit 212 into the untrained first model MDL1 (step S202).
 Next, the learning unit 214 acquires the load distribution information and the angle-of-attack information from the first model MDL1 (step S204).
 Next, the learning unit 214 calculates the difference between the load vector F(→) associated as the teacher label with the strain vector ξ(→) and steering angle vector δ(→) acquired in S200 and input to the first model MDL1, and the load vector F(→) output by the first model MDL1 as the load distribution information; the learning unit 214 also calculates the difference between the angle of attack α associated as the teacher label and the angle of attack α output by the first model MDL1 as the angle-of-attack information (step S206).
 Next, the learning unit 214 trains the first model MDL1 so that the difference in the load vector F(→) and the difference in the angle of attack α become small (step S208). For example, the learning unit 214 determines (updates) the parameters of the first model MDL1, such as the weighting coefficients and bias components, using stochastic gradient descent or the like so that each difference becomes small.
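 One iteration of S202 to S208 could be sketched as follows, reusing the FirstModelMDL1 sketch shown earlier; the loss function (mean squared error on both outputs with equal weighting) and the learning rate are assumptions:

```python
import torch

model = FirstModelMDL1()                      # two-stage sketch shown earlier
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
mse = torch.nn.MSELoss()

def train_step(xi, delta, f_label, alpha_label):
    """One supervised iteration over a teacher-data batch (steps S202 to S208)."""
    f_pred, alpha_pred = model(xi, delta)     # S202/S204: forward pass
    loss = mse(f_pred, f_label) + mse(alpha_pred, alpha_label)  # S206: differences
    optimizer.zero_grad()
    loss.backward()                           # S208: reduce both differences
    optimizer.step()
    return loss.item()
```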
 The learning unit 214 stores the trained first model MDL1 in the storage unit 230 as the first model data D1.
 In this way, the learning unit 214 repeats (iterates) the processing of S200 to S208 to train the first model MDL1. The learning unit 214 then transmits the first model data D1 defining the sufficiently trained first model MDL1 to the control device 100 via, for example, the communication unit 202. The processing of this flowchart then ends.
 FIG. 8 is a flowchart showing another example of a series of processes performed by the control unit 210 of the embodiment. The processing of this flowchart may be repeated at a predetermined cycle, for example, when the second model MDL2 is trained. When the learning device 200 is realized by a plurality of computers included in a system using distributed computing or cloud computing, some or all of the processing of this flowchart may be processed in parallel by the plurality of computers.
 First, the acquisition unit 212 acquires the state variable s(t) of the main wing 10, or of a wing of a similar model, at a certain cycle (time) t (step S300). The state variable s(t) here is the same as the variable input to the second model MDL2 during operation. For example, when, in the processing of S108 during operation, a state variable s(t) including all of the steering angle vector δ(→), the angle of attack α, the load vector F(→), and the total load difference ΔF_sum is input to the second model MDL2, a state variable s(t) including all of these is acquired in the processing of S300 during learning.
 Next, the learning unit 214 inputs the state variable s(t) acquired by the acquisition unit 212 into the second model MDL2 (step S302).
 Next, when the second model MDL2 outputs the action value Q(s, a)(→), the learning unit 214 acquires that action value Q(s, a)(→) from the second model MDL2 (step S304).
 Next, based on the acquired action value Q(s, a)(→), the learning unit 214 selects the optimal action a from among the one or more actions that can be taken under the state variable s (step S306). The optimal action a may be, for example, the action with the highest value, or an action based on the Epsilon-Greedy method. When the Epsilon-Greedy method is adopted to select the optimal action a, the learning unit 214 may reduce the probability ε each time the processing of this flowchart is repeated (as the number of iterations increases). The optimal action a may also be selected using the roulette selection technique of genetic algorithms, or using a softmax technique based on the Boltzmann distribution.
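 For reference, the softmax (Boltzmann) selection mentioned here can be sketched as follows for the three Q values of one movable wing; the temperature parameter is an assumption:

```python
import numpy as np

def boltzmann_select(q_row, temperature=1.0):
    """Sample one option for a movable wing from a Boltzmann (softmax) distribution
    over its Q values; higher-value options are chosen more often."""
    logits = np.asarray(q_row, dtype=float) / temperature
    p = np.exp(logits - logits.max())  # subtract the max for numerical stability
    p /= p.sum()
    return int(np.random.choice(len(p), p=p))
```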
 Next, the learning unit 214 inputs the action variable a representing the selected action into the trained third model MDL3 (step S308). That is, the learning unit 214 inputs, into the trained third model MDL3, the steering angle vector δ(→) whose elements are the steering angles that the movable wings to be controlled should take in the next cycle t + 1. At this time, in addition to the steering angle vector δ(→) for the next cycle t + 1 obtained from the output of the second model MDL2, the learning unit 214 also inputs to the third model MDL3 the angle of attack α of the current cycle t, that is, the angle of attack α input to the second model MDL2 as part of the state variable s in the processing of S302.
 Next, the learning unit 214 acquires, from the third model MDL3, the load distribution information (that is, the load vector F(→)) for the next cycle t + 1 as information representing the state variable s′ of the main wing 10 in the next cycle t + 1 (step S310).
 Next, the learning unit 214 calculates the total load difference ΔF_sum based on the load vector F(→) acquired as the state variable s′, and inputs some or all of the calculated total load difference ΔF_sum, the steering angle vector δ(→), the angle of attack α, and the load vector F(→) into the second model MDL2 as the state variable s′ (step S312). The angle of attack α included in the state variable s′ input to the second model MDL2 may be the angle of attack α in the current cycle t. That is, the learning unit 214 may carry over, as-is, the value of the angle of attack α in the cycle t acquired in the processing of S300 and input it to the second model MDL2 as part of the state variable s′.
 Next, the learning unit 214 calculates the reward for the action a selected in the processing of S306 (step S314). For example, the learning unit 214 may calculate the reward r(t) in the current cycle t based on formula (1).
 \[
 r(t) =
 \begin{cases}
 0, & \text{if } |\Delta F_{\mathrm{sum}}(t)| > 5\,[\mathrm{N}] \ \text{or}\ M(t) > 1.2\,M(t{=}0) \\[4pt]
 -\dfrac{M(t) - M(t{=}0)}{F_{\mathrm{sum}}(t)/F_{\mathrm{sum}}(t{=}0)}, & \text{otherwise}
 \end{cases}
 \tag{1}
 \]
 In the formula, M represents the wing root moment calculated from the load distribution, and F_sum represents the sum of the load distributed over the main wing 10. The learning unit 214 sets the reward r(t) to zero when a predetermined condition is satisfied. As shown in formula (1), the predetermined condition may include the absolute value of the total load difference ΔF_sum exceeding the upper limit of a certain allowable range (for example, 5 [N (newtons)]), or the wing root moment M(t) at time t exceeding 1.2 times the wing root moment M(t=0) at the initial time. The initial time is, for example, the time in the level flight state before the control device 100 starts the control that reduces the structural load of the aircraft 1 (that is, the control during operation). The predetermined condition may further include the absolute value of the total load difference ΔF_sum falling below the lower limit of a certain allowable range.
 When the predetermined condition is not satisfied, the learning unit 214 makes the reward r(t) larger than when the predetermined condition is satisfied. That is, when the absolute value of the total load difference ΔF_sum is equal to or less than the upper limit of the allowable range and the wing root moment M(t) at time t is equal to or less than 1.2 times the wing root moment M(t=0) at the initial time, the learning unit 214 sets the reward r(t) to a value greater than zero. When the predetermined condition includes the absolute value of the total load difference ΔF_sum falling below the lower limit of the allowable range, the learning unit 214 may set the reward r(t) to a value greater than zero when the absolute value of the total load difference ΔF_sum is within the allowable range and the wing root moment M(t) at time t is equal to or less than 1.2 times the wing root moment M(t=0) at the initial time.
 Specifically, when the predetermined condition is not satisfied, the learning unit 214 sets the reward r(t) to a value based on the difference between the wing root moment M(t) at time t and the wing root moment M(t=0) at the initial time, and the quotient of the sum F_sum(t) of the load distribution at time t and the sum F_sum(t=0) of the load distribution at the initial time.
 Instead of a value based on the difference in the wing root moments and the quotient of the sums of the load distribution, the learning unit 214 may set the reward r(t) to a value based on the difference between the wing root moment M(t) and the wing root moment M(t=0), and the difference between the sum F_sum(t) of the load distribution and the sum F_sum(t=0) of the load distribution. The difference between the sum F_sum(t) and the sum F_sum(t=0) of the load distribution may be, for example, the absolute value of F_sum(t) − F_sum(t=0).
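 A sketch of the reward computation of step S314, following the conditions above. The closed form of the "otherwise" branch in formula (1) is an assumption reconstructed from this description (moment reduction normalized by the total-load ratio); only the zero conditions are stated explicitly in the text:

```python
def reward(m_t, m_0, f_sum_t, f_sum_0, d_f_sum, load_limit=5.0):
    """Reward r(t) of step S314: zero when the predetermined condition holds,
    otherwise a value based on the wing-root-moment difference and the
    total-load quotient (the exact closed form is an assumed reconstruction)."""
    if abs(d_f_sum) > load_limit or m_t > 1.2 * m_0:
        return 0.0  # predetermined condition satisfied
    # Assumed form: larger when the wing root moment has been reduced,
    # normalized by the ratio of the current to the initial total load.
    return -(m_t - m_0) / (f_sum_t / f_sum_0)
```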
 The learning unit 214 may apply a negative reward (penalty) to the calculated reward r(t) based on formula (2).
 \[
 r(t) \leftarrow r(t) - 3\sum_{i}\left|\delta_i(t) - \delta_i(t-1)\right|
 \tag{2}
 \]
 In the formula, ΣΔδ(→) represents the total obtained by computing, for each of the movable wings to be controlled, the difference between the steering angle of the movable wing at the previous time t − 1 and the steering angle of the movable wing at the current time t, and then adding together the absolute values of all the computed steering angle differences.
 For example, the learning unit 214 multiplies the sum ΣΔδ(→) of the absolute values of the steering angle differences of the movable wings by an arbitrary weighting coefficient (formula (2) uses a weighting coefficient of 3 as an example), and subtracts the product of the sum ΣΔδ(→) and the weighting coefficient from the reward r(t) calculated in the processing of S314. As a result, the second model MDL2 is trained so that the reward r(t) becomes smaller as the number of movable wings that are moved, among the plurality of movable wings to be controlled, increases. Consequently, the control surfaces (movable wings) can be controlled efficiently without being moved frequently.
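 In code, the penalty of formula (2) amounts to a single subtraction; a minimal sketch using the example weighting coefficient of 3:

```python
import numpy as np

def apply_steering_penalty(r_t, delta_now, delta_prev, weight=3.0):
    """Subtract weight * sum(|steering angle change|) from the reward (formula (2))."""
    change = np.abs(np.asarray(delta_now) - np.asarray(delta_prev))
    return r_t - weight * float(np.sum(change))
```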
 Next, the learning unit 214 trains the second model MDL2 based on the calculated reward r(t), the action value Q(s′, a′) output by the second model MDL2 when the state variable s′ is input, and the action value Q(s, a) output by the second model MDL2 when the state variable s is input (step S316).
 For example, using the second model MDL2, the learning unit 214 obtains the action value Q(s′, a′) for each of the plurality of actions a′ that can be taken at the next time t + 1, and selects the maximum value max Q(s′, a′) from among the action values Q(s′, a′) corresponding to the actions a′. The learning unit 214 multiplies the selected action value max Q(s′, a′) by a weighting coefficient called the discount rate γ (0 < γ < 1), and further adds the reward r(t).
 The learning unit 214 then trains the second model MDL2 so that the difference between r(t) + γ max Q(s′, a′) and Q(s, a) becomes small. For example, the learning unit 214 determines (updates) the parameters of the second model MDL2, such as the weighting coefficients and bias components, using stochastic gradient descent or the like so that the difference between r(t) + γ max Q(s′, a′) and Q(s, a) becomes small. Q(s, a) is an example of a "first value", and Q(s′, a′) is an example of a "second value".
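 A sketch of this update for a single transition, using the standard DQN target r(t) + γ max Q(s′, a′); the per-wing action structure is simplified here to a single action index, and the discount rate value is an assumption:

```python
import torch

GAMMA = 0.99  # discount rate gamma (assumed value, 0 < gamma < 1)

def dqn_update(mdl2, optimizer, s, a_index, r_t, s_next):
    """One DQN step: shrink the gap between Q(s, a) and r(t) + gamma * max Q(s', a')."""
    q_sa = mdl2(s)[a_index]                        # first value Q(s, a)
    with torch.no_grad():
        target = r_t + GAMMA * mdl2(s_next).max()  # target from the second value
    loss = (q_sa - target) ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```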
 The learning unit 214 stores the trained second model MDL2 in the storage unit 230 as the second model data D2.
 In this way, the learning unit 214 repeats (iterates) the processing of S300 to S316 to train the second model MDL2. The learning unit 214 then transmits the second model data D2 defining the sufficiently trained second model MDL2 to the control device 100 via, for example, the communication unit 202. The processing of this flowchart then ends.
 According to the embodiment described above, the control device 100 acquires the strain vector ξ(→), which is information indicating the strain of the main wing 10 of the aircraft 1, and the steering angle vector δ(→), which is information indicating the steering angles of the movable wings of the aircraft 1 such as the flaps FL and ailerons. The control device 100 inputs the acquired strain vector ξ(→) and steering angle vector δ(→) into the first model MDL1 trained in advance, and determines the load distribution and the angle of attack α of the main wing 10 based on the output of the first model MDL1 into which these vectors are input. The control device 100 inputs some or all of the load vector F(→) indicating the determined load distribution of the main wing 10, the total load difference ΔF_sum of the main wing 10, the angle of attack α of the main wing 10, and the steering angle vector δ(→) of the movable wings such as the flaps FL, as the state variable s, into the second model MDL2 trained in advance, and determines the control amounts of the movable wings based on the output of the second model MDL2 into which the state variable s is input. The control device 100 then controls the movable wings based on the determined control amounts. This makes it possible, for example, to reduce the structural load of the main wing 10 (for example, the wing root moment M) while keeping the total lift of the aircraft 1 constant.
 (Modifications)
 Hereinafter, modifications of the embodiment described above will be described. In the embodiment described above, the first model MDL1 is used to determine the load vector F(→) and the angle of attack α from the strain vector ξ(→) and the steering angle vector δ(→), but the present invention is not limited to this. For example, when the correlation between the strain vector ξ(→) and steering angle vector δ(→) on the one hand and the load vector F(→) and angle of attack α on the other can be expressed by an approximate expression or a table, the load vector F(→) and the angle of attack α may be determined from the strain vector ξ(→) and the steering angle vector δ(→) using such an approximate expression or table instead of a neural network realizing the first model MDL1. That is, like the third model MDL3, the first model MDL1 may be table data or an approximate expression.
 Furthermore, in the embodiment described above, the control amount input to the first model MDL1 is the steering angle of the movable wing of the main wing 10, but the present invention is not limited to this. For example, the control amount may include a sweep angle or a twist angle in addition to, or instead of, the steering angle. The sweep angle is the angle formed with the pitch axis (Y axis) when the movable wing is rotated about the yaw axis (Z axis). The twist angle is the angle formed with the roll axis (X axis) when the movable wing is rotated about the pitch axis (Y axis).
 Furthermore, in the embodiment described above, the second model MDL2 is trained so that, when an environment state s_t is input, it outputs the value (= the action value Q(s, a)) of each of the one or more actions (action variables) a_t that can be taken under the environment state s_t, but the present invention is not limited to this. For example, the second model MDL2 may be trained to output the action variable a_t instead of the action value Q(s, a).
 Furthermore, in the embodiment described above, the control device 100 acquires the strain vector ξ(→), which is information indicating the strain of the main wing 10 of the aircraft 1, and the steering angle vector δ(→), which is information indicating the steering angles of the movable wings of the aircraft 1 such as the flaps FL and ailerons, and determines the control amounts of the movable wings of the aircraft 1 from the acquired strain vector ξ(→) and steering angle vector δ(→) using the first model MDL1 and the second model MDL2 trained in advance, but the present invention is not limited to this.
 [Turbine blades with movable wings]
 For example, the control device 100 may apply deep reinforcement learning to determine the control amounts of various turbine blades in order to reduce the structural load of the turbine blades of a wind power generation device or the turbine blades of a tidal current power generation device. In this case, the turbine blades are provided with the optical fiber sensor S_FB, like the main wing 10. The wind power generation device and the tidal current power generation device are other examples of the "structure", and the turbine blade is another example of the "wing of the structure".
 For example, a turbine blade may be provided with flaps for controlling the airflow or water flow. In this case, in the description of the embodiment above, the main wing of the aircraft 1 may be replaced with a turbine blade, and the movable wings of the aircraft 1, such as the flaps FL and ailerons, may be replaced with the flaps of the turbine blade.
 That is, the control device 100 acquires the strain vector ξ(→) indicating the strain of the turbine blade detected by the optical fiber sensor S_FB provided on the turbine blade of the wind power generation device or the tidal current power generation device, and the steering angle vector δ(→) indicating the steering angles of the flaps of that turbine blade.
 The control device 100 inputs the acquired strain vector ξ(→) and steering angle vector δ(→) into the first model MDL1 trained in advance, and determines the load distribution and the angle of attack α of the turbine blade based on the output of the first model MDL1 into which these vectors are input.
 The control device 100 inputs some or all of the load vector F(→) indicating the determined load distribution of the turbine blade, the total load difference ΔF_sum of the turbine blade, the angle of attack α of the turbine blade, and the steering angle vector δ(→) of the flaps, as the state variable s, into the second model MDL2 trained in advance, and determines the control amounts of the flaps based on the output of the second model MDL2 into which the state variable s is input.
 The control device 100 then controls the flaps based on the determined control amounts. This makes it possible to appropriately change how the turbine blades receive the airflow or water flow, thereby reducing the structural load of the turbine blades and increasing the power generation efficiency of the power generation device.
 [Variable-pitch turbine blades]
 For example, the turbine blades may be variable-pitch blades. In this case, the turbine blade itself functions as a movable wing. That is, the turbine blade serves the functions of both the main wing and the movable wing of the aircraft 1.
 In such a case, the acquisition unit 112 of the control device 100 acquires, from the optical fiber sensor S_FB provided on the turbine blade, the strain vector ξ(→) indicating the strain distribution of the turbine blade, and acquires, from the actuator that rotates the turbine blade about the pitch axis, the angle vector δ#(→) indicating the pitch angle of the turbine blade.
 When the strain vector ξ(→) and the angle vector δ#(→) of the turbine blade are acquired by the acquisition unit 112, the control amount determination unit 114 inputs the strain vector ξ(→) and the angle vector δ#(→) into the first model MDL1 trained in advance.
 For example, the first model MDL1 is assumed to have been trained in advance by the learning device 200 as described in the embodiment above. Specifically, using teacher data, the learning unit 214 of the learning device 200 trains the first model MDL1 to output the load vector F(→) indicating the load distribution applied to the turbine blade and the angle of attack α of the turbine blade when the strain vector ξ(→) and the angle vector δ#(→) of the turbine blade are input. As a result, when the strain vector ξ(→) and the angle vector δ#(→) of the turbine blade are input, the first model MDL1 outputs the load vector F(→) and the angle of attack α of the turbine blade.
 The control amount determination unit 114 acquires the load vector F(→) and the angle of attack α of the turbine blade from the first model MDL1 into which the strain vector ξ(→) and the angle vector δ#(→) have been input.
 The control amount determination unit 114 calculates the total load difference ΔF_sum, which is the difference between the sum of the load distribution indicated by the acquired load vector F(→) and the sum of a target load distribution.
 The control amount determination unit 114 inputs some or all (preferably all) of the angle vector δ#(→), the angle of attack α, the load vector F(→), and the total load difference ΔF_sum of the turbine blade into the second model MDL2 as the state variable s. The second model MDL2 is assumed to have been trained in advance by the learning device 200 as described in the embodiment above. That is, the second model MDL2 is trained so that, when some or all of the angle vector δ#(→), the angle of attack α, the load vector F(→), and the total load difference ΔF_sum are input as the state variable s, it outputs the value Q(s, a)(→) of the actions to be taken according to the state variable s.
 When the second model MDL2 outputs the action value Q(s, a)(→), the control amount determination unit 114 acquires that action value Q(s, a)(→). The control amount determination unit 114 then determines the pitch angle of the turbine blade on the basis of the acquired action value Q(s, a)(→).
 The drive control unit 116 controls the actuator on the basis of the pitch angle determined by the control amount determination unit 114 to rotate the turbine blade about the pitch axis. Accordingly, even when the turbine blade is a variable pitch blade, the way in which the turbine blade receives the airflow or water flow can be changed appropriately, so that the structural load on the turbine blade can be reduced and the power generation efficiency of the power generation device can be increased.
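 In the same illustrative sketch, the decision and drive steps just described reduce to assembling the state variable s, evaluating the action values Q(s, a)(→), and commanding the greedy action. The three-action discretization (pitch down, hold, pitch up) and the step size are assumptions, and mdl2 below is only a placeholder for the trained second model:

    PITCH_STEP_DEG = 0.1   # assumed pitch increment per control step

    def mdl2(state: np.ndarray) -> np.ndarray:
        # Placeholder for the trained second model MDL2: a real implementation
        # would evaluate a trained Q-network on the state variable s and
        # return Q(s, a)(->) for the three actions.
        return rng.normal(size=3)

    # State variable s: pitch angle, angle of attack, load vector, ΔFsum.
    s = np.concatenate([pitch, [alpha], F, [dF_sum]])

    q = mdl2(s)                    # action values Q(s, a)(->)
    action = int(np.argmax(q))     # greedy action selection
    new_pitch = float(pitch[0] + (action - 1) * PITCH_STEP_DEG)  # actuator command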
 Although modes for carrying out the present invention have been described above by way of embodiments, the present invention is not limited in any way to these embodiments, and various modifications and substitutions can be made without departing from the gist of the present invention.
1 ... Aircraft, 10 ... Main wing, 12 ... Vertical stabilizer, 14 ... Horizontal stabilizer, 100 ... Control device, 102 ... Communication unit, 104 ... Drive unit, 110 ... Control unit, 112 ... Acquisition unit, 114 ... Control amount determination unit, 116 ... Drive control unit, 130 ... Storage unit, 200 ... Learning device, 202 ... Communication unit, 210 ... Control unit, 212 ... Acquisition unit, 214 ... Learning unit, 230 ... Storage unit, MDL1 ... First model, MDL2 ... Second model, MDL3 ... Third model

Claims (17)

  1.  A control device comprising:
     an acquisition unit that acquires information indicating strain of a wing of a structure detected by an optical fiber sensor provided on the wing of the structure, and information indicating a control amount of a movable wing of the structure;
     a first determination unit that determines a load and an angle of attack of the wing on the basis of the strain and the control amount indicated by the information acquired by the acquisition unit;
     a second determination unit that inputs, to a model trained to output, when a state variable is input, a value of an action to be taken according to the state variable or a variable indicating the action, some or all of the load and the angle of attack determined by the first determination unit and the control amount indicated by the information acquired by the acquisition unit as the state variable, and that determines a control amount of the movable wing on the basis of an output result of the model to which the state variable has been input; and
     a control unit that controls the movable wing on the basis of the control amount determined by the second determination unit.
  2.  The control device according to claim 1, wherein
     the first determination unit inputs the strain and the control amount indicated by the information acquired by the acquisition unit to a second model trained to output the load and the angle of attack of the wing when the strain and the control amount are input, and determines the load and the angle of attack of the wing on the basis of an output result of the second model to which the strain and the control amount have been input.
  3.  A learning device comprising:
     an acquisition unit that acquires information including some or all of a control amount of a movable wing of a structure, a load of a wing of the structure, and an angle of attack of the wing of the structure; and
     a learning unit that uses deep reinforcement learning to train a model so that, when the information acquired by the acquisition unit is input as a state variable, the model outputs a value of an action to be taken according to the input state variable or a variable indicating the action.
  4.  The learning device according to claim 3, wherein
     the acquisition unit acquires first information including some or all of the control amount, the load, and the angle of attack at a first time, and second information including some or all of the control amount, the load, and the angle of attack at a second time later than the first time, and
     the learning unit trains the model on the basis of a first value representing the value output by the model when the first information is input to the model, a second value representing the value output by the model when the second information is input to the model, and a reward for an action selected on the basis of the first value.
  5.  The learning device according to claim 4, wherein the learning unit:
     selects, on the basis of the first value, one action from among moving the movable wing toward a first surface side that is one surface of a wing surface of the movable wing with respect to a direction intersecting the wing surface, moving the movable wing toward a second surface side that is the other surface of the wing surface with respect to the direction, and moving the movable wing toward neither the first surface side nor the second surface side with respect to the direction;
     calculates the reward for the selected action; and
     trains the model on the basis of the first value, the second value, and the calculated reward.
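     Claims 4 and 5 describe a value-based update of the kind used in deep Q-learning: the first value selects the action, and the model is trained against the reward and the second value. A minimal sketch of such an update target, with an assumed discount factor and purely illustrative numbers, is:

    import numpy as np

    GAMMA = 0.99  # discount factor (an assumption; the claims do not fix one)

    # Q-values output by the model for the first information (first value) and
    # for the second, later information (second value); numbers are illustrative.
    # The three actions correspond to the choice in claim 5: move toward the
    # first surface side, do not move, move toward the second surface side.
    q_first = np.array([0.2, -0.1, 0.4])
    q_second = np.array([0.1, 0.3, 0.0])

    action = int(np.argmax(q_first))   # action selected from the first value
    reward_value = 0.5                 # reward for that action (placeholder)

    # Temporal-difference target and squared-error loss, as in deep Q-learning.
    td_target = reward_value + GAMMA * float(np.max(q_second))
    loss = (q_first[action] - td_target) ** 2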
  6.  The learning device according to claim 4 or 5, wherein the learning unit sets the reward to zero when a predetermined condition is satisfied, the predetermined condition being that the load of the wing at a target time exceeds an upper limit of an allowable range, that the load of the wing at the target time falls below a lower limit of the allowable range, or that a moment of the wing at the target time exceeds a moment of the wing at an initial time.
  7.  The learning device according to claim 6, wherein the learning unit makes the reward larger when the predetermined condition is not satisfied than when the predetermined condition is satisfied.
  8.  The learning device according to claim 7, wherein, when the predetermined condition is not satisfied, the learning unit sets the reward to a value based on a difference between the moment of the wing at the target time and the moment of the wing at the initial time, and a quotient of the load of the wing at the target time and the load of the wing at the initial time.
  9.  The learning device according to claim 7, wherein, when the predetermined condition is not satisfied, the learning unit sets the reward to a value based on a difference between the moment of the wing at the target time and the moment of the wing at the initial time, and a difference between the load of the wing at the target time and the load of the wing at the initial time.
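     Claims 6 to 9 together define a reward shaping. A hedged sketch of one reward function consistent with claims 6 to 8 follows; the additive combination is an assumption, since the claims state only what quantities the reward is based on:

    def shaped_reward(load_t: float, load_0: float,
                      moment_t: float, moment_0: float,
                      load_min: float, load_max: float) -> float:
        # Claim 6: the reward is zero when the load at the target time leaves
        # the allowable range or the moment exceeds its initial value.
        if load_t > load_max or load_t < load_min or moment_t > moment_0:
            return 0.0
        # Claims 7 and 8: otherwise a larger value based on the moment
        # difference and the load quotient (here simply summed, which is an
        # assumption). Claim 9's variant would use (load_t - load_0) instead
        # of the quotient.
        return (moment_0 - moment_t) + load_t / load_0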
  10.  The learning device according to any one of claims 4 to 9, wherein
      the movable wing includes a plurality of wing pieces whose movable positions differ from one another, and
      the learning unit makes the reward at a target time smaller as the number of the wing pieces to be moved at the target time is larger.
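      Claim 10 adds a penalty that grows with the number of wing pieces moved at the target time. A minimal sketch, in which the linear form and the coefficient are assumptions, is:

    def penalized_reward(base_reward: float, n_moved_pieces: int,
                         penalty_per_piece: float = 0.05) -> float:
        # Claim 10: the reward at the target time is made smaller as more
        # wing pieces are moved; the linear penalty is an assumed choice.
        return base_reward - penalty_per_piece * n_moved_pieces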
  11.  A control device comprising:
      an acquisition unit that acquires information indicating strain of a movable wing of a structure detected by an optical fiber sensor provided on the movable wing, and information indicating a control amount of the movable wing;
      a first determination unit that determines a load and an angle of attack of the movable wing on the basis of the strain and the control amount indicated by the information acquired by the acquisition unit;
      a second determination unit that inputs, to a model trained to output, when a state variable is input, a value of an action to be taken according to the state variable or a variable indicating the action, some or all of the load and the angle of attack determined by the first determination unit and the control amount indicated by the information acquired by the acquisition unit as the state variable, and that determines a control amount of the movable wing on the basis of an output result of the model to which the state variable has been input; and
      a control unit that controls the movable wing on the basis of the control amount determined by the second determination unit.
  12.  A control method in which a computer:
      acquires information indicating strain of a wing of a structure detected by an optical fiber sensor provided on the wing, and information indicating a control amount of a movable wing of the structure;
      determines a load and an angle of attack of the wing on the basis of the strain and the control amount indicated by the acquired information;
      inputs, to a model trained to output, when a state variable is input, a value of an action to be taken according to the state variable or a variable indicating the action, some or all of the determined load and angle of attack and the control amount indicated by the acquired information as the state variable;
      determines a control amount of the movable wing on the basis of an output result of the model to which the state variable has been input; and
      controls the movable wing on the basis of the determined control amount.
  13.  A control method in which a computer:
      acquires information indicating strain of a movable wing of a structure detected by an optical fiber sensor provided on the movable wing, and information indicating a control amount of the movable wing;
      determines a load and an angle of attack of the movable wing on the basis of the strain and the control amount indicated by the acquired information;
      inputs, to a model trained to output, when a state variable is input, a value of an action to be taken according to the state variable or a variable indicating the action, some or all of the determined load and angle of attack and the control amount indicated by the acquired information as the state variable;
      determines a control amount of the movable wing on the basis of an output result of the model to which the state variable has been input; and
      controls the movable wing on the basis of the determined control amount.
  14.  A learning method in which a computer:
      acquires information including some or all of a control amount of a movable wing of a structure, a load of a wing of the structure, and an angle of attack of the wing of the structure; and
      uses deep reinforcement learning to train a model so that, when the acquired information is input as a state variable, the model outputs a value of an action to be taken according to the input state variable or a variable indicating the action.
  15.  A program for causing a computer to execute:
      acquiring information indicating strain of a wing of a structure detected by an optical fiber sensor provided on the wing, and information indicating a control amount of a movable wing of the structure;
      determining a load and an angle of attack of the wing on the basis of the strain and the control amount indicated by the acquired information;
      inputting, to a model trained to output, when a state variable is input, a value of an action to be taken according to the state variable or a variable indicating the action, some or all of the determined load and angle of attack and the control amount indicated by the acquired information as the state variable;
      determining a control amount of the movable wing on the basis of an output result of the model to which the state variable has been input; and
      controlling the movable wing on the basis of the determined control amount.
  16.  A program for causing a computer to execute:
      acquiring information indicating strain of a movable wing of a structure detected by an optical fiber sensor provided on the movable wing, and information indicating a control amount of the movable wing;
      determining a load and an angle of attack of the movable wing on the basis of the strain and the control amount indicated by the acquired information;
      inputting, to a model trained to output, when a state variable is input, a value of an action to be taken according to the state variable or a variable indicating the action, some or all of the determined load and angle of attack and the control amount indicated by the acquired information as the state variable;
      determining a control amount of the movable wing on the basis of an output result of the model to which the state variable has been input; and
      controlling the movable wing on the basis of the determined control amount.
  17.  A program for causing a computer to execute:
      acquiring information including some or all of a control amount of a movable wing of a structure, a load of a wing of the structure, and an angle of attack of the wing of the structure; and
      using deep reinforcement learning to train a model so that, when the acquired information is input as a state variable, the model outputs a value of an action to be taken according to the input state variable or a variable indicating the action.
PCT/JP2020/030640 2019-09-24 2020-08-12 Control device, learning device, control method, learning method, and program WO2021059787A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019-173265 2019-09-24
JP2019173265A JP7280609B2 (en) 2019-09-24 2019-09-24 CONTROL DEVICE, LEARNING DEVICE, CONTROL METHOD, LEARNING METHOD, AND PROGRAM

Publications (1)

Publication Number Publication Date
WO2021059787A1 true WO2021059787A1 (en) 2021-04-01

Family

ID=75156769

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/030640 WO2021059787A1 (en) 2019-09-24 2020-08-12 Control device, learning device, control method, learning method, and program

Country Status (2)

Country Link
JP (1) JP7280609B2 (en)
WO (1) WO2021059787A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230256596A1 (en) * 2020-07-14 2023-08-17 University Of Tsukuba Information processing device, method, and program
CN113777919B (en) * 2021-08-13 2023-11-17 哈尔滨工程大学 NSGA-II genetic algorithm-based active disturbance rejection control cascade gas turbine power control method
JP2023175366A (en) * 2022-05-30 2023-12-12 国立研究開発法人宇宙航空研究開発機構 Control device, control method, and program

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5740991A (en) * 1994-06-27 1998-04-21 Daimler-Benz Aerospace Airbus Gmbh Method and apparatus for optimizing the aerodynamic effect of an airfoil
US9227721B1 (en) * 2011-10-07 2016-01-05 The United States of America as represented by the Administrator of the National Aeronautics & Space Administration (NASA) Variable camber continuous aerodynamic control surfaces and methods for active wing shaping control
JP2019084897A (en) * 2017-11-02 2019-06-06 株式会社Subaru Aircraft control system, aircraft control method and aircraft
JP2019086468A (en) * 2017-11-09 2019-06-06 株式会社Nttファシリティーズ Vibration suppression control system, method for controlling vibration suppression, vibration analyzer, and method for analyzing vibration

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WADA DAICHI, SUGIMOTO YOHEI, MURAYAMA HIDEAKI, IGAWA HIROTAKA, NAKAMURA TOSHIYA: "Investigation of Inverse Analysis and Neural Network Approaches for Identifying Distributed Load using Distributed Strains", TRANSACTIONS OF THE JAPAN SOCIETY FOR AERONAUTICAL AND SPACE SCIENCES, vol. 62, no. 3, 4 May 2019 (2019-05-04), pages 151-161, XP055807277, Retrieved from the Internet <URL:https://doi.org/10.2322/tjsass.62.151> [retrieved on 20201020], DOI: 10.2322/tjsass.62.151 *
WADA DAICHI, TAMAYAMA MASATO: "Wing Load and Angle of Attack Identification by Integrating Optical Fiber Sensing and Neural Network Approach in Wind Tunnel Test", APPLIED SCIENCES, vol. 9, no. 7, 8 April 2019 (2019-04-08), pages 1-15, XP055807272, Retrieved from the Internet <URL:https://doi.org/10.3390/app9071461> [retrieved on 20201020], DOI: 10.3390/app9071461 *

Also Published As

Publication number Publication date
JP7280609B2 (en) 2023-05-24
JP2021049841A (en) 2021-04-01

Similar Documents

Publication Publication Date Title
WO2021059787A1 (en) Control device, learning device, control method, learning method, and program
US10520389B2 (en) Aerodynamic modeling using flight data
CN104773304B (en) Load estimation system for aerodynamic structures
US11629694B2 (en) Wind turbine model based control and estimation with accurate online models
Siddiqui et al. Lab-scale, closed-loop experimental characterization, model refinement, and validation of a hydrokinetic energy-harvesting ocean kite
US20230080379A1 (en) Digital twin for an autonomous vehicle
CN114398049A (en) Self-adaptive dynamic updating method for digital twin model of discrete manufacturing workshop
Hinson et al. Gyroscopic sensing in the wings of the hawkmoth Manduca sexta: the role of sensor location and directional sensitivity
Rastgoo et al. A novel study on forecasting the airfoil self-noise, using a hybrid model based on the combination of CatBoost and Arithmetic Optimization Algorithm
CN113777931A (en) Icing wing type pneumatic model construction method, device, equipment and medium
Verma et al. Aircraft parameter estimation using ELM network
Cao et al. System identification method based on interpretable machine learning for unknown aircraft dynamics
CN114491788A (en) Method and system for predicting aerodynamic force of hypersonic aircraft, electronic device and medium
CN117436322B (en) Wind turbine blade aeroelastic simulation method and medium based on phyllin theory
Bouhelal et al. Blade element momentum theory coupled with machine learning to predict wind turbine aerodynamic performances
Singh et al. Modified Delta method for estimation of parameters from flight data of stable and unstable aircraft
CN117350096A (en) Multi-type sensor layout optimization method for load performance evaluation under driving of particle swarm optimization algorithm
Nugroho et al. Comparison of black-grey-white box approach in system identification of a flight vehicle
Newton Stability and control derivative estimation for the bell-shaped lift distribution
Wada et al. Smart wing load alleviation through optical fiber sensing, load identification, and deep reinforcement learning
Haughn et al. MFC Morphing Aileron Control With Intelligent Sensing
Omran et al. Global aircraft aero-propulsive linear parameter-varying model using design of experiments
CN117875090B (en) Fixed-wing unmanned aerial vehicle incremental element flight aerodynamic modeling method considering wind interference
Öznurlu et al. Data-Driven Model Discovery and Control: Real-Time Implementation to Highly Maneuverable Aircraft Lateral-Directional Dynamics
CN117252109B (en) Aeroengine stability analysis method and system based on data processing

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20867815

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20867815

Country of ref document: EP

Kind code of ref document: A1