WO2018213999A1 - Home device learning method and server - Google Patents

Home device learning method and server

Info

Publication number
WO2018213999A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
state
indoor environment
matrix
parameter
Application number
PCT/CN2017/085385
Other languages
English (en)
French (fr)
Inventor
谢毅
张鹏程
张晴晴
Original Assignee
深圳微自然创新科技有限公司
Application filed by 深圳微自然创新科技有限公司
Priority to PCT/CN2017/085385
Priority to CN201780003362.8A (published as CN108419439B)
Publication of WO2018213999A1

Classifications

    • F MECHANICAL ENGINEERING; LIGHTING; HEATING; WEAPONS; BLASTING
    • F24 HEATING; RANGES; VENTILATING
    • F24F AIR-CONDITIONING; AIR-HUMIDIFICATION; VENTILATION; USE OF AIR CURRENTS FOR SCREENING
    • F24F11/00 Control or safety arrangements
    • F24F11/50 Control or safety arrangements characterised by user interfaces or communication
    • F24F11/56 Remote control
    • F24F11/58 Remote control using Internet communication
    • F24F11/62 Control or safety arrangements characterised by the type of control or by internal processing, e.g. using fuzzy logic, adaptive control or estimation of values
    • F24F11/63 Electronic processing
    • F24F11/64 Electronic processing using pre-stored data
    • F24F11/70 Control systems characterised by their outputs; Constructional details thereof
    • F24F11/72 Control systems characterised by their outputs; Constructional details thereof for controlling the supply of treated air, e.g. its pressure
    • F24F11/74 Control systems characterised by their outputs; Constructional details thereof for controlling the supply of treated air, e.g. its pressure for controlling air flow rate or air velocity
    • F24F11/89 Arrangement or mounting of control or safety devices
    • F24F2110/00 Control inputs relating to air properties
    • F24F2110/10 Temperature
    • F24F2110/20 Humidity

Definitions

  • The present invention relates to the field of computer technologies, and in particular to a home device learning method and a server.
  • Embodiments of the present invention provide a home device learning method that can quickly adjust an indoor environment to an expected state.
  • An embodiment of the present invention provides a home device learning method, including:
  • The operation set includes at least one type of adjustment operation.
  • Determining a target operation set to be selected according to the target matrix, generating a corresponding control instruction, and sending the control instruction to the environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operation specified by the target operation set;
  • If it is determined that the indoor environment has not reached the target state, calculating a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state, and updating the target matrix using the target value.
  • Before constructing the target matrix, the method further includes:
  • The first indoor environment parameter represents the first state, and the first state is an initial indoor environment state.
  • Constructing the target matrix includes:
  • Determining, by using the preset policy selection mechanism, the target operation set to be selected according to the target matrix includes:
  • With probability ε, filtering out from the first row of the target matrix the N operation sets corresponding to the N largest elements other than the element with the largest value, and randomly selecting one of them as the target operation set, where N is an integer greater than 1; and with probability 1-ε, selecting the operation set corresponding to the element with the largest value in the first row as the target operation set.
  • Determining that the indoor environment has not reached the target state includes:
  • Updating the target matrix by using the target value includes:
  • Q(s_t, a_t) on the left side of the equation is the parameter value corresponding to the target operation set after the target matrix is updated;
  • Q(s_t, a_t) on the right side of the equation is the parameter value corresponding to the target operation set in the target matrix before the update;
  • α and γ are preset constants;
  • R is the target value;
  • max_a Q(s_{t+1}, a) is the largest of the parameter values corresponding to all operation sets selectable in the second state.
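The equation these definitions describe is not preserved in this text. Assuming it takes the standard Q-learning form, which uses exactly these terms (reading the two preset constants as a learning rate α and a discount factor γ is itself an assumption based on that convention), it would read:

```latex
Q(s_t, a_t) = Q(s_t, a_t) + \alpha \left[ R + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

Here the left-hand Q(s_t, a_t) is the updated value and the right-hand occurrences are the values before the update, matching the first two definitions.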
  • A second embodiment of the present invention provides a server, including:
  • A matrix construction unit, configured to construct a target matrix, where the elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state, a larger parameter value indicating a higher likelihood that the corresponding operation set adjusts the indoor environment from the first state to the target state, and each operation set includes at least one type of adjustment operation;
  • A determining unit, configured to determine, by using a preset policy selection mechanism, a target operation set to be selected according to the target matrix;
  • A generating unit, configured to generate a corresponding control instruction according to the target operation set, where the control instruction instructs the environment adjustment device to perform the operation specified by the target operation set;
  • A sending unit, configured to send the control instruction to the environment adjustment device;
  • The determining unit is further configured to determine whether the indoor environment has reached the target state;
  • A calculating unit, configured to calculate, if it is determined that the indoor environment has not reached the target state, the target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state;
  • An updating unit, configured to update the target matrix using the target value.
  • The server further includes:
  • An acquiring unit, configured to acquire a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter represents the first state and the first state is an initial indoor environment state, and further configured to acquire a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter represents the target state.
  • The matrix construction unit is specifically configured to acquire, from a target matrix already saved by the server, the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and construct the target matrix from these parameter values;
  • Alternatively, the matrix construction unit is specifically configured to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between each operation set selectable in the first state and the target state, and construct the target matrix, where the closer the state specified by an operation set is to the target state, the larger its parameter value.
  • The determining unit is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
  • Alternatively, the determining unit is specifically configured to: with probability ε, filter out from the first row of the target matrix the N operation sets corresponding to the N largest elements other than the element with the largest value, and randomly select one of them as the target operation set, where N is an integer greater than 1; and with probability 1-ε, select the operation set corresponding to the element with the largest value in the first row as the target operation set.
  • The determining unit is specifically configured to determine, when a preset time has elapsed after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
  • The updating unit is specifically configured to update the target matrix by using the following formula:
  • Q(s_t, a_t) on the left side of the equation is the parameter value corresponding to the target operation set after the target matrix is updated;
  • Q(s_t, a_t) on the right side of the equation is the parameter value corresponding to the target operation set in the target matrix before the update;
  • α and γ are preset constants;
  • R is the target value;
  • max_a Q(s_{t+1}, a) is the largest of the parameter values corresponding to all operation sets selectable in the second state.
  • A third embodiment of the present invention further provides a server, including a processor, a receiver, a transmitter, and a memory, where an executable program is stored in the memory and the processor implements the foregoing method by executing the executable program.
  • A target matrix is constructed, and a corresponding operation set is selected according to the target matrix by using a preset policy selection mechanism, where the elements of the first row of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm is used to continuously optimize the target matrix, and operation sets are selected according to the optimized target matrix, so that the indoor environment can quickly reach the target state.
  • FIG. 1 is a schematic structural diagram of a system according to an embodiment of the present invention.
  • FIG. 2 is a schematic flowchart of a home device learning method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of constructing a target matrix according to an embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of another home device learning method according to an embodiment of the present invention.
  • FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
  • The sensor in FIG. 1 may represent a plurality of sensors, such as a temperature sensor, a humidity sensor, and a light intensity sensor, for collecting temperature, humidity, light intensity, wind speed, and the like.
  • The sensor in FIG. 1 may be located in the environment adjustment device or installed in other devices, and the collected data may be uploaded to the server through the network.
  • The server in FIG. 1 may communicate with the terminal device over the network.
  • The terminal device in FIG. 1, such as a smartphone or a tablet computer, may receive a control instruction sent by the server and forward it to the environment adjustment device.
  • The environment adjustment device in FIG. 1 may perform a corresponding operation according to the control instruction sent by the terminal device.
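The message path in FIG. 1 (server to terminal device to environment adjustment device) can be sketched in-process as follows. This is a minimal illustration under stated assumptions: all class and method names are hypothetical, and the direct method call in `TerminalDevice.forward` stands in for the real terminal-to-device link (for example an infrared signal), which the text does not specify in detail.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical air-conditioner operation set: (set temperature in °C, mode, fan speed).
OperationSet = Tuple[int, str, str]


@dataclass(frozen=True)
class ControlInstruction:
    device_id: str
    operation_set: OperationSet


@dataclass
class EnvironmentAdjustmentDevice:
    device_id: str
    executed: List[OperationSet] = field(default_factory=list)

    def perform(self, instruction: ControlInstruction) -> None:
        # A real device would change its working state; here we only record it.
        self.executed.append(instruction.operation_set)


@dataclass
class TerminalDevice:
    bound_device: EnvironmentAdjustmentDevice

    def forward(self, instruction: ControlInstruction) -> None:
        # Stand-in for the terminal-to-device link (e.g. an infrared signal).
        if instruction.device_id != self.bound_device.device_id:
            raise ValueError("instruction is not addressed to the bound device")
        self.bound_device.perform(instruction)


@dataclass
class Server:
    terminal: TerminalDevice

    def send(self, instruction: ControlInstruction) -> None:
        # The server reaches the device only through the terminal device.
        self.terminal.forward(instruction)


if __name__ == "__main__":
    ac = EnvironmentAdjustmentDevice("living-room-ac")
    server = Server(TerminalDevice(ac))
    server.send(ControlInstruction("living-room-ac", (22, "cooling", "strong")))
    print(ac.executed)  # [(22, 'cooling', 'strong')]
```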
  • An embodiment of the present invention provides a home device learning method which, as shown in FIG. 2, includes:
  • The elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and the larger a parameter value, the more likely the corresponding operation set is to adjust the indoor environment from the first state to the target state.
  • The target matrix has at least one row.
  • The indoor environment may be inside a car, an aircraft, a ship, or the like.
  • The first state is the state in which the indoor environment is currently located, for example (26 °C, 67%, strong), where the first parameter represents the current temperature of the indoor environment, the second parameter its current humidity, and the third parameter its current indoor wind speed.
  • The indoor wind speed may be divided into three levels, weak, medium, and strong, according to the strength of the air conditioner's airflow.
  • The target state may be an ideal indoor environment state determined according to outdoor environment parameters. Specifically, the target state may be determined according to a correspondence between outdoor environment parameters and indoor environment parameters. For example, outdoor environment parameters (16 °C, 37%) may correspond to indoor environment parameters (26 °C, 47%), and outdoor environment parameters (36 °C, 37%) may correspond to indoor environment parameters (28 °C, 60%).
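A target-state lookup built from the two example correspondences above might look like the following. It is a sketch under assumptions: the table holds only the two pairs quoted in the text, `target_state` and `TARGET_TABLE` are hypothetical names, and the nearest-neighbour fallback for outdoor readings not in the table is an illustrative choice that the text does not specify.

```python
from typing import Dict, Tuple

# (temperature in °C, relative humidity in %)
Reading = Tuple[float, float]

# The two outdoor -> indoor examples given in the text; a real table would be per user.
TARGET_TABLE: Dict[Reading, Reading] = {
    (16.0, 37.0): (26.0, 47.0),
    (36.0, 37.0): (28.0, 60.0),
}


def target_state(outdoor: Reading, table: Dict[Reading, Reading] = TARGET_TABLE) -> Reading:
    """Return the target indoor (temperature, humidity) for an outdoor reading."""
    if outdoor in table:
        return table[outdoor]
    # Hypothetical fallback: the stored outdoor condition with the smallest
    # (unweighted) squared distance to the current reading.
    nearest = min(
        table,
        key=lambda k: (k[0] - outdoor[0]) ** 2 + (k[1] - outdoor[1]) ** 2,
    )
    return table[nearest]


if __name__ == "__main__":
    print(target_state((16.0, 37.0)))  # (26.0, 47.0)
    print(target_state((35.0, 40.0)))  # (28.0, 60.0)
```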
  • The operation set corresponds to the working state of the environment adjustment device. Taking an air conditioner as an example, an operation set may be (air conditioner set temperature, air conditioner mode, air conditioner fan speed).
  • The air conditioner mode may include cooling, dehumidification, automatic, fan (air supply), heating, and the like.
  • The at least two selectable operation sets are the operation sets currently selectable on the environment adjustment device and need not be limited to operation sets capable of achieving the target state.
  • For example, if the first state is (28 °C, 60%, strong) and the target state is (22 °C, 50%, strong), the selectable operation sets may be (22 °C, dehumidification, strong), (21 °C, dehumidification, strong), (26 °C, cooling, weak), and so on, where the operation set (26 °C, cooling, weak) cannot bring the indoor environment to the target state.
  • Alternatively, the selectable operation sets may be limited to operation sets capable of achieving the target state, which reduces the number of selectable operation sets and improves adjustment efficiency.
  • For example, if the first state is (28 °C, 60%, strong) and the target state is (22 °C, 50%, strong), the selectable operation sets exclude (26 °C, dehumidification, weak) and the like, because the operation set (26 °C, dehumidification, weak) cannot bring the indoor environment to the target state.
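The optional restriction to operation sets that can reach the target state amounts to a filter over the candidates. The sketch below assumes a hypothetical reachability rule: the set temperature must lie at or beyond the target temperature in the direction the room has to move, and the fan speed must match the target wind level. The text does not say how reachability is judged, and `EnvState`, `OperationSet`, `can_reach`, and `candidate_sets` are illustrative names. Applied to the example above, it keeps the 22 °C and 21 °C dehumidification sets and drops the 26 °C weak-fan sets.

```python
from typing import List, NamedTuple


class EnvState(NamedTuple):
    temperature: float  # °C
    humidity: float     # relative humidity, %
    wind: str           # "weak", "medium" or "strong"


class OperationSet(NamedTuple):
    set_temperature: float  # °C
    mode: str               # e.g. "cooling", "dehumidification", "heating"
    fan: str                # "weak", "medium" or "strong"


def can_reach(op: OperationSet, current: EnvState, target: EnvState) -> bool:
    """Hypothetical reachability rule; not taken from the source text."""
    if op.fan != target.wind:
        return False
    if target.temperature < current.temperature:  # room must cool down
        return op.set_temperature <= target.temperature
    if target.temperature > current.temperature:  # room must warm up
        return op.set_temperature >= target.temperature
    return True


def candidate_sets(ops: List[OperationSet], current: EnvState,
                   target: EnvState) -> List[OperationSet]:
    """Keep only the operation sets judged able to reach the target state."""
    return [op for op in ops if can_reach(op, current, target)]


if __name__ == "__main__":
    current = EnvState(28, 60, "strong")
    target = EnvState(22, 50, "strong")
    ops = [
        OperationSet(22, "dehumidification", "strong"),
        OperationSet(21, "dehumidification", "strong"),
        OperationSet(26, "cooling", "weak"),
        OperationSet(26, "dehumidification", "weak"),
    ]
    for op in candidate_sets(ops, current, target):
        print(op)
```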
  • The target matrix may be a Q matrix. The first state and the target state can be understood as belonging to a state set, the selectable operation sets as a set of actions, and the parameter values as reward values.
  • In the Q matrix, rows represent different states and columns represent different operation sets.
  • Each element is the reward for reaching the target state by executing, from the state represented by its row, the operation set represented by its column.
  • These element values are Q values. For example, the value of the element in the first row and first column represents the reward for executing the first operation set in the first state to reach the target state.
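The Q-matrix layout just described (rows for states, columns for operation sets, each element a parameter value) can be held in a small class like the one below. The class name `QMatrix`, the lazy creation of rows on first visit, and the default initial value of 0.0 are illustrative assumptions; the "first row" in the text corresponds to the row of the first state.

```python
from typing import Dict, Hashable, List, Sequence


class QMatrix:
    """Rows are states, columns are operation sets; each element is a parameter (Q) value."""

    def __init__(self, operation_sets: Sequence[Hashable], initial: float = 0.0) -> None:
        self.operation_sets: List[Hashable] = list(operation_sets)
        self._column = {op: j for j, op in enumerate(self.operation_sets)}
        self.initial = initial
        self.rows: Dict[Hashable, List[float]] = {}

    def row(self, state: Hashable) -> List[float]:
        # Create a row the first time a state is visited.
        if state not in self.rows:
            self.rows[state] = [self.initial] * len(self.operation_sets)
        return self.rows[state]

    def get(self, state: Hashable, op: Hashable) -> float:
        return self.row(state)[self._column[op]]

    def set(self, state: Hashable, op: Hashable, value: float) -> None:
        self.row(state)[self._column[op]] = value


if __name__ == "__main__":
    ops = [(22, "dehumidification", "strong"), (21, "dehumidification", "strong")]
    q = QMatrix(ops)
    first_state = (28, 60, "strong")
    q.set(first_state, ops[0], 1.5)
    print(q.row(first_state))  # [1.5, 0.0]
```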
  • Determine, by using a preset policy selection mechanism, a target operation set to be selected according to the target matrix; generate a corresponding control instruction; and send the control instruction to the environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operation specified by the target operation set.
  • The environment adjustment device may be an air conditioner, an air purifier, a humidifier, a dehumidifier, or the like.
  • Two methods for selecting the target operation set are provided. In the first method, the operation set corresponding to the element with the largest value in the first row of the target matrix is selected as the target operation set.
  • In the second method, with probability ε, the N operation sets corresponding to the N largest elements other than the element with the largest value are filtered out from the first row of the target matrix, and one of them is randomly selected as the target operation set, where N is an integer greater than 1; with probability 1-ε, the operation set corresponding to the element with the largest value in the first row is selected as the target operation set.
  • The first method selects the operation set corresponding to the element with the largest value in the first row of the target matrix. It is simple to compute and, when the target matrix is close to convergence, is highly likely to find the best operation set.
  • The second method selects, with probability 1-ε, the operation set corresponding to the element with the largest value in the first row as the target operation set, and, with probability ε, randomly selects one of the N operation sets as the target operation set. Because it has a certain probability of choosing an operation set whose parameter value is not the largest, it speeds up the search for a better operation set when the target matrix is still far from convergence.
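The two selection methods can be sketched as two small functions over the first row of the matrix. In this sketch `epsilon` stands for the probability ε in the text, `n` for N, and `rng` is an injectable random generator; the function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def select_greedy(row: Sequence[float]) -> int:
    """Method 1: column index of the largest element in the row."""
    return max(range(len(row)), key=lambda j: row[j])


def select_epsilon_top_n(row: Sequence[float], epsilon: float, n: int,
                         rng: Optional[random.Random] = None) -> int:
    """Method 2: with probability epsilon, pick uniformly among the N largest
    elements other than the maximum; otherwise pick the maximum."""
    rng = rng or random.Random()
    best = select_greedy(row)
    if rng.random() < epsilon:
        others: List[int] = sorted(
            (j for j in range(len(row)) if j != best),
            key=lambda j: row[j],
            reverse=True,
        )[:n]
        if others:  # a single-column row has no alternatives
            return rng.choice(others)
    return best


if __name__ == "__main__":
    first_row = [0.1, 0.9, 0.3, 0.5]
    print(select_greedy(first_row))  # 1
    print(select_epsilon_top_n(first_row, epsilon=0.2, n=2, rng=random.Random(42)))
```

Passing a seeded `random.Random` makes the choice reproducible in tests. Switching between the two functions according to how close the matrix is to convergence, as the text suggests, is left to the caller here.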
  • The server may send the control instruction to the environment adjustment device through a terminal device such as a mobile phone.
  • The terminal device may be bound to the environment adjustment device and forward the control instruction to it by transmitting an infrared signal or the like.
  • Two methods for selecting the target operation set are thus proposed, and the method can be chosen according to how close the target matrix is to convergence, which increases the speed of finding a preferred operation set.
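  • Whether the indoor environment has reached the target state may be checked at a preset time interval. Specifically, determining that the indoor environment has not reached the target state includes: determining, when a preset time has elapsed after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
  • The preset time may be 15 minutes, 20 minutes, 30 minutes, or the like.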
  • For example, after sending the control instruction, the server starts timing; when 20 minutes have elapsed, it acquires the current indoor environment parameters and determines whether the second state in which the indoor environment is currently located has reached the target state.
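The timed check can be written so that the clock is injectable, which keeps it testable without real waiting. This is a sketch under assumptions: `check_after_preset_time` and its parameters are hypothetical names, the state tuple layout follows the (temperature, humidity, wind) examples in the text, and states are compared for exact equality as the text describes; a tolerance could be substituted.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical indoor state: (temperature °C, humidity %, wind level).
State = Tuple[float, float, str]


def check_after_preset_time(
    sent_at: float,
    preset_seconds: float,
    read_state: Callable[[], State],
    target: State,
    now: Optional[float] = None,
) -> Optional[bool]:
    """Return None before the preset time has elapsed since the control
    instruction was sent; afterwards, read the current (second) state and
    return whether it equals the target state."""
    now = time.monotonic() if now is None else now
    if now - sent_at < preset_seconds:
        return None
    return read_state() == target


def sensor() -> State:
    # Stand-in for a real sensor reading.
    return (24.0, 55.0, "strong")


if __name__ == "__main__":
    target = (22.0, 50.0, "strong")
    print(check_after_preset_time(0.0, 20 * 60, sensor, target, now=600.0))   # None
    print(check_after_preset_time(0.0, 20 * 60, sensor, target, now=1200.0))  # False
```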
  • In this way, a failure to reach the target state can be detected promptly, so that the working state of the environment adjustment device can be adjusted in time.
  • A target matrix is constructed, and a corresponding operation set is selected according to the target matrix by using a preset policy selection mechanism, where the elements of the first row of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm is used to continuously optimize the target matrix, and a better operation set is determined according to the optimized target matrix, so that the indoor environment can quickly reach the target state.
  • The server may obtain the target indoor environment parameter from the acquired outdoor environment parameter. Specifically, before the target matrix is constructed, the method further includes:
  • The server may obtain the first indoor environment parameter from a sensor located indoors, and the outdoor environment parameter from a sensor located outdoors or from another server.
  • The target state may be an ideal indoor environment state determined according to the outdoor environment parameter. Specifically, the target state may be determined according to the correspondence between outdoor environment parameters and indoor environment parameters. The correspondence may be pre-stored in the server and may differ between users, or it may be determined by statistical analysis of multiple indoor environment parameters. For example, if, when the outdoor temperature is 36 °C and the humidity is 47%, the indoor environment is most often, or for the longest time, at a temperature of 26 °C and a humidity of 40%, it is determined that the outdoor parameters (36 °C, 47%) correspond to the indoor parameters (26 °C, 40%).
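The statistical variant can be sketched as a tally over a log of observations. This is a minimal sketch assuming a hypothetical log format of (outdoor reading, indoor reading, minutes observed) and implementing the "for the longest time" criterion (total duration); counting occurrences instead would implement the "most often" criterion. The function name and the example log entries are made up for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple

# (temperature in °C, relative humidity in %)
Reading = Tuple[float, float]


def learn_correspondence(
    log: Iterable[Tuple[Reading, Reading, float]],
) -> Dict[Reading, Reading]:
    """For each outdoor reading, return the indoor reading with the largest
    total observed duration."""
    totals: DefaultDict[Reading, DefaultDict[Reading, float]] = defaultdict(
        lambda: defaultdict(float)
    )
    for outdoor, indoor, minutes in log:
        totals[outdoor][indoor] += minutes
    return {
        outdoor: max(per_indoor, key=per_indoor.__getitem__)
        for outdoor, per_indoor in totals.items()
    }


if __name__ == "__main__":
    # Made-up log entries for illustration only.
    log = [
        ((36, 47), (26, 40), 90),
        ((36, 47), (25, 45), 30),
        ((36, 47), (26, 40), 45),
    ]
    print(learn_correspondence(log))  # {(36, 47): (26, 40)}
```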
  • How to determine the target indoor environment parameters from the outdoor environment parameters is not the focus of the embodiments of the present invention and is not described in detail herein.
  • In this way, the target indoor environment parameters can be determined accurately to meet the needs of different users.
  • Constructing the target matrix includes:
  • In the first method, the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state are acquired from a target matrix already saved by the server, and the target matrix is constructed from them;
  • In the second method, the parameter values corresponding to the at least two selectable operation sets are determined according to the relationship between each operation set selectable in the first state and the target state, and the target matrix is constructed from them.
  • Each operation set includes at least one parameter representing a final state. For example, in the operation set (26 °C, dehumidification, strong), 26 °C is the final temperature corresponding to that operation set. For example, as shown in FIG., the current temperature is 18 °C and the target temperature is 21 °C, and the operation sets in different columns specify different temperatures: the operation set in the first column specifies 17 °C, the one in the second column specifies 18 °C, and so on. The closer the temperature specified by an operation set is to the target temperature, the larger its parameter value.
  • The embodiments of the present invention may determine how close the state specified by an operation set is to the target state in a variety of ways, which are not limited herein. For example, the parameter values of the at least two operation sets may be initialized according to preset rules.
  • These two methods for constructing the target matrix can accelerate its convergence and reduce the time required to reach the target state.
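The two construction methods can be combined into one helper: reuse a saved row when the server has one for the first state, and otherwise initialize by proximity to the target. The preset rule used here (negative absolute distance between set temperature and target temperature) is a hypothetical example of "initialized according to preset rules", and the function names are illustrative. With the example above (current 18 °C, target 21 °C, columns 17 °C, 18 °C, ...), the 21 °C column receives the largest value.

```python
from typing import Dict, Hashable, List, Sequence


def proximity_row(set_temperatures: Sequence[float], target_temperature: float) -> List[float]:
    """Method 2 (hypothetical preset rule): the closer an operation set's set
    temperature is to the target, the larger its initial parameter value."""
    return [-abs(t - target_temperature) for t in set_temperatures]


def build_first_row(
    saved: Dict[Hashable, List[float]],
    first_state: Hashable,
    set_temperatures: Sequence[float],
    target_temperature: float,
) -> List[float]:
    # Method 1: reuse the row of a target matrix already saved by the server.
    if first_state in saved:
        return list(saved[first_state])
    # Method 2: initialize from proximity to the target.
    return proximity_row(set_temperatures, target_temperature)


if __name__ == "__main__":
    temps = list(range(17, 26))  # columns: 17 °C, 18 °C, ..., 25 °C
    print(build_first_row({}, 18, temps, 21))
    # [-4, -3, -2, -1, 0, -1, -2, -3, -4]
```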
  • A method for updating the target matrix is provided as follows. Updating the target matrix by using the target value includes:
  • Q(s_t, a_t) on the left side of the expression is the parameter value corresponding to the target operation set after the target matrix is updated;
  • Q(s_t, a_t) on the right side of the expression is the parameter value corresponding to the target operation set before the target matrix is updated;
  • α and γ are preset constants;
  • R is the target value;
  • max_a Q(s_{t+1}, a) is the largest of the parameter values corresponding to all operation sets selectable in the second state.
  • α and γ are preset constants whose values can be set differently for different problems.
  • This update method can accelerate the convergence of the target matrix and reduce the time required to reach the target state.
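A single update step can be sketched as follows, assuming the update takes the standard Q-learning form (the formula itself is not preserved in this text, and treating the two preset constants as a learning rate `alpha` and a discount factor `gamma` is an assumption based on that convention). The matrix is a dict of dicts keyed by state and operation set; `q_update` is a hypothetical name.

```python
from typing import Dict, Hashable

# q[state][operation_set] -> parameter (Q) value
QTable = Dict[Hashable, Dict[Hashable, float]]


def q_update(q: QTable, s_t: Hashable, a_t: Hashable, reward: float,
             s_next: Hashable, alpha: float, gamma: float) -> float:
    """Assumed standard Q-learning update:
    Q(s_t, a_t) <- Q(s_t, a_t) + alpha * (R + gamma * max_a Q(s_next, a) - Q(s_t, a_t))."""
    best_next = max(q[s_next].values())  # max over operation sets selectable in the second state
    q[s_t][a_t] += alpha * (reward + gamma * best_next - q[s_t][a_t])
    return q[s_t][a_t]


if __name__ == "__main__":
    q = {"first": {"a": 0.0, "b": 0.0}, "second": {"a": 2.0, "b": 1.0}}
    # 0.0 + 0.5 * (1.0 + 0.9 * 2.0 - 0.0)
    print(round(q_update(q, "first", "a", 1.0, "second", alpha=0.5, gamma=0.9), 6))  # 1.4
```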
  • An embodiment of the present invention provides an application scenario.
  • The specific process is as follows. A user sends an indoor-environment adjustment command to a server through an application on a terminal device such as a mobile phone. After receiving the command, the server parses it to obtain the user's identification information.
  • The server acquires the user's current outdoor environment parameters and indoor environment parameters according to the identification information and determines the corresponding target indoor environment parameters, that is, the indoor environment parameters corresponding to the user's thermal comfort zone. The server selects an adjustment operation by using the reinforcement learning algorithm, generates a corresponding control instruction, and sends it to the terminal device; the terminal device forwards the control instruction to the environment adjustment device, which performs the adjustment operation specified by the control instruction. When the preset time has elapsed after sending the control instruction, the server detects the current state of the indoor environment, updates the target matrix (that is, the Q matrix), and sends a new control instruction. The server keeps updating the target matrix until the indoor environment parameters match the target indoor environment parameters.
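The application-scenario loop can be simulated end to end. Everything environment-specific here is an assumption made for illustration: the state is reduced to an integer room temperature, the room is a toy model that moves at most 2 °C toward the set temperature per preset interval, and the target value R is taken to be how much closer the second state is to the target than the first state (the text only says R is computed from the second state, the first state, and the target state). Selection follows the second method in the text, and the update uses the standard Q-learning form, itself an assumption because the formula is not preserved here. The parameter values (α = 0.5, γ = 0.9, ε = 0.1, N = 3) and all names are arbitrary.

```python
import random
from collections import defaultdict
from typing import DefaultDict, Dict, Tuple

SET_TEMPERATURES = list(range(16, 31))  # hypothetical selectable operation sets (°C)


def simulate_room(indoor: int, set_temperature: int) -> int:
    """Toy room model: temperature moves at most 2 °C toward the set point per interval."""
    step = max(-2, min(2, set_temperature - indoor))
    return indoor + step


def target_value(first: int, second: int, target: int) -> float:
    """Hypothetical R: how much closer the second state is to the target than the first."""
    return abs(first - target) - abs(second - target)


def run(initial: int, target: int, alpha: float = 0.5, gamma: float = 0.9,
        epsilon: float = 0.1, n: int = 3, max_steps: int = 50,
        seed: int = 0) -> Tuple[int, int]:
    """Return (final temperature, number of control instructions sent)."""
    if initial == target:
        return initial, 0
    rng = random.Random(seed)
    # Rows are created on first visit; set points closer to the target start higher.
    q: DefaultDict[int, Dict[int, float]] = defaultdict(
        lambda: {a: -abs(a - target) for a in SET_TEMPERATURES}
    )
    state = initial
    for step in range(1, max_steps + 1):
        row = q[state]
        best = max(row, key=row.__getitem__)
        action = best
        if rng.random() < epsilon:
            others = sorted((a for a in row if a != best),
                            key=row.__getitem__, reverse=True)[:n]
            action = rng.choice(others)
        # "Send" the instruction and read the second state after the preset time.
        second = simulate_room(state, action)
        if second == target:
            return second, step
        # Target not reached: compute R and update the matrix, then continue.
        r = target_value(state, second, target)
        row[action] += alpha * (r + gamma * max(q[second].values()) - row[action])
        state = second
    return state, max_steps


if __name__ == "__main__":
    print(run(28, 22))
```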
  • An embodiment of the present invention provides another home device learning method which, as shown in FIG. 4, includes:
  • The first indoor environment parameter represents the first state, and the first state is an initial indoor environment state.
  • The target indoor environment parameters represent the target state.
  • A target matrix is constructed, and a corresponding operation set is selected according to the target matrix by using a preset policy selection mechanism, where the elements of the first row of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm is used to continuously optimize the target matrix, and a better operation set is determined according to the optimized target matrix, so that the indoor environment can quickly reach the target state while saving power.
  • An embodiment of the present invention provides a server which, as shown in FIG. 5, includes:
  • A matrix construction unit 501, configured to construct a target matrix, where the elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, a larger parameter value indicating a higher likelihood that the corresponding operation set adjusts the indoor environment from the first state to the target state, and each operation set includes at least one type of adjustment operation;
  • A determining unit 502, configured to determine, by using a preset policy selection mechanism, a target operation set to be selected according to the target matrix;
  • A generating unit 503, configured to generate a corresponding control instruction according to the target operation set, where the control instruction instructs the environment adjustment device to perform the operation specified by the target operation set;
  • A sending unit 504, configured to send the control instruction to the environment adjustment device;
  • The determining unit 502 is further configured to determine whether the indoor environment has reached the target state;
  • A calculating unit 505, configured to calculate, if it is determined that the indoor environment has not reached the target state, the target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state;
  • An updating unit 506, configured to update the target matrix by using the target value.
  • The server may obtain the target indoor environment parameter from the acquired outdoor environment parameter. Specifically, as shown in FIG. 6, the server further includes:
  • An acquiring unit 601, configured to acquire a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter represents the first state and the first state is an initial indoor environment state, and further configured to acquire the target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter represents the target state.
  • In this way, the target indoor environment parameters can be determined accurately to meet the needs of different users.
  • the matrix construction unit 501 is specifically configured to obtain the selection of the indoor environment from the first state to the target state. Constructing the target matrix by using the parameter values corresponding to the at least two operation sets;
  • the matrix construction unit 501 is configured to determine, according to the relationship between the at least two operation sets that are selectable in the first state and the target state, the parameter values corresponding to the at least two operation sets that are selectable, and construct In the above target matrix, the state specified by the at least two selectable operation sets and the target state are closer to the corresponding parameter value.
  • two methods for constructing a target matrix are provided, which can accelerate the convergence of the target matrix and reduce the time required to reach the target state.
  • the determining unit 502 is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
  • or, the determining unit 502 is specifically configured to, with probability ε, select from the first row of the target matrix the N operation sets corresponding to the N largest elements and randomly choose one of them as the target operation set, where N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, select from the first row the operation set corresponding to the element with the largest value as the target operation set.
  • two methods for selecting a target operation set are proposed, and a corresponding method may be selected according to the convergence of the target matrix to improve the speed of finding a preferred operation set.
  • whether the indoor environment has reached the target state may be checked at a preset time interval, as follows:
  • the determining unit 502 is specifically configured to determine, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
  • a failure of the indoor environment to reach the target state can be detected promptly, so that the working state of the environment adjustment device can be adjusted in time.
  • a method for updating the target matrix is provided, as follows:
  • the update unit 506 is specifically configured to update the target matrix by using the following formula: $$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
  • where the $Q(s_t, a_t)$ on the left side is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the update, α and γ are preset constants, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state.
  • the convergence of the target matrix can be accelerated, and the time required to reach the target state can be reduced.
  • FIG. 7 shows a server according to an embodiment of the present invention.
  • the server includes a processor 701 (there may be one or more processors 701; one processor is used as an example in FIG. 7), a memory 702, a receiver 703, and a transmitter 704; in some embodiments of the present invention, the processor 701, the memory 702, the receiver 703, and the transmitter 704 may be connected by a bus or in another manner.
  • the memory 702 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or portable read-only memory (CD-ROM), and is used to store related instructions and data.
  • the memory 702 is also used to store a target matrix.
  • the processor 701 constructs the target matrix, where each operation set includes at least one type of adjustment operation; uses a preset policy selection mechanism to determine, according to the target matrix, the target operation set to be selected, generates a corresponding control instruction, and sends the control instruction to the environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set; and, when determining that the indoor environment has not reached the target state, calculates a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state, and updates the target matrix using the target value.
  • the server obtains the target indoor environment parameter from the acquired outdoor environment parameter, as follows: the processor 701 is further configured to, before constructing the target matrix, obtain the first indoor environment parameter and the outdoor environment parameter, where the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and to obtain the target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter characterizes the target state.
  • the target indoor environment parameter can be determined accurately to meet the needs of different users.
  • two methods for constructing a target matrix are provided, as follows: the processor 701 is specifically configured to obtain the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix; or, specifically configured to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between the operation sets selectable in the first state and the target state, and to construct the target matrix, where the closer the state specified by a selectable operation set is to the target state, the larger its parameter value.
  • two methods for constructing a target matrix are provided, which can accelerate the convergence of the target matrix and reduce the time required to reach the target state.
  • two methods for selecting a target operation set are provided, as follows:
  • the processor 701 is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
  • or, with probability ε, to select from the first row of the target matrix the N operation sets corresponding to the N largest elements and randomly choose one of them as the target operation set, where N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, to select from the first row the operation set corresponding to the element with the largest value as the target operation set.
  • two methods for selecting a target operation set are proposed, and a corresponding method may be selected according to the convergence of the target matrix to improve the speed of finding a preferred operation set.
  • whether the indoor environment has reached the target state may be checked at a preset time interval, as follows:
  • the processor 701 is specifically configured to determine, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
  • a failure of the indoor environment to reach the target state can be detected promptly, so that the working state of the environment adjustment device can be adjusted in time.
  • a method for updating a target matrix is provided, as follows:
  • the processor 701 is specifically configured to update the target matrix by using the following formula: $$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
  • where the $Q(s_t, a_t)$ on the left side is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the update, α and γ are preset constants, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state.
  • the convergence of the target matrix can be accelerated, and the time required to reach the target state can be reduced.

Landscapes

  • Engineering & Computer Science (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Mechanical Engineering (AREA)
  • General Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Fluid Mechanics (AREA)
  • Air Conditioning Control Device (AREA)

Abstract

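Embodiments of the present invention relate to the field of computer technology and disclose a household device learning method and a server. The method includes: constructing a target matrix; using a preset policy selection mechanism to determine, according to the target matrix, a target operation set to be selected, generating a corresponding control instruction, and sending the control instruction to an environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set; and, when it is determined that the indoor environment has not reached the target state, calculating a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state, and updating the target matrix using the target value. The solution in the embodiments of the present invention can quickly adjust the indoor environment to the expected state.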

Description

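Household Device Learning Method and Server
Technical Field
The present invention relates to the field of computer technology, and in particular to a household device learning method and a server.
Background
At present, people often use household appliances such as air conditioners and air purifiers to regulate the indoor environment. Because people do not fully understand the characteristics of these appliances, they are unsure how to control them to achieve the desired effect quickly. Today, the way to make household appliances achieve the expected effect is trial and error, repeated until the expected effect is reached. For example, a user considers 26 °C a comfortable temperature; the user can set the air conditioner to 26 °C with the remote control and set its mode and fan speed. Once the air conditioner reaches 26 °C it holds that temperature, but the 26 °C of the air conditioner may not be the temperature the user actually expects, so the user has to set the temperature again. In this way, it is difficult for the user to adjust the indoor temperature to the expected state in one attempt, and difficult to find a good adjustment method, so the indoor environment cannot quickly reach the expected effect.
In practice, with the above technical solution, it is difficult to adjust the indoor environment quickly to the expected state.
Summary
Embodiments of the present invention provide a household device learning method that can quickly adjust the indoor environment to the expected state.
In a first aspect, an embodiment of the present invention provides a household device learning method, including:
constructing a target matrix, where the elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state, a larger parameter value indicates a higher likelihood of adjusting the indoor environment from the first state to the target state, and each operation set includes at least one type of adjustment operation;
using a preset policy selection mechanism to determine, according to the target matrix, a target operation set to be selected, generating a corresponding control instruction, and sending the control instruction to an environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set; and
when it is determined that the indoor environment has not reached the target state, calculating a target value corresponding to the target operation set according to a second state in which the indoor environment is currently located, the first state, and the target state, and updating the target matrix using the target value.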
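In an optional implementation, before the constructing of the target matrix, the method further includes:
obtaining a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and
obtaining a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter characterizes the target state.
In an optional implementation, the constructing of the target matrix includes:
obtaining the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and constructing the target matrix;
or, determining the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and constructing the target matrix, where the closer the state specified by a selectable operation set is to the target state, the larger its corresponding parameter value.
In an optional implementation, the using of the preset policy selection mechanism to determine, according to the target matrix, the target operation set to be selected includes:
selecting, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
or, with probability ε, selecting from the first row of the target matrix the N operation sets corresponding to the N largest elements, and randomly selecting one of the N operation sets as the target operation set, where N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, selecting from the first row the operation set corresponding to the element with the largest value as the target operation set.
In an optional implementation, the determining that the indoor environment has not reached the target state includes:
determining, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
In an optional implementation, the updating of the target matrix using the target value includes:
updating the target matrix using the following formula:
$$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
where the $Q(s_t, a_t)$ on the left side of the formula is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the target matrix is updated, α and γ are preset constants, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state.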
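In a second aspect, an embodiment of the present invention provides a server, including:
a matrix construction unit, configured to construct a target matrix, where the elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state, a larger parameter value indicates a higher likelihood of adjusting the indoor environment from the first state to the target state, and each operation set includes at least one type of adjustment operation;
a determining unit, configured to use a preset policy selection mechanism to determine, according to the target matrix, a target operation set to be selected;
a generating unit, configured to generate a corresponding control instruction according to the target operation set, where the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set;
a sending unit, configured to send the control instruction to the environment adjustment device;
the determining unit being further configured to determine that the indoor environment has not reached the target state, and further configured to determine that the indoor environment has reached the target state;
a calculating unit, configured to, when it is determined that the indoor environment has not reached the target state, calculate a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state; and
an updating unit, configured to update the target matrix using the target value.
In an optional implementation, the server further includes:
an acquiring unit, configured to obtain a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and to obtain a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter characterizes the target state.
In an optional implementation, the matrix construction unit is specifically configured to obtain the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix;
or, the matrix construction unit is specifically configured to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and to construct the target matrix, where the closer the state specified by a selectable operation set is to the target state, the larger its corresponding parameter value.
In an optional implementation, the determining unit is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
or, the determining unit is specifically configured to, with probability ε, select from the first row of the target matrix the N operation sets corresponding to the N largest elements, and randomly select one of the N operation sets as the target operation set, where N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, select from the first row the operation set corresponding to the element with the largest value as the target operation set.
In an optional implementation, the determining unit is specifically configured to determine, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
In an optional implementation, the updating unit is specifically configured to update the target matrix using the following formula:
$$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
where the $Q(s_t, a_t)$ on the left side of the formula is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the target matrix is updated, α and γ are preset constants, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state.
In a third aspect, an embodiment of the present invention further provides a server, including a processor, a receiver, a transmitter, and a memory, where an executable program is stored in the memory, and the processor implements any of the method flows provided in the first aspect by executing the executable program.
In the embodiments of the present invention, a target matrix is constructed, and a preset policy selection mechanism selects a corresponding operation set according to the target matrix, where the elements of the first row of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm continuously optimizes the target matrix, and operation sets are selected according to the optimized target matrix, so that the indoor environment can quickly reach the target state.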
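Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings used in the embodiments or the background are described below.
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a household device learning method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of constructing a target matrix according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a household device learning method according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a server according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description of Embodiments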
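Refer to FIG. 1, which is a schematic diagram of a system architecture according to an embodiment of the present invention. The sensor in FIG. 1 may represent multiple sensors, such as a temperature sensor, a humidity sensor, and a light-intensity sensor, used to collect temperature, humidity, light intensity, wind speed, and so on. The sensors in FIG. 1 may be located in the environment adjustment device or installed in other equipment, and can upload the collected data to the server over a network. The server in FIG. 1 can communicate with the terminal device over the network. The terminal device in FIG. 1, such as a smartphone or tablet computer, can receive control instructions sent by the server and forward the received control instructions to the environment adjustment device. The environment adjustment device in FIG. 1 performs the corresponding operations according to the control instructions sent by the terminal device.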
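An embodiment of the present invention provides a household device learning method, which, as shown in FIG. 2, includes:
201. Construct a target matrix.
The elements of the first row of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state; a larger parameter value indicates a higher likelihood of adjusting the indoor environment from the first state to the target state; and each operation set includes at least one type of adjustment operation. The target matrix has at least one row. The indoor environment may also be the interior of a car, an aircraft, a ship, and so on. The first state is the state the indoor environment is currently in, for example (26 °C, 67%, strong), where the first parameter is the current indoor temperature, the second is the current indoor humidity, and the third is the current indoor wind speed. Indoor wind speed can be divided into three levels, weak, medium, and strong, according to the intensity of the air conditioner's fan speed. The target state may be a relatively ideal indoor environment state determined from outdoor environment parameters. Specifically, the target state can be determined from a correspondence between outdoor environment parameters and indoor environment parameters. For example, the outdoor environment parameters (16 °C, 37%) may correspond to the indoor environment parameters (26 °C, 47%), and the outdoor environment parameters (36 °C, 37%) may correspond to the indoor environment parameters (28 °C, 60%).
An operation set corresponds to a working state of the environment adjustment device. Taking an air conditioner as an example, its operation set may be (air-conditioner temperature, air-conditioner mode, air-conditioner fan speed). Air-conditioner modes may include cooling, dehumidifying, auto, fan-only, heating, and so on. The at least two selectable operation sets are the operation sets the environment adjustment device can currently select, and need not be limited to operation sets that can reach the target state. For example, if the first state is (28 °C, 60%, strong) and the target state is (22 °C, 50%, strong), the selectable operation sets may be (22 °C, dehumidify, strong), (21 °C, dehumidify, strong), (26 °C, cool, weak), and so on, where the operation set (26 °C, cool, weak) cannot bring the indoor environment to the target state. Alternatively, the selectable operation sets may be limited to those that can reach the target state, which reduces the number of selectable operation sets and improves adjustment efficiency. For example, with the same first state (28 °C, 60%, strong) and target state (22 °C, 50%, strong), the selectable operation sets may not include (26 °C, dehumidify, weak), because that operation set cannot bring the indoor environment to the target state.
The target matrix may be a Q matrix: the first state and the target state can be understood as members of a state set, the at least two selectable operation sets as an action set, and the parameter values as reward values. In the target matrix, rows represent different states and columns represent different operation sets; each element is the reward value, i.e. the Q value, for reaching the target state after executing the operation set of its column starting from the state of its row. For example, the element in the first row and first column is the reward value for reaching the target state by executing the first operation set in the first state.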
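202. Use a preset policy selection mechanism to determine, according to the target matrix, the target operation set to be selected, generate a corresponding control instruction, and send the control instruction to the environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set.
The environment adjustment device may be an air conditioner, an air purifier, a humidifier, a dehumidifier, and so on.
In an optional implementation, two methods for selecting the target operation set are provided, as follows: select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
or, with probability ε, select from the first row of the target matrix the N operation sets corresponding to the N largest elements, and randomly select one of the N operation sets as the target operation set, where N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, select from the first row the operation set corresponding to the element with the largest value as the target operation set.
The first method selects the operation set corresponding to the largest element in the first row of the target matrix. It is computationally simple, and when the target matrix is close to convergence it has a high probability of finding the best operation set.
The second method selects, with probability 1-ε, the operation set corresponding to the largest element in the first row as the target operation set, and, with probability ε, randomly selects one of the N operation sets as the target operation set. Because it has some probability of choosing an operation set whose parameter value is not the largest, it speeds up the search for a better operation set while the target matrix is still far from convergence.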
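The two selection methods above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the first row is given as a plain list of parameter values indexed by column, and it breaks ties by taking the first largest column.

```python
import random
from typing import Optional, Sequence


def select_greedy(row: Sequence[float]) -> int:
    """Method 1: column index of the largest parameter value in the first row."""
    return max(range(len(row)), key=lambda j: row[j])


def select_top_n_epsilon(row: Sequence[float], epsilon: float, n: int,
                         rng: Optional[random.Random] = None) -> int:
    """Method 2: with probability epsilon, pick uniformly among the N largest
    elements other than the maximum; with probability 1 - epsilon, pick the maximum."""
    if n <= 1:
        raise ValueError("N must be an integer greater than 1")
    if rng is None:
        rng = random.Random()
    best = select_greedy(row)
    if rng.random() < epsilon:
        # Rank every other column by value, largest first, and keep the top N.
        others = sorted((j for j in range(len(row)) if j != best),
                        key=lambda j: row[j], reverse=True)[:n]
        if others:  # a one-column row has no alternative to explore
            return rng.choice(others)
    return best
```

For example, with the row `[5.0, 1.0, 4.0, 3.0, 2.0]` and N = 2, exploration draws from columns 2 and 3, and exploitation returns column 0.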
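In this embodiment, the server may send the control instruction to the environment adjustment device through a terminal device such as a mobile phone. The terminal device may be bound to the environment adjustment device and send it the control instruction by, for example, emitting an infrared signal.
In this embodiment, two methods for selecting the target operation set are proposed; the appropriate method can be chosen according to how far the target matrix has converged, which speeds up finding a better operation set.
203. When it is determined that the indoor environment has not reached the target state, calculate a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state, and update the target matrix using the target value.
In an optional implementation, whether the indoor environment has reached the target state can be checked at a preset time interval, as follows: the determining that the indoor environment has not reached the target state includes:
determining, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
The preset time may be 15 minutes, 20 minutes, 30 minutes, and so on. For example, after sending the control instruction, the server starts a timer; when 20 minutes have elapsed, it obtains the current indoor environment parameters and determines whether the second state of the indoor environment has reached the target state.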
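The timed check described above might look like the sketch below. Several details are assumptions for illustration: the hypothetical `read_state` callback returns the sensor readings as numbers, and the hypothetical `tol` tolerance defaults to an exact match. The `sleep` function is injectable so the check can be tested without actually waiting.

```python
import time
from typing import Callable, Sequence, Tuple


def reached_after_delay(read_state: Callable[[], Sequence[float]],
                        target: Sequence[float],
                        delay_s: float,
                        tol: float = 0.0,
                        sleep: Callable[[float], None] = time.sleep) -> Tuple[bool, Tuple[float, ...]]:
    """Wait the preset time after a control instruction, then sample the second state.

    Returns (reached, second_state). `tol` is a hypothetical per-component
    tolerance; tol=0.0 means the state must equal the target exactly.
    """
    sleep(delay_s)  # e.g. 20 minutes = 1200 seconds
    second = tuple(read_state())  # second state read from the indoor sensors
    if len(second) != len(target):
        raise ValueError("state and target must have the same components")
    reached = all(abs(x - y) <= tol for x, y in zip(second, target))
    return reached, second
```

If `reached` is false, the returned second state is what feeds the target-value calculation and the matrix update in step 203.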
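In this embodiment, a failure of the indoor environment to reach the target state can be detected promptly, so that the working state of the environment adjustment device can be adjusted in time.
In the embodiments of the present invention, a target matrix is constructed, and a preset policy selection mechanism selects a corresponding operation set according to the target matrix, where the elements of the first row of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm continuously optimizes the target matrix, and a better operation set is determined according to the optimized target matrix, so that the indoor environment can quickly reach the target state.
In an optional implementation, the server obtains the target indoor environment parameter from the acquired outdoor environment parameter, as follows: before the constructing of the target matrix, the method further includes:
obtaining a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and
obtaining a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter characterizes the target state.
The server may obtain the first indoor environment parameter through sensors located indoors, and may obtain the outdoor environment parameter through sensors located outdoors or from another server. The target state may be a relatively ideal indoor environment state determined from the outdoor environment parameter. Specifically, the target state can be determined from a correspondence between outdoor and indoor environment parameters. This correspondence may be pre-stored on the server and may differ between users; it may also be determined by statistical analysis of multiple indoor environment parameters. For example, suppose that when the outdoor temperature is 36 °C and the humidity is 47%, the indoor environment is most often, or for the longest time, at a temperature of 26 °C and a humidity of 40%. Then the outdoor parameters (36 °C, 47%) are determined to correspond to the indoor parameters (26 °C, 40%). How the target indoor environment parameter is determined from the outdoor environment parameter is not the focus of this embodiment and is not detailed here.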
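The statistical correspondence in the example above can be sketched as a weighted tally. The history format is an assumption: a list of hypothetical `(outdoor_state, indoor_state, weight)` records. A weight of 1 per record counts occurrences ("most often"). A duration weight, such as minutes spent in that indoor state, implements "for the longest time".

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def build_target_table(history: Iterable[Tuple[Hashable, Hashable, float]]) -> Dict[Hashable, Hashable]:
    """Map each outdoor state to the indoor state with the largest accumulated weight.

    Each history entry is (outdoor_state, indoor_state, weight). Use weight=1 to
    count occurrences, or a duration in minutes to measure time spent.
    """
    totals: Dict[Hashable, Counter] = defaultdict(Counter)
    for outdoor, indoor, weight in history:
        totals[outdoor][indoor] += weight
    # most_common(1) returns [(indoor_state, total)] for the heaviest indoor state.
    return {outdoor: c.most_common(1)[0][0] for outdoor, c in totals.items()}
```

With two records of (36, 47) → (26, 40) and one of (36, 47) → (25, 45), counting by occurrence maps (36, 47) to (26, 40), which matches the example in the text.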
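In this embodiment, the target indoor environment parameter can be determined accurately to meet the needs of different users.
In an optional implementation, two methods for constructing the target matrix are provided, as follows: the constructing of the target matrix includes:
obtaining the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and constructing the target matrix;
or, determining the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and constructing the target matrix, where the closer the state specified by a selectable operation set is to the target state, the larger its corresponding parameter value.
The first method obtains, from target matrices already stored on the server, the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and constructs the target matrix.
The second method determines the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and constructs the target matrix. Each operation set includes at least one parameter that represents a final state. For example, in the operation set (26 °C, dehumidify, strong), 26 °C is the final temperature state of that operation set. As shown in FIG. 3, for instance, the current temperature is 18 °C and the target temperature is 21 °C. The operation sets in different columns have different temperatures: the first column's operation set has 17 °C, the second column's has 18 °C, and so on. The closer an operation set's temperature is to the target temperature, the larger its parameter value. The embodiments of the present invention may measure how close the state specified by an operation set is to the target state in various other ways, which are not limited here. For example, the parameter values of the at least two operation sets may be initialized according to a preset rule.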
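The second construction method can be sketched with one possible preset rule. The patent does not fix a formula. This sketch assumes the parameter value is the negated absolute difference between an operation set's set temperature and the target temperature, and it uses a hypothetical `(set temperature, mode, fan speed)` tuple encoding. Under that rule, the FIG. 3 example (target 21 °C, columns 17 °C, 18 °C, ...) ranks the 21 °C column highest.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of an operation set: (set temperature in °C, mode, fan speed).
OpSet = Tuple[float, str, str]


def init_first_row(op_sets: List[OpSet], target_temp: float) -> Dict[OpSet, float]:
    """Initial parameter values for the first row: the closer an operation set's
    set temperature is to the target temperature, the larger its value.

    Assumed rule: value = -|set temperature - target temperature|.
    """
    return {op: -abs(op[0] - target_temp) for op in op_sets}
```

Any rule that decreases monotonically with the distance to the target, for example 1 / (1 + distance), would satisfy the same ordering requirement.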
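In this embodiment, two methods for constructing the target matrix are provided, which can accelerate the convergence of the target matrix and reduce the time needed to reach the target state.
In an optional implementation, a method for updating the target matrix is provided, as follows: the updating of the target matrix using the target value includes:
updating the target matrix using the following formula:
$$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
where the $Q(s_t, a_t)$ on the left side of the formula is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the target matrix is updated, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state. α and γ are preset constants and can be set to different values for different problems.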
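The update formula is the standard tabular Q-learning rule. The sketch below applies it to a Q matrix stored as nested dictionaries, `q[state][operation_set]`. That storage layout is an assumption, as is the convention that a second state with no row yet contributes a maximum of 0. As a worked example, take α = 0.5, γ = 0.9, an old value of 0.5, R = 1 and a best next value of 0.8. The new value is 0.5 + 0.5 × (1 + 0.72 - 0.5) = 1.11.

```python
from typing import Dict, Hashable

# Q matrix as nested dicts: q[state][operation_set] -> parameter value.
QTable = Dict[Hashable, Dict[Hashable, float]]


def q_update(q: QTable, s_t: Hashable, a_t: Hashable, reward: float,
             s_next: Hashable, alpha: float, gamma: float) -> float:
    """Apply Q(s_t,a_t) <- Q(s_t,a_t) + alpha*(R + gamma*max_a Q(s_next,a) - Q(s_t,a_t))."""
    # Assumption: a second state with no row yet contributes max Q = 0.
    best_next = max(q.get(s_next, {}).values(), default=0.0)
    old = q[s_t][a_t]
    q[s_t][a_t] = old + alpha * (reward + gamma * best_next - old)
    return q[s_t][a_t]
```

The function mutates the table in place and returns the new value, which makes it easy to log each update.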
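In this embodiment, the convergence of the target matrix can be accelerated, reducing the time needed to reach the target state.
An embodiment of the present invention provides an application scenario, as follows. The user sends an indoor-environment adjustment instruction to the server through an application on a terminal device such as a mobile phone. After receiving the instruction, the server parses it to obtain the identification information of the terminal device; the terminal device corresponds to one environment adjustment device and one indoor environment. According to the identification information, the server obtains the user's current outdoor and indoor environment parameters and determines the corresponding target indoor environment parameters, i.e. the indoor environment parameters of the user's thermal comfort zone. The server uses a reinforcement learning algorithm to select an adjustment operation, generates a corresponding control instruction, and sends it to the terminal device. The terminal device sends the control instruction to the environment adjustment device, which performs the adjustment operation specified by the instruction. A preset time after sending the control instruction, the server detects the current state of the indoor environment, updates the target matrix (the Q matrix), and sends a new control instruction. The server keeps updating the target matrix until the indoor environment parameters equal the target indoor parameters.
An embodiment of the present invention proposes another household device learning method, which, as shown in FIG. 4, includes:
401. Obtain a first indoor environment parameter and an outdoor environment parameter.
The first indoor environment parameter characterizes the first state, and the first state is the initial indoor environment state.
402. Obtain a target indoor environment parameter corresponding to the outdoor environment parameter.
The target indoor environment parameter characterizes the target state.
403. Determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and construct the target matrix.
404. Use a preset policy selection mechanism to determine, according to the target matrix, the target operation set to be selected.
405. Generate a control instruction according to the target operation set, and send the control instruction to the environment adjustment device.
406. A preset time after sending the control instruction, determine that the second state in which the indoor environment is currently located has not reached the target state.
407. Calculate the target value corresponding to the target operation set.
408. Update the target matrix using the target value.
409. Store the target matrix.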
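Steps 401 to 409 can be tied together in a toy simulation, shown below as a sketch only. Several parts are hypothetical and not taken from the patent:

- A state is reduced to a whole-degree indoor temperature.
- An operation set is reduced to a single set point.
- `toy_environment` stands in for the real room: it moves 1 °C toward the set point per preset interval.
- The target value R is defined as the progress made toward the target.

The patent leaves the formula for R unspecified.

```python
import random
from typing import Dict, List

# Hypothetical toy encoding: a state is the indoor temperature in whole °C and an
# operation set is reduced to a single air-conditioner set temperature in °C.
State = int
Action = int
QTable = Dict[State, Dict[Action, float]]


def toy_environment(state: State, action: Action) -> State:
    """Hypothetical room model: after the preset time the room moves 1 °C toward the set point."""
    if action > state:
        return state + 1
    if action < state:
        return state - 1
    return state


def learn(first: State, target: State, actions: List[Action], episodes: int = 50,
          alpha: float = 0.5, gamma: float = 0.9, epsilon: float = 0.2,
          n: int = 2, max_steps: int = 20, seed: int = 0) -> QTable:
    rng = random.Random(seed)
    q: QTable = {}

    def row(s: State) -> Dict[Action, float]:
        # Step 403: rows are created on demand; closer set points start with larger values.
        if s not in q:
            q[s] = {a: -abs(a - target) for a in actions}
        return q[s]

    for _episode in range(episodes):
        s = first  # steps 401-402 are assumed done: first and target are given
        for _step in range(max_steps):
            values = row(s)
            best = max(values, key=values.get)
            # Step 404: greedy choice, or with probability epsilon one of the N next-best.
            if rng.random() < epsilon:
                others = sorted((a for a in values if a != best), key=values.get, reverse=True)[:n]
                a = rng.choice(others) if others else best
            else:
                a = best
            # Steps 405-406: send the instruction and read the second state after the preset time.
            s_next = toy_environment(s, a)
            if s_next == target:
                break  # target reached: no update, stop issuing instructions
            # Step 407: hypothetical target value R = progress made toward the target.
            reward = abs(s - target) - abs(s_next - target)
            # Step 408: Q-learning update of the element for (s, a).
            best_next = max(row(s_next).values())
            values[a] += alpha * (reward + gamma * best_next - values[a])
            s = s_next
    return q  # step 409: the caller persists the matrix
```

After training from 18 °C toward 21 °C with set points 17 to 25 °C, following the largest value in each row drives the toy room 18 → 19 → 20 → 21.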
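In the embodiments of the present invention, a target matrix is constructed, and a preset policy selection mechanism selects a corresponding operation set according to the target matrix, where the elements of the first row of the target matrix are the parameter values corresponding to at least two operation sets selectable for adjusting the indoor environment from the first state to the target state. A reinforcement learning algorithm continuously optimizes the target matrix, and a better operation set is determined according to the optimized target matrix, so that the indoor environment can quickly reach the target state and electric energy is saved.
An embodiment of the present invention provides a server, which, as shown in FIG. 5, includes:
a matrix construction unit 501, configured to construct a target matrix, where the elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state, a larger parameter value indicates a higher likelihood of adjusting the indoor environment from the first state to the target state, and each operation set includes at least one type of adjustment operation;
a determining unit 502, configured to use a preset policy selection mechanism to determine, according to the target matrix, a target operation set to be selected;
a generating unit 503, configured to generate a corresponding control instruction according to the target operation set, where the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set;
a sending unit 504, configured to send the control instruction to the environment adjustment device;
the determining unit 502 being further configured to determine that the indoor environment has not reached the target state, and further configured to determine that the indoor environment has reached the target state;
a calculating unit 505, configured to, when it is determined that the indoor environment has not reached the target state, calculate a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state; and
an updating unit 506, configured to update the target matrix using the target value.
The specific implementation is the same as the method in FIG. 2 and is not detailed here.
In an optional implementation, the server obtains the target indoor environment parameter from the acquired outdoor environment parameter, as follows: as shown in FIG. 6, the server further includes:
an acquiring unit 601, configured to obtain a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and to obtain a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter characterizes the target state.
In this embodiment, the target indoor environment parameter can be determined accurately to meet the needs of different users.
In an optional implementation, two methods for constructing the target matrix are provided, as follows: the matrix construction unit 501 is specifically configured to obtain the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix;
or, the matrix construction unit 501 is specifically configured to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and to construct the target matrix, where the closer the state specified by a selectable operation set is to the target state, the larger its corresponding parameter value.
In this embodiment, two methods for constructing the target matrix are provided, which can accelerate the convergence of the target matrix and reduce the time needed to reach the target state.
In an optional implementation, two methods for selecting the target operation set are provided, as follows: the determining unit 502 is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
or, the determining unit 502 is specifically configured to, with probability ε, select from the first row of the target matrix the N operation sets corresponding to the N largest elements, and randomly select one of the N operation sets as the target operation set, where N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, select from the first row the operation set corresponding to the element with the largest value as the target operation set.
In this embodiment, two methods for selecting the target operation set are proposed; the appropriate method can be chosen according to how far the target matrix has converged, which speeds up finding a better operation set.
In an optional implementation, whether the indoor environment has reached the target state can be checked at a preset time interval, as follows: the determining unit 502 is specifically configured to determine, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
In this embodiment, a failure of the indoor environment to reach the target state can be detected promptly, so that the working state of the environment adjustment device can be adjusted in time.
In an optional implementation, a method for updating the target matrix is provided, as follows: the updating unit 506 is specifically configured to update the target matrix using the following formula:
$$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
where the $Q(s_t, a_t)$ on the left side of the formula is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the target matrix is updated, α and γ are preset constants, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state.
In this embodiment, the convergence of the target matrix can be accelerated, reducing the time needed to reach the target state.
Refer to FIG. 7, which shows a server according to an embodiment of the present invention. The server includes a processor 701 (there may be one or more processors 701; one processor is used as an example in FIG. 7), a memory 702, a receiver 703, and a transmitter 704. In some embodiments of the present invention, the processor 701, the memory 702, the receiver 703, and the transmitter 704 may be connected by a bus or in another manner.
The memory 702 includes, but is not limited to, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), or portable read-only memory (CD-ROM), and is used to store related instructions and data. The memory 702 is also used to store the target matrix.
After reading the program code stored in the memory 702, the processor 701 in the server performs the following operations:
constructing a target matrix, where the elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state, a larger parameter value indicates a higher likelihood of adjusting the indoor environment from the first state to the target state, and each operation set includes at least one type of adjustment operation; using a preset policy selection mechanism to determine, according to the target matrix, a target operation set to be selected, generating a corresponding control instruction, and sending the control instruction to the environment adjustment device, where the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set; and, when it is determined that the indoor environment has not reached the target state, calculating a target value corresponding to the target operation set according to the second state in which the indoor environment is currently located, the first state, and the target state, and updating the target matrix using the target value.
The specific implementation is the same as the method in FIG. 2 and is not detailed here.
In an optional implementation, the server obtains the target indoor environment parameter from the acquired outdoor environment parameter, as follows: the processor 701 is further configured to, before constructing the target matrix, obtain a first indoor environment parameter and an outdoor environment parameter, where the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and to obtain a target indoor environment parameter corresponding to the outdoor environment parameter, where the target indoor environment parameter characterizes the target state.
In this embodiment, the target indoor environment parameter can be determined accurately to meet the needs of different users.
In an optional implementation, two methods for constructing the target matrix are provided, as follows: the processor 701 is specifically configured to obtain the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix; or, specifically configured to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and to construct the target matrix, where the closer the state specified by a selectable operation set is to the target state, the larger its corresponding parameter value.
In this embodiment, two methods for constructing the target matrix are provided, which can accelerate the convergence of the target matrix and reduce the time needed to reach the target state.
In an optional implementation, two methods for selecting the target operation set are provided, as follows: the processor 701 is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set; or, specifically configured to, with probability ε, select from the first row of the target matrix the N operation sets corresponding to the N largest elements, and randomly select one of the N operation sets as the target operation set, where N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, select from the first row the operation set corresponding to the element with the largest value as the target operation set.
In this embodiment, two methods for selecting the target operation set are proposed; the appropriate method can be chosen according to how far the target matrix has converged, which speeds up finding a better operation set.
In an optional implementation, whether the indoor environment has reached the target state can be checked at a preset time interval, as follows: the processor 701 is specifically configured to determine, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
In this embodiment, a failure of the indoor environment to reach the target state can be detected promptly, so that the working state of the environment adjustment device can be adjusted in time.
In an optional implementation, a method for updating the target matrix is provided, as follows: the processor 701 is specifically configured to update the target matrix using the following formula:
$$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
where the $Q(s_t, a_t)$ on the left side of the formula is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the target matrix is updated, α and γ are preset constants, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state.
In this embodiment, the convergence of the target matrix can be accelerated, reducing the time needed to reach the target state.
The foregoing descriptions are merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.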

Claims (12)

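  1. A household device learning method, comprising:
    constructing a target matrix, wherein the elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state, a larger parameter value indicates a higher likelihood of adjusting the indoor environment from the first state to the target state, and each operation set comprises at least one type of adjustment operation;
    using a preset policy selection mechanism to determine, according to the target matrix, a target operation set to be selected, generating a corresponding control instruction, and sending the control instruction to an environment adjustment device, wherein the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set; and
    when it is determined that the indoor environment has not reached the target state, calculating a target value corresponding to the target operation set according to a second state in which the indoor environment is currently located, the first state, and the target state, and updating the target matrix using the target value.
  2. The method according to claim 1, wherein before the constructing of the target matrix, the method further comprises:
    obtaining a first indoor environment parameter and an outdoor environment parameter, wherein the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and
    obtaining a target indoor environment parameter corresponding to the outdoor environment parameter, wherein the target indoor environment parameter characterizes the target state.
  3. The method according to claim 2, wherein the constructing of the target matrix comprises:
    obtaining the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and constructing the target matrix;
    or, determining the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and constructing the target matrix, wherein the closer the state specified by a selectable operation set is to the target state, the larger its corresponding parameter value.
  4. The method according to claim 3, wherein the using of a preset policy selection mechanism to determine, according to the target matrix, the target operation set to be selected comprises:
    selecting, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
    or, with probability ε, selecting from the first row of the target matrix the N operation sets corresponding to the N largest elements, and randomly selecting one of the N operation sets as the target operation set, wherein N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, selecting from the first row the operation set corresponding to the element with the largest value as the target operation set.
  5. The method according to claim 4, wherein the determining that the indoor environment has not reached the target state comprises:
    determining, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
  6. The method according to any one of claims 1 to 5, wherein the updating of the target matrix using the target value comprises:
    updating the target matrix using the following formula:
    $$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
    wherein the $Q(s_t, a_t)$ on the left side of the formula is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the target matrix is updated, α and γ are preset constants, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state.
  7. A server, comprising:
    a matrix construction unit, configured to construct a target matrix, wherein the elements of the first row of the target matrix are parameter values corresponding to at least two operation sets selectable for adjusting an indoor environment from a first state to a target state, a larger parameter value indicates a higher likelihood of adjusting the indoor environment from the first state to the target state, and each operation set comprises at least one type of adjustment operation;
    a determining unit, configured to use a preset policy selection mechanism to determine, according to the target matrix, a target operation set to be selected;
    a generating unit, configured to generate a corresponding control instruction according to the target operation set, wherein the control instruction instructs the environment adjustment device to perform the operations specified by the target operation set;
    a sending unit, configured to send the control instruction to the environment adjustment device;
    the determining unit being further configured to determine that the indoor environment has not reached the target state, and further configured to determine that the indoor environment has reached the target state;
    a calculating unit, configured to, when it is determined that the indoor environment has not reached the target state, calculate a target value corresponding to the target operation set according to a second state in which the indoor environment is currently located, the first state, and the target state; and
    an updating unit, configured to update the target matrix using the target value.
  8. The server according to claim 7, further comprising:
    an acquiring unit, configured to obtain a first indoor environment parameter and an outdoor environment parameter, wherein the first indoor environment parameter characterizes the first state and the first state is the initial indoor environment state; and to obtain a target indoor environment parameter corresponding to the outdoor environment parameter, wherein the target indoor environment parameter characterizes the target state.
  9. The server according to claim 8, wherein
    the matrix construction unit is specifically configured to obtain the parameter values corresponding to the at least two operation sets selectable for adjusting the indoor environment from the first state to the target state, and to construct the target matrix;
    or, the matrix construction unit is specifically configured to determine the parameter values corresponding to the at least two selectable operation sets according to the relationship between the at least two operation sets selectable in the first state and the target state, and to construct the target matrix, wherein the closer the state specified by a selectable operation set is to the target state, the larger its corresponding parameter value.
  10. The server according to claim 9, wherein
    the determining unit is specifically configured to select, from the first row of the target matrix, the operation set corresponding to the element with the largest value as the target operation set;
    or, the determining unit is specifically configured to, with probability ε, select from the first row of the target matrix the N operation sets corresponding to the N largest elements, and randomly select one of the N operation sets as the target operation set, wherein N is an integer greater than 1 and the N elements do not include the element with the largest value; and, with probability 1-ε, select from the first row the operation set corresponding to the element with the largest value as the target operation set.
  11. The server according to claim 10, wherein
    the determining unit is specifically configured to determine, a preset time after the control instruction is sent, that the second state in which the indoor environment is currently located has not reached the target state.
  12. The server according to any one of claims 7 to 11, wherein
    the updating unit is specifically configured to update the target matrix using the following formula:
    $$Q(s_t,a_t) = Q(s_t,a_t) + \alpha\left(R + \gamma \max_{a} Q(s_{t+1},a) - Q(s_t,a_t)\right)$$
    wherein the $Q(s_t, a_t)$ on the left side of the formula is the parameter value corresponding to the target operation set after the target matrix is updated, the $Q(s_t, a_t)$ on the right side is the parameter value corresponding to the target operation set before the target matrix is updated, α and γ are preset constants, R is the target value, and $\max_a Q(s_{t+1}, a)$ is the largest of the parameter values corresponding to all operation sets selectable in the second state.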
PCT/CN2017/085385 2017-05-22 2017-05-22 Household device learning method and server WO2018213999A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/085385 WO2018213999A1 (zh) 2017-05-22 2017-05-22 Household device learning method and server
CN201780003362.8A CN108419439B (zh) 2017-05-22 2017-05-22 Household device learning method and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/085385 WO2018213999A1 (zh) 2017-05-22 2017-05-22 Household device learning method and server

Publications (1)

Publication Number Publication Date
WO2018213999A1 (zh) 2018-11-29

Family

ID=63126496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/085385 WO2018213999A1 (zh) 2017-05-22 2017-05-22 Household device learning method and server

Country Status (2)

Country Link
CN (1) CN108419439B (zh)
WO (1) WO2018213999A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
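CN111505944A (zh) * 2019-01-30 2020-08-07 珠海格力电器股份有限公司 Energy-saving control strategy learning method, and method and device for implementing energy-saving control of an air conditioner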

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
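CN110925969B (zh) * 2019-10-17 2020-11-27 珠海格力电器股份有限公司 Air conditioner control method and device, electronic device, and storage medium
CN113834200A (zh) * 2021-11-26 2021-12-24 深圳市愚公科技有限公司 Air purifier adjustment method based on a reinforcement learning model, and air purifier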

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160048142A1 (en) * 2014-08-15 2016-02-18 Delta Electronics, Inc. Intelligent air-conditioning controlling system and intelligent controlling method for the same
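CN105737340A (zh) * 2016-03-09 2016-07-06 深圳微自然创新科技有限公司 Intelligent air-conditioner temperature control method and device
CN106247554A (zh) * 2016-08-16 2016-12-21 华南理工大学 Indoor environment control system and method based on human thermal adaptation and climate characteristics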

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103375869B (zh) * 2012-04-12 2015-12-02 珠海格力电器股份有限公司 Air conditioner control method and device, and air conditioner
JPWO2015111173A1 (ja) * 2014-01-23 2017-03-23 三菱電機株式会社 Controller for air conditioner
CN105091202B (zh) * 2014-05-16 2018-04-17 株式会社理光 Method and system for controlling multiple air-conditioning devices
CN105588251B (zh) * 2014-10-20 2018-10-02 株式会社理光 Method and device for controlling an air-conditioning system
US10571414B2 (en) * 2015-01-30 2020-02-25 Schneider Electric USA, Inc. Interior volume thermal modeling and control apparatuses, methods and systems
CN105387565B (zh) * 2015-11-24 2018-03-30 深圳市酷开网络科技有限公司 Method and device for adjusting temperature
CN105548959B (zh) * 2015-12-07 2017-10-17 电子科技大学 Multi-sensor multi-target localization method based on sparse reconstruction
CN106196423B (zh) * 2016-06-30 2018-08-24 西安建筑科技大学 Indoor environmental quality control optimization method based on model prediction
CN106302041A (zh) * 2016-08-05 2017-01-04 深圳博科智能科技有限公司 Smart home device control method and device
CN106294881A (zh) * 2016-08-30 2017-01-04 五八同城信息技术有限公司 Information identification method and device

Also Published As

Publication number Publication date
CN108419439B (zh) 2020-06-30
CN108419439A (zh) 2018-08-17

Similar Documents

Publication Publication Date Title
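US10584892B2 (en) Air-conditioning control method, air-conditioning control apparatus, and storage medium
KR102393418B1 (ko) Data learning server and method for generating and using a learning model thereof
CN106842968B (zh) Control method, device, and system
CN111121237B (zh) Air-conditioning equipment and control method thereof, server, and computer-readable storage medium
CN110895011B (zh) Air conditioner control method and device, storage medium, and air conditioner
JP6280569B2 (ja) Operation parameter value learning device, operation parameter value learning method, learning-type equipment control device, and program
WO2018213999A1 (zh) Household device learning method and server
CN110736248B (zh) Method and device for controlling air-conditioner outlet air temperature
CN111256307A (zh) Temperature control method, air-conditioning equipment, control device, and storage medium
CN111256325A (zh) Temperature control method, air-conditioning equipment, control device, and storage medium
CN113339965A (zh) Method and device for air-conditioner control, and air conditioner
JP7039148B2 (ja) Control system, equipment, remote controller, control method, and program
JP6777174B2 (ja) Server device, adapter, and air-conditioning system
CN105511279B (zh) Household appliance remote control method and system, household appliance, and server
TWI679384B (zh) Air purifier and network system
JP6941819B2 (ja) Method and control device for starting operation of an air conditioner
CN109323403A (zh) Air conditioner, control method and control device thereof, and electronic device
CN105241001A (zh) Parameter adjustment method and air conditioner
EP3779618B1 (en) Smart apparatus control method, apparatus, computer storage medium, and smart apparatus control apparatus
JP2016029917A (ja) Cultivation monitoring device, cultivation monitoring method, and cultivation monitoring program
CN110986327A (zh) Sleep mode control method for an air conditioner, and air conditioner
JP2021063611A (ja) Air-conditioning system
KR20160071094A (ko) Method and apparatus for controlling air conditioning of a vehicle
CN111108489A (zh) Server, information processing method, network system, and air purifier
WO2018214001A1 (zh) Environment adjustment method and server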

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17911204

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17911204

Country of ref document: EP

Kind code of ref document: A1