CN114004623A - Machine learning method and system - Google Patents

Machine learning method and system

Info

Publication number
CN114004623A
Authority
CN
China
Prior art keywords
model
machine learning
data
training
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010735878.5A
Other languages
Chinese (zh)
Inventor
吴安新
何其真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd
Priority to CN202010735878.5A
Publication of CN114004623A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a machine learning method comprising the following steps: acquiring training data from a training database and segmenting the training data into a plurality of data segments; distributing the plurality of data segments to a plurality of nodes respectively; and, at each node, loading a historical model, updating the model parameters of the historical model according to the received data segments, and performing model training and machine learning with the updated model. The application also discloses a machine learning system, an electronic device, and a computer-readable storage medium. In this way, an FM algorithm optimization based on streaming learning is provided: for each batch of new training data, the model is trained incrementally on top of the historical model to produce the next model, which effectively balances data scale against resource consumption and makes large-scale training samples manageable.

Description

Machine learning method and system
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a machine learning method, a machine learning system, an electronic device, and a computer-readable storage medium.
Background
The FM (Factorization Machine) algorithm is a common machine learning algorithm with automatic feature crossing, which solves the problem of feature combination in sparse data scenarios. Automatic feature crossing means that the model automatically learns interactions between features through implicit (latent) representations, without manually pairing up different features. Compared with a linear model (such as logistic regression), the algorithm achieves better performance, and it is therefore widely applied in major machine learning scenarios such as recommendation systems, advertisement computation, and search ranking.
In an advertisement system, the click-through rate is the ratio of the number of clicks to the number of impressions of an advertisement, and reflects how popular a recommended advertisement is. An internet advertisement system needs to recommend advertisements that are more likely to be clicked, based on the advertisement-click behavior of hundreds of millions of users. The users' advertisement-click behavior logs serve as model training data, but the data volume is very large, potentially reaching billions of training records per day. Existing FM algorithm optimization uses full-batch, multi-round iterative learning algorithms such as L-BFGS and OWL-QN; these algorithms consume a great deal of computing resources and cannot properly exploit the temporal order of the data, so they are not suitable for large-scale training samples.
It should be noted that the above-mentioned contents are not intended to limit the scope of protection of the application.
Disclosure of Invention
The present application mainly aims to provide a machine learning method, a machine learning system, an electronic device, and a computer-readable storage medium that solve the problem of how to provide an FM algorithm optimization method supporting large-scale training samples.
In order to achieve the above object, an embodiment of the present application provides a machine learning method, where the method includes:
acquiring training data from a training database, and segmenting the training data into a plurality of data segments;
distributing the plurality of data fragments to a plurality of nodes respectively; and
and each node loads a historical model, updates the model parameters of the historical model according to the received data fragments, and adopts the updated model to carry out model training and machine learning.
Optionally, the node includes a server and a work end, where the server is used to store the model fragments, and the work end is used to store the data fragments.
Optionally, the loading of a historical model by each node, updating model parameters of the historical model according to the received data fragments, and performing model training and machine learning by using the updated model includes:
the working end of each node reads the data fragments of the training data in parallel;
the working end acquires needed parameters from the server side of the node, and calculates the gradient of each parameter according to the read training data;
the server side loads a history model;
the server asynchronously updates the model parameters of the historical model according to the gradient to obtain an updated model;
and the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end.
Optionally, the segmenting the training data into a plurality of data segments comprises:
and obtaining node numbers by adopting a Hash modulo mode for the training data, wherein the training data with the same node number belong to the same data fragment.
Optionally, the historical model is a model obtained by a previous update.
Optionally, the step of asynchronously updating, by the server, the model parameter of the historical model according to the gradient to obtain an updated model includes:
and the server side of each node asynchronously updates model parameters by adopting an FTRL algorithm based on the loaded historical model according to the gradient received from the working end to obtain an updated model.
Optionally, the asynchronously updating, by the server, the model parameter of the historical model according to the gradient, and obtaining the updated model further includes:
and when the model parameters are updated, multiplying a variable in the FTRL algorithm by an attenuation coefficient to bias the weight of the model parameters to the latest data.
Optionally, the asynchronously updating, by the server, the model parameter of the historical model according to the gradient, and obtaining the updated model further includes:
judging the retention of model features according to a feature retirement time window, wherein the feature retirement time window is a set threshold on the time interval between each feature's update time and the current time; when the time interval between a feature's update time and the current time exceeds the threshold, the feature is judged to be an expired feature and is removed from the model.
Optionally, the asynchronously updating, by the server, the model parameter of the historical model according to the gradient, and obtaining the updated model further includes:
configuring an expansion threshold for the second-order parameters of model features, wherein the threshold is a preset value of a feature's occurrence frequency; when a feature's occurrence frequency is greater than the preset value and the feature's first-order parameter is non-zero, the feature's second-order parameters are expanded; and when a feature's first-order parameter weight is zeroed by sparse regularization during updating, the feature's expanded second-order parameters are automatically closed.
Optionally, the model is an automatic feature cross machine learning model.
In addition, to achieve the above object, an embodiment of the present application further provides a machine learning system, where the system includes:
the segmentation module is used for acquiring training data from a training database and segmenting the training data into a plurality of data segments;
a distribution module, configured to distribute the plurality of data fragments to a plurality of nodes respectively;
and the optimization module is used for loading a historical model on each node, updating model parameters of the historical model according to the received data fragments, and performing model training and machine learning by adopting the updated model.
In order to achieve the above object, an embodiment of the present application further provides an electronic device, including: a memory, a processor and a machine learning program stored on the memory and executable on the processor, the machine learning program when executed by the processor implementing the machine learning method as described above.
To achieve the above object, an embodiment of the present application further provides a computer-readable storage medium, on which a machine learning program is stored, and the machine learning program, when executed by a processor, implements the machine learning method as described above.
The machine learning method, system, electronic device, and computer-readable storage medium provided by the embodiments of the present application can provide an FM algorithm optimization based on streaming learning: for each batch of new training data, the model is trained incrementally on top of the historical model to produce the next model, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and handles large-scale training samples in a more lightweight way.
Drawings
FIG. 1 is a diagram of an application environment architecture in which various embodiments of the present application may be implemented;
FIG. 2 is a flowchart of a machine learning method according to a first embodiment of the present disclosure;
FIG. 3 is a detailed flowchart of step S204 in FIG. 2;
FIG. 4 is a schematic diagram of a streaming learning process of the present application;
FIG. 5 is a schematic diagram of a feature retirement mechanism of the present application;
fig. 6 is a schematic hardware architecture diagram of an electronic device according to a second embodiment of the present application;
fig. 7 is a block diagram of a machine learning system according to a third embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the descriptions relating to "first", "second", etc. in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of different embodiments may be combined with each other, provided the combination can be realized by a person skilled in the art; when technical solutions contradict each other or cannot be realized together, such a combination should be considered not to exist and does not fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a diagram illustrating an application environment architecture for implementing various embodiments of the present application. The application can be applied to application environments including, but not limited to, a training database 2 and a server 4.
The training database 2 is used for storing training data, such as record data of the advertisement clicking behavior of the users of the advertisement system. The training database 2 may be separately located in another server, and may be in data communication with the server 4 through a network, or may be located in the server 4.
The server 4 is used for providing the FM algorithm optimization based on the streaming learning, and performing model training and machine learning through the optimized FM algorithm according to the training data in the training database 2. The server 4 may be a rack server, a blade server, a tower server, a cabinet server, or other computing devices, may be an independent server, or may be a server cluster formed by a plurality of servers.
In this embodiment, the server 4 includes a plurality of nodes 40, and each node 40 includes a server 42 and a worker 44. The server 42 is used for storing the model fragments, updating the model parameters, and performing model training. The working end 44 is used for storing data fragments of training data and providing training samples for the service end 42.
Example one
Fig. 2 is a flowchart of a machine learning method according to a first embodiment of the present application. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. In the embodiment, the machine learning method is particularly suitable for optimizing the FM algorithm model, and large-scale training samples are dealt with based on streaming learning and incremental training. The model in this embodiment is an automatic feature cross machine learning model, specifically, an FM algorithm model.
The method comprises the following steps:
s200, acquiring training data from a training database, and dividing the training data into a plurality of data segments.
The training data is the data fed into the FM algorithm model for model training and machine learning, such as user advertisement-click behavior records in an advertisement system (a click is a positive sample; a non-click is a negative sample). It should be noted that, in this embodiment, the training data is newly added data: for each batch of newly added training data, the model is trained incrementally on top of the historical model and the next model is output, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and solves the large-scale sample problem in a more lightweight way. Moreover, because the data volume of a large-scale training sample is too large, the training data must be segmented so that it can be distributed to a plurality of nodes for processing.
In this embodiment, a node number is obtained by hashing the training data and taking a modulus; training data with the same node number belong to the same data shard and are subsequently distributed to the node with that number. For example, assume there are 40 nodes in total, numbered 0 to 39. The feature data used for training is converted from plaintext into a number, which is hashed and taken modulo 40 to obtain a node number between 0 and 39; the data is then distributed according to the node numbers obtained.
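As a minimal illustration of this sharding step, the following Python sketch hashes a plaintext feature key, takes the result modulo the node count, and groups records with the same node number into the same shard. The function names, the use of MD5, and the record layout are assumptions for illustration, not details taken from the application.

```python
import hashlib

NUM_NODES = 40  # e.g. nodes numbered 0 to 39, as in the example above

def node_id_for(feature_key: str, num_nodes: int = NUM_NODES) -> int:
    """Convert a plaintext key into a number, hash it, and take the modulus."""
    digest = hashlib.md5(feature_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def shard_training_data(records, num_nodes: int = NUM_NODES):
    """Group records that map to the same node number into one data shard."""
    shards = {i: [] for i in range(num_nodes)}
    for record in records:
        shards[node_id_for(record["key"], num_nodes)].append(record)
    return shards
```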
S202, distributing the data fragments to a plurality of nodes respectively.
In this embodiment, the data fragments are distributed to the nodes corresponding to the node numbers according to the node numbers obtained by performing hash modulo.
And S204, loading a historical model on each node, updating model parameters according to the received data fragments, and performing model training and machine learning by adopting the updated model.
The embodiment provides FM algorithm optimization based on streaming learning, also called incremental learning, that is, the model can be continuously updated iteratively on the basis of loading the historical model. The historical model is a model obtained by previous updating, for example, the historical model can be a model of the previous day, and the model of the previous day is loaded as a basis before the model is updated every day.
The FM algorithm can automatically learn the interactions between features, and the objective function of the model is:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

The first half of the equation, $w_0 + \sum_{i=1}^{n} w_i x_i$, represents linear regression; the second half, $\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$, is the cross term (feature combinations). Here $n$ is the number of features of the training sample, $x_i$ is the value of the $i$-th feature, and $w_0$, $w_i$, $v_i$, $v_j$ are model parameters. Among the model parameters, $w_0$ and $w_i$ are the first-order parameters of the FM algorithm: $w_0$ is an initial weight value, which can be understood as a bias term, and $w_i$ is the weight corresponding to feature $x_i$. $v_i$ and $v_j$ are the second-order parameters of the FM algorithm: each is a $K$-dimensional embedding vector (an embedding is a way to convert discrete variables into continuous variables), and $\langle v_i, v_j \rangle$ is the cross parameter between input features $x_i$ and $x_j$. For example, for a user's advertisement-click behavior record containing a click action (1), gender (male), city (Shanghai), and advertisement industry, the $v$ vectors include $v_{\text{male}}$, $v_{\text{Shanghai}}$, and the vector for the advertisement-industry value.
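To make the objective function above concrete, the following sketch computes the FM score for one sample using the standard O(nK) reformulation of the pairwise term. The function name and the use of NumPy are illustrative assumptions, not part of the application.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM score: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.
    x: (n,) feature values; w0: bias; w: (n,) first-order weights;
    V: (n, K) second-order embedding vectors."""
    linear = w0 + np.dot(w, x)
    # Pairwise term via the identity
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
    s = V.T @ x                    # (K,)
    s_sq = (V ** 2).T @ (x ** 2)   # (K,)
    return linear + 0.5 * np.sum(s ** 2 - s_sq)
```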
Each node comprises both a server and a working end. The server stores the model shards of the FM algorithm (the model data can be sharded in the same way as the training data), and the working end stores the received data shards of the training data. All nodes read the training data in parallel through their working ends, acquire the required parameters (such as w and v) from the server, compute the gradient of each parameter from the training data, and send the computed gradients to the server. After receiving the gradients computed by the working end, the server asynchronously updates the model parameters according to the gradients. A gradient is the partial derivative of the loss function in a machine learning algorithm; it indicates in which direction a model parameter should be adjusted to minimize the error of the objective function. Each model parameter has its own gradient, and the gradient of each current parameter can be computed from the training data.
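As a sketch of the gradient computation performed at the working end, the snippet below assumes a logistic loss for click prediction (y is 1 for a click, 0 otherwise) and reuses the fm_predict function sketched above; the loss choice and names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fm_gradients(x, y, w0, w, V):
    """Per-sample gradients of the logistic loss for the FM model
    (fm_predict is the scoring function sketched above)."""
    residual = sigmoid(fm_predict(x, w0, w, V)) - y
    grad_w0 = residual
    grad_w = residual * x
    s = V.T @ x  # (K,): sum_j v_{j,f} x_j
    # d(score)/d v_{i,f} = x_i * (s_f - v_{i,f} * x_i)
    grad_V = residual * (np.outer(x, s) - (x ** 2)[:, None] * V)
    return grad_w0, grad_w, grad_V

# A working end would pull (w, V) for the features present in its data shard,
# call fm_gradients on each sample, and push the resulting gradients to the
# server, which applies them asynchronously.
```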
Further referring to fig. 3, a detailed flow chart of the step S204 is shown. In this embodiment, the step S204 specifically includes:
and S2040, reading the data fragments of the training data in parallel by the working end of each node.
And after the training data is segmented into a plurality of data segments, distributing the data segments to a plurality of nodes according to node numbers. And the working end of each node reads the data fragment corresponding to the node in parallel.
S2042, the working end obtains the needed parameters from the service end of the node, and the gradient of each parameter is calculated according to the read training data.
In this embodiment, the FTRL (Follow-the-regularized-Leader) algorithm is used for model optimization, and before updating the model parameters, the gradients of the parameters need to be calculated. And the working end of each node acquires model parameters (such as w and v) needing to be updated from the service end of the node, and then calculates the gradient of each parameter according to the training data (data shards).
S2044, the server side loads a history model.
The embodiment provides the FM algorithm optimization based on the streaming learning, and the iterative updating is continued on the basis of loading the historical model. Fig. 4 is a schematic diagram of the process of the streaming learning. In this embodiment, the historical model may be a previous day model, and before the model is updated every day, the server loads the previous day model as a basis, then continuously iteratively updates the model according to new training data on the basis, and trains by using the updated model.
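A minimal sketch of this daily streaming loop is shown below; the file layout, pickle serialization, and function names are assumptions for illustration only.

```python
import datetime
import pathlib
import pickle

MODEL_DIR = pathlib.Path("models")  # assumed layout: one serialized model per day

def daily_incremental_update(train_on_new_data):
    """Load the previous day's model, continue training on the new data only,
    and store the result as today's model."""
    today = datetime.date.today()
    yesterday = today - datetime.timedelta(days=1)
    with open(MODEL_DIR / f"{yesterday}.pkl", "rb") as f:
        model = pickle.load(f)           # historical model as the starting point
    model = train_on_new_data(model)     # incremental training, no full reload
    with open(MODEL_DIR / f"{today}.pkl", "wb") as f:
        pickle.dump(model, f)
    return model
```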
S2046, the server asynchronously updates the model parameters of the historical model according to the gradient to obtain an updated model.
And after calculating the gradient of each parameter, the working end sends the gradient to the corresponding server. And the server side of each node asynchronously updates model parameters based on the loaded historical model according to the received gradient to obtain an updated model. And the updated model is used as the model of the current day and is used for carrying out model training and machine learning according to the training data.
Optionally, when the server performs a model update, the weight of the model parameters can be biased toward the latest data by multiplying a variable in the FTRL algorithm by an attenuation coefficient. That is, each time the model parameters are updated according to the gradients, the variable in the FTRL algorithm is multiplied by an attenuation coefficient; older data is therefore multiplied by the attenuation coefficient more times (once per update) and carries a lower proportion in the updated model, so the weight of the whole model is biased toward the newest data and the timeliness of the data is captured better.
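A per-coordinate sketch of an FTRL-Proximal update with such an attenuation coefficient is shown below. Decaying the accumulators z and n is one possible reading of "a variable in the FTRL algorithm"; the hyperparameter values are placeholders.

```python
import math

class FTRLCoordinate:
    """Per-coordinate FTRL-Proximal state with an attenuation (decay) coefficient."""
    def __init__(self, alpha=0.05, beta=1.0, l1=1.0, l2=1.0, decay=0.999):
        self.alpha, self.beta, self.l1, self.l2, self.decay = alpha, beta, l1, l2, decay
        self.z = 0.0  # accumulated adjusted gradients
        self.n = 0.0  # accumulated squared gradients

    def weight(self):
        if abs(self.z) <= self.l1:
            return 0.0  # sparse (L1) regularization zeroes the weight
        return -(self.z - math.copysign(self.l1, self.z)) / (
            (self.beta + math.sqrt(self.n)) / self.alpha + self.l2)

    def update(self, g):
        # Multiplying the accumulators by the decay on every update means older
        # data is decayed more times, biasing the weight toward the newest data.
        self.z *= self.decay
        self.n *= self.decay
        w = self.weight()
        sigma = (math.sqrt(self.n + g * g) - math.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w
        self.n += g * g
```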
Optionally, this embodiment can also determine the retention of the features (variables) of the FM algorithm according to a feature retirement time window. That is, whether each feature persists is determined by the time interval between the feature's most recent update time and the current time. Fig. 5 is a schematic diagram of the feature retirement mechanism. Features whose time interval exceeds a set threshold (i.e., falls outside the feature retirement time window) are considered expired features and are removed from the model. For example, the first feature in Fig. 5 has expired and needs to be removed, while the other features, which have not expired, are retained in the model. By flexibly configuring the feature expiration window, this embodiment lets features that have not been updated for a long time retire automatically, keeping the size of the feature space stable.
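A sketch of this feature retirement check follows; the time-to-live value and the dictionary layout are assumptions for illustration.

```python
import time

FEATURE_TTL_SECONDS = 7 * 24 * 3600  # assumed retirement window, e.g. 7 days

def evict_expired_features(model_params, last_update_time, ttl=FEATURE_TTL_SECONDS):
    """Remove features whose last update is older than the retirement window,
    keeping the size of the feature space stable."""
    now = time.time()
    expired = [f for f, t in last_update_time.items() if now - t > ttl]
    for f in expired:
        model_params.pop(f, None)
        last_update_time.pop(f, None)
    return expired
```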
Optionally, in this embodiment, an expansion threshold for the second-order parameters of a feature of the FM algorithm (a preset value of the feature's occurrence frequency) may also be configured to implement an intelligent parameter-expansion mechanism. When a feature's occurrence frequency is greater than the preset value and the feature's corresponding first-order parameter is non-zero, the feature's second-order parameters are expanded and participate in model training and updating. When a feature's first-order parameter weight is zeroed by sparse regularization during updating, the expanded second-order parameters are automatically closed and no longer participate in model training and updating, which saves memory and computing resources (by reducing the number of parameters).
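A sketch of this parameter-expansion gate is given below; the threshold value and the data structures are illustrative assumptions.

```python
EXPANSION_THRESHOLD = 1000  # assumed preset value for a feature's occurrence frequency

def should_expand_second_order(frequency, first_order_weight,
                               threshold=EXPANSION_THRESHOLD):
    """Expand a feature's K-dimensional second-order (embedding) parameters only
    when it occurs often enough and its first-order weight is non-zero."""
    return frequency > threshold and first_order_weight != 0.0

def maybe_close_second_order(feature, first_order, embeddings):
    """If sparse regularization has zeroed the first-order weight, drop the
    expanded embedding so it no longer takes part in training or updating."""
    if first_order.get(feature, 0.0) == 0.0:
        embeddings.pop(feature, None)
```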
And S2048, the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end.
And after the updated latest model is obtained, the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end, and outputs the final result. The specific process of model training and machine learning, i.e. the process of the existing FM algorithm, is not described herein again. When the next model updating (next day) is carried out, the updated model is loaded first, and the iterative updating is continued on the basis of the updated model.
The machine learning method provided by this embodiment offers an FM algorithm optimization based on streaming learning: for each batch of new training data, the model is trained incrementally on top of the historical model and the next model is output, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and handles large-scale training samples (supporting billion-scale data training) in a more lightweight way. In addition, this embodiment supports flexible configuration of a feature expiration retirement mechanism and an intelligent parameter-expansion mechanism, saving storage and training resources, and introduces a parameter attenuation mechanism that biases the weight of the whole model toward the latest data, so the timeliness of the data is captured better.
Example two
Fig. 6 is a schematic diagram of the hardware architecture of an electronic device 20 according to a second embodiment of the present application. In this embodiment, the electronic device 20 may include, but is not limited to, a memory 21, a processor 22, and a network interface 23, which are communicatively connected to each other through a system bus. It is noted that Fig. 6 only shows the electronic device 20 with components 21-23, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. In this embodiment, the electronic device 20 may be the server 4.
The memory 21 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 21 may be an internal storage unit of the electronic device 20, such as a hard disk or a memory of the electronic device 20. In other embodiments, the memory 21 may also be an external storage device of the electronic apparatus 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the electronic apparatus 20. Of course, the memory 21 may also include both an internal storage unit and an external storage device of the electronic apparatus 20. In this embodiment, the memory 21 is generally used for storing an operating system installed in the electronic device 20 and various application software, such as program codes of the machine learning system 60. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 22 is generally used to control the overall operation of the electronic device 20. In this embodiment, the processor 22 is configured to execute the program codes stored in the memory 21 or process data, such as executing the machine learning system 60.
The network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used for establishing a communication connection between the electronic apparatus 20 and other electronic devices.
EXAMPLE III
Fig. 7 is a block diagram of a machine learning system 60 according to a third embodiment of the present disclosure. The machine learning system 60 may be partitioned into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the embodiments of the present application. The program modules referred to in the embodiments of the present application are series of computer program instruction segments capable of performing specific functions; the following description specifically describes the function of each program module in this embodiment.
In the present embodiment, the machine learning system 60 includes:
the segmentation module 600 is configured to obtain training data from a training database, and segment the training data into a plurality of data segments.
The training data is the data fed into the FM algorithm model for model training and machine learning, such as user advertisement-click behavior records in an advertisement system (a click is a positive sample; a non-click is a negative sample). It should be noted that, in this embodiment, the training data is newly added data: for each batch of newly added training data, the model is trained incrementally on top of the historical model and the next model is output, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and solves the large-scale sample problem in a more lightweight way. Moreover, because the data volume of a large-scale training sample is too large, the training data must be segmented so that it can be distributed to a plurality of nodes for processing.
In this embodiment, a node number is obtained by hashing the training data and taking a modulus; training data with the same node number belong to the same data shard and are subsequently distributed to the node with that number. For example, assume there are 40 nodes in total, numbered 0 to 39. The feature data used for training is converted from plaintext into a number, which is hashed and taken modulo 40 to obtain a node number between 0 and 39; the data is then distributed according to the node numbers obtained.
A distributing module 602, configured to distribute the multiple data fragments to multiple nodes respectively.
In this embodiment, the data fragments are distributed to the nodes corresponding to the node numbers according to the node numbers obtained by performing hash modulo.
And the optimization module 604 is configured to load a historical model on each node, update model parameters according to the received data segments, and perform model training and machine learning by using the updated model.
This embodiment provides FM algorithm optimization based on streaming learning, also called incremental learning; that is, the model is continuously and iteratively updated on the basis of a loaded historical model. The historical model is the model obtained by the previous update; for example, it may be the previous day's model, which is loaded as the basis before the model is updated each day.
The FM algorithm can automatically learn the interactions between features, and the objective function of the model is:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

The first half of the equation, $w_0 + \sum_{i=1}^{n} w_i x_i$, represents linear regression; the second half, $\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$, is the cross term (feature combinations). Here $n$ is the number of features of the training sample, $x_i$ is the value of the $i$-th feature, and $w_0$, $w_i$, $v_i$, $v_j$ are model parameters. Among the model parameters, $w_0$ and $w_i$ are the first-order parameters of the FM algorithm: $w_0$ is an initial weight value, which can be understood as a bias term, and $w_i$ is the weight corresponding to feature $x_i$. $v_i$ and $v_j$ are the second-order parameters of the FM algorithm: each is a $K$-dimensional embedding vector, and $\langle v_i, v_j \rangle$ is the cross parameter between input features $x_i$ and $x_j$. For example, for a user's advertisement-click behavior record containing a click action (1), gender (male), city (Shanghai), and advertisement industry, the $v$ vectors include $v_{\text{male}}$, $v_{\text{Shanghai}}$, and the vector for the advertisement-industry value.
Each node comprises both a server and a working end. The server stores the model shards of the FM algorithm (the model data can be sharded in the same way as the training data), and the working end stores the received data shards of the training data. All nodes read the training data in parallel through their working ends, acquire the required parameters (such as w and v) from the server, compute the gradient of each parameter from the training data, and send the computed gradients to the server. After receiving the gradients computed by the working end, the server asynchronously updates the model parameters according to the gradients. A gradient is the partial derivative of the loss function in a machine learning algorithm; it indicates in which direction a model parameter should be adjusted to minimize the error of the objective function. Each model parameter has its own gradient, and the gradient of each current parameter can be computed from the training data.
In this embodiment, the specific process of the optimization module 604 for implementing the above functions includes:
(1) and the working end of each node reads the data fragments of the training data in parallel.
And after the training data is segmented into a plurality of data segments, distributing the data segments to a plurality of nodes according to node numbers. And the working end of each node reads the data fragment corresponding to the node in parallel.
(2) And the working end acquires the required parameters from the service end of the node and calculates the gradient of each parameter according to the read training data.
In this embodiment, the FTRL algorithm is used to perform model optimization, and before updating the model parameters, the gradient of each parameter needs to be calculated. And the working end of each node acquires model parameters (such as w and v) needing to be updated from the service end of the node, and then calculates the gradient of each parameter according to the training data (data shards).
(3) And the server loads a history model.
The embodiment provides the FM algorithm optimization based on the streaming learning, and the iterative updating is continued on the basis of loading the historical model. Fig. 4 is a schematic diagram of the process of the streaming learning. In this embodiment, the historical model may be a previous day model, and before the model is updated every day, the server loads the previous day model as a basis, then continuously iteratively updates the model according to new training data on the basis, and trains by using the updated model.
(4) And the server asynchronously updates the model parameters of the historical model according to the gradient to obtain an updated model.
And after calculating the gradient of each parameter, the working end sends the gradient to the corresponding server. And the server side of each node asynchronously updates model parameters based on the loaded historical model according to the received gradient to obtain an updated model. And the updated model is used as the model of the current day and is used for carrying out model training and machine learning according to the training data.
Optionally, when the server performs a model update, the weight of the model parameters can be biased toward the latest data by multiplying a variable in the FTRL algorithm by an attenuation coefficient. That is, each time the model parameters are updated according to the gradients, the variable in the FTRL algorithm is multiplied by an attenuation coefficient; older data is therefore multiplied by the attenuation coefficient more times (once per update) and carries a lower proportion in the updated model, so the weight of the whole model is biased toward the newest data and the timeliness of the data is captured better.
Optionally, this embodiment can also determine the retention of the features (variables) of the FM algorithm according to a feature retirement time window. That is, whether each feature persists is determined by the time interval between the feature's most recent update time and the current time. Fig. 5 is a schematic diagram of the feature retirement mechanism. Features whose time interval exceeds a set threshold (i.e., falls outside the feature retirement time window) are considered expired features and are removed from the model. For example, the first feature in Fig. 5 has expired and needs to be removed, while the other features, which have not expired, are retained in the model. By flexibly configuring the feature expiration window, this embodiment lets features that have not been updated for a long time retire automatically, keeping the size of the feature space stable.
Optionally, in this embodiment, an expansion threshold for the second-order parameters of a feature of the FM algorithm (a preset value of the feature's occurrence frequency) may also be configured to implement an intelligent parameter-expansion mechanism. When a feature's occurrence frequency is greater than the preset value and the feature's corresponding first-order parameter is non-zero, the feature's second-order parameters are expanded and participate in model training and updating. When a feature's first-order parameter weight is zeroed by sparse regularization during updating, the expanded second-order parameters are automatically closed and no longer participate in model training and updating, which saves memory and computing resources (by reducing the number of parameters).
(5) And the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end.
And after the updated latest model is obtained, the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end, and outputs the final result. The specific process of model training and machine learning, i.e. the process of the existing FM algorithm, is not described herein again. When the next model updating (next day) is carried out, the updated model is loaded first, and the iterative updating is continued on the basis of the updated model.
The machine learning system provided by this embodiment offers an FM algorithm optimization based on streaming learning: for each batch of new training data, the model is trained incrementally on top of the historical model and the next model is output, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and handles large-scale training samples (supporting billion-scale data training) in a more lightweight way. In addition, this embodiment supports flexible configuration of a feature expiration retirement mechanism and an intelligent parameter-expansion mechanism, saving storage and training resources, and introduces a parameter attenuation mechanism that biases the weight of the whole model toward the latest data, so the timeliness of the data is captured better.
Example four
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing a machine learning program, which is executable by at least one processor to cause the at least one processor to perform the steps of the machine learning method as described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different from that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications that can be made by the use of the equivalent structures or equivalent processes in the specification and drawings of the present application or that can be directly or indirectly applied to other related technologies are also included in the scope of the present application.

Claims (13)

1. A machine learning method, the method comprising:
acquiring training data from a training database, and segmenting the training data into a plurality of data segments;
distributing the plurality of data fragments to a plurality of nodes respectively; and
and each node loads a historical model, updates the model parameters of the historical model according to the received data fragments, and adopts the updated model to carry out model training and machine learning.
2. The machine learning method of claim 1, wherein the node comprises a server and a worker, the server is configured to store the model shards, and the worker is configured to store the data shards.
3. The machine learning method according to claim 2, wherein each of the nodes loads a history model, updates model parameters of the history model according to the received data slice, and performs model training and machine learning using the updated model comprises:
the working end of each node reads the data fragments of the training data in parallel;
the working end acquires needed parameters from the server side of the node, and calculates the gradient of each parameter according to the read training data;
the server side loads a history model;
the server asynchronously updates the model parameters of the historical model according to the gradient to obtain an updated model;
and the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end.
4. The machine learning method of claim 1, wherein the slicing the training data into a plurality of data slices comprises:
and obtaining node numbers by adopting a Hash modulo mode for the training data, wherein the training data with the same node number belong to the same data fragment.
5. A machine learning method as claimed in claim 2 or 3 in which the historical model is a model from a previous update.
6. The machine learning method of claim 3, wherein the server asynchronously updates the model parameters of the historical model according to the gradient, and obtaining the updated model comprises:
and the server side of each node asynchronously updates model parameters by adopting an FTRL algorithm based on the loaded historical model according to the gradient received from the working end to obtain an updated model.
7. The machine learning method of claim 6, wherein the server asynchronously updates the model parameters of the historical model according to the gradient, and obtaining the updated model further comprises:
and when the model parameters are updated, multiplying a variable in the FTRL algorithm by an attenuation coefficient to bias the weight of the model parameters to the latest data.
8. The machine learning method of claim 6, wherein the server asynchronously updates the model parameters of the historical model according to the gradient, and obtaining the updated model further comprises:
judging the retention of model features according to a feature retirement time window, wherein the feature retirement time window is a set threshold on the time interval between each feature's update time and the current time; when the time interval between a feature's update time and the current time exceeds the threshold, the feature is judged to be an expired feature and is removed from the model.
9. The machine learning method of claim 6, wherein the server asynchronously updates the model parameters of the historical model according to the gradient, and obtaining the updated model further comprises:
configuring an expansion threshold for the second-order parameters of model features, wherein the threshold is a preset value of a feature's occurrence frequency; when a feature's occurrence frequency is greater than the preset value and the feature's first-order parameter is non-zero, the feature's second-order parameters are expanded; and when a feature's first-order parameter weight is zeroed by sparse regularization during updating, the feature's expanded second-order parameters are automatically closed.
10. The machine learning method of claim 1, wherein the model is an automatic feature-cross machine learning model.
11. A machine learning system, the system comprising:
the segmentation module is used for acquiring training data from a training database and segmenting the training data into a plurality of data segments;
a distribution module, configured to distribute the plurality of data fragments to a plurality of nodes respectively;
and the optimization module is used for loading a historical model on each node, updating model parameters of the historical model according to the received data fragments, and performing model training and machine learning by adopting the updated model.
12. An electronic device, comprising: a memory, a processor, and a machine learning program stored on the memory and executable on the processor, the machine learning program when executed by the processor implementing the machine learning method of any one of claims 1-10.
13. A computer-readable storage medium, having stored thereon a machine learning program which, when executed by a processor, implements a machine learning method according to any one of claims 1-10.
CN202010735878.5A 2020-07-28 2020-07-28 Machine learning method and system Pending CN114004623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010735878.5A CN114004623A (en) 2020-07-28 2020-07-28 Machine learning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010735878.5A CN114004623A (en) 2020-07-28 2020-07-28 Machine learning method and system

Publications (1)

Publication Number Publication Date
CN114004623A true CN114004623A (en) 2022-02-01

Family

ID=79920327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010735878.5A Pending CN114004623A (en) 2020-07-28 2020-07-28 Machine learning method and system

Country Status (1)

Country Link
CN (1) CN114004623A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676141A (en) * 2022-03-31 2022-06-28 北京泰迪熊移动科技有限公司 Data processing method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN108536650B (en) Method and device for generating gradient lifting tree model
TW202139045A (en) Privacy protection-based target service model determination
CN108733508B (en) Method and system for controlling data backup
CN108733639B (en) Configuration parameter adjustment method and device, terminal equipment and storage medium
CN112052151A (en) Fault root cause analysis method, device, equipment and storage medium
CN103502899A (en) Dynamic predictive modeling platform
CN110825966A (en) Information recommendation method and device, recommendation server and storage medium
CN111368887B (en) Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
CN113468199B (en) Index updating method and system
WO2015040806A1 (en) Hierarchical latent variable model estimation device, hierarchical latent variable model estimation method, supply amount prediction device, supply amount prediction method, and recording medium
CN112686418A (en) Method and device for predicting performance timeliness
CN114004623A (en) Machine learning method and system
CN108595685B (en) Data processing method and device
CN110704699A (en) Data image construction method and device, computer equipment and storage medium
CN112231299B (en) Method and device for dynamically adjusting feature library
CN111783883A (en) Abnormal data detection method and device
CN110084455B (en) Data processing method, device and system
CN115393100A (en) Resource recommendation method and device
CN112231590B (en) Content recommendation method, system, computer device and storage medium
CN114339689A (en) Internet of things machine card binding pool control method and device and related medium
CN114116744A (en) Method, device and equipment for updating pull chain table and storage medium
CN112836827A (en) Model training method and device and computer equipment
CN112784165A (en) Training method of incidence relation estimation model and method for estimating file popularity
CN111026863A (en) Customer behavior prediction method, apparatus, device and medium
CN110968773A (en) Application recommendation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination