CN114004623A - Machine learning method and system - Google Patents

Machine learning method and system

Info

Publication number
CN114004623A
Authority
CN
China
Prior art keywords
model
machine learning
data
training
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010735878.5A
Other languages
Chinese (zh)
Inventor
吴安新
何其真
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bilibili Technology Co Ltd
Original Assignee
Shanghai Bilibili Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bilibili Technology Co Ltd
Priority to CN202010735878.5A
Publication of CN114004623A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0242 Determining effectiveness of advertisements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Game Theory and Decision Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Mathematical Physics (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The application discloses a machine learning method comprising the following steps: acquiring training data from a training database and segmenting the training data into a plurality of data segments; distributing the plurality of data segments to a plurality of nodes respectively; and, at each node, loading a historical model, updating the model parameters of the historical model according to the received data segments, and performing model training and machine learning with the updated model. The application also discloses a machine learning system, an electronic device, and a computer-readable storage medium. In this way, an FM algorithm optimization based on streaming learning is provided: for each batch of new training data, the model is trained incrementally on top of the historical model to produce the next model, which effectively balances data scale against resource consumption and makes large-scale training samples manageable.

Description

Machine learning method and system
Technical Field
The present application relates to the field of machine learning technologies, and in particular, to a machine learning method, a machine learning system, an electronic device, and a computer-readable storage medium.
Background
The FM (Factorization Machine) algorithm is a common machine learning algorithm with automatic feature crossing, which solves the problem of feature combination in sparse data scenarios. Automatic feature crossing means that the model automatically learns interactions between features through implicit (latent) representations, without manually pairing up different features. Compared with a linear model (such as logistic regression), the algorithm achieves better performance, and it is therefore widely applied in major machine learning scenarios such as recommendation systems, advertisement computation, and search ranking.
In an advertisement system, the click-through rate is the ratio of the number of clicks to the number of impressions of an advertisement, and reflects how popular a recommended advertisement is. An internet advertisement system needs to recommend advertisements that are more likely to be clicked, based on the advertisement-click behavior of hundreds of millions of users. The users' advertisement-click behavior logs serve as model training data, but the data volume is very large, potentially reaching billions of training records per day. Existing FM algorithm optimization uses full-batch, multi-round iterative learning algorithms such as L-BFGS and OWL-QN; these algorithms consume a great deal of computing resources and cannot properly exploit the temporal order of the data, so they are not suitable for large-scale training samples.
It should be noted that the above-mentioned contents are not intended to limit the scope of protection of the application.
Disclosure of Invention
The present application mainly aims to provide a machine learning method, a machine learning system, an electronic device, and a computer-readable storage medium that solve the problem of how to provide an FM algorithm optimization method supporting large-scale training samples.
In order to achieve the above object, an embodiment of the present application provides a machine learning method, where the method includes:
acquiring training data from a training database, and segmenting the training data into a plurality of data segments;
distributing the plurality of data fragments to a plurality of nodes respectively; and
and each node loads a historical model, updates the model parameters of the historical model according to the received data fragments, and adopts the updated model to carry out model training and machine learning.
Optionally, the node includes a server and a work end, where the server is used to store the model fragments, and the work end is used to store the data fragments.
Optionally, the loading of a historical model by each node, updating model parameters of the historical model according to the received data fragments, and performing model training and machine learning by using the updated model includes:
the working end of each node reads the data fragments of the training data in parallel;
the working end acquires needed parameters from the server side of the node, and calculates the gradient of each parameter according to the read training data;
the server side loads a history model;
the server asynchronously updates the model parameters of the historical model according to the gradient to obtain an updated model;
and the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end.
Optionally, the segmenting the training data into a plurality of data segments comprises:
and obtaining node numbers by adopting a Hash modulo mode for the training data, wherein the training data with the same node number belong to the same data fragment.
Optionally, the historical model is a model obtained by a previous update.
Optionally, the step of asynchronously updating, by the server, the model parameter of the historical model according to the gradient to obtain an updated model includes:
and the server side of each node asynchronously updates model parameters by adopting an FTRL algorithm based on the loaded historical model according to the gradient received from the working end to obtain an updated model.
Optionally, the asynchronously updating, by the server, the model parameter of the historical model according to the gradient, and obtaining the updated model further includes:
and when the model parameters are updated, multiplying a variable in the FTRL algorithm by an attenuation coefficient to bias the weight of the model parameters to the latest data.
Optionally, the asynchronously updating, by the server, the model parameter of the historical model according to the gradient, and obtaining the updated model further includes:
judging the retention of model features according to a feature retirement time window, wherein the feature retirement time window is a set threshold on the time interval between each feature's update time and the current time; when the time interval between a feature's update time and the current time exceeds the threshold, the feature is judged to be an expired feature and is removed from the model.
Optionally, the asynchronously updating, by the server, the model parameter of the historical model according to the gradient, and obtaining the updated model further includes:
configuring an expansion threshold for the second-order parameters of model features, wherein the threshold is a preset value of a feature's occurrence frequency; when a feature's occurrence frequency is greater than the preset value and the feature's first-order parameter is non-zero, the feature's second-order parameters are expanded; and when a feature's first-order parameter weight is zeroed by sparse regularization during updating, the feature's expanded second-order parameters are automatically closed.
Optionally, the model is an automatic feature cross machine learning model.
In addition, to achieve the above object, an embodiment of the present application further provides a machine learning system, where the system includes:
the segmentation module is used for acquiring training data from a training database and segmenting the training data into a plurality of data segments;
a distribution module, configured to distribute the plurality of data fragments to a plurality of nodes respectively;
and the optimization module is used for loading a historical model on each node, updating model parameters of the historical model according to the received data fragments, and performing model training and machine learning by adopting the updated model.
In order to achieve the above object, an embodiment of the present application further provides an electronic device, including: a memory, a processor and a machine learning program stored on the memory and executable on the processor, the machine learning program when executed by the processor implementing the machine learning method as described above.
To achieve the above object, an embodiment of the present application further provides a computer-readable storage medium, on which a machine learning program is stored, and the machine learning program, when executed by a processor, implements the machine learning method as described above.
The machine learning method, system, electronic device, and computer-readable storage medium provided by the embodiments of the present application can provide an FM algorithm optimization based on streaming learning: for each batch of new training data, the model is trained incrementally on top of the historical model to produce the next model, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and handles large-scale training samples in a more lightweight way.
Drawings
FIG. 1 is a diagram of an application environment architecture in which various embodiments of the present application may be implemented;
FIG. 2 is a flowchart of a machine learning method according to a first embodiment of the present disclosure;
FIG. 3 is a detailed flowchart of step S204 in FIG. 2;
FIG. 4 is a schematic diagram of a streaming learning process of the present application;
FIG. 5 is a schematic diagram of a feature retirement mechanism of the present application;
fig. 6 is a schematic hardware architecture diagram of an electronic device according to a second embodiment of the present application;
fig. 7 is a block diagram of a machine learning system according to a third embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the descriptions relating to "first", "second", etc. in the embodiments of the present application are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of different embodiments may be combined with each other, provided the combination can be realized by a person skilled in the art; when technical solutions contradict each other or cannot be realized together, such a combination should be considered not to exist and does not fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a diagram illustrating an application environment architecture for implementing various embodiments of the present application. The application can be applied to application environments including, but not limited to, a training database 2 and a server 4.
The training database 2 is used for storing training data, such as record data of the advertisement clicking behavior of the users of the advertisement system. The training database 2 may be separately located in another server, and may be in data communication with the server 4 through a network, or may be located in the server 4.
The server 4 is used for providing the FM algorithm optimization based on the streaming learning, and performing model training and machine learning through the optimized FM algorithm according to the training data in the training database 2. The server 4 may be a rack server, a blade server, a tower server, a cabinet server, or other computing devices, may be an independent server, or may be a server cluster formed by a plurality of servers.
In this embodiment, the server 4 includes a plurality of nodes 40, and each node 40 includes a server 42 and a worker 44. The server 42 is used for storing the model fragments, updating the model parameters, and performing model training. The working end 44 is used for storing data fragments of training data and providing training samples for the service end 42.
Example one
Fig. 2 is a flowchart of a machine learning method according to a first embodiment of the present application. It is to be understood that the flow charts in the embodiments of the present method are not intended to limit the order in which the steps are performed. In the embodiment, the machine learning method is particularly suitable for optimizing the FM algorithm model, and large-scale training samples are dealt with based on streaming learning and incremental training. The model in this embodiment is an automatic feature cross machine learning model, specifically, an FM algorithm model.
The method comprises the following steps:
s200, acquiring training data from a training database, and dividing the training data into a plurality of data segments.
The training data is the data fed into the FM algorithm model for model training and machine learning, such as user advertisement-click behavior records in an advertisement system (a click is a positive sample; a non-click is a negative sample). It should be noted that, in this embodiment, the training data is newly added data: for each batch of newly added training data, the model is trained incrementally on top of the historical model and the next model is output, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and solves the large-scale sample problem in a more lightweight way. Moreover, because the data volume of a large-scale training sample is too large, the training data must be segmented so that it can be distributed to a plurality of nodes for processing.
In this embodiment, a node number is obtained by hashing the training data and taking a modulus; training data with the same node number belong to the same data shard and are subsequently distributed to the node with that number. For example, assume there are 40 nodes in total, numbered 0 to 39. The feature data used for training is converted from plaintext into a number, which is hashed and taken modulo 40 to obtain a node number between 0 and 39; the data is then distributed according to the node numbers obtained.
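As a minimal illustration of this sharding step, the following Python sketch hashes a plaintext feature key, takes the result modulo the node count, and groups records with the same node number into the same shard. The function names, the use of MD5, and the record layout are assumptions for illustration, not details taken from the application.

```python
import hashlib

NUM_NODES = 40  # e.g. nodes numbered 0 to 39, as in the example above

def node_id_for(feature_key: str, num_nodes: int = NUM_NODES) -> int:
    """Convert a plaintext key into a number, hash it, and take the modulus."""
    digest = hashlib.md5(feature_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def shard_training_data(records, num_nodes: int = NUM_NODES):
    """Group records that map to the same node number into one data shard."""
    shards = {i: [] for i in range(num_nodes)}
    for record in records:
        shards[node_id_for(record["key"], num_nodes)].append(record)
    return shards
```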
S202, distributing the data fragments to a plurality of nodes respectively.
In this embodiment, the data fragments are distributed to the nodes corresponding to the node numbers according to the node numbers obtained by performing hash modulo.
And S204, loading a historical model on each node, updating model parameters according to the received data fragments, and performing model training and machine learning by adopting the updated model.
The embodiment provides FM algorithm optimization based on streaming learning, also called incremental learning, that is, the model can be continuously updated iteratively on the basis of loading the historical model. The historical model is a model obtained by previous updating, for example, the historical model can be a model of the previous day, and the model of the previous day is loaded as a basis before the model is updated every day.
The FM algorithm can automatically learn the interactions between features, and the objective function of the model is:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

The first half of the equation, $w_0 + \sum_{i=1}^{n} w_i x_i$, represents linear regression; the second half, $\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$, is the cross term (feature combinations). Here $n$ is the number of features of the training sample, $x_i$ is the value of the $i$-th feature, and $w_0$, $w_i$, $v_i$, $v_j$ are model parameters. Among the model parameters, $w_0$ and $w_i$ are the first-order parameters of the FM algorithm: $w_0$ is an initial weight value, which can be understood as a bias term, and $w_i$ is the weight corresponding to feature $x_i$. $v_i$ and $v_j$ are the second-order parameters of the FM algorithm: each is a $K$-dimensional embedding vector (an embedding is a way to convert discrete variables into continuous variables), and $\langle v_i, v_j \rangle$ is the cross parameter between input features $x_i$ and $x_j$. For example, for a user's advertisement-click behavior record containing a click action (1), gender (male), city (Shanghai), and advertisement industry, the $v$ vectors include $v_{\text{male}}$, $v_{\text{Shanghai}}$, and the vector for the advertisement-industry value.
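To make the objective function above concrete, the following sketch computes the FM score for one sample using the standard O(nK) reformulation of the pairwise term. The function name and the use of NumPy are illustrative assumptions, not part of the application.

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """FM score: w0 + sum_i w_i x_i + sum_{i<j} <v_i, v_j> x_i x_j.
    x: (n,) feature values; w0: bias; w: (n,) first-order weights;
    V: (n, K) second-order embedding vectors."""
    linear = w0 + np.dot(w, x)
    # Pairwise term via the identity
    # sum_{i<j} <v_i, v_j> x_i x_j
    #   = 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
    s = V.T @ x                    # (K,)
    s_sq = (V ** 2).T @ (x ** 2)   # (K,)
    return linear + 0.5 * np.sum(s ** 2 - s_sq)
```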
Each node comprises both a server and a working end. The server stores the model shards of the FM algorithm (the model data can be sharded in the same way as the training data), and the working end stores the received data shards of the training data. All nodes read the training data in parallel through their working ends, acquire the required parameters (such as w and v) from the server, compute the gradient of each parameter from the training data, and send the computed gradients to the server. After receiving the gradients computed by the working end, the server asynchronously updates the model parameters according to the gradients. A gradient is the partial derivative of the loss function in a machine learning algorithm; it indicates in which direction a model parameter should be adjusted to minimize the error of the objective function. Each model parameter has its own gradient, and the gradient of each current parameter can be computed from the training data.
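As a sketch of the gradient computation performed at the working end, the snippet below assumes a logistic loss for click prediction (y is 1 for a click, 0 otherwise) and reuses the fm_predict function sketched above; the loss choice and names are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fm_gradients(x, y, w0, w, V):
    """Per-sample gradients of the logistic loss for the FM model
    (fm_predict is the scoring function sketched above)."""
    residual = sigmoid(fm_predict(x, w0, w, V)) - y
    grad_w0 = residual
    grad_w = residual * x
    s = V.T @ x  # (K,): sum_j v_{j,f} x_j
    # d(score)/d v_{i,f} = x_i * (s_f - v_{i,f} * x_i)
    grad_V = residual * (np.outer(x, s) - (x ** 2)[:, None] * V)
    return grad_w0, grad_w, grad_V

# A working end would pull (w, V) for the features present in its data shard,
# call fm_gradients on each sample, and push the resulting gradients to the
# server, which applies them asynchronously.
```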
Further referring to fig. 3, a detailed flow chart of the step S204 is shown. In this embodiment, the step S204 specifically includes:
and S2040, reading the data fragments of the training data in parallel by the working end of each node.
And after the training data is segmented into a plurality of data segments, distributing the data segments to a plurality of nodes according to node numbers. And the working end of each node reads the data fragment corresponding to the node in parallel.
S2042, the working end obtains the needed parameters from the service end of the node, and the gradient of each parameter is calculated according to the read training data.
In this embodiment, the FTRL (Follow-the-regularized-Leader) algorithm is used for model optimization, and before updating the model parameters, the gradients of the parameters need to be calculated. And the working end of each node acquires model parameters (such as w and v) needing to be updated from the service end of the node, and then calculates the gradient of each parameter according to the training data (data shards).
S2044, the server side loads a history model.
The embodiment provides the FM algorithm optimization based on the streaming learning, and the iterative updating is continued on the basis of loading the historical model. Fig. 4 is a schematic diagram of the process of the streaming learning. In this embodiment, the historical model may be a previous day model, and before the model is updated every day, the server loads the previous day model as a basis, then continuously iteratively updates the model according to new training data on the basis, and trains by using the updated model.
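A minimal sketch of this daily streaming loop is shown below; the file layout, pickle serialization, and function names are assumptions for illustration only.

```python
import datetime
import pathlib
import pickle

MODEL_DIR = pathlib.Path("models")  # assumed layout: one serialized model per day

def daily_incremental_update(train_on_new_data):
    """Load the previous day's model, continue training on the new data only,
    and store the result as today's model."""
    today = datetime.date.today()
    yesterday = today - datetime.timedelta(days=1)
    with open(MODEL_DIR / f"{yesterday}.pkl", "rb") as f:
        model = pickle.load(f)           # historical model as the starting point
    model = train_on_new_data(model)     # incremental training, no full reload
    with open(MODEL_DIR / f"{today}.pkl", "wb") as f:
        pickle.dump(model, f)
    return model
```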
S2046, the server asynchronously updates the model parameters of the historical model according to the gradient to obtain an updated model.
And after calculating the gradient of each parameter, the working end sends the gradient to the corresponding server. And the server side of each node asynchronously updates model parameters based on the loaded historical model according to the received gradient to obtain an updated model. And the updated model is used as the model of the current day and is used for carrying out model training and machine learning according to the training data.
Optionally, when the server performs a model update, the weight of the model parameters can be biased toward the latest data by multiplying a variable in the FTRL algorithm by an attenuation coefficient. That is, each time the model parameters are updated according to the gradients, the variable in the FTRL algorithm is multiplied by an attenuation coefficient; older data is therefore multiplied by the attenuation coefficient more times (once per update) and carries a lower proportion in the updated model, so the weight of the whole model is biased toward the newest data and the timeliness of the data is captured better.
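A per-coordinate sketch of an FTRL-Proximal update with such an attenuation coefficient is shown below. Decaying the accumulators z and n is one possible reading of "a variable in the FTRL algorithm"; the hyperparameter values are placeholders.

```python
import math

class FTRLCoordinate:
    """Per-coordinate FTRL-Proximal state with an attenuation (decay) coefficient."""
    def __init__(self, alpha=0.05, beta=1.0, l1=1.0, l2=1.0, decay=0.999):
        self.alpha, self.beta, self.l1, self.l2, self.decay = alpha, beta, l1, l2, decay
        self.z = 0.0  # accumulated adjusted gradients
        self.n = 0.0  # accumulated squared gradients

    def weight(self):
        if abs(self.z) <= self.l1:
            return 0.0  # sparse (L1) regularization zeroes the weight
        return -(self.z - math.copysign(self.l1, self.z)) / (
            (self.beta + math.sqrt(self.n)) / self.alpha + self.l2)

    def update(self, g):
        # Multiplying the accumulators by the decay on every update means older
        # data is decayed more times, biasing the weight toward the newest data.
        self.z *= self.decay
        self.n *= self.decay
        w = self.weight()
        sigma = (math.sqrt(self.n + g * g) - math.sqrt(self.n)) / self.alpha
        self.z += g - sigma * w
        self.n += g * g
```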
Optionally, this embodiment can also determine the retention of the features (variables) of the FM algorithm according to a feature retirement time window. That is, whether each feature persists is determined by the time interval between the feature's most recent update time and the current time. Fig. 5 is a schematic diagram of the feature retirement mechanism. Features whose time interval exceeds a set threshold (i.e., falls outside the feature retirement time window) are considered expired features and are removed from the model. For example, the first feature in Fig. 5 has expired and needs to be removed, while the other features, which have not expired, are retained in the model. By flexibly configuring the feature expiration window, this embodiment lets features that have not been updated for a long time retire automatically, keeping the size of the feature space stable.
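A sketch of this feature retirement check follows; the time-to-live value and the dictionary layout are assumptions for illustration.

```python
import time

FEATURE_TTL_SECONDS = 7 * 24 * 3600  # assumed retirement window, e.g. 7 days

def evict_expired_features(model_params, last_update_time, ttl=FEATURE_TTL_SECONDS):
    """Remove features whose last update is older than the retirement window,
    keeping the size of the feature space stable."""
    now = time.time()
    expired = [f for f, t in last_update_time.items() if now - t > ttl]
    for f in expired:
        model_params.pop(f, None)
        last_update_time.pop(f, None)
    return expired
```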
Optionally, in this embodiment, an expansion threshold for the second-order parameters of a feature of the FM algorithm (a preset value of the feature's occurrence frequency) may also be configured to implement an intelligent parameter-expansion mechanism. When a feature's occurrence frequency is greater than the preset value and the feature's corresponding first-order parameter is non-zero, the feature's second-order parameters are expanded and participate in model training and updating. When a feature's first-order parameter weight is zeroed by sparse regularization during updating, the expanded second-order parameters are automatically closed and no longer participate in model training and updating, which saves memory and computing resources (by reducing the number of parameters).
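A sketch of this parameter-expansion gate is given below; the threshold value and the data structures are illustrative assumptions.

```python
EXPANSION_THRESHOLD = 1000  # assumed preset value for a feature's occurrence frequency

def should_expand_second_order(frequency, first_order_weight,
                               threshold=EXPANSION_THRESHOLD):
    """Expand a feature's K-dimensional second-order (embedding) parameters only
    when it occurs often enough and its first-order weight is non-zero."""
    return frequency > threshold and first_order_weight != 0.0

def maybe_close_second_order(feature, first_order, embeddings):
    """If sparse regularization has zeroed the first-order weight, drop the
    expanded embedding so it no longer takes part in training or updating."""
    if first_order.get(feature, 0.0) == 0.0:
        embeddings.pop(feature, None)
```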
And S2048, the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end.
And after the updated latest model is obtained, the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end, and outputs the final result. The specific process of model training and machine learning, i.e. the process of the existing FM algorithm, is not described herein again. When the next model updating (next day) is carried out, the updated model is loaded first, and the iterative updating is continued on the basis of the updated model.
The machine learning method provided by this embodiment offers an FM algorithm optimization based on streaming learning: for each batch of new training data, the model is trained incrementally on top of the historical model and the next model is output, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and handles large-scale training samples (supporting billion-scale data training) in a more lightweight way. In addition, this embodiment supports flexible configuration of a feature expiration retirement mechanism and an intelligent parameter-expansion mechanism, saving storage and training resources, and introduces a parameter attenuation mechanism that biases the weight of the whole model toward the latest data, so the timeliness of the data is captured better.
Example two
Fig. 6 is a schematic diagram of the hardware architecture of an electronic device 20 according to a second embodiment of the present application. In this embodiment, the electronic device 20 may include, but is not limited to, a memory 21, a processor 22, and a network interface 23, which are communicatively connected to each other through a system bus. It is noted that Fig. 6 only shows the electronic device 20 with components 21-23, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead. In this embodiment, the electronic device 20 may be the server 4.
The memory 21 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the storage 21 may be an internal storage unit of the electronic device 20, such as a hard disk or a memory of the electronic device 20. In other embodiments, the memory 21 may also be an external storage device of the electronic apparatus 20, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), or the like, provided on the electronic apparatus 20. Of course, the memory 21 may also include both an internal storage unit and an external storage device of the electronic apparatus 20. In this embodiment, the memory 21 is generally used for storing an operating system installed in the electronic device 20 and various application software, such as program codes of the machine learning system 60. Further, the memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 22 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 22 is generally used to control the overall operation of the electronic device 20. In this embodiment, the processor 22 is configured to execute the program codes stored in the memory 21 or process data, such as executing the machine learning system 60.
The network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used for establishing a communication connection between the electronic apparatus 20 and other electronic devices.
EXAMPLE III
Fig. 7 is a block diagram of a machine learning system 60 according to a third embodiment of the present disclosure. The machine learning system 60 may be partitioned into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the embodiments of the present application. The program modules referred to in the embodiments of the present application are series of computer program instruction segments capable of performing specific functions; the following description specifically describes the function of each program module in this embodiment.
In the present embodiment, the machine learning system 60 includes:
the segmentation module 600 is configured to obtain training data from a training database, and segment the training data into a plurality of data segments.
The training data is the data fed into the FM algorithm model for model training and machine learning, such as user advertisement-click behavior records in an advertisement system (a click is a positive sample; a non-click is a negative sample). It should be noted that, in this embodiment, the training data is newly added data: for each batch of newly added training data, the model is trained incrementally on top of the historical model and the next model is output, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and solves the large-scale sample problem in a more lightweight way. Moreover, because the data volume of a large-scale training sample is too large, the training data must be segmented so that it can be distributed to a plurality of nodes for processing.
In this embodiment, a node number is obtained by hashing the training data and taking a modulus; training data with the same node number belong to the same data shard and are subsequently distributed to the node with that number. For example, assume there are 40 nodes in total, numbered 0 to 39. The feature data used for training is converted from plaintext into a number, which is hashed and taken modulo 40 to obtain a node number between 0 and 39; the data is then distributed according to the node numbers obtained.
A distributing module 602, configured to distribute the multiple data fragments to multiple nodes respectively.
In this embodiment, the data fragments are distributed to the nodes corresponding to the node numbers according to the node numbers obtained by performing hash modulo.
And the optimization module 604 is configured to load a historical model on each node, update model parameters according to the received data segments, and perform model training and machine learning by using the updated model.
This embodiment provides FM algorithm optimization based on streaming learning, also called incremental learning; that is, the model is continuously and iteratively updated on the basis of a loaded historical model. The historical model is the model obtained by the previous update; for example, it may be the previous day's model, which is loaded as the basis before the model is updated each day.
The FM algorithm can automatically learn the interactions between features, and the objective function of the model is:

$$\hat{y}(x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle\, x_i x_j$$

The first half of the equation, $w_0 + \sum_{i=1}^{n} w_i x_i$, represents linear regression; the second half, $\sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle v_i, v_j \rangle x_i x_j$, is the cross term (feature combinations). Here $n$ is the number of features of the training sample, $x_i$ is the value of the $i$-th feature, and $w_0$, $w_i$, $v_i$, $v_j$ are model parameters. Among the model parameters, $w_0$ and $w_i$ are the first-order parameters of the FM algorithm: $w_0$ is an initial weight value, which can be understood as a bias term, and $w_i$ is the weight corresponding to feature $x_i$. $v_i$ and $v_j$ are the second-order parameters of the FM algorithm: each is a $K$-dimensional embedding vector, and $\langle v_i, v_j \rangle$ is the cross parameter between input features $x_i$ and $x_j$. For example, for a user's advertisement-click behavior record containing a click action (1), gender (male), city (Shanghai), and advertisement industry, the $v$ vectors include $v_{\text{male}}$, $v_{\text{Shanghai}}$, and the vector for the advertisement-industry value.
Each node comprises both a server and a working end. The server stores the model shards of the FM algorithm (the model data can be sharded in the same way as the training data), and the working end stores the received data shards of the training data. All nodes read the training data in parallel through their working ends, acquire the required parameters (such as w and v) from the server, compute the gradient of each parameter from the training data, and send the computed gradients to the server. After receiving the gradients computed by the working end, the server asynchronously updates the model parameters according to the gradients. A gradient is the partial derivative of the loss function in a machine learning algorithm; it indicates in which direction a model parameter should be adjusted to minimize the error of the objective function. Each model parameter has its own gradient, and the gradient of each current parameter can be computed from the training data.
In this embodiment, the specific process of the optimization module 604 for implementing the above functions includes:
(1) and the working end of each node reads the data fragments of the training data in parallel.
And after the training data is segmented into a plurality of data segments, distributing the data segments to a plurality of nodes according to node numbers. And the working end of each node reads the data fragment corresponding to the node in parallel.
(2) And the working end acquires the required parameters from the service end of the node and calculates the gradient of each parameter according to the read training data.
In this embodiment, the FTRL algorithm is used to perform model optimization, and before updating the model parameters, the gradient of each parameter needs to be calculated. And the working end of each node acquires model parameters (such as w and v) needing to be updated from the service end of the node, and then calculates the gradient of each parameter according to the training data (data shards).
(3) And the server loads a history model.
The embodiment provides the FM algorithm optimization based on the streaming learning, and the iterative updating is continued on the basis of loading the historical model. Fig. 4 is a schematic diagram of the process of the streaming learning. In this embodiment, the historical model may be a previous day model, and before the model is updated every day, the server loads the previous day model as a basis, then continuously iteratively updates the model according to new training data on the basis, and trains by using the updated model.
(4) And the server asynchronously updates the model parameters of the historical model according to the gradient to obtain an updated model.
And after calculating the gradient of each parameter, the working end sends the gradient to the corresponding server. And the server side of each node asynchronously updates model parameters based on the loaded historical model according to the received gradient to obtain an updated model. And the updated model is used as the model of the current day and is used for carrying out model training and machine learning according to the training data.
Optionally, when the server performs a model update, the weight of the model parameters can be biased toward the latest data by multiplying a variable in the FTRL algorithm by an attenuation coefficient. That is, each time the model parameters are updated according to the gradients, the variable in the FTRL algorithm is multiplied by an attenuation coefficient; older data is therefore multiplied by the attenuation coefficient more times (once per update) and carries a lower proportion in the updated model, so the weight of the whole model is biased toward the newest data and the timeliness of the data is captured better.
Optionally, this embodiment can also determine the retention of the features (variables) of the FM algorithm according to a feature retirement time window. That is, whether each feature persists is determined by the time interval between the feature's most recent update time and the current time. Fig. 5 is a schematic diagram of the feature retirement mechanism. Features whose time interval exceeds a set threshold (i.e., falls outside the feature retirement time window) are considered expired features and are removed from the model. For example, the first feature in Fig. 5 has expired and needs to be removed, while the other features, which have not expired, are retained in the model. By flexibly configuring the feature expiration window, this embodiment lets features that have not been updated for a long time retire automatically, keeping the size of the feature space stable.
Optionally, in this embodiment, an expansion threshold for the second-order parameters of a feature of the FM algorithm (a preset value of the feature's occurrence frequency) may also be configured to implement an intelligent parameter-expansion mechanism. When a feature's occurrence frequency is greater than the preset value and the feature's corresponding first-order parameter is non-zero, the feature's second-order parameters are expanded and participate in model training and updating. When a feature's first-order parameter weight is zeroed by sparse regularization during updating, the expanded second-order parameters are automatically closed and no longer participate in model training and updating, which saves memory and computing resources (by reducing the number of parameters).
(5) And the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end.
And after the updated latest model is obtained, the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end, and outputs the final result. The specific process of model training and machine learning, i.e. the process of the existing FM algorithm, is not described herein again. When the next model updating (next day) is carried out, the updated model is loaded first, and the iterative updating is continued on the basis of the updated model.
The machine learning system provided by this embodiment offers an FM algorithm optimization based on streaming learning: for each batch of new training data, the model is trained incrementally on top of the historical model and the next model is output, without loading the full data for multiple rounds of full learning. This effectively balances data scale against resource consumption and handles large-scale training samples (supporting billion-scale data training) in a more lightweight way. In addition, this embodiment supports flexible configuration of a feature expiration retirement mechanism and an intelligent parameter-expansion mechanism, saving storage and training resources, and introduces a parameter attenuation mechanism that biases the weight of the whole model toward the latest data, so the timeliness of the data is captured better.
Example four
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing a machine learning program, which is executable by at least one processor to cause the at least one processor to perform the steps of the machine learning method as described above.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present application described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different from that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, embodiments of the present application are not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications that can be made by the use of the equivalent structures or equivalent processes in the specification and drawings of the present application or that can be directly or indirectly applied to other related technologies are also included in the scope of the present application.

Claims (13)

1. A machine learning method, the method comprising:
acquiring training data from a training database, and segmenting the training data into a plurality of data segments;
distributing the plurality of data fragments to a plurality of nodes respectively; and
and each node loads a historical model, updates the model parameters of the historical model according to the received data fragments, and adopts the updated model to carry out model training and machine learning.
2. The machine learning method of claim 1, wherein the node comprises a server and a worker, the server is configured to store the model shards, and the worker is configured to store the data shards.
3. The machine learning method according to claim 2, wherein each of the nodes loads a history model, updates model parameters of the history model according to the received data slice, and performs model training and machine learning using the updated model comprises:
the working end of each node reads the data fragments of the training data in parallel;
the working end acquires needed parameters from the server side of the node, and calculates the gradient of each parameter according to the read training data;
the server side loads a history model;
the server asynchronously updates the model parameters of the historical model according to the gradient to obtain an updated model;
and the server side performs model training and machine learning by adopting the updated model according to the training data provided by the working end.
4. The machine learning method of claim 1, wherein the slicing the training data into a plurality of data slices comprises:
and obtaining node numbers by adopting a Hash modulo mode for the training data, wherein the training data with the same node number belong to the same data fragment.
5. A machine learning method as claimed in claim 2 or 3 in which the historical model is a model from a previous update.
6. The machine learning method of claim 3, wherein the server asynchronously updates the model parameters of the historical model according to the gradient, and obtaining the updated model comprises:
and the server side of each node asynchronously updates model parameters by adopting an FTRL algorithm based on the loaded historical model according to the gradient received from the working end to obtain an updated model.
7. The machine learning method of claim 6, wherein the server asynchronously updates the model parameters of the historical model according to the gradient, and obtaining the updated model further comprises:
and when the model parameters are updated, multiplying a variable in the FTRL algorithm by an attenuation coefficient to bias the weight of the model parameters to the latest data.
8. The machine learning method of claim 6, wherein the server asynchronously updates the model parameters of the historical model according to the gradient, and obtaining the updated model further comprises:
judging the retention of model features according to a feature retirement time window, wherein the feature retirement time window is a set threshold on the time interval between each feature's update time and the current time; when the time interval between a feature's update time and the current time exceeds the threshold, the feature is judged to be an expired feature and is removed from the model.
9. The machine learning method of claim 6, wherein the server asynchronously updates the model parameters of the historical model according to the gradient, and obtaining the updated model further comprises:
configuring an expansion threshold for the second-order parameters of model features, wherein the threshold is a preset value of a feature's occurrence frequency; when a feature's occurrence frequency is greater than the preset value and the feature's first-order parameter is non-zero, the feature's second-order parameters are expanded; and when a feature's first-order parameter weight is zeroed by sparse regularization during updating, the feature's expanded second-order parameters are automatically closed.
10. The machine learning method of claim 1, wherein the model is an automatic feature-cross machine learning model.
11. A machine learning system, the system comprising:
the segmentation module is used for acquiring training data from a training database and segmenting the training data into a plurality of data segments;
a distribution module, configured to distribute the plurality of data fragments to a plurality of nodes respectively;
and the optimization module is used for loading a historical model on each node, updating model parameters of the historical model according to the received data fragments, and performing model training and machine learning by adopting the updated model.
12. An electronic device, comprising: a memory, a processor, and a machine learning program stored on the memory and executable on the processor, the machine learning program when executed by the processor implementing the machine learning method of any one of claims 1-10.
13. A computer-readable storage medium, having stored thereon a machine learning program which, when executed by a processor, implements a machine learning method according to any one of claims 1-10.
CN202010735878.5A 2020-07-28 2020-07-28 Machine learning method and system Pending CN114004623A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010735878.5A CN114004623A (en) 2020-07-28 2020-07-28 Machine learning method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010735878.5A CN114004623A (en) 2020-07-28 2020-07-28 Machine learning method and system

Publications (1)

Publication Number Publication Date
CN114004623A true CN114004623A (en) 2022-02-01

Family

ID=79920327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010735878.5A Pending CN114004623A (en) 2020-07-28 2020-07-28 Machine learning method and system

Country Status (1)

Country Link
CN (1) CN114004623A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114676141A (en) * 2022-03-31 2022-06-28 北京泰迪熊移动科技有限公司 Data processing method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN108536650B (en) Method and device for generating gradient lifting tree model
TW202139045A (en) Privacy protection-based target service model determination
CN108733508B (en) Method and system for controlling data backup
CN108733639B (en) Configuration parameter adjustment method and device, terminal equipment and storage medium
CN112052151A (en) Fault root cause analysis method, device, equipment and storage medium
CN103502899A (en) Dynamic predictive modeling platform
CN110825966A (en) Information recommendation method and device, recommendation server and storage medium
CN111368887B (en) Training method of thunderstorm weather prediction model and thunderstorm weather prediction method
CN113468199B (en) Index updating method and system
WO2015040806A1 (en) Hierarchical latent variable model estimation device, hierarchical latent variable model estimation method, supply amount prediction device, supply amount prediction method, and recording medium
CN112686418A (en) Method and device for predicting performance timeliness
CN114004623A (en) Machine learning method and system
CN108595685B (en) Data processing method and device
CN110704699A (en) Data image construction method and device, computer equipment and storage medium
CN112231299B (en) Method and device for dynamically adjusting feature library
CN111783883A (en) Abnormal data detection method and device
CN110084455B (en) Data processing method, device and system
CN115393100A (en) Resource recommendation method and device
CN112231590B (en) Content recommendation method, system, computer device and storage medium
CN114339689A (en) Internet of things machine card binding pool control method and device and related medium
CN114116744A (en) Method, device and equipment for updating pull chain table and storage medium
CN112836827A (en) Model training method and device and computer equipment
CN112784165A (en) Training method of incidence relation estimation model and method for estimating file popularity
CN111026863A (en) Customer behavior prediction method, apparatus, device and medium
CN110968773A (en) Application recommendation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination