CN111695035A - Recommendation system and multi-algorithm fusion recommendation processing flow

Info

Publication number: CN111695035A
Application number: CN202010522860.7A
Authority: CN
Inventors: 王劲; 周建平; 任兆江
Original assignee: Guangdong Sugo Technology Co ltd
Current assignee: Guangdong Sugo Technology Co ltd
Priority date: 2020-06-10
Filing date: 2020-06-10
Publication date: 2020-09-22
Anticipated expiration: 2040-06-10
Also published as: CN111695035B

Abstract

The invention discloses a recommendation system and a multi-algorithm fusion recommendation processing flow. A feature conversion and model training module integrates a plurality of feature conversion algorithms and a plurality of model training algorithms, so the two need not be loaded separately when called. When a model is tuned, a new model can be trained and stored simply by modifying a configuration file. In practical use, a user only needs to attend to the input data and the output results; the intermediate feature processing and model training are encapsulated, which saves the cost of maintaining them separately. The model file storage module stores model files on a distributed file system and records the basic information of each model file in a relational database, so that the model reading module can read the model basic information quickly and conveniently and load the corresponding model file from the distributed file system according to that information. The API service module obtains the loaded model, receives network requests, and returns recommendation results to the front end.

Description

Recommendation system and multi-algorithm fusion recommendation processing flow

Technical Field

The invention relates to a recommendation system and a multi-algorithm fusion recommendation processing flow.

Background

In recent years, with the development and popularization of mobile internet technology, more and more user behavior data is generated and a great deal of information accumulates around each user, which makes recommendation systems practical. As the number of users grows explosively and the variety of items offered by suppliers keeps increasing, a recommendation system essentially finds, from massive amounts of information, the information a user is interested in even when the user's needs are not explicit, and provides accurate personalized recommendations.

Users have static and dynamic attributes: static attributes such as age and gender, and dynamic attributes such as historical behavior and context information (login time, login device, etc.). Items likewise have static attributes such as price, label, and category, and dynamic attributes such as sales promotions and discounts. By combining the static and dynamic attributes of users and items, the system predicts the items a user is interested in and provides personalized recommendations, so that each user sees results tailored to that user.

One of the most common solutions for current recommendation systems is GPU-based TensorFlow. A TensorFlow architecture generally needs high-performance graphics cards to run large-scale data, so the cost is high. Small and medium-sized enterprises generally run a Hadoop ecosystem, in which such an architecture is difficult to deploy. In addition, Transform (feature processing) and Trainer (model training) are stored separately and must be called separately and then connected together. Feature processing, model training, and the recommendation service are completely separated, each requires independent maintenance at high cost, and adjusting the feature processing and the hyperparameters is inconvenient.

Therefore, how to overcome the above-mentioned drawbacks has become an important issue to be solved by those skilled in the art.

Disclosure of Invention

The invention overcomes the defects of the prior art described above and provides a recommendation system and a multi-algorithm fusion recommendation processing flow.

To achieve this purpose, the invention adopts the following technical solution:

A recommendation system, comprising:

a data preprocessing module, used for parsing input data, converting it into data feature columns in a specified format, and outputting the data feature columns, wherein the format of the input data and the format of the data feature columns are both specified through a configuration file;

a feature conversion and model training module, used for performing feature conversion on the data feature columns a plurality of times to convert them into samples of the required type and format, performing model training on the samples a plurality of times, and saving the algorithm model, wherein the algorithms and parameters used in feature conversion and model training are specified through a configuration file;

a model file storage module, used for storing model files on a distributed file system, wherein the stored content comprises the processing flow and the inputs and outputs of the data preprocessing module and of the feature conversion and model training module, together with the configuration information of the feature conversion algorithms and the model training algorithms, and for recording basic information of each model file in a relational database, wherein the model basic information comprises the model file name, version, and storage path;

a model reading module, used for connecting to the corresponding relational database, reading the model basic information, loading the corresponding model file from the distributed file system according to the model basic information, and parsing the file to obtain the complete input and output information and the complete parameters of the model; and

an API service module, which obtains the loaded model, listens on a network port, and receives network requests, wherein the front end sends a network request whose body comprises a model name and sample features, and the API service module receives the network request and returns a recommendation result to the front end.
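The save and read modules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: SQLite stands in for the relational database, a temporary local directory stands in for the distributed file system, and JSON stands in for the model file format. The table name `model_info` and the functions `open_registry`, `save_model`, `load_model`, and `demo` are hypothetical names chosen for this example.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def open_registry(db_path=":memory:"):
    """Create the table that records model basic information (assumed schema)."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS model_info ("
        "name TEXT, version INTEGER, path TEXT, PRIMARY KEY (name, version))"
    )
    return conn


def save_model(conn, storage_root, name, version, model):
    """Write the model file, then record its name, version and storage path."""
    path = Path(storage_root) / f"{name}-v{version}.json"
    path.write_text(json.dumps(model))
    conn.execute(
        "INSERT INTO model_info (name, version, path) VALUES (?, ?, ?)",
        (name, version, str(path)),
    )
    conn.commit()
    return path


def load_model(conn, name):
    """Read the basic information of the newest version, then parse its file."""
    row = conn.execute(
        "SELECT version, path FROM model_info WHERE name = ? "
        "ORDER BY version DESC LIMIT 1",
        (name,),
    ).fetchone()
    if row is None:
        raise KeyError(f"no model registered under {name!r}")
    version, path = row
    return version, json.loads(Path(path).read_text())


def demo():
    """Save two versions of a toy model and read back the latest one."""
    conn = open_registry()
    with tempfile.TemporaryDirectory() as root:  # stand-in for the distributed file system
        save_model(conn, root, "ctr", 1, {"algorithm": "lr", "weights": [0.1, 0.2]})
        save_model(conn, root, "ctr", 2, {"algorithm": "lr", "weights": [0.3, 0.4]})
        result = load_model(conn, "ctr")
    conn.close()
    return result
```

Keeping only the name, version, and path in the database means the reader can locate a model with one small query and touch the file store only for the file it actually needs.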

Preferably, the feature conversion and model training module uses a plurality of servers to compute simultaneously and ensures the correctness of the computation result through communication and aggregation, wherein the control architecture of the main server comprises:

a data and model segmentation unit, used for splitting the input data into parts of equal size, wherein the size of each part of local data does not exceed a specified value, and then splitting the model, wherein the model is directly copied into a plurality of sub-models during computation and the number of sub-models equals the number of data parts;

a communication mechanism control unit, responsible for sending the sub-models and the split local data to a plurality of sub-servers over the network, wherein each sub-server computes after receiving its sub-model and local data and, when finished, sends its computation result back to the main server over the network for subsequent processing; and

a data and model aggregation unit, used for aggregating the data and models of the computation results, namely aggregating the gradients from back-propagation during algorithm training, that is, aggregating the computation results of the plurality of sub-servers and updating the main model, wherein the communication mechanism control unit communicates the updated result to every sub-model so that the sub-models stay consistent while computing simultaneously.
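The split, compute, and aggregate cycle described by these three units can be sketched in a single process. This is an illustration only: an ordinary loop stands in for the sub-servers and the network, and a linear model with squared-error loss is an assumed example model. The names `split_data`, `local_gradient`, and `train` are hypothetical.

```python
def split_data(samples, max_part_size):
    """Split samples into near-equal parts, none larger than max_part_size."""
    if not samples:
        return []
    n_parts = -(-len(samples) // max_part_size)  # ceiling division
    base, extra = divmod(len(samples), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(samples[start:start + size])
        start += size
    return parts


def local_gradient(weights, part):
    """Sub-server work: summed squared-error gradient over one data part."""
    grad = [0.0] * len(weights)
    for features, label in part:
        error = sum(w * x for w, x in zip(weights, features)) - label
        for j, x in enumerate(features):
            grad[j] += 2.0 * error * x
    return grad


def train(samples, n_features, max_part_size, lr=0.2, epochs=200):
    """Main-server loop: copy the model, gather gradients, update, repeat."""
    parts = split_data(samples, max_part_size)
    master = [0.0] * n_features
    for _ in range(epochs):
        # One copy of the main model per data part (the sub-models).
        replicas = [list(master) for _ in parts]
        # In a real deployment each call below would run on a sub-server.
        grads = [local_gradient(w, p) for w, p in zip(replicas, parts)]
        # Aggregate the gradients and update the main model.
        total = [sum(g[j] for g in grads) for j in range(n_features)]
        master = [w - lr * t / len(samples) for w, t in zip(master, total)]
        # The next iteration copies the updated model back to every replica.
    return master
```

Because the per-part gradients are summed before the update, the result matches a full-batch gradient step (up to floating-point rounding), which is the sense in which aggregation keeps the parallel computation correct.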

Preferably, the model file storage module serializes the model using BigDL, generates a model file, and stores it in the distributed file system, and the model reading module parses the model file using MLeap to generate the prediction model.

Preferably, the feature conversion and model training module updates the model periodically, and the API service module saves the result of each request and returns the saved result directly when the same request is received again.
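A minimal sketch of the request-result store follows. It assumes, beyond what the text states, that saved results are discarded whenever the model is updated, so that stale results from the old model are not returned. The class `CachedPredictor` and its methods are hypothetical names.

```python
import json


class CachedPredictor:
    """Wraps a model callable and remembers the result of each distinct request."""

    def __init__(self, model):
        self._model = model
        self._cache = {}
        self.model_calls = 0  # counts real model invocations, for illustration

    def update_model(self, model):
        """Swap in a retrained model; drop results computed by the old one (assumption)."""
        self._model = model
        self._cache.clear()

    def predict(self, model_name, features):
        # A canonical JSON string makes equal requests map to the same key.
        key = json.dumps([model_name, features], sort_keys=True)
        if key not in self._cache:
            self.model_calls += 1
            self._cache[key] = self._model(features)
        return self._cache[key]
```

A repeated request is answered from the dictionary without invoking the model, so `model_calls` grows only when a new request or a new model arrives.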

The scheme also protects a multi-algorithm fusion recommendation processing flow, which executes a plurality of algorithms in sequence and finally performs the API service, wherein the output data of each algorithm is the input data of the next, and each algorithm performs, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, wherein:

a data preprocessing step: parsing input data, converting it into data feature columns in a specified format, and outputting the data feature columns, wherein the format of the input data and the format of the data feature columns are both specified through a configuration file;

a feature conversion and model training step: performing feature conversion on the data feature columns a plurality of times to convert them into samples of the required type and format, performing model training on the samples a plurality of times, and saving the algorithm model, wherein the algorithms and parameters used in feature conversion and model training are specified through a configuration file;

a model file saving step: storing the model file on a distributed file system, wherein the stored content comprises the processing flow and the inputs and outputs of the data preprocessing step and of the feature conversion and model training step, together with the configuration information of the feature conversion algorithms and the model training algorithms, and recording basic information of the model file in a relational database, wherein the model basic information comprises the model file name, version, and storage path;

a model reading step: connecting to the corresponding relational database, reading the model basic information, loading the corresponding model file from the distributed file system according to the model basic information, and parsing the file to obtain the complete input and output information and the complete parameters of the model;

an API service step: obtaining the loaded model, listening on a network port, and receiving network requests, wherein the front end sends a network request whose body comprises a model name and sample features, and the API service module receives the network request and returns a recommendation result to the front end.
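The sequential flow, in which each algorithm consumes the output of the one before it, can be sketched as a simple chain. The two stages below, a recall stage followed by a ranking stage, and the toy catalog are assumptions made for illustration; `run_sequential`, `recall`, and `rank` are hypothetical names.

```python
from functools import reduce


def run_sequential(stages, data):
    """Feed the output of each stage into the next stage, in order."""
    return reduce(lambda value, stage: stage(value), stages, data)


def recall(user):
    """Hypothetical upstream algorithm: keep items sharing a tag with the user."""
    catalog = {"book": ["fiction"], "lamp": ["home"], "novel": ["fiction"], "desk": ["home"]}
    candidates = [item for item, tags in catalog.items() if set(tags) & set(user["tags"])]
    return {"user": user, "candidates": candidates}


def rank(recalled):
    """Hypothetical downstream algorithm: order the recalled items by score."""
    scores = recalled["user"]["scores"]
    return sorted(recalled["candidates"], key=lambda item: scores.get(item, 0.0), reverse=True)
```

Calling `run_sequential([recall, rank], user)` runs recall first and hands its result to rank; adding a further stage only requires appending another callable to the list.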

The scheme also protects another multi-algorithm fusion recommendation processing flow, which executes a plurality of front-stage algorithms in parallel, then feeds the output results of all the front-stage algorithms to the same back-stage algorithm, and finally performs the API service, wherein each front-stage algorithm and the back-stage algorithm perform, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step.

Compared with the prior art, the invention has the beneficial effects that:

1. The data preprocessing module parses input data into data that the feature conversion and model training module can recognize. The feature conversion and model training module integrates a plurality of feature conversion algorithms and a plurality of model training algorithms, so the two need not be loaded separately when called. When a model is tuned, a new model can be obtained simply by modifying a configuration file and then training and saving. In practical use, a user only needs to attend to the input data and the output results; the intermediate feature processing and model training are encapsulated, which saves the cost of maintaining them separately. The algorithms for feature processing and model training are selected by modifying the configuration file, so a suitable strategy can be chosen for each application scenario. The model file storage module stores model files on the distributed file system and records their basic information in the relational database, so the model reading module can connect to the corresponding relational database, read the model basic information, and load the corresponding model file from the distributed file system quickly and conveniently. The API service module obtains the loaded model and receives network requests to return recommendation results to the front end.

2. The structure adopted by the feature conversion and model training module allows a single computation to run in parallel on a plurality of servers by splitting the input data and aggregating the respective results, which helps maximize the computational efficiency of the algorithm.

3. The algorithms in the multi-algorithm fusion recommendation processing flow are fused in an upstream-downstream arrangement, that is, one algorithm depends on the computation result of another, which suits specific applications. Each algorithm performs, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, so the models of all the selected algorithms can be updated in a single pass.

4. The algorithms in the multi-algorithm fusion recommendation processing flow may also be fused in a parallel arrangement, that is, several algorithms have no dependence on one another while another algorithm depends on the computation results of all of them at once, which suits specific applications.

5. The scheme adopts a lightweight service framework in which the saved model file is the medium between offline training and online service, so the online service need not be concerned with complex model training or distributed deployment; the framework is lighter and prediction tasks are completed efficiently.

6. The feature conversion and model training module updates the model periodically, and the API service module saves the result of each request and returns it directly when the same request is received again, which improves the response to requests.

Drawings

Fig. 1 is a structural diagram of the recommendation system of the present disclosure.

Fig. 2 is a schematic diagram of a multi-algorithm fusion recommendation process flow according to the present disclosure.

Fig. 3 is a second schematic diagram of the multi-algorithm fusion recommendation process flow according to the present invention.

Fig. 4 is an architecture that can be adopted by the multi-algorithm fusion recommendation processing flow.

Detailed Description

The features of the present invention and other related features are further described in detail below by way of examples to facilitate understanding by those skilled in the art:

as shown in fig. 1, a recommendation system includes:

As described above, the data preprocessing module parses input data into data that the feature conversion and model training module can recognize. The feature conversion and model training module integrates a plurality of feature conversion algorithms and a plurality of model training algorithms, so the two need not be loaded separately when called. When a model is tuned, a new model can be obtained simply by modifying a configuration file and then training and saving. In practical use, a user only needs to attend to the input data and the output results; the intermediate feature processing and model training are encapsulated, which saves the cost of maintaining them separately. The algorithms for feature processing and model training are selected by modifying the configuration file, so a suitable strategy can be chosen for each application scenario. The model file storage module stores model files on the distributed file system and records their basic information in the relational database, so the model reading module can connect to the corresponding relational database, read the model basic information, and load the corresponding model file from the distributed file system quickly and conveniently. The API service module obtains the loaded model and receives network requests to return recommendation results to the front end.
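Selecting the feature conversion and training algorithms purely through configuration can be sketched as follows. The configuration layout (`transforms`, `trainer`, `params`), the two transforms, and the two toy trainers are all assumptions made for illustration; the disclosure does not specify the configuration file format.

```python
def min_max(column, params):
    """Scale a numeric column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in column]


def bucketize(column, params):
    """Replace each value by the number of bounds it reaches."""
    return [sum(v >= bound for bound in params["bounds"]) for v in column]


def train_global_mean(columns, labels, params):
    """Toy trainer: always predict the mean label."""
    mean = sum(labels) / len(labels)
    return lambda row: mean


def train_bucket_mean(columns, labels, params):
    """Toy trainer: predict the mean label seen for each value of one column."""
    sums, counts = {}, {}
    for value, label in zip(columns[params["column"]], labels):
        sums[value] = sums.get(value, 0.0) + label
        counts[value] = counts.get(value, 0) + 1
    default = sum(labels) / len(labels)
    means = {value: sums[value] / counts[value] for value in sums}
    return lambda row: means.get(row[params["column"]], default)


# Registries: the configuration refers to algorithms by these names.
TRANSFORMS = {"min_max": min_max, "bucketize": bucketize}
TRAINERS = {"global_mean": train_global_mean, "bucket_mean": train_bucket_mean}


def run_pipeline(config, columns, labels):
    """Apply the configured transforms in order, then the configured trainer."""
    converted = dict(columns)
    for step in config["transforms"]:
        fn = TRANSFORMS[step["name"]]
        converted[step["column"]] = fn(converted[step["column"]], step.get("params", {}))
    trainer = TRAINERS[config["trainer"]["name"]]
    model = trainer(converted, labels, config["trainer"].get("params", {}))
    return converted, model
```

Switching from `bucket_mean` to `global_mean`, or changing the bucket bounds, only changes the configuration dictionary; the calling code stays the same, which is the tuning workflow the paragraph describes.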

As described above, in the implementation, the feature transformation and model training module adopts multiple servers to perform computation simultaneously, and uses communication and aggregation to ensure the correctness of the computation result, wherein the main server control architecture includes:

As mentioned above, the structure adopted by the feature conversion and model training module allows a single computation to run in parallel on a plurality of servers by splitting the input data and aggregating the respective results, which helps maximize the computational efficiency of the algorithm.

As described above, in a specific implementation, the model file storage module serializes the model using BigDL, generates a model file, and stores it in the distributed file system, and the model reading module parses the model file using MLeap to generate the prediction model.

As mentioned above, the scheme adopts a lightweight service framework in which the saved model file is the medium between offline training and online service, so the online service need not be concerned with complex model training or distributed deployment; the framework is lighter and prediction tasks are completed efficiently.
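On the online side, the service only needs the loaded models and a request handler. The sketch below shows such a handler as a plain function that could be wired to any HTTP server. The JSON field names `model_name` and `features`, the status codes, and the function name `handle_request` are assumptions for illustration; the text only states that the request carries a model name and sample features.

```python
import json


def handle_request(models, body):
    """Parse a JSON request body and return (status_code, response_body).

    `models` maps a model name to an already loaded callable, so no
    training code is needed in the online service.
    """
    try:
        request = json.loads(body)
        name, features = request["model_name"], request["features"]
    except (ValueError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected JSON with model_name and features"})
    model = models.get(name)
    if model is None:
        return 404, json.dumps({"error": f"unknown model {name!r}"})
    return 200, json.dumps({"model_name": name, "result": model(features)})
```

Keeping the handler independent of the transport makes it easy to test and leaves the choice of web framework open.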

As described above, in a specific implementation, the feature conversion and model training module updates the model periodically, and the API service module saves the result of each request and returns it directly when the same request is received again, which improves the response to requests.

As shown in fig. 2, the present application further discloses a multi-algorithm fusion recommendation processing flow, which executes a plurality of algorithms in sequence and finally performs the API service, wherein the output data of each algorithm is the input data of the next, and each algorithm performs, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step.

As described above, the algorithms in this multi-algorithm fusion recommendation processing flow are fused in an upstream-downstream arrangement, that is, one algorithm depends on the computation result of another, which suits specific applications. Each algorithm performs, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, so the models of all the selected algorithms can be updated in a single pass.

As shown in fig. 3, the present application further discloses another multi-algorithm fusion recommendation processing flow, which executes a plurality of front-stage algorithms in parallel, then feeds the output results of all the front-stage algorithms to the same back-stage algorithm, and finally performs the API service, wherein each front-stage algorithm and the back-stage algorithm perform, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step.

As described above, the algorithms in this multi-algorithm fusion recommendation processing flow are fused in a parallel arrangement, that is, several algorithms have no dependence on one another while another algorithm depends on the computation results of all of them at once, which suits specific applications.
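The parallel arrangement can be sketched with a thread pool: independent front-stage algorithms run concurrently on the same input, and a single back-stage algorithm receives all of their outputs. The two scoring functions, their fixed scores, and the summing back stage are assumed examples; `run_parallel_fusion`, `popularity_scores`, `interest_scores`, and `sum_and_rank` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def run_parallel_fusion(front_stages, back_stage, data):
    """Run independent front stages concurrently, then pass all outputs to one back stage."""
    with ThreadPoolExecutor(max_workers=len(front_stages)) as pool:
        # pool.map preserves the order of front_stages in its results.
        outputs = list(pool.map(lambda stage: stage(data), front_stages))
    return back_stage(outputs)


def popularity_scores(user):
    """Hypothetical front-stage algorithm: score items by overall popularity."""
    return {"novel": 0.2, "lamp": 0.8}


def interest_scores(user):
    """Hypothetical front-stage algorithm: score items by the user's interests."""
    return {"novel": 0.9, "lamp": 0.1}


def sum_and_rank(outputs):
    """Hypothetical back-stage algorithm: add up the scores and rank the items."""
    totals = {}
    for scores in outputs:
        for item, score in scores.items():
            totals[item] = totals.get(item, 0.0) + score
    return sorted(totals, key=totals.get, reverse=True)
```

Since the front stages share no state, they can run in any order or on separate machines; only the back stage has to wait for all of them.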

As shown in fig. 4, both multi-algorithm fusion recommendation processing flows can use a master-gateway architecture in a cluster, comprising a master node and a plurality of gateway nodes. A gateway receives and processes service requests and provides the computation results of the algorithms; it also dynamically picks up model changes, updates the model in real time, and supports horizontal scaling. The master manages the models and the fusion strategies, provides strategy management, and supports an active-standby mode to ensure high availability of the service.

1. a multi-algorithm fusion strategy is predefined, and the user sends the defined fusion strategy to the master in the form of a request;

2. the master receives the strategy change request, changes the strategy information it stores, and sends a strategy change notification to all gateways;

3. each gateway receives the change notification, changes its algorithm fusion strategy, and continues to receive HTTP requests and provide the interface service.

As described above, the master-gateway architecture facilitates real-time updating of fusion policies.
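The strategy update path between the master and the gateways can be sketched in a single process. Direct method calls stand in for the network requests and notifications, and the active-standby behavior of the master is omitted. The classes `Master` and `Gateway`, their methods, and the shape of the strategy dictionary are hypothetical.

```python
class Gateway:
    """Serves requests using whatever fusion strategy it was last notified of."""

    def __init__(self, name):
        self.name = name
        self.policy = None

    def on_policy_change(self, policy):
        # Step 3: the gateway swaps its strategy and keeps serving.
        self.policy = policy

    def serve(self, request):
        return {"gateway": self.name, "policy": self.policy["name"], "request": request}


class Master:
    """Stores the current fusion strategy and notifies every registered gateway."""

    def __init__(self):
        self.policy = None
        self.gateways = []

    def register(self, gateway):
        self.gateways.append(gateway)
        if self.policy is not None:
            gateway.on_policy_change(self.policy)  # late joiners get the current strategy

    def change_policy(self, policy):
        # Step 2: store the new strategy, then notify all gateways.
        self.policy = policy
        for gateway in self.gateways:
            gateway.on_policy_change(policy)
```

A user-side call such as `master.change_policy({"name": "recall_then_rank"})` corresponds to step 1; every gateway then answers subsequent requests under the new strategy without a restart.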

In summary, a user of the recommendation system of the present application does not need to attend to the operation of each part of the recommendation service and can obtain the corresponding recommendation result by inputting lightly processed raw data. According to the usage scenario, the user can select the required multi-algorithm fusion recommendation processing flow through the configuration file and, by configuring the master service, can pair upstream and downstream algorithms, such as a recall algorithm and a ranking algorithm, to return more accurate recommendation results.

As described above, the present disclosure is directed to a recommendation system and a multi-algorithm fusion recommendation processing flow, and all technical solutions that are the same as or similar to the present disclosure should be considered as falling within the scope of the present disclosure.

Claims

1. A recommendation system, comprising:

2. The recommendation system according to claim 1, wherein the feature transformation and model training module employs multiple servers to perform computation simultaneously, and uses communication and aggregation to ensure the correctness of the computation result, wherein the main server control architecture comprises:

3. The recommendation system according to claim 1, wherein the model file storage module serializes the model using BigDL, generates the model file, and stores the model file in the distributed file system, and the model reading module parses the model file using MLeap to generate the prediction model.

4. The recommendation system according to claim 1, wherein said feature transformation and model training module periodically updates the model, and said API service module saves the result of each request and obtains the result directly when the same request is encountered.

5. A multi-algorithm fusion recommendation processing flow, characterized in that a plurality of algorithms are executed in sequence and the API service is performed last, wherein the output data of each algorithm is the input data of the next, and each algorithm performs, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step.

6. A multi-algorithm fusion recommendation processing flow, characterized in that a plurality of front-stage algorithms are executed in parallel, the output results of all the front-stage algorithms are then fed to the same back-stage algorithm, and the API service is performed last, wherein each front-stage algorithm and the back-stage algorithm perform, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step.