CN111695035A - Recommendation system and multi-algorithm fusion recommendation processing flow - Google Patents


Info

Publication number
CN111695035A
Authority
CN
China
Legal status
Granted
Application number
CN202010522860.7A
Other languages
Chinese (zh)
Other versions
CN111695035B (en)
Inventor
王劲
周建平
任兆江
Current Assignee
Guangdong Sugo Technology Co ltd
Original Assignee
Guangdong Sugo Technology Co ltd
Application filed by Guangdong Sugo Technology Co ltd
Priority to CN202010522860.7A
Publication of CN111695035A
Application granted
Publication of CN111695035B
Status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a recommendation system and a multi-algorithm fusion recommendation processing flow. A feature conversion and model training module integrates several feature conversion algorithms and several model training algorithms, so they do not have to be loaded separately when called, which makes the system practical. When a model is tuned, a new model can be trained and saved simply by modifying a configuration file. In practice, users only need to care about the input data and the output results; the intermediate feature processing and model training are encapsulated, which saves the cost of maintaining them separately. A model file storage module stores model files on a distributed file system and records their basic information in a relational database, so that a model reading module can read the model basic information quickly and conveniently and load the corresponding model files from the distributed file system according to that information. An API service module obtains the loaded models, receives network requests, and returns recommendation results to the front end.

Description

Recommendation system and multi-algorithm fusion recommendation processing flow
Technical Field
The invention relates to a recommendation system and a multi-algorithm fusion recommendation processing flow.
Background
In recent years, with the development and spread of mobile internet technology, more and more user behavior data is being generated, and a wealth of information has accumulated around each user, which is what makes recommendation systems possible. As the number of users grows explosively and suppliers offer an ever wider variety of items, a recommendation system essentially searches massive amounts of information for content a user is likely to find interesting, even when the user's needs are not explicit, and provides accurate personalized recommendations.
Users have static and dynamic attributes: static attributes such as age and gender, and dynamic attributes such as historical behavior and context information (login time, login device, etc.). Items likewise have static attributes such as price, tags, and category, and dynamic attributes such as promotional activities and discounts. By combining the static and dynamic attributes of users and items, the system predicts which items a user is interested in and provides recommendations personalized to each individual user.
One of the most common solutions for current recommendation systems is GPU-based TensorFlow. The TensorFlow architecture generally needs high-performance graphics cards to handle large-scale data, which is costly. Because small and medium-sized enterprises generally run on the Hadoop ecosystem, such a deployment is difficult for them. In addition, Transform (feature processing) and Trainer (model training) are stored separately, and the two parts have to be called and connected individually. Feature processing, model training, and the recommendation service are completely separated, so each needs independent maintenance, which is costly, and adjusting the feature processing and the hyperparameters is inconvenient.
Overcoming these drawbacks has therefore become an important problem for those skilled in the art.
Disclosure of Invention
The invention overcomes these shortcomings of the prior art by providing a recommendation system and a multi-algorithm fusion recommendation processing flow.
To achieve this purpose, the invention adopts the following technical scheme:
a recommendation system, comprising:
a data preprocessing module, configured to parse input data, convert it into data feature columns of a specified format, and output the data feature columns, wherein the format of the input data and the format of the data feature columns are both specified through a configuration file;
a feature conversion and model training module, configured to apply several rounds of feature conversion to the data feature columns to turn them into samples of the required type and format, to perform several rounds of model training on the samples, and to save the resulting algorithm model, wherein the algorithms and parameters used for feature conversion and model training are specified through a configuration file;
a model file storage module, configured to store model files on a distributed file system, wherein the stored content comprises the processing flows and the inputs and outputs of the data preprocessing module and of the feature conversion and model training module, together with the configuration information of the feature conversion algorithms and the model training algorithms, and to record the basic information of each model file in a relational database, the basic information comprising the model file name, version, and storage path;
a model reading module, configured to connect to the corresponding relational database, read the model basic information, load the corresponding model file from the distributed file system according to that information, and parse the file to obtain the complete input and output information and the complete parameters of the model;
and an API service module, configured to obtain the loaded model, listen on a network port, and receive network requests sent by the front end, wherein the request body comprises a model name and sample features, and to return a recommendation result to the front end.
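The configuration-driven feature conversion and model training module can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the JSON configuration keys (`transforms`, `trainer`), the registry names (`min_max`, `standardize`, `linear`), and the one-feature linear model trained by gradient descent are hypothetical stand-ins, not the algorithms used in the patent. What it shows is that switching a conversion or training algorithm only means editing the configuration text, not the calling code.

```python
import json
from statistics import mean, pstdev

# Hypothetical feature conversion algorithms, looked up by the name used in the config.
def min_max(column):
    lo, hi = min(column), max(column)
    span = (hi - lo) or 1.0  # avoid dividing by zero on a constant column
    return [(x - lo) / span for x in column]

def standardize(column):
    mu, sigma = mean(column), pstdev(column) or 1.0
    return [(x - mu) / sigma for x in column]

# Hypothetical model training algorithm: fit y ~ w * x + b by batch gradient descent.
def train_linear(xs, ys, lr=0.1, epochs=1000):
    w = b = 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        w -= lr * 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * 2 * sum(errors) / n
    return {"w": w, "b": b}

TRANSFORMS = {"min_max": min_max, "standardize": standardize}
TRAINERS = {"linear": train_linear}

def run_pipeline(config_text, column, labels):
    """Apply the configured conversions in order, then the configured trainer."""
    cfg = json.loads(config_text)
    for name in cfg["transforms"]:
        column = TRANSFORMS[name](column)
    trainer_cfg = cfg["trainer"]
    params = TRAINERS[trainer_cfg["name"]](column, labels, **trainer_cfg.get("params", {}))
    # Keeping the config next to the parameters lets the saved model be re-created later.
    return {"config": cfg, "params": params}
```

For example, replacing `"min_max"` with `"standardize"` in the configuration yields differently converted samples and a new model without any change to `run_pipeline`.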
Preferably, the feature conversion and model training module uses several servers to compute simultaneously and ensures the correctness of the results through communication and aggregation, wherein the control architecture of the main server comprises:
a data and model segmentation unit, configured to split the input data into parts of equal size, with no part of local data exceeding a specified size, and then to split the model: during computation the model is copied directly into several sub-models, the number of sub-models being equal to the number of data parts;
a communication mechanism control unit, responsible for sending the sub-models and the split local data to several sub-servers over the network; each sub-server computes on the sub-model and local data it receives and, when finished, sends its result back to the main server over the network for subsequent processing;
and a data and model aggregation unit, configured to aggregate the data and models of the computation results, including the gradients from back-propagation during algorithm training, that is, to aggregate the results computed by the sub-servers and update the main model; the communication mechanism control unit then propagates the update to every sub-model, so that the sub-models remain consistent while computing simultaneously.
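The split, broadcast, and aggregate cycle of the main server can be simulated in a single process. This is a hedged sketch of plain synchronous data parallelism under stated assumptions: the sub-servers are ordinary function calls rather than network peers, the model is a hypothetical one-feature linear regression, and the aggregation rule (summing per-part gradients and dividing by the total sample count) is one common choice, not a rule the patent specifies.

```python
import math

def split_data(samples, max_part_size):
    """Split samples into near-equal parts, none larger than max_part_size."""
    n_parts = max(1, math.ceil(len(samples) / max_part_size))
    base, extra = divmod(len(samples), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        parts.append(samples[start:start + size])
        start += size
    return parts

def local_gradient(sub_model, part):
    """Work done by one sub-server: summed squared-error gradient on its local data."""
    w, b = sub_model["w"], sub_model["b"]
    grad_w = sum(2 * (w * x + b - y) * x for x, y in part)
    grad_b = sum(2 * (w * x + b - y) for x, y in part)
    return grad_w, grad_b, len(part)

def train_data_parallel(samples, max_part_size, lr=0.3, steps=1000):
    main_model = {"w": 0.0, "b": 0.0}
    parts = split_data(samples, max_part_size)
    for _ in range(steps):
        # Segmentation unit: one copy of the main model (a sub-model) per data part.
        sub_models = [dict(main_model) for _ in parts]
        # Communication unit: in a cluster these calls would be network round trips.
        results = [local_gradient(m, p) for m, p in zip(sub_models, parts)]
        # Aggregation unit: combine the partial gradients and update the main model.
        total = sum(n for _, _, n in results)
        main_model["w"] -= lr * sum(g for g, _, _ in results) / total
        main_model["b"] -= lr * sum(g for _, g, _ in results) / total
    return main_model, parts
```

Because the aggregated gradient equals the full-batch gradient, a partitioned run reaches the same model as a single-partition run; the split only changes where the work is done.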
Preferably, the model file storage module serializes the model with BigDL to generate a model file and stores it on the distributed file system, and the model reading module parses the model file with MLeap to generate the prediction model.
Preferably, the feature conversion and model training module updates the model periodically, and the API service module stores the result of each request so that the stored result can be returned directly when the same request is received again.
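A request-result cache combined with periodic model replacement might look like the following sketch. The class name, the cache key built from the model name plus the sorted feature pairs, and the rule of discarding a model's cached results when it is replaced are assumptions; the patent only states that results are stored and reused for identical requests and that the model is updated periodically. In a deployment, the retraining job (for example one scheduled with `threading.Timer`) would call `update_model`.

```python
import threading

class CachedRecommender:
    """Serves recommendations and memoizes the result of every request (hypothetical sketch)."""

    def __init__(self, models):
        self._models = dict(models)  # model name -> scoring callable
        self._cache = {}             # (model name, features) -> result
        self._lock = threading.Lock()
        self.model_calls = 0         # counts real model invocations

    def recommend(self, model_name, features):
        # Sorting the items makes the key independent of the order of request fields.
        key = (model_name, tuple(sorted(features.items())))
        with self._lock:
            if key not in self._cache:
                self._cache[key] = self._models[model_name](features)
                self.model_calls += 1
            return self._cache[key]

    def update_model(self, model_name, model):
        """Called by the periodic retraining job: swap the model and drop its stale results."""
        with self._lock:
            self._models[model_name] = model
            self._cache = {k: v for k, v in self._cache.items() if k[0] != model_name}
```

The lock is held while the model runs, which keeps the sketch simple but serializes requests; a production service would likely use a finer-grained scheme.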
The scheme also claims a multi-algorithm fusion recommendation processing flow that executes several algorithms in sequence and finally provides an API service, wherein the output data of each algorithm is the input data of the next, and each algorithm performs, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, wherein:
a data preprocessing step: parsing input data, converting it into data feature columns of a specified format, and outputting the data feature columns, wherein the format of the input data and the format of the data feature columns are both specified through a configuration file;
a feature conversion and model training step: applying several rounds of feature conversion to the data feature columns to turn them into samples of the required type and format, performing several rounds of model training on the samples, and saving the algorithm model, wherein the algorithms and parameters used for feature conversion and model training are specified through a configuration file;
a model file saving step: storing a model file on a distributed file system, wherein the stored content comprises the processing flow and the inputs and outputs of the data preprocessing step, the feature conversion step, and the model training step, together with the configuration information of the feature conversion algorithms and the model training algorithms, and recording the basic information of the model file in a relational database, the basic information comprising the name, version, and storage path of the model file;
a model reading step: connecting to the corresponding relational database, reading the model basic information, loading the corresponding model file from the distributed file system according to that information, and parsing the file to obtain the complete input and output information and the complete parameters of the model;
an API service step: obtaining the loaded model, listening on a network port, and receiving network requests sent by the front end, wherein the request body comprises a model name and sample features, and the API service module returns a recommendation result to the front end.
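The sequential (upstream/downstream) fusion can be reduced to a chain in which each stage's output becomes the next stage's input. In this hypothetical sketch each stage stands for an already trained and loaded model, so the per-algorithm preprocessing, training, saving, and reading steps are omitted; the recall-then-rank pairing and all field names are illustrative assumptions.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]

def recall_stage(data: Dict) -> Dict:
    """Hypothetical upstream algorithm: keep the n_recall most popular items."""
    popularity = data["popularity"]
    candidates = sorted(popularity, key=popularity.get, reverse=True)[: data["n_recall"]]
    return {**data, "candidates": candidates}

def rank_stage(data: Dict) -> Dict:
    """Hypothetical downstream algorithm: reorder the upstream candidates by user affinity."""
    affinity = data["affinity"]
    ranked = sorted(data["candidates"], key=lambda item: affinity.get(item, 0.0), reverse=True)
    return {**data, "ranked": ranked}

def run_sequential(stages: List[Stage], data: Dict) -> Dict:
    # The output data of each algorithm is the input data of the next one.
    for stage in stages:
        data = stage(data)
    return data
```

With popularity `{"a": 5, "b": 9, "c": 1, "d": 7}` and `n_recall` of 3, the recall stage passes `["b", "d", "a"]` downstream, and the rank stage only reorders those three items.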
The scheme also claims another multi-algorithm fusion recommendation processing flow that executes several preceding-stage algorithms in parallel, feeds their output results into the same subsequent-stage algorithm, and finally provides an API service, wherein each preceding-stage algorithm and the subsequent-stage algorithm perform, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, wherein:
a data preprocessing step: parsing input data, converting it into data feature columns of a specified format, and outputting the data feature columns, wherein the format of the input data and the format of the data feature columns are both specified through a configuration file;
a feature conversion and model training step: applying several rounds of feature conversion to the data feature columns to turn them into samples of the required type and format, performing several rounds of model training on the samples, and saving the algorithm model, wherein the algorithms and parameters used for feature conversion and model training are specified through a configuration file;
a model file saving step: storing a model file on a distributed file system, wherein the stored content comprises the processing flow and the inputs and outputs of the data preprocessing step, the feature conversion step, and the model training step, together with the configuration information of the feature conversion algorithms and the model training algorithms, and recording the basic information of the model file in a relational database, the basic information comprising the name, version, and storage path of the model file;
a model reading step: connecting to the corresponding relational database, reading the model basic information, loading the corresponding model file from the distributed file system according to that information, and parsing the file to obtain the complete input and output information and the complete parameters of the model;
an API service step: obtaining the loaded model, listening on a network port, and receiving network requests sent by the front end, wherein the request body comprises a model name and sample features, and the API service module returns a recommendation result to the front end.
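For the parallel fusion flow, the independent preceding-stage algorithms can run concurrently and their outputs can be handed together to one subsequent-stage algorithm. The sketch assumes two hypothetical recall algorithms that return item scores and a downstream step that fuses them by summing scores; the patent does not specify the fusion rule, so summation is only an illustrative choice, and the data dictionaries are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data the two upstream algorithms draw on.
POPULARITY = {"x": 0.9, "y": 0.6, "z": 0.3}
SIMILAR_ITEMS = {"a": ["z", "w"]}

def popularity_recall(user):
    """Upstream algorithm 1: popular items the user has not seen yet."""
    return {item: s for item, s in POPULARITY.items() if item not in user["seen"]}

def similarity_recall(user):
    """Upstream algorithm 2: items similar to the user's last item."""
    return {item: 1.0 for item in SIMILAR_ITEMS.get(user["last_item"], [])}

def fuse_by_sum(outputs, k=3):
    """Downstream algorithm: depends on all upstream results at once (sum of scores)."""
    totals = {}
    for scores in outputs:
        for item, s in scores.items():
            totals[item] = totals.get(item, 0.0) + s
    return sorted(totals, key=totals.get, reverse=True)[:k]

def run_parallel_then_merge(upstream, downstream, user):
    # Upstream algorithms do not depend on each other, so they can run concurrently;
    # pool.map returns their outputs in the order of the upstream list.
    with ThreadPoolExecutor(max_workers=max(1, len(upstream))) as pool:
        outputs = list(pool.map(lambda algo: algo(user), upstream))
    return downstream(outputs)
```

For a user who has seen `x` and last viewed `a`, item `z` is proposed by both upstream algorithms and therefore ranks first after fusion.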
Compared with the prior art, the invention has the following beneficial effects:
1. The data preprocessing module parses input data into data that the feature conversion and model training module can recognize. The feature conversion and model training module integrates several feature conversion algorithms and several model training algorithms, so they do not need to be loaded separately when called, which makes the system practical. When a model is tuned, a new model can be obtained simply by modifying the configuration file and then training and saving. In practice, users only need to care about the input data and the output results, because the intermediate feature processing and model training are encapsulated, which saves the cost of maintaining them separately. The algorithms for feature processing and model training are selected by modifying the configuration file, so a suitable strategy can be chosen for each application scenario. The model file storage module stores model files on the distributed file system and records their basic information in the relational database, so the model reading module can connect to that database, read the model basic information, and quickly load the corresponding model files from the distributed file system; the API service module then obtains the loaded models and receives network requests to return recommendation results to the front end.
2. The structure of the feature conversion and model training module makes it easy to run one computation in parallel on several servers by splitting the input data and aggregating the partial results, which helps maximize the computational efficiency of the algorithm.
3. In the first multi-algorithm fusion recommendation processing flow, the algorithms are combined upstream and downstream, that is, one algorithm depends on the computation result of another, which suits concrete applications. Because each algorithm performs, in order, the data preprocessing step, the feature conversion and model training step, the model file saving step, and the model reading step, the model can conveniently be updated in one pass through the several algorithms actually selected.
4. In the second multi-algorithm fusion recommendation processing flow, the algorithms are combined in parallel, that is, several algorithms have no dependence on one another, while another algorithm depends on all of their computation results at once, which suits concrete applications.
5. The scheme uses a lightweight service framework in which the saved model file serves as the medium between offline training and the online service, so the online service need not deal with complex model training and distributed deployment; the framework is lighter, and prediction tasks can be completed efficiently.
6. The feature conversion and model training module updates the model periodically, and the API service module stores the result of each request and returns it directly when the same request is received again, which improves responsiveness.
Drawings
Fig. 1 is a structural diagram of the recommendation system of the present invention.
Fig. 2 is a first schematic diagram of the multi-algorithm fusion recommendation processing flow of the present invention.
Fig. 3 is a second schematic diagram of the multi-algorithm fusion recommendation processing flow of the present invention.
Fig. 4 shows an architecture that can be adopted by the multi-algorithm fusion recommendation processing flows.
Detailed Description
The features of the present invention and other related features are described in further detail below by way of examples, to help those skilled in the art understand them:
As shown in Fig. 1, a recommendation system includes:
a data preprocessing module, configured to parse input data, convert it into data feature columns of a specified format, and output the data feature columns, wherein the format of the input data and the format of the data feature columns are both specified through a configuration file;
a feature conversion and model training module, configured to apply several rounds of feature conversion to the data feature columns to turn them into samples of the required type and format, to perform several rounds of model training on the samples, and to save the resulting algorithm model, wherein the algorithms and parameters used for feature conversion and model training are specified through a configuration file;
a model file storage module, configured to store model files on a distributed file system, wherein the stored content comprises the processing flows and the inputs and outputs of the data preprocessing module and of the feature conversion and model training module, together with the configuration information of the feature conversion algorithms and the model training algorithms, and to record the basic information of each model file in a relational database, the basic information comprising the model file name, version, and storage path;
a model reading module, configured to connect to the corresponding relational database, read the model basic information, load the corresponding model file from the distributed file system according to that information, and parse the file to obtain the complete input and output information and the complete parameters of the model;
and an API service module, configured to obtain the loaded model, listen on a network port, and receive network requests sent by the front end, wherein the request body comprises a model name and sample features, and to return a recommendation result to the front end.
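The data preprocessing module's job, parsing raw input into typed feature columns whose formats come from a configuration file, can be sketched with the Python standard library. The CSV input layout, the JSON configuration schema, and the three column types are hypothetical examples chosen for illustration; the patent does not fix any of them.

```python
import csv
import io
import json

# Hypothetical configuration: layout of the raw input and the feature columns to emit.
CONFIG = """
{
  "input": {"delimiter": ",", "fields": ["user_id", "age", "gender", "clicks"]},
  "columns": [
    {"name": "age", "type": "float"},
    {"name": "gender", "type": "category"},
    {"name": "clicks", "type": "int"}
  ]
}
"""

CASTS = {"float": float, "int": int, "category": str}

def preprocess(raw_text, config_text):
    """Parse delimited raw records into typed feature columns as the config specifies."""
    cfg = json.loads(config_text)
    reader = csv.DictReader(
        io.StringIO(raw_text),
        fieldnames=cfg["input"]["fields"],
        delimiter=cfg["input"]["delimiter"],
    )
    columns = {spec["name"]: [] for spec in cfg["columns"]}
    for row in reader:
        for spec in cfg["columns"]:
            # Fields not listed under "columns" (here user_id) are simply dropped.
            columns[spec["name"]].append(CASTS[spec["type"]](row[spec["name"]].strip()))
    return columns
```

Changing the input delimiter or adding a column then only requires editing `CONFIG`.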
As described above, the data preprocessing module parses input data into data that the feature conversion and model training module can recognize. The feature conversion and model training module integrates several feature conversion algorithms and several model training algorithms, so they do not need to be loaded separately when called, which makes the system practical. When a model is tuned, a new model can be obtained simply by modifying the configuration file and then training and saving. In practice, users only need to care about the input data and the output results, because the intermediate feature processing and model training are encapsulated, which saves the cost of maintaining them separately. The algorithms for feature processing and model training are selected by modifying the configuration file, so a suitable strategy can be chosen for each application scenario. The model file storage module stores model files on the distributed file system and records their basic information in the relational database, so the model reading module can connect to that database, read the model basic information, and quickly load the corresponding model files from the distributed file system; the API service module then obtains the loaded models and receives network requests to return recommendation results to the front end.
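The division between model files on a distributed file system and model basic information (name, version, storage path) in a relational database can be sketched as follows. Assumptions: an in-memory SQLite database stands in for the relational database, a local temporary directory stands in for the distributed file system, and JSON stands in for the BigDL serialization format; the table name, function names, and model-file fields are hypothetical.

```python
import json
import os
import sqlite3
import tempfile

def init_registry(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS model_info (
               name TEXT NOT NULL, version INTEGER NOT NULL, path TEXT NOT NULL,
               PRIMARY KEY (name, version))"""
    )

def save_model(conn, store_dir, name, model):
    """Write the model file, then record name, version, and storage path."""
    row = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM model_info WHERE name = ?", (name,)
    ).fetchone()
    version = row[0] + 1
    path = os.path.join(store_dir, f"{name}-v{version}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(model, f)
    conn.execute("INSERT INTO model_info (name, version, path) VALUES (?, ?, ?)", (name, version, path))
    conn.commit()
    return version

def load_model(conn, name, version=None):
    """Look up the basic information first, then load the file it points to."""
    if version is None:
        row = conn.execute(
            "SELECT version, path FROM model_info WHERE name = ? ORDER BY version DESC LIMIT 1", (name,)
        ).fetchone()
    else:
        row = conn.execute(
            "SELECT version, path FROM model_info WHERE name = ? AND version = ?", (name, version)
        ).fetchone()
    if row is None:
        raise KeyError(f"no model registered under {name!r}")
    with open(row[1], encoding="utf-8") as f:
        return row[0], json.load(f)

def demo():
    """Round trip: save two versions of a model, then read the latest and the first back."""
    conn = sqlite3.connect(":memory:")
    init_registry(conn)
    with tempfile.TemporaryDirectory() as store_dir:
        base = {"flow": ["min_max", "linear"], "inputs": ["age"], "outputs": ["score"]}
        save_model(conn, store_dir, "ctr", {**base, "params": {"w": 1.0}})
        save_model(conn, store_dir, "ctr", {**base, "params": {"w": 2.0}})
        return load_model(conn, "ctr"), load_model(conn, "ctr", version=1)
```

Keeping only small metadata rows in the database and the full model (processing flow, inputs and outputs, parameters) in the file lets the reader pick a version with one indexed query before touching the larger file.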
As described above, in a specific implementation, the feature conversion and model training module uses several servers to compute simultaneously and ensures the correctness of the results through communication and aggregation, wherein the control architecture of the main server includes:
a data and model segmentation unit, configured to split the input data into parts of equal size, with no part of local data exceeding a specified size, and then to split the model: during computation the model is copied directly into several sub-models, the number of sub-models being equal to the number of data parts;
a communication mechanism control unit, responsible for sending the sub-models and the split local data to several sub-servers over the network; each sub-server computes on the sub-model and local data it receives and, when finished, sends its result back to the main server over the network for subsequent processing;
and a data and model aggregation unit, configured to aggregate the data and models of the computation results, including the gradients from back-propagation during algorithm training, that is, to aggregate the results computed by the sub-servers and update the main model; the communication mechanism control unit then propagates the update to every sub-model, so that the sub-models remain consistent while computing simultaneously.
As mentioned above, the structure of the feature conversion and model training module makes it easy to run one computation in parallel on several servers by splitting the input data and aggregating the partial results, which helps maximize the computational efficiency of the algorithm.
As described above, in a specific implementation, the model file storage module serializes the model with BigDL to generate a model file and stores it on the distributed file system, and the model reading module parses the model file with MLeap to generate the prediction model.
As mentioned above, the scheme uses a lightweight service framework in which the saved model file serves as the medium between offline training and the online service, so the online service need not deal with complex model training and distributed deployment; the framework is lighter, and prediction tasks can be completed efficiently.
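A lightweight online service that only loads saved models and answers requests could be structured like this sketch. The request body follows the description (a model name plus sample features), but the concrete JSON field names (`model`, `samples`), the linear scorer standing in for a deserialized model, the top-k cutoff, and the port are assumptions. `handle_request` holds the logic; `serve` wraps it in the standard library's `http.server` and is not started here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def make_linear_scorer(weights):
    """Hypothetical stand-in for a model loaded from a saved model file."""
    def score(item_features):
        return sum(weights.get(k, 0.0) * v for k, v in item_features.items())
    return score

def handle_request(body, models, top_k=2):
    """Parse {"model": name, "samples": {item: features}} and return the top-k items as JSON."""
    req = json.loads(body)
    model = models[req["model"]]
    scored = {item: model(feats) for item, feats in req["samples"].items()}
    ranked = sorted(scored, key=scored.get, reverse=True)[:top_k]
    return json.dumps({"model": req["model"], "recommendations": ranked})

def make_handler(models):
    class RecommendHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload, status = handle_request(self.rfile.read(length), models), 200
            except (KeyError, ValueError) as exc:  # unknown model or malformed JSON
                payload, status = json.dumps({"error": str(exc)}), 400
            data = payload.encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)
    return RecommendHandler

def serve(models, port=8080):
    """Listen on a network port and answer recommendation requests from the front end."""
    HTTPServer(("", port), make_handler(models)).serve_forever()
```

Because the service only evaluates models it has loaded, training and distributed deployment stay entirely on the offline side, in line with the lightweight design described above.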
As described above, in a specific implementation, the feature conversion and model training module updates the model periodically, and the API service module stores the result of each request and returns it directly when the same request is received again, which improves responsiveness.
As shown in Fig. 2, the present application further discloses a multi-algorithm fusion recommendation processing flow that executes several algorithms in sequence and finally provides an API service, wherein the output data of each algorithm is the input data of the next, and each algorithm performs, in order, a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, wherein:
a data preprocessing step: parsing input data, converting it into data feature columns of a specified format, and outputting the data feature columns, wherein the format of the input data and the format of the data feature columns are both specified through a configuration file;
a feature conversion and model training step: applying several rounds of feature conversion to the data feature columns to turn them into samples of the required type and format, performing several rounds of model training on the samples, and saving the algorithm model, wherein the algorithms and parameters used for feature conversion and model training are specified through a configuration file;
a model file saving step: storing a model file on a distributed file system, wherein the stored content comprises the processing flow and the inputs and outputs of the data preprocessing step, the feature conversion step, and the model training step, together with the configuration information of the feature conversion algorithms and the model training algorithms, and recording the basic information of the model file in a relational database, the basic information comprising the name, version, and storage path of the model file;
a model reading step: connecting to the corresponding relational database, reading the model basic information, loading the corresponding model file from the distributed file system according to that information, and parsing the file to obtain the complete input and output information and the complete parameters of the model;
an API service step: obtaining the loaded model, listening on a network port, and receiving network requests sent by the front end, wherein the request body comprises a model name and sample features, and the API service module returns a recommendation result to the front end.
As described above, the algorithms in this multi-algorithm fusion recommendation processing flow of the present disclosure are combined upstream and downstream, that is, one algorithm depends on the computation result of another, which suits concrete applications. Because each algorithm performs, in order, the data preprocessing step, the feature conversion and model training step, the model file saving step, and the model reading step, the model can conveniently be updated in one pass through the several algorithms actually selected.
As shown in Fig. 3, the present application further discloses another multi-algorithm fusion recommendation processing flow, in which a plurality of pre-stage algorithms are executed in parallel, their output results are all passed to the same post-stage algorithm, and the API service is performed last. Each pre-stage algorithm and the post-stage algorithm sequentially performs a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, wherein:
a data preprocessing step: parsing the input data, converting it into data feature columns of a specified format, and outputting the data feature columns, wherein both the format of the input data and the format of the data feature columns are specified through a configuration file;
a feature conversion and model training step: performing feature conversion on the data feature columns multiple times to convert them into samples of the required type and format, performing model training on the samples multiple times, and saving the algorithm model, wherein the algorithms and parameters used in feature conversion and model training are specified through a configuration file;
a model file saving step: storing the model file on a distributed file system, wherein the stored content includes the processing flow and inputs/outputs of the data preprocessing, feature conversion, and model training steps, together with the configuration information of the feature conversion algorithm and the model training algorithm, and recording the basic information of the model file in a relational database, wherein the basic information includes the model file name, version, and storage path;
a model reading step: connecting to the corresponding relational database, reading the basic information of the model, loading the corresponding model file from the distributed file system according to that information, and parsing the file to obtain the complete input/output information and complete parameters of the model;
API service: acquiring the loaded model, listening on a network port, and receiving network requests, wherein the front end sends a network request whose body contains a model name and sample features, and the API service module receives the request and returns a recommendation result to the front end.
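The model file saving and model reading steps can be pictured as a small model registry. The sketch below is an assumption-laden stand-in: a local temporary directory replaces the distributed file system, an in-memory SQLite database replaces the relational database, and the model file is plain JSON rather than the BigDL/MLeap formats named in claim 3. The table columns (`name`, `version`, `path`) mirror the basic information the text lists; the function names and artifact fields are hypothetical.

```python
import json
import os
import sqlite3
import tempfile
from typing import Any, Dict


def save_model(db: sqlite3.Connection, root: str, name: str, version: int,
               artifact: Dict[str, Any]) -> str:
    """Write the model file, then record name/version/path in the relational table."""
    path = os.path.join(root, f"{name}-v{version}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(artifact, f)
    db.execute("INSERT INTO models (name, version, path) VALUES (?, ?, ?)",
               (name, version, path))
    db.commit()
    return path


def read_model(db: sqlite3.Connection, name: str) -> Dict[str, Any]:
    """Look up the newest version of a model by name and parse its file."""
    row = db.execute(
        "SELECT path FROM models WHERE name = ? ORDER BY version DESC LIMIT 1",
        (name,)).fetchone()
    if row is None:
        raise KeyError(f"no model named {name!r}")
    with open(row[0], encoding="utf-8") as f:
        return json.load(f)


db = sqlite3.connect(":memory:")  # stand-in for the relational database
db.execute("CREATE TABLE models (name TEXT, version INTEGER, path TEXT, "
           "PRIMARY KEY (name, version))")
root = tempfile.mkdtemp()  # stand-in for a directory on the distributed file system
# Hypothetical artifact: the pipeline configuration travels with the weights.
artifact = {
    "preprocessing": {"input_format": "csv", "columns": ["user_id", "age"]},
    "feature_conversion": [{"op": "standardize", "column": "age"}],
    "training": {"algorithm": "logistic_regression", "params": {"reg": 0.01}},
    "weights": [0.4, -1.2],
}
save_model(db, root, "ctr", 1, artifact)
save_model(db, root, "ctr", 2, {**artifact, "weights": [0.5, -1.1]})
print(read_model(db, "ctr")["weights"])  # [0.5, -1.1]
```

Reading picks the highest recorded version, which is one plausible way to let a periodically retrained model replace its predecessor without changing the request format.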
As described above, the algorithms in this multi-algorithm fusion recommendation processing flow are combined in parallel: the pre-stage algorithms do not depend on one another, while the post-stage algorithm depends on the calculation results of all of them at the same time, which is convenient for specific applications.
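A minimal sketch of this parallel (fan-in) combination, assuming two hypothetical, independent recall algorithms (`item_cf_recall`, `popularity_recall`) and a hypothetical post-stage (`merge_by_votes`) that consumes both outputs at once. Threads stand in for whatever execution engine actually runs the pre-stage algorithms.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

PreStage = Callable[[str], List[str]]  # user id -> candidate items


def run_parallel_then_merge(pre_stages: Sequence[PreStage],
                            post_stage: Callable[[List[List[str]]], List[str]],
                            user_id: str) -> List[str]:
    """Independent pre-stage algorithms run concurrently; one post-stage uses all outputs."""
    with ThreadPoolExecutor(max_workers=max(1, len(pre_stages))) as pool:
        # map() yields results in the same order as pre_stages.
        outputs = list(pool.map(lambda stage: stage(user_id), pre_stages))
    return post_stage(outputs)


def item_cf_recall(user_id: str) -> List[str]:
    return ["i1", "i2", "i3"]  # hypothetical fixed output; user_id ignored in this toy


def popularity_recall(user_id: str) -> List[str]:
    return ["i3", "i4"]  # hypothetical fixed output


def merge_by_votes(outputs: List[List[str]]) -> List[str]:
    # Count how many pre-stages proposed each item; the stable sort keeps first-seen order on ties.
    votes: Dict[str, int] = {}
    for candidates in outputs:
        for item in candidates:
            votes[item] = votes.get(item, 0) + 1
    return sorted(votes, key=lambda item: -votes[item])


print(run_parallel_then_merge([item_cf_recall, popularity_recall], merge_by_votes, "u1"))
# ['i3', 'i1', 'i2', 'i4']
```

Because `ThreadPoolExecutor.map` returns results in input order, the post-stage sees a deterministic list regardless of which pre-stage finishes first.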
As shown in Fig. 4, both multi-algorithm fusion recommendation processing flows can be deployed on a cluster using a master-gateway architecture that includes one master node and a plurality of gateway nodes. The gateways receive and process service requests and return the algorithms' calculation results; they also detect model changes dynamically, update models in real time, and support horizontal scaling. The master manages the models and the fusion strategies, provides strategy management, and supports an active/standby mode to ensure high service availability.
The fusion strategy is updated as follows:
1. The user predefines a multi-algorithm fusion strategy and sends it to the master as a request;
2. The master receives the strategy change request, updates the strategy information it stores, and sends a strategy change notification to all gateways;
3. Each gateway receives the change notification, updates its algorithm fusion strategy, and continues to receive HTTP requests and provide interface service.
As described above, the master-gateway architecture facilitates real-time updating of fusion policies.
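The three-step strategy update follows a publish/notify pattern. The sketch below models it in a single process, assuming a strategy is a plain dictionary; method calls stand in for the network notifications, and the active/standby master failover is omitted. `Master`, `Gateway`, and the strategy keys are hypothetical names.

```python
import threading
from typing import Any, Dict, List


class Gateway:
    """Serves requests using whichever fusion strategy it was last notified of."""

    def __init__(self, name: str):
        self.name = name
        self._lock = threading.Lock()
        self.strategy: Dict[str, Any] = {}

    def on_strategy_change(self, strategy: Dict[str, Any]) -> None:
        # Step 3: replace the strategy under a lock; serving continues uninterrupted.
        with self._lock:
            self.strategy = dict(strategy)

    def serve(self, user_id: str) -> str:
        with self._lock:
            stages = self.strategy.get("stages", [])
        return f"{self.name} served {user_id} via {' -> '.join(stages)}"


class Master:
    """Stores the fusion strategy and notifies every registered gateway on change."""

    def __init__(self) -> None:
        self.strategy: Dict[str, Any] = {}
        self.gateways: List[Gateway] = []

    def register(self, gateway: Gateway) -> None:
        self.gateways.append(gateway)
        gateway.on_strategy_change(self.strategy)  # new gateways start in sync

    def change_strategy(self, strategy: Dict[str, Any]) -> None:
        self.strategy = dict(strategy)             # step 2: update the stored strategy
        for gateway in self.gateways:              # step 2: notify all gateways
            gateway.on_strategy_change(self.strategy)


master = Master()
gateways = [Gateway("gw1"), Gateway("gw2")]
for gw in gateways:
    master.register(gw)
# Step 1: the user submits a predefined strategy to the master.
master.change_strategy({"mode": "serial", "stages": ["recall", "rank"]})
print(gateways[1].serve("u42"))  # gw2 served u42 via recall -> rank
```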
In summary, users of the recommendation system of the present application do not need to attend to the operation of each part of the recommendation service; they can obtain the corresponding recommendation results simply by inputting lightly processed raw data. Depending on the usage scenario, the user can select the required multi-algorithm fusion recommendation processing flow through the configuration file and, by configuring the master service, choose a combination of upstream and downstream algorithms, such as a recall algorithm followed by a ranking algorithm, so as to return more accurate recommendation results.
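The data preprocessing step is what lets users supply lightly processed raw data: both the raw input format and the output feature columns come from configuration. The sketch below assumes a hypothetical configuration layout (delimiter, field list, typed output columns with optional defaults) expressed as a Python dictionary; the actual configuration file format is not specified in the text.

```python
import csv
import io
from typing import Any, Dict, List

# Hypothetical configuration; the text only says both formats come from a config file.
CONFIG: Dict[str, Any] = {
    "input": {"delimiter": ",", "fields": ["user_id", "item_id", "age", "clicked"]},
    "output_columns": [
        {"name": "user_id", "type": "str"},
        {"name": "age", "type": "float", "default": 0.0},
        {"name": "clicked", "type": "int", "default": 0},
    ],
}

CASTS = {"str": str, "int": int, "float": float}


def preprocess(raw: str, config: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Parse delimited raw text into typed feature columns named by the config."""
    spec = config["input"]
    reader = csv.DictReader(io.StringIO(raw), fieldnames=spec["fields"],
                            delimiter=spec["delimiter"])
    columns: Dict[str, List[Any]] = {c["name"]: [] for c in config["output_columns"]}
    for record in reader:
        for col in config["output_columns"]:
            value = (record.get(col["name"]) or "").strip()
            if value == "" and "default" in col:
                columns[col["name"]].append(col["default"])  # fill missing values
            else:
                # Raises ValueError if a numeric column is empty and has no default.
                columns[col["name"]].append(CASTS[col["type"]](value))
    return columns


print(preprocess("u1,i9,31,1\nu2,i4,,0\n", CONFIG))
# {'user_id': ['u1', 'u2'], 'age': [31.0, 0.0], 'clicked': [1, 0]}
```

Switching to, say, tab-separated input then only requires a configuration change, not a code change.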
As described above, the present disclosure is directed to a recommendation system and a multi-algorithm fusion recommendation processing flow, and all technical solutions that are the same as or similar to the present disclosure should be considered as falling within the scope of the present disclosure.

Claims (6)

1. A recommendation system, comprising:
a data preprocessing module, configured to parse input data, convert it into data feature columns of a specified format, and output the data feature columns, wherein the format of the input data and the format of the data feature columns are specified through a configuration file;
a feature conversion and model training module, configured to perform feature conversion on the data feature columns multiple times to convert them into samples of the required type and format, perform model training on the samples multiple times, and save the algorithm model, wherein the algorithms and parameters used in feature conversion and model training are specified through a configuration file;
a model file storage module, configured to store model files on a distributed file system, wherein the stored content includes the processing flows and inputs/outputs of the data preprocessing module and of the feature conversion and model training module, together with the configuration information of the feature conversion algorithm and the model training algorithm, and to record the basic information of the model files in a relational database, wherein the basic information of a model includes the model file name, version, and storage path;
a model reading module, configured to connect to the corresponding relational database, read the basic information of the model, load the corresponding model file from the distributed file system according to that information, and parse the file to obtain the complete input/output information and complete parameters of the model; and
an API service module, configured to acquire the loaded model, listen on a network port, and receive network requests, wherein the front end sends a network request whose body contains a model name and sample features, and the API service module receives the request and returns a recommendation result to the front end.
2. The recommendation system according to claim 1, wherein the feature conversion and model training module uses multiple servers computing simultaneously and relies on communication and aggregation to ensure that the computation results are correct, and the control architecture of the main server comprises:
a data and model segmentation unit, configured to split the input data into parts of equal size, wherein the size of each part of local data does not exceed a specified value, and then to split the model by directly copying it into multiple sub-models for the computation, the number of sub-models being equal to the number of data parts;
a communication mechanism control unit, responsible for sending the split sub-models and local data over the network to a plurality of sub-servers, wherein each sub-server performs its computation after receiving its sub-model and local data and, when finished, sends its result over the network to the main server for subsequent processing; and
a data and model aggregation unit, configured to aggregate the data and models of the computation results, namely to aggregate the back-propagated gradients during algorithm training, that is, to aggregate the data computation results of the plurality of sub-servers and update the main model, wherein the updated result is communicated to each sub-model via the communication mechanism control unit so that the sub-models remain consistent during simultaneous computation.
3. The recommendation system according to claim 1, wherein the model file storage module serializes the model using BigDL to generate the model file and stores it in the distributed file system, and the model reading module parses the model file using MLeap to generate the prediction model.
4. The recommendation system according to claim 1, wherein the feature conversion and model training module periodically updates the model, and the API service module caches the result of each request and returns the cached result directly when the same request is received again.
5. A multi-algorithm fusion recommendation processing flow, characterized in that a plurality of algorithms are executed sequentially and the API service is performed last, wherein the output data of each preceding algorithm is the input data of the following algorithm, and each algorithm sequentially performs a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, wherein:
a data preprocessing step: parsing the input data, converting it into data feature columns of a specified format, and outputting the data feature columns, wherein both the format of the input data and the format of the data feature columns are specified through a configuration file;
a feature conversion and model training step: performing feature conversion on the data feature columns multiple times to convert them into samples of the required type and format, performing model training on the samples multiple times, and saving the algorithm model, wherein the algorithms and parameters used in feature conversion and model training are specified through a configuration file;
a model file saving step: storing the model file on a distributed file system, wherein the stored content includes the processing flow and inputs/outputs of the data preprocessing, feature conversion, and model training steps, together with the configuration information of the feature conversion algorithm and the model training algorithm, and recording the basic information of the model file in a relational database, wherein the basic information includes the model file name, version, and storage path;
a model reading step: connecting to the corresponding relational database, reading the basic information of the model, loading the corresponding model file from the distributed file system according to that information, and parsing the file to obtain the complete input/output information and complete parameters of the model;
API service: acquiring the loaded model, listening on a network port, and receiving network requests, wherein the front end sends a network request whose body contains a model name and sample features, and the API service module receives the request and returns a recommendation result to the front end.
6. A multi-algorithm fusion recommendation processing flow, characterized in that a plurality of pre-stage algorithms are executed in parallel, their output results are all passed to the same post-stage algorithm, and the API service is performed last, wherein each pre-stage algorithm and the post-stage algorithm sequentially performs a data preprocessing step, a feature conversion and model training step, a model file saving step, and a model reading step, wherein:
a data preprocessing step: parsing the input data, converting it into data feature columns of a specified format, and outputting the data feature columns, wherein both the format of the input data and the format of the data feature columns are specified through a configuration file;
a feature conversion and model training step: performing feature conversion on the data feature columns multiple times to convert them into samples of the required type and format, performing model training on the samples multiple times, and saving the algorithm model, wherein the algorithms and parameters used in feature conversion and model training are specified through a configuration file;
a model file saving step: storing the model file on a distributed file system, wherein the stored content includes the processing flow and inputs/outputs of the data preprocessing, feature conversion, and model training steps, together with the configuration information of the feature conversion algorithm and the model training algorithm, and recording the basic information of the model file in a relational database, wherein the basic information includes the model file name, version, and storage path;
a model reading step: connecting to the corresponding relational database, reading the basic information of the model, loading the corresponding model file from the distributed file system according to that information, and parsing the file to obtain the complete input/output information and complete parameters of the model;
API service: acquiring the loaded model, listening on a network port, and receiving network requests, wherein the front end sends a network request whose body contains a model name and sample features, and the API service module receives the request and returns a recommendation result to the front end.
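Claim 2 describes data-parallel training: equal-size data shards capped at a maximum size, one model copy per shard, per-shard gradients computed on sub-servers, and aggregation on the main server before the updated model is sent back. The sketch below simulates the sub-servers with in-process calls and substitutes a linear model trained by plain gradient descent for the back-propagated model; weighting each shard's mean gradient by its size is an assumption chosen so that the aggregate equals the full-batch gradient. It assumes non-empty training data, and all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Sample = Tuple[List[float], float]  # (feature vector, target)


def split_data(data: Sequence[Sample], max_shard: int) -> List[List[Sample]]:
    """Split into the fewest near-equal shards whose sizes do not exceed max_shard."""
    n_shards = max(1, math.ceil(len(data) / max_shard))
    base, extra = divmod(len(data), n_shards)
    shards, start = [], 0
    for k in range(n_shards):
        end = start + base + (1 if k < extra else 0)  # sizes differ by at most one
        shards.append(list(data[start:end]))
        start = end
    return shards


def shard_gradient(weights: List[float], shard: List[Sample]) -> Tuple[List[float], int]:
    """Sub-server work: mean-squared-error gradient of a linear model on local data."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xi in enumerate(x):
            grad[j] += 2.0 * err * xi / len(shard)
    return grad, len(shard)


def aggregate(results: List[Tuple[List[float], int]]) -> List[float]:
    """Main-server aggregation: size-weighted mean of the shard gradients."""
    total = sum(n for _, n in results)
    dim = len(results[0][0])
    return [sum(g[j] * n for g, n in results) / total for j in range(dim)]


def train_step(weights: List[float], data: Sequence[Sample],
               max_shard: int, lr: float) -> List[float]:
    shards = split_data(data, max_shard)
    replicas = [list(weights) for _ in shards]  # one model copy per shard
    # In a real cluster each call below would run on a separate sub-server.
    results = [shard_gradient(w, s) for w, s in zip(replicas, shards)]
    grad = aggregate(results)
    # Updated master weights; these would be broadcast to the replicas next step.
    return [w - lr * g for w, g in zip(weights, grad)]


# Toy data from y = 2x + 1; the constant 1.0 feature plays the role of a bias term.
DATA = [([i / 10, 1.0], 2 * (i / 10) + 1) for i in range(10)]
weights = [0.0, 0.0]
for _ in range(500):
    weights = train_step(weights, DATA, max_shard=4, lr=0.5)
print([round(w, 4) for w in weights])  # [2.0, 1.0]
```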
CN202010522860.7A 2020-06-10 2020-06-10 Recommendation system and multi-algorithm fusion recommendation processing flow Active CN111695035B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010522860.7A CN111695035B (en) 2020-06-10 2020-06-10 Recommendation system and multi-algorithm fusion recommendation processing flow

Publications (2)

Publication Number Publication Date
CN111695035A true CN111695035A (en) 2020-09-22
CN111695035B CN111695035B (en) 2023-05-05

Family

ID=72480103

Country Status (1)

Country Link
CN (1) CN111695035B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant