CN106846082A - Hardware-information-based product recommendation system and method for cold-start tourism users - Google Patents

Info

Publication number
CN106846082A
Authority
CN
China
Prior art keywords
data
cold
arithmetic elements
tourism
cold start
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611134210.5A
Other languages
Chinese (zh)
Other versions
CN106846082B (en)
Inventor
李宏恩
朱存良
裴少芳
李业北
孟敬慈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu To Mdt Infotech Ltd
Original Assignee
Jiangsu To Mdt Infotech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu To Mdt Infotech Ltd
Priority to CN201611134210.5A
Publication of CN106846082A
Application granted
Publication of CN106846082B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0269 Targeted advertisements based on user profile or attribute
    • G06Q30/0271 Personalized advertisement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/14 Travel agencies

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • Tourism & Hospitality (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Game Theory and Decision Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Primary Health Care (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hardware-information-based product recommendation system for cold-start tourism users and a corresponding recommendation method. The system comprises a data preprocessing module, an algorithm module (also called the computing module) and a prediction module. The data preprocessing module comprises a data extraction unit, a data vectorization unit and a data serialization unit; the computing module comprises a Canopy operation unit, a Kmeans operation unit and an RF (random forest) operation unit. The invention uses the behavior information of existing users and fuses several algorithms: hardware combinations are the samples, and user behavior toward products supplies the features that are extracted and modeled. Recommendations are made per hardware combination, so a new user can receive personalized recommendations without any product information, which improves the user experience. Because the newest output of each unit is obtained from the latest daily user preference data and the hyperparameter values of each algorithm in the algorithm module are updated periodically, the system continuously optimizes itself and adapts to changing data, which keeps its recommendations accurate and reliable and makes it more efficient.

Description

Hardware-information-based product recommendation system and method for cold-start tourism users
Technical field
The invention belongs to the field of computer data processing, and in particular relates to a hardware-information-based product recommendation system and recommendation method for cold-start users.
Background
Personalized recommendation is becoming increasingly common across system platforms. Existing approaches mainly analyze a user's preferences from the user's historical operation records and then recommend the products that match those preferences. This raises the company's sales and also improves user satisfaction, achieving two goals at once. For cold-start users (that is, new users), however, no historical operation records exist, so personalized recommendation based on historical data is impossible. Cold-start users are therefore currently served mostly in one of the following ways:
1. Popular-item recommendation, which gives every user the same results. This clearly cannot meet users' individual needs, and products outside the popular list receive very little exposure, so it is hard to raise either the site-wide transaction volume or the exposure of new products.
2. Business staff or product managers manually group new users and define recommendation results for each group. The groups and results must be updated by hand at regular intervals, which consumes substantial labor, cannot cope with massive data, cannot handle users with new features in time, and is inefficient; the recommended products are also rather uniform and small in scale.
3. Recommendation based on the user's registration information or on answers to prompt questions. Because this information is incomplete, the recommended products are not strongly personalized.
Summary of the invention
To solve the above problems, the invention discloses a more flexible and reliable system and method for recommending to new users: new users are classified by clustering first and classifying afterwards, and the popular items of the corresponding class are recommended.
In order to achieve the above object, the present invention provides following technical scheme:
A hardware-information-based product recommendation system for cold-start tourism users comprises a data preprocessing module, an algorithm module and a prediction module.
The data preprocessing module comprises a data extraction unit, a data vectorization unit and a data serialization unit.
The data extraction unit selects user data with historical behavior along the time dimension and extracts the corresponding cold-start information combinations, the products browsed and the PV (page view) counts, producing a user preference data table. The data vectorization unit applies a data matrixing method to the user preference data table: it takes each cold-start information combination as the object of analysis and uses the route numbers as the features of that combination, yielding the correspondence between cold-start information combinations and the full product list. The data serialization unit serializes the correspondence table of cold-start information combinations and product lists produced by the data vectorization unit.
The computing module (the algorithm module named above) comprises a Canopy operation unit, a Kmeans operation unit and an RF (random forest) operation unit.
The Canopy operation unit processes the serialized cold-start data matrix and outputs a center point file. The Kmeans operation unit further optimizes the center points obtained by the Canopy operation unit to obtain more accurate center points. The RF operation unit takes the center-point clustering result from the Kmeans operation unit, obtains RF training data by random sampling, and uses cross-validation to obtain the optimal RF prediction model.
The Kmeans operation unit computes the class centers from the center point file produced by Canopy. It also contains a ClusterClassifier subunit, which takes the matrixed item-browsing data from the data preprocessing module and the center point file produced by Kmeans, traverses the matrix to compute the distance between every vector and every center point, and uses the minimum distance to decide the class of each vector. The class label is assigned to the corresponding cold-start information combination, which clusters the cold-start information, and the most popular items in each class are computed at the same time.
The prediction module feeds online data into the RF prediction model, receives the predicted class, and retrieves the popular item list.
As a further improvement of the invention, a suitable clusterFilter is preset in the Canopy operation unit to remove from the clustering result the center points that contain few samples.
As a further improvement of the invention, after the ClusterClassifier subunit outputs the clustering result, the ratio of inter-class distance to intra-class distance is increased by adjusting the Canopy parameters.
As a further improvement of the invention, the prediction module selects items with particular attributes from the recommendation list according to a configured filter condition.
A hardware-information-based product recommendation method for cold-start tourism users comprises the following steps:
Step 1: Select user data with historical behavior along the time dimension and extract the corresponding cold-start information combinations, the products browsed and the PV counts to obtain a user preference data table.
Step 2: Apply a data matrixing method to the user preference data table: take each cold-start information combination as the object of analysis, use the route numbers associated with it as its features, and obtain the correspondence between cold-start information combinations and the full product list as matrix data.
Step 3: Serialize the matrix data from step 2.
Step 4: Obtain the initial cluster center point file, including the number of clusters and the positions of the class centers, with the Canopy algorithm. As an improvement of this step, a suitable clusterFilter should be preset to remove isolated center points from the clustering result.
Step 5: Take the serialized item-browsing data obtained in step 3 and the center point file obtained in step 4, and compute the Kmeans center point data on the Mahout platform.
Step 6: Take the matrixed item-browsing data obtained in step 2 and the center point file obtained in step 5, traverse the matrix to compute the distance between every vector and every center point, use the minimum distance to decide the class of each vector, and assign the class label to the corresponding cold-start information combination. This clusters the cold-start information and yields the popular items in each class.
Step 7: Take the cold-start combination information and class labels obtained in step 6, obtain RF training data by random sampling, and verify the accuracy of the RF model's output by cross-validation. Adjust the number of trees and the tree depth in the RF model within the limits of the platform's resources until the accuracy is within an acceptable range, then store the resulting RF model.
Step 8: Receive online cold-start data and forward it to the RF model through an interface. Once the predicted class is returned, send a request to the module that stores the popular items of each class and retrieve the popular item list.
As a further improvement of the invention, a suitable clusterFilter is preset in step 4 to remove from the clustering result the center points that contain few samples.
As an improvement of the invention, step 6 also increases the ratio of inter-class distance to intra-class distance by adjusting parameters.
As an improvement of the invention, step 8 also selects items with particular attributes from the recommendation list according to a configured filter condition.
Compared with the prior art, the invention has the following advantages and beneficial effects:
The invention uses the behavior information of existing users and fuses several algorithms: hardware combinations are the samples, and user behavior toward products supplies the features that are extracted and modeled. Recommendations are made per hardware combination, so a new user can receive personalized recommendations without any product information, which improves the user experience and in turn the purchase conversion rate. Because the newest output of each unit of the computing module is obtained from the latest daily user preference data and the hyperparameter values of each algorithm in the algorithm module are updated periodically, the system continuously optimizes itself and adapts to changing data, which keeps its recommendations accurate and reliable and makes it more efficient. In addition, operators can choose which dimensions of user cold-start information to use, as well as the size and the time range of the sample data, to meet system configuration needs.
Brief description of the drawings
Fig. 1 is an architecture diagram of the hardware-information-based product recommendation system for cold-start tourism users provided by the invention.
Fig. 2 shows the user preference data table obtained by the data extraction unit.
Fig. 3 shows the correspondence matrix between cold-start information combinations and all product lists produced by the data vectorization unit.
Fig. 4 shows the initial cluster center points obtained by the Canopy operation unit.
Fig. 5 shows the class centers obtained by the Kmeans operation unit.
Fig. 6 shows the clustering result obtained by the ClusterClassifier subunit.
Fig. 7 is a flowchart of the hardware-information-based product recommendation method for cold-start tourism users provided by the invention.
Detailed description
The technical solution provided by the invention is described in detail below with reference to specific embodiments. It should be understood that the following embodiments only illustrate the invention and do not limit its scope.
As shown in Fig. 1, a hardware-information-based product recommendation system for cold-start tourism users comprises a data preprocessing module, a computing module and a prediction module. The data preprocessing module extracts and preprocesses the data of users with historical behavior to obtain a serialized user preference data matrix that contains the cold-start information. The computing module performs clustering on the data matrix from the data preprocessing module, thereby classifying the cold-start information and obtaining the popular items in each class, and produces the prediction model that classifies new users. The prediction module obtains online cold-start data, calls the prediction model produced by the computing module to get the predicted class, and then retrieves the popular item list.
The data preprocessing module comprises a data extraction unit, a data vectorization unit and a data serialization unit. The data extraction unit selects user data with historical behavior along the time dimension and extracts the corresponding cold-start information combinations (such as terminal hardware information, App version number and city), the products browsed and the PV counts (traffic), producing a user preference data table. The structure of the table is shown in Fig. 2, where hwinfo is the cold-start information combination, dest_id is the product (mainly route) number, and num is the traffic that users with that hardware generated for the product. The data vectorization unit preprocesses the data with a data matrixing method: based on the user preference data table in Fig. 2, it takes each cold-start information combination as the object of analysis and uses the route numbers associated with it in Fig. 2 as its features, yielding the correspondence between cold-start information combinations and the full product list shown in Fig. 3. The value under each route-number feature in Fig. 3 is the traffic of that cold-start information combination for that route. The data serialization unit serializes the correspondence table produced by the data vectorization unit. Serialization is implemented with Mahout, specifically with Mahout's own vector class org.apache.mahout.math.SequentialAccessSparseVector: the in-memory object data is saved to disk, which removes the heavy cost of converting the raw data (on disk) into Java objects (in memory) on every read and thus improves computing efficiency on large, high-dimensional data.
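The matrixing step can be illustrated with a minimal Python sketch. It is not the Mahout-based implementation described here: the sample rows, the `|`-joined hwinfo strings and the helper names `build_matrix` and `to_dense` are assumptions made for illustration only. Only the column names hwinfo, dest_id and num come from the text.

```python
from collections import defaultdict

# Hypothetical rows of the user preference table: (hwinfo, dest_id, num).
rows = [
    ("iPhone7|v5.2|Nanjing", 101, 4),
    ("iPhone7|v5.2|Nanjing", 205, 1),
    ("MI5|v5.1|Shanghai", 101, 2),
    ("MI5|v5.1|Shanghai", 309, 7),
    ("iPhone7|v5.2|Nanjing", 101, 3),  # repeated pair: traffic is summed
]

def build_matrix(rows):
    """Pivot (hwinfo, dest_id, num) rows into one sparse vector per hwinfo."""
    routes = sorted({dest_id for _, dest_id, _ in rows})
    column = {dest_id: i for i, dest_id in enumerate(routes)}
    sparse = defaultdict(lambda: defaultdict(int))
    for hwinfo, dest_id, num in rows:
        sparse[hwinfo][column[dest_id]] += num
    # Keep only non-zero entries, similar in spirit to a sparse vector.
    return routes, {hw: dict(vec) for hw, vec in sparse.items()}

def to_dense(vec, width):
    """Expand a sparse {column: value} vector into a dense list."""
    return [vec.get(i, 0) for i in range(width)]

routes, matrix = build_matrix(rows)
print(routes)                                                  # [101, 205, 309]
print(to_dense(matrix["iPhone7|v5.2|Nanjing"], len(routes)))   # [7, 1, 0]
```

Each row of the resulting matrix corresponds to one cold-start information combination and each column to one route number, which matches the layout the text attributes to Fig. 3.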
The computing module comprises a Canopy operation unit, a Kmeans operation unit and an RF (random forest) operation unit. The Canopy operation unit processes the serialized cold-start data matrix and outputs a center point file; the Kmeans operation unit further optimizes the center points obtained by the Canopy operation unit to obtain more accurate center points; and the RF operation unit obtains the RF model from the center-point clustering result produced by the Kmeans operation unit.
The Canopy operation unit takes as input the serialized item-browsing data matrix produced by the preprocessing module (Fig. 3). An initial value of T2 is derived from the typical distance between similar items in the matrix, and the initial settings of the Canopy parameters T1, T2 and clusterFilter are based on it. Specifically, the Canopy operation unit computes the distances between all points and draws a three-dimensional scatter plot to analyze how the points are distributed (each point is one row of the matrix), and suitable values of T1 and T2 are then chosen empirically: T1 generally does not exceed the maximum distance between two points, and T2 is initially set to 1/2 of the average distance between all points and then fine-tuned on the experimental results until the number of clusters and the size of each class are acceptable. Finally, the Canopy operation unit outputs the initial cluster center point file shown in Fig. 4, which fixes the number of clusters and the positions of the class centers needed by the Kmeans algorithm. As an improvement, a suitable clusterFilter is preferably preset in the Canopy operation unit (it is set empirically; in this example, classes with fewer than 50 points are considered of little help to the recommendation and are filtered out). This removes from the clustering result the center points that contain few samples, avoids empty classes in the subsequent Kmeans clustering, and improves the reliability of the clustering result.
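The Canopy step described above can be sketched in a few lines of Python. This is a simplified single-machine variant, not Mahout's distributed implementation: the function names, the `min_size` parameter (standing in for clusterFilter, with "fewer than `min_size` points" as an assumed semantics) and the sample points are all hypothetical. `initial_t2` follows the rule of thumb in the text, half the average pairwise distance.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def initial_t2(points):
    """Rule of thumb from the text: half the average pairwise distance."""
    pairs = list(combinations(points, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs) / 2

def canopy_centers(points, t1, t2, min_size=1):
    """Return canopy centroids; t1 is the loose threshold, t2 the tight one."""
    if not 0 < t2 < t1:
        raise ValueError("expected 0 < t2 < t1")
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Points within the loose radius form the canopy around the seed.
        members = [p for p in remaining if euclidean(seed, p) < t1]
        # Points within the tight radius can no longer seed a new canopy.
        remaining = [p for p in remaining if euclidean(seed, p) >= t2]
        # Assumed clusterFilter behaviour: drop canopies with too few points.
        if len(members) >= min_size:
            centers.append([sum(col) / len(members) for col in zip(*members)])
    return centers

# Hypothetical data: two dense groups and one isolated point.
points = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11], [50, 50]]
centers = canopy_centers(points, t1=5.0, t2=2.0, min_size=2)
print(len(centers))  # 2: the canopy around the isolated point is filtered out
```

With `min_size=1` the isolated point at (50, 50) would survive as a third center, which is the kind of sparse class the text says the filter is meant to remove before Kmeans runs.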
In the Kmeans operation unit, unlike the common practice of using Kmeans to cluster every point, the invention uses the Kmeans algorithm to compute the class centers. The serialized item-browsing data (Fig. 3) and the center point file produced by Canopy (Fig. 4) are the inputs of the Kmeans operation unit, which computes the class centers from the Canopy center point file (Fig. 4); the resulting class centers are shown in Fig. 5 and are stored. The ClusterClassifier subunit of the Kmeans operation unit takes the matrixed item-browsing data (Fig. 3) and the center point file produced by Kmeans (Fig. 5), traverses the matrix to compute the distance between every vector and every center point, uses the minimum distance to decide the class of each vector, and assigns the class label to the corresponding cold-start information combination. At the same time it computes the most popular items in each class, where popularity is judged by user traffic and purchase volume. The ClusterClassifier subunit thus clusters the cold-start information and produces the popular items of each class; the clustering result and the popular items of each class are stored. The clustering result is shown in Fig. 6 (each original point in the figure corresponds to one row of data in Fig. 3). Running the ClusterClassifier method in a distributed, parallel fashion over a large number of center points improves efficiency and makes it applicable to real-time scenarios. As a further improvement of the invention, after the ClusterClassifier subunit outputs the clustering result, the clustering quality is judged by computing the ratio of inter-class distance to intra-class distance. The Canopy parameters (mainly T1 and T2, with the outlier threshold as an auxiliary parameter) are then adjusted to keep increasing this ratio, so that classes are better separated from each other and the samples within a class are more compact. By judging this distance ratio, the accuracy of the RF self-test was raised from 70% to more than 90%.
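The three operations in this paragraph (refining seeded centers, nearest-center labelling with per-class popular items, and the distance ratio) can be sketched as follows. This is a plain-Python illustration under stated assumptions, not Mahout's Kmeans or ClusterClassifier: all function names and the sample matrix are hypothetical, popularity is ranked by traffic only (purchase volume is not modelled), and the ratio is assumed to be the mean distance between centers divided by the mean distance of each vector to its own center.

```python
import math
from collections import Counter, defaultdict

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(vec, centers):
    """Index of the center with the minimum distance to vec."""
    return min(range(len(centers)), key=lambda k: dist(vec, centers[k]))

def kmeans(vectors, centers, iterations=10):
    """Lloyd iterations starting from given (for example Canopy) centers."""
    centers = [list(c) for c in centers]
    for _ in range(iterations):
        groups = defaultdict(list)
        for v in vectors:
            groups[nearest(v, centers)].append(v)
        for k, members in groups.items():  # empty classes keep their center
            centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers

def classify(matrix, centers):
    """Label every cold-start combination with its nearest center."""
    return {hw: nearest(vec, centers) for hw, vec in matrix.items()}

def popular_items(matrix, labels, routes, top_n=2):
    """Rank routes inside each class by summed traffic."""
    totals = defaultdict(Counter)
    for hw, vec in matrix.items():
        for route, num in zip(routes, vec):
            totals[labels[hw]][route] += num
    return {k: [r for r, _ in c.most_common(top_n)] for k, c in totals.items()}

def separation_ratio(matrix, labels, centers):
    """Inter-class over intra-class distance; needs >= 2 centers and some spread."""
    within = [dist(vec, centers[labels[hw]]) for hw, vec in matrix.items()]
    between = [dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
    return (sum(between) / len(between)) / (sum(within) / len(within))

# Hypothetical dense matrix: one row per cold-start combination.
routes = [101, 205, 309]
matrix = {"hwA": [7, 1, 0], "hwB": [6, 2, 0], "hwC": [0, 1, 8], "hwD": [1, 0, 9]}
centers = kmeans(list(matrix.values()), [[7, 1, 0], [0, 1, 8]])
labels = classify(matrix, centers)
top = popular_items(matrix, labels, routes, top_n=1)
ratio = separation_ratio(matrix, labels, centers)
```

A larger `ratio` means the classes are further apart relative to their internal spread, which is the quantity the text proposes to increase by tuning T1 and T2.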
After the RF operation unit receives the cold-start combination information and class labels produced by ClusterClassifier, it obtains RF training data by random sampling and verifies the accuracy of the RF model's output by cross-validation. The number of trees and the tree depth in the RF model are adjusted according to the limits of the platform's resources and the accuracy required of the model. The RF model is implemented with Mahout in three main stages: generating the data description file, RF modeling, and cross-validating the data. Data preprocessing (converting the data into the input format the RF algorithm requires) and a model self-test step are added along the way to strengthen the reliability of the model. First, 70% of the ClusterClassifier output is randomly selected as model training data, and the remaining data is used for cross-validation. The data description file is obtained by calling Mahout and is part of the input to RF modeling. RF modeling places certain demands on physical memory, and memory overflow is frequently encountered during development. In this experiment, the number of trees is set with the parameter nbtrees and the tree depth is controlled indirectly with the parameter ms, which governs node splitting; the model is self-tested on the modeling data, so that several acceptable parameter combinations are found quickly. The best modeling parameters are finally selected by cross-validation, and the resulting optimal RF prediction model is stored. Because the clustering result from the ClusterClassifier subunit is used as the input of the classification process, training and test data are easier to obtain, the approach is suitable for real-time online processing, and the model generalizes well to users with partially missing information.
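To make the 70/30 hold-out and the parameter search concrete, here is a deliberately small random forest written in plain Python. It is not Mahout's implementation and makes several assumptions: features are already numerically encoded, class labels are integers, `n_trees` and `min_split` are stand-ins for nbtrees and ms, and the synthetic data and function names are hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_tree(rows, labels, min_split, rng):
    """Grow one tree; a node with fewer than min_split samples becomes a leaf."""
    if len(rows) < min_split or len(set(labels)) == 1:
        return majority(labels)
    n_features = len(rows[0])
    best = None
    # Each node considers a random subset of about sqrt(n_features) features.
    for f in rng.sample(range(n_features), max(1, int(n_features ** 0.5))):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            cut = (lo + hi) / 2  # midpoint between neighbouring values
            left = [i for i, r in enumerate(rows) if r[f] <= cut]
            right = [i for i, r in enumerate(rows) if r[f] > cut]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, cut, left, right)
    if best is None:  # the sampled features cannot separate these rows
        return majority(labels)
    _, f, cut, left, right = best
    grow = lambda idx: build_tree([rows[i] for i in idx],
                                  [labels[i] for i in idx], min_split, rng)
    return (f, cut, grow(left), grow(right))

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples, leaves are labels
        f, cut, left, right = node
        node = left if row[f] <= cut else right
    return node

def train_forest(rows, labels, n_trees, min_split, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], min_split, rng))
    return forest

def predict(forest, row):
    return majority([predict_tree(tree, row) for tree in forest])

def holdout_search(rows, labels, grid, train_share=0.7, seed=0):
    """Train on a random 70% and pick the (n_trees, min_split) with best accuracy."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    cut = int(len(order) * train_share)
    train, test = order[:cut], order[cut:]
    results = []
    for n_trees, min_split in grid:
        forest = train_forest([rows[i] for i in train],
                              [labels[i] for i in train], n_trees, min_split, seed)
        hits = sum(predict(forest, rows[i]) == labels[i] for i in test)
        results.append((hits / len(test), n_trees, min_split))
    return max(results)

# Hypothetical, well-separated data: class 0 near the origin, class 1 near (10, 10).
rows = [[i % 4, (i * 3) % 4] for i in range(10)] + \
       [[10 + i % 4, 10 + (i * 3) % 4] for i in range(10)]
labels = [0] * 10 + [1] * 10
grid = [(5, 2), (15, 2), (5, 8)]
best = holdout_search(rows, labels, grid)  # (accuracy, n_trees, min_split)
```

A larger `min_split` stops splitting earlier and therefore yields shallower trees, which is the indirect depth control the text attributes to ms; fewer or shallower trees also reduce memory use at some cost in accuracy.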
The prediction module feeds online data into the RF prediction model through an interface. Once the predicted class is returned, it sends a request to the module that stores the popular items of each class and retrieves the popular item list. The prediction module can also select items with particular attributes from the recommendation list according to a configured filter condition.
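The online flow just described is a lookup chain: predict a class, fetch that class's popular items, optionally filter. A minimal Python sketch follows, with a toy function standing in for the stored RF model; `recommend`, `toy_model`, the `kind` attribute and the sample data are all hypothetical.

```python
def recommend(cold_start_info, predict_class, popular_by_class, keep=None, top_n=3):
    """Predict the class of a new user and return that class's popular items."""
    label = predict_class(cold_start_info)
    items = popular_by_class.get(label, [])
    if keep is not None:
        # Optional filter condition on item attributes.
        items = [item for item in items if keep(item)]
    return items[:top_n]

def toy_model(info):
    """Hypothetical stand-in for the stored RF prediction model."""
    return 0 if info["city"] == "Nanjing" else 1

# Hypothetical store of popular items per class.
popular_by_class = {
    0: [{"dest_id": 101, "kind": "group"}, {"dest_id": 205, "kind": "self-guided"}],
    1: [{"dest_id": 309, "kind": "group"}],
}

new_user = {"hardware": "iPhone7", "app_version": "5.2", "city": "Nanjing"}
all_items = recommend(new_user, toy_model, popular_by_class)
group_only = recommend(new_user, toy_model, popular_by_class,
                       keep=lambda item: item["kind"] == "group")
print([item["dest_id"] for item in all_items])   # [101, 205]
print([item["dest_id"] for item in group_only])  # [101]
```

Keeping the model, the popular-item store and the filter as separate arguments mirrors the separation between the prediction interface and the popular-item storage module in the text.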
Based on the above-mentioned tourism cold start-up consumer products commending system based on hardware information, present invention also offers based on hard The tourism cold start-up consumer products of part information recommend method, as shown in fig. 7, comprises following steps:
Step 1:Select the user data of historical behavior based on time dimension, extract and corresponding cold open information combination(Such as terminal Hardware information, App version numbers, place city etc.), product and PV numbers that correspondence is browsed(Flow number), obtain user preference number According to table.
Step 2:By data matrix method, based on user preference data table, made using the cold information combination that opens It is analysis object, by the cold circuit number for opening information combination as the cold feature for opening information combination, obtains cold opening information combination With the corresponding relation of all product lists as matrix data.
Step 3:Matrix data in step 2 is serialized.
Step 4:Initial cluster center dot file is obtained by Canopy algorithms, including number of clusters and class center position Put.As the improvement of the step, suitable clusterFilter, the isolated central point in removal cluster result should be preset.
Step 5:The center dot file that the serialized data and step 4 that the article that step 3 is obtained is browsed are obtained, passes through Mahout platforms obtain the center point data after Kmeans is calculated, and storage center point data.
Step 6:The center dot file that the matrixing data and step 5 that the article that acquisition step 2 is obtained is browsed are obtained, time Each vector and each midpoint distance in calculating matrix are gone through, with minimum value as the mark for judging vector generic, by classification GO TO assignment to it is corresponding it is cold open information combination, realization opens the cluster of information to cold, and generates popular article in each classification, Cluster result and it is of all categories in popular article be stored.As the improvement of this step, class can also be compared by calculating Between distance and inter- object distance ratio, by the final cluster result of ratio in judgement and preserve, while collecting under each classification Popular article and preserve.
Step 7:Cold start-up combined information and its affiliated classification that step 6 is obtained are obtained, RF moulds are obtained by random sampling Type training data, using the method for checking of reporting to the leadship after accomplishing a task, verifies the accuracy of the output result of RF models, with reference to the limitation of platform resource The number and the depth of tree set in adjustment RF models, and make accuracy in tolerance interval, finally give RF models and store up Leave and.
Step 8: Receive online cold-start data and forward it to the RF model through an interface; after the predicted class is returned, send a request to the module that stores the popular items of each class and retrieve the popular item list. As an improvement of this step, items with particular attributes can also be selected from the recommendation list according to specified filter conditions and pushed to the user-facing front-end interface.
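The online path in step 8 reduces to a class lookup followed by an optional attribute filter. In the sketch below, `classify` is a stand-in for the stored RF model, and the `popular_by_class` table, the `route_attrs` attributes, and the `domestic` flag are hypothetical names introduced only for the example.

```python
def recommend(combo, classify, popular_by_class, keep=lambda route: True):
    """Predict the class of a new visitor and return that class's popular routes."""
    label = classify(combo)
    return [r for r in popular_by_class.get(label, []) if keep(r)]

# Hypothetical stored results from step 6 and hypothetical route attributes.
popular_by_class = {0: ["R001", "R003"], 1: ["R002", "R003"]}
route_attrs = {
    "R001": {"domestic": True},
    "R002": {"domestic": False},
    "R003": {"domestic": True},
}

def classify(combo):
    # Stand-in for the RF model: decides the class from the first field only.
    return 0 if combo[0] == "iOS" else 1

unfiltered = recommend(("iOS", "Suzhou"), classify, popular_by_class)
domestic_only = recommend(
    ("Android", "Suzhou"), classify, popular_by_class,
    keep=lambda r: route_attrs[r]["domestic"],
)
```

Because the popular lists are precomputed per class, the only model call at request time is the single class prediction.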
The technical means disclosed in the present invention are not limited to those disclosed in the above embodiments, and also include technical solutions formed by any combination of the above technical features. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the principles of the invention, and such improvements and modifications are also regarded as falling within the scope of protection of the invention.

Claims (8)

1. A travel cold-start user product recommendation system based on hardware information, characterized in that it comprises a data preprocessing module, an algorithm module and a prediction module;
The data preprocessing module comprises a data extraction unit, a data vectorization unit and a data serialization unit;
The data extraction unit is used to select user data with historical behavior along the time dimension and extract the corresponding cold-start information combination, the products browsed and the PV (page view) counts, obtaining a user preference data table; the data vectorization unit is used to apply a data matrixing method to the user preference data table, taking each cold-start information combination as the object of analysis and using route numbers as the features of that combination, to obtain the correspondence between cold-start information combinations and the full product list; the data serialization unit is used to serialize the mapping table between cold-start information combinations and the product list produced by the data vectorization unit;
The algorithm module comprises a Canopy computing unit, a Kmeans computing unit and an RF computing unit;
The Canopy computing unit is used to compute a center file from the serialized cold-start matrix data; the Kmeans computing unit further optimizes the center points obtained by the Canopy computing unit to obtain more accurate center points; the RF computing unit is used to obtain RF model training data by random sampling from the clustering result produced by the Kmeans computing unit and to obtain the optimal RF prediction model by cross-validation;
The Kmeans computing unit is used to compute class centers from the center file obtained by Canopy; the Kmeans computing unit further comprises a ClusterClassifier subunit, which is used to take the matrixed item-browsing data obtained by the data preprocessing module and the center file computed by Kmeans, traverse the matrix and compute the distance between each vector and each center point, use the minimum distance to determine the class each vector belongs to, assign the class label to the corresponding cold-start information combination so as to cluster the cold-start information, and at the same time compute the most popular items under each class;
The prediction module is used to input online data into the RF prediction model, receive the returned predicted class, and retrieve the popular item list.
2. The travel cold-start user product recommendation system based on hardware information according to claim 1, characterized in that: in the Canopy computing unit, a suitable clusterFilter is preset to remove from the clustering result the center points that contain few samples.
3. The travel cold-start user product recommendation system based on hardware information according to claim 1, characterized in that: after the ClusterClassifier subunit outputs the clustering result, the ratio of between-class distance to within-class distance is further increased by adjusting the Canopy parameters.
4. The travel cold-start user product recommendation system based on hardware information according to claim 1, characterized in that: the prediction module selects items with particular attributes from the recommendation list according to specified filter conditions.
5. A travel cold-start user product recommendation method based on hardware information, characterized in that it comprises the following steps:
Step 1: Select user data with historical behavior along the time dimension and extract the corresponding cold-start information combination, the products browsed and the PV counts, obtaining a user preference data table;
Step 2: Using a data matrixing method on the user preference data table, take each cold-start information combination as the object of analysis and use the route numbers browsed under that combination as its features, obtaining the correspondence between cold-start information combinations and the full product list as matrix data;
Step 3: Serialize the matrix data obtained in step 2;
Step 4: Obtain the initial cluster center file with the Canopy algorithm, including the number of clusters and the positions of the class centers;
Step 5: Feed the serialized item-browsing data obtained in step 3 and the center file obtained in step 4 to the Mahout platform to obtain the center point data computed by Kmeans;
Step 6: Take the matrixed item-browsing data obtained in step 2 and the center file obtained in step 5, traverse the matrix and compute the distance between each vector and each center point, use the minimum distance to determine the class each vector belongs to, and assign the class label to the corresponding cold-start information combination, thereby clustering the cold-start information and generating the popular items in each class;
Step 7: Take the cold-start combination information and its class obtained in step 6, obtain RF model training data by random sampling, verify the accuracy of the RF model's output by cross-validation, adjust the number of trees and the tree depth set in the RF model in light of platform resource limits so that the accuracy falls within an acceptable range, and store the resulting RF model;
Step 8: Receive online cold-start data and forward it to the RF model through an interface; after the predicted class is returned, send a request to the module that stores the popular items of each class and retrieve the popular item list.
6. The travel cold-start user product recommendation method based on hardware information according to claim 5, characterized in that: in step 4, a suitable clusterFilter is preset to remove from the clustering result the center points that contain few samples.
7. The travel cold-start user product recommendation method based on hardware information according to claim 5, characterized in that: in step 6, the ratio of between-class distance to within-class distance is further increased by adjusting parameters.
8. The travel cold-start user product recommendation method based on hardware information according to claim 5, characterized in that: in step 8, items with particular attributes are further selected from the recommendation list according to specified filter conditions.
CN201611134210.5A 2016-12-10 2016-12-10 Travel cold start user product recommendation system and method based on hardware information Active CN106846082B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611134210.5A CN106846082B (en) 2016-12-10 2016-12-10 Travel cold start user product recommendation system and method based on hardware information

Publications (2)

Publication Number Publication Date
CN106846082A (en) 2017-06-13
CN106846082B (en) 2021-07-30

Family

ID=59140727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611134210.5A Active CN106846082B (en) 2016-12-10 2016-12-10 Travel cold start user product recommendation system and method based on hardware information

Country Status (1)

Country Link
CN (1) CN106846082B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010013009A1 (en) * 1997-05-20 2001-08-09 Daniel R. Greening System and method for computer-based marketing
CN103455555A (en) * 2013-08-06 2013-12-18 北京大学深圳研究生院 Recommendation method and device based on mobile terminal similarity
CN103559252A (en) * 2013-11-01 2014-02-05 桂林电子科技大学 Method for recommending scenery spots probably browsed by tourists
CN104616221A (en) * 2014-07-30 2015-05-13 江苏物泰信息科技有限公司 Intelligent tour recommendation system
CN106033589A (en) * 2015-03-10 2016-10-19 上海昕鼎网络科技有限公司 Personalized service method and system for tour route

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
HAMID PARVIN等: "Nearest Cluster Classifier", 《HYBRID ARTIFICIAL INTELLIGENT SYSTEMS》 *
冯跃飞等: "《形势与政策》", 31 August 2016 *
吴喜之: "《统计学:从数据到结论》", 31 March 2013 *
张影等: "《预测与评价》", 31 May 2015 *
朱蔷蔷等: "基于Hadoop平台上面向电影数据集Kmeans算法的改进", 《哈尔滨师范大学自然科学学报》 *
郑丹等: "基于weighted_slope_one用户聚类的林产品推荐算法", 《森林工程》 *
郑非等: "《体育统计学》", 31 July 2010 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108009877A (en) * 2017-11-24 2018-05-08 阿里巴巴集团控股有限公司 Information mining method and device
CN108009877B (en) * 2017-11-24 2021-10-15 创新先进技术有限公司 Information mining method and device
CN108629665A (en) * 2018-05-08 2018-10-09 北京邮电大学 A kind of individual commodity recommendation method and system
CN108629665B (en) * 2018-05-08 2021-07-16 北京邮电大学 Personalized commodity recommendation method and system
CN109102903A (en) * 2018-07-09 2018-12-28 康美药业股份有限公司 A kind of topic prediction technique and system for health consultation platform
CN112508512A (en) * 2020-11-26 2021-03-16 国网河北省电力有限公司经济技术研究院 Power grid engineering cost data management method and device and terminal equipment
CN112508512B (en) * 2020-11-26 2022-09-09 国网河北省电力有限公司经济技术研究院 Power grid engineering cost data management method and device and terminal equipment
CN113744021A (en) * 2021-02-08 2021-12-03 北京沃东天骏信息技术有限公司 Recommendation method, recommendation device, computer storage medium and recommendation system
CN113538110A (en) * 2021-08-13 2021-10-22 苏州工业职业技术学院 Similar article recommendation method based on browsing sequence
CN113538110B (en) * 2021-08-13 2023-08-11 苏州工业职业技术学院 Similar article recommending method based on browsing sequence

Also Published As

Publication number Publication date
CN106846082B (en) 2021-07-30

Similar Documents

Publication Publication Date Title
CN106846082A (en) Tourism cold start-up consumer products commending system and method based on hardware information
CN107844915B (en) Automatic scheduling method of call center based on traffic prediction
CN106897420B (en) Mobile phone signaling data-based user travel resident behavior identification method
CN108154430A (en) A kind of credit scoring construction method based on machine learning and big data technology
CN110503531A (en) The dynamic social activity scene recommended method of timing perception
CN111967910A (en) User passenger group classification method and device
CN104750674B (en) A kind of man-machine conversation's satisfaction degree estimation method and system
CN112291807B (en) Wireless cellular network traffic prediction method based on deep migration learning and cross-domain data fusion
CN104866831B (en) The face recognition algorithms of characteristic weighing
CN111178624A (en) Method for predicting new product demand
CN110674993A (en) User load short-term prediction method and device
CN107563343A (en) The self-perfection method and system of FaceID databases based on face recognition technology
CN107633035B (en) Shared traffic service reorder estimation method based on K-Means and LightGBM model
CN106776928A (en) Recommend method in position based on internal memory Computational frame, fusion social environment and space-time data
CN105956570B (en) Smiling face's recognition methods based on lip feature and deep learning
CN113706151A (en) Data processing method and device, computer equipment and storage medium
CN110147389A (en) Account number treating method and apparatus, storage medium and electronic device
CN109978215A (en) Patrol management method and device
CN105844334B (en) A kind of temperature interpolation method based on radial base neural net
CN109754122A (en) A kind of Numerical Predicting Method of the BP neural network based on random forest feature extraction
CN112418476A (en) Ultra-short-term power load prediction method
CN112288172A (en) Prediction method and device for line loss rate of transformer area
CN110222892A (en) The get-off stop prediction technique and device of passenger
CN112767038B (en) Poster CTR prediction method and device based on aesthetic characteristics
CN117436679A (en) Meta-universe resource matching method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant