CN115712780A - Information pushing method and device based on cloud computing and big data - Google Patents

Information pushing method and device based on cloud computing and big data

Publication number: CN115712780A
Authority: CN (China)
Prior art keywords: user, data, spot, matrix, similarity
Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202211376436.1A
Other languages: Chinese (zh)
Inventor: 黄嵩, 徐辣
Current Assignee: Shenzhen Bitpower Information Technology Co ltd (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Shenzhen Bitpower Information Technology Co ltd

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30: Computing systems specially adapted for manufacturing

Abstract

The invention discloses an information pushing method and device based on cloud computing and big data. Scenic spot information is obtained and classified, and keyword information is extracted from it to determine the type of each scenic spot. A scenic spot evaluation index is constructed according to the type of the scenic spot, and a scoring matrix between users and scenic spots and an index matrix between scenic spots and evaluation indexes are established. A first similarity between every two scenic spots is calculated from the data in the scenic spot and evaluation index matrix, taking into account the index weights and the degree of fit with user scores, and a nearest-neighbor set is obtained by sorting on the first similarity. Based on the scoring matrix between users and scenic spots, a second similarity between users is calculated from their scores of the scenic spots to obtain each user's nearest-neighbor users, and the scenic spot information is pushed. By establishing the scenic spot evaluation index, collecting the related index data and combining it with the determined index weights, a scenic spot-index system matrix is obtained, which achieves higher pushing quality and improves the accuracy and efficiency of information pushing.

Description

Information pushing method and device based on cloud computing and big data
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to an information pushing method and device based on cloud computing and big data.
Background
At present, as demand in the tourism industry grows, the tourism market keeps expanding, and the shortcomings of the traditional tourism industry are exposed as it tries to keep meeting people's needs. With social progress, the tourism industry has become increasingly information-based and is gradually developing into an "Internet + tourism" model. Information overload has come with it: as society, the economy and technology develop, more and more information is produced, until the total amount greatly exceeds what people need and makes it difficult for them to select and use information. The tourism industry faces the same problem. Platforms such as websites and apps record large amounts of log data during operation, and the user behavior data in these logs includes page views, purchases, clicks, ratings, comments and so on. Faced with ever richer user and tourism information on the network, how to acquire and mine the useful information quickly and effectively, and how to push information to users quickly and accurately, have become problems of concern.
Disclosure of Invention
In view of the above, the invention provides an information pushing method and device based on cloud computing and big data, which can improve the stickiness of website users and save users the time and effort of searching for and comparing tourist attractions, so as to solve the above technical problems.
In a first aspect, the invention provides an information push method based on cloud computing and big data, which is applied to a cloud platform, wherein the cloud platform comprises a data input layer, a recommendation algorithm layer and a data output layer, the data input layer is used for inputting user data, the recommendation algorithm layer is used for integrating the user data by the cloud platform so as to classify all information and provide recommendation information, and the data output layer is used for outputting a result of unifying and individualizing all data of the data input layer to a system background according to the recommendation algorithm layer as push content to be returned, and the method comprises the following steps:
acquiring and classifying scenic spot information, extracting keyword information from the scenic spot information to determine the type of each scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;
preprocessing the scenic spot data corresponding to the scenic spot evaluation index, establishing a scoring matrix between users and scenic spots and an index matrix between scenic spots and evaluation indexes, and determining each index weight in the scenic spot evaluation index by the analytic hierarchy process (AHP);
calculating a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, taking into account the index weights and the degree of fit with user scores, and selecting the top-ranked scenic spots in order of first similarity to obtain a nearest-neighbor set;
calculating, based on the scoring matrix between users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest-neighbor users of a user, selecting the top-ranked neighbor users, and completing the pushing of scenic spot information according to the data of the selected scenic spots and the data of the selected users.
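The index weights in the second step are determined by hierarchical analysis, commonly known as the analytic hierarchy process (AHP). The text does not say how the weights are solved, so the following is only a minimal sketch: it assumes a hypothetical 3x3 pairwise judgement matrix and uses the row geometric-mean approximation of the priority vector. The name `ahp_weights` and the matrix values are illustrative assumptions, not part of the source.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[a][b] states how much more important index a is than index b.
    """
    n = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Hypothetical judgement matrix for three evaluation indexes (assumed values).
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
```

The resulting weights sum to 1 and can be multiplied column by column into the scenic spot and evaluation index matrix.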
As a further improvement of the above solution, completing the pushing of scenic spot information according to the data of the selected scenic spots and the data of the selected users comprises:
obtaining the scores that the selected neighbor users gave to scenic spots the user has not scored, and predicting the user's scores for those unscored scenic spots by a weighted average method;
counting the high-scoring scenic spots among the predicted scores and, when their number is smaller than the length of the recommendation list, filling the missing entries with scenic spots similar to those already in the list.
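The two steps above can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the weights of the weighted average are taken to be the user similarities, the "high" threshold of 4.0 is hypothetical, and the function names `predict_score` and `build_push_list` are invented for the example.

```python
def predict_score(neighbor_ratings):
    """Similarity-weighted average of neighbor scores for one scenic spot.

    neighbor_ratings: list of (similarity, score) pairs from neighbor users.
    """
    numerator = sum(sim * score for sim, score in neighbor_ratings)
    denominator = sum(abs(sim) for sim, _ in neighbor_ratings)
    return numerator / denominator if denominator else 0.0


def build_push_list(predicted, similar_spots, list_size, high=4.0):
    """Keep high-scoring spots, then pad the list with their similar spots."""
    ranked = sorted(
        (spot for spot, score in predicted.items() if score >= high),
        key=lambda spot: predicted[spot],
        reverse=True,
    )
    result = ranked[:list_size]
    for spot in list(result):
        for candidate in similar_spots.get(spot, []):
            if len(result) >= list_size:
                break
            if candidate not in result:
                result.append(candidate)
    return result


predicted = {"A": predict_score([(0.8, 5), (0.2, 3)]), "B": 3.0, "C": 4.2}
similar_spots = {"A": ["D", "C"], "C": ["E"]}
push_list = build_push_list(predicted, similar_spots, list_size=4)
```

Only two spots clear the threshold here, so the remaining two slots are filled from the similar-spot lists of the spots already selected.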
As a further improvement of the above technical solution, calculating the second similarity between users from their scores of the scenic spots, based on the scoring matrix between users and scenic spots, to obtain the nearest-neighbor users of a user comprises:
acquiring the specific rating data of users on scenic spots and constructing it into a user-scenic spot rating matrix, in which the rows represent users, the columns represent scenic spots, and each entry is the score given by user n to scenic spot m;
after the matrix is obtained, applying different similarity formulas to the users' scoring data to calculate the similarity between users or between scenic spots, sorting the results to obtain the K nearest neighbors of each user or scenic spot, and generating the push result from the data of the neighbor users;
when deciding whether to push a scenic spot to a target user, checking the scores given to that scenic spot by the target user's neighbor users: if those scores are high, the target user's score for the scenic spot is predicted to be high and the scenic spot is pushed to the user; otherwise, it is not pushed.
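The neighbor search described above can be sketched as follows. The text allows different similarity formulas; cosine similarity is used here purely as one assumed choice, and the names `cosine_similarity` and `nearest_neighbors` as well as the sample ratings are hypothetical.

```python
import math


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users' {spot: score} dictionaries."""
    common = set(ratings_a) & set(ratings_b)
    norm_a = math.sqrt(sum(v * v for v in ratings_a.values()))
    norm_b = math.sqrt(sum(v * v for v in ratings_b.values()))
    if not common or norm_a == 0 or norm_b == 0:
        return 0.0
    dot = sum(ratings_a[s] * ratings_b[s] for s in common)
    return dot / (norm_a * norm_b)


def nearest_neighbors(ratings, target, k):
    """Return up to k (user, similarity) pairs, most similar first."""
    scored = [
        (cosine_similarity(ratings[target], ratings[user]), user)
        for user in ratings
        if user != target
    ]
    scored.sort(reverse=True)
    return [(user, sim) for sim, user in scored[:k] if sim > 0]


ratings = {
    "u1": {"A": 5, "B": 3},
    "u2": {"A": 5, "B": 3, "C": 4},
    "u3": {"C": 2},
}
neighbors = nearest_neighbors(ratings, "u1", k=2)
```

In this sample, `u3` shares no rated scenic spot with `u1`, so only `u2` is returned as a neighbor.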
As a further improvement of the above technical solution, the similarity calculation process includes a modified similarity and a modified prediction formula. The modified similarity is given by [formula], which depends on the time at which user u operated on content i: the longer the interval between a user's operations on content i and content j, the smaller the value of the attenuation function. The attenuation function used is [formula], whose parameters are a time decay parameter and a hyper-parameter. The modified prediction formula is [formula], which contains the time decay parameter and a parameter controlling the degree of temporal attenuation: the smaller the difference between the two time values in the formula, the higher the content similar to content j is ranked in the push list of target user u.
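The exact expressions are not recoverable from the source, so the following sketch only illustrates the stated behavior, namely that a longer interval between a user's operations on two contents contributes less to their similarity. It assumes a decay of the form 1 / (1 + alpha * interval) and a co-occurrence similarity normalized by the number of users of each content. Both forms, the parameter name `alpha` and the function names are assumptions.

```python
import math


def time_decay(interval, alpha):
    """Assumed attenuation: shrinks as the interval between operations grows."""
    return 1.0 / (1.0 + alpha * abs(interval))


def time_aware_similarity(times, item_i, item_j, alpha):
    """Similarity of two contents from time-stamped user operations.

    times[user][item] is the time at which the user operated on the item.
    """
    users_i = {u for u, ops in times.items() if item_i in ops}
    users_j = {u for u, ops in times.items() if item_j in ops}
    both = users_i & users_j
    if not both:
        return 0.0
    weighted = sum(
        time_decay(times[u][item_i] - times[u][item_j], alpha) for u in both
    )
    return weighted / math.sqrt(len(users_i) * len(users_j))


times = {
    "u1": {"A": 0, "B": 0},   # operated on both contents at the same time
    "u2": {"A": 0, "B": 10},  # ten time units apart
}
sim_decayed = time_aware_similarity(times, "A", "B", alpha=0.1)
sim_plain = time_aware_similarity(times, "A", "B", alpha=0.0)
```

With `alpha` set to 0 the decay is switched off and the function reduces to an ordinary co-occurrence similarity, which makes the effect of the time factor easy to compare.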
As a further improvement of the above technical solution, a K-means clustering algorithm is used to cluster the users into K clusters and obtain cluster information; when the nearest neighbors of a user are queried, only the users in that user's cluster are searched, the similarity between the user and each user in the cluster is recalculated, and the first N users are found to complete the push. The process comprises:
initializing the scoring matrix, the target user ui, the matrix completion parameter, the neighbor parameter M and the time decay parameter, solving the users' scoring matrix with the SVT algorithm, and completing the matrix;
clustering the completed matrix with the K-means algorithm under big data to divide the users into clusters of highly correlated users and, within the cluster containing the target user, taking the Top-K users with the highest similarity to the target user as the target user's neighbor set;
performing personalized pushing based on scenic spots with the similarity that incorporates the time factor, and selecting the first N contents as the push result to complete the push.
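The clustering step can be sketched as follows. The example assumes the scoring matrix has already been completed (the SVT step is not shown), starts the centroids at the first k rows so that the result is deterministic, and uses hypothetical names `kmeans` and `cluster_candidates`. It is a plain K-means illustration, not the patented implementation.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two rating rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Cluster rating rows and return one cluster label per row."""
    centroids = [list(row) for row in rows[:k]]  # assumed initialization
    labels = [0] * len(rows)
    for _ in range(iterations):
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_candidates(labels, target):
    """Indices of the other users in the target user's cluster."""
    return [
        i for i, lab in enumerate(labels) if lab == labels[target] and i != target
    ]


# Hypothetical completed scoring matrix: rows are users, columns are scenic spots.
completed = [[5, 5, 1], [1, 1, 5], [5, 4, 1], [1, 2, 5]]
labels = kmeans(completed, k=2)
candidates = cluster_candidates(labels, target=0)
```

Only the users returned by `cluster_candidates` need their similarity to the target user recalculated, which is where the reduction in search cost comes from.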
As a further improvement of the above technical solution, the preprocessing of the scenic spot data corresponding to the scenic spot evaluation index comprises:
converting the collected user information into a two-dimensional matrix in numerical form to obtain user data, performing noise reduction and normalization on the user data, and constructing the preprocessed user data into the scoring matrix between users and scenic spots;
applying a similarity formula to the user data to obtain the top-ranked similar users, obtaining the data scored by the similar users from the similar-user list corresponding to the first similarity, and obtaining the predicted score of the target user by weighted averaging;
pushing the top-ranked results, sorted by predicted score, to the user as the generated push result.
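The first preprocessing step can be sketched as follows. The source does not specify the noise-reduction or normalization method, so this example assumes that scores outside a 1 to 5 range are treated as noise and that min-max scaling is used. The function name `build_rating_matrix` and the sample records are hypothetical.

```python
def build_rating_matrix(records, min_score=1.0, max_score=5.0):
    """Turn (user, spot, score) records into a normalized 2-D scoring matrix.

    Entries stay None where a user has not scored a scenic spot.
    """
    users = sorted({user for user, _, _ in records})
    spots = sorted({spot for _, spot, _ in records})
    matrix = [[None] * len(spots) for _ in users]
    for user, spot, score in records:
        if min_score <= score <= max_score:  # assumed noise filter
            normalized = (score - min_score) / (max_score - min_score)
            matrix[users.index(user)][spots.index(spot)] = normalized
    return users, spots, matrix


records = [
    ("u1", "A", 5),
    ("u1", "B", 3),
    ("u2", "A", 1),
    ("u2", "B", 9),  # out of range, dropped as noise
]
users, spots, matrix = build_rating_matrix(records)
```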
As a further improvement of the above technical solution, pushing the top-ranked results, sorted by predicted score, to the user as the generated push result comprises:
dividing the data set corresponding to the user data into a training set and a test set, and training a model of the user's behavior and interests on the training set to obtain a training result;
applying the test set data to the trained model and comparing the model's predictions with the test set data to calculate the prediction accuracy of the model, wherein the score prediction process comprises:
predicting the user's scores for scenic spots the user has not rated by analyzing the data the user has already scored, the accuracy of the predicted scores being measured by the root mean square error (RMSE), $\mathrm{RMSE} = \sqrt{\sum_{(u,i) \in T} (r_{ui} - \hat{r}_{ui})^2 / |T|}$, where T is the data set used to test the model, $|T|$ is the number of elements in T, u denotes a user, i denotes a scenic spot, $r_{ui}$ is the true score of user u for scenic spot i, and $\hat{r}_{ui}$ is the score of user u for scenic spot i predicted by the model.
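The root mean square error can be computed directly from pairs of true and predicted scores. The sketch below uses the standard definition of RMSE; the function name `rmse` and the sample pairs are illustrative assumptions.

```python
import math


def rmse(pairs):
    """Root mean square error over (true_score, predicted_score) pairs."""
    squared_errors = [(true - predicted) ** 2 for true, predicted in pairs]
    return math.sqrt(sum(squared_errors) / len(pairs))


# Hypothetical test-set pairs for one model.
test_pairs = [(5, 4), (3, 3), (4, 2)]
error = rmse(test_pairs)
```

A lower value indicates that the predicted scores are closer to the scores actually given by users.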
As a further improvement of the above technical solution, calculating the first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, taking into account the index weights and the degree of fit with user scores, comprises:
using a content-based collaborative filtering algorithm to push scenic spots similar to those the user liked before, wherein the first similarity is given by [formula] and the temporal attenuation of the scenic spots in which the user is interested is given by [formula]. After the first similarity of the scenic spots is obtained, the result is expressed as [formula], whose terms are the set of scenic spots that user u likes, the set of the K scenic spots most similar to scenic spot j, the first similarity of scenic spot i and scenic spot j, and the interest of user u in scenic spot i.
As a further improvement of the above technical solution, the first similarity of the scenic spots is calculated using the Euclidean distance, and the process comprises:
after the weight of each item of data in the evaluation index is determined, constructing from the collected data a matrix of scenic spots and evaluation indexes, [formula], of dimension [formula], in which each row represents the data of one scenic spot and each column represents one constructed evaluation index; multiplying the data of each column by the corresponding index weight to obtain the scenic spot and weighted index matrix [formula]; performing similarity calculation on the scenic spot data in the weighted matrix by calculating the Euclidean distance between every pair of scenic spots; and selecting the Top-K scenic spots by Euclidean distance as the similar scenic spots of each scenic spot.
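The weighted Euclidean distance step can be sketched as follows. The index values and weights are hypothetical, the function name `top_k_similar` is invented for the example, and "Top-K" is interpreted as the K scenic spots with the smallest distance.

```python
import math


def top_k_similar(spot_indexes, weights, target, k):
    """Return the k scenic spots closest to the target in weighted index space.

    spot_indexes: {spot: [index values]}, one value per evaluation index.
    weights: one weight per evaluation index (for example from AHP).
    """
    weighted = {
        spot: [value * weight for value, weight in zip(values, weights)]
        for spot, values in spot_indexes.items()
    }
    distances = [
        (math.dist(weighted[target], vector), spot)
        for spot, vector in weighted.items()
        if spot != target
    ]
    distances.sort()
    return [spot for _, spot in distances[:k]]


spot_indexes = {"A": [1, 1], "B": [1, 2], "C": [5, 5]}
weights = [0.5, 0.5]
closest = top_k_similar(spot_indexes, weights, target="A", k=2)
```

Multiplying each column by its weight before measuring distance means that heavily weighted indexes dominate which scenic spots count as similar.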
In a second aspect, the present invention further provides an information pushing apparatus based on cloud computing and big data, including:
an acquisition unit, configured to acquire and classify scenic spot information, extract keyword information from the scenic spot information to determine the type of each scenic spot, and construct a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;
a preprocessing unit, configured to preprocess the scenic spot data corresponding to the scenic spot evaluation index, establish a scoring matrix between users and scenic spots and an index matrix between scenic spots and evaluation indexes, and determine each index weight in the scenic spot evaluation index by the analytic hierarchy process (AHP);
a computing unit, configured to calculate a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, taking into account the index weights and the degree of fit with user scores, and to select the top-ranked scenic spots in order of first similarity to obtain a nearest-neighbor set;
an information push unit, configured to calculate, based on the scoring matrix between users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest-neighbor users of a user, select the top-ranked neighbor users, and complete the pushing of scenic spot information according to the data of the selected scenic spots and the data of the selected users.
The invention provides an information pushing method and device based on cloud computing and big data. Scenic spot information is obtained and classified, keyword information is extracted from it to determine the type of each scenic spot, and a scenic spot evaluation index is constructed according to that type. The scenic spot data corresponding to the evaluation index is preprocessed, a scoring matrix between users and scenic spots and an index matrix between scenic spots and evaluation indexes are established, and each index weight in the evaluation index is determined by the analytic hierarchy process (AHP). A first similarity between every two scenic spots is calculated from the data in the scenic spot and evaluation index matrix, taking into account the index weights and the degree of fit with user scores, and the top-ranked scenic spots are selected in order of first similarity to obtain a nearest-neighbor set. Based on the scoring matrix between users and scenic spots, a second similarity between users is calculated from their scores of the scenic spots to obtain a user's nearest-neighbor users, the top-ranked neighbor users are selected, and the scenic spot information is pushed according to the data of the selected scenic spots and the selected users. By establishing the scenic spot evaluation index, collecting and processing the related index data, and combining it with the determined index weights, a scenic spot-index system matrix is obtained. This achieves higher pushing quality, provides users with better information pushing, and improves both the accuracy of information pushing and the working efficiency of the system.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of an information pushing method based on cloud computing and big data according to the present invention;
Fig. 2 is a diagram of the scenic spot data preprocessing process of the present invention;
Fig. 3 is a block diagram of an information pushing apparatus based on cloud computing and big data according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention.
Referring to fig. 1, the invention provides an information push method based on cloud computing and big data, which is applied to a cloud platform, wherein the cloud platform comprises a data input layer, a recommendation algorithm layer and a data output layer, the data input layer is used for inputting user data, the recommendation algorithm layer is used for integrating the user data by the cloud platform so as to classify all information and provide recommendation information, and the data output layer is used for outputting a result of unifying and individualizing all data of the data input layer to a system background according to the recommendation algorithm layer as push content to be returned, and the method comprises the following steps:
S1: acquiring and classifying scenic spot information, extracting keyword information from the scenic spot information to determine the type of each scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;
S2: preprocessing the scenic spot data corresponding to the scenic spot evaluation index, establishing a scoring matrix between users and scenic spots and an index matrix between scenic spots and evaluation indexes, and determining each index weight in the scenic spot evaluation index by the analytic hierarchy process (AHP);
S3: calculating a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, taking into account the index weights and the degree of fit with user scores, and selecting the top-ranked scenic spots in order of first similarity to obtain a nearest-neighbor set;
S4: calculating, based on the scoring matrix between users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest-neighbor users of a user, selecting the top-ranked neighbor users, and completing the pushing of scenic spot information according to the data of the selected scenic spots and the data of the selected users.
In this embodiment, completing the pushing of scenic spot information according to the data of the selected scenic spots and the data of the selected users comprises: obtaining the scores that the selected neighbor users gave to scenic spots the user has not scored, and predicting the user's scores for those unscored scenic spots by a weighted average method; and counting the high-scoring scenic spots among the predicted scores and, when their number is smaller than the length of the recommendation list, filling the missing entries with scenic spots similar to those already in the list. User modeling is an important part of a tourism push system and plays a decisive role in the push result. The system serves users, and pushing is based on the user's preference information: the more comprehensive that information, the more personalized the push result. Most systems therefore concentrate on extracting user information as completely as possible, but analyzing and calculating the information of the tourist attractions that make up the push system is equally important.
Specifically, the user's preference for content is calculated from the preference, age, gender and address features: a preference value is obtained for each feature of the current user's registration information, the final preference value is obtained by weighted summation, and the top n scenic spots of greatest interest are selected for display on the page. When a newly pushed scenic spot is added to the database of the cloud platform, it has no user behavior information, so the collaborative filtering algorithm cannot find similar items and the new content cannot be recommended to users. In that case a content-based recommendation algorithm can be used to extract the features of the content, so that similar content can be found and recommended for content newly added to the system. The content category is the primary feature, and the content title, purpose and target audience are secondary features. Weights are determined so that the more important a feature, the larger its weight; the more functions the content has and the more detailed its label information, the more accurately similar content is pushed.
It should be noted that, because users' scoring behavior is random, the user-scenic spot scoring matrix is extremely sparse. When the Top-K similar users or scenic spots are calculated, little usable data is available and a large amount of useless data enters the calculation, so the accuracy obtained is too low, the pushing quality of the system suffers, and the pushing effect falls short of expectations. The scoring matrix is a very sparse, low-rank matrix. In personalized delivery it can be analyzed from the perspective of the user and from the perspective of the pushed item: if two users both favor a pushed scenic spot i, their preferences for other pushed scenic spots are also likely to be similar; likewise, if one user favors both scenic spot i and scenic spot j, other users' preferences for scenic spots i and j are likely to be similar. This assumption is reflected in the matrix M as low rank, and the matrix is completed according to the low rank of the scoring matrix to alleviate the data sparsity problem. When the system pushes to users, the similarity between every pair of users must be calculated to find each user's nearest neighbors. When the numbers of users and contents are small, pushing is fast, but as the numbers of users and scenic spots grow, the calculation becomes time-consuming and occupies system resources. Cluster analysis is therefore performed on the completed scoring matrix to divide the users into several clusters, so that a nearest-neighbor search only needs to cover the user's own cluster rather than all users, which shortens the search time and reduces the complexity of the algorithm. A K-means clustering algorithm is used to cluster the users into K clusters; for a specific user, the cluster information is obtained, the users in the cluster are searched when the user's nearest neighbors are queried, the similarity between the user and each user in the cluster is recalculated, and the first N users are found to complete the push. The data of the push system contains not only user rating data but also much implicit data, such as browsing information, evaluation time and evaluation location, which plays an important role in interest mining.
It should be understood that the sparse user scoring matrix is completed with the SVT algorithm, the users are clustered with the K-means algorithm to narrow the neighbor search range, TOP-K pushing is completed with the similarity calculation that incorporates the time factor, and the first K scenic spots are displayed. After a new user registers, there is no behavior information yet, so pushing can only start from the registration information, which includes the user's preferences, sex, age, address, and other information. For a personalized push system, the user's preference information is the most important, followed by sex, then age, and finally address. When the user's preference is calculated, these features receive different weights: the more important the feature, the higher its weight. The pushing process based on registration information may be as follows: the user registration information is obtained; the user is classified according to it, where the classification may be multi-dimensional, with one category per feature; the scenic spots most liked by users in the categories to which the user belongs are pushed to the user; and the preferences over the user's features in each category are weighted and summed. The core problem is to calculate, for each feature f, the degree to which users with that feature like each scenic spot i:
[image 077]
where [image 079] denotes the set of users interested in the pushed scenic spot i, and [image 081] denotes the set of users whose features contain f.
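The registration-based push for a new user can be sketched as follows. The preference formula itself is only available as an image in the source, so this sketch assumes the common form p(f, i) = |N(i) ∩ U(f)| / |U(f)|, where N(i) is the set of users interested in scenic spot i and U(f) is the set of users whose features contain f. The function names, the feature weights, and the sample data are hypothetical.

```python
def feature_preference(likes, features, f, i):
    """Share of users with feature f who are interested in scenic spot i.

    Assumed form: p(f, i) = |N(i) & U(f)| / |U(f)|.
    """
    users_with_f = {u for u, fs in features.items() if f in fs}        # U(f)
    users_liking_i = {u for u, spots in likes.items() if i in spots}   # N(i)
    if not users_with_f:
        return 0.0
    return len(users_with_f & users_liking_i) / len(users_with_f)


def cold_start_scores(new_user, weights, likes, features, spots):
    """Weighted sum of p(f, i) over the new user's registration features."""
    return {
        i: sum(weights[kind] * feature_preference(likes, features, f, i)
               for kind, f in new_user.items())
        for i in spots
    }


# Hypothetical existing users: their registration features and liked spots.
features = {"a": {"hiking", "female"}, "b": {"hiking", "male"}, "c": {"museum", "female"}}
likes = {"a": {"peak"}, "b": {"peak", "lake"}, "c": {"gallery"}}

# Hypothetical weights: preference counts more than sex, as the text describes.
weights = {"preference": 0.6, "sex": 0.4}
new_user = {"preference": "hiking", "sex": "female"}

scores = cold_start_scores(new_user, weights, likes, features, ["peak", "lake", "gallery"])
best = max(scores, key=scores.get)
```

The weighted sum lets a more important feature (here the stated preference) dominate the ranking while the less important features still break ties.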
Optionally, calculating, based on the scoring matrix [image 071] of users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest neighbor users of the user includes:
acquiring the specific rating data of users on scenic spots and constructing it into a user-scenic-spot scoring matrix, wherein the rows of the matrix represent users [image 010], the columns of the matrix represent scenic spots [image 012], each entry in the matrix represents the score given by user n to scenic spot m, and the matrix is expressed as [image 013]; after the matrix is obtained, selecting a similarity calculation formula according to the specific user rating data to calculate the similarity between users or between scenic spots, sorting the calculation results to obtain the K neighbors of the user or the scenic spot, and generating the push result by selecting from the data of the neighbor users;
when scenic spot [image 017] is to be pushed to target user [image 015], first determining the neighbor users of user [image 015]; if the neighbor users' scores for the scenic spot are higher than a preset threshold, predicting that the target user's score for scenic spot [image 017] will also be high and pushing scenic spot [image 017] to the user; otherwise, not pushing it.
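The neighbor-based push decision above can be sketched as follows. This is a hedged illustration: the function name `should_push` is hypothetical, the prediction is assumed to be a similarity-weighted average of the neighbors' scores, and the threshold of 3.0 is an assumption borrowed from the 3-point cutoff mentioned later in the description.

```python
def should_push(neighbor_scores, similarities, threshold=3.0):
    """Predict the target user's score from the K neighbors and decide whether to push.

    neighbor_scores: each neighbor's score for the scenic spot, or None if unrated.
    similarities:    each neighbor's similarity to the target user.
    Returns (predicted_score, push_flag).
    """
    rated = [(s, r) for s, r in zip(similarities, neighbor_scores) if r is not None]
    total = sum(s for s, _ in rated)
    if not rated or total == 0:
        return None, False  # no usable neighbor evidence, so do not push
    predicted = sum(s * r for s, r in rated) / total
    return predicted, predicted >= threshold


# Three hypothetical neighbors; the third has not rated this scenic spot.
predicted, push = should_push([5, 4, None], [0.9, 0.6, 0.8])
```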
In this embodiment, the similarity calculation process includes a modified similarity formula and a modified prediction formula. The modified similarity expression is
[image 018]
where [image 020] denotes the time at which user u performed an operation on content i; the longer the interval between the user's operations on content i and content j, the smaller [image 022]. The attenuation function used is
[image 024]
where [image 027] denotes a time decay parameter and [image 029] denotes a hyperparameter. The modified prediction formula is
[image 031]
where [image 027] denotes the time decay parameter and [image 034] denotes a hyperparameter controlling the degree of time decay; the smaller the difference between [image 036] and [image 038], the higher content similar to content j is ranked in the push list of target user u.
The users are clustered with the K-means clustering algorithm to form K clusters and obtain the cluster information. When the nearest neighbors of a user are queried, the users in the user's cluster are searched and the similarity between the user and the users in the cluster is recalculated to find the first N users and complete the push. The process includes: initializing the scoring matrix [image 040], the target user ui, the matrix parameter [image 042], the neighbor parameter M, and the time decay parameter [image 027]; solving the user scoring matrix with the SVT algorithm to complete the matrix; clustering the completed matrix with the K-means algorithm under big data to divide the users into highly correlated clusters, and, within the cluster containing the target user, finding the Top-K users most similar to the target user as the target user's neighbor set; and performing personalized pushing based on scenic spots with the similarity that incorporates the time factor, selecting the first N contents as the push result to complete the [image 008] push.
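The time-aware similarity and prediction steps can be sketched as follows. The patent's formulas survive only as images, so the forms used here are assumptions taken from a widely used time-aware item-based collaborative filtering scheme: the decay is f(dt) = 1 / (1 + rate * |dt|), the similarity sums the decay over users who operated on both items and divides by sqrt(|N(i)| * |N(j)|), and the prediction weights each similarity by how recently the user touched the source item. All names (`decay`, `time_aware_similarity`, `push_scores`, `alpha`, `beta`) are hypothetical.

```python
import math
from collections import defaultdict


def decay(dt, rate):
    """Assumed attenuation function: larger time gaps give smaller values."""
    return 1.0 / (1.0 + rate * abs(dt))


def time_aware_similarity(events, alpha=1.0):
    """events: {user: {item: timestamp}}; returns {(i, j): similarity}."""
    counts = defaultdict(int)   # |N(i)|: number of users who operated on item i
    co = defaultdict(float)     # decayed co-occurrence of items i and j
    for items in events.values():
        for i, ti in items.items():
            counts[i] += 1
            for j, tj in items.items():
                if i != j:
                    co[(i, j)] += decay(ti - tj, alpha)
    return {(i, j): v / math.sqrt(counts[i] * counts[j]) for (i, j), v in co.items()}


def push_scores(events, sim, user, now, k=10, beta=1.0):
    """Score unseen items by similarity to the user's items, favoring recent ones."""
    seen = events[user]
    scores = defaultdict(float)
    for j, tj in seen.items():
        # The k items most similar to item j.
        neighbors = sorted(((s, i) for (a, i), s in sim.items() if a == j), reverse=True)[:k]
        for s, i in neighbors:
            if i not in seen:
                scores[i] += s * decay(now - tj, beta)
    return dict(scores)


# Hypothetical operation log: user -> {scenic spot: time of operation}.
events = {"u1": {"A": 0, "B": 0}, "u2": {"A": 0, "B": 9}, "u3": {"A": 5, "C": 5}}
sim = time_aware_similarity(events, alpha=1.0)
scores = push_scores(events, sim, "u3", now=5)
```

In this sample, user u2 visited A and B nine time units apart, so that pair contributes only 0.1 to the co-occurrence of A and B, while u1, who visited both at the same time, contributes 1.0.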
It should be noted that, in an actual system, a user cannot score all scenic spots one by one, and conversely a scenic spot is not scored by all users, so the matrix is usually sparse in practice. After the matrix is obtained, the similarity between users or between scenic spots can be calculated with different similarity calculation formulas according to the specific user rating data; the results are sorted to obtain the K neighbors of the user or the scenic spot, and the recommendation result is generated by selecting from the data of the neighbor users. When scenic spot [image 017] is to be pushed to target user [image 015], the neighbor users of user [image 015] are determined first; if the neighbor users' scores for the scenic spot are generally high, the target user's score for scenic spot [image 017] is predicted to be high as well and scenic spot [image 017] is pushed to the user; otherwise, it is not pushed. The push result is mainly based on user rating data, so the algorithm cannot provide personalized pushing for a new user of the system, who has not yet generated enough data, and cannot push a scenic spot newly added to the system, which has no rating data. The similarity can be calculated by collecting the user's explicit or implicit behaviors, such as scoring, forwarding, saving, marking, commenting, collecting, clicking, page dwell time, and purchasing, and processing these behaviors; the closer the calculated results, the more similar the users, and similar users are considered to share interests. The algorithm process may be as follows: all collected user information is converted into a two-dimensional matrix for numerical representation; noise reduction and normalization are applied to the data; the preprocessed data is constructed into a user-scenic-spot scoring matrix; a similarity calculation formula is applied to the data to obtain the TOP-N list of similar users; the scored data of the similar users is obtained; the predicted score of the target user is obtained by weighted average; and the TOP-N scenic spots sorted by predicted score are pushed to the user as the generated push result, thereby improving the accuracy of information pushing.
Referring to fig. 2, optionally, the preprocessing of the scenic spot data corresponding to the scenic spot evaluation index includes:
S10: converting the collected user information into a two-dimensional matrix for numerical representation to obtain user data, performing noise reduction and normalization on the user data, and constructing the preprocessed user data into a scoring matrix of users and scenic spots;
S11: applying a similarity calculation formula to the user data to obtain the [image 006] similar user list corresponding to the first similarity, obtaining the scored data of the similar users, and obtaining the predicted score of the target user by weighted average;
S12: pushing the [image 006] results sorted by predicted score to the user as the generated push result.
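Steps S11 and S12 can be sketched as follows, assuming the scoring matrix from S10 is already built and uses 0 for "not yet scored". Cosine similarity is used as the similarity formula and the function names are hypothetical; the noise reduction and normalization of S10 are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity of two score rows; 0.0 if either row is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recommend(matrix, target, n_users=2, n_spots=2):
    """Return the top n_spots (spot_index, predicted_score) pairs for the target user."""
    # S11: rank the other users by similarity and keep the top n_users.
    sims = sorted(
        ((cosine(matrix[target], row), u) for u, row in enumerate(matrix) if u != target),
        reverse=True,
    )[:n_users]
    predictions = {}
    for spot, own_score in enumerate(matrix[target]):
        if own_score:  # the target user already scored this spot
            continue
        rated = [(s, matrix[u][spot]) for s, u in sims if matrix[u][spot] and s > 0]
        if rated:
            # Weighted average of the similar users' scores.
            predictions[spot] = sum(s * r for s, r in rated) / sum(s for s, _ in rated)
    # S12: sort by predicted score and keep the top n_spots.
    return sorted(predictions.items(), key=lambda kv: kv[1], reverse=True)[:n_spots]


# Hypothetical scoring matrix: rows are users, columns are scenic spots, 0 = unscored.
matrix = [
    [5, 4, 0, 0],
    [5, 5, 4, 1],
    [1, 1, 2, 5],
    [4, 5, 5, 0],
]
result = recommend(matrix, target=0)
```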
In this embodiment, pushing the [image 006] results sorted by predicted score to the user as the generated push result includes: dividing the data set corresponding to the user data into a training set and a test set, and training a model of user behavior and interest on the training set to obtain a training result; and applying the test set data to the trained model for testing, and comparing the training set data with the test set result to calculate the prediction accuracy of the model. The score prediction process includes predicting the user's scores for unscored scenic spots by analyzing the user's existing score data. The predicted scores are evaluated with the root mean square error (RMSE), which is expressed as

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}{|T|}}$$

where T denotes the data set used to test the model, $|T|$ is the number of elements in T, u denotes a user, i denotes a scenic spot, $r_{ui}$ denotes the true score of u for i obtained from the training set, and $\hat{r}_{ui}$ denotes the predicted score of u for i obtained from the prediction set.

The data weights are determined by the expert scoring method together with the analytic hierarchy process (AHP). The determination process may be as follows: first, 10 experts are selected to score the 9 evaluation indexes, and a judgment matrix is constructed by calculating the average value of each analysis item and dividing the averages by one another; the larger the average, the higher the importance and the higher the weight. After the judgment matrix is obtained, the CR value is calculated as

$$CR=\frac{CI}{RI}$$

The consistency check based on the CR value proceeds as follows: first, the CI value is calculated as

$$CI=\frac{\lambda_{\max}-n}{n-1}$$

where $\lambda_{\max}$ is the largest eigenvalue of the judgment matrix and n is its order; the RI value is then obtained according to the order of the judgment matrix, the CR value is obtained from the CI value and the RI value, and whether the obtained weights are consistent is judged from the result. The CR value is the criterion for judging consistency: the smaller the CR value, the higher the consistency of the judgment matrix, and the threshold is 0.1. According to the constructed scenic spot evaluation index, the scenic spots have 13 index values in total, of which 9 indexes are available for representing weight, so the judgment matrix is a matrix of order 9. The CI value is 0.000 and the RI value from the lookup table is 1.460, so the calculated CR value is

$$CR=\frac{0.000}{1.460}=0.000<0.1$$

It follows that the evaluation index judgment matrix passes the consistency test, so the obtained weights are consistent.
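The weight determination and consistency check can be sketched as follows. This is an illustration under assumptions: the function name `ahp_weights` and the sample expert averages are hypothetical, the weights are approximated with the column-normalization (sum) method rather than an exact eigenvector solver, and the RI value is passed in by the caller (1.46 is the value the description gives for a matrix of order 9).

```python
def ahp_weights(means, ri):
    """Build the judgment matrix from expert averages and check its consistency.

    means: average expert score of each evaluation index.
    ri:    random consistency index for a matrix of this order.
    Returns (weights, ci, cr).
    """
    n = len(means)
    # Judgment matrix: entry (i, j) is the ratio of the averages of index i and index j.
    a = [[mi / mj for mj in means] for mi in means]
    # Column-normalize, then average each row to approximate the principal eigenvector.
    col_sums = [sum(a[i][j] for i in range(n)) for j in range(n)]
    w = [sum(a[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]
    # Estimate the largest eigenvalue from A w = lambda w.
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lam = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lam - n) / (n - 1)
    cr = ci / ri
    return w, ci, cr


# Hypothetical averages of 10 expert scores for the 9 evaluation indexes.
means = [4.2, 3.9, 3.5, 3.1, 2.8, 2.6, 2.4, 2.0, 1.5]
weights, ci, cr = ahp_weights(means, ri=1.46)
consistent = cr < 0.1
```

A judgment matrix built purely from ratios of averages is perfectly consistent by construction, so its largest eigenvalue equals its order and CI is zero up to rounding, which agrees with the CI value of 0.000 reported in the description.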
Optionally, calculating, according to the index weights and the degree of fit with the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix [image 004] includes:
using a content-based collaborative filtering algorithm to push scenic spots similar to those the user liked before, wherein the expression of the first similarity is [image 051], the expression of the time decay of the scenic spots in which the user is interested is [image 053], and, after the first similarity of the scenic spots is obtained, the expression is [image 055], where [image 058] denotes the set of scenic spots that user u likes, [image 060] denotes the set of K scenic spots most similar to scenic spot j, [image 062] denotes the first similarity between scenic spot i and scenic spot j, and [image 048] denotes the interest of user u in scenic spot i.
In this embodiment, the first similarity of the scenic spots is calculated with the Euclidean distance, and the process includes: after the weight of each item of data in the evaluation index is determined, constructing from the collected data a [image 064]-dimensional matrix of scenic spots and evaluation indexes, [image 065], where each row represents the data of one scenic spot and the columns represent the constructed evaluation index data; multiplying the data of each column by the corresponding index weight to obtain the scenic spot and weighted index matrix [image 067], [image 069]; and performing similarity calculation on the scenic spot data in matrix [image 067] by calculating the Euclidean distance between every two scenic spots and selecting the TOP-K by Euclidean distance as the similar scenic spots of each scenic spot.
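The weighting and pairwise distance steps can be sketched as follows. The function names, index values, and weights are hypothetical, and the sketch assumes that "TOP-K by Euclidean distance" means the K scenic spots with the smallest distance to the target.

```python
import math


def weighted_matrix(index_rows, weights):
    """Multiply every evaluation index column by its weight."""
    return [[v * w for v, w in zip(row, weights)] for row in index_rows]


def top_k_similar(weighted, target, k):
    """Indices of the k scenic spots with the smallest Euclidean distance to the target."""
    dists = [
        (math.dist(weighted[target], row), j)
        for j, row in enumerate(weighted)
        if j != target
    ]
    return [j for _, j in sorted(dists)[:k]]


# Hypothetical normalized index data: rows are scenic spots, columns are indexes.
index_rows = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
weights = [0.7, 0.3]  # hypothetical index weights from the AHP step

weighted = weighted_matrix(index_rows, weights)
similar_to_first = top_k_similar(weighted, target=0, k=1)
similar_to_last = top_k_similar(weighted, target=2, k=2)
```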
It should be noted that, taking scenic spot X and scenic spot Y as an example, the index data set [image 090] of scenic spot X and the index data set [image 092] of scenic spot Y are combined to obtain the similarity value [image 094]. The number of scenic spots does not change much and is far smaller than the number of users, the resulting scenic spot and evaluation index matrix is dense, and the variables basically share common values, so the Euclidean distance is chosen for the scenic spot similarity. A user-scenic-spot scoring matrix [image 095] is constructed from the users' scores of the scenic spots; here each user has an m-dimensional vector, where [image 097] denotes the score given by the nth user to the mth scenic spot. When user [image 099] has already scored scenic spots in the system, the feature vector [image 101] of user [image 099] is determined from the user-scenic-spot scoring matrix, and similarity is calculated against the feature vectors of the other users to obtain the TOP-N similar users. When the user has not scored any scenic spot in the system, user features are extracted from the user information to calculate the user similarity, the weighted average of the similar users' scores of the scenic spots is calculated as the new user's score and added to the [image 103] matrix, the nearest neighbor users of the user are obtained, the predicted scores of the user for the unscored scenic spots are obtained as the weighted average calculated from the neighbor users, and the TOP-N scenic spot list [image 105] is obtained.
It should be understood that, if the number of scenic spots in the finally generated push list L is s, the number n of scenic spots in list [image 106] whose predicted user score is greater than or equal to 3 points is determined first. When [image 108], the first s scenic spots in list [image 106] are taken and filled into list L to generate the push list. When [image 110], the n scenic spots in list [image 106] are filled into list L, and the remaining [image 112] places are filled from list [image 114], which screens the similar scenic spots of the n scenic spots in list L by combining the similarity ranking and the user score ranking; the top [image 112] scenic spots and the existing n scenic spots together form the final push list L, which is pushed to the user.
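The list-filling rule above can be sketched as follows. The two conditions are only available as images in the source, so this sketch assumes they are n >= s and n < s; it also ranks the fill-in candidates by similarity alone for brevity, whereas the description combines similarity ranking with user score ranking. The function name and sample data are hypothetical.

```python
def build_push_list(predicted, similar, s, min_score=3.0):
    """Assemble a push list of up to s scenic spots.

    predicted: [(spot, predicted_score), ...] sorted by score, highest first.
    similar:   {spot: [(similarity, other_spot), ...]} from the first similarity.
    """
    # The n scenic spots whose predicted score reaches the cutoff.
    strong = [spot for spot, score in predicted if score >= min_score]
    if len(strong) >= s:
        return strong[:s]
    chosen = list(strong)
    # Candidate fill-ins: spots similar to the chosen ones, keeping the best similarity.
    candidates = {}
    for spot in strong:
        for sim, other in similar.get(spot, []):
            if other not in chosen:
                candidates[other] = max(sim, candidates.get(other, 0.0))
    fill = sorted(candidates, key=lambda o: candidates[o], reverse=True)
    return chosen + fill[: s - len(chosen)]


# Hypothetical predicted scores and similar-spot lists.
predicted = [("A", 4.5), ("B", 3.2), ("C", 2.0)]
similar = {"A": [(0.9, "D"), (0.4, "B")], "B": [(0.7, "E"), (0.8, "D")]}

short_list = build_push_list(predicted, similar, s=2)
full_list = build_push_list(predicted, similar, s=4)
```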
Referring to fig. 3, the present invention further provides an information pushing apparatus based on cloud computing and big data, including:
an acquisition unit, configured to acquire and classify scenic spot information, extract keyword information from the scenic spot information to determine the type of the scenic spot, and construct a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index includes index data of the scenic spot and score data of users on the scenic spot;
a preprocessing unit, configured to preprocess the scenic spot data corresponding to the scenic spot evaluation index, establish a scoring matrix [image 070] of users and scenic spots and an index matrix [image 004] of scenic spots and evaluation indexes, and determine each index weight in the scenic spot evaluation index with the analytic hierarchy process;
a computing unit, configured to calculate, according to the index weights and the degree of fit with the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix [image 004], and to sort by the first similarity and select [image 006] to obtain a nearest neighbor set;
an information pushing unit, configured to calculate, based on the scoring matrix [image 071] of users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest neighbor users of the user, select [image 008] neighbor users, and complete the pushing of the scenic spot information according to the scenic spot data of [image 006] and the user data of [image 008].
In this embodiment, cosine similarity measures the similarity between two space vectors by the cosine of the angle between them, and thereby the difference between objects; it is widely used in push systems. Each vector is drawn in the coordinate space according to its coordinate values, and the similarity between the vectors is calculated with a formula. The cosine of the angle between vectors lies in the range $[-1, 1]$, and the formula extends to vectors of any dimension. Whether two vectors point in the same direction is judged from the cosine value: if the cosine is close to 1, that is, the angle between the vectors is almost zero degrees, the two vectors can be judged to point in the same direction, and their lengths are irrelevant to this judgment. For n-dimensional vectors $X=(x_1,\dots,x_n)$ and $Y=(y_1,\dots,y_n)$, the cosine of the angle between them is

$$\cos\theta=\frac{\sum_{i=1}^{n}x_i y_i}{\sqrt{\sum_{i=1}^{n}x_i^{2}}\,\sqrt{\sum_{i=1}^{n}y_i^{2}}}$$

After the user information is processed and converted into character string vectors, the similarity is calculated with the cosine similarity. The Euclidean distance is the real distance between two points in a vector space; that is, the real spatial distance between two individuals is used to judge how similar they are. When the Euclidean distance is used, all dimensions must be kept on the same scale, and the distance is calculated as the absolute distance between two points in the multidimensional space. Cosine similarity evaluates whether vectors point in the same direction, whereas the Euclidean distance evaluates the real distance between points, so the Euclidean distance is more applicable than cosine similarity when user behavior is used as the index for calculating user similarity. The Euclidean distance between vector X and vector Y is

$$d(X,Y)=\sqrt{\sum_{i=1}^{n}\left(x_i-y_i\right)^{2}}$$

When similarity is calculated for users according to their scores, the Euclidean distance emphasizes the degree of fit between the users' scores, while cosine similarity better distinguishes how the users separate, that is, their score levels.
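The contrast between the two measures can be illustrated with a short sketch. The function names and the two score vectors are hypothetical; the second user gives exactly twice the scores of the first, so the vectors point in the same direction but are far apart.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical score vectors of two users over the same three scenic spots.
x = [1, 2, 3]
y = [2, 4, 6]

cos_xy = cosine_similarity(x, y)    # direction only: the vectors are parallel
dist_xy = euclidean_distance(x, y)  # absolute gap between the score levels
```

Cosine similarity treats these two users as identical in direction, while the Euclidean distance still reports the gap between their absolute scores.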
In all examples shown and described herein, any particular value should be construed as exemplary only and not as a limitation, and thus other examples of example embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above examples are merely illustrative of several embodiments of the present invention, and the description thereof is more specific and detailed, but not to be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.

Claims (10)

1. An information pushing method based on cloud computing and big data, characterized in that the method is applied to a cloud platform comprising a data input layer, a recommendation algorithm layer, and a data output layer, wherein the data input layer is used for inputting user data, the recommendation algorithm layer is used by the cloud platform for integrating the user data so as to classify all information and provide recommendation information, and the data output layer is used for outputting, to a system background, the results obtained by the recommendation algorithm layer from all data of the data input layer, the results being processed in a unified and personalized manner and returned as the pushed content, the method comprising:
acquiring and classifying scenic spot information, extracting keyword information from the scenic spot information to determine the type of the scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;
preprocessing the scenic spot data corresponding to the scenic spot evaluation index, establishing a scoring matrix [image 001] of users and scenic spots and an index matrix [image 002] of scenic spots and evaluation indexes, and determining each index weight in the scenic spot evaluation index with the analytic hierarchy process;
calculating, according to the index weights and the degree of fit with the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix [image 002], and sorting by the first similarity and selecting [image 003] to obtain a nearest neighbor set;
calculating, based on the scoring matrix [image 001] of users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest neighbor users of the user, selecting [image 004] neighbor users, and completing the pushing of the scenic spot information according to the scenic spot data of [image 003] and the user data of [image 004].
2. The information pushing method based on cloud computing and big data according to claim 1, characterized in that completing the pushing of the scenic spot information according to the scenic spot data of [image 003] and the user data of [image 004] comprises:
obtaining the [image 004] neighbor users, and predicting the user's scores for unscored scenic spots from the neighbor users by a weighted average method;
determining the number of highly scored scenic spots among the predicted scores, and, when that number is smaller than the length of the recommendation list, filling the missing part with scenic spots similar to those in the list.
3. The information pushing method based on cloud computing and big data according to claim 1, characterized in that calculating, based on the scoring matrix [image 001] of users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest neighbor users of the user comprises:
acquiring the specific rating data of users on scenic spots and constructing it into a user-scenic-spot scoring matrix, wherein the rows of the matrix represent users [image 005], the columns of the matrix represent scenic spots [image 006], each entry in the matrix represents the score given by user n to scenic spot m, and the matrix is expressed as [image 007]; after the matrix is obtained, selecting a similarity calculation formula according to the specific user rating data to calculate the similarity between users or between scenic spots, sorting the calculation results to obtain the K neighbors of the user or the scenic spot, and generating the push result by selecting from the data of the neighbor users;
when scenic spot [image 009] is to be pushed to target user [image 008], first determining the neighbor users of user [image 008]; if the neighbor users' scores for the scenic spot are high, predicting that the target user's score for scenic spot [image 009] will also be high and pushing scenic spot [image 009] to the user; otherwise, not pushing it.
4. The information pushing method based on cloud computing and big data as claimed in claim 3, wherein the similarity computing process comprises a revised similarity and a revised prediction formula, and the revised similarity is expressed by a revised similarity expression
Figure 880678DEST_PATH_IMAGE010
Wherein
Figure 290931DEST_PATH_IMAGE011
The time of the user u performing the operation on the content i is represented, and the object of the f function is that the longer the time of the user performing the operation on the content i and the content j is, the longer the time is
Figure 669959DEST_PATH_IMAGE012
The smaller the attenuation function used is
Figure 673819DEST_PATH_IMAGE013
Wherein
Figure 163706DEST_PATH_IMAGE014
A time-decay parameter is represented which is,
Figure 694044DEST_PATH_IMAGE015
representing a hyper-parameter; the modified prediction formula is
Figure 243974DEST_PATH_IMAGE016
Wherein
Figure 718818DEST_PATH_IMAGE014
denotes a time-decay parameter,
Figure 012396DEST_PATH_IMAGE017
denotes a hyper-parameter controlling the degree of time decay, and the smaller the difference between
Figure 397241DEST_PATH_IMAGE018
and
Figure 118072DEST_PATH_IMAGE019
the higher the content that is highly similar to content j ranks in the push list of target user u.
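A time-aware similarity and ranking of this kind can be sketched as below. The exact expressions of the claim are in the figures and are not reproduced here; the decay form 1 / (1 + alpha * |dt|), the normalization by the square root of the item counts, and the names `time_decay`, `item_similarity`, `rank_for_user`, `alpha`, and `beta` are assumptions chosen only to illustrate the idea that closer operation times give larger weights.

```python
import math
from collections import defaultdict

def time_decay(delta, alpha):
    """Assumed decay function: the larger the time gap, the smaller the value."""
    return 1.0 / (1.0 + alpha * abs(delta))

def item_similarity(actions, alpha=0.1):
    """Time-weighted co-occurrence similarity between contents.

    `actions` maps user -> {content: timestamp of the user's operation}.
    """
    co_weight = defaultdict(float)
    count = defaultdict(int)
    for items in actions.values():
        for i, t_i in items.items():
            count[i] += 1
            for j, t_j in items.items():
                if i != j:
                    co_weight[(i, j)] += time_decay(t_i - t_j, alpha)
    return {
        (i, j): w / math.sqrt(count[i] * count[j])
        for (i, j), w in co_weight.items()
    }

def rank_for_user(actions, user, now, alpha=0.1, beta=0.1):
    """Rank unseen contents; contents similar to recently operated ones rank higher."""
    sim = item_similarity(actions, alpha)
    seen = actions[user]
    scores = defaultdict(float)
    for (j, i), w in sim.items():
        if j in seen and i not in seen:
            scores[i] += w * time_decay(now - seen[j], beta)
    return sorted(scores, key=scores.get, reverse=True)

actions = {
    "u1": {"A": 0, "B": 0},    # A and B operated at the same time
    "u2": {"A": 0, "C": 100},  # A and C operated far apart
    "u3": {"A": 100},
}
ranking = rank_for_user(actions, "u3", now=100)
```

In this toy data, B was operated on at the same moment as A while C was operated on much later, so B ends up ahead of C in the list for a user who has only operated on A.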
5. The information pushing method based on cloud computing and big data according to claim 3, wherein a K-means clustering algorithm is used to cluster the users into K clusters to obtain cluster information; when the nearest neighbors of a user are queried, the users in that user's cluster are searched and the similarity between the user and those cluster members is recalculated to find the top N users and complete the push, the process comprising:
initializing the scoring matrix
Figure 595059DEST_PATH_IMAGE020
the target user ui, the matrix completion parameter
Figure 426749DEST_PATH_IMAGE021
the neighbor parameter M, and the time decay parameter
Figure 666100DEST_PATH_IMAGE014
and solving the scoring matrix of the users with the SVT algorithm to complete the matrix;
clustering the completed matrix using a K-means algorithm under big data, the partition yielding clusters of highly correlated users, and, within the cluster containing the target user, searching for the Top-K users most similar to the target user as the neighbor set of the target user;
performing scenic-spot-based personalized push using the similarity that incorporates the time factor, and selecting the first N contents as the push result to complete the
Figure 557833DEST_PATH_IMAGE004
push.
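The clustering and in-cluster neighbor search of this claim can be sketched as below. The sketch assumes the scoring matrix has already been completed (the SVT step is not shown), and it uses a deliberately simple K-means with deterministic initialization from the first k rows; the function names and the cosine similarity are illustrative assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def kmeans(rows, k, iterations=20):
    """Plain K-means; centroids start at the first k rows (an assumption)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [
            min(range(k), key=lambda c: euclidean(r, centroids[c]))
            for r in rows
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def neighbors_in_cluster(rows, labels, target, top_k):
    """Search only the target's cluster and return its most similar users."""
    same = [u for u, lab in enumerate(labels) if lab == labels[target] and u != target]
    same.sort(key=lambda u: cosine(rows[target], rows[u]), reverse=True)
    return same[:top_k]

# Completed user-by-spot scoring matrix (hypothetical values).
rows = [
    [5, 5, 1, 1],
    [1, 1, 5, 5],
    [5, 4, 1, 2],
    [3, 5, 3, 1],
    [1, 2, 5, 4],
]
labels = kmeans(rows, k=2)
nearest = neighbors_in_cluster(rows, labels, target=0, top_k=1)
```

Restricting the search to one cluster means similarities are recomputed only against cluster members rather than against every user, which is the cost saving the claim relies on.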
6. The information pushing method based on cloud computing and big data as claimed in claim 1, wherein preprocessing the scenic spot data corresponding to the scenic spot evaluation index includes:
converting the collected user information into a two-dimensional matrix, carrying out digital representation to obtain user data, carrying out noise reduction and normalization processing on the user data, and constructing the preprocessed user data into a scoring matrix of the user and the scenic spots;
obtaining, by applying the similarity calculation formula to the user data,
Figure 7269DEST_PATH_IMAGE003
obtaining the scored data of the similar users through the similar user list corresponding to the first similarity, and obtaining the predicted score value of the target user by weighted averaging;
sorting according to the predicted score results and pushing
Figure 642649DEST_PATH_IMAGE003
to the user as the generated push result.
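The preprocessing step of this claim (turning collected records into a normalized user and scenic spot scoring matrix) might look like the sketch below. The record layout `(user, spot, score)`, the 1 to 5 score range, treating out-of-range scores as noise, and min-max normalization are all assumptions made for illustration; the claim does not specify them.

```python
def build_score_matrix(records, users, spots, low=1.0, high=5.0):
    """Build a user-by-spot matrix of normalized scores.

    Records with unknown users or spots, or with scores outside [low, high],
    are dropped as noise (an assumed noise rule). Missing scores stay None.
    """
    row = {u: r for r, u in enumerate(users)}
    col = {s: c for c, s in enumerate(spots)}
    matrix = [[None] * len(spots) for _ in users]
    for user, spot, score in records:
        if user not in row or spot not in col:
            continue
        if not (low <= score <= high):
            continue
        # Min-max normalization onto [0, 1].
        matrix[row[user]][col[spot]] = (score - low) / (high - low)
    return matrix

records = [
    ("u1", "s1", 5),
    ("u1", "s2", 3),
    ("u2", "s1", 1),
    ("u2", "s2", 9),  # outside the assumed range, treated as noise
]
matrix = build_score_matrix(records, users=["u1", "u2"], spots=["s1", "s2"])
```

Keeping unrated cells as `None` rather than 0 avoids confusing "no score" with the lowest normalized score.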
7. The cloud computing and big data based information push method according to claim 6, wherein sorting according to the predicted score values and pushing
Figure 2087DEST_PATH_IMAGE003
to the user as the generated push result comprises:
dividing the data set corresponding to the user data into a training set and a test set, and training a model of the user's behavior and interests on the training set to obtain a training result;
applying the test set data to the trained model for testing, and comparing the training set data with the test set results to calculate the prediction accuracy of the model, wherein the score prediction process comprises the following step:
predicting the user's scores for unrated scenic spots by analyzing the user's existing score data, the predicted scores being evaluated by the root mean square error (RMSE), which is expressed as
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}{|T|}}$$
where T denotes the data set used to test the model, $|T|$ is the number of elements in T, u denotes a user, i denotes an attraction, $r_{ui}$ denotes the true score of u for i obtained from the training set, and $\hat{r}_{ui}$ denotes the predicted score of u for i obtained from the prediction set.
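The RMSE evaluation described in this claim can be computed as in the following sketch. The layout of the test set as `(user, spot, true_score)` triples and the `predict` callable are assumptions for illustration; any trained model that returns a predicted score could be plugged in.

```python
import math

def rmse(test_set, predict):
    """Root mean square error over a test set T.

    `test_set` is a list of (user, spot, true_score) triples and
    `predict(user, spot)` returns the model's predicted score.
    """
    squared_errors = [(true - predict(u, i)) ** 2 for u, i, true in test_set]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical test data and a lookup table standing in for a trained model.
test_set = [("u1", "s1", 4.0), ("u1", "s2", 2.0)]
predictions = {("u1", "s1"): 3.0, ("u1", "s2"): 3.0}
error = rmse(test_set, lambda u, i: predictions[(u, i)])
```

A lower value means the predicted scores are closer to the true scores on the held-out data.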
8. The information push method based on cloud computing and big data as claimed in claim 1, wherein calculating, according to the fitting degree of the index weights and the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix
Figure 373156DEST_PATH_IMAGE002
comprises:
using a content-based collaborative filtering algorithm to push sights similar to the sights liked by the user before, wherein the expression of the first similarity is
Figure 797184DEST_PATH_IMAGE026
The user's time-decayed interest in a scenic spot is expressed as
Figure 774368DEST_PATH_IMAGE027
After the first similarity of the scenic spots is obtained, the user's interest in a candidate scenic spot is expressed as
Figure 108397DEST_PATH_IMAGE028
where
Figure 247254DEST_PATH_IMAGE029
denotes the set of sights that user u likes,
Figure 830682DEST_PATH_IMAGE030
denotes the set of K sights most similar to sight j,
Figure 923141DEST_PATH_IMAGE031
denotes the first similarity of sight i and sight j, and
Figure 173994DEST_PATH_IMAGE024
denotes the interest of user u in sight i.
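Given the symbols defined above, one common way to combine them is to sum, over the sights the user likes that are also among the K sights most similar to the candidate, the similarity times the user's interest. The sketch below assumes exactly that form; the claim's actual expression is in the figure, and the names `top_k_similar` and `predicted_interest` and the nested-dictionary layout of `sim` are hypothetical.

```python
def top_k_similar(sim, j, k):
    """Set of the K sights most similar to sight j."""
    ranked = sorted(((w, i) for i, w in sim[j].items()), reverse=True)
    return {i for _, i in ranked[:k]}

def predicted_interest(sim, liked, j, k):
    """Interest of a user in candidate sight j.

    `sim[j][i]` is the first similarity of sights i and j, and `liked` maps
    each sight the user likes to the user's interest in it.
    """
    nearest = top_k_similar(sim, j, k)
    return sum(sim[j][i] * r for i, r in liked.items() if i in nearest)

# Hypothetical similarities of candidate sight "C" to other sights.
sim = {"C": {"A": 0.9, "B": 0.2, "D": 0.5}}
liked = {"A": 1.0, "B": 1.0}
score_k2 = predicted_interest(sim, liked, "C", k=2)
score_k3 = predicted_interest(sim, liked, "C", k=3)
```

With K = 2 only sight A contributes, because B is not among the two sights most similar to C; widening K to 3 lets B contribute as well.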
9. The information pushing method based on cloud computing and big data as claimed in claim 1, wherein the first similarity of the scenic spots is calculated by Euclidean distance, the process comprising:
after the weight of each item of data in the evaluation index is determined, constructing from the collected data a scenic spot and evaluation index matrix of dimension
Figure 687015DEST_PATH_IMAGE032
as follows:
Figure 757739DEST_PATH_IMAGE033
wherein each row holds the data of one scenic spot and each column holds one constructed evaluation index; the data in each column is multiplied by the corresponding index weight to obtain the scenic spot and weighted index matrix
Figure 404621DEST_PATH_IMAGE034
Figure 447663DEST_PATH_IMAGE035
For the scenic spot data in matrix
Figure 193902DEST_PATH_IMAGE034
similarity is calculated as follows: the Euclidean distance between every two scenic spots is computed, and the corresponding Top-K are selected as the similar scenic spots of each scenic spot.
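The weighting and pairwise Euclidean distance steps of this claim can be sketched as follows. The matrix values and weights are made up for illustration, and the function names are hypothetical; a smaller distance is taken to mean a more similar scenic spot.

```python
import math

def weight_columns(matrix, weights):
    """Multiply each evaluation-index column by its index weight."""
    return [[v * w for v, w in zip(row, weights)] for row in matrix]

def top_k_similar_spots(weighted, k):
    """For each scenic spot, return the K spots at the smallest Euclidean distance."""
    result = {}
    for a, row_a in enumerate(weighted):
        distances = sorted(
            (math.dist(row_a, row_b), b)
            for b, row_b in enumerate(weighted)
            if b != a
        )
        result[a] = [b for _, b in distances[:k]]
    return result

# Rows are scenic spots, columns are evaluation indexes (hypothetical data).
matrix = [[1, 0], [1, 1], [10, 10]]
weights = [0.5, 0.5]
weighted = weight_columns(matrix, weights)
similar = top_k_similar_spots(weighted, k=1)
```

`math.dist` requires Python 3.8 or later.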
10. An information pushing device based on cloud computing and big data, for implementing the information pushing method based on cloud computing and big data according to any one of claims 1 to 9, comprising:
an acquisition unit for acquiring and classifying scenic spot information, extracting keyword information from the scenic spot information, judging the type of the scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises scenic spot index data and score data of users for the scenic spot;
a preprocessing unit for preprocessing the scenic spot data corresponding to the scenic spot evaluation index, establishing a scoring matrix of users and scenic spots
Figure 830551DEST_PATH_IMAGE001
and an index matrix of scenic spots and evaluation indexes
Figure 687649DEST_PATH_IMAGE002
and determining each index weight in the scenic spot evaluation index using a hierarchical analysis algorithm;
a computing unit for calculating, according to the fitting degree of the index weights and the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix
Figure 850777DEST_PATH_IMAGE002
and sorting according to the first similarity to select
Figure 767917DEST_PATH_IMAGE003
to obtain a nearest neighbor set;
an information pushing unit for calculating, based on the scoring matrix of users and scenic spots
Figure 547654DEST_PATH_IMAGE001
a second similarity between users from their scores for the scenic spots to obtain the nearest neighbor users of the user, selecting the
Figure 536339DEST_PATH_IMAGE004
neighbor users, and completing the push of the scenic spot information according to the scenic spot data of
Figure 553974DEST_PATH_IMAGE003
and the user data of
Figure 376436DEST_PATH_IMAGE004
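The preprocessing unit above determines the index weights with a hierarchical analysis algorithm. One common way to derive such weights is from a pairwise comparison matrix, as sketched below. The claim gives no detail of the procedure, so the comparison matrix, the geometric-mean approximation of the priority vector, and the name `ahp_weights` are all assumptions for illustration.

```python
import math

def ahp_weights(pairwise):
    """Approximate priority weights from a pairwise comparison matrix.

    `pairwise[a][b]` states how much more important index a is than index b.
    The geometric mean of each row is normalized so the weights sum to 1
    (the row geometric mean method, an assumed choice).
    """
    n = len(pairwise)
    geometric_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]

# Hypothetical comparison of three evaluation indexes.
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
```

For a perfectly consistent comparison matrix like this one, the resulting weights are proportional to 4 : 2 : 1. `math.prod` requires Python 3.8 or later.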
CN202211376436.1A 2022-11-04 2022-11-04 Information pushing method and device based on cloud computing and big data Pending CN115712780A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211376436.1A CN115712780A (en) 2022-11-04 2022-11-04 Information pushing method and device based on cloud computing and big data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211376436.1A CN115712780A (en) 2022-11-04 2022-11-04 Information pushing method and device based on cloud computing and big data

Publications (1)

Publication Number Publication Date
CN115712780A true CN115712780A (en) 2023-02-24

Family

ID=85232201

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211376436.1A Pending CN115712780A (en) 2022-11-04 2022-11-04 Information pushing method and device based on cloud computing and big data

Country Status (1)

Country Link
CN (1) CN115712780A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107729444A (en) * 2017-09-30 2018-02-23 桂林电子科技大学 Recommend method in a kind of personalized tourist attractions of knowledge based collection of illustrative plates

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
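付巧萍: "Design and Implementation of a Personalized Push *** Based on Collaborative Filtering", Wanfang, pages 4 - 5 *
史睿瑶: "Design and Implementation of a Tourism Recommendation *** Based on an Improved Collaborative Filtering Algorithm", China Master's Theses Full-text Database, Information Science and Technology, pages 2 - 4 *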

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117614845A (en) * 2023-11-13 2024-02-27 纬创软件(武汉)有限公司 Communication information processing method and device based on big data analysis
CN117614845B (en) * 2023-11-13 2024-05-10 纬创软件(武汉)有限公司 Communication information processing method and device based on big data analysis
CN117349535A (en) * 2023-12-04 2024-01-05 四川启明芯智能科技有限公司 Cross-platform multi-business comprehensive travel management system and method
CN117648497A (en) * 2024-01-29 2024-03-05 贵州大学 Method and system for realizing intelligent acquisition of user information based on big data
CN117648497B (en) * 2024-01-29 2024-04-30 贵州大学 Method and system for realizing intelligent acquisition of user information based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination