CN115712780A - Information pushing method and device based on cloud computing and big data - Google Patents
- Publication number
- CN115712780A (application number CN202211376436.1A)
- Authority
- CN
- China
- Prior art keywords
- user
- data
- spot
- matrix
- similarity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The invention discloses an information pushing method and device based on cloud computing and big data. Scenic spot information is obtained and classified, and keyword information is extracted from it to determine the type of each scenic spot. A scenic spot evaluation index is constructed according to the scenic spot type, and a user-scenic spot scoring matrix and a scenic spot-evaluation index matrix are established. According to the index weights and the degree of fit with the user scores, the data in the scenic spot-evaluation index matrix are used to calculate a first similarity between every two scenic spots, and a nearest-neighbor set is obtained by sorting on the first similarity. A second similarity between users is calculated from their scenic spot scores in the user-scenic spot scoring matrix to obtain each user's nearest-neighbor users, and the scenic spot information push is completed. By establishing the scenic spot evaluation index, collecting the related scenic spot index data, and combining them with the determined index weights to obtain a scenic spot-index system matrix, the method achieves higher push quality and improves the accuracy of information pushing and the working efficiency.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to an information pushing method and device based on cloud computing and big data.
Background
At present, as demand in the tourism industry keeps growing, the tourism market continues to expand, and the shortcomings of the traditional tourism industry have been exposed in the course of meeting people's needs. With social progress the tourism industry has become increasingly information-based and is gradually developing into an "Internet + tourism" model. Information overload has arisen along with this: as social, economic and technological development produces more and more information, the total amount of information eventually far exceeds what people need, making it difficult for them to select and use it. The tourism industry also suffers from information overload. Platforms such as websites and apps record a large amount of log data during operation, and the user behavior data contained in these logs include page views, purchases, clicks, scores, comments and the like. Faced with the growing volume of user and tourism information on the network, how to quickly and effectively acquire and mine the useful information, and how to push information to users quickly and accurately, has become a problem of concern.
Disclosure of Invention
In view of the above, the invention provides an information pushing method and an information pushing device based on cloud computing and big data, which can improve the stickiness of website users and save the time and effort users spend searching for and comparing tourist scenic spots, so as to solve the above technical problems.
In a first aspect, the invention provides an information pushing method based on cloud computing and big data, which is applied to a cloud platform. The cloud platform comprises a data input layer, a recommendation algorithm layer and a data output layer: the data input layer is used for inputting user data; the recommendation algorithm layer is used by the cloud platform to integrate the user data so as to classify all information and provide recommendation information; and the data output layer is used for outputting to the system background, as the push content to be returned, the result of unifying and personalizing all data of the data input layer according to the recommendation algorithm layer. The method comprises the following steps:
obtaining and classifying scenic spot information, extracting keyword information from the scenic spot information to determine the type of the scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;
preprocessing the scenic spot data corresponding to the scenic spot evaluation index, establishing a scoring matrix of the user and the scenic spot and an index matrix of the scenic spot and the evaluation index, and determining each index weight in the scenic spot evaluation index by the analytic hierarchy process (AHP);
calculating, according to the index weights and the degree of fit with the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, and selecting the top-ranked scenic spots in order of the first similarity to obtain a nearest-neighbor set;
calculating, based on the scoring matrix of the user and the scenic spot, a second similarity between users from their scores on the scenic spots to obtain the nearest-neighbor users of the user, selecting the top-ranked neighbor users, and completing the scenic spot information push according to the selected scenic spot data and the selected user data.
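The index-weighting step can be illustrated with a minimal sketch. The patent text does not say how the hierarchical analysis weights are solved, so the row geometric-mean approximation used here, the function name `ahp_weights`, and the 3x3 pairwise comparison values are all assumptions made for illustration.

```python
import math

def ahp_weights(pairwise):
    """Approximate AHP priority weights with the row geometric-mean method.

    pairwise[i][j] states how much more important index i is than index j
    (assumed reciprocal: pairwise[j][i] == 1 / pairwise[i][j]).
    """
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]

# Hypothetical comparison of three evaluation indexes (values are made up).
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)
```

The resulting weights sum to 1 and can be used to scale the columns of the scenic spot and evaluation index matrix. A full AHP workflow would normally also check the consistency ratio of the comparison matrix, which this sketch omits.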
As a further improvement of the above technical solution, completing the scenic spot information push according to the selected scenic spot data and the selected user data comprises:
obtaining the scores given by the selected neighbor users to the scenic spots that the user has not scored, and predicting the user's scores for those unscored scenic spots by a weighted average method;
determining the number of highly scored scenic spots among the predicted scores, and, when that number is less than the length of the recommendation list, filling the missing entries with scenic spots similar to those already in the list.
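The two steps above can be sketched as follows. The similarity values, the `high_score` cut-off of 4.0, and all names (`predict_score`, `build_push_list`, the sample users and spots) are hypothetical; the text only specifies a weighted average and padding with similar scenic spots.

```python
def predict_score(spot, ratings, neighbor_sims):
    """Similarity-weighted average of the neighbors' scores for one spot."""
    num = 0.0
    den = 0.0
    for neighbor, sim in neighbor_sims.items():
        score = ratings.get(neighbor, {}).get(spot)
        if score is not None:
            num += sim * score
            den += abs(sim)
    return num / den if den > 0 else None

def build_push_list(predicted, similar_spots, list_size, high_score=4.0):
    """Keep highly scored spots; pad with similar spots when too few remain."""
    ranked = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
    push = [spot for spot, score in ranked if score >= high_score][:list_size]
    for spot in list(push):  # iterate over a copy while padding the list
        for candidate in similar_spots.get(spot, []):
            if len(push) >= list_size:
                return push
            if candidate not in push:
                push.append(candidate)
    return push

# Hypothetical data: scores from two neighbor users and their similarities.
ratings = {"u2": {"lake": 5, "temple": 3}, "u3": {"lake": 4}}
neighbor_sims = {"u2": 0.8, "u3": 0.2}
predicted = {s: predict_score(s, ratings, neighbor_sims) for s in ["lake", "temple"]}
similar_spots = {"lake": ["river", "hill"]}
push_list = build_push_list(predicted, similar_spots, list_size=3)
```

Here only "lake" clears the assumed cut-off, so the remaining two slots are filled from the scenic spots listed as similar to it.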
As a further improvement of the above technical solution, calculating the second similarity between users from their scores on the scenic spots, based on the scoring matrix of the user and the scenic spot, to obtain the nearest-neighbor users of the user comprises:
acquiring the specific scoring data of users on scenic spots and constructing them into a user-scenic spot scoring matrix, wherein the rows of the matrix represent users, the columns represent scenic spots, and each entry represents the score given by user n to scenic spot m; after the matrix is obtained, applying different similarity calculation formulas to the specific user scoring data to calculate the similarity between users or between scenic spots, sorting the calculation results to obtain the K nearest neighbors of the user or scenic spot, and generating the push result by selecting from the data of the neighbor users;
when a tourist scenic spot is to be pushed to a target user, first examining the scores given to that scenic spot by the neighbor users of the target user; if those scores are high, predicting that the target user's score for the scenic spot will also be high and pushing the scenic spot to the user; otherwise, not pushing it.
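A minimal sketch of this neighbor-based decision follows. The text leaves the similarity formula open, so cosine similarity is used here as one common choice; the threshold of 4.0, the function names, and the sample ratings are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two users' {spot: score} dictionaries."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[s] * b[s] for s in common)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

def nearest_neighbors(target, ratings, k):
    """Return the k users most similar to the target user."""
    sims = [(other, cosine_similarity(ratings[target], ratings[other]))
            for other in ratings if other != target]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return [user for user, _ in sims[:k]]

def should_push(spot, neighbors, ratings, threshold=4.0):
    """Push only when the neighbors' average score reaches the threshold."""
    scores = [ratings[n][spot] for n in neighbors if spot in ratings.get(n, {})]
    if not scores:
        return False
    return sum(scores) / len(scores) >= threshold

# Hypothetical user-scenic spot scores (sparse: missing keys are unscored).
ratings = {
    "u1": {"lake": 5, "temple": 4},
    "u2": {"lake": 5, "temple": 4, "tower": 5},
    "u3": {"lake": 1, "tower": 2},
    "u4": {"temple": 4, "tower": 4},
}
neighbors = nearest_neighbors("u1", ratings, k=2)
decision = should_push("tower", neighbors, ratings)
```

With these sample values the two nearest neighbors of `u1` both rate "tower" highly, so the spot would be pushed; using `u3` alone as the neighbor would suppress it.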
As a further improvement of the above technical solution, the similarity calculation process includes a modified similarity and a modified prediction formula. The modified similarity introduces the time at which user u performed an operation on content i: the longer the interval between the user's operations on content i and content j, the smaller the corresponding decay term. The decay function used contains a time decay parameter and a hyper-parameter. The modified prediction formula likewise contains a time decay parameter that controls the degree of temporal decay: the more recent the user's operation on content j, the higher the content with high similarity to content j is ranked in the push list of target user u.
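To make the idea of a time-decayed similarity concrete, here is a hedged sketch. The exact expressions are not recoverable from this text, so the decay form 1 / (1 + alpha * |dt|), the co-occurrence normalization, and the names `decay`, `time_aware_item_similarity` and `alpha` are assumptions chosen only because they satisfy the stated property: a longer interval between two operations yields a smaller contribution.

```python
import math

def decay(dt, alpha):
    """Assumed decay term: shrinks as the time gap |dt| grows."""
    return 1.0 / (1.0 + alpha * abs(dt))

def time_aware_item_similarity(events, alpha=0.1):
    """events: {user: {item: timestamp}} -> {(i, j): similarity}.

    Each user who touched both i and j contributes a decayed weight,
    normalized by the popularity of the two items.
    """
    item_count = {}
    co_weight = {}
    for items in events.values():
        for i, t_i in items.items():
            item_count[i] = item_count.get(i, 0) + 1
            for j, t_j in items.items():
                if i == j:
                    continue
                co_weight[(i, j)] = co_weight.get((i, j), 0.0) + decay(t_i - t_j, alpha)
    return {
        (i, j): w / math.sqrt(item_count[i] * item_count[j])
        for (i, j), w in co_weight.items()
    }

# Hypothetical operation timestamps (arbitrary units).
events = {
    "u1": {"lake": 0, "temple": 1, "tower": 100},
    "u2": {"lake": 10, "temple": 12},
}
sims = time_aware_item_similarity(events, alpha=0.1)
```

In this sample, "lake" and "temple" are visited close together by both users and end up far more similar than "lake" and "tower", which one user visited 100 time units apart.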
As a further improvement of the above technical solution, a K-means clustering algorithm is used to cluster the users into K clusters and obtain cluster information; when the nearest neighbors of a user are queried, only the users in the same cluster are searched, the similarity between the user and the users of that cluster is recalculated, and the first N users are found to complete the push. The process comprises:
initializing the scoring matrix, the target user ui, the matrix completion parameter, the neighbor parameter M and the time decay parameter, solving the users' scoring matrix with the singular value thresholding (SVT) algorithm, and completing the matrix;
clustering the completed matrix with a K-means algorithm under big data, partitioning the users into clusters of highly correlated users, and, within the cluster containing the target user, searching for the Top-K users most similar to the target user as the neighbor set of the target user;
performing personalized scenic spot-based pushing using the similarity that introduces the time factor, and selecting the first N contents as the push result to complete the push.
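The clustering and in-cluster neighbor search can be sketched as below. This is a simplified illustration, not the patented procedure: the matrix is assumed to have been completed already (the SVT step is omitted), centroids are seeded with the first k rows, squared Euclidean distance stands in for the similarity measure, and all names and values are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(rows, k, iterations=20):
    """Plain Lloyd's algorithm; the first k rows seed the centroids (an assumption)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: attach every user to the closest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(r, centroids[c]))
            for r in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [r for r, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def neighbors_in_cluster(target, rows, labels, top_k):
    """Search only the target's cluster instead of all users."""
    same = [u for u in range(len(rows))
            if labels[u] == labels[target] and u != target]
    same.sort(key=lambda u: squared_distance(rows[u], rows[target]))
    return same[:top_k]

# Hypothetical completed user-scenic spot scoring matrix (rows are users).
completed = [
    [5, 5, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [1, 2, 5, 5],
    [5, 4, 1, 1],
]
labels = kmeans(completed, k=2)
nearest = neighbors_in_cluster(0, completed, labels, top_k=1)
```

Restricting the search to one cluster is what reduces the neighbor lookup cost as the number of users grows; the trade-off is that a good neighbor assigned to another cluster is never considered.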
As a further improvement of the above technical solution, preprocessing the scenic spot data corresponding to the scenic spot evaluation index comprises:
converting the collected user information into a two-dimensional matrix in numerical form to obtain user data, performing noise reduction and normalization on the user data, and constructing the preprocessed user data into the scoring matrix of the user and the scenic spot;
applying a similarity calculation formula to the user data to obtain the TOP-N similar user list corresponding to the first similarity, obtaining the data already scored by the similar users, and obtaining the predicted score of the target user by weighted average calculation;
pushing to the user, as the generated push result, the TOP-N items sorted by predicted score.
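The first preprocessing step can be illustrated with a small sketch. The text does not define the noise reduction or the normalization, so dropping scores outside an assumed 1 to 5 scale and min-max scaling to the range 0 to 1 are stand-ins, and the function name and sample records are hypothetical.

```python
def build_score_matrix(records, users, spots, low=1, high=5):
    """Turn (user, spot, score) records into a normalized 2-D matrix.

    Out-of-range scores are treated as noise and dropped (an assumption);
    valid scores are min-max scaled to [0, 1]; unscored cells stay None.
    """
    row_of = {user: r for r, user in enumerate(users)}
    col_of = {spot: c for c, spot in enumerate(spots)}
    matrix = [[None] * len(spots) for _ in users]
    for user, spot, score in records:
        if user in row_of and spot in col_of and low <= score <= high:
            matrix[row_of[user]][col_of[spot]] = (score - low) / (high - low)
    return matrix

# Hypothetical raw log records; the score 9 is outside the assumed scale.
records = [
    ("u1", "lake", 5),
    ("u1", "temple", 3),
    ("u2", "lake", 9),
    ("u2", "temple", 1),
]
matrix = build_score_matrix(records, users=["u1", "u2"], spots=["lake", "temple"])
```

Keeping unscored cells as `None` rather than 0 preserves the distinction between "not rated" and "rated lowest", which matters for the later similarity and prediction steps.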
As a further improvement of the above technical solution, pushing to the user, as the generated push result, the TOP-N items sorted by predicted score comprises:
dividing the data set corresponding to the user data into a training set and a test set, and training a model of the user's behavior and interests with the training set to obtain a training result;
applying the test set data to the trained model for testing, and comparing the model's predictions with the test set data to calculate the prediction accuracy of the model, wherein the score prediction process comprises the following;
predicting the user's scores for scenic spots not yet rated by analyzing the data the user has already scored, wherein the accuracy of the predicted scores is measured by the root mean square error (RMSE), expressed as $\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i) \in T} (r_{ui} - \hat{r}_{ui})^2}{|T|}}$, where T represents the data set used to test the model, $|T|$ is the number of elements in T, u denotes a user, i denotes a scenic spot, $r_{ui}$ represents the true score of u for i, and $\hat{r}_{ui}$ represents the predicted score of u for i.
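The RMSE evaluation can be written as a short sketch. The helper name `rmse`, the `predict` callback, and the sample scores are hypothetical; only the definition of root mean square error itself is standard.

```python
import math

def rmse(test_set, predict):
    """Root mean square error over (user, spot, true_score) test records."""
    squared_error = sum((true - predict(user, spot)) ** 2
                        for user, spot, true in test_set)
    return math.sqrt(squared_error / len(test_set))

# Hypothetical held-out scores and model predictions.
test_set = [("u1", "lake", 4.0), ("u1", "temple", 2.0)]
predictions = {("u1", "lake"): 3.0, ("u1", "temple"): 3.0}
error = rmse(test_set, lambda user, spot: predictions[(user, spot)])
```

A lower value means the predicted scores track the held-out scores more closely; because errors are squared, a few large misses weigh more than many small ones.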
As a further improvement of the above technical solution, calculating the first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, according to the index weights and the degree of fit with the user scores, comprises:
using a content-based collaborative filtering algorithm to push scenic spots similar to those the user previously liked, the algorithm comprising an expression for the first similarity and an expression for the temporal decay of the scenic spots in which the user is interested;
after the first similarity of the scenic spots is obtained, the interest of user u in scenic spot j is expressed as $p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui}$, where $N(u)$ represents the set of scenic spots that user u likes, $S(j,K)$ represents the set of K scenic spots most similar to scenic spot j, $w_{ji}$ represents the first similarity of scenic spot i and scenic spot j, and $r_{ui}$ represents the interest of user u in scenic spot i.
As a further improvement of the above technical solution, the first similarity of the scenic spots is calculated using the Euclidean distance, and the process comprises:
after the weight of each item of data in the evaluation index is determined, constructing from the collected data a matrix of scenic spots and evaluation indexes, wherein each row represents the data of one scenic spot and each column represents one constructed evaluation index, and multiplying the data of each column by the corresponding index weight to obtain a scenic spot and weighted index matrix;
performing similarity calculation on the scenic spot data in the weighted matrix by calculating the Euclidean distance between every pair of scenic spots, and selecting the TOP-K by Euclidean distance as the similar scenic spots of each scenic spot.
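The weighted Euclidean comparison can be sketched as follows. The index values, the weights, and the function names are hypothetical; the sketch assumes that a smaller distance means a more similar scenic spot.

```python
import math

def weighted_matrix(index_matrix, weights):
    """Multiply every column of the spot-index matrix by its index weight."""
    return [[value * w for value, w in zip(row, weights)] for row in index_matrix]

def top_k_similar(index_matrix, weights, spot, k):
    """Return the k spots closest to `spot` by weighted Euclidean distance."""
    wm = weighted_matrix(index_matrix, weights)
    distances = [(math.dist(wm[spot], wm[other]), other)
                 for other in range(len(wm)) if other != spot]
    distances.sort()
    return [other for _, other in distances[:k]]

# Hypothetical normalized index data: one row per scenic spot,
# one column per evaluation index.
index_matrix = [
    [0.90, 0.20, 0.50],
    [0.80, 0.30, 0.50],
    [0.10, 0.90, 0.40],
    [0.85, 0.25, 0.90],
]
weights = [0.6, 0.3, 0.1]  # e.g. weights produced by the index-weighting step
similar_to_first = top_k_similar(index_matrix, weights, spot=0, k=2)
```

Because the third index carries little weight, spot 3 ranks as closer to spot 0 than spot 1 does even though its raw value on that index differs most, which shows how the weights shape the neighbor set.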
In a second aspect, the invention further provides an information pushing device based on cloud computing and big data, comprising:
an acquisition unit, configured to obtain and classify scenic spot information, extract keyword information from the scenic spot information to determine the type of the scenic spot, and construct a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;
a preprocessing unit, configured to preprocess the scenic spot data corresponding to the scenic spot evaluation index, establish a scoring matrix of the user and the scenic spot and an index matrix of the scenic spot and the evaluation index, and determine each index weight in the scenic spot evaluation index by the analytic hierarchy process (AHP);
a computing unit, configured to calculate, according to the index weights and the degree of fit with the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, and to select the top-ranked scenic spots in order of the first similarity to obtain a nearest-neighbor set;
an information push unit, configured to calculate, based on the scoring matrix of the user and the scenic spot, a second similarity between users from their scores on the scenic spots to obtain the nearest-neighbor users of the user, select the top-ranked neighbor users, and complete the scenic spot information push according to the selected scenic spot data and the selected user data.
The invention provides an information pushing method and device based on cloud computing and big data. Scenic spot information is obtained and classified, keyword information is extracted from it to determine the type of the scenic spot, and a scenic spot evaluation index is constructed according to that type. The scenic spot data corresponding to the evaluation index are preprocessed, a scoring matrix of the user and the scenic spot and an index matrix of the scenic spot and the evaluation index are established, and each index weight in the scenic spot evaluation index is determined by the analytic hierarchy process (AHP). According to the index weights and the degree of fit with the user scores, a first similarity between every two scenic spots is calculated from the data in the scenic spot and evaluation index matrix, and the top-ranked scenic spots are selected in order of the first similarity to obtain a nearest-neighbor set. Based on the scoring matrix of the user and the scenic spot, a second similarity between users is calculated from their scores on the scenic spots to obtain the nearest-neighbor users of the user, the top-ranked neighbor users are selected, and the scenic spot information push is completed according to the selected scenic spot data and user data. By establishing the scenic spot evaluation index, collecting and analyzing the relevant scenic spot index data, and combining them with the determined index weights to obtain a scenic spot-index system matrix, the invention achieves higher push quality, provides better information pushing for the user, and improves the accuracy of information pushing and the working efficiency of the system.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.
Fig. 1 is a flowchart of an information pushing method based on cloud computing and big data according to the present invention;
Fig. 2 is a process diagram of the scenic spot data preprocessing of the present invention;
Fig. 3 is a block diagram of an information pushing device based on cloud computing and big data according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention.
Referring to Fig. 1, the invention provides an information pushing method based on cloud computing and big data, which is applied to a cloud platform. The cloud platform comprises a data input layer, a recommendation algorithm layer and a data output layer: the data input layer is used for inputting user data; the recommendation algorithm layer is used by the cloud platform to integrate the user data so as to classify all information and provide recommendation information; and the data output layer is used for outputting to the system background, as the push content to be returned, the result of unifying and personalizing all data of the data input layer according to the recommendation algorithm layer. The method comprises the following steps:
S1: obtaining and classifying scenic spot information, extracting keyword information from the scenic spot information to determine the type of the scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;
S2: preprocessing the scenic spot data corresponding to the scenic spot evaluation index, establishing a scoring matrix of the user and the scenic spot and an index matrix of the scenic spot and the evaluation index, and determining each index weight in the scenic spot evaluation index by the analytic hierarchy process (AHP);
S3: calculating, according to the index weights and the degree of fit with the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, and selecting the top-ranked scenic spots in order of the first similarity to obtain a nearest-neighbor set;
S4: calculating, based on the scoring matrix of the user and the scenic spot, a second similarity between users from their scores on the scenic spots to obtain the nearest-neighbor users of the user, selecting the top-ranked neighbor users, and completing the scenic spot information push according to the selected scenic spot data and the selected user data.
In this embodiment, completing the scenic spot information push according to the selected scenic spot data and the selected user data comprises: obtaining the scores given by the selected neighbor users to the scenic spots that the user has not scored, and predicting the user's scores for those scenic spots by a weighted average method; and determining the number of highly scored scenic spots among the predicted scores and, when that number is less than the length of the recommendation list, filling the missing entries with scenic spots similar to those in the list. User modeling is an important part of a tourism push system and plays a decisive role in the push result. The object served by the push system is the user, and pushing to a user depends on obtaining the user's preference information: the more comprehensive the information, the more personalized the push result. Most systems therefore concentrate on extracting user information as completely as possible, but analyzing and computing the information of the tourist scenic spots, which also make up the push system, is equally important.
Specifically, the user's degree of preference for content is calculated from the preference feature, the age feature, the gender feature and the address feature: a preference value is obtained for each feature from the current user's registration information, the final preference value is obtained by weighted summation, and the n scenic spots of greatest interest are selected for display on the page. When a newly pushed scenic spot is added to the database of the cloud platform, there is no user behavior information for it, so the collaborative filtering algorithm cannot find similar items to push and the new content cannot be recommended to users. In this case a content-based recommendation algorithm can be combined to extract the features of the pushed content, find similar content, and recommend the content newly added to the system. The content category is the primary feature, while the content title, purpose and target audience are secondary features. Weights are determined so that the more important a feature is, the larger its weight; the more features a piece of content has and the more detailed its label information, the higher the accuracy of pushing similar content.
It should be noted that, because users' scoring behavior is random, the user-scenic spot scoring matrix is extremely sparse. When the TOP-K similar users or scenic spots are calculated, the usable data are limited and a large amount of useless data participates in the calculation, so the resulting accuracy is low and the push quality of the system suffers, and the push effect falls short of expectations. The scoring matrix is a very sparse, low-rank matrix. Viewed from the perspective of users and of pushed items in personalized delivery: if two users both favor scenic spot i, their preferences for other pushed scenic spots are also likely to be similar; likewise, if one user favors both scenic spot i and scenic spot j, other users' preferences for scenic spots i and j are also likely to be similar. This assumption is reflected in the matrix M as low rank, and the matrix is completed on the basis of the low rank of the scoring matrix to alleviate the data sparsity problem. When the system pushes to users, the similarity between each pair of users must be calculated to find each user's nearest neighbors. When there are few users and little content, information can be pushed quickly, but as the numbers of users and scenic spots grow the calculation becomes time-consuming and occupies system resources. Cluster analysis is therefore performed on the completed scoring matrix to divide the users into several clusters; when the nearest neighbors of a user are sought, only the user's cluster needs to be searched rather than all users, which shortens the nearest-neighbor search time and reduces the complexity of the algorithm.
A K-means clustering algorithm is used to cluster the users into K clusters. For a specific user, the cluster information is obtained; when the user's nearest neighbors are searched, only the users in that cluster are examined, the similarity between the user and the users of the cluster is recalculated, and the first N users are found to complete the push. The data of the push system include not only user scoring data but also much implicit data, such as user browsing information, evaluation time information and evaluation location information, which play an important role in interest mining.
It should be understood that the sparse user scoring matrix is completed by the SVT algorithm, the users are clustered by the K-means algorithm to narrow the neighbor search range, TOP-K pushing is completed with the similarity calculation that incorporates the time factor, and the first K scenic spots are displayed. After a new user registers there is no behavior information, so the system can only start from the registration information, which includes the user's preferences, gender, age, address and other information. For a personalized push system the user's preference information is the most important, followed by gender, then age, and finally address; when the user's preference is calculated these features receive different weights, and the more important a feature is, the higher its weight. The push process based on registration information may be as follows: the user's registration information is obtained and the user is classified according to it, where the classification may be a multi-way classification over several features; the scenic spots most liked by the users in the categories to which the user belongs are pushed to the user, and the user's preference items under each feature are weighted and summed. The core problem is to calculate, for each feature f and each scenic spot i, the degree to which users with feature f like scenic spot i. This calculation uses the set of users interested in scenic spot i and the set of users whose features contain f.
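The registration-based cold start can be illustrated with a hedged sketch. The preference formula is not recoverable from this text, so the ratio used here (the share of a scenic spot's interested users who carry feature f) is an assumption built only from the two sets the text names; the feature weights, function names and sample data are likewise hypothetical.

```python
def feature_preference(spot_fans, feature_users):
    """Assumed preference: share of the spot's fans who have the feature."""
    if not spot_fans:
        return 0.0
    return len(spot_fans & feature_users) / len(spot_fans)

def cold_start_scores(new_user, fans_by_spot, users_by_feature, feature_weights):
    """Weighted sum of per-feature preferences for every scenic spot."""
    scores = {}
    for spot, fans in fans_by_spot.items():
        scores[spot] = sum(
            feature_weights[kind]
            * feature_preference(fans, users_by_feature.get((kind, value), set()))
            for kind, value in new_user.items()
        )
    return scores

# Hypothetical data: who likes each spot, and which users carry which feature.
fans_by_spot = {"lake": {"u1", "u2", "u3"}, "museum": {"u3", "u4"}}
users_by_feature = {
    ("preference", "nature"): {"u1", "u2"},
    ("gender", "f"): {"u2", "u4"},
}
# Preference outweighs gender, following the ordering described in the text.
feature_weights = {"preference": 0.7, "gender": 0.3}
new_user = {"preference": "nature", "gender": "f"}
scores = cold_start_scores(new_user, fans_by_spot, users_by_feature, feature_weights)
```

Sorting `scores` in descending order gives the scenic spots to show a newly registered user before any behavior data exist. A production version would likely smooth the ratio so that spots with very few fans do not dominate.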
Optionally, calculating the second similarity between users from their scores on the scenic spots, based on the scoring matrix of the user and the scenic spot, to obtain the nearest-neighbor users of the user comprises:
acquiring the specific scoring data of users on scenic spots and constructing them into a user-scenic spot scoring matrix, wherein the rows of the matrix represent users, the columns represent scenic spots, and each entry represents the score given by user n to scenic spot m; after the matrix is obtained, applying different similarity calculation formulas to the specific user scoring data to calculate the similarity between users or between scenic spots, sorting the calculation results to obtain the K nearest neighbors of the user or scenic spot, and generating the push result by selecting from the data of the neighbor users;
when a tourist scenic spot is to be pushed to a target user, first examining the scores given to that scenic spot by the neighbor users of the target user; if those scores are higher than a preset threshold, predicting that the target user's score for the scenic spot will also be high and pushing the scenic spot to the user; otherwise, not pushing it.
In this embodiment, the similarity calculation process includes a modified similarity and a modified prediction formula. The modified similarity introduces the time at which user u performed an operation on content i: the longer the interval between the user's operations on content i and content j, the smaller the corresponding decay term. The decay function used contains a time decay parameter and a hyper-parameter. The modified prediction formula likewise contains a time decay parameter that controls the degree of temporal decay: the more recent the user's operation on content j, the higher the content with high similarity to content j is ranked in the push list of target user u. A K-means clustering algorithm is used to cluster the users into K clusters and obtain cluster information; when the nearest neighbors of a user are queried, only the users in the same cluster are searched, the similarity between the user and the users of that cluster is recalculated, and the first N users are found to complete the push. The process comprises: initializing the scoring matrix, the target user ui, the matrix completion parameter, the neighbor parameter M and the time decay parameter, solving the users' scoring matrix with the singular value thresholding (SVT) algorithm, and completing the matrix; clustering the completed matrix with a K-means algorithm under big data, partitioning the users into clusters of highly correlated users, and, within the cluster containing the target user, searching for the Top-K users most similar to the target user as the neighbor set of the target user; and performing personalized scenic spot-based pushing using the similarity that introduces the time factor, and selecting the first N contents as the push result to complete the push.
It should be noted that in an actual system a user cannot score every tourist scenic spot one by one, and conversely a scenic spot is not scored by every user, so in practice the matrix is usually sparse. After the matrix is obtained, different similarity calculation formulas can be applied to the specific user scoring data to calculate the similarity between users or between scenic spots; the results are sorted to obtain the K nearest neighbors of the user or scenic spot, and the recommendation result is generated by selecting from the data of the neighbor users. When a tourist scenic spot is to be pushed to a target user, the scores given to that scenic spot by the user's neighbor users are examined first; if they are generally high, the target user's score for the scenic spot is predicted to be high as well and the scenic spot is pushed to the user; otherwise it is not pushed. The push result is based mainly on user scoring data, so the algorithm cannot provide personalized pushing for a new user of the system, who has not yet generated enough data, and cannot push a newly added scenic spot, for which no scoring data exist. The similarity can be calculated from the user's explicit or implicit behaviors, such as scoring, forwarding, saving, marking, commenting, collecting, clicking, page dwell time and purchasing; users whose calculated results are close are considered to have similar interests.
The algorithm process can be as follows: all collected user information is converted into a two-dimensional matrix for digital representation, and noise reduction and normalization are performed on the data; the preprocessed data are constructed into a user-scenic spot scoring matrix; a TOP-N list of similar users is obtained by applying a similarity calculation formula to the data, and the scored data of the similar users are obtained; a prediction score for the target user is obtained through weighted average calculation; and the TOP-N scenic spots sorted by prediction score are pushed to the user as the generated push result, thereby improving the accuracy of information pushing.
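The weighted-average prediction step in this process can be sketched as below. The sketch rests on assumptions that the text does not fix: similarity is taken as 1 / (1 + Euclidean distance) over the scenic spots both users have scored, and the names `similarity`, `predict`, `push_list` and the sample `scores` are hypothetical.

```python
import math


def similarity(a, b):
    """Similarity of two users from co-scored spots: 1 / (1 + Euclidean distance)."""
    common = [spot for spot in a if spot in b]
    if not common:
        return 0.0
    distance = math.sqrt(sum((a[s] - b[s]) ** 2 for s in common))
    return 1.0 / (1.0 + distance)


def predict(scores, target, spot, top_n):
    """Similarity-weighted average of the top_n neighbors' scores for `spot`."""
    neighbors = sorted(
        (
            (similarity(scores[target], scores[u]), u)
            for u in scores
            if u != target and spot in scores[u]
        ),
        reverse=True,
    )[:top_n]
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return None  # no usable neighbor has scored this spot
    return sum(sim * scores[u][spot] for sim, u in neighbors) / total


def push_list(scores, target, top_n_users, top_n_spots):
    """Rank the target's unscored spots by predicted score and keep the best ones."""
    all_spots = {spot for user_scores in scores.values() for spot in user_scores}
    unscored = all_spots - set(scores[target])
    predicted = {s: predict(scores, target, s, top_n_users) for s in unscored}
    ranked = sorted(
        (s for s in predicted if predicted[s] is not None),
        key=lambda s: (-predicted[s], s),
    )
    return ranked[:top_n_spots]


# Hypothetical user -> {scenic spot: score} data on a 1 to 5 scale.
scores = {
    "alice": {"lake": 5, "temple": 4},
    "bob": {"lake": 5, "temple": 4, "museum": 5, "park": 2},
    "carol": {"lake": 1, "temple": 2, "museum": 2, "park": 5},
}
print(push_list(scores, "alice", top_n_users=2, top_n_spots=2))
```

Because bob's scores match alice's on the co-scored spots, his scores dominate the weighted average, so the spot he rates highly is ranked first for alice.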
Referring to fig. 2, optionally, the preprocessing of the sight spot data corresponding to the sight spot evaluation index includes:
s10: converting the collected user information into a two-dimensional matrix, carrying out digital representation to obtain user data, carrying out noise reduction and normalization processing on the user data, and constructing the preprocessed user data into a scoring matrix of the user and the scenic spots;
s11: applying a similarity calculation formula to the user data to obtain the TOP-N list of similar users corresponding to the first similarity, obtaining the scored data of the similar users, and obtaining the prediction score of the target user through weighted average calculation;
s12: pushing the TOP-N scenic spots, sorted according to the prediction scores, to the user as the generated push result.
In this embodiment, pushing the TOP-N scenic spots sorted according to the prediction scores to the user as the generated push result comprises: dividing the data set corresponding to the user data into a training set and a test set, and training a model of the user's behavior and interests on the training set to obtain a training result; and applying the test set data to the trained model for testing, and comparing the model's predictions with the test set to calculate the prediction accuracy of the model. The score prediction process comprises: predicting the user's scores for unscored scenic spots by analyzing the data the user has already scored, wherein the accuracy of the predicted scores is measured by the root mean square error (RMSE), whose expression is $\mathrm{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i) \in T} (r_{ui} - \hat{r}_{ui})^2}$, where T represents the data set used to test the model, $|T|$ is the number of elements in T, u denotes a user, i denotes a scenic spot, $r_{ui}$ represents the true score of u for i recorded in T, and $\hat{r}_{ui}$ represents the score of u for i predicted by the model.
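As a small worked illustration of the RMSE evaluation, the sketch below scores a deliberately simple predictor against a held-out test set. The global-mean predictor, the function names and the sample data are assumptions made only for this example; any trained model exposing a `(user, spot) -> score` callable could be substituted.

```python
import math


def rmse(test_set, predict):
    """Root mean square error over (user, spot, true_score) triples."""
    squared_errors = [(true - predict(user, spot)) ** 2 for user, spot, true in test_set]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical split of the scored data into a training set and a test set.
training_set = [("u1", "museum", 3.0), ("u2", "temple", 2.0), ("u3", "lake", 4.0)]
test_set = [
    ("u1", "lake", 4.0),
    ("u1", "temple", 2.0),
    ("u2", "lake", 5.0),
    ("u2", "museum", 3.0),
]

# Baseline "model": always predict the mean score seen in the training set.
global_mean = sum(score for _, _, score in training_set) / len(training_set)


def predict_mean(user, spot):
    return global_mean


print(round(rmse(test_set, predict_mean), 4))
```

A lower RMSE on the test set indicates that the predicted scores are closer to the scores users actually gave.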
The data weights are determined by an expert scoring method combined with the AHP (analytic hierarchy process), and the determination process can be as follows. First, 10 experts are selected to score the 9 evaluation indexes, and a judgment matrix is constructed by calculating the average score of each analysis item and then dividing the averages by one another; the larger the average, the higher the importance and the higher the weight. After the judgment matrix is obtained, the CR value needs to be calculated, and the specific calculation expression is $CR = CI / RI$. The consistency check, namely the check of the CR value, proceeds as follows: the CI value is calculated first, with the expression $CI = (\lambda_{\max} - n)/(n - 1)$, where $\lambda_{\max}$ is the largest eigenvalue of the judgment matrix and n is its order; the RI value is obtained from the order of the judgment matrix; the CR value is obtained from the CI value and the RI value; and whether the obtained weights are consistent is judged from the result. The criterion for judging whether the matrix is consistent is the CR value: the smaller the CR value, the higher the consistency of the matrix, and the threshold is 0.1. According to the constructed scenic spot evaluation index, the tourist attractions have 13 index values, and the total number of indexes available for representing the weights is 9, so the judgment matrix is a 9th-order matrix; the CI value is 0.000, the RI value obtained by table lookup is 1.460, and the calculated CR value is $CR = 0.000 / 1.460 = 0.000 < 0.1$. The evaluation index judgment matrix therefore passes the consistency test, and the obtained weights are consistent.
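The weight determination and consistency check can be sketched as follows. The expert mean scores are hypothetical, the weights are derived with the common column-normalization approximation (an assumption, since the text does not name the weight extraction method), and the RI value of 1.460 for a 9th-order matrix is the one quoted in the text.

```python
def judgment_matrix(means):
    """Pairwise judgment matrix whose entry (i, j) is mean_i / mean_j."""
    return [[mi / mj for mj in means] for mi in means]


def ahp_weights(matrix):
    """Approximate AHP weights: normalize each column, then average each row."""
    n = len(matrix)
    col_sums = [sum(row[j] for row in matrix) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def consistency(matrix, weights, ri):
    """Return (CI, CR) with CI = (lambda_max - n) / (n - 1) and CR = CI / RI."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci, ci / ri


# Hypothetical average scores given by the experts to the 9 evaluation indexes.
means = [4.5, 4.2, 3.9, 3.6, 3.3, 3.0, 2.7, 2.4, 2.1]
matrix = judgment_matrix(means)
weights = ahp_weights(matrix)
ci, cr = consistency(matrix, weights, ri=1.460)

print([round(w, 3) for w in weights])
print("consistent" if cr < 0.1 else "inconsistent")
```

A judgment matrix built from ratios of averages satisfies a_ij * a_jk = a_ik exactly, so its largest eigenvalue equals its order and CI comes out as zero up to rounding, which is consistent with the CI value of 0.000 reported in the text.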
Optionally, calculating, according to the index weights and the fitting degree of the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix comprises:
using a content-based collaborative filtering algorithm to push scenic spots similar to the scenic spots the user liked before, wherein the expression of the first similarity is [formula], and the expression of the time attenuation of the scenic spots in which the user is interested is [formula];
after the first similarity of the scenic spots is obtained, the expression is [formula], whose terms are: the set of scenic spots that user u likes; the set of K scenic spots most similar to scenic spot j; the first similarity of scenic spot i and scenic spot j; and the interest of user u in scenic spot i.
In this embodiment, the first similarity of the scenic spots is calculated using the Euclidean distance, and the process comprises: after the weight of each item of data in the evaluation index is determined, constructing from the collected data the matrix of scenic spots and evaluation indexes, [formula], in which each row represents the data of one scenic spot and each column represents one constructed evaluation index; multiplying the data of each column by the corresponding index weight to obtain the scenic spot and weighted index matrix, [formula]; and calculating the similarity of the scenic spot data in the weighted matrix by computing the Euclidean distance between every two scenic spots and selecting, for each scenic spot, the TOP-K scenic spots by Euclidean distance as its similar scenic spots.
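A minimal sketch of this weighted Euclidean step is given below, assuming the index data have already been normalized to a common scale. The matrix values, the weights and the function names are hypothetical, and "TOP-K by Euclidean distance" is read here as the K smallest distances.

```python
import math


def weighted_matrix(index_matrix, weights):
    """Multiply every column of the scenic-spot index matrix by its index weight."""
    return [[value * w for value, w in zip(row, weights)] for row in index_matrix]


def top_k_similar(index_matrix, weights, spot, k):
    """Indices of the k scenic spots closest to `spot` in weighted Euclidean distance."""
    weighted = weighted_matrix(index_matrix, weights)
    target = weighted[spot]
    distances = [
        (math.dist(target, row), j) for j, row in enumerate(weighted) if j != spot
    ]
    distances.sort()  # smaller distance means more similar
    return [j for _, j in distances[:k]]


# Hypothetical normalized data: 4 scenic spots (rows) x 3 evaluation indexes (columns).
index_matrix = [
    [0.9, 0.2, 0.5],
    [0.8, 0.3, 0.5],
    [0.1, 0.9, 0.4],
    [0.7, 0.1, 0.9],
]
weights = [0.5, 0.3, 0.2]  # hypothetical index weights summing to 1

print(top_k_similar(index_matrix, weights, spot=0, k=2))
```

Scaling the columns before measuring distance lets a heavily weighted index contribute more to the separation between two scenic spots than a lightly weighted one.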
It should be noted that, for example, for scenic spot X and scenic spot Y, the index data set of scenic spot X is combined with the index data set of scenic spot Y to obtain a similarity value. The number of scenic spots does not change greatly and is far smaller than the number of users, the resulting scenic spot-evaluation index matrix is dense, and common values basically exist among the variables, so the Euclidean distance is selected for calculating the scenic spot similarity. A user-scenic spot scoring matrix is constructed according to the users' scores for the scenic spots; here, each user is represented by an m-dimensional vector, in which each element represents the score of the nth user for the mth scenic spot. When the user is a user who has already scored scenic spots in the system, the feature vector of the user is determined from the user-scenic spot scoring matrix, and its similarity with the feature vectors of the other users is calculated to obtain the TOP-N similar users. When the user is a user who has not scored any scenic spot in the system, user features are extracted from the user information to calculate the user similarity, the weighted average of the similar users' scores for the scenic spots is calculated as the score of the new user and added to the scoring matrix, the nearest neighbor users of the user are obtained, the user's prediction scores for the unscored scenic spots are obtained as the weighted average over the neighbor users, and the TOP-N scenic spot list is obtained.
It should be understood that, if the number of scenic spots in the finally generated push list L is s, the number n of scenic spots in the TOP-N list whose predicted user score is greater than or equal to 3 points is judged first. When n is greater than or equal to s, the first s scenic spots in the list are taken and filled into list L to generate the push list. When n is less than s, the n scenic spots are taken from the list and filled into list L, and the remaining s - n positions are filled by screening the similar scenic spots of the n scenic spots in list L and taking those ranked in the top s - n by the combination of similarity ranking and user score ranking; these scenic spots and the existing n scenic spots together form the final push list L, which is pushed to the user.
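The list-filling rule can be sketched as follows. This is a simplified reading under stated assumptions: the fill candidates are ranked by similarity alone (the text combines similarity ranking with user score ranking), and `build_push_list`, `similar_spots` and the sample data are hypothetical names and values.

```python
def build_push_list(predicted, similar_spots, s, threshold=3.0):
    """Build a push list of up to s spots from predictions, filling gaps with similar spots."""
    ranked = sorted(predicted, key=lambda spot: (-predicted[spot], spot))
    # Keep only the spots whose predicted score reaches the threshold (3 points in the text).
    push = [spot for spot in ranked if predicted[spot] >= threshold][:s]
    if len(push) < s:
        # Collect the similar spots of the spots already in the list, best similarity first.
        candidates = {}
        for spot in push:
            for other, sim in similar_spots.get(spot, []):
                if other not in push:
                    candidates[other] = max(sim, candidates.get(other, 0.0))
        fill = sorted(candidates, key=lambda c: (-candidates[c], c))
        push = push + fill[: s - len(push)]
    return push


# Hypothetical predicted scores for the target user's unscored spots.
predicted = {"lake": 4.6, "temple": 3.2, "museum": 2.4, "park": 1.9}
# Hypothetical nearest-neighbor sets: spot -> [(similar spot, similarity), ...].
similar_spots = {
    "lake": [("river", 0.9), ("museum", 0.7)],
    "temple": [("pagoda", 0.8), ("river", 0.6)],
}

print(build_push_list(predicted, similar_spots, s=4))
print(build_push_list(predicted, similar_spots, s=1))
```

With s = 4, only two spots reach the threshold, so the two remaining positions are taken from the similar spots of those two; with s = 1, the list is filled from the predictions alone.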
Referring to fig. 3, the present invention further provides an information pushing apparatus based on cloud computing and big data, including:
an acquisition unit, configured to acquire and classify scenic spot information, extract keyword information from the scenic spot information to judge the type of the scenic spot, and construct a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises scenic spot index data and the users' score data for the scenic spots;
a preprocessing unit, configured to preprocess the scenic spot data corresponding to the scenic spot evaluation index, establish a scoring matrix of the users and the scenic spots and an index matrix of the scenic spots and the evaluation indexes, and determine each index weight in the scenic spot evaluation index using an analytic hierarchy process algorithm;
a computing unit, configured to calculate, according to the index weights and the fitting degree of the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, and to sort by the first similarity and select the TOP-K scenic spots to obtain a nearest neighbor set;
an information pushing unit, configured to calculate, based on the scoring matrix of the users and the scenic spots, a second similarity between users from their scores for the scenic spots to obtain the nearest neighbor users of a user, select the TOP-N neighbor users, and complete the pushing of the scenic spot information according to the TOP-K scenic spot data and the TOP-N user data.
In this embodiment, the cosine similarity measures similarity by calculating the cosine of the included angle between two space vectors, and thereby measures the difference between different objects; it is widely applied in push systems. Each vector is drawn in a coordinate space according to its coordinate values, and the similarity between the vectors is calculated with a formula; the cosine of the included angle between vectors lies in the range $[-1, 1]$, and the formula generalizes to vectors of any dimension. Whether two vectors point in the same direction is judged from the cosine value: if the cosine value between the vectors is close to 1, that is, the included angle between them is almost zero degrees, the two vectors can be judged to be in the same direction; the length of the vectors is irrelevant to whether they are in the same direction. For n-dimensional vectors $X = (x_1, \dots, x_n)$ and $Y = (y_1, \dots, y_n)$, the expression for the cosine of the angle between them is $\cos\theta = \frac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$. After the user information is processed and converted into character-string vectors, the similarity is calculated using the cosine similarity. The Euclidean distance is the real distance between two points in a vector space, that is, the real distance between two individuals in space is obtained to judge the degree of similarity between them; when the Euclidean distance is used, the dimensions must always be kept on the same scale, and the distance is calculated as the absolute distance between the points in the multidimensional space.
The cosine similarity measures whether vectors point in the same direction, whereas the Euclidean distance measures the real distance between points, so the Euclidean distance is more applicable than the cosine similarity when user behavior is used as an index to calculate user similarity. The Euclidean distance between vector X and vector Y is $d(X, Y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$. When the similarity of users is calculated from user scores, the Euclidean distance emphasizes the fitting degree of the user scores, including differences in score level, whereas the cosine similarity distinguishes only differences in the direction of the users' score vectors.
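The difference between the two measures can be seen in a small example. The two score vectors below are hypothetical: the second user gives exactly half the scores of the first, so the vectors point in the same direction but sit at different score levels.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


def euclidean_distance(x, y):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical scores of two users for the same three scenic spots.
generous_user = [5.0, 4.0, 5.0]
strict_user = [2.5, 2.0, 2.5]  # same preferences, half the score level

print(round(cosine_similarity(generous_user, strict_user), 6))
print(round(euclidean_distance(generous_user, strict_user), 6))
```

The cosine similarity treats the two users as nearly identical because only direction matters, while the Euclidean distance stays clearly above zero because it also reflects the gap in score level.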
In all examples shown and described herein, any particular value should be construed as exemplary only and not as a limitation, and thus other examples of example embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
The above examples are merely illustrative of several embodiments of the present invention, and the description thereof is more specific and detailed, but not to be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.
Claims (10)
1. An information pushing method based on cloud computing and big data, characterized in that the method is applied to a cloud platform comprising a data input layer, a recommendation algorithm layer and a data output layer, wherein the data input layer is used for inputting user data, the recommendation algorithm layer is used by the cloud platform for integrating the user data so as to classify all information and provide recommendation information, and the data output layer is used for outputting to a system background the results obtained by the recommendation algorithm layer from all data of the data input layer, the results being processed in a unified and personalized manner and returned as the pushed content, the method comprising the following steps:
acquiring and classifying scenic spot information, extracting keyword information from the scenic spot information to judge the type of the scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and the users' score data for the scenic spot;
preprocessing the scenic spot data corresponding to the scenic spot evaluation index, establishing a scoring matrix of the users and the scenic spots and an index matrix of the scenic spots and the evaluation indexes, and determining each index weight in the scenic spot evaluation index using an analytic hierarchy process algorithm;
calculating, according to the index weights and the fitting degree of the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, and sorting by the first similarity and selecting the TOP-K scenic spots to obtain a nearest neighbor set;
calculating, based on the scoring matrix of the users and the scenic spots, a second similarity between users from their scores for the scenic spots to obtain the nearest neighbor users of a user, selecting the TOP-N neighbor users, and completing the pushing of the scenic spot information according to the TOP-K scenic spot data and the TOP-N user data.
2. The information pushing method based on cloud computing and big data according to claim 1, characterized in that completing the pushing of the scenic spot information according to the TOP-K scenic spot data and the TOP-N user data comprises:
obtaining the TOP-N neighbor users, and predicting the user's scores for the unscored scenic spots from the neighbor users by a weighted average method;
judging the number of high-scoring scenic spots among the predicted scores, and, when that number is less than the length of the recommendation list, filling the missing part with scenic spots similar to the scenic spots in the list.
3. The information pushing method based on cloud computing and big data according to claim 1, characterized in that calculating, based on the scoring matrix of the users and the scenic spots, a second similarity between users from their scores for the scenic spots to obtain the nearest neighbor users of a user comprises:
acquiring the specific rating data of the users for the scenic spots and constructing it into a user-scenic spot rating matrix, wherein the rows of the matrix represent the users, the columns of the matrix represent the scenic spots, each datum in the matrix represents the score of user n for scenic spot m, and the matrix is represented as [formula]; after the matrix is obtained, calculating the similarity between users or between scenic spots using different similarity calculation formulas according to the specific user rating data to obtain calculation results, sorting the calculation results to obtain the K neighbors of the user or scenic spot, and selecting and generating the push result from the data of the neighbor users;
when a tourist attraction needs to be pushed to a target user, first judging the scores given to the scenic spot by the neighbor users of the target user; if the neighbor users score the scenic spot highly, predicting that the target user's score for the scenic spot is also high and pushing the scenic spot to the user; otherwise, not pushing it.
4. The information pushing method based on cloud computing and big data according to claim 3, characterized in that the similarity calculation process uses a modified similarity and a modified prediction formula, wherein the modified similarity expression is [formula], which depends on the time at which user u performed an operation on content i; the purpose of the function f is that the longer the time between the user's operations on content i and content j, the smaller its value; the attenuation function used is [formula], in which one symbol represents the time decay parameter and another represents a hyper-parameter; and the modified prediction formula is [formula], in which one symbol represents a time decay parameter and another represents a hyper-parameter controlling the degree of time decay, such that the smaller the difference between the two time values compared in the formula, the higher the contents similar to content j are ranked in the push list of target user u.
5. The information pushing method based on cloud computing and big data according to claim 3, characterized in that a K-means clustering algorithm is used to cluster the users into K clusters and obtain cluster information, and, when the nearest neighbors of a user are queried, the users in the user's cluster are searched and the similarity between the user and the users in the cluster is recalculated to find the first N users and complete the pushing, the process comprising:
initializing the scoring matrix, the target user ui, a matrix parameter, the neighbor parameter M and the time decay parameter, and solving the user scoring matrix with the SVT algorithm to complete the matrix;
clustering the completed matrix with a K-means algorithm under big data, dividing all users into clusters of highly correlated users, and, within the cluster in which the target user is located, searching for the Top-K users with the highest similarity to the target user as the neighbor set of the target user;
6. The information pushing method based on cloud computing and big data as claimed in claim 1, wherein preprocessing the scenic spot data corresponding to the scenic spot evaluation index includes:
converting the collected user information into a two-dimensional matrix, carrying out digital representation to obtain user data, carrying out noise reduction and normalization processing on the user data, and constructing the preprocessed user data into a scoring matrix of the user and the scenic spots;
applying a similarity calculation formula to the user data to obtain the TOP-N list of similar users corresponding to the first similarity, obtaining the scored data of the similar users, and obtaining the prediction score of the target user through weighted average calculation;
7. The information pushing method based on cloud computing and big data according to claim 6, characterized in that pushing the TOP-N scenic spots sorted according to the prediction scores to the user as the generated push result comprises:
dividing the data set corresponding to the user data into a training set and a test set, and training a model of the user's behavior and interests on the training set to obtain a training result;
applying the test set data to the trained model for testing, and comparing the model's predictions with the test set to calculate the prediction accuracy of the model, wherein the score prediction process comprises the following steps:
predicting the user's scores for the unscored scenic spots by analyzing the data the user has already scored, wherein the accuracy of the predicted scores is measured by the root mean square error (RMSE), whose expression is $\mathrm{RMSE} = \sqrt{\frac{1}{|T|}\sum_{(u,i) \in T} (r_{ui} - \hat{r}_{ui})^2}$, where T represents the data set used to test the model, $|T|$ is the number of elements in T, u denotes a user, i denotes a scenic spot, $r_{ui}$ represents the true score of u for i recorded in T, and $\hat{r}_{ui}$ represents the score of u for i predicted by the model.
8. The information pushing method based on cloud computing and big data according to claim 1, characterized in that calculating, according to the index weights and the fitting degree of the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix comprises:
using a content-based collaborative filtering algorithm to push scenic spots similar to the scenic spots the user liked before, wherein the expression of the first similarity is [formula], and the expression of the time attenuation of the scenic spots in which the user is interested is [formula];
after the first similarity of the scenic spots is obtained, the expression is [formula], whose terms are: the set of scenic spots that user u likes; the set of K scenic spots most similar to scenic spot j; the first similarity of scenic spot i and scenic spot j; and the interest of user u in scenic spot i.
9. The information pushing method based on cloud computing and big data as claimed in claim 1, wherein the first similarity of the scenic spot is calculated by euclidean distance, and the process includes:
after the weight of each item of data in the evaluation index is determined, constructing from the collected data the matrix of scenic spots and evaluation indexes, [formula], in which each row represents the data of one scenic spot and each column represents one constructed evaluation index, and multiplying the data of each column by the corresponding index weight to obtain the scenic spot and weighted index matrix, [formula];
10. An information pushing device based on cloud computing and big data, applying the information pushing method based on cloud computing and big data according to any one of claims 1 to 9, characterized by comprising:
an acquisition unit, configured to acquire and classify scenic spot information, extract keyword information from the scenic spot information to judge the type of the scenic spot, and construct a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises scenic spot index data and the users' score data for the scenic spots;
a preprocessing unit, configured to preprocess the scenic spot data corresponding to the scenic spot evaluation index, establish a scoring matrix of the users and the scenic spots and an index matrix of the scenic spots and the evaluation indexes, and determine each index weight in the scenic spot evaluation index using an analytic hierarchy process algorithm;
a computing unit, configured to calculate, according to the index weights and the fitting degree of the user scores, a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, and to sort by the first similarity and select the TOP-K scenic spots to obtain a nearest neighbor set;
an information pushing unit, configured to calculate, based on the scoring matrix of the users and the scenic spots, a second similarity between users from their scores for the scenic spots to obtain the nearest neighbor users of a user, select the TOP-N neighbor users, and complete the pushing of the scenic spot information according to the TOP-K scenic spot data and the TOP-N user data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211376436.1A CN115712780A (en) | 2022-11-04 | 2022-11-04 | Information pushing method and device based on cloud computing and big data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115712780A true CN115712780A (en) | 2023-02-24 |
Family
ID=85232201
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211376436.1A Pending CN115712780A (en) | 2022-11-04 | 2022-11-04 | Information pushing method and device based on cloud computing and big data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115712780A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117349535A (en) * | 2023-12-04 | 2024-01-05 | 四川启明芯智能科技有限公司 | Cross-platform multi-business comprehensive travel management system and method |
CN117614845A (en) * | 2023-11-13 | 2024-02-27 | 纬创软件(武汉)有限公司 | Communication information processing method and device based on big data analysis |
CN117648497A (en) * | 2024-01-29 | 2024-03-05 | 贵州大学 | Method and system for realizing intelligent acquisition of user information based on big data |
CN117614845B (en) * | 2023-11-13 | 2024-05-10 | 纬创软件(武汉)有限公司 | Communication information processing method and device based on big data analysis |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107729444A (en) * | 2017-09-30 | 2018-02-23 | 桂林电子科技大学 | Personalized tourist attraction recommendation method based on knowledge graph |
Non-Patent Citations (2)
Title |
---|
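Fu Qiaoping (付巧萍): "Design and Implementation of a Personalized Push System Based on Collaborative Filtering", Wanfang, pages 4 - 5 * |
Shi Ruiyao (史睿瑶): "Design and Implementation of a Tourism Recommendation System Based on an Improved Collaborative Filtering Algorithm", China Master's Theses Full-text Database, Information Science and Technology, pages 2 - 4 * |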
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117614845A (en) * | 2023-11-13 | 2024-02-27 | 纬创软件(武汉)有限公司 | Communication information processing method and device based on big data analysis |
CN117614845B (en) * | 2023-11-13 | 2024-05-10 | 纬创软件(武汉)有限公司 | Communication information processing method and device based on big data analysis |
CN117349535A (en) * | 2023-12-04 | 2024-01-05 | 四川启明芯智能科技有限公司 | Cross-platform multi-business comprehensive travel management system and method |
CN117648497A (en) * | 2024-01-29 | 2024-03-05 | 贵州大学 | Method and system for realizing intelligent acquisition of user information based on big data |
CN117648497B (en) * | 2024-01-29 | 2024-04-30 | 贵州大学 | Method and system for realizing intelligent acquisition of user information based on big data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110162706B (en) | Personalized recommendation method and system based on interactive data clustering | |
CN110516160B (en) | Knowledge graph-based user modeling method and sequence recommendation method | |
Hasan et al. | Dominance of AI and Machine Learning Techniques in Hybrid Movie Recommendation System Applying Text-to-number Conversion and Cosine Similarity Approaches | |
TWI623842B (en) | Image search and method and device for acquiring image text information | |
US10019442B2 (en) | Method and system for peer detection | |
CN109918563B (en) | Book recommendation method based on public data | |
CN115712780A (en) | Information pushing method and device based on cloud computing and big data | |
CN109471982B (en) | Web service recommendation method based on QoS (quality of service) perception of user and service clustering | |
CN114238573B (en) | Text countercheck sample-based information pushing method and device | |
CN108897750B (en) | Personalized place recommendation method and device integrating multiple contextual information | |
CN109816015B (en) | Recommendation method and system based on material data | |
CN111460251A (en) | Data content personalized push cold start method, device, equipment and storage medium | |
CN112749330B (en) | Information pushing method, device, computer equipment and storage medium | |
Cheung et al. | Characterizing user connections in social media through user-shared images | |
CN110110220A (en) | Merge the recommended models of social networks and user's evaluation | |
CN108491477B (en) | Neural network recommendation method based on multi-dimensional cloud and user dynamic interest | |
Ramadhan et al. | Collaborative Filtering Recommender System Based on Memory Based in Twitter Using Decision Tree Learning Classification (Case Study: Movie on Netflix) | |
CN111723302A (en) | Recommendation method based on collaborative dual-model deep representation learning | |
CN115408618B (en) | Point-of-interest recommendation method based on social relation fusion position dynamic popularity and geographic features | |
Chen et al. | Exploiting aesthetic features in visual contents for movie recommendation | |
CN113688281B (en) | Video recommendation method and system based on deep learning behavior sequence | |
CN113657766A (en) | Tourist attraction joy index metering method based on tourist multi-metadata | |
CN113704617A (en) | Article recommendation method, system, electronic device and storage medium | |
CN114595693A (en) | Text emotion analysis method based on deep learning | |
KR102583679B1 (en) | Apparatus and method for recommending items based on big-data of reviews |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||