CN115712780A

CN115712780A - Information pushing method and device based on cloud computing and big data

Info

Publication number: CN115712780A
Application number: CN202211376436.1A
Authority: CN
Inventors: 黄嵩; 徐辣
Original assignee: Shenzhen Bitpower Information Technology Co ltd
Current assignee: Shenzhen Bitpower Information Technology Co ltd
Priority date: 2022-11-04
Filing date: 2022-11-04
Publication date: 2023-02-24

Abstract

The invention discloses an information pushing method and device based on cloud computing and big data. Scenic spot information is obtained and classified, and keyword information is extracted from it to determine the type of each scenic spot. A scenic spot evaluation index is constructed according to the scenic spot type, and a user-scenic spot scoring matrix and a scenic spot-evaluation index matrix are established. According to the index weights and the degree of fit with user scores, the data in the scenic spot-evaluation index matrix are used to calculate a first similarity between every two scenic spots, and a nearest neighbor set is obtained by sorting on the first similarity. Based on the user-scenic spot scoring matrix, a second similarity between users is calculated from their scenic spot scores to obtain each user's nearest neighbor users, and scenic spot information pushing is completed. By establishing scenic spot evaluation indexes, collecting the related index data, and combining them with the determined index weights to obtain a scenic spot-index system matrix, the invention achieves higher pushing quality and improves the accuracy of information pushing and the working efficiency.

Description

Information pushing method and device based on cloud computing and big data

Technical Field

The invention belongs to the technical field of data processing, and particularly relates to an information pushing method and device based on cloud computing and big data.

Background

At present, as demand in the tourism industry grows, the scale of the tourism market keeps expanding, and the shortcomings of the traditional tourism industry have been exposed as it tries to keep up with people's needs. With social progress, the tourism industry has become increasingly informatized and is gradually developing into an "Internet + tourism" model. Information overload has emerged along with this development: as social, economic and technological development produces more and more information, the total amount of information eventually far exceeds what people need, making it difficult for them to select and use it. The tourism industry faces the same problem. Platforms such as websites and apps record a large amount of log data during operation, and the user behavior data contained in these logs includes page views, purchases, clicks, ratings, comments and the like. With the number of users and the volume of tourism information on the network growing ever larger, how to quickly and effectively acquire and mine the useful information, and how to push information to users quickly and accurately, has become a problem of concern.

Disclosure of Invention

In view of the above, the invention provides an information pushing method and device based on cloud computing and big data to solve the above technical problems, which can improve the stickiness of website users and save users the time and effort of searching for and comparing tourist attractions.

In a first aspect, the invention provides an information pushing method based on cloud computing and big data, applied to a cloud platform. The cloud platform comprises a data input layer, a recommendation algorithm layer and a data output layer. The data input layer is used for inputting user data; the recommendation algorithm layer is used by the cloud platform to integrate the user data so as to classify all information and provide recommendation information; and the data output layer is used for outputting to the system background, as the push content to be returned, the result obtained by the recommendation algorithm layer after unifying and personalizing all data of the data input layer. The method comprises the following steps:

obtaining and classifying scenic spot information, extracting keyword information from the scenic spot information to determine the type of the scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;

preprocessing the scenic spot data corresponding to the scenic spot evaluation index, and establishing a scoring matrix of users and scenic spots and an index matrix of scenic spots and evaluation indexes;

determining the weight of each index in the scenic spot evaluation index by means of the analytic hierarchy process (AHP);

calculating, according to the index weights and the degree of fit with user scores, a first similarity between every two scenic spots from the data in the index matrix of scenic spots and evaluation indexes, and selecting the TOP-K scenic spots in order of the first similarity to obtain a nearest neighbor set;

calculating, based on the scoring matrix of users and scenic spots, a second similarity between users from their scores on the scenic spots to obtain the nearest neighbor users of a user, selecting the TOP-K users, and completing the pushing of scenic spot information according to the TOP-K scenic spot data and the TOP-K user data.

As a further improvement of the above technical solution, completing the pushing of scenic spot information according to the TOP-K scenic spot data and the TOP-K user data comprises:

obtaining the scores given by the TOP-K neighbor users to the scenic spots that the user has not scored, and predicting the user's scores for those unscored scenic spots by a weighted average method;

determining the number of scenic spots with high predicted scores and, when that number is smaller than the length of the recommendation list, filling the missing entries with scenic spots similar to those already in the list.
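The two steps above can be illustrated with a minimal sketch. It is not the patent's implementation: the function names, the dictionary layout and the `high_score` threshold of 4.0 are assumptions made only for illustration.

```python
def predict_score(target, spot, ratings, similarity, neighbors):
    """Similarity-weighted average of the neighbors' scores for `spot`.

    ratings: {user: {spot: score}}; similarity: {(target, neighbor): weight}.
    Returns None when no neighbor has scored the spot.
    """
    num = den = 0.0
    for v in neighbors:
        score = ratings.get(v, {}).get(spot)
        if score is None:
            continue  # this neighbor has not scored the spot
        w = similarity[(target, v)]
        num += w * score
        den += abs(w)
    return num / den if den else None


def build_push_list(predicted, similar_spots, list_size, high_score=4.0):
    """Keep highly scored spots; pad with similar spots if the list is short."""
    ranked = sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)
    result = [s for s, p in ranked if p >= high_score][:list_size]
    for spot in list(result):  # iterate over a copy while appending
        for candidate in similar_spots.get(spot, []):
            if len(result) >= list_size:
                return result
            if candidate not in result:
                result.append(candidate)
    return result
```

For example, with neighbor weights 0.75 and 0.25 and neighbor scores 5 and 3, the predicted score is 4.5; if only one spot reaches the threshold and the list length is 3, the remaining two slots are filled from that spot's similar spots.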

As a further improvement of the above technical solution, calculating, based on the scoring matrix of users and scenic spots, the second similarity between users from their scores on the scenic spots to obtain the nearest neighbor users of the user comprises:

acquiring the specific rating data of users on scenic spots and constructing it into a user-scenic spot scoring matrix, wherein the rows of the matrix represent users, the columns of the matrix represent scenic spots, and each entry of the matrix represents the score given by user n to scenic spot m;

after the matrix is obtained, calculating the similarity between users or between scenic spots from the specific user scoring data using different similarity calculation formulas, sorting the calculation results to obtain the K neighbors of a user or scenic spot, and generating the push result by selecting from the data of the neighbor users;

when a tourist attraction is to be pushed to a target user, first determining the neighbor users of the target user; if the neighbor users' scores for the scenic spot are high, predicting that the target user's score for the scenic spot will also be high and pushing the scenic spot to the user; otherwise, not pushing it.
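A minimal sketch of this neighbor-based push decision follows. The text allows "different similarity calculation formulas"; cosine similarity over co-rated scenic spots is used here only as one possible choice, and the names and the `threshold` value are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity over the scenic spots both users have scored."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[s] * b[s] for s in common)
    norm_a = math.sqrt(sum(a[s] ** 2 for s in common))
    norm_b = math.sqrt(sum(b[s] ** 2 for s in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(target, ratings, k):
    """Return the k users most similar to `target` (ratings: {user: {spot: score}})."""
    sims = [(cosine_similarity(ratings[target], r), u)
            for u, r in ratings.items() if u != target]
    sims.sort(reverse=True)
    return [u for _, u in sims[:k]]


def should_push(target, spot, ratings, k, threshold=4.0):
    """Push `spot` only if the neighbors who scored it rated it highly on average."""
    scores = [ratings[v][spot]
              for v in nearest_neighbors(target, ratings, k) if spot in ratings[v]]
    return bool(scores) and sum(scores) / len(scores) >= threshold
```

If no neighbor has scored the scenic spot, the sketch declines to push, which matches the "otherwise, not pushing it" branch.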

As a further improvement of the above technical solution, the similarity calculation process includes a modified similarity and a modified prediction formula. The modified similarity takes into account the time at which user u performed an operation on content i: the longer the interval between the user's operations on content i and content j, the smaller the value of the attenuation function. The attenuation function contains a time decay parameter, which is a hyper-parameter. The modified prediction formula likewise contains a parameter that controls the degree of time attenuation: the smaller the difference between the two times compared in the formula, the higher the content that is highly similar to content j is ranked in the push list of target user u.
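The exact expressions are not recoverable from this text, so the following is only a sketch of one widely used time-aware item similarity that is consistent with the description. It assumes a decay function f(Δ) = 1 / (1 + αΔ), co-occurrence normalized by the square root of the item popularity product, and a prediction weighted by 1 / (1 + β|t0 − t_uj|) with t0 taken as the current time. All names, the functional forms and the default parameter values are assumptions.

```python
import math
from collections import defaultdict


def decay(delta, rate):
    """Assumed decay function: shrinks toward 0 as the time gap grows."""
    return 1.0 / (1.0 + rate * delta)


def time_aware_similarity(events, alpha=0.1):
    """events: {user: {item: timestamp}} -> {(i, j): similarity}."""
    co = defaultdict(float)   # decayed co-occurrence weight of (i, j)
    count = defaultdict(int)  # number of users who acted on each item
    for items in events.values():
        for i, t_i in items.items():
            count[i] += 1
            for j, t_j in items.items():
                if i != j:
                    co[(i, j)] += decay(abs(t_i - t_j), alpha)
    return {(i, j): w / math.sqrt(count[i] * count[j]) for (i, j), w in co.items()}


def predict_interest(user_items, sim, candidate, now, beta=0.1):
    """Score `candidate` for a user, favoring items the user acted on recently."""
    return sum(sim.get((candidate, j), 0.0) * decay(abs(now - t_j), beta)
               for j, t_j in user_items.items())
```

Under these assumptions, two items operated on at the same moment contribute a full unit of co-occurrence, while items operated on far apart contribute much less.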

As a further improvement of the above technical solution, a K-means clustering algorithm is used to cluster the users into K clusters and obtain cluster information; when the nearest neighbors of a user are queried, only the users in that user's cluster are searched, and the similarity between the user and the users in the cluster is recalculated to find the first N users and complete the push. The process includes:

initializing the scoring matrix, the target user ui, the matrix completion parameters, the neighbor parameter M and the time decay parameter;

solving the user scoring matrix with the SVT algorithm to complete the matrix;

clustering the completed matrix with a K-means algorithm in the big data environment so that users are divided into highly correlated clusters, and, within the cluster containing the target user, finding the Top-K users most similar to the target user as the target user's neighbor set;

performing personalized scenic spot pushing using the similarity that incorporates the time factor, and selecting the first N contents as the push result to complete the TOP-N push.
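The clustering and in-cluster neighbor search can be sketched as follows. The sketch assumes the scoring matrix has already been completed (for example by the SVT step, which is not shown), uses Euclidean distance as a stand-in for the similarity measure, and seeds the centroids with the first k rows purely to stay deterministic. None of these choices is stated in the source.

```python
import math


def kmeans(rows, k, iterations=20):
    """Plain K-means over dense rating rows; returns one cluster label per row."""
    centroids = [list(r) for r in rows[:k]]  # deterministic seeding (assumption)
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each user joins the nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(r, centroids[c]))
                  for r in rows]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [r for r, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def neighbors_in_cluster(target, rows, labels, top_k):
    """Search only the target user's cluster for the closest users."""
    peers = [u for u in range(len(rows))
             if u != target and labels[u] == labels[target]]
    peers.sort(key=lambda u: math.dist(rows[target], rows[u]))
    return peers[:top_k]
```

Restricting the search to one cluster is what reduces the neighbor lookup from all users to the members of a single cluster.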

As a further improvement of the above technical solution, preprocessing the scenic spot data corresponding to the scenic spot evaluation index comprises:

converting the collected user information into a two-dimensional matrix and representing it numerically to obtain user data, performing noise reduction and normalization on the user data, and constructing the preprocessed user data into the scoring matrix of users and scenic spots;

applying a similarity calculation formula to the user data to obtain a TOP-N list of similar users corresponding to the first similarity, obtaining the data already scored by the similar users, and obtaining the predicted score of the target user by weighted average calculation;

pushing to the user, as the generated push result, the TOP-N scenic spots sorted by predicted score.

As a further improvement of the above technical solution, pushing to the user, as the generated push result, the TOP-N scenic spots sorted by predicted score comprises:

dividing the data set corresponding to the user data into a training set and a test set, and training a model of the user's behavior and interests with the training set to obtain a training result;

applying the test set data to the trained model for testing, and comparing the training set data with the test set results to calculate the prediction accuracy of the model, wherein the score prediction process comprises:

predicting the user's score for scenic spots the user has not rated by analyzing the data the user has already scored, wherein the accuracy of the predicted scores is measured by the root mean square error (RMSE), expressed as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}{|T|}}$$

where T denotes the data set used to test the model, $|T|$ is the number of elements in T, u denotes a user, i denotes a scenic spot, $r_{ui}$ denotes the true score of u for i derived from the training set, and $\hat{r}_{ui}$ denotes the predicted score of u for i derived from the prediction set.
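As a small illustration, the root mean square error over a test set can be computed as below. The tuple layout of the test set and the `predict` callable are assumptions made for the example.

```python
import math


def rmse(test_set, predict):
    """Root mean square error over the test set T.

    test_set: iterable of (user, spot, true_score) tuples.
    predict:  callable (user, spot) -> predicted score.
    """
    records = list(test_set)
    if not records:
        raise ValueError("test set must not be empty")
    squared_error = sum((r - predict(u, i)) ** 2 for u, i, r in records)
    return math.sqrt(squared_error / len(records))
```

A lower value means the predicted scores are closer to the true scores; a model that is off by exactly one point on every test record has an RMSE of 1.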

As a further improvement of the above technical solution, calculating, according to the index weights and the degree of fit with user scores, the first similarity between every two scenic spots from the data in the index matrix of scenic spots and evaluation indexes comprises:

using a content-based collaborative filtering algorithm to push scenic spots similar to those the user previously liked, wherein the first similarity is computed between pairs of scenic spots and the user's interest in a scenic spot is attenuated over time;

after the first similarity of the scenic spots is obtained, the interest of user u in scenic spot j is calculated from the set of scenic spots that user u likes, the set of K scenic spots most similar to scenic spot j, the first similarity between scenic spot i and scenic spot j, and the interest of user u in scenic spot i.
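The four quantities named above fit the usual item-based prediction, in which the interest of user u in scenic spot j is the sum, over the scenic spots i that u likes and that are among the K most similar to j, of the similarity between i and j multiplied by u's interest in i. Since the original expression is not available, the sketch below assumes that form; the names and data layout are hypothetical.

```python
def interest_in_spot(liked, similarity, target_spot, k):
    """Assumed item-based prediction of a user's interest in `target_spot`.

    liked:      {spot: interest of the user in that spot}
    similarity: {spot: {other_spot: first similarity}}
    k:          size of the most-similar set considered for `target_spot`
    """
    top_k = sorted(similarity.get(target_spot, {}).items(),
                   key=lambda kv: kv[1], reverse=True)[:k]
    # Only spots that are both liked by the user and in the top-K set contribute.
    return sum(w * liked[i] for i, w in top_k if i in liked)
```

With K = 2, a liked spot that ranks third in similarity to the target contributes nothing, which is the effect of intersecting the liked set with the top-K set.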

As a further improvement of the above technical solution, the first similarity of the scenic spots is calculated using the Euclidean distance, and the process includes:

after the weight of each item of data in the evaluation index is determined, constructing from the collected data a matrix of scenic spots and evaluation indexes, wherein each row represents the data of one scenic spot and each column represents one constructed evaluation index, and multiplying the data of each column by the corresponding index weight to obtain a scenic spot-weighted index matrix;

performing similarity calculation on the scenic spot data in the weighted matrix by computing the Euclidean distance between every two scenic spots, and selecting the TOP-K scenic spots ranked by Euclidean distance as the similar scenic spots of each scenic spot.
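The weighting and distance steps can be sketched as follows. The sketch assumes that a smaller Euclidean distance means a more similar scenic spot and that index values are already normalized; the function names are illustrative only.

```python
import math


def weighted_matrix(index_matrix, weights):
    """Multiply each evaluation-index column by its weight."""
    return [[value * w for value, w in zip(row, weights)] for row in index_matrix]


def similar_spots(index_matrix, weights, spot, top_k):
    """Return the indices of the top_k spots closest to `spot` in weighted space."""
    wm = weighted_matrix(index_matrix, weights)
    others = [j for j in range(len(wm)) if j != spot]
    others.sort(key=lambda j: math.dist(wm[spot], wm[j]))
    return others[:top_k]
```

Because the weights scale each column before the distance is taken, a heavily weighted index separates scenic spots more strongly than a lightly weighted one.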

In a second aspect, the invention further provides an information pushing device based on cloud computing and big data, comprising:

an acquisition unit for obtaining and classifying scenic spot information, extracting keyword information from the scenic spot information to determine the type of the scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;

a preprocessing unit for preprocessing the scenic spot data corresponding to the scenic spot evaluation index and establishing a scoring matrix of users and scenic spots and an index matrix of scenic spots and evaluation indexes;

a computing unit for calculating, according to the index weights and the degree of fit with user scores, a first similarity between every two scenic spots from the data in the index matrix of scenic spots and evaluation indexes, and selecting the TOP-K scenic spots in order of the first similarity to obtain a nearest neighbor set;

an information pushing unit for completing the pushing of scenic spot information, based on the scoring matrix of users and scenic spots, according to the TOP-K scenic spot data and the TOP-K user data.

The invention provides an information pushing method and device based on cloud computing and big data. Scenic spot information is obtained and classified, keyword information is extracted from it to determine the type of the scenic spot, and a scenic spot evaluation index is constructed according to the type of the scenic spot. The scenic spot data corresponding to the evaluation index is preprocessed, and a scoring matrix of users and scenic spots and an index matrix of scenic spots and evaluation indexes are established. The weight of each index in the scenic spot evaluation index is determined by the analytic hierarchy process, a nearest neighbor set is obtained from the index matrix according to the index weights and the degree of fit with user scores, and, based on the scoring matrix of users and scenic spots, the pushing of scenic spot information is completed according to the TOP-K scenic spot data and the TOP-K user data. By establishing scenic spot evaluation indexes, collecting the related scenic spot index data, and analyzing and processing it together with the determined index weights to obtain a scenic spot-index system matrix, the invention achieves higher pushing quality, provides better information pushing for users, and improves the accuracy of information pushing and the working efficiency of the system.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and those skilled in the art can also obtain other related drawings based on the drawings without inventive efforts.

Fig. 1 is a flowchart of an information pushing method based on cloud computing and big data according to the present invention;

Fig. 2 is a process diagram of the scenic spot data preprocessing of the present invention;

Fig. 3 is a block diagram of an information pushing device based on cloud computing and big data according to the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention and are not to be construed as limiting the present invention.

Referring to Fig. 1, the invention provides an information pushing method based on cloud computing and big data, applied to a cloud platform. The cloud platform comprises a data input layer, a recommendation algorithm layer and a data output layer. The data input layer is used for inputting user data; the recommendation algorithm layer is used by the cloud platform to integrate the user data so as to classify all information and provide recommendation information; and the data output layer is used for outputting to the system background, as the push content to be returned, the result obtained by the recommendation algorithm layer after unifying and personalizing all data of the data input layer. The method comprises the following steps:

S1: obtaining and classifying scenic spot information, extracting keyword information from the scenic spot information to determine the type of the scenic spot, and constructing a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises index data of the scenic spot and score data of users on the scenic spot;

S2: preprocessing the scenic spot data corresponding to the scenic spot evaluation index, and establishing a scoring matrix of users and scenic spots and an index matrix of scenic spots and evaluation indexes;

S3: obtaining a nearest neighbor set from the data in the index matrix of scenic spots and evaluation indexes according to the index weights and the degree of fit with user scores;

S4: calculating, based on the scoring matrix of users and scenic spots, a second similarity between users from their scores on the scenic spots to obtain the nearest neighbor users of the user, selecting the TOP-K users, and completing the pushing of scenic spot information according to the TOP-K scenic spot data and the TOP-K user data.

In this embodiment, completing the pushing of scenic spot information according to the TOP-K scenic spot data and the TOP-K user data comprises: obtaining the scores given by the TOP-K neighbor users to the scenic spots that the user has not scored, and predicting the user's scores for those unscored scenic spots by a weighted average method; determining the number of scenic spots with high predicted scores and, when that number is smaller than the length of the recommendation list, filling the missing entries with scenic spots similar to those already in the list. User modeling is an important part of the tourism pushing system and plays a decisive role in the push results. The tourism pushing system serves users, and the basis for pushing to a user is obtaining the user's preference information: the more comprehensive the information, the more personalized the push results. For this reason, most work concentrates on extracting user information more completely; however, analyzing and calculating the information of the tourist attractions that make up the pushing system is equally important.

Specifically, the user's degree of preference for content is calculated from the hobby, age, gender and address features: a preference value is obtained for each feature of the current user's registration information, the final preference value is obtained by weighted summation, and the top n scenic spots of greatest interest are selected for display on the page. When a newly pushed scenic spot is added to the database of the cloud platform, it has no user behavior information, so the collaborative filtering algorithm cannot find similar items to push and the new content cannot be recommended to users. In this case a content-based recommendation algorithm can be used in combination: the features of the content are extracted, and similar content is found and recommended for content newly added to the system. The content category is the main feature, and the content title, purpose and target audience are secondary features. The weights are determined so that the more important a feature, the larger its weight; the more functions the content has and the more detailed its label information, the higher the accuracy of pushing similar content.

It should be noted that, because users' scoring behavior is random, the user-scenic spot scoring matrix is extremely sparse. When the TOP-K similar users or scenic spots are calculated, the usable data is limited and a large amount of useless data takes part in the calculation, so the resulting accuracy is low and the pushing quality of the pushing system suffers, leaving the push effect below expectations. The scoring matrix is a very sparse, low-rank matrix. In personalized pushing this can be analyzed from the perspective of users and from the perspective of pushed items: if two users both favor scenic spot i, their preferences for other pushed scenic spots are also likely to be similar; likewise, if a user favors both scenic spot i and scenic spot j, other users' preferences for scenic spots i and j are likely to be similar. This assumption is reflected in a matrix M as the low rank of the matrix, and the matrix is completed according to the low-rank property of the scoring matrix so as to alleviate the data sparsity problem.

When the system pushes to users, it must calculate the similarity between every pair of users to find each user's nearest neighbors. When there are few users and little content, information can be pushed quickly, but as the numbers of users and scenic spots grow, the calculation consumes time and occupies system resources. Cluster analysis is therefore performed on the completed click-score matrix to divide the users into several clusters; when a user's nearest neighbors are searched, only the user's cluster needs to be searched rather than all users, which shortens the nearest neighbor search time and reduces the complexity of the algorithm. The users are clustered into K clusters with the K-means clustering algorithm; for a specific user, the cluster information is obtained, the users in that cluster are searched when looking for the user's nearest neighbors, the similarity between the user and the users in the cluster is then recalculated, and the first N users are found to complete the push. The data of the pushing system includes not only user rating data but also much implicit data, such as user browsing information, rating time and rating location, which plays an important role in interest mining.

It should be understood that the sparse user scoring matrix is completed by the SVT algorithm, the users are clustered by the K-means algorithm to narrow the neighbor search range, and the TOP-K push is completed with the similarity calculation method that incorporates the time factor, the first K scenic spots being displayed. After a new user registers, there is no behavior information, so pushing can only start from the registration information, which includes the user's hobbies, gender, age, address and other information. For a personalized pushing system, the user's hobby information is the most important, followed by gender, then age, and finally address. When the user's preferences are calculated, these features receive different weights: the more important the feature, the higher its weight. The pushing process based on registration information may be: obtaining the user registration information and classifying the user according to it, where the classification may be a multi-way classification over several features; pushing to the user the scenic spots most liked by users in the categories to which the user belongs; and weighting and summing the user's preferences over each category, that is, each feature. The core problem is to calculate, for each feature f and each scenic spot i, the degree to which users having feature f like scenic spot i, which is computed from the set of users interested in scenic spot i and the set of users whose features contain f.
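The preference expression itself is missing from this text. The sketch below assumes one common form, the share of a scenic spot's interested users who also have feature f, with a smoothing constant `alpha` in the denominator so that rarely visited scenic spots are not over-rated. The form, the constant and the feature weights in the example are all assumptions.

```python
def feature_preference(interested_users, users_with_feature, alpha=1.0):
    """Assumed form: |N(i) & U(f)| / (|N(i)| + alpha)."""
    overlap = len(interested_users & users_with_feature)
    return overlap / (len(interested_users) + alpha)


def cold_start_score(spot_fans, feature_users, weights, alpha=1.0):
    """Weighted sum of per-feature preferences for a newly registered user.

    spot_fans:     set of users interested in the scenic spot
    feature_users: {feature: set of users sharing that feature value}
    weights:       {feature: weight}, larger for more important features
    """
    return sum(w * feature_preference(spot_fans, feature_users[f], alpha)
               for f, w in weights.items())
```

Ranking all scenic spots by `cold_start_score` for a new user's feature values and keeping the first n would give the initial push list described in the text.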

Optionally, calculating, based on the scoring matrix of users and scenic spots, the second similarity between users from their scores on the scenic spots to obtain the nearest neighbor users of the user comprises:

acquiring the specific rating data of users on scenic spots and constructing it into a user-scenic spot scoring matrix, wherein the rows of the matrix represent users, the columns of the matrix represent scenic spots, and each entry of the matrix represents the score given by user n to scenic spot m;

after the matrix is obtained, calculating the similarity between users or between scenic spots from the specific user scoring data using different similarity calculation formulas, sorting the calculation results to obtain the K neighbors of a user or scenic spot, and generating the push result by selecting from the data of the neighbor users;

when a tourist attraction is to be pushed to a target user, first determining the neighbor users of the target user; if the neighbor users' scores for the scenic spot are higher than a preset threshold, predicting that the target user's score for the scenic spot will also be high and pushing the scenic spot to the user; otherwise, not pushing it.

In this embodiment, the similarity calculation process includes a modified similarity and a modified prediction formula. In the modified similarity, the longer the interval between the user's operations on content i and content j, the smaller the value of the attenuation function; the attenuation function contains a time decay parameter, which is a hyper-parameter. The modified prediction formula likewise contains a parameter that controls the degree of time attenuation: the smaller the difference between the two times compared in the formula, the higher the content that is highly similar to content j is ranked in the push list of target user u. A K-means clustering algorithm is used to cluster the users into K clusters and obtain cluster information; when the nearest neighbors of a user are queried, the users in that user's cluster are searched and the similarity between the user and the users in the cluster is recalculated to find the first N users and complete the push. The process includes: initializing the scoring matrix, the target user ui, the matrix completion parameters, the neighbor parameter M and the time decay parameter; solving the user scoring matrix with the SVT algorithm to complete the matrix; clustering the completed matrix with a K-means algorithm in the big data environment so that users are divided into highly correlated clusters, and, within the cluster containing the target user, finding the Top-K users most similar to the target user as the target user's neighbor set; and performing personalized scenic spot pushing using the similarity that incorporates the time factor, selecting the first N contents as the push result to complete the TOP-N push.

It should be noted that in an actual system a user cannot score all tourist attractions one by one, and conversely a scenic spot will not be scored by all users, so in practice the matrix is usually sparse. After the matrix is obtained, the similarity between users or between scenic spots can be calculated from the specific user scoring data using different similarity calculation formulas; the results are sorted to obtain the K neighbors of a user or scenic spot, and the recommendation result is generated by selecting from the data of the neighbor users. When a tourist attraction is to be pushed to a target user, the neighbor users of the target user are determined first; if the neighbor users' scores for the scenic spot are generally high, the target user's score for the scenic spot is predicted to be high as well and the scenic spot is pushed to the user; otherwise it is not pushed. The push result depends mainly on user rating data, so the algorithm cannot provide personalized pushing for a new user of the system, who has not yet generated enough data, nor can it push a scenic spot newly added to the system, which has no rating data. The similarity can be calculated by collecting the user's explicit or implicit behaviors, such as rating, forwarding, saving, tagging, commenting, bookmarking, clicking, page dwell time and purchasing, and processing them; the closer the calculated results, the more similar the users, and similar users are considered to share interests. The algorithm process may be as follows: all collected user information is converted into a two-dimensional matrix and represented numerically; the data is denoised and normalized; the preprocessed data is constructed into a user-scenic spot scoring matrix; a similarity calculation formula is applied to the data to obtain a TOP-N list of similar users; the data already scored by the similar users is obtained; the predicted score of the target user is obtained by weighted average calculation; and the TOP-N results sorted by predicted score are pushed to the user as the generated push result, thereby improving the accuracy of information pushing.

Referring to fig. 2, optionally, the preprocessing of the sight spot data corresponding to the sight spot evaluation index includes:

s10: converting the collected user information into a two-dimensional matrix, carrying out digital representation to obtain user data, carrying out noise reduction and normalization processing on the user data, and constructing the preprocessed user data into a scoring matrix of the user and the scenic spots;

s11: applying a similarity calculation formula to the user data to obtain the TOP-N similar user list corresponding to the first similarity, obtaining the scored data of the similar users, and obtaining the predicted score of the target user by weighted average calculation;

s12: pushing the TOP-N sorted according to the predicted score values to the user as the generated pushing result.
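Steps s11 and s12 can be sketched as a small neighborhood-based predictor. The sketch assumes cosine similarity over co-scored scenic spots as the similarity formula and a similarity-weighted average as the prediction; the description does not fix either choice at this point, and the names `cosine_sim` and `predict_scores` are hypothetical.

```python
import numpy as np

def cosine_sim(a, b):
    """Cosine similarity over the scenic spots both users have scored."""
    mask = ~np.isnan(a) & ~np.isnan(b)
    if not mask.any():
        return 0.0
    x, y = a[mask], b[mask]
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom else 0.0

def predict_scores(R, target, top_n=2):
    """Predict the target user's unscored spots from the TOP-N similar users."""
    sims = np.array([cosine_sim(R[target], R[v]) if v != target else -np.inf
                     for v in range(R.shape[0])])
    neighbors = np.argsort(sims)[::-1][:top_n]      # most similar users first
    preds = {}
    for i in np.where(np.isnan(R[target]))[0]:
        rated = [v for v in neighbors if not np.isnan(R[v, i]) and sims[v] > 0]
        if rated:
            w = sims[rated]
            preds[int(i)] = float(w @ R[rated, i] / w.sum())   # weighted average
    # Sort by predicted score, highest first, to form the push ranking.
    return sorted(preds.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scoring matrix: rows are users, columns are scenic spots.
R = np.array([[5, 4, np.nan, np.nan],
              [5, 4, 4, 2],
              [4, 5, 5, 1],
              [1, 2, 1, 5]], dtype=float)
ranking = predict_scores(R, target=0, top_n=2)
```

For user 0, users 1 and 2 are the two nearest neighbors, so scenic spot 2 (scored highly by both) is ranked ahead of scenic spot 3.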

In this embodiment, pushing the TOP-N sorted according to the predicted score values to the user as the generated pushing result comprises: dividing the data set corresponding to the user data into a training set and a test set, and training a model of the user's behavior and interests on the training set to obtain a training result; applying the test set data to the trained model, and comparing the model output with the test set results to calculate the prediction accuracy of the model. The score prediction process comprises: predicting the user's score for unscored scenic spots by analyzing the data the user has already scored, wherein the predicted score is evaluated by the root mean square error (RMSE), expressed as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}{|T|}}$$

where T represents the data set used to test the model, the number of elements in T is $|T|$, u denotes a user, i denotes an attraction, $r_{ui}$ represents the true score of u for i derived from the training set, and $\hat{r}_{ui}$ represents the predicted score of u for i derived from the prediction set.

The data weights are determined by the expert scoring method and the analytic hierarchy process (AHP). The determination process can be as follows: first, 10 experts are selected to score 9 evaluation indexes, and a judgment matrix is constructed. The judgment matrix is constructed by calculating the average value of each analysis item and then dividing the averages pairwise; the larger the average, the higher the importance and the higher the weight. After the judgment matrix is obtained, the CR value needs to be calculated, expressed as

$$CR = \frac{CI}{RI}$$
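The RMSE evaluation described in this embodiment can be sketched as below. The test triples, the lookup-table predictor and the function name `rmse` are hypothetical; only the formula itself follows the definition of root mean square error.

```python
import math

def rmse(test_set, predict):
    """RMSE over test triples (u, i, r_ui): sqrt(sum((r_ui - r_hat_ui)^2) / |T|)."""
    if not test_set:
        raise ValueError("test set T must not be empty")
    sq_err = sum((r - predict(u, i)) ** 2 for u, i, r in test_set)
    return math.sqrt(sq_err / len(test_set))

# Hypothetical test set T and predicted scores.
T = [("u1", "spot_a", 4.0), ("u1", "spot_b", 2.0), ("u2", "spot_a", 5.0)]
predicted = {("u1", "spot_a"): 3.0, ("u1", "spot_b"): 2.0, ("u2", "spot_a"): 3.0}
error = rmse(T, lambda u, i: predicted[(u, i)])
```

Here the squared errors are 1, 0 and 4, so the result is the square root of 5/3; a lower value indicates that the predicted scores fit the held-out scores more closely.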

The consistency check, that is, the calculation of the CR value, proceeds as follows. First, the CI value is calculated as

$$CI = \frac{\lambda_{\max} - n}{n - 1}$$

where $\lambda_{\max}$ is the largest eigenvalue of the judgment matrix and n is its order. The RI value is then obtained from the order of the judgment matrix, the CR value is obtained from the CI value and the RI value, and whether the obtained weights are consistent is judged from the result. The criterion for judging matrix consistency is the CR value: the smaller the CR value, the higher the consistency of the matrix, and the threshold is 0.1. According to the constructed scenic spot evaluation index, a tourist attraction has 13 index values in total, of which 9 can be used for representing the weight, so the judgment matrix is a 9th-order matrix. The CI value is 0.000, the RI value found by table lookup is 1.460, and the calculated CR value is

$$CR = \frac{CI}{RI} = \frac{0.000}{1.460} = 0.000 < 0.1$$

It can be seen that the evaluation index judgment matrix passes the consistency test, so the obtained weights are consistent.
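The weight determination and consistency check can be sketched as follows, assuming the judgment matrix is formed as the ratio of mean expert scores, as described above. The mean scores and the function name `ahp_weights` are hypothetical. A matrix built purely from ratios of means is perfectly consistent, so its largest eigenvalue equals its order and CI is zero up to rounding, which agrees with the CI value of 0.000 reported in this embodiment.

```python
import numpy as np

def ahp_weights(J, ri):
    """Return (weights, CI, CR) for a positive reciprocal judgment matrix J."""
    n = J.shape[0]
    eigvals, eigvecs = np.linalg.eig(J)
    k = np.argmax(eigvals.real)                 # index of the largest eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                             # normalize weights to sum to 1
    ci = (lam_max - n) / (n - 1)                # consistency index
    cr = ci / ri                                # consistency ratio
    return w, ci, cr

# Hypothetical mean expert scores for 9 evaluation indexes (not from the source).
means = np.array([4.5, 4.0, 3.5, 3.0, 4.2, 3.8, 2.5, 3.2, 4.8])
J = means[:, None] / means[None, :]             # J[i, j] = mean_i / mean_j
weights, ci, cr = ahp_weights(J, ri=1.46)       # RI = 1.460 as looked up in the text
```

The random index is passed in as a parameter because the value depends on the reference table used; 1.460 is the value stated in the description for a 9th-order matrix.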

Optionally, calculating the first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix, according to the index weight and the fitting degree of the user scores, comprises:

attenuating over time the scenic spots in which the user is interested, according to a time attenuation expression;

after the first similarity of the scenic spots is obtained, expressing the interest of the user in a scenic spot as

$$p_{uj} = \sum_{i \in N(u) \cap S(j,K)} w_{ji}\, r_{ui}$$

where $N(u)$ denotes the set of scenic spots that user u likes, $S(j,K)$ denotes the set of K scenic spots most similar to scenic spot j, $w_{ji}$ denotes the first similarity of scenic spot i and scenic spot j, and $r_{ui}$ denotes the interest of user u in scenic spot i.
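The interest calculation over the set of liked scenic spots and the K most similar scenic spots can be sketched as below. The sketch assumes the standard item-based collaborative filtering form, in which the interest in scenic spot j is the sum of similarity times interest over the scenic spots that are both liked by the user and among the K most similar to j. The time attenuation term is not modeled, the data are invented, and the function name `interest` is hypothetical.

```python
def interest(user_likes, sim, j, k=2):
    """Sum of w_ji * r_ui over i in N(u) intersected with S(j, K)."""
    # S(j, K): the K scenic spots most similar to j.
    s_jk = sorted(sim[j], key=sim[j].get, reverse=True)[:k]
    # N(u) is the key set of user_likes; r_ui is the stored interest value.
    return sum(sim[j][i] * user_likes[i] for i in s_jk if i in user_likes)

# Hypothetical first similarities of "lake" to other scenic spots.
sim = {"lake": {"mountain": 0.9, "museum": 0.2, "temple": 0.6}}
# Hypothetical interests of one user (1.0 means the user liked the spot).
likes = {"mountain": 1.0, "museum": 1.0}
score = interest(likes, sim, "lake", k=2)
```

With K = 2 the neighbors of "lake" are "mountain" and "temple"; only "mountain" is also liked by the user, so the interest score equals the similarity 0.9.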

In this embodiment, the first similarity of the scenic spots is calculated using the Euclidean distance, and the process includes: after the weight of each item of data in the evaluation index is determined, a matrix of scenic spots and evaluation indexes is constructed from the collected data, in which each row represents the data of one scenic spot and each column represents one constructed evaluation index; the data of each column are multiplied by the corresponding index weight to obtain the scenic spot and weighted index matrix; similarity is then calculated on the scenic spot data in this matrix by computing the Euclidean distance between every two scenic spots.

It should be noted that, for example, for scenic spot X and scenic spot Y, the index data set of scenic spot X and the index data set of scenic spot Y are combined to obtain a similarity value. The number of scenic spots does not change much and is far smaller than the number of users, so the obtained scenic spot-evaluation index matrix is dense and the variables essentially share common values; the Euclidean distance is therefore chosen for the scenic spot similarity. A user-scenic spot scoring matrix is constructed according to the users' scores of the scenic spots; each user corresponds to an m-dimensional vector, in which each entry represents the score of the nth user for the mth scenic spot. When the user is one who has already given scores in the system, the feature vector of the user is taken from the user-scenic spot scoring matrix and its similarity with the feature vectors of other users is calculated to obtain the TOP-N similar users. When the user has not scored any scenic spot in the system, user features are extracted from the user information to calculate the user similarity, and the weighted average of the similar users' scores for the scenic spots is taken as the new user's score and added to the scoring matrix. The nearest neighbor users of the user are then obtained, the predicted scores for the scenic spots the user has not scored are calculated as a weighted average over the neighbor users, and the TOP-N scenic spot list is obtained.
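The Euclidean-distance calculation of the first similarity described in this embodiment can be sketched as follows. The index values, the weights and the function name `top_k_similar` are hypothetical; the sketch only assumes that each column of the scenic spot and evaluation index matrix is scaled by its index weight before pairwise distances are computed.

```python
import numpy as np

def top_k_similar(index_matrix, weights, k=2):
    """For each scenic spot, return the K nearest spots by Euclidean distance."""
    W = index_matrix * weights                    # scale each column by its weight
    diff = W[:, None, :] - W[None, :, :]          # pairwise row differences
    dist = np.sqrt((diff ** 2).sum(axis=2))       # pairwise distance matrix
    np.fill_diagonal(dist, np.inf)                # a spot is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k], dist

# Hypothetical data: 4 scenic spots x 3 normalized evaluation indexes.
A = np.array([[0.9, 0.8, 0.1],
              [0.8, 0.9, 0.2],
              [0.1, 0.2, 0.9],
              [0.2, 0.1, 0.8]])
w = np.array([0.5, 0.3, 0.2])                     # hypothetical index weights
neighbors, dist = top_k_similar(A, w, k=1)
```

A smaller distance means a higher similarity, so the first column of `neighbors` holds the most similar scenic spot for each row; in this data spots 0 and 1 pair up, as do spots 2 and 3.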

It should be understood that, if the number of scenic spots in the finally generated push list L is s, the number n of scenic spots in the TOP-N list whose predicted user score is greater than or equal to 3 points is determined first. When n ≥ s, the first s scenic spots of the list are taken and filled into list L to generate the push list. When n < s, the n scenic spots are taken from the list and filled into list L, and the remaining s − n positions are filled by screening, from the similar scenic spots of the n scenic spots in list L, the scenic spots that rank highest when the similarity ranking and the user score ranking are combined; these scenic spots and the existing n scenic spots together form the final push list L, which is pushed to the user.
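The filling of the push list L can be sketched as below. This is a simplified illustration: candidates for the remaining positions are ranked by similarity only, whereas the description combines the similarity ranking with the user score ranking; the function name `build_push_list` and the sample data are hypothetical.

```python
def build_push_list(predicted, similar, s, threshold=3.0):
    """Build a push list of length s.

    predicted: list of (spot, predicted_score), sorted by score descending.
    similar:   dict mapping spot -> list of (similar_spot, similarity).
    """
    good = [spot for spot, score in predicted if score >= threshold]
    if len(good) >= s:                 # enough well-scored spots: take the first s
        return good[:s]
    push = list(good)
    # Candidates: similar spots of the spots already listed, best similarity first.
    candidates = sorted(
        ((sim, cand) for spot in good for cand, sim in similar.get(spot, [])),
        reverse=True,
    )
    for _, cand in candidates:
        if len(push) == s:
            break
        if cand not in push:           # skip duplicates
            push.append(cand)
    return push

predicted = [("A", 4.6), ("B", 3.2), ("C", 2.1)]
similar = {"A": [("D", 0.9), ("C", 0.4)], "B": [("E", 0.7), ("D", 0.6)]}
full_list = build_push_list(predicted, similar, s=4)
short_list = build_push_list(predicted, similar, s=1)
```

With s = 4 only two scenic spots reach the 3-point threshold, so the list is completed with the most similar candidates D and E; with s = 1 the top predicted spot alone is returned.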

Referring to fig. 3, the present invention further provides an information pushing apparatus based on cloud computing and big data, including:

an acquisition unit, configured to acquire and classify scenic spot information, extract keyword information from the scenic spot information, judge the type of the scenic spot, and construct a scenic spot evaluation index according to the type of the scenic spot, wherein the scenic spot evaluation index comprises scenic spot index data and the users' score data for the scenic spots;

a preprocessing unit, configured to preprocess the scenic spot data corresponding to the scenic spot evaluation index, and establish a scoring matrix of users and scenic spots and an index matrix of scenic spots and evaluation indexes;

a calculating unit, configured to calculate a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix according to the index weight and the fitting degree of the user scores, and to sort according to the first similarity to obtain a nearest neighbor set;

an information pushing unit, configured to calculate, based on the scoring matrix of users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest neighbor users of the user, and to complete the pushing of the scenic spot information according to the scenic spot data of the nearest neighbor set and the user data of the nearest neighbor users.

In this embodiment, the cosine similarity measures similarity by calculating the cosine of the included angle between two space vectors, thereby measuring the difference between objects, and is widely applied in push systems. Each vector is drawn in a coordinate space according to its coordinate values, and the similarity between vectors is calculated by formula; the cosine of the included angle between vectors lies in the range $[-1, 1]$, and the formula extends to vectors of any dimension. Whether two vectors point in the same direction is judged from the cosine value: if the cosine value between the vectors is close to 1, that is, the included angle is almost zero degrees, the two vectors can be judged to point in the same direction, and their lengths are irrelevant to this judgment. For the n-dimensional vectors $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_n)$, the cosine of the angle between them is

$$\cos\theta = \frac{\sum_{k=1}^{n} x_k y_k}{\sqrt{\sum_{k=1}^{n} x_k^{2}}\,\sqrt{\sum_{k=1}^{n} y_k^{2}}}$$

After the user information is processed and converted into character string vectors, the similarity is calculated using the cosine similarity. The Euclidean distance is the true distance between two points in a vector space; that is, the real spatial distance between two individuals is used to judge how similar they are. When the Euclidean distance is used, all dimensions must be kept on the same scale, because it is computed from the absolute distance between points in multidimensional space. The cosine similarity evaluates whether vectors point in the same direction, whereas the Euclidean distance evaluates the real distance between points, so the Euclidean distance is more applicable than the cosine similarity when user behavior is used as the index for calculating user similarity. The Euclidean distance between vector X and vector Y is

$$d(X, Y) = \sqrt{\sum_{k=1}^{n} \left(x_k - y_k\right)^{2}}$$

When similarity between users is calculated from user scores, the Euclidean distance emphasizes the fitting degree of the users' scores, while the cosine similarity better distinguishes how users are separated, that is, their score levels.
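The difference between the two measures can be illustrated with a short sketch. The two score vectors are invented for illustration, and the function names `cosine` and `euclidean` are hypothetical; the formulas are the standard cosine of the included angle and the standard Euclidean distance.

```python
import math

def cosine(x, y):
    """Cosine of the included angle between vectors x and y."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

def euclidean(x, y):
    """Euclidean distance between vectors x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

strict = [1, 2, 1]       # hypothetical user who gives low scores
generous = [2, 4, 2]     # same preferences, scores doubled
cos_val = cosine(strict, generous)
dist_val = euclidean(strict, generous)
```

The two vectors point in the same direction, so the cosine value is 1 up to rounding, while the Euclidean distance is the square root of 6 because the absolute scores differ. This is why the dimensions need a common scale before the Euclidean distance is applied.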

In all examples shown and described herein, any particular value should be construed as exemplary only and not as a limitation, and thus other examples of example embodiments may have different values.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.

The above examples are merely illustrative of several embodiments of the present invention, and the description thereof is more specific and detailed, but not to be construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the inventive concept, which falls within the scope of the present invention.

Claims

1. An information pushing method based on cloud computing and big data, characterized in that the method is applied to a cloud platform, the cloud platform comprising a data input layer, a recommendation algorithm layer and a data output layer, wherein the data input layer is used for inputting user data, the recommendation algorithm layer is used by the cloud platform to integrate the user data so as to classify all information and provide recommendation information, and the data output layer is used for outputting, according to the recommendation algorithm layer, the results of all data of the data input layer to a system background, where the results are processed in a unified and personalized manner and returned as the pushed content, the method comprising the following steps:

preprocessing the scenic spot data corresponding to the scenic spot evaluation index, and establishing a scoring matrix of users and scenic spots and an index matrix of scenic spots and evaluation indexes;

calculating a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix according to the index weight and the fitting degree of the user scores, and sorting and selecting according to the first similarity to obtain a nearest neighbor set;

calculating, based on the scoring matrix of users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest neighbor users of the user, and completing the pushing of the scenic spot information according to the scenic spot data of the selected nearest neighbor set and the user data of the selected neighbor users.

2. The information pushing method based on cloud computing and big data according to claim 1, characterized in that completing the pushing of the scenic spot information according to the scenic spot data and the user data comprises:

obtaining the scenic spot list with predicted scores, and judging the number of highly scored scenic spots among the predicted scores, and, when that number is less than the length of the recommended list, filling the missing part with scenic spots similar to the scenic spots in the list.

3. The information pushing method based on cloud computing and big data as claimed in claim 1, wherein the step based on the scoring matrix of users and scenic spots comprises:

acquiring the specific rating data of users on scenic spots, and constructing the specific rating data into a user-scenic spot rating matrix, wherein the rows of the matrix represent users, the columns of the matrix represent scenic spots, and each datum in the matrix represents the value of the score of user n for scenic spot m, the specific matrix being represented as

$$R = \begin{pmatrix} r_{11} & \cdots & r_{1m} \\ \vdots & \ddots & \vdots \\ r_{n1} & \cdots & r_{nm} \end{pmatrix}$$

after the matrix is obtained, calculating the similarity between users or between scenic spots using different similarity calculation formulas chosen according to the specific user rating data to obtain calculation results, sorting the calculation results to obtain the K nearest neighbors of the user or the scenic spot, and generating the pushing result by selecting from the data of the neighbor users;

when a tourist attraction needs to be pushed to a target user, first determining the neighbor users of the target user; if the neighbor users score the scenic spot highly, predicting that the target user's score for the scenic spot is also high and pushing the scenic spot to the user; otherwise, not pushing.

4. The information pushing method based on cloud computing and big data as claimed in claim 3, wherein the similarity computing process comprises a revised similarity and a revised prediction formula, and the revised similarity is expressed by a revised similarity expression

Wherein

The time of the user u performing the operation on the content i is represented, and the object of the f function is that the longer the time of the user performing the operation on the content i and the content j is, the longer the time is

The smaller the attenuation function used is

Wherein

A time-decay parameter is represented which is,

representing a hyper-parameter; the modified prediction formula is

Wherein

A time-decay parameter is represented which is,

a hyperparameter representing the degree of control time decay,

and

the smaller the phase difference, the higher the similarity of content j to content j will be ranked in the push list of target user u to a high similarity.

5. The information pushing method based on cloud computing and big data according to claim 3, wherein a K-means clustering algorithm is used to cluster users and form K clusters to obtain cluster information, when a nearest neighbor of a user is queried, the user in a cluster needs to be searched and a similarity value between the user and the cluster user needs to be recalculated to find the first N users and complete pushing, and the process includes:

initializing a scoring matrix

Parameters of target user ui and matrix insufficiency

Neighbor parameter M, time decay parameter

adopting the similarity with the introduced time factor to carry out personalized pushing based on scenic spots, and selecting the first N contents as the pushing result to complete the pushing.

6. The information pushing method based on cloud computing and big data as claimed in claim 1, wherein preprocessing the scenic spot data corresponding to the scenic spot evaluation index includes:

converting the collected user information into a two-dimensional matrix, carrying out digital representation to obtain user data, carrying out noise reduction and normalization processing on the user data, and constructing the preprocessed user data into a scoring matrix of the user and the scenic spots;

applying a similarity calculation formula to the user data to obtain the TOP-N similar user list corresponding to the first similarity, obtaining the scored data of the similar users, and obtaining the predicted score of the target user by weighted average calculation;

pushing the TOP-N sorted according to the predicted score values to the user as the generated pushing result.

7. The information pushing method based on cloud computing and big data according to claim 6, wherein pushing the TOP-N sorted according to the predicted score values to the user as the generated pushing result comprises:

predicting the user's score for unscored scenic spots by analyzing the data the user has already scored, wherein the predicted score is evaluated by the root mean square error (RMSE), expressed as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{(u,i)\in T}\left(r_{ui}-\hat{r}_{ui}\right)^{2}}{|T|}}$$

where T represents the data set used to test the model, the number of elements in T is $|T|$, u denotes a user, i denotes an attraction, $r_{ui}$ represents the true score of u for i derived from the training set, and $\hat{r}_{ui}$ represents the predicted score of u for i derived from the prediction set.

8. The information pushing method based on cloud computing and big data as claimed in claim 1, wherein calculating the first similarity from the data in the scenic spot and evaluation index matrix according to the index weight and the fitting degree of the user scores comprises:

using a content-based collaborative filtering algorithm to push sights similar to the sights liked by the user before, wherein the expression of the first similarity is

The expression of the scenic spot in which the user is interested according to the time attenuation is

；

After the first similarity of the scenic spots is obtained, the expression is

In which

Indicating that user u likes a collection of sights,

represents the set of K sights that are most similar to sight j,

representing a first similarity of sight i and sight j,

representing the interest of the user u in the attraction i.

9. The information pushing method based on cloud computing and big data as claimed in claim 1, wherein the first similarity of the scenic spot is calculated by euclidean distance, and the process includes:

after determining the weight of each item of data in the evaluation index, constructing from the collected data a matrix of scenic spots and evaluation indexes, wherein each row represents the data of one scenic spot and the columns represent the constructed evaluation index data, and multiplying the data of each column by the index weight to obtain a scenic spot and weighted index matrix;

performing similarity calculation on the scenic spot data in the scenic spot and weighted index matrix by calculating the Euclidean distance between every two scenic spots, and selecting the corresponding TOP-K as the similar scenic spots of each scenic spot.

10. An information pushing device based on cloud computing and big data, based on the information pushing method based on cloud computing and big data according to any one of claims 1 to 9, comprising:

a preprocessing unit, configured to preprocess the scenic spot data corresponding to the scenic spot evaluation index, and establish a scoring matrix of users and scenic spots and an index matrix of scenic spots and evaluation indexes;

a computing unit, configured to calculate a first similarity between every two scenic spots from the data in the scenic spot and evaluation index matrix according to the index weight and the fitting degree of the user scores, and to sort according to the first similarity to obtain a nearest neighbor set;

an information pushing unit, configured to calculate, based on the scoring matrix of users and scenic spots, a second similarity between users from their scores of the scenic spots to obtain the nearest neighbor users of the user, and to complete the pushing of the scenic spot information according to the scenic spot data of the selected nearest neighbor set and the user data of the selected neighbor users.