CN113220931B

CN113220931B - Multi-label song list recommendation method, system, device and storage medium

Info

Publication number: CN113220931B
Application number: CN202110316152.2A
Authority: CN
Inventors: 王晨旭; 郭晨野; 杨煜; 索凯强; 管晓宏
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2021-03-24
Filing date: 2021-03-24
Publication date: 2023-01-03
Anticipated expiration: 2041-03-24
Also published as: CN113220931A

Abstract

A multi-label song list recommendation method, system, device and storage medium. Song list data are divided into a test set and a training set, and a locality-sensitive hashing algorithm is applied to the song information, singer information and user information in the training set to compute a song information hash bucket, a singer information hash bucket and a user information hash bucket, respectively. The song, singer and user information in the test set is then mapped into the corresponding hash buckets to obtain a candidate set of similar song lists, from which an initial set of tags to be recommended is calculated. The song list tag sets in the training set are mined to obtain a set of tag association rules, the tags are reordered accordingly, and the top A tags are recommended. The invention achieves higher recommendation accuracy with lower time consumption. It is also more compatible with the collaborative-filtering-based recommendation models used by current online music platforms, so the cost and risk of upgrading a platform's recommendation algorithm are lower, and the method is simple and efficient.

Description

Multi-label song list recommendation method, system, equipment and storage medium

Technical Field

The invention relates to the field of music recommendation systems, and in particular to a multi-label song list recommendation method, system, device and storage medium.

Background

Song list tags play an important role in improving the listening experience of online music users and in encouraging users to create personalized song lists. Thanks to the development of big data technology, the characteristics of the songs in a song list can be inferred implicitly from a large number of expert-tagged song lists. Collaborative filtering, a classical recommendation algorithm, can extract implicit information about song lists from large amounts of data, find other song lists similar to a target song list, and then calculate tags that may suit the target song list from that set of similar song lists. However, although collaborative filtering is widely used in the big data era, the high-dimensional sparsity of song list data means that traditional collaborative filtering still suffers from low recommendation accuracy, high computational complexity and an inability to recommend tags online in real time for newly created song lists, which makes it difficult to apply in practice.

Disclosure of Invention

The invention aims to provide a multi-label song list recommendation method, system, device and storage medium.

In order to achieve the purpose, the invention adopts the following technical scheme:

A multi-label song list recommendation method, characterized by:

dividing the song list data into a test set and a training set, and applying a locality-sensitive hashing algorithm to the song information, singer information and user information in the training set to compute a song information hash bucket, a singer information hash bucket and a user information hash bucket, respectively;

hash-mapping the song, singer and user information in the test set, using the locality-sensitive hashing algorithm, into the song information hash bucket, the singer information hash bucket and the user information hash bucket, respectively, to obtain a candidate set of similar song lists;

calculating an initial set of tags to be recommended from the candidate set of similar song lists and the tag relevance weight of each song list;

mining the song list tag sets in the training set with the FP-Growth algorithm to obtain a set of tag association rules, reordering the tags in the initial set of tags to be recommended according to that rule set, and recommending the top A tags in the resulting order, where A is a preset value.

The invention is further improved in that the method specifically comprises the following steps:

Step 1: collect the song list data and divide it into a test set L_test and a training set L_train. Using the Min-Hash algorithm, reduce the song information, singer information and user information samples of the song lists in L_train to N x K dimensions, generating a user-song list signature matrix, a song list-singer signature matrix and a song list-song signature matrix, where N is the number of song lists and K is the number of random permutation hash functions in the Min-Hash algorithm;

Step 2: apply LSH bucketing to the user-song list, song list-singer and song list-song signature matrices of L_train so that similar samples fall into the same hash bucket, obtaining a song list-song hash bucket, a song list-singer hash bucket and a user-song list hash bucket;

Step 3: feed the Min-Hash signature vectors of the song information, singer information and user information of the target song list in L_test into the corresponding song list-song, song list-singer and user-song list hash buckets to quickly retrieve similar song lists, obtaining the candidate set Sim_set of song lists similar to the target song list;

Step 4: from the candidate set Sim_set and the tag relevance weights, calculate the set Rec_T of the z tags to be recommended that have the largest recommendation index;

Step 5: mine tag association rules from the song list tag combinations L_Tag in L_train with the FP-Growth algorithm, obtaining the tag association rule set rules_T;

Step 6: reorder the tags in Rec_T according to the rules in rules_T that satisfy the thresholds, then select the top A tags as the final recommendation result for the target song list.

A further improvement of the invention is that, in step 1, the song list-song signature matrix S_M is generated by the following process:

Step 1): map the songs of song list L_i in the training set to their song index values, generating the song index list corresponding to L_i, with one entry per song of L_i, each entry being that song's index value in the training set;

Step 2): initialize k and generate k different random permutation functions h_i(x), each of the form

$$h_i(x) = (a_i x + b_i) \bmod \mathrm{HASH\_PRIME}$$

where HASH_PRIME is a large prime number, a_i and b_i are random numbers in [1, HASH_PRIME - 1], and x is the input parameter;

Step 3): initialize the signature vector of the song index list of L_i. To update it, substitute every song index in the list into the k-th permutation function h_k and set the k-th component of the signature vector to the minimum of the resulting values;

Step 4): repeat step 3) until the signature vector of the song index list of L_i is complete, then update the remaining N - 1 song lists in L_train in the same way, finally obtaining the mapped song list-song signature matrix S_M.
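The Min-Hash steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the value chosen for `HASH_PRIME`, the helper names (`make_permutations`, `minhash_signature`, `signature_matrix`) and the sample song indices are all hypothetical, and the sketch stores one row per song list (N x K), whereas the bucketing steps in the text treat each song list's signature as a column.

```python
import random

# Assumed value: 2**31 - 1 is a prime; the text only says "a large prime number".
HASH_PRIME = 2_147_483_647


def make_permutations(k, seed=0):
    """Draw k (a, b) pairs, each value in [1, HASH_PRIME - 1]."""
    rng = random.Random(seed)
    return [(rng.randint(1, HASH_PRIME - 1), rng.randint(1, HASH_PRIME - 1))
            for _ in range(k)]


def minhash_signature(song_indices, perms):
    """Return the length-k signature of one song list's (non-empty) song index list."""
    signature = []
    for a, b in perms:
        # Entry for this hash function: smallest hashed value over all song indices.
        signature.append(min((a * x + b) % HASH_PRIME for x in song_indices))
    return signature


def signature_matrix(song_lists, perms):
    """Stack signatures: one row per song list, one column per hash function."""
    return [minhash_signature(songs, perms) for songs in song_lists]


perms = make_permutations(k=8)
lists = [[3, 17, 42, 99], [3, 17, 42, 100], [500, 601]]  # hypothetical song indices
S_M = signature_matrix(lists, perms)
print(len(S_M), len(S_M[0]))  # 3 8
```

Because each signature entry is a minimum over the set of song indices, the result does not depend on the order in which songs appear in the list.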

A further improvement of the invention is that, in step 2, the song list-song hash bucket is obtained by the following process:

Step 1): divide the song list-song signature matrix S_M into b segments, each consisting of r rows, where b x r = k;

Step 2): for each column of S_M, hash each segment into a hash bucket, obtaining a number of hash buckets; song lists that fall into the same hash bucket on any segment are regarded as similar song lists;

Step 3): compute the hash buckets to which the target song list in the test set is mapped, and take the song lists in those buckets as the set of song lists similar to the target song list, thereby obtaining the song list-song hash bucket.

Song list-song hash bucket Sim_LM:

Sim_LM = {L_i → m_j | i ∈ L, j ∈ M}

Song list-singer hash bucket Sim_LS:

Sim_LS = {L_i → s_j | i ∈ L, j ∈ S}

User-song list hash bucket Sim_LU:

Sim_LU = {u_j → L_i | i ∈ L, j ∈ U}

where M is the set of songs in the training set L_train, S is the set of singers in L_train, and U is the set of users in L_train.
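The segment-wise bucketing described above can be sketched in Python as follows. This is a hedged illustration only: the function names `build_buckets` and `query`, the song list ids and the toy signatures are hypothetical, and the sketch keys each bucket by the segment index together with the segment's contents rather than by any particular hash function the inventors may have used.

```python
from collections import defaultdict


def build_buckets(signatures, b, r):
    """Map (segment index, segment contents) -> set of song list ids."""
    buckets = defaultdict(set)
    for list_id, sig in signatures.items():
        assert len(sig) == b * r, "signature length must equal b * r"
        for seg in range(b):
            key = (seg, tuple(sig[seg * r:(seg + 1) * r]))
            buckets[key].add(list_id)
    return buckets


def query(buckets, sig, b, r):
    """Return ids of training song lists that share at least one segment with sig."""
    candidates = set()
    for seg in range(b):
        key = (seg, tuple(sig[seg * r:(seg + 1) * r]))
        candidates |= buckets.get(key, set())
    return candidates


# Toy signatures with k = 6, split into b = 3 segments of r = 2 rows each.
train = {
    "L1": [1, 2, 3, 4, 5, 6],
    "L2": [1, 2, 9, 9, 9, 9],  # shares segment 0 with L1
    "L3": [7, 7, 7, 7, 7, 7],
}
buckets = build_buckets(train, b=3, r=2)
print(sorted(query(buckets, [1, 2, 0, 0, 5, 6], b=3, r=2)))  # ['L1', 'L2']
```

Only the buckets that the target signature actually maps to are inspected, which is what makes the lookup fast compared with comparing the target against every training song list.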

A further improvement of the invention is that the specific process of step 4 is as follows:

Step 5.1: initialize the full tag set Tag of c tags, where c = 73, with the recommendation index of every tag initialized to 1;

Step 5.2: update the recommendation index of each tag in Tag according to the song list tag relevance weights. The calculation formula is as follows:

Step 5.3: calculate the recommendation indexes over the candidate set Sim_set of song lists similar to the target song list, then sort the tags by recommendation index from high to low to obtain the set Rec_T of the z tags with the largest recommendation index,

Wherein

The song list relevance weight is calculated by the following formula:

where the formula uses the frequency with which the tag combination of each song list L_i in the training set L_train occurs in L_train; L_i is a song list and t_i is a tag of L_i.

A further improvement of the invention is that the specific process of step 5 is as follows:

Step 6.1: construct the item header table and an empty FP tree, scan the song list tag combinations in the training set L_train and count each tag combination; according to the counts, delete the tag combinations whose support is below the minimum support min_supp to obtain the first frequent item set, store it in the item header table, and sort it in descending order of support;

Step 6.2: scan the song list tag combinations in L_train a second time, remove the infrequent items, and sort in descending order of support to obtain the sorted frequent item sets;

Step 6.3: insert the frequent item sets from step 6.2 into the FP tree in order;

Step 6.4: recursively mine the frequent item sets through the item header table, and filter out those that do not meet the minimum confidence min_conf to obtain the association rule set:

$$rules_T = \{R_1, R_2, \ldots, R_d\}$$

where rules_T is the set of tag association rules, d is the total number of association rules mined, and R_i is the i-th association rule.

The support is calculated by the following formula:

$$supp(t_i \rightarrow t_j) = \frac{\delta(t_i, t_j)}{\delta(L_{Tag})}$$

where δ(t_i, t_j) is the number of times the tag combination [t_i, t_j] occurs among the song list tag combinations in L_train, δ(L_Tag) is the total number of tag combinations in L_train, and supp(t_i → t_j) is the support of the combination.

The confidence is calculated by the following formula:

$$conf(t_i \rightarrow t_j) = \frac{\delta(t_i, t_j)}{\delta(t_i)}$$

where δ(t_i) is the total number of occurrences of tag t_i, and conf(t_i → t_j) is the probability that tag t_j appears given that tag t_i appears.
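As a small worked example of the support and confidence definitions, the following Python sketch computes both for a pair of tags. It assumes that each training song list contributes exactly one tag combination, so that the total number of tag combinations equals the number of song lists; the function names and the tag names are hypothetical and chosen only for illustration.

```python
def support(tag_sets, ti, tj):
    """Fraction of tag combinations that contain both ti and tj."""
    both = sum(1 for tags in tag_sets if ti in tags and tj in tags)
    return both / len(tag_sets)


def confidence(tag_sets, ti, tj):
    """Among tag combinations containing ti, the fraction that also contain tj."""
    with_ti = sum(1 for tags in tag_sets if ti in tags)
    both = sum(1 for tags in tag_sets if ti in tags and tj in tags)
    return both / with_ti if with_ti else 0.0


# Four hypothetical song lists, one tag combination each.
tag_sets = [
    {"rock", "english"},
    {"rock", "english", "classic"},
    {"rock", "live"},
    {"pop", "english"},
]
print(support(tag_sets, "rock", "english"))     # 0.5
print(confidence(tag_sets, "rock", "english"))  # 0.6666666666666666
```

Here "rock" and "english" occur together in 2 of 4 combinations (support 0.5), and "english" accompanies 2 of the 3 combinations that contain "rock" (confidence 2/3).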

A further improvement of the invention is that the specific process of step 6 is as follows:

Step 7.1: according to the implicit relationships between tags revealed by the association rules, recompute the recommendation index of each tag in the set Rec_T of tags to be recommended by the following formula:

Step 7.2: finally, sort Rec_T again from high to low by the resulting tag recommendation indexes, and select the top A tags as the final tag recommendation result for the target song list.

A multi-label song list recommendation system, comprising:

a hash bucket calculation module, configured to divide the song list data into a test set and a training set, and to apply a locality-sensitive hashing algorithm to the song information, singer information and user information in the training set to compute a song information hash bucket, a singer information hash bucket and a user information hash bucket, respectively;

a hash mapping module, configured to hash-map the song, singer and user information in the test set, using the locality-sensitive hashing algorithm, into the song information hash bucket, the singer information hash bucket and the user information hash bucket, respectively, to obtain a candidate set of similar song lists;

an initial tag set calculation module, configured to calculate an initial set of tags to be recommended from the candidate set of similar song lists and the tag relevance weight of each song list;

a tag reordering module, configured to mine the song list tag sets in the training set with the FP-Growth algorithm to obtain a set of tag association rules, to reorder the tags in the initial set of tags to be recommended according to that rule set, and to recommend the top A tags in the resulting order, where A is a preset value.

A computer device, comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the computer program, when executed by the processor, implements the multi-label song list recommendation method described above.

A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the multi-label song list recommendation method described above.

Compared with the prior art, the invention has the following beneficial effects:

To address the low recommendation efficiency and low recommendation accuracy of traditional collaborative filtering on NetEase Cloud Music song list data, the invention provides an LSH/Min-Hash-based song list tag recommendation method that incorporates FP-Growth. The tag recommendation process has three main steps. First, the authority of the tag combinations of the training set song lists is measured using the idea of tag-frequency weighting: the ability of a tag to express the category of a song list is captured by the song list tag combination frequency and the inverse tag frequency, which counters the tendency of unbalanced tag distributions to skew recommendations toward popular tags. Second, to quickly find the candidate set of song lists similar to a target song list in the test set, the Min-Hash algorithm applies K hash operations to the song list data matrix and maps the song list data into an N x K song list signature matrix, compressing the matrix and reducing its dimension. The LSH algorithm then hashes the compressed signature matrix b times so that locally similar song lists are mapped into the same hash bucket; song lists that share a bucket at least once are treated as candidate similar pairs for similarity calculation. This yields the nearest-neighbor set of the target song list, that is, its candidate set of similar song lists, and avoids the inefficiency of searching for similar song lists in large-scale data. Finally, after an initial tag recommendation result is formed from the song lists in the candidate set, high-quality implicit relationships between tags mined by FP-Growth are used to reorder the initial result, and the top A tags are selected as the final tag recommendation result for the target song list, improving the accuracy of the method. The invention is suited to tag recommendation on large-scale song list data and is simple in structure and efficient. Unlike traditional collaborative filtering, it can also recommend tags for newly created song lists in real time, with high accuracy and speed.
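To illustrate why sharing a bucket "at least once" separates similar from dissimilar song lists, the sketch below evaluates the standard LSH candidate probability 1 - (1 - s^r)^b for a pair whose Min-Hash rows agree independently with probability s. The function name and the values b = 20 and r = 5 are assumptions for illustration; the source does not state which b and r were used.

```python
def candidate_probability(s, b, r):
    """Probability that two signatures agree on all r rows of at least one of b segments."""
    return 1.0 - (1.0 - s ** r) ** b


# Hypothetical setting: k = 100 hash functions split into b = 20 segments of r = 5 rows.
for s in (0.2, 0.5, 0.8):
    print(s, round(candidate_probability(s, b=20, r=5), 4))
```

The curve is S-shaped in s: pairs with low similarity rarely become candidates, while pairs with high similarity almost always do, and the choice of b and r moves the threshold between the two regimes.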

Drawings

FIG. 1 shows the power-law distribution of the number of song lists containing each tag and each tag combination, where (a) is ranked by tag and (b) is ranked by tag combination.

FIG. 2 shows the distribution of the number of song lists containing the Top 15 tags and the Top 15 tag combinations, where (a) is the single-tag distribution and (b) is the tag combination distribution.

Fig. 3 is a flowchart of the multi-label song list recommendation method based on an improved collaborative filtering algorithm.

Fig. 4 is a schematic diagram of bucketing the song list signature matrix with the LSH algorithm.

FIG. 5 is a graph comparing time of recommendation versus accuracy of recommendations at different data set scales.

Fig. 6 is a comparison graph of the method (LSH) and the conventional collaborative filtering method (TRAD) in different optimization modes.

FIG. 7 is a comparison of different optimization regimes at two dataset scales.

Detailed Description

The present invention is described in detail below with reference to the attached drawings.

A multi-label song list recommendation method based on an improved collaborative filtering algorithm uses crawled NetEase Cloud Music song list data on the scale of one hundred thousand song lists, divided in a 1:9 ratio into test set song list data L_test and training set song list data L_train.

First, the tag relevance weight of each song list is calculated from L_train, and a locality-sensitive hashing algorithm is applied to L_train to compute the song information hash bucket Sim_LM, the singer information hash bucket Sim_LS and the user information hash bucket Sim_LU;

then the locality-sensitive hashing algorithm is used to hash-map the song, singer and user information of the song lists in L_test into the corresponding hash buckets, obtaining the candidate set of similar song lists;

next, the initial set Rec_T of tags to be recommended is calculated from the candidate set of similar song lists;

finally, the song list tag sets in the training set are mined with the FP-Growth algorithm to obtain the tag association rule set, the tags in the initial set are reordered according to that rule set, and the top A tags are recommended, achieving fast and accurate tag recommendation for song list data. Here A is a preset value with 0 < A ≤ 73.

The method specifically comprises the following steps:

Step 1: calculate the song list relevance weight from the song list tag combination frequency and the inverse tag frequency.

The specific calculation process is as follows:

Step 1.1: first divide the song list data in a 1:9 ratio into the test set L_test and the training set L_train, then calculate the frequency with which the tag combination of each song list L_i in L_train occurs in the whole of L_train, and the frequency with which each tag t_i of L_i occurs on its own in L_train.

Step 1.2: the song list relevance weight is calculated as follows:

Step 2: using the Min-Hash algorithm, reduce the song information, singer information and user information samples of the song lists in L_train to N x K dimensions, generating three signature matrices, where N is the number of song lists and K is the number of random permutation hash functions in the Min-Hash algorithm. Each song list L_i in L_train contains three types of data objects: its song set LM_i, its singer set LS_i and its creator LU_i.

The song list-song signature matrix S_M is generated as follows:

Step 2.1: map the songs of song list L_i in the training set to their song index values, generating the song index list corresponding to L_i, with one entry per song of L_i, each entry being that song's index value in the training set.

Step 2.2: initialize k and generate k different random permutation functions h_i(x), each of the form

$$h_i(x) = (a_i x + b_i) \bmod \mathrm{HASH\_PRIME}$$

where HASH_PRIME is a large prime number, a_i and b_i are random numbers in [1, HASH_PRIME - 1], and x is the input parameter.

Step 2.3: initialize the signature vector of the song index list. To update it, substitute every song index in the list into the k-th permutation function h_k and set the k-th component of the signature vector to the minimum of the resulting values.

Step 2.4: repeat step 2.3 until the signature vector of the song index list is complete, then update the remaining N - 1 song lists in L_train in the same way, finally obtaining the mapped song list-song signature matrix S_M.

The song list data in the training set L_train are used to generate the song list-singer signature matrix and the user-song list signature matrix in the same way as the song list-song signature matrix.
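One reason the signature matrices are useful is that the fraction of positions on which two Min-Hash signatures agree approximates the Jaccard similarity of the underlying sets. The Python sketch below contrasts the exact value with the signature-based estimate; it is an illustration only, with hypothetical function names and toy data, and the approximation improves as the number of hash functions grows.

```python
def jaccard(a, b):
    """Exact Jaccard similarity of two collections of song indices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which two Min-Hash signatures agree."""
    assert len(sig_a) == len(sig_b)
    matches = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return matches / len(sig_a)


print(jaccard([3, 17, 42, 99], [3, 17, 42, 100]))     # 0.6
print(estimated_jaccard([5, 9, 2, 7], [5, 9, 4, 7]))  # 0.75
```

The two printed values come from unrelated toy inputs; they show the two calculations, not an estimate of the same pair.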

Step 3: apply LSH bucketing to the dimension-reduced song list signature matrices of L_train, so that similar samples fall into the same hash bucket while dissimilar samples, with high probability, do not. Bucketing the signature matrices of the song information, singer information and user information in L_train separately yields three different hash buckets. The concept of a hash bucket is as follows:

A hash bucket is an identified region of a hash table that stores objects after they have been hashed. If several objects fall into the same bucket after hashing, they have collided, and the objects in a hash bucket have high similarity to one another.

The specific process is as follows:

Step 3.1: divide the song list-song signature matrix S_M into b segments, each consisting of r rows, where b x r = k.

Step 3.2: for each column of S_M, that is, the signature vector of each song list, hash each segment into a hash bucket, obtaining a number of hash buckets; song lists that fall into the same hash bucket on any segment are regarded as similar song lists.

Step 3.3: finally, only the hash buckets to which the target song list in the test set is mapped need to be computed; the song lists in those buckets are taken as the set of song lists similar to the target song list.

Step 3.4: from the song list-song, song list-singer and user-song list signature matrices, generate the corresponding song list-song, song list-singer and user-song list hash buckets in the manner of steps 3.1 to 3.3, as follows:

Song list-song hash bucket:

Sim_LM = {L_i → m_j | i ∈ L, j ∈ M}

Song list-singer hash bucket:

Sim_LS = {L_i → s_j | i ∈ L, j ∈ S}

User-song list hash bucket:

Sim_LU = {u_j → L_i | i ∈ L, j ∈ U}

where M is the set of songs in L_train, S is the set of singers in L_train, and U is the set of users in L_train.

Step 4: apply Min-Hash dimension reduction to the song information, singer information and user information of the target song list in L_test, and feed the resulting signature vectors into the corresponding song list-song, song list-singer and user-song list hash buckets to quickly retrieve similar song lists, obtaining the candidate set Sim_set of song lists similar to the target song list.

Step 5: from the candidate set Sim_set and the song list tag relevance weights, calculate the set Rec_T of the z tags to be recommended that have the largest recommendation index. The specific process is as follows:

Step 5.1: initialize the full tag set Tag of c tags, where c = 73 means there are 73 different tags in total, with the recommendation index of every tag initialized to 1.

Step 5.2: update the recommendation index of each tag in Tag using the song list tag relevance weights. The calculation formula is as follows:

Then sort the tags by recommendation index from high to low to obtain the set Rec_T of the z tags with the largest recommendation index,

Wherein

Step 6: mine tag association rules from the song list tag combinations L_Tag in the training set L_train with the FP-Growth algorithm, obtaining the tag association rule set rules_T. The specific process is as follows:

Minimum support (min_supp) and minimum confidence (min_conf) thresholds are set empirically. When the song list tag combinations in L_train are mined for tag association rules with FP-Growth, tag combinations whose support is below the minimum support are deleted, association rules whose confidence is below the minimum confidence are then filtered out, and the tag association rule set rules_T that satisfies the thresholds is finally obtained. Support and confidence are calculated as follows.

Support:

$$supp(t_i \rightarrow t_j) = \frac{\delta(t_i, t_j)}{\delta(L_{Tag})}$$

where δ(t_i, t_j) is the number of times the tag combination [t_i, t_j] occurs among the song list tag combinations in L_train, δ(L_Tag) is the total number of tag combinations in L_train, and supp(t_i → t_j) is the support of the combination. A larger value means the combination occurs more often and the two tags are more strongly related.

Confidence:

$$conf(t_i \rightarrow t_j) = \frac{\delta(t_i, t_j)}{\delta(t_i)}$$

where δ(t_i) is the total number of occurrences of tag t_i, and conf(t_i → t_j) is the probability that tag t_j appears given that tag t_i appears. A confidence of 1 means that whenever tag t_i appears, tag t_j must also appear.

The FP-Growth mining in step 6 proceeds as follows:

Step 1: construct the item header table and an empty FP tree, scan the song list tag combinations in L_train and count each tag combination; according to the counts, delete the tag combinations whose support is below the minimum support min_supp to obtain the first frequent item set, store it in the item header table, and sort it in descending order of support;

Step 2: scan the song list tag combinations in L_train a second time, remove the infrequent items, and sort in descending order of support to obtain the sorted frequent item sets;

Step 3: insert the frequent item sets from step 2 into the FP tree in order (higher-ranked items become ancestor nodes in the FP tree and lower-ranked items become descendant nodes);

Step 4: recursively mine the frequent item sets through the item header table, and filter out those that do not meet the minimum confidence min_conf to obtain the association rule set:

$$rules_T = \{R_1, R_2, \ldots, R_d\}$$

where rules_T is the mined tag association rule set, d is the total number of association rules mined, and R_i is the i-th association rule together with its confidence.
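The two scans and the tree insertion can be sketched in Python as follows. This covers only the construction of the item header table and the FP tree, not the recursive mining in the last step, and it is a sketch under assumptions: it follows the textbook FP-Growth convention of counting individual tags in the first scan, uses an absolute count threshold `min_count` in place of the ratio min_supp, and all class, function and tag names are hypothetical.

```python
from collections import Counter


class FPNode:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(tag_sets, min_count):
    """Build an FP tree; return (root, header), where header maps tag -> list of nodes."""
    # First scan: count each tag and keep only the frequent ones.
    counts = Counter(tag for tags in tag_sets for tag in set(tags))
    frequent = {t: c for t, c in counts.items() if c >= min_count}

    root = FPNode(None)
    header = {t: [] for t in frequent}
    # Second scan: drop infrequent tags, sort by descending count, insert the path.
    for tags in tag_sets:
        ordered = sorted((t for t in set(tags) if t in frequent),
                         key=lambda t: (-frequent[t], t))
        node = root
        for tag in ordered:
            child = node.children.get(tag)
            if child is None:
                child = FPNode(tag, parent=node)
                node.children[tag] = child
                header[tag].append(child)
            child.count += 1
            node = child
    return root, header


tag_sets = [
    {"rock", "english"},
    {"rock", "english", "classic"},
    {"rock", "live"},
    {"pop", "english"},
]
root, header = build_fp_tree(tag_sets, min_count=2)
print({tag: sum(n.count for n in nodes) for tag, nodes in header.items()})
```

Summing the node counts reachable from a header entry recovers that tag's total frequency, which is what the recursive mining step relies on when it builds conditional pattern bases.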

Step 7: reorder the z tags in Rec_T with the largest recommendation index according to the tag association rule set rules_T mined from L_train. The specific process is as follows:

Step 7.1: according to the implicit relationships between tags revealed by the association rules, recompute the recommendation index of each tag in the set Rec_T of tags to be recommended. The calculation formula is as follows:

Step 7.2: finally, sort Rec_T again from high to low by the resulting tag recommendation indexes, and select the top A tags as the final tag recommendation result for the target song list.

First, the crawled NetEase Cloud Music song list data are divided in a 1:9 ratio into test set song lists L_test and training set song lists L_train. The tag relevance weight of each song list in L_train is calculated, and the L_train song list data are then reduced in dimension and bucketed. For the song lists in L_test, similar song lists are retrieved from the hash buckets to form an initial tag recommendation result, which is finally reordered using the tag association rules to obtain a more accurate result, achieving fast and accurate tag recommendation for large-scale song list data.

Given that song list data are large in scale, high-dimensional, sparse and unevenly distributed across tags, the invention combines the multidimensional information of the data with the distribution of song list tag combinations, and improves tag recommendation accuracy by introducing the FP-Growth association rule algorithm and by calculating song list relevance weights from the song list tag combination frequency and the inverse tag frequency. Introducing Min-Hash-based locality-sensitive hashing into collaborative filtering greatly improves the efficiency of retrieving similar song lists and reduces the model's recommendation time. Compared with a song list tag recommendation model based on traditional collaborative filtering, the proposed model is therefore considerably more accurate, and its low computational complexity makes real-time song list tag recommendation achievable. Notably, compared with recommendation models based on deep learning, the proposed method is more compatible with the collaborative-filtering-based recommendation models used by current online music platforms, so the cost and risk of upgrading a platform's recommendation algorithm are lower.

The following are specific examples.

Example 1

TABLE 1 Experimental data set

Referring to (a) and (b) in FIG. 1 and (a) and (b) in FIG. 2, the invention first captures 100,000 song lists from the Internet cloud music platform. The data contain 73 distinct tags (Tag) and 16,449 distinct song list tag combinations L_Tag. As shown in Table 1, the full song list data is divided in a ratio of 1:9 into the test set song lists L_test and the training set song lists L_train. Exploring the tags in the song list data set shows that tags such as "Europe", "America" and "fashion" are used very frequently across the whole data set, whereas tags such as "games" and "Cantonese" are used rarely. The share of song lists carrying each tag is therefore highly skewed, and the number of song lists belonging to each song list tag combination is also strongly skewed. Under these conditions, directly adopting the traditional collaborative filtering song list tag recommendation model causes the recommendation result to favor popular tags, so that the recommendation accuracy of the model is low even when the similarity between the target song list and other song lists is computed accurately. Therefore, the invention uses tag frequencies to determine a weight that measures the authority of the tag combination in a song list; the ability of a given type of tag to express the song list category is captured by the song list tag combination frequency and the inverse tag frequency.

The method specifically comprises the following steps:

Step 1: Compute the frequency with which the tag combination of each song list L_i in the training set L_train occurs in L_train,

and compute the frequency with which each tag t_i in song list L_i occurs on its own in the training set L_train.

Step 2: The ability of a given type of tag to express the category of a song list is captured by the song list tag combination frequency and the inverse tag frequency. The calculation formula is as follows:

In the formula,

represents the song list L_i. Adding 1 to the numerator prevents the weight from being negative. In particular, assume that two song lists L₁ and L₂ with different tag combinations have song list tag combination frequencies

but that the tags in L₁ are relatively popular, so that

The initial weight calculation formula of the song list then gives

This is reasonable: the positive influence of the popular tags in L₁ acts on the song list tag combination frequencies

and

so the influence of popular tags needs to be penalized to some extent in order to obtain a more realistic authority weight for the song list tag combination.
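The two counts that step 1 relies on, namely how often a whole tag combination occurs in the training set and how often each individual tag occurs, can be sketched as follows. This is only an illustration of the counting; it does not reproduce the weight formula itself, and the function name `tag_frequencies` and the toy tag data are hypothetical.

```python
from collections import Counter

def tag_frequencies(train_tags):
    """Count tag-combination occurrences and single-tag occurrences in the training set."""
    # A tag combination is treated as an unordered set of tags (assumption).
    combo_counts = Counter(frozenset(tags) for tags in train_tags)
    # Each tag is counted once per song list that carries it.
    tag_counts = Counter(tag for tags in train_tags for tag in set(tags))
    return combo_counts, tag_counts

train_tags = [["pop", "rock"], ["rock", "pop"], ["pop"], ["game"]]
combo_counts, tag_counts = tag_frequencies(train_tags)
print(combo_counts[frozenset({"pop", "rock"})])  # occurrences of the combination {pop, rock}
print(tag_counts["pop"])                          # song lists that carry the tag "pop"
```

In this toy data the tag "pop" is popular while the combination {pop, rock} is less frequent, which is the situation the penalty on popular tags is meant to address.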

FIG. 3 is a flow chart of the song list multi-tag recommendation method of the invention. The method uses song list weight initialization to reduce the influence of popular tags; uses FP-Growth-based tag association rule mining to capture high-quality deep implicit relations among tags; and maps the high-dimensional sparse song list data into a low-dimensional dense song list signature matrix with the Min-Hash algorithm, then clusters similar song lists with the LSH algorithm and combines this with collaborative filtering, so that the song list tag recommendation task can be performed more quickly and accurately. The method specifically comprises the following steps:

Step 1: First, the song list tag weight is initialized from the song list tag combination frequency and the inverse tag frequency, and the song list tag relevance weight

is introduced into the song list similarity calculation, so that the tag-song list relevance reflects the probability distribution of the tags.

The larger this weight, the higher the authority of the tag combination in song list L_i; conversely,

the smaller this weight, the lower the authority of the tag combination in song list L_i. The calculation formula is as follows:

Step 2: From the song list data set of the training set L_train, the signature matrices of song list-song, song list-singer and user-song list are calculated with the Min-Hash algorithm. The LSH algorithm then maps the highly similar song lists in each signature matrix into corresponding hash buckets, and the three sets of hash buckets are used to quickly retrieve the candidate set of similar song lists for each target song list in the test set L_test. The specific retrieval process is as follows:

1) First, the Min-Hash algorithm is used to generate the signature matrices of the three data sets in the training set L_train. The three matrices are calculated in the same manner; taking the song list-song signature matrix as an example, the calculation is as follows:

1.1 Map the song IDs in song list L_i to their index values in the overall song list; the list of songs in song list L_i

can then be expressed in the following form:

1.2 Generate k different random permutation functions h_i(x). The parameters a_i and b_i of each permutation function are random numbers in the range [1, HASH_PRIME], where HASH_PRIME is a large prime number. The permutation function is expressed as follows:

1.3 For the list of songs of song list L_i

initialize a signature vector of the form

The signature vector is updated by applying each permutation function h_k and setting the corresponding entry of the signature vector to the minimum value obtained. The updated signature vector is expressed as follows:

1.4 Repeat steps 1.2 and 1.3 until the signature vectors of all song lists have been mapped, generating the signature matrix S_M, expressed as follows:

2) Then, as shown in FIG. 4, LSH bucketing is performed on the calculated signature matrix S_M. The process can be divided into three steps:

2.1 Divide the signature matrix S_M into b segments, each consisting of r rows, where b × r = k;

2.2 Hash each segment of each signature vector in the matrix into buckets. Segments mapped to the same bucket are regarded as similar segments, and the song lists corresponding to those segments are regarded as similar song lists;

2.3 When searching for the set of similar song lists of a target song list, only the hash buckets to which the signature vector of the target song list maps need to be computed; the song lists in those hash buckets are then taken as the candidate set of similar song lists of the target song list.
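The Min-Hash and LSH procedure of step 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the permutation function is assumed to take the common form h(x) = (a·x + b) mod HASH_PRIME, the value chosen for HASH_PRIME is an assumption, each segment is bucketed by its exact tuple of values, and all function names and toy data are hypothetical.

```python
import random
from collections import defaultdict

HASH_PRIME = 2_147_483_647  # assumed large prime (2**31 - 1)

def make_permutations(k, seed=0):
    """Draw k (a, b) pairs for the assumed form h(x) = (a*x + b) % HASH_PRIME."""
    rng = random.Random(seed)
    # The upper bound is HASH_PRIME - 1 so that a and b stay nonzero modulo the prime.
    return [(rng.randint(1, HASH_PRIME - 1), rng.randint(1, HASH_PRIME - 1))
            for _ in range(k)]

def minhash_signature(song_indices, perms):
    """Entry i of the signature is the minimum of h_i over the song indices."""
    return [min((a * x + b) % HASH_PRIME for x in song_indices) for a, b in perms]

def lsh_buckets(signatures, b, r):
    """Cut each length b*r signature into b segments and group equal segments."""
    buckets = defaultdict(set)
    for list_id, sig in signatures.items():
        for band in range(b):
            segment = tuple(sig[band * r:(band + 1) * r])
            buckets[(band, segment)].add(list_id)
    return buckets

def similar_candidates(target_sig, buckets, b, r):
    """Union of training song lists that share at least one segment bucket with the target."""
    found = set()
    for band in range(b):
        segment = tuple(target_sig[band * r:(band + 1) * r])
        found |= buckets.get((band, segment), set())
    return found

perms = make_permutations(k=8)
train = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4], "C": [100, 200, 300]}
signatures = {name: minhash_signature(songs, perms) for name, songs in train.items()}
buckets = lsh_buckets(signatures, b=4, r=2)
candidates = similar_candidates(minhash_signature([1, 2, 3, 4], perms), buckets, b=4, r=2)
print(sorted(candidates))
```

Only the buckets hit by the target signature are inspected, so the lookup cost does not grow with the number of training song lists that fall into other buckets.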

Step 3: From the candidate set Sim_set of similar song lists of the target song list in the test set L_test and the song list initial weight information

calculate the recommendation index of each tag to be recommended

The simplified calculation formula is as follows:

Step 4: Select the z tags with the largest recommendation indexes as the initial recommendation result Rec_T of the target song list, expressed as follows:

Step 5: Using the initial recommendation result Rec_T and the tag association rule set rules_T mined by FP-Growth, reorder the tags to be recommended. The reordering is calculated as follows:

Step 6: Finally, rank the tags in the calculated set of tags to be recommended Rec_T by the recommendation index

and take the top A as the tag recommendation result of the target song list.
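The support and confidence values behind the tag association rules can be illustrated for two-tag rules by direct counting. This sketch deliberately does not implement FP-Growth; it counts tag pairs by brute force, which is an illustrative substitute that is only practical for small tag vocabularies. It assumes that support is the pair count divided by the number of tag combinations in the training set and that confidence is the pair count divided by the count of the antecedent tag. The names `pair_rules` and `min_supp` are hypothetical, and the re-ranking formula itself is not reproduced.

```python
from collections import Counter
from itertools import permutations

def pair_rules(tag_combos, min_supp, min_conf):
    """Return {(t_i, t_j): (support, confidence)} for rules t_i -> t_j above both thresholds."""
    total = len(tag_combos)  # assumed: total number of tag combinations in the training set
    tag_count = Counter(t for combo in tag_combos for t in set(combo))
    pair_count = Counter(
        pair for combo in tag_combos for pair in permutations(sorted(set(combo)), 2)
    )
    rules = {}
    for (ti, tj), n in pair_count.items():
        supp = n / total          # how often t_i and t_j occur together
        conf = n / tag_count[ti]  # how often t_j occurs given that t_i occurs
        if supp >= min_supp and conf >= min_conf:
            rules[(ti, tj)] = (supp, conf)
    return rules

combos = [["pop", "rock"], ["pop", "rock"], ["pop"], ["game"]]
rules = pair_rules(combos, min_supp=0.25, min_conf=0.6)
for rule, (supp, conf) in rules.items():
    print(rule, round(supp, 3), round(conf, 3))
```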

Compared with the traditional collaborative filtering recommendation method, the method has the advantages of high recommendation accuracy, high recommendation speed, capability of performing online real-time recommendation, simplicity and effectiveness.

FIG. 5 compares the improved collaborative filtering multi-label recommendation method of the invention (LSH) with the conventional collaborative filtering recommendation method (TRAD) in terms of recommendation accuracy and recommendation efficiency. The recommendation method is evaluated by calculating the F1 value from precision and recall and using the F1 value as the evaluation index of the recommendation result. These are defined as follows:

Precision:

The higher the precision, the larger the proportion of correctly recommended tags among all recommended tags.

Recall:

where test_tags_num represents the actual number of correct tags. The higher the recall, the larger the proportion of correctly recommended tags among all actually correct tags.

F1 value:

The F1 value is the harmonic mean of precision and recall and takes into account the influence of both on the evaluation of model accuracy. The higher the F1 value, the better and more stable the overall accuracy of the recommendation model, and vice versa.
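The three evaluation measures can be computed as in the following sketch, which uses the standard definitions of precision, recall and F1. The function name `precision_recall_f1` and the toy tags are hypothetical; the length of `actual` plays the role of test_tags_num.

```python
def precision_recall_f1(recommended, actual):
    """Standard precision, recall and F1 for one target song list."""
    rec, act = set(recommended), set(actual)
    correct = len(rec & act)                       # correctly recommended tags
    precision = correct / len(rec) if rec else 0.0
    recall = correct / len(act) if act else 0.0    # len(act) corresponds to test_tags_num
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

p, r, f1 = precision_recall_f1(["pop", "rock", "game"], ["pop", "rock"])
print(round(p, 3), round(r, 3), round(f1, 3))
```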

TABLE 2 song list on-line recommendation accuracy comparison results

TABLE 3 results of the present invention in the scene of song list online real-time recommendation

TABLE 4 detailed description of ablation experiments

Tables 2 and 3 show the comparison between the invention and the conventional collaborative filtering recommendation method in the scenario of online real-time song list recommendation. To verify whether each improvement of the invention is effective, Table 4 and FIGS. 6 and 7 compare the song list multi-label recommendation results obtained on real song list data when the improvements are not used or only partially used. Referring to Tables 2 to 4 and FIGS. 6 and 7, the comparison shows that every improvement proposed by the invention substantially improves the song list tag recommendation task, giving better recommendation accuracy and real-time performance.

The invention proposes: 1) song list relevance weight calculation based on the tag combination frequency and the inverse tag frequency; 2) a collaborative filtering recommendation method based on LSH/Min-Hash; and 3) a tag re-ranking method based on FP-Growth. These mainly address the problems that the traditional collaborative filtering method has low recommendation accuracy and high computational complexity when recommending song list tags, and that it cannot recommend tags for newly created song lists online in real time. To address recommendation efficiency and online real-time recommendation, the invention provides a collaborative filtering song list recommendation method based on LSH/Min-Hash, so that the candidate set of similar song lists can be retrieved quickly for a target song list. This realizes fast querying of similar song lists, reduces the computational complexity of the algorithm, and supports online tag recommendation for newly created song lists. To improve recommendation accuracy, the invention calculates the song list relevance weight from the song list tag combination frequency and the inverse tag frequency, then calculates the recommendation index of each tag with a weighted fusion strategy, and finally re-ranks the candidate tags using the mined tag association rules to obtain the final recommended tags. Experimental results on real song list data from the Internet cloud music platform indicate that, after optimization with the weights and the association rules, the F1 value of the recommendation result is substantially higher than that of the traditional collaborative filtering method, and the average running time is substantially lower.

A song list multi-tag recommendation system, comprising:

a tag reordering module, configured to mine the song list tag set in the training set by the FP-Growth algorithm to obtain the association rule set of tags, reorder the tags of the initial set of tags to be recommended according to the association rule set, and select the first A tags in the ordering for recommendation, thereby realizing song list recommendation.

A computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program when executed by the processor implementing the song list multi-tag recommendation method as claimed in any one of claims 1 to 7.

A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the song list multi-tag recommendation method of any one of claims 1 to 7.

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.

The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.

Finally, it should also be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims

1. A song list multi-label recommendation method, characterized by comprising the following steps:

Step 1: collecting song list data and dividing the song list data into a test set L_test and a training set L_train; using the Min-Hash algorithm to reduce the song information, singer information and user information song list samples in the training set L_train to N×K dimensions, respectively, and generating a user-song list signature matrix, a song list-singer signature matrix and a song list-song signature matrix; wherein N is the number of song lists and K is the number of random permutation hash functions in the Min-Hash algorithm;

Step 2: performing LSH bucketing on the user-song list signature matrix, the song list-singer signature matrix and the song list-song signature matrix so that similar samples are divided into the same hash bucket, that is, performing LSH bucketing on the signature matrices of the song information, singer information and user information of the training set L_train, respectively, to obtain a song list-song hash bucket, a song list-singer hash bucket and a user-song list hash bucket;

Step 3: performing Min-Hash dimension reduction on the song information, singer information and user information of the target song list in the test set L_test, respectively, and inputting the signature vectors into the corresponding song list-song hash bucket, song list-singer hash bucket and user-song list hash bucket for fast retrieval of similar song lists, thereby obtaining the candidate set Sim_set of similar song lists of the target song list;

Step 4: according to the similar song list candidate set Sim_set and the song list tag relevance weight

calculating the recommendation index

and taking the z tags with the largest recommendation index as the set of tags to be recommended Rec_T;

Step 5: mining tag association rules from the song list tag combinations L_Tag in the training set L_train by the FP-Growth algorithm to obtain the tag association rule set rules_T; the specific process of step 5 is as follows:

Step 6.2: scanning the song list tag combinations in the training set L_train a second time, removing infrequent itemsets, and sorting in descending order of support to obtain the ordered frequent itemsets;

Step 6.4: recursively mining frequent itemsets through the item header table, and filtering out the frequent itemsets that do not meet the minimum confidence min_conf to obtain the association rule set:

the support is calculated by the following formula:

supp(t_i→t_j) = δ(t_i, t_j) / δ(L_Tag)

where δ(t_i, t_j) represents the number of times the song list tag combination [t_i, t_j] occurs among the tag combinations in L_train, δ(L_Tag) represents the total number of tag combinations in L_train, and supp(t_i→t_j) represents the support of this combination;

the confidence is calculated by the following formula:

conf(t_i→t_j) = δ(t_i, t_j) / δ(t_i)

where δ(t_i) represents the total number of occurrences of tag t_i, and conf(t_i→t_j) represents the probability that tag t_j appears given that tag t_i appears;

Step 6: according to the tag association rule set rules_T satisfying the threshold, for the first z recommendation indexes

2. The song list multi-label recommendation method as claimed in claim 1, wherein the song list-song signature matrix S_M in step 1 is generated by the following process:

Step 1): mapping the songs of a song list L_i in the training set to their song index values, generating the song index list corresponding to song list L_i

expressed as follows:

wherein

represents the index value of the j-th song in the training set song list, and j denotes the number of songs in song list L_i;

a parameter representing an input;

Step 3): initializing, for the song index list corresponding to song list L_i

the signature vector

the updating strategy being to substitute, from the song index list corresponding to song list L_i

each song index

into the k-th permutation function h_k, and to update, in the signature vector

the k-th entry

to the minimum value, expressed as follows:

Step 4): repeating step 3) until, for the song index list corresponding to song list L_i

the signature vector

has been fully updated; then updating the remaining N-1 song lists in the training set L_train, finally obtaining the calculated and mapped user-song list signature matrix

expressed as follows:

3. The song list multi-label recommendation method as claimed in claim 1, wherein in step 2 the song list-song hash bucket is obtained by the following steps:

Step 1): dividing the song list-song signature matrix S_M into b segments, each consisting of r rows, where b × r = k;

Step 2): performing hash bucket division on each segment of each row of the song list-song signature matrix S_M to obtain a plurality of hash buckets; song lists divided into the same hash bucket on any segment are regarded as similar song lists;

Step 3): calculating the hash bucket to which a target song list in the test set maps, and taking the song lists in that hash bucket as the set of song lists similar to the target song list, thereby obtaining the song list-song hash bucket;

Song list-song hash bucket Sim_LM:

Sim_LM = {L_i → m_j | i ∈ L, j ∈ M}

Song list-singer hash bucket Sim_LS:

Sim_LS = {L_i → s_k | i ∈ L, j ∈ S}

User-song list hash bucket Sim_LU:

Sim_LU = {u_j → L_i | i ∈ L, j ∈ U}

wherein M represents the set of songs in the training set L_train, S represents the set of singers in the training set L_train, and U represents the set of users in the training set L_train.

4. The song list multi-label recommendation method as claimed in claim 1, wherein the specific process of step 4 is as follows:

Step 5.1: initializing the full tag set

where c = 73, the recommendation index of each tag

being initialized to 1;

Step 5.2: according to the song list tag relevance weight

updating the recommendation index in Tag

the calculation formula being as follows:

Step 5.3: calculating the recommendation index over the similar song list candidate set Sim_set of the target song list

then obtaining, according to the recommendation index

the set of tags Rec_T with the largest values,

wherein

wherein the song list relevance weight

is calculated by the following formula:

wherein

5. The song list multi-label recommendation method as claimed in claim 1, wherein the specific process of step 6 is as follows:

Step 7.1: according to the implicit relations between tags indicated by the association rules, reordering the tags of the set of tags to be recommended Rec_T by the tag recommendation index

the calculation formula being as follows:

Step 7.2: finally, according to the tag recommendation index

sorting the set of tags to be recommended Rec_T again from high to low, and selecting the first A tags as the final tag recommendation result of the target song list.

6. A song list multi-tag recommendation system, comprising:

a signature matrix generation module, configured to collect song list data and divide the song list data into a test set L_test and a training set L_train, and to use the Min-Hash algorithm to reduce the song information, singer information and user information song list samples in the training set L_train to N×K dimensions, respectively, generating a user-song list signature matrix, a song list-singer signature matrix and a song list-song signature matrix; wherein N is the number of song lists and K is the number of random permutation hash functions in the Min-Hash algorithm;

a hash bucketing module, configured to perform LSH bucketing on the user-song list signature matrix, the song list-singer signature matrix and the song list-song signature matrix so that similar samples are divided into the same hash bucket, that is, to perform LSH bucketing on the signature matrices of the song information, singer information and user information of the training set L_train, respectively, to obtain a song list-song hash bucket, a song list-singer hash bucket and a user-song list hash bucket;

a signature vector retrieval module, configured to perform Min-Hash dimension reduction on the song information, singer information and user information of the target song list in the test set L_test, respectively, and to input the signature vectors into the corresponding song list-song hash bucket, song list-singer hash bucket and user-song list hash bucket for fast retrieval of similar song lists, thereby obtaining the candidate set Sim_set of similar song lists of the target song list;

a to-be-recommended tag set calculation module, configured to, according to the similar song list candidate set Sim_set and the song list tag relevance weight

calculate the recommendation index

and take the z tags with the largest recommendation index as the set of tags to be recommended Rec_T;

a tag association rule mining module, configured to mine tag association rules from the song list tag combinations L_Tag in the training set L_train by the FP-Growth algorithm to obtain the tag association rule set rules_T;

a reordering module, configured to, according to the tag association rule set rules_T satisfying the threshold, take the z tags with the largest recommendation index

in the set of tags to be recommended Rec_T, reorder those tags, and then select the first A tags as the final recommendation result of the target song list.

7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory having stored thereon a computer program operable on the processor, the computer program, when executed by the processor, implementing the song list multi-label recommendation method as claimed in any one of claims 1 to 5.

8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the song list multi-label recommendation method according to any one of claims 1 to 5.