CN113220931B - Multi-label song menu recommendation method, system, equipment and storage medium - Google Patents


Info

Publication number
CN113220931B
Authority
CN
China
Prior art keywords
song
song list
list
label
hash
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110316152.2A
Other languages
Chinese (zh)
Other versions
CN113220931A (en
Inventor
王晨旭
郭晨野
杨煜
索凯强
管晓宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN202110316152.2A priority Critical patent/CN113220931B/en
Publication of CN113220931A publication Critical patent/CN113220931A/en
Application granted granted Critical
Publication of CN113220931B publication Critical patent/CN113220931B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/65Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/61Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/63Querying
    • G06F16/635Filtering based on additional data, e.g. user or group profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A multi-label song list recommendation method, system, device and storage medium. Song list data are divided into a test set and a training set, and a locality-sensitive hashing (LSH) algorithm is applied separately to the song information, singer information and user information in the training set to compute a song information hash bucket, a singer information hash bucket and a user information hash bucket. The song, singer and user information in the test set is mapped into the corresponding hash buckets to obtain a candidate set of similar song lists, from which an initial set of labels to be recommended is calculated. The song list label sets in the training set are mined to obtain a set of label association rules, the labels are reordered accordingly, and the top A labels in the resulting order are recommended, thereby realizing song list label recommendation. The invention offers higher recommendation accuracy and lower time consumption. The method is highly compatible with the collaborative-filtering-based recommendation models used by current online music platforms, so the cost and risk of upgrading a system's recommendation algorithm are low, and the method is simple and efficient.

Description

Multi-label song list recommendation method, system, equipment and storage medium
Technical Field
The invention relates to the field of music recommendation systems, and in particular to a multi-label song list recommendation method, system, device and storage medium.
Background
Song list labels play an important role in improving the listening experience of online music users and in encouraging users to create personalized song lists. Thanks to the development of big data technology, the characteristics of the songs in a song list can be inferred implicitly from a large number of expert-labeled song lists. Collaborative filtering, a classical recommendation algorithm, helps extract implicit information about song lists from large amounts of data: it finds other song lists similar to a target song list and then uses that set of similar song lists to compute labels that may suit the target. However, although collaborative filtering is widely applied in the big data era, the high-dimensional sparsity of song list data means that traditional collaborative filtering still suffers from low recommendation accuracy, high computational complexity and an inability to recommend labels online in real time for newly created song lists, which makes it difficult to apply in practice.
Disclosure of Invention
The invention aims to provide a multi-label song list recommendation method, system, device and storage medium.
To achieve this purpose, the invention adopts the following technical scheme:
A multi-label song list recommendation method, comprising:
dividing the song list data into a test set and a training set, and applying a locality-sensitive hashing algorithm separately to the song information, singer information and user information in the training set to compute a song information hash bucket, a singer information hash bucket and a user information hash bucket;
hash-mapping the song, singer and user information in the test set into the song information hash bucket, the singer information hash bucket and the user information hash bucket respectively, using the locality-sensitive hashing algorithm, to obtain a candidate set of similar song lists;
calculating an initial set of labels to be recommended from the candidate set of similar song lists and the label relevance weight of each song list;
mining the song list label sets in the training set with the FP-Growth algorithm to obtain a set of label association rules, reordering the labels of the initial set of labels to be recommended according to that rule set, and recommending the top A labels in the resulting order, thereby realizing song list label recommendation, wherein A is a preset value.
In a further improvement of the invention, the method specifically comprises the following steps:
Step 1: collect the song list data and divide it into a test set $L_{test}$ and a training set $L_{train}$; use the Min-Hash algorithm to reduce the song information, singer information and user information of the song list samples in $L_{train}$ to N x K dimensions, generating a user-song list signature matrix, a song list-singer signature matrix and a song list-song signature matrix, where N is the number of song lists and K is the number of random permutation hash functions in the Min-Hash algorithm;
Step 2: perform LSH bucketing on the user-song list, song list-singer and song list-song signature matrices of $L_{train}$ so that similar samples fall into the same hash bucket, obtaining a song list-song hash bucket, a song list-singer hash bucket and a user-song list hash bucket;
Step 3: input the Min-Hash signature vectors of the song information, singer information and user information of the target song list in $L_{test}$ into the corresponding song list-song, song list-singer and user-song list hash buckets for fast retrieval of similar song lists, obtaining the candidate set $Sim_{set}$ of song lists similar to the target song list;
Step 4: from the candidate set $Sim_{set}$ and the label relevance weight [Figure BDA0002991293610000021], compute the set $Rec_T$ of labels to be recommended, consisting of the z labels with the largest recommendation index [Figure BDA0002991293610000022];
Step 5: use the FP-Growth algorithm to mine label association rules from the song list label combinations $L_{Tag}$ in $L_{train}$, obtaining the label association rule set $rules_T$;
Step 6: using the association rules in $rules_T$ that satisfy the thresholds, reorder the labels in $Rec_T$ (the z labels with the largest recommendation index [Figure BDA0002991293610000023]), then select the top A labels as the final recommendation result for the target song list.
In a further improvement of the invention, in step 1 the song list-song signature matrix $S_M$ is generated as follows:
Step 1): map the songs of each song list $L_i$ in the training set to song index values, generating the song index list [Figure BDA0002991293610000031] corresponding to $L_i$, expressed as:
[Figure BDA0002991293610000032]
where [Figure BDA0002991293610000033] denotes the index value of the j-th song in the training set song lists, with j ranging over the songs in song list $L_i$;
Step 2): initialize k and generate k different random permutation functions $h_i(x)$, each expressed as:
[Figure BDA0002991293610000034]
where HASH_PRIME is a large prime number, $a_i$ and $b_i$ are random numbers in [1, HASH_PRIME - 1], and [Figure BDA0002991293610000035] denotes the input parameter;
Step 3): initialize the signature vector [Figure BDA0002991293610000037] of the song index list [Figure BDA0002991293610000036] corresponding to song list $L_i$. The update strategy is to substitute each song index [Figure BDA0002991293610000039] in the song index list [Figure BDA0002991293610000038] into the k-th permutation function $h_k$ and update the k-th component [Figure BDA00029912936100000311] of the signature vector [Figure BDA00029912936100000310] to the minimum value obtained, expressed as:
[Figure BDA00029912936100000312]
Step 4): repeat step 3) until the signature vector [Figure BDA00029912936100000314] of the song index list [Figure BDA00029912936100000313] corresponding to song list $L_i$ has been fully updated, then update the remaining N-1 song lists in the training set $L_{train}$ in the same way, finally obtaining the mapped song list-song signature matrix [Figure BDA00029912936100000315], expressed as:
[Figure BDA00029912936100000316]
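The signature-generation steps above can be illustrated with a minimal Python sketch. The original formula for $h_i(x)$ survives only as an image reference, so the sketch assumes the common linear form `(a * x + b) % HASH_PRIME`, which is consistent with the stated roles of HASH_PRIME, $a_i$ and $b_i$ but is not confirmed by the text. The function names and the prime value `2**61 - 1` are hypothetical choices for illustration.

```python
import random

# Assumed value: any large prime works; 2**61 - 1 is a Mersenne prime.
HASH_PRIME = 2**61 - 1


def make_hash_params(k, seed=0):
    """Draw k pairs (a_i, b_i), each uniformly from [1, HASH_PRIME - 1]."""
    rng = random.Random(seed)
    return [
        (rng.randint(1, HASH_PRIME - 1), rng.randint(1, HASH_PRIME - 1))
        for _ in range(k)
    ]


def minhash_signature(song_indices, params):
    """Return the k-component Min-Hash signature of one song list.

    Component i is the minimum of h_i(x) over all song indices x, where
    h_i(x) = (a_i * x + b_i) mod HASH_PRIME is an ASSUMED form of the
    random permutation function. song_indices must be non-empty.
    """
    return [
        min((a * x + b) % HASH_PRIME for x in song_indices)
        for a, b in params
    ]


def signature_matrix(song_lists, params):
    """Stack the signatures of N song lists into an N x K matrix (list of rows)."""
    return [minhash_signature(indices, params) for indices in song_lists]
```

The same routine would apply unchanged to singer indices or user indices, since only the index lists differ between the three signature matrices.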
In a further improvement of the invention, in step 2 the song list-song hash bucket is obtained as follows:
Step 1): divide the song list-song signature matrix $S_M$ into b segments, each consisting of r rows, where b x r = k;
Step 2): for each column of $S_M$, that is, the signature vector of one song list, hash each segment into a bucket, obtaining a number of hash buckets; song lists placed in the same hash bucket on any segment are regarded as similar song lists;
Step 3): compute the hash buckets to which the target song list in the test set maps, and take the song lists in those buckets as the set of song lists similar to the target song list, thereby obtaining the song list-song hash bucket.
Song list-song hash bucket $Sim_{LM}$:
$Sim_{LM} = \{L_i \to m_j \mid i \in L, j \in M\}$
Song list-singer hash bucket $Sim_{LS}$:
$Sim_{LS} = \{L_i \to s_k \mid i \in L, k \in S\}$
User-song list hash bucket $Sim_{LU}$:
$Sim_{LU} = \{u_j \to L_i \mid i \in L, j \in U\}$
where M denotes the set of songs in the training set $L_{train}$, S the set of singers in $L_{train}$, and U the set of users in $L_{train}$.
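The segment-wise bucketing described above can be sketched in Python as follows. This is an illustrative sketch rather than the patented implementation: the function names are hypothetical, and each segment is used directly as a dictionary key (together with its segment number) instead of being passed through a separate bucket hash function, which has the same grouping effect.

```python
from collections import defaultdict


def segment_keys(signature, b, r):
    """Split a signature of length b * r into b bucket keys, one per segment."""
    if len(signature) != b * r:
        raise ValueError("signature length must equal b * r")
    return [(s, tuple(signature[s * r:(s + 1) * r])) for s in range(b)]


def build_lsh_index(signatures, b, r):
    """Map every (segment number, segment values) key to the song lists sharing it.

    signatures: dict of song list id -> Min-Hash signature of length b * r.
    """
    buckets = defaultdict(set)
    for list_id, signature in signatures.items():
        for key in segment_keys(signature, b, r):
            buckets[key].add(list_id)
    return buckets


def query_candidates(buckets, signature, b, r):
    """Return all song lists sharing a bucket with the query on any segment."""
    candidates = set()
    for key in segment_keys(signature, b, r):
        candidates |= buckets.get(key, set())
    return candidates
```

Increasing r makes a collision on a single segment stricter, while increasing b gives dissimilar signatures more chances to collide, so the pair (b, r) trades recall against the size of the candidate set.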
In a further improvement of the invention, the specific process of step 4 is as follows:
Step 5.1: initialize the full label set [Figure BDA0002991293610000041], where c = 73, with the recommendation index [Figure BDA0002991293610000042] of every label initialized to 1;
Step 5.2: update the recommendation index [Figure BDA0002991293610000044] of the labels in Tag according to the song list label relevance weight [Figure BDA0002991293610000043], using the formula:
[Figure BDA0002991293610000045]
Step 5.3: compute the recommendation indexes [Figure BDA0002991293610000046] over the candidate set $Sim_{set}$ of song lists similar to the target song list, then sort the labels by recommendation index [Figure BDA0002991293610000047] from high to low to obtain the set $Rec_T$ of the z labels with the largest recommendation index [Figure BDA0002991293610000048]:
[Figure BDA0002991293610000049]
where [Figure BDA00029912936100000410].
The song list relevance weight [Figure BDA00029912936100000411] is calculated by the following formula:
[Figure BDA00029912936100000412]
where [Figure BDA00029912936100000413] is the frequency with which the label combination of each song list $L_i$ in the training set $L_{train}$ occurs in $L_{train}$, $L_i$ is a song list, and $t_i$ is a label of song list $L_i$.
In a further improvement of the invention, the specific process of step 5 is as follows:
Step 6.1: construct an item header table and an empty FP tree; scan the song list label combinations in the training set $L_{train}$ and count each of them; according to the counts, delete those whose support is below the minimum support min_supp to obtain the frequent 1-item sets, store these in the item header table, and sort them in descending order of support;
Step 6.2: scan the song list label combinations in $L_{train}$ a second time, remove the infrequent items, and sort the remaining items in descending order of support to obtain ordered frequent item sets;
Step 6.3: insert the ordered frequent item sets from step 6.2 into the FP tree one by one;
Step 6.4: recursively mine frequent item sets through the item header table, and filter out those that do not satisfy the minimum confidence min_conf, obtaining the association rule set:
[Figure BDA0002991293610000051]
where $rules_T$ denotes the label association rule set, d the total number of association rules mined, and $R_i$ the i-th association rule.
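Steps 6.1 to 6.3 (the two scans and the tree insertion) can be sketched in Python as below. The sketch covers only FP-tree construction, not the recursive mining of step 6.4. The class and function names are hypothetical, and for simplicity the threshold is given as an absolute count `min_count` rather than the ratio min_supp; ties in support are broken alphabetically, which the source does not specify.

```python
from collections import Counter


class FPNode:
    """One node of the FP tree: a label, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(label_sets, min_count):
    """Build an FP tree and item header table from song list label combinations."""
    # First scan: count every label and keep only the frequent ones.
    counts = Counter(label for labels in label_sets for label in set(labels))
    frequent = {label: n for label, n in counts.items() if n >= min_count}

    root = FPNode(None, None)
    header = {label: [] for label in frequent}  # label -> its nodes in the tree

    # Second scan: drop infrequent labels, sort by descending support, insert.
    for labels in label_sets:
        ordered = sorted(
            (label for label in set(labels) if label in frequent),
            key=lambda label: (-frequent[label], label),
        )
        node = root
        for label in ordered:
            child = node.children.get(label)
            if child is None:
                child = FPNode(label, node)
                node.children[label] = child
                header[label].append(child)
            child.count += 1
            node = child
    return root, header
```

Because every combination is inserted in the same global order, combinations that share their most frequent labels share a path prefix, which is what keeps the tree compact.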
The support is calculated by the following formula:
$$supp(t_i \to t_j) = \frac{\delta(t_i, t_j)}{\delta(L_{Tag})}$$
where $\delta(t_i, t_j)$ denotes the number of occurrences of the label combination $[t_i, t_j]$ among the song list label combinations in $L_{train}$, $\delta(L_{Tag})$ denotes the total number of label combinations in $L_{train}$, and $supp(t_i \to t_j)$ denotes the support of this combination.
The confidence is calculated by the following formula:
$$conf(t_i \to t_j) = \frac{\delta(t_i, t_j)}{\delta(t_i)}$$
where $\delta(t_i)$ denotes the total number of occurrences of label $t_i$, and $conf(t_i \to t_j)$ denotes the probability that label $t_j$ appears given that label $t_i$ appears.
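As a worked illustration of the support and confidence definitions, the following Python sketch computes both quantities for every ordered label pair by direct counting. It deliberately uses brute-force pair counting instead of FP-Growth, so it is suitable only for small examples. The function name is hypothetical, and the sketch assumes that $\delta(L_{Tag})$ equals the number of song list label combinations in the training set.

```python
from collections import Counter
from itertools import permutations


def pair_rules(label_sets, min_supp, min_conf):
    """Return {(t_i, t_j): (supp, conf)} for label pairs passing both thresholds.

    supp(t_i -> t_j) = delta(t_i, t_j) / delta(L_Tag)
    conf(t_i -> t_j) = delta(t_i, t_j) / delta(t_i)
    """
    total = len(label_sets)  # assumed meaning of delta(L_Tag)
    single = Counter()       # delta(t_i)
    pair = Counter()         # delta(t_i, t_j), stored for both orderings
    for labels in label_sets:
        unique = set(labels)
        single.update(unique)
        pair.update(permutations(sorted(unique), 2))

    rules = {}
    for (t_i, t_j), n in pair.items():
        supp = n / total
        conf = n / single[t_i]
        if supp >= min_supp and conf >= min_conf:
            rules[(t_i, t_j)] = (supp, conf)
    return rules
```

For four song lists labeled {rock, pop}, {rock, pop}, {rock} and {jazz}, the pair (rock, pop) occurs twice, so both rules have support 2/4 = 0.5, while conf(pop → rock) = 2/2 = 1 and conf(rock → pop) = 2/3.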
In a further improvement of the invention, the specific process of step 6 is as follows:
Step 7.1: according to the implicit relationships between labels revealed by the association rules, reorder the label recommendation indexes [Figure BDA0002991293610000061] of the set $Rec_T$ of labels to be recommended using the following formula:
[Figure BDA0002991293610000062]
Step 7.2: finally, sort the set $Rec_T$ of labels to be recommended from high to low again by the resulting label recommendation index [Figure BDA0002991293610000063], and select the top A labels as the final label recommendation result for the target song list.
A multi-label song list recommendation system, comprising:
a hash bucket calculation module, configured to divide the song list data into a test set and a training set, and to compute a song information hash bucket, a singer information hash bucket and a user information hash bucket by applying a locality-sensitive hashing algorithm separately to the song information, singer information and user information in the training set;
a hash mapping module, configured to hash-map the song, singer and user information in the test set into the song information hash bucket, the singer information hash bucket and the user information hash bucket respectively, using the locality-sensitive hashing algorithm, to obtain a candidate set of similar song lists;
an initial label set calculation module, configured to calculate an initial set of labels to be recommended from the candidate set of similar song lists and the label relevance weight of each song list;
and a label reordering module, configured to mine the song list label sets in the training set with the FP-Growth algorithm to obtain a set of label association rules, reorder the labels of the initial set of labels to be recommended according to that rule set, and recommend the top A labels in the resulting order, thereby realizing song list label recommendation, wherein A is a preset value.
A computer device comprising a memory and a processor, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the multi-label song list recommendation method described above.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the multi-label song list recommendation method described above.
Compared with the prior art, the invention has the following beneficial effects:
To address the low recommendation efficiency and low recommendation accuracy of traditional collaborative filtering on NetEase Cloud Music song list data, the invention provides an LSH/Min-Hash-based song list label recommendation method that incorporates FP-Growth. The recommendation process has three main stages. First, the concept of "label frequency weighting" is used to measure the authority of the label combination attribute of each song list in the training set: the label combination frequency and the inverse label frequency together express how well a given label represents the category of a song list, which mitigates the skewing of recommendation results toward popular labels caused by an unbalanced distribution of label samples. Second, to quickly retrieve the candidate set of song lists similar to a target song list in the test set, the Min-Hash algorithm hashes the song list data matrix K times and maps the song list data into an N x K song list signature matrix, achieving matrix compression and dimension reduction. On the compressed signature matrix, the LSH algorithm performs b rounds of hashing so that locally similar song lists are mapped into the same hash bucket; song lists that share a hash bucket at least once are then taken as candidate similar pairs for similarity calculation, producing the nearest-neighbor set of the target song list, that is, its candidate set of similar song lists. This solves the inefficiency of searching for similar song lists at large data scales.
Finally, after an initial label recommendation result is formed from the song lists in the candidate set, FP-Growth is used to mine high-quality deep implicit relationships between labels, and these are used to reorder the initial result; the top A labels are then selected as the final label recommendation result for the target song list, which improves the accuracy of the method. The invention scales to label recommendation on large song list data sets and has a simple structure and high recommendation efficiency. Compared with traditional collaborative filtering, it can also recommend labels in real time for newly created song lists, with high accuracy and high speed.
Drawings
FIG. 1 shows the power-law distribution of the number of song lists containing each label and each label combination, where (a) is the label ranking and (b) is the label combination ranking.
FIG. 2 shows the distribution of the number of song lists containing the top 15 labels and the top 15 label combinations, where (a) is the single-label distribution and (b) is the label-combination distribution.
FIG. 3 is a flowchart of the multi-label song list recommendation method based on an improved collaborative filtering algorithm.
FIG. 4 is a schematic diagram of bucketing the song list signature matrix with the LSH algorithm.
FIG. 5 compares recommendation time and recommendation accuracy at different data set scales.
FIG. 6 compares the present method (LSH) with the traditional collaborative filtering method (TRAD) under different optimization modes.
FIG. 7 compares different optimization modes at two data set scales.
Detailed Description
The present invention is described in detail below with reference to the attached drawings.
A multi-label song list recommendation method based on an improved collaborative filtering algorithm uses crawled NetEase Cloud Music song list data on the scale of one hundred thousand song lists, divided in a 1:9 ratio into test set song list data $L_{test}$ and training set song list data $L_{train}$.
First, the label relevance weight of each song list is calculated from $L_{train}$; then a locality-sensitive hashing algorithm is applied separately to the song, singer and user information in $L_{train}$ to compute the song information hash bucket $Sim_{LM}$, the singer information hash bucket $Sim_{LS}$ and the user information hash bucket $Sim_{LU}$.
Next, the locality-sensitive hashing algorithm hash-maps the song, singer and user information of the song lists in $L_{test}$ into the corresponding hash buckets to obtain the candidate set of similar song lists.
Then the initial set $Rec_T$ of labels to be recommended is calculated from the candidate set of similar song lists.
Finally, the song list label sets in the training set are mined with the FP-Growth algorithm to obtain a set of label association rules, the labels of the initial set of labels to be recommended are reordered according to that rule set, and the top A labels in the resulting order are recommended, so that labels are recommended for song list data quickly and accurately. Here A is a preset value with 0 < A ≤ 73.
The method specifically comprises the following steps:
Step 1: calculate the song list relevance weight [Figure BDA0002991293610000081] from the label combination frequency and the inverse label frequency of the song list labels. Specifically, the song list relevance weight [Figure BDA0002991293610000082] is calculated as follows:
Step 1.1: first divide the song list data in a 1:9 ratio into a test set $L_{test}$ and a training set $L_{train}$; calculate the frequency [Figure BDA0002991293610000083] with which the label combination of each song list $L_i$ in $L_{train}$ occurs in the whole of $L_{train}$, and the frequency [Figure BDA0002991293610000091] with which each label $t_i$ of song list $L_i$ occurs on its own in $L_{train}$;
Step 1.2: the song list relevance weight [Figure BDA0002991293610000092] is calculated by the formula:
[Figure BDA0002991293610000093]
Step 2: use the Min-Hash algorithm to reduce the song information, singer information and user information of the song list samples in $L_{train}$ to N x K dimensions, generating three signature matrices, where N is the number of song lists and K is the number of random permutation hash functions in the Min-Hash algorithm. Each song list $L_i$ in $L_{train}$ contains three types of data objects: its song set $LM_i$, its singer set $LS_i$ and its creator $LU_i$.
The song list-song signature matrix $S_M$ is generated as follows:
Step 2.1: map the songs of each song list $L_i$ in the training set to song index values, generating the song index list [Figure BDA0002991293610000094] corresponding to $L_i$, expressed as:
[Figure BDA0002991293610000095]
where [Figure BDA0002991293610000096] denotes the index value of the j-th song in the training set song lists, with j ranging over the songs in song list $L_i$.
Step 2.2: initialize k and generate k different random permutation functions $h_i(x)$, each expressed as:
[Figure BDA0002991293610000097]
where HASH_PRIME is a large prime number, $a_i$ and $b_i$ are random numbers in [1, HASH_PRIME - 1], and [Figure BDA0002991293610000098] denotes the input parameter.
Step 2.3: initialize the signature vector [Figure BDA00029912936100000910] of [Figure BDA0002991293610000099]. The update strategy is to substitute each song index [Figure BDA00029912936100000912] in [Figure BDA00029912936100000911] into the k-th permutation function $h_k$ and update the k-th component [Figure BDA00029912936100000913] of the signature vector [Figure BDA00029912936100000918] to the minimum value obtained, expressed as:
[Figure BDA00029912936100000914]
Step 2.4: repeat step 2.3 until the signature vector [Figure BDA00029912936100000916] of [Figure BDA00029912936100000915] has been fully updated, then update the remaining N-1 song lists in $L_{train}$ in the same way, finally obtaining the mapped song list-song signature matrix [Figure BDA00029912936100000917], expressed as:
[Figure BDA0002991293610000101]
The song list-singer signature matrix and the user-song list signature matrix are generated from the song list data in the training set $L_{train}$ in the same way as the song list-song signature matrix.
Step 3: perform LSH bucketing on the dimension-reduced song list signature matrices of $L_{train}$, so that similar samples are placed in the same hash bucket and dissimilar samples are, with high probability, not placed in the same hash bucket. LSH bucketing is applied separately to the signature matrices of the song information, singer information and user information in $L_{train}$, yielding three different hash buckets. The concept of a hash bucket is as follows:
A hash bucket is an identified region of a hash table that stores objects after hashing; if several objects fall into the same bucket after hashing, a collision has occurred, and the objects in that bucket have high similarity.
The specific process is as follows:
Step 3.1: divide the song list-song signature matrix $S_M$ into b segments, each consisting of r rows, where b x r = k;
Step 3.2: for each column of $S_M$, that is, the signature vector of each song list, hash each segment into a bucket, obtaining a number of hash buckets; song lists placed in the same hash bucket on any segment are regarded as similar song lists.
Step 3.3: finally, only the hash buckets to which the target song list in the test set maps need to be computed; the song lists in those buckets are then taken as the set of song lists similar to the target song list.
Step 3.4: apply steps 3.1 to 3.3 to the generated song list-song, song list-singer and user-song list signature matrices to produce the corresponding song list-song, song list-singer and user-song list hash buckets, as follows:
Song list-song hash bucket:
$Sim_{LM} = \{L_i \to m_j \mid i \in L, j \in M\}$
Song list-singer hash bucket:
$Sim_{LS} = \{L_i \to s_k \mid i \in L, k \in S\}$
User-song list hash bucket:
$Sim_{LU} = \{u_j \to L_i \mid i \in L, j \in U\}$
where M denotes the set of songs in $L_{train}$, S the set of singers in $L_{train}$, and U the set of users in $L_{train}$.
Step 4: apply Min-Hash dimension reduction to the song information, singer information and user information of the target song list in $L_{test}$, and input the resulting signature vectors into the corresponding song list-song, song list-singer and user-song list hash buckets for fast retrieval of similar song lists, obtaining the candidate set $Sim_{set}$ of song lists similar to the target song list.
Step 5: from the candidate set $Sim_{set}$ and the song list label relevance weight [Figure BDA0002991293610000111], compute the set $Rec_T$ of labels to be recommended, consisting of the z labels with the largest recommendation index [Figure BDA0002991293610000112]. The specific process is as follows:
Step 5.1: initialize the full label set [Figure BDA0002991293610000113], where c = 73 indicates that there are 73 different labels in total, with the recommendation index [Figure BDA0002991293610000114] of every label initialized to 1;
Step 5.2: update the recommendation index [Figure BDA0002991293610000116] of the labels in Tag using the song list label relevance weight [Figure BDA0002991293610000115], with the formula:
[Figure BDA0002991293610000117]
Step 5.3: compute the recommendation indexes [Figure BDA0002991293610000118] over the candidate set $Sim_{set}$ of song lists similar to the target song list, then sort the labels by recommendation index [Figure BDA0002991293610000119] from high to low to obtain the set $Rec_T$ of the z labels with the largest recommendation index [Figure BDA00029912936100001110]:
[Figure BDA00029912936100001111]
where [Figure BDA00029912936100001112].
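The flow of step 5 (initialize every label's recommendation index to 1, update it from the candidate song lists, keep the top z) can be sketched in Python as below. The actual update formula survives only as an image reference, so this sketch ASSUMES a simple additive update in which each candidate song list adds its label relevance weight to the index of each of its labels. The function name, the argument layout and the additive rule are all hypothetical.

```python
def top_z_labels(candidate_ids, weights, all_labels, z):
    """Rank labels over a candidate set and return the top z as (label, index) pairs.

    candidate_ids: ids of the song lists in the similar-list candidate set.
    weights: dict of song list id -> {label: relevance weight}.
    all_labels: the full label set (73 labels in the source).
    """
    # Step 5.1: every label starts with recommendation index 1.
    index = {label: 1.0 for label in all_labels}

    # Step 5.2 (ASSUMED additive form): accumulate weights over the candidates.
    for list_id in candidate_ids:
        for label, weight in weights.get(list_id, {}).items():
            if label in index:
                index[label] += weight

    # Step 5.3: sort from high to low, breaking ties by label name, keep z.
    ranked = sorted(index.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:z]
```

Under this assumed rule, a label carried by many candidate song lists with high relevance weights rises to the top, while labels absent from the candidate set stay at the initial value of 1.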
Step 6: training set L through FP-Growth algorithm train Chinese song list label combination L Tag Mining the label association rule to obtain the association rule set rules of the label T (ii) a The specific process is as follows:
setting a minimum support min _ supp and a minimum confidence min _ conf threshold value according to experience through an FP-Growth algorithm, and setting L train When the song list label combination in the song list is used for carrying out label association rule mining, the label combination with the support degree smaller than the minimum support degree is deleted, then the association rule set with the confidence degree smaller than the minimum confidence degree is filtered, and finally the label association rule set rules meeting the threshold value is obtained T (ii) a The calculation formulas of the support degree and the confidence degree are as follows;
Support:
$$\mathrm{supp}(t_i \rightarrow t_j) = \frac{\delta(t_i, t_j)}{\delta(L_{Tag})}$$
In the formula, δ(t_i, t_j) denotes the number of times the song list label combination [t_i, t_j] occurs among the label combinations in L_train, and δ(L_Tag) denotes the total number of label combinations in L_train. supp(t_i → t_j) denotes the support of this combination; a larger value indicates that the combination occurs more frequently and that the two labels are more closely related.
Confidence:
$$\mathrm{conf}(t_i \rightarrow t_j) = \frac{\delta(t_i, t_j)}{\delta(t_i)}$$
In the formula, δ(t_i) denotes the total number of occurrences of label t_i, and conf(t_i → t_j) denotes the probability that label t_j appears given that label t_i appears. A confidence of 1 indicates that whenever label t_i is present, label t_j must also be present.
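The two measures can be illustrated with a short Python sketch. It assumes that each song list contributes exactly one label combination, so that δ(L_Tag) is simply the number of song lists; the function names and the sample labels are hypothetical.

```python
from typing import Sequence


def support(tag_combos: Sequence[Sequence[str]], t_i: str, t_j: str) -> float:
    """supp(t_i -> t_j): share of label combinations containing both labels."""
    both = sum(1 for combo in tag_combos if t_i in combo and t_j in combo)
    return both / len(tag_combos)


def confidence(tag_combos: Sequence[Sequence[str]], t_i: str, t_j: str) -> float:
    """conf(t_i -> t_j): share of combinations with t_i that also contain t_j."""
    first = sum(1 for combo in tag_combos if t_i in combo)
    both = sum(1 for combo in tag_combos if t_i in combo and t_j in combo)
    return both / first if first else 0.0


# Four song lists, one label combination each (made-up sample data).
COMBOS = [["rock", "pop"], ["rock", "pop", "live"], ["rock"], ["jazz"]]
```

Here `support(COMBOS, "rock", "pop")` is 2/4, while `confidence(COMBOS, "pop", "rock")` is 1 because every combination containing "pop" also contains "rock"; the reverse rule has confidence 2/3, which shows that confidence is not symmetric.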
The specific process of step 6 is as follows:
Step 1: construct the item header table and an empty FP-tree, scan the song list label combinations in L_train and count each label combination; according to the counts, delete the label combinations whose support is below the minimum support min_supp to obtain the first frequent item set, store it in the item header table, and then sort it in descending order of support;
Step 2: scan the song list label combinations in L_train a second time, remove the infrequent items, and sort the remainder in descending order of support to obtain the sorted frequent item sets;
Step 3: insert the frequent item sets from Step 2 into the FP-tree in order (in the FP-tree, ancestor nodes are the higher-ranked items and descendant nodes are the lower-ranked items);
Step 4: recursively mine the frequent item sets through the item header table, and filter out those that do not satisfy the minimum confidence min_conf to obtain the association rule set:
Figure BDA0002991293610000123
where rules_T denotes the mined label association rule set, d denotes the total number of mined association rules, and R_i denotes the i-th association rule together with its confidence;
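The first three steps (header table, second scan, tree insertion) can be sketched as follows. This is a minimal illustration and not the full algorithm: it assumes the first pass counts individual labels, as in standard FP-Growth, and it omits the recursive mining and confidence filtering of Step 4. `FPNode` and `build_fp_tree` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Optional, Sequence, Tuple


class FPNode:
    """One node of the FP-tree: a label, its count and its children."""

    def __init__(self, tag: Optional[str], parent: Optional["FPNode"]) -> None:
        self.tag = tag
        self.parent = parent
        self.count = 0
        self.children: Dict[str, "FPNode"] = {}


def build_fp_tree(
    tag_combos: Sequence[Sequence[str]], min_supp: float
) -> Tuple[FPNode, Dict[str, int]]:
    n = len(tag_combos)
    # Pass 1: count each label and keep those meeting min_supp (header table).
    counts = Counter(tag for combo in tag_combos for tag in set(combo))
    header = {tag: c for tag, c in counts.items() if c / n >= min_supp}
    root = FPNode(None, None)
    # Pass 2: drop infrequent labels, order by descending support, insert.
    for combo in tag_combos:
        ordered = sorted(
            (tag for tag in set(combo) if tag in header),
            key=lambda tag: (-header[tag], tag),
        )
        node = root
        for tag in ordered:
            if tag not in node.children:
                node.children[tag] = FPNode(tag, node)
            node = node.children[tag]
            node.count += 1
    return root, header
```

Because every combination is inserted in the same support order, combinations that share frequent labels share a path from the root, which is what keeps the tree compact.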
and 7: according to the slave L train The set of related rules of the tags mined in T For the first z recommendation indexes
Figure BDA0002991293610000131
Maximum set of tags Rec to be recommended T The labels in (1) are reordered, and then the first A labels are selected as the final recommendation result of the target song list.
The specific process is as follows:
step 7.1: according to the implicit relation between the labels shown by the association rule, the label set Rec to be recommended T Index for making label recommendation
Figure BDA0002991293610000132
Resequencing, countingThe calculation formula is as follows:
Figure BDA0002991293610000133
step 7.2: finally, according to the label recommendation index
Figure BDA0002991293610000134
Result of (2) treating the recommended tab set Rec T And sorting from high to low again, and selecting the top A labels as the final label recommendation result of the target song list.
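The reordering in Step 7 can be sketched as follows. The source gives the actual formula only as a figure, so this sketch assumes one plausible form: a candidate label t_j is boosted by conf(t_i → t_j) times the index of every other candidate t_i that implies it. The `Rule` tuple layout and the function name `rerank` are hypothetical.

```python
from typing import Dict, List, Tuple

# (antecedent t_i, consequent t_j, confidence); layout is an assumption.
Rule = Tuple[str, str, float]


def rerank(rec: Dict[str, float], rules: List[Rule], top_a: int) -> List[str]:
    """Reorder candidate labels with association rules and keep the top A.

    Assumption (not from the source): each rule t_i -> t_j whose two labels
    are both candidates adds conf * index(t_i) to the index of t_j.
    """
    boosted = dict(rec)
    for t_i, t_j, conf in rules:
        if t_i in rec and t_j in rec:
            boosted[t_j] += conf * rec[t_i]
    # Step 7.2: sort from high to low again (ties broken by label name).
    ranked = sorted(boosted, key=lambda tag: (-boosted[tag], tag))
    return ranked[:top_a]
```

In this sketch a label that starts slightly behind can overtake the leader when a high-confidence rule points to it, which is the effect the rearrangement step is meant to achieve.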
First, the crawled Internet music song list data are divided into a test set L_test and a training set L_train in a ratio of 1:9. The label relevance weight of each song list in L_train is calculated from the L_train song list set, and the L_train song list data are then reduced in dimension and divided into buckets. For the target song lists in L_test, an initial label recommendation is produced, and finally this initial result is reordered using the label association rules to obtain a more accurate label recommendation result, thereby achieving fast and accurate label recommendation for large-scale song list data.
Song list data are huge in scale, high-dimensional and sparse, and unevenly distributed across labels. Based on these characteristics, the invention combines the multidimensional information of the data with the distribution of song list label combinations, and improves the accuracy of label recommendation by introducing the FP-Growth association rule algorithm and by calculating a song list relevance weight from the song list label combination frequency and the inverse label frequency. Introducing the Min-Hash-based locality-sensitive hashing algorithm into the collaborative filtering algorithm greatly improves the efficiency of retrieving similar song lists and reduces the time the model needs to produce a recommendation. Compared with a song list label recommendation model based on traditional collaborative filtering, the proposed model therefore achieves much higher recommendation accuracy, and its low computational complexity makes real-time song list label recommendation possible. It is worth noting that, compared with a recommendation model based on deep learning, the proposed method is more compatible with the collaborative filtering recommendation models currently used by online music platforms, so the cost and risk of upgrading the system recommendation algorithm are lower.
The following are specific examples.
Example 1
TABLE 1 Experimental data set
Figure BDA0002991293610000141
Referring to (a) and (b) in FIG. 1 and (a) and (b) in FIG. 2, the invention first crawls 100,000 song lists from an Internet cloud music platform. The data contain 73 different labels (Tag) and 16,449 different song list label combinations (L_Tag), as shown in Table 1, and the full song list data are divided into a test set L_test and a training set L_train in a ratio of 1:9. Exploring the labels in the song list data set shows that labels such as Europe and America and fashion are used very frequently across the whole data set, whereas labels such as games and Cantonese are used rarely; the number of song lists per label is therefore highly imbalanced, and the number of song lists owned by each song list label combination is also strongly skewed. Under these conditions, directly applying a traditional collaborative filtering song list label recommendation model biases the results toward popular labels, so the recommendation accuracy of the model remains low even when the similarity between the target song list and other song lists is computed accurately. The invention therefore uses label frequencies to determine a weight that measures the authority of the label combination in a song list: the ability of a given class of labels to express the category of a song list is captured by the song list label combination frequency and the inverse label frequency.
The method specifically comprises the following steps:
Step 1: for each song list L_i in the training set L_train, compute the frequency with which its label combination occurs in the training set L_train
Figure BDA0002991293610000142
and compute the frequency with which each label t_i in song list L_i appears individually in the training set L_train
Figure BDA0002991293610000143
Step 2: the ability of a given class of labels to express the category of a song list is captured by the song list label combination frequency and the inverse label frequency, calculated as follows:
Figure BDA0002991293610000151
In the formula,
Figure BDA0002991293610000152
pertains to the song list L_i, and 1 is added to the numerator to prevent the weight from being negative. Specifically, suppose two song lists L_1 and L_2 have different label combinations whose song list label combination frequencies satisfy
Figure BDA0002991293610000153
but the labels in L_1 are more popular, so that
Figure BDA0002991293610000154
The initial song list weight formula then gives
Figure BDA0002991293610000155
This is reasonable: the popular labels in L_1 have a positive influence on the song list label combination frequencies
Figure BDA0002991293610000156
and
Figure BDA0002991293610000157
so the influence of popular labels must be penalized to some extent in order to obtain a more realistic authority weight for the song list label combination.
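The penalty on popular labels can be illustrated with a TF-IDF-style sketch. The source formula is available only as a figure, so the concrete form below is an assumption: the weight of a song list is its label combination frequency multiplied by the mean of log((N + 1) / n_t) over its labels, where N is the number of song lists and n_t is the number of song lists carrying label t. The name `relevance_weights` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence


def relevance_weights(tag_combos: Sequence[Sequence[str]]) -> List[float]:
    """One weight per song list (assumed TF-IDF-style form, see lead-in)."""
    n = len(tag_combos)
    combo_count = Counter(frozenset(combo) for combo in tag_combos)
    tag_count = Counter(tag for combo in tag_combos for tag in set(combo))
    weights = []
    for combo in tag_combos:
        tags = set(combo)
        if not tags:  # a song list without labels carries no weight
            weights.append(0.0)
            continue
        combo_freq = combo_count[frozenset(tags)] / n
        # Adding 1 to the numerator keeps every logarithm strictly positive.
        inverse = sum(math.log((n + 1) / tag_count[t]) for t in tags) / len(tags)
        weights.append(combo_freq * inverse)
    return weights
```

With this form, two song lists whose label combinations are equally frequent receive different weights when one of them uses more popular labels: the list with the popular labels is weighted lower, mirroring the L_1 versus L_2 discussion above.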
FIG. 3 is a flow chart of the song list multi-label recommendation method of the invention. The method uses song list weight initialization to reduce the influence of popular labels; uses FP-Growth-based label association rule mining to capture high-quality deep implicit relations between labels; and maps the high-dimensional sparse song list data into a low-dimensional dense song list signature matrix with the Min-Hash algorithm, then clusters similar song lists with the LSH algorithm and combines this with collaborative filtering, so that the song list label recommendation task can be performed faster and more accurately. The method specifically comprises the following steps:
Step 1: first, initialize the song list label weight from the song list label combination frequency and the inverse label frequency, and introduce the song list label relevance weight
Figure BDA0002991293610000158
into the song list similarity calculation, so that the label-song list relevance reflects the probability distribution of the labels. The larger
Figure BDA0002991293610000159
is, the higher the authority of the label combination in song list L_i; conversely, the smaller
Figure BDA00029912936100001510
is, the lower the authority of the label combination in song list L_i. The calculation formula is as follows:
Figure BDA00029912936100001511
and 2, step: by training set L train The song list data set respectively calculates the signature matrixes of the song list-song, the song list-singer and the user-song list by using a Min-Hash algorithm, then maps the song list with high similarity in each signature matrix into a corresponding Hash bucket by using an LSH algorithm, and realizes fast retrieval of the test set L according to the three Hash buckets test A selection of similar singing lists for medium target singing lists. The specific retrieval process is as follows:
1) First adopt Min-Hash algorithm generates training set L train The signature matrix of the following three data sets is calculated in the same manner, for example, the singe-song signature matrix is calculated as follows:
1.1 will song list L i The song ID in (1) is mapped to the subscript value in the total list of songs, and the song list L i List of songs in (1)
Figure BDA0002991293610000161
Can be expressed in the following form:
Figure BDA0002991293610000162
1.2 generating k different random permutation functions h i (x) Permutation of parameter a in function i And b i Are random numbers with a value of [1,HASH _PRIME]HASH _ PRIME is a large PRIME number, and the permutation function is expressed as follows:
Figure BDA0002991293610000163
1.3 For the list of songs of song list L_i
Figure BDA0002991293610000164
initialize a signature vector of the form
Figure BDA0002991293610000165
The signature vector is updated by applying each permutation function h_k and setting the corresponding component of the signature vector to the minimum value obtained; the updated signature vector is expressed as follows:
Figure BDA0002991293610000166
1.4 Repeat steps 1.2 and 1.3 until the signature vectors of all song lists have been computed, generating the signature matrix S_M, expressed as follows:
Figure BDA0002991293610000167
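Steps 1.1 to 1.4 can be sketched as follows. The permutation function itself is shown only in a figure, so the sketch assumes the common linear form h_i(x) = (a_i · x + b_i) mod HASH_PRIME; the concrete prime and the function names are likewise assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

# Assumed value: 2**31 - 1 is a large prime; the source does not name one.
HASH_PRIME = 2_147_483_647


def make_permutations(k: int, seed: int = 0) -> List[Tuple[int, int]]:
    """Step 1.2: draw k random (a_i, b_i) pairs."""
    rng = random.Random(seed)
    return [
        (rng.randint(1, HASH_PRIME - 1), rng.randint(1, HASH_PRIME - 1))
        for _ in range(k)
    ]


def minhash_signature(
    song_indexes: Sequence[int], perms: Sequence[Tuple[int, int]]
) -> List[int]:
    """Step 1.3: keep, per permutation, the minimum hash over all songs."""
    signature = [HASH_PRIME] * len(perms)  # upper bound as initial value
    for x in song_indexes:
        for i, (a, b) in enumerate(perms):
            h = (a * x + b) % HASH_PRIME
            if h < signature[i]:
                signature[i] = h
    return signature


def signature_matrix(
    song_lists: Sequence[Sequence[int]], perms: Sequence[Tuple[int, int]]
) -> List[List[int]]:
    """Step 1.4: one k-component signature per song list (N x k)."""
    return [minhash_signature(songs, perms) for songs in song_lists]
```

The signature depends only on which songs a song list contains, not on their order, and two song lists with many songs in common agree on many signature components, which is what the LSH step exploits.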
2) Then, as shown in FIG. 4, LSH bucketing is applied to the computed signature matrix S_M. The process can be divided into three steps:
2.1 Divide the signature matrix S_M into b segments, each consisting of r rows, where b × r = k;
2.2 Hash each segment of each signature vector in the matrix into a bucket; segments that fall into the same bucket are treated as similar segments, and the song lists they belong to are regarded as similar song lists;
2.3 When searching for the similar song lists of a target song list, compute only the hash buckets to which the target song list's signature vector maps, and take the song lists in those buckets as the similar song list candidate set of the target song list.
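Steps 2.1 to 2.3 can be sketched as follows. For simplicity the sketch uses each segment's values directly as the bucket key instead of applying a further hash function, which is an assumption; `build_buckets` and `candidates` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

BucketKey = Tuple[int, Tuple[int, ...]]  # (segment number, segment values)


def build_buckets(
    signatures: Dict[str, Sequence[int]], b: int, r: int
) -> Dict[BucketKey, Set[str]]:
    """Steps 2.1 and 2.2: split each signature into b segments of r values."""
    buckets: Dict[BucketKey, Set[str]] = defaultdict(set)
    for name, sig in signatures.items():
        if len(sig) != b * r:
            raise ValueError("signature length must equal b * r")
        for seg in range(b):
            key = (seg, tuple(sig[seg * r:(seg + 1) * r]))
            buckets[key].add(name)
    return buckets


def candidates(
    target_sig: Sequence[int], buckets: Dict[BucketKey, Set[str]], b: int, r: int
) -> Set[str]:
    """Step 2.3: union of the buckets the target signature maps to."""
    found: Set[str] = set()
    for seg in range(b):
        key = (seg, tuple(target_sig[seg * r:(seg + 1) * r]))
        found |= buckets.get(key, set())
    return found
```

A target song list only needs to match a stored song list on one full segment to retrieve it, so a lookup touches b buckets instead of comparing against every song list in the training set.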
Step 3: using the similar song list candidate set Sim_set of the target song list in the test set L_test and the initial song list weight information
Figure BDA0002991293610000168
calculate the recommendation index of each to-be-recommended label
Figure BDA0002991293610000169
The simplified calculation formula is as follows:
Figure BDA00029912936100001610
Step 4: select the z labels with the largest recommendation indexes as the initial recommendation result Rec_T of the target song list, expressed as follows:
Figure BDA0002991293610000171
Step 5: using the initial recommendation result Rec_T and the label association rules rules_T mined through FP-Growth, reorder the to-be-recommended labels; the reordering is calculated as follows:
Figure BDA0002991293610000172
Step 6: finally, from the recomputed to-be-recommended label set Rec_T, select the labels whose recommendation indexes
Figure BDA0002991293610000173
are the A largest as the label recommendation result for the target song list.
Compared with the traditional collaborative filtering recommendation method, the method has the advantages of high recommendation accuracy, high recommendation speed, capability of performing online real-time recommendation, simplicity and effectiveness.
Fig. 5 compares the improved collaborative filtering multi-label recommendation method (LSH) with the conventional collaborative filtering recommendation method (TRAD) in terms of recommendation accuracy and recommendation efficiency. The recommendation method is evaluated by computing the F1 value from precision and recall and using the F1 value as the evaluation index of the recommendation result. The measures are defined as follows:
Precision:
Figure BDA0002991293610000174
The higher the precision, the larger the proportion of correctly recommended labels among all recommended labels.
Recall:
Figure BDA0002991293610000175
where test_tags_num denotes the number of actually correct labels. The higher the recall, the larger the proportion of correctly recommended labels among all actually correct labels.
F1 value:
Figure BDA0002991293610000181
The F1 value is the weighted harmonic mean of precision and recall and takes the influence of both on the evaluation of model accuracy into account. The higher the F1 value, the more stable the overall accuracy of the recommendation model, and vice versa.
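The three measures can be computed as in the following sketch. It assumes the standard balanced F1, that is, the harmonic mean with equal weight on precision and recall; the function name and the sample labels are hypothetical.

```python
from typing import Iterable, Tuple


def precision_recall_f1(
    recommended: Iterable[str], actual: Iterable[str]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one target song list."""
    rec, act = set(recommended), set(actual)
    hits = len(rec & act)  # correctly recommended labels
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(act) if act else 0.0  # len(act) is test_tags_num
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

For example, recommending three labels of which two are among four actually correct labels gives a precision of 2/3, a recall of 1/2 and an F1 value of 4/7.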
TABLE 2 song list on-line recommendation accuracy comparison results
Figure BDA0002991293610000182
TABLE 3 results of the present invention in the scene of song list online real-time recommendation
Figure BDA0002991293610000183
TABLE 4 detailed description of ablation experiments
Figure BDA0002991293610000184
Tables 2 and 3 show the comparison between the invention and the traditional collaborative filtering recommendation method in the online real-time song list recommendation scenario. To verify whether each proposed improvement is effective, Table 4 and FIGS. 6-7 compare multi-label song list recommendation on real song list data when the improvements are omitted or only partially used. As Tables 2 to 4 and FIGS. 6-7 show, every improvement proposed by the invention substantially improves the song list label recommendation task, yielding better recommendation accuracy and real-time performance.
The invention proposes: 1) a song list relevance weight calculation based on the label combination frequency and the inverse label frequency, 2) a collaborative filtering recommendation method based on LSH/Min-Hash, and 3) a label rearrangement method based on FP-Growth. Together these address the main problems of traditional collaborative filtering in song list label recommendation: low recommendation accuracy, high computational complexity, and the inability to recommend labels for newly created song lists online in real time. To address recommendation efficiency and online real-time recommendation, the invention provides a collaborative filtering song list recommendation method based on LSH/Min-Hash, which quickly retrieves the similar song list candidate set for a target song list, reduces the computational complexity of the algorithm, and supports online label recommendation for newly created song lists. To improve recommendation accuracy, the invention calculates the song list relevance weight from the song list label combination frequency and the inverse label frequency, then calculates the recommendation index of each label with a weighted fusion strategy, and finally reorders the candidate labels with the mined label association rules to obtain the final recommended labels. Experiments on real song list data from the Internet music cloud indicate that, after optimization with the weights and association rules, the F1 value of the recommendation result is greatly improved over the traditional collaborative filtering method, and the average running time is also substantially shortened.
A song list multi-label recommendation system comprising:
the hash bucket calculation module, used for dividing the song list data into a test set and a training set, and for calculating song information hash buckets, singer information hash buckets and user information hash buckets by applying a locality-sensitive hashing algorithm to the song information, singer information and user information in the training set, respectively;
the hash mapping module, used for hash-mapping the song, singer and user information in the test set with the locality-sensitive hashing algorithm according to the song information hash buckets, the singer information hash buckets and the user information hash buckets, respectively, to obtain a similar song list candidate set;
the initial to-be-recommended label set calculation module, used for calculating an initial to-be-recommended label set according to the similar song list candidate set and the label relevance weight of each song list;
and the label reordering module, used for mining the song list label set in the training set with the FP-Growth algorithm to obtain a label association rule set, reordering the labels of the initial to-be-recommended label set according to the label association rule set, and selecting the top A labels for recommendation, thereby realizing song list label recommendation.
A computer device comprising a memory and a processor, the memory having stored thereon a computer program executable on the processor, the computer program when executed by the processor implementing the song list multi-label recommendation method as claimed in any one of claims 1 to 7.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to execute the song list multi-label recommendation method of any one of claims 1 to 7.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.
The invention may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
As used in this disclosure, "module," "device," "system," and the like are intended to refer to a computer-related entity, either hardware, a combination of hardware and software, or software in execution. In particular, for example, an element may be, but is not limited to being, a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. Also, an application or script running on a server, or a server, may be an element. One or more elements may be in a process and/or thread of execution and an element may be localized on one computer and/or distributed between two or more computers and may be operated by various computer-readable media. The elements may also communicate by way of local and/or remote processes based on a signal having one or more data packets, e.g., from a data packet interacting with another element in a local system, distributed system, and/or across a network in the internet with other systems by way of the signal.
Finally, it should also be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it should be noted that: although the present invention has been described in detail with reference to the above embodiments, it should be understood by those skilled in the art that: modifications and equivalents may be made to the embodiments of the invention without departing from the spirit and scope of the invention, which is to be covered by the claims.

Claims (8)

1. A song list multi-label recommendation method is characterized by comprising the following steps:
Step 1: collect song list data and divide it into a test set L_test and a training set L_train; use the Min-Hash algorithm to reduce the song list samples of song information, singer information and user information in the training set L_train to N×K dimensions, generating a user-song list signature matrix, a song list-singer signature matrix and a song list-song signature matrix; where N is the number of song lists and K is the number of random permutation hash functions in the Min-Hash algorithm;
Step 2: perform LSH bucketing on the user-song list signature matrix, the song list-singer signature matrix and the song list-song signature matrix so that similar samples fall into the same hash bucket; LSH bucketing is performed separately on the signature matrices of the song information, singer information and user information of the training set L_train to obtain song list-song hash buckets, song list-singer hash buckets and user-song list hash buckets;
Step 3: apply Min-Hash dimensionality reduction to the song information, singer information and user information of the target song list in the test set L_test, and feed the resulting signature vectors into the corresponding song list-song hash buckets, song list-singer hash buckets and user-song list hash buckets for fast retrieval of similar song lists, obtaining the similar song list candidate set Sim_set of the target song list;
Step 4: according to the similar song list candidate set Sim_set and the song list label relevance weight
Figure FDA0003953299540000011
calculate the z largest recommendation indexes
Figure FDA0003953299540000012
which form the to-be-recommended label set Rec_T;
Step 5: use the FP-Growth algorithm to mine label association rules from the song list label combinations L_Tag in the training set L_train, obtaining the label association rule set rules_T; the specific process of step 5 is as follows:
Step 6.1: construct the item header table and an empty FP-tree, scan the song list label combinations in the training set L_train and count each label combination; according to the counts, delete the label combinations whose support is below the minimum support min_supp to obtain the first frequent item set, store it in the item header table, and then sort it in descending order of support;
Step 6.2: scan the song list label combinations in the training set L_train a second time, remove the infrequent items, and sort the remainder in descending order of support to obtain the sorted frequent item sets;
Step 6.3: insert the frequent item sets from step 6.2 into the FP-tree in order;
Step 6.4: recursively mine the frequent item sets through the item header table, and filter out those that do not satisfy the minimum confidence min_conf to obtain the association rule set:
Figure FDA0003953299540000021
where rules_T denotes the label association rule set, d denotes the total number of mined association rules, and R_i denotes the i-th association rule;
the support is calculated by the following formula:
$$\mathrm{supp}(t_i \rightarrow t_j) = \frac{\delta(t_i, t_j)}{\delta(L_{Tag})}$$
in the formula, δ(t_i, t_j) denotes the number of times the song list label combination [t_i, t_j] occurs among the label combinations in L_train, and δ(L_Tag) denotes the total number of label combinations in L_train; supp(t_i → t_j) denotes the support of this combination;
the confidence is calculated by the following formula:
$$\mathrm{conf}(t_i \rightarrow t_j) = \frac{\delta(t_i, t_j)}{\delta(t_i)}$$
in the formula, δ(t_i) denotes the total number of occurrences of label t_i, and conf(t_i → t_j) denotes the probability that label t_j appears given that label t_i appears;
Step 6: according to the label association rule set rules_T that satisfies the thresholds, take the z largest recommendation indexes
Figure FDA0003953299540000024
and reorder the labels in the corresponding to-be-recommended label set Rec_T; then select the top A labels as the final recommendation result for the target song list.
2. The song list multi-label recommendation method as claimed in claim 1, wherein the song list-song signature matrix S_M in step 1 is generated by the following process:
Step 1): map the songs of each song list in the training set to their song index values, generating for song list L_i the corresponding song index list
Figure FDA0003953299540000025
expressed as follows:
Figure FDA0003953299540000026
where
Figure FDA0003953299540000027
denotes the index value of the j-th song in the training set song list, and j denotes the number of songs in song list L_i;
Step 2): initialize k and generate k different random permutation functions h_i(x), each permutation function h_i(x) being expressed as follows:
Figure FDA0003953299540000031
where HASH_PRIME is a large prime number, a_i and b_i are both random numbers in [1, HASH_PRIME - 1], and
Figure FDA0003953299540000032
denotes the input parameter;
Step 3): for the song index list corresponding to song list L_i
Figure FDA0003953299540000033
initialize the signature vector
Figure FDA0003953299540000034
The update strategy is as follows: from the song index list corresponding to song list L_i
Figure FDA0003953299540000035
each song index
Figure FDA0003953299540000036
is substituted into the k-th permutation function h_k, and in the signature vector
Figure FDA0003953299540000037
the k-th component
Figure FDA0003953299540000038
is updated to the minimum value obtained, expressed as follows:
Figure FDA0003953299540000039
step 4): repeating step 3) until the signature vector of the song subscript list corresponding to song list $L_i$ has been fully updated, then updating the remaining $N-1$ song lists in the training set $L_{train}$ in the same way, finally obtaining the mapped user-song list signature matrix, which is represented as follows:
Figure FDA00039532995400000314
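Steps 1) to 4) above can be sketched in Python as follows. This is an illustrative sketch only: the value chosen for `HASH_PRIME`, the linear form `(a * x + b) % HASH_PRIME` of the permutation functions, and the helper names `make_permutations`, `minhash_signature`, `signature_matrix` and `estimate_jaccard` are assumptions, and each song list is assumed to be non-empty.

```python
import random

HASH_PRIME = 2_147_483_647  # assumed large prime (2**31 - 1)


def make_permutations(k, seed=0):
    """Draw k pairs (a_i, b_i), each in [1, HASH_PRIME - 1]."""
    rng = random.Random(seed)
    return [
        (rng.randint(1, HASH_PRIME - 1), rng.randint(1, HASH_PRIME - 1))
        for _ in range(k)
    ]


def minhash_signature(song_indices, perms):
    """Signature vector: per permutation, the minimum hashed song index."""
    return [min((a * x + b) % HASH_PRIME for x in song_indices) for a, b in perms]


def signature_matrix(song_lists, perms):
    """One signature row per song list, giving an N x k matrix."""
    return [minhash_signature(indices, perms) for indices in song_lists]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching components, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


perms = make_permutations(k=16, seed=1)
matrix = signature_matrix([[3, 7, 42], [7, 42, 99], [42, 3, 7]], perms)
print(estimate_jaccard(matrix[0], matrix[2]))
```

Because each component is a minimum over the set of song indices, two song lists containing the same songs get identical signatures regardless of song order.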
3. The multi-label song list recommendation method as claimed in claim 1, wherein in step 2 the song list-song hash bucket is obtained by the following steps:
step 1): dividing the song list-song signature matrix $S_M$ into $b$ segments, each segment consisting of $r$ rows, wherein $b \times r = k$;
step 2): performing hash bucket division on each segment of each row of the song list-song signature matrix $S_M$ to obtain a plurality of hash buckets; song lists divided into the same hash bucket on any segment are regarded as similar song lists;
step 3): calculating a hash bucket to which a target song list in the test set can be mapped, and taking the song list in the hash bucket as a song list set similar to the target song list so as to obtain a song list-song hash bucket;
the song list-song hash bucket $Sim_{LM}$:
$Sim_{LM} = \{L_i \to m_j \mid i \in L, j \in M\}$
the song list-singer hash bucket $Sim_{LS}$:
$Sim_{LS} = \{L_i \to s_k \mid i \in L, k \in S\}$
the user-song list hash bucket $Sim_{LU}$:
$Sim_{LU} = \{u_j \to L_i \mid i \in L, j \in U\}$
wherein $M$ denotes the set of songs in the training set $L_{train}$, $S$ denotes the set of singers in the training set $L_{train}$, and $U$ denotes the set of users in the training set $L_{train}$.
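The bucket division described in this claim can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: the names `build_buckets` and `query` are hypothetical, signatures are assumed to be lists of $k = b \times r$ integers, and a bucket is keyed by the segment number together with the segment's values.

```python
from collections import defaultdict


def _band_keys(signature, b, r):
    """Yield one (segment number, segment values) key per segment."""
    if len(signature) != b * r:
        raise ValueError("signature length must equal b * r")
    for band in range(b):
        yield band, tuple(signature[band * r:(band + 1) * r])


def build_buckets(signatures, b, r):
    """Map every segment key to the set of song lists sharing that segment."""
    buckets = defaultdict(set)
    for name, signature in signatures.items():
        for key in _band_keys(signature, b, r):
            buckets[key].add(name)
    return buckets


def query(buckets, signature, b, r):
    """Return song lists that share at least one segment with the target."""
    candidates = set()
    for key in _band_keys(signature, b, r):
        candidates |= buckets.get(key, set())
    return candidates


train = {"A": [1, 2, 3, 4], "B": [1, 2, 9, 9], "C": [7, 7, 7, 7]}
buckets = build_buckets(train, b=2, r=2)
print(sorted(query(buckets, [1, 2, 0, 0], b=2, r=2)))
```

Increasing $r$ makes a segment match stricter, while increasing $b$ gives a pair of song lists more chances to collide, so the two parameters trade precision against recall of the candidate set.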
4. The method for recommending song list with multiple labels as claimed in claim 1, wherein the specific process of step 4 is as follows:
step 5.1: initializing the full label set Tag as follows:
Figure FDA0003953299540000041
where $c = 73$, and the recommendation index of each label is initialized to 1;
step 5.2: updating the recommendation index of each label in Tag according to the song list label correlation weight, the calculation formula being as follows:
Figure FDA0003953299540000045
step 5.3: calculating the recommendation indexes over the similar song list candidate set $Sim_{set}$ of the target song list, and then sorting the labels by recommendation index from high to low to obtain the to-be-recommended label set $Rec_T$ formed by the $z$ labels with the largest recommendation indexes:
Figure FDA0003953299540000049
Figure FDA00039532995400000410
wherein
Figure FDA00039532995400000411
wherein the song list label correlation weight is calculated by the following formula:
Figure FDA00039532995400000413
wherein the quantity shown in Figure FDA00039532995400000414 is the frequency with which the label combination of each song list $L_i$ of the training set $L_{train}$ occurs in the training set $L_{train}$, $L_i$ is a song list, and $t_i$ is a label of song list $L_i$.
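The selection of the $z$ labels with the largest recommendation indexes can be sketched in Python as follows. The exact update formula of step 5.2 is given only as an image in the source, so the additive update used here (each similar song list adds its correlation weight to the index of each of its labels) is purely an assumption for illustration, as are the function name `top_z_labels` and the example weights.

```python
def top_z_labels(all_labels, similar_lists, weights, z):
    """Rank labels by recommendation index and keep the z largest.

    Assumption: every index starts at 1 (as in step 5.1) and each similar
    song list adds its weight to the index of each label it carries.
    """
    index = {label: 1.0 for label in all_labels}
    for list_id, labels in similar_lists.items():
        weight = weights.get(list_id, 0.0)
        for label in labels:
            if label in index:
                index[label] += weight
    # Sort from high to low; ties are broken alphabetically for determinism.
    ranked = sorted(index, key=lambda label: (-index[label], label))
    return [(label, index[label]) for label in ranked[:z]]


similar = {"L1": ["rock", "pop"], "L2": ["rock"], "L3": ["jazz"]}
weights = {"L1": 0.5, "L2": 0.25, "L3": 0.1}
print(top_z_labels(["rock", "pop", "jazz", "folk"], similar, weights, z=2))
```

The returned list plays the role of the to-be-recommended label set $Rec_T$, which a later step reorders using the label association rules.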
5. The method for recommending song list with multiple labels as claimed in claim 1, wherein the specific process of step 6 is as follows:
step 7.1: according to the implicit relations between labels revealed by the association rules, re-ranking the label recommendation indexes of the to-be-recommended label set $Rec_T$, the calculation formula being as follows:
Figure FDA0003953299540000051
step 7.2: finally, according to the resulting label recommendation indexes, sorting the to-be-recommended label set $Rec_T$ from high to low again, and selecting the first $A$ labels as the final label recommendation result of the target song list.
6. A song list multi-tag recommendation system, comprising:
a signature matrix generation module for collecting song list data, dividing the song list data into a test set $L_{test}$ and a training set $L_{train}$, and using the Min-Hash algorithm to reduce the song information, singer information and user information song list samples of the training set $L_{train}$ to $N \times K$ dimensions respectively, generating a user-song list signature matrix, a song list-singer signature matrix and a song list-song signature matrix; wherein $N$ is the number of song lists, and $K$ is the number of random permutation hash functions in the Min-Hash algorithm;
a hash bucketing module for performing LSH bucketing on the user-song list signature matrix, the song list-singer signature matrix and the song list-song signature matrix of the training set $L_{train}$ respectively, so that similar samples are divided into the same hash bucket, obtaining a song list-song hash bucket, a song list-singer hash bucket and a user-song list hash bucket;
a signature vector retrieval module for performing Min-Hash dimensionality reduction on the song information, singer information and user information of the target song list in the test set $L_{test}$ respectively, and inputting the resulting signature vectors into the corresponding song list-song hash bucket, song list-singer hash bucket and user-song list hash bucket for fast retrieval of similar song lists, obtaining the similar song list candidate set $Sim_{set}$ of the target song list;
a to-be-recommended label set calculation module for calculating, according to the similar song list candidate set $Sim_{set}$ and the song list label correlation weights, the to-be-recommended label set $Rec_T$ formed by the $z$ labels with the largest recommendation indexes;
a label association rule mining module for mining label association rules from the song list label combinations $L_{Tag}$ in the training set $L_{train}$ through the FP-Growth algorithm, obtaining the label association rule set $rules_T$;
a reordering module for reordering, according to the label association rule set $rules_T$ satisfying the threshold, the labels in the to-be-recommended label set $Rec_T$ formed by the $z$ labels with the largest recommendation indexes, and then selecting the first $A$ labels as the final recommendation result of the target song list.
7. A computer device, characterized in that the computer device comprises a memory and a processor, the memory having stored thereon a computer program operable on the processor, the computer program, when executed by the processor, implementing the song list multi-label recommendation method as claimed in any one of claims 1 to 5.
8. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, causes the processor to execute the song list multi-label recommendation method as claimed in any one of claims 1 to 5.
CN202110316152.2A 2021-03-24 2021-03-24 Multi-label song menu recommendation method, system, equipment and storage medium Active CN113220931B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110316152.2A CN113220931B (en) 2021-03-24 2021-03-24 Multi-label song menu recommendation method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110316152.2A CN113220931B (en) 2021-03-24 2021-03-24 Multi-label song menu recommendation method, system, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113220931A CN113220931A (en) 2021-08-06
CN113220931B true CN113220931B (en) 2023-01-03

Family

ID=77083949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110316152.2A Active CN113220931B (en) 2021-03-24 2021-03-24 Multi-label song menu recommendation method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113220931B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227881A (en) * 2016-08-04 2016-12-14 腾讯科技(深圳)有限公司 A kind of information processing method and server
CN108334601A (en) * 2018-01-31 2018-07-27 腾讯音乐娱乐科技(深圳)有限公司 Song recommendations method, apparatus and storage medium based on label topic model
CN108717442A (en) * 2018-05-16 2018-10-30 成都市极米科技有限公司 Similar video display based on machine learning recommend method
CN109102326A (en) * 2018-07-15 2018-12-28 山东工业职业学院 A kind of cloud food and drink platform and analysis method based on big data signature analysis
CN109299366A (en) * 2018-09-28 2019-02-01 西安交通大学深圳研究院 A kind of network data classification recommender system calculated in real time based on content similarity
CN109299313A (en) * 2018-08-03 2019-02-01 昆明理工大学 A kind of song recommendations method based on FP-growth
KR20200045310A (en) * 2018-10-22 2020-05-04 삼성에스디에스 주식회사 Method for recommending information based on hashtag and terminal for executing the same
CN111738807A (en) * 2020-07-23 2020-10-02 上海众旦信息科技有限公司 Method, computing device, and computer storage medium for recommending target objects

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A conceptual framework for building a mobile services' recommendation engine"; Ivan Ganchev et al.; 《International IEEE Conference on Intelligent Systems》; 20161110; full text *
"LSH-RTRS:基于局部敏感哈希技术的能耗社区实时推荐***"; Han Junhui et al.; 《Wanfang Data Knowledge Service Platform》; 20180427; full text *

Also Published As

Publication number Publication date
CN113220931A (en) 2021-08-06

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant