CN105868372A - Label distribution method and device - Google Patents

Label distribution method and device

Info

Publication number
CN105868372A
Authority
CN
China
Prior art keywords
song
user
matrix
sample
scoring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610194484.7A
Other languages
Chinese (zh)
Other versions
CN105868372B (en)
Inventor
林锡雄
赵忠
陈胜凯
李祖辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Kugou Computer Technology Co Ltd
Original Assignee
Guangzhou Kugou Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Kugou Computer Technology Co Ltd
Priority to CN201610194484.7A
Publication of CN105868372A
Application granted
Publication of CN105868372B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60 Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/686 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a label distribution method and device, and belongs to the field of song classification. The method comprises: obtaining a sample song, wherein the sample song comprises a pre-annotated song label; obtaining a user rating matrix, wherein the user rating matrix comprises scores of at least one user for the sample song, and the scores are calculated according to the users' operation behaviors on the sample song; generating a song classifier according to the user rating matrix and the song label of the sample song; and assigning song labels to all songs in a song library by means of the song classifier. With this method and device, behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified with the classifier, which increases the accuracy of song classification.

Description

Label distribution method and device
Technical field
The present disclosure relates to the field of song classification, and in particular to a label distribution method and device.
Background
To facilitate song recommendation, major music platforms assign a wide variety of song labels to the songs in their song libraries according to elements such as genre, style, and scene.
Because a song library contains a huge number of songs, assigning song labels manually is too costly. Major music platforms therefore usually build a song classifier that assigns song labels to songs automatically. To build the song classifier, song labels are first assigned manually to a number of sample songs, and song features such as timbre, rhythm, pitch, and lyrics are extracted from the sample songs; the song classifier is then constructed from the song labels and song features of the sample songs. For a song that has not yet been assigned a song label, the song classifier assigns the corresponding song label according to that song's features.
For songs whose song features are indistinct, or whose song features closely resemble those of other songs, assigning song labels according to song features has low accuracy, which degrades song classification.
Summary of the invention
To solve the problem that, for songs whose song features are indistinct or highly similar, assigning song labels according to song features has low accuracy and degrades song classification, the present disclosure provides a label distribution method and device. The technical solutions are as follows:
According to a first aspect of the embodiments of the present disclosure, a label distribution method is provided, the method including:
obtaining a sample song, the sample song comprising a pre-annotated song label;
obtaining a user rating matrix, the user rating matrix comprising scores of at least one user for the sample song, the scores being calculated according to the user's operation behaviors on the sample song;
generating a song classifier according to the user rating matrix and the song label of the sample song; and
assigning a song label to each song in a song library by means of the song classifier.
Optionally, the user's operation behaviors on the sample song include at least one of playing, downloading, collecting, and sharing;
obtaining the user rating matrix includes:
calculating the score of at least one user for the sample song according to the weight value corresponding to each type of operation behavior; and
generating the user rating matrix from the scores of the at least one user for each sample song.
Optionally, after obtaining the user rating matrix, the method further includes:
optimizing the user rating matrix using a TF-IDF (Term Frequency-Inverse Document Frequency) model.
Optionally, for any score in the user rating matrix, optimizing the user rating matrix using the TF-IDF model includes:
obtaining a score C_ij in the user rating matrix, where C_ij denotes the score of user i for song j;
calculating the term frequency tf_ij corresponding to C_ij, where k is a control parameter;
calculating the inverse document frequency idf_j corresponding to song j, where N is the total number of sample songs and n_j is the number of users in the user rating matrix whose score for song j is not 0;
calculating the optimized score w_ij from tf_ij and idf_j, where w_ij = tf_ij * idf_j; and
generating the optimized user rating matrix from w_ij.
Optionally, the optimized user rating matrix is an x × y matrix, and after optimizing the user rating matrix using the TF-IDF model, the method further includes:
performing implicit matrix factorization on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x × z matrix, the second matrix is a z × y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features;
generating the song classifier according to the user rating matrix and the song label of the sample song includes:
generating the song classifier according to the second matrix and the song label of the sample song.
Optionally, generating the song classifier according to the second matrix and the song label of the sample song includes:
determining a predetermined proportion of the sample songs as a training set, the predetermined proportion being greater than 50%;
building the song classifier according to the second matrix and the training set; and
testing the classification performance of the song classifier using the sample songs other than the training set.
According to a second aspect of the embodiments of the present disclosure, a label distribution device is provided, the device including:
a sample acquisition module, configured to obtain a sample song, the sample song comprising a pre-annotated song label;
a matrix acquisition module, configured to obtain a user rating matrix, the user rating matrix comprising scores of at least one user for the sample song, the scores being calculated according to the user's operation behaviors on the sample song;
a generation module, configured to generate a song classifier according to the user rating matrix and the song label of the sample song; and
a distribution module, configured to assign a song label to each song in a song library by means of the song classifier.
Optionally, the user's operation behaviors on the sample song include at least one of playing, downloading, collecting, and sharing;
the matrix acquisition module includes:
a calculation submodule, configured to calculate the score of at least one user for the sample song according to the weight value corresponding to each type of operation behavior; and
a matrix generation submodule, configured to generate the user rating matrix from the scores of the at least one user for each sample song.
Optionally, the device further includes:
an optimization module, configured to optimize the user rating matrix using a TF-IDF model.
Optionally, for any score in the user rating matrix, the optimization module includes:
an acquisition submodule, configured to obtain a score C_ij in the user rating matrix, where C_ij denotes the score of user i for song j;
a first calculation submodule, configured to calculate the term frequency tf_ij corresponding to C_ij, where k is a control parameter;
a second calculation submodule, configured to calculate the inverse document frequency idf_j corresponding to song j, where N is the total number of sample songs and n_j is the number of users in the user rating matrix whose score for song j is not 0;
a third calculation submodule, configured to calculate the optimized score w_ij from tf_ij and idf_j, where w_ij = tf_ij * idf_j; and
an optimization submodule, configured to generate the optimized user rating matrix from w_ij.
Optionally, the optimized user rating matrix is an x × y matrix, and the device further includes:
a factorization module, configured to perform implicit matrix factorization on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x × z matrix, the second matrix is a z × y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features;
the generation module is configured to:
generate the song classifier according to the second matrix and the song label of the sample song.
Optionally, the generation module includes:
a determination submodule, configured to determine a predetermined proportion of the sample songs as a training set, the predetermined proportion being greater than 50%;
a building submodule, configured to build the song classifier according to the second matrix and the training set; and
a test submodule, configured to test the classification performance of the song classifier using the sample songs other than the training set.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
Sample songs are annotated with song labels in advance, a corresponding user rating matrix is obtained from users' operation behaviors on the sample songs, a song classifier is generated from this user rating matrix and the song labels of the sample songs, and the song classifier assigns song labels to all songs in the song library. This solves the problem that, for songs whose song features are indistinct or highly similar, assigning song labels according to song features has low accuracy and degrades song classification. Behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified with this song classifier, which improves the accuracy of song classification.
It should be understood that the above general description and the following detailed description are exemplary only and do not limit the present disclosure.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
Fig. 1 is a flowchart of a label distribution method according to an exemplary embodiment;
Fig. 2A is a flowchart of a label distribution method according to another exemplary embodiment;
Fig. 2B is a flowchart of the process of obtaining the user rating matrix in the label distribution method shown in Fig. 2A;
Fig. 2C is a flowchart of the song classifier testing process in the label distribution method shown in Fig. 2A;
Fig. 3 is a block diagram of a label distribution device according to an exemplary embodiment;
Fig. 4 is a block diagram of a label distribution device according to another exemplary embodiment.
Detailed description
Exemplary embodiments are described in detail here, and examples of them are illustrated in the accompanying drawings. When the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The label distribution method provided by the embodiments of the present disclosure may be implemented by a background server of a music platform. The background server may be a single server, a server cluster composed of several servers, a cloud computing center, or the like.
Fig. 1 is a flowchart of a label distribution method according to an exemplary embodiment. This embodiment is described taking as an example the application of the label distribution method to the background server of a music platform. The method may include the following steps:
In step 101, a sample song is obtained, the sample song comprising a pre-annotated song label.
Some of the songs in the music platform's song library are annotated with song labels in advance and serve as sample songs. A song label identifies the genre, style, scene, and so on of a sample song. When performing label distribution, the background server obtains the sample songs from the song library.
In step 103, a user rating matrix is obtained. The user rating matrix comprises scores of at least one user for the sample song, and the scores are calculated according to the user's operation behaviors on the sample song.
The background server collects users' operation behaviors on songs, such as playing, downloading, or sharing a song, and calculates each user's score for the song from these operation behaviors. A higher score indicates a stronger preference of the user for the song.
In step 105, a song classifier is generated according to the user rating matrix and the song label of the sample song.
From the user rating matrix and the song labels of the sample songs, the background server can further determine the song features that users prefer and the song labels corresponding to those song features, and generate the corresponding song classifier.
In step 107, a song label is assigned to each song in the song library by means of the song classifier.
Using the generated song classifier and the user rating matrix of users for the remaining songs in the song library, the background server assigns a corresponding song label to each song in the song library.
In summary, in the label distribution method provided by this embodiment, sample songs are annotated with song labels in advance, a corresponding user rating matrix is obtained from users' operation behaviors on the sample songs, a song classifier is generated from this user rating matrix and the song labels of the sample songs, and the song classifier assigns song labels to all songs in the song library. This solves the problem that, for songs whose song features are indistinct or highly similar, assigning song labels according to song features has low accuracy and degrades song classification. Behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified with this song classifier, which improves the accuracy of song classification.
Fig. 2A is a flowchart of a label distribution method according to another exemplary embodiment. This embodiment is described taking as an example the application of the label distribution method to the background server of a music platform. The method may include the following steps:
In step 201, a sample song is obtained, the sample song comprising a pre-annotated song label.
Because the music platform's song library contains a huge number of songs, annotating every song with song labels manually would take a great deal of time and be very costly. In the label distribution method provided by this embodiment, before the song classifier is built, only a small portion of the songs in the song library needs to be designated as sample songs and annotated with song labels manually. A song label indicates the genre of a song (jazz, rock, pop, classical, etc.), its style (sentimental, joyful, quiet, etc.), its scene (sports, bar, nightclub, competition, etc.), and so on.
For example, the music platform's song library contains 1,000,000 songs. Before the song classifier is built, 10,000 songs are selected as sample songs and annotated with song labels. It should be noted that the selected sample songs need to cover all genres, styles, and scenes in the song library, so that the sample songs are comprehensive and representative.
When building the song classifier, the background server obtains these sample songs. Schematically, the correspondence between sample songs and song labels may be as shown in Table 1.
Table 1
Sample song    Song label
Song 00001     Classical, quiet, wedding
Song 00002     Rock, frenzied, bar
Song 10000     Pop, festive, wedding
In step 202, a user rating matrix is obtained. The user rating matrix comprises scores of at least one user for the sample song, and the scores are calculated according to the user's operation behaviors on the sample song.
While listening to songs, a user performs operations on them, and these operation behaviors produce a series of user behavior data. For example, when the user chooses to listen to a song, play data for the song is produced, and this play data may include the play count; when the user downloads a song, download data for the song is produced; and when the user shares a song with a friend, sharing data for the song is produced.
The music platform that provides song playback can collect this user behavior data and analyze it to obtain the user's score for each song, and then determine from the score how much the user likes the song.
In one possible implementation, as shown in Fig. 2B, this step may include the following steps.
In step 202A, the score of at least one user for the sample song is calculated according to the weight value corresponding to each type of operation behavior.
While listening to songs, a user can perform different types of operations, such as playing, downloading, collecting, or sharing a song. Different types of operation behavior indicate different degrees of liking for a song. For example, collecting a song indicates that the user likes it more than merely playing it does. Therefore, when the user's score for a song is calculated from the user's operation behaviors on the song, different types of operation behavior correspond to different weight values. Schematically, the weight values corresponding to different types of operation behavior may be as shown in Table 2.
Table 2
Operation behavior type    Weight value
Play                       1
Download                   3
Collect                    5
Share                      10
When calculating the score of at least one user for the sample song according to the weight value corresponding to each type of operation behavior, the background server also needs to take into account the number of times each operation behavior occurred. For example, if user 00001 has played song 00001 5 times and shared it once, the score of user 00001 for song 00001 is 1*5+10*1=15.
It should be noted that for negative operation behaviors, such as deleting a song or marking a song as disliked, the corresponding weight value may also be negative. This embodiment uses the operation behaviors above only as a schematic illustration, which does not limit the present disclosure.
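The weighted scoring described above can be sketched as follows. This is a minimal illustration, assuming the weight values from the weight table in this embodiment; the function name `score`, the behavior keys, and the negative weight for `delete` are hypothetical and are not specified by the source.

```python
# Weights for play/download/collect/share follow the table in this embodiment.
# The negative "delete" weight is an assumed value for illustration only.
WEIGHTS = {"play": 1, "download": 3, "collect": 5, "share": 10, "delete": -5}

def score(behavior_counts, weights=WEIGHTS):
    """Sum weight * count over every operation type performed on one song."""
    return sum(weights[op] * count for op, count in behavior_counts.items())

# User 00001 played song 00001 five times and shared it once.
print(score({"play": 5, "share": 1}))  # 1*5 + 10*1 = 15
```

A negative weight simply lowers the total, so a song that was played twice and then deleted would score below a song that was never touched under these assumed values.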
In step 202B, the user rating matrix is generated from the scores of the at least one user for each sample song.
The background server generates the corresponding user rating matrix from the scores of the at least one user for the sample songs. For example, if the background server has collected the scores of 1,000,000 users for 10,000 sample songs, it generates a 1,000,000 × 10,000 user rating matrix, in which rows represent users and columns represent songs.
In step 203, the user rating matrix is optimized using a TF-IDF model.
Judging how much a user likes a song directly from scores derived from operation behaviors is somewhat one-sided and limited. For example, if a user plays song 00001 100 times, the user's score for song 00001 is 100; if the user plays song 00002 10 times, the user's score for song 00002 is 10. This does not mean, however, that the user likes song 00001 ten times as much as song 00002. In practice, a user's score for a song and the user's degree of liking follow a power-law relationship. Therefore, after obtaining the user rating matrix, the background server also needs to optimize it using an improved TF-IDF model.
In one possible implementation, this step may include the following steps.
In step 203A, a score C_ij in the user rating matrix is obtained, where C_ij denotes the score of user i for song j.
The background server obtains any score C_ij in the user rating matrix.
In step 203B, the term frequency tf_ij corresponding to C_ij is calculated, where k is a control parameter.
Because a user's score for a song and the user's degree of liking follow a power-law relationship, the background server takes the logarithm of C_ij to obtain the corresponding term frequency tf_ij. If C_ij is 0, user i has not performed any operation on song j, and tf_ij is also 0; if C_ij is not 0, user i has performed an operation on song j.
For example, when C_ij is 10 and k is 10, the term frequency tf_ij corresponding to C_ij is ln 2.
In step 203C, the inverse document frequency idf_j corresponding to song j is calculated, where N is the total number of sample songs and n_j is the number of users in the user rating matrix whose score for song j is not 0.
According to the definition of IDF, the fewer documents contain a given term, the better that term discriminates between categories, and the larger its IDF. Similarly, if a large number of users have scored a given song (that is, their scores are not 0), the song discriminates between categories less well. The background server therefore calculates the inverse document frequency idf_j corresponding to song j from the total number of sample songs and the number of non-zero scores for song j in the user rating matrix.
For example, when the total number of sample songs is 10000 and the number of users in the user rating matrix whose score for song j is not 0 is 2000, the inverse document frequency idf_j corresponding to song j is ln 6.
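The formulas for tf_ij and idf_j are not reproduced in this text. One form that agrees with both worked examples, and that gives tf_ij = 0 when C_ij = 0, is shown below. It is offered as an assumption for illustration, not as the formula stated in the patent.

```latex
\begin{aligned}
tf_{ij} &= \ln\!\left(1 + \frac{C_{ij}}{k}\right), &
C_{ij} = 10,\ k = 10 &\;\Rightarrow\; tf_{ij} = \ln 2,\\
idf_{j} &= \ln\!\left(1 + \frac{N}{n_j}\right), &
N = 10000,\ n_j = 2000 &\;\Rightarrow\; idf_{j} = \ln 6,\\
w_{ij} &= tf_{ij} \cdot idf_{j} = \ln 2 \cdot \ln 6 \approx 1.24 .
\end{aligned}
```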
It should be noted that there is no strict ordering between step 203B and step 203C; the two steps may be performed at the same time.
In step 203D, the optimized score w_ij is calculated from tf_ij and idf_j, where w_ij = tf_ij * idf_j.
After calculating tf_ij and idf_j, the background server calculates the optimized score according to the formula w_ij = tf_ij * idf_j.
In step 203E, the optimized user rating matrix is generated from w_ij.
The background server optimizes every score in the user rating matrix and thereby generates the optimized user rating matrix, which reflects user characteristics.
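Steps 203A to 203E can be sketched as a single pass over the matrix. The logarithmic forms used for tf_ij and idf_j below are assumptions chosen to match the worked examples (ln 2 and ln 6); the source does not reproduce the exact formulas. The function name `optimize` is hypothetical, and the sketch assumes non-negative scores.

```python
import math

def optimize(matrix, k=10.0):
    """Apply w_ij = tf_ij * idf_j to every score, using assumed log-damped forms."""
    n_songs = len(matrix[0])  # N: total number of sample songs (columns)
    # n_j: number of users whose score for song j is not 0
    raters = [sum(1 for row in matrix if row[j] != 0) for j in range(n_songs)]
    # Assumed form idf_j = ln(1 + N / n_j); a song nobody scored gets idf 0.
    idf = [math.log(1 + n_songs / n) if n else 0.0 for n in raters]
    # Assumed form tf_ij = ln(1 + C_ij / k), which is 0 when C_ij is 0.
    return [[math.log(1 + c / k) * idf[j] for j, c in enumerate(row)]
            for row in matrix]

ratings = [[10, 0], [100, 10]]  # rows are users, columns are songs
weighted = optimize(ratings)
```

Under these assumed forms, the raw scores 100 and 10 in the first column end up roughly 3.5 times apart rather than 10 times apart, which is the damping effect the optimization is meant to achieve.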
In step 204, implicit matrix factorization is performed on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x × z matrix, the second matrix is a z × y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features.
Because the number of users is huge, the dimension of the optimized user rating matrix may reach tens of millions or even hundreds of millions (depending on the number of users). Generating the classifier directly from the optimized user rating matrix would cause severe overfitting and make the generated classifier perform very poorly.
To avoid overfitting, the background server needs to map the optimized user rating matrix into a lower-dimensional latent vector space, so that the user rating matrix can be expressed as the product of two matrices. One of these matrices indicates users' preferences for song features, and the other indicates the degree of association between songs and song features.
In one possible implementation, the background server performs implicit matrix factorization on the optimized user rating matrix, decomposing the original x × y matrix into an x × z matrix and a z × y matrix. The x × z matrix indicates users' preferences for song features, and the z × y matrix indicates the degree of association between songs and song features.
For example, when the optimized user rating matrix is a 1,000,000 × 10,000 matrix, the background server can decompose it into a 1,000,000 × 300 matrix (the first matrix) and a 300 × 10,000 matrix (the second matrix).
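To make the shapes concrete, the sketch below factorizes a tiny matrix with plain stochastic gradient descent. This is a simple stand-in chosen for brevity, not necessarily the implicit factorization algorithm the patent intends, and the name `factorize` and all hyperparameters are assumptions.

```python
import random

def factorize(matrix, z, steps=3000, lr=0.01, reg=0.02, seed=0):
    """Approximate an x-by-y matrix as the product of x-by-z and z-by-y matrices."""
    rng = random.Random(seed)
    x, y = len(matrix), len(matrix[0])
    first = [[rng.uniform(0, 0.1) for _ in range(z)] for _ in range(x)]   # users
    second = [[rng.uniform(0, 0.1) for _ in range(y)] for _ in range(z)]  # songs
    for _ in range(steps):
        for i in range(x):
            for j in range(y):
                pred = sum(first[i][f] * second[f][j] for f in range(z))
                err = matrix[i][j] - pred
                for f in range(z):
                    u, s = first[i][f], second[f][j]
                    # Gradient step on squared error with L2 regularization.
                    first[i][f] += lr * (err * s - reg * u)
                    second[f][j] += lr * (err * u - reg * s)
    return first, second

ratings = [[1.2, 0.0, 0.9], [0.0, 1.1, 0.0], [1.0, 0.0, 0.8], [0.1, 1.3, 0.0]]
first, second = factorize(ratings, z=2)
print(len(first), len(first[0]), len(second), len(second[0]))  # 4 2 2 3
```

Each column of `second` is a z-dimensional vector for one song, which is the representation a downstream classifier would consume in place of the full column of user scores.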
In step 205, a song classifier is generated according to the second matrix and the song label of the sample song.
Because song features are reflected in songs in the form of song labels, the background server can generate the song classifier from the degree of association between songs and song features indicated by the second matrix, together with the song labels of the sample songs. In one possible implementation, the song classifier may be an SVM (Support Vector Machine).
In step 206, a song label is assigned to each song in the song library by means of the song classifier.
After generating the song classifier, the background server determines the songs in the song library that have not been annotated with song labels and obtains the user rating matrix of users for these songs. It then uses this user rating matrix as the input of the song classifier to assign song labels to the unannotated songs. It should be noted that the user rating matrix corresponding to the unannotated songs also needs to go through processing similar to steps 203 and 204 above, which is not repeated here.
In summary, in the label distribution method provided by this embodiment, sample songs are annotated with song labels in advance, a corresponding user rating matrix is obtained from users' operation behaviors on the sample songs, a song classifier is generated from this user rating matrix and the song labels of the sample songs, and the song classifier assigns song labels to all songs in the song library. This solves the problem that, for songs whose song features are indistinct or highly similar, assigning song labels according to song features has low accuracy and degrades song classification. Behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified with this song classifier, which improves the accuracy of song classification.
In this embodiment, different weight values are set for different types of operation behavior, users' scores for songs are calculated from these weight values, and the corresponding user rating matrix is generated. The song classifier is then generated from the user rating matrix and the song labels of the sample songs, which enables songs to be classified and improves the efficiency of song classification.
In this embodiment, the user rating matrix is optimized using a TF-IDF model and the song classifier is generated from the optimized user rating matrix, which improves the classification performance of the generated classifier.
In this embodiment, implicit matrix factorization is performed on the optimized user rating matrix to obtain a song feature matrix indicating the degree of association between songs and song features, and the song classifier is generated from this song feature matrix and the song labels of the sample songs. This avoids the overfitting caused by an excessively high-dimensional user rating matrix and further improves the classification performance of the classifier.
To ensure the classification quality of the generated song classifier, in one possible implementation, as shown in Fig. 2C, step 205 above may include the following steps.
In step 205A, a predetermined proportion of the sample songs is determined as the training set, the predetermined proportion being greater than 50%.
The background server selects the predetermined proportion of the sample songs as the training set and uses the remaining sample songs as the test set. Note that, to ensure the quality of the song classifier, the predetermined proportion must be greater than 50%; that is, the training set must contain more sample songs than the test set.
For example, the background server selects 70% of the sample songs as the training set and uses the remaining 30% of the sample songs as the test set.
In step 205B, the song classifier is built according to the second matrix and the training set.
The background server uses the second matrix and the training set to build the song classifier.
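As an illustration of step 205B, the sketch below builds a classifier from the columns of the second matrix and the labels of the training songs. The source does not say which classification algorithm is used; a nearest-centroid classifier is swapped in here only because it is short, and the names `song_vectors` and `build_classifier`, the label strings, and the numbers are hypothetical.

```python
from collections import defaultdict

def song_vectors(q):
    """Turn the z-by-y second matrix into one z-dimensional vector per song."""
    return [list(column) for column in zip(*q)]

def build_classifier(vectors, labels):
    """Nearest-centroid classifier; returns a predict(vector) -> label function."""
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for f, value in enumerate(vec):
            acc[f] += value
        counts[label] += 1
    # One centroid (mean feature vector) per song label.
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}

    def predict(vec):
        # Pick the label whose centroid is closest in squared Euclidean distance.
        return min(centroids, key=lambda lab: sum(
            (a - b) ** 2 for a, b in zip(vec, centroids[lab])))
    return predict

q = [[0.9, 0.8, 0.1, 0.2],   # association of each song with feature 0
     [0.1, 0.2, 0.9, 0.7]]   # association of each song with feature 1
vectors = song_vectors(q)
# The first three songs form the training set; the fourth is held out.
predict = build_classifier(vectors[:3], ["rock", "rock", "folk"])
held_out_label = predict(vectors[3])
```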
In step 205C, the sample songs outside the training set are used to test the classification quality of the song classifier.
After the song classifier is built, the background server feeds the test set to the song classifier, which assigns song labels to the sample songs in the test set, and then measures the degree of matching between the assigned song labels and the manually annotated song labels. If the matching degree is higher than a preset matching-degree threshold, the classification quality of the song classifier is deemed up to standard and step 206 is performed; if the matching degree is lower than the preset threshold, the classification quality is deemed below standard.
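Steps 205A and 205C can be sketched as a split followed by a threshold check. This is a minimal sketch: the text does not define how the matching degree is computed, so the fraction of test songs whose assigned label equals the manual label is assumed here, and the helper names, the default threshold of 0.8, and the fixed random seed are likewise assumptions.

```python
import random

def split_samples(sample_ids, train_ratio=0.7, seed=0):
    """Shuffle the sample songs and split them into (training set, test set)."""
    if not 0.5 < train_ratio < 1.0:
        raise ValueError("train_ratio must be greater than 50% and below 100%")
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    # Guarantee that the training set is strictly larger than the test set.
    cut = max(int(len(ids) * train_ratio), len(ids) // 2 + 1)
    return ids[:cut], ids[cut:]

def matching_degree(predict, test_ids, features, manual_labels):
    """Fraction of test songs whose assigned label equals the manual label."""
    if not test_ids:
        raise ValueError("test set is empty")
    hits = sum(1 for s in test_ids if predict(features[s]) == manual_labels[s])
    return hits / len(test_ids)

def is_up_to_standard(degree, threshold=0.8):
    """The classifier passes only if the matching degree exceeds the threshold."""
    return degree > threshold

train_ids, test_ids = split_samples(range(10))
```

If `is_up_to_standard` returns `False`, the text only says the classifier is below standard; what happens next (for example retraining) is not specified in this passage.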
In this embodiment, the sample songs are divided into a training set and a test set, the training set is used to build the song classifier, and the test set is used to test its classification quality, thereby ensuring the classification quality of the song classifier.
The following are device embodiments of the present disclosure, which may be used to perform the method embodiments of the present disclosure. For details not disclosed in the device embodiments, refer to the method embodiments of the present disclosure.
Fig. 3 is a block diagram of a label distribution device according to an exemplary embodiment. The device may be implemented as all or part of a music platform background server by software, hardware, or a combination of the two. The device includes:
a sample acquisition module 310, configured to obtain sample songs, the sample songs carrying song labels annotated in advance;
a matrix acquisition module 320, configured to obtain a user rating matrix, the user rating matrix containing at least one user's scores for the sample songs, each score being calculated according to the user's operation behaviors on the sample song;
a generation module 330, configured to generate a song classifier according to the user rating matrix and the song labels of the sample songs;
a distribution module 340, configured to assign, by means of the song classifier, song labels to each song in the song library.
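One way to picture how the four modules fit together is the sketch below, which wires modules 310 to 340 as plain callables. The class name, field names, and type signatures are hypothetical, and the lambdas are stubs standing in for real module logic; the text allows any mix of software and hardware.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Classifier = Callable[[str], str]  # song id -> song label

@dataclass
class LabelDistributionDevice:
    """Hypothetical wiring of modules 310-340 as plain callables."""
    acquire_samples: Callable[[], Dict[str, str]]             # module 310
    acquire_matrix: Callable[[List[str]], List[List[float]]]  # module 320
    generate_classifier: Callable[
        [List[List[float]], Dict[str, str]], Classifier]      # module 330

    def distribute(self, library: List[str]) -> Dict[str, str]:  # module 340
        samples = self.acquire_samples()              # sample song id -> label
        ratings = self.acquire_matrix(list(samples))  # user rating matrix
        classifier = self.generate_classifier(ratings, samples)
        return {song: classifier(song) for song in library}

# Stub modules, used only to show the data flow.
device = LabelDistributionDevice(
    acquire_samples=lambda: {"s1": "rock"},
    acquire_matrix=lambda song_ids: [[1.0 for _ in song_ids]],
    generate_classifier=lambda ratings, samples: (lambda song: "rock"),
)
labels = device.distribute(["s1", "s2"])
```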
In summary, in the label distribution device provided by this embodiment, song labels are annotated on sample songs in advance, a corresponding user rating matrix is obtained according to users' operation behaviors on the sample songs, a song classifier is then generated according to the user rating matrix and the song labels of the sample songs, and the song classifier assigns song labels to all songs in the song library. This solves the problem that, for songs whose features are not distinctive or whose features closely resemble those of other songs, assigning song labels according to song features has low accuracy and degrades song classification. Behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified with this classifier, which improves the accuracy of song classification.
Fig. 4 is a block diagram of a label distribution device according to another exemplary embodiment. The device may be implemented as all or part of a music platform background server by software, hardware, or a combination of the two. The device includes:
a sample acquisition module 410, configured to obtain sample songs, the sample songs carrying song labels annotated in advance;
a matrix acquisition module 420, configured to obtain a user rating matrix, the user rating matrix containing at least one user's scores for the sample songs, each score being calculated according to the user's operation behaviors on the sample song;
a generation module 430, configured to generate a song classifier according to the user rating matrix and the song labels of the sample songs;
a distribution module 440, configured to assign, by means of the song classifier, song labels to each song in the song library.
In an optional embodiment, a user's operation behaviors on a sample song include at least one of playing, downloading, favoriting, and sharing;
the matrix acquisition module 420 includes:
a calculation submodule 421, configured to calculate at least one user's scores for the sample songs according to the weight value corresponding to each type of operation behavior;
a matrix generation submodule 422, configured to generate the user rating matrix according to the at least one user's scores for each sample song.
Optionally, the device further includes:
an optimization module 450, configured to optimize the user rating matrix using a TF-IDF model.
Optionally, for any score in the user rating matrix, the optimization module 450 includes:
an obtaining submodule 451, configured to obtain a score C_ij in the user rating matrix, C_ij representing user i's score for song j;
a first calculation submodule 452, configured to calculate the term frequency tf_ij corresponding to C_ij from C_ij and a control parameter K (the formula itself is not reproduced in this text);
a second calculation submodule 453, configured to calculate the inverse document frequency idf_j corresponding to song j from N and n_j (the formula itself is not reproduced in this text), where N is the total number of sample songs and n_j is the number of users in the user rating matrix whose score for song j is not 0;
a third calculation submodule 454, configured to calculate the optimized score w_ij according to tf_ij and idf_j, where w_ij = tf_ij * idf_j;
an optimization submodule 455, configured to generate the optimized user rating matrix according to w_ij.
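The optimization performed by submodules 451 to 455 can be sketched as below. The text does not show the formulas for tf_ij and idf_j, so both forms used here are assumptions chosen only for illustration: a saturating tf of C_ij / (C_ij + K) and an idf of log(N / n_j). Only the final product w_ij = tf_ij * idf_j comes from the text. N is passed in by the caller as `n_total`, and the function name is hypothetical.

```python
import math

def tfidf_optimize(ratings, n_total, k=1.0):
    """ratings[i][j] is user i's score C_ij for song j; returns w_ij = tf_ij * idf_j."""
    num_users = len(ratings)
    num_songs = len(ratings[0])
    # n_j: number of users whose score for song j is not 0 (as defined in the text).
    n = [sum(1 for i in range(num_users) if ratings[i][j] != 0)
         for j in range(num_songs)]
    # ASSUMED idf form: log(N / n_j); songs nobody scored get idf 0.
    idf = [math.log(n_total / n_j) if n_j else 0.0 for n_j in n]
    # ASSUMED tf form: C / (C + K), which saturates as the raw score grows.
    return [[(c / (c + k)) * idf[j] if c else 0.0 for j, c in enumerate(row)]
            for row in ratings]

ratings = [[2.0, 4.0], [2.0, 0.0]]
optimized = tfidf_optimize(ratings, n_total=4, k=1.0)
```

Under these assumed forms, a song scored by many users gets a smaller idf_j, so widely played songs contribute less to each user's row than songs with a narrower audience.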
Optionally, the optimized user rating matrix is an x-by-y matrix, and the device further includes:
a decomposition module 460, configured to perform implicit matrix decomposition on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x-by-z matrix, the second matrix is a z-by-y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features;
the generation module 430 is configured to:
generate the song classifier according to the second matrix and the song labels of the sample songs.
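In matrix notation, the dimensions stated for the decomposition module can be written as follows. The symbols W, P, Q, and q_j are introduced here only for illustration and do not appear in the source; W stands for the optimized user rating matrix, P for the first matrix, and Q for the second matrix.

```latex
W_{x \times y} \;\approx\; P_{x \times z}\, Q_{z \times y}, \qquad z < x,
\qquad
w_{ij} \;\approx\; \sum_{f=1}^{z} p_{if}\, q_{fj}
```

Each song j is then described by the column q_j of Q, a vector of z feature associations, instead of its x user scores. Because z < x, the classifier is trained on lower-dimensional inputs, which is the reduction in dimension the text credits with avoiding over-fitting.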
Optionally, the generation module 430 includes:
a determining submodule 431, configured to determine a predetermined proportion of the sample songs as the training set, the predetermined proportion being greater than 50%;
a building submodule 432, configured to build the song classifier according to the second matrix and the training set;
a testing submodule 433, configured to test the classification quality of the song classifier using the sample songs outside the training set.
In summary, in the label distribution device provided by this embodiment, song labels are annotated on sample songs in advance, a corresponding user rating matrix is obtained according to users' operation behaviors on the sample songs, a song classifier is then generated according to the user rating matrix and the song labels of the sample songs, and the song classifier assigns song labels to all songs in the song library. This solves the problem that, for songs whose features are not distinctive or whose features closely resemble those of other songs, assigning song labels according to song features has low accuracy and degrades song classification. Behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified with this classifier, which improves the accuracy of song classification.
In this embodiment, different weight values are set for different types of operation behavior, users' scores for songs are calculated according to these weight values, and the corresponding user rating matrix is generated. The song classifier is then generated according to the user rating matrix and the song labels of the sample songs, which classifies the songs and improves the efficiency of song classification.
In this embodiment, the user rating matrix is optimized using a TF-IDF model, and the optimized user rating matrix is used to generate the song classifier, which improves the classification quality of the generated classifier.
In this embodiment, implicit matrix decomposition is performed on the optimized user rating matrix to obtain a song feature matrix that indicates the degree of association between songs and song features, and the song classifier is generated according to this song feature matrix and the song labels of the sample songs. This avoids the over-fitting caused by an excessively high-dimensional user rating matrix and further improves the classification quality of the classifier.
In this embodiment, the sample songs are divided into a training set and a test set, the training set is used to build the song classifier, and the test set is used to test its classification quality, thereby ensuring the classification quality of the song classifier.
Regarding the device in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and will not be elaborated here.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (12)

1. A label distribution method, characterized in that the method comprises:
obtaining sample songs, the sample songs carrying song labels annotated in advance;
obtaining a user rating matrix, the user rating matrix containing at least one user's scores for the sample songs, each score being calculated according to the user's operation behaviors on the sample song;
generating a song classifier according to the user rating matrix and the song labels of the sample songs; and
assigning, by means of the song classifier, the song labels to each song in a song library.
2. The method according to claim 1, characterized in that the user's operation behaviors on the sample song include at least one of playing, downloading, favoriting, and sharing;
the obtaining a user rating matrix comprises:
calculating at least one user's scores for the sample songs according to the weight value corresponding to each type of operation behavior; and
generating the user rating matrix according to the at least one user's scores for each sample song.
3. The method according to claim 1 or 2, characterized in that, after the obtaining a user rating matrix, the method further comprises:
optimizing the user rating matrix using a term frequency-inverse document frequency (TF-IDF) model.
4. The method according to claim 3, characterized in that, for any score in the user rating matrix, the optimizing the user rating matrix using a TF-IDF model comprises:
obtaining a score C_ij in the user rating matrix, C_ij representing user i's score for song j;
calculating the term frequency tf_ij corresponding to C_ij from C_ij and a control parameter K (the formula itself is not reproduced in this text);
calculating the inverse document frequency idf_j corresponding to song j from N and n_j (the formula itself is not reproduced in this text), where N is the total number of sample songs and n_j is the number of users in the user rating matrix whose score for song j is not 0;
calculating the optimized score w_ij according to tf_ij and idf_j, where w_ij = tf_ij * idf_j; and
generating the optimized user rating matrix according to w_ij.
5. The method according to claim 3, characterized in that the optimized user rating matrix is an x-by-y matrix, and after the optimizing the user rating matrix using a TF-IDF model, the method further comprises:
performing implicit matrix decomposition on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x-by-z matrix, the second matrix is a z-by-y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features;
the generating a song classifier according to the user rating matrix and the song labels of the sample songs comprises:
generating the song classifier according to the second matrix and the song labels of the sample songs.
6. The method according to claim 5, characterized in that the generating the song classifier according to the second matrix and the song labels of the sample songs comprises:
determining a predetermined proportion of the sample songs as a training set, the predetermined proportion being greater than 50%;
building the song classifier according to the second matrix and the training set; and
testing the classification quality of the song classifier using the sample songs outside the training set.
7. A label distribution device, characterized in that the device comprises:
a sample acquisition module, configured to obtain sample songs, the sample songs carrying song labels annotated in advance;
a matrix acquisition module, configured to obtain a user rating matrix, the user rating matrix containing at least one user's scores for the sample songs, each score being calculated according to the user's operation behaviors on the sample song;
a generation module, configured to generate a song classifier according to the user rating matrix and the song labels of the sample songs; and
a distribution module, configured to assign, by means of the song classifier, the song labels to each song in a song library.
8. The device according to claim 7, characterized in that the user's operation behaviors on the sample song include at least one of playing, downloading, favoriting, and sharing;
the matrix acquisition module comprises:
a calculation submodule, configured to calculate at least one user's scores for the sample songs according to the weight value corresponding to each type of operation behavior; and
a matrix generation submodule, configured to generate the user rating matrix according to the at least one user's scores for each sample song.
9. The device according to claim 7 or 8, characterized in that the device further comprises:
an optimization module, configured to optimize the user rating matrix using a term frequency-inverse document frequency (TF-IDF) model.
10. The device according to claim 9, characterized in that, for any score in the user rating matrix, the optimization module comprises:
an obtaining submodule, configured to obtain a score C_ij in the user rating matrix, C_ij representing user i's score for song j;
a first calculation submodule, configured to calculate the term frequency tf_ij corresponding to C_ij from C_ij and a control parameter K (the formula itself is not reproduced in this text);
a second calculation submodule, configured to calculate the inverse document frequency idf_j corresponding to song j from N and n_j (the formula itself is not reproduced in this text), where N is the total number of sample songs and n_j is the number of users in the user rating matrix whose score for song j is not 0;
a third calculation submodule, configured to calculate the optimized score w_ij according to tf_ij and idf_j, where w_ij = tf_ij * idf_j; and
an optimization submodule, configured to generate the optimized user rating matrix according to w_ij.
11. The device according to claim 9, characterized in that the optimized user rating matrix is an x-by-y matrix, and the device further comprises:
a decomposition module, configured to perform implicit matrix decomposition on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x-by-z matrix, the second matrix is a z-by-y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features;
the generation module is configured to:
generate the song classifier according to the second matrix and the song labels of the sample songs.
12. The device according to claim 11, characterized in that the generation module comprises:
a determining submodule, configured to determine a predetermined proportion of the sample songs as a training set, the predetermined proportion being greater than 50%;
a building submodule, configured to build the song classifier according to the second matrix and the training set; and
a testing submodule, configured to test the classification quality of the song classifier using the sample songs outside the training set.
CN201610194484.7A 2016-03-31 2016-03-31 Label distribution method and device Active CN105868372B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610194484.7A CN105868372B (en) 2016-03-31 2016-03-31 Label distribution method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610194484.7A CN105868372B (en) 2016-03-31 2016-03-31 Label distribution method and device

Publications (2)

Publication Number Publication Date
CN105868372A true CN105868372A (en) 2016-08-17
CN105868372B CN105868372B (en) 2019-11-05

Family

ID=56626431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610194484.7A Active CN105868372B (en) 2016-03-31 2016-03-31 Label distribution method and device

Country Status (1)

Country Link
CN (1) CN105868372B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156472A (en) * 2014-08-25 2014-11-19 四达时代通讯网络技术有限公司 Video recommendation method and system
CN104503973A (en) * 2014-11-14 2015-04-08 浙江大学软件学院(宁波)管理中心(宁波软件教育中心) Recommendation method based on singular value decomposition and classifier combination

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JOACHIM SELKE et al.: "Extracting Features from Ratings: The Role of Factor Models", 《HTTP://CN.ARXIV.ORG/ABS/1101.2378》 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017201976A1 (en) * 2016-05-24 2017-11-30 华为技术有限公司 Topic recommending method and device
US11830033B2 (en) 2016-05-24 2023-11-28 Huawei Technologies Co., Ltd. Theme recommendation method and apparatus
US20190087884A1 (en) 2016-05-24 2019-03-21 Huawei Technologies Co., Ltd. Theme recommendation method and apparatus
CN107977374A (en) * 2016-10-21 2018-05-01 北京酷我科技有限公司 Bent storehouse optimization method and device
CN108268544A (en) * 2016-12-30 2018-07-10 北京酷我科技有限公司 The mask method and system of a kind of song
CN108268544B (en) * 2016-12-30 2021-07-23 北京酷我科技有限公司 Song labeling method and system
CN106951527B (en) * 2017-03-21 2020-01-17 北京邮电大学 Song recommendation method and device
CN106951527A (en) * 2017-03-21 2017-07-14 北京邮电大学 A kind of song recommendations method and device
CN108108338A (en) * 2018-01-05 2018-06-01 维沃移动通信有限公司 A kind of method for processing lyric, lyric display method, server and mobile terminal
CN108108338B (en) * 2018-01-05 2022-02-15 维沃移动通信有限公司 Lyric processing method, lyric display method, server and mobile terminal
CN109063069A (en) * 2018-07-23 2018-12-21 天翼爱音乐文化科技有限公司 Song label determines method, apparatus, computer equipment and readable storage medium storing program for executing
CN110188268A (en) * 2019-05-21 2019-08-30 浙江工商大学 A kind of personalized recommendation method based on label and temporal information
CN112163116A (en) * 2020-09-28 2021-01-01 广州酷狗计算机科技有限公司 Song classification method and device and computer readable storage medium

Also Published As

Publication number Publication date
CN105868372B (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN105868372A (en) Label distribution method and device
CN103823867B (en) Humming type music retrieval method and system based on note modeling
CN103327053B (en) Online Music method for pushing and system
CN108304429A (en) Information recommendation method, device and computer equipment
CN104991899A (en) Identification method and apparatus of user property
Patra et al. Automatic music mood classification of Hindi songs
CN108806657A (en) Music model training, musical composition method, apparatus, terminal and storage medium
CN105718532A (en) Cross-media sequencing method based on multi-depth network structure
Tai The structure of knowledge and dynamics of scholarly communication in agenda setting research, 1996–2005
CN107767850A (en) A kind of singing marking method and system
CN108766451B (en) Audio file processing method and device and storage medium
CN107993636B (en) Recursive neural network-based music score modeling and generating method
CN110008397A (en) A kind of recommended models training method and device
CN110992988B (en) Speech emotion recognition method and device based on domain confrontation
CN113813609B (en) Game music style classification method and device, readable medium and electronic equipment
CN103123636A (en) Method to build vocabulary entry classification models, method of vocabulary entry automatic classification and device
CN109346043A (en) A kind of music generating method and device based on generation confrontation network
CN108806355A (en) A kind of calligraphy and painting art interactive education system
Frieler et al. Is it the song and not the singer? Hit song prediction using structural features of melodies
CN110347821A (en) A kind of method, electronic equipment and the readable storage medium storing program for executing of text categories mark
CN107493641A (en) A kind of lamp light control method and device driven using music
CN109471951A (en) Lyrics generation method, device, equipment and storage medium neural network based
CN108520436A (en) The value assessment method and apparatus of content
CN103116646B (en) A kind of music emotion recognition method based on cloud gene expression programming
CN107729486A (en) A kind of video searching method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510660 Guangzhou City, Guangzhou, Guangdong, Whampoa Avenue, No. 315, self - made 1-17

Applicant after: Guangzhou KuGou Networks Co., Ltd.

Address before: 510000 B1, building, No. 16, rhyme Road, Guangzhou, Guangdong, China 13F

Applicant before: Guangzhou KuGou Networks Co., Ltd.

GR01 Patent grant