CN105868372A

CN105868372A - Label distribution method and device

Info

Publication number: CN105868372A
Application number: CN201610194484.7A
Authority: CN
Inventors: 林锡雄; 赵忠; 陈胜凯; 李祖辉
Original assignee: Guangzhou Kugou Computer Technology Co Ltd
Current assignee: Guangzhou Kugou Computer Technology Co Ltd
Priority date: 2016-03-31
Filing date: 2016-03-31
Publication date: 2016-08-17
Anticipated expiration: 2036-03-31
Also published as: CN105868372B

Abstract

The invention discloses a label distribution method and device, and belongs to the field of song classification. The method comprises the following steps: sample songs are obtained, wherein each sample song carries a song label annotated in advance; a user rating matrix is obtained, wherein the user rating matrix comprises the scores of at least one user for the sample songs, and the scores are calculated according to the users' operation behaviors on the sample songs; a song classifier is generated according to the user rating matrix and the song labels of the sample songs; and the song classifier assigns song labels to all songs in a song library. According to the label distribution method and device, behavior characteristics are extracted from the user behavior data produced while users listen to songs, the song classifier is constructed from these behavior characteristics, and songs are then classified by the classifier, which increases the accuracy of song classification.

Description

Label distribution method and device

Technical field

The present disclosure relates to the field of song classification, and in particular to a label distribution method and device.

Background

To facilitate song recommendation, major music platforms assign a wide variety of song labels to the songs in their song libraries according to elements such as genre, style, and scene.

Because a song library contains a huge number of songs, assigning song labels manually is too costly, so music platforms generally build a song classifier that assigns song labels automatically. To build the song classifier, song labels are first assigned manually to a number of sample songs, and song features such as timbre, rhythm, pitch, and lyrics are extracted from them; the song classifier is then constructed from the song labels and song features of the sample songs. For a song that has not yet been assigned a song label, the song classifier assigns the corresponding song label according to that song's features.

For songs whose song features are not distinctive, or whose song features are very similar, assigning song labels according to song features has low accuracy, which degrades song classification.

Summary of the invention

To solve the problem that, for songs whose song features are not distinctive or are very similar, assigning song labels according to song features has low accuracy and degrades song classification, the present disclosure provides a label distribution method and device. The technical solution is as follows:

According to a first aspect of the embodiments of the present disclosure, a label distribution method is provided, the method including:

obtaining sample songs, each sample song carrying a song label annotated in advance;

obtaining a user rating matrix, the user rating matrix containing at least one user's scores for the sample songs, each score being calculated from the user's operation behavior on the sample song;

generating a song classifier according to the user rating matrix and the song labels of the sample songs; and

assigning song labels to the songs in a song library by means of the song classifier.

Optionally, the user's operation behavior on a sample song includes at least one of playing, downloading, collecting, and sharing;

obtaining the user rating matrix includes:

calculating at least one user's scores for the sample songs according to the weight value corresponding to each type of operation behavior; and

generating the user rating matrix from the at least one user's scores for each sample song.

Optionally, after obtaining the user rating matrix, the method further includes:

optimizing the user rating matrix using a TF-IDF (Term Frequency-Inverse Document Frequency) model.

Optionally, for any score in the user rating matrix, optimizing the user rating matrix using the TF-IDF model includes:

obtaining a score C_ij in the user rating matrix, where C_ij denotes the score of user i for song j;

calculating the term frequency tf_ij corresponding to C_ij, where tf_ij = ln(1 + C_ij/k) and k is a control parameter;

calculating the inverse document frequency idf_j corresponding to song j, where idf_j = ln(1 + N/n_j), N is the total number of sample songs, and n_j is the number of users in the user rating matrix whose score for song j is not 0;

calculating the optimized score w_ij from tf_ij and idf_j, where w_ij = tf_ij * idf_j; and

generating the optimized user rating matrix from w_ij.

Optionally, the optimized user rating matrix is an x×y matrix, and after optimizing the user rating matrix using the TF-IDF model, the method further includes:

performing implicit matrix factorization on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x×z matrix, the second matrix is a z×y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features;

generating the song classifier according to the user rating matrix and the song labels of the sample songs includes:

generating the song classifier according to the second matrix and the song labels of the sample songs.

Optionally, generating the song classifier according to the second matrix and the song labels of the sample songs includes:

determining a predetermined proportion of the sample songs as a training set, the predetermined proportion being greater than 50%;

building the song classifier according to the second matrix and the training set; and

testing the classification performance of the song classifier using the sample songs outside the training set.

According to a second aspect of the embodiments of the present disclosure, a label distribution device is provided, the device including:

a sample acquisition module, configured to obtain sample songs, each sample song carrying a song label annotated in advance;

a matrix acquisition module, configured to obtain a user rating matrix, the user rating matrix containing at least one user's scores for the sample songs, each score being calculated from the user's operation behavior on the sample song;

a generation module, configured to generate a song classifier according to the user rating matrix and the song labels of the sample songs; and

a distribution module, configured to assign song labels to the songs in a song library by means of the song classifier.

The matrix acquisition module includes:

a calculation submodule, configured to calculate at least one user's scores for the sample songs according to the weight value corresponding to each type of operation behavior; and

a matrix generation submodule, configured to generate the user rating matrix from the at least one user's scores for each sample song.

Optionally, the device further includes:

an optimization module, configured to optimize the user rating matrix using a TF-IDF model.

Optionally, for any score in the user rating matrix, the optimization module includes:

an obtaining submodule, configured to obtain a score C_ij in the user rating matrix, where C_ij denotes the score of user i for song j;

a first calculation submodule, configured to calculate the term frequency tf_ij corresponding to C_ij, where tf_ij = ln(1 + C_ij/k) and k is a control parameter;

a second calculation submodule, configured to calculate the inverse document frequency idf_j corresponding to song j, where idf_j = ln(1 + N/n_j), N is the total number of sample songs, and n_j is the number of users in the user rating matrix whose score for song j is not 0;

a third calculation submodule, configured to calculate the optimized score w_ij from tf_ij and idf_j, where w_ij = tf_ij * idf_j; and

an optimization submodule, configured to generate the optimized user rating matrix from w_ij.

Optionally, the optimized user rating matrix is an x×y matrix, and the device further includes:

a decomposition module, configured to perform implicit matrix factorization on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x×z matrix, the second matrix is a z×y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features;

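The generation module is configured to generate the song classifier according to the second matrix and the song labels of the sample songs.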

Optionally, the generation module includes:

a determination submodule, configured to determine a predetermined proportion of the sample songs as a training set, the predetermined proportion being greater than 50%;

a building submodule, configured to build the song classifier according to the second matrix and the training set; and

a test submodule, configured to test the classification performance of the song classifier using the sample songs outside the training set.

The technical solutions provided by the embodiments of the present disclosure can have the following beneficial effects:

Song labels are annotated on sample songs in advance, the corresponding user rating matrix is obtained from users' operation behavior on the sample songs, a song classifier is generated according to this user rating matrix and the song labels of the sample songs, and the song classifier assigns song labels to all songs in the song library. This solves the problem that, for songs whose song features are not distinctive or are very similar, assigning song labels according to song features has low accuracy and degrades song classification. Behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified by the classifier, which improves the accuracy of song classification.

It should be understood that the above general description and the following detailed description are merely exemplary and do not limit the present disclosure.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.

Fig. 1 is a flow chart of a label distribution method according to an exemplary embodiment;

Fig. 2A is a flow chart of a label distribution method according to another exemplary embodiment;

Fig. 2B is a flow chart of the process of obtaining the user rating matrix in the label distribution method shown in Fig. 2A;

Fig. 2C is a flow chart of the song classifier testing process in the label distribution method shown in Fig. 2A;

Fig. 3 is a block diagram of a label distribution device according to an exemplary embodiment;

Fig. 4 is a block diagram of a label distribution device according to another exemplary embodiment.

Detailed description

Exemplary embodiments are described in detail here, and examples of them are shown in the accompanying drawings. Where the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of devices and methods consistent with some aspects of the disclosure as detailed in the appended claims.

The label distribution method provided by the embodiments of the present disclosure can be implemented by a background server of a music platform. The background server may be a single server, a server cluster composed of several servers, a cloud computing center, or the like.

Fig. 1 is a flow chart of a label distribution method according to an exemplary embodiment. This embodiment is described taking as an example the application of the label distribution method to the background server of a music platform. The method may include the following steps:

In step 101, sample songs are obtained, each sample song carrying a song label annotated in advance.

Some of the songs in the music platform's song library are annotated with song labels in advance as sample songs; a song label identifies the genre, style, scene, and so on of the sample song. When performing label distribution, the background server obtains these sample songs from the song library.

In step 103, a user rating matrix is obtained. The user rating matrix contains at least one user's scores for the sample songs, and each score is calculated from the user's operation behavior on the sample song.

The background server collects users' operation behavior on songs, such as playing, downloading, or sharing a song, and calculates the user's score for the song from this behavior. A higher score indicates a stronger preference of the user for the song.

In step 105, a song classifier is generated according to the user rating matrix and the song labels of the sample songs.

From the user rating matrix and the song labels of the sample songs, the background server can further determine the song features that users prefer and the song labels corresponding to those song features, and generates the corresponding song classifier.

In step 107, song labels are assigned to the songs in the song library by means of the song classifier.

Using the generated song classifier and the user rating matrix of users for the remaining songs in the song library, the background server assigns the corresponding song labels to each song in the song library.

In summary, in the label distribution method provided by this embodiment, song labels are annotated on sample songs in advance, the corresponding user rating matrix is obtained from users' operation behavior on the sample songs, a song classifier is generated according to this user rating matrix and the song labels of the sample songs, and the song classifier assigns song labels to all songs in the song library. This solves the problem that, for songs whose song features are not distinctive or are very similar, assigning song labels according to song features has low accuracy and degrades song classification. Behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified by the classifier, which improves the accuracy of song classification.

Fig. 2A is a flow chart of a label distribution method according to another exemplary embodiment. This embodiment is described taking as an example the application of the label distribution method to the background server of a music platform. The method may include the following steps:

In step 201, sample songs are obtained, each sample song carrying a song label annotated in advance.

Because the song library of a music platform contains a huge number of songs, annotating a song label for every song manually would take a great deal of time and be very costly. In the label distribution method provided by this embodiment, before the song classifier is built, only a small fraction of the songs in the song library need to be designated as sample songs and annotated with song labels manually. A song label indicates the genre of the song (jazz, rock and roll, pop, classical, etc.), its style (sentimental, joyful, quiet, etc.), its scene (sports, bar, nightclub, competition, etc.), and so on.

For example, the song library of a music platform contains 1,000,000 songs. Before the song classifier is built, 10,000 songs are chosen as sample songs and annotated with song labels. It should be noted that the chosen sample songs need to cover all genres, styles, and scenes in the song library, so that the sample songs are comprehensive and representative.

At the stage of building the song classifier, the background server obtains these sample songs. Schematically, the correspondence between sample songs and song labels can be as shown in Table one.

Table one

Sample song	Song label
Song 00001	Classical, quiet, wedding
Song 00002	Rock and roll, wild, bar
…	…
Song 10000	Pop, festive, wedding

In step 202, a user rating matrix is obtained. The user rating matrix contains at least one user's scores for the sample songs, and each score is calculated from the user's operation behavior on the sample song.

While listening to songs, a user performs certain operations on them, and these operations produce a series of user behavior data. For example, when a user chooses to listen to a song, play data for the song is produced, and this play data can include the play count; when a user downloads a song, download data for the song is produced; and when a user shares a song with a friend, sharing data for the song is produced.

The music platform that provides song playback can collect this user behavior data and analyze it to obtain each user's score for each song, and determine the user's degree of preference for the song from this score.

In one possible implementation, as shown in Fig. 2B, this step may include the following steps.

In step 202A, at least one user's scores for the sample songs are calculated according to the weight value corresponding to each type of operation behavior.

While listening to songs, a user can perform different types of operations, such as playing, downloading, collecting, or sharing a song. Different types of operation behavior indicate different degrees of preference for the song. For example, collecting a song indicates a stronger preference than playing it. Therefore, when a user's score for a song is calculated from the user's operation behavior, different types of operation behavior correspond to different weight values. Schematically, the weight values corresponding to the different types of operation behavior can be as shown in Table two.

Table two

Operation behavior type	Weight value
Play	1
Download	3
Collect	5
Share	10

When calculating at least one user's scores for the sample songs according to the weight value corresponding to each type of operation behavior, the background server also needs to take into account the number of times each operation behavior was performed. For example, for song 00001, if user 00001 has played song 00001 five times and shared it once, the score of user 00001 for song 00001 is 1*5+10*1=15.

It should be noted that for some negative operation behaviors, such as deleting a song or marking a song as disliked, the corresponding weight value may also be negative. This embodiment uses the above operation behaviors only as an illustration, and they do not limit the present disclosure.
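The weighted scoring of step 202A can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the dictionary name `BEHAVIOR_WEIGHTS`, the function name `score`, and the negative weight chosen for "delete" are assumptions; the positive weights are the ones listed in Table two.

```python
# Weights from Table two; the negative "delete" weight is an assumed value
# used only to illustrate a negative operation behavior.
BEHAVIOR_WEIGHTS = {"play": 1, "download": 3, "collect": 5, "share": 10, "delete": -5}


def score(behavior_counts):
    """Return the weighted sum of operation counts for one user-song pair."""
    return sum(
        BEHAVIOR_WEIGHTS[behavior] * count
        for behavior, count in behavior_counts.items()
    )


# Five plays and one share: 1*5 + 10*1
print(score({"play": 5, "share": 1}))  # 15
```

Keeping the weights in a single table makes it easy to add further behavior types without touching the scoring logic.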

In step 202B, the user rating matrix is generated from the at least one user's scores for each sample song.

The background server generates the corresponding user rating matrix from the at least one user's scores for the sample songs. For example, if the background server has collected the scores of 1,000,000 users for 10,000 sample songs, it generates a 1,000,000×10,000 user rating matrix, in which each row represents a user and each column represents a song.
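A small sketch of assembling such a matrix from per-pair scores is shown below. The function name `build_rating_matrix`, the identifiers, and the dense list-of-lists layout are assumptions for illustration; at the scale described above a sparse representation would be more practical.

```python
def build_rating_matrix(scores, users, songs):
    """Build a users-by-songs matrix; pairs without any behavior score 0.

    `scores` maps (user_id, song_id) to the weighted score of that pair.
    """
    return [[scores.get((user, song), 0) for song in songs] for user in users]


users = ["user00001", "user00002"]
songs = ["song00001", "song00002", "song00003"]
scores = {("user00001", "song00001"): 15, ("user00002", "song00003"): 3}

matrix = build_rating_matrix(scores, users, songs)
print(matrix)  # [[15, 0, 0], [0, 0, 3]]
```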

In step 203, the user rating matrix is optimized using a TF-IDF model.

Using a score determined from user operation behavior to judge a user's degree of preference is somewhat one-sided and limited. For example, for song 00001 and song 00002, if a user plays song 00001 100 times, the user's score for song 00001 is 100; if the user plays song 00002 10 times, the user's score for song 00002 is 10. This does not mean that the user's preference for song 00001 is 10 times the preference for song 00002. In practice, a user's score for a song and the user's degree of preference follow a power-law relationship. Therefore, after obtaining the user rating matrix, the background server also needs to optimize it using an improved TF-IDF model.

In one possible implementation, this step may include the following steps.

In step 203A, a score C_ij in the user rating matrix is obtained, where C_ij denotes the score of user i for song j.

The background server obtains any score C_ij in the user rating matrix.

In step 203B, the term frequency tf_ij corresponding to C_ij is calculated, where tf_ij = ln(1 + C_ij/k) and k is a control parameter.

Because a user's score for a song and the degree of preference follow a power-law relationship, the background server applies a logarithm to C_ij to obtain the corresponding term frequency tf_ij. If C_ij is 0, user i has not performed any operation on song j, and tf_ij is also 0; if C_ij is not 0, user i has performed an operation on song j.

For example, when C_ij is 10 and k is 10, the term frequency tf_ij corresponding to C_ij is ln2.

In step 203C, the inverse document frequency idf_j corresponding to song j is calculated, where idf_j = ln(1 + N/n_j), N is the total number of sample songs, and n_j is the number of users in the user rating matrix whose score for song j is not 0.

By the definition of IDF, the fewer documents contain a given term, the better that term distinguishes between classes, and the larger its IDF. Similarly, for a given song, if a large number of users have scored the song (that is, their score is not 0), the song's ability to distinguish between classes is weak. Therefore, the background server calculates the inverse document frequency idf_j of song j from the total number of sample songs and the number of non-zero scores for song j in the user rating matrix.

For example, for song j, when the total number of sample songs is 10000 and the number of users in the user rating matrix whose score for song j is not 0 is 2000, the inverse document frequency idf_j corresponding to song j is ln6.

It should be noted that there is no strict order between step 203B and step 203C; that is, step 203B and step 203C may be performed simultaneously.

In step 203D, the optimized score w_ij is calculated from tf_ij and idf_j, where w_ij = tf_ij * idf_j.

After calculating tf_ij and idf_j, the background server calculates the optimized score according to the formula w_ij = tf_ij * idf_j.
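As a worked check, the two example values given above (tf_ij = ln2 for C_ij = 10 and k = 10; idf_j = ln6 for N = 10000 and n_j = 2000) can be combined into one optimized score. The closed forms on the first two lines are assumptions: they are chosen because they reproduce those example values and give tf_ij = 0 when C_ij = 0.

```latex
\begin{aligned}
tf_{ij} &= \ln\left(1 + \frac{C_{ij}}{k}\right) = \ln\left(1 + \frac{10}{10}\right) = \ln 2 \approx 0.693 \\
idf_j &= \ln\left(1 + \frac{N}{n_j}\right) = \ln\left(1 + \frac{10000}{2000}\right) = \ln 6 \approx 1.792 \\
w_{ij} &= tf_{ij} \cdot idf_j \approx 0.693 \times 1.792 \approx 1.242
\end{aligned}
```

The raw score of 10 is therefore mapped to roughly 1.24, which illustrates how the logarithm compresses large raw scores.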

In step 203E, the optimized user rating matrix is generated from w_ij.

The background server optimizes each score in the user rating matrix and thereby generates the optimized user rating matrix, which can reflect user characteristics.
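Steps 203A to 203E can be sketched end to end as follows. This is a hedged illustration: the function name `optimize_rating_matrix` is hypothetical, the forms tf_ij = ln(1 + C_ij/k) and idf_j = ln(1 + N/n_j) are assumed because they reproduce the ln2 and ln6 examples in this description, and N is taken as the number of song columns, following the text's definition of N as the total number of sample songs.

```python
import math


def optimize_rating_matrix(ratings, k=10.0):
    """Re-weight a users-by-songs score matrix with a TF-IDF style transform.

    Assumed forms: tf_ij = ln(1 + C_ij / k), idf_j = ln(1 + N / n_j).
    Non-positive scores are mapped to 0 in this sketch.
    """
    num_songs = len(ratings[0])  # N: total number of sample songs
    # n_j: number of users whose score for song j is non-zero.
    rated_by = [sum(1 for row in ratings if row[j] != 0) for j in range(num_songs)]
    idf = [math.log(1 + num_songs / n_j) if n_j else 0.0 for n_j in rated_by]
    return [
        [math.log(1 + c / k) * idf[j] if c > 0 else 0.0 for j, c in enumerate(row)]
        for row in ratings
    ]


ratings = [[10, 0, 3], [0, 5, 0]]
optimized = optimize_rating_matrix(ratings, k=10.0)
```

Songs that nobody has scored receive an idf of 0 here to avoid a division by zero; that guard is a choice of this sketch and is not stated in the description.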

In step 204, implicit matrix factorization is performed on the optimized user rating matrix to obtain a first matrix and a second matrix, where the first matrix is an x×z matrix, the second matrix is a z×y matrix, z < x, the first matrix indicates users' preferences for song features, and the second matrix indicates the degree of association between songs and song features.

Because the number of users is huge, the dimension of the optimized user rating matrix may reach tens of millions or even hundreds of millions (depending on the number of users). If the classifier were generated directly from the optimized user rating matrix, severe over-fitting would result, and the generated classifier would classify very poorly.

To avoid over-fitting, the background server needs to map the optimized user rating matrix to a latent vector space of lower dimension, so that the user rating matrix can be expressed as the product of two matrices. One of these matrices indicates users' preferences for song features, and the other indicates the degree of association between songs and song features.

In one possible implementation, the background server performs implicit matrix factorization on the optimized user rating matrix, decomposing the original x×y matrix into an x×z matrix and a z×y matrix. The x×z matrix indicates users' preferences for song features, and the z×y matrix indicates the degree of association between songs and song features.

For example, when the optimized user rating matrix is a 1000000×10000 matrix, the background server can decompose it into a 1000000×300 matrix (the first matrix) and a 300×10000 matrix (the second matrix).
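The shape of the decomposition can be illustrated with the toy sketch below. It uses plain stochastic gradient descent on squared error as a stand-in, which is not necessarily the implicit factorization algorithm intended by the disclosure; the function name `factorize` and all hyperparameters (`steps`, `lr`, `reg`, `seed`) are assumptions.

```python
import random


def factorize(matrix, z, steps=200, lr=0.01, reg=0.02, seed=0):
    """Factor an x-by-y matrix into an x-by-z and a z-by-y matrix (SGD sketch)."""
    rng = random.Random(seed)
    x, y = len(matrix), len(matrix[0])
    # first: users x latent features; second: latent features x songs.
    first = [[rng.uniform(0, 0.1) for _ in range(z)] for _ in range(x)]
    second = [[rng.uniform(0, 0.1) for _ in range(y)] for _ in range(z)]
    for _ in range(steps):
        for i in range(x):
            for j in range(y):
                pred = sum(first[i][f] * second[f][j] for f in range(z))
                err = matrix[i][j] - pred
                for f in range(z):
                    u, s = first[i][f], second[f][j]
                    first[i][f] += lr * (err * s - reg * u)
                    second[f][j] += lr * (err * u - reg * s)
    return first, second


optimized = [[1.2, 0.0, 0.4], [0.0, 0.9, 0.0], [1.1, 0.0, 0.5], [0.0, 1.0, 0.1]]
first, second = factorize(optimized, z=2)  # 4x2 and 2x3, with z < x
```

Each column of `second` is then a z-dimensional feature vector for one song, which is the representation handed to the classifier in the next step.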

In step 205, the song classifier is generated according to the second matrix and the song labels of the sample songs.

Because song features are reflected in songs in the form of song labels, the background server can generate the song classifier from the degree of association between songs and song features indicated by the second matrix and from the song labels of the sample songs. In one possible implementation, the song classifier can be an SVM (Support Vector Machine).

In step 206, song labels are assigned to the songs in the song library by means of the song classifier.

After generating the song classifier, the background server determines the songs in the song library that have not been annotated with song labels and obtains the user rating matrix of users for these songs. It then uses this user rating matrix as the input of the song classifier to assign song labels to the unannotated songs. It should be noted that the user rating matrix corresponding to the unannotated songs also needs to go through processing similar to steps 203 and 204 above, which is not repeated here.

In this embodiment, different weight values are set for different types of operation behavior, users' scores for songs are calculated from these weight values, the corresponding user rating matrix is generated, and the song classifier is then generated according to the user rating matrix and the song labels of the sample songs. This achieves the classification of songs and improves the efficiency of song classification.

In this embodiment, the user rating matrix is optimized using a TF-IDF model, and the optimized user rating matrix is used to generate the song classifier, which improves the classification performance of the generated classifier.

In this embodiment, implicit matrix factorization is performed on the optimized user rating matrix to obtain a song feature matrix indicating the degree of association between songs and song features, and the song classifier is generated according to this song feature matrix and the song labels of the sample songs. This avoids the over-fitting caused by the excessively high dimension of the user rating matrix and further improves the classification performance of the classifier.

To ensure the classification performance of the generated song classifier, in one possible implementation, as shown in Fig. 2C, step 205 above may include the following steps.

In step 205A, a predetermined proportion of the sample songs is determined as a training set, the predetermined proportion being greater than 50%.

The background server chooses a predetermined proportion of the sample songs as the training set and uses the remaining sample songs as the test set. It should be noted that, to ensure the quality of the song classifier, the predetermined proportion needs to be greater than 50%; that is, the number of sample songs in the training set needs to be greater than the number of sample songs in the test set.

For example, the background server chooses 70% of the sample songs as the training set and uses the remaining 30% of the sample songs as the test set.
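The split of step 205A can be sketched as follows. The function name `split_samples`, the shuffling with a fixed seed, and the use of integer song identifiers are assumptions for illustration; the description only requires that the training proportion exceed 50%.

```python
import random


def split_samples(sample_ids, train_ratio=0.7, seed=0):
    """Shuffle the labelled sample songs and split them into training and test sets."""
    if not 0.5 < train_ratio < 1.0:
        raise ValueError("train_ratio must be greater than 50% and less than 100%")
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# 10,000 sample songs split 70% / 30%.
train_set, test_set = split_samples(range(10000))
print(len(train_set), len(test_set))  # 7000 3000
```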

In step 205B, the song classifier is built according to the second matrix and the training set.

The background server uses the second matrix and the training set to build the song classifier.

In step 205C, the classification performance of the song classifier is tested using the sample songs outside the training set.

After the song classifier has been built, the background server uses the test set as the input of the song classifier, assigns song labels to the sample songs in the test set, and checks the degree of matching between the assigned song labels and the manually annotated song labels. If the matching degree is higher than a preset matching degree threshold, the classification performance of the song classifier is determined to be up to standard, and step 206 is performed; if the matching degree is lower than the preset matching degree threshold, the classification performance of the song classifier is determined to be below standard.
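The check in step 205C can be sketched as below. The description does not define how the matching degree is computed, so this example assumes an average Jaccard overlap between predicted and annotated label sets; the names `matching_degree` and `MATCH_THRESHOLD` and the threshold value are likewise hypothetical.

```python
def matching_degree(predicted, annotated):
    """Average Jaccard overlap between predicted and manually annotated label sets."""
    overlaps = []
    for song_id, truth in annotated.items():
        guess = predicted.get(song_id, set())
        union = guess | truth
        overlaps.append(len(guess & truth) / len(union) if union else 1.0)
    return sum(overlaps) / len(overlaps)


MATCH_THRESHOLD = 0.8  # hypothetical preset matching degree threshold

annotated = {"song00001": {"rock", "bar"}, "song00002": {"pop"}}
predicted = {"song00001": {"rock"}, "song00002": {"pop"}}

degree = matching_degree(predicted, annotated)  # (0.5 + 1.0) / 2
up_to_standard = degree > MATCH_THRESHOLD
print(degree, up_to_standard)  # 0.75 False
```

With this toy input the classifier would be judged below standard, so step 206 would not be performed.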

In this embodiment, the sample songs are divided into a training set and a test set, the song classifier is built using the training set, and its classification performance is tested using the test set, which ensures the classification quality of the song classifier.

The following are device embodiments of the present disclosure, which can be used to carry out the method embodiments of the present disclosure. For details not disclosed in the device embodiments, refer to the method embodiments of the present disclosure.

Fig. 3 is a block diagram of a label distribution device according to an exemplary embodiment. The device can be implemented as all or part of the background server of a music platform by software, hardware, or a combination of the two. The device includes:

a sample acquisition module 310, configured to obtain sample songs, each sample song carrying a song label annotated in advance;

a matrix acquisition module 320, configured to obtain a user rating matrix, the user rating matrix containing at least one user's scores for the sample songs, each score being calculated from the user's operation behavior on the sample song;

a generation module 330, configured to generate a song classifier according to the user rating matrix and the song labels of the sample songs; and

a distribution module 340, configured to assign song labels to the songs in a song library by means of the song classifier.

In summary, in the label distribution device provided by this embodiment, song labels are annotated on sample songs in advance, the corresponding user rating matrix is obtained from users' operation behavior on the sample songs, a song classifier is generated according to this user rating matrix and the song labels of the sample songs, and the song classifier assigns song labels to all songs in the song library. This solves the problem that, for songs whose song features are not distinctive or are very similar, assigning song labels according to song features has low accuracy and degrades song classification. Behavior features are extracted from the user behavior data produced while users listen to songs, the song classifier is built from these behavior features, and songs are then classified by the classifier, which improves the accuracy of song classification.

Fig. 4 is a block diagram of a label distribution device according to another exemplary embodiment. The device may be implemented as all or part of a music platform background server by software, hardware, or a combination of the two. The device includes:

a sample acquisition module 410, configured to obtain a sample song, the sample song containing a song label annotated in advance;

a matrix acquisition module 420, configured to obtain a user rating matrix, the user rating matrix containing the score of at least one user for the sample song, the score being calculated according to the user's operation behavior on the sample song;

a generation module 430, configured to generate a song classifier according to the user rating matrix and the song label of the sample song;

a distribution module 440, configured to assign a song label to each song in the song library by means of the song classifier.

In an optional embodiment, the user's operation behavior on the sample song includes at least one of playing, downloading, collecting, and sharing;

the matrix acquisition module 420 includes:

a calculation submodule 421, configured to calculate the score of at least one user for the sample song according to the weight value corresponding to each type of operation behavior;

a matrix generation submodule 422, configured to generate the user rating matrix according to the scores of the at least one user for each sample song.
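The two submodules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the weight values, the name `BEHAVIOR_WEIGHTS`, the event format, and the choice to sum the weights of all recorded operations are hypothetical.

```python
from collections import defaultdict

# Hypothetical weight per operation type; the text does not fix these values.
BEHAVIOR_WEIGHTS = {"play": 1.0, "download": 2.0, "collect": 3.0, "share": 4.0}


def compute_scores(events):
    """Sum the weights of each user's operations on each song.

    `events` is an iterable of (user, song, behavior) tuples (assumed format).
    """
    scores = defaultdict(float)
    for user, song, behavior in events:
        scores[(user, song)] += BEHAVIOR_WEIGHTS[behavior]
    return scores


def build_rating_matrix(scores, users, songs):
    """Arrange the scores as a user-by-song matrix; unseen pairs score 0."""
    return [[scores.get((u, s), 0.0) for s in songs] for u in users]


events = [
    ("u1", "s1", "play"),
    ("u1", "s1", "collect"),
    ("u2", "s2", "share"),
]
matrix = build_rating_matrix(compute_scores(events), ["u1", "u2"], ["s1", "s2"])
```

Each row of `matrix` corresponds to one user and each column to one sample song, so a score of 0 means the user performed no recorded operation on that song.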

Optionally, the device further includes:

an optimization module 450, configured to optimize the user rating matrix using a term frequency-inverse document frequency (TF-IDF) model.

Optionally, for any score in the user rating matrix, the optimization module 450 includes:

an acquisition submodule 451, configured to obtain a score C_ij in the user rating matrix, C_ij representing the score of user i for song j;

a first calculation submodule 452, configured to calculate the term frequency tf_ij corresponding to C_ij, wherein [formula not reproduced], K being a control parameter;

a second calculation submodule 453, configured to calculate the inverse document frequency idf_j corresponding to song j, wherein [formula not reproduced], N being the total number of sample songs and n_j being the number of users in the user rating matrix whose score for song j is not 0;

a third calculation submodule 454, configured to calculate the optimized score w_ij according to tf_ij and idf_j, wherein w_ij = tf_ij * idf_j;

an optimization submodule 455, configured to generate the optimized user rating matrix according to w_ij.
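The optimization step above can be sketched as follows. The formulas for tf_ij and idf_j are not reproduced in this text, so the sketch substitutes assumed forms: a saturating term frequency controlled by K and a smoothed logarithmic inverse document frequency. Only the product w_ij = tf_ij * idf_j and the meanings of K, N, and n_j come from the description; the function name `tfidf_optimize` is hypothetical.

```python
import math


def tfidf_optimize(ratings, k=1.0):
    """Return the optimized matrix with entries w_ij = tf_ij * idf_j.

    Assumed forms (NOT taken from the source text):
      tf_ij = C_ij * (k + 1) / (C_ij + k)   # saturating term frequency
      idf_j = log(1 + N / n_j)              # smoothed inverse document frequency
    """
    n_songs = len(ratings[0])  # N: total number of sample songs
    # n_j: number of users whose score for song j is not 0
    n_nonzero = [sum(1 for row in ratings if row[j] != 0) for j in range(n_songs)]
    idf = [math.log(1 + n_songs / n) if n else 0.0 for n in n_nonzero]
    return [
        [(c * (k + 1) / (c + k)) * idf[j] if c else 0.0 for j, c in enumerate(row)]
        for row in ratings
    ]


ratings = [[4.0, 0.0], [1.0, 2.0]]
optimized = tfidf_optimize(ratings, k=1.0)
```

Under these assumed forms, a larger K lets high raw scores keep more of their magnitude, while songs scored by many users receive a smaller idf_j and therefore contribute less to the optimized matrix.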

a decomposition module 460, configured to perform latent-factor matrix decomposition on the optimized user rating matrix to obtain a first matrix and a second matrix, wherein the first matrix is an x×z matrix, the second matrix is a z×y matrix, z < x, the first matrix indicates the users' preference for song features, and the second matrix indicates the degree of association between songs and song features;
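The decomposition described above can be sketched as follows. The text does not specify the decomposition algorithm, so this sketch assumes plain stochastic gradient descent on the squared reconstruction error with L2 regularization; the function name `factorize` and all hyperparameter values are hypothetical.

```python
import random


def factorize(w, z, steps=2000, lr=0.01, reg=0.02, seed=0):
    """Approximate the x-by-y matrix w as p (x-by-z) times q (z-by-y)."""
    rng = random.Random(seed)
    x, y = len(w), len(w[0])
    p = [[rng.uniform(0.0, 0.5) for _ in range(z)] for _ in range(x)]
    q = [[rng.uniform(0.0, 0.5) for _ in range(y)] for _ in range(z)]
    for _ in range(steps):
        for i in range(x):
            for j in range(y):
                pred = sum(p[i][f] * q[f][j] for f in range(z))
                err = w[i][j] - pred
                for f in range(z):
                    p_if, q_fj = p[i][f], q[f][j]
                    # Gradient step on the squared error with L2 regularization.
                    p[i][f] += lr * (err * q_fj - reg * p_if)
                    q[f][j] += lr * (err * p_if - reg * q_fj)
    return p, q


# Optimized rating matrix with x = 3 users and y = 4 songs; z = 2 latent features.
w = [
    [1.1, 0.0, 0.7, 0.0],
    [0.0, 1.4, 0.0, 0.9],
    [1.0, 0.0, 0.8, 0.0],
]
p, q = factorize(w, z=2)
```

Column j of `q` is a z-dimensional vector describing song j, which is the representation the classifier is built on.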

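the generation module 430 is configured to generate the song classifier according to the second matrix and the song label of the sample song.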

Optionally, the generation module 430 includes:

a determination submodule 431, configured to determine a predetermined ratio of the sample songs as a training set, the predetermined ratio being greater than 50%;

a construction submodule 432, configured to build the song classifier according to the second matrix and the training set;

a test submodule 433, configured to test the classification performance of the song classifier using the sample songs other than the training set.
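The three submodules above can be sketched as follows. The type of classifier is not specified in this passage, so a nearest-centroid classifier is used here purely as a stand-in; the function names, the 80% ratio, and the toy second matrix `q` are hypothetical.

```python
import math


def split_samples(song_ids, ratio=0.8):
    """Take the first `ratio` share of the labeled samples as the training set."""
    if ratio <= 0.5:
        raise ValueError("the predetermined ratio must be greater than 50%")
    cut = int(len(song_ids) * ratio)
    return song_ids[:cut], song_ids[cut:]


def song_vector(q, j):
    """Column j of the z-by-y second matrix: the feature vector of song j."""
    return [row[j] for row in q]


def build_classifier(q, labels, train_ids):
    """Nearest-centroid stand-in: average the training vectors of each label."""
    sums, counts = {}, {}
    for j in train_ids:
        vec = song_vector(q, j)
        acc = sums.setdefault(labels[j], [0.0] * len(vec))
        for f, value in enumerate(vec):
            acc[f] += value
        counts[labels[j]] = counts.get(labels[j], 0) + 1
    return {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}


def classify(centroids, vec):
    """Return the label whose centroid is closest to the song vector."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], vec))


def accuracy(centroids, q, labels, test_ids):
    """Share of held-out sample songs whose predicted label is correct."""
    hits = sum(classify(centroids, song_vector(q, j)) == labels[j] for j in test_ids)
    return hits / len(test_ids)


# Toy second matrix (z = 2 features, y = 5 sample songs) and annotated labels.
q = [
    [0.9, 0.8, 0.1, 0.2, 0.85],
    [0.1, 0.2, 0.9, 0.8, 0.15],
]
labels = ["rock", "rock", "folk", "folk", "rock"]
train_ids, test_ids = split_samples([0, 1, 2, 3, 4], ratio=0.8)
centroids = build_classifier(q, labels, train_ids)
test_accuracy = accuracy(centroids, q, labels, test_ids)
```

Once the held-out accuracy is acceptable, the same `classify` call can be applied to the feature vector of any unlabeled song in the song library.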

With regard to the devices in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.

Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It should be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A label distribution method, characterized in that the method comprises:

obtaining a sample song, the sample song containing a song label annotated in advance;

obtaining a user rating matrix, the user rating matrix containing the score of at least one user for the sample song, the score being calculated according to the user's operation behavior on the sample song;

generating a song classifier according to the user rating matrix and the song label of the sample song;

assigning the song label to each song in a song library by means of the song classifier.

2. The method according to claim 1, characterized in that the user's operation behavior on the sample song includes at least one of playing, downloading, collecting, and sharing;

the obtaining a user rating matrix comprises:

calculating the score of at least one user for the sample song according to the weight value corresponding to each type of operation behavior;

generating the user rating matrix according to the scores of the at least one user for each sample song.

3. The method according to claim 1 or 2, characterized in that, after the obtaining a user rating matrix, the method further comprises:

optimizing the user rating matrix using a term frequency-inverse document frequency (TF-IDF) model.

4. The method according to claim 3, characterized in that, for any score in the user rating matrix, the optimizing the user rating matrix using a TF-IDF model comprises:

obtaining a score C_ij in the user rating matrix, the C_ij representing the score of user i for song j;

calculating the term frequency tf_ij corresponding to the C_ij, wherein [formula not reproduced], K being a control parameter;

calculating the inverse document frequency idf_j corresponding to song j, wherein [formula not reproduced], N being the total number of sample songs and the n_j being the number of users in the user rating matrix whose score for song j is not 0;

calculating the optimized score w_ij according to the tf_ij and idf_j, wherein w_ij = tf_ij * idf_j;

generating the optimized user rating matrix according to the w_ij.

5. The method according to claim 3, characterized in that the optimized user rating matrix is an x×y matrix, and after the optimizing the user rating matrix using a TF-IDF model, the method further comprises:

performing latent-factor matrix decomposition on the optimized user rating matrix to obtain a first matrix and a second matrix, wherein the first matrix is an x×z matrix, the second matrix is a z×y matrix, z < x, the first matrix indicates the users' preference for song features, and the second matrix indicates the degree of association between songs and song features;

the generating a song classifier according to the user rating matrix and the song label of the sample song comprises:

generating the song classifier according to the second matrix and the song label of the sample song.

6. The method according to claim 5, characterized in that the generating the song classifier according to the second matrix and the song label of the sample song comprises:

determining a predetermined ratio of the sample songs as a training set, the predetermined ratio being greater than 50%;

building the song classifier according to the second matrix and the training set;

testing the classification performance of the song classifier using the sample songs other than the training set.

7. A label distribution device, characterized in that the device comprises:

a sample acquisition module, configured to obtain a sample song, the sample song containing a song label annotated in advance;

a matrix acquisition module, configured to obtain a user rating matrix, the user rating matrix containing the score of at least one user for the sample song, the score being calculated according to the user's operation behavior on the sample song;

a generation module, configured to generate a song classifier according to the user rating matrix and the song label of the sample song;

a distribution module, configured to assign the song label to each song in a song library by means of the song classifier.

8. The device according to claim 7, characterized in that the user's operation behavior on the sample song includes at least one of playing, downloading, collecting, and sharing;

the matrix acquisition module comprises:

a calculation submodule, configured to calculate the score of at least one user for the sample song according to the weight value corresponding to each type of operation behavior;

a matrix generation submodule, configured to generate the user rating matrix according to the scores of the at least one user for each sample song.

9. The device according to claim 7 or 8, characterized in that the device further comprises:

an optimization module, configured to optimize the user rating matrix using a term frequency-inverse document frequency (TF-IDF) model.

10. The device according to claim 9, characterized in that, for any score in the user rating matrix, the optimization module comprises:

an acquisition submodule, configured to obtain a score C_ij in the user rating matrix, the C_ij representing the score of user i for song j;

a first calculation submodule, configured to calculate the term frequency tf_ij corresponding to the C_ij, wherein [formula not reproduced], K being a control parameter;

a second calculation submodule, configured to calculate the inverse document frequency idf_j corresponding to song j, wherein [formula not reproduced], N being the total number of sample songs and the n_j being the number of users in the user rating matrix whose score for song j is not 0;

a third calculation submodule, configured to calculate the optimized score w_ij according to the tf_ij and idf_j, wherein w_ij = tf_ij * idf_j;

an optimization submodule, configured to generate the optimized user rating matrix according to the w_ij.

11. The device according to claim 9, characterized in that the optimized user rating matrix is an x×y matrix, and the device further comprises:

a decomposition module, configured to perform latent-factor matrix decomposition on the optimized user rating matrix to obtain a first matrix and a second matrix, wherein the first matrix is an x×z matrix, the second matrix is a z×y matrix, z < x, the first matrix indicates the users' preference for song features, and the second matrix indicates the degree of association between songs and song features;

the generation module is configured to generate the song classifier according to the second matrix and the song label of the sample song.

12. The device according to claim 11, characterized in that the generation module comprises:

a determination submodule, configured to determine a predetermined ratio of the sample songs as a training set, the predetermined ratio being greater than 50%;

a construction submodule, configured to build the song classifier according to the second matrix and the training set;

a test submodule, configured to test the classification performance of the song classifier using the sample songs other than the training set.