CN113010728A

CN113010728A - Song recommendation method, system, intelligent device and storage medium

Info

Publication number: CN113010728A
Application number: CN202110367621.3A
Authority: CN
Inventors: 叶建仲
Original assignee: Jinbaobei Network Technology Suzhou Co ltd
Current assignee: Jinbaobei Network Technology Suzhou Co ltd
Priority date: 2021-04-06
Filing date: 2021-04-06
Publication date: 2021-06-22

Abstract

The invention provides a song recommendation method, a song recommendation system, an intelligent device and a storage medium. The method comprises the following steps: finely classifying the target type songs; acquiring a plurality of song files under each fine category and converting the song files into training spectrograms; establishing a song classification model and training it with the training spectrograms; acquiring a current song selected by a user under the target type, inputting the spectrogram corresponding to the current song into the trained song classification model, and obtaining the current fine classification corresponding to the current song; and recommending songs according to the current fine classification. The scheme recommends similar songs according to the characteristics of the songs themselves, without acquiring user behavior data or the like, so that the recommendation accuracy is higher, the application range is wider, and the needs of different groups of people can be met.

Description

Song recommendation method, system, intelligent device and storage medium

Technical Field

The invention relates to the technical field of children's song recommendation, and in particular to a song recommendation method, a song recommendation system, an intelligent device and a storage medium.

Background

Many song recommendation methods exist. The mainstream music recommendation technology, used by services such as Spotify and Xiami Music, applies collaborative filtering algorithms to classify and evaluate large amounts of user behavior data, social data, user profile data and the like. However, this approach is unsuitable for vertical or relatively small user groups, such as listeners of children's songs: because there is little or no user data, recommendations either cannot be made from user behavior or have low accuracy. Therefore, a song recommendation method is needed that achieves higher accuracy without acquiring user behavior data or the like.

Disclosure of Invention

The invention aims to provide a song recommendation method, a song recommendation system, an intelligent device and a storage medium. The scheme recommends similar songs according to the characteristics of the songs themselves, without acquiring user behavior data or the like, so that the recommendation accuracy is higher, the application range is wider, and the needs of different groups of people can be met.

The technical scheme provided by the invention is as follows:

The invention provides a song recommendation method, which comprises the following steps:

finely classifying the target type songs;

acquiring a plurality of song files under each subdivision category, and converting the song files into a training spectrogram;

establishing a song classification model, and training the song classification model through the training spectrogram;

acquiring a current song selected by a user under a target type, inputting a spectrogram corresponding to the current song into the trained song classification model, and acquiring a current fine classification corresponding to the current song;

and recommending songs according to the current fine classification.

Specifically, a broad type of song can be subdivided. For example, children's songs can be divided into 21 fine categories: cradle songs, game songs, digital songs, question and answer songs, passwords, chain tunes, riddle songs, reverse songs, capitals songs, story categories, lyrics categories, kindergartens, antenatal music, sleeping songs, ear grinding, animation songs, bevaco songs, sanji songs, ancient poems, English children's songs, and classical music. Of course, the songs may also be classified according to other criteria.

When a plurality of song files under each fine category are acquired as the training set, a larger number of song files should be selected for each fine category in order to improve the accuracy of the model. Taking the above children's songs as an example, in this embodiment 50 MP3 song files are selected for each of the 21 fine categories, for a total of 1050 MP3 files.

An audio file contains a large amount of information. In order to extract features and remove noise, the audio can be converted into image form: the audio signal is transformed into the frequency domain by means of the Fourier transform. The 1050 MP3 audio files are processed in this way, and each song is converted into a spectrogram. A spectrogram is a visual representation of the frequency spectrum of a sound over time, in which the shade of color at each point represents the magnitude of the sound at that frequency and time.
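By way of illustration only, the conversion from an audio signal to a spectrogram by means of the Fourier transform may be sketched as follows. The sketch assumes that the MP3 file has already been decoded into an array of samples; the function name `spectrogram`, the frame length of 512 samples, the hop of 256 samples, the Hann window and the decibel scaling are assumptions of this example and are not specified in the embodiment. A synthetic 440 Hz tone stands in for real song audio.

```python
import numpy as np

def spectrogram(signal, frame_len=512, hop=256):
    """Return a magnitude spectrogram in dB, shape (frequency bins, time frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    # Fourier transform of every frame (real input, so rfft is sufficient).
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    # Transpose so that rows are frequencies; small constant avoids log(0).
    return 20.0 * np.log10(magnitude.T + 1e-10)

# Synthetic 1-second, 440 Hz tone standing in for decoded song samples (assumption).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440.0 * t)

spec = spectrogram(tone)
peak_bin = int(spec.mean(axis=1).argmax())   # strongest frequency row
peak_hz = peak_bin * sample_rate / 512       # convert bin index to Hz
```

In the resulting array each column is one time frame and each row is one frequency bin, so rendering the array as an image gives the shade-of-color representation described above.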

In this embodiment, the song classification model is a convolutional neural network model built with TensorFlow. During classification, a spectrogram image is converted into a numerical matrix representing the color of each pixel; the data is then processed by convolutional layers, pooling layers, fully connected layers and the like, and finally passed to a softmax classifier. The output of the classifier is a vector of numbers (for example, 21 numbers corresponding to the 21 fine categories of children's songs), each of which is the probability that the convolutional neural network model assigns to the spectrogram for one fine category; the category with the highest probability is selected as the final classification. In other embodiments, other neural network models may be used to classify songs.
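The final softmax step can be illustrated with a short sketch. Only the last stage is shown: the vector `logits` is a hypothetical stand-in for the output of the last fully connected layer, and its values are made up for the example; the convolutional and pooling layers themselves are omitted.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = logits - np.max(logits)   # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

NUM_CLASSES = 21                        # one entry per fine category

# Hypothetical scores from the last fully connected layer (made-up values).
logits = np.zeros(NUM_CLASSES)
logits[4] = 3.0                         # category index 4 scores highest

probs = softmax(logits)
predicted = int(np.argmax(probs))       # index of the maximum-probability category
```

Here `predicted` is the index of the fine category with the highest probability, which corresponds to the final classification described above.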

The song classification model can be trained by performing fine classification on the target type songs, acquiring a plurality of song files under each fine classification, and converting the song files into the training spectrogram; when a user selects a song under the target type, the spectrogram corresponding to the current song is input into the trained song classification model, so that the current fine classification corresponding to the current song can be obtained, and the song recommendation can be carried out according to the current fine classification. Because the scheme does not need to acquire user behaviors, data and the like, similar recommendation can be performed according to the characteristics of the songs, so that the recommendation precision of the scheme is higher, the application range is wider, and the requirements of different types of people can be met.

Further, after the song classification model is trained through the training spectrogram, the method further comprises the following steps:

performing fine classification on all the songs in the stored target type through the trained song classification model;

establishing a song similarity model for each fine category, and extracting the feature vectors of all songs under each fine category through the song similarity model;

after obtaining the current fine classification corresponding to the current song, the method further comprises the following steps:

extracting a feature vector corresponding to the current song through the song similarity model corresponding to the current fine classification;

and comparing the feature vector corresponding to the current song with the feature vectors of all the songs under the current fine classification to obtain the target song with the highest similarity and recommend the target song.

Specifically, in order to recommend a target song more accurately, when a song classification model is trained, all songs in a stored target type are classified finely, a song similarity model is established for each fine classification, and then the feature vectors of all songs in each fine classification are extracted through the song similarity model; after the fine classification of the current song is determined, the feature vector corresponding to the current song is extracted through the song similarity model corresponding to the current fine classification, the feature vector corresponding to the current song is compared with the feature vectors of all songs under the current fine classification, and the target song with the highest similarity can be obtained for recommendation, so that the recommendation accuracy is improved.

Further, comparing the feature vector corresponding to the current song with the feature vectors of all songs in the current fine category through a cosine similarity algorithm;

and recommending the song with the maximum cosine similarity as a target song.

Specifically, the feature vector corresponding to the current song may be compared with the feature vectors of all songs in the current fine category through a cosine similarity algorithm. The cosine similarity algorithm places the vectors in a vector space according to their coordinate values, determines the angle between them, and takes the cosine of that angle as a measure of the similarity of the two vectors: the smaller the angle, the closer the cosine is to 1 and the more similar the two vectors are. Cosine similarity can therefore be used as the main index for comparing songs, and the song with the maximum cosine similarity is recommended as the target song.
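As a minimal sketch of this comparison, the cosine of the angle between two vectors a and b is (a · b) / (|a| |b|). The three-dimensional vectors and the song names below are invented for the example; real feature vectors would come from the song similarity model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Made-up feature vectors; the names are placeholders, not real songs.
current = np.array([1.0, 2.0, 3.0])
candidates = {
    "song_a": np.array([2.0, 4.0, 6.0]),    # same direction as `current`
    "song_b": np.array([3.0, 2.0, 1.0]),
    "song_c": np.array([-1.0, 0.0, 1.0]),
}

scores = {name: cosine_similarity(current, vec) for name, vec in candidates.items()}
target = max(scores, key=scores.get)        # song with the maximum cosine similarity
```

Because `song_a` points in the same direction as `current`, its cosine similarity is 1 and it is selected as the target, regardless of the difference in vector length.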

Further, the obtaining a plurality of song files under each fine category and converting the song files into a training spectrogram specifically includes:

extracting audio of a preset time period in the song file;

converting the audio of the preset time period into a complete spectrogram;

dividing the complete spectrogram into a preset number of training spectrograms in equal parts;

the inputting the spectrogram corresponding to the current song into the trained song classification model specifically includes:

extracting the audio of the current song in a preset time period;

converting the audio of the current song in the preset time period into a preset number of spectrograms;

and inputting the preset number of spectrograms corresponding to the current song into the trained song classification model.

Specifically, in order to ensure the uniformity of the data, the audio of a preset time period in each song file can be extracted and converted into a complete spectrogram, and the complete spectrogram is divided equally into a preset number of training spectrograms. Taking the above classification of children's songs as an example, about 2 minutes of audio may be extracted from each song and converted into a spectrogram, and the image is then divided into square pictures of 256 × 256 pixels, each covering about 5 seconds of audio, so that one song yields about 24 spectrograms. After division there are about 25200 pictures in total, and each picture is labeled with its fine category. Splitting each song into 24 spectrograms improves the accuracy of model training and reduces the influence of chance. Of course, in other embodiments, the spectrogram may be split differently according to actual requirements.
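The division and the resulting counts can be checked with a short sketch. It assumes that the complete spectrogram is stored as an array 256 pixels high whose width is a multiple of 256; the helper name `split_into_tiles` and the all-zero placeholder image are assumptions of this example.

```python
import numpy as np

TILE = 256  # side length of each square picture, in pixels

def split_into_tiles(full_spec, tile=TILE):
    """Cut a (tile, width) spectrogram image into square tiles, dropping any remainder."""
    n_tiles = full_spec.shape[1] // tile
    return [full_spec[:, i * tile : (i + 1) * tile] for i in range(n_tiles)]

# Placeholder image: about 2 minutes of audio at about 5 seconds per tile -> 24 tiles.
full_spec = np.zeros((TILE, 24 * TILE), dtype=np.float32)
tiles = split_into_tiles(full_spec)

songs = 21 * 50                      # 21 fine categories, 50 songs each
total_tiles = songs * len(tiles)     # labeled pictures in the training set
```

With 1050 songs and 24 tiles per song, `total_tiles` is 25200, matching the figure given in the embodiment.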

Further, the extracting feature vectors of all songs under each fine category through the song similarity model specifically includes:

extracting the audio of all songs in each fine category in a preset time period;

converting the audio of the preset time period corresponding to all songs in each fine category into a preset number of spectrograms;

extracting, through the song similarity model, a preset number of feature vectors corresponding to the preset number of spectrograms of each song under each fine category;

calculating an average vector of each song under each fine category;

the extracting of the feature vector corresponding to the current song through the song similarity model corresponding to the current fine classification specifically includes:

extracting, through the song similarity model corresponding to the current fine classification, a preset number of feature vectors corresponding to the preset number of spectrograms of the current song;

calculating an average vector corresponding to the current song;

comparing the feature vector corresponding to the current song with the feature vectors of all songs under the current fine classification specifically comprises:

and comparing the average vector corresponding to the current song with the average vectors of the songs under the current fine classification.

Specifically, when extracting the feature vectors, in order to improve accuracy and reduce the influence of chance, 24 feature vectors can be obtained for each song by the same method, and the average of these vectors is then computed for each song. The average vector is used as the basis for evaluating the similarity between different songs, which yields better recommendations.
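The effect of averaging can be illustrated with synthetic data. In the sketch below, each of the 24 per-spectrogram vectors is modeled as an idealized song vector plus random noise; this noise model, the vector size of 128 and the random seed are assumptions made purely for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIMS, TILES = 128, 24    # 128 dimensions is an assumed size; 24 spectrograms per song

true_vector = rng.normal(size=DIMS)                          # idealized song feature
tile_vectors = true_vector + rng.normal(size=(TILES, DIMS))  # noisy per-spectrogram features

average_vector = tile_vectors.mean(axis=0)                   # one vector per song

# Distance from the idealized vector: average versus the best single spectrogram.
error_of_average = np.linalg.norm(average_vector - true_vector)
smallest_tile_error = np.linalg.norm(tile_vectors - true_vector, axis=1).min()
```

Under this noise model the averaged vector lies closer to the idealized vector than any single per-spectrogram vector does, which is the sense in which averaging reduces the influence of chance.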

In addition, the present invention also provides a song recommendation system, comprising:

the first classification module is used for performing fine classification on the target type songs;

the acquisition module is connected with the first classification module and used for acquiring a plurality of song files under each fine classification and converting the song files into a training spectrogram;

the training module is connected with the acquisition module and used for establishing a song classification model and training the song classification model through the training spectrogram;

and the recommending module is connected with the training module and used for acquiring the current song under the target type selected by the user, inputting the spectrogram corresponding to the current song into the trained song classification model, acquiring the current fine classification corresponding to the current song and recommending the song according to the current fine classification.

The first classification module performs fine classification on the target type songs, the acquisition module acquires a plurality of song files under each fine category and converts them into training spectrograms, and the training module can then train the song classification model; when a user selects a song under the target type, the recommendation module inputs the spectrogram corresponding to the current song into the trained song classification model, so that the current fine classification corresponding to the current song can be obtained and songs can be recommended according to the current fine classification. Because the scheme does not need to acquire user behavior data or the like and recommends similar songs according to the characteristics of the songs themselves, the recommendation accuracy is higher, the application range is wider, and the needs of different groups of people can be met.

Further, the system also comprises:

the second classification module is used for performing fine classification on all the stored songs under the target type through the trained song classification model;

the first extraction module is used for establishing a song similarity model for each fine category and extracting the feature vectors of all songs under each fine category through the song similarity model;

the second extraction module is used for extracting the feature vector corresponding to the current song through the song similarity model corresponding to the current fine classification;

and the comparison module is used for comparing the feature vector corresponding to the current song with the feature vectors of all the songs under the current fine classification to obtain the target song with the highest similarity, and the recommendation module recommends the target song.

Specifically, in order to recommend a target song more accurately, when a song classification model is trained, a second classification module is used for performing fine classification on all songs in a stored target type, a song similarity model is established for each fine classification, and a first extraction module is used for extracting feature vectors of all songs in each fine classification; after the fine classification of the current song is determined, the feature vector corresponding to the current song can be extracted through the second extraction module, and then the feature vector corresponding to the current song is compared with the feature vectors of all songs under the current fine classification through the comparison module, so that the target song with the highest similarity can be obtained for recommendation, and the recommendation accuracy is improved.

Further, the comparison module compares the feature vector corresponding to the current song with the feature vectors of all the songs in the current fine category through a cosine similarity algorithm, and takes the song with the largest cosine similarity as the target song.

In addition, the present invention also provides an intelligent device, comprising:

a memory, used for storing an executable program;

and a processor, used for executing the program stored in the memory to implement the operations performed by the above song recommendation method.

In addition, the present invention also provides a storage medium, where at least one instruction is stored, and the instruction is loaded and executed by a processor to implement the operations performed by the above-mentioned song recommendation method.

According to the song recommendation method, the song recommendation system, the intelligent device and the storage medium, provided by the invention, the song classification model can be trained by finely classifying the target type of songs, acquiring a plurality of song files under each fine classification, and converting the song files into the training spectrogram; when a user selects a song under the target type, the spectrogram corresponding to the current song is input into the trained song classification model, so that the current fine classification corresponding to the current song can be obtained, and the song recommendation can be carried out according to the current fine classification. Because the scheme does not need to acquire user behaviors, data and the like, similar recommendation can be performed according to the characteristics of the songs, so that the recommendation precision of the scheme is higher, the application range is wider, and the requirements of different types of people can be met.

Drawings

The foregoing features, technical features, advantages and embodiments of the present invention will be further explained in the following detailed description of the preferred embodiments, which is to be read in connection with the accompanying drawings.

FIG. 1 is a schematic overall flow diagram of an embodiment of the present invention;

FIG. 2 is a schematic flow diagram of another embodiment of the present invention;

FIG. 3 is a system architecture diagram of an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of an intelligent device according to an embodiment of the present invention.

Reference numerals: 1-a first classification module; 2-an acquisition module; 3-a training module; 4-a recommendation module; 5-a second classification module; 6-a first extraction module; 7-a second extraction module; 8-a comparison module; 100-a memory; 200-a processor; 300-a communication interface; 400-a communication bus; 500-input/output interface.

Detailed Description

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.

For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention, and they do not represent the actual structure as a product. In addition, in order to make the drawings concise and understandable, components having the same structure or function in some of the drawings are only schematically illustrated or only labeled. In this document, "one" means not only "only one" but also a case of "more than one".

Example 1

One embodiment of the present invention, as shown in fig. 1, provides a song recommendation method, including the steps of:

S1, finely classifying the target type songs.

S2, acquiring a plurality of song files under each fine classification, and converting the song files into a training spectrogram.

S3, establishing a song classification model, and training the song classification model through the training spectrogram.

S4, acquiring the current song under the target type selected by the user, inputting the spectrogram corresponding to the current song into the trained song classification model, and acquiring the current fine classification corresponding to the current song.

S5, recommending songs according to the current fine classification.
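The flow of steps S4 and S5 can be sketched as follows, with the trained model reduced to a lookup. The function `recommend_by_category`, the `classify` callable and the small catalog are hypothetical; in the embodiment the classification would be produced by the trained song classification model from the spectrogram of the current song.

```python
from typing import Callable, Dict, List

def recommend_by_category(current_song: str,
                          classify: Callable[[str], str],
                          catalog: Dict[str, List[str]]) -> List[str]:
    """S4: classify the current song; S5: return other songs of the same fine category."""
    category = classify(current_song)
    return [song for song in catalog.get(category, []) if song != current_song]

# Hypothetical catalog, already grouped by fine category.
catalog = {
    "cradle songs": ["song_1", "song_2", "song_3"],
    "game songs": ["song_4"],
}
# Stand-in for the trained classifier: a fixed lookup table (assumption).
labels = {"song_2": "cradle songs", "song_4": "game songs"}

result = recommend_by_category("song_2", labels.__getitem__, catalog)
```

The current song is excluded from the result so that the user is not recommended the song already selected; this exclusion is a design choice of the sketch rather than something stated in the embodiment.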

Example 2

As shown in fig. 2, in an embodiment of the present invention, on the basis of embodiment 1, after the song classification model is trained through the training spectrogram, the method further includes the steps of:

S31, finely classifying all the songs under the stored target type through the trained song classification model.

S32, establishing a song similarity model for each fine category, and extracting the feature vectors of all songs under each fine category through the song similarity model.

S41, extracting the feature vector corresponding to the current song through the song similarity model corresponding to the current fine classification.

S42, comparing the feature vector corresponding to the current song with the feature vectors of all songs under the current fine classification to obtain the target song with the highest similarity and recommending the target song.

Preferably, the feature vector corresponding to the current song is compared with the feature vectors of all songs in the current fine classification through a cosine similarity algorithm, and the song with the maximum cosine similarity is recommended as the target song.
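Steps S31 to S42 amount to building one lookup structure per fine category and then searching only within the category of the current song. The sketch below assumes that the feature vectors have already been extracted; the function names `build_index` and `most_similar`, the two-dimensional vectors and the song identifiers are all hypothetical.

```python
import numpy as np

def build_index(songs):
    """Group (song_id, category, vector) triples by fine category (steps S31-S32)."""
    index = {}
    for song_id, category, vector in songs:
        index.setdefault(category, []).append((song_id, np.asarray(vector, dtype=float)))
    return index

def most_similar(index, category, query, exclude=None):
    """Return the song in `category` with the highest cosine similarity (steps S41-S42)."""
    query = np.asarray(query, dtype=float)
    best_id, best_score = None, -np.inf
    for song_id, vec in index.get(category, []):
        if song_id == exclude:
            continue                      # do not recommend the current song itself
        score = np.dot(query, vec) / (np.linalg.norm(query) * np.linalg.norm(vec))
        if score > best_score:
            best_id, best_score = song_id, score
    return best_id

# Made-up library: identifiers, categories and vectors are placeholders.
library = [
    ("a", "cradle songs", [1.0, 0.0]),
    ("b", "cradle songs", [0.9, 0.1]),
    ("c", "cradle songs", [0.0, 1.0]),
    ("d", "game songs", [1.0, 0.0]),
]
index = build_index(library)
target = most_similar(index, "cradle songs", [1.0, 0.0], exclude="a")
```

Song "d" has the same vector as the query but belongs to a different fine category, so it is never considered; restricting the search to the current fine category is what distinguishes this two-stage scheme from a single global similarity search.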

Example 3

In an embodiment of the present invention, on the basis of embodiment 2, acquiring a plurality of song files under each fine category and converting the song files into a training spectrogram specifically includes:

extracting audio of a preset time period in the song file; converting the audio of the preset time period into a complete spectrogram; and dividing the complete spectrogram equally into a preset number of training spectrograms.

Inputting the spectrogram corresponding to the current song into the trained song classification model, which specifically comprises the following steps:

extracting the audio of the current song in the preset time period; converting the audio of the current song in the preset time period into a preset number of spectrograms; and inputting the preset number of spectrograms corresponding to the current song into the trained song classification model.
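The embodiment does not state how the classification results of the preset number of spectrograms are combined into a single fine classification for the current song. One plausible approach, shown here purely as an assumption, is to average the probability vectors of all spectrograms and take the category with the highest mean probability. The probability values below are invented.

```python
import numpy as np

def aggregate_tile_predictions(tile_probs):
    """Average per-spectrogram probability vectors and pick the most likely category.

    This aggregation rule is an assumption of the sketch, not part of the embodiment.
    """
    mean_probs = np.asarray(tile_probs).mean(axis=0)
    return int(mean_probs.argmax()), mean_probs

# Invented outputs for 24 spectrograms over 21 fine categories; each row sums to 1.
tile_probs = np.full((24, 21), 0.02)
tile_probs[:16, 3] = 0.6     # 16 spectrograms favor category index 3
tile_probs[16:, 7] = 0.6     # 8 spectrograms favor category index 7

predicted, mean_probs = aggregate_tile_predictions(tile_probs)
```

Averaging lets the majority of spectrograms outweigh a minority of atypical segments, for example an instrumental introduction; a simple majority vote over the per-spectrogram classifications would be an alternative.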

Preferably, extracting feature vectors of all songs under each fine category through a song similarity model specifically includes:

extracting the audio of all songs in each fine category in the preset time period; converting the audio of the preset time period corresponding to all songs in each fine category into a preset number of spectrograms; extracting, through the song similarity model, a preset number of feature vectors corresponding to the preset number of spectrograms of each song under each fine category; and calculating an average vector for each song under each fine category.

Extracting a feature vector corresponding to the current song through a song similarity model corresponding to the current fine classification, which specifically comprises the following steps:

extracting, through the song similarity model corresponding to the current fine classification, a preset number of feature vectors corresponding to the preset number of spectrograms of the current song; and calculating an average vector corresponding to the current song.

Comparing the feature vector corresponding to the current song with the feature vectors of all songs under the current fine classification, specifically comprising:

comparing the average vector corresponding to the current song with the average vectors of the individual songs under the current fine classification.

Example 4

In an embodiment of the present invention, as shown in fig. 3, the present invention further provides a song recommendation system, which includes a first classification module 1, an acquisition module 2, a training module 3, and a recommendation module 4.

The first classification module 1 is used for performing fine classification on the target type songs.

The obtaining module 2 is connected with the first classification module 1, and is used for obtaining a plurality of song files under each fine classification and converting the song files into a training spectrogram.

The training module 3 is connected with the acquisition module 2 and used for establishing a song classification model and training the song classification model through a training spectrogram.

And the recommendation module 4 is connected with the training module 3 and is used for acquiring the current song under the target type selected by the user, inputting the spectrogram corresponding to the current song into the trained song classification model, acquiring the current fine classification corresponding to the current song and recommending the song according to the current fine classification.

The first classification module 1 performs fine classification on the target type songs, the acquisition module 2 acquires a plurality of song files under each fine category and converts them into training spectrograms, and the training module 3 can then train the song classification model; when a user selects a song under the target type, the recommendation module 4 inputs the spectrogram corresponding to the current song into the trained song classification model, so that the current fine classification corresponding to the current song can be obtained and songs can be recommended according to the current fine classification. Because the scheme does not need to acquire user behavior data or the like and recommends similar songs according to the characteristics of the songs themselves, the recommendation accuracy is higher, the application range is wider, and the needs of different groups of people can be met.

Example 5

In an embodiment of the present invention, as shown in fig. 3, on the basis of embodiment 4, the song recommending system further includes a second classification module 5, a first extraction module 6, a second extraction module 7 and a comparison module 8.

The second classification module 5 is used for performing fine classification on all the songs under the stored target type through the trained song classification model.

The first extraction module 6 is used for establishing a song similarity model for each fine category and extracting the feature vectors of all songs under each fine category through the song similarity model.

The second extraction module 7 is configured to extract a feature vector corresponding to the current song through the song similarity model corresponding to the current fine classification.

The comparison module 8 is configured to compare the feature vector corresponding to the current song with the feature vectors of all songs in the current fine category, to obtain a target song with the highest similarity, and the recommendation module recommends the target song.

Specifically, in order to recommend a target song more accurately, when a song classification model is trained, a second classification module 5 is used for performing fine classification on all songs in a stored target type, a song similarity model is established for each fine classification, and a first extraction module 6 is used for extracting feature vectors of all songs in each fine classification; after the fine classification of the current song is determined, the feature vector corresponding to the current song can be extracted through the second extraction module 7, and then the feature vector corresponding to the current song is compared with the feature vectors of all songs under the current fine classification through the comparison module 8, so that the target song with the highest similarity can be obtained for recommendation, and the recommendation accuracy is improved.

Preferably, the comparing module 8 calculates the feature vector corresponding to the current song and the feature vectors of all songs in the current fine category through a cosine similarity algorithm, and takes the song with the largest cosine similarity as the target song.

In addition, in order to ensure the uniformity of the data, the audio of a preset time period in each song file can be extracted and converted into a complete spectrogram, and the complete spectrogram is divided equally into a preset number of training spectrograms. Taking the above classification of children's songs as an example, about 2 minutes of audio may be extracted from each song and converted into a spectrogram, and the image is then divided into square pictures of 256 × 256 pixels, each covering about 5 seconds of audio, so that one song yields about 24 spectrograms. After division there are about 25200 pictures in total, and each picture is labeled with its fine category. Splitting each song into 24 spectrograms improves the accuracy of model training and reduces the influence of chance. Meanwhile, when extracting the feature vectors, in order to improve accuracy and reduce the influence of chance, 24 feature vectors can be obtained for each song by the same method, and the average of these vectors is then computed for each song. The average vector is used as the basis for evaluating the similarity between different songs, which yields better recommendations.

Example 6

As shown in fig. 4, an embodiment of the present invention further provides an intelligent device, which includes a memory 100 and a processor 200, where the memory 100 is used to store an execution program, and the processor 200 is used to execute the execution program stored in the memory, so as to implement the operations performed by the song recommendation method according to any one of embodiments 1 to 3.

Specifically, the smart device may further include a communication interface 300, a communication bus 400 and an input/output interface 500, wherein the processor 200, the memory 100, the input/output interface 500 and the communication interface 300 complete communication with each other through the communication bus 400.

The communication bus 400 is a circuit that connects the described elements and enables transmission between them. For example, the processor 200 receives commands from other elements through the communication bus 400, decrypts the received commands, and performs calculations or data processing according to the decrypted commands. The memory 100 may include program modules such as a kernel, middleware, an Application Programming Interface (API), and applications. The program modules may consist of software, firmware or hardware, or a combination of at least two of these. The input/output interface 500 forwards commands or data entered by a user via an input/output device (e.g., a sensor, keyboard, or touch screen). The communication interface 300 connects the smart device with other network devices, user equipment, and networks. For example, the communication interface 300 may be connected to a network by wire or wirelessly in order to connect to other external network devices or user devices. The wireless communication may include at least one of: wireless fidelity (WiFi), Bluetooth (BT), Near Field Communication (NFC), Global Positioning System (GPS), cellular communication, and the like. The wired communication may include at least one of: Universal Serial Bus (USB), High-Definition Multimedia Interface (HDMI), the asynchronous serial transmission standard interface (RS-232), and the like. The network may be a telecommunications network or a communication network. The communication network may be a computer network, the Internet of Things, or a telephone network. The smart device may connect to the network through the communication interface 300, and the protocol by which the smart device communicates with other network devices may be supported by at least one of an application, an Application Programming Interface (API), middleware, a kernel, and a communication interface.

Example 7

An embodiment of the present invention further provides a storage medium, where at least one instruction is stored in the storage medium, and the instruction is loaded and executed by a processor to implement the operations performed by the song recommendation method according to any one of embodiments 1 to 3. For example, the computer-readable storage medium may be a read-only memory (ROM), a random-access memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like. The modules or steps described above may be implemented in program code executable by a computing device, so that they are executed by the computing device; alternatively, they may be fabricated separately as individual integrated circuit modules, or a plurality of the modules or steps may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.

It should be noted that the above embodiments can be freely combined as necessary. The foregoing is only a preferred embodiment of the present invention. For those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A song recommendation method, comprising the steps of:

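finely classifying the target type songs;

acquiring a plurality of song files under each fine category, and converting the song files into a training spectrogram;

establishing a song classification model, and training the song classification model through the training spectrogram;

acquiring a current song selected by a user under a target type, inputting a spectrogram corresponding to the current song into the trained song classification model, and acquiring a current fine classification corresponding to the current song;

and recommending songs according to the current fine classification.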

2. The method of claim 1, wherein after the training of the song classification model by the training spectrogram, the method further comprises the steps of:

3. A song recommendation method according to claim 2, characterized by: comparing the feature vector corresponding to the current song with the feature vectors of all songs in the current fine classification through a cosine similarity algorithm;

and recommending the song with the maximum cosine similarity as a target song.
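
By way of non-limiting illustration, the cosine similarity comparison recited above may be sketched as follows. The function names `cosine_similarity` and `recommend_most_similar`, the dictionary layout of the feature vectors, and the exclusion of the current song from its own candidates are assumptions made for this sketch and are not specified by the claim.

```python
import math
from typing import Dict, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: a zero vector is treated as having no similarity.
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_most_similar(
    current_id: str,
    current_vector: Sequence[float],
    category_vectors: Dict[str, Sequence[float]],
) -> Optional[str]:
    """Return the id of the song in the current fine classification whose
    feature vector has the maximum cosine similarity to the current song."""
    best_id: Optional[str] = None
    best_score = -2.0  # below the minimum possible cosine value of -1
    for song_id, vector in category_vectors.items():
        if song_id == current_id:
            # Assumption: the current song is not recommended to itself.
            continue
        score = cosine_similarity(current_vector, vector)
        if score > best_score:
            best_id, best_score = song_id, score
    return best_id
```

In this sketch, `category_vectors` would hold the feature vectors of all songs under the current fine classification, and the returned identifier corresponds to the target song.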

4. The method according to claim 2, wherein the obtaining a plurality of song files under each fine category and converting the song files into a training spectrogram specifically comprises:

extracting audio of a preset time period in the song file;

converting the audio of the preset time period into a complete spectrogram;

extracting the audio of the current song in a preset time period;

5. The song recommendation method according to claim 4, wherein the extracting feature vectors of all songs under each fine category through the song similarity model specifically comprises:

calculating an average vector of each song under each fine category;

calculating an average vector corresponding to the current song;
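
By way of non-limiting illustration, the average vector calculation may be sketched as follows. The claim text here does not state which vectors are averaged; this sketch assumes that each song yields several feature vectors of equal length (for example, one per spectrogram segment) and that the average is taken element-wise. The function name `average_vector` is hypothetical.

```python
from typing import List, Sequence


def average_vector(segment_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of several equal-length feature vectors.

    Assumption: each input vector is the feature vector of one segment
    of the same song, and the mean represents the song as a whole.
    """
    if not segment_vectors:
        raise ValueError("at least one feature vector is required")
    dim = len(segment_vectors[0])
    if any(len(v) != dim for v in segment_vectors):
        raise ValueError("all feature vectors must have the same length")
    count = len(segment_vectors)
    return [sum(v[i] for v in segment_vectors) / count for i in range(dim)]
```

The same function can be applied both to each song under a fine category and to the current song, so that the resulting vectors are directly comparable.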

6. A song recommendation system, comprising:

7. The song recommendation system of claim 6, further comprising:

8. The song recommendation system of claim 7, wherein the comparison module compares the feature vector corresponding to the current song with the feature vectors of all the songs under the current fine classification through a cosine similarity algorithm, and takes the song with the maximum cosine similarity as a target song.

9. A smart device, comprising:

a memory for storing an operating program;

a processor for executing the operating program stored in the memory to implement the operations performed by the song recommendation method according to any one of claims 1 to 5.

10. A storage medium, characterized by: the storage medium has stored therein at least one instruction that is loaded and executed by a processor to perform operations performed by the song recommendation method of any one of claims 1-5.