CN111552778B

CN111552778B - Audio resource management method, device, computer readable storage medium and equipment

Info

Publication number: CN111552778B
Application number: CN202010338886.6A
Authority: CN
Inventors: 牛闯
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2020-04-26
Filing date: 2020-04-26
Publication date: 2024-05-14
Anticipated expiration: 2040-04-26
Also published as: CN111552778A

Abstract

The disclosure relates to an audio resource management method, an audio resource management device, a storage medium, and audio resource management equipment, and belongs to the field of computer applications. The method comprises the following steps: determining a target resource set according to the resource names of the audio resources in an audio resource library, wherein the target resource set comprises at least one audio resource whose resource name is not repeated; allocating a resource library identifier to each audio resource included in the target resource set; for any audio resource to be categorized in the audio resource library that has not been allocated a resource library identifier, searching the target resource set for a specified audio resource that matches the audio resource to be categorized, wherein the resource name of the specified audio resource matches that of the audio resource to be categorized, and the specified audio resource has a resource library identifier; obtaining the similarity between the audio resource to be categorized and the specified audio resource; and, if the similarity exceeds a target threshold, establishing a correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource. The present disclosure is capable of efficiently managing audio resources.

Description

Audio resource management method, device, computer readable storage medium and equipment

Technical Field

The present disclosure relates to the field of computer applications, and in particular, to a method and apparatus for audio resource management, a computer readable storage medium, and a device.

Background

With the rapid development of material civilization, the public's pursuit of cultural and spiritual life keeps growing, and numerous resource sharing platforms have emerged on the market; short video platforms are one of them.

As short video platforms have rapidly become popular and the short video industry has flourished, music, as a form of multimedia resource, has become an important element of short videos. For example, when shooting short videos, users have grown accustomed to selecting a song as the soundtrack from a song library provided by the short video platform.

Different users' needs for songs differ considerably. For example, some users like the original version of a song, some like a cover version, and some prefer to use the soundtrack of another short video. The resulting problem is that the same song may exist in the song library in multiple versions and from multiple sources. How to effectively manage the songs in a song library so as to better serve the video soundtrack service has therefore become a problem to be solved by those skilled in the art.

Disclosure of Invention

The present disclosure provides an audio resource management method, apparatus, computer-readable storage medium, and device, capable of effectively managing songs in a song library so as to better serve video soundtrack services. The technical scheme of the present disclosure is as follows:

According to a first aspect of embodiments of the present disclosure, there is provided an audio resource management method, the method including:

determining a target resource set according to the resource names of the audio resources in an audio resource library, wherein the target resource set comprises at least one audio resource whose resource name is not repeated;

allocating a resource library identifier to each audio resource included in the target resource set;

for any audio resource to be categorized in the audio resource library that has not been allocated a resource library identifier, searching the target resource set for a specified audio resource that matches the audio resource to be categorized, wherein the resource name of the specified audio resource matches that of the audio resource to be categorized, and the specified audio resource has a resource library identifier;

obtaining the similarity between the audio resource to be categorized and the specified audio resource;

and if the similarity exceeds a target threshold, establishing a correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource.

In a possible implementation manner, the obtaining the similarity between the audio resource to be categorized and the specified audio resource includes:

acquiring a first text of the audio resource to be categorized and a second text of the specified audio resource;

and calculating the text similarity between the first text and the second text to obtain the similarity between the audio resource to be categorized and the specified audio resource.

In a possible implementation manner, after the obtaining the similarity between the audio resource to be categorized and the specified audio resource, the method further includes:

if the similarity does not exceed the target threshold, allocating a new resource library identifier to the audio resource to be categorized.

In one possible implementation, the calculating the text similarity between the first text and the second text includes:

vectorizing the first text to obtain a word vector for each word in the first text;

vectorizing the second text to obtain a word vector for each word in the second text;

obtaining the distance between the word vector of the ith word in the first text and the word vector of the jth word in the second text, wherein i and j are positive integers, i ranges from 1 to the total number of words in the first text, and j ranges from 1 to the total number of words in the second text;

and performing a weighted summation of the obtained distances to obtain the text similarity between the first text and the second text.
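As a rough illustration of the last two steps, the sketch below computes the distance between every pair of word vectors and then forms a weighted sum. The Euclidean distance, the uniform weights, and the mapping from summed distance to a similarity score are all assumptions made for this example; the disclosure does not fix any of them, and the function names are hypothetical.

```python
import math


def pairwise_distances(first_vectors, second_vectors):
    """Return d[i][j]: distance between word i of the first text and word j of the second.

    Euclidean distance is an assumption; the source only says "distance".
    """
    return [[math.dist(u, v) for v in second_vectors] for u in first_vectors]


def uniform_weights(n, m):
    """Hypothetical weighting: every (i, j) pair gets the same weight, summing to 1."""
    return [[1.0 / (n * m)] * m for _ in range(n)]


def weighted_distance_sum(distances, weights):
    """Weighted sum over all pairwise distances."""
    return sum(
        w * d
        for d_row, w_row in zip(distances, weights)
        for d, w in zip(d_row, w_row)
    )


def distance_to_similarity(distance):
    """Assumed mapping: a smaller distance gives a similarity closer to 1."""
    return 1.0 / (1.0 + distance)


# Toy 2-D word vectors (made up for illustration).
first = [(0.0, 0.0), (3.0, 4.0)]
second = [(0.0, 0.0)]
d = pairwise_distances(first, second)
total = weighted_distance_sum(d, uniform_weights(len(first), len(second)))
print(d, total, distance_to_similarity(total))
```

In practice the weights would more plausibly come from word frequencies or from a transport plan, as in Word Mover's Distance, rather than being uniform.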

In a possible implementation manner, after the correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource is established, the method further includes:

for any uploaded video resource, determining, in the audio resource library, a target audio resource that matches the soundtrack resource in the video resource;

and establishing a correspondence between the soundtrack resource and the resource library identifier of the target audio resource.

In one possible implementation, the determining, in the audio resource library, a target audio resource that matches the soundtrack resource in the video resource includes:

performing speech recognition on the soundtrack resource to obtain a third text of the soundtrack resource;

performing text matching between the third text and the text of each audio resource in the audio resource library that has been allocated a resource library identifier;

and determining, as the target audio resource, the audio resource in the audio resource library whose text similarity with the soundtrack resource exceeds the target threshold.

In one possible implementation, the determining, in the audio resource library, a target audio resource that matches the soundtrack resource in the video resource includes:

identifying a fourth text that appears in the video resource;

performing text matching between the fourth text and the text of each audio resource in the audio resource library that has been allocated a resource library identifier;

and determining, as the target audio resource, the audio resource in the audio resource library whose text similarity with the video resource exceeds the target threshold.

According to a second aspect of embodiments of the present disclosure, there is provided an audio resource management apparatus, the apparatus comprising:

a determining module configured to determine a target resource set according to resource names of audio resources in an audio resource library, the target resource set including at least one audio resource whose resource names are not repeated;

the allocation module is configured to allocate a resource library identifier for each audio resource included in the target resource set;

a searching module configured to, for any audio resource to be categorized in the audio resource library that has not been allocated a resource library identifier, search the target resource set for a specified audio resource that matches the audio resource to be categorized, wherein the resource name of the specified audio resource matches that of the audio resource to be categorized, and the specified audio resource has a resource library identifier;

an obtaining module configured to obtain the similarity between the audio resource to be categorized and the specified audio resource;

and an association module configured to establish a correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource if the similarity exceeds a target threshold.

In a possible implementation manner, the obtaining module is further configured to obtain a first text of the audio resource to be categorized and a second text of the specified audio resource, and to calculate the text similarity between the first text and the second text to obtain the similarity between the audio resource to be categorized and the specified audio resource.

In a possible implementation manner, the allocation module is further configured to allocate a new resource library identifier to the audio resource to be categorized if the similarity does not exceed the target threshold.

In a possible implementation manner, the obtaining module is further configured to: vectorize the first text to obtain a word vector for each word in the first text; vectorize the second text to obtain a word vector for each word in the second text; obtain the distance between the word vector of the ith word in the first text and the word vector of the jth word in the second text, wherein i and j are positive integers, i ranges from 1 to the total number of words in the first text, and j ranges from 1 to the total number of words in the second text; and perform a weighted summation of the obtained distances to obtain the text similarity between the first text and the second text.

In one possible implementation, the apparatus further includes:

a matching module configured to determine, for any uploaded video resource, a target audio resource in the audio resource library that matches the soundtrack resource in the video resource;

the association module is further configured to establish a correspondence between the soundtrack resource and the resource library identifier of the target audio resource.

In a possible implementation manner, the matching module is further configured to: perform speech recognition on the soundtrack resource to obtain a third text of the soundtrack resource; perform text matching between the third text and the text of each audio resource in the audio resource library that has been allocated a resource library identifier; and determine, as the target audio resource, the audio resource in the audio resource library whose text similarity with the soundtrack resource exceeds the target threshold.

In one possible implementation, the matching module is further configured to: identify a fourth text that appears in the video resource; perform text matching between the fourth text and the text of each audio resource in the audio resource library that has been allocated a resource library identifier; and determine, as the target audio resource, the audio resource in the audio resource library whose text similarity with the video resource exceeds the target threshold.

According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising:

A processor;

A memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the audio resource management method as described in the first aspect above.

According to a fourth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium, instructions in which, when executed by a processor of an electronic device, enable the electronic device to perform the audio resource management method as described in the first aspect above.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, instructions in which, when executed by a processor of an electronic device, enable the electronic device to perform the audio resource management method as described in the first aspect above.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

When managing audio resources in an audio resource library, the embodiment of the disclosure first determines a target resource set according to the resource names of the audio resources in the audio resource library, and allocates a resource library identifier to each audio resource included in the target resource set, wherein the target resource set comprises at least one audio resource whose resource name is not repeated. Then, for any audio resource to be categorized in the audio resource library that has not been allocated a resource library identifier, a specified audio resource that matches the audio resource to be categorized is searched for in the target resource set, wherein the resource name of the specified audio resource matches that of the audio resource to be categorized, and the specified audio resource has a resource library identifier. Next, the similarity between the audio resource to be categorized and the specified audio resource is obtained, and if the similarity exceeds a target threshold, a correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource is established. In other words, by comparing resource names and obtaining similarities, the embodiment of the disclosure can merge multiple versions of the same audio resource, that is, establish an association among multiple versions of the same song. This enables the resource sharing platform to manage and compile statistics on the audio resource library effectively, and thereby to better serve the video soundtrack service.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure and do not constitute an undue limitation on the disclosure.

Fig. 1 is a schematic diagram illustrating an implementation environment involved in an audio resource management method according to an exemplary embodiment.

Fig. 2 is a flow chart illustrating a method of audio resource management according to an exemplary embodiment.

Fig. 3 is a flowchart illustrating a method of audio resource management, according to an exemplary embodiment.

Fig. 4 is a schematic diagram illustrating an audio resource management procedure according to an exemplary embodiment.

Fig. 5 is a schematic diagram illustrating an audio resource management procedure according to an exemplary embodiment.

Fig. 6 is a schematic diagram illustrating an audio resource management procedure according to an exemplary embodiment.

Fig. 7 is a schematic diagram illustrating an audio resource management procedure according to an exemplary embodiment.

Fig. 8 is a block diagram illustrating an audio resource management device according to an exemplary embodiment.

Fig. 9 is a block diagram of an electronic device, according to an example embodiment.

Detailed Description

In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.

The user information referred to in the present disclosure may be information authorized by the user or sufficiently authorized by each party.

Before explaining embodiments of the present disclosure in detail, some terms and abbreviations referred to in the present disclosure are introduced.

Multimedia resource library: refers to a collection used for storing multimedia resources.

By way of example, taking the multimedia resource as an audio resource, the multimedia resource library may be a song library provided by a short video platform, where the songs stored in the song library are used as soundtracks for short videos, i.e., a user may select a song in the song library as background music for his or her short video.

The short video platform is used for recording and sharing users' work and daily life. On the short video platform, a user can record moments of his or her own life in short videos, and can also interact with fans in real time through live broadcast. The content of the short video platform can cover all aspects of life. Here, the user can find content he or she likes and people he or she is interested in, see a more real and interesting world, and let the world discover the real and interesting user. Illustratively, a short video may refer to a video whose duration is less than a certain duration (e.g., 60 s), which is not particularly limited by the embodiments of the present disclosure.

Audio resources: in the disclosed embodiments, the audio resource may refer to a song, i.e., music. Such as songs in a song library that may be provided for a short video platform.

Resource name: for identifying the audio resource. Taking a song as an example, the resource name is the song name.

Text matching: in NLP (Natural Language Processing), text matching techniques, typically in the form of text similarity calculation or text relevance calculation, play a core supporting role in certain application systems. That is, text matching is a core problem in natural language processing and can be applied to a large number of natural language processing tasks, such as search engines, intelligent question answering, knowledge retrieval, and information flow recommendation.

OCR (Optical Character Recognition): a technique for converting textual content on an image directly into editable text. That is, OCR is a process of determining the shapes of characters on an image by detecting dark and bright patterns, and then translating the recognized shapes into computer text by a character recognition method.

The following describes an implementation environment related to the audio resource management method provided by an embodiment of the present disclosure.

The audio resource management method provided by the embodiment of the application can be applied to a short video platform, where the short video platform may take the form of a server. Illustratively, referring to FIG. 1, the implementation environment may include: a terminal 101, a short video platform 102, and a multimedia resource library 103.

In one possible implementation, the short video platform 102 provides the multimedia resource library 103 for the user. Taking the multimedia resource library 103 being a song library as an example, the songs stored in the song library are used as soundtracks for users' short videos, i.e., users can select songs in the song library as background music for their own short videos. The terminal 101 is generally provided with a short video application, so that a user can conveniently shoot short videos, watch short videos shared by others, broadcast live, watch others' live broadcasts, and the like.

The short video platform 102 may be an independent physical server, or may be a server cluster or a distributed system formed by a plurality of physical servers, which is not limited in particular in the embodiment of the present application.

In the embodiment of the present application, the terminal 101 is typically a mobile terminal. As one example, mobile terminals include, but are not limited to: smart phones, tablet computers, electronic readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, and the like.

In addition, the terminal 101 and the short video platform 102 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited herein.

Based on the above-mentioned implementation environment, the embodiment of the disclosure provides an audio resource management method. Taking the management of a song library provided to users by a short video platform as an example, the embodiment of the disclosure can merge multiple versions of the same song by comparing song names and determining the similarity between songs, and can also merge the original sound of short videos that use the same song, thereby establishing an association among the multiple versions and multiple sources of the same song and facilitating management of the song library by the short video platform. In other words, the embodiment of the disclosure groups the cover versions, the original version, and the various sources (such as short video original sound) of the same song under one song library identifier, so that the short video platform can effectively manage and compile statistics on the songs in the song library and can better serve the short video soundtrack service.

Fig. 2 is a flowchart illustrating an audio resource management method according to an exemplary embodiment. As shown in fig. 2, the method is used in the short video platform shown in fig. 1 and includes the following steps.

In step 201, a target set of resources is determined from the resource names of the audio resources in the audio resource library, the target set of resources comprising at least one audio resource whose resource names are not repeated.

In step 202, a repository identification is assigned to each audio resource included in the target set of resources.

In step 203, for any audio resource to be categorized that is not assigned with a resource library identifier in the audio resource library, a specified audio resource that matches the audio resource to be categorized is found in the target resource set, the specified audio resource matches a resource name of the audio resource to be categorized, and the specified audio resource has the resource library identifier.

In step 204, a similarity between the audio resource to be categorized and the specified audio resource is obtained.

In step 205, if the similarity exceeds the target threshold, a correspondence between the audio resources to be categorized and the resource library identifier of the specified audio resources is established.

When managing audio resources in an audio resource library, the method provided by the embodiment of the disclosure first determines a target resource set according to the resource names of the audio resources in the audio resource library, and allocates a resource library identifier to each audio resource included in the target resource set, wherein the target resource set comprises at least one audio resource whose resource name is not repeated. Then, for any audio resource to be categorized in the audio resource library that has not been allocated a resource library identifier, a specified audio resource that matches the audio resource to be categorized is searched for in the target resource set, wherein the resource name of the specified audio resource matches that of the audio resource to be categorized, and the specified audio resource has a resource library identifier. Next, the similarity between the audio resource to be categorized and the specified audio resource is obtained; if the similarity exceeds a target threshold, a correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource is established. In other words, by comparing resource names and obtaining similarities, the embodiment of the disclosure can merge multiple versions of the same audio resource, that is, establish an association among multiple versions of the same song. This enables the resource sharing platform to manage and compile statistics on the audio resource library effectively, and thereby to better serve the video soundtrack service.
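Steps 201 to 205 can be sketched end to end as follows. This is a minimal sketch under several assumptions that the disclosure leaves open: resource names are considered to match only when they are exactly equal, the first resource seen under each name enters the target resource set, identifiers are consecutive integers, and the similarity function is a simple word-set (Jaccard) overlap standing in for a real lyric comparison. The function and field names are hypothetical. The final branch, which allocates a new identifier when the similarity does not exceed the threshold, follows the optional implementation described elsewhere in the disclosure.

```python
from itertools import count


def jaccard(a, b):
    """Stand-in similarity (an assumption): overlap of the two texts' word sets."""
    x, y = set(a.split()), set(b.split())
    return len(x & y) / len(x | y) if x | y else 1.0


def assign_library_ids(resources, similarity, threshold):
    """Map each resource id to a resource library identifier.

    resources: list of dicts with hypothetical keys 'id', 'name', 'text'.
    """
    ids = count(1)
    library_id = {}
    target_set = {}  # resource name -> the resource chosen for the target set

    # Steps 201-202: one resource per distinct name gets an identifier.
    for res in resources:
        if res["name"] not in target_set:
            target_set[res["name"]] = res
            library_id[res["id"]] = next(ids)

    # Steps 203-205: categorize every resource that has no identifier yet.
    for res in resources:
        if res["id"] in library_id:
            continue
        specified = target_set[res["name"]]  # name match (exact, by assumption)
        if similarity(res["text"], specified["text"]) > threshold:
            library_id[res["id"]] = library_id[specified["id"]]
        else:
            # Same name but different content: allocate a new identifier.
            library_id[res["id"]] = next(ids)
    return library_id


songs = [
    {"id": 1, "name": "BB", "text": "la la love"},
    {"id": 2, "name": "Disco", "text": "dance all night"},
    {"id": 3, "name": "BB", "text": "la la love"},
    {"id": 4, "name": "BB", "text": "totally other words"},
]
print(assign_library_ids(songs, jaccard, threshold=0.8))
```

With this toy data, resources 1 and 3 end up under the same identifier, while resource 4 shares the name "BB" but receives its own identifier because its text differs.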

Illustratively, when the similarity comparison is performed, the embodiment of the disclosure performs text similarity comparison, namely, compares the similarity between lyrics. In addition to text similarity comparison, similarity comparison such as melody, rhythm, or spectral feature between audio resources may be performed, which is not particularly limited in the embodiments of the present disclosure. For example, any feature information that can be used to compare the similarity between two audio resources can be used in the present disclosure.

In this implementation, if the text similarity between the audio resource to be categorized and the specified audio resource does not exceed the target threshold, the two are determined to be different audio resources that merely share the same resource name, and a new resource library identifier is allocated to the audio resource to be categorized, thereby achieving effective management of the audio resources in the audio resource library.

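In one possible implementation, calculating the text similarity between the first text and the second text includes: vectorizing the first text to obtain a word vector for each word in the first text; vectorizing the second text to obtain a word vector for each word in the second text; obtaining the distance between the word vector of the ith word in the first text and the word vector of the jth word in the second text; and performing a weighted summation of the obtained distances to obtain the text similarity between the first text and the second text.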

For example, text matching may be performed by calculating the Word Mover's Distance (WMD), i.e., calculating the similarity between texts. Besides calculating the Word Mover's Distance, other text matching approaches may be employed, which are not specifically limited by the embodiments of the present disclosure.

This implementation calculates the similarity between texts accurately, which provides a guarantee for effective management of the audio resources in the audio resource library.
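Exact Word Mover's Distance requires solving an optimal transport problem. The sketch below instead computes the relaxed variant, a well-known lower bound in which each word of one text sends all of its weight to its nearest word in the other text. It is meant only to make the idea concrete and is not the disclosure's own procedure: the tiny embedding table is invented, Euclidean distance is assumed, and every word is assumed to be present in the table.

```python
import math
from collections import Counter


def relaxed_wmd(first_words, second_words, embeddings):
    """One-directional relaxed WMD from the first text to the second.

    Each distinct word in the first text carries weight equal to its
    normalized frequency and moves entirely to its nearest word in the
    second text. Assumes every word has an entry in `embeddings`.
    """
    counts = Counter(first_words)
    total = sum(counts.values())
    targets = set(second_words)
    cost = 0.0
    for word, n in counts.items():
        nearest = min(math.dist(embeddings[word], embeddings[t]) for t in targets)
        cost += (n / total) * nearest
    return cost


def symmetric_relaxed_wmd(a, b, embeddings):
    """Tighter lower bound: the larger of the two one-directional values."""
    return max(relaxed_wmd(a, b, embeddings), relaxed_wmd(b, a, embeddings))


# Invented 2-D embeddings, for illustration only.
emb = {"rain": (0.0, 0.0), "storm": (0.0, 1.0), "sun": (3.0, 4.0)}
print(symmetric_relaxed_wmd(["rain", "sun"], ["rain"], emb))
print(symmetric_relaxed_wmd(["rain", "storm"], ["rain", "storm"], emb))
```

A smaller distance indicates more similar lyrics, so a threshold test on similarity would be applied after converting the distance into a similarity score, for instance with a decreasing function of the distance.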

The embodiment of the disclosure also supports managing the soundtracks that appear in video resources uploaded by users. That is, after a resource library identifier has been allocated to each audio resource in the audio resource library, resource library identifiers are further allocated to such soundtracks on that basis, which widens the scope of audio resource management.

The target audio resource in the audio resource library that matches the soundtrack can be determined by performing speech recognition on the soundtrack and performing text matching between the recognized text and the text of each audio resource in the audio resource library that has been allocated a resource library identifier.


In addition, the target audio resource in the audio resource library that matches the soundtrack can be determined by identifying the text appearing in the video resource and performing text matching between the identified text and the text of each audio resource in the audio resource library that has been allocated a resource library identifier.

The embodiment of the disclosure provides a plurality of ways for determining the target audio resource matched with the score, and enriches the implementation modes.
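Both ways reduce to the same matching step once a text has been extracted from the video, whether by speech recognition on the soundtrack or by recognizing on-screen text. The sketch below covers only that matching step and takes the extracted text as input. The similarity measure (a token-level `difflib` ratio), the choice to return the single best match above the threshold, and all names are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def text_similarity(a, b):
    """Stand-in similarity (an assumption): difflib ratio over word tokens."""
    return SequenceMatcher(None, a.split(), b.split(), autojunk=False).ratio()


def match_soundtrack(extracted_text, identified_resources, threshold):
    """Return the library identifier of the best match above the threshold.

    identified_resources: iterable of (library_id, lyrics) pairs for audio
    resources that already have a resource library identifier.
    Returns None when no resource exceeds the threshold.
    """
    best_id, best_score = None, threshold
    for library_id, lyrics in identified_resources:
        score = text_similarity(extracted_text, lyrics)
        if score > best_score:
            best_id, best_score = library_id, score
    return best_id


library = [
    (1, "we dance all night long"),
    (2, "rain falls on the quiet town"),
]
print(match_soundtrack("we dance all night", library, threshold=0.6))
print(match_soundtrack("completely unrelated words here", library, threshold=0.6))
```

When a match is found, the soundtrack would be recorded under the returned identifier; how an unmatched soundtrack is handled is not specified in this passage, so the sketch simply returns `None`.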

Any combination of the above-mentioned optional solutions may be adopted to form an optional embodiment of the present disclosure, which is not described herein in detail.

Fig. 3 is a flowchart of an audio resource management method according to an exemplary embodiment. As shown in fig. 3, the method is used in the short video platform shown in fig. 1. The description takes as an example the case where the audio resources are songs that the short video platform provides to users as video soundtracks. Accordingly, the aforementioned audio resource library is the song library provided to users by the short video platform, the resource name is the song name, the resource set is a song set, the resource library identifier is a song library identifier, and the text is lyric information. The method comprises the following steps.

In step 301, a target resource set is determined according to the resource names of audio resources in an audio resource library, and a resource library identifier is respectively allocated to each audio resource included in the target resource set; wherein the target resource set includes at least one audio resource whose resource name is not repeated.

The step is to assign a song library Identification (ID) to songs whose song names are not repeated in the song library.

As shown in fig. 4, each song in the song library has at least: a song identifier, such as identifiers 1001 to 1006 in fig. 4; a song title; and singer information. Building on fig. 4, in order to manage the different versions or different sources of each song in the song library, the embodiment of the disclosure further assigns a song library identifier to each song in the song library. As shown in fig. 5, identifiers 1 to 3 are song library identifiers.

In the embodiment of the disclosure, in order to allocate a song library identifier to each song in the song library, a target song set whose song names are not repeated is first selected from the song library. As one example, song names are considered not repeated when they are completely different, such as the song name "AAAA" and the song name "BB" in FIG. 4. That is, songs with completely different song names are filtered out of the song library to form the target song set. Thereafter, each song included in the target song set is assigned a song library identifier.

Taking fig. 4 as an example, three songs may be selected, namely AAAA, BB (song identifier 1002), and Disco, and a unique song library identifier is allocated to each of the three songs; see song library identifier 1, song library identifier 2, and song library identifier 3 shown in fig. 5.
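The selection in this example can be reproduced with a small sketch. It assumes that two titles count as repeated when one contains the other, ignoring case, which is a crude heuristic chosen for illustration only. It also assumes that the song "Disco" carries song identifier 1005 (the text does not state its identifier) and that song library identifiers are handed out in selection order.

```python
def titles_overlap(a, b):
    """Crude heuristic (an assumption): one title contains the other, ignoring case."""
    a, b = a.lower(), b.lower()
    return a in b or b in a


def select_target_set(songs):
    """Pick songs whose titles do not overlap with any already selected title.

    songs: list of (song_id, title). Returns {song_id: song_library_id}.
    """
    selected = []
    for song_id, title in songs:
        if not any(titles_overlap(title, t) for _, t in selected):
            selected.append((song_id, title))
    return {song_id: lib_id for lib_id, (song_id, _) in enumerate(selected, start=1)}


FIG4_SONGS = [
    (1001, "AAAA"),
    (1002, "BB"),
    (1003, "AAAA (cover MChl)"),
    (1004, "AAAA DJ version"),
    (1005, "Disco"),  # identifier 1005 is assumed, not stated in the text
    (1006, "BB"),
]
print(select_target_set(FIG4_SONGS))
```

The remaining songs (1003, 1004, and 1006 here) are the ones left to be categorized in the next step.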

In step 302, for any audio resource to be categorized that is not assigned with a resource library identifier in the audio resource library, a specified audio resource that matches the audio resource to be categorized is found in the target resource set, where the specified audio resource matches a resource name of the audio resource to be categorized, and the specified audio resource has the resource library identifier.

For the remaining songs that were not assigned a song library identifier in step 301, the manner provided by this step may be used to obtain the corresponding song library identifier. That is, for any song a among the remaining songs, a song b that has a matching song name and has already been assigned a song library identifier is first found in the target song set. Here, song a is the song to be categorized and song b is the designated song that matches the song to be categorized.

It should be noted that a song name match may mean that the song names are completely identical; for example, the song "BB" with song identifier 1002 and the song "BB" with song identifier 1006 in fig. 4 correspond to this case. A song name match may also mean that the song names substantially overlap; for example, the song "AAAA" with song identifier 1001, the song "AAAA (cover MChl)" with song identifier 1003, and the song "AAAA DJ version" with song identifier 1004 in fig. 4 correspond to this case. In addition, the song names in fig. 5 and fig. 8 refer to the identical parts of the song names corresponding to different versions of the same song.
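By way of example only, the name matching of step 302 might be sketched as below. The function names and the rule that "substantial overlap" means one name is a prefix of the other are assumptions of this sketch, not part of the disclosure.

```python
def name_match_kind(candidate: str, designated: str):
    """Return "exact", "overlap", or None for two song names.

    The prefix test standing in for "substantial overlap" is an assumption.
    """
    c = candidate.strip().casefold()
    d = designated.strip().casefold()
    if c == d:
        return "exact"
    if c.startswith(d) or d.startswith(c):
        return "overlap"
    return None


def find_designated(candidate_name: str, target_set: dict):
    """Find the library ID of the designated song for a song to be categorized.

    `target_set` maps song library identifier -> song name.
    An exact match is preferred; otherwise the first overlapping name is used.
    Returns None when no name in the target set matches.
    """
    first_overlap = None
    for library_id, name in target_set.items():
        kind = name_match_kind(candidate_name, name)
        if kind == "exact":
            return library_id
        if kind == "overlap" and first_overlap is None:
            first_overlap = library_id
    return first_overlap


target_set = {1: "AAAA", 2: "BB", 3: "Disco"}
```

For instance, `find_designated("AAAA (cover MChl)", target_set)` returns the identifier of "AAAA", while a name absent from the target set yields `None`.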

In step 303, a first text of the audio resource to be categorized and a second text of the specified audio resource are obtained, and a text similarity between the first text and the second text is calculated, so as to obtain a similarity between the audio resource to be categorized and the specified audio resource.

This step compares the lyric information of the song to be categorized (referred to herein as the first text) with the lyric information of the designated song (referred to herein as the second text); that is, text matching is performed between the two. In one possible implementation, calculating the text similarity between the first text and the second text includes, but is not limited to, the following:

3031. vectorizing the first text to obtain word vectors of words in the first text; and vectorizing the second text to obtain word vectors of the words in the second text.

As one example, vectorizing the lyric information includes, but is not limited to: converting the lyric information in text form into feature vectors by word embedding; or converting the lyric information in text form into feature vectors by using a BERT (Bidirectional Encoder Representations from Transformers) model; or converting the lyric information in text form into feature vectors by using a CNN (Convolutional Neural Network) model. The embodiments of the present disclosure do not specifically limit this.

3032. Obtaining the distance between the word vector of the ith word in the first text and the word vector of the jth word in the second text.

The values of i and j are positive integers, the value range of i is 1 to the total number of words included in the first text, and the value range of j is 1 to the total number of words included in the second text.

For example, text matching may be performed by calculating the word mover's distance (WMD), that is, by calculating the similarity between texts. Besides calculating the WMD, other text matching methods may also be employed, which is not specifically limited by the embodiments of the present disclosure.

In the word vector space, the WMD can be understood as the minimum total cost required to transform one text into another, obtained as a weighted sum of the word-to-word movement costs between the two texts. Illustratively, the movement cost from one word to another may be measured by the Euclidean distance between their word vectors; that is, the distance between word vectors may be a Euclidean distance. In other words, the WMD reflects the similarity between texts: the text distance is modeled as a combination of the semantic distances between the words of the two texts. For example, the Euclidean distance c(i, j) is calculated between the word vectors of any word i in one text and any word j in the other, and these distances are then weighted and summed, which is how the WMD is calculated.

In one possible implementation, the distance between the word vector of word i and the word vector of word j is $c(i, j) = \lVert x_i - x_j \rVert_2$, where $x_i$ and $x_j$ denote the word vectors of word i and word j, and c(i, j) may be regarded as the cost of transferring from word i to word j.
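As an illustrative sketch, the pairwise costs c(i, j) can be collected into a matrix using the Euclidean distance. The function name `cost_matrix` and the representation of word vectors as plain tuples of floats are assumptions of this sketch.

```python
import math


def cost_matrix(first_vectors, second_vectors):
    """Return C where C[i][j] is the Euclidean distance between the
    i-th word vector of the first text and the j-th of the second text."""
    return [[math.dist(x, y) for y in second_vectors] for x in first_vectors]


# Two toy 2-dimensional word vectors against one.
costs = cost_matrix([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0)])
```

The rows of the returned matrix correspond to words of the first text and the columns to words of the second text.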

3033. Performing weighted summation on the obtained distances to obtain the text similarity between the first text and the second text.

When the first text and the second text are vectorized, each word is assigned a weight, so the matching can be regarded as a transportation problem. Take d and d' as the first text and the second text, respectively. The weighting matrix T is a sparse matrix, where $T_{ij} > 0$ represents the proportion of word i in d that is transferred to word j in d'. There are two constraints:

$$\sum_{j} T_{ij} = d_i, \qquad \sum_{i} T_{ij} = d'_j,$$

where $d_i$ and $d'_j$ denote the weights of word i in d and of word j in d'.

This transportation problem can be written in the following form:

$$\min_{T \ge 0} \sum_{i, j} T_{ij} \, c(i, j)$$

The sum represents the total cost of converting the first text into the second text. Taking the minimum of this cost gives the shortest total distance from all the words in the first text to the words in the second text, and this value reflects the similarity between the two texts.
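For illustration only, the transportation problem can be solved for short texts with a small min-cost-flow routine. The sketch below assumes uniform word weights (1/n for each of the n words of the first text, 1/m for each of the m words of the second) and scales them to integer capacities; the function names and the mapping from distance to a similarity score in `wmd_similarity` are assumptions, since the disclosure does not fix them.

```python
import math


def word_movers_distance(first_vectors, second_vectors):
    """Minimum total transport cost between two lists of word vectors."""
    n, m = len(first_vectors), len(second_vectors)
    if n == 0 or m == 0:
        raise ValueError("both texts need at least one word vector")
    source, sink = 0, n + m + 1
    graph = [[] for _ in range(n + m + 2)]  # node -> indices into `edges`
    edges = []  # [to, residual capacity, cost]; edge e and e ^ 1 are paired

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges)); edges.append([v, cap, cost])
        graph[v].append(len(edges)); edges.append([u, 0, -cost])

    for i, x in enumerate(first_vectors, start=1):
        add_edge(source, i, m, 0.0)                    # weight 1/n scaled by n*m
        for j, y in enumerate(second_vectors, start=n + 1):
            add_edge(i, j, n * m, math.dist(x, y))     # cost c(i, j)
    for j in range(n + 1, n + m + 1):
        add_edge(j, sink, n, 0.0)                      # weight 1/m scaled by n*m

    total_cost = 0.0
    while True:
        # Bellman-Ford: cheapest augmenting path in the residual graph.
        dist = [math.inf] * len(graph)
        prev_edge = [-1] * len(graph)
        dist[source] = 0.0
        for _ in range(len(graph) - 1):
            updated = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for e in graph[u]:
                    v, cap, cost = edges[e]
                    if cap > 0 and dist[u] + cost < dist[v] - 1e-12:
                        dist[v], prev_edge[v], updated = dist[u] + cost, e, True
            if not updated:
                break
        if dist[sink] == math.inf:
            break  # all weight has been transported
        flow, v = math.inf, sink
        while v != source:                             # bottleneck capacity
            flow = min(flow, edges[prev_edge[v]][1])
            v = edges[prev_edge[v] ^ 1][0]
        v = sink
        while v != source:                             # push flow along the path
            e = prev_edge[v]
            edges[e][1] -= flow
            edges[e ^ 1][1] += flow
            v = edges[e ^ 1][0]
        total_cost += flow * dist[sink]
    return total_cost / (n * m)


def wmd_similarity(first_vectors, second_vectors):
    """Hypothetical mapping: smaller distance gives a score closer to 1."""
    return 1.0 / (1.0 + word_movers_distance(first_vectors, second_vectors))
```

Because a smaller distance indicates more similar texts, a score of this kind is one way to compare the result against a threshold that must be exceeded; a production system would more likely rely on a dedicated optimal transport solver.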

In step 304, if the text similarity between the audio resource to be categorized and the specified audio resource exceeds the target threshold, a correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource is established.

In the embodiment of the disclosure, if the lyric similarity between the song to be categorized and the designated song exceeds the target threshold, the song to be categorized and the designated song are regarded as two different versions of the same song, and the song to be categorized is also categorized under the song library identifier of the designated song, namely, the corresponding relation between the audio resource to be categorized and the resource library identifier of the designated audio resource is established.

Besides the lyric similarity comparison, similarity comparisons of melody, rhythm, spectral features, or the like may also be performed between songs, which is not specifically limited in the embodiments of the present disclosure.

In step 305, if the text similarity between the audio resource to be categorized and the specified audio resource does not exceed the target threshold, a new resource library identifier is allocated to the audio resource to be categorized.

In the embodiment of the disclosure, if the lyric similarity between the song to be categorized and the designated song does not exceed the target threshold, it is determined that the song to be categorized and the designated song are two different songs with the same song name, and a new resource library identifier is allocated to the audio resource to be categorized.
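The decision of steps 304 and 305 might be sketched as follows. The function name `categorize`, the default threshold of 0.8, and the rule that a new identifier is one greater than the largest existing identifier are all assumptions made for illustration.

```python
def categorize(song_id, similarity, designated_library_id, library_ids,
               threshold=0.8):
    """Assign a song library identifier to one song to be categorized.

    `library_ids` maps song_id -> song library identifier and is updated
    in place. The threshold value is hypothetical.
    """
    if similarity > threshold:
        # Step 304: same song, different version; reuse the designated ID.
        library_ids[song_id] = designated_library_id
    else:
        # Step 305: same name but a different song; allocate a new ID.
        library_ids[song_id] = max(library_ids.values(), default=0) + 1
    return library_ids[song_id]


library_ids = {1001: 1, 1002: 2, 1005: 3}
same_song = categorize(1003, 0.95, 1, library_ids)
different_song = categorize(1006, 0.10, 2, library_ids)
```

The two calls use made-up similarity values simply to exercise both branches; they do not reproduce the outcome shown in fig. 5.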

The first point to be noted is that the embodiments of the present disclosure may repeatedly perform steps 302 to 305 until the remaining songs in the song library have been traversed, so that a song library identifier is allocated to each song in the song library.

Taking fig. 4 and fig. 5 as an example, suppose the selected songs whose song names are not repeated are "AAAA", "BB" (song identifier 1002) and "Disco", and a unique song library identifier is allocated to each of the three songs (see song library identifier 1, song library identifier 2 and song library identifier 3 shown in fig. 5). The remaining songs in fig. 4 then include the song "AAAA (cover MChl)" with song identifier 1003, the song "AAAA DJ version" with song identifier 1004, and the song "BB" with song identifier 1006. The songs "AAAA (cover MChl)" and "AAAA DJ version" are assigned to song library identifier 1, and the song "BB" with song identifier 1006 is assigned to song library identifier 2, which yields fig. 5.

The second point to be noted is that, after each song in the song library has been assigned a song library identifier, the embodiment of the disclosure also classifies the soundtrack resources that appear in video resources uploaded by users under the assigned song library identifiers. That is, according to the correspondence between a short video original sound and a song in the song library, the short video original sound is merged under the song library identifier to which the corresponding song belongs; see step 306 below for details.

In step 306, for any video resource uploaded by a user, a target audio resource that matches the soundtrack resource in the video resource is determined in the audio resource library, and a correspondence between the soundtrack resource and the resource library identifier of the target audio resource is established.

In one possible implementation, the short video soundtrack is incorporated under the song library identifier to which the corresponding song belongs, including but not limited to the following two ways:

3061. The lyric information of the soundtrack played in the short video is identified through a voice recognition technology, the corresponding relation between the song in the song library and the short video original sound is determined according to the lyric information, and the short video original sound is combined under the song library identifier to which the corresponding song belongs according to the corresponding relation between the short video original sound and the song in the song library.

That is, speech recognition is performed on the soundtrack resource played in the video resource to obtain a third text of the soundtrack resource; text matching is performed between the third text and the text of each audio resource in the audio resource library that has been assigned a resource library identifier; and the audio resource in the audio resource library whose text similarity with the soundtrack resource exceeds the target threshold is determined as the target audio resource. Illustratively, common speech recognition methods include, but are not limited to, the following four: methods based on linguistics and acoustics, stochastic model methods, methods using artificial neural networks, and probabilistic grammar analysis.

As one example, embodiments of the present disclosure may employ a deep learning model for speech recognition of short video originals, which embodiments of the present disclosure are not particularly limited. In addition, text matching is performed between the third text and the text of each audio resource in the audio resource library, and the foregoing text matching manner may be adopted, which is not described herein.

In this approach, speech recognition is used to obtain the lyric information of the soundtrack played in the short video, and this lyric information is compared with the lyrics of each song in the song library to determine a target song, where the lyric similarity between the target song and the soundtrack played in the short video exceeds the target threshold. The song name of the target song is thereby determined to be consistent with that of the soundtrack played in the short video, and the soundtrack played in the short video is merged under the song library identifier to which the target song belongs.
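A minimal sketch of this matching is given below, taking the recognized third text as input. It uses `difflib.SequenceMatcher` from the Python standard library as a simple stand-in for the text matching described earlier; the function name `match_soundtrack`, the threshold of 0.8, the choice of the highest-scoring song, and the sample lyrics are assumptions of this sketch.

```python
from difflib import SequenceMatcher


def match_soundtrack(third_text, library_lyrics, threshold=0.8):
    """Return the song library identifier of the best-matching song, or None.

    `library_lyrics` maps song_id -> (library_id, lyrics) for songs that
    already have a song library identifier. Only scores above `threshold`
    qualify; among those, the highest score wins.
    """
    best_library_id, best_score = None, threshold
    for _song_id, (library_id, lyrics) in library_lyrics.items():
        # Character-level ratio in [0, 1]; a stand-in for the WMD-based score.
        score = SequenceMatcher(None, third_text, lyrics).ratio()
        if score > best_score:
            best_library_id, best_score = library_id, score
    return best_library_id


# Made-up lyrics for illustration only.
library_lyrics = {
    1001: (1, "la la la hello world"),
    1002: (2, "completely different words here"),
}
matched = match_soundtrack("la la la hello world", library_lyrics)
unmatched = match_soundtrack("zzz", library_lyrics)
```

When no song exceeds the threshold the function returns `None`, leaving it to the caller to decide how an unmatched original sound is handled.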

3062. Text information appearing in the short video is recognized through an OCR technology, the corresponding relation between the song in the song library and the short video original sound is determined according to the text information, and the short video original sound is combined under a song library identifier to which the corresponding song belongs according to the corresponding relation between the short video original sound and the song in the song library.

That is, text information that appears in the video resource (referred to herein as the fourth text) is recognized by OCR technology; text matching is performed between the fourth text and the text of each audio resource in the audio resource library that has been assigned a resource library identifier; and the audio resource in the audio resource library whose text similarity with the video resource exceeds the target threshold is determined as the target audio resource.

Illustratively, the frames of images in the video asset are processed using OCR techniques, including but not limited to: image preprocessing, feature extraction, character recognition and recognition post-processing. Wherein, the preprocessing can include graying, binarization, tilt detection and correction, line and word segmentation, smoothing, normalization, etc.; the recognition post-processing is to correct the recognition result according to the relation of the specific language context. Wherein feature extraction and character recognition may be performed based on a deep learning network, i.e., the deep learning model acts primarily as a feature extractor and classifier, to which embodiments of the present disclosure are not specifically limited.

Illustratively, the OCR processing may be performed on each frame in the video resource, or may be performed on a portion of the frames in the video resource, for example, the video resource is divided into N segments, and an image of a frame is taken from each segment to perform the OCR processing, which is not specifically limited in the embodiments of the present disclosure.
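The partial-frame option might be sketched as follows. The function name `sample_frame_indices` and the choice of the middle frame of each segment are assumptions; the disclosure only states that one frame is taken from each of the N segments.

```python
def sample_frame_indices(total_frames: int, n_segments: int):
    """Split a video into segments and pick one frame index per segment.

    Returns zero-based frame indices, one from the middle of each segment.
    If there are fewer frames than segments, every frame is returned.
    """
    if total_frames <= 0 or n_segments <= 0:
        return []
    n = min(n_segments, total_frames)
    indices = []
    for k in range(n):
        start = (k * total_frames) // n          # first frame of segment k
        end = ((k + 1) * total_frames) // n      # one past the last frame
        indices.append(start + (end - start) // 2)
    return indices


frames_to_ocr = sample_frame_indices(100, 4)
```

Only the selected frames would then be passed to the OCR pipeline, which reduces the processing cost compared with recognizing every frame.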

In this approach, OCR technology is used to recognize the text information that appears in the short video, and this text information is compared with the lyrics of each song in the song library to determine a target song, where the text similarity between the lyrics of the target song and the text that appears in the short video exceeds the target threshold. The song name of the target song is thereby determined to be consistent with that of the soundtrack played in the short video, and the soundtrack played in the short video is merged under the song library identifier to which the target song belongs.

For example, four short video original sounds are shown in fig. 6. After the processing of step 306 above, the original sound "AAAA" with sound ID 101 and the original sound "AAAA" with sound ID 103 are merged under song library identifier 1 in fig. 5, and the original sound "BB adapted version" with sound ID 102 and the original sound "ddd work sound" with sound ID 104 are merged under song library identifier 2 in fig. 5, so as to form the correspondence shown in FIG. 8.

The method provided by the embodiment of the disclosure has at least the following beneficial effects:

The embodiment of the disclosure provides an audio resource management method. Taking the management of the song library that a short video platform provides to users as an example, the embodiment can merge multiple cover versions and the original version of the same song by comparing song names and determining similarity (for example, by calculating the text similarity between lyrics); the short video original sounds that use the song can also be merged under the same song library identifier. An association is thereby established among the multiple versions and multiple sources of the same song, which makes it easier for the short video platform to manage the song library. In other words, the embodiment gathers the cover versions, the original version, and the various sources (such as short video original sounds) of the same song under one song library identifier, so that the short video platform can effectively manage and count the songs in the song library and better serve the short video soundtrack service.

Fig. 8 is a block diagram illustrating an audio resource management device according to an exemplary embodiment. Referring to fig. 8, the apparatus includes a determining module 801, an allocation module 802, a searching module 803, an obtaining module 804, and an association module 805.

A determining module 801 configured to determine a target set of resources from resource names of audio resources in an audio resource library, the target set of resources comprising at least one audio resource whose resource names do not repeat;

An allocation module 802 configured to allocate a resource library identifier to each audio resource included in the target resource set;

A searching module 803, configured to search, for any audio resource to be categorized that is not allocated with a resource library identifier in the audio resource library, a specified audio resource that is matched with the audio resource to be categorized in the target resource set, where the specified audio resource is matched with a resource name of the audio resource to be categorized, and the specified audio resource has a resource library identifier;

An obtaining module 804, configured to obtain a similarity between the audio resource to be categorized and the specified audio resource;

And an association module 805 configured to establish a correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource if the similarity exceeds a target threshold.

When managing audio resources in an audio resource library, the device provided by the embodiment of the disclosure determines a target resource set according to the resource names of the audio resources in the audio resource library, and allocates a resource library identifier to each audio resource included in the target resource set, where the target resource set includes at least one audio resource whose resource name is not repeated. Then, for any audio resource to be categorized that has not been allocated a resource library identifier, the device searches the target resource set for a specified audio resource that matches the audio resource to be categorized, where the specified audio resource matches the resource name of the audio resource to be categorized and has a resource library identifier. Next, the device obtains the similarity between the audio resource to be categorized and the specified audio resource. If the similarity exceeds a target threshold, a correspondence between the audio resource to be categorized and the resource library identifier of the specified audio resource is established. That is, by comparing resource names and obtaining similarity, the embodiment of the disclosure can merge multiple versions of the same audio resource, establishing an association among the multiple versions of the same song. This makes it easier for the resource sharing platform to manage the audio resource library, enables effective management and statistics of the audio resource library, and allows the platform to better serve the video soundtrack service.

In one possible implementation, the obtaining module is further configured to obtain a first text of the audio resource to be categorized and a second text of the specified audio resource; and calculating the text similarity between the first text and the second text to obtain the similarity between the audio resource to be classified and the appointed audio resource.

In a possible implementation manner, the obtaining module is further configured to perform vectorization processing on the first text to obtain word vectors of words in the first text; vectorizing the second text to obtain word vectors of each word in the second text; obtaining the distance between the word vector of the ith word in the first text and the word vector of the jth word in the second text, wherein the values of i and j are positive integers, the value range of i is 1 to the total number of words included in the first text, and the value range of j is 1 to the total number of words included in the second text; and carrying out weighted summation on the obtained distance to obtain the text similarity between the first text and the second text.

In one possible implementation, the apparatus further includes:

A matching module configured to determine, for any uploaded video resource, a target audio resource in the audio resource library that matches the soundtrack resource in the video resource;

The association module is further configured to establish a correspondence between the soundtrack resource and the resource library identifier of the target audio resource.

The specific manner in which the various modules of the apparatus in the above embodiments perform their operations has been described in detail in connection with the method embodiments, and will not be described in detail here.

Fig. 9 shows a block diagram of an electronic device 900 provided by an exemplary embodiment of the present disclosure.

In general, the apparatus 900 includes: a processor 901 and a memory 902.

Processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 901 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). Processor 901 may also include a main processor and a coprocessor; the main processor, also referred to as a CPU (Central Processing Unit), is a processor for processing data in an awake state, and the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 901 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 901 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.

The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store at least one instruction for execution by processor 901 to implement the audio resource management methods provided by the method embodiments in the present disclosure.

In some embodiments, the apparatus 900 may further optionally include: a peripheral interface 903 and at least one peripheral device. The processor 901, the memory 902, and the peripheral interface 903 may be connected by a bus or signal line. Each peripheral device may be connected to the peripheral interface 903 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: a power supply 904.

The peripheral interface 903 may be used to connect at least one I/O (Input/Output) related peripheral device to the processor 901 and the memory 902. In some embodiments, the processor 901, the memory 902, and the peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902, and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.

The power supply 904 is used to power the various components in the device 900. The power source 904 may be alternating current, direct current, disposable battery, or rechargeable battery. When the power source 904 comprises a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.

Those skilled in the art will appreciate that the structure shown in fig. 9 is not limiting of the apparatus 900 and may include more or fewer components than shown, or may combine certain components, or may employ a different arrangement of components.

In an exemplary embodiment, a storage medium is also provided, such as a memory, comprising instructions executable by a processor of the device 900 to perform the above-described audio resource management method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

In an exemplary embodiment, a computer program product is also provided, the instructions in which, when executed by a processor of the electronic device 900, enable the electronic device 900 to perform the audio resource management method as in the method embodiments described above.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow the general principles of the disclosure and include such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. An audio resource management method, applied to a video soundtrack service, comprising:

screening out audio resources with non-repeated resource names in an audio resource library to obtain a target resource set;

Acquiring a first text of the audio resource to be classified and a second text of the appointed audio resource; vectorizing the first text to obtain word vectors of words in the first text; vectorizing the second text to obtain word vectors of each word in the second text; obtaining the distance between the word vector of the ith word in the first text and the word vector of the jth word in the second text, wherein the values of i and j are positive integers, the value range of i is 1 to the total number of words included in the first text, and the value range of j is 1 to the total number of words included in the second text; the distance is used to reflect the cost of transferring from the ith word to the jth word; carrying out weighted summation processing on the obtained distances to obtain the similarity between the audio resources to be classified and the appointed audio resources;

if the similarity exceeds a target threshold, establishing a corresponding relation between the audio resources to be classified and the resource library identifications of the designated audio resources;

If the similarity does not exceed a target threshold, a new resource library identifier is allocated to the audio resource to be classified;

for any uploaded video resource, determining a target audio resource matched with a soundtrack resource in the video resource in the audio resource library; establishing a corresponding relation between the soundtrack resource and a resource library identifier of the target audio resource; wherein the target audio resource is determined by speech recognition of the soundtrack resource or by recognition of text information present in the video resource.

2. The audio resource management method of claim 1, wherein said determining in said audio resource library a target audio resource that matches a soundtrack resource in said video resources comprises:

3. The audio resource management method of claim 1, wherein said determining in said audio resource library a target audio resource that matches a soundtrack resource in said video resources comprises:

identifying a fourth text that appears in the video asset;

4. An audio resource management apparatus for use in a video soundtrack service, the apparatus comprising:

The determining module is configured to screen out the audio resources with non-repeated resource names in the audio resource library to obtain a target resource set;

The acquisition module is configured to acquire a first text of the audio resource to be classified and a second text of the specified audio resource; vectorizing the first text to obtain word vectors of words in the first text; vectorizing the second text to obtain word vectors of each word in the second text; obtaining the distance between the word vector of the ith word in the first text and the word vector of the jth word in the second text, wherein the values of i and j are positive integers, the value range of i is 1 to the total number of words included in the first text, and the value range of j is 1 to the total number of words included in the second text; the distance is used to reflect the cost of transferring from the ith word to the jth word; carrying out weighted summation processing on the obtained distances to obtain the similarity between the audio resources to be classified and the appointed audio resources;

the association module is configured to establish a corresponding relation between the audio resources to be classified and the resource library identifications of the specified audio resources if the similarity exceeds a target threshold;

the allocation module is further configured to allocate a new resource library identifier to the audio resource to be classified if the similarity does not exceed a target threshold;

the association module is further configured to establish a correspondence between the soundtrack resource and the resource library identifier of the target audio resource; wherein the target audio resource is determined by speech recognition of the soundtrack resource or by recognition of text information present in the video resource.

5. The audio resource management device of claim 4, wherein the matching module is further configured to perform speech recognition on the soundtrack resource to obtain a third text of the soundtrack resource; perform text matching between the third text and the text of each audio resource in the audio resource library that has been assigned a resource library identifier; and determine the audio resource in the audio resource library whose text similarity with the soundtrack resource exceeds the target threshold as the target audio resource.

6. The audio resource management apparatus of claim 4, wherein the matching module is further configured to identify a fourth text that appears in the video resource; perform text matching between the fourth text and the text of each audio resource in the audio resource library that has been assigned a resource library identifier; and determine the audio resource in the audio resource library whose text similarity with the video resource exceeds the target threshold as the target audio resource.

7. An electronic device, comprising:

A processor;

A memory for storing the processor-executable instructions;

Wherein the processor is configured to execute the instructions to implement the audio resource management method of any of claims 1 to 3.

8. A computer readable storage medium, characterized in that instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the audio resource management method of any one of claims 1 to 3.