CN111125387A - Multimedia list generation and naming method and device, electronic equipment and storage medium

Info

Publication number: CN111125387A
Application number: CN201911274690.9A
Authority: CN
Inventors: 华磊; 刘权; 李锐; 陈志刚
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-05-08
Anticipated expiration: 2039-12-12
Also published as: CN111125387B

Abstract

An embodiment of the invention provides a multimedia list generation method, a multimedia list naming method, corresponding devices, electronic equipment and a storage medium. The list generation method comprises the following steps: determining a plurality of multimedia question-answer pairs; determining question-answer pair characteristics of each multimedia question-answer pair, wherein the question-answer pair characteristics represent type characteristics of the multimedia resources contained in the multimedia question-answer pair; and clustering the multimedia question-answer pairs based on their question-answer pair characteristics, and generating a multimedia list based on the clustering result. The method, device, electronic equipment and storage medium provided by the embodiment of the invention generate the multimedia list from multimedia question-answer pairs that contain rich multimedia resource information, which helps to achieve fine-grained division of multimedia resources and can cover a wide range of user requirements. Because the multimedia question-answer pairs are clustered without supervision based on their question-answer pair characteristics, no additional manual labeling is needed, which effectively reduces the labor required for multimedia list generation.

Description

Multimedia list generation and naming method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of natural language processing, in particular to a multimedia list generation and naming method, a multimedia list generation and naming device, electronic equipment and a storage medium.

Background

With the rapid development of internet technology, people can obtain massive multimedia resources through internet queries. However, finding content of interest among massive multimedia resources takes a great deal of time and effort in selection and screening.

At present, a multimedia list can be generated by manually labeling multimedia resources with type tags, so that different types of multimedia resources are offered and people can quickly find multimedia resources of the types they are interested in. However, manual labeling has high labor cost and low efficiency, and the type tags are too broad to meet fine-grained search requirements.

Disclosure of Invention

The embodiment of the invention provides a multimedia list generation and naming method, a multimedia list generation and naming device, electronic equipment and a storage medium, and aims to solve the problems of high labor cost and coarse classification granularity of the conventional multimedia list generation method.

In a first aspect, an embodiment of the present invention provides a multimedia list generating method, including:

determining a plurality of multimedia question-answer pairs;

determining question-answer pair characteristics of each multimedia question-answer pair, wherein the question-answer pair characteristics represent type characteristics of multimedia resources contained in the multimedia question-answer pairs;

and clustering each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair, and generating a multimedia list based on a clustering result.

Preferably, the determining the question-answer pair characteristics of each multimedia question-answer pair specifically includes:

determining the question text characteristics of any multimedia question-answer pair;

and/or, determining the answer text characteristics of any multimedia question-answer pair;

and determining the question-answer pair characteristics of any multimedia question-answer pair based on the question text characteristics and/or the answer text characteristics.

Preferably, the determining the question text feature of any multimedia question-answer pair specifically includes:

and determining semantic features of the question text of any multimedia question-answer pair as the question text features of the multimedia question-answer pair.

Preferably, the determining the answer text feature of any multimedia question-answer pair specifically includes:

determining each multimedia resource contained in the answer text of any multimedia question-answer pair;

determining related information of each multimedia resource;

and determining the answer text characteristics of any multimedia question-answer pair based on the relevant information of each multimedia resource.

Preferably, the determining a plurality of multimedia question-answer pairs specifically includes:

determining a plurality of candidate question-answer pairs;

inputting any candidate question-answer pair into an intention classification model to obtain an intention classification result output by the intention classification model; the intention classification model is obtained by training based on the sample question-answer pairs and the sample intention classification results thereof;

and if the intention classification result shows that the multimedia is related, determining that any candidate question-answer pair is the multimedia question-answer pair.

Preferably, the inputting any candidate question-answer pair into the intention classification model to obtain an intention classification result output by the intention classification model specifically includes:

inputting a word vector of each word in the question text of any candidate question-answer pair into a semantic coding layer of the intention classification model to obtain semantic features of the question text output by the semantic coding layer;

and inputting the semantic features into a classification output layer of the intention classification model to obtain the intention classification result output by the classification output layer.

Preferably, the clustering is performed on each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair, and a multimedia list is generated based on a clustering result, which specifically includes:

clustering each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair to obtain a clustering result;

and generating a multimedia list corresponding to any cluster based on each multimedia question-answer pair belonging to any cluster in the clustering result.

In a second aspect, an embodiment of the present invention provides a multimedia list naming method, including:

determining related texts of multimedia resources in a multimedia list; wherein the multimedia list is generated based on the multimedia list generation method as provided in the first aspect;

and generating a list name of the multimedia list based on the related text of the multimedia resources in the multimedia list.

Preferably, the generating a list name of the multimedia list based on the related text of the multimedia resource in the multimedia list specifically includes:

determining keywords of related texts of multimedia resources in the multimedia list;

and generating a list name of the multimedia list based on the keyword.

Preferably, the generating a list name of a multimedia list based on the keyword specifically includes:

inputting the keywords into a list name generation model to obtain the list names of the multimedia lists output by the list name generation model;

the list name generation model is obtained by training based on sample keywords and sample list names.

In a third aspect, an embodiment of the present invention provides a multimedia list generating apparatus, including:

a multimedia question-answer pair determining unit for determining a plurality of multimedia question-answer pairs;

a question-answer pair characteristic determining unit for determining the question-answer pair characteristics of each multimedia question-answer pair, wherein the question-answer pair characteristics represent type characteristics of the multimedia resources contained in the multimedia question-answer pair;

and the list generating unit is used for clustering each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair and generating a multimedia list based on a clustering result.

In a fourth aspect, an embodiment of the present invention provides a multimedia list naming apparatus, including:

the relevant text determining unit is used for determining the relevant text of the multimedia resources in the multimedia list; wherein the multimedia list is generated based on the multimedia list generation method as provided in the first aspect;

and the naming unit is used for generating the list name of the multimedia list based on the related text of the multimedia resources in the multimedia list.

In a fifth aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a bus, where the processor, the communication interface, and the memory communicate with one another through the bus, and the processor may call logic instructions in the memory to execute the steps of the method as provided in the first aspect or the second aspect.

In a sixth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the method as provided in the first or second aspect.

The multimedia list generation and naming method, device, electronic equipment and storage medium provided by the embodiment of the invention generate the multimedia list from multimedia question-answer pairs that contain rich multimedia resource information, which helps to achieve fine-grained division of multimedia resources and can cover a wide range of user requirements. Because the multimedia question-answer pairs are clustered without supervision based on their question-answer pair characteristics, no additional manual labeling is needed, which effectively reduces the labor required for multimedia list generation.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a multimedia list generating method according to an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a method for determining characteristics of question-answer pairs according to an embodiment of the present invention;

Fig. 3 is a schematic flowchart of an answer text feature determination method according to an embodiment of the present invention;

Fig. 4 is a schematic flowchart of a method for determining a multimedia question-answer pair according to an embodiment of the present invention;

Fig. 5 is a schematic flowchart of a method for determining an intention classification result according to an embodiment of the present invention;

Fig. 6 is a schematic flowchart of a multimedia list generating method according to another embodiment of the present invention;

Fig. 7 is a schematic flowchart of a multimedia list naming method according to an embodiment of the present invention;

Fig. 8 is a schematic flowchart of a list name generating method according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a multimedia list generating apparatus according to an embodiment of the present invention;

Fig. 10 is a schematic structural diagram of a multimedia list naming apparatus according to an embodiment of the present invention;

Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

With the rapid development of internet technology, people can obtain massive multimedia resources through internet queries. The emergence of various multimedia lists, such as movie lists, book lists, and song lists, provides people with different kinds of multimedia resources, so that they can quickly find multimedia resources of the types they are interested in.

At present, multimedia lists are usually generated manually. For example, a user can classify the books he or she has read to form different kinds of book lists, such as a biography book list, a travel-writing book list, a food book list, and the like. Alternatively, the type tags of multimedia resources can be labeled manually so that multimedia lists are generated automatically. However, manually labeled type tags are usually too broad, and the classification granularity is too coarse. For example, a film may be manually tagged as "romance", whereas romance films can in fact be subdivided into campus romance, family romance, wartime romance, and so on. With such coarse granularity, the automatically generated multimedia list can hardly meet fine-grained search requirements. In addition, manual labeling requires a great deal of labor and time and is too costly.

In view of the above, an embodiment of the present invention provides a multimedia list generation method, so as to implement automatic generation of a fine-grained and efficient multimedia list. Fig. 1 is a schematic flow chart of a multimedia list generating method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:

step 110, a plurality of multimedia question-answer pairs are determined.

Specifically, the multimedia question-answer pair is a question-answer pair text related to multimedia, the multimedia question-answer pair specifically includes a question text and answer texts corresponding to the question text, and one question text may correspond to one or more answer texts. The multimedia question-answer pairs herein may be for one or more multimedia fields, for example, when a movie list needs to be generated, step 110 correspondingly determines question-answer pairs for movies, when a song list needs to be generated, step 110 correspondingly determines question-answer pairs for songs, when a book list needs to be generated, step 110 correspondingly determines question-answer pairs for books, and when a list containing both movies and songs needs to be generated, step 110 correspondingly determines question-answer pairs for movies and/or question-answer pairs for songs.

Here, the multimedia question-answer pairs may be mined from forums, communities, or other types of question-answer platforms in the relevant fields on the internet. Multimedia question-answer pairs are naturally occurring, human-written data that is usually knowledge-rich and carries rich semantic information, which helps to achieve fine-grained division of multimedia resources and can cover many types of user requirements. Moreover, the multimedia question-answer pairs can be obtained directly, without additional manual labeling.

Step 120, determining the question-answer pair characteristics of each multimedia question-answer pair, wherein the question-answer pair characteristics represent the type characteristics of the multimedia resources contained in the multimedia question-answer pair.

Specifically, any multimedia question-answer pair includes multimedia resources and related descriptions of those resources. The question-answer pair characteristics of a multimedia question-answer pair represent the type characteristics of the multimedia resources it contains, that is, characteristics of the types to which the multimedia resources belong. These type characteristics may be derived from the multimedia resources themselves, for example characteristics corresponding to descriptions of the types of the multimedia resources that are collected through the internet or pre-stored in a database; or they may be characteristics corresponding to the related descriptions of the multimedia resources within the multimedia question-answer pair. A description here is natural-language text about a multimedia resource. It may be a comment posted or a tag set by users on the internet for the multimedia resource, or an adjective applied to the multimedia resource in the multimedia question-answer pair, and so on, which is not specifically limited in the embodiment of the present invention.

For example, in a multimedia question-answer pair, the question text is "which movies are funny" and the corresponding answer text is "Charlotte's Troubles and Baby Plan are both good". Here "Charlotte's Troubles" and "Baby Plan" are the two multimedia resources contained in the multimedia question-answer pair, and "funny" is a description of both. The question-answer pair characteristics of this pair may include the characteristic corresponding to "funny", and may also include characteristics corresponding to descriptions of the types of "Charlotte's Troubles" and "Baby Plan" acquired through the internet.

And step 130, clustering each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair, and generating a multimedia list based on a clustering result.

Specifically, based on the question-answer pair characteristics of each multimedia question-answer pair, each multimedia question-answer pair is clustered, and a clustering result can be obtained. Here, the clustering result is a plurality of clusters, each cluster includes a plurality of multimedia question-answer pairs with similar characteristics, and a multimedia list is generated for each cluster. Further, for any one cluster, a multimedia list corresponding to the cluster may be constructed based on the multimedia resources contained in each multimedia question-answer pair contained in the cluster.

For example, one cluster includes two multimedia question-answer pairs. The question text of multimedia question-answer pair 1 is "which funny movies can you recommend", and the corresponding answer text is "Life in Flight and Charlotte's Troubles, I found both very funny". The question text of multimedia question-answer pair 2 is "are there any good comedy movies recently", and the corresponding answer text is "there is Two Tigers". The multimedia list generated from this cluster therefore includes "Life in Flight", "Charlotte's Troubles" and "Two Tigers".
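Steps 110 to 130 can be illustrated with a minimal sketch. The embodiment does not name a clustering algorithm, so the greedy cosine-similarity clustering below is only an assumed stand-in for any unsupervised method; the function names, the threshold value, the two-dimensional feature vectors, and the toy titles are all hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster_pairs(features: List[Sequence[float]], threshold: float = 0.8) -> List[List[int]]:
    """Greedy single-pass clustering (assumed; not prescribed by the source).

    A pair joins the first cluster whose first member is similar enough,
    otherwise it starts a new cluster. Returns lists of pair indices.
    """
    clusters: List[List[int]] = []
    for i, feat in enumerate(features):
        for members in clusters:
            if cosine(feat, features[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def build_lists(clusters: List[List[int]], resources_per_pair: List[List[str]]) -> List[List[str]]:
    """One multimedia list per cluster: the de-duplicated resources of its pairs."""
    lists = []
    for members in clusters:
        merged: List[str] = []
        for i in members:
            for resource in resources_per_pair[i]:
                if resource not in merged:
                    merged.append(resource)
        lists.append(merged)
    return lists


# Toy data: hypothetical features and resources for three question-answer pairs.
features = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
resources = [["Life in Flight", "Two Tigers"], ["Two Tigers"], ["Some Documentary"]]
clusters = cluster_pairs(features)
multimedia_lists = build_lists(clusters, resources)
```

In this toy run the first two pairs have nearly parallel feature vectors and end up in one cluster, so their resources are merged into one list, while the third pair forms its own list.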

The method provided by the embodiment of the invention generates the multimedia list from multimedia question-answer pairs that contain rich multimedia resource information, which helps to achieve fine-grained division of multimedia resources and can cover a wide range of user requirements. Because the multimedia question-answer pairs are clustered without supervision based on their question-answer pair characteristics, no additional manual labeling is needed, which effectively reduces the labor required for multimedia list generation.

Based on the foregoing embodiment, fig. 2 is a schematic flow chart of a method for determining a question-answer pair characteristic according to an embodiment of the present invention, and as shown in fig. 2, step 120 specifically includes:

step 121, determining the question text characteristics of any multimedia question-answer pair.

Specifically, a multimedia question-answer pair includes a question text, and the question text generally contains related descriptions of multimedia resources. For example, in "what are some funny foreign movies", "funny" and "foreign" are related descriptions of movie-type multimedia resources.

The question text characteristics are obtained by feature extraction from the question text. They may be the word vector of each word segment in the question text, the semantic features of the question text, the features of the descriptions related to multimedia resources in the question text, and the like.

And/or, step 122, determining the answer text characteristics of any multimedia question-answer pair.

Specifically, a multimedia question-answer pair includes an answer text, which usually contains multimedia resources and may also contain related descriptions of them. For example, for the question text "what are some funny foreign movies", the answer text may be "Secret Agent Bean is very funny", where "Secret Agent Bean" is a multimedia resource and "funny" is a related description of that resource.

The answer text features are obtained by extracting the features of the answer text, and the answer text features may include type features of multimedia resources related to the answer text, where the type features of the multimedia resources may be features of relevant descriptions for the multimedia resources in the answer text, or features of various labels of the multimedia resources obtained through a multimedia resource database or other approaches.

And step 123, determining the question-answer pair characteristics of the multimedia question-answer pair based on the question text characteristics and/or the answer text characteristics.

Specifically, when only step 121 is executed and step 122 is not executed, the question text feature may be directly used as the question-answer pair feature of the multimedia question-answer pair. When only step 122 is performed and step 121 is not performed, the answer text feature may be directly used as the question-answer pair feature of the multimedia question-answer pair. When both step 121 and step 122 are executed, the question-answer pair features of the multimedia question-answer pair may be obtained by feature splicing of the question text features and the answer text features or by other feature fusion methods.

For example, let any multimedia question-answer pair be [Q, A], where Q denotes the question text and A denotes the answer text. If the question text characteristic is S(Q) and the answer text characteristic is S(A), the resulting question-answer pair characteristic may be denoted as S[Q, A] = [S(Q), S(A)].
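The three cases of step 123 (question features only, answer features only, or both concatenated) can be sketched as follows. The function name `qa_pair_feature` and the plain-list representation of feature vectors are assumptions made for illustration; other feature fusion methods mentioned in the text are not shown.

```python
from typing import List, Optional, Sequence


def qa_pair_feature(question_feat: Optional[Sequence[float]] = None,
                    answer_feat: Optional[Sequence[float]] = None) -> List[float]:
    """Return the pair feature: either feature alone, or the concatenation of both."""
    if question_feat is None and answer_feat is None:
        raise ValueError("at least one of the two features is required")
    combined: List[float] = []
    if question_feat is not None:
        combined.extend(question_feat)   # question text feature comes first
    if answer_feat is not None:
        combined.extend(answer_feat)     # answer text feature is appended
    return combined


# Hypothetical feature values.
both = qa_pair_feature([0.1, 0.2], [1.0, 0.0, 1.0])
question_only = qa_pair_feature(question_feat=[0.1, 0.2])
```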

The method provided by the embodiment of the invention refers to the text structure of the multimedia question-answer pair, and respectively extracts the characteristics from the question text and the answer text to obtain the characteristics of the question-answer pair of the multimedia question-answer pair, thereby realizing deep mining aiming at the multimedia question-answer pair, improving the clustering accuracy of the multimedia question-answer pair and realizing accurate multimedia list generation.

Based on any of the above embodiments, in the method, step 121 specifically includes: and determining semantic features of the question text of any multimedia question-answer pair as the question text features of the multimedia question-answer pair.

Specifically, the semantic features of the question text are the features corresponding to the semantics contained in the question text. They may be obtained from the hidden-layer features of each word in the question text; for example, the word vector of each word in the question text may be input into a neural network such as a recurrent neural network (RNN) or a long short-term memory network (LSTM) to obtain the semantic features of the question text, which is not specifically limited in the embodiment of the present invention.
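As a rough illustration of deriving a semantic feature from hidden states, the sketch below runs a plain tanh recurrent network over the word vectors of a question text and takes the last hidden state as the feature. It is a toy under stated assumptions: the weights are random rather than trained, the dimensions and word vectors are hypothetical, and using the final hidden state (rather than, say, pooling all hidden states) is one possible choice that the source does not specify.

```python
import math
import random
from typing import List, Sequence


def make_matrix(rows: int, cols: int, rng: random.Random) -> List[List[float]]:
    """Small random weights; a real model would learn these during training."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def rnn_semantic_feature(word_vectors: Sequence[Sequence[float]],
                         w_in: List[List[float]],
                         w_rec: List[List[float]]) -> List[float]:
    """Run a tanh RNN over the word vectors and return the last hidden state."""
    hidden = [0.0] * len(w_rec)
    for x in word_vectors:
        # New state depends on the current word vector and the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
    return hidden


rng = random.Random(0)
embed_dim, hidden_dim = 4, 3
w_in = make_matrix(hidden_dim, embed_dim, rng)    # input-to-hidden weights
w_rec = make_matrix(hidden_dim, hidden_dim, rng)  # hidden-to-hidden weights

# Hypothetical word vectors for a three-word question text.
question = [[0.1, 0.0, 0.3, 0.2], [0.0, 0.5, 0.1, 0.0], [0.4, 0.1, 0.0, 0.2]]
feature = rnn_semantic_feature(question, w_in, w_rec)
```

The resulting vector has one entry per hidden unit, each strictly between -1 and 1 because of the tanh activation.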

When the semantic features of the question text are used as the question text characteristics of the multimedia question-answer pairs and applied to clustering, the question texts of the multimedia question-answer pairs within any cluster of the clustering result have similar semantics, so that multimedia question-answer pairs with similar questions are merged. For example, one question text is "I want to learn a foreign language, please recommend a good fantasy movie" and another is "which foreign fantasy movies do you recommend". The semantics of the two question texts are similar, so clustering on the semantic features of the question texts places the corresponding multimedia question-answer pairs in the same cluster, and the multimedia resources in the answer texts of both pairs are listed in the same multimedia list.

The method provided by the embodiment of the invention takes the semantic features of the question text as the question text features of the multimedia question-answer pairs, can realize the clustering of the multimedia question-answer pairs with similar question text semantics, and further realizes the automatic generation of the multimedia list.

Based on any of the above embodiments, fig. 3 is a schematic flowchart of an answer text feature determination method provided by an embodiment of the present invention, as shown in fig. 3, in the method, step 122 specifically includes:

step 1221, determine each multimedia resource contained in the answer text of the multimedia question-answer pair.

Specifically, a multimedia database containing a large number of multimedia resources may be preset, and each multimedia resource contained in the answer text may be determined by matching the multimedia resource in the multimedia database with the answer text.
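A minimal sketch of this matching step, assuming the multimedia database is simply a collection of resource titles and that matching means case-insensitive substring search. The function name `find_resources` and the sample titles are hypothetical; a production system would likely need to handle aliases and overlapping titles.

```python
from typing import Iterable, List


def find_resources(answer_text: str, database: Iterable[str]) -> List[str]:
    """Return database titles that occur in the answer text, in order of appearance."""
    lowered = answer_text.lower()
    hits = []
    for title in database:
        position = lowered.find(title.lower())
        if position >= 0:
            hits.append((position, title))
    return [title for _, title in sorted(hits)]


# Hypothetical database and answer text.
database = ["Two Tigers", "Life in Flight", "Unrelated Film"]
answer = "Life in Flight and Two Tigers are both very funny"
matched = find_resources(answer, database)
```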

Step 1222, determine the related information of each multimedia resource.

Specifically, the related information of a multimedia resource may be obtained by searching for the multimedia resource on the internet, or may be extracted from a multimedia database. Taking movie-type multimedia resources as an example, the related information may include the movie genre, language, director, and the like, or may be tags set by users on a network platform for the multimedia resource. For example, the tags corresponding to the movie "Secret Agent Bean" include comedy, UK, funny, secret agent, action, and the like.

Step 1223, determining the answer text characteristics of the multimedia question-answer pair based on the related information of each multimedia resource.

Specifically, the answer text characteristics include the features corresponding to the related information of each multimedia resource contained in the answer text. One way to determine them is to perform feature conversion on the related information of each multimedia resource to obtain its corresponding features, and then concatenate these features to obtain the answer text characteristics. Alternatively, the related information of all multimedia resources may first be fused, and feature conversion is then performed on the fused related information to obtain the answer text characteristics. For example, if each multimedia resource corresponds to several tags, the tags of all multimedia resources are merged and then converted into features. The feature conversion may use one-hot encoding, or may directly convert the word segments of the related information into the corresponding word vectors, which is not specifically limited in the embodiment of the present invention. For example, the answer text characteristic may be expressed as a one-hot vector of length L, where L corresponds to the tags of all multimedia resources in the answer text.
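The "fuse first, then convert" variant can be sketched as merging the tags of all resources in an answer text and encoding the merged set over a fixed tag vocabulary. The vocabulary, the tag values, and the function name `answer_text_feature` are assumptions for illustration; the source leaves the exact encoding open.

```python
from typing import Iterable, List, Sequence


def answer_text_feature(resource_tags: Iterable[Iterable[str]],
                        vocabulary: Sequence[str]) -> List[int]:
    """Merge the tags of all resources, then encode them over a fixed tag vocabulary."""
    merged = set()
    for tags in resource_tags:
        merged.update(tags)
    # One position per vocabulary tag: 1 if any resource carries it, else 0.
    return [1 if tag in merged else 0 for tag in vocabulary]


# Hypothetical vocabulary and per-resource tags for one answer text.
vocabulary = ["comedy", "UK", "action", "romance"]
tags_per_resource = [["comedy", "UK"], ["comedy", "action"]]
feature = answer_text_feature(tags_per_resource, vocabulary)
```

Answer texts whose resources share many tags then receive similar vectors, which is what lets the later clustering step group them.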

When the answer text characteristics are applied to clustering the multimedia question-answer pairs, the multimedia resources contained in the answer texts of the multimedia question-answer pairs within any cluster of the clustering result have similar related information, so that multimedia question-answer pairs with similar multimedia resources are merged. For example, one answer text is "Secret Agent Bean is hilarious" and another is "I think it is Ace Secret Agent". The related information of both "Secret Agent Bean" and "Ace Secret Agent" includes comedy, English, funny, and secret agent, so the related information of the two movies is similar. Clustering on the answer text characteristics therefore places the multimedia question-answer pairs corresponding to the two answer texts in the same cluster, and the multimedia resources in both answer texts are listed in the same multimedia list.

According to the method provided by the embodiment of the invention, the answer text characteristics are determined through the relevant information of each multimedia resource contained in the answer text, and the clustering of multimedia question and answer pairs similar to the relevant information of the multimedia resources can be realized, so that the automatic generation of the multimedia list is realized.

Based on any of the above embodiments, fig. 4 is a schematic flow chart of the method for determining a multimedia question-answer pair according to the embodiment of the present invention, and as shown in fig. 4, step 110 specifically includes:

step 111, determining a plurality of candidate question-answer pairs.

Here, the candidate question-answer pairs may be mined from forums, communities, or other types of question-answer platforms in the relevant fields on the internet. A candidate question-answer pair may be strongly or only weakly associated with multimedia.

When collecting candidate question-answer pairs, information such as the number of likes and the number of views of each question-answer pair on the network can be used to select, from the massive set of question-answer pairs, those with more likes and more views as candidate question-answer pairs, thereby ensuring the quality of the candidates.
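A minimal sketch of this quality filter, assuming the popularity signals are a like count and a view count attached to each question-answer pair. The `QAPair` fields, the threshold values, and the sample data are hypothetical; the source does not state how "higher" is decided.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class QAPair:
    question: str
    answer: str
    likes: int   # assumed popularity signal
    views: int   # assumed popularity signal


def select_candidates(pairs: Iterable[QAPair],
                      min_likes: int = 10,
                      min_views: int = 100) -> List[QAPair]:
    """Keep only pairs that meet both (hypothetical) popularity thresholds."""
    return [p for p in pairs if p.likes >= min_likes and p.views >= min_views]


pairs = [
    QAPair("which movies are funny", "Two Tigers", likes=25, views=900),
    QAPair("any good songs", "not sure", likes=1, views=30),
    QAPair("recommend a book", "a travel diary", likes=12, views=80),
]
candidates = select_candidates(pairs)
```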

Step 112, inputting any candidate question-answer pair into an intention classification model to obtain an intention classification result output by the intention classification model; the intention classification model is obtained by training based on the sample question-answer pairs and the sample intention classification results thereof.

Specifically, the intention classification model is used for judging whether the intention of the candidate question-answer pair is related to the multimedia or not according to the input candidate question-answer pair and outputting an intention classification result representing whether the candidate question-answer pair is related to the multimedia or not. Here, the intention classification result may be multimedia-related or multimedia-unrelated, and may also be a probability that the candidate question-answer pair is related to the multimedia, which is not specifically limited in the embodiment of the present invention.

Before step 112 is executed, the intention classification model may also be obtained through training in advance, and specifically, the intention classification model may be obtained through training in the following manner: first, a large number of sample question-answer pairs are collected, and the sample intention classification results of the sample question-answer pairs are manually labeled, namely whether the sample question-answer pairs are related to multimedia or not. And then training the initial model based on the sample question-answer pairs and the sample intention classification results of the sample question-answer pairs, thereby obtaining an intention classification model.

And step 113, if the intention classification result shows that the multimedia is relevant, determining the candidate question-answer pair as a multimedia question-answer pair.

Specifically, the candidate question-answer pairs are screened according to the intention classification result to obtain the multimedia question-answer pairs. For example, if the intention classification result is multimedia-related, the candidate question-answer pair is determined to be a multimedia question-answer pair; otherwise, the candidate question-answer pair is not taken as a multimedia question-answer pair. For another example, if the intention classification result is a multimedia-related probability and the probability value is greater than a preset probability threshold, the candidate question-answer pair is determined to be related to the multimedia and is taken as a multimedia question-answer pair; otherwise, the candidate question-answer pair is not taken as a multimedia question-answer pair.
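A minimal sketch of this screening step, covering both forms of the intention classification result (a label or a probability). The function name `is_multimedia`, the boolean encoding of the label and the sample probabilities are illustrative assumptions.

```python
def is_multimedia(result, threshold=0.9):
    """Interpret an intention classification result.

    result is either a bool label (True means multimedia-related) or a
    float probability that the pair is multimedia-related.
    """
    if isinstance(result, bool):  # label form; checked first since bool is an int
        return result
    return result > threshold     # probability form: strictly greater than threshold


# Probabilities for three hypothetical candidate question-answer pairs.
results = {1: 0.95, 2: 0.97, 3: 0.23}
kept = [pair_id for pair_id, r in results.items() if is_multimedia(r)]
```

With a threshold of 0.9, only the pairs whose probability exceeds the threshold remain in `kept` and go on to list generation.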

According to the method provided by the embodiment of the invention, the candidate question-answer pairs are screened through the intention classification model, so that the multimedia question-answer pairs have higher confidence, and a subsequently generated multimedia list has higher accuracy.

Based on any of the above embodiments, in the method, step 112 specifically includes: and inputting the question text of any candidate question-answer pair into the intention classification model to obtain an intention classification result output by the intention classification model.

Specifically, in the candidate question-answer pairs, the question text usually includes a clear intention, such as "find action movie", "recommend song suitable for picnic", wherein the "movie", "song", and the like all include a clear intention, and the relevance to the multimedia can be determined through the question text.

Therefore, when the intention classification model is applied to carry out intention classification on the candidate question-answer pairs, the candidate question-answer pairs do not need to be input into the intention classification model in a whole manner, and only the question texts in the candidate question-answer pairs need to be input into the intention classification model.

Based on any of the above embodiments, fig. 5 is a schematic flow chart of the method for determining an intention classification result according to an embodiment of the present invention, as shown in fig. 5, step 112 specifically includes:

step 1121, inputting the word vector of each word in the question text of any candidate question-answer pair to the semantic coding layer of the intention classification model to obtain the semantic features of the question text output by the semantic coding layer.

Step 1122, inputting the semantic features into the classification output layer of the intent classification model to obtain an intent classification result output by the classification output layer.

Specifically, the intention classification model comprises a semantic coding layer and a classification output layer, wherein the semantic coding layer is used for coding hidden layer characteristics of each word based on a word vector of each word of the input question text, and obtaining semantic characteristic output of the question text based on the hidden layer characteristics; and the classification output layer is used for analyzing the probability of the problem text related to the multimedia based on the input semantic features and outputting an intention classification result.

The word vector of each word may be generated by random initialization, or may be generated by training of tools such as word2vec, and this is not specifically limited in the embodiment of the present invention.

The semantic coding layer may be a recurrent neural network (RNN) or a long short-term memory network (LSTM). Taking the RNN as an example, the semantic coding layer encodes the hidden layer feature of each word as h = [h_1, h_2, …, h_i, …, h_N], where h_i ∈ R^m is the hidden node output corresponding to the i-th word, namely its hidden layer feature, m is the dimension of the hidden nodes of the RNN hidden layer, and N is the number of words in the question text. The semantic feature thus obtained is h_N, i.e. the hidden layer feature of the last word in the question text.

The classification output layer can apply a softmax function or other types of classifiers to realize the output of the intention classification result, and when the softmax function is applied, the intention classification result is expressed as the following formula, wherein o is the intention classification result and represents the probability that the question text is related to the multimedia, and W and b are parameters of the softmax function:

o = softmax(h_N W + b)

Assuming that the probability, contained in the intention classification result o, that the question text is related to the multimedia is o[1], that the preset probability threshold is 0.9, and that the values of o[1] for candidate question-answer pairs 1, 2 and 3 are 0.95, 0.97 and 0.23 respectively, then candidate question-answer pairs 1 and 2 are multimedia question-answer pairs, while candidate question-answer pair 3 is not a multimedia question-answer pair and does not participate in the subsequent generation of the multimedia list.
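The forward pass of such a model (semantic coding layer followed by a softmax output layer) can be sketched in plain Python as below. Several points are assumptions rather than part of the embodiment: the recurrence h_i = tanh(x_i W_x + h_(i-1) W_h + b_h) is the standard simple-RNN update, the parameter names `W_x`, `W_h`, `b_h` and the dimensions are hypothetical, and the parameters are random and untrained, so the output probabilities are meaningless except as a shape check.

```python
import math
import random


def matvec(vec, mat):
    """Row vector (length n) times matrix (n x m) gives a row vector (length m)."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]


def rnn_encode(word_vectors, W_x, W_h, b_h):
    """Run a simple RNN over the word vectors and return the last hidden state h_N."""
    h = [0.0] * len(b_h)
    for x in word_vectors:
        xa, ha = matvec(x, W_x), matvec(h, W_h)
        h = [math.tanh(a + c + d) for a, c, d in zip(xa, ha, b_h)]
    return h


def softmax(z):
    mx = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - mx) for v in z]
    s = sum(e)
    return [v / s for v in e]


def classify(word_vectors, params):
    """Compute o = softmax(h_N W + b)."""
    h_N = rnn_encode(word_vectors, params["W_x"], params["W_h"], params["b_h"])
    logits = [a + c for a, c in zip(matvec(h_N, params["W"]), params["b"])]
    return softmax(logits)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d, m = 4, 3  # word-vector dimension and hidden size (illustrative values)
params = {
    "W_x": random_matrix(d, m, rng),
    "W_h": random_matrix(m, m, rng),
    "b_h": [0.0] * m,
    "W": random_matrix(m, 2, rng),  # two classes: unrelated / related
    "b": [0.0, 0.0],
}
question = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(5)]  # N = 5 words
o = classify(question, params)  # o[1] plays the role of the multimedia-related probability
```

In practice the parameters would be learned from the labelled sample question-answer pairs, and `o[1]` would then be compared with the preset probability threshold.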

Based on any of the above embodiments, fig. 6 is a schematic flow chart of a multimedia list generation method according to another embodiment of the present invention, as shown in fig. 6, step 130 specifically includes:

and step 131, clustering each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair to obtain a clustering result.

Specifically, based on the question-answer pair characteristics of each multimedia question-answer pair, each multimedia question-answer pair is clustered, and a clustering result can be obtained. Here, the clustering algorithm specifically applied may be a k-means clustering algorithm, a DBSCAN clustering algorithm, a mean shift clustering algorithm, and the like, which is not specifically limited in this embodiment of the present invention.
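As one possible instance of the clustering step, a bare-bones k-means can be sketched as follows. The feature vectors are made up, and initialising the centroids with the first k points is a simplification chosen to keep the example deterministic; a library implementation of k-means, DBSCAN or mean shift could be substituted.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Plain k-means; returns one cluster index per point.

    Centroids start as the first k points (a simplification for this sketch).
    """
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical question-answer pair features for four multimedia pairs.
features = [
    [0.9, 0.1, 1.0, 0.0],
    [0.1, 0.8, 0.0, 1.0],
    [0.8, 0.2, 1.0, 0.0],
    [0.2, 0.9, 0.0, 1.0],
]
labels = kmeans(features, k=2)
```

Here the first and third pairs fall into one cluster and the second and fourth into the other, because their feature vectors are close to each other.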

Step 132, based on each multimedia resource contained in the answer text of each multimedia question-answer pair belonging to any cluster in the clustering result, a multimedia list corresponding to the cluster is generated.

Specifically, for any one cluster obtained by clustering, the cluster may include a plurality of similar multimedia question-answer pairs, each multimedia question-answer pair includes a plurality of multimedia resources, and the multimedia resources included in each multimedia question-answer pair in the cluster may be listed in a multimedia list corresponding to the cluster. Different multimedia lists may be generated for different clusters.
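Turning a clustering result into multimedia lists can be sketched as below. The resource names are placeholders, and dropping duplicates within a list is an added assumption; the embodiment only states that the resources of the pairs in a cluster are listed together.

```python
from collections import defaultdict


def build_lists(cluster_labels, resources_per_pair):
    """Merge the resources of all pairs in a cluster into one list per cluster."""
    lists = defaultdict(list)
    for label, resources in zip(cluster_labels, resources_per_pair):
        for resource in resources:
            if resource not in lists[label]:  # assumed de-duplication
                lists[label].append(resource)
    return dict(lists)


cluster_labels = [0, 1, 0]
resources_per_pair = [["Movie A", "Movie B"], ["Movie C"], ["Movie B", "Movie D"]]
multimedia_lists = build_lists(cluster_labels, resources_per_pair)
```

Each key of `multimedia_lists` is a cluster index and each value is the multimedia list generated for that cluster.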

Based on any one of the above embodiments, a multimedia list generation method is used for generating a movie list, and specifically includes the following steps:

first, a plurality of candidate question-answer pairs are collected. And then inputting the question text of each candidate question-answer pair into an intention classification model to obtain an intention classification result output by the intention classification model.

The following table shows a plurality of candidate question-answer pairs, and the intention classification results of the candidate question-answer pairs:

assuming that the preset probability threshold is 0.9, the question-answer pairs corresponding to sequence numbers 1, 2, 4, 5 and 6 are determined to be multimedia question-answer pairs related to the movie.

Then, the question-answer pair features corresponding to each multimedia question-answer pair are determined. Here, the question-answer pair features are obtained by concatenating the question text features and the answer text features. The question text feature may be the semantic feature of the question text, denoted here as h_N, i.e. the hidden layer feature of the last word in the question text, where N is the number of words in the question text. The answer text feature may be a feature corresponding to the related information of each multimedia resource contained in the answer text. In the above table, the multimedia resources contained in each answer text are underlined. Taking the multimedia question-answer pair with serial number 4 as an example, the three multimedia resources (i.e., three movies) contained in its answer text are "The Chronicles of Narnia", "Dragon Knight" and "Arthur and the Minimoys"; for "Arthur and the Minimoys", the related information obtained through the internet includes animation, Luc Besson, France, magic, and the like. After the related information of the three movies is collected, the set of all related information is denoted as L and represented with one-hot encoding, so that the answer text feature is denoted as L-one-hot. The feature of any multimedia question-answer pair is thus obtained as S(Q, A) = [h_N, L-one-hot].
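The construction of S(Q, A) by concatenating h_N with the one-hot representation of the related-information set L can be sketched as follows. The vocabulary, the value of `h_N` and the per-resource information sets are invented for illustration; since L usually contains several items, the resulting vector has several ones (a multi-hot vector over the vocabulary).

```python
def one_hot_related_info(resource_infos, vocabulary):
    """Encode the union L of all related-information items as a 0/1 vector."""
    L = set()
    for info in resource_infos:
        L.update(info)
    return [1.0 if term in L else 0.0 for term in vocabulary]


def qa_pair_feature(h_N, resource_infos, vocabulary):
    """S(Q, A) = [h_N, L-one-hot]: concatenate question and answer text features."""
    return list(h_N) + one_hot_related_info(resource_infos, vocabulary)


vocabulary = ["animation", "comedy", "France", "magic"]  # assumed fixed ordering
h_N = [0.2, -0.1, 0.7]                                   # hypothetical question feature
infos = [{"animation", "France", "magic"}, {"magic"}]    # one set per resource in the answer
S = qa_pair_feature(h_N, infos, vocabulary)
```

The vocabulary ordering must be the same for every question-answer pair so that the features are comparable during clustering.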

Then, each multimedia question-answer pair is clustered based on the question-answer pair features of each multimedia question-answer pair to obtain a clustering result. In the above table, the multimedia question-answer pairs with serial numbers 1 and 4 form one cluster and the multimedia question-answer pairs with serial numbers 2 and 5 form another cluster, so that two multimedia lists are formed correspondingly: one multimedia list comprises "Flying Life", "Charlotte Annoyance" and "Two Tigers", and the other multimedia list comprises "Twilight City", "The Chronicles of Narnia", "Dragon Knight" and "Arthur and the Minimoys".

Currently, an automatically generated multimedia list usually takes the tags of the multimedia resources it contains directly as the list name, for example, directly naming a movie list "romance movies". List names produced in this way are uniform in form and rather stiff, and usually fail to arouse the user's interest. To solve this problem, embodiments of the present invention provide a multimedia list naming method. Fig. 7 is a schematic flowchart of a multimedia list naming method according to an embodiment of the present invention. As shown in fig. 7, the method includes:

step 710, determining related texts of multimedia resources in a multimedia list; the multimedia list is generated based on the multimedia list generation method provided by any one of the above embodiments.

Here, the related text refers to text associated with the multimedia resource, such as comment text posted by a user for the multimedia resource, or introduction text for the multimedia resource on a network, and the like. The related text is generally artificially written text, has richer forms and expressions relative to the labels, and is easier to attract the interests of the user.

Any multimedia list may include a plurality of multimedia resources, and the related text herein may correspond to one of the multimedia resources or a plurality of the multimedia resources, which is not specifically limited in the embodiment of the present invention. When the relevant texts are collected, texts with higher praise number and browsing amount can be selected from massive texts through information such as praise number and browsing amount corresponding to comment texts, introduction texts and the like on the network to serve as the relevant texts, so that the quality of the relevant texts is ensured.

Step 720, generating a list name of the multimedia list based on the related text of the multimedia resources in the multimedia list.

Specifically, after the related text is obtained, a part of the word segment may be directly intercepted from the related text to serve as the list name of the multimedia list, or the list name may be automatically generated based on the high-frequency word in the related text, which is not specifically limited in the embodiment of the present invention.

For example, the multimedia list includes "Charlotte Annoyance". By searching for "Charlotte Annoyance", a short comment text posted by a user is obtained: "'Charlotte Annoyance' is a peak work of domestic comedy". "Peak work of domestic comedy" is then used as the list name of the multimedia list.

The method provided by the embodiment of the invention generates the list name of the multimedia list through the related text of the multimedia resource, and the application of the related text is beneficial to enriching the form and expression of the list name, so that the multimedia list is more easy to arouse the interest of the user.

Based on any of the above embodiments, fig. 8 is a flowchart illustrating a list name generation method according to an embodiment of the present invention, and as shown in fig. 8, step 720 specifically includes:

step 721, determining keywords of the related text of the multimedia resources in the multimedia list.

Step 722, generating a list name of the multimedia list based on the keywords.

Specifically, the keyword may be a word appearing frequently in the related text, or a word segmentation with high importance in the related text, and the keyword may be one or more. The determination of the keyword may be implemented by various keyword extraction methods, for example, keyword matching, and the like, which is not specifically limited in this embodiment of the present invention. The keywords are applied to the generation of the list names, so that rich meanings expressed by the related texts can be better embodied.
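A frequency-based variant of the keyword determination can be sketched as below. It is only one of the many possible extraction methods: the stop-word list is an assumption, and splitting on letters suits the English sample texts only; Chinese related texts would first need word segmentation.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "the", "of", "is", "and", "in", "to"}  # illustrative list


def extract_keywords(texts, top_k=2):
    """Return the top_k most frequent non-stop-words across the related texts."""
    words = []
    for text in texts:
        words += [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    return [word for word, _ in Counter(words).most_common(top_k)]


related_texts = [
    "A peak work of domestic comedy",
    "The funniest domestic comedy in years",
]
keywords = extract_keywords(related_texts)
```

For these two sample texts the most frequent remaining words are "domestic" and "comedy", which would then be passed on to list name generation.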

Based on any of the above embodiments, in this method, step 722 specifically includes: and carrying out synonym replacement on the keywords in the related text to obtain the list name of the multimedia list.

Specifically, the synonym pair to which the keyword belongs in the related text may be determined based on a synonym pair collected in advance, so as to perform synonym replacement on the keyword in the related text, and use the related text after synonym replacement as the list name.

Before this, the collection of synonym pairs may also be completed in advance, for example, the collection of synonym pairs may be implemented by the following method: collecting a large number of keywords of the sample related texts, training word vectors of each keyword, and judging whether the two keywords are synonym pairs or not by calculating the dot product of the word vectors of every two keywords and comparing the dot product with a preset threshold value.
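The dot-product test for synonym pairs can be sketched as follows. The three-dimensional vectors and the threshold of 0.8 are invented for illustration, and the vectors are assumed to be roughly unit length so that the dot product behaves like a cosine similarity; real word vectors would come from training, for example with word2vec.

```python
from itertools import combinations


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def find_synonym_pairs(word_vectors, threshold=0.8):
    """Return every pair of keywords whose vector dot product exceeds the threshold."""
    pairs = []
    for (w1, v1), (w2, v2) in combinations(word_vectors.items(), 2):
        if dot(v1, v2) > threshold:
            pairs.append((w1, w2))
    return pairs


# Hypothetical trained word vectors for three keywords.
word_vectors = {
    "film": [0.9, 0.4, 0.1],
    "movie": [0.88, 0.45, 0.05],
    "song": [0.1, 0.2, 0.97],
}
synonym_pairs = find_synonym_pairs(word_vectors)
```

Only "film" and "movie" point in nearly the same direction, so they are the only pair recorded as synonyms.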

For example, the related text is "the precursor of domestic animation", and the keywords include "domestic" and "animation", wherein a synonym of "animation" is "cartoon", and the list name "the precursor of domestic cartoons" is obtained by synonym replacement.
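Synonym replacement on a related text can be sketched as below. The sample text, keywords and synonym pair are illustrative, and performing all replacements in a single regular-expression pass is a design choice of this sketch, made so that two keywords that are synonyms of each other do not undo one another.

```python
import re


def replace_synonyms(related_text, keywords, synonym_pairs):
    """Replace each keyword that has a known synonym; other text is left unchanged."""
    mapping = {}
    for a, b in synonym_pairs:
        mapping.setdefault(a, b)
        mapping.setdefault(b, a)
    targets = [kw for kw in keywords if kw in mapping]
    if not targets:
        return related_text
    # Longest keywords first so that a longer keyword wins over its own prefix.
    ordered = sorted(targets, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    return pattern.sub(lambda match: mapping[match.group(0)], related_text)


list_name = replace_synonyms(
    "the pioneer of domestic film",
    keywords=["domestic", "film"],
    synonym_pairs=[("film", "movie")],
)
```

The keyword "domestic" has no recorded synonym and is kept, while "film" is swapped for "movie", giving a list name that differs from the original related text.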

According to the method provided by the embodiment of the invention, the synonym replacement is carried out on the related text, so that the diversity of the list names is realized.

Based on any of the above embodiments, in this method, step 722 specifically includes: inputting the keywords into a list name generation model to obtain a list name of a multimedia list output by the list name generation model; the list name generation model is obtained by training based on the sample keywords and the sample list names.

Specifically, the list name generation model is used for automatically generating the list names of the multimedia lists according to the input keywords. Before step 722 is executed, the list name generation model may be obtained through training in advance, and specifically, the list name generation model may be obtained through training in the following manner: first, a large number of sample keywords and sample list names corresponding to the sample keywords are collected. And then, training the initial model based on the sample keywords and the sample list names, so as to obtain a list name generation model. The list name generation model learns the corresponding relation between the sample keywords and the sample list names, and applies the corresponding relation to the list name generation process based on the keywords, so that the list names are obtained. The list name generation model here may be a seq2seq model.

For example, keywords "fun" and "animation" are input to the list name generation model, and the list name "peak work of fun" output by the list name generation model is obtained.

According to the method provided by the embodiment of the invention, the keywords are input into the list name generation model to determine the list names, and more diversified list names can be obtained based on different keywords.

Based on any of the above embodiments, fig. 9 is a schematic structural diagram of a multimedia list generating device according to an embodiment of the present invention, as shown in fig. 9, the multimedia list generating device includes a multimedia question-answer pair determining unit 910, a question-answer pair characteristic determining unit 920, and a list generating unit 930;

the multimedia question-answer pair determining unit 910 is configured to determine a plurality of multimedia question-answer pairs;

the question-answer pair characteristic determining unit 920 is configured to determine a question-answer pair characteristic of each multimedia question-answer pair, where the question-answer pair characteristic represents a type characteristic of a multimedia resource included in the multimedia question-answer pair;

the list generating unit 930 is configured to cluster each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair, and generate a multimedia list based on a clustering result.

The device provided by the embodiment of the invention uses the multimedia question answers containing rich multimedia resource information for generating the multimedia list, is beneficial to realizing fine-grained multimedia resource division, and can cover various user requirements. The question-answer pair characteristics based on the multimedia question-answer pair are subjected to unsupervised clustering, additional manpower consumption is not needed for labeling, and the manpower consumption required by multimedia list generation can be effectively saved.
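The division of the device into units 910, 920 and 930 can be pictured as a simple composition of three callables. The class name, the dictionary layout of a question-answer pair and the stub functions below are all hypothetical placeholders; real units would perform intention classification, feature extraction and clustering.

```python
class MultimediaListGenerator:
    """Chains the three units of the generating device as plain callables."""

    def __init__(self, determine_pairs, determine_features, generate_lists):
        self.determine_pairs = determine_pairs        # role of unit 910
        self.determine_features = determine_features  # role of unit 920
        self.generate_lists = generate_lists          # role of unit 930

    def run(self, raw_pairs):
        pairs = self.determine_pairs(raw_pairs)
        features = [self.determine_features(p) for p in pairs]
        return self.generate_lists(pairs, features)


# Stub units: keep flagged pairs, use a trivial feature, put everything in one list.
generator = MultimediaListGenerator(
    determine_pairs=lambda raw: [p for p in raw if p["multimedia"]],
    determine_features=lambda p: [float(len(p["resources"]))],
    generate_lists=lambda pairs, feats: [
        sorted({r for p in pairs for r in p["resources"]})
    ],
)
raw_pairs = [
    {"multimedia": True, "resources": ["Movie A"]},
    {"multimedia": False, "resources": []},
    {"multimedia": True, "resources": ["Movie B"]},
]
result = generator.run(raw_pairs)
```

Keeping the units as interchangeable callables mirrors the statement that the units may or may not be physically separate.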

Based on any of the above embodiments, the question-answer pair feature determination unit 920 includes:

the question text characteristic determining unit is used for determining the question text characteristic of any multimedia question-answer pair;

and/or, the answer text characteristic determining unit is used for determining the answer text characteristic of any multimedia question-answer pair;

and the question-answer pair characteristic determining unit is used for determining the question-answer pair characteristic of any multimedia question-answer pair based on the question text characteristic and/or the answer text characteristic.

Based on any of the above embodiments, the question text feature determining unit is specifically configured to:

Based on any of the above embodiments, the answer text feature determination unit is specifically configured to:

determining related information of each multimedia resource;

Based on any of the above embodiments, the multimedia question-answer pair determining unit 910 includes:

a candidate question-answer pair determining unit for determining a plurality of candidate question-answer pairs;

the intention classification unit is used for inputting any candidate question-answer pair into an intention classification model to obtain an intention classification result output by the intention classification model; the intention classification model is obtained by training based on the sample question-answer pairs and the sample intention classification results thereof;

and the question-answer pair screening unit is used for determining any candidate question-answer pair as the multimedia question-answer pair if the intention classification result shows that multimedia is relevant.

Based on any of the embodiments above, the intention classification unit is specifically configured to:

Based on any of the above embodiments, the list generating unit 930 is specifically configured to:

Based on any of the above embodiments, fig. 10 is a schematic structural diagram of a multimedia list naming apparatus according to an embodiment of the present invention, as shown in fig. 10, the multimedia list naming apparatus includes a related text determining unit 1010 and a naming unit 1020;

the relevant text determining unit 1010 is configured to determine relevant texts of multimedia resources in the multimedia list; wherein the multimedia list is generated based on a multimedia list generation method;

the naming unit 1020 is configured to generate a list name of the multimedia list based on the related text of the multimedia resource in the multimedia list.

The device provided by the embodiment of the invention generates the list name of the multimedia list through the related text of the multimedia resource, and the application of the related text is beneficial to enriching the form and expression of the list name, so that the multimedia list is more likely to arouse the interest of the user.

Based on any of the above embodiments, the naming unit 1020 includes:

the keyword determining unit is used for determining keywords of related texts of multimedia resources in the multimedia list;

and the name generating unit is used for generating the list name of the multimedia list based on the keyword.

Based on any of the embodiments above, the name generation unit is specifically configured to:

Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 11, the electronic device may include a processor 1110, a communications interface 1120, a memory 1130, and a communication bus 1140, wherein the processor 1110, the communications interface 1120, and the memory 1130 communicate with each other via the communication bus 1140. The processor 1110 may call logical commands in the memory 1130 to perform the following method: determining a plurality of multimedia question-answer pairs; determining question-answer pair characteristics of each multimedia question-answer pair, wherein the question-answer pair characteristics represent type characteristics of multimedia resources contained in the multimedia question-answer pairs; and clustering each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair, and generating a multimedia list based on a clustering result.

In addition, the processor 1110 may also call logical commands in the memory 1130 to perform the following method: determining related texts of multimedia resources in a multimedia list, wherein the multimedia list is generated by a multimedia list generation method; and generating a list name of the multimedia list based on the related text of the multimedia resources in the multimedia list.

In addition, the logic commands in the memory 1130 may be implemented in the form of software functional units and may be stored in a computer readable storage medium when sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes a plurality of commands for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Embodiments of the present invention further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to perform the method provided in the foregoing embodiments when executed by a processor, and the method includes: determining a plurality of multimedia question-answer pairs; determining question-answer pair characteristics of each multimedia question-answer pair, wherein the question-answer pair characteristics represent type characteristics of multimedia resources contained in the multimedia question-answer pairs; and clustering each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair, and generating a multimedia list based on a clustering result.

Furthermore, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, performs the method provided by the foregoing embodiments, for example, including: determining related texts of multimedia resources in a multimedia list, wherein the multimedia list is generated by a multimedia list generation method; and generating a list name of the multimedia list based on the related text of the multimedia resources in the multimedia list.

The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc., and includes commands for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments or some parts of the embodiments.

Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims

1. A method for generating a multimedia list, comprising:

determining a plurality of multimedia question-answer pairs;

2. The method for generating a multimedia list according to claim 1, wherein the determining the question-answer pair characteristics of each multimedia question-answer pair specifically comprises:

3. The method for generating a multimedia list according to claim 2, wherein the determining the question text feature of any multimedia question-answer pair specifically comprises:

4. The method for generating a multimedia list according to claim 2, wherein the determining the answer text feature of any multimedia question-answer pair specifically comprises:

determining related information of each multimedia resource;

5. The method for generating a multimedia list according to claim 1, wherein the determining a plurality of multimedia question-answer pairs specifically comprises:

determining a plurality of candidate question-answer pairs;

6. The method for generating a multimedia list according to claim 5, wherein the inputting any candidate question-answer pair into an intention classification model to obtain an intention classification result output by the intention classification model specifically comprises:

7. The method as claimed in any one of claims 1 to 6, wherein the clustering is performed on each multimedia question-answer pair based on the question-answer pair characteristics of each multimedia question-answer pair, and the generating of the multimedia list based on the clustering result specifically comprises:

8. A method for naming a multimedia list, comprising:

determining related texts of multimedia resources in a multimedia list; wherein the multimedia list is generated based on the multimedia list generation method according to any one of claims 1 to 7;

9. The method for naming a multimedia list according to claim 8, wherein the generating a list name of the multimedia list based on the associated text of the multimedia resource in the multimedia list specifically comprises:

and generating a list name of the multimedia list based on the keyword.

10. The method for naming a multimedia list according to claim 9, wherein the generating a list name of a multimedia list based on the keyword specifically comprises:

11. A multimedia list generation apparatus, comprising:

12. A multimedia list naming apparatus, comprising:

the relevant text determining unit is used for determining the relevant text of the multimedia resources in the multimedia list; wherein the multimedia list is generated based on the multimedia list generation method according to any one of claims 1 to 7;

13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the multimedia list generation method according to any of claims 1 to 7 or the multimedia list naming method according to any of claims 8 to 10 when executing said program.

14. A non-transitory computer readable storage medium, having stored thereon a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the multimedia list generation method of any one of claims 1 to 7, or the multimedia list naming method of any one of claims 8 to 10.