CN114186074A

CN114186074A - Video search word recommendation method and device, electronic equipment and storage medium

Info

Publication number: CN114186074A
Application number: CN202111523639.4A
Authority: CN
Inventors: 黄诗磊
Original assignee: Beijing Dajia Internet Information Technology Co Ltd
Current assignee: Beijing Dajia Internet Information Technology Co Ltd
Priority date: 2021-12-14
Filing date: 2021-12-14
Publication date: 2022-03-15

Abstract

The disclosure relates to a video search term recommendation method and device, an electronic device, and a storage medium. The method comprises: for a target video, extracting from a preset knowledge graph the concept entities associated with the video content of the target video and the entity relations among those concept entities, to generate an associated knowledge graph corresponding to the target video; inputting the video content text of the target video and the associated knowledge graph into a pre-trained search term generation model to obtain at least one candidate search term; inputting each candidate search term and the video content text into a pre-trained search term evaluation model to obtain a relevance label corresponding to each candidate search term; and determining a search term to be recommended for the target video according to the candidate search terms whose relevance labels meet a preset condition, the search term to be recommended being used to guide a user account to perform a search operation after accessing the target video. With this method and device, search terms can be generated in combination with the associated knowledge graph, which improves the relevance of the search terms to the video content.

Description

Video search word recommendation method and device, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for recommending video search terms, an electronic device, and a storage medium.

Background

At present, while a user watches a video, search terms can be generated for that video and recommended to the user to guide the user to initiate a search, which increases exposure opportunities for the corresponding search products and meets the user's demand for deeper consumption. However, conventional methods can only generate a search term by extracting a brand name related to the video, so the range of videos for which search terms can be generated is limited, the search terms are only weakly related to the video content, and the search terms lack diversity.

Therefore, the related art has a problem that the search term generated for a video has a low degree of correlation with the video content of the video.

Disclosure of Invention

The disclosure provides a video search term recommendation method, a video search term recommendation device, an electronic device and a storage medium, which are used for at least solving the problem that in the related technology, the correlation degree of a search term generated aiming at a video and the video content of the video is low. The technical scheme of the disclosure is as follows:

according to a first aspect of the embodiments of the present disclosure, there is provided a video search term recommendation method, including:

extracting concept entities associated with video content of a target video and entity relations among the concept entities from a preset knowledge graph to generate an associated knowledge graph corresponding to the target video;

inputting the video content text of the target video and the associated knowledge graph into a pre-trained search word generation model to obtain at least one candidate search word;

inputting each candidate search word and the video content text into a pre-trained search word evaluation model to obtain a relevance label corresponding to each candidate search word; the relevance label is used for representing the degree of relevance between the candidate search word and the video content of the target video;

determining a search word to be recommended aiming at the target video according to the candidate search word of which the relevance label meets a preset condition; and the search word to be recommended is used for guiding a user account to execute search operation after accessing the target video.

In one possible implementation manner, the determining, according to the candidate search term whose relevance label satisfies a preset condition, a search term to be recommended for the target video includes:

taking the candidate search words with the relevance labels meeting preset conditions as target search words;

filtering abnormal search words in at least one target search word to obtain the search words to be recommended; and the abnormal search word is determined according to preset service requirements and/or playing platform rules.

In a possible implementation manner, the filtering the abnormal search term in the at least one target search term to obtain the search term to be recommended includes:

if the target search word contains the specified word, judging that the target search word is the abnormal search word; the specified words are determined based on preset service requirements and playing platform rules;

deleting the specified words in the abnormal search words to obtain modified search words;

taking the modified search terms and the non-abnormal search terms as the search terms to be recommended; the non-abnormal search word is a target search word which does not contain the specified word.

In a possible implementation manner, the filtering the abnormal search term in the at least one target search term to obtain the search term to be recommended includes:

if the target search word contains a preset word and/or the word representation concept of the target search word is matched with a preset abnormal representation concept, determining the target search word as the abnormal search word; the preset words and the preset abnormal representation concepts are determined based on preset service requirements and playing platform rules;

and deleting the abnormal search word in at least one target search word to obtain the search word to be recommended.

In one possible implementation, the pre-trained search term generation model has a pre-trained encoder and a pre-trained decoder, and the inputting the video content text of the target video and the associated knowledge map into the pre-trained search term generation model to obtain at least one candidate search term includes:

inputting the video content text and the associated knowledge graph to the pre-trained encoder to obtain an encoding result; the coding result comprises a first coding result obtained by coding the video content text and a second coding result obtained by coding the associated knowledge graph;

inputting the coding result to the pre-trained decoder to obtain at least one candidate search term; the candidate search term is obtained by decoding a fused coding result between the first coding result and the second coding result through the pre-trained decoder.

In one possible implementation, the inputting the video content text and the associated knowledge-graph to the pre-trained encoder to obtain an encoding result includes:

splicing a preset search word control code with the video content text to obtain a spliced text; the search word control code comprises a word length control code and a keyword control code; the word length control code is used for controlling the word length of the candidate search word; the keyword control code is used for controlling whether the candidate search word contains a keyword corresponding to the keyword control code;

and inputting the spliced text and the associated knowledge graph into the pre-trained encoder to obtain an encoding result.
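The splicing of control codes with the video content text can be illustrated with a minimal sketch. The token formats `<len=...>` and `<kw=...>`, the function name `build_encoder_inputs`, and the flattening of the associated knowledge graph into a marker-delimited string are all assumptions made for illustration; the disclosure does not specify how the control codes or the graph are serialized.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def build_encoder_inputs(
    video_text: str,
    triples: List[Triple],
    max_words: Optional[int] = None,
    keyword: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (spliced text, linearized graph) as two encoder inputs.

    The control-code token formats below are hypothetical placeholders.
    """
    codes: List[str] = []
    if max_words is not None:
        # Word length control code: intended to bound the candidate's length.
        codes.append(f"<len={max_words}>")
    if keyword is not None:
        # Keyword control code: intended to request that the keyword appears.
        codes.append(f"<kw={keyword}>")
    # Splice the control codes in front of the video content text.
    spliced_text = " ".join(codes + [video_text])
    # Assumed linearization of the associated knowledge graph.
    graph_text = " ".join(f"<h> {h} <r> {r} <t> {t}" for h, r, t in triples)
    return spliced_text, graph_text
```

In this sketch the two returned strings would be encoded separately, matching the first and second coding results described above; how the two encodings are fused is left to the model.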

In one possible implementation manner, before the step of inputting the video content text of the target video and the associated knowledge graph into a pre-trained search term generation model to obtain at least one candidate search term, the method further includes:

acquiring first training sample data; each first training sample data comprises a video content text of a first sample video and a first sample search word of the first sample video, and the click times of a user account corresponding to the first sample search word are larger than a preset click time threshold;

extracting concept entities associated with video content of the first sample video and entity relations among the concept entities from a preset knowledge graph aiming at the first sample video, and generating an associated knowledge graph corresponding to the first sample video;

training a search term generation model to be trained based on the video content text of the first sample video, the associated knowledge map corresponding to the first sample video and the first sample search term to obtain the pre-trained search term generation model.
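As a rough illustration of how the first training sample data might be assembled, the sketch below keeps only (video, search term) pairs whose click count exceeds the threshold. The click-log layout, the `GenerationSample` type and the function name are hypothetical; the disclosure only states that the click count must be larger than a preset threshold.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class GenerationSample:
    video_text: str   # video content text of the first sample video
    search_term: str  # first sample search term for that video


def build_generation_samples(
    click_counts: Dict[Tuple[str, str], int],  # (video_id, term) -> clicks
    video_texts: Dict[str, str],               # video_id -> content text
    min_clicks: int,
) -> List[GenerationSample]:
    """Keep pairs whose click count is strictly greater than min_clicks."""
    samples: List[GenerationSample] = []
    for (video_id, term), clicks in click_counts.items():
        # Skip pairs below the threshold or without available content text.
        if clicks > min_clicks and video_id in video_texts:
            samples.append(GenerationSample(video_texts[video_id], term))
    return samples
```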

In one possible implementation manner, before the step of inputting each candidate search term and the video content text into a pre-trained search term evaluation model to obtain a relevance label corresponding to each candidate search term, the method further includes:

acquiring second training sample data; each second training sample data comprises a video content text of a second sample video, a second sample search word and a corresponding correlation label thereof, a random search word and a corresponding correlation label thereof, and different correlation labels correspond to different correlation degrees;

training a search term evaluation model to be trained based on the video content text of the second sample video, the second sample search terms and the corresponding relevance labels thereof, and the random search terms and the corresponding relevance labels thereof to obtain the pre-trained search term evaluation model.
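The construction of the second training sample data can be sketched as follows. The sketch assumes that random search terms are drawn from a pool of terms unrelated to the video and are labeled as irrelevant; the disclosure says only that random search terms carry their own relevance labels, so this labeling choice, the integer label values and all names here are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

IRRELEVANT = 0  # assumed label value for randomly drawn terms

Sample = Tuple[str, str, int]  # (video content text, search term, label)


def build_evaluation_samples(
    video_text: str,
    labeled_terms: Sequence[Tuple[str, int]],  # (term, relevance label)
    term_pool: Sequence[str],                  # terms collected from other videos
    num_random: int,
    rng: random.Random,
) -> List[Sample]:
    """Combine labeled sample terms with randomly drawn negative terms."""
    samples: List[Sample] = [(video_text, t, label) for t, label in labeled_terms]
    known = {t for t, _ in labeled_terms}
    # Never reuse an already labeled term as a random negative.
    pool = [t for t in term_pool if t not in known]
    for term in rng.sample(pool, min(num_random, len(pool))):
        samples.append((video_text, term, IRRELEVANT))
    return samples
```

Passing an explicit `random.Random` instance keeps the sampling reproducible when a fixed seed is used.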

According to a second aspect of the embodiments of the present disclosure, there is provided a video search word recommendation apparatus including:

an associated knowledge graph generating unit configured to extract, for a target video, concept entities associated with the video content of the target video and entity relations among the concept entities from a preset knowledge graph, and to generate an associated knowledge graph corresponding to the target video;

a candidate search term obtaining unit configured to input the video content text of the target video and the associated knowledge graph into a pre-trained search term generation model to obtain at least one candidate search term;

a relevance label obtaining unit configured to input each candidate search term and the video content text into a pre-trained search term evaluation model to obtain a relevance label corresponding to each candidate search term; the relevance label is used for representing the degree of relevance between the candidate search term and the video content of the target video;

a search term to be recommended determining unit configured to determine a search term to be recommended for the target video according to the candidate search terms whose relevance labels meet a preset condition; the search term to be recommended is used for guiding a user account to perform a search operation after accessing the target video.

In a possible implementation manner, the search term to be recommended determining unit is specifically configured to take the candidate search terms whose relevance labels meet the preset condition as target search terms, and to filter abnormal search terms out of the at least one target search term to obtain the search term to be recommended; the abnormal search terms are determined according to preset service requirements and/or playing platform rules.

In a possible implementation manner, the search term to be recommended determining unit is specifically further configured to determine that the target search term is the abnormal search term if the target search term includes a specified term; the specified words are determined based on preset service requirements and playing platform rules; deleting the specified words in the abnormal search words to obtain modified search words; taking the modified search terms and the non-abnormal search terms as the search terms to be recommended; the non-abnormal search word is a target search word which does not contain the specified word.

In a possible implementation manner, the search term to be recommended determining unit is specifically configured to determine that the target search term is the abnormal search term if the target search term includes a preset term and/or a term representation concept of the target search term matches a preset abnormal representation concept; the preset words and the preset abnormal representation concepts are determined based on preset service requirements and playing platform rules; and deleting the abnormal search word in at least one target search word to obtain the search word to be recommended.

In a possible implementation manner, the pre-trained search term generation model has a pre-trained encoder and a pre-trained decoder, and the candidate search term obtaining unit is specifically configured to perform input of the video content text and the associated knowledge map to the pre-trained encoder to obtain an encoding result; the coding result comprises a first coding result obtained by coding the video content text and a second coding result obtained by coding the associated knowledge graph; inputting the coding result to the pre-trained decoder to obtain at least one candidate search term; the candidate search term is obtained by decoding a fused coding result between the first coding result and the second coding result through the pre-trained decoder.

In a possible implementation manner, the candidate search term obtaining unit is specifically configured to perform stitching of a preset search term control code and the video content text to obtain a stitched text; the search word control code comprises a word length control code and a keyword control code; the word length control code is used for controlling the word length of the candidate search word; the keyword control code is used for controlling whether the candidate search word contains a keyword corresponding to the keyword control code; and inputting the spliced text and the associated knowledge graph into the pre-trained encoder to obtain an encoding result.

In one possible implementation manner, the video search term recommendation apparatus further includes:

a first training sample data acquisition unit configured to perform acquisition of first training sample data; each first training sample data comprises a video content text of a first sample video and a first sample search word of the first sample video, and the click times of a user account corresponding to the first sample search word are larger than a preset click time threshold;

the associated knowledge graph generating unit corresponding to the first sample video is specifically configured to extract concept entities associated with video content of the first sample video and entity relationships among the concept entities from a preset knowledge graph aiming at the first sample video, and generate an associated knowledge graph corresponding to the first sample video;

and the search word generation model training unit is specifically configured to execute training of a search word generation model to be trained based on the video content text of the first sample video, the associated knowledge graph corresponding to the first sample video and the first sample search word, so as to obtain the pre-trained search word generation model.

In one possible implementation manner, the video search term recommendation apparatus further includes:

a second training sample data obtaining unit configured to obtain second training sample data; each second training sample data comprises a video content text of a second sample video, a second sample search word and a corresponding relevance label thereof, a random search word and a corresponding relevance label thereof, and different relevance labels correspond to different degrees of relevance;

and the search term evaluation model training unit is specifically configured to execute training of a search term evaluation model to be trained based on the video content text of the second sample video, the second sample search terms and their corresponding relevance labels, and the random search terms and their corresponding relevance labels, so as to obtain the pre-trained search term evaluation model.

According to a third aspect of the embodiments of the present disclosure, there is provided an electronic device, including a memory and a processor, where the memory stores a computer program, and the processor implements the video search term recommendation method according to the first aspect or any one of the possible implementations of the first aspect when executing the computer program.

According to a fourth aspect of embodiments of the present disclosure, there is provided a storage medium having stored thereon a computer program that, when executed by a processor, implements the video search term recommendation method according to the first aspect or any one of the possible implementations of the first aspect.

According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product, the program product comprising a computer program, the computer program being stored in a readable storage medium, from which the at least one processor of the apparatus reads and executes the computer program, so that the apparatus performs the video search term recommendation method described in any one of the embodiments of the first aspect.

The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:

According to the scheme, for a target video, concept entities associated with the video content of the target video and the entity relationships among those concept entities are extracted from a preset knowledge graph to generate an associated knowledge graph corresponding to the target video. The video content text of the target video and the associated knowledge graph are input into a pre-trained search term generation model to obtain at least one candidate search term. Each candidate search term and the video content text are then input into a pre-trained search term evaluation model to obtain a relevance label corresponding to each candidate search term, the relevance label representing the degree of relevance between the candidate search term and the video content of the target video. A search term to be recommended for the target video is then determined according to the candidate search terms whose relevance labels meet a preset condition, the search term to be recommended being used to guide a user account to perform a search operation after accessing the target video. In this way, search terms can be generated on the basis of the video content in combination with the associated knowledge graph extracted from the knowledge graph, which improves the relevance between the generated search terms and the video content; and because the search terms to be recommended are determined according to the relevance labels, the relevance and effectiveness of the search terms can be ensured without manual participation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.

Fig. 1 is a diagram illustrating an application environment of a video search term recommendation method according to an exemplary embodiment.

Fig. 2 is a flowchart illustrating a video search term recommendation method according to an example embodiment.

Fig. 3a is a schematic diagram illustrating a video search term presentation interface in accordance with an exemplary embodiment.

Fig. 3b is a flowchart illustrating an example of video search term recommendation, according to an example embodiment.

Fig. 4 is a flow diagram illustrating another method for video search term recommendation, according to an example embodiment.

Fig. 5 is a block diagram illustrating a video search term recommendation apparatus according to an example embodiment.

Fig. 6 is an internal block diagram of an electronic device shown in accordance with an example embodiment.

Detailed Description

In order to make the technical solutions of the present disclosure better understood by those of ordinary skill in the art, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.

It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure.

It should also be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for presentation, analyzed data, etc.) referred to in the present disclosure are both information and data that are authorized by the user or sufficiently authorized by various parties.

The video search term recommendation method provided by the present disclosure may be applied to the application environment shown in Fig. 1, in which the user terminal 110 interacts with the server 120 through a network. The server 120 may determine, for a target video played on the user terminal 110, a search term to be recommended for the target video, and the user terminal 110 may then display the search term to be recommended in the interface playing the target video, so as to guide the user account corresponding to the user terminal 110 to perform a search operation after accessing the target video. In practical applications, the user terminal 110 may include, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 120 may be implemented by an independent server or a server cluster composed of a plurality of servers.

Fig. 2 is a flowchart illustrating a video search term recommendation method according to an exemplary embodiment. As shown in Fig. 2, the method may be used in the server 120 of Fig. 1 and includes the following steps.

In step S210, for a target video, concept entities associated with the video content of the target video and entity relationships between the concept entities are extracted from a preset knowledge graph, and an associated knowledge graph corresponding to the target video is generated.

The target video may be a video for which search terms are to be generated; for example, search terms may be generated for a short video while it is being played.

As an example, the preset knowledge graph may be a semantic network established in advance from content related to short videos. It may have a plurality of nodes and edges connecting the nodes, each node representing an entity or a concept, and each edge representing a semantic relationship between entities/concepts.

In practical application, for a target video for which search terms are to be generated, concept entities associated with the video content of the target video and the entity relationships between those concept entities can be extracted from the preset knowledge graph. The extracted concept entities and entity relationships can then be used as the associated knowledge graph corresponding to the target video, so that search terms can be generated in combination with the video content text of the target video.

In an example, for a test driving video of a certain vehicle type series name x (i.e. the target video), video-related knowledge such as (vehicle type series name x, brand name y), (vehicle type series name x, manufacturer z, vehicle), (vehicle type series name x, engine type, turbo boost), (brand name y, is a kind of vehicle brand) can be extracted from the preset knowledge graph as the associated knowledge graph of the test driving video. Here, the concept entities may be the vehicle type series name x, the brand name y, and the like, and an entity relationship between concept entities may be, for example, that the brand corresponding to the vehicle type series name x is the brand name y.
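The extraction of an associated knowledge graph can be sketched as a small subgraph lookup over (head, relation, tail) triples. This is only an illustration under stated assumptions: entities are matched to the video content text by plain substring search (a production system would more likely use entity linking), the `hops` parameter and the function name are hypothetical, and the triple layout is an assumed representation rather than the one used in the disclosure.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def extract_associated_graph(
    knowledge_graph: List[Triple],
    video_text: str,
    hops: int = 1,
) -> List[Triple]:
    """Collect triples reachable within `hops` steps of entities in the text."""
    # Seed entities: heads or tails whose names literally occur in the text.
    seeds: Set[str] = set()
    for head, _, tail in knowledge_graph:
        for entity in (head, tail):
            if entity in video_text:
                seeds.add(entity)

    selected: List[Triple] = []
    frontier: Set[str] = set(seeds)
    visited: Set[str] = set(seeds)
    for _ in range(hops):
        next_frontier: Set[str] = set()
        for triple in knowledge_graph:
            head, _, tail = triple
            # Follow outgoing edges from the current frontier.
            if head in frontier and triple not in selected:
                selected.append(triple)
                if tail not in visited:
                    next_frontier.add(tail)
        visited |= next_frontier
        frontier = next_frontier
    return selected
```

With `hops=1` only relations of the entities named in the text are kept; a larger value also pulls in relations of their neighbours, such as the category a brand belongs to.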

In step S220, the video content text of the target video and the associated knowledge graph are input into a pre-trained search term generation model to obtain at least one candidate search term.

As an example, the video content text may be extracted from the target video by a text extraction module. For example, the video content text may include the video speech text, the text appearing in the video frames, and the textual meta information of the short video, such as the title, the description and the cover, which is not limited in this embodiment.

In practical application, for a target video for which search terms are to be generated, the video content text of the target video and the associated knowledge graph extracted from the preset knowledge graph can be input into the pre-trained search term generation model to obtain at least one candidate search term for the target video.

Specifically, the text contained in the cover image of the target video, the video title, the video speech text, the text appearing in the video frames and the like can be used as the video content text. The pre-trained search term generation model then generates at least one candidate search term, such as a group of search terms conforming to the short video content, according to the video content text and the extracted associated knowledge graph corresponding to the target video.

In one example, the architecture of the pre-trained search term generation model follows the idea of knowledge-graph-enhanced text generation: knowledge is incorporated into a conventional generation model structure, so that the rich semantic information carried by the entities in the knowledge graph and their relations can be used to benefit the search term generation task.

In step S230, each candidate search term and the video content text are input into a pre-trained search term evaluation model to obtain a relevance label corresponding to each candidate search term; the relevance label is used for representing the degree of relevance between the candidate search term and the video content of the target video.

The relevance labels may include multiple levels corresponding to multiple degrees of relevance, and different relevance labels represent different degrees of relevance between the candidate search term and the video content of the target video. For example, the relevance labels may be set as a strong relevance label, a weak relevance label and an irrelevant label, or as level 2, level 1 and level 0, which is not limited in this embodiment.

After at least one candidate search word is obtained, each candidate search word and the video content text can be input into a pre-trained search word evaluation model, the relevance of each candidate search word and the video content is scored, the relevance label corresponding to each candidate search word can be obtained, and the relevance degree of each candidate search word and the video content of the target video can be determined based on the relevance label.

In step S240, a search term to be recommended for the target video is determined according to the candidate search terms whose relevance labels meet a preset condition; the search term to be recommended is used for guiding a user account to perform a search operation after accessing the target video.

After the relevance labels corresponding to the candidate search terms are obtained, the candidate search terms corresponding to the relevance labels meeting the preset conditions can be screened out, and further, through search term filtering processing, the search terms to be recommended for the target video can be determined according to the candidate search terms of which the relevance labels meet the preset conditions, so that the user account is guided to execute search operation after accessing the target video based on the search terms to be recommended.

For example, in order to ensure the correlation of the generated search term, the candidate search term corresponding to the strong correlation label and the candidate search term corresponding to the weak correlation label may be used as candidate search terms whose correlation labels satisfy the preset condition, and the candidate search term corresponding to the irrelevant label does not enter the subsequent search term filtering process flow.
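The label-based screening in this example can be sketched as below. The evaluation model is replaced by `toy_evaluate`, a deliberately naive word-overlap stand-in that exists only to make the sketch runnable; it is not the pre-trained search term evaluation model, and the label strings and function names are assumptions.

```python
from typing import Callable, Iterable, List, Sequence


def toy_evaluate(term: str, video_text: str) -> str:
    """Placeholder for the evaluation model: label by word overlap."""
    term_words = set(term.lower().split())
    overlap = len(term_words & set(video_text.lower().split()))
    if term_words and overlap == len(term_words):
        return "strong"
    if overlap > 0:
        return "weak"
    return "irrelevant"


def select_target_terms(
    candidates: Iterable[str],
    video_text: str,
    evaluate: Callable[[str, str], str] = toy_evaluate,
    accepted: Sequence[str] = ("strong", "weak"),
) -> List[str]:
    """Keep candidates whose relevance label satisfies the preset condition."""
    return [c for c in candidates if evaluate(c, video_text) in accepted]
```

Here the preset condition is modeled as membership in `accepted`, so tightening it to strongly relevant terms only is a matter of passing `accepted=("strong",)`.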

In an optional embodiment, after the search terms to be recommended are obtained for the target video, they can be displayed to the user watching the target video through a search term recommendation area in the interface playing the target video. The user can then be guided, based on the search terms to be recommended, to initiate a search after accessing the target video, so that the user's video consumption needs are fully explored, the barrier to searching is lowered, video search traffic is increased, and exposure opportunities for the searched related products are increased.

In an example, as shown in Fig. 3a, in a scenario where a user consumes videos on a short video platform, a search PLC, that is, a search term recommendation area, may be mounted in the video playing interface while the user watches the currently played video; based on the displayed search term xxx, the user may be guided to initiate a search after accessing the video.

By contrast, the traditional method generates search terms only by extracting brand names related to the videos, which has several drawbacks. Because videos containing brand names account for only a small proportion of all videos, most videos cannot have corresponding search terms generated. A brand name is only part of the information conveyed by a short video, so a search term generated by extracting only the brand name lacks deep association with the short video content, and a brand name alone cannot reflect the user's real search need. Moreover, a short video may involve various kinds of content and different users are interested in different aspects of it, so using only the brand name as the search term cannot reflect the diversity of the video content.

With the technical scheme of this embodiment, candidate search terms are generated by combining the video content text with the associated knowledge graph extracted from the knowledge graph, and the search terms to be recommended for the video are determined from the candidate search terms according to the relevance labels. The scheme is therefore not limited to a particular range of videos: content-related search terms can be generated for any type of short video, which improves the fit between the search terms and the information conveyed by the short video, so that the generated search terms are closely related to the short video content. In addition, for each short video, multiple related search terms reflecting different content angles of the short video can be produced, which enhances the diversity of the search terms.

In the video search term recommendation method, for the target video, the concept entities associated with the video content of the target video and the entity relationships among the concept entities are extracted from the preset knowledge graph to generate the associated knowledge graph corresponding to the target video. The video content text of the target video and the associated knowledge graph are input into the pre-trained search term generation model to obtain at least one candidate search term. Each candidate search term and the video content text are then input into the pre-trained search term evaluation model to obtain the relevance label corresponding to each candidate search term, and the search term to be recommended for the target video is determined according to the candidate search terms whose relevance labels meet the preset condition. In this way, search terms are generated on the basis of the video content in combination with the associated knowledge graph extracted from the knowledge graph, which improves the relevance between the generated search terms and the video content; and because the search terms to be recommended are determined according to the relevance labels, the relevance and effectiveness of the search terms can be guaranteed without manual participation.

In an exemplary embodiment, determining a search term to be recommended for a target video according to a candidate search term of which a relevance tag meets a preset condition includes: taking candidate search terms with the relevance labels meeting preset conditions as target search terms; filtering abnormal search words in at least one target search word to obtain search words to be recommended; and the abnormal search words are determined according to preset service requirements and/or playing platform rules.

In specific implementation, according to the relevance tags corresponding to the candidate search terms, the candidate search terms corresponding to the relevance tags meeting preset conditions can be screened out to serve as target search terms, then abnormal search terms can be determined according to preset service requirements and/or playing platform rules, and then the abnormal search terms in at least one target search term are filtered, so that the search terms to be recommended can be obtained.

For example, filtering and rewriting rules may be custom-configured according to business requirements, or rules may be preset to ensure that the generated search terms conform to the unified requirements of the platform (i.e., the preset service requirements and/or playing platform rules). Abnormal search terms may then be identified among the candidate search terms with strong-correlation labels and the candidate search terms with weak-correlation labels (i.e., the target search terms), and the search terms to be recommended may be obtained by filtering out the abnormal search terms.

According to the technical scheme of this embodiment, the candidate search terms whose relevance labels meet the preset condition are taken as target search terms, and the abnormal search terms among the at least one target search term are then filtered to obtain the search terms to be recommended, so that the relevance and safety of the search terms can be guaranteed without manual participation, and the risk posed by the displayed search terms is reduced.
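
The selection of target search terms by relevance label can be sketched as follows. This is a minimal illustration only: the label strings and the set `ACCEPTED_LABELS` are assumptions standing in for the "preset condition", which this embodiment describes as keeping strong-correlation and weak-correlation candidates.

```python
from typing import Dict, List

# Assumed label names; the preset condition here accepts strong and weak labels.
ACCEPTED_LABELS = {"strong", "weak"}


def select_target_terms(labels: Dict[str, str]) -> List[str]:
    """Return the candidate search terms whose relevance label is accepted."""
    return [term for term, label in labels.items() if label in ACCEPTED_LABELS]


# Hypothetical output of the search term evaluation model for one video.
candidate_labels = {
    "hotpot recipe": "strong",
    "kitchen tips": "weak",
    "car market": "irrelevant",
}
print(select_target_terms(candidate_labels))
```

The target search terms returned here would still pass through the abnormal search term filtering described in this embodiment before being recommended.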

In an exemplary embodiment, filtering the abnormal search terms in the at least one target search term to obtain the search terms to be recommended includes: if the target search term contains a specified word, determining the target search term to be an abnormal search term, the specified word being determined based on preset service requirements and playing platform rules; deleting the specified word from the abnormal search term to obtain a modified search term; and taking the modified search term and the non-abnormal search terms as the search terms to be recommended, a non-abnormal search term being a target search term that does not contain the specified word.

In practical application, if it is detected that a certain target search word contains a specified word, it may be determined that the certain target search word is an abnormal search word, and further, the abnormal search word may be rewritten.

Specifically, a rewrite rule may be custom-configured according to business requirements. Based on the rewrite rule, when a target search term is detected to contain a geographic word (i.e., a specified word), the target search term may be determined to be an abnormal search term, and the geographic word may then be deleted from it so as to rewrite the abnormal search term. For example, for an abnormal search term containing a geographic word such as "Heilongjiang", the geographic word "Heilongjiang" may be deleted from the abnormal search term.

According to the technical scheme of the embodiment, if the target search word contains the specified word, the target search word is judged to be the abnormal search word, the specified word in the abnormal search word is deleted, the modified search word is obtained, and the modified search word and the non-abnormal search word are used as the search word to be recommended, so that the relevance and the effectiveness of the search word can be guaranteed on the basis of the post-processing logic of the rewriting of the search word without manual participation.
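
The rewriting logic above can be sketched as follows. The function name, the example word list and the decision to drop a term that becomes empty after deletion are assumptions made for illustration; the embodiment only specifies that the specified word is deleted from the abnormal search term.

```python
from typing import Iterable, List


def rewrite_terms(target_terms: Iterable[str], specified_words: Iterable[str]) -> List[str]:
    """Delete specified words from target terms; untouched terms pass through."""
    words = list(specified_words)
    result = []
    for term in target_terms:
        rewritten = term
        for word in words:
            rewritten = rewritten.replace(word, "")
        # Tidy the whitespace left behind by the deletion.
        rewritten = " ".join(rewritten.split())
        if rewritten:  # assumption: a term emptied by rewriting is dropped
            result.append(rewritten)
    return result


# Hypothetical rule: geographic words are the specified words.
print(rewrite_terms(["Heilongjiang ski resort", "ski gear"], ["Heilongjiang"]))
```

Non-abnormal terms are returned unchanged, so the result is the union of modified and non-abnormal search terms described above.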

In an exemplary embodiment, the filtering the abnormal search terms in the at least one target search term to obtain the search terms to be recommended includes: if the target search word contains a preset word and/or the word representation concept of the target search word is matched with the preset abnormal representation concept, judging the target search word as an abnormal search word; the preset words and the preset abnormal representation concepts are determined based on preset service requirements and playing platform rules; and deleting the abnormal search word in the at least one target search word to obtain the search word to be recommended.

In practical application, if it is detected that a target search term contains a preset word and/or that the word representation concept of the target search term matches a preset abnormal representation concept, the target search term may be determined to be an abnormal search term and may be filtered out; the target search terms remaining after the abnormal search terms are deleted may be used as the search terms to be recommended.

Specifically, a filtering rule may be custom-configured according to service requirements. Based on the filtering rule, when a target search term is detected to contain an illegal word (i.e., a preset word), the target search term may be determined to be an abnormal search term and then filtered out. Alternatively, according to a rule for ensuring that the generated search terms conform to the unified requirements of the platform, when the word representation concept of a target search term is detected to match a preset abnormal representation concept, the target search term may be determined to be an abnormal search term and then filtered out.

For example, abnormal search terms containing the illegal word "advertisement" may be filtered out directly; abnormal search terms determined to relate to concepts such as "pornography" and "politics" (i.e., the preset abnormal representation concepts) may also be filtered out directly.

According to the technical scheme, if the target search word comprises the preset word and/or the word representation concept of the target search word is matched with the preset abnormal representation concept, the target search word is judged to be the abnormal search word, the abnormal search word is deleted from at least one target search word, the search word to be recommended is obtained, and the relevance and the effectiveness of the search word can be guaranteed without manual participation based on post-processing logic of search word filtering.
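
The filtering logic above can be sketched as follows. How a term's representation concept is obtained is not specified in this embodiment, so `concept_of` is a hypothetical callable (a classifier or a lookup) supplied by the caller, and the toy lookup table is illustrative only.

```python
from typing import Callable, Iterable, List, Set


def filter_terms(
    target_terms: Iterable[str],
    preset_words: Set[str],
    abnormal_concepts: Set[str],
    concept_of: Callable[[str], str],
) -> List[str]:
    """Drop terms that contain a preset word or map to an abnormal concept."""
    kept = []
    for term in target_terms:
        has_preset_word = any(word in term for word in preset_words)
        concept_hit = concept_of(term) in abnormal_concepts
        if not (has_preset_word or concept_hit):
            kept.append(term)
    return kept


# Toy concept lookup standing in for a real concept classifier (assumption).
toy_concepts = {"election rumours": "politics"}


def toy_concept_of(term: str) -> str:
    return toy_concepts.get(term, "other")


terms = ["cheap phone advertisement", "election rumours", "phone review"]
print(filter_terms(terms, {"advertisement"}, {"pornography", "politics"}, toy_concept_of))
```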

In an exemplary embodiment, the pre-trained search term generation model has a pre-trained encoder and a pre-trained decoder, and inputting the video content text and the associated knowledge graph of the target video into the pre-trained search term generation model to obtain at least one candidate search term includes: inputting the video content text and the associated knowledge graph into the pre-trained encoder to obtain an encoding result, the encoding result comprising a first encoding result obtained by encoding the video content text and a second encoding result obtained by encoding the associated knowledge graph; and inputting the encoding result into the pre-trained decoder to obtain at least one candidate search term, the candidate search term being obtained by the pre-trained decoder decoding a fused encoding result of the first encoding result and the second encoding result.

In a specific implementation, the pre-trained search term generation model may employ a Transformer-based bert2bert model (a model with an encoder-decoder structure). The pre-trained search term generation model may have a pre-trained encoder and a pre-trained decoder, and the parameters of both the encoder and the decoder may be initialized from pre-trained BERT (a pre-trained model with strong generalization capability).

Based on the pre-trained search term generation model, the video content text and the associated knowledge graph are input into the pre-trained encoder; the video content text is encoded to obtain a first encoding result, and the associated knowledge graph is encoded to obtain a second encoding result. The fused encoding result of the first encoding result and the second encoding result is then decoded by the pre-trained decoder to obtain at least one candidate search term.

In an example, in the encoding process, the video content text may be encoded; for example, video text information of multiple modalities, including the video title, the speech text (ASR), the image text (OCR) and the like, may be fused to enhance the understanding of the video content. Knowledge graph enhancement may also be performed, that is, the video semantic information may be enriched by incorporating prior knowledge from the knowledge graph (i.e., the associated knowledge graph). For example, the concept entities associated with the video content and the entity relationships among the concept entities are extracted from the knowledge graph as prior knowledge, and the prior knowledge is then encoded separately as a path semantic context to constrain search term generation in the decoding process, thereby improving the quality of the generated search terms.
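
How the two encoder inputs might be prepared can be sketched as follows. This is an illustration under stated assumptions: representing the associated knowledge graph as (head, relation, tail) triples and turning each triple into one text line are choices made here for the example, and the embodiment does not prescribe a serialization format.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]  # assumed (head entity, relation, tail entity)


def merge_video_text(title: str, asr_text: str, ocr_text: str, sep: str = " ") -> str:
    """Fuse title, speech text (ASR) and image text (OCR) into one content text."""
    parts = [p.strip() for p in (title, asr_text, ocr_text) if p and p.strip()]
    return sep.join(parts)


def linearize_graph(triples: Iterable[Triple]) -> List[str]:
    """Turn each triple into a short text path usable as separate context."""
    return [f"{head} {relation} {tail}" for head, relation, tail in triples]


content = merge_video_text("Sichuan hotpot at home", "today we cook hotpot", "")
paths = linearize_graph([("hotpot", "is a", "Sichuan cuisine")])
print(content)
print(paths)
```

The content text and the graph paths are kept as two separate inputs, mirroring the first and second encoding results that the decoder later fuses.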

In yet another example, several kinds of control may be applied in the decoding process. Diversity control may be performed, that is, sampling is performed using a Top-p decoding strategy so as to increase the diversity of the generated results. Repeated character control may also be performed, that is, repeated text segments are penalized so as to reduce the generation of repeated segments by the model and to avoid search terms with repeated fragments (such as "second-hand second-hand vehicle market"): at each decoding step, the probability of any character that would form a repeated segment may be directly set to 0 so that the character cannot be sampled; for example, if the decoder has generated the prefix "second-hand second", the probability of the character "hand" may be set to 0. Word length control may also be performed, that is, the length of the generated search term and whether it contains a specific keyword may be controlled by changing the numbers in the input search term control code. Correlation control may also be performed, that is, in the decoding process the generated words may be constrained by fusing the prior knowledge in the knowledge graph with the video content text, thereby improving the semantic correlation between the generated search terms and the video content.
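
A single decoding step combining repeated character control and Top-p sampling can be sketched as follows. This is a simplified stand-alone illustration over a token-to-probability dictionary, not the decoder of the embodiment; the function names, the bigram window `n=2` and the default `p=0.9` are assumptions.

```python
import random
from typing import Dict, Optional, Sequence


def block_repeats(probs: Dict[str, float], prefix: Sequence[str], n: int = 2) -> Dict[str, float]:
    """Set to 0 the probability of tokens that would repeat an n-gram in prefix."""
    seen = {tuple(prefix[i:i + n]) for i in range(len(prefix) - n + 1)}
    tail = tuple(prefix[len(prefix) - (n - 1):])
    return {tok: (0.0 if tail + (tok,) in seen else p) for tok, p in probs.items()}


def top_p_filter(probs: Dict[str, float], p: float = 0.9) -> Dict[str, float]:
    """Keep the smallest high-probability set whose mass reaches p, renormalized."""
    total = sum(probs.values())
    if total <= 0.0:
        raise ValueError("no token has positive probability")
    ranked = sorted(((t, q / total) for t, q in probs.items()), key=lambda kv: kv[1], reverse=True)
    kept, mass = {}, 0.0
    for tok, q in ranked:
        if q <= 0.0:
            break
        kept[tok] = q
        mass += q
        if mass >= p:
            break
    return {tok: q / mass for tok, q in kept.items()}


def sample_next(probs: Dict[str, float], prefix: Sequence[str], p: float = 0.9,
                rng: Optional[random.Random] = None) -> str:
    """Block repeats, apply Top-p, then sample one token."""
    rng = rng or random.Random()
    nucleus = top_p_filter(block_repeats(probs, prefix), p)
    tokens = list(nucleus)
    return rng.choices(tokens, weights=[nucleus[t] for t in tokens], k=1)[0]


# After the prefix "second", "hand", "second", the token "hand" is blocked.
step_probs = {"hand": 0.6, "car": 0.3, "market": 0.1}
print(sample_next(step_probs, ["second", "hand", "second"], p=0.7))
```

Blocking is applied before the Top-p cut so that a blocked token cannot occupy probability mass in the nucleus.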

According to the technical scheme, the video content text and the associated knowledge graph are input to the pre-trained encoder to obtain the encoding result, the encoding result is input to the pre-trained decoder to obtain at least one candidate search word, diversity and readability of the generated search word can be improved based on diversity control, repeated character control, word length control and correlation control, the word length can be defined by user, specific keywords can be contained, and correlation of the generated search word is improved.

In an exemplary embodiment, inputting the video content text and the associated knowledge-graph to a pre-trained encoder to obtain an encoding result, comprises: splicing a preset search word control code with a video content text to obtain a spliced text; the search word control code comprises a word length control code and a keyword control code; the word length control code is used for controlling the word length of the candidate search word; the keyword control code is used for controlling whether the candidate search words contain keywords corresponding to the keyword control code; and inputting the spliced text and the associated knowledge graph into a pre-trained encoder to obtain an encoding result.

In an example, a service scenario may require that the search term not be too long, and product marketing may require that the search term contain a specific keyword (such as a brand name). To meet these requirements, a preset search term control code may be used. The search term control code may include a word length control code and a keyword control code, where the word length control code may be used to control the word length of the candidate search term, and the keyword control code may be used to control whether the candidate search term contains the keyword corresponding to the keyword control code. The search term control code is spliced with the video content text to obtain a spliced text, and the spliced text and the associated knowledge graph are then input into the pre-trained encoder to obtain the encoding result.

For example, the spliced text may be represented as follows:

[ START ]6[ INTERLEAVER ] Brand name xx [ INTERLEAVER ]1[ INTERLEAVER ] XXXXXXXYYYYYYY

Here, the number 6 between the start character ([ START ]) and the first spacer ([ INTERLEAVER ]) may indicate the length of the search term to be generated (i.e., the word length of the candidate search term); "Brand name xx" between the first and second spacers may be the keyword to be included in the search term (i.e., the keyword corresponding to the keyword control code); the number between the second and third spacers may indicate whether the keyword is to be included (e.g., 1 indicates yes and 0 indicates no); and XXXXXXXYYYYYYY following the third spacer may be the text content of the short video (i.e., the video content text).

According to the technical scheme, the spliced text is obtained by splicing the preset search word control code with the video content text, and then the spliced text and the associated knowledge map are input to the pre-trained encoder to obtain the encoding result, so that the word length of the generated search word and whether the generated search word contains the specified keyword can be finely controlled.
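
The splicing of the search term control code with the video content text can be sketched as follows. The function name and the marker strings `[START]` and `[SPACER]` are placeholders chosen for this illustration; the actual special tokens depend on the vocabulary of the model.

```python
def build_spliced_text(
    video_text: str,
    word_length: int,
    keyword: str = "",
    require_keyword: bool = False,
    start: str = "[START]",    # placeholder start marker (assumption)
    spacer: str = "[SPACER]",  # placeholder spacer marker (assumption)
) -> str:
    """Prefix the video content text with word length and keyword control codes."""
    flag = "1" if require_keyword else "0"
    return f"{start}{word_length}{spacer}{keyword}{spacer}{flag}{spacer}{video_text}"


print(build_spliced_text("XXXXXXXYYYYYYY", 6, "brand xx", True))
```

Changing `word_length` or `require_keyword` changes only the control prefix, which is how the same video content text can yield search terms of different lengths or with and without the keyword.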

In an exemplary embodiment, before the step of inputting the video content text of the target video and the associated knowledge graph into the pre-trained search term generation model to obtain at least one candidate search term, the method further includes: acquiring first training sample data; each first training sample data comprises a video content text of a first sample video and a first sample search word of the first sample video, and the click times of a user account corresponding to the first sample search word are larger than a preset click time threshold; extracting concept entities associated with video content of the first sample video and entity relations among the concept entities from a preset knowledge graph aiming at the first sample video to generate an associated knowledge graph corresponding to the first sample video; training a search term generation model to be trained based on a video content text of the first sample video, an associated knowledge map corresponding to the first sample video and the first sample search term to obtain a pre-trained search term generation model.

In an example, the first training sample data may be obtained from the click logs of online short video search. A click log records, in the background of the short video platform, the behavior of a user clicking a short video after searching. Based on the click logs, the search terms associated with each short video and the number of clicks for each search term can be obtained. Then, for each short video, search terms with a high number of clicks (i.e., first sample search terms) may be selected according to the click count data, for example search terms whose number of clicks is greater than 100 (i.e., the preset click count threshold), and the video content text of the first sample video and the first sample search term of the first sample video may be used as training sample data for the search term generation model.

For example, the input data and output data of the search term generation model to be trained may be as follows:

According to the technical scheme of this embodiment, the first training sample data is obtained; then, for the first sample video, the concept entities associated with the video content of the first sample video and the entity relationships among the concept entities are extracted from the preset knowledge graph to generate the associated knowledge graph corresponding to the first sample video; and the search term generation model to be trained is trained based on the video content text of the first sample video, the associated knowledge graph corresponding to the first sample video and the first sample search term, to obtain the pre-trained search term generation model. In this way, high-quality search terms can be screened out from log data according to the number of clicks to construct the first training sample data, no manual labeling cost is incurred, and the training efficiency of the search term generation model is improved.
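
The construction of the first training sample data from click logs can be sketched as follows. The log format (one `(video_id, search_term)` pair per click), the function name and the dictionary keys are assumptions for illustration; the associated knowledge graph, which is also a training input, is omitted for brevity.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_generation_samples(
    click_log: Iterable[Tuple[str, str]],
    video_texts: Dict[str, str],
    min_clicks: int = 100,
) -> List[Dict[str, str]]:
    """Keep (video text, search term) pairs whose click count exceeds the threshold."""
    counts = Counter(click_log)  # (video_id, search_term) -> number of clicks
    samples = []
    for (video_id, term), clicks in counts.items():
        if clicks > min_clicks and video_id in video_texts:
            samples.append({"video_text": video_texts[video_id], "search_term": term})
    return samples


# Hypothetical log: 101 clicks for one term, exactly 100 for another.
demo_log = [("v1", "hotpot recipe")] * 101 + [("v1", "spicy food")] * 100
print(build_generation_samples(demo_log, {"v1": "Sichuan hotpot at home"}))
```

With the strict comparison, a term clicked exactly 100 times is excluded, matching the "greater than" wording of the threshold.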

In an exemplary embodiment, before the step of inputting each candidate search term and the video content text into a pre-trained search term evaluation model to obtain a relevance label corresponding to each candidate search term, the method further includes: acquiring second training sample data; each second training sample data comprises a video content text of a second sample video, a second sample search word and a corresponding correlation label thereof, a random search word and a corresponding correlation label thereof, and different correlation labels correspond to different correlation degrees; and training the search term evaluation model to be trained based on the video content text of the second sample video, the second sample search terms and the corresponding correlation labels thereof, and the random search terms and the corresponding correlation labels thereof to obtain a pre-trained search term evaluation model.

In an example, a plurality of videos that each have both high-click search terms and low-click search terms (i.e., second sample videos) may be screened out. For these videos, the labels of the high-click search terms may be set to "strong correlation" and the labels of the low-click search terms may be set to "weak correlation" (i.e., second sample search terms and their corresponding relevance labels). For each video, a search term may also be randomly sampled from the search term set and its label set to "irrelevant" (i.e., a random search term and its corresponding relevance label). The video content text of the second sample video, the second sample search terms and their corresponding relevance labels, and the random search terms and their corresponding relevance labels may then be used as training sample data for the search term evaluation model.

For example, the relevance label configuration of the search term evaluation model to be trained may be as follows:

In yet another example, the search term evaluation model may employ a pre-trained BERT, and the search term evaluation model (a three-class classification model) may be obtained by fine-tuning on the constructed training data.

According to the technical scheme, the pre-trained search term evaluation model is obtained by obtaining the second training sample data and then training the search term evaluation model to be trained based on the video content text of the second sample video, the second sample search terms and the corresponding correlation labels thereof, the random search terms and the corresponding correlation labels thereof, and the correlation marking can be automatically performed according to the distribution difference of the click times and the random sampling to construct the second training sample data, so that the pre-training efficiency of the search term evaluation model is improved.
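
The automatic labeling of the second training sample data can be sketched as follows. The click threshold separating high-click from low-click terms, the label strings, and the exclusion of a video's own terms from the random pool are assumptions made for this illustration.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[str, str, str]  # (video content text, search term, relevance label)


def build_evaluation_samples(
    video_text: str,
    term_clicks: Dict[str, int],
    term_pool: Sequence[str],
    high_threshold: int = 100,
    rng: Optional[random.Random] = None,
) -> List[Sample]:
    """Label high-click terms strong, low-click terms weak, a random term irrelevant."""
    rng = rng or random.Random()
    samples: List[Sample] = []
    for term, clicks in term_clicks.items():
        label = "strong" if clicks > high_threshold else "weak"
        samples.append((video_text, term, label))
    # Assumption: terms already clicked for this video are not used as negatives.
    negatives = [t for t in term_pool if t not in term_clicks]
    if negatives:
        samples.append((video_text, rng.choice(negatives), "irrelevant"))
    return samples


demo_clicks = {"hotpot recipe": 500, "kitchen tips": 3}
demo_pool = ["hotpot recipe", "used car market"]
print(build_evaluation_samples("Sichuan hotpot at home", demo_clicks, demo_pool))
```

The three labels produced here correspond to the three classes of the search term evaluation model.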

In order to enable those skilled in the art to better understand the above steps, the embodiment of the present disclosure is illustrated below by an example, but it should be understood that the embodiment of the present disclosure is not limited thereto.

As shown in fig. 3b, the video search term recommendation process may include three modules: a search term generation module, a search term evaluation module and a search term filtering module. In process S001, based on the search term generation module (i.e., the pre-trained search term generation model), search terms (i.e., at least one candidate search term) may be generated according to the short video text content (i.e., the video content text of the target video) and the associated knowledge graph. In process S002, based on the search term evaluation module (i.e., the pre-trained search term evaluation model), the relevance between each search term obtained in process S001 and the short video may be scored to obtain a relevance label corresponding to each search term (i.e., the relevance label corresponding to each candidate search term). In processes S003 and S004, based on the search term filtering module, search term rewriting/filtering and risk control filtering may be performed, according to preset service requirements and/or playing platform rules, on the search terms with strong-correlation labels and the search terms with weak-correlation labels (i.e., the target search terms) obtained in process S002, so as to obtain the short video search terms (i.e., the search terms to be recommended), which guide the user account to execute a search operation after accessing the short video.
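
The three-module flow can be sketched as follows, with each module passed in as a callable. The function names, the triple representation of the associated knowledge graph and the stub modules are hypothetical; real generation and evaluation would call the pre-trained models.

```python
from typing import Callable, List, Sequence, Tuple

Triple = Tuple[str, str, str]  # assumed (head entity, relation, tail entity)


def recommend_search_terms(
    video_text: str,
    graph: Sequence[Triple],
    generate: Callable[[str, Sequence[Triple]], List[str]],
    evaluate: Callable[[str, str], str],
    post_process: Callable[[List[str]], List[str]],
    accepted: Tuple[str, ...] = ("strong", "weak"),
) -> List[str]:
    """Run generation, evaluation and filtering in sequence."""
    candidates = generate(video_text, graph)                   # process S001
    labels = {c: evaluate(c, video_text) for c in candidates}  # process S002
    targets = [c for c in candidates if labels[c] in accepted]
    return post_process(targets)                               # processes S003 and S004


# Stub modules standing in for the trained models and rule sets (assumptions).
def stub_generate(text: str, graph: Sequence[Triple]) -> List[str]:
    return ["hotpot recipe", "hotpot advertisement", "car market"]


def stub_evaluate(term: str, text: str) -> str:
    return "strong" if "hotpot" in term else "irrelevant"


def stub_post_process(terms: List[str]) -> List[str]:
    return [t for t in terms if "advertisement" not in t]


print(recommend_search_terms("Sichuan hotpot at home", [], stub_generate,
                             stub_evaluate, stub_post_process))
```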

Fig. 4 is a flowchart illustrating another video search term recommendation method according to an example embodiment, which may be used in the server 120 of fig. 1, as shown in fig. 4, and includes the following steps.

In step S410, first training sample data is acquired; each first training sample data comprises a video content text of a first sample video and a first sample search term of the first sample video, and the number of clicks of user accounts corresponding to the first sample search term is larger than a preset click count threshold.

In step S420, for the first sample video, concept entities associated with the video content of the first sample video and entity relationships between the concept entities are extracted from a preset knowledge graph, and an associated knowledge graph corresponding to the first sample video is generated.

In step S430, a search term generation model to be trained is trained based on the video content text of the first sample video, the associated knowledge graph corresponding to the first sample video, and the first sample search term, to obtain the pre-trained search term generation model.

In step S440, second training sample data is acquired; each second training sample data comprises a video content text of a second sample video, a second sample search term and its corresponding relevance label, and a random search term and its corresponding relevance label, and different relevance labels correspond to different degrees of relevance.

In step S450, a search term evaluation model to be trained is trained based on the video content text of the second sample video, the second sample search term and its corresponding relevance label, and the random search term and its corresponding relevance label, to obtain the pre-trained search term evaluation model.

In step S460, for a target video, concept entities associated with the video content of the target video and entity relationships between the concept entities are extracted from a preset knowledge graph, and an associated knowledge graph corresponding to the target video is generated.

In step S470, the video content text of the target video and the associated knowledge graph are input into the pre-trained search term generation model to obtain at least one candidate search term.

In step S480, each candidate search term and the video content text are input into the pre-trained search term evaluation model to obtain a relevance label corresponding to each candidate search term; the relevance label is used for characterizing the degree of relevance between the candidate search term and the video content of the target video.

In step S490, a search term to be recommended for the target video is determined according to the candidate search terms whose relevance labels meet a preset condition; the search term to be recommended is used for guiding a user account to execute a search operation after accessing the target video.

It should be noted that, for the specific limitations of the above steps, reference may be made to the specific limitations of the video search term recommendation method described above, and details are not repeated here.

It should be understood that, although the steps in the flowcharts of fig. 1 and 4 are shown in sequence as indicated by the arrows, the steps are not necessarily performed in that sequence. Unless explicitly stated herein, the execution of the steps is not strictly limited to the order shown, and the steps may be performed in other orders. Moreover, at least some of the steps in fig. 1 and 4 may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

It is understood that the same or similar parts of the method embodiments described above in this specification may be referred to each other; each embodiment focuses on its differences from the other embodiments, and for the relevant points reference may be made to the descriptions of the other method embodiments.

Fig. 5 is a block diagram illustrating a video search term recommendation apparatus in accordance with an exemplary embodiment. Referring to fig. 5, the apparatus includes:

an associated knowledge graph generating unit 501, configured to extract, for a target video, concept entities associated with video content of the target video and entity relationships between the concept entities from a preset knowledge graph, and generate an associated knowledge graph corresponding to the target video;

a candidate search term obtaining unit 502 configured to perform input of the video content text of the target video and the associated knowledge graph to a pre-trained search term generation model, so as to obtain at least one candidate search term;

a relevance label obtaining unit 503, configured to perform input of each candidate search term and the video content text into a pre-trained search term evaluation model, so as to obtain a relevance label corresponding to each candidate search term; the relevance label is used for representing the relevance degree of the candidate search word and the video content of the target video;

a to-be-recommended search term determining unit 504 configured to determine a search term to be recommended for the target video according to the candidate search terms whose relevance labels satisfy a preset condition; and the search term to be recommended is used for guiding a user account to execute a search operation after accessing the target video.

In a possible implementation manner, the to-be-recommended search term determining unit 504 is specifically configured to take the candidate search terms whose relevance labels meet a preset condition as target search terms, and to filter the abnormal search terms in at least one target search term to obtain the search terms to be recommended; the abnormal search terms are determined according to preset service requirements and/or playing platform rules.

In a possible implementation manner, the search term to be recommended determining unit 504 is specifically further configured to determine that the target search term is the abnormal search term if the target search term includes a specified term; the specified words are determined based on preset service requirements and playing platform rules; deleting the specified words in the abnormal search words to obtain modified search words; taking the modified search terms and the non-abnormal search terms as the search terms to be recommended; the non-abnormal search word is a target search word which does not contain the specified word.

In a possible implementation manner, the search term to be recommended determining unit 504 is specifically further configured to determine that the target search term is the abnormal search term if the target search term includes a preset term and/or a term representation concept of the target search term matches a preset abnormal representation concept; the preset words and the preset abnormal representation concepts are determined based on preset service requirements and playing platform rules; and deleting the abnormal search word in at least one target search word to obtain the search word to be recommended.

In a possible implementation manner, the pre-trained search term generation model has a pre-trained encoder and a pre-trained decoder, and the candidate search term obtaining unit 502 is specifically configured to perform input of the video content text and the associated knowledge map to the pre-trained encoder to obtain an encoding result; the coding result comprises a first coding result obtained by coding the video content text and a second coding result obtained by coding the associated knowledge graph; inputting the coding result to the pre-trained decoder to obtain at least one candidate search term; the candidate search term is obtained by decoding a fused coding result between the first coding result and the second coding result through the pre-trained decoder.

In a possible implementation manner, the candidate search term obtaining unit 502 is specifically further configured to perform stitching of a preset search term control code and the video content text to obtain a stitched text; the search word control code comprises a word length control code and a keyword control code; the word length control code is used for controlling the word length of the candidate search word; the keyword control code is used for controlling whether the candidate search word contains a keyword corresponding to the keyword control code; and inputting the spliced text and the associated knowledge graph into the pre-trained encoder to obtain an encoding result.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 6 is a block diagram illustrating an electronic device 600 for performing a video search term recommendation method according to an example embodiment. For example, the electronic device 600 may be a server. Referring to Fig. 6, the electronic device 600 includes a processing component 620, which further includes one or more processors, and memory resources, represented by memory 622, for storing instructions, such as application programs, that are executable by the processing component 620. The application programs stored in the memory 622 may include one or more modules, each of which corresponds to a set of instructions. Further, the processing component 620 is configured to execute the instructions to perform the above-described method.

The electronic device 600 may further include a power component 624 configured to perform power management for the electronic device 600, a wired or wireless network interface 626 configured to connect the electronic device 600 to a network, and an input/output (I/O) interface 628. The electronic device 600 may operate based on an operating system stored in the memory 622, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, or the like.

In an exemplary embodiment, a computer-readable storage medium comprising instructions is also provided, such as the memory 622 comprising instructions, where the instructions are executable by the processor of the electronic device 600 to perform the above-described method. The storage medium may be, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, or the like.

In an exemplary embodiment, a computer program product is also provided, which includes instructions executable by a processor of the electronic device 600 to perform the above-described method.

It should be noted that the above-described apparatus, electronic device, computer-readable storage medium, computer program product, and the like may also include other embodiments corresponding to the method embodiments. For specific implementations, refer to the descriptions of the related method embodiments, which are not repeated here.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.

It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A video search term recommendation method, the method comprising:

2. The method according to claim 1, wherein the determining a search term to be recommended for the target video according to the candidate search terms whose relevance labels satisfy a preset condition comprises:

3. The method of claim 1, wherein the pre-trained search term generation model has a pre-trained encoder and a pre-trained decoder, and the inputting the video content text of the target video and the associated knowledge-graph into the pre-trained search term generation model to obtain at least one candidate search term comprises:

4. The method of claim 3, wherein the inputting the video content text and the associated knowledge-graph to the pre-trained encoder to obtain an encoding result comprises:

5. The method of claim 1, wherein before the step of inputting the video content text of the target video and the associated knowledge-graph into a pre-trained search term generation model to obtain at least one candidate search term, the method further comprises:

6. The method according to any one of claims 1 to 5, wherein before the step of inputting each candidate search term and the video content text into a pre-trained search term evaluation model to obtain a relevance label corresponding to each candidate search term, the method further comprises:

7. A video search term recommendation apparatus, comprising:

8. An electronic device, comprising:

a processor;

a memory for storing the processor-executable instructions;

wherein the processor is configured to execute the instructions to implement the video search term recommendation method of any of claims 1 to 6.

9. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the video search term recommendation method of any of claims 1-6.

10. A computer program product comprising instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the video search term recommendation method of any one of claims 1 to 6.