CN111353070B - Video title processing method and device, electronic device and readable storage medium

Info

Publication number: CN111353070B (grant of application CN202010098765.9A)
Authority: CN (China)
Prior art keywords: video, title, processed, entity, information
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN111353070A
Inventors: 卞东海 (Bian Donghai), 蒋帅 (Jiang Shuai), 罗雨 (Luo Yu)
Current and original assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
Application filed by Beijing Baidu Netcom Science and Technology Co., Ltd.; priority to CN202010098765.9A.

Classifications

    • G06F16/7867: Information retrieval of video data; retrieval characterised by using metadata generated manually, e.g. tags, keywords, comments, title and artist information
    • G06F16/367: Information retrieval of unstructured textual data; creation of semantic tools; ontology
    • G06F16/75: Information retrieval of video data; clustering; classification


Abstract

The application discloses a video title processing method and apparatus, an electronic device, and a readable storage medium, relating to video processing technology. The specific implementation scheme is as follows: acquire video information and the video type of a video to be processed; obtain a title template for that video type; obtain entity information related to the video to be processed according to a knowledge graph and the video information; and generate a video title for the video to be processed according to the title template and the entity information.

Description

Video title processing method and device, electronic device and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a video title processing method, apparatus, electronic device, and readable storage medium.
Background
In the era of internet information explosion, falling data tariffs and rising network speeds have driven rapid growth in video data, and users increasingly prefer multimedia over plain text when acquiring knowledge.
However, when a video's title is missing or insufficiently descriptive, users may struggle to find the video they want when searching. A method for generating video titles is therefore needed to improve title quality and meet users' search requirements.
Disclosure of Invention
Aspects of the present application provide a video title processing method, apparatus, electronic device, and readable storage medium to improve the quality of video titles.
In one aspect of the present application, a method for processing a video title is provided, including:
acquiring video information and video types of a video to be processed;
obtaining a title template of the video type according to the video type;
obtaining entity information related to the video to be processed according to the knowledge graph and the video information;
and generating a video title of the video to be processed according to the title template and the entity information.
In the above aspect and any possible implementation thereof, an implementation is further provided in which, before the acquiring of the video information and the video type of the video to be processed, the method further includes:
taking a current video without a video title as the video to be processed; and/or
performing discrimination processing on the existing title of a current video to determine whether the existing title needs to be adjusted, and taking a current video whose existing title needs to be adjusted as the video to be processed.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the obtaining, according to a knowledge graph and the video information, of the entity information related to the video to be processed includes:
obtaining symbol data according to the video information; and
performing entity identification processing on the video information by using the knowledge graph and the symbol data, so as to obtain the entity information related to the video to be processed.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the knowledge graph is a general knowledge graph; the video information includes at least one of title data, subtitle data, and voice data; and the obtaining of the entity information related to the video to be processed according to the knowledge graph and the video information includes:
performing text-based entity identification processing on the video information by using the general knowledge graph, so as to obtain the entity information related to the video to be processed.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the knowledge graph is a video knowledge graph; the video information includes video feature data; and the obtaining of the entity information related to the video to be processed according to the knowledge graph and the video information includes:
performing feature-based entity identification processing on the video information by using the video knowledge graph, so as to obtain the entity information related to the video to be processed.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the generating, according to the title template and the entity information, of the video title of the video to be processed includes:
combining the entity information to obtain an entity combination; and
organizing the entity combination by using the title template, so as to generate the video title of the video to be processed.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the generating of the video title of the video to be processed further includes:
performing syntactic analysis on the video information to determine a pronoun; and
replacing the pronoun with the entity information with the highest occurrence frequency, so as to generate the video title of the video to be processed.
In the above aspect and any possible implementation thereof, an implementation is further provided in which the generating of the video title of the video to be processed further includes:
generating the video title of the video to be processed by using a title adjustment model according to the existing title of the video to be processed.
In another aspect of the present application, there is provided a video title processing apparatus, including:
an acquisition unit configured to acquire video information and a video type of a video to be processed;
a template unit configured to obtain a title template of the video type according to the video type;
an entity unit configured to obtain entity information related to the video to be processed according to a knowledge graph and the video information; and
a generating unit configured to generate a video title of the video to be processed according to the title template and the entity information.
In the above aspect and any possible implementation thereof, the acquisition unit is further configured to:
take a current video without a video title as the video to be processed; and/or
perform discrimination processing on the existing title of a current video to determine whether the existing title needs to be adjusted, and take a current video whose existing title needs to be adjusted as the video to be processed.
In the above aspect and any possible implementation thereof, the entity unit is specifically configured to:
obtain symbol data according to the video information; and
perform entity identification processing on the video information by using the knowledge graph and the symbol data, so as to obtain the entity information related to the video to be processed.
In the above aspect and any possible implementation thereof, the knowledge graph is a general knowledge graph, the video information includes at least one of title data, subtitle data, and voice data, and the entity unit is specifically configured to
perform text-based entity identification processing on the video information by using the general knowledge graph, so as to obtain the entity information related to the video to be processed.
In the above aspect and any possible implementation thereof, the knowledge graph is a video knowledge graph, the video information includes video feature data, and the entity unit is specifically configured to
perform feature-based entity identification processing on the video information by using the video knowledge graph, so as to obtain the entity information related to the video to be processed.
In the above aspect and any possible implementation thereof, the generating unit is specifically configured to:
combine the entity information to obtain an entity combination; and
organize the entity combination by using the title template, so as to generate the video title of the video to be processed.
In the above aspect and any possible implementation thereof, the generating unit is further configured to:
perform syntactic analysis on the video information to determine a pronoun; and
replace the pronoun with the entity information with the highest occurrence frequency, so as to generate the video title of the video to be processed.
In the above aspect and any possible implementation thereof, the generating unit is further configured to
generate the video title of the video to be processed by using a title adjustment model according to the existing title of the video to be processed.
In another aspect of the present application, there is provided an electronic device, including:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, the instructions being executed by the at least one processor to enable the at least one processor to perform the method of the above aspect and any possible implementation thereof.
In another aspect of the application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the above aspect and any possible implementation thereof.
According to the above technical solutions, a video title is automatically generated on the basis of the video to be processed and a knowledge graph. Because both the video information of the video to be processed and its related entity information are taken into account, the generated title can describe the main features of the video in a targeted manner, improving the quality of video titles.
In addition, with the technical solution provided by the application, the existing titles of current videos are analyzed so that only a current video without a video title, or one whose existing title needs to be adjusted, is taken as the video to be processed, for which a video title is then automatically generated on the basis of the video and the knowledge graph. Since video titles need not be regenerated for all current videos, the processing efficiency of video titles is improved.
In addition, with the technical solution provided by the application, symbol data are obtained from the video information of the video to be processed, and entity identification processing is then performed on the video information by using the knowledge graph and the symbol data, so as to obtain the entity information related to the video to be processed.
In addition, with a general knowledge graph, text-based entity recognition can be performed on video information containing at least one of the title data, subtitle data, and voice data of the video to be processed, so as to obtain the entity information related to the video to be processed, on the basis of which the video title is then automatically generated.
In addition, with a video knowledge graph to which video features have been added, feature-based entity identification can be performed on video information containing the video feature data of the video to be processed, so as to obtain the entity information related to the video to be processed, on the basis of which the video title is then automatically generated. Because both the video information and the entity information related to the video features are considered, the main features of the video can be described in a targeted manner, improving the quality of video titles.
In addition, with the technical solution provided by the application, the entity information related to the video is combined to obtain an entity combination, which can then be organized by using the title template to generate the video title of the video to be processed.
In addition, with the technical solution provided by the application, the user experience can be effectively improved.
Other effects of the above aspects or possible implementations will be described below in connection with specific embodiments.
Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. The drawings described below illustrate only some embodiments of the application, and a person of ordinary skill in the art can derive other drawings from them without inventive effort. The drawings are provided for a better understanding of the present solution and do not limit the application. Wherein:
Fig. 1 is a flowchart illustrating a method for processing a video title according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a video title processing apparatus according to another embodiment of the present application;
Fig. 3 is a schematic diagram of an electronic device for implementing the video title processing method according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be noted that the terminal according to the embodiments of the present application may include, but is not limited to, a mobile phone, a personal digital assistant (Personal Digital Assistant, PDA), a wireless handheld device, a tablet computer, a personal computer (Personal Computer, PC), an MP3 player, an MP4 player, a wearable device (for example, smart glasses, a smart watch, or a smart bracelet), and a smart device such as a smart home device.
In addition, the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
In the age of internet information explosion, applications and products built around network data attract great attention. Against the background of content ecosystem construction, each internet company needs its own unique content resources to form its own unique competitiveness.
The richness, watchability, and accuracy of the content presented in search results are key to a good user experience. At present, falling data tariffs and rising network speeds have driven rapid growth in video data, and users increasingly prefer multimedia over plain text when acquiring knowledge; however, because video titles are missing or insufficiently descriptive, users find it difficult to locate the videos they want when searching.
Therefore, the video title generation method herein can generate a suitable video title for a video whose title is missing and adjust video titles of poor quality, thereby meeting users' search requirements and improving their search experience.
Fig. 1 is a flowchart illustrating a video title processing method according to an embodiment of the application. As shown in Fig. 1, the method includes the following steps.
101. And acquiring video information and video types of the video to be processed.
102. And obtaining a title template of the video type according to the video type.
103. And obtaining the entity information related to the video to be processed according to the knowledge graph and the video information.
104. And generating a video title of the video to be processed according to the title template and the entity information.
The knowledge graph is a network knowledge base that connects entities carrying attributes through relationships. Viewed as a graph, it is essentially a concept network in which nodes represent entities (or concepts) in the physical world and the various semantic relationships between entities form the directed edges of the network. The knowledge graph is thus a symbolic representation of the physical world.
It should be noted that part or all of the execution subjects of 101 to 104 may be an application located in the local terminal, or a functional unit such as a plug-in or software development kit (Software Development Kit, SDK) provided in an application located in the local terminal, or a processing engine located in a server on the network side, or a distributed system located on the network side, for example, a processing engine or a distributed system in a video processing platform on the network side, which is not particularly limited in this embodiment.
It will be appreciated that the application may be a native program (native app) installed on the terminal, or may also be a web page program (webApp) of a browser on the terminal, which is not limited in this embodiment.
In this way, the video title is automatically generated based on the video to be processed and the knowledge graph. Because both the video information of the video to be processed and its related entity information are considered, the main features of the video can be described in a targeted manner, improving the quality of the video title.
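By way of illustration only, the four-step flow of 101 to 104 can be sketched as follows. Every function name, template, and stub value in this sketch is a hypothetical stand-in rather than an identifier from the patent; the stubs merely mark where the processing described in the following sections would occur.

```python
from typing import Dict

# Hypothetical title templates per video type; the patent configures templates
# per type (see 102) but does not disclose their contents.
TITLE_TEMPLATES: Dict[str, str] = {
    "music": "{work}: performed by {artist}",
    "sports": "{player} highlights from {event}",
}

def extract_video_info(video: dict) -> dict:
    """Step 101 (stub): gather title/subtitle/voice/feature data and the type."""
    return {"title_data": video.get("title", ""),
            "type": video.get("type", "music")}

def recognize_entities(info: dict) -> dict:
    """Step 103 (stub): entity recognition against a knowledge graph."""
    return {"work": "Liang Zhu", "artist": "Lv Saiqing"}

def generate_video_title(video: dict) -> str:
    info = extract_video_info(video)          # 101: video info and video type
    template = TITLE_TEMPLATES[info["type"]]  # 102: type-specific title template
    entities = recognize_entities(info)       # 103: related entity information
    return template.format(**entities)        # 104: organize entities into a title

print(generate_video_title({"title": "", "type": "music"}))
# Liang Zhu: performed by Lv Saiqing
```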
Optionally, in a possible implementation of this embodiment, before 101, the current video may be further preprocessed to determine the video that needs to be subjected to video title generation.
In a specific implementation process, the current video without the video title may be specifically used directly as the video to be processed.
In another specific implementation process, the existing title of the current video can be specifically subjected to discrimination processing to determine whether the existing title of the current video needs to be adjusted; and taking the current video with the existing title needing to be adjusted as the video to be processed.
In this implementation process, the total number of characters of the existing title of a current video can first be acquired, and an existing title whose total number of characters is less than or equal to a shortest threshold, or greater than or equal to a longest threshold, is determined to be an existing title that needs to be adjusted. Otherwise, an existing title whose total number of characters is greater than the shortest threshold and less than the longest threshold is passed on for further discrimination.
Next, the number of characters of a designated language type in the existing title is acquired, and an existing title in which the characters of the designated language type account for no more than half of the total number of characters is determined to be an existing title that needs to be adjusted. Otherwise, an existing title in which the characters of the designated language type account for more than half of the total number of characters is input into a general language processing model for further discrimination.
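A rule pre-filter along these lines might look as follows; the thresholds and the choice of Chinese as the designated language type are illustrative assumptions, since the text fixes no concrete values.

```python
import re
from typing import Optional

MIN_LEN, MAX_LEN = 5, 40  # illustrative thresholds; the text fixes no values

def needs_adjustment_by_rules(title: str) -> Optional[bool]:
    """Return True when the rules already decide the title needs adjustment,
    or None when the title should be passed on to the language model."""
    total = len(title)
    if total <= MIN_LEN or total >= MAX_LEN:
        return True                      # too short or too long
    # characters of the designated language type (Chinese as the example)
    designated = len(re.findall(r"[\u4e00-\u9fff]", title))
    if designated <= total / 2:
        return True                      # too few designated-language characters
    return None                          # undecided: defer to the model
```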
In this implementation process, the language representation model may be any of various existing language representation models for obtaining a vector representation of a video title. It may be a unidirectional language representation model, for example, a Generative Pre-Training (GPT) model or a GPT-2 model, or a bidirectional language representation model, for example, an Embeddings from Language Models (ELMo) model or a Bidirectional Encoder Representations from Transformers (BERT) model, which is not particularly limited in this embodiment.
Taking the BERT model as an example, pre-training parameters corresponding to the BERT model can be selected as the base layer according to the language type of the video title. For example, if the application scenario is primarily video title generation in Chinese, Chinese pre-training parameters may be used as the underlying basis.
In this implementation process, before the existing title is input into the language representation model, word segmentation processing may first be performed on the existing title, for example, in units of characters or by splitting on separator symbols, to obtain a word segmentation result of the existing title. The word segmentation result of the existing title is then input into the BERT model to obtain the vector representation of the existing title.
After the vector representation of the existing title of the current video is obtained, whether the existing title needs to be adjusted can be determined based on that vector representation. For example, a Softmax layer is added after the top-level output of the BERT model, outputting two probabilities: the probability that adjustment is needed and the probability that it is not. With a preset probability threshold, for example 0.5, existing titles whose adjustment probability is greater than or equal to the threshold are determined to need adjustment, and those below the threshold are not.
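A sketch of such a classifier using the Hugging Face transformers library is shown below. The bert-base-chinese checkpoint, the two-label head, and the 0.5 threshold are illustrative assumptions, and the classification head would first have to be fine-tuned on titles labeled "needs adjustment" / "no adjustment" before its output is meaningful.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese",
                                                      num_labels=2)
model.eval()  # assumes the 2-label head was fine-tuned beforehand

def needs_adjustment_by_model(title: str, threshold: float = 0.5) -> bool:
    # the Chinese BERT tokenizer performs character-level segmentation
    inputs = tokenizer(title, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    p_adjust = torch.softmax(logits, dim=-1)[0, 1].item()  # P(needs adjustment)
    return p_adjust >= threshold
```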
In this implementation manner, through analysis of the existing titles of current videos, only a current video without a video title, or one whose existing title needs to be adjusted, is taken as the video to be processed, for which a video title is then automatically generated on the basis of the video and the knowledge graph. Since video titles need not be regenerated for all current videos, the processing efficiency of video titles is improved.
Alternatively, in one possible implementation manner of the present embodiment, in 101, the acquired video information of the video to be processed may include, but is not limited to, at least one of title data, subtitle data, voice data, and video feature data of the video to be processed, which is not particularly limited in this embodiment.
In a specific implementation process, text recognition processing may be specifically performed on an existing title of a video to be processed, so as to obtain title data of the video to be processed.
In another specific implementation process, text recognition processing may be specifically performed on the subtitle of the video to be processed to obtain subtitle data of the video to be processed, or the subtitle data of the video to be processed may also be directly obtained from a subtitle file of the video to be processed.
In another specific implementation process, the video to be processed may be subjected to voice recognition processing to obtain voice data of the video to be processed.
In another specific implementation process, the feature extraction process may be specifically performed on the video to be processed to obtain video feature data of the video to be processed, for example, image feature data, time sequence feature data, audio feature data, and the like, which is not particularly limited in this embodiment.
The video feature data of the video to be processed may be static feature data, for example, image feature data of an object such as a face, an automobile, or may also be dynamic feature data, for example, time sequence feature data of running, riding, etc., which is not particularly limited in this embodiment.
Alternatively, in one possible implementation of the present embodiment, in 101, the video type of the acquired video to be processed may include, but is not limited to, at least one of music, food, games, movies, sports, animation, society, automobile, entertainment, science and technology, life, history, leisure, military, and relatives, which the present embodiment is not particularly limited to.
In a specific implementation process, the video type of the video to be processed may be obtained specifically according to the classification information of the video to be processed.
In another specific implementation process, if the classification information of the video to be processed does not exist, the video type of the video to be processed may be obtained according to the video information of the video to be processed.
For example, the video type of the video to be processed may be obtained according to the title data of the video to be processed, for example, keyword matching processing is performed on the video title of the video to be processed.
Or, for another example, the video type of the video to be processed may be obtained according to the subtitle data of the video to be processed, for example, keyword matching processing is performed on the subtitle data of the video to be processed.
Or, for another example, the video type of the video to be processed may be obtained specifically according to video feature data of the video to be processed, such as image feature data, audio feature data, and the like, by using a video classification model, such as a Softmax model.
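As a sketch of the keyword-matching alternative above: the keyword lists below are invented for illustration, since the patent names the type taxonomy but not the matching rules.

```python
# Hypothetical keyword lists; the patent names the type taxonomy
# (music, food, games, ...) but not the matching rules.
TYPE_KEYWORDS = {
    "music": ["concert", "violin", "album", "song"],
    "sports": ["match", "highlights", "league"],
    "food": ["recipe", "cooking", "restaurant"],
}

def infer_video_type(title_data: str, subtitle_data: str = "") -> str:
    text = (title_data + " " + subtitle_data).lower()
    # count keyword hits per type and take the best-scoring type
    scores = {t: sum(kw in text for kw in kws)
              for t, kws in TYPE_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(infer_video_type("[CCTV concert hall] violin highlights"))  # music
```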
Optionally, in a possible implementation of the present embodiment, before 102, a title template of each video type may be further configured according to a title requirement of each video type.
Optionally, in one possible implementation manner of this embodiment, in 103, symbol data may be obtained specifically according to the video information, and then, entity identification processing may be performed on the video information by using the knowledge graph and the symbol data, so as to obtain entity information related to the video to be processed.
The entity information refers to related information such as entity name, entity type, and entity attribute of an entity obtained based on a knowledge graph.
In a specific implementation process, symbol data with special meanings in the video information, such as book title marks 《 》, square brackets [ ], quotation marks " ", colons :, and spaces, can be used to directly identify the content corresponding to the symbol data as entity information of a type determined by the video type of the video to be processed, so as to serve as entity information related to the video to be processed.
For example, suppose the video type is music and the video information is "[CCTV concert hall] 《Liang Zhu》 violin: Lv Saiqing, Huang Bin, Ning Feng, Huang Mengla, Chen Xi, Liu Xiao". Then, directly on the basis of the book title marks and of the colon followed by spaces, the content "Liang Zhu" enclosed in the book title marks can be identified as the entity name "Liang Zhu" with entity type "music", and the contents "Lv Saiqing", "Huang Bin", "Ning Feng", "Huang Mengla", "Chen Xi", and "Liu Xiao" separated by the spaces can be identified as entity names with entity type "person".
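A minimal regular-expression sketch of this symbol-driven extraction follows. The symbol-to-type mapping is hard-coded for the music case purely as an illustration; note that in the original Chinese title each space-separated token is one name, whereas romanized multi-word names would need an extra name segmenter.

```python
import re

def entities_from_symbols(video_info: str):
    """Map symbol data to entity candidates: content inside book title marks
    becomes a work, and space-separated tokens after a colon become person
    names. Which symbol maps to which entity type would, as described above,
    depend on the video type; the music case is hard-coded here."""
    entities = []
    for work in re.findall(r"《(.+?)》", video_info):
        entities.append({"name": work, "type": "music"})
    tail = re.search(r"[:：]\s*(.+)$", video_info)
    if tail:
        for name in tail.group(1).split():
            entities.append({"name": name, "type": "person"})
    return entities

print(entities_from_symbols("《Liang Zhu》 violin: LvSaiqing HuangBin NingFeng"))
# [{'name': 'Liang Zhu', 'type': 'music'}, {'name': 'LvSaiqing', ...}, ...]
```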
Then, entity identification processing is further performed on the remaining content of the video information by using the knowledge graph, so as to obtain other entity information related to the video to be processed.
The knowledge graph in the application may be a knowledge graph of general concepts in the physical world, referred to herein as a general knowledge graph, or a knowledge graph for the video field in which video features of entities are added to the general knowledge graph, referred to herein as a video knowledge graph.
In a specific implementation, the video information may include, but is not limited to, at least one of title data, subtitle data, and voice data, which is not particularly limited in this embodiment. Text-based entity identification processing is then performed on the video information by using the general knowledge graph, so as to obtain the entity information related to the video to be processed.
For example, text recognition processing may be performed on the existing title of the video to be processed to obtain the title data of the video to be processed.
Or, for another example, text recognition processing may be performed on the subtitles of the video to be processed to obtain the subtitle data, or the subtitle data may be obtained directly from a subtitle file of the video to be processed. After the subtitle data is obtained, text-based entity identification processing can be performed on it by using the general knowledge graph, so as to obtain the entity information related to the video to be processed.
Alternatively, for another example, voice recognition processing may be performed on the video to be processed, and the speech in the video is recognized as text to serve as the voice data of the video to be processed. After the voice data is obtained, text-based entity identification processing can be performed on it by using the general knowledge graph, so as to obtain the entity information related to the video to be processed.
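Reduced to its essentials, this text-based recognition is a lookup of knowledge-graph entity names in the recognized text; the toy graph entries below are invented for illustration, since a real general knowledge graph would also carry relations between entities.

```python
# A minimal text-based lookup: the general knowledge graph is reduced here to
# a name -> (type, attributes) dictionary with invented entries.
GENERAL_KG = {
    "Liang Zhu": {"type": "music", "genre": "violin concerto"},
    "LvSaiqing": {"type": "person", "occupation": "violinist"},
}

def text_entity_recognition(text: str) -> dict:
    """Return knowledge-graph entries whose entity names occur in the
    title data, subtitle data, or recognized voice data."""
    return {name: info for name, info in GENERAL_KG.items() if name in text}

print(text_entity_recognition("《Liang Zhu》 violin: LvSaiqing HuangBin"))
```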
In another specific implementation, the video information may include, but is not limited to, video feature data, such as image feature data, time sequence feature data, audio feature data, and the like, which is not particularly limited in this embodiment.
The video feature data of the video to be processed may be static feature data, for example, image feature data of an object such as a face, an automobile, or may also be dynamic feature data, for example, time sequence feature data of running, riding, etc., which is not particularly limited in this embodiment.
Then, after the video feature data of the video to be processed is obtained, feature-based entity identification processing may be performed on the video information by using a general knowledge graph to which the video features of entities have been added, i.e., a video knowledge graph, so as to obtain the entity information related to the video to be processed.
For example, video frame images may be extracted from the video to be processed, and a residual network (ResNet) feature extractor may then be used to perform feature extraction processing on each video frame image, so as to obtain static feature data of the video to be processed. After the static feature data is obtained, feature-based entity identification processing can be performed on it by using the video knowledge graph, so as to obtain the entity information related to the video to be processed.
Or, for another example, video frame images may be extracted from the video to be processed, and a three-dimensional convolution (C3D) network feature extractor may then be used to perform feature extraction processing on a plurality of consecutive video frame images, so as to obtain dynamic feature data of the video to be processed. After the dynamic feature data is obtained, feature-based entity identification processing can be performed on it by using the video knowledge graph, so as to obtain the entity information related to the video to be processed.
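The static-feature branch could be sketched as follows, with a torchvision ResNet-50 standing in for the residual-network extractor; frame sampling, the C3D temporal branch, and the matching of features against the video knowledge graph are omitted.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# Per-frame static features: drop the classification head and keep the
# pooled feature vector for downstream knowledge-graph matching.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 2048-d pooled feature vector
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def frame_feature(frame: Image.Image) -> torch.Tensor:
    """2048-d static feature for one video frame image."""
    with torch.no_grad():
        return backbone(preprocess(frame).unsqueeze(0)).squeeze(0)
```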
Optionally, in one possible implementation manner of this embodiment, in 104, the entity information may be specifically organized by using the title template, so as to generate a video title of the video to be processed.
In a specific implementation process, if the number of the entity information is one, the entity information may be directly organized by using the title template to generate a video title of the video to be processed.
In another specific implementation process, if the number of the entity information is plural, the entity information may be combined according to the association relationship between the entity information, so as to obtain an entity combination.
For example, for the entity information "halier" and "ganinril", since "halier" and "ganinril" are both key members of the same team, they may be combined to obtain the entity combination "halier-ganinril".
After the entity combination is obtained, the title template can then be used to organize the entity combination, so as to generate the video title of the video to be processed.
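Combination followed by organization can be sketched in two steps; the association test and the template string below are illustrative placeholders, reusing the example entities above.

```python
def combine_entities(names, related):
    """Join entity names that the knowledge graph marks as associated
    (e.g. members of the same team) into one entity combination."""
    return "-".join(n for n in names if related(n))

def organize(template: str, slots: dict) -> str:
    """Let the type-specific title template organize the entity combination."""
    return template.format(**slots)

# the trivially-true predicate stands in for a real knowledge-graph check
combo = combine_entities(["halier", "ganinril"], related=lambda n: True)
print(organize("{combo} highlights", {"combo": combo}))
# halier-ganinril highlights
```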
In another specific implementation process, syntactic analysis may further be performed on the video information to determine a pronoun in it, and the pronoun may then be replaced with the entity information with the highest occurrence frequency, so as to generate the video title of the video to be processed.
For example, the title data "she is the first female of China" of the video to be processed is parsed to determine the pronoun "she" in the title data. Then, according to the occurrence frequency of each piece of entity information of the video to be processed, the pronoun is replaced with the most frequent entity information XXX, generating the video title "XXX is the first female of China".
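A toy version of this pronoun replacement is shown below; a fixed pronoun set stands in for real syntactic analysis, and the entity mention list is invented.

```python
from collections import Counter

PRONOUNS = {"she", "he", "it", "they"}  # stand-in for real syntactic analysis

def replace_pronoun(title: str, entity_mentions: list) -> str:
    """Swap a pronoun found in the title for the most frequent entity."""
    if not entity_mentions:
        return title
    top, _ = Counter(entity_mentions).most_common(1)[0]
    return " ".join(top if w.lower() in PRONOUNS else w
                    for w in title.split())

print(replace_pronoun("she is the first female of China", ["XXX", "XXX", "YYY"]))
# XXX is the first female of China
```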
In another specific implementation process, the video title of the video to be processed may be further generated by using a title adjustment model according to the existing title of the video to be processed.
The title adjustment model may be any of various existing deep learning models, for example, an Encoder-Decoder deep learning model, which is not particularly limited in this embodiment.
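As one minimal sketch of such an Encoder-Decoder title adjuster in PyTorch: the GRU architecture, vocabulary size, and dimensions are arbitrary illustrative choices, and training on pairs of (existing title, adjusted title) is assumed but not shown.

```python
import torch
import torch.nn as nn

class TitleAdjuster(nn.Module):
    """A minimal Encoder-Decoder over token ids: the encoder reads the
    existing title, the decoder emits the adjusted one."""
    def __init__(self, vocab_size=8000, emb=128, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src_ids, tgt_ids):
        _, state = self.encoder(self.embed(src_ids))       # encode existing title
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)                           # logits over the vocabulary

model = TitleAdjuster()
src = torch.randint(0, 8000, (1, 12))  # token ids of the existing title
tgt = torch.randint(0, 8000, (1, 12))  # shifted target ids during training
print(model(src, tgt).shape)           # torch.Size([1, 12, 8000])
```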
By adopting the technical scheme provided by the application, the video title of the video to be processed is automatically generated, manual participation is not needed, the operation is simple, errors are not easy to occur, and the efficiency and the reliability of video title generation can be effectively improved.
After the video title of the video to be processed has been generated with the technical solution provided by the application, discrimination processing may further be performed on the generated title to determine whether it still needs adjustment. If it does, the current adjustment is deemed to have failed, and the title can be reported for manual verification and adjustment.
In this embodiment, the video title is automatically generated based on the video to be processed and the knowledge graph. Because both the video information of the video to be processed and its related entity information are considered, the generated title can describe the main features of the video in a targeted manner, improving the quality of the video title.
In addition, with the technical solution provided by the application, the existing titles of current videos are analyzed so that only a current video without a video title, or one whose existing title needs to be adjusted, is taken as the video to be processed, for which a video title is then automatically generated on the basis of the video and the knowledge graph. Since video titles need not be regenerated for all current videos, the processing efficiency of video titles is improved.
In addition, with the technical solution provided by the application, symbol data are obtained from the video information of the video to be processed, and entity identification processing is then performed on the video information by using the knowledge graph and the symbol data, so as to obtain the entity information related to the video to be processed.
In addition, with a general knowledge graph, text-based entity recognition can be performed on video information containing at least one of the title data, subtitle data, and voice data of the video to be processed, so as to obtain the entity information related to the video to be processed, on the basis of which the video title is then automatically generated.
In addition, with a video knowledge graph to which video features have been added, feature-based entity identification can be performed on video information containing the video feature data of the video to be processed, so as to obtain the entity information related to the video to be processed, on the basis of which the video title is then automatically generated. Because both the video information and the entity information related to the video features are considered, the main features of the video can be described in a targeted manner, improving the quality of the video title.
In addition, with the technical solution provided by the application, the entity information related to the video is combined to obtain an entity combination, which can then be organized by using the title template to generate the video title of the video to be processed.
In addition, with the technical solution provided by the application, the user experience can be effectively improved.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
Fig. 2 is a schematic structural diagram of a video title processing apparatus according to another embodiment of the present application. As shown in Fig. 2, the processing apparatus 200 for video titles of this embodiment may include an acquisition unit 201, a template unit 202, an entity unit 203, and a generating unit 204. The acquisition unit 201 is configured to acquire video information and a video type of a video to be processed; the template unit 202 is configured to obtain a title template of the video type according to the video type; the entity unit 203 is configured to obtain entity information related to the video to be processed according to a knowledge graph and the video information; and the generating unit 204 is configured to generate a video title of the video to be processed according to the title template and the entity information.
It should be noted that part or all of the execution body of the video title processing apparatus provided in this embodiment may be an application located at the local terminal, or a functional unit such as a plug-in or software development kit (Software Development Kit, SDK) disposed in an application located at the local terminal, or a processing engine located in a server on the network side, or a distributed system located on the network side, for example, a processing engine or a distributed system in a video processing platform on the network side, which is not limited in this embodiment.
It will be appreciated that the application may be a native program (native app) installed on the terminal, or may also be a web page program (webApp) of a browser on the terminal, which is not limited in this embodiment.
Optionally, in a possible implementation manner of this embodiment, the acquisition unit 201 may be further configured to take a current video without a video title as the video to be processed; and/or perform discrimination processing on the existing title of a current video to determine whether the existing title needs to be adjusted, and take a current video whose existing title needs to be adjusted as the video to be processed.
Optionally, in a possible implementation manner of this embodiment, the entity unit 203 may be specifically configured to obtain symbol data according to the video information, and perform entity identification processing on the video information by using the knowledge graph and the symbol data, so as to obtain the entity information related to the video to be processed.
Optionally, in one possible implementation manner of this embodiment, the knowledge graph is a general knowledge graph; the video information includes at least one of title data, subtitle data, and voice data; and the entity unit 203 may be specifically configured to perform text-based entity identification processing on the video information by using the general knowledge graph, so as to obtain the entity information related to the video to be processed.
Optionally, in one possible implementation manner of this embodiment, the knowledge graph is a video knowledge graph; the video information comprises video feature data; the entity unit 203 may be specifically configured to perform feature-based entity identification processing on the video information by using the video knowledge graph, so as to obtain entity information related to the video to be processed.
Optionally, in a possible implementation manner of this embodiment, the generating unit 204 may specifically be configured to perform a combination process on the entity information to obtain an entity combination; and utilizing the title template to organize the entity combination so as to generate a video title of the video to be processed.
In a specific implementation, the generating unit 204 may be further configured to parse the video information to determine a pronoun; and replacing the pronouns according to the entity information with highest occurrence frequency to generate the video title of the video to be processed.
In another specific implementation process, the generating unit 204 may be further configured to generate, according to an existing title of the video to be processed, a video title of the video to be processed by using a title adjustment model.
It should be noted that, the method in the embodiment corresponding to fig. 1 may be implemented by the processing device for video titles provided in this embodiment. The detailed description may refer to the relevant content in the corresponding embodiment of fig. 1, and will not be repeated here.
In this embodiment, the acquisition unit acquires the video information and the video type of the video to be processed, the template unit obtains the title template of the video type according to the video type, and the entity unit obtains the entity information related to the video to be processed according to the knowledge graph and the video information, so that the generating unit can generate the video title of the video to be processed according to the title template and the entity information.
In addition, with the technical solution provided by the application, the existing titles of current videos are analyzed so that only a current video without a video title, or one whose existing title needs to be adjusted, is taken as the video to be processed, for which a video title is then automatically generated on the basis of the video and the knowledge graph. Since video titles need not be regenerated for all current videos, the processing efficiency of video titles is improved.
In addition, with the technical solution provided by the application, symbol data are obtained from the video information of the video to be processed, and entity identification processing is then performed on the video information by using the knowledge graph and the symbol data, so as to obtain the entity information related to the video to be processed.
In addition, with a general knowledge graph, text-based entity recognition can be performed on video information containing at least one of the title data, subtitle data, and voice data of the video to be processed, so as to obtain the entity information related to the video to be processed, on the basis of which the video title is then automatically generated.
In addition, with a video knowledge graph to which video features have been added, feature-based entity identification can be performed on video information containing the video feature data of the video to be processed, so as to obtain the entity information related to the video to be processed, on the basis of which the video title is then automatically generated. Because both the video information and the entity information related to the video features are considered, the main features of the video can be described in a targeted manner, improving the quality of the video title.
In addition, with the technical solution provided by the application, the entity information related to the video is combined to obtain an entity combination, which can then be organized by using the title template to generate the video title of the video to be processed.
In addition, with the technical solution provided by the application, the user experience can be effectively improved.
According to an embodiment of the present application, there is also provided an electronic device and a non-transitory computer-readable storage medium storing computer instructions.
Fig. 3 is a schematic diagram of an electronic device for implementing the method for processing a video title according to the embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the applications described and/or claimed herein.
As shown in Fig. 3, the electronic device includes: one or more processors 301, a memory 302, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a graphical user interface (GUI) on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing some of the necessary operations (for example, as a server array, a set of blade servers, or a multiprocessor system). One processor 301 is taken as an example in Fig. 3.
Memory 302 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method for processing video titles provided by the present application. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the processing method of video titles provided by the present application.
As a non-transitory computer-readable storage medium, the memory 302 may be used to store non-transitory software programs, non-transitory computer-executable programs, and units, such as the program instructions/units corresponding to the video title processing method in the embodiments of the present application (e.g., the acquisition unit 201, the template unit 202, the entity unit 203, and the generating unit 204 shown in Fig. 2). By running the non-transitory software programs, instructions, and units stored in the memory 302, the processor 301 executes the various functional applications and data processing of the server, that is, implements the video title processing method in the above method embodiment.
Memory 302 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created by use of an electronic device or the like according to a processing method of a video title provided by an embodiment of the present application. In addition, memory 302 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 302 may optionally include memory remotely located with respect to processor 301, which may be connected via a network to an electronic device implementing the video title processing method provided by embodiments of the present application. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device implementing the video title processing method may further include an input device 303 and an output device 304. The processor 301, the memory 302, the input device 303, and the output device 304 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 3.
The input device 303 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device implementing the video title processing method; it may be, for example, a touch screen, a keypad, a mouse, a trackpad, a touch pad, a pointing stick, one or more mouse buttons, a trackball, or a joystick. The output device 304 may include a display device, an auxiliary lighting device (e.g., an LED), a haptic feedback device (e.g., a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor capable of receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (e.g., a cathode ray tube (CRT) or liquid crystal display (LCD) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), a middleware component (e.g., an application server), a front-end component (e.g., a user computer having a graphical user interface or a web browser through which the user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and the server are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by virtue of computer programs that run on the respective computers and have a client-server relationship to each other.
According to the technical solution provided by the embodiments of the present application, a video title is automatically generated based on the video to be processed and a knowledge graph. Since both the video information of the video to be processed and the related entity information are taken into account, the generated video title describes the main features of the video to be processed in a targeted manner, which improves the quality of the video title.
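By way of illustration only, the following minimal Python sketch condenses the flow described above into toy form; the knowledge graph, the template, and all function names are hypothetical stand-ins introduced by the editor, not part of the application.

```python
# A toy end-to-end sketch: video information + knowledge graph -> title.
def generate_video_title(video_info, video_type, knowledge_graph, templates):
    # Entity recognition: look words from the video information up in the graph.
    entities = {w: knowledge_graph[w] for w in video_info.split() if w in knowledge_graph}
    # Select the title template matching the video type.
    template = templates[video_type]
    # Organize the recognized entities into the template's typed slots.
    by_type = {etype: name for name, etype in entities.items()}
    return template.format(**by_type)

kg = {"Messi": "Person", "Barcelona": "Team"}  # toy knowledge graph
templates = {"sports": "{Person} shines for {Team} in this clip"}
print(generate_video_title("Messi scores for Barcelona", "sports", kg, templates))
# -> Messi shines for Barcelona in this clip
```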
In addition, with the technical solution provided by the present application, the existing title of the current video is analyzed, and a current video that has no video title, or whose existing title needs to be adjusted, is taken as the video to be processed; a video title is then automatically generated based on the video to be processed and the knowledge graph. Since video titles do not need to be regenerated for all current videos, the processing efficiency of video titles is improved.
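The discrimination step (also recited in claim 1 below) can be sketched as follows; the threshold values and the encode/classify stand-ins are hypothetical, chosen only so the sketch runs.

```python
MIN_CHARS, MAX_CHARS = 8, 40  # hypothetical shortest/longest thresholds

def needs_adjustment(title, encode, classify):
    # Titles at or outside the character bounds are adjusted unconditionally.
    n = len(title)
    if n <= MIN_CHARS or n >= MAX_CHARS:
        return True
    # In-range titles go through a language processing model: the decision
    # is made on the vector expression of the existing title.
    return classify(encode(title))

# Toy stand-ins; a real system would plug in a trained language model.
encode = lambda t: [ord(c) for c in t]
classify = lambda v: sum(v) % 2 == 0
print(needs_adjustment("short", encode, classify))  # True: below MIN_CHARS
```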
In addition, with the technical solution provided by the present application, symbol data is obtained from the video information of the video to be processed, and entity identification processing is then performed on the video information by using the knowledge graph and the symbol data, so as to obtain the entity information related to the video to be processed.
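A minimal sketch of this step, assuming a toy knowledge graph and treating punctuation marks as the symbol data that delimits candidate entity spans (both assumptions are the editor's, not the application's):

```python
import re

KNOWLEDGE_GRAPH = {"Zhang San": "Person", "Beijing": "Place"}  # toy graph

def extract_symbol_data(text):
    # Collect punctuation symbols from the video information.
    return re.findall(r"[,.!?;:\u3001\u3002\uff0c\uff01\uff1f]", text)

def recognize_entities(text):
    # Split the text on the symbol data, then look each span up in the graph.
    symbols = set(extract_symbol_data(text))
    pattern = "|".join(map(re.escape, symbols)) or r"\s"
    spans = [s.strip() for s in re.split(pattern, text) if s.strip()]
    return {s: KNOWLEDGE_GRAPH[s] for s in spans if s in KNOWLEDGE_GRAPH}

print(recognize_entities("Beijing, Zhang San!"))
# -> {'Beijing': 'Place', 'Zhang San': 'Person'}
```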
In addition, with the technical solution provided by the present application, a universal knowledge graph can be used to perform text-based entity identification processing on video information containing at least one of the subtitle data and the voice data of the video to be processed, so as to obtain the entity information related to the video to be processed, and a video title is then automatically generated based on that entity information.
In addition, with the technical solution provided by the present application, a video knowledge graph to which video features have been added can be used to perform feature-based entity identification processing on video information containing the video feature data of the video to be processed, so as to obtain the entity information related to the video to be processed, and a video title is then automatically generated based on that entity information. Since both the video information of the video to be processed and the entity information related to the video features are taken into account, the main features of the video to be processed can be described in a targeted manner, which improves the quality of the video title.
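Feature-based matching against a video knowledge graph might look like the following sketch, in which each graph entity carries a reference feature vector and matching is done by cosine similarity; the vectors and the threshold are invented for illustration.

```python
import math

# Toy video knowledge graph: entity name -> reference feature vector.
VIDEO_KG = {
    "cat":    [0.9, 0.1, 0.0],
    "soccer": [0.1, 0.8, 0.3],
}

def cosine(a, b):
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b))

def match_entities(frame_features, threshold=0.8):
    # Return the graph entities whose stored features match the video's.
    return [name for name, ref in VIDEO_KG.items()
            if cosine(frame_features, ref) >= threshold]

print(match_entities([0.85, 0.15, 0.05]))  # -> ['cat'] under this toy graph
```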
In addition, with the technical solution provided by the present application, the entity information related to the video is combined to obtain an entity combination, and the title template can then be used to organize the entity combination to generate the video title of the video to be processed.
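A sketch of this combination-and-organization step, assuming a hypothetical template convention in which slots are named after entity types:

```python
# Hypothetical slot convention: "{Person}", "{Work}" name entity types.
TEMPLATE = "{Person} stars in {Work}: do not miss this highlight"

def organize_entities(template, entities):
    # Combination step: group the recognized entities by type.
    by_type = {}
    for name, etype in entities.items():
        by_type.setdefault(etype, name)  # keep the first entity of each type
    # Organization step: let the title template arrange the combination.
    try:
        return template.format(**by_type)
    except KeyError:
        return None  # a required slot has no matching entity

print(organize_entities(TEMPLATE, {"Jackie Chan": "Person",
                                   "Police Story": "Work"}))
# -> Jackie Chan stars in Police Story: do not miss this highlight
```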
In addition, the technical solution provided by the present application can effectively improve the user experience.
It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution disclosed in the present application can be achieved; no limitation is imposed herein.
The above specific embodiments do not limit the protection scope of the present application. It should be apparent to those skilled in the art that various modifications, combinations, sub-combinations, and substitutions are possible, depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall be included in the protection scope of the present application.

Claims (18)

1. A method for processing a video title, comprising:
judging an existing title of a current video to determine whether the existing title of the current video needs to be adjusted, comprising: acquiring a total character number of the existing title of the current video, and determining an existing title whose total character number is less than or equal to a shortest threshold, or greater than or equal to a longest threshold, as an existing title that needs to be adjusted; and inputting an existing title whose total character number is greater than the shortest threshold and less than the longest threshold into a language processing model to obtain a vector expression of the existing title, and determining, according to the vector expression of the existing title of the current video, whether the existing title of the current video needs to be adjusted;
taking the current video whose existing title needs to be adjusted as a video to be processed;
acquiring video information and a video type of the video to be processed;
obtaining a title template of the video type according to the video type;
obtaining entity information related to the video to be processed according to the knowledge graph and the video information;
and generating a video title of the video to be processed according to the title template and the entity information.
2. The method of claim 1, wherein, before the acquiring video information and a video type of the video to be processed, the method further comprises:
taking a current video without a video title as the video to be processed.
3. The method according to claim 1, wherein the obtaining entity information related to the video to be processed according to the knowledge graph and the video information comprises:
obtaining symbol data according to the video information;
and carrying out entity identification processing on the video information by utilizing the knowledge graph and the symbol data so as to obtain the entity information related to the video to be processed.
4. The method of claim 1, wherein the knowledge graph is a universal knowledge graph; the video information comprises at least one of title data, subtitle data, and voice data; and the obtaining entity information related to the video to be processed according to the knowledge graph and the video information comprises:
carrying out text-based entity identification processing on the video information by using the universal knowledge graph, so as to obtain the entity information related to the video to be processed.
5. The method of claim 1, wherein the knowledge graph is a video knowledge graph; the video information comprises video feature data; and the obtaining entity information related to the video to be processed according to the knowledge graph and the video information comprises:
and carrying out feature-based entity identification processing on the video information by utilizing the video knowledge graph so as to obtain the entity information related to the video to be processed.
6. The method according to any one of claims 1-5, wherein the generating a video title of the video to be processed according to the title template and the entity information comprises:
combining the entity information to obtain an entity combination;
and utilizing the title template to organize the entity combination so as to generate a video title of the video to be processed.
7. The method of claim 6, wherein the generating a video title of the video to be processed according to the title template and the entity information further comprises:
performing syntactic analysis on the video information to determine pronouns;
and replacing the pronouns with the entity information having the highest occurrence frequency, so as to generate the video title of the video to be processed.
8. The method of claim 6, wherein the generating a video title of the video to be processed according to the title template and the entity information further comprises:
and generating a video title of the video to be processed by using a title adjustment model according to the existing title of the video to be processed.
9. A video title processing apparatus, comprising:
an obtaining unit, configured to perform discrimination processing on an existing title of a current video to determine whether the existing title of the current video needs to be adjusted, comprising: acquiring a total character number of the existing title of the current video, and determining an existing title whose total character number is less than or equal to a shortest threshold, or greater than or equal to a longest threshold, as an existing title that needs to be adjusted; and inputting an existing title whose total character number is greater than the shortest threshold and less than the longest threshold into a language processing model to obtain a vector expression of the existing title, and determining, according to the vector expression of the existing title of the current video, whether the existing title of the current video needs to be adjusted;
the obtaining unit being further configured to take the current video whose existing title needs to be adjusted as a video to be processed, and to acquire video information and a video type of the video to be processed;
the template unit is used for obtaining a title template of the video type according to the video type;
the entity unit is used for obtaining entity information related to the video to be processed according to the knowledge graph and the video information;
and the generating unit is used for generating the video title of the video to be processed according to the title template and the entity information.
10. The apparatus of claim 9, wherein the obtaining unit is further configured to use a current video without a video title as the video to be processed.
11. The apparatus according to claim 9, wherein the entity unit is specifically configured to:
obtain symbol data according to the video information; and
carry out entity identification processing on the video information by utilizing the knowledge graph and the symbol data, so as to obtain the entity information related to the video to be processed.
12. The apparatus of claim 9, wherein the knowledge graph is a universal knowledge graph; the video information comprises at least one of title data, subtitle data, and voice data; and the entity unit is specifically configured to carry out text-based entity identification processing on the video information by using the universal knowledge graph, so as to obtain the entity information related to the video to be processed.
13. The apparatus of claim 9, wherein the knowledge graph is a video knowledge graph; the video information comprises video feature data; and the entity unit is specifically configured to carry out feature-based entity identification processing on the video information by utilizing the video knowledge graph, so as to obtain the entity information related to the video to be processed.
14. The apparatus according to any one of claims 9-13, wherein the generating unit is specifically configured to:
combine the entity information to obtain an entity combination; and
utilize the title template to organize the entity combination, so as to generate a video title of the video to be processed.
15. The apparatus of claim 14, wherein the generating unit is further configured to: perform syntactic analysis on the video information to determine pronouns; and
replace the pronouns with the entity information having the highest occurrence frequency, so as to generate the video title of the video to be processed.
16. The apparatus of claim 14, wherein the generating unit is further configured to generate a video title of the video to be processed using a title adjustment model based on an existing title of the video to be processed.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202010098765.9A 2020-02-18 2020-02-18 Video title processing method and device, electronic equipment and readable storage medium Active CN111353070B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010098765.9A 2020-02-18 2020-02-18 Video title processing method and device, electronic equipment and readable storage medium


Publications (2)

Publication Number Publication Date
CN111353070A (en) 2020-06-30
CN111353070B (en) 2023-08-18

Family

ID=71197997





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant