CN113051966A - Video keyword processing method and device - Google Patents

Video keyword processing method and device

Info

Publication number
CN113051966A
Authority
CN
China
Prior art keywords
video
text
entry
text entry
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911368195.4A
Other languages
Chinese (zh)
Inventor
万庆川
周丽莎
曹旭
周波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Chongqing Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Chongqing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Chongqing Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201911368195.4A priority Critical patent/CN113051966A/en
Publication of CN113051966A publication Critical patent/CN113051966A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635 Overlay text, e.g. embedded captions in a TV program

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a video keyword processing method and device. The method includes: converting a video to obtain at least one video frame image; performing character recognition on the subtitle region of the at least one video frame image to obtain the subtitle text it contains; performing word segmentation on the subtitle text to obtain at least one text entry; determining a ranking weight for the at least one text entry according to its position attribute information and word frequency information; and screening at least one target entry from the at least one text entry according to the ranking weight, and storing the at least one target entry in a database as a keyword of the video. In this way, video keywords are extracted from the video's subtitles, which improves keyword accuracy, enriches the keyword set, and allows users to search for videos with more precise keywords, improving the accuracy of video search.

Description

Video keyword processing method and device
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for processing video keywords.
Background
The internet industry is flourishing, and internet products have penetrated every field of modern social and economic life, bringing great convenience to people's daily work, study, and life. With the rapid development of the internet, online content has grown enormously and numerous video aggregation platforms have emerged. These platforms pool huge numbers of video resources, which makes it difficult for users to find a target video. In the prior art, videos are retrieved through simple means such as video titles and video categories, and users mainly locate the target videos they need through title keywords and video categories.
However, in the course of implementing the invention, the inventor found that the prior art has at least the following defect: with massive numbers of videos, especially many similar ones, video titles and video categories often cannot accurately express a video's key information, so it is difficult for users to retrieve accurately through titles and categories. For example, for Java development training videos, different training institutions and different instructors emphasize different knowledge points; based on the prior art it is difficult to distinguish such videos through title and category search, and users struggle to find a suitable video by title or category.
Disclosure of Invention
In view of the above, the present invention is proposed to provide a video keyword processing method and apparatus that overcome, or at least partially solve, the above problems.
According to an aspect of the present invention, there is provided a method for processing a video keyword, including:
converting the video to obtain at least one video frame image;
performing character recognition processing on a subtitle area of at least one video frame image to obtain a subtitle text contained in the at least one video frame image;
performing word segmentation processing on the subtitle text to obtain at least one text entry;
determining a ranking weight for the at least one text entry according to position attribute information and word frequency information of the at least one text entry;
and screening at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry, and storing the at least one target entry in a database as a keyword of the video.
Optionally, the step of screening at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry specifically includes:
sorting the at least one text entry according to the ranking weight, and screening out the top N text entries in the ranking as target entries, wherein N is not less than 1.
Optionally, the screening of the at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry further comprises:
screening out, according to the ranking weight of the at least one text entry, the text entries whose ranking weight is greater than a preset threshold as target entries.
Optionally, before the method is executed, the method further includes:
constructing an image character set, and constructing a character recognition model according to the image character set;
performing character recognition processing on a subtitle region of at least one video frame image to obtain a subtitle text included in the at least one video frame image specifically includes:
and identifying subtitle text contained in a subtitle area of at least one video frame image based on the character identification model.
Optionally, the constructing the image text set specifically includes:
generating image characters in at least one font by using a character generation function according to the font file of the at least one font.
According to another aspect of the present invention, there is provided a processing apparatus for a video keyword, including:
the conversion processing module is suitable for converting the video to obtain at least one video frame image;
the recognition processing module is suitable for performing character recognition processing on a subtitle region of at least one video frame image to obtain a subtitle text contained in the at least one video frame image;
the word segmentation processing module is suitable for carrying out word segmentation processing on the subtitle text to obtain at least one text entry;
the screening module is suitable for determining a ranking weight for the at least one text entry according to position attribute information and word frequency information of the at least one text entry, screening at least one target entry from the at least one text entry according to the ranking weight, and storing the at least one target entry in a database as a keyword of the video.
Optionally, the screening module is further adapted to:
sorting the at least one text entry according to the ranking weight, and screening out the top N text entries in the ranking as target entries, wherein N is not less than 1.
Optionally, the screening module is further adapted to:
screening out, according to the ranking weight of the at least one text entry, the text entries whose ranking weight is greater than a preset threshold as target entries.
Optionally, the apparatus further comprises:
the model construction module is suitable for constructing an image character set and constructing a character recognition model according to the image character set;
the identification processing module is further adapted to: and identifying subtitle text contained in a subtitle area of at least one video frame image based on the character identification model.
Optionally, the model building module is further adapted to:
generating image characters in at least one font by using a character generation function according to the font file of the at least one font.
According to yet another aspect of the present invention, there is provided a computing device comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the processing method of the video keywords.
According to still another aspect of the present invention, there is provided a computer storage medium, in which at least one executable instruction is stored, and the executable instruction causes a processor to perform an operation corresponding to the processing method of the video keyword.
According to the video keyword processing method and device, the video is converted to obtain at least one video frame image; character recognition is performed on the subtitle region of the at least one video frame image to obtain the subtitle text it contains; the subtitle text is segmented into at least one text entry; a ranking weight is determined for the at least one text entry according to its position attribute information and word frequency information; and at least one target entry is screened from the at least one text entry according to the ranking weight and stored in a database as a keyword of the video. In this way, the subtitles are extracted and subjected to semantic analysis and word segmentation, and keywords are selected for users to retrieve and query; in effect, video keywords are extracted from the video's subtitles. This improves the accuracy of video keywords and enriches them, allows users to search videos with more precise keywords, addresses the inaccurate search results caused in the prior art by relying only on video titles or video categories, and helps improve the accuracy of video search.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
fig. 1 is a flowchart illustrating a method for processing a video keyword according to an embodiment of the present invention;
fig. 2 is a flowchart illustrating a method for processing a video keyword according to another embodiment of the present invention;
FIG. 3 is a diagram illustrating a dataflow graph during model training in one embodiment of the present invention;
fig. 4 is a block diagram illustrating a processing apparatus for a video keyword according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a computing device provided by an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
Setting keywords for videos helps improve the accuracy of video retrieval. In existing solutions, keywords are extracted by manual summarization, which consumes a great deal of manpower and material resources; moreover, people's ability to digest a video varies, so the summarized keywords are often inaccurate. On this basis, the embodiment of the invention provides a method for extracting video keywords from the subtitles of a video.
Fig. 1 shows a flowchart of a method for processing a video keyword according to an embodiment of the present invention, and as shown in fig. 1, the method includes the following steps:
step S101, converting the video to obtain at least one video frame image.
A video is a continuous picture composed of successive frames. For example, a common video runs at 25 frames per second, so conversion can split each second of video into 25 video frame images.
Step S102, performing character recognition processing on the subtitle region of at least one video frame image to obtain a subtitle text contained in at least one video frame image.
Subtitles are added to videos to help users understand them, and character recognition is performed on the subtitle region of each video frame image to obtain the subtitle text it contains. In general, the subtitle region is located at the bottom of the video frame image. In this way, the subtitle text of the video can be acquired.
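As an illustration, isolating such a bottom-of-frame subtitle region before recognition can be sketched as follows; the assumption that the subtitle strip occupies roughly the bottom 20% of the frame is illustrative, not a value fixed by this embodiment:

```python
import numpy as np

def crop_subtitle_region(frame: np.ndarray) -> np.ndarray:
    # Keep only the bottom strip of the frame, where subtitles usually sit;
    # the 0.8 cut-off (bottom 20%) is an illustrative assumption.
    h = frame.shape[0]
    return frame[int(h * 0.8):, :]
```

The cropped strip, rather than the full frame, is then passed to character recognition, which reduces false detections from on-screen text elsewhere in the picture.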
Step S103, performing word segmentation processing on the subtitle text to obtain at least one text entry.
And performing word segmentation on the subtitle text to obtain a plurality of text entries.
Step S104, determining a ranking weight for the at least one text entry according to position attribute information and word frequency information of the at least one text entry.
Entries at different positions in the text contribute differently to the subject of the text, so weights should be assigned according to an entry's position in the text. Meanwhile, an entry's word frequency in the subtitle text also reflects its importance: the higher the frequency, the better the entry represents the theme of the video.
Specifically, the position weight of the subtitle title is set to a first weight, and the position weight of the subtitle body text is set to a second weight. The ranking weight of a text entry is then determined from its position weight and its word frequency; for example, the ranking weight of a text entry equals the product of its position weight and its word frequency.
Step S105, screening at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry, and storing the at least one target entry in a database as a keyword of the video.
After the ranking weight of each text entry is determined, at least one target entry is screened from the text entries as a keyword of the video, and the video's keywords are stored in a database; for example, the entries ranked in the top N positions can be screened out as target entries.
The keywords of a video are its index words. Screening at least one target entry from the extracted text entries as the video's keywords and storing them in a database enables users to find the exact video they need with more precise search terms; subsequently, a knowledge graph can be built from these keywords to make intelligent recommendations to users.
According to the video keyword processing method provided by the embodiment of the invention, the video is converted into images, the subtitle text in the images is recognized, the subtitle text is segmented into several entries, and some of the extracted entries are screened out as the video's keywords and stored in a database. In this way, the subtitles are extracted and subjected to semantic analysis and word segmentation, and keywords are selected for users to retrieve and query; in effect, video keywords are extracted from the video's subtitles. This improves the accuracy of video keywords and enriches them, allows users to search videos with more precise keywords, addresses the inaccurate search results caused in the prior art by relying only on video titles or video categories, and helps improve the accuracy of video search.
Fig. 2 is a flowchart illustrating a method for processing a video keyword according to another embodiment of the present invention, and as shown in fig. 2, the method includes the following steps:
step S201, an image character set is constructed, and a character recognition model is constructed according to the image character set.
In this embodiment, image character recognition is performed by OCR (Optical Character Recognition). The core of OCR is character recognition: for a computer to recognize characters, information such as character features must first be stored in the computer. Therefore, to implement character recognition on images, an image character set must be constructed first.
In the embodiment of the invention, the construction of the image character set comprises the following steps:
(1) Generate a correspondence table of Chinese characters and labels. For example, the GB2312-80 Chinese character standard is taken as the basis, covering about 3755 common Chinese characters. A mapping table recording the mapping between IDs and Chinese characters is generated with the pickle module and then saved.
(2) The font files are collected. For example, 10 Chinese character fonts are collected as the fonts used in the Chinese character data set.
(3) Font images are generated and stored in a prescribed directory. First, the input parameters are defined, including the output directory, the font directory, the test set size, the image rotation amplitude, and so on. The mapping table obtained in step (1) is read into memory; since it maps IDs to Chinese characters, it is inverted into a mapping from Chinese characters to IDs for use in the subsequent font generation. The tool used to generate font images is PIL (the Python Imaging Library): combining PIL's character generation functions with a font file yields digitized Chinese characters.
(4) Data augmentation, which raises the character recognition rate through augmentation of the training data, specifically includes: text distortion, background noise (salt-and-pepper noise), text position (setting the center point of the text), stroke adhesion (simulated by dilation), stroke breakage (simulated by erosion), text inclination (text rotation), multiple fonts, and so on.
Finally, an image collection file corresponding to each Chinese character is generated. In the embodiment of the invention, one character corresponds to a plurality of image characters, and the image characters of the same character are stored in the same image file.
In the embodiment of the invention, the image character generation flow is as follows: (1) set the background, the font color and size, and the font file to use; (2) generate the glyph; (3) convert it to an np array; (4) find the minimum bounding rectangle of the glyph; (5) adjust the image text; and (6) return the generated image.
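A minimal PIL sketch of this flow follows; the canvas size, glyph placement, and resizing policy are illustrative assumptions rather than the exact code of this embodiment:

```python
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_char(ch: str, font_path: str, size: int = 64, angle: float = 0.0):
    # (1) white background canvas; font file and size set by the caller
    img = Image.new("L", (size, size), color=255)
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, int(size * 0.8))
    # (2) draw the glyph in black
    draw.text((size // 10, size // 10), ch, fill=0, font=font)
    if angle:
        img = img.rotate(angle, fillcolor=255)  # optional inclination
    # (3) convert to an np array
    arr = np.asarray(img)
    # (4) minimum bounding rectangle of the non-background pixels
    ys, xs = np.where(arr < 255)
    if xs.size == 0:
        return img  # blank glyph; nothing to crop
    img = img.crop((int(xs.min()), int(ys.min()),
                    int(xs.max()) + 1, int(ys.max()) + 1))
    # (5) adjust back to a fixed size and (6) return the generated image
    return img.resize((size, size))
```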
After the image character set is constructed, a character recognition model is built based on deep learning technology. Specifically, first, the network is built. Character recognition is a multi-class classification task: recognizing 3755 characters is a 3755-class classification task. In an alternative embodiment of the invention, deep learning is used for character recognition; the defined network is a simpler modified version of LeNet, the loss function is sparse_softmax_cross_entropy_with_logits, the optimizer is Adam, and the learning rate is set to 0.1.
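A minimal sketch of this setup with the TensorFlow 1.x API is given below; the input resolution and layer widths are assumptions, while the loss, optimizer, and learning rate follow the text:

```python
import tensorflow as tf  # TensorFlow 1.x assumed

NUM_CLASSES = 3755  # one class per common Chinese character

def lenet_like(images):
    # a simple modified LeNet: two conv/pool stages, then dense layers
    x = tf.layers.conv2d(images, 32, 5, padding="same", activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, 2, 2)
    x = tf.layers.conv2d(x, 64, 5, padding="same", activation=tf.nn.relu)
    x = tf.layers.max_pooling2d(x, 2, 2)
    x = tf.layers.flatten(x)
    x = tf.layers.dense(x, 1024, activation=tf.nn.relu)
    return tf.layers.dense(x, NUM_CLASSES)  # unnormalized logits

images = tf.placeholder(tf.float32, [None, 64, 64, 1])  # assumed input size
labels = tf.placeholder(tf.int64, [None])
loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=lenet_like(images)))
train_op = tf.train.AdamOptimizer(learning_rate=0.1).minimize(loss)
```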
The second is model training. The data pipeline must be designed so that data can be fed to the network efficiently. A data flow graph is created, consisting of several pipeline stages connected by queues. Fig. 3 shows a schematic diagram of the data flow graph during model training in an embodiment of the present invention. As shown in Fig. 3, the first stage generates file names (Filenames), reads them, and arranges them into a randomly shuffled filename queue (FilenameQueue). The second stage uses a Reader to read data from the files and a Decoder to produce samples, which are placed in a sample queue (ExampleQueue). Depending on the configuration, the second stage can be replicated so that its copies are independent of each other and read from multiple files in parallel. The second stage ends with an enqueue operation into the sample queue; the next stage dequeues from it. Because separate threads run these enqueue operations, the training loop continually dequeues samples from the sample queue.
Without dedicated threads, these enqueue operations would all run in the main thread, even though a session can be executed by multiple threads together. In the data input scenario, an enqueue operation reads input from the hard disk and puts it into memory, which is slow. Using QueueRunner creates a set of new threads to perform the enqueuing, allowing the main thread to continue using the data. In the scenario of training a neural network, training and data reading thus run asynchronously: the main thread trains the network while another thread reads data from the hard disk into memory.
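The pipeline of Fig. 3 can be sketched with the TF1 queue APIs as follows; the file pattern, image size, and batch parameters are assumptions, and label parsing is omitted for brevity:

```python
import tensorflow as tf  # TensorFlow 1.x assumed

# Stage 1: file names -> randomly shuffled filename queue
filenames = tf.train.match_filenames_once("train_images/*.png")  # assumed layout
filename_queue = tf.train.string_input_producer(filenames, shuffle=True)

# Stage 2: a Reader reads files, a Decoder turns them into samples
reader = tf.WholeFileReader()
_, value = reader.read(filename_queue)
image = tf.image.decode_png(value, channels=1)
image = tf.image.resize_images(image, [64, 64])

# Enqueue into the example queue; shuffle_batch's internal QueueRunner
# threads are started by start_queue_runners below
batch = tf.train.shuffle_batch([image], batch_size=128,
                               capacity=2000, min_after_dequeue=1000)

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(),
              tf.local_variables_initializer()])
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)
    # ... the training loop dequeues batches here ...
    coord.request_stop()
    coord.join(threads)
```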
Step S202, the video is converted to obtain at least one video frame image.
The video conversion code appears in the original filing only as embedded figures (Figure BDA0002338999550000081 and Figure BDA0002338999550000091), which are not reproduced here.
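In its place, a minimal sketch of the conversion is given below, assuming an OpenCV-based implementation with a 25 fps source sampled once per second; neither the library nor the sampling interval is fixed by the filing:

```python
import cv2

def video_to_frames(video_path: str, out_dir: str, every_n: int = 25) -> int:
    # Read the video frame by frame and save every `every_n`-th frame,
    # e.g. one image per second for a 25 fps source (an assumption).
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break  # end of video
        if index % every_n == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:06d}.jpg", frame)
            saved += 1
        index += 1
    cap.release()
    return saved  # number of video frame images written
```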
step S203, based on the character recognition model, recognizing the caption text contained in the caption area of at least one video frame image.
Based on the character recognition model constructed in step S201, the subtitle text included in the subtitle region of each video frame image is recognized.
Step S204, performing word segmentation processing on the subtitle text to obtain at least one text entry.
First, the subtitle text is segmented with the word segmentation tool HanLP, an open-source toolkit. Optionally, after segmentation, the results are filtered to remove stop words, i.e., function words that do not reflect the subject, such as the Chinese particles 的, 地, and 得 or connectives such as 但是 (however) and 所以 (therefore). They do not reflect the subject matter of the text and interfere with keyword extraction, so they must be filtered out.
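A sketch of this step using the pyhanlp wrapper around HanLP follows; the stop-word list shown is a small illustrative sample, not the embodiment's full list:

```python
from pyhanlp import HanLP  # open-source HanLP toolkit via pyhanlp

STOP_WORDS = {"的", "地", "得", "但是", "所以"}  # assumed sample list

def segment(subtitle_text: str):
    # HanLP.segment returns terms with a .word attribute;
    # drop stop words so they cannot crowd out topical entries
    return [t.word for t in HanLP.segment(subtitle_text)
            if t.word not in STOP_WORDS]
```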
Step S205, determining the ordering weight of at least one text entry according to the position attribute information and the word frequency information of at least one text entry.
Entries at different positions in the text contribute differently to the subject of the text, so weights should be assigned according to an entry's position in the text. Meanwhile, an entry's word frequency in the subtitle text also reflects its importance: the higher the frequency, the better the entry represents the theme of the video.
Specifically, the position weight of the subtitle title is set to a first weight, and the position weight of the subtitle body text is set to a second weight. The ranking weight of a text entry is then determined from its position weight and its word frequency; for example, the ranking weight of a text entry equals the product of its position weight and its word frequency.
For example, the position weight of the title is set to 5, and that of the subtitle body text to 1. After the position weights of the various parts of the text are determined, each position is marked with a numeric label. When the text is scanned word by word to count word frequency, each entry's position information is recorded at the same time. In this way, the contribution of each entry to the whole text can be determined. The specific calculation formula is as follows:
W = n * w
where W is the ranking weight of a text entry, i.e., its overall weight in the subtitle text; w is the position weight of the entry in the subtitle text (5 for the title position, 1 for the body); and n is the word frequency, i.e., the number of times the text entry appears in the subtitle text.
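A worked sketch of W = n * w with these position weights follows; treating an entry that ever appears in the title as carrying the title weight is an assumption about how ties between positions are resolved:

```python
from collections import Counter

TITLE_W, BODY_W = 5, 1  # position weights given above

def ranking_weights(title_entries, body_entries):
    freq = Counter(title_entries) + Counter(body_entries)  # n per entry
    title_set = set(title_entries)
    # W = n * w, using the title weight if the entry appears in the title
    return {e: n * (TITLE_W if e in title_set else BODY_W)
            for e, n in freq.items()}
```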
Step S206, sorting the at least one text entry according to the ranking weight, and screening out the top N text entries in the ranking as target entries, wherein N is not less than 1.
After the ranking weight of each text entry is determined, at least one target entry is screened from the text entries to serve as keywords of the video, and the keywords are stored in a database.
Specifically, the at least one text entry is sorted by ranking weight, and the top N entries are screened out as target entries, where N is not less than 1. The entries are ranked by weight and the first N results are taken as the video's keywords; in one concrete implementation, the top 15 text entries are extracted as the video's keywords.
In addition, text entries whose ranking weight is greater than a preset threshold can be screened out as target entries: a weight threshold is set, and any text entry whose ranking weight exceeds the threshold is taken as a keyword of the video.
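Both screening strategies can be sketched together; N = 15 matches the concrete implementation above, while the default threshold is left to the caller since the embodiment does not fix one:

```python
def select_keywords(weights: dict, top_n: int = 15, threshold=None):
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        # threshold strategy: keep every entry whose weight exceeds it
        return [e for e, w in ranked if w > threshold]
    # top-N strategy: keep the N highest-weighted entries
    return [e for e, _ in ranked[:top_n]]
```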
Therefore, the method disclosed by the embodiment of the invention applies machine deep learning to the extraction of video subtitles: it segments the extracted subtitle text and analyzes word senses, computes each entry's ranking weight from its position and frequency, ranks the entries by weight, and selects the video's keywords, thereby extracting the important keywords of the video. Extracting the subtitle content of the video improves the accuracy of the keywords, so users can search videos through more precise keywords and the accuracy of video search improves; the video keywords can also provide basic data for building an accurate knowledge graph, enabling precise content recommendation for users.
Fig. 4 is a block diagram illustrating a processing apparatus for processing a video keyword according to an embodiment of the present invention, and as shown in fig. 4, the apparatus includes:
a conversion processing module 41, adapted to perform conversion processing on the video to obtain at least one video frame image;
the recognition processing module 42 is adapted to perform character recognition processing on a subtitle region of at least one video frame image to obtain a subtitle text included in the at least one video frame image;
a word segmentation processing module 43, adapted to perform word segmentation processing on the subtitle text to obtain at least one text entry;
a screening module 44 adapted to determine a ranking weight for the at least one text entry according to position attribute information and word frequency information of the at least one text entry, screen at least one target entry from the at least one text entry according to the ranking weight, and store the at least one target entry in a database as a keyword of the video.
In an alternative approach, the screening module 44 is further adapted to:
sorting the at least one text entry according to the ranking weight, and screening out the top N text entries in the ranking as target entries, wherein N is not less than 1.
In an alternative approach, the screening module 44 is further adapted to:
screening out, according to the ranking weight of the at least one text entry, the text entries whose ranking weight is greater than a preset threshold as target entries.
In an optional manner, the apparatus further comprises:
the model construction module is suitable for constructing an image character set and constructing a character recognition model according to the image character set;
the identification processing module 42 is further adapted to: and identifying subtitle text contained in a subtitle area of at least one video frame image based on the character identification model.
In an alternative approach, the model building module is further adapted to:
generating image characters in at least one font by using a character generation function according to the font file of the at least one font.
Therefore, the device disclosed by the embodiment of the invention applies machine deep learning to the extraction of video subtitles: it segments the extracted subtitle text and analyzes word senses, computes each entry's ranking weight from its position and frequency, ranks the entries by weight, and selects the video's keywords, thereby extracting the important keywords of the video. Extracting the subtitle content of the video improves the accuracy of the keywords, so users can search videos through more precise keywords and the accuracy of video search improves; the video keywords can also provide basic data for building an accurate knowledge graph, enabling precise content recommendation for users.
The embodiment of the invention provides a nonvolatile computer storage medium, wherein at least one executable instruction is stored in the computer storage medium, and the computer executable instruction can execute the processing method of the video keywords in any method embodiment.
The executable instructions may be specifically configured to cause the processor to:
converting the video to obtain at least one video frame image;
performing character recognition processing on a subtitle area of at least one video frame image to obtain a subtitle text contained in the at least one video frame image;
performing word segmentation processing on the subtitle text to obtain at least one text entry;
determining a ranking weight for the at least one text entry according to position attribute information and word frequency information of the at least one text entry;
and screening at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry, and storing the at least one target entry in a database as a keyword of the video.
In an alternative, the executable instructions cause the processor to:
sorting the at least one text entry according to the ranking weight, and screening out the top N text entries in the ranking as target entries, wherein N is not less than 1.
In an alternative, the executable instructions cause the processor to:
screening out, according to the ranking weight of the at least one text entry, the text entries whose ranking weight is greater than a preset threshold as target entries.
In an alternative, the executable instructions cause the processor to:
constructing an image character set, and constructing a character recognition model according to the image character set;
and identifying subtitle text contained in a subtitle area of at least one video frame image based on the character identification model.
In an alternative, the executable instructions cause the processor to:
generating image characters in at least one font by using a character generation function according to the font file of the at least one font.
Therefore, this solution applies machine deep learning to the extraction of video subtitles: it segments the extracted subtitle text and analyzes word senses, computes each entry's ranking weight from its position and frequency, ranks the entries by weight, and selects the video's keywords, thereby extracting the important keywords of the video. Extracting the subtitle content of the video improves the accuracy of the keywords, so users can search videos through more precise keywords and the accuracy of video search improves; the video keywords can also provide basic data for building an accurate knowledge graph, enabling precise content recommendation for users.
Fig. 5 is a schematic structural diagram of an embodiment of a computing device according to the present invention, and a specific embodiment of the present invention does not limit a specific implementation of the computing device.
As shown in fig. 5, the computing device may include: a processor (processor)502, a Communications Interface 504, a memory 506, and a communication bus 508.
Wherein: the processor 502, communication interface 504, and memory 506 communicate with one another via a communication bus 508. A communication interface 504 for communicating with network elements of other devices, such as clients or other servers. The processor 502 is configured to execute the program 510, and may specifically perform relevant steps in the above-described embodiment of the method for processing the video keyword for the computing device.
In particular, program 510 may include program code that includes computer operating instructions.
The processor 502 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention. The computing device includes one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
And a memory 506 for storing a program 510. The memory 506 may comprise high-speed RAM memory, and may also include non-volatile memory (non-volatile memory), such as at least one disk memory.
The program 510 may specifically be used to cause the processor 502 to perform the following operations:
converting the video to obtain at least one video frame image;
performing character recognition processing on a subtitle area of at least one video frame image to obtain a subtitle text contained in the at least one video frame image;
performing word segmentation processing on the subtitle text to obtain at least one text entry;
determining a ranking weight for the at least one text entry according to position attribute information and word frequency information of the at least one text entry;
and screening at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry, and storing the at least one target entry in a database as a keyword of the video.
In an alternative, the program 510 causes the processor 502 to:
sorting the at least one text entry according to the ranking weight, and screening out the top N text entries in the ranking as target entries, wherein N is not less than 1.
In an alternative, the program 510 causes the processor 502 to:
screening out, according to the ranking weight of the at least one text entry, the text entries whose ranking weight is greater than a preset threshold as target entries.
In an alternative, the program 510 causes the processor 502 to: constructing an image character set, and constructing a character recognition model according to the image character set;
performing character recognition processing on a subtitle region of at least one video frame image to obtain a subtitle text included in the at least one video frame image specifically includes:
and identifying subtitle text contained in a subtitle area of at least one video frame image based on the character identification model.
In an alternative, the program 510 causes the processor 502 to: generate image characters in at least one font by using a character generation function according to the font file of the at least one font.
Therefore, this solution applies machine deep learning to the extraction of video subtitles: it segments the extracted subtitle text and analyzes word senses, computes each entry's ranking weight from its position and frequency, ranks the entries by weight, and selects the video's keywords, thereby extracting the important keywords of the video. Extracting the subtitle content of the video improves the accuracy of the keywords, so users can search videos through more precise keywords and the accuracy of video search improves; the video keywords can also provide basic data for building an accurate knowledge graph, enabling precise content recommendation for users.
The algorithms or displays presented herein are not inherently related to any particular computer, virtual system, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. In addition, embodiments of the present invention are not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the embodiments of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components according to embodiments of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering; these words may be interpreted as names. The steps in the above embodiments should not be construed as limiting the order of execution unless otherwise specified.

Claims (10)

1. A method for processing video keywords comprises the following steps:
converting the video to obtain at least one video frame image;
performing character recognition processing on a subtitle region of the at least one video frame image to obtain a subtitle text contained in the at least one video frame image;
performing word segmentation processing on the subtitle text to obtain at least one text entry;
determining a ranking weight for the at least one text entry according to position attribute information and word frequency information of the at least one text entry;
and screening at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry, and storing the at least one target entry in a database as a keyword of the video.
2. The method of claim 1, wherein the filtering out at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry comprises:
sorting the at least one text entry according to the ranking weight, and screening out the top N text entries in the ranking as target entries, wherein N is not less than 1.
3. The method of claim 1, wherein filtering out at least one target entry from the at least one text entry based on the ranking weight of the at least one text entry further comprises:
screening out, according to the ranking weight of the at least one text entry, the text entries whose ranking weight is greater than a preset threshold as target entries.
4. The method of claim 1, wherein prior to performing the method, further comprising:
constructing an image character set, and constructing a character recognition model according to the image character set;
the performing character recognition processing on the caption area of the at least one video frame image to obtain the caption text included in the at least one video frame image specifically includes:
and identifying subtitle texts contained in a subtitle region of the at least one video frame image based on the character identification model.
5. The method of claim 4, wherein the constructing the image text set specifically comprises:
generating image characters in at least one font by using a character generation function according to the font file of the at least one font.
6. A video keyword processing apparatus, comprising:
the conversion processing module is suitable for converting the video to obtain at least one video frame image;
the recognition processing module is suitable for performing character recognition processing on a subtitle region of the at least one video frame image to obtain a subtitle text contained in the at least one video frame image;
the word segmentation processing module is suitable for carrying out word segmentation processing on the subtitle text to obtain at least one text entry;
the screening module is suitable for determining a ranking weight for the at least one text entry according to position attribute information and word frequency information of the at least one text entry, screening at least one target entry from the at least one text entry according to the ranking weight of the at least one text entry, and storing the at least one target entry in a database as a keyword of the video.
7. The apparatus of claim 6, wherein the screening module is further adapted to: sort the at least one text entry according to the ranking weight, and screen out the top N text entries in the ranking as target entries, wherein N is not less than 1.
8. The apparatus of claim 6, wherein the screening module is further adapted to: screen out, according to the ranking weight of the at least one text entry, the text entries whose ranking weight is greater than a preset threshold as target entries.
9. A computing device, comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction causes the processor to execute the operation corresponding to the video keyword processing method of any one of claims 1-5.
10. A computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method for processing video keywords according to any one of claims 1 to 5.
CN201911368195.4A 2019-12-26 2019-12-26 Video keyword processing method and device Pending CN113051966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911368195.4A CN113051966A (en) 2019-12-26 2019-12-26 Video keyword processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911368195.4A CN113051966A (en) 2019-12-26 2019-12-26 Video keyword processing method and device

Publications (1)

Publication Number Publication Date
CN113051966A true CN113051966A (en) 2021-06-29

Family

ID=76506794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911368195.4A Pending CN113051966A (en) 2019-12-26 2019-12-26 Video keyword processing method and device

Country Status (1)

Country Link
CN (1) CN113051966A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268539A * 2016-12-31 2018-07-10 上海交通大学 Video matching system based on text analysis
CN108052630A * 2017-12-19 2018-05-18 中山大学 A method for extracting expansion words based on Chinese education videos
CN109918987A * 2018-12-29 2019-06-21 中国电子科技集团公司信息科学研究院 Video subtitle keyword recognition method and device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
冠军的试炼 (blog): "OCR Technology Series (3): Generating Text Training Sets in Large Batches", retrieved from the Internet <URL:https://www.cnblogs.com/skyfsm/p/8436820.html> *
王万良 (Wang Wanliang); 潘蒙 (Pan Meng): "Multi-feature-based keyword extraction method for video-associated text", Journal of Zhejiang University of Technology, no. 01, page 1 *
相丽玲 (Xiang Liling): "Information Management", vol. 7, China Finance Press, pages 253-255 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113992944A (en) * 2021-10-28 2022-01-28 北京中科闻歌科技股份有限公司 Video cataloging method, device, equipment, system and medium
CN115474093A (en) * 2022-11-02 2022-12-13 深圳市云积分科技有限公司 Method and device for calculating importance of video elements, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN111125435B (en) Video tag determination method and device and computer equipment
CN106776503B (en) Text semantic similarity determination method and device
CN112347244B (en) Yellow-based and gambling-based website detection method based on mixed feature analysis
US8315465B1 (en) Effective feature classification in images
CN107562742B (en) Image data processing method and device
CN110008378B (en) Corpus collection method, device, equipment and storage medium based on artificial intelligence
US11461386B2 (en) Visual recognition using user tap locations
CN109684513B (en) Low-quality video identification method and device
WO2017097231A1 (en) Topic processing method and device
US9256649B2 (en) Method and system of filtering and recommending documents
CN105279277A (en) Knowledge data processing method and device
CN109189767A (en) Data processing method, device, electronic equipment and storage medium
CN111353491B (en) Text direction determining method, device, equipment and storage medium
CN109408672B (en) Article generation method, article generation device, server and storage medium
CN106815588B (en) Junk picture filtering method and device
CN106156794B (en) Character recognition method and device based on character style recognition
CN113051966A (en) Video keyword processing method and device
CN111475651B (en) Text classification method, computing device and computer storage medium
CN112818200A (en) Data crawling and event analyzing method and system based on static website
WO2015131528A1 (en) Method and apparatus for determining topic distribution of given text
CN113297345B (en) Analysis report generation method, electronic equipment and related product
CN113780116A (en) Invoice classification method and device, computer equipment and storage medium
CN109241438B (en) Element-based cross-channel hot event discovery method and device and storage medium
CN107577667B (en) Entity word processing method and device
CN111125543A (en) Training method of book recommendation sequencing model, computing device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination