CN109101558B - Video retrieval method and device

Info

Publication number: CN109101558B
Application number: CN201810766347.5A
Authority: CN
Inventors: 张蒙; 徐荣阳
Original assignee: Beijing Cat Eyes Culture Media Co ltd
Current assignee: Beijing Cat Eyes Culture Media Co ltd
Priority date: 2018-07-12
Filing date: 2018-07-12
Publication date: 2022-07-01
Anticipated expiration: 2038-07-12
Also published as: CN109101558A

Abstract

The invention provides a video retrieval method and device. The method comprises the following steps: receiving a retrieval text input by a user; matching the retrieval text against keywords prestored in a video database, wherein the video database stores the association relationship between the keywords and video segments; and, when a keyword matching the retrieval text is found, acquiring and displaying at least one video segment corresponding to the keyword. Because the retrieval text input by the user is matched against the video database to obtain the video segments corresponding to the keywords, video materials no longer need to be collected and screened manually, which reduces the waste of human resources and saves cost.

Description

Video retrieval method and device

Technical Field

The invention relates to the technical field of video retrieval, in particular to a video retrieval method and a video retrieval device.

Background

As the ways in which people consume entertainment content multiply, the demand for video editing work grows. Video editing refers to the process of cutting video segments out of a video and then splicing the cut segments together to obtain the video the user wants. Editing film and television dramas is in many cases a work of re-creation, and the materials used in that creative process are usually collected and screened manually. This requires a great deal of manual experience, wastes human resources, and in turn causes economic loss.

Disclosure of Invention

The invention provides a video retrieval method and a video retrieval device, which are used for solving the problem of economic loss caused by human resource consumption due to manual collection and screening of video materials in the prior art.

To solve the above problem, the present invention discloses a video retrieval method, comprising: receiving a retrieval text input by a user; matching the retrieval text against keywords prestored in a video database, wherein the video database stores the association relationship between the keywords and video segments; and, when a keyword matching the retrieval text is found, acquiring and displaying at least one video segment corresponding to the keyword.

Preferably, before the step of receiving the retrieval text input by the user, the method further comprises: segmenting a target video to obtain the resulting video segments; extracting, for each video segment in turn, the video subtitle text of that video segment; performing word segmentation on the video subtitle text to acquire at least one keyword corresponding to the video segment; and establishing and storing the association relationship between the video segment and the corresponding at least one keyword.

Preferably, the step of segmenting the target video to obtain the resulting video segments includes: dividing each frame of video image in the target video into a plurality of video blocks; comparing the corresponding video blocks of two adjacent frames of video images in the target video to obtain a change value for each pair of corresponding video blocks; removing the corresponding video block with the maximum change value, and normalizing the sum of squared differences of the remaining corresponding video blocks to obtain a set value; when the set value is greater than a difference threshold, taking the later frame of the two adjacent frames of video images as a video segmentation point; and segmenting the target video at the video segmentation points to obtain the resulting video segments.

Preferably, the step of extracting the video subtitle text of each video segment in turn includes: acquiring a subtitle file corresponding to each video segment, and extracting the video subtitle text corresponding to each video segment from the subtitle file; or selecting the subtitle region in each video segment in turn, and performing character recognition on the subtitle region of each video segment to acquire the video subtitle text corresponding to that video segment.

Preferably, the step of acquiring and displaying at least one video segment corresponding to the keyword includes: acquiring the weight of the keyword in the at least one video segment; and sorting and displaying the at least one video segment according to the weight.

To solve the above technical problem, the present invention also discloses a video retrieval apparatus, comprising: a receiving module, used for receiving a retrieval text input by a user; a matching module, used for matching the retrieval text against keywords prestored in a video database, wherein the video database stores the association relationship between the keywords and video segments; and an obtaining and displaying module, used for obtaining and displaying at least one video segment corresponding to the keyword when a keyword matching the retrieval text is found.

Preferably, the apparatus further comprises: a segmentation processing module, used for segmenting a target video to obtain the resulting video segments; an extraction module, used for extracting the video subtitle text of each video segment in turn; a keyword acquisition module, used for performing word segmentation on the video subtitle text to acquire at least one keyword corresponding to the video segment; and an association establishing module, used for establishing and storing the association relationship between the video segment and the corresponding at least one keyword.

Preferably, the segmentation processing module includes: the block processing submodule is used for carrying out block processing on each frame of video image in the target video so as to divide each frame of video image into a plurality of video blocks; the change value acquisition sub-module is used for comparing corresponding video blocks of two adjacent frames of video images in the target video to acquire the change value of each corresponding video block; the set value obtaining sub-module is used for removing the corresponding video block with the maximum value in the change values and normalizing the sum of the square differences of the other corresponding video blocks to obtain a set value; the video division point determining submodule is used for taking a next frame of video image in the two adjacent frames of video images as a video division point when the set value is greater than the difference threshold value; and the video segment acquisition sub-module is used for carrying out segmentation processing on the target video by the video segmentation points to obtain each segmented video segment.

Preferably, the extraction module comprises: the first video subtitle acquisition submodule is used for acquiring subtitle files corresponding to the video clips and extracting video subtitle texts corresponding to the video clips from the subtitle files; or the subtitle area selection submodule is used for sequentially selecting the subtitle areas in the video clips; and the second video subtitle acquisition sub-module is used for performing character recognition on the subtitle region of each video segment so as to acquire the video subtitle text corresponding to the video segment.

Preferably, the acquisition and display module comprises: the weight obtaining submodule is used for obtaining the weight of the keyword in the at least one video segment; and the sequencing display sub-module is used for sequencing and displaying the at least one video segment according to the weight.

Compared with the prior art, the invention has the following advantages:

The embodiments of the invention provide a video retrieval method and a video retrieval device, which receive a retrieval text input by a user and match it against keywords prestored in a video database, wherein the video database stores the association relationship between the keywords and video segments; when a keyword matching the retrieval text is found, at least one video segment corresponding to the keyword is acquired and displayed. Because the retrieval text input by the user is matched against the video database to obtain the video segments corresponding to the keywords, video materials no longer need to be collected and screened manually, which reduces the waste of human resources and saves cost.

Drawings

Fig. 1 is a flowchart illustrating steps of a video retrieval method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram illustrating a video retrieval apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.

Embodiment One

Referring to Fig. 1, a flowchart of the steps of a video retrieval method according to an embodiment of the present invention is shown. The method may specifically include the following steps:

Step 101: Receive a retrieval text input by a user.

The embodiment of the invention can be applied to scenarios in which video segments are retrieved.

The retrieval text input by the user may be a single word, such as "buddha" or "science fiction", or a passage of text, such as "short video related to science fiction"; the embodiment of the present invention is not limited in this respect.

In a preferred embodiment of the present invention, before the step 101, the method may further include:

Step S1: Segment the target video to obtain the resulting video segments.

In the embodiment of the present invention, the target video refers to a video that needs to be subjected to video segmentation to obtain corresponding video segments.

Through the segmentation processing of the target video, each video clip corresponding to the target video can be obtained, and the specific implementation process can be executed by referring to the following steps:

Step S1-1: Divide each frame of video image in the target video into a plurality of video blocks.

In the embodiment of the present invention, each frame of video image in the target video may be divided into several video blocks; for example, the first frame of video image may be divided into 3 × 3 video blocks, and so on.

In practical applications, a person skilled in the art may set the number of blocks of each frame of the video image according to actual needs, which is not limited by the embodiment of the present invention.
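The blocking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a frame is assumed to be a 2-D list of grayscale pixel values, and the function name `split_into_blocks` and the default 3 × 3 grid are hypothetical choices made for the example.

```python
def split_into_blocks(frame, rows=3, cols=3):
    """Split a 2-D grid of pixel values into rows x cols rectangular blocks.

    Blocks are returned in row-major order, so the upper-left block comes first.
    """
    height, width = len(frame), len(frame[0])
    blocks = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            blocks.append([row[left:right] for row in frame[top:bottom]])
    return blocks


# A 6 x 6 test frame whose pixel value encodes its position (hypothetical data).
frame = [[y * 6 + x for x in range(6)] for y in range(6)]
blocks = split_into_blocks(frame)
```

Integer division keeps the block boundaries valid even when the frame size is not an exact multiple of the grid size; in that case the blocks simply differ slightly in size.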

After dividing each frame of video image of the target video into video blocks, step S1-2 is performed.

Step S1-2: Compare the corresponding video blocks of two adjacent frames of video images in the target video to obtain the change value of each pair of corresponding video blocks.

After each frame of video image in the target video is divided into blocks, the corresponding video blocks of two adjacent frames of video images may be compared to obtain the change value of each pair of corresponding video blocks. For example, after the first and second frames of video images are each divided into 3 × 3 video blocks, the first video block of the first frame (the block in the upper left corner) may be compared with the first video block of the second frame, the second video block of the first frame (the block that follows the first in the same row) may be compared with the second video block of the second frame, and so on.

In the embodiment of the present invention, the change value of each pair of corresponding video blocks may be preset. For example, for corresponding video blocks in two adjacent frames of video images, the change value may be set to 0 if the scenes are completely the same, to 1 if the scenes are completely different, and, if the scenes are partially the same, to a value determined by the size of the portion that differs.

In practical applications, a person skilled in the art can set the size of the variation value of each video block according to actual needs, which is not limited by the embodiment of the present invention.
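One way to realize a change value between 0 and 1 is to measure the fraction of pixels that differ between two corresponding blocks. The sketch below assumes this interpretation; the function name `block_change_value` and the `tolerance` parameter are hypothetical and do not come from the original text.

```python
def block_change_value(block_a, block_b, tolerance=0):
    """Return the fraction of pixels that differ between two equally sized blocks.

    0.0 means the blocks are identical, 1.0 means every pixel differs.
    A pixel counts as changed when its absolute difference exceeds `tolerance`.
    """
    total = 0
    changed = 0
    for row_a, row_b in zip(block_a, block_b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += 1
            if abs(pixel_a - pixel_b) > tolerance:
                changed += 1
    return changed / total if total else 0.0


same = block_change_value([[10, 10], [10, 10]], [[10, 10], [10, 10]])
half = block_change_value([[10, 10], [10, 10]], [[10, 10], [200, 200]])
full = block_change_value([[0, 0], [0, 0]], [[255, 255], [255, 255]])
```

A non-zero `tolerance` would make the measure less sensitive to compression noise, at the cost of missing very subtle scene changes.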

After obtaining the variation values of the corresponding video blocks of the two adjacent frames of video images in the target video, step S1-3 is executed.

Step S1-3: Remove the corresponding video block with the maximum change value, and normalize the sum of squared differences of the remaining corresponding video blocks to obtain a set value.

After obtaining the variation values of each corresponding video block in two adjacent frames of video images in the target video, the video block corresponding to the maximum value in the variation values may be removed, and the sum of squared differences of the remaining corresponding video blocks is normalized to obtain a set value.

According to the embodiment of the invention, the influence of scenes suddenly appearing in a certain video block on subsequent analysis can be avoided by removing the maximum value.
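The text does not spell out the normalization, so the following is only one plausible reading: the largest change value is dropped, the remaining values are squared and summed, and the sum is divided by the number of remaining blocks. The function name `compute_set_value` is hypothetical.

```python
def compute_set_value(change_values):
    """Drop the largest change value, then average the squares of the rest.

    Dividing by the number of remaining blocks is an assumed normalization;
    with change values in [0, 1] the result also stays in [0, 1].
    """
    if len(change_values) < 2:
        raise ValueError("at least two blocks are needed")
    remaining = sorted(change_values)[:-1]  # removes one occurrence of the maximum
    return sum(value * value for value in remaining) / len(remaining)


# An object appears suddenly in a single block: the outlier is ignored.
one_block_changed = compute_set_value([0.0] * 8 + [1.0])
# The whole picture changes: the set value stays high.
all_blocks_changed = compute_set_value([1.0] * 9)
mixed = compute_set_value([0.5, 0.5, 1.0])
```

The first case shows why the maximum is removed: a change confined to one block does not raise the set value, whereas a change across the whole frame does.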

Step S1-4: When the set value is greater than the difference threshold, take the later frame of the two adjacent frames of video images as a video segmentation point.

In the embodiment of the present invention, a difference threshold may be set, that is, a threshold for the change between corresponding video blocks of two adjacent frames of video images, and the set value is compared against it. When the set value is smaller than the difference threshold, the change between the two adjacent frames is small; it is disregarded, and the comparison moves on to the next pair of adjacent frames.

When the set value is greater than the difference threshold, the change between the two adjacent frames is large, and the later frame of the two is taken as a video segmentation point for segmenting the target video.

After the video division point is determined, step S1-5 is performed.

Step S1-5: Segment the target video at the video segmentation points to obtain the resulting video segments.

After the video segmentation points are obtained, the target video may be segmented by the video segmentation points, so as to obtain each video segment after the segmentation processing.

Of course, the above video segmentation scheme is merely one scheme provided by the embodiment of the present invention. In practical applications, a person skilled in the art may also segment the target video in other ways, which is not limited in this embodiment of the present invention.
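Steps S1-4 and S1-5 can be sketched as follows, assuming one set value has already been computed for every pair of adjacent frames. The function names, the threshold of 0.5, and the sample values are hypothetical and chosen only for illustration.

```python
def find_split_points(set_values, threshold):
    """set_values[i] describes frames i and i + 1.

    Returns the index of the later frame of every pair whose set value
    exceeds the difference threshold.
    """
    return [i + 1 for i, value in enumerate(set_values) if value > threshold]


def split_into_segments(frame_count, split_points):
    """Return (start, end) frame ranges, with `end` exclusive."""
    boundaries = [0] + list(split_points) + [frame_count]
    return [(start, end) for start, end in zip(boundaries, boundaries[1:]) if start < end]


# Seven frames give six adjacent pairs (made-up set values).
pair_values = [0.01, 0.02, 0.90, 0.03, 0.75, 0.01]
points = find_split_points(pair_values, threshold=0.5)
segments = split_into_segments(7, points)
```

Here frames 3 and 5 become segmentation points, so the seven frames fall into three segments: frames 0 to 2, frames 3 to 4, and frames 5 to 6.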

After the target video is subjected to the division processing to obtain the video segments, step S2 is executed.

Step S2: Extract the video subtitle text of each video segment in turn.

After acquiring the video segments of the target video, extraction of the video subtitle text may be performed for each video segment.

Specifically, the extraction of the video subtitle text may be performed in two ways:

1. For video segments that have a video subtitle file, the video subtitle text can be extracted directly from the subtitle file corresponding to the video segment.

2. For subtitles that are embedded in the images of the video segment, the subtitle region in each video segment can be selected in turn, for example by taking the lower half of the video segment image and using the bilaterally symmetrical white pixels to bound the rectangular subtitle region; character recognition is then performed on the subtitle region to obtain the video subtitle text.

In practical applications, those skilled in the art may also use other manners to obtain the video subtitle text in each video segment, which is not limited in this embodiment of the present invention.
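The first way, reading the text from a subtitle file, can be illustrated with the widely used SubRip (.srt) layout. The original text does not name a subtitle format, so the choice of SRT, the helper names, and the sample cues below are all assumptions. The second way (character recognition on the subtitle region) would need an OCR library and is not sketched here.

```python
import re

TIMESTAMP = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def to_seconds(stamp):
    """Convert an 'HH:MM:SS,mmm' timestamp to seconds."""
    hours, minutes, seconds, millis = map(int, TIMESTAMP.match(stamp.strip()).groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text):
    """Return a list of (start_seconds, end_seconds, text) cues."""
    cues = []
    for chunk in re.split(r"\n\s*\n", srt_text.strip()):
        lines = chunk.splitlines()
        timing_index = next((i for i, line in enumerate(lines) if "-->" in line), None)
        if timing_index is None:
            continue  # skip chunks without a timing line
        start, end = lines[timing_index].split("-->")
        text = " ".join(line.strip() for line in lines[timing_index + 1:])
        cues.append((to_seconds(start), to_seconds(end), text))
    return cues


def subtitle_text_for_segment(cues, segment_start, segment_end):
    """Join the text of every cue that overlaps the segment's time range."""
    return " ".join(text for start, end, text in cues
                    if start < segment_end and end > segment_start)


SRT_SAMPLE = """1
00:00:01,000 --> 00:00:03,000
True scenery of West Lake

2
00:00:12,500 --> 00:00:14,000
A second line of dialogue
"""

cues = parse_srt(SRT_SAMPLE)
first_segment_text = subtitle_text_for_segment(cues, 0.0, 10.0)
second_segment_text = subtitle_text_for_segment(cues, 10.0, 20.0)
```

Selecting cues by time overlap means a cue that straddles a segmentation point is attributed to both neighbouring segments, which may or may not be desirable.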

After the video subtitle text of the video clip is sequentially extracted for each video clip, step S3 is performed.

Step S3: Perform word segmentation on the video subtitle text to acquire at least one keyword corresponding to the video segment.

After the video subtitle text of each video segment is extracted, word segmentation may be performed on it to extract the keywords of that video segment. For example, if the video subtitle text of video segment A is "true scenery of West Lake", the words "West Lake" and "scenery" may be extracted and used as the keywords of video segment A.

One video segment may correspond to one keyword or to a plurality of keywords, which is not limited in this embodiment of the present invention.
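The original text does not specify a word segmentation algorithm. As a stand-in, the sketch below uses greedy longest-match lookup against a small keyword vocabulary; the vocabulary, the function name `extract_keywords`, and the whitespace tokenization are assumptions, and a production system for Chinese subtitles would more likely rely on a dedicated word segmenter.

```python
def extract_keywords(subtitle_text, vocabulary, max_phrase_words=3):
    """Scan the text left to right, preferring the longest phrase in `vocabulary`.

    Words that are not part of any vocabulary entry are skipped.
    Keywords are returned in order of first appearance, without duplicates.
    """
    words = subtitle_text.lower().split()
    keywords = []
    i = 0
    while i < len(words):
        for size in range(min(max_phrase_words, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + size])
            if candidate in vocabulary:
                if candidate not in keywords:
                    keywords.append(candidate)
                i += size
                break
        else:
            i += 1  # no vocabulary entry starts at this word
    return keywords


vocabulary = {"west lake", "lake", "scenery"}  # hypothetical keyword dictionary
keywords = extract_keywords("true scenery of West Lake", vocabulary)
```

Because longer phrases are tried first, "west lake" is kept as a single keyword rather than being reduced to "lake". Punctuation handling is omitted to keep the sketch short.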

After acquiring at least one keyword corresponding to each video segment, step S4 is performed.

Step S4: Establish and store the association relationship between the video segment and the corresponding at least one keyword.

After at least one keyword corresponding to each video segment is obtained, an association relationship between the at least one keyword and the video segment may be established and stored in the video database. For example, if keyword a is associated with video segment A, and keyword a and keyword b are both associated with video segment B, then keyword a is stored in association with video segment A, and keyword a and keyword b are stored in association with video segment B.

It should be understood that the above examples are only examples for better understanding of the technical solutions of the embodiments of the present invention, and are not to be construed as the only limitations of the present invention.
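The association relationship can be pictured as an inverted index from keywords to video segments. The sketch below uses an in-memory dictionary as a stand-in for the video database; the function name and the identifiers are hypothetical.

```python
from collections import defaultdict


def build_keyword_index(segment_keywords):
    """Map every keyword to the list of segments it is associated with."""
    index = defaultdict(list)
    for segment_id, keywords in segment_keywords.items():
        for keyword in keywords:
            if segment_id not in index[keyword]:
                index[keyword].append(segment_id)
    return dict(index)


# Mirrors the example in the text: segment A has keyword a,
# segment B has keyword a and keyword b.
segment_keywords = {
    "segment A": ["keyword a"],
    "segment B": ["keyword a", "keyword b"],
}
index = build_keyword_index(segment_keywords)
```

Storing the mapping keyed by keyword makes the later lookup in Step 102 a direct dictionary access rather than a scan over all segments.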

After receiving the search text input by the user, step 102 is performed.

Step 102: Match the retrieval text against keywords prestored in a video database.

In the embodiment of the present invention, a video database may be set up in advance. The video database stores each video segment together with the association relationship between the video segments and the keywords. An association may be established between one video segment and one or more keywords, and likewise between one keyword and one or more video segments, which is not limited in this embodiment of the present invention.

Of course, the video database may be stored on the terminal side, or may be stored on the server side.

When the video database is stored on the terminal side and a retrieval text input by the user is received, such as "jinda", the retrieval text can be matched directly against the keywords stored in the video database. Alternatively, when the user inputs a passage of text, such as "short videos related to science fiction types", the retrieval text can be analyzed to extract a keyword, such as "science fiction", and that keyword is then matched directly in the local video database.

When the video database is stored in the server side and the search text input by the user is received, the search text can be sent to the server for matching, the specific matching process is similar to that at the terminal side, and details are not repeated herein.
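A simple way to picture the matching is to check which stored keywords occur in the retrieval text. The sketch below assumes case-insensitive substring matching, which the original text does not prescribe; the function name and the stored keywords are hypothetical.

```python
def match_keywords(retrieval_text, stored_keywords):
    """Return the stored keywords that occur in the retrieval text, sorted."""
    text = retrieval_text.strip().lower()
    return sorted(k for k in stored_keywords if k and k.lower() in text)


stored_keywords = {"science fiction", "west lake", "scenery"}
single_word = match_keywords("science fiction", stored_keywords)
passage = match_keywords("short videos related to science fiction types", stored_keywords)
no_match = match_keywords("cooking", stored_keywords)
```

The same function serves both cases described above: a single-word query and a longer passage from which the keyword has to be picked out. Plain substring matching can over-match short keywords, so a real system would probably segment the query into words first.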

After matching the search text with the keywords pre-stored in the video database, step 103 is performed.

Step 103: When a keyword matching the retrieval text is found, acquire and display at least one video segment corresponding to the keyword.

After the keywords corresponding to the retrieval text input by the user are matched in the video database, at least one video segment corresponding to those keywords can be acquired. For example, suppose the user inputs retrieval text a, which corresponds to keywords b, c, and d in the video database; keyword b is associated with video segment 1 and video segment 2, keyword c with video segment 1 and video segment 3, and keyword d with video segment 1 and video segment 3. The video segments obtained for retrieval text a are then video segment 1, video segment 2, and video segment 3.

It should be understood that the above examples are only examples for better understanding of the technical solutions of the embodiments of the present invention, and are not intended to limit the present invention.
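The example with keywords b, c, and d amounts to taking the union of the segments associated with each matched keyword. A minimal sketch, with a hypothetical function name and a dictionary standing in for the video database:

```python
def segments_for_keywords(matched_keywords, index):
    """Collect the segments of all matched keywords, without duplicates,
    in the order in which they are first encountered."""
    found = []
    for keyword in matched_keywords:
        for segment in index.get(keyword, []):
            if segment not in found:
                found.append(segment)
    return found


# Keyword b -> segments 1 and 2; keywords c and d -> segments 1 and 3.
index = {"b": [1, 2], "c": [1, 3], "d": [1, 3]}
result = segments_for_keywords(["b", "c", "d"], index)
```

Segment 1 is associated with all three keywords but appears only once in the result.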

In a preferred embodiment of the present invention, after the step 103, the method may further include:

Step N1: Acquire the weight of the keyword in the at least one video segment;

Step N2: Sort and display the at least one video segment according to the weight.

In the embodiment of the present invention, the weights of different keywords in one video segment may be preset. For example, if video segment A is associated with keyword 1 and keyword 2, the weight of keyword 1 in video segment A may be 0.6 and the weight of keyword 2 in video segment A may be 0.4. When only one keyword is associated with a video segment, the weight of that keyword in the video segment can be set to 1, and so on.

When a plurality of video segments are obtained, each may be associated with one or more keywords. Once the keywords corresponding to the retrieval text input by the user have been obtained and the video segments have been found, the video segments can be sorted for display according to the weight of the corresponding keywords in each video segment.

Of course, in practical applications, a person skilled in the art may also use other manners to display the searched video segments in an ordered manner, and the embodiment of the present invention is not limited thereto.
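The weight-based ordering can be sketched as follows. The text does not say how the weights of several matched keywords are combined, so summing them is an assumption, as are the function name and the sample weights for segments B and C.

```python
def rank_segments(segments, matched_keywords, weights):
    """Sort segments by the summed weight of the matched keywords, highest first.

    `weights` maps segment -> {keyword: weight}; a missing entry counts as 0.
    """
    def score(segment):
        segment_weights = weights.get(segment, {})
        return sum(segment_weights.get(keyword, 0.0) for keyword in matched_keywords)

    return sorted(segments, key=score, reverse=True)


weights = {
    "segment A": {"keyword 1": 0.6, "keyword 2": 0.4},  # values from the example above
    "segment B": {"keyword 1": 1.0},                    # single keyword, weight 1
    "segment C": {"keyword 2": 1.0},
}
ranked = rank_segments(["segment A", "segment B", "segment C"], ["keyword 1"], weights)
```

For the query keyword "keyword 1", segment B (weight 1.0) is shown before segment A (weight 0.6), and segment C, which does not carry the keyword, comes last.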

According to the video retrieval method provided by the embodiment of the invention, a retrieval text input by a user is received and matched against keywords prestored in a video database, wherein the video database stores the association relationship between the keywords and video segments; when a keyword matching the retrieval text is found, at least one video segment corresponding to the keyword is acquired and displayed. Because the retrieval text input by the user is matched against the video database to obtain the video segments corresponding to the keywords, video materials no longer need to be collected and screened manually, which reduces the waste of human resources and saves cost.

Embodiment Two

Referring to Fig. 2, a schematic structural diagram of a video retrieval apparatus according to an embodiment of the present invention is shown. The apparatus may specifically include:

a receiving module 210, configured to receive a retrieval text input by a user; a matching module 220, configured to match the retrieval text against keywords prestored in a video database, wherein the video database stores the association relationship between the keywords and video segments; and an obtaining and displaying module 230, configured to obtain and display at least one video segment corresponding to the keyword when a keyword matching the retrieval text is found.

Preferably, the segmentation processing module includes: the block processing submodule is used for carrying out block processing on each frame of video image in the target video so as to divide each frame of video image into a plurality of video blocks; the change value acquisition sub-module is used for comparing corresponding video blocks of two adjacent frames of video images in the target video to acquire the change value of each corresponding video block; the set value acquisition sub-module is used for removing the corresponding video block with the maximum value in the change values and normalizing the sum of the square differences of the other corresponding video blocks to obtain a set value; the video division point determining submodule is used for taking a next frame of video image in the two adjacent frames of video images as a video division point when the set value is greater than the difference threshold value; and the video segment acquisition sub-module is used for carrying out segmentation processing on the target video by the video segmentation points to obtain each segmented video segment.

Preferably, the extraction module comprises: the first video subtitle acquisition submodule is used for acquiring subtitle files corresponding to the video clips and extracting video subtitle texts corresponding to the video clips from the subtitle files; or the subtitle area selection submodule is used for sequentially selecting the subtitle areas in the video clips; and the second video subtitle obtaining sub-module is used for carrying out character recognition on the subtitle region of each video segment so as to obtain the video subtitle text corresponding to the video segment.

Preferably, the acquisition and display module 230 includes: the weight obtaining submodule is used for obtaining the weight of the keyword in the at least one video segment; and the sequencing display sub-module is used for sequencing and displaying the at least one video clip according to the weight.

According to the video retrieval device provided by the embodiment of the invention, a retrieval text input by a user is received and matched against keywords prestored in a video database, wherein the video database stores the association relationship between the keywords and video segments; when a keyword matching the retrieval text is found, at least one video segment corresponding to the keyword is acquired and displayed. Because the retrieval text input by the user is matched against the video database to obtain the video segments corresponding to the keywords, video materials no longer need to be collected and screened manually, which reduces the waste of human resources and saves cost.
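The module structure of the apparatus can be pictured as three cooperating classes. This is only an illustrative sketch: the class and method names, the substring matching, and the file names are hypothetical, and a dictionary stands in for the video database.

```python
class ReceivingModule:
    """Receives the retrieval text input by a user (module 210)."""

    def receive(self, raw_text):
        return raw_text.strip()


class MatchingModule:
    """Matches the retrieval text against prestored keywords (module 220)."""

    def __init__(self, keyword_index):
        self.keyword_index = keyword_index

    def match(self, retrieval_text):
        text = retrieval_text.lower()
        return [k for k in self.keyword_index if k and k.lower() in text]


class ObtainingAndDisplayingModule:
    """Obtains and displays the segments of matched keywords (module 230)."""

    def __init__(self, keyword_index):
        self.keyword_index = keyword_index

    def obtain(self, keywords):
        segments = []
        for keyword in keywords:
            for segment in self.keyword_index.get(keyword, []):
                if segment not in segments:
                    segments.append(segment)
        return segments

    def display(self, segments):
        # Returns a numbered listing instead of drawing a user interface.
        return "\n".join(f"{n}. {s}" for n, s in enumerate(segments, start=1))


class VideoRetrievalApparatus:
    """Wires the three modules together."""

    def __init__(self, keyword_index):
        self.receiving = ReceivingModule()
        self.matching = MatchingModule(keyword_index)
        self.presenting = ObtainingAndDisplayingModule(keyword_index)

    def retrieve(self, raw_text):
        text = self.receiving.receive(raw_text)
        keywords = self.matching.match(text)
        return self.presenting.obtain(keywords)


apparatus = VideoRetrievalApparatus({
    "science fiction": ["clip_07.mp4", "clip_12.mp4"],
    "west lake": ["clip_03.mp4"],
})
found = apparatus.retrieve("  short video related to science fiction ")
listing = apparatus.presenting.display(found)
```

Keeping the modules separate mirrors the description above and would let the matching module run on a server while the receiving and displaying modules stay on the terminal.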

While, for purposes of simplicity of explanation, the foregoing method embodiments have been described as a series of acts or combination of acts, it will be appreciated by those skilled in the art that the present invention is not limited by the illustrated ordering of acts, as some steps may occur in other orders or concurrently with other steps in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The video retrieval method and the video retrieval device provided by the invention are described in detail, and the principle and the implementation mode of the invention are explained by applying specific examples, and the description of the embodiments is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims

1. A method for video retrieval, comprising:

receiving a retrieval text input by a user;

matching the retrieval text with keywords prestored in a video database; wherein the keywords are obtained by the following steps:

the method comprises the steps of carrying out blocking processing on each frame of video image in a target video to obtain a change value between corresponding blocks in adjacent frames, removing a corresponding video block with the maximum value in the change values, and when the change value is larger than a difference threshold value, segmenting the image to obtain each video segment;

acquiring a video subtitle text corresponding to the video clip, and processing the video subtitle text to obtain at least one keyword corresponding to the video clip;

the video database stores all video clips of the target video and the association relationship between the keywords and all the video clips;

and when the keywords matched with the retrieval text are matched, acquiring and displaying at least one video clip corresponding to the keywords.

2. The method of claim 1, wherein the obtaining of the video subtitle text corresponding to the video segment comprises:

acquiring a subtitle file corresponding to each video clip, and extracting a video subtitle text corresponding to each video clip from each subtitle file; or

Sequentially selecting a subtitle area in each video clip;

and aiming at the subtitle area of each video clip, performing character recognition on the subtitle area to acquire a video subtitle text corresponding to the video clip.

3. The method according to claim 1, wherein the step of obtaining and presenting at least one video segment corresponding to the keyword comprises:

acquiring the weight of the keyword in the at least one video segment;

and sequencing and displaying the at least one video segment according to the weight.

4. A video retrieval apparatus, comprising:

the receiving module is used for receiving a retrieval text input by a user;

the matching module is used for matching the retrieval text with keywords prestored in a video database; wherein the keywords are obtained by the following steps:

performing block processing on each frame of video image in the target video to obtain a change value between corresponding blocks in adjacent frames, removing a corresponding video block with the maximum value in the change values, and when the change value is greater than a difference threshold value, segmenting the image to obtain each video segment;

the video database stores all video clips of the target video and the association relation between the keywords and the video clips;

and the obtaining and displaying module is used for obtaining and displaying at least one video clip corresponding to the keyword when the keyword matched with the retrieval text is matched.

5. The apparatus of claim 4, wherein the matching module comprises:

the first video subtitle acquisition submodule is used for acquiring subtitle files corresponding to the video clips and extracting video subtitle texts corresponding to the video clips from the subtitle files; or

The subtitle area selection submodule is used for sequentially selecting the subtitle areas in the video clips;

and the second video subtitle obtaining sub-module is used for carrying out character recognition on the subtitle region of each video segment so as to obtain the video subtitle text corresponding to the video segment.

6. The apparatus of claim 4, wherein the acquisition presentation module comprises:

the weight obtaining submodule is used for obtaining the weight of the keyword in the at least one video segment;

and the sequencing display sub-module is used for sequencing and displaying the at least one video segment according to the weight.