CN103945140A - Method and system for generating video captions - Google Patents

Method and system for generating video captions

Info

Publication number: CN103945140A
Authority
CN
China
Legal status: Granted
Application number: CN201310018669.9A
Other languages: Chinese (zh)
Other versions: CN103945140B (en)
Inventor: 赵永刚
Current Assignee: Lenovo Beijing Ltd
Original Assignee: Lenovo Beijing Ltd
Application filed by Lenovo Beijing Ltd
Priority to: CN201310018669.9A
Publication of: CN103945140A
Application granted; publication of: CN103945140B
Legal status: Active

Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and system for generating video captions. The method comprises the steps of: detecting video caption playing type control information; obtaining video caption playing information that matches the control information; determining a video caption animation model corresponding to the playing information; extracting video caption text information; and finally transforming the text information with the animation model to generate the video captions. Because the generated captions carry a caption animation model, a dynamic caption effect is achieved.

Description

Method and system for generating video captions
Technical field
The present invention relates to the technical field of data processing, and more particularly to a method and system for generating video captions.
Background art
Video, including film and television, has become popular rapidly because of the improved visual experience it brings.
In the prior art, however, video captions can only be rendered as a static, flat display; a dynamic display cannot be achieved.
Summary of the invention
In view of this, the present invention provides a method for generating video captions, so as to generate video captions with a dynamic effect.
To achieve this goal, the proposed scheme is as follows:
A method for generating video captions, comprising:
detecting video caption playing type control information;
obtaining video caption playing information that matches the video caption playing type control information;
determining a video caption animation model corresponding to the video caption playing information;
extracting video caption text information;
transforming the video caption text information with the video caption animation model to generate the video captions.
Preferably, detecting the video caption playing type control information comprises:
collecting facial expression information of the voice provider corresponding to the captions in the video.
Preferably, detecting the video caption playing type control information comprises:
receiving video caption playing type control information input by a user.
Preferably, detecting the video caption playing type control information comprises:
collecting the pitch of the voice provider corresponding to the captions in the video;
calculating the pitch variation over a preset time period, and determining the video caption playing type control information corresponding to the pitch variation.
Preferably, extracting the video caption text information comprises:
collecting voice information of the voice provider corresponding to the captions in the video;
recognizing the voice information, and generating the text information corresponding to the voice.
Preferably, before the video captions are generated, the method further comprises:
collecting the speech volume of the voice provider corresponding to the captions in the video;
adjusting a parameter of the video caption animation model according to the speech volume.
A video caption generation system, comprising:
a detector for detecting video caption playing type control information;
a processor for obtaining video caption playing information that matches the video caption playing type control information; determining a video caption animation model corresponding to the video caption playing information; extracting video caption text information; and transforming the video caption text information with the video caption animation model to generate the video captions.
Preferably, the detector is an image collector for collecting facial expression information of the voice provider corresponding to the captions in the video.
Preferably, the detector is a receiver for receiving video caption playing type control information input by a user.
Preferably, the detector is a voice collector for collecting the pitch of the voice provider corresponding to the captions in the video;
and the processor is further configured to obtain the pitch, calculate the pitch variation over a preset time period, and determine the video caption playing type control information corresponding to the pitch variation.
Preferably, the manner in which the processor extracts the video caption text information comprises:
collecting voice information of the voice provider corresponding to the captions in the video;
recognizing the voice information, and generating the text information corresponding to the voice.
Preferably, the processor is further configured, before the video captions are generated, to collect the speech volume of the voice provider corresponding to the captions in the video, and to adjust a parameter of the video caption animation model according to the speech volume.
As can be seen from the above technical scheme, in the video caption generation method disclosed by the invention, the generated video captions carry a caption animation model, so that a dynamic caption effect is achieved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed in the embodiments are briefly introduced below. Apparently, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a video caption generation method disclosed in an embodiment of the present invention;
Fig. 2 is a flow chart of a video caption generation method disclosed in another embodiment of the present invention;
Fig. 3 is a flow chart of a video caption generation method disclosed in another embodiment of the present invention;
Fig. 4 is a flow chart of a video caption generation method disclosed in another embodiment of the present invention;
Fig. 5 is a flow chart of a video caption generation method disclosed in another embodiment of the present invention;
Fig. 6 is a structural diagram of a video caption generation system disclosed in another embodiment of the present invention.
Embodiment
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for generating video captions, so as to generate video captions with a dynamic effect.
Referring to Fig. 1, the video caption generation method disclosed in this embodiment of the present invention comprises the following steps:
S101, detecting video caption playing type control information;
wherein the video caption playing type control information controls how the video captions are generated, and the captions are played in that generated form when they are displayed.
S102, obtaining video caption playing information that matches the video caption playing type control information;
specifically, a correspondence between video caption playing type control information and video caption playing information is pre-stored; after the control information is obtained, the matching playing information is looked up in this correspondence.
S103, determining a video caption animation model corresponding to the video caption playing information;
S104, extracting video caption text information;
specifically, the video caption text information may be pre-stored and fetched when the captions need to be generated; alternatively, the text information may be received as input when the captions need to be generated.
S105, transforming the video caption text information with the video caption animation model to generate the video captions.
When video captions with an animation effect are required, they are generated according to the video caption animation model.
In the video caption generation method disclosed in this embodiment, the generated video captions carry a caption animation model, so that a dynamic caption effect is achieved.
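The five steps S101–S105 can be sketched as a small pipeline. This is an illustrative sketch only: the function names, the control-information values, and the model names ("bounce", "shatter") are invented for the example and are not part of the disclosure.

```python
# S102: pre-stored correspondence between playing-type control
# information and playing information (values are illustrative).
CONTROL_TO_PLAY_INFO = {
    "happy_face": "cheerful",
    "angry_face": "tense",
}

# S103: correspondence between playing information and animation model.
PLAY_INFO_TO_MODEL = {
    "cheerful": "bounce",
    "tense": "shatter",
}

def generate_caption(control_info: str, caption_text: str) -> dict:
    """Run S102-S105 for one caption line (S101/S104 inputs are arguments)."""
    play_info = CONTROL_TO_PLAY_INFO[control_info]   # S102: look up the match
    model = PLAY_INFO_TO_MODEL[play_info]            # S103: pick the animation model
    # S105: "transform" the text with the model; attaching the model name
    # to the text stands in for real rendering here.
    return {"text": caption_text, "animation": model}

caption = generate_caption("happy_face", "Hello!")
# caption == {"text": "Hello!", "animation": "bounce"}
```

A real implementation would replace the string-valued models with renderable animation objects; the two-table lookup mirrors the pre-stored correspondences of S102 and S103.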
Preferably, in the video caption generation method disclosed in this embodiment, the following steps may also be performed before step S105:
collecting the speech volume of the voice provider corresponding to the captions in the video;
adjusting a parameter of the video caption animation model according to the speech volume.
Specifically, the parameter of the video caption animation model controls the degree of the animation effect of the generated captions; when captions with different degrees of animation effect are required, the parameter can be adjusted accordingly.
During playback, the speaker's words correspond to the captions; the speaker's speech volume is collected and the parameter of the animation model is adjusted according to it, generating captions with different degrees of animation effect.
Another embodiment of the present invention also discloses a video caption generation method which, as shown in Fig. 2, comprises the following steps:
S201, collecting facial expression information of the voice provider corresponding to the captions in the video;
specifically, during playback the words of the currently displayed speaker match the captions, while the speaker's facial expression changes with the scene of the video; collecting the facial expression information of the currently displayed speaker therefore ensures that the animation effect of the generated captions matches the current scene.
The facial expression information may comprise a single feature, such as the distance between the pupils, the contour of the eyes, or the shape of the mouth; or it may comprise all features that reflect changes of facial expression, including the eyes, the corners of the mouth, the eyebrows, and so on.
S202, obtaining video caption playing information that matches the facial expression information;
specifically, after the facial expression information is collected, the current scene of the video is inferred by recognizing it; obtaining playing information that matches the expression ensures that the generated captions suit the scene of the video.
For example, when the collected expression shows that the currently displayed speaker is happy, the current scene is a cheerful one; when the expression shows that the speaker is angry, the current scene is a tense one.
When the facial expression information is the distance between the pupils, its magnitude can be analysed to determine the current speaker's mood; when it is the contour of the eyes, the trend of the contour can be analysed; when it is the shape of the mouth, the trend of the mouth shape can likewise be analysed to determine the mood of the currently displayed speaker.
When the facial expression information is composite information comprising all features that reflect expression changes, the facial expression formed by these features can be matched against multiple basic facial expression templates, and the mood indicated by the template with the highest matching degree is taken as the mood of the currently displayed speaker.
Alternatively, a neural network analysis method may be used: the basic facial expressions (generally six of them) serve as the output neurons and the collected facial expression information serves as the input neurons, and the computed facial expression type corresponding to the collected information determines the mood of the currently displayed speaker.
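The template-matching variant described above can be sketched as a nearest-template classifier over a feature vector. The feature encoding (pupil distance, eye openness, mouth curvature) and the template values are assumptions invented for the sketch, not measurements from the patent.

```python
import math

# Illustrative basic expression templates: each maps a mood to an
# assumed feature vector (pupil distance, eye openness, mouth curvature).
TEMPLATES = {
    "happy": (0.5, 0.6, 0.9),   # mouth corners up
    "angry": (0.4, 0.9, 0.1),   # brows drawn, mouth flat
}

def classify_mood(features) -> str:
    """Return the mood whose template is nearest to the collected features."""
    return min(TEMPLATES, key=lambda mood: math.dist(features, TEMPLATES[mood]))

assert classify_mood((0.5, 0.55, 0.85)) == "happy"
```

The neural network alternative would replace `math.dist` matching with a trained classifier whose six output neurons correspond to the basic expressions.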
S203, determining a video caption animation model corresponding to the video caption playing information;
wherein different video caption playing information corresponds to different video caption animation models; after the playing information is obtained, the corresponding animation model must be determined.
For example, when the playing information reflects that the currently displayed speaker's mood is happy, a model with a cheerful bouncing-caption effect can be determined; when it reflects that the mood is angry, a model with a destructive effect can be determined.
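The playing-information-to-model correspondence of S203 amounts to a registry lookup. The sketch below is hypothetical: the model names and default parameters are illustrative stand-ins for the bouncing and destructive-effect models mentioned above, and the static fallback is an added assumption.

```python
# Assumed registry: playing information -> animation model with defaults.
ANIMATION_MODELS = {
    "happy": {"model": "bounce", "amplitude": 1.0},
    "angry": {"model": "shatter", "intensity": 1.0},
}

def model_for(play_info: str) -> dict:
    # Fall back to a plain static caption when no model is registered.
    return ANIMATION_MODELS.get(play_info, {"model": "static"})

assert model_for("happy")["model"] == "bounce"
```

Keeping the per-model parameters in the registry entry is what makes the later volume-based adjustment (step before S205) a simple in-place update.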
S204, extracting video caption text information;
as in the previous embodiment, the text information may be pre-stored and fetched when the captions need to be generated, or received as input when the captions need to be generated.
S205, transforming the video caption text information with the video caption animation model to generate the video captions.
In the video caption generation method disclosed by the invention, the generated captions carry a caption animation model, so that a dynamic caption effect is achieved; moreover, the caption animation model corresponds to the facial expression information of the voice provider of the captions, so that the dynamic effect of the captions matches the speaker's facial expression and strengthens the expressiveness of the picture.
As in the previous embodiment, this embodiment may also comprise, before step S205, the steps of:
collecting the speech volume of the voice provider corresponding to the captions in the video;
adjusting a parameter of the video caption animation model according to the speech volume.
Specifically, the parameter of the animation model controls the degree of the animation effect of the generated captions and can be adjusted when captions with different degrees of animation effect are required.
For example, when the determined animation model is the cheerful bouncing-caption model, the bounce parameter of the model is adjusted according to the collected speech volume, which determines the amplitude with which the captions bounce.
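A minimal sketch of this volume-to-amplitude adjustment: louder speech yields a larger bounce amplitude, clamped to a fixed range. The dB bounds and pixel scale are assumptions chosen for the example, not values from the patent.

```python
def bounce_amplitude(volume_db: float,
                     low_db: float = 40.0,
                     high_db: float = 80.0,
                     max_px: float = 24.0) -> float:
    """Map a collected speech volume (dB) linearly onto a bounce amplitude (px)."""
    frac = (volume_db - low_db) / (high_db - low_db)
    frac = min(1.0, max(0.0, frac))   # clamp to [0, 1]
    return frac * max_px

assert bounce_amplitude(60.0) == 12.0
```

A linear map is the simplest choice; a perceptual (logarithmic) mapping could equally serve as the adjusted parameter.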
Another embodiment of the present invention also discloses a video caption generation method which, as shown in Fig. 3, comprises the following steps:
S301, receiving video caption playing type control information input by a user;
specifically, when the playing type of the generated video captions needs to be controlled manually, it can be set by inputting video caption playing type control information.
S302, obtaining video caption playing information that matches the video caption playing type control information;
likewise, a correspondence between video caption playing type control information and video caption playing information is pre-stored, and the matching playing information is looked up in this correspondence after the control information is obtained.
S303, determining a video caption animation model corresponding to the video caption playing information;
S304, extracting video caption text information;
S305, transforming the video caption text information with the video caption animation model to generate the video captions.
For the detailed process of this embodiment, refer to the two embodiments disclosed above; it is not repeated here.
In the video caption generation method disclosed in this embodiment, the animation model is finally determined according to the control information input by the user, and the caption text information is then transformed with that model to generate the captions; video captions can therefore be generated according to the user's needs.
Another embodiment of the present invention also discloses a video caption generation method which, as shown in Fig. 4, comprises the following steps:
S401, collecting the pitch of the voice provider corresponding to the captions in the video;
specifically, during playback the speaker's mood, and hence speaking pitch, differs with the scene of the video; the current voice provider's mood is judged by collecting the pitch of the voice provider corresponding to the captions over a period of time.
S402, calculating the pitch variation over a preset time period, and determining the video caption playing type control information corresponding to the pitch variation;
specifically, the time period is set according to actual needs, the variation of the pitch collected in that period is calculated, and the playing type control information is determined from the variation.
In general, when the pitch is judged to vary quickly over the preset time period, the speaker's mood is excited or angry, and the determined control information can be control information that gives the video captions an intense animation effect;
when the pitch is judged to vary little, or not at all, over the preset time period, the speaker's state of mind is calm, and the determined control information can be control information that gives the video captions a mild animation effect.
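Steps S401–S402 can be sketched as follows: estimate how fast the pitch changes across the samples collected in the preset window, then map fast variation to an intense control value and slow variation to a mild one. The variation measure (mean absolute change between consecutive samples) and the threshold are assumptions made for the sketch.

```python
def control_from_pitch(pitches_hz, threshold_hz: float = 20.0) -> str:
    """pitches_hz: pitch samples (Hz) collected over the preset time period."""
    if len(pitches_hz) < 2:
        return "mild"   # not enough samples to measure variation
    # Mean absolute change between consecutive samples as the "variation".
    deltas = [abs(b - a) for a, b in zip(pitches_hz, pitches_hz[1:])]
    variation = sum(deltas) / len(deltas)
    return "intense" if variation > threshold_hz else "mild"

assert control_from_pitch([200, 250, 180, 260]) == "intense"
```

The returned string stands in for the playing type control information fed into S403.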
S403, obtaining video caption playing information that matches the video caption playing type control information;
specifically, a correspondence between video caption playing type control information and video caption playing information is pre-stored, and the matching playing information is looked up in this correspondence after the control information is obtained.
S404, determining a video caption animation model corresponding to the video caption playing information;
S405, extracting video caption text information;
S406, transforming the video caption text information with the video caption animation model to generate the video captions.
In this embodiment, the video captions are generated according to the variation of the voice provider's pitch, so that the dynamic effect of the captions matches the speaker's pitch variation, which likewise strengthens the expressiveness of the picture.
In the embodiments corresponding to Fig. 3 and Fig. 4, preferably, the following steps may also be performed before the video captions are generated:
collecting the speech volume of the voice provider corresponding to the captions in the video;
adjusting a parameter of the video caption animation model according to the speech volume.
For the specific process, refer to the embodiments corresponding to Fig. 1 and Fig. 2; it is not repeated here.
Referring to Fig. 5, another embodiment of the present invention also discloses a video caption generation method comprising the following steps:
S501, detecting video caption playing type control information;
S502, obtaining video caption playing information that matches the video caption playing type control information;
S503, determining a video caption animation model corresponding to the video caption playing information;
S504, collecting voice information of the voice provider corresponding to the captions in the video;
S505, recognizing the voice information, and generating the text information corresponding to the voice;
S506, transforming the video caption text information with the video caption animation model to generate the video captions.
In this embodiment, the voice information is collected and recognized during playback, and the corresponding text information is generated; there is no need to pre-store or otherwise obtain the caption text information, which is simpler and more convenient.
For the detailed process of this embodiment, refer to all the embodiments disclosed above; it is not repeated here.
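The recognition path of S504–S506 can be sketched as below. A real system would call an actual speech recognizer at S505; the `recognize` function here is a deliberately trivial stub standing in for one, and the frame format is invented for the example.

```python
def recognize(audio_frames) -> str:
    """Stub recognizer: pretend each collected frame decodes to one word."""
    return " ".join(frame["word"] for frame in audio_frames)

def captions_from_audio(audio_frames, animation_model: str) -> dict:
    text = recognize(audio_frames)                       # S504-S505
    return {"text": text, "animation": animation_model}  # S506

frames = [{"word": "hello"}, {"word": "world"}]
assert captions_from_audio(frames, "bounce") == {
    "text": "hello world", "animation": "bounce"}
```

The point of the sketch is structural: the caption text enters the pipeline from the audio itself rather than from pre-stored text, so only S504–S505 differ from the Fig. 1 method.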
Another embodiment of the present invention also discloses a video caption generation system which, referring to Fig. 6, comprises:
a detector 101 for detecting video caption playing type control information;
a processor 102 for obtaining video caption playing information that matches the video caption playing type control information; determining a video caption animation model corresponding to the video caption playing information; extracting video caption text information; and transforming the video caption text information with the video caption animation model to generate the video captions.
Specifically, after detecting the video caption playing type control information, the detector 101 transmits it to the processor 102. A correspondence between video caption playing type control information and video caption playing information is pre-stored in the processor 102; upon receiving the control information, the processor 102 looks up the matching playing information in this correspondence, then determines the corresponding animation model, extracts the caption text information, and finally transforms the text information with the animation model to generate the video captions.
The processor 102 may pre-store the video caption text information and fetch it when the captions need to be generated; alternatively, the processor 102 receives the text information as input when the captions need to be generated.
In the video caption generation system disclosed in this embodiment, the detector 101 detects the playing type control information and sends it to the processor 102; the processor 102 obtains the matching playing information, determines the corresponding animation model, extracts the caption text information, and transforms it with the model to generate the captions. The video captions generated by the processor 102 thus carry a caption animation model, achieving the dynamic caption effect.
Preferably, the detector 101 in the above embodiment may be an image collector for collecting facial expression information of the voice provider corresponding to the captions in the video.
Specifically, the image collector may be a camera that captures an image of the speaker's face on the screen; either the whole face may be captured, or only part of it, such as the eyes or the mouth.
The processor obtains the image captured by the camera, recognizes it, determines the speaker's current mood, and obtains the video caption playing information that matches the facial expression information.
For the process of recognizing the image and determining the speaker's current mood, refer to the embodiment corresponding to Fig. 2; it is not repeated here.
Alternatively and preferably, the detector 101 in the above embodiment is a receiver for receiving video caption playing type control information input by a user.
Specifically, the receiver may be a communication interface through which the processor is connected to an external device; the user inputs the control information at the human-computer interaction interface of the external device, and the control information is transmitted to the processor through the communication interface.
Again alternatively and preferably, the detector 101 in the above embodiment is a voice collector for collecting the pitch of the voice provider corresponding to the captions in the video.
Specifically, the voice collector may be a speech sensor that collects the frequency, i.e. the pitch, of the speaker's voice. The processor obtains the frequency collected by the speech sensor, calculates the pitch variation over a preset time period, and determines the video caption playing type control information corresponding to the variation.
Specifically, the time period is set according to actual needs, the variation of the pitch collected in that period is calculated, and the playing type control information is determined from the variation.
The processor determines the control information according to how quickly the pitch varies; for the detailed process, refer to the embodiment corresponding to Fig. 4; it is not repeated here.
In all the above embodiments, the processor may extract the video caption text information by pre-storing it and fetching it when the captions need to be generated, or by receiving it as input when the captions need to be generated.
The following may also be included: during playback, the processor collects voice information of the voice provider corresponding to the captions in the video, recognizes it, and generates the corresponding text information. In this way there is no need to additionally store or receive caption text; the text is transformed directly from the video's voice, which is simple and convenient.
Moreover, in all the embodiments disclosed above, before the processor transforms the video caption text information with the animation model to generate the captions, the processor may also perform the following operations:
collecting the speech volume of the voice provider corresponding to the captions in the video;
adjusting a parameter of the video caption animation model according to the speech volume.
Specifically, the parameter of the animation model controls the degree of the animation effect of the generated captions and can be adjusted when captions with different degrees of animation effect are required.
During playback, the speaker's words correspond to the captions; the speaker's speech volume is collected and the parameter of the animation model is adjusted according to it, generating captions with different degrees of animation effect.
Finally, it should also be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between them. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a..." does not exclude the presence of other identical elements in the process, method, article or device comprising that element.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the identical or similar parts the embodiments may refer to one another.
The above description of the disclosed embodiments enables a person skilled in the art to make or use the present invention. Various modifications to these embodiments will be apparent to a person skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (12)

1. A video caption generation method, characterized in that it comprises:
detecting video caption play type control information;
obtaining video caption play information that matches the video caption play type control information;
determining a video caption animation model corresponding to the video caption play information;
extracting video caption text information; and
transforming the video caption text information with the video caption animation model to generate video captions.
2. The method according to claim 1, characterized in that detecting the video caption play type control information comprises:
collecting facial expression information of a speech provider corresponding to the captions in the video.
3. The method according to claim 1, characterized in that detecting the video caption play type control information comprises:
receiving video caption play type control information input by a user.
4. The method according to claim 1, characterized in that detecting the video caption play type control information comprises:
collecting the tone of a speech provider corresponding to the captions in the video; and
calculating the tone variation over a preset time period, and determining the video caption play type control information corresponding to the tone variation.
5. The method according to claim 1, characterized in that extracting the video caption text information comprises:
collecting voice information of a speech provider corresponding to the captions in the video; and
recognizing the voice information, and generating text information corresponding to the voice.
6. The method according to any one of claims 1-5, characterized in that, before generating the video captions, the method further comprises:
collecting the speech volume of a speech provider corresponding to the captions in the video; and
adjusting the parameters of the video caption animation model according to the speech volume.
7. A video caption generation system, characterized in that it comprises:
a detector for detecting video caption play type control information; and
a processor for obtaining video caption play information that matches the video caption play type control information; determining a video caption animation model corresponding to the video caption play information; extracting video caption text information; and transforming the video caption text information with the video caption animation model to generate video captions.
8. The system according to claim 7, characterized in that the detector is an image collector for collecting facial expression information of a speech provider corresponding to the captions in the video.
9. The system according to claim 7, characterized in that the detector is a receiver for receiving video caption play type control information input by a user.
10. The system according to claim 7, characterized in that the detector is a voice collector for collecting the tone of a speech provider corresponding to the captions in the video; and
the processor is further configured to obtain the tone, calculate the tone variation over a preset time period, and determine the video caption play type control information corresponding to the tone variation.
11. The system according to claim 7, characterized in that the manner in which the processor extracts the video caption text information comprises:
collecting voice information of a speech provider corresponding to the captions in the video; and
recognizing the voice information, and generating text information corresponding to the voice.
12. The system according to any one of claims 7-11, characterized in that the processor is further configured to, before generating the video captions, collect the speech volume of a speech provider corresponding to the captions in the video, and adjust the parameters of the video caption animation model according to the speech volume.
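The pipeline of claim 1 — detect control information, match it to play information, select the corresponding animation model, and transform the caption text — can be sketched as follows. All names, the example control-information values, and the table-lookup design are hypothetical illustrations, not part of the claimed system.

```python
from dataclasses import dataclass

@dataclass
class AnimationModel:
    """A stand-in for a video caption animation model."""
    name: str

    def render(self, text: str) -> str:
        # Placeholder: a real model would emit animated caption frames.
        return f"[{self.name}] {text}"

# Illustrative mappings: control information -> play information -> model.
PLAY_INFO = {"happy_face": "bounce", "rising_tone": "grow"}
MODELS = {"bounce": AnimationModel("bounce"), "grow": AnimationModel("grow")}

def generate_caption(control_info: str, caption_text: str) -> str:
    play_info = PLAY_INFO[control_info]   # match control info to play information
    model = MODELS[play_info]             # determine the animation model
    return model.render(caption_text)     # transform the text into a caption
```

In the patent's terms, `control_info` would come from a detector (facial expression, user input, or tone variation), and `caption_text` from speech recognition of the speaker's voice.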
CN201310018669.9A 2013-01-17 2013-01-17 Method and system for generating video captions Active CN103945140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310018669.9A CN103945140B (en) Method and system for generating video captions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310018669.9A CN103945140B (en) Method and system for generating video captions

Publications (2)

Publication Number Publication Date
CN103945140A true CN103945140A (en) 2014-07-23
CN103945140B CN103945140B (en) 2017-11-28

Family

ID=51192596

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310018669.9A Active CN103945140B (en) Method and system for generating video captions

Country Status (1)

Country Link
CN (1) CN103945140B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1711756A (en) * 2002-11-15 2005-12-21 汤姆森许可贸易公司 Method and apparatus for composition of subtitles
CN1908965A (en) * 2005-08-05 2007-02-07 索尼株式会社 Information processing apparatus and method, and program
CN101309390A (en) * 2007-05-17 2008-11-19 华为技术有限公司 Visual communication system, apparatus and subtitle displaying method
US20100097523A1 (en) * 2008-10-22 2010-04-22 Samsung Electronics Co., Ltd. Display apparatus and control method thereof


Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104392633A (en) * 2014-11-12 2015-03-04 国家电网公司 Interpretation control method oriented to power system simulating training
CN104392633B (en) * 2014-11-12 2020-08-25 国家电网公司 Explanation control method for power system simulation training
CN108419141A (en) * 2018-02-01 2018-08-17 广州视源电子科技股份有限公司 Subtitle position adjusting method and device, storage medium and electronic equipment
CN108419141B (en) * 2018-02-01 2020-12-22 广州视源电子科技股份有限公司 Subtitle position adjusting method and device, storage medium and electronic equipment
CN111507143A (en) * 2019-01-31 2020-08-07 北京字节跳动网络技术有限公司 Expression image effect generation method and device and electronic equipment
CN111507143B (en) * 2019-01-31 2023-06-02 北京字节跳动网络技术有限公司 Expression image effect generation method and device and electronic equipment
US12020469B2 (en) 2019-01-31 2024-06-25 Beijing Bytedance Network Technology Co., Ltd. Method and device for generating image effect of facial expression, and electronic device
CN110990623A (en) * 2019-12-04 2020-04-10 广州酷狗计算机科技有限公司 Method and device for displaying audio subtitles, computer equipment and storage medium
CN110990623B (en) * 2019-12-04 2024-03-01 广州酷狗计算机科技有限公司 Audio subtitle display method and device, computer equipment and storage medium
CN111814540A (en) * 2020-05-28 2020-10-23 维沃移动通信有限公司 Information display method and device, electronic equipment and readable storage medium
CN113301428A (en) * 2021-05-14 2021-08-24 上海樱帆望文化传媒有限公司 Live caption device for electric competition events
WO2023071349A1 (en) * 2021-10-27 2023-05-04 海信视像科技股份有限公司 Display device

Also Published As

Publication number Publication date
CN103945140B (en) 2017-11-28

Similar Documents

Publication Publication Date Title
US11281709B2 (en) System and method for converting image data into a natural language description
CN110531860B (en) Animation image driving method and device based on artificial intelligence
CN103945140A (en) Method and system for generating video captions
CN110110104B (en) Method and device for automatically generating house explanation in virtual three-dimensional space
CN1848106B (en) Method and device for providing information
US10373648B2 (en) Apparatus and method for editing content
CN108076290B (en) Image processing method and mobile terminal
CN105335465B (en) A kind of method and apparatus showing main broadcaster's account
US11257293B2 (en) Augmented reality method and device fusing image-based target state data and sound-based target state data
CN112598780B (en) Instance object model construction method and device, readable medium and electronic equipment
CN104700860A (en) Rhythm imaging method and system
CN111107278B (en) Image processing method and device, electronic equipment and readable storage medium
CN109841217A (en) A kind of AR interactive system and method based on speech recognition
CN111541951B (en) Video-based interactive processing method and device, terminal and readable storage medium
CN109343695A (en) Exchange method and system based on visual human's behavioral standard
CN109286848B (en) Terminal video information interaction method and device and storage medium
CN113436602A (en) Virtual image voice interaction method and device, projection equipment and computer medium
CN113033245A (en) Function adjusting method and device, storage medium and electronic equipment
KR20120020006A (en) Method for displaying graphical user interface and portable terminal thereof
CN103000054B (en) Intelligent teaching machine for kitchen cooking and control method thereof
CN110544287A (en) Picture matching processing method and electronic equipment
CN113205569B (en) Image drawing method and device, computer readable medium and electronic equipment
CN112149599B (en) Expression tracking method and device, storage medium and electronic equipment
CN113031813A (en) Instruction information acquisition method and device, readable storage medium and electronic equipment
CN113763925A (en) Speech recognition method, speech recognition device, computer equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant