CN107277645A - Error correction method and device for subtitle content

Info

Publication number: CN107277645A
Application number: CN201710624479.XA
Authority: CN
Inventors: 王金龙
Original assignee: Guangdong Genius Technology Co Ltd
Current assignee: Guangdong Genius Technology Co Ltd
Priority date: 2017-07-27
Filing date: 2017-07-27
Publication date: 2017-10-20

Abstract

An embodiment of the invention discloses an error correction method and device for subtitle content. The method comprises: extracting first text information corresponding to a target subtitle strip in a video file; recognizing the audio information of the target subtitle strip to obtain corresponding second text information; and performing error correction by text comparison of the first text information and the second text information, and outputting an error correction result. This achieves intelligent error correction of subtitle content and solves the problems of low efficiency and high cost of manual error correction.

Description

Error correction method and device for subtitle content

Technical field

Embodiments of the present invention relate to multimedia technology, and in particular to an error correction method and device for subtitle content.

Background art

When subtitles are produced for audio or video, the subtitle text is usually typed in while watching the video or listening to the audio. Whether the entered subtitle text is consistent with, or corresponds to, the audio content of the video affects the user's experience of watching the video or listening to the audio.

In the prior art, checking is typically done manually: a person cross-checks the subtitles to find problems. Manual error correction is inefficient and costly.

Summary of the invention

Embodiments of the present invention provide an error correction method and device for subtitle content, which achieve intelligent error correction of subtitle content and solve the problems of low efficiency and high cost of manual error correction.

In a first aspect, an embodiment of the present invention provides an error correction method for subtitle content, the method comprising:

extracting first text information corresponding to a target subtitle strip in a video file;

recognizing the audio information of the target subtitle strip to obtain corresponding second text information;

performing error correction by text comparison of the first text information and the second text information, and outputting an error correction result.

Further, extracting the first text information of the target subtitle strip in the video file comprises:

determining whether the current image frame contains a subtitle, and if so, determining the position of the subtitle strip and the start frame and end frame of the subtitle strip;

extracting the first text information of the subtitle strip.

Further, recognizing the audio information of the target subtitle strip to obtain the corresponding second text information comprises:

determining a time interval according to the start frame and the end frame;

parsing and cutting the audio information in the video according to the time interval;

comparing the parsed and cut audio information with a preset text library, and recognizing the second text information corresponding to the audio information.

Further, performing error correction by text comparison of the first text information and the second text information and outputting the error correction result comprises:

comparing the first text information and the second text information one by one in units of characters or words;

recording the characters or words in the second text information that differ from the first text information;

outputting the characters or words as the error correction result.

Further, the preset text library is stored in a server connected to a speech recognition module.

In a second aspect, an embodiment of the present invention provides an error correction device for subtitle content, the device comprising:

an information extraction module, configured to extract first text information corresponding to a target subtitle strip in a video file;

an information recognition module, configured to recognize the audio information of the target subtitle strip to obtain corresponding second text information;

an information comparison module, configured to perform error correction by text comparison of the first text information and the second text information, and to output an error correction result.

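Further, the information extraction module is specifically configured to:

determine whether the current image frame contains a subtitle, and if so, determine the position of the subtitle strip and the start frame and end frame of the subtitle strip;

extract the first text information of the subtitle strip.

Further, the information recognition module is specifically configured to:

determine a time interval according to the start frame and the end frame;

parse and cut the audio information in the video according to the time interval;

compare the parsed and cut audio information with a preset text library, and recognize the second text information corresponding to the audio information.

Further, the information comparison module is specifically configured to:

compare the first text information and the second text information one by one in units of characters or words;

record the characters or words in the second text information that differ from the first text information;

output the characters or words as the error correction result.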

In the embodiments of the present invention, first text information corresponding to a target subtitle strip in a video file is extracted; the audio information of the target subtitle strip is recognized to obtain corresponding second text information; and error correction is performed by text comparison of the first text information and the second text information, with the error correction result being output. This achieves intelligent error correction of subtitle content and solves the problems of low efficiency and high cost of manual error correction.

Brief description of the drawings

Fig. 1 is a flowchart of an error correction method for subtitle content in Embodiment one of the present invention;

Fig. 2 is a flowchart of an error correction method for subtitle content in Embodiment two of the present invention;

Fig. 3 is a flowchart of an error correction method for subtitle content in Embodiment three of the present invention;

Fig. 4 is a schematic structural diagram of an error correction device for subtitle content in Embodiment four of the present invention.

Detailed description

The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present invention rather than the entire structure.

Embodiment one

Fig. 1 is a flowchart of an error correction method for subtitle content provided by Embodiment one of the present invention. This embodiment is applicable to correcting errors in subtitle content. The method may be performed by the error correction device for subtitle content provided by an embodiment of the present invention, and the device may be implemented in software and/or hardware. Referring to Fig. 1, the method may specifically include the following steps:

S110. Extract the first text information corresponding to the target subtitle strip in the video file.

Specifically, when watching a video, the user relies on both the subtitle information in the video and the audio information heard to follow the picture. The subtitle strip is usually located in the lower-middle part of the screen. During playback, multiple subtitle strips appear; at least one of them is selected as the target subtitle strip according to the user's needs, and the first text information corresponding to the target subtitle strip is extracted from the video file. The first text information corresponds one-to-one with the subtitle text in the target subtitle strip.

Optionally, the first text information corresponding to the target subtitle strip is extracted using a texture denoising method. The process is as follows: compute the averaged image of the subtitle strip region over the luminance images of multiple image frames that show the same subtitle; segment the averaged image using the maximum between-class variance method (commonly known as Otsu's method), producing a subtitle region image that contains only black and white connected components; determine which of the two colors in the segmented image is the text region; and finally remove non-text noise.
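The averaging and segmentation steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation described in the patent: it assumes the subtitle region of each frame is already available as an 8-bit grayscale NumPy array, and the function names `mean_subtitle_image`, `otsu_threshold`, and `binarize` are hypothetical. Deciding which color is the text and removing non-text noise are not covered.

```python
import numpy as np


def mean_subtitle_image(frames):
    """Average the luminance of several crops that show the same subtitle."""
    stack = np.stack([np.asarray(f, dtype=np.float64) for f in frames])
    return stack.mean(axis=0)


def otsu_threshold(image):
    """Return the 8-bit threshold that maximizes the between-class variance."""
    pixels = np.clip(np.rint(image), 0, 255).astype(np.uint8).ravel()
    hist = np.bincount(pixels, minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)

    w0 = np.cumsum(prob)              # weight of the class "<= t"
    w1 = 1.0 - w0                     # weight of the class "> t"
    cum_mean = np.cumsum(prob * levels)
    total_mean = cum_mean[-1]

    # Between-class variance; only defined where both classes are non-empty.
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(256)
    between[valid] = (
        (total_mean * w0[valid] - cum_mean[valid]) ** 2
        / (w0[valid] * w1[valid])
    )
    return int(np.argmax(between))


def binarize(image):
    """Split the averaged image into two classes (True = brighter class)."""
    return np.asarray(image) > otsu_threshold(image)


# Usage: two noisy crops of the same subtitle, bright text on a dark background.
frame_a = np.full((4, 6), 10, dtype=np.uint8)
frame_b = np.full((4, 6), 30, dtype=np.uint8)
for frame in (frame_a, frame_b):
    frame[1, 1:3] = 220               # two "text" pixels
mask = binarize(mean_subtitle_image([frame_a, frame_b]))
```

Averaging first is what suppresses the moving background: pixels that belong to the static subtitle keep their value, while background pixels drift toward a mid-level mean before the threshold is chosen.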

S120. Recognize the audio information of the target subtitle strip to obtain the corresponding second text information.

Speech recognition is performed on the audio information corresponding to the target subtitle strip, and the recognition result is taken as the second text information; the second text information thus corresponds to the audio information of the target subtitle strip.

S130. Perform error correction by text comparison of the first text information and the second text information, and output the error correction result.

Specifically, error correction is performed on the first text information and the second text information by text comparison. Optionally, since the second text information is obtained by speech recognition of the audio information, the second text information may be used as the target text information, and the first text information is compared against it. In the comparison result, the parts that differ between the two pieces of text information are identified as the erroneous parts, that is, the error correction result, which is then output.

On the basis of the above technical solution, "performing error correction by text comparison of the first text information and the second text information, and outputting the error correction result" may specifically be:

comparing the first text information and the second text information one by one in units of characters or words; recording the characters or words in the second text information that differ from the first text information; and outputting the characters or words as the error correction result.

Optionally, in a specific implementation of the error correction, the first text information and the second text information may be compared one by one in units of characters or words. In a specific example, a word may be short or long, and the word length is not specifically limited. Note that the shorter the word, the more accurate the comparison result. The differing characters or words are recorded, and the record is output as the error correction result.
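The comparison step can be illustrated with a short sketch. It is an assumption-laden example rather than the patent's procedure: instead of a strict position-by-position comparison it uses the alignment provided by Python's standard `difflib.SequenceMatcher`, so that an inserted or missing unit does not mark everything after it as different. The function name `diff_units` and the shape of the returned records are hypothetical.

```python
from difflib import SequenceMatcher


def diff_units(first_text, second_text, unit="char"):
    """Compare subtitle text with recognized text in characters or words.

    Returns one record per differing span: the units found in the subtitle
    (first text), the units recognized from audio (second text), and the
    position of the span in the subtitle.
    """
    if unit == "char":
        a, b = list(first_text), list(second_text)
    else:  # whitespace-separated words; an assumption for this sketch
        a, b = first_text.split(), second_text.split()

    matcher = SequenceMatcher(None, a, b, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append(
                {"subtitle": a[i1:i2], "recognized": b[j1:j2], "position": i1}
            )
    return differences


# Usage: the subtitle says "mat", the recognized audio says "hat".
result = diff_units("the cat sat on the mat",
                    "the cat sat on the hat", unit="word")
```

For text without spaces between words, such as Chinese, `unit="char"` compares character by character; word-level comparison would need a separate word segmenter, which this sketch does not include.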

Embodiment two

Fig. 2 is a flowchart of an error correction method for subtitle content provided by Embodiment two of the present invention. On the basis of the above embodiment, this embodiment refines "extracting the first text information of the target subtitle strip in the video file". Referring to Fig. 2, the method may specifically include the following steps:

S210. Determine whether the current image frame contains a subtitle; if so, perform S220; if not, return to S210.

Specifically, the current image frame is determined from the video being played, and it is determined whether the current image frame contains a subtitle. If it does not, the process returns and continues checking until a subtitle appears.

S220. Determine the position of the subtitle strip and the start frame and end frame of the subtitle strip.

Specifically, to determine the position of the subtitle strip, the luminance image of the image frame is first acquired and texture maps are generated. For horizontal subtitle strips, the vertical texture map is projected horizontally and differenced to determine first the top and bottom borders and then the left and right borders of the strip, which gives the position of the horizontal subtitle strip. For vertical subtitle strips, the horizontal texture map is projected vertically and differenced to determine first the left and right borders and then the top and bottom borders. Finally, the subtitle strip is denoised and its position is determined.
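The projection idea for a horizontal subtitle strip can be sketched as follows. This is a simplified illustration, not the procedure of the patent: the texture map is assumed to be the absolute difference between horizontally adjacent luminance pixels, the borders are taken where the projection exceeds a fraction of its maximum, and the function name `locate_horizontal_subtitle` and the `ratio` parameter are hypothetical.

```python
import numpy as np


def locate_horizontal_subtitle(luma, ratio=0.5):
    """Return (top, bottom, left, right) pixel bounds of a horizontal strip.

    Returns None when the frame has no texture at all.
    """
    luma = np.asarray(luma, dtype=np.float64)
    # Texture map: strength of the edge between column k and column k + 1.
    texture = np.abs(np.diff(luma, axis=1))

    # Project onto the vertical axis: one value per image row.
    row_profile = texture.sum(axis=1)
    if row_profile.max() == 0:
        return None
    rows = np.flatnonzero(row_profile >= ratio * row_profile.max())
    top, bottom = int(rows[0]), int(rows[-1])

    # Within those rows, project onto the horizontal axis: one value per edge.
    col_profile = texture[top:bottom + 1].sum(axis=0)
    edges = np.flatnonzero(col_profile > 0)
    left = int(edges[0]) + 1      # first pixel after the first edge
    right = int(edges[-1])        # last pixel before the last edge
    return top, bottom, left, right


# Usage: a dark 10x20 frame with striped "text" in rows 7-8, columns 4-14.
frame = np.zeros((10, 20), dtype=np.uint8)
frame[7:9, 4:16:2] = 255
box = locate_horizontal_subtitle(frame)
```

Text produces many closely spaced edges, so rows that cross a subtitle stand out in the row projection even when the rest of the frame has some texture of its own.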

If a subtitle strip exists and the current image frame is a key frame containing the subtitle strip, the start frame of the subtitle strip is determined between the previous key frame and this key frame. The subtitle strip region of this key frame is then matched against the following key frames in turn: as long as the match is consistent, matching continues; once a match is inconsistent, the end frame of the subtitle strip is determined between the previous key frame and the current key frame.
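The matching logic can be sketched in simplified form. The example below is an illustration under assumptions rather than the patent's key-frame search: it compares every frame with the first frame of the current subtitle instead of stepping through key frames and then refining, it assumes each frame has already been reduced to a binary text mask of the subtitle region (or `None` when no subtitle was detected), and it measures similarity by intersection over union. The names `same_subtitle`, `subtitle_spans`, and `min_iou` are hypothetical.

```python
import numpy as np


def same_subtitle(mask_a, mask_b, min_iou=0.8):
    """Treat two binary text masks as the same subtitle if they overlap enough."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return False
    return np.logical_and(a, b).sum() / union >= min_iou


def subtitle_spans(masks, min_iou=0.8):
    """Return (start_frame, end_frame) index pairs, one per subtitle strip."""
    spans = []
    start = None
    for idx, mask in enumerate(masks):
        has_text = mask is not None and bool(np.any(mask))
        # Close the open span when the text disappears or changes.
        if start is not None and (
            not has_text or not same_subtitle(masks[start], mask, min_iou)
        ):
            spans.append((start, idx - 1))
            start = None
        # Open a new span on the first frame that shows text.
        if start is None and has_text:
            start = idx
    if start is not None:
        spans.append((start, len(masks) - 1))
    return spans


# Usage: frames 1-3 show one subtitle, frames 4-5 show a different one.
text_a = np.zeros((4, 8), dtype=bool)
text_a[1, 1:4] = True
text_b = np.zeros((4, 8), dtype=bool)
text_b[2, 4:7] = True
spans = subtitle_spans([None, text_a, text_a, text_a, text_b, text_b, None])
```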

S230. Extract the first text information of the subtitle strip.

S240. Recognize the audio information of the target subtitle strip to obtain the corresponding second text information.

S250. Perform error correction by text comparison of the first text information and the second text information, and output the error correction result.

In this embodiment of the present invention, it is determined whether the current image frame contains a subtitle. If so, the position of the subtitle strip and its start frame and end frame are determined; if not, the check is repeated until a subtitle is detected. Determining the start frame and end frame of the subtitle strip makes it possible to extract the subtitle information in the subtitle strip.

Embodiment three

Fig. 3 is a flowchart of an error correction method for subtitle content provided by Embodiment three of the present invention. On the basis of the above embodiments, this embodiment refines "recognizing the audio information of the target subtitle strip to obtain the corresponding second text information". Referring to Fig. 3, the method may specifically include the following steps:

S310. Determine whether the current image frame contains a subtitle; if so, perform S320; if not, return to S310.

S320. Determine the position of the subtitle strip and the start frame and end frame of the subtitle strip.

S330. Extract the first text information of the subtitle strip.

S340. Determine a time interval according to the start frame and the end frame.

Specifically, a time interval is determined according to the start frame and the end frame. Denoting this interval by T, the time from the start frame to the end frame of the same subtitle strip is T.
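As a small worked example of this step, frame indices can be converted to a time interval when the frame rate is known. The sketch assumes a constant frame rate and zero-based frame indices, and it counts the end frame as displayed until the next frame would begin; the source does not specify any of these, and the function name `frame_interval` is hypothetical.

```python
def frame_interval(start_frame, end_frame, fps):
    """Return (t_start, t_end, T) in seconds for a subtitle strip.

    Assumes zero-based frame indices and a constant frame rate `fps`.
    The end frame is included, so t_end is the instant the end frame stops
    being displayed.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    if end_frame < start_frame:
        raise ValueError("end_frame must not precede start_frame")
    t_start = start_frame / fps
    t_end = (end_frame + 1) / fps
    return t_start, t_end, t_end - t_start


# Usage: a subtitle shown from frame 50 through frame 99 at 25 frames per second.
t_start, t_end, T = frame_interval(50, 99, 25.0)
```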

S350. Parse and cut the audio information in the video according to the time interval.

The audio information in the video is parsed and segmented on the basis of the determined time interval. In a specific example, the audio in the video is divided into several segments of audio information on the basis of the time interval T, and the segmented audio information is parsed.
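Cutting the audio for one subtitle strip can be sketched as an array slice. The example assumes the audio track has already been decoded into a one-dimensional NumPy array of samples at a known sample rate; decoding the track from the video container is outside this sketch, and the function name `cut_audio` is hypothetical.

```python
import numpy as np


def cut_audio(samples, sample_rate, t_start, t_end):
    """Return the samples between t_start and t_end (seconds).

    The bounds are clamped to the available audio, so an interval that runs
    past the end of the track yields a shorter segment instead of an error.
    """
    samples = np.asarray(samples)
    first = max(0, int(round(t_start * sample_rate)))
    last = min(len(samples), int(round(t_end * sample_rate)))
    return samples[first:last]


# Usage: 5 seconds of 16 kHz audio, subtitle displayed from 2.0 s to 4.0 s.
audio = np.arange(16000 * 5)
segment = cut_audio(audio, 16000, 2.0, 4.0)
```

Each segment obtained this way lines up with exactly one subtitle strip, which is what allows the recognized text to be compared with that strip's text alone.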

S360. Compare the parsed and cut audio information with the preset text library, and recognize the second text information corresponding to the audio information.

Specifically, the parsed and cut audio information is compared with the preset text library. Optionally, the preset text library may be obtained through a speech recognition function; in a specific example, it may be obtained by calling the open interface of iFlytek speech recognition. The preset text library stores the correspondence between each piece of audio content and its text information. The parsed and cut audio information is compared with the preset text library, and the second text information corresponding to the audio information is recognized.

Optionally, the preset text library is stored in a server connected to the speech recognition module.

The speech recognition module is connected to the server, and the preset text library is stored in the server. Because the server stores the preset text library, the library can be called in real time as needed.

S370. Perform error correction by text comparison of the first text information and the second text information, and output the error correction result.

In this embodiment of the present invention, a time interval is determined from the start frame and the end frame of the image frames, the audio information in the video is parsed and cut according to the time interval, and the parsed and cut audio information is compared with the preset text library to recognize the second text information corresponding to the audio information. This achieves recognition of the second text information corresponding to the audio information.

Embodiment four

Fig. 4 is a schematic structural diagram of an error correction device for subtitle content provided by Embodiment four of the present invention. The device is suitable for performing the error correction method for subtitle content provided by the embodiments of the present invention. As shown in Fig. 4, the device may specifically include:

an information extraction module 410, configured to extract first text information corresponding to a target subtitle strip in a video file;

an information recognition module 420, configured to recognize the audio information of the target subtitle strip to obtain corresponding second text information;

an information comparison module 430, configured to perform error correction by text comparison of the first text information and the second text information, and to output an error correction result.

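Further, the information extraction module 410 is specifically configured to:

determine whether the current image frame contains a subtitle, and if so, determine the position of the subtitle strip and the start frame and end frame of the subtitle strip;

extract the first text information of the subtitle strip.

Further, the information recognition module 420 is specifically configured to:

determine a time interval according to the start frame and the end frame;

parse and cut the audio information in the video according to the time interval;

compare the parsed and cut audio information with a preset text library, and recognize the second text information corresponding to the audio information.

Further, the information comparison module 430 is specifically configured to:

compare the first text information and the second text information one by one in units of characters or words;

record the characters or words in the second text information that differ from the first text information;

output the characters or words as the error correction result.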

The error correction device for subtitle content provided by this embodiment of the present invention can perform the error correction method for subtitle content provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method performed.

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims

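1. An error correction method for subtitle content, characterized by comprising:

extracting first text information corresponding to a target subtitle strip in a video file;

recognizing the audio information of the target subtitle strip to obtain corresponding second text information;

performing error correction by text comparison of the first text information and the second text information, and outputting an error correction result.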

2. The method according to claim 1, characterized in that extracting the first text information of the target subtitle strip in the video file comprises:

determining whether the current image frame contains a subtitle, and if so, determining the position of the subtitle strip and the start frame and end frame of the subtitle strip;

extracting the first text information of the subtitle strip.

3. The method according to claim 2, characterized in that recognizing the audio information of the target subtitle strip to obtain the corresponding second text information comprises:

determining a time interval according to the start frame and the end frame;

parsing and cutting the audio information in the video according to the time interval;

comparing the parsed and cut audio information with a preset text library, and recognizing the second text information corresponding to the audio information.

4. The method according to claim 1, characterized in that performing error correction by text comparison of the first text information and the second text information and outputting the error correction result comprises:

comparing the first text information and the second text information one by one in units of characters or words;

recording the characters or words in the second text information that differ from the first text information;

outputting the characters or words as the error correction result.

5. The method according to claim 3, characterized in that the preset text library is stored in a server connected to a speech recognition module.

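6. An error correction device for subtitle content, characterized by comprising:

an information extraction module, configured to extract first text information corresponding to a target subtitle strip in a video file;

an information recognition module, configured to recognize the audio information of the target subtitle strip to obtain corresponding second text information;

an information comparison module, configured to perform error correction by text comparison of the first text information and the second text information, and to output an error correction result.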

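7. The device according to claim 6, characterized in that the information extraction module is specifically configured to:

determine whether the current image frame contains a subtitle, and if so, determine the position of the subtitle strip and the start frame and end frame of the subtitle strip;

extract the first text information of the subtitle strip.

8. The device according to claim 7, characterized in that the information recognition module is specifically configured to:

determine a time interval according to the start frame and the end frame;

parse and cut the audio information in the video according to the time interval;

compare the parsed and cut audio information with a preset text library, and recognize the second text information corresponding to the audio information.

9. The device according to claim 6, characterized in that the information comparison module is specifically configured to:

compare the first text information and the second text information one by one in units of characters or words;

record the characters or words in the second text information that differ from the first text information;

output the characters or words as the error correction result.

10. The device according to claim 8, characterized in that the preset text library is stored in a server connected to a speech recognition module.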