CN102208023B - Method for recognizing and designing video captions based on edge information and distribution entropy - Google Patents

Method for recognizing and designing video captions based on edge information and distribution entropy Download PDF

Info

Publication number
CN102208023B
CN102208023B  CN201110024330A
Authority
CN
China
Prior art keywords
area
edge
row
captions
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201110024330
Other languages
Chinese (zh)
Other versions
CN102208023A (en)
Inventor
魏宝刚
庄越挺
袁杰
鲁伟明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN 201110024330 priority Critical patent/CN102208023B/en
Publication of CN102208023A publication Critical patent/CN102208023A/en
Application granted granted Critical
Publication of CN102208023B publication Critical patent/CN102208023B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method for recognizing video captions based on edge information and distribution entropy. The method comprises the following steps: acquiring image edge information with a corner-reinforced edge detection method; connecting edge points, collecting connected regions, and splitting them appropriately with a segmentation algorithm; obtaining the accurate positions of the connected regions with a refinement operation; filtering out non-text areas with a trailing filter and a combination entropy filter, so that the remaining areas are text areas; and setting the detected text areas to the uniform format of white characters on a black background, then carrying out local threshold binarization, edge-noise expansion removal constrained by forbidden expansion points, and noise removal based on counting surrounding edge points, to obtain a binary image that is sent to OCR (optical character recognition) software for recognition. The method overcomes the sensitivity of common methods to the language, the caption arrangement and the background complexity; the segmentation algorithm and the combination entropy filter yield good detection results; and improving the traditional binarization method greatly increases the recognition accuracy.

Description

Video caption recognition method based on edge information and distribution entropy
Technical field
The present invention relates to a video caption recognition method based on edge information and distribution entropy. The method detects and extracts captions from video for OCR recognition, and belongs to the field of computer image processing.
Background technology
With the development of multimedia and the electronics industry, more and more video data is being produced, and organizing and retrieving it effectively has become a difficult problem. Much video data, such as TV news, sports events, films and variety shows, carries caption information added in post-production, and this caption information is closely related to the video content. If these captions can be recognized effectively, they can be used to organize and retrieve the video data, which has strong practical value.
Video caption recognition is divided into four steps: caption detection, caption localization, caption extraction and OCR recognition. Caption detection determines the caption areas; caption localization locates the exact position of each caption line; caption extraction binarizes the caption area so that only stroke pixels are kept; the final step is generally handed to commercial OCR software. Caption detection methods can be divided into four kinds: edge-based methods, connected-component-based methods, color-clustering-based methods and texture-based methods. Edge-based methods detect text edges with edge filters and then merge them with morphological operations. A method published at the 8th International Conference on Document Analysis and Recognition (In Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR), 2005, 610-614) uses edge detection to obtain four edge maps, then detects candidate text areas with the K-MEANS algorithm, and finally determines and refines the text areas with heuristic rules and projection analysis. Without a complex background, edge-based methods can perform well, but when the background contains many edges their performance degrades. Texture-based methods extract texture features with Gabor filters, wavelet transforms or the Fast Fourier Transform (FFT), and then detect caption areas with machine-learning methods such as neural networks or SVM classifiers. A method published in the proceedings of the IEEE International Conference on Communication Technology (In Proceedings of the IEEE International Conference on Communication Technology (ICCT), 2008, 722-725) uses the Haar wavelet transform, merging the wavelet coefficients of four small blocks into one large block to locate large-font text, and then strengthens the result with morphological dilation and a neural network. Connected-component-based methods divide a frame into many small connected components and then merge them into larger ones to locate captions. A method published in the proceedings of the ACM International Multimedia Conference (In Proceedings of the ACM International Multimedia Conference and Exhibition 2007 (MM), 847-850) uses confidence-based color clustering to remove noise and adaptively selects the color plane with the best text contrast for binarization. Color-clustering-based methods assume that the text color within a video frame is uniform; this assumption is invalid in most cases, so their applicability is limited. Because caption detection with a single feature is unsatisfactory, many methods combine several of the above features. For caption localization, gray-level projection is generally used. Caption extraction methods can be divided into color-based and stroke-based methods. Many color-based methods binarize the gray-scale map with the Otsu method, but when the gray levels of the captions and the background are very close, this method cannot separate them well and therefore cannot remove noise well.
A method published in IEEE Transactions on Circuits and Systems for Video Technology (2005, 15(2): 243-255) and IEEE Transactions on Image Processing (2009, 18(2): 401-411) uses a local adaptive threshold with better resolving power, combined with dam-point marking and inward filling, so that most noise points can be removed.
The caption detection methods above are all good attempts at video caption detection, but they do not separate captions from the background well, and they handle videos with varying languages, fonts and text alignments poorly when used on their own. In addition, although existing caption extraction methods can remove most of the noise, OCR software is very sensitive to noise points, so text recognition under complex backgrounds remains poor.
Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art and provide a video caption recognition method based on edge information and distribution entropy.
The steps of the video caption recognition method based on edge information and distribution entropy are as follows:
1) Detect the difference between the current frame and the last processed frame; if the difference is large, perform the following caption recognition operations, otherwise take the next frame and repeat the judgment;
2) Caption recognition first performs caption detection: edge detection, edge point connection, connected-region collection and segmentation, connected-region refinement and the trailing filter are used to obtain candidate text areas and their positions, and the combination entropy filter then removes non-text areas so that only caption areas remain;
3) Perform repeatability detection on each caption area; if the area is not a repeat, unify its color polarity to white characters on a black background and then perform caption extraction, otherwise process the next caption area;
4) In caption extraction, binarize the polarity-unified caption area, remove noise points, and send the result to OCR software for recognition.
The detection of the difference between the current frame and the last processed frame, with caption recognition performed when the difference is large and the next frame taken otherwise, proceeds as follows: let the current frame be I_i with edge binary map E_i, and let the last processed frame, five frames earlier, be I_{i-5} with edge binary map E_{i-5}. Let D_{i,i-5} = E_i ⊕ E_{i-5}, let the j-th caption area detected last time be Area_{i-5,j}, and let pMES be the minimum of the accumulated edge counts of the last detected caption areas. The accumulated difference of the caption areas in the current frame is calculated as:

cFD = Σ_j D_{i,i-5}(Area_{i-5,j})    (1)

If cFD is less than or equal to pMES × 0.5, this frame does not need caption recognition and the judgment continues with the frame five frames later; otherwise caption recognition is performed on this frame. To further avoid missing captions, a counter ck is kept: each time cFD is less than or equal to pMES × 0.5, ck is increased by 1, otherwise ck is reset to 0. When ck reaches 5, caption recognition is performed on the frame regardless of the preceding judgment, and ck is reset to 0.
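A minimal sketch of this frame-skipping logic, written in Python with NumPy; the function and variable names (caption_change_detector, need_recognition, edge_xor, caption_rects) are hypothetical, and edge_xor stands for the element-wise XOR of the current and the five-frames-earlier edge binary maps.

```python
import numpy as np

def caption_change_detector(pMES, threshold_ratio=0.5, force_every=5):
    """Stateful check deciding whether a frame needs caption recognition.

    pMES: minimum accumulated edge count over the last detected caption areas.
    Returns a closure taking the XOR of the current and previous edge maps
    plus the previously detected caption rectangles (t, b, l, r).
    """
    state = {"ck": 0}

    def need_recognition(edge_xor, caption_rects):
        # cFD: accumulated edge differences inside the old caption areas (formula 1)
        cFD = sum(int(edge_xor[t:b, l:r].sum()) for (t, b, l, r) in caption_rects)
        if cFD <= pMES * threshold_ratio:
            state["ck"] += 1
            if state["ck"] == force_every:   # force a check so captions are not missed
                state["ck"] = 0
                return True
            return False
        state["ck"] = 0
        return True

    return need_recognition
```

The closure keeps ck between calls, so every fifth skipped comparison still sends the frame to caption recognition.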
The caption detection step, in which edge detection, edge point connection, connected-region collection and segmentation, connected-region refinement and the trailing filter are used to obtain candidate text areas and their positions, and the combination entropy filter then removes non-text areas so that only caption areas remain, is as follows:
(1) Edge detection
Given an image I, edges are detected with the Sobel operator, which consists of gradient templates in four directions: horizontal S_H, vertical S_V, diagonal S_LD and anti-diagonal S_RD. The edge field is calculated as:

S = MAX(|S_H|, |S_V|, |S_LD|, |S_RD|) + k × |S_{⊥-MAX}|    (2)

where S_{⊥-MAX} denotes the response at pixel (x, y) in the direction perpendicular to the direction with the greatest gradient magnitude, and k is an adjustment factor, set to 1 here. S is then quantized into 16 levels, written S′ after quantization, and the edge map EdgeMap is obtained as:

EdgeMap(x, y) = 1 if S′(x, y) ≥ 15, 0 if S′(x, y) < 15    (3)
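A sketch of the corner-reinforced edge detection of formulas 2 and 3, written in Python with NumPy and SciPy. It assumes one common choice of the four directional Sobel kernels and a linear scaling for the 16-level quantization, neither of which the text spells out.

```python
import numpy as np
from scipy.ndimage import convolve

# Four directional Sobel-style kernels (one common choice; the patent only
# names the directions, so the exact coefficients are an assumption).
S_H  = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
S_V  = S_H.T
S_LD = np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float)
S_RD = np.fliplr(S_LD)
PERP = np.array([1, 0, 3, 2])   # index of the perpendicular direction for each kernel

def corner_enhanced_edge_map(gray, k=1.0):
    """Formulas (2)-(3): max directional Sobel response plus a corner-enhancing
    perpendicular term, 16-level quantization, keep only the strongest level."""
    resp = np.stack([np.abs(convolve(gray.astype(float), ker))
                     for ker in (S_H, S_V, S_LD, S_RD)])
    max_dir = resp.argmax(axis=0)
    s = resp.max(axis=0)
    perp = np.take_along_axis(resp, PERP[max_dir][None], axis=0)[0]
    s = s + k * perp                                   # corners respond in both directions
    s_q = np.floor(s / (s.max() + 1e-9) * 15).astype(int)  # quantize to 0..15
    return (s_q >= 15).astype(np.uint8)                # EdgeMap
```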
(2) Edge point connection
For the edge map EdgeMap, if the distance between two edge points in the same row is less than a threshold T_d, all pixel values between these two pixels in EdgeMap are set to 1, i.e. the pixels between the two edge points are filled. T_d is determined by:

T_d = max(4, min([max(height, width) / 50], 16))    (4)

where height and width are the height and width of the image I;
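A short sketch of the edge point connection step under the threshold of formula 4; the bracket in the formula is read as integer division here, which is an assumption.

```python
import numpy as np

def connect_edge_points(edge_map):
    """Fill the gap between two edge points on the same row when their
    distance is below T_d (formula 4); this merges character strokes into
    horizontal runs that later form connected regions."""
    h, w = edge_map.shape
    t_d = max(4, min(max(h, w) // 50, 16))
    out = edge_map.copy()
    for y in range(h):
        xs = np.flatnonzero(edge_map[y])
        for a, b in zip(xs[:-1], xs[1:]):
            if b - a < t_d:
                out[y, a:b + 1] = 1
    return out
```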
(3) Connected-region collection and segmentation
The EdgeMap obtained in the previous step is used for connected-region collection. Connected regions whose height or width is less than 1% of the height or width of the whole image are removed, as are those whose minimum enclosing rectangle is smaller than 0.2% of the whole image area. Each remaining connected region C is then segmented with the following steps (a sketch of the split search appears after this list):
a) For every row i ∈ [t_c, b_c] in C, compute the area Area_up(i) of the minimum enclosing rectangle of this row and the part above it and the area Area_down(i) of the minimum enclosing rectangle of the part below this row, take their sum, find the row that minimizes the sum and store its row number in bR;
b) For every column j ∈ [l_c, r_c] in C, compute the area Area_left(j) of the minimum enclosing rectangle of this column and the part to its left and the area Area_right(j) of the minimum enclosing rectangle of the part to its right, take their sum, find the column that minimizes the sum and store its column number in bC;
c) Let mRA = Area_up(bR) + Area_down(bR) and mCA = Area_left(bC) + Area_right(bC). If mRA < mCA, split the connected region C into two connected regions along the rows with row bR as the boundary; otherwise split C into two connected regions along the columns with column bC as the boundary;
where t_c, b_c, l_c and r_c are the upper-bound row number, lower-bound row number, left-bound column number and right-bound column number of region C;
To prevent over-segmentation, the split is performed only when the connected region C satisfies both of the following conditions:
① the fill rate of the connected region is less than 0.8;
② the areas of the two new connected regions are both greater than 0.2% of the whole image area;
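The following sketch illustrates steps a) to c) together with the two anti-over-segmentation conditions, for a single connected region given as a boolean mask; split candidates whose pieces would violate the 0.2% area condition are simply skipped, which is one reading of the conditions.

```python
import numpy as np

def best_split(comp_mask, img_area, fill_thresh=0.8, min_frac=0.002):
    """Find the row or column along which splitting the region gives the
    smallest total bounding-rectangle area.  Returns ('row', index),
    ('col', index) or None when the anti-over-segmentation rules forbid it."""
    ys, xs = np.nonzero(comp_mask)
    t, b, l, r = ys.min(), ys.max(), xs.min(), xs.max()
    rect_area = (b - t + 1) * (r - l + 1)
    if comp_mask.sum() / rect_area >= fill_thresh:      # fill rate already high
        return None

    def bbox_area(mask):
        yy, xx = np.nonzero(mask)
        if yy.size == 0:
            return 0
        return (yy.max() - yy.min() + 1) * (xx.max() - xx.min() + 1)

    def scan(axis):
        best = None
        lo, hi = (t, b) if axis == 'row' else (l, r)
        for i in range(lo, hi + 1):
            if axis == 'row':
                a1, a2 = bbox_area(comp_mask[:i + 1]), bbox_area(comp_mask[i + 1:])
            else:
                a1, a2 = bbox_area(comp_mask[:, :i + 1]), bbox_area(comp_mask[:, i + 1:])
            if min(a1, a2) <= min_frac * img_area:      # new pieces would be too small
                continue
            if best is None or a1 + a2 < best[0]:
                best = (a1 + a2, i)
        return best

    row_best, col_best = scan('row'), scan('col')
    if row_best is None and col_best is None:
        return None
    if col_best is None or (row_best is not None and row_best[0] < col_best[0]):
        return ('row', row_best[1])
    return ('col', col_best[1])
```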
(4) Connected-region refinement and trailing filter
Before region refinement, connected regions whose height is more than 2 times their width are removed first; this may mistakenly delete vertically arranged captions, so to handle vertical captions the image only needs to be rotated by 90 degrees, with all other operations identical;
For each connected region C obtained in the previous step, its position is further refined as follows (a sketch appears after step k):
Input: edge map EdgeMap, initial upper and lower bound row numbers t_c, b_c of the connected region C
Output: refined upper and lower bound row numbers ut_c, ub_c
d) For every row i ∈ [t_c, b_c] of C, compute the span r_i − l_i between its leftmost and rightmost non-zero pixels in EdgeMap and store it in the array cSA;
e) For every row i ∈ [t_c, b_c] of C, compute the number of edge pixels of that row in EdgeMap and store it in the array oPNA, i.e. oPNA[i] = Σ_j EdgeMap(i, j);
f) Store the maximum of cSA in pCS and its row number in pSRN; store the maximum of oPNA in pOPN and its row number in pPRN;
g) Over all rows i ∈ [t_c, pSRN], take the largest row number t_1 with cSA[i] < pCS × η_1;
over all rows i ∈ [pSRN, b_c], take the smallest row number b_1 with cSA[i] < pCS × η_1;
over all rows i ∈ [t_c, pPRN], take the largest row number t_2 with oPNA[i] < pOPN × η_2;
over all rows i ∈ [pPRN, b_c], take the smallest row number b_2 with oPNA[i] < pOPN × η_2;
h) Let ut_c = max(t_1, t_2) and ub_c = min(b_1, b_2); these are the refined upper and lower bound positions,
where η_1 and η_2 usually take the values 0.6 and 0.3;
The following trailing filter is used to remove some non-caption connected regions:
i) After step g), continue scanning oPNA upward and downward until the value of the current row is less than pOPN × η_3, and let the resulting row numbers be t_tail and b_tail;
j) Compute the tail length as:
tl_1 = t_2 − t_tail, tl_2 = b_tail − b_2, tl = max(tl_1, tl_2)
k) Filter with the following formula; if deleteFlag(C) is 1, the connected region is not a caption area and should be deleted;

deleteFlag(C) = 1 if tl > (ub_c − ut_c) × η_4, 0 otherwise    (5)

where ut_c and ub_c are the refined upper and lower bound positions of C, and η_3 and η_4 usually take the values 0.2 and 0.3;
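A sketch of steps d) to k) for one candidate region; the handling of the case where no row falls below a threshold, and the exact direction of the tail scan, are not fully specified in the text and are interpreted here.

```python
import numpy as np

def refine_and_tail_filter(edge_map, t_c, b_c, l_c, r_c,
                           eta1=0.6, eta2=0.3, eta3=0.2, eta4=0.3):
    """Tighten the top/bottom bounds of a candidate region using the per-row
    pixel span (cSA) and per-row edge count (oPNA), then flag the region for
    deletion if its 'tail' of weak rows is too long relative to its height."""
    sub = edge_map[t_c:b_c + 1, l_c:r_c + 1]
    oPNA = sub.sum(axis=1).astype(float)
    cSA = np.zeros(len(oPNA))
    for k, row in enumerate(sub):
        xs = np.flatnonzero(row)
        cSA[k] = xs[-1] - xs[0] if xs.size else 0

    pSRN, pPRN = int(cSA.argmax()), int(oPNA.argmax())
    pCS, pOPN = cSA[pSRN], oPNA[pPRN]

    def bound(arr, peak, thr, side):
        idx = np.flatnonzero(arr < thr)
        cand = idx[idx <= peak] if side == 'top' else idx[idx >= peak]
        if cand.size == 0:                       # fallback when nothing is below the
            return 0 if side == 'top' else len(arr) - 1   # threshold (assumption)
        return cand.max() if side == 'top' else cand.min()

    t1 = bound(cSA, pSRN, pCS * eta1, 'top');  b1 = bound(cSA, pSRN, pCS * eta1, 'bot')
    t2 = bound(oPNA, pPRN, pOPN * eta2, 'top'); b2 = bound(oPNA, pPRN, pOPN * eta2, 'bot')
    ut, ub = max(t1, t2), min(b1, b2)

    # tail scan: walk outwards in oPNA while the row count stays above pOPN*eta3
    t_tail, b_tail = t2, b2
    while t_tail - 1 >= 0 and oPNA[t_tail - 1] >= pOPN * eta3:
        t_tail -= 1
    while b_tail + 1 < len(oPNA) and oPNA[b_tail + 1] >= pOPN * eta3:
        b_tail += 1
    tl = max(t2 - t_tail, b_tail - b2)
    delete = tl > (ub - ut) * eta4               # formula (5)
    return t_c + ut, t_c + ub, bool(delete)
```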
(5) Combination entropy filter
A combination entropy filter that combines the foreground-pixel distribution entropy and the edge-pixel distribution entropy is used, keeping only caption areas (a sketch follows below);
For the foreground-pixel distribution entropy, the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of a connected region C, where t_c and b_c are the upper and lower bounds and l_c and r_c the left and right bounds, is binarized with the Otsu threshold and then divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed as:

E_FPD = −Σ_{i,j} ( p_{i,j} ln p_{i,j} + (1 − p_{i,j}) ln(1 − p_{i,j}) ),  i ∈ {1, 2}, j ∈ {1, 2, 3, 4}    (6)

where p_{i,j} is the ratio of non-zero pixels in the part in row i and column j;
For the edge-pixel distribution entropy, the Sobel edge binary map inside the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of C is divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed as:

E_E = −Σ_{i,j} ( (e_{ij}/e_r) ln(e_{ij}/e_r) + (1 − e_{ij}/e_r) ln(1 − e_{ij}/e_r) ),  i ∈ {1, 2}, j ∈ {1, 2, 3, 4}    (7)

where e_{ij} is the number of edge pixels in the part in row i and column j, and e_r is the total number of edge pixels over the 8 parts;
For any refined connected region C, if E_FPD > E_T1 and E_E > E_T2 it is considered a caption area, otherwise it is a non-caption area and is deleted; experiments show the results are best when E_T1 and E_T2 take the values 6.4 and 2.76;
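A sketch of the combination entropy filter of formulas 6 and 7 over the 2 × 4 grid, with the thresholds 6.4 and 2.76 exposed as parameters; function names are illustrative.

```python
import numpy as np

def block_entropy(ratios):
    """-sum(p*ln(p) + (1-p)*ln(1-p)) over the eight blocks (formulas 6-7)."""
    p = np.clip(np.asarray(ratios, dtype=float), 1e-12, 1 - 1e-12)
    return float(-np.sum(p * np.log(p) + (1 - p) * np.log(1 - p)))

def combined_entropy_keep(binary_fg, edge_bin, e_t1=6.4, e_t2=2.76):
    """Return True when a refined region looks like a caption: both the
    foreground-pixel and the edge-pixel distribution entropies over a
    2 x 4 grid exceed their thresholds."""
    def blocks(img):
        h, w = img.shape
        ys = np.linspace(0, h, 3, dtype=int)        # 2 rows of blocks
        xs = np.linspace(0, w, 5, dtype=int)        # 4 columns of blocks
        return [img[ys[i]:ys[i+1], xs[j]:xs[j+1]] for i in range(2) for j in range(4)]

    fg_ratios = [b.mean() for b in blocks(binary_fg)]              # p_ij
    e_counts = np.array([b.sum() for b in blocks(edge_bin)], dtype=float)
    e_ratios = e_counts / max(e_counts.sum(), 1.0)                 # e_ij / e_r
    return block_entropy(fg_ratios) > e_t1 and block_entropy(e_ratios) > e_t2
```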
For images that contain both horizontally and vertically arranged captions, caption detection is performed again on the image rotated by 90 degrees, the detection results from the original image and the rotated image are then merged, and duplicates are eliminated.
The step of performing repeatability detection on a caption area, unifying its color polarity to white characters on a black background if the area is not a repeat and then performing caption extraction, and otherwise processing the next caption area, is as follows:
(6) Repeatability detection
A method combining position and gray-level histograms is used to de-duplicate the detected caption areas (a sketch follows), with the following steps:
l) Extract and store all caption area positions Rect_i[t_i, b_i, l_i, r_i] of the last processed frame and their gray-level histograms GH_i{g_{i,0}, g_{i,1}, …, g_{i,255}}, where g_{i,k} is the number of pixels in the i-th caption area whose gray level is k; extract and store all caption area positions Rect_j[t_j, b_j, l_j, r_j] of the current frame and their gray-level histograms GH_j{g_{j,0}, g_{j,1}, …, g_{j,255}};
m) Compute their location similarity Simi_Loc(i, j), defined from the area of Rect_i ∩ Rect_j, i.e. their common part, and the area of the larger of the two rectangles, and their gray-level histogram similarity Simi_GHis(i, j). If either Simi_Loc(i, j) or Simi_GHis(i, j) is greater than 0.8, the two areas are duplicate detections of the same area, and one of them is removed while the other is kept;
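A sketch of step m); the text does not reproduce the gray-level histogram similarity formula, so a normalized histogram intersection is used here as a stand-in, which is an assumption.

```python
import numpy as np

def location_similarity(rect_a, rect_b):
    """Intersection area divided by the larger of the two rectangle areas."""
    ta, ba, la, ra = rect_a
    tb, bb, lb, rb = rect_b
    ih = max(0, min(ba, bb) - max(ta, tb))
    iw = max(0, min(ra, rb) - max(la, lb))
    area_a, area_b = (ba - ta) * (ra - la), (bb - tb) * (rb - lb)
    return ih * iw / max(area_a, area_b, 1)

def histogram_similarity(gh_a, gh_b):
    """Stand-in similarity between two 256-bin gray histograms
    (normalized histogram intersection; the exact formula is not given)."""
    a, b = np.asarray(gh_a, float), np.asarray(gh_b, float)
    return np.minimum(a, b).sum() / max(a.sum(), b.sum(), 1.0)

def is_duplicate(rect_a, gh_a, rect_b, gh_b, thresh=0.8):
    """Two caption areas are considered the same when either the location
    similarity or the gray-histogram similarity exceeds 0.8."""
    return (location_similarity(rect_a, rect_b) > thresh or
            histogram_similarity(gh_a, gh_b) > thresh)
```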
(7) Color polarity unification
The gray-scale caption area is unified into white characters on a black background with the following steps:
n) First binarize the grayed caption area with the Otsu method, then convolve the binarized caption area with the 3 × 3 masks k_w = [[0, −1, 0], [−1, 4, −1], [0, −1, 0]] and k_b = [[0, 1, 0], [1, −4, 1], [0, 1, 0]] respectively, and determine the edge color at each pixel with:

P(x, y) = White_Edge if K_w(x, y) > 0; Black_Edge if K_b(x, y) > 0; Non_Edge if K_w(x, y) ≤ 0 and K_b(x, y) ≤ 0    (8)

Let N_w and N_b be the numbers of white-edge and black-edge pixels respectively, and define R_1 = N_w / N_b as their ratio;
o) Project the edge pixels of the edge map P obtained by formula 8 onto the columns, and suppose the columns where the projection is zero decompose the edge map into {x_0, x_1, …, x_n}, where x_i is the midpoint of a continuous run of columns whose projection is 0. Build the rectangles Rect_i[1, height, x_i, x_{i+1}] in turn; inside each rectangle of the edge map P, scan inward from the four sides and delete the first edge pixel encountered, then count the remaining white-edge and black-edge pixels, denoted N_w′ and N_b′;
p) Define R_2 = N_w′ / N_b′ as their ratio, and define

ΔR = (R_2 − R_1) / max(R_1, R_2), −1 ≤ ΔR ≤ 1    (9)

The color polarity of the caption area is judged as follows:
(a) if ΔR > T_2h, the captions are white;
(b) if T_2l ≤ ΔR ≤ T_2h, the captions are white when R_1 > T_2v and black when R_1 ≤ T_2v;
(c) if T_1h ≤ ΔR ≤ T_2l, the captions are white;
(d) if T_1l ≤ ΔR ≤ T_1h, the captions are white when R_1 > T_1v and black when R_1 ≤ T_1v;
(e) if ΔR < T_1l, the captions are black;
where T_1l = −0.25, T_1h = −0.15, T_1v = 1.2, T_2l = 0, T_2h = 0.35, T_2v = 0.8;
q) After the caption color polarity is judged, if the captions are black, the gray-scale map of the caption area is inverted, otherwise nothing is done; a sketch of the decision table follows.
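A sketch of the decision table (a) to (e), reading the garbled renderings "wrongly written or mispronounced character" and "surplus" as white characters and black characters respectively, which appears to be the intended meaning; the caller inverts the gray-scale map when 'black' is returned.

```python
def caption_polarity(R1, R2,
                     T1l=-0.25, T1h=-0.15, T1v=1.2,
                     T2l=0.0, T2h=0.35, T2v=0.8):
    """Classify the caption region polarity from R1 = Nw/Nb (all edge pixels)
    and R2 = Nw'/Nb' (edge pixels left after peeling the outermost ones)."""
    dR = (R2 - R1) / max(R1, R2, 1e-9)        # formula (9), -1 <= dR <= 1
    if dR > T2h:
        return 'white'                         # (a)
    if T2l <= dR <= T2h:                       # (b)
        return 'white' if R1 > T2v else 'black'
    if T1h <= dR < T2l:
        return 'white'                         # (c)
    if T1l <= dR < T1h:                        # (d)
        return 'white' if R1 > T1v else 'black'
    return 'black'                             # (e) dR < T1l

# If 'black' is returned, the gray-scale caption area is inverted so every
# region reaches the next stage as white characters on a black background.
```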
The step of binarizing the polarity-unified caption area in caption extraction, removing noise points and sending the result to OCR software for recognition is as follows (a sketch of the stepwise local binarization appears after step v):
r) Normalize the height of the grayed caption area to 24 pixels, then extend it by 4 pixels at the top and 4 pixels at the bottom, so that the height after extension is 32; denote the extended caption area image by EI;
s) Initialize every pixel of the result binary map B to 1, then apply stepwise horizontal local threshold binarization to EI: binarize with the Otsu method in local windows of 16 × 32 with a horizontal step of 8 pixels. Apply stepwise vertical local threshold binarization to EI in the same way: binarize with the Otsu method in local windows of image_width × 8 with a vertical step of 4 pixels. In each sub-window, wherever the gray value in EI is below the local threshold, the corresponding pixel value in B is set to 0;
t) Set to 0 the pixels in B that are connected to pixels whose value is 1 in the extension area. To prevent stroke pixels from also being set to 0, dam points are defined as:

Dam points = {(x, y) | B(x, y) = 1 ∧ 1 ≤ min(H_len(x, y), V_len(x, y)) ≤ 4}

where H_len(x, y) is the length of the longest horizontal run of 1s through pixel (x, y) and V_len(x, y) is the length of the longest vertical run of 1s through pixel (x, y); the expansion from the background cannot pass through dam points;
u) Obtain the edge information of EI with the Sobel operator. For each connected region whose value is 1 in B, count the number epn of edge pixels that fall inside or around it; if epn < tepn, set all pixels of this connected region to 0, thereby removing it, where tepn is determined by:
tepn = max(cheight, cwidth)
where cheight and cwidth are the height and width of this connected region;
v) Send the binary map B to OCR software for recognition.
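A sketch of step s), the stepwise local Otsu binarization, with a plain NumPy Otsu implementation; window placement at the image borders is simplified, and the names are illustrative.

```python
import numpy as np

def otsu_threshold(gray):
    """Plain Otsu threshold on a gray-level array with values 0-255."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total, sum_all = hist.sum(), float(np.dot(np.arange(256), hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0.0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / (total - w0)
        var = w0 * (total - w0) * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def stepwise_binarize(ei):
    """Horizontal pass with 16x32 windows (step 8) and a vertical pass with
    width x 8 windows (step 4); a pixel is set to 0 in B whenever its gray
    value falls below the local Otsu threshold of any window."""
    h, w = ei.shape                        # h is 32 after normalization/padding
    B = np.ones_like(ei, dtype=np.uint8)
    for x in range(0, max(w - 16, 0) + 1, 8):          # horizontal pass
        win = ei[:, x:x + 16]
        B[:, x:x + 16][win < otsu_threshold(win)] = 0
    for y in range(0, max(h - 8, 0) + 1, 4):           # vertical pass
        win = ei[y:y + 8, :]
        B[y:y + 8, :][win < otsu_threshold(win)] = 0
    return B
```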
Compared with the prior art, the invention has the following beneficial effects:
1) The caption detection algorithm of the invention overcomes the sensitivity of commonly used detection algorithms to the language, the caption alignment and the background complexity; by strengthening the corner information characteristic of captions, using the region segmentation algorithm and combining the combination entropy filter, it obtains detection results that are robust to changes in language, caption alignment and background complexity;
2) The caption extraction algorithm of the invention further removes noise pixels on top of a general extraction algorithm, improving the subsequent OCR recognition accuracy;
3) The invention alleviates the problem of too many repeated captions in video frames to a certain extent while also preventing some captions from being missed, and obtains good results on continuous video frame sequences.
Description of drawings
Fig. 1 is the video caption recognition framework diagram;
Fig. 2 is the video caption detection framework diagram;
Fig. 3 is an example flow of video caption detection on a frame image;
Fig. 4 is an example of video caption extraction on a caption area.
Embodiment
For a better understanding of the technical scheme of the present invention, the invention is further described below in conjunction with Fig. 1 and Fig. 2. Fig. 1 shows the framework of the video caption recognition method of the invention, and Fig. 2 shows the framework of the video caption detection method of the invention.
As shown in Figs. 3 and 4, an example of the recognition process is given for a frame image in a video. The concrete implementation steps of this example are described below in conjunction with the method of the invention, as follows:
For a frame image, as shown in Fig. 3(a), the corner-strengthened edge map is obtained with the edge detection method (1) of claim 2, and the result is shown in Fig. 3(b);
(1) Taking the edge map obtained in the previous step as input, edge points are connected with the edge point connection method (2) of claim 2, and the result is shown in Fig. 3(c);
(2) Taking the map after edge point connection as input, larger connected regions are obtained with the connected-region collection and segmentation algorithm (3) of claim 2, and the result is shown in Fig. 3(d);
(3) The connected regions obtained in the previous step are refined with the connected-region refinement and trailing filter method (4) of claim 2 to obtain more accurate region positions and sizes and to filter preliminarily, and the result is shown in Fig. 3(e);
(4) The connected regions remaining after filtering are filtered with the combination entropy filter (5) of claim 2 to remove non-caption areas, and the final detection result is shown in Fig. 3(f);
(5) For a specific caption area detected in the previous step, as shown in Fig. 4(a), the repeatability detection (6) of claim 3 first judges whether it repeats a previously detected area; if it does not, the color polarity unification method (7) of claim 3 unifies the area into white characters on a black background;
(6) The caption area after color polarity unification is binarized and denoised with the algorithm of claim 4 to obtain a good binary map, and the result is shown in Fig. 4(b);
(7) Commercial OCR software recognizes the binary map, and the result is shown in Fig. 4(c).
As can be seen from the drawings, the method detects the caption areas in the video frame image well and binarizes them, and the binarization result reaches good recognition accuracy.

Claims (4)

1. A video caption recognition method based on edge information and distribution entropy, characterized in that its steps are as follows:
1) detect the difference between the current frame and the last processed frame; if the difference is large, perform the following caption recognition operations, otherwise take the next frame and repeat the judgment;
2) caption recognition first performs caption detection: edge detection, edge point connection, connected-region collection and segmentation, connected-region refinement and the trailing filter are used to obtain candidate text areas and their positions, and the combination entropy filter then removes non-text areas so that only caption areas remain;
3) perform repeatability detection on each caption area; if the area is not a repeat, unify its color polarity to white characters on a black background and then perform caption extraction, otherwise process the next caption area;
4) in caption extraction, binarize the polarity-unified caption area, remove noise points and send the result to OCR software for recognition; the detection of the difference between the current frame and the last processed frame proceeds as follows: let the current frame be I_i with edge binary map E_i, and let the last processed frame, five frames earlier, be I_{i-5} with edge binary map E_{i-5}; let D_{i,i-5} = E_i ⊕ E_{i-5}, let any caption area detected last time be Area_{i-5,j}, and let pMES be the minimum of the accumulated edge counts of the last detected caption areas; the accumulated difference of the caption areas in the current frame is calculated as:
cFD = Σ_j D_{i,i-5}(Area_{i-5,j})    (1)
if cFD is less than or equal to pMES × 0.5, this frame does not need caption recognition and the judgment continues with the frame five frames later, otherwise caption recognition is performed on this frame; to further avoid missing captions, a counter ck is kept: each time cFD is less than or equal to pMES × 0.5, ck is increased by 1, otherwise ck is reset to 0; when ck equals 5, caption recognition is performed on the frame regardless of the preceding judgment, and ck is reset to 0.
2. The video caption recognition method based on edge information and distribution entropy according to claim 1, characterized in that the caption detection step, in which edge detection, edge point connection, connected-region collection and segmentation, connected-region refinement and the trailing filter are used to obtain candidate text areas and their positions, and the combination entropy filter then removes non-text areas so that only caption areas remain, is as follows:
(1) Edge detection
Given an image I, edges are detected with the Sobel operator, which consists of gradient templates in four directions: horizontal S_H, vertical S_V, diagonal S_LD and anti-diagonal S_RD; the edge field is calculated as:
S = MAX(|S_H|, |S_V|, |S_LD|, |S_RD|) + k × |S_{⊥-MAX}|    (2)
where S_{⊥-MAX} denotes the response at pixel (x, y) in the direction perpendicular to the direction with the greatest gradient magnitude, and k is an adjustment factor, set to 1 here; S is then quantized into 16 levels, written S′ after quantization, and the edge map EdgeMap is obtained as:
EdgeMap(x, y) = 1 if S′(x, y) ≥ 15, 0 if S′(x, y) < 15    (3)
(2) Edge point connection
For the edge map EdgeMap, if the distance between two edge points in the same row is less than a threshold T_d, all pixel values between these two pixels in EdgeMap are set to 1; T_d is determined by:
T_d = max(4, min([max(height, width) / 50], 16))    (4)
where height and width are the height and width of the image I;
(3) Connected-region collection and segmentation
The EdgeMap obtained in the previous step is used for connected-region collection; connected regions whose height or width is less than 1% of the height or width of the whole image are removed, as are those whose minimum enclosing rectangle is smaller than 0.2% of the whole image area; each remaining connected region C is then segmented with the following steps:
a) for every row i ∈ [t_c, b_c] in connected region C, compute the area Area_up(i) of the minimum enclosing rectangle of this row and the part above it and the area Area_down(i) of the minimum enclosing rectangle of the part below this row, take their sum, find the row that minimizes the sum and store its row number in bR;
b) for every column j ∈ [l_c, r_c] in connected region C, compute the area Area_left(j) of the minimum enclosing rectangle of this column and the part to its left and the area Area_right(j) of the minimum enclosing rectangle of the part to its right, take their sum, find the column that minimizes the sum and store its column number in bC;
c) let mRA = Area_up(bR) + Area_down(bR) and mCA = Area_left(bC) + Area_right(bC); if mRA < mCA, split the connected region C into two connected regions along the rows with row bR as the boundary, otherwise split C into two connected regions along the columns with column bC as the boundary;
where t_c, b_c, l_c and r_c are the upper-bound row number, lower-bound row number, left-bound column number and right-bound column number of connected region C;
to prevent over-segmentation, the split is performed only when the connected region C satisfies both of the following conditions: ① the fill rate of the connected region is less than 0.8; ② the areas of the two new connected regions are both greater than 0.2% of the whole image area;
(4) Connected-region refinement and trailing filter
Before region refinement, connected regions whose height is more than 2 times their width are removed first; this may mistakenly delete vertically arranged captions, so to handle vertical captions the image only needs to be rotated by 90 degrees, with all other operations identical;
for each connected region C obtained in the previous step, its position is further refined as follows:
Input: edge map edgeMap, initial upper and lower bound row numbers t_c, b_c of the connected region C
Output: refined upper and lower bound row numbers ut_c, ub_c
d) for every row i ∈ [t_c, b_c] of connected region C, compute the span r_i − l_i of its non-zero pixels in edgeMap and store it in the array cSA;
e) for every row i ∈ [t_c, b_c] of connected region C, compute the number of edge pixels of that row in edgeMap and store it in the array oPNA, i.e. oPNA[i] = Σ_j edgeMap(i, j);
f) store the maximum of cSA in pCS and its row number in pSRN; store the maximum of oPNA in pOPN and its row number in pPRN;
g) over all rows i ∈ [t_c, pSRN], take the largest row number t_1 with cSA[i] < pCS × η_1;
over all rows i ∈ [pSRN, b_c], take the smallest row number b_1 with cSA[i] < pCS × η_1;
over all rows i ∈ [t_c, pPRN], take the largest row number t_2 with oPNA[i] < pOPN × η_2;
over all rows i ∈ [pPRN, b_c], take the smallest row number b_2 with oPNA[i] < pOPN × η_2;
h) let ut_c = max(t_1, t_2) and ub_c = min(b_1, b_2); these are the refined upper and lower bound row numbers ut_c, ub_c,
where η_1 and η_2 usually take the values 0.6 and 0.3;
the following trailing filter is used to remove some non-caption connected regions:
i) after step g), continue scanning oPNA upward and downward until the value of the current row is less than pOPN × η_3, and let the resulting row numbers be t_tail and b_tail;
j) compute the tail length as:
tl_1 = t_2 − t_tail, tl_2 = b_tail − b_2, tl = max(tl_1, tl_2)
k) filter with the following formula; if deleteFlag(C) is 1, the connected region is not a caption area and should be deleted;
deleteFlag(C) = 1 if tl > (ub_c − ut_c) × η_4, 0 otherwise    (5)
where ut_c and ub_c are the refined upper and lower bound row numbers of connected region C, and η_3 and η_4 usually take the values 0.2 and 0.3;
(5) Combination entropy filter
A combination entropy filter that combines the foreground-pixel distribution entropy and the edge-pixel distribution entropy is used, keeping only caption areas;
for the foreground-pixel distribution entropy, the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of a connected region C, where t_c and b_c are the upper and lower bound row numbers and l_c and r_c the left and right bound column numbers, is binarized with the Otsu threshold and then divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed as:
E_FPD = −Σ_{i,j} ( p_{i,j} ln p_{i,j} + (1 − p_{i,j}) ln(1 − p_{i,j}) ),  i ∈ {1, 2}, j ∈ {1, 2, 3, 4}    (6)
where p_{i,j} is the ratio of non-zero pixels in the part in row i and column j;
for the edge-pixel distribution entropy, the Sobel edge binary map inside the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of connected region C is divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed as:
E_E = −Σ_{i,j} ( (e_{ij}/e_r) ln(e_{ij}/e_r) + (1 − e_{ij}/e_r) ln(1 − e_{ij}/e_r) ),  i ∈ {1, 2}, j ∈ {1, 2, 3, 4}    (7)
where e_{ij} is the number of edge pixels in the part in row i and column j, and e_r is the total number of edge pixels over the 8 parts;
for any refined connected region C, if E_FPD > E_T1 and E_E > E_T2 it is considered a caption area, otherwise it is a non-caption area and is deleted; experiments show the results are best when E_T1 and E_T2 take the values 6.4 and 2.76;
for images that contain both horizontally and vertically arranged captions, caption detection is performed again on the image rotated by 90 degrees, the detection results from the original image and the rotated image are then merged, and duplicates are eliminated.
3. The video caption recognition method based on edge information and distribution entropy according to claim 1, characterized in that the step of performing repeatability detection on a caption area, unifying its color polarity to white characters on a black background if the area is not a repeat and then performing caption extraction, and otherwise processing the next caption area, is as follows:
(6) Repeatability detection
A method combining position and gray-level histograms is used to de-duplicate the detected caption areas, with the following steps:
l) extract and store all caption area positions Rect_i[t_i, b_i, l_i, r_i] of the last processed frame and their gray-level histograms GH_i{g_{i,0}, g_{i,1}, …, g_{i,k}, …, g_{i,255}}, where g_{i,k} is the number of pixels in the i-th caption area whose gray level is k; extract and store all caption area positions Rect_j[t_j, b_j, l_j, r_j] of the current frame and their gray-level histograms GH_j{g_{j,0}, g_{j,1}, …, g_{j,255}};
m) compute their location similarity Simi_Loc(i, j) and their gray-level histogram similarity Simi_GHis(i, j), where Rect_i ∩ Rect_j is the area of their common part and max(Rect_i, Rect_j) is the area of the larger of the two rectangles; if either Simi_Loc(i, j) or Simi_GHis(i, j) is greater than 0.8, the two areas are duplicate detections of the same area, and one of them is removed while the other is kept;
(7) color is extremely unified
The caption area gray-scale map is unified into the black matrix wrongly written or mispronounced character, takes following steps:
N) at first the caption area after gray processing is used the value of Otsu method two, then used respectively 3 * 3 mask k w ( x , y ) = 0 - 1 0 - 1 4 - 1 0 - 1 0 With k b ( x , y ) = 0 1 0 1 - 4 1 0 1 0 Caption area after binaryzation is carried out convolution operation, determines the edge color at each pixel place with following formula:
P(x,y) = \begin{cases} \text{White\_Edge}, & K_w(x,y) > 0 \\ \text{Black\_Edge}, & K_b(x,y) > 0 \\ \text{Non\_Edge}, & K_w(x,y) \le 0 \text{ and } K_b(x,y) \le 0 \end{cases} \qquad (8)
Let N_w and N_b denote the numbers of white-edge pixels and black-edge pixels respectively, and define R_1 = N_w / N_b as their ratio;
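A sketch of the edge-color classification of formula 8 and the ratio R_1, assuming Python/SciPy and a 0/1 binary input; the function names are illustrative and not taken from the patent text.

```python
import numpy as np
from scipy.signal import convolve2d

K_W = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])   # white-edge mask k_w
K_B = -K_W                                               # black-edge mask k_b

def edge_color_map(binary_region):
    """Formula (8): label each pixel White_Edge / Black_Edge / Non_Edge."""
    kw = convolve2d(binary_region, K_W, mode='same')
    kb = convolve2d(binary_region, K_B, mode='same')
    p = np.full(binary_region.shape, 'Non_Edge', dtype=object)
    p[kw > 0] = 'White_Edge'     # kw > 0 and kb > 0 cannot hold simultaneously
    p[kb > 0] = 'Black_Edge'
    return p

def edge_ratio(edge_map):
    """R_1 = N_w / N_b, the ratio of white-edge to black-edge pixel counts."""
    n_w = np.count_nonzero(edge_map == 'White_Edge')
    n_b = np.count_nonzero(edge_map == 'Black_Edge')
    return n_w / n_b if n_b else float('inf')
```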
O) project the edge pixels of the edge map P obtained by formula 8 onto the columns; suppose the column projection decomposes into {x_0, x_1, ..., x_n}, where x_i is the midpoint of a continuous run of columns whose projection is 0; build the rectangles Rect_i[1, height, x_i, x_{i+1}] in turn; within each such rectangle of the edge map P, scan inward from the four sides and delete the first edge pixel encountered; then recount the white-edge and black-edge pixels, denoted N_w' and N_b' respectively;
P) define R_2 = N_w' / N_b' as their ratio, and define
\Delta R = \frac{R_2 - R_1}{\max(R_1, R_2)}, \quad -1 \le \Delta R \le 1 \qquad (9)
The color polarity of the caption area is then determined as follows:
(a) if ΔR > T_2h, the captions are white;
(b) if T_2l ≤ ΔR ≤ T_2h, the captions are white when R_1 > T_2v and black when R_1 ≤ T_2v;
(c) if T_1h ≤ ΔR ≤ T_2l, the captions are white;
(d) if T_1l ≤ ΔR ≤ T_1h, the captions are white when R_1 > T_1v and black when R_1 ≤ T_1v;
(e) if ΔR < T_1l, the captions are black;
where T_1l = -0.25, T_1h = -0.15, T_1v = 1.2, T_2l = 0, T_2h = 0.35, T_2v = 0.8;
Q) after the color polarity of the captions has been determined, invert the grayscale caption area if the captions are black; otherwise leave it unchanged.
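A sketch of the polarity decision of rules (a)-(e) and the inversion of step Q, assuming Python/NumPy, an 8-bit grayscale caption area, and that R_1 and R_2 have already been computed as described in steps N-P. Boundary cases where ΔR equals a threshold are resolved in the order the rules are written, and the function names are illustrative.

```python
def caption_polarity(r1, r2,
                     t1l=-0.25, t1h=-0.15, t1v=1.2,
                     t2l=0.0, t2h=0.35, t2v=0.8):
    """Return 'white' or 'black' for the caption color, following rules (a)-(e)."""
    delta_r = (r2 - r1) / max(r1, r2)            # formula (9)
    if delta_r > t2h:
        return 'white'                            # (a)
    if t2l <= delta_r <= t2h:
        return 'white' if r1 > t2v else 'black'   # (b)
    if t1h <= delta_r < t2l:
        return 'white'                            # (c)
    if t1l <= delta_r <= t1h:
        return 'white' if r1 > t1v else 'black'   # (d)
    return 'black'                                # (e): delta_r < t1l

def unify_to_white_on_black(gray_region, r1, r2):
    """Step Q: invert the 8-bit grayscale caption area when the text is black."""
    return 255 - gray_region if caption_polarity(r1, r2) == 'black' else gray_region
```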
4. The video caption recognition method based on edge information and distribution entropy according to claim 1, characterized in that the described steps of, during caption extraction, binarizing the caption area after its color polarity has been unified, removing noise points, and then sending the result to OCR software for recognition are:
R) normalize the height of the grayscale caption area to 24 pixels, then pad 4 pixels at both the top and the bottom so that the height after expansion is 32; denote the expanded caption area image EI;
S) initialize every pixel of the result binary map B to 1, then apply stepwise horizontal local threshold binarization to EI, binarizing with the Otsu method inside a 16 × 32 local window and stepping 8 pixels horizontally at a time; apply stepwise vertical local threshold binarization to EI in the same way, binarizing with the Otsu method inside a local window of image_width × 8 and stepping 4 pixels vertically at a time; within each sub-window, wherever the gray value in EI is below the local threshold, the corresponding pixel of B is set to 0;
T) set to 0 those pixels of B that are connected to value-1 pixels in the expanded (padding) area; to prevent stroke pixels from also being set to 0, define the dam points:
Dam points = {(x, y) | B(x, y) = 1 ∧ 1 ≤ min(H_len(x, y), V_len(x, y)) ≤ 4}
where H_len(x, y) is the length of the longest horizontal run of 1s through pixel (x, y), and V_len(x, y) is the length of the longest vertical run of 1s through pixel (x, y); the expansion must not pass through dam points, i.e. dam points are never set to background;
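The dam-point constraint can be sketched as follows, assuming Python/NumPy and a 0/1 binary map B; run_lengths_1d is an illustrative helper that assigns to every pixel the length of the run of 1s containing it.

```python
import numpy as np

def run_lengths_1d(row):
    """For a 0/1 vector, give each position the length of the run of 1s it lies in."""
    out = np.zeros(len(row), dtype=int)
    start = None
    for i, v in enumerate(list(row) + [0]):      # sentinel 0 closes the last run
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            out[start:i] = i - start
            start = None
    return out

def dam_points(B):
    """Dam points: foreground pixels whose min(H_len, V_len) lies in [1, 4]."""
    h_len = np.vstack([run_lengths_1d(B[y, :]) for y in range(B.shape[0])])
    v_len = np.vstack([run_lengths_1d(B[:, x]) for x in range(B.shape[1])]).T
    m = np.minimum(h_len, v_len)
    return (B == 1) & (m >= 1) & (m <= 4)
```

During the border expansion of step T, any pixel flagged by dam_points is treated as a barrier and is left at 1, so thin stroke pixels are not flooded away.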
U) use the Sobel operator to obtain the edge information of EI; for each connected region of value 1 in B, count the number epn of edge pixels that fall inside or around it; if epn < tepn, set all pixels of this connected region to 0, thereby removing it; tepn is determined by the following formula:
tepn=max(cheight,cwidth)
where cheight and cwidth are the height and width of this connected region, respectively;
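A sketch of this noise-removal step, assuming Python with SciPy's connected-component labeling and a precomputed Sobel edge map of EI. Counting the edge pixels "inside or around" a region is approximated here by dilating the region by one pixel, which is an assumption rather than the patent's exact definition.

```python
import numpy as np
from scipy import ndimage

def remove_low_edge_components(B, sobel_edges):
    """Step U (sketch): drop connected 1-regions of B whose nearby edge-pixel
    count epn is below tepn = max(cheight, cwidth) of the region."""
    labels, n = ndimage.label(B == 1)
    out = B.copy()
    for lab in range(1, n + 1):
        comp = labels == lab
        ys, xs = np.nonzero(comp)
        cheight = ys.max() - ys.min() + 1
        cwidth = xs.max() - xs.min() + 1
        around = ndimage.binary_dilation(comp)          # region plus a 1-pixel rim
        epn = np.count_nonzero(sobel_edges[around])
        if epn < max(cheight, cwidth):                  # tepn = max(cheight, cwidth)
            out[comp] = 0
    return out
```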
V) send the binary map B to OCR software for recognition.
CN 201110024330 2011-01-23 2011-01-23 Method for recognizing and designing video captions based on edge information and distribution entropy Expired - Fee Related CN102208023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110024330 CN102208023B (en) 2011-01-23 2011-01-23 Method for recognizing and designing video captions based on edge information and distribution entropy

Publications (2)

Publication Number Publication Date
CN102208023A CN102208023A (en) 2011-10-05
CN102208023B true CN102208023B (en) 2013-05-08

Family

ID=44696845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110024330 Expired - Fee Related CN102208023B (en) 2011-01-23 2011-01-23 Method for recognizing and designing video captions based on edge information and distribution entropy

Country Status (1)

Country Link
CN (1) CN102208023B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102780856B (en) * 2012-04-12 2013-11-27 天脉聚源(北京)传媒科技有限公司 Method for annotating subtitles in news video
CN103377379A (en) * 2012-04-27 2013-10-30 佳能株式会社 Text detection device and method and text information extraction system and method
CN103136523B (en) * 2012-11-29 2016-06-29 浙江大学 Any direction text line detection method in a kind of natural image
US9177383B2 (en) * 2013-08-29 2015-11-03 Analog Devices Global Facial detection
WO2018023538A1 (en) * 2016-08-04 2018-02-08 黄新勇 Method and system for extracting television broadcast subtitle
CN106355172A (en) * 2016-08-11 2017-01-25 无锡天脉聚源传媒科技有限公司 Character recognition method and device
CN107590447B (en) * 2017-08-29 2021-01-08 北京奇艺世纪科技有限公司 Method and device for recognizing word title
CN108982106B (en) * 2018-07-26 2020-09-22 安徽大学 Effective method for rapidly detecting kinetic mutation of complex system
CN109284751A (en) * 2018-10-31 2019-01-29 河南科技大学 The non-textual filtering method of text location based on spectrum analysis and SVM
CN111754414B (en) * 2019-03-29 2023-10-27 北京搜狗科技发展有限公司 Image processing method and device for image processing
CN110197177B (en) * 2019-04-22 2024-03-19 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for extracting video captions
CN111064990B (en) * 2019-11-22 2021-12-14 华中师范大学 Video processing method and device and electronic equipment
CN113496223A (en) * 2020-03-19 2021-10-12 顺丰科技有限公司 Method and device for establishing text region detection model
CN111783771B (en) * 2020-06-12 2024-03-19 北京达佳互联信息技术有限公司 Text detection method, text detection device, electronic equipment and storage medium
CN111860521B (en) * 2020-07-21 2022-04-22 西安交通大学 Method for segmenting distorted code-spraying characters layer by layer
CN111967526B (en) * 2020-08-20 2023-09-22 东北大学秦皇岛分校 Remote sensing image change detection method and system based on edge mapping and deep learning
CN111741236B (en) * 2020-08-24 2021-01-01 浙江大学 Method and device for generating positioning natural image subtitles based on consensus diagram characteristic reasoning
CN112925905B (en) * 2021-01-28 2024-02-27 北京达佳互联信息技术有限公司 Method, device, electronic equipment and storage medium for extracting video subtitles
CN113485432A (en) * 2021-07-26 2021-10-08 西安热工研究院有限公司 Photovoltaic power station electroluminescence intelligent diagnosis system and method based on unmanned aerial vehicle
CN116453030B (en) * 2023-04-07 2024-07-05 郑州大学 Building material recycling method based on computer vision

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100836197B1 (en) * 2006-12-14 2008-06-09 삼성전자주식회사 Apparatus for detecting caption in moving picture and method of operating the apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1835462A1 (en) * 2004-12-02 2007-09-19 National Institute of Advanced Industrial Science and Technology Tracing device, and tracing method
CN101122952A (en) * 2007-09-21 2008-02-13 北京大学 Picture words detecting method
CN101833664A (en) * 2010-04-21 2010-09-15 中国科学院自动化研究所 Video image character detecting method based on sparse expression

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
A comprehensive method for multilingual video text detection, localization, and extraction;Michael R.Lyu等;《IEEE Transactions on Circuits and Systems for Video Technology 2005》;20050228;第15卷(第2期);243-255 *
A New Approach for Overlay Text Detection and Extraction from Complex Video Scene;Wonjun Kim等;《IEEE Transactions on Image Processing 2009》;20090228;第18卷(第2期);401-411 *
A robust statistic method for classifying color polarity of video text;J. Song等;《Acoust, Speech, and Signal Processing, 2003. Proceedings.(ICASSP ’03). 2003 IEEE International Conference on》;20030410;第3卷;581-584 *
Video text tracking and segmentation algorithm based on multi-frame images; Mi Congjie et al.; Journal of Computer Research and Development; 2006-09-30; Vol. 43, No. 9; 1523-1529 *
Research on video caption extraction algorithms based on spatio-temporal information; Shen Shujuan; China Master's Theses Full-text Database; 2004-06-11; 45-47 *
Text extraction from video and its applications; Lu Bing; China Master's Theses Full-text Database; 2007-10-12; 7-62 *

Also Published As

Publication number Publication date
CN102208023A (en) 2011-10-05

Similar Documents

Publication Publication Date Title
CN102208023B (en) Method for recognizing and designing video captions based on edge information and distribution entropy
Gllavata et al. A robust algorithm for text detection in images
Zhang et al. Extraction of text objects in video documents: Recent progress
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
JP5492205B2 (en) Segment print pages into articles
WO2018018788A1 (en) Image recognition-based meter reading apparatus and method thereof
CN101593276B (en) Video OCR image-text separation method and system
CN101122953A (en) Picture words segmentation method
DE102013206009A1 (en) Robust cutting of license plate images
KR20010110416A (en) Video stream classifiable symbol isolation method and system
CN102081731A (en) Method and device for extracting text from image
CN101122952A (en) Picture words detecting method
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
Bijalwan et al. Automatic text recognition in natural scene and its translation into user defined language
CN104598907A (en) Stroke width figure based method for extracting Chinese character data from image
CN110633635A (en) ROI-based traffic sign board real-time detection method and system
CN113971792A (en) Character recognition method, device, equipment and storage medium for traffic sign board
Gllavata et al. A text detection, localization and segmentation system for OCR in images
Anthimopoulos et al. Multiresolution text detection in video frames
JP6377214B2 (en) Text detection method and apparatus
Seeri et al. A novel approach for Kannada text extraction
Satish et al. Edge assisted fast binarization scheme for improved vehicle license plate recognition
Mol et al. Text recognition using poisson filtering and edge enhanced maximally stable extremal regions
CN116030472A (en) Text coordinate determining method and device
Kaur et al. An Efficient Method of Number Plate Extraction from Indian Vehicles Image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130508

Termination date: 20140123