CN102208023B - Method for recognizing and designing video captions based on edge information and distribution entropy - Google Patents

Method for recognizing and designing video captions based on edge information and distribution entropy Download PDF

Info

Publication number
CN102208023B
CN102208023B  CN201110024330A
Authority
CN
China
Prior art keywords
area
edge
row
captions
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201110024330
Other languages
Chinese (zh)
Other versions
CN102208023A (en)
Inventor
魏宝刚
庄越挺
袁杰
鲁伟明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN 201110024330 priority Critical patent/CN102208023B/en
Publication of CN102208023A publication Critical patent/CN102208023A/en
Application granted granted Critical
Publication of CN102208023B publication Critical patent/CN102208023B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a method for recognizing video captions based on edge information and distribution entropy. The method comprises the following steps: acquiring image edge information with a corner-reinforced edge detection method; connecting edge points, collecting connected regions, and splitting them appropriately with a segmentation algorithm; obtaining the accurate positions of the connected regions with a refinement operation; filtering out non-text areas with a trailing filter and a combination entropy filter, so that the remaining areas are text areas; and setting the detected text areas to the uniform format of white characters on a black background, then carrying out local threshold binarization, edge-noise expansion removal constrained by forbidden expansion points, and noise removal based on counting surrounding edge points, to obtain a binary image that is sent to OCR (optical character recognition) software for recognition. The method overcomes the sensitivity of common methods to the language, the caption arrangement and the background complexity; the segmentation algorithm and the combination entropy filter yield good detection results; and improving the traditional binarization method greatly increases the recognition accuracy.

Description

Video caption recognition method based on edge information and distribution entropy
Technical field
The present invention relates to a video caption recognition method based on edge information and distribution entropy. The method detects and extracts captions from video for OCR recognition, and belongs to the field of computer image processing.
Background technology
With the development of multimedia and the electronics industry, more and more video data is being produced, and organizing and retrieving it effectively has become a difficult problem. Much video data, such as TV news, sports events, films and variety shows, carries caption information added in post-production, and this caption information is closely related to the video content. If these captions can be recognized effectively, they can be used to organize and retrieve the video data, which has strong practical value.
Video caption recognition is divided into four steps: caption detection, caption localization, caption extraction and OCR recognition. Caption detection determines the caption areas; caption localization locates the exact position of each caption line; caption extraction binarizes the caption area so that only stroke pixels are kept; the final step is generally handed to commercial OCR software. Caption detection methods can be divided into four kinds: edge-based methods, connected-component-based methods, color-clustering-based methods and texture-based methods. Edge-based methods detect text edges with edge filters and then merge them with morphological operations. A method published at the 8th International Conference on Document Analysis and Recognition (In Proceedings of the 8th International Conference on Document Analysis and Recognition (ICDAR), 2005, 610-614) uses edge detection to obtain four edge maps, then detects candidate text areas with the K-MEANS algorithm, and finally determines and refines the text areas with heuristic rules and projection analysis. Without a complex background, edge-based methods can perform well, but when the background contains many edges their performance degrades. Texture-based methods extract texture features with Gabor filters, wavelet transforms or the Fast Fourier Transform (FFT), and then detect caption areas with machine-learning methods such as neural networks or SVM classifiers. A method published in the proceedings of the IEEE International Conference on Communication Technology (In Proceedings of the IEEE International Conference on Communication Technology (ICCT), 2008, 722-725) uses the Haar wavelet transform, merging the wavelet coefficients of four small blocks into one large block to locate large-font text, and then strengthens the result with morphological dilation and a neural network. Connected-component-based methods divide a frame into many small connected components and then merge them into larger ones to locate captions. A method published in the proceedings of the ACM International Multimedia Conference (In Proceedings of the ACM International Multimedia Conference and Exhibition 2007 (MM), 847-850) uses confidence-based color clustering to remove noise and adaptively selects the color plane with the best text contrast for binarization. Color-clustering-based methods assume that the text color within a video frame is uniform; this assumption is invalid in most cases, so their applicability is limited. Because caption detection with a single feature is unsatisfactory, many methods combine several of the above features. For caption localization, gray-level projection is generally used. Caption extraction methods can be divided into color-based and stroke-based methods. Many color-based methods binarize the gray-scale map with the Otsu method, but when the gray levels of the captions and the background are very close, this method cannot separate them well and therefore cannot remove noise well.
A method published in IEEE Transactions on Circuits and Systems for Video Technology (2005, 15(2): 243-255) and IEEE Transactions on Image Processing (2009, 18(2): 401-411) uses a local adaptive threshold with better resolving power, combined with dam-point marking and inward filling, so that most noise points can be removed.
The caption detection methods above are all good attempts at video caption detection, but they do not separate captions from the background well, and they handle videos with varying languages, fonts and text alignments poorly when used on their own. In addition, although existing caption extraction methods can remove most of the noise, OCR software is very sensitive to noise points, so text recognition under complex backgrounds remains poor.
Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art and provide a video caption recognition method based on edge information and distribution entropy.
The steps of the video caption recognition method based on edge information and distribution entropy are as follows:
1) Detect the difference between the current frame and the last processed frame; if the difference is large, perform the following caption recognition operations, otherwise take the next frame and repeat the judgment;
2) Caption recognition first performs caption detection: edge detection, edge point connection, connected-region collection and segmentation, connected-region refinement and the trailing filter are used to obtain candidate text areas and their positions, and the combination entropy filter then removes non-text areas so that only caption areas remain;
3) Perform repeatability detection on each caption area; if the area is not a repeat, unify its color polarity to white characters on a black background and then perform caption extraction, otherwise process the next caption area;
4) In caption extraction, binarize the polarity-unified caption area, remove noise points, and send the result to OCR software for recognition.
The detection of the difference between the current frame and the last processed frame, with caption recognition performed when the difference is large and the next frame taken otherwise, proceeds as follows: let the current frame be I_i with edge binary map E_i, and let the last processed frame, five frames earlier, be I_{i-5} with edge binary map E_{i-5}. Let D_{i,i-5} = E_i ⊕ E_{i-5}, let the j-th caption area detected last time be Area_{i-5,j}, and let pMES be the minimum of the accumulated edge counts of the last detected caption areas. The accumulated difference of the caption areas in the current frame is calculated as:

cFD = Σ_j D_{i,i-5}(Area_{i-5,j})    (1)

If cFD is less than or equal to pMES × 0.5, this frame does not need caption recognition and the judgment continues with the frame five frames later; otherwise caption recognition is performed on this frame. To further avoid missing captions, a counter ck is kept: each time cFD is less than or equal to pMES × 0.5, ck is increased by 1, otherwise ck is reset to 0. When ck reaches 5, caption recognition is performed on the frame regardless of the preceding judgment, and ck is reset to 0.
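A minimal sketch of this frame-skipping logic, written in Python with NumPy; the function and variable names (caption_change_detector, need_recognition, edge_xor, caption_rects) are hypothetical, and edge_xor stands for the element-wise XOR of the current and the five-frames-earlier edge binary maps.

```python
import numpy as np

def caption_change_detector(pMES, threshold_ratio=0.5, force_every=5):
    """Stateful check deciding whether a frame needs caption recognition.

    pMES: minimum accumulated edge count over the last detected caption areas.
    Returns a closure taking the XOR of the current and previous edge maps
    plus the previously detected caption rectangles (t, b, l, r).
    """
    state = {"ck": 0}

    def need_recognition(edge_xor, caption_rects):
        # cFD: accumulated edge differences inside the old caption areas (formula 1)
        cFD = sum(int(edge_xor[t:b, l:r].sum()) for (t, b, l, r) in caption_rects)
        if cFD <= pMES * threshold_ratio:
            state["ck"] += 1
            if state["ck"] == force_every:   # force a check so captions are not missed
                state["ck"] = 0
                return True
            return False
        state["ck"] = 0
        return True

    return need_recognition
```

The closure keeps ck between calls, so every fifth skipped comparison still sends the frame to caption recognition.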
The caption detection step, in which edge detection, edge point connection, connected-region collection and segmentation, connected-region refinement and the trailing filter are used to obtain candidate text areas and their positions, and the combination entropy filter then removes non-text areas so that only caption areas remain, is as follows:
(1) Edge detection
Given an image I, edges are detected with the Sobel operator, which consists of gradient templates in four directions: horizontal S_H, vertical S_V, diagonal S_LD and anti-diagonal S_RD. The edge field is calculated as:

S = MAX(|S_H|, |S_V|, |S_LD|, |S_RD|) + k × |S_{⊥-MAX}|    (2)

where S_{⊥-MAX} denotes the response at pixel (x, y) in the direction perpendicular to the direction with the greatest gradient magnitude, and k is an adjustment factor, set to 1 here. S is then quantized into 16 levels, written S′ after quantization, and the edge map EdgeMap is obtained as:

EdgeMap(x, y) = 1 if S′(x, y) ≥ 15, 0 if S′(x, y) < 15    (3)
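A sketch of the corner-reinforced edge detection of formulas 2 and 3, written in Python with NumPy and SciPy. It assumes one common choice of the four directional Sobel kernels and a linear scaling for the 16-level quantization, neither of which the text spells out.

```python
import numpy as np
from scipy.ndimage import convolve

# Four directional Sobel-style kernels (one common choice; the patent only
# names the directions, so the exact coefficients are an assumption).
S_H  = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)
S_V  = S_H.T
S_LD = np.array([[0, 1, 2], [-1, 0, 1], [-2, -1, 0]], dtype=float)
S_RD = np.fliplr(S_LD)
PERP = np.array([1, 0, 3, 2])   # index of the perpendicular direction for each kernel

def corner_enhanced_edge_map(gray, k=1.0):
    """Formulas (2)-(3): max directional Sobel response plus a corner-enhancing
    perpendicular term, 16-level quantization, keep only the strongest level."""
    resp = np.stack([np.abs(convolve(gray.astype(float), ker))
                     for ker in (S_H, S_V, S_LD, S_RD)])
    max_dir = resp.argmax(axis=0)
    s = resp.max(axis=0)
    perp = np.take_along_axis(resp, PERP[max_dir][None], axis=0)[0]
    s = s + k * perp                                   # corners respond in both directions
    s_q = np.floor(s / (s.max() + 1e-9) * 15).astype(int)  # quantize to 0..15
    return (s_q >= 15).astype(np.uint8)                # EdgeMap
```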
(2) Edge point connection
For the edge map EdgeMap, if the distance between two edge points in the same row is less than a threshold T_d, all pixel values between these two pixels in EdgeMap are set to 1, i.e. the pixels between the two edge points are filled. T_d is determined by:

T_d = max(4, min([max(height, width) / 50], 16))    (4)

where height and width are the height and width of the image I;
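A short sketch of the edge point connection step under the threshold of formula 4; the bracket in the formula is read as integer division here, which is an assumption.

```python
import numpy as np

def connect_edge_points(edge_map):
    """Fill the gap between two edge points on the same row when their
    distance is below T_d (formula 4); this merges character strokes into
    horizontal runs that later form connected regions."""
    h, w = edge_map.shape
    t_d = max(4, min(max(h, w) // 50, 16))
    out = edge_map.copy()
    for y in range(h):
        xs = np.flatnonzero(edge_map[y])
        for a, b in zip(xs[:-1], xs[1:]):
            if b - a < t_d:
                out[y, a:b + 1] = 1
    return out
```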
(3) Connected-region collection and segmentation
The EdgeMap obtained in the previous step is used for connected-region collection. Connected regions whose height or width is less than 1% of the height or width of the whole image are removed, as are those whose minimum enclosing rectangle is smaller than 0.2% of the whole image area. Each remaining connected region C is then segmented with the following steps (a sketch of the split search appears after this list):
a) For every row i ∈ [t_c, b_c] in C, compute the area Area_up(i) of the minimum enclosing rectangle of this row and the part above it and the area Area_down(i) of the minimum enclosing rectangle of the part below this row, take their sum, find the row that minimizes the sum and store its row number in bR;
b) For every column j ∈ [l_c, r_c] in C, compute the area Area_left(j) of the minimum enclosing rectangle of this column and the part to its left and the area Area_right(j) of the minimum enclosing rectangle of the part to its right, take their sum, find the column that minimizes the sum and store its column number in bC;
c) Let mRA = Area_up(bR) + Area_down(bR) and mCA = Area_left(bC) + Area_right(bC). If mRA < mCA, split the connected region C into two connected regions along the rows with row bR as the boundary; otherwise split C into two connected regions along the columns with column bC as the boundary;
where t_c, b_c, l_c and r_c are the upper-bound row number, lower-bound row number, left-bound column number and right-bound column number of region C;
To prevent over-segmentation, the split is performed only when the connected region C satisfies both of the following conditions:
① the fill rate of the connected region is less than 0.8;
② the areas of the two new connected regions are both greater than 0.2% of the whole image area;
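The following sketch illustrates steps a) to c) together with the two anti-over-segmentation conditions, for a single connected region given as a boolean mask; split candidates whose pieces would violate the 0.2% area condition are simply skipped, which is one reading of the conditions.

```python
import numpy as np

def best_split(comp_mask, img_area, fill_thresh=0.8, min_frac=0.002):
    """Find the row or column along which splitting the region gives the
    smallest total bounding-rectangle area.  Returns ('row', index),
    ('col', index) or None when the anti-over-segmentation rules forbid it."""
    ys, xs = np.nonzero(comp_mask)
    t, b, l, r = ys.min(), ys.max(), xs.min(), xs.max()
    rect_area = (b - t + 1) * (r - l + 1)
    if comp_mask.sum() / rect_area >= fill_thresh:      # fill rate already high
        return None

    def bbox_area(mask):
        yy, xx = np.nonzero(mask)
        if yy.size == 0:
            return 0
        return (yy.max() - yy.min() + 1) * (xx.max() - xx.min() + 1)

    def scan(axis):
        best = None
        lo, hi = (t, b) if axis == 'row' else (l, r)
        for i in range(lo, hi + 1):
            if axis == 'row':
                a1, a2 = bbox_area(comp_mask[:i + 1]), bbox_area(comp_mask[i + 1:])
            else:
                a1, a2 = bbox_area(comp_mask[:, :i + 1]), bbox_area(comp_mask[:, i + 1:])
            if min(a1, a2) <= min_frac * img_area:      # new pieces would be too small
                continue
            if best is None or a1 + a2 < best[0]:
                best = (a1 + a2, i)
        return best

    row_best, col_best = scan('row'), scan('col')
    if row_best is None and col_best is None:
        return None
    if col_best is None or (row_best is not None and row_best[0] < col_best[0]):
        return ('row', row_best[1])
    return ('col', col_best[1])
```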
(4) Connected-region refinement and trailing filter
Before region refinement, connected regions whose height is more than 2 times their width are removed first; this may mistakenly delete vertically arranged captions, so to handle vertical captions the image only needs to be rotated by 90 degrees, with all other operations identical;
For each connected region C obtained in the previous step, its position is further refined as follows (a sketch appears after step k):
Input: edge map EdgeMap, initial upper and lower bound row numbers t_c, b_c of the connected region C
Output: refined upper and lower bound row numbers ut_c, ub_c
d) For every row i ∈ [t_c, b_c] of C, compute the span r_i − l_i between its leftmost and rightmost non-zero pixels in EdgeMap and store it in the array cSA;
e) For every row i ∈ [t_c, b_c] of C, compute the number of edge pixels of that row in EdgeMap and store it in the array oPNA, i.e. oPNA[i] = Σ_j EdgeMap(i, j);
f) Store the maximum of cSA in pCS and its row number in pSRN; store the maximum of oPNA in pOPN and its row number in pPRN;
g) Over all rows i ∈ [t_c, pSRN], take the largest row number t_1 with cSA[i] < pCS × η_1;
over all rows i ∈ [pSRN, b_c], take the smallest row number b_1 with cSA[i] < pCS × η_1;
over all rows i ∈ [t_c, pPRN], take the largest row number t_2 with oPNA[i] < pOPN × η_2;
over all rows i ∈ [pPRN, b_c], take the smallest row number b_2 with oPNA[i] < pOPN × η_2;
h) Let ut_c = max(t_1, t_2) and ub_c = min(b_1, b_2); these are the refined upper and lower bound positions,
where η_1 and η_2 usually take the values 0.6 and 0.3;
The following trailing filter is used to remove some non-caption connected regions:
i) After step g), continue scanning oPNA upward and downward until the value of the current row is less than pOPN × η_3, and let the resulting row numbers be t_tail and b_tail;
j) Compute the tail length as:
tl_1 = t_2 − t_tail, tl_2 = b_tail − b_2, tl = max(tl_1, tl_2)
k) Filter with the following formula; if deleteFlag(C) is 1, the connected region is not a caption area and should be deleted;

deleteFlag(C) = 1 if tl > (ub_c − ut_c) × η_4, 0 otherwise    (5)

where ut_c and ub_c are the refined upper and lower bound positions of C, and η_3 and η_4 usually take the values 0.2 and 0.3;
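A sketch of steps d) to k) for one candidate region; the handling of the case where no row falls below a threshold, and the exact direction of the tail scan, are not fully specified in the text and are interpreted here.

```python
import numpy as np

def refine_and_tail_filter(edge_map, t_c, b_c, l_c, r_c,
                           eta1=0.6, eta2=0.3, eta3=0.2, eta4=0.3):
    """Tighten the top/bottom bounds of a candidate region using the per-row
    pixel span (cSA) and per-row edge count (oPNA), then flag the region for
    deletion if its 'tail' of weak rows is too long relative to its height."""
    sub = edge_map[t_c:b_c + 1, l_c:r_c + 1]
    oPNA = sub.sum(axis=1).astype(float)
    cSA = np.zeros(len(oPNA))
    for k, row in enumerate(sub):
        xs = np.flatnonzero(row)
        cSA[k] = xs[-1] - xs[0] if xs.size else 0

    pSRN, pPRN = int(cSA.argmax()), int(oPNA.argmax())
    pCS, pOPN = cSA[pSRN], oPNA[pPRN]

    def bound(arr, peak, thr, side):
        idx = np.flatnonzero(arr < thr)
        cand = idx[idx <= peak] if side == 'top' else idx[idx >= peak]
        if cand.size == 0:                       # fallback when nothing is below the
            return 0 if side == 'top' else len(arr) - 1   # threshold (assumption)
        return cand.max() if side == 'top' else cand.min()

    t1 = bound(cSA, pSRN, pCS * eta1, 'top');  b1 = bound(cSA, pSRN, pCS * eta1, 'bot')
    t2 = bound(oPNA, pPRN, pOPN * eta2, 'top'); b2 = bound(oPNA, pPRN, pOPN * eta2, 'bot')
    ut, ub = max(t1, t2), min(b1, b2)

    # tail scan: walk outwards in oPNA while the row count stays above pOPN*eta3
    t_tail, b_tail = t2, b2
    while t_tail - 1 >= 0 and oPNA[t_tail - 1] >= pOPN * eta3:
        t_tail -= 1
    while b_tail + 1 < len(oPNA) and oPNA[b_tail + 1] >= pOPN * eta3:
        b_tail += 1
    tl = max(t2 - t_tail, b_tail - b2)
    delete = tl > (ub - ut) * eta4               # formula (5)
    return t_c + ut, t_c + ub, bool(delete)
```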
(5) Combination entropy filter
A combination entropy filter that combines the foreground-pixel distribution entropy and the edge-pixel distribution entropy is used, keeping only caption areas (a sketch follows below);
For the foreground-pixel distribution entropy, the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of a connected region C, where t_c and b_c are the upper and lower bounds and l_c and r_c the left and right bounds, is binarized with the Otsu threshold and then divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed as:

E_FPD = −Σ_{i,j} ( p_{i,j} ln p_{i,j} + (1 − p_{i,j}) ln(1 − p_{i,j}) ),  i ∈ {1, 2}, j ∈ {1, 2, 3, 4}    (6)

where p_{i,j} is the ratio of non-zero pixels in the part in row i and column j;
For the edge-pixel distribution entropy, the Sobel edge binary map inside the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of C is divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed as:

E_E = −Σ_{i,j} ( (e_{ij}/e_r) ln(e_{ij}/e_r) + (1 − e_{ij}/e_r) ln(1 − e_{ij}/e_r) ),  i ∈ {1, 2}, j ∈ {1, 2, 3, 4}    (7)

where e_{ij} is the number of edge pixels in the part in row i and column j, and e_r is the total number of edge pixels over the 8 parts;
For any refined connected region C, if E_FPD > E_T1 and E_E > E_T2 it is considered a caption area, otherwise it is a non-caption area and is deleted; experiments show the results are best when E_T1 and E_T2 take the values 6.4 and 2.76;
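A sketch of the combination entropy filter of formulas 6 and 7 over the 2 × 4 grid, with the thresholds 6.4 and 2.76 exposed as parameters; function names are illustrative.

```python
import numpy as np

def block_entropy(ratios):
    """-sum(p*ln(p) + (1-p)*ln(1-p)) over the eight blocks (formulas 6-7)."""
    p = np.clip(np.asarray(ratios, dtype=float), 1e-12, 1 - 1e-12)
    return float(-np.sum(p * np.log(p) + (1 - p) * np.log(1 - p)))

def combined_entropy_keep(binary_fg, edge_bin, e_t1=6.4, e_t2=2.76):
    """Return True when a refined region looks like a caption: both the
    foreground-pixel and the edge-pixel distribution entropies over a
    2 x 4 grid exceed their thresholds."""
    def blocks(img):
        h, w = img.shape
        ys = np.linspace(0, h, 3, dtype=int)        # 2 rows of blocks
        xs = np.linspace(0, w, 5, dtype=int)        # 4 columns of blocks
        return [img[ys[i]:ys[i+1], xs[j]:xs[j+1]] for i in range(2) for j in range(4)]

    fg_ratios = [b.mean() for b in blocks(binary_fg)]              # p_ij
    e_counts = np.array([b.sum() for b in blocks(edge_bin)], dtype=float)
    e_ratios = e_counts / max(e_counts.sum(), 1.0)                 # e_ij / e_r
    return block_entropy(fg_ratios) > e_t1 and block_entropy(e_ratios) > e_t2
```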
For images that contain both horizontally and vertically arranged captions, caption detection is performed again on the image rotated by 90 degrees, the detection results from the original image and the rotated image are then merged, and duplicates are eliminated.
The step of performing repeatability detection on a caption area, unifying its color polarity to white characters on a black background if the area is not a repeat and then performing caption extraction, and otherwise processing the next caption area, is as follows:
(6) Repeatability detection
A method combining position and gray-level histograms is used to de-duplicate the detected caption areas (a sketch follows), with the following steps:
l) Extract and store all caption area positions Rect_i[t_i, b_i, l_i, r_i] of the last processed frame and their gray-level histograms GH_i{g_{i,0}, g_{i,1}, …, g_{i,255}}, where g_{i,k} is the number of pixels in the i-th caption area whose gray level is k; extract and store all caption area positions Rect_j[t_j, b_j, l_j, r_j] of the current frame and their gray-level histograms GH_j{g_{j,0}, g_{j,1}, …, g_{j,255}};
m) Compute their location similarity Simi_Loc(i, j), defined from the area of Rect_i ∩ Rect_j, i.e. their common part, and the area of the larger of the two rectangles, and their gray-level histogram similarity Simi_GHis(i, j). If either Simi_Loc(i, j) or Simi_GHis(i, j) is greater than 0.8, the two areas are duplicate detections of the same area, and one of them is removed while the other is kept;
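A sketch of step m); the text does not reproduce the gray-level histogram similarity formula, so a normalized histogram intersection is used here as a stand-in, which is an assumption.

```python
import numpy as np

def location_similarity(rect_a, rect_b):
    """Intersection area divided by the larger of the two rectangle areas."""
    ta, ba, la, ra = rect_a
    tb, bb, lb, rb = rect_b
    ih = max(0, min(ba, bb) - max(ta, tb))
    iw = max(0, min(ra, rb) - max(la, lb))
    area_a, area_b = (ba - ta) * (ra - la), (bb - tb) * (rb - lb)
    return ih * iw / max(area_a, area_b, 1)

def histogram_similarity(gh_a, gh_b):
    """Stand-in similarity between two 256-bin gray histograms
    (normalized histogram intersection; the exact formula is not given)."""
    a, b = np.asarray(gh_a, float), np.asarray(gh_b, float)
    return np.minimum(a, b).sum() / max(a.sum(), b.sum(), 1.0)

def is_duplicate(rect_a, gh_a, rect_b, gh_b, thresh=0.8):
    """Two caption areas are considered the same when either the location
    similarity or the gray-histogram similarity exceeds 0.8."""
    return (location_similarity(rect_a, rect_b) > thresh or
            histogram_similarity(gh_a, gh_b) > thresh)
```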
(7) Color polarity unification
The gray-scale caption area is unified into white characters on a black background with the following steps:
n) First binarize the grayed caption area with the Otsu method, then convolve the binarized caption area with the 3 × 3 masks k_w = [[0, −1, 0], [−1, 4, −1], [0, −1, 0]] and k_b = [[0, 1, 0], [1, −4, 1], [0, 1, 0]] respectively, and determine the edge color at each pixel with:

P(x, y) = White_Edge if K_w(x, y) > 0; Black_Edge if K_b(x, y) > 0; Non_Edge if K_w(x, y) ≤ 0 and K_b(x, y) ≤ 0    (8)

Let N_w and N_b be the numbers of white-edge and black-edge pixels respectively, and define R_1 = N_w / N_b as their ratio;
o) Project the edge pixels of the edge map P obtained by formula 8 onto the columns, and suppose the columns where the projection is zero decompose the edge map into {x_0, x_1, …, x_n}, where x_i is the midpoint of a continuous run of columns whose projection is 0. Build the rectangles Rect_i[1, height, x_i, x_{i+1}] in turn; inside each rectangle of the edge map P, scan inward from the four sides and delete the first edge pixel encountered, then count the remaining white-edge and black-edge pixels, denoted N_w′ and N_b′;
p) Define R_2 = N_w′ / N_b′ as their ratio, and define

ΔR = (R_2 − R_1) / max(R_1, R_2), −1 ≤ ΔR ≤ 1    (9)

The color polarity of the caption area is judged as follows:
(a) if ΔR > T_2h, the captions are white;
(b) if T_2l ≤ ΔR ≤ T_2h, the captions are white when R_1 > T_2v and black when R_1 ≤ T_2v;
(c) if T_1h ≤ ΔR ≤ T_2l, the captions are white;
(d) if T_1l ≤ ΔR ≤ T_1h, the captions are white when R_1 > T_1v and black when R_1 ≤ T_1v;
(e) if ΔR < T_1l, the captions are black;
where T_1l = −0.25, T_1h = −0.15, T_1v = 1.2, T_2l = 0, T_2h = 0.35, T_2v = 0.8;
q) After the caption color polarity is judged, if the captions are black, the gray-scale map of the caption area is inverted, otherwise nothing is done; a sketch of the decision table follows.
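A sketch of the decision table (a) to (e), reading the garbled renderings "wrongly written or mispronounced character" and "surplus" as white characters and black characters respectively, which appears to be the intended meaning; the caller inverts the gray-scale map when 'black' is returned.

```python
def caption_polarity(R1, R2,
                     T1l=-0.25, T1h=-0.15, T1v=1.2,
                     T2l=0.0, T2h=0.35, T2v=0.8):
    """Classify the caption region polarity from R1 = Nw/Nb (all edge pixels)
    and R2 = Nw'/Nb' (edge pixels left after peeling the outermost ones)."""
    dR = (R2 - R1) / max(R1, R2, 1e-9)        # formula (9), -1 <= dR <= 1
    if dR > T2h:
        return 'white'                         # (a)
    if T2l <= dR <= T2h:                       # (b)
        return 'white' if R1 > T2v else 'black'
    if T1h <= dR < T2l:
        return 'white'                         # (c)
    if T1l <= dR < T1h:                        # (d)
        return 'white' if R1 > T1v else 'black'
    return 'black'                             # (e) dR < T1l

# If 'black' is returned, the gray-scale caption area is inverted so every
# region reaches the next stage as white characters on a black background.
```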
The step of binarizing the polarity-unified caption area in caption extraction, removing noise points and sending the result to OCR software for recognition is as follows (a sketch of the stepwise local binarization appears after step v):
r) Normalize the height of the grayed caption area to 24 pixels, then extend it by 4 pixels at the top and 4 pixels at the bottom, so that the height after extension is 32; denote the extended caption area image by EI;
s) Initialize every pixel of the result binary map B to 1, then apply stepwise horizontal local threshold binarization to EI: binarize with the Otsu method in local windows of 16 × 32 with a horizontal step of 8 pixels. Apply stepwise vertical local threshold binarization to EI in the same way: binarize with the Otsu method in local windows of image_width × 8 with a vertical step of 4 pixels. In each sub-window, wherever the gray value in EI is below the local threshold, the corresponding pixel value in B is set to 0;
t) Set to 0 the pixels in B that are connected to pixels whose value is 1 in the extension area. To prevent stroke pixels from also being set to 0, dam points are defined as:

Dam points = {(x, y) | B(x, y) = 1 ∧ 1 ≤ min(H_len(x, y), V_len(x, y)) ≤ 4}

where H_len(x, y) is the length of the longest horizontal run of 1s through pixel (x, y) and V_len(x, y) is the length of the longest vertical run of 1s through pixel (x, y); the expansion from the background cannot pass through dam points;
u) Obtain the edge information of EI with the Sobel operator. For each connected region whose value is 1 in B, count the number epn of edge pixels that fall inside or around it; if epn < tepn, set all pixels of this connected region to 0, thereby removing it, where tepn is determined by:
tepn = max(cheight, cwidth)
where cheight and cwidth are the height and width of this connected region;
v) Send the binary map B to OCR software for recognition.
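A sketch of step s), the stepwise local Otsu binarization, with a plain NumPy Otsu implementation; window placement at the image borders is simplified, and the names are illustrative.

```python
import numpy as np

def otsu_threshold(gray):
    """Plain Otsu threshold on a gray-level array with values 0-255."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    total, sum_all = hist.sum(), float(np.dot(np.arange(256), hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0.0, 0.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0 or w0 == total:
            continue
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / (total - w0)
        var = w0 * (total - w0) * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def stepwise_binarize(ei):
    """Horizontal pass with 16x32 windows (step 8) and a vertical pass with
    width x 8 windows (step 4); a pixel is set to 0 in B whenever its gray
    value falls below the local Otsu threshold of any window."""
    h, w = ei.shape                        # h is 32 after normalization/padding
    B = np.ones_like(ei, dtype=np.uint8)
    for x in range(0, max(w - 16, 0) + 1, 8):          # horizontal pass
        win = ei[:, x:x + 16]
        B[:, x:x + 16][win < otsu_threshold(win)] = 0
    for y in range(0, max(h - 8, 0) + 1, 4):           # vertical pass
        win = ei[y:y + 8, :]
        B[y:y + 8, :][win < otsu_threshold(win)] = 0
    return B
```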
Compared with the prior art, the invention has the following beneficial effects:
1) The caption detection algorithm of the invention overcomes the sensitivity of commonly used detection algorithms to the language, the caption alignment and the background complexity; by strengthening the corner information characteristic of captions, using the region segmentation algorithm and combining the combination entropy filter, it obtains detection results that are robust to changes in language, caption alignment and background complexity;
2) The caption extraction algorithm of the invention further removes noise pixels on top of a general extraction algorithm, improving the subsequent OCR recognition accuracy;
3) The invention alleviates the problem of too many repeated captions in video frames to a certain extent while also preventing some captions from being missed, and obtains good results on continuous video frame sequences.
Description of drawings
Fig. 1 is the video caption recognition framework diagram;
Fig. 2 is the video caption detection framework diagram;
Fig. 3 is an example flow of video caption detection on a frame image;
Fig. 4 is an example of video caption extraction on a caption area.
Embodiment
For a better understanding of the technical scheme of the present invention, the invention is further described below in conjunction with Fig. 1 and Fig. 2. Fig. 1 shows the framework of the video caption recognition method of the invention, and Fig. 2 shows the framework of the video caption detection method of the invention.
As shown in Figs. 3 and 4, an example of the recognition process is given for a frame image in a video. The concrete implementation steps of this example are described below in conjunction with the method of the invention, as follows:
For a frame image, as shown in Fig. 3(a), the corner-strengthened edge map is obtained with the edge detection method (1) of claim 2, and the result is shown in Fig. 3(b);
(1) Taking the edge map obtained in the previous step as input, edge points are connected with the edge point connection method (2) of claim 2, and the result is shown in Fig. 3(c);
(2) Taking the map after edge point connection as input, larger connected regions are obtained with the connected-region collection and segmentation algorithm (3) of claim 2, and the result is shown in Fig. 3(d);
(3) The connected regions obtained in the previous step are refined with the connected-region refinement and trailing filter method (4) of claim 2 to obtain more accurate region positions and sizes and to filter preliminarily, and the result is shown in Fig. 3(e);
(4) The connected regions remaining after filtering are filtered with the combination entropy filter (5) of claim 2 to remove non-caption areas, and the final detection result is shown in Fig. 3(f);
(5) For a specific caption area detected in the previous step, as shown in Fig. 4(a), the repeatability detection (6) of claim 3 first judges whether it repeats a previously detected area; if it does not, the color polarity unification method (7) of claim 3 unifies the area into white characters on a black background;
(6) The caption area after color polarity unification is binarized and denoised with the algorithm of claim 4 to obtain a good binary map, and the result is shown in Fig. 4(b);
(7) Commercial OCR software recognizes the binary map, and the result is shown in Fig. 4(c).
As can be seen from the drawings, the method detects the caption areas in the video frame image well and binarizes them, and the binarization result reaches good recognition accuracy.

Claims (4)

1. A video caption recognition method based on edge information and distribution entropy, characterized in that its steps are as follows:
1) detect the difference between the current frame and the last processed frame; if the difference is large, perform the following caption recognition operations, otherwise take the next frame and repeat the judgment;
2) caption recognition first performs caption detection: edge detection, edge point connection, connected-region collection and segmentation, connected-region refinement and the trailing filter are used to obtain candidate text areas and their positions, and the combination entropy filter then removes non-text areas so that only caption areas remain;
3) perform repeatability detection on each caption area; if the area is not a repeat, unify its color polarity to white characters on a black background and then perform caption extraction, otherwise process the next caption area;
4) in caption extraction, binarize the polarity-unified caption area, remove noise points and send the result to OCR software for recognition; the detection of the difference between the current frame and the last processed frame proceeds as follows: let the current frame be I_i with edge binary map E_i, and let the last processed frame, five frames earlier, be I_{i-5} with edge binary map E_{i-5}; let D_{i,i-5} = E_i ⊕ E_{i-5}, let any caption area detected last time be Area_{i-5,j}, and let pMES be the minimum of the accumulated edge counts of the last detected caption areas; the accumulated difference of the caption areas in the current frame is calculated as:
cFD = Σ_j D_{i,i-5}(Area_{i-5,j})    (1)
if cFD is less than or equal to pMES × 0.5, this frame does not need caption recognition and the judgment continues with the frame five frames later, otherwise caption recognition is performed on this frame; to further avoid missing captions, a counter ck is kept: each time cFD is less than or equal to pMES × 0.5, ck is increased by 1, otherwise ck is reset to 0; when ck equals 5, caption recognition is performed on the frame regardless of the preceding judgment, and ck is reset to 0.
2. The video caption recognition method based on edge information and distribution entropy according to claim 1, characterized in that the caption detection step, in which edge detection, edge point connection, connected-region collection and segmentation, connected-region refinement and the trailing filter are used to obtain candidate text areas and their positions, and the combination entropy filter then removes non-text areas so that only caption areas remain, is as follows:
(1) Edge detection
Given an image I, edges are detected with the Sobel operator, which consists of gradient templates in four directions: horizontal S_H, vertical S_V, diagonal S_LD and anti-diagonal S_RD; the edge field is calculated as:
S = MAX(|S_H|, |S_V|, |S_LD|, |S_RD|) + k × |S_{⊥-MAX}|    (2)
where S_{⊥-MAX} denotes the response at pixel (x, y) in the direction perpendicular to the direction with the greatest gradient magnitude, and k is an adjustment factor, set to 1 here; S is then quantized into 16 levels, written S′ after quantization, and the edge map EdgeMap is obtained as:
EdgeMap(x, y) = 1 if S′(x, y) ≥ 15, 0 if S′(x, y) < 15    (3)
(2) Edge point connection
For the edge map EdgeMap, if the distance between two edge points in the same row is less than a threshold T_d, all pixel values between these two pixels in EdgeMap are set to 1; T_d is determined by:
T_d = max(4, min([max(height, width) / 50], 16))    (4)
where height and width are the height and width of the image I;
(3) Connected-region collection and segmentation
The EdgeMap obtained in the previous step is used for connected-region collection; connected regions whose height or width is less than 1% of the height or width of the whole image are removed, as are those whose minimum enclosing rectangle is smaller than 0.2% of the whole image area; each remaining connected region C is then segmented with the following steps:
a) for every row i ∈ [t_c, b_c] in connected region C, compute the area Area_up(i) of the minimum enclosing rectangle of this row and the part above it and the area Area_down(i) of the minimum enclosing rectangle of the part below this row, take their sum, find the row that minimizes the sum and store its row number in bR;
b) for every column j ∈ [l_c, r_c] in connected region C, compute the area Area_left(j) of the minimum enclosing rectangle of this column and the part to its left and the area Area_right(j) of the minimum enclosing rectangle of the part to its right, take their sum, find the column that minimizes the sum and store its column number in bC;
c) let mRA = Area_up(bR) + Area_down(bR) and mCA = Area_left(bC) + Area_right(bC); if mRA < mCA, split the connected region C into two connected regions along the rows with row bR as the boundary, otherwise split C into two connected regions along the columns with column bC as the boundary;
where t_c, b_c, l_c and r_c are the upper-bound row number, lower-bound row number, left-bound column number and right-bound column number of connected region C;
to prevent over-segmentation, the split is performed only when the connected region C satisfies both of the following conditions: ① the fill rate of the connected region is less than 0.8; ② the areas of the two new connected regions are both greater than 0.2% of the whole image area;
(4) Connected-region refinement and trailing filter
Before region refinement, connected regions whose height is more than 2 times their width are removed first; this may mistakenly delete vertically arranged captions, so to handle vertical captions the image only needs to be rotated by 90 degrees, with all other operations identical;
for each connected region C obtained in the previous step, its position is further refined as follows:
Input: edge map edgeMap, initial upper and lower bound row numbers t_c, b_c of the connected region C
Output: refined upper and lower bound row numbers ut_c, ub_c
d) for every row i ∈ [t_c, b_c] of connected region C, compute the span r_i − l_i of its non-zero pixels in edgeMap and store it in the array cSA;
e) for every row i ∈ [t_c, b_c] of connected region C, compute the number of edge pixels of that row in edgeMap and store it in the array oPNA, i.e. oPNA[i] = Σ_j edgeMap(i, j);
f) store the maximum of cSA in pCS and its row number in pSRN; store the maximum of oPNA in pOPN and its row number in pPRN;
g) over all rows i ∈ [t_c, pSRN], take the largest row number t_1 with cSA[i] < pCS × η_1;
over all rows i ∈ [pSRN, b_c], take the smallest row number b_1 with cSA[i] < pCS × η_1;
over all rows i ∈ [t_c, pPRN], take the largest row number t_2 with oPNA[i] < pOPN × η_2;
over all rows i ∈ [pPRN, b_c], take the smallest row number b_2 with oPNA[i] < pOPN × η_2;
h) let ut_c = max(t_1, t_2) and ub_c = min(b_1, b_2); these are the refined upper and lower bound row numbers ut_c, ub_c,
where η_1 and η_2 usually take the values 0.6 and 0.3;
the following trailing filter is used to remove some non-caption connected regions:
i) after step g), continue scanning oPNA upward and downward until the value of the current row is less than pOPN × η_3, and let the resulting row numbers be t_tail and b_tail;
j) compute the tail length as:
tl_1 = t_2 − t_tail, tl_2 = b_tail − b_2, tl = max(tl_1, tl_2)
k) filter with the following formula; if deleteFlag(C) is 1, the connected region is not a caption area and should be deleted;
deleteFlag(C) = 1 if tl > (ub_c − ut_c) × η_4, 0 otherwise    (5)
where ut_c and ub_c are the refined upper and lower bound row numbers of connected region C, and η_3 and η_4 usually take the values 0.2 and 0.3;
(5) Combination entropy filter
A combination entropy filter that combines the foreground-pixel distribution entropy and the edge-pixel distribution entropy is used, keeping only caption areas;
for the foreground-pixel distribution entropy, the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of a connected region C, where t_c and b_c are the upper and lower bound row numbers and l_c and r_c the left and right bound column numbers, is binarized with the Otsu threshold and then divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed as:
E_FPD = −Σ_{i,j} ( p_{i,j} ln p_{i,j} + (1 − p_{i,j}) ln(1 − p_{i,j}) ),  i ∈ {1, 2}, j ∈ {1, 2, 3, 4}    (6)
where p_{i,j} is the ratio of non-zero pixels in the part in row i and column j;
for the edge-pixel distribution entropy, the Sobel edge binary map inside the minimum enclosing rectangle Rect[t_c, b_c, l_c, r_c] of connected region C is divided into 2 rows × 4 columns = 8 parts, and the distribution entropy is computed as:
E_E = −Σ_{i,j} ( (e_{ij}/e_r) ln(e_{ij}/e_r) + (1 − e_{ij}/e_r) ln(1 − e_{ij}/e_r) ),  i ∈ {1, 2}, j ∈ {1, 2, 3, 4}    (7)
where e_{ij} is the number of edge pixels in the part in row i and column j, and e_r is the total number of edge pixels over the 8 parts;
for any refined connected region C, if E_FPD > E_T1 and E_E > E_T2 it is considered a caption area, otherwise it is a non-caption area and is deleted; experiments show the results are best when E_T1 and E_T2 take the values 6.4 and 2.76;
for images that contain both horizontally and vertically arranged captions, caption detection is performed again on the image rotated by 90 degrees, the detection results from the original image and the rotated image are then merged, and duplicates are eliminated.
3. The video caption recognition method based on edge information and distribution entropy according to claim 1, characterized in that the step of performing repeatability detection on a caption area, unifying its color polarity to white characters on a black background if the area is not a repeat and then performing caption extraction, and otherwise processing the next caption area, is as follows:
(6) Repeatability detection
A method combining position and gray-level histograms is used to de-duplicate the detected caption areas, with the following steps:
l) extract and store all caption area positions Rect_i[t_i, b_i, l_i, r_i] of the last processed frame and their gray-level histograms GH_i{g_{i,0}, g_{i,1}, …, g_{i,k}, …, g_{i,255}}, where g_{i,k} is the number of pixels in the i-th caption area whose gray level is k; extract and store all caption area positions Rect_j[t_j, b_j, l_j, r_j] of the current frame and their gray-level histograms GH_j{g_{j,0}, g_{j,1}, …, g_{j,255}};
m) compute their location similarity Simi_Loc(i, j) and their gray-level histogram similarity Simi_GHis(i, j), where Rect_i ∩ Rect_j is the area of their common part and max(Rect_i, Rect_j) is the area of the larger of the two rectangles; if either Simi_Loc(i, j) or Simi_GHis(i, j) is greater than 0.8, the two areas are duplicate detections of the same area, and one of them is removed while the other is kept;
(7) color is extremely unified
The caption area gray-scale map is unified into the black matrix wrongly written or mispronounced character, takes following steps:
N) at first the caption area after gray processing is used the value of Otsu method two, then used respectively 3 * 3 mask k w ( x , y ) = 0 - 1 0 - 1 4 - 1 0 - 1 0 With k b ( x , y ) = 0 1 0 1 - 4 1 0 1 0 Caption area after binaryzation is carried out convolution operation, determines the edge color at each pixel place with following formula:
P(x,y) = \begin{cases} \text{White\_Edge}, & K_w(x,y) > 0 \\ \text{Black\_Edge}, & K_b(x,y) > 0 \\ \text{Non\_Edge}, & K_w(x,y) \le 0 \text{ and } K_b(x,y) \le 0 \end{cases} \qquad (8)
Let N_w and N_b denote the numbers of white-edge pixels and black-edge pixels respectively, and define R_1 = N_w / N_b as their ratio;
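A sketch of the edge-color classification of formula 8 and the ratio R_1, assuming Python/SciPy and a 0/1 binary input; the function names are illustrative and not taken from the patent text.

```python
import numpy as np
from scipy.signal import convolve2d

K_W = np.array([[0, -1, 0], [-1, 4, -1], [0, -1, 0]])   # white-edge mask k_w
K_B = -K_W                                               # black-edge mask k_b

def edge_color_map(binary_region):
    """Formula (8): label each pixel White_Edge / Black_Edge / Non_Edge."""
    kw = convolve2d(binary_region, K_W, mode='same')
    kb = convolve2d(binary_region, K_B, mode='same')
    p = np.full(binary_region.shape, 'Non_Edge', dtype=object)
    p[kw > 0] = 'White_Edge'     # kw > 0 and kb > 0 cannot hold simultaneously
    p[kb > 0] = 'Black_Edge'
    return p

def edge_ratio(edge_map):
    """R_1 = N_w / N_b, the ratio of white-edge to black-edge pixel counts."""
    n_w = np.count_nonzero(edge_map == 'White_Edge')
    n_b = np.count_nonzero(edge_map == 'Black_Edge')
    return n_w / n_b if n_b else float('inf')
```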
O) project the edge pixels of the edge map P obtained by formula 8 onto the columns; suppose the column projection decomposes into {x_0, x_1, ..., x_n}, where x_i is the midpoint of a continuous run of columns whose projection is 0; build the rectangles Rect_i[1, height, x_i, x_{i+1}] in turn; within each such rectangle of the edge map P, scan inward from the four sides and delete the first edge pixel encountered; then recount the white-edge and black-edge pixels, denoted N_w' and N_b' respectively;
P) define R_2 = N_w' / N_b' as their ratio, and define
\Delta R = \frac{R_2 - R_1}{\max(R_1, R_2)}, \quad -1 \le \Delta R \le 1 \qquad (9)
The color polarity of the caption area is then determined as follows:
(a) if ΔR > T_2h, the captions are white;
(b) if T_2l ≤ ΔR ≤ T_2h, the captions are white when R_1 > T_2v and black when R_1 ≤ T_2v;
(c) if T_1h ≤ ΔR ≤ T_2l, the captions are white;
(d) if T_1l ≤ ΔR ≤ T_1h, the captions are white when R_1 > T_1v and black when R_1 ≤ T_1v;
(e) if ΔR < T_1l, the captions are black;
where T_1l = -0.25, T_1h = -0.15, T_1v = 1.2, T_2l = 0, T_2h = 0.35, T_2v = 0.8;
Q) after the color polarity of the captions has been determined, invert the grayscale caption area if the captions are black; otherwise leave it unchanged.
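A sketch of the polarity decision of rules (a)-(e) and the inversion of step Q, assuming Python/NumPy, an 8-bit grayscale caption area, and that R_1 and R_2 have already been computed as described in steps N-P. Boundary cases where ΔR equals a threshold are resolved in the order the rules are written, and the function names are illustrative.

```python
def caption_polarity(r1, r2,
                     t1l=-0.25, t1h=-0.15, t1v=1.2,
                     t2l=0.0, t2h=0.35, t2v=0.8):
    """Return 'white' or 'black' for the caption color, following rules (a)-(e)."""
    delta_r = (r2 - r1) / max(r1, r2)            # formula (9)
    if delta_r > t2h:
        return 'white'                            # (a)
    if t2l <= delta_r <= t2h:
        return 'white' if r1 > t2v else 'black'   # (b)
    if t1h <= delta_r < t2l:
        return 'white'                            # (c)
    if t1l <= delta_r <= t1h:
        return 'white' if r1 > t1v else 'black'   # (d)
    return 'black'                                # (e): delta_r < t1l

def unify_to_white_on_black(gray_region, r1, r2):
    """Step Q: invert the 8-bit grayscale caption area when the text is black."""
    return 255 - gray_region if caption_polarity(r1, r2) == 'black' else gray_region
```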
4. The video caption recognition method based on edge information and distribution entropy according to claim 1, characterized in that the described steps of, during caption extraction, binarizing the caption area after its color polarity has been unified, removing noise points, and then sending the result to OCR software for recognition are:
R) normalize the height of the grayscale caption area to 24 pixels, then pad 4 pixels at both the top and the bottom so that the height after expansion is 32; denote the expanded caption area image EI;
S) initialize every pixel of the result binary map B to 1, then apply stepwise horizontal local threshold binarization to EI, binarizing with the Otsu method inside a 16 × 32 local window and stepping 8 pixels horizontally at a time; apply stepwise vertical local threshold binarization to EI in the same way, binarizing with the Otsu method inside a local window of image_width × 8 and stepping 4 pixels vertically at a time; within each sub-window, wherever the gray value in EI is below the local threshold, the corresponding pixel of B is set to 0;
T) set to 0 those pixels of B that are connected to value-1 pixels in the expanded (padding) area; to prevent stroke pixels from also being set to 0, define the dam points:
Dam points = {(x, y) | B(x, y) = 1 ∧ 1 ≤ min(H_len(x, y), V_len(x, y)) ≤ 4}
where H_len(x, y) is the length of the longest horizontal run of 1s through pixel (x, y), and V_len(x, y) is the length of the longest vertical run of 1s through pixel (x, y); the expansion must not pass through dam points, i.e. dam points are never set to background;
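The dam-point constraint can be sketched as follows, assuming Python/NumPy and a 0/1 binary map B; run_lengths_1d is an illustrative helper that assigns to every pixel the length of the run of 1s containing it.

```python
import numpy as np

def run_lengths_1d(row):
    """For a 0/1 vector, give each position the length of the run of 1s it lies in."""
    out = np.zeros(len(row), dtype=int)
    start = None
    for i, v in enumerate(list(row) + [0]):      # sentinel 0 closes the last run
        if v == 1 and start is None:
            start = i
        elif v != 1 and start is not None:
            out[start:i] = i - start
            start = None
    return out

def dam_points(B):
    """Dam points: foreground pixels whose min(H_len, V_len) lies in [1, 4]."""
    h_len = np.vstack([run_lengths_1d(B[y, :]) for y in range(B.shape[0])])
    v_len = np.vstack([run_lengths_1d(B[:, x]) for x in range(B.shape[1])]).T
    m = np.minimum(h_len, v_len)
    return (B == 1) & (m >= 1) & (m <= 4)
```

During the border expansion of step T, any pixel flagged by dam_points is treated as a barrier and is left at 1, so thin stroke pixels are not flooded away.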
U) use the Sobel operator to obtain the edge information of EI; for each connected region of value 1 in B, count the number epn of edge pixels that fall inside or around it; if epn < tepn, set all pixels of this connected region to 0, thereby removing it; tepn is determined by the following formula:
tepn=max(cheight,cwidth)
where cheight and cwidth are the height and width of this connected region, respectively;
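A sketch of this noise-removal step, assuming Python with SciPy's connected-component labeling and a precomputed Sobel edge map of EI. Counting the edge pixels "inside or around" a region is approximated here by dilating the region by one pixel, which is an assumption rather than the patent's exact definition.

```python
import numpy as np
from scipy import ndimage

def remove_low_edge_components(B, sobel_edges):
    """Step U (sketch): drop connected 1-regions of B whose nearby edge-pixel
    count epn is below tepn = max(cheight, cwidth) of the region."""
    labels, n = ndimage.label(B == 1)
    out = B.copy()
    for lab in range(1, n + 1):
        comp = labels == lab
        ys, xs = np.nonzero(comp)
        cheight = ys.max() - ys.min() + 1
        cwidth = xs.max() - xs.min() + 1
        around = ndimage.binary_dilation(comp)          # region plus a 1-pixel rim
        epn = np.count_nonzero(sobel_edges[around])
        if epn < max(cheight, cwidth):                  # tepn = max(cheight, cwidth)
            out[comp] = 0
    return out
```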
V) send the binary map B to OCR software for recognition.
CN 201110024330 2011-01-23 2011-01-23 Method for recognizing and designing video captions based on edge information and distribution entropy Expired - Fee Related CN102208023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110024330 CN102208023B (en) 2011-01-23 2011-01-23 Method for recognizing and designing video captions based on edge information and distribution entropy

Publications (2)

Publication Number Publication Date
CN102208023A CN102208023A (en) 2011-10-05
CN102208023B true CN102208023B (en) 2013-05-08

Family

ID=44696845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110024330 Expired - Fee Related CN102208023B (en) 2011-01-23 2011-01-23 Method for recognizing and designing video captions based on edge information and distribution entropy

Country Status (1)

Country Link
CN (1) CN102208023B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102780856B (en) * 2012-04-12 2013-11-27 天脉聚源(北京)传媒科技有限公司 Method for annotating subtitles in news video
CN103377379A (en) * 2012-04-27 2013-10-30 佳能株式会社 Text detection device and method and text information extraction system and method
CN103136523B (en) * 2012-11-29 2016-06-29 浙江大学 Any direction text line detection method in a kind of natural image
US9177383B2 (en) * 2013-08-29 2015-11-03 Analog Devices Global Facial detection
WO2018023538A1 (en) * 2016-08-04 2018-02-08 黄新勇 Method and system for extracting television broadcast subtitle
CN106355172A (en) * 2016-08-11 2017-01-25 无锡天脉聚源传媒科技有限公司 Character recognition method and device
CN107590447B (en) * 2017-08-29 2021-01-08 北京奇艺世纪科技有限公司 Method and device for recognizing word title
CN108982106B (en) * 2018-07-26 2020-09-22 安徽大学 Effective method for rapidly detecting kinetic mutation of complex system
CN109284751A (en) * 2018-10-31 2019-01-29 河南科技大学 The non-textual filtering method of text location based on spectrum analysis and SVM
CN111754414B (en) * 2019-03-29 2023-10-27 北京搜狗科技发展有限公司 Image processing method and device for image processing
CN110197177B (en) * 2019-04-22 2024-03-19 平安科技(深圳)有限公司 Method, device, computer equipment and storage medium for extracting video captions
CN111064990B (en) * 2019-11-22 2021-12-14 华中师范大学 Video processing method and device and electronic equipment
CN113496223A (en) * 2020-03-19 2021-10-12 顺丰科技有限公司 Method and device for establishing text region detection model
CN111783771B (en) * 2020-06-12 2024-03-19 北京达佳互联信息技术有限公司 Text detection method, text detection device, electronic equipment and storage medium
CN111860521B (en) * 2020-07-21 2022-04-22 西安交通大学 Method for segmenting distorted code-spraying characters layer by layer
CN111967526B (en) * 2020-08-20 2023-09-22 东北大学秦皇岛分校 Remote sensing image change detection method and system based on edge mapping and deep learning
CN111741236B (en) * 2020-08-24 2021-01-01 浙江大学 Method and device for generating positioning natural image subtitles based on consensus diagram characteristic reasoning
CN112925905B (en) * 2021-01-28 2024-02-27 北京达佳互联信息技术有限公司 Method, device, electronic equipment and storage medium for extracting video subtitles
CN113485432A (en) * 2021-07-26 2021-10-08 西安热工研究院有限公司 Photovoltaic power station electroluminescence intelligent diagnosis system and method based on unmanned aerial vehicle
CN116453030B (en) * 2023-04-07 2024-07-05 郑州大学 Building material recycling method based on computer vision

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100836197B1 (en) * 2006-12-14 2008-06-09 삼성전자주식회사 Apparatus for detecting caption in moving picture and method of operating the apparatus

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1835462A1 (en) * 2004-12-02 2007-09-19 National Institute of Advanced Industrial Science and Technology Tracing device, and tracing method
CN101122952A (en) * 2007-09-21 2008-02-13 北京大学 Picture words detecting method
CN101833664A (en) * 2010-04-21 2010-09-15 中国科学院自动化研究所 Video image character detecting method based on sparse expression

Non-Patent Citations (12)

* Cited by examiner, † Cited by third party
Title
A comprehensive method for multilingual video text detection, localization, and extraction;Michael R.Lyu等;《IEEE Transactions on Circuits and Systems for Video Technology 2005》;20050228;第15卷(第2期);243-255 *
A New Approach for Overlay Text Detection and Extraction from Complex Video Scene;Wonjun Kim等;《IEEE Transactions on Image Processing 2009》;20090228;第18卷(第2期);401-411 *
A robust statistic method for classifying color polarity of video text;J. Song等;《Acoust, Speech, and Signal Processing, 2003. Proceedings.(ICASSP ’03). 2003 IEEE International Conference on》;20030410;第3卷;581-584 *
Video text tracking and segmentation algorithm based on multi-frame images; Mi Congjie et al.; Journal of Computer Research and Development; 2006-09-30; Vol. 43, No. 9; 1523-1529 *
Research on video caption extraction algorithms based on spatio-temporal information; Shen Shujuan; China Master's Theses Full-text Database; 2004-06-11; 45-47 *
Text extraction from video and its applications; Lu Bing; China Master's Theses Full-text Database; 2007-10-12; 7-62 *

Also Published As

Publication number Publication date
CN102208023A (en) 2011-10-05

Similar Documents

Publication Publication Date Title
CN102208023B (en) Method for recognizing and designing video captions based on edge information and distribution entropy
Gllavata et al. A robust algorithm for text detection in images
Zhang et al. Extraction of text objects in video documents: Recent progress
CN104751142B (en) A kind of natural scene Method for text detection based on stroke feature
JP5492205B2 (en) Segment print pages into articles
WO2018018788A1 (en) Image recognition-based meter reading apparatus and method thereof
CN101593276B (en) Video OCR image-text separation method and system
CN101122953A (en) Picture words segmentation method
DE102013206009A1 (en) Robust cutting of license plate images
KR20010110416A (en) Video stream classifiable symbol isolation method and system
CN102081731A (en) Method and device for extracting text from image
CN101122952A (en) Picture words detecting method
CN110598566A (en) Image processing method, device, terminal and computer readable storage medium
Bijalwan et al. Automatic text recognition in natural scene and its translation into user defined language
CN104598907A (en) Stroke width figure based method for extracting Chinese character data from image
CN110633635A (en) ROI-based traffic sign board real-time detection method and system
CN113971792A (en) Character recognition method, device, equipment and storage medium for traffic sign board
Gllavata et al. A text detection, localization and segmentation system for OCR in images
Anthimopoulos et al. Multiresolution text detection in video frames
JP6377214B2 (en) Text detection method and apparatus
Seeri et al. A novel approach for Kannada text extraction
Satish et al. Edge assisted fast binarization scheme for improved vehicle license plate recognition
Mol et al. Text recognition using poisson filtering and edge enhanced maximally stable extremal regions
CN116030472A (en) Text coordinate determining method and device
Kaur et al. An Efficient Method of Number Plate Extraction from Indian Vehicles Image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130508

Termination date: 20140123