CN102724384B - Detecting method for three-dimensional video subtitles and system using same - Google Patents


Info

Publication number
CN102724384B
CN102724384B (application CN201210208898.2A)
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201210208898.2A
Other languages
Chinese (zh)
Other versions
CN102724384A (en)
Inventor
戴琼海 (Qionghai Dai)
李龙弢 (Longtao Li)
王瑞平 (Ruiping Wang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN201210208898.2A priority Critical patent/CN102724384B/en
Publication of CN102724384A publication Critical patent/CN102724384A/en
Application granted granted Critical
Publication of CN102724384B publication Critical patent/CN102724384B/en

Landscapes

  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)

Abstract

The invention provides a detection method for three-dimensional (stereoscopic) video subtitles and a system using the same. The detection system comprises a delay module, an edge-computing module, an averaging module, a subtitle-area determination module and a memory cell array. The detection method comprises the steps of: inputting video synchronization signals and video data; extracting video format information and determining the size of the subtitle detection pane; delaying the input video synchronization signals; calculating the sum of the absolute values of the video data edges and computing row averages and column averages of the result; and comparing the averages with a threshold to determine whether the pane corresponding to each column average belongs to a subtitle area. The method and system detect subtitle areas with a programmed programmable device and therefore offer small size, low cost, high efficiency and high speed. Because the detection method is designed for hardware, it can detect subtitles in real time, and the result can be used for subsequent subtitle recognition, video information extraction, improvement of 2D-to-3D conversion effects, and the like.

Description

Stereo video subtitle detection method and system using same
Technical Field
The invention relates to the technical field of video processing, in particular to a stereo video subtitle detection method and a system using the method.
Background
At present, fully automatic 2D-to-3D conversion technology can convert a flat video into a stereoscopic video in real time without manual intervention, alleviating the shortage of stereoscopic film sources. However, some stereoscopic conversion algorithms or systems produce obvious jitter in the subtitle region, which degrades the viewing experience. One existing subtitle detection approach detects subtitles in software; because it is designed for software, it has difficulty detecting subtitle regions in real-time video and requires considerable resources when ported to hardware. Owing to the limitations of hardware resources, and especially the differences between hardware real-time video and software-processed video, software algorithms cannot simply be transplanted to hardware systems. On the other hand, programmable devices are small, inexpensive, fast and highly parallel, so detecting the subtitle area with a programmed programmable device is both efficient and fast. Designing a subtitle-area detection method and system for hardware that achieves real-time subtitle detection is therefore a technical problem that urgently needs to be solved.
Disclosure of Invention
The invention aims to at least solve the technical problems in the prior art, and particularly provides a stereo video subtitle detection method and a system using the method.
In order to achieve the above object, according to one aspect of the present invention, there is provided a stereoscopic video subtitle detecting method including:
s1: inputting a video synchronization signal and video data;
s2: extracting video format information from the input video synchronization signal, and determining the subtitle detection pane size n × m according to the video format information and a video selection mode, wherein n is the height and m is the length of the subtitle detection pane;
s3: performing delay processing on an input video synchronous signal according to the video format information and the video selection mode, and outputting a delayed video synchronous signal;
s4: calculating the sum of the absolute values of the edges of the video data in the X and Y directions according to the video selection mode;
s5: obtaining a row average value and a column average value from the result of the step S4 according to a video selection mode;
s6: and comparing the result of the step S5 with a threshold value, determining whether the pane corresponding to the column average value belongs to the subtitle area, and outputting a judgment result.
By detecting the subtitle area with a programmed programmable device, the method offers small size, low cost, high efficiency and high speed; it can detect subtitles in real time, and the result can be used for subsequent subtitle recognition, video information extraction, improvement of stereo conversion effects, and the like.
In order to achieve the above object, according to another aspect of the present invention, there is provided a stereoscopic video subtitle detection system including: a delay module, an edge-computing module, an averaging module, a subtitle-area determination module and a memory cell array. The memory cell array comprises a first, a second and a third memory cell array; the first memory cell array is connected with the edge-computing module, the second with the averaging module, and the third with the subtitle-area determination module. The delay module receives the input video synchronization signal, delays it, and outputs a delayed video synchronization signal. The edge-computing module receives the input video synchronization signal and video data, calculates the sum of the absolute values of the edges of the video data in the X and Y directions using the first memory cell array, and passes the result to the averaging module. The averaging module uses the second memory cell array to compute the row and column averages of that sum and passes the result to the subtitle-area determination module. The subtitle-area determination module determines, using the third memory cell array, whether the pane corresponding to each average belongs to the subtitle area, and outputs the judgment result.
The stereoscopic video subtitle detection system of the invention detects the subtitle area with a programmed programmable device, and therefore has the advantages of small size, low cost, high efficiency and high speed.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a block diagram of a stereo video caption detection system in a preferred embodiment of the present invention;
FIG. 2 is a diagram of the relationship between the video data line length and the valid data line length of the enable signal DEN in a preferred embodiment of the present invention;
FIG. 3 is an implementation diagram of the delayed video synchronization signal in a preferred embodiment of the present invention;
FIG. 4 is a flow chart of the line average calculation of video data according to the present invention;
FIG. 5 is a timing diagram illustrating read and write operations of a second memory cell array according to the present invention;
FIG. 6 is a flowchart illustrating the operation of the present invention in averaging the rows for the corresponding columns of n rows.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
In the description of the present invention, it is to be understood that the terms "longitudinal", "lateral", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used merely for convenience of description and for simplicity of description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed in a particular orientation, and be operated, and thus, are not to be construed as limiting the present invention.
In the description of the present invention, unless otherwise specified and limited, it is to be noted that the terms "mounted," "connected," and "connected" are to be interpreted broadly, and may be, for example, a mechanical connection or an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium, and specific meanings of the terms may be understood by those skilled in the art according to specific situations.
Fig. 1 is a structural diagram of the stereoscopic video subtitle detection system in a preferred embodiment of the present invention. As shown, the system includes a delay module, an edge-computing module, an averaging module, a subtitle-area determination module, and a memory cell array comprising a first, a second and a third memory cell array; the first memory cell array is connected to the edge-computing module, the second to the averaging module, and the third to the subtitle-area determination module. The delay module receives the input video synchronization signal, delays it, and outputs a delayed video synchronization signal. The edge-computing module receives the input video synchronization signal and video data, calculates the sum of the absolute values of the edges of the video data in the X and Y directions using the first memory cell array, and passes it to the averaging module. The averaging module uses the second memory cell array to compute the row and column averages of this sum and passes the result to the subtitle-area determination module. The subtitle-area determination module uses the third memory cell array to determine whether the pane corresponding to each average belongs to the subtitle area, and outputs the judgment result.
The stereoscopic video subtitle detection system of the invention detects the subtitle area with a programmed programmable device, and therefore has the advantages of small size, low cost, high efficiency and high speed.
As can be seen from fig. 1, when the stereo video caption detection system detects a stereo video caption, the averaging module is further connected to the input video synchronization signal and the delayed video synchronization signal, and the caption region determining module is further connected to the delayed video synchronization signal.
The invention also provides a stereo video subtitle detection method, which can be applied to the stereo video subtitle detection system of the invention and comprises the following steps:
s1: inputting a video synchronization signal and video data;
s2: extracting video format information from the input video synchronization signal, and determining the subtitle detection pane size n × m according to the video format information and a video selection mode, wherein n is the height and m is the length of the subtitle detection pane;
s3: performing delay processing on the input video synchronous signal according to the video format information and the video selection mode, and outputting a delayed video synchronous signal;
s4: calculating the sum of the absolute values of the edges of the video data in the X and Y directions according to the video selection mode;
s5: obtaining a row average value and a column average value for the result of step S4 according to the video selection mode;
s6: and comparing the result of the step S5 with a threshold value, determining whether the pane corresponding to the column average value belongs to the subtitle area, and outputting a judgment result.
In a preferred embodiment of the present invention, the method for detecting subtitles of stereoscopic video includes the steps of:
in the first step, a video sync signal and video data are input, and in the present embodiment, the video sync signal x0 includes an enable signal DEN, a line sync signal HSYNC, and a frame sync signal VSYNC.
In the second step, video format information is extracted from the input video synchronization signal, and the subtitle detection pane size n × m is determined according to the video format information and the video selection mode, where n is the height and m is the length of the subtitle detection pane. In this embodiment, the extracted video format information includes the video data line length, the valid data line length, and the number of valid video data lines; the video selection mode is one of flat video, column-interlaced stereoscopic video, or row-interlaced stereoscopic video.
Fig. 2 is a diagram showing a relationship between a video data line length and a valid data line length of the enable signal DEN, and as shown in fig. 2, in the present embodiment, a method for extracting video data line length and valid data line length information includes: the valid data line length information can be extracted by accumulating when the enable signal DEN is active (i.e., the value of DEN is 1) and updating the valid data line length and zeroing the counter when the enable signal DEN is inactive (i.e., the value of DEN is 0). Counting two rising edge time intervals of the enable signal DEN, and updating the video data line length at the rising edge of the enable signal DEN and zeroing a counter if the frame synchronization signal VSYNC does not jump between the two rising edges; if there is a transition in the frame synchronization signal VSYNC between two rising edges, the counter is only zeroed, so that the video data line length information can be extracted. The number of video effective data lines is updated at the falling edge of the frame synchronization signal VSYNC, the counter is reset to zero, and then the rising edge of the enable signal DEN is counted, so that the information of the number of video effective data lines can be extracted.
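The counting scheme just described can be sketched in software. The sketch below is illustrative only (the function and variable names are not from the patent) and assumes an idealized per-clock sample stream of the enable signal DEN:

```python
def extract_line_format(den_samples):
    """Measure the valid-data line length (length of a DEN==1 run) and the
    total video line length (clocks between two DEN rising edges) from a
    per-clock sample stream of the enable signal DEN."""
    valid_len = 0      # length of the active (DEN == 1) portion of a line
    line_len = 0       # clocks between two consecutive DEN rising edges
    run = 0            # counter accumulated while DEN is active
    since_rise = None  # clocks since the last DEN rising edge
    prev = 0
    for den in den_samples:
        if den:
            run += 1                       # accumulate while DEN is active
            if not prev:                   # rising edge of DEN
                if since_rise is not None:
                    line_len = since_rise  # update total line length
                since_rise = 0
        else:
            if prev:
                valid_len = run            # falling edge: latch valid length
            run = 0
        if since_rise is not None:
            since_rise += 1
        prev = den
    return valid_len, line_len

# One video line = 6 active clocks followed by 4 blanking clocks.
stream = ([1] * 6 + [0] * 4) * 3
print(extract_line_format(stream))  # -> (6, 10)
```

The VSYNC-gated variant in the text additionally suppresses the line-length update when a frame boundary falls between the two rising edges; that detail is omitted here.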
In the present embodiment, if the valid data line length is less than 1040, the subtitle detection pane length is m = 16; if it is not less than 1040, m = 32. If the number of valid data lines is less than 640, the pane height is n = 16; if it is greater than or equal to 640, n = 32.
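The pane-size selection above amounts to a pair of threshold comparisons; a minimal sketch (illustrative names, not from the patent):

```python
def pane_size(valid_line_length, valid_line_count):
    """Select the subtitle-detection pane size n x m from the video format,
    using the thresholds of the embodiment (1040 columns, 640 lines)."""
    m = 16 if valid_line_length < 1040 else 32   # pane length (columns)
    n = 16 if valid_line_count < 640 else 32     # pane height (rows)
    return n, m

print(pane_size(1920, 1080))  # full HD -> (32, 32)
print(pane_size(720, 576))    # SD PAL  -> (16, 16)
```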
In the third step, the input video synchronization signal is delayed according to the acquired video format information and the video selection mode, and the delayed video synchronization signal is output. In this embodiment, the line synchronization signal HSYNC in the input video synchronization signal is delayed by p periods. Based on the video format information, the frame synchronization signal VSYNC is delayed by 1 line n or n+1 times in succession and then delayed by a further p periods: when the selection mode is normal flat video or column-interlaced stereoscopic video, the 1-line delay is repeated n times; when the selection mode is row-interlaced stereoscopic video, it is repeated n+1 times. Likewise, the enable signal DEN is delayed by 1 line n or n+1 times and then by a further p periods, with n repetitions for flat or column-interlaced video and n+1 repetitions for row-interlaced video. In this step, p is the delay period on the longest path. A specific implementation is shown in fig. 3; the line synchronization signal HSYNC can simply be delayed by p periods directly.
For the enable signal DEN, the video data line length and the valid data line length must first be extracted from the DEN signal; DEN is then delayed by n or n+1 lines followed by p periods, where the n-line (or (n+1)-line) delay is realized by cascading n (or n+1) delay-by-1-line units, i.e. the 1-line delay operation is repeated n or n+1 times. The 1-line delay of DEN works as follows: counting starts at a rising edge of DEN; when the count reaches the valid data line length, the delayed signal DEN_DELAY is set active (1) and counting continues; when the count reaches the video data line length, DEN_DELAY is set inactive (0) and counting continues. For the frame synchronization signal VSYNC, the video data line length is extracted from the DEN signal, VSYNC is delayed by n or n+1 lines and then by p periods, with the n-line (or (n+1)-line) delay again realized by cascading delay-by-1-line units. The 1-line delay of VSYNC works as follows: at a transition of VSYNC, the states before and after the transition are recorded and counting starts; when the count reaches the video data line length, the delayed signal VSYNC_DELAY makes the same transition and counting stops.
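Behaviourally, each delay-by-1-line unit is equivalent to a shift register of one line length. The counter-based hardware scheme above is not reproduced literally; this sketch only models the resulting behaviour (illustrative names):

```python
from collections import deque

def delay_one_line(samples, line_len):
    """Delay a binary signal by exactly one video line (line_len clocks).
    A simple FIFO stands in for the counter-based 1-line delay unit; the
    output at clock t is the input at clock t - line_len (0 before that)."""
    fifo = deque([0] * line_len, maxlen=line_len)
    out = []
    for s in samples:
        out.append(fifo[0])  # oldest sample leaves the delay line
        fifo.append(s)       # newest sample enters it
    return out

print(delay_one_line([1, 0, 1, 1], 2))  # -> [0, 0, 1, 0]
```

Cascading n such units, as the text describes, delays the signal by n lines.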
In the fourth step, the sum of the absolute values of the edges of the video data in the X and Y directions is calculated according to the video selection mode; in this embodiment the calculation uses a Sobel operator. In this process, the first memory cell array, composed of q1 p × K1-bit memory cells, is used to compute with the Sobel operator the edges $a_x$ and $a_y$ of the video data in the X and Y directions and the sum $b_{ij}$ of their absolute values. The formulas are:
when the selected mode is a normal flat video,
$a_x = 2(a_{i,j+1} - a_{i,j-1}) + (a_{i-1,j+1} + a_{i+1,j+1} - a_{i-1,j-1} - a_{i+1,j-1})$
$a_y = 2(a_{i+1,j} - a_{i-1,j}) + (a_{i+1,j-1} + a_{i+1,j+1} - a_{i-1,j-1} - a_{i-1,j+1})$
when the selection mode is column interlaced stereoscopic video,
$a_x = 2(a_{i,j+2} - a_{i,j-2}) + (a_{i-1,j+2} + a_{i+1,j+2} - a_{i-1,j-2} - a_{i+1,j-2})$
$a_y = 2(a_{i+1,j} - a_{i-1,j}) + (a_{i+1,j-2} + a_{i+1,j+2} - a_{i-1,j-2} - a_{i-1,j+2})$
when the selection mode is line interlaced stereoscopic video,
$a_x = 2(a_{i,j+1} - a_{i,j-1}) + (a_{i-2,j+1} + a_{i+2,j+1} - a_{i-2,j-1} - a_{i+2,j-1})$
$a_y = 2(a_{i+2,j} - a_{i-2,j}) + (a_{i+2,j-1} + a_{i+2,j+1} - a_{i-2,j-1} - a_{i-2,j+1})$
The sum of the absolute values of the edges of the video data in the X and Y directions is: $b_{ij} = (|a_x| + |a_y|)/A$,
where $a_{i,j}$ is an integer between 0 and 255 representing the gray value at row i, column j of the video frame; A is a scaling factor that limits the bit width of the final result; q1 is the number of lines of data that must be accessed to compute the Sobel edge under row interleaving; p is the maximum width of the processed video; and K1 is the bit width of the calculation result. In this embodiment A = 4, q1 = 4, the maximum video width p = 2880, and K1 = 9.
In this embodiment, subtraction in the formulas is carried out by converting the subtrahend to its two's complement and adding; multiplication and division are converted into shifts and additions; and for points at the image border, the area outside the image is padded with 0.
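The three mode-dependent Sobel variants differ only in the neighbour stride: column-interlaced stereo widens the horizontal offsets to 2, row-interlaced stereo widens the vertical offsets to 2. A software sketch of $b_{ij}$ (illustrative; 'flat'/'col'/'row' are assumed mode labels, not the patent's):

```python
def edge_sum(frame, i, j, mode, A=4):
    """b_ij = (|ax| + |ay|) // A at pixel (i, j), with zero padding outside
    the image and mode-dependent neighbour strides, per the formulas above."""
    def a(r, c):  # zero-padded pixel access
        if 0 <= r < len(frame) and 0 <= c < len(frame[0]):
            return frame[r][c]
        return 0

    dy, dx = 1, 1
    if mode == 'col':      # column-interlaced: skip a column each side
        dx = 2
    elif mode == 'row':    # row-interlaced: skip a row each side
        dy = 2
    ax = 2 * (a(i, j + dx) - a(i, j - dx)) \
        + a(i - dy, j + dx) + a(i + dy, j + dx) \
        - a(i - dy, j - dx) - a(i + dy, j - dx)
    ay = 2 * (a(i + dy, j) - a(i - dy, j)) \
        + a(i + dy, j - dx) + a(i + dy, j + dx) \
        - a(i - dy, j - dx) - a(i - dy, j + dx)
    return (abs(ax) + abs(ay)) // A

# A vertical step edge gives a strong X response and zero Y response.
print(edge_sum([[0, 0, 255, 255]] * 3, 1, 1, 'flat'))  # -> 255
```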
In the fifth step, the row averages and column averages of the sum of the absolute values of the edges in the X and Y directions are computed according to the video selection mode. The calculation uses the second memory cell array, composed of q2 p × K1-bit memory cells, where q2 is the maximum pane height; in this embodiment q2 = 31. The calculation proceeds as follows:
First, the row average of the sum of the absolute values of the edges in the X and Y directions is computed and stored in a memory cell. For the sum of absolute values $b_{ij}$ at row i, column j, the row average $c_{ij}$ is:

when the selection mode is flat video or row-interlaced stereoscopic video,

$c_{ij} = \left\lfloor \frac{1}{m} \sum_{l=0}^{m-1} b_{i,j-l} \right\rfloor$,

when the selection mode is column-interlaced stereoscopic video,

$c_{ij} = \left\lfloor \frac{2}{m} \sum_{l=0}^{m/2-1} b_{i,j-2l} \right\rfloor$.

The row average $c_{ij}$ is stored at the memory cell position corresponding to row i, column j of the video data, where $\lfloor x \rfloor$ denotes the largest integer less than or equal to x.
Then the column average is computed from the row averages of the corresponding column stored for each row. For the row average $c_{ij}$ of column j in row i, the row averages $c_{i-n+1,j} \ldots c_{i-1,j}$ of column j for the preceding n−1 rows are read out of the memory cells and the column average $d_{ij}$ is computed as:

when the selection mode is flat video or column-interlaced stereoscopic video,

$d_{ij} = \left\lfloor \frac{1}{n} \sum_{l=0}^{n-1} c_{i-l,j} \right\rfloor$,

when the selection mode is row-interlaced stereoscopic video,

$d_{ij} = \left\lfloor \frac{2}{n} \sum_{l=0}^{n/2-1} c_{i-2l,j} \right\rfloor$,

where $\lfloor x \rfloor$ denotes the largest integer less than or equal to x.
Fig. 4 is a flowchart of the line average calculation work of video data, and the output value is determined according to the value of the pane length m and the selection mode. When m =16 and the selection mode is column interlaced stereoscopic video, the row average value cijComprises the following steps:
line mean c when m =16, the selection mode is flat video or line interlaced stereoscopic videoijComprises the following steps:
when m =32 and the selection mode is column interlaced stereoscopic video, the row average value cijComprises the following steps:
line mean c when m =32, the selection mode is flat video or line interlaced stereoscopic videoijComprises the following steps:
Fig. 5 is a timing chart of the read/write operations of the second memory cell array. The write-enable signals of the 31 memory cells are represented by a 31-bit variable WREN, which is set to 1 at the falling edge of the frame synchronization signal VSYNC and then circularly shifted from low to high at the rising edge of each line synchronization signal HSYNC. The read addresses of the 31 memory cells are set to 0 at a transition edge of the line synchronization signal HSYNC, or when the address value equals the valid data line length, and are incremented at each clock thereafter. The read and write addresses of the 31 memory cells follow exactly the same sequence, with the write address lagging the read address.
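The one-hot rotation of WREN can be sketched as a circular shift over a 31-bit word (illustrative helper name):

```python
def rotate_wren(wren, width=31):
    """Circularly shift a one-hot write-enable word one position from low
    to high, wrapping the top bit back to bit 0 -- the per-HSYNC rotation
    that selects which of the 31 line memories is written next."""
    mask = (1 << width) - 1
    return ((wren << 1) | (wren >> (width - 1))) & mask

w = 1                      # value loaded at the VSYNC falling edge
for _ in range(31):        # one rotation per HSYNC rising edge
    w = rotate_wren(w)
print(w == 1)              # 31 shifts return the hot bit to cell 0 -> True
```

Each line of row averages therefore overwrites the oldest of the 31 stored lines, giving a circular buffer of the most recent 31 lines.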
Fig. 6 is a flowchart of the operation of averaging the rows and columns of the n rows and corresponding columns, and it can be seen from the diagram that when n =16 and the selection mode is flat video or row interlaced stereo video, the column average d isijComprises the following steps:
when n =16 and the selection mode is column interlaced stereoscopic video, the column average value dijComprises the following steps:
when n =32 and the selection mode is flat video or line-interleaved stereoscopic video, the column average value dijComprises the following steps:
column average d when n =32 and the selection mode is column interlaced stereoscopic videoijComprises the following steps:
In the sixth step, the column average $d_{ij}$ is compared with a threshold to determine whether the pane corresponding to the average belongs to the subtitle area, and the judgment result is output. In this embodiment, the threshold is 10/16 of the maximum value of the previous frame. This step uses the third memory cell array, composed of q2 p × 1-bit memory cells, where q2 is the maximum pane height; in this embodiment q2 = 31. It proceeds as follows:
firstly: average value d of columnijComparing with a threshold value, and recording the comparison result as eijIf d isijIf the value is larger than the threshold value, marking as a caption area, eij=1, if dijIf not, marking as non-caption area, eij=0;
Then: reading out e value of j column of first n-1 row in third memory cell arrayi-n+1,j~ei-1,jThen e is addedijStoring the data into the ith row and the jth column in the corresponding third storage unit array;
then: for n values ei-n+1,j~eijOperation to obtain e1To e is aligned with1Respectively delaying for 1-m-1 cycles to obtain e2~em,e1The calculation formula of (2) is as follows:
when the selection mode is a flat video or a column interlaced stereoscopic video,
$e^1 = \bigcup_{l=0}^{n-1} e_{i-l,j}$,
when the selection mode is line interlaced stereoscopic video,
$e^1 = \bigcup_{l=0}^{(n-1)/2} e_{i-2l,j}$,
then: for m values e2~emIs obtained by operationThe calculation formula of (2) is as follows:
when the selection mode is a flat video or a line interlaced stereoscopic video,
$\bar{e} = \bigcup_{l=0}^{m-1} e^{l+1}$,
when the selection mode is column interlaced stereoscopic video,
$\bar{e} = \bigcup_{l=0}^{(m-1)/2} e^{2l+1}$,
Finally, $\bar{e}$ is output as the judgment result.
In this embodiment, the operation of the third memory cell array is the same as the operation of the second memory cell array, and is not described herein again.
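The thresholding and OR-union of the sixth step can be sketched as follows ('col' marking column-interlaced mode, and the stride-2 horizontal sampling, are this sketch's assumptions):

```python
def threshold_flags(d_row, threshold):
    """e_ij flags for one row of column averages: 1 where the column
    average exceeds the threshold (in the embodiment, 10/16 of the
    previous frame's maximum)."""
    return [1 if d > threshold else 0 for d in d_row]

def horizontal_union(e1, j, m, mode):
    """e-bar at column j: OR of the vertically combined flags e1 across
    the pane length m, with stride 2 for column-interlaced stereo -- a
    behavioural sketch of the delay-and-OR chain in the text."""
    stride = 2 if mode == 'col' else 1
    cols = (j - l * stride for l in range(m // stride))
    return int(any(e1[c] for c in cols if c >= 0))

flags = threshold_flags([3, 12, 4, 15, 2], threshold=10)
print(flags)                                  # -> [0, 1, 0, 1, 0]
print(horizontal_union(flags, 4, 4, 'flat'))  # -> 1 (column 3 is set)
```

A pane is thus reported as subtitle whenever any sampled position within it exceeded the threshold, which makes the detection robust to thin strokes inside the pane.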
By detecting the subtitle area with a programmed programmable device, the method offers small size, low cost, high efficiency and high speed; it can detect subtitles in real time, and the result can be used for subsequent subtitle recognition, video information extraction, improvement of stereo conversion effects, and the like.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (12)

1. A method for detecting a stereoscopic video subtitle is characterized by comprising the following steps:
s1: inputting a video synchronization signal and video data;
s2: extracting video format information from the input video synchronization signal, and determining the subtitle detection pane size n × m according to the video format information and a video selection mode, wherein n is the height and m is the length of the subtitle detection pane;
s3: performing delay processing on an input video synchronous signal according to the video format information and the video selection mode, and outputting a delayed video synchronous signal;
s4: calculating the sum of absolute values of edges in the direction of the video data X, Y according to the video selection mode;
s5: obtaining a row average value and a column average value of the result of the step S4 according to a video selection mode, wherein the row average value is obtained for the sum of absolute values of edges in the video data X, Y direction and is stored in a storage unit, and the column average value is obtained for the value of the corresponding column of each row in the storage unit;
s6: and comparing the result of the step S5 with a threshold value, determining whether the pane corresponding to the column average value belongs to the subtitle area, and outputting a judgment result.
2. The method of claim 1, wherein the video selection mode is one of flat video, column interlaced stereo video, or row interlaced stereo video.
3. The method for detecting subtitles in stereoscopic video according to claim 1, wherein in step S2 the video format information extracted from the input video synchronization signal comprises the video data line length, the effective data line length and the number of effective video data lines; if the effective data line length is less than 1040, the subtitle detection pane length m is taken as 16, and if the effective data line length is greater than or equal to 1040, m is taken as 32; if the number of effective data lines is less than 640, the subtitle detection pane height n is taken as 16, and if the number of effective data lines is greater than or equal to 640, n is taken as 32.
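The pane-size rule of claim 3 reduces to two threshold tests; as an illustrative sketch (the function and parameter names below are ours, not the patent's):

```python
def pane_size(effective_line_length, effective_line_count):
    """Subtitle-detection pane size (n, m) per the thresholds of claim 3.

    m (pane length) depends on the effective data line length;
    n (pane height) depends on the number of effective data lines.
    """
    m = 16 if effective_line_length < 1040 else 32
    n = 16 if effective_line_count < 640 else 32
    return n, m
```

For example, a 1920×1080 source gets a 32×32 pane, while a 720×480 source gets a 16×16 pane.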
4. The method for detecting subtitles for stereoscopic video according to claim 1, wherein in step S3 the line synchronization signal in the input video synchronization signal is delayed by p cycles; the frame synchronization signal in the input video synchronization signal is delayed by 1 line n or n+1 times according to the video format information and then delayed by a further p cycles; the enable signal in the input video synchronization signal is likewise delayed by 1 line n or n+1 times according to the video format information and then delayed by a further p cycles, wherein p is the delay period on the longest path.
5. The method for detecting subtitles of stereoscopic video according to claim 1, wherein in step S4 the sum of the absolute values of the edges of the video data in the X and Y directions is calculated using the Sobel operator.
6. The method for detecting subtitles for stereoscopic video according to any one of claims 1, 2 and 5, wherein in step S4 a first memory cell array composed of q1 p × K1-bit memory cells is used, and the Sobel operator computes the X- and Y-direction edges a_x, a_y of the video data and the sum b_ij of their absolute values by the following formulas:
when the selection mode is flat video,
a_x = 2(a_{i,j+1} − a_{i,j−1}) + (a_{i−1,j+1} + a_{i+1,j+1} − a_{i−1,j−1} − a_{i+1,j−1})
a_y = 2(a_{i+1,j} − a_{i−1,j}) + (a_{i+1,j−1} + a_{i+1,j+1} − a_{i−1,j−1} − a_{i−1,j+1})
when the selection mode is column interlaced stereoscopic video,
a_x = 2(a_{i,j+2} − a_{i,j−2}) + (a_{i−1,j+2} + a_{i+1,j+2} − a_{i−1,j−2} − a_{i+1,j−2})
a_y = 2(a_{i+1,j} − a_{i−1,j}) + (a_{i+1,j−2} + a_{i+1,j+2} − a_{i−1,j−2} − a_{i−1,j+2})
when the selection mode is line interlaced stereoscopic video,
a_x = 2(a_{i,j+1} − a_{i,j−1}) + (a_{i−2,j+1} + a_{i+2,j+1} − a_{i−2,j−1} − a_{i+2,j−1})
a_y = 2(a_{i+2,j} − a_{i−2,j}) + (a_{i+2,j−1} + a_{i+2,j+1} − a_{i−2,j−1} − a_{i−2,j+1})
and the sum of the absolute values of the X- and Y-direction edges is
b_ij = (|a_x| + |a_y|)/A,
wherein a_{i,j} is an integer between 0 and 255 representing the gray value at row i, column j of the video frame, A is the scaling factor, q1 is the amount of data required to be accessed for computing the Sobel edge under row interleaving, p is the maximum width of the processed video, and K1 is the bit width of the calculation result.
7. The stereoscopic video subtitle detection method according to claim 1, wherein in step S5 a second memory cell array composed of q2 p × K1-bit memory cells is used, wherein q2 is the maximum pane height, and obtaining the row average value and the column average value from the result of step S4 specifically comprises the following steps:
S51: obtaining a row average value from the result of step S4 and storing it in a storage unit;
S52: calculating a column average value over the values of the corresponding column of each row in the storage unit.
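The exact averaging formulas of claims 8 and 9 did not survive extraction, so the flat-video case of steps S51–S52 can only be sketched under an assumption: the row average is a running mean of b over the last m columns, and the column average a running mean of those row averages over the last n rows (Python; integer division stands in for hardware truncation, and all names are ours):

```python
def pane_averages(b, n, m):
    """Row average c over the last m columns of b (S51), then column
    average d over the last n rows of c (S52) -- flat-video sketch.

    Positions without a full m-column / n-row history are left at 0.
    """
    rows, cols = len(b), len(b[0])
    c = [[sum(b[i][j - l] for l in range(m)) // m if j >= m - 1 else 0
          for j in range(cols)] for i in range(rows)]
    d = [[sum(c[i - k][j] for k in range(n)) // n if i >= n - 1 else 0
          for j in range(cols)] for i in range(rows)]
    return d
```

This mirrors the hardware flow: each row average is written to a memory cell, and the column average reads back the n − 1 previously stored rows.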
8. The method for detecting subtitles for stereoscopic video according to claim 7, wherein in step S51 the row average value c_ij of the sums of absolute values b_ij for row i, column j is computed as follows:
when the selection mode is flat video or line interlaced stereoscopic video,
when the selection mode is column interlaced stereoscopic video,
and the row average value c_ij is stored in the memory cell corresponding to the video data of row i, column j,
wherein the symbol ⌊x⌋ represents the largest integer less than or equal to x.
9. The method for detecting subtitles of stereoscopic video according to claim 7, wherein step S52 specifically comprises: for the row average value c_ij of row i, column j, reading out from the memory cells the row average values c_{i−n+1,j} ~ c_{i−1,j} of the jth column for the previous n−1 rows, and calculating the column average value d_ij by the following formulas:
when the selection mode is flat video or line interlaced stereoscopic video,
when the selection mode is column interlaced stereoscopic video,
wherein the symbol ⌊x⌋ represents the largest integer less than or equal to x.
10. The method for detecting subtitles in stereoscopic video according to claim 1 or 2, wherein in step S6 a third memory cell array composed of q2 p × 1-bit memory cells is used, and step S6 specifically comprises the following steps:
S61: comparing the result d_ij of step S5 with a threshold value and recording the comparison result as e_ij: if d_ij is larger than the threshold value, the pane is marked as a subtitle area and e_ij = 1; if d_ij is not larger than the threshold value, the pane is marked as a non-subtitle area and e_ij = 0;
S62: reading out the values e_{i−n+1,j} ~ e_{i−1,j} of the jth column for the previous n−1 rows from the third memory cell array, and then storing e_ij into row i, column j of the third memory cell array;
S63: performing a union operation on the n values e_{i−n+1,j} ~ e_ij to obtain e^1, and delaying e^1 by 1 ~ m−1 cycles respectively to obtain e^2 ~ e^m, wherein e^1 is calculated as:
when the selection mode is flat video or column interlaced stereoscopic video,
e^1 = ∪_{l=0}^{n−1} e_{i−l,j},
when the selection mode is line interlaced stereoscopic video,
e^1 = ∪_{l=0}^{(n−1)/2} e_{i−2l,j},
S64: performing a union operation on the m values e^1 ~ e^m to obtain ē, wherein ē is calculated as:
when the selection mode is flat video or line interlaced stereoscopic video,
ē = ∪_{l=0}^{m−1} e^{1+l},
when the selection mode is column interlaced stereoscopic video,
ē = ∪_{l=0}^{(m−1)/2} e^{1+2l},
S65: outputting ē.
11. The method of detecting subtitles for stereoscopic video according to claim 10, wherein the threshold is 10/16 of the maximum value of the previous frame.
12. A stereoscopic video subtitle detection system, comprising: a delay module, an edge calculating module, an averaging module, a subtitle area determining module and a memory cell array;
the memory cell array comprises a first memory cell array, a second memory cell array and a third memory cell array; the first memory cell array is connected with the edge calculating module, the second memory cell array is connected with the averaging module, and the third memory cell array is connected with the subtitle area determining module;
the delay module receives the input video synchronization signal, delays it, and outputs the delayed video synchronization signal;
the edge calculating module receives the input video synchronization signal and the video data, calculates the sum of the absolute values of the edges of the video data in the X and Y directions using the first memory cell array, and transmits the result to the averaging module;
the averaging module computes, using the second memory cell array, the row average value and the column average value of the sum of the absolute values of the X- and Y-direction edges and transmits the results to the subtitle area determining module, wherein the row average value is computed over the sum of the absolute values of the edges and stored in a memory cell, and the column average value is computed over the values of the corresponding column of each row in the memory cells;
and the subtitle area determining module determines, using the third memory cell array, whether the pane corresponding to the averaged result belongs to the subtitle area, and outputs the judgment result.
CN201210208898.2A 2012-06-19 2012-06-19 Detecting method for three-dimensional video subtitles and system using same Active CN102724384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210208898.2A CN102724384B (en) 2012-06-19 2012-06-19 Detecting method for three-dimensional video subtitles and system using same


Publications (2)

Publication Number Publication Date
CN102724384A CN102724384A (en) 2012-10-10
CN102724384B true CN102724384B (en) 2015-01-14

Family

ID=46950039

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210208898.2A Active CN102724384B (en) 2012-06-19 2012-06-19 Detecting method for three-dimensional video subtitles and system using same

Country Status (1)

Country Link
CN (1) CN102724384B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102186023A (en) * 2011-04-27 2011-09-14 四川长虹电器股份有限公司 Binocular three-dimensional subtitle processing method
CN102202224A (en) * 2011-06-22 2011-09-28 清华大学 Caption flutter-free method and apparatus used for plane video stereo transition
CN102469318A (en) * 2010-11-04 2012-05-23 深圳Tcl新技术有限公司 Method for converting two-dimensional image into three-dimensional image

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2791870C (en) * 2010-03-05 2017-01-17 General Instrument Corporation Method and apparatus for converting two-dimensional video content for insertion into three-dimensional video content


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yue Gao, Jinhui Tang, Haojie Li, Qionghai Dai, Naiyao Zhang. "View-based 3D model retrieval with probabilistic graph model." Neurocomputing, vol. 73, pp. 1900–1905, 30 June 2010. *

Also Published As

Publication number Publication date
CN102724384A (en) 2012-10-10

Similar Documents

Publication Publication Date Title
KR101659346B1 (en) Method and device for processing video image
CN103051908B (en) Disparity map-based hole filling device
JP2006165602A5 (en)
US8406462B2 (en) Signature derivation for images
CN101980545B (en) Method for automatically detecting 3DTV video program format
TWI488470B (en) Dimensional image processing device and stereo image processing method
CN103295212B (en) Image calibrating method and image calibration system
CN103179426A (en) Method for detecting image formats automatically and playing method by using same
CN103647918A (en) Video synchronization method and device
CN105100692B (en) Video broadcasting method and its device
JP2010257100A5 (en)
CN102547322A (en) Map converting method and map converting apparatus
CN101674448B (en) Video signal processing device, video signal processing method
Zha et al. A real-time global stereo-matching on FPGA
CN103295182A (en) Circuit system and method for performing contrast stretch processing on infrared images
CN102292727B (en) Video descriptor generator
EP2533523B1 (en) Motion-vector correction device and method and video-signal processing device and method
CN104185012B (en) 3 D video form automatic testing method and device
JP4659793B2 (en) Image processing apparatus and image processing method
CN102724384B (en) Detecting method for three-dimensional video subtitles and system using same
CN101420615A (en) Method and device for detecting video field sequence and video processing system
US9679205B2 (en) Method and system for displaying stereo image by cascade structure and analyzing target in image
CN102509311A (en) Motion detection method and device
CN103533327B (en) DIBR (depth image based rendering) system realized on basis of hardware
US9674498B1 (en) Detecting suitability for converting monoscopic visual content to stereoscopic 3D

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant