CN110533030A - Sun film image timestamp information extracting method based on deep learning - Google Patents
- Publication number
- CN110533030A (application CN201910765276.1A)
- Authority
- CN
- China
- Prior art keywords
- picture
- character
- image
- sun
- timestamp
- Prior art date
- 2019-08-19
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Multimedia (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
- Character Discrimination (AREA)
Abstract
A sun film image timestamp information extracting method based on deep learning, comprising: Step 1: locating and cropping the timestamp information region in a solar chromosphere film image, the timestamp information recording the year, month, day, hour and minute of the shooting time; Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain single characters; Step 3: character recognition, in which a network is first trained with a large number of samples, the single characters obtained by the segmentation of Step 2 are then recognized with the trained network, and the recognition results are assembled and saved. The method performs automatic machine recognition of the digital timestamps in solar observation film images and outputs the recognized time information, reducing the workload of manually identifying and writing down the time information, thereby accelerating the digitization of this batch of film data and making these precious historical records more readily usable for solar physics research.
Description
Technical field
The present invention relates to the technical field of solar observation image processing, and in particular to a sun film image timestamp information extracting method based on deep learning.
Background technique
The solar chromosphere is the layer of atmosphere above the photosphere. As the transition region between the photosphere and the corona, its magnetic field is relatively unstable and frequently produces violent flare bursts. Flare radiation in the chromosphere usually appears as elongated ribbons on both sides of the magnetic polarity inversion line (PIL), which is regarded as evidence of the typical configuration of magnetic reconnection. Studying flare bursts therefore requires long-term photographic records of the solar chromosphere. Because the historical archive is enormous, the time information of a very large batch of chromosphere images is still present only in image form and has not been converted into digital information that a computer can read directly, which greatly hinders scientific research based on these data.
Digitizing the shooting times of the images, on the one hand, deepens the mining of useful information from the historical archive; on the other hand, it greatly reduces the retrieval workload of researchers, helping them obtain more valuable results from these data and advancing the progress of the research.
Historical solar observation images are mostly preserved on film, with the shooting time imprinted on each frame. To let researchers use these image archives effectively, the timestamp information on the film must be extracted. The number of images is enormous, and manual identification and extraction is time-consuming and laborious. Automatically recognizing the timestamp information in the images with a computer is therefore the key to using these data efficiently.
Summary of the invention
To solve the above technical problems, the present invention provides a sun film image timestamp information extracting method based on deep learning. The method performs automatic machine recognition of the digital timestamps in solar observation film images and outputs the recognized time information, reducing the workload of manual identification and entry of time information, thereby accelerating the digitization of this batch of film data and making these precious historical records more readily usable for solar physics research.
The technical scheme adopted by the invention is as follows:
A sun film image timestamp information extracting method based on deep learning, comprising the following steps:
Step 1: locate and crop the timestamp information region in the solar chromosphere film image;
the timestamp information in the solar chromosphere film image records the year, month, day, hour and minute of the shooting time;
Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain single characters;
Step 3: character recognition, in which a network is first trained with a large number of samples, the single characters obtained by the segmentation of Step 2 are then recognized with the trained network, and the recognition results are assembled and saved.
The step 1 comprises the following steps:
Step 1.1, solar sphere removal based on vertical projection:
Accumulate the image along the vertical direction to obtain a 1 × n vector. Assuming the picture has size m × n and the pixel value at row i, column j is f_{ij}, the projection in the vertical direction is:

$$S_{1j}=\sum_{i=1}^{m}f_{ij},\qquad j=1,\dots,n$$

where S_{1j} is the sum of the pixels of the j-th column, and the resulting vector has size 1 × n. The position of the solar sphere can be judged from the vertical projection of the picture: the interval [400, 1800] of the vector is the projection of the solar chromosphere part. Because the Sun is symmetric, only the position of the maximum of the vector S_{1j} is needed to locate the solar centre; the picture part containing the solar sphere is then removed according to the pixel length occupied by the sphere in the vertical direction.
Step 1.2, timestamp side judgment and rotation correction based on variance:
The variance of the picture containing the timestamp is far larger than the variance of the picture without it, which identifies the picture where the timestamp lies. Once that picture is known, it must be rotated upright: if it came from the left side of the image it is rotated 90° clockwise, otherwise 90° counterclockwise.
Step 1.3, fine segmentation of the timestamp character region based on the projection method:
For a picture of size m × n whose pixel value at row i, column j is x_{ij}, the horizontal and vertical projections are respectively:

$$S_{i1}=\sum_{j=1}^{n}x_{ij},\qquad S_{1j}=\sum_{i=1}^{m}x_{ij}$$

where S_{1j} is the sum of the pixels of the j-th column (the vector has size 1 × n) and S_{i1} is the sum of the pixels of the i-th row (the vector has size m × 1). Computing the horizontal and vertical projections of the picture pins down the exact position of the timestamp region, realizing the accurate segmentation of the picture.
Step 1.4, cutting the timestamp region:
The picture is cut according to its horizontal and vertical projection results. To keep the continuity of the picture intact, the first point whose projection exceeds the mean is taken as the start and the last such point as the end, and all of the image between start and end is retained. Assuming the original picture S has size m × n and the cut picture P has size m′ × n′, the cutting formula is:

P = S(a:b, c:d), (a, c > 1, b < m, d < n)

where:

$$a=\min\{\,i:S_{i1}>\bar{x}\,\},\quad b=\max\{\,i:S_{i1}>\bar{x}\,\},\quad c=\min\{\,j:S_{1j}>\bar{y}\,\},\quad d=\max\{\,j:S_{1j}>\bar{y}\,\}$$

Here S(a:b, c:d) denotes rows a to b and columns c to d of picture S, \bar{x} is the mean of the horizontal projection values, and \bar{y} is the mean of the vertical projection values.
In the step 2, the single-character segmentation proceeds as follows:
First the background of the picture is removed with a top-hat operation, then noise is removed with a local binarization algorithm, and finally the character regions are extracted with a connected-domain algorithm. The algorithm assumes white characters by default; if the characters are black, the connected-domain extraction will find no valid region, so when no valid region exists the algorithm returns to the local binarization part and inverts the colors of the binarized picture.
Background removal is carried out with the morphological top-hat operation, whose principle is to subtract the result of opening the original image from the original image itself. After the top-hat operation the picture loses part of its background noise and the characters in the image stand out.
Noise removal is carried out with the Sauvola local binarization algorithm. The character cutting is then completed simply by extracting the connected domains that match the character size, and the connected-domain extraction method is fast, saving a great deal of running time.
Character regions are extracted with the connected-domain algorithm: the picture is first locally binarized, then over- and under-sized connected domains are removed to eliminate part of the interference. A local binarization window of 16 × 16 works well, and the connected-domain area of a character lies between [500, 5000]. Some invalid regions still survive this processing, so the invention further deletes them by judging whether the length, width and aspect ratio of each connected domain match those of a standard character: according to the statistics, the height of a character lies in [90, 110], the width in [10, 60], and the aspect ratio is not less than 1.
Mapping the position of each connected domain in the binary map to the corresponding position in the original image cuts out each single character region. To keep the picture dimensions consistent, the invention pads every picture and converts it to a 28 × 28 standard picture.
In the step 3, the character recognition proceeds as follows:
Characters are recognized with the convolutional neural network algorithm of deep learning. The character recognition convolutional neural network built by the invention contains two convolutional layers, two pooling layers and one fully connected layer. The first convolutional layer convolves the input 28 × 28 character picture with 6 different 5 × 5 convolution kernels; after this first convolution the character picture becomes a 24 × 24 × 6 feature map. The first pooling layer applies a pooling function with a 2 × 2 sliding window to the result of the first convolutional layer, and after this pooling the feature map becomes 12 × 12 × 6. The second convolutional layer re-extracts features from the pooled feature map with 12 different 5 × 5 convolution kernels, and the extracted feature map has size 8 × 8 × 12. The second pooling layer pools the feature map after the second convolution, after which the feature map size becomes 4 × 4 × 12. The feature map after the second pooling operation is fed into the fully connected layer to obtain the feature vector of the character. Finally, classifying the feature vector and matching it with the real digit completes the recognition of the time characters in the timestamp.
The trained convolutional neural network recognizes the single characters of the time information in the solar chromosphere film images; the recognized characters are combined in order, matched with the filename of the original image, and filled into an Excel table for later manual verification and database construction.
The method further includes step 4, manual verification of the date:
For the chromosphere images of a given period, only the date information of the first image needs to be entered, and the shooting date of every subsequent picture can then be computed automatically. Occasionally there are dates on which no solar observation was made; in that case the wrongly auto-generated date information is corrected manually.
The sun film image timestamp information extracting method based on deep learning of the present invention has the following technical effects:
1) The invention proposes a deep-learning-based timestamp information extracting method, used to systematically recognize and organize the time information of the more than 7 million digitized solar chromosphere film images scanned at the US National Solar Observatory between 1956 and 2003. First, the timestamp information region in the image is located and segmented. Second, top-hat operation, local binarization, connected-domain screening and similar methods eliminate the interference of noise, and the timestamp information region image is segmented into characters. Then 10,000 labelled character pictures are chosen to train a convolutional neural network, and the recognition performance of the resulting network is tested. Finally, the trained network batch-recognizes the timestamp information of 10,000 chromosphere images, after which the recognition results are analysed quantitatively. The results show that the method locates and recognizes the timestamp information in scanned solar film images automatically, accurately and quickly.
2) Using a convolutional-neural-network method from deep learning, the problem of recognizing the time information in nearly 50 years of solar chromosphere film pictures shot by the US National Observatory is studied. The results show that the method is highly applicable to the characters in these pictures: the recognition accuracy reaches 98% or more and the average processing time per picture is no more than 0.1 s, satisfying the invention's practical requirements on recognition speed and quality. The method is also highly portable and is a valuable reference for solving similar problems later.
Detailed description of the invention
The present invention is further explained below with reference to the attached drawings and examples:
Fig. 1 is the structure diagram of a general convolutional neural network.
Fig. 2 is the timestamp character recognition flowchart.
Fig. 3(a) is a solar chromosphere film image with timestamp information (example one);
Fig. 3(b) is a solar chromosphere film image with timestamp information (example two);
Fig. 3(c) is a solar chromosphere film image with timestamp information (example three).
Fig. 4 is a schematic diagram of the timestamp information region in an image.
Fig. 5 shows the vector S_{1j} plotted on coordinate axes.
Fig. 6 shows the pictures obtained after removing the solar sphere.
Fig. 7 is the picture containing the timestamp.
Fig. 8(a) is projection vector figure one;
Fig. 8(b) is projection vector figure two.
Fig. 9 is the timestamp region cutting result.
Figure 10 is the single-character extraction algorithm flowchart.
Figure 11(a) is the grey-level distribution of the characters (before the top-hat operation);
Figure 11(b) is the grey-level distribution of the characters (after the top-hat operation).
Figure 12 is the binarized picture.
Figure 13 is the binary map after noise removal.
Figure 14 is the binary map after eliminating invalid regions.
Figure 15 is the character segmentation result.
Figure 16 is the structure diagram of the character recognition convolutional neural network.
Figure 17 is the date verification graphical interface.
Specific embodiment
Embodiments of the present invention are illustrated below through a specific example:
The framework of a general convolutional neural network includes an input layer, convolutional layers, pooling layers, fully connected layers and an output layer, with the structure shown in Fig. 1. The input layer receives the data, and the feature vector produced at the output layer is classified by softmax logistic regression. When the input is character image data, the classification result of the output layer classifies the character pictures, realizing character recognition. A convolutional neural network may contain as many convolutional, pooling and fully connected layers as needed; Fig. 1 merely shows the general form.
The timestamp information in scanned solar chromosphere film images is extracted with a convolutional neural network (CNN) in three main parts, as shown in Fig. 2:
Step 1, locate and crop the timestamp information region of the picture;
Step 2, further segment the characters of the timestamp information region to obtain single characters;
Step 3, first train the network with a large number of samples, then recognize the segmented characters with the trained network, and assemble and save the recognition results.
Step 1: locate and crop the timestamp information region of the picture.
Original solar chromosphere film pictures are shown in Fig. 3(a), Fig. 3(b) and Fig. 3(c). Each picture has a resolution of 1600 × 2048; the timestamp information is normally placed on the left or right side of the picture, but its position is not fixed. The character format of the time information falls broadly into two classes, as in Fig. 3(a) and Fig. 3(b), and the character styles of the two classes differ, so the characters must be recognized by class. The clarity of the pictures also varies: most pictures are rather dim and the characters hard to recognize, as shown in Fig. 3(c), so every picture must be preprocessed. Of the whole picture, only the timestamp characters are needed by the invention, e.g. the part inside the red frame of Fig. 4. The timestamp information records the year, month, day, hour, minute and second of the shooting, e.g. the part inside the yellow frame of Fig. 4. Given the observation time precision of these data, only the year, month, day, hour and minute information is needed. Because the position of the time information is not fixed, the invention must first locate and cut the timestamp region. The timestamp sits on the left or right side and never on the solar sphere, so the part containing the Sun is cut off first and the position of the timestamp is searched for afterwards.
Step 1.1: first comes the solar sphere removal step based on vertical projection.
The vertical projection method inspects the distribution of the pixel information of an image along the vertical direction. It accumulates the image along the vertical direction into a 1 × n vector. Assuming the picture has size m × n and the pixel value at row i, column j is f_{ij}, the projection in the vertical direction is:

$$S_{1j}=\sum_{i=1}^{m}f_{ij},\qquad j=1,\dots,n$$

where S_{1j} is the sum of the pixels of the j-th column and the vector has size 1 × n. The position of the solar sphere can be judged from the vertical projection of the picture. The vector S_{1j} obtained after the vertical projection of Fig. 3(a) is shown in Fig. 5. As Fig. 5 shows, the interval [400, 1800] of the vector is the projection of the solar chromosphere part. Because the Sun is symmetric, the invention only needs the position of the maximum of the vector S_{1j} to locate the solar centre; the picture part containing the solar sphere is then removed according to the pixel length occupied by the sphere in the vertical direction. Fig. 6 shows the two small pictures obtained after removing the solar sphere.
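As an illustration only, a minimal NumPy sketch of this step follows; the helper names and the `disk_radius_px` input are assumptions for the example (the patent derives the occupied length from the projection itself):

```python
import numpy as np

def vertical_projection(img):
    # S_1j: sum of the pixels of each column of an (m, n) grayscale array
    return img.sum(axis=0)                      # shape (n,)

def split_off_margins(img, disk_radius_px):
    # by symmetry, the column with the maximum projection is the solar centre;
    # cut the disk away and keep the two margin strips (cf. Fig. 6)
    proj = vertical_projection(img)
    centre = int(np.argmax(proj))
    left = img[:, :max(centre - disk_radius_px, 0)]
    right = img[:, centre + disk_radius_px:]
    return left, right
```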
Step 1.2: timestamp side judgment and rotation correction based on variance.
A picture is in essence a matrix of pixels, and the values of the pixels are reflected in the colors of the picture; for example, in a binary image "0" means black and "1" means white. The pictures in Fig. 6 are not binary but 256-level grayscale images: the value of each pixel expresses the brightness of that point on a 256-level scale, and the higher the level, the brighter the pixel. As Fig. 6 shows, in the picture containing the timestamp the timestamp is rendered by high-brightness points, while in the picture without the timestamp most pixels are dark. This step therefore uses a variance-based timestamp position judgment: which picture carries the timestamp is decided from the variance of the pixel values of the image matrix. For a picture of size m × n the variance is:

$$\sigma^{2}=\frac{1}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}\left(x_{ij}-\bar{x}\right)^{2}$$

where x_{ij} is the pixel value at [i, j], \bar{x} is the mean of all pixel values, and m·n is the total number of pixels.
Clearly, the variance of the picture containing the timestamp is far larger than that of the picture without it, which identifies the picture where the timestamp lies. Once that picture is known, it must be rotated upright: if it came from the left side of the image it must be rotated 90° clockwise, otherwise 90° counterclockwise. The image rotation formula is:

$$x'=x\cos\beta-y\sin\beta,\qquad y'=x\sin\beta+y\cos\beta$$

where (x, y) is the original pixel position, (x′, y′) is the pixel position after the rotation transform, and β is the counterclockwise rotation angle. Taking Fig. 6 as an example, the picture selected by the variance judgment and rotated is shown in Fig. 7.
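A sketch of the variance test and the 90° correction, assuming the two margin strips produced in Step 1.1 (`np.rot90` with k = -1 rotates clockwise):

```python
import numpy as np

def pick_timestamp_strip(left, right):
    # the strip holding the bright timestamp has the far larger variance
    came_from_left = left.var() > right.var()
    return (left if came_from_left else right), came_from_left

def upright(strip, came_from_left):
    # rotate 90 degrees clockwise if the strip came from the left margin,
    # counterclockwise otherwise
    return np.rot90(strip, k=-1 if came_from_left else 1)
```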
As Fig. 7 shows, the picture containing the timestamp still holds a large amount of useless information, which affects the computation speed to a certain extent; given the huge number of pictures, these useless regions would also cause unnecessary storage consumption.
Step 1.3: the invention realizes the fine segmentation of the timestamp region with the horizontal-vertical projection method.
The horizontal-vertical projection method inspects the distribution of the pixels of an image along the horizontal and vertical directions respectively, and is often used to project a target region precisely for a later segmentation operation. It accumulates the image separately along the horizontal and vertical components, giving two vectors. For a picture of size m × n whose pixel value at row i, column j is x_{ij}, the horizontal and vertical projections are respectively:

$$S_{i1}=\sum_{j=1}^{n}x_{ij},\qquad S_{1j}=\sum_{i=1}^{m}x_{ij}$$

where S_{1j} is the sum of the pixels of the j-th column (the vector has size 1 × n) and S_{i1} is the sum of the pixels of the i-th row (the vector has size m × 1). Computing both projections pins down the exact position of the timestamp region and allows the accurate segmentation of the picture. Taking Fig. 7 as an example, the horizontal and vertical projection vectors are shown in Fig. 8(a) and Fig. 8(b).
Step 1.4: cutting the timestamp region.
The picture is cut according to its horizontal and vertical projection results. Fig. 8(a) and Fig. 8(b) show that the projection values corresponding to the timestamp region are higher in both directions. Exploiting this feature, keeping only the part whose projection exceeds the average completes the segmentation of the timestamp region. To keep the continuity of the picture intact, the first point whose projection exceeds the mean is taken as the start and the last such point as the end, and all of the image between start and end is retained. Assuming the original picture S has size m × n and the cut picture P has size m′ × n′, the cutting formula is:

P = S(a:b, c:d), (a, c > 1, b < m, d < n) (5)

where:

$$a=\min\{\,i:S_{i1}>\bar{x}\,\},\quad b=\max\{\,i:S_{i1}>\bar{x}\,\},\quad c=\min\{\,j:S_{1j}>\bar{y}\,\},\quad d=\max\{\,j:S_{1j}>\bar{y}\,\}$$

Here S(a:b, c:d) denotes rows a to b and columns c to d of picture S, \bar{x} is the mean of the horizontal projection values, and \bar{y} is the mean of the vertical projection values. Taking Fig. 7 as an example, the result after cutting is shown in Fig. 9.
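A compact sketch of this cutting rule, keeping rows and columns from the first to the last point whose projection exceeds its mean:

```python
import numpy as np

def crop_to_projection(img):
    rows = img.sum(axis=1)               # horizontal projection S_i1
    cols = img.sum(axis=0)               # vertical projection S_1j
    r = np.where(rows > rows.mean())[0]
    c = np.where(cols > cols.mean())[0]
    # a = r[0], b = r[-1], c = c[0], d = c[-1] in formula (5)
    return img[r[0]:r[-1] + 1, c[0]:c[-1] + 1]
```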
Step 2: single-character segmentation.
As Fig. 9 shows, the smaller date characters are blurred and unclear; they are easily treated as noise and deleted in the later processing stages, which would strongly affect the result. The recognition process therefore only handles the hour-minute information, while the dates are filled in manually, which involves little work.
In practice, cutting out single characters is the hardest part. Fig. 9 is the result cut from a picture of fairly good quality; in reality most pictures resemble Fig. 3(c), where the character part is blurry and some characters cannot even be identified by the human eye. Obtaining the single characters is therefore harder than obtaining the character region, and it also influences the final recognition result the most. In addition, as Fig. 3(a), Fig. 3(b) and Fig. 3(c) show, the character formats of the timestamp information regions differ, so the generality of the algorithm must be considered during processing. The single-character segmentation process is shown in Fig. 10: first the background of the picture is removed with a top-hat operation, then noise is removed with a local binarization algorithm, and finally the character regions are extracted with a connected-domain algorithm. Since the character types fall into two classes, shown in Fig. 3(a) and Fig. 3(b) with white and black characters respectively, the algorithm assumes white characters by default; if the characters are black, the connected-domain extraction finds no valid region, in which case the algorithm returns to the local binarization part and inverts the colors of the binarized picture.
Step 2.1: background removal is based on the top-hat operation, whose principle is to subtract the result of opening the original image from the original image. The algorithm can be written as:

topimg = tophat(img, element) = img − open(img, element) (6)

where topimg is the picture after the top-hat operation, img is the original image, and element is the kernel used for the top-hat and opening operations.
Processing Fig. 9 with a 29 × 29 kernel, Fig. 11(a) and Fig. 11(b) compare the grey-level distributions before and after the top-hat operation. As Fig. 11(b) shows, the operation eliminates part of the background noise of the picture and makes the characters in the image stand out. Although the noise cannot be removed completely, the workload of the later stages is reduced and excessive noise is kept from disturbing the local binarization result.
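With OpenCV the top-hat step of formula (6) is a single call; the rectangular 29 × 29 kernel shape below is an assumption, the text only fixing the 29 × 29 size:

```python
import cv2

def tophat_background_removal(img, ksize=29):
    # formula (6): topimg = img - open(img, element)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (ksize, ksize))
    return cv2.morphologyEx(img, cv2.MORPH_TOPHAT, kernel)
```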
Step 2.2: noise removal uses the Sauvola local binarization algorithm.
The Sauvola algorithm proceeds as follows:
Step 1: compute the mean MEAN and standard deviation STD of the pixels in the n × n neighbourhood of pixel f(x, y);
Step 2: compute the threshold T(x, y) of pixel f(x, y) according to:

$$T(x,y)=\mathrm{MEAN}\cdot\left(1+k\left(\frac{\mathrm{STD}}{N}-1\right)\right)$$

where k is a user-defined parameter with 0 < k < 1 and N is the dynamic range of the standard deviation. With N = 35 and k = 0.08, the processed picture is shown in Fig. 12.
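A sketch using scikit-image's Sauvola implementation; `threshold_sauvola` requires an odd window, so 17 stands in for the 16 × 16 neighbourhood used elsewhere in the text, and mapping the parameters as (k, r) is an assumption about the exact settings:

```python
from skimage.filters import threshold_sauvola

def sauvola_binarize(img, window=17, k=0.08, r=35):
    t = threshold_sauvola(img, window_size=window, k=k, r=r)
    return (img > t).astype("uint8") * 255   # white characters on black
```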
The binarized picture shows that the characters can be separated from the background, but the background still contains much over- and under-sized noise. Extracting only the connected domains that match the character size completes the character cutting, and the connected-domain extraction method is fast, saving a great deal of running time.
Step 2.3: character region extraction is based on the connected-domain algorithm.
To remove as much noise as possible while retaining the valid information, a class-1 picture is first locally binarized, after which over- and under-sized connected domains are removed to eliminate part of the interference; a class-2 picture is binarized with a fixed threshold of 0.25 and inverted to obtain white characters, and the subsequent processing is identical to that of class 1. Experiments show that a local binarization window of 16 × 16 works well and that the connected-domain area of a character lies between [500, 5000]. The processing result is shown in Fig. 13.
As Fig. 13 shows, some invalid regions still survive the above processing, so the invention further deletes them by judging whether the length, width and aspect ratio of each connected domain match those of a standard character. According to the statistics, the height of a character lies in [90, 110], the width in [10, 60], and the aspect ratio is not less than 1. The result obtained is shown in Fig. 14.
As Fig. 14 shows, the invalid regions of the picture are completely removed. Mapping the position of each connected domain in the binary map to the corresponding position in the original image then cuts out each single character region. To keep the picture dimensions consistent, the invention pads every picture and converts it to a 28 × 28 standard picture; the final single-character cutting result is shown in Fig. 15.
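A sketch of the connected-domain screening and 28 × 28 normalization with OpenCV, using the statistical limits quoted above (the square padding before resizing is an assumption about how the filling is done):

```python
import cv2
import numpy as np

def cut_characters(binary, original):
    # binary: uint8 image, white (255) characters on black background
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    chars = []
    for x, y, w, h, area in stats[1:]:          # label 0 is the background
        if (500 <= area <= 5000 and 90 <= h <= 110
                and 10 <= w <= 60 and h >= w):  # aspect ratio >= 1
            chars.append((x, original[y:y + h, x:x + w]))
    out = []
    for x, crop in sorted(chars):               # left-to-right order
        s = max(crop.shape)                     # pad to a square, then scale
        pv, ph = s - crop.shape[0], s - crop.shape[1]
        square = np.pad(crop, ((pv // 2, pv - pv // 2),
                               (ph // 2, ph - ph // 2)))
        out.append(cv2.resize(square, (28, 28)))
    return out
```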
Step 3: character recognition.
Characters are recognized with the convolutional neural network algorithm of deep learning. The character recognition convolutional neural network built by the invention contains two convolutional layers, two pooling layers and one fully connected layer, as shown in Fig. 16. The first convolutional layer convolves the input 28 × 28 character picture with 6 different 5 × 5 convolution kernels; after this first convolution the character picture becomes a 24 × 24 × 6 feature map. The first pooling layer applies a pooling function with a 2 × 2 sliding window to the result of the first convolutional layer, and after this pooling the feature map becomes 12 × 12 × 6. The second convolutional layer re-extracts features from the pooled feature map with 12 different 5 × 5 convolution kernels, and the extracted feature map has size 8 × 8 × 12. The second pooling layer pools the feature map after the second convolution, after which the feature map size becomes 4 × 4 × 12. The feature map after the second pooling operation is fed into the fully connected layer to obtain the feature vector of the character. Finally, classifying the feature vector and matching it with the real digit completes the recognition of the time characters in the timestamp.
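The described network maps directly onto a few lines of PyTorch. This is a sketch of the architecture as specified; the 10 output classes for the digits 0-9 are an assumption, and the feature-map sizes in the comments match the text:

```python
import torch.nn as nn

class TimestampCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 28x28x1 -> 24x24x6
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 12x12x6
            nn.Conv2d(6, 12, kernel_size=5),  # -> 8x8x12
            nn.ReLU(),
            nn.MaxPool2d(2),                  # -> 4x4x12
        )
        # the fully connected layer maps the flattened 4x4x12 feature map
        # to the class scores (the "feature vector and classification" step)
        self.classifier = nn.Linear(4 * 4 * 12, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```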
In the present invention, the training of the time character recognition convolutional neural network is divided into the following 3 steps:
Step 3.1: obtain single character pictures from the chromosphere images and add labels, forming the data samples needed to train the network.
Step 3.2: assemble the sample data into a 28 × 28 × N matrix as the X vector of the input layer, where N is the number of character samples; the digit label corresponding to each one-dimensional matrix in the X vector forms the Y vector of the input layer.
Step 3.3: train the network by forward propagation and backpropagation (BP), iteratively updating its coefficients, using the ReLU activation function and the max-pooling function, until a network structure with high recognition accuracy is obtained. The input vectors X and Y are fed through 100 iterations.
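A minimal training sketch matching Step 3.3; the optimizer choice is an assumption, the patent specifying only backpropagation, ReLU, max pooling and 100 iterations:

```python
import torch
import torch.nn as nn

def train(model, x, y, iterations=100, lr=1e-3):
    # x: (N, 1, 28, 28) float tensor of character pictures
    # y: (N,) long tensor of digit labels
    criterion = nn.CrossEntropyLoss()   # softmax classification at the output
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iterations):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()                 # backpropagation updates the coefficients
        optimizer.step()
```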
The trained convolutional neural network recognizes the single characters of the time information in the solar chromosphere film images; the recognized characters are combined in order, matched with the filename of the original image, and automatically filled into an Excel table for later manual verification and database construction.
Step 4: manual date verification; whether the automatically generated picture date information is wrong is checked manually.
After the automatic recognition of time information such as "hour, minute" in the timestamp, the essential remaining step is the manual verification of the date information (year, month, day), see Fig. 17. Since the shooting times are mostly continuous and use the 24-hour clock, it is easy to judge whether the shooting date has changed: for example, if the first recognition result is "2359" and the second is "0000", the second shooting date is one day after the first. So for the chromosphere images of a given period, only the start date of the shooting needs to be known to compute the shooting date of every subsequent picture. With the interface of Fig. 17, the data are verified day by day; if a date is wrong, only the shooting date of the first picture of that day needs to be corrected, and the dates of the later pictures are updated automatically by a recursive algorithm.
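The date recursion reduces to comparing consecutive clock readings; a sketch, assuming "HHMM" strings and a known start date:

```python
from datetime import date, timedelta

def assign_dates(start, times):
    # times: recognised "HHMM" strings in shooting order (24-hour clock);
    # a drop in clock time, e.g. "2359" -> "0000", signals a new day
    dates, day = [start], start
    for prev, cur in zip(times, times[1:]):
        if int(cur) < int(prev):
            day += timedelta(days=1)
        dates.append(day)
    return dates

# assign_dates(date(1960, 3, 1), ["2358", "2359", "0000", "0001"])
```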
For date verification, the user opens this interface and fills in the original image path and the corresponding Excel table path. Clicking the "Open" button, the program opens the pictures in sequence according to the number corresponding to the first date of each day in the Excel table and fills the date into the text box on the right. If the date recorded in the Excel table is correct and the previous picture belonged to the previous date, the user directly clicks the "Next Day" button to verify the next date. If a date is wrong, the user locates the picture where the date jumps via the "Last" and "Next" buttons, fills in the text box on the right, and clicks the update button; the program then updates all subsequent dates automatically. Verification proceeds date by date until the program reaches the last date. In tests, one person completes the date verification of 10,000 pictures in about 10 minutes. In the present invention, time information such as "hour, minute" in the timestamps is extracted automatically by steps 1-3, so the main time cost lies in the above "step 4: manual date verification". In the traditional mode of manually entering the "year, month, day, hour, minute" information one by one, one person would need at least two days to finish the timestamp entry of 10,000 pictures. The benefit of the present invention in improving the efficiency of timestamp information entry and in time saved is therefore significant.
Claims (9)
1. A sun film image timestamp information extracting method based on deep learning, characterized by comprising the following steps:
Step 1: locate and crop the timestamp information region in the solar chromosphere film image;
Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain single characters;
Step 3: character recognition, in which a network is first trained with a large number of samples, the single characters obtained by the segmentation of Step 2 are then recognized with the trained network, and the recognition results are assembled and saved.
2. The sun film image timestamp information extracting method based on deep learning according to claim 1, characterized in that the step 1 comprises the following steps:
Step 1.1, solar sphere removal based on vertical projection:
accumulate the image along the vertical direction to obtain a 1 × n vector; assuming the picture has size m × n and the pixel value at row i, column j is f_{ij}, the projection in the vertical direction is:

$$S_{1j}=\sum_{i=1}^{m}f_{ij},\qquad j=1,\dots,n$$

where S_{1j} is the sum of the pixels of the j-th column and the vector has size 1 × n; the position of the solar sphere is judged from the vertical projection of the picture; the interval [400, 1800] of the vector is the projection of the solar chromosphere part; because the Sun is symmetric, only the position of the maximum of the vector S_{1j} is needed to locate the solar centre, after which the picture part containing the solar sphere is removed according to the pixel length occupied by the sphere in the vertical direction;
Step 1.2, timestamp side judgment and rotation correction based on variance:
the variance of the picture containing the timestamp is far larger than the variance of the picture without it, which identifies the picture where the timestamp lies; once that picture is known, it is rotated upright: if it came from the left side of the image it is rotated 90° clockwise, otherwise 90° counterclockwise;
Step 1.3, fine segmentation of the timestamp character region based on the projection method:
for a picture of size m × n whose pixel value at row i, column j is x_{ij}, the horizontal and vertical projections are respectively:

$$S_{i1}=\sum_{j=1}^{n}x_{ij},\qquad S_{1j}=\sum_{i=1}^{m}x_{ij}$$

where S_{1j} is the sum of the pixels of the j-th column (the vector has size 1 × n) and S_{i1} is the sum of the pixels of the i-th row (the vector has size m × 1); computing the horizontal and vertical projections of the picture pins down the exact position of the timestamp region, realizing the accurate segmentation of the picture.
3. The sun film image timestamp information extracting method based on deep learning according to claim 2, characterized in that:
based on the horizontal and vertical projection results obtained in Step 1.3, the picture is cut:
to keep the continuity of the picture intact, the first point whose projection exceeds the mean is taken as the start and the last such point as the end, and all of the image between start and end is retained; assuming the original picture S has size m × n and the cut picture P has size m′ × n′, the cutting formula is:

P = S(a:b, c:d), (a, c > 1, b < m, d < n)

where:

$$a=\min\{\,i:S_{i1}>\bar{x}\,\},\quad b=\max\{\,i:S_{i1}>\bar{x}\,\},\quad c=\min\{\,j:S_{1j}>\bar{y}\,\},\quad d=\max\{\,j:S_{1j}>\bar{y}\,\}$$

Here S(a:b, c:d) denotes rows a to b and columns c to d of picture S, \bar{x} is the mean of the horizontal projection values, and \bar{y} is the mean of the vertical projection values.
4. The sun film image timestamp information extracting method based on deep learning according to claim 1, characterized in that:
in the step 2, the single-character segmentation proceeds as follows:
first the background of the picture is removed with a top-hat operation, then noise is removed with a local binarization algorithm, and finally the character regions are extracted with a connected-domain algorithm; the algorithm assumes white characters by default; if the characters are black, the connected-domain extraction will find no valid region, so when no valid region exists the algorithm returns to the local binarization part and inverts the colors of the binarized picture.
5. The sun film image timestamp information extracting method based on deep learning according to claim 4, characterized in that:
background removal is carried out with the morphological top-hat operation, whose principle is to subtract the result of opening the original image from the original image; after the top-hat operation the picture loses part of its background noise and the characters in the image stand out.
6. The sun film image timestamp information extracting method based on deep learning according to claim 4, characterized in that:
noise removal is carried out with the Sauvola local binarization algorithm; the character cutting is completed simply by extracting the connected domains that match the character size.
7. The sun film image timestamp information extracting method based on deep learning according to claim 4, characterized in that:
character regions are extracted with the connected-domain algorithm, and invalid regions are further deleted by judging whether the length, width and aspect ratio of each connected domain match those of a standard character; the height of a character lies in [90, 110], the width in [10, 60], and the aspect ratio of a character is not less than 1;
mapping the position of each connected domain in the binary map to the corresponding position in the original image cuts out each single character region; to keep the picture dimensions consistent, the method pads every picture and converts it to a 28 × 28 standard picture.
8. The sun film image timestamp information extracting method based on deep learning according to claim 1, characterized in that:
in the step 3, the character recognition proceeds as follows:
characters are recognized with the convolutional neural network algorithm of deep learning; the character recognition convolutional neural network contains two convolutional layers, two pooling layers and one fully connected layer; the first convolutional layer convolves the input 28 × 28 character picture with 6 different 5 × 5 convolution kernels, after which the character picture becomes a 24 × 24 × 6 feature map; the first pooling layer applies a pooling function with a 2 × 2 sliding window to the result of the first convolutional layer, and after this pooling the feature map becomes 12 × 12 × 6; the second convolutional layer re-extracts features from the pooled feature map with 12 different 5 × 5 convolution kernels, and the extracted feature map has size 8 × 8 × 12; the second pooling layer pools the feature map after the second convolution, after which the feature map size becomes 4 × 4 × 12; the feature map after the second pooling operation is fed into the fully connected layer to obtain the feature vector of the character; finally, classifying the feature vector and matching it with the real digit completes the recognition of the time characters in the timestamp;
the trained convolutional neural network recognizes the single characters of the time information in the solar chromosphere film images, and the recognized characters are combined in order, matched with the filename of the original image, and filled into an Excel table for later manual verification and database construction.
9. The sun film image timestamp information extracting method based on deep learning according to claim 1, characterized by further comprising step 4, manual verification of the date:
for the chromosphere images of a given period, only the year, month and day of the first image need to be entered, and the shooting date of every subsequent picture is computed automatically; occasionally there are dates on which no solar observation was made, and the wrongly auto-generated date information is then corrected manually.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910765276.1A CN110533030B (en) | 2019-08-19 | 2019-08-19 | Deep learning-based sun film image timestamp information extraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910765276.1A CN110533030B (en) | 2019-08-19 | 2019-08-19 | Deep learning-based sun film image timestamp information extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110533030A true CN110533030A (en) | 2019-12-03 |
CN110533030B CN110533030B (en) | 2023-07-14 |
Family
ID=68663766
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910765276.1A Active CN110533030B (en) | 2019-08-19 | 2019-08-19 | Deep learning-based sun film image timestamp information extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110533030B (en) |
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6026177A (en) * | 1995-08-29 | 2000-02-15 | The Hong Kong University Of Science & Technology | Method for identifying a sequence of alphanumeric characters |
CN101246551A (en) * | 2008-03-07 | 2008-08-20 | 北京航空航天大学 | Fast license plate locating method |
CN101751568A (en) * | 2008-12-12 | 2010-06-23 | 汉王科技股份有限公司 | ID No. locating and recognizing method |
CN102402686A (en) * | 2011-12-07 | 2012-04-04 | 北京云星宇交通工程有限公司 | Method for dividing license plate characters based on connected domain analysis |
US20150278626A1 (en) * | 2014-03-31 | 2015-10-01 | Nidec Sankyo Corporation | Character recognition device and character segmentation method |
WO2017020723A1 (en) * | 2015-08-04 | 2017-02-09 | 阿里巴巴集团控股有限公司 | Character segmentation method and device and electronic device |
CN105528600A (en) * | 2015-10-30 | 2016-04-27 | 小米科技有限责任公司 | Region identification method and device |
CN105528606A (en) * | 2015-10-30 | 2016-04-27 | 小米科技有限责任公司 | Region identification method and device |
CN106611174A (en) * | 2016-12-29 | 2017-05-03 | 成都数联铭品科技有限公司 | OCR recognition method for unusual fonts |
CN108734189A (en) * | 2017-04-20 | 2018-11-02 | 天津工业大学 | Vehicle License Plate Recognition System based on atmospherical scattering model and deep learning under thick fog weather |
US20190095739A1 (en) * | 2017-09-27 | 2019-03-28 | Harbin Institute Of Technology | Adaptive Auto Meter Detection Method based on Character Segmentation and Cascade Classifier |
CN108921163A (en) * | 2018-06-08 | 2018-11-30 | 南京大学 | A kind of packaging coding detection method based on deep learning |
CN109359695A (en) * | 2018-10-26 | 2019-02-19 | 东莞理工学院 | A kind of computer vision 0-O recognition methods based on deep learning |
CN109657665A (en) * | 2018-10-31 | 2019-04-19 | 广东工业大学 | A kind of invoice batch automatic recognition system based on deep learning |
CN109784342A (en) * | 2019-01-24 | 2019-05-21 | 厦门商集网络科技有限责任公司 | A kind of OCR recognition methods and terminal based on deep learning model |
Non-Patent Citations (5)
Title |
---|
A. Krizhevsky, I. Sutskever, G. E. Hinton: "ImageNet classification with deep convolutional neural networks", Proceedings of the 25th International Conference on Neural Information Processing Systems |
Tang Mingdou; Tao Qingchuan; Feng Qian: "Neural-network-based chip surface character detection and recognition ***", Modern Computer (Professional Edition), no. 09 |
Zeng Xiangyun, Zheng Sheng, et al.: "SVM-based background extraction method for hand-drawn sunspot images", Microcomputer & Its Applications |
Zhu Mingfeng; Zheng Sheng; Zeng Xiangyun; Xu Gaogui: "SVM-based background extraction method for hand-drawn sunspot images", Artificial Intelligence, 16 December 2016 |
Zhu Daoyuan; Zheng Sheng; Zeng Xiangyun; Xu Gaogui: "Research on handwritten character segmentation in hand-drawn sunspot charts", Microcomputer & Its Applications, no. 20 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898606A (en) * | 2020-05-19 | 2020-11-06 | 武汉东智科技股份有限公司 | Night imaging identification method for superimposing transparent time characters in video image |
CN111898606B (en) * | 2020-05-19 | 2023-04-07 | 武汉东智科技股份有限公司 | Night imaging identification method for superimposing transparent time characters in video image |
Also Published As
Publication number | Publication date |
---|---|
CN110533030B (en) | 2023-07-14 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||