CN110533030A - Deep-learning-based method for extracting timestamp information from solar film images - Google Patents

Deep-learning-based method for extracting timestamp information from solar film images

Info

Publication number
CN110533030A
Authority
CN
China
Prior art keywords
picture
character
image
sun
timestamp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910765276.1A
Other languages
Chinese (zh)
Other versions
CN110533030B (en)
Inventor
曾曙光
左肖雄
郑胜
张佳锋
曾祥云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University (CTGU)
Priority to CN201910765276.1A
Publication of CN110533030A
Application granted
Publication of CN110533030B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

A deep-learning-based method for extracting timestamp information from solar film images, comprising: Step 1: locating and cropping the timestamp information region in a solar chromosphere film image, the timestamp recording the year, month, day, hour and minute of the shooting time; Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain individual characters; Step 3: character recognition, in which a network is first trained on a large number of samples, the trained network is then used to recognise the individual characters obtained in Step 2, and the recognition results are assembled and saved. The method automatically recognises the digital timestamps in solar-observation film images and outputs the recognised time information, reducing the workload of manually reading and transcribing the times, thereby accelerating the digitisation of this film archive and making these precious historical data more readily available for solar-physics research.

Description

Deep-learning-based method for extracting timestamp information from solar film images
Technical field
The present invention relates to the field of solar-observation image processing, and in particular to a deep-learning-based method for extracting timestamp information from solar film images.
Background technique
The solar chromosphere is the layer of the solar atmosphere above the photosphere. As the transition region between the photosphere and the corona, its magnetic field is relatively unstable and frequently produces violent flare eruptions. Flare emission in the chromosphere usually appears as elongated ribbons on either side of the magnetic polarity inversion line (PIL), which is regarded as evidence of the characteristic configuration of magnetic reconnection. To study flare eruptions, researchers have recorded the solar chromosphere photographically over long periods. Because the historical archive is enormous, the time information of a very large batch of chromosphere images is still recorded only as characters within the images themselves and has not been converted into digital information that a computer can read directly, which greatly hinders scientific work based on these data.
Digitising the shooting times of the images deepens, on the one hand, the mining of the useful information in the historical archive; on the other hand, it greatly reduces the retrieval workload of researchers, helping them obtain more valuable results from these data and substantially advancing their research.
Historical solar-observation images are mostly preserved on film, with the shooting time printed on the film. To let researchers use these image archives effectively, the timestamp information on the film must be extracted. Since the number of images is enormous, manual recognition and extraction would be extremely time-consuming and laborious; automatically recognising the timestamp information in the images by computer is therefore the key to using these data efficiently.
Summary of the invention
To solve the above technical problems, the present invention provides a deep-learning-based method for extracting timestamp information from solar film images. The method performs automatic machine recognition of the digital timestamps in solar-observation film images and outputs the recognised time information, reducing the workload of manual recognition and time entry, thereby accelerating the digitisation of this film archive and making these precious historical data more readily available for solar-physics research.
The technical solution adopted by the invention is as follows:
A deep-learning-based method for extracting timestamp information from solar film images, comprising the following steps:
Step 1: locate and crop the timestamp information region in the solar chromosphere film image;
the timestamp in a solar chromosphere film image records the year, month, day, hour and minute of the shooting time;
Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain individual characters;
Step 3: character recognition, in which a network is first trained on a large number of samples, the trained network is then used to recognise the individual characters obtained in Step 2, and the recognition results are assembled and saved.
Step 1 comprises the following sub-steps:
Step 1.1, solar-disk removal based on vertical projection:
The image is accumulated along the vertical direction to obtain a 1 × n vector. Assume the image is of size m × n and the pixel value at row i, column j is f_ij; the vertical projection is then
S_1j = Σ_{i=1..m} f_ij  (j = 1, …, n)
where S_1j denotes the sum of the pixels in the j-th column of the image, and the vector S_1j has size 1 × n. Computing the vertical projection of the image allows the position of the solar disk to be judged: the segment of S_1j between indices [400, 1800] is the projection of the solar chromosphere. Because the sun is symmetric, it suffices to find the maximum of S_1j to locate the column of the centre of the solar disk; the part of the image containing the solar disk is then removed according to the pixel width it occupies.
Step 1.2, timestamp-side determination and rotation correction based on variance:
The variance of the sub-image containing the timestamp is far larger than that of the sub-image without it, which identifies the sub-image containing the timestamp. Once that sub-image is known, it must be rotated to the correct orientation: the left sub-image is rotated 90° clockwise, the right one 90° counter-clockwise.
Step 1.3, fine segmentation of the timestamp character region based on projection:
For an image of size m × n with pixel value x_ij at row i, column j, the horizontal and vertical projections are respectively
S_i1 = Σ_{j=1..n} x_ij  (i = 1, …, m),   S_1j = Σ_{i=1..m} x_ij  (j = 1, …, n)
where S_i1 denotes the sum of the pixels in the i-th row (size m × 1) and S_1j the sum of the pixels in the j-th column (size 1 × n). Computing the horizontal and vertical projections of the image gives the exact position of the timestamp region, allowing the image to be segmented precisely.
Step 1.4, cropping the timestamp region:
The image is cropped based on its horizontal and vertical projections. To keep the continuity of the image intact, the first point exceeding the projection mean is taken as the start and the last such point as the end, and everything between them is retained. Assume the original image S is of size m × n and the cropped image P of size m′ × n′; the cropping formula is
P = S(a:b, c:d), (a, c > 1, b < m, d < n)
where
a = min{i : x_i > x̄}, b = max{i : x_i > x̄}, c = min{j : y_j > ȳ}, d = max{j : y_j > ȳ}.
Here S(a:b, c:d) denotes rows a to b and columns c to d of image S; x_i is the value of a point of the horizontal projection and x̄ the mean of the horizontal projection; y_j is the value of a point of the vertical projection and ȳ its mean.
In Step 2, the single-character segmentation proceeds as follows:
The background of the image is first removed with a top-hat operation, noise is then removed with a local-binarisation algorithm, and the character regions are finally extracted with a connected-component algorithm. The algorithm assumes white characters by default; if the characters are black, connected-component extraction will find no valid region, so whenever no valid region is found the procedure returns to the local-binarisation stage and inverts the colours of the binarised image.
Background removal uses the morphological top-hat operation, whose principle is to subtract the morphological opening of the image from the image itself. The top-hat operation eliminates part of the background noise and makes the characters in the image stand out.
Noise removal uses the Sauvola local-binarisation algorithm. Character segmentation is then completed simply by extracting the connected components whose size matches that of a character; connected-component extraction is fast and saves considerable running time.
Character regions are extracted with the connected-component algorithm: the image is first locally binarised, and connected components that are too large or too small are then removed to eliminate part of the interference. A local-binarisation window of 16 × 16 works well, and the connected-component area of a character lies in [500, 5000]. Some invalid regions still remain after this processing, so the invention further removes them by checking whether the height, width and aspect ratio of each connected component match those of a standard character. According to the statistics, character height lies in [90, 110], character width in [10, 60], and the height-to-width ratio of a character is not less than 1.
The position of each connected component in the binary image is mapped back to the original image, and the individual character regions of the original image are cropped out one by one. To keep the image sizes consistent, the invention pads each character image and resizes it to a standard 28 × 28 image.
In Step 3, character recognition proceeds as follows:
Character recognition uses a convolutional neural network from deep learning. The character-recognition network built by the invention contains two convolutional layers, two pooling layers and one fully connected layer. The first convolutional layer convolves the input 28 × 28 character image with 6 different 5 × 5 kernels; after this layer the character image becomes a 24 × 24 × 6 feature map. The first pooling layer applies a 2 × 2 sliding-window pooling function to the output of the first convolutional layer, giving a 12 × 12 × 6 feature map. The second convolutional layer applies 12 different 5 × 5 kernels to the pooled feature map, producing an 8 × 8 × 12 feature map, and the second pooling layer reduces this to 4 × 4 × 12. The feature map after the second pooling operation is fed into the fully connected layer to obtain the character's feature vector. Finally, classifying the feature vector and mapping the class to the actual digit completes the recognition of a time character in the timestamp.
The trained convolutional neural network recognises the individual characters of the time information in the solar chromosphere film images; the recognised characters are assembled in order, matched with the file name of the original image, and written into an Excel table for later manual verification and database construction.
The method further comprises Step 4, manual date verification:
For the chromosphere images of a given period, only the date of the first image needs to be entered; the shooting date of every subsequent image is then computed automatically. Occasionally there are days on which no solar observation was made, in which case the wrong, automatically generated date information is corrected by manual verification.
The deep-learning-based method for extracting timestamp information from solar film images of the present invention has the following technical effects:
1) The invention proposes a deep-learning-based timestamp-information extraction method, used to systematically recognise and organise the time information of the more than 7 million digitised solar chromosphere film images scanned from the 1956-2003 archive of the US National Solar Observatory. First, the timestamp information region in each image is located and segmented; second, top-hat filtering, local binarisation, connected-component screening and related methods are used to suppress noise and segment the characters of the timestamp region; then 10,000 labelled character images are selected to train a convolutional neural network, and the recognition performance of the resulting network is tested; finally, the trained network batch-recognises the timestamp information of 10,000 chromosphere images and the recognition results are analysed quantitatively. The results show that the method locates and recognises the timestamp information in scanned solar film images automatically, accurately and rapidly.
2) Using a convolutional-neural-network method from deep learning, the problem of recognising the time information in nearly 50 years of solar chromosphere film pictures taken by the US National Observatory is studied. The results show that the method is highly applicable to character recognition in these pictures: the recognition accuracy reaches 98% or more, and processing a picture takes no more than 0.1 s on average, which meets the practical requirements on recognition speed and quality. The method is also highly portable and offers a valuable reference for similar later problems.
Detailed description of the invention
The present invention is further explained below with reference to the accompanying drawings and embodiments:
Fig. 1 is a structure diagram of a generic convolutional neural network.
Fig. 2 is a flow chart of timestamp character recognition.
Fig. 3(a) is a first solar chromosphere film image with timestamp information;
Fig. 3(b) is a second solar chromosphere film image with timestamp information;
Fig. 3(c) is a third solar chromosphere film image with timestamp information.
Fig. 4 is a schematic diagram of the timestamp information region in an image.
Fig. 5 is a plot of the S_1j vector.
Fig. 6 shows the images obtained after removing the solar disk.
Fig. 7 is the picture containing the timestamp.
Fig. 8(a) is a first projection-vector plot;
Fig. 8(b) is a second projection-vector plot.
Fig. 9 shows the result of cropping the timestamp region.
Fig. 10 is a flow chart of the single-character extraction algorithm.
Fig. 11(a) shows the intensity distribution of the characters (before the top-hat operation);
Fig. 11(b) shows the intensity distribution of the characters (after the top-hat operation).
Fig. 12 is the binarised picture.
Fig. 13 is the binary image after noise removal.
Fig. 14 is the binary image after eliminating invalid regions.
Fig. 15 shows the character-segmentation result.
Fig. 16 is a structure diagram of the character-recognition convolutional neural network.
Fig. 17 shows the graphical interface for date verification.
Specific embodiments
Embodiments of the present invention are illustrated below by way of specific examples:
The architecture of a generic convolutional neural network comprises an input layer, convolutional layers, pooling layers, fully connected layers and an output layer, with the structure shown in Fig. 1. The output layer classifies the feature vectors produced from the input-layer data by softmax logistic regression; when the input is character image data, the classification result of the output layer assigns each character image to a class, thereby achieving character recognition. A convolutional neural network may contain as many convolutional, pooling and fully connected layers as needed; Fig. 1 merely shows the general form.
Extracting the timestamp information from scanned solar chromosphere film images by means of a convolutional neural network (CNN) is divided into three parts, as shown in Fig. 2:
Step 1: locate and crop the timestamp information region of the picture;
Step 2: further segment the characters of the timestamp information region to obtain individual characters;
Step 3: first train the network on a large number of samples, then recognise the segmented characters with the trained network, and assemble and save the recognition results.
Step 1: locating and cropping the timestamp information region of the picture.
The original solar chromosphere film pictures are shown in Figs. 3(a), 3(b) and 3(c). Each picture has a resolution of 1600 × 2048. The timestamp is normally placed on the left or right side of the picture, but its position is not fixed. The time characters fall broadly into two styles, as in Figs. 3(a) and 3(b), and the styles differ from each other, so the characters must be recognised by class. Moreover, the clarity varies from picture to picture: most pictures are rather dim and their characters hard to read, as in Fig. 3(c), so every picture must be pre-processed. Of the whole picture, only the timestamp characters are needed, i.e. the part inside the red box in Fig. 4. The timestamp records the year, month, day, hour, minute and second at which the photograph was taken (the part inside the yellow box in Fig. 4); given the temporal precision of these observations, only the year, month, day, hour and minute need to be obtained. Because the position of the time information is not fixed, the timestamp region must first be located and cropped. The timestamp lies on the left or right side and never on the solar disk, so the part of the picture containing the sun is cut away first and the side containing the timestamp is then located.
Step 1.1: solar-disk removal based on vertical projection.
The vertical projection method inspects the distribution of the pixel values of an image along the vertical direction. It accumulates the image along the vertical direction into a 1 × n vector: assuming the picture is of size m × n and the pixel value at row i, column j is f_ij, the vertical projection is
S_1j = Σ_{i=1..m} f_ij  (j = 1, …, n)
where S_1j denotes the sum of the pixels in the j-th column of the image, and the vector S_1j has size 1 × n. Computing the vertical projection of the picture allows the position of the solar disk to be judged. The S_1j vector obtained by projecting Fig. 3(a) vertically is shown in Fig. 5, from which it can be seen that the segment of S_1j between indices [400, 1800] is the projection of the solar chromosphere. Because the sun is symmetric, it suffices to find the maximum of S_1j to locate the column of the centre of the solar disk, and the part of the picture containing the solar disk is removed according to the pixel width it occupies. Fig. 6 shows the two small sub-images obtained after removing the solar disk.
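As an illustration, a minimal NumPy sketch of this step follows; the function name and the disk-radius estimate taken from the above-mean span of the projection are our assumptions, not part of the patent.

```python
import numpy as np

def remove_solar_disk(img):
    """Split off the side strips of a chromosphere scan by removing the
    solar disk located through the vertical projection S_1j."""
    col_sum = img.sum(axis=0, dtype=np.int64)  # S_1j: the n column sums
    centre = int(np.argmax(col_sum))           # the sun is symmetric, so the
                                               # projection peak is its centre
    # Assumed radius estimate: half the span where the projection exceeds
    # its mean (the bright disk dominates the projection).
    above = np.where(col_sum > col_sum.mean())[0]
    radius = (above[-1] - above[0]) // 2
    left = img[:, : max(centre - radius, 0)]             # strip left of the disk
    right = img[:, min(centre + radius, img.shape[1]):]  # strip right of the disk
    return left, right
```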
Step 1.2: timestamp-side determination and rotation correction based on variance.
A picture is essentially a matrix of pixels, and the pixel values are reflected in the colours of the picture: in a binary image "0" denotes black and "1" denotes white. The images in Fig. 6 are not binary but 256-level greyscale images, in which each pixel value gives the brightness of that point on a 256-level scale, with higher values meaning brighter pixels. As Fig. 6 shows, in the sub-image containing the timestamp the timestamp appears as high-brightness points, whereas in the sub-image without the timestamp most pixels are dark. This step therefore determines which sub-image carries the timestamp from the variance of the pixel values of the image matrix. For a picture of size m × n the variance is
σ² = (1/(m·n)) Σ_{i=1..m} Σ_{j=1..n} (x_ij - x̄)²
where x_ij is the pixel value at position [i, j], x̄ is the mean of all pixel values, and m·n is the total number of pixels.
Clearly the variance of the sub-image containing the timestamp is far larger than that of the sub-image without it, which identifies the sub-image where the timestamp lies. Once that sub-image is known, it must be rotated to the correct orientation: the left sub-image is rotated 90° clockwise, the right one 90° counter-clockwise. The rotation formula is
x′ = x·cos β - y·sin β,  y′ = x·sin β + y·cos β
where (x, y) is the original pixel position, (x′, y′) the pixel position after the rotation, and β the counter-clockwise rotation angle. Taking Fig. 6 as an example, the sub-image selected by the variance test and rotated is shown in Fig. 7.
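Continuing the sketch above, the variance test and the rotation can be written as follows; np.rot90 stands in for the explicit rotation formula, and the function name is again illustrative.

```python
def pick_timestamp_strip(left, right):
    """Keep the strip whose pixel variance is larger (it carries the bright
    timestamp characters) and rotate it upright."""
    if left.size and left.var() > right.var():
        return np.rot90(left, k=-1)   # left strip: 90 degrees clockwise
    return np.rot90(right, k=1)       # right strip: 90 degrees counter-clockwise
```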
As Fig. 7 shows, the sub-image containing the timestamp still carries a large amount of useless information, which affects the computation speed to some extent; given the huge number of pictures, these useless regions would also cause unnecessary storage consumption.
Step 1.3: the invention performs the fine segmentation of the timestamp region with the horizontal-vertical projection method.
The horizontal-vertical projection method inspects the distribution of the pixel values of an image along the horizontal and vertical directions respectively; it is often used to project a target region precisely for a later segmentation step. It accumulates the image along the horizontal and vertical directions into two vectors: for a picture of size m × n with pixel value x_ij at row i, column j, the horizontal and vertical projections are respectively
S_i1 = Σ_{j=1..n} x_ij  (i = 1, …, m),   S_1j = Σ_{i=1..m} x_ij  (j = 1, …, n)
where S_i1 denotes the sum of the pixels in the i-th row (size m × 1) and S_1j the sum of the pixels in the j-th column (size 1 × n). Computing the horizontal and vertical projections of the picture gives the exact position of the timestamp region, allowing the picture to be segmented precisely. Taking Fig. 7 as an example, the horizontal and vertical projection vectors are shown in Figs. 8(a) and 8(b).
Step 1.4: cropping the timestamp region.
The picture is cropped based on its horizontal and vertical projections. Figs. 8(a) and 8(b) show that the projections corresponding to the timestamp region are higher in both directions; exploiting this, keeping only the parts where the projection exceeds its mean completes the segmentation of the timestamp region. To keep the continuity of the picture intact, the first point exceeding the mean is taken as the start and the last such point as the end, and everything between them is retained. Assume the original image S is of size m × n and the cropped picture P of size m′ × n′; the cropping formula is
P = S(a:b, c:d), (a, c > 1, b < m, d < n)   (5)
where
a = min{i : x_i > x̄}, b = max{i : x_i > x̄}, c = min{j : y_j > ȳ}, d = max{j : y_j > ȳ}.
Here S(a:b, c:d) denotes rows a to b and columns c to d of picture S; x_i is the value of a point of the horizontal projection and x̄ the mean of the horizontal projection; y_j is the value of a point of the vertical projection and ȳ its mean. Taking Fig. 7 as an example, the cropped result is shown in Fig. 9.
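The cropping rule then amounts to the following sketch, continuing the NumPy snippets above.

```python
def crop_to_timestamp(strip):
    """Crop to S(a:b, c:d): the rows and columns between the first and last
    points where the horizontal / vertical projections exceed their means."""
    rows = strip.sum(axis=1, dtype=np.int64)  # horizontal projection S_i1
    cols = strip.sum(axis=0, dtype=np.int64)  # vertical projection S_1j
    r = np.where(rows > rows.mean())[0]
    c = np.where(cols > cols.mean())[0]
    return strip[r[0]:r[-1] + 1, c[0]:c[-1] + 1]
```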
Step 2: single-character segmentation.
As Fig. 9 shows, the date characters are small and blurred and are easily deleted as noise by the later processing, which would disturb the result severely. Therefore only the hour-minute information is recognised during the recognition stage; the dates are filled in manually, which involves little extra work.
In practice, segmenting the individual characters is the most difficult part. Fig. 9 was cropped from a picture of relatively good quality; in reality most pictures resemble Fig. 3(c), whose characters are blurred and some even illegible to the human eye. Obtaining the individual characters is therefore harder than obtaining the character region, and it is also the stage that most easily affects the final recognition result. In addition, the character formats of the timestamp region differ, as shown in Figs. 3(a), 3(b) and 3(c), so the generality of the algorithm must be kept in mind throughout. The single-character segmentation process is shown in Fig. 10: the background of the picture is first removed with a top-hat operation, noise is then removed with a local-binarisation algorithm, and the character regions are finally extracted with a connected-component algorithm. The characters fall into two classes, shown in white and in black respectively, as in Figs. 3(a) and 3(b). The algorithm assumes white characters by default, so if the characters are black, connected-component extraction finds no valid region; in that case the procedure returns to the local-binarisation stage and inverts the colours of the binarised picture.
Step 2.1: background removal is based on the top-hat operation, whose principle is to subtract the morphological opening of the image from the image itself:
topimg = tophat(img, element) = img - open(img, element)   (6)
where topimg is the picture after the top-hat operation, img the original picture, and element the structuring element used for the top-hat and opening operations.
Processing Fig. 9 with a 29 × 29 structuring element, Figs. 11(a) and 11(b) compare the intensity distributions before and after the top-hat operation. Fig. 11(b) shows that the top-hat operation eliminates part of the background noise and makes the characters in the image stand out. Although the noise cannot be removed completely, the workload of the later stages is reduced and excessive noise is prevented from corrupting the local binarisation.
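With OpenCV, formula (6) with the 29 × 29 element can be sketched as follows; ts is our placeholder name for the cropped timestamp picture produced in Step 1.

```python
import cv2

# Top-hat: img - open(img, element). The opening estimates the slowly
# varying background, so subtracting it keeps the thin bright characters.
element = cv2.getStructuringElement(cv2.MORPH_RECT, (29, 29))
topimg = cv2.morphologyEx(ts, cv2.MORPH_TOPHAT, element)
```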
Step 2.2: noise removal uses the Sauvola local-binarisation algorithm.
The Sauvola algorithm proceeds as follows:
Step 1: compute the mean MEAN and standard deviation STD of the pixels f(x, y) within an n × n window;
Step 2: compute the threshold T(x, y) of pixel f(x, y) according to
T(x, y) = MEAN(x, y) · [1 + k · (STD(x, y)/N - 1)]
where k is a user-defined parameter with 0 < k < 1, and N is the dynamic range of the standard deviation.
With a window size of n = 35 and k = 0.08, the processed picture is shown in Fig. 12.
From the binarised picture it can be seen that the characters can be separated from the background, although the background still contains much noise that is either too large or too small. It then suffices to extract the connected components whose size matches that of a character to complete the character segmentation; connected-component extraction is fast and saves considerable running time.
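A sketch of this step using scikit-image's Sauvola implementation follows; mapping the window size and k onto threshold_sauvola's parameters, and leaving the dynamic range at the library default, are our assumptions.

```python
from skimage.filters import threshold_sauvola

# Local Sauvola threshold T(x, y) = MEAN * (1 + k * (STD / N - 1)),
# computed in a 35 x 35 window with k = 0.08 as stated above.
T = threshold_sauvola(topimg, window_size=35, k=0.08)
binary = (topimg > T).astype("uint8")  # white characters on a black background
```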
Step 2.3: character-region extraction is based on the connected-component algorithm.
To remove as much noise as possible while retaining the useful information, class-1 pictures are first locally binarised, and connected components that are too large or too small are then removed to eliminate part of the interference; class-2 pictures are binarised with a threshold of 0.25 and inverted to obtain white characters, after which the processing is the same for both classes. Experiments show that a local-binarisation window of 16 × 16 works well and that the connected-component area of a character lies in [500, 5000]. The processing result is shown in Fig. 13.
As Fig. 13 shows, some invalid regions still survive the above processing, so the invention further removes them by checking whether the height, width and aspect ratio of each connected component match those of a standard character. According to the statistics, character height lies in [90, 110], character width in [10, 60], and the height-to-width ratio of a character is not less than 1. The result is shown in Fig. 14.
As Fig. 14 shows, the invalid regions of the picture have been completely removed. By mapping the position of each connected component in the binary image back to the original picture, the individual character regions of the original picture can be cropped out one by one. To keep the picture sizes consistent, the invention pads each picture and resizes it to a standard 28 × 28 image; the final single-character segmentation result is shown in Fig. 15.
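The screening and the 28 × 28 normalisation can be sketched as follows, continuing the snippets above; the thresholds come from the text, while the square padding before resizing is an illustrative choice to avoid distorting the digits.

```python
def extract_characters(binary, original):
    """Keep components with area in [500, 5000], height in [90, 110],
    width in [10, 60] and height >= width; crop and normalise to 28 x 28."""
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if 500 <= area <= 5000 and 90 <= h <= 110 and 10 <= w <= 60 and h >= w:
            boxes.append((x, y, w, h))
    boxes.sort()                                # left-to-right reading order
    chars = []
    for x, y, w, h in boxes:
        glyph = original[y:y + h, x:x + w]
        side = max(w, h)                        # pad to a square first
        canvas = np.zeros((side, side), glyph.dtype)
        canvas[(side - h) // 2:(side - h) // 2 + h,
               (side - w) // 2:(side - w) // 2 + w] = glyph
        chars.append(cv2.resize(canvas, (28, 28)))
    return chars
```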
Step 3: character recognition.
Character recognition uses a convolutional neural network from deep learning. The character-recognition network built by the invention contains two convolutional layers, two pooling layers and one fully connected layer, as shown in Fig. 16. The first convolutional layer convolves the input 28 × 28 character picture with 6 different 5 × 5 kernels; after this layer the character picture becomes a 24 × 24 × 6 feature map. The first pooling layer applies a 2 × 2 sliding-window pooling function to the output of the first convolutional layer, giving a 12 × 12 × 6 feature map. The second convolutional layer applies 12 different 5 × 5 kernels to the pooled feature map, producing an 8 × 8 × 12 feature map, and the second pooling layer reduces this to 4 × 4 × 12. The feature map after the second pooling operation is fed into the fully connected layer to obtain the character's feature vector. Finally, classifying the feature vector and mapping the class to the actual digit completes the recognition of a time character in the timestamp.
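For illustration, the described layer sequence can be reproduced in Keras as the following sketch; the ten output classes (digits 0-9) and the ReLU activations after the convolutions are assumptions based on Step 3.3 below.

```python
from tensorflow import keras
from tensorflow.keras import layers

# 28x28 -> conv 6@5x5 -> 24x24x6 -> pool 2x2 -> 12x12x6
#       -> conv 12@5x5 -> 8x8x12 -> pool 2x2 -> 4x4x12 -> FC -> class
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(6, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(12, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                        # 4 * 4 * 12 = 192 features
    layers.Dense(10, activation="softmax"),  # assumed: ten digit classes
])
```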
In the present invention, the training of the convolutional neural network for time-character recognition is divided into the following three steps:
Step 3.1: obtain single-character pictures from the chromosphere images and label them to form the data samples needed to train the network.
Step 3.2: assemble the sample data into a 28 × 28 × N matrix as the X vector of the input layer, where N is the number of character samples; the digit label corresponding to each slice of X forms the Y vector of the input layer.
Step 3.3: train the network by forward propagation and back-propagation (BP), iteratively updating its coefficients, using the ReLU activation function and the max-pooling function, until a network structure with high recognition accuracy is obtained; the input vectors X and Y are fed through 100 iterations.
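Continuing the Keras sketch, Step 3.3 might be written as follows; the optimiser is our choice, since the text specifies only forward/back-propagation, ReLU, max pooling and 100 iterations.

```python
# X: (N, 28, 28, 1) character images and Y: (N,) digit labels from Step 3.2.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, Y, epochs=100)
```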
The trained convolutional neural network recognises the individual characters of the time information in the solar chromosphere film pictures; the recognised characters are assembled in order, matched with the file name of the original picture, and automatically written into an Excel table for later manual verification and database construction.
Step 4: manual date verification, in which the automatically generated date information of the pictures is checked manually for errors.
After the time information such as "hour, minute" in the timestamps has been recognised automatically, the essential remaining step is to verify the date information (year, month, day) manually, see Fig. 17. Since the shooting times are mostly continuous and use the 24-hour clock, it is easy to decide whether the shooting date has changed: for example, if one recognition result is "2359" and the next is "0000", the shooting date of the second picture is one day after that of the first. For the chromosphere images of a given period it therefore suffices to know the start date of the shooting to compute the shooting date of every subsequent picture. Using the interface of Fig. 17, the data of each day are verified; if a date is wrong, only the shooting date of that day's first picture need be corrected, and the dates of the following pictures are then updated automatically by a recursive procedure.
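The date-propagation rule described here can be sketched as follows; the function name and signature are illustrative.

```python
from datetime import date, timedelta

def propagate_dates(first_date, hhmm):
    """Given the date of the first frame and the recognised 24-hour 'HHMM'
    strings of all frames, advance the date whenever the clock value wraps
    around (e.g. '2359' followed by '0000')."""
    dates, current, prev = [], first_date, int(hhmm[0])
    for t in (int(s) for s in hhmm):
        if t < prev:                  # the clock went backwards: a new day
            current += timedelta(days=1)
        dates.append(current)
        prev = t
    return dates

# e.g. propagate_dates(date(1956, 3, 1), ["2358", "2359", "0000", "0001"])
# -> the last two frames fall on 1956-03-02
```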
For the date verification the user opens this interface and fills in the path of the original pictures and the path of the corresponding Excel table. Clicking the "Open" button makes the program open, in turn, the picture numbered as the first of each day in the Excel table and fill its date into the text box on the right. If the date recorded in the Excel table is correct and the previous picture belongs to the previous date, the user simply clicks the "Next Day" button to verify the next date; if the date is wrong, the user locates the picture where the date jumps with the "Last" and "Next" buttons, fills in the text box on the right, and clicks the update button, whereupon the program updates all the following dates automatically. Verification proceeds in this way until the program reaches the last date. In tests, one person completes the date verification of 10,000 pictures in about 10 minutes. In the invention, the time information such as "hour, minute" in the timestamps is extracted automatically by Steps 1-3, so the time cost lies mainly in "Step 4: manual date verification" above. In the traditional way of manually entering the "year, month, day, hour, minute" information picture by picture, one person would need at least two days to enter the timestamp information of 10,000 pictures; the benefit of the invention in improving the efficiency of timestamp entry and in the time saved is therefore significant.

Claims (9)

1. A deep-learning-based method for extracting timestamp information from solar film images, characterised by comprising the following steps:
Step 1: locating and cropping the timestamp information region in a solar chromosphere film image;
Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain individual characters;
Step 3: character recognition, in which a network is first trained on a large number of samples, the trained network is then used to recognise the individual characters obtained in Step 2, and the recognition results are assembled and saved.
2. The deep-learning-based method for extracting timestamp information from solar film images according to claim 1, characterised in that Step 1 comprises the following sub-steps:
Step 1.1, solar-disk removal based on vertical projection:
the image is accumulated along the vertical direction to obtain a 1 × n vector; assuming the image is of size m × n and the pixel value at row i, column j is f_ij, the vertical projection is
S_1j = Σ_{i=1..m} f_ij  (j = 1, …, n)
where S_1j denotes the sum of the pixels in the j-th column of the image and the vector S_1j has size 1 × n; computing the vertical projection of the image allows the position of the solar disk to be judged; the segment of S_1j between indices [400, 1800] is the projection of the solar chromosphere; because the sun is symmetric, it suffices to find the maximum of S_1j to locate the column of the centre of the solar disk, and the part of the image containing the solar disk is then removed according to the pixel width it occupies;
Step 1.2, timestamp-side determination and rotation correction based on variance:
the variance of the sub-image containing the timestamp is far larger than that of the sub-image without it, which identifies the sub-image containing the timestamp; once that sub-image is known, it is rotated to the correct orientation: the left sub-image is rotated 90° clockwise, the right one 90° counter-clockwise;
Step 1.3, fine segmentation of the timestamp character region based on projection:
for an image of size m × n with pixel value x_ij at row i, column j, the horizontal and vertical projections are respectively
S_i1 = Σ_{j=1..n} x_ij  (i = 1, …, m),   S_1j = Σ_{i=1..m} x_ij  (j = 1, …, n)
where S_i1 denotes the sum of the pixels in the i-th row (size m × 1) and S_1j the sum of the pixels in the j-th column (size 1 × n); computing the horizontal and vertical projections of the image gives the exact position of the timestamp region, allowing the image to be segmented precisely.
3. The deep-learning-based method for extracting timestamp information from solar film images according to claim 2, characterised in that:
the image is cropped based on the horizontal and vertical projections obtained in Step 1.3:
to keep the continuity of the image intact, the first point exceeding the projection mean is taken as the start and the last such point as the end, and everything between them is retained; assuming the original image S is of size m × n and the cropped image P of size m′ × n′, the cropping formula is
P = S(a:b, c:d), (a, c > 1, b < m, d < n)
where
a = min{i : x_i > x̄}, b = max{i : x_i > x̄}, c = min{j : y_j > ȳ}, d = max{j : y_j > ȳ};
S(a:b, c:d) denotes rows a to b and columns c to d of image S, x_i is the value of a point of the horizontal projection and x̄ the mean of the horizontal projection, and y_j is the value of a point of the vertical projection and ȳ its mean.
4. The deep-learning-based method for extracting timestamp information from solar film images according to claim 1, characterised in that:
in Step 2, the single-character segmentation proceeds as follows:
the background of the image is first removed with a top-hat operation, noise is then removed with a local-binarisation algorithm, and the character regions are finally extracted with a connected-component algorithm; the algorithm assumes white characters by default, and if the characters are black, connected-component extraction will find no valid region, so whenever no valid region is found the procedure returns to the local-binarisation stage and inverts the colours of the binarised image.
5. The deep-learning-based method for extracting timestamp information from solar film images according to claim 4, characterised in that:
background removal uses the morphological top-hat operation, whose principle is to subtract the morphological opening of the image from the image itself; the top-hat operation eliminates part of the background noise and makes the characters in the image stand out.
6. The deep-learning-based method for extracting timestamp information from solar film images according to claim 4, characterised in that:
noise removal uses the Sauvola local-binarisation algorithm; the character segmentation is completed simply by extracting the connected components whose size matches that of a character.
7. The deep-learning-based method for extracting timestamp information from solar film images according to claim 4, characterised in that:
character regions are extracted with the connected-component algorithm, and invalid regions are further removed by checking whether the height, width and aspect ratio of each connected component match those of a standard character; character height lies in [90, 110], character width in [10, 60], and the height-to-width ratio of a character is not less than 1;
the position of each connected component in the binary image is mapped back to the original image, and the individual character regions of the original image are cropped out one by one; to keep the image sizes consistent, each character image is padded and resized to a standard 28 × 28 image.
8. The deep-learning-based method for extracting timestamp information from solar film images according to claim 1, characterised in that:
in Step 3, character recognition proceeds as follows:
character recognition uses a convolutional neural network from deep learning; the character-recognition network contains two convolutional layers, two pooling layers and one fully connected layer; the first convolutional layer convolves the input 28 × 28 character image with 6 different 5 × 5 kernels, after which the character image becomes a 24 × 24 × 6 feature map; the first pooling layer applies a 2 × 2 sliding-window pooling function to the output of the first convolutional layer, giving a 12 × 12 × 6 feature map; the second convolutional layer applies 12 different 5 × 5 kernels to the pooled feature map, producing an 8 × 8 × 12 feature map; the second pooling layer reduces this to 4 × 4 × 12; the feature map after the second pooling operation is fed into the fully connected layer to obtain the character's feature vector; finally, classifying the feature vector and mapping the class to the actual digit completes the recognition of a time character in the timestamp;
the trained convolutional neural network recognises the individual characters of the time information in the solar chromosphere film images, and the recognised characters are assembled in order, matched with the file name of the original image, and written into an Excel table for later manual verification and database construction.
9. The deep-learning-based method for extracting timestamp information from solar film images according to claim 1, characterised by further comprising Step 4, manual date verification:
for the chromosphere images of a given period, only the year, month and day of the first image need to be entered, and the shooting date of every subsequent image is computed automatically; occasionally there are days on which no solar observation was made, and the wrong, automatically generated date information is then corrected by manual verification.
CN201910765276.1A 2019-08-19 2019-08-19 Deep learning-based sun film image timestamp information extraction method Active CN110533030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910765276.1A CN110533030B (en) 2019-08-19 2019-08-19 Deep learning-based sun film image timestamp information extraction method

Publications (2)

Publication Number Publication Date
CN110533030A true CN110533030A (en) 2019-12-03
CN110533030B CN110533030B (en) 2023-07-14

Family

ID=68663766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910765276.1A Active CN110533030B (en) 2019-08-19 2019-08-19 Deep learning-based sun film image timestamp information extraction method

Country Status (1)

Country Link
CN (1) CN110533030B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6026177A (en) * 1995-08-29 2000-02-15 The Hong Kong University Of Science & Technology Method for identifying a sequence of alphanumeric characters
CN101246551A (en) * 2008-03-07 2008-08-20 北京航空航天大学 Fast license plate locating method
CN101751568A (en) * 2008-12-12 2010-06-23 汉王科技股份有限公司 ID No. locating and recognizing method
CN102402686A (en) * 2011-12-07 2012-04-04 北京云星宇交通工程有限公司 Method for dividing license plate characters based on connected domain analysis
US20150278626A1 (en) * 2014-03-31 2015-10-01 Nidec Sankyo Corporation Character recognition device and character segmentation method
WO2017020723A1 (en) * 2015-08-04 2017-02-09 阿里巴巴集团控股有限公司 Character segmentation method and device and electronic device
CN105528600A (en) * 2015-10-30 2016-04-27 小米科技有限责任公司 Region identification method and device
CN105528606A (en) * 2015-10-30 2016-04-27 小米科技有限责任公司 Region identification method and device
CN106611174A (en) * 2016-12-29 2017-05-03 成都数联铭品科技有限公司 OCR recognition method for unusual fonts
CN108734189A (en) * 2017-04-20 2018-11-02 天津工业大学 Vehicle License Plate Recognition System based on atmospherical scattering model and deep learning under thick fog weather
US20190095739A1 (en) * 2017-09-27 2019-03-28 Harbin Institute Of Technology Adaptive Auto Meter Detection Method based on Character Segmentation and Cascade Classifier
CN108921163A (en) * 2018-06-08 2018-11-30 南京大学 A kind of packaging coding detection method based on deep learning
CN109359695A (en) * 2018-10-26 2019-02-19 东莞理工学院 A kind of computer vision 0-O recognition methods based on deep learning
CN109657665A (en) * 2018-10-31 2019-04-19 广东工业大学 A kind of invoice batch automatic recognition system based on deep learning
CN109784342A (en) * 2019-01-24 2019-05-21 厦门商集网络科技有限责任公司 A kind of OCR recognition methods and terminal based on deep learning model

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
A. Krizhevsky, I. Sutskever, G. E. Hinton: "ImageNet classification with deep convolutional neural networks", Proceedings of the 25th International Conference on Neural Information Processing Systems *
唐铭豆; 陶青川; 冯谦: "Chip surface character detection and recognition system based on neural networks", Modern Computer (Professional Edition), no. 09
曾祥云; 郑胜 et al.: "Background extraction method for hand-drawn sunspot images based on SVM", Microcomputer & Applications
朱明锋; 郑胜; 曾祥云; 徐高贵: "Background extraction method for hand-drawn sunspot images based on SVM", Artificial Intelligence
朱明锋; 郑胜; 曾祥云; 徐高贵: "Background extraction method for hand-drawn sunspot images based on SVM", Artificial Intelligence, 16 December 2016
朱道远; 郑胜; 曾祥云; 徐高贵: "Research on handwritten character segmentation in hand-drawn sunspot drawings", Microcomputer & Applications, no. 20

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898606A (en) * 2020-05-19 2020-11-06 武汉东智科技股份有限公司 Night imaging identification method for superimposing transparent time characters in video image
CN111898606B (en) * 2020-05-19 2023-04-07 武汉东智科技股份有限公司 Night imaging identification method for superimposing transparent time characters in video image

Also Published As

Publication number Publication date
CN110533030B (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant