CN110533030A - Deep-learning-based method for extracting timestamp information from solar film images - Google Patents

Deep-learning-based method for extracting timestamp information from solar film images

Info

Publication number
CN110533030A
Authority
CN
China
Prior art keywords
picture
character
image
sun
timestamp
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910765276.1A
Other languages
Chinese (zh)
Other versions
CN110533030B (en)
Inventor
曾曙光
左肖雄
郑胜
张佳锋
曾祥云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Three Gorges University CTGU
Original Assignee
China Three Gorges University CTGU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Three Gorges University (CTGU)
Priority to CN201910765276.1A
Publication of CN110533030A
Application granted
Publication of CN110533030B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/20 - Image preprocessing
    • G06V 10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/60 - Type of objects
    • G06V 20/62 - Text, e.g. of license plates, overlay texts or captions on TV images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 - Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

A deep-learning-based method for extracting timestamp information from solar film images, comprising: Step 1: locating and cropping the timestamp information region in a solar chromosphere film image, the timestamp recording the year, month, day, hour and minute of the shooting time; Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain individual characters; Step 3: character recognition, in which a network is first trained on a large number of samples, the trained network is then used to recognise the individual characters obtained in Step 2, and the recognition results are assembled and saved. The method automatically recognises the digital timestamps in solar-observation film images and outputs the recognised time information, reducing the workload of manually reading and transcribing the times, thereby accelerating the digitisation of this film archive and making these precious historical data more readily available for solar-physics research.

Description

Deep-learning-based method for extracting timestamp information from solar film images
Technical field
The present invention relates to the field of solar-observation image processing, and in particular to a deep-learning-based method for extracting timestamp information from solar film images.
Background technique
The solar chromosphere is the layer of the solar atmosphere above the photosphere. As the transition region between the photosphere and the corona, its magnetic field is relatively unstable and frequently produces violent flare eruptions. Flare emission in the chromosphere usually appears as elongated ribbons on either side of the magnetic polarity inversion line (PIL), which is regarded as evidence of the characteristic configuration of magnetic reconnection. To study flare eruptions, researchers have recorded the solar chromosphere photographically over long periods. Because the historical archive is enormous, the time information of a very large batch of chromosphere images is still recorded only as characters within the images themselves and has not been converted into digital information that a computer can read directly, which greatly hinders scientific work based on these data.
Digitising the shooting times of the images deepens, on the one hand, the mining of the useful information in the historical archive; on the other hand, it greatly reduces the retrieval workload of researchers, helping them obtain more valuable results from these data and substantially advancing their research.
Historical solar-observation images are mostly preserved on film, with the shooting time printed on the film. To let researchers use these image archives effectively, the timestamp information on the film must be extracted. Since the number of images is enormous, manual recognition and extraction would be extremely time-consuming and laborious; automatically recognising the timestamp information in the images by computer is therefore the key to using these data efficiently.
Summary of the invention
To solve the above technical problems, the present invention provides a deep-learning-based method for extracting timestamp information from solar film images. The method performs automatic machine recognition of the digital timestamps in solar-observation film images and outputs the recognised time information, reducing the workload of manual recognition and time entry, thereby accelerating the digitisation of this film archive and making these precious historical data more readily available for solar-physics research.
The technical solution adopted by the invention is as follows:
A deep-learning-based method for extracting timestamp information from solar film images, comprising the following steps:
Step 1: locate and crop the timestamp information region in the solar chromosphere film image;
the timestamp in a solar chromosphere film image records the year, month, day, hour and minute of the shooting time;
Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain individual characters;
Step 3: character recognition, in which a network is first trained on a large number of samples, the trained network is then used to recognise the individual characters obtained in Step 2, and the recognition results are assembled and saved.
Step 1 comprises the following sub-steps:
Step 1.1, solar-disk removal based on vertical projection:
The image is accumulated along the vertical direction to obtain a 1 × n vector. Assume the image is of size m × n and the pixel value at row i, column j is f_ij; the vertical projection is then
S_1j = Σ_{i=1..m} f_ij  (j = 1, …, n)
where S_1j denotes the sum of the pixels in the j-th column of the image, and the vector S_1j has size 1 × n. Computing the vertical projection of the image allows the position of the solar disk to be judged: the segment of S_1j between indices [400, 1800] is the projection of the solar chromosphere. Because the sun is symmetric, it suffices to find the maximum of S_1j to locate the column of the centre of the solar disk; the part of the image containing the solar disk is then removed according to the pixel width it occupies.
Step 1.2, timestamp-side determination and rotation correction based on variance:
The variance of the sub-image containing the timestamp is far larger than that of the sub-image without it, which identifies the sub-image containing the timestamp. Once that sub-image is known, it must be rotated to the correct orientation: the left sub-image is rotated 90° clockwise, the right one 90° counter-clockwise.
Step 1.3, fine segmentation of the timestamp character region based on projection:
For an image of size m × n with pixel value x_ij at row i, column j, the horizontal and vertical projections are respectively
S_i1 = Σ_{j=1..n} x_ij  (i = 1, …, m),   S_1j = Σ_{i=1..m} x_ij  (j = 1, …, n)
where S_i1 denotes the sum of the pixels in the i-th row (size m × 1) and S_1j the sum of the pixels in the j-th column (size 1 × n). Computing the horizontal and vertical projections of the image gives the exact position of the timestamp region, allowing the image to be segmented precisely.
Step 1.4, cropping the timestamp region:
The image is cropped based on its horizontal and vertical projections. To keep the continuity of the image intact, the first point exceeding the projection mean is taken as the start and the last such point as the end, and everything between them is retained. Assume the original image S is of size m × n and the cropped image P of size m′ × n′; the cropping formula is
P = S(a:b, c:d), (a, c > 1, b < m, d < n)
where
a = min{i : x_i > x̄}, b = max{i : x_i > x̄}, c = min{j : y_j > ȳ}, d = max{j : y_j > ȳ}.
Here S(a:b, c:d) denotes rows a to b and columns c to d of image S; x_i is the value of a point of the horizontal projection and x̄ the mean of the horizontal projection; y_j is the value of a point of the vertical projection and ȳ its mean.
In Step 2, the single-character segmentation proceeds as follows:
The background of the image is first removed with a top-hat operation, noise is then removed with a local-binarisation algorithm, and the character regions are finally extracted with a connected-component algorithm. The algorithm assumes white characters by default; if the characters are black, connected-component extraction will find no valid region, so whenever no valid region is found the procedure returns to the local-binarisation stage and inverts the colours of the binarised image.
Background removal uses the morphological top-hat operation, whose principle is to subtract the morphological opening of the image from the image itself. The top-hat operation eliminates part of the background noise and makes the characters in the image stand out.
Noise removal uses the Sauvola local-binarisation algorithm. Character segmentation is then completed simply by extracting the connected components whose size matches that of a character; connected-component extraction is fast and saves considerable running time.
Character regions are extracted with the connected-component algorithm: the image is first locally binarised, and connected components that are too large or too small are then removed to eliminate part of the interference. A local-binarisation window of 16 × 16 works well, and the connected-component area of a character lies in [500, 5000]. Some invalid regions still remain after this processing, so the invention further removes them by checking whether the height, width and aspect ratio of each connected component match those of a standard character. According to the statistics, character height lies in [90, 110], character width in [10, 60], and the height-to-width ratio of a character is not less than 1.
The position of each connected component in the binary image is mapped back to the original image, and the individual character regions of the original image are cropped out one by one. To keep the image sizes consistent, the invention pads each character image and resizes it to a standard 28 × 28 image.
In Step 3, character recognition proceeds as follows:
Character recognition uses a convolutional neural network from deep learning. The character-recognition network built by the invention contains two convolutional layers, two pooling layers and one fully connected layer. The first convolutional layer convolves the input 28 × 28 character image with 6 different 5 × 5 kernels; after this layer the character image becomes a 24 × 24 × 6 feature map. The first pooling layer applies a 2 × 2 sliding-window pooling function to the output of the first convolutional layer, giving a 12 × 12 × 6 feature map. The second convolutional layer applies 12 different 5 × 5 kernels to the pooled feature map, producing an 8 × 8 × 12 feature map, and the second pooling layer reduces this to 4 × 4 × 12. The feature map after the second pooling operation is fed into the fully connected layer to obtain the character's feature vector. Finally, classifying the feature vector and mapping the class to the actual digit completes the recognition of a time character in the timestamp.
The trained convolutional neural network recognises the individual characters of the time information in the solar chromosphere film images; the recognised characters are assembled in order, matched with the file name of the original image, and written into an Excel table for later manual verification and database construction.
The method further comprises Step 4, manual date verification:
For the chromosphere images of a given period, only the date of the first image needs to be entered; the shooting date of every subsequent image is then computed automatically. Occasionally there are days on which no solar observation was made, in which case the wrong, automatically generated date information is corrected by manual verification.
The deep-learning-based method for extracting timestamp information from solar film images of the present invention has the following technical effects:
1) The invention proposes a deep-learning-based timestamp-information extraction method, used to systematically recognise and organise the time information of the more than 7 million digitised solar chromosphere film images scanned from the 1956-2003 archive of the US National Solar Observatory. First, the timestamp information region in each image is located and segmented; second, top-hat filtering, local binarisation, connected-component screening and related methods are used to suppress noise and segment the characters of the timestamp region; then 10,000 labelled character images are selected to train a convolutional neural network, and the recognition performance of the resulting network is tested; finally, the trained network batch-recognises the timestamp information of 10,000 chromosphere images and the recognition results are analysed quantitatively. The results show that the method locates and recognises the timestamp information in scanned solar film images automatically, accurately and rapidly.
2) Using a convolutional-neural-network method from deep learning, the problem of recognising the time information in nearly 50 years of solar chromosphere film pictures taken by the US National Observatory is studied. The results show that the method is highly applicable to character recognition in these pictures: the recognition accuracy reaches 98% or more, and processing a picture takes no more than 0.1 s on average, which meets the practical requirements on recognition speed and quality. The method is also highly portable and offers a valuable reference for similar later problems.
Detailed description of the invention
The present invention is further explained below with reference to the accompanying drawings and embodiments:
Fig. 1 is a structure diagram of a generic convolutional neural network.
Fig. 2 is a flow chart of timestamp character recognition.
Fig. 3(a) is a first solar chromosphere film image with timestamp information;
Fig. 3(b) is a second solar chromosphere film image with timestamp information;
Fig. 3(c) is a third solar chromosphere film image with timestamp information.
Fig. 4 is a schematic diagram of the timestamp information region in an image.
Fig. 5 is a plot of the S_1j vector.
Fig. 6 shows the images obtained after removing the solar disk.
Fig. 7 is the picture containing the timestamp.
Fig. 8(a) is a first projection-vector plot;
Fig. 8(b) is a second projection-vector plot.
Fig. 9 shows the result of cropping the timestamp region.
Fig. 10 is a flow chart of the single-character extraction algorithm.
Fig. 11(a) shows the intensity distribution of the characters (before the top-hat operation);
Fig. 11(b) shows the intensity distribution of the characters (after the top-hat operation).
Fig. 12 is the binarised picture.
Fig. 13 is the binary image after noise removal.
Fig. 14 is the binary image after eliminating invalid regions.
Fig. 15 shows the character-segmentation result.
Fig. 16 is a structure diagram of the character-recognition convolutional neural network.
Fig. 17 shows the graphical interface for date verification.
Specific embodiments
Embodiments of the present invention are illustrated below by way of specific examples:
The architecture of a generic convolutional neural network comprises an input layer, convolutional layers, pooling layers, fully connected layers and an output layer, with the structure shown in Fig. 1. The output layer classifies the feature vectors produced from the input-layer data by softmax logistic regression; when the input is character image data, the classification result of the output layer assigns each character image to a class, thereby achieving character recognition. A convolutional neural network may contain as many convolutional, pooling and fully connected layers as needed; Fig. 1 merely shows the general form.
Extracting the timestamp information from scanned solar chromosphere film images by means of a convolutional neural network (CNN) is divided into three parts, as shown in Fig. 2:
Step 1: locate and crop the timestamp information region of the picture;
Step 2: further segment the characters of the timestamp information region to obtain individual characters;
Step 3: first train the network on a large number of samples, then recognise the segmented characters with the trained network, and assemble and save the recognition results.
Step 1: locating and cropping the timestamp information region of the picture.
The original solar chromosphere film pictures are shown in Figs. 3(a), 3(b) and 3(c). Each picture has a resolution of 1600 × 2048. The timestamp is normally placed on the left or right side of the picture, but its position is not fixed. The time characters fall broadly into two styles, as in Figs. 3(a) and 3(b), and the styles differ from each other, so the characters must be recognised by class. Moreover, the clarity varies from picture to picture: most pictures are rather dim and their characters hard to read, as in Fig. 3(c), so every picture must be pre-processed. Of the whole picture, only the timestamp characters are needed, i.e. the part inside the red box in Fig. 4. The timestamp records the year, month, day, hour, minute and second at which the photograph was taken (the part inside the yellow box in Fig. 4); given the temporal precision of these observations, only the year, month, day, hour and minute need to be obtained. Because the position of the time information is not fixed, the timestamp region must first be located and cropped. The timestamp lies on the left or right side and never on the solar disk, so the part of the picture containing the sun is cut away first and the side containing the timestamp is then located.
Step 1.1: solar-disk removal based on vertical projection.
The vertical projection method inspects the distribution of the pixel values of an image along the vertical direction. It accumulates the image along the vertical direction into a 1 × n vector: assuming the picture is of size m × n and the pixel value at row i, column j is f_ij, the vertical projection is
S_1j = Σ_{i=1..m} f_ij  (j = 1, …, n)
where S_1j denotes the sum of the pixels in the j-th column of the image, and the vector S_1j has size 1 × n. Computing the vertical projection of the picture allows the position of the solar disk to be judged. The S_1j vector obtained by projecting Fig. 3(a) vertically is shown in Fig. 5, from which it can be seen that the segment of S_1j between indices [400, 1800] is the projection of the solar chromosphere. Because the sun is symmetric, it suffices to find the maximum of S_1j to locate the column of the centre of the solar disk, and the part of the picture containing the solar disk is removed according to the pixel width it occupies. Fig. 6 shows the two small sub-images obtained after removing the solar disk.
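As an illustration, a minimal NumPy sketch of this step follows; the function name and the disk-radius estimate taken from the above-mean span of the projection are our assumptions, not part of the patent.

```python
import numpy as np

def remove_solar_disk(img):
    """Split off the side strips of a chromosphere scan by removing the
    solar disk located through the vertical projection S_1j."""
    col_sum = img.sum(axis=0, dtype=np.int64)  # S_1j: the n column sums
    centre = int(np.argmax(col_sum))           # the sun is symmetric, so the
                                               # projection peak is its centre
    # Assumed radius estimate: half the span where the projection exceeds
    # its mean (the bright disk dominates the projection).
    above = np.where(col_sum > col_sum.mean())[0]
    radius = (above[-1] - above[0]) // 2
    left = img[:, : max(centre - radius, 0)]             # strip left of the disk
    right = img[:, min(centre + radius, img.shape[1]):]  # strip right of the disk
    return left, right
```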
Step 1.2: timestamp-side determination and rotation correction based on variance.
A picture is essentially a matrix of pixels, and the pixel values are reflected in the colours of the picture: in a binary image "0" denotes black and "1" denotes white. The images in Fig. 6 are not binary but 256-level greyscale images, in which each pixel value gives the brightness of that point on a 256-level scale, with higher values meaning brighter pixels. As Fig. 6 shows, in the sub-image containing the timestamp the timestamp appears as high-brightness points, whereas in the sub-image without the timestamp most pixels are dark. This step therefore determines which sub-image carries the timestamp from the variance of the pixel values of the image matrix. For a picture of size m × n the variance is
σ² = (1/(m·n)) Σ_{i=1..m} Σ_{j=1..n} (x_ij - x̄)²
where x_ij is the pixel value at position [i, j], x̄ is the mean of all pixel values, and m·n is the total number of pixels.
Clearly the variance of the sub-image containing the timestamp is far larger than that of the sub-image without it, which identifies the sub-image where the timestamp lies. Once that sub-image is known, it must be rotated to the correct orientation: the left sub-image is rotated 90° clockwise, the right one 90° counter-clockwise. The rotation formula is
x′ = x·cos β - y·sin β,  y′ = x·sin β + y·cos β
where (x, y) is the original pixel position, (x′, y′) the pixel position after the rotation, and β the counter-clockwise rotation angle. Taking Fig. 6 as an example, the sub-image selected by the variance test and rotated is shown in Fig. 7.
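Continuing the sketch above, the variance test and the rotation can be written as follows; np.rot90 stands in for the explicit rotation formula, and the function name is again illustrative.

```python
def pick_timestamp_strip(left, right):
    """Keep the strip whose pixel variance is larger (it carries the bright
    timestamp characters) and rotate it upright."""
    if left.size and left.var() > right.var():
        return np.rot90(left, k=-1)   # left strip: 90 degrees clockwise
    return np.rot90(right, k=1)       # right strip: 90 degrees counter-clockwise
```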
As Fig. 7 shows, the sub-image containing the timestamp still carries a large amount of useless information, which affects the computation speed to some extent; given the huge number of pictures, these useless regions would also cause unnecessary storage consumption.
Step 1.3: the invention performs the fine segmentation of the timestamp region with the horizontal-vertical projection method.
The horizontal-vertical projection method inspects the distribution of the pixel values of an image along the horizontal and vertical directions respectively; it is often used to project a target region precisely for a later segmentation step. It accumulates the image along the horizontal and vertical directions into two vectors: for a picture of size m × n with pixel value x_ij at row i, column j, the horizontal and vertical projections are respectively
S_i1 = Σ_{j=1..n} x_ij  (i = 1, …, m),   S_1j = Σ_{i=1..m} x_ij  (j = 1, …, n)
where S_i1 denotes the sum of the pixels in the i-th row (size m × 1) and S_1j the sum of the pixels in the j-th column (size 1 × n). Computing the horizontal and vertical projections of the picture gives the exact position of the timestamp region, allowing the picture to be segmented precisely. Taking Fig. 7 as an example, the horizontal and vertical projection vectors are shown in Figs. 8(a) and 8(b).
Step 1.4: cropping the timestamp region.
The picture is cropped based on its horizontal and vertical projections. Figs. 8(a) and 8(b) show that the projections corresponding to the timestamp region are higher in both directions; exploiting this, keeping only the parts where the projection exceeds its mean completes the segmentation of the timestamp region. To keep the continuity of the picture intact, the first point exceeding the mean is taken as the start and the last such point as the end, and everything between them is retained. Assume the original image S is of size m × n and the cropped picture P of size m′ × n′; the cropping formula is
P = S(a:b, c:d), (a, c > 1, b < m, d < n)   (5)
where
a = min{i : x_i > x̄}, b = max{i : x_i > x̄}, c = min{j : y_j > ȳ}, d = max{j : y_j > ȳ}.
Here S(a:b, c:d) denotes rows a to b and columns c to d of picture S; x_i is the value of a point of the horizontal projection and x̄ the mean of the horizontal projection; y_j is the value of a point of the vertical projection and ȳ its mean. Taking Fig. 7 as an example, the cropped result is shown in Fig. 9.
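The cropping rule then amounts to the following sketch, continuing the NumPy snippets above.

```python
def crop_to_timestamp(strip):
    """Crop to S(a:b, c:d): the rows and columns between the first and last
    points where the horizontal / vertical projections exceed their means."""
    rows = strip.sum(axis=1, dtype=np.int64)  # horizontal projection S_i1
    cols = strip.sum(axis=0, dtype=np.int64)  # vertical projection S_1j
    r = np.where(rows > rows.mean())[0]
    c = np.where(cols > cols.mean())[0]
    return strip[r[0]:r[-1] + 1, c[0]:c[-1] + 1]
```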
Step 2: single-character segmentation.
As Fig. 9 shows, the date characters are small and blurred and are easily deleted as noise by the later processing, which would disturb the result severely. Therefore only the hour-minute information is recognised during the recognition stage; the dates are filled in manually, which involves little extra work.
In practice, segmenting the individual characters is the most difficult part. Fig. 9 was cropped from a picture of relatively good quality; in reality most pictures resemble Fig. 3(c), whose characters are blurred and some even illegible to the human eye. Obtaining the individual characters is therefore harder than obtaining the character region, and it is also the stage that most easily affects the final recognition result. In addition, the character formats of the timestamp region differ, as shown in Figs. 3(a), 3(b) and 3(c), so the generality of the algorithm must be kept in mind throughout. The single-character segmentation process is shown in Fig. 10: the background of the picture is first removed with a top-hat operation, noise is then removed with a local-binarisation algorithm, and the character regions are finally extracted with a connected-component algorithm. The characters fall into two classes, shown in white and in black respectively, as in Figs. 3(a) and 3(b). The algorithm assumes white characters by default, so if the characters are black, connected-component extraction finds no valid region; in that case the procedure returns to the local-binarisation stage and inverts the colours of the binarised picture.
Step 2.1: background removal is based on the top-hat operation, whose principle is to subtract the morphological opening of the image from the image itself:
topimg = tophat(img, element) = img - open(img, element)   (6)
where topimg is the picture after the top-hat operation, img the original picture, and element the structuring element used for the top-hat and opening operations.
Processing Fig. 9 with a 29 × 29 structuring element, Figs. 11(a) and 11(b) compare the intensity distributions before and after the top-hat operation. Fig. 11(b) shows that the top-hat operation eliminates part of the background noise and makes the characters in the image stand out. Although the noise cannot be removed completely, the workload of the later stages is reduced and excessive noise is prevented from corrupting the local binarisation.
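With OpenCV, formula (6) with the 29 × 29 element can be sketched as follows; ts is our placeholder name for the cropped timestamp picture produced in Step 1.

```python
import cv2

# Top-hat: img - open(img, element). The opening estimates the slowly
# varying background, so subtracting it keeps the thin bright characters.
element = cv2.getStructuringElement(cv2.MORPH_RECT, (29, 29))
topimg = cv2.morphologyEx(ts, cv2.MORPH_TOPHAT, element)
```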
Step 2.2: noise removal uses the Sauvola local-binarisation algorithm.
The Sauvola algorithm proceeds as follows:
Step 1: compute the mean MEAN and standard deviation STD of the pixels f(x, y) within an n × n window;
Step 2: compute the threshold T(x, y) of pixel f(x, y) according to
T(x, y) = MEAN(x, y) · [1 + k · (STD(x, y)/N - 1)]
where k is a user-defined parameter with 0 < k < 1, and N is the dynamic range of the standard deviation.
With a window size of n = 35 and k = 0.08, the processed picture is shown in Fig. 12.
From the binarised picture it can be seen that the characters can be separated from the background, although the background still contains much noise that is either too large or too small. It then suffices to extract the connected components whose size matches that of a character to complete the character segmentation; connected-component extraction is fast and saves considerable running time.
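A sketch of this step using scikit-image's Sauvola implementation follows; mapping the window size and k onto threshold_sauvola's parameters, and leaving the dynamic range at the library default, are our assumptions.

```python
from skimage.filters import threshold_sauvola

# Local Sauvola threshold T(x, y) = MEAN * (1 + k * (STD / N - 1)),
# computed in a 35 x 35 window with k = 0.08 as stated above.
T = threshold_sauvola(topimg, window_size=35, k=0.08)
binary = (topimg > T).astype("uint8")  # white characters on a black background
```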
Step 2.3: character-region extraction is based on the connected-component algorithm.
To remove as much noise as possible while retaining the useful information, class-1 pictures are first locally binarised, and connected components that are too large or too small are then removed to eliminate part of the interference; class-2 pictures are binarised with a threshold of 0.25 and inverted to obtain white characters, after which the processing is the same for both classes. Experiments show that a local-binarisation window of 16 × 16 works well and that the connected-component area of a character lies in [500, 5000]. The processing result is shown in Fig. 13.
As Fig. 13 shows, some invalid regions still survive the above processing, so the invention further removes them by checking whether the height, width and aspect ratio of each connected component match those of a standard character. According to the statistics, character height lies in [90, 110], character width in [10, 60], and the height-to-width ratio of a character is not less than 1. The result is shown in Fig. 14.
As Fig. 14 shows, the invalid regions of the picture have been completely removed. By mapping the position of each connected component in the binary image back to the original picture, the individual character regions of the original picture can be cropped out one by one. To keep the picture sizes consistent, the invention pads each picture and resizes it to a standard 28 × 28 image; the final single-character segmentation result is shown in Fig. 15.
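The screening and the 28 × 28 normalisation can be sketched as follows, continuing the snippets above; the thresholds come from the text, while the square padding before resizing is an illustrative choice to avoid distorting the digits.

```python
def extract_characters(binary, original):
    """Keep components with area in [500, 5000], height in [90, 110],
    width in [10, 60] and height >= width; crop and normalise to 28 x 28."""
    n, _, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
    boxes = []
    for i in range(1, n):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if 500 <= area <= 5000 and 90 <= h <= 110 and 10 <= w <= 60 and h >= w:
            boxes.append((x, y, w, h))
    boxes.sort()                                # left-to-right reading order
    chars = []
    for x, y, w, h in boxes:
        glyph = original[y:y + h, x:x + w]
        side = max(w, h)                        # pad to a square first
        canvas = np.zeros((side, side), glyph.dtype)
        canvas[(side - h) // 2:(side - h) // 2 + h,
               (side - w) // 2:(side - w) // 2 + w] = glyph
        chars.append(cv2.resize(canvas, (28, 28)))
    return chars
```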
Step 3: character recognition.
Character recognition uses a convolutional neural network from deep learning. The character-recognition network built by the invention contains two convolutional layers, two pooling layers and one fully connected layer, as shown in Fig. 16. The first convolutional layer convolves the input 28 × 28 character picture with 6 different 5 × 5 kernels; after this layer the character picture becomes a 24 × 24 × 6 feature map. The first pooling layer applies a 2 × 2 sliding-window pooling function to the output of the first convolutional layer, giving a 12 × 12 × 6 feature map. The second convolutional layer applies 12 different 5 × 5 kernels to the pooled feature map, producing an 8 × 8 × 12 feature map, and the second pooling layer reduces this to 4 × 4 × 12. The feature map after the second pooling operation is fed into the fully connected layer to obtain the character's feature vector. Finally, classifying the feature vector and mapping the class to the actual digit completes the recognition of a time character in the timestamp.
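For illustration, the described layer sequence can be reproduced in Keras as the following sketch; the ten output classes (digits 0-9) and the ReLU activations after the convolutions are assumptions based on Step 3.3 below.

```python
from tensorflow import keras
from tensorflow.keras import layers

# 28x28 -> conv 6@5x5 -> 24x24x6 -> pool 2x2 -> 12x12x6
#       -> conv 12@5x5 -> 8x8x12 -> pool 2x2 -> 4x4x12 -> FC -> class
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(6, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Conv2D(12, kernel_size=5, activation="relu"),
    layers.MaxPooling2D(pool_size=2),
    layers.Flatten(),                        # 4 * 4 * 12 = 192 features
    layers.Dense(10, activation="softmax"),  # assumed: ten digit classes
])
```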
In the present invention, the training of the convolutional neural network for time-character recognition is divided into the following three steps:
Step 3.1: obtain single-character pictures from the chromosphere images and label them to form the data samples needed to train the network.
Step 3.2: assemble the sample data into a 28 × 28 × N matrix as the X vector of the input layer, where N is the number of character samples; the digit label corresponding to each slice of X forms the Y vector of the input layer.
Step 3.3: train the network by forward propagation and back-propagation (BP), iteratively updating its coefficients, using the ReLU activation function and the max-pooling function, until a network structure with high recognition accuracy is obtained; the input vectors X and Y are fed through 100 iterations.
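Continuing the Keras sketch, Step 3.3 might be written as follows; the optimiser is our choice, since the text specifies only forward/back-propagation, ReLU, max pooling and 100 iterations.

```python
# X: (N, 28, 28, 1) character images and Y: (N,) digit labels from Step 3.2.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, Y, epochs=100)
```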
The trained convolutional neural network recognises the individual characters of the time information in the solar chromosphere film pictures; the recognised characters are assembled in order, matched with the file name of the original picture, and automatically written into an Excel table for later manual verification and database construction.
Step 4: manual date verification, in which the automatically generated date information of the pictures is checked manually for errors.
After the time information such as "hour, minute" in the timestamps has been recognised automatically, the essential remaining step is to verify the date information (year, month, day) manually, see Fig. 17. Since the shooting times are mostly continuous and use the 24-hour clock, it is easy to decide whether the shooting date has changed: for example, if one recognition result is "2359" and the next is "0000", the shooting date of the second picture is one day after that of the first. For the chromosphere images of a given period it therefore suffices to know the start date of the shooting to compute the shooting date of every subsequent picture. Using the interface of Fig. 17, the data of each day are verified; if a date is wrong, only the shooting date of that day's first picture need be corrected, and the dates of the following pictures are then updated automatically by a recursive procedure.
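The date-propagation rule described here can be sketched as follows; the function name and signature are illustrative.

```python
from datetime import date, timedelta

def propagate_dates(first_date, hhmm):
    """Given the date of the first frame and the recognised 24-hour 'HHMM'
    strings of all frames, advance the date whenever the clock value wraps
    around (e.g. '2359' followed by '0000')."""
    dates, current, prev = [], first_date, int(hhmm[0])
    for t in (int(s) for s in hhmm):
        if t < prev:                  # the clock went backwards: a new day
            current += timedelta(days=1)
        dates.append(current)
        prev = t
    return dates

# e.g. propagate_dates(date(1956, 3, 1), ["2358", "2359", "0000", "0001"])
# -> the last two frames fall on 1956-03-02
```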
For the date verification the user opens this interface and fills in the path of the original pictures and the path of the corresponding Excel table. Clicking the "Open" button makes the program open, in turn, the picture numbered as the first of each day in the Excel table and fill its date into the text box on the right. If the date recorded in the Excel table is correct and the previous picture belongs to the previous date, the user simply clicks the "Next Day" button to verify the next date; if the date is wrong, the user locates the picture where the date jumps with the "Last" and "Next" buttons, fills in the text box on the right, and clicks the update button, whereupon the program updates all the following dates automatically. Verification proceeds in this way until the program reaches the last date. In tests, one person completes the date verification of 10,000 pictures in about 10 minutes. In the invention, the time information such as "hour, minute" in the timestamps is extracted automatically by Steps 1-3, so the time cost lies mainly in "Step 4: manual date verification" above. In the traditional way of manually entering the "year, month, day, hour, minute" information picture by picture, one person would need at least two days to enter the timestamp information of 10,000 pictures; the benefit of the invention in improving the efficiency of timestamp entry and in the time saved is therefore significant.

Claims (9)

1. A deep-learning-based method for extracting timestamp information from solar film images, characterised by comprising the following steps:
Step 1: locating and cropping the timestamp information region in a solar chromosphere film image;
Step 2: single-character segmentation, in which the characters of the timestamp information region are further segmented to obtain individual characters;
Step 3: character recognition, in which a network is first trained on a large number of samples, the trained network is then used to recognise the individual characters obtained in Step 2, and the recognition results are assembled and saved.
2. The deep-learning-based method for extracting timestamp information from solar film images according to claim 1, characterised in that Step 1 comprises the following sub-steps:
Step 1.1, solar-disk removal based on vertical projection:
the image is accumulated along the vertical direction to obtain a 1 × n vector; assuming the image is of size m × n and the pixel value at row i, column j is f_ij, the vertical projection is
S_1j = Σ_{i=1..m} f_ij  (j = 1, …, n)
where S_1j denotes the sum of the pixels in the j-th column of the image and the vector S_1j has size 1 × n; computing the vertical projection of the image allows the position of the solar disk to be judged; the segment of S_1j between indices [400, 1800] is the projection of the solar chromosphere; because the sun is symmetric, it suffices to find the maximum of S_1j to locate the column of the centre of the solar disk, and the part of the image containing the solar disk is then removed according to the pixel width it occupies;
Step 1.2, timestamp-side determination and rotation correction based on variance:
the variance of the sub-image containing the timestamp is far larger than that of the sub-image without it, which identifies the sub-image containing the timestamp; once that sub-image is known, it is rotated to the correct orientation: the left sub-image is rotated 90° clockwise, the right one 90° counter-clockwise;
Step 1.3, fine segmentation of the timestamp character region based on projection:
for an image of size m × n with pixel value x_ij at row i, column j, the horizontal and vertical projections are respectively
S_i1 = Σ_{j=1..n} x_ij  (i = 1, …, m),   S_1j = Σ_{i=1..m} x_ij  (j = 1, …, n)
where S_i1 denotes the sum of the pixels in the i-th row (size m × 1) and S_1j the sum of the pixels in the j-th column (size 1 × n); computing the horizontal and vertical projections of the image gives the exact position of the timestamp region, allowing the image to be segmented precisely.
3. The deep-learning-based method for extracting timestamp information from solar film images according to claim 2, characterised in that:
the image is cropped based on the horizontal and vertical projections obtained in Step 1.3:
to keep the continuity of the image intact, the first point exceeding the projection mean is taken as the start and the last such point as the end, and everything between them is retained; assuming the original image S is of size m × n and the cropped image P of size m′ × n′, the cropping formula is
P = S(a:b, c:d), (a, c > 1, b < m, d < n)
where
a = min{i : x_i > x̄}, b = max{i : x_i > x̄}, c = min{j : y_j > ȳ}, d = max{j : y_j > ȳ};
S(a:b, c:d) denotes rows a to b and columns c to d of image S, x_i is the value of a point of the horizontal projection and x̄ the mean of the horizontal projection, and y_j is the value of a point of the vertical projection and ȳ its mean.
4. The deep-learning-based method for extracting timestamp information from solar film images according to claim 1, characterised in that:
in Step 2, the single-character segmentation proceeds as follows:
the background of the image is first removed with a top-hat operation, noise is then removed with a local-binarisation algorithm, and the character regions are finally extracted with a connected-component algorithm; the algorithm assumes white characters by default, and if the characters are black, connected-component extraction will find no valid region, so whenever no valid region is found the procedure returns to the local-binarisation stage and inverts the colours of the binarised image.
5. The deep-learning-based method for extracting timestamp information from solar film images according to claim 4, characterised in that:
background removal uses the morphological top-hat operation, whose principle is to subtract the morphological opening of the image from the image itself; the top-hat operation eliminates part of the background noise and makes the characters in the image stand out.
6. The deep-learning-based method for extracting timestamp information from solar film images according to claim 4, characterised in that:
noise removal uses the Sauvola local-binarisation algorithm; the character segmentation is completed simply by extracting the connected components whose size matches that of a character.
7. The deep-learning-based method for extracting timestamp information from solar film images according to claim 4, characterised in that:
character regions are extracted with the connected-component algorithm, and invalid regions are further removed by checking whether the height, width and aspect ratio of each connected component match those of a standard character; character height lies in [90, 110], character width in [10, 60], and the height-to-width ratio of a character is not less than 1;
the position of each connected component in the binary image is mapped back to the original image, and the individual character regions of the original image are cropped out one by one; to keep the image sizes consistent, each character image is padded and resized to a standard 28 × 28 image.
8. The deep-learning-based method for extracting timestamp information from solar film images according to claim 1, characterised in that:
in Step 3, character recognition proceeds as follows:
character recognition uses a convolutional neural network from deep learning; the character-recognition network contains two convolutional layers, two pooling layers and one fully connected layer; the first convolutional layer convolves the input 28 × 28 character image with 6 different 5 × 5 kernels, after which the character image becomes a 24 × 24 × 6 feature map; the first pooling layer applies a 2 × 2 sliding-window pooling function to the output of the first convolutional layer, giving a 12 × 12 × 6 feature map; the second convolutional layer applies 12 different 5 × 5 kernels to the pooled feature map, producing an 8 × 8 × 12 feature map; the second pooling layer reduces this to 4 × 4 × 12; the feature map after the second pooling operation is fed into the fully connected layer to obtain the character's feature vector; finally, classifying the feature vector and mapping the class to the actual digit completes the recognition of a time character in the timestamp;
the trained convolutional neural network recognises the individual characters of the time information in the solar chromosphere film images, and the recognised characters are assembled in order, matched with the file name of the original image, and written into an Excel table for later manual verification and database construction.
9. The deep-learning-based method for extracting timestamp information from solar film images according to claim 1, characterised by further comprising Step 4, manual date verification:
for the chromosphere images of a given period, only the year, month and day of the first image need to be entered, and the shooting date of every subsequent image is computed automatically; occasionally there are days on which no solar observation was made, and the wrong, automatically generated date information is then corrected by manual verification.
CN201910765276.1A 2019-08-19 2019-08-19 Deep learning-based sun film image timestamp information extraction method Active CN110533030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910765276.1A CN110533030B (en) 2019-08-19 2019-08-19 Deep learning-based sun film image timestamp information extraction method

Publications (2)

Publication Number Publication Date
CN110533030A true CN110533030A (en) 2019-12-03
CN110533030B CN110533030B (en) 2023-07-14

Family

ID=68663766

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910765276.1A Active CN110533030B (en) 2019-08-19 2019-08-19 Deep learning-based sun film image timestamp information extraction method

Country Status (1)

Country Link
CN (1) CN110533030B (en)

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6026177A (en) * 1995-08-29 2000-02-15 The Hong Kong University Of Science & Technology Method for identifying a sequence of alphanumeric characters
CN101246551A (en) * 2008-03-07 2008-08-20 北京航空航天大学 Fast license plate locating method
CN101751568A (en) * 2008-12-12 2010-06-23 汉王科技股份有限公司 ID No. locating and recognizing method
CN102402686A (en) * 2011-12-07 2012-04-04 北京云星宇交通工程有限公司 Method for dividing license plate characters based on connected domain analysis
US20150278626A1 (en) * 2014-03-31 2015-10-01 Nidec Sankyo Corporation Character recognition device and character segmentation method
WO2017020723A1 (en) * 2015-08-04 2017-02-09 阿里巴巴集团控股有限公司 Character segmentation method and device and electronic device
CN105528600A (en) * 2015-10-30 2016-04-27 小米科技有限责任公司 Region identification method and device
CN105528606A (en) * 2015-10-30 2016-04-27 小米科技有限责任公司 Region identification method and device
CN106611174A (en) * 2016-12-29 2017-05-03 成都数联铭品科技有限公司 OCR recognition method for unusual fonts
CN108734189A (en) * 2017-04-20 2018-11-02 天津工业大学 Vehicle License Plate Recognition System based on atmospherical scattering model and deep learning under thick fog weather
US20190095739A1 (en) * 2017-09-27 2019-03-28 Harbin Institute Of Technology Adaptive Auto Meter Detection Method based on Character Segmentation and Cascade Classifier
CN108921163A (en) * 2018-06-08 2018-11-30 南京大学 A kind of packaging coding detection method based on deep learning
CN109359695A (en) * 2018-10-26 2019-02-19 东莞理工学院 A kind of computer vision 0-O recognition methods based on deep learning
CN109657665A (en) * 2018-10-31 2019-04-19 广东工业大学 A kind of invoice batch automatic recognition system based on deep learning
CN109784342A (en) * 2019-01-24 2019-05-21 厦门商集网络科技有限责任公司 A kind of OCR recognition methods and terminal based on deep learning model

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
A. Krizhevsky, I. Sutskever, G. E. Hinton: "ImageNet classification with deep convolutional neural networks", Proceedings of the 25th International Conference on Neural Information Processing Systems *
唐铭豆; 陶青川; 冯谦: "Chip surface character detection and recognition system based on neural networks", Modern Computer (Professional Edition), no. 09
曾祥云; 郑胜 et al.: "Background extraction method for hand-drawn sunspot images based on SVM", Microcomputer & Applications
朱明锋; 郑胜; 曾祥云; 徐高贵: "Background extraction method for hand-drawn sunspot images based on SVM", Artificial Intelligence
朱明锋; 郑胜; 曾祥云; 徐高贵: "Background extraction method for hand-drawn sunspot images based on SVM", Artificial Intelligence, 16 December 2016
朱道远; 郑胜; 曾祥云; 徐高贵: "Research on handwritten character segmentation in hand-drawn sunspot drawings", Microcomputer & Applications, no. 20

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898606A (en) * 2020-05-19 2020-11-06 武汉东智科技股份有限公司 Night imaging identification method for superimposing transparent time characters in video image
CN111898606B (en) * 2020-05-19 2023-04-07 武汉东智科技股份有限公司 Night imaging identification method for superimposing transparent time characters in video image

Also Published As

Publication number Publication date
CN110533030B (en) 2023-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant