CN111652275B - Open cluster identification model construction method, open cluster identification method and open cluster identification system - Google Patents

Open cluster identification model construction method, open cluster identification method and open cluster identification system

Info

Publication number
CN111652275B
CN111652275B (application CN202010362496.2A)
Authority
CN
China
Prior art keywords
star
regional
evacuation
group
density
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010362496.2A
Other languages
Chinese (zh)
Other versions
CN111652275A (en)
Inventor
席江波
向姚冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chang'an University
Original Assignee
Chang'an University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chang'an University filed Critical Chang'an University
Priority to CN202010362496.2A priority Critical patent/CN111652275B/en
Publication of CN111652275A publication Critical patent/CN111652275A/en
Application granted granted Critical
Publication of CN111652275B publication Critical patent/CN111652275B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a construction method of an open cluster identification model, which comprises the following steps: step 1, obtaining regional star structured data of a marked sky region, and labeling the star groups consisting of regional stars in the sky region to form a regional star structured data set and a star group label set; step 2, processing the regional star structured data set to obtain a Parzen window density map sample set; and step 3, training a pre-constructed open cluster identification model by taking the Parzen window density map sample set as input and the star group label set as output, to obtain the trained open cluster identification model. The invention solves the technical problems in astronomical research that open cluster identification relies on manual judgment, with a poor degree of automation and low efficiency. The invention also discloses an open cluster identification method and system.

Description

Open cluster identification model construction method, open cluster identification method and open cluster identification system
Technical Field
The invention belongs to the field of astronomical data processing, relates to an astronomical data processing method, and particularly relates to a construction method of an open cluster identification model, and an open cluster identification method and system.
Background
The methods currently in common use for identifying the member stars of an open cluster and separating them out are mainly the following:
1.1 Manual identification
Astronomers have long searched for open clusters by selecting candidates one by one and judging them manually. The method comprises: obtaining a data set, randomly selecting a small sky region, observing the position distribution and proper-motion distribution of the data, extracting a region of high star density in proper-motion space as a subset, and observing whether the color-magnitude diagram (CMD) of the region conforms to an empirical curve; if it conforms, a cluster is judged to exist in the region, otherwise the existence of an open cluster in the region is rejected. The method is primitive but very effective; however, as astronomical observation data sets keep growing, its feasibility and effectiveness are greatly challenged.
1.2 DBSCAN density cluster analysis
Castro-Ginard et al. devised a method to systematically identify open clusters in Gaia data in an automated fashion and applied it to the TGAS data set of Gaia DR1 (Gaia Collaboration et al. 2016) as an initial test. Density over-concentrations were detected using the density-based clustering algorithm DBSCAN (Ester et al., 1996). They were validated using a classification algorithm based on an artificial neural network (Hinton 1989) to identify isochrone features on the CMD. The detected open cluster candidates were finally checked manually using Gaia DR2 (Gaia Collaboration et al. 2018) photometric data to confirm the validity of applying the method to the full Gaia DR2 data set. The method realizes automated searching for cluster structures, but it suffers heavy interference from background field stars and is not suitable for searching for remote sparse clusters.
Disclosure of Invention
Aiming at the above problems or defects in the prior art, the invention provides an open cluster identification model construction method, an open cluster identification method and an open cluster identification system, so as to solve the technical problems that open cluster identification in the prior art relies on manual judgment, with a low degree of automation and low efficiency.
In order to solve the above technical problems, the invention adopts the following technical scheme: an open cluster identification model construction method comprising the following steps:
step 1, obtaining regional star structured data of a marked sky region, and labeling a star group consisting of regional stars in the sky region to form a regional star structured data set and a star group label set;
step 2, processing the region star structured data set to obtain a Parzen window density map sample set;
step 3, training a pre-constructed open cluster identification model by taking the Parzen window density map sample set as input and the star group label set as output to obtain a trained open cluster identification model;
wherein, step 2 comprises the following substeps:
step 2.1, cleaning and screening the regional star structured data in the regional star structured data set, and reserving right ascension dimension variables and declination dimension variables of regional stars;
step 2.2, determining a regional star kernel function and a marked sky region bandwidth parameter according to the right ascension dimension variable and the declination dimension variable of the regional stars;
step 2.3, carrying out data distribution probability density estimation with the obtained regional star kernel function and the marked sky region bandwidth parameter to obtain a regional star density distribution model;
and 2.4, calculating the sampling rate of the self-adaptive sampling grid, and sampling the area star density distribution model by using the self-adaptive sampling grid to obtain a Parzen window density estimation diagram to form a Parzen window density estimation diagram sample set.
Specifically, the step 2.2 of determining the regional star kernel function and the marked sky region bandwidth parameter from the right ascension dimension variable and the declination dimension variable of the regional stars specifically comprises the following steps:
step 2.2.1, calculating the marked sky region bandwidth parameter h according to formula I:
h = 1.06σN^(-1/5)  (I)
in formula (I), σ is the standard deviation and N is the number of regional stars;
step 2.2.2, calculating the regional star kernel function according to formula II:
f_n(x, y) = 1/(2πσ²√(1-ρ²)) · exp{-[(x-μ_1n)² - 2ρ(x-μ_1n)(y-μ_2n) + (y-μ_2n)²] / [2σ²(1-ρ²)]}  (II)
in formula (II), μ_1n represents the right ascension dimension variable of the n-th regional star in the marked sky region, μ_2n represents the declination dimension variable of the n-th regional star in the marked sky region, σ is the standard deviation, ρ is a constant coefficient, x is the abscissa value relative to the regional star's right ascension dimension variable μ_1n, and y is the ordinate value relative to the regional star's declination dimension variable μ_2n, where μ_1n-h<x<μ_1n+h and μ_2n-h<y<μ_2n+h.
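The bandwidth and kernel of step 2.2 can be sketched in a few lines of NumPy. This is a minimal illustration only: a Silverman-type rule of thumb is assumed for the bandwidth formula and a bivariate Gaussian for the kernel, so the exact constants may differ from the patent's own formulas.

```python
import numpy as np

def bandwidth(sigma, n):
    # Assumed Silverman-type rule of thumb for the marked sky region
    # bandwidth parameter h: h shrinks as the star count n grows.
    return 1.06 * sigma * n ** (-1.0 / 5.0)

def star_kernel(x, y, mu1, mu2, sigma, rho=0.0):
    # Assumed bivariate Gaussian kernel for one regional star, centred on
    # (mu1, mu2) = (right ascension, declination) of that star.
    norm = 1.0 / (2.0 * np.pi * sigma ** 2 * np.sqrt(1.0 - rho ** 2))
    q = (x - mu1) ** 2 - 2.0 * rho * (x - mu1) * (y - mu2) + (y - mu2) ** 2
    return norm * np.exp(-q / (2.0 * sigma ** 2 * (1.0 - rho ** 2)))
```

With ρ = 0 the kernel reduces to an isotropic 2-D Gaussian, whose peak value at the star position is 1/(2πσ²).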
Specifically, the step 2.3 of performing data distribution probability density estimation with the obtained regional star kernel function and the marked sky region bandwidth parameter to obtain the regional star density distribution model specifically comprises the following steps:
step 2.3.1, obtaining Gaussian distribution of the regional stars through the regional star kernel function;
step 2.3.2, integrating the Gaussian distribution of the regional stars in the two-dimensional direction to obtain a regional star distribution function;
and 2.3.3, carrying out normalization processing on the obtained region star distribution function to obtain a region star density distribution model.
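Steps 2.3.1 to 2.3.3 can be read as a standard Parzen (kernel) density estimate: one Gaussian bump per star, summed and normalized by the star count. A minimal sketch, with an isotropic Gaussian form assumed:

```python
import numpy as np

def density_model(ra, dec, sigma):
    # Parzen window estimate: average of one Gaussian bump per regional star.
    # Dividing by the star count normalizes the model so it integrates to 1.
    ra, dec = np.asarray(ra), np.asarray(dec)

    def density(x, y):
        q = (x - ra) ** 2 + (y - dec) ** 2
        bumps = np.exp(-q / (2.0 * sigma ** 2)) / (2.0 * np.pi * sigma ** 2)
        return bumps.mean()

    return density
```

Adding a second, distant star halves the density at the first star's position, since the estimate is an average over stars.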
Specifically, step 2.4 of calculating the sampling rate of the adaptive sampling grid, sampling the regional star density distribution model with the adaptive sampling grid to obtain a Parzen window density estimation map, and forming a Parzen window density estimation map sample set specifically comprises the following steps:
step 2.4.1, calculating the sampling rate N* of the adaptive sampling grid according to formula III:
[formula III: N* as a function of the cone radius d of the marked sky region and the cutoff magnitude constant G]
in formula (III), d is the cone radius of the marked sky region in degrees, and G is the limiting-magnitude cutoff constant;
step 2.4.2, sampling the regional star density distribution model with an adaptive sampling grid of size N*×N* to obtain a density numerical grid map, and obtaining a density numerical grid map sample set;
and step 2.4.3, stretching the pixel values of the density numerical grid maps in the density numerical grid map sample set to a preset gray scale range to obtain Parzen window density estimation maps, forming the Parzen window density estimation map sample set.
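The grid sampling and gray-scale stretching of steps 2.4.2 and 2.4.3 can be sketched as follows. This is a minimal illustration: the density model is taken as any callable, and a linear min-max stretch to 0-255 is assumed.

```python
import numpy as np

def sample_and_stretch(density, n, extent):
    # Step 2.4.2: evaluate the density model on an n-by-n sampling grid
    # covering the square region given by `extent` = (lo, hi).
    lo, hi = extent
    xs = np.linspace(lo, hi, n)
    grid = np.array([[density(x, y) for x in xs] for y in xs])
    # Step 2.4.3: stretch the grid values linearly to the 0-255 gray range.
    grid = grid - grid.min()
    if grid.max() > 0:
        grid = grid / grid.max()
    return (grid * 255).astype(np.uint8)
```

The result is an 8-bit gray image, the "density numerical grid map" that later becomes the network input.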
Specifically, the predetermined gray scale range described in step 2.4.3 is 0-255.
Specifically, the Parzen window estimate map described in step 2.4.3 has dimensions of 168 × 168 pixels.
Specifically, the open cluster identification model in step 3 is based on a convolutional neural network and consists of six groups connected in sequence, wherein the first to fifth convolution groups have the same structure, each comprising two convolutional layers and a max-pooling layer connected in sequence; the first convolution group has 16 channels, the second 32 channels, the third 64 channels, the fourth 128 channels and the fifth 128 channels; the stacked output of the five convolution groups, of size 10 × 128, is converted through a flatten layer and finally fed into the 3 fully connected layers of the sixth part, with 128, 64 and 64 nodes respectively; a dropout layer with a retention rate of 0.8 is added after each fully connected layer.
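The six-part network described above can be written down as a Keras configuration sketch. This is an interpretation of the text rather than the authors' code: 3 × 3 kernels, 'same' padding, ReLU activations and a final 2-way softmax head are assumptions, the flattening step is rendered as a Flatten layer, and the stated retention rate of 0.8 corresponds to a Keras dropout rate of 0.2. (Under these assumptions five 2 × 2 poolings reduce a 168 × 168 input to 5 × 5 spatially, slightly smaller than the 10 × 128 stacked output quoted in the text, which suggests the original used slightly different pooling or input handling.)

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_model(input_shape=(168, 168, 1)):
    model = keras.Sequential()
    model.add(layers.InputLayer(input_shape=input_shape))
    # Five convolution groups: two conv layers + one max-pooling layer each.
    for channels in (16, 32, 64, 128, 128):
        model.add(layers.Conv2D(channels, 3, padding="same", activation="relu"))
        model.add(layers.Conv2D(channels, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))
    model.add(layers.Flatten())
    # Sixth part: three fully connected layers (128, 64, 64 nodes),
    # each followed by dropout with keep rate 0.8 (drop rate 0.2).
    for nodes in (128, 64, 64):
        model.add(layers.Dense(nodes, activation="relu"))
        model.add(layers.Dropout(0.2))
    model.add(layers.Dense(2, activation="softmax"))  # assumed 2-class head
    return model
```

The model would be compiled with the Adam optimizer and cross-entropy loss, matching the training procedure described in example 1.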
An open cluster identification method comprises the following steps:
(1) Acquiring regional star structured data to obtain a structured data set;
(2) Processing the regional star structured data set to obtain a Parzen window density map sample set;
(3) Inputting the Parzen window density map sample set into an open cluster identification model to obtain an open cluster identification result;
the open cluster identification model being the open cluster identification model constructed by the above open cluster identification model construction method.
An open cluster identification system comprises an open cluster data acquisition module and an open cluster identification module;
the open cluster data acquisition module takes regional star structured data as its acquisition object;
the open cluster identification module is used for executing the above open cluster identification method.
Compared with the prior art, the invention has the following technical effects:
(1) The open cluster identification model constructed by the method has high identification accuracy and good consistency of judgment standards.
(2) The method has a high degree of automation; the constructed model needs no manual intervention after training.
(3) The method identifies open clusters quickly and efficiently.
Drawings
Fig. 1 is a flow chart of the open cluster identification according to the present invention.
Fig. 2 is a schematic illustration of Parzen window density estimates of non-cluster regions collected in example 1.
Fig. 3 is a schematic illustration of Parzen window density estimates of the open clusters collected in example 1.
Fig. 4 is a schematic diagram of the regional star scatter plots and the sampling-adaptive Parzen window estimates of the cluster and non-cluster regions in example 1.
Fig. 5 is a schematic diagram of the recognition results of the open cluster identification model in example 1.
Fig. 6 is a schematic diagram of non-cluster regions correctly identified in example 1.
Fig. 7 is a schematic diagram of non-cluster regions erroneously identified as open clusters in example 1.
Fig. 8 is a schematic diagram of open clusters correctly identified in example 1.
Fig. 9 is a schematic diagram of open clusters erroneously identified as non-cluster regions in example 1.
Fig. 10 is a schematic diagram of simulation region star structured data in example 2.
FIG. 11 is a class-activation heatmap of feature weights for an identified sample.
FIG. 12 is a dimension-reduction visualization of identified samples.
The present invention will be explained in further detail with reference to examples.
Detailed Description
The following embodiments are given as examples; it should be noted that the present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present application fall within the protection scope of the present invention.
It should be noted that, in the embodiments of the present invention, training of the identification model and identification of open clusters are performed by a computer configured with an Intel i7-8700K processor, 32 GB of random access memory (RAM), a Windows 10 operating system, and an NVIDIA GTX 1080 Ti (11 GB) graphics card; the software environment was TensorFlow 1.14.0.
Example 1
In example 1, the structured data actually collected were Gaia DR2 astronomical observations. The present example was carried out according to the following steps:
step 1, obtaining regional star structured data of a marked sky region, and labeling a star group consisting of regional stars in the sky region to form a regional star structured data set and a star group label set;
within the selected sampling area, acquiring regional star structured data of marked celestial bodies, wherein the structured data comprise multidimensional information of the celestial bodies such as: identifier, right ascension, declination, parallax, proper motion in right ascension, proper motion in declination, and magnitude.
In this embodiment, a region of the marked sky area containing an open cluster of regional stars is labeled 1, and a non-cluster region is labeled 0.
Step 2, processing the regional star structured data set to obtain a Parzen window density map sample set;
step 2.1, cleaning and screening the regional star structured data in the regional star structured data set, and reserving right ascension dimension variables and declination dimension variables of regional stars;
in this embodiment, structured data of regional stars are collected in an evacuated starry area and a non-evacuated starry area formed by regional stars, 550 structured data are collected for positive and negative stars, and 1100 structured data are collected in the evacuated starry area and the non-evacuated starry area.
Step 2.2, determining a regional star kernel function and a marked sky region bandwidth parameter according to the right ascension dimension variable and the declination dimension variable of the regional stars;
step 2.2.1, calculating the marked sky region bandwidth parameter h by formula I:
h = 1.06σN^(-1/5)  (I)
in formula (I), σ is the standard deviation and N is the number of regional stars;
step 2.2.2, calculating the regional star kernel function according to formula II:
f_n(x, y) = 1/(2πσ²√(1-ρ²)) · exp{-[(x-μ_1n)² - 2ρ(x-μ_1n)(y-μ_2n) + (y-μ_2n)²] / [2σ²(1-ρ²)]}  (II)
in formula (II), μ_1n represents the right ascension dimension variable of the n-th regional star in the marked sky region, μ_2n represents the declination dimension variable of the n-th regional star, σ is the standard deviation, ρ is a constant coefficient, x is the abscissa value relative to the regional star's right ascension dimension variable μ_1n, and y is the ordinate value relative to the regional star's declination dimension variable μ_2n, where μ_1n-h<x<μ_1n+h and μ_2n-h<y<μ_2n+h. x and y are coordinate positions in a local coordinate system with the star position as the origin of coordinates.
Step 2.3, carrying out data distribution probability density estimation with the obtained regional star kernel function and the marked sky region bandwidth parameter to obtain a regional star density distribution model;
step 2.3.1, obtaining Gaussian distribution of the regional stars through the regional star kernel function;
step 2.3.2, integrating the Gaussian distribution of the regional stars in the two-dimensional direction to obtain a regional star distribution function;
and 2.3.3, carrying out normalization processing on the obtained regional star distribution function to obtain a regional star density distribution model.
And 2.4, calculating the sampling rate of the self-adaptive sampling grid, and sampling the area star density distribution model by using the self-adaptive sampling grid to obtain a Parzen window density estimation diagram to form a Parzen window density estimation diagram sample set.
Step 2.4.1, calculating the sampling rate N* of the adaptive sampling grid according to formula III:
[formula III: N* as a function of the cone radius d of the marked sky region and the cutoff magnitude constant G]
in formula (III), d is the cone radius of the marked sky region in degrees, and G is the limiting-magnitude cutoff constant;
in this embodiment, G takes the value 22.
Step 2.4.2, sampling the regional star density distribution model with an adaptive sampling grid of size N*×N* to obtain a density numerical grid map and a density numerical grid map sample set;
and 2.4.3, stretching the pixel values of the density numerical grid map in the density numerical grid map sample set to a preset gray scale range to obtain a Parzen window density estimation map, and forming a Parzen window density estimation map sample set.
In order to meet the convolutional neural network's requirement of consistent image size, the output sampling-adaptive Parzen window density estimation maps are converted to 168 × 168 pixels by image scaling, so as to obtain the Parzen window density estimation maps and form the Parzen window density estimation map sample set.
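Rescaling each density map to the fixed 168 × 168 network input can be sketched with a simple nearest-neighbour resampling; this is illustrative only (a production pipeline would typically use a library resize with interpolation).

```python
import numpy as np

def resize_nearest(img, out_h=168, out_w=168):
    # Nearest-neighbour rescale so every Parzen density map matches the
    # 168 x 168 input size expected by the convolutional network.
    h, w = img.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]
```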
And step 3, training a pre-constructed open cluster identification model by taking the Parzen window density map sample set as input and the star group label set as output, to obtain the trained open cluster identification model.
In this embodiment, as an optimization scheme, 400 positive and 400 negative Parzen window density estimation maps are first selected from the 1100 samples; these 800 maps are used as training data to form a training sample set, and the remaining 150 positive and 150 negative maps are used as test data to form a test sample set.
Using data expansion techniques, the training data in the training sample set are rotated, mirrored, and crop-scaled, expanding the original 800 Parzen window density estimation maps into 16000 images.
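The rotation and mirroring part of this expansion can be sketched as follows; the eight rigid symmetries shown here, combined with crop-and-scale variants (not shown), would account for the 20-fold expansion from 800 to 16000 images.

```python
import numpy as np

def rigid_augment(img):
    # The four 90-degree rotations of a density map, each also mirrored,
    # give eight label-preserving variants of one training image.
    variants = []
    for k in range(4):
        rotated = np.rot90(img, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants
```

Because a density map has no preferred orientation on the sky, all eight variants keep the same cluster/non-cluster label.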
The expanded data are bootstrap-sampled to obtain subsample sets, and each subsample set is divided into training data and verification data to obtain a training set and a verification set.
The constructed open cluster identification model is trained with data in the training set, using the verification set to monitor the model and retaining the optimal base model. The Adam optimization algorithm is used during training to minimize the cross-entropy error on the training subsample set. For each group, 10% of the data are randomly selected from the training subsample set as a validation set, and the rest is used as that group's training data. A stochastic gradient descent procedure improves the model's performance on the training subsample set. Cross entropy is used as the loss function; after each group's training is completed, the model's loss on the validation set is computed. Taking the validation loss as the evaluation index, the model with the minimum validation loss is saved as the optimal trained model; to guard against over-fitting, training stops if the validation loss fails to decrease for 10 consecutive groups. In each group, 14400 of the 16000 images are randomly selected to form the training set for training the model, and the remaining 1600 images form the validation set, used to select the optimal model parameters and avoid over-fitting.
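The group-wise validation monitoring above reduces to a standard early-stopping rule. A minimal, framework-free sketch (the patience of 10 groups comes from the text; the rest is an assumed form):

```python
def select_best_group(val_losses, patience=10):
    # Keep the group whose model had the lowest validation loss; stop as soon
    # as `patience` consecutive groups fail to improve on that best loss.
    best_loss, best_group, wait = float("inf"), 0, 0
    for group, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_group, wait = loss, group, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best_group
```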
The open cluster identification model is based on a convolutional neural network and consists of six groups connected in sequence, wherein the first to fifth convolution groups have the same structure, each comprising two convolutional layers and a max-pooling layer connected in sequence; the first convolution group has 16 channels, the second 32 channels, the third 64 channels, the fourth 128 channels and the fifth 128 channels; the stacked output of the five convolution groups, of size 10 × 128, is converted through a flatten layer and finally fed into the 3 fully connected layers of the sixth part, with 128, 64 and 64 nodes respectively; a dropout layer with a retention rate of 0.8 is added after each fully connected layer. The output of the identification model is the probability that an open cluster exists in the region to be identified: a probability close to 1 indicates the region is highly likely to contain an open cluster, while a probability close to 0 indicates a very small likelihood. As shown in fig. 5, the probability of an open cluster existing in each Parzen window density estimation map decreases from left to right.
After obtaining the identification result, the probability regression values of the output label set are calculated using a Softmax activation function.
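The Softmax step that converts the network's final-layer outputs into class probabilities, sketched in NumPy:

```python
import numpy as np

def softmax(logits):
    # Subtracting the max first is a standard numerical-stability trick;
    # it does not change the resulting probabilities.
    z = np.asarray(logits, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()
```

For a 2-class head, the first component is the "open cluster present" probability discussed in the text.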
In the final model prediction stage of this embodiment, the dropout layers are disabled, which helps achieve better model performance.
Finally, the obtained optimal open cluster identification model is tested with the 300 Parzen window density estimation maps in the test set to assess its identification performance, and the model is evaluated with precision P and recall R.
The predictions made by the model fall into the following cases: a positive class (open cluster) predicted as positive (TP), a negative class (non-cluster) predicted as positive (FP), a negative class predicted as negative (TN), and a positive class predicted as negative (FN). The precision P and recall R of the identification are calculated with the following formulas:
P = TP / (TP + FP)
R = TP / (TP + FN)
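The two evaluation formulas reduce to one-line ratios; this sketch checks them against the counts implied by the example 1 results (TP = 118, FP = 30, FN = 33):

```python
def precision_recall(tp, fp, fn):
    # Precision P = TP / (TP + FP); recall R = TP / (TP + FN).
    return tp / (tp + fp), tp / (tp + fn)
```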
the model was tested using 300 test data and the final test results are shown in the following table:
Figure BDA0002475472650000123
and (3) calculating the result:
the precision ratio P = 118/(118 + 30) =0.7973,
accuracy rate R = 118/(118 + 33) =0.7815.
Example 2:
the present embodiment is different from embodiment 1 in that a sample set constructed from simulation data is selected in the present embodiment.
As shown in fig. 10, the left side of fig. 10 is a simulation diagram of an open cluster region and the right side is a simulation diagram of a non-cluster region; the size of a single picture generated by the experiment is 168 × 168 × 3.
Then, following the procedure described in example 1, the simulation training sample set was likewise screened, divided, and expanded according to the method described above. 8000 positive and 8000 negative simulation samples were obtained, 16000 simulation samples in total, used as the training set; 300 actually acquired samples were selected as the test set. The identification model constructed in example 1 was trained and tested, with the final test results shown in the following table:
actual clusters: predicted cluster TP = 115, predicted non-cluster FN = 35;
actual non-clusters: predicted cluster FP = 33, predicted non-cluster TN = 117.
Calculation results:
precision P = 115/(115 + 33) = 0.7770,
recall R = 115/(115 + 35) = 0.7667.
Example 3:
the difference between this embodiment and embodiment 1 is that the kernel function is a uniform distribution function, the other experimental steps being the same as in embodiment 1; the formula of the uniform distribution function is as follows:
f_n(x, y) = 1/(4h²) for μ_1n-h<x<μ_1n+h and μ_2n-h<y<μ_2n+h, and 0 otherwise
in the formula, x is the abscissa value relative to the regional star's right ascension dimension variable, and y is the ordinate value relative to the regional star's declination dimension variable; x and y are the coordinate positions of points in a local coordinate system with the star position as the origin of coordinates.
In this embodiment, the identification model constructed in embodiment 1 is trained with 16000 simulation data and the 16000 data expanded from the 800 actually collected data, and 300 actually collected data are selected to test the identification model, with the final identification results shown in the following table:
actual clusters: predicted cluster TP = 107, predicted non-cluster FN = 43;
actual non-clusters: predicted cluster FP = 39, predicted non-cluster TN = 111.
Calculation results:
precision P = 107/(107 + 39) = 0.7329,
recall R = 107/(107 + 43) = 0.7133.
Example 4:
compared with embodiment 1, in this embodiment, 10 recognition network models with the same structure are respectively trained on 10 different subsample sets, and then recognition results are integrated through a Bagging integration algorithm and averaged to be finally output.
In this embodiment, the pre-constructed identification model described in embodiment 1 is trained with 16000 simulation data and 800 actually collected data; 300 actually collected data are selected for testing, with the final test results shown in the following table:
actual clusters: predicted cluster TP = 128, predicted non-cluster FN = 24;
actual non-clusters: predicted cluster FP = 21, predicted non-cluster TN = 127.
The calculation gives precision P = 128/(128 + 21) = 0.8591 and recall R = 128/(128 + 24) = 0.8421.
For the above embodiments 1 to 4, model reliability tests are finally performed with a class activation heatmap and a t-distributed stochastic neighbor embedding method, specifically as follows:
1. Testing model feature weights with a class activation heatmap;
The convolution feature maps are extracted and a class activation heatmap (Grad-CAM) is used to determine which part of the input image has the greatest effect on the model's classification result. As shown in fig. 11, the graph on the left of the first row is a density map in which an open cluster is present, and the graph on the right is its corresponding class activation heatmap. As can be seen from the figure, the high-density central area of the density map is highlighted in the class activation heatmap, indicating that this area carries larger weight when an open cluster is judged to be present, the same basis a human would use to distinguish such images. The left graph of the second row is a density map with no open cluster present, and its class activation heatmap highlights random fluctuations, indicating that the model has selected appropriate features.
2. Visualizing the model's decision boundary with the t-distributed stochastic neighbor embedding algorithm;
The last hidden layer of the neural network is a linearly separable representation of the input set, i.e. open clusters and non-clusters have a separating surface in a high-dimensional space. In order to display this separating surface visually, the t-distributed stochastic neighbor embedding (t-SNE) algorithm is introduced for dimension-reduction visualization. Fig. 12 shows the distribution of cluster and non-cluster samples on a two-dimensional plane: the pictures containing open clusters are concentrated on the right of the figure, and the pictures without open clusters are concentrated on the left.
By comprehensive comparison, the average identification accuracy of the method provided by the invention reaches more than 85%. Compared with the existing manual identification method, the identification precision is high and the consistency of the judgment standard is good; no manual intervention is needed after the model is trained, and the degree of automation is high; automatic identification is completed in a short time, with high speed and high efficiency, greatly reducing the workload.

Claims (9)

1. An open cluster identification model construction method, characterized by comprising the following steps:
step 1, obtaining regional star structured data of a marked sky region, and labeling the star clusters formed by the regional stars in the sky region, to form a regional star structured data set and a star cluster label set;
step 2, processing the region star structured data set to obtain a Parzen window density map sample set;
step 3, training a pre-constructed open cluster identification model with the Parzen window density map sample set as input and the star cluster label set as output, to obtain a trained open cluster identification model;
wherein, step 2 comprises the following substeps:
step 2.1, cleaning and screening the regional star structured data in the regional star structured data set, and retaining the right ascension dimension variables and declination dimension variables of the regional stars;
step 2.2, determining a regional star kernel function and a bandwidth parameter of the marked sky region from the right ascension dimension variable and the declination dimension variable of the regional stars;
step 2.3, performing data distribution probability density estimation with the obtained regional star kernel function and the bandwidth parameter of the marked sky region, to obtain a regional star density distribution model;
and step 2.4, calculating the sampling rate of the adaptive sampling grid, sampling the regional star density distribution model with the adaptive sampling grid to obtain a Parzen window density estimation map, and forming a Parzen window density estimation map sample set.
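Taken together, steps 2.1-2.4 amount to a Gaussian Parzen-window (kernel density) estimate of the star positions, evaluated on a square grid. A minimal numpy sketch, with hypothetical star positions and an isotropic Gaussian kernel (ρ = 0) standing in for the exact kernel of formula (II):

```python
import numpy as np

def parzen_density_grid(ra, dec, h, n):
    """Evaluate a Gaussian Parzen-window density estimate of star
    positions (ra, dec) on an n x n grid spanning the sky region,
    normalized so the grid sums to 1 (cf. step 2.3.3)."""
    xs = np.linspace(ra.min(), ra.max(), n)
    ys = np.linspace(dec.min(), dec.max(), n)
    gx, gy = np.meshgrid(xs, ys)
    density = np.zeros_like(gx)
    # Superpose one Gaussian bump of bandwidth h per regional star.
    for mu1, mu2 in zip(ra, dec):
        density += np.exp(-((gx - mu1) ** 2 + (gy - mu2) ** 2) / (2 * h ** 2))
    return density / density.sum()

# Hypothetical sky region: 300 stars clustered near RA 180 deg, Dec 30 deg.
rng = np.random.default_rng(2)
ra = rng.normal(180.0, 0.5, 300)
dec = rng.normal(30.0, 0.5, 300)
grid = parzen_density_grid(ra, dec, h=0.1, n=168)
print(grid.shape)  # (168, 168)
```

The bandwidth `h=0.1` and grid size `n=168` are illustrative; in the claims, h comes from formula (I) and the grid size from the adaptive sampling rate of formula (III).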
2. The open cluster identification model construction method according to claim 1, wherein the determining, in step 2.2, of the regional star kernel function and the bandwidth parameter of the marked sky region from the right ascension dimension variable and the declination dimension variable of the regional stars specifically comprises the following steps:
step 2.2.1, calculating the bandwidth parameter h of the marked sky region by formula (I):
Figure FDA0002475472640000021
in formula (I), σ is the standard deviation and N is the total number of regional stars;
step 2.2.2, calculating the regional star kernel function by formula (II):
Figure FDA0002475472640000022
in formula (II), μ_1n is the right ascension dimension variable of the n-th regional star in the marked sky region, μ_2n is the declination dimension variable of the n-th regional star in the marked sky region, σ is the standard deviation, ρ is a constant coefficient, x is the abscissa value relative to the right ascension dimension variable μ_1n of the regional star, and y is the ordinate value relative to the declination dimension variable μ_2n of the regional star, where μ_1n - h < x < μ_1n + h and μ_2n - h < y < μ_2n + h.
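Formula (II) itself is only available as an image in this record. Assuming the standard correlated bivariate Gaussian form that the surrounding symbols suggest (σ, the correlation coefficient ρ, and the truncation window μ ± h), the per-star kernel of claim 2 can be sketched as:

```python
import numpy as np

def star_kernel(x, y, mu1, mu2, sigma, rho, h):
    """Truncated bivariate Gaussian kernel for one regional star at
    (mu1, mu2), supported only on the window mu1-h < x < mu1+h,
    mu2-h < y < mu2+h described in claim 2. The exact functional
    form of formula (II) is an assumption."""
    inside = (np.abs(x - mu1) < h) & (np.abs(y - mu2) < h)
    u, v = (x - mu1) / sigma, (y - mu2) / sigma
    g = np.exp(-(u ** 2 - 2 * rho * u * v + v ** 2) / (2 * (1 - rho ** 2)))
    g /= 2 * np.pi * sigma ** 2 * np.sqrt(1 - rho ** 2)
    return np.where(inside, g, 0.0)  # zero outside the truncation window

# The kernel is positive at the star's own position and zero far outside it.
val = star_kernel(np.array([180.0]), np.array([30.0]), 180.0, 30.0, 0.2, 0.0, 0.5)
print(float(val[0]) > 0)  # True
```

All numeric values here (σ = 0.2, ρ = 0, h = 0.5, the star position) are illustrative only.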
3. The open cluster identification model construction method according to claim 1, wherein the performing, in step 2.3, of data distribution probability density estimation with the obtained regional star kernel function and the bandwidth parameter of the marked sky region to obtain the regional star density distribution model specifically comprises the following steps:
step 2.3.1, obtaining Gaussian distribution of the regional stars through the regional star kernel function;
step 2.3.2, integrating the Gaussian distribution of the regional stars in the two-dimensional direction to obtain a regional star distribution function;
and 2.3.3, carrying out normalization processing on the obtained regional star distribution function to obtain a regional star density distribution model.
4. The open cluster identification model construction method according to claim 1, wherein the calculating, in step 2.4, of the sampling rate of the adaptive sampling grid, the sampling of the regional star density distribution model with the adaptive sampling grid to obtain a Parzen window density estimation map, and the forming of a Parzen window density estimation map sample set specifically comprise the following steps:
step 2.4.1, calculating the sampling rate N* of the adaptive sampling grid by formula (III):
Figure FDA0002475472640000031
in formula (III), d is the cone radius of the marked sky region in degrees, and G is a constant determined by the cutoff magnitude;
step 2.4.2, sampling the regional star density distribution model with an adaptive sampling grid of size N* × N* to obtain a density numerical grid map, and forming a density numerical grid map sample set;
and step 2.4.3, stretching the pixel values of the density numerical grid maps in the density numerical grid map sample set to a preset gray-scale range to obtain Parzen window density estimation maps, and forming the Parzen window density estimation map sample set.
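Step 2.4.3's linear stretch to a gray-scale range can be sketched in numpy as follows (the function name and the guard for a flat grid are illustrative; the 0-255 range is the one given in claim 5):

```python
import numpy as np

def to_gray_image(density_grid: np.ndarray) -> np.ndarray:
    """Linearly stretch a density grid's values to the 0-255
    gray-scale range, producing an 8-bit image (cf. step 2.4.3)."""
    lo, hi = density_grid.min(), density_grid.max()
    if hi == lo:  # flat grid: nothing to stretch, map everything to 0
        return np.zeros(density_grid.shape, dtype=np.uint8)
    stretched = (density_grid - lo) / (hi - lo) * 255.0
    return stretched.astype(np.uint8)

# Toy 168x168 density grid (the size given in claim 6).
rng = np.random.default_rng(3)
img = to_gray_image(rng.random((168, 168)))
print(img.dtype, img.min(), img.max())  # uint8 0 255
```

The resulting 8-bit image is what claim 6 fixes at 168 × 168 pixels as the model input.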
5. The open cluster identification model construction method according to claim 4, wherein the preset gray-scale range in step 2.4.3 is 0-255.
6. The open cluster identification model construction method according to claim 4, wherein the Parzen window density estimation map in step 2.4.3 has a size of 168 × 168 pixels.
7. The open cluster identification model construction method according to claim 1, wherein the open cluster identification model in step 3 is based on a convolutional neural network and consists of six parts connected in sequence, wherein the first to fifth parts are convolution groups of the same structure, each comprising two convolution layers and a maximum pooling layer connected in sequence; the first convolution group has 16 channels, the second 32 channels, the third 64 channels, the fourth 128 channels, and the fifth 128 channels; the output of the five stacked convolution groups has a size of 10 × 128, is converted by a flatten layer, and finally passes through the 3 fully connected layers of the sixth part, whose node numbers are 128, 64 and 64 respectively; a dropout layer with a retention rate of 0.8 is added after each fully connected layer.
8. An open cluster identification method, characterized by comprising the following steps:
step 8.1, obtaining regional star structured data of a marked sky region, and labeling the star clusters formed by the regional stars in the sky region, to form a regional star structured data set and a star cluster label set;
step 8.2, processing the region star structured data set to obtain a Parzen window density map sample set;
step 8.3, inputting the Parzen window density map sample set into an open cluster identification model to obtain an open cluster identification result;
the open cluster identification model being the open cluster identification model constructed by the open cluster identification model construction method according to any one of claims 1 to 7.
9. An open cluster identification system, characterized by comprising an open cluster data acquisition module and an open cluster identification module;
the open cluster data acquisition module takes regional star structured data as its acquisition object;
the open cluster identification module is configured to perform the open cluster identification method according to claim 8.
CN202010362496.2A 2020-04-30 2020-04-30 Sparse astragal identification model construction method, sparse astragal identification method and sparse astragal identification system Active CN111652275B (en)


Publications (2)

Publication Number Publication Date
CN111652275A CN111652275A (en) 2020-09-11
CN111652275B true CN111652275B (en) 2023-04-07





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant