KR20200069222A - Method of identification and analysis for materials

Info

Publication number: KR20200069222A
Application number: KR1020190150069A
Authority: KR
Inventors: 손기선; 이진웅
Original assignee: 세종대학교산학협력단
Priority date: 2018-12-06
Filing date: 2019-11-21
Publication date: 2020-06-16
Also published as: KR102235934B1

Abstract

The present invention relates to a method for readily identifying and analyzing the elements, phases, compositions, and the like contained in various materials, including inorganic and organic materials. The method comprises: (a) selecting two or more elements; (b) collecting data on a plurality of compounds determined to be formable from the two or more elements; (c) preparing analysis data in the form of an image or a spectrum for each of the collected compounds; (d) selecting two or more of the compounds, mixing them at a predetermined mixing ratio, and generating training data that includes the image or spectrum data blended according to that mixing ratio; (e) performing machine learning using the training data; and (f) identifying and/or analyzing analysis data, in the form of an image or a spectrum, obtained from a real material by using the model obtained through the machine learning.

Description

METHOD OF IDENTIFICATION AND ANALYSIS FOR MATERIALS

The present invention relates to a method for efficiently identifying and analyzing the elements, phases, compositions, and the like contained in various materials such as inorganic and organic materials.

One of the situations most frequently encountered when discovering materials with powder-based analytical techniques is the identification and quantification of unknown multiphase compounds.

In general, various analytical instruments such as XRD, XPS, and EDS are used to analyze the elements, phases, and components contained in a material. However, obtaining an accurate interpretation of the results from such instruments often requires a person with specialized knowledge and extensive experience in the field to spend considerable time.

Various analysis software has been developed to shorten the analysis time. For example, analysis by algorithms based on crystallographic theory exists, but it does not provide sufficient accuracy. Machine-learning approaches have also been used, but because their training datasets are small and they rely on excessive feature engineering based on theoretical knowledge, their results differ little from those of existing rule-based analysis software.

Moreover, hardly any existing analysis software can accurately derive, in real time, information about a material being produced on a manufacturing line.

Korean Patent Laid-Open Publication No. 10-2018-0073819

An object of the present invention is to provide a method of identifying and analyzing materials that allows the elements, phases, contents, and the like contained in various materials, such as inorganic and organic materials, to be identified and analyzed with high accuracy in a short time.

To achieve this object, the present invention provides a method of identifying and analyzing materials, comprising: (a) selecting two or more elements; (b) collecting data on a plurality of compounds determined to be formable from the two or more elements; (c) preparing analysis data in the form of an image or a spectrum for each of the collected compounds; (d) selecting two or more of the compounds, mixing them at a predetermined mixing ratio, and creating training data that includes the respective image or spectrum data blended according to the predetermined mixing ratio; (e) performing machine learning using the training data; and (f) identifying and/or analyzing analysis data, in the form of an image or a spectrum, obtained from a real material by using the model obtained through the machine learning.

In the above method, in the step of selecting two or more elements, the number of elements is preferably 2 to 10, and more preferably 3 to 8.

In the above method, the data on the plurality of compounds may be drawn from chemical and materials analysis data accumulated to date. For inorganic materials, for example, compounds registered in the ICSD may be used.

In the above method, the step of preparing analysis data in the form of an image or a spectrum for each of the plurality of compounds may use actual measured results, or may use analysis images or spectra computed by a program from the material information of each compound.

In the above method, the predetermined mixing ratio may, for example, be set so that when three compounds A, B, and C are present, the content of each of the three compounds is varied in steps of 0.5% by mass, with the image or spectrum data processed accordingly. However, the ratio may be adjusted to suit the characteristics of the material to be analyzed and the data available, and is not necessarily limited to this.
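As an illustration of the mixing-ratio grid described above, the following minimal Python sketch enumerates every composition of a given number of compounds on a regular mass-% grid. The function name `enumerate_compositions` and the `include_zero` switch are hypothetical and are not part of the described method; the 0.5% step is the example value given in the text.

```python
def enumerate_compositions(n_components, step_percent=0.5, include_zero=False):
    """Return every mass-% composition on a regular grid that sums to 100.

    Hypothetical helper: each composition is a tuple of percentages, one per
    compound. With include_zero=False every compound is actually present.
    """
    total_steps = round(100 / step_percent)
    lowest = 0 if include_zero else 1
    results = []

    def recurse(prefix, remaining, slots):
        if slots == 1:
            # The last compound takes whatever is left of the 100%.
            if remaining >= lowest:
                results.append(tuple(k * step_percent for k in prefix + [remaining]))
            return
        # Leave at least `lowest` steps for each of the remaining compounds.
        for k in range(lowest, remaining - lowest * (slots - 1) + 1):
            recurse(prefix + [k], remaining - k, slots - 1)

    recurse([], total_steps, n_components)
    return results


if __name__ == "__main__":
    grid = enumerate_compositions(3, step_percent=0.5)
    print(len(grid))   # number of A/B/C compositions in 0.5% steps
    print(grid[0])     # first composition on the grid
```

A coarser step gives a much smaller grid, which is one way the ratio could be "adjusted to suit the data available".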

In the above method, the analysis data in the form of an image or a spectrum are not particularly limited as long as they are in a form suitable for machine learning; for example, XRD data, XPS data, IR data, or transmission electron microscope diffraction patterns may be used.

In the above method, various types of algorithms may be applied for the machine learning; for example, it may be performed with a convolutional neural network (CNN).

According to the present invention, a component system composed of arbitrary elements is defined; analysis data for all known compounds possible in that system are used; mixing ratios of those compounds are assumed in advance and the image or spectrum data are processed accordingly to build a training dataset; machine learning is performed on this dataset; and real materials are then identified and/or analyzed with the resulting predictive model. Accurate analysis results can therefore be obtained in a short time. For example, on a steel plant line, applying the fitting results obtained by the present invention to XRD data enables real-time phase analysis, so that the occurrence of defects can be checked.

In addition, compared with conventional theory-based software, the method of identifying and/or analyzing materials according to the present invention allows phase identification and phase-fraction analysis of real materials containing a mixture of components to be performed with high accuracy.

FIG. 1 is a flowchart of a method of identifying materials according to an embodiment of the present invention.
FIG. 2 shows the compounds used in the embodiment of the present invention.
FIG. 3 illustrates the steps of calculating the XRD data of a specific compound from ICSD crystal structure data using XRD configuration parameters, according to an embodiment of the present invention.
FIG. 4 compares the measured XRD patterns of Al₂O₃, Li₂O, SrO, and SrAl₂O₄, inorganic compounds containing the components to be analyzed, with the XRD patterns simulated by the above method.
FIG. 5 shows the process of deriving the mixture data for the compounds.
FIG. 6 shows the process of processing the XRD data according to the mixing ratios derived in FIG. 5.
FIG. 7 shows the specific conditions used to derive the mixed XRD data.
FIG. 8 shows the first CNN architecture used for deep learning in the embodiment of the present invention (a: CNN2, b: CNN3).
FIG. 9 shows the second CNN architecture used for deep learning in the embodiment of the present invention.
FIG. 10 shows the training cost and accuracy obtained with 'Dataset_800k_org'.
FIG. 11 shows the training cost and accuracy obtained with 'Dataset_800k_rand' and 'Dataset_180k_rand'.

Hereinafter, the configuration and operation of embodiments of the present invention will be described with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. In addition, when a part is said to 'include' a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise.

FIG. 1 is a flowchart of a method of identifying materials according to an embodiment of the present invention. As shown in FIG. 1, the method includes: selecting the elements that constitute the material to be analyzed; extracting information on the compounds possible with the selected elements; simulating XRD data for each extracted compound; mixing the simulated XRD data at predetermined ratios to produce big data for machine learning; and performing machine learning with a CNN architecture.

Element selection and compound data collection

In this embodiment, strontium (Sr), aluminum (Al), lithium (Li), and oxygen (O), which are typical constituents of inorganic luminescent materials, were selected as the elements contained in the material to be analyzed.

Next, compounds made up of these four elements were looked up in the Inorganic Crystal Structure Database (ICSD). A total of 174 entries were listed, of which five were erroneous and were excluded. To these was added Sr₂LiAlO₄, a recently discovered quaternary compound that is promising as a light-emitting material for LEDs and is not included in the 2018 version of the ICSD.

Through this process, information on a total of 170 compounds containing strontium (Sr), aluminum (Al), lithium (Li), and oxygen (O) was extracted; the extracted inorganic compounds are shown in FIG. 2.

In this embodiment, an inorganic material was chosen as the material to be analyzed and ICSD data were used; for other types of material, other types of data can of course be used.

XRD data simulation for compound analysis

Because various analytical methods can be used to analyze such compounds, the method to be applied must be selected and the relevant data secured. Various data such as XRD, XPS, or IR data can be used for material analysis; in this embodiment, XRD, which is commonly used to identify the phases of inorganic materials, was selected as the analysis data for material identification.

Where XRD data exist for the 170 compounds, those data may be used. In this embodiment, however, the XRD data were prepared by simulation so that machine learning remains effective even when sufficient measured XRD data are not available.

FIG. 3 shows the process of simulating the XRD data of a specific compound from ICSD crystal structure data.

In this embodiment, a total of six random parameters were selected for simulating the background and a total of five random parameters for simulating the peak shape, and random noise was added, yielding simulated XRD patterns very similar to measured ones. In addition, when deriving XRD patterns from the ICSD crystal information, the lattice constants were randomly varied so that a larger number of XRD patterns could be derived.

The simulated XRD patterns were obtained using, as fixed variables, the structure factors and thermal factors taken from the ICSD together with the multiplicity, the Lorentz-polarization factor, absorption, and preferred orientation, and, as adjustable parameters, the peak profile (Caglioti and mixing parameters), the background, and white noise.

Specifically, the multiplicity was obtained from the symmetry data given in the ICSD, and a polarization correction was applied assuming a typical laboratory XRD instrument with Bragg-Brentano geometry and a graphite monochromator on the incident beam; preferred orientation was assumed to be absent. The adjustable parameters were assigned at random with reference to parameter values commonly seen with laboratory-scale X-ray diffractometers.
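The role of the adjustable parameters can be sketched as follows. This is a simplified illustration only, not the simulation actually used in the embodiment: it takes peak positions and relative intensities as given (rather than computing them from structure factors), uses a pseudo-Voigt peak whose width follows the Caglioti relation, and adds a polynomial background and Gaussian white noise. All numeric ranges, the 2θ window of 10° to 100° in 0.02° steps (chosen only because it yields 4501 points), the degree-5 polynomial background, and the linear form of the mixing parameter are assumptions.

```python
import math
import random


def caglioti_fwhm(two_theta_deg, u, v, w):
    """Caglioti relation: FWHM^2 = U tan^2(theta) + V tan(theta) + W."""
    t = math.tan(math.radians(two_theta_deg / 2.0))
    return math.sqrt(max(u * t * t + v * t + w, 1e-8))


def pseudo_voigt(x, center, fwhm, eta):
    """Unit-height mix of a Lorentzian (weight eta) and a Gaussian (weight 1 - eta)."""
    d = x - center
    gauss = math.exp(-4.0 * math.log(2.0) * d * d / (fwhm * fwhm))
    lorentz = 1.0 / (1.0 + 4.0 * d * d / (fwhm * fwhm))
    return eta * lorentz + (1.0 - eta) * gauss


def simulate_pattern(peaks, rng, start=10.0, step=0.02, n_points=4501):
    """peaks: list of (two_theta_deg, relative_intensity) pairs for one compound."""
    # Five assumed peak-shape parameters: Caglioti U, V, W and a linear mixing term.
    u = rng.uniform(0.0, 0.05)
    v = rng.uniform(-0.005, 0.0)
    w = rng.uniform(0.01, 0.05)
    eta0, eta1 = rng.uniform(0.1, 0.9), rng.uniform(-0.001, 0.001)
    # Six assumed background parameters: coefficients of a degree-5 polynomial.
    background = [rng.uniform(0.0, 0.02) for _ in range(6)]
    noise_sigma = rng.uniform(0.0, 0.005)

    pattern = []
    for i in range(n_points):
        x = start + i * step
        xn = i / (n_points - 1)  # position rescaled to [0, 1] for the background
        y = sum(c * xn ** k for k, c in enumerate(background))
        for center, intensity in peaks:
            eta = min(1.0, max(0.0, eta0 + eta1 * center))
            y += intensity * pseudo_voigt(x, center, caglioti_fwhm(center, u, v, w), eta)
        pattern.append(y + rng.gauss(0.0, noise_sigma))
    return pattern


if __name__ == "__main__":
    # Two made-up reflections; real positions would come from the crystal structure.
    demo = simulate_pattern([(25.0, 1.0), (43.5, 0.6)], random.Random(0))
    print(len(demo))
```

Calling `simulate_pattern` repeatedly with the same peak list but a fresh random state gives different profiles, backgrounds, and noise for the same compound, which is the sense in which many distinct patterns can be derived from one crystal structure.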

FIG. 4 compares the measured XRD patterns of Al₂O₃, Li₂O, SrO, and SrAl₂O₄, inorganic compounds containing the components to be analyzed, with the XRD patterns simulated by the above method. In FIG. 4, the upper pattern (green) is the result measured experimentally, and the lower pattern (brown) is the XRD pattern simulated by the above method. As shown in FIG. 4, the simulated and measured XRD patterns are so similar that they are difficult to tell apart. This indicates that XRD patterns simulated according to this embodiment can be used as training data for machine learning, which is also confirmed by the fact that a model trained on the simulated XRD patterns accurately identifies real materials.

Through the above process, a variety of simulated XRD patterns were derived for all 170 compounds.

Preparation of the training dataset

FIG. 5 shows the process of deriving the mixture data for the compounds, FIG. 6 shows the process of processing the XRD data according to the mixing ratios derived in FIG. 5, and FIG. 7 shows the specific conditions used to derive the mixture data.

As shown in FIG. 5, a variety of mixtures with different components and mixing ratios were derived by selecting compositions and assigning mixing ratios for unary, binary, and ternary mixtures of compounds A, B, and C.

The binary and ternary mixtures were generated from combinations of the 38 classes defined by the groups shown in FIG. 2, rather than from the 170 individual entries. Each time a class was selected, one of its variants among the 170 entries was randomly assigned to that component. For example, when Al₂O₃ was selected as a component of a mixture, one of its 74 variants was randomly assigned to it, so that each of the 74 Al₂O₃ variants appears at least once. With 21 fixed compositions for each ternary mixture and 9 fixed compositions for each binary mixture, there are ₃₈C₃ × 21 + ₃₈C₂ × 9 + ₃₈C₁ = 183,521 distinct mixtures. This could result in each mixture being repeated about four times on average, but because repeated selection was applied only to those mixtures with many variants, almost no identical mixtures (that is, the same components in the same fractions) were found in the full dataset. If the data were not prepared in this way, each of the 38 classes would have an equal chance of selection, so that each individual phase of Al₂O₃, which has 74 variants, would be far less likely to be selected than, for example, Li₂O₂, which has no variants.
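The mixture count quoted above can be checked with a few lines of Python. The function name `count_mixtures` is illustrative only; the class count and the numbers of fixed compositions are the values stated in the text.

```python
from math import comb


def count_mixtures(n_classes=38, ternary_compositions=21, binary_compositions=9):
    """Number of ternary, binary, and unary mixtures for the stated class count."""
    ternary = comb(n_classes, 3) * ternary_compositions
    binary = comb(n_classes, 2) * binary_compositions
    unary = comb(n_classes, 1)
    return ternary, binary, unary


ternary, binary, unary = count_mixtures()
total = ternary + binary + unary
print(ternary, binary, unary, total)   # 177156 6327 38 183521
print(round(800_942 / total, 2))       # 4.36
```

The second figure is the ratio of the 800,942 patterns in the full dataset to the 183,521 distinct mixtures, which corresponds to the roughly fourfold average repetition mentioned above.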

Once the mixing ratios have been derived in this way, the XRD pattern of each mixture is derived by the method shown in FIG. 6. As shown in FIG. 6, the XRD pattern of a mixture was obtained by adding the simulated XRD patterns of compounds A, B, and C weighted according to the mixing ratio of the mixture.
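The blending step can be sketched as a point-by-point weighted sum. The sketch below assumes that the mixing ratio is used directly as the weight of each pattern, which is how the text describes it, and that no further correction is applied; the function name `mix_patterns` is hypothetical.

```python
def mix_patterns(patterns, fractions):
    """Point-by-point weighted sum of equally long patterns.

    patterns:  list of intensity lists, one per compound
    fractions: mixing ratio of each compound, summing to 1
    """
    if len(patterns) != len(fractions):
        raise ValueError("one fraction is needed per pattern")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_points = len(patterns[0])
    if any(len(p) != n_points for p in patterns):
        raise ValueError("all patterns must have the same length")
    return [
        sum(f * p[i] for p, f in zip(patterns, fractions))
        for i in range(n_points)
    ]


if __name__ == "__main__":
    # Toy three-point "patterns" for compounds A, B, and C.
    a, b, c = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    print(mix_patterns([a, b, c], [0.5, 0.3, 0.2]))
```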

For the simulated XRD patterns of the mixtures, a peak profile was selected at random each time two or three compounds were chosen from the 170 entries to form a binary or ternary mixture, so that no two of the total of 800,942 XRD patterns are identical. The specific conditions used in this process are shown in FIG. 7. All binary or ternary combinations made up of entries belonging to the same class were excluded. A randomly selected background and white noise were also applied to each mixture.
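The two-stage selection described here (distinct classes first, then one variant per class) can be sketched as follows. The catalogue below is a placeholder: the 74 Al₂O₃ variants and the single Li₂O₂ entry follow the text, while the variant labels and the SrO count are invented for illustration, and `sample_mixture` is a hypothetical name.

```python
import random


def sample_mixture(variants_by_class, n_components, rng):
    """Pick n distinct classes, then one random variant within each class.

    Sampling classes without replacement guarantees that no mixture combines
    two entries of the same class.
    """
    chosen_classes = rng.sample(sorted(variants_by_class), n_components)
    return [(name, rng.choice(variants_by_class[name])) for name in chosen_classes]


# Placeholder catalogue; labels and the SrO count are assumptions.
catalogue = {
    "Al2O3": ["Al2O3 variant %d" % i for i in range(1, 75)],
    "Li2O2": ["Li2O2 variant 1"],
    "SrO": ["SrO variant 1", "SrO variant 2"],
}

if __name__ == "__main__":
    print(sample_mixture(catalogue, 2, random.Random(0)))
```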

The resulting XRD patterns were randomly split into a training dataset (600,942 patterns), a validation dataset (100,000 patterns), and a test dataset (100,000 patterns).
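A random split of this kind might look like the following sketch, which shuffles pattern indices and slices them into three disjoint groups. The function name `split_indices` and the fixed seed are assumptions; the sizes are those stated above.

```python
import random


def split_indices(n_total, n_validation, n_test, seed=0):
    """Shuffle 0..n_total-1 and cut it into train / validation / test index lists."""
    if n_validation + n_test > n_total:
        raise ValueError("validation and test sets exceed the dataset size")
    indices = list(range(n_total))
    random.Random(seed).shuffle(indices)
    test = indices[:n_test]
    validation = indices[n_test:n_test + n_validation]
    train = indices[n_test + n_validation:]
    return train, validation, test


if __name__ == "__main__":
    train, validation, test = split_indices(800_942, 100_000, 100_000)
    print(len(train), len(validation), len(test))   # 600942 100000 100000
```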

In addition, a reduced dataset was prepared separately, using random composition selection and no repeated selection so as to avoid similarity among the data; the number of mixtures in this reduced dataset is ₃₈C₃ × 21 + ₃₈C₂ × 9 + ₃₈C₁ = 183,521.

Three datasets were prepared in this way. 'Dataset_800k_org' denotes the 800,942 XRD patterns generated, as shown in the upper part of FIG. 5, by dividing the compositions of compounds A, B, and C at equal intervals into 21 ternary and 9 binary compositions, with mixtures whose components have many ICSD entries among the materials shown in FIG. 2 generated repeatedly under the conditions shown in FIG. 7.

'Dataset_800k_rand' denotes the 800,942 XRD patterns generated, as shown in the lower part of FIG. 5, by choosing the 21 ternary and 9 binary compositions of compounds A, B, and C at random, with mixtures whose components have many ICSD entries among the materials shown in FIG. 2 generated repeatedly under the conditions shown in FIG. 7.

'Dataset_180k_rand' denotes the 183,521 XRD patterns generated by choosing the 21 ternary and 9 binary compositions of compounds A, B, and C at random, without repetition.

Deep learning with a CNN

Because the datasets in this embodiment are large and consist of multiphase XRD patterns, machine learning was performed with a CNN, which is well suited to analyzing such data.

FIGS. 8 and 9 show the two CNN architectures used for deep learning in this embodiment.

As shown in FIGS. 8 and 9, each CNN architecture consists of several convolutional layers and pooling layers followed by three consecutive fully connected layers.

Specifically, the architecture with two convolutional layers, shown in FIG. 8, is referred to as 'CNN_2', and the architecture with three convolutional layers, shown in FIG. 9, is referred to as 'CNN_3'. For both architectures, the number of filters, kernel size, pooling size, and stride are as shown in FIGS. 8 and 9.

As shown in FIG. 8, in this embodiment a stride larger than the pooling size was adopted for one of the convolutional layers, and this configuration converged markedly faster than conventional pooling. When CNN2 and CNN3 were also tested with strides equal to the pooling size, convergence was relatively slow but the accuracy was substantially the same.
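The effect of a stride larger than the pooling size can be seen in a small pure-Python sketch of one-dimensional max pooling. The pool size of 2 and stride of 3 used in the demonstration are illustrative values only; the actual sizes are those given in FIGS. 8 and 9, and the function names are hypothetical.

```python
def max_pool1d(signal, pool_size, stride):
    """1-D max pooling without padding; windows start every `stride` samples."""
    last_start = len(signal) - pool_size
    return [max(signal[i:i + pool_size]) for i in range(0, last_start + 1, stride)]


def pooled_length(n_points, pool_size, stride):
    """Output length of max_pool1d for an input of n_points samples."""
    return (n_points - pool_size) // stride + 1


if __name__ == "__main__":
    signal = list(range(10))                           # 0, 1, ..., 9
    print(max_pool1d(signal, pool_size=2, stride=2))   # [1, 3, 5, 7, 9]
    print(max_pool1d(signal, pool_size=2, stride=3))   # [1, 4, 7]
```

With the stride equal to the pool size every sample falls into exactly one window, whereas with a larger stride some samples (here 2, 5, and 8) are skipped and the output is shorter, so the following layers process fewer values.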

The rectified linear unit (ReLU) was adopted as the activation function for both the convolutional layers and the fully connected layers, and dropout was applied only to the fully connected layers. The final activation function of the fully connected layers was the sigmoid function, and the corresponding cost (loss) function was the cross-entropy function. The input was a 4501×1 vector and the output a 38×1 vector. The Adam optimizer was used, and the learning rate was fixed at 0.001 for all epochs for phase identification.
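A sigmoid output with a cross-entropy cost suits this problem because several of the 38 classes can be present in one pattern at the same time. The sketch below illustrates that output stage in plain Python; it assumes the cross-entropy is computed per output and averaged, and that a phase is reported as present when its sigmoid output is at least 0.5. Neither detail is stated in the text, and all function names are hypothetical.

```python
import math

N_CLASSES = 38  # length of the output vector stated in the text


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def multi_hot(present, n_classes=N_CLASSES):
    """Target vector with 1.0 at the index of every phase present in the mixture."""
    target = [0.0] * n_classes
    for index in present:
        target[index] = 1.0
    return target


def cross_entropy(logits, target, eps=1e-12):
    """Mean binary cross-entropy over all outputs (assumed form of the cost)."""
    total = 0.0
    for z, t in zip(logits, target):
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(target)


def predicted_phases(logits, threshold=0.5):
    """Indices whose sigmoid output reaches the (assumed) decision threshold."""
    return [i for i, z in enumerate(logits) if sigmoid(z) >= threshold]


if __name__ == "__main__":
    target = multi_hot([0, 5, 12])                       # a ternary mixture
    logits = [4.0 if t == 1.0 else -4.0 for t in target]
    print(predicted_phases(logits), cross_entropy(logits, target))
```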

A predictive model was derived by training the above CNN architectures on the training dataset.

The CNN architecture applied in this embodiment may be modified in various ways and is not limited to the architectures presented here. Nor is it necessary to use only a CNN; for example, a recurrent neural network (RNN) may also be used, and generative adversarial network (GAN) techniques may be applied, in particular to compensate for differences between simulated and measured XRD patterns.

Phase identification results using the CNN predictive model

The predictive models obtained by training the CNN2 and CNN3 architectures on the training datasets described above were tested to determine whether they can perform phase identification.

(1) Dataset_800k_org

FIG. 10 shows the training cost and accuracy obtained with 'Dataset_800k_org'. As can be seen in FIG. 10, the validation accuracy reached almost 100% for both the CNN2 and CNN3 architectures, and the validation cost decreased to 0.007 and 0.0018 for CNN2 and CNN3, respectively.

To test CNN2 and CNN3 trained on 'Dataset_800k_org', 100,000 simulated XRD patterns that do not overlap with the training dataset were prepared. In addition, for each of two ternary systems, Li₂O-SrO-Al₂O₃ and SrAl₂O₄-SrO-Al₂O₃, 50 mixture samples were prepared by dry-mixing the three compounds in various fractions, and their measured XRD patterns were used in the phase identification test.

Table 1 below shows the results of testing with these two kinds of test dataset.

Table 1. Phase identification accuracy (2 epochs), models trained on Dataset_800k_org

Test dataset | CNN2 | CNN3
Simulated XRD test dataset (100,000 patterns) | 99.60% | 100%
Real XRD test dataset, Li₂O_SrO_Al₂O₃ (50 patterns) | 100% | 100%
Real XRD test dataset, SrAl₂O₄_SrO_Al₂O₃ (50 patterns) | 97.33% | 98.67%

상기 표 1에 나타난 것과 같이, 시뮬레이션된 테스트 데이터세트는 CNN2 및 CNN3 아키텍처에 모두 테스트 정확도가 거의 완벽한 수준으로 나타났다. 또한, Li₂O-SrO-Al₂O₃ 데이터세트의 테스트 결과도 CNN2 및 CNN3 모두에 대해 정확도 100%로 완벽하게 일치하였다.As shown in Table 1 above, the simulated test dataset showed nearly perfect test accuracy for both the CNN2 and CNN3 architectures. In addition, the test results of the Li ₂ O-SrO-Al ₂ O ₃ dataset were also perfectly matched with 100% accuracy for both CNN2 and CNN3.

한편, SrAl₂O₄-SrO-Al₂O₃ 데이터세트의 경우 테스트 정확도가 다소 저하되는 결과를 나타내는데, 이는 혼합원료로 사용된 SrAl₂O₄(SAO) 분말에 일정량의 불순물 상이 포함된 것에 기인하는 것으로 확인되었으며, 이는 약간의 잘못 예측된 결과가 CNN 모델의 오류가 아니라는 점을 의미한다.On the other hand, in the case of the SrAl ₂ O ₄ -SrO-Al ₂ O ₃ dataset, the test accuracy is somewhat deteriorated, which is due to the fact that a certain amount of impurity phase is contained in the SrAl ₂ O ₄ (SAO) powder used as a mixed raw material. And some mispredicted results are not an error in the CNN model.

These test results show that the method according to the embodiment of the present invention can identify the phases contained in a multi-component material, in which various substances are mixed, with very high accuracy.

(2) Phase identification using Dataset_800k_rand / Dataset_180k_rand

Unlike the dataset above, the datasets 'Dataset_800k_rand' and 'Dataset_180k_rand' define their two-compound and three-compound mixtures by a random method. They are intended to compare the effect of the blending method used to generate mixture data when preparing the training dataset.
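The difference between a fixed and a random blending scheme can be sketched as follows. This passage does not state how either dataset chooses its compounds or fractions, so everything here is an assumption made for illustration: the fixed-step grid, the uniform random fractions, and the names `grid_fractions`, `random_fractions` and `random_mixture`.

```python
import itertools
import random


def grid_fractions(n_components, step=0.1):
    """All fraction tuples on a regular grid that sum to 1, every part > 0.

    Hypothetical stand-in for a systematic blending scheme.
    """
    steps = round(1 / step)
    combos = []
    for parts in itertools.product(range(1, steps), repeat=n_components):
        if sum(parts) == steps:
            combos.append(tuple(p / steps for p in parts))
    return combos


def random_fractions(n_components, rng):
    """One random fraction tuple summing to 1.

    Normalised exponential draws are uniform over the simplex
    (assumed distribution; the source does not specify one).
    """
    draws = [rng.expovariate(1.0) for _ in range(n_components)]
    total = sum(draws)
    return tuple(d / total for d in draws)


def random_mixture(compounds, rng):
    """Pick 2 or 3 distinct compounds at random and give them random fractions."""
    n = rng.choice([2, 3])
    chosen = rng.sample(compounds, n)
    return dict(zip(chosen, random_fractions(n, rng)))


rng = random.Random(0)  # seeded only to make the sketch repeatable
compounds = ["Li2O", "SrO", "Al2O3", "SrAl2O4"]
mixture = random_mixture(compounds, rng)
print(len(grid_fractions(3)), mixture)
```

A grid with step 0.1 yields a fixed, small set of compositions per compound combination (36 for three components), while the random scheme draws a new composition for every training pattern.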

As shown in FIG. 11, substantially the same cost and accuracy were achieved as when the CNN was trained using Dataset_800k_org.

Table 2 below shows the results of testing with the test datasets described above.

Phase discrimination accuracy (2 epochs), CNN architecture: CNN3

| Test dataset | Dataset_800k_rand | Dataset_180k_rand |
| --- | --- | --- |
| Simulated XRD test dataset | 100% (100,000 patterns) | 99.76% (23,000 patterns) |
| Real XRD test dataset, Li₂O_SrO_Al₂O₃ (50 patterns) | 100% | 98.67% |
| Real XRD test dataset, SrAl₂O₄_SrO_Al₂O₃ (50 patterns) | 98% | 97.33% |

As shown in Table 2, both the simulated test dataset and the real Li₂O-SrO-Al₂O₃ test dataset were identified with nearly 100% accuracy when training used 'Dataset_800k_rand'. When training used 'Dataset_180k_rand', test accuracy tended to be somewhat lower on both the simulated and the real test datasets, which appears to result from the relatively small size of that training dataset.

These results show that the blending method used for the mixtures when preparing the training dataset has almost no effect, while the size of the dataset makes a slight difference in accuracy.

Claims

A method of identifying and analyzing a material, comprising:
(a) selecting two or more elements;
(b) collecting data on a plurality of compounds analyzed as being producible from the two or more elements;
(c) preparing analysis data in the form of an image or spectrum for each of the collected plurality of compounds;
(d) creating training data, including selecting two or more compounds from among the plurality of compounds, mixing them at a predetermined mixing ratio, and mixing and processing the respective image or spectrum data according to the predetermined mixing ratio;
(e) performing machine learning using the training data; and
(f) identifying and/or analyzing analysis data in the form of an image or spectrum obtained from an actual material, using a model obtained through the machine learning.
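Step (d) can be illustrated with a minimal sketch. It assumes that "mixing and processing" the spectrum data means a point-by-point weighted sum of single-compound patterns sampled on a common grid, followed by rescaling to a maximum intensity of 1. The claim does not specify either operation, and the function name `mix_patterns` and the toy intensity values are hypothetical.

```python
def mix_patterns(patterns, fractions):
    """Weighted point-by-point sum of equal-length intensity lists.

    Assumed mixing rule: linear combination by mixing ratio, then
    rescaling so the strongest peak equals 1.
    """
    if len(patterns) != len(fractions):
        raise ValueError("one fraction is required per pattern")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    length = len(patterns[0])
    if any(len(p) != length for p in patterns):
        raise ValueError("patterns must share the same sampling grid")

    mixed = [
        sum(f * p[i] for p, f in zip(patterns, fractions))
        for i in range(length)
    ]
    peak = max(mixed)
    return [v / peak for v in mixed] if peak > 0 else mixed


# Toy single-compound patterns on a shared grid (made-up numbers).
pattern_a = [0.0, 1.0, 0.0, 0.2, 0.0]
pattern_b = [0.0, 0.0, 1.0, 0.0, 0.5]

sample = mix_patterns([pattern_a, pattern_b], [0.7, 0.3])
label = {"A": 0.7, "B": 0.3}  # training target paired with the mixed pattern
print(sample, label)
```

Each mixed pattern is stored together with the compounds and ratio that produced it, so the pair can serve as one training example for step (e).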

The method of identifying and analyzing a material according to claim 1,
wherein the analysis data of the plurality of compounds uses previously analyzed chemical analysis and material analysis data.

The method of identifying and analyzing a material according to claim 1,
wherein, in the step of preparing the analysis data in the form of an image or spectrum for each of the plurality of compounds, the analysis data is obtained either from actually analyzed results or by simulating the corresponding analysis image or spectrum with a program using material information about each compound.

The method of identifying and analyzing a material according to claim 1,
wherein the analysis data in the form of an image or spectrum is XRD data, XPS data, IR data, or a transmission electron microscope diffraction pattern.

The method of identifying and analyzing a material according to claim 1,
wherein the machine learning is performed using a convolutional neural network (CNN).