CN116933151A

CN116933151A - Method for distinguishing deposit types based on sphalerite trace elements

Info

Publication number: CN116933151A
Application number: CN202310533691.0A
Authority: CN
Inventors: 赵红涛; 邵拥军; 张宇
Original assignee: Central South University
Current assignee: Central South University
Priority date: 2023-05-12
Filing date: 2023-05-12
Publication date: 2023-10-24

Abstract

The invention relates to a method for distinguishing deposit types based on sphalerite trace elements, and belongs to the technical field of mineral deposit prospecting. The invention is the first to combine a sphalerite trace element database, a training set and a test set built from that database, and machine learning methods. It uses the trace elements of sphalerite, a mineral widespread in many kinds of deposits, as indicators for judging the deposit type, improves the efficiency and accuracy of prospecting for sphalerite-bearing deposits, and addresses the high difficulty and cost of prospecting at depth and on the periphery of known deposits.

Description

Method for distinguishing deposit types based on sphalerite trace elements

Technical Field

The invention relates to a method for distinguishing deposit types based on sphalerite trace elements, and belongs to the technical field of mineral deposit prospecting.

Background

Determining deposit genesis remains one of the most critical yet challenging problems in ore deposit research. Correctly determining the genesis of a deposit helps in understanding regional, large-scale ore-forming processes, and applying a deposit model early can significantly improve exploration efficiency. Different deposit types are characterized by different sources of ore-forming material, physicochemical conditions and ore-forming processes, all of which significantly affect the trace element composition of minerals. Mineral trace element geochemistry is therefore widely used to determine deposit genesis, most commonly with garnet, quartz, pyrite and magnetite. However, the same mineral from deposits of different genetic types may have similar trace element geochemistry, which makes the deposit type difficult to determine.

Sphalerite is the most important zinc-bearing ore mineral and occurs in many types of deposits, including volcanogenic massive sulfide (VMS), Mississippi Valley-type (MVT), porphyry, epithermal (EPI), sedimentary exhalative (SEDEX) and skarn deposits. Sphalerite can incorporate a variety of trace elements by substitution, and their contents can be used to distinguish between deposit types. Over the past several decades, studies have sought to classify deposit types using sphalerite trace elements. The traditional approach discriminates deposit types using binary diagrams of Mn-Fe, Co/Ni-Cd/Fe, Cd/Fe-Mn and Ge-In and a ternary diagram of Cd-Mn-1000Ge. However, because sphalerite from different deposit types has similar trace element compositions, the existing discrimination diagrams cannot accurately distinguish the deposit types.

A literature search shows that, to date, there is no report of accurately and efficiently discriminating deposit types by constructing a sphalerite trace element database, a training set and a test set and combining them with machine learning methods.

Disclosure of Invention

The invention is the first to combine a sphalerite trace element database, a training set and a test set built from that database, and machine learning methods. It uses the contents of sphalerite trace elements, which are widespread in many kinds of deposits, as indicators for judging the deposit type, improves the efficiency and accuracy of prospecting for sphalerite-bearing deposits, and addresses the high difficulty and cost of prospecting at depth and on the periphery of known deposits.

The invention discloses a method for accurately distinguishing deposit types based on sphalerite trace elements, which specifically comprises the following steps:

Step one: establishing a sphalerite trace element database

Sphalerite trace element data are collected from literature published worldwide;

the collected data are sorted to create a database of at least 3000 sets of trace element data from six deposit types;

the database comprises deposit names, deposit locations, deposit types and sphalerite trace element contents;

in the database, the deposit name, the deposit type and the sphalerite trace element contents correspond to one another, that is, each deposit is associated with its deposit type and with the trace element contents of the sphalerite in that deposit.

The six deposit types comprise: volcanogenic massive sulfide (VMS), Mississippi Valley-type (MVT), porphyry, epithermal (EPI), sedimentary exhalative (SEDEX) and skarn deposits;

the sphalerite trace elements comprise at least 10 of Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu;

the database is what the subsequent machine learning model learns from and is the key to discriminating deposit types; building it requires a large amount of sphalerite trace element content data, each set accurately assigned to its deposit type;
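As an illustration of the database layout described in step one, one record could be sketched as follows. The class name `SphaleriteRecord`, its field names and the sample values are hypothetical and are not taken from the database of the invention; a missing measurement is represented here as `None`.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Deposit types and trace elements named in the description.
DEPOSIT_TYPES = ("VMS", "MVT", "Porphyry", "EPI", "SEDEX", "Skarn")
ELEMENTS = ("Ag", "As", "Cd", "Co", "Ga", "Ge", "Sb",
            "Pb", "Fe", "Mn", "In", "Sn", "Cu")


@dataclass
class SphaleriteRecord:
    """One set of sphalerite trace element data (hypothetical layout)."""
    deposit_name: str
    deposit_location: str
    deposit_type: str
    # element symbol -> content; None marks a missing value
    contents: Dict[str, Optional[float]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.deposit_type not in DEPOSIT_TYPES:
            raise ValueError(f"unknown deposit type: {self.deposit_type}")
        unknown = set(self.contents) - set(ELEMENTS)
        if unknown:
            raise ValueError(f"unknown elements: {sorted(unknown)}")
        # Elements that were not reported are stored as missing.
        for element in ELEMENTS:
            self.contents.setdefault(element, None)


# Illustrative values only.
record = SphaleriteRecord("Deposit A", "Region X", "MVT",
                          {"Cd": 2500.0, "Fe": 12000.0})
```

Keeping the deposit type on every record preserves the one-to-one correspondence between a deposit, its type and its trace element contents that the later steps rely on.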

Step two: data preprocessing

Nearest-neighbor interpolation and centered log-ratio (CLR) transformation are applied to the data for each sphalerite trace element in the database established in step one, so that the covariance of the data is unchanged and the data conform to a normal distribution. The sphalerite trace element data comprise the contents of Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu;

during preprocessing, the trace element data are processed separately for each known deposit type, so that the trace element data within each deposit type become normally distributed; the data comprise content data;
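The centered log-ratio transformation named in step two can be sketched as below. This is a minimal illustration under the usual definition, clr(x)_i = ln(x_i) - mean(ln(x)); the function name `clr` and the sample values are assumptions, and the invention itself performs this step in ioGAS.

```python
import math


def clr(composition):
    """Centered log-ratio transform: ln(x_i) minus the mean of ln(x)."""
    if any(value <= 0 for value in composition):
        raise ValueError("CLR requires strictly positive values")
    logs = [math.log(value) for value in composition]
    mean_log = sum(logs) / len(logs)
    return [log_value - mean_log for log_value in logs]


# Illustrative contents for one analysis (not real data).
sample = [12.0, 3400.0, 0.8, 56000.0]
transformed = clr(sample)
```

The transformed values of one analysis sum to zero, and values at or below zero must be handled beforehand, which is one reason missing values are imputed before the transformation.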

Step three: building the training set and test set

To train and test the random forest (Random Forest) and gradient boosting decision tree (Gradient Boosting) classifiers, a training set and a test set are established. At least 200, preferably 200-400, sets of data are extracted from each deposit type in the sphalerite trace element database to train the random forest and gradient boosting classifiers; to avoid bias toward classes with more data, the same amount of data is randomly extracted from each deposit type by a random function; the remaining data are used for testing, yielding the classification matrices for training and testing. These classification matrices are used to evaluate the accuracy of model training: each column represents a predicted deposit type, and the column total is the number of data predicted as that deposit type; each row represents the true deposit type, and the row total is the number of data instances of that deposit type.
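The classification matrix described above (rows for true deposit types, columns for predicted deposit types) can be sketched as follows. The function names and the six toy labels are illustrative assumptions, not results of the invention.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Rows are true deposit types, columns are predicted deposit types."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of data on the diagonal, i.e. classified correctly."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Toy labels for three of the six deposit types.
classes = ["MVT", "Skarn", "VMS"]
y_true = ["MVT", "MVT", "Skarn", "VMS", "VMS", "Skarn"]
y_pred = ["MVT", "Skarn", "Skarn", "VMS", "MVT", "Skarn"]

cm = confusion_matrix(y_true, y_pred, classes)
row_totals = [sum(row) for row in cm]            # data per true type
column_totals = [sum(col) for col in zip(*cm)]   # data per predicted type
accuracy = overall_accuracy(cm)
```

Off-diagonal cells show which deposit types are confused with one another, which is more informative than the overall accuracy alone.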

Step four: establishing a machine learning model

A machine learning model is established using the random forest and gradient boosting algorithms. Random forest and gradient boosting use bootstrap sampling to randomly draw training samples from the sample set, generating decision trees and training subsets. When a decision tree is constructed, each node is split optimally; the quality of node splitting is therefore very important for building the trees. Model hyperparameters are tuned by cross-validation. Splitting stops when the number of decision trees is greater than or equal to 4500, preferably 5000, the depth is greater than or equal to 3, and the generated child nodes appear N times, where N is less than or equal to 6; these values are the optimal parameters of the model;

the model randomly samples the original data set to form n different sample data sets, builds n different decision tree models from these data sets, and obtains the final result from the votes of the decision tree models.
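The sampling-and-voting scheme described above can be sketched in a few lines. The helper names, the seed and the five votes are hypothetical; no decision trees are trained here, so the votes merely stand in for the outputs of n = 5 trees.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement from the original data set."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(votes):
    """Return the deposit type predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]


rng = random.Random(0)
rows = list(range(10))  # stand-in for 10 training rows

# n = 5 different sample data sets, one per decision tree.
subsets = [bootstrap_sample(rows, rng) for _ in range(5)]

# Hypothetical predictions of the 5 trees for a single analysis.
votes = ["MVT", "Skarn", "MVT", "MVT", "VMS"]
final_label = majority_vote(votes)
```

The voting step corresponds to the random forest; in gradient boosting as commonly implemented, trees are fitted sequentially and their outputs are summed rather than voted on.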

In the invention, the hyperparameters of the algorithms need to be adjusted for different data types to achieve the best effect. The key parameters of the machine learning model are: n_estimators=5000, max_depth=3, min_samples_split=6;
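The three key parameters share their names with arguments of scikit-learn's `RandomForestClassifier` and `GradientBoostingClassifier`, so one possible configuration is sketched below. The use of scikit-learn is an assumption (the description names Orange for the ROC evaluation), the data are synthetic stand-ins from `make_classification`, and the demo fit uses far fewer trees than 5000 only to keep it quick.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

# Key parameters stated in the description.
KEY_PARAMS = {"n_estimators": 5000, "max_depth": 3, "min_samples_split": 6}


def build_models(n_estimators=KEY_PARAMS["n_estimators"], random_state=0):
    """Return both classifiers configured with the key parameters."""
    params = dict(KEY_PARAMS, n_estimators=n_estimators,
                  random_state=random_state)
    return {
        "random_forest": RandomForestClassifier(**params),
        "gradient_boosting": GradientBoostingClassifier(**params),
    }


# Synthetic stand-in: 13 features (one per element), 3 classes.
X, y = make_classification(n_samples=120, n_features=13, n_informative=6,
                           n_classes=3, random_state=0)

# Reduced tree count for a fast demonstration only.
models = build_models(n_estimators=20)
for model in models.values():
    model.fit(X, y)
```

Calling `build_models()` without arguments yields the stated configuration with 5000 trees.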

Step five: evaluating the reliability of the model

The receiver operating characteristic (ROC) curves of the random forest and gradient boosting models are obtained using Orange software. An ROC curve describes the true positive rate and the false positive rate, and is drawn by plotting the true positive rate on the y-axis against the false positive rate on the x-axis; the area under the ROC curve (AUC) is commonly used as a measure of classifier performance. The AUC ranges from 0 to 1. A model with an AUC greater than 0.5 is considered reliable, and the closer the AUC is to 1, the more reliable the model.
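To make the ROC and AUC definitions concrete, the sketch below computes ROC points and the trapezoidal area under them for one deposit type treated as the positive class against all others. The function names and the scores are illustrative assumptions; the invention obtains these curves with Orange.

```python
def roc_curve_points(labels, scores):
    """Return (false positive rate, true positive rate) points.

    labels: 1 for the positive class, 0 otherwise.
    scores: predicted probability of the positive class.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume all samples tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Illustrative scores: 1 = sample belongs to the deposit type in question.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
area = auc(roc_curve_points(labels, scores))
```

With six deposit types, one such curve can be drawn per type and the areas averaged; an area of 0.5 corresponds to random guessing, which is why values above 0.5 are treated as the threshold for a reliable model.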

Step six: discriminating the deposit type

The content of each trace element in the sphalerite of the deposit to be classified is obtained, and the deposit type is predicted using the reliable machine learning model obtained in step five; that is, the model established with the random forest and gradient boosting algorithms is applied to obtain a classification matrix for the trace elements of the sphalerite to be classified, and the deposit type is judged from this classification matrix.
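One way to read "judging the deposit type according to the classification matrix" is to count how many analyses of the unknown deposit fall into each predicted type and take the most frequent one. The sketch below follows that reading; the rule, the function name `judge_deposit_type` and the ten predictions are assumptions rather than details given in the description.

```python
from collections import Counter


def judge_deposit_type(predicted_labels):
    """Return (most frequent type, its share, counts per predicted type)."""
    if not predicted_labels:
        raise ValueError("no predictions supplied")
    counts = Counter(predicted_labels)
    best_type, best_count = counts.most_common(1)[0]
    return best_type, best_count / len(predicted_labels), dict(counts)


# Hypothetical per-analysis predictions for one unknown deposit.
predictions = ["MVT"] * 8 + ["SEDEX"] * 2
deposit_type, share, counts = judge_deposit_type(predictions)
```

Reporting the share alongside the label indicates how unanimous the per-analysis predictions were.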

In step two, a missing value arises either because the content is below the detection limit of the analytical instrument or because some studies did not analyze individual elements. Missing values can distort the estimates of mean and variance in the analysis, so the invention imputes them using the k-nearest-neighbor method and applies the centered log-ratio transformation so that the data conform to a normal distribution.

The database further comprises mineral paragenesis parameters, which parameterize the geological information on mineral paragenesis: a coexisting mineral present in a deposit is marked 1 and an absent one is marked 0. The coexisting minerals comprise at least one of chalcopyrite, pyrite, galena, arsenopyrite and magnetite.
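The presence/absence coding of coexisting minerals can be sketched as follows. The function name and the fixed column order are assumptions made for illustration.

```python
# Coexisting minerals named in the description, in a fixed column order.
COEXISTING_MINERALS = ("chalcopyrite", "pyrite", "galena",
                       "arsenopyrite", "magnetite")


def encode_paragenesis(minerals_present):
    """Return 1 for each listed mineral present in the deposit, else 0."""
    present = {name.lower() for name in minerals_present}
    return [1 if name in present else 0 for name in COEXISTING_MINERALS]


# A deposit in which pyrite and galena accompany sphalerite.
flags = encode_paragenesis(["Pyrite", "Galena"])
```

The resulting 0/1 columns can be appended to the trace element columns so that the classifiers see both kinds of information.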

Preferably, in the method for accurately distinguishing deposit types based on sphalerite trace elements of the invention, during data preprocessing,

the elements with fewer than 40% missing values among the sphalerite trace element contents are selected, namely Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu;

a missing value arises either because the content is below the detection limit of the analytical instrument or because some studies did not analyze individual elements;

missing values of the sphalerite trace elements (Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu) in the database are imputed in the data processing software XLSTAT using a nearest-neighbor method based on Euclidean distance, without changing the covariance of the data set;

to make the sphalerite trace elements conform to a normal distribution, a centered log-ratio transformation is applied to them in ioGAS.
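The element selection and nearest-neighbor imputation described above can be sketched as below. This is a simplified stand-in for the XLSTAT procedure, not a reproduction of it: the function names, the choice of k, the use of the neighbors' mean, and the toy rows are all assumptions, and distances are computed only over elements observed in both rows.

```python
import math


def missing_fraction(rows, column):
    """Fraction of rows in which the element was not measured."""
    return sum(1 for row in rows if row[column] is None) / len(rows)


def select_elements(rows, columns, max_missing=0.40):
    """Keep elements whose missing fraction is below the threshold."""
    return [c for c in columns if missing_fraction(rows, c) < max_missing]


def knn_impute(rows, columns, k=3):
    """Fill None with the mean of the k nearest rows (Euclidean distance)."""
    filled = [dict(row) for row in rows]
    for i, row in enumerate(rows):
        for col in columns:
            if row[col] is not None:
                continue
            candidates = []
            for j, other in enumerate(rows):
                if j == i or other[col] is None:
                    continue
                shared = [c for c in columns if c != col
                          and row[c] is not None and other[c] is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((row[c] - other[c]) ** 2
                                     for c in shared))
                candidates.append((dist, other[col]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [value for _, value in candidates[:k]]
            if nearest:
                filled[i][col] = sum(nearest) / len(nearest)
    return filled


# Toy rows (illustrative values only); None marks a missing value.
rows = [
    {"Cd": 1.0, "Fe": 10.0, "Mn": 5.0},
    {"Cd": 1.1, "Fe": 11.0, "Mn": None},
    {"Cd": 5.0, "Fe": 50.0, "Mn": 25.0},
    {"Cd": 1.2, "Fe": 12.0, "Mn": 6.0},
]
kept = select_elements(rows, ["Cd", "Fe", "Mn"])
filled = knn_impute(rows, kept, k=2)
```

In the toy data the row with missing Mn is closest to the first and last rows, so the imputed value is the mean of their Mn contents rather than being pulled toward the distant third row.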

Preferably, in the method for accurately distinguishing deposit types based on sphalerite trace elements of the invention, during model training,

a machine learning model is established using the random forest and gradient boosting algorithms and trained with the sphalerite trace element training set;

random forest and gradient boosting use bootstrap sampling to randomly draw training samples from the sample set, generating decision trees and training subsets. When a decision tree is constructed, each node is split optimally; the quality of node splitting is therefore very important for building the trees. Splitting stops when the number of decision trees is 4500 or more, preferably 5000 or more, the depth is 3 or more, and the generated child nodes appear N times, where N is less than or equal to 6; these values are the optimal parameters of the model. In developing the technique, the inventors found that the six deposit types are very difficult to discriminate from sphalerite trace element contents alone, because sphalerite from the six deposit types has similar trace element contents; higher accuracy could only be obtained through repeated parameter tuning and by adding geological conditions as constraints.

In developing the technique, the inventors also tried classifying deposit types with several machine learning methods, including random forests, gradient boosting decision trees, artificial neural networks, the least absolute shrinkage and selection operator (lasso), support vector machines and k-nearest neighbors; the models built with the random forest and gradient boosting decision tree methods were found to be more reliable and accurate.

In particular, VMS deposits have trace element contents similar to, and highly overlapping with, those of the other deposit types, which makes VMS deposits difficult to distinguish from them; parameterizing the geological information was found to improve the result markedly.

Drawings

Fig. 1 is a schematic flow chart of an implementation for accurately distinguishing the type of a deposit based on sphalerite trace elements.

Detailed Description

Example 1

S1, data collection:

According to recently published literature, 4095 sets of sphalerite trace element data were collected from 86 deposits worldwide. These 86 deposits comprise 11 epithermal deposits, 27 Mississippi Valley-type deposits, 4 porphyry deposits, 5 sedimentary exhalative deposits, 26 skarn deposits and 12 volcanogenic massive sulfide deposits; the element statistics for each deposit type are shown in Table 1. The database draws on a wide range of data sources and covers sphalerite geochemical data from deposits around the globe, which ensures that the model is not restricted to any one region.

The database comprises deposit names, deposit locations, deposit types, mineral paragenesis parameters and sphalerite trace element contents;

in the database, the deposit name, the deposit type and the sphalerite trace element contents correspond to one another, that is, each deposit is associated with its deposit type and with the trace element contents of the sphalerite in that deposit.

The mineral paragenesis parameters parameterize the geological information on mineral paragenesis: a coexisting mineral present in a deposit is marked 1 and an absent one is marked 0; the coexisting minerals comprise at least one of chalcopyrite, pyrite, galena, arsenopyrite and magnetite;

the sphalerite trace elements comprise at least Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu;

in the database, the content of each trace element in each deposit type is shown in Table 1;

Table 1. Trace element contents of the six deposit types in the database

N = number; MIN = minimum; MAX = maximum; MEAN = average;

S2, data preprocessing:

Nearest-neighbor interpolation and centered log-ratio (CLR) transformation are applied to the data for each sphalerite trace element in the database established in step S1, so that the covariance of the data is unchanged and the data conform to a normal distribution. The sphalerite trace element data comprise the contents of Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu;

the specific operation is as follows:

for each deposit type, the following operations are performed once the deposit type is determined:

1. the elements with fewer than 40% missing values among the sphalerite trace elements are selected, namely Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu;

2. missing values of the sphalerite trace elements (Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu) in the database are imputed in the data processing software XLSTAT using a nearest-neighbor method based on Euclidean distance, without changing the covariance of the data set;

3. so that the content of each sphalerite trace element (Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu) conforms to a normal distribution, a centered log-ratio transformation is applied to the sphalerite trace elements in ioGAS;

S3, establishing the sphalerite trace element test set and training set

1. Using a random function, 300 sets of sphalerite trace element data are randomly selected from each deposit type in the preprocessed database to build the training set;

2. the remaining sphalerite trace element data of each deposit type are used to build the test set;
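The balanced split of step S3 can be sketched as follows: the same number of data sets is drawn at random from each deposit type for training, and everything left over forms the test set. The function name, the seed and the toy records are assumptions; the example uses 300 per type, whereas the toy call below uses 4.

```python
import random


def balanced_split(records, per_type, seed=0):
    """Split (deposit_type, features) records into train and test lists.

    per_type records are drawn at random from every deposit type for
    training; the remainder of each type goes to the test set.
    """
    rng = random.Random(seed)
    by_type = {}
    for record in records:
        by_type.setdefault(record[0], []).append(record)
    train, test = [], []
    for deposit_type, group in by_type.items():
        if len(group) < per_type:
            raise ValueError(f"not enough data for {deposit_type}")
        shuffled = group[:]
        rng.shuffle(shuffled)
        train.extend(shuffled[:per_type])
        test.extend(shuffled[per_type:])
    return train, test


# Toy records: 10 MVT and 6 skarn analyses with placeholder features.
records = ([("MVT", i) for i in range(10)]
           + [("Skarn", i) for i in range(6)])
train, test = balanced_split(records, per_type=4)
```

Because the training set is balanced while the test set keeps the leftovers, the test set can be strongly unbalanced between deposit types, which is worth bearing in mind when reading overall accuracy.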

S4, model training

1. A machine learning model is established using the random forest and gradient boosting algorithms and trained with the training set built above;

2. random forest and gradient boosting use bootstrap sampling to randomly draw training samples from the sample set, generating decision trees and training subsets. When a decision tree is constructed, each node is split optimally; the quality of node splitting is therefore very important for building the trees. Splitting stops when the number of decision trees is 5000, the depth is 3, and the generated child nodes appear 6 times; these are the optimal parameters of the model (n_estimators=5000, max_depth=3 and min_samples_split=6);

3. the classification matrices for deposit type identification by random forest and gradient boosting on the sphalerite trace element test set are obtained (Table 2); the overall classification accuracies are 93.02% and 92.82%, respectively.

Table 2. Classification matrix for the sphalerite trace element test set

S5, evaluating reliability of the model

1. The reliability of the machine learning model is evaluated using the test set.

The performance of both models was evaluated with AUC values obtained from receiver operating characteristic (ROC) curves. The ROC curves of the random forest and gradient boosting models were obtained using Orange software, giving AUC values of 0.989 and 0.991 for deposit type identification by random forest and gradient boosting, respectively, which shows that both machine learning models are highly reliable.

Example 2

Steps S1 to S4 are identical to those of Example 1; S5 and S6 are set forth below.

S5, acquiring the sphalerite trace element data set of the Qingshuitang lead-zinc deposit

1. Sphalerite samples from the Qingshuitang lead-zinc deposit were obtained by field sampling, and polished sections of the samples were prepared for laser ablation analysis;

2. the sphalerite trace element contents of the Qingshuitang lead-zinc deposit were obtained with a trace element analyzer, a laser ablation inductively coupled plasma mass spectrometer (LA-ICP-MS).

S6, deposit type prediction

The machine learning model established with the random forest and gradient boosting algorithms was used to predict the type of the Qingshuitang lead-zinc deposit: the model was applied to the sphalerite trace element data set obtained for the deposit, giving the classification matrix of the Qingshuitang sphalerite trace elements (Table 3).

Table 3. Classification matrix of sphalerite trace elements from the Qingshuitang deposit

According to Table 3, the deposit was judged to be a Mississippi Valley-type (MVT) deposit.

Claims

1. A method for accurately distinguishing deposit types based on sphalerite trace elements, characterized in that the method comprises the following steps:

step one: establishing a sphalerite trace element database

Sphalerite trace element data are collected from literature published worldwide;

in the database, the deposit name, the deposit type and the sphalerite trace element contents correspond to one another, that is, each deposit is associated with its deposit type and with the trace element contents of the sphalerite in that deposit;

the six deposit types comprise: volcanogenic massive sulfide, Mississippi Valley-type, porphyry, epithermal, sedimentary exhalative and skarn deposits;

step two: data preprocessing

Nearest-neighbor interpolation and centered log-ratio transformation are applied to the data for each sphalerite trace element in the database established in step one, so that the covariance of the data is unchanged and the data conform to a normal distribution; the sphalerite trace element data comprise the contents of Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu;

during preprocessing, the trace element data are processed separately for each known deposit type, so that the trace element data within each deposit type become normally distributed; the data comprise content data;

step three: building the training set and test set

To train and test the random forest and gradient boosting decision tree classifiers, a training set and a test set are established; at least 200, preferably 200-400, sets of data are extracted from each deposit type in the sphalerite trace element database to train the random forest and gradient boosting classifiers; to avoid bias toward classes with more data, the same amount of data is randomly extracted from each deposit type by a random function; the remaining data are used for testing, yielding the classification matrices for training and testing;

step four: establishing a machine learning model

A machine learning model is established using the random forest and gradient boosting algorithms; bootstrap sampling is used to randomly draw training samples from the sample set, generating decision trees and training subsets; when a decision tree is constructed, each node is split optimally; model hyperparameters are tuned by cross-validation, and splitting stops when the number of decision trees is 4500 or more, preferably 5000 or more, the depth is 3 or more, and the generated child nodes appear N times, where N is less than or equal to 6; these values are the optimal parameters of the model;

the model randomly samples the original data set to form n different sample data sets, builds n different decision tree models from these data sets, and obtains the final result from the votes of the decision tree models;

step five: evaluating the reliability of the model

The receiver operating characteristic (ROC) curves of the random forest and gradient boosting models are obtained; an ROC curve describes the true positive rate and the false positive rate and is drawn by plotting the true positive rate on the y-axis against the false positive rate on the x-axis, and the area under the curve (AUC) is taken as the measure of classifier performance; the AUC ranges from 0 to 1, a model with an AUC greater than 0.5 is considered reliable, and the closer the AUC is to 1, the more reliable the model;

step six: discriminating the deposit type

2. The method for accurately distinguishing deposit types based on sphalerite trace elements according to claim 1, characterized in that: in step two, missing values are imputed using the k-nearest-neighbor method, and the centered log-ratio transformation is used so that the data conform to a normal distribution.

3. The method for accurately distinguishing deposit types based on sphalerite trace elements according to claim 1, characterized in that: during data preprocessing,

missing values of the sphalerite trace elements Ag, As, Cd, Co, Ga, Ge, Sb, Pb, Fe, Mn, In, Sn and Cu in the database are imputed in the data processing software XLSTAT using a nearest-neighbor method based on Euclidean distance, without changing the covariance of the data set;

4. The method for accurately distinguishing deposit types based on sphalerite trace elements according to claim 1, characterized in that: in step three, the classification matrices for training and testing are used to evaluate the accuracy of model training; each column represents a predicted deposit type, and the column total is the number of data predicted as that deposit type; each row represents the true deposit type, and the row total is the number of data instances of that deposit type.

5. The method for accurately distinguishing deposit types based on sphalerite trace elements according to claim 1, characterized in that: during model training, the established sphalerite trace element training set is used for training.

6. The method for accurately distinguishing deposit types based on sphalerite trace elements according to claim 1, characterized in that: the key parameters of the machine learning model are: n_estimators=5000, max_depth=3, min_samples_split=6.