CN116340839B - Algorithm selecting method and device based on ant lion algorithm - Google Patents

Publication number
CN116340839B
Authority
CN
China
Prior art keywords
ant
algorithm
lion
meta
population
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310127816.XA
Other languages
Chinese (zh)
Other versions
CN116340839A (en
Inventor
刘艺
李庚松
郑奇斌
秦伟
李翔
刘坤
刁兴春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Big Data Advanced Technology Research Institute
Original Assignee
Beijing Big Data Advanced Technology Research Institute
Application filed by Beijing Big Data Advanced Technology Research Institute
Priority to CN202310127816.XA
Publication of CN116340839A
Application granted
Publication of CN116340839B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/006Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an algorithm selection method and device based on an ant lion algorithm, and relates to the technical field of computers. The method comprises the following steps: first, an initial ant population is constructed, and the accuracy and diversity of the integrated meta-algorithm corresponding to each ant individual are calculated as the fitness of that individual; the Pareto dominance relationships among the ant individuals are then determined from the fitness, and the Pareto-optimal (non-dominated) ants are selected as ant lions to construct the initial ant lion population. Next, the elite ant lions in the ant lion population are determined, and a new ant population is generated from the elite ant lions through an enhanced random walk strategy. Finally, these steps are executed iteratively; when the iteration exit condition is met, the ant lion population obtained in the last iteration is output to determine a target integrated meta-algorithm, and a target algorithm is selected from the alternative algorithms according to the target integrated meta-algorithm. In the invention, selective integration makes effective use of the diversity of the meta-algorithms, and the accuracy and diversity of the integrated meta-algorithm are improved together.

Description

Algorithm selecting method and device based on ant lion algorithm
Technical Field
The invention relates to the technical field of computers, in particular to an algorithm selection method and device based on an ant lion algorithm.
Background
In the era of big data, analysis and decision making based on data have become important work in many fields, and various artificial intelligence algorithms provide significant support for this work. However, no single "optimal" algorithm performs best on all problems. How to select, from a large number of available methods, an algorithm that meets the requirements of a given task in different task scenarios is therefore a key engineering problem.
In the prior art, algorithm selection methods based on meta-learning offer high flexibility, a wide application range and low computational cost, and have become the main approach to algorithm selection. However, these methods still suffer from weak extensibility and insufficient diversity in how meta-features and meta-algorithms are selected and used.
Disclosure of Invention
The embodiment of the invention provides an algorithm selection method and device based on an ant lion algorithm, which aim to solve or partially solve the problems in the background technology.
In order to solve the technical problems, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides an algorithm selection method based on an ant lion algorithm, where the method includes:
Determining an alternative algorithm according to the application scene;
constructing a metadata set;
randomly initializing a plurality of ant individuals, and carrying out mixed coding on the ant individuals according to the meta-feature quantity of the meta-data set to generate an initial ant population;
based on the target optimization function, respectively carrying out fitness calculation on ant individuals in the initial ant population, and determining the ant lion population according to the fitness calculation result of the ant individuals;
determining the elite ant lions in the ant lion population, and generating a new ant population from the elite ant lions through an enhanced random walk strategy;
performing fitness calculation on each individual in the new ant population, and then placing the new ant population into the ant lion population to update the ant lion population;
iteratively executing the steps of determining the elite ant lions in the updated ant lion population according to the fitness calculation results, generating a new ant population from the elite ant lions through the enhanced random walk strategy, and placing the new ant population into the ant lion population to update the ant lion population;
when the iteration exit condition is met, outputting the ant lion population obtained in the last iteration to obtain an integrated meta-algorithm set;
and selecting a target integrated meta-algorithm from the integrated meta-algorithm set, and selecting, from the alternative algorithms, the algorithm with the best performance in the application scenario according to the target integrated meta-algorithm.
Optionally, the step of constructing the metadata set comprises:
extracting features of the historical data set, and determining the number of meta-features;
generating an optimal algorithm on a historical data set through a performance measure evaluation method, wherein different performance measure evaluation methods generate different optimal algorithms;
and forming a meta-instance by taking the meta-feature as an attribute and taking the optimal algorithm as a label, and constructing a meta-data set according to the meta-instance.
Optionally, the step of generating an optimal algorithm on the historical data set by a performance measure evaluation method comprises:
determining a performance evaluation index of a performance measure evaluation method;
determining the performance evaluation index value of the alternative algorithm according to the historical data set;
and determining an alternative algorithm with the optimal performance evaluation index value as an optimal algorithm.
Optionally, each ant individual in the initial ant population corresponds to an integrated meta algorithm, the step of generating the initial ant population includes:
dividing the metadata set into a training metadata set and a test metadata set;
sampling the training metadata set by the bootstrap method according to the number of base classifiers to obtain an initial training sub-data set;
Performing discrete coding on ant individuals according to the number of the meta-features, and performing meta-feature selection on an initial training sub-data set according to a coding result;
updating the initial training sub-data set according to the selection result of the meta-feature to obtain a first training sub-data set;
performing continuous coding on the ant individuals according to the number of base classifiers to obtain weight codes of the base classifiers, and updating the first training sub-data set according to the magnitude relationship between the weight codes of the base classifiers and the selection threshold code, to obtain a target training sub-data set;
training the base classifier based on the target training sub-data set to obtain a base classifier set;
and generating integration weights for the base classifier set according to the weight codes of the base classifiers, and integrating the base classifiers according to the integration weights to obtain an integrated meta-algorithm.
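The last step above, turning weight codes into integration weights and combining the base classifiers, can be sketched as a weighted vote. This is a minimal sketch, not the patented implementation: the source does not state how weight codes become integration weights, so normalizing them to sum to 1 is an assumption, and the function names `integration_weights` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict


def integration_weights(weight_codes):
    """Assumption: integration weights are the weight codes normalized to sum to 1."""
    total = sum(weight_codes)
    if total <= 0:
        raise ValueError("weight codes must have a positive sum")
    return [w / total for w in weight_codes]


def weighted_vote(predictions, weights):
    """Return the label with the largest accumulated weight."""
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight
    return max(tally, key=tally.get)


# Hypothetical example: three base classifiers each predict an algorithm label.
weights = integration_weights([0.9, 0.6, 0.7])
chosen = weighted_vote(
    ["prediction algorithm a", "prediction algorithm b", "prediction algorithm b"],
    weights,
)
```

In this example the two classifiers voting for prediction algorithm b together outweigh the single, more heavily weighted classifier voting for prediction algorithm a.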
Optionally, the target optimization function includes a first target function and a second target function, the step of calculating fitness of each ant individual in the initial ant population based on the target optimization function, and determining the ant lion population according to the calculation result of fitness includes:
determining the lowest algorithm classification error rate as the optimization direction of the first objective function;
determining the optimal algorithm diversity index value as the optimization direction of the second objective function;
according to the first objective function and the second objective function, respectively carrying out fitness calculation on each ant individual;
determining the Pareto dominance relationships among the fitness calculation results, eliminating the ant individuals whose fitness calculation results are dominated, and generating the ant lion population.
Optionally, the step of performing fitness calculation on each ant individual according to the first objective function and the second objective function includes:
calculating a classification error rate of the integrated meta algorithm according to the first objective function;
according to the second objective function, calculating a diversity index value of the integrating element algorithm;
and determining the fitness of the ant individual corresponding to the integrated meta-algorithm according to the classification error rate and the diversity index value of the integrated meta-algorithm.
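The two-objective fitness and the Pareto screening described above can be illustrated with a small non-dominated filter. This is a sketch under stated assumptions: each fitness is a pair (classification error rate, diversity index), and both values are assumed to be expressed so that smaller is better (a diversity index where larger is better would be negated first). The names `dominates` and `non_dominated` are hypothetical.

```python
def dominates(a, b):
    """True if fitness a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(fitness):
    """Indices of individuals that no other individual dominates."""
    return [
        i
        for i, f in enumerate(fitness)
        if not any(dominates(g, f) for j, g in enumerate(fitness) if j != i)
    ]


# Hypothetical fitness values: (error rate, diversity index), both minimized.
ant_fitness = [(0.10, 0.50), (0.20, 0.30), (0.25, 0.60), (0.15, 0.40)]
ant_lion_indices = non_dominated(ant_fitness)  # the third ant is dominated by the first
```

The ants whose indices are returned would form the ant lion population; the dominated ants are discarded.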
Optionally, the step of determining the elite ant lions in the ant lion population and generating a new ant population from the elite ant lions through an enhanced random walk strategy includes:
determining the ant lion individual whose corresponding integrated meta-algorithm has the lowest classification error rate as the accuracy elite ant lion, and determining the ant lion individual whose corresponding integrated meta-algorithm has the optimal diversity index value as the diversity elite ant lion;
performing sparsity calculation on each ant lion individual in the ant lion population;
selecting a first roulette ant lion and a second roulette ant lion from the ant lion population by the roulette method according to the sparsity calculation results of the ant lion individuals;
performing a random walk around the accuracy elite ant lion and the first roulette ant lion based on the enhanced walk strategy to generate a first new ant individual;
performing a random walk around the diversity elite ant lion and the second roulette ant lion based on the enhanced walk strategy to generate a second new ant individual;
generating a new ant population according to the first new ant individuals and the second new ant individuals.
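For orientation, the sketch below shows the random walk of the standard ant lion optimizer, in which a new ant is the average of a walk around an elite ant lion and a walk around a roulette-selected ant lion. It is not the enhanced walk strategy of this application, whose details are not given in the steps above. The fixed walk radius, the function names and all parameter values are assumptions, and the sketch covers continuous codes only.

```python
import random
from itertools import accumulate


def walk_around(center, radius, lower, upper, steps, rng):
    """One-dimensional random walk, min-max normalized into an interval around center."""
    lo = max(lower, center - radius)
    hi = min(upper, center + radius)
    # Cumulative sum of +1/-1 steps, as in the standard ant lion optimizer.
    walk = list(accumulate(rng.choice((-1, 1)) for _ in range(steps)))
    w_min, w_max = min(walk), max(walk)
    if w_max == w_min:
        return center
    value = (walk[-1] - w_min) * (hi - lo) / (w_max - w_min) + lo
    return min(hi, max(lo, value))  # guard against floating-point overshoot


def new_ant(elite, roulette, radius, lower, upper, steps=50, seed=None):
    """Average, per dimension, of a walk around the elite and a walk around the roulette ant lion."""
    rng = random.Random(seed)
    return [
        (walk_around(e, radius, lower, upper, steps, rng)
         + walk_around(r, radius, lower, upper, steps, rng)) / 2
        for e, r in zip(elite, roulette)
    ]


# Hypothetical continuous codes in [0, 1] for an elite and a roulette ant lion.
ant = new_ant([0.2, 0.8], [0.4, 0.6], radius=0.1, lower=0.0, upper=1.0, seed=1)
```

Under the steps above, this would be run once with the accuracy elite ant lion and once with the diversity elite ant lion, each paired with its own roulette ant lion.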
Optionally, the step of performing sparsity computation on each ant-lion individual in the ant-lion population includes:
determining the corresponding niche range of each ant-lion individual;
determining other ant-lion individuals in the range of the niche as adjacent ant-lions;
and determining the sparsity of each ant lion individual according to the number of adjacent ant lions.
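The sparsity calculation and the roulette selection that relies on it can be sketched as follows. Several details are assumptions not stated in the steps above: the niche range is modeled as a Euclidean radius in objective space, sparsity is taken as 1 / (1 + number of adjacent ant lions), and the roulette probability is proportional to sparsity so that ant lions in less crowded regions are chosen more often. The names `sparsity` and `roulette_pick` are hypothetical.

```python
import math
import random


def sparsity(fitness, niche_radius):
    """Sparsity of each ant lion: fewer neighbours inside the niche gives a larger value."""
    result = []
    for i, f in enumerate(fitness):
        neighbours = sum(
            1
            for j, g in enumerate(fitness)
            if j != i and math.dist(f, g) <= niche_radius
        )
        result.append(1.0 / (1 + neighbours))
    return result


def roulette_pick(weights, rng):
    """Index drawn with probability proportional to its weight."""
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


# Hypothetical ant lion fitness values: (error rate, diversity index).
lion_fitness = [(0.10, 0.50), (0.12, 0.50), (0.50, 0.10)]
lion_sparsity = sparsity(lion_fitness, niche_radius=0.05)
first_roulette = roulette_pick(lion_sparsity, random.Random(0))
```

Here the first two ant lions are neighbours of each other, so the isolated third ant lion receives the largest sparsity and the largest selection probability.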
A second aspect of the embodiment of the present invention provides an algorithm selection device based on ant lion algorithm, where the device includes:
the data acquisition module is used for determining an alternative algorithm according to the application scene;
The encoding module is used for constructing a metadata set;
the initial ant population generation module is used for randomly initializing a plurality of ant individuals, carrying out mixed coding on the ant individuals according to the meta-feature quantity of the meta-data set, and generating an initial ant population;
the ant lion determining module is used for performing fitness calculation on each ant individual in the initial ant population based on the target optimization function, and determining the ant lion population according to the fitness calculation results of the ant individuals;
the ant population updating module is used for determining the elite ant lions in the ant lion population and generating a new ant population from the elite ant lions through an enhanced random walk strategy;
the ant lion population updating module is used for performing fitness calculation on each individual in the new ant population, and then placing the new ant population into the ant lion population to update the ant lion population;
the iteration module is used for iteratively determining the elite ant lions in the updated ant lion population according to the fitness calculation results, generating a new ant population from the elite ant lions through the enhanced random walk strategy, and placing the new ant population into the ant lion population to update the ant lion population;
the output module is used for outputting the ant lion population obtained in the last iteration when the iteration exit condition is met, so as to obtain an integrated meta-algorithm set;
the algorithm selection module is used for selecting a target integrated meta-algorithm from the integrated meta-algorithm set and selecting, from the alternative algorithms, the algorithm with the best performance in the application scenario according to the target integrated meta-algorithm.
The embodiment of the invention has the following advantages: an initial ant population formed by a plurality of ant individuals is obtained according to a plurality of initial integrated meta-algorithm sets, and fitness calculation is performed on each ant individual in the initial ant population based on a target optimization function, so that the initial ant lion population is determined. Then, the elite ant lions in the initial ant lion population are determined, and new ant individuals are generated from the elite ant lions through an enhanced random walk strategy. Finally, these steps are executed iteratively; when the iteration exit condition is met, the target ant lion population obtained in the last iteration is decoded to obtain the final integrated meta-algorithm set, a target integrated meta-algorithm is determined, and the target algorithm is selected according to the target integrated meta-algorithm. In the invention, meta-feature selection makes full use of the meta-features and improves their diversity, and the accuracy and diversity of the integrated meta-algorithm are improved together, so that the algorithm that best matches the application scenario can be selected during algorithm selection.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a prior art algorithm selection framework based on meta learning in an embodiment of the present invention;
FIG. 2 is a schematic diagram of an integrated meta-algorithm construction process in an embodiment of the present invention;
FIG. 3 is a flow chart of steps of an algorithm selection method based on the ant lion algorithm in an embodiment of the invention;
FIG. 4 is a schematic diagram of a coding scheme according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the niche range of an ant lion solution in an embodiment of the present invention;
FIG. 6 is a block flow diagram of an algorithm selection method based on the ant lion algorithm in an embodiment of the invention;
fig. 7 is a schematic block diagram of an algorithm selecting device based on the ant lion algorithm in an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the related art, the algorithm selection problem can be solved by manual selection or by machine learning. An existing algorithm selection framework based on meta-learning is shown in FIG. 1. First, the meta-features of the historical data sets are extracted, and the optimal algorithm for each historical data set is obtained through performance measure evaluation; different performance measures may yield different optimal algorithms. Then, a meta-instance is formed with the meta-features as attributes and the optimal algorithm as the label, a metadata set is constructed from the meta-instances, and a meta-algorithm is trained on the metadata set. For a new data set, its meta-features are extracted and input into the meta-algorithm, which predicts and outputs the optimal algorithm. However, current methods still have problems in the selection of meta-features and meta-algorithms. Regarding meta-features, existing methods generally select fixed meta-features according to requirements, which leads to high coupling and weak extensibility; moreover, different meta-features describe and extract abstract characteristics of a data set from different aspects and are complementary to some extent, but existing methods have difficulty exploiting this complementarity effectively. Regarding meta-algorithms, existing methods either use a single meta-algorithm with weak generalization performance or adopt an integrated meta-algorithm built from homogeneous base learners, so the advantages and diversity of different base learners are not fully utilized.
Based on this, the inventors propose the core inventive concept of the present application: meta-features are selected and a selective integrated meta-algorithm is constructed at the same time, with the accuracy and diversity of the integrated meta-algorithm as the optimization objectives. A meta-feature subset is selected through discrete coding, the integrated meta-algorithm is constructed through continuous coding, and on this basis an enhanced walk strategy and a preference elite selection mechanism are applied to improve the optimization performance.
First, the application provides an algorithm selection model that improves algorithm selection performance from two aspects, meta-features and meta-algorithms. In terms of meta-features, a meta-feature subset with strong complementarity is selected so as to make effective use of the complementarity among the various meta-features. In terms of meta-algorithms, different types of base classifiers are integrated through a selective integration method to construct an integrated meta-algorithm with stronger generalization performance and diversity, so that the advantages of different base learners are fully utilized and the diversity of the integrated meta-algorithm is improved. The process of constructing the integrated meta-algorithm based on the algorithm selection model of the present application is shown in FIG. 2. First, the training set is sampled multiple times by the bootstrap method to generate a plurality of training sub-data sets, and meta-features are selected for these sub-data sets; this process is implemented based on a hybrid coding mechanism. Base classifiers are then trained on the training sub-data sets to form a base classifier set. Base classifiers with high accuracy and diversity are selected from the set by a selective integration method and combined through a weighted voting strategy to obtain an integrated meta-algorithm. Different combinations of base classifiers form different integrated meta-algorithms, which together make up an integrated meta-algorithm set. This set is optimized with respect to both algorithm accuracy and algorithm diversity to obtain the final target integrated meta-algorithm; that is, the task is treated as a multi-objective optimization problem and solved by a multi-objective hybrid ant lion optimization algorithm.
This process is implemented by the algorithm selection method based on the ant lion algorithm of the present application. FIG. 3 shows a flow chart of the steps of the method, which is described below.
S301: Determine the alternative algorithms according to the application scenario.
In this embodiment, the application scenario refers to the environment in which a user performs a task. For example, if the user needs to classify given images, the application scenario may be an image classification scenario; if the user needs to estimate the income of residents according to information about the residents of a certain area, the application scenario may be a data prediction scenario. The alternative algorithms are the algorithms that can be used to complete the task in the application scenario.
As an example, if the application scenario is an image classification scenario, the corresponding alternative algorithms may be image classification algorithm 1, image classification algorithm 2 and image classification algorithm 3; if the application scenario is a data prediction scenario, the corresponding alternative algorithms may be prediction algorithm a, prediction algorithm b and prediction algorithm c.
S302: Construct a metadata set.
In this embodiment, the number of dimensions of each type of code in an individual is determined by the number of meta-features of the metadata set and the number of base classifiers. The metadata set is constructed by the following steps:
S302-1: Extract the features of the historical data set and determine the number of meta-features.
In this embodiment, suppose the application scenario is to estimate the income of residents according to information about the residents of a certain area. As an example, in the resident income history data of Table 1, A-E represent attributes of each resident, such as age and sex. Features A, B, C and D are the features used to determine the income of a resident and are therefore taken as meta-features, so the number of meta-features is 4. The sequence numbers 1-9 represent 9 different residents, which correspond to meta-instances, so the number of meta-instances is 9. E represents the actual income of the resident, that is, the target algorithm label corresponding to the meta-features.
Table 1: resident income history data sheet
Sequence number A B C D E
1 X X X X X
2 X X X X X
3 X X X X X
4 X X X X X
5 X X X X X
6 X X X X X
7 X X X X X
8 X X X X X
9 X X X X X
S302-2: Generate an optimal algorithm on the historical data set by a performance measure evaluation method.
In this embodiment, the historical data set is one used in the same application scenario. Different evaluation criteria lead to different optimal algorithms.
As an example, for an image classification scenario, the alternative algorithms include image classification algorithm 1, image classification algorithm 2 and image classification algorithm 3. If classification accuracy is used as the performance measure, the corresponding optimal algorithm may be image classification algorithm 1; if classification speed is used as the performance measure, the corresponding optimal algorithm may be image classification algorithm 2. Different performance measure evaluation methods therefore generate different optimal algorithms.
The step of generating an optimal algorithm on the historical data set by the performance measure evaluation method may include:
S302-2-1: Determine the performance evaluation index of the performance measure evaluation method.
S302-2-2: Determine the performance evaluation index value of each alternative algorithm according to the historical data set.
S302-2-3: Determine the alternative algorithm with the optimal performance evaluation index value as the optimal algorithm.
In the implementation of S302-2-1 to S302-2-3, continuing the above example, the performance evaluation index may be the classification speed: the faster the classification, the higher the corresponding performance evaluation index value. The classification times of image classification algorithm 1, image classification algorithm 2 and image classification algorithm 3 are measured as 1 s, 2 s and 3 s respectively, and the corresponding performance evaluation index values may be 100 points, 90 points and 80 points. Image classification algorithm 1, which has the highest performance evaluation index value, is taken as the optimal algorithm under the classification speed index.
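Steps S302-2-1 to S302-2-3 amount to picking the alternative algorithm with the best index value. A minimal sketch follows, assuming a higher index value is better; the function name `best_algorithm` and the second score table are hypothetical and only illustrate that a different performance measure can yield a different optimal algorithm.

```python
def best_algorithm(scores):
    """Return the alternative algorithm with the highest performance evaluation index value."""
    return max(scores, key=scores.get)


# Index values under the classification speed index, taken from the example above.
speed_scores = {
    "image classification algorithm 1": 100,
    "image classification algorithm 2": 90,
    "image classification algorithm 3": 80,
}

# Hypothetical index values under a second performance measure.
other_scores = {
    "image classification algorithm 1": 70,
    "image classification algorithm 2": 85,
    "image classification algorithm 3": 95,
}

label_by_speed = best_algorithm(speed_scores)
label_by_other = best_algorithm(other_scores)
```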
S302-3: Form meta-instances with the meta-features as attributes and the optimal algorithm as the label, and construct the metadata set from the meta-instances.
In this embodiment, after the number of meta-features is determined, each set of meta-feature values is combined with its corresponding optimal algorithm, which serves as the algorithm label, to form a meta-instance; the metadata set is constructed from a plurality of meta-instances.
As an example, continuing with Table 1, for each resident the four meta-features A, B, C and D are extracted and combined with the target algorithm label (the corresponding optimal algorithm) to give the resident income meta-instance shown in Table 2. A plurality of different resident income meta-instances form the resident income metadata set.
Table 2: meta-instance corresponding to resident income data set
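The structure of a meta-instance (meta-features as attributes, optimal algorithm as label) can be sketched with a small data class. The class name `MetaInstance`, the helper `build_metadata_set` and all numeric values are hypothetical, since the tables in this text only contain placeholder entries.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MetaInstance:
    meta_features: Dict[str, float]  # attributes, e.g. meta-features A-D
    label: str                       # optimal algorithm used as the label


def build_metadata_set(feature_rows: List[Dict[str, float]],
                       optimal_algorithms: List[str]) -> List[MetaInstance]:
    """Pair each row of meta-feature values with its optimal-algorithm label."""
    if len(feature_rows) != len(optimal_algorithms):
        raise ValueError("each row of meta-features needs exactly one label")
    return [MetaInstance(dict(row), algo)
            for row, algo in zip(feature_rows, optimal_algorithms)]


# Hypothetical values for two meta-instances with meta-features A, B, C and D.
metadata_set = build_metadata_set(
    [{"A": 0.3, "B": 1.2, "C": 5.0, "D": 0.7},
     {"A": 0.9, "B": 0.4, "C": 2.5, "D": 0.1}],
    ["prediction algorithm a", "prediction algorithm c"],
)
```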
S303: Randomly initialize a plurality of ant individuals, and perform hybrid coding on the ant individuals according to the number of meta-features of the metadata set to generate an initial ant population.
In this embodiment, algorithm accuracy and algorithm diversity are used as two optimization objectives for optimizing the integrated meta-algorithm, and the optimization is carried out as the solving process of a multi-objective ant lion optimization algorithm. The ant individuals can be generated by random initialization.
S303-1: the metadata set is divided into a training metadata set and a test metadata set.
In this embodiment, after the metadata set is determined, it is divided according to the needs of the user. As an example, if the ratio of the training metadata set to the test metadata set is set to 1:2 and the metadata set contains 9 meta-instances, then 3 meta-instances are used for training and 6 for testing. Continuing with Table 2, 6 rows are selected for testing and 3 rows for training. As an example, if the residents numbered 1, 4 and 9 are selected as the training metadata set, one training sample in the training metadata set is as shown in Table 3.
Table 3: training metadata samples
S303-2: Sample the training metadata set by the bootstrap method according to the number of base classifiers to obtain the initial training sub-data set.
In this embodiment, bootstrap sampling means sampling with replacement. To exploit the advantages and diversity of different base classifiers, three types of base classifier are used, namely KNN (k-nearest neighbors), SVM (support vector machine) and CART (classification and regression tree), and the number of base classifiers of each type is set to be equal. The number of base classifiers is set by the user. Taking 5 base classifiers as an example, 5 initial training sub-data samples of size 5x3 need to be generated by bootstrap sampling from the training metadata set, which is also of size 5x3; together they form the initial training sub-data set. Because bootstrap sampling is used, every meta-instance in the training metadata set has the same probability of being selected at each draw. After 5 rounds of random sampling, 5 initial training sub-data sets matching the number of base classifiers are generated. Any one of the initial training sub-data sets may be as shown in Table 4.
Table 4: initial training sub-data samples
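The sampling with replacement described in S303-2 can be illustrated with a short Python sketch. The function name `bootstrap_subsets`, the row layout (four meta-features A to D followed by a label), and all values are hypothetical and are not taken from Tables 3 or 4.

```python
import random

def bootstrap_subsets(training_rows, n_subsets, seed=None):
    """Draw n_subsets samples, each the same size as training_rows."""
    rng = random.Random(seed)
    size = len(training_rows)
    # rng.choices samples with replacement, so every row is equally
    # likely to be picked on each individual draw.
    return [rng.choices(training_rows, k=size) for _ in range(n_subsets)]

# Hypothetical 3-row training metadata set: (A, B, C, D, best-algorithm label)
training_rows = [
    (0.1, 0.5, 0.3, 0.9, "KNN"),
    (0.4, 0.2, 0.8, 0.1, "SVM"),
    (0.7, 0.6, 0.2, 0.4, "CART"),
]
subsets = bootstrap_subsets(training_rows, n_subsets=5, seed=0)
```

Because rows are drawn with replacement, a sub-data set may contain the same meta-instance more than once and omit others, which is what makes the five sub-data sets differ from each other.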
S303-3: Perform discrete coding on the ant individuals according to the number of meta-features, and perform meta-feature selection on the initial training sub-data set according to the coding result.
S303-4: Update the initial training sub-data set according to the meta-feature selection result to obtain a first training sub-data set.
In the embodiments of S303-3 to S303-4, discrete coding is used to select a meta-feature subset. When used for training the integrated meta-algorithm, each meta-feature has only two states, selected or unselected, so each meta-feature is assigned one discrete code, and the number of discrete codes equals the number of meta-features. Whether a meta-feature is used for training the base classifiers is determined by whether its discrete code is 1 or 0. The initial training sub-data set is then updated according to the meta-feature selection result to obtain the first training sub-data set.
As an example, suppose meta-feature selection is performed on the initial training sub-data sample of resident income shown in Table 4, and the discrete codes of meta-features A, B, C, and D are 0, 1, 1, and 0, respectively. Meta-features B and C are then selected for training the base classifiers, and the initial training sub-data set is updated to obtain the first training sub-data set. One training sample in the first training sub-data set is shown in Table 5.
Table 5: first training sub-data sample
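The meta-feature selection of S303-3 and S303-4 amounts to applying a 0/1 mask to the columns of a sub-data set. The following Python sketch reproduces the example with codes 0, 1, 1, 0 for meta-features A to D; the function name and the numeric values are hypothetical.

```python
def select_meta_features(rows, feature_names, mask):
    """Keep only the columns whose discrete code is 1."""
    if len(mask) != len(feature_names):
        raise ValueError("one discrete code is needed per meta-feature")
    kept = [i for i, bit in enumerate(mask) if bit == 1]
    names = [feature_names[i] for i in kept]
    reduced = [[row[i] for i in kept] for row in rows]
    return names, reduced

feature_names = ["A", "B", "C", "D"]
rows = [[0.1, 0.5, 0.3, 0.9], [0.4, 0.2, 0.8, 0.1]]  # hypothetical values
names, reduced = select_meta_features(rows, feature_names, mask=[0, 1, 1, 0])
```

Here `names` contains only B and C, matching the selection described in the example.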
S303-5: Perform continuous coding on the initial training sub-data set according to the number of initial base classifiers to obtain the weight codes of the base classifiers, and update the first training sub-data set according to the size relation between the weight code of each initial base classifier and the preset selection threshold code to obtain the final training sub-data set.
S303-6: Train the initial base classifiers on the final training sub-data set to obtain the target base classifiers and the integration weights of the target base classifiers.
S303-7: The target base classifiers produce prediction results on the test metadata set, and the integrated meta-algorithm set is initialized according to the accuracy of the prediction results and the integration weights of the target base classifiers.
In the embodiments of S303-5 to S303-7, meta-feature selection is accompanied by the selection of training data for the base classifiers. Since each base classifier may be trained on any one of the initial training sub-data sets, continuous coding is used to determine whether each type of base classifier is trained on a given initial training sub-data set. By way of example, if 5 base classifiers are set and each has 3 types, up to 15 base classifiers may be trained on any one initial training sub-data set. The decision is made by comparing the weight code of each base classifier with the preset selection threshold code. For any base classifier, if its weight code is greater than or equal to the selection threshold code, the base classifier is trained on the current initial training sub-data set; if its weight code is smaller than the selection threshold code, it is not. In this way, for any one initial training sub-data set, it can be determined which of its meta-features are used for training and which base classifiers are trained on it.
As an example, suppose there are 5 base classifiers, numbered 1-5, and 5 initial training sub-data sets, numbered F-L. If the sub-data sets numbered F and L are chosen as training data for the KNN-type base classifier of base classifier number 1, then, from the first training sub-data sets numbered F-L obtained after meta-feature selection, those numbered F and L are selected as the final training sub-data set for the KNN-type base classifier of base classifier number 1. The same operation is performed for each type of the base classifiers numbered 1-5 to obtain the target base classifiers and their integration weights, which are then normalized.
For example, suppose there are 4 target base classifiers, the first 2 predict A, the last 2 predict B, the first 3 each have a weight of 0.2, and the last has a weight of 0.4. Label A then receives a total weight of 0.4 and label B a total weight of 0.6, so the final prediction result of the target base classifiers is B. The prediction result B is then compared with the optimal algorithm label to determine the accuracy of the prediction. The integrated meta-algorithm set is then initialized according to the accuracy of the prediction results and the integration weights of the target base classifiers.
When the integration weights W = {w_1, w_2, …, w_v} are used to combine v base classifiers B = {b_1, b_2, …, b_v} into the integrated meta-algorithm E, the formula for predicting a test meta-instance x is as follows:

$$E(x) = \arg\max_{a \in A} \sum_{i=1}^{v} w_i \cdot I\big(b_i(x) = a\big)$$

where E(x) denotes the predicted optimal algorithm; a is a candidate algorithm label in A; b_i(x) is the prediction of the i-th base classifier for x; I(·) is the indicator function, which equals 1 when the logical expression inside it is true and 0 otherwise; the argmax function returns the algorithm label a that maximizes the expression inside it; and w_i is the integration weight of the i-th base classifier.
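The weighted voting rule can be checked against the four-classifier example given earlier with a minimal Python sketch. The function name `weighted_vote` is hypothetical, and ties are broken here by the first label encountered, which the source does not specify.

```python
from collections import defaultdict

def weighted_vote(predictions, weights):
    """Return the label with the largest total integration weight."""
    if len(predictions) != len(weights):
        raise ValueError("one weight is needed per base classifier")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight  # adds w_i whenever b_i(x) equals this label
    return max(totals, key=totals.get)

# First two classifiers predict A, last two predict B;
# weights 0.2, 0.2, 0.2, 0.4 as in the example.
result = weighted_vote(["A", "A", "B", "B"], [0.2, 0.2, 0.2, 0.4])
```

Label A accumulates 0.4 and label B accumulates 0.6, so `result` is "B".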
When hybrid coding is performed for any one of the initial training sub-data sets, as shown in Fig. 4, discrete coding is first performed according to the number of meta-features: if there are M meta-features, an individual has M discrete codes, each of which is 0 or 1. If the number of base classifiers is V, the number of base classifier weight codes is 3V, and the selection threshold code is placed between the discrete codes and the weight codes. The number of continuous codes is therefore 3V+1, and the weight codes and the selection threshold code are floating-point numbers between 0 and 1.
By way of example, the following is a mathematical implementation of the hybrid encoding of the present application:
According to the hybrid coding mechanism, the formula for initializing the coding values of each dimension of the individual is as follows:
where A_i is the coding value of the i-th dimension of an individual. The first M dimensions of an individual are the discrete meta-feature selection codes, whose values are initialized through R(rand) according to the following formula:
A value of 1 indicates that the meta-feature corresponding to the i-th dimension code is selected, and a value of 0 indicates that it is not selected. The (M+1)-th dimension code of the individual is the selection threshold code, and the following 3V codes are the base classifier weight codes. The first V base classifier weight codes select the training sub-data sets for KNN according to the following formula:

$$J(A_i) = \begin{cases} 1, & A_i \ge A_{M+1} \\ 0, & A_i < A_{M+1} \end{cases}$$

where A_{M+1} is the selection threshold code value; J(A_i) = 1 indicates that the training sub-data set corresponding to the i-th dimension code is selected, and J(A_i) = 0 indicates that it is not selected. The training sub-data sets of the other 2 types of base classifiers are obtained in the same way. The 3 types of base classifiers are trained independently on their selected training sub-data sets to obtain the base classifier set B = {b_1, b_2, …, b_v}, and the integration weights W = {w_1, w_2, …, w_v} of the base classifiers are generated from the base classifier weight code values according to the following formula:

$$w_i = \frac{A_i}{\sum_{j=1}^{v} A_j}$$

where w_i is the integration weight of the i-th base classifier, A_i is the base classifier weight code value corresponding to the i-th base classifier, and A_j is the base classifier weight code value corresponding to the j-th base classifier. Through this normalization, the integration weights sum to 1.
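The hybrid coding layout (M discrete codes, 1 selection threshold code, 3V weight codes) can be sketched in Python as follows. Several details are assumptions made only for illustration: R(rand) is taken to round a uniform random number at 0.5, the weight codes are assumed to be ordered KNN, SVM, CART, and the weights are normalized over the selected base classifiers only. None of these choices is confirmed by the text.

```python
import random

def init_individual(n_features, n_per_type, rng):
    """Hybrid code: M binary codes, 1 threshold code, 3V weight codes."""
    # Assumption: R(rand) rounds a uniform number at 0.5.
    discrete = [1 if rng.random() >= 0.5 else 0 for _ in range(n_features)]
    continuous = [rng.random() for _ in range(3 * n_per_type + 1)]
    return discrete + continuous

def decode_individual(individual, n_features, n_per_type):
    """Split a hybrid code into mask, selected classifiers, and weights."""
    mask = individual[:n_features]
    threshold = individual[n_features]       # the (M+1)-th dimension
    weight_codes = individual[n_features + 1:]
    types = ["KNN", "SVM", "CART"]           # assumed ordering of the 3V codes
    selected = []
    for k, code in enumerate(weight_codes):
        if code >= threshold:                # J(A_i) = 1
            selected.append((types[k // n_per_type], k % n_per_type, code))
    total = sum(code for _, _, code in selected)
    # Normalize so that the integration weights sum to 1.
    weights = [code / total for _, _, code in selected] if total > 0 else []
    return mask, selected, weights

individual = init_individual(n_features=4, n_per_type=5, rng=random.Random(1))
mask, selected, weights = decode_individual([1, 0, 0.5, 0.6, 0.2, 0.9], 2, 1)
```

In the second call the threshold is 0.5, so the KNN code 0.6 and the CART code 0.9 pass while the SVM code 0.2 does not, and the two surviving codes normalize to weights of about 0.4 and 0.6.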
S304: Perform fitness calculation on each ant individual in the initial ant population based on the target optimization function, and determine the ant lion population according to the fitness calculation results of the ant individuals.
In this embodiment, fitness calculation is performed on each ant with the accuracy and the diversity of each integrated meta-algorithm as optimization targets; that is, the accuracy and the diversity of an integrated meta-algorithm are calculated and used as the fitness of the corresponding ant. According to the fitness calculation results, the Pareto solutions are selected through the Pareto dominance relationship as the initial ant lion population and stored in an external archive.
The step of calculating the fitness of each ant individual can be as follows:
S304-1: Determine the lowest algorithm classification error rate as the optimization direction of the first objective function;
S304-2: Determine the optimal algorithm diversity index value as the optimization direction of the second objective function;
S304-3: Perform fitness calculation on each ant individual in the initial ant population according to the first objective function and the second objective function;
S304-4: Determine the Pareto dominance relationships among the fitness calculation results, eliminate the ant individuals whose fitness calculation results are dominated, and generate the ant lion population.
In the embodiments of S304-1 to S304-4, the accuracy of any integrated meta-algorithm is evaluated using the classification error rate (ER); that is, the first objective function is used to calculate the classification error rate of each algorithm. The diversity of the algorithms is measured using different diversity indicators (DI), including the Q statistic (QS), the K statistic (KS), the correlation coefficient (CC), the agreement measure (AM), and the double fault measure (DF); that is, the second objective function is used to calculate the diversity of each algorithm. The lower the classification error rate and the better the diversity index value of an integrated meta-algorithm, the better its performance, and the higher the fitness of the corresponding ant individual. Because the integrated meta-algorithm has two optimization targets, classification error rate and diversity, constructing it is a multi-objective optimization problem, that is, a problem in which multiple mutually conflicting objectives are optimized simultaneously. Such a problem is usually an NP (non-deterministic polynomial) problem, for which no unique optimal solution can be obtained, only a set of suboptimal solutions. For two solutions x and y, x is said to dominate y, denoted x ≺ y, when the following constraint is satisfied, where f_i denotes the i-th objective function and both objectives are minimized:

$$\forall i:\ f_i(x) \le f_i(y) \quad \text{and} \quad \exists j:\ f_j(x) < f_j(y)$$
And after the fitness of each ant individual is determined, selecting ant lions according to the pareto dominant relationship among ant individual fitness calculation results.
As an example, for the ant individuals numbered A, B, C, and D, if the fitness calculation results show that A dominates B, D dominates C, and there is no dominance relationship between A and D, then ant individuals B and C are eliminated, and individuals A and D are taken as ant lions.
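The elimination of dominated individuals in this example can be sketched in Python, assuming that both objectives (classification error rate and diversity index value) are minimized. The fitness values are hypothetical and were chosen only so that A dominates B, D dominates C, and A and D do not dominate each other.

```python
def dominates(fx, fy):
    """True if objective vector fx Pareto-dominates fy (minimization)."""
    no_worse = all(a <= b for a, b in zip(fx, fy))
    strictly_better = any(a < b for a, b in zip(fx, fy))
    return no_worse and strictly_better

def pareto_front(fitness):
    """Return the names of individuals that no other individual dominates."""
    return [
        name for name, f in fitness.items()
        if not any(dominates(g, f)
                   for other, g in fitness.items() if other != name)
    ]

# Hypothetical (error rate, diversity index) pairs; smaller is better.
fitness = {
    "A": (0.10, 0.40),
    "B": (0.20, 0.50),
    "C": (0.35, 0.30),
    "D": (0.30, 0.20),
}
ant_lions = pareto_front(fitness)
```

With these values, `ant_lions` contains A and D, which become the ant lions stored in the external archive.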
In a possible implementation manner, the step of performing fitness calculation on each ant individual according to the first objective function and the second objective function includes:
S304-3-1: Calculate the classification error rate of the integrated meta-algorithm according to the first objective function;
S304-3-2: Calculate the diversity index value of the integrated meta-algorithm according to the second objective function;
S304-3-3: Determine the fitness of the ant individual corresponding to the integrated meta-algorithm according to the classification error rate and the diversity index value of the integrated meta-algorithm.
In the embodiments of S304-3-1 to S304-3-3, the accuracy and diversity of each integrated meta-algorithm are measured and used as the fitness of the corresponding ant individual. Let there be m test meta-instances X = {x_1, x_2, …, x_m}, whose optimal algorithm labels are Y = {y_1, y_2, …, y_m}, with y_k ∈ A, 1 ≤ k ≤ m, where A = {a_1, a_2, …, a_l} is the set of candidate algorithm labels. The first objective function ER is calculated as follows:

$$ER_E = \frac{1}{m}\sum_{i=1}^{m} I\big(E(x_i) \ne y_i\big)$$

where E denotes the integrated meta-algorithm, I(·) equals 1 when its argument is true and 0 otherwise, and E(x_i) and y_i denote the predicted optimal algorithm label and the true optimal algorithm label of the i-th test meta-instance, respectively. The integrated meta-algorithm E is constructed by combining v base classifiers B = {b_1, b_2, …, b_v} with the integration weights W = {w_1, w_2, …, w_v} through weighted voting.
For m test meta-instances, the prediction results of base classifiers b_i and b_j are listed in Table 6.
Table 6: predicted outcome list
In the table, 1 ≤ k ≤ m; c denotes the number of meta-instances that b_i and b_j both predict correctly; p denotes the number of meta-instances that b_i predicts correctly and b_j predicts incorrectly; q denotes the number of meta-instances that b_i predicts incorrectly and b_j predicts correctly; and d denotes the number of meta-instances that b_i and b_j both predict incorrectly. According to the prediction result list, the QS, KS, CC, AM, and DF indicators of b_i and b_j are calculated as follows:
The smaller the index value, the stronger the diversity of the base classifiers. AM and DF take values in [0, 1], and the other indicators take values in [-1, 1]. In addition, these are pairwise indicators, that is, they satisfy DI_{i,j} = DI_{j,i}. The DI value of the integrated meta-algorithm E is the average of the DI values over all pairs of base classifiers, so the second objective function, which calculates the diversity index value of each ant individual, is as follows:

$$DI_E = \frac{2}{v(v-1)}\sum_{i=1}^{v-1}\sum_{j=i+1}^{v} DI_{i,j}$$
In summary, the fitness of an ant individual is given by the following formula, that is, the accuracy and the diversity index value of the integrated meta-algorithm are used as the fitness of the ant individual:

$$F_{SAMO} = \min\left(ER_E,\ DI_E\right) \qquad (14)$$

where F_SAMO denotes the objective function vector of the model. The fitness of each ant is calculated according to formula (14).
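The two objectives can be computed as in the following Python sketch. The classification error rate follows the definition given above. The five pairwise indicators use their common textbook forms expressed in the counts c, p, q, d; these forms are an assumption, because the source formulas are not reproduced in this text, and the 0.0 fallback for a zero denominator is likewise an illustrative choice.

```python
import math
from itertools import combinations

def error_rate(predicted, truth):
    """ER: fraction of test meta-instances whose predicted label is wrong."""
    return sum(1 for p, y in zip(predicted, truth) if p != y) / len(truth)

def pairwise_counts(correct_i, correct_j):
    """Count (c, p, q, d) from per-instance correctness flags of b_i and b_j."""
    pairs = list(zip(correct_i, correct_j))
    c = sum(1 for a, b in pairs if a and b)
    p = sum(1 for a, b in pairs if a and not b)
    q = sum(1 for a, b in pairs if not a and b)
    d = sum(1 for a, b in pairs if not a and not b)
    return c, p, q, d

def diversity_indicators(c, p, q, d):
    """Assumed textbook forms of QS, KS, CC, AM, and DF."""
    m = c + p + q + d
    qs_den = c * d + p * q
    ks_den = (c + p) * (p + d) + (c + q) * (q + d)
    cc_den = math.sqrt((c + p) * (q + d) * (c + q) * (p + d))
    return {
        "QS": (c * d - p * q) / qs_den if qs_den else 0.0,
        "KS": 2 * (c * d - p * q) / ks_den if ks_den else 0.0,
        "CC": (c * d - p * q) / cc_den if cc_den else 0.0,
        "AM": (c + d) / m,
        "DF": d / m,
    }

def ensemble_diversity(correct_flags, indicator):
    """DI_E: average of one pairwise indicator over all classifier pairs."""
    values = [
        diversity_indicators(*pairwise_counts(a, b))[indicator]
        for a, b in combinations(correct_flags, 2)
    ]
    return sum(values) / len(values)

def fitness(predicted, truth, correct_flags, indicator="DF"):
    """Objective vector (ER_E, DI_E); both components are minimized."""
    return (error_rate(predicted, truth),
            ensemble_diversity(correct_flags, indicator))

b1 = [True, True, False, False]   # hypothetical correctness flags
b2 = [True, False, True, False]
pair = diversity_indicators(*pairwise_counts(b1, b2))
```

For these two hypothetical classifiers each of c, p, q, and d equals 1, so QS, KS, and CC are 0, AM is 0.5, and DF is 0.25.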
S305: Determine the elite ant lions in the ant lion population, and generate a newly-increased ant population through the enhanced random walk strategy according to the elite ant lions.
In this embodiment, the step of generating newly-added ant individuals through the enhanced random walk strategy according to the elite ant lions may be:
S305-1: Determine the ant lion individual whose corresponding integrated meta-algorithm has the lowest classification error rate in the initial ant lion population as the accuracy elite ant lion, and determine the ant lion individual whose corresponding integrated meta-algorithm has the optimal diversity index value in the initial ant lion population as the diversity elite ant lion.
S305-2: Perform sparsity calculation on each ant lion individual in the ant lion population.
S305-3: According to the sparsity calculation results of the ant lion individuals, select a first roulette ant lion and a second roulette ant lion from the ant lion population by the roulette method.
S305-4: Perform a random walk around the accuracy elite ant lion and the first roulette ant lion based on the enhanced walk strategy to generate a first newly-added ant individual.
S305-5: Perform a random walk around the diversity elite ant lion and the second roulette ant lion based on the enhanced walk strategy to generate a second newly-added ant individual.
S305-6: Generate the newly-increased ant population from the first newly-added ant individuals and the second newly-added ant individuals.
In the embodiments of S305-1 to S305-6, an elite ant lion is the individual with the highest fitness in the ant lion population. Because there are two optimization directions, the lowest classification error rate and the optimal diversity index value, the ant lion individual with the lowest classification error rate is determined as the accuracy elite ant lion, and the ant lion individual with the optimal diversity index value is determined as the diversity elite ant lion. After the accuracy elite ant lion and the diversity elite ant lion are determined, individuals are searched and updated using the enhanced walk strategy on the basis of the hybrid coding mechanism, and randomness is introduced in this process to increase the population diversity of the algorithm.
As an example, in the iterative process, each initial ant performs roulette selection twice to select well-distributed ant lions, producing a first roulette ant lion and a second roulette ant lion that are independent of each other: the first roulette ant lion is selected for the walk associated with the accuracy elite ant lion, and the second roulette ant lion is selected for the walk associated with the diversity elite ant lion. The ant then performs random walks based on the enhanced walk strategy around the accuracy elite ant lion and the first roulette ant lion, and around the diversity elite ant lion and the second roulette ant lion, generating 2 new ants. Note that this process is carried out for each initial ant, and all the first and second newly-added ants generated by the initial ants together form the newly-increased ant population. The enhanced random walk strategy is realized through the following steps:
In the iterative process, the calculation formula of the discrete coding of the individual using the discrete random walk method is as follows:
where the left-hand side is the i-th dimension element value of the discrete vector obtained by performing a discrete random walk on the discrete code of the ant at iteration t; the ant lion term is the i-th dimension discrete code value of the ant lion around which the ant walks at iteration t; X_t = cumsum(r(t)) denotes the cumulative sum of the random walk steps over the first t iterations, where r(t) = (0, r_1, r_2, …, r_t) denotes the vector of random walk steps over the first t iterations; and r is calculated as follows:
where rand denotes a random number uniformly distributed in [0, 1]. It is easy to see that X_t takes values in [-t, t], so X_t/t lies in [-1, 1]. The calculation formula of m(t) is as follows:
where the iteration-dependent term lies in [0, 1] and shows a nonlinear decreasing trend as the number of iterations t increases, and (2rand-1) takes values in [-1, 1], so that m(t) lies in [-1, 1] and shows a nonlinear decreasing trend with randomness as t increases.
After random walk, the updating of the ant discrete coding value is shown as the following formula:
where the left-hand side is the i-th dimension discrete code value of the ant at iteration t, and the other two symbols are the i-th dimension element values of the discrete vectors obtained by the ant's discrete random walks around the elite ant lion and around the roulette ant lion at iteration t, respectively.
For continuous coding of individuals, randomness is introduced in the wandering update process, so that population diversity of an algorithm is increased, and the formula is as follows:
where c and d denote the upper and lower bounds of the individual's dimension values, and c_t and d_t denote the upper and lower bounds of the search range of each dimension value of the ants at iteration t. The calculation formula of I is as follows:
where the value of w depends on the current iteration number t and the maximum number of iterations T: when t ≤ 0.1T, w = 0; when t > 0.1T, w = 2; when t > 0.5T, w = 3; when t > 0.75T, w = 4; when t > 0.9T, w = 5; when t > 0.95T, w = 6. It can be seen that 10^w and the value of I increase as t increases, so that c_t and d_t show a decreasing trend. On the other hand, the random term lies in [0.5, 1] and shows a nonlinear decreasing trend as the number of iterations t increases, so that the resulting factor lies in [0, 1.5] and shows a nonlinear decreasing trend with randomness, thereby introducing a certain randomness into the changes of c_t and d_t. After the random walk, the continuous code value of the ant is updated as follows:
where A_t is the continuous code vector of the ant at iteration t, and the other two symbols are the continuous vectors obtained by the ant's random walks around the elite ant lion and around the roulette ant lion at iteration t, respectively. The enhanced walk strategy searches and updates the different types of codes of an individual. It retains the ALO process that simulates ants sliding into traps, that is, the decreasing trend of the ant search boundaries, while introducing randomness into this process to increase the population diversity of the algorithm.
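The piecewise schedule for w described above can be written as a small Python function. The companion function `shrink_ratio` uses the form I = 10^w · t / T from the original ant lion optimizer; this form is an assumption, since the formula for I used here, including its random term, is not reproduced in this text.

```python
def boundary_exponent(t, T):
    """Exponent w for current iteration t and maximum iteration count T."""
    if t > 0.95 * T:
        return 6
    if t > 0.9 * T:
        return 5
    if t > 0.75 * T:
        return 4
    if t > 0.5 * T:
        return 3
    if t > 0.1 * T:
        return 2
    return 0

def shrink_ratio(t, T):
    """Assumed ratio I = 10**w * t / T (original ALO form, no random term)."""
    return 10 ** boundary_exponent(t, T) * t / T

schedule = [boundary_exponent(t, 100) for t in (5, 11, 51, 76, 91, 96)]
```

The thresholds are tested from the largest to the smallest so that the most restrictive condition that holds determines w. As w grows, the ratio grows by orders of magnitude, which is what shrinks the search boundaries in later iterations.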
Through the enhanced random walk strategy, each ant individual a generates a first newly-added ant individual a' by a random walk around the accuracy elite ant lion and the first roulette ant lion, and a second newly-added ant individual a″ by a random walk around the diversity elite ant lion and the second roulette ant lion. After this operation is performed on every individual in the ant population, the newly-increased ant population formed by all the first and second newly-added ant individuals is obtained.
In one possible embodiment, the sparsity calculation is determined by:
S305-2-1: Determine the niche range corresponding to each ant lion individual;
S305-2-2: Determine the other ant lion individuals within the niche range as adjacent ant lions;
S305-2-3: Determine the sparsity of each ant lion individual according to the number of adjacent ant lions.
In the embodiments of S305-2-1 to S305-2-3, the sparseness of each ant lion solution is determined by determining the number of other ant lion solutions in the adjacent area of the ant lion solution, and the smaller the number of other ant lion solutions in the adjacent area is, the better the distribution of the ant lion solution is, and the greater the number of other ant lion solutions in the adjacent area is, the worse the distribution of the ant lion solution is.
As an example, as shown in the schematic diagram of Fig. 5, the horizontal axis of the coordinate system represents the accuracy of the integrated meta-algorithm corresponding to an ant lion solution, the vertical axis represents the diversity of that integrated meta-algorithm, and the circle around an ant lion solution characterizes the niche range of that solution. The distribution of the ant lion solutions is quantified by calculating the sparsity of the ant lion solutions in the archive through a niche technique, with the following calculation formula:
where s(x, φ) denotes the sparsity of solution x within the niche range of a given radius φ, and y denotes another solution in the archive. The calculation formula for the radius φ is as follows:
where m (m ≥ 2) is the number of optimization objectives; e_i and e_j denote elite ant lions; ||F(e_i) - F(e_j)|| is the Euclidean distance between the objective function value vectors of the two; and c is a constant. ||F(x) - F(y)|| is the Euclidean distance between the objective function value vectors of solution x and solution y; when this distance is smaller than the radius φ, y is an adjacent solution of x. The more adjacent solutions a given solution has in the archive, the lower its sparsity and the worse its distribution.
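The niche-based sparsity and the roulette selection that relies on it can be sketched in Python as follows. The radius is passed in as a plain parameter, and the sparsity is taken as 1 / (1 + number of adjacent solutions); both are assumptions made for illustration, since the formulas for s(x, φ) and φ are not reproduced in this text. The archive values are hypothetical.

```python
import math
import random

def neighbour_count(name, archive, radius):
    """Number of other archive solutions closer than radius to this one."""
    fx = archive[name]
    return sum(
        1 for other, fy in archive.items()
        if other != name and math.dist(fx, fy) < radius
    )

def roulette_select(archive, radius, rng):
    """Pick one ant lion; fewer neighbours gives a larger probability."""
    names = list(archive)
    # Assumed sparsity: 1 / (1 + neighbour count).
    sparsity = [1.0 / (1 + neighbour_count(n, archive, radius))
                for n in names]
    return rng.choices(names, weights=sparsity, k=1)[0]

# Hypothetical archive of (error rate, diversity index) vectors.
archive = {"a": (0.10, 0.10), "b": (0.12, 0.11), "c": (0.90, 0.80)}
chosen = roulette_select(archive, radius=0.1, rng=random.Random(0))
```

Solutions a and b lie within each other's niche, while c has no neighbours, so c receives twice the selection weight of a or b under this assumed sparsity.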
S306: Perform fitness calculation on each individual in the newly-increased ant population, and then place the newly-increased ant population into the ant lion population to update the ant lion population.
In this embodiment, after the newly-increased ant population obtained in this iteration is obtained, the newly-increased ant population is put into an external archive corresponding to the ant lion population, so as to realize updating of the ant lion population.
S307: Iteratively execute the steps of determining the elite ant lions in the updated ant lion population according to the fitness calculation results, generating a newly-increased ant population through the enhanced random walk strategy according to the elite ant lions, and placing the newly-increased ant population into the ant lion population to update the ant lion population.
In this embodiment, after the updated ant lion population is obtained, fitness calculation is performed on its newly-added individuals, that is, the individuals of the newly-increased ant population. According to the fitness calculation results, the accuracy elite ant lion and the diversity elite ant lion are updated. Based on the updated accuracy elite ant lion and diversity elite ant lion, a new generation of the newly-increased ant population is generated through the enhanced random walk strategy, and the ant lion population is updated with it. This iterative process continues until the corresponding iteration condition is satisfied.
S308: When the iteration exit condition is met, output the ant lion population obtained in the last iteration to obtain the integrated meta-algorithm set.
In this embodiment, when the maximum number of iterations is reached, the ant lion population obtained in the last iteration and stored in the external archive is taken as the output. Each ant lion individual in the ant lion population corresponds to one integrated meta-algorithm, so an integrated meta-algorithm set is obtained, which is formed by the screened integrated meta-algorithms.
S309: Select a target integrated meta-algorithm from the integrated meta-algorithm set, and select, according to the target integrated meta-algorithm, the algorithm with the best performance in the application scenario from the alternative algorithms.
In this embodiment, after the final integrated meta-algorithm set is obtained, the target integrated meta-algorithm is selected from it according to the user's requirements, which may be, for example, algorithm accuracy or algorithm diversity.
As an example, if the user's application scenario is image classification and the user wants the algorithm for image classification to be selected as accurately as possible, the target integrated meta-algorithm is the integrated meta-algorithm with the highest accuracy. Each of the alternative algorithms is then used as input to the target integrated meta-algorithm, and the result output by the target integrated meta-algorithm is the algorithm with the best performance in the application scenario for executing the current image classification task.
The scheme of the present application is described below in connection with the actual operation flow of Fig. 6. First, a metadata set is input, and the ant population is initialized according to the hybrid coding mechanism. The fitness of the ants is calculated, the Pareto solutions are selected from the ant solutions through the Pareto dominance relationship, and these solutions are stored in the external archive as ant lions. According to the preference elite selection mechanism, the elite individuals on the accuracy and diversity optimization targets are selected from the ant lions and are called the accuracy elite ant lion and the diversity elite ant lion, respectively. In the iterative process, each initial ant performs roulette selection twice to choose well-distributed ant lions, and then performs random walks based on the enhanced walk strategy around the accuracy elite ant lion and the first roulette ant lion, and around the diversity elite ant lion and the second roulette ant lion, generating 2 new ants. The fitness of the new ant population is calculated, the new ant population is added to the archive, and the archive and the elite ant lions are updated. Finally, when the maximum number of iterations is reached, the integrated meta-algorithms constructed from the ant lions are output. Throughout the optimization of the algorithm selection model, the accuracy and the diversity index value of the integrated meta-algorithm are used as the fitness of an individual.
According to the algorithm selection method based on the ant lion algorithm, full use of the meta-features is achieved through meta-feature selection, the integrated meta-algorithm is constructed using selective integration, and the accuracy and diversity of the integrated meta-algorithm are used as optimization targets. The model is optimized by the ant lion algorithm: the hybrid coding mechanism allows an individual to complete meta-feature selection and selective construction of the integrated meta-algorithm at the same time, and the enhanced walk strategy and the preference elite selection mechanism strengthen the optimization capability of the algorithm. In the iterative process, for each optimization target, each initial ant selects one well-distributed ant lion through roulette and walks randomly around the elite ant lion of that optimization target and the roulette ant lion to generate a new individual solution. In this way, a new ant population whose size is a multiple of the initial population is obtained and added to the external archive, and the Pareto solutions in the archive are screened and retained as the new generation of ant lions. Through the preference elite selection mechanism, iterative updating is carried out around the optimal individuals on the different optimization targets, which strengthens the optimization performance of the algorithm; searching around the well-distributed individuals chosen by roulette makes the solutions better distributed.
The embodiment of the invention also provides an algorithm selecting device based on the ant lion algorithm, referring to fig. 7, a functional block diagram of a first aspect of the embodiment of the algorithm selecting device based on the ant lion algorithm is shown, and the device comprises:
a data acquisition module 701, configured to determine an alternative algorithm according to an application scenario;
an encoding module 702 for constructing a metadata set;
an initial ant population generation module 703, configured to randomly initialize a plurality of ant individuals, and perform mixed encoding on the ant individuals according to the number of meta-features of the meta-data set, so as to generate an initial ant population;
the ant lion determining module 704 is configured to perform fitness calculation on ant individuals in the initial ant population based on the objective optimization function, and determine the ant lion population according to the fitness calculation result of the ant individuals;
the ant population updating module 705 is configured to determine the elite ant lions in the ant lion population, and to generate a newly added ant population from the elite ant lions through an enhanced random walk strategy;
the ant lion population updating module 706 is configured to perform fitness calculation on the individuals in the newly added ant population, and then place the newly added ant population into the ant lion population to update the ant lion population;
the iteration module 707 is configured to iteratively perform the steps of determining the elite ant lions in the updated ant lion population according to the fitness calculation results, generating a newly added ant population from the elite ant lions through the enhanced random walk strategy, and placing the newly added ant population into the ant lion population to update the ant lion population;
the output module 708 is configured to output, when the iteration exit condition is satisfied, the ant lion population obtained in the last iteration to obtain an integrated meta-algorithm set;
the algorithm selection module 709 is configured to select a target integrated meta-algorithm from the integrated meta-algorithm set, and select, according to the target integrated meta-algorithm, the algorithm with the best performance in the application scenario from the alternative algorithms.
In one possible implementation, the encoding module 702 includes:
the meta-feature determining submodule is used for extracting features of the historical data set and determining the quantity of meta-features;
the evaluation sub-module is used for generating an optimal algorithm on the historical data set through a performance measure evaluation method, wherein different performance measure evaluation methods generate different optimal algorithms;
and the metadata set construction sub-module is used for forming a meta-instance by taking the meta-features as attributes and the optimal algorithm as a label, and constructing the metadata set according to the meta-instances.
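As a minimal sketch of the metadata set construction described above, each historical data set can contribute one meta-instance whose attributes are its meta-features and whose label is its optimal algorithm. The function name, the dictionary layout and the example values are hypothetical and only illustrate the pairing.

```python
def build_meta_dataset(meta_features_by_dataset, best_algorithm_by_dataset):
    """Pair each data set's meta-feature vector with its optimal-algorithm label."""
    meta_dataset = []
    for name, features in meta_features_by_dataset.items():
        meta_dataset.append(
            {"attributes": list(features), "label": best_algorithm_by_dataset[name]}
        )
    return meta_dataset


# Hypothetical example: two historical data sets, three meta-features each.
example = build_meta_dataset(
    {"dataset_a": [120, 8, 0.31], "dataset_b": [5000, 64, 0.07]},
    {"dataset_a": "knn", "dataset_b": "svm"},
)
```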
In a possible embodiment, the evaluation submodule includes:
an index determining unit for determining a performance evaluation index of the performance measure evaluating method;
the evaluation unit is used for determining the performance evaluation index value of the alternative algorithm according to the historical data set;
and the screening unit is used for determining the alternative algorithm with the optimal performance evaluation index value as an optimal algorithm.
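The screening step can be sketched as picking the alternative algorithm whose performance evaluation index value is optimal. Whether a larger or a smaller value is optimal depends on the chosen index, so the `higher_is_better` flag below is an assumption, as are the function name and the example values.

```python
def best_algorithm(index_values, higher_is_better=True):
    """Return the alternative algorithm with the optimal index value.

    `index_values` maps an algorithm name to its performance evaluation
    index value measured on one historical data set.
    """
    chooser = max if higher_is_better else min
    return chooser(index_values, key=index_values.get)
```

For an index such as accuracy the default applies; for an index such as running time, `higher_is_better=False` would be the natural choice.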
In one possible implementation, the encoding module 702 further includes:
the dividing sub-module is used for dividing the metadata set into a training metadata set and a testing metadata set;
the sampling sub-module is used for sampling the training metadata set by the bootstrap method according to the number of the base classifiers to obtain an initial training sub-data set;
the selection sub-module is used for performing discrete encoding on the ant individuals according to the number of the meta-features and performing meta-feature selection on the initial training sub-data set according to the encoding result;
the first updating sub-module is used for updating the initial training sub-data set according to the selection result of the meta-features to obtain a first training sub-data set;
the second updating sub-module is used for performing continuous encoding on the ant individuals according to the number of the base classifiers to obtain weight codes of the base classifiers, and updating the first training sub-data set according to the magnitude relationship between the weight codes of the base classifiers and the selection threshold code to obtain a target training sub-data set;
the training sub-module is used for training the base classifiers based on the target training sub-data set to obtain a base classifier set;
and the integration sub-module is used for generating the integration weights of the base classifier set according to the weight codes of the base classifiers, and integrating the base classifiers according to the integration weights to obtain an integrated meta-algorithm.
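The hybrid encoding above can be sketched as one vector per ant holding a discrete part (one bit per meta-feature) and a continuous part (one weight code per base classifier plus a selection threshold code). The layout of the vector, the rule that a base classifier is kept when its weight code exceeds the threshold code, and the normalization of the kept weight codes into integration weights are all assumptions made for illustration; the passage only states that the magnitude relationship is used. All function names are hypothetical.

```python
import random
from collections import defaultdict


def bootstrap_sample(rows, rng):
    """Bootstrap: draw len(rows) rows with replacement."""
    return rng.choices(rows, k=len(rows))


def decode_individual(individual, n_features, n_classifiers):
    """Split a hybrid-coded ant into selected meta-features and integration weights."""
    bits = individual[:n_features]
    weights = individual[n_features:n_features + n_classifiers]
    threshold = individual[n_features + n_classifiers]
    selected_features = [i for i, bit in enumerate(bits) if bit == 1]
    # Assumed rule: keep a base classifier when its weight code exceeds the threshold.
    kept = [j for j, w in enumerate(weights) if w > threshold]
    total = sum(weights[j] for j in kept)
    integration_weights = {j: weights[j] / total for j in kept} if total > 0 else {}
    return selected_features, integration_weights


def weighted_vote(predictions, integration_weights):
    """Combine base-classifier predictions (index -> label) by weighted voting."""
    scores = defaultdict(float)
    for j, weight in integration_weights.items():
        scores[predictions[j]] += weight
    return max(scores, key=scores.get)
```

For example, the individual `[1, 0, 1, 0.6, 0.2, 0.9, 0.5]` with three meta-features and three base classifiers keeps meta-features 0 and 2 and base classifiers 0 and 2 under these assumptions.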
In one possible implementation, the ant lion determination module 704 includes:
the first optimization sub-module is used for determining the lowest algorithm classification error rate as the optimization direction of the first objective function;
the second optimization sub-module is used for determining the optimal algorithm diversity index value as the optimization direction of the second objective function;
the fitness calculation sub-module is used for calculating the fitness of each ant individual according to the first objective function and the second objective function;
the initial ant lion population generation sub-module is used for determining the Pareto dominance relationships among the fitness calculation results, eliminating the ant individuals whose fitness calculation results are dominated, and generating the ant lion population.
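The Pareto screening can be sketched as follows. The sketch assumes that each fitness is a tuple of objective values that are all to be minimized (a diversity index that is to be maximized could be negated first); the function names are hypothetical.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(fitnesses):
    """Return the indices of the individuals that no other individual dominates."""
    return [
        i
        for i, f in enumerate(fitnesses)
        if not any(dominates(g, f) for j, g in enumerate(fitnesses) if j != i)
    ]
```

The individuals whose indices are returned would form the ant lion population; the rest are eliminated.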
In one possible implementation, the fitness computing sub-module includes:
a classification error rate calculation unit for calculating the classification error rate of the integrated meta-algorithm according to the first objective function;
the diversity index value calculation unit is used for calculating the diversity index value of the integrated meta-algorithm according to the second objective function;
and the fitness calculating unit is used for determining the fitness of the ant individual corresponding to the integrated meta-algorithm according to the classification error rate of the integrated meta-algorithm and the diversity index value of the integrated meta-algorithm.
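A two-objective fitness of this kind can be sketched as a pair (classification error rate, diversity index value). The passage does not say which diversity index is used, so the mean pairwise disagreement between base classifiers below is a stand-in chosen for illustration, and the function names are hypothetical.

```python
from itertools import combinations


def classification_error_rate(predicted, actual):
    """Fraction of test meta-instances whose predicted label is wrong."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def mean_pairwise_disagreement(base_predictions):
    """Average, over classifier pairs, of the fraction of instances they label differently."""
    pairs = list(combinations(base_predictions, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for u, v in pairs:
        total += sum(1 for a, b in zip(u, v) if a != b) / len(u)
    return total / len(pairs)


def fitness(ensemble_predicted, actual, base_predictions):
    """Return (error rate, diversity) for one integrated meta-algorithm."""
    return (
        classification_error_rate(ensemble_predicted, actual),
        mean_pairwise_disagreement(base_predictions),
    )
```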
In one possible implementation, the ant population update module 705 includes:
the elite ant lion determination submodule is used for determining the ant lion individual whose corresponding integrated meta-algorithm has the lowest classification error rate as the accuracy elite ant lion, and determining the ant lion individual whose corresponding integrated meta-algorithm has the optimal diversity index value as the diversity elite ant lion;
the sparsity calculation submodule is used for performing sparsity calculation on each ant lion individual in the ant lion population;
the selecting submodule is used for selecting a first roulette ant lion and a second roulette ant lion from the ant lion population by a roulette method according to the sparsity calculation results;
the first individual adding sub-module is used for performing a random walk around the accuracy elite ant lion and the first roulette ant lion based on the enhanced walk strategy to generate a first newly added ant individual;
and the second individual adding sub-module is used for performing a random walk around the diversity elite ant lion and the second roulette ant lion based on the enhanced walk strategy to generate a second newly added ant individual.
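The elite determination and the roulette selection can be sketched as below. The sketch assumes that each fitness is a pair (error rate, diversity index value), that a larger diversity index value is the optimal one, and that the roulette probability is proportional to the sparsity score; none of these details is fixed by the passage, and the function names are hypothetical.

```python
import random


def pick_elites(fitnesses):
    """Return (index of accuracy elite, index of diversity elite)."""
    indices = range(len(fitnesses))
    accuracy_elite = min(indices, key=lambda i: fitnesses[i][0])
    # Assumption: a larger diversity index value is better.
    diversity_elite = max(indices, key=lambda i: fitnesses[i][1])
    return accuracy_elite, diversity_elite


def roulette_select(sparsity, rng):
    """Pick an index with probability proportional to its sparsity score."""
    total = sum(sparsity)
    if total <= 0:
        return rng.randrange(len(sparsity))
    pick = rng.random() * total
    cumulative = 0.0
    for i, score in enumerate(sparsity):
        cumulative += score
        if pick < cumulative:
            return i
    return len(sparsity) - 1
```

Calling `roulette_select` twice gives the first and second roulette ant lions, which are then paired with the accuracy elite and the diversity elite respectively.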
In one possible implementation, the sparsity computation submodule includes:
a range determining unit for determining a niche range corresponding to each ant-lion individual;
the adjacent ant lion determining unit is used for determining other ant lion individuals in the range of the niche as adjacent ant lions;
the sparsity determining unit is used for determining the sparsity of each ant-lion individual according to the number of adjacent ant-lions.
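The sparsity calculation can be sketched by counting, for each ant lion, the other ant lions that fall inside its niche. The use of Euclidean distance in objective space, a single shared `niche_radius`, and the score 1 / (1 + number of adjacent ant lions) are assumptions; the passage only states that the sparsity depends on the number of adjacent ant lions.

```python
import math


def sparsity_scores(fitnesses, niche_radius):
    """Score each ant lion higher the fewer neighbours lie within its niche."""
    scores = []
    for i, f in enumerate(fitnesses):
        neighbours = sum(
            1
            for j, g in enumerate(fitnesses)
            if j != i and math.dist(f, g) <= niche_radius
        )
        scores.append(1.0 / (1 + neighbours))
    return scores
```

Under these assumptions an isolated ant lion scores 1.0 and is therefore more likely to be chosen by roulette than one in a crowded region.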
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (apparatus), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. "and/or" means either or both of which may be selected. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The algorithm selection method and device based on the ant lion algorithm provided by the invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the above embodiments is only intended to help understand the method and core idea of the invention. Meanwhile, those skilled in the art may, in accordance with the idea of the invention, make changes to the specific embodiments and the scope of application. In view of the above, the content of this description should not be construed as limiting the invention.

Claims (10)

1. An algorithm selection method based on an ant lion algorithm is characterized by comprising the following steps:
determining an alternative algorithm according to an application scene, wherein the application scene is an image classification scene, and the alternative algorithm is a plurality of image classification algorithms for image classification;
constructing a metadata set, comprising: for any meta-feature, the meta-feature corresponds to an optimal algorithm, the optimal algorithm is used as an algorithm label of the meta-feature to form a meta-instance, and a meta-data set is constructed by a plurality of meta-instances; wherein, the optimal algorithm is: under the performance evaluation index of the classification speed, an image classification algorithm with the highest performance index value in a plurality of image classification algorithms;
randomly initializing a plurality of ant individuals, and performing hybrid encoding on each ant individual according to the number of meta-features of the metadata set to generate an initial ant population, wherein the corresponding meta-feature selection process comprises: based on a hybrid coding mechanism, sampling the training set a plurality of times by the bootstrap method to generate a plurality of training sub-data sets; training base classifiers on the training sub-data sets to form a base classifier set; selecting base classifiers with stronger accuracy and diversity from the base classifier set by a selective ensemble method, and combining them based on a weighted voting strategy to obtain an integrated meta-algorithm; and generating, from the different integrated meta-algorithms formed by different combinations of base classifiers, an integrated meta-algorithm set;
Based on a target optimization function, respectively carrying out fitness calculation on ant individuals in the initial ant population, and determining an ant lion population according to the fitness calculation result of the ant individuals;
determining the elite ant lions in the ant lion population, and generating a newly added ant population from the elite ant lions through an enhanced random walk strategy;
performing fitness calculation on each individual in the newly added ant population, and then placing the newly added ant population into the ant lion population to update the ant lion population;
iteratively executing the steps of determining the elite ant lions in the updated ant lion population according to the fitness calculation results, generating a newly added ant population from the elite ant lions through the enhanced random walk strategy, and placing the newly added ant population into the ant lion population to update the ant lion population;
when the iteration exit condition is met, outputting the ant lion population obtained in the last iteration to obtain an integrated meta-algorithm set;
selecting a target integrated meta-algorithm from the integrated meta-algorithm set, and selecting, according to the target integrated meta-algorithm, the algorithm with the best performance in the application scene from the alternative algorithms, comprising: inputting the plurality of image classification algorithms into the target integrated meta-algorithm, wherein the output of the target integrated meta-algorithm is the algorithm with the highest accuracy for executing the current image classification task in the application scene.
2. The method of claim 1, wherein the step of constructing the metadata set comprises:
extracting features of the historical data set, and determining the number of meta-features;
generating an optimal algorithm on the historical data set through a performance measure evaluation method, wherein different performance measure evaluation methods generate different optimal algorithms;
and forming a meta-instance by taking the meta-feature as an attribute and the optimal algorithm as a label, and constructing the metadata set according to the meta-instance.
3. An algorithm selection method based on the ant lion algorithm according to claim 2, wherein the step of generating the optimal algorithm on the historical dataset by the performance measure evaluation method comprises:
determining a performance evaluation index of the performance measure evaluation method;
determining the performance evaluation index value of the alternative algorithm according to the historical data set;
and determining the alternative algorithm with the optimal performance evaluation index value as the optimal algorithm.
4. The method of claim 1, wherein each ant individual in the initial ant population corresponds to an integrated meta-algorithm, and the step of performing hybrid encoding on the ant individuals according to the meta-feature number of the meta-data set to generate the initial ant population comprises:
Dividing the metadata set into a training metadata set and a testing metadata set;
sampling the training metadata set by the bootstrap method according to the number of the base classifiers to obtain an initial training sub-data set;
performing discrete encoding on the ant individuals according to the number of the meta-features, and performing meta-feature selection on the initial training sub-data set according to the encoding result;
updating the initial training sub-data set according to the selection result of the meta-features to obtain a first training sub-data set;
performing continuous encoding on the ant individuals according to the number of the base classifiers to obtain weight codes of the base classifiers, and updating the first training sub-data set according to the magnitude relationship between the weight codes of the base classifiers and the selection threshold code to obtain a target training sub-data set;
training the base classifiers based on the target training sub-data set to obtain a base classifier set;
and generating the integration weights of the base classifier set according to the weight codes of the base classifiers, and integrating the base classifiers according to the integration weights to obtain an integrated meta-algorithm.
5. The method for selecting an algorithm based on an ant lion algorithm according to claim 1, wherein the objective optimization function includes a first objective function and a second objective function, the step of performing fitness calculation on each ant individual in the initial ant population based on the objective optimization function, and determining the ant lion population according to the fitness calculation result includes:
Determining the lowest algorithm classification error rate as the optimization direction of the first objective function;
determining the optimal algorithm diversity index value as the optimization direction of the second objective function;
performing fitness calculation on each ant individual according to the first objective function and the second objective function;
determining the Pareto dominance relationships among the fitness calculation results, eliminating the ant individuals whose fitness calculation results are dominated, and generating the ant lion population.
6. The method according to claim 5, wherein the step of performing fitness calculation on each ant individual according to the first objective function and the second objective function, respectively, comprises:
calculating the classification error rate of the integrated meta-algorithm according to the first objective function;
calculating the diversity index value of the integrated meta-algorithm according to the second objective function;
and determining the fitness of the ant individual corresponding to the integrated meta-algorithm according to the classification error rate of the integrated meta-algorithm and the diversity index value of the integrated meta-algorithm.
7. The method of claim 1, wherein the step of determining the elite ant lions in the ant lion population and generating a newly added ant population from the elite ant lions through an enhanced random walk strategy comprises:
determining the ant lion individual whose corresponding integrated meta-algorithm has the lowest classification error rate as the accuracy elite ant lion, and determining the ant lion individual whose corresponding integrated meta-algorithm has the optimal diversity index value as the diversity elite ant lion;
performing sparsity calculation on each ant lion individual in the ant lion population;
according to the sparsity calculation results of the ant lion individuals, selecting a first roulette ant lion and a second roulette ant lion from the ant lion population by a roulette method;
performing a random walk around the accuracy elite ant lion and the first roulette ant lion based on an enhanced walk strategy to generate a first newly added ant individual;
performing a random walk around the diversity elite ant lion and the second roulette ant lion based on the enhanced walk strategy to generate a second newly added ant individual;
generating the newly added ant population from the first newly added ant individuals and the second newly added ant individuals.
8. The method for selecting an algorithm based on an ant-lion algorithm according to claim 7, wherein the step of performing sparsity computation on each ant-lion individual in the ant-lion population, respectively, comprises:
Determining the corresponding niche range of each ant-lion individual;
determining other ant lion individuals within the niche range as adjacent ant lions;
and determining the sparsity of each ant lion individual according to the number of the adjacent ant lions.
9. An algorithm selecting device based on ant lion algorithm, which is characterized in that the device comprises:
the data acquisition module is used for determining an alternative algorithm according to an application scene, wherein the application scene is an image classification scene, and the alternative algorithm is a plurality of image classification algorithms for image classification;
an encoding module for constructing a metadata set, comprising: for any meta-feature, the meta-feature corresponds to an optimal algorithm, the optimal algorithm is used as an algorithm label of the meta-feature to form a meta-instance, and a meta-data set is constructed by a plurality of meta-instances; wherein, the optimal algorithm is: under the performance evaluation index of the classification speed, an image classification algorithm with the highest performance index value in a plurality of image classification algorithms;
the initial ant population generation module is used for randomly initializing a plurality of ant individuals, and performing hybrid encoding on each ant individual according to the number of meta-features of the metadata set to generate an initial ant population, wherein the corresponding meta-feature selection process comprises: based on a hybrid coding mechanism, sampling the training set a plurality of times by the bootstrap method to generate a plurality of training sub-data sets; training base classifiers on the training sub-data sets to form a base classifier set; selecting base classifiers with stronger accuracy and diversity from the base classifier set by a selective ensemble method, and combining them based on a weighted voting strategy to obtain an integrated meta-algorithm; and generating, from the different integrated meta-algorithms formed by different combinations of base classifiers, an integrated meta-algorithm set;
the ant lion determining module is used for performing fitness calculation on the ant individuals in the initial ant population based on a target optimization function, and determining the ant lion population according to the fitness calculation results of the ant individuals;
the ant population updating module is used for determining the elite ant lions in the ant lion population and generating a newly added ant population from the elite ant lions through an enhanced random walk strategy;
the ant lion population updating module is used for placing the newly added ant population into the ant lion population to update the ant lion population;
the iteration module is used for iteratively determining the elite ant lions in the updated ant lion population according to the fitness calculation results, generating a newly added ant population from the elite ant lions through the enhanced random walk strategy, and placing the newly added ant population into the ant lion population to update the ant lion population;
the output module is used for outputting the ant lion population obtained in the last iteration when the iteration exit condition is met, so as to obtain an integrated meta-algorithm set;
the algorithm selection module is configured to select a target integrated meta-algorithm from the integrated meta-algorithm set, and select, according to the target integrated meta-algorithm, the algorithm with the best performance in the application scenario from the alternative algorithms, comprising: inputting the plurality of image classification algorithms into the target integrated meta-algorithm, wherein the output of the target integrated meta-algorithm is the algorithm with the highest accuracy for executing the current image classification task in the application scene.
10. The ant lion algorithm based algorithm selection device of claim 9, wherein the encoding module comprises:
the meta-feature determining submodule is used for extracting features of the historical data set and determining the quantity of meta-features;
the evaluation sub-module is used for generating an optimal algorithm on the historical data set through a performance measure evaluation method, wherein different performance measure evaluation methods generate different optimal algorithms;
and the metadata set construction sub-module is used for forming a meta-instance by taking the meta-features as attributes and the optimal algorithm as a label, and constructing the metadata set according to the meta-instances.
CN202310127816.XA 2023-02-08 2023-02-08 Algorithm selecting method and device based on ant lion algorithm Active CN116340839B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310127816.XA CN116340839B (en) 2023-02-08 2023-02-08 Algorithm selecting method and device based on ant lion algorithm

Publications (2)

Publication Number Publication Date
CN116340839A (en) 2023-06-27
CN116340839B (en) 2023-10-20

Family

ID=86886686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310127816.XA Active CN116340839B (en) 2023-02-08 2023-02-08 Algorithm selecting method and device based on ant lion algorithm

Country Status (1)

Country Link
CN (1) CN116340839B (en)

Also Published As

Publication number Publication date
CN116340839A (en) 2023-06-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant