CN114625901A

CN114625901A - Multi-algorithm integration method and device

Info

Publication number: CN114625901A
Application number: CN202210519444.0A
Authority: CN
Inventors: 王冲; 唐建松; 张晟辉; 张犇; 朱云; 何远峰
Original assignee: Nanjing Dimension Software Co ltd
Current assignee: Nanjing Dimension Software Co ltd
Priority date: 2022-05-13
Filing date: 2022-05-13
Publication date: 2022-06-14
Anticipated expiration: 2042-05-13
Also published as: CN114625901B

Abstract

The invention discloses a multi-algorithm integration method and device. The method comprises: collecting and registering registration information of a plurality of service algorithms; integrating the interfaces of the registered service algorithms; calling the databases of the registered service algorithms and receiving the data of each database; extracting features from the data of each database; performing association, comparison and identification processing on the extracted features, and storing them by feature type; receiving a service task request; determining at least two optimal service algorithms from the registered service algorithms according to the service task request; and executing the service task with the at least two optimal service algorithms and integrating the execution results. By integrating and comparing multiple algorithms and optimally matching service task requests to algorithms, the method achieves efficient multi-algorithm integration, increases algorithm calling speed, and improves the efficiency and accuracy of service task execution.

Description

Multi-algorithm integration method and device

Technical Field

The invention relates to the technical field of computers, in particular to a multi-algorithm integration method and device.

Background

With the continuous development of image processing technology, video image information systems are increasingly used in public security, customs and other inspection work. For deployment-and-control and person-merging applications, such a system must provide multiple services such as image analysis, image deployment and control, and image retrieval, which requires service algorithms supplied by multiple vendors. Because vendors differ in how they implement and expose their algorithms, a video image information system built on multiple algorithms needs a generic way to integrate them, so that spot-check inspection work can be completed more smoothly.

At present, various algorithm-integration schemes exist. For example, patent document CN112988384A discloses a scene-based method for automatically integrating and calling algorithm resources: it pre-calibrates algorithm capabilities, encapsulates the algorithm models of input algorithm resources in a uniform format as callable interfaces, deploys and opens the received algorithm interfaces and verifies their operating mechanism, and thereby matches the optimal algorithm interface for a user.

This scheme distributes tasks and invokes various algorithm resources in an integrated way, so that the most suitable algorithm interface is automatically evaluated and recommended, the interface formats of similar algorithms are unified, and the user saves time in selecting an algorithm interface. However, it selects the algorithm that best matches a task based only on algorithm capability, so its matching accuracy is poor and its efficiency is low.

Disclosure of Invention

The invention provides a multi-algorithm integration method and device that integrate and compare multiple algorithms and optimally match service task requests to algorithms, thereby achieving efficient multi-algorithm integration, increasing algorithm calling speed, and improving the efficiency and accuracy of service task execution.

A method of multi-algorithm integration, comprising:

collecting and registering registration information of a plurality of service algorithms;

integrating the interfaces of the registered service algorithms;

calling the databases of the registered service algorithms and receiving the data of each database;

extracting features from the data of each database;

performing association, comparison and identification processing on the extracted features, and performing classified storage according to feature types;

receiving a service task request;

determining at least two optimal service algorithms from the registered service algorithms according to the service task request;

and executing the service tasks through the at least two optimal service algorithms, and integrating the execution results of the executed service tasks.

Further, integrating the interfaces of the registered service algorithms comprises:

unifying the assignment of the interface parameters and the return-result formats of all service algorithms to form a universal interface.

Further, the service algorithm includes a classification algorithm, a regression algorithm and a clustering algorithm.

Further, determining at least two optimal service algorithms from the registered plurality of service algorithms according to the service task request, comprising:

analyzing the service task request to obtain a setting parameter related to the service task;

calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters;

and verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.

Further, verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result, including:

respectively establishing N data sets according to the N output result parameters, wherein N is an integer greater than or equal to 3;

in each round of verification, selecting one of the data sets as verification data and using the other data sets as training data; inputting the training data into a service algorithm for training, then inputting the verification data for verification and calculating the mean square error; and, after N rounds of verification, calculating the average mean square error of the service algorithm;

performing the above verification on each service algorithm to obtain its average mean square error, and sorting the service algorithms in ascending order of average mean square error;

and selecting a preset number of top-ranked service algorithms as the optimal service algorithms, wherein the preset number is greater than or equal to two.

Further, the mean square error is calculated by the following formula:

$$\mathrm{MSE} = \frac{\sum_{i=1}^{r} (n_i - 1) S_i^2}{M - r}$$

wherein M is the total number of result data output by training and verification data, r is the number of groups into which the result data output by training and the verification data are divided, M - r is the degrees of freedom, $n_i$ is the size of the i-th sample, and $S_i^2$ is the sample variance of each group.

The average mean square error is calculated according to the following formula:

$$E = \frac{1}{N} \sum_{i=1}^{N} \mathrm{MSE}_i$$

wherein E is the average mean square error, N is the number of verification rounds, and $\mathrm{MSE}_i$ is the mean square error obtained in the i-th round of verification.

Further, integrating the execution results of the service task comprises:

performing a weighted average or a simple average on the execution results to obtain an integrated result; or

performing classification voting on the execution results to obtain an integrated result.
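The two integration options above can be sketched as follows. This is a minimal illustration assuming each optimal algorithm returns either a numeric score (for averaging) or a class label (for voting); the function names and the tie-breaking rule (the label seen first wins) are assumptions, not part of the described method.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def average_results(results: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Simple average when weights is None, otherwise a weighted average."""
    if not results:
        raise ValueError("no execution results to integrate")
    if weights is None:
        return sum(results) / len(results)
    if len(weights) != len(results):
        raise ValueError("one weight is required per result")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * r for w, r in zip(weights, results)) / total


def vote(labels: Sequence[Hashable]) -> Hashable:
    """Majority (classification) voting; ties go to the label seen first."""
    if not labels:
        raise ValueError("no execution results to integrate")
    # Counter.most_common orders equal counts by first occurrence.
    return Counter(labels).most_common(1)[0][0]
```

For example, two algorithms scoring 1.0 and 3.0 with weights 3 and 1 integrate to 1.5, and the labels "match", "no match", "match" vote to "match".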

A multi-algorithm integration device applying the above method comprises:

the registration module is used for collecting and registering registration information of a plurality of service algorithms;

the interface integration module is used for integrating the interfaces of the registered service algorithms;

the database calling module is used for calling the registered databases of the plurality of service algorithms and receiving the data of each database;

the database feature extraction module is used for extracting features of data of each database;

the database integration module is used for performing association, comparison and identification processing on the extracted features and performing classified storage according to feature types;

the receiving module is used for receiving a service task request;

the determining module is used for determining at least two optimal service algorithms from the registered service algorithms according to the service task request;

and the result integration module is used for executing the service tasks through the at least two optimal service algorithms and integrating the execution results of the executed service tasks.

An electronic device comprises a processor and a storage device, wherein the storage device stores a plurality of instructions, and the processor is used for reading the instructions and executing the method.

The multi-algorithm integration method and the device provided by the invention at least have the following beneficial effects:

(1) The interfaces of the multiple service algorithms and their associated databases are integrated into external interfaces with a uniform format and into databases that are easier to call and compare, which increases algorithm running speed and improves service task execution efficiency.

(2) Optimal algorithms are matched to each service task request from among the available algorithms; the results generalize better and are more stable and comprehensive, algorithm execution error is minimized, and task execution accuracy is improved.

(3) Algorithm results are integrated by voting and averaging, which improves the accuracy of the service execution results.

Drawings

Fig. 1 is a flowchart of an embodiment of a multi-algorithm integration method provided in the present invention.

Fig. 2 is a flowchart of an embodiment of a method for verifying an output result parameter in the method according to the present invention.

Fig. 3 is a schematic structural diagram of an embodiment of a multi-algorithm integration apparatus provided in the present invention.

Fig. 4 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.

Reference numerals: 1-a processor, 101-a registration module, 102-an interface integration module, 103-a database calling module, 104-a database feature extraction module, 105-a database integration module, 106-a receiving module, 107-a determination module, 108-a result integration module and 2-a storage device.

Detailed Description

For a better understanding, the technical solution is described in detail below with reference to the drawings and specific embodiments.

Referring to fig. 1, in some embodiments, there is provided a multi-algorithm integration method comprising:

S1, collecting and registering registration information of a plurality of service algorithms;

S2, integrating the interfaces of the registered service algorithms;

S3, calling the databases of the registered service algorithms and receiving the data of each database;

S4, extracting features from the data of each database;

S5, performing association, comparison and identification processing on the extracted features, and performing classified storage according to feature types;

S6, receiving a service task request;

S7, determining at least two optimal service algorithms from the registered service algorithms according to the service task request;

S8, executing the service task through the at least two optimal service algorithms, and integrating the execution results of the executed service task.

Specifically, in some embodiments, taking the field of deployment and control as an example, the service algorithms collected in step S1 give the algorithm platform the following service capabilities: an image structured analysis service, an image deployment and control service, an image retrieval service, an image scheduling service and an image autonomous comparison service.

The image structured analysis service is the capability to analyze online image streams, that is, to perform structured analysis on image streams imported in real time, including analysis of faces, bodies and accessories, so as to produce face-trajectory structured data that conforms to the specification.

The image deployment and control service uses image similarity comparison to provide real-time, image-based deployment-and-control comparison, including online comparison of face and body images. By building deployment-and-control object libraries of various scopes and comparing the features of persons extracted from online image streams with those in a specified library (such as a fugitive library or a key object library), it provides deployment-and-control early warning.

The image retrieval service uses image feature extraction and computation to provide retrieval based on image similarity comparison, returning results according to the configured image similarity and other retrieval conditions (time, space, element attributes and the like).

The image scheduling service builds retrieval capability over the various image storage environments processed by the video image analysis platform, covering various images (including local small images and full background images). It supports image retrieval by identifiers such as access address and image ID, as well as batch image queries by time range and point location.

The image autonomous comparison service compares the similarity of a specified image or image set based on image similarity comparison and returns the results that satisfy the similarity requirement; it supports multiple comparison modes such as 1:1, 1:q and q:q.

The 1:1 mode compares the similarity of two specified images to determine whether they point to the same object; for example, comparing an ID photo with an on-site snapshot completes identity verification of the person being checked. The 1:q mode compares a specified image with a specific image library to find the images in the library that reach the specified similarity; for example, comparing the photo of the person being checked with the key object library determines whether that person is a key object. The q:q mode compares two image sets to find coinciding objects with close similarity, for data intersection analysis; for example, comparing the sets of photos of suspects appearing in two different cases supports case-linkage analysis.
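The three comparison modes differ only in how many images sit on each side of the comparison. The sketch below assumes each image has already been reduced to a feature vector and compares vectors by cosine similarity against a threshold; both the vector representation and the choice of cosine similarity are assumptions, since the text does not specify the similarity measure.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def compare_1_1(a: Vector, b: Vector, threshold: float) -> bool:
    """1:1 mode: whether two images point to the same object."""
    return cosine(a, b) >= threshold


def compare_1_q(probe: Vector, library: Dict[str, Vector],
                threshold: float) -> List[Tuple[str, float]]:
    """1:q mode: library images reaching the threshold, most similar first."""
    hits = [(key, cosine(probe, vec)) for key, vec in library.items()]
    return sorted((h for h in hits if h[1] >= threshold),
                  key=lambda h: h[1], reverse=True)


def compare_q_q(set_a: Dict[str, Vector], set_b: Dict[str, Vector],
                threshold: float) -> List[Tuple[str, str, float]]:
    """q:q mode: cross-set pairs that coincide (data intersection)."""
    pairs = []
    for key_a, vec_a in set_a.items():
        for key_b, vec_b in set_b.items():
            score = cosine(vec_a, vec_b)
            if score >= threshold:
                pairs.append((key_a, key_b, score))
    return pairs
```

The q:q loop is quadratic in the set sizes; a large deployment would normally put an approximate nearest-neighbour index behind the same interface.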

The registration information of the service algorithms comprises detailed information on each algorithm, its usage, its technical specifications and other information; specifically, it comprises basic algorithm information, algorithm monitoring analysis, algorithm heartbeat monitoring records, algorithm usage log records and algorithm data reconciliation.

The basic algorithm information comprises: the source algorithm protocol (e.g., webservice), the return-result format (e.g., xml), the execution mode (e.g., post), the source interface address, algorithm authorization password information, the algorithm providing unit, the algorithm provider's contact, a usage description of the source algorithm (e.g., the algorithm's API document), the monitoring status (e.g., the algorithm is checked once every M minutes, and a call that returns no result within K seconds is treated as a no-response exception), and the service to which the algorithm belongs.
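A registration record holding this basic information might be modelled as below. The field names, the in-memory registry and the rule that a name may only be registered once are illustrative assumptions; M and K stay as parameters because the text leaves their values open.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlgorithmRegistration:
    name: str
    protocol: str               # source algorithm protocol, e.g. "webservice"
    result_format: str          # return-result format, e.g. "xml"
    execution_mode: str         # e.g. "post"
    source_url: str             # source interface address
    service: str                # service the algorithm belongs to
    monitor_every_min: int      # M: monitoring interval in minutes
    no_response_after_s: float  # K: seconds without a result => no response
    credentials: str = ""       # algorithm authorization information
    provider_unit: str = ""
    provider_contact: str = ""
    usage_doc: str = ""         # e.g. link to the algorithm's API document


class AlgorithmRegistry:
    """Hypothetical in-memory registry keyed by algorithm name."""

    def __init__(self) -> None:
        self._items: Dict[str, AlgorithmRegistration] = {}

    def register(self, reg: AlgorithmRegistration) -> None:
        if reg.name in self._items:
            raise ValueError(f"algorithm {reg.name!r} is already registered")
        self._items[reg.name] = reg

    def by_service(self, service: str) -> List[AlgorithmRegistration]:
        """All registered algorithms that belong to the given service."""
        return [r for r in self._items.values() if r.service == service]
```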

The algorithm monitoring analysis displays the daily running condition of the algorithm. In a preferred implementation, a monitoring analysis report is provided as a two-dimensional chart in which the x-axis shows the dates of the last 30 days and the y-axis shows whether the algorithm was normal or abnormal on each day. A switch in the upper-right corner of the report toggles the y-axis between duration and count, with duration shown by default. In duration mode, two columns are shown for each day: one for the duration of failure (in hours) and one for the duration of normal operation (in hours).

The algorithm heartbeat monitoring record comprises query conditions, a list display and list content. The query conditions include the monitoring time range and the monitoring result; the list displays the sequence number, monitoring time, response speed (in milliseconds) and monitoring result, where the monitoring result is one of three types: no response, slow response and normal response.
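Mapping one heartbeat probe to the three monitoring results could look like this. The timeout corresponds to the K seconds of the basic information; the separate "slow" threshold is a hypothetical parameter, because the text does not say where slow response begins.

```python
from enum import Enum
from typing import Optional


class HeartbeatResult(Enum):
    NO_RESPONSE = "no response"
    SLOW = "slow response"
    NORMAL = "normal response"


def classify_heartbeat(response_ms: Optional[float], timeout_ms: float,
                       slow_ms: float) -> HeartbeatResult:
    """Classify one probe; response_ms is None when no reply arrived at all.

    timeout_ms plays the role of K seconds; slow_ms is a hypothetical
    threshold above which a reply counts as slow.
    """
    if slow_ms > timeout_ms:
        raise ValueError("slow_ms must not exceed timeout_ms")
    if response_ms is None or response_ms > timeout_ms:
        return HeartbeatResult.NO_RESPONSE
    if response_ms > slow_ms:
        return HeartbeatResult.SLOW
    return HeartbeatResult.NORMAL
```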

The algorithm usage log record comprises query conditions, a list display and list content. The query conditions include the time-of-use range, the return-result time, the using unit and the using IP address; the list displays the sequence number, time of use, return speed, using IP address, using unit and request parameter details.

The algorithm data reconciliation comprises query conditions, a list display and list content. The query conditions include the reconciliation time range and the reconciliation result, which is one of three types: normal data, lost data and abnormal increase; the list displays the sequence number, reconciliation time, accessed data type, accessed data volume, output data volume and docking result.
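A reconciliation row and its result could be derived by comparing accessed and output volumes. The rule used here (equal volumes mean normal data, fewer output records mean lost data, more output records mean an abnormal increase) is an assumption read from the three result names; the text does not state the exact rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReconciliationRow:
    data_type: str       # accessed data type
    input_volume: int    # accessed data volume
    output_volume: int   # output data volume

    @property
    def result(self) -> str:
        """Reconciliation result under the hypothetical volume-comparison rule."""
        if self.output_volume == self.input_volume:
            return "normal data"
        if self.output_volume < self.input_volume:
            return "lost data"
        return "abnormal increase"
```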

In step S2, the interface integration of the registered service algorithms is performed as follows.

In a preferred implementation, based on the unified GA/T1400 service interface specification, the specific usage parameters and return-result formats of different vendors' algorithm interfaces are integrated into a uniform assignment standard and a universal interface. A generic REST framework is supported as the implementation of the scheduling interface, and two interface checking mechanisms ensure the timeliness and accuracy of the service. The first is timeout control: to avoid degrading the user experience when a scheduled service interface gives no feedback for a long time, the dispatch of any interface task that returns no result within 5 seconds is automatically terminated. The second is repeated requests: to avoid response failures caused by factors such as network instability, a single service interface whose response fails is scheduled again, and this repeated cyclic scheduling improves the interface's response rate.
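The timeout and repeated-request mechanisms can be combined into one wrapper. A minimal Python sketch, assuming the vendor call is a zero-argument callable and that three attempts is an acceptable retry budget (the text fixes only the 5-second timeout). Python cannot forcibly stop a worker thread, so "terminating" a dispatch here means abandoning its result; a production scheduler would also cancel the underlying HTTP request.

```python
import concurrent.futures as cf
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_timeout_and_retry(fn: Callable[[], T], timeout_s: float = 5.0,
                                max_attempts: int = 3) -> T:
    """Dispatch fn; abandon any attempt exceeding timeout_s and retry failures."""
    last_error = None
    for _ in range(max_attempts):
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except cf.TimeoutError as exc:
            # Timeout control: no result in time, so give up on this dispatch.
            last_error = exc
        except Exception as exc:
            # Repeated request: a failed response (e.g. network error) is retried.
            last_error = exc
        finally:
            # Do not block on a hung call; the thread itself cannot be killed.
            pool.shutdown(wait=False)
    raise RuntimeError(f"service failed after {max_attempts} attempts") from last_error
```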

For image analysis interfaces that must be used online and that process real-time data or respond to requests in real time, such as face retrieval and face similarity comparison, the algorithm APIs of different vendors are converted in a standardized way, following the relevant view library specifications, into a universal interface. Specifically, the RESTful interface specification is adopted as the form of the universal interface; the HTTP interfaces and SDK-based development kits of different vendors can be converted into this universal interface and applied in various service scenarios.
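Standardized conversion amounts to one adapter per vendor that maps each native response onto a single universal shape. The vendor names, their response layouts and the universal {code, message, data} envelope below are all hypothetical; a real deployment would take the target structure from the GA/T1400 view library specification.

```python
from typing import Any, Callable, Dict

Universal = Dict[str, Any]  # hypothetical envelope: {"code", "message", "data"}


def from_vendor_a(raw: Dict[str, Any]) -> Universal:
    # Hypothetical vendor A layout: {"ok": bool, "msg": str, "faces": [...]}
    return {"code": 0 if raw["ok"] else 1,
            "message": raw.get("msg", ""),
            "data": raw.get("faces", [])}


def from_vendor_b(raw: Dict[str, Any]) -> Universal:
    # Hypothetical vendor B layout: {"errCode": int, "errMsg": str, "result": {"items": [...]}}
    return {"code": raw["errCode"],
            "message": raw["errMsg"],
            "data": raw.get("result", {}).get("items", [])}


ADAPTERS: Dict[str, Callable[[Dict[str, Any]], Universal]] = {
    "vendor_a": from_vendor_a,
    "vendor_b": from_vendor_b,
}


def to_universal(vendor: str, raw: Dict[str, Any]) -> Universal:
    """Convert a vendor-native response into the universal interface format."""
    try:
        adapter = ADAPTERS[vendor]
    except KeyError:
        raise ValueError(f"no adapter registered for {vendor!r}") from None
    return adapter(raw)
```

Keeping the adapters in a table means integrating a new vendor only adds an entry; callers of `to_universal` are unchanged.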

The relevant specification interfaces include common interfaces, collection interfaces and service interfaces. The common interfaces are the four interfaces for registration, keep-alive, deregistration and time synchronization; the collection interfaces include interfaces for uploading video clips, images, files, faces, motor vehicles, non-motor vehicles, objects, scenes and the like; the service interfaces include query and maintenance interfaces for video clips, images, files, faces, motor vehicles, non-motor vehicles, objects, scenes, deployment-and-control tasks, alarm information, subscription records, notification records and the like.

The relevant service specification is as follows: the Content-Type header field of interface messages should be set to application/+ JSON. For the GET method, when the query succeeds (that is, the HTTP response status code is 2XX), the returned item is the result object; when the query fails (that is, the HTTP response status code is not 2XX), the returned object is ResponseStatus. The specification service interfaces used in one specific application scenario are shown in Table 1.

TABLE 1

The definitions of Register and ResponseStatus should conform to GA/T1400.3, where the ID of ResponseStatus is the DeviceID that requested registration, StatusCode is the operation response code of this registration, StatusString is the operation response description of this registration, and LocalTime is the system time of the registered party, which can be used for time synchronization.
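The GET success/failure rule and the ResponseStatus fields can be handled as sketched below. The field names ID, StatusCode, StatusString and LocalTime come from the text (and are kept in their specification spelling rather than Python style); wrapping the failure body under a "ResponseStatus" key is an assumption, so the exact JSON layout should be checked against GA/T1400.3.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass(frozen=True)
class ResponseStatus:
    ID: str            # DeviceID that requested registration
    StatusCode: int    # operation response code
    StatusString: str  # operation response description
    LocalTime: str     # registered party's system time, usable for time sync


def parse_get_response(status: int,
                       body: Dict[str, Any]) -> Union[Dict[str, Any], ResponseStatus]:
    """A 2XX status returns the result object; any other status yields ResponseStatus."""
    if 200 <= status < 300:
        return body
    s = body["ResponseStatus"]  # assumed wrapper key
    return ResponseStatus(ID=s["ID"], StatusCode=int(s["StatusCode"]),
                          StatusString=s["StatusString"], LocalTime=s["LocalTime"])
```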

In step S3, the databases of the registered service algorithms are called and their data received, in preparation for database integration. Database integration is a data standardization process that supports real-time computation, offline computation and batch processing, and data transmission supports a distributed mode. During data processing, artificial intelligence techniques such as graph computing and in-memory computing are introduced to process both structured and unstructured data and raise the value of the data. A model system, a label system and knowledge graph technology are also introduced to further increase the value density of the data, providing value-added, prepared and abstracted data for intelligent data applications. For cases in which algorithm processing depends on a designated database, such as the database used by the image deployment-and-control comparison service or by person merging (e.g., an Oracle or HBase database), two approaches are provided: view conversion and database docking. In view conversion, the general database is converted into the view structure required by the algorithm and provided to the algorithm; in database docking, the analysis results output by the algorithm are transferred from the algorithm's own database to the universal database, and the structure specification is converted during the transfer.
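The two approaches are mirror-image field mappings: view conversion maps general-database columns onto the view an algorithm expects, and docking maps algorithm output back onto the universal schema. The column names and the single mapping table below are hypothetical; real conversions may also need type and unit changes.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical mapping: universal-database column -> algorithm view column.
FACE_VIEW_MAPPING: Dict[str, str] = {
    "person_id": "PID",
    "face_feature": "FEAT",
    "capture_time": "TS",
}


def to_algorithm_view(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """View conversion: expose general-database rows in the algorithm's structure."""
    return [{view_col: row[db_col] for db_col, view_col in mapping.items()}
            for row in rows]


def dock_to_universal(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Database docking: move algorithm output into the universal structure."""
    reverse = {view_col: db_col for db_col, view_col in mapping.items()}
    # Columns with no universal counterpart are dropped during conversion.
    return [{reverse[col]: val for col, val in row.items() if col in reverse}
            for row in rows]
```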

In step S4, feature extraction extracts features relevant to algorithm processing, such as names, text, pictures, ID card numbers, mobile phone numbers, key frames, facial features, fingerprint features, voiceprint features and iris features, from structures contained in the data of each database, such as full-text structured data, web page information, multimedia information and biometric data. After extraction, the features are filtered and cleaned to obtain higher-quality data, specifically by unifying data formats, removing duplicates and erroneous associations, correcting content and logic errors, correcting data inconsistencies, splitting content, and supplementing missing data.
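A few of the cleaning steps (format unification, de-duplication and explicit marking of missing fields) are sketched below. The record layout, the 11-digit mainland mobile-number rule and the choice to mark missing fields as None rather than infer them are assumptions for illustration only.

```python
import re
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, Optional[str]]


def normalize_phone(raw: str) -> Optional[str]:
    """Unify a phone number to 11 digits, stripping separators and a +86 prefix."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 13 and digits.startswith("86"):
        digits = digits[2:]
    return digits if len(digits) == 11 else None  # None marks an invalid value


def clean_features(records: Iterable[Record],
                   key_fields: Tuple[str, ...] = ("name", "phone")) -> List[Record]:
    seen = set()
    cleaned: List[Record] = []
    for rec in records:
        # Unify formats: trim whitespace on every string field.
        rec = {k: v.strip() if isinstance(v, str) else v for k, v in rec.items()}
        if isinstance(rec.get("phone"), str):
            rec["phone"] = normalize_phone(rec["phone"])
        # Mark absent key fields explicitly (None) so later steps can supplement them.
        for field in key_fields:
            rec.setdefault(field, None)
        # Remove duplicates on the key fields, keeping the first occurrence.
        key = tuple(rec[field] for field in key_fields)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(rec)
    return cleaned
```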

In step S5, association, comparison and identification processing are performed on the cleaned data features; the data are classified and output according to data standards, and stored according to defined storage specifications and storage strategies. Specifically, association links internet site data, fixed-network data, mobile internet data and the like; comparison covers keywords, text, voice and images, binary data, structured data and the like; identification tags the data with business, region, language, spatial location, sensitivity level and the like. The processed data are classified by feature type and stored in databases such as a resource library, a subject library and a special-topic library.
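Classified storage by feature type can be sketched as routing each identified feature to a target library. The feature types, the type-to-library table and the default to the resource library are hypothetical; the text names the libraries but not the routing rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Feature = Dict[str, object]

# Hypothetical routing table: feature type -> target library.
LIBRARY_BY_TYPE: Dict[str, str] = {
    "face": "subject library",
    "fingerprint": "subject library",
    "case_keyword": "special-topic library",
}


def store_by_type(features: Iterable[Feature],
                  default: str = "resource library") -> Dict[str, List[Feature]]:
    """Group features into libraries according to their 'type' field."""
    libraries: Dict[str, List[Feature]] = defaultdict(list)
    for feat in features:
        libraries[LIBRARY_BY_TYPE.get(str(feat["type"]), default)].append(feat)
    return dict(libraries)
```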

In a preferred implementation, database integration is carried out on a database integration platform. The platform automatically processes and checks data according to strategies and rules and adopts a dynamic, configurable and extensible open architecture. It supports dynamic orchestration of the data preprocessing stage, establishes a unified coordination mechanism for data flow and control flow, and provides unified addressing management across the data life cycle to ensure data integrity and consistency. A data caching mechanism handles transient peaks in the data stream, and the platform can process structured and unstructured data such as multimedia, text and encrypted files. The platform processes structured, semi-structured and unstructured data according to data standards, data verification rules and data processing strategies. Taking into account the variety of data types, the large data volume, the multi-source heterogeneity of the data, the complexity of data formats and the need for online timeliness, it builds a data resource fusion system with comprehensive acquisition, network-wide convergence and all-round integration. Guided by intelligent applications and through automated, intelligent data processing, it improves data association and service relevance, raises data quality, mines the potential value of data resources, and builds a scientifically classified pool of mass data resources, laying a foundation for data organization and storage and effectively supporting the operational applications of each business department.

Specifically, the database integration platform supports real-time streaming data processing, offline data computation and distributed data management. Data storage supports relational databases, column-family databases, graph databases, text files, binary files, video formats, audio formats, picture formats, large objects, serialized data, XML, JSON, general machine learning models, statistical analysis models and the like. The platform supports real-time and offline labeling of processed data, label engineering and knowledge graph technology. It generates analysis reports on the quality of the processed data: the data are evaluated by a data quality evaluation model and graded, and the reports include a data consistency report, a data integrity report and a data credibility report. The platform records detailed operation logs of data processing at the operation, service and system levels, supporting auditing, system maintenance, tuning analysis, problem tracking and the like. It supports both automatic and manual handling of problem data, so that the causes of unqualified data can be analyzed and resolved, the quality of accessed data improved, and problem data re-recorded, repaired and reused. The platform also monitors the running state of the processed data and provides statistical analysis and quality monitoring of the data.

In step S7, determining at least two optimal service algorithms from the registered plurality of service algorithms according to the service task request, including:

S71, analyzing the service task request to obtain the setting parameters of the service task;

S72, calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters;

S73, verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.

In step S71, the setting parameters are obtained because most algorithms expose many parameters that control their behavior, and these must be set appropriately to get the best performance from the platform.

In step S72, some learning algorithms make particular assumptions about the structure of the data or the expected results; if an algorithm type that fits the need can be found, it may give more useful results, more accurate predictions or shorter training time. As shown in Table 2, when comparing and selecting classification, regression and clustering algorithms, the main features evaluated are accuracy, training time, linearity, number of main parameters and the like.

TABLE 2

The determination of the optimal service algorithms is illustrated below with a specific application scenario. For a face feature extraction task, the setting parameters of the task are extracted, including the loss function, learning rate, kernel function, smoothing curve function, class weighting, penalty coefficient, number of iterations and the like. A HoG algorithm, a Dlib algorithm and a convolutional neural network feature extraction algorithm are called with these setting parameters and run, each outputting a list of result parameters. The output result parameters are then verified, and at least two optimal service algorithms are determined according to the verification results.
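Steps S71 and S72 for this scenario could look like the sketch below: setting parameters are pulled out of the request, and at least three candidate algorithms of different types are run on them. The request layout, parameter key names and the Candidate wrapper are hypothetical; in the usage, simple lambdas stand in for real HoG, Dlib and CNN extractors.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence

# Hypothetical keys mirroring the setting parameters listed above.
SETTING_KEYS = ("loss_function", "learning_rate", "kernel_function",
                "smoothing_function", "class_weight", "penalty", "max_iterations")


@dataclass(frozen=True)
class Candidate:
    name: str
    kind: str                             # algorithm type
    run: Callable[[Dict[str, Any]], Any]  # setting parameters -> output result


def parse_setting_parameters(request: Dict[str, Any]) -> Dict[str, Any]:
    """S71: keep only the recognised setting parameters from the request."""
    params = request.get("parameters", {})
    return {key: params[key] for key in SETTING_KEYS if key in params}


def run_candidates(candidates: Sequence[Candidate],
                   params: Dict[str, Any]) -> Dict[str, Any]:
    """S72: run at least three algorithms of different types on the same parameters."""
    if len({c.kind for c in candidates}) < 3:
        raise ValueError("need at least three service algorithms of different types")
    return {c.name: c.run(params) for c in candidates}
```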

Referring to Fig. 2, in step S73, verifying the output result parameters and determining the optimal service algorithms according to the verification result includes:

S731, establishing N data sets from the N output result parameters respectively, where N is an integer greater than or equal to 3;

S732, in each round of verification, selecting one of the data sets as verification data and using the remaining data sets as training data; inputting the training data into the service algorithm for training, then inputting the verification data for verification and calculating the mean square error; and, after N rounds of verification, calculating the mean square error average of the service algorithm;

S733, performing the above verification on each service algorithm to obtain its mean square error average, and sorting the service algorithms in ascending order of mean square error average;

S734, determining a preset number of top-ranked service algorithms as the optimal service algorithms, where the preset number is greater than or equal to two.
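Steps S731 to S734 amount to N-fold cross-validation followed by a ranking. The Python sketch below illustrates that reading under stated assumptions: each candidate service algorithm is modelled by a hypothetical trainer that takes training inputs and targets and returns a prediction function; the prediction for each verification sample is compared with its true value to form the mean square error; and the N data sets are formed by interleaving samples. All function names and the three toy trainers are illustrative only.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: a trainer fits on (inputs, targets) and returns a predictor.
Predictor = Callable[[float], float]
Trainer = Callable[[Sequence[float], Sequence[float]], Predictor]


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean square error (1/n) * sum((y_i - prediction_i)^2)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validated_mse(train: Trainer, xs: Sequence[float], ys: Sequence[float], n: int = 3) -> float:
    """S731-S732: build n data sets, use each once as verification data, average the n MSEs."""
    if n < 3:
        raise ValueError("N must be at least 3")
    folds = [list(range(k, len(xs), n)) for k in range(n)]  # interleaved split into n data sets
    round_errors = []
    for k in range(n):
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        predict = train([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        val_idx = folds[k]
        round_errors.append(mse([ys[i] for i in val_idx], [predict(xs[i]) for i in val_idx]))
    return mean(round_errors)  # E = (1/N) * sum of the per-round MSEs


def select_optimal(trainers: Dict[str, Trainer], xs: Sequence[float], ys: Sequence[float],
                   n: int = 3, keep: int = 2) -> List[Tuple[str, float]]:
    """S733-S734: rank by ascending average MSE and keep the first `keep` (at least two)."""
    if keep < 2:
        raise ValueError("at least two optimal algorithms must be kept")
    scored = [(name, cross_validated_mse(t, xs, ys, n)) for name, t in trainers.items()]
    scored.sort(key=lambda item: item[1])
    return scored[:keep]


# Toy candidate algorithms (illustrative only).
def mean_model(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    m = mean(ys)
    return lambda x: m


def least_squares_line(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)


def zero_model(xs: Sequence[float], ys: Sequence[float]) -> Predictor:
    return lambda x: 0.0


if __name__ == "__main__":
    xs = list(range(9))
    ys = [2 * x + 1 for x in xs]
    print(select_optimal({"line": least_squares_line, "mean": mean_model, "zero": zero_model}, xs, ys))
```

With targets that lie exactly on a line, the least-squares trainer ranks first and the mean predictor second, so under this sketch those two would be kept as the optimal pair.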

In step S732, the mean square error is calculated by the following formula:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the $i$-th sample and $\hat{y}_i$ is the sample variance generated for each group:

$$\hat{y}_i = D(X) = \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)^2\right]$$

where $X$ represents a random variable.

The mean square error average is calculated according to the following formula:

$$E = \frac{1}{N}\sum_{i=1}^{N}\mathrm{MSE}_i$$

where $E$ is the mean square error average, $N$ is the number of verification rounds, and $\mathrm{MSE}_i$ is the mean square error obtained in the $i$-th round of verification.

The above method assesses how well an algorithm trained on a particular data set generalizes. By observing the differences in accuracy between rounds, the worst-case and best-case performance of the model on new data can be learned, so the obtained result is more stable and comprehensive.

In step S8, a service task is executed by the at least two optimal service algorithms.

This is further described below with reference to specific application scenarios.

In a specific application scenario, OCR (optical character recognition) uses three algorithms: CLS (a decision tree), DET (a target detection algorithm) and REC (a character recognition convolutional neural network). CLS is based on a method and device for training a decision tree model and determining data attributes in an OCR result, where training the decision tree model comprises the following steps: acquiring a sample picture and performing OCR on it to generate a first OCR recognition result, which is a two-dimensional string array in which each row holds data belonging to the same attribute row; extracting first feature information of each item of data in the first OCR recognition result; acquiring first labeling data corresponding to each item of data in the first OCR recognition result, the first labeling data indicating the attribute of each item; and training on the first feature information and the first labeling data to generate a decision tree model for determining data attributes in OCR recognition results.
This method labels data attributes in the recognition result automatically, which effectively reduces the cost of recognizing the picture and improves recognition efficiency. DET is a target detection algorithm: given an input picture, the model must mark the positions of all characters in the picture and their categories; visual features of the candidate regions are then extracted, and finally a classifier determines whether the pixels within each region form characters. REC recognizes the characters within each region, predicting them with the trained model; it is the core algorithm of the OCR function.

The algorithms can cooperate in three modes: concurrent, primary-secondary, and division of labor. In the concurrent and primary-secondary modes, the user can allocate a specific amount of resources to each algorithm, using one of three allocation modes: region allocation, point allocation, and random allocation.

In the concurrent cooperation mode, several different algorithms work on the same task at the same time; each algorithm processes resources from a different data source and returns its result, and the task then summarizes all results in a unified way. This makes full use of existing algorithms as well as newly added ones, and avoids the risks to analysis reliability and accuracy that come from relying on a single algorithm. The algorithms can also cooperate for verification: typically, the data analysis result produced by a first algorithm is handed to a second algorithm for secondary verification, which checks whether the two algorithms process the same picture differently; the reliability of the analysis result is judged by comparing the two results.
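The concurrent mode and the two-algorithm verification described above can be sketched in Python with the standard `concurrent.futures` thread pool. This is a minimal sketch, assuming that each algorithm is a plain function from a list of input values to a list of 0/1 labels; the data source names, the toy threshold algorithms and the counting summary are hypothetical stand-ins for real vendor algorithms and for the task's unified result summary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical algorithm interface: a list of input values in, a list of 0/1 labels out.
Algorithm = Callable[[Sequence[int]], List[int]]


def run_concurrently(assignments: Dict[str, Tuple[Algorithm, Sequence[int]]]) -> Dict[str, List[int]]:
    """Concurrent mode: every (algorithm, data source) pair runs at the same time."""
    with ThreadPoolExecutor(max_workers=max(1, len(assignments))) as pool:
        futures = {source: pool.submit(algo, data) for source, (algo, data) in assignments.items()}
        return {source: future.result() for source, future in futures.items()}


def summarize(results: Dict[str, List[int]]) -> Dict[str, int]:
    """Unified summary of the task: positive labels per data source plus an overall total."""
    summary = {source: sum(labels) for source, labels in results.items()}
    summary["total"] = sum(summary.values())
    return summary


def cross_verify(first: Algorithm, second: Algorithm, items: Sequence[int]) -> List[int]:
    """Verification mode: re-run the same items through a second algorithm and return
    the positions where the two results differ (candidates for an unreliable result)."""
    return [i for i, (a, b) in enumerate(zip(first(items), second(items))) if a != b]


# Toy algorithms standing in for two vendors' implementations.
def above_five(items: Sequence[int]) -> List[int]:
    return [int(v > 5) for v in items]


def at_least_five(items: Sequence[int]) -> List[int]:
    return [int(v >= 5) for v in items]
```

For example, `cross_verify(above_five, at_least_five, [3, 5, 7])` returns `[1]`, flagging the one item on which the two algorithms disagree.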

Region allocation means selecting data sources by administrative division at each level; point allocation means fuzzy retrieval by individual point location; random allocation means randomly and dynamically adjusting the performance of each algorithm process according to the load on the data processing.

In step S8, integrating the execution results of the service task includes:

performing weighted averaging or simple averaging on the execution results to obtain an integration result;

and performing classification voting or classification probability voting on the execution results to obtain an integration result.
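Classification voting and classification probability voting correspond to what are commonly called hard and soft voting. A minimal Python sketch of both, assuming each algorithm reports either a single label or a dictionary of class probabilities (these input formats are assumptions, not taken from the text):

```python
from collections import Counter
from typing import Dict, Sequence


def hard_vote(labels: Sequence[str]) -> str:
    """Classification voting: one vote per algorithm, the most frequent label wins.
    On a tie, Counter.most_common keeps the label that was seen first."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(probas: Sequence[Dict[str, float]]) -> str:
    """Classification probability voting: average each class's probability across
    algorithms and return the class with the highest average."""
    classes = probas[0].keys()
    averages = {c: sum(p[c] for p in probas) / len(probas) for c in classes}
    return max(averages, key=averages.get)
```

For instance, with one algorithm reporting {alpha: 0.9, beta: 0.1} and two reporting {alpha: 0.4, beta: 0.6}, hard voting returns beta (two votes to one) while probability voting returns alpha (average 0.57 against 0.43), which is why the two variants can give different integration results.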

Before integration, the execution results of the service tasks are received in one of two ways. In the first, a Kafka message service channel is built and opened to the algorithm vendors, who actively push their results to the channel; the system then perceives newly returned results in real time and can process them promptly. In the second, the vendor provides an output service interface or a database of algorithm results, and the system scans that interface or database by timed polling to determine whether new results have been produced.
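The timed-polling path can be sketched with the Python standard library alone. The sketch assumes that the vendor's interface or database can be queried for results whose identifier is greater than a given cursor; that query is represented by a hypothetical `fetch_since` callable supplied by the caller. The Kafka push path is not sketched here, since it depends on the chosen Kafka client library.

```python
import time
from typing import Callable, List, Tuple

Record = Tuple[int, str]  # (increasing result id, result payload)


class ResultPoller:
    """Timed-polling receiver that remembers the newest result id it has handled."""

    def __init__(self, fetch_since: Callable[[int], List[Record]], interval_s: float = 5.0) -> None:
        self.fetch_since = fetch_since  # hypothetical vendor query: results with id > cursor
        self.interval_s = interval_s
        self.cursor = 0

    def poll_once(self) -> List[Record]:
        """Fetch results newer than the cursor and advance the cursor past them."""
        new_records = sorted(self.fetch_since(self.cursor))
        if new_records:
            self.cursor = new_records[-1][0]
        return new_records

    def run(self, handle: Callable[[Record], None], rounds: int) -> None:
        """Poll `rounds` times, passing each new result to `handle`, sleeping in between."""
        for _ in range(rounds):
            for record in self.poll_once():
                handle(record)
            time.sleep(self.interval_s)
```

Keeping a cursor rather than re-reading everything means each result is handed on once, even though the vendor source is scanned repeatedly.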

Specifically, for regression problems, the predictions of the various algorithm models are combined by simple averaging; the averaged result reduces overfitting and gives a smoother boundary, avoiding the rough boundary of a single model. As an example, suppose the results of executing a certain classification task are as shown in Table 3.

TABLE 3

Algorithm A = [0, 1, 0, 1, 1, 0, 0];

Algorithm B = [0, 0, 1, 0, 1, 0, 0];

Algorithm C = [0, 1, 1, 1, 1, 0, 1];

In the results, 0 denotes class α and 1 denotes class β, and neither class ranks above the other. Each result set of algorithms A, B and C lists, from left to right, the predictions for the corresponding samples. The results are integrated as follows:

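Simple averaging:

$$P_A(\alpha) = \frac{4}{7},$$

$$P_A(\beta) = \frac{3}{7},$$

$$P_B(\alpha) = \frac{5}{7},$$

$$P_B(\beta) = \frac{2}{7},$$

$$P_C(\alpha) = \frac{2}{7},$$

$$P_C(\beta) = \frac{5}{7};$$

that is,

$$S_\alpha = \frac{P_A(\alpha) + P_B(\alpha) + P_C(\alpha)}{3} = \frac{11}{21},$$

$$S_\beta = \frac{P_A(\beta) + P_B(\beta) + P_C(\beta)}{3} = \frac{10}{21};$$

where S represents the integration result, α and β represent the categories, A, B and C represent the result sets of the different algorithms, and $P_A(\alpha)$, $P_B(\alpha)$ and $P_C(\alpha)$ represent the probability (share) of category α in the result set of each algorithm, with $P_A(\beta)$, $P_B(\beta)$ and $P_C(\beta)$ defined likewise for β.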
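The simple-averaging arithmetic can be checked with exact fractions. The sketch reads each algorithm's probability for a category as the share of that category in its result set, following the description of the per-algorithm category probabilities, and then averages those shares across algorithms A, B and C; the function names are illustrative.

```python
from fractions import Fraction
from typing import Dict, List

ALPHA, BETA = 0, 1  # label encoding used in Table 3

RESULTS: Dict[str, List[int]] = {
    "A": [0, 1, 0, 1, 1, 0, 0],
    "B": [0, 0, 1, 0, 1, 0, 0],
    "C": [0, 1, 1, 1, 1, 0, 1],
}


def class_share(labels: List[int], cls: int) -> Fraction:
    """Share of one category within a single algorithm's result set."""
    return Fraction(labels.count(cls), len(labels))


def simple_average(results: Dict[str, List[int]], cls: int) -> Fraction:
    """Unweighted mean of the per-algorithm shares of `cls`."""
    shares = [class_share(labels, cls) for labels in results.values()]
    return sum(shares, Fraction(0)) / len(shares)
```

Under this reading, `simple_average(RESULTS, ALPHA)` gives 11/21 and `simple_average(RESULTS, BETA)` gives 10/21.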

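Weighted average:

Accuracy of each algorithm:

$$\mathrm{Acc} = \frac{TP + TN}{TP + TN + FP + FN}$$

Calculating the algorithm weights:

$$w_k = \frac{\mathrm{Acc}_k}{\mathrm{Acc}_A + \mathrm{Acc}_B + \mathrm{Acc}_C},\quad k \in \{A, B, C\},$$

$$S_\alpha = \sum_{k \in \{A, B, C\}} w_k P_k(\alpha);$$

where $P_k(\alpha)$ is the probability (share) of category α in the result set of algorithm k.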

where TP denotes a positive determination that is correct, FP a positive determination that is wrong, TN a negative determination that is correct, and FN a negative determination that is wrong.
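A sketch of the weighted average in Python, assuming that each algorithm's weight is its accuracy divided by the sum of the accuracies of all participating algorithms, and that the weighted score of a category is the weighted sum of each algorithm's share of that category. The confusion-matrix counts are hypothetical, chosen only to give accuracies of 0.8, 0.6 and 0.7; the label lists are those of Table 3.

```python
from fractions import Fraction
from typing import Dict, List

RESULTS: Dict[str, List[int]] = {
    "A": [0, 1, 0, 1, 1, 0, 0],
    "B": [0, 0, 1, 0, 1, 0, 0],
    "C": [0, 1, 1, 1, 1, 0, 1],
}


def accuracy(tp: int, tn: int, fp: int, fn: int) -> Fraction:
    """Acc = (TP + TN) / (TP + TN + FP + FN)."""
    return Fraction(tp + tn, tp + tn + fp + fn)


def normalized_weights(accs: Dict[str, Fraction]) -> Dict[str, Fraction]:
    """w_k = Acc_k / sum of all accuracies, so the weights add up to 1."""
    total = sum(accs.values(), Fraction(0))
    return {name: acc / total for name, acc in accs.items()}


def weighted_score(results: Dict[str, List[int]], weights: Dict[str, Fraction], cls: int) -> Fraction:
    """S_cls = sum_k w_k * P_k(cls), with P_k the share of `cls` in algorithm k's results."""
    return sum(
        (weights[name] * Fraction(labels.count(cls), len(labels)) for name, labels in results.items()),
        Fraction(0),
    )


# Hypothetical confusion counts measured on a held-out set (not from the source).
ACCS = {"A": accuracy(40, 40, 10, 10), "B": accuracy(30, 30, 20, 20), "C": accuracy(35, 35, 15, 15)}
WEIGHTS = normalized_weights(ACCS)
```

With these assumed counts the weights are 8/21, 6/21 and 7/21, and the weighted scores become 76/147 for α (label 0) and 71/147 for β (label 1).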

Further, classification voting takes the output of each algorithm as input and uses KNN (the k-nearest neighbors algorithm) to convert the one-dimensional results into N samples in a two-dimensional feature space. The distances from the test sample to the other sample points are calculated and sorted, the K points with the smallest distances are selected, the categories of these K points are compared, and the test sample is assigned, by majority rule, to the category that accounts for the largest share of the K points.

That is, if most of the K most similar samples in the feature space belong to a certain class, the test sample is assigned to that class. The KNN algorithm is suitable for automatically classifying class domains with large sample sizes, and is therefore suitable for classification voting.

The classification part of the KNN algorithm is straightforward, but, as its name suggests, two aspects of it are not easy to determine: how to choose "K" and how to determine the nearest neighbors ("NN"). The guiding idea of KNN classification is that similar samples cluster together, which lets the model classify without depending on explicitly fitted parameters. Real samples have many dimensions, and the distribution of the sample points differs from one dimension to another. For example, if, for 4 dimensions, every ordered pair of dimensions (repetition allowed) is taken in turn as the X-axis and Y-axis coordinates of a plot, 4 × 4 = 16 plots are obtained.

It can be seen that, for the same samples, choosing different dimensions reveals more intricate, interlocking relationships between classes: the tendency of samples of the same class to "cluster" becomes less obvious, samples within a class spread over a wider range, and they are more likely to mix with samples of other classes. In KNN, classification is determined by distance. Each sample is plotted as a data point according to its value in each dimension, and only the distances between data points need to be measured. To classify a given point, that point is taken as the center of a circle and the neighboring points within the circle are found; only the points inside the circle vote on which class the point belongs to, rather than the entire sample set. Different algorithm models have different tunable parameters. In KNN, the number of selected points, i.e. the K in KNN, must be tuned to the actual situation to obtain a better fit; it can be set from working experience and is generally between 3 and 10.
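The neighbor vote can be illustrated with a plain KNN classifier in Python using Euclidean distance (`math.dist`, Python 3.8 or later). The text does not specify how the algorithms' one-dimensional outputs are mapped into the two-dimensional feature space, so the training points here are hypothetical and the function name is illustrative.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def knn_classify(train: Sequence[Tuple[Point, str]], query: Point, k: int = 3) -> str:
    """Sort labelled points by Euclidean distance to the query, keep the k closest,
    and return the label held by the majority of them."""
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the number of labelled points")
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labelled points in a two-dimensional feature space.
TRAIN = [((0.0, 0.0), "alpha"), ((0.0, 1.0), "alpha"), ((1.0, 0.0), "alpha"),
         ((5.0, 5.0), "beta"), ((5.0, 6.0), "beta"), ((6.0, 5.0), "beta")]
```

Only the k closest points take part in the vote, which mirrors the "circle around the point" description; `knn_classify(TRAIN, (0.5, 0.5))` returns `"alpha"`.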

In a specific application scenario, suppose there are three independent models, each with an accuracy of 70%. With majority-rule voting, the final accuracy is:

$$P = 0.7^3 + \binom{3}{2} \times 0.7^2 \times 0.3 = 0.343 + 0.441 = 0.784$$

That is, simple classification voting raises the accuracy from 70% to 78.4%, an improvement of about 8 percentage points. This is a simple probability problem: the more algorithm results take part in the vote, the better the result, provided that the algorithm models are independent of one another and their results are uncorrelated. The more similar the integrated algorithm models, the poorer the integration effect; the greater the difference (the lower the correlation) between the models, the better the integration result, and this property holds regardless of the integration method.

The method further comprises checking the operating condition of the algorithms by means of algorithm monitoring and data reconciliation. Algorithm monitoring provides monitoring of the processing status of each algorithm in use, including its current state and its processing flow, and raises a timely alarm for abnormal algorithms. For example, heartbeat monitoring is checked for each algorithm; each heartbeat record contains a serial number, the monitoring time, the response time (in milliseconds), and the monitoring result (no response, slow response, or normal response). The heartbeat data packets are simulated data generated from existing data; they are sent to the trained model, which predicts and returns results in real time.
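A sketch of how a heartbeat record with the fields listed above might be produced, in Python. The slow-response threshold of 500 ms is an assumed value (the text gives none), exceptions are treated as "no response", and the `probe`, `classify_heartbeat` and `needs_alarm` names are hypothetical.

```python
import time
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Callable, Optional

SLOW_THRESHOLD_MS = 500.0  # assumed threshold; the text does not specify one


@dataclass
class HeartbeatRecord:
    serial_no: int
    monitored_at: datetime
    response_ms: Optional[float]  # None when the model gave no response
    result: str  # "no response", "slow response" or "normal"


def probe(predict: Callable[[Any], Any], simulated_sample: Any) -> Optional[float]:
    """Send one simulated sample to the model; return the round-trip time in ms,
    or None if the call raised (treated here as no response)."""
    start = time.perf_counter()
    try:
        predict(simulated_sample)
    except Exception:
        return None
    return (time.perf_counter() - start) * 1000.0


def classify_heartbeat(serial_no: int, response_ms: Optional[float],
                       slow_ms: float = SLOW_THRESHOLD_MS) -> HeartbeatRecord:
    """Turn one probe measurement into a heartbeat monitoring record."""
    if response_ms is None:
        result = "no response"
    elif response_ms > slow_ms:
        result = "slow response"
    else:
        result = "normal"
    return HeartbeatRecord(serial_no, datetime.now(), response_ms, result)


def needs_alarm(record: HeartbeatRecord) -> bool:
    """Any result other than a normal response triggers an alarm."""
    return record.result != "normal"
```

A monitoring loop would call `probe` on each algorithm at a fixed interval and pass the measurement to `classify_heartbeat`.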

Data reconciliation is the process of checking and verifying the number of records, the data size and the data fingerprint during data exchange between a data provider and a data accessing party. It includes reconciling, from the running logs of each algorithm, the amount of data accessed by the algorithm against the amount of data it analyzes and outputs, and checking whether the algorithm has omitted any data. After reconciliation is completed, the result must be confirmed, and a log must be recorded whenever the reconciliation is abnormal. According to the scenario, data reconciliation is divided into data access reconciliation and data distribution reconciliation. The reconciliation record contains: serial number, reconciliation time, accessed data type, accessed data volume, output data volume, and interfacing result.
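Count, size and fingerprint checks can be sketched with Python's `hashlib`. This minimal sketch assumes each exchanged record is available as bytes on both sides and uses an order-independent SHA-256 fingerprint; the function names and the logger name are illustrative, not part of the described system.

```python
import hashlib
import logging
from dataclasses import dataclass
from typing import Iterable, Tuple

logger = logging.getLogger("reconciliation")

Summary = Tuple[int, int, str]  # (record count, total bytes, fingerprint)


def summarize_batch(records: Iterable[bytes]) -> Summary:
    """Count records, total their sizes and build an order-independent SHA-256 fingerprint."""
    count = size = 0
    digest = hashlib.sha256()
    for record in sorted(records):  # sorting makes the fingerprint independent of arrival order
        count += 1
        size += len(record)
        digest.update(hashlib.sha256(record).digest())  # hash per record avoids boundary ambiguity
    return count, size, digest.hexdigest()


@dataclass
class ReconciliationResult:
    count_match: bool
    size_match: bool
    fingerprint_match: bool

    @property
    def ok(self) -> bool:
        return self.count_match and self.size_match and self.fingerprint_match


def reconcile(provided: Iterable[bytes], received: Iterable[bytes]) -> ReconciliationResult:
    """Compare the provider's batch with what the accessing side actually received."""
    (c1, s1, f1), (c2, s2, f2) = summarize_batch(provided), summarize_batch(received)
    result = ReconciliationResult(c1 == c2, s1 == s2, f1 == f2)
    if not result.ok:
        # A log entry is recorded whenever reconciliation is abnormal.
        logger.warning("reconciliation mismatch: provided=%s received=%s", (c1, s1), (c2, s2))
    return result
```

The count and size checks catch omitted data cheaply, while the fingerprint also catches records that were altered without changing their length.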

Referring to fig. 3, in some embodiments, there is provided a multi-algorithm integration apparatus applied to the above method, including:

the registration module 101 is configured to collect registration information of a plurality of service algorithms and perform registration;

the interface integration module 102 is used for integrating interfaces of a plurality of registered service algorithms;

a database calling module 103, configured to call databases of multiple registered service algorithms, and receive data of each database;

a database feature extraction module 104, configured to perform feature extraction on data of each database;

the database integration module 105 is used for performing association, comparison and identification processing on the extracted features and performing classified storage according to feature types;

a receiving module 106, configured to receive a service task request;

a determining module 107, configured to determine at least two optimal service algorithms from the registered multiple service algorithms according to the service task request;

and the result integration module 108 is configured to execute the service task through the at least two optimal service algorithms and integrate an execution result of executing the service task.

Specifically, the interface integration module 102 is further configured to unify the interface parameters and return result formats of the service algorithms to form a universal interface.
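One common way to realize such a universal interface is an adapter per vendor that maps each native response onto one shared result type. A minimal Python sketch, in which the `UnifiedResult` fields and the two vendor response layouts are hypothetical examples:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class UnifiedResult:
    algorithm: str
    label: str
    score: float  # normalized to the range 0..1


# Each adapter maps one vendor's native response onto the common format.
Adapter = Callable[[Dict[str, Any]], UnifiedResult]

ADAPTERS: Dict[str, Adapter] = {
    # Hypothetical vendor X returns {"cls": ..., "conf": 0..1}
    "vendor_x": lambda r: UnifiedResult("vendor_x", r["cls"], float(r["conf"])),
    # Hypothetical vendor Y returns {"result": {"name": ..., "percent": 0..100}}
    "vendor_y": lambda r: UnifiedResult("vendor_y", r["result"]["name"], r["result"]["percent"] / 100.0),
}


def unify(vendor: str, raw: Dict[str, Any]) -> UnifiedResult:
    """Convert a vendor-specific response into the universal result format."""
    try:
        adapter = ADAPTERS[vendor]
    except KeyError:
        raise ValueError(f"no adapter registered for {vendor!r}") from None
    return adapter(raw)
```

Registering a new service algorithm then only requires adding one adapter entry; callers keep working with `UnifiedResult`.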

In some embodiments, the determining module 107 is further configured to parse the service task request to obtain a setting parameter related to the service task; calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters; and verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.

In some embodiments, the determining module 107 is further configured to: establish N data sets from the N output result parameters respectively, where N is an integer greater than or equal to 3; in each round of verification, select one of the data sets as verification data and use the remaining data sets as training data, input the training data into a service algorithm for training, then input the verification data for verification and calculate the mean square error, and after N rounds of verification calculate the mean square error average of the service algorithm; perform this verification on each service algorithm to obtain its mean square error average, and sort the service algorithms in ascending order of mean square error average; and determine a preset number of top-ranked service algorithms as the optimal service algorithms, where the preset number is greater than or equal to two.

In some embodiments, the result integration module 108 is further configured to perform weighted average or simple average on the execution results to obtain integration results, and perform classification voting on the execution results to obtain integration results.

Referring to fig. 4, in some embodiments, an electronic device is provided, which includes a processor 1 and a storage 2, where the storage 2 stores a plurality of instructions, and the processor 1 is configured to read the plurality of instructions and execute the method described above.

The multi-algorithm integration method and the multi-algorithm integration device provided by the embodiment integrate the interfaces of a plurality of service algorithms and the related databases to form external interfaces with uniform formats and databases which are more convenient to call and compare, improve the running speed of the plurality of algorithms, and thus improve the execution efficiency of service tasks; the optimal algorithms are matched from the multiple algorithms, the obtained result has better generalization capability and is more stable and comprehensive, so that the error of algorithm execution is minimized, and the accuracy of task execution is improved; the algorithm results are integrated through a voting method and an averaging method, and then training is performed through a parallel or serial mode, so that a service task execution mode with higher accuracy is obtained, and the performance of the multi-algorithm model is further optimized.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A method for multi-algorithm integration, comprising:

integrating interfaces of a plurality of registered service algorithms;

extracting the characteristics of the data of each database;

receiving a service task request;

2. The method of claim 1, wherein the interface integration of the registered plurality of service algorithms comprises:

3. The method of claim 1, wherein the types of service algorithms include classification algorithms, regression algorithms, and clustering algorithms.

4. The method of claim 1, wherein determining at least two optimal service algorithms from the registered plurality of service algorithms based on the service task request comprises:

5. The method of claim 4, wherein the verifying the output result parameters and determining at least two optimal service algorithms according to the verification result comprises:

6. The method of claim 5, wherein the mean square error is calculated by the following equation:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the $i$-th sample and $\hat{y}_i$ is the sample variance generated for each group;

the mean square error average is calculated according to the following formula:

$$E = \frac{1}{N}\sum_{i=1}^{N}\mathrm{MSE}_i$$

where $E$ is the mean square error average, $N$ is the number of verification rounds, and $\mathrm{MSE}_i$ is the mean square error obtained in the $i$-th round of verification.

7. The method of claim 1 or 5, wherein integrating results of executing the service task comprises:

8. The method of claim 1, wherein integrating results of executing service tasks comprises:

9. A multi-algorithm integration apparatus for use in the method of any one of claims 1-8, comprising:

the receiving module is used for receiving a service task request;

10. An electronic device comprising a processor and a storage device, the storage device storing a plurality of instructions, the processor being configured to read the plurality of instructions and to perform the method according to any one of claims 1-8.