CN114625901A - Multi-algorithm integration method and device - Google Patents

Publication number
CN114625901A
Authority
CN
China
Prior art keywords
service
algorithms
data
algorithm
result
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210519444.0A
Other languages
Chinese (zh)
Other versions
CN114625901B (en)
Inventor
王冲
唐建松
张晟辉
张犇
朱云
何远峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Dimension Software Co ltd
Original Assignee
Nanjing Dimension Software Co ltd
Application filed by Nanjing Dimension Software Co ltd
Priority to CN202210519444.0A
Publication of CN114625901A
Application granted
Publication of CN114625901B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/55 Clustering; Classification
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/5866 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using information manually generated, e.g. tags, keywords, comments, manually generated location and time information

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a multi-algorithm integration method and device. The method comprises: collecting registration information of a plurality of service algorithms and registering the service algorithms; integrating the interfaces of the registered service algorithms; calling the databases of the registered service algorithms and receiving data from each database; extracting features from the data of each database; performing association, comparison and identification processing on the extracted features, and storing the features classified by feature type; receiving a service task request; determining at least two optimal service algorithms from the registered service algorithms according to the service task request; and executing the service task through the at least two optimal service algorithms and integrating the execution results. By integrating and comparing multiple algorithms and optimally matching service task requests to algorithms, the method achieves efficient integration of multiple algorithms, speeds up algorithm calls, and improves the efficiency and accuracy of service task execution.

Description

Multi-algorithm integration method and device
Technical Field
The invention relates to the technical field of computers, in particular to a multi-algorithm integration method and device.
Background
With the continuous development of image processing technology, video image information systems are increasingly used in inspections by public security, customs and other agencies. In computer-based deployment and control and in personnel merging, such a system is required to provide multiple services such as image analysis, image deployment and control, and image retrieval, so service algorithms provided by multiple manufacturers must be adopted. Because the technical implementations of different manufacturers differ, and the ways their algorithms are used differ as well, the multi-algorithm design of a video image information system needs a general way of integrating multiple algorithms so that inspection work can proceed smoothly.
Various algorithm integration schemes already exist. For example, patent document CN112988384A discloses a scene-based method for automatically integrating and calling algorithm resources. It pre-calibrates algorithm capability, encapsulates the algorithm models of the input algorithm resources in a uniform format as callable interfaces, deploys and opens the received algorithm interfaces, and verifies their operation mechanism, so as to match an optimal algorithm interface for the user.
That scheme distributes tasks and calls the various algorithm resources in an integrated way, so that the most suitable algorithm interface is evaluated and recommended automatically, the interface formats of similar algorithms are unified, and the user spends less time selecting an algorithm interface. However, it selects the algorithm that best matches the task based on algorithm capability alone, so its matching accuracy is poor and its efficiency is low.
Disclosure of Invention
The invention provides a multi-algorithm integration method and device. By integrating and comparing multiple algorithms and optimally matching service task requests to algorithms, it achieves efficient integration of multiple algorithms, speeds up algorithm calls, and improves the efficiency and accuracy of service task execution.
A multi-algorithm integration method, comprising:
collecting registration information of a plurality of service algorithms and registering the service algorithms;
integrating the interfaces of the registered service algorithms;
calling the databases of the registered service algorithms and receiving data from each database;
extracting features from the data of each database;
performing association, comparison and identification processing on the extracted features, and storing the features classified by feature type;
receiving a service task request;
determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
and executing the service task through the at least two optimal service algorithms, and integrating the execution results of the service task.
Further, performing interface integration on the registered service algorithms comprises:
uniformly assigning the interface parameters and return result formats of all the service algorithms to form a universal interface.
Further, the service algorithm includes a classification algorithm, a regression algorithm and a clustering algorithm.
Further, determining at least two optimal service algorithms from the registered plurality of service algorithms according to the service task request, comprising:
analyzing the service task request to obtain a setting parameter related to the service task;
calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters;
and verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.
Further, verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result, including:
establishing N data sets from the N output result parameters, wherein N is an integer greater than or equal to 3;
in each round of verification, selecting one of the data sets as verification data and using the remaining data sets as training data, inputting the training data into a service algorithm for training, then inputting the verification data for verification and calculating the mean square error, and, after N rounds of verification, calculating the average of the mean square errors for that service algorithm;
performing the verification on each service algorithm to obtain its mean square error average, and sorting the service algorithms in ascending order of mean square error average;
and selecting a preset number of the top-ranked service algorithms as the optimal service algorithms, wherein the preset number is greater than or equal to two.
Further, the mean square error is calculated by the following formula:
[Formula image not reproduced in this text version.]
wherein M is the total number of result data output by training and verification data, r is the number of groups of the result data output by training and the verification data, M - r is the degrees of freedom, and the two remaining symbols in the formula image denote the i-th sample and the sample variance of each group, respectively.
The mean square error average is calculated according to the following formula:
$$E = \frac{1}{N}\sum_{i=1}^{N} \mathrm{MSE}_i$$
wherein E is the mean square error average, N is the number of verification rounds, and $\mathrm{MSE}_i$ is the mean square error obtained in the i-th round of verification.
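The N-round verification and ranking described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the names `average_mse`, `select_optimal` and the three toy "service algorithms" (`train_mean`, `train_line`, `train_zero`) are hypothetical, each data set is assumed to be a list of (input, expected output) pairs, and the per-round error is the ordinary mean of squared differences rather than the degrees-of-freedom form given in the formula image.

```python
from statistics import fmean
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[float, float]                 # (input, expected output); assumed shape
Model = Callable[[float], float]
Trainer = Callable[[Sequence[Sample]], Model]


def mean_squared_error(model: Model, data: Sequence[Sample]) -> float:
    """Mean of squared differences between predictions and expected outputs."""
    return fmean((model(x) - y) ** 2 for x, y in data)


def average_mse(trainer: Trainer, folds: Sequence[Sequence[Sample]]) -> float:
    """Run N rounds; each round holds out one data set for verification."""
    if len(folds) < 3:
        raise ValueError("N must be an integer greater than or equal to 3")
    errors = []
    for i, validation in enumerate(folds):
        training = [s for j, fold in enumerate(folds) if j != i for s in fold]
        model = trainer(training)
        errors.append(mean_squared_error(model, validation))
    return fmean(errors)                     # E = (1/N) * sum of per-round errors


def select_optimal(trainers: Dict[str, Trainer],
                   folds: Sequence[Sequence[Sample]],
                   preset_number: int = 2) -> List[str]:
    """Rank algorithms by ascending average error and keep the top ones."""
    if preset_number < 2:
        raise ValueError("the preset number must be at least two")
    scores = {name: average_mse(t, folds) for name, t in trainers.items()}
    return sorted(scores, key=scores.get)[:preset_number]


# Three toy "service algorithms" (hypothetical stand-ins for real ones).
def train_mean(data: Sequence[Sample]) -> Model:
    mean_y = fmean(y for _, y in data)
    return lambda x: mean_y


def train_line(data: Sequence[Sample]) -> Model:
    mx, my = fmean(x for x, _ in data), fmean(y for _, y in data)
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sum((x - mx) * (y - my) for x, y in data) / sxx
    return lambda x: my + slope * (x - mx)


def train_zero(data: Sequence[Sample]) -> Model:
    return lambda x: 0.0


points = [(float(x), 2.0 * x + 1.0) for x in range(9)]
folds = [points[0:3], points[3:6], points[6:9]]
trainers = {"mean": train_mean, "line": train_line, "zero": train_zero}
best = select_optimal(trainers, folds, preset_number=2)
```

In this toy setup the straight-line fit has the smallest average error, so it ranks first, and the two best-ranked algorithms are the ones kept for execution.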
Further, integrating the execution results of the service task comprises:
taking a weighted average or a simple average of the execution results to obtain an integrated result.
Further, integrating the execution results of the service task comprises:
performing classification voting on the execution results to obtain an integrated result.
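The two integration options above (averaging for numeric results, voting for class labels) might be sketched as follows. The function names `average_results` and `vote` are hypothetical, and the tie-breaking rule in `vote` (the label seen first wins) is an assumption, since the text does not specify one.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def average_results(values: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Simple average when no weights are given, weighted average otherwise."""
    if not values:
        raise ValueError("at least one execution result is required")
    if weights is None:
        return sum(values) / len(values)
    if len(weights) != len(values):
        raise ValueError("one weight per execution result is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total


def vote(labels: Sequence[Hashable]) -> Hashable:
    """Majority vote over class labels; ties go to the label seen first."""
    if not labels:
        raise ValueError("at least one execution result is required")
    return Counter(labels).most_common(1)[0][0]


simple = average_results([1.0, 3.0])
weighted = average_results([1.0, 3.0], weights=[3.0, 1.0])
winner = vote(["match", "match", "no_match"])
```

Weights could, for example, be derived from each algorithm's verification error, but that choice is not stated in the text and is left open here.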
A multi-algorithm integration device applying the above method comprises:
the registration module is used for acquiring registration information of a plurality of service algorithms and registering;
the interface integration module is used for integrating the interfaces of the registered service algorithms;
the database calling module is used for calling the registered databases of the plurality of service algorithms and receiving the data of each database;
the database feature extraction module is used for extracting features of data of each database;
the database integration module is used for performing association, comparison and identification processing on the extracted features and performing classified storage according to feature types;
the receiving module is used for receiving a service task request;
the determining module is used for determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
and the result integration module is used for executing the service tasks through the at least two optimal service algorithms and integrating the execution results of the executed service tasks.
An electronic device comprises a processor and a storage device, wherein the storage device stores a plurality of instructions, and the processor is used for reading the instructions and executing the method.
The multi-algorithm integration method and the device provided by the invention at least have the following beneficial effects:
(1) The interfaces of a plurality of service algorithms and their related databases are integrated into external interfaces with a uniform format and into databases that are easier to call and compare, which speeds up the algorithms and improves the execution efficiency of service tasks.
(2) Optimal algorithms can be matched to a service task request from among various algorithms. The results obtained generalize better and are more stable and comprehensive, the error of algorithm execution is minimized, and the accuracy of task execution is improved.
(3) The algorithm results are integrated by voting and averaging, which improves the accuracy of the service execution result.
Drawings
Fig. 1 is a flowchart of an embodiment of a multi-algorithm integration method provided in the present invention.
Fig. 2 is a flowchart of an embodiment of a method for verifying an output result parameter in the method according to the present invention.
Fig. 3 is a schematic structural diagram of an embodiment of a multi-algorithm integration apparatus provided in the present invention.
Fig. 4 is a schematic structural diagram of an embodiment of an electronic device provided in the present invention.
Reference numerals: 1-a processor, 101-a registration module, 102-an interface integration module, 103-a database calling module, 104-a database feature extraction module, 105-a database integration module, 106-a receiving module, 107-a determination module, 108-a result integration module and 2-a storage device.
Detailed Description
In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.
Referring to fig. 1, in some embodiments, there is provided a multi-algorithm integration method comprising:
S1, collecting and registering registration information of a plurality of service algorithms;
S2, integrating interfaces of a plurality of registered service algorithms;
S3, calling the databases of the registered service algorithms and receiving the data of each database;
S4, extracting the characteristics of the data of each database;
S5, performing association, comparison and identification processing on the extracted features, and performing classified storage according to feature types;
S6, receiving a service task request;
S7, determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
S8, executing the service tasks through the at least two optimal service algorithms, and integrating the execution results of the executed service tasks.
Specifically, in some embodiments, for example in the field of deployment and control, the service algorithms collected in step S1 give the algorithm platform the following service capabilities: an image structured analysis service, an image deployment and control service, an image retrieval service, an image scheduling service and an image autonomous comparison service.
The image structured analysis service refers to online image stream analysis capability, that is, the capability to perform structured analysis on image streams imported in real time. It specifically includes analysis of human faces, human bodies and accessories, producing face track structured data that meets the specification.
The image deployment and control service uses image similarity comparison to provide real-time, image-based deployment and control comparison. It also supports online comparison and control for images of faces, human bodies and the like. Deployment and control object libraries of different scopes are built, and the features of person objects extracted from the online image stream are compared with the features in a specified deployment and control object library (such as an escaped-person library or a key-object library) to provide deployment and control early warning.
The image retrieval service uses image feature extraction and calculation to provide retrieval based on image similarity comparison, returning results according to the set image similarity and other retrieval conditions (time, space, element attributes and the like).
The image scheduling service builds retrieval capability for various images (including local small images and overall background images) on top of the image storage environments handled by the video image analysis platform. It includes image retrieval by identifiers such as access address and image ID, and supports batch query and retrieval of images by time range and point location.
The image autonomous comparison service compares the similarity of a specified image or image set based on image similarity comparison capability and returns results that satisfy the similarity requirement. It supports multiple comparison modes such as 1:1, 1:q and q:q.
The 1:1 mode compares the similarity of two specified images to determine whether they point to the same object. For example, an ID photo is compared with an on-site snapshot to verify the identity of the person being checked. The 1:q mode compares a specified image with a specific image library to find images in the library that reach the specified similarity. For example, a photo of the person being checked is compared with the key-object library to determine whether that person is a key object. The q:q mode compares two image sets to find coinciding objects with close similarity, for the purpose of data intersection analysis. For example, the sets of photos of suspects appearing in two different cases are compared for case-linkage analysis.
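The three comparison modes can be sketched in Python on top of a single similarity function. This is an illustration only: the text does not name a similarity measure, so cosine similarity over feature vectors is assumed here, and the function names, the dictionary-based image library and the threshold value are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def compare_one_to_one(a: Vector, b: Vector, threshold: float) -> bool:
    """1:1 mode: do two images point to the same object?"""
    return cosine_similarity(a, b) >= threshold


def compare_one_to_many(probe: Vector, library: Dict[str, Vector],
                        threshold: float) -> List[str]:
    """1:q mode: library entries that reach the specified similarity."""
    return [image_id for image_id, features in library.items()
            if cosine_similarity(probe, features) >= threshold]


def compare_many_to_many(set_a: Dict[str, Vector], set_b: Dict[str, Vector],
                         threshold: float) -> List[Tuple[str, str]]:
    """q:q mode: pairs across two image sets that coincide."""
    return [(id_a, id_b)
            for id_a, feat_a in set_a.items()
            for id_b, feat_b in set_b.items()
            if cosine_similarity(feat_a, feat_b) >= threshold]


library = {"p1": [1.0, 0.0], "p2": [0.0, 1.0]}
hits = compare_one_to_many([1.0, 0.0], library, threshold=0.9)
pairs = compare_many_to_many({"a": [0.0, 2.0]}, library, threshold=0.9)
```

The q:q sketch compares every pair, which is quadratic in the set sizes; a production system would more likely use an index over the feature vectors.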
The registration information of the plurality of service algorithms comprises detailed information of each algorithm, corresponding use modes, technical specifications and other information, and specifically comprises algorithm basic information, algorithm monitoring analysis, algorithm heartbeat monitoring records, algorithm use log records and algorithm data reconciliation.
The basic algorithm information comprises: source algorithm protocol (e.g., webservice), return result format (e.g., xml), execution mode (e.g., post), source interface address, algorithm authorization password information, algorithm providing unit, algorithm provider contact, source algorithm usage description (e.g., algorithm using API document), monitoring status (e.g., monitoring once every M minutes, no return result beyond K seconds is considered as no response exception), affiliation service.
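As an illustration, the basic algorithm information listed above could be held in a small registration record and registry. This is a sketch under assumptions: the class names, field names and the choice of the algorithm name as the registry key are hypothetical, and the example values are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlgorithmRegistration:
    """Basic information for one registered service algorithm."""
    name: str                       # hypothetical registry key
    source_protocol: str            # e.g. "webservice"
    result_format: str              # e.g. "xml"
    execution_mode: str             # e.g. "post"
    source_interface_address: str
    authorization_secret: str       # algorithm authorization password information
    provider_unit: str
    provider_contact: str
    usage_description: str          # e.g. a pointer to the API document
    monitor_interval_minutes: int   # "M" in the text
    no_response_after_seconds: int  # "K" in the text
    service: str                    # the service the algorithm belongs to


class AlgorithmRegistry:
    """In-memory registry of service algorithms, keyed by name."""

    def __init__(self) -> None:
        self._items: Dict[str, AlgorithmRegistration] = {}

    def register(self, info: AlgorithmRegistration) -> None:
        if info.name in self._items:
            raise ValueError(f"algorithm {info.name!r} is already registered")
        self._items[info.name] = info

    def by_service(self, service: str) -> List[AlgorithmRegistration]:
        return [i for i in self._items.values() if i.service == service]


registry = AlgorithmRegistry()
registry.register(AlgorithmRegistration(
    name="vendor-a-face-search", source_protocol="webservice",
    result_format="xml", execution_mode="post",
    source_interface_address="http://example.invalid/search",
    authorization_secret="placeholder", provider_unit="Vendor A",
    provider_contact="placeholder", usage_description="API document",
    monitor_interval_minutes=5, no_response_after_seconds=10,
    service="image retrieval"))
found = registry.by_service("image retrieval")
```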
The algorithm monitoring analysis is specifically the display of the daily running condition of the algorithm. As a better implementation mode, a monitoring analysis condition display report is provided and consists of a two-dimensional coordinate system, wherein the x axis displays the date of the last 30 days, and the y axis displays the normal or abnormal condition of the current day; the upper right corner of the report provides two choices of time or times for the y axis for switching, and the time is displayed in a default state; the y-axis selects two columns, one showing time to failure (in hours) and the other showing time to normality (in hours).
The algorithm heartbeat monitoring record includes query conditions, list display and list content. The query conditions include the monitoring time, selected as a range, and the monitoring result. The list display includes a sequence number, the monitoring time, the response speed (in milliseconds) and the monitoring result, where the monitoring result is one of three types: no response, slow response and normal response.
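A heartbeat result could be classified into the three monitoring results with a helper like the one below. The function name is hypothetical; the no-response limit corresponds to the K seconds mentioned for the monitoring status, while the slow-response threshold is an assumption because the text does not define one.

```python
from typing import Optional


def classify_heartbeat(response_ms: Optional[float],
                       no_response_after_s: float,
                       slow_after_ms: float) -> str:
    """Map a measured response time to one of the three monitoring results.

    response_ms is None when the probe received no reply at all.
    slow_after_ms is an assumed threshold; the text does not specify it.
    """
    if response_ms is None or response_ms > no_response_after_s * 1000:
        return "no response"
    if response_ms > slow_after_ms:
        return "slow response"
    return "normal response"
```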
The algorithm usage log record includes query conditions, list display and list content. The query conditions include the use time and the result return time, both selected as ranges, the using unit and the using IP. The list display includes a sequence number, the use time, the return speed, the using IP, the using unit and the request parameter details.
The algorithm data reconciliation includes query conditions, list display and list content. The query conditions include the reconciliation time, selected as a range, and the reconciliation result, where the reconciliation result is one of three types: normal data, lost data and abnormal increase. The list display includes a sequence number, the reconciliation time, the access data type, the access data volume, the output data volume and the docking result.
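One plausible reading of the three reconciliation results is a comparison of the access data volume with the output data volume. The sketch below encodes that reading; the mapping itself and the function name `reconcile` are assumptions, since the text lists the result types but not the rule that produces them.

```python
def reconcile(access_volume: int, output_volume: int) -> str:
    """Compare data received with data output (assumed reconciliation rule)."""
    if access_volume < 0 or output_volume < 0:
        raise ValueError("data volumes cannot be negative")
    if output_volume == access_volume:
        return "normal data"
    if output_volume < access_volume:
        return "lost data"
    return "abnormal increase"
```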
In step S2, the interface integration of the registered service algorithms comprises:
uniformly assigning the interface parameters and return result formats of all the service algorithms to form a universal interface.
As a preferred implementation, based on the unified GA/T 1400 service interface specification, the usage parameters and return result formats of the algorithm interfaces of different manufacturers are integrated into a unified assignment standard and a universal interface. A general REST framework is supported as the implementation of the scheduling interface, and two interface checking mechanisms are provided to ensure the timeliness and accuracy of the service. The first is timeout control: to avoid degrading the user experience when a scheduled service interface gives no feedback for a long time, scheduling of any interface that does not return a result within 5 seconds is terminated automatically. The second is repeated requests: to avoid response failures caused by factors such as network instability, a service interface whose response fails is scheduled again, and the repeated scheduling raises the interface response rate.
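The two checking mechanisms (timeout control and repeated requests) can be combined in a small wrapper, sketched below with the Python standard library. The function name `call_with_timeout_and_retry` and the default of three attempts are assumptions; only the 5-second limit comes from the text. Note that a Python thread cannot be forcibly stopped, so in this sketch a timed-out call is abandoned rather than killed.

```python
import concurrent.futures as cf
from typing import Any, Callable, Optional


def call_with_timeout_and_retry(func: Callable[..., Any], *args: Any,
                                timeout_s: float = 5.0,
                                max_attempts: int = 3) -> Any:
    """Call func, giving up on each attempt after timeout_s seconds and
    retrying failed or timed-out attempts up to max_attempts times."""
    last_error: Optional[BaseException] = None
    for _ in range(max_attempts):
        pool = cf.ThreadPoolExecutor(max_workers=1)
        future = pool.submit(func, *args)
        try:
            return future.result(timeout=timeout_s)
        except cf.TimeoutError as exc:
            # Timeout control: stop waiting for this attempt.
            last_error = exc
        except Exception as exc:
            # Repeated request: the call itself failed, so try again.
            last_error = exc
        finally:
            # Do not block on a worker that may still be running.
            pool.shutdown(wait=False)
    raise RuntimeError(f"no result after {max_attempts} attempts") from last_error


# Usage: an interface that fails twice before answering (simulated).
attempts = {"count": 0}


def flaky_interface() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("simulated network instability")
    return "ok"


result = call_with_timeout_and_retry(flaky_interface, timeout_s=1.0)
```

For real HTTP interfaces a client-side request timeout would normally be set as well, so that abandoned calls do not keep a connection open.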
For image analysis interfaces that are used online and must process real-time data or respond to requests in real time, such as face retrieval and face similarity comparison, the algorithm API interfaces of different manufacturers are converted to a standard form, designed on the basis of the relevant view library specifications, to form a universal interface. Specifically, the RESTful interface specification is used as the usage mode of the universal interface; the various HTTP interfaces and SDK-based development kits of different manufacturers can be converted to this standard form, and the resulting universal interface can be applied to various service scenarios.
The relevant specification interfaces include public interfaces, collection interfaces and service interfaces. The public interfaces comprise four interfaces: registration, keep-alive, deregistration and time synchronization. The collection interfaces comprise interfaces for uploading video clips, images, files, human faces, motor vehicles, non-motor vehicles, articles, scenes and the like. The service interfaces comprise interfaces for query and maintenance of video clips, images, files, faces, motor vehicles, non-motor vehicles, articles, scenes, deployment and control tasks, alarm information, subscription records, notification records and the like.
The relevant service specification is as follows: the Content-Type header field of interface messages should be set to application/*+JSON. For the GET method, the returned result item is the result object when the query succeeds (that is, the HTTP response status code is 2XX). When the query fails (that is, the HTTP response status code is not 2XX), the returned result object is ResponseStatus. In one specific application scenario, the specified service interface is shown in Table 1.
TABLE 1
[Table 1 is provided as an image in the original publication and is not reproduced in this text.]
The definitions of Register and ResponseStatus should conform to the GA/T 1400.3 protocol. In ResponseStatus, ID is the DeviceID requesting registration, StatusCode is the operation response code for this registration, StatusString is the operation response description for this registration, and LocalTime is the system time of the registered party, which may be used for time synchronization.
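The response rule above (a result object for 2XX, a ResponseStatus otherwise) could be handled on the client side roughly as follows. This is a sketch under assumptions: the flat JSON layout, the key spellings `ID`, `StatusCode`, `StatusString` and `LocalTime`, and the sample values are taken from the field names in the text for illustration and are not checked against the GA/T 1400 documents.

```python
import json
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class ResponseStatus:
    """Status object returned when a request does not succeed."""
    id: str             # DeviceID that made the request
    status_code: Any    # operation response code (type assumed)
    status_string: str  # operation response description
    local_time: str     # system time of the responding party


def parse_response(http_status: int, body: str) -> Union[Any, ResponseStatus]:
    """Return the result object for 2XX, otherwise a ResponseStatus."""
    payload = json.loads(body)
    if 200 <= http_status < 300:
        return payload
    return ResponseStatus(
        id=payload.get("ID", ""),
        status_code=payload.get("StatusCode"),
        status_string=payload.get("StatusString", ""),
        local_time=payload.get("LocalTime", ""),
    )


ok = parse_response(200, '{"FaceID": "f-001"}')
failed = parse_response(404, json.dumps({
    "ID": "device-01", "StatusCode": 4,
    "StatusString": "not found", "LocalTime": "20220513120000"}))
```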
In step S3, the databases of the registered service algorithms are called and the data of each database is received, in preparation for database integration. Database integration is a data standardization process that supports real-time computation, offline computation and batch processing, and data transmission supports a distributed mode. During data processing, artificial intelligence technologies such as graph computing and in-memory computing are introduced to process both structured and unstructured data and increase the value of the data. A model system, a label system and knowledge graph technology are also introduced to further raise the value density of the data and to provide data value-adding, data preparation and data abstraction for intelligent data applications. For cases in which algorithm processing depends on a designated database, such as the databases used by the image deployment and control comparison service or by personnel merging (e.g. an Oracle database or an HBase database), two approaches are provided: view conversion and database docking. View conversion converts the general database into the view structure the algorithm requires and provides that view to the algorithm. Database docking transfers the analysis results output by the algorithm from the algorithm's own database to the general database, converting them to the standard structure during the transfer.
In step S4, feature extraction extracts the features relevant to algorithm processing, such as names, text, pictures, ID card numbers, mobile phone numbers, key frames, facial features, fingerprint features, voiceprint features and iris features, from the full-text structured data, web page information, multimedia information, biometric information and similar content contained in the data of each database. The extracted features are then filtered and cleaned to obtain higher-quality data. This specifically includes unifying data formats, removing duplication errors and association errors, correcting content errors and logic errors, correcting data inconsistencies, splitting content, and supplementing missing data.
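A few of the cleaning steps named above (unifying formats, removing duplicates, supplementing missing data) can be illustrated with a short Python sketch. The function name `clean_records`, the record layout and the default values are hypothetical, and the sketch covers only these three steps.

```python
from typing import Any, Dict, List


def clean_records(records: List[Dict[str, Any]],
                  defaults: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Normalize string fields, fill missing values, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        # Unify formats: trim surrounding whitespace on string fields.
        normalized = {k: v.strip() if isinstance(v, str) else v
                      for k, v in record.items()}
        # Supplement missing data with configured defaults.
        for key, default in defaults.items():
            if normalized.get(key) in (None, ""):
                normalized[key] = default
        # Remove duplicates: keep the first record with a given content.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(normalized)
    return cleaned


raw = [
    {"name": " Zhang San ", "phone": "13800000000"},
    {"name": "Zhang San", "phone": "13800000000"},
    {"name": "Li Si", "phone": ""},
]
cleaned = clean_records(raw, defaults={"phone": "unknown"})
```

Here the first two records collapse into one after whitespace is trimmed, and the empty phone number is replaced by the default.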
In step S5, association, comparison and identification processing are performed on the cleaned data features; the data are output by category according to the data standards and stored according to defined storage specifications and storage strategies. Specifically, association includes associating internet site data, fixed network data, mobile internet data and the like; comparison includes comparison of keywords, text, voice and image data, binary data, structured data and the like; identification includes marking the business, region, national language, spatial location, sensitivity level and the like. The processed database data are classified by feature type and stored in databases such as a resource library, a subject library and a special topic library.
As a preferred implementation mode, the database integration adopts a database integration platform. The database integration platform supports automatic data processing and checking functions according to strategies and rules, adopts a dynamic, configurable and extensible open architecture, supports dynamic arrangement in a data preprocessing link, establishes a unified coordination mechanism for processing data flow and control flow, realizes unified addressing management of a data life cycle, and ensures the integrity and consistency of data. The data caching mechanism is used for processing instantaneous peak data streams, and the processing capacity of structured and unstructured data such as multimedia, text and encrypted files is achieved. And the database integration platform processes the structured data, the semi-structured data and the unstructured data according to the data standard, the data verification rule and the data processing strategy. The database integration platform fully considers the characteristics of data, gives consideration to the variety diversity, the mass property of data quantity, the multi-source heterogeneity of the data, the complexity of data formats and the online timeliness of the data, and comprehensively constructs a data resource fusion system with all-dimensional acquisition, all-network convergence and all-dimensional integration. By taking intelligent application as a guide and data processing automation and intellectualization, the data association degree and the service compactness are improved, the data quality is improved, the potential and the value of data resources are mined, a mass data resource pool is constructed by scientific classification, a foundation is made for data organization and storage, and the actual combat application of each business department is effectively supported.
Specifically, the database integration platform supports real-time streaming data processing, offline data computation and distributed data management. Data storage supports data types such as relational databases, column-family databases, graph databases, text files, binary files, video formats, audio formats, picture formats, large objects, serialized data, XML, JSON, general machine learning models and statistical analysis models. The data integration platform supports real-time and offline marking of processed data and supports label engineering and knowledge graph technology. The database integration platform supports generating an analysis report on the quality of the processed data: the processed data are passed through a data quality evaluation model and graded by quality, and the analysis report comprises a data consistency report, a data integrity report and a data credibility report. The data integration basic platform records detailed operation logs of data processing, supports log records at the operation level, the service level and the system level, and supports auditing, system maintenance, tuning analysis, problem tracking and the like based on the operation logs. The data integration basic platform supports both automatic and manual processing of problem data, so that the causes of unqualified data can be analyzed and resolved and the quality of accessed data improved; it also supports supplementary recording of problem data and the repair and reuse of problem data. The data integration basic platform supports monitoring the running state of data processing, statistical analysis of the data, and quality monitoring.
In step S7, determining at least two optimal service algorithms from the registered plurality of service algorithms according to the service task request includes:
S71, parsing the service task request to obtain the setting parameters of the service task;
S72, calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters;
S73, verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.
In step S71, most algorithms require a number of parameters to be set; these parameters control the behavior of the algorithm and help maximize the performance of the platform.
In step S72, some learning algorithms make specific assumptions about the structure of the data or the expected result. If an algorithm type that fits the task is found, it may give more useful results, more accurate predictions or a shorter training time. Referring to Table 2, when comparing and selecting among classification, regression and clustering algorithms, the main features evaluated are accuracy, training time, linearity and the number of main parameters.
TABLE 2
[table image not reproduced]
The following describes how the optimal service algorithms are determined in a specific application scenario. For example, for a face feature extraction task, the setting parameters of the task are extracted; they include the loss function, learning rate, kernel function, smoothing curve function, class weights, penalty coefficient, number of iterations and the like. The HoG algorithm, the Dlib algorithm and a convolutional neural network feature extraction algorithm are then called with these setting parameters as input. Each algorithm is run and outputs a list of result parameters, the output result parameters are verified, and at least two optimal service algorithms are determined according to the verification results.
Referring to fig. 2, in step S73, verifying the output result parameters and determining the optimal service algorithms according to the verification result includes:
S731, establishing N data sets from the N output result parameters respectively, wherein N is an integer greater than or equal to 3;
S732, in each round of verification, selecting one of the data sets as verification data and using the other data sets as training data, inputting the training data into a service algorithm for training, then inputting the verification data for verification and calculating a mean square error, and, after N rounds of verification, calculating the mean square error average value of the service algorithm;
S733, performing the verification on each service algorithm to obtain the mean square error average value of each service algorithm, and sorting the service algorithms in ascending order of mean square error average value;
S734, selecting a preset number of top-ranked service algorithms as the optimal service algorithms, wherein the preset number is greater than or equal to two.
In step S732, the mean square error is calculated by the following formula:
[equation image not reproduced]
wherein M is the total number of result data output by training and verification data, r is the number of groups into which these data are divided, M-r is the degrees of freedom, and the remaining two symbols denote the i-th sample and the sample variance of each group, respectively. The sample variance of each group is:
[equation image not reproduced]
wherein the symbol in the formula represents a random variable;
the mean square error average is calculated according to the following formula:
$$E = \frac{1}{N}\sum_{i=1}^{N} \mathrm{MSE}_i$$
wherein E is the mean square error average, N is the number of verification rounds, and $\mathrm{MSE}_i$ is the mean square error obtained in the i-th round of verification.
The above method assesses how well a given algorithm generalizes when it is trained on a particular data set. By observing the difference in accuracy between rounds, the best and worst performance of the model on new data can be estimated, so the result obtained is more stable and comprehensive.
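The verification and ranking of steps S731 to S734 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the three candidate models (`MeanModel`, `LineModel`, `LastValueModel`), the toy data and the helper names are hypothetical, and the plain mean squared error is used here in place of the grouped formula of step S732.

```python
from statistics import mean

def mse(y_true, y_pred):
    """Plain mean squared error (an assumption; not the grouped formula of S732)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

class MeanModel:
    """Hypothetical algorithm: always predicts the mean of the training targets."""
    def fit(self, xs, ys):
        self.c = mean(ys)
    def predict(self, xs):
        return [self.c for _ in xs]

class LineModel:
    """Hypothetical algorithm: least-squares straight line."""
    def fit(self, xs, ys):
        mx, my = mean(xs), mean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        self.a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
        self.b = my - self.a * mx
    def predict(self, xs):
        return [self.a * x + self.b for x in xs]

class LastValueModel:
    """Hypothetical algorithm: repeats the last training target."""
    def fit(self, xs, ys):
        self.c = ys[-1]
    def predict(self, xs):
        return [self.c for _ in xs]

def mean_cv_mse(model_cls, folds):
    """Hold out each data set once, train on the rest, average the N MSE values."""
    errors = []
    for i, (val_x, val_y) in enumerate(folds):
        train_x = [x for j, (fx, _) in enumerate(folds) if j != i for x in fx]
        train_y = [y for j, (_, fy) in enumerate(folds) if j != i for y in fy]
        model = model_cls()
        model.fit(train_x, train_y)
        errors.append(mse(val_y, model.predict(val_x)))
    return mean(errors)

def select_best(candidates, folds, preset_number=2):
    """Rank candidates by ascending mean MSE and keep the first preset_number."""
    scores = {name: mean_cv_mse(cls, folds) for name, cls in candidates.items()}
    ranked = sorted(scores, key=scores.get)
    return ranked[:preset_number], scores

# Toy data y = 2x + 1, split into N = 3 data sets.
xs = list(range(9))
ys = [2 * x + 1 for x in xs]
folds = [(xs[i::3], ys[i::3]) for i in range(3)]
candidates = {"mean": MeanModel, "line": LineModel, "last": LastValueModel}
best, scores = select_best(candidates, folds)
print(best, scores)
```

On this toy data the straight-line model ranks first and the mean predictor second, so these two would be kept when the preset number is two.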
In step S8, a service task is executed by the at least two optimal service algorithms.
The following is further described by specific application scenarios.
In a specific application scenario, OCR (optical character recognition) uses three algorithms: CLS (a decision tree), DET (a target detection algorithm) and REC (a character recognition convolutional neural network). CLS trains a decision tree model and uses it to determine data attributes in an OCR result. The decision tree model is trained as follows: a sample picture is acquired and OCR is performed on it to generate a first OCR recognition result, which is a two-dimensional string array in which each row indicates data belonging to the same attribute row; first feature information is extracted for each datum in the first OCR recognition result; first labeling data, which indicate the attribute of each datum, are acquired for each datum in the first OCR recognition result; and a decision tree model for determining data attributes in OCR recognition results is trained from the first feature information and the first labeling data.
CLS thus labels data attributes in the recognition result automatically, which reduces the cost of recognizing a picture and improves recognition efficiency. DET is a target detection algorithm: a picture is input, and the model must output the positions of all characters in the picture together with their categories; visual features of the candidate regions are then extracted, and finally a classifier determines whether the pixels within each region form characters. REC recognizes the characters in each region, predicting them with the trained model; it is the core algorithm of the OCR function.
The algorithms cooperate in one of three modes: concurrent, primary-secondary, and division of labor. In the concurrent and primary-secondary modes, a user can allocate a specific amount of resources to each algorithm, and there are three allocation modes: region allocation, point allocation and random allocation.
The concurrent cooperation mode means that several different algorithms are used for the same task at the same time; each algorithm processes resources from a different data source and returns its result, and the task summarizes the results in a unified way. In this way, both existing algorithms and algorithms built later can be fully used, and the risks to analysis reliability and accuracy that come from relying on a single algorithm are avoided. The algorithms can also cooperate in a verification mode: typically, the analysis result produced by a first algorithm is handed to a second algorithm for a second verification, the results of processing the same picture are checked for differences, and the reliability of the analysis result is judged by comparing the two results.
Region allocation means selecting resources by administrative division at each level; point allocation means selecting resources by fuzzy retrieval of individual point locations; random allocation means dynamically and randomly adjusting the capacity of each algorithm process according to the load of the data processing.
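The concurrent mode and the verification mode described above can be illustrated with a short sketch. It assumes two hypothetical algorithms, `algorithm_a` and `algorithm_b`, that share one call contract (a list of items in, a mapping from item to label out); the thread pool stands in for whatever scheduling the platform actually uses.

```python
from concurrent.futures import ThreadPoolExecutor

def algorithm_a(items):
    """Hypothetical algorithm: labels an item 1 when it is even."""
    return {item: int(item % 2 == 0) for item in items}

def algorithm_b(items):
    """Hypothetical algorithm: labels an item 1 when divisible by 2 or 3."""
    return {item: int(item % 2 == 0 or item % 3 == 0) for item in items}

def run_concurrently(assignments):
    """Concurrent mode: each algorithm processes its own data source in
    parallel, and the task merges the returned results into one summary."""
    summary = {}
    with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
        futures = [pool.submit(algo, data) for algo, data in assignments]
        for future in futures:
            summary.update(future.result())
    return summary

def cross_check(first, second, items):
    """Verification mode: the second algorithm re-processes the same items,
    and the items on which the two results differ are reported."""
    r1, r2 = first(items), second(items)
    return [item for item in items if r1[item] != r2[item]]

summary = run_concurrently([(algorithm_a, [1, 2, 3]), (algorithm_b, [4, 5, 6])])
disputed = cross_check(algorithm_a, algorithm_b, [1, 2, 3, 4])
print(summary, disputed)
```

Items returned by `cross_check` are those for which the analysis result would be treated as unreliable and, for example, sent for further review.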
In step S8, integrating the execution results of the service task includes:
performing a weighted average or a simple average on the execution results to obtain an integration result;
and performing classification voting or classification probability voting on the execution results to obtain an integration result.
When the execution results of the service tasks are integrated, the results can be received in two different ways. In the first, a Kafka message service channel is built and opened to the algorithm vendors, and each vendor actively pushes its returned results to the channel; the system then perceives newly returned results in real time and can process them promptly. In the second, the vendor provides an output service interface or a database for the algorithm results, and the system scans that interface or database by timed polling to determine whether a new result has been produced.
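The two receiving modes can be contrasted in a small sketch. Everything here is an assumption made for illustration: an in-memory `queue.Queue` stands in for the Kafka channel, `fetch_all` stands in for the vendor's output interface or database query, and each result is assumed to carry an `id` field. The timer that would trigger the polling is omitted.

```python
import queue

def drain_pushed_results(channel):
    """Push mode: consume everything vendors have already pushed to the channel."""
    results = []
    while True:
        try:
            results.append(channel.get_nowait())
        except queue.Empty:
            return results

def poll_new_results(fetch_all, seen_ids):
    """Polling mode: scan the vendor's results and keep only records not seen before."""
    fresh = [r for r in fetch_all() if r["id"] not in seen_ids]
    seen_ids.update(r["id"] for r in fresh)
    return fresh

# Push mode: two vendors push one result each.
channel = queue.Queue()
channel.put({"id": 1, "label": "alpha"})
channel.put({"id": 2, "label": "beta"})
pushed = drain_pushed_results(channel)

# Polling mode: the vendor store grows between two scans.
vendor_store = [{"id": 10, "label": "alpha"}]
seen = set()
first_scan = poll_new_results(lambda: list(vendor_store), seen)
vendor_store.append({"id": 11, "label": "beta"})
second_scan = poll_new_results(lambda: list(vendor_store), seen)
print(pushed, first_scan, second_scan)
```

The push mode reacts as soon as a result arrives, while the polling mode only notices a new result at the next scan, which is why it needs the set of already seen identifiers.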
Specifically, for regression problems, the prediction results of the various algorithm models are combined by simple averaging; this reduces overfitting and gives a smoother boundary, avoiding the rough boundary of a single model. The results computed by an algorithm generally form a set; for the execution results of a certain classification task, refer to Table 3.
TABLE 3
[table image not reproduced]
Algorithm A = [0,1,0,1,1,0,0];
Algorithm B = [0,0,1,0,1,0,0];
Algorithm C = [0,1,1,1,1,0,1];
In the results, 0 denotes class α and 1 denotes class β; neither class is better or worse than the other. From left to right, the result sets of algorithms A, B and C give the predicted class of the corresponding sample, and the results are integrated as follows:
Simple averaging:
[equation images not reproduced]
wherein S represents the integration result, α and β represent the categories, A, B and C represent the result sets of the different algorithms, and the probability symbol used in the formulas represents the probability of category α in the result set of each algorithm.
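As a worked illustration of simple averaging on the three result sets listed above, the sketch below assumes that the probability of a category under an algorithm is the share of that category in the algorithm's result set, and that the integration result is the unweighted mean of these shares. Both readings are assumptions, since the original equations are not available; the function names are hypothetical.

```python
from fractions import Fraction

# Result sets copied from the text: 0 is class alpha, 1 is class beta.
results = {
    "A": [0, 1, 0, 1, 1, 0, 0],
    "B": [0, 0, 1, 0, 1, 0, 0],
    "C": [0, 1, 1, 1, 1, 0, 1],
}

def class_share(labels, cls):
    """Share of one class within a single algorithm's result set."""
    return Fraction(labels.count(cls), len(labels))

def simple_average(result_sets, cls):
    """Unweighted mean of the per-algorithm class shares."""
    shares = [class_share(labels, cls) for labels in result_sets.values()]
    return sum(shares) / len(shares)

s_alpha = simple_average(results, 0)
s_beta = simple_average(results, 1)
print(s_alpha, s_beta)
```

Under these assumptions the shares of class α are 4/7, 5/7 and 2/7 for A, B and C, so the averaged results are 11/21 for α and 10/21 for β.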
Weighted average:
The accuracy of each algorithm is
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
The algorithm weights are then calculated:
[equation images not reproduced]
where TP indicates that a positive determination is made and is correct, FP indicates that a positive determination is made but is wrong, TN indicates that a negative determination is made and is correct, and FN indicates that a negative determination is made but is wrong.
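A possible form of the weighted average is sketched below. The accuracy follows the TP/TN/FP/FN definitions in the text; the weight of each algorithm is assumed to be its accuracy divided by the sum of all accuracies, which is one common choice and not necessarily the formula of the embodiment. The confusion counts and the 0.5 decision threshold are invented for the example.

```python
def accuracy(tp, tn, fp, fn):
    """Share of correct determinations among all determinations."""
    return (tp + tn) / (tp + tn + fp + fn)

def normalized_weights(accuracies):
    """Assumed weighting: each accuracy divided by the sum of all accuracies."""
    total = sum(accuracies.values())
    return {name: acc / total for name, acc in accuracies.items()}

def weighted_average(predictions, weights):
    """Per-sample weighted mean of the 0/1 predictions of all algorithms."""
    n = len(next(iter(predictions.values())))
    return [
        sum(weights[name] * preds[i] for name, preds in predictions.items())
        for i in range(n)
    ]

# Hypothetical confusion counts (tp, tn, fp, fn) for each algorithm.
counts = {"A": (40, 40, 10, 10), "B": (30, 30, 20, 20), "C": (35, 35, 15, 15)}
accuracies = {name: accuracy(*c) for name, c in counts.items()}
weights = normalized_weights(accuracies)

predictions = {
    "A": [0, 1, 0, 1, 1, 0, 0],
    "B": [0, 0, 1, 0, 1, 0, 0],
    "C": [0, 1, 1, 1, 1, 0, 1],
}
scores = weighted_average(predictions, weights)
labels = [1 if s >= 0.5 else 0 for s in scores]  # threshold is an assumption
print(weights, labels)
```

With these invented counts the accuracies are 0.8, 0.6 and 0.7, so algorithm A carries the largest weight in every per-sample score.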
Further, classification voting takes the output of each algorithm as input and uses KNN (the k-nearest neighbors algorithm): the one-dimensional results are converted into N samples in a two-dimensional feature space, the distances from the test sample to the other sample points are calculated and sorted, the K points with the smallest distances are selected, the categories of these K points are compared, and the test sample is assigned, by majority rule, to the category that accounts for the largest share of the K points.
In other words, if most of the K most similar samples in the feature space belong to a certain class, the test sample is assigned to that class. The KNN algorithm is well suited to automatic classification of class domains with large sample sizes, and therefore to classification voting.
The part of the KNN algorithm that performs the classification is straightforward, but, as the name suggests, two things are not easy to determine: one is "K" and the other is "NN" (the nearest neighbors). "Like attracts like" is the guiding idea of the KNN classification algorithm, so that the machine learning model can classify without depending on deviation. Real samples have many dimensions, and the distribution of the sample data points looks different along different dimensions. For example, if any 2 of 4 dimensions are taken as the X-axis and Y-axis of a plot at a time, 16 plots are obtained.
It can be seen that, for the same samples, once different dimensions are selected the classes become interlocked in a more complex way, the tendency of samples of the same class to cluster becomes less obvious, the samples of a class are spread over a wider range, and they are more likely to be mixed with samples of other classes. KNN decides the class by distance. Specifically, each sample is plotted as a data point according to its value in each dimension, and only the distances between data points need to be measured. To classify a given point, a circle is drawn with that point as its center, and the points near it are found; these neighbors determine the class. Only the points inside the circle have a vote on the class of the point; the class is not voted on by the entire sample set. Different algorithm models have different adjustable parameters. In the KNN algorithm, the number of selected points, i.e. the K in KNN, is a parameter that must be adjusted to the actual situation to obtain a better fit; it can be set based on working experience, and its value is generally between 3 and 10.
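The distance-based vote described above can be sketched in a few lines. This is a generic k-nearest neighbors classifier using Euclidean distance; the sample points, the labels and the function name `knn_classify` are illustrative assumptions, and `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
from math import dist

def knn_classify(train_points, train_labels, query, k=3):
    """Assign the query point to the majority class among its k nearest
    training points, using Euclidean distance."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Two well-separated groups in a two-dimensional feature space.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["alpha", "alpha", "alpha", "beta", "beta", "beta"]

near_origin = knn_classify(points, labels, (0.5, 0.5), k=3)
near_far_group = knn_classify(points, labels, (5.5, 5.5), k=3)
print(near_origin, near_far_group)
```

For a two-class problem an odd K avoids tied votes, which is one practical reason to pick K within the 3 to 10 range by experiment.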
In a specific application scenario, assuming that there are three independent models, each with 70% accuracy, and that they vote by majority rule, the final accuracy is:
$$0.7^3 + 3 \times 0.7^2 \times 0.3 = 0.784$$
That is, simple classification voting improves the accuracy by about 8 percentage points. This is a simple probability calculation: the more algorithm results take part in the vote, the better the result, provided that the algorithm models are independent of one another and their results are uncorrelated. The more similar the integrated algorithm models are, the poorer the integration effect; the less correlated the algorithm models are, the better the integration result, and this property does not depend on the integration method.
The method further comprises checking the operation of the algorithms by means of algorithm monitoring and data reconciliation. Algorithm monitoring provides, for each algorithm in use, the ability to monitor how the algorithm is processing, including its current state and its processing flow, and raises a timely alarm for abnormal algorithms. For example, the heartbeat monitoring records of an abnormal algorithm can be checked; each record includes a serial number, the monitoring time, the response speed (in milliseconds) and the monitoring result (no response, slow response, or normal response). The heartbeat data packet consists of simulated data generated from existing data; the data are sent to the trained model, and the model predicts and returns a result in real time.
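A heartbeat check producing the record fields listed above might look like the following sketch. The `slow_ms` threshold, the `HeartbeatRecord` type and the treatment of any exception as "no response" are assumptions for illustration; a production check would more likely use a timeout on a remote call.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class HeartbeatRecord:
    serial_number: int
    monitoring_time: float          # seconds since the epoch
    response_ms: Optional[float]    # None when the model did not answer
    result: str                     # "no response", "slow response" or "normal response"

def check_heartbeat(serial_number, predict, sample, slow_ms=500.0):
    """Send one simulated sample to the model and classify its response."""
    start = time.perf_counter()
    try:
        predict(sample)
    except Exception:
        # Assumption: any failure of the call counts as "no response".
        return HeartbeatRecord(serial_number, time.time(), None, "no response")
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    result = "slow response" if elapsed_ms > slow_ms else "normal response"
    return HeartbeatRecord(serial_number, time.time(), elapsed_ms, result)

def broken_model(sample):
    raise ConnectionError("model unreachable")

def slow_model(sample):
    time.sleep(0.01)
    return sample

ok = check_heartbeat(1, lambda sample: sample, {"pixels": [0, 1]})
down = check_heartbeat(2, broken_model, {"pixels": [0, 1]})
slow = check_heartbeat(3, slow_model, {"pixels": [0, 1]}, slow_ms=1.0)
print(ok.result, down.result, slow.result)
```

Records with a result other than "normal response" are the ones that would trigger the alarm described in the text.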
Data reconciliation refers to checking and verifying the number of data items, the data size and the data fingerprint during data exchange between a data provider and a data receiver. It includes reconciling, from the running log of each algorithm, the amount of data the algorithm received against the amount of data it analyzed and output, in order to check whether the algorithm has omitted any data. After reconciliation is finished, the result must be confirmed, and a log must be recorded when the reconciliation is abnormal. According to the scenario, data reconciliation is divided into data access reconciliation and data distribution reconciliation. The reconciliation content includes: serial number, reconciliation time, access data type, access data volume, output data volume and docking result.
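The three quantities compared during reconciliation (count, size and fingerprint) can be computed as in the sketch below. The choice of SHA-256 and of an order-independent fingerprint is an assumption; the text does not say how the data fingerprint is computed, and the function names are hypothetical.

```python
import hashlib

def summarize(records):
    """Count, total size in bytes, and an order-independent SHA-256
    fingerprint of a batch of byte-string records."""
    digests = sorted(hashlib.sha256(r).hexdigest() for r in records)
    fingerprint = hashlib.sha256("".join(digests).encode()).hexdigest()
    return {
        "count": len(records),
        "size": sum(len(r) for r in records),
        "fingerprint": fingerprint,
    }

def reconcile(provider_records, receiver_records):
    """Compare both sides and list the quantities that do not match."""
    sent = summarize(provider_records)
    received = summarize(receiver_records)
    mismatches = [key for key in sent if sent[key] != received[key]]
    return {"ok": not mismatches, "mismatches": mismatches}

sent_batch = [b"record-1", b"record-2", b"record-3"]
complete = reconcile(sent_batch, [b"record-3", b"record-1", b"record-2"])
with_loss = reconcile(sent_batch, [b"record-1", b"record-2"])
print(complete, with_loss)
```

A result with `ok` set to false corresponds to the abnormal case for which a log entry must be recorded.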
Referring to fig. 3, in some embodiments, there is provided a multi-algorithm integration apparatus applied to the above method, including:
the registration module 101 is configured to collect registration information of a plurality of service algorithms and perform registration;
the interface integration module 102 is used for integrating interfaces of a plurality of registered service algorithms;
a database calling module 103, configured to call databases of multiple registered service algorithms, and receive data of each database;
a database feature extraction module 104, configured to perform feature extraction on data of each database;
the database integration module 105 is used for performing association, comparison and identification processing on the extracted features and performing classified storage according to feature types;
a receiving module 106, configured to receive a service task request;
a determining module 107, configured to determine at least two optimal service algorithms from the registered multiple service algorithms according to the service task request;
and the result integration module 108 is configured to execute the service task through the at least two optimal service algorithms and integrate an execution result of executing the service task.
Specifically, the interface integration module 102 is further configured to perform unified assignment on the interface parameters and the return result formats of the service algorithms to form a universal interface.
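One way to picture the universal interface is an adapter that maps unified parameter names onto each algorithm's native parameters and wraps the native return value in a single result format. The vendor functions, parameter names and result keys below are hypothetical and serve only to show the idea.

```python
from typing import Any, Callable, Dict

class UnifiedAlgorithm:
    """Adapter exposing one call signature and one return format for
    algorithms whose native interfaces differ."""

    def __init__(self, name: str, func: Callable[..., Dict[str, Any]],
                 param_map: Dict[str, str], result_key: str) -> None:
        self.name = name
        self.func = func
        self.param_map = param_map      # unified name -> native name
        self.result_key = result_key    # where the native result is stored

    def run(self, params: Dict[str, Any]) -> Dict[str, Any]:
        native = {self.param_map[key]: value for key, value in params.items()}
        raw = self.func(**native)
        return {"algorithm": self.name, "result": raw[self.result_key]}

def vendor_x_detect(img_path, thresh):
    """Hypothetical vendor algorithm with its own parameter names."""
    return {"boxes": [(img_path, thresh)]}

def vendor_y_detect(image, min_score):
    """Hypothetical vendor algorithm with a different native interface."""
    return {"detections": [(image, min_score)]}

registry = [
    UnifiedAlgorithm("vendor_x", vendor_x_detect,
                     {"image": "img_path", "threshold": "thresh"}, "boxes"),
    UnifiedAlgorithm("vendor_y", vendor_y_detect,
                     {"image": "image", "threshold": "min_score"}, "detections"),
]

# Every registered algorithm is now called the same way.
outputs = [algo.run({"image": "a.jpg", "threshold": 0.5}) for algo in registry]
print(outputs)
```

Because every adapter returns the same dictionary shape, the result integration step can average or vote over the outputs without knowing which vendor produced them.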
In some embodiments, the determining module 107 is further configured to parse the service task request to obtain a setting parameter related to the service task; calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters; and verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.
In some embodiments, the determining module 107 is further configured to respectively establish N data sets according to the N output result parameters, where N is an integer greater than or equal to 3; in each round of verification, one data set is selected from the data sets to serve as verification data, the other data sets serve as training data, the training data are input into a service algorithm to be trained, then the verification data are input to be verified, the mean square error is calculated, and the mean square error average value of the service algorithm is calculated after N rounds of verification; performing the verification on each service algorithm to obtain a mean square error average value of each service algorithm, and sequencing the mean square error average values in an ascending order according to the mean square error average values of the service algorithms; and selecting a preset number of service algorithms which are ranked in the front to determine as the optimal service algorithm, wherein the preset number is more than or equal to two.
In some embodiments, the result integration module 108 is further configured to perform weighted average or simple average on the execution results to obtain integration results, and perform classification voting on the execution results to obtain integration results.
Referring to fig. 4, in some embodiments, an electronic device is provided, which includes a processor 1 and a storage 2, where the storage 2 stores a plurality of instructions, and the processor 1 is configured to read the plurality of instructions and execute the method described above.
The multi-algorithm integration method and the multi-algorithm integration device provided by the embodiment integrate the interfaces of a plurality of service algorithms and the related databases to form external interfaces with uniform formats and databases which are more convenient to call and compare, improve the running speed of the plurality of algorithms, and thus improve the execution efficiency of service tasks; the optimal algorithms are matched from the multiple algorithms, the obtained result has better generalization capability and is more stable and comprehensive, so that the error of algorithm execution is minimized, and the accuracy of task execution is improved; the algorithm results are integrated through a voting method and an averaging method, and then training is performed through a parallel or serial mode, so that a service task execution mode with higher accuracy is obtained, and the performance of the multi-algorithm model is further optimized.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A method for multi-algorithm integration, comprising:
collecting registration information of a plurality of service algorithms and registering;
integrating interfaces of a plurality of registered service algorithms;
calling a plurality of registered databases of service algorithms and receiving data of each database;
extracting the characteristics of the data of each database;
performing association, comparison and identification processing on the extracted features, and performing classified storage according to feature types;
receiving a service task request;
determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
and executing the service tasks through the at least two optimal service algorithms, and integrating the execution results of the executed service tasks.
2. The method of claim 1, wherein the interface integration of the registered plurality of service algorithms comprises:
and carrying out unified assignment on the interface parameters and the return result formats of all the service algorithms to form a universal interface.
3. The method of claim 1, wherein the types of service algorithms include classification algorithms, regression algorithms, and clustering algorithms.
4. The method of claim 1, wherein determining at least two optimal service algorithms from the registered plurality of service algorithms based on the service task request comprises:
analyzing the service task request to obtain a setting parameter related to the service task;
calling at least three service algorithms of different types, inputting the setting parameters into the service algorithms of different types, and obtaining output result parameters;
and verifying the output result parameters, and determining at least two optimal service algorithms according to the verification result.
5. The method of claim 4, wherein the verifying the output result parameters and determining at least two optimal service algorithms according to the verification result comprises:
respectively establishing N data sets according to the N output result parameters, wherein N is an integer greater than or equal to 3;
in each round of verification, one data set is selected from the data sets to serve as verification data, the other data sets serve as training data, the training data are input into a service algorithm to be trained, then the verification data are input to be verified, the mean square error is calculated, and the mean square error average value of the service algorithm is calculated after N rounds of verification;
performing the verification on each service algorithm to obtain a mean square error average value of each service algorithm, and sequencing the mean square error average values in an ascending order according to the mean square error average values of the service algorithms;
and selecting a preset number of service algorithms which are ranked in the front to determine as the optimal service algorithm, wherein the preset number is more than or equal to two.
6. The method of claim 5, wherein the mean square error is calculated by the following equation:
[equation image not reproduced]
wherein M is the total number of result data output by training and verification data, r is the number of groups into which these data are divided, M-r is the degrees of freedom, and the remaining two symbols denote the i-th sample and the sample variance of each group, respectively;
the mean square error average is calculated according to the following formula:
$$E = \frac{1}{N}\sum_{i=1}^{N} \mathrm{MSE}_i$$
wherein E is the mean square error average, N is the number of verification rounds, and $\mathrm{MSE}_i$ is the mean square error obtained in the i-th round of verification.
7. The method of claim 1 or 5, wherein integrating results of executing the service task comprises:
and carrying out weighted average or simple average on the execution result to obtain an integrated result.
8. The method of claim 1, wherein integrating results of executing service tasks comprises:
and carrying out classification voting on the execution result to obtain an integration result.
9. A multi-algorithm integration apparatus for use in the method of any one of claims 1-8, comprising:
the registration module is used for acquiring registration information of a plurality of service algorithms and registering;
the interface integration module is used for integrating the interfaces of the registered service algorithms;
the database calling module is used for calling the registered databases of the plurality of service algorithms and receiving the data of each database;
the database feature extraction module is used for extracting features of data of each database;
the database integration module is used for performing association, comparison and identification processing on the extracted features and performing classified storage according to feature types;
the receiving module is used for receiving a service task request;
the determining module is used for determining at least two optimal service algorithms from the registered service algorithms according to the service task request;
and the result integration module is used for executing the service tasks through the at least two optimal service algorithms and integrating the execution results of the executed service tasks.
10. An electronic device comprising a processor and a storage device, the storage device storing a plurality of instructions, the processor being configured to read the plurality of instructions and to perform the method according to any one of claims 1-8.
CN202210519444.0A 2022-05-13 2022-05-13 Multi-algorithm integration method and device Active CN114625901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210519444.0A CN114625901B (en) 2022-05-13 2022-05-13 Multi-algorithm integration method and device

Publications (2)

Publication Number Publication Date
CN114625901A true CN114625901A (en) 2022-06-14
CN114625901B CN114625901B (en) 2022-08-05

Family

ID=81907170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210519444.0A Active CN114625901B (en) 2022-05-13 2022-05-13 Multi-algorithm integration method and device

Country Status (1)

Country Link
CN (1) CN114625901B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015140592A1 (en) * 2014-03-20 2015-09-24 Tata Consultancy Services Limited Repository and recommendation system for computer programs
CN108280091A (en) * 2017-01-06 2018-07-13 阿里巴巴集团控股有限公司 A kind of task requests execution method and apparatus
CN111241915A (en) * 2019-12-24 2020-06-05 北京中盾安全技术开发公司 Multi-analysis algorithm fusion application service platform method based on micro-service
CN113641482A (en) * 2021-08-31 2021-11-12 联通(广东)产业互联网有限公司 AI algorithm off-line scheduling method, system, computer equipment and storage medium
CN113760513A (en) * 2021-09-16 2021-12-07 康键信息技术(深圳)有限公司 Distributed task scheduling method, device, equipment and medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4375897A1 (en) * 2022-11-25 2024-05-29 Samsung SDS Co., Ltd. System for business process automation and method thereof
CN116415206A (en) * 2023-06-06 2023-07-11 ***紫金(江苏)创新研究院有限公司 Operator multiple data fusion method, system, electronic equipment and computer storage medium
CN116415206B (en) * 2023-06-06 2023-08-22 ***紫金(江苏)创新研究院有限公司 Operator multiple data fusion method, system, electronic equipment and computer storage medium

Similar Documents

Publication Publication Date Title
CN114625901B (en) Multi-algorithm integration method and device
CN108280795A (en) The screening technique of highway green channel exception vehicle based on dynamic data base
CN116049454A (en) Intelligent searching method and system based on multi-source heterogeneous data
CN111897859B (en) Big data intelligent report platform for enterprise online education
CN111931616A (en) Emotion recognition method and system based on mobile intelligent terminal sensor equipment
CN115062675A (en) Full-spectrum pollution tracing method based on neural network and cloud system
CN117112776A (en) Enterprise knowledge base management and retrieval platform and method based on large language model
CN113486983A (en) Big data office information analysis method and system for anti-fraud processing
CN115719283A (en) Intelligent accounting management system
CN110929032A (en) User demand processing system and method for software system
CN116932523B (en) Platform for integrating and supervising third party environment detection mechanism
CN116452212B (en) Intelligent customer service commodity knowledge base information management method and system
Bourqui et al. Detecting structural changes and command hierarchies in dynamic social networks
CN115062725B (en) Hotel income anomaly analysis method and system
CN110689028A (en) Site map evaluation method, site survey record evaluation method and site survey record evaluation device
CN115309705A (en) Data integration classification system and method for automatically identifying basic data elements of urban information model platform
CN112506930B (en) Data insight system based on machine learning technology
CN113516229A (en) Credible user optimization selection method facing crowd sensing system
CN113128452A (en) Greening satisfaction acquisition method and system based on image recognition
CN116993307B (en) Collaborative office method and system with artificial intelligence learning capability
CN117749836B (en) Internet of things terminal monitoring method and system based on artificial intelligence
CN113393216B (en) Laboratory digital system
KR102671618B1 (en) Method and system for providing user-customized interview feedback for educational purposes based on deep learning
CN112712177A (en) Knowledge engineering method and device based on cooperative processing
CN117556256A (en) Private domain service label screening system and method based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant