CN114139720A - Government affair big data processing method and device based on machine learning - Google Patents

Government affair big data processing method and device based on machine learning

Info

Publication number
CN114139720A
Authority
CN
China
Prior art keywords
data processing
model
preset
processing model
government affair
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111358382.1A
Other languages
Chinese (zh)
Inventor
梁明杰
郑鹏
刘志徽
韦静贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Pingnan Zhongke Shuguang Cloud Computing Co Ltd
Original Assignee
Guangxi Zhongke Shuguang Cloud Computing Co ltd
Application filed by Guangxi Zhongke Shuguang Cloud Computing Co ltd
Priority to CN202111358382.1A
Publication of CN114139720A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a machine-learning-based method and device for processing government affairs big data. Government affair log data are acquired, and a preset data processing model for preprocessing the log data is determined within a preset search space, so that the log data can be processed automatically by machine learning. The preset data processing model is then optimized and trained with a tuner and an evaluator until an optimal data processing model is obtained, which avoids the error-prone, inefficient and hard-to-manage manual process used traditionally. Finally, the optimal data processing model preprocesses the government affair log data to obtain high-quality government affair data, which are stored or visually displayed. This improves data preprocessing capacity as well as the efficiency of real-time batch acquisition and accurate filtering.

Description

Government affair big data processing method and device based on machine learning
Technical Field
The application relates to the field of big data, and in particular to a machine-learning-based method and device for processing government affairs big data.
Background
With the development of big data technology, municipal administration data are gradually being aggregated onto smart-city government affairs big data platforms. These platforms provide supporting tools for data acquisition, computation, processing and analysis, establish mechanisms for metadata management, data sharing and data security protection, and support innovative data applications.
However, smart-city government affairs big data platforms also face challenges. Traditional data preprocessing comprises data cleaning, data sampling, data processing, data splitting and similar stages, and each stage has several alternative methods. The data usually have to be analyzed before a method can be chosen, so the whole preprocessing workflow is repetitive and time-consuming, and the preprocessing efficiency of government affairs big data processing systems is very low.
Disclosure of Invention
The application provides a machine-learning-based government affairs big data processing method and device, aiming to solve the technical problem of low data preprocessing efficiency in government affairs big data processing systems.
To solve the above technical problem, in a first aspect, an embodiment of the application provides a machine-learning-based government affairs big data processing method, comprising the following steps:
acquiring government affair log data;
in a preset search space, determining a preset data processing model for preprocessing government affair log data;
performing optimization training on a preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained;
preprocessing the government affair log data by using the optimal data processing model to obtain high-quality government affair data;
and storing or visually displaying the high-quality government affair data.
In this embodiment, government affair log data are acquired and a preset data processing model for preprocessing them is determined within a preset search space, so that the log data can be processed automatically by machine learning. The preset data processing model is optimized and trained with the tuner and evaluator technologies until the optimal data processing model is obtained. This avoids the error-prone, inefficient and hard-to-manage traditional manual process, and it also removes the difficulty of adjusting configuration parameters when the expertise needed to configure and optimize different algorithms is lacking. Finally, the optimal data processing model preprocesses the log data into high-quality government affair data, which are stored or visually displayed, improving data preprocessing capacity and the efficiency of real-time batch acquisition and accurate filtering.
In one embodiment, in the preset search space, determining a preset data processing model for preprocessing government affair log data includes:
in a preset search space, selecting a model file containing a default network structure and hyper-parameters according to government affair log data;
and determining an algorithm file of an iterative algorithm according to a preset expected model loss value, wherein the preset data processing model comprises the model file and the algorithm file.
In this embodiment, the model file and the algorithm file are determined automatically within the preset search space. Model selection and algorithm selection are thus automated, which makes model deployment and training more efficient and in turn improves data preprocessing efficiency.
In one embodiment, the performing optimization training on the preset data processing model based on the tuner technology and the evaluator technology until obtaining the optimal data processing model includes:
training a preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters;
evaluating the target data processing model by using a preset evaluator according to the model parameters to obtain a model evaluation result;
initializing the target data processing model by using the tuner according to the model evaluation result;
and cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain the optimal data processing model.
In this embodiment, the tuner and the evaluator repeatedly optimize the model parameters in a loop, yielding an intelligent model that adjusts the big data acquisition and filtering mechanism. Parameter tuning is thus automated, which avoids the errors caused by the many tedious steps of manual tuning, saves time and reduces labor cost.
In a preferred embodiment, training the preset data processing model by using the preset tuner to obtain the target data processing model includes:
training the preset data processing model with the tuner according to a preset optimization mode to obtain the target data processing model, wherein the preset optimization mode includes a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
This embodiment trains with preset optimization modes such as heuristic search, derivative-free optimization and reinforcement learning, which require no specific assumptions about the problem and make model training more efficient.
In a preferred embodiment, evaluating the target data processing model according to the model parameters by using the preset evaluator to obtain the model evaluation result includes:
performing auxiliary evaluation on the target data processing model with the evaluator according to the model parameters, using a preset auxiliary evaluation method, to obtain the model evaluation result, wherein the preset auxiliary evaluation method includes a sub-sampling method, a parameter reuse method and a proxy (surrogate) evaluation method.
In this embodiment, evaluation is assisted by sub-sampling, parameter reuse or proxy evaluation, which prevents the evaluation load from growing with the data volume and the number of iterations and reduces the resource consumption of the evaluation process.
In a preferred embodiment, initializing the target data processing model according to the model evaluation result by using the tuner includes:
determining, by the tuner using an experience-based learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and initializing the target data processing model according to the optimal model parameters.
In this embodiment, parameters are adjusted by drawing on prior experience, which accelerates the training of the network structure and greatly improves the efficiency of optimization training.
In a second aspect, an embodiment of the present application provides a machine-learning-based government affairs big data processing device, including:
the acquisition module, used for acquiring government affair log data;
the determining module, used for determining, in a preset search space, a preset data processing model for preprocessing the government affair log data;
the training module is used for carrying out optimization training on a preset data processing model based on the tuner technology and the evaluator technology until an optimal data processing model is obtained;
the processing module is used for preprocessing the government affair log data by utilizing the optimal data processing model to obtain high-quality government affair data;
and the display module is used for storing or visually displaying the high-quality government affair data.
In one embodiment, the training module comprises:
the training unit, used for training the preset data processing model with a preset tuner to obtain a target data processing model, the target data processing model comprising model parameters;
the evaluation unit, used for evaluating the target data processing model according to the model parameters with a preset evaluator to obtain a model evaluation result;
the initialization unit, used for initializing the target data processing model according to the model evaluation result with the tuner;
and the loop unit, used for cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain the optimal data processing model.
In a third aspect, an embodiment of the present application provides a computer device, including a processor and a memory, where the memory is used to store a computer program, and the computer program, when executed by the processor, implements the machine learning-based government affair big data processing method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the method for processing government affair big data based on machine learning according to the first aspect.
Please refer to the relevant description of the first aspect for the beneficial effects of the second to fourth aspects, which are not repeated herein.
Drawings
Fig. 1 is a schematic flowchart of a method for processing government affairs big data based on machine learning according to an embodiment of the present application;
Fig. 2 is a schematic structural diagram of a machine-learning-based government affairs big data processing device according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As described in the background, smart-city government affairs big data platforms face challenges: traditional data preprocessing comprises data cleaning, data sampling, data processing, data splitting and similar stages, each stage has several alternative methods, and the data usually have to be analyzed before a method is chosen. The whole preprocessing workflow is therefore repetitive and time-consuming, and the preprocessing efficiency of government affairs big data processing systems is very low.
Therefore, in the machine-learning-based government affairs big data processing method and device of this application, government affair log data are acquired and a preset data processing model for preprocessing them is determined within a preset search space, so that the log data can be processed automatically by machine learning. The preset data processing model is optimized and trained with the tuner and evaluator technologies until the optimal data processing model is obtained. This avoids the error-prone, inefficient and hard-to-manage traditional manual process, and it also removes the difficulty of adjusting configuration parameters when the expertise needed to configure and optimize different algorithms is lacking. Finally, the optimal data processing model preprocesses the log data into high-quality government affair data, which are stored or visually displayed, improving data preprocessing capacity and the efficiency of real-time batch acquisition and accurate filtering.
Referring to Fig. 1, Fig. 1 is a schematic flowchart of a machine-learning-based government affairs big data processing method according to an embodiment of the present application. The method can be applied to a computer device, including but not limited to a smartphone, tablet computer, notebook computer, desktop computer, physical server or cloud server. As shown in Fig. 1, the method includes steps S101 to S105, detailed as follows:
Step S101: acquire government affair log data.
In this step, the government affair log data are the log data of the government affairs system. Optionally, a Logstash data engine collects the government affair log data and transmits them to the computer device. Logstash supports dynamic collection from a variety of data sources, can filter, parse, enrich and normalize the format of the data, and then stores the data in a preset storage space.
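Logstash itself is configured through pipeline files, not Python, so the following is only a minimal sketch of what a filtering stage does to raw government affair logs. It assumes a hypothetical line layout `<ISO timestamp> <LEVEL> key=value ...`; the field names and the kept log levels are illustrative, not taken from the patent.

```python
import re
from datetime import datetime

# Hypothetical line layout: "<ISO timestamp> <LEVEL> key=value key=value ..."
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<body>.*)$")


def parse_line(line):
    """Parse one raw log line into a dict, or return None if it is malformed."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    try:
        ts = datetime.fromisoformat(match["ts"])
    except ValueError:
        return None  # drop records whose timestamp cannot be parsed
    record = {"timestamp": ts.isoformat(), "level": match["level"]}
    for pair in match["body"].split():
        key, sep, value = pair.partition("=")
        if sep:  # keep only well-formed key=value fields
            record[key] = value
    return record


def filter_logs(lines, keep_levels=("INFO", "WARN", "ERROR")):
    """Parse every line, drop malformed ones and keep only the requested levels."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r is not None and r["level"] in keep_levels]


raw = [
    "2021-11-15T08:30:00 INFO dept=tax action=query user=u123",
    "garbage line without structure",
    "2021-11-15T08:31:10 DEBUG dept=tax action=ping",
    "2021-11-15T08:32:45 ERROR dept=social action=submit user=u456",
]
clean = filter_logs(raw)
```

Here the malformed line and the DEBUG line are discarded, leaving two structured records.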
Step S102: determine, in a preset search space, a preset data processing model for preprocessing the government affair log data.
In this step, the preset search space contains the model files and algorithm files of several candidate models, where a model file includes the network structure and the hyper-parameters of the model. Determining the preset data processing model means determining a model file and an algorithm file.
Step S103: optimize and train the preset data processing model based on the tuner technology and the evaluator technology until an optimal data processing model is obtained.
In this step, the tuner optimizes the model using samples, and the evaluator assesses model performance. Optionally, the model is trained with the tuner; after training, a validation data set is obtained from the Logstash data engine and the evaluator verifies the model on it, yielding the loss (LOSS) of each validation sample. Based on these per-sample losses, the tuner automatically adjusts the network structure and hyper-parameters through machine learning. Repeating this cycle continuously optimizes the model and iterates toward the optimal model scheme; the current optimal model is then trained and used to adjust the mechanism for acquiring and filtering big data. This removes the burden of manual parameter tuning, reduces labor cost and increases the value of the model.
Step S104: preprocess the government affair log data with the optimal data processing model to obtain high-quality government affair data.
In this step, the optimal data processing model preprocesses the government affair log data into high-quality government affair data, which improves the efficiency of real-time data preprocessing.
Step S105: store or visually display the high-quality government affair data.
In this step, optionally, the data are stored in the Elasticsearch distributed search and analytics engine. Elasticsearch is built on Apache Lucene, is highly scalable, reliable and easy to manage, and supports near-real-time storage, search and analysis of large volumes of data.
Optionally, the Kibana data analysis and visualization platform is used together with Elasticsearch to search and analyze the data stored in Elasticsearch and display them as statistical charts, so that the data can be presented from multiple dimensions.
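One common way to load preprocessed records into Elasticsearch is its `_bulk` API, whose request body is newline-delimited JSON: an action line followed by the document, with a trailing newline. The sketch below only builds that body; the index name `gov-logs` and the record fields are hypothetical, and sending the request (for example `POST /_bulk` with `Content-Type: application/x-ndjson`) is left out.

```python
import json


def to_bulk_ndjson(records, index_name):
    """Build an Elasticsearch _bulk request body: one action line per document,
    followed by the document itself; the body must end with a newline."""
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(record, ensure_ascii=False))  # keep Chinese text readable
    return "\n".join(lines) + "\n"


# "gov-logs" is a hypothetical index name.
body = to_bulk_ndjson([{"dept": "tax", "action": "query"}], "gov-logs")
```

Once indexed, the same index can be selected as a data source in Kibana for charting.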
In an embodiment, based on the embodiment shown in fig. 1, the step S102 includes:
in the preset search space, selecting a model file containing a default network structure and hyper-parameters according to the government affair log data;
and determining an algorithm file of an iterative algorithm according to a preset expected model loss value, wherein the preset data processing model comprises the model file and the algorithm file.
In this embodiment, the same problem may admit several candidate models, and the hyper-parameters of each model are unknown in advance. Whereas the conventional approach relies on user expertise and repeated trials to reach an "optimal" result, this application presets several candidate models and their corresponding hyper-parameters in the search space and, in actual use, selects the model file containing the default network structure and hyper-parameters according to the government affair log data, thereby automating model file selection.
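The patent does not specify how log data map to a model file, so the following is a minimal sketch under a loudly hypothetical rule: each candidate in the search space declares the range of record counts it suits, and the first matching candidate is selected. The candidate names, layer widths, hyper-parameters and ranges are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelFile:
    """One candidate in the search space: default network structure + hyper-parameters."""
    name: str
    hidden_layers: list      # hypothetical: widths of the hidden layers
    hyperparams: dict        # hypothetical default hyper-parameters
    min_records: int         # hypothetical applicability range (inclusive lower bound)
    max_records: float       # exclusive upper bound


# Hypothetical search space; names, sizes and ranges are illustrative only.
SEARCH_SPACE = [
    ModelFile("small_mlp", [32], {"lr": 1e-2, "epochs": 20}, 0, 10_000),
    ModelFile("medium_mlp", [128, 64], {"lr": 1e-3, "epochs": 30}, 10_000, 1_000_000),
    ModelFile("large_mlp", [512, 256, 128], {"lr": 1e-4, "epochs": 50}, 1_000_000, float("inf")),
]


def select_model_file(num_records, search_space=SEARCH_SPACE):
    """Pick the first candidate whose applicability range covers the log-data size."""
    for candidate in search_space:
        if candidate.min_records <= num_records < candidate.max_records:
            return candidate
    raise ValueError(f"no candidate model covers {num_records} records")
```

For example, `select_model_file(50_000)` returns the `medium_mlp` entry. A real system would use richer data characteristics than record count.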
The purpose of selecting the algorithm file is to automatically find an optimization algorithm that balances model efficiency against model performance. For example, when the goal is to minimize a smooth objective function, the computer device can choose among gradient descent, stochastic gradient descent and L-BFGS. Gradient descent has few hyper-parameters, but it converges slowly and each iteration is expensive; L-BFGS consumes more resources but converges faster; each iteration of stochastic gradient descent is cheap, but more iterations are required. If the preset expected model loss value calls for fast convergence, the computer device weighs efficiency against performance among the three algorithms and selects the most suitable one.
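To make the per-iteration trade-off concrete, here is a minimal sketch comparing full-batch gradient descent and stochastic gradient descent on a synthetic one-parameter least-squares problem (the data, learning rates and iteration counts are assumptions chosen for illustration). L-BFGS is not reimplemented here; in SciPy it is available through `scipy.optimize.minimize(..., method="L-BFGS-B")`.

```python
import random


def make_data(n=200, true_w=2.0, noise=0.05, seed=0):
    """Synthetic least-squares problem: y = true_w * x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [true_w * x + rng.gauss(0, noise) for x in xs]
    return xs, ys


def gradient_descent(xs, ys, lr=0.5, iters=100):
    """Full-batch GD: every step touches all n samples (expensive steps, few of them)."""
    w, n = 0.0, len(xs)
    for _ in range(iters):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def sgd(xs, ys, lr=0.05, epochs=20, seed=0):
    """SGD: every step uses a single sample (cheap steps, many noisy ones)."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            w -= lr * 2 * (w * xs[i] - ys[i]) * xs[i]
    return w


xs, ys = make_data()
w_gd = gradient_descent(xs, ys)   # 100 steps x 200 samples = 20,000 gradient terms
w_sgd = sgd(xs, ys)               # 20 epochs x 200 steps = 4,000 single-sample steps
```

Both estimates land close to the true slope of 2; the SGD estimate is slightly noisier because each update sees only one sample.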
In an embodiment, based on the embodiment shown in fig. 1, the step S103 includes:
training the preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters;
evaluating the target data processing model by using a preset evaluator according to the model parameters to obtain a model evaluation result;
initializing the target data processing model by using the tuner according to the model evaluation result;
and cyclically optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain the optimal data processing model.
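The train/evaluate/re-initialize cycle can be sketched as below. This is a toy under stated assumptions, not the patent's implementation: the "model" is a one-parameter regression trained by gradient descent, the only tuned configuration is the learning rate, the tuner is a simple shrinking-radius random search around the best configuration, and "the search radius fell below `min_radius`" stands in for the preset convergence condition.

```python
import random

# Hypothetical data: y = 2x plus small noise, split into training and validation sets.
_rng = random.Random(42)
_xs = [_rng.uniform(-1, 1) for _ in range(200)]
_ys = [2.0 * x + _rng.gauss(0, 0.05) for x in _xs]
TRAIN = (_xs[:150], _ys[:150])
VAL = (_xs[150:], _ys[150:])


def train(lr, xs, ys, steps=50):
    """Hypothetical training: fit y = w * x by gradient descent with learning rate lr."""
    w, n = 0.0, len(xs)
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= lr * grad
    return w


def evaluate(w, xs, ys):
    """Evaluator: mean squared error of the trained model on the validation set."""
    total = 0.0
    for x, y in zip(xs, ys):
        d = w * x - y
        total += d * d
    return total / len(xs)


def tune(train_set, val_set, init_lr=0.01, max_rounds=30, min_radius=0.01, seed=0):
    """Tuner/evaluator loop: propose a configuration, train, evaluate, and restart
    from the best configuration so far until the search radius drops below
    min_radius (the convergence condition) or max_rounds is reached."""
    rng = random.Random(seed)
    best_lr = init_lr
    best_loss = evaluate(train(best_lr, *train_set), *val_set)
    radius = 1.0  # search radius in log10(learning rate)
    for _ in range(max_rounds):
        candidate = best_lr * 10 ** rng.uniform(-radius, radius)
        loss = evaluate(train(candidate, *train_set), *val_set)
        if loss < best_loss:
            best_lr, best_loss = candidate, loss  # next round starts from the improved config
        else:
            radius *= 0.7  # no improvement: search more locally
        if radius < min_radius:
            break
    return best_lr, best_loss


best_lr, best_loss = tune(TRAIN, VAL)
```

In a real deployment the tuner would be one of the optimization modes described next, and the evaluator would use the validation data obtained from the log pipeline.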
Optionally, training the preset data processing model by using the preset tuner to obtain the target data processing model includes:
training the preset data processing model with the tuner according to a preset optimization mode to obtain the target data processing model, wherein the preset optimization mode includes a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
In this embodiment, for the tuner technology, the preset optimization mode is a sample-based optimization mode, which includes heuristic search, model-based derivative-free optimization and reinforcement learning.
Heuristic search: inspired by biological behaviors and phenomena, it is widely applied to tuning problems that are non-convex, non-smooth or discontinuous. The basic idea is to initialize a population, generate a new population from the original one by applying search operators, evaluate the new population, and repeat the process.
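As one concrete instance of population-based heuristic search, here is a minimal (mu + lambda) evolution strategy; the choice of Gaussian mutation as the operator, the step-size decay and the non-smooth test objective are assumptions for illustration, not details from the patent.

```python
import random


def evolve(objective, dim, pop_size=20, generations=100, sigma=0.5, seed=0):
    """Minimal (mu + lambda) evolution strategy: mutate, evaluate, select, repeat."""
    rng = random.Random(seed)
    # Initialise a random population inside [-5, 5]^dim.
    population = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        # Variation operator: each parent yields one child by Gaussian mutation.
        children = [[g + rng.gauss(0, sigma) for g in parent] for parent in population]
        # Evaluate parents and children together; keep the best pop_size (elitist selection).
        population = sorted(population + children, key=objective)[:pop_size]
        sigma *= 0.97  # gradually narrow the mutation step
    return population[0]


def rugged(v):
    """Non-smooth test objective with its minimum at (1, 1, ..., 1)."""
    return sum(abs(x - 1.0) for x in v)


best = evolve(rugged, dim=2)
```

No gradient of `rugged` is ever used, which is why this family suits the non-smooth tuning problems mentioned above.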
Model-based derivative-free optimization: a model is built from samples, new samples are generated according to the model's evaluation, and the process is iterated so that the search space is explored in a targeted way. It can therefore be used for derivative-free optimization, and mainly includes Bayesian optimization, classification-based optimization and simultaneous optimistic optimization.
Bayesian optimization: a probabilistic model (e.g., a Gaussian process, a tree-based model or a deep network) is constructed, and an acquisition function (e.g., expected improvement or upper confidence bound) is defined on top of it; at each iteration a new sample is chosen through the acquisition function and used to update the probabilistic model. Bayesian optimization has the advantage of fast convergence.
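For reference, the two acquisition functions named above have the following standard forms, written here for maximization under the assumption that the probabilistic model yields a Gaussian posterior with mean \mu(x) and standard deviation \sigma(x) > 0. The symbols f^{*} (best value observed so far), \Phi and \phi (standard normal CDF and PDF) and \kappa \ge 0 (exploration weight) are introduced for illustration and do not appear in the patent; the next sample is the x that maximizes the chosen acquisition function.

```latex
z(x) = \frac{\mu(x) - f^{*}}{\sigma(x)}, \qquad
\alpha_{\mathrm{EI}}(x) = \bigl(\mu(x) - f^{*}\bigr)\,\Phi\bigl(z(x)\bigr) + \sigma(x)\,\phi\bigl(z(x)\bigr), \qquad
\alpha_{\mathrm{UCB}}(x) = \mu(x) + \kappa\,\sigma(x)
```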
Classification-based optimization: a classifier is trained on the existing samples to divide the search space into positive and negative regions. Samples in the positive region are more likely to give good results, so new samples are drawn from the positive region and the procedure is iterated. The method is very efficient.
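The sketch below is a deliberately simplified classification-based optimizer in the spirit of RACOS, not a faithful reimplementation of any published method: the best `n_positive` samples are labeled positive, the "classifier" is a padded axis-aligned bounding box around them, and most new samples are drawn from inside that box with a small fraction of uniform exploration. All parameter values are assumptions.

```python
import random


def classification_based_minimize(objective, bounds, rounds=40, batch=20,
                                  n_positive=4, explore=0.1, seed=0):
    """Simplified classification-based optimisation with a bounding-box 'classifier'."""
    rng = random.Random(seed)

    def sample(box):
        return [rng.uniform(lo, hi) for lo, hi in box]

    scored = [(objective(x), x) for x in (sample(bounds) for _ in range(batch))]
    for _ in range(rounds):
        scored.sort(key=lambda t: t[0])
        positives = [x for _, x in scored[:n_positive]]  # label: best samples are positive
        # Classifier: a padded bounding box around the positives marks the positive region.
        box = []
        for i, (lo_b, hi_b) in enumerate(bounds):
            lo = min(p[i] for p in positives)
            hi = max(p[i] for p in positives)
            pad = 0.25 * (hi - lo) + 0.01 * (hi_b - lo_b)
            box.append((max(lo_b, lo - pad), min(hi_b, hi + pad)))
        # Draw new samples mostly from the positive region, occasionally from everywhere.
        for _ in range(batch):
            x = sample(bounds) if rng.random() < explore else sample(box)
            scored.append((objective(x), x))
    return min(scored, key=lambda t: t[0])


def sphere(v):
    """Test objective with its minimum 0 at the origin."""
    return sum(x * x for x in v)


best_val, best_x = classification_based_minimize(sphere, [(-5.0, 5.0), (-5.0, 5.0)])
```

The padding lets the positive region drift toward better areas instead of only shrinking around the initial positives.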
Simultaneous optimistic optimization (SOO) is a branch-and-bound optimization algorithm. It builds a tree structure over the search space in which each leaf node represents a small region, and it balances depth against breadth to find the global optimum.
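A minimal one-dimensional SOO sketch follows, assuming each cell is split into three equal children and evaluated at its centre; the objective, evaluation budget and depth limit are illustrative. In each sweep, the best leaf at every depth is expanded only if no shallower cell expanded in the same sweep had a higher value, which is how SOO balances depth against breadth.

```python
import math
from dataclasses import dataclass


@dataclass
class Cell:
    lo: float
    hi: float
    depth: int
    value: float  # objective evaluated at the cell centre


def soo_maximize(f, lo, hi, max_evals=200, h_max=30):
    """Simultaneous Optimistic Optimization on [lo, hi], splitting cells into thirds."""
    root_x = (lo + hi) / 2
    leaves = [Cell(lo, hi, 0, f(root_x))]
    best_value, best_x = leaves[0].value, root_x
    evals = 1
    while evals < max_evals:
        v_max = -math.inf
        expanded = False
        deepest = min(max(c.depth for c in leaves), h_max)
        for h in range(deepest + 1):  # sweep the tree from shallow to deep
            candidates = [c for c in leaves if c.depth == h]
            if not candidates:
                continue
            cell = max(candidates, key=lambda c: c.value)  # best leaf at this depth
            if cell.value < v_max:
                continue  # dominated by a better shallower cell: do not expand
            v_max = cell.value
            leaves.remove(cell)
            expanded = True
            width = (cell.hi - cell.lo) / 3
            for i in range(3):
                c_lo = cell.lo + i * width
                centre = c_lo + width / 2
                # The middle child shares its centre with the parent, so reuse that value.
                value = cell.value if i == 1 else f(centre)
                evals += 0 if i == 1 else 1
                leaves.append(Cell(c_lo, c_lo + width, h + 1, value))
                if value > best_value:
                    best_value, best_x = value, centre
        if not expanded:
            break
    return best_x, best_value


x_star, f_star = soo_maximize(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
```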
Reinforcement learning is a broad and powerful optimization framework that solves problems through delayed feedback. Unlike other optimization methods, the delayed feedback introduces a notion of time sequence into learning. It includes policy learning and Q-learning.
Policy learning: the policy is treated as a function whose only input is the current state, and the action to take in that state is determined by a prior policy. Knowing the policy in advance is not easy, however, because it requires a deep understanding of the complex function that maps states to goals.
Q-learning: unlike policy learning, the Q-learning algorithm takes two inputs, a state and an action, and returns a value for each state-action pair. When a choice has to be made, the algorithm computes the expected value of each action available to the agent and selects the best one.
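The state-action value idea can be sketched with tabular Q-learning on a toy five-state chain (an assumed environment, not one from the patent): action 1 moves right, action 0 moves left, and reaching the last state pays reward 1 and ends the episode. The update `Q[s][a] += alpha * (target - Q[s][a])` is the standard Q-learning rule; the learning rate, discount and exploration rate are illustrative choices.

```python
import random


def q_learning_chain(n_states=5, episodes=1000, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy chain: action 1 moves right, 0 moves left;
    reaching the last state yields reward 1 and ends the episode."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # step cap per episode
            if rng.random() < epsilon:
                a = rng.randrange(2)  # explore
            else:
                best = max(q[s])
                a = rng.choice([i for i in (0, 1) if q[s][i] == best])  # greedy, random tie-break
            s_next = min(s + 1, goal) if a == 1 else max(s - 1, 0)
            done = s_next == goal
            reward = 1.0 if done else 0.0
            target = reward if done else reward + gamma * max(q[s_next])
            q[s][a] += alpha * (target - q[s][a])  # Q-learning update
            s = s_next
            if done:
                break
    return q


q = q_learning_chain()
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(4)]
```

After training, the greedy policy moves right in every non-terminal state, because the right-moving actions carry the higher discounted value.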
Optionally, evaluating the target data processing model according to the model parameters by using the preset evaluator to obtain the model evaluation result includes:
performing auxiliary evaluation on the target data processing model with the evaluator according to the model parameters, using a preset auxiliary evaluation method, to obtain the model evaluation result, wherein the preset auxiliary evaluation method includes a sub-sampling method, a parameter reuse method and a proxy (surrogate) evaluation method.
In this embodiment, the evaluator consumes far more resources overall than the tuner. Direct evaluation, in which the model is fully trained and then evaluated, is the simplest method; it is accurate but expensive. As the data volume and the number of iterations grow, direct evaluation places a heavy burden on the whole process. To improve its efficiency, this embodiment uses the following methods to assist direct evaluation and reduce its cost.
Sub-sampling: evaluation is performed on a subset of the original samples or features. The less training data is used, the faster but noisier the evaluation.
Early termination: unlike conventional machine learning, where early stopping is used to prevent overfitting, here an evaluation can be terminated as soon as a configuration proves unpromising, avoiding unnecessary waste.
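Sub-sampling and early termination combine naturally in successive halving, a known technique swapped in here as one concrete example (the patent does not name it): every configuration is scored on a small sub-sample, the worse half is terminated early, and the survivors are re-scored on a sample twice as large. The data set, the candidate slopes and the evaluation function below are hypothetical.

```python
import random


def make_dataset(n=1024, seed=0):
    """Hypothetical validation data for y = 3x with small Gaussian noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0, 1)
        data.append((x, 3.0 * x + rng.gauss(0, 0.05)))
    return data


def evaluate(slope, subset):
    """Evaluator: mean squared error of the hypothetical model y = slope * x on a subset."""
    total = 0.0
    for x, y in subset:
        d = slope * x - y
        total += d * d
    return total / len(subset)


def successive_halving(configs, data, start=16, seed=0):
    """Score every config on a small sub-sample, stop the worse half early,
    double the sample size for the survivors, and repeat until one remains."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    survivors, budget = list(configs), start
    while len(survivors) > 1 and budget <= len(shuffled):
        subset = shuffled[:budget]  # sub-sampling: cheaper but noisier evaluation
        survivors.sort(key=lambda c: evaluate(c, subset))
        survivors = survivors[: max(1, len(survivors) // 2)]  # early termination
        budget *= 2
    return survivors[0]


winner = successive_halving([1.0, 2.0, 2.5, 3.0, 3.5, 4.0, 5.0, 6.0], make_dataset())
```

With eight candidates, only 16 + 32 + 64 samples are ever scored per surviving configuration instead of the full 1,024.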
Parameter reuse: for configurations that differ only slightly, the parameters obtained for a previous configuration can be used as the initialization, which speeds up convergence and can yield better performance.
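A toy illustration of why parameter reuse speeds up convergence, under the assumption that the training loss is `(w - target)**2` and that two similar configurations differ only in where the optimum lies (targets 5.0 and 5.2, both hypothetical). Starting configuration B from the parameters learned for configuration A needs fewer gradient steps than starting from zero.

```python
def gd_iterations(start, target, lr=0.1, tol=1e-6, max_iter=10_000):
    """Minimise (w - target)**2 by gradient descent; return (w, iterations used)."""
    w = start
    for it in range(1, max_iter + 1):
        w -= lr * 2 * (w - target)
        if abs(w - target) < tol:
            return w, it
    return w, max_iter


w_a, iters_a = gd_iterations(start=0.0, target=5.0)           # configuration A, cold start
w_b_cold, iters_b_cold = gd_iterations(start=0.0, target=5.2)  # configuration B, cold start
w_b_warm, iters_b_warm = gd_iterations(start=w_a, target=5.2)  # B initialised from A's parameters
```

Each step shrinks the error by a factor of 0.8, so starting 0.2 away instead of 5.2 away saves roughly fifteen iterations here; with real models the saving comes from the same effect.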
Proxy (surrogate) evaluation: when configurations can be quantified, a proxy model can be built to predict how a given configuration will perform.
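A minimal proxy evaluator, assuming configurations are numeric tuples and using k-nearest-neighbour regression as the proxy model (an illustrative choice; the patent does not name one). It predicts a new configuration's score from configurations that have already been evaluated, so promising candidates can be screened without a training run. The evaluation history below is hypothetical.

```python
import math


def knn_proxy(history, k=2):
    """Build a proxy evaluator from already-evaluated (config, score) pairs.

    The proxy predicts a new config's score as the mean score of its k nearest
    evaluated neighbours, so no training run is needed.
    """
    def predict(config):
        nearest = sorted(history, key=lambda item: math.dist(item[0], config))[:k]
        return sum(score for _, score in nearest) / len(nearest)
    return predict


# Hypothetical history: validation loss measured for four one-dimensional configurations.
history = [((0.0,), 0.25), ((0.4,), 0.01), ((0.6,), 0.01), ((1.0,), 0.25)]
proxy = knn_proxy(history)
predicted = proxy((0.5,))
```

Candidates with the best predicted scores would then be passed to the full evaluator.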
Optionally, initializing the target data processing model according to the model evaluation result by using the tuner includes:
determining, by the tuner using an experience-based learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and initializing the target data processing model according to the optimal model parameters.
In this embodiment, the experience-based learning algorithm improves the efficiency of automated machine learning by reducing the cost of configuration generation and evaluation. It includes meta-learning and transfer learning.
Meta-learning guides learning by extracting meta-information. It first characterizes the learning problem and the learning tools (e.g., statistical features of the data, hyper-parameters of the learning tools), then extracts meta-features from past experience, and finally trains a meta-learner on this meta-knowledge. Meta-learning matters for automated machine learning for two reasons. First, characterizing learning problems and tools reveals important information, such as data drift (the model becoming less accurate over time), and makes similar problems easy to identify, so that knowledge can be reused and transferred between problems. Second, the meta-learner encodes past knowledge as guidance for solving future problems. In the evaluator, meta-learning reduces the heavy training cost of evaluation: configuration information is fed into a previously trained meta-learner that predicts the performance or suitability of the configuration, and ideally, if all configurations have been enumerated, the meta-learner can select the optimal configuration directly. In the tuner, meta-learning reduces wasted effort by optimizing the search space: during configuration generation, features of the learning problem are extracted and fed to a meta-learner trained on previous experience, which predicts promising configurations. Meta-learning can also be combined with transfer learning: the configuration of the previous task whose meta-features are most similar is used as initialization data to warm-start configuration generation.
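The warm-start idea can be sketched as a nearest-neighbour lookup in meta-feature space. This is a simplification: a trained meta-learner is replaced by "reuse the best configuration of the most similar past task", and the meta-features (log10 of size, mean, standard deviation) and the experience base are hypothetical.

```python
import math


def meta_features(values):
    """Hypothetical meta-features of a numeric log field: log10(size), mean, std."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return (math.log10(n), mean, std)


def warm_start_config(new_values, past_tasks):
    """Return the best configuration of the past task with the closest meta-features."""
    target = meta_features(new_values)
    nearest = min(past_tasks, key=lambda task: math.dist(task["meta"], target))
    return nearest["best_config"]


# Hypothetical experience base built from previously tuned tasks.
PAST_TASKS = [
    {"meta": (3.0, 10.0, 2.0), "best_config": {"model": "small_mlp", "lr": 0.01}},
    {"meta": (5.0, 100.0, 30.0), "best_config": {"model": "medium_mlp", "lr": 0.001}},
]

config = warm_start_config([8.0, 12.0] * 500, PAST_TASKS)
```

In practice the meta-features would be normalised before computing distances so that no single feature dominates.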
In addition, meta-learning can be applied to dynamic configuration adaptation. Statistics of the data and features are monitored to detect whether concept drift has occurred. Once drift is found, a promising configuration is predicted again to keep the model usable.
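A minimal sketch of the drift check follows. It compares per-feature means of a recent window against a reference window. A feature is flagged when its mean shifts by more than `threshold` standard errors.

The test statistic, the default threshold of 3.0 and the `retune` callback are illustrative assumptions. This check only detects shifts in the input features, which is one observable symptom of concept drift rather than a direct test of it.

```python
from statistics import mean, pstdev


def detect_drift(reference, current, threshold=3.0):
    """Return indices of features whose mean in `current` has shifted
    more than `threshold` standard errors away from `reference`.

    Both arguments are lists of equal-length numeric rows.
    """
    drifted = []
    ref_cols = list(zip(*reference))
    cur_cols = list(zip(*current))
    for idx, (ref_col, cur_col) in enumerate(zip(ref_cols, cur_cols)):
        # Guard against constant reference features (zero spread).
        ref_std = pstdev(ref_col) or 1e-12
        std_err = ref_std / len(cur_col) ** 0.5
        z_score = abs(mean(cur_col) - mean(ref_col)) / std_err
        if z_score > threshold:
            drifted.append(idx)
    return drifted


def adapt_configuration(reference, current, retune):
    """Call the hypothetical `retune` callback only when drift is detected."""
    if detect_drift(reference, current):
        return retune()
    return None
```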
Transfer learning guides learning with previous experience: a well-trained surrogate model or search strategy is reused to save cost.

- In the tuning process, the surrogate model can be transferred. Network structures are transferable, so transfer learning is widely used in neural architecture search.
- In the evaluator, transfer learning speeds up the evaluation of preselected configurations. For general optimization problems, it can transfer model parameters, initializing a new model with the best previously trained parameters.
- Another approach is a function-preserving transformation, such as Net2Net. It initializes a new network so that it computes the same function as the previously trained model. This speeds up training of the new network structure and greatly improves efficiency.
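The function-preserving idea behind Net2Net can be illustrated with its widening operation (Net2WiderNet) on a tiny two-layer ReLU network. This pure-Python sketch is for illustration only. Biases are omitted, and the random choice of which units to copy is an assumption of this example.

Each new hidden unit copies the incoming weights of an existing unit. The outgoing weights of every replicated unit are then divided by its number of copies. As a result, the widened network computes exactly the same outputs before any further training.

```python
import random


def forward(x, w1, w2):
    """Two-layer network without biases: hidden = relu(w1 @ x), output = w2 @ hidden."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def net2wider(w1, w2, new_width, rng=random):
    """Widen the hidden layer to `new_width` units while preserving the function.

    w1: hidden x input weights; w2: output x hidden weights.
    """
    old_width = len(w1)
    if new_width < old_width:
        raise ValueError("new_width must not be smaller than the current width")
    # Keep every original unit, then add copies of randomly chosen existing units.
    mapping = list(range(old_width)) + [
        rng.randrange(old_width) for _ in range(new_width - old_width)
    ]
    counts = [mapping.count(j) for j in range(old_width)]
    # Incoming weights are copied unchanged.
    new_w1 = [list(w1[src]) for src in mapping]
    # Outgoing weights of each replicated unit are split evenly among its copies.
    new_w2 = [[row[src] / counts[src] for src in mapping] for row in w2]
    return new_w1, new_w2
```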
A device corresponding to the method embodiment executes the machine learning-based government affair big data processing method and achieves the corresponding functions and technical effects. Referring to fig. 2, fig. 2 is a structural block diagram of a machine learning-based government affair big data processing device according to an embodiment of the present application. For ease of explanation, only the parts related to this embodiment are shown. The machine learning-based government affair big data processing device according to the embodiment of the present application includes:
an obtaining module 201, configured to obtain government affair log data;
a determining module 202, configured to determine, in a preset search space, a preset data processing model for preprocessing the government affair log data;
the training module 203 is used for performing optimization training on the preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained;
the processing module 204 is configured to utilize the optimal data processing model to preprocess the government affair log data to obtain high-quality government affair data;
and the display module 205 is used for storing or visually displaying the high-quality government affair data.
In one embodiment, the determining module 202 includes:
the selecting unit is used for selecting a model file containing a default network structure and hyper-parameters in the preset search space according to the government affair log data;
and the determining unit is used for determining an algorithm file of an iterative algorithm according to a preset model loss expected value, and the preset data processing model comprises the model file and the algorithm file.
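One possible reading of "determining an algorithm file of an iterative algorithm according to a preset model loss expected value" is that the expected loss acts as the stopping target of the iterative fitting algorithm. The sketch below assumes that reading and uses plain gradient descent on a one-variable linear model. The model, learning rate and iteration cap are hypothetical choices, not taken from the application.

```python
def fit_until_target(xs, ys, target_loss, lr=0.01, max_iter=10_000):
    """Gradient descent on y ~ w*x + b that stops once the mean squared error
    reaches `target_loss` (standing in for the preset model loss expected value).

    Returns (w, b, loss, step), where `loss` is the error of the returned (w, b).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for step in range(max_iter + 1):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        # Stop when the target is met or the iteration budget is exhausted.
        if loss <= target_loss or step == max_iter:
            return w, b, loss, step
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
```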
In one embodiment, the training module 203 comprises:
the training unit is used for training the preset data processing model by using a preset tuner to obtain a target data processing model, and the target data processing model comprises model parameters;
the evaluation unit is used for evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result;
the initialization unit is used for initializing the target data processing model according to the model evaluation result by using the tuner;
and the iteration unit is used for iteratively optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain the optimal data processing model.
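The train, evaluate, re-initialize and repeat cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions, none of which come from the application:

- the tuner is a random one-parameter perturbation of the current best configuration;
- the evaluator is any callable returning a score where higher is better;
- the convergence condition is "no improvement for `patience` consecutive rounds".

```python
import random


def tune(search_space, evaluate, start, max_rounds=200, patience=60, seed=0):
    """Alternate a tuner and an evaluator until a convergence condition is met.

    search_space: dict mapping each hyperparameter name to its candidate values.
    evaluate: callable(config) -> score, higher is better.
    start: initial configuration (for example, a warm-start configuration).
    """
    rng = random.Random(seed)
    best_config = dict(start)
    best_score = evaluate(best_config)
    stale_rounds = 0
    for _ in range(max_rounds):
        # Tuner: perturb one hyperparameter of the current best configuration.
        candidate = dict(best_config)
        name = rng.choice(list(search_space))
        candidate[name] = rng.choice(search_space[name])
        # Evaluator: score the candidate configuration.
        score = evaluate(candidate)
        if score > best_score:
            # Re-initialize the next round from the better configuration.
            best_config, best_score = candidate, score
            stale_rounds = 0
        else:
            stale_rounds += 1
        # Assumed convergence condition: no improvement for `patience` rounds.
        if stale_rounds >= patience:
            break
    return best_config, best_score
```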
In a preferred embodiment, the training unit includes:
and the training subunit is used for training the preset data processing model with the tuner in a preset optimization mode to obtain a target data processing model, wherein the preset optimization mode comprises a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
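As an illustration of the gradient-free (derivative-free) optimization mode listed above, the sketch below uses a (1+1) evolution strategy with the classic 1/5 success rule to minimise a black-box loss. An example of such a loss is a validation loss expressed as a function of continuous hyperparameters, such as a log learning rate. The choice of this particular algorithm, the step-size factors and the toy objective are assumptions for illustration. The application does not name a specific derivative-free method.

```python
import random


def one_plus_one_es(objective, x0, sigma=1.0, iterations=500, seed=0):
    """Minimise `objective` without gradients using a (1+1) evolution strategy."""
    rng = random.Random(seed)
    x = list(x0)
    fx = objective(x)
    for _ in range(iterations):
        # Propose an isotropic Gaussian perturbation of the current point.
        candidate = [xi + rng.gauss(0.0, sigma) for xi in x]
        fc = objective(candidate)
        if fc <= fx:
            # Success: accept the candidate and enlarge the step size.
            x, fx = candidate, fc
            sigma *= 1.5
        else:
            # Failure: shrink the step size. With these factors the step size is
            # stable when about 1 in 5 proposals succeeds (the 1/5 success rule).
            sigma *= 1.5 ** -0.25
    return x, fx
```

Only loss values are used, never gradients. This is what makes the approach usable when the loss is not differentiable with respect to the hyperparameters.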
In a preferred embodiment, the evaluation unit includes:
and the evaluation subunit is used for performing, with the evaluator and a preset auxiliary evaluation method, an auxiliary evaluation of the target data processing model according to the model parameters to obtain a model evaluation result, wherein the preset auxiliary evaluation method comprises a sub-sampling method, a parameter reuse method and a proxy evaluation method.
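The sub-sampling method can be illustrated with successive halving, a well-known scheme. It first scores all candidate configurations on a small random subsample of the data. It then re-scores only the better half on progressively larger samples. The application does not specify this scheme. The scheme itself, the starting fraction of 1/8 and the `evaluate_on(config, sample)` interface are assumptions of this sketch.

```python
import random


def successive_halving(configs, evaluate_on, data, min_fraction=0.125, seed=0):
    """Score configs on growing random subsamples, keeping the better half each round.

    evaluate_on(config, sample) must return a score where higher is better.
    """
    rng = random.Random(seed)
    survivors = list(configs)
    fraction = min_fraction
    while len(survivors) > 1 and fraction <= 1.0:
        # Cheap, low-fidelity evaluation on a random subsample of the data.
        size = max(1, int(len(data) * fraction))
        sample = rng.sample(data, size)
        ranked = sorted(survivors, key=lambda c: evaluate_on(c, sample), reverse=True)
        # Keep the better half and double the sample size for the next round.
        survivors = ranked[: max(1, len(ranked) // 2)]
        fraction *= 2
    return survivors[0]
```

Most candidates are discarded after being trained on only a small fraction of the data. This is where the savings over full evaluation come from.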
In a preferred embodiment, the initialization unit includes:
the determining subunit is used for determining, with the tuner and an empirical learning algorithm, the optimal model parameters corresponding to the model evaluation result;
and the initialization subunit is used for initializing the target data processing model according to the optimal model parameters.
The machine learning-based government affair big data processing device can implement the machine learning-based government affair big data processing method of the method embodiment. The alternatives described in the method embodiment also apply to this embodiment and are not repeated here. For the remaining details of this embodiment, refer to the method embodiment.
Fig. 3 is a schematic structural diagram of a computer device according to an embodiment of the present application. As shown in fig. 3, the computer device 3 of this embodiment includes: at least one processor 30 (only one shown in fig. 3), a memory 31, and a computer program 32 stored in the memory 31 and executable on the at least one processor 30, the processor 30 implementing the steps of any of the above-described method embodiments when executing the computer program 32.
The computer device 3 may be a computing device such as a smartphone, a tablet computer, a desktop computer or a cloud server. The computer device may include, but is not limited to, the processor 30 and the memory 31. Those skilled in the art will appreciate that fig. 3 is merely an example of the computer device 3 and does not limit it. The computer device 3 may include more or fewer components than shown, combine some components, or use different components, such as input/output devices and network access devices.
The processor 30 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
In some embodiments, the memory 31 may be an internal storage unit of the computer device 3, such as a hard disk or internal memory of the computer device 3. In other embodiments, the memory 31 may be an external storage device of the computer device 3 provided on the computer device 3, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card. Further, the memory 31 may include both an internal storage unit and an external storage device of the computer device 3. The memory 31 stores an operating system, application programs, a boot loader (BootLoader), data and other programs, such as the program code of the computer program. The memory 31 may also temporarily store data that has been output or is to be output.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program implements the steps in any of the method embodiments described above.
The embodiments of the present application provide a computer program product, which when executed on a computer device, enables the computer device to implement the steps in the above method embodiments.
In several embodiments provided herein, it will be understood that each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
If the functions are implemented as software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied as a software product. That product is stored in a storage medium and includes instructions that cause a computer device to perform all or some of the steps of the methods in the embodiments of the present application. The storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
The embodiments above describe the objects, technical solutions and advantages of the present application in further detail. It should be understood that they are only examples of the present application and are not intended to limit its scope. Any modifications, equivalent substitutions, improvements and the like made by those skilled in the art within the spirit and principle of the present application shall fall within the scope of the present application.

Claims (10)

1. A government affair big data processing method based on machine learning is characterized by comprising the following steps:
acquiring government affair log data;
in a preset search space, determining a preset data processing model for preprocessing the government affair log data;
performing optimization training on the preset data processing model based on a tuner technology and an evaluator technology until an optimal data processing model is obtained;
preprocessing the government affair log data by using the optimal data processing model to obtain high-quality government affair data;
and storing or visually displaying the high-quality government affair data.
2. The government affair big data processing method according to claim 1, wherein determining a preset data processing model for preprocessing the government affair log data in a preset search space includes:
in the preset search space, selecting a model file containing a default network structure and hyper-parameters according to the government affair log data;
and determining an algorithm file of an iterative algorithm according to a preset model loss expected value, wherein the preset data processing model comprises the model file and the algorithm file.
3. The government affair big data processing method according to claim 1, wherein performing optimization training on the preset data processing model based on the tuner technology and the evaluator technology until an optimal data processing model is obtained comprises:
training the preset data processing model by using a preset tuner to obtain a target data processing model, wherein the target data processing model comprises model parameters;
evaluating the target data processing model by using a preset evaluator according to the model parameters to obtain a model evaluation result;
initializing the target data processing model by using the tuner according to the model evaluation result;
and iteratively optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition to obtain the optimal data processing model.
4. The government affair big data processing method according to claim 3, wherein the training of the preset data processing model by using a preset tuner to obtain a target data processing model comprises:
and training the preset data processing model by using the tuner according to a preset optimization mode to obtain a target data processing model, wherein the preset optimization mode comprises a heuristic search mode, a derivative-free optimization mode and a reinforcement learning mode.
5. The government affair big data processing method according to claim 3, wherein the evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result comprises:
and performing, by using the evaluator and a preset auxiliary evaluation method, an auxiliary evaluation of the target data processing model according to the model parameters to obtain a model evaluation result, wherein the preset auxiliary evaluation method comprises a sub-sampling method, a parameter reuse method and a proxy evaluation method.
6. The government affair big data processing method according to claim 3, wherein initializing the target data processing model by using the tuner according to the model evaluation result comprises:
determining an optimal model parameter corresponding to the model evaluation result by using the tuner and an empirical learning algorithm;
and initializing the target data processing model according to the optimal model parameters.
7. A machine learning-based government affair big data processing device, characterized by comprising:
the acquisition module is used for acquiring government affair log data;
the determining module is used for determining a preset data processing model for preprocessing the government affair log data in a preset search space;
the training module is used for carrying out optimization training on the preset data processing model based on the tuner technology and the evaluator technology until an optimal data processing model is obtained;
the processing module is used for preprocessing the government affair log data by utilizing the optimal data processing model to obtain high-quality government affair data;
and the display module is used for storing or visually displaying the high-quality government affair data.
8. The machine learning-based government affairs big data processing device according to claim 7, wherein the training module comprises:
the training unit is used for training the preset data processing model by using a preset tuner to obtain a target data processing model, and the target data processing model comprises model parameters;
the evaluation unit is used for evaluating the target data processing model according to the model parameters by using a preset evaluator to obtain a model evaluation result;
the initialization unit is used for initializing the target data processing model according to the model evaluation result by using the tuner;
and the iteration unit is used for iteratively optimizing the initialized target data processing model based on the tuner and the evaluator until the target data processing model reaches a preset convergence condition, so as to obtain the optimal data processing model.
9. A computer device comprising a processor and a memory for storing a computer program which, when executed by the processor, implements the machine learning-based government affair big data processing method according to any one of claims 1 to 6.
10. A computer-readable storage medium characterized by storing a computer program which, when executed by a processor, implements the machine learning-based government affair big data processing method according to any one of claims 1 to 6.
CN202111358382.1A 2021-11-16 2021-11-16 Government affair big data processing method and device based on machine learning Pending CN114139720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358382.1A CN114139720A (en) 2021-11-16 2021-11-16 Government affair big data processing method and device based on machine learning

Publications (1)

Publication Number Publication Date
CN114139720A (en) 2022-03-04

Family

ID=80390341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111358382.1A Pending CN114139720A (en) 2021-11-16 2021-11-16 Government affair big data processing method and device based on machine learning

Country Status (1)

Country Link
CN (1) CN114139720A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114493379A (en) * 2022-04-08 2022-05-13 金电联行(北京)信息技术有限公司 Enterprise evaluation model automatic generation method, device and system based on government affair data

Similar Documents

Publication Publication Date Title
CN110347873B (en) Video classification method and device, electronic equipment and storage medium
CN110390396B (en) Method, device and system for estimating causal relationship between observed variables
WO2022027937A1 (en) Neural network compression method, apparatus and device, and storage medium
US20180181867A1 (en) Artificial neural network class-based pruning
US8315960B2 (en) Experience transfer for the configuration tuning of large scale computing systems
US20240054146A1 (en) Selectively identifying and recommending digital content items for synchronization
CN104217225A (en) A visual target detection and labeling method
CN111406264A (en) Neural architecture search
JP2021022367A (en) Image processing method and information processor
CN111914159A (en) Information recommendation method and terminal
CN111160959B (en) User click conversion prediction method and device
CN111047563A (en) Neural network construction method applied to medical ultrasonic image
Shyam et al. Competitive analysis of the top gradient boosting machine learning algorithms
CN117744754B (en) Large language model task processing method, device, equipment and medium
CN114139720A (en) Government affair big data processing method and device based on machine learning
CN110276081B (en) Text generation method, device and storage medium
CN114492601A (en) Resource classification model training method and device, electronic equipment and storage medium
CN113806579A (en) Text image retrieval method and device
CN116049376B (en) Method, device and system for retrieving and replying information and creating knowledge
CN111898766A (en) Ether house fuel limitation prediction method and device based on automatic machine learning
CN111126443A (en) Network representation learning method based on random walk
CN112926611B (en) Feature extraction method, device and computer readable storage medium
CN114489574B (en) SVM-based automatic optimization method for stream processing framework
JP2020009122A (en) Control program, control method and system
CN114330542A (en) Sample mining method and device based on target detection and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220310

Address after: Room 1119, building 6, Derui garden, 143 Minzu Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant after: Guangxi Zhongke Shuguang cloud computing Co.,Ltd.

Applicant after: Pingnan Zhongke Shuguang cloud computing Co., Ltd

Address before: Room 1119, building 6, Derui garden, 143 Minzu Avenue, Qingxiu District, Nanning City, Guangxi Zhuang Autonomous Region

Applicant before: Guangxi Zhongke Shuguang cloud computing Co.,Ltd.
