CN116932612A - Basic society governs intelligent data processing system - Google Patents

Info

Publication number
CN116932612A
Authority
CN
China
Prior art keywords
data
analysis
content
file
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310950113.7A
Other languages
Chinese (zh)
Other versions
CN116932612B (en)
Inventor
赵路生
鲍传扬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Yuansheng Pixel Technology Co ltd
Original Assignee
Hangzhou Yuansheng Pixel Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Yuansheng Pixel Technology Co ltd
Priority to CN202310950113.7A
Publication of CN116932612A
Application granted
Publication of CN116932612B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Optimization (AREA)
  • Tourism & Hospitality (AREA)
  • Computational Mathematics (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Economics (AREA)
  • Quality & Reliability (AREA)
  • Algebra (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Educational Administration (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Development Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an intelligent data processing system for primary-level social governance. The system provides primary-level governments or organizations with several data acquisition modes, which together form its data acquisition module. The acquired data is then parsed algorithmically according to its format type and business type, which forms the data analysis module. The system can integrate multi-modal, heterogeneous source data into domain knowledge. Its advantages include automatic and semi-automatic file update collection, incremental updating, a path from data asset processing to knowledge asset processing, mining of associations among data, broader and deeper queries, construction of a data reasoning system, discovery of additional value from data associations, extraction of a common base from multiple application scenarios, multi-level application construction, and multi-directional scenarios built on the base applications.

Description

Basic society governs intelligent data processing system
Technical Field
The invention relates to the technical field of data systems, and in particular to an intelligent data processing system for primary-level social governance.
Background
At present, primary-level governments and organizations conduct their business mainly with conventional office software such as Excel, Word and PPT, or report and query business data across different information systems, apps and mini-programs. Completing a single task requires a large amount of manpower, material and financial resources across the data management chain: querying, collecting, filling in, sorting and summarizing data. Each new task falls into the same costly data management chain.
The main defects of this data management flow are as follows:
1. Files are numerous and cumbersome to consult:
In the daily work of a primary-level government or organization, many business lines are involved, many file documents are received, and file documents of many types are created as required. These file documents are scattered among different staff members, work devices and application systems at the primary level. Because of personnel changes, the passage of time, equipment replacement, aging application systems and similar causes, finding a file takes a great deal of time. Because the content is locked in each file's native document format, files must be opened one by one to be checked, which makes the whole process tedious. In addition, circulating file documents is relatively difficult, and there is a risk of losing files when they are consulted.
2. Heterogeneous data is difficult to integrate:
In the daily work of a primary-level government or organization, and through the long-term accumulation of government informatization, multiple data sources, multi-modal heterogeneous data and multiple data types have formed. Such data is difficult to integrate, to fuse directly, or to share. This heterogeneity poses significant challenges for data integration, management and use, and leads to data isolation and information islands.
3. Data definition standards are inconsistent:
In the daily work of a primary-level government or organization, no unified data standard has been formed, and data reported through systems or collected by the organization itself is filled in arbitrarily or according to simple templates. The resulting data is voluminous, messy, of poor quality and lacking in standards and norms, so it is difficult to use directly, to share, or to reuse later.
4. The data processing cycle is long:
In the daily work of a primary-level government or organization, most data processing in the past has been filled in and imported manually before calculation. Even where a corresponding information system exists, much of the data is merely recorded and is difficult to extract and use again, and a great deal of effort must be spent on data dictionary lookup, cleaning and processing. Once the business changes, the scripts that processed the data may no longer be able to extract it. The error rate of the processed data is also high, so that great effort may yield only a pile of dirty data.
5. Data is scattered and lacks a system:
In the daily work of a primary-level government or organization, even where data is managed and processed to some extent, the many business lines produce a large amount of data content, and the content of each business line is organized on its own and is difficult to coordinate with the others. In the course of primary-level informatization, the separate systems of different business lines aggravate this dispersion, ending in the dilemma of searching for data in many places and using data from many sources.
6. The conversion value of data is low:
After data has been collected and integrated, primary-level staff still use it in inefficient ways, so that data either cannot be found or is time-consuming and laborious to find. Because the data is inconvenient to call and use, the time and labor spent earlier on collection and integration yield little value.
In view of the foregoing, it is desirable to design an intelligent data processing system for primary-level social governance that can solve the above problems.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an intelligent data processing system for primary-level social governance: a system that, based on a corpus accumulated in the field of primary-level social governance, can quickly integrate and process data sources such as the documents of primary-level governments or organizations and turn them into systematic, standardized data elements of direct value.
The invention is realized by the following technical scheme.
The invention relates to an intelligent data processing system for primary-level social governance, which comprises a data acquisition module, used for providing the system with data acquisition capabilities in different modes for primary-level governments or organizations;
the data analysis module is used for algorithmically parsing the acquired data according to its format type and business type;
the data design module is used for standardizing the data generated by parsing by means of an accumulated domain corpus;
the data service module is used for providing data services on the systematized, summarized graph structure data;
and the data application module is used for providing data support for upper-layer file document applications by using the corresponding data services.
Further, the data acquisition module provides four acquisition modes:
Mode one: a browser RPA (Robotic Process Automation) plug-in replaces repetitive copy-and-paste actions to acquire data, and pushes the acquired data to the data analysis module for parsing.
Mode two: through the provided web cloud disk, desktop cloud disk and mobile cloud disk app, local files can be uploaded to the cloud disk file center quickly, in batches and on a schedule; the cloud disk file center pushes the content of the file documents to the data analysis module for parsing, either automatically or manually.
Mode three: files or data delivered or distributed by the superior level of the primary-level government or organization are received through the provided data receiving API, and are then pushed to the data analysis module for parsing.
Mode four: data from other systems of the primary-level government or organization, or its existing data, is received synchronously through the provided database data receiving mode, and is pushed to the data analysis module for parsing.
Further, the data analysis module has three major classes of parsing modes:
First major class: after the data format and content are identified, the data is parsed as the structured type;
Second major class: after the data format and content are identified, the data is parsed as the semi-structured type;
Third major class: after the data format and content are identified, the data is parsed as the unstructured type.
Further, the parsing and construction flow for Excel-related data in the first major class (structured type data) is as follows:
The parsing flow is as follows:
S01, identifying the data, starting parsing, and selecting manual parsing or automatic parsing;
S02, judging whether the data is in Excel format (including csv);
S03, if it is not in Excel format (including csv), using another parsing mode and ending this flow;
S04, if it is in Excel format (including csv), further judging whether the Excel file is a multi-sheet file;
S05, if it is a multi-sheet file, splitting it by sheet into a plurality of single sheets and proceeding to the next stage;
S06, if it is not a multi-sheet file, proceeding to the next stage;
S07, for each single sheet, whether split from a multi-sheet file or not, judging whether the sheet contains multiple tables;
S08, if a single sheet contains multiple tables, splitting it at the blank areas into a plurality of single tables and proceeding to the next stage;
S09, if it does not contain multiple tables, proceeding to the next stage;
S10, performing simple cleaning on the single tables, whether obtained by splitting a multi-table sheet at blanks or from a sheet that held only one table;
S11, splitting merged cells;
S12, supplementing and completing the cell content;
S13, judging whether the title area and content area division recognition algorithm can be applied;
S14, if it cannot be applied, outputting the result that the data cannot be parsed for the time being, and ending;
S15, if it can be applied, applying the title area and content area division recognition algorithm to obtain the parsed content;
S16, generating job graph structure data from the parsed content, and ending.
Further, the steps of the title area and content area division recognition algorithm are as follows:
S01, performing simple cleaning on the data in the Excel table cells, where cleaning involves removing formatting, punctuation and stop words, filling empty cells with a manually defined special placeholder character, and representing the content area of the Excel table as a content matrix;
S02, segmenting the data in the Excel table cells into words using HanLP;
S03, using Word2vec to compute a word vector for each word in the Excel table cells, then computing the mean of the word vectors of the words in each cell, and using that mean as the overall word vector of the cell;
S04, computing the cosine similarity between the word vector of each cell in a row of the Excel table and the word vector of the corresponding cell in the next row, and then summing these cell-level cosine similarity values to obtain the total cosine similarity between the two rows;
S05, computing and comparing the cosine similarity values between all pairs of adjacent rows, where the two rows with the smallest inter-row cosine similarity mark the dividing position between the title rows and the data rows;
S06, after dividing the title rows from the data rows, if the title occupies multiple rows, merging the title rows into a single row of title data by a merging rule; the multi-row merge proceeds column by column from top to bottom, combining the non-repeated titles into a new single-row title.
Further, the data design module, following graph structure data construction theory, presents the graph structure data elements produced by the data analysis module as job graph structure data; different job graph structure data are fused and summarized by a data fusion algorithm and a data summarization algorithm to form systematic summary graph structure data;
the graph structure data is divided into two layers, one of which is a concept ontology layer, which mainly carries out concept definition and system classification;
the other layer is a data layer, which mainly holds the specific data instances corresponding to the concepts.
The data design module first makes the fusion or summarization judgment for the parsed data on the concept layer, and then processes the data layer based on the result of that judgment.
Further, the data service module, building on the preceding modules, provides data services externally on the summary graph structure data formed from the constructed job graph structure data, and offers different modes of opening up the data.
Further, the data application module provides three types of applications:
first kind: a personal knowledge service center;
second kind: a full element search center;
third kind: intelligent form data center.
The invention has the following beneficial effects: the system provides multiple data acquisition modes, is convenient for user interaction and easy to operate, and offers several product forms to choose from as required. It can integrate multi-modal, heterogeneous source data into domain knowledge, and provides automatic and semi-automatic file update collection, incremental updating, a path from data asset processing to knowledge asset processing, mining of associations among data, broader and deeper queries, and construction of a data reasoning system. It can discover more value from data associations, extract a common base from multiple application scenarios, build multi-level applications, and form multi-directional scenarios on top of the base applications.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are used in the description of the embodiments or the prior art will be briefly described, it being obvious that the drawings in the description below are only some embodiments of the invention, and that other drawings can be obtained from these drawings without inventive effort for a person skilled in the art.
FIG. 1 shows the parsing and construction flow for Excel-related data among the structured data of the present invention;
FIG. 2 is an example of a job graph structure data fusion rule of the present invention.
Detailed Description
The invention is described in detail below with reference to FIGS. 1-2.
As shown in FIGS. 1-2, an intelligent data processing system for primary-level social governance comprises a data acquisition module, used for providing the system with data acquisition capabilities in different modes for primary-level governments or organizations;
the data analysis module is used for algorithmically parsing the acquired data according to its format type and business type;
the data design module is used for standardizing the data generated by parsing by means of an accumulated domain corpus;
the data service module is used for providing data services on the systematized, summarized graph structure data;
and the data application module is used for providing data support for upper-layer file document applications by using the corresponding data services.
The data acquisition module of the system is constructed by providing primary-level governments or organizations with data acquisition capabilities in different modes. The data analysis module is constructed by algorithmically parsing the acquired data according to its format type and business type. The data generated by parsing is then standardized by means of the accumulated domain corpus. Based on graph structure data construction theory, job graph structure data is constructed, and different job graph structure data are fused and summarized by a data fusion algorithm and a data summarization algorithm to form systematic summary graph structure data; the data design module is constructed in this way. The data service module is constructed by providing data services such as visualization, retrieval, recommendation, question answering and reasoning on the summary graph structure data. The data application module is constructed by using the corresponding data services to provide data support for applications such as deep search of upper-layer file document content, full-element search, intelligent form distribution and filling, and AI digital humans.
The data acquisition module provides four acquisition modes:
Mode one: a browser RPA (Robotic Process Automation) plug-in replaces repetitive copy-and-paste actions to acquire data, and pushes the acquired data to the data analysis module for parsing.
Mode two: through the provided web cloud disk, desktop cloud disk and mobile cloud disk app, local files can be uploaded to the cloud disk file center quickly, in batches and on a schedule; the cloud disk file center pushes the content of the file documents to the data analysis module for parsing, either automatically or manually.
Mode three: files or data delivered or distributed by the superior level of the primary-level government or organization are received through the provided data receiving API, and are then pushed to the data analysis module for parsing.
Mode four: data from other systems of the primary-level government or organization, or its existing data, is received synchronously through the provided database data receiving mode, and is pushed to the data analysis module for parsing.
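The four modes share one hand-off: whatever is acquired is pushed to the data analysis module. A minimal sketch of that hand-off follows, assuming a common envelope and an in-memory queue; the mode labels, `AcquiredItem`, `analysis_queue` and `push_for_analysis` are hypothetical names chosen for illustration and do not come from the patent.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Any

# Hypothetical labels for the four acquisition modes described in the text.
MODES = ("rpa_plugin", "cloud_disk", "receive_api", "database_sync")


@dataclass
class AcquiredItem:
    mode: str      # which acquisition mode produced the item
    payload: Any   # raw file bytes, database rows, or scraped text


# Stand-in for the hand-off to the data analysis module (an assumption).
analysis_queue: "Queue[AcquiredItem]" = Queue()


def push_for_analysis(mode: str, payload: Any) -> AcquiredItem:
    """Wrap acquired data in a common envelope and queue it for parsing."""
    if mode not in MODES:
        raise ValueError(f"unknown acquisition mode: {mode}")
    item = AcquiredItem(mode, payload)
    analysis_queue.put(item)
    return item
```

Keeping the envelope identical across modes means the parsing side does not need to know how an item arrived.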
The data analysis module has three major classes of parsing modes:
First major class: after the data format and content are identified and the result is the structured type, graph structure data elements are identified and constructed from the structured data by configuring a mapping mode and setting mapping rules. For example, Excel table data with clearly separated header rows and data rows, and relational database data, are typical structured type data.
Second major class: after the data format and content are identified and the result is the semi-structured type, graph structure data elements are extracted and constructed from the data by designing extractors and wrappers. For example, Word tables, XML files and JSON files are typical semi-structured type data.
Third major class: after the data format and content are identified and the result is the unstructured type, graph structure data elements are extracted and constructed from the data by designed algorithms. For example, ordinary Word documents, web pages, pictures, audio and video are all unstructured type data.
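As a rough illustration of this three-way routing, the sketch below classifies a file by extension only. This is a simplification and an assumption: the text classifies by format and content together (a Word table is semi-structured while an ordinary Word document is unstructured, and an Excel file counts as structured only when its header and data rows are clear), so a real classifier would also inspect the content. The extension sets and the `classify` function are hypothetical.

```python
from pathlib import Path

# Hypothetical extension sets, following the examples named in the text.
STRUCTURED = {".xlsx", ".xls", ".csv"}
SEMI_STRUCTURED = {".xml", ".json"}


def classify(filename: str) -> str:
    """Return the parsing class for a file, judged by extension alone."""
    ext = Path(filename).suffix.lower()
    if ext in STRUCTURED:
        return "structured"        # handled by mapping rules
    if ext in SEMI_STRUCTURED:
        return "semi-structured"   # handled by extractors and wrappers
    return "unstructured"          # handled by designed algorithms
```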
The parsing and construction flow for Excel-related data in the first major class (structured type data) is as follows:
As shown in FIG. 1, the parsing flow is as follows:
S01, identifying the data, starting parsing, and selecting manual parsing or automatic parsing;
S02, judging whether the data is in Excel format (including csv);
S03, if it is not in Excel format (including csv), using another parsing mode and ending this flow;
S04, if it is in Excel format (including csv), further judging whether the Excel file is a multi-sheet file;
S05, if it is a multi-sheet file, splitting it by sheet into a plurality of single sheets and proceeding to the next stage;
S06, if it is not a multi-sheet file, proceeding to the next stage;
S07, for each single sheet, whether split from a multi-sheet file or not, judging whether the sheet contains multiple tables;
S08, if a single sheet contains multiple tables, splitting it at the blank areas into a plurality of single tables and proceeding to the next stage;
S09, if it does not contain multiple tables, proceeding to the next stage;
S10, performing simple cleaning on the single tables, whether obtained by splitting a multi-table sheet at blanks or from a sheet that held only one table;
S11, splitting merged cells;
S12, supplementing and completing the cell content;
S13, judging whether the title area and content area division recognition algorithm can be applied;
S14, if it cannot be applied, outputting the result that the data cannot be parsed for the time being, and ending;
S15, if it can be applied, applying the title area and content area division recognition algorithm to obtain the parsed content;
S16, generating job graph structure data from the parsed content, and ending.
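Steps S08 and S11-S12 can be sketched on a sheet held as a list of rows. The sketch makes two assumptions the text does not state: tables within a sheet are separated by fully blank rows (the text says only "according to blanks"), and completing cell content means copying the value from the cell above, as happens when vertically merged cells are split. The function names are hypothetical.

```python
from typing import List, Optional

Row = List[Optional[str]]


def is_blank(row: Row) -> bool:
    """A row is blank when every cell is None or whitespace."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def split_by_blank_rows(sheet: List[Row]) -> List[List[Row]]:
    """S08: split one sheet into single tables at blank rows."""
    tables: List[List[Row]] = []
    current: List[Row] = []
    for row in sheet:
        if is_blank(row):
            if current:
                tables.append(current)
                current = []
        else:
            current.append(row)
    if current:
        tables.append(current)
    return tables


def fill_down(table: List[Row]) -> List[Row]:
    """S11-S12: fill empty cells with the value from the cell above."""
    filled: List[Row] = []
    for row in table:
        new_row = list(row)
        if filled:
            above = filled[-1]
            for j, cell in enumerate(new_row):
                if (cell is None or cell == "") and j < len(above):
                    new_row[j] = above[j]
        filled.append(new_row)
    return filled
```

In practice the rows would come from a spreadsheet reader; the functions themselves depend only on plain lists.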
The steps of the title area and content area division recognition algorithm are as follows:
S01, performing simple cleaning on the data in the Excel table cells, where cleaning involves removing formatting, punctuation and stop words, filling empty cells with a manually defined special placeholder character, and representing the content area of the Excel table as a content matrix;
S02, segmenting the data in the Excel table cells into words using HanLP.
$W_{mn} = \{W_1, W_2, \ldots, W\}$
S03, using Word2vec to compute a word vector for each word in the Excel table cells, then computing the mean of the word vectors of the words in each cell, and using that mean as the overall word vector of the cell.
S04, computing the cosine similarity between the word vector of each cell in a row of the Excel table and the word vector of the corresponding cell in the next row, and then summing these cell-level cosine similarity values to obtain the total cosine similarity between the two rows.
$\mathrm{Column}_m = \{V_{m1}, V_{m2}, \ldots, V_{mn}\}$
S05, computing and comparing the cosine similarity values between all pairs of adjacent rows; the two rows with the smallest inter-row cosine similarity mark the dividing position between the title rows and the data rows.
S06, after dividing the title rows from the data rows, if the title occupies multiple rows, merging the title rows into a single row of title data by a merging rule. The multi-row merge proceeds column by column from top to bottom, combining the non-repeated titles into a new single-row title.
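Steps S03 to S06 can be sketched in plain Python as below. This is an illustration under stated assumptions, not the patented implementation: word segmentation (HanLP) and word vectors (Word2vec) are replaced by a caller-supplied `lookup` dictionary of toy vectors, cells in adjacent rows are compared column by column, and merged titles are joined with an underscore. All function names and the separator are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 0.0 if na == 0 or nb == 0 else dot / (na * nb)


def cell_vector(words: List[str], lookup: Dict[str, Vector], dim: int) -> List[float]:
    """S03: mean of the word vectors of the words in one cell."""
    vecs = [lookup[w] for w in words if w in lookup]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def row_similarity(row_a: List[Vector], row_b: List[Vector]) -> float:
    """S04: sum of cell-to-cell cosine similarities between two rows."""
    return sum(cosine(a, b) for a, b in zip(row_a, row_b))


def header_boundary(rows: List[List[Vector]]) -> int:
    """S05: index i where rows[:i + 1] are title rows and the rest are data."""
    sims = [row_similarity(rows[i], rows[i + 1]) for i in range(len(rows) - 1)]
    return min(range(len(sims)), key=sims.__getitem__)


def merge_headers(header_rows: List[List[str]], sep: str = "_") -> List[str]:
    """S06: merge title rows top to bottom, keeping non-repeated titles."""
    merged = []
    for column in zip(*header_rows):
        parts: List[str] = []
        for title in column:
            if title and title not in parts:
                parts.append(title)
        merged.append(sep.join(parts))
    return merged
```

For example, two title rows `["Resident", "Resident"]` and `["Name", "Age"]` merge into `["Resident_Name", "Resident_Age"]`.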
The data design module, following graph structure data construction theory, presents the graph structure data elements produced by the data analysis module as job graph structure data; different job graph structure data are fused and summarized by a data fusion algorithm and a data summarization algorithm to form systematic summary graph structure data;
the graph structure data is divided into two layers, one of which is a concept ontology layer, which mainly carries out concept definition and system classification;
the other layer is a data layer, which mainly holds the specific data instances corresponding to the concepts.
The data design module first makes the fusion or summarization judgment for the parsed data on the concept layer, and then processes the data layer based on the result of that judgment.
While graph structure data elements are being constructed into graph structure data, the accumulated domain corpus is used to standardize, process, map and correct the data, forming job graph structure data with relatively complete logic. Different job graph structure data can be fused into new job graph structure data when the fusion conditions are met. Job graph structure data can also be summarized into the summary graph structure data, with the concept ontology layer and the data layer each summarized according to specific algorithm rules. Finally, templates of the graph structure concept layer are built on the basis of the job graph, which facilitates subsequent sharing and reuse.
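One way to picture the two layers is the data structure below: concept categories with attributes and relationship edges on one layer, and data instances that must conform to a category on the other. The class and field names are hypothetical; the patent does not specify a storage model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ConceptCategory:
    """Concept ontology layer: a category, its attributes and relation edges."""
    name: str
    attributes: List[str] = field(default_factory=list)
    relations: Dict[str, str] = field(default_factory=dict)  # edge name -> target category


@dataclass
class Instance:
    """Data layer: one concrete record belonging to a concept category."""
    category: str
    values: Dict[str, str]


@dataclass
class JobGraph:
    concepts: Dict[str, ConceptCategory] = field(default_factory=dict)
    instances: List[Instance] = field(default_factory=list)

    def add_instance(self, category: str, values: Dict[str, str]) -> Instance:
        """Add a record, rejecting attributes the concept layer does not define."""
        concept = self.concepts[category]  # KeyError if the category is undefined
        unknown = set(values) - set(concept.attributes)
        if unknown:
            raise ValueError(f"undefined attributes for {category}: {sorted(unknown)}")
        instance = Instance(category, values)
        self.instances.append(instance)
        return instance
```

Checking instances against the concept layer reflects the order given in the text: decisions are made on the concept layer first, and the data layer follows.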
The job graph structure data fusion rules are as follows:
1. Graph structure data concept layer fusion
The fusion algorithm computes the similarity between concept categories on the concept layer of the graph structure data; when the similarity exceeds a certain threshold, the two concept categories are judged to be the same category. Once two categories are judged fusible, the similarity between their respective attributes and relationship edges is computed in the same way. Attributes and relationship edges whose similarity exceeds a certain threshold are fused; the rest are not fused, and both sides are retained. Concept categories judged not fusible are left unfused.
2. Graph structure data layer fusion
For concept categories that can be fused, the corresponding data layers are also fused, according to the configured positioning attribute.
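The concept layer fusion rule can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the source does not name a similarity measure or a threshold value, so `difflib.SequenceMatcher` and the threshold `0.8` are stand-in assumptions, and the `Concept` class and function names are hypothetical. Relationship edges are omitted; they would be handled the same way as attributes.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Concept:
    """A concept category on the concept layer (hypothetical structure)."""
    name: str
    attributes: set[str] = field(default_factory=set)


def similarity(a: str, b: str) -> float:
    # Stand-in measure (assumption): character-level ratio from difflib.
    return SequenceMatcher(None, a, b).ratio()


def fuse_attributes(left: set[str], right: set[str], threshold: float) -> set[str]:
    """Fuse attributes that reach the threshold; retain the others side by side."""
    fused = set(left)
    for attr in right:
        if not any(similarity(attr, kept) >= threshold for kept in left):
            fused.add(attr)  # no close match: kept as a separate attribute
    return fused


def fuse_concept_layers(graph_a, graph_b, threshold=0.8):
    """Return fused concepts for category pairs judged to be the same.

    Categories without a sufficiently similar partner are not fused
    and therefore do not appear in the result.
    """
    fused = []
    for ca in graph_a:
        for cb in graph_b:
            if similarity(ca.name, cb.name) >= threshold:
                merged = fuse_attributes(ca.attributes, cb.attributes, threshold)
                fused.append(Concept(ca.name, merged))
    return fused
```

In practice the threshold and the similarity measure would be tuned per domain; a vector-based measure could replace the string ratio without changing the control flow.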
The rules for summarizing job graph structure data into summary graph structure data are as follows:
1. Graph structure data concept layer summarization
The summarization algorithm computes the similarity between concept categories on the concept layer of the graph structure data; when the similarity exceeds a certain threshold, the two concept categories are judged to be the same category. Once two categories are judged mergeable, the similarity between their respective attributes and relationship edges is computed in the same way. Attributes and relationship edges whose similarity exceeds a certain threshold are merged; the rest are not merged, and both sides are retained. A concept category that cannot be merged is added as a new category on the concept layer of the summary graph structure data, together with its attributes and relationship edges.
2. Graph structure data layer summarization
For concept categories that can be merged, the corresponding data layers are also merged, according to the configured positioning attribute. For concept categories that cannot be merged, the corresponding data layer is added directly to the data layer of the summary graph structure data.
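The difference between summarization and fusion is that unmatched categories are added to the summary graph rather than skipped. The sketch below illustrates this under several assumptions: each graph is a plain dictionary mapping a concept name to a list of records, the "positioning attribute" is treated as a key that identifies matching records, and the similarity measure and threshold are the same stand-ins as before. The function name `summarize` and its parameters are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Stand-in measure (assumption); the source does not name one.
    return SequenceMatcher(None, a, b).ratio()


def summarize(summary, job, key_attr, threshold=0.8):
    """Fold a job graph into a summary graph, in place.

    Both graphs map a concept name to a list of records (dicts).
    key_attr plays the role of the configured positioning attribute.
    """
    for concept, records in job.items():
        match = next(
            (c for c in summary if similarity(c, concept) >= threshold), None
        )
        if match is None:
            # Cannot be merged: add the category and its data as new entries.
            summary[concept] = [dict(r) for r in records]
            continue
        # Can be merged: combine records that share the positioning attribute.
        by_key = {r[key_attr]: r for r in summary[match]}
        for record in records:
            target = by_key.get(record[key_attr])
            if target is None:
                target = dict(record)
                summary[match].append(target)
                by_key[record[key_attr]] = target
            else:
                target.update(record)
    return summary
```

A call such as `summarize(summary, job, key_attr="id")` merges records with the same `id` under a matched category and appends everything else.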
An example of the flow structure among the job graph structure data, the summary graph structure data, and the template graph structure data is shown in fig. 2.
After the content in the data source is accessed, a job graph (e.g. job graph 1, job graph 2, job graph 4) corresponding to the content is generated according to the data source type and the data extraction analysis mode. The job graphs can be fused, and a new job graph can be generated according to a fusion rule (for example, the job graph 1 and the job graph 2 generate a fused job graph 3). The job graphs may be summarized into summary graphs (e.g., job graph 3 into summary graph 1, job graph 4 into summary graph 2) according to summary rules.
The data service module builds on the preceding modules: it aggregates the structured job graph data, provides data services externally, and offers different ways of opening up the data.
By aggregating the structured job graph data and graph structure data, the module provides external data services such as data output, calculation, search, question answering, prediction, and reasoning. It also offers different ways of opening up the data: for example, controlled data services are provided through an API, and data services with high trust requirements are provided by opening the database directly.
The data application module currently provides three main types of application:
Type 1: a personal knowledge service center;
Type 2: a full-element search center;
Type 3: an intelligent form data center.
Type 1: the personal knowledge service center performs deep search over the file data processed by the data integration module, the data analysis module, and the data design module. Searchable content includes any word, the file name, subject words, the file abstract, and similar information. The results can be refined by a secondary search, which includes keyword search, search within results, and exclusion within results. Retrieved files can be previewed online and downloaded.
Type 2: the full-element search center searches the summary graph structure data processed by the data integration module, the data analysis module, and the data design module. Searchable content includes concept classifications, any word, concept names, concept labels, concept abstracts, and similar information. The results can be refined by a secondary search, which includes keyword search, search within results, and exclusion within results. For retrieved content, the detailed information and the full-element relationships can be viewed.
Type 3: the intelligent form data center uses the summary graph structure data processed by the data integration module, the data analysis module, and the data design module to create forms automatically and to fill in the relevant data according to the fields in each form. The department collecting the form then sends the form content to the relevant departments or designated respondents for completion, and the completed data is consolidated automatically.
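The secondary search described for the first two application types (search within results and exclusion within results) can be illustrated with a small sketch. The record layout and the field names `file_name`, `subject`, `abstract`, and `text` are hypothetical, and plain substring matching stands in for whatever retrieval engine the system actually uses.

```python
SEARCH_FIELDS = ("file_name", "subject", "abstract", "text")  # assumed field names


def search(documents, term):
    """Return the documents in which any searched field contains the term."""
    term = term.lower()
    return [
        doc for doc in documents
        if any(term in doc.get(name, "").lower() for name in SEARCH_FIELDS)
    ]


def search_within(results, term):
    """Secondary search: keep only earlier results that also match the term."""
    return search(results, term)


def exclude_within(results, term):
    """Secondary search: drop earlier results that match the term."""
    hits = search(results, term)
    return [doc for doc in results if doc not in hits]
```

Because both refinements take a result list and return a result list, they can be chained in any order on the output of the first search.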
The above embodiments are only for illustrating the technical concept and features of the present invention, and are intended to enable those skilled in the art to understand the present invention and implement it without limiting the scope of the present invention. All equivalent changes or modifications made in accordance with the spirit of the present invention should be construed to be included in the scope of the present invention.

Claims (8)

1. An intelligent data processing system for basic-level social governance, characterized in that:
the system comprises a data acquisition module, which provides the system's data acquisition capability through different modes suited to a grassroots government or organization;
a data analysis module, which applies parsing algorithms to the acquired data according to its format type and business type;
a data design module, which normalizes the data produced by parsing using the accumulated domain corpus;
a data service module, which provides data services over the structured summary graph structure data obtained after summarization;
and a data application module, which uses the corresponding data services to provide data support for upper-layer file and document applications.
2. The base-level social governance intelligent data processing system of claim 1, wherein: the data acquisition module provides data acquisition capability in four modes:
mode one: a browser RPA (Robotic Process Automation) plug-in acquires data in place of repetitive manual copying and pasting, and pushes the acquired data to the data analysis module for parsing;
mode two: through the provided web cloud disk, desktop cloud disk, and mobile cloud disk app, local files can be uploaded quickly, in batches, and on a schedule to a cloud disk file center, which pushes the file content to the data analysis module for parsing, either automatically or manually;
mode three: a provided data receiving API receives the files or data issued or distributed by the superior level of the grassroots government or organization, and the received files or data are then pushed to the data analysis module for parsing;
mode four: a provided database data receiving mode synchronously receives data from other systems of the grassroots government, or existing data of the organization, and the received data is pushed to the data analysis module for parsing.
3. The base-level social governance intelligent data processing system of claim 1, wherein: the data analysis module has three major categories of parsing mode:
first category: after the data format and content are identified, the data is parsed into a structured type;
second category: after the data format and content are identified, the data is parsed into a semi-structured type;
third category: after the data format and content are identified, the data is parsed into an unstructured type.
4. The intelligent data processing system for basic social governance of claim 4, wherein: the parsing and construction flow for Excel-related data in the first category (structured type) is as follows:
S01, identify the data, start parsing, and select manual or automatic parsing;
S02, determine whether the format is Excel (including csv);
S03, if the format is not Excel (including csv), use another parsing mode and end this flow;
S04, if the format is Excel (including csv), further determine whether the Excel file is a multi-sheet file;
S05, if it is a multi-sheet file, split it into multiple single tables by sheet and proceed to the next stage;
S06, if it is not a multi-sheet file, proceed to the next stage;
S07, for both the single tables split from a multi-sheet file and the data of files that are not multi-sheet, determine whether a single sheet contains multiple tables;
S08, if a single sheet contains multiple tables, split it into multiple single tables at the blanks and proceed to the next stage;
S09, if a single sheet does not contain multiple tables, proceed to the next stage;
S10, perform simple cleaning on the data files, both those split into multiple single tables at the blanks and those that did not need splitting;
S11, split the table cells;
S12, fill in and complete the cell content;
S13, determine whether the title area and content area division recognition algorithm can be applied;
S14, if it cannot be applied, output the result "cannot be parsed for now" and end;
S15, if it can be applied, apply the title area and content area division recognition algorithm to obtain the parsed content;
S16, generate the job graph structure data from the parsed content, and end.
5. The intelligent data processing system for basic social governance of claim 4, wherein: the steps of the title area and content area division recognition algorithm are:
S01, perform simple cleaning on the data in the Excel table cells, which involves removing formatting, punctuation, and stop words, filling empty cells with a manually defined special placeholder character, and representing the content area of the Excel table as a content matrix;
S02, segment the data in the Excel table cells into words using HanLP;
S03, use Word2vec to compute a word vector for each word in the Excel table cells, then compute the mean of the word vectors of the words in each cell and take that mean as the overall word vector of the cell;
S04, compute the cosine similarity between the word vector of each cell in a row of the Excel table and the word vector of the corresponding cell in the next row, then sum these cell-level cosine similarity values over the row to obtain the total cosine similarity between the two rows;
S05, compute and compare the total cosine similarity values between all adjacent rows; the two rows with the smallest inter-row cosine similarity mark the division between the header rows and the data rows;
S06, after the header rows and the data rows are divided, if the header spans multiple rows, merge the header rows into a single row of header data by a merging rule; the multi-row merge works column by column from top to bottom, combining the non-repeated header values to form a new single-row header.
6. The base-level social governance intelligent data processing system of claim 1, wherein: the data design module applies graph structure data construction theory to present the graph structure data elements produced by the data analysis module as job graph structure data, and different job graph structure data are fused and summarized by a data fusion algorithm and a data summarization algorithm to form systematic summary graph structure data;
the graph structure data is divided into two layers: one is the concept ontology layer, which handles concept definition and system classification;
the other is the data layer, which holds the corresponding concrete data instances;
the data design module first makes the fusion or summarization judgment on the concept layer of the parsed data, and then processes the data layer according to the result of that judgment.
7. The base-level social governance intelligent data processing system of claim 1, wherein: the data service module builds on the preceding modules, aggregates the structured job graph data, provides data services externally, and offers different ways of opening up the data.
8. The base-level social governance intelligent data processing system of claim 1, wherein: the data application module provides three types of application:
type 1: a personal knowledge service center;
type 2: a full-element search center;
type 3: an intelligent form data center.
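As an illustration of steps S03 to S06 of the title area and content area division recognition algorithm in claim 5, the following minimal sketch locates the boundary between header rows and data rows and then merges a multi-row header. It is a sketch under stated assumptions, not the patented implementation: the tokenizer and the word-vector table are passed in as parameters (the claim names HanLP and Word2vec, which are replaced here by caller-supplied stand-ins), and the `"_"` separator and all function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return sum(x * y for x, y in zip(u, v)) / norm if norm else 0.0


def cell_vector(cell, tokenize, word_vectors, dim):
    """Mean of the word vectors of the tokens in one cell (step S03)."""
    vectors = [word_vectors[t] for t in tokenize(cell) if t in word_vectors]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def find_header_boundary(rows, tokenize, word_vectors, dim):
    """Return i such that rows[:i + 1] are header rows (needs >= 2 rows)."""
    matrix = [
        [cell_vector(c, tokenize, word_vectors, dim) for c in row] for row in rows
    ]
    totals = [
        sum(cosine(a, b) for a, b in zip(upper, lower))  # step S04
        for upper, lower in zip(matrix, matrix[1:])
    ]
    return min(range(len(totals)), key=totals.__getitem__)  # step S05


def merge_header_rows(header_rows):
    """Collapse header rows column by column, skipping repeats (step S06)."""
    merged = []
    for column in zip(*header_rows):
        parts = []
        for cell in column:
            if cell and cell not in parts:
                parts.append(cell)
        merged.append("_".join(parts))  # separator is an assumption
    return merged
```

With real data, `tokenize` would wrap a word segmenter and `word_vectors` would come from a trained embedding model; the boundary logic itself does not depend on either choice.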
CN202310950113.7A 2023-07-31 2023-07-31 Basic society governs intelligent data processing system Active CN116932612B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310950113.7A CN116932612B (en) 2023-07-31 2023-07-31 Basic society governs intelligent data processing system

Publications (2)

Publication Number Publication Date
CN116932612A true CN116932612A (en) 2023-10-24
CN116932612B CN116932612B (en) 2024-05-10

Family

ID=88389665

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310950113.7A Active CN116932612B (en) 2023-07-31 2023-07-31 Basic society governs intelligent data processing system

Country Status (1)

Country Link
CN (1) CN116932612B (en)

Also Published As

Publication number Publication date
CN116932612B (en) 2024-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant