CN111291102A - High-performance scale statistical calculation method for government affair data mining - Google Patents

High-performance scale statistical calculation method for government affair data mining

Info

Publication number
CN111291102A
Authority
CN
China
Prior art keywords
data
analysis
layer
database
government affair
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010047244.0A
Other languages
Chinese (zh)
Inventor
金震宇
树华伟
罗玉泉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dahan Software Co Ltd
Original Assignee
Dahan Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dahan Software Co Ltd
Priority to CN202010047244.0A
Publication of CN111291102A
Withdrawn

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/26 Visual data mining; Browsing structured data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a high-performance scale statistical calculation method for government affair data mining, which belongs to the technical field of statistical calculation methods and is realized by a presentation layer, an application support layer and a data layer. The presentation layer mainly comprises a large-screen data display and data analysis reports. The application layer is mainly divided into three parts, namely data analysis, data source management and demonstration management: data analysis combines a display template with a data model to generate a data analysis page, data source management establishes the data model, and demonstration management applies different themes and demonstration modes. The application support layer provides the associated components. By adopting a visual display system oriented to mass data analysis and drawing on the characteristics of the government services sector, the invention establishes an industry data mining model in advance so that key data can be quickly extracted from mass data and displayed. Compared with a traditional data mining system, the early-stage data type analysis and modeling workload is greatly reduced, and data mining efficiency is improved by at least 10%.

Description

High-performance scale statistical calculation method for government affair data mining
Technical Field
The invention relates to a statistical calculation method, in particular to a high-performance scale statistical calculation method for government affair data mining, and belongs to the technical field of statistical calculation methods.
Background
With the continued advancement of the 'Internet + government services' initiative, consolidated (intensive) government service platforms have integrated a large number of government systems and applications, and they currently hold the largest volume of data resources with the highest value density. From the perspective of government construction, a great deal of capital and manpower has been invested in 'Internet + government services', yet the achievements and effects cannot be displayed, internal assessment cannot be quantified, and objectivity and fairness are hard to achieve. At the same time, questions such as how to operate and maintain such a large system and how to improve existing services have surfaced one after another.
For this reason, mining and making use of government data resources is urgent. It can support, give feedback to and guide government affairs and platform consolidation work, while also improving the operation and maintenance service capability of the system and uncovering problems that arise during development. From the perspective of public administration, it is of great value for promoting economic development, improving social governance, and strengthening government service and supervision capabilities.
In the field of data intelligence, application scenarios vary widely, but most statistical analysis logic is relatively simple and is often based on simple event statistics. For example, when an intelligent traffic checkpoint captures a vehicle license plate, the event information consists of the license plate number, the vehicle's place of registration, the time and the place. The structure of the event log itself is very simple. The main business pain points are the large data scale, the many statistical dimensions and the huge consumption of computing resources. In addition, whenever a business department submits a new calculation task, it must be handed back to developers for secondary development, which is inefficient.
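The simple event statistics described above can be illustrated with a minimal Python sketch. The `PlateEvent` record, its field names and the sample events are hypothetical and only mirror the four attributes named in the text; the patent does not specify a data format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PlateEvent:
    plate: str   # license plate number
    home: str    # vehicle's place of registration
    hour: int    # hour of capture
    place: str   # checkpoint location


def count_by(events, *dims):
    """Count events grouped by the given attribute names (statistical dimensions)."""
    return Counter(tuple(getattr(e, d) for d in dims) for e in events)


# Hypothetical sample events.
events = [
    PlateEvent("A123", "Nanjing", 8, "Gate-1"),
    PlateEvent("B456", "Suzhou", 8, "Gate-1"),
    PlateEvent("A123", "Nanjing", 9, "Gate-2"),
    PlateEvent("C789", "Nanjing", 9, "Gate-1"),
]

by_place = count_by(events, "place")
by_home_hour = count_by(events, "home", "hour")
```

Each additional statistical dimension multiplies the number of groups to maintain, which is one way to see why resource consumption grows quickly at scale.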
The existing approaches have the following defects and shortcomings:
the supported data sources are limited and data from multiple sources cannot be integrated uniformly; most systems collect only from a single data source, whereas a real environment has many data sources, including interfaces, databases and index libraries, so diversified requirements cannot be met;
acquisition, processing, analysis and display are technically demanding and the threshold for use is high; implementation requires professionals to write code, and the implementation period is relatively long;
the visualization effects are not rich and are difficult to adjust; data display depends on customization by developers, so adjustments cannot be made promptly and the cost is high;
the data scale is large, the statistical dimensions are many and the consumption of computing resources is huge, so a single-node computing mode cannot meet the requirements.
Disclosure of Invention
The main purpose of the invention is to solve the problems that statistical calculation methods in the prior art cannot meet diversified demands, take a long time to implement, respond slowly and are costly, by providing a high-performance scale statistical calculation method for government affair data mining.
The purpose of the invention can be achieved by adopting the following technical scheme:
a high-performance scale statistical calculation method for government affair data mining is realized by a presentation layer, an application support layer and a data layer.
Further, the presentation layer mainly comprises a large-screen data display and data analysis reports.
Furthermore, the large-screen data display can be used for conference exhibition and business monitoring by government users; the data analysis reports can be used for leadership decision-making and operational analysis.
Furthermore, the application layer is mainly divided into three parts, namely data analysis, data source management and demonstration management.
Further, data analysis combines a presentation template with a data model to generate a data analysis page; data source management mainly comprises adding and connecting target data sources and establishing the data model; demonstration management addresses users' demonstration needs in different scenarios by applying different themes and demonstration modes.
Further, the application support layer provides components such as visual templates, template tags, chart data, chart styles, dimension fields, a measurement dictionary, custom expressions and visual table association, exposes them to the application layer as callable services, and uses online analytical processing technology to perform multidimensional analysis and display of the data.
Further, the data layer supports connecting to a variety of data sources, including Excel data, CSV data, MySQL databases, Oracle databases, and Hive and HBase on Hadoop.
Furthermore, the data layer provides data interfaces to each business system by means of software interfaces, and collects and aggregates data through them.
Furthermore, the data layer collects and aggregates data by way of open databases: a database that can be connected directly is accessed directly after the corresponding database name and the schema owner of its tables are configured, while a database that can only be connected indirectly is accessed through a linked server or through openset and opendatasource.
Further, the data layer uses a direct data acquisition mode based on underlying data exchange: relying on techniques such as low-level I/O requests and network analysis, it captures the software system's underlying data exchange and the network traffic packets between the software client and the database, collects all data generated by the target system, converts and restructures the data, and outputs it to a new database for software systems to call.
The invention has the beneficial technical effects that:
the invention provides a high-performance scale statistical calculation method for government affair data mining. It adopts a distributed file storage system, a mass data ETL engine, a stream data processing engine and an analysis result visualization system, all oriented to mass data analysis. Drawing on the characteristics of the government services sector, an industry data mining model is established in advance so that key data can be quickly extracted from mass data and displayed. Compared with a traditional data mining system, the early-stage data type analysis and modeling workload is greatly reduced, and data mining efficiency is improved by at least 10%. Backed by extensive experience accumulated in the domestic government services sector, the accuracy of data mining is also ensured. Diversified demands are met, the implementation period is shorter, response is timely and cost is low.
Drawings
FIG. 1 is a schematic view of the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below in order to make the technical solutions of the present invention more clear and definite to those skilled in the art, but the embodiments of the present invention are not limited thereto.
Example:
As shown in fig. 1 and fig. 2, the high-performance scale statistical calculation method for government affair data mining provided in this embodiment is specifically implemented by a presentation layer, an application support layer and a data layer.
The presentation layer mainly comprises a large-screen data display and data analysis reports. The large-screen data display can be used for conference exhibition and business monitoring by government users; the data analysis reports can be used for leadership decision-making and operational analysis. To present the analyzed data to the user intuitively and concisely, the data must be expressed and published in some form, and query and reporting tools are generally used for this. The system provides common data visualization charts. At the bottom layer, basic components such as coordinate systems, legends, tooltips and toolboxes are built on ZRender; on top of these basic components, line charts (area charts), column charts (bar charts), scatter plots (bubble charts), pie charts (ring charts), K-line charts, maps, force-directed layout graphs and chord diagrams are constructed, and stacking in any dimension and mixed display of multiple chart types are supported.
The application layer mainly comprises three parts, namely data analysis, data source management and demonstration management. Data analysis combines a display template with a data model to generate a data analysis page; data source management mainly comprises adding and connecting target data sources and establishing the data model; demonstration management addresses users' demonstration needs in different scenarios by applying different themes and demonstration modes.
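To make the stacked and mixed chart display of the presentation layer more concrete, here is a minimal Python sketch that builds a chart configuration as JSON. It assumes an ECharts-style option object (ECharts is a charting library built on ZRender); the patent names only ZRender, so the option layout, the helper name `mixed_chart_option` and the sample figures are assumptions for illustration.

```python
import json


def mixed_chart_option(categories, bar_series, line_series):
    """Build an ECharts-style option: stacked bars with lines overlaid."""
    series = []
    for name, values in bar_series.items():
        # Bars that share the same "stack" value are drawn stacked.
        series.append({"name": name, "type": "bar", "stack": "total", "data": list(values)})
    for name, values in line_series.items():
        series.append({"name": name, "type": "line", "data": list(values)})
    return {
        "tooltip": {"trigger": "axis"},
        "legend": {"data": [s["name"] for s in series]},
        "xAxis": {"type": "category", "data": list(categories)},
        "yAxis": {"type": "value"},
        "series": series,
    }


# Hypothetical monthly service counts and a satisfaction score.
option = mixed_chart_option(
    ["Jan", "Feb", "Mar"],
    {"online": [120, 132, 101], "offline": [60, 72, 71]},
    {"satisfaction": [95, 96, 94]},
)
option_json = json.dumps(option)
```

A server-side component could hand `option_json` to the browser, where the chart library renders it; keeping the chart definition as data is what lets themes and display data be changed without code changes.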
The application support layer provides components such as visual templates, template labels, chart data, chart styles, dimension fields, measurement dictionaries, custom expressions and visual table associations, and exposes them to the application layer as callable services. It uses Online Analytical Processing (OLAP) technology to perform multidimensional analysis and display of the data, so that analysts, managers or executives can access information quickly, consistently and interactively from multiple angles; this information is converted from the raw data, is genuinely understandable to users and faithfully reflects the dimensional characteristics of the data;
OLAP analysis presupposes that a data warehouse has already been built; the complex query capability, data comparison, data extraction and reporting functions of OLAP can then be used for exploratory data analysis. After selecting the relevant data, the user can analyze it at different granularities through operations such as slicing (selecting data along two dimensions), dicing (selecting data along three dimensions), drilling up (selecting the detailed information and data view of a higher level), drilling down (expanding the detailed information of data at the same level) and rotating (obtaining data from different views), to obtain knowledge and results in different forms. Online analytical processing is implemented as ROLAP (OLAP based on a relational database) and MOLAP (OLAP based on multidimensional data organization), in order to reduce storage space and improve system performance;
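The OLAP operations listed above can be sketched on a tiny in-memory cube. This is a minimal illustration in plain Python, not the patent's engine: the dimension names, the fact rows and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Fact rows: (region, department, quarter, handled_cases). Hypothetical data.
FACTS = [
    ("East", "Tax", "Q1", 10),
    ("East", "Tax", "Q2", 12),
    ("East", "Civil", "Q1", 7),
    ("West", "Tax", "Q1", 5),
    ("West", "Civil", "Q2", 9),
]
DIMS = ("region", "department", "quarter")


def aggregate(rows, by):
    """Sum the measure over the given dimensions (fewer dimensions = coarser level)."""
    idx = [DIMS.index(d) for d in by]
    out = defaultdict(int)
    for row in rows:
        out[tuple(row[i] for i in idx)] += row[-1]
    return dict(out)


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value, leaving a two-dimensional view."""
    i = DIMS.index(dim)
    return [r for r in rows if r[i] == value]


def dice(rows, **allowed):
    """Dice: keep a sub-cube whose dimension values fall in the allowed sets."""
    return [r for r in rows
            if all(r[DIMS.index(d)] in vals for d, vals in allowed.items())]


def pivot(table):
    """Rotate: swap the two keys of a two-dimensional aggregate."""
    return {(b, a): v for (a, b), v in table.items()}


q1 = slice_(FACTS, "quarter", "Q1")
east_tax = dice(FACTS, region={"East"}, department={"Tax"})
by_region = aggregate(FACTS, ["region"])                     # drilled up to region level
by_region_dept = aggregate(FACTS, ["region", "department"])  # drilled down one level
rotated = pivot(by_region_dept)
```

In a ROLAP setting the same operations would typically be expressed as `WHERE` and `GROUP BY` clauses against the warehouse rather than as in-memory loops.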
predictive models, which can determine a specific result from the values of data items, mainly include algorithms such as market basket analysis, cluster detection, neural networks, decision trees, genetic algorithms, link analysis, case-based reasoning and rough sets, as well as various statistical models.
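As one small illustration of the algorithms named above, the following sketch computes the support of item pairs, which is the first step of market basket analysis. It is a simplified stand-in rather than the patent's model: the function name, the threshold and the sample sessions of government services are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions, min_support):
    """Return item pairs whose support (share of transactions) is >= min_support."""
    counts = Counter()
    for items in transactions:
        # Count each unordered pair once per transaction.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical sessions: services requested together in one visit.
sessions = [
    ["id_card", "residence_permit"],
    ["id_card", "residence_permit", "social_security"],
    ["id_card", "social_security"],
    ["business_license"],
]
frequent = pair_support(sessions, min_support=0.5)
```

Pairs that clear the threshold are candidates for association rules, for example suggesting a related service when a user starts an application.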
The data layer supports connecting to a variety of data sources, including Excel data, CSV data, MySQL databases, Oracle databases, and Hive and HBase on Hadoop.
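A minimal sketch of how heterogeneous sources might be read behind one interface is shown below. It uses only the Python standard library, with an in-memory SQLite database standing in for MySQL or Oracle; the `READERS` registry, the function names and the sample table are assumptions for illustration, not part of the patent.

```python
import csv
import io
import sqlite3


def read_csv_source(text):
    """Read CSV text into a list of row dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def read_sql_source(conn, query):
    """Run a query through a DB-API cursor and return row dicts."""
    cur = conn.cursor()
    cur.execute(query)
    cols = [c[0] for c in cur.description]
    return [dict(zip(cols, row)) for row in cur.fetchall()]


# Registry mapping a source type to its reader; the keys are illustrative.
READERS = {"csv": read_csv_source, "sql": read_sql_source}


def load(source_type, *args):
    """Dispatch to the reader registered for the given source type."""
    return READERS[source_type](*args)


# SQLite stands in here for a relational source such as MySQL or Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (id INTEGER, dept TEXT)")
conn.executemany("INSERT INTO permits VALUES (?, ?)", [(1, "Tax"), (2, "Civil")])

csv_rows = load("csv", "id,dept\n3,Police\n")
sql_rows = load("sql", conn, "SELECT id, dept FROM permits ORDER BY id")
```

Returning the same row-dict shape from every reader is what would let later modeling steps treat all sources uniformly.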
The data layer provides data interfaces to each business system by means of software interfaces, and collects and aggregates data through them;
the data layer collects and aggregates data by way of open databases: a database that can be connected directly is accessed directly after the corresponding database name and the schema owner of its tables are configured, while a database that can only be connected indirectly is accessed through a linked server or through openset and opendatasource;
the data layer uses a direct data acquisition mode based on underlying data exchange: relying on techniques such as low-level I/O requests and network analysis, it captures the software system's underlying data exchange and the network traffic packets between the software client and the database, collects all data generated by the target system, converts and restructures the data, and outputs it to a new database for software systems to call.
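The direct and indirect database access modes mentioned above can be sketched as query builders. This assumes Microsoft SQL Server, where linked servers are addressed by four-part names and `OPENDATASOURCE` takes a provider name and an initialization string; the patent does not name the database product, and the helper names and sample identifiers below are hypothetical.

```python
def quote_ident(name):
    """Bracket-quote a T-SQL identifier, escaping any closing bracket."""
    return "[" + name.replace("]", "]]") + "]"


def direct_query(database, schema, table):
    """Directly reachable database: address the table as database.schema.table."""
    return "SELECT * FROM " + ".".join(map(quote_ident, (database, schema, table)))


def linked_server_query(server, database, schema, table):
    """Indirect access through a linked server, using a four-part name."""
    parts = (server, database, schema, table)
    return "SELECT * FROM " + ".".join(map(quote_ident, parts))


def opendatasource_query(provider, init_string, database, schema, table):
    """Indirect ad hoc access through OPENDATASOURCE(provider, init_string)."""
    target = ".".join(map(quote_ident, (database, schema, table)))
    safe_init = init_string.replace("'", "''")  # escape quotes inside the literal
    return f"SELECT * FROM OPENDATASOURCE('{provider}', '{safe_init}').{target}"


q_direct = direct_query("gov", "dbo", "permits")
q_linked = linked_server_query("REMOTE1", "gov", "dbo", "permits")
q_adhoc = opendatasource_query(
    "MSOLEDBSQL", "Server=remote1;Trusted_Connection=yes;", "gov", "dbo", "permits"
)
```

The schema part (`dbo` here) corresponds to the schema owner that the text says must be configured together with the database name.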
As shown in fig. 2, the specific implementation steps are as follows:
1. Establishing a data source: create a new data source in the system, fill in its connection information, select among the different data source types for the data connection, and test that the connection works without problems;
2. Data modeling: set the relations between tables of the data source in the system, mainly inner joins, left joins and right joins; once the relations between tables are set, measure calculations can be performed on numeric fields, including de-duplication, sum, average, count, median, maximum and minimum, and each calculation becomes an independent measure field once saved;
3. Chart presentation: set the canvas size and background of the page, drag in chart components, and set the style and display data of each component; a dynamic chart is generated after saving, and the chart's public parameters, update period and so on can also be set;
4. Demonstration control: put the prepared charts into a demonstration control group and set demonstration effects such as full screen and carousel; the demonstration can be switched from a mobile phone through H5.
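The data modeling step among the steps above (table joins followed by measure calculations on numeric fields) can be sketched with Python's built-in `sqlite3` module. The tables, columns and values are hypothetical, and SQLite is only a stand-in for whichever data source is actually connected.

```python
import sqlite3
import statistics

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cases (case_id INTEGER, dept_id INTEGER, days REAL);
INSERT INTO dept VALUES (1, 'Tax'), (2, 'Civil'), (3, 'Police');
INSERT INTO cases VALUES (10, 1, 2.0), (11, 1, 4.0), (12, 1, 9.0), (13, 2, 3.0);
""")

# Inner join keeps only departments that have cases.
inner = conn.execute(
    "SELECT d.name, COUNT(c.case_id) FROM dept d "
    "JOIN cases c ON c.dept_id = d.dept_id GROUP BY d.name ORDER BY d.name"
).fetchall()

# Left join keeps every department, including those with no cases.
left = conn.execute(
    "SELECT d.name, COUNT(c.case_id) FROM dept d "
    "LEFT JOIN cases c ON c.dept_id = d.dept_id GROUP BY d.name ORDER BY d.name"
).fetchall()

# Measures: distinct count, sum, average, count, maximum, minimum.
measures = conn.execute(
    "SELECT COUNT(DISTINCT dept_id), SUM(days), AVG(days), COUNT(*), "
    "MAX(days), MIN(days) FROM cases"
).fetchone()

# The median is computed in Python here rather than with a database-specific function.
median_days = statistics.median(r[0] for r in conn.execute("SELECT days FROM cases"))
```

A right join is the mirror image of the left join with the table order swapped; each saved aggregate would correspond to one independent measure field in the model.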
In this embodiment, as shown in fig. 2, the steps of establishing a data source, data modeling, chart presentation and demonstration control are carried out using a distributed file storage system, a mass data ETL engine, a stream data processing engine and an analysis result visualization system, all oriented to mass data analysis; combined with the characteristics of the government services sector, an industry data mining model is established in advance so that key data can be quickly extracted from mass data and displayed.
The above description is only for the purpose of illustrating the present invention and is not intended to limit the scope of the present invention, and any person skilled in the art can substitute or change the technical solution of the present invention and its conception within the scope of the present invention.

Claims (10)

1. A high-performance scale statistical calculation method for government affair data mining is characterized by being realized by a presentation layer, an application support layer and a data layer.
2. The method of claim 1, wherein the presentation layer mainly comprises a large-screen data display and data analysis reports.
3. The method according to claim 2, wherein the large-screen data display can be used for conference exhibition and business monitoring by government users, and the data analysis reports can be used for leadership decision-making and operational analysis.
4. The method of claim 1, wherein the application layer is mainly divided into three parts, namely data analysis, data source management and demonstration management.
5. The method according to claim 4, wherein the data analysis is a combination of a presentation template and a data model to generate a data analysis page; the data source management mainly comprises the steps of adding and connecting target data sources and establishing a data model; the demonstration management is used for demonstrating requirements of users in different scenes, and different themes and demonstration modes are used.
6. The method according to claim 1, wherein the application support layer provides components such as visual templates, template labels, chart data, chart styles, dimension fields, measurement dictionaries, custom expressions and visual table association, exposes them to the application layer as callable services, and performs multidimensional analysis and presentation of the data by using online analytical processing technology.
7. The method of claim 1, wherein the data layer supports interfacing of multiple data sources, including excel data, csv data, mysql database, oracle database, hive and hbase of hadoop.
8. The method according to claim 7, wherein the data layer provides data interfaces for each service system to collect and aggregate data by means of software interfaces.
9. The method according to claim 8, wherein the data layer collects and aggregates data by way of open databases: a database that can be connected directly is accessed directly after the corresponding database name and the schema owner of its tables are configured, and a database that can only be connected indirectly is accessed through a linked server or through openset and opendatasource.
10. The method according to claim 9, wherein the data layer uses a direct data acquisition mode based on underlying data exchange: relying on techniques such as low-level I/O requests and network analysis, it captures the software system's underlying data exchange and the network traffic packets between the software client and the database, collects all data generated by the target system, converts and restructures the data, and outputs it to a new database for software systems to call.
CN202010047244.0A 2020-01-16 2020-01-16 High-performance scale statistical calculation method for government affair data mining Withdrawn CN111291102A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010047244.0A CN111291102A (en) | 2020-01-16 | 2020-01-16 | High-performance scale statistical calculation method for government affair data mining

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202010047244.0A CN111291102A (en) | 2020-01-16 | 2020-01-16 | High-performance scale statistical calculation method for government affair data mining

Publications (1)

Publication Number | Publication Date
CN111291102A (en) | 2020-06-16

Family ID=71017060

Family Applications (1)

Application Number | Status | Title
CN202010047244.0A CN111291102A (en) | Withdrawn | High-performance scale statistical calculation method for government affair data mining

Country Status (1)

Country | Link
CN (1) | CN111291102A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200616
