CN111291102A

CN111291102A - High-performance scale statistical calculation method for government affair data mining

Info

Publication number: CN111291102A
Application number: CN202010047244.0A
Authority: CN
Inventors: 金震宇; 树华伟; 罗玉泉
Original assignee: Dahan Software Co Ltd
Current assignee: Dahan Software Co Ltd
Priority date: 2020-01-16
Filing date: 2020-01-16
Publication date: 2020-06-16

Abstract

The invention discloses a high-performance scale statistical calculation method for government affair data mining, which belongs to the technical field of statistical calculation methods and is implemented by a presentation layer, an application support layer and a data layer. The presentation layer mainly comprises a large-screen data display and data analysis reports. The application layer is mainly divided into three parts: data analysis, data source management and demonstration management. Data analysis combines a display template with a data model to generate a data analysis page, the data model is established, and different themes and demonstration modes are used. The application support layer provides association components. By adopting a visual display system oriented to mass data analysis and combining the characteristics of the government affairs service industry, the invention establishes an industry data mining model in advance so that key data can be quickly extracted from mass data and displayed. Compared with a traditional data mining system, the early-stage data type analysis and modeling workload is greatly reduced, and data mining efficiency is improved by at least 10%.

Description

High-performance scale statistical calculation method for government affair data mining

Technical Field

The invention relates to a statistical calculation method, in particular to a high-performance scale statistical calculation method for government affair data mining, and belongs to the technical field of statistical calculation methods.

Background

With the continued advancement of the 'Internet + government services' initiative, intensive government service platforms have integrated a large number of government systems and applications, and these platforms currently hold the largest volume of data resources with the highest value density. From the perspective of government construction, 'Internet + government services' has consumed a large amount of capital and manpower, yet the results and effects achieved cannot be displayed, internal assessment cannot be quantified, and objectivity and fairness are difficult to achieve. Meanwhile, problems such as how to operate and maintain a huge system and how to improve existing services have surfaced one after another.

For this reason, mining and using government affairs data resources is urgent. Doing so can support, give feedback on and guide government affairs and intensification work, and at the same time improve the operation and maintenance service capability of the system and uncover problems that arise during development. From the perspective of public management, it is of great value for promoting economic development, improving social governance, and improving government service and supervision capabilities.

In the field of data intelligence, although application scenarios vary, most statistical analysis logic is relatively simple and is often based on simple event statistics. For example, when an intelligent traffic checkpoint captures a vehicle license plate, the event information consists of the license plate number, the vehicle's place of registration, the time and the location. The structure of the event log itself is very simple. The main business pain points are the large data scale, the many statistical dimensions and the huge consumption of computing resources. Meanwhile, whenever a business department submits a new calculation task, developers must carry out secondary development again, so efficiency is low.

The following defects and shortcomings currently exist:

supported data sources are limited, and data from multiple data sources cannot be integrated uniformly; most solutions collect from a single data source only, whereas a real environment contains many kinds of data sources, such as interfaces, databases and index libraries, so diversified requirements cannot be met;

collection, processing, analysis and display are technically demanding and have a high barrier to use; they must be implemented by professionals who write code, and the implementation period is relatively long;

the visualization effects are not rich and are difficult to adjust; data display depends on customization by developers, so adjustments cannot be made promptly when needed, and the cost is high;

the data scale is large, the statistical dimensions are numerous and the consumption of computing resources is huge, so a single-node computing mode cannot meet the requirement.

Disclosure of Invention

The main purpose of the invention is to solve the problems that statistical calculation methods in the prior art cannot meet diversified demands, have long implementation periods, respond slowly and are costly, and to provide a high-performance scale statistical calculation method for government affair data mining.

The purpose of the invention can be achieved by the following technical solution:

a high-performance scale statistical calculation method for government affair data mining is implemented by a presentation layer, an application support layer and a data layer.

Further, the presentation layer mainly comprises a large-screen data display and data analysis reports.

Furthermore, the large-screen data display can be used for conference presentations and business monitoring by government affairs users; the data analysis reports can be used for leadership decision-making and operational analysis.

Furthermore, the application layer is mainly divided into three parts, namely data analysis, data source management and demonstration management.

Further, the data analysis combines a presentation template with a data model to generate a data analysis page; the data source management mainly comprises adding and connecting target data sources and establishing a data model; the demonstration management serves the demonstration requirements of users in different scenarios by applying different themes and demonstration modes.

Further, the application support layer provides components including visual templates, template tags, chart data, chart styles, dimension fields, a measurement dictionary, user-defined expressions and visual table association, exposes these components to the application layer as callable services, and performs multidimensional analysis and display of data using online analytical processing technology.

Further, the data layer supports connection to a variety of data sources, including Excel data, CSV data, MySQL databases, Oracle databases, and Hadoop's Hive and HBase.

Furthermore, the data layer provides a data interface to each business system by means of software interfaces, through which data is collected and aggregated.

Furthermore, the data layer collects and aggregates data by means of open database access: for a database that can be reached directly, the schema owner of the corresponding database name and tables is configured and the database is accessed directly; a database that can only be reached indirectly is accessed through a linked server or through openset and opendatasource.

Further, the data layer uses a direct data acquisition mode based on underlying data exchange: relying on techniques such as low-level I/O requests and network analysis, it captures the underlying data exchange of a software system and the network traffic packets between the software client and the database, collects all data generated by the target system, converts and restructures the data, and outputs it to a new database for software systems to call.

The beneficial technical effects of the invention are as follows:

the invention provides a high-performance scale statistical calculation method for government affair data mining. The method adopts a distributed file storage system, a mass data ETL engine, a stream data processing engine and an analysis result visual display system, all oriented to mass data analysis. Combined with the characteristics of the government affairs service industry, an industry data mining model is established in advance so that key data can be quickly extracted from mass data and displayed. Compared with a traditional data mining system, the early-stage data type analysis and modeling workload is greatly reduced, and data mining efficiency is improved by at least 10%. Drawing on extensive accumulated experience in the domestic government affairs service industry, the accuracy of data mining is also ensured; diversified demands are met, the implementation period is shorter, response is timely and the cost is low.

Drawings

FIG. 1 is a schematic view of the present invention;

FIG. 2 is a flow chart of an embodiment of the present invention.

Detailed Description

The present invention is described in further detail below so that those skilled in the art can understand its technical solutions more clearly, but the embodiments of the present invention are not limited thereto.

Example:

As shown in fig. 1 and fig. 2, the high-performance scale statistical calculation method for government affair data mining provided in this embodiment comprises a presentation layer, an application support layer and a data layer. The presentation layer mainly comprises a large-screen data display and data analysis reports; the large-screen data display can be used for conference presentations and business monitoring by government affairs users, and the data analysis reports can be used for leadership decision-making and operational analysis.

To present analyzed data to the user intuitively and concisely, the data must be expressed and published in some form, generally with query and reporting tools. The system provides common data visualization charts. At the bottom layer, basic components such as coordinate systems, legends, tooltips and toolboxes are built on ZRender; on top of these basic components are built line charts (area charts), column charts (bar charts), scatter charts (bubble charts), pie charts (ring charts), K-line charts, maps, force-directed layout graphs and chord diagrams, and stacking and mixed presentation of multiple charts in any dimension are supported.

The application layer mainly comprises three parts, namely data analysis, data source management and demonstration management. The data analysis combines a display template with a data model to generate a data analysis page; the data source management mainly comprises adding and connecting target data sources and establishing a data model; the demonstration management serves the demonstration requirements of users in different scenarios by applying different themes and demonstration modes.
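The stacking and mixed-chart presentation mentioned for the presentation layer can be illustrated with a minimal sketch. It assumes an ECharts-style option object (ECharts is a charting library built on ZRender; the source names only ZRender, so the choice of option format is an assumption). The function name `stacked_mixed_option` and the sample data are hypothetical.

```python
import json


def stacked_mixed_option(categories, stacked_series, line_series):
    """Build a chart option mixing stacked bars with line series.

    The keys follow the ECharts option layout (an assumption; the
    source only states that the charts are built on ZRender).
    """
    # Bars sharing the same "stack" value are drawn on top of each other.
    series = [
        {"name": name, "type": "bar", "stack": "total", "data": list(values)}
        for name, values in stacked_series.items()
    ]
    # Line series are overlaid on the same axes, giving a mixed chart.
    series += [
        {"name": name, "type": "line", "data": list(values)}
        for name, values in line_series.items()
    ]
    return {
        "tooltip": {"trigger": "axis"},
        "legend": {"data": [s["name"] for s in series]},
        "toolbox": {"feature": {"saveAsImage": {}}},
        "xAxis": {"type": "category", "data": list(categories)},
        "yAxis": {"type": "value"},
        "series": series,
    }


# Hypothetical sample: cases handled per quarter by two departments,
# with a satisfaction score drawn as a line.
option = stacked_mixed_option(
    ["Q1", "Q2", "Q3"],
    {"Tax": [120, 150, 130], "Permits": [80, 90, 70]},
    {"Satisfaction": [91, 93, 95]},
)
option_json = json.dumps(option)  # what a front end would receive
```

Keeping the option as plain data means the same structure can be stored with a chart component and re-rendered when its update period elapses.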

The application support layer provides components such as visual templates, template tags, chart data, chart styles, dimension fields, a measurement dictionary, custom expressions and visual table association, and exposes them to the application layer as callable services. It uses Online Analytical Processing (OLAP) technology to analyze and display data multidimensionally, so that analysts, managers and executives can access information quickly, consistently and interactively from multiple angles; this information is converted from the raw data, can be genuinely understood by users, and truly reflects the dimensional characteristics of the data.

OLAP analysis presupposes that a data warehouse has already been built. The complex query capability, data comparison, data extraction and reporting functions of OLAP can then be used for exploratory data analysis. After selecting the relevant data, the user can analyze it at different granularities through operations such as slicing (selecting data along two dimensions), dicing (selecting data along three dimensions), drilling up (selecting the detail and view of data at a higher level), drilling down (expanding the detailed information of data at the same level) and rotating (obtaining data from different views), and thereby obtain knowledge and results in different forms. Online analytical processing is implemented on both ROLAP (OLAP based on a relational database) and MOLAP (OLAP based on a multidimensional data organization) in order to reduce storage space and improve system performance.
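The OLAP operations named above can be sketched over a small in-memory fact table. This is an illustrative sketch only, not the engine the method uses: the fact rows, the dimension names and the helper functions `slice_`, `dice`, `roll_up` and `pivot` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, department, year, handled cases)
FACTS = [
    ("East", "Tax", 2018, 120),
    ("East", "Tax", 2019, 150),
    ("East", "Permits", 2019, 80),
    ("West", "Tax", 2019, 60),
    ("West", "Permits", 2018, 40),
    ("West", "Permits", 2019, 90),
]
FIELDS = ("region", "department", "year", "cases")
records = [dict(zip(FIELDS, row)) for row in FACTS]


def slice_(rows, dim, value):
    """Slice: fix one dimension to a single value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, **allowed):
    """Dice: keep rows whose dimensions fall inside the given value sets."""
    return [r for r in rows if all(r[d] in vals for d, vals in allowed.items())]


def roll_up(rows, keep_dims):
    """Aggregate the measure over every dimension not listed in keep_dims.

    Fewer kept dimensions is a drill up; more is a drill down.
    """
    totals = defaultdict(int)
    for r in rows:
        totals[tuple(r[d] for d in keep_dims)] += r["cases"]
    return dict(totals)


def pivot(rows, row_dim, col_dim):
    """Rotate: lay the summed measure out as row_dim by col_dim."""
    table = defaultdict(lambda: defaultdict(int))
    for r in rows:
        table[r[row_dim]][r[col_dim]] += r["cases"]
    return {k: dict(v) for k, v in table.items()}


by_region = roll_up(records, ["region"])                # drilled-up view
by_region_year = roll_up(records, ["region", "year"])   # drilled-down view
east_only = slice_(records, "region", "East")
tax_2019 = dice(records, department={"Tax"}, year={2019})
region_by_year = pivot(records, "region", "year")
```

Here `by_region` gives the coarse totals, and `by_region_year` shows the same measure one level finer, which is the drill-down counterpart of the same query.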

Predictive models, which can determine a specific result from the values of data items, mainly include algorithms such as market basket analysis, cluster detection, neural networks, decision trees, genetic algorithms, link analysis, case-based reasoning and rough sets, as well as various statistical models.
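As one illustration of the algorithms listed, the counting step at the core of market basket analysis can be sketched as follows. The source does not describe how any of these models are implemented, so the function `pair_support`, the sample transactions and the support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_support(transactions, min_support):
    """Return item pairs whose support (share of transactions that
    contain both items) is at least min_support."""
    total = len(transactions)
    counts = Counter()
    for items in transactions:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {
        pair: count / total
        for pair, count in counts.items()
        if count / total >= min_support
    }


# Hypothetical sample: service items handled together in one visit.
visits = [
    ["id_card", "residence_permit"],
    ["id_card", "residence_permit", "social_security"],
    ["social_security", "tax_receipt"],
    ["id_card", "social_security"],
]
frequent = pair_support(visits, min_support=0.5)
```

Pairs that clear the threshold are candidates for services that could be bundled or presented together.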

The data layer supports connection to a variety of data sources, including Excel data, CSV data, MySQL databases, Oracle databases, and Hadoop's Hive and HBase.
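One way to picture this multi-source connection is a common row-iterator interface that each source type implements. The sketch below is an assumption about structure, not the method's actual design: the names `DataSource`, `CsvSource`, `SqlSource` and `collect` are hypothetical, and an in-memory SQLite database stands in for MySQL or Oracle so that the example runs with the standard library alone.

```python
import csv
import io
import sqlite3
from typing import Dict, Iterator, Protocol


class DataSource(Protocol):
    """Anything that can yield rows as dictionaries."""

    def rows(self) -> Iterator[Dict[str, object]]: ...


class CsvSource:
    def __init__(self, text: str) -> None:
        self._text = text

    def rows(self) -> Iterator[Dict[str, object]]:
        # DictReader uses the first line as column names;
        # all values arrive as strings.
        yield from csv.DictReader(io.StringIO(self._text))


class SqlSource:
    def __init__(self, conn: sqlite3.Connection, query: str) -> None:
        self._conn = conn
        self._query = query

    def rows(self) -> Iterator[Dict[str, object]]:
        cursor = self._conn.execute(self._query)
        columns = [col[0] for col in cursor.description]
        for record in cursor:
            yield dict(zip(columns, record))


def collect(sources: Dict[str, DataSource]) -> Iterator[Dict[str, object]]:
    """Merge rows from every source, tagging each with its origin."""
    for name, source in sources.items():
        for row in source.rows():
            yield {"source": name, **row}


# SQLite stands in for a relational source such as MySQL or Oracle.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE permits (dept TEXT, cases INTEGER)")
conn.executemany("INSERT INTO permits VALUES (?, ?)", [("Permits", 80), ("Tax", 60)])

sources = {
    "csv_upload": CsvSource("dept,cases\nTax,120\n"),
    "permit_db": SqlSource(conn, "SELECT dept, cases FROM permits ORDER BY dept"),
}
merged = list(collect(sources))
```

A new source type only needs a `rows()` method to take part in `collect`, which is the property that lets heterogeneous sources be integrated uniformly.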

The data layer provides a data interface to each business system by means of software interfaces, through which data is collected and aggregated.

The data layer collects and aggregates data by means of open database access: for a database that can be reached directly, the schema owner of the corresponding database name and tables is configured and the database is accessed directly; a database that can only be reached indirectly is accessed through a linked server or through openset and opendatasource.

The data layer uses a direct data acquisition mode based on underlying data exchange: relying on techniques such as low-level I/O requests and network analysis, it captures the underlying data exchange of a software system and the network traffic packets between the software client and the database, collects all data generated by the target system, converts and restructures the data, and outputs it to a new database for software systems to call.

As shown in fig. 2, the specific implementation steps are as follows:

1. Establishing a data source: create a new data source in the system, fill in its connection information, select the appropriate data source type for the connection, and test that connectivity is working;

2. Data modeling: set the relationships between the tables of the data source in the system, mainly inner joins, left joins and right joins. Once the relationships between tables are set, measures can be calculated on numeric fields, including de-duplication, sum, average, count, median, maximum and minimum; after being saved, each becomes an independent measure field;

3. Chart presentation: set the canvas size and background of the page, drag in chart components, and set each component's style and display data; after saving, a dynamic chart is generated, and the chart's common parameters, update period and so on can be set;

4. Demonstration control: place the prepared charts into a demonstration control group, set demonstration effects such as full screen and carousel, and switch the demonstration from a mobile phone via H5.
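Steps 1 and 2 above can be sketched with the standard library. This is a minimal illustration under stated assumptions: an in-memory SQLite database stands in for the configured data source, and the tables `dept` and `item`, their columns and the sample values are hypothetical.

```python
import sqlite3
import statistics

# Step 1: establish the data source and test connectivity.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE item (dept_id INTEGER, days INTEGER);
    INSERT INTO dept VALUES (1, 'Tax'), (2, 'Permits'), (3, 'Archives');
    INSERT INTO item VALUES (1, 3), (1, 5), (2, 5), (2, 9);
    """
)
connected = conn.execute("SELECT 1").fetchone() == (1,)

# Step 2a: table relationships. An inner join keeps only matched rows;
# a left join also keeps departments with no items (days is NULL).
# A right join is the same left join with the table order swapped.
inner = conn.execute(
    "SELECT d.name, i.days FROM dept d "
    "INNER JOIN item i ON i.dept_id = d.id ORDER BY d.id, i.days"
).fetchall()
left = conn.execute(
    "SELECT d.name, i.days FROM dept d "
    "LEFT JOIN item i ON i.dept_id = d.id ORDER BY d.id, i.days"
).fetchall()

# Step 2b: measures on a numeric field (processing days).
days = [d for _, d in inner]
measures = {
    "count": len(days),
    "distinct": len(set(days)),
    "sum": sum(days),
    "average": statistics.mean(days),
    "median": statistics.median(days),
    "max": max(days),
    "min": min(days),
}
```

Each entry in `measures` corresponds to one of the measure calculations listed in step 2; once saved under a name, it could be offered to chart components as an independent measure field.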

In this embodiment, as shown in fig. 2, the steps of establishing a data source, data modeling, chart presentation and demonstration control are carried out using a distributed file storage system, a mass data ETL engine, a stream data processing engine and an analysis result visualization system, all oriented to mass data analysis. Combined with the characteristics of the government affairs service industry, an industry data mining model is established in advance so that key data can be quickly extracted from mass data and presented.

The above description merely illustrates the present invention and is not intended to limit its scope; a person skilled in the art can make substitutions or changes to the technical solution of the present invention and its concept within the scope of the present invention.

Claims

1. A high-performance scale statistical calculation method for government affair data mining, characterized in that it is implemented by a presentation layer, an application support layer and a data layer.

2. The method according to claim 1, wherein the presentation layer mainly comprises a large-screen data display and data analysis reports.

3. The method according to claim 2, wherein the large-screen data display can be used for conference presentations and business monitoring by government affairs users; the data analysis reports can be used for leadership decision-making and operational analysis.

4. The method according to claim 1, wherein the application layer is mainly divided into three parts, namely data analysis, data source management and demonstration management.

5. The method according to claim 4, wherein the data analysis combines a presentation template with a data model to generate a data analysis page; the data source management mainly comprises adding and connecting target data sources and establishing a data model; the demonstration management serves the demonstration requirements of users in different scenarios by applying different themes and demonstration modes.

6. The method according to claim 1, wherein the application support layer provides components including visual templates, template tags, chart data, chart styles, dimension fields, a measurement dictionary, custom expressions and visual table association, exposes these components to the application layer as callable services, and performs multidimensional analysis and presentation of the data using online analytical processing technology.

7. The method according to claim 1, wherein the data layer supports connection to a variety of data sources, including Excel data, CSV data, MySQL databases, Oracle databases, and Hadoop's Hive and HBase.

8. The method according to claim 7, wherein the data layer provides a data interface to each business system by means of software interfaces, through which data is collected and aggregated.

9. The method according to claim 8, wherein the data layer collects and aggregates data by means of open database access: for a database that can be reached directly, the schema owner of the corresponding database name and tables is configured and the database is accessed directly; a database that can only be reached indirectly is accessed through a linked server or through openset and opendatasource.

10. The method according to claim 9, wherein the data layer uses a direct data acquisition mode based on underlying data exchange: relying on techniques such as low-level I/O requests and network analysis, it captures the underlying data exchange of a software system and the network traffic packets between the software client and the database, collects all data generated by the target system, converts and restructures the data, and outputs it to a new database for software systems to call.