CN112347092B - Method, device and computer equipment for generating data analysis billboard - Google Patents

Method, device and computer equipment for generating data analysis billboard

Info

Publication number
CN112347092B
Authority
CN
China
Prior art keywords
data
index
dimension
database
business
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011224649.3A
Other languages
Chinese (zh)
Other versions
CN112347092A (en
Inventor
张健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN202011224649.3A priority Critical patent/CN112347092B/en
Publication of CN112347092A publication Critical patent/CN112347092A/en
Application granted granted Critical
Publication of CN112347092B publication Critical patent/CN112347092B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21Design, administration or maintenance of databases
    • G06F16/215Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2457Query processing with adaptation to user needs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to big data technology and discloses a method for generating a data analysis billboard. The method comprises: determining, according to the service indexes selected by the current user, the databases to which the data corresponding to each service index belong; calling a data synchronization script to acquire the data corresponding to each service index from those databases and form a collection source database; cleaning the data in the collection source database according to their correlation coefficient with each service index to obtain a target database; acquiring, from a configuration library, the index dimensions corresponding to the service indexes selected by the current user; and forming, from the data of the target database, a billboard graph corresponding to the index dimensions. Related data are thus gathered from different source databases by a purpose-built data synchronization script, cleaned according to their correlation with the service indexes to obtain target data, and rendered by dimension item as a billboard graph for the index dimensions selected by the user, so that the dimension data are displayed intuitively.

Description

Method, device and computer equipment for generating data analysis billboard
Technical Field
The present application relates to the field of big data, and in particular, to a method, an apparatus, and a computer device for generating a data analysis billboard.
Background
Human resources departments still lack complete access to the index data that HR cares about, such as manpower distribution, cadre configuration distribution, institution performance distribution, training budget distribution, salary level distribution, staff training distribution and staff turnover rate. Such figures can only be estimated from experience and therefore cannot guide the decisions of senior leaders. For indexes that require historical information, statistics cannot be produced at all when the volume of related data is large. The statistics cycle is also time-consuming and slow to respond, occupies considerable manpower, and yields inaccurate results.
Disclosure of Invention
The main aim of the application is to solve the technical problem that data in the field of human resources cannot be flexibly aggregated and displayed on demand.
The application provides a method for generating a data analysis billboard, which comprises the following steps:
determining databases to which data respectively corresponding to the service indexes belong according to the service indexes selected by the current user;
calling a data synchronization script, and acquiring data corresponding to each business index from each database to form a collection source database;
cleaning the data in the collection source database according to the correlation coefficient with each business index to obtain a target database;
acquiring index dimensions corresponding to the business indexes selected by the current user in a configuration library;
and forming a billboard graph corresponding to the index dimension according to the data of the target database.
Preferably, the step of cleaning the data in the collection source database according to the correlation coefficient with each business index to obtain a target database includes:
acquiring data corresponding to a specified index, wherein the specified index is any one of the service indexes;
performing a sorting calculation on the data corresponding to the specified index through a sorting calculation component to obtain the correlation coefficient between the data and the specified index;
forming a data queue corresponding to the specified index from high to low according to the correlation coefficient;
storing the data with the correlation coefficient larger than a preset threshold value in the data queue into a target database to form target data corresponding to the specified index;
and generating target data corresponding to each business index respectively according to the generation process of the target data corresponding to the specified index, and obtaining the target database after the collection source database is cleaned.
Preferably, the step of sorting the data related to the specified index by a sorting calculation component to obtain a correlation coefficient between the data and the specified index includes:
extracting index feature vectors of the specified indexes and data feature vectors respectively corresponding to data related to the specified indexes;
calculating cosine distances between the index feature vector and each data feature vector according to a specified calculation formula, wherein the specified calculation formula is $\cos\theta = \dfrac{m \cdot n_i}{\lVert m \rVert \, \lVert n_i \rVert}$, $i = 1, \dots, P$, where m represents the index feature vector, n represents the data feature vectors, P represents the total number of data feature vectors and is a positive integer, θ represents the vector angle between the index feature vector and a data feature vector, i represents the serial number of a data feature vector, and $n_i$ represents the i-th data feature vector;
and taking the cosine distance as a correlation coefficient between data and the specified index.
Preferably, after the step of generating target data corresponding to each service index according to the generation process of the target data corresponding to the specified index, and obtaining the target database after the collection source database is cleaned, the method includes:
pushing target data corresponding to each business index to a corresponding memory grid node;
invoking in parallel a pre-configured computing component in each memory grid node, and enabling multithreading to analyze the target data corresponding to each business index by dimension item, so as to obtain the dimension data corresponding to each dimension item;
and summarizing dimension data corresponding to the business indexes returned by the memory grid nodes to obtain a dimension database.
Preferably, the step of forming a billboard graph corresponding to the index dimension according to the data of the target database includes:
enabling multithreading to process data of each index dimension in the dimension database to obtain attribute values corresponding to each index dimension respectively;
acquiring configuration attributes of corresponding timing tasks on a timing scheduling platform at the current moment, wherein the configuration attributes comprise a designated chart mode for displaying attribute values corresponding to each index dimension respectively;
and forming, according to the configuration attributes of the timing tasks, a billboard graph in the specified chart mode from the attribute values corresponding to the index dimensions.
Preferably, after the step of enabling multithreading to process the data of each index dimension in the dimension database to obtain the attribute value corresponding to each index dimension, the method includes:
judging whether a comprehensive index corresponding to the index dimensions exists, wherein the comprehensive index is obtained by combining a specified number of index dimensions;
if so, inputting the attribute values corresponding to those index dimensions into a calculation component for comprehensive calculation to obtain the comprehensive attribute value corresponding to the comprehensive index;
and forming a billboard graph from the comprehensive attribute value in the specified chart mode.
Preferably, before the step of obtaining the index dimension corresponding to the service index selected by the current user in the configuration library, the method includes:
acquiring the service indexes currently selected by the user and the set weights respectively corresponding to the service indexes;
inputting the currently selected business index into a configuration library;
and associating, in the configuration library, each business index one-to-one with its set weight to form configured index dimensions, and storing the configured index dimensions in the configuration library.
The application also provides a device for generating the data analysis billboard, which comprises:
the determining module is used for determining databases to which data corresponding to the business indexes respectively belong according to the business indexes selected by the current user;
the calling module is used for calling the data synchronization script, acquiring data corresponding to each service index from each database, and forming a collection source database;
the cleaning module is used for cleaning the data in the collection source database according to the correlation coefficient with each business index to obtain a target database;
the first acquisition module is used for acquiring index dimensions corresponding to the business indexes selected by the current user in the configuration library;
and the forming module is used for forming a billboard graph corresponding to the index dimension according to the data of the target database.
The present application also provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the above method when executing the computer program.
The present application also provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above-described method.
In the present application, related data are acquired and gathered from different source databases by developing a data synchronization script, the data are then cleaned according to their correlation with the service indexes to obtain target data, and a billboard graph of the target data by dimension item is formed for the index dimensions selected by the user, so that the dimension data are displayed visually.
Drawings
FIG. 1 is a flow chart of a method for generating a data analysis billboard according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a device for generating a data analysis billboard according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an internal structure of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
Referring to fig. 1, the method for generating a data analysis billboard of this embodiment includes:
S1: determining databases to which data respectively corresponding to the service indexes belong according to the service indexes selected by the current user;
S2: calling a data synchronization script, and acquiring data corresponding to each business index from each database to form a collection source database;
S3: cleaning the data in the collection source database according to the correlation coefficient with each business index to obtain a target database;
S4: acquiring index dimensions corresponding to the business indexes selected by the current user in a configuration library;
S5: and forming a billboard graph corresponding to the index dimension according to the data of the target database.
In this embodiment, the service indexes are determined by the data subject required by the business. Data related to the service indexes are extracted from different databases through data mining and gathered into a collection source database, so that the data can be analyzed centrally. During data analysis, an efficient queue is formed according to the correlation coefficient between the data and the service indexes, and a target database is obtained through a data filtering program and a data loading program. Data corresponding to the specified dimensions of the service indexes are then extracted from the target database in a memory grid according to dynamic rules to form dimension data, and the dimension data are displayed visually through a chart function to form a billboard that the business can use directly. The embodiments take the aggregation of human resource data as an example: staff assessment data of an organization require support from the personnel support management system; training budget data require support from the training budget system and from finance-related data; training course data require data from the Zhiniao ("bird knowing") system; recruitment data require data from the human resource system; and so on. Because the data sources differ, the synchronization modes differ as well, and the data scattered across these systems are integrated by developing Sqoop scripts and Kettle scripts. The Sqoop script obtains the required raw data from the Hive database, with related raw data obtained through a Spark program. The Kettle script is mainly used to synchronize the required data from relational databases: the script is developed and then run by a Java program to obtain the data corresponding to the service indexes.
Personnel information data and personnel structure data required for human resource data analysis are synchronized from the associated systems into the personnel management system through a Kettle script; the underlying data of indexes such as the recruitment index, financial index and course index are synchronized from the associated systems into the personnel management system through a Hadoop script. Together these form the collection source database, which gathers the scattered data and makes centralized analysis convenient.
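As a minimal sketch of step S1 and the choice of synchronization script in step S2, the mapping from a selected service index to its source system can be held in a small registry. The index keys, system names and the tool assigned to each system below are illustrative assumptions and are not taken from the application; only the idea that different sources use different synchronization scripts comes from the text.

```python
from collections import defaultdict

# Hypothetical registry: service index -> (source system, synchronization tool).
# All keys, system names and tool assignments are assumptions for illustration.
INDEX_SOURCES = {
    "staff_assessment": ("personnel_management_system", "kettle"),
    "training_budget": ("training_budget_system", "kettle"),
    "training_course": ("training_course_system", "sqoop"),
    "recruitment": ("human_resource_system", "sqoop"),
}


def plan_sync(selected_indexes):
    """Group the selected service indexes by the source system holding their data."""
    plan = defaultdict(list)
    for index in selected_indexes:
        if index not in INDEX_SOURCES:
            raise KeyError(f"no source database registered for index {index!r}")
        plan[INDEX_SOURCES[index]].append(index)
    # One entry per (system, tool) pair, i.e. one synchronization job per source.
    return dict(plan)
```

Calling `plan_sync(["recruitment", "training_budget"])` yields one job per source system, which a scheduler could then hand to the matching synchronization script.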
The present application develops a data processing center, which cleans the synchronized data to generate the target database. Data cleaning is carried out according to the data subject required by the customized business, and the data subject is determined by the selected service indexes. The business data required by the current data subject are monitored, correlation analysis is performed on the acquired business data, and the data that meet the specified correlation requirement are written into the target database.
The data in the target database are further processed according to the current data analysis dimensions to obtain a dimension database, and the dimension data are displayed as an intuitive billboard in the display mode chosen on the dynamic configuration page. Different business requirements correspond to different service indexes, and different index dimensions change the displayed billboard graph, so that varying business requirements can be met. By developing the data synchronization script, the application acquires related data from different source databases and gathers the scattered data; the data are then cleaned according to their correlation with the service indexes to obtain target data, and a billboard graph of the dimension data is formed for the index dimensions selected by the user. This displays the dimension data visually and makes data analysis and data display more convenient, intuitive and timely.
Further, step S3 of cleaning the data in the collection source database according to the correlation coefficient with each business index to obtain a target database, includes:
S31: acquiring data corresponding to a specified index, wherein the specified index is any one of the service indexes;
S32: performing a sorting calculation on the data corresponding to the specified index through a sorting calculation component to obtain the correlation coefficient between the data and the specified index;
S33: forming a data queue corresponding to the specified index from high to low according to the correlation coefficient;
S34: storing the data with the correlation coefficient larger than a preset threshold value in the data queue into a target database to form target data corresponding to the specified index;
S35: and generating target data corresponding to each business index respectively according to the generation process of the target data corresponding to the specified index, and obtaining the target database after the collection source database is cleaned.
In this embodiment, a data queue is formed according to the correlation coefficient between the data and the service index, and the data are cleaned according to a preset cleaning condition: data that do not meet the correlation threshold are discarded, and data whose correlation coefficient is larger than the preset threshold are stored in the target database for subsequent business data analysis. With this efficient data queue, cleaning of the tens of millions of records in the collection source database is completed within 1 hour to obtain the target database.
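Steps S33 to S35 can be sketched as a sort followed by a threshold filter. This is an illustrative sketch only: the function names, the `(record, coefficient)` pair layout and the default threshold of 0.5 are assumptions, since the application does not specify data structures or a threshold value.

```python
def clean_index_data(scored_records, threshold):
    """Clean the data of one specified index.

    scored_records: iterable of (record, correlation_coefficient) pairs.
    Returns the records whose coefficient exceeds the threshold, best first.
    """
    # S33: data queue ordered from high to low correlation coefficient.
    queue = sorted(scored_records, key=lambda pair: pair[1], reverse=True)
    # S34: keep only entries strictly above the preset threshold.
    return [record for record, score in queue if score > threshold]


def build_target_database(scored_by_index, threshold=0.5):
    """S35: repeat the per-index cleaning for every service index."""
    return {
        index: clean_index_data(records, threshold)
        for index, records in scored_by_index.items()
    }
```

Because the queue is sorted first, a production version could stop scanning at the first entry that falls below the threshold; the sketch keeps the plain filter for readability.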
Further, the step S32 of sorting the data related to the specified index by the sorting calculation component to obtain the correlation coefficient between the data and the specified index includes:
S321: extracting index feature vectors of the specified indexes and data feature vectors respectively corresponding to data related to the specified indexes;
S322: calculating cosine distances between the index feature vector and each data feature vector according to a specified calculation formula, wherein the specified calculation formula is $\cos\theta = \dfrac{m \cdot n_i}{\lVert m \rVert \, \lVert n_i \rVert}$, $i = 1, \dots, P$, where m represents the index feature vector, n represents the data feature vectors, P represents the total number of data feature vectors and is a positive integer, θ represents the vector angle between the index feature vector and a data feature vector, i represents the serial number of a data feature vector, and $n_i$ represents the i-th data feature vector;
S323: and taking the cosine distance as a correlation coefficient between data and the specified index.
During data cleaning in this embodiment, an efficient queue is formed according to the correlation coefficient between the data and the service index. The correlation coefficient is evaluated by the cosine distance between the vectors corresponding to the data and to the service index, which requires little computation while giving precision sufficient for the business requirements.
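The cosine calculation of steps S321 to S323 can be illustrated with the standard cosine of the angle between two vectors. How the feature vectors are extracted is not described in the text, so the sketch takes them as plain lists of numbers; returning 0.0 for a zero vector is an assumption made here to avoid division by zero.

```python
import math


def cosine(m, n):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(m) != len(n):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(m, n))
    norm_m = math.sqrt(sum(a * a for a in m))
    norm_n = math.sqrt(sum(b * b for b in n))
    if norm_m == 0 or norm_n == 0:
        return 0.0  # assumption: a zero vector is treated as uncorrelated
    return dot / (norm_m * norm_n)


def correlation_coefficients(index_vector, data_vectors):
    """One coefficient per data feature vector n_i, in input order."""
    return [cosine(index_vector, n_i) for n_i in data_vectors]
```

Each coefficient costs a single pass over the vector components, which is consistent with the small computational load the embodiment mentions.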
Further, after step S35 of generating target data corresponding to each service index according to the generation process of the target data corresponding to the specified index and obtaining the target database after the collection source database is cleaned, the method includes:
S351: pushing target data corresponding to each business index to a corresponding memory grid node;
S352: invoking in parallel a pre-configured computing component in each memory grid node, and enabling multithreading to analyze the target data corresponding to each business index by dimension item, so as to obtain the dimension data corresponding to each dimension item;
S353: and summarizing dimension data corresponding to the business indexes returned by the memory grid nodes to obtain a dimension database.
In this embodiment, asynchronous tasks and background timing tasks are developed for the data corresponding to each dimension item, and multithreading is used to further extract those data from the target database and generate displayable dimension data. By developing the computing function of the memory grid, the data in the target database are pushed to the memory grid and analyzed by dimension item through a distributed computing program, yielding the dimension data corresponding to each dimension item. For example, 16 memory grid nodes are configured in this embodiment to accelerate the generation of dimension data; when the data of 45 organizations are processed, the analysis completes within 30 minutes and produces dimension data for 700 dimensions.
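The fan-out and gather pattern of steps S351 to S353 can be sketched on a single machine with a thread pool standing in for the memory grid nodes. This is a simplification under stated assumptions: the application uses a distributed memory grid whose API is not described, and the grouping performed by `analyze_by_dimension` is a hypothetical stand-in for the pre-configured computing component.

```python
from concurrent.futures import ThreadPoolExecutor


def analyze_by_dimension(records, dimension_items):
    """Group one index's target data by each dimension item.

    Hypothetical stand-in for the computing component on a grid node.
    """
    result = {}
    for item in dimension_items:
        groups = {}
        for record in records:
            groups.setdefault(record[item], []).append(record)
        result[item] = groups
    return result


def build_dimension_database(target_data, dimension_items, nodes=16):
    """Push each index's data to a worker, run in parallel, then gather."""
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        # S351/S352: one task per service index, executed concurrently.
        futures = {
            index: pool.submit(analyze_by_dimension, records, dimension_items)
            for index, records in target_data.items()
        }
        # S353: summarize the results returned by the workers.
        return {index: future.result() for index, future in futures.items()}
```

The default of 16 workers mirrors the node count given in the embodiment; in a real grid each task would run where the data are stored rather than in a local thread.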
Further, step S5 of forming a billboard graph corresponding to the index dimension according to the data of the target database, includes:
S51: enabling multithreading to process data of each index dimension in the dimension database to obtain attribute values corresponding to each index dimension respectively;
S52: acquiring configuration attributes of corresponding timing tasks on a timing scheduling platform at the current moment, wherein the configuration attributes comprise a designated chart mode for displaying attribute values corresponding to each index dimension respectively;
S53: and forming, according to the configuration attributes of the timing tasks, a billboard graph in the specified chart mode from the attribute values corresponding to the index dimensions.
This embodiment supports dynamic setting of the configuration attributes of timing tasks by developing a business dynamic configuration system and a timing scheduling platform. The timing tasks are determined according to the business scenario; all timing tasks are written into a task list and implemented through an asynchronous program, and once implemented their names are mounted on the timing scheduling platform. Configuration attributes of a timing task, such as execution time, running duration and task type, can be configured dynamically, so that the user decides which business operations are actually performed, including real-time operations and timed operations. The data in the target database cannot directly support index analysis and display; they must first be converted into attribute values of the dimension data. How an attribute value is calculated depends on the specific dimension item. For example, if the dimension item is the coverage rate of training courses under the training index, the attribute value is the total number of people who took part in the training courses divided by the total number of people. The specified chart modes include, but are not limited to, pie charts, histograms and line charts. The service indexes of interest to the business that are stored in the configuration library are analyzed, the attribute value corresponding to each service index is computed by a program, and the attribute values are then output to a page as pie charts, histograms, line charts and the like; the display form can be configured dynamically according to each user's preference.
By developing a front-end page, the dimension data are displayed on the page as trend charts, pie charts, line charts and the like, and a download function is provided, so that the dimension data obtained through analysis can be displayed and reported directly.
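The coverage-rate example and the chart mode read from a timing task's configuration attributes can be sketched as follows. The configuration key `chart_mode`, the default of `"histogram"` and the shape of the returned dictionary are assumptions for illustration; the application only states that the configuration attributes include a specified chart mode.

```python
CHART_MODES = {"pie", "histogram", "line"}  # modes named in the text


def coverage_rate(participants, headcount):
    """Training course coverage: participants divided by total headcount."""
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    return participants / headcount


def billboard_spec(dimension, value, task_config):
    """Combine an attribute value with the chart mode of a timing task.

    task_config is a hypothetical dict of configuration attributes.
    """
    mode = task_config.get("chart_mode", "histogram")  # assumed default
    if mode not in CHART_MODES:
        raise ValueError(f"unsupported chart mode: {mode!r}")
    return {"dimension": dimension, "value": value, "chart": mode}
```

A front-end page could consume such a specification to choose between a pie chart, histogram or line chart without any change to the analysis code.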
Further, after step S51 of enabling multithreading to process data of each index dimension in the dimension database to obtain attribute values corresponding to each index dimension, the method includes:
S511: judging whether a comprehensive index corresponding to the index dimensions exists, wherein the comprehensive index is obtained by combining a specified number of index dimensions;
S512: if so, inputting the attribute values corresponding to those index dimensions into a calculation component for comprehensive calculation to obtain the comprehensive attribute value corresponding to the comprehensive index;
S513: and forming a billboard graph from the comprehensive attribute value in the specified chart mode.
In this embodiment, for a comprehensive index composed of multiple basic index dimensions, the comprehensive attribute value can be obtained by analyzing the dimension data again with the calculation component. The calculation functions of the calculation component include, but are not limited to, weighted accumulation and proportion calculation, which broadens the range of data analysis that can be supported.
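The two calculation functions named above, weighted accumulation and proportion calculation, can be sketched in a few lines. The function names and the dictionary-based inputs are assumptions; the application does not define how a comprehensive index is declared.

```python
def weighted_sum(attribute_values, weights):
    """Weighted accumulation over the index dimensions of a comprehensive index.

    attribute_values: dimension name -> attribute value.
    weights: dimension name -> weight, for the dimensions being combined.
    """
    missing = set(weights) - set(attribute_values)
    if missing:
        raise KeyError(f"no attribute value for dimensions: {sorted(missing)}")
    return sum(attribute_values[name] * weight for name, weight in weights.items())


def proportion(part, whole):
    """Proportion calculation, for example one group's share of a total."""
    if whole == 0:
        raise ZeroDivisionError("whole must be non-zero")
    return part / whole
```

Only the dimensions listed in `weights` contribute to the result, so the same set of attribute values can feed several comprehensive indexes.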
Further, before step S4 of obtaining the index dimension corresponding to the service index selected by the current user in the configuration library, the method includes:
S41: acquiring the service indexes currently selected by the user and the set weights respectively corresponding to the service indexes;
S42: inputting the currently selected business index into a configuration library;
S43: and associating, in the configuration library, each business index one-to-one with its set weight to form configured index dimensions, and storing the configured index dimensions in the configuration library.
In this embodiment, the configuration library is formed from the service indexes reported by the business lines, by analyzing the index sources recorded during data synchronization. For example, the business lines report more than 200 service indexes; these are entered into the configuration library, the service indexes of greatest interest are obtained through program sorting, and a weight is set for each service index to form the final configuration library. Business personnel can dynamically adjust the weights of the service indexes on a page to determine the index dimensions corresponding to the service indexes selected for data analysis, which fixes the processing target for subsequent data processing and data display.
Newly added incremental data go through the same data cleaning and data processing every day to obtain target data. Because the business data synchronized into the collection source database cannot be used directly by the corresponding business systems, they can support service index analysis and business logic calculation only after cleaning.
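A configuration library that associates each service index one-to-one with an adjustable weight (steps S41 to S43) can be sketched as a small in-memory class. The class name, method names and the rule that weights are non-negative are assumptions for illustration; the application does not describe the storage behind the configuration library.

```python
class ConfigLibrary:
    """In-memory stand-in for the configuration library (hypothetical API)."""

    def __init__(self):
        self._weights = {}

    def set_weight(self, index, weight):
        """Register a service index or adjust its weight (S42 and S43)."""
        if weight < 0:
            raise ValueError("weight must be non-negative")  # assumed rule
        self._weights[index] = weight

    def index_dimensions(self, selected):
        """Return (index, weight) pairs for the selected indexes, heaviest first."""
        pairs = [(index, self._weights[index]) for index in selected]
        return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Calling `set_weight` again for an existing index models the dynamic weight adjustment that business personnel perform on the page.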
Referring to fig. 2, an apparatus for generating a data analysis billboard according to an embodiment of the application includes:
the determining module 1 is used for determining databases to which data corresponding to the business indexes respectively belong according to the business indexes selected by the current user;
the calling module 2 is used for calling a data synchronization script, acquiring data corresponding to each service index from each database, and forming a collection source database;
the cleaning module 3 is used for cleaning the data in the collection source database according to the correlation coefficient with each business index to obtain a target database;
the first obtaining module 4 is used for obtaining the index dimension corresponding to the service index selected by the current user in the configuration library;
and the forming module 5 is used for forming a billboard graph corresponding to the index dimension according to the data of the target database.
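The way the five modules hand data to one another can be sketched as a chain of plain functions. Everything concrete here is an assumption made for illustration: the function names, the database and index names, and the precomputed `corr` field standing in for the correlation coefficient.

```python
def determine_databases(selected_indexes, index_to_db):
    """Module 1: find which source database holds data for each index."""
    return {name: index_to_db[name] for name in selected_indexes}


def collect(index_dbs, databases):
    """Module 2: copy the rows for each index into one collection source."""
    return {name: list(databases[db].get(name, [])) for name, db in index_dbs.items()}


def clean(collection, threshold):
    """Module 3: keep rows whose correlation coefficient exceeds the threshold."""
    return {
        name: [row for row in rows if row["corr"] > threshold]
        for name, rows in collection.items()
    }


def build_billboard(target, dimensions):
    """Modules 4 and 5: one chart entry per configured index dimension."""
    return [
        {"dimension": name, "value": sum(row["value"] for row in target.get(name, []))}
        for name in dimensions
    ]


# Toy data; database names, index names and the "corr" field are made up.
databases = {
    "policy_db": {"premium_income": [{"value": 100.0, "corr": 0.9},
                                     {"value": 7.0, "corr": 0.1}]},
    "claims_db": {"claim_count": [{"value": 3.0, "corr": 0.8}]},
}
index_to_db = {"premium_income": "policy_db", "claim_count": "claims_db"}
selected = ["premium_income", "claim_count"]

index_dbs = determine_databases(selected, index_to_db)
collection = collect(index_dbs, databases)
target = clean(collection, threshold=0.5)
billboard = build_billboard(target, dimensions=selected)
```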
Further, the cleaning module 3 includes:
a first obtaining unit, configured to obtain data corresponding to a specified index, where the specified index belongs to any index of all service indexes;
the first calculation unit is used for carrying out sequencing calculation on the data corresponding to the specified index through the sequencing calculation component to obtain the correlation coefficient between the data and the specified index;
the first forming unit is used for forming a data queue corresponding to the specified index from high to low according to the correlation coefficient;
the second forming unit is used for storing the data with the correlation coefficient larger than a preset threshold value in the data queue into a target database to form target data corresponding to the specified index;
and the generating unit is used for generating target data corresponding to each business index respectively according to the generating process of the target data corresponding to the specified index, so as to obtain the target database after the collection source database is cleaned.
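The queue-and-threshold part of the cleaning module can be sketched as follows, assuming the correlation coefficient of each row has already been computed. The function names, row labels, and threshold value are hypothetical.

```python
def clean_for_index(scored_rows, threshold):
    """Sort rows by correlation (high to low) and keep those above threshold.

    scored_rows: list of (row, correlation_coefficient) pairs.
    """
    queue = sorted(scored_rows, key=lambda pair: pair[1], reverse=True)
    return [row for row, corr in queue if corr > threshold]


def clean_collection(scored_by_index, threshold):
    """Repeat the same process for every business index."""
    return {
        index_name: clean_for_index(rows, threshold)
        for index_name, rows in scored_by_index.items()
    }


# Made-up rows and scores for two business indexes.
scored_by_index = {
    "renewal_rate": [("row_a", 0.42), ("row_b", 0.91), ("row_c", 0.77)],
    "claim_ratio": [("row_d", 0.10), ("row_e", 0.65)],
}
target_database = clean_collection(scored_by_index, threshold=0.6)
```

Because the queue is sorted before filtering, the retained rows stay in descending order of correlation.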
Further, the first calculation unit includes:
an extraction subunit, configured to extract an index feature vector of the specified index and a data feature vector corresponding to data related to the specified index respectively;
a calculating subunit, configured to calculate cosine distances between the index feature vector and each data feature vector according to a specified calculation formula, in which m represents the index feature vector, n represents each data feature vector, P represents the total number of data feature vectors and is a positive integer, θ represents the vector angle between the index feature vector and a data feature vector, i represents the sequence number of a data feature vector, and n_i represents the i-th data feature vector;
and a subunit configured to take the cosine distance as the correlation coefficient between the data and the specified index.
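As an illustration of the calculating subunit, the sketch below assumes that the specified formula is the standard cosine of the vector angle, cos θ = (m · n_i) / (|m| |n_i|), evaluated for i = 1 to P. That reading is an assumption based on the symbol definitions; the function names and sample vectors are hypothetical.

```python
import math


def cosine(m, n):
    """Cosine of the angle between vectors m and n (0.0 if either is zero)."""
    if len(m) != len(n):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(m, n))
    norm_m = math.sqrt(sum(a * a for a in m))
    norm_n = math.sqrt(sum(b * b for b in n))
    if norm_m == 0.0 or norm_n == 0.0:
        return 0.0
    return dot / (norm_m * norm_n)


def correlation_coefficients(index_vector, data_vectors):
    """One coefficient per data feature vector n_i, for i = 1..P."""
    return [cosine(index_vector, n_i) for n_i in data_vectors]


# Made-up feature vectors: one index vector m and P = 3 data vectors.
m = [1.0, 0.0]
data_vectors = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
coefficients = correlation_coefficients(m, data_vectors)
```

Returning 0.0 for a zero vector is a design choice of this sketch, so that data with no extractable features falls below any positive threshold rather than raising an error.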
Further, the cleaning module 3 includes:
the pushing unit is used for pushing the target data corresponding to each business index to the corresponding memory grid node;
the calling unit is used for calling the pre-configured computing components in the memory grid nodes in parallel, enabling multithreading to carry out data analysis on the target data corresponding to the business indexes respectively according to the dimension items to obtain dimension data corresponding to the dimension items respectively;
and the summarizing unit is used for summarizing dimension data corresponding to the business indexes returned by the memory grid nodes to obtain a dimension database.
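The push, parallel call, and summarize steps can be approximated on a single machine with a thread pool. In this sketch each submitted task plays the role of one memory grid node, which is an assumption; a real in-memory data grid would distribute the work across processes or hosts. The function names, field names, and the choice of summing values per dimension item are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def analyze_on_node(index_name, rows, dimension_items):
    """Stand-in for a node's computing component: sum values per dimension item."""
    totals = {item: defaultdict(float) for item in dimension_items}
    for row in rows:
        for item in dimension_items:
            totals[item][row[item]] += row["value"]
    return index_name, {item: dict(groups) for item, groups in totals.items()}


def build_dimension_database(target_data, dimension_items, max_workers=4):
    """Dispatch each index's target data in parallel and gather the results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(analyze_on_node, name, rows, dimension_items)
            for name, rows in target_data.items()
        ]
        return dict(future.result() for future in futures)


# Made-up target data for one business index with two dimension items.
target_data = {
    "premium_income": [
        {"region": "north", "channel": "agent", "value": 10.0},
        {"region": "north", "channel": "online", "value": 5.0},
        {"region": "south", "channel": "agent", "value": 7.0},
    ],
}
dimension_db = build_dimension_database(target_data, ["region", "channel"])
```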
Further, the forming module 5 includes:
the starting unit is used for starting multithreading to process the data of each index dimension in the dimension database to obtain attribute values corresponding to each index dimension respectively;
the second acquisition unit is used for acquiring configuration attributes of the corresponding timing tasks on the timing scheduling platform at the current moment, wherein the configuration attributes comprise a designated chart mode for displaying attribute values corresponding to each index dimension respectively;
and the third forming unit is used for forming a billboard graph in the specified chart mode from the attribute values respectively corresponding to the index dimensions, according to the configuration attributes of the timing task.
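How a timing task's configuration attribute might select the chart mode can be sketched as follows. The configuration keys, the set of supported chart modes, and the use of a sum as the attribute value are all assumptions; the multithreading mentioned for the starting unit is left out to keep the sketch short.

```python
def attribute_values(dimension_db):
    """Reduce each index dimension's data to a single attribute value (a sum here)."""
    return {name: sum(values) for name, values in dimension_db.items()}


def form_billboard(values, task_config):
    """Build a chart description in the chart mode named by the task config."""
    supported = {"bar", "line", "pie"}  # assumed set of chart modes
    mode = task_config["chart_mode"]
    if mode not in supported:
        raise ValueError(f"unsupported chart mode: {mode}")
    return {
        "chart_mode": mode,
        "labels": list(values),
        "series": list(values.values()),
    }


# Made-up dimension data and a hypothetical timing task configuration.
dimension_db = {"premium_income": [10.0, 5.0, 7.0], "claim_count": [1.0, 2.0]}
task_config = {"task": "daily_billboard", "chart_mode": "bar"}
billboard = form_billboard(attribute_values(dimension_db), task_config)
```

The returned dictionary is only a chart description; rendering it would be left to whichever charting front end the billboard page uses.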
Further, the forming module 5 includes:
the judging unit is used for judging whether comprehensive indexes corresponding to all index dimensions exist or not, wherein the comprehensive indexes are obtained by combining the index dimensions with the specified number;
the second calculation unit is used for inputting the attribute values corresponding to the index dimensions respectively into the calculation assembly to carry out comprehensive calculation so as to obtain comprehensive attribute values corresponding to the comprehensive index;
and a fourth forming unit, configured to form the signboard graph with the integrated attribute value in the specified chart mode.
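The text does not say how the comprehensive calculation combines the attribute values. The sketch below assumes a weighted average over the member index dimensions, using the weights from the configuration library; the function name, index names, and numbers are hypothetical.

```python
def comprehensive_value(values, weights, members):
    """Weighted average of the attribute values of the member index dimensions."""
    total_weight = sum(weights[name] for name in members)
    if total_weight == 0:
        raise ValueError("member weights sum to zero")
    return sum(values[name] * weights[name] for name in members) / total_weight


# Made-up attribute values and weights for three index dimensions.
values = {"renewal_rate": 0.8, "claim_ratio": 0.4, "premium_growth": 0.1}
weights = {"renewal_rate": 3.0, "claim_ratio": 1.0, "premium_growth": 2.0}

# Hypothetical comprehensive index combining two of the three dimensions.
comprehensive_indexes = {"portfolio_health": ["renewal_rate", "claim_ratio"]}

composite = {
    name: comprehensive_value(values, weights, members)
    for name, members in comprehensive_indexes.items()
}
```

An empty `comprehensive_indexes` mapping corresponds to the case where the judging unit finds no comprehensive index, and `composite` is then empty.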
Further, the apparatus for generating a data analysis billboard includes:
the second acquisition module is used for acquiring the service index selected by the user currently and the setting weight corresponding to each service index respectively;
the input module is used for inputting the currently selected business index into a configuration library;
and the association module is used for associating each business index with the corresponding setting weight of each business index in a one-to-one correspondence manner in the configuration library to form a configured index dimension, and storing the configured index dimension in the configuration library.
For an explanation of the device embodiments, refer to the corresponding method embodiments; the description is not repeated here.
Referring to fig. 3, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in fig. 3. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store all the data required for the process of generating the data analysis billboard. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a method of generating a data analysis billboard.
The method of generating a data analysis billboard executed by the processor includes: determining, according to the business indexes selected by the current user, the databases to which the data respectively corresponding to the business indexes belong; calling a data synchronization script, and acquiring data corresponding to each business index from each database to form a collection source database; cleaning the data in the collection source database according to the correlation coefficient with each business index to obtain a target database; acquiring index dimensions corresponding to the business indexes selected by the current user in a configuration library; and forming a billboard graph corresponding to the index dimensions according to the data of the target database.
According to the computer equipment, related data are acquired and collected from different source databases through development of the data synchronization script, then data cleaning is carried out through correlation with the business indexes to obtain target data, and a billboard image corresponding to the target data according to dimension items is formed through index dimensions selected by a user, so that visual display of dimension data is realized.
In one embodiment, the step of cleaning the data in the aggregate source database to obtain the target database according to the correlation coefficient with each business index includes: acquiring data corresponding to a specified index, wherein the specified index belongs to any index in all service indexes; sequencing and calculating the data corresponding to the specified index through a sequencing and calculating component to obtain the correlation coefficient of the data and the specified index; forming a data queue corresponding to the specified index from high to low according to the correlation coefficient; storing the data with the correlation coefficient larger than a preset threshold value in the data queue into a target database to form target data corresponding to the specified index; and generating target data corresponding to each business index respectively according to the generation process of the target data corresponding to the specified index, and obtaining the target database after the collection source database is cleaned.
In one embodiment, the step of the processor performing the sorting calculation on the data related to the specified index through the sorting calculation component to obtain the correlation coefficient between the data and the specified index includes: extracting an index feature vector of the specified index and data feature vectors respectively corresponding to the data related to the specified index; calculating cosine distances between the index feature vector and each data feature vector according to a specified calculation formula, in which m represents the index feature vector, n represents each data feature vector, P represents the total number of data feature vectors and is a positive integer, θ represents the vector angle between the index feature vector and a data feature vector, i represents the sequence number of a data feature vector, and n_i represents the i-th data feature vector; and taking the cosine distance as the correlation coefficient between the data and the specified index.
In one embodiment, the step of generating, by the processor, the target data corresponding to each of the service indexes according to the generation process of the target data corresponding to the specified index, to obtain the target database after cleaning the aggregate source database includes: pushing target data corresponding to each business index to a corresponding memory grid node; invoking a pre-configured computing assembly in each memory grid node in parallel, enabling multithreading to carry out data analysis on target data corresponding to each business index respectively according to dimension items to obtain dimension data corresponding to each dimension item respectively; and summarizing dimension data corresponding to the business indexes returned by the memory grid nodes to obtain a dimension database.
In one embodiment, the step of forming the billboard graph corresponding to the index dimension by the processor according to the data of the target database includes: enabling multithreading to process data of each index dimension in the dimension database to obtain attribute values corresponding to each index dimension respectively; acquiring configuration attributes of corresponding timing tasks on a timing scheduling platform at the current moment, wherein the configuration attributes comprise a designated chart mode for displaying attribute values corresponding to each index dimension respectively; and forming a billboard graph in the specified chart mode according to the attribute values corresponding to the index dimensions respectively according to the configuration attributes of the timing tasks.
In one embodiment, after the step of enabling multithreading to process the data of each index dimension in the dimension database to obtain the attribute value corresponding to each index dimension, the method includes: judging whether comprehensive indexes corresponding to all index dimensions exist or not, wherein the comprehensive indexes are obtained by combining the index dimensions with the specified number; inputting the attribute values corresponding to the index dimensions into a calculation assembly for comprehensive calculation to obtain comprehensive attribute values corresponding to the comprehensive index; and forming a billboard graph by the comprehensive attribute values in the mode of the designated chart.
In one embodiment, before the step of obtaining the index dimension corresponding to the business index selected by the current user in the configuration library, the processor includes: acquiring a service index currently selected by a user and setting weights respectively corresponding to the service indexes; inputting the currently selected business index into a configuration library; and respectively associating each business index with the corresponding setting weight of each business index in a one-to-one correspondence manner in the configuration library to form configured index dimensions, and storing the configured index dimensions in the configuration library.
Those skilled in the art will appreciate that the architecture shown in fig. 3 is merely a block diagram of a portion of the architecture in connection with the present application and is not intended to limit the computer device to which the present application is applied.
An embodiment of the present application further provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of generating a data analysis billboard, comprising: determining databases to which data respectively corresponding to the service indexes belong according to the service indexes selected by the current user; calling a data synchronization script, and acquiring data corresponding to each business index from each database to form a collection source database; cleaning the data in the collecting source database according to the correlation coefficient with each business index to obtain a target database; acquiring index dimensions corresponding to the business indexes selected by the current user in a configuration library; and forming a billboard graph corresponding to the index dimension according to the data of the target database.
The computer readable storage medium acquires and gathers related data from different source databases by developing a data synchronization script, then carries out data cleaning by correlation with service indexes to obtain target data, and forms a billboard image corresponding to the target data according to dimension items by index dimensions selected by a user to realize visual display of dimension data.
In one embodiment, the step of cleaning the data in the aggregate source database to obtain the target database according to the correlation coefficient with each business index includes: acquiring data corresponding to a specified index, wherein the specified index belongs to any index in all service indexes; sequencing and calculating the data corresponding to the specified index through a sequencing and calculating component to obtain the correlation coefficient of the data and the specified index; forming a data queue corresponding to the specified index from high to low according to the correlation coefficient; storing the data with the correlation coefficient larger than a preset threshold value in the data queue into a target database to form target data corresponding to the specified index; and generating target data corresponding to each business index respectively according to the generation process of the target data corresponding to the specified index, and obtaining the target database after the collection source database is cleaned.
In one embodiment, the step of the processor performing the sorting calculation on the data related to the specified index through the sorting calculation component to obtain the correlation coefficient between the data and the specified index includes: extracting an index feature vector of the specified index and data feature vectors respectively corresponding to the data related to the specified index; calculating cosine distances between the index feature vector and each data feature vector according to a specified calculation formula, in which m represents the index feature vector, n represents each data feature vector, P represents the total number of data feature vectors and is a positive integer, θ represents the vector angle between the index feature vector and a data feature vector, i represents the sequence number of a data feature vector, and n_i represents the i-th data feature vector; and taking the cosine distance as the correlation coefficient between the data and the specified index.
In one embodiment, the step of generating, by the processor, the target data corresponding to each of the service indexes according to the generation process of the target data corresponding to the specified index, to obtain the target database after cleaning the aggregate source database includes: pushing target data corresponding to each business index to a corresponding memory grid node; invoking a pre-configured computing assembly in each memory grid node in parallel, enabling multithreading to carry out data analysis on target data corresponding to each business index respectively according to dimension items to obtain dimension data corresponding to each dimension item respectively; and summarizing dimension data corresponding to the business indexes returned by the memory grid nodes to obtain a dimension database.
In one embodiment, the step of forming the billboard graph corresponding to the index dimension by the processor according to the data of the target database includes: enabling multithreading to process data of each index dimension in the dimension database to obtain attribute values corresponding to each index dimension respectively; acquiring configuration attributes of corresponding timing tasks on a timing scheduling platform at the current moment, wherein the configuration attributes comprise a designated chart mode for displaying attribute values corresponding to each index dimension respectively; and forming a billboard graph in the specified chart mode according to the attribute values corresponding to the index dimensions respectively according to the configuration attributes of the timing tasks.
In one embodiment, after the step of enabling multithreading to process the data of each index dimension in the dimension database to obtain the attribute value corresponding to each index dimension, the method includes: judging whether comprehensive indexes corresponding to all index dimensions exist or not, wherein the comprehensive indexes are obtained by combining the index dimensions with the specified number; inputting the attribute values corresponding to the index dimensions into a calculation assembly for comprehensive calculation to obtain comprehensive attribute values corresponding to the comprehensive index; and forming a billboard graph by the comprehensive attribute values in the mode of the designated chart.
In one embodiment, before the step of obtaining the index dimension corresponding to the business index selected by the current user in the configuration library, the processor includes: acquiring a service index currently selected by a user and setting weights respectively corresponding to the service indexes; inputting the currently selected business index into a configuration library; and respectively associating each business index with the corresponding setting weight of each business index in a one-to-one correspondence manner in the configuration library to form configured index dimensions, and storing the configured index dimensions in the configuration library.
Those skilled in the art will appreciate that all or part of the above-described methods may be implemented by a computer program stored on a non-transitory computer readable storage medium, which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium provided herein and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, apparatus, article or method that comprises the element.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the claims, and all equivalent structures or equivalent processes using the descriptions and drawings of the present application, or direct or indirect application in other related technical fields are included in the scope of the claims of the present application.

Claims (8)

1. A method of generating a data analysis billboard, comprising:
determining databases to which data respectively corresponding to the service indexes belong according to the service indexes selected by the current user;
calling a data synchronization script, and acquiring data corresponding to each business index from each database to form a collection source database;
cleaning the data in the collecting source database according to the correlation coefficient with each business index to obtain a target database;
acquiring index dimensions corresponding to the business indexes selected by the current user in a configuration library;
forming a billboard graph corresponding to the index dimension according to the data of the target database;
the step of cleaning the data in the collecting source database according to the correlation coefficient with each business index to obtain a target database comprises the following steps:
acquiring data corresponding to a specified index, wherein the specified index belongs to any index in all service indexes;
sequencing and calculating the data corresponding to the specified index through a sequencing and calculating component to obtain the correlation coefficient of the data and the specified index;
forming a data queue corresponding to the specified index from high to low according to the correlation coefficient;
storing the data with the correlation coefficient larger than a preset threshold value in the data queue into a target database to form target data corresponding to the specified index;
generating target data corresponding to each business index respectively according to the generation process of the target data corresponding to the specified index, and obtaining the target database after the collection source database is cleaned;
the step of sorting and calculating the data related to the specified index through a sorting and calculating component to obtain the correlation coefficient of the data and the specified index comprises the following steps:
extracting index feature vectors of the specified indexes and data feature vectors respectively corresponding to data related to the specified indexes;
calculating cosine distances between the index feature vector and each data feature vector according to a specified calculation formula, in which m represents the index feature vector, n represents each data feature vector, P represents the total number of data feature vectors and is a positive integer, θ represents the vector angle between the index feature vector and a data feature vector, i represents the sequence number of a data feature vector, and n_i represents the i-th data feature vector;
and taking the cosine distance as a correlation coefficient between data and the specified index.
2. The method for generating a data analysis billboard according to claim 1, wherein after the step of generating the target data respectively corresponding to the business indexes according to the generation process of the target data corresponding to the specified index to obtain the target database after cleaning the collection source database, the method includes:
pushing target data corresponding to each business index to a corresponding memory grid node;
invoking a pre-configured computing assembly in each memory grid node in parallel, enabling multithreading to carry out data analysis on target data corresponding to each business index respectively according to dimension items to obtain dimension data corresponding to each dimension item respectively;
and summarizing dimension data corresponding to the business indexes returned by the memory grid nodes to obtain a dimension database.
3. The method of generating a data analysis billboard of claim 2, wherein said step of forming a billboard graph corresponding to said index dimension from data of said target database in accordance with said index dimension comprises:
enabling multithreading to process data of each index dimension in the dimension database to obtain attribute values corresponding to each index dimension respectively;
acquiring configuration attributes of corresponding timing tasks on a timing scheduling platform at the current moment, wherein the configuration attributes comprise a designated chart mode for displaying attribute values corresponding to each index dimension respectively;
and forming a billboard graph in the specified chart mode according to the attribute values corresponding to the index dimensions respectively according to the configuration attributes of the timing tasks.
4. A method for generating a data analysis billboard according to claim 3, wherein after said step of enabling multithreading to process data of each index dimension in said dimension database to obtain attribute values corresponding to each of said index dimensions, said method comprises:
judging whether comprehensive indexes corresponding to all index dimensions exist or not, wherein the comprehensive indexes are obtained by combining the index dimensions with the specified number;
inputting the attribute values corresponding to the index dimensions into a calculation assembly for comprehensive calculation to obtain comprehensive attribute values corresponding to the comprehensive index;
and forming a billboard graph by the comprehensive attribute values in the mode of the designated chart.
5. The method for generating a data analysis billboard according to claim 1, wherein before the step of obtaining the index dimension corresponding to the business index selected by the current user in the configuration library, the method comprises:
acquiring a service index currently selected by a user and setting weights respectively corresponding to the service indexes;
inputting the currently selected business index into a configuration library;
and respectively associating each business index with the corresponding setting weight of each business index in a one-to-one correspondence manner in the configuration library to form configured index dimensions, and storing the configured index dimensions in the configuration library.
6. An apparatus for generating a data analysis billboard, the apparatus for implementing the method of any of claims 1-5, the apparatus comprising:
the determining module is used for determining databases to which data corresponding to the business indexes respectively belong according to the business indexes selected by the current user;
the calling module is used for calling the data synchronization script, acquiring data corresponding to each service index from each database, and forming a collection source database;
the cleaning module is used for cleaning the data in the collection source database according to the correlation coefficient with each business index to obtain a target database;
the first acquisition module is used for acquiring index dimensions corresponding to the business indexes selected by the current user in the configuration library;
the forming module is used for forming a billboard graph corresponding to the index dimension according to the data of the target database.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
CN202011224649.3A 2020-11-05 2020-11-05 Method, device and computer equipment for generating data analysis billboard Active CN112347092B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011224649.3A CN112347092B (en) 2020-11-05 2020-11-05 Method, device and computer equipment for generating data analysis billboard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011224649.3A CN112347092B (en) 2020-11-05 2020-11-05 Method, device and computer equipment for generating data analysis billboard

Publications (2)

Publication Number Publication Date
CN112347092A CN112347092A (en) 2021-02-09
CN112347092B true CN112347092B (en) 2023-07-18

Family

ID=74428829

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011224649.3A Active CN112347092B (en) 2020-11-05 2020-11-05 Method, device and computer equipment for generating data analysis billboard

Country Status (1)

Country Link
CN (1) CN112347092B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115759019B (en) * 2022-11-15 2023-10-20 广州天维信息技术股份有限公司 Service data calculation method, device, storage medium and computer equipment
CN116383299A (en) * 2023-03-31 2023-07-04 国任财产保险股份有限公司 Data display system based on distributed database

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110069519A (en) * 2018-08-23 2019-07-30 平安科技(深圳)有限公司 Data information management method, apparatus, computer equipment and storage medium
CN110109978A (en) * 2019-05-16 2019-08-09 深圳前海微众银行股份有限公司 Data analysing method, device, server and readable storage medium storing program for executing based on index
CN111368089A (en) * 2018-12-25 2020-07-03 ***通信集团浙江有限公司 Service processing method and device based on knowledge graph

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011041464A2 (en) * 2009-09-29 2011-04-07 Oracle International Corporation Agentless data collection

Also Published As

Publication number Publication date
CN112347092A (en) 2021-02-09

Similar Documents

Publication Publication Date Title
US11281969B1 (en) Artificial intelligence system combining state space models and neural networks for time series forecasting
JP4980395B2 (en) Data analysis system and method
CN112347092B (en) Method, device and computer equipment for generating data analysis billboard
CN111126495B (en) Model training method, information prediction device, storage medium and equipment
CN114223189B (en) Time length statistics method, device, electronic equipment and computer readable medium
CN104993962A (en) Method and system for acquiring use state of terminal
CN110147470B (en) Cross-machine-room data comparison system and method
CN113420009B (en) Electromagnetic data analysis device, system and method based on big data
CN109242425A (en) Project Cost collects methodology, device, computer equipment and storage medium
CN111680085A (en) Data processing task analysis method and device, electronic equipment and readable storage medium
CN113393060A (en) Task allocation method and device, electronic equipment and storage medium
CN111291936B (en) Product life cycle prediction model generation method and device and electronic equipment
CN113222170A (en) Intelligent algorithm and model for IOT (Internet of things) AI (Artificial Intelligence) collaborative service platform
JP2023029604A (en) Apparatus and method for processing patent information, and program
CN110134663B (en) Organization structure data processing method and device and electronic equipment
CN111984677B (en) Resource data checking method, device, computer equipment and storage medium
CN112631889A (en) Portrayal method, device and equipment for application system and readable storage medium
CN112115281A (en) Data retrieval method, device and storage medium
CN117076770A (en) Data recommendation method and device based on graph calculation, storage value and electronic equipment
AU2016247853A1 (en) Requirements determination
CA3176293A1 (en) System and method for automated forest inventory mapping
CN114610308A (en) Application function layout adjusting method and device, electronic equipment and storage medium
CN113590667A (en) Real-time data updating and managing method based on Spark Streaming
CN113190587A (en) Data processing method and device for realizing service data processing
CN112560938A (en) Model training method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant