CN117931898B

CN117931898B - Multidimensional database statistical analysis method based on large model

Info

Publication number: CN117931898B
Application number: CN202410339653.6A
Authority: CN
Inventors: 孟胜; 代平; 崔娅玲; 郭一欧; 杜德刚
Original assignee: Chengdu Synchronous Xinchuang Technology Co ltd
Current assignee: Chengdu Synchronous Xinchuang Technology Co ltd
Priority date: 2024-03-25
Filing date: 2024-03-25
Publication date: 2024-06-07
Anticipated expiration: 2044-03-25
Also published as: CN117931898A

Abstract

The invention discloses a multidimensional database statistical analysis method based on a large model, and belongs to the technical field of database analysis. The method comprises the following steps: importing the structures of a plurality of data tables into a large language model; analyzing the table relationships of the database, and determining the multi-element relationship between the service center tables and the auxiliary tables; generating a service chain relationship from the plurality of service center tables; establishing an atomic diffusion diagram conducted through the foreign keys of the service center tables; forming, according to the atomic diffusion diagram, a plurality of conductive chains with the primary key as the core; determining the starting point of each conductive chain, and combining all the conductive chains to form a tree structure diagram; calculating the shortest path from the link head to the target data point; generating a multidimensional query SQL statement according to the shortest path and the data tables on the shortest path; and connecting to the target database with the multidimensional query SQL statement to obtain and output an abstract data set. The invention improves the efficiency and accuracy of data processing and analysis.

Description

Multidimensional database statistical analysis method based on large model

Technical Field

The invention belongs to the technical field of database analysis, and particularly relates to a multidimensional database statistical analysis method based on a large model.

Background

As enterprises become increasingly informatized, a large amount of data is stored in Management Information Systems (MIS) and Business Information Systems (BIS). This data embodies rich business relationships and value. For large volumes of complex business data, the current challenge is how to quickly and accurately construct an analytical model and produce intuitive, easily understood analysis results. Current database statistical methods face the following challenges:

1. Data association is complex: in a business system with hundreds of data tables, the relationships between data may be very complex. A multi-table query must associate multiple tables to obtain the required data, which involves a large number of association conditions and join operations and increases the complexity and difficulty of the query statements.

2. Technical dependence is high: users must understand the data structure and SQL query statements in depth to perform complex data analysis, which is difficult without deep familiarity with the system and the relationships among its database tables. Each time a user raises a requirement, professional IT research and development engineers must complete the corresponding development work through a full development process, so the customer waits a long time for a response.

3. Multi-table queries are complex: multi-table queries typically require writing complex SQL statements containing joins between multiple tables, aggregations, subqueries and the like, which increases the complexity and maintenance difficulty of the code and the SQL statements. Moreover, when the database structure changes, existing multi-table query statements need to be modified or optimized.

4. Flexibility is insufficient: traditional statistical analysis methods are not flexible enough and lack intelligent support such as automatic data identification, automatic sorting-out of relationships and automatic statistical analysis, which limits the efficiency and accuracy of analysis.

5. Multi-table associative query performance is unstable: in a conventional multi-table database, the number of query paths grows rapidly as the number of tables increases (the number of pairwise combinations between N tables is N(N-1)/2). When a large number of complex queries are processed, many candidate paths exist, query performance is unstable, and the optimal path is difficult to find manually.
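The pairwise count N(N-1)/2 quoted in item 5 can be checked with a short sketch. The table names below are hypothetical and serve only to cross-check the formula by enumeration; they do not come from the patent.

```python
from itertools import combinations


def pairwise_combinations(n_tables: int) -> int:
    """Number of unordered table pairs among n_tables tables: N(N-1)/2."""
    return n_tables * (n_tables - 1) // 2


# Hypothetical table names, used only to verify the formula by enumeration.
tables = ["orders", "customers", "products", "invoices", "shipments"]
pairs = list(combinations(tables, 2))
```

For a system with a few hundred tables the count already reaches tens of thousands of table pairs, which illustrates why choosing a join path by hand becomes impractical.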

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a multidimensional database statistical analysis method based on a large model.

The aim of the invention is realized by the following technical scheme: a multi-dimensional database statistical analysis method based on a large model, comprising:

S100, importing the structures of a plurality of data tables into a large language model;

S200, analyzing the relationships between the tables of a database based on a preset database association analysis prompt word, and determining the multi-element relationship between the service center tables and the auxiliary tables according to the service flow;

S300, generating a service chain relationship from the plurality of service center tables by means of foreign key association;

S400, establishing, with the primary key of a service center table in the service flow as the center, an atomic diffusion diagram conducted through the foreign keys of the service center tables;

S500, expanding, according to the atomic diffusion diagram, the auxiliary tables related to the service center tables to form a plurality of conductive chains with the primary key as the core;

S600, determining the starting point of each conductive chain, and combining all the conductive chains to form a tree structure diagram;

S700, calculating the shortest path from the link head to the target data point by using the large language model;

S800, generating a multidimensional query SQL statement according to the shortest path calculated by the large language model and the data tables on the shortest path;

S900, connecting to the target database by using the multidimensional query SQL statement, performing a data query, and obtaining and outputting an abstract data set.

Further, importing the structures of the plurality of data tables into the large language model includes:

importing the structures of the plurality of data tables into the large language model by linking or copying.

Further, importing the structures of the plurality of data tables into the large language model includes:

importing the structures of a plurality of data tables in a management information system or a business information system into the large language model.

Further, the number of levels per conductive chain is less than 5.

Further, generating a multidimensional query SQL statement according to the shortest path calculated by the large language model and a plurality of data tables on the shortest path, including:

automatically generating a multidimensional query SQL statement containing Where, Join and Group clauses according to the shortest path calculated by the large language model and the data tables on the shortest path.

Further, determining the origin of the conductive chain includes:

determining the starting point of the conductive chain according to the service requirement.

Further, a path-exhaustive method is employed to calculate the shortest path from the link head to the target data point.

The beneficial effects of the invention are as follows:

(1) By automating and optimizing the data import process, the invention significantly improves the efficiency of the data preparation stage, reduces the difficulty of manual operation, avoids errors and saves time;

(2) By using the capability of the large language model, the invention can accurately identify and sort out the complex business relationships among multiple tables, and provides users with a deeper understanding of data associations;

(3) The shortest path calculation in the invention not only improves the query efficiency, but also optimizes the resource utilization and reduces the database load caused by complex query;

(4) By adopting the atomic diffusion diagram and the conductive chain, the invention provides an intuitive way of displaying the relationships between data, so that non-technical users can easily understand the data structure and business logic;

(5) The multiple tree structures generated by the invention can process not only linear data analysis tasks, but also more complex and multi-level data exploration and analysis requirements; through intelligent analysis and optimization, deep data insight is provided, enterprises are helped to better understand business data, and data-driven decisions are made;

(6) The automatically generated multi-table and multi-dimensional query SQL statement reduces the burden of writing complex queries by database administrators and analysts, and improves the working efficiency;

(7) The invention can automatically adjust the analysis method and the query statements according to different business rules and user requirements, and therefore offers high adaptability, customization capability, flexibility and extensibility across a variety of business analysis scenarios;

(8) The user-friendly interface and the automation characteristic of the invention reduce the technical threshold of a user for operating a complex database system, and the user can complete complex data statistics analysis work without deep knowledge of a professional database query language, thereby remarkably reducing the operation difficulty and improving the working efficiency, and enabling non-professional personnel to easily perform multidimensional data analysis;

(9) The invention not only improves the efficiency and accuracy of data processing and analysis, but also provides a more visual, flexible and user-friendly data analysis tool which is suitable for enterprises and organizations of various scales. The advantages jointly reflect the commercial value and the practicability of the invention, and lay a foundation for popularization and application in the market.

Drawings

FIG. 1 is a flow chart of a statistical analysis method of a multidimensional database according to the present invention;

FIG. 2 is a schematic illustration of an atomic diffusion diagram according to the present invention;

FIG. 3 is a schematic diagram of a conductive chain according to the present invention.

Detailed Description

The technical solutions of the present invention will be clearly and completely described below with reference to the embodiments, and it is apparent that the described embodiments are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by a person skilled in the art without any inventive effort, are intended to be within the scope of the present invention, based on the embodiments of the present invention.

Referring to FIGS. 1-3, the present invention provides a statistical analysis method for a multidimensional database based on a large model.

As shown in FIG. 1, the statistical analysis method for a multidimensional database based on a large model includes S100 to S900.

S100, importing structures of a plurality of data tables into a large language model.

Specifically, the structure of a plurality of data tables in a Management Information System (MIS) or a Business Information System (BIS) is imported into a large language model.

A database contains a plurality of data tables; the structure of the data tables refers to the table structure of those data tables.

In some embodiments, the fast importation of structures of multiple data tables is achieved by way of linking or replication.
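One possible way to obtain the table structures for import by copying is sketched below. The patent does not name a database engine or an export mechanism, so SQLite (via Python's standard `sqlite3` module), the function name `export_table_structures` and the two-table schema are all assumptions made for illustration.

```python
import sqlite3


def export_table_structures(conn: sqlite3.Connection) -> str:
    """Collect the CREATE TABLE statement of every user table as one text block."""
    rows = conn.execute(
        "SELECT name, sql FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    return "\n\n".join(f"-- table: {name}\n{sql};" for name, sql in rows)


# Hypothetical two-table schema standing in for an MIS/BIS database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(id),
        amount REAL
    );
    """
)
schema_text = export_table_structures(conn)
```

The resulting `schema_text` is plain text that could be pasted into, or sent to, a large language model; how the model receives it is outside the scope of this sketch.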

S200, analyzing the relationship between tables of the database based on a preset database association analysis prompt word, and determining the multi-element relationship between the service center table and the auxiliary table according to the service flow.

Specifically, the database design is required to fall between the second normal form and the third normal form. Before the service center tables and auxiliary tables are analyzed, the following preparation is carried out:

① The business description of the system to be statistically analyzed is compiled, namely: what the business system does, what its main objects are, examples of frequent statistical analyses, and the like;

② According to the standard design requirements of the database, the annotation of each data table is completed, describing the definition and function of the table;

③ The annotation of each field in the data table is completed, specifying the business specification name, type and definition of the field.

The above three points are then arranged into a prompt word according to the service requirement, and the plurality of data tables in the database are analyzed.

Specifically, the database association analysis prompt word is customized and set in advance, and the relationships among the tables of the database structure are then automatically analyzed according to it. In addition, the multi-element relationship between the service center tables and the auxiliary tables is sorted out according to the service flow.
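The assembly of the prompt word from the system business description, the table annotations and the field annotations might look like the following sketch. The patent does not disclose a prompt format, so the `TableDoc` structure, the function name `build_association_prompt`, the layout of the text and the sample tables are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TableDoc:
    """Hypothetical container for one table's annotations."""
    name: str
    comment: str  # definition and function of the table
    fields: dict[str, str] = field(default_factory=dict)  # field name -> annotation


def build_association_prompt(business_description: str, tables: list[TableDoc]) -> str:
    """Combine the three prepared inputs into a single prompt text."""
    lines = ["System business description:", business_description, "", "Tables:"]
    for t in tables:
        lines.append(f"- {t.name}: {t.comment}")
        for fname, note in t.fields.items():
            lines.append(f"    {fname}: {note}")
    lines += [
        "",
        "Task: identify the service center tables, the auxiliary tables, "
        "and the foreign-key relationships between them.",
    ]
    return "\n".join(lines)


# Hypothetical annotations for illustration only.
docs = [
    TableDoc("orders", "Sales orders placed by customers",
             {"id": "primary key", "customer_id": "foreign key to customer.id"}),
    TableDoc("customer", "Customer master data", {"id": "primary key"}),
]
prompt = build_association_prompt("An order management system.", docs)
```

Keeping the three inputs separate until the last step makes it easy to reuse the same annotations with a different task description.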

S300, generating a service chain relationship from the plurality of service center tables by means of foreign key association.

Specifically, the service chain relationships are analyzed one by one for all the service center tables obtained. Taking each service center table as a starting point, a service chain relationship is formed across the plurality of service center tables by means of "foreign key" (fk) association, and the service flow is analyzed step by step, up to a maximum of 5 layers of tables. The result is the plurality of service flows finally formed.
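The step-by-step traversal with a 5-layer cap can be illustrated by a deterministic sketch. In the patent this analysis is performed by the large language model; the recursive walk below, the function name `service_chains`, the reading of "layer" as one table in the chain, and the table names are assumptions for illustration.

```python
def service_chains(fk_links: dict[str, list[str]], start: str,
                   max_layers: int = 5) -> list[list[str]]:
    """Follow foreign-key links from `start` and return every maximal chain
    containing at most `max_layers` tables."""
    chains: list[list[str]] = []

    def walk(chain: list[str]) -> None:
        # Candidate next tables, skipping any already on the chain (no cycles).
        nxt = [t for t in fk_links.get(chain[-1], []) if t not in chain]
        if not nxt or len(chain) == max_layers:
            chains.append(chain)
            return
        for table in nxt:
            walk(chain + [table])

    walk([start])
    return chains


# Hypothetical foreign-key links between service center tables.
fk_links = {
    "contract": ["order"],
    "order": ["delivery", "invoice"],
    "invoice": ["payment"],
}
flows = service_chains(fk_links, "contract")
```

Each returned chain corresponds to one candidate service flow starting at the chosen service center table.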

S400, establishing, with the primary key of a service center table in the service flow as the center, an atomic diffusion diagram conducted through the foreign keys of the service center tables.

As shown in fig. 2, the atomic diffusion map reflects the relationship between the service center table and other data association tables.

S500, expanding, according to the atomic diffusion diagram, the auxiliary tables related to the service center tables to form a plurality of conductive chains with the primary key as the core.

A schematic of the conductive chain is shown in fig. 3.

In some embodiments, the number of levels per conductive chain is less than 5.

In this embodiment, keeping each conductive chain within 5 levels ensures efficiency, and the conductive chains can be further completed through continued drill-down.
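A minimal sketch of S400 and S500 follows, under stated assumptions: the atomic diffusion diagram is modelled as an undirected adjacency map built from foreign-key declarations, and a conductive chain is a simple path starting at the center table with fewer than 5 levels (interpreted here as at most 4 tables). These modelling choices, the function names and the sample foreign keys are not taken from the patent.

```python
from collections import defaultdict

# (table, column, referenced_table, referenced_column); a hypothetical representation.
ForeignKey = tuple[str, str, str, str]


def diffusion_map(foreign_keys: list[ForeignKey]) -> dict[str, set[str]]:
    """Undirected adjacency between tables that are linked by a foreign key."""
    adj: dict[str, set[str]] = defaultdict(set)
    for table, _col, ref_table, _ref_col in foreign_keys:
        adj[table].add(ref_table)
        adj[ref_table].add(table)
    return dict(adj)


def conductive_chains(adj: dict[str, set[str]], center: str,
                      max_levels: int = 4) -> list[list[str]]:
    """All maximal simple chains starting at `center` with at most `max_levels` tables."""
    chains = []
    stack = [[center]]
    while stack:
        chain = stack.pop()
        nxt = sorted(t for t in adj.get(chain[-1], ()) if t not in chain)
        if not nxt or len(chain) == max_levels:
            chains.append(chain)
        else:
            stack.extend(chain + [t] for t in nxt)
    return sorted(chains)


# Hypothetical foreign keys for illustration only.
fks = [
    ("order_item", "order_id", "orders", "id"),
    ("order_item", "product_id", "product", "id"),
    ("orders", "customer_id", "customer", "id"),
    ("product", "category_id", "category", "id"),
]
adj = diffusion_map(fks)
chains = conductive_chains(adj, "orders")
```

With `orders` as the center, the sketch yields one short chain to `customer` and one four-table chain through `order_item` and `product` to `category`.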

S600, determining the starting point of the conductive chain, and combining all the conductive chains to form a tree structure diagram.

In some embodiments, the starting point of the conductive chain is determined according to the service requirement, thereby providing an explicit starting point for statistical analysis.

In this embodiment, all the conductive chains are combined to form a tree structure diagram that reflects the complex association relationships between data, so that abstract data sets can be formed after analysis by the large-model BI and the data can be "drilled down into sub-dimensions".

In this embodiment, a plurality of conductive chains are combined to form a tree structure. Each conductive chain is a "multi-table" combination pattern in SQL, used to generate the corresponding SQL statements efficiently and accurately on demand during later statistical analysis and to extract the corresponding abstract data sets from the database.
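Combining chains that share a starting point into a tree can be sketched with nested dictionaries. The data structure, the function names `merge_chains` and `render`, and the sample chains are assumptions; the patent does not specify how the tree structure diagram is stored.

```python
def merge_chains(chains: list[list[str]]) -> dict:
    """Merge chains into one nested-dict tree; shared prefixes become shared branches."""
    tree: dict = {}
    for chain in chains:
        node = tree
        for table in chain:
            node = node.setdefault(table, {})
    return tree


def render(tree: dict, indent: int = 0) -> list[str]:
    """Return the tree as indented text lines, one table per line."""
    lines = []
    for table, children in tree.items():
        lines.append("  " * indent + table)
        lines.extend(render(children, indent + 1))
    return lines


# Hypothetical conductive chains sharing the starting point "orders".
sample_chains = [
    ["orders", "customer"],
    ["orders", "order_item", "product"],
    ["orders", "order_item", "warehouse"],
]
tree = merge_chains(sample_chains)
outline = render(tree)
```

Every root-to-node path in the tree is a prefix of at least one chain, so each path remains a valid multi-table join pattern.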

This embodiment realizes a free-form BI statistical analysis method through abstract data sets, link heads and drill-down into sub-dimensions. A plurality of abstract data sets are formed efficiently through a plurality of conductive chains. The link head is the service demand point; drilling down into sub-dimensions means following the relationships downward to further service demand points and auxiliary tables. In this way, a free, on-demand BI statistical analysis method is realized, and BI, data statistical analysis and the like can be implemented flexibly according to customer requirements.

S700, calculating the shortest path from the link head to the target data point by using the large language model.

Specifically, a business decision analysis that involves multi-table association in theory has a plurality of possible paths. An algorithm is used to calculate the shortest path from the link head to the target data point, thereby optimizing the statistical analysis process.

In some embodiments, a path-exhaustive method is employed to calculate the shortest path from the link head to the target data point. Specifically, two data tables are first determined: the link head (i.e., the service demand point) and the target data point. The large language model then analyzes the number of conduction paths between them across the imported table structures, and the shortest path is recommended for generating the SQL statement used in the statistical analysis.
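The path-exhaustive idea can be illustrated with a deterministic sketch that enumerates every simple path between two tables and keeps the one with the fewest tables. In the patent the large language model performs this analysis; the explicit enumeration, the function names and the table graph below are assumptions for illustration.

```python
from typing import Iterator, Optional


def all_simple_paths(adj: dict[str, list[str]], start: str,
                     target: str) -> Iterator[list[str]]:
    """Yield every path from start to target that visits no table twice."""
    stack = [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == target:
            yield path
            continue
        for nxt in sorted(adj.get(path[-1], ()), reverse=True):
            if nxt not in path:
                stack.append(path + [nxt])


def shortest_path_exhaustive(adj: dict[str, list[str]], start: str,
                             target: str) -> Optional[list[str]]:
    """Enumerate all simple paths and return one with the fewest tables, or None."""
    paths = list(all_simple_paths(adj, start, target))
    return min(paths, key=len) if paths else None


# Hypothetical table graph: "orders" is the link head, "region" the target data point.
table_graph = {
    "orders": ["customer", "order_item"],
    "customer": ["orders", "region"],
    "order_item": ["orders", "product"],
    "product": ["order_item", "supplier"],
    "supplier": ["product", "region"],
    "region": ["customer", "supplier"],
}
candidate_paths = list(all_simple_paths(table_graph, "orders", "region"))
best = shortest_path_exhaustive(table_graph, "orders", "region")
```

Exhaustive enumeration is exponential in the worst case, which is consistent with the limit on chain depth described for the conductive chains; a breadth-first search would find a shortest path more cheaply but would not be the exhaustive method the text names.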

S800, generating a multidimensional query SQL statement according to the shortest path calculated by the large language model and a plurality of data tables on the shortest path.

In some embodiments, a multidimensional query SQL statement containing Where, Join, Group and similar clauses is automatically generated according to the shortest path calculated by the large language model and the data tables on the shortest path.

Specifically, the results are obtained by querying the database and are output as a data set for further analysis by the user.
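Turning a shortest path into a query with Join, Where and Group clauses and running it against a database might be sketched as follows. The function `build_query`, the `join_keys` mapping, the SQLite schema and the sample rows are all hypothetical, and the identifiers are interpolated directly into the SQL text on the assumption that they come from trusted schema metadata rather than user input.

```python
import sqlite3


def build_query(path, join_keys, group_by, measure, where=None):
    """Build SELECT ... FROM ... JOIN ... [WHERE ...] GROUP BY ... along `path`.

    join_keys maps (left_table, right_table) -> (left_column, right_column).
    """
    sql = [f"SELECT {group_by}, {measure}", f"FROM {path[0]}"]
    for left, right in zip(path, path[1:]):
        lcol, rcol = join_keys[(left, right)]
        sql.append(f"JOIN {right} ON {left}.{lcol} = {right}.{rcol}")
    if where:
        sql.append(f"WHERE {where}")
    sql.append(f"GROUP BY {group_by}")
    return "\n".join(sql)


# Hypothetical target database with a few sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE region (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customer (id INTEGER PRIMARY KEY, region_id INTEGER REFERENCES region(id));
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(id), amount REAL);
    INSERT INTO region VALUES (1, 'North'), (2, 'South');
    INSERT INTO customer VALUES (1, 1), (2, 2), (3, 1);
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 2, 50.0), (3, 3, 25.0), (4, 1, 0.0);
    """
)

path = ["orders", "customer", "region"]
join_keys = {
    ("orders", "customer"): ("customer_id", "id"),
    ("customer", "region"): ("region_id", "id"),
}
sql = build_query(path, join_keys, group_by="region.name",
                  measure="SUM(orders.amount) AS total", where="orders.amount > 0")
abstract_data_set = dict(conn.execute(sql).fetchall())
```

Here `abstract_data_set` maps each region name to its order total, which plays the role of the abstract data set returned to the user.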

In this embodiment, the large language model analyzes the table definitions and table relationship structure of the database using a specific "prompt word" composed from the system business description and the database table and field annotations. The large language model identifies the service center tables and auxiliary tables and, taking the primary key of each service center table as the center, analyzes them one by one to establish the atomic diffusion diagram and the conductive chain paths conducted through the foreign keys of the service center tables. According to the statistical analysis requirement, the large language model combines a plurality of conductive chains into a tree structure diagram, generates the corresponding SQL statement, and extracts the corresponding abstract data set from the database.

The method of this embodiment provides a new methodology for the statistical analysis of big data and multidimensional data. The large language model is applied to the whole flow of multidimensional database statistical analysis, from data preprocessing to complex query generation, and is used to process and understand business relationships. The invention uses the concepts of the atomic diffusion diagram and the conductive chain to map and analyze data relationships and generates optimized query SQL statements for them. The method is more dynamic and flexible than traditional analysis models and paths, and can adapt to continuously changing service requirements and data structures.

Compared with the conventional method, the method of the embodiment has the following characteristics: the method of the embodiment utilizes a large language model to analyze business relations among multiple tables, so that complex logic and semantic relations among data are understood and presented, and intelligent combing of the business relations is realized. The method of the present embodiment reveals intuitive relationships between data for users by forming an atomic diffusion diagram centered on a "primary key" (pk), which is rarely involved in conventional multidimensional database tools. The method of the embodiment intelligently identifies the 'primary key' (pk) through a large model and forms a hierarchical conducting chain to support deep and flexible data exploration. The method in the embodiment calculates the shortest path in the statistical analysis, optimizes the data query path, reduces the load of the database and improves the query efficiency.

The foregoing is merely a preferred embodiment of the invention. It is to be understood that the invention is not limited to the form disclosed herein, which is not to be construed as excluding other embodiments; the invention is capable of use in various other combinations, modifications and environments, and is capable of changes within the scope of the inventive concept described herein, whether through the above teachings or through the skill or knowledge of the relevant art. Modifications and variations made by those skilled in the art that do not depart from the spirit and scope of the invention are intended to be within the scope of the appended claims.

Claims

1. A method for statistical analysis of a multidimensional database based on a large model, comprising:

2. The method of claim 1, wherein importing the structure of the plurality of data tables into the large language model comprises:

3. The method of claim 1, wherein importing the structure of the plurality of data tables into the large language model comprises:

4. The method of statistical analysis of a multidimensional database based on a large model of claim 1, wherein the number of levels per conductive chain is less than 5.

5. The method of claim 1, wherein generating the multi-dimensional query SQL statement from the shortest path calculated by the large language model and the plurality of data tables on the shortest path comprises:

automatically generating a multidimensional query SQL statement containing Where, Join and Group clauses according to the shortest path calculated by the large language model and the data tables on the shortest path.

6. The method of statistical analysis of a multidimensional database based on a large model of claim 1, wherein determining the start of the conductive chain comprises:

7. A method of statistical analysis of a multidimensional database based on a large model according to claim 1, wherein a path exhaustive method is used to calculate the shortest path from the link head to the target data point.