CN108664635B - Method, device, equipment and storage medium for acquiring database statistical information - Google Patents

Method, device, equipment and storage medium for acquiring database statistical information Download PDF

Info

Publication number
CN108664635B
CN108664635B CN201810462476.5A CN201810462476A CN108664635B CN 108664635 B CN108664635 B CN 108664635B CN 201810462476 A CN201810462476 A CN 201810462476A CN 108664635 B CN108664635 B CN 108664635B
Authority
CN
China
Prior art keywords
statistical information
database
statement
data
statistical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810462476.5A
Other languages
Chinese (zh)
Other versions
CN108664635A (en
Inventor
耿慧玲
万伟
韩朱忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dameng Database Co Ltd
Original Assignee
Shanghai Dameng Database Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dameng Database Co Ltd filed Critical Shanghai Dameng Database Co Ltd
Priority to CN201810462476.5A priority Critical patent/CN108664635B/en
Publication of CN108664635A publication Critical patent/CN108664635A/en
Application granted granted Critical
Publication of CN108664635B publication Critical patent/CN108664635B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a method, a device, equipment and a storage medium for acquiring statistical information of a database. The method for acquiring the statistical information of the database comprises the following steps: converting the statistical information collection instruction of the user into a corresponding SQL statement; performing semantic analysis on the SQL statement to extract a SELECT statement executable by a database; executing a SELECT statement through a database to obtain a data set which accords with a statistical information collection instruction; and counting the data set to obtain statistical information. The embodiment of the invention greatly reduces the waste and the running time of system resources, improves the performance and simultaneously reduces the implementation and maintenance difficulty.

Description

Method, device, equipment and storage medium for acquiring database statistical information
Technical Field
The embodiment of the invention relates to a database technology, in particular to a method, a device, equipment and a storage medium for acquiring statistical information of a database.
Background
In a relational database, statistical information is a necessary aid for the optimizer of the database to generate an optimal execution plan. The statistical information generally includes information such as data line number, data average width, number of different values, distribution condition of values, etc., and the optimizer judges the influence condition of the filtering condition in the query on the data result set according to the information and the overhead of the system when performing different calculation operations.
Currently, it is common practice to acquire statistical information by scanning an index (scanning an aggregate index of the table for column statistical information, and scanning the index for statistical information of the index), parsing all columns of the index from the read data page, and then performing sorting, grouping, analysis, and other processing on the data as a whole to obtain information such as the number of data lines, the average width of the data, the number of different values, and the distribution status of the values.
However, with the increase of the data scale in the above method, operations such as scanning, analyzing, sorting and the like of data require a large amount of I/O, memory and computing resources, and it will also take more and more time, and particularly, under the data scale that the memory requirement cannot be met, there will be a disk-scrubbing operation, which will seriously affect the performance of statistical information generation.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, a device, and a storage medium for acquiring database statistical information, so as to greatly reduce waste and running time of system resources, improve performance, and reduce implementation and maintenance difficulties.
In a first aspect, an embodiment of the present invention provides a method for acquiring database statistical information, including:
converting the statistical information collection instruction of the user into a corresponding SQL statement;
performing semantic analysis on the SQL statement to extract a SELECT statement executable by a database;
executing the SELECT statement through the database to obtain a data set which accords with the statistical information collection instruction;
and counting the data set to obtain statistical information, wherein the statistical information comprises data line number, data average width, number of different values and distribution of the different values.
Optionally, after the extracting the SELECT statement executable by the database, the method further includes:
the SELECT statement is tagged with the keyword STAT.
Optionally, after the statistical information is obtained from the data set, the method further includes:
and writing the statistical information into a database system table.
Optionally, the converting the statistical information collection instruction of the user into a corresponding SQL statement includes:
and extracting a statistical object and a sampling rate from the statistical information collection instruction, and converting the statistical information collection instruction into the SQL statement according to the statistical object and the sampling rate.
In a second aspect, an embodiment of the present invention provides an apparatus for acquiring database statistical information, including:
the user instruction conversion module is used for converting the statistical information collection instruction of the user into a corresponding SQL statement;
the SQL processing module is used for carrying out semantic analysis on the SQL statement and extracting a SELECT statement which can be executed by a database; executing the SELECT statement through the database to obtain a data set which accords with the statistical information collection instruction;
and the data collection processing module is used for counting the data set to obtain statistical information, and the statistical information comprises data line number, data average width, number of different values and distribution of the different values.
Optionally, the SQL processing module is further configured to mark the SELECT statement with a keyword STAT.
Optionally, the data collection processing module is further configured to write the statistical information into a database system table.
Optionally, the user instruction conversion module is specifically configured to extract a statistical object and a sampling rate from the statistical information collection instruction, and convert the statistical information collection instruction into the SQL statement according to the statistical object and the sampling rate.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the method for obtaining the database statistical information according to any one of the above first aspects when executing the program.
In a fourth aspect, embodiments of the present invention provide a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method for obtaining database statistics as described in any one of the above first aspects.
According to the embodiment of the invention, the database is used for executing the SELECT statement, and the SQL optimizer and the executor of the database are used for completing the collection and processing of data, so that not only can the optimizer be used for selecting a proper index for the database, but also the existing optimization and parallel processing of the executor can be used, the waste and the running time of system resources are greatly reduced, the performance is improved, and the implementation and maintenance difficulty is reduced.
Drawings
Fig. 1 is a flowchart of a method for obtaining statistical information of a database according to an embodiment of the present invention;
fig. 2 is a flowchart of a database statistical information obtaining method according to a second embodiment of the present invention;
fig. 3 is a schematic structural diagram of an apparatus for acquiring statistical information of a database according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device according to a fourth embodiment of the present invention.
Detailed Description
The embodiments of the present invention will be described in further detail with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad invention. It should be further noted that, for convenience of description, only some structures, not all structures, relating to the embodiments of the present invention are shown in the drawings.
Example one
Fig. 1 is a flowchart of a method for acquiring database statistical information according to an embodiment of the present invention, where the method may be executed by a computer device, and specifically includes the following steps:
step 101, converting a statistical information collection instruction of a user into a corresponding SQL statement;
the computer equipment extracts the statistical object and the sampling rate from the statistical information collecting instruction and converts the statistical information collecting instruction into the SQL statement according to the statistical object and the sampling rate. For example, the user needs to make statistical information with a sampling rate of 100% for a certain column C belonging to a certain table T, and the user inputs a statistical information collection instruction:
STAT 100ON T(C)
the computer device converts it into an SQL statement:
STAT 100ON SELECT C,COUNT(*),
SYSFUN_1(C),SYSFUN_2(C)..,SYSFUN_N(C)FROM T GROUP BY C ORDER BY C
wherein STAT represents that the statement is a statistical information collection statement, 100 represents that the sampling rate is 100%, ON represents that statistical data comes from a following SELECT statement, C in the SELECT statement is a statistical object column, T is a table to which C belongs, sysfan _1, sysfan _2, and sysfan _ N are all system calculation functions, and are used for acquiring required information such as average data width, and the like, and reduction can be added as required, GROUP BY C is that data is grouped according to the C column, ORDER BY C is that data is sorted according to the C column, and COUNT (×) represents the number of rows of data in each GROUP.
102, carrying out semantic analysis on the SQL statement, and extracting a SELECT statement which can be executed by a database;
the database itself may execute some statements, which can perform lexical, syntactic, semantic analysis and execution ON general add-delete-modify-check SQL statements, while the unexpanded database may identify "STAT 100ON SELECT.
103, executing a SELECT statement through a database to obtain a data set which accords with a statistical information collection instruction;
the database may perform lexical, syntactic, semantic analysis and execution on the SELECT statement, in this embodiment, the computer device executes the SELECT statement through the database, for example, the database conventionally parses the SELECT statement, that is, SELECT C, COUNT (, sysfan _1(C), sysfan _2(C), sysfan _ n (C), FROM T GROUP BY C ORDER BY C, to obtain a data set conforming to the statistical information collection instruction. It should be noted that the SELECT statement is executed in the database according to a common query statement, and may utilize some existing optimization techniques and processing modules, for example, if there is a secondary index whose first column is the sequence, the sorting operation in the original processing data may be saved by index scanning, and the grouping operation may also greatly reduce the utilization of computing resources because the data is sorted; in addition, the results of the SELECT statement are grouped, a large number of repeated values are removed under a general condition (because the same values after grouping only need to appear once), so that a lot of unnecessary data transmission is reduced, resource consumption of subsequent processing is also saved, the existing mature module (SQL processing module of the original database) can be reused, an optimal mode is used for scanning and sorting data on the whole, and the overall performance of statistical information generation is greatly improved.
And 104, counting the data set to obtain statistical information, wherein the statistical information comprises data line number, data average width, number of different values and distribution of the different values.
The computer device traverses the data set to obtain statistical information (at least comprising data line number, data average width, number of different values and distribution of different values) from the data set, and writes the statistical information into a database system table. Since the data sets are grouped and sorted according to the normal query statement, the statistical information can be acquired more easily and more quickly than the conventional statistical information collection method.
According to the technical scheme of the embodiment, the database is used for executing the SELECT statement, the SQL optimizer and the executor of the database are used for completing the collection and processing of data, the optimizer can be used for selecting a proper index for the database, the existing optimization and parallel processing of the executor can be used, the waste and running time of system resources are greatly reduced, the performance is improved, and the implementation and maintenance difficulty is reduced.
Example two
Fig. 2 is a flowchart of a method for acquiring statistical information of a database according to a second embodiment of the present invention, where the method may be executed by a computer device, and specifically includes the following steps:
step 201, converting the statistical information collection instruction of the user into a corresponding SQL statement.
For example, STAT 100ON T (C) conversion
STAT 100 ON SELECT C,COUNT(*),
SYSFUN_1(C),SYSFUN_2(C)..,SYSFUN_N(C)FROM T GROUP BY C ORDER BY C。
Step 202, performing semantic analysis on the SQL statement.
If the semantic analysis is correct, go to step 203, otherwise report an error.
Step 203, extracting the SELECT statement from the SQL statement, and marking the SELECT statement with STAT.
The SELECT statement in the above example is: SELECT C, COUNT (, sysfan _1(C), sysfan _2(C), sysfan _ n (C) FROM T GROUP BY C ORDER BY C.
And 204, performing semantic analysis on the SELECT statement.
And step 205, generating an optimal execution plan for the SELECT statement.
An optimal execution plan is generated after comprehensive judgment according to factors such as indexes and parallel parameters.
And step 206, executing the optimal execution plan to generate a data set.
And step 207, counting the data set to obtain statistical information, and writing the statistical information into a database system table.
EXAMPLE III
Fig. 3 is a schematic structural diagram of an apparatus for acquiring database statistical information according to a third embodiment of the present invention, and referring to fig. 3, the apparatus includes: the system comprises a user instruction conversion module 11, an SQL processing module 12 and a data collection processing module 13, wherein the user instruction conversion module 11 is used for converting a statistical information collection instruction of a user into a corresponding SQL statement; the SQL processing module 12 is configured to perform semantic analysis on the SQL statement and extract a SELECT statement that can be executed by the database; executing the SELECT statement through the database to obtain a data set which accords with the statistical information collection instruction; and the data collection processing module 13 is configured to count the data sets to obtain statistical information, where the statistical information includes a data line number, a data average width, numbers of different values, and distribution of the different values.
On the basis of the above technical solution, the SQL processing module 12 is further configured to mark the SELECT statement with a keyword STAT.
On the basis of the above technical solution, the data collection processing module 13 is further configured to write the statistical information into a database system table.
On the basis of the above technical solution, the user instruction conversion module 11 is specifically configured to extract a statistical object and a sampling rate from the statistical information collection instruction, and convert the statistical information collection instruction into the SQL statement according to the statistical object and the sampling rate.
The device for acquiring the statistical information of the database provided by the embodiment of the invention can execute the method for acquiring the statistical information of the database provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example four
Fig. 4 is a schematic structural diagram of a computer apparatus according to a fourth embodiment of the present invention, as shown in fig. 4, the computer apparatus includes a processor 20, a memory 21, an input device 22, and an output device 23; the number of the processors 20 in the computer device may be one or more, and one processor 20 is taken as an example in fig. 4; the processor 20, the memory 21, the input device 22 and the output device 23 in the computer apparatus may be connected by a bus or other means, and the connection by the bus is exemplified in fig. 4.
The memory 21 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to the database statistical information obtaining method in the embodiment of the present invention. The processor 20 executes various functional applications and data processing of the computer device by executing software programs, instructions and modules stored in the memory 21, that is, the above-mentioned method for acquiring the database statistical information is realized.
The memory 21 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 21 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, memory 21 may further include memory located remotely from processor 20, which may be connected to a computer device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 22 may be used to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the computer apparatus. The output device 23 may include a display device such as a display screen.
EXAMPLE five
The fifth embodiment of the present invention further provides a storage medium containing computer-executable instructions, where the computer-executable instructions are used to perform relevant operations in the database statistical information obtaining method provided in any embodiment of the present invention when executed by a computer processor.
From the above description of the embodiments, it is obvious for those skilled in the art that the embodiments of the present invention can be implemented by software and necessary general hardware, and certainly can be implemented by hardware, but the former is a better implementation in many cases. Based on such understanding, the technical solutions of the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, and includes several instructions to make a computer device (which may be a personal computer, a server, or a network device) perform the methods described in the embodiments of the present invention.
It should be noted that, in the embodiment of the apparatus, the included units and modules are merely divided according to functional logic, but are not limited to the above division as long as the corresponding functions can be implemented; in addition, specific names of the functional units are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the embodiment of the invention.
It should be noted that the foregoing is only a preferred embodiment of the present invention and the technical principles applied. Those skilled in the art will appreciate that the embodiments of the present invention are not limited to the specific embodiments described herein, and that various obvious changes, adaptations, and substitutions are possible, without departing from the scope of the embodiments of the present invention. Therefore, although the embodiments of the present invention have been described in more detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments, and many other equivalent embodiments may be included without departing from the concept of the embodiments of the present invention, and the scope of the embodiments of the present invention is determined by the scope of the appended claims.

Claims (9)

1. A method for obtaining statistical information of a database is characterized by comprising the following steps:
converting the statistical information collection instruction of the user into a corresponding SQL statement;
performing semantic analysis on the SQL statement to extract a SELECT statement executable by a database;
executing the SELECT statement through the database to obtain a data set which accords with the statistical information collection instruction;
and counting the data set to obtain statistical information, wherein the statistical information comprises data line number, data average width, number of different values and distribution of the different values.
2. The method of claim 1, wherein after the extracting the SELECT statement executable by the database, further comprising:
the SELECT statement is tagged with the keyword STAT.
3. The method according to claim 1 or 2, further comprising, after said obtaining statistical information from said statistics of said data sets:
and writing the statistical information into a database system table.
4. The method according to claim 1, wherein the converting the statistical information collection instruction of the user into the corresponding SQL statement comprises:
and extracting a statistical object and a sampling rate from the statistical information collection instruction, and converting the statistical information collection instruction into the SQL statement according to the statistical object and the sampling rate.
5. An apparatus for obtaining statistical information of a database, comprising:
the user instruction conversion module is used for converting the statistical information collection instruction of the user into a corresponding SQL statement;
the SQL processing module is used for carrying out semantic analysis on the SQL statement and extracting a SELECT statement which can be executed by a database; executing the SELECT statement through the database to obtain a data set which accords with the statistical information collection instruction;
and the data collection processing module is used for counting the data set to obtain statistical information, and the statistical information comprises data line number, data average width, number of different values and distribution of the different values.
6. The apparatus of claim 5, wherein the SQL processing module is further configured to mark the SELECT statement with a key STAT.
7. The apparatus of claim 5 or 6, wherein the data collection processing module is further configured to write the statistical information into a database system table.
8. The apparatus according to claim 5, wherein the user instruction converting module is specifically configured to extract a statistical object and a sampling rate from the statistical information collecting instruction, and convert the statistical information collecting instruction into the SQL statement according to the statistical object and the sampling rate.
9. A storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method of database statistics acquisition as claimed in any one of claims 1 to 4.
CN201810462476.5A 2018-05-15 2018-05-15 Method, device, equipment and storage medium for acquiring database statistical information Active CN108664635B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810462476.5A CN108664635B (en) 2018-05-15 2018-05-15 Method, device, equipment and storage medium for acquiring database statistical information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810462476.5A CN108664635B (en) 2018-05-15 2018-05-15 Method, device, equipment and storage medium for acquiring database statistical information

Publications (2)

Publication Number Publication Date
CN108664635A CN108664635A (en) 2018-10-16
CN108664635B true CN108664635B (en) 2020-12-04

Family

ID=63778561

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810462476.5A Active CN108664635B (en) 2018-05-15 2018-05-15 Method, device, equipment and storage medium for acquiring database statistical information

Country Status (1)

Country Link
CN (1) CN108664635B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704472A (en) * 2019-08-27 2020-01-17 北京数知科技股份有限公司 Data query statistical method and device
CN111190897B (en) * 2019-11-07 2023-04-18 腾讯科技(深圳)有限公司 Information processing method, information processing apparatus, storage medium, and server
CN112925856B (en) * 2019-12-06 2023-09-19 ***通信集团重庆有限公司 Entity relationship analysis method, entity relationship analysis device, entity relationship analysis equipment and computer storage medium
CN111274270B (en) * 2020-02-24 2020-09-18 北京东方金信科技有限公司 Statistical information processing and using method of database optimizer and storage device
CN111401009B (en) * 2020-03-17 2024-03-01 深圳市铭墨科技有限公司 Digital expression character recognition conversion method, device, server and storage medium
CN114238395A (en) * 2022-01-06 2022-03-25 税友软件集团股份有限公司 Database optimization method and device, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040243555A1 (en) * 2003-05-30 2004-12-02 Oracle International Corp. Methods and systems for optimizing queries through dynamic and autonomous database schema analysis
US7797319B2 (en) * 2006-05-15 2010-09-14 Algebraix Data Corporation Systems and methods for data model mapping
CN103279509B (en) * 2013-05-17 2016-05-25 北京首钢自动化信息技术有限公司 A kind of method for quickly querying that adopts dynamic queries language
CN104679646B (en) * 2013-11-29 2018-02-06 阿里巴巴集团控股有限公司 A kind of method and apparatus for detecting SQL code defect

Also Published As

Publication number Publication date
CN108664635A (en) 2018-10-16

Similar Documents

Publication Publication Date Title
CN108664635B (en) Method, device, equipment and storage medium for acquiring database statistical information
CN104899314B (en) A kind of parentage analysis method and apparatus of data warehouse
CN105320679B (en) A kind of tables of data indexed set symphysis is at method and device
CN111708860A (en) Information extraction method, device, equipment and storage medium
US20080140627A1 (en) Method and apparatus for aggregating database runtime information and analyzing application performance
CN111400338A (en) SQ L optimization method, device, storage medium and computer equipment
CN106780149A (en) A kind of equipment real-time monitoring system based on timed task scheduling
CN106919612A (en) A kind of processing method and processing device of SQL script of reaching the standard grade
CN106844782B (en) Network-oriented multi-channel big data acquisition system and method
CN112434003B (en) SQL optimization method and device, computer equipment and storage medium
CN109284088B (en) Signaling big data processing method and electronic equipment
CN116016628A (en) API gateway buried point analysis method and device
CN111984625B (en) Database load characteristic processing method and device, medium and electronic equipment
CN106919566A (en) A kind of query statistic method and system based on mass data
KR102345410B1 (en) Big data intelligent collecting method and device
CN109299132B (en) SQL data processing method and system and electronic equipment
CN110287114A (en) A kind of method and device of database script performance test
CN103955192B (en) A kind of curve form data sampling method for sewage work
CN106682107B (en) Method and device for determining incidence relation of database table
Shao et al. An improved approach to the recovery of traceability links between requirement documents and source codes based on latent semantic indexing
CN110895529B (en) Processing method of structured query language and related device
CN111143329A (en) Data processing method and device
CN112435151A (en) Government affair information data processing method and system based on correlation analysis
CN110569243A (en) data query method, data query plug-in and data query server
CN117112780B (en) Unstructured log analysis method and device based on text abstract model

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant