CN113760856B - Database management method and device, computer readable storage medium and electronic equipment - Google Patents


Publication number
CN113760856B
CN113760856B CN202010504496.1A CN202010504496A CN113760856B CN 113760856 B CN113760856 B CN 113760856B CN 202010504496 A CN202010504496 A CN 202010504496A CN 113760856 B CN113760856 B CN 113760856B
Authority
CN
China
Prior art keywords
database
root
target database
capacity
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010504496.1A
Other languages
Chinese (zh)
Other versions
CN113760856A (en
Inventor
刘士超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Holding Co Ltd
Original Assignee
Jingdong Technology Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Holding Co Ltd
Priority to CN202010504496.1A
Publication of CN113760856A
Application granted
Publication of CN113760856B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention relates to a database management method and device, a computer-readable storage medium and electronic equipment, in the technical field of computers. The database management method comprises the following steps: acquiring a library directory of a target database and the root directories corresponding to the library directory; calculating the number of root directories and the capacity of each root directory, and calculating the number of subdirectories corresponding to each root directory and the capacity of each subdirectory; generating a database table of the target database according to the library directory, the root directories, the number of root directories, the capacity of each root directory, the number of subdirectories and the capacity of each subdirectory; and storing the database table of the target database in a relational database so that a user can view the database table of the target database by regular expression matching. The embodiment of the invention solves the prior-art problem that current mainstream open source big data management platforms do not count the used capacity of database tables.

Description

Database management method and device, computer readable storage medium and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a database management method, a database management device, a computer readable storage medium and electronic equipment.
Background
To make reasonable use of machine resources, big data clusters are created according to budget. Most of them are public clusters whose resources are already heavily used. If the used capacity of a certain database table suddenly increases, the overall usage rate rises and may even affect writes by other teams, causing task execution to fail and leading to incidents.
In order to solve the above problem, it is necessary to count the usage capacity of each database table in time, and process the cluster in time when the usage capacity is abnormal.
However, the current mainstream open source big data management platform does not have a statistical function on the usage capacity of the database table.
Therefore, it is desirable to provide a new database management method and apparatus.
It should be noted that the information disclosed in the background section above is only intended to enhance understanding of the background of the present invention, and may therefore include information that does not constitute prior art already known to those of ordinary skill in the art.
Disclosure of Invention
The present invention aims to provide a database management method, a database management device, a computer-readable storage medium, and an electronic apparatus, which overcome, at least to some extent, the problem that the used capacity of a database table cannot be counted due to limitations and drawbacks of the related art.
According to one aspect of the present disclosure, there is provided a database management method including:
acquiring a library directory of a target database and a root directory corresponding to the library directory;
calculating the number of root directories and the capacity of each root directory, and calculating the number of subdirectories corresponding to each root directory and the capacity of each subdirectory;
generating a database table of the target database according to the library directory, the root directory, the number of root directories, the capacity of each root directory, the number of subdirectories and the capacity of each subdirectory;
and storing the database table of the target database in a relational database, so that a user can view the database table of the target database by regular expression matching.
In an exemplary embodiment of the present disclosure, the database management method further includes:
calculating the total capacity of the target database according to the database table of the target database;
calculating the storage duty ratio of the target database in the distributed system according to the total capacity of the target database, and judging whether the storage duty ratio is greater than a first preset threshold;
and when the storage duty ratio is determined to be greater than the first preset threshold, locating the root directory and/or subdirectory generating abnormal data according to the database table of the target database.
In an exemplary embodiment of the present disclosure, the database management method further includes:
obtaining the table data under the root directory and/or subdirectory generating the abnormal data, and analyzing the cause of the abnormal data according to the table data.
In an exemplary embodiment of the present disclosure, the database management method further includes:
judging whether the storage duty ratio is greater than a second preset threshold;
when the storage duty ratio is determined to be greater than the second preset threshold, generating alarm information corresponding to the target database according to the database table of the target database;
and storing the alarm information in a target database, so that a user can expand the capacity of the distributed system according to the alarm information.
In one exemplary embodiment of the present disclosure, storing the database table of the target database into a relational database includes:
generating a data storage request according to the database table of the target database and the token of the target database;
and sending the data storage request to the relational database, so that the relational database stores the database table when the token passes verification.
In one exemplary embodiment of the present disclosure, obtaining a library directory of a target database and a root directory corresponding to the library directory includes:
acquiring, by a statistics script on a timer, the library directory of the target database and the root directory corresponding to the library directory once every preset time interval.
In an exemplary embodiment of the present disclosure, the target database is a Hive database and/or an Hbase database.
According to an aspect of the present disclosure, there is provided a database management apparatus including:
a directory acquisition module, configured to acquire a library directory of the target database and a root directory corresponding to the library directory;
a first calculation module, configured to calculate the number of root directories and the capacity of each root directory, and calculate the number of subdirectories corresponding to each root directory and the capacity of each subdirectory;
a database table generating module, configured to generate a database table of the target database according to the library directory, the root directory, the number of root directories, the capacity of each root directory, the number of subdirectories, and the capacity of each subdirectory;
and a database table storage module, configured to store the database table of the target database in the relational database, so that a user can view the database table of the target database by regular expression matching.
According to one aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the database management method of any one of the above.
According to one aspect of the present disclosure, there is provided an electronic device including:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform any of the database management methods described above via execution of the executable instructions.
According to the database management method provided by the embodiment of the invention, first, the library directory of the target database and the root directories corresponding to the library directory are obtained; the number of root directories and the capacity of each root directory are calculated, together with the number of subdirectories corresponding to each root directory and the capacity of each subdirectory; a database table of the target database is then generated from the library directory, the root directories, the number of root directories, the capacity of each root directory, the number of subdirectories and the capacity of each subdirectory; and finally the database table of the target database is stored in the relational database so that a user can conveniently view it. This solves the prior-art problem that current mainstream open source big data management platforms do not count the used capacity of database tables. Second, because the database table of the target database is stored in the relational database, a user can view it by regular expression matching, which improves viewing speed and in turn the user experience. Third, because the database table includes the library directory, the root directories, the number of root directories, the capacity of each root directory, the number of subdirectories and the capacity of each subdirectory of the target database, a user can see the capacity usage of each directory at a glance, which makes it easier to locate directories whose capacity is too large or too small.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is evident that the drawings in the following description are only some embodiments of the present invention and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
Fig. 1 schematically shows a flow chart of a database management method according to an exemplary embodiment of the invention.
Fig. 2 schematically shows a flow chart of another database management method according to an exemplary embodiment of the invention.
Fig. 3 schematically shows a flow chart of another database management method according to an exemplary embodiment of the invention.
Fig. 4 schematically shows a flow chart of another database management method according to an exemplary embodiment of the invention.
Fig. 5 schematically shows a library number trend graph according to an exemplary embodiment of the present invention.
Fig. 6 schematically shows a table number change trend graph according to an exemplary embodiment of the present invention.
Fig. 7 schematically shows a table capacity change trend graph according to an exemplary embodiment of the present invention.
Fig. 8 schematically shows a block diagram of a database management apparatus according to an exemplary embodiment of the present invention.
Fig. 9 schematically shows an electronic device for implementing the above-described database management method according to an exemplary embodiment of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known aspects have not been shown or described in detail to avoid obscuring aspects of the invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
In this exemplary embodiment, a database management method is provided first, where the method may operate on a terminal device, a server cluster, or a cloud server; of course, those skilled in the art may also operate the method of the present invention on other platforms as required, and this is not a particular limitation in the present exemplary embodiment. Referring to fig. 1, the database management method may include the steps of:
Step S110: acquiring a library directory of the target database and a root directory corresponding to the library directory.
Step S120: calculating the number of root directories and the capacity of each root directory, and calculating the number of subdirectories corresponding to each root directory and the capacity of each subdirectory.
Step S130: generating a database table of the target database according to the library directory, the root directory, the number of root directories, the capacity of each root directory, the number of subdirectories and the capacity of each subdirectory.
Step S140: storing the database table of the target database in a relational database, so that a user can view the database table of the target database by regular expression matching.
In the database management method above, first, the library directory of the target database and the root directories corresponding to the library directory are obtained; the number of root directories and the capacity of each root directory are calculated, together with the number of subdirectories corresponding to each root directory and the capacity of each subdirectory; a database table of the target database is then generated from the library directory, the root directories, the number of root directories, the capacity of each root directory, the number of subdirectories and the capacity of each subdirectory; and finally the database table of the target database is stored in the relational database so that a user can conveniently view it. This solves the prior-art problem that current mainstream open source big data management platforms do not count the used capacity of database tables. Second, because the database table of the target database is stored in the relational database, a user can view it by regular expression matching, which improves viewing speed and in turn the user experience. Third, because the database table includes the library directory, the root directories, the number of root directories, the capacity of each root directory, the number of subdirectories and the capacity of each subdirectory of the target database, a user can see the capacity usage of each directory at a glance, which makes it easier to locate directories whose capacity is too large or too small.
Hereinafter, each step involved in the database management method according to the exemplary embodiment of the present invention will be explained and described in detail with reference to the accompanying drawings.
First, terms involved in the exemplary embodiments of the present invention are explained.
HBase is a distributed, column-oriented open source database. Its design derives from the Google paper "Bigtable: A Distributed Storage System for Structured Data" by Fay Chang et al. Just as Bigtable builds on the distributed data storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike a general relational database, HBase is suited to storing unstructured data, and its schema is column-based rather than row-based.
Hive is a Hadoop-based data warehouse tool for data extraction, transformation, and loading; it provides a mechanism to store, query, and analyze large-scale data stored in Hadoop. Hive can map a structured data file to a database table, provide SQL query functions, and convert SQL statements into MapReduce tasks for execution. Its advantages are a low learning cost and the ability to run MapReduce statistics quickly through SQL-like statements, so that no dedicated MapReduce application needs to be developed. Hive is therefore well suited to statistical analysis of a data warehouse.
Further, both HBase and Hive store their data on HDFS (Hadoop Distributed File System). Each has libraries (a namespace in HBase or a database in Hive is a set of tables) and tables (the structure in which HBase and Hive store data), and each library and table corresponds to a distinct storage directory on HDFS.
Next, the purpose of the exemplary embodiments of the present invention is explained. Specifically, no open source product currently provides summary statistics on changes in HBase and Hive database tables. When the used capacity of a big data cluster grows sharply, it is therefore impossible to determine which library or table caused the sudden increase; the problem cannot be located quickly, and the service is likely to be affected. The method collects, every day, the capacity, number and file count of the libraries and tables of HDFS-based databases such as HBase and Hive. This makes it easy to view change trends, to identify large tables, and to decide from the file count whether storage needs to be optimized. According to thresholds set by the platform, when the capacity of a library or table suddenly rises or falls, an administrator is proactively notified to check whether the change is abnormal, so that problems are found in time and impact on the service is avoided.
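The notification on a sudden rise or fall can be illustrated with a minimal Python sketch. It compares two daily snapshots of table capacity and flags tables whose relative change exceeds a limit. The function name `find_sudden_changes`, the snapshot format (path mapped to used bytes) and the default limit of 50% are assumptions made for illustration; the source does not specify how the platform threshold is defined.

```python
from typing import Dict, List, Tuple


def find_sudden_changes(
    yesterday: Dict[str, int],
    today: Dict[str, int],
    max_relative_change: float = 0.5,  # hypothetical platform threshold
) -> List[Tuple[str, float]]:
    """Return (path, relative change) for tables whose capacity moved sharply."""
    flagged = []
    for path, new_size in sorted(today.items()):
        old_size = yesterday.get(path)
        if not old_size:
            # New or previously empty table: no baseline to compare against.
            continue
        change = (new_size - old_size) / old_size
        if abs(change) > max_relative_change:
            flagged.append((path, change))
    return flagged


if __name__ == "__main__":
    day1 = {"/hbase/data/test1/t1": 100, "/hbase/data/test1/t2": 100}
    day2 = {"/hbase/data/test1/t1": 400, "/hbase/data/test1/t2": 110}
    for path, change in find_sudden_changes(day1, day2):
        print(f"notify administrator: {path} changed by {change:+.0%}")
```

Both sharp increases and sharp decreases are flagged because the comparison uses the absolute value of the relative change.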
Hereinafter, steps S110 to S140 are explained.
In step S110, a library directory of a target database and a root directory corresponding to the library directory are acquired.
In the present exemplary embodiment, the target database may be a Hive database and/or an HBase database, or may be another type of database based on HDFS storage, which is not limited in this example. Specifically, a statistics script may acquire the library directory of the target database and the root directory corresponding to the library directory on a timer, once every preset time interval; the preset time interval may be, for example, one day or 12 hours.
For example, a statistics script may be built into the data management platform, and the platform then triggers the script to perform the statistics task on a timer in the same time period each day. According to the information of the cluster to be counted, the script may run the task on any HBase and/or Hive machine to obtain the library directory stored on HDFS and the root directories corresponding to the library directory. The library directory may be /hbase/data or /hive/data, and the root directories corresponding to the library directory may be /hbase/data/default, /hbase/data/test1, and so on. Of course, other forms of root directories may be included, which are not particularly limited in this example.
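As a rough illustration of triggering the statistics script at a fixed interval, the Python sketch below computes when the next run is due. The function name `next_run_time` and the idea of an anchor time (the first scheduled run) are assumptions for illustration; the source only states that the script runs at a preset interval such as one day or 12 hours.

```python
from datetime import datetime, timedelta


def next_run_time(now: datetime, interval: timedelta, anchor: datetime) -> datetime:
    """Return the first scheduled run strictly after `now`.

    `anchor` is a hypothetical first run time; later runs fall at
    anchor + k * interval, so a daily interval always lands on the same
    time of day.
    """
    if now < anchor:
        return anchor
    elapsed_intervals = (now - anchor) // interval  # whole intervals elapsed
    return anchor + (elapsed_intervals + 1) * interval


if __name__ == "__main__":
    anchor = datetime(2020, 6, 5, 2, 0)
    now = datetime(2020, 6, 7, 10, 30)
    print(next_run_time(now, timedelta(days=1), anchor))
    print(next_run_time(now, timedelta(hours=12), anchor))
```

A scheduler could sleep until the returned time and then invoke the statistics script; in practice a cron entry or the platform's own task scheduler would serve the same purpose.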
In step S120, the number of each root directory and the capacity of each root directory are calculated, and the number of subdirectories corresponding to each root directory and the capacity of each subdirectory are calculated.
In the present exemplary embodiment, after the library directories and the root directories corresponding to them are acquired, the statistics script may count the number of root directories under each library directory and the number of subdirectories under each root directory, and may then use the du command to measure the capacity of each root directory and of each of its subdirectories, where capacity refers to the capacity already used.
For example, HBase stores its data on HDFS under the directory /hbase/data, which holds the libraries and tables of HBase. The statistics script counts all directories under this directory and uses the du command to measure the capacity of each one; these figures are the number of HBase libraries and the capacity of each library. For example, if /hbase/data contains two directories, default and test1, these are the two namespaces of the current HBase database, and the script measures the capacity of both with the Hadoop du command. After counting the number and capacity of the libraries, the script counts the number and capacity of the directories under each library directory, that is, the number and capacity of the tables in that library, using the Hadoop count command for the number of directories and du for the capacity. For example, if the test1 directory contains 100 directories, the test1 library of the HBase database contains 100 tables, and the capacity of each table is measured with du. The number of files under each library and table is obtained through the Hadoop count command.
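The statistics described above can be sketched in Python as two small parsers over the text printed by `hadoop fs -du <path>` and `hadoop fs -count <path>`. The function names and the sample figures are hypothetical. The sketch parses text that has already been captured rather than invoking Hadoop itself, and it assumes paths contain no whitespace.

```python
from typing import Dict, Tuple


def parse_du(output: str) -> Dict[str, int]:
    """Map each path listed by `hadoop fs -du` to its used size in bytes."""
    sizes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # The first column is the size and the last column is the path.
        # Newer Hadoop releases print an extra middle column (space consumed
        # across all replicas), which this sketch ignores.
        sizes[fields[-1]] = int(fields[0])
    return sizes


def parse_count(output: str) -> Tuple[int, int, int]:
    """Return (dir_count, file_count, content_size) from `hadoop fs -count`."""
    fields = output.split()
    return int(fields[0]), int(fields[1]), int(fields[2])


if __name__ == "__main__":
    # Made-up sample output for the two namespaces named in the text.
    du_text = (
        "1024  3072  /hbase/data/default\n"
        "4096  12288  /hbase/data/test1\n"
    )
    libraries = parse_du(du_text)
    print(len(libraries), "libraries:", libraries)
    print(parse_count("101 2500 4096 /hbase/data/test1"))
```

Running the same two parsers on each library directory in turn would give the number and capacity of the tables in that library. Note that the directory count reported by `hadoop fs -count` includes the queried directory itself and any nested directories, so deriving a table count from it needs care.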
In step S130, a database table of the target database is generated according to the library directory, the root directory, the number of root directories, the capacity of each root directory, the number of subdirectories, and the capacity of each subdirectory.
In the present exemplary embodiment, after the library directory, the root directory, the number of root directories, the capacity of each root directory, the number of subdirectories, and the capacity of each subdirectory are obtained, the database table of the target database may be generated from them. In this way, the library and table whose capacity has suddenly risen or fallen can be located quickly, along with the amount of space occupied and the number of files, so that the business can adjust in time and use the HBase and Hive services reasonably.
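One possible shape for the generated database table is sketched below in Python: one row per root directory and one row per subdirectory, plus the overall number of root directories. The layout, the field names and the function `build_capacity_table` are assumptions for illustration; the source lists the inputs to the table but not its schema.

```python
from typing import Dict, List


def build_capacity_table(
    library_dir: str,
    root_sizes: Dict[str, int],
    sub_sizes: Dict[str, Dict[str, int]],
) -> dict:
    """Combine directory counts and capacities into one table-like structure."""
    rows: List[dict] = []
    for root, size in sorted(root_sizes.items()):
        tables = sub_sizes.get(root, {})
        rows.append({
            "library_dir": library_dir,
            "path": root,
            "level": "root",
            "capacity": size,
            "subdirectory_count": len(tables),
        })
        for sub, sub_size in sorted(tables.items()):
            rows.append({
                "library_dir": library_dir,
                "path": sub,
                "level": "sub",
                "capacity": sub_size,
                "subdirectory_count": 0,
            })
    return {"library_dir": library_dir, "root_count": len(root_sizes), "rows": rows}


if __name__ == "__main__":
    table = build_capacity_table(
        "/hbase/data",
        {"/hbase/data/default": 30, "/hbase/data/test1": 70},
        {"/hbase/data/test1": {"/hbase/data/test1/t1": 50, "/hbase/data/test1/t2": 20}},
    )
    for row in table["rows"]:
        print(row)
```

Keeping the full path in every row lets a later query select a library or a table with a single pattern over the `path` field.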
In step S140, the database table of the target database is stored in a relational database, so that the user can check the database table of the target database in a regular expression matching manner.
In the present exemplary embodiment, first, the database table of the target database is stored in the relational database. Storing the database table of the target database in the relational database may include: first, generating a data storage request according to the database table of the target database and the token of the target database; and second, sending the data storage request to the relational database, so that the relational database stores the database table once it confirms that the token passes verification. The token is provided to the target database after the target database has been authenticated by the relational database. A data storage request can therefore be generated from the token and the database table; after the relational database receives the data storage request, it stores the database table once the token passes verification. In the relational database, the stored key may be the database name of the target database, and the value may be the database table corresponding to the target database.
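The token-checked storage request can be sketched as follows in Python, with an in-memory SQLite database standing in for the relational database. The class `CapacityStore`, the function `build_storage_request`, the request fields and the JSON encoding of the value are all assumptions for illustration, and the sketch assumes the token has already been issued after authentication.

```python
import hmac
import json
import sqlite3
from typing import Optional


def build_storage_request(db_name: str, table: dict, token: str) -> dict:
    """Bundle the database table with the token of the target database."""
    return {"db_name": db_name, "table": table, "token": token}


class CapacityStore:
    """Stand-in for the relational database side (hypothetical)."""

    def __init__(self, issued_token: str) -> None:
        self._token = issued_token
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE capacity (db_name TEXT PRIMARY KEY, table_json TEXT)"
        )

    def handle(self, request: dict) -> bool:
        """Store the table only if the token passes verification."""
        supplied = str(request.get("token", "")).encode()
        if not hmac.compare_digest(supplied, self._token.encode()):
            return False
        # Key: database name of the target database; value: its database table.
        self._db.execute(
            "INSERT OR REPLACE INTO capacity VALUES (?, ?)",
            (request["db_name"], json.dumps(request["table"])),
        )
        self._db.commit()
        return True

    def load(self, db_name: str) -> Optional[dict]:
        row = self._db.execute(
            "SELECT table_json FROM capacity WHERE db_name = ?", (db_name,)
        ).fetchone()
        return json.loads(row[0]) if row else None


if __name__ == "__main__":
    store = CapacityStore("secret-token")
    ok = store.handle(build_storage_request("hbase", {"root_count": 2}, "secret-token"))
    print(ok, store.load("hbase"))
```

`hmac.compare_digest` is used so that the comparison time does not depend on how many leading characters of the token match.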
Second, after the database table has been stored successfully, a user who needs to view the database table of the target database can do so directly by regular expression matching. For example, the subdirectories under the root directory default under the data library directory of the HBase database can be viewed directly with a pattern of the form hbase+data+default. This improves the viewing speed.
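Viewing by regular expression matching might look like the following Python sketch, which again uses an in-memory SQLite database as a stand-in for the relational database. Python's `sqlite3` module does not define the `REGEXP` operator by default, so the sketch registers one backed by `re.search`. The table layout, the function names and the pattern `^/hbase/data/default/` are illustrative assumptions and not the matching syntax of the source.

```python
import re
import sqlite3
from typing import Iterable, List, Tuple


def _regexp(pattern: str, value: str) -> int:
    # SQLite evaluates `X REGEXP Y` by calling regexp(Y, X).
    return 1 if re.search(pattern, value) else 0


def open_store(rows: Iterable[Tuple[str, int]]) -> sqlite3.Connection:
    """Create an in-memory table of (path, capacity) rows."""
    db = sqlite3.connect(":memory:")
    db.create_function("REGEXP", 2, _regexp)
    db.execute("CREATE TABLE capacity (path TEXT NOT NULL, capacity INTEGER)")
    db.executemany("INSERT INTO capacity VALUES (?, ?)", rows)
    return db


def view(db: sqlite3.Connection, pattern: str) -> List[Tuple[str, int]]:
    """Return the rows whose path matches the regular expression."""
    cursor = db.execute(
        "SELECT path, capacity FROM capacity WHERE path REGEXP ? ORDER BY path",
        (pattern,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = open_store([
        ("/hbase/data/default/t1", 10),
        ("/hbase/data/test1/t1", 20),
        ("/hive/data/default/t2", 30),
    ])
    print(view(db, r"^/hbase/data/default/"))
```

Database servers such as MySQL and PostgreSQL provide their own regular expression operators, so the registration step is specific to this SQLite stand-in.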
Fig. 2 schematically illustrates another database management method according to an exemplary embodiment of the present invention. Referring to fig. 2, the database management method may further include steps S210 to S230. Wherein:
in step S210, the total capacity of the target database is calculated from the database table of the target database.
In step S220, a storage duty ratio of the target database in the distributed system is calculated according to the total capacity of the target database, and it is determined whether the storage duty ratio is greater than a first preset threshold.
In step S230, when it is determined that the storage duty ratio is greater than the first preset threshold, the root directory and/or the subdirectory generating the abnormal data is located according to the database table of the target database.
Hereinafter, steps S210 to S230 are explained. First, the total capacity of the target database can be calculated from the capacity of each directory (the capacity of the root directories and the capacity of the subdirectories) in the database table of the target database. The storage duty ratio of the target database in the distributed system, that is, the share of the distributed system's storage used by the target database, is then calculated from the total capacity of the target database (its used capacity) and the total capacity of the distributed system (its total usable capacity), and it is judged whether the storage duty ratio is greater than a first preset threshold. The first preset threshold can be determined from the average of the daily storage duty ratios of the target database over a period of time. If the storage duty ratio is greater than the first preset threshold, the data of the database is abnormal, and the root directory and/or subdirectory generating the abnormal data can be located directly from the database table. This speeds up locating the problem, so that an administrator can analyze the cause of the abnormality as soon as possible.
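Steps S210 to S230 can be sketched in Python as follows. Two choices here are assumptions rather than part of the source: the total capacity sums only the root directories, on the assumption that the du figure for a root directory already includes its subdirectories, and candidate abnormal directories are found simply by ranking directories by capacity. All function names are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence


def storage_duty_ratio(root_sizes: Dict[str, int], cluster_capacity: int) -> float:
    """Share of the distributed system's usable capacity used by the database."""
    # Assumption: each root directory's size already includes its subdirectories.
    return sum(root_sizes.values()) / cluster_capacity


def first_threshold(daily_ratios: Sequence[float]) -> float:
    """First preset threshold: the average of recent daily storage duty ratios."""
    return mean(daily_ratios)


def largest_directories(sizes: Dict[str, int], top_n: int = 3) -> List[str]:
    """Assumed locating heuristic: the directories occupying the most capacity."""
    return sorted(sizes, key=lambda path: sizes[path], reverse=True)[:top_n]


if __name__ == "__main__":
    roots = {"/hbase/data/default": 300, "/hbase/data/test1": 100}
    ratio = storage_duty_ratio(roots, cluster_capacity=1000)
    threshold = first_threshold([0.20, 0.30, 0.25])
    if ratio > threshold:
        print("abnormal; inspect first:", largest_directories(roots, top_n=1))
```

Comparing each directory against its own previous capacity would be an alternative way to locate the source of the growth.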
Meanwhile, to further prevent the abnormal data from affecting the distributed system and, in turn, other users, the database management method may further include: obtaining the table data under the root directory and/or subdirectory generating the abnormal data, and analyzing the cause of the abnormal data according to the table data. If the cause is a surge in data due to normal traffic growth, the capacity can be expanded directly or no action need be taken; if the cause is malicious, corresponding measures can be taken against the data producer generating the abnormal data, thereby improving the security of the distributed system.
It should be noted that a storage duty ratio that is too small may also be regarded as abnormal; subdirectories and/or root directories whose capacity is below a certain value may then be deleted, to avoid the database table containing so many directories that it is inconvenient to view.
Fig. 3 schematically illustrates another database management method according to an exemplary embodiment of the invention. Referring to fig. 3, the database management method may further include steps S310 to S330. Wherein:
In step S310, it is determined whether the storage duty ratio is greater than a second preset threshold.
In step S320, when it is determined that the storage duty ratio is greater than a second preset threshold, alarm information corresponding to the target database is generated according to the database table of the target database.
In step S330, the alarm information is stored in a target database, so that the user can expand the capacity of the distributed system according to the alarm information.
Hereinafter, steps S310 to S330 are explained. First, it is judged whether the storage duty ratio is larger than a second preset threshold; the second preset threshold can be ninety percent or another value set as needed, and is not particularly limited in this example. Second, if the storage duty ratio is larger than ninety percent, corresponding alarm information is generated so that a manager can expand the distributed system, thereby avoiding any impact on other users. Of course, if the storage duty ratio is between seventy percent and ninety percent, a warning can also be issued, and the manager can judge whether capacity expansion is needed according to the actual situation.
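As a minimal sketch of the two-tier check described above, the following assumes the ninety percent and seventy percent values given as examples in the text. The names `classify` and `build_alarm`, and the fields of the alarm record, are hypothetical.

```python
def classify(ratio, warn_at=0.70, alarm_at=0.90):
    """Map a storage duty ratio to an alert level."""
    if ratio > alarm_at:
        return "alarm"
    if ratio >= warn_at:
        return "warning"
    return "normal"


def build_alarm(db_name, ratio, table_rows):
    """Build alarm information from the database table, or None if normal."""
    level = classify(ratio)
    if level == "normal":
        return None
    # Include the largest directory so the manager knows where to look first.
    largest = max(table_rows, key=lambda r: r["capacity"])
    return {
        "database": db_name,
        "level": level,
        "ratio": ratio,
        "largest_directory": largest["path"],
    }


rows = [
    {"path": "Hbase/data/default", "capacity": 100},
    {"path": "Hbase/data/test1", "capacity": 900},
]
alarm = build_alarm("Hbase", 0.93, rows)
```

Both limits are parameters, in keeping with the statement that the second preset threshold can be set as needed.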
The database management method according to the exemplary embodiment of the present invention is further explained and illustrated in conjunction with fig. 4. Referring to fig. 4, the database management method may include the steps of:
In step S410, the program has a built-in scheduled task and collects statistics every day.
In step S420, according to the parameters of the statistics script, statistics are collected for the designated Hbase and Hive and stored in a relational database. Specifically, a client machine of Hbase or Hive is selected and Hadoop commands are executed on it to count sizes and quantities; the paths are recorded by the self-developed platform and supplied to the statistics script. For example, Hbase is stored on HDFS under the directory Hbase/data, which holds the libraries and tables of Hbase. The statistics script counts all directories under this directory and uses the du command to obtain the number of Hbase libraries and the capacity of each library. For example, if there are two directories under Hbase/data, default and test1, then the Hbase database has two libraries, and the script obtains the capacity of each of the two directories through the Hadoop du command. After counting the number and capacity of the libraries, the script counts the number and capacity of the directories under each library directory, that is, the number and capacity of the tables in that library, again using the Hadoop count and du commands. For example, if there are 100 directories under the test1 directory, there are 100 tables in the test1 library of the Hbase database, and the capacity of each table is calculated with du. The number of files under each library and table is obtained through the Hadoop count command. Finally, the data are transmitted into the database of the self-developed platform through an interface provided by the platform, using token authentication.
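The parsing part of such a statistics script might look like the sketch below. It assumes the usual text output of `hadoop fs -du` (size first, path last on each line) and of `hadoop fs -count` (directory count, file count, content size, path). The helper names and the sample paths are hypothetical, and paths containing spaces are not handled.

```python
import subprocess


def run_hadoop(*args):
    """Run a `hadoop fs` subcommand on the client machine and return its stdout."""
    completed = subprocess.run(
        ["hadoop", "fs", *args], check=True, capture_output=True, text=True
    )
    return completed.stdout


def parse_du(output):
    """Parse `hadoop fs -du <dir>` output into {path: size_in_bytes}."""
    sizes = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 2 or not parts[0].isdigit():
            continue  # skip blank lines and anything that is not a data row
        sizes[parts[-1]] = int(parts[0])
    return sizes


def parse_count(output):
    """Parse `hadoop fs -count <path>` output into per-path counters."""
    counts = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) < 4 or not parts[0].isdigit():
            continue
        dirs, files, size = (int(p) for p in parts[:3])
        counts[parts[3]] = {"dirs": dirs, "files": files, "bytes": size}
    return counts


# Sample text standing in for real command output (values are made up).
sample_du = "1024  3072  /hbase/data/default\n4096  12288  /hbase/data/test1\n"
sample_count = "         101          250               4096 /hbase/data/test1\n"
library_sizes = parse_du(sample_du)
table_counts = parse_count(sample_count)
```

On a client machine, `parse_du(run_hadoop("-du", path))` would be called once for the library directory and once for each library beneath it, with the path supplied by the platform.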
In step S430, the library and table information is summarized according to conditions so that the user can view it as required. Specifically, to view the library and table statistics, the user clicks through to them on the detail page for Hbase, Hive or the like on the platform. For example, a specific presentation for Hbase is shown in fig. 5, which shows a trend graph of the number of libraries under Hbase, so that a user can view the trend of change in library capacity.
Further, clicking on the details opens the library, where the number of tables, their capacities and their file counts can be viewed. Likewise, a bar graph shows the trend of the number of tables in this library (see fig. 6), and a list shows the current details of each table. Similarly, clicking on the trend link after a table name shows the trend of change in that table's capacity (see fig. 7).
Furthermore, after the library and table information is collected each day, the self-developed platform calculates the increase or decrease for each library and table; if the increase exceeds the threshold set by the platform, alarm information is sent to an administrator, prompting the administrator to check whether there is an abnormality. If the capacity of a table has not changed recently, the business party can be consulted on whether the table is still in use and whether it can be cleaned up, thereby reducing the number of useless tables in the library.
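The daily comparison described above can be illustrated with the following sketch. The snapshot format (a mapping from table name to capacity in bytes), the use of the absolute change so that sudden decreases are also reported, and the seven-day window for unchanged tables are all assumptions of this example.

```python
def capacity_changes(yesterday, today):
    """Return {table: change_in_bytes} for every table seen on either day."""
    names = set(yesterday) | set(today)
    return {n: today.get(n, 0) - yesterday.get(n, 0) for n in names}


def tables_over_threshold(changes, threshold_bytes):
    """Tables whose capacity grew or shrank by more than the platform threshold."""
    return sorted(n for n, delta in changes.items() if abs(delta) > threshold_bytes)


def stale_tables(history, days=7):
    """Tables whose capacity was identical in each of the last `days` snapshots."""
    stale = []
    for name, series in history.items():
        recent = series[-days:]
        if len(recent) == days and len(set(recent)) == 1:
            stale.append(name)
    return sorted(stale)


changes = capacity_changes(
    {"t1": 100, "t2": 500},
    {"t1": 900, "t2": 500, "t3": 10},
)
to_alert = tables_over_threshold(changes, threshold_bytes=100)
to_review = stale_tables({"t1": [100] * 6 + [900], "t2": [500] * 7})
```

Tables in `to_alert` would trigger alarm information for the administrator, while tables in `to_review` are candidates to raise with the business party.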
The database management method provided by the example embodiment of the invention has at least the following advantages:
On the one hand, using the configuration information collected during cluster deployment, such as the location on HDFS where the libraries are stored and the client information, a client executes Hadoop commands against the configured storage directories at a scheduled time every day, and the Hbase and Hive library and table statistics, including capacity, quantity and file count, are stored in the platform database. This makes troubleshooting easier, and a sudden increase or decrease can trigger an alarm notification.
On the other hand, when a storage capacity alarm is raised for the Hbase and Hive libraries and tables, it can be quickly determined which library or table caused it, or whether the business has many useless tables. Meanwhile, changes in the capacity, number and file count of the Hbase and Hive libraries and tables can be viewed conveniently, making it easy to locate which library or table has suddenly grown or shrunk, or how much space and how many files it occupies, so that the business can be adjusted in time and the Hbase and Hive services used reasonably.
In yet another aspect, based on the self-developed big data management platform, the statistics script is executed at a scheduled time every day to obtain library and table statistics for each HDFS-based database on the platform, such as Hbase and Hive, and the collected information is stored in the relational database of the self-developed platform.
Further, the number of libraries, library capacity, number of tables, table capacity and file count of Hbase and Hive in the existing big data cluster are counted daily, and recent trends can be viewed, which makes it easy to locate which libraries and tables have changed significantly in the recent period. When collecting statistics, library names and table names can be matched by regular expression, which avoids the long statistics runs that would result from counting a large number of unimportant libraries or tables.
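Regular-expression matching of library and table names, as mentioned above, could be sketched as follows. The function name, the pair layout and the sample patterns are hypothetical; whole-name matching with `re.fullmatch` is a choice made for this example.

```python
import re


def select_tables(entries, library_pattern, table_pattern):
    """Keep (library, table) pairs whose names fully match both patterns."""
    lib_re = re.compile(library_pattern)
    tbl_re = re.compile(table_pattern)
    return [
        (lib, tbl)
        for lib, tbl in entries
        if lib_re.fullmatch(lib) and tbl_re.fullmatch(tbl)
    ]


entries = [
    ("default", "orders"),
    ("test1", "orders_2020"),
    ("test1", "tmp_scratch"),
    ("sandbox", "orders"),
]
# Count only order tables in the default library and in libraries named test<N>.
selected = select_tables(entries, r"default|test\d+", r"orders.*")
```

Filtering before the du and count commands are issued keeps the statistics run short, since unmatched libraries and tables are never visited.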
The embodiment of the invention also provides a database management device. Referring to fig. 8, the database management apparatus may include a directory acquisition module 810, a first calculation module 820, a database table generation module 830, and a database table storage module 840. Wherein:
The directory acquisition module 810 may be configured to acquire a library directory of the target database and a root directory corresponding to the library directory.
The first calculation module 820 may be configured to calculate the number of each of the root directories and the capacity of each of the root directories, and calculate the number of sub-directories corresponding to each of the root directories and the capacity of each of the sub-directories.
The database table generating module 830 may be configured to generate a database table of the target database according to the library directory, the root directory, the number of root directories, the capacity of root directories, the number of subdirectories, and the capacity of subdirectories.
The database table storage module 840 may be configured to store the database table of the target database into a relational database, so that a user can view the database table of the target database in a regular expression matching manner.
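One way to picture the data passed between the first calculation module and the database table generation module is the sketch below. The `DirectoryRow` layout, the nested-dictionary input and the rule that a root directory's capacity is the sum of its subdirectories are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DirectoryRow:
    library: str
    root: str
    subdirectory: str  # empty string for a root-level row
    capacity: int      # bytes


def build_database_table(library, tree):
    """Build rows from {root directory: {subdirectory: capacity in bytes}}."""
    rows = []
    for root, subdirs in tree.items():
        # Assumed: the root capacity is the sum of its subdirectory capacities.
        rows.append(DirectoryRow(library, root, "", sum(subdirs.values())))
        for sub, capacity in subdirs.items():
            rows.append(DirectoryRow(library, root, sub, capacity))
    return rows


def summarize(rows):
    """Counts and total capacity, using root rows only to avoid double counting."""
    roots = [r for r in rows if r.subdirectory == ""]
    subs = [r for r in rows if r.subdirectory != ""]
    return {
        "root_count": len(roots),
        "subdirectory_count": len(subs),
        "total_capacity": sum(r.capacity for r in roots),
    }


table = build_database_table(
    "Hbase/data", {"default": {"t1": 10, "t2": 20}, "test1": {"a": 5}}
)
summary = summarize(table)
```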
In an exemplary embodiment of the present disclosure, the database management apparatus further includes:
The second calculation module can be used for calculating the total capacity of the target database according to the database table of the target database;
The third calculation module can be used for calculating the storage duty ratio of the target database in the distributed system according to the total capacity of the target database and judging whether the storage duty ratio is larger than a first preset threshold value or not;
And the directory positioning module is used for positioning the root directory and/or the subdirectory for generating the abnormal data according to the database table of the target database when the storage duty ratio is determined to be larger than a first preset threshold value.
In an exemplary embodiment of the present disclosure, the database management apparatus further includes:
The abnormal reason analysis module can be used for acquiring the table data corresponding to the root directory and/or the sub-directory generating the abnormal data and analyzing the reason for generating the abnormal data according to the table data.
In an exemplary embodiment of the present disclosure, the database management apparatus further includes:
the storage duty ratio judging module can be used for judging whether the storage duty ratio is larger than a second preset threshold value or not;
The alarm information generation module can be used for generating alarm information corresponding to the target database according to the database table of the target database when the storage duty ratio is determined to be larger than a second preset threshold;
And the alarm information storage module can be used for storing the alarm information into a target database so as to facilitate the capacity expansion of the distributed system according to the alarm information by a user.
In one exemplary embodiment of the present disclosure, storing the database table of the target database into a relational database includes:
Generating a data storage request according to the database table of the target database and the token of the target database;
and sending the data storage request to the relational database so that the relational database stores the database table when the token passes verification.
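The token-verified storage described above may be sketched as follows, using an in-memory SQLite database as a stand-in for the relational database. The class name, table schema, request layout and the shared-secret comparison are hypothetical; the embodiment does not specify how the token is verified.

```python
import hmac
import sqlite3


def make_storage_request(rows, token):
    """Bundle database table rows with the token of the target database."""
    return {"token": token, "rows": rows}


class RelationalStore:
    """Stand-in for the relational database side of the interface."""

    def __init__(self, expected_token):
        self._expected = expected_token.encode()
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE library_table_stats ("
            "library TEXT, table_name TEXT, capacity INTEGER, file_count INTEGER)"
        )

    def handle(self, request):
        """Store the rows only when the token passes verification."""
        supplied = request["token"].encode()
        if not hmac.compare_digest(supplied, self._expected):
            return False
        self._conn.executemany(
            "INSERT INTO library_table_stats VALUES (?, ?, ?, ?)", request["rows"]
        )
        self._conn.commit()
        return True

    def row_count(self):
        cursor = self._conn.execute("SELECT COUNT(*) FROM library_table_stats")
        return cursor.fetchone()[0]


store = RelationalStore("secret-token")
rejected = store.handle(make_storage_request([("test1", "t1", 4096, 12)], "wrong"))
accepted = store.handle(make_storage_request([("test1", "t1", 4096, 12)], "secret-token"))
```

A request carrying the wrong token leaves the table untouched, which mirrors the condition that the database table is stored only when the token passes verification.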
In one exemplary embodiment of the present disclosure, obtaining a library directory of a target database and a root directory corresponding to the library directory includes:
acquiring, by means of a statistics script run at preset time intervals, the library directory of the target database and the root directory corresponding to the library directory.
In an exemplary embodiment of the present disclosure, the target database is a Hive database and/or an Hbase database.
The details of each module in the above database management device are described in detail in the corresponding database management method, so that they will not be described in detail here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the invention. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the steps of the methods of the present invention are depicted in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in this particular order or that all of the steps shown must be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
In an exemplary embodiment of the present invention, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, method, or program product. Accordingly, aspects of the invention may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein collectively as a "circuit," "module," or "system."
An electronic device 900 according to such an embodiment of the invention is described below with reference to fig. 9. The electronic device 900 shown in fig. 9 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 9, the electronic device 900 is embodied in the form of a general purpose computing device. Components of electronic device 900 may include, but are not limited to: the at least one processing unit 910, the at least one storage unit 920, a bus 930 connecting the different system components (including the storage unit 920 and the processing unit 910), and a display unit 940.
Wherein the storage unit stores program code that is executable by the processing unit 910 such that the processing unit 910 performs steps according to various exemplary embodiments of the present invention described in the above-described "exemplary methods" section of the present specification. For example, the processing unit 910 may perform step S110 as shown in fig. 1: acquiring a library catalog of a target database and a root catalog corresponding to the library catalog; step S120: calculating the number of the root directories and the capacity of the root directories, and calculating the number of the subdirectories corresponding to the root directories and the capacity of the subdirectories; step S130: generating a database table of the target database according to the library catalogue, the root catalogue, the number of root catalogues, the capacity of the root catalogue, the number of subdirectories and the capacity of the subdirectories; step S140: and storing the database table of the target database into a relational database so as to facilitate the user to check the database table of the target database in a regular expression matching mode.
The storage unit 920 may include readable media in the form of volatile storage units, such as Random Access Memory (RAM) 9201 and/or cache memory 9202, and may further include Read Only Memory (ROM) 9203.
The storage unit 920 may also include a program/utility 9204 having a set (at least one) of program modules 9205, such program modules 9205 include, but are not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus 930 may be one or more of several types of bus structures including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 900 may also communicate with one or more external devices 1000 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 900, and/or with any device (e.g., router, modem, etc.) that enables the electronic device 900 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 950. Also, electronic device 900 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the Internet, through network adapter 960. As shown, the network adapter 960 communicates with other modules of the electronic device 900 over the bus 930. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with electronic device 900, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present invention may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, and includes several instructions to cause a computing device (may be a personal computer, a server, a terminal device, or a network device, etc.) to perform the method according to the embodiments of the present invention.
In an exemplary embodiment of the present invention, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
A program product for implementing the above-described method according to an embodiment of the present invention may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium would include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

Claims (7)

1. A method of database management, comprising:
acquiring a library catalog of a target database and a root catalog corresponding to the library catalog, wherein the target database is a Hive database and/or an Hbase database;
calculating the number of the root directories and the capacity of the root directories, and calculating the number of the subdirectories corresponding to the root directories and the capacity of the subdirectories;
generating a database table of the target database according to the library catalogue, the root catalogue, the number of root catalogues, the capacity of the root catalogue, the number of subdirectories and the capacity of the subdirectories;
storing the database table of the target database into a relational database so as to facilitate the user to check the database table of the target database in a regular expression matching mode;
The method further comprises the steps of: calculating the total capacity of the target database according to the database table of the target database;
Calculating the storage duty ratio of the target database in a distributed system according to the total capacity of the target database, and judging whether the storage duty ratio is larger than a first preset threshold value, wherein the first preset threshold value is determined according to the average value of the storage duty ratio of each day of the target database in a period of time;
When the storage duty ratio is determined to be larger than a first preset threshold value, positioning a root directory and/or a subdirectory for generating abnormal data according to a database table of the target database;
And obtaining the corresponding table data under the root directory and/or the sub-directory generating the abnormal data, and analyzing the reason for generating the abnormal data according to the table data.
2. The database management method according to claim 1, wherein the database management method further comprises:
judging whether the storage duty ratio is larger than a second preset threshold value or not;
When the storage duty ratio is determined to be larger than a second preset threshold value, generating alarm information corresponding to the target database according to a database table of the target database;
And storing the alarm information into a target database so as to facilitate the expansion of the distributed system by a user according to the alarm information.
3. The database management method according to claim 1, wherein storing the database table of the target database into a relational database comprises:
Generating a data storage request according to the database table of the target database and the token of the target database;
and sending the data storage request to the relational database so that the relational database stores the database table when the token passes verification.
4. The database management method according to claim 1, wherein obtaining a library directory of a target database and a root directory corresponding to the library directory comprises:
and acquiring a library catalog of the target database and a root catalog corresponding to the library catalog at intervals of preset time through a statistical script at regular time.
5. A database management apparatus, comprising:
the directory acquisition module is used for acquiring a library directory of a target database and a root directory corresponding to the library directory, wherein the target database is a Hive database and/or an Hbase database;
a first calculation module, configured to calculate the number of root directories and the capacity of the root directories, and calculate the number of subdirectories corresponding to the root directories and the capacity of the subdirectories;
a database table generating module, configured to generate a database table of the target database according to the library catalogue, the root catalogue, the number of root catalogues, the capacity of root catalogues, the number of subdirectories, and the capacity of subdirectories;
The database table storage module is used for storing the database table of the target database into the relational database so as to facilitate the user to check the database table of the target database in a regular expression matching mode;
The database management apparatus further includes:
The second calculation module can be used for calculating the total capacity of the target database according to the database table of the target database;
The third calculation module can be used for calculating the storage duty ratio of the target database in the distributed system according to the total capacity of the target database and judging whether the storage duty ratio is larger than a first preset threshold value or not;
The directory positioning module is configured to, when it is determined that the storage duty ratio is greater than a first preset threshold, position a root directory and/or a subdirectory that generate abnormal data according to a database table of the target database, where the first preset threshold is determined according to an average value of storage duty ratios of each day of the target database in a period of time;
The abnormal reason analysis module can be used for acquiring the table data corresponding to the root directory and/or the sub-directory generating the abnormal data and analyzing the reason for generating the abnormal data according to the table data.
6. A computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the database management method of any of claims 1-4.
7. An electronic device, comprising:
a processor; and
A memory for storing executable instructions of the processor;
wherein the processor is configured to perform the database management method of any of claims 1-4 via execution of the executable instructions.
CN202010504496.1A 2020-06-05 2020-06-05 Database management method and device, computer readable storage medium and electronic equipment Active CN113760856B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010504496.1A CN113760856B (en) 2020-06-05 2020-06-05 Database management method and device, computer readable storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010504496.1A CN113760856B (en) 2020-06-05 2020-06-05 Database management method and device, computer readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN113760856A CN113760856A (en) 2021-12-07
CN113760856B true CN113760856B (en) 2024-06-18

Family

ID=78783949

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010504496.1A Active CN113760856B (en) 2020-06-05 2020-06-05 Database management method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113760856B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610279A (en) * 2021-07-20 2021-11-05 中国石油大学(华东) Accident prediction method based on data set regularity

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105335405A (en) * 2014-07-29 2016-02-17 北京奇虎科技有限公司 System file detection method and device

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH05334249A (en) * 1992-05-28 1993-12-17 Nec Corp Nesting management system for catalog in interactive processing system
CN102722487B (en) * 2011-03-30 2016-08-24 腾讯科技(深圳)有限公司 File management method and device
US8843456B2 (en) * 2011-12-06 2014-09-23 International Business Machines Corporation Database table compression
CN103870603A (en) * 2014-04-03 2014-06-18 联想(北京)有限公司 Directory management method and electronic device
CN104239493B (en) * 2014-09-09 2017-05-10 北京京东尚科信息技术有限公司 cross-cluster data migration method and system
TW201738778A (en) * 2016-04-28 2017-11-01 原形研發股份有限公司 Multi hard-disk file management system and method thereof
CN109117083B (en) * 2017-06-26 2021-05-28 深圳回收宝科技有限公司 Mobile terminal, built-in storage capacity detection method, and computer-readable storage medium
TWI656451B (en) * 2018-03-23 2019-04-11 中華電信股份有限公司 Method of managing content structure configuration and apparatus using the same
CN111209259B (en) * 2018-11-22 2023-09-05 杭州海康威视***技术有限公司 NAS distributed file system and data processing method

Also Published As

Publication number Publication date
CN113760856A (en) 2021-12-07


Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information
CB02 Change of applicant information

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: Room 221, 2 / F, block C, 18 Kechuang 11th Street, Daxing District, Beijing, 100176

Applicant before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

SE01 Entry into force of request for substantive examination
GR01 Patent grant