CN113342832B - Database indexing method

Info

Publication number: CN113342832B
Application number: CN202110888568.1A
Authority: CN
Inventors: 骆彬
Original assignee: Beijing Fast Cube Technology Co ltd
Current assignee: Beijing Fast Cube Technology Co ltd
Priority date: 2021-08-04
Filing date: 2021-08-04
Publication date: 2021-11-02
Anticipated expiration: 2041-08-04
Also published as: CN113342832A

Abstract

The invention is suitable for the technical field of electric digital processing, and particularly relates to a database indexing method, which comprises the following steps: receiving a database index request, wherein the database index request comprises a retrieval key value; performing index operation on the database according to the retrieval key value in the database index request; the database comprises an index area and a data storage area, wherein the index area is generated according to data stored in the data storage area. According to the database indexing method provided by the embodiment of the invention, the creation of the index is completed according to the content of the data in the data storage area, so that the memory amount occupied by the index can be greatly reduced, and the effective data storage amount is improved.

Description

Database indexing method

Technical Field

The invention belongs to the technical field of electric digital processing, and particularly relates to a database indexing method.

Background

A database is a container for storing data; it has a very large storage capacity and can hold hundreds of millions of records. However, data cannot be stored arbitrarily: the contents of the database must be stored according to defined rules, and if the storage layout is unreasonable, query efficiency becomes extremely low and use of the data is affected.

In current databases, an index is generally created to make the contents easier to retrieve. An index is a separate, physical storage structure that sorts the values of one or more columns of a database table. It is comparable to the table of contents of a book: the corresponding data can be found quickly by consulting it, which improves retrieval efficiency.

In existing indexing methods, indexes are created directly in proportion to the data volume, so the number of indexes is large and they occupy a large amount of memory. Although this improves data retrieval speed, it reduces the effective storage space and the utilization of the storage space, and indirectly increases cost.

Disclosure of Invention

The embodiment of the invention aims to provide a database indexing method, aiming at solving the problems in the background technology.

The embodiment of the invention is realized in such a way that a database indexing method comprises the following steps:

receiving a database index request, wherein the database index request comprises a retrieval key value;

performing index operation on the database according to the retrieval key value in the database index request;

the database comprises an index area and a data storage area, wherein the index area is generated according to data stored in the data storage area;

the step of generating the index area according to the data stored in the data storage area specifically includes:

analyzing data to be stored to obtain a data analysis result, wherein the data analysis result at least comprises data storage time and data content type;

generating index data directory entries according to the data analysis results;

writing the data to be stored into a corresponding data storage area according to the data storage time, wherein the data storage area is divided into at least two independent storage areas, each independent storage area corresponds to a storage time period, the storage time periods do not overlap one another, the sum of the storage time periods equals the daily use time of the database, and each independent storage area is divided into independent storage sub-areas according to data content type;

and writing index data directory entries into an index area, wherein the index area is divided into at least two independent index areas, and the number of the independent index areas is the same as that of the independent storage areas.

Preferably, the step of performing an index operation on the database according to the retrieval key value in the database index request specifically includes:

analyzing a retrieval key value in a database index request to obtain a fuzzy retrieval value, wherein the fuzzy retrieval value is used for representing data characteristics contained in the retrieval key value;

index retrieval is carried out on the index area according to the fuzzy retrieval value to obtain index identification, the index identification comprises time index identifiers and/or partition index identifiers, the time index identifiers correspond to the independent storage areas one by one, and each independent storage area has an independent storage subarea corresponding to the same partition index identifier;

and retrieving the independent storage area corresponding to the index identifier by taking the retrieval key value as a retrieval source to obtain an index operation result.

Preferably, the step of retrieving the independent storage area corresponding to the index identifier by using the retrieval key value as the retrieval source further includes determining the index identifier:

if the index identification only contains the time index identifier, positioning a corresponding independent storage area in the data storage area according to the time index identifier;

if the index identification only contains the partition index identifier, positioning corresponding independent storage subareas in all the independent storage areas according to the partition index identifier;

if the index identifier contains both the time index identifier and the partition index identifier, the corresponding independent storage area is positioned according to the time index identifier, and then the corresponding independent storage sub-area in the independent storage area is positioned according to the partition index identifier.

Preferably, the step of analyzing the retrieval key value in the database index request to obtain the fuzzy retrieval value includes:

extracting keywords from the retrieval key value to obtain a keyword group, wherein the keyword group at least comprises one keyword;

judging whether time information is recorded according to the keywords in the keyword group;

if the time information is recorded, the time information is extracted, and a fuzzy search value is generated.

Preferably, the data in the independent storage subareas are stored by adopting a T-tree structure.

Preferably, the number of the independent storage areas is 24.

Preferably, the data in the data storage area is encrypted when stored and decrypted when read.

Another object of an embodiment of the present invention is to provide a database indexing system, including:

a request receiving module, configured to receive a database index request, where the database index request includes a retrieval key value;

the operation execution module is used for carrying out index operation on the database according to the retrieval key value in the database index request;

the database comprises an index area and a data storage area, wherein the index area is generated according to data stored in the data storage area.

Preferably, the operation executing module includes:

the analysis unit is used for analyzing the retrieval key value in the database index request to obtain a fuzzy retrieval value, and the fuzzy retrieval value is used for representing the data characteristics contained in the retrieval key value;

the index retrieval unit is used for performing index retrieval on the index areas according to the fuzzy retrieval value to obtain index identifications, wherein the index identifications comprise time index identifiers and/or partition index identifiers, the time index identifiers correspond to the independent storage areas one by one, and each independent storage area has an independent storage sub-area corresponding to the same partition index identifier;

and the data retrieval unit is used for retrieving the independent storage area corresponding to the index identifier by taking the retrieval key value as a retrieval source to obtain an index operation result.

Preferably, the analysis unit includes:

the keyword extraction subunit is used for extracting keywords from the retrieval key values to obtain a keyword group, wherein the keyword group at least comprises one keyword;

a judging subunit, configured to judge whether time information is recorded according to the keywords in the keyword group;

According to the database indexing method provided by the embodiment of the invention, the creation of the index is completed according to the content of the data in the data storage area, so that the memory amount occupied by the index can be greatly reduced, and the effective data storage amount is improved.

Drawings

FIG. 1 is a flowchart of a database indexing method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating steps of generating an index area according to data stored in a data storage area according to an embodiment of the present invention;

FIG. 3 is a flowchart illustrating steps of performing an index operation on a database according to a retrieval key in a database index request according to an embodiment of the present invention;

FIG. 4 is a flowchart illustrating a step of retrieving the independent storage area corresponding to the index identifier by using the retrieval key value as the retrieval source according to an embodiment of the present invention;

FIG. 5 is a flowchart illustrating a step of resolving a search key value in a database index request to obtain a fuzzy search value according to an embodiment of the present invention;

FIG. 6 is an architecture diagram of a database indexing system provided by an embodiment of the present invention;

FIG. 7 is an architecture diagram of an operation execution module provided in an embodiment of the present invention;

FIG. 8 is an architecture diagram of a parsing unit according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.

FIG. 1 is a flowchart of a database indexing method provided in an embodiment of the present invention. The method includes:

S100, receiving a database index request, wherein the database index request comprises a retrieval key value.

In this step, a database index request containing a retrieval key value is received. The database index request is a retrieval instruction sent by a user and carries the information the user entered for retrieval.

S200, indexing the database according to the retrieval key value in the database indexing request, wherein the database comprises an indexing area and a data storage area, and the indexing area is generated according to data stored in the data storage area.

In this step, an index operation is performed on the database according to the retrieval key value in the database index request. The database has an index area and a data storage area: the data storage area stores the data, and the index area stores the corresponding index. When a database index request is received, the index area is first searched according to the request, and the corresponding result is then extracted from the data storage area according to that search result. The index area is generated according to the data stored in the data storage area. In the prior art, the index is built on the data itself, so a large number of indexes must be built. This is comparable to building a table of contents for a book: adding more directory entries improves retrieval speed, but as the number of entries grows, the table of contents may come to occupy one third of the thickness of the whole book. The present invention reduces the number of indexes generated and saves memory, and retrieval remains efficient because the amount of data in the area an index points to is limited.
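By way of illustration only, the two-area layout of S200 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the class name `Database`, the attribute names `index_area` and `storage_area`, and the use of a (time period, content type) tuple as the region key are all hypothetical. The sketch shows only that the number of directory entries grows with the number of storage regions rather than with the number of records.

```python
from collections import defaultdict


class Database:
    """Toy two-area database: a small index area plus a partitioned storage area."""

    def __init__(self):
        # Data storage area: region key -> list of records held in that region.
        self.storage_area = defaultdict(list)
        # Index area: one directory entry per region, not one per record.
        self.index_area = {}

    def store(self, region_key, record):
        self.storage_area[region_key].append(record)
        # The directory entry is derived from the stored data (here, its region).
        self.index_area[region_key] = {"region": region_key}

    def lookup(self, region_key, predicate):
        # First consult the index area to find the region.
        entry = self.index_area.get(region_key)
        if entry is None:
            return []
        # Then scan only that region of the data storage area.
        return [r for r in self.storage_area[entry["region"]] if predicate(r)]


db = Database()
for i in range(1000):
    db.store(("09:00-10:00", "name_list"), f"employee-{i}")
db.store(("10:00-11:00", "cargo_list"), "crate-7")

print(len(db.index_area))  # 2 directory entries for 1001 records
print(db.lookup(("10:00-11:00", "cargo_list"), lambda r: r == "crate-7"))
```

In this sketch the lookup cost is bounded by the size of one region, which corresponds to the statement that the amount of data in the area an index points to is limited.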

As shown in fig. 2, as a preferred embodiment of the present invention, the step of generating the index area according to the data stored in the data storage area specifically includes:

S301, analyzing the data to be stored to obtain a data analysis result, wherein the data analysis result at least comprises data storage time and data content type.

In this step, the data to be stored is analyzed to obtain a data analysis result that includes at least the data storage time and the data content type. The data content type describes what the data contains: for example, the information stored in a given data block may be a name list, a number list, a process method, or a cargo list. The data is classified according to this content, and the storage time of each item of data is also recorded in the data analysis result.

S302, generating index data directory entries according to the data analysis result.

In this step, an index data directory entry is generated according to the data analysis result, the index data directory entry is the specific content of the index, and the data content type in the corresponding region can be known according to the specific content of the index, so that the classification can be performed quickly.

S303, writing the data to be stored into the corresponding data storage area according to the data storage time.

In this step, the data to be stored is written into the corresponding data storage area according to its storage time. The data in the data storage area is encrypted when stored and decrypted when read. The data storage area is divided into at least two independent storage areas, each corresponding to a storage time period; the storage time periods do not overlap, and their sum equals the daily use time of the database. Each independent storage area is further divided into independent storage sub-areas according to data content type. For example, suppose the database is in use for 12 hours a day and, during the remaining 12 hours, it is under maintenance, receives no writes, or is occupied by other operations. The 12 hours of use are divided in steps of one hour, giving 12 consecutive, non-overlapping storage time periods of one hour each. The data storage area is accordingly divided into 12 independent storage areas, each of which stores the data written during its own storage time period. Storage is thus partitioned by time period, so that a preliminary retrieval by time period can be performed during data retrieval, which greatly narrows the retrieval range and improves retrieval efficiency.
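The division of the daily use time into storage time periods can be illustrated with the following minimal Python sketch. The function names `build_periods` and `area_for` are hypothetical, and the 08:00 start of the use window is an assumption made only for the example; the description specifies the 12-hour duration and the one-hour step but not the start time.

```python
from datetime import datetime, time


def minutes(t: time) -> int:
    """Minutes elapsed since midnight."""
    return t.hour * 60 + t.minute


def build_periods(start: time, hours_in_use: int, step_minutes: int = 60):
    """Split the daily use window into consecutive, non-overlapping periods.

    Returns (begin_minute, end_minute) pairs, each half-open: [begin, end).
    """
    total = hours_in_use * 60
    if total % step_minutes:
        raise ValueError("step must divide the daily use time evenly")
    first = minutes(start)
    return [(first + k * step_minutes, first + (k + 1) * step_minutes)
            for k in range(total // step_minutes)]


def area_for(stored_at: datetime, start: time, hours_in_use: int,
             step_minutes: int = 60) -> int:
    """Index of the independent storage area that receives a write."""
    offset = minutes(stored_at.time()) - minutes(start)
    if not 0 <= offset < hours_in_use * 60:
        raise ValueError("timestamp falls outside the daily use window")
    return offset // step_minutes


# Assumed use window: 08:00 to 20:00 (12 hours), one-hour step.
periods = build_periods(time(8, 0), hours_in_use=12)
print(len(periods))                                           # 12
print(area_for(datetime(2021, 8, 4, 9, 30), time(8, 0), 12))  # 1
```

Because the periods are half-open and consecutive, every timestamp inside the use window maps to exactly one independent storage area, and the period lengths add up to the daily use time.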

S304, writing the index data directory entry into an index area, wherein the index area is divided into at least two independent index areas, and the number of the independent index areas is the same as that of the independent storage areas.

In this step, the index data directory entries are written into the index area. Each independent index area corresponds to one independent storage area, so the two are equal in number: if the data storage area is currently divided into 12 areas, the index area is also divided into 12 independent index areas, in one-to-one correspondence. Each independent index area records the position of its corresponding independent storage area.
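Steps S301 to S304 can be sketched together as a single write path. The following Python code is a sketch under stated assumptions rather than the implementation of the embodiment: `DirectoryEntry`, `analyze`, and `store` are hypothetical names, the use window is assumed to start at 08:00 with 12 one-hour periods, and each record is assumed to arrive already labeled with a `content_type` field, whereas the description derives the content type by analyzing the data.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple

NUM_AREAS = 12         # assumed: 12 one-hour storage time periods
WINDOW_START_HOUR = 8  # assumed start of the daily use window


@dataclass(frozen=True)
class DirectoryEntry:
    area: int          # which independent storage area holds the data
    content_type: str  # which sub-area inside that storage area


def analyze(record: dict) -> Tuple[int, str]:
    """S301: derive the storage time period and the content type of a record."""
    area = record["stored_at"].hour - WINDOW_START_HOUR
    if not 0 <= area < NUM_AREAS:
        raise ValueError("record stored outside the daily use window")
    return area, record["content_type"]


# One independent storage area and one independent index area per period.
storage_areas = [dict() for _ in range(NUM_AREAS)]  # content_type -> records
index_areas = [set() for _ in range(NUM_AREAS)]     # directory entries


def store(record: dict) -> DirectoryEntry:
    area, content_type = analyze(record)                             # S301
    entry = DirectoryEntry(area, content_type)                       # S302
    storage_areas[area].setdefault(content_type, []).append(record)  # S303
    index_areas[area].add(entry)                                     # S304
    return entry


entry = store({"stored_at": datetime(2021, 8, 4, 9, 15),
               "content_type": "name_list",
               "payload": ["Alice", "Bob"]})
print(entry)
print(len(index_areas) == len(storage_areas))  # True: areas are paired one-to-one
```

Using a set for each independent index area means that storing many records of the same content type in the same period still produces a single directory entry.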

As shown in fig. 3, as a preferred embodiment of the present invention, the step of performing an index operation on the database according to the retrieval key value in the database index request specifically includes:

S201, analyzing a retrieval key value in the database index request to obtain a fuzzy retrieval value, wherein the fuzzy retrieval value is used for representing data characteristics contained in the retrieval key value.

In this step, the retrieval key value in the database index request is parsed to identify the content it contains and to extract its data characteristics. A corresponding fuzzy retrieval value is generated from these data characteristics and is used to perform a fuzzy search in the index area, thereby preliminarily determining the location of the data.

S202, performing index retrieval on the index area according to the fuzzy retrieval value to obtain an index identifier.

In this step, index retrieval is performed on the index area according to the fuzzy retrieval value to obtain the corresponding index identification. The index identification includes time index identifiers and/or partition index identifiers. The time index identifiers correspond one-to-one to the independent storage areas, and each independent storage area has an independent storage sub-area corresponding to the same partition index identifier. A time index identifier therefore points to one particular independent storage area, whereas a partition index identifier points to the same-typed independent storage sub-area in every independent storage area. The data in the independent storage sub-areas is stored in a T-tree structure. The number of independent storage areas is 24; with the 12-hour daily use time of the example above, each independent storage area then stores half an hour of data.
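For readers unfamiliar with the T-tree structure mentioned above, the following minimal Python sketch shows a lookup in such a tree. As commonly described, a T-tree is a binary tree in which each node holds a sorted array of several keys; keys smaller than the minimum of a node lie in its left subtree, and keys larger than its maximum lie in its right subtree. The description does not specify the node layout, so `TTreeNode` and `ttree_search` are hypothetical names, integer keys are an assumption, and insertion and rebalancing are omitted.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TTreeNode:
    keys: List[int]                      # sorted, non-empty keys held in this node
    left: Optional["TTreeNode"] = None   # keys below min(keys)
    right: Optional["TTreeNode"] = None  # keys above max(keys)


def ttree_search(node: Optional[TTreeNode], key: int) -> bool:
    """Return True if key is present in the T-tree rooted at node."""
    while node is not None:
        if key < node.keys[0]:
            node = node.left
        elif key > node.keys[-1]:
            node = node.right
        else:
            # The key lies within this node's bounds: binary-search its array.
            i = bisect_left(node.keys, key)
            return node.keys[i] == key
    return False


# Hand-built tree for illustration only.
root = TTreeNode([40, 50, 60],
                 left=TTreeNode([10, 20, 30]),
                 right=TTreeNode([70, 80, 90]))

print(ttree_search(root, 20))  # True
print(ttree_search(root, 55))  # False
```

Only the two boundary keys of a node are compared while descending, and the binary search within a node is performed at most once per lookup.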

S203, the independent storage area corresponding to the index identification is searched by taking the search key value as a search source to obtain an index operation result.

In the step, the retrieval range is narrowed according to the index identification, when the retrieval range is narrowed to the minimum, all data in the minimum range are traversed by taking the retrieval key value as a retrieval source until the data matched with the retrieval key value is found, and the index operation result is obtained after the retrieval is finished.
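The flow of steps S201 to S203 can be sketched as follows. This Python code is illustrative only: the `hour=` and `type=` tokens in the retrieval key value are a hypothetical input syntax chosen to keep the parsing trivial, the sample records are invented, and `parse`, `index_lookup`, and `retrieve` are hypothetical names.

```python
# Assumed layout: storage[time_id][partition_id] -> records in that sub-area.
storage = {
    9: {"name_list": ["Alice Zhang", "Bob Li"], "cargo_list": ["crate-7"]},
    10: {"name_list": ["Carol Wu"]},
}
# Index area: one directory entry (time_id, partition_id) per sub-area.
index_area = {(t, p) for t, parts in storage.items() for p in parts}


def parse(key_value):
    """S201: reduce the retrieval key value to a coarse fuzzy retrieval value."""
    hints = dict(tok.split("=", 1) for tok in key_value.split() if "=" in tok)
    return {
        "time_id": int(hints["hour"]) if "hour" in hints else None,
        "partition_id": hints.get("type"),
    }


def index_lookup(fuzzy):
    """S202: return the index identifications compatible with the fuzzy value."""
    return sorted(
        (t, p) for (t, p) in index_area
        if fuzzy["time_id"] in (None, t) and fuzzy["partition_id"] in (None, p)
    )


def retrieve(key_value):
    """S203: scan only the located sub-areas, using the key value as the source."""
    term = " ".join(tok for tok in key_value.split() if "=" not in tok)
    hits = []
    for t, p in index_lookup(parse(key_value)):
        hits.extend(rec for rec in storage[t][p] if term in rec)
    return hits


print(retrieve("Alice hour=9 type=name_list"))  # ['Alice Zhang']
print(retrieve("Carol type=name_list"))         # ['Carol Wu']
```

The first call scans a single sub-area, while the second, lacking a time hint, scans the `name_list` sub-area of every storage area.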

As shown in fig. 4, as a preferred embodiment of the present invention, the step of retrieving the independent storage area corresponding to the index identifier by using the retrieval key value as the retrieval source further includes determining the index identifier:

S2031, if the index identification only contains the time index identifier, locating the corresponding independent storage area in the data storage area according to the time index identifier.

In this step, if the index identification contains only the time index identifier, the retrieval information entered by the user includes only the time at which the data to be retrieved was stored, so the retrieval range can be narrowed only by time. For example, if the user needs to retrieve a staff list that was stored in the database during a particular one-hour period, the independent storage area corresponding to that period is located, the data to be retrieved is compared with the data in that independent storage area, and the retrieval result is finally generated.

S2032, if the index identification only contains the partition index identifier, positioning the corresponding independent storage subarea in all the independent storage areas according to the partition index identifier.

In this step, if the index identification contains only the partition index identifier, the user has not entered the storage time of the data but has entered the content type of the data to be retrieved. Because every independent storage area holds a sub-area for each data content type, the independent storage sub-area of that content type is located in each independent storage area according to the partition index identifier, the data to be retrieved is compared with the data in those sub-areas, and the retrieval result is finally generated.

S2033, if the index identification contains both the time index identifier and the partition index identifier, the corresponding independent storage area is located according to the time index identifier, and then the corresponding independent storage sub-area in the independent storage area is located according to the partition index identifier.

In this step, if the index identifier contains both the time index identifier and the partition index identifier, the corresponding independent storage area is located according to the time index identifier, and then the corresponding independent storage sub-area in the independent storage area is located according to the partition index identifier.
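The three cases S2031 to S2033 amount to a dispatch on which identifiers are present. A minimal Python sketch follows; the nested-dictionary layout, the function name `locate`, and the sample data are assumptions made for illustration, and raising an error when neither identifier is present is a choice of this sketch, since the description does not address that case.

```python
def locate(data_storage, time_id=None, partition_id=None):
    """Return the record lists (sub-areas) that must be scanned.

    Assumed layout: data_storage[time_id][partition_id] -> list of records.
    """
    if time_id is not None and partition_id is None:
        # S2031: time only -> every sub-area of one independent storage area.
        return list(data_storage.get(time_id, {}).values())
    if time_id is None and partition_id is not None:
        # S2032: partition only -> the same-typed sub-area in every storage area.
        return [area[partition_id]
                for area in data_storage.values() if partition_id in area]
    if time_id is not None and partition_id is not None:
        # S2033: both -> narrow by time first, then by partition.
        area = data_storage.get(time_id, {})
        return [area[partition_id]] if partition_id in area else []
    # Not covered by the description; rejected here as an assumption.
    raise ValueError("index identification contains neither identifier")


data = {
    0: {"name_list": ["a"], "cargo_list": ["b"]},
    1: {"name_list": ["c"]},
}
print(locate(data, time_id=0))                               # [['a'], ['b']]
print(locate(data, partition_id="name_list"))                # [['a'], ['c']]
print(locate(data, time_id=1, partition_id="name_list"))     # [['c']]
```

The third case returns the smallest scan range, which is why a retrieval key value carrying both kinds of information is the cheapest to serve.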

As shown in fig. 5, as a preferred embodiment of the present invention, the step of analyzing the search key value in the database index request to obtain the fuzzy search value specifically includes:

S2011, extracting keywords from the search key value to obtain a keyword group, where the keyword group includes at least one keyword.

In this step, keywords are extracted from the search key value: the search key value is split into several phrases, the content that the search key value is intended to express is determined from the semantics of those phrases, and that content is re-described using preset keywords to obtain the keyword group. For example, a non-standard word for "name" entered by the user is corrected to the standard keyword "name", which facilitates subsequent retrieval.

S2012, determining whether time information is recorded according to the keywords in the keyword group.

S2013, if the time information is recorded, extracting the time information and generating a fuzzy retrieval value.

In this step, the time determination is performed first. If the search key value contains time information, the search can be directed straight to a specific time period, which greatly narrows the search range. If it contains no time information, the data content type is determined from the information in the keyword group alone. In either case the fuzzy search value is finally generated.
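Steps S2011 to S2013 can be sketched in Python as follows. The preset keyword table, the `HH:MM` time format, and the names `extract_keywords` and `fuzzy_value` are all assumptions made for illustration; the description does not specify how phrases are segmented or how time information is written in the search key value.

```python
import re

# Hypothetical preset keywords: informal term -> standard content-type keyword.
PRESET_KEYWORDS = {
    "name": "name_list", "names": "name_list", "staff": "name_list",
    "cargo": "cargo_list", "goods": "cargo_list",
}
# Assumed time format: 24-hour HH:MM.
TIME_PATTERN = re.compile(r"^([01]?\d|2[0-3]):([0-5]\d)$")


def extract_keywords(key_value: str):
    """S2011: split the key value into tokens and normalize known terms."""
    tokens = re.findall(r"[\w:]+", key_value.lower())
    return [PRESET_KEYWORDS.get(tok, tok) for tok in tokens]


def fuzzy_value(key_value: str) -> dict:
    """S2012/S2013: look for time information first, then for a content type."""
    result = {"hour": None, "content_type": None}
    for kw in extract_keywords(key_value):
        match = TIME_PATTERN.match(kw)
        if match and result["hour"] is None:
            result["hour"] = int(match.group(1))
        elif kw in PRESET_KEYWORDS.values() and result["content_type"] is None:
            result["content_type"] = kw
    return result


print(fuzzy_value("staff stored at 09:30"))
print(fuzzy_value("cargo"))
```

In the second call no time information is found, so the resulting fuzzy value carries only a content type, which corresponds to the case in which only a partition index identifier can be obtained.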

As shown in fig. 6, the database indexing system provided by the present invention includes:

a request receiving module 100, configured to receive a database index request, where the database index request includes a search key value.

In the system, a database index request containing a retrieval key value is received; that is, the database index request is a retrieval instruction sent by a user and carries the information the user entered for retrieval.

The operation execution module 200 is configured to perform an index operation on the database according to the retrieval key value in the database index request; the database comprises an index area and a data storage area, wherein the index area is generated according to data stored in the data storage area.

In the system, the database is indexed according to a retrieval key value in a database index request, wherein the database comprises an index area and a data storage area, the data storage area is used for storing data, the index area is used for storing a corresponding index, when the database index request is received, the index area is retrieved according to the database index request, then the corresponding result is extracted from the data storage area according to the retrieval result, and the index area is generated according to the data stored in the data storage area.

As shown in fig. 7, the operation execution module provided for the present invention includes:

the parsing unit 201 is configured to parse the retrieval key value in the database index request to obtain a fuzzy retrieval value, where the fuzzy retrieval value is used to represent a data feature included in the retrieval key value.

In this module, the parsing unit 201 parses the search key value in the database index request to identify the content it contains and extract its data characteristics, and generates a corresponding fuzzy search value from those data characteristics. The fuzzy search value is used to perform a fuzzy search in the index area, thereby preliminarily determining the location of the data.

The index retrieval unit 202 is configured to perform index retrieval on the index area according to the fuzzy retrieval value to obtain an index identifier, where the index identifier includes a time index identifier and/or a partition index identifier, the time index identifier corresponds to the independent storage areas one to one, and each independent storage area has an independent storage sub-area corresponding to the same partition index identifier.

In this module, the index retrieving unit 202 performs index retrieval on the index area according to the fuzzy retrieval value, so as to obtain corresponding index identifiers, where the index identifiers include time index identifiers and/or partition index identifiers, the time index identifiers correspond to the independent storage areas one to one, each independent storage area has an independent storage sub-area corresponding to the same partition index identifier, the time index identifiers are used for pointing to different independent storage areas, and the partition index identifiers point to all the same independent storage sub-areas.

And the data retrieval unit 203 is used for retrieving the independent storage area corresponding to the index identifier by taking the retrieval key value as a retrieval source to obtain an index operation result.

In this module, the data retrieval unit 203 narrows the retrieval range according to the index identification. Once the range has been narrowed to the minimum, all the data in that range is traversed with the retrieval key value as the retrieval source until data matching the retrieval key value is found, and the index operation result is obtained when the retrieval is finished.
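The relationship between the modules and units of FIG. 6 and FIG. 7 can be sketched as a simple composition. The class and attribute names below are hypothetical, and the three units are wired as stub functions over invented sample data purely to show how a request passes from module 100 through units 201, 202, and 203; they do not reproduce the behavior of the embodiment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

IndexId = Tuple[int, str]  # assumed form: (time index id, partition index id)


@dataclass
class OperationExecutionModule:  # module 200
    parse: Callable[[str], dict]                              # parsing unit 201
    index_retrieve: Callable[[dict], List[IndexId]]           # index retrieval unit 202
    data_retrieve: Callable[[str, List[IndexId]], List[str]]  # data retrieval unit 203

    def execute(self, key_value: str) -> List[str]:
        fuzzy = self.parse(key_value)
        ids = self.index_retrieve(fuzzy)
        return self.data_retrieve(key_value, ids)


@dataclass
class RequestReceivingModule:  # module 100
    executor: OperationExecutionModule

    def receive(self, request: Dict[str, str]) -> List[str]:
        return self.executor.execute(request["key_value"])


# Stub wiring over invented sample data.
records = {(9, "name_list"): ["Alice Zhang", "Bob Li"]}
system = RequestReceivingModule(
    OperationExecutionModule(
        parse=lambda kv: {"hour": 9, "type": "name_list"},  # fixed stub result
        index_retrieve=lambda fz: [(fz["hour"], fz["type"])],
        data_retrieve=lambda kv, ids: [
            rec for i in ids for rec in records.get(i, []) if kv in rec
        ],
    )
)
print(system.receive({"key_value": "Alice"}))  # ['Alice Zhang']
```

Passing the units in as callables keeps each one replaceable, so a different parsing or retrieval strategy can be substituted without changing the modules that call it.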

As shown in fig. 8, the parsing unit provided for the present invention includes:

the keyword extraction subunit 2011 is configured to perform keyword extraction on the retrieval key value to obtain a keyword group, where the keyword group includes at least one keyword.

In the module, keyword extraction is performed on the retrieval key value, that is, the retrieval key value is divided into a plurality of phrases, then the content to be expressed of the current retrieval key value can be determined according to the semantics of the phrases, and the content to be expressed is re-described according to the preset keywords, so as to obtain the keyword group.

A judging subunit 2012, configured to judge whether time information is recorded according to the keywords in the keyword group;

It should be understood that, although the steps in the flowcharts of the embodiments of the present invention are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the various embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and which, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above-mentioned embodiments express only several embodiments of the present invention, and although their description is relatively specific and detailed, it should not be construed as limiting the scope of the present invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the inventive concept, all of which fall within the scope of the present invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.

The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims

1. A database indexing method, the method comprising:

generating index data directory entries according to the data analysis results;

writing index data directory entries into an index area, wherein the index area is divided into at least two independent index areas, and the number of the independent index areas is the same as that of the independent storage areas;

the step of performing an index operation on the database according to the retrieval key value in the database index request specifically includes:

2. The database indexing method according to claim 1, wherein the step of retrieving the independent storage area corresponding to the index identifier by using the retrieval key value as the retrieval source further comprises determining the index identifier:

3. The database indexing method according to claim 1, wherein the step of analyzing the search key value in the database index request to obtain the fuzzy search value specifically comprises:

4. The database indexing method according to claim 1, wherein the data in the independent storage subareas are stored in a T-tree structure.

5. The database indexing method according to claim 1, wherein the number of the independent storage areas is 24.

6. The database indexing method according to claim 1, wherein the data in the data storage area is encrypted when stored and decrypted when read.