CN115455149B

CN115455149B - Database construction method based on coding query mode and application thereof

Info

Publication number: CN115455149B
Application number: CN202211141746.5A
Authority: CN
Inventors: 郁强; 潘蓝翔; 赵军辉; 黄红叶
Original assignee: CCI China Co Ltd
Current assignee: CCI China Co Ltd
Priority date: 2022-09-20
Filing date: 2022-09-20
Publication date: 2023-05-30
Anticipated expiration: 2042-09-20
Also published as: CN115455149A

Abstract

The application provides a database construction method based on a coding query mode and an application thereof. The method comprises the following steps: S00, traversing and parsing the original data to obtain the dimension keywords of each item of data; S01, encoding each dimension keyword so that codes and dimension keywords are associated in one-to-one correspondence; S02, constructing a mapping grid relation of the database according to the mapping relations among the dimension keywords; S03, correcting the mapping grid relation of the database and determining the data set storage mode of the database according to the correspondence between the codes and the dimension keywords; S04, constructing a query key based on the structure of the data set storage mode of the database; S05, storing the original data in the database in the determined data set storage mode. The method allows the database to be constructed quickly and expanded quickly, and improves query efficiency.

Description

Database construction method based on coding query mode and application thereof

Technical Field

The application relates to the technical field of databases, in particular to a database construction method based on a coding query mode and application thereof.

Background

With the continuous development of modern society, networks are applied ever more widely, and the volume of data transmitted through them keeps growing. To process and use such data, the data are typically kept in a database, where network data may be retained for analysis of user behavior, for measurement and optimization of the network, and for auditing. Especially in the technical fields of urban brain and urban management, the data volume is huge, data of multiple dimensions of a city are often involved, and these data are often associated with one another. A large amount of data is therefore stored in the database, much computing power is wasted during data retrieval or query, query and retrieval are slow, and the user's waiting time increases.

However, in the conventional technology, a database is usually searched by inputting keywords and then extracting or sorting data based on those keywords; when the keywords span multiple dimensions, the time required increases further. Take a query of a city-dimension wage income database as an example, where the search keywords comprise three dimensions: man, Zhejiang people and income. During the search, all the data in the database must be traversed one by one, the data containing the three dimensions are extracted to obtain a preliminary query database, and the preliminary query database is then further analyzed to obtain the query result, so the query time is long and the computing power consumption is high.

As can be seen, a large and easily expandable database is required for storage because of the large amount of network data and the possible large variety of network data. At the same time, for various applications requiring database queries, rapid queries for large-capacity databases are also very important. Therefore, a database construction method based on the code query mode and application thereof are needed to solve the above problems.

Disclosure of Invention

The embodiment of the application provides a database construction method based on a coding query mode and application thereof, aiming at the problems of high calculation resource consumption, large data volume and the like in the prior art.

The core technology of the invention mainly comprises three directions: 1, establishing a database; 2, expanding the capacity of a database; 3, querying a database.

In a first aspect, the present application provides a method for constructing a database based on a coding query mode, where the method includes the following steps:

S00, traversing the original data and analyzing to obtain dimension keywords of each data in the original data;

S01, coding each dimension keyword so that the codes are associated with the dimension keywords in a one-to-one correspondence manner;

S02, constructing a mapping grid relation of a database according to the mapping relation among the dimension keywords;

S03, correcting the mapping grid relation of the database and determining the data set storage mode of the database according to the corresponding relation between the codes and the dimension keywords;

S04, constructing a query key based on a structure of a data set storage mode of a database;

the query key consists of codes corresponding to the keywords in each dimension;

S05, storing the original data in a database in a determined data set storage mode.

Further, the method also comprises a step for expanding the database, and specifically comprises the following steps:

S10, receiving and analyzing the capacity expansion data to be stored in the database to obtain capacity expansion dimension keywords of the capacity expansion data;

S11, comparing the capacity expansion dimension keywords with original dimension keywords in a database to obtain differences between the capacity expansion dimension keywords and the original dimension keywords;

S12, judging the difference between the two;

S13, if the difference is greater than the threshold value, encoding the expansion dimension keywords to construct a new code, updating the data set storage mode of the database according to the corresponding relation between the new code and the expansion dimension keywords, and updating the query key at the same time, so that the expansion data is stored in the database in the updated data set storage mode to finish expansion; if the difference is smaller than or equal to the threshold value, integrating the expansion dimension keywords into the original dimension keywords, and storing the expansion data in the database in the original data set storage mode to finish expansion.

Further, the method also comprises the step of inquiring the database, and specifically comprises the following steps of:

S20, receiving and analyzing an input query key to acquire dimension keywords contained in the query key;

S21, confirming a corresponding query code based on the acquired dimension keywords;

wherein the query code is a code for a query;

S22, extracting data related to the query codes from the database according to the query codes.

Further, in step S00, each dimension keyword is output for the user to preview and review, and accepts modification, deletion or addition of the user.

Further, step S06 is further included, extracting dimension keywords based on machine learning.

Further, in step S01, each dimension keyword is encoded in digital form.

Further, in step S02, dimension keywords that can be generalized are generalized to obtain upper-level dimension keywords, so that a mapping relationship is formed between the lower-level dimension keywords and the upper-level dimension keywords.

In a second aspect, the present application provides a database query expansion device based on a coding query mode, including:

the building module is used for traversing the original data and analyzing the original data to obtain the dimension keywords of each data in the original data; encoding each dimension keyword so that the encoding is associated with the dimension keywords in a one-to-one correspondence; constructing a mapping grid relation of a database according to the mapping relation among the dimension keywords; correcting the mapping grid relation of the database and determining the data set storage mode of the database according to the corresponding relation between the codes and the dimension keywords; constructing a query key based on the structure of a data set storage mode of a database; storing the original data in a database in a determined data set storage mode;

the capacity expansion module is used for receiving and analyzing capacity expansion data to be stored in the database so as to obtain capacity expansion dimension keywords of the capacity expansion data; comparing the capacity expansion dimension keywords with original dimension keywords in a database to obtain differences between the capacity expansion dimension keywords and the original dimension keywords; judging the difference between the two; if the difference is greater than the threshold value, encoding the expansion dimension keywords to construct a new code, updating a data set storage mode of the database according to the corresponding relation between the new code and the expansion dimension keywords, and updating the query key at the same time to enable the expansion data to be stored in the database in the updated data set storage mode to finish expansion; if the difference is smaller than or equal to the threshold value, integrating the capacity expansion dimension keywords into the original dimension keywords, and storing the capacity expansion data into a database by using the original data set to finish capacity expansion;

the query module is used for receiving and analyzing the input query key to acquire the dimension keywords contained in the query key; confirming a corresponding query code based on the acquired dimension keywords;

and the output module is used for outputting the queried data.

In a third aspect, the present application provides an electronic device, including a memory, in which a computer program is stored, and a processor configured to run the computer program to perform the above-described method of constructing a database based on a coded query pattern.

In a fourth aspect, the present application provides a readable storage medium having stored therein a computer program comprising program code for controlling a process to execute a process comprising a database construction method based on a coded query pattern according to the above.

The main contributions and innovation points of the invention are as follows: 1. compared with the prior art, in the process of constructing the database, the application analyzes the original data and builds the data set storage mode and the query key on the basis of codes, so that data are entered with a defined association relation. This facilitates ordered storage of the data and replaces the stacked storage of the conventional technology, so that storage is more systematic; particularly when the data volume is huge, the data remain stored in order, which facilitates rapid construction of the database;

2. compared with the prior art, in the capacity expansion process of the database, the fact that the keywords of the data possibly change along with the increase of the service time of the database is considered, and the original database can not necessarily cover the dimension keywords, so that in the capacity expansion process, the capacity expansion dimension keywords of new data can be identified, the capacity expansion dimension keywords are encoded, the new codes are added into a data set storage mode, and the data storage mode can be updated;

3. compared with the prior art, in the data query process, the dimension information of the query keywords can be obtained based on the query keywords, the codes corresponding to the dimension information can be obtained based on the dimension information, the data can be rapidly extracted in the database through the codes, and meanwhile, the association relation between different data can be extracted, so that the response efficiency of data query can be improved.

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a flow chart of a method of constructing a database based on a coded query pattern according to an embodiment of the present application;

FIG. 2 is a flow chart of database expansion;

FIG. 3 is a flow chart of a database query;

FIG. 4 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with aspects of one or more embodiments of the present description as detailed in the accompanying claims.

It should be noted that: in other embodiments, the steps of the corresponding method are not necessarily performed in the order shown and described in this specification. In some other embodiments, the method may include more or fewer steps than described in this specification. Furthermore, individual steps described in this specification, in other embodiments, may be described as being split into multiple steps; while various steps described in this specification may be combined into a single step in other embodiments.

Because the network data has a large data volume and may vary more, a large-capacity easily-expandable database is required for storage. At the same time, for various applications requiring database queries, rapid queries for large-capacity databases are also very important.

Based on this, the present application builds the database on the basis of keywords to solve the problems existing in the prior art.

Example 1

Specifically, the embodiment of the application provides a database construction method based on a coding query mode, and specifically, referring to fig. 1, the method includes:

In this embodiment, the original data is known data. In the process of parsing the original data, the content of the query field is split according to language type to obtain multiple dimension keywords, where the language type includes the word segmentation order and the degree of association between words in the same item of data.

Preferably, after the dimension keywords are generated, a preview function can be provided: all dimension keywords are displayed for user review, and the user can revise them based on the review results, for example by deleting dimension keywords or supplementing new ones.

Preferably, the construction of the original database and the deep learning technology can be combined to realize the automatic extraction of the dimension keywords, and the extraction of the dimension keywords is more accurate along with the increase of the data volume.

Preferably, each dimension keyword can be coded in a digital manner, so that the construction of the codes is further simplified, and the data set storage mode and the establishment of the query key are facilitated.

For example, the correspondence between English letters and numbers is: A (1), B (2), C (3), D (4), E (5), F (6), G (7), H (8), I (9), J (210), K (211), L (212), M (213), N (214), O (215), P (216), Q (217), R (218), S (219), T (220), U (221), V (222), W (223), X (224), Y (225), Z (226).

In this embodiment, taking "income of man and Zhejiang people" as an example, the original database may further include "income of woman and Zhejiang people". This gives four dimension keywords (man, woman, Zhejiang people and income), which stand in a parallel mapping relation. Man and woman can also be generalized to obtain an upper-level dimension keyword, which may be gender. Gender, Zhejiang people and income then stand in a cross mapping relation, while man and woman each stand in a subordinate mapping relation to gender.
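The subordinate and cross relations in this example can be sketched as a small edge list. This is a minimal illustration in Python, not the implementation of the application: the `build_grid` function, the `parent` dictionary and the edge-tuple layout are hypothetical names chosen here, and the parallel relation among the flat keywords before generalization is not modeled.

```python
from itertools import combinations

def build_grid(parent, top_level):
    """Return (a, b, relation) edges for a keyword hierarchy.

    parent:    lower-level keyword -> upper-level keyword (hypothetical layout)
    top_level: upper-level dimension keywords that intersect one another
    """
    edges = []
    # Each lower-level keyword is subordinate to its upper-level keyword.
    for child, upper in parent.items():
        edges.append((child, upper, "subordinate"))
    # Distinct top-level dimensions cross one another.
    for a, b in combinations(top_level, 2):
        edges.append((a, b, "cross"))
    return edges

grid = build_grid(
    parent={"man": "gender", "woman": "gender"},
    top_level=["gender", "Zhejiang people", "income"],
)
for edge in grid:
    print(edge)
```

Keeping the relations as explicit edges means that replacing each keyword with its code later only requires mapping the first two fields of every tuple.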

In this embodiment, the dimension keywords in the mapping grid relationship are replaced by codes, so that a data set storage mode and a query key can be constructed, and the data set storage mode and the query key together form a gridded database.

In this embodiment, an encrypted digital library is constructed, for example based on the English alphabet, in which each English letter corresponds to an encrypted number. To determine the code of a dimension keyword, the first pinyin letters of the first character and the last character of the keyword are extracted, and the code is formed from the encrypted numbers of those two letters. For example, if the keyword is "male" (first letters n and x), the number corresponding to n is 214 and the number corresponding to x is 224, so the code corresponding to the keyword "male" is 214224. Further, after the code of a keyword is determined, the confirmed code can be stored to build a code mapping relation. When the code of the next keyword is determined, the new code is compared with the code mapping relation to check for repetition. If it repeats an existing code, the second pinyin letter of the first character of the keyword is taken to encrypt the keyword again, and this cycle continues until the new code no longer repeats any code in the code mapping relation, at which point the coding of the keyword is complete.

It can be understood that this arrangement makes the keywords convenient to encode, encrypts the keywords, and makes the codes convenient to decode.
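The coding rule and its repetition check can be sketched as follows. This is a minimal Python sketch under stated assumptions: the pinyin of the first and last characters is supplied by the caller as plain ASCII strings (for example "nan" and "xing"), the function name `encode_keyword` is hypothetical, and stepping through the later letters of the first character's pinyin is one reading of the cycle described above.

```python
import string

# Letter-to-number table given in the text: A-I -> 1..9, J-Z -> 210..226.
LETTER_CODE = {
    ch: str(i + 1) if i < 9 else str(200 + i + 1)
    for i, ch in enumerate(string.ascii_uppercase)
}

def encode_keyword(first_pinyin, last_pinyin, used):
    """Return a code that is not yet in `used`, and record it there.

    first_pinyin / last_pinyin: pinyin of the keyword's first and last
    characters (assumed input format, ASCII letters only).
    used: set of codes already assigned (the code mapping relation).
    """
    last_part = LETTER_CODE[last_pinyin[0].upper()]
    # Try the 1st letter of the first character's pinyin, then the 2nd, ...
    for letter in first_pinyin:
        code = LETTER_CODE[letter.upper()] + last_part
        if code not in used:
            used.add(code)
            return code
    raise ValueError("no unused code could be derived for this keyword")

used_codes = set()
male = encode_keyword("nan", "xing", used_codes)
# A second keyword with the same initials falls back to the next letter.
other = encode_keyword("nv", "xing", used_codes)
print(male, other)
```

With the table above, the first call yields 214224, matching the example in the text, and the second call avoids the repeated code by using the letter v (222) instead of n.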

As shown in fig. 2, the capacity expansion step includes:

In this embodiment, as the usage time of the database increases, the keywords of the data may change, and the original database may not cover future dimension keywords. Again taking "man's income of Zhejiang" as an example, a large amount of data on "man's income of Zhejiang" and "man's financial income of Zhejiang" may appear in subsequent data input.

S12, judging the difference between the two;

Preferably, a similarity algorithm, or another machine learning model, may be employed here to calculate the similarity or the magnitude of the difference.

In this embodiment, it may be understood that, when the expansion dimension keywords are compared with the dimension keywords in the database, the comparison result may be determined from the number of differing keywords and the number of overlapping keywords. For example, when the overlap is very small, the new expansion data is not suited to being stored in the database, and establishing a new database should be recommended. If the overlap ratio is greater than a first threshold (for example, 30%) and the number of differences is greater than 2, the difference between the two can be considered large; if the overlap ratio is greater than the first threshold (for example, 30%) and the number of differences is less than or equal to 2, the difference between the two can be considered small.
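The decision rule in this paragraph can be sketched in Python. The text does not define how the overlap ratio is computed, so this sketch assumes it is the share of expansion keywords already present in the database, and it treats an overlap at or below the first threshold as the "very small" case; the function name and return labels are hypothetical.

```python
def classify_expansion(new_keywords, original_keywords,
                       overlap_threshold=0.30, max_differences=2):
    """Classify expansion data as 'new_database', 'large' or 'small'."""
    new, original = set(new_keywords), set(original_keywords)
    if not new:
        raise ValueError("expansion data has no dimension keywords")
    # Assumption: ratio = overlapping keywords / expansion keywords.
    overlap_ratio = len(new & original) / len(new)
    differences = len(new - original)
    if overlap_ratio <= overlap_threshold:
        # Too little in common: recommend building a separate database.
        return "new_database"
    # Large difference -> new codes and updated storage mode (S13, first branch).
    # Small difference -> merge into the original keywords (S13, second branch).
    return "large" if differences > max_differences else "small"

original = ["man", "woman", "Zhejiang people", "income"]
print(classify_expansion(["man", "Zhejiang people", "income",
                          "financial income"], original))
```

In the usage line, three of the four expansion keywords already exist and only one is new, so the data would be merged into the original keyword set.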

As shown in fig. 3, the querying step (applicable to the unexpanded database and the expanded database) includes:

wherein the query code is a code for a query;

In this embodiment, taking "man's income of Zhejiang" as an example, suppose that during construction of the database "man" is assigned code 001, "Zhejiang" code 002 and "income" code 003. During a query, the data in the database can then be extracted directly based on 001002003, the query key in the database can be extracted at the same time, and the extracted data can be displayed as a mapping grid, which allows rapid querying of the data and facilitates accurate analysis.
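A query key built from concatenated codes can be sketched as a direct dictionary lookup, which avoids traversing every record. This Python sketch is illustrative only: the `CodedStore` class, the `make_query_key` helper, the sorting of codes and the placeholder record are assumptions introduced here, not details given in the text.

```python
from collections import defaultdict

# Codes from the example in the text.
CODES = {"man": "001", "Zhejiang": "002", "income": "003"}

def make_query_key(keywords, codes=CODES):
    # Sorting is an assumption so that keyword order does not matter.
    return "".join(sorted(codes[k] for k in keywords))

class CodedStore:
    """Hypothetical data set storage keyed by the query key."""

    def __init__(self):
        self._buckets = defaultdict(list)

    def put(self, keywords, record):
        self._buckets[make_query_key(keywords)].append(record)

    def get(self, query_key):
        # One hash lookup instead of a scan over all stored records.
        return list(self._buckets.get(query_key, []))

store = CodedStore()
store.put(["man", "Zhejiang", "income"], {"id": "placeholder-record"})
key = make_query_key(["income", "man", "Zhejiang"])
print(key, store.get(key))
```

Because records sharing the same dimension keywords land in the same bucket, a multi-dimension query costs one lookup regardless of how many records the store holds.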

Preferably, during the encoding process, the data can be encoded with the escape(string) and encodeURI(URIstring) functions, and the query key can be resolved with encodeURIComponent(URIstring) to extract the data quickly.
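The functions named above are JavaScript built-ins that percent-encode strings. As a rough Python analogue, and not the functions named in the text, `urllib.parse.quote` and `unquote` perform the same kind of percent-encoding; the set of characters left unescaped differs slightly from `encodeURIComponent`, and the helper name `encode_component` is hypothetical.

```python
from urllib.parse import quote, unquote

def encode_component(text):
    # safe="" percent-encodes every reserved character, including "/".
    return quote(text, safe="")

encoded = encode_component("001 002/003")
print(encoded)           # 001%20002%2F003
print(unquote(encoded))  # 001 002/003
```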

Preferably, the present application is also suitable for processing big data at the city dimension and can store users' financial information and traffic data. When financial information is stored, it includes gender, age, corporate nature, etc.; when traffic information is stored, it may include traffic flow, number of vehicles, and vehicle speed. The users may be, for example, the passenger side, the driver side and government traffic control departments concerned with travel.

Taking the application at the passenger end as an example, the database can count the destination, the origin and the time period of the passenger as keywords, and a plurality of keywords can be included below the destination and the origin, for example, the keywords can relate to shopping places, transportation hub places and public service places (such as schools), and the database related to the destination, the origin and the time period can be built based on the keywords;

Taking the application of the database at the driver side as an example, by querying the database a driver can quickly learn where customers arrive and depart in different time periods. On this basis, the driver's waiting position can be matched to the customers' places of departure, which improves both the efficiency with which passengers get a ride and the driver's passenger-carrying efficiency.

The same database can also serve government institutions, which can acquire the position information of passengers and drivers; in this case, the drivers may include taxi drivers and private car drivers. Fast and efficient storage and query allow the database to have a larger capacity and a wider data coverage, and code-based statistics make the whereabouts of the data clearer, so government departments can carry out city-dimension planning on that basis, such as road construction and maintenance planning and the zoning of schools and shopping districts.

Example two

Based on the same conception, the application also provides a database query expansion device based on a coding query mode, comprising:

and the output module is used for outputting the queried data.

Example III

This embodiment also provides an electronic device, referring to fig. 4, comprising a memory 404 and a processor 402, the memory 404 having stored therein a computer program, the processor 402 being arranged to run the computer program to perform the steps of any of the method embodiments described above.

In particular, the processor 402 may include a Central Processing Unit (CPU), or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or more integrated circuits of embodiments of the present application.

The memory 404 may include, among other things, mass storage for data or instructions. By way of example, and not limitation, memory 404 may comprise a Hard Disk Drive (HDD), floppy disk drive, Solid State Drive (SSD), flash memory, optical disk, magneto-optical disk, tape, or Universal Serial Bus (USB) drive, or a combination of two or more of these. Memory 404 may include removable or non-removable (or fixed) media, where appropriate. Memory 404 may be internal or external to the data processing apparatus, where appropriate. In a particular embodiment, the memory 404 is a non-volatile memory. In particular embodiments, memory 404 includes Read-Only Memory (ROM) and Random Access Memory (RAM). Where appropriate, the ROM may be a mask-programmed ROM, a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), an Electrically Alterable ROM (EAROM) or flash memory, or a combination of two or more of these. The RAM may be Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM) where appropriate, and the DRAM may be Fast Page Mode Dynamic Random Access Memory (FPMDRAM), Extended Data Output Dynamic Random Access Memory (EDODRAM), Synchronous Dynamic Random Access Memory (SDRAM), or the like.

Memory 404 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions for execution by processor 402.

The processor 402 reads and executes the computer program instructions stored in the memory 404 to implement any of the database construction methods based on the coded query pattern in the above embodiments.

Optionally, the electronic apparatus may further include a transmission device 406 and an input/output device 408, where the transmission device 406 is connected to the processor 402 and the input/output device 408 is connected to the processor 402.

The transmission device 406 may be used to receive or transmit data via a network. Specific examples of the network described above may include a wired or wireless network provided by a communication provider of the electronic device. In one example, the transmission device includes a network adapter (Network Interface Controller, simply referred to as NIC) that can connect to other network devices through the base station to communicate with the internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module, which is configured to communicate with the internet wirelessly.

The input-output device 408 is used to input or output information. In this embodiment, the input information may be data to be stored or a query key, and the output information may be the queried data.

Example IV

The present embodiment also provides a readable storage medium having stored therein a computer program including program code for controlling a process to execute the process including the database construction method based on the coded query pattern according to the first embodiment.

It should be noted that, specific examples in this embodiment may refer to examples described in the foregoing embodiments and alternative implementations, and this embodiment is not repeated herein.

In general, the various embodiments may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects of the invention may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device, although the invention is not limited thereto. While various aspects of the invention may be illustrated and described as block diagrams, flow charts, or using some other pictorial representation, it is well understood that these blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.

Embodiments of the invention may be implemented by computer software executable by a data processor of a mobile device, such as in a processor entity, or by hardware, or by a combination of software and hardware. Computer software or programs (also referred to as program products) including software routines, applets, and/or macros can be stored in any apparatus-readable data storage medium and they include program instructions for performing particular tasks. The computer program product may include one or more computer-executable components configured to perform embodiments when the program is run. The one or more computer-executable components may be at least one software code or a portion thereof. In addition, in this regard, it should be noted that any blocks of the logic flows as illustrated may represent program steps, or interconnected logic circuits, blocks and functions, or a combination of program steps and logic circuits, blocks and functions. The software may be stored on a physical medium such as a memory chip or memory block implemented within a processor, a magnetic medium such as a hard disk or floppy disk, and an optical medium such as, for example, a DVD and its data variants, a CD, etc. The physical medium is a non-transitory medium.

It should be understood by those skilled in the art that the technical features of the above embodiments may be combined in any manner, and for brevity, all of the possible combinations of the technical features of the above embodiments are not described, however, they should be considered as being within the scope of the description provided herein, as long as there is no contradiction between the combinations of the technical features.

The foregoing examples merely represent several embodiments of the present application, the description of which is more specific and detailed and which should not be construed as limiting the scope of the present application in any way. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.

Claims

1. The database construction method based on the coding query mode is characterized by comprising the following steps:

S01, encoding each dimension keyword so that the codes are associated with the dimension keywords in a one-to-one correspondence manner;

S03, correcting the mapping grid relation of the database by using the corresponding relation between the codes and the dimension keywords and determining the data set storage mode of the database;

S04, constructing a query key based on the structure of the data set storage mode of the database;

the query key consists of codes corresponding to each dimension keyword;

S05, storing the original data in the database in a determined data set storage mode;

the method also comprises a step for expanding the database, and specifically comprises the following steps:

S11, comparing the capacity expansion dimension keywords with original dimension keywords in the database to obtain differences between the capacity expansion dimension keywords and the original dimension keywords;

S12, judging the difference between the two;

S13, if the difference is greater than a threshold value, encoding the capacity expansion dimension keywords to construct a new code, updating the data set storage mode of the database according to the corresponding relation between the new code and the capacity expansion dimension keywords, and updating the query key at the same time, so that the capacity expansion data is stored in the database in the updated data set storage mode to finish capacity expansion; if the difference is smaller than or equal to a threshold value, integrating the expansion dimension keywords into the original dimension keywords, and storing the expansion data in the database in the original data set storage mode to finish expansion.

2. The method for constructing a database based on a coded query pattern according to claim 1, further comprising a step for querying the database, specifically comprising the steps of:

wherein the query code is a code for a query;

S22, extracting data related to the query codes in the database according to the query codes.

3. The method for constructing a database according to claim 1, wherein in step S00, each dimension keyword is output for a user to review in a preview, and accept modification, deletion or addition of the user.

4. The method for constructing a database according to claim 1, further comprising step S06 of extracting the dimension keyword based on machine learning.

5. The method for constructing a database according to the code query method of claim 4, wherein in step S01, each of the dimension keywords is coded in a digital form.

6. The method for constructing a database based on a coded query pattern as claimed in claim 5, wherein in step S02, the dimension keywords that can be summarized in the upper level are subjected to the upper level to obtain the dimension keywords in the upper level, so that a mapping relationship is formed between the dimension keywords in the lower level and the dimension keywords in the upper level.

7. The database inquiry capacity-expanding device based on the coding inquiry mode is characterized by comprising:

and the output module is used for outputting the queried data.

8. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, the processor being arranged to run the computer program to perform the coded query pattern based database construction method of any of claims 1 to 6.

9. A readable storage medium, characterized in that the readable storage medium has stored therein a computer program comprising program code for controlling a process to execute a process comprising the method of constructing a database based on the coded query pattern according to any one of claims 1 to 6.