CN114139040A

CN114139040A - Data storage and query method, device, equipment and readable storage medium

Info

Publication number: CN114139040A
Application number: CN202111400976.4A
Authority: CN
Inventors: 钱仕鹏; 范渊; 吴卓群; 王欣
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2021-11-19
Filing date: 2021-11-19
Publication date: 2022-03-04

Abstract

The application discloses a data storage and query method, apparatus, device and readable storage medium. Original data to be stored is stored in a preset non-relational database, while key information of the original data and the storage identifier of that data in the non-relational database are stored in an Elasticsearch server. By exploiting the ease of storage and cluster expansion offered by the non-relational database, the amount of data stored in the Elasticsearch server is effectively reduced, so that storage capacity can keep pace with the growth of large-scale data. When data needs to be queried, the Elasticsearch server is queried according to the query conditions to obtain target key information and the target storage identifier corresponding to it; the target storage identifier is then used to query the non-relational database to obtain the target original data. The Elasticsearch server, whose stored data has been slimmed down, thus provides higher-performance querying.

Description

Data storage and query method, device, equipment and readable storage medium

Technical Field

The present application relates to the field of storage technologies, and in particular, to a data storage and query method, apparatus, device, and readable storage medium.

Background

Modern society is developing rapidly, with advanced science and technology and fast information flow; people communicate ever more closely, life is increasingly convenient, and big data is a product of this high-tech era. For many industries, the ability to make use of large-scale data is key to gaining a competitive edge. The value of big data is reflected in several aspects: enterprises that provide products or services to large numbers of consumers can use big data for precision marketing; small, specialized enterprises can use big data to transform their services; and traditional enterprises that must transform under the pressure of the Internet need to make full use of the value of big data. The prerequisite for using big data is storage and retrieval, that is, rapidly storing large amounts of data and rapidly querying the desired data from the collected data according to retrieval requirements. The upper limits of query speed and storage capacity are key factors that determine the capability of a big data platform.

Elasticsearch is a search server built on the Lucene full-text search engine. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface and is now frequently used to store and query big data, that is, queries are executed and their result data returned through an Elasticsearch cluster. However, when a large number of users perform queries, if the amount of returned data is too large or the query statements are too complex, the entire performance burden of querying and data transmission falls on the Elasticsearch server, including the memory and CPU used for the query and the bandwidth used to output the results. Moreover, the number of shards in Elasticsearch is fixed (a shard is the unit in which Elasticsearch stores data and can be understood as an area for storing data; the number of shards must be set when the index is created and cannot simply be increased afterwards). As the amount of stored data grows, the amount of data in each shard therefore grows as well, which degrades the query performance of Elasticsearch. In addition, when the Elasticsearch cluster is scaled out, or when disks are expanded after the original Elasticsearch disks are full, the cluster has to rebalance its shards.

Therefore, the existing scheme of storing and querying big data with an Elasticsearch cluster suffers from inconvenient expansion and poor query performance as data volumes grow explosively.

Disclosure of Invention

The application aims to provide a data storage and query method, apparatus, device and readable storage medium that improve the storage and query performance of an Elasticsearch cluster when storing and querying big data, and that improve the scalability of the cluster.

In order to solve the above technical problem, the present application provides a data storage and query method, including:

when receiving original data to be stored, extracting key information of the original data to be stored;

storing the original data to be stored into a preset non-relational database to obtain a storage identifier of the original data to be stored in the non-relational database;

storing the key information and the storage identifier into an Elasticsearch server;

when a data query request is received, acquiring query conditions of the data query request;

querying the Elasticsearch server according to the query condition to obtain target key information and a target storage identifier corresponding to the target key information;

and querying the non-relational database by using the target storage identifier to obtain target original data.

Optionally, the extracting key information of the raw data to be stored specifically includes:

performing word segmentation on the original data to be stored to obtain word segmentation results;

separating the word segmentation results with commas to obtain a character string;

and taking the character string as the key information.

Optionally, the querying in the Elasticsearch server according to the query condition to obtain the target key information and the target storage identifier corresponding to the target key information specifically includes:

generating an Elasticsearch DSL query statement according to the query condition;

and querying the Elasticsearch server with the Elasticsearch DSL query statement to obtain the target key information and the target storage identifier corresponding to the target key information.

Optionally, the non-relational database is specifically MongoDB;

correspondingly, the storage identifier is specifically the unique identifier corresponding to the original data to be stored in MongoDB.

Optionally, the querying the non-relational database by using the target storage identifier to obtain target original data specifically includes:

generating a database query statement corresponding to the type of the non-relational database according to the target storage identifier;

and querying the non-relational database by using the database query statement to obtain the target original data.

Optionally, the querying in the non-relational database by using the target storage identifier to obtain target original data specifically includes:

and querying the target original data in the non-relational database using the target storage identifier, and outputting the target original data directly from the device where the non-relational database is located.

Optionally, the data storage and query method is applied to the Elasticsearch server.

In order to solve the above technical problem, the present application further provides a data storage and query apparatus, including:

an extraction unit, configured to extract key information of original data to be stored when the original data to be stored is received;

a first storage unit, configured to store the original data to be stored into a preset non-relational database to obtain a storage identifier of the original data to be stored in the non-relational database;

a second storage unit, configured to store the key information and the storage identifier into an Elasticsearch server;

an acquisition unit, configured to acquire the query condition of a data query request when the data query request is received;

a first query unit, configured to query the Elasticsearch server according to the query condition to obtain target key information and a target storage identifier corresponding to the target key information;

and a second query unit, configured to query the non-relational database using the target storage identifier to obtain target original data.

In order to solve the above technical problem, the present application further provides a data storage and query device, including:

a memory for storing instructions, said instructions comprising the steps of any of the above data storage and query methods;

a processor to execute the instructions.

To solve the above technical problem, the present application further provides a readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the data storage and query method according to any one of the above.

The data storage and query method provided by the application stores the original data to be stored in a preset non-relational database, and stores the key information of that data, together with its storage identifier in the non-relational database, in an Elasticsearch server. When data needs to be queried, the query condition is obtained from the data query request, the Elasticsearch server is queried according to the query condition to obtain target key information and the corresponding target storage identifier, and the target storage identifier is then used to query the non-relational database to obtain the target original data. For data storage, the method exploits the ease of storage and cluster expansion offered by non-relational databases, which effectively reduces the amount of data stored in the Elasticsearch server and lets storage keep pace with the growth of large-scale data; for querying, the slimmed-down Elasticsearch server can provide higher-performance query capability. Compared with prior-art approaches that store and query big data with an Elasticsearch cluster alone, the method provided by the application offers higher storage and query performance: when the data volume is very large, it significantly reduces the volume of data passing through Elasticsearch, and it relieves the bandwidth pressure on Elasticsearch and the query latency when the data volume reaches the order of hundreds of billions. Storage is also easier to expand, practically without limit, which meets the storage, query and expansion requirements of current big data.

The application also provides a data storage and query apparatus, device and readable storage medium, which have the same beneficial effects and are not described again here.

Drawings

To explain the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art may derive other drawings from them without creative effort.

Fig. 1 is a flowchart of a data storage and query method according to an embodiment of the present application;

fig. 2 is a schematic structural diagram of a data storage and query apparatus according to an embodiment of the present disclosure;

fig. 3 is a schematic structural diagram of a data storage and query device according to an embodiment of the present application.

Detailed Description

The core of the application is to provide a data storage and query method, apparatus, device and readable storage medium that improve the storage and query performance of an Elasticsearch cluster when storing and querying big data, and that improve the scalability of the cluster.

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Example one

Fig. 1 is a flowchart of a data storage and query method according to an embodiment of the present disclosure.

As shown in fig. 1, the data storage and query method provided in the embodiment of the present application includes:

S101: When original data to be stored is received, extract key information of the original data to be stored.

S102: Store the original data to be stored into a preset non-relational database to obtain a storage identifier of the original data in the non-relational database.

S103: Store the key information and the storage identifier into an Elasticsearch server.

S104: When a data query request is received, obtain the query condition of the data query request.

S105: Query the Elasticsearch server according to the query condition to obtain target key information and the target storage identifier corresponding to the target key information.

S106: Query the non-relational database using the target storage identifier to obtain target original data.
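As a minimal, self-contained sketch of how steps S101 to S106 fit together, the following Python code uses two in-memory stand-ins, both assumptions for illustration only: a dict keyed by storage identifier in place of the non-relational database, and a list of small index documents in place of the Elasticsearch server. The names `store`, `query`, `nosql_store` and `es_index` are hypothetical, key information extraction is reduced to comma-joining whitespace-separated words, and a real Elasticsearch server would match terms through its inverted index rather than the linear scan used here.

```python
import uuid

# Hypothetical in-memory stand-ins, for illustration only:
nosql_store: dict[str, dict] = {}   # plays the non-relational database (id -> original data)
es_index: list[dict] = []           # plays the Elasticsearch index (slim documents only)


def extract_key_info(raw: dict) -> str:
    # S101, simplified: comma-join the lower-cased words of the "text" field.
    return ",".join(raw["text"].lower().split())


def store(raw: dict) -> str:
    key_info = extract_key_info(raw)                   # S101
    storage_id = uuid.uuid4().hex                      # S102: identifier issued on storage
    nosql_store[storage_id] = raw
    es_index.append({"key_info": key_info, "storage_id": storage_id})  # S103
    return storage_id


def query(request: dict) -> list[dict]:
    keyword = request["keyword"].lower()               # S104: query condition
    # S105: search the slim documents and collect matching storage identifiers.
    ids = [doc["storage_id"] for doc in es_index
           if keyword in doc["key_info"].split(",")]
    # S106: fetch the full original data by identifier.
    return [nosql_store[i] for i in ids]


store({"text": "Login failed for admin", "host": "10.0.0.5"})
store({"text": "Disk quota exceeded", "host": "10.0.0.7"})
print(query({"keyword": "admin"}))  # [{'text': 'Login failed for admin', 'host': '10.0.0.5'}]
```

The division of labour is visible in the data: each entry in `es_index` carries only a short key-information string and an identifier, while the full record lives only in `nosql_store`.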

In a specific implementation, the data storage and query method provided by the embodiments of the present application combines an Elasticsearch server with a non-relational database. The method may be executed on the Elasticsearch server, on the server where the non-relational database is located, or on a separate server deployed in the cluster.

There is no fixed order between steps S101 to S103 and steps S104 to S106; they may be performed concurrently or in a different order. Steps S101 and S102 may likewise be performed concurrently or in reverse order.

For step S101, the type of key information to extract is determined by the type of the original data to be stored and by the query conditions users commonly apply when searching for such data. Only this key information is extracted and stored in the Elasticsearch server; in other words, the Elasticsearch server holds only summary information about the original data, which supports querying without consuming a large amount of Elasticsearch storage space. Extracting the key information of the original data to be stored may specifically include: performing word segmentation on the original data to be stored to obtain word segmentation results; separating the word segmentation results with commas to obtain a character string; and taking the character string as the key information.
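The extraction of key information in step S101 can be sketched as follows. The regular-expression tokenizer is only a placeholder for word segmentation; Chinese text would need a real word segmenter, which is why the tokenizer is passed in as a parameter. The function names `simple_tokenize` and `extract_key_info` are hypothetical.

```python
import re
from typing import Callable, Iterable


def simple_tokenize(text: str) -> list[str]:
    # Placeholder segmenter: lower-cases and keeps runs of word characters.
    # A real deployment would plug in a proper (e.g. Chinese) word segmenter here.
    return re.findall(r"\w+", text.lower())


def extract_key_info(text: str,
                     tokenize: Callable[[str], Iterable[str]] = simple_tokenize) -> str:
    # 1) word segmentation; 2) separate the resulting words with commas;
    # 3) the comma-separated character string is the key information.
    return ",".join(tokenize(text))


print(extract_key_info("Disk quota exceeded on host web01"))  # disk,quota,exceeded,on,host,web01
```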

For step S102, a non-relational database is selected according to the type of the original data to be stored, and the original data is stored in that database. The storage identifier is created when the data is stored in the non-relational database and corresponds uniquely to that data; how it is generated depends on the type of non-relational database, and it directly or indirectly indicates where the data is stored.

Preferably, the non-relational database may be MongoDB. MongoDB is a database based on distributed file storage, written in C++, and designed to provide a scalable, high-performance data storage solution for web applications. Correspondingly, the storage identifier is specifically the unique identifier corresponding to the original data to be stored in MongoDB. The data model of a document-type non-relational database such as MongoDB can be viewed as Key-Value pairs in which the Value is structured data; the storage identifier is then the Key corresponding to the structured original data (the Value). MongoDB has the drawback of relatively low query performance. In the embodiments of the present application, Elasticsearch is therefore used to store the key information of the original data together with the unique identifier of that data in MongoDB: the powerful search capability of Elasticsearch quickly finds the unique identifier, which is then used to retrieve the original data from MongoDB.
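A sketch of step S102 with MongoDB, written against the PyMongo collection methods `insert_one` (whose result exposes `inserted_id`) and `find_one`. So that it runs without a database server, it includes a hypothetical `InMemoryCollection` that mimics just those two methods and hands out integers where MongoDB would issue ObjectIds. Against a real deployment one would pass a PyMongo collection instead, for example `MongoClient(uri)[db_name][collection_name]`, with all of those names assumed.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Optional


def store_raw(collection: Any, raw: dict) -> Any:
    # PyMongo: insert_one() returns an InsertOneResult; inserted_id is the
    # document's unique _id, used here as the storage identifier.
    return collection.insert_one(raw).inserted_id


def fetch_raw(collection: Any, storage_id: Any) -> Optional[dict]:
    # Every MongoDB collection has a unique index on _id, so this is a point lookup.
    return collection.find_one({"_id": storage_id})


# Hypothetical stand-in exposing only the two methods used above.
@dataclass
class _InsertResult:
    inserted_id: int


class InMemoryCollection:
    def __init__(self) -> None:
        self._docs: dict[int, dict] = {}
        self._next_id = itertools.count(1)

    def insert_one(self, doc: dict) -> _InsertResult:
        if "_id" not in doc:          # like PyMongo, add _id to the caller's dict
            doc["_id"] = next(self._next_id)
        self._docs[doc["_id"]] = doc
        return _InsertResult(doc["_id"])

    def find_one(self, flt: dict) -> Optional[dict]:
        return self._docs.get(flt["_id"])


coll = InMemoryCollection()
sid = store_raw(coll, {"event": "login_failed", "user": "admin"})
print(sid, fetch_raw(coll, sid))
```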

For step S103, word segmentation is performed on the original data to be stored, the word segmentation results are separated with commas to form a character string, and this character string, together with the storage identifier of the original data in the non-relational database, is stored in the Elasticsearch server.
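Step S103 amounts to indexing one small document per original record. The sketch shows a possible index mapping and document body as Python dicts; this schema is an assumption, not taken from the patent, and the field names `key_info` and `storage_id` are hypothetical. Such bodies could be sent through the REST API (`PUT /<index>` with the mapping, `POST /<index>/_doc` for each document) or an official client. Mapping `storage_id` as `keyword` keeps it an exact, unanalysed value; the analyzer for `key_info` is simply left at its default here.

```python
import json

# Hypothetical mapping: key_info is analysed text so its terms are searchable;
# storage_id is an exact keyword value that is stored and returned, not scored.
INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "key_info": {"type": "text"},
            "storage_id": {"type": "keyword"},
        }
    }
}


def build_index_doc(key_info: str, storage_id: str) -> dict:
    # Only the key information and the storage identifier go to Elasticsearch;
    # the original data itself stays in the non-relational database.
    return {"key_info": key_info, "storage_id": storage_id}


print(json.dumps(build_index_doc("disk,quota,exceeded,on,host,web01", "65f0c2a1e4b0a1b2c3d4e5f6")))
```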

For step S104, when the input data query request is received, the data query request is analyzed to obtain query conditions, such as data keywords, attribute value ranges, and the like.
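Step S104 leaves the request format open. Assuming, purely for illustration, a JSON-like request with a free-text field `q` and a `range` object of attribute bounds, parsing it into query conditions might look like this; the request shape and the `QueryCondition` type are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class QueryCondition:
    keywords: list[str] = field(default_factory=list)
    # attribute name -> (lower bound, upper bound); None means unbounded on that side
    ranges: dict[str, tuple[Optional[float], Optional[float]]] = field(default_factory=dict)


def parse_request(request: dict) -> QueryCondition:
    # Hypothetical request shape: {"q": "disk quota", "range": {"severity": [3, None]}}
    keywords = request.get("q", "").lower().split()
    ranges = {name: (bounds[0], bounds[1])
              for name, bounds in request.get("range", {}).items()}
    return QueryCondition(keywords, ranges)


print(parse_request({"q": "Disk quota", "range": {"severity": [3, None]}}))
```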

For step S105, the powerful search capability of the Elasticsearch server is used to find, according to the query condition, the target key information matching that condition and the target storage identifier corresponding to the target key information. At this point, the target storage identifier is held temporarily in the memory of the Elasticsearch server.

Specifically, step S105 of querying the Elasticsearch server according to the query condition to obtain the target key information and the target storage identifier corresponding to it may include: generating an Elasticsearch DSL (Domain Specific Language) query statement according to the query condition; and querying the Elasticsearch server with the Elasticsearch DSL query statement to obtain the target key information and the target storage identifier corresponding to the target key information.
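For step S105, the following sketch turns keyword and range conditions into an Elasticsearch Query DSL body built around a `bool` query: the keywords go into a `match` clause on the key-information field, range conditions go into `filter` (which restricts hits without affecting scoring), and `_source` limits each hit to the key information and storage identifier. The field names are assumptions carried over from a hypothetical mapping, and range filters presume the attributes concerned were also indexed as fields.

```python
import json
from typing import Optional


def build_dsl(keywords: list[str],
              ranges: dict[str, tuple[Optional[float], Optional[float]]],
              size: int = 100) -> dict:
    # Keywords: full-text match on the key-information field; all terms must match.
    must = ([{"match": {"key_info": {"query": " ".join(keywords), "operator": "and"}}}]
            if keywords else [])
    # Range conditions: filter context, so they restrict hits without scoring.
    filters = []
    for name, (low, high) in ranges.items():
        bounds = {}
        if low is not None:
            bounds["gte"] = low
        if high is not None:
            bounds["lte"] = high
        if bounds:
            filters.append({"range": {name: bounds}})
    return {
        "size": size,
        "query": {"bool": {"must": must, "filter": filters}},
        "_source": ["key_info", "storage_id"],   # no original data is returned
    }


def storage_ids_from_response(response: dict) -> list[str]:
    # Matching documents are listed under hits.hits[*]._source in a search response.
    return [hit["_source"]["storage_id"] for hit in response["hits"]["hits"]]


print(json.dumps(build_dsl(["disk", "quota"], {"severity": (3, None)}), indent=2))
```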

For step S106, the target storage identifier is sent to the server where the non-relational database is located. Because the storage identifier, established in step S102 according to the storage rules of the non-relational database, corresponds uniquely to the original data, that server can quickly query and obtain the corresponding target original data from the target storage identifier.
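When step S105 returns several identifiers, step S106 can fetch them in one round trip. Assuming MongoDB, the sketch builds an `$in` filter on `_id` (usable as `collection.find(build_id_filter(ids))` with PyMongo) and then restores the Elasticsearch ranking, since `$in` does not return documents in list order. If the identifiers were stored in Elasticsearch as strings, they would first need converting back with `bson.ObjectId`. Both function names are hypothetical.

```python
from typing import Any


def build_id_filter(storage_ids: list[Any]) -> dict:
    # MongoDB filter matching every document whose _id is in the list.
    return {"_id": {"$in": list(storage_ids)}}


def in_search_order(docs: list[dict], storage_ids: list[Any]) -> list[dict]:
    # $in results do not follow the order of the list, so restore the
    # Elasticsearch ranking; identifiers with no document are skipped.
    by_id = {doc["_id"]: doc for doc in docs}
    return [by_id[i] for i in storage_ids if i in by_id]


found = [{"_id": "a", "v": 1}, {"_id": "b", "v": 2}]   # e.g. list(collection.find(...))
print(in_search_order(found, ["b", "a", "missing"]))
```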

According to the data storage and query method provided by the embodiments of the application, original data to be stored is stored in a preset non-relational database, and the key information of that data, together with its storage identifier in the non-relational database, is stored in an Elasticsearch server. When data needs to be queried, the query condition is obtained from the data query request, the Elasticsearch server is queried according to the query condition to obtain target key information and the corresponding target storage identifier, and the target storage identifier is then used to query the non-relational database to obtain the target original data. For data storage, the method exploits the ease of storage and cluster expansion offered by non-relational databases, which effectively reduces the amount of data stored in the Elasticsearch server and lets storage keep pace with the growth of large-scale data; for querying, the slimmed-down Elasticsearch server can provide higher-performance query capability. Compared with prior-art approaches that store and query big data with an Elasticsearch cluster alone, the method provided by the embodiments of the application offers higher storage and query performance: when the data volume is very large, it significantly reduces the volume of data passing through Elasticsearch, and it relieves the bandwidth pressure on Elasticsearch and the query latency when the data volume reaches the billion level. Storage is also easier to expand, practically without limit, which meets the storage, query and expansion requirements of current big data.

Example two

When the types of the raw data to be stored are many, different types of raw data to be stored are applicable to different non-relational databases.

For example, document-type non-relational databases such as MongoDB and CouchDB are suitable for web applications and store data as Key-Value pairs. Their advantages are that the data structure requirements are loose and the table structure is variable, so it does not need to be predefined as in a relational database; their disadvantages are relatively low query performance and the lack of a unified query syntax. Key-Value non-relational databases such as Redis, Voldemort and Oracle BDB use a data model in which a Key points to a Value, are usually implemented with hash tables, and are suitable for content caching; they are mainly used to handle high access loads on large amounts of data and are also used in some logging systems. Column-oriented non-relational databases such as Cassandra, HBase and Riak are suitable for distributed file systems; their advantages are fast lookup, strong scalability and easy distributed expansion, while their drawback is relatively limited functionality. Graph non-relational databases such as Neo4J, InfoGrid and Infinite Graph are suitable for social networks, recommendation systems and the like; they focus on building relationship graphs and store data in a graph structure.

For a big data storage scheme that integrates multiple data types, different types of non-relational databases can be selected according to the data type. A data analysis script is established in advance to analyze the type of the original data to be stored and to select a non-relational database of the corresponding type for storage. When data is queried, the data analysis script determines the type of the original data, or the type of the non-relational database, from the target storage identifier returned by the Elasticsearch server; a query statement corresponding to that database type is then generated and submitted to the corresponding non-relational database.

On the basis of the foregoing embodiment, in the data storage and query method provided by the embodiments of the present application, step S106 of querying the non-relational database using the target storage identifier to obtain the target original data specifically includes:

generating, according to the target storage identifier, a database query statement corresponding to the type of the non-relational database;

and querying the non-relational database using the database query statement to obtain the target original data.
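The data analysis script of Example two is not specified further. One hypothetical way to let the query side recover the database type from the storage identifier alone is to prefix each identifier with the chosen backend, as sketched here. The classification rule, the prefixes, the Cassandra table name `raw_data` and the statement formats are all illustrative assumptions.

```python
# Hypothetical convention: the storage identifier is "<backend>:<native id>", so the
# database type can be read back from the identifier returned by Elasticsearch.

def choose_backend(raw: dict) -> str:
    # Toy stand-in for the data analysis script's type analysis.
    if "row_key" in raw:
        return "cassandra"   # wide-row data -> column-oriented store
    if "ttl" in raw:
        return "redis"       # short-lived cache entries -> key-value store
    return "mongo"           # everything else -> document store


def make_storage_id(backend: str, native_id: str) -> str:
    return f"{backend}:{native_id}"


def build_query_statement(storage_id: str) -> tuple[str, object]:
    # Split off the backend prefix and emit a statement in that database's own form.
    backend, native_id = storage_id.split(":", 1)
    if backend == "mongo":
        return backend, {"_id": native_id}                       # find_one() filter
    if backend == "redis":
        return backend, ("GET", native_id)                       # Redis command
    if backend == "cassandra":
        return backend, ("SELECT * FROM raw_data WHERE id = %s", (native_id,))  # CQL + params
    raise ValueError(f"unknown backend in storage identifier: {storage_id!r}")


print(build_query_statement(make_storage_id("redis", "session:42")))
```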

EXAMPLE III

In the prior art, when an Elasticsearch cluster is used to store and query original data, the query characteristics of Elasticsearch mean that the original data matched by the query conditions is loaded into memory before being output, which consumes a large share of limited memory resources and reduces data input/output efficiency. In the data storage and query method provided by the embodiments of the application, the original data is stored in the non-relational database, so the Elasticsearch server only needs to hold the queried target storage identifier in memory and send it to the server where the non-relational database is located, after which that memory can be released. On the basis of the foregoing embodiments, to further reduce memory usage, in the data storage and query method provided by the embodiments of the present application, step S106 of querying the non-relational database using the target storage identifier to obtain the target original data specifically includes:

and querying target original data in the non-relational database by using the target storage identifier and directly outputting the target original data from equipment where the non-relational database is located.

In a specific implementation, input and output of the original data are handled by the device where the non-relational database is located. Taking advantage of the ease with which a non-relational database stores and outputs data, the data I/O of Elasticsearch is offloaded to that device, which further reduces memory usage: only the storage identifier of the original data needs to be held temporarily in the memory of the Elasticsearch server, and the device where the non-relational database is located quickly finds the target original data by its storage identifier and outputs it directly. This optimizes server storage and avoids wasting resources.
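A minimal sketch of this direct output, assuming the Elasticsearch side hands over nothing but a list of storage identifiers: the database side looks each one up and writes it straight to the client connection as a JSON line, so no full result set is assembled on the Elasticsearch server. `io.StringIO` stands in for the network connection and `stream_results` is a hypothetical name; with PyMongo, iterating a `find()` cursor similarly fetches documents from the server in batches rather than all at once.

```python
import io
import json
from typing import Callable, Iterable, Optional, TextIO


def stream_results(lookup: Callable[[str], Optional[dict]],
                   storage_ids: Iterable[str], out: TextIO) -> int:
    # Runs next to the non-relational database: each document is looked up by its
    # storage identifier and written straight to the client, one JSON line at a time.
    written = 0
    for sid in storage_ids:
        doc = lookup(sid)
        if doc is None:          # identifier no longer present: skip it
            continue
        out.write(json.dumps(doc) + "\n")
        written += 1
    return written


raw_store = {"id1": {"event": "login_failed"}, "id2": {"event": "disk_full"}}
conn = io.StringIO()   # stands in for the client's network connection
n = stream_results(raw_store.get, ["id2", "missing", "id1"], conn)
print(n)  # 2
```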

On the basis of the above detailed description of the embodiments corresponding to the data storage and query method, the present application also discloses a data storage and query device, an apparatus and a readable storage medium corresponding to the above method.

Example four

Fig. 2 is a schematic structural diagram of a data storage and query device according to an embodiment of the present disclosure.

As shown in fig. 2, the data storage and query apparatus provided in the embodiment of the present application includes:

an extracting unit 201, configured to, when receiving original data to be stored, extract key information of the original data to be stored;

the first storage unit 202 is configured to store original data to be stored in a preset non-relational database, so as to obtain a storage identifier of the original data to be stored in the non-relational database;

a second storage unit 203, configured to store the key information and the storage identifier in an Elasticsearch server;

the first query unit 204 is configured to query the Elasticsearch server according to the query condition to obtain target key information and a target storage identifier corresponding to the target key information;

and the second query unit 205 is configured to query the non-relational database with the target storage identifier to obtain target raw data.

Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.

EXAMPLE five

As shown in fig. 3, the data storage and query device provided in the embodiment of the present application includes:

a memory 310 for storing instructions, the instructions comprising the steps of the data storage and query method according to any of the above embodiments;

a processor 320 for executing the instructions.

Processor 320 may include one or more processing cores, such as a 3-core processor or an 8-core processor. Processor 320 may be implemented in at least one hardware form of a digital signal processor (DSP), a field-programmable gate array (FPGA), or a programmable logic array (PLA). Processor 320 may also include a main processor and a coprocessor: the main processor, also called a central processing unit (CPU), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, processor 320 may integrate a graphics processing unit (GPU), which is responsible for rendering and drawing the content to be shown on the display screen. In some embodiments, processor 320 may also include an artificial intelligence (AI) processor for computing operations related to machine learning.

Memory 310 may include one or more readable storage media, which may be non-transitory. Memory 310 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In this embodiment, the memory 310 is at least used for storing the following computer program 311, wherein after the computer program 311 is loaded and executed by the processor 320, the relevant steps in the data storage and query method disclosed in any of the foregoing embodiments can be implemented. In addition, the resources stored by the memory 310 may also include an operating system 312, data 313, and the like, and the storage may be transient storage or persistent storage. The operating system 312 may be Windows, among others. Data 313 may include, but is not limited to, data involved in the above-described methods.

In some embodiments, the data storage and query device may also include a display 330, a power source 340, a communication interface 350, an input output interface 360, sensors 370, and a communication bus 380.

Those skilled in the art will appreciate that the architecture shown in FIG. 3 does not constitute a limitation of data storage and querying devices and may include more or fewer components than those shown.

The data storage and query device provided by the embodiment of the application comprises the memory and the processor, and the processor can realize the data storage and query method when executing the program stored in the memory, and the effect is the same as that of the data storage and query method.

EXAMPLE six

It should be noted that the above-described embodiments of the apparatus and device are merely illustrative, for example, the division of modules is only one division of logical functions, and there may be other divisions when actually implementing, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form. Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in hardware or as a software functional module.

If the integrated module is implemented as a software functional module and sold or used as a stand-alone product, it may be stored in a readable storage medium. On this understanding, the technical solutions of the present application, in whole or in part, may be embodied in the form of a software product. That product is stored in a storage medium and performs all or part of the steps of the methods described in the embodiments of the present application.

Accordingly, an embodiment of the present application further provides a readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements the steps of the data storage and query method described above.

The readable storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

When executed by a processor, the computer program contained in the readable storage medium provided in this embodiment implements the steps of the data storage and query method described above and achieves the same effects.

The data storage and query method, apparatus, device, and readable storage medium provided by the present application have been described in detail above. The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and the embodiments may be cross-referenced for the parts they share. The apparatus, device, and readable storage medium disclosed in the embodiments correspond to the disclosed method, so they are described only briefly; for relevant details, refer to the description of the method. It should be noted that those skilled in the art may make improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.

It is further noted that, in this specification, relational terms such as "first" and "second" are used only to distinguish one entity or action from another. They do not necessarily require or imply any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," and any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or apparatus that comprises a list of elements therefore includes not only those elements but may also include other elements not expressly listed or inherent to that process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises it.

Claims

1. A data storage and query method, characterized by comprising the following steps:

2. The data storage and query method according to claim 1, wherein extracting the key information of the original data to be stored specifically comprises:

and taking the character string as the key information.

3. The data storage and query method according to claim 1, wherein querying the Elasticsearch server according to the query condition to obtain the target key information and the target storage identifier corresponding to the target key information specifically comprises:

4. The data storage and query method according to claim 1, wherein the non-relational database is specifically MongoDB;

5. The data storage and query method according to claim 1, wherein the querying the non-relational database using the target storage identifier to obtain target raw data specifically comprises:

6. The data storage and query method according to claim 1, wherein querying the non-relational database using the target storage identifier to obtain the target raw data is specifically:

7. The data storage and query method according to claim 1, wherein the data storage and query method is applied to the Elasticsearch server.

8. A data storage and query apparatus, comprising:

9. A data storage and query device, comprising:

a memory for storing instructions, said instructions comprising the steps of the data storage and query method of any one of claims 1 to 7;

a processor to execute the instructions.

10. A readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data storage and query method according to any one of claims 1 to 7.