CN111625600B - Data storage processing method, system, computer equipment and storage medium - Google Patents


Info

Publication number
CN111625600B
Authority
CN
China
Prior art keywords
data
real
database
time
server
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010433605.5A
Other languages
Chinese (zh)
Other versions
CN111625600A (en)
Inventor
盛森林
范渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd
Priority to CN202010433605.5A
Publication of CN111625600A
Application granted
Publication of CN111625600B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a data storage processing method, system, computer device and storage medium. The data storage processing method comprises the following steps: dividing acquired data into real-time data and historical data according to service time; and storing the real-time data in an Elasticsearch database and the historical data in a CarbonData database. The application solves the problem of low efficiency in storing cold and hot data.

Description

Data storage processing method, system, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a data storage processing method, a data storage processing system, a computer device, and a storage medium.
Background
As data volumes keep growing, how to store and use data efficiently has become a concern for every enterprise. In the related art, a hot identifier indicating how frequently data is accessed is generally added to the logical-to-physical address mapping table of a memory; the identifier indicates the hotness level of the data, so that cold and hot data can be separated. However, the related art must perform wear leveling during data writing to balance the wear of each data block, and wear leveling distorts the observed access frequency of the data, so the resulting cold/hot data separation is inefficient.
No effective solution has yet been proposed for the low efficiency of cold and hot data storage processing in the related art.
Disclosure of Invention
The embodiments of the application provide a data storage processing method, system, computer device and storage medium, to at least solve the problem of low efficiency in cold and hot data storage processing in the related art.
In a first aspect, an embodiment of the present application provides a processing system for data storage. The system includes a server provided with an Elasticsearch database and a CarbonData database;
the server is configured to acquire data from each data source and divide the data into real-time data and historical data according to service time;
the server stores the real-time data in the Elasticsearch database and stores the historical data in the CarbonData database.
In some of these embodiments, the system further comprises a terminal connected to the server;
upon receiving a query instruction sent by the terminal, the server divides the historical data into historical simple data and historical complex data according to the service time;
when the query instruction is a simple data query instruction, the server queries the historical simple data in the CarbonData database and the Elasticsearch database;
and when the query instruction is a complex data query instruction, the server queries the historical complex data in the CarbonData database and the Elasticsearch database.
In some of these embodiments, the server is further configured to write the real-time data to Kafka;
and the server reads the real-time data from Kafka through a Flink engine and writes it into the Elasticsearch database for real-time indexing.
In some embodiments, the server is further configured to obtain index name information of the real-time data according to the service time.
In some of these embodiments, the server is further configured to write the historical data to the Hadoop Distributed File System (HDFS);
and the server reads the historical data from HDFS through a Spark engine and writes it into the CarbonData database for offline storage.
In a second aspect, an embodiment of the present application provides a method for processing data storage, where the method includes:
dividing the acquired data into real-time data and historical data according to the service time;
storing the real-time data in an Elasticsearch database and the historical data in a CarbonData database.
In some of these embodiments, after the historical data is stored in the CarbonData database, the method further comprises:
upon receiving a query instruction sent by a terminal, dividing the historical data into historical simple data and historical complex data according to the service time;
when the query instruction is a simple data query instruction, querying the historical simple data in the CarbonData database and the Elasticsearch database;
and when the query instruction is a complex data query instruction, querying the historical complex data in the CarbonData database and the Elasticsearch database.
In some of these embodiments, storing the real-time data in an Elasticsearch database comprises:
writing the real-time data into Kafka;
and reading the real-time data from Kafka through a Flink engine, and writing it into a real-time index in the Elasticsearch database.
In some of these embodiments, after the real-time index is written in the Elasticsearch database, the method further comprises:
acquiring index name information of the real-time data according to the service time.
In some of these embodiments, storing the historical data in a CarbonData database comprises:
writing the historical data into HDFS;
and reading the historical data from HDFS through a Spark engine, and writing it into the CarbonData database for offline storage.
In a third aspect, an embodiment of the present application provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements a method for processing data storage according to the second aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method of processing data storage as described in the second aspect above.
Compared with the related art, the data storage processing method, system, computer device and storage medium provided by the embodiments of the application divide acquired data into real-time data and historical data according to service time, store the real-time data in an Elasticsearch database and the historical data in a CarbonData database, and thereby solve the problem of low efficiency in storing cold and hot data.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the other features, objects, and advantages of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of an application scenario of a data storage processing method according to an embodiment of the present application;
FIG. 2 is a first flowchart of a data storage processing method according to an embodiment of the present application;
FIG. 3 is a second flowchart of a data storage processing method according to an embodiment of the present application;
FIG. 4 is a third flowchart of a data storage processing method according to an embodiment of the present application;
FIG. 5 is a fourth flowchart of a data storage processing method according to an embodiment of the present application;
FIG. 6 is a first block diagram of a data storage processing system according to an embodiment of the present application;
FIG. 7 is a second block diagram of a data storage processing system according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present application.
Detailed Description
The present application will be described and illustrated with reference to the accompanying drawings and examples in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. All other embodiments, which can be made by a person of ordinary skill in the art based on the embodiments provided by the present application without making any inventive effort, are intended to fall within the scope of the present application.
It is apparent that the drawings in the following description are only some examples or embodiments of the present application, and those of ordinary skill in the art can apply the present application to other similar situations according to these drawings without inventive effort. Moreover, it should be appreciated that while such a development effort might be complex and lengthy, it would nevertheless be a routine undertaking of design, fabrication, or manufacture for those of ordinary skill having the benefit of this disclosure, and thus should not be construed as indicating that this disclosure is insufficient.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is to be expressly and implicitly understood by those of ordinary skill in the art that the described embodiments of the application can be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein should be given the ordinary meaning as understood by one of ordinary skill in the art to which this application belongs. The terms "a," "an," "the," and similar referents in the context of the application are not to be construed as limiting the quantity, but may denote the singular or the plural. The terms "comprising," "including," "having," and any variations thereof, are intended to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to only those steps or elements but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The terms "connected," "coupled," and the like in connection with the present application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as used herein means two or more. "And/or" describes an association relationship between associated objects and means that three relationships may exist; e.g., "A and/or B" may mean: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects. The terms "first," "second," "third," and the like, as used herein, merely distinguish similar objects and do not represent a particular ordering of objects.
The data storage processing method provided by the application can be applied in the application environment shown in FIG. 1, where the terminal 12 communicates with the server 14 via a network. The server 14 acquires data from each data source and divides it into real-time data and historical data according to service time; stores the real-time data in an Elasticsearch database and the historical data in a CarbonData database; upon receiving a query instruction sent by the terminal 12, divides the historical data into historical simple data and historical complex data according to the service time; and performs the corresponding query according to the query instruction and sends the query result to the terminal 12 for display. The terminal 12 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device, and the server 14 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In this embodiment, a processing method for data storage is provided. Fig. 2 is a flowchart of a data storage processing method according to an embodiment of the present application, as shown in fig. 2, the flowchart includes the following steps:
Step S202: divide the acquired data into real-time data and historical data according to service time. The data may be collected by data acquisition and processing tools such as Flume or Sqoop from sources such as MySQL, syslog or HTTP interfaces, yielding data in the formats of the various data sources. Data within the most recent period of time may be classified as real-time data according to the service time; for example, data whose service time is the current day is determined to be real-time data, and data whose service time is the previous day is determined to be historical data.
Step S204: store the real-time data as hot data in an Elasticsearch database, and store the historical data as cold data in a CarbonData database. CarbonData characterizes the stored data through multi-level indexes over its file format, and uses dictionary encoding to reduce central processing unit (CPU) and memory consumption during queries. Its indexing makes point queries efficient: a query on a sorted table of 5 trillion records (3 PB) responds within 3 seconds. Its efficient compressed storage reduces the disk space needed, and data that does not meet the query conditions can be quickly filtered out of the historical data. In addition, storing cold and hot data separately in the CarbonData database and the Elasticsearch database suits scenarios in which a situational-awareness large-screen display accesses a data center and interacts in real time with large volumes of alarm data and traffic data.
Through steps S202 to S204, data is graded by degree of use according to its service time and classified as hot or cold, and two distributed data storage technologies, Elasticsearch and CarbonData, are used together so that each compensates for the shortcomings of the other. Storing the different types of data separately makes efficient use of the memory and disk storage resources of the server 14 without affecting data query, use or analysis, and saves hardware cost. At the same time, precise placement of data and a sensible technical architecture improve the overall utilization of the server 14 and the efficiency of data query and analysis, solving the problem of low efficiency in storing cold and hot data.
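The day-based split in step S202 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Record` shape and the function name `split_by_service_time` are hypothetical, and the rule "same calendar day as today means real-time" is taken from the example in the text.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import List, Tuple


@dataclass
class Record:
    # Hypothetical record shape; the patent does not define fields.
    source: str              # e.g. "mysql", "syslog", "http"
    service_time: datetime   # the business/service time of the record
    payload: dict


def split_by_service_time(records: List[Record],
                          today: date) -> Tuple[List[Record], List[Record]]:
    """Return (real_time, historical).

    Assumption: a record is real-time if its service time falls on the
    current day; everything else, including any future-dated record,
    is treated as historical.
    """
    real_time: List[Record] = []
    historical: List[Record] = []
    for record in records:
        if record.service_time.date() == today:
            real_time.append(record)
        else:
            historical.append(record)
    return real_time, historical
```

For example, with `today = date(2020, 5, 20)`, a syslog record stamped 2020-05-20 09:30 goes to the real-time list and a MySQL record stamped 2020-05-19 23:59 goes to the historical list.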
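The routing in step S204 (hot data to Elasticsearch, cold data to CarbonData) can be sketched with a small sink interface. The `Sink` protocol, the `InMemorySink` class and `store_hot_and_cold` are hypothetical stand-ins used only to show the dispatch; a real deployment would replace them with actual Elasticsearch and CarbonData writers.

```python
from typing import Dict, List, Protocol


class Sink(Protocol):
    """Hypothetical writer interface for a storage backend."""

    def write(self, rows: List[dict]) -> None: ...


class InMemorySink:
    """Stand-in for an Elasticsearch or CarbonData writer (assumption)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: List[dict] = []

    def write(self, rows: List[dict]) -> None:
        self.rows.extend(rows)


def store_hot_and_cold(real_time: List[dict], historical: List[dict],
                       hot: Sink, cold: Sink) -> Dict[str, int]:
    """Write real-time rows to the hot sink and historical rows to the cold sink."""
    hot.write(real_time)    # hot data: Elasticsearch in the patent
    cold.write(historical)  # cold data: CarbonData in the patent
    return {"hot": len(real_time), "cold": len(historical)}
```

Keeping the two sinks behind one interface makes the hot/cold decision the only place that knows which backend receives which data.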
In some of these embodiments, a method of processing data storage is provided. FIG. 3 is a second flowchart of a data storage processing method according to an embodiment of the present application, as shown in FIG. 3, the flowchart includes the following steps:
Step S302: upon receiving a query instruction sent by the terminal 12, divide the historical data into historical simple data and historical complex data according to the service time. For example, the server 14 determines historical data whose service time is within 15 days as historical simple data, and historical data whose service time is not within 15 days as historical complex data. The query instruction instructs the server 14 to query user-specified data.
Step S304: when the query instruction is a simple data query instruction, query the historical simple data in the CarbonData database and the Elasticsearch database. The server 14 stores the historical simple data as hot data, so for a simple data query it queries both the CarbonData hot data index and the real-time hot data index of the Elasticsearch database; if the user-specified data is not found there, the server 14 may also query the cold data in the CarbonData database. Because the Elasticsearch database holds a relatively small amount of real-time data, query results can be returned quickly.
When the query instruction is a complex data query instruction, query the historical complex data in the CarbonData database and the Elasticsearch database. The server 14 may query and compute over the real-time simple data index and then combine the result with the offline complex-type data index to obtain the query result. Thanks to CarbonData's efficient data indexing, the data structures and caching of its DataMap, and its very fast single-table queries, the server 14 can return cold-data query results within seconds.
Through steps S302 to S304, the server 14 applies different query processing to the two types of data at the query stage. This reduces the large memory otherwise needed for fast queries over massive data, exploits the strengths of each distributed storage system while maintaining performance, greatly reduces storage cost, and enables fast queries over massive data.
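The 15-day split in step S302 can be sketched as below. Treating an age of exactly 15 days as "within 15 days" (an inclusive boundary) is an assumption, as are the function names and the hypothetical `"day"` field on each record.

```python
from datetime import date
from typing import List, Tuple

SIMPLE_WINDOW_DAYS = 15  # window taken from the example in step S302


def classify_history(service_day: date, today: date,
                     window_days: int = SIMPLE_WINDOW_DAYS) -> str:
    """Return "simple" if service_day is at most window_days old, else "complex"."""
    age_days = (today - service_day).days
    return "simple" if age_days <= window_days else "complex"


def split_history(records: List[dict], today: date,
                  window_days: int = SIMPLE_WINDOW_DAYS) -> Tuple[List[dict], List[dict]]:
    """Split historical records (hypothetical {"day": date, ...} dicts) into (simple, complex)."""
    simple: List[dict] = []
    complex_: List[dict] = []
    for record in records:
        if classify_history(record["day"], today, window_days) == "simple":
            simple.append(record)
        else:
            complex_.append(record)
    return simple, complex_
```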
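The query routing described for simple and complex queries can be sketched as a dispatcher over four query functions. The function names (`es_realtime`, `carbon_hot`, `carbon_cold`, `carbon_complex`), the `make_table` helper and the equality-match criteria are all hypothetical; they stand in for the Elasticsearch real-time index and the CarbonData hot, cold and complex-type indexes.

```python
from typing import Callable, List

# A query function takes match criteria and returns matching rows.
QueryFn = Callable[[dict], List[dict]]


def make_table(rows: List[dict]) -> QueryFn:
    """Hypothetical in-memory index: match rows whose fields equal the criteria."""
    def query(criteria: dict) -> List[dict]:
        return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]
    return query


def run_query(kind: str, criteria: dict, *, es_realtime: QueryFn,
              carbon_hot: QueryFn, carbon_cold: QueryFn,
              carbon_complex: QueryFn) -> List[dict]:
    if kind == "simple":
        # Query the CarbonData hot index and the Elasticsearch real-time index.
        rows = carbon_hot(criteria) + es_realtime(criteria)
        if not rows:
            # Nothing found in hot data: fall back to CarbonData cold data.
            rows = carbon_cold(criteria)
        return rows
    if kind == "complex":
        # Combine the real-time result with the offline complex-type index.
        return es_realtime(criteria) + carbon_complex(criteria)
    raise ValueError(f"unknown query kind: {kind!r}")
```

The fallback to cold data only when the hot indexes return nothing mirrors the "may also query the cold data" wording; whether the real system always or only conditionally consults cold data is not specified.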
In some of these embodiments, a method of processing data storage is provided. FIG. 4 is a third flowchart of a data storage processing method according to an embodiment of the present application. As shown in FIG. 4, the flow includes the following steps:
Step S402: write the real-time data into Kafka; read the real-time data from Kafka through a Flink engine, and write it into a real-time index in the Elasticsearch database. The real-time data processing engine Flink processes the real-time data; its high performance, reliability and scalability make it well suited to real-time data processing. The processing results are written into the Elasticsearch database in real time and stored as hot data.
In some embodiments, index name information of the real-time data is obtained according to the service time. For example, the name of the index written into the Elasticsearch database may use the current day's date as a suffix, which makes multi-index queries convenient for the user and enables fast queries on real-time data.
Through step S402, data in Kafka is read and processed by the real-time data processing engine Flink and written into the real-time index of the Elasticsearch database, making full use of Flink's high performance and enabling Elasticsearch to index the real-time data quickly.
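The date-suffixed index naming can be sketched as follows, together with the grouping a stream job might do before writing a batch of events to their daily indexes. The `"alarm"` prefix, the `YYYY.MM.DD` suffix format, the ISO-8601 `"service_time"` field and both function names are assumptions; the text only says the day's date is used as the suffix. Elasticsearch's multi-target syntax accepts comma-separated index names and wildcards (for example `alarm-2020.05.*`), which is what makes such names convenient for multi-index queries.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, List


def daily_index_name(prefix: str, day: date) -> str:
    """Index name with the day as suffix, e.g. "alarm-2020.05.20" (format assumed)."""
    return f"{prefix}-{day:%Y.%m.%d}"


def route_to_daily_indices(prefix: str, events: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group events (as read from Kafka) by their target daily index."""
    batches: Dict[str, List[dict]] = defaultdict(list)
    for event in events:
        # Hypothetical field: an ISO-8601 service-time string on each event.
        day = datetime.fromisoformat(event["service_time"]).date()
        batches[daily_index_name(prefix, day)].append(event)
    return dict(batches)
```

For example, `daily_index_name("alarm", date(2020, 5, 20))` returns `"alarm-2020.05.20"`.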
In some of these embodiments, a method of processing data storage is provided. FIG. 5 is a fourth flowchart of a data storage processing method according to an embodiment of the present application. As shown in FIG. 5, the flow includes the following steps:
Step S502: write the historical data into HDFS; read the historical data from HDFS through a Spark engine, and write it into the CarbonData database for offline storage. During off-peak traffic periods, typically in the early morning, the user may enter an offline processing instruction at the terminal 12. According to the received instruction, the server 14 runs offline batch tasks that process the data stored in HDFS and refresh and correct the real-time data. The server 14 does not operate directly on the real-time index currently being generated; it operates only on real-time indexes that have expired and are no longer used by the query side. The results of the batch processing are likewise divided into simple-type data and complex business-type data and stored in different indexes. Using the CarbonData database here has the advantage that complex data analysis, including joins and aggregations, can easily be performed on top of the Spark distributed computing engine. After the offline batch processing is completed, the server 14 may purge the real-time data of the previous cycle, because that data has already been consolidated into the historical data.
Through step S502, the historical data is written into the CarbonData database through the Spark engine for offline storage. Spark SQL is Spark's module for processing structured data; with its full set of processing operators, its high-performance Catalyst optimizer and its good integration with the big data ecosystem, it is widely supported and used. CarbonData integrates seamlessly with Spark SQL and is one of the storage formats Spark SQL supports, so Spark's processing capability combined with CarbonData's indexing enables fast queries and complex analysis of big data, including efficient multi-table joins, further improving the efficiency of data storage processing.
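The cleanup at the end of step S502 (purging the previous cycle's real-time data only after the offline batch has finished) can be sketched as a selector over daily index names. The `<prefix>-YYYY.MM.DD` naming, the one-day cycle, the `batch_completed` flag and the function name are assumptions made for illustration; the sketch only chooses candidates and does not delete anything.

```python
from datetime import date, datetime
from typing import Iterable, List


def expired_realtime_indices(index_names: Iterable[str], prefix: str,
                             current_day: date, batch_completed: bool) -> List[str]:
    """Return real-time indices from earlier cycles that may be purged.

    Nothing is returned until the offline batch has completed, because only
    then has the earlier real-time data been consolidated into the
    historical (CarbonData) store.
    """
    if not batch_completed:
        return []
    expired: List[str] = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not one of this pipeline's real-time indices
        suffix = name[len(prefix) + 1:]
        try:
            day = datetime.strptime(suffix, "%Y.%m.%d").date()
        except ValueError:
            continue  # suffix is not a date: leave the index alone
        if day < current_day:
            expired.append(name)  # earlier cycle; today's index is still live
    return sorted(expired)
```

Today's index is never selected, which matches the rule that the index currently being generated is not touched.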
It should be understood that although the steps in the flowcharts of FIGS. 2 to 5 are shown in the sequence indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be performed in other orders. Moreover, at least some of the steps in FIGS. 2 to 5 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times; nor are they necessarily performed sequentially, but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In this embodiment, a processing system for data storage is provided. FIG. 6 is a block diagram of a data storage processing system according to an embodiment of the present application. As shown in FIG. 6, the system includes a server 14 provided with an Elasticsearch database 62 and a CarbonData database 64; the server 14 is configured to acquire data from each data source and divide the data into real-time data and historical data according to service time; the server 14 stores the real-time data in the Elasticsearch database 62 and the historical data in the CarbonData database 64.
Through the above embodiment, the server 14 grades data by degree of use according to service time and classifies it as hot or cold, and makes full use of two distributed data storage technologies, Elasticsearch and CarbonData, which complement each other's shortcomings, to store different types of data. The memory and disk storage resources of the server 14 are thus used efficiently without affecting data query, use or analysis, and hardware cost is saved. At the same time, precise placement of data and a sensible technical architecture improve the overall utilization of the server 14 and the efficiency of data query and analysis, solving the problem of low efficiency in cold and hot data storage processing.
In some of these embodiments, a processing system for data storage is provided. FIG. 7 is a second block diagram of a data storage processing system according to an embodiment of the present application. As shown in FIG. 7, the system further includes a terminal 12 connected to the server 14. Upon receiving a query instruction sent by the terminal 12, the server 14 divides the historical data into historical simple data and historical complex data according to the service time; when the query instruction is a simple data query instruction, the server 14 queries the historical simple data in the CarbonData database 64 and the Elasticsearch database 62; and when it is a complex data query instruction, the server 14 queries the historical complex data in the CarbonData database 64 and the Elasticsearch database 62.
Through the above embodiment, the server 14 applies different query processing to the two types of data at the query stage. This reduces the large memory otherwise needed for fast queries over massive data, exploits the strengths of each distributed storage system while maintaining performance, greatly reduces storage cost, and enables fast queries over massive data.
In some of these embodiments, the server 14 is further configured to write the real-time data to Kafka; the server 14 reads the real-time data from Kafka through a Flink engine and writes it into a real-time index in the Elasticsearch database 62. Because the server 14 reads and processes the data in Kafka with the real-time data processing engine Flink and writes it into the real-time index of the Elasticsearch database, the high performance of Flink is fully exploited and Elasticsearch can index the real-time data quickly.
In some embodiments, the server 14 is further configured to obtain index name information of the real-time data according to the service time; for example, the server 14 names each index written into the Elasticsearch database with the current day's date as a suffix, which makes multi-index queries convenient for the user and enables fast queries on real-time data.
In some of these embodiments, the server 14 is further configured to write the historical data to the Hadoop Distributed File System (HDFS); the server 14 reads the historical data from HDFS through a Spark engine and writes it into the CarbonData database 64 for offline storage. CarbonData integrates seamlessly with Spark SQL and is one of the storage formats Spark SQL supports, so Spark's processing capability combined with CarbonData's indexing enables fast queries and complex analysis of big data, including efficient multi-table joins, further improving the efficiency of data storage processing.
In addition, the processing method of the data storage according to the embodiment of the present application described in connection with fig. 2 may be implemented by a computer device. Fig. 8 is a schematic diagram of a hardware structure of a computer device according to an embodiment of the present application.
The computer device may include a processor 81 and a memory 82 storing computer program instructions.
Specifically, the processor 81 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 82 may include mass storage for data or instructions. By way of example and not limitation, the memory 82 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disc, a magneto-optical disc, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 82 may include removable or non-removable (fixed) media, and may be internal or external to the data processing apparatus. In a particular embodiment, the memory 82 is non-volatile memory. In a particular embodiment, the memory 82 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these. Where appropriate, the RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data out DRAM (EDODRAM), synchronous DRAM (SDRAM), or the like.
The memory 82 may be used to store or cache the data files that need to be processed and/or communicated, as well as computer program instructions executed by the processor 81.
The processor 81 implements the data storage processing method of any of the above embodiments by reading and executing the computer program instructions stored in the memory 82.
In some of these embodiments, the computer device may further include a communication interface 83 and a bus 80. As shown in Fig. 8, the processor 81, the memory 82, and the communication interface 83 are connected to one another via the bus 80 and communicate with one another over it.
The communication interface 83 is used to implement communication between the modules, apparatuses, units, and/or devices in the embodiments of the present application. The communication interface 83 may also carry data communication with other components, such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The bus 80 includes hardware, software, or both, and couples the components of the computer device to one another. The bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, and a local bus. By way of example and not limitation, the bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a front side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a low pin count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Where appropriate, the bus 80 may include one or more buses. Although the embodiments of the present application describe and illustrate a particular bus, the present application contemplates any suitable bus or interconnect.
The computer device may execute the data storage processing method of the embodiments of the present application based on the acquired real-time data and historical data, thereby implementing the processing method described in connection with Fig. 2.
In addition, in combination with the data storage processing method of the above embodiments, an embodiment of the present application may provide a computer-readable storage medium. The computer-readable storage medium stores computer program instructions which, when executed by a processor, implement the data storage processing method of any of the above embodiments.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not every possible combination of these technical features is described; however, any combination of technical features that involves no contradiction should be considered within the scope of this specification.
The above embodiments describe only several implementations of the present application in detail and should not be construed as limiting its scope. It should be noted that those skilled in the art may make a number of variations and improvements without departing from the concept of the present application, all of which fall within its scope of protection. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (10)

1. A processing system for data storage, wherein the system comprises a server and a terminal, and the server is provided with an Elasticsearch database and a CarbonData database;
the server is configured to acquire data from each data source and to divide the data into real-time data and historical data according to service time;
the server stores the real-time data in the Elasticsearch database and the historical data in the CarbonData database; the server is further configured to write the historical data into the Hadoop Distributed File System (HDFS);
the server reads the historical data from HDFS through a Spark engine and writes it into the CarbonData database for offline storage; and the server is further configured to perform offline batch task processing according to an acquired offline processing instruction, processing the data stored in HDFS to refresh and correct the real-time data.
2. The processing system of claim 1, wherein the system further comprises a terminal connected to the server;
upon receiving a query instruction sent by the terminal, the server divides the historical data into historical simple data and historical complex data according to the service time;
when the query instruction is a simple data query instruction, the server queries the historical simple data in the CarbonData database and the Elasticsearch database;
and when the query instruction is a complex data query instruction, the server queries the historical complex data in the CarbonData database and the Elasticsearch database.
3. The processing system of claim 1, wherein the server is further configured to write the real-time data to Kafka;
and the server reads the real-time data distributed by Kafka through a Flink engine and writes it into the Elasticsearch database for real-time indexing.
4. A processing system according to claim 3, wherein the server is further configured to obtain index name information of the real-time data according to the service time.
5. A method of processing data storage, the method comprising:
dividing the acquired data into real-time data and historical data according to the service time;
storing the real-time data in an Elasticsearch database and the historical data in a CarbonData database; writing the historical data into the Hadoop Distributed File System (HDFS);
reading the historical data from HDFS through a Spark engine and writing it into the CarbonData database for offline storage; and performing offline batch task processing based on an acquired offline processing instruction, processing the data stored in HDFS to refresh and correct the real-time data.
6. The processing method of claim 5, wherein after storing the historical data in the CarbonData database, the method further comprises:
upon receiving a query instruction sent by a terminal, dividing the historical data into historical simple data and historical complex data according to the service time;
when the query instruction is a simple data query instruction, querying the historical simple data in the CarbonData database and the Elasticsearch database;
and when the query instruction is a complex data query instruction, querying the historical complex data in the CarbonData database and the Elasticsearch database.
7. The processing method of claim 5, wherein storing the real-time data in the Elasticsearch database comprises:
writing the real-time data into Kafka;
and reading the real-time data distributed by Kafka through a Flink engine, and writing a real-time index in the Elasticsearch database.
8. The processing method of claim 7, wherein after writing the real-time index in the Elasticsearch database, the method further comprises:
acquiring index name information of the real-time data according to the service time.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the processing method according to any of claims 5 to 8 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the processing method according to any one of claims 5 to 8.
CN202010433605.5A 2020-05-21 2020-05-21 Data storage processing method, system, computer equipment and storage medium Active CN111625600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010433605.5A CN111625600B (en) 2020-05-21 2020-05-21 Data storage processing method, system, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010433605.5A CN111625600B (en) 2020-05-21 2020-05-21 Data storage processing method, system, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111625600A CN111625600A (en) 2020-09-04
CN111625600B true CN111625600B (en) 2023-10-31

Family

ID=72260073

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010433605.5A Active CN111625600B (en) 2020-05-21 2020-05-21 Data storage processing method, system, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111625600B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112115114A (en) * 2020-09-25 2020-12-22 北京百度网讯科技有限公司 Log processing method, device, equipment and storage medium
CN112241419B (en) * 2020-10-29 2023-05-02 浙江集享电子商务有限公司 Service data processing method, device, computer equipment and storage medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN108197289A (en) * 2018-01-18 2018-06-22 吉浦斯信息咨询(深圳)有限公司 A kind of data store organisation, data store query method, terminal and medium
WO2018170276A2 (en) * 2017-03-15 2018-09-20 Fauna, Inc. Methods and systems for a database
CN109871367A (en) * 2019-02-28 2019-06-11 江苏实达迪美数据处理有限公司 A kind of distributed cold and heat data separation method based on Redis and HBase
US10409516B1 (en) * 2018-01-12 2019-09-10 EMC IP Holding Company LLC Positional indexing for a tiered data storage system
CN110795427A (en) * 2019-09-27 2020-02-14 苏宁云计算有限公司 Data separation storage method and device, computer equipment and storage medium
CN110928906A (en) * 2019-11-08 2020-03-27 杭州安恒信息技术股份有限公司 Method for writing carbon data only once based on flink

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US9934294B2 (en) * 2014-09-26 2018-04-03 Wal-Mart Stores, Inc. System and method for using past or external information for future search results

Non-Patent Citations (2)

Title
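Design and implementation of an ad hoc query and statistics subsystem for big data of military personnel electronic health records; 迟晨阳; 孟海滨; 秦栋梁; 钱诚; 赵东升; 毛华坚; Military Medical Sciences (12); full text *
黑马程序员. ThinkPHP 5 Framework Principles and Practice (《ThinkPHP 5框架原理与实战》). 北京铁道出版社, 2018, p. 293. *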

Also Published As

Publication number Publication date
CN111625600A (en) 2020-09-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant