CN111930837A - Mass data processing method and system based on preposed distributed database - Google Patents

Mass data processing method and system based on preposed distributed database

Publication number
CN111930837A
Authority
CN
China
Prior art keywords
cluster
transaction date
data
distributed database
current time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010703239.0A
Other languages
Chinese (zh)
Inventor
刘跃红
余丽玲
管正爽
郭倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yinsheng Payment Service Co Ltd
Original Assignee
Yinsheng Payment Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yinsheng Payment Service Co Ltd
Priority to CN202010703239.0A
Publication of CN111930837A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23: Updating

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention provides a mass data processing method based on a preposed distributed database, which comprises the following steps. Step one: acquiring the current time and the transaction date recorded in a distributed database, wherein the distributed database comprises a front-end cluster and a full-scale cluster. Step two: comparing the distance between the current time and the transaction date with a threshold. Step three: if the distance between the current time and the transaction date is greater than the threshold, the data corresponding to the transaction date is cold data and is stored in the full-scale cluster. Step four: if the distance between the current time and the transaction date is less than or equal to the threshold, the data corresponding to the transaction date is hot data and is stored in the front-end cluster. The embodiment of the invention makes it convenient to separate hot data from cold data and reduces the load on the distributed database.

Description

Mass data processing method and system based on preposed distributed database
Technical Field
The invention relates to the field of distributed databases, in particular to a mass data processing method and system based on a preposed distributed database.
Background
With the rapid development of services and the rapid increase of data volume, data storage and data use gradually become the bottleneck of the system.
Current solutions to the database bottleneck include master-slave synchronization with read-write separation, and sharding across databases and tables. Read-write separation can be combined with a distributed database, which effectively improves read performance and reduces the influence of frequent read requests on the write function of the database. Sharding across databases and tables solves the write problem for non-massive data, but when the data volume reaches the PB level and write requests are frequent, write performance degrades severely because of the overhead of selecting the write node.
Summary of the invention
In order to overcome the defects of the prior art, the invention provides a mass data processing method based on a preposed distributed database, which separates hot data from cold data and relieves the heavy impact of mass data on the database.
The technical scheme adopted by the invention to solve the technical problem is as follows: a mass data processing method based on a preposed distributed database comprises the following steps. Step one: acquiring the current time and the transaction date recorded in a distributed database, wherein the distributed database comprises a front-end cluster and a full-scale cluster. Step two: comparing the distance between the current time and the transaction date with a threshold. Step three: if the distance between the current time and the transaction date is greater than the threshold, the data corresponding to the transaction date is cold data and is stored in the full-scale cluster. Step four: if the distance between the current time and the transaction date is less than or equal to the threshold, the data corresponding to the transaction date is hot data and is stored in the front-end cluster.
Preferably, before the acquiring of the current time and the transaction date recorded in the distributed database, the method further comprises:
creating an interface of the current application program based on different tables in the distributed database and different permissions of different users.
Preferably, before the creating of the interface of the current application program, the method further comprises:
creating tables, indexes and shard structures with consistent structures in the front-end cluster and the full-scale cluster.
Preferably, after the data corresponding to the transaction date is determined to be hot data and is stored in the front-end cluster because the distance between the current time and the transaction date is less than or equal to the threshold, the method further comprises:
periodically storing the data in the front-end cluster to the full-scale cluster through a script program.
Preferably, after the data in the front-end cluster is periodically stored to the full-scale cluster through the script program, the method further comprises:
periodically clearing the expired data in the front-end cluster through a script program according to a custom routing rule.
A mass data processing system based on a preposed distributed database, the system comprising:
an acquisition unit, configured to acquire the current time and the transaction date recorded in a distributed database, wherein the distributed database comprises a front-end cluster and a full-scale cluster;
a comparison unit, configured to compare the distance between the current time and the transaction date with a threshold;
a first storage unit, configured to, if the distance between the current time and the transaction date is greater than the threshold, determine that the data corresponding to the transaction date is cold data and store it in the full-scale cluster;
and a second storage unit, configured to, if the distance between the current time and the transaction date is less than or equal to the threshold, determine that the data corresponding to the transaction date is hot data and store it in the front-end cluster.
Preferably, the system further comprises:
a first creating unit, configured to create an interface of the current application program based on different tables in the distributed database and different permissions of different users.
Preferably, the system further comprises:
a second creating unit, configured to create tables, indexes and shard structures with consistent structures in the front-end cluster and the full-scale cluster.
Preferably, the system further comprises:
a third storage unit, configured to periodically store the data in the front-end cluster to the full-scale cluster through a script program.
Preferably, the system further comprises:
a clearing unit, configured to periodically clear the expired data in the front-end cluster through a script program according to the custom routing rule.
The beneficial effects of the invention are as follows: by comparing the distance between the current time and the transaction date with a threshold, the current data is judged to be cold data or hot data and is stored in the corresponding cluster of the distributed database, so that hot data and cold data are separated and the load on the distributed database is reduced.
Drawings
FIG. 1 is a flow chart of a mass data processing method based on a preposed distributed database.
FIG. 2 is a functional block diagram of a mass data processing system based on a preposed distributed database.
FIG. 3 is another functional block diagram of a mass data processing system based on a preposed distributed database.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The following detailed description of specific implementations of the present invention is provided in conjunction with specific embodiments:
Example one:
FIG. 1 shows the implementation flow of the mass data processing method based on a preposed distributed database according to the first embodiment of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, which are detailed as follows:
In step S101: acquiring the current time and the transaction date recorded in a distributed database, wherein the distributed database comprises a front-end cluster and a full-scale cluster.
In this embodiment of the application, the current time and the transaction date recorded in the distributed database are obtained. Two clusters are deployed in the distributed database system: a front-end cluster and a full-scale cluster. The front-end cluster stores a small amount of data, and the full-scale cluster stores the full data. The distributed database contains various merchant transaction record tables, and the transaction date and the current time are obtained from these tables, so that the merchant transaction records can be conveniently classified by transaction date.
Preferably, before the acquiring of the current time and the transaction date recorded in the distributed database, an interface of the current application program is created based on different tables in the distributed database and different permissions of different users. Further preferably, before the creating of the interface of the current application program, tables, indexes and shards with consistent structures are created in the front-end cluster and the full-scale cluster.
To keep the data secure and controllable, different interfaces are developed for different tables and for the different permissions of different users, and these interfaces provide services externally.
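As an illustration only, the per-table, per-user access control described above could be sketched as a permission lookup placed in front of each interface. The user names, the operation names and the `PERMISSIONS` structure are hypothetical and do not appear in the source.

```python
# Hypothetical permission table: (user, table) -> set of allowed operations.
# The users and operation names are assumptions made for this sketch.
PERMISSIONS = {
    ("alice", "TRADE_DETAIL"): {"read", "write"},
    ("bob", "TRADE_DETAIL"): {"read"},
}


def is_allowed(user: str, table: str, operation: str) -> bool:
    """Return True only if the user holds the operation on the given table."""
    return operation in PERMISSIONS.get((user, table), set())
```

An interface for a given table would call `is_allowed` before forwarding the request to either cluster, so that unknown users or tables are rejected by default.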
The MongoDB cluster M1 serves as the front-end distributed storage platform and stores the data of the most recent 10 days; cluster M2 serves as the full distributed storage platform and stores the full data. For example, a TRADE_DETAIL table is created in the ODS database of both the M1 and M2 clusters, a compound index on the date + sort fields is created, and date + sort is used as the sharding (distribution) rule.
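A minimal sketch of how the same index and shard-key definitions might be prepared for both clusters follows. The field names `date` and `sort`, the database `ODS` and the table `TRADE_DETAIL` come from the text; the helper name `build_schema_spec` and the ascending sort order are assumptions. The code only builds the specification documents and does not connect to any cluster.

```python
DATABASE = "ODS"
COLLECTION = "TRADE_DETAIL"


def build_schema_spec(database: str = DATABASE, collection: str = COLLECTION) -> dict:
    """Return the compound index keys and a shardCollection command document."""
    # Compound index on date + sort; ascending order is an assumption.
    index_keys = [("date", 1), ("sort", 1)]
    # MongoDB's shardCollection admin command takes a namespace and a shard key.
    shard_command = {
        "shardCollection": f"{database}.{collection}",
        "key": {"date": 1, "sort": 1},
    }
    return {"index_keys": index_keys, "shard_command": shard_command}


# Building the spec once per cluster keeps M1 and M2 structurally identical.
specs = {cluster: build_schema_spec() for cluster in ("M1", "M2")}
```

With a MongoDB driver, the index keys would typically be passed to an index-creation call and the command document to an admin command on each cluster; the exact deployment steps are not described in the source.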
In step S102: comparing the distance between the current time and the transaction date with a threshold.
In this embodiment of the application, the distance between the acquired current time and the transaction date recorded in the distributed database is compared with the threshold. The threshold is generally set to 10 days according to actual business requirements, so the distance between the transaction date and the current time is compared with 10 working days. The type of the data is then judged from the comparison result, so that the data can be conveniently stored in different storage platforms according to its type.
In step S103: if the distance between the current time and the transaction date is greater than the threshold, the data corresponding to the transaction date is cold data and is stored in the full-scale cluster.
In this embodiment of the application, if the distance between the current time and the transaction date is greater than the threshold, the data corresponding to the transaction date is cold data. When the data is judged to be cold data, the merchant transaction record table corresponding to the transaction date is stored in the full-scale cluster.
In step S104: if the distance between the current time and the transaction date is less than or equal to the threshold, the data corresponding to the transaction date is hot data and is stored in the front-end cluster.
In this embodiment of the application, when the distance between the current time and the transaction date is less than or equal to the threshold, the data corresponding to the transaction date is hot data, and the merchant transaction record table corresponding to the transaction date is stored in the front-end cluster. The front-end cluster stores only the small amount of data from the last 10 days, so its response speed is high, which facilitates insert, delete, update and query operations on transaction detail data. According to the actual business situation, operations are routed to different distributed storage platforms by a custom rule. Because most insert, delete, update and query operations on transaction detail data concern the data of the last few days, date is used as the routing rule: the front-end cluster M1 stores the data of the last 10 days, and the full-scale cluster M2 stores the full data. Operations on data within the last 10 days (current day - 10 < date) are routed to the front-end cluster, and operations on data older than 10 days (date < current day - 10) are routed to the M2 full-scale cluster. This relieves the impact and pressure of highly concurrent mass data on the database, reduces the database load, and improves query efficiency.
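Steps S102 to S104 can be sketched as a small classification and routing function. This is a sketch under assumptions: the distance is measured in calendar days, the function and constant names are hypothetical, and the boundary day (distance exactly equal to the threshold) is treated as hot, following the "less than or equal to" wording of step S104.

```python
from datetime import date

THRESHOLD_DAYS = 10   # threshold named in the text; set by business requirements
FRONT_CLUSTER = "M1"  # front-end cluster holding recent (hot) data
FULL_CLUSTER = "M2"   # full-scale cluster holding all data


def classify(transaction_date: date, today: date,
             threshold_days: int = THRESHOLD_DAYS) -> str:
    """Return "cold" if the record is older than the threshold, else "hot"."""
    distance = (today - transaction_date).days
    return "cold" if distance > threshold_days else "hot"


def route(transaction_date: date, today: date,
          threshold_days: int = THRESHOLD_DAYS) -> str:
    """Pick the cluster that should serve an operation on this transaction date."""
    if classify(transaction_date, today, threshold_days) == "hot":
        return FRONT_CLUSTER
    return FULL_CLUSTER
```

For example, with `today = date(2020, 7, 21)`, a transaction dated 2020-07-15 is routed to M1, while one dated 2020-07-10 (11 days old) is routed to M2.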
The M2 full-scale cluster stores the full data to improve fault tolerance: if the front-end cluster has a problem, the full-scale cluster already holds most of the front-end cluster's data, which reduces the impact.
Preferably, after the data corresponding to the transaction date is determined to be hot data and is stored in the front-end cluster because the distance between the current time and the transaction date is less than or equal to the threshold, the data in the front-end cluster is periodically stored to the full-scale cluster through a script program. Further preferably, the expired data in the front-end cluster is periodically cleared through a script program according to the custom routing rule. This ensures high availability of the system: when the front-end cluster is being upgraded or behaves abnormally, the full storage platform can be brought in at any time to provide services externally.
A timing script imports the data in the front-end cluster M1 into cluster M2 in batches at a fixed time every day. Because the data undergoes a large number of update operations, the front-end cluster is taken as the authoritative source, and its records are compared with those of the full-scale cluster by primary key and last update time: if a record does not exist in the full-scale cluster, it is inserted directly; if the record has been updated, the copy in the full-scale cluster is updated directly. The front-end cluster improves the efficiency of data access, and the full-scale cluster improves the accuracy of data access.
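The comparison by primary key and last update time can be illustrated with an in-memory sketch. Both clusters are modelled as plain dictionaries keyed by primary key, and the field name `last_update` and the function name `sync_to_full` are assumptions; the real script would run against the two MongoDB clusters.

```python
from datetime import datetime


def sync_to_full(front: dict, full: dict) -> dict:
    """Copy new or newer records from the front-end store into the full store.

    Each store maps primary_key -> record, and each record carries a
    "last_update" datetime. Returns how many records were inserted or updated.
    """
    stats = {"inserted": 0, "updated": 0}
    for key, record in front.items():
        existing = full.get(key)
        if existing is None:
            # Record is missing from the full store: insert it directly.
            full[key] = dict(record)
            stats["inserted"] += 1
        elif record["last_update"] > existing["last_update"]:
            # Front-end copy is newer: overwrite the full store's copy.
            full[key] = dict(record)
            stats["updated"] += 1
    return stats
```

Records whose last update time is unchanged are skipped, so running the job repeatedly does not rewrite data that is already in sync.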
The custom routing rule may use date as the determination field: data older than 10 days (date < current day - 10) is periodically cleared from the front-end cluster M1, so that the TRADE_DETAIL table of the front-end cluster always holds only the data of the last 10 days.
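The clearing rule can be sketched in the same in-memory style. The function name `purge_expired` and the record layout are assumptions, and the cutoff uses the strict comparison `date < current day - 10` given for data older than 10 days.

```python
from datetime import date, timedelta


def purge_expired(front: dict, today: date, keep_days: int = 10) -> int:
    """Delete records dated earlier than today - keep_days; return the count removed."""
    cutoff = today - timedelta(days=keep_days)
    # Collect keys first so the dict is not modified while being iterated.
    expired = [key for key, record in front.items() if record["date"] < cutoff]
    for key in expired:
        del front[key]
    return len(expired)
```

In practice such a purge would run only after the daily import into the full-scale cluster has completed, so that no record is removed from the front-end cluster before it exists in M2.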
It will be understood by those skilled in the art that all or part of the steps in the method for implementing the above embodiments may be implemented by relevant hardware instructed by a program, and the program may be stored in a computer-readable storage medium, such as ROM/RAM, magnetic disk, optical disk, etc.
Example two:
FIG. 2 shows the structure of a mass data processing system based on a preposed distributed database according to the second embodiment of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, which are detailed as follows:
an obtaining unit 201, configured to obtain the current time and the transaction date recorded in a distributed database, where the distributed database includes a front-end cluster and a full-scale cluster;
a comparing unit 202, configured to compare the distance between the current time and the transaction date with a threshold;
a first storage unit 203, configured to, if the distance between the current time and the transaction date is greater than the threshold, determine that the data corresponding to the transaction date is cold data, and store the data corresponding to the transaction date in the full-scale cluster;
a second storage unit 204, configured to, if the distance between the current time and the transaction date is less than or equal to the threshold, determine that the data corresponding to the transaction date is hot data, and store the data corresponding to the transaction date in the front-end cluster.
In this embodiment of the invention, the current time and the transaction date recorded in a distributed database are obtained, where the distributed database includes a front-end cluster and a full-scale cluster, and the distance between the current time and the transaction date is compared with a threshold. If the distance is greater than the threshold, the data corresponding to the transaction date is cold data and is stored in the full-scale cluster; if the distance is less than or equal to the threshold, the data corresponding to the transaction date is hot data and is stored in the front-end cluster. Hot data and cold data are thereby conveniently separated, and the load on the distributed database is reduced. For the detailed implementation of each unit, refer to the description of the first embodiment, which is not repeated here.
Example three:
FIG. 3 shows another structure of a mass data processing system based on a preposed distributed database according to the third embodiment of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, which include:
a first creating unit 301, configured to create an interface of the current application program based on different tables in the distributed database and different permissions of different users;
a second creating unit 302, configured to create tables, indexes, and shard structures with consistent structures in the front-end cluster and the full-scale cluster;
an obtaining unit 303, configured to obtain the current time and the transaction date recorded in a distributed database, where the distributed database includes a front-end cluster and a full-scale cluster;
a comparison unit 304, configured to compare the distance between the current time and the transaction date with a threshold;
a first storage unit 305, configured to, if the distance between the current time and the transaction date is greater than the threshold, determine that the data corresponding to the transaction date is cold data, and store the data corresponding to the transaction date in the full-scale cluster;
a second storage unit 305, configured to, if a distance between the current time and the transaction date is less than or equal to a threshold, determine that data corresponding to the transaction date is hot data, and store the data corresponding to the transaction date in the pre-cluster;
a third storage unit 306, configured to periodically store the data in the front-end cluster to the full-scale cluster through a script program;
and a clearing unit 307, configured to periodically clear the expired data in the front-end cluster through a script program according to the custom routing rule.
In this embodiment of the present invention, an interface of the current application program is created based on different tables in the distributed database and different permissions of different users, and tables, indexes, and shard structures with consistent structures are created in the front-end cluster and the full-scale cluster. The current time and the transaction date recorded in the distributed database are obtained, where the distributed database includes the front-end cluster and the full-scale cluster, and the distance between the current time and the transaction date is compared with a threshold. If the distance is greater than the threshold, the data corresponding to the transaction date is cold data and is stored in the full-scale cluster; if the distance is less than or equal to the threshold, the data corresponding to the transaction date is hot data and is stored in the front-end cluster. The data in the front-end cluster is periodically stored to the full-scale cluster through a script program, and the expired data in the front-end cluster is periodically cleared through a script program according to a custom routing rule, so that hot data and cold data are separated and the efficiency of querying data is improved.
In this embodiment of the present invention, the mass data processing based on the preposed distributed database may be implemented by corresponding hardware or software units; each unit may be an independent software or hardware unit, or the units may be integrated into a single software or hardware unit, which is not limited herein.

Claims (10)

1. A mass data processing method based on a preposed distributed database, characterized by comprising the following steps:
step one: acquiring the current time and the transaction date recorded in a distributed database, wherein the distributed database comprises a front-end cluster and a full-scale cluster;
step two: comparing the distance between the current time and the transaction date with a threshold;
step three: if the distance between the current time and the transaction date is greater than the threshold, the data corresponding to the transaction date is cold data and is stored in the full-scale cluster;
step four: if the distance between the current time and the transaction date is less than or equal to the threshold, the data corresponding to the transaction date is hot data and is stored in the front-end cluster.
2. The mass data processing method based on the preposed distributed database according to claim 1, wherein before the acquiring of the current time and the transaction date recorded in the distributed database, the method further comprises:
creating an interface of the current application program based on different tables in the distributed database and different permissions of different users.
3. The mass data processing method based on the preposed distributed database according to claim 2, wherein before the creating of the interface of the current application program, the method further comprises:
creating tables, indexes and shard structures with consistent structures in the front-end cluster and the full-scale cluster.
4. The mass data processing method based on the preposed distributed database according to claim 1, wherein after the data corresponding to the transaction date is determined to be hot data and is stored in the front-end cluster because the distance between the current time and the transaction date is less than or equal to the threshold, the method further comprises:
periodically storing the data in the front-end cluster to the full-scale cluster through a script program.
5. The mass data processing method based on the preposed distributed database according to claim 4, wherein after the data in the front-end cluster is periodically stored to the full-scale cluster through the script program, the method further comprises:
periodically clearing the expired data in the front-end cluster through a script program according to a custom routing rule.
6. A mass data processing system based on a preposed distributed database, the system comprising:
an acquisition unit, configured to acquire the current time and the transaction date recorded in a distributed database, wherein the distributed database comprises a front-end cluster and a full-scale cluster;
a comparison unit, configured to compare the distance between the current time and the transaction date with a threshold;
a first storage unit, configured to, if the distance between the current time and the transaction date is greater than the threshold, determine that the data corresponding to the transaction date is cold data and store it in the full-scale cluster;
and a second storage unit, configured to, if the distance between the current time and the transaction date is less than or equal to the threshold, determine that the data corresponding to the transaction date is hot data and store it in the front-end cluster.
7. The mass data processing system based on the preposed distributed database according to claim 6, further comprising:
a first creating unit, configured to create an interface of the current application program based on different tables in the distributed database and different permissions of different users.
8. The mass data processing system based on the preposed distributed database according to claim 7, further comprising:
a second creating unit, configured to create tables, indexes and shard structures with consistent structures in the front-end cluster and the full-scale cluster.
9. The mass data processing system based on the preposed distributed database according to claim 8, further comprising:
a third storage unit, configured to periodically store the data in the front-end cluster to the full-scale cluster through a script program.
10. The mass data processing system based on the preposed distributed database according to claim 7, further comprising:
a clearing unit, configured to periodically clear the expired data in the front-end cluster through a script program according to the custom routing rule.
CN202010703239.0A 2020-07-21 2020-07-21 Mass data processing method and system based on preposed distributed database Pending CN111930837A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010703239.0A CN111930837A (en) 2020-07-21 2020-07-21 Mass data processing method and system based on preposed distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010703239.0A CN111930837A (en) 2020-07-21 2020-07-21 Mass data processing method and system based on preposed distributed database

Publications (1)

Publication Number Publication Date
CN111930837A (en) 2020-11-13

Family

ID=73312686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010703239.0A Pending CN111930837A (en) 2020-07-21 2020-07-21 Mass data processing method and system based on preposed distributed database

Country Status (1)

Country Link
CN (1) CN111930837A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104731794A (en) * 2013-12-19 2015-06-24 北京华易互动科技有限公司 Cold-hot data fragmenting, mining and storing method
CN108197289A (en) * 2018-01-18 2018-06-22 吉浦斯信息咨询(深圳)有限公司 A kind of data store organisation, data store query method, terminal and medium
CN108363813A (en) * 2018-03-15 2018-08-03 北京小度信息科技有限公司 Date storage method, device and system
CN109344198A (en) * 2018-09-19 2019-02-15 国网浙江省电力有限公司嘉兴供电公司 Log system and sharding method based on MongoDB distributed type assemblies framework
CN109656978A (en) * 2018-12-24 2019-04-19 泰华智慧产业集团股份有限公司 The optimization method of near real-time search service

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BRUCE: "大数据的一生一世——谈数据冷热分离技术", 《知乎》, 14 June 2020 (2020-06-14), pages 1 *

Similar Documents

Publication Publication Date Title
CN110489445B (en) Rapid mass data query method based on polymorphic composition
CN107818115B (en) Method and device for processing data table
CN109284293B (en) Data migration method for upgrading business charging system of water business company
US6161109A (en) Accumulating changes in a database management system by copying the data object to the image copy if the data object identifier of the data object is greater than the image identifier of the image copy
EP3876105A1 (en) Efficient methods and systems for consistent read in record-based multi-version concurrency control
CN106557578B (en) Historical data query method and system
CN104536904A (en) Data management method, equipment and system
CN103646111A (en) System and method for realizing real-time data association in big data environment
CA3176450A1 (en) Method and apparatus for implementing incremental data consistency
CN106682148A (en) Method and device based on Solr data search
CN110096509A (en) Realize that historical data draws the system and method for storage of linked list modeling processing under big data environment
CN102968456B (en) A kind of raster data reading and processing method and device
CN110175206A (en) Intellectual analysis operational approach, system and medium for multiple database separation
CN112269802A (en) Method and system for frequent deletion, modification and check optimization based on Clickhouse
CN113495872A (en) Transaction processing method and system in distributed database
US7136861B1 (en) Method and system for multiple function database indexing
CN114153809A (en) Parallel real-time incremental statistic method based on database logs
CN110489490A (en) Data storage and query method based on distributed data base
CN113934713A (en) Order data indexing method, system, computer equipment and storage medium
CN116450607A (en) Data processing method, device and storage medium
CN111930837A (en) Mass data processing method and system based on preposed distributed database
US11068451B2 (en) Database column refresh via replacement
US8290935B1 (en) Method and system for optimizing database system queries
CN113934797B (en) Banking industry super-large data synchronization method and system
CN113360551B (en) Method and system for storing and rapidly counting time sequence data in shooting range

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination