CN117389475A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN117389475A
Authority
CN
China
Prior art keywords
target
merging
identifier
task
disk file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311412356.1A
Other languages
Chinese (zh)
Inventor
张广超
张成远
刘欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jingdong Technology Information Technology Co Ltd
Original Assignee
Jingdong Technology Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jingdong Technology Information Technology Co Ltd filed Critical Jingdong Technology Information Technology Co Ltd
Priority to CN202311412356.1A priority Critical patent/CN117389475A/en
Publication of CN117389475A publication Critical patent/CN117389475A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0602Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608Saving storage space on storage systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638Organizing or formatting or addressing of data
    • G06F3/0643Management of files
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0668Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and device, and relates to the technical field of computers. One embodiment of the method comprises the following steps: acquiring the shard information of each shard on a current node, wherein the shard information comprises a shard identifier; writing the data corresponding to each shard identifier into one or more disk files; in response to receiving a merging task, determining a target shard identifier corresponding to the merging task; and executing, according to the target shard identifier, the merging task on the target disk files corresponding to the target shard identifier so as to update the target disk files. In this embodiment, the data on the disk is merged according to shard identifiers, so that read-write amplification can be reduced and disk resources are saved.

Description

Data processing method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for data processing.
Background
TiKV is a distributed, horizontally-scalable transactional key-value database, and uses RocksDB as the underlying storage engine. When TiKV is used for data storage, data are sliced, each slice is provided with a plurality of copies, the slices and the copies are scattered on each TiKV node, and all sliced data of the same TiKV node are stored on the same RocksDB.
When RocksDB writes data to disk, it executes merging logic on data of different levels. As a result, data that does not need to be merged is nevertheless involved in merging, which causes read-write amplification and wastes disk resources.
Disclosure of Invention
In view of this, the embodiments of the present invention provide a method and an apparatus for processing data, which can combine data in a disk according to a fragment identifier, so as to reduce read-write amplification and save disk resources.
To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of data processing, including:
acquiring each piece of slicing information on a current node, wherein the slicing information comprises slicing identifiers;
writing data corresponding to the fragment identification into one or more disk files;
in response to receiving a merge task, determining a target fragment identifier corresponding to the merge task;
and executing a merging task of the target disk file corresponding to the target fragment identifier according to the target fragment identifier so as to update the target disk file.
Optionally, the slicing information includes a key value range corresponding to the slicing identifier, and writing data corresponding to the slicing identifier into one or more disk files includes:
acquiring data corresponding to the fragment identifier according to the key value range;
writing the data into one or more disk files.
Optionally, determining the target fragment identifier corresponding to the merging task includes:
acquiring a target key value range corresponding to the merging task;
and taking the slice identifier corresponding to the target key value range as the target slice identifier.
Optionally, determining the target fragment identifier corresponding to the merging task includes:
acquiring a target key value range corresponding to the merging task;
dividing the target key value range according to the key value range of each fragment mark to obtain a plurality of sub-key value ranges;
and taking the fragment identifier corresponding to each sub-key value range as the target fragment identifier.
Optionally, executing the merging task of the target disk file corresponding to the target fragment identifier according to the target fragment identifier, including:
creating sub-merging tasks corresponding to each target fragment identifier to obtain a plurality of sub-merging tasks corresponding to the merging tasks;
and executing the merging of the target disk files corresponding to each target fragment identifier aiming at each sub-merging task so as to update the target disk files.
Optionally, each disk file has a corresponding hierarchy, and executing, according to the target shard identifier, a merging task of the target disk file corresponding to the target shard identifier, including:
and merging the target disk files according to the target fragment identification and the hierarchical sequence of the target disk files.
Optionally, merging the target disk file according to the target fragment identifier and the hierarchical order of the target disk file includes:
when the merging in the L0 layer is executed, merging all target disk files in the L0 layer to obtain updated target disk files in the L0 layer;
and when the layer-by-layer combination is executed, determining a disk file to be combined according to the updated target disk file in the previous layer, and combining the disk file to be combined with the target disk file in the current layer to obtain the updated target disk file in the current layer.
According to still another aspect of an embodiment of the present invention, there is provided an apparatus for data processing, including:
the acquisition module acquires the shard information of each shard on the current node, wherein the shard information comprises a shard identifier;
the storage module is used for writing the data corresponding to the fragment identification into one or more disk files;
the determining module is used for determining a target fragment identifier corresponding to the merging task in response to receiving the merging task;
and the merging module executes a merging task of the target disk file corresponding to the target fragment identifier according to the target fragment identifier so as to update the target disk file.
According to another aspect of an embodiment of the present invention, there is provided an electronic apparatus including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the methods of data processing provided by the present invention.
According to a further aspect of an embodiment of the present invention, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements the method of data processing provided by the present invention.
One embodiment of the above invention has the following advantages or benefits: the data processing method of the embodiment of the invention first obtains the shard information of each shard on the current node, and then writes the data corresponding to each shard identifier into one or more disk files; after a merging task is received, the target shard identifier corresponding to the merging task is obtained, and the merging task is executed on the target disk files corresponding to the target shard identifier so as to update the target disk files. Because the merging task is executed per shard identifier, only disk files with the same shard identifier are merged with one another, and disk files with different shard identifiers are not. Data that does not need to participate in merging is therefore left out of the merging task, which reduces read-write amplification and saves disk resources.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a method of data processing according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main flow of another method of data processing according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the main flow of a further method of data processing according to an embodiment of the invention;
FIG. 4 is a schematic diagram of the main modules of an apparatus for data processing according to an embodiment of the present invention;
FIG. 5 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 6 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of the main flow of a data processing method according to an embodiment of the present invention, and as shown in fig. 1, the data processing method includes the following steps:
step S101: acquiring each piece of slicing information on the current node, wherein the slicing information comprises slicing marks;
step S102: writing data corresponding to the fragment identification into one or more disk files;
step S103: in response to receiving the merge task, determining a target fragment identifier corresponding to the merge task;
step S104: and executing the merging task of the target disk file corresponding to the target fragment identifier according to the target fragment identifier so as to update the target disk file.
In the embodiment of the invention, the data in TiKV is divided into a plurality of shards (namely Regions) according to key range, and the keys within each shard are ordered from small to large. Each shard has a plurality of copies, and the shards and their copies are evenly distributed across the TiKV nodes. TiKV achieves distributed data consistency through the Raft algorithm, makes full use of the resources of the whole cluster, and can scale horizontally as the number of machines increases. That is, each TiKV node holds multiple shards, and all shards on a TiKV node are stored in the same RocksDB instance, while the log needed for Raft replication is stored in another RocksDB instance. Because random disk writes perform much worse than sequential writes, TiKV stores the data of all shards in the same RocksDB instance so that writes to different Regions can be consolidated into one disk write.
In the embodiment of the invention, the shard information of each shard on the current node is acquired first; the shard information includes a shard identifier, namely a region ID. The shard information further includes a key value range corresponding to the shard identifier, and writing data corresponding to the shard identifier into one or more disk files includes: acquiring the data corresponding to the shard identifier according to the key value range; and writing the data into one or more disk files.
In an embodiment of the present invention, the key value range, i.e., the range of keys, is the range between the start key and the end key. According to the key range, the data corresponding to a shard identifier can be obtained from memory and then written into one or more disk files, namely sst files. The number of disk files needed for the data can be determined from the space occupied by the data and the size of a disk file, which determines the disk files corresponding to each shard identifier. Data of different shard identifiers is written into different sst files; that is, each sst file corresponds to exactly one shard identifier, and each shard identifier can correspond to one or more sst files. Where multiple sst files belong to the same shard, each sst file may cover part of the key range of the shard, and the key ranges of these sst files may overlap.
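The per-shard write described above can be sketched roughly as follows. This is an illustrative Python sketch, not the patented implementation: the names `Shard`, `SstFile` and `flush_shard` are hypothetical, the key range is assumed to be half-open (start key inclusive, end key exclusive), and a maximum entry count per file stands in for the file-size limit mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    shard_id: int
    start_key: str  # assumed inclusive
    end_key: str    # assumed exclusive


@dataclass
class SstFile:
    shard_id: int   # every file belongs to exactly one shard
    level: int
    entries: dict


def flush_shard(memtable, shard, max_entries_per_file):
    """Write the in-memory data of one shard into one or more L0 files."""
    keys = sorted(k for k in memtable if shard.start_key <= k < shard.end_key)
    files = []
    # Entry count is a stand-in for "space occupied by the data vs. file size".
    for i in range(0, len(keys), max_entries_per_file):
        chunk = keys[i:i + max_entries_per_file]
        files.append(SstFile(shard.shard_id, 0, {k: memtable[k] for k in chunk}))
    return files


memtable = {"a1": "v1", "b2": "v2", "b7": "v3", "c5": "v4", "d9": "v5"}
region1 = Shard(1, "a", "c")
region2 = Shard(2, "c", "e")
files_r1 = flush_shard(memtable, region1, max_entries_per_file=2)
files_r2 = flush_shard(memtable, region2, max_entries_per_file=2)
```

In this sketch no file ever mixes keys from two shards, which is the property the later per-shard merging relies on.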
In the embodiment of the invention, in order to save disk resources, the merging or compression of data, namely compaction, can be triggered manually or automatically so as to reduce the impact of space amplification and read amplification. The merging task can be triggered automatically on a timer. After the merging task is received, the target shard identifier corresponding to the merging task is determined so that the merging task can be executed according to the target shard identifier.
In the embodiment of the invention, determining the target fragment identifier corresponding to the merging task comprises the following steps: obtaining a target key value range corresponding to the merging task; and taking the slice identifier corresponding to the target key value range as a target slice identifier. Namely, a target key value range is obtained from the merging task, a fragment identifier corresponding to the target key value range is determined, and if the target key value range is within the key value range of one of the fragment identifiers, the fragment identifier is used as the target fragment identifier so as to execute the merging task.
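As a rough illustration of this lookup, the following Python sketch returns the shard whose key value range fully contains the target key value range. The function name `find_enclosing_shard` and the dictionary layout (shard identifier mapped to a start key and end key) are assumptions made for the example only.

```python
def find_enclosing_shard(shards, target_start, target_end):
    """Return the id of the shard whose key range contains the whole target range."""
    for shard_id, (start, end) in shards.items():
        if start <= target_start and target_end <= end:
            return shard_id
    return None  # the target range is not contained in any single shard


# Hypothetical shard layout: shard id -> (start key, end key)
shards = {1: ("a", "c"), 2: ("c", "e"), 3: ("e", "z")}

single = find_enclosing_shard(shards, "c1", "d")   # lies inside shard 2
spanning = find_enclosing_shard(shards, "b", "d")  # crosses a shard boundary
```

When the lookup returns `None`, the target range spans several shards and has to be split at the shard boundaries instead.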
In an embodiment of the present invention, as shown in fig. 2, determining a target shard identifier corresponding to a merging task includes:
step S201: obtaining a target key value range corresponding to the merging task;
step S202: dividing a target key value range according to the key value range of each fragment mark to obtain a plurality of sub-key value ranges;
step S203: and taking the slice identifier corresponding to each sub-key value range as a target slice identifier.
In the embodiment of the invention, a target key value range is obtained from the merging task, and the shard identifiers corresponding to the target key value range are determined. If the target key value range corresponds to a plurality of shard identifiers, it can be divided into a plurality of sub-key value ranges at the boundary keys of the shards, namely the start key and the end key of the key value range corresponding to each shard identifier, so that each sub-key value range corresponds to one shard identifier. The shard identifier corresponding to each sub-key value range is used as a target shard identifier; that is, the merging task can indicate that the data of a plurality of shard identifiers is to be merged. For example, if the target key value range is [a, z], it can be divided according to the boundary keys of the shard identifiers into three sub-key value ranges [a, c], [c, e] and [e, z], where [a, c] corresponds to region1, [c, e] corresponds to region2, and [e, z] corresponds to region3.
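The division at shard boundary keys can be sketched as an interval intersection. The Python below is a minimal sketch under assumed names (`split_by_shards`, a dictionary from shard identifier to start and end key); it is not taken from the patent and ignores details such as open versus closed range ends.

```python
def split_by_shards(shards, target_start, target_end):
    """Split a target key range at shard boundary keys.

    Returns one (shard_id, sub_start, sub_end) tuple per overlapping shard.
    """
    pieces = []
    for shard_id, (start, end) in sorted(shards.items(), key=lambda kv: kv[1][0]):
        sub_start = max(start, target_start)
        sub_end = min(end, target_end)
        if sub_start < sub_end:  # the shard overlaps the target range
            pieces.append((shard_id, sub_start, sub_end))
    return pieces


# Hypothetical layout matching region1, region2 and region3 in the text
shards = {1: ("a", "c"), 2: ("c", "e"), 3: ("e", "z")}

full = split_by_shards(shards, "a", "z")
partial = split_by_shards(shards, "b", "d")
```

Each returned tuple names one target shard identifier together with the part of the target range that falls inside that shard.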
In the embodiment of the present invention, each disk file has a corresponding hierarchy, and executing a merging task of a target disk file corresponding to a target fragment identifier according to the target fragment identifier, including:
and merging the target disk files according to the target fragment identification and the hierarchical sequence of the target disk files.
In the embodiment of the invention, after the target shard identifier corresponding to the merging task is determined, the merging task can be executed according to it: the target disk files corresponding to the target shard identifier are determined, and, since each disk file has a corresponding level on the disk, the target disk files are merged in level order when merging is executed. The disk is thus updated level by level, which releases disk resources. The levels include L0, L1, L2 and so on. When merging is performed, merging within the L0 layer is performed first, followed by layer-by-layer merging, i.e., merging of L0 into L1, L1 into L2, L2 into L3 and so on, until the last level.
In an embodiment of the present invention, as shown in fig. 3, executing, according to a target fragment identifier, a merging task of a target disk file corresponding to the target fragment identifier, including:
step S301: creating sub-merging tasks corresponding to each target fragment identifier to obtain a plurality of sub-merging tasks corresponding to the merging tasks;
step S302: and executing the merging of the target disk files corresponding to each target fragment identifier aiming at each sub-merging task so as to update the target disk files.
In the embodiment of the invention, when there are a plurality of target shard identifiers, that is, when the merging task covers a plurality of shard identifiers, the merging task can be divided into a plurality of sub-merging tasks: for each target shard identifier, a sub-merging task corresponding to that identifier is created, and the sub-merging tasks are then executed separately. When a sub-merging task is executed, the target disk files corresponding to its target shard identifier are determined and merged, and the target disk files are updated; executing all sub-merging tasks completes the merging task.
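A minimal Python sketch of this splitting is given below. The helper names `create_sub_tasks` and `run_sub_task`, the tuple layout of a file, and the use of a sequence number to decide which value is newer are all assumptions made for illustration; the source does not specify how versions are ordered.

```python
def create_sub_tasks(target_shard_ids):
    """Create one sub-merging task per target shard identifier."""
    return [{"shard_id": sid} for sid in target_shard_ids]


def run_sub_task(task, files):
    """Merge only the files that carry the task's shard identifier.

    Each file is a (shard_id, seq, entries) tuple; a higher seq is assumed newer.
    """
    own = sorted((f for f in files if f[0] == task["shard_id"]), key=lambda f: f[1])
    merged = {}
    for _, _, entries in own:
        merged.update(entries)  # newer files override older values
    return merged


files = [
    (1, 1, {"a1": "old"}),
    (2, 2, {"c1": "x"}),
    (1, 3, {"a1": "new", "b1": "y"}),
]
results = {t["shard_id"]: run_sub_task(t, files) for t in create_sub_tasks([1, 2])}
```

Because each sub-task touches only the files of its own shard, the sub-tasks are independent of one another in this sketch.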
In the embodiment of the invention, merging the target disk file according to the target fragment identification and the hierarchical sequence of the target disk file comprises the following steps:
when the merging in the L0 layer is executed, merging all target disk files in the L0 layer to obtain updated target disk files in the L0 layer;
and when the layer-by-layer combination is executed, determining a disk file to be combined according to the updated target disk file in the previous layer, and combining the disk file to be combined with the target disk file in the current layer to obtain the updated target disk file in the current layer.
In the embodiment of the invention, if there is one target shard identifier, the merging task is executed according to that target shard identifier; if there are a plurality of target shard identifiers, the merging task is divided into a plurality of sub-merging tasks, one per target shard identifier, and each sub-merging task is executed so as to carry out the merging task.
In the embodiment of the invention, when a merging task is executed, for each target shard identifier, the target disk files corresponding to that identifier are first obtained from the L0 layer: all sst files in the L0 layer are traversed, the sst files belonging to the same target shard identifier are collected, and these sst files are merged to obtain one or more updated target disk files in the L0 layer. Layer-by-layer merging is then performed: a disk file to be merged is determined, according to a preset rule, from the updated target disk files in the previous layer (Ln, where n is an integer greater than or equal to 0), and the disk file to be merged is merged with the target disk files in the current layer (Ln+1) to obtain the updated target disk files in the current layer. Merging proceeds in this way from L0 to L1, from L1 to L2, from L2 to L3, and so on. For example, when merging from L0 to L1, the disk file to be merged is determined from the updated target disk files in the L0 layer and merged with the target disk files in the L1 layer to obtain the updated target disk files in the L1 layer.
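The two phases, merging within L0 and then merging layer by layer, can be sketched as follows. This Python sketch makes several simplifying assumptions that are not in the source: files are plain dictionaries with hypothetical `shard`, `seq` and `entries` fields, the L0 merge yields a single updated file (so no selection rule is needed), a higher `seq` means newer data, and data in an upper level is newer than data in a lower level.

```python
def merge_entries(files):
    """Merge files oldest first so that newer sequence numbers win."""
    merged = {}
    for f in sorted(files, key=lambda f: f["seq"]):
        merged.update(f["entries"])
    return merged


def compact_shard(levels, shard_id):
    """Merge one shard's files within L0, then push the result down level by level."""
    own_l0 = [f for f in levels[0] if f["shard"] == shard_id]
    if not own_l0:
        return
    # Phase 1: merge all L0 files of this shard into one updated L0 file.
    carry = {"shard": shard_id,
             "seq": max(f["seq"] for f in own_l0),
             "entries": merge_entries(own_l0)}
    levels[0] = [f for f in levels[0] if f["shard"] != shard_id] + [carry]
    # Phase 2: merge the updated file of Ln with this shard's files in Ln+1.
    for n in range(len(levels) - 1):
        own_next = [f for f in levels[n + 1] if f["shard"] == shard_id]
        entries = merge_entries(own_next)
        entries.update(carry["entries"])  # upper-level data is assumed newer
        merged = {"shard": shard_id, "seq": carry["seq"], "entries": entries}
        levels[n] = [f for f in levels[n] if f is not carry]
        levels[n + 1] = [f for f in levels[n + 1] if f["shard"] != shard_id] + [merged]
        carry = merged


levels = [
    [{"shard": 1, "seq": 3, "entries": {"a1": "v3"}},
     {"shard": 1, "seq": 4, "entries": {"b1": "v4"}},
     {"shard": 2, "seq": 5, "entries": {"c1": "v5"}}],
    [{"shard": 1, "seq": 1, "entries": {"a1": "v1", "a2": "v1"}},
     {"shard": 2, "seq": 2, "entries": {"d1": "v2"}}],
    [],
]
compact_shard(levels, 1)
```

After the call, the files of shard 2 are untouched in L0 and L1, while all data of shard 1 ends up in a single file in the last level.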
In the embodiment of the invention, when layer-by-layer merging is performed, the preset rule may be a priority of the target disk files, with the disk file to be merged selected in order of priority from high to low. For example, when merging from the L0 layer to the L1 layer, the updated target disk file that occupies the most storage space may be selected as the disk file to be merged. The priority rule may also be user-defined: for example, one disk file to be merged may be selected at random from the updated target disk files, or the updated target disk file containing the most preset operations may be selected, where the preset operation may be a deletion operation (delete), so that read amplification can be reduced.
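The three example rules can be expressed as interchangeable selection functions. The Python below is only a sketch: the field names `size_bytes` and `num_deletes` and the file names are hypothetical and stand for whatever per-file statistics an implementation keeps.

```python
import random


def pick_largest(files):
    """Select the file that occupies the most storage space."""
    return max(files, key=lambda f: f["size_bytes"])


def pick_most_deletes(files):
    """Select the file that contains the most deletion operations."""
    return max(files, key=lambda f: f["num_deletes"])


def pick_random(files, rng=random):
    """Select one file at random."""
    return rng.choice(files)


# Hypothetical statistics for the updated target disk files of one shard
candidates = [
    {"name": "a.sst", "size_bytes": 10_000, "num_deletes": 3},
    {"name": "b.sst", "size_bytes": 64_000, "num_deletes": 1},
    {"name": "c.sst", "size_bytes": 32_000, "num_deletes": 9},
]

largest = pick_largest(candidates)
most_deletes = pick_most_deletes(candidates)
any_file = pick_random(candidates)
```

Passing the chosen function as a parameter to the merging routine would be one way to make the preset rule configurable.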
The data processing method of the embodiment of the invention first obtains the shard information of each shard on the current node, and then writes the data corresponding to each shard identifier into one or more disk files; after a merging task is received, the target shard identifier corresponding to the merging task is obtained, and the merging task is executed on the target disk files corresponding to the target shard identifier so as to update the target disk files. Because the merging task is executed per shard identifier, only disk files with the same shard identifier are merged with one another, and disk files with different shard identifiers are not. Data that does not need to participate in merging is therefore left out of the merging task, which reduces read-write amplification and saves disk resources. When the merging task indicates a plurality of shard identifiers, the merging task can be split into a plurality of sub-merging tasks according to the key value ranges of the shard identifiers, and each sub-merging task is executed for its own shard identifier, which likewise reduces read-write amplification and saves disk resources.
According to yet another aspect of an embodiment of the present invention, as shown in fig. 4, there is provided an apparatus 400 for data processing, including:
the acquisition module 401 acquires each piece of slicing information on the current node, wherein the slicing information comprises slicing identifiers;
a storage module 402 for writing data corresponding to the fragment identifier into one or more disk files;
a determining module 403, configured to determine, in response to receiving the merge task, a target fragment identifier corresponding to the merge task;
and the merging module 404 executes the merging task of the target disk file corresponding to the target fragment identifier according to the target fragment identifier so as to update the target disk file.
In an embodiment of the present invention, the shard information includes a key value range corresponding to the shard identifier, and the storage module 402 is further configured to: acquire the data corresponding to the shard identifier according to the key value range; and write the data into one or more disk files.
In an embodiment of the present invention, the determining module 403 is further configured to: obtaining a target key value range corresponding to the merging task; and taking the slice identifier corresponding to the target key value range as a target slice identifier.
In an embodiment of the present invention, the determining module 403 is further configured to: obtaining a target key value range corresponding to the merging task; dividing a target key value range according to the key value range of each fragment mark to obtain a plurality of sub-key value ranges; and taking the slice identifier corresponding to each sub-key value range as a target slice identifier.
In an embodiment of the present invention, the merging module 404 is further configured to: creating sub-merging tasks corresponding to each target fragment identifier to obtain a plurality of sub-merging tasks corresponding to the merging tasks; and executing the merging of the target disk files corresponding to each target fragment identifier aiming at each sub-merging task so as to update the target disk files.
In an embodiment of the present invention, each disk file has a corresponding hierarchy, and the merging module 404 is further configured to: and merging the target disk files according to the target fragment identification and the hierarchical sequence of the target disk files.
In an embodiment of the present invention, the merging module 404 is further configured to: when the merging in the L0 layer is executed, merging all target disk files in the L0 layer to obtain updated target disk files in the L0 layer; and when the layer-by-layer combination is executed, determining a disk file to be combined according to the updated target disk file in the previous layer, and combining the disk file to be combined with the target disk file in the current layer to obtain the updated target disk file in the current layer.
According to another aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method for data processing provided by the present invention.
According to a further aspect of an embodiment of the present invention, there is provided a computer readable medium having stored thereon a computer program which when executed by a processor implements the method of data processing provided by the present invention.
Fig. 5 illustrates an exemplary system architecture 500 of a data processing method or apparatus to which embodiments of the present invention may be applied.
As shown in fig. 5, the system architecture 500 may include terminal devices 501, 502, 503, a network 504, and a server 505. The network 504 is used as a medium to provide communication links between the terminal devices 501, 502, 503 and the server 505. The network 504 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 505 via the network 504 using the terminal devices 501, 502, 503 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 501, 502, 503, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 501, 502, 503 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 505 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users using the terminal devices 501, 502, 503. The background management server may analyze and process received data such as a product information query request, and feed back the processing result (e.g., target push information or product information, by way of example only) to the terminal device.
It should be noted that, the method for processing data provided by the embodiment of the present invention is generally performed by the server 505, and accordingly, the device for processing data is generally disposed in the server 505.
It should be understood that the number of terminal devices, networks and servers in fig. 5 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 6, there is illustrated a schematic diagram of a computer system 600 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 6 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the system 600 are also stored. The CPU 601, ROM 602, and RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When the computer program is executed by the Central Processing Unit (CPU) 601, the above-described functions defined in the system of the present invention are performed.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, a processor may be described as including an acquisition module, a storage module, a determination module, and a merging module. In some cases the names of these modules do not limit the modules themselves; for example, the acquisition module may also be described as a "module that acquires each piece of shard information on the current node".
As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments or may exist alone without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire each piece of shard information on the current node, wherein the shard information comprises a shard identifier; write data corresponding to the shard identifier into one or more disk files; in response to receiving a merge task, determine a target shard identifier corresponding to the merge task; and execute, according to the target shard identifier, the merge task on the target disk file corresponding to the target shard identifier so as to update the target disk file.
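The four steps listed above can be illustrated with a minimal Python sketch. It is not the implementation of the invention: the `Node` and `DiskFile` classes, the in-memory dictionaries standing in for disk files, the half-open key ranges, and the "newer file wins" rule are all assumptions introduced only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DiskFile:
    """Hypothetical stand-in for a disk file; entries are kept in memory."""
    shard_id: int
    entries: dict  # key -> value


@dataclass
class Node:
    # Shard information (assumed shape): shard_id -> (low_key, high_key),
    # interpreted as the half-open range [low_key, high_key).
    shards: dict
    files: list = field(default_factory=list)

    def write(self, records: dict) -> None:
        """Group records by owning shard and write one file per shard."""
        grouped = defaultdict(dict)
        for key, value in records.items():
            for shard_id, (low, high) in self.shards.items():
                if low <= key < high:
                    grouped[shard_id][key] = value
                    break
        for shard_id, entries in grouped.items():
            self.files.append(DiskFile(shard_id, entries))

    def merge(self, target_shard_id: int) -> None:
        """Merge only the files tagged with target_shard_id."""
        targets = [f for f in self.files if f.shard_id == target_shard_id]
        if len(targets) < 2:
            return  # nothing to merge for this shard
        merged = {}
        for f in targets:  # files are in write order, so later values override
            merged.update(f.entries)
        untouched = [f for f in self.files if f.shard_id != target_shard_id]
        self.files = untouched + [DiskFile(target_shard_id, merged)]
```

For example, with `Node(shards={1: ("a", "m"), 2: ("m", "z")})`, two writes that both touch shard 1 followed by `merge(1)` rewrite only the shard 1 files; the files of shard 2 are neither read nor rewritten.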
According to the technical solution of the embodiments of the present invention, the data processing method first acquires each piece of shard information on the current node and then writes the data corresponding to each shard identifier into one or more disk files; after receiving a merge task, it acquires the target shard identifier corresponding to the merge task and executes the merge task on the target disk file corresponding to that identifier so as to update the target disk file. Because the merge task is executed per shard identifier, disk files with the same shard identifier are merged together and disk files with different shard identifiers are not, so data that does not need to participate in a merge is not read or rewritten, which reduces read-write amplification and saves disk resources. When the merge task covers multiple shard identifiers, it can be split into multiple sub-merge tasks according to the key value ranges of the shard identifiers, and each sub-merge task is executed for its own shard identifier, which likewise reduces read-write amplification and saves disk resources.
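The splitting of a merge task by key value range can be sketched as follows. This is an illustrative assumption rather than the patented procedure: the `KeyRange` type, the function name `split_merge_task`, string keys, and half-open ranges are all hypothetical choices.

```python
from typing import NamedTuple


class KeyRange(NamedTuple):
    low: str   # inclusive lower bound
    high: str  # exclusive upper bound


def split_merge_task(target: KeyRange, shards: dict) -> dict:
    """Return {shard_id: sub_range} for every shard overlapping the target range.

    `shards` maps a shard identifier to the KeyRange that shard owns.
    Each returned entry corresponds to one sub-merge task.
    """
    sub_tasks = {}
    for shard_id, shard_range in shards.items():
        low = max(target.low, shard_range.low)
        high = min(target.high, shard_range.high)
        if low < high:  # keep only non-empty intersections
            sub_tasks[shard_id] = KeyRange(low, high)
    return sub_tasks
```

With shards owning `[a, h)`, `[h, p)` and `[p, z)`, a merge task over `[f, k)` yields two sub-merge tasks, one per overlapping shard, and the third shard is left alone.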
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method of data processing, comprising:
acquiring each piece of shard information on a current node, wherein the shard information comprises a shard identifier;
writing data corresponding to the shard identifier into one or more disk files;
in response to receiving a merge task, determining a target shard identifier corresponding to the merge task; and
executing, according to the target shard identifier, the merge task on a target disk file corresponding to the target shard identifier, so as to update the target disk file.
2. The method of claim 1, wherein the shard information further includes a key value range corresponding to the shard identifier, and writing data corresponding to the shard identifier into one or more disk files comprises:
acquiring the data corresponding to the shard identifier according to the key value range; and
writing the data into one or more disk files.
3. The method of claim 1, wherein determining a target shard identifier corresponding to the merge task comprises:
acquiring a target key value range corresponding to the merge task; and
taking the shard identifier corresponding to the target key value range as the target shard identifier.
4. The method of claim 1, wherein determining a target shard identifier corresponding to the merge task comprises:
acquiring a target key value range corresponding to the merge task;
dividing the target key value range according to the key value range of each shard identifier to obtain a plurality of sub-key value ranges; and
taking the shard identifier corresponding to each sub-key value range as a target shard identifier.
5. The method of claim 4, wherein executing, according to the target shard identifier, the merge task on the target disk file corresponding to the target shard identifier comprises:
creating a sub-merge task corresponding to each target shard identifier to obtain a plurality of sub-merge tasks corresponding to the merge task; and
for each sub-merge task, merging the target disk files corresponding to the respective target shard identifier so as to update the target disk files.
6. The method of claim 1, wherein each disk file has a corresponding level, and executing, according to the target shard identifier, the merge task on the target disk file corresponding to the target shard identifier comprises:
merging the target disk files according to the target shard identifier and the level order of the target disk files.
7. The method of claim 6, wherein merging the target disk files according to the target shard identifier and the level order of the target disk files comprises:
when merging within the L0 level is executed, merging all target disk files in the L0 level to obtain an updated target disk file in the L0 level; and
when level-by-level merging is executed, determining a disk file to be merged according to the updated target disk file in the previous level, and merging the disk file to be merged with the target disk file in the current level to obtain an updated target disk file in the current level.
8. An apparatus for data processing, comprising:
an acquisition module for acquiring each piece of shard information on a current node, wherein the shard information comprises a shard identifier;
a storage module for writing data corresponding to the shard identifier into one or more disk files;
a determination module for determining, in response to receiving a merge task, a target shard identifier corresponding to the merge task; and
a merging module for executing, according to the target shard identifier, the merge task on a target disk file corresponding to the target shard identifier, so as to update the target disk file.
9. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-7.
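
The level-ordered merging recited in claims 6 and 7 can be illustrated with a short Python sketch. The claims do not state how the disk file to be merged is determined from the updated file of the previous level; the sketch assumes, purely for illustration, that the updated upper-level file is merged with those current-level files whose key spans overlap it, and that newer (upper-level) values override older ones. Files are modelled as in-memory dictionaries belonging to a single target shard, and all function names are hypothetical.

```python
def merge_l0(l0_files: list) -> dict:
    """Merge all L0 files of the target shard; later (newer) files win."""
    merged = {}
    for entries in l0_files:
        merged.update(entries)
    return merged


def overlaps(a: dict, b: dict) -> bool:
    """True when the key spans of two non-empty files intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


def merge_into_level(upper: dict, level_files: list) -> list:
    """Merge the updated upper-level file into the overlapping files of a level.

    Files that do not overlap the upper-level file are kept unchanged.
    """
    if not upper:
        return level_files
    kept, merged = [], {}
    for entries in level_files:
        if entries and overlaps(upper, entries):
            merged.update(entries)  # older data goes in first
        else:
            kept.append(entries)
    merged.update(upper)  # newer upper-level data overrides older values
    return kept + [merged]
```

Calling `merge_l0` on the L0 files and then passing its result to `merge_into_level` for the next level rewrites only the overlapping files of that level for the target shard.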
CN202311412356.1A 2023-10-27 2023-10-27 Data processing method and device Pending CN117389475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311412356.1A CN117389475A (en) 2023-10-27 2023-10-27 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311412356.1A CN117389475A (en) 2023-10-27 2023-10-27 Data processing method and device

Publications (1)

Publication Number Publication Date
CN117389475A (en) 2024-01-12

Family

ID=89466354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311412356.1A Pending CN117389475A (en) 2023-10-27 2023-10-27 Data processing method and device

Country Status (1)

Country Link
CN (1) CN117389475A (en)

Similar Documents

Publication Publication Date Title
US10097659B1 (en) High performance geographically distributed data storage, retrieval and update
US10127243B2 (en) Fast recovery using self-describing replica files in a distributed storage system
KR20170044718A (en) Synchronization of shared folders and files
US20170193034A1 (en) Object data updating method and apparatus in an object storage system
CN107480205B (en) Method and device for partitioning data
CN110858194A (en) Method and device for expanding database
CN113961510B (en) File processing method, device, equipment and storage medium
CN112597126A (en) Data migration method and device
CN111753019B (en) Data partitioning method and device applied to data warehouse
US10467190B2 (en) Tracking access pattern of inodes and pre-fetching inodes
CN112783887A (en) Data processing method and device based on data warehouse
CN107409086B (en) Mass data management in communication applications through multiple mailboxes
CN114817146A (en) Method and device for processing data
CN111984686A (en) Data processing method and device
CN112711572B (en) Online capacity expansion method and device suitable for database and table division
CN111177109A (en) Method and device for deleting overdue key
US11416468B2 (en) Active-active system index management
CN117389475A (en) Data processing method and device
CN114756173A (en) Method, system, device and computer readable medium for file merging
CN113986833A (en) File merging method, system, computer system and storage medium
CN109213815B (en) Method, device, server terminal and readable medium for controlling execution times
CN113742376A (en) Data synchronization method, first server and data synchronization system
CN113760861A (en) Data migration method and device
CN113779048A (en) Data processing method and device
CN111737218A (en) File sharing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination