CN115118761A - Data storage method, device, equipment and medium

Info

Publication number
CN115118761A
Authority
CN
China
Prior art keywords
data
node
nodes
data node
file
Legal status
Pending
Application number
CN202210832806.1A
Other languages
Chinese (zh)
Inventor
高矗
李选
Current Assignee
Inspur Jinan data Technology Co ltd
Original Assignee
Inspur Jinan data Technology Co ltd
Application filed by Inspur Jinan data Technology Co ltd
Priority to CN202210832806.1A
Publication of CN115118761A
Pending (current legal status)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/141 Setup of application sessions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G06F16/1824 Distributed file systems implemented using Network-attached Storage [NAS] architecture
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data storage method, device, equipment and medium in the technical field of computers. The method includes: sending, through a client, a file upload request to the name node of a distributed file system, and determining the data nodes through the name node; initiating a connection establishment request to each data node, and establishing a communication pipeline with each data node after receiving the response message by which that data node permits the connection; fragmenting a big data file to obtain a plurality of data blocks; and distributing the data blocks in sequence, one to each data node, through the communication pipelines, then assigning each currently unallocated data block to whichever data node has just finished transmitting, until all of the data blocks are stored in the corresponding data nodes. By increasing the number of connections to data nodes, the method improves transmission efficiency and data transmission stability and makes reasonable use of resources.

Description

Data storage method, device, equipment and medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data storage method, apparatus, device, and medium.
Background
As data volumes grow, all of the data can no longer be stored under a single operating system and must be allocated to more disks managed by operating systems, which is inconvenient to manage and maintain. A system is therefore needed to manage files across multiple machines, namely a distributed file system. The Hadoop Distributed File System (HDFS) is one such distributed file system: it combines multiple servers to provide its functions, storing files and locating them through a directory tree.
Referring to FIG. 1, current HDFS streaming data storage proceeds as follows:
(1) the client sends a file upload request to the Name Node (NN);
(2) after receiving the request, the NN performs various checks, such as whether the file exists and whether the client has permission to operate on it, and if the checks pass, returns an instruction to the client allowing the upload;
(3) after receiving the instruction, the client asks the NN to assign Data Nodes (DNs);
(4) the NN calculates the DNs for the upload according to the replication parameter and rack awareness, and informs the client;
(5) the client uploads data to the returned DNs: it first establishes a communication pipeline with the first DN, the first DN then calls the second DN, and the second DN calls the third DN, until the whole pipeline is established;
(6) DN1, DN2 and DN3 respond back to the client stage by stage, completing the establishment of the communication pipeline;
(7) the client uploads the first block (a data block obtained by splitting the file) to DN1 in units of packets, with the data first read from disk into a local memory cache; each time DN1 receives a packet it passes the packet to DN2, and DN2 passes it to DN3;
(8) after the first block has been transferred, the client requests to upload the second block, and so on until all blocks of the file have been uploaded successfully.
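The conventional chained write in step (7) can be sketched as a small Python simulation. The function names and the 64 KB packet size are illustrative assumptions (64 KB matches the usual HDFS client default, but nothing here calls HDFS); the sketch only shows that the client holds a single connection and that DN2 and DN3 receive each packet second-hand through the DN ahead of them.

```python
from typing import Dict, List, Tuple

# Assumed packet size; HDFS clients commonly default to 64 KB packets.
PACKET_SIZE = 64 * 1024


def split_into_packets(block: bytes, packet_size: int) -> List[bytes]:
    """Cut one block into fixed-size packets (the last one may be shorter)."""
    return [block[i:i + packet_size] for i in range(0, len(block), packet_size)]


def chained_pipeline_write(
    block: bytes, pipeline: List[str], packet_size: int = PACKET_SIZE
) -> Tuple[Dict[str, bytes], List[Tuple[str, str]]]:
    """Simulate the conventional write: the client sends each packet only to the
    first DN, and every DN stores the packet and forwards it to the next DN."""
    storage = {dn: bytearray() for dn in pipeline}
    hops: List[Tuple[str, str]] = []  # (sender, receiver) for every packet transfer
    for packet in split_into_packets(block, packet_size):
        sender = "client"
        for dn in pipeline:  # DN1 -> DN2 -> DN3, one hop at a time
            storage[dn].extend(packet)
            hops.append((sender, dn))
            sender = dn
    return {dn: bytes(buf) for dn, buf in storage.items()}, hops


if __name__ == "__main__":
    block = bytes(range(256)) * 1000  # 256,000-byte toy block
    stored, hops = chained_pipeline_write(block, ["DN1", "DN2", "DN3"])
    print(all(data == block for data in stored.values()))  # True: every DN holds the block
    print(sum(1 for sender, _ in hops if sender == "client"))  # 4 packets left the client
```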
This process has the following drawbacks. First, data is stored as a stream of blocks, so storage is limited by network bandwidth and network latency, and after a disconnection a new DN must be selected and the data re-uploaded. Second, each block uploaded to a DN is then processed by that DN; when the DN is handling other urgent tasks, its resource utilization becomes too high and data processing slows down. Finally, because DN1 receives the data and forwards it to DN2, and DN2 forwards it to DN3, the resources of DN2 and DN3 are not used effectively.
Therefore, how to improve transmission efficiency and data transmission stability while making reasonable use of resources is an urgent problem in this field.
Disclosure of Invention
In view of this, the present invention provides a data storage method, apparatus, device and medium that can improve transmission efficiency and data transmission stability and make reasonable use of resources. The specific solution is as follows:
In a first aspect, the present application discloses a data storage method, including:
sending, through a client, a file upload request to a name node in a distributed file system, and determining each data node through the name node;
initiating a connection establishment request to each data node, and establishing a communication pipeline with each data node after receiving the response message by which that data node permits the connection;
fragmenting a big data file to obtain a plurality of data blocks, wherein each data block includes a file identifier and a fragment offset of the big data file;
and distributing the plurality of data blocks in sequence, one to each data node, through the communication pipelines, and then distributing each currently unallocated data block to a data node that has finished its current transmission, until all of the plurality of data blocks are stored in the corresponding data nodes, so that the plurality of data blocks is transmitted through the data nodes.
Optionally, determining each data node through the name node includes:
determining each data node through the name node based on a copy storage strategy and rack awareness.
Optionally, initiating a connection request to each data node includes:
when the number of data nodes is not greater than the number of copies specified by the copy storage strategy, initiating a connection request to each data node.
Optionally, initiating a connection request to each data node includes:
when the number of data nodes is greater than the number of copies specified in the copy storage strategy, judging whether the copies in the copy storage strategy are default copies;
if the copies in the copy storage strategy are default copies, initiating connection requests to a corresponding number of data nodes based on the number of default copies;
and if the copies in the copy storage strategy are user-defined copies, initiating a connection request to each data node.
Optionally, after the currently unallocated data blocks have been distributed to the data nodes that finished transmission, until all of the plurality of data blocks are stored in the corresponding data nodes, the method further includes:
sending, based on a preset interaction pipeline, the data blocks received by each data node to the other data nodes.
Optionally, after sending the data blocks received by each data node to the other data nodes based on the preset interaction pipeline, the method further includes:
judging, based on the file identifier, whether the data blocks on any data node belong to the same big data file;
and if the data blocks on that data node belong to the same big data file, reassembling them into the corresponding big data file according to the fragment offsets.
Optionally, the data storage method further includes:
when a target data node becomes abnormal while transmitting a target data block, judging whether any data node other than the target data node exists;
and if another data node exists, determining a new data node from the other data nodes, resending a connection request to the new data node, establishing a communication pipeline with the new data node after receiving the response message by which it permits the connection, and distributing the target data block to the new data node through that communication pipeline.
In a second aspect, the present application discloses a data storage device comprising:
a data node determining module, configured to send, through a client, a file upload request to a name node in the distributed file system, and determine each data node through the name node;
a communication pipeline establishing module, configured to initiate a connection request to each data node, and establish a communication pipeline with each data node after receiving the response message by which that data node permits the connection;
a big data file fragmentation module, configured to fragment a big data file to obtain a plurality of data blocks, wherein each data block includes a file identifier and a fragment offset of the big data file;
and a data block transmission module, configured to distribute the plurality of data blocks in fragmentation order, one to each data node, through the communication pipelines, and then distribute each currently unallocated data block to a data node that has finished transmission, until all of the plurality of data blocks are stored in the corresponding data nodes, so that the plurality of data blocks is transmitted through the data nodes.
In a third aspect, the present application discloses an electronic device, comprising:
a memory for storing a computer program;
a processor, configured to execute the computer program to implement the data storage method disclosed above.
In a fourth aspect, the present application discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by a processor, implements the data storage method disclosed above.
Therefore, the data storage method provided by the present application includes: sending, through a client, a file upload request to a name node in a distributed file system, and determining each data node through the name node; initiating a connection establishment request to each data node, and establishing a communication pipeline with each data node after receiving the response message by which that data node permits the connection; fragmenting a big data file to obtain a plurality of data blocks, wherein each data block includes a file identifier and a fragment offset of the big data file; and distributing the plurality of data blocks in sequence, one to each data node, through the communication pipelines, then distributing each currently unallocated data block to a data node that has finished its current transmission, until all of the data blocks are stored in the corresponding data nodes. Compared with the conventional method in which the client connects to only one data node, this increases the number of connections to data nodes, and multiple connections together transmit faster than a single connection; moreover, when network latency is high or the network disconnects, multiple connections improve the stability and reliability of transmission; in addition, once connections with multiple data nodes are established, the resources consumed by data uploading and post-processing are shared among those data nodes.
In summary, by increasing the number of connections to data nodes, the present application improves transmission efficiency and data transmission stability and makes reasonable use of resources.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flow chart of a conventional data storage method;
FIG. 2 is a flow chart of a data storage method disclosed herein;
FIG. 3 is a schematic structural diagram of data nodes establishing connection channels according to the present disclosure;
FIG. 4 is a schematic diagram of the fragmentation of large file data disclosed in the present application;
FIG. 5 is a flow chart of a particular data storage method disclosed herein;
FIG. 6 is a schematic diagram of a data storage device according to the present disclosure;
FIG. 7 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As described in the background, the existing approach has three drawbacks. First, data is stored as a stream of blocks, so storage is limited by network bandwidth and network latency, and after a disconnection a new DN must be selected and the data re-uploaded. Second, each block uploaded to a DN is then processed by that DN; when the DN is handling other urgent tasks, its resource utilization becomes too high and data processing slows down. Finally, because DN1 receives the data and forwards it to DN2, and DN2 forwards it to DN3, the resources of DN2 and DN3 are not used effectively.
Therefore, the embodiment of the application provides a data storage scheme, which can improve transmission efficiency, improve data transmission stability and realize reasonable utilization of resources.
An embodiment of the present application discloses a data storage method which, as shown in FIG. 2, includes the following steps:
Step S11: sending, through a client, a file upload request to a name node in the distributed file system, and determining each data node through the name node.
In this embodiment, after receiving the file upload request, the name node of the distributed file system checks whether the file exists and whether the client has permission to operate on it; if the checks pass, it returns an instruction to the client allowing the upload. After receiving the instruction, the client asks the name node to assign data nodes, and the name node calculates the corresponding data nodes and informs the client.
Step S12: initiating a connection establishment request to each data node, and establishing a communication pipeline with each data node after receiving the response message by which that data node permits the connection.
Whereas in the conventional method the client connects to only one data node, this embodiment increases the number of connections to data nodes: it initiates a connection establishment request to each data node and establishes a communication pipeline with each data node after receiving that node's response permitting the connection. As shown in FIG. 3, after the 3 data nodes notified by the name node are obtained, connection channels are established with all 3 of them.
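A minimal Python sketch of step S12, assuming each data node answers a connection request with an allow or deny response. `DataNodeStub`, `Pipeline`, `establish_pipelines` and the `"ALLOW"`/`"DENY"` strings are hypothetical names for illustration, not a real HDFS interface. The requests are sent in parallel with a thread pool, and a pipeline is kept only for the nodes whose response permits the connection.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataNodeStub:
    """Hypothetical stand-in for a data node that answers connection requests."""
    name: str
    accepts: bool = True

    def handle_connect(self, client_id: str) -> str:
        # A real data node might check load, quotas or credentials before answering.
        return "ALLOW" if self.accepts else "DENY"


@dataclass
class Pipeline:
    """One client-to-data-node communication pipeline."""
    client_id: str
    node: str


def establish_pipelines(client_id: str, nodes: List[DataNodeStub]) -> Dict[str, Pipeline]:
    """Send a connection request to every data node in parallel and open a
    pipeline only to those whose response permits the connection."""
    with ThreadPoolExecutor(max_workers=max(1, len(nodes))) as pool:
        responses = list(pool.map(lambda n: n.handle_connect(client_id), nodes))
    return {
        node.name: Pipeline(client_id, node.name)
        for node, response in zip(nodes, responses)
        if response == "ALLOW"
    }


if __name__ == "__main__":
    nodes = [DataNodeStub("DN1"), DataNodeStub("DN2"), DataNodeStub("DN3")]
    print(sorted(establish_pipelines("client-1", nodes)))  # ['DN1', 'DN2', 'DN3']
```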
Step S13: fragmenting a big data file to obtain a plurality of data blocks, wherein each data block includes a file identifier and a fragment offset of the big data file.
Before a large file is uploaded to the data nodes, it is first fragmented, and a data header containing the file identifier and the fragment offset is added to each resulting data block, as shown in FIG. 4. The fragment size may be, for example, 128 MB, and is not specifically limited here.
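The block header described in step S13 can be sketched with Python's `struct` module. The exact layout (a 16-byte UUID as the file identifier followed by an 8-byte big-endian offset) is an assumption, since the text only says that the header carries a file identifier and a fragment offset; `fragment_file` and `parse_block` are hypothetical helper names.

```python
import struct
import uuid
from typing import Iterator, Tuple

# Assumed header layout: 16-byte file identifier + 8-byte big-endian fragment offset.
HEADER = struct.Struct(">16sQ")
FRAGMENT_SIZE = 128 * 1024 * 1024  # 128 MB, the example size given in the text


def fragment_file(data: bytes, file_id: uuid.UUID, size: int = FRAGMENT_SIZE) -> Iterator[bytes]:
    """Yield data blocks, each prefixed with a header carrying the file id and offset."""
    for offset in range(0, len(data), size):
        yield HEADER.pack(file_id.bytes, offset) + data[offset:offset + size]


def parse_block(block: bytes) -> Tuple[uuid.UUID, int, bytes]:
    """Split a data block back into (file id, fragment offset, payload)."""
    raw_id, offset = HEADER.unpack_from(block)
    return uuid.UUID(bytes=raw_id), offset, block[HEADER.size:]


if __name__ == "__main__":
    fid = uuid.uuid4()
    blocks = list(fragment_file(b"abcdefghij", fid, size=4))
    print([parse_block(b)[1:] for b in blocks])
    # [(0, b'abcd'), (4, b'efgh'), (8, b'ij')]
```

Because every block carries its own offset, blocks may land on different data nodes in any order and still be put back together later.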
Step S14: distributing the plurality of data blocks in sequence, one to each data node, through the communication pipelines, and then distributing each currently unallocated data block to a data node that has finished its current transmission, until all of the data blocks are stored in the corresponding data nodes, so that the plurality of data blocks is transmitted through the data nodes.
For example, assuming there are three data nodes and a large file is divided into ten data blocks, this embodiment first allocates one data block to each data node; whenever one of the three data nodes finishes its current transmission, the next unallocated block among the ten is allocated to that data node, until all ten data blocks are stored in the corresponding data nodes.
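The allocation rule in step S14 behaves like a greedy "next free pipeline takes the next block" scheduler. The Python sketch below simulates it with a priority queue keyed on the time each pipeline becomes free. The per-node transfer rates are hypothetical inputs standing in for the acknowledgements a real client would wait for, and `schedule_blocks` is an illustrative name.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def schedule_blocks(
    block_sizes: Sequence[int], node_rates: Dict[str, float]
) -> Tuple[Dict[str, List[int]], float]:
    """Simulate 'give the next unallocated block to whichever node finished first'.

    node_rates: hypothetical sustained rate of each pipeline (size units per second).
    Returns the block indices assigned to each node and the time the last block lands.
    """
    if not node_rates:
        raise ValueError("at least one data node is required")
    assignment: Dict[str, List[int]] = {node: [] for node in node_rates}
    # (time the node becomes free, tie-breaker, node); every node starts free at t=0,
    # so the first len(node_rates) blocks go one to each node in order.
    ready = [(0.0, i, node) for i, node in enumerate(node_rates)]
    heapq.heapify(ready)
    finish = 0.0
    for index, size in enumerate(block_sizes):  # blocks are handed out in fragment order
        free_at, tie, node = heapq.heappop(ready)
        done = free_at + size / node_rates[node]
        assignment[node].append(index)
        finish = max(finish, done)
        heapq.heappush(ready, (done, tie, node))
    return assignment, finish


if __name__ == "__main__":
    # Ten equal blocks; DN1's pipeline is assumed to be twice as fast as the others.
    plan, makespan = schedule_blocks([128] * 10, {"DN1": 128.0, "DN2": 64.0, "DN3": 64.0})
    print(plan)      # {'DN1': [0, 3, 4, 7, 8], 'DN2': [1, 5, 9], 'DN3': [2, 6]}
    print(makespan)  # 6.0
```

With one pipeline twice as fast as the others, it ends up carrying half of the ten blocks, which is how this scheme spreads load without a fixed per-node quota.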
In this embodiment, when a target data node becomes abnormal while transmitting a target data block, the transmission interruption condition is recorded, and it is determined whether any data node other than the target data node exists. If so, a new data node is determined from the other data nodes, a connection request is resent to the new data node, a communication pipeline is established with it after its response permitting the connection is received, and the target data block is distributed to the new data node through that pipeline based on the recorded interruption condition. If no other data node exists, the target data block is retransmitted to the target data node based on the recorded interruption condition.
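One possible reading of this failure handling, as a Python sketch. The text does not define what the recorded "transmission interruption condition" contains; here it is assumed to be the number of bytes the failed node had acknowledged, which lets a retry to the same node resume mid-block, while a switch to a new node re-sends the whole block. `Interruption`, `plan_retry` and the `unhealthy` set are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Interruption:
    """Recorded transmission-interruption condition (fields are illustrative)."""
    block_index: int
    node: str
    bytes_acked: int  # bytes of the block the node had acknowledged before failing


def plan_retry(event: Interruption, nodes: List[str], unhealthy: Set[str]) -> Tuple[str, int]:
    """Return (target node, offset to resume from) for the interrupted block.

    Prefer another healthy data node; a fresh node holds none of the block,
    so the whole block is re-sent from offset 0. If no other node is available,
    retry the original node and resume from the last acknowledged byte.
    """
    for node in nodes:
        if node != event.node and node not in unhealthy:
            return node, 0
    return event.node, event.bytes_acked


if __name__ == "__main__":
    event = Interruption(block_index=4, node="DN2", bytes_acked=1_000_000)
    print(plan_retry(event, ["DN1", "DN2", "DN3"], unhealthy={"DN2"}))  # ('DN1', 0)
    print(plan_retry(event, ["DN2"], unhealthy={"DN2"}))                # ('DN2', 1000000)
```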
In this embodiment, after all of the data blocks have been stored in the corresponding data nodes as described above, the method further includes sending, based on a preset interaction pipeline, the data blocks received by each data node to the other data nodes. Further, it is judged, based on the file identifier, whether the data blocks on any data node belong to the same big data file; if so, those data blocks are reassembled into the corresponding big data file according to the fragment offsets. In this way the big data file can be generated on every data node, so that if one data node becomes abnormal, the file can still be obtained from the other data nodes.
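The post-upload exchange and reassembly can be sketched in Python as an all-to-all copy followed by grouping on the file identifier and joining payloads in offset order. `Block`, `exchange` and `reassemble` are hypothetical names, and the "preset interaction pipeline" is modelled simply as in-memory list copies; the contiguity check is an added safeguard that the text does not specify.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Block(NamedTuple):
    file_id: str   # file identifier carried in the block header
    offset: int    # fragment offset of the payload within the original file
    payload: bytes


def exchange(stores: Dict[str, List[Block]]) -> Dict[str, List[Block]]:
    """Each data node sends the blocks it received to every other data node."""
    merged = {node: list(blocks) for node, blocks in stores.items()}
    for sender, blocks in stores.items():
        for receiver in stores:
            if receiver != sender:
                merged[receiver].extend(blocks)
    return merged


def reassemble(blocks: List[Block]) -> Dict[str, bytes]:
    """Group blocks by file identifier and join payloads in offset order."""
    by_file: Dict[str, Dict[int, bytes]] = defaultdict(dict)
    for block in blocks:
        by_file[block.file_id][block.offset] = block.payload
    files: Dict[str, bytes] = {}
    for file_id, parts in by_file.items():
        out = bytearray()
        for offset in sorted(parts):
            if offset != len(out):  # fragments must be contiguous to rebuild the file
                raise ValueError(f"{file_id}: fragment at offset {offset} does not follow byte {len(out)}")
            out += parts[offset]
        files[file_id] = bytes(out)
    return files


if __name__ == "__main__":
    stores = {
        "DN1": [Block("f1", 0, b"abcd"), Block("f1", 8, b"ij")],
        "DN2": [Block("f1", 4, b"efgh")],
    }
    for node, blocks in exchange(stores).items():
        print(node, reassemble(blocks))  # each node rebuilds {'f1': b'abcdefghij'}
```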
Therefore, the data storage method provided by the present application includes: sending, through a client, a file upload request to a name node in a distributed file system, and determining each data node through the name node; initiating a connection establishment request to each data node, and establishing a communication pipeline with each data node after receiving the response message by which that data node permits the connection; fragmenting a big data file to obtain a plurality of data blocks, wherein each data block includes a file identifier and a fragment offset of the big data file; and distributing the plurality of data blocks in sequence, one to each data node, through the communication pipelines, then distributing each currently unallocated data block to a data node that has finished its current transmission, until all of the data blocks are stored in the corresponding data nodes. Compared with the conventional method in which the client connects to only one data node, this increases the number of connections to data nodes, and multiple connections together transmit faster than a single connection; moreover, when network latency is high or the network disconnects, multiple connections improve the stability and reliability of transmission; in addition, once connections with multiple data nodes are established, the resources consumed by data uploading and post-processing are shared among those data nodes.
In summary, by increasing the number of connections to data nodes, the present application improves transmission efficiency and data transmission stability and makes reasonable use of resources.
An embodiment of the present application discloses a specific data storage method that further explains and optimizes the technical solution of the previous embodiment. As shown in FIG. 5, the method includes:
Step S21: sending, through a client, a file upload request to a name node in the distributed file system, and determining each data node through the name node based on a copy storage strategy and rack awareness.
In this embodiment, the name node selects the data nodes according to the copy storage strategy and rack awareness.
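For reference, stock HDFS's default placement puts the first copy on the writer's node when the writer is itself a data node (otherwise on another node chosen by the policy), the second on a node in a different rack, and the third on a different node in that same remote rack. The Python sketch below is a simplified, deterministic approximation of that rule (HDFS chooses randomly where this picks the first node by name). `choose_data_nodes` and the `topology` mapping are hypothetical, and the patent does not state which rack-awareness rule its name node applies.

```python
from typing import Dict, List, Optional


def choose_data_nodes(
    topology: Dict[str, str], writer: Optional[str], replication: int = 3
) -> List[str]:
    """Pick `replication` data nodes; `topology` maps node name -> rack id.

    Simplified, deterministic take on HDFS's default rule: first copy on the
    writer's node (or, here, the first node by name), second copy on another rack,
    third copy on a different node in the second copy's rack.
    """
    if not topology or replication <= 0:
        return []
    nodes = sorted(topology)
    first = writer if writer in topology else nodes[0]
    chosen = [first]
    # Second copy: a node on a rack different from the first copy's rack.
    remote = [n for n in nodes if topology[n] != topology[first]]
    if remote and replication >= 2:
        chosen.append(remote[0])
        # Third copy: another node on the same rack as the second copy.
        same_rack = [n for n in nodes if topology[n] == topology[remote[0]] and n not in chosen]
        if same_rack and replication >= 3:
            chosen.append(same_rack[0])
    # Fallback or extra copies: fill with the remaining nodes in name order.
    for n in nodes:
        if len(chosen) >= replication:
            break
        if n not in chosen:
            chosen.append(n)
    return chosen


if __name__ == "__main__":
    topology = {"dn1": "rack1", "dn2": "rack1", "dn3": "rack2", "dn4": "rack2"}
    print(choose_data_nodes(topology, writer="dn2"))     # ['dn2', 'dn3', 'dn4']
    print(choose_data_nodes(topology, writer="client"))  # ['dn1', 'dn3', 'dn4']
```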
Step S22: initiating a connection establishment request to each data node, and establishing a communication pipeline with each data node after receiving the response message by which that data node permits the connection.
In this embodiment, when the number of data nodes is not greater than the number of copies specified by the copy storage strategy, a connection request is initiated to every data node. When the number of data nodes is greater than the number of copies specified in the copy storage strategy, it is judged whether the copies in the strategy are default copies: if they are, connection requests are initiated to as many data nodes as the default number of copies; if they are user-defined copies, a connection request is initiated to every data node. For example, when there are 2 data nodes and the copy storage strategy specifies 3 copies, connection requests are initiated to both data nodes. When there are 4 data nodes, it is judged whether the copies are default copies: if they are default copies and the default number is 3, connection requests are sent to 3 data nodes; if they are user-defined copies, connection requests are sent to all 4 data nodes.
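The connection-count rule of step S22 reduces to a small decision function. A Python sketch, assuming the copy storage strategy is represented by a copy count plus a flag saying whether that count is the default; which particular nodes are kept when only `copies` of them are contacted is not specified in the text, so the sketch simply keeps the first ones listed. `connection_targets` is a hypothetical name.

```python
from typing import List


def connection_targets(data_nodes: List[str], copies: int, is_default_copies: bool) -> List[str]:
    """Decide which data nodes the client should send connection requests to.

    - nodes <= copies          -> connect to every node
    - nodes > copies, default  -> connect to `copies` nodes (here: the first ones listed)
    - nodes > copies, custom   -> connect to every node
    """
    if len(data_nodes) <= copies or not is_default_copies:
        return list(data_nodes)
    return data_nodes[:copies]


if __name__ == "__main__":
    # The three cases from the example above.
    print(len(connection_targets(["DN1", "DN2"], 3, True)))                 # 2
    print(len(connection_targets(["DN1", "DN2", "DN3", "DN4"], 3, True)))   # 3
    print(len(connection_targets(["DN1", "DN2", "DN3", "DN4"], 3, False)))  # 4
```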
Step S23: fragmenting a big data file to obtain a plurality of data blocks, wherein each data block includes a file identifier and a fragment offset of the big data file.
Step S24: distributing the plurality of data blocks in sequence, one to each data node, through the communication pipelines, and then distributing each currently unallocated data block to a data node that has finished its current transmission, until all of the data blocks are stored in the corresponding data nodes, so that the plurality of data blocks is transmitted through the data nodes.
For more specific processes of step S23 and step S24, reference is made to the foregoing embodiments, and details are not repeated here.
Therefore, the data storage method provided by the present application includes: sending, through a client, a file upload request to a name node in a distributed file system, and determining each data node through the name node based on a copy storage strategy and rack awareness; initiating a connection establishment request to each data node, and establishing a communication pipeline with each data node after receiving the response message by which that data node permits the connection; fragmenting a big data file to obtain a plurality of data blocks, wherein each data block includes a file identifier and a fragment offset of the big data file; and distributing the plurality of data blocks in sequence, one to each data node, through the communication pipelines, then distributing each currently unallocated data block to a data node that has finished its current transmission, until all of the data blocks are stored in the corresponding data nodes. Compared with the conventional method in which the client connects to only one data node, this increases the number of connections to data nodes, and multiple connections together transmit faster than a single connection; moreover, when network latency is high or the network disconnects, multiple connections improve the stability and reliability of transmission; in addition, once connections with multiple data nodes are established, the resources consumed by data uploading and post-processing are shared among those data nodes.
In summary, by increasing the number of connections to data nodes, the present application improves transmission efficiency and data transmission stability and makes reasonable use of resources.
Correspondingly, an embodiment of the present application also discloses a data storage device which, as shown in FIG. 6, includes:
a data node determining module 11, configured to send, through a client, a file upload request to a name node in the distributed file system, and determine each data node through the name node;
a communication pipeline establishing module 12, configured to initiate a connection request to each data node, and establish a communication pipeline with each data node after receiving the response message by which that data node permits the connection;
a big data file fragmentation module 13, configured to fragment a big data file to obtain a plurality of data blocks, wherein each data block includes a file identifier and a fragment offset of the big data file;
and a data block transmission module 14, configured to distribute the plurality of data blocks in fragmentation order, one to each data node, through the communication pipelines, and then distribute each currently unallocated data block to a data node that has finished its current transmission, until all of the data blocks are stored in the corresponding data nodes, so that the plurality of data blocks is transmitted through the data nodes.
For the specific working process of each module, refer to the corresponding content disclosed in the foregoing embodiments; details are not repeated here.
Thus, the present application provides a data storage method, including: sending, through a client, a file uploading request to the name node of a distributed file system, and determining each data node through the name node; initiating a connection request to each data node, and establishing a communication pipeline with each data node after receiving the response message in which that data node grants the connection; fragmenting a big data file to obtain a plurality of data blocks, where each data block includes a file identifier and a fragment offset of the big data file; and distributing the data blocks to the data nodes in turn through the communication pipelines, then distributing each data block that has not yet been distributed to a data node that has finished its current transmission, until all the data blocks are stored in the corresponding data nodes, so that the data nodes complete the transmission of the data blocks. Compared with the traditional approach, in which the client connects to only one data node, the method and device increase the number of connections to data nodes, and multiple connections transfer data faster than a single connection. Moreover, when network latency is high or the network is interrupted, multiple connections improve the stability and reliability of the transmission. In addition, once connections to multiple data nodes are established, the resources consumed by uploading the data and by subsequent processing are shared among those data nodes.
In summary, by increasing the number of connections to data nodes, the present application improves transmission efficiency and data transmission stability and makes rational use of resources.
In some specific embodiments, the data node determining module 11 may specifically include:
a data node determining unit, configured to determine each data node through the name node based on a replica storage policy and rack awareness.
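The application does not detail the placement rule. Assuming it resembles the default HDFS policy for three replicas (first replica on the writer's node if the writer is a data node, second on a node in a different rack, third on a different node in the same rack as the second), the selection could be sketched as below. The topology mapping and function name are hypothetical, and nodes are taken in sorted order rather than at random so that the output is reproducible.

```python
def choose_data_nodes(topology: dict[str, str], writer: str, replicas: int = 3) -> list[str]:
    """Choose data nodes for `replicas` copies of a block.

    topology maps each data node name to its rack name. Candidates are taken
    in sorted order instead of at random so the result is reproducible.
    """
    if replicas <= 0:
        return []
    nodes = sorted(topology)
    chosen: list[str] = []

    def pick(candidates: list[str]) -> None:
        # Append the first candidate that has not been chosen yet, if any.
        for node in candidates:
            if node not in chosen:
                chosen.append(node)
                return

    # Replica 1: the writer's own node if it is a data node, otherwise any node.
    pick([writer] if writer in topology else nodes)
    if replicas >= 2 and chosen:
        # Replica 2: a node on a different rack from replica 1.
        pick([n for n in nodes if topology[n] != topology[chosen[0]]])
    if replicas >= 3 and len(chosen) >= 2:
        # Replica 3: a different node on the same rack as replica 2.
        pick([n for n in nodes if topology[n] == topology[chosen[1]]])
    # Further replicas, or fallbacks when a rack rule could not be met.
    while len(chosen) < min(replicas, len(nodes)):
        pick(nodes)
    return chosen


topo = {"dn1": "/rack1", "dn2": "/rack1", "dn3": "/rack2", "dn4": "/rack2"}
print(choose_data_nodes(topo, writer="dn2"))  # ['dn2', 'dn3', 'dn4']
```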
In some specific embodiments, the communication pipe establishing module 12 may specifically include:
a first connection request establishing unit, configured to initiate a connection request to each data node when the number of data nodes does not exceed the number of replicas specified by the replica storage policy.
In some specific embodiments, the communication pipe establishing module 12 may specifically include:
a second connection request establishing unit, configured to: when the number of data nodes is greater than the number of replicas specified by the replica storage policy, determine whether the replica setting in the replica storage policy is the default setting; if it is the default setting, initiate connection requests to a corresponding number of data nodes based on the default number of replicas; and if it is a user-defined setting, initiate a connection request to each data node.
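The branching performed by the first and second connection request establishing units can be condensed into one small function. How the policy marks a replica count as default or user-defined is not specified, so the `custom_replicas` flag is an assumption, as is connecting to the first nodes in the name node's list when only the default replica count is used.

```python
def nodes_to_connect(data_nodes: list[str], replicas: int, custom_replicas: bool) -> list[str]:
    """Return the data nodes that should receive a connection request.

    data_nodes: nodes returned by the name node, in the order returned.
    replicas: replica count specified by the replica storage policy.
    custom_replicas: True if that count is user-defined, False if it is the
    default setting (this flag is an assumed representation).
    """
    if len(data_nodes) <= replicas:
        # Not more nodes than replicas: request a connection to every node.
        return list(data_nodes)
    if not custom_replicas:
        # Default setting: connect to as many nodes as the default replica count.
        return list(data_nodes[:replicas])
    # User-defined setting: connect to every node.
    return list(data_nodes)


print(nodes_to_connect(["dn1", "dn2", "dn3", "dn4"], replicas=3, custom_replicas=False))
# ['dn1', 'dn2', 'dn3']
```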
In some specific embodiments, the data storage device may further include, following the data block transmission module 14:
a data sending unit, configured to send the data blocks received by each of the data nodes to the other data nodes through a preset interaction pipeline.
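A simple way to picture the preset interaction pipeline is that each data node forwards the blocks it received from the client to all of its peers, so that every node ends up holding every block. The sketch below models this with in-memory sets of block numbers; the function name and data layout are illustrative assumptions, and no network transfer is performed.

```python
def exchange_blocks(received: dict[str, set[int]]) -> dict[str, set[int]]:
    """Simulate forwarding over the interaction pipeline.

    received[node] holds the numbers of the blocks that node got directly
    from the client. Each node forwards those blocks to every other node;
    the returned mapping is what each node holds afterwards. The input is
    not modified.
    """
    holdings = {node: set(blocks) for node, blocks in received.items()}
    for sender, blocks in received.items():
        for receiver in received:
            if receiver != sender:
                holdings[receiver] |= blocks  # sender -> receiver transfer
    return holdings


after = exchange_blocks({"dn1": {0, 3, 4}, "dn2": {1, 5}, "dn3": {2}})
print(sorted(after["dn3"]))  # [0, 1, 2, 3, 4, 5]
```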
In some specific embodiments, the data storage device may further include, following the data sending unit:
a data block reorganizing unit, configured to determine, based on the file identifiers, whether the data blocks on any one of the data nodes belong to the same big data file, and if they do, to reassemble the data blocks on that data node into the corresponding big data file according to their fragment offsets.
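The reorganizing unit's logic can be sketched as grouping a node's blocks by file identifier and joining each group in fragment-offset order. Blocks are modeled here as `(file_id, offset, payload)` tuples with byte offsets, which is an assumed representation, and the check that consecutive fragments line up is an added safeguard rather than something the application describes.

```python
from collections import defaultdict


def reassemble(blocks: list[tuple[str, int, bytes]]) -> dict[str, bytes]:
    """Rebuild big data files from (file_id, offset, payload) blocks on one data node."""
    by_file: dict[str, list[tuple[int, bytes]]] = defaultdict(list)
    for file_id, offset, payload in blocks:
        by_file[file_id].append((offset, payload))  # group by file identifier

    files: dict[str, bytes] = {}
    for file_id, parts in by_file.items():
        parts.sort(key=lambda part: part[0])  # order fragments by offset
        expected = 0
        for offset, payload in parts:
            # Added safeguard: each fragment must start where the previous one ended.
            if offset != expected:
                raise ValueError(f"file {file_id}: gap or overlap at offset {offset}")
            expected += len(payload)
        files[file_id] = b"".join(payload for _, payload in parts)
    return files


stored = [("f1", 4, b"efgh"), ("f2", 0, b"xy"), ("f1", 0, b"abcd"), ("f1", 8, b"ij")]
print(reassemble(stored))  # {'f1': b'abcdefghij', 'f2': b'xy'}
```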
In some specific embodiments, the data storage device may further include:
a data exception processing unit, configured to: when a target data node encounters an exception while a target data block is being transmitted, determine whether any data node other than the target data node exists; and if one exists, select a new data node from the other data nodes, re-send a connection request to the new data node, establish a communication pipeline with the new data node after receiving the response message in which it grants the connection, and distribute the target data block to the new data node through that communication pipeline.
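The exception handling can be sketched as a retry loop. `connect` and `send` are hypothetical callables standing in for the connection handshake and the block transfer, and taking the first remaining node that grants the connection as the new data node is an assumption, since the application does not say how the new node is chosen.

```python
from typing import Callable


def send_with_failover(
    block: int,
    target: str,
    data_nodes: list[str],
    connect: Callable[[str], bool],
    send: Callable[[str, int], bool],
) -> str:
    """Send `block` to `target`; if that fails, move it to another data node.

    connect(node) -> True if the node grants a new connection request.
    send(node, block) -> True if the block was transferred successfully.
    Returns the node that finally received the block.
    """
    remaining = [n for n in data_nodes if n != target]
    node = target
    while not send(node, block):
        # The current node failed: look for another node that accepts a connection.
        while remaining:
            node = remaining.pop(0)
            if connect(node):  # a new pipeline is set up only after a grant
                break
        else:
            # No other data node exists (or none granted the connection).
            raise RuntimeError(f"no other data node available for block {block}")
    return node


print(send_with_failover(7, "dn1", ["dn1", "dn2", "dn3"],
                         connect=lambda n: n != "dn2",
                         send=lambda n, b: n != "dn1"))  # dn3
```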
Further, an embodiment of the present application also provides an electronic device. Fig. 7 is a block diagram of an electronic device 20 according to an exemplary embodiment; the content of the figure should not be construed as limiting the scope of use of the present application in any way.
Fig. 7 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present disclosure. The electronic device 20 may specifically include at least one processor 21, at least one memory 22, a display 23, an input/output interface 24, a communication interface 25, a power supply 26, and a communication bus 27. The memory 22 is configured to store a computer program, which is loaded and executed by the processor 21 to implement the following steps:
sending, through a client, a file uploading request to the name node of a distributed file system, and determining each data node through the name node;
initiating a connection request to each data node, and establishing a communication pipeline with each data node after receiving the response message in which that data node grants the connection;
fragmenting a big data file to obtain a plurality of data blocks, where each data block includes a file identifier and a fragment offset of the big data file;
distributing the data blocks to the data nodes in turn through the communication pipelines, and then distributing each data block that has not yet been distributed to a data node that has finished its current transmission, until all the data blocks are stored in the corresponding data nodes, so that the data nodes complete the transmission of the data blocks.
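The connection step above amounts to a request/grant handshake: a pipeline to a data node exists only after that node replies granting the connection. The following sketch runs the handshake against in-process stand-ins; `DataNodeStub`, `Pipeline`, and the message strings are hypothetical, and a real client would exchange these messages over the network.

```python
from dataclasses import dataclass, field


@dataclass
class DataNodeStub:
    """Hypothetical in-process stand-in for a data node."""
    name: str
    accept: bool = True  # whether this node grants connection requests

    def handle(self, message: str) -> str:
        # Answer a connection request with a grant or a refusal.
        if message == "CONNECT":
            return "GRANTED" if self.accept else "REFUSED"
        return "UNSUPPORTED"


@dataclass
class Pipeline:
    """Placeholder for an established communication pipeline to one data node."""
    node: str
    sent_blocks: list[int] = field(default_factory=list)


def establish_pipelines(nodes: list[DataNodeStub]) -> dict[str, Pipeline]:
    """Request a connection from each node; open a pipeline only after a grant."""
    pipelines: dict[str, Pipeline] = {}
    for node in nodes:
        reply = node.handle("CONNECT")
        if reply == "GRANTED":
            pipelines[node.name] = Pipeline(node.name)
    return pipelines


pipes = establish_pipelines([DataNodeStub("dn1"), DataNodeStub("dn2", accept=False), DataNodeStub("dn3")])
print(sorted(pipes))  # ['dn1', 'dn3']
```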
In some embodiments, by executing the computer program stored in the memory, the processor may specifically implement the following step:
determining each data node through the name node based on a replica storage policy and rack awareness.
In some embodiments, by executing the computer program stored in the memory, the processor may specifically implement the following step:
when the number of data nodes does not exceed the number of replicas specified by the replica storage policy, initiating a connection request to each data node.
In some embodiments, by executing the computer program stored in the memory, the processor may specifically implement the following steps:
when the number of data nodes is greater than the number of replicas specified by the replica storage policy, determining whether the replica setting in the replica storage policy is the default setting;
if it is the default setting, initiating connection requests to a corresponding number of data nodes based on the default number of replicas;
if it is a user-defined setting, initiating a connection request to each data node.
In some embodiments, by executing the computer program stored in the memory, the processor may further implement the following step:
sending the data blocks received by each of the data nodes to the other data nodes through a preset interaction pipeline.
In some embodiments, by executing the computer program stored in the memory, the processor may further implement the following steps:
determining, based on the file identifiers, whether the data blocks on any one of the data nodes belong to the same big data file;
if they do, reassembling the data blocks on that data node into the corresponding big data file according to their fragment offsets.
In some embodiments, by executing the computer program stored in the memory, the processor may further implement the following steps:
when a target data node encounters an exception while a target data block is being transmitted, determining whether any data node other than the target data node exists;
if one exists, selecting a new data node from the other data nodes, re-sending a connection request to the new data node, establishing a communication pipeline with the new data node after receiving the response message in which it grants the connection, and distributing the target data block to the new data node through that communication pipeline.
In this embodiment, the power supply 26 provides the operating voltage for each hardware component of the electronic device 20; the communication interface 25 creates a data transmission channel between the electronic device 20 and external devices, and may follow any communication protocol applicable to the technical solution of the present application, which is not specifically limited here; the input/output interface 24 is configured to obtain input data from outside or to output data externally, and its specific interface type may be selected according to the application requirements, which is not specifically limited here.
In addition, the memory 22 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like; the resources stored on it may include a computer program 221, stored either transiently or permanently. Besides the computer program for performing the data storage method executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 221 may further include computer programs for performing other specific tasks.
Further, an embodiment of the present application also discloses a computer-readable storage medium for storing a computer program, where the computer program, when executed by a processor, implements the data storage method disclosed above.
For the specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, which are not described herein again.
The embodiments in the present application are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may refer to one another. Since the apparatus disclosed in the embodiments corresponds to the disclosed method, its description is relatively brief; for the relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it should also be noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The data storage method, apparatus, device, and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present application, and the description of these examples is only intended to help understand the method and core idea of the present application. Meanwhile, a person skilled in the art may, according to the idea of the present application, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present application.

Claims (10)

1. A data storage method, comprising:
sending, through a client, a file uploading request to a name node in a distributed file system, and determining each data node through the name node;
initiating a connection request to each data node, and establishing a communication pipeline with each data node after receiving a response message in which that data node grants the connection;
fragmenting a big data file to obtain a plurality of data blocks, wherein each data block comprises a file identifier and a fragment offset of the big data file; and
distributing the plurality of data blocks to the data nodes in turn through the communication pipelines, and then distributing each data block that has not yet been distributed to a data node that has finished its current transmission, until all of the plurality of data blocks are stored in the corresponding data nodes, so that the data nodes complete the transmission of the plurality of data blocks.
2. The data storage method of claim 1, wherein the determining each data node through the name node comprises:
determining each data node through the name node based on a replica storage policy and rack awareness.
3. The data storage method of claim 2, wherein the initiating a connection request to each data node comprises:
when the number of data nodes does not exceed the number of replicas specified by the replica storage policy, initiating a connection request to each data node.
4. The data storage method of claim 2, wherein the initiating a connection request to each data node comprises:
when the number of data nodes is greater than the number of replicas specified in the replica storage policy, determining whether the replica setting in the replica storage policy is a default setting;
if the replica setting is the default setting, initiating connection requests to a corresponding number of data nodes based on the default number of replicas; and
if the replica setting is a user-defined setting, initiating a connection request to each data node.
5. The data storage method of claim 1, wherein after the distributing each data block that has not yet been distributed to a data node that has finished its current transmission, until all of the plurality of data blocks are stored in the corresponding data nodes, the method further comprises:
sending the data blocks received by each of the data nodes to the other data nodes through a preset interaction pipeline.
6. The data storage method of claim 5, wherein after the sending the data blocks received by each of the data nodes to the other data nodes through a preset interaction pipeline, the method further comprises:
determining, based on the file identifiers, whether the data blocks on any one of the data nodes belong to the same big data file; and
if they do, reassembling the data blocks on that data node into the corresponding big data file according to their fragment offsets.
7. The data storage method of any one of claims 1 to 6, further comprising:
when a target data node encounters an exception while a target data block is being transmitted, determining whether any data node other than the target data node exists; and
if one exists, selecting a new data node from the other data nodes, re-sending a connection request to the new data node, establishing a communication pipeline with the new data node after receiving a response message in which the new data node grants the connection, and distributing the target data block to the new data node through that communication pipeline.
8. A data storage device, comprising:
a data node determining module, configured to send, through a client, a file uploading request to a name node in a distributed file system, and to determine each data node through the name node;
a communication pipeline establishing module, configured to initiate a connection request to each data node, and to establish a communication pipeline with each data node after receiving a response message in which that data node grants the connection;
a big data file fragmentation module, configured to fragment a big data file to obtain a plurality of data blocks, wherein each data block comprises a file identifier and a fragment offset of the big data file; and
a data block transmission module, configured to distribute the plurality of data blocks to the data nodes in turn, in fragment order, through the communication pipelines, and then to distribute each data block that has not yet been distributed to a data node that has finished its current transmission, until all of the plurality of data blocks are stored in the corresponding data nodes, so that the data nodes complete the transmission of the plurality of data blocks.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the data storage method of any one of claims 1 to 7.
10. A computer-readable storage medium for storing a computer program; wherein the computer program when executed by a processor implements a data storage method as claimed in any one of claims 1 to 7.
CN202210832806.1A 2022-07-15 2022-07-15 Data storage method, device, equipment and medium Pending CN115118761A (en)

Publications (1)

Publication Number Publication Date
CN115118761A true CN115118761A (en) 2022-09-27

Family

ID=83331290

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102546782A (en) * 2011-12-28 2012-07-04 北京奇虎科技有限公司 Distribution system and data operation method thereof
CN103152395A (en) * 2013-02-05 2013-06-12 北京奇虎科技有限公司 Storage method and device of distributed file system
CN109376122A (en) * 2018-09-25 2019-02-22 深圳市元征科技股份有限公司 A kind of file management method, system and block chain node device and storage medium
CN110569213A (en) * 2018-05-18 2019-12-13 北京果仁宝软件技术有限责任公司 File access method, device and equipment
CN110825698A (en) * 2019-11-07 2020-02-21 重庆紫光华山智安科技有限公司 Metadata management method and related device
CN111182067A (en) * 2019-12-31 2020-05-19 上海焜耀网络科技有限公司 Data writing method and device based on interplanetary file system IPFS
CN111782623A (en) * 2020-05-21 2020-10-16 北京交通大学 File checking and repairing method in HDFS storage platform
CN113961946A (en) * 2021-09-01 2022-01-21 广州炒米信息科技有限公司 File processing method and device, storage medium and computer equipment
CN114328439A (en) * 2022-01-04 2022-04-12 中国建设银行股份有限公司 Data storage method and device, electronic equipment and storage medium
CN114553885A (en) * 2022-03-02 2022-05-27 上海弘玑信息技术有限公司 DHT network-based storage method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination