WO2020094064A1 - Performance optimization method, apparatus, device, and computer-readable storage medium


Info

Publication number
WO2020094064A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
node
data node
list
client
Application number
PCT/CN2019/116024
Other languages
English (en)
French (fr)
Inventors: Hu Xiaodong (胡晓东), Zhang Dongtao (张东涛), Xin Lihua (辛丽华)
Original Assignee: ZTE Corporation (中兴通讯股份有限公司)
Application filed by ZTE Corporation (中兴通讯股份有限公司)
Publication of WO2020094064A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10 File systems; File servers
    • G06F 16/18 File system types
    • G06F 16/182 Distributed file systems
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals

Definitions

  • The present disclosure relates to the field of communication technology, and in particular to a performance optimization method, apparatus, device, and computer-readable storage medium.
  • The Hadoop Distributed File System (HDFS) is a core component of Hadoop and is currently widely used in big data services.
  • HDFS is mainly responsible for storing file data in Hadoop.
  • the files on HDFS are stored in data blocks.
  • A data block is an abstract concept: a logical unit of file storage and processing.
  • a data block usually has multiple copies to increase data security. Multiple copies of a data block are usually stored in different data nodes, which may be in the same rack or in different racks.
  • When a client in HDFS wants to read a file, it usually reads the copy of the data block in the data node closest to it, for example the copy stored in a data node in the same rack. As a result, the client always accesses the data node closest to it.
  • Consequently, the data node closest to the client comes under too much pressure, while data nodes farther away are under less pressure, resulting in an uneven pressure distribution across HDFS.
  • As a result, the read performance of the entire HDFS is reduced.
  • The main purpose of the present disclosure is to provide a performance optimization method, apparatus, device, and computer-readable storage medium, aiming to solve the technical problem that, because the client always reads data from the data node closest to it in HDFS, the HDFS pressure distribution is uneven and the read performance of the entire HDFS is degraded.
  • The performance optimization method includes the steps of: after receiving a data read request sent by a client, acquiring the data node where the data block corresponding to the data read request is located; acquiring a preset sorting strategy corresponding to the data nodes, and sorting the data nodes according to the sorting strategy to obtain a data node list; and returning the data node list to the client, so that the client determines, according to the data node list, a data node that provides the read data block service.
  • The present disclosure also provides a performance optimization apparatus, which includes: an acquisition module, configured to, after receiving the data read request sent by the client, acquire the data node where the data block corresponding to the data read request is located, and to acquire the preset sorting strategy corresponding to the data nodes; a sorting module, configured to sort the data nodes according to the sorting strategy to obtain a data node list; and a data return module, configured to return the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read data block service.
  • The present disclosure also provides a performance optimization device, which includes a memory, a processor, and a performance optimization program stored in the memory and runnable on the processor; when the performance optimization program is executed by the processor, the steps of the performance optimization method described above are implemented.
  • The present disclosure also provides a computer-readable storage medium on which a performance optimization program is stored; when the performance optimization program is executed by a processor, the steps of the performance optimization method described above are implemented.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution of an embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram of the data reading process in HDFS according to an embodiment of the present disclosure;
  • FIG. 3 is a schematic flowchart of a preferred embodiment of the performance optimization method of the present disclosure.
  • FIG. 4 is a sorting diagram of sorting data nodes according to pressure according to an embodiment of the present disclosure
  • FIG. 5 is a sorting diagram of sorting data nodes according to a distance from a client according to an embodiment of the present disclosure
  • FIG. 6 is a sorting diagram of sorting data nodes according to distance and pressure from a client according to an embodiment of the present disclosure
  • FIG. 7 is a functional schematic block diagram of a preferred embodiment of the performance optimization device of the present disclosure.
  • The present disclosure provides the following solution: after receiving the data read request sent by the client, obtain the data node where the data block corresponding to the data read request is located; obtain the preset sorting strategy corresponding to the data nodes, and sort the data nodes according to the sorting strategy to obtain a data node list; and return the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read data block service. This prevents the client from always reading data blocks from the data node closest to it, reduces the pressure on that data node, and avoids the problems of uneven HDFS pressure distribution and reduced read performance of the entire HDFS.
  • FIG. 1 is a schematic structural diagram of a hardware operating environment involved in a solution of an embodiment of the present disclosure.
  • FIG. 1 is a schematic diagram of the hardware operating environment of the performance optimization device.
  • the performance optimization device in the embodiment of the present disclosure may be a PC, a server, such as a metadata server of HDFS, or a mobile terminal device such as a smart phone, tablet computer, or portable computer.
  • the performance optimization device may include: a processor 1001, such as a CPU, a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 may include a display screen (Display), an input unit such as a keyboard (Keyboard), and the optional user interface 1003 may further include a standard wired interface and a wireless interface.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • The memory 1005 may be a high-speed RAM memory, or a non-volatile memory such as a disk memory.
  • the memory 1005 may optionally be a storage device independent of the foregoing processor 1001.
  • the performance optimization device may further include a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and so on.
  • The structure of the performance optimization device shown in FIG. 1 does not constitute a limitation on the performance optimization device, which may include more or fewer components than illustrated, or combine certain components, or use a different component layout.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a performance optimization program.
  • The network interface 1004 is mainly used to connect other data nodes, name nodes, or clients; HDFS operation and maintenance personnel can trigger setting instructions through the user interface 1003; and the processor 1001 can be used to call the performance optimization program stored in the memory 1005 and perform the following operations: after receiving the data read request sent by the client, obtain the data node where the data block corresponding to the data read request is located; obtain the preset sorting strategy corresponding to the data nodes, and sort the data nodes according to the sorting strategy to obtain a data node list; and return the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read data block service.
  • The step of sorting the data nodes according to the sorting strategy to obtain a data node list may include: obtaining the pressure value corresponding to each data node; determining the pressure corresponding to each data node according to its pressure value; and sorting the data nodes in order of pressure from small to large to obtain the data node list.
  • The step of obtaining the pressure value corresponding to the data node may include: obtaining the pressure data of the data node; obtaining the pressure data scores of the data node according to the pressure data and preset pressure data score standards; and calculating the pressure value corresponding to the data node according to the pressure data scores and the corresponding preset pressure data weight values.
  • Alternatively, the step of sorting the data nodes according to the sorting strategy to obtain a data node list may include: sorting the data nodes in order of their distance from the client, from nearest to farthest, to obtain a pre-processed data node list; obtaining the pressure value corresponding to each data node, and detecting whether it satisfies a preset condition; and, when it is detected that the pressure value corresponding to a data node satisfies the preset condition, moving that data node to the end of the pre-processed data node list to obtain the processed data node list.
  • The step of returning the data node list to the client, so that the client determines the data node that provides the read data block service according to the data node list, may include: returning the data node list to the client, so that the client determines the data node ranked first in the data node list as the data node providing the read data block service.
  • The processor 1001 may also call the performance optimization program stored in the memory 1005 to perform the following operation: after receiving a setting request for setting the sorting strategy, set the sorting strategy corresponding to the data nodes according to the setting request.
  • The various embodiments below are described with the name node, i.e. the metadata server of HDFS, as the execution subject.
  • The name node is the HDFS metadata server, used to manage and coordinate the work of the data nodes. Its memory stores two types of metadata for the entire HDFS: (1) the namespace of the file system, namely the file directory tree, together with the data block index, that is, the list of data blocks corresponding to each file; and (2) the mapping between data blocks and data nodes, that is, on which data nodes each data block is stored.
  • the data node where each data block of each file is located can be obtained from the name node.
  • Each data node corresponds to a port number and IP (Internet Protocol) address. According to the port number or IP address, a data node can be uniquely identified.
  • Arabic numerals are used to name data nodes to distinguish different data nodes. For example, when the number of copies is 3, a data block may be stored in data node 1, data node 2, and data node 3; this mapping relationship is saved in the name node.
  • the data node is responsible for storing the actual file data block, which is called by the client and the name node, and at the same time, it will periodically send the stored data block information to the name node through the heartbeat.
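The periodic heartbeat described above can be sketched as follows. This is an illustrative sketch, not the actual HDFS implementation; all class and function names are hypothetical.

```python
import time


class NameNodeRegistry:
    """Receives periodic block reports from data nodes (illustrative sketch)."""

    def __init__(self):
        # data node name -> (list of stored block ids, time of last heartbeat)
        self.block_reports = {}

    def receive_heartbeat(self, node_name, block_ids):
        self.block_reports[node_name] = (list(block_ids), time.time())


def send_heartbeat(registry, node_name, stored_block_ids):
    """One heartbeat tick; a real data node would invoke this on a timer."""
    registry.receive_heartbeat(node_name, stored_block_ids)
```

A real data node would call `send_heartbeat` at a fixed interval so that the name node always has a recent view of which blocks each node stores.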
  • a node is usually a machine.
  • the machine that reads data or files is referred to as a client.
  • The client may be a data node, a name node, or another terminal or device such as a personal computer or smart phone. Therefore, the data nodes where the data block copies are located and the client may be in the same rack or in different racks, and a data node and the client may even be the same machine.
  • the process of data reading in HDFS is shown in Figure 2:
  • the client initiates a read data request to the name node.
  • the read data request may be a read file.
  • The name node finds the list of data blocks corresponding to the data to be read by the client according to the data block index, then finds the data nodes where each copy of each data block is located according to the mapping between data blocks and data nodes, and returns these data nodes to the client. As shown in Figure 2, the name node returns data nodes 1, 2, and 3, where the copies of the data block are located, to the client.
  • the client determines a data node that provides a read service for it, and sends a read data block request to the data node. As shown in FIG. 2, the client sends a read data block request to the data node 1.
  • After receiving the read data block request, data node 1 sends a copy of the data block stored on it to the client.
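The read flow described above can be sketched in Python as follows. This is an illustrative model, not the actual HDFS client protocol; all class and method names are hypothetical.

```python
class DataNode:
    """Stores copies of data blocks and serves read-block requests."""

    def __init__(self, name, blocks):
        self.name = name
        self.blocks = blocks                    # block id -> block contents

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds the two kinds of HDFS metadata described above."""

    def __init__(self, block_index, block_to_nodes):
        self.block_index = block_index          # file name -> ordered block ids
        self.block_to_nodes = block_to_nodes    # block id -> nodes holding a copy

    def handle_read_request(self, file_name):
        # Look up the file's blocks, then the data nodes holding each block.
        return [(b, self.block_to_nodes[b]) for b in self.block_index[file_name]]


def client_read(name_node, file_name):
    """Read a file: ask the name node for node lists, then fetch each block."""
    data = b""
    for block_id, nodes in name_node.handle_read_request(file_name):
        chosen = nodes[0]                       # pick one node from the list
        data += chosen.read_block(block_id)
    return data
```

The choice `nodes[0]` is exactly the decision point the disclosure targets: whichever node the name node places first in the returned list is the one that ends up serving the read.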
  • the performance optimization method includes:
  • Step S1: After receiving the data read request sent by the client, obtain the data node where the data block corresponding to the data read request is located.
  • the client initiates a data read request to the name node.
  • the name node obtains a list of data blocks corresponding to the data to be read by the client according to the data block index.
  • Suppose the data to be read by the client is divided into three data blocks for storage.
  • the obtained data block lists are data block 1, data block 2 and data block 3, and each data block has three copies.
  • the name node obtains the data node where each copy of each data block in the data block list is located according to the mapping between the data block and the data node. For example, three copies of data block 1 are stored on data nodes 1, 2, and 3, respectively. For convenience of description, in the following embodiments, description is made according to the number of data blocks being 1 and the number of data block copies being 3.
  • Step S2: Acquire a preset sorting strategy corresponding to the data nodes, and sort the data nodes according to the sorting strategy to obtain a data node list.
  • the name node is preset with a sorting strategy for sorting data nodes.
  • the sorting strategy may be a strategy for sorting according to the distance between the data node and the client, or a strategy for sorting according to the pressure of the data node.
  • The sorting strategy is obtained, and the data nodes are sorted according to it to obtain a data node list, i.e. the sorted sequence of data nodes. For example, if data nodes 1, 2, and 3 are sorted, the resulting data node list might be: data node 1, data node 3, data node 2.
  • Step S3: Return the data node list to the client, so that the client determines the data node that provides the read data block service according to the data node list.
  • the name node returns the list of data nodes sorted according to the sorting strategy to the client.
  • The client selects a data node from the data node list, determines it as the data node providing the read data block service, and sends a read data block request to it.
  • After the data node receives the read data block request, it sends the corresponding data block to the client.
  • The client may select the data node ranked first in the data node list, the data node ranked second, one of the first two, a randomly selected data node, and so on.
  • step S3 includes:
  • Step a: Return the data node list to the client, so that the client determines the data node ranked first in the data node list as the data node that provides the read data block service.
  • After receiving the data node list, the client selects the data node ranked first in the list, determines it as the data node providing the read data block service, and sends a read data block request to that data node to obtain the data block to be read.
  • Before step S1, the method further includes:
  • Step b: After receiving a setting request for setting the sorting strategy, set the sorting strategy corresponding to the data nodes according to the setting request.
  • a variety of sorting strategies are preset in the name node for HDFS operation and maintenance personnel to choose.
  • the operation and maintenance personnel can also set a new sorting strategy in the name node. That is, the operation and maintenance personnel can set different sorting strategies according to specific situations to cope with different HDFS operating environments.
  • The sorting strategy is set according to the setting request; thereafter, whenever the data nodes are to be sorted, they are sorted according to this newly set strategy.
  • the sorting strategy can be managed through the HDFS configuration file.
  • the operation and maintenance personnel can modify the configuration file in the name node or the specially set management node, such as modifying the sorting strategy of the data node in the configuration file, or setting a new sorting strategy.
  • the name node can obtain the sorting strategy from the HDFS configuration file.
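Reading the sorting strategy from an hdfs-site.xml-style configuration file could be sketched as follows. The property name `dfs.namenode.datanode.sort.strategy` is a hypothetical example, not a real HDFS setting.

```python
import xml.etree.ElementTree as ET

# Hypothetical property name; real HDFS does not define this setting.
STRATEGY_KEY = "dfs.namenode.datanode.sort.strategy"


def load_sort_strategy(conf_path, default="distance"):
    """Return the configured data node sorting strategy, or a default."""
    root = ET.parse(conf_path).getroot()
    for prop in root.iter("property"):
        if prop.findtext("name") == STRATEGY_KEY:
            return prop.findtext("value")
    return default
```

Managing the strategy through the configuration file lets operation and maintenance personnel change it without touching name node code, as the text above describes.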
  • In this embodiment, after the data read request is received, the data node where the data block corresponding to the data read request is located is obtained; after the preset sorting strategy corresponding to the data nodes is obtained, the data nodes are sorted according to the sorting strategy to obtain a data node list; and the data node list is returned to the client, so that the client determines the data node that provides the read data block service according to the data node list.
  • In this way, the data node closest to the client is no longer always determined as the data node providing the read data block service, which prevents the client from always reading data blocks from the data node closest to it, reduces the pressure on that data node, avoids an uneven distribution of HDFS pressure, and improves the read performance of the entire HDFS.
  • the second embodiment of the performance optimization method of the present disclosure provides a performance optimization method.
  • the step of sorting the data nodes according to the sorting strategy in step S2 to obtain a list of data nodes includes:
  • Step c: Obtain the pressure value corresponding to the data node.
  • After obtaining the data nodes where the data block is located, the name node first obtains the current pressure value of each data node.
  • the current pressure value of the data node can be calculated by the data node according to its current pressure data and the preset pressure value calculation method in the configuration file.
  • the name node can obtain its current pressure value from the data node.
  • the calculation method of the preset pressure value in the configuration file can be set by the operation and maintenance personnel in the name node or a specially set management node. For example, the pressure value can be obtained by directly adding each pressure data.
  • The pressure value of a data node can also be calculated by the name node, based on the current pressure data obtained from the data node and the preset pressure value calculation method in the configuration file; in this case, the name node first needs to obtain the current pressure data from the data node.
  • Pressure data includes, but is not limited to, disk IO rate (disk read and write rate), memory utilization rate, CPU (Central Processing Unit) usage rate, and network IO rate (network input and output rate).
  • A data node can monitor its pressure data in real time by setting up a monitoring process. For example, the monitoring process may observe that the current disk IO rate of the data node is 100 megabits per second, the memory usage rate is 20%, the CPU usage rate is 40%, and the network IO rate is 50 M per second.
  • the disk IO rate and network IO rate monitored by the monitoring process may also be in the form of a percentage, that is, the disk IO rate and network IO rate are converted into percentages, for example, the disk IO rate is 30%.
  • the data node may only add a monitoring process to monitor the pressure data after detecting that the sorting strategy of the data node in the configuration file is the first sorting strategy.
  • Step d: Determine the pressure corresponding to each data node according to its pressure value, sort the data nodes in order of pressure from small to large, and obtain the data node list.
  • The pressure of each data node can be determined from its pressure value, and the data nodes can then be arranged in order of pressure from small to large to obtain the data node list; the data node with the least pressure is then ranked at the top of the list. Two cases are possible: in one, the larger the pressure value, the greater the pressure on the data node; in the other, the larger the pressure value, the smaller the pressure. Which case applies depends on the calculation method used for the data node's pressure value.
  • In either case, the data nodes with less pressure are placed at the front of the data node list.
  • For example, when the larger the pressure value of a data node, the smaller its pressure, data node 2, with the highest pressure value, is ranked at the front of the data node list, and data node 1, with the lowest pressure value, is ranked at the end.
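Both conventions above can be handled by a single sorting helper. A minimal sketch, with illustrative names:

```python
def sort_by_pressure(nodes, pressure_values, larger_value_means_more_pressure=True):
    """Return the data nodes ordered so the least-pressured node comes first.

    pressure_values maps node name -> pressure value. The flag selects which
    of the two conventions described above applies.
    """
    return sorted(nodes,
                  key=lambda n: pressure_values[n],
                  reverse=not larger_value_means_more_pressure)
```

Under the "larger value means less pressure" convention, the node with the highest pressure value is ranked first, matching the example above.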
  • step c includes:
  • Step e: Obtain the pressure data of the data node.
  • the name node obtains the current pressure data of the data node from the data node.
  • Step f: Obtain the pressure data scores of the data node according to the pressure data and the preset pressure data score standards.
  • the configuration file is preset with the pressure data score standard of each pressure data of the data node.
  • the pressure data score standard reflects the mapping relationship between the pressure data and the pressure data score.
  • HDFS operation and maintenance personnel can set the pressure data score standards according to the specific situation. For example, the disk IO rate score standard can be set to that shown in Table 1, which reflects the mapping relationship between the disk IO rate and the disk IO rate score.
  • Table 2 is the CPU usage rate score standard, Table 3 is the memory usage rate score standard, and Table 4 is the network IO rate score standard. It should be understood that the pressure data score standards are not limited to the score standards shown in the tables.
  • Table 1. Disk IO rate score standard:
      Disk IO rate    Disk IO rate score
      0-10%           10
      11%-20%         9
      21%-30%         8
      31%-40%         7
      41%-50%         6
      51%-60%         5
      61%-70%         4
      71%-80%         3
      81%-90%         2
      91%-100%        1
  • Table 2. CPU usage rate score standard:
      CPU usage       CPU usage score
      0-10%           10
      11%-20%         9
      21%-30%         8
      31%-40%         7
      41%-50%         6
      51%-60%         5
      61%-70%         4
      71%-80%         3
      81%-90%         2
      91%-100%        1
  • Table 4. Network IO rate score standard:
      Network IO rate  Network IO rate score
      0-10%            10
      11%-20%          9
      21%-30%          8
      31%-40%          7
      41%-50%          6
      51%-60%          5
      61%-70%          4
      71%-80%          3
      81%-90%          2
      91%-100%         1
  • The name node obtains the pressure data score standards from the configuration file and compares each pressure data of the data node with the corresponding score standard to obtain each pressure data score. For example, when the current disk IO rate of the data node is 20%, the CPU usage rate is 30%, the memory usage rate is 40%, and the network IO rate is 20%, then according to the score criteria shown in Tables 1-4, the data node's disk IO rate score is 9, its CPU usage score is 8, its memory usage score is 7, and its network IO score is 9.
  • Step g: Calculate the pressure value corresponding to the data node according to the pressure data scores and the corresponding preset pressure data weight values.
  • the configuration file is preset with the weight value of each pressure data of the data node.
  • HDFS operation and maintenance personnel can set the weight value of each pressure data according to the specific situation; for example, the disk IO rate weight value can be set to 10, the CPU usage weight value to 5, the memory usage weight value to 5, and the network IO rate weight value to 8.
  • The pressure value corresponding to the data node is obtained by multiplying each pressure data score by its corresponding pressure data weight value and summing the products.
  • When the data node calculates its own pressure value, the calculation process is the same as the name node's calculation process described above.
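The score lookup and the weighted sum can be sketched as follows, using the score standards of Tables 1-4 and the example weight values above; helper names and the rate/weight keys are illustrative.

```python
import math


def score_from_rate(rate_percent):
    """Map a utilization percentage to a 1-10 score per the score standards
    above (0-10% -> 10, 11%-20% -> 9, ..., 91%-100% -> 1)."""
    if rate_percent <= 10:
        return 10
    return 11 - math.ceil(rate_percent / 10)


def pressure_value(rates, weights):
    """Multiply each pressure data score by its weight and sum the products.

    With these score tables, higher utilization yields a lower score, so a
    LARGER pressure value means a LESS pressured data node.
    """
    return sum(score_from_rate(rates[k]) * weights[k] for k in rates)
```

For the worked example above (disk 20%, CPU 30%, memory 40%, network 20%, with weights 10, 5, 5, and 8), this yields 9*10 + 8*5 + 7*5 + 9*8 = 237.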
  • In this way, the first data node in the data node list is the data node with the least pressure, which avoids always placing the data node closest to the client at the top of the list, prevents that nearest node's pressure from becoming too large and the HDFS pressure distribution from becoming uneven, and improves the read performance of the entire HDFS.
  • the third embodiment of the performance optimization method of the present disclosure provides a performance optimization method.
  • the step of sorting the data nodes according to the sorting strategy in step S2 to obtain a list of data nodes includes:
  • Step h: The data nodes are sorted in order of their distance from the client, from nearest to farthest, to obtain a pre-processed data node list.
  • When the data node and the client are on the same machine, the distance between them is the shortest; when the data node and the client are on different machines in the same rack, the distance is farther; and when the data node and the client are on machines in different racks, the distance is the farthest.
  • the distance between each data node where the data block is located and the client may be the same or different.
  • The name node sorts the data nodes according to their distance from the client, from nearest to farthest; two data nodes at the same distance from the client can be sorted in either order. This yields the pre-processed data node list. For example, the name node sorts data nodes 1, 2, and 3 in order of distance from the client, from nearest to farthest, to obtain the pre-processed data node list shown in FIG. 5.
  • Step i: Obtain the pressure value corresponding to each data node, and detect whether the pressure value satisfies the preset condition.
  • The process by which the name node acquires the pressure value corresponding to a data node is the same as the process described in step c of the second embodiment, and is not described in detail here.
  • After the name node obtains the pressure value corresponding to each data node, it traverses the pre-processed data node list to check whether the pressure value of each data node meets the preset condition.
  • The preset condition can be set according to the specific situation. For example, when a larger pressure value means greater pressure, the preset condition can be that the pressure value of the data node is greater than the preset pressure value; when a larger pressure value means smaller pressure, the preset condition can be that the pressure value of the data node is less than the preset pressure value.
  • the preset pressure value can be set according to specific conditions.
  • Step j: When it is detected that the pressure value corresponding to a data node satisfies the preset condition, move that data node to the end of the pre-processed data node list to obtain the processed data node list.
  • The data node whose pressure value satisfies the preset condition is moved to the end of the pre-processed data node list. After all the data nodes have been traversed, the processed, i.e. final, data node list is obtained. At this point, the data node ranked first is under relatively little pressure and is relatively close to the client. FIG. 6 shows the data node list obtained after moving data node 1, whose pressure value meets the preset condition, from the pre-processed data node list shown in FIG. 5 to the end.
  • In this embodiment, the data nodes are first sorted in order of distance from the client, from nearest to farthest, and then the data nodes satisfying the preset condition, that is, those whose pressure exceeds the preset pressure, are moved behind all the other data nodes. As a result, the data node at the top of the data node list is one with relatively low pressure that is relatively close to the client, which avoids always placing the data node closest to the client at the front and thus avoids the problem of excessive pressure on that nearest data node.
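Steps h through j can be sketched as follows; a larger pressure value is assumed to mean more pressure here, and all names are illustrative.

```python
def sort_distance_then_pressure(nodes, distances, pressures, pressure_threshold):
    """Sort nodes by distance to the client (nearest first), then move nodes
    whose pressure exceeds the threshold to the end, preserving order."""
    # Step h: pre-processed list, nearest node first.
    preprocessed = sorted(nodes, key=lambda n: distances[n])
    # Steps i and j: stable partition by the preset pressure condition.
    kept = [n for n in preprocessed if pressures[n] <= pressure_threshold]
    overloaded = [n for n in preprocessed if pressures[n] > pressure_threshold]
    return kept + overloaded
```

Mirroring the FIG. 5 / FIG. 6 example: if data node 1 is nearest but overloaded, it is moved behind data nodes 2 and 3, so the client is steered to a nearby node that still has capacity.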
  • the fourth embodiment of the performance optimization method of the present disclosure provides a performance optimization method.
  • the step of sorting the data nodes according to the sorting strategy in step S2 to obtain a list of data nodes includes:
  • Step k: Randomly sort the data nodes to obtain a data node list.
  • the name node randomly sorts the data nodes to obtain a list of data nodes.
  • the random sorting method can be any method that can randomly sort the data.
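The random strategy reduces to a shuffle. A minimal sketch (the seed parameter is added here only for reproducibility; any unbiased shuffle will do):

```python
import random


def random_sort(nodes, seed=None):
    """Return the data nodes in random order."""
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return shuffled
```

Over many requests, a random order spreads read load roughly evenly across all data nodes holding a copy, at the cost of ignoring both distance and current pressure.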
  • the present disclosure also provides a performance optimization apparatus.
  • The performance optimization apparatus includes: an acquisition module 10, configured to, after receiving the data read request sent by the client, acquire the data node where the data block corresponding to the data read request is located, and to acquire the preset sorting strategy corresponding to the data nodes; a sorting module 20, configured to sort the data nodes according to the sorting strategy to obtain a data node list; and a data return module 30, configured to return the data node list to the client, so that the client determines the data node that provides the read data block service according to the data node list.
  • When the sorting strategy is the first sorting strategy, the sorting module 20 includes: a first acquiring unit, configured to acquire the pressure value corresponding to each data node; and a first sorting unit, configured to determine the pressure of each data node from its pressure value and to sort the data nodes in ascending order of pressure to obtain a data node list.
  • The first acquiring unit further includes: an acquiring subunit, configured to acquire the pressure data of the data node; and a calculating subunit, configured to obtain the pressure data scores of the data node from the pressure data and preset pressure data score criteria, and to calculate the pressure value of the data node from the pressure data scores and the corresponding preset pressure data weight values.
  • When the sorting strategy is the second sorting strategy, the sorting module 20 further includes: a second sorting unit, configured to sort the data nodes in order of increasing distance from the client to obtain a pre-processed data node list; a second acquiring unit, configured to acquire the pressure value corresponding to each data node; and a detection unit, configured to detect whether the pressure value of a data node satisfies the preset condition. The second sorting unit is also configured to move, when a data node's pressure value is detected to satisfy the preset condition, that data node to the end of the pre-processed data node list, yielding the processed data node list.
  • When the sorting strategy is the third sorting strategy, the sorting module 20 further includes a third sorting unit, configured to randomly sort the data nodes to obtain a data node list.
  • The data return module 30 is further configured to return the data node list to the client, so that the client determines the data node ranked first in the data node list as the data node that provides the read-data-block service.
  • The performance optimization apparatus further includes a setting module, configured to set, after receiving a setting request for setting the sorting strategy, the sorting strategy corresponding to the data nodes according to the setting request.
  • An embodiment of the present disclosure also proposes a computer-readable storage medium on which a performance optimization program is stored, where the performance optimization program, when executed by a processor, implements the steps of the performance optimization method described above.
  • The methods in the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • The technical solution of the present disclosure, in essence or in the part that makes a contribution in some situations, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present disclosure.
  • After receiving a data read request sent by the client, the present disclosure acquires the data nodes where the data blocks corresponding to the data read request are located and the preset sorting strategy corresponding to the data nodes, sorts the data nodes according to the sorting strategy to obtain a data node list, and returns the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service.
  • The data node closest to the client is thus no longer always chosen as the data node providing the read-data-block service. This prevents the client from always reading data blocks from its closest data node, reduces the pressure on that node, avoids an uneven HDFS pressure distribution, and improves the read performance of the whole HDFS.


Abstract

A performance optimization method, apparatus, device, and computer-readable storage medium. The method comprises: after receiving a data read request sent by a client, acquiring the data nodes where the data blocks corresponding to the data read request are located (S1); acquiring a preset sorting strategy corresponding to the data nodes, and sorting the data nodes according to the sorting strategy to obtain a data node list (S2); and returning the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service (S3).

Description

Performance optimization method, apparatus, device, and computer-readable storage medium
This disclosure claims priority to Chinese patent application CN201811323508.X, filed on November 7, 2018 and entitled "Performance optimization method, apparatus, device, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of communication technologies, and in particular to a performance optimization method, apparatus, device, and computer-readable storage medium.
Background
Hadoop is an open-source distributed computing platform, and the Hadoop Distributed File System (HDFS) is a core component of Hadoop, now widely used by big data services. Within Hadoop, HDFS is mainly responsible for storing file data. Files on HDFS are stored as data blocks. A data block is an abstract concept: it is the logical unit of file storage and processing. A data block usually has multiple replicas to improve data safety, and the replicas of a block are usually stored on different data nodes, which may be in the same rack or in different racks. When a client in HDFS wants to read a file, it usually prefers the data block replica on the data node closest to it, for example a replica on a data node in the same rack. The client therefore always accesses the data node closest to it: that node comes under excessive pressure while more distant data nodes are under-loaded, so the HDFS pressure distribution becomes uneven and the read performance of the whole HDFS drops.
Summary
The main purpose of the present disclosure is to provide a performance optimization method, device, and computer-readable storage medium, aiming to solve the technical problem that, in HDFS, the client always reads data from the data node closest to it, leading to an uneven HDFS pressure distribution and degraded read performance of the whole HDFS.
To achieve the above purpose, the present disclosure provides a performance optimization method including the following steps: after receiving a data read request sent by a client, acquiring the data nodes where the data blocks corresponding to the data read request are located; acquiring the preset sorting strategy corresponding to the data nodes, and sorting the data nodes according to the sorting strategy to obtain a data node list; and returning the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service.
In addition, to achieve the above purpose, the present disclosure further provides a performance optimization apparatus, including: an acquisition module, configured to acquire, after receiving a data read request sent by a client, the data nodes where the data blocks corresponding to the data read request are located, and to acquire the preset sorting strategy corresponding to the data nodes; a sorting module, configured to sort the data nodes according to the sorting strategy to obtain a data node list; and a data return module, configured to return the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service.
In addition, to achieve the above purpose, the present disclosure further provides a performance optimization device, including a memory, a processor, and a performance optimization program stored in the memory and executable on the processor, where the performance optimization program, when executed by the processor, implements the steps of the performance optimization method described above.
In addition, to achieve the above purpose, the present disclosure further provides a computer-readable storage medium on which a performance optimization program is stored, where the performance optimization program, when executed by a processor, implements the steps of the performance optimization method described above.
Brief Description of the Drawings
FIG. 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the present disclosure;
FIG. 2 is a flowchart of a data read in HDFS involved in embodiments of the present disclosure;
FIG. 3 is a schematic flowchart of a preferred embodiment of the performance optimization method of the present disclosure;
FIG. 4 is a diagram of data nodes sorted by pressure, involved in embodiments of the present disclosure;
FIG. 5 is a diagram of data nodes sorted by distance to the client, involved in embodiments of the present disclosure;
FIG. 6 is a diagram of data nodes sorted by distance to the client and by pressure, involved in embodiments of the present disclosure;
FIG. 7 is a functional module diagram of a preferred embodiment of the performance optimization apparatus of the present disclosure.
The realization of the purpose, functional features, and advantages of the present disclosure will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are intended only to explain the present disclosure and are not intended to limit it.
Given the existing technical problem that in HDFS the client always reads data from its closest data node, leading to an uneven HDFS pressure distribution and degraded read performance of the whole HDFS, the present disclosure provides a solution: after receiving a data read request sent by a client, the data nodes where the data blocks corresponding to the data read request are located are acquired; after the preset sorting strategy corresponding to the data nodes is acquired, the data nodes are sorted according to that strategy to obtain a data node list; and the data node list is returned to the client, so that the client determines, according to the list, the data node that provides the read-data-block service. This prevents the client from always reading data blocks from its closest data node, reduces the pressure on the closest data node, and avoids an uneven HDFS pressure distribution and degraded read performance of the whole HDFS.
The present disclosure provides a performance optimization device. Referring to FIG. 1, FIG. 1 is a schematic structural diagram of the hardware operating environment involved in embodiments of the present disclosure.
It should be noted that FIG. 1 is the schematic structural diagram of the hardware operating environment of the performance optimization device. In the embodiments of the present disclosure, the performance optimization device may be a PC or a server, such as an HDFS metadata server, or a mobile terminal device such as a smartphone, a tablet, or a portable computer.
As shown in FIG. 1, the performance optimization device may include: a processor 1001, such as a CPU; a network interface 1004; a user interface 1003; a memory 1005; and a communication bus 1002. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and optionally standard wired and wireless interfaces. The network interface 1004 may optionally include standard wired and wireless interfaces (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a stable non-volatile memory such as a disk memory, and may optionally be a storage device independent of the aforementioned processor 1001.
In one embodiment, the performance optimization device may further include a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and so on. Those skilled in the art will understand that the device structure shown in FIG. 1 does not constitute a limitation on the performance optimization device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 1, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a performance optimization program.
In the performance optimization device shown in FIG. 1, the network interface 1004 is mainly used to connect to other data nodes, name nodes, or clients; HDFS operations staff can trigger setting instructions through the user interface 1003; and the processor 1001 can be used to call the performance optimization program stored in the memory 1005 and perform the following operations: after receiving a data read request sent by a client, acquiring the data nodes where the data blocks corresponding to the data read request are located; acquiring the preset sorting strategy corresponding to the data nodes, and sorting the data nodes according to the sorting strategy to obtain a data node list; and returning the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service.
In one embodiment, when the sorting strategy is the first sorting strategy, the step of sorting the data nodes according to the sorting strategy to obtain a data node list includes: acquiring the pressure value corresponding to each data node; and determining the pressure of each data node according to its pressure value, and sorting the data nodes in ascending order of pressure to obtain a data node list.
In one embodiment, the step of acquiring the pressure value corresponding to the data node includes: acquiring the pressure data of the data node; obtaining the pressure data scores of the data node according to the pressure data and preset pressure data score criteria; and calculating the pressure value of the data node according to the pressure data scores and the corresponding preset pressure data weight values.
In one embodiment, when the sorting strategy is the second sorting strategy, the step of sorting the data nodes according to the sorting strategy to obtain a data node list includes: sorting the data nodes in order of increasing distance from the client to obtain a pre-processed data node list; acquiring the pressure value corresponding to each data node and detecting whether it satisfies a preset condition; and, when a data node's pressure value is detected to satisfy the preset condition, moving that data node to the end of the pre-processed data node list to obtain the processed data node list.
In one embodiment, the step of returning the data node list to the client, so that the client determines the data node that provides the read-data-block service according to the data node list, includes: returning the data node list to the client, so that the client determines the data node ranked first in the data node list as the data node that provides the read-data-block service.
In one embodiment, before the step of acquiring, after receiving a data read request sent by the client, the data nodes where the data blocks corresponding to the data read request are located, the processor 1001 may call the performance optimization program stored in the memory 1005 and further perform the following operation: after receiving a setting request for setting the sorting strategy, setting the sorting strategy corresponding to the data nodes according to the setting request.
Based on the above hardware structure, various embodiments of the performance optimization method of the present disclosure are proposed. For ease of description, the embodiments are described with the name node, i.e. the HDFS metadata server, as the executing entity. The HDFS architecture mainly has two kinds of nodes: name nodes and data nodes. The name node is the metadata server of HDFS and manages and coordinates the work of the data nodes. Its memory holds two kinds of metadata for the whole HDFS: (1) the namespace of the file system, i.e. the file directory tree, together with the data block index of each file, i.e. the list of data blocks corresponding to each file; and (2) the mapping from data blocks to data nodes, i.e. which data node each data block is stored on. From the name node, the data nodes holding each data block of each file can therefore be obtained. Each data node corresponds to a port number and an IP (Internet Protocol) address, by which it can be uniquely identified. For convenience, the following embodiments name data nodes with Arabic numerals to distinguish them; for example, when the replica count is 3, a data block may be stored on data node 1, data node 2, and data node 3, and this mapping is kept in the name node. Data nodes store the actual file data blocks and are called by clients and the name node; they also periodically report the data blocks they store to the name node via heartbeats. It should be noted that in HDFS a node is usually one machine. In the embodiments of the present disclosure, any machine that wants to read data or files is called a client; the client may be a data node, a name node, or another terminal or device such as a personal computer or smartphone. Therefore, the data nodes holding the block replicas and the client, as well as the data nodes themselves, may be in the same rack or in different racks, and a data node and the client may even be the same machine. The data read flow in HDFS is shown in FIG. 2:
1. The client sends a data read request to the name node; the request may be for reading a file.
2. The name node finds, from the data block index, the list of data blocks corresponding to the data the client wants to read; then, from the block-to-node mapping, it finds the data nodes holding each replica of each block and returns these data nodes to the client. As shown in FIG. 2, the name node returns data nodes 1, 2, and 3, which hold the block replicas, to the client.
3. From the data nodes returned by the name node, the client determines one data node to serve the read and sends a read-data-block request to it. As shown in FIG. 2, the client sends the read-data-block request to data node 1.
4. After receiving the read-data-block request, data node 1 sends the block replica stored on it to the client.
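The four-step read flow above can be sketched as a minimal in-process exchange between a client, the name node, and the data nodes. This is an illustrative sketch only: the class and method names (`NameNode`, `DataNode`, `read_file`, `read_block`) are hypothetical, and a real HDFS client talks to the nodes over RPC rather than through direct method calls.

```python
class NameNode:
    """Holds the two kinds of metadata described above."""
    def __init__(self, block_index, block_locations):
        self.block_index = block_index          # file name -> list of block ids
        self.block_locations = block_locations  # block id -> data nodes with replicas

    def get_block_locations(self, filename):
        # Step 2: resolve the file into blocks, then blocks into data nodes.
        return {b: self.block_locations[b] for b in self.block_index[filename]}

class DataNode:
    def __init__(self, blocks):
        self.blocks = blocks                    # block id -> bytes

    def read_block(self, block_id):
        # Step 4: return the locally stored replica of the block.
        return self.blocks[block_id]

class Client:
    def read_file(self, name_node, data_nodes, filename):
        data = b""
        # Step 1: ask the name node where the file's blocks live.
        for block, replicas in name_node.get_block_locations(filename).items():
            chosen = replicas[0]                # Step 3: pick a serving data node
            data += data_nodes[chosen].read_block(block)
        return data
```

In this sketch the client simply takes the first replica in each returned list, which is exactly why the order chosen by the name node (the subject of this disclosure) matters.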
Referring to FIG. 3, a preferred embodiment of the performance optimization method of the present disclosure provides a performance optimization method. It should be noted that although a logical order is shown in the flowchart, in some cases the steps shown or described may be performed in an order different from the one here. The performance optimization method includes:
Step S1: after receiving a data read request sent by a client, acquire the data nodes where the data blocks corresponding to the data read request are located.
The client sends a data read request to the name node. After receiving the request, the name node obtains, from the data block index, the list of data blocks corresponding to the data the client wants to read. For example, if the data is stored in three blocks, the obtained block list is data block 1, data block 2, and data block 3, each with three replicas. The name node then obtains, from the block-to-node mapping, the data nodes holding each replica of each block in the list; for example, the three replicas of data block 1 are stored on data nodes 1, 2, and 3. For ease of description, the following embodiments assume one data block with three replicas.
Step S2: acquire the preset sorting strategy corresponding to the data nodes, and sort the data nodes according to the sorting strategy to obtain a data node list.
A sorting strategy for the data nodes is preset in the name node. It may be a strategy that sorts by the distance between each data node and the client, a strategy that sorts by the pressure of each data node, and so on. After the name node has obtained the data nodes holding the blocks, it obtains the sorting strategy and sorts the data nodes accordingly to obtain a data node list. It will be understood that the list formed by the sorted data nodes is the data node list; for example, sorting data nodes 1, 2, and 3 may yield the list: data node 1, data node 3, data node 2.
Step S3: return the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service.
The name node returns the data node list sorted by the strategy to the client. After receiving the list, the client selects one data node from it, determines it as the data node providing the read-data-block service, and sends a read-data-block request to it. After receiving the request, that data node sends the corresponding data block to the client. It should be noted that the client may select the data node ranked first in the list, the one ranked second, prefer the top two, select one at random, and so on.
In one embodiment, so that the client can quickly determine the data node from which to read the block and its computation load is reduced, step S3 includes:
Step a: return the data node list to the client, so that the client determines the data node ranked first in the data node list as the data node that provides the read-data-block service.
After receiving the data node list, the client selects the data node ranked first in the list, determines it as the data node providing the read-data-block service, and sends a read-data-block request to it to obtain the block to be read.
In one embodiment, before step S1, the method further includes:
Step b: after receiving a setting request for setting the sorting strategy, set the sorting strategy corresponding to the data nodes according to the setting request.
Multiple sorting strategies are preset in the name node for HDFS operations staff to choose from, and staff can also set new sorting strategies in the name node. That is, staff can set different sorting strategies for different HDFS operating environments as circumstances require. After the name node receives a setting request for a sorting strategy, it sets the sorting strategy according to the request; thereafter, whenever data nodes are to be sorted, they are sorted according to the strategy set by that request.
In one embodiment, the sorting strategies can be managed through an HDFS configuration file. Operations staff can modify the configuration file in the name node or in a dedicated management node, for example changing the sorting strategy for the data nodes or adding a new strategy. Once modified, the configuration file is synchronized to every name node and every data node of HDFS. The name node can obtain the sorting strategy from the HDFS configuration file.
In this embodiment, after receiving a data read request sent by the client, the data nodes where the corresponding data blocks are located are acquired; after the preset sorting strategy corresponding to the data nodes is acquired, the data nodes are sorted according to the strategy to obtain a data node list; and the data node list is returned to the client, so that the client determines the data node providing the read-data-block service according to the list. Because the client determines the serving data node from a list sorted by the strategy, the data node closest to the client is no longer always chosen as the serving node. This prevents the client from always reading blocks from its closest data node, reduces the pressure on that node, avoids an uneven HDFS pressure distribution, and improves the read performance of the whole HDFS.
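The flow of steps S1-S3 on the name-node side can be sketched as follows. The function name `handle_read_request` and the idea of passing the strategy as a callable are assumptions for illustration; the patent only requires that the strategy be configurable.

```python
# Sketch of steps S1-S3 on the name-node side (illustrative names only).
def handle_read_request(block_to_nodes, sort_strategy):
    """S1: block_to_nodes already maps each requested block to the data
    nodes holding its replicas. S2: sort each replica list with the
    configured strategy. S3: return the sorted lists for the client,
    which typically reads from the first node in each list."""
    return {block: sort_strategy(nodes) for block, nodes in block_to_nodes.items()}
```

For example, with the first sorting strategy the callable would order nodes by ascending pressure, so the least-loaded replica holder comes first in each list.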
In one embodiment, based on the first embodiment above, a second embodiment of the performance optimization method of the present disclosure provides a performance optimization method. In this embodiment, when the sorting strategy obtained by the name node is the first sorting strategy, the step in step S2 of sorting the data nodes according to the sorting strategy to obtain a data node list includes:
Step c: acquire the pressure value corresponding to each data node.
After obtaining the data nodes holding the blocks, the name node first acquires the current pressure value of each data node. The current pressure value of a data node can be computed by the data node itself from its current pressure data and the pressure value calculation method preset in the configuration file; in that case the name node simply fetches the current pressure value from the data node. The calculation method preset in the configuration file can be set by operations staff in the name node or in a dedicated management node, for example simply summing the individual pressure data to obtain the pressure value.
The pressure value of a data node can also be computed by the name node from the data node's current pressure data and the calculation method preset in the configuration file; in that case the name node first needs to fetch the current pressure data from the data node.
The pressure data includes, but is not limited to, the disk IO rate (disk read/write rate), memory usage, CPU (Central Processing Unit) usage, and network IO rate (network input/output rate). A data node can monitor its pressure data in real time through a monitoring process; for example, the monitoring process may observe that the node's current disk IO rate is 100 MB per second, memory usage 20%, CPU usage 40%, and network IO rate 50 MB per second. The disk IO rate and network IO rate observed by the monitoring process may also be expressed as percentages, e.g. a disk IO rate of 30%.
It should be noted that a data node may start the monitoring process for its pressure data only after detecting that the sorting strategy for data nodes in the configuration file is the first sorting strategy.
Step d: determine the pressure of each data node according to its pressure value, and sort the data nodes in ascending order of pressure to obtain a data node list.
Once the pressure value of each data node has been obtained, since the pressure value indicates the node's pressure, the pressure of each data node can be determined from its pressure value, and the nodes are arranged in ascending order of pressure to obtain the data node list; the least-loaded data node is then at the front of the list. Two cases are possible: a larger pressure value may mean greater pressure, or a larger pressure value may mean smaller pressure, depending on the calculation method used to compute the pressure value. In either case, the less-loaded data nodes are placed at the front of the list. As shown in FIG. 4, when a larger pressure value means smaller pressure, data node 2, which has the largest pressure value, is placed at the front of the data node list, and data node 1, which has the smallest pressure value, at the end.
In one embodiment, step c includes:
Step e: acquire the pressure data of the data node.
The name node fetches the data node's current pressure data from the data node.
Step f: obtain the pressure data scores of the data node according to the pressure data and the preset pressure data score criteria.
Score criteria for each kind of pressure data are preset in the configuration file; a score criterion defines the mapping between a pressure datum and its score. When setting the sorting strategy, HDFS operations staff can set the score criteria as circumstances require. For example, the disk IO rate score criterion may be set as shown in Table 1, which maps disk IO rates to disk IO rate scores; similarly, Table 2 is the CPU usage score criterion, Table 3 the memory usage score criterion, and Table 4 the network IO rate score criterion. It should be understood that the score criteria are not limited to those shown in the tables.
Disk IO rate    Disk IO rate score
0-10%           10
11%-20%         9
21%-30%         8
31%-40%         7
41%-50%         6
51%-60%         5
61%-70%         4
71%-80%         3
81%-90%         2
91%-100%        1
Table 1
CPU usage       CPU usage score
0-10%           10
11%-20%         9
21%-30%         8
31%-40%         7
41%-50%         6
51%-60%         5
61%-70%         4
71%-80%         3
81%-90%         2
91%-100%        1
Table 2
Memory usage    Memory usage score
0-10%           10
11%-20%         9
21%-30%         8
31%-40%         7
41%-50%         6
51%-60%         5
61%-70%         4
71%-80%         3
81%-90%         2
91%-100%        1
Table 3
Network IO rate Network IO rate score
0-10%           10
11%-20%         9
21%-30%         8
31%-40%         7
41%-50%         6
51%-60%         5
61%-70%         4
71%-80%         3
81%-90%         2
91%-100%        1
Table 4
The name node obtains the score criteria from the configuration file and compares each pressure datum of the data node against the corresponding criterion to obtain each pressure data score. For example, if the data node's current disk IO rate is 20%, CPU usage 30%, memory usage 40%, and network IO 20%, then according to the criteria in Tables 1-4 the node's disk IO rate score is 9, CPU usage score 8, memory usage score 7, and network IO score 9.
Step g: calculate the pressure value of the data node according to the pressure data scores and the corresponding preset pressure data weight values.
Weight values for each kind of pressure data are preset in the configuration file. When setting the sorting strategy, HDFS operations staff can set the weight values as circumstances require, for example a disk IO rate weight of 10, a CPU usage weight of 5, a memory usage weight of 5, and a network IO rate weight of 8.
After obtaining the weight values from the configuration file, the name node multiplies each pressure data score by its corresponding weight and sums the products to obtain the node's pressure value. With the scores and weights from the example above, the node's pressure value is 9*10 + 8*5 + 7*5 + 9*8 = 237.
It should be noted that when the data node computes its own current pressure value, the calculation process is the same as the name node's calculation described above.
In this embodiment, sorting the data nodes in ascending order of pressure makes the first data node in the list the least-loaded one, which avoids always placing the data node closest to the client at the front and thereby overloading it, avoids an uneven HDFS pressure distribution, and improves the read performance of the whole HDFS.
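Steps e-g can be sketched in Python, assuming the band-to-score mapping of Tables 1-4 and the example weights from the text (disk IO 10, CPU 5, memory 5, network IO 8). All function names here are illustrative; the real criteria and weights live in the HDFS configuration file.

```python
def score(utilization_pct):
    """Map a utilization percentage to a score per Tables 1-4:
    0-10% -> 10, 11-20% -> 9, ..., 91-100% -> 1."""
    if utilization_pct <= 10:
        return 10
    return 10 - (int(utilization_pct) - 1) // 10

# Example weights from the text; operators may choose different values.
WEIGHTS = {"disk_io": 10, "cpu": 5, "memory": 5, "net_io": 8}

def pressure_value(metrics):
    """Weighted sum of per-metric scores (step g). With these tables a
    higher score means lower utilization, so a LARGER pressure value
    means a LESS loaded node, matching the FIG. 4 convention."""
    return sum(score(metrics[m]) * w for m, w in WEIGHTS.items())

def sort_by_pressure(nodes, metrics_by_node):
    """Step d: ascending pressure, which here is descending pressure value."""
    return sorted(nodes, key=lambda n: pressure_value(metrics_by_node[n]),
                  reverse=True)
```

With the worked example above (disk 20%, CPU 30%, memory 40%, network 20%), `pressure_value` returns 9*10 + 8*5 + 7*5 + 9*8 = 237.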
In one embodiment, based on the first or second embodiment above, a third embodiment of the performance optimization method of the present disclosure provides a performance optimization method. In this embodiment, when the sorting strategy obtained by the name node is the second sorting strategy, the step in step S2 of sorting the data nodes according to the sorting strategy to obtain a data node list includes:
Step h: sort the data nodes in order of increasing distance from the client to obtain a pre-processed data node list.
When a data node and the client are the same machine, the distance between them is smallest; when they are different machines in the same rack, the distance is larger; and when they are in different racks, the distance is larger still. The data nodes holding a block may or may not be at the same distance from the client. After obtaining the data nodes holding the blocks, the name node sorts them in order of increasing distance from the client; two data nodes at the same distance from the client may be ordered arbitrarily. This yields the pre-processed data node list. For example, the name node sorts data nodes 1, 2, and 3 by increasing distance from the client to obtain the pre-processed data node list shown in FIG. 5.
Step i: acquire the pressure value corresponding to each data node, and detect whether the pressure value satisfies a preset condition.
The process by which the name node acquires the pressure values is the same as in step c of the second embodiment and is not repeated here. After obtaining the pressure values, the name node traverses the pre-processed data node list and checks whether each node's pressure value satisfies the preset condition. The preset condition can be set as circumstances require: when a larger pressure value means greater pressure, the condition may be that the node's pressure value exceeds a preset pressure value; when a larger pressure value means smaller pressure, the condition may be that the node's pressure value is below the preset pressure value. The preset pressure value can likewise be set as circumstances require.
Step j: when a data node's pressure value is detected to satisfy the preset condition, move that data node to the end of the pre-processed data node list to obtain the processed data node list.
When a data node's pressure value is detected to satisfy the preset condition, that node is moved to the end of the pre-processed list. After all data nodes have been traversed, the processed, i.e. final, data node list is obtained. At this point, the data node ranked first is under relatively low pressure and relatively close to the client. FIG. 6 shows the data node list obtained by moving data node 1, whose pressure value meets the preset condition in the pre-processed list of FIG. 5, to the end.
It should be noted that, as shown in FIG. 5, if the name node returned the distance-sorted pre-processed list to the client and always sorted by distance, data node 1 would often be ranked first. Since the client prefers the first node in the list, the frequently accessed data node 1 would become overloaded while data nodes 2 and 3 remained under-loaded, which could cause an uneven HDFS pressure distribution. Therefore, in this embodiment, the data nodes are first sorted by increasing distance from the client, and the nodes satisfying the preset condition, i.e. whose pressure exceeds the preset pressure, are then placed after all the other nodes, so that the node ranked first in the list is under relatively low pressure and relatively close to the client. This avoids always placing the client's closest data node first and thereby overloading it.
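The second sorting strategy (steps h-j) can be sketched as follows. The function name is an assumption, and so is the comparison direction: with the scoring scheme of the second embodiment a larger pressure value means a less-loaded node, so "pressure value satisfies the preset condition" (overloaded) is modeled here as the value falling below a threshold; the direction is configurable in practice.

```python
# Sketch of the second sorting strategy: distance first, then demote
# overloaded nodes to the end of the list.
def sort_by_distance_then_pressure(nodes, distance, pressure, threshold):
    # Step h: pre-processed list, ordered by increasing distance to the client.
    pre = sorted(nodes, key=lambda n: distance[n])
    # Steps i-j: traverse the list and move nodes meeting the condition
    # (here: pressure value below the threshold, i.e. overloaded) to the
    # end, keeping the relative distance order within each group.
    ok = [n for n in pre if pressure[n] >= threshold]
    overloaded = [n for n in pre if pressure[n] < threshold]
    return ok + overloaded
```

This mirrors the FIG. 5 to FIG. 6 example: the nearest node (data node 1) is demoted to the end when it is overloaded, so the first-ranked node is both lightly loaded and relatively close.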
In one embodiment, based on the first, second, or third embodiment above, a fourth embodiment of the performance optimization method of the present disclosure provides a performance optimization method. In this embodiment, when the sorting strategy obtained by the name node is the third sorting strategy, the step in step S2 of sorting the data nodes according to the sorting strategy to obtain a data node list includes:
Step k: randomly sort the data nodes to obtain a data node list.
After obtaining the data nodes holding the blocks, the name node sorts them randomly to obtain the data node list. The random sorting method can be any method capable of ordering data randomly. When the data nodes holding a block are all in the same rack, i.e. all at the same distance from the client, that distance need not be considered; sorting the nodes with the random strategy then gives each data node the same probability of being accessed by the client, which avoids the uneven HDFS pressure distribution caused by any single data node being overloaded.
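The third sorting strategy is a plain uniform shuffle. A minimal sketch using Python's standard library (the function name is hypothetical):

```python
import random

def random_order(nodes, rng=random):
    """Third sorting strategy: a uniform random permutation, so each
    replica holder is equally likely to be ranked first in the list."""
    shuffled = list(nodes)   # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    return shuffled
```

Because every permutation is equally likely, over many requests each replica holder is chosen first about 1/n of the time, spreading the read load evenly across the n nodes.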
In addition, referring to FIG. 7, the present disclosure also provides a performance optimization apparatus, including: an acquisition module 10, configured to acquire, after receiving a data read request sent by a client, the data nodes where the data blocks corresponding to the data read request are located, and to acquire the preset sorting strategy corresponding to the data nodes; a sorting module 20, configured to sort the data nodes according to the sorting strategy to obtain a data node list; and a data return module 30, configured to return the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service.
In one embodiment, when the sorting strategy is the first sorting strategy, the sorting module 20 includes: a first acquiring unit, configured to acquire the pressure value corresponding to each data node; and a first sorting unit, configured to determine the pressure of each data node from its pressure value and to sort the data nodes in ascending order of pressure to obtain a data node list.
In one embodiment, the first acquiring unit further includes: an acquiring subunit, configured to acquire the pressure data of the data node; and a calculating subunit, configured to obtain the pressure data scores of the data node from the pressure data and the preset pressure data score criteria, and to calculate the pressure value of the data node from the pressure data scores and the corresponding preset pressure data weight values.
In one embodiment, when the sorting strategy is the second sorting strategy, the sorting module 20 further includes: a second sorting unit, configured to sort the data nodes in order of increasing distance from the client to obtain a pre-processed data node list; a second acquiring unit, configured to acquire the pressure value corresponding to each data node; and a detection unit, configured to detect whether a data node's pressure value satisfies the preset condition. The second sorting unit is also configured to move, when a data node's pressure value is detected to satisfy the preset condition, that data node to the end of the pre-processed data node list to obtain the processed data node list.
In one embodiment, when the sorting strategy is the third sorting strategy, the sorting module 20 further includes a third sorting unit, configured to randomly sort the data nodes to obtain a data node list.
In one embodiment, the data return module 30 is further configured to return the data node list to the client, so that the client determines the data node ranked first in the data node list as the data node providing the read-data-block service.
In one embodiment, the performance optimization apparatus further includes a setting module, configured to set, after receiving a setting request for setting the sorting strategy, the sorting strategy corresponding to the data nodes according to the setting request.
It should be noted that the embodiments of the performance optimization apparatus are substantially the same as the embodiments of the performance optimization method described above and are not described in detail again here.
In addition, an embodiment of the present disclosure also proposes a computer-readable storage medium on which a performance optimization program is stored, where the performance optimization program, when executed by a processor, implements the steps of the performance optimization method described above.
It should be noted that, as used herein, the terms "comprise", "include", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or system that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or system that includes that element.
The serial numbers of the above embodiments of the present disclosure are for description only and do not represent the superiority or inferiority of the embodiments.
Through the description of the above implementations, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that makes a contribution in some situations, can be embodied in the form of a software product. The computer software product is stored in a storage medium as described above (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the embodiments of the present disclosure.
This prevents the client from always reading data blocks from its closest data node, reduces the pressure on the data node closest to the client, avoids an uneven HDFS pressure distribution, and improves the read performance of the whole HDFS.
After receiving a data read request sent by the client, the present disclosure acquires the data nodes where the corresponding data blocks are located and the preset sorting strategy corresponding to the data nodes, sorts the data nodes according to the strategy to obtain a data node list, and returns the list to the client, so that the client determines the data node providing the read-data-block service according to the list. Because the client determines the serving data node from a list sorted by the strategy, the data node closest to the client is no longer always chosen as the serving node, which prevents the client from always reading blocks from its closest node, reduces the pressure on that node, avoids an uneven HDFS pressure distribution, and improves the read performance of the whole HDFS.
The above are only preferred embodiments of the present disclosure and do not thereby limit its patent scope; any equivalent structural or process transformation made using the contents of the specification and drawings of the present disclosure, whether applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present disclosure.

Claims (10)

  1. A performance optimization method, wherein the performance optimization method comprises the following steps:
    after receiving a data read request sent by a client, acquiring the data nodes where the data blocks corresponding to the data read request are located;
    acquiring a preset sorting strategy corresponding to the data nodes, and sorting the data nodes according to the sorting strategy to obtain a data node list; and
    returning the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service.
  2. The performance optimization method according to claim 1, wherein, when the sorting strategy is a first sorting strategy, the step of sorting the data nodes according to the sorting strategy to obtain a data node list comprises:
    acquiring the pressure value corresponding to each data node; and
    determining the pressure of each data node according to its pressure value, and sorting the data nodes in ascending order of pressure to obtain a data node list.
  3. The performance optimization method according to claim 2, wherein the step of acquiring the pressure value corresponding to the data node comprises:
    acquiring the pressure data of the data node;
    obtaining the pressure data scores of the data node according to the pressure data and preset pressure data score criteria; and
    calculating the pressure value of the data node according to the pressure data scores and the corresponding preset pressure data weight values.
  4. The performance optimization method according to claim 1, wherein, when the sorting strategy is a second sorting strategy, the step of sorting the data nodes according to the sorting strategy to obtain a data node list comprises:
    sorting the data nodes in order of increasing distance from the client to obtain a pre-processed data node list;
    acquiring the pressure value corresponding to each data node, and detecting whether the pressure value corresponding to the data node satisfies a preset condition; and
    when it is detected that the pressure value corresponding to a data node satisfies the preset condition, moving the data node whose pressure value satisfies the preset condition to the end of the pre-processed data node list to obtain the processed data node list.
  5. The performance optimization method according to claim 1, wherein, when the sorting strategy is a third sorting strategy, the step of sorting the data nodes according to the sorting strategy to obtain a data node list comprises:
    randomly sorting the data nodes to obtain a data node list.
  6. The performance optimization method according to claim 1, wherein the step of returning the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service, comprises:
    returning the data node list to the client, so that the client determines the data node ranked first in the data node list as the data node that provides the read-data-block service.
  7. The performance optimization method according to any one of claims 1 to 6, wherein, before the step of acquiring, after receiving the data read request sent by the client, the data nodes where the data blocks corresponding to the data read request are located, the method further comprises:
    after receiving a setting request for setting the sorting strategy, setting the sorting strategy corresponding to the data nodes according to the setting request.
  8. A performance optimization apparatus, wherein the performance optimization apparatus comprises:
    an acquisition module, configured to acquire, after receiving a data read request sent by a client, the data nodes where the data blocks corresponding to the data read request are located, and to acquire a preset sorting strategy corresponding to the data nodes;
    a sorting module, configured to sort the data nodes according to the sorting strategy to obtain a data node list; and
    a data return module, configured to return the data node list to the client, so that the client determines, according to the data node list, the data node that provides the read-data-block service.
  9. A performance optimization device, wherein the performance optimization device comprises a memory, a processor, and a performance optimization program stored in the memory and executable on the processor, where the performance optimization program, when executed by the processor, implements the steps of the performance optimization method according to any one of claims 1 to 7.
  10. A computer-readable storage medium, wherein a performance optimization program is stored on the computer-readable storage medium, and the performance optimization program, when executed by a processor, implements the steps of the performance optimization method according to any one of claims 1 to 7.
PCT/CN2019/116024 2018-11-07 2019-11-06 Performance optimization method, apparatus, device, and computer-readable storage medium WO2020094064A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811323508.X 2018-11-07
CN201811323508.XA CN111159131A (zh) Performance optimization method, apparatus, device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2020094064A1 true WO2020094064A1 (zh) 2020-05-14

Family

ID=70554758

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116024 WO2020094064A1 (zh) 2018-11-07 2019-11-06 性能优化方法、装置、设备及计算机可读存储介质

Country Status (2)

Country Link
CN (1) CN111159131A (zh)
WO (1) WO2020094064A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11425980B2 (en) 2020-04-01 2022-08-30 Omachron Intellectual Property Inc. Hair dryer

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112995280B * 2021-02-03 2022-04-22 Beijing University of Posts and Telecommunications Data allocation method and apparatus for multi-content demand services
CN113778346B * 2021-11-12 2022-02-11 Shenzhen Mingzhu Technology Co., Ltd. Data reading method, apparatus, device, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104156381A * 2014-03-27 2014-11-19 Shenzhen Institute of Information Technology Replica access method and apparatus for the Hadoop distributed file system, and Hadoop distributed file system
CN105550362A * 2015-12-31 2016-05-04 Zhejiang Dahua Technology Co., Ltd. Index data repair method for a storage system, and storage system
US20170373977A1 (en) * 2016-06-28 2017-12-28 Paypal, Inc. Tapping network data to perform load balancing
CN108009260A * 2017-12-11 2018-05-08 Xi'an Jiaotong University Replica placement method combining node load and distance in big data storage
US20180285167A1 (en) * 2017-04-03 2018-10-04 Ocient, Inc Database management system providing local balancing within individual cluster node

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9424272B2 (en) * 2005-01-12 2016-08-23 Wandisco, Inc. Distributed file system using consensus nodes
CN102546782B * 2011-12-28 2015-04-29 Beijing Qihoo Technology Co., Ltd. Distributed system and data operation method thereof


Also Published As

Publication number Publication date
CN111159131A (zh) 2020-05-15


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19881322

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19881322

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17/09/2021)
