CN109302497A - Data processing method, access agent device and system based on HADOOP - Google Patents

Publication number
CN109302497A
Authority
CN
China
Prior art keywords
namenode
client
computer room
selection
occupation rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811440934.1A
Other languages
Chinese (zh)
Inventor
吴维伟
王志远
毛宝龙
刘洪通
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201811440934.1A
Publication of CN109302497A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/52 Network services specially adapted for the location of the user terminal
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1021 Server selection for load balancing based on client or server locations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/61 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements

Abstract

The disclosure proposes a HADOOP-based data processing method, access agent device and system, and relates to the field of big data technology. The HADOOP-based data processing method of the disclosure includes: receiving an access request from a client, the access request including a client identifier; selecting, according to the client identifier, a NameNode whose location is close to that of the client; and forwarding the access request to the selected NameNode. With this method, a nearby NameNode can be assigned to the client's access request, so that a single access agent device distributes access requests among multiple NameNodes at different locations. This also avoids the heavy traffic between computer rooms, and the resulting loss of cluster processing performance, that arises when the client is far from the NameNode, and thus optimizes HADOOP cluster performance in multi-computer-room scenarios.

Description

Data processing method, access agent device and system based on HADOOP
Technical field
The disclosure relates to the field of big data technology, and in particular to a HADOOP-based data processing method, access agent device and system.
Background
As HADOOP has become a basic component of big data processing, major companies have adopted it as the foundation of their big data platforms. However, as cluster scale and business volume grow rapidly, a HADOOP cluster runs into many bottlenecks. The most prominent is that the performance of the NameNode (name node) limits the expansion of the cluster's storage scale.
The HADOOP 3.0 baseline introduced RBF (Router-Based Federation) to solve the storage scaling problem.
Summary
The inventors have found that RBF is suited to a single-computer-room environment and cannot meet performance requirements in multi-computer-room scenarios.
One object of the disclosure is to optimize HADOOP cluster performance in multi-computer-room scenarios.
According to one aspect of some embodiments of the disclosure, a HADOOP-based data processing method is proposed, including: receiving an access request from a client, the access request including a client identifier; selecting, according to the client identifier, a NameNode whose location is close to that of the client; and forwarding the access request to the selected NameNode.
In some embodiments, selecting a NameNode whose location is close to that of the client includes: determining the NameNode closest to the client and the resource occupancy rate of that NameNode; and comparing the resource occupancy rate with a predetermined threshold. If the resource occupancy rate is greater than or equal to the predetermined threshold, the resource occupancy rate of the NameNode of the next priority is determined and compared with the predetermined threshold. This is repeated until a NameNode whose resource occupancy rate is less than the predetermined threshold is found, and that NameNode is taken as the selected NameNode. The shorter the distance between a NameNode and the client, the higher the priority of that NameNode.
In some embodiments, the client identifier is client address information. Selecting a NameNode whose location is close to that of the client includes: determining the path between the client and each NameNode according to the client address information and the address information of the NameNode; and selecting a NameNode whose location is close to that of the client according to the path length between the client and the NameNode.
In some embodiments, the client identifier is client location information. Selecting a NameNode whose location is close to that of the client includes: selecting, according to the client location information and the location information of the NameNodes, a NameNode located in the same computer room as the client.
In some embodiments, selecting a NameNode according to the client identifier includes: determining the computer room in which the client is located according to pre-stored location information for the client identifier; determining, according to pre-stored location information of the NameNodes, the NameNodes located in the same computer room as the client; and selecting a NameNode located in the same computer room as the client.
In some embodiments, selecting a NameNode located in the same computer room as the client includes: when multiple NameNodes are located in the same computer room as the client, selecting among them by choosing one NameNode at random, by choosing a NameNode according to the resource occupancy rates of the NameNodes, or by choosing a NameNode according to predetermined priorities of the NameNodes.
In some embodiments, the HADOOP-based data processing method further includes: receiving an access result from the NameNode and forwarding it to the client.
With this method, a nearby NameNode can be assigned to the client's access request, so that a single access agent device distributes access requests among multiple NameNodes at different locations. This also avoids the heavy traffic between computer rooms, and the resulting loss of cluster processing performance, that arises when the client is far from the NameNode, and thus optimizes HADOOP cluster performance in multi-computer-room scenarios.
According to one aspect of other embodiments of the disclosure, a HADOOP-based access agent device is proposed, including: a receiving unit configured to receive an access request from a client, the access request including a client identifier; a node selection unit configured to select, according to the client identifier, a NameNode whose location is close to that of the client; and a forwarding unit configured to forward the access request to the selected NameNode so that the NameNode can process it.
In some embodiments, the node selection unit is further configured to: determine the NameNode closest to the client and the resource occupancy rate of that NameNode; and compare the resource occupancy rate with a predetermined threshold. If the resource occupancy rate is greater than or equal to the predetermined threshold, the resource occupancy rate of the NameNode of the next priority is determined and compared with the predetermined threshold. This is repeated until a NameNode whose resource occupancy rate is less than the predetermined threshold is found, and that NameNode is taken as the selected NameNode. The shorter the distance between a NameNode and the client, the higher the priority of that NameNode.
In some embodiments, the client identifier is client address information. The node selection unit is configured to: determine the path between the client and each NameNode according to the client address information and the address information of the NameNode; and select a NameNode whose location is close to that of the client according to the path length between the client and the NameNode.
In some embodiments, the client identifier is client location information. The node selection unit is configured to: select, according to the client location information and the location information of the NameNodes, a NameNode located in the same computer room as the client.
In some embodiments, the node selection unit is configured to: determine the computer room in which the client is located according to pre-stored location information for the client identifier; determine, according to pre-stored location information of the NameNodes, the NameNodes located in the same computer room as the client; and select a NameNode located in the same computer room as the client.
In some embodiments, selecting a NameNode located in the same computer room as the client includes: when multiple NameNodes are located in the same computer room as the client, selecting among them by choosing one NameNode at random, by choosing a NameNode according to the resource occupancy rates of the NameNodes, or by choosing a NameNode according to predetermined priorities of the NameNodes.
In some embodiments, the receiving unit is further configured to receive an access result from the NameNode, and the forwarding unit is further configured to forward the access result to the client.
According to one aspect of further embodiments of the disclosure, a HADOOP-based access agent device is proposed, including: a memory; and a processor coupled to the memory, the processor being configured to execute any of the HADOOP-based data processing methods above based on instructions stored in the memory.
Such an access agent device can assign a nearby NameNode to the client's access request, so that a single access agent device distributes access requests among multiple NameNodes at different locations. This also avoids the heavy traffic between computer rooms, and the resulting loss of cluster processing performance, that arises when the client is far from the NameNode, and thus optimizes HADOOP cluster performance in multi-computer-room scenarios.
According to one aspect of still other embodiments of the disclosure, a computer-readable storage medium is proposed, on which computer program instructions are stored. When executed by a processor, the instructions implement the steps of any of the HADOOP-based data processing methods above.
By executing the instructions on such a computer-readable storage medium, a nearby NameNode can be assigned to the client's access request, so that a single access agent device distributes access requests among multiple NameNodes at different locations. This also avoids the heavy traffic between computer rooms, and the resulting loss of cluster processing performance, that arises when the client is far from the NameNode, and thus optimizes HADOOP cluster performance in multi-computer-room scenarios.
In addition, according to one aspect of some embodiments of the disclosure, a HADOOP-based system is proposed, including: a client configured to interact with a user; any of the HADOOP-based access agent devices above; multiple NameNodes configured to manage the namespace of the file system; and multiple data nodes (DataNodes) configured to store data and to respond to operation requests from the client and the NameNodes.
In such a HADOOP-based system, the access agent device can assign a nearby NameNode to the client's access request, so that a single access agent device distributes access requests among multiple NameNodes at different locations. This also avoids the heavy traffic between computer rooms, and the resulting loss of cluster processing performance, that arises when the client is far from the NameNode, and thus optimizes HADOOP cluster performance in multi-computer-room scenarios.
Brief description of the drawings
The drawings described herein are provided for further understanding of the disclosure and form a part of it. The illustrative embodiments of the disclosure and their descriptions serve to explain the disclosure and do not unduly limit it. In the drawings:
Fig. 1 is a flowchart of one embodiment of the HADOOP-based data processing method of the disclosure.
Fig. 2 is a flowchart of another embodiment of the HADOOP-based data processing method of the disclosure.
Fig. 3 is a schematic diagram of one embodiment of the HADOOP-based access agent device of the disclosure.
Fig. 4 is a schematic diagram of another embodiment of the HADOOP-based access agent device of the disclosure.
Fig. 5 is a schematic diagram of a further embodiment of the HADOOP-based access agent device of the disclosure.
Fig. 6 is a schematic diagram of one embodiment of the HADOOP-based system of the disclosure.
Detailed description
The technical solution of the disclosure is described in further detail below with reference to the drawings and embodiments.
A flowchart of one embodiment of the HADOOP-based data processing method of the disclosure is shown in Fig. 1.
In step 101, the access agent device receives an access request from a client, the access request including a client identifier. In one embodiment, the access agent device may be an RBF, which can emulate a NameNode to provide the client with an access interface, forward the access request to a NameNode, and return the access result fed back by the NameNode to the client that initiated the access request.
In one embodiment, the client identifier may be a client ID (identification), which uniquely identifies the client. The client identifier may also be location information of the client, such as the computer room in which the client is located or its longitude and latitude. The client identifier may also be address information of the client, such as an IP (Internet Protocol) address.
In step 102, the access agent device selects, according to the client identifier, a NameNode whose location is close to that of the client. In one embodiment, the access agent device may determine the NameNode located in the same computer room as the client according to the pre-stored computer room of the client and the pre-stored computer room of each NameNode.
In step 103, the access request is forwarded to the selected NameNode. In one embodiment, the RBF may query the mount table to obtain the NameNode address to which the virtual directory is mapped and, after obtaining the address, emulate a client to access the target NameNode, so that the NameNode is unaware of the proxying.
In the related art, if a single set of RBF is configured for the NameNode clusters of all computer rooms, the RBF cannot identify NameNode locations, so a client's service request may be assigned to a node in a different computer room. The traffic between computer rooms at different physical locations can then become very large and seriously affect cluster processing performance. If instead a separate set of RBF is configured for the NameNode cluster of each physical computer room, the increase in traffic between computer rooms is avoided, but the user program is bound to the RBF configuration of its own computer room. When the user program needs to run in another computer room, the RBF configuration in the user code must be modified, which is cumbersome and error-prone and makes code maintenance and cluster operation difficult.
With the method of the embodiments of the disclosure, a single RBF can assign a nearby NameNode to the client's access request, so that a single access agent device distributes access requests among multiple NameNodes at different locations. This avoids the problem that user programs cannot migrate smoothly between computer rooms and that code maintenance and cluster operation are difficult. It also avoids the heavy traffic between computer rooms, and the resulting loss of cluster processing performance, that arises when the client is far from the NameNode, and thus optimizes HADOOP cluster performance in multi-computer-room scenarios.
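The flow of steps 101 to 103 can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the table names `CLIENT_ROOM` and `NAMENODE_ROOM`, the `AccessRequest` type, and all identifiers and addresses are hypothetical, and forwarding is represented simply by returning the chosen target.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical pre-stored tables; names and values are illustrative only.
CLIENT_ROOM = {"client-a": "room-1", "client-b": "room-2"}
NAMENODE_ROOM = {"nn1.example:8020": "room-1", "nn2.example:8020": "room-2"}


@dataclass
class AccessRequest:
    client_id: str  # client identifier carried in the access request
    path: str       # file system path being accessed


def select_namenode(request: AccessRequest) -> str:
    """Step 102: pick a NameNode in the same computer room as the client."""
    room = CLIENT_ROOM[request.client_id]
    for address, nn_room in NAMENODE_ROOM.items():
        if nn_room == room:
            return address
    raise LookupError(f"no NameNode registered for computer room {room}")


def handle(request: AccessRequest) -> Tuple[str, AccessRequest]:
    """Steps 101-103: receive the request, select a NameNode, forward to it.

    Forwarding is modelled by returning the (target address, request) pair.
    """
    return select_namenode(request), request
```

Because the client carries only its identifier, moving a user program to another computer room changes the lookup result without any change to the client-side configuration.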
A flowchart of another embodiment of the HADOOP-based data processing method of the disclosure is shown in Fig. 2.
In step 201, the access agent device receives an access request from a client, the access request including a client identifier.
In step 202, the access agent device determines the NameNode closest to the client and determines the resource occupancy rate of that NameNode. In one embodiment, the priority may be determined according to the distance between the access agent device and the client, with a shorter distance giving a higher priority. Alternatively, the NameNodes matched to each client and their corresponding priorities may be pre-stored. In one embodiment, the NameNode located in the same computer room as the client has the highest priority.
In step 203, it is judged whether the resource occupancy rate of the NameNode is greater than or equal to a predetermined threshold, for example a threshold set within the range of 80% to 95%. If it is greater than or equal to the predetermined threshold, the NameNode is determined to be busy and step 204 is executed. If the resource occupancy rate of the NameNode is less than the predetermined threshold, step 206 is executed.
In step 204, the NameNode of the next priority and its resource occupancy rate are determined.
In step 205, it is judged whether the resource occupancy rate of that NameNode is greater than or equal to the predetermined threshold. If so, the NameNode is determined to be busy and step 204 is executed again. Otherwise, step 206 is executed.
In step 206, that NameNode is selected as the NameNode that handles this access request.
In step 207, the access request is forwarded to the selected NameNode.
In one embodiment, the HADOOP-based data processing method may further include step 208: receiving the access result from the NameNode and forwarding it to the client. In one embodiment, the access result includes the client identifier or client address information of the client that initiated the access request, and the access result can be fed back to the corresponding client according to that identifier or address information.
This method takes the busy or idle state of each NameNode into account. While still addressing the traffic increase caused by calling remote NameNodes, it avoids processing delays caused by overloading a single NameNode and improves the processing efficiency and reliability of the system.
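The loop formed by steps 202 to 206 can be sketched as below. The function name `select_by_priority`, the callback `occupancy_of`, and the default threshold of 0.9 (chosen from within the 80% to 95% range mentioned above) are assumptions made for illustration. The text does not say what happens when every NameNode is busy, so returning `None` in that case is also an assumption.

```python
from typing import Callable, Optional, Sequence


def select_by_priority(
    candidates: Sequence[str],
    occupancy_of: Callable[[str], float],
    threshold: float = 0.9,
) -> Optional[str]:
    """Return the first NameNode whose occupancy rate is below the threshold.

    `candidates` must already be ordered from highest to lowest priority,
    for example with the NameNode in the client's own computer room first.
    A NameNode whose occupancy rate is greater than or equal to the
    threshold is treated as busy and skipped (steps 203 to 205).
    """
    for namenode in candidates:
        if occupancy_of(namenode) < threshold:
            return namenode  # step 206: this NameNode handles the request
    return None  # assumption: no idle NameNode is available
```

For example, with occupancy rates of 0.97 for the local NameNode, 0.90 for the next one and 0.40 for the most distant one, the function skips the first two, because an occupancy rate equal to the threshold counts as busy, and returns the third.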
In one embodiment, the client identifier is client address information. The access agent device may determine, by address identification, the NameNode located in the same computer room as the client. Alternatively, it may determine the path between the client and each NameNode according to the address of each NameNode, and select a NameNode whose location is close to that of the client according to the path length between the client and the NameNode.
This method not only selects a NameNode located in the same computer room as the client, but can also choose among the NameNodes within a single computer room according to the length of the forwarding path. It selects the NameNode with the shortest forwarding path, or ranks the NameNodes from highest to lowest priority in order of forwarding path length from shortest to longest. This further reduces data forwarding pressure and improves HADOOP cluster performance.
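One way to picture address-based selection is sketched below, using Python's standard `ipaddress` module. The subnet-to-room table `ROOM_SUBNETS`, the helper names, and the idea of measuring path length in hop counts are all assumptions introduced for illustration; the text does not specify how addresses are identified or how path length is measured.

```python
import ipaddress
from typing import Dict, List, Optional

# Hypothetical mapping from computer room to its IP subnet.
ROOM_SUBNETS = {
    "room-1": ipaddress.ip_network("10.1.0.0/16"),
    "room-2": ipaddress.ip_network("10.2.0.0/16"),
}


def room_of(address: str) -> Optional[str]:
    """Identify the computer room of an IP address by subnet membership."""
    ip = ipaddress.ip_address(address)
    for room, subnet in ROOM_SUBNETS.items():
        if ip in subnet:
            return room
    return None  # address does not belong to any known computer room


def rank_by_path_length(hops: Dict[str, int]) -> List[str]:
    """Order NameNodes from highest to lowest priority.

    `hops` maps each NameNode to an assumed path length to the client;
    a shorter path means a higher priority.
    """
    return sorted(hops, key=hops.get)
```

The ranked list produced by `rank_by_path_length` can serve directly as the priority order in which NameNodes are examined.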
In one embodiment, the client identifier may be client location information (such as the identifier of the computer room in which the client is located). Alternatively, the correspondence between each client identifier and the client location may be pre-stored in the access agent device, so that the client location information can be determined from the client identifier. The access agent device may determine the NameNode located in the same computer room as the client according to the pre-stored location of each NameNode. The data matching in this method is simple and requires no computation, which improves the efficiency of determining the NameNode.
In some embodiments, when multiple NameNodes are located in the same computer room as the client, one of them is selected to receive the forwarded access request. In one embodiment, the access agent device may select one NameNode at random from the NameNodes in that computer room. It may also select, according to the resource occupancy rates of the NameNodes, a NameNode whose occupancy rate is less than a predetermined threshold, or select a NameNode according to predetermined priorities of the NameNodes.
This method allows a further preferred choice among the NameNodes within a single computer room, which further improves the performance of the HADOOP cluster.
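The three ways of choosing among several NameNodes in the same computer room can be sketched as three small functions. All names here are hypothetical, and two details are assumptions not stated in the text: a smaller priority number means a higher priority, and the occupancy-based variant returns the first NameNode below the threshold.

```python
import random
from typing import Mapping, Optional, Sequence


def pick_random(namenodes: Sequence[str],
                rng: Optional[random.Random] = None) -> str:
    """Choose one NameNode of the computer room at random."""
    return (rng or random).choice(list(namenodes))


def pick_by_occupancy(namenodes: Sequence[str],
                      occupancy: Mapping[str, float],
                      threshold: float = 0.9) -> Optional[str]:
    """Choose the first NameNode whose occupancy rate is below the threshold."""
    for namenode in namenodes:
        if occupancy[namenode] < threshold:
            return namenode
    return None  # assumption: every NameNode in the room is busy


def pick_by_priority(namenodes: Sequence[str],
                     priority: Mapping[str, int]) -> str:
    """Choose by predetermined priority (assumed: smaller number is higher)."""
    return min(namenodes, key=lambda n: priority[n])
```

Random choice spreads load without any monitoring data, whereas the occupancy-based choice needs current occupancy figures and the priority-based choice needs an operator-maintained ranking.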
A schematic diagram of one embodiment of the HADOOP-based access agent device of the disclosure is shown in Fig. 3. The receiving unit 301 can receive an access request from a client, the access request including a client identifier. In one embodiment, the client identifier may be a client ID, which uniquely identifies the client. The client identifier may also be location information of the client, such as the computer room in which the client is located or its longitude and latitude. The client identifier may also be address information of the client, such as an IP address.
The node selection unit 302 can select, according to the client identifier, a NameNode whose location is close to that of the client. In one embodiment, the node selection unit 302 may determine the NameNode located in the same computer room as the client according to the pre-stored computer room of the client and the pre-stored computer room of each NameNode.
The forwarding unit 303 can forward the access request to the selected NameNode. In one embodiment, the forwarding unit 303 may query the mount table to obtain the NameNode address to which the virtual directory is mapped and, after obtaining the address, emulate a client to access the target NameNode, so that the NameNode is unaware of the proxying.
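The mount table query can be pictured with the sketch below, which maps a virtual directory to a NameNode address. The table contents and the function name `resolve` are hypothetical, and resolving by the longest matching mount point is an assumption of this sketch; the text only states that the mount table yields the NameNode address to which the virtual directory is mapped.

```python
from typing import Optional

# Hypothetical mount table: virtual directory -> NameNode address.
MOUNT_TABLE = {
    "/": "nn-default.example:8020",
    "/user": "nn1.example:8020",
    "/user/logs": "nn2.example:8020",
}


def resolve(path: str) -> Optional[str]:
    """Return the NameNode address for the deepest mount point covering path."""
    best = None
    for mount_point in MOUNT_TABLE:
        prefix = mount_point.rstrip("/")
        # Match the mount point itself or anything below it, so that
        # "/user" covers "/user/alice" but not "/username".
        if path == mount_point or path.startswith(prefix + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    return MOUNT_TABLE[best] if best is not None else None
```

Matching on whole path components rather than raw string prefixes keeps a directory such as `/username` from being routed to the NameNode that serves `/user`.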
Such an access agent device can assign a nearby NameNode to the client's access request, so that a single access agent device distributes access requests among multiple NameNodes at different locations. This also avoids the heavy traffic between computer rooms, and the resulting loss of cluster processing performance, that arises when the client is far from the NameNode, and thus optimizes HADOOP cluster performance in multi-computer-room scenarios.
In one embodiment, the node selection unit 302 may check the NameNodes one by one in order of priority to determine whether the resource occupancy rate of each is less than a predetermined threshold, and select the first NameNode whose occupancy rate is less than the predetermined threshold as the NameNode that handles the access request. In one embodiment, the node selection unit 302 may determine the priority according to the distance between the access agent device and the client, with a shorter distance giving a higher priority. In another embodiment, the access agent device may pre-store the NameNodes matched to each client and their corresponding priorities. In one embodiment, the NameNode located in the same computer room as the client has the highest priority.
Such a device takes the busy or idle state of each NameNode into account. While still addressing the traffic increase caused by calling remote NameNodes, it avoids processing delays caused by overloading a single NameNode and improves the processing efficiency and reliability of the system.
In one embodiment, the client identifier is client address information. The node selection unit 302 may determine, by address identification, the NameNode located in the same computer room as the client. Alternatively, it may determine the path between the client and each NameNode according to the address of each NameNode, and select a NameNode whose location is close to that of the client according to the path length between the client and the NameNode.
Such a device not only selects a NameNode located in the same computer room as the client, but can also choose among the NameNodes within a single computer room according to the length of the forwarding path. It selects the NameNode with the shortest forwarding path, or ranks the NameNodes from highest to lowest priority in order of forwarding path length from shortest to longest. This further reduces data forwarding pressure and improves HADOOP cluster performance.
In one embodiment, the client identifier may be client location information (such as the identifier of the computer room in which the client is located). Alternatively, the correspondence between each client identifier and the client location may be pre-stored in the access agent device, so that the client location information can be determined from the client identifier. The node selection unit 302 may determine the NameNode located in the same computer room as the client according to the pre-stored location of each NameNode. The data matching performed by such a device when selecting a NameNode is simple and requires no computation, which improves the efficiency of determining the NameNode.
In some embodiments, when multiple NameNodes are located in the same computer room as the client, the node selection unit 302 selects one of them to receive the forwarded access request. In one embodiment, the node selection unit 302 may select one NameNode at random from the NameNodes in that computer room. It may also select, according to the resource occupancy rates of the NameNodes, a NameNode whose occupancy rate is less than a predetermined threshold, or select a NameNode according to predetermined priorities of the NameNodes. This allows a further preferred choice among the NameNodes within a single computer room and further improves the performance of the HADOOP cluster.
A structural schematic diagram of one embodiment of the HADOOP-based access agent device of the disclosure is shown in Fig. 4. The HADOOP-based access agent device includes a memory 401 and a processor 402. The memory 401 may be a magnetic disk, flash memory or any other non-volatile storage medium, and is used to store the instructions of the corresponding embodiments of the HADOOP-based data processing method above. The processor 402 is coupled to the memory 401 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 402 executes the instructions stored in the memory and can thereby optimize HADOOP cluster performance in multi-computer-room scenarios.
In one embodiment, as shown in Fig. 5, the HADOOP-based access agent device 500 includes a memory 501 and a processor 502. The processor 502 is coupled to the memory 501 through a bus 503. The HADOOP-based access agent device 500 may also be connected to an external storage device 505 through a storage interface 504 in order to call external data, and may be connected to a network or another computer system (not shown) through a network interface 506. These are not described in further detail here.
In this embodiment, data and instructions are stored in the memory and the instructions are processed by the processor, which can optimize HADOOP cluster performance in multi-computer-room scenarios.
In another embodiment, a computer-readable storage medium stores computer program instructions which, when executed by a processor, implement the steps of the method in the corresponding embodiment of the HADOOP-based data processing method. Those skilled in the art will appreciate that embodiments of the disclosure may be provided as a method, a device or a computer program product. The disclosure may therefore take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the disclosure may take the form of a computer program product implemented on one or more non-transitory computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM and optical storage) that contain computer-usable program code.
Figure 6 shows a schematic diagram of one embodiment of the HADOOP-based system of the present disclosure. Clients 611 to 61n are clients located in any computer room and can send access requests to a NameNode, where n is a positive integer.
The HADOOP-based access agent device 62 may be any of the HADOOP-based access agent devices described above. In one embodiment, the HADOOP-based access agent device is an improved RBF (Router-Based Federation). In one embodiment, a single HADOOP cluster has only one HADOOP-based access agent device. The HADOOP-based access agent device 62 can emulate a NameNode to provide an access interface for clients, select a NameNode using any of the HADOOP-based data processing methods described above, and forward the access request; it can also return the access result fed back by the NameNode to the client that initiated the access request.
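The request flow through the access agent device (receive, select, forward, return the result) can be illustrated with a minimal Python sketch. This is not the implementation of the disclosure: the names `NameNode`, `AccessAgent`, `client_rooms`, and `handle_request` are hypothetical, the NameNode call is a plain function standing in for the real RPC, and the fallback to the first configured NameNode when no computer room matches is an assumption that the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class NameNode:
    name: str
    room: str                      # computer room in which the NameNode is located
    handle: Callable[[dict], str]  # stand-in for the real NameNode RPC (assumption)


class AccessAgent:
    """Single entry point that routes each client request to a NameNode."""

    def __init__(self, namenodes: List[NameNode], client_rooms: Dict[str, str]):
        self.namenodes = namenodes
        # Prestored mapping: client identification -> computer room.
        self.client_rooms = client_rooms

    def select(self, client_id: str) -> NameNode:
        room = self.client_rooms.get(client_id)
        same_room = [nn for nn in self.namenodes if nn.room == room]
        # Assumption: fall back to the first configured NameNode
        # when no NameNode shares the client's computer room.
        return same_room[0] if same_room else self.namenodes[0]

    def handle_request(self, request: dict) -> str:
        target = self.select(request["client_id"])
        result = target.handle(request)  # forward the access request
        return result                    # feed the access result back to the client
```

In this sketch the client only ever talks to `AccessAgent.handle_request`, which mirrors the idea that the agent presents a NameNode-like interface while hiding which NameNode actually serves the request.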
NameNodes 631 to 63m can manage the namespace of the file system, send instructions to DataNodes according to the access request, and feed the access result back to the HADOOP-based access agent device, where m is a positive integer.
DataNodes 641 to 64i can store data and respond to operation requests from clients and NameNodes, where i is a positive integer.
In such a HADOOP-based system, the access agent device can assign a nearby NameNode to each client access request, so that a single access agent device distributes access requests among multiple NameNodes at different locations. This also avoids the heavy inter-computer-room traffic, and the resulting degradation of cluster processing performance, that arise when a client is far from its NameNode, and thereby optimizes HADOOP cluster performance in a multi-computer-room scenario.
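One possible reading of "assigning a nearby NameNode", following the threshold-based selection recited in claim 2, is sketched below in Python. The function name `select_namenode`, the dictionary inputs, and the numeric distance and occupancy values are illustrative assumptions; the disclosure does not fix how distance or operating resource occupancy is measured, nor what happens when every NameNode is at or above the threshold (the sketch returns `None` in that case).

```python
from typing import Dict, List, Optional


def select_namenode(
    distances: Dict[str, float],
    occupancy: Dict[str, float],
    threshold: float,
) -> Optional[str]:
    """Return the nearest NameNode whose occupancy is below the threshold."""
    # A shorter distance to the client means a higher priority.
    by_priority: List[str] = sorted(distances, key=lambda name: distances[name])
    for name in by_priority:
        # Skip NameNodes whose occupancy is greater than or equal to the threshold.
        if occupancy[name] < threshold:
            return name
    # Assumption: no NameNode qualifies, so nothing is selected.
    return None
```

For example, with distances `{"nn-a": 1.0, "nn-b": 5.0}` and a threshold of 0.8, `nn-a` is chosen while its occupancy stays below 0.8, and the request moves to `nn-b` once `nn-a` reaches the threshold.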
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The present disclosure has now been described in detail. Some details well known in the art have not been described, in order to avoid obscuring the concepts of the present disclosure. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The methods and devices of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. In addition, in some embodiments, the present disclosure may also be embodied as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the present disclosure. Thus, the present disclosure also covers recording media storing programs for executing the method according to the present disclosure.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific embodiments of the present disclosure may still be modified, or some technical features may be equivalently replaced; such modifications and replacements, provided they do not depart from the spirit of the technical solutions of the present disclosure, shall all fall within the scope of the technical solutions claimed by the present disclosure.

Claims (17)

1. A data processing method based on HADOOP, comprising:
receiving an access request from a client, the access request including a client identification;
selecting, according to the client identification, a name node (NameNode) whose position is close to the position of the client; and
forwarding the access request to the selected NameNode.
2. The method according to claim 1, wherein selecting the NameNode whose position is close to the position of the client comprises:
determining the NameNode closest to the position of the client and the operating resource occupancy rate of that NameNode;
comparing the operating resource occupancy rate of the NameNode with a predetermined threshold;
if the operating resource occupancy rate of the NameNode is greater than or equal to the predetermined threshold,
determining the operating resource occupancy rate of the NameNode of the next priority and comparing it with the predetermined threshold, and, if that operating resource occupancy rate is greater than or equal to the predetermined threshold,
determining the operating resource occupancy rate of the NameNode of the next priority and comparing it with the predetermined threshold, until a NameNode whose operating resource occupancy rate is less than the predetermined threshold is found, and determining that NameNode as the selected NameNode;
wherein the shorter the distance between a NameNode and the client, the higher the priority of that NameNode.
3. The method according to claim 1 or 2, wherein the client identification is client address information; and
selecting the NameNode whose position is close to the position of the client comprises:
determining the path between the client and the NameNode according to the client address information and the address information of the NameNode; and
selecting the NameNode whose position is close to the position of the client according to the path length between the client and the NameNode.
4. The method according to claim 1, wherein the client identification is client location information; and
selecting the NameNode whose position is close to the position of the client comprises:
selecting, according to the client location information and the location information of the NameNodes, a NameNode located in the same computer room as the client.
5. The method according to claim 1, wherein selecting the NameNode according to the client identification comprises:
determining the computer room in which the client is located according to prestored location information for the client identification;
determining the NameNodes located in the same computer room as the client according to prestored location information of the NameNodes; and
selecting a NameNode located in the same computer room as the client.
6. The method according to claim 4 or 5, wherein selecting a NameNode located in the same computer room as the client comprises:
in the case where multiple NameNodes are located in the same computer room as the client, from among the NameNodes located in the same computer room as the client:
randomly selecting one NameNode, or selecting a NameNode according to the operating resource occupancy rates of the NameNodes, or selecting a NameNode according to predetermined priorities of the NameNodes.
7. The method according to claim 1, further comprising:
receiving an access result from the NameNode and forwarding it to the client.
8. An access agent device based on HADOOP, comprising:
a receiving unit, configured to receive an access request from a client, the access request including a client identification;
a node selection unit, configured to select, according to the client identification, a name node (NameNode) whose position is close to the position of the client; and
a forwarding unit, configured to forward the access request to the selected NameNode, so that the NameNode can process the access request.
9. The device according to claim 8, wherein the node selection unit is further configured to:
determine the NameNode closest to the position of the client and the operating resource occupancy rate of that NameNode;
compare the operating resource occupancy rate of the NameNode with a predetermined threshold;
if the operating resource occupancy rate of the NameNode is greater than or equal to the predetermined threshold,
determine the operating resource occupancy rate of the NameNode of the next priority and compare it with the predetermined threshold, and, if that operating resource occupancy rate is greater than or equal to the predetermined threshold,
determine the operating resource occupancy rate of the NameNode of the next priority and compare it with the predetermined threshold, until a NameNode whose operating resource occupancy rate is less than the predetermined threshold is found, and determine that NameNode as the selected NameNode;
wherein the shorter the distance between a NameNode and the client, the higher the priority of that NameNode.
10. The device according to claim 8 or 9, wherein the client identification is client address information; and
the node selection unit is configured to:
determine the path between the client and the NameNode according to the client address information and the address information of the NameNode; and
select the NameNode whose position is close to the position of the client according to the path length between the client and the NameNode.
11. The device according to claim 8, wherein the client identification is client location information; and
the node selection unit is configured to select, according to the client location information and the location information of the NameNodes, a NameNode located in the same computer room as the client.
12. The device according to claim 8, wherein the node selection unit is configured to:
determine the computer room in which the client is located according to prestored location information for the client identification;
determine the NameNodes located in the same computer room as the client according to prestored location information of the NameNodes; and
select a NameNode located in the same computer room as the client.
13. The device according to claim 11 or 12, wherein selecting a NameNode located in the same computer room as the client comprises:
in the case where multiple NameNodes are located in the same computer room as the client, from among the NameNodes located in the same computer room as the client:
randomly selecting one NameNode, or selecting a NameNode according to the operating resource occupancy rates of the NameNodes, or selecting a NameNode according to predetermined priorities of the NameNodes.
14. The device according to claim 8, wherein
the receiving unit is further configured to receive an access result from the NameNode; and
the forwarding unit is further configured to forward the access result to the client.
15. An access agent device based on HADOOP, comprising:
a memory; and
a processor coupled to the memory, the processor being configured to execute the method according to any one of claims 1 to 7 based on instructions stored in the memory.
16. A computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
17. A system based on HADOOP, comprising:
a client, configured to interact with a user;
the HADOOP-based access agent device according to any one of claims 8 to 15;
multiple name nodes (NameNodes), configured to manage the namespace of the file system; and
multiple data nodes (DataNodes), configured to store data and to respond to operation requests from the client and the NameNodes.
CN201811440934.1A 2018-11-29 2018-11-29 Data processing method, access agent device and system based on HADOOP Pending CN109302497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811440934.1A CN109302497A (en) 2018-11-29 2018-11-29 Data processing method, access agent device and system based on HADOOP

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811440934.1A CN109302497A (en) 2018-11-29 2018-11-29 Data processing method, access agent device and system based on HADOOP

Publications (1)

Publication Number Publication Date
CN109302497A true CN109302497A (en) 2019-02-01

Family

ID=65141461

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811440934.1A Pending CN109302497A (en) 2018-11-29 2018-11-29 Data processing method, access agent device and system based on HADOOP

Country Status (1)

Country Link
CN (1) CN109302497A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110198346A (en) * 2019-05-06 2019-09-03 北京三快在线科技有限公司 Data reading method, device, electronic equipment and readable storage medium
CN111083204A (en) * 2019-11-29 2020-04-28 广州市百果园信息技术有限公司 File transmission method, device and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103023902A (en) * 2012-12-11 2013-04-03 北京奇虎科技有限公司 Data transmission method and system
CN103491192A (en) * 2013-09-30 2014-01-01 北京搜狐新媒体信息技术有限公司 Namenode switching method and system of distributed system
CN103678360A (en) * 2012-09-13 2014-03-26 腾讯科技(深圳)有限公司 Data storing method and device for distributed file system
CN103685611A (en) * 2013-12-31 2014-03-26 山石网科通信技术有限公司 Network access processing method and device
CN103729250A (en) * 2012-10-11 2014-04-16 国际商业机器公司 Method and system to select data nodes configured to satisfy a set of requirements
US20140244701A1 (en) * 2013-02-25 2014-08-28 Emc Corporation Data analytics platform over parallel databases and distributed file systems
US20150378618A1 (en) * 2012-01-18 2015-12-31 Cloudera, Inc. Memory allocation buffer for reduction of heap fragmentation
CN105554125A (en) * 2015-04-24 2016-05-04 美通云动(北京)科技有限公司 Method for realizing webpage adaptation through CDN (content delivery network) and system thereof
CN105871985A (en) * 2015-12-10 2016-08-17 乐视网信息技术(北京)股份有限公司 Data access request processing method and apparatus, server, client and system
CN106649847A (en) * 2016-12-30 2017-05-10 南威软件股份有限公司 A large data real-time processing system based on Hadoop
CN107493331A (en) * 2017-08-16 2017-12-19 网宿科技股份有限公司 A kind of client access method, server and system
CN107992491A (en) * 2016-10-26 2018-05-04 ***通信有限公司研究院 A kind of method and device of distributed file system, data access and data storage
CN108768985A (en) * 2018-05-17 2018-11-06 成都致云科技有限公司 A kind of accessed node access distribution method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Hu Bo (胡博): "A File Resource Scheduling Mechanism across HDFS Clusters" (一种跨HDFS集群的文件资源调度机制), Chinese Journal of Computers (《计算机学报》) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190201)
RJ01 Rejection of invention patent application after publication