CN109302497A - Data processing method, access agent device and system based on HADOOP - Google Patents
- Publication number
- CN109302497A (application number CN201811440934.1A)
- Authority
- CN
- China
- Prior art keywords
- namenode
- client
- computer room
- selection
- occupation rate
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
- H04L67/52—Network services specially adapted for the location of the user terminal
- H04L67/56—Provisioning of proxy services
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1008—Server selection for load balancing based on parameters of servers, e.g. available memory or workload
- H04L67/1021—Server selection for load balancing based on client or server locations
- H04L67/61—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements
Abstract
The disclosure proposes a HADOOP-based data processing method, access agent device and system, relating to the field of big data technology. The HADOOP-based data processing method of the disclosure includes: receiving an access request from a client, the access request including a client identifier; selecting, according to the client identifier, a NameNode whose location matches that of the client; and forwarding the access request to the selected NameNode. With this method, a nearby NameNode can be assigned to each client access request, so that a single access agent device distributes access requests among multiple NameNodes in different locations. The method also avoids the heavy inter-data-center traffic that arises when the client is far from the NameNode and that degrades cluster processing performance, thereby optimizing HADOOP cluster performance in multi-data-center scenarios.
Description
Technical field
The disclosure relates to the field of big data technology, and in particular to a HADOOP-based data processing method, access agent device and system.
Background
As HADOOP has become a core infrastructure component for big data processing, major companies have adopted it one after another as the foundation of their big data platforms. However, as cluster scale and business volume surge, HADOOP clusters run into many bottlenecks, the most notable being that NameNode performance limits how far the cluster's storage can scale.
HADOOP 3.0 introduced RBF (Router-Based Federation) to solve this storage scaling problem.
Summary of the invention
The inventors found that RBF works well in a single-data-center environment but cannot meet performance requirements in multi-data-center scenarios.
One objective of the disclosure is to optimize HADOOP cluster performance in multi-data-center scenarios.
According to one aspect of some embodiments of the disclosure, a HADOOP-based data processing method is provided, including: receiving an access request from a client, the access request including a client identifier; selecting, according to the client identifier, a NameNode whose location matches that of the client; and forwarding the access request to the selected NameNode.
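As a minimal sketch of the three steps (receive, select, forward), assume the proxy already knows which data center each client and each NameNode sits in. `AccessRequest`, `NameNode`, `select_namenode`, `handle` and the `forward` callback are hypothetical names, not RBF or HDFS APIs, and since the text does not say what happens when no nearby NameNode exists, the sketch simply raises.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AccessRequest:
    client_id: str  # client identifier carried in the request
    path: str       # hypothetical payload: the file system path being accessed


@dataclass
class NameNode:
    address: str
    data_center: str


def select_namenode(request: AccessRequest,
                    client_dc: Dict[str, str],
                    namenodes: List[NameNode]) -> Optional[NameNode]:
    """Step 2: pick a NameNode located in the client's data center, if any."""
    dc = client_dc.get(request.client_id)
    for nn in namenodes:
        if nn.data_center == dc:
            return nn
    return None


def handle(request: AccessRequest,
           client_dc: Dict[str, str],
           namenodes: List[NameNode],
           forward: Callable[[NameNode, AccessRequest], str]) -> str:
    """Steps 1 to 3: accept a request, select a nearby NameNode, forward it."""
    target = select_namenode(request, client_dc, namenodes)
    if target is None:
        # The source does not define a fallback; this sketch just fails loudly.
        raise LookupError(f"no NameNode near client {request.client_id}")
    return forward(target, request)
```

Keeping selection separate from forwarding lets the same skeleton host any of the selection rules in the embodiments that follow.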
In some embodiments, selecting a NameNode whose location matches that of the client includes: determining the NameNode closest to the client and that NameNode's resource utilization; and comparing the utilization with a predetermined threshold. If the utilization is greater than or equal to the threshold, the resource utilization of the NameNode with the next priority is determined and compared with the threshold; this repeats until a NameNode whose utilization is below the threshold is found, and that NameNode is taken as the selected NameNode. The shorter the distance between a NameNode and the client, the higher its priority.
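The priority-and-threshold loop can be sketched as follows. Each candidate is a hypothetical `(name, distance, utilization)` tuple, priority is derived from distance (shorter is higher, as the text states), and the default threshold of 0.9 is an assumption picked from the 80% to 95% range mentioned in the detailed embodiment. The text does not say what to do when every NameNode is busy, so the sketch returns `None`.

```python
from typing import List, Optional, Tuple

# Hypothetical candidate record: (name, distance_to_client, utilization in [0, 1]).
Candidate = Tuple[str, float, float]


def select_by_priority(candidates: List[Candidate],
                       threshold: float = 0.9) -> Optional[str]:
    """Return the highest-priority NameNode whose utilization is below threshold.

    Priority follows distance: the closest NameNode is checked first, then the
    next closest, and so on, mirroring the repeated compare-and-advance loop.
    """
    for name, _distance, utilization in sorted(candidates, key=lambda c: c[1]):
        if utilization < threshold:  # ">= threshold" means busy: try the next one
            return name
    return None  # every candidate is busy; behavior here is left open by the text
```

For example, if the closest NameNode is at 95% utilization and the threshold is 0.9, the next-closest idle NameNode is chosen instead.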
In some embodiments, the client identifier is client address information, and selecting a NameNode whose location matches that of the client includes: determining the path between the client and each NameNode from the client address information and the NameNode address information; and selecting, according to the length of the path between the client and the NameNode, a NameNode whose location matches that of the client.
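One way to turn address information into a path length, assuming the proxy has a hop-by-hop view of the network topology (a hypothetical adjacency map keyed by address or device name), is a breadth-first search for hop count. The text does not say how path length is measured, so hop count is an illustrative choice.

```python
from collections import deque
from typing import Dict, List, Optional


def hop_count(topology: Dict[str, List[str]], src: str, dst: str) -> Optional[int]:
    """Shortest number of hops from src to dst via breadth-first search."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in topology.get(node, []):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # dst is unreachable from src


def nearest_namenode(topology: Dict[str, List[str]],
                     client: str,
                     namenodes: List[str]) -> Optional[str]:
    """Return the reachable NameNode with the shortest path from the client
    (ties broken by address so the result is deterministic)."""
    lengths = [(hop_count(topology, client, nn), nn) for nn in namenodes]
    reachable = [(d, nn) for d, nn in lengths if d is not None]
    return min(reachable)[1] if reachable else None
```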
In some embodiments, the client identifier is client location information, and selecting a NameNode whose location matches that of the client includes: selecting, according to the client location information and the NameNode location information, a NameNode located in the same data center as the client.
In some embodiments, selecting a NameNode according to the client identifier includes: determining the data center where the client is located from pre-stored location information associated with the client identifier; determining, from pre-stored NameNode location information, the NameNodes located in the same data center as the client; and selecting a NameNode located in the same data center as the client.
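With pre-stored location tables, selection reduces to a pure lookup with no computation. The table layout and the `same_dc_namenodes` helper below are hypothetical.

```python
from typing import Dict, List


def same_dc_namenodes(client_id: str,
                      client_dc: Dict[str, str],
                      namenode_dc: Dict[str, str]) -> List[str]:
    """Return the NameNodes in the client's data center using only table lookups.

    client_dc:   pre-stored client identifier -> data center
    namenode_dc: pre-stored NameNode address  -> data center
    """
    dc = client_dc[client_id]  # KeyError if the client is unknown
    return [nn for nn, nn_dc in namenode_dc.items() if nn_dc == dc]
```

If this returns more than one NameNode, one of the tie-break rules in the next embodiment picks among them.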
In some embodiments, selecting a NameNode located in the same data center as the client includes: when multiple NameNodes are located in the same data center as the client, selecting one of them at random, selecting one according to the NameNodes' resource utilization, or selecting one according to predetermined NameNode priorities.
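The three tie-break options for several NameNodes in the client's data center can be sketched in one function. The strategy names are hypothetical; for the utilization option the text only requires utilization below the threshold, so choosing the least-loaded of those NameNodes is an added assumption, as is reading a smaller priority number as higher priority.

```python
import random
from typing import Dict, List, Optional


def pick_within_dc(candidates: List[str],
                   strategy: str,
                   utilization: Optional[Dict[str, float]] = None,
                   priority: Optional[Dict[str, int]] = None,
                   threshold: float = 0.9,
                   rng: Optional[random.Random] = None) -> Optional[str]:
    """Choose one NameNode among several in the client's data center.

    strategy: "random", "utilization" (least loaded below threshold),
    or "priority" (smallest number = highest priority).
    """
    if not candidates:
        return None
    if strategy == "random":
        chooser = rng if rng is not None else random
        return chooser.choice(candidates)
    if strategy == "utilization":
        if utilization is None:
            raise ValueError("utilization map required")
        idle = [c for c in candidates if utilization[c] < threshold]
        return min(idle, key=lambda c: utilization[c]) if idle else None
    if strategy == "priority":
        if priority is None:
            raise ValueError("priority map required")
        return min(candidates, key=lambda c: priority[c])
    raise ValueError(f"unknown strategy: {strategy}")
```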
In some embodiments, the HADOOP-based data processing method further includes: receiving the access result from the NameNode and forwarding it to the client.
With this method, a nearby NameNode can be assigned to each client access request, so that a single access agent device distributes access requests among multiple NameNodes in different locations. The method also avoids the heavy inter-data-center traffic that arises when the client is far from the NameNode and that degrades cluster processing performance, thereby optimizing HADOOP cluster performance in multi-data-center scenarios.
According to one aspect of other embodiments of the disclosure, a HADOOP-based access agent device is provided, including: a receiving unit configured to receive an access request from a client, the access request including a client identifier; a node selecting unit configured to select, according to the client identifier, a NameNode whose location matches that of the client; and a forwarding unit configured to forward the access request to the selected NameNode for processing.
In some embodiments, the node selecting unit is further configured to: determine the NameNode closest to the client and that NameNode's resource utilization; and compare the utilization with a predetermined threshold. If the utilization is greater than or equal to the threshold, the unit determines the resource utilization of the NameNode with the next priority and compares it with the threshold, repeating until a NameNode whose utilization is below the threshold is found, and takes that NameNode as the selected NameNode. The shorter the distance between a NameNode and the client, the higher its priority.
In some embodiments, client identification is client address information;Node selecting unit is configured as: according to visitor
Family end address information and the address information of NameNode determine the path between client and NameNode;According to client with
The comparable NameNode in position of path length selection position and client between NameNode.
In some embodiments, client identification is Client location information;Node selecting unit is configured as: according to visitor
The selection of the location information of family end position information and NameNode is located at the NameNode of same computer room with client.
In some embodiments, the node selecting unit is configured to: determine the data center where the client is located from pre-stored location information associated with the client identifier; determine, from pre-stored NameNode location information, the NameNodes located in the same data center as the client; and select a NameNode located in the same data center as the client.
In some embodiments, selecting a NameNode located in the same data center as the client includes: when multiple NameNodes are located in the same data center as the client, selecting one of them at random, selecting one according to the NameNodes' resource utilization, or selecting one according to predetermined NameNode priorities.
In some embodiments, the receiving unit is further configured to receive the access result from the NameNode, and the forwarding unit is further configured to forward the access result to the client.
According to one aspect of still other embodiments of the disclosure, a HADOOP-based access agent device is provided, including: a memory; and a processor coupled to the memory, the processor being configured to perform any one of the HADOOP-based data processing methods above based on instructions stored in the memory.
Such an access agent device can assign a nearby NameNode to each client access request, so that a single access agent device distributes access requests among multiple NameNodes in different locations. It also avoids the heavy inter-data-center traffic that arises when the client is far from the NameNode and that degrades cluster processing performance, thereby optimizing HADOOP cluster performance in multi-data-center scenarios.
According to one aspect of yet other embodiments of the disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored; when executed by a processor, the instructions implement the steps of any one of the HADOOP-based data processing methods above.
By executing the instructions on such a computer-readable storage medium, a nearby NameNode can be assigned to each client access request, so that a single access agent device distributes access requests among multiple NameNodes in different locations. This also avoids the heavy inter-data-center traffic that arises when the client is far from the NameNode and that degrades cluster processing performance, thereby optimizing HADOOP cluster performance in multi-data-center scenarios.
In addition, according to one aspect of some embodiments of the disclosure, a HADOOP-based system is provided, including: clients configured to interact with users; any one of the HADOOP-based access agent devices above; multiple NameNodes configured to manage the namespace of the file system; and multiple DataNodes configured to store data and respond to operation requests from the clients and the NameNodes.
In such a HADOOP-based system, the access agent device can assign a nearby NameNode to each client access request, so that a single access agent device distributes access requests among multiple NameNodes in different locations. It also avoids the heavy inter-data-center traffic that arises when the client is far from the NameNode and that degrades cluster processing performance, thereby optimizing HADOOP cluster performance in multi-data-center scenarios.
Description of the drawings
The accompanying drawings described here provide a further understanding of the disclosure and constitute a part of it. The illustrative embodiments of the disclosure and their descriptions serve to explain the disclosure and do not constitute an improper limitation on it. In the drawings:
Fig. 1 is the flow chart of one embodiment of the data processing method based on HADOOP of the disclosure.
Fig. 2 is the flow chart of another embodiment of the data processing method based on HADOOP of the disclosure.
Fig. 3 is the schematic diagram of one embodiment of the access agent device based on HADOOP of the disclosure.
Fig. 4 is the schematic diagram of another embodiment of the access agent device based on HADOOP of the disclosure.
Fig. 5 is the schematic diagram of yet another embodiment of the access agent device based on HADOOP of the disclosure.
Fig. 6 is the schematic diagram of one embodiment of the system based on HADOOP of the disclosure.
Detailed description of embodiments
The technical solution of the disclosure is described in further detail below with reference to the drawings and embodiments.
Fig. 1 shows a flowchart of one embodiment of the HADOOP-based data processing method of the disclosure.
In step 101, the access agent device receives an access request from a client; the access request includes a client identifier. In one embodiment, the access agent device may be an RBF router, which presents a NameNode-like access interface to clients, forwards access requests to NameNodes, and returns the access results fed back by the NameNodes to the client that initiated the request.
In one embodiment, the client identifier may be a client ID (identification) that uniquely identifies the client; location information of the client, such as the data center where the client is located or its latitude and longitude; or address information of the client, such as an IP (Internet Protocol) address.
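Because the identifier may be an ID, location information or an address, the proxy needs some way to reduce each kind to a data center. The sketch below assumes a pre-stored ID table and a hypothetical subnet-to-data-center table for IP addresses (the text does not say how addresses map to data centers); the identifier classes are illustrative, and address matching uses Python's standard `ipaddress` module.

```python
import ipaddress
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass
class ClientId:
    value: str        # unique client ID


@dataclass
class ClientLocation:
    data_center: str  # location information: the client's data center


@dataclass
class ClientAddress:
    ip: str           # address information: the client's IP address


ClientIdentifier = Union[ClientId, ClientLocation, ClientAddress]


def resolve_data_center(ident: ClientIdentifier,
                        id_table: Dict[str, str],
                        subnet_table: Dict[str, str]) -> Optional[str]:
    """Map any of the three identifier kinds to a data center name."""
    if isinstance(ident, ClientLocation):
        return ident.data_center                # already a location
    if isinstance(ident, ClientId):
        return id_table.get(ident.value)        # pre-stored ID -> data center
    if isinstance(ident, ClientAddress):
        addr = ipaddress.ip_address(ident.ip)
        for cidr, dc in subnet_table.items():   # hypothetical subnet plan
            if addr in ipaddress.ip_network(cidr):
                return dc
        return None
    raise TypeError(f"unsupported identifier: {ident!r}")
```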
In step 102, the access agent device selects, according to the client identifier, a NameNode whose location matches that of the client. In one embodiment, the access agent device may determine the NameNodes located in the same data center as the client from the pre-stored data center of the client and the data center of each NameNode.
In step 103, the access request is forwarded to the selected NameNode. In one embodiment, RBF may query its mount table to obtain the NameNode address to which the virtual directory is mapped; having obtained that address, it acts as a client and accesses the target NameNode, which is unaware of this proxying.
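The mount-table lookup in step 103 can be sketched as a longest-prefix match over path components. This is an illustration, not Hadoop's own resolver: real RBF mount entries map a path to a nameservice and a destination path, whereas this hypothetical table maps directly to a NameNode address and returns the part of the path below the mount point.

```python
from typing import Dict, Optional, Tuple


def resolve_mount(mount_table: Dict[str, str], path: str) -> Optional[Tuple[str, str]]:
    """Map a virtual path to (NameNode address, path relative to the mount point).

    Mount points are assumed normalized: absolute, no trailing slash except "/".
    """
    # A mount point matches only on whole path components ("/data" does not
    # match "/database").
    matches = [mp for mp in mount_table
               if mp == "/" or path == mp or path.startswith(mp + "/")]
    if not matches:
        return None
    best = max(matches, key=len)  # deepest (longest) mount point wins
    remainder = path if best == "/" else (path[len(best):] or "/")
    return mount_table[best], remainder
```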
In the related art, if a single RBF deployment is configured for the NameNode clusters of all data centers, RBF cannot identify where each NameNode is located, so a client's service request may be assigned to a node in a remote data center. Traffic between data centers in different physical locations can then become very large, seriously degrading cluster processing performance. Alternatively, if a separate RBF deployment is configured for the NameNode cluster of each physical data center, the growth of inter-data-center traffic can be avoided, but user programs become bound to the RBF configuration of their own data center: when a user program needs to run in another data center, the RBF configuration in the user code must be modified. This is tedious and error-prone, and it makes code maintenance and cluster operations difficult.
With the method in the embodiments of the disclosure, a single RBF can assign a nearby NameNode to each client access request, so that a single access agent device distributes access requests among multiple NameNodes in different locations. This avoids the problem of user programs being unable to migrate smoothly between data centers, together with the resulting difficulty of code maintenance and cluster operations. It also avoids the heavy inter-data-center traffic that arises when the client is far from the NameNode and that degrades cluster processing performance, thereby optimizing HADOOP cluster performance in multi-data-center scenarios.
Fig. 2 shows a flowchart of another embodiment of the HADOOP-based data processing method of the disclosure.
In step 201, the access agent device receives an access request from a client; the access request includes a client identifier.
In step 202, the access agent device determines the NameNode closest to the client and determines that NameNode's resource utilization. In one embodiment, priority may be determined from the distance between the access agent device and the client, a shorter distance meaning a higher priority; alternatively, the NameNodes matched to each client and their priorities may be stored in advance. In one embodiment, a NameNode located in the same data center as the client has the highest priority.
In step 203, it is determined whether the NameNode's resource utilization is greater than or equal to a predetermined threshold (for example, a threshold set between 80% and 95%). If so, the NameNode is considered busy and step 204 is executed; if the utilization is below the threshold, step 206 is executed.
In step 204, the NameNode with the next priority and its resource utilization are determined.
In step 205, it is determined whether that NameNode's resource utilization is greater than or equal to the predetermined threshold. If so, the NameNode is considered busy and step 204 is executed again; otherwise, step 206 is executed.
In step 206, that NameNode is selected as the NameNode that handles this access request.
In step 207, the access request is forwarded to the selected NameNode.
In one embodiment, the HADOOP-based data processing method may further include step 208: receiving the access result from the NameNode and forwarding it to the client. In one embodiment, the access result includes the client identifier or client address information of the client that initiated the access request, and the access result can be returned to the corresponding client according to that identifier or address information.
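Step 208 can be sketched as a relay that keeps a reply channel per pending client and routes each result by the client identifier it carries. The message shape (`client_id` and `payload` keys) and the `ResultRelay` class are hypothetical, and the sketch assumes one outstanding request per client; a production proxy would more likely match results to requests on its RPC connection.

```python
from typing import Callable, Dict


class ResultRelay:
    """Hypothetical relay: remembers how to reply to each client with a pending
    request and routes NameNode results back by client identifier."""

    def __init__(self) -> None:
        self._pending: Dict[str, Callable[[str], None]] = {}

    def register(self, client_id: str, reply: Callable[[str], None]) -> None:
        # Called when the access request is forwarded (step 207).
        self._pending[client_id] = reply

    def on_result(self, result: Dict[str, str]) -> None:
        # Step 208: the result echoes the identifier of the requesting client.
        reply = self._pending.pop(result["client_id"], None)
        if reply is None:
            raise KeyError(f"no pending request for {result['client_id']}")
        reply(result["payload"])
```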
With this method, the busy or idle state of each NameNode is taken into account. Besides addressing the increased traffic caused by calling remote NameNodes, this avoids processing delays caused by overloading a single NameNode, improving the processing efficiency and reliability of the system.
In one embodiment, client identification is client address information;Access agent device can be known by address
Que Ding not be located at the NameNode of same computer room with client, or according to the address of each NameNode determine client with
Path between NameNode, the position of position and client that selection is determined according to the path length between client and NameNode
Set comparable NameNode.
In this way, not only can a NameNode located in the same data center as the client be selected, but NameNodes within a single data center can also be chosen by forwarding-path length: either the NameNode with the shortest forwarding path is selected, or NameNode priorities are assigned from high to low in order of forwarding path from shortest to longest. This further reduces data-forwarding pressure and improves HADOOP cluster performance.
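Assigning priorities from forwarding-path length (shortest path first) can be sketched as a dense ranking. The `priorities_from_paths` helper is hypothetical, and letting NameNodes with equal path lengths share a priority is an assumption the text does not state.

```python
from typing import Dict


def priorities_from_paths(path_lengths: Dict[str, int]) -> Dict[str, int]:
    """Assign priority 1 to the shortest forwarding path, 2 to the next, and so on.

    NameNodes with equal path lengths share a priority (dense ranking).
    """
    distinct = sorted(set(path_lengths.values()))
    rank = {length: i + 1 for i, length in enumerate(distinct)}
    return {nn: rank[length] for nn, length in path_lengths.items()}
```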
In one embodiment, the client identifier may be client location information (such as the identifier of the data center where the client is located); alternatively, the access agent device may store in advance the correspondence between each client identifier and client location and determine the client location information from the client identifier. The access agent device can then determine, from the pre-stored location of each NameNode, the NameNodes located in the same data center as the client. This approach involves only simple data matching and no computation, making NameNode determination more efficient.
In some embodiments, when multiple NameNodes are located in the same data center as the client, one of them is selected to receive the forwarded access request. In one embodiment, the access agent device may randomly select a NameNode in that data center, select a NameNode whose resource utilization is below the predetermined threshold, or select a NameNode according to predetermined NameNode priorities.
This allows a further choice among the NameNodes within a single data center, further improving HADOOP cluster performance.
Fig. 3 shows a schematic diagram of one embodiment of the HADOOP-based access agent device of the disclosure. The receiving unit 301 can receive an access request from a client; the access request includes a client identifier. In one embodiment, the client identifier may be a client ID that uniquely identifies the client; location information of the client, such as the data center where the client is located or its latitude and longitude; or address information of the client, such as an IP address.
The node selecting unit 302 can select, according to the client identifier, a NameNode whose location matches that of the client. In one embodiment, the node selecting unit 302 may determine the NameNodes located in the same data center as the client from the pre-stored data center of the client and the data center of each NameNode.
The forwarding unit 303 can forward the access request to the selected NameNode. In one embodiment, the forwarding unit 303 may query the mount table to obtain the NameNode address to which the virtual directory is mapped; having obtained that address, it acts as a client and accesses the target NameNode, so that the NameNode is unaware of the proxying.
Such an access agent device can assign a nearby NameNode to each client access request, so that a single access agent device distributes access requests among multiple NameNodes in different locations. It also avoids the heavy inter-data-center traffic that arises when the client is far from the NameNode and that degrades cluster processing performance, thereby optimizing HADOOP cluster performance in multi-data-center scenarios.
In one embodiment, the node selecting unit 302 may check, in order of NameNode priority, whether each NameNode's resource utilization is below the predetermined threshold, and select the first NameNode whose utilization is below the threshold as the NameNode that handles the access request. In one embodiment, the node selecting unit 302 may determine priority from the distance between the access agent device and the client, a shorter distance meaning a higher priority; in another embodiment, the access agent device may store in advance the NameNodes matched to each client and their priorities. In one embodiment, a NameNode located in the same data center as the client has the highest priority.
Such a device takes the busy or idle state of each NameNode into account. Besides addressing the increased traffic caused by calling remote NameNodes, it avoids processing delays caused by overloading a single NameNode, improving the processing efficiency and reliability of the system.
In one embodiment, the client identifier is client address information. The node selecting unit 302 may identify, from the address, the NameNodes located in the same data center as the client; or it may determine the path between the client and each NameNode from the NameNode addresses and select, according to the path length between the client and the NameNode, a NameNode whose location matches that of the client.
Such a device can not only select a NameNode located in the same data center as the client, but can also choose among NameNodes within a single data center by forwarding-path length: either the NameNode with the shortest forwarding path is selected, or NameNode priorities are assigned from high to low in order of forwarding path from shortest to longest. This further reduces data-forwarding pressure and improves HADOOP cluster performance.
In one embodiment, the client identifier may be client location information (such as the identifier of the data center where the client is located); alternatively, the access agent device may store in advance the correspondence between each client identifier and client location and determine the client location information from the client identifier. The node selecting unit 302 can then determine, from the pre-stored location of each NameNode, the NameNodes located in the same data center as the client. When selecting a NameNode, such a device performs only simple data matching and no computation, making NameNode determination more efficient.
In some embodiments, when multiple NameNodes are located in the same data center as the client, the node selecting unit 302 selects one of them to receive the forwarded access request. In one embodiment, the node selecting unit 302 may randomly select a NameNode in that data center, select a NameNode whose resource utilization is below the predetermined threshold, or select a NameNode according to predetermined NameNode priorities. This allows a further choice among the NameNodes within a single data center, further improving HADOOP cluster performance.
Fig. 4 shows a structural schematic diagram of one embodiment of the HADOOP-based access agent device of the disclosure. The HADOOP-based access agent device includes a memory 401 and a processor 402. The memory 401 may be a magnetic disk, flash memory, or any other non-volatile storage medium, and stores the instructions of the corresponding embodiments of the HADOOP-based data processing method above. The processor 402 is coupled to the memory 401 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. By executing the instructions stored in the memory, the processor 402 can optimize HADOOP cluster performance in multi-data-center scenarios.
In one embodiment, as shown in Fig. 5, the HADOOP-based access agent device 500 includes a memory 501 and a processor 502. The processor 502 is coupled to the memory 501 through a bus 503. The device 500 may also be connected to an external storage device 505 through a storage interface 504 to access external data, and to a network or another computer system (not shown) through a network interface 506. These are not described in detail here.
In this embodiment, data and instructions are stored in the memory and the instructions are processed by the processor, which optimizes HADOOP cluster performance in multi-data-center scenarios.
In another embodiment, a computer-readable storage medium stores computer program instructions that, when executed by a processor, implement the steps of the method in the corresponding embodiments of the HADOOP-based data processing method. Those skilled in the art should understand that embodiments of the disclosure may be provided as a method, an apparatus, or a computer program product. Accordingly, the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the disclosure may take the form of a computer program product implemented on one or more non-transitory computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
Fig. 6 shows a schematic diagram of one embodiment of the HADOOP-based system of the disclosure. Clients 611 to 61n, which may be located in any data center, send access requests to the NameNodes, where n is a positive integer.
The HADOOP-based access agent device 62 may be any of the HADOOP-based access agent devices described above. In one embodiment, the access agent device is an improved RBF (HDFS Router-Based Federation). In one embodiment, a single HADOOP deployment contains only one HADOOP-based access agent device. The access agent device 62 can emulate a NameNode to provide clients with an access interface, select a NameNode using any of the HADOOP-based data processing methods described above, and forward the access request to it; it can also relay the access result returned by the NameNode back to the client that initiated the request.
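The proxy behavior described for device 62 (pose as a NameNode, choose a NameNode, forward the request, relay the result) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not RBF's actual router code: the `AccessProxy` and `AccessRequest` classes, the in-process callables standing in for NameNode RPCs, and the room lookup tables are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AccessRequest:
    client_id: str  # client identifier carried in the request
    path: str       # hypothetical: the file system path being accessed


@dataclass
class AccessProxy:
    """Hypothetical access agent presenting a NameNode-like interface to clients."""
    # NameNode name -> callable standing in for an RPC that executes the request
    namenodes: Dict[str, Callable[[AccessRequest], str]]
    # Selection policy: request -> NameNode name (e.g. nearest computer room)
    select: Callable[[AccessRequest], str]

    def handle(self, request: AccessRequest) -> str:
        chosen = self.select(request)             # choose a NameNode for this client
        result = self.namenodes[chosen](request)  # forward the access request
        return result                             # relay the access result to the client


# Usage: clients in roomA go to nn-a, clients in roomB go to nn-b.
room_of_client = {"c1": "roomA", "c2": "roomB"}
namenode_in_room = {"roomA": "nn-a", "roomB": "nn-b"}
proxy = AccessProxy(
    namenodes={
        "nn-a": lambda r: f"nn-a served {r.path}",
        "nn-b": lambda r: f"nn-b served {r.path}",
    },
    select=lambda r: namenode_in_room[room_of_client[r.client_id]],
)
print(proxy.handle(AccessRequest("c2", "/logs/day1")))  # nn-b served /logs/day1
```

Keeping the selection policy as a pluggable callable mirrors the text's point that device 62 may use any of the selection methods described above.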
NameNodes 631 to 63m manage the namespace of the file system, send instructions to DataNodes according to access requests, and return access results to the HADOOP-based access agent device, where m is a positive integer.
DataNodes 641 to 64i store data and respond to operation requests from clients and NameNodes, where i is a positive integer.
In such a HADOOP-based system, the access agent device can assign each client's access request to a nearby NameNode. A single access agent device thus distributes access requests across multiple NameNodes at different locations, and avoids the heavy inter-computer-room traffic, and the resulting loss of cluster processing performance, that arise when a client is far from its NameNode. This optimizes HADOOP cluster performance in multi-computer-room scenarios.
The present disclosure is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the disclosure. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by that processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is executed on the computer or other programmable device to produce a computer-implemented process, and the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The present disclosure has now been described in detail. To avoid obscuring its concepts, some details well known in the art have not been described. From the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The methods and devices of the present disclosure may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The order of method steps given above is for illustration only; unless otherwise stated, the steps of the disclosed methods are not limited to the order specifically described. In addition, in some embodiments the disclosure may be implemented as programs recorded on a recording medium, these programs including machine-readable instructions for implementing the methods according to the disclosure. Accordingly, the disclosure also covers a recording medium storing a program for executing the methods according to the disclosure.
Finally, it should be noted that the above embodiments are intended only to illustrate, not to limit, the technical solutions of the present disclosure. Although the disclosure has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific embodiments may still be modified, or some of their technical features equivalently replaced, without departing from the spirit of the technical solutions of the disclosure; all such modifications and replacements shall fall within the scope of the technical solutions claimed by the disclosure.
Claims (17)
1. A HADOOP-based data processing method, comprising:
receiving an access request from a client, the access request including a client identifier;
selecting, according to the client identifier, a name node (NameNode) whose location is close to the location of the client; and
forwarding the access request to the selected NameNode.
2. The method according to claim 1, wherein selecting the NameNode whose location is close to the location of the client comprises:
determining the NameNode closest to the location of the client and the running resource occupancy of that NameNode;
comparing the running resource occupancy of the NameNode with a predetermined threshold;
if the running resource occupancy of the NameNode is greater than or equal to the predetermined threshold, determining the running resource occupancy of the NameNode of the next priority and comparing it with the predetermined threshold, and, if that occupancy is also greater than or equal to the threshold, continuing with the NameNode of the next priority, until a NameNode whose running resource occupancy is less than the predetermined threshold is found, which is determined as the selected NameNode;
wherein the shorter the distance between a NameNode and the client, the higher the priority of that NameNode.
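The priority walk in claim 2 amounts to ordering NameNodes by distance to the client and taking the first one whose running resource occupancy is below the threshold. A minimal sketch, assuming the proxy already knows each NameNode's distance and its occupancy as a fraction between 0 and 1; the function name `select_namenode` is hypothetical, and returning `None` when every NameNode is at or above the threshold is an assumption, since the claim does not address that case.

```python
from typing import Dict, Optional


def select_namenode(
    distance: Dict[str, float],   # NameNode -> distance to the client
    occupancy: Dict[str, float],  # NameNode -> running resource occupancy (0.0 to 1.0)
    threshold: float,             # predetermined threshold
) -> Optional[str]:
    """Walk NameNodes from highest to lowest priority (shortest distance first)
    and return the first one whose occupancy is below the threshold."""
    by_priority = sorted(distance, key=lambda nn: distance[nn])
    for nn in by_priority:
        if occupancy[nn] < threshold:
            return nn
    return None  # assumption: all NameNodes saturated, caller decides what to do


# Usage: the nearest NameNode (nn-a) is overloaded, so the next one (nn-c) is chosen.
print(select_namenode(
    distance={"nn-a": 1, "nn-b": 5, "nn-c": 3},
    occupancy={"nn-a": 0.95, "nn-b": 0.2, "nn-c": 0.6},
    threshold=0.8,
))  # nn-c
```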
3. The method according to claim 1 or 2, wherein the client identifier is client address information, and
selecting the NameNode whose location is close to the location of the client comprises:
determining the path between the client and each NameNode according to the client address information and the address information of the NameNode; and
selecting, according to the path length between the client and each NameNode, the NameNode whose location is close to the location of the client.
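Claim 3 does not say how a path is derived from the addresses. One plausible reading, shown here purely as an assumption, maps each address to a position in a tree such as `/room/rack/host` and counts the hops from each node up to their closest common ancestor, similar in spirit to the distance Hadoop's network topology uses for rack awareness. The `topology` table, the addresses, and both function names are hypothetical.

```python
from typing import Dict


def path_length(a: str, b: str) -> int:
    """Hops between two leaves given as '/room/rack/host' paths: the depth of
    each leaf below their closest common ancestor, summed."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    return (len(pa) - common) + (len(pb) - common)


def nearest_namenode(client_addr: str, namenode_addrs: Dict[str, str],
                     topology: Dict[str, str]) -> str:
    """topology maps an address (hypothetically an IP) to its tree path."""
    client_path = topology[client_addr]
    return min(namenode_addrs,
               key=lambda nn: path_length(client_path, topology[namenode_addrs[nn]]))


topology = {
    "10.0.1.5": "/roomA/rack1/host5",  # the client
    "10.0.1.9": "/roomA/rack2/host9",  # nn-a: same room, different rack (4 hops)
    "10.1.0.2": "/roomB/rack1/host2",  # nn-b: different room (6 hops)
}
print(nearest_namenode("10.0.1.5", {"nn-a": "10.0.1.9", "nn-b": "10.1.0.2"}, topology))  # nn-a
```

With this convention, crossing a computer-room boundary always costs more hops than staying inside one room, which matches the claim's goal of keeping traffic local.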
4. The method according to claim 1, wherein the client identifier is client location information, and
selecting the NameNode whose location is close to the location of the client comprises:
selecting, according to the client location information and the location information of the NameNodes, a NameNode located in the same computer room as the client.
5. The method according to claim 1, wherein selecting the NameNode according to the client identifier comprises:
determining the computer room in which the client is located according to prestored location information for client identifiers;
determining the NameNodes located in the same computer room as the client according to prestored location information for the NameNodes; and
selecting a NameNode located in the same computer room as the client.
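Claims 4 and 5 rely on prestored location information for clients and NameNodes. A minimal sketch, assuming two hypothetical in-memory tables that map identifiers to computer rooms; the table names and the function `same_room_namenodes` are illustrative only.

```python
from typing import Dict, List

# Hypothetical prestored location tables: identifier -> computer room
CLIENT_ROOM: Dict[str, str] = {"client-17": "roomA", "client-42": "roomB"}
NAMENODE_ROOM: Dict[str, str] = {"nn-1": "roomA", "nn-2": "roomB", "nn-3": "roomB"}


def same_room_namenodes(client_id: str) -> List[str]:
    """Return the NameNodes located in the same computer room as the client."""
    room = CLIENT_ROOM[client_id]  # computer room of the client
    return [nn for nn, r in NAMENODE_ROOM.items() if r == room]


print(same_room_namenodes("client-42"))  # ['nn-2', 'nn-3']
```

An empty result (no NameNode in the client's room) is left to the caller here, for example to fall back to a distance-based choice; the claims do not specify this case.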
6. The method according to claim 4 or 5, wherein selecting the NameNode located in the same computer room as the client comprises:
when multiple NameNodes are located in the same computer room as the client, selecting from among those NameNodes:
one NameNode at random; or a NameNode according to the running resource occupancy of the NameNodes; or a NameNode according to predetermined priorities of the NameNodes.
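Claim 6 lists three ways to choose among several NameNodes in the client's computer room. They can be sketched as interchangeable strategies; the strategy names, the function `pick_among`, and the convention that a smaller number means a higher predetermined priority are assumptions for illustration.

```python
import random
from typing import Dict, List, Optional


def pick_among(candidates: List[str], strategy: str,
               occupancy: Optional[Dict[str, float]] = None,
               priority: Optional[Dict[str, int]] = None,
               rng: Optional[random.Random] = None) -> str:
    """Choose one NameNode from same-room candidates using the given strategy."""
    if strategy == "random":
        return (rng or random.Random()).choice(candidates)
    if strategy == "occupancy":
        if occupancy is None:
            raise ValueError("occupancy map required")
        return min(candidates, key=lambda nn: occupancy[nn])  # least-loaded NameNode
    if strategy == "priority":
        if priority is None:
            raise ValueError("priority map required")
        # assumption: smaller number = higher predetermined priority
        return min(candidates, key=lambda nn: priority[nn])
    raise ValueError(f"unknown strategy: {strategy}")


same_room = ["nn-2", "nn-3"]
print(pick_among(same_room, "occupancy", occupancy={"nn-2": 0.7, "nn-3": 0.3}))  # nn-3
print(pick_among(same_room, "priority", priority={"nn-2": 1, "nn-3": 2}))        # nn-2
```

Passing a seeded `random.Random` through `rng` makes the random strategy reproducible in tests.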
7. The method according to claim 1, further comprising:
receiving an access result from the NameNode and forwarding it to the client.
8. A HADOOP-based access agent device, comprising:
a receiving unit configured to receive an access request from a client, the access request including a client identifier;
a node selecting unit configured to select, according to the client identifier, a name node (NameNode) whose location is close to the location of the client; and
a forwarding unit configured to forward the access request to the selected NameNode for handling by that NameNode.
9. The device according to claim 8, wherein the node selecting unit is further configured to:
determine the NameNode closest to the location of the client and the running resource occupancy of that NameNode;
compare the running resource occupancy of the NameNode with a predetermined threshold;
if the running resource occupancy of the NameNode is greater than or equal to the predetermined threshold, determine the running resource occupancy of the NameNode of the next priority and compare it with the predetermined threshold, and, if that occupancy is also greater than or equal to the threshold, continue with the NameNode of the next priority, until a NameNode whose running resource occupancy is less than the predetermined threshold is found, which is determined as the selected NameNode;
wherein the shorter the distance between a NameNode and the client, the higher the priority of that NameNode.
10. The device according to claim 8 or 9, wherein the client identifier is client address information, and
the node selecting unit is configured to:
determine the path between the client and each NameNode according to the client address information and the address information of the NameNode; and
select, according to the path length between the client and each NameNode, the NameNode whose location is close to the location of the client.
11. The device according to claim 8, wherein the client identifier is client location information, and
the node selecting unit is configured to select, according to the client location information and the location information of the NameNodes, a NameNode located in the same computer room as the client.
12. The device according to claim 8, wherein the node selecting unit is configured to:
determine the computer room in which the client is located according to prestored location information for client identifiers;
determine the NameNodes located in the same computer room as the client according to prestored location information for the NameNodes; and
select a NameNode located in the same computer room as the client.
13. The device according to claim 11 or 12, wherein selecting the NameNode located in the same computer room as the client comprises:
when multiple NameNodes are located in the same computer room as the client, selecting from among those NameNodes:
one NameNode at random; or a NameNode according to the running resource occupancy of the NameNodes; or a NameNode according to predetermined priorities of the NameNodes.
14. The device according to claim 8, wherein
the receiving unit is further configured to receive an access result from the NameNode; and
the forwarding unit is further configured to forward the access result to the client.
15. A HADOOP-based access agent device, comprising:
a memory; and
a processor coupled to the memory, the processor being configured to execute, based on instructions stored in the memory, the method according to any one of claims 1 to 7.
16. A computer-readable storage medium storing computer program instructions which, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
17. A HADOOP-based system, comprising:
a client configured to interact with a user;
the HADOOP-based access agent device according to any one of claims 8 to 15;
a plurality of name nodes (NameNodes) configured to manage the namespace of the file system; and
a plurality of data nodes (DataNodes) configured to store data and to respond to operation requests from the client and the NameNodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811440934.1A CN109302497A (en) | 2018-11-29 | 2018-11-29 | Data processing method, access agent device and system based on HADOOP |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109302497A | 2019-02-01 |
Family ID: 65141461
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN201811440934.1A | Pending | 2018-11-29 | 2018-11-29 | Data processing method, access agent device and system based on HADOOP |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110198346A * | 2019-05-06 | 2019-09-03 | Beijing Sankuai Online Technology Co., Ltd. | Data reading method and device, electronic equipment and readable storage medium |
CN111083204A * | 2019-11-29 | 2020-04-28 | Guangzhou Baiguoyuan Information Technology Co., Ltd. | File transmission method, device and storage medium |
Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103023902A * | 2012-12-11 | 2013-04-03 | Beijing Qihoo Technology Co., Ltd. | Data transmission method and system |
CN103491192A * | 2013-09-30 | 2014-01-01 | Beijing Sohu New Media Information Technology Co., Ltd. | NameNode switching method and system of distributed system |
CN103678360A * | 2012-09-13 | 2014-03-26 | Tencent Technology (Shenzhen) Co., Ltd. | Data storage method and device for distributed file system |
CN103685611A * | 2013-12-31 | 2014-03-26 | Hillstone Networks Communication Technology Co., Ltd. | Network access processing method and device |
CN103729250A * | 2012-10-11 | 2014-04-16 | International Business Machines Corporation | Method and system to select data nodes configured to satisfy a set of requirements |
US20140244701A1 * | 2013-02-25 | 2014-08-28 | Emc Corporation | Data analytics platform over parallel databases and distributed file systems |
US20150378618A1 * | 2012-01-18 | 2015-12-31 | Cloudera, Inc. | Memory allocation buffer for reduction of heap fragmentation |
CN105554125A * | 2015-04-24 | 2016-05-04 | Meitong Yundong (Beijing) Technology Co., Ltd. | Method for realizing webpage adaptation through CDN (content delivery network) and system thereof |
CN105871985A * | 2015-12-10 | 2016-08-17 | LeTV Information Technology (Beijing) Co., Ltd. | Data access request processing method and apparatus, server, client and system |
CN106649847A * | 2016-12-30 | 2017-05-10 | Linewell Software Co., Ltd. | Big data real-time processing system based on Hadoop |
CN107493331A * | 2017-08-16 | 2017-12-19 | Wangsu Science & Technology Co., Ltd. | Client access method, server and system |
CN107992491A * | 2016-10-26 | 2018-05-04 | ***Communications Co., Ltd. Research Institute | Distributed file system, and data access and data storage method and device |
CN108768985A * | 2018-05-17 | 2018-11-06 | Chengdu Zhiyun Technology Co., Ltd. | Access node allocation method and device |
Non-Patent Citations (1)
Title |
---|
Hu Bo (胡博): "A File Resource Scheduling Mechanism Across HDFS Clusters" (一种跨HDFS集群的文件资源调度机制), Chinese Journal of Computers (《计算机学报》) *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10466899B2 (en) | Selecting controllers based on affinity between access devices and storage segments | |
CN105408863B (en) | The end-point data center collected with different tenants | |
JP6470426B2 (en) | Resource allocation device and resource allocation method | |
JP2019522846A5 (en) | ||
CA3043198A1 (en) | Selecting threads for concurrent processing of data | |
US10523753B2 (en) | Broadcast data operations in distributed file systems | |
CN109302466A (en) | Data processing method, relevant device and computer storage medium | |
CN106155264B (en) | Manage the computer approach and computer system of the power consumption of storage subsystem | |
CN109302497A (en) | Data processing method, access agent device and system based on HADOOP | |
CN107786669A (en) | A kind of method of load balance process, server, device and storage medium | |
CN111327651A (en) | Resource downloading method, device, edge node and storage medium | |
CN108173893A (en) | For the method and apparatus of networking | |
CN108563697A (en) | A kind of data processing method, device and storage medium | |
CN104796336B (en) | A kind of method and device for being configured, issuing flow table item | |
US9164800B2 (en) | Optimizing latencies in cloud systems by intelligent compute node placement | |
CN108259218A (en) | A kind of IP address distribution method and device | |
US9641611B2 (en) | Logical interface encoding | |
JP2016116184A (en) | Network monitoring device and virtual network management method | |
JP6951846B2 (en) | Computer system and task allocation method | |
Hsu et al. | Virtual network mapping through path splitting and migration | |
CN108153494B (en) | A kind of I/O request processing method and processing device | |
CN104780235B (en) | IP attribution inquiry method, device and server | |
US20140225896A1 (en) | Resource oriented dependency graph for network configuration | |
JP6256167B2 (en) | Risk reduction in data center networks | |
CN110213365B (en) | User access request processing method based on user partition and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190201 |