CN116991618A - Information processing method and device - Google Patents


Info

Publication number
CN116991618A
Authority
CN
China
Prior art keywords
control unit
main control
master
computing
master control
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310961445.5A
Other languages
Chinese (zh)
Inventor
李栋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd
Priority to CN202310961445.5A
Publication of CN116991618A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Hardware Redundancy (AREA)

Abstract

An embodiment of the application discloses an information processing method and device. A cluster comprises a master control unit group, and the plurality of computing resources corresponding to the group are applied for by at least one master control unit in the group. On receiving a computing task, each master control unit in the group can parse the task to determine a plurality of subtasks and issue the subtasks to at least one of the applied-for computing resources.

Description

Information processing method and device
Technical Field
The present application relates to the field of data processing technologies, and in particular, to an information processing method and apparatus.
Background
With the advent of the big data age, distributed computing engines have developed rapidly. In a distributed computing engine, a standard application cluster consists of one master control unit and a plurality of computing resources. The master control unit receives a computing request from the client, parses the request, generates an execution plan, schedules tasks, and collects the computing results and returns them to the client. Each computing resource receives and executes the actual computing tasks from the master control unit and returns the computing results to the master control unit. In this architecture, once the master control unit becomes abnormal, the whole application cluster fails completely and can no longer provide service.
Disclosure of Invention
The application aims to provide an information processing method and device, which comprise the following technical scheme:
an information processing method for any master unit in a master unit group of a cluster, the method comprising:
when a computing task is received, analyzing the computing task to determine a plurality of subtasks;
determining at least one computing resource among a plurality of computing resources corresponding to the master control unit group, the plurality of computing resources having been applied for by at least one master control unit in the master control unit group;
issuing the plurality of subtasks to the at least one computing resource.
Optionally, the method further comprises:
sharing information with other master control units in the master control unit group.
Optionally, sharing information with other master control units in the master control unit group comprises:
acquiring state information and/or load information of said master control unit, and/or state information of each computing resource in the plurality of computing resources;
synchronizing the state information and/or load information of said master control unit, and/or the state information of each computing resource, to the other master control units in the master control unit group;
wherein, for any one of the plurality of computing resources, the state information of that computing resource acquired by each master control unit is used for determining whether that computing resource is abnormal.
Optionally, the method further comprises:
applying to a resource manager for computing resources when a computing resource application condition is met;
wherein meeting the computing resource application condition includes:
the plurality of computing resources cannot meet the computing requirements; or,
the plurality of computing resources cannot meet the computing requirements, and it is the turn of said master control unit according to a polling order.
Optionally, parsing the computing task to determine a plurality of subtasks when the computing task is received comprises:
when the computing task is received, if the load of said master control unit meets a processing condition, parsing the computing task to determine a plurality of subtasks.
The method, optionally, further comprises:
if the load of any master control unit does not meet the processing condition, determining a proxy master control unit in the master control unit group; the load of the proxy main control unit meets the processing condition;
and sending the calculation task to the proxy main control unit.
The method, optionally, the load of any master control unit is determined based on at least one of the following:
the sum of the number of the sub-tasks which are running and waiting to run and correspond to any main control unit; the load of any master control unit is positively related to the sum of the numbers;
the proportion of the CPU occupied by the running subtasks corresponding to any main control unit; the load of any master control unit is positively related to the ratio;
the average calculation time of the running tasks corresponding to any main control unit; the load of any master control unit is positively related to the average calculation time.
An information processing method is used for any one of a plurality of computing resources corresponding to a main control unit group of a cluster, wherein the computing resources are obtained by applying for at least one main control unit in the main control unit group; the method comprises the following steps:
receiving at least one subtask sent by any master control unit in the master control unit group;
processing the at least one subtask to obtain a calculation result;
and returning the calculation result to any main control unit.
An information processing apparatus for any master unit in a master unit group of a cluster, the apparatus comprising:
The analysis module is used for analyzing the calculation task when receiving the calculation task so as to determine a plurality of subtasks;
the determining module is used for determining at least one computing resource from a plurality of computing resources corresponding to the main control unit group; the plurality of computing resources are obtained by applying for at least one main control unit in the main control unit group;
and the issuing module is used for issuing the plurality of subtasks to the at least one computing resource.
An information processing device is used for any one of a plurality of computing resources corresponding to a main control unit group of a cluster, wherein the computing resources are obtained by applying for at least one main control unit in the main control unit group; the device comprises:
the receiving module is used for receiving at least one subtask sent by any master control unit in the master control unit group;
the processing module is used for processing the at least one subtask to obtain a calculation result;
and the return module is used for returning the calculation result to any main control unit.
An electronic device, comprising:
a memory for storing a program;
a processor for calling and executing the program in the memory, and implementing the respective steps of the information processing method according to any one of the above by executing the program.
A readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the information processing method according to any of the preceding claims.
According to the information processing method and device provided by the application, the cluster comprises a master control unit group, and the plurality of computing resources corresponding to the group are applied for by at least one master control unit in the group. On receiving a computing task, each master control unit in the group can parse the task to determine a plurality of subtasks and issue them to at least one of the applied-for computing resources. Therefore, even if one master control unit becomes abnormal, the other master control units can keep operating. This avoids the situation in which a single abnormal master control unit causes the whole application cluster to fail and stop providing service, and thus improves the stability of the cluster.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram illustrating an exemplary architecture of an application cluster according to an embodiment of the present application;
FIG. 2 is a diagram illustrating another architecture example of an application cluster according to an embodiment of the present application;
FIG. 3 is a flowchart of an implementation of an information processing method according to an embodiment of the present application;
FIG. 4 is a flowchart of another implementation of the information processing method according to the embodiment of the present application;
FIG. 5 is a diagram illustrating yet another exemplary architecture of an application cluster according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an information processing apparatus according to an embodiment of the present application;
FIG. 7 is a schematic diagram of another configuration of an information processing apparatus according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
The terms "first," "second," "third," "fourth" and the like in the description and in the claims and in the above drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in other sequences than those illustrated herein.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without any inventive effort, are intended to be within the scope of the application.
The information processing method and device provided by the application are used in a distributed computing engine, which may be a Spark computing engine or another distributed computing engine, for example a Hadoop, Hive, Presto, or StarRocks computing engine.
FIG. 1 shows an example architecture of an application cluster (also simply called a cluster) according to an embodiment of the present application. The application cluster includes one master control unit (Driver) and m (m is greater than or equal to 2) computing resources (Executors), and all m computing resources in the application cluster are applied for by that master control unit. In this example, once the master control unit becomes abnormal, the whole application cluster fails completely and can no longer provide service.
To improve the availability of the master control unit in the distributed computing engine, the embodiment of the application configures a master control unit group for each application cluster. The group comprises at least two master control units, and the plurality of computing resources in each application cluster are applied for by the master control unit group of that cluster. The number of master control units in the master control unit group may be the same or different across application clusters; similarly, the number of computing resources may be the same or different across application clusters.
FIG. 2 shows another example architecture of an application cluster according to an embodiment of the present application. The application cluster includes n (n is greater than or equal to 2) master control units and m (m is greater than or equal to 2) computing resources, where the m computing resources are applied for by at least one of the n master control units.
Based on the architecture shown in FIG. 2, the present application provides an information processing method for any master control unit in a master control unit group. As shown in FIG. 3, the information processing method according to an embodiment of the present application may include:
Step S301: when a computing task is received, parse the computing task to determine a plurality of subtasks.
In the embodiment of the application, after an application cluster receives a computing task from the client, one master control unit in the master control unit group can be selected as the target master control unit according to a preset master control unit scheduling rule, and the computing task is then distributed to the target master control unit. Any master control unit can therefore receive the computing task when it is selected as the target master control unit. When a master control unit receives the computing task, it parses the task into a plurality of subtasks; for the specific parsing process, reference may be made to existing schemes, which are not described in detail here.
After parsing is completed, to facilitate interaction between the master control unit and the computing resources, each subtask carries, in addition to its own subtask identifier, the identification information of the master control unit that parsed it. In this way, after the master control unit issues a subtask to the selected computing resource, the computing resource can feed the computing result back to that master control unit.
As an example, a scheduling unit (not shown in fig. 2) may be configured in the application cluster, and each time a computing task of the client is received, the scheduling unit schedules each master unit in the master unit group.
As an example, the master unit scheduling rule may be a polling schedule (i.e., selecting one master unit in the master unit group as a target master unit in a preset order), or a random schedule (i.e., randomly selecting one master unit in the master unit group as a target master unit), or may be a schedule according to the load size of the master units (i.e., determining the master unit with the smallest load as the target master unit), or the like.
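The three scheduling rules named above can be sketched as follows. This is only an illustrative sketch: the `MasterUnit` record, the `Scheduler` class, and the use of a single numeric `load` value are assumptions made for the example and are not defined by the application.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass
class MasterUnit:
    """Hypothetical record for one master control unit (Driver)."""
    unit_id: str
    load: float = 0.0


class Scheduler:
    """Illustrative scheduling unit that picks the target master control unit."""

    def __init__(self, masters, seed=None):
        self.masters = list(masters)
        self._cycle = itertools.cycle(self.masters)  # preset polling order
        self._rng = random.Random(seed)

    def pick_polling(self):
        # Polling schedule: take the next unit in the preset order.
        return next(self._cycle)

    def pick_random(self):
        # Random schedule: any unit in the group may be chosen.
        return self._rng.choice(self.masters)

    def pick_least_loaded(self):
        # Load-based schedule: the unit with the smallest load is the target.
        return min(self.masters, key=lambda m: m.load)
```

Polling and random scheduling need no knowledge of the units' load, whereas load-based scheduling presupposes that the scheduling unit can observe each unit's load.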
Step S302: at least one computing resource is determined among a plurality of computing resources corresponding to the master control unit group.
That is, regardless of which master unit in the master unit group received the computing task, at least one computing resource is determined from m computing resources corresponding to the master unit group. The specific determination of at least one computing resource may be found in existing schemes and will not be described in detail herein.
Step S303: issue the plurality of subtasks to the at least one computing resource.
A master control unit may issue the plurality of subtasks to the at least one computing resource simultaneously or in turn. Whether the subtasks are issued simultaneously or sequentially can be determined according to whether dependencies exist among them.
Each computing resource in the at least one computing resource performs computation based on the received subtasks to obtain a computation result, and the computation result is fed back to any one of the main control units based on the task identifiers carried in the subtasks and the identification information of the main control units.
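As a rough sketch of steps S301 to S303, the following example models a subtask that carries both its own identifier and the identifier of the issuing master control unit, groups subtasks into batches that respect their dependencies, and builds the reply a computing resource would address to that master control unit. The `Subtask` fields, the `depends_on` list, and both function names are hypothetical; the application refers to existing schemes for how parsing and dispatch are actually done.

```python
from dataclasses import dataclass, field


@dataclass
class Subtask:
    subtask_id: str
    master_id: str  # identifies the master control unit that issued the subtask
    depends_on: list = field(default_factory=list)


def dispatch_waves(subtasks):
    """Group subtasks into waves; each wave can be issued simultaneously.

    A subtask joins a wave only after every subtask it depends on has been
    placed in an earlier wave, so dependent subtasks are issued in turn.
    """
    done, waves = set(), []
    pending = list(subtasks)
    while pending:
        ready = [s for s in pending if all(d in done for d in s.depends_on)]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependency")
        waves.append([s.subtask_id for s in ready])
        done.update(s.subtask_id for s in ready)
        pending = [s for s in pending if s.subtask_id not in done]
    return waves


def route_result(subtask, value):
    """Build the reply a computing resource returns to the issuing master."""
    return {"to": subtask.master_id, "subtask_id": subtask.subtask_id, "result": value}
```

Independent subtasks end up in the same wave and can be issued together, while a subtask with dependencies is held back until the subtasks it depends on have been issued.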
In the information processing method provided by the embodiment of the application, the cluster comprises a master control unit group, and the plurality of computing resources corresponding to the group are applied for by at least one master control unit in the group. On receiving a computing task, each master control unit in the group can parse the task to determine a plurality of subtasks and issue them to at least one of the applied-for computing resources. Therefore, even if one master control unit becomes abnormal, the other master control units can keep operating. This avoids the situation in which a single abnormal master control unit causes the whole application cluster to fail and stop providing service, and thus improves the stability of the cluster.
In an alternative embodiment, any master unit in the master unit group may share information with other master units in the master unit group.
Wherein, the information sharing between the master control units comprises, but is not limited to, sharing of at least one of the following information: state sharing of the master control unit, state sharing of computing resources, load information sharing of the master control unit and the like.
The main control units in the same main control unit group can directly interact to realize information sharing, or the main control units in the same main control unit group can also interact through a message synchronization module to realize information sharing.
In an alternative embodiment, the sharing of information between any of the above-mentioned master units and other master units in the master unit group may include:
any master control unit may obtain status information and/or load information of the any master control unit, and/or status information of each of the plurality of computing resources.
Optionally, so that the master control unit can obtain the state information of each computing resource, each computing resource corresponding to the master control unit group may periodically send heartbeat information to the master control unit, and the master control unit determines the state of a computing resource from the heartbeat information received from it. If the master control unit receives the heartbeat information of a computing resource periodically, it may determine that the computing resource is online and normal; otherwise, it may determine that the computing resource is abnormal.
The state information and/or load information of any master control unit and/or the state information of each computing resource are synchronized to other master control units in the master control unit group by any master control unit.
Optionally, under the condition that the master control units in the same master control unit group directly interact with each other, the any master control unit can directly send the state information and/or the load information of the any master control unit and/or the state information of each computing resource to other master control units in the master control unit group.
Under the condition that the main control units in the same main control unit group interact through the message synchronization module, any main control unit can send the state information and/or the load information of any main control unit and/or the state information of each computing resource to the message synchronization module, and the message synchronization module sends the state information and/or the load information of any main control unit and/or the state information of each computing resource to other main control units in the main control unit group.
For any one of the plurality of computing resources, the state information of the any one computing resource acquired by each main control unit is used for determining whether the any one computing resource is abnormal.
After a master control unit obtains the state information of a computing resource sent by the other master control units in the same master control unit group, it can compare the state information that each master control unit has determined for that computing resource. If the state information determined by at least one master control unit indicates that the computing resource is abnormal, the computing resource is determined to be abnormal. That is, as long as one master control unit determines that a computing resource is abnormal, that computing resource is determined to be abnormal.
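The heartbeat check and the "abnormal if any master control unit says so" rule can be sketched as below. The timestamps, the `period` and `missed_allowed` parameters, and the string states are assumptions chosen for illustration; the application does not specify how a missed heartbeat is detected.

```python
def state_from_heartbeat(last_seen, now, period, missed_allowed=2):
    """Return "normal" if a heartbeat arrived within the allowed window.

    `period` is the assumed heartbeat interval and `missed_allowed` an assumed
    tolerance, both in the same time unit as the timestamps.
    """
    return "normal" if now - last_seen <= period * missed_allowed else "abnormal"


def merge_resource_state(views):
    """Combine the views that several master control units hold of one resource.

    `views` maps a master control unit id to the state it determined. The
    resource counts as abnormal as soon as at least one unit reports it so.
    """
    return "abnormal" if any(v == "abnormal" for v in views.values()) else "normal"
```

Under this rule a computing resource that has lost contact with only one master control unit is still treated as abnormal by the whole group.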
In an optional embodiment, the information processing method provided by the embodiment of the present application may further include:
when the computing resource application condition is satisfied, any master control unit can apply for computing resources to a resource manager in the distributed computing engine.
Wherein, meeting the computing resource application condition may include:
the multiple computing resources corresponding to the main control unit group cannot meet the computing requirements. As an example, the number of subtasks to be processed corresponding to the application cluster may be monitored, and if the number of subtasks to be processed is greater than the number threshold, it is determined that the plurality of computing resources cannot meet the computing requirement, and new computing resources need to be applied.
Optionally, after a master control unit obtains subtasks by parsing, it can wait if no computing resource is idle and issue the subtasks once a computing resource becomes idle. Subtasks waiting to be issued are added, as subtasks to be processed, to the queue corresponding to that master control unit. The number of subtasks to be processed is obtained by summing the numbers of subtasks in the queues of all master control units in the same master control unit group.
Optionally, after applying for the computing resource to the resource manager, any master control unit may send target state information to other master control units in the same master control unit group, where the target state information characterizes that the any master control unit has sent computing resource request information to the resource manager, and the master control unit that receives the target state information may not need to apply for the computing resource to the resource manager. Of course, if at least two master units in the same master unit group send the computing resource request information to the resource manager, the resource manager only responds to the first master unit in the same master unit group that sends the computing resource request information to the resource manager and allocates computing resources to the master unit group (the allocated computing resources can be called by any master unit in the master unit group), but does not respond to the computing resource request information sent by other master units in the same master unit group.
Optionally, after applying for the computing resource to the resource manager, any master control unit may not send the target state information to other master control units in the same master control unit group, where the target state information characterizes that the any master control unit has sent the computing resource request information to the resource manager. Based on this, the resource manager allocates computing resources to the master unit group only in response to the first master unit in the same master unit group that sends computing resource request information to the resource manager (the allocated computing resources may be invoked by any master unit in the master unit group), and does not respond to computing resource request information sent by other master units in the same master unit group.
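A minimal sketch of the application condition and of a resource manager that answers only the first request from a group is given below. The names `needs_more_resources` and `ResourceManager`, the executor naming scheme, and the assumption that "first request" is tracked per group for a single round of application are all illustrative and not taken from the application.

```python
def needs_more_resources(queues, threshold):
    """Check the application condition from the per-master waiting queues.

    `queues` maps a master control unit id to its list of subtasks waiting to
    be issued; the condition holds when their total exceeds `threshold`.
    """
    pending = sum(len(q) for q in queues.values())
    return pending > threshold


class ResourceManager:
    """Illustrative resource manager that answers one request per group."""

    def __init__(self):
        self._granted = {}  # group id -> master id whose request was honored

    def request(self, group_id, master_id, count):
        # Only the first request from a master control unit group is answered;
        # later requests from other units of the same group get nothing.
        if group_id in self._granted:
            return []
        self._granted[group_id] = master_id
        # The allocated resources belong to the group, not to the requester.
        return [f"{group_id}-executor-{i}" for i in range(count)]
```

Because the allocation is made to the group, it does not matter which master control unit's request happens to arrive first.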
In an alternative embodiment, the meeting the computing resource application condition may include:
the multiple computing resources corresponding to the master control unit group cannot meet the computing requirements, and any master control unit is polled according to the polling sequence.
That is, the master control units in the same master control unit group take turns applying for computing resources: each time the plurality of computing resources corresponding to the group is determined to be unable to meet the computing requirements, a different master control unit in the group applies to the resource manager. For example, the master control units apply in ascending order of their numbers. If the master control unit numbered a applied the last time the computing resources were determined to be insufficient, then the master control unit numbered a+1 applies this time.
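The polling order for applications can be sketched as follows. The function name is hypothetical, and the wrap-around from the largest number back to the smallest is an assumption; the application only gives the step from a to a+1.

```python
def next_applicant(master_numbers, last_applicant=None):
    """Return the number of the master control unit whose turn it is to apply.

    Units apply in ascending order of their numbers. After the largest number
    the order is assumed to wrap around to the smallest.
    """
    ordered = sorted(master_numbers)
    if last_applicant is None:
        return ordered[0]  # nobody has applied yet: start with the smallest
    idx = ordered.index(last_applicant)
    return ordered[(idx + 1) % len(ordered)]
```

Each master control unit can evaluate this function locally, provided the group shares which unit applied last.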
In an alternative embodiment, when any master control unit receives the calculation task, it may first determine whether the master control unit itself satisfies the processing condition, and if the master control unit satisfies the processing condition, then analyze the calculation task.
Optionally, when the computing task is received, parsing the computing task to determine a plurality of subtasks may include:
when any master control unit receives the computing task, if the load of that master control unit meets the processing condition, parsing the computing task to determine a plurality of subtasks.
Wherein, the load of any master control unit meeting the processing condition may include:
the load of any master control unit is smaller than the target load.
Or,
the load of the master control unit is smaller than the load of at least one other master control unit in the master control unit group. That is, if the master control unit is not the one with the largest load in the group, its load is considered to satisfy the processing condition. As one example, the load is considered to satisfy the processing condition if the master control unit has the smallest load in the group. As another example, the master control units in the group are sorted in ascending order of load, and the load is considered to satisfy the processing condition if the master control unit is among the first N in that order.
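The alternative processing conditions described above can each be written as a small predicate. The function names and the representation of load as a single number are assumptions for illustration only.

```python
def below_target(load, target_load):
    """Condition 1: the unit's load is smaller than the target load."""
    return load < target_load


def not_heaviest(load, other_loads):
    """Condition 2: the load is smaller than that of at least one other unit."""
    return any(load < other for other in other_loads)


def in_first_n(unit_id, loads, n):
    """Variant: the unit is among the first n when sorted by ascending load.

    `loads` maps each master control unit id in the group to its load.
    """
    ranked = sorted(loads, key=loads.get)
    return unit_id in ranked[:n]
```

With n equal to 1, `in_first_n` reduces to the "smallest load in the group" example.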
Further, when any master control unit receives the calculation task, if the any master control unit does not meet the processing conditions, the calculation task is transferred to other master control units in the same master control unit group for processing.
Optionally, when a master control unit receives the computing task, if its load does not meet the processing condition, one master control unit is selected from the master control unit group as a proxy master control unit; the load of the proxy master control unit satisfies the processing condition.
The master control unit with the smallest load in the master control unit group may be determined as the proxy master control unit. Or,
the master control units in the group may be sorted in ascending order of load, and one of the first M master control units in that order may be randomly selected as the proxy master control unit.
And sending the computing task to the proxy main control unit, analyzing the computing task into a plurality of subtasks by the proxy main control unit, and issuing the subtasks to at least one computing resource in the plurality of computing resources. And the calculation result obtained by processing the subtasks by the at least one calculation resource is returned to the proxy main control unit.
Further, the proxy main control unit returns the calculation result to any main control unit, that is, after the any main control unit sends the calculation task to the proxy main control unit, the calculation result for the calculation task returned by the proxy main control unit is also received.
By transferring the calculation task of the main control unit with larger load to the main control unit with smaller load, the waiting time of the calculation task can be reduced, and the task processing efficiency can be improved.
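The forwarding decision can be sketched as follows, using "load below a target load" as the processing condition. The function names, the exclusion of the receiving unit from the candidates, and the fallback of keeping the task locally when no unit qualifies are assumptions; the application does not state what happens in that last case.

```python
import random


def choose_proxy(loads, self_id, target_load, top_m=1, rng=None):
    """Pick a proxy master control unit, or None if no unit qualifies.

    `loads` maps a master control unit id to its load. Candidates are the
    other units whose load is below `target_load`, sorted by ascending load;
    one of the first `top_m` is chosen at random (top_m=1: smallest load).
    """
    rng = rng or random.Random()
    eligible = sorted(
        (u for u in loads if u != self_id and loads[u] < target_load),
        key=loads.get,
    )
    candidates = eligible[:top_m]
    return rng.choice(candidates) if candidates else None


def route_task(self_id, loads, target_load, top_m=1, rng=None):
    """Return the id of the master control unit that should parse the task."""
    if loads[self_id] < target_load:
        return self_id  # the receiver's load meets the processing condition
    proxy = choose_proxy(loads, self_id, target_load, top_m, rng)
    # Assumption: with no eligible proxy, the task stays with the receiver.
    return proxy if proxy is not None else self_id
```

Choosing randomly among the first M units rather than always the least loaded one spreads forwarded tasks when several units forward at the same moment.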
In an alternative embodiment, the load of any master unit may be determined based on at least one of:
the sum of the numbers of running and waiting subtasks corresponding to the master control unit. The load of the master control unit is positively correlated with this sum: the larger the sum, the larger the load, and the smaller the sum, the smaller the load.
The proportion of the CPU occupied by the running subtasks corresponding to the master control unit. The load is positively correlated with this proportion: the larger the proportion, the larger the load, and the smaller the proportion, the smaller the load.
The average calculation time of the running tasks corresponding to the master control unit. The load is positively correlated with this average calculation time: the longer the average calculation time, the larger the load, and the shorter the average calculation time, the smaller the load.
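One possible way to turn these indicators into a single comparable load value is a weighted sum, sketched below. The application only states that the load is determined based on at least one of the indicators and is positively correlated with each; the weighted-sum form, the weights, and the names `LoadSample` and `load_score` are assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class LoadSample:
    running_subtasks: int
    waiting_subtasks: int
    cpu_fraction: float      # share of the CPU used by running subtasks, 0.0 to 1.0
    avg_task_seconds: float  # average calculation time of the running tasks


def load_score(sample: LoadSample, w_count: float = 1.0,
               w_cpu: float = 0.0, w_time: float = 0.0) -> float:
    """Combine the indicators; a weight of 0 leaves that indicator unused.

    With non-negative weights the score never decreases when an indicator grows,
    which matches the positive correlation described in the text.
    """
    count = sample.running_subtasks + sample.waiting_subtasks
    return w_count * count + w_cpu * sample.cpu_fraction + w_time * sample.avg_task_seconds


sample = LoadSample(running_subtasks=3, waiting_subtasks=2,
                    cpu_fraction=0.5, avg_task_seconds=10.0)
print(load_score(sample))              # subtask count only
print(load_score(sample, w_cpu=10.0))  # subtask count plus CPU share
```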
The information processing method of the application cluster has been described above from the perspective of the master control unit; it is described below from the perspective of the computing resources. Fig. 4 shows another implementation flowchart of the information processing method provided by an embodiment of the present application. Every computing resource in the application cluster processes information in the same manner, so the method may be used for any computing resource, and may specifically include:
Step S401: at least one subtask sent by any master control unit in the master control unit group is received.
The identification information of each subtask comprises the identification of the subtask and the identification information of any master control unit, so that any computing resource can feed back the computing result to any master control unit after obtaining the computing result.
Step S402: the at least one subtask is processed to obtain a calculation result.
The process of processing subtasks to obtain calculation results can refer to the existing scheme, and will not be described in detail here.
Step S403: the calculation result is returned to the master control unit that sent the at least one subtask.
The computing resource may return the calculation result to that master control unit according to the identification information of the subtask.
Specifically, the computing resource may parse the identification information of the at least one subtask to determine the identifier of the subtask and the identification information of the master control unit, and then return the calculation result to the master control unit according to the parsed identifier of the subtask and identification information of the master control unit.
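Steps S401 to S403 can be sketched as below. The composite identification format (`master id` + `/` + `subtask id`), the `ComputingResource` class, and the `send` callback are hypothetical; the application states only that the identification information of a subtask contains both the subtask identifier and the identification information of the issuing master control unit.

```python
from typing import Callable, List, Tuple

SEPARATOR = "/"  # hypothetical delimiter between master id and subtask id


def make_subtask_identification(master_id: str, subtask_id: str) -> str:
    """Build the identification information a master control unit attaches to a subtask."""
    return f"{master_id}{SEPARATOR}{subtask_id}"


def parse_subtask_identification(identification: str) -> Tuple[str, str]:
    """Recover (master id, subtask id) from the identification information."""
    master_id, subtask_id = identification.split(SEPARATOR, 1)
    return master_id, subtask_id


class ComputingResource:
    def __init__(self, send: Callable[[str, str, int], None]) -> None:
        self._send = send  # send(master_id, subtask_id, result)

    def handle(self, identification: str, payload: List[int]) -> None:
        # S401: the subtask has been received with its identification information.
        master_id, subtask_id = parse_subtask_identification(identification)
        # S402: process the subtask (summing is a stand-in for real work).
        result = sum(payload)
        # S403: return the result to the master control unit that issued the subtask.
        self._send(master_id, subtask_id, result)


outbox = []
resource = ComputingResource(lambda m, s, r: outbox.append((m, s, r)))
resource.handle(make_subtask_identification("master-2", "job7-3"), [1, 2, 3])
print(outbox)
```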
According to the information processing method provided by the embodiment of the application, the cluster comprises the main control unit group, a plurality of computing resources corresponding to the main control unit group are applied for by at least one main control unit in the main control unit group, each computing resource in the plurality of computing resources can receive a subtask issued by any main control unit in the main control unit group, so that even if a certain main control unit is abnormal, other main control units can operate, the computing resources can also receive the subtasks issued by other main control units, the situation that the whole application cluster is completely invalid due to the abnormality of a single main control unit and service cannot be provided outside any more is avoided, and the stability of the cluster is improved.
In an alternative embodiment, any one of the plurality of computing resources may further send heartbeat information to each of the master units in the master unit group, so that each of the master units determines a state of the any one computing resource according to the heartbeat information of the any one computing resource.
In an alternative embodiment, each master control unit is provided with a master data block management module (BlockManagerMaster), which uniformly manages the data block management modules (BlockManager) residing on the master control units and the computing resources. Interaction between a master control unit and a computing resource concerning a BlockManager goes through the BlockManagerMaster; for example, the computing resource registers its BlockManager with the master control unit, updates the latest information about the data blocks on the computing resource, queries the location of a required target data block, and so on. A BlockManager, by contrast, is responsible only for managing the data blocks on the computing resource where it resides. In the present application, a computing resource may generate data blocks while processing subtasks, for example data blocks of an RDD (Resilient Distributed Dataset), or data blocks produced by Shuffle or Broadcast. After generating a data block, the computing resource reports information about the data block (such as, but not limited to, its identifier, size, and storage location) to the master control unit that sent the subtask; that master control unit associates the information about the data block with its own identification information and issues its identification information to the computing resource.
Subsequently, when any computing resource needs to read a target data block, it first searches for the target data block locally. If the target data block is not found locally, the computing resource determines the master data block management module (BlockManagerMaster) of the corresponding master control unit according to the identification information of the master control unit that sent the subtask, and searches for the target data block through that master data block management module. The master data block management module is a tool for tracking data blocks; how the target data block is tracked and read may follow existing schemes and is not described in detail here.
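The local-first lookup can be sketched as follows. `BlockTracker` is a deliberately simplified stand-in for a master data block management module (it returns the block data directly rather than a location), and all class and method names here are assumptions for illustration, not the interface of any real BlockManagerMaster.

```python
from typing import Dict, Optional


class BlockTracker:
    """Simplified stand-in for the master data block management module of one master control unit."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def register(self, block_id: str, data: bytes) -> None:
        self._blocks[block_id] = data

    def fetch(self, block_id: str) -> Optional[bytes]:
        return self._blocks.get(block_id)


class ResourceBlockStore:
    """Data blocks held by one computing resource, plus the trackers it knows by master id."""

    def __init__(self, trackers: Dict[str, BlockTracker]) -> None:
        self.local: Dict[str, bytes] = {}
        self.trackers = trackers

    def read(self, block_id: str, master_id: str) -> bytes:
        # 1. Search locally first.
        if block_id in self.local:
            return self.local[block_id]
        # 2. Otherwise use the tracker of the master control unit that sent the subtask.
        tracker = self.trackers.get(master_id)
        data = tracker.fetch(block_id) if tracker is not None else None
        if data is None:
            raise KeyError(f"block {block_id!r} not found for {master_id!r}")
        return data


tracker = BlockTracker()
tracker.register("rdd_0_1", b"remote")
store = ResourceBlockStore({"master-1": tracker})
store.local["rdd_0_0"] = b"local"
print(store.read("rdd_0_0", "master-1"), store.read("rdd_0_1", "master-1"))
```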
Furthermore, each computing resource may record the identifier of the master data block management module corresponding to each master control unit. When a computing resource receives a master control unit addition message sent by a newly added master control unit, it adds the identification information of the master data block management module corresponding to the new master control unit. When a computing resource receives a master control unit deletion message sent by any master control unit, it deletes the identification information of the master data block management module corresponding to that master control unit.
Further, each computing resource may also record the identification information of each master control unit. When a computing resource receives a master control unit addition message sent by a newly added master control unit, it adds the identification information of the new master control unit. When a computing resource receives a master control unit deletion message sent by any master control unit, it deletes the identification information of that master control unit.
When the heartbeat information needs to be sent, each computing resource only sends the heartbeat information to the recorded main control unit.
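The bookkeeping a computing resource performs for master control units can be sketched as below. The `MasterRegistry` class, its method names, and the use of a list plus a dictionary are assumptions chosen for the example; the application specifies only that identifiers are added on an addition message, removed on a deletion message, and that heartbeats go only to recorded master control units.

```python
from typing import Dict, List


class MasterRegistry:
    """Per-computing-resource record of known master control units."""

    def __init__(self) -> None:
        self.master_ids: List[str] = []          # recorded master control units
        self.block_masters: Dict[str, str] = {}  # master id -> data block manager id

    def on_master_added(self, master_id: str, block_master_id: str) -> None:
        """Handle an addition message from a newly added master control unit."""
        if master_id not in self.master_ids:
            self.master_ids.append(master_id)
        self.block_masters[master_id] = block_master_id

    def on_master_deleted(self, master_id: str) -> None:
        """Handle a deletion message: drop both recorded identifiers."""
        if master_id in self.master_ids:
            self.master_ids.remove(master_id)
        self.block_masters.pop(master_id, None)

    def heartbeat_targets(self) -> List[str]:
        """Heartbeats are sent only to recorded master control units."""
        return list(self.master_ids)


registry = MasterRegistry()
registry.on_master_added("master-1", "bmm-1")
registry.on_master_added("master-2", "bmm-2")
registry.on_master_deleted("master-1")
print(registry.heartbeat_targets())
```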
Fig. 5 shows another example architecture of an application cluster according to an embodiment of the present application. The application cluster includes a scheduling unit, a message synchronization module, a master control unit group (including n master control units), and m computing resources, where the m computing resources are applied for by at least one of the n master control units in the master control unit group.
Each master control unit is configured with a shared resource management module, a load monitoring module, a task migration module and a master data block management module (not shown), wherein:
the load monitoring module of master control unit i (i = 1, 2, 3, ..., n) is used for monitoring the load condition of master control unit i.
The task migration module of master control unit i is used for interacting with the task migration modules of the other master control units in the same master control unit group, so as to migrate computing tasks between different master control units.
The shared resource management module of each master control unit is used for interacting with the message synchronization module to share information among the master control units. The shared resource management module and the message synchronization module may interact through a subscription/push mechanism. In the example shown in fig. 5, one message synchronization module is configured for each application cluster. In other embodiments, one message synchronization module may be configured for multiple application clusters; in that case, when the shared resource management module of a master control unit sends information to be shared to the message synchronization module, it must also carry the identification information of its master control unit group, so that the message synchronization module can synchronize the message to the master control units of the same master control unit group.
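The subscription/push interaction with a message synchronization module shared by several clusters can be sketched as follows. The `MessageSync` class, its `subscribe` and `publish` methods, and the in-process callbacks are hypothetical simplifications; a real deployment would use network messaging.

```python
from typing import Callable, Dict


class MessageSync:
    """Toy message synchronization module keyed by master control unit group."""

    def __init__(self) -> None:
        # group id -> {master id: callback invoked on push}
        self._subscribers: Dict[str, Dict[str, Callable[[dict], None]]] = {}

    def subscribe(self, group_id: str, master_id: str,
                  callback: Callable[[dict], None]) -> None:
        self._subscribers.setdefault(group_id, {})[master_id] = callback

    def publish(self, group_id: str, sender_id: str, message: dict) -> None:
        """Push the message to every other master control unit of the same group."""
        for master_id, callback in self._subscribers.get(group_id, {}).items():
            if master_id != sender_id:
                callback(message)


sync = MessageSync()
inbox = {"a": [], "b": [], "c": []}
sync.subscribe("group-1", "a", inbox["a"].append)
sync.subscribe("group-1", "b", inbox["b"].append)
sync.subscribe("group-2", "c", inbox["c"].append)
sync.publish("group-1", "a", {"master": "a", "load": 5})
print(inbox)
```

Carrying the group identifier on every published message is what keeps master control unit `c`, which belongs to a different group, from receiving the update.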
Each computing resource is configured with a master control unit information queue and a master data block information queue, where the master control unit information queue stores the identification information of each master control unit, and the master data block information queue stores the identification information of the master data block management module of each master control unit. Each computing resource sends heartbeat information to each master control unit according to the master control unit information queue, so that each master control unit can determine the state information of the computing resource. While processing a subtask, if the computing resource needs to read a target data block corresponding to master control unit i (for example, a data block of an RDD, a data block of the Shuffle process, or a data block of the Broadcast process), it first searches locally; if the target data block is not found locally, it searches through the master data block management module corresponding to master control unit i.
Communication between the master control units within the master control unit group, and between the master control unit group and the plurality of computing resources, is based on an RPC (Remote Procedure Call) mechanism.
After the client sends a calculation request to the application cluster, the scheduling unit of the application cluster determines one master control unit (referred to as the target master control unit for ease of description) from the n master control units. To support this scheduling, each master control unit may periodically send heartbeat information to the scheduling unit, and the scheduling unit judges from that heartbeat information whether each master control unit is online. If the scheduling unit determines that a master control unit has not sent its heartbeat information on schedule, it treats that master control unit as abnormal and does not schedule it; a master control unit is scheduled only while it is normal.
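The heartbeat-based liveness check in the scheduling unit can be sketched as below. The timeout value, the `Scheduler` class, and the rule of picking the first online unit in sorted order are assumptions for illustration; the application does not specify how the target is chosen among the online master control units.

```python
from typing import Dict, List


class Scheduler:
    """Toy scheduling unit that tracks master control unit heartbeats."""

    def __init__(self, timeout_seconds: float) -> None:
        self.timeout_seconds = timeout_seconds
        self.last_seen: Dict[str, float] = {}

    def on_heartbeat(self, master_id: str, now: float) -> None:
        self.last_seen[master_id] = now

    def online_masters(self, now: float) -> List[str]:
        """Units whose last heartbeat is recent enough are considered normal."""
        return sorted(m for m, t in self.last_seen.items()
                      if now - t <= self.timeout_seconds)

    def pick_target(self, now: float) -> str:
        """Choose a target master control unit; abnormal units are never scheduled."""
        online = self.online_masters(now)
        if not online:
            raise RuntimeError("no master control unit is online")
        return online[0]  # placeholder policy, assumed for this sketch


scheduler = Scheduler(timeout_seconds=10.0)
scheduler.on_heartbeat("master-1", now=0.0)
scheduler.on_heartbeat("master-2", now=8.0)
print(scheduler.online_masters(now=9.0))
print(scheduler.pick_target(now=12.0))
```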
The scheduling unit issues the calculation task which is requested to be processed by the calculation request to the target main control unit.
The target master control unit determines, through its shared resource management module, whether its load satisfies the processing condition. If the processing condition is satisfied, the target master control unit parses the computing task into a plurality of subtasks, adds its own identification information to the identification information of each subtask, and issues the plurality of subtasks to at least one computing resource. If the processing condition is not satisfied, the target master control unit determines a proxy master control unit in the master control unit group and sends the computing task to the proxy master control unit through the task migration module.
The proxy master control unit parses the computing task into a plurality of subtasks, adds its own identification information to the identification information of each subtask, and issues the plurality of subtasks to at least one computing resource. After the proxy master control unit obtains the feedback information of the computing resources for the plurality of subtasks, it sends the feedback information to the target master control unit through the task migration module.
After receiving the feedback information sent by the at least one computing resource or by the proxy master control unit, the target master control unit sends the feedback information to the client.
In addition, if master control unit i is a newly added master control unit, master control unit i sends an addition instruction carrying its identification information to each computing resource. After receiving the addition instruction, each computing resource adds the identification information of master control unit i to the master control unit information queue and adds the identification information of the master data block management module of master control unit i to the master data block information queue.
If master control unit i determines, according to the heartbeat message sent by computing resource j (j = 1, 2, 3, ..., m), that the state of computing resource j is abnormal, it shares the state information of computing resource j with the other master control units, so that a master control unit that subsequently receives a computing task does not issue the parsed subtasks to computing resource j.
If the computing resource j receives the deleting instruction sent by the main control unit i, deleting the identification information of the main control unit i from the main control unit information queue, and deleting the identification information of the main data block management module of the main control unit i from the main data block information queue.
Corresponding to the method embodiment, the present application further provides an information processing apparatus. A schematic structural diagram of the information processing apparatus provided in an embodiment of the present application is shown in fig. 6, and the apparatus may include:
a parsing module 601, a determining module 602 and an issuing module 603; wherein:
the parsing module 601 is configured to parse the computing task to determine a plurality of subtasks when the computing task is received;
the determining module 602 is configured to determine at least one computing resource from a plurality of computing resources corresponding to the master control unit group; the plurality of computing resources are obtained by applying for at least one main control unit in the main control unit group;
the issuing module 603 is configured to issue the plurality of subtasks to the at least one computing resource.
The information processing device provided by the embodiment of the application is used for any master control unit in the master control unit group of the cluster, the cluster comprises the master control unit group, a plurality of computing resources corresponding to the master control unit group are applied for by at least one master control unit in the master control unit group, each master control unit in the master control unit group can analyze the computing task when receiving the computing task so as to determine a plurality of subtasks and issue the subtasks to at least one computing resource in the applied plurality of computing resources, thus even if one master control unit is abnormal, other master control units can operate, the situation that the whole application cluster is completely invalid and service cannot be provided any more due to the abnormality of a single master control unit is avoided, and the stability of the cluster is improved.
In an alternative embodiment, the apparatus further comprises:
and the sharing module is used for sharing information with other main control units in the main control unit group.
In an alternative embodiment, the sharing module is configured to:
acquiring state information and/or load information of any main control unit and/or state information of each computing resource in the plurality of computing resources;
synchronizing the state information and/or load information of any master control unit and/or the state information of each computing resource to other master control units in the master control unit group;
for any one of the plurality of computing resources, the state information of the any one computing resource acquired by each main control unit is used for determining whether the any one computing resource is abnormal.
In an alternative embodiment, the apparatus further comprises:
the application module is used for applying for the computing resources to the resource manager when the computing resource application conditions are met;
the meeting the computing resource application condition includes:
the plurality of computing resources cannot meet the computing requirements; or
the plurality of computing resources cannot meet the computing requirements, and the polling sequence has reached said any master control unit.
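The two forms of the application condition can be sketched as a single predicate. The function name `should_apply`, the `polling_order` list, and the `turn` counter are hypothetical; the sketch assumes that "polled according to a polling sequence" means the master control units take turns and only the unit whose turn it is applies to the resource manager.

```python
from typing import Optional, Sequence


def should_apply(master_id: str, resources_sufficient: bool,
                 polling_order: Optional[Sequence[str]] = None,
                 turn: int = 0) -> bool:
    """Decide whether this master control unit applies for more computing resources."""
    if resources_sufficient:
        return False  # existing computing resources meet the computing requirements
    if polling_order is None:
        return True   # first form: insufficiency alone triggers the application
    # Second form: insufficiency plus it being this unit's turn in the polling sequence.
    return polling_order[turn % len(polling_order)] == master_id


order = ["master-1", "master-2", "master-3"]
print(should_apply("master-2", resources_sufficient=False, polling_order=order, turn=1))
print(should_apply("master-2", resources_sufficient=False, polling_order=order, turn=2))
```

Taking turns in this way would avoid several master control units applying for resources at the same moment for the same shortfall.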
In an alternative embodiment, the parsing module 601 is configured to:
and when the calculation task is received, if the load of any main control unit meets the processing condition, analyzing the calculation task to determine a plurality of subtasks.
In an alternative embodiment, the apparatus further comprises:
the migration module is used for determining an agent main control unit in the main control unit group if the load of any main control unit does not meet the processing condition; the load of the proxy main control unit meets the processing condition; and sending the calculation task to the proxy main control unit.
In an alternative embodiment, the load of any master control unit is determined based on at least one of:
the sum of the number of the sub-tasks which are running and waiting to run and correspond to any main control unit; the load of any master control unit is positively related to the sum of the numbers;
the proportion of the CPU occupied by the running subtasks corresponding to any main control unit; the load of any master control unit is positively related to the ratio;
the average calculation time of the running tasks corresponding to any main control unit; the load of any master control unit is positively related to the average calculation time.
Corresponding to the method embodiment, another schematic structural diagram of the information processing apparatus provided in the embodiment of the present application is shown in fig. 7, and may include:
a receiving module 701, a processing module 702 and a returning module 703; wherein:
the receiving module 701 is configured to receive at least one subtask sent by any master control unit in the master control unit group;
the processing module 702 is configured to process the at least one subtask to obtain a calculation result;
the return module 703 is configured to return the calculation result to the any master control unit.
The information processing device provided by the embodiment of the application is used for any one of a plurality of computing resources corresponding to a main control unit group of a cluster, wherein the computing resources are applied for by at least one main control unit in the main control unit group; each computing resource in the plurality of computing resources can receive the subtasks issued by any master control unit in the master control unit group, so that even if a certain master control unit is abnormal, other master control units can operate, the computing resources can also receive the subtasks issued by other master control units, the situation that the whole application cluster is completely invalid and cannot provide service for the outside due to the abnormality of a single master control unit is avoided, and the stability of the cluster is improved.
Corresponding to the method embodiment, the application further provides an electronic device, and a schematic structural diagram of the electronic device is shown in fig. 8, which may include: at least one processor 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4.
In the embodiment of the present application, the number of the processor 1, the communication interface 2, the memory 3 and the communication bus 4 is at least one, and the processor 1, the communication interface 2 and the memory 3 complete communication with each other through the communication bus 4.
The processor 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application, etc.
The memory 3 may comprise a high-speed RAM, and may also comprise a non-volatile memory, such as at least one disk memory.
Wherein the memory 3 stores a program, the processor 1 may call the program stored in the memory 3, the program being for:
when a computing task is received, analyzing the computing task to determine a plurality of subtasks; determining at least one computing resource from a plurality of computing resources corresponding to a main control unit group of the cluster; the plurality of computing resources are obtained by applying for at least one main control unit in the main control unit group; the plurality of subtasks is issued to the at least one computing resource.
Or,
the program is used for any one of a plurality of computing resources corresponding to a main control unit group of the cluster, wherein the computing resources are obtained by applying for at least one main control unit in the main control unit group; the program is specifically for: receiving at least one subtask sent by any master control unit in a master control unit group of the cluster; processing the at least one subtask to obtain a calculation result; and returning the calculation result to any main control unit.
Optionally, for the refined and extended functions of the program, refer to the description above.
The embodiment of the present application also provides a storage medium storing a program adapted to be executed by a processor, the program being configured to:
when a computing task is received, analyzing the computing task to determine a plurality of subtasks; determining at least one computing resource from a plurality of computing resources corresponding to a main control unit group of the cluster; the plurality of computing resources are obtained by applying for at least one main control unit in the main control unit group; the plurality of subtasks is issued to the at least one computing resource.
Or,
the program is used for any one of a plurality of computing resources corresponding to a main control unit group of the cluster, wherein the computing resources are obtained by applying for at least one main control unit in the main control unit group; the program is specifically for: receiving at least one subtask sent by any master control unit in a master control unit group of the cluster; processing the at least one subtask to obtain a calculation result; and returning the calculation result to any main control unit.
Optionally, for the refined and extended functions of the program, refer to the description above.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
It should be understood that in the embodiments of the present application, the claims, the various embodiments, and the features may be combined with each other, so as to solve the foregoing technical problems.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or various other media capable of storing program code.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An information processing method for any master unit in a master unit group of a cluster, the method comprising:
when a computing task is received, analyzing the computing task to determine a plurality of subtasks;
determining at least one computing resource in a plurality of computing resources corresponding to the main control unit group; the plurality of computing resources are obtained by applying for at least one main control unit in the main control unit group;
the plurality of subtasks is issued to the at least one computing resource.
2. The method of claim 1, further comprising:
and sharing information with other main control units in the main control unit group.
3. The method of claim 2, the sharing of information with other master units in the set of master units, comprising:
acquiring state information and/or load information of any main control unit and/or state information of each computing resource in the plurality of computing resources;
synchronizing the state information and/or load information of any master control unit and/or the state information of each computing resource to other master control units in the master control unit group;
for any one of the plurality of computing resources, the state information of the any one computing resource acquired by each main control unit is used for determining whether the any one computing resource is abnormal.
4. The method of claim 1, further comprising:
when the application condition of the computing resource is met, applying for the computing resource to a resource manager;
the meeting the computing resource application condition includes:
the plurality of computing resources cannot meet the computing requirements; or
the plurality of computing resources cannot meet the computing requirements, and the polling sequence has reached said any master control unit.
5. The method of any of claims 1-4, wherein upon receiving a computing task, parsing the computing task to determine a plurality of sub-tasks comprises:
when the computing task is received, if the load of any master control unit meets the processing condition, parsing the computing task to determine a plurality of subtasks.
6. The method of claim 5, further comprising:
if the load of any master control unit does not meet the processing condition, determining a proxy master control unit in the master control unit group; the load of the proxy main control unit meets the processing condition;
and sending the calculation task to the proxy main control unit.
7. The method of claim 5, wherein the load of said master control unit is determined based on at least one of:
the total number of running and waiting subtasks corresponding to said master control unit, the load being positively correlated with that total;
the proportion of CPU occupied by the running subtasks corresponding to said master control unit, the load being positively correlated with that proportion;
the average computation time of the running tasks corresponding to said master control unit, the load being positively correlated with that average computation time.
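One simple way to satisfy the "positively correlated" requirement for all three factors is a weighted sum with non-negative weights. The Python sketch below is only an illustration. The weights and the linear form are assumptions and are not specified by the claims:

```python
def master_load(
    running: int,
    waiting: int,
    cpu_fraction: float,
    avg_seconds: float,
    w_count: float = 1.0,
    w_cpu: float = 10.0,
    w_time: float = 0.1,
) -> float:
    """Combine the three load factors of claim 7 into one score.

    Assumption: a linear combination. Positive weights guarantee the
    score rises whenever any single factor rises.
    """
    if min(w_count, w_cpu, w_time) < 0:
        raise ValueError("weights must be non-negative")
    return (
        w_count * (running + waiting)  # running plus waiting subtasks
        + w_cpu * cpu_fraction         # CPU share of running subtasks
        + w_time * avg_seconds         # average computation time
    )
```

Because the claim says "at least one of", setting a weight to zero drops that factor from the score.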
8. An information processing method for any one of a plurality of computing resources corresponding to a master control unit group of a cluster, wherein the plurality of computing resources are obtained through application by at least one master control unit in the master control unit group, the method comprising:
receiving at least one subtask sent by any master control unit in the master control unit group;
processing the at least one subtask to obtain a calculation result; and
returning the calculation result to that master control unit.
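The receive, process, and return steps on the computing-resource side can be sketched as below. Modeling a subtask as a zero-argument callable and the return path as a `reply` callback are assumptions made to keep the example self-contained. The source does not describe the transport:

```python
from typing import Callable, Sequence


def run_worker(
    subtasks: Sequence[Callable[[], object]],
    reply: Callable[[list], None],
) -> list:
    """Process the received subtasks and return the results to the sender.

    `reply` stands in for whatever channel leads back to the master
    control unit that sent the subtasks (hypothetical).
    """
    # Process each subtask in the order it was received.
    results = [subtask() for subtask in subtasks]
    # Return the calculation results to the originating master.
    reply(results)
    return results
```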
9. An information processing apparatus for any master control unit in a master control unit group of a cluster, the apparatus comprising:
a parsing module, configured to parse a computing task when the computing task is received, so as to determine a plurality of subtasks;
a determining module, configured to determine at least one computing resource from a plurality of computing resources corresponding to the master control unit group, the plurality of computing resources being obtained through application by at least one master control unit in the master control unit group; and
an issuing module, configured to issue the plurality of subtasks to the at least one computing resource.
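The three modules of claim 9 map naturally onto three functions. In the Python sketch below, the fixed-size chunking, the "healthy" state label, and the round-robin issuing policy are all illustrative assumptions, since the claim fixes none of them:

```python
from itertools import cycle


def parse_task(task: list[int], chunk: int) -> list[list[int]]:
    """Parsing module: split a task into subtasks of at most `chunk` items."""
    return [task[i:i + chunk] for i in range(0, len(task), chunk)]


def select_resources(states: dict[str, str]) -> list[str]:
    """Determining module: keep resources whose state is "healthy"."""
    return sorted(r for r, state in states.items() if state == "healthy")


def issue(
    subtasks: list[list[int]], resources: list[str]
) -> dict[str, list[list[int]]]:
    """Issuing module: assign subtasks to resources in round-robin order."""
    plan: dict[str, list[list[int]]] = {r: [] for r in resources}
    for subtask, resource in zip(subtasks, cycle(resources)):
        plan[resource].append(subtask)
    return plan
```

Note that `chunk` must be positive, and that `issue` returns an empty plan when no resource was selected.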
10. An information processing apparatus for any one of a plurality of computing resources corresponding to a master control unit group of a cluster, wherein the plurality of computing resources are obtained through application by at least one master control unit in the master control unit group, the apparatus comprising:
a receiving module, configured to receive at least one subtask sent by any master control unit in the master control unit group;
a processing module, configured to process the at least one subtask to obtain a calculation result; and
a return module, configured to return the calculation result to that master control unit.
CN202310961445.5A 2023-08-01 2023-08-01 Information processing method and device Pending CN116991618A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310961445.5A CN116991618A (en) 2023-08-01 2023-08-01 Information processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310961445.5A CN116991618A (en) 2023-08-01 2023-08-01 Information processing method and device

Publications (1)

Publication Number Publication Date
CN116991618A (en) 2023-11-03

Family

ID=88531608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310961445.5A Pending CN116991618A (en) 2023-08-01 2023-08-01 Information processing method and device

Country Status (1)

Country Link
CN (1) CN116991618A (en)

Similar Documents

Publication Publication Date Title
US10838777B2 (en) Distributed resource allocation method, allocation node, and access node
CN107291547B (en) Task scheduling processing method, device and system
CN109034396B (en) Method and apparatus for processing deep learning jobs in a distributed cluster
CN112527489B (en) Task scheduling method, device, equipment and computer readable storage medium
CN111897638B (en) Distributed task scheduling method and system
US8903981B2 (en) Method and system for achieving better efficiency in a client grid using node resource usage and tracking
CN109343939B (en) Distributed cluster and parallel computing task scheduling method
CN104468638B (en) A kind of distributed data processing method and system
CN106462593B (en) System and method for massively parallel processing of databases
WO2019006907A1 (en) Systems and methods for allocating computing resources in distributed computing
CN111580990A (en) Task scheduling method, scheduling node, centralized configuration server and system
JP2020531967A (en) Distributed system Resource allocation methods, equipment, and systems
CN110955501B (en) Service request processing method, device, electronic equipment and readable medium
CN109117244B (en) Method for implementing virtual machine resource application queuing mechanism
US20240152395A1 (en) Resource scheduling method and apparatus, and computing node
CN111163140A (en) Method, apparatus and computer readable storage medium for resource acquisition and allocation
CN114816709A (en) Task scheduling method, device, server and readable storage medium
CN111835809B (en) Work order message distribution method, work order message distribution device, server and storage medium
US8788601B2 (en) Rapid notification system
CN112631756A (en) Distributed regulation and control method and device applied to space flight measurement and control software
CN113434591B (en) Data processing method and device
CN116991618A (en) Information processing method and device
CN115629853A (en) Task scheduling method and device
CN114489978A (en) Resource scheduling method, device, equipment and storage medium
CN113703930A (en) Task scheduling method, device and system and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination