CN112839071B - Training system, training data access method and device, electronic equipment and medium - Google Patents


Info

Publication number
CN112839071B
CN112839071B (application CN201911167520.0A)
Authority
CN
China
Prior art keywords
client
training data
target
node
server
Prior art date
Legal status
Active
Application number
CN201911167520.0A
Other languages
Chinese (zh)
Other versions
CN112839071A (en)
Inventor
王立鹏
杨柏辰
叶松高
颜深根
Current Assignee
Sensetime Group Ltd
Original Assignee
Sensetime Group Ltd
Priority date
Filing date
Publication date
Application filed by Sensetime Group Ltd
Priority claimed from application CN201911167520.0A
Publication of CN112839071A
Application granted
Publication of CN112839071B
Legal status: Active

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/56: Provisioning of proxy services
    • H04L67/568: Storing data temporarily at an intermediate stage, e.g. caching
    • H04L67/5683: Storage of data provided by user terminals, i.e. reverse caching
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1097: Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/50: Network services
    • H04L67/60: Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
    • H04L67/63: Routing a service request depending on the request content or context

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The disclosure relates to a training system, a training data access method and apparatus, an electronic device, and a medium. The training system comprises a server and a plurality of nodes. The server is configured to assign caching tasks to the plurality of nodes; each node is configured to cache training data blocks based on the caching task assigned by the server, and comprises at least one client. A first client of the at least one client obtains a first data access request for target training data and, in response to the request, determines a target client that caches the target training data and acquires the target training data from the node where the target client is located.

Description

Training system, training data access method and device, electronic equipment and medium
Technical Field
The disclosure relates to the field of computer technology, and in particular, to a training system, a training data access method and device, an electronic device, and a medium.
Background
Distributed storage is a data storage technology that stores data across multiple servers, with the dispersed data resources forming a virtual storage system. The growth of cloud computing and the Internet has produced massive amounts of data, for which distributed storage provides an efficient storage model. Distributed storage shares the storage load across multiple storage servers and uses location servers to locate stored information, effectively improving the reliability, availability, and access efficiency of the system.
Typically, a client reads or writes a file by requesting it from a server, so every read or write passes through the server. Taking file reading as an example, the client first sends a file read request to the server, and the server returns the requested file according to that request. However, one server may be connected to many clients, and a large number of clients accessing the server at the same time occupies substantial network resources on the server, degrading its performance.
Disclosure of Invention
The disclosure provides a training system, a training data access method and apparatus, an electronic device, and a medium.
According to an aspect of the present disclosure, there is provided a training system, the system comprising: a server and a plurality of nodes,
the server is configured to assign caching tasks to the plurality of nodes;
the node is configured to cache training data blocks based on the caching task assigned by the server;
the node comprises at least one client;
a first client of the at least one client is configured to obtain a first data access request for target training data;
and the first client is configured to, in response to the first data access request, determine a target client for caching the target training data, and acquire the target training data from a node where the target client is located.
In one possible implementation manner, the first client is further configured to determine, in response to the first data access request, the target client for caching a target training data block in a plurality of clients in the training system according to registration information and meta information of the plurality of clients, where the meta information includes information of each training data.
In a possible implementation manner, in a case that the target client is the first client, the first client is configured to obtain a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In a possible implementation manner, in a case that the target client is a second client, the first client is configured to send a second data access request for the target training data to the second client, where the first client is different from the second client, and the first client and the second client belong to different nodes;
The second client is used for responding to the second data access request, acquiring the target training data block from the node where the second client is located, and sending the target training data block to the first client;
the first client is further configured to obtain the target training data from the target training data block sent by the second client.
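The local-hit / remote-fetch flow described above can be sketched as follows. This is a minimal in-process illustration with hypothetical class and method names; in a real system the second data access request would travel over the network rather than as a direct method call.

```python
class Client:
    """Sketch of the peer-to-peer block fetch between clients (hypothetical API).

    Each client can read the blocks cached on its own node. A first client
    that does not hold the target block sends a second data access request
    to the owning (second) client, then extracts the target record from
    the returned block.
    """

    def __init__(self, name, node_cache):
        self.name = name
        self.node_cache = node_cache  # {block_id: {record_id: data}}

    def handle_access_request(self, block_id):
        # Acting as the second client: serve the whole cached block.
        return self.node_cache[block_id]

    def fetch_record(self, block_id, record_id, owner):
        # Acting as the first client: local hit, or ask the owning client.
        if block_id in self.node_cache:
            block = self.node_cache[block_id]
        else:
            block = owner.handle_access_request(block_id)
        return block[record_id]
```

For example, a client whose node does not cache block `b7` would call `fetch_record("b7", "img_001", owner)` and transparently receive the record from the owner's node.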
In a possible implementation manner, the target client is configured to obtain a cache task allocated to the target client by the server;
the target client is configured to cache the training data block to be cached indicated by the caching task to a node where the target client is located, where the training data block to be cached includes the target training data block.
In a possible implementation manner, the target client is further configured to cache, in a node where the target client is located, the training data block to be cached indicated by the caching task when the plurality of clients in the training system complete registration.
In a possible implementation manner, the target client is further configured to determine, when the target client receives the second data access request, a target training data block where target training data indicated by the second data access request is located, and obtain the target training data block from the server, so as to cache the target training data block in a node where the target client is located.
In a possible implementation manner, the target client is configured to obtain meta information from the server, and send a registration request to the server, where the meta information includes information of each training data;
the server is further configured to send registration information of a plurality of clients in the training system to the target client according to the registration request;
the target client is further configured to determine a training data block indicated by the allocated cache task of each of the plurality of clients according to the registration information of the plurality of clients and the meta information.
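The registration step above lets every client compute the same block assignment locally from the shared registration list and meta information. A minimal sketch, assuming a round-robin policy over clients in registration order; the policy itself is illustrative, the disclosure does not mandate one:

```python
def derive_block_assignment(registered_clients, block_ids):
    """Derive which client caches which training data block.

    Every client runs the same deterministic rule over the registration
    list and the block ids from the meta information, so all clients
    agree on the assignment without further coordination.
    """
    assignment = {cid: [] for cid in registered_clients}
    for i, block_id in enumerate(sorted(block_ids)):
        # Round-robin: the i-th block goes to the (i mod n)-th client.
        owner = registered_clients[i % len(registered_clients)]
        assignment[owner].append(block_id)
    return assignment
```

Because the rule is deterministic, a client can later answer "which client owns block X?" without asking the server again.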
In one possible implementation manner, the target client is further configured to send, to the server, an available memory of a node where the target client is located;
the server is further configured to determine, according to available memories of a plurality of nodes of the training system, a cache task allocated to the target client.
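One way the server could turn the reported available memory into cache tasks is a greedy, capacity-aware assignment. This is a sketch under assumed inputs (a fixed block size and memory reported in bytes), not the disclosure's prescribed algorithm:

```python
def allocate_by_memory(available_mem, blocks, block_size):
    """Assign cache tasks according to each node's reported free memory.

    available_mem: {client_id: free cache memory (bytes) on its node}
    blocks: ordered list of block ids; block_size: bytes per block.
    Greedy rule: give the next block to the client with the most
    remaining capacity.
    """
    remaining = dict(available_mem)
    tasks = {cid: [] for cid in available_mem}
    for block_id in blocks:
        cid = max(remaining, key=remaining.get)
        if remaining[cid] < block_size:
            raise MemoryError("not enough aggregate cache memory")
        tasks[cid].append(block_id)
        remaining[cid] -= block_size
    return tasks
```

Nodes with more free memory naturally receive more blocks, which matches the intent of sizing cache tasks by available memory.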
In one possible implementation manner, the target client is a client with the lowest process level in the nodes, wherein the process level of the client is determined by the server.
According to an aspect of the present disclosure, there is provided a training data access method, applied to a training system, the system comprising: a server and a plurality of nodes,
the server being configured to assign caching tasks to the plurality of nodes;
the node being configured to cache training data blocks based on the caching task assigned by the server;
the node comprising at least one client;
the method comprising: a first client of the at least one client obtaining a first data access request for target training data;
and the first client, in response to the first data access request, determining a target client for caching the target training data, so as to acquire the target training data from a node where the target client is located.
In one possible implementation manner, the determining, by the first client, the target client for caching the target training data in response to the first data access request includes:
and the first client responds to the first data access request, and determines the target client for caching a target training data block in a plurality of clients according to registration information and meta information of the plurality of clients in the training system, wherein the meta information comprises information of each training data.
In a possible implementation manner, in a case that the target client is the first client, the determining, by the first client, in response to the first data access request, the target client for caching the target training data, so as to obtain the target training data from a node where the target client is located, includes:
The first client acquires a target training data block comprising the target training data from a node where the first client is located, and acquires the target training data from the target training data block.
In a possible implementation manner, in a case that the target client is the second client, the determining, by the first client, the target client for caching the target training data in response to the first data access request, so as to obtain the target training data from a node where the target client is located, includes:
the first client sends a second data access request for the target training data to the second client, so that the second client responds to the second data access request to acquire the target training data block from a node where the second client is located, and sends the target training data block to the first client;
the first client acquires the target training data from the target training data block sent by the second client;
the first client is different from the second client, and the first client and the second client belong to different nodes.
In one possible implementation manner, before the first client obtains the target training data from the node where the target client is located, the method further includes:
the first client acquires a cache task distributed by the server for the first client;
and the first client caches the training data block to be cached indicated by the caching task into a node where the first client is located, wherein the training data block to be cached comprises the target training data block.
In one possible implementation manner, before the first client obtains the target training data from the node where the target client is located, the method further includes:
and under the condition that a plurality of clients in the training system finish registration, the first client caches the training data block to be cached indicated by the caching task into the node where the first client is located.
In one possible implementation manner, before the first client obtains the target training data from the node where the target client is located, the method further includes:
and under the condition that the first data access request is received, the first client determines a target training data block where target training data indicated by the first data access request is located, and acquires the target training data block from the server so as to cache the target training data block into a node where the first client is located.
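The miss-handling step above (fetch the block from the server, cache it on the local node, then serve it) reduces to a lazy-caching pattern. A sketch with hypothetical names:

```python
def get_block(node_cache, block_id, fetch_from_server):
    """Serve a block from the node cache, pulling it from the server on a miss.

    node_cache: dict acting as the node's cache unit.
    fetch_from_server: callable that downloads one block by id.
    The block is cached before being returned, so later requests for
    the same block are served locally.
    """
    if block_id not in node_cache:
        node_cache[block_id] = fetch_from_server(block_id)
    return node_cache[block_id]
```

After the first request, repeated accesses to the same block no longer touch the server, which is the point of caching on demand.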
In one possible implementation, the method further includes:
the first client acquires meta information from the server and sends a registration request to the server, so that the server sends registration information of a plurality of clients in the training system to the first client according to the registration request, and the meta information comprises information of each training data;
and the first client determines a training data block indicated by the allocated cache task of each client in the plurality of clients according to the registration information of the plurality of clients and the meta information.
In one possible implementation, the method further includes:
and the first client sends the available memory of the node where the first client is located to the server, so that the server determines a cache task allocated to the first client according to the available memories of a plurality of nodes of the training system.
In one possible implementation manner, the first client is a client with the lowest process level in the nodes, wherein the process level of the client is determined by the server.
According to an aspect of the present disclosure, there is provided a training data access apparatus, the apparatus being applied to a training system, the system comprising: the system comprises a server and a plurality of nodes, wherein the server is used for distributing caching tasks for the plurality of nodes; the node is used for caching the training data block based on the cache task distributed by the server; the node comprises at least one client;
The apparatus is deployed in a first client of the at least one client, the apparatus comprising:
the acquisition module is used for acquiring a first data access request aiming at target training data;
and the processing module is used for responding to the first data access request, determining a target client for caching the target training data, and acquiring the target training data from a node where the target client is located.
In one possible implementation manner, the processing module is specifically configured to determine, in response to the first data access request, the target client for caching a target training data block in a plurality of clients in the training system according to registration information of the plurality of clients and meta information, where the meta information includes information of each training data.
In a possible implementation manner, the processing module is specifically configured to obtain, when the target client is the first client, a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In a possible implementation manner, the processing module is specifically configured to send a second data access request for the target training data to a second client, where the target client is the second client, so that the second client obtains the target training data block from a node where the second client is located in response to the second data access request, and sends the target training data block to the first client; acquiring the target training data from the target training data block sent by the second client;
The first client is different from the second client, and the first client and the second client belong to different nodes.
In one possible implementation, the apparatus further includes:
the caching module is used for acquiring a caching task distributed by the server for the first client before the first client acquires the target training data from a node where the target client is located; and caching the training data block to be cached indicated by the caching task into the node where the first client is located, wherein the training data block to be cached comprises the target training data block.
In a possible implementation manner, the caching module is further configured to cache, before the target training data is obtained from the node where the target client is located and in a case that the plurality of clients in the training system have completed registration, the training data block to be cached indicated by the caching task into the node where the first client is located.
In a possible implementation manner, the caching module is further configured to, in a case that the first data access request is received before the target training data is acquired from the node where the target client is located, determine the target training data block where the target training data indicated by the first data access request is located, and acquire the target training data block from the server, so as to cache the target training data block into the node where the first client is located.
In one possible implementation, the apparatus further includes:
the registration module is used for acquiring meta information from the server and sending a registration request to the server so that the server can send registration information of a plurality of clients in the training system to the first client according to the registration request, wherein the meta information comprises information of each training data;
the determining module is further configured to determine, according to the registration information of the plurality of clients and the meta information, the training data block indicated by the cache task allocated to each client in the plurality of clients.
In one possible implementation, the apparatus further includes:
and the sending module is used for sending the available memory of the node where the first client is located to the server so that the server determines the cache task allocated to the first client according to the available memories of a plurality of nodes of the training system.
In one possible implementation manner, the first client is a client with the lowest process level in the nodes, wherein the process level of the client is determined by the server.
According to an aspect of the present disclosure, there is provided an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
According to an aspect of the present disclosure, there is provided a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method.
In an embodiment of the disclosure, the training system may include a server and a plurality of nodes. The server may assign caching tasks to the plurality of nodes, and the nodes may cache training data blocks based on those tasks. A node may include at least one client; a first client of the at least one client may obtain a first data access request for target training data and, in response, determine a target client that caches the target training data, so as to acquire the data from the node where the target client is located. In this way, training data blocks can be scattered and cached across multiple nodes, making full use of the nodes' cache resources; client-to-client data exchange then improves the loading speed of training data and relieves the data transmission pressure between clients and the server.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure. Other features and aspects of the present disclosure will become apparent from the following detailed description of exemplary embodiments, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the technical aspects of the disclosure.
Fig. 1 shows a block diagram of a training system according to an embodiment of the present disclosure.
Fig. 2 shows a flowchart of a training data access method according to an embodiment of the present disclosure.
Fig. 3 shows a block diagram of acquiring target training data according to an embodiment of the present disclosure.
Fig. 4 illustrates a block diagram of an example of a communication connection between clients in accordance with an embodiment of the present disclosure.
Fig. 5 illustrates a block diagram of an example of a client buffering training data blocks, according to an embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of an example of client buffering training data blocks, according to an embodiment of the present disclosure.
Fig. 7 shows a block diagram of client registration according to an embodiment of the present disclosure.
Fig. 8 shows a block diagram of a training system according to an embodiment of the present disclosure.
Fig. 9 shows a block diagram of a training data access device according to an embodiment of the present disclosure.
Fig. 10 shows a block diagram of an electronic device, according to an embodiment of the disclosure.
Detailed Description
Various exemplary embodiments, features and aspects of the disclosure will be described in detail below with reference to the drawings. In the drawings, like reference numbers indicate identical or functionally similar elements. Although various aspects of the embodiments are illustrated in the accompanying drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The word "exemplary" is used herein to mean "serving as an example, embodiment, or illustration." Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
The term "and/or" herein merely describes an association between associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, both A and B exist, or B exists alone. In addition, the term "at least one" herein means any one of a plurality, or any combination of at least two of a plurality; for example, "including at least one of A, B, and C" may mean including any one or more elements selected from the set consisting of A, B, and C.
Furthermore, numerous specific details are set forth in the following detailed description in order to provide a better understanding of the present disclosure. It will be understood by those skilled in the art that the present disclosure may be practiced without some of these specific details. In some instances, methods, means, elements, and circuits well known to those skilled in the art have not been described in detail in order not to obscure the present disclosure.
The training system provided by the embodiments of the present disclosure may include a server and a plurality of nodes. The server may assign caching tasks to the plurality of nodes, and each node may cache training data blocks based on its assigned caching task. In this way, a large number of training data blocks can be stored in a distributed manner across the nodes, reducing the access load on the server. A node may include at least one client; a first client of the at least one client may obtain a first data access request for target training data and, in response, determine a target client that caches the target training data, so as to acquire the data from the node where the target client is located. This improves the loading speed of the target training data and makes full use of the cache resources in the nodes.
In the related art, training data blocks are generally stored on a server, and a client accessing training data must send an access request to the server. The server's processing resources are limited, however, and when it receives access requests from a large number of clients simultaneously it may fail to respond to some of them in time, slowing the clients' acquisition of training data. In embodiments of the present disclosure, training data blocks are cached in the nodes, so a client can obtain target training data from a node's cache unit. This improves the loading speed of the target training data and makes full use of the cache resources of multiple nodes, allowing clients to communicate efficiently and reliably and improving the transfer performance for training data.
Fig. 1 shows a block diagram of a training system according to an embodiment of the present disclosure. As shown in Fig. 1, the training system may include: a server 11 and a plurality of nodes 12.
The server 11 is configured to allocate cache tasks to the plurality of nodes 12;
the node 12 is configured to cache a training data block based on a cache task allocated by the server; the node 12 comprises at least one client 13.
In the embodiments of the present disclosure, the training system may be applied in the context of neural network training. A neural network may use a large amount of training data during training, and this training data may form a data set. The server 11 may manage the data set. The data set may include a plurality of training data blocks, and each training data block may include multiple pieces of training data. Here, the training data may be any data required during training, such as input data, output data, or label data of the neural network, and may be image information, text information, and the like. The server 11 assigning caching tasks to a plurality of nodes can be understood as assigning training data blocks to the plurality of nodes for caching.
Here, the training system may include a plurality of nodes, where the nodes may be training nodes in a neural network, may be devices, a device cluster (including at least two devices), or a program running on a device, and are not limited herein. Each node 12 may have a corresponding buffer unit, and may buffer the training data blocks corresponding to the buffer tasks allocated by the server 11, so that a plurality of training data blocks may be stored in a distributed manner in a plurality of nodes. For example, in the case that the node is a device cluster, the cache unit may refer to one or more servers or other devices belonging to the device cluster; in the case that the node is a device, the cache unit may refer to an area on the device for storing data.
Here, each node 12 may include at least one client. Each client may provide services for the node, for example serving a training process (a training process here may be understood as running in a client of the training system deployed on a node). The at least one client included in each node 12 may share the cache of the node where it is located; that is, multiple clients deployed on the same node may store the training data blocks indicated by their server-assigned caching tasks in the same cache unit. In some implementations, different cache units may also be set for different clients.
By caching, on the nodes, the training data blocks indicated by the server-assigned caching tasks, distributed caching of training data on the client side can be realized, improving the loading performance of the training data. It should be noted that the training system may be integrated in one device; in some implementations, the server and the nodes of the training system may instead be deployed on different devices. The embodiments of the present disclosure do not limit the specific arrangement.
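The per-node cache unit shared by all clients on a node might look like the following in-process sketch. The names are illustrative, and a real deployment could use shared memory or a local key-value store instead of a Python object:

```python
import threading

class NodeCacheUnit:
    """Sketch of a cache unit shared by the clients on one node.

    Multiple client threads on the node write the blocks from their
    assigned caching tasks into the same store; a lock keeps concurrent
    inserts consistent, and the first write for a block id wins.
    """

    def __init__(self):
        self._blocks = {}
        self._lock = threading.Lock()

    def put(self, block_id, block):
        with self._lock:
            # Keep the first cached copy; repeated puts are no-ops.
            self._blocks.setdefault(block_id, block)

    def get(self, block_id):
        with self._lock:
            return self._blocks.get(block_id)
```

Sharing one store per node is what lets any client on the node serve a block that any other co-located client cached.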
The following describes the training data access method provided by the embodiments of the present disclosure. The method may be applied to a terminal device or another electronic device serving as a node in the training system. The terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the method may be implemented by a processor invoking computer-readable instructions stored in a memory. The method is described below with the first client as the execution subject.
Fig. 2 shows a flowchart of a training data access method according to an embodiment of the present disclosure. The training data access method may include the steps of:
step S21, a first client of the at least one client obtains a first data access request for target training data.
Step S22, the first client determines, in response to the first data access request, a target client for caching the target training data, so as to obtain the target training data from the node where the target client is located.
In the embodiment of the disclosure, the first client may be any one of the at least one client included in the node, or a client determined according to a preset rule. For example, the first client may be determined according to the order in which clients register with the server, and/or according to the process level assigned to each client by the server, and/or according to the data processing capabilities (e.g., computing capabilities) of the respective clients in each node, and so on. That is, the server may consider one or more of the factors exemplified above to determine the first client. The first data access request may be an access request for accessing the target training data; it may be generated by the first client, or may be sent by a client other than the first client, which may be located in the same node as the first client or in a different node. For example, the first client may receive the first data access request from another client in the same node.
Here, the target training data may be located in a target training data block, and the target client may be a client to which a buffering task of the target training data block is allocated. The target client may be the first client or may be another client other than the first client.
In the embodiment of the disclosure, the first client may receive a first data access request for accessing the target training data, and then may determine, according to relevant information of the target training data carried in the first data access request, a target client for caching the target training data. Here, the related information of the target training data in the first data access request may include information such as a name, a storage path, and the like of the target training data.
Here, the training data block may be formed by aggregating a plurality of training data, and a large amount of training data may be aggregated into a plurality of training data blocks for storage in order to facilitate transmission and storage of the training data. Correspondingly, the first client may further store a correspondence between the training data block and the client, where the correspondence may represent a correspondence between the client and a cache task of the training data block allocated to the client. The first client can determine a target training data block where the target training data is located according to the related information of the target training data carried in the first data access request. And then according to the corresponding relation between the training data block and the client, the client corresponding to the target training data block can be determined.
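The two lookups described above — training data name to training data block via the meta information, then block to caching client via the stored correspondence — can be modeled with a small sketch. The dictionary layouts and names below are assumptions for illustration, not the patent's exact format:

```python
# Meta information: training-data name -> the block containing it (illustrative).
meta_info = {
    "img_001.jpg": {"block": 1, "offset": 0, "length": 4096},
    "img_002.jpg": {"block": 3, "offset": 4096, "length": 2048},
}

# Correspondence between training data blocks and the clients assigned
# to cache them, derived from the allocated caching tasks.
block_to_client = {1: "client_A", 2: "client_A", 3: "client_B", 4: "client_B"}

def resolve_target_client(data_name):
    """Map a requested training-data name to (target block, target client)."""
    block_id = meta_info[data_name]["block"]
    return block_id, block_to_client[block_id]
```

For example, a request for `img_002.jpg` resolves to block 3 and therefore to `client_B`, the client whose caching task covers that block.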
After determining the target client for caching the target training data, the first client can acquire the target training data from the node where the target client is located according to the address information of the target client. Here, the address information may include an Internet Protocol (IP) address and a port address; the first client may acquire the address information of the target client from the server, and the first client may acquire the address information of a plurality of clients from the server.
According to the first client side of the embodiment of the disclosure, the target training data can be acquired in the node where the target client side is located under the condition that the first data access request for accessing the target training data is acquired, the speed of training data access can be improved, and efficient and reliable communication among a plurality of client sides is realized.
In one possible implementation, the first client may determine, in response to the first data access request, a target client for caching a target training data block from a plurality of clients in the training system according to registration information of the plurality of clients and meta information, where the meta information includes information of each training data block.
In this implementation manner, meta information of a plurality of training data may be stored in the first client in advance, where the meta information may include information of a plurality of training data, for example, may include information of a name, a data length, a data offset address, a training data block where each training data is located, and the first client may determine, according to the pre-stored meta information, a target training data block where the target training data is located.
The first client may also store registration information of a plurality of clients in advance, where the registration information of the plurality of clients may include address information of each client, a process level, an available memory of a node where the client is located, and other information. According to the registration information of the plurality of clients, the first client can determine the allocated buffer task of each client, and further according to the allocated buffer task of each client, the client for buffering each training data block can be determined, that is, the correspondence between the training data block and the client can be determined, so that the target client for buffering the target training data block can be determined according to the correspondence.
Here, the registration information and the meta information stored in advance by the first client may be acquired from the server. The first client may determine a correspondence between each client and the cached training data block according to the pre-stored registration information and meta information, and store the correspondence. Thus, according to the corresponding relation, the first client can quickly determine the target training data block where the target training data is located.
In one possible implementation, in the case that the target client is the first client, the first client may obtain a target training data block including target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In this implementation manner, in the case that the target client is the first client, it may be determined that the client for caching the target training data block is the first client, and the target training data block is stored in the node where the first client is located, so that the first client may obtain the target training data in the target training data block in the local cache (i.e., the cache unit of the node where the first client is located), thereby implementing fast reading of the target training data.
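This local fast path can be sketched minimally, assuming (as the meta information described elsewhere suggests) that each piece of training data is located within its block by an offset and a length; the cache layout below is a hypothetical stand-in:

```python
# Hypothetical cache unit of the node where the first client is located:
# block_id -> raw block bytes.
local_cache = {3: b"AAAA" + b"HELLO" + b"BBB"}

def read_local(block_id, offset, length):
    """Slice the target training data out of a block already cached locally,
    with no network round trip."""
    block = local_cache[block_id]
    return block[offset:offset + length]
```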
Here, the first client may be a client having the capability to access the cache in the node where it is located. In one implementation, the first client may be the client with the lowest process level in the node, where the process level of each client is determined by the server. The server may assign a unique process level to each client, so that the client with cache access capability in each node may be determined based on the process levels of the respective clients in that node. For example, the server may randomly assign a unique process level to each client, or may assign process levels according to a certain rule, for example, set the process level according to the order in which each client registers with the server. The certain rule may be preset and includes, but is not limited to, the above-mentioned examples; the embodiment of the present application does not limit the setting manner of the rule.
In one possible implementation manner, in the case that the target client is the second client, the first client sends a second data access request for the target training data to the second client, so that the second client responds to the second data access request to acquire the target training data block from the node where the second client is located, and sends the target training data block to the first client. The first client acquires target training data from a target training data block sent by the second client. The first client and the second client are different, and the first client and the second client belong to different nodes.
In this implementation, if the target client is a second client, and the first client and the second client belong to different nodes, the first client may send a second data access request for accessing the target training data to the second client after obtaining the first data access request. The second client may determine a target training data block including target training data according to the second data access request and transmit the target training data block to the first client. The first client may obtain the target training data in a target training data block. Thus, for a first client, the first client is able to access a target training data block from another node by means of a second client located on the other node (different from the node on which the first client is located).
The second client has the same or similar functions as the first client, so the manner of determining the second client may refer to that of the first client; in the case that the second client receives the first data access request, the related content described above for the first client applies to data storage, data access, and so on, which is not repeated here. For example, the second client may be a client having the capability to access the cache in the node where it is located, and may be the client with the lowest process level in that node, or the client with the highest computing capability in that node.
The above-described process of acquiring target training data is described below by way of an example. FIG. 3 illustrates a block diagram of acquiring target training data according to an embodiment of the present disclosure. It is assumed that the first node includes a client A, a client B, and a client C, where client B is the client with the smallest process level among the clients in the first node. It is assumed that the second node includes a client D, a client E, and a client F, where client E is the client with the smallest process level among the clients in the second node.
In one example, any one client in each node may access a block of training data indicated by the client's assigned cache task. The client a in the first node receives an access request for accessing the target training data, and may send the first data access request to the client C when it is determined that the target training data block where the target training data is located is cached on the node where the client C is located. The client C may read the target training data in a cache unit of the node where the client C is located, and send the target training data to the client a. That is, for each client deployed in the training system, the client having the capability of buffering the training data block may access the target training data block buffered in the buffer unit by the client after receiving the data access request sent by the other client, and send the target training data block to the client initiating the data access request, so that the client initiating the data access request completes the access of the target training data.
In one example, the client with the smallest process level in each node may access the cached training data blocks in the node where it is located. Client A receives an access request for accessing the target training data, and may send a first data access request to client B in the case that it determines that the target training data is stored in the first node. Client B may read the target training data in the local cache and return the target training data to client A.
In one example, the client with the smallest process level in each node may access the cached training data blocks in the node where it is located. It should be noted that, in this case, the client with the smallest process level in each node has the capability of caching training data blocks. When client A receives the access request for accessing the target training data and determines that the target training data is cached by client E, client A may send a second data access request to client E in the second node; client E obtains the target training data block containing the target training data in the cache of the second node and returns it to client A.
In order to enable efficient communication and data transfer between clients, communication between clients may be established through a remote procedure call (Remote Procedure Call, RPC) framework; for example, the simple and efficient Apache Thrift may be used to enable communication between clients. Apache Thrift is a compact and friendly RPC framework that supports an interface description language with code generation for C++, Java, Python, and other computer languages. Moreover, Apache Thrift adopts a binary communication protocol, so communication between clients can be realized more efficiently.
According to the communication mode between clients provided by the embodiment of the disclosure, the problem that the number of connections is too large and the server pressure is too high because one server corresponds to a plurality of clients can be solved; having the first client in each node be responsible for accessing training data can reduce the pressure on the server. Fig. 4 illustrates a block diagram of an example of a communication connection between clients according to an embodiment of the present disclosure. The implementation in Fig. 4 may represent a communication connection in which the cache unit is accessed by a designated client in each node, e.g., the client with the smallest process level in each node is responsible for the access of training data. Each client communicates with the client with the smallest process level in each node (i.e., for one client, data interaction can be performed between that client and the client with the smallest process level in each node), where n represents the number of clients and p represents the number of nodes. In the case of full connection of clients, each client typically has the capability to cache training data blocks, and the number of connections between clients is n×(n-1). By adopting either implementation, the data interaction between the clients and the server can be effectively reduced, thereby reducing the network pressure.
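The connection-count comparison can be checked with a little arithmetic. The sketch below assumes n clients spread over p nodes, as in the text:

```python
def full_mesh_connections(n):
    """Fully connected clients: each of n clients links to the other n-1,
    giving n x (n-1) directed connections."""
    return n * (n - 1)

def designated_client_connections(n, p):
    """Each of n clients only links to the designated client (e.g., the one
    with the smallest process level) in each of p nodes, as in Fig. 4."""
    return n * p
```

For instance, with 64 clients on 8 nodes, the full mesh needs 64×63 = 4032 connections, while the designated-client scheme needs only 64×8 = 512.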
In one possible implementation manner, before the first client obtains the target training data from the node where the target client is located, the first client obtains a buffer task allocated by the server for the first client, the first client buffers a training data block to be buffered indicated by the buffer task into the node where the first client is located, and the training data block to be buffered includes the target training data block.
In this implementation manner, before the first client obtains the target training data from the node where the target client is located, the first client may obtain the caching task allocated by the server for the first client, and cache, in the node where the first client is located, the training data blocks to be cached indicated by the obtained caching task. For example, the training data blocks to be cached may be cached in the node before the first data access request is received (i.e., all the training data blocks indicated by the caching task are cached in the node where the first client is located in advance), or may be cached in the node after the first data access request is received (i.e., the training data block requested at that time is cached in the node where the first client is located on demand, according to the first data access request). Here, the training data blocks to be cached may be all the training data blocks indicated by the caching task, or one or more of them. The training data block to be cached includes the target training data, so the first client can quickly acquire the target training data included in the target training data block in the node. Here, the first client is the target client.
In one example, before the first client obtains the target training data from the node where the target client is located, the first client caches the training data block to be cached indicated by the caching task into the node where the first client is located under the condition that a plurality of clients in the training system complete registration.
In this example, the first client may cache the training data block indicated by the allocated cache task in the node in advance, that is, the first client may cache the training data block indicated by the allocated cache task locally in advance in a case where a plurality of clients complete registration, so that the first client may quickly respond to the first data access request in a case where the first data access request is received, and acquire the target training data in the cache of the node.
Fig. 5 illustrates a block diagram of an example of a client buffering training data blocks, according to an embodiment of the present disclosure. In a one-time active caching (one shot) mode, that is, after a client in a training system registers with a server, a training data block indicated by a cache task allocated by the server is cached in a caching unit of a node at one time, and after the client caches the training data block, the client can often not increase the cache tasks of other training data blocks. Suppose that the training system includes 3 clients, client a, client B, and client C. Client a may pre-cache assigned training data blocks 1 and 2, client B may pre-cache assigned training data blocks 3 and 4, and client C may pre-cache assigned training data blocks 5 and 6. In the case where the client a receives a request to read the training data blocks 3 and 5, a request to read the training data blocks may be initiated to the client B and the client C according to the correspondence between the training data blocks and the clients to acquire the training data blocks 3 through the client B and the training data blocks 5 through the client C.
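The one-shot mode above can be modeled with a toy simulation. The class, the in-memory "server store", and the block contents are all illustrative assumptions:

```python
class OneShotClient:
    """Toy model of one-shot active caching: every block indicated by the
    assigned caching task is fetched from the (simulated) server right
    after registration, in a single pass."""
    def __init__(self, name, assigned_blocks, server_store):
        self.name = name
        # Cache all assigned blocks up front; no further cache tasks are added.
        self.cache = {b: server_store[b] for b in assigned_blocks}

server_store = {1: "data1", 2: "data2", 3: "data3",
                4: "data4", 5: "data5", 6: "data6"}
client_a = OneShotClient("A", [1, 2], server_store)  # blocks 1 and 2 cached immediately
```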
In one example, when the first client receives the first data access request, determining a target training data block where target training data indicated by the first data access request is located, and acquiring the target training data block from a server to cache the target training data block in a node where the first client is located.
In this implementation manner, after receiving the first data access request, the first client may determine the target training data block of the target training data according to the first data access request, where the client allocated the caching task corresponding to the target training data block is the first client; the first client may then obtain the target training data block from the server and cache it in the node. That is, after the first client receives the first data access request, only the target training data block may be cached, and other training data blocks indicated by the caching task are not cached. In this way, the pressure caused by a plurality of clients acquiring training data blocks from the server at the same time can be reduced, and the cache resources of the node where the first client is located can be saved.
Fig. 6 illustrates a block diagram of an example of a client caching training data blocks according to an embodiment of the present disclosure. In an on-demand passive caching (on demand) mode, i.e., after receiving a request to access training data, a client in the training system may cache the training data block to be accessed in a cache unit of the node according to the request. Suppose that the training system includes 3 clients: client A, client B, and client C. The allocated caching task of client A includes training data block 1 and training data block 2, that of client B includes training data block 3 and training data block 4, and that of client C includes training data block 5 and training data block 6. In the case that client A receives the first data access request for accessing training data block 3 and training data block 5, it may initiate a request for accessing training data block 3 to client B and a request for accessing training data block 5 to client C according to the correspondence between training data blocks and clients. After receiving the request for reading training data block 3, client B may cache training data block 3; after receiving the request for reading training data block 5, client C may cache training data block 5. Client A thus obtains training data block 3 through client B and training data block 5 through client C, and client A may not execute its own allocated caching task, i.e., client A may not cache training data block 1 and training data block 2.
In this way, after receiving the request for reading the corresponding training data block, the client can acquire the corresponding training data block from the server and cache the corresponding training data block locally, so that the local cache can be saved, and the pressure of a plurality of clients for simultaneously requesting the training data block from the server can be relieved.
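By contrast with the one-shot mode, the on-demand mode can be sketched as a lazy cache. Again, the class and store names are illustrative stand-ins, not the patent's implementation:

```python
class OnDemandClient:
    """Toy model of on-demand passive caching: a block assigned to this
    client is fetched from the (simulated) server only on the first
    request for it."""
    def __init__(self, assigned_blocks, server_store):
        self.assigned = set(assigned_blocks)
        self.server_store = server_store
        self.cache = {}  # empty until requests arrive, unlike one-shot mode

    def get_block(self, block_id):
        if block_id not in self.cache:  # first access: pull from server, then cache
            self.cache[block_id] = self.server_store[block_id]
        return self.cache[block_id]

server_store = {3: "data3", 4: "data4"}
client_b = OnDemandClient([3, 4], server_store)
```

Until some client asks for block 3, client B caches nothing; after the first request, block 3 is cached locally while block 4 stays on the server, saving local cache and spreading the load of fetching from the server over time.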
In one possible implementation, the first client obtains meta information from the server, and sends a registration request to the server, so that the server sends registration information of a plurality of clients in the training system to the first client according to the registration request, where the meta information includes information of each training data. And then the first client determines the training data block indicated by the allocated buffer task of each client in the plurality of clients according to the registration information and the meta information of the plurality of clients.
In this implementation, the first client may register with the server. Fig. 7 shows a block diagram of client registration according to an embodiment of the present disclosure. The first client may obtain the meta information of the training data from the server, where the meta information may include information of a plurality of training data, for example, the name, the data length, the data offset address, and the training data block where each piece of training data is located; the meta information may be used by the first client to determine the target training data block where the target training data is located. The first client may also initiate a registration process, send a registration request to the server, and register with the server. The registration request may carry information about the first client, for example, the address information of the first client, a task identifier, the available memory of the node where the first client is located, and so on. Here, the address information of the first client may include an IP address and a port address. For example, the first client may select a port, randomly or according to a rule, from the currently idle ports, and send the port address of that port to the server, so that port collisions between multiple clients in a node may be reduced. Here, the task identifier may be used to indicate the task for which the first client requests registration, and may be obtained from an environment variable of the training system.
After receiving the registration request of the first client, the server may send registration information of a plurality of clients in the training system to the first client. Here, the registration information may include address information of the client, a process level, available memory, and the like. For example, the server may receive registration requests of multiple clients in the training system, and record information such as an IP address, a port address, a task identifier, and an available memory of a node where each client is located, which are carried in the registration request of each client. The server may obtain the number of processes in the environment variables of the training system, which may refer to the number of registration processes initiated by the client. Then, under the condition that the registration requests of a plurality of clients in the training system are received, that is, the received registration requests are larger than or equal to the number of processes, different process grades can be allocated to different clients according to the number of the clients initiating the registration requests, and the address information, the process grade, the available memory and other registration information of each client are sent to the plurality of clients.
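One admissible rule mentioned above — assigning a unique process level to each client in registration order — can be sketched as follows. The zero-based numbering is an assumption for illustration:

```python
def assign_process_levels(registration_order):
    """Give each client a unique process level according to the order in
    which its registration request arrived at the server."""
    return {client: level for level, client in enumerate(registration_order)}
```

For example, if clients register in the order B, A, C, then B receives the smallest process level.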
Further, the first client may determine, according to the registration information and the acquired meta information of the plurality of clients, a buffering task allocated by the server to each client, that is, may determine a correspondence between each client and a training data block to be buffered by the client. In this way, the first client can locally calculate the corresponding relation between each client and the training data block, so as to reduce the server pressure caused by the client accessing the server.
In one possible implementation, the first client may send the available memory of the node where the first client is located to the server, so that the server determines, according to the available memory of the plurality of nodes of the training system, a cache task allocated to the first client.
In this implementation, the first client may send the available memory of the node where the first client is located to the server, e.g., the first client may carry the available memory of the node where the first client is located in a registration request that registers with the server, or the first client may send the available memory of the node where the first client is located to the server before sending the registration request to the server.
The server may allocate a caching task of training data blocks for each of the plurality of clients based on the available memory of each client. The server may determine, according to the available memory of each client, the cache size of the node where each client is located (the cache size refers to the cache used for storing training data blocks on the node), and then allocate, according to the cache size of the node where each client is located, a caching task of training data blocks to each client, so that the training data blocks indicated by the caching task allocated to each client match the cache size of the client.
Here, the cache size of each client may be calculated according to the following formula (1):

cache size = available memory of the node where the client is located × (data set size / total memory)    formula (1)

where the data set is the set formed by all training data blocks, and the total memory is the sum of the available memory of all clients.
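Formula (1) can be written directly as a function. Units are arbitrary but must be consistent (e.g., all in GB):

```python
def cache_size(node_available_memory, dataset_size, total_memory):
    """Formula (1): cache size = available memory of the node
    x (data set size / total memory), where total memory is the sum of
    the available memory across all clients."""
    return node_available_memory * (dataset_size / total_memory)
```

For instance, with a 16 GB data set and 32 GB of total available memory, a node with 8 GB available gets a 4 GB cache and a node with 24 GB gets 12 GB; the per-node caches sum to the data set size, so each node's share is proportional to its memory.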
Here, when determining the caching task allocated to each client according to the available memory of the node where each client is located, the server may sequentially allocate the ordered training data blocks to each client according to the process level of each client, for example, allocate the training data blocks ranked earlier to the clients with smaller process levels. According to this allocation mode, the first client can determine the caching task of the training data blocks allocated to each client based on the available memory of the node where each client is located, the process level, and the meta information of the training data blocks. In this way, the first client can quickly determine the correspondence between each client and the training data blocks to be cached, and, in the case of receiving the first data access request, can quickly determine the target client for the target training data.
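The ordered allocation described here can be sketched as follows, assuming each client's cache size is expressed as a number of blocks (the tuple layout is an illustrative assumption):

```python
def allocate_blocks(clients, ordered_block_ids):
    """clients: list of (name, process_level, capacity_in_blocks).
    Earlier-ranked blocks go to clients with smaller process levels,
    each client taking as many blocks as its cache size allows."""
    assignment, i = {}, 0
    for name, _level, capacity in sorted(clients, key=lambda c: c[1]):
        assignment[name] = ordered_block_ids[i:i + capacity]
        i += capacity
    return assignment
```

Because the rule is deterministic, every client can recompute the same assignment locally from the registration information and meta information, without asking the server for the correspondence.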
It should be noted that, the client and the server provided in the embodiments of the present disclosure may be configured in the same electronic device, or may be configured in different electronic devices, and the embodiments of the present disclosure do not limit a specific configuration manner. The embodiment of the disclosure provides a scheme of client distributed dynamic caching, which can realize efficient information communication and meet the requirements of different users on caching modes.
According to the training system and the training data access method provided by the embodiment of the disclosure, the training data blocks can be cached in the nodes, and the cache resources of a plurality of nodes can be fully utilized. The first client can access the training data cached by the first client in the node, so that data interaction with other clients is realized, the speed of loading target training data is improved, the clients can communicate efficiently and reliably, and the communication performance of the training data is improved.
It will be appreciated that the above-mentioned method embodiments of the present disclosure may be combined with each other to form combined embodiments without departing from the principle and logic, which are not repeated in the present disclosure due to space limitations. It will be appreciated by those skilled in the art that, in the above-described methods of the embodiments, the particular order of execution of the steps should be determined by their function and possible inherent logic.
In addition, the present disclosure further provides a training data access apparatus, an electronic device, a computer-readable storage medium, and a program, each of which may be used to implement any one of the training data access methods provided in the present disclosure; for the corresponding technical schemes and descriptions, refer to the corresponding descriptions in the method parts, which are not repeated here.
Fig. 8 shows a block diagram of a training system according to an embodiment of the present disclosure. As shown in Fig. 8, the system includes: a server 31 and a plurality of nodes 32, where:
the server 31 is configured to allocate a cache task to the plurality of nodes;
the node 32 is configured to cache a training data block based on the cache task allocated by the server;
the node comprises at least one client 33;
a first client 331 of the at least one client, configured to obtain a first data access request for target training data;
the first client 331 is configured to determine, in response to the first data access request, a target client for caching the target training data, so as to obtain the target training data from a node where the target client is located.
In a possible implementation manner, the first client 331 is further configured to determine, in response to the first data access request, the target client for caching a target training data block in a plurality of clients in the training system according to registration information and meta information of the plurality of clients, where the meta information includes information of each training data.
In a possible implementation manner, in the case that the target client is the first client, the first client 331 is configured to obtain a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In a possible implementation manner, in a case that the target client is a second client, the first client 331 is configured to send a second data access request for the target training data to the second client, where the first client is different from the second client, and the first client and the second client belong to different nodes;
the second client is configured to obtain, in response to the second data access request, the target training data block from the node where the second client is located, and to send the target training data block to the first client;
the first client 331 is further configured to obtain the target training data from the target training data block sent by the second client.
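The access path described above (resolve the caching client, then read locally or issue a second data access request to a peer) can be sketched as follows. This is an illustrative sketch, not the patented implementation: the round-robin `block_owner` mapping and all names (`Client`, `remote_read`, and so on) are assumptions introduced for the example.

```python
# Illustrative sketch of the first-client access path: determine which
# client caches the block holding the requested sample, then read locally
# or issue a "second data access request" to the peer client.

def block_owner(block_id, clients):
    """Deterministically map a block to the client assigned to cache it.

    Every client computes this from the same registration information,
    so no per-request round trip to the server is needed.
    """
    return clients[block_id % len(clients)]

class Client:
    def __init__(self, name, clients, local_blocks):
        self.name = name
        self.clients = clients            # registration info: all client names
        self.local_blocks = local_blocks  # block_id -> list of samples

    def get(self, block_id, index, remote_read):
        target = block_owner(block_id, self.clients)
        if target == self.name:                  # target client is this client:
            block = self.local_blocks[block_id]  # read from the local node
        else:                                    # second data access request
            block = remote_read(target, block_id)
        return block[index]

# Two clients on different nodes; "c0" owns even blocks, "c1" odd ones.
peers = {"c1": {1: ["b", "d"]}}
c0 = Client("c0", ["c0", "c1"], {0: ["a", "c"]})
local = c0.get(0, 1, remote_read=lambda t, b: peers[t][b])   # local hit
remote = c0.get(1, 0, remote_read=lambda t, b: peers[t][b])  # remote fetch
```

Because the owner lookup is a pure function of shared state, either branch returns the same sample regardless of which client the request arrives at.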
In a possible implementation manner, the target client is configured to obtain a cache task allocated to the target client by the server;
The target client is configured to cache the training data block to be cached indicated by the caching task to a node where the target client is located, where the training data block to be cached includes the target training data block.
In a possible implementation manner, the target client is further configured to cache, in a node where the target client is located, the training data block to be cached indicated by the caching task when the plurality of clients in the training system complete registration.
In a possible implementation manner, the target client is further configured to determine, when the target client receives the second data access request, a target training data block where target training data indicated by the second data access request is located, and obtain the target training data block from the server, so as to cache the target training data block in a node where the target client is located.
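The on-demand behavior in the preceding paragraph — fetching a not-yet-cached block from the server and then serving later requests from the node-local cache — can be sketched as below. The `TargetClient` class and the `server_read` callback are hypothetical stand-ins for the real interfaces.

```python
# Hedged sketch: on a data access request for a block that is not yet
# cached on the target client's node, the block is pulled from the
# server once and cached; subsequent requests are served node-locally.

class TargetClient:
    def __init__(self, server_read):
        self.cache = {}                # node-local cache: block_id -> block
        self.server_read = server_read # assumed server interface
        self.server_reads = 0          # counts round trips to the server

    def handle_request(self, block_id):
        if block_id not in self.cache:  # cache miss: fetch from server once
            self.cache[block_id] = self.server_read(block_id)
            self.server_reads += 1
        return self.cache[block_id]     # later hits stay on the node

store = {7: b"block-7-bytes"}
tc = TargetClient(server_read=lambda b: store[b])
first = tc.handle_request(7)   # miss: read from server, then cache
second = tc.handle_request(7)  # hit: served from the node's cache
```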
In a possible implementation manner, the target client is configured to obtain meta information from the server and send a registration request to the server, where the meta information includes information about each piece of training data;
the server is further configured to send registration information of a plurality of clients in the training system to the target client according to the registration request;
The target client is further configured to determine a training data block indicated by the allocated cache task of each of the plurality of clients according to the registration information of the plurality of clients and the meta information.
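The point of the exchange above is that registration information plus meta information suffice for every client to derive the same block-to-client assignment locally. A minimal sketch, assuming a round-robin split over a sorted client list (the disclosure does not fix a particular partitioning scheme; the names below are illustrative):

```python
# Each client independently computes an identical block-to-client table
# from the shared meta information (block list) and registration
# information (client list) -- no further coordination needed.

def assignment(meta_blocks, registered_clients):
    """Map each training data block to exactly one caching client."""
    clients = sorted(registered_clients)  # same order on every client
    return {block: clients[i % len(clients)]
            for i, block in enumerate(sorted(meta_blocks))}

meta = ["blk-0", "blk-1", "blk-2", "blk-3"]
# Clients may receive the registration list in different orders, yet
# the locally computed tables agree.
seen_by_c0 = assignment(meta, ["c1", "c0", "c2"])
seen_by_c2 = assignment(meta, ["c2", "c1", "c0"])
```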
In one possible implementation manner, the target client is further configured to send, to the server, the available memory of the node where the target client is located;
the server is further configured to determine, according to the available memory of the plurality of nodes of the training system, the cache task allocated to the target client.
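A possible reading of this allocation step is that the server sizes each node's cache task in proportion to the memory the node reported. The largest-remainder rounding below is an assumption for illustration; the disclosure does not specify a division scheme.

```python
# Illustrative server-side allocation: split the training data blocks
# across nodes proportionally to each node's reported available memory.

def allocate(num_blocks, available_mem):
    """Return {node: block_count} proportional to reported memory."""
    total = sum(available_mem.values())
    shares = {n: num_blocks * m / total for n, m in available_mem.items()}
    alloc = {n: int(s) for n, s in shares.items()}  # floor of each share
    # Hand leftover blocks to nodes with the largest fractional share.
    leftover = num_blocks - sum(alloc.values())
    for n in sorted(shares, key=lambda n: shares[n] - alloc[n],
                    reverse=True)[:leftover]:
        alloc[n] += 1
    return alloc

# node-a has twice the free memory, so it caches about twice the blocks.
plan = allocate(10, {"node-a": 64, "node-b": 32, "node-c": 32})
```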
In one possible implementation manner, the target client is the client with the lowest process level in the node where it is located, where the process level of the client is determined by the server.
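The per-node selection rule can be illustrated as follows: among the clients registered on one node, only the one with the lowest server-assigned process level performs caching, so a node's memory is not cached twice. The rank numbers and grouping below are illustrative assumptions.

```python
# Sketch: pick, per node, the client with the lowest server-assigned
# process level (rank) to act as that node's caching client.
from collections import defaultdict

def caching_clients(client_ranks, client_nodes):
    """Return {node: caching client} using the lowest rank on each node."""
    per_node = defaultdict(list)
    for client, node in client_nodes.items():
        per_node[node].append(client)          # group clients by node
    return {node: min(cs, key=lambda c: client_ranks[c])
            for node, cs in per_node.items()}  # lowest rank wins

ranks = {"w0": 0, "w1": 1, "w2": 2, "w3": 3}   # assigned by the server
nodes = {"w0": "n0", "w1": "n0", "w2": "n1", "w3": "n1"}
chosen = caching_clients(ranks, nodes)
```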
Fig. 9 shows a block diagram of a training data access apparatus according to an embodiment of the present disclosure, the apparatus being applied to a training system, the system comprising: the system comprises a server and a plurality of nodes, wherein the server is used for distributing caching tasks for the plurality of nodes; the node is used for caching the training data block based on the cache task distributed by the server; the node comprises at least one client; the apparatus is deployed in a first client of the at least one client, the apparatus comprising:
An acquisition module 41, configured to acquire a first data access request for target training data;
and the processing module 42 is configured to determine, in response to the first data access request, a target client for caching the target training data, so as to obtain the target training data from a node where the target client is located.
In a possible implementation manner, the processing module 42 is specifically configured to determine, in response to the first data access request, the target client for caching a target training data block among a plurality of clients in the training system according to registration information of the plurality of clients and meta information, where the meta information includes information about each piece of training data.
In a possible implementation manner, the processing module 42 is specifically configured to obtain, when the target client is the first client, a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
In a possible implementation manner, the processing module 42 is specifically configured to send, when the target client is a second client, a second data access request for the target training data to the second client, so that the second client obtains, in response to the second data access request, the target training data block from a node where the second client is located, and send the target training data block to the first client; acquiring the target training data from the target training data block sent by the second client;
The first client is different from the second client, and the first client and the second client belong to different nodes.
In one possible implementation, the apparatus further includes:
the caching module is used for acquiring a caching task distributed by the server for the first client before the first client acquires the target training data from a node where the target client is located; and caching the training data block to be cached indicated by the caching task into the node where the first client is located, wherein the training data block to be cached comprises the target training data block.
In a possible implementation manner, the caching module is further configured to cache, before the target training data is obtained from the node where the target client is located and after the plurality of clients in the training system complete registration, the training data block to be cached indicated by the caching task into the node where the first client is located.
In a possible implementation manner, the caching module is further configured to determine, in a case that the first data access request is received before the target training data is acquired from the node where the target client is located, the target training data block in which the target training data indicated by the first data access request is located, and to acquire the target training data block from the server, so as to cache the target training data block into the node where the first client is located.
In one possible implementation, the apparatus further includes:
the registration module is configured to acquire meta information from the server and send a registration request to the server, so that the server sends registration information of a plurality of clients in the training system to the first client according to the registration request, where the meta information includes information about each piece of training data;
a determining module, configured to determine, according to the registration information of the plurality of clients and the meta information, the training data block indicated by the cache task allocated to each of the plurality of clients.
In one possible implementation, the apparatus further includes:
and the sending module is used for sending the available memory of the node where the first client is located to the server so that the server determines the cache task allocated to the first client according to the available memories of a plurality of nodes of the training system.
In one possible implementation manner, the first client is the client with the lowest process level in the node where it is located, where the process level of the client is determined by the server.
In some embodiments, functions or modules included in an apparatus provided by the embodiments of the present disclosure may be used to perform a method described in the foregoing method embodiments, and specific implementations thereof may refer to descriptions of the foregoing method embodiments, which are not repeated herein for brevity.
The disclosed embodiments also provide a computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the above-described method. The computer readable storage medium may be a non-volatile computer readable storage medium.
The embodiment of the disclosure also provides an electronic device, which comprises: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to invoke the instructions stored in the memory to perform the above method.
Embodiments of the present disclosure also provide a computer program product comprising computer readable code which, when run on a device, causes a processor in the device to execute instructions for implementing the training data access method provided in any of the embodiments above.
The disclosed embodiments also provide another computer program product for storing computer readable instructions that, when executed, cause a computer to perform the operations of the training data access method provided in any of the above embodiments.
Fig. 10 shows a block diagram of an electronic device 1900 according to an embodiment of the disclosure. For example, electronic device 1900 may be provided as a server. Referring to fig. 10, electronic device 1900 includes a processing component 1922 that further includes one or more processors and memory resources represented by memory 1932 for storing instructions, such as application programs, that can be executed by processing component 1922. The application programs stored in memory 1932 may include one or more modules each corresponding to a set of instructions. Further, processing component 1922 is configured to execute instructions to perform the methods described above.
The electronic device 1900 may also include a power component 1926 configured to perform power management of the electronic device 1900, a wired or wireless network interface 1950 configured to connect the electronic device 1900 to a network, and an input/output (I/O) interface 1958. The electronic device 1900 may operate based on an operating system stored in memory 1932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as memory 1932, including computer program instructions executable by processing component 1922 of electronic device 1900 to perform the methods described above.
The present disclosure may be a system, method, and/or computer program product. The computer program product may include a computer readable storage medium having computer readable program instructions embodied thereon for causing a processor to implement aspects of the present disclosure.
The computer readable storage medium may be a tangible device that can hold and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being a transitory signal per se, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse passing through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer readable program instructions described herein may be downloaded from a computer readable storage medium to a respective computing/processing device or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmissions, wireless transmissions, routers, firewalls, switches, gateway computers and/or edge servers. The network interface card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium in the respective computing/processing device.
Computer program instructions for carrying out the operations of the present disclosure may be assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as Smalltalk or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA), may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to implement aspects of the present disclosure.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable medium having the instructions stored therein includes an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The computer program product may be specifically implemented in hardware, software, or a combination thereof. In an alternative embodiment, the computer program product is embodied as a computer storage medium; in another alternative embodiment, the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK) or the like.
The foregoing description of the embodiments of the present disclosure has been presented for purposes of illustration and description, and is not intended to be exhaustive or to limit the disclosure to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application, or the technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (32)

1. A training system, the system comprising: a server and a plurality of nodes, wherein
the server is used for distributing caching tasks for the plurality of nodes, wherein the nodes are training nodes of a neural network;
The node is used for caching the training data block based on the cache task distributed by the server;
the node comprises at least one client;
a first client of the at least one client for obtaining a first data access request for target training data;
and the first client is used for responding to the first data access request, determining a target client used for caching the target training data, and acquiring the target training data from a node where the target client is located.
2. The training system of claim 1, wherein the first client is further configured to determine, in response to the first data access request, the target client of the plurality of clients for caching a target training data block based on registration information of the plurality of clients in the training system and meta information, the meta information including information of each training data.
3. Training system according to claim 1 or 2, characterized in that, in case the target client is the first client, the first client is configured to obtain a target training data block comprising the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
4. Training system according to claim 1 or 2, characterized in that in case the target client is a second client, the first client is adapted to send a second data access request for the target training data to the second client, the first client being different from the second client and the first client and the second client belonging to different nodes;
the second client is used for responding to the second data access request, acquiring the target training data block from the node where the second client is located, and sending the target training data block to the first client;
the first client is further configured to obtain the target training data from the target training data block sent by the second client.
5. Training system according to claim 1 or 2, characterized in that the target client is adapted to obtain a cache task allocated by the server for the target client;
the target client is configured to cache the training data block to be cached indicated by the caching task to a node where the target client is located, where the training data block to be cached includes the target training data block.
6. The training system of claim 5, wherein the target client is further configured to cache the training data block to be cached indicated by the caching task to a node where the target client is located, in a case where a plurality of clients in the training system complete registration.
7. The training system of claim 5, wherein the target client is further configured to, in the case where the target client receives a second data access request, determine a target training data block in which target training data indicated by the second data access request is located, and obtain the target training data block from the server, so as to cache the target training data block in a node in which the target client is located.
8. Training system according to claim 1 or 2, characterized in that the target client is adapted to obtain meta information from the server and to send a registration request to the server, the meta information comprising information of the respective training data;
the server is further configured to send registration information of a plurality of clients in the training system to the target client according to the registration request;
The target client is further configured to determine a training data block indicated by the allocated cache task of each of the plurality of clients according to the registration information of the plurality of clients and the meta information.
9. The training system according to claim 1 or 2, wherein the target client is further configured to send, to the server, an available memory of a node where the target client is located;
the server is further configured to determine, according to available memories of a plurality of nodes of the training system, a cache task allocated to the target client.
10. Training system according to claim 1 or 2, characterized in that the target client is the client with the lowest process level in the node where the target client is located, wherein the process level of the client is determined by the server.
11. A training data access method, the method being applied to a training system, the system comprising: a server and a plurality of nodes, wherein
the server is used for distributing caching tasks for the plurality of nodes, wherein the nodes are training nodes of a neural network;
the node is used for caching the training data block based on the cache task distributed by the server;
The node comprises at least one client;
a first client side in the at least one client side obtains a first data access request aiming at target training data;
and the first client responds to the first data access request, and determines a target client used for caching the target training data so as to acquire the target training data from a node where the target client is located.
12. The method of claim 11, wherein the first client determining a target client for caching the target training data in response to the first data access request comprises:
and the first client responds to the first data access request, and determines the target client for caching a target training data block in a plurality of clients according to registration information and meta information of the plurality of clients in the training system, wherein the meta information comprises information of each training data.
13. The method according to claim 11 or 12, wherein, in the case that the target client is the first client, the first client determining, in response to the first data access request, a target client for caching the target training data, to obtain the target training data from a node where the target client is located, includes:
The first client acquires a target training data block comprising the target training data from a node where the first client is located, and acquires the target training data from the target training data block.
14. The method according to claim 11 or 12, wherein, in the case that the target client is the second client, the determining, by the first client, the target client for caching the target training data in response to the first data access request, to obtain the target training data from a node where the target client is located, includes:
the first client sends a second data access request for the target training data to the second client, so that the second client responds to the second data access request to acquire the target training data block from a node where the second client is located, and sends the target training data block to the first client;
the first client acquires the target training data from the target training data block sent by the second client;
the first client is different from the second client, and the first client and the second client belong to different nodes.
15. The method of claim 13, wherein prior to the first client obtaining the target training data from the node at which the target client is located, the method further comprises:
the first client acquires a cache task distributed by the server for the first client;
and the first client caches the training data block to be cached indicated by the caching task into a node where the first client is located, wherein the training data block to be cached comprises the target training data block.
16. The method of claim 15, wherein prior to the first client obtaining the target training data from the node at which the target client is located, the method further comprises:
and under the condition that a plurality of clients in the training system finish registration, the first client caches the training data block to be cached indicated by the caching task into the node where the first client is located.
17. The method of claim 15, wherein prior to the first client obtaining the target training data from the node at which the target client is located, the method further comprises:
And under the condition that the first data access request is received, the first client determines a target training data block where target training data indicated by the first data access request is located, and acquires the target training data block from the server so as to cache the target training data block into a node where the first client is located.
18. The method according to claim 11 or 12, characterized in that the method further comprises:
the first client acquires meta information from the server and sends a registration request to the server, so that the server sends registration information of a plurality of clients in the training system to the first client according to the registration request, and the meta information comprises information of each training data;
and the first client determines a training data block indicated by the allocated cache task of each client in the plurality of clients according to the registration information of the plurality of clients and the meta information.
19. The method according to claim 11 or 12, characterized in that the method further comprises:
and the first client sends the available memory of the node where the first client is located to the server, so that the server determines a cache task allocated to the first client according to the available memories of a plurality of nodes of the training system.
20. The method according to claim 11 or 12, wherein the first client is the client with the lowest process level in the node where the first client is located, wherein the process level of the client is determined by the server.
21. A training data access device, the device being applied to a training system, the system comprising: a server and a plurality of nodes, wherein the server is used for distributing caching tasks for the plurality of nodes, and the nodes are training nodes of a neural network; the node is used for caching the training data block based on the cache task distributed by the server; the node comprises at least one client;
the apparatus is deployed in a first client of the at least one client, the apparatus comprising:
the acquisition module is used for acquiring a first data access request aiming at target training data;
and the processing module is used for responding to the first data access request, determining a target client for caching the target training data, and acquiring the target training data from a node where the target client is located.
22. The apparatus according to claim 21, wherein the processing module is configured to determine, in response to the first data access request, the target client for caching a target training data block from among the plurality of clients in the training system according to registration information of the plurality of clients and meta information, the meta information including information of each training data.
23. The apparatus according to claim 21 or 22, wherein the processing module is specifically configured to obtain, when the target client is the first client, a target training data block including the target training data from a node where the first client is located, and obtain the target training data from the target training data block.
24. The apparatus according to claim 21 or 22, wherein the processing module is specifically configured to, in a case where the target client is a second client, send a second data access request for the target training data to the second client, so that the second client obtains the target training data block from a node where the second client is located in response to the second data access request, and send the target training data block to the first client; acquiring the target training data from the target training data block sent by the second client;
the first client is different from the second client, and the first client and the second client belong to different nodes.
25. The apparatus of claim 23, wherein the apparatus further comprises:
The caching module is used for acquiring a caching task distributed by the server for the first client before the first client acquires the target training data from a node where the target client is located; and caching the training data block to be cached indicated by the caching task into the node where the first client is located, wherein the training data block to be cached comprises the target training data block.
26. The apparatus of claim 25, wherein the caching module is further configured to cache, before the target training data is obtained from a node where the target client is located, the training data block to be cached indicated by the caching task to the node where the first client is located, where a plurality of clients in the training system complete registration.
27. The apparatus of claim 25, wherein the caching module is further configured to determine, in the case of receiving the first data access request, a target training data block in which target training data indicated by the first data access request is located before the target training data is obtained from the node in which the target client is located, and obtain the target training data block from the server, so as to cache the target training data block in the node in which the first client is located.
28. The apparatus according to claim 21 or 22, wherein the apparatus further comprises:
a registration module configured to obtain meta information from the server and send a registration request to the server, so that the server sends registration information of the plurality of clients in the training system to the first client according to the registration request, wherein the meta information comprises information of each piece of training data;
wherein the determining module is further configured to determine, according to the registration information of the plurality of clients and the meta information, the training data block indicated by the caching task allocated to each client of the plurality of clients.
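A consequence of claim 28 is that each client can derive every client's caching task locally: given the same registration list and the same meta information, a deterministic partition yields identical assignments on every node. The round-robin split below is an assumption chosen for illustration; the patent does not fix a particular partitioning rule.

```python
# Deterministic partition of training data blocks across registered clients.
# Because the input (registration info + block list from meta information)
# is the same everywhere, every client computes the same assignment.

def assign_cache_tasks(registered_clients, block_ids):
    """Round-robin blocks over clients in registration order."""
    tasks = {client: [] for client in registered_clients}
    for i, block_id in enumerate(sorted(block_ids)):
        tasks[registered_clients[i % len(registered_clients)]].append(block_id)
    return tasks

print(assign_cache_tasks(["c0", "c1"], ["blk-2", "blk-0", "blk-1"]))
# {'c0': ['blk-0', 'blk-2'], 'c1': ['blk-1']}
```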
29. The apparatus according to claim 21 or 22, wherein the apparatus further comprises:
a sending module configured to send the available memory of the node where the first client is located to the server, so that the server determines the caching task allocated to the first client according to the available memory of the plurality of nodes of the training system.
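One natural reading of claim 29 is that the server sizes each node's caching task in proportion to the available memory it reported. The proportional split below is an assumption for illustration; the claim only says the allocation depends on the reported available memory of the nodes.

```python
# Size each node's caching task proportionally to its reported free memory.

def allocate_by_memory(available_mem, num_blocks):
    """available_mem: node_id -> bytes free; returns node_id -> block count."""
    total = sum(available_mem.values())
    alloc = {node: (mem * num_blocks) // total for node, mem in available_mem.items()}
    # Integer division can leave a remainder; hand out the leftover blocks
    # one at a time to the nodes with the most free memory.
    leftover = num_blocks - sum(alloc.values())
    for node in sorted(available_mem, key=available_mem.get, reverse=True)[:leftover]:
        alloc[node] += 1
    return alloc

print(allocate_by_memory({"n0": 64, "n1": 32, "n2": 32}, 8))
# {'n0': 4, 'n1': 2, 'n2': 2}
```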
30. The apparatus according to claim 21 or 22, wherein the first client is the client with the lowest process level in the node where the first client is located, the process level of each client being determined by the server.
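Claim 30 can be read as: within each node, a single client (the one the server ranked with the lowest process level) acts as the first client. A hypothetical selection, assuming clients are reported as `(client_id, process_level)` pairs:

```python
# Pick the node's first client: the one with the lowest process level.

def pick_first_client(node_clients):
    """node_clients: list of (client_id, process_level) tuples for one node."""
    client_id, _level = min(node_clients, key=lambda c: c[1])
    return client_id

print(pick_first_client([("c2", 3), ("c0", 0), ("c1", 1)]))  # c0
```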
31. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to invoke the instructions stored in the memory to perform the method of any of claims 11 to 20.
32. A computer readable storage medium having stored thereon computer program instructions, which when executed by a processor, implement the method of any of claims 11 to 20.
CN201911167520.0A 2019-11-25 2019-11-25 Training system, training data access method and device, electronic equipment and medium Active CN112839071B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911167520.0A CN112839071B (en) 2019-11-25 2019-11-25 Training system, training data access method and device, electronic equipment and medium


Publications (2)

Publication Number Publication Date
CN112839071A CN112839071A (en) 2021-05-25
CN112839071B true CN112839071B (en) 2024-01-05

Family

ID=75922991

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911167520.0A Active CN112839071B (en) 2019-11-25 2019-11-25 Training system, training data access method and device, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN112839071B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115563499A (en) * 2021-12-02 2023-01-03 华为技术有限公司 Method, device and system for training model and computing node

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6718372B1 (en) * 2000-01-07 2004-04-06 Emc Corporation Methods and apparatus for providing access by a first computing system to data stored in a shared storage device managed by a second computing system
CN104618482A (en) * 2015-02-02 2015-05-13 浙江宇视科技有限公司 Cloud data access method, server, traditional storage device and architecture
US9058122B1 (en) * 2012-08-30 2015-06-16 Google Inc. Controlling access in a single-sided distributed storage system
JP2016110175A (en) * 2014-12-02 2016-06-20 三菱電機株式会社 Client device, communication system, and data processing method and program
CN106982245A (en) * 2016-01-15 2017-07-25 Ls 产电株式会社 Supervise the client and server in Control & data acquisition system
CN110262901A (en) * 2019-06-27 2019-09-20 深圳前海微众银行股份有限公司 A kind of data processing method and data processing system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7171469B2 (en) * 2002-09-16 2007-01-30 Network Appliance, Inc. Apparatus and method for storing data in a proxy cache in a network
US9047195B2 (en) * 2012-07-05 2015-06-02 Hitachi, Ltd. Computer system with virtualization mechanism and management table, cache control method and computer program
US10484473B2 (en) * 2016-06-28 2019-11-19 Solano Labs, Inc. Systems and methods for efficient distribution of stored data objects

Also Published As

Publication number Publication date
CN112839071A (en) 2021-05-25

Similar Documents

Publication Publication Date Title
US11146502B2 (en) Method and apparatus for allocating resource
US10257115B2 (en) Cloud-based service resource provisioning based on network characteristics
JP6588977B2 (en) Composite partition function
CN107786621B (en) User information management method, access processing method, device and system
CN109729106B (en) Method, system and computer program product for processing computing tasks
CN109995881B (en) Load balancing method and device of cache server
US20170180470A1 (en) Method and electronic device for sending CDN address
CN111163130B (en) Network service system and data transmission method thereof
CN112637287B (en) Load balancing method and equipment
CN112261094A (en) Message processing method and proxy server
CN105791381A (en) Access control method and apparatus
US10237233B2 (en) Allocating identifiers with minimal fragmentation
CN107579929B (en) Method, system and related device for setting reliable connection communication queue pair
US10986065B1 (en) Cell-based distributed service architecture with dynamic cell assignment
CN112839071B (en) Training system, training data access method and device, electronic equipment and medium
US9459807B2 (en) Methods and systems for providing resources for cloud storage
CN114625536A (en) Video memory allocation method, device, medium and electronic equipment
CN110022341B (en) Data transmission method and related equipment
CN116303126B (en) Caching method, data processing method and electronic equipment
CN112306685A (en) Task isolation method and device, electronic equipment and computer readable medium
CN108696557B (en) Information processing system, method and device
US10176144B2 (en) Piggybacking target buffer address for next RDMA operation in current acknowledgement message
CN112491066B (en) Load balancing method, device, system, target load balancing equipment and medium
CN114513465A (en) Load balancing method, load balancing device, electronic device and storage medium
US10819775B2 (en) Systems and methods for server failover and load balancing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant