WO2018077292A1 - Data processing method and system, electronic device - Google Patents

Data processing method and system, electronic device Download PDF

Info

Publication number
WO2018077292A1
WO2018077292A1 PCT/CN2017/108449 CN2017108449W WO2018077292A1 WO 2018077292 A1 WO2018077292 A1 WO 2018077292A1 CN 2017108449 W CN2017108449 W CN 2017108449W WO 2018077292 A1 WO2018077292 A1 WO 2018077292A1
Authority
WO
WIPO (PCT)
Prior art keywords
cache
data
hard disk
memory
processor
Prior art date
Application number
PCT/CN2017/108449
Other languages
French (fr)
Chinese (zh)
Inventor
谢瑞桃
孙鹏
颜深根
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司 filed Critical 北京市商汤科技开发有限公司
Publication of WO2018077292A1 publication Critical patent/WO2018077292A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4812Task transfer initiation or dispatching by interrupt, e.g. masked
    • G06F9/4831Task transfer initiation or dispatching by interrupt, e.g. masked with variable priority
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0844Multiple simultaneous or quasi-simultaneous cache accessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt

Definitions

  • the present application relates to deep learning techniques, and more particularly to data processing methods and systems, and electronic devices.
  • Deep learning algorithms for analyzing large-scale data are being more and more widely used in big data analysis, such as image recognition, speech recognition, and natural language processing.
  • Such deep learning algorithms for analyzing large-scale data typically require training data of terabytes or more and massive neural network model parameters.
  • Distributed deep learning systems have been proposed because the storage and computing power of a single computer cannot accommodate this task.
  • a typical distributed deep learning system consists of hundreds of computing nodes and dozens of storage nodes.
  • the training algorithm of the deep learning model is distributed in parallel on the compute nodes.
  • a training algorithm for a deep learning model (such as a training algorithm for a convolutional neural network model) usually requires hundreds of thousands of iterations.
  • the running time of the training algorithm includes the time at which the computing node obtains the training data, and the parameters of the neural network model are acquired and updated. Time, and/or time to calculate parameter gradients, etc.
  • low-speed data transmission increases the time for the node to acquire information such as training data, thereby increasing the running time of the training algorithm. Therefore, how to shorten the running time of the training algorithm is an urgent problem to be solved in the distributed deep learning system.
  • the embodiment of the present application provides a data processing solution.
  • a data processing method including:
  • the data cache system includes a plurality of cache elements, the transfer rate and/or storage space of the plurality of cache elements being different, and the plurality of The cache element is preset with different lookup priorities according to the transmission rate and/or the storage space;
  • the method of the foregoing embodiments of the present application further includes: in response to not finding a cache element that caches the corresponding cache data, searching for the corresponding cache to a distributed file system that is in communication connection with the data cache system. Data and returning the corresponding cached data to the corresponding processor process.
  • the multiple cache components include at least two of the following: an image processor GPU memory, a memory, and a hard disk;
  • the lookup priority of the GPU memory is higher than the lookup priority of the memory, and the memory has a higher lookup priority than the hard disk.
  • the hard disk comprises a solid state hard disk and a mechanical hard disk, and the solid state hard disk has a higher priority of searching than the mechanical hard disk.
  • the at least one processor process comprises: at least one GPU process, and/or at least one central processing unit CPU process.
  • the plurality of cache components look up the corresponding cache data, including:
  • the corresponding cache data is sequentially searched to the GPU memory, the memory, and the hard disk according to the lookup priority from high to low.
  • the plurality of cache components look up the corresponding cache data, including:
  • the corresponding cache data is sequentially searched to the memory and the hard disk according to the lookup priority from high to low.
  • the method of the foregoing embodiments of the present application further includes: if the cache component that caches the corresponding cache data is found to be a hard disk, cache the corresponding cache data into the GPU memory and/or the memory.
  • the method of the foregoing embodiments of the present application further includes: if the cache element with the corresponding cached data is not found, the corresponding cached data is newly cached to the memory.
  • the method of the foregoing embodiments of the present application further includes: in response to newly adding data of the corresponding cache data buffered to the memory, forming a file block of a predetermined size, creating a new thread to write the file block The cache address space of the hard disk.
  • the hard disk comprises a solid state hard disk; the data in response to the newly added cached data buffered to the memory constitutes a file block of a predetermined size, and a new thread is created to write the file block into the
  • the hard disk includes: in response to the newly added data of the corresponding cache data buffered to the memory forming a file block of a predetermined size, creating a new first thread to write the file block into a cache space of the solid state hard disk.
  • the hard disk comprises a solid state hard disk and a mechanical hard disk;
  • the data in response to the newly added cached data buffered to the memory constitutes a file block of a predetermined size, and a new thread is created to write the file block Entering a cache space of the hard disk, comprising: in response to newly adding data of the corresponding cache data buffered to the memory, forming a file block of a predetermined size, creating a new first thread to write the file block to the solid state The cache space of the hard disk, and the second thread is created to write the file block written to the solid state hard disk to the cache space of the mechanical hard disk.
  • the method of the foregoing embodiments of the present application further includes: deleting, or replacing, a cache space of a cache component whose cache space is full according to a predetermined cache space release policy when a cache space of any cache element is full. Cache data.
  • a data processing system including:
  • a receiving module configured to receive a cache data read request initiated by the at least one processor process to the data cache system;
  • the data cache system includes a plurality of cache elements, and the plurality of cache elements have different transmission rates and/or storage spaces And the plurality of cache elements are preset with different lookup priorities according to a transmission rate and/or a storage space;
  • a first searching module configured to start, according to the cache element read request, a cache element of a processor corresponding to a process that initiates the cache data read request, according to the search priority from high to low Sequentially searching for the corresponding cache data to the plurality of cache components in sequence;
  • a first returning module configured to respond to the cached component that caches the corresponding cached data, from the cache that is found The component returns the corresponding cached data to the corresponding processor process.
  • system of the foregoing embodiments of the present application further includes: a second lookup and return module, configured to communicate with the data cache system in response to not finding a cache element that caches the corresponding cache data
  • the distributed file system looks up the corresponding cached data and returns the corresponding cached data to the corresponding processor process.
  • the multiple cache elements include at least two of the following: an image processor GPU memory, a memory, and a hard disk; the GPU memory has a higher lookup priority than the memory lookup priority, and the memory search priority The level is higher than the lookup priority of the hard disk.
  • the hard disk comprises a solid state hard disk and a mechanical hard disk, and the solid state hard disk has a higher priority of searching than the mechanical hard disk.
  • the at least one processor process comprises: at least one GPU process, and/or at least one central processing unit CPU process.
  • the first search module includes: a first lookup submodule, configured to respond to the cache data read request initiated by the GPU process to the GPU according to the lookup priority from high to low Memory, memory, and hard disk look up the corresponding cached data.
  • a first lookup submodule configured to respond to the cache data read request initiated by the GPU process to the GPU according to the lookup priority from high to low Memory, memory, and hard disk look up the corresponding cached data.
  • the first search module includes: a second search submodule, configured to, in response to receiving the cache data read request initiated by the CPU process, sequentially go to the memory according to the lookup priority from high to low And the hard disk finds the corresponding cached data.
  • a second search submodule configured to, in response to receiving the cache data read request initiated by the CPU process, sequentially go to the memory according to the lookup priority from high to low And the hard disk finds the corresponding cached data.
  • system of the foregoing embodiments of the present application further includes: a first cache module, configured to cache the corresponding cache data to the GPU memory and if the cache component that caches the corresponding cache data is found to be a hard disk / or in memory.
  • a first cache module configured to cache the corresponding cache data to the GPU memory and if the cache component that caches the corresponding cache data is found to be a hard disk / or in memory.
  • system of the foregoing embodiments of the present application further includes: a second cache module, if the cache element with the corresponding cache data is not found, the corresponding cache data is newly cached into the memory.
  • system of the foregoing embodiments of the present application further includes: a writing module, configured to generate a new thread by responding to newly added data of the corresponding cache data buffered to the memory to form a file block of a predetermined size The file block is written to a cache address space of the hard disk.
  • a writing module configured to generate a new thread by responding to newly added data of the corresponding cache data buffered to the memory to form a file block of a predetermined size The file block is written to a cache address space of the hard disk.
  • the hard disk includes a solid state hard disk;
  • the writing module includes a first writing submodule configured to form a file block of a predetermined size in response to the newly added data of the corresponding cache data buffered to the memory, Creating a new first thread writes the file block to the cache space of the SSD.
  • the hard disk includes a solid state hard disk and a mechanical hard disk;
  • the writing module includes: a second writing submodule configured to form a predetermined size in response to newly added data of the corresponding cache data buffered to the memory a file block, creating a new first thread to write the file block to a cache space of the solid state drive, and creating a second thread to write the file block written to the solid state drive to the mechanical hard disk Cache space.
  • system of the foregoing embodiments of the present application further includes: a release module, configured to delete or replace the cache space that is full according to a predetermined cache space release policy when a cache space of any cache element is filled.
  • the cached space of the cache component is the original cached data.
  • an electronic device including the data processing system of any of the embodiments of the present application.
  • an electronic device including:
  • the unit in the data processing system according to any of the embodiments of the present application is shipped Row.
  • an electronic device includes: one or more processors, a memory, a plurality of cache elements, a communication component, and a communication bus, the processor, the memory, and the The cache unit and the communication component complete communication with each other through the communication bus, the transmission rate and/or storage space of the plurality of cache elements are different, and the plurality of cache elements are based on a transmission rate and/or a storage space.
  • the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform an operation corresponding to the data processing method in any of the embodiments of the present application.
  • a computer program comprising computer readable code, when a computer readable code is run on a device, a processor in the device performs the above-described An instruction of each step in the data processing method described in an embodiment.
  • a computer readable storage medium for storing computer readable instructions, when the instructions are executed, implementing data processing according to any of the above embodiments of the present application. The operation of each step in the method.
  • the data processing method and system, the electronic device, the program, and the medium implemented in the present application have different search priorities preset for the plurality of cache components in the data cache system according to different transmission rates and/or storage spaces.
  • a plurality of cache elements look up the corresponding cache data, and return the corresponding cache data from the found cache component to the corresponding processor process.
  • the embodiments of the present application utilize the performance of various cache resources in data transmission efficiency and storage space, and help optimize cache performance (eg, maximize cache hit ratio to minimize the flow caused by cache misses to the distributed file system. Request load), and optimize the data transmission performance of the cache system (such as: maximize the throughput of data reading), help to reduce the time for the computing node to obtain training data, thereby speeding up the operation of the deep learning algorithm and shortening the
  • FIG. 1 is a flowchart of an embodiment of a data processing method according to the present application.
  • FIG. 2 is a schematic diagram of a working mode of a data caching system in an embodiment of the present application.
  • FIG. 3 is a flowchart of an embodiment of a response to a GPU process request according to an embodiment of the present application.
  • FIG. 4 is a flow chart of an embodiment of a response to a CPU process request according to an embodiment of the present application.
  • FIG. 5 is a flowchart of an embodiment of a response file cache request according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an embodiment of a data processing system according to the present application.
  • FIG. 7 is a schematic structural diagram of another embodiment of a data processing system according to the present application.
  • FIG. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
  • FIG. 9 is a schematic structural diagram of another embodiment of an electronic device according to the present application.
  • Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, servers, and the like include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients Machines, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above, and the like.
  • FIG. 1 is a flowchart of an embodiment of a data processing method according to the present application.
  • FIG. 2 is a schematic diagram of a working mode of a data caching system in an embodiment of the present application. The various steps of the method of this embodiment will now be described in detail in conjunction with FIG.
  • step S110 a cache data read request initiated by at least one processor process to the data cache system is received.
  • the data cache system includes a plurality of cache elements, the transfer rate and/or storage space of the plurality of cache elements are different, and the plurality of cache elements are preset with different lookup priorities according to the transmission rate and/or the storage space. .
  • the data cache system includes a plurality of cache elements on the computing node.
  • the foregoing plurality of cache elements may include but are not limited to at least two of the following: GPU memory, memory, and hard disk.
  • the hard disk may include a solid state drive (SSD) and a mechanical hard disk (HDD). That is, the above various cache elements may include at least two of GPU memory, a memory (RAM), a solid state hard disk, and a mechanical hard disk.
  • SSD solid state drive
  • HDD mechanical hard disk
  • These cache elements have different data transfer efficiencies. For example, on a commercial computer, the memory usually supports a transfer rate of about 1600MB/s, and the SSD usually supports a transfer rate of about 100MB/s to 600MB/s. Mechanical hard disks are usually available.
  • the storage resources of these cache elements have different storage spaces.
  • the storage space of the GPU memory is usually 12 GB
  • the storage space of the memory is usually 64 GB to 256 GB
  • the storage space of the solid state hard disk is usually 128 GB to 512 GB.
  • Mechanical hard disk storage space is usually 1TB ⁇ 4TB.
  • different cache priorities are set in advance according to the transmission rate and/or storage space of these cache elements.
  • the GPU memory has a higher priority than the memory search priority, and the memory search priority is higher than the hard disk search priority, wherein the memory search Priority
  • the level is higher than the search priority of the SSD, and the SSD priority is higher than the search priority of the mechanical hard disk.
  • the at least one processor process may include, but is not limited to, at least one GPU process, and/or at least one CPU process.
  • step S110 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a receiving module 10 executed by the processor.
  • step S120 in response to receiving the cache data read request, starting from the cache element of the processor corresponding to the process of initiating the cache data read request, according to the above-mentioned search priority from high to low, sequentially to the above The cache element looks up the corresponding cached data.
  • the cache data is searched for the GPU memory, the memory, and the hard disk in descending order of the lookup priority.
  • the cache data is searched for in the memory and the hard disk in order from the highest priority to the low priority. For example, in response to receiving a file read request initiated by the GPU process, sequentially searching for a file to be read from the GPU memory, the memory, the solid state hard disk, and the mechanical hard disk; in response to receiving the file read request initiated by the CPU process, in turn Find the file you want to read from memory, solid state drive, and hard drive.
  • step S120 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first lookup module 20 that is executed by the processor.
  • step S130 in response to finding the cache element in which the corresponding cache data is cached, the corresponding cache data is returned from the found cache element to the corresponding processor process.
  • the cached data is returned from the memory to the GPU process.
  • step S130 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first return module 30 that is executed by the processor.
  • the hierarchical management of the data caching system gives different storage resources different search priorities, from high to low, GPU memory, memory, solid state hard disk, and mechanical hard disk. This is mainly because the data transmission rates supported by the four devices are successively decremented. The higher the data transmission rate of the device, the faster the client obtains the data, which is more conducive to improving the speed of the deep learning system.
  • the data cache system receives the cache data read request sent by the GPU process, it searches the four storage resources layer by layer according to the search priority. Once found, the corresponding cache data is returned to the process of initiating the request, as shown in FIG. Show.
  • This storage resource hierarchy management method takes full advantage of the performance of various cache resources in terms of data transmission efficiency and storage space, and helps optimize cache performance (eg, maximizes cache hit ratio to minimize flow distribution caused by cache misses).
  • cache performance eg, maximizes cache hit ratio to minimize flow distribution caused by cache misses.
  • the file system's request load and optimize the data transfer performance of the cache system (such as: maximize the throughput of data read), thereby reducing the time for the compute node to obtain data, thereby speeding up the deep learning algorithm.
  • the method may further include: searching for the cache corresponding to the distributed file system communicatively connected to the data cache system in response to not finding the cache element that caches the corresponding cache data. Data and return the corresponding cached data to the corresponding processor process.
  • the method further includes: if the cache component that caches the corresponding cache data is a hard disk, cache the corresponding cache data into the GPU memory and/or the memory.
  • the foregoing process and data caching system can work in a client/server (C/S) working mode, and the data caching system acts as a server and can simultaneously respond to multiple processes.
  • Cache data read requests (such as GPU processes or CPU processes) so that multiple processes can share cached content, thereby improving Cache efficiency.
  • the client/server working mode enables the data caching system to simultaneously respond to cached data read requests of multiple processes, so that multiple processes can share cached data, thereby improving cache efficiency.
  • the data caching system is logically located between the client and the distributed file system, as shown in Figure 2.
  • Cache data reads provide the client with a file-reading interface similar to the file system.
  • the client reads the cached data through the interface, as shown in Figure 2. If the cached data has been cached in the data cache system, ie, a cache hit, the data cache system directly returns the cached data, as shown in 2.response in Figure 2. Otherwise, if the cache misses, that is, the cached data has not been cached in the data cache system, the data cache system reads the cached data from the distributed file system (see 3.get in Figure 2) and caches the cached data ( See 4.cache in Figure 2, and return the cached data to the client (see 5.response in Figure 2).
  • This design structure has the characteristics of being transparent to the client, that is, the client operates in the same way as the file cache interface provided by the data cache system, and does not use the data cache system.
  • the method further includes: in response to not finding a cache element that caches the corresponding cached data, searching for the corresponding cached data to a distributed file system that is in communication with the data cache system, and The corresponding cached data described above is returned to the corresponding processor process.
  • FIG. 3 is a flowchart of an embodiment of a response to a GPU process request according to an embodiment of the present application.
  • a cache data read request is taken as a file read request as an example.
  • this embodiment includes:
  • step S310 in response to receiving the file read request initiated by the GPU process, determining whether the file to be read is in the GPU memory, and if so, the process proceeds to step S312 to return the file to be read to the GPU process; If not, the process proceeds to step S314.
  • step S314 it is determined whether the file to be read is in the memory, and if so, the process proceeds to step S312; if not, the process proceeds to step S316.
  • step S316 it is determined whether the file to be read is in the solid state hard disk, and if so, the process proceeds to step S320; if not, the process proceeds to step S318.
  • step S318 it is determined whether the file to be read is in the mechanical hard disk, and if so, the process proceeds to step S320; if not, the process proceeds to step S322.
  • steps S310-S318 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first lookup module 20 executed by the processor or a first look-up sub-module 22 therein.
  • step S320 the file to be read is read from the solid state hard disk or the mechanical hard disk, cached into the memory and the GUP memory, and then the file to be read is returned to the GPU process.
  • step S320 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first cache module 40 that is executed by the processor.
  • Caching files in GPU memory and memory takes advantage of the respective advantages of storage resources to improve cache efficiency.
  • GPU memory access speed is fast, but the available cache space is small; while memory can provide a large cache space, but the access speed is slightly slower.
  • the file is only cached in the GPU memory, because the cache space is small, it will be filled up quickly. Once the GPU memory is full, when the new file needs to be cached, the old file will be deleted to make room for the new file. If the deleted file is revisited in the future, it can only be found in the SSD and the mechanical hard disk, which takes a long time.
  • the access speed is not as fast as GPU memory. In this embodiment, a cache twice strategy is adopted, and more files can be cached in consideration of file access speed.
  • step S322 the file is read from the distributed file system. Thereafter, the process proceeds to step S324.
  • the file is cached in memory, a solid state drive, and/or a mechanical hard disk, and then the file to be read is returned to the GPU process.
  • steps S322-S324 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second lookup and return module 50 executed by the processor.
  • the technical solution provided in this embodiment not only utilizes the memory and the hard disk, but also utilizes the GPU memory cache data, and can implement cache read acceleration for a large number of GPU algorithms in deep learning.
  • This embodiment uses a dynamic caching mechanism to support the massive data size required by large-scale deep learning systems.
  • the present application utilizes multiple storage resources to form a heterogeneous resource data caching system. Since the GPU memory and memory resources are small, the cache space formed by the two devices cannot meet the data size required by the large-scale deep learning system. Therefore, we use a larger storage resource, solid state drive and mechanical hard drive to build a cache system. This caching system can effectively improve data transmission efficiency. Because the original file needs to be obtained from the remote hard disk (distributed file system), caching it to the local can save the network transmission time; cache it to the memory to get a faster transfer rate than the hard disk; for the GPU process needs The file is cached into the GPU memory, and the transfer time from memory to GPU memory can be eliminated.
  • the design of the present application can utilize the advantages of fast GPU memory and memory transfer rate on the one hand, and the advantage of large storage space of solid state hard disk and mechanical hard disk on the other hand, which helps to improve the efficiency of the cache system (access speed). And meet the needs of large data scale.
  • FIG. 4 is a flow chart of an embodiment of a response to a CPU process request according to an embodiment of the present application.
  • a cache data read request is taken as a file read request as an example.
  • this embodiment includes:
  • step S410 in response to receiving the file read request initiated by the CPU process, determining whether the file to be read is in the RAM, and if so, the process proceeds to step S412 to return the file to be read to the CPU process; Otherwise, the process proceeds to step S414.
  • step S414 it is determined whether the file to be read is in the solid state hard disk, and if so, the process proceeds to step S418; if not, the process proceeds to step S416.
  • step S416 it is determined whether the file to be read is in the mechanical hard disk, and if so, the process proceeds to step S418; if not, the process proceeds to step S420.
  • steps S410-S416 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first lookup module 20 or a second look-up sub-module 24 executed by the processor.
  • the file to be read is read from the hard disk, cached into the RAM and the GUP memory, and then the file to be read is returned to the CPU process.
  • step S418 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the first cache module 40 being executed by the processor.
  • step S420 the file is read from the distributed file system. Thereafter, the process proceeds to step S422.
  • step S422 the file is cached in a memory, a solid state hard disk, and/or a mechanical hard disk, and then the file is returned to the CPU process.
  • steps S420-S422 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second lookup and return module 50 executed by the processor.
  • This embodiment also uses a dynamic caching mechanism to support the massive data size required by large-scale deep learning systems.
  • the method further includes: if the cache element having the corresponding cached data is not found, the corresponding cached data is newly cached into the memory.
  • the method may further include: in response to newly adding data of the corresponding cache data buffered to the memory, forming a file block of a predetermined size, creating a new thread to write the file block to a cache address of the hard disk. space.
  • the hard disk comprises a solid state hard disk.
  • creating a new thread to write the file block to the hard disk may include: responding to the corresponding cached data newly added to the memory.
  • the data constitutes a file block of a predetermined size, and a new first thread is created to write the above file block into the cache space of the solid state hard disk.
  • the hard disk includes a solid state hard disk and a mechanical hard disk.
  • creating a new thread to write the file block to the cache space of the hard disk may include: responding to the corresponding new cache to the memory
  • the data of the cached data constitutes a file block of a predetermined size
  • a new first thread is created to write the file block to the cache space of the SSD
  • a second thread is created to write the file block written to the SSD to the cache of the mechanical hard disk. space.
  • FIG. 5 is a flowchart of an embodiment of a response file cache request according to an embodiment of the present application, that is, an exemplary flowchart of step S324 shown in FIG. 3 and step S420 shown in FIG. 4. As shown in FIG. 5, this embodiment includes:
  • step S510 when the cache miss occurs, the file to be read is not found in the data cache system, and the file to be read is cached from the distributed file system to the buffer of the memory.
  • step S510 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second lookup and return module 50 executed by the processor.
  • step S520 when the data in the buffer is accumulated to constitute a block file, a new sub-thread is created to write the block file into the cache address space of the SSD.
  • step S520 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a write module 70 executed by the processor or a first write sub-module 72 therein.
  • the method further includes:
  • Step S530 creating a separate sub-thread to write the cache data of the SSD to the cache address space of the mechanical hard disk.
  • step S520 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a write module 70 or a second write sub-module 74 executed by the processor.
  • the embodiment of the present application is directed to a small file (a few KB to several MB) for training data in most deep learning systems.
  • the characteristics of the composition, the multi-threaded asynchronous cache writing method shown in Figure 5 is proposed.
  • the multi-threading refers to the writing of different types of cache spaces by different threads, and the asynchronous means that the writing operations of multiple threads are not synchronized with each other.
  • the embodiment of the present application utilizes a portion of the memory cache address space as a memory buffer.
  • the files to be cached are first stored in the memory buffer. Once the data in the buffer has accumulated to form a block file (the size of the block file can be adjusted according to the hardware system), the block file is written to the SSD cache address space by a newly created sub-thread. That is, a main thread is responsible for responding to file cache requests and creating a SSD write thread when needed, as shown in Figure 5.
  • the embodiment of the present application proposes that a separate thread is specifically responsible for writing the cache data of the solid state hard disk into the mechanical hard disk cache address space, as shown in FIG. 5 . Use this In this way, the write speed of the cache system is mainly determined by the write speed of the memory and the write speed of the solid state hard disk, without being affected by the write speed of the mechanical hard disk.
  • the data cache system reads the corresponding cache data from the distributed file system and inserts the file into the cache system (memory space of the memory, SSD and mechanical hard disk).
  • the cache space of any of the cache elements when the cache space of any of the cache elements is full, the cache space of the cache component whose cache space is full may be deleted or replaced according to a predetermined cache space release policy.
  • Cache data That is: if the cache space of the cache component is full, you can set a space for the new file according to the cache replacement policy, such as the least used frequency (Least Frequently Used, etc.).
  • the data cache system supports any cache space release policy. It can be selected according to the application requirements.
  • the files are cached in multiple cache spaces (memory, solid state hard disk and mechanical hard disk), and the advantages of various storage devices can be utilized to improve the cache efficiency, which is the same as the above embodiment.
  • the reason for caching in two cache spaces is the same, so I won't go into details here.
  • any of the data processing methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like.
  • any data processing method provided by the embodiment of the present application may be executed by a processor, such as the processor, by executing a corresponding instruction stored in the memory to execute any one of the data processing methods mentioned in the embodiments of the present application. This will not be repeated below.
  • the foregoing program may be stored in a computer readable storage medium, and the program is executed when executed.
  • the foregoing steps include the steps of the foregoing method embodiments; and the foregoing storage medium includes: a medium that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
  • FIG. 6 is a schematic structural diagram of an embodiment of a data processing system according to the present application.
  • the data processing system of the embodiments of the present application can be selectively used to implement the foregoing embodiments of the data mathematical methods of the present application.
  • the data processing system of this embodiment includes a receiving module 10, a first lookup module 20, and a first return module 30. among them:
  • the receiving module 10 is configured to receive a cache data read request initiated by the at least one processor process (such as at least one GPU process and/or at least one CPU process) to the data cache system, where the data cache system includes multiple cache components, and more The transmission rate and/or storage space of the cache element are different, and the plurality of cache elements are preset with different lookup priorities according to the transmission rate and/or the storage space.
  • the foregoing multiple cache components may include, but are not limited to, at least two of the following: GPU memory, memory, and hard disk, wherein the GPU memory has a higher lookup priority than a memory lookup priority, and the memory lookup priority is higher than the hard disk search. priority.
  • the hard disk may further include a solid state hard disk and a mechanical hard disk, wherein the solid state hard disk has a higher priority than the mechanical hard disk.
  • the first searching module 20 is configured to, in response to receiving the cache data read request, start from the cache element of the processor corresponding to the process that initiates the cache data read request, and sequentially perform the search priority from high to low.
  • the above various cache elements look up the corresponding cache data.
  • the first returning module 30 is configured to, in response to the first searching module 20 searching for the cache element buffered with the corresponding cache data, returning the corresponding cache data from the found cache component to the corresponding processor process.
  • the first searching module 20 may include: a first searching sub-module 22, configured to respond to receiving a cache data read request initiated by the GPU process, according to The above search priority sequentially searches the GPU memory, the memory and the hard disk for the corresponding cache data; and/or the second search submodule 24 is configured to respond to the cache data read request initiated by the CPU process. According to the above search priority, the cached data is searched for in memory and hard disk in descending order.
  • FIG. 7 is a schematic structural diagram of another embodiment of a data processing system according to the present application.
  • the data processing system of the embodiments of the present invention may further include: a first cache module 40, configured to: if the cache component that caches the corresponding cache data is a hard disk, the corresponding cache Data is cached into GPU memory and/or memory.
  • a first cache module 40 configured to: if the cache component that caches the corresponding cache data is a hard disk, the corresponding cache Data is cached into GPU memory and/or memory.
  • the data processing system of the embodiments of the present invention may further include: a second lookup and return module 50, configured to respond to the cache element that does not find the cached data, and The distributed file system of the data cache system communication connection searches for the corresponding cache data and returns the corresponding cache data to the corresponding processor process.
  • a second lookup and return module 50 configured to respond to the cache element that does not find the cached data
  • the distributed file system of the data cache system communication connection searches for the corresponding cache data and returns the corresponding cache data to the corresponding processor process.
  • the data processing system of the embodiments of the present invention may further include: a second cache module 60, configured to: if the cache element having the corresponding cached data is not found, the corresponding cached data is Added cache to memory.
  • the data processing system of the embodiments of the present invention may further include: a writing module 70, configured to form a file block of a predetermined size in response to the data of the corresponding cached data newly added to the memory. Create a new thread to write the above file block to the cache address space of the hard disk.
  • a writing module 70 configured to form a file block of a predetermined size in response to the data of the corresponding cached data newly added to the memory. Create a new thread to write the above file block to the cache address space of the hard disk.
  • the writing module 70 may include: a first writing sub-module 72, configured to form a file of a predetermined size in response to newly added data of the corresponding cached data buffered to the memory when the hard disk is a solid state disk. Block, create a new first thread to write the above file block to the cache space of the SSD;
  • the second writing sub-module 74 is configured to, when the hard disk comprises a solid state hard disk and a mechanical hard disk, form a file block of a predetermined size in response to the data of the corresponding cache data newly added to the memory, and create a new first thread.
  • the file block is written into the cache space of the SSD, and a second thread is created to write the file block written to the SSD to the cache space of the mechanical hard disk.
  • the release module 80 is configured to delete or replace the original cache data of the cache space of the cache component whose cache space is full according to the predetermined cache space release policy when the cache space of any of the cache elements is full.
  • the embodiment of the present application further provides an electronic device, including the data processing system of any of the foregoing embodiments of the present application.
  • the embodiment of the present application further provides another electronic device, including:
  • the embodiment of the present application further provides another electronic device, including: one or more processors, a memory, a plurality of cache components, a communication component, and a communication bus, the processor, the memory, the plurality of cache units, and the foregoing communication
  • the components complete communication with each other through the communication bus, the transmission rates and/or storage spaces of the plurality of cache components are different, and the plurality of cache components are preset with different lookup priorities according to the transmission rate and/or the storage space;
  • the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform an operation corresponding to the data processing method of any of the above embodiments of the present application.
  • FIG. 9 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
  • the electronic device of this embodiment includes a processor 902, a communication component 904, a memory 906, a GPU memory 908, a memory 910, and a communication bus 912.
  • Memory 906 can include a mechanical hard disk and/or a solid state hard disk.
  • Communication components may include, but are not limited to, input/output (I/O) interfaces, network cards, and the like.
  • the processor 902, the communication component 904, the memory 906, the GPU memory 908, and the memory 910 communicate with each other via the communication bus 912.
  • the communication component 904 is configured to communicate with network elements of other devices, such as a client or a data collection device.
  • the processor 902 is configured to execute the program 920. Specifically, the related steps in the foregoing method embodiments may be performed.
  • the program can include program code, the program code including computer operating instructions.
  • the processor 902 may be one or more, and the device may be a CPU, or a GPU, or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or the embodiment of the present application. Multiple integrated circuits, etc.
  • ASIC Application Specific Integrated Circuit
  • the memory 906 is configured to store the program 920.
  • Memory 906 may include high speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
  • the program 920 includes at least one executable instruction, and may be specifically configured to cause the processor 902 to: receive a cache data read request initiated by the at least one processor process to the data cache system, the data cache system including a plurality of cache elements, And the foregoing multiple cache elements have different transmission rates and/or storage spaces, and are preset with different lookup priorities according to respective transmission rates and/or storage spaces; in response to the received cache data read request, the above is initiated.
  • the cache element of the processor corresponding to the process of buffering the data read request starts to search for the corresponding cache data in the order of the above-mentioned lookup priorities from high to low, and sequentially responds to the cache element that caches the corresponding cached data. Returning the corresponding cached data from the found cache component to the corresponding processor process.
  • FIG. 9 is a schematic structural diagram of another embodiment of an electronic device according to the present application.
  • the electronic device includes one or more processors, a communication unit, etc., the one or more processors such as one or more CPUs 901, and/or one or more GPUs 913, etc., the processor may Various appropriate actions and processes are performed in accordance with executable instructions stored in read only memory (ROM) 902 or executable instructions loaded into random access memory (RAM) 903 from storage portion 908.
  • ROM read only memory
  • RAM random access memory
  • Communication portion 912 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (Infiniband) network card, and the processor can communicate with read only memory 902 and/or random access memory 903 to execute executable instructions over bus 904.
  • the operation is performed by the communication unit 912, and communicates with other target devices via the communication unit 912, thereby performing operations corresponding to any data processing method provided by the embodiment of the present application, for example, receiving a cache initiated by at least one processor process to the data cache system.
  • the data cache system includes a plurality of cache elements, the transfer rates and/or storage spaces of the plurality of cache elements are different, and the plurality of cache elements are preset according to a transfer rate and/or a storage space Having different lookup priorities; in response to receiving the cache data read request, starting from a cache element of a processor corresponding to the process initiating the cache data read request, according to the lookup priority from high to low Having sequentially searched the plurality of cache elements for respective cache data; in response to finding the cache, the corresponding cache data is buffered Element, returns to the corresponding cache data from the processor corresponding to the process to find the cache element.
  • RAM 903 various programs and data required for the operation of the device can be stored.
  • the CPU 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904.
  • ROM 902 is an optional module.
  • the RAM 903 stores executable instructions or writes executable instructions to the ROM 902 at runtime, the executable instructions causing the processor 901 to perform operations corresponding to the data processing methods described above.
  • An input/output (I/O) interface 905 is also coupled to bus 904.
  • the communication unit 912 may be integrated or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) and on the bus link.
  • the following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, etc.; an output portion 907 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a storage portion 908 including a hard disk or the like. ; And a communication portion 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet.
  • Driver 910 is also connected to I/O interface 905 as needed.
  • a removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 910 as needed so that a computer program read therefrom is installed into the storage portion 908 as needed.
  • FIG. 9 is only an optional implementation manner.
  • the number and type of components in FIG. 9 may be selected, deleted, added, or replaced according to actual needs;
  • Different function components can also be implemented in separate settings or integrated settings, such as GPU and CPU detachable settings or GPU can be integrated on the CPU, the communication part can be separated, or integrated on the CPU or GPU. and many more.
  • an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart, the program code comprising Executing instructions corresponding to the method steps provided by the embodiments of the present application, for example, receiving an instruction of a cache data read request initiated by at least one processor process to a data cache system; the data cache system includes multiple cache elements, The transmission rate and/or the storage space of the cache element are different, and the plurality of cache elements are preset with different lookup priorities according to the transmission rate and/or the storage space; in response to receiving the cache data read request, Starting from the cache element of the processor corresponding to the process of initiating the cache data read request, sequentially querying the plurality of cache elements for the corresponding cache data according to the lookup priority from the highest priority to the lower priority;
  • the embodiment of the present application further provides a computer program, including computer readable code, when the computer readable code is run on a device, the processor in the device executes to implement any of the embodiments of the present application. Instructions for each step in the data processing method.
  • the embodiment of the present application further provides a computer readable storage medium for storing computer readable instructions, which when executed, implement the operations of the steps in the data processing method of any embodiment of the present application.
  • the above method according to an embodiment of the present application may be implemented in hardware, firmware, or as software or computer code storable in a recording medium such as a CD ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as via Internet
  • Such software processing on a recording medium of dedicated hardware such as an ASIC or an FPGA.
  • a computer, processor, microprocessor controller or programmable hardware includes storage components (eg, RAM, ROM, flash memory, etc.) that can store or receive software or computer code, when the software or computer code is The processing methods described herein are implemented when the processor or hardware is accessed and executed. Moreover, when a general purpose computer accesses code for implementing the processing shown herein, the execution of the code converts the general purpose computer into a special purpose computer for performing the processing shown herein.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data processing method and system, and an electronic device, said method comprising: receiving a cache data read request initiated by at least one processor process to a data cache system (S110); in response to the received cache data read request, starting from a cache element of a processor corresponding to the process initiating the cache data read request, searching for relevant cache data in multiple types of cache element sequentially according to a search priority level, ordered from high to low (S120); in response to finding cache elements having cached said relevant cache data, returning said relevant cache data from the found cache elements to the corresponding processor process (S130). The present method facilitates greatly reducing time calculating nodes to acquire training data, shortening training algorithm time.

Description

数据处理方法和***、电子设备Data processing method and system, electronic device
本申请要求在2016年10月28日提交中国专利局、申请号为CN201610972718.6、发明名称为“数据处理方法和***、电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of the Chinese Patent Application, filed on Oct. 28, 2016, filed on Jan. 28, 2016, filed Jan. In this application.
技术领域Technical field
本申请涉及深度学习技术,尤其涉及数据处理方法和***、电子设备。The present application relates to deep learning techniques, and more particularly to data processing methods and systems, and electronic devices.
背景技术Background technique
分析大规模数据的深度学习算法正被越来越广泛地应用在大数据分析中,例如图像识别、语音识别和自然语言处理等。这类分析大规模数据的深度学习算法通常需要TB级或以上的训练数据和海量的神经网络模型参数。由于单台计算机的存储和计算能力都无法适应这种任务需求,所以分布式深度学习***被提出。Deep learning algorithms for analyzing large-scale data are being more and more widely used in big data analysis, such as image recognition, speech recognition, and natural language processing. Such deep learning algorithms for analyzing large-scale data typically require training data of terabytes or more and massive neural network model parameters. Distributed deep learning systems have been proposed because the storage and computing power of a single computer cannot accommodate this task.
典型的分布式深度学习***由成百上千台计算节点和几十台存储节点构成。深度学习模型的训练算法被分布式并行地运行在计算节点上。一个深度学习模型的训练算法(例如卷积神经网络模型的训练算法)通常需要经过几十万次迭代运算,训练算法的运行时间包括计算节点获取训练数据的时间,获取、更新神经网络模型参数的时间,和/或计算参数梯度的时间等。显然,低速的数据传输会增加节点获取训练数据等信息的时间,从而增加训练算法的运行时间。因此,如何缩短训练算法的运行时间,是分布式深度学习***亟需解决的问题。A typical distributed deep learning system consists of hundreds of computing nodes and dozens of storage nodes. The training algorithm of the deep learning model is distributed in parallel on the compute nodes. A training algorithm for a deep learning model (such as a training algorithm for a convolutional neural network model) usually requires hundreds of thousands of iterations. The running time of the training algorithm includes the time at which the computing node obtains the training data, and the parameters of the neural network model are acquired and updated. Time, and/or time to calculate parameter gradients, etc. Obviously, low-speed data transmission increases the time for the node to acquire information such as training data, thereby increasing the running time of the training algorithm. Therefore, how to shorten the running time of the training algorithm is an urgent problem to be solved in the distributed deep learning system.
发明内容Summary of the invention
本申请实施例提供一种数据处理方案。The embodiment of the present application provides a data processing solution.
根据本申请实施例的一个方面,提供一种数据处理方法,包括:According to an aspect of the embodiments of the present application, a data processing method is provided, including:
接收由至少一处理器进程向数据缓存***发起的缓存数据读取请求;所述数据缓存***包括多种缓存元件,所述多种缓存元件的传输速率和/或存储空间不同、且所述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;Receiving a cache data read request initiated by the at least one processor process to the data cache system; the data cache system includes a plurality of cache elements, the transfer rate and/or storage space of the plurality of cache elements being different, and the plurality of The cache element is preset with different lookup priorities according to the transmission rate and/or the storage space;
响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据;Responding to receiving the cache data read request, starting from a cache element of a processor corresponding to a process that initiates the cache data read request, sequentially performing the plurality of lookup priorities in descending order The cache component looks up the corresponding cache data;
响应于查找到缓存有所述相应缓存数据的缓存元件,从查找到的缓存元件向所述对应的处理器进程返回所述相应缓存数据Responding to the cache element having the corresponding cache data cached, returning the corresponding cache data from the found cache element to the corresponding processor process
可选地,本申请上述各实施例的方法还包括:响应于未查找到缓存有所述相应缓存数据的缓存元件,向与所述数据缓存***通信连接的分布式文件***查找所述相应缓存数据、并将所述相应缓存数据向所述对应的处理器进程返回。Optionally, the method of the foregoing embodiments of the present application further includes: in response to not finding a cache element that caches the corresponding cache data, searching for the corresponding cache to a distributed file system that is in communication connection with the data cache system. Data and returning the corresponding cached data to the corresponding processor process.
可选地,所述多种缓存元件包括以下至少二种:图像处理器GPU显存、内存和硬盘;Optionally, the multiple cache components include at least two of the following: an image processor GPU memory, a memory, and a hard disk;
所述GPU显存的查找优先级高于所述内存的查找优先级,所述内存的查找优先级高于所述硬盘的查找优先级。 The lookup priority of the GPU memory is higher than the lookup priority of the memory, and the memory has a higher lookup priority than the hard disk.
可选地,所述硬盘包括固态硬盘和机械硬盘,所述固态硬盘的查找优先级高于所述机械硬盘的查找优先级。Optionally, the hard disk comprises a solid state hard disk and a mechanical hard disk, and the solid state hard disk has a higher priority of searching than the mechanical hard disk.
可选地,所述至少一处理器进程包括:至少一GPU进程,和/或,至少一中央处理器CPU进程。Optionally, the at least one processor process comprises: at least one GPU process, and/or at least one central processing unit CPU process.
可选地,响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据,包括:Optionally, in response to receiving the cache data read request, starting from a cache element of a processor corresponding to the process that initiates the cache data read request, sequentially following the lookup priority from high to low The plurality of cache components look up the corresponding cache data, including:
响应于接收到由GPU进程发起的所述缓存数据读取请求,按照所述查找优先级从高到低的顺序依次向GPU显存、内存和硬盘查找所述相应缓存数据。In response to receiving the cache data read request initiated by the GPU process, the corresponding cache data is sequentially searched to the GPU memory, the memory, and the hard disk according to the lookup priority from high to low.
可选地,响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据,包括:Optionally, in response to receiving the cache data read request, starting from a cache element of a processor corresponding to the process that initiates the cache data read request, sequentially following the lookup priority from high to low The plurality of cache components look up the corresponding cache data, including:
响应于接收到由CPU进程发起的所述缓存数据读取请求,按照所述查找优先级从高到低的顺序依次向内存和硬盘查找所述相应缓存数据。In response to receiving the cache data read request initiated by the CPU process, the corresponding cache data is sequentially searched to the memory and the hard disk according to the lookup priority from high to low.
可选地,本申请上述各实施例的方法还包括:如果查找到缓存有所述相应缓存数据的缓存元件为硬盘,则将所述相应缓存数据缓存到GPU显存和/或内存中。Optionally, the method of the foregoing embodiments of the present application further includes: if the cache component that caches the corresponding cache data is found to be a hard disk, cache the corresponding cache data into the GPU memory and/or the memory.
可选地,本申请上述各实施例的方法还包括:如果未查找到缓存有所述相应缓存数据的缓存元件,则将所述相应缓存数据新增缓存到所述内存。Optionally, the method of the foregoing embodiments of the present application further includes: if the cache element with the corresponding cached data is not found, the corresponding cached data is newly cached to the memory.
可选地,本申请上述各实施例的方法还包括:响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的线程将所述文件块写入所述硬盘的缓存地址空间。Optionally, the method of the foregoing embodiments of the present application further includes: in response to newly adding data of the corresponding cache data buffered to the memory, forming a file block of a predetermined size, creating a new thread to write the file block The cache address space of the hard disk.
可选地,所述硬盘包括固态硬盘;所述响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的线程将所述文件块写入所述硬盘,包括:响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将所述文件块写入所述固态硬盘的缓存空间。Optionally, the hard disk comprises a solid state hard disk; the data in response to the newly added cached data buffered to the memory constitutes a file block of a predetermined size, and a new thread is created to write the file block into the The hard disk includes: in response to the newly added data of the corresponding cache data buffered to the memory forming a file block of a predetermined size, creating a new first thread to write the file block into a cache space of the solid state hard disk.
可选地,所述硬盘包括固态硬盘和机械硬盘;所述响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的线程将所述文件块写入所述硬盘的缓存空间,包括:响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将所述文件块写入所述固态硬盘的缓存空间,并创建第二线程将写入到所述固态硬盘的所述文件块写入所述机械硬盘的缓存空间。Optionally, the hard disk comprises a solid state hard disk and a mechanical hard disk; the data in response to the newly added cached data buffered to the memory constitutes a file block of a predetermined size, and a new thread is created to write the file block Entering a cache space of the hard disk, comprising: in response to newly adding data of the corresponding cache data buffered to the memory, forming a file block of a predetermined size, creating a new first thread to write the file block to the solid state The cache space of the hard disk, and the second thread is created to write the file block written to the solid state hard disk to the cache space of the mechanical hard disk.
可选地,本申请上述各实施例的方法还包括:在任一缓存元件的缓存空间被写满时,根据预定缓存空间释放策略,删除或替换缓存空间被写满的缓存元件的缓存空间原有的缓存数据。Optionally, the method of the foregoing embodiments of the present application further includes: deleting, or replacing, a cache space of a cache component whose cache space is full according to a predetermined cache space release policy when a cache space of any cache element is full. Cache data.
根据本申请实施例的另一个方面,提供一种数据处理***,包括:According to another aspect of the embodiments of the present application, a data processing system is provided, including:
接收模块,用于接收由至少一处理器进程向数据缓存***发起的缓存数据读取请求;所述数据缓存***包括多种缓存元件,所述多种缓存元件的传输速率和/或存储空间不同、且所述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;a receiving module, configured to receive a cache data read request initiated by the at least one processor process to the data cache system; the data cache system includes a plurality of cache elements, and the plurality of cache elements have different transmission rates and/or storage spaces And the plurality of cache elements are preset with different lookup priorities according to a transmission rate and/or a storage space;
第一查找模块,用于响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据;a first searching module, configured to start, according to the cache element read request, a cache element of a processor corresponding to a process that initiates the cache data read request, according to the search priority from high to low Sequentially searching for the corresponding cache data to the plurality of cache components in sequence;
第一返回模块,用于响应于查找到缓存有所述相应缓存数据的缓存元件,从查找到的缓存 元件向所述对应的处理器进程返回所述相应缓存数据。a first returning module, configured to respond to the cached component that caches the corresponding cached data, from the cache that is found The component returns the corresponding cached data to the corresponding processor process.
可选地,本申请上述各实施例的***还包括:第二查找和返回模块,用于响应于未查找到缓存有所述相应缓存数据的缓存元件,向与所述数据缓存***通信连接的分布式文件***查找所述相应缓存数据、并将所述相应缓存数据向所述对应的处理器进程返回。Optionally, the system of the foregoing embodiments of the present application further includes: a second lookup and return module, configured to communicate with the data cache system in response to not finding a cache element that caches the corresponding cache data The distributed file system looks up the corresponding cached data and returns the corresponding cached data to the corresponding processor process.
可选地,所述多种缓存元件包括以下至少二种:图像处理器GPU显存、内存和硬盘;所述GPU显存的查找优先级高于所述内存的查找优先级,所述内存的查找优先级高于所述硬盘的查找优先级。Optionally, the multiple cache elements include at least two of the following: an image processor GPU memory, a memory, and a hard disk; the GPU memory has a higher lookup priority than the memory lookup priority, and the memory search priority The level is higher than the lookup priority of the hard disk.
可选地,所述硬盘包括固态硬盘和机械硬盘,所述固态硬盘的查找优先级高于所述机械硬盘的查找优先级。Optionally, the hard disk comprises a solid state hard disk and a mechanical hard disk, and the solid state hard disk has a higher priority of searching than the mechanical hard disk.
可选地,所述至少一处理器进程包括:至少一GPU进程,和/或,至少一中央处理器CPU进程。Optionally, the at least one processor process comprises: at least one GPU process, and/or at least one central processing unit CPU process.
可选地,第一查找模块包括:第一查找子模块,用于响应于接收到由GPU进程发起的所述缓存数据读取请求,按照所述查找优先级从高到低的顺序依次向GPU显存、内存和硬盘查找所述相应缓存数据。Optionally, the first search module includes: a first lookup submodule, configured to respond to the cache data read request initiated by the GPU process to the GPU according to the lookup priority from high to low Memory, memory, and hard disk look up the corresponding cached data.
可选地,第一查找模块包括:第二查找子模块,用于响应于接收到由CPU进程发起的所述缓存数据读取请求,按照所述查找优先级从高到低的顺序依次向内存和硬盘查找所述相应缓存数据。Optionally, the first search module includes: a second search submodule, configured to, in response to receiving the cache data read request initiated by the CPU process, sequentially go to the memory according to the lookup priority from high to low And the hard disk finds the corresponding cached data.
可选地,本申请上述各实施例的***还包括:第一缓存模块,用于如果查找到缓存有所述相应缓存数据的缓存元件为硬盘,则将所述相应缓存数据缓存到GPU显存和/或内存中。Optionally, the system of the foregoing embodiments of the present application further includes: a first cache module, configured to cache the corresponding cache data to the GPU memory and if the cache component that caches the corresponding cache data is found to be a hard disk / or in memory.
可选地,本申请上述各实施例的***还包括:第二缓存模块,如果未查找到缓存有所述相应缓存数据的缓存元件,则将所述相应缓存数据新增缓存到所述内存。Optionally, the system of the foregoing embodiments of the present application further includes: a second cache module, if the cache element with the corresponding cache data is not found, the corresponding cache data is newly cached into the memory.
可选地,本申请上述各实施例的***还包括:写入模块,用于响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的线程将所述文件块写入所述硬盘的缓存地址空间。Optionally, the system of the foregoing embodiments of the present application further includes: a writing module, configured to generate a new thread by responding to newly added data of the corresponding cache data buffered to the memory to form a file block of a predetermined size The file block is written to a cache address space of the hard disk.
可选地,所述硬盘包括固态硬盘;所述写入模块包括第一写入子模块,用于响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将所述文件块写入所述固态硬盘的缓存空间。Optionally, the hard disk includes a solid state hard disk; the writing module includes a first writing submodule configured to form a file block of a predetermined size in response to the newly added data of the corresponding cache data buffered to the memory, Creating a new first thread writes the file block to the cache space of the SSD.
可选地,所述硬盘包括固态硬盘和机械硬盘;所述写入模块包括:第二写入子模块,用于响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将所述文件块写入所述固态硬盘的缓存空间,并创建第二线程将写入到所述固态硬盘的所述文件块写入所述机械硬盘的缓存空间。Optionally, the hard disk includes a solid state hard disk and a mechanical hard disk; the writing module includes: a second writing submodule configured to form a predetermined size in response to newly added data of the corresponding cache data buffered to the memory a file block, creating a new first thread to write the file block to a cache space of the solid state drive, and creating a second thread to write the file block written to the solid state drive to the mechanical hard disk Cache space.
可选地,本申请上述各实施例的***还包括:释放模块,用于在任一缓存元件的缓存空间被写满时,根据预定缓存空间释放策略,删除或替换缓存空间被写满的所述缓存元件的缓存空间原有的缓存数据。Optionally, the system of the foregoing embodiments of the present application further includes: a release module, configured to delete or replace the cache space that is full according to a predetermined cache space release policy when a cache space of any cache element is filled. The cached space of the cache component is the original cached data.
根据本申请实施例的又一个方面,提供一种电子设备,包括本申请任一实施例所述的数据处理***。According to still another aspect of embodiments of the present application, an electronic device is provided, including the data processing system of any of the embodiments of the present application.
根据本申请实施例的再一个方面,提供一种电子设备,包括:According to still another aspect of the embodiments of the present application, an electronic device is provided, including:
处理器和本申请任一实施例所述的数据处理***;a processor and a data processing system according to any of the embodiments of the present application;
在处理器运行所述数据处理***时,本申请任一实施例所述的数据处理***中的单元被运 行。When the processor runs the data processing system, the unit in the data processing system according to any of the embodiments of the present application is shipped Row.
根据本申请实施例的再一个方面,提供一种电子设备,包括:一个或多个处理器、存储器、多种缓存元件、通信部件和通信总线,所述处理器、所述存储器、所述多种缓存单元和所述通信部件通过所述通信总线完成相互间的通信,所述多种缓存元件的传输速率和/或存储空间不同、且所述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行本申请任一实施例所述数据处理方法对应的操作。According to still another aspect of the embodiments of the present application, an electronic device includes: one or more processors, a memory, a plurality of cache elements, a communication component, and a communication bus, the processor, the memory, and the The cache unit and the communication component complete communication with each other through the communication bus, the transmission rate and/or storage space of the plurality of cache elements are different, and the plurality of cache elements are based on a transmission rate and/or a storage space. The memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform an operation corresponding to the data processing method in any of the embodiments of the present application.
根据本申请实施例的再一方面,提供一种计算机程序,包括计算机可读代码,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请上述任一实施例所述的数据处理方法中各步骤的指令。According to still another aspect of an embodiment of the present application, a computer program is provided, comprising computer readable code, when a computer readable code is run on a device, a processor in the device performs the above-described An instruction of each step in the data processing method described in an embodiment.
根据本申请实施例的又一方面,还提供了一种计算机可读存储介质,用于存储计算机可读取的指令,所述指令被执行时实现本申请上述任一实施例所述的数据处理方法中各步骤的操作。According to still another aspect of the embodiments of the present application, a computer readable storage medium is provided for storing computer readable instructions, when the instructions are executed, implementing data processing according to any of the above embodiments of the present application. The operation of each step in the method.
基于本申请实施的数据处理方法和***、电子设备、程序和介质,对于数据缓存***中的多种缓存元件,根据传输速率和/或存储空间的不同对其预先设置有不同的查找优先级,接收到至少一处理器进程向数据缓存***发起的缓存数据读取请求是,从发起缓存数据读取请求的进程对应的处理器的缓存元件开始,按照查找优先级从高到低的顺序依次向多种缓存元件查找相应缓存数据,从查找到的缓存元件向对应的处理器进程返回相应缓存数据。本申请实施例利用了各种缓存资源在数据传输效率和存储空间两方面的性能,有助于优化缓存性能(如:最大化缓存命中率从而最小化缓存未命中导致的流向分布式文件***的请求负载),并且优化缓存***的数据传输性能(如:最大化数据读取的吞吐量),有助于幅度减少计算节点获取训练数据的时间,进而加快深度学习算法的运行速度,缩短训练算法的运行时间。The data processing method and system, the electronic device, the program, and the medium implemented in the present application have different search priorities preset for the plurality of cache components in the data cache system according to different transmission rates and/or storage spaces. Receiving, by the at least one processor process, a cache data read request initiated by the data cache system, starting from a cache element of the processor corresponding to the process that initiates the cache data read request, in order from the highest priority to the lowest priority A plurality of cache elements look up the corresponding cache data, and return the corresponding cache data from the found cache component to the corresponding processor process. The embodiments of the present application utilize the performance of various cache resources in data transmission efficiency and storage space, and help optimize cache performance (eg, maximize cache hit ratio to minimize the flow caused by cache misses to the distributed file system. Request load), and optimize the data transmission performance of the cache system (such as: maximize the throughput of data reading), help to reduce the time for the computing node to obtain training data, thereby speeding up the operation of the deep learning algorithm and shortening the training algorithm Running time.
下面通过附图和实施例,对本申请的技术方案做进一步的详细描述。The technical solutions of the present application are further described in detail below through the accompanying drawings and embodiments.
附图说明DRAWINGS
构成说明书的一部分的附图描述了本申请的实施例,并且连同描述一起用于解释本申请的原理。The accompanying drawings, which are incorporated in FIG.
本申请将在下面参考附图并结合实施例进行说明。其中:The present application will be described below with reference to the accompanying drawings in conjunction with the embodiments. among them:
图1为本申请数据处理方法一实施例的流程图。FIG. 1 is a flowchart of an embodiment of a data processing method according to the present application.
图2为本申请实施例中数据缓存***的一个工作模式示意图。FIG. 2 is a schematic diagram of a working mode of a data caching system in an embodiment of the present application.
图3为本申请实施例响应GPU进程请求一实施例的流程图。FIG. 3 is a flowchart of an embodiment of a response to a GPU process request according to an embodiment of the present application.
图4为本申请实施例响应CPU进程请求一实施例的流程图。4 is a flow chart of an embodiment of a response to a CPU process request according to an embodiment of the present application.
图5为本申请实施例响应文件缓存请求一实施例的流程图。FIG. 5 is a flowchart of an embodiment of a response file cache request according to an embodiment of the present application.
图6为本申请数据处理***一实施例结构示意图。FIG. 6 is a schematic structural diagram of an embodiment of a data processing system according to the present application.
图7为本申请数据处理***另一实施例结构示意图。FIG. 7 is a schematic structural diagram of another embodiment of a data processing system according to the present application.
图8为本申请电子设备一个实施例的结构示意图。FIG. 8 is a schematic structural diagram of an embodiment of an electronic device according to the present application.
图9为本申请电子设备另一个实施例的结构示意图。FIG. 9 is a schematic structural diagram of another embodiment of an electronic device according to the present application.
为清晰起见,这些附图均为示意性及简化的图,只给出了对于理解本申请所必要的细节,而省略其他细节。 For the sake of clarity, the drawings are schematic and simplified, and only the details necessary for understanding the present application are given, and other details are omitted.
具体实施方式detailed description
现在将参照附图来详细描述本申请的各种示例性实施例。应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本申请的范围。Various exemplary embodiments of the present application will now be described in detail with reference to the drawings. It should be noted that the relative arrangement of the components and steps, numerical expressions and numerical values set forth in the embodiments are not intended to limit the scope of the application.
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。In the meantime, it should be understood that the dimensions of the various parts shown in the drawings are not drawn in the actual scale relationship for the convenience of the description.
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本申请及其应用或使用的任何限制。The following description of the at least one exemplary embodiment is merely illustrative and is in no way
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。Techniques, methods and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail, but the techniques, methods and apparatus should be considered as part of the specification, where appropriate.
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。It should be noted that similar reference numerals and letters indicate similar items in the following figures, and therefore, once an item is defined in one figure, it is not required to be further discussed in the subsequent figures.
本申请实施例可以应用于终端设备、计算机***、服务器等电子设备,其可与众多其它通用或专用计算***环境或配置一起操作。适于与终端设备、计算机***、服务器等电子设备一起使用的众所周知的终端设备、计算***、环境和/或配置的例子包括但不限于:个人计算机***、服务器计算机***、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的***、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机***﹑大型计算机***和包括上述任何***的分布式云计算技术环境,等等。Embodiments of the present application can be applied to electronic devices such as terminal devices, computer systems, servers, etc., which can operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with electronic devices such as terminal devices, computer systems, servers, and the like include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients Machines, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments including any of the above, and the like.
终端设备、计算机***、服务器等电子设备可以在由计算机***执行的计算机***可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机***/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算***存储介质上。图1为本申请数据处理方法一实施例的流程图。图2为本申请实施例中数据缓存***的一个工作模式示意图。现结合图2对该实施例方法的各个步骤进行详细说明。Electronic devices such as terminal devices, computer systems, servers, etc., can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system. Generally, program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types. The computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network. In a distributed cloud computing environment, program modules may be located on a local or remote computing system storage medium including storage devices. FIG. 1 is a flowchart of an embodiment of a data processing method according to the present application. FIG. 2 is a schematic diagram of a working mode of a data caching system in an embodiment of the present application. The various steps of the method of this embodiment will now be described in detail in conjunction with FIG.
在步骤S110中,接收由至少一处理器进程向数据缓存***发起的缓存数据读取请求。In step S110, a cache data read request initiated by at least one processor process to the data cache system is received.
其中,所述数据缓存***包括多种缓存元件,该多种缓存元件的传输速率和/或存储空间不同、且多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级。The data cache system includes a plurality of cache elements, the transfer rate and/or storage space of the plurality of cache elements are different, and the plurality of cache elements are preset with different lookup priorities according to the transmission rate and/or the storage space. .
数据缓存***(MoG)包括计算节点上的多种缓存元件,在本申请各实施例的一个可选示例中,上述多种缓存元件可以包括但不限于以下至少二种:GPU显存、内存和硬盘,硬盘又可以包括固态硬盘(SSD)和机械硬盘(HDD)。即:上述多种缓存元件可以包括GPU显存、内存(RAM)、固态硬盘和机械硬盘中的至少两个。这些缓存元件拥有不同的数据传输效率,例如,在商用计算机上,内存通常可支持大约1600MB/s的传输速率,固态硬盘通常可支持大约100MB/s~600MB/s的传输速率,机械硬盘通常可支持大约100MB/s~200MB/s的传输速率。同时,这几种缓存元件的存储资源拥有大小各异的存储空间,例如,GPU显存的存储空间通常为12GB,内存的存储空间通常为64GB~256GB,固态硬盘的存储空间通常为128GB~512GB,机械硬盘的存储空间通常为1TB~4TB。为有效地提升数据传输效率和充分利用存储资源,根据这些缓存元件的传输速率和/或存储空间,预先对这些缓存元件设置不同的查找优先级。例如,在包含GPU显存、内存、固态硬盘和机械硬盘的情形下,GPU显存的查找优先级高于内存的查找优先级,内存的查找优先级高于硬盘的查找优先级,其中,内存的查找优先 级高于固态硬盘的查找优先级,固态硬盘的查找优先级高于机械硬盘的查找优先级。The data cache system (MoG) includes a plurality of cache elements on the computing node. In an optional example of the embodiments of the present application, the foregoing plurality of cache elements may include but are not limited to at least two of the following: GPU memory, memory, and hard disk. The hard disk may include a solid state drive (SSD) and a mechanical hard disk (HDD). That is, the above various cache elements may include at least two of GPU memory, a memory (RAM), a solid state hard disk, and a mechanical hard disk. These cache elements have different data transfer efficiencies. For example, on a commercial computer, the memory usually supports a transfer rate of about 1600MB/s, and the SSD usually supports a transfer rate of about 100MB/s to 600MB/s. Mechanical hard disks are usually available. Supports transmission rates of approximately 100MB/s to 200MB/s. At the same time, the storage resources of these cache elements have different storage spaces. For example, the storage space of the GPU memory is usually 12 GB, the storage space of the memory is usually 64 GB to 256 GB, and the storage space of the solid state hard disk is usually 128 GB to 512 GB. Mechanical hard disk storage space is usually 1TB ~ 4TB. In order to effectively improve data transmission efficiency and make full use of storage resources, different cache priorities are set in advance according to the transmission rate and/or storage space of these cache elements. For example, in the case of GPU memory, memory, solid state hard disk and mechanical hard disk, the GPU memory has a higher priority than the memory search priority, and the memory search priority is higher than the hard disk search priority, wherein the memory search Priority The level is higher than the search priority of the SSD, and the SSD priority is higher than the search priority of the mechanical hard disk.
在本申请各实施例的一个可选实施方式中,上述至少一处理器进程例如可以包括但不限于:至少一GPU进程,和/或,至少一CPU进程。In an optional implementation manner of the embodiments of the present application, the at least one processor process may include, but is not limited to, at least one GPU process, and/or at least one CPU process.
在一个可选示例中,步骤S110可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的接收模块10执行。In an alternative example, step S110 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a receiving module 10 executed by the processor.
在步骤S120中,响应于接收到缓存数据读取请求,从发起上述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照上述查找优先级从高到低的顺序,依次向上述多种缓存元件查找相应缓存数据。In step S120, in response to receiving the cache data read request, starting from the cache element of the processor corresponding to the process of initiating the cache data read request, according to the above-mentioned search priority from high to low, sequentially to the above The cache element looks up the corresponding cached data.
响应于接收到由GPU进程发起的上述缓存数据读取请求,按照上述查找优先级从高到低的顺序依次向GPU显存、内存和硬盘查找相应缓存数据。响应于接收到由CPU进程发起的上述缓存数据读取请求,按照上述查找优先级从高到低的顺序依次向内存和硬盘查找相应缓存数据。例如,响应于接收到由GPU进程发起的文件读取请求,依次从GPU显存、内存、固态硬盘、机械硬盘查找所要读取的文件;响应于接收到由CPU进程发起的文件读取请求,依次从内存、固态硬盘、硬盘查找所要读取的文件。In response to receiving the cache data read request initiated by the GPU process, the cache data is searched for the GPU memory, the memory, and the hard disk in descending order of the lookup priority. In response to receiving the cache data read request initiated by the CPU process, the cache data is searched for in the memory and the hard disk in order from the highest priority to the low priority. For example, in response to receiving a file read request initiated by the GPU process, sequentially searching for a file to be read from the GPU memory, the memory, the solid state hard disk, and the mechanical hard disk; in response to receiving the file read request initiated by the CPU process, in turn Find the file you want to read from memory, solid state drive, and hard drive.
在一个可选示例中,步骤S120可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一查找模块20执行。In an alternative example, step S120 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first lookup module 20 that is executed by the processor.
在步骤S130中,响应于查找到缓存有上述相应缓存数据的缓存元件,从查找到的缓存元件处向对应的处理器进程返回上述相应缓存数据。In step S130, in response to finding the cache element in which the corresponding cache data is cached, the corresponding cache data is returned from the found cache element to the corresponding processor process.
例如,对于GPU进程发送的缓存数据读取请求,当在内存中查找到相应缓存数据时,将该缓存数据从内存返回给GPU进程。For example, for a cache data read request sent by a GPU process, when the corresponding cache data is found in the memory, the cached data is returned from the memory to the GPU process.
在一个可选示例中,步骤S130可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一返回模块30执行。In an alternative example, step S130 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first return module 30 that is executed by the processor.
根据本申请实施例,数据缓存***的层级管理给予不同的存储资源不同的查找优先级,由高到低依次为GPU显存、内存、固态硬盘、机械硬盘。这主要是因为这四种设备所支持的数据传输速率依次递减,设备的数据传输速率越高,客户端获取数据的速度越快,更有利于提升深度学习***的速度。当数据缓存***收到GPU进程发送的缓存数据读取请求时,它按照查找优先级依次逐层查找这四种存储资源,一旦找到,即返回相应缓存数据给发起请求的进程,如图3所示。这种存储资源层级管理方法充分利用了各种缓存资源在数据传输效率和存储空间两方面的性能,有助于优化缓存性能(如:最大化缓存命中率从而最小化缓存未命中导致的流向分布式文件***的请求负载),并且优化缓存***的数据传输性能(如:最大化数据读取的吞吐量),从而降低计算节点获取数据的时间,进而加快深度学习算法的运行速度。According to the embodiment of the present application, the hierarchical management of the data caching system gives different storage resources different search priorities, from high to low, GPU memory, memory, solid state hard disk, and mechanical hard disk. This is mainly because the data transmission rates supported by the four devices are successively decremented. The higher the data transmission rate of the device, the faster the client obtains the data, which is more conducive to improving the speed of the deep learning system. When the data cache system receives the cache data read request sent by the GPU process, it searches the four storage resources layer by layer according to the search priority. Once found, the corresponding cache data is returned to the process of initiating the request, as shown in FIG. Show. This storage resource hierarchy management method takes full advantage of the performance of various cache resources in terms of data transmission efficiency and storage space, and helps optimize cache performance (eg, maximizes cache hit ratio to minimize flow distribution caused by cache misses). The file system's request load), and optimize the data transfer performance of the cache system (such as: maximize the throughput of data read), thereby reducing the time for the compute node to obtain data, thereby speeding up the deep learning algorithm.
另外,在本申请数据处理方法的另一实施例中,还可以包括:响应于未查找到缓存有上述相应缓存数据的缓存元件,向与数据缓存***通信连接的分布式文件***查找上述相应缓存数据、并将该相应缓存数据向对应的处理器进程返回。In addition, in another embodiment of the data processing method of the present application, the method may further include: searching for the cache corresponding to the distributed file system communicatively connected to the data cache system in response to not finding the cache element that caches the corresponding cache data. Data and return the corresponding cached data to the corresponding processor process.
在本申请数据处理方法的又一实施例中,还可以包括:如果查找到缓存有上述相应缓存数据的缓存元件为硬盘,则将该相应缓存数据缓存到GPU显存和/或内存中。In still another embodiment of the data processing method of the present application, the method further includes: if the cache component that caches the corresponding cache data is a hard disk, cache the corresponding cache data into the GPU memory and/or the memory.
根据本申请方法数据处理方法实施例的一种可选实施方式,上述进程与数据缓存***可以按客户端/服务器(C/S)工作模式工作,数据缓存***作为服务器,可以同时响应多个进程(例如GPU进程或CPU进程)的缓存数据读取请求,使得多个进程可以共享缓存内容,从而提高 了缓存效率。客户端/服务器的工作模式使得数据缓存***能够同时响应多个进程的缓存数据读取请求,使得多个进程可以共享缓存数据,从而提高了缓存效率。数据缓存***逻辑上位于客户端和分布式文件***之间,如图2所示。缓存数据读取为客户端提供了和文件***类似的文件读取接口。客户端通过该接口读取缓存数据,见图2中的1.get。如果该缓存数据已经被缓存在数据缓存***里,即缓存命中,则数据缓存***直接返回该缓存数据,见图2中的2.response。否则,缓存未命中,即该缓存数据还没有被缓存在数据缓存***内,则数据缓存***从分布式文件***读取缓存数据(见图2中的3.get),并缓存该缓存数据(见图2中的4.cache),同时向客户端返回该缓存数据(见图2中的5.response)。这种设计结构具有对客户端透明的特性,也就是说,客户端除了使用数据缓存***所提供的文件读取接口以外,其他操作和不使用数据缓存***时一样。According to an optional implementation manner of the data processing method embodiment of the present application, the foregoing process and data caching system can work in a client/server (C/S) working mode, and the data caching system acts as a server and can simultaneously respond to multiple processes. Cache data read requests (such as GPU processes or CPU processes) so that multiple processes can share cached content, thereby improving Cache efficiency. The client/server working mode enables the data caching system to simultaneously respond to cached data read requests of multiple processes, so that multiple processes can share cached data, thereby improving cache efficiency. The data caching system is logically located between the client and the distributed file system, as shown in Figure 2. Cache data reads provide the client with a file-reading interface similar to the file system. The client reads the cached data through the interface, as shown in Figure 2. If the cached data has been cached in the data cache system, ie, a cache hit, the data cache system directly returns the cached data, as shown in 2.response in Figure 2. Otherwise, if the cache misses, that is, the cached data has not been cached in the data cache system, the data cache system reads the cached data from the distributed file system (see 3.get in Figure 2) and caches the cached data ( See 4.cache in Figure 2, and return the cached data to the client (see 5.response in Figure 2). This design structure has the characteristics of being transparent to the client, that is, the client operates in the same way as the file cache interface provided by the data cache system, and does not use the data cache system.
根据本申请上述方法的一种实施方式,上述方法还包括响应未查找到缓存有上述相应缓存数据的缓存元件,向与上述数据缓存***通信连接的分布式文件***查找上述相应缓存数据、并将上述相应缓存数据向相应处理器进程返回。According to an embodiment of the foregoing method, the method further includes: in response to not finding a cache element that caches the corresponding cached data, searching for the corresponding cached data to a distributed file system that is in communication with the data cache system, and The corresponding cached data described above is returned to the corresponding processor process.
图3为本申请实施例响应GPU进程请求一实施例的流程图。该实施例中,以缓存数据读取请求为文件读取请求为例进行说明。如图3所示,该实施例包括:FIG. 3 is a flowchart of an embodiment of a response to a GPU process request according to an embodiment of the present application. In this embodiment, a cache data read request is taken as a file read request as an example. As shown in FIG. 3, this embodiment includes:
在步骤S310,响应于接收到由GPU进程发起的文件读取请求,确定所要读取的文件是否在GPU显存里,如果是,则处理进行到步骤S312,向GPU进程返回所要读取的文件;如果否,则处理进行到步骤S314。In step S310, in response to receiving the file read request initiated by the GPU process, determining whether the file to be read is in the GPU memory, and if so, the process proceeds to step S312 to return the file to be read to the GPU process; If not, the process proceeds to step S314.
在步骤S314,确定所要读取的文件是否在内存里,如果是,则处理进行到步骤S312;如果否,则处理进行到步骤S316。In step S314, it is determined whether the file to be read is in the memory, and if so, the process proceeds to step S312; if not, the process proceeds to step S316.
在步骤S316,确定所要读取的文件是否在固态硬盘里,如果是,则处理进行到步骤S320;如果否,则处理进行到步骤S318。In step S316, it is determined whether the file to be read is in the solid state hard disk, and if so, the process proceeds to step S320; if not, the process proceeds to step S318.
在步骤S318,确定所要读取的文件是否在机械硬盘里,如果是,则处理进行到步骤S320;如果否,则处理进行到步骤S322。In step S318, it is determined whether the file to be read is in the mechanical hard disk, and if so, the process proceeds to step S320; if not, the process proceeds to step S322.
在一个可选示例中,步骤S310~S318可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一查找模块20或者其中的第一查找子模块22执行。In an alternative example, steps S310-S318 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first lookup module 20 executed by the processor or a first look-up sub-module 22 therein.
在步骤S320,从固态硬盘或机械硬盘读取所要读取的文件,将其缓存到内存和GUP显存里,然后向GPU进程返回所要读取的文件。In step S320, the file to be read is read from the solid state hard disk or the mechanical hard disk, cached into the memory and the GUP memory, and then the file to be read is returned to the GPU process.
之后,不执行本实施例的后续流程。Thereafter, the subsequent flow of this embodiment is not performed.
在一个可选示例中,步骤S320可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一缓存模块40执行。In an alternative example, step S320 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first cache module 40 that is executed by the processor.
将文件缓存在GPU显存和内存里利用了存储资源各自的优势来提高缓存效率。对比GPU显存和内存,GPU显存访问速度快,但能提供的缓存空间小;而内存能提供的缓存空间大,但访问速度稍慢。如果将文件只缓存到GPU显存里,因为缓存空间小,很快会被写满,一旦GPU显存写满,当需要缓存新的文件时,旧文件会被删除来给新文件腾出空间,这样,如果被删除的这份文件在将来被再次访问,那就只能去固态硬盘和机械硬盘里面查找,所需时间较长。另外一方面,如果将文件只缓存到内存里,访问速度没有GPU显存快。本实施例采用了缓存两次的策略,可以在考虑文件访问速度的情况下、缓存较多的文件。Caching files in GPU memory and memory takes advantage of the respective advantages of storage resources to improve cache efficiency. Compared to GPU memory and memory, GPU memory access speed is fast, but the available cache space is small; while memory can provide a large cache space, but the access speed is slightly slower. If the file is only cached in the GPU memory, because the cache space is small, it will be filled up quickly. Once the GPU memory is full, when the new file needs to be cached, the old file will be deleted to make room for the new file. If the deleted file is revisited in the future, it can only be found in the SSD and the mechanical hard disk, which takes a long time. On the other hand, if the file is only cached in memory, the access speed is not as fast as GPU memory. In this embodiment, a cache twice strategy is adopted, and more files can be cached in consideration of file access speed.
在步骤S322,从分布式文件***读取文件。之后,处理进行到步骤S324。 At step S322, the file is read from the distributed file system. Thereafter, the process proceeds to step S324.
在步骤324,将文件缓存在内存、固态硬盘和/或机械硬盘里,然后向GPU进程返回所要读取的文件。At step 324, the file is cached in memory, a solid state drive, and/or a mechanical hard disk, and then the file to be read is returned to the GPU process.
在一个可选示例中,步骤S322~S324可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二查找和返回模块50执行。In an alternative example, steps S322-S324 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second lookup and return module 50 executed by the processor.
本实施例提供的技术方案不仅利用了内存和硬盘,还利用了GPU显存缓存数据,能够对深度学习中的大量GPU算法实现缓存读取加速。本实施例使用动态缓存机制,能够支持大规模深度学习***要求的海量数据规模。The technical solution provided in this embodiment not only utilizes the memory and the hard disk, but also utilizes the GPU memory cache data, and can implement cache read acceleration for a large number of GPU algorithms in deep learning. This embodiment uses a dynamic caching mechanism to support the massive data size required by large-scale deep learning systems.
本申请利用多种存储资源形成异构资源数据缓存***,由于GPU显存和内存资源很小,这两种设备构成的缓存空间不能满足大规模深度学习***所要求的数据规模。因此,我们利用更大的存储资源,固态硬盘和机械硬盘,来共同构建缓存***。这一缓存***可以有效地提升数据传输效率。因为,原来文件需要从远端硬盘(分布式文件***)中获取,将其缓存到本地可以省去网络传输的时间;将其缓存到内存可以得到比硬盘更快的传输速率;对于GPU进程需要的文件,将其缓存到GPU显存里,还可以省去从内存到GPU显存的传输时间。本申请的这种设计方案,一方面能利用GPU显存和内存传输速率快的优势,另一方面能利用固态硬盘和机械硬盘存储空间大的优势,有助于提高缓存***的效率(访问速度),并且满足数据规模大的需求。The present application utilizes multiple storage resources to form a heterogeneous resource data caching system. Since the GPU memory and memory resources are small, the cache space formed by the two devices cannot meet the data size required by the large-scale deep learning system. Therefore, we use a larger storage resource, solid state drive and mechanical hard drive to build a cache system. This caching system can effectively improve data transmission efficiency. Because the original file needs to be obtained from the remote hard disk (distributed file system), caching it to the local can save the network transmission time; cache it to the memory to get a faster transfer rate than the hard disk; for the GPU process needs The file is cached into the GPU memory, and the transfer time from memory to GPU memory can be eliminated. The design of the present application can utilize the advantages of fast GPU memory and memory transfer rate on the one hand, and the advantage of large storage space of solid state hard disk and mechanical hard disk on the other hand, which helps to improve the efficiency of the cache system (access speed). And meet the needs of large data scale.
图4为本申请实施例响应CPU进程请求一实施例的流程图。该实施例中,以缓存数据读取请求为文件读取请求为例进行说明。如图4所示,该实施例包括:4 is a flow chart of an embodiment of a response to a CPU process request according to an embodiment of the present application. In this embodiment, a cache data read request is taken as a file read request as an example. As shown in FIG. 4, this embodiment includes:
在步骤S410,响应于接收到由CPU进程发起的文件读取请求,确定所要读取的文件是否在RAM里,如果是,则处理进行到步骤S412,向CPU进程返回所要读取的文件;如果否,则处理进行到步骤S414。In step S410, in response to receiving the file read request initiated by the CPU process, determining whether the file to be read is in the RAM, and if so, the process proceeds to step S412 to return the file to be read to the CPU process; Otherwise, the process proceeds to step S414.
在步骤S414,确定所要读取的文件是否在固态硬盘里,如果是,则处理进行到步骤S418;如果否,则处理进行到步骤S416。In step S414, it is determined whether the file to be read is in the solid state hard disk, and if so, the process proceeds to step S418; if not, the process proceeds to step S416.
在步骤S416,确定所要读取的文件是否在机械硬盘里,如果是,则处理进行到步骤S418;如果否,则处理进行到步骤S420。In step S416, it is determined whether the file to be read is in the mechanical hard disk, and if so, the process proceeds to step S418; if not, the process proceeds to step S420.
在一个可选示例中,步骤S410~S416可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一查找模块20或者其中的第二查找子模块24执行。In an alternative example, steps S410-S416 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a first lookup module 20 or a second look-up sub-module 24 executed by the processor.
在步骤S418,从硬盘读取所要读取的文件,将其缓存到RAM和GUP显存里,然后向CPU进程返回所要读取的文件。At step S418, the file to be read is read from the hard disk, cached into the RAM and the GUP memory, and then the file to be read is returned to the CPU process.
之后,不执行本实施例的后续流程。Thereafter, the subsequent flow of this embodiment is not performed.
在一个可选示例中,步骤S418可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第一缓存模块40执行。In an alternative example, step S418 may be performed by the processor invoking a corresponding instruction stored in the memory, or may be performed by the first cache module 40 being executed by the processor.
在步骤S420,从分布式文件***读取文件。之后,处理进行到步骤S422。At step S420, the file is read from the distributed file system. Thereafter, the process proceeds to step S422.
在步骤S422,将文件缓存在内存、固态硬盘和/或机械硬盘里,然后向CPU进程返回文件。In step S422, the file is cached in a memory, a solid state hard disk, and/or a mechanical hard disk, and then the file is returned to the CPU process.
在一个可选示例中,步骤S420~S422可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二查找和返回模块50执行。In an alternative example, steps S420-S422 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second lookup and return module 50 executed by the processor.
本实施例同样使用动态缓存机制,能够支持大规模深度学习***要求的海量数据规模。This embodiment also uses a dynamic caching mechanism to support the massive data size required by large-scale deep learning systems.
在本申请数据处理方法的再一实施例中,还可以包括:如果未查找到缓存有上述相应缓存数据的缓存元件,则将该相应缓存数据新增缓存到内存。 In still another embodiment of the data processing method of the present application, the method further includes: if the cache element having the corresponding cached data is not found, the corresponding cached data is newly cached into the memory.
在本申请数据处理方法的进一步实施例中,还可以包括:响应于新增缓存到内存的相应缓存数据的数据构成预定大小的文件块,创建新的线程将该文件块写入硬盘的缓存地址空间。In a further embodiment of the data processing method of the present application, the method may further include: in response to newly adding data of the corresponding cache data buffered to the memory, forming a file block of a predetermined size, creating a new thread to write the file block to a cache address of the hard disk. space.
其中,在一个可选实施例方式中,上述硬盘包括固态硬盘。相应的,响应于新增缓存到内存的相应缓存数据的数据构成预定大小的文件块,创建新的线程将该文件块写入硬盘,可以包括:响应于新增缓存到内存的相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将上述文件块写入固态硬盘的缓存空间。Wherein, in an optional embodiment, the hard disk comprises a solid state hard disk. Correspondingly, in response to the newly added data of the corresponding cached data buffered to the memory constituting the file block of a predetermined size, creating a new thread to write the file block to the hard disk may include: responding to the corresponding cached data newly added to the memory. The data constitutes a file block of a predetermined size, and a new first thread is created to write the above file block into the cache space of the solid state hard disk.
在另一个可选实施例方式中,上述硬盘包括固态硬盘和机械硬盘。相应的,响应于新增缓存到内存的相应缓存数据的数据构成预定大小的文件块,创建新的线程将该文件块写入硬盘的缓存空间,可以包括:响应于新增缓存到内存的相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将该文件块写入固态硬盘的缓存空间,并创建第二线程将写入到上述固态硬盘的文件块写入机械硬盘的缓存空间。In another alternative embodiment, the hard disk includes a solid state hard disk and a mechanical hard disk. Correspondingly, in response to the newly added data of the cached data buffered to the memory constituting the file block of a predetermined size, creating a new thread to write the file block to the cache space of the hard disk may include: responding to the corresponding new cache to the memory The data of the cached data constitutes a file block of a predetermined size, a new first thread is created to write the file block to the cache space of the SSD, and a second thread is created to write the file block written to the SSD to the cache of the mechanical hard disk. space.
图5为本申请实施例响应文件缓存请求一实施例的流程图,即图3中所示的步骤S324和图4中所示的步骤S420的示例性流程图。如图5所示,该实施例包括:FIG. 5 is a flowchart of an embodiment of a response file cache request according to an embodiment of the present application, that is, an exemplary flowchart of step S324 shown in FIG. 3 and step S420 shown in FIG. 4. As shown in FIG. 5, this embodiment includes:
在步骤S510,当缓存未命中即在上述数据缓存***中未查找到所要读取的文件时,将所要读取的文件从分布式文件***缓存到内存的缓冲器。In step S510, when the cache miss occurs, the file to be read is not found in the data cache system, and the file to be read is cached from the distributed file system to the buffer of the memory.
在一个可选示例中,步骤S510可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的第二查找和返回模块50执行。In an alternative example, step S510 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a second lookup and return module 50 executed by the processor.
在步骤S520,当上述缓冲器中的数据累计构成一个块文件时,创建新的子线程将上述块文件写入固态硬盘的缓存地址空间。In step S520, when the data in the buffer is accumulated to constitute a block file, a new sub-thread is created to write the block file into the cache address space of the SSD.
在一个可选示例中,步骤S520可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的写入模块70或其中的第一写入子模块72执行。In an alternative example, step S520 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a write module 70 executed by the processor or a first write sub-module 72 therein.
在其它可选实施例中,在固态硬盘和机械硬盘同时存在时,还包括:In other optional embodiments, when both the solid state hard disk and the mechanical hard disk are present, the method further includes:
步骤S530,创建单独的子线程将固态硬盘的缓存数据写入机械硬盘的缓存地址空间。Step S530, creating a separate sub-thread to write the cache data of the SSD to the cache address space of the mechanical hard disk.
在一个可选示例中,步骤S520可以由处理器调用存储器存储的相应指令执行,也可以由被处理器运行的写入模块70或其中的第二写入子模块74执行。In an alternative example, step S520 may be performed by a processor invoking a corresponding instruction stored in a memory, or may be performed by a write module 70 or a second write sub-module 74 executed by the processor.
因为数据缓存***的缓存写入涉及多种存储资源,为了使数据缓存***具有高速的写入速度,本申请实施例针对多数深度学习***中训练数据由小文件(几KB到几MB不等)组成的特点,提出了图5所示的多线程异步的缓存写入方法。其中的多线程是指由不同的线程实现不同类型缓存空间的写入,异步是指多个线程的写入操作相互不同步。Because the cache write of the data cache system involves a variety of storage resources, in order to make the data cache system have a high speed of writing speed, the embodiment of the present application is directed to a small file (a few KB to several MB) for training data in most deep learning systems. The characteristics of the composition, the multi-threaded asynchronous cache writing method shown in Figure 5 is proposed. The multi-threading refers to the writing of different types of cache spaces by different threads, and the asynchronous means that the writing operations of multiple threads are not synchronized with each other.
缓存小文件遇到的一个问题是写入固态硬盘和机械硬盘的速度慢,因为,这两种设备的随机访问(写小文件)速度通常远没有顺序访问(写大文件)速度快。为了解决这一问题,本申请实施例利用内存缓存地址空间的一部分作为一个内存缓冲器(buffer)。要缓存的文件先被存入内存缓冲器。一旦缓冲器中的数据累计构成了一个块文件(块文件的大小可根据硬件***调整),该块文件则由一个新创建的子线程写入固态硬盘缓存地址空间。也就是,一个主线程负责响应文件缓存请求,并在需要的时候创建固态硬盘写入线程,如图5所示。值得注意的是,我们并不把缓冲器里的块文件直接写入机械硬盘缓存地址空间。这是因为,如果采用这种方式,机械硬盘的写入速度将决定内存缓冲器被清空的时间。因为这个速度比固态硬盘的写入速度慢,这将使得内存缓冲器被清空的时间变长,进而减慢缓存写入的速度。本申请实施例提出由一个单独的线程专门负责将固态硬盘的缓存数据写入机械硬盘缓存地址空间,如图5所示。用这种 方法,缓存***的写入速度主要由内存的写入速度和固态硬盘的写入速度决定,而不受到机械硬盘的写入速度影响。One problem encountered with caching small files is that writing to SSDs and mechanical drives is slow, because the random access (write small files) speed of these two devices is usually much faster than sequential access (writing large files). In order to solve this problem, the embodiment of the present application utilizes a portion of the memory cache address space as a memory buffer. The files to be cached are first stored in the memory buffer. Once the data in the buffer has accumulated to form a block file (the size of the block file can be adjusted according to the hardware system), the block file is written to the SSD cache address space by a newly created sub-thread. That is, a main thread is responsible for responding to file cache requests and creating a SSD write thread when needed, as shown in Figure 5. It is worth noting that we do not write the block files in the buffer directly into the mechanical hard disk cache address space. This is because, in this way, the write speed of the mechanical hard disk will determine when the memory buffer is emptied. Because this speed is slower than the write speed of the SSD, this will make the memory buffer clear longer, which slows down the cache write. The embodiment of the present application proposes that a separate thread is specifically responsible for writing the cache data of the solid state hard disk into the mechanical hard disk cache address space, as shown in FIG. 5 . Use this In this way, the write speed of the cache system is mainly determined by the write speed of the memory and the write speed of the solid state hard disk, without being affected by the write speed of the mechanical hard disk.
如上上述,数据缓存***一旦查找不到被请求的缓存数据,它会从分布式文件***读取相应缓存数据,并将该文件***缓存***(内存、固态硬盘和机械硬盘的缓存空间里)。在本申请数据处理方法的又一个实施例中,在任一缓存元件的缓存空间被写满时,可以根据预定缓存空间释放策略,删除或替换缓存空间被写满的缓存元件的缓存空间原有的缓存数据。即:如果缓存元件的缓存空间已满,可以根据缓存替换策略,例如最少使用频率(Least Frequently Used等策略,删除一个文件为新的文件留出空间。数据缓存***支持任何缓存空间释放策略,具体可以根据应用需求来选择。本实施例中将文件缓存在多个缓存空间(内存、固态硬盘和机械硬盘),可以利用多种存储设备各自的优势来提高缓存效率,这与上述实施例将文件缓存在两个缓存空间里(GPU显存和RAM)的原因相同,此处不再赘述。As mentioned above, once the data cache system cannot find the requested cache data, it reads the corresponding cache data from the distributed file system and inserts the file into the cache system (memory space of the memory, SSD and mechanical hard disk). In still another embodiment of the data processing method of the present application, when the cache space of any of the cache elements is full, the cache space of the cache component whose cache space is full may be deleted or replaced according to a predetermined cache space release policy. Cache data. That is: if the cache space of the cache component is full, you can set a space for the new file according to the cache replacement policy, such as the least used frequency (Least Frequently Used, etc.). The data cache system supports any cache space release policy. It can be selected according to the application requirements. In this embodiment, the files are cached in multiple cache spaces (memory, solid state hard disk and mechanical hard disk), and the advantages of various storage devices can be utilized to improve the cache efficiency, which is the same as the above embodiment. The reason for caching in two cache spaces (GPU memory and RAM) is the same, so I won't go into details here.
本申请实施例提供的任一种数据处理方法可以由任意适当的具有数据处理能力的设备执行,包括但不限于:终端设备和服务器等。或者,本申请实施例提供的任一种数据处理方法可以由处理器执行,如处理器通过调用存储器存储的相应指令来执行本申请实施例提及的任一种数据处理方法。下文不再赘述。Any of the data processing methods provided by the embodiments of the present application may be performed by any suitable device having data processing capabilities, including but not limited to: a terminal device, a server, and the like. Alternatively, any data processing method provided by the embodiment of the present application may be executed by a processor, such as the processor, by executing a corresponding instruction stored in the memory to execute any one of the data processing methods mentioned in the embodiments of the present application. This will not be repeated below.
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。A person skilled in the art can understand that all or part of the steps of implementing the above method embodiments may be completed by using hardware related to the program instructions. The foregoing program may be stored in a computer readable storage medium, and the program is executed when executed. The foregoing steps include the steps of the foregoing method embodiments; and the foregoing storage medium includes: a medium that can store program codes, such as a ROM, a RAM, a magnetic disk, or an optical disk.
图6为本申请数据处理***一实施例结构示意图。本申请各实施例的数据处理***可选择性地用于实现本申请上述各数据数理方法实施例。如图6所示,该实施例的数据处理***包括:接收模块10、第一查找模块20和第一返回模块30。其中:FIG. 6 is a schematic structural diagram of an embodiment of a data processing system according to the present application. The data processing system of the embodiments of the present application can be selectively used to implement the foregoing embodiments of the data mathematical methods of the present application. As shown in FIG. 6, the data processing system of this embodiment includes a receiving module 10, a first lookup module 20, and a first return module 30. among them:
接收模块10,用于接收由至少一处理器进程(如至少一GPU进程和/或至少一CPU进程)向数据缓存***发起的缓存数据读取请求,该数据缓存***包括多种缓存元件,多种缓存元件的传输速率和/或存储空间不同、且多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级。上述多种缓存元件例如可以包括但不限于以下至少二种:GPU显存、内存和硬盘,其中,上述GPU显存的查找优先级高于内存的查找优先级,内存的查找优先级高于硬盘的查找优先级。硬盘又可以包括固态硬盘和机械硬盘,其中,固态硬盘的查找优先级高于机械硬盘的查找优先级。The receiving module 10 is configured to receive a cache data read request initiated by the at least one processor process (such as at least one GPU process and/or at least one CPU process) to the data cache system, where the data cache system includes multiple cache components, and more The transmission rate and/or storage space of the cache element are different, and the plurality of cache elements are preset with different lookup priorities according to the transmission rate and/or the storage space. The foregoing multiple cache components may include, but are not limited to, at least two of the following: GPU memory, memory, and hard disk, wherein the GPU memory has a higher lookup priority than a memory lookup priority, and the memory lookup priority is higher than the hard disk search. priority. The hard disk may further include a solid state hard disk and a mechanical hard disk, wherein the solid state hard disk has a higher priority than the mechanical hard disk.
第一查找模块20,用于响应于接收到缓存数据读取请求,从发起上述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照上述查找优先级从高到低的顺序依次向上述多种缓存元件查找相应缓存数据。The first searching module 20 is configured to, in response to receiving the cache data read request, start from the cache element of the processor corresponding to the process that initiates the cache data read request, and sequentially perform the search priority from high to low. The above various cache elements look up the corresponding cache data.
第一返回模块30,用于响应于第一查找模块20查找到缓存有上述相应缓存数据的缓存元件,从查找到的缓存元件处向上述对应的处理器进程返回上述相应缓存数据。在本申请数据处理***各实施例的一个可选实施方式中,第一查找模块20可以包括:第一查找子模块22,用于响应于接收到由GPU进程发起的缓存数据读取请求,按照上述查找优先级从高到低的顺序依次向GPU显存、内存和硬盘查找相应缓存数据;和/或,第二查找子模块24,用于响应于接收到由CPU进程发起的缓存数据读取请求,按照上述查找优先级从高到低的顺序依次向内存和硬盘查找相应缓存数据。如图7所示,为本申请数据处理***另一实施例结构示意图。 The first returning module 30 is configured to, in response to the first searching module 20 searching for the cache element buffered with the corresponding cache data, returning the corresponding cache data from the found cache component to the corresponding processor process. In an optional implementation manner of the embodiments of the data processing system of the present application, the first searching module 20 may include: a first searching sub-module 22, configured to respond to receiving a cache data read request initiated by the GPU process, according to The above search priority sequentially searches the GPU memory, the memory and the hard disk for the corresponding cache data; and/or the second search submodule 24 is configured to respond to the cache data read request initiated by the CPU process. According to the above search priority, the cached data is searched for in memory and hard disk in descending order. FIG. 7 is a schematic structural diagram of another embodiment of a data processing system according to the present application.
可选地,再参见图图,本发明各实施例的数据处理***还可以包括:第一缓存模块40,用于如果查找到缓存有上述相应缓存数据的缓存元件为硬盘,则将上述相应缓存数据缓存到GPU显存和/或内存中。Optionally, referring to the figure, the data processing system of the embodiments of the present invention may further include: a first cache module 40, configured to: if the cache component that caches the corresponding cache data is a hard disk, the corresponding cache Data is cached into GPU memory and/or memory.
可选地,再参见图图,本发明各实施例的数据处理***还可以包括:第二查找和返回模块50,用于响应于未查找到缓存有上述相应缓存数据的缓存元件,向与上述数据缓存***通信连接的分布式文件***查找上述相应缓存数据、并将上述相应缓存数据向对应的处理器进程返回。Optionally, referring to the figure, the data processing system of the embodiments of the present invention may further include: a second lookup and return module 50, configured to respond to the cache element that does not find the cached data, and The distributed file system of the data cache system communication connection searches for the corresponding cache data and returns the corresponding cache data to the corresponding processor process.
可选地,再参见图图,本发明各实施例的数据处理***还可以包括:第二缓存模块60,用于如果未查找到缓存有上述相应缓存数据的缓存元件,则将上述相应缓存数据新增缓存到内存。Optionally, referring to the figure, the data processing system of the embodiments of the present invention may further include: a second cache module 60, configured to: if the cache element having the corresponding cached data is not found, the corresponding cached data is Added cache to memory.
可选地,再参见图图,本发明各实施例的数据处理***还可以包括:写入模块70,用于响应于新增缓存到内存的上述相应缓存数据的数据构成预定大小的文件块,创建新的线程将上述文件块写入硬盘的缓存地址空间。Optionally, referring to the figure, the data processing system of the embodiments of the present invention may further include: a writing module 70, configured to form a file block of a predetermined size in response to the data of the corresponding cached data newly added to the memory. Create a new thread to write the above file block to the cache address space of the hard disk.
在其中一个可选示例中,写入模块70可以包括:第一写入子模块72,用于在硬盘为固态硬盘时,响应新增缓存到内存的上述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将上述文件块写入固态硬盘的缓存空间;In an optional example, the writing module 70 may include: a first writing sub-module 72, configured to form a file of a predetermined size in response to newly added data of the corresponding cached data buffered to the memory when the hard disk is a solid state disk. Block, create a new first thread to write the above file block to the cache space of the SSD;
第二写入子模块74,用于在上述硬盘包括固态硬盘和机械硬盘时,响应于新增缓存到上述内存的上述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将上述文件块写入上述固态硬盘的缓存空间,并创建第二线程将写入到上述固态硬盘的上述文件块写入上述机械硬盘的缓存空间。The second writing sub-module 74 is configured to, when the hard disk comprises a solid state hard disk and a mechanical hard disk, form a file block of a predetermined size in response to the data of the corresponding cache data newly added to the memory, and create a new first thread. The file block is written into the cache space of the SSD, and a second thread is created to write the file block written to the SSD to the cache space of the mechanical hard disk.
释放模块80,用于在任一缓存元件的缓存空间被写满时,根据预定缓存空间释放策略,删除或替换缓存空间被写满的缓存元件的缓存空间原有的缓存数据。The release module 80 is configured to delete or replace the original cache data of the cache space of the cache component whose cache space is full according to the predetermined cache space release policy when the cache space of any of the cache elements is full.
本领域技术人员可以清楚地了解到,为描述的方便和简洁,上面描述的设备和模块的具体工作过程,可以参考前述方法实施例中的对应过程描述,在此不再赘述。A person skilled in the art can clearly understand that for the convenience and brevity of the description, the specific working process of the device and the module described above can be referred to the corresponding process description in the foregoing method embodiments, and details are not described herein again.
本申请实施例还提供了一种电子设备,包括本申请上述任一实施例的数据处理***。The embodiment of the present application further provides an electronic device, including the data processing system of any of the foregoing embodiments of the present application.
本申请实施例还提供了另一种电子设备,包括:The embodiment of the present application further provides another electronic device, including:
处理器和本申请上述任一实施例的数据处理***;A processor and a data processing system of any of the above embodiments of the present application;
在处理器运行上述数据处理***时,本申请上述任一实施例的数据处理***中的单元被运行。When the processor is running the data processing system described above, the units in the data processing system of any of the above embodiments of the present application are executed.
本申请实施例还提供了又一种电子设备,包括:一个或多个处理器、存储器、多种缓存元件、通信部件和通信总线,上述处理器、上述存储器、上述多种缓存单元和上述通信部件通过上述通信总线完成相互间的通信,上述多种缓存元件的传输速率和/或存储空间不同、且上述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;The embodiment of the present application further provides another electronic device, including: one or more processors, a memory, a plurality of cache components, a communication component, and a communication bus, the processor, the memory, the plurality of cache units, and the foregoing communication The components complete communication with each other through the communication bus, the transmission rates and/or storage spaces of the plurality of cache components are different, and the plurality of cache components are preset with different lookup priorities according to the transmission rate and/or the storage space;
上述存储器用于存放至少一可执行指令,上述可执行指令使上述处理器执行如本申请上述任一实施例的数据处理方法对应的操作。The memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform an operation corresponding to the data processing method of any of the above embodiments of the present application.
图9为本申请电子设备一个实施例的结构示意图。如图9所示,该实施例的电子设备包括:处理器902、通信部件904、存储器906、GPU显存908、内存910以及通信总线912。存储器906可包括机械硬盘和/或固态硬盘。通信部件可包括但不限于输入/输出(I/O)接口、网卡等。FIG. 9 is a schematic structural diagram of an embodiment of an electronic device according to the present application. As shown in FIG. 9, the electronic device of this embodiment includes a processor 902, a communication component 904, a memory 906, a GPU memory 908, a memory 910, and a communication bus 912. Memory 906 can include a mechanical hard disk and/or a solid state hard disk. Communication components may include, but are not limited to, input/output (I/O) interfaces, network cards, and the like.
处理器902、通信部件904、存储器906、GPU显存908以及内存910通过通信总线912完成相互间的通信。 The processor 902, the communication component 904, the memory 906, the GPU memory 908, and the memory 910 communicate with each other via the communication bus 912.
通信部件904,用于与其它设备比如客户端或数据采集设备等的网元通信。The communication component 904 is configured to communicate with network elements of other devices, such as a client or a data collection device.
处理器902,用于执行程序920,具体可以执行上述方法实施例中的相关步骤。The processor 902 is configured to execute the program 920. Specifically, the related steps in the foregoing method embodiments may be performed.
具体地,程序可以包括程序代码,该程序代码包括计算机操作指令。In particular, the program can include program code, the program code including computer operating instructions.
上述处理器902可以一个或多个,处理器的设备形态可以是CPU,或者是GPU,或者是特定集成电路(Application Specific Integrated Circuit,ASIC),或者是被配置成实施本申请实施例的一个或多个集成电路等。The processor 902 may be one or more, and the device may be a CPU, or a GPU, or an Application Specific Integrated Circuit (ASIC), or may be configured to implement one or the embodiment of the present application. Multiple integrated circuits, etc.
存储器906,用于存放程序920。存储器906可能包含高速RAM存储器,也可能还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。The memory 906 is configured to store the program 920. Memory 906 may include high speed RAM memory and may also include non-volatile memory, such as at least one disk memory.
程序920包括至少一条可执行指令,具体可以用于使得处理器902执行以下操作:接收由至少一处理器进程向数据缓存***发起的缓存数据读取请求,上述数据缓存***包括多种缓存元件,且上述多种缓存元件的传输速率和/或存储空间不同、并根据各自的传输速率和/或存储空间被预先设置有不同的查找优先级;响应接收到的缓存数据读取请求,从发起上述缓存数据读取请求的进程对应的处理器的缓存元件开始按照上述查找优先级从高到低的顺序依次向上述多种缓存元件查找相应缓存数据;响应查找到缓存有上述相应缓存数据的缓存元件,从查找到的缓存元件处向相应处理器进程返回上述相应缓存数据。The program 920 includes at least one executable instruction, and may be specifically configured to cause the processor 902 to: receive a cache data read request initiated by the at least one processor process to the data cache system, the data cache system including a plurality of cache elements, And the foregoing multiple cache elements have different transmission rates and/or storage spaces, and are preset with different lookup priorities according to respective transmission rates and/or storage spaces; in response to the received cache data read request, the above is initiated. The cache element of the processor corresponding to the process of buffering the data read request starts to search for the corresponding cache data in the order of the above-mentioned lookup priorities from high to low, and sequentially responds to the cache element that caches the corresponding cached data. Returning the corresponding cached data from the found cache component to the corresponding processor process.
程序920中各步骤的具体实现可以参见上述实施例中的相应步骤和单元中对应的描述,在此不赘述。For a specific implementation of the steps in the program 920, reference may be made to corresponding descriptions in the corresponding steps and units in the foregoing embodiments, and details are not described herein.
图9为本申请电子设备另一个实施例的结构示意图。下面参考图9,其示出了适于用来实现本申请实施例的终端设备或服务器的电子设备的结构示意图。如图9所示,该电子设备包括一个或多个处理器、通信部等,所述一个或多个处理器例如:一个或多个CPU901,和/或一个或多个GPU913等,处理器可以根据存储在只读存储器(ROM)902中的可执行指令或者从存储部分908加载到随机访问存储器(RAM)903中的可执行指令而执行各种适当的动作和处理。通信部912可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,处理器可与只读存储器902和/或随机访问存储器903中通信以执行可执行指令,通过总线904与通信部912相连、并经通信部912与其他目标设备通信,从而完成本申请实施例提供的任一数据处理方法对应的操作,例如,接收由至少一处理器进程向数据缓存***发起的缓存数据读取请求;所述数据缓存***包括多种缓存元件,所述多种缓存元件的传输速率和/或存储空间不同、且所述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据;响应于查找到缓存有所述相应缓存数据的缓存元件,从查找到的缓存元件向所述对应的处理器进程返回所述相应缓存数据。FIG. 9 is a schematic structural diagram of another embodiment of an electronic device according to the present application. Referring to FIG. 9, there is shown a schematic structural diagram of an electronic device suitable for implementing a terminal device or a server of an embodiment of the present application. As shown in FIG. 9, the electronic device includes one or more processors, a communication unit, etc., the one or more processors such as one or more CPUs 901, and/or one or more GPUs 913, etc., the processor may Various appropriate actions and processes are performed in accordance with executable instructions stored in read only memory (ROM) 902 or executable instructions loaded into random access memory (RAM) 903 from storage portion 908. Communication portion 912 can include, but is not limited to, a network card, which can include, but is not limited to, an IB (Infiniband) network card, and the processor can communicate with read only memory 902 and/or random access memory 903 to execute executable instructions over bus 904. The operation is performed by the communication unit 912, and communicates with other target devices via the communication unit 912, thereby performing operations corresponding to any data processing method provided by the embodiment of the present application, for example, receiving a cache initiated by at least one processor process to the data cache system. a data read request; the data cache system includes a plurality of cache elements, the transfer rates and/or storage spaces of the plurality of cache elements are different, and the plurality of cache elements are preset according to a transfer rate and/or a storage space Having different lookup priorities; in response to receiving the cache data read request, starting from a cache element of a processor corresponding to the process initiating the cache data read request, according to the lookup priority from high to low Having sequentially searched the plurality of cache elements for respective cache data; in response to finding the cache, the corresponding cache data is buffered Element, returns to the corresponding cache data from the processor corresponding to the process to find the cache element.
此外,在RAM 903中,还可存储有装置操作所需的各种程序和数据。CPU901、ROM902以及RAM903通过总线904彼此相连。在有RAM903的情况下,ROM902为可选模块。RAM903存储可执行指令,或在运行时向ROM902中写入可执行指令,可执行指令使处理器901执行上述数据处理方法对应的操作。输入/输出(I/O)接口905也连接至总线904。通信部912可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。Further, in the RAM 903, various programs and data required for the operation of the device can be stored. The CPU 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. In the case of RAM 903, ROM 902 is an optional module. The RAM 903 stores executable instructions or writes executable instructions to the ROM 902 at runtime, the executable instructions causing the processor 901 to perform operations corresponding to the data processing methods described above. An input/output (I/O) interface 905 is also coupled to bus 904. The communication unit 912 may be integrated or may be provided with a plurality of sub-modules (for example, a plurality of IB network cards) and on the bus link.
以下部件连接至I/O接口905:包括键盘、鼠标等的输入部分906;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分907;包括硬盘等的存储部分908; 以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分909。通信部分909经由诸如因特网的网络执行通信处理。驱动器910也根据需要连接至I/O接口905。可拆卸介质911,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器910上,以便于从其上读出的计算机程序根据需要被安装入存储部分908。The following components are connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, etc.; an output portion 907 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a storage portion 908 including a hard disk or the like. ; And a communication portion 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the Internet. Driver 910 is also connected to I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like is mounted on the drive 910 as needed so that a computer program read therefrom is installed into the storage portion 908 as needed.
需要说明的,如图9所示的架构仅为一种可选实现方式,在具体实践过程中,可根据实际需要对上述图9的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信部可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本申请公开的保护范围。It should be noted that the architecture shown in FIG. 9 is only an optional implementation manner. In a specific implementation process, the number and type of components in FIG. 9 may be selected, deleted, added, or replaced according to actual needs; Different function components can also be implemented in separate settings or integrated settings, such as GPU and CPU detachable settings or GPU can be integrated on the CPU, the communication part can be separated, or integrated on the CPU or GPU. and many more. These alternative embodiments are all within the scope of protection disclosed herein.
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,接收由至少一处理器进程向数据缓存***发起的缓存数据读取请求的指令;所述数据缓存***包括多种缓存元件,所述多种缓存元件的传输速率和/或存储空间不同、且所述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据的指令;响应于查找到缓存有所述相应缓存数据的缓存元件,从查找到的缓存元件向所述对应的处理器进程返回所述相应缓存数据的指令。In particular, the processes described above with reference to the flowcharts may be implemented as a computer software program in accordance with an embodiment of the present disclosure. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine readable medium, the computer program comprising program code for executing the method illustrated in the flowchart, the program code comprising Executing instructions corresponding to the method steps provided by the embodiments of the present application, for example, receiving an instruction of a cache data read request initiated by at least one processor process to a data cache system; the data cache system includes multiple cache elements, The transmission rate and/or the storage space of the cache element are different, and the plurality of cache elements are preset with different lookup priorities according to the transmission rate and/or the storage space; in response to receiving the cache data read request, Starting from the cache element of the processor corresponding to the process of initiating the cache data read request, sequentially querying the plurality of cache elements for the corresponding cache data according to the lookup priority from the highest priority to the lower priority; Caching the cached element with the corresponding cached data from the cached element to the Processor should process a return instruction corresponding to the cache data.
另外,本申请实施例还提供了一种计算机程序,包括计算机可读代码,当该计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现本申请任一实施例的数据处理方法中各步骤的指令。In addition, the embodiment of the present application further provides a computer program, including computer readable code, when the computer readable code is run on a device, the processor in the device executes to implement any of the embodiments of the present application. Instructions for each step in the data processing method.
另外,本申请实施例还提供了一种计算机可读存储介质,用于存储计算机可读取的指令,该指令被执行时实现本申请任一实施例的数据处理方法中各步骤的操作。In addition, the embodiment of the present application further provides a computer readable storage medium for storing computer readable instructions, which when executed, implement the operations of the steps in the data processing method of any embodiment of the present application.
本说明书中各个实施例均采用递进的方式描述,每个实施例重点说明的都是与其它实施例的不同之处,各个实施例之间相同或相似的部分相互参见即可。对于***实施例而言,由于其与方法实施例基本对应,所以描述的比较简单,相关之处参见方法实施例的部分说明即可。The various embodiments in the specification are described in a progressive manner, and each embodiment focuses on differences from other embodiments, and the same or similar parts between the various embodiments may be referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method embodiment.
除非明确指出,在此所用的单数形式“一”、“该”均包括复数含义(即具有“至少一”的意思)。应当进一步理解,说明书中使用的术语“具有”、“包括”和/或“包含”表明存在所述的特征、步骤、操作、元件和/或部件,但不排除存在或增加一个或多个其他特征、步骤、操作、元件、部件和/或其组合。如在此所用的术语“和/或”包括一个或多个列举的相关项目的任何及所有组合。除非明确指出,在此公开的任何方法的步骤不必精确按照所公开的顺序执行。The singular forms "a", "the", and "the" It is to be understood that the terms "comprising", "comprising", and" Features, steps, operations, elements, components, and/or combinations thereof. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. The steps of any method disclosed herein do not have to be performed exactly in the order disclosed.
一些可选实施例已经在前面进行了说明,但是应当强调的是,本申请不局限于这些实施例,而是可以本申请主题范围内的其它方式实现。Some alternative embodiments have been described above, but it should be emphasized that the present application is not limited to the embodiments, but may be implemented in other ways within the scope of the subject matter of the present application.
需要指出,根据实施的需要,可将本申请实施例中描述的各个部件/步骤拆分为更多部件/步骤,也可将两个或多个部件/步骤或者部件/步骤的部分操作组合成新的部件/步骤,以实现本申请实施例的目的。It should be noted that the various components/steps described in the embodiments of the present application may be split into more components/steps according to the needs of the implementation, or two or more components/steps or partial operations of the components/steps may be combined into one. New components/steps to achieve the objectives of the embodiments of the present application.
上述根据本申请实施例的方法可在硬件、固件中实现,或者被实现为可存储在记录介质(诸如CD ROM、RAM、软盘、硬盘或磁光盘)中的软件或计算机代码,或者被实现为通过网络 下载的、原始存储在远程记录介质或非暂时机器可读介质中并将被存储在本地记录介质中的计算机代码,从而在此描述的方法可被存储在使用通用计算机、专用处理器或者可编程或专用硬件(诸如ASIC或FPGA)的记录介质上的这样的软件处理。可以理解,计算机、处理器、微处理器控制器或可编程硬件包括可存储或接收软件或计算机代码的存储组件(例如,RAM、ROM、闪存等),当所述软件或计算机代码被计算机、处理器或硬件访问且执行时,实现在此描述的处理方法。此外,当通用计算机访问用于实现在此示出的处理的代码时,代码的执行将通用计算机转换为用于执行在此示出的处理的专用计算机。The above method according to an embodiment of the present application may be implemented in hardware, firmware, or as software or computer code storable in a recording medium such as a CD ROM, a RAM, a floppy disk, a hard disk, or a magneto-optical disk, or as via Internet A computer code that is stored in a remote recording medium or a non-transitory machine readable medium and that is stored in a local recording medium, such that the methods described herein can be stored using a general purpose computer, a dedicated processor, or programmable. Such software processing on a recording medium of dedicated hardware such as an ASIC or an FPGA. It will be understood that a computer, processor, microprocessor controller or programmable hardware includes storage components (eg, RAM, ROM, flash memory, etc.) that can store or receive software or computer code, when the software or computer code is The processing methods described herein are implemented when the processor or hardware is accessed and executed. Moreover, when a general purpose computer accesses code for implementing the processing shown herein, the execution of the code converts the general purpose computer into a special purpose computer for performing the processing shown herein.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及方法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法实现所描述的功能,但是这种实现不应认为超出本申请实施例的范围。Those of ordinary skill in the art will appreciate that the elements and method steps of the various examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the solution. A person skilled in the art can implement the described functions using different methods for each particular application, but such implementation should not be considered to be beyond the scope of the embodiments of the present application.
以上实施方式仅用于说明本申请实施例,而并非对本申请实施例的限制,有关技术领域的普通技术人员,在不脱离本申请实施例的精神和范围的情况下,还可以做出各种变化和变型,因此所有等同的技术方案也属于本申请实施例的范畴,本申请实施例的专利保护范围应由权利要求限定。 The above embodiments are only used to describe the embodiments of the present application, and are not intended to limit the embodiments of the present application. Those skilled in the relevant art can make various kinds without departing from the spirit and scope of the embodiments of the present application. Variations and modifications, therefore, all equivalent technical solutions are also within the scope of the embodiments of the present application, and the scope of patent protection of the embodiments of the present application should be defined by the claims.

Claims (31)

  1. 一种数据处理方法,其特征在于,包括:A data processing method, comprising:
    接收由至少一处理器进程向数据缓存***发起的缓存数据读取请求;所述数据缓存***包括多种缓存元件,所述多种缓存元件的传输速率和/或存储空间不同、且所述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;Receiving a cache data read request initiated by the at least one processor process to the data cache system; the data cache system includes a plurality of cache elements, the transfer rate and/or storage space of the plurality of cache elements being different, and the plurality of The cache element is preset with different lookup priorities according to the transmission rate and/or the storage space;
    响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据;Responding to receiving the cache data read request, starting from a cache element of a processor corresponding to a process that initiates the cache data read request, sequentially performing the plurality of lookup priorities in descending order The cache component looks up the corresponding cache data;
    响应于查找到缓存有所述相应缓存数据的缓存元件,从查找到的缓存元件向所述对应的处理器进程返回所述相应缓存数据。In response to finding a cache element that caches the corresponding cache data, the corresponding cache data is returned from the found cache element to the corresponding processor process.
  2. 根据权利要求1所述的方法,其特征在于,还包括:The method of claim 1 further comprising:
    响应于未查找到缓存有所述相应缓存数据的缓存元件,向与所述数据缓存***通信连接的分布式文件***查找所述相应缓存数据、并将所述相应缓存数据向所述对应的处理器进程返回。Responding to a cache element that is cached with the corresponding cached data, looking up the corresponding cached data to a distributed file system communicatively coupled to the data cache system, and directing the corresponding cached data to the corresponding processing The process returns.
  3. 根据权利要求1或2所述的方法,其特征在于,所述多种缓存元件包括以下至少二种:图像处理器GPU显存、内存和硬盘;The method according to claim 1 or 2, wherein the plurality of cache elements comprise at least two of the following: an image processor GPU memory, a memory, and a hard disk;
    所述GPU显存的查找优先级高于所述内存的查找优先级,所述内存的查找优先级高于所述硬盘的查找优先级。The lookup priority of the GPU memory is higher than the lookup priority of the memory, and the memory has a higher lookup priority than the hard disk.
  4. 根据权利要求3所述的方法,其特征在于,所述硬盘包括固态硬盘和机械硬盘,所述固态硬盘的查找优先级高于所述机械硬盘的查找优先级。The method according to claim 3, wherein the hard disk comprises a solid state hard disk and a mechanical hard disk, and the solid state hard disk has a higher priority of searching than the mechanical hard disk.
  5. 根据权利要求1-4任一所述的方法,其特征在于,所述至少一处理器进程包括:至少一GPU进程,和/或,至少一中央处理器CPU进程。The method of any of claims 1-4, wherein the at least one processor process comprises: at least one GPU process, and/or at least one central processor CPU process.
  6. 根据权利要求5所述的方法,其特征在于,响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据,包括:The method according to claim 5, wherein, in response to receiving the cache data read request, starting from a cache element of a processor corresponding to a process initiating the cache data read request, according to the lookup priority Levels are searched for the cached data from the plurality of cache elements in descending order, including:
    响应于接收到由GPU进程发起的所述缓存数据读取请求,按照所述查找优先级从高到低的顺序依次向GPU显存、内存和硬盘查找所述相应缓存数据。In response to receiving the cache data read request initiated by the GPU process, the corresponding cache data is sequentially searched to the GPU memory, the memory, and the hard disk according to the lookup priority from high to low.
  7. 根据权利要求5或6所述的方法,其特征在于,响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据,包括:The method according to claim 5 or 6, wherein in response to receiving the cache data read request, starting from a cache element of a processor corresponding to a process initiating the cache data read request, according to the Finding the corresponding cache data to the plurality of cache components in order of priority from high to low, including:
    响应于接收到由CPU进程发起的所述缓存数据读取请求,按照所述查找优先级从高到低的顺序依次向内存和硬盘查找所述相应缓存数据。In response to receiving the cache data read request initiated by the CPU process, the corresponding cache data is sequentially searched to the memory and the hard disk according to the lookup priority from high to low.
  8. 根据权利要求3-7任一所述的方法,其特征在于,还包括:The method according to any one of claims 3-7, further comprising:
    如果查找到缓存有所述相应缓存数据的缓存元件为硬盘,则将所述相应缓存数据缓存到GPU显存和/或内存中。 If it is found that the cache element with the corresponding cache data is a hard disk, the corresponding cache data is cached into the GPU memory and/or the memory.
  9. 根据权利要求3-8任一所述的方法,其特征在于,还包括:The method according to any one of claims 3-8, further comprising:
    如果未查找到缓存有所述相应缓存数据的缓存元件,则将所述相应缓存数据新增缓存到所述内存。If the cache element with the corresponding cached data is not found, the corresponding cached data is newly cached into the memory.
  10. 根据权利要求9所述的方法,其特征在于,还包括:The method of claim 9 further comprising:
    响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的线程将所述文件块写入所述硬盘的缓存地址空间。In response to the newly added data of the corresponding cache data buffered to the memory constituting a file block of a predetermined size, a new thread is created to write the file block to the cache address space of the hard disk.
  11. 根据权利要求10所述的方法,其特征在于,所述硬盘包括固态硬盘;所述响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的线程将所述文件块写入所述硬盘,包括:The method according to claim 10, wherein the hard disk comprises a solid state hard disk; and the data in response to the newly added cache data buffered to the memory constitutes a file block of a predetermined size, creating a new thread Writing the file block to the hard disk includes:
    响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将所述文件块写入所述固态硬盘的缓存空间。In response to the newly added data of the corresponding cached data buffered to the memory constituting a file block of a predetermined size, a new first thread is created to write the file block to the cache space of the solid state hard disk.
  12. 根据权利要求10所述的方法,其特征在于,所述硬盘包括固态硬盘和机械硬盘;所述响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的线程将所述文件块写入所述硬盘的缓存空间,包括:The method according to claim 10, wherein the hard disk comprises a solid state hard disk and a mechanical hard disk; and the data in response to the newly added cached data buffered to the memory constitutes a file block of a predetermined size, created The new thread writes the file block to the cache space of the hard disk, including:
    响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将所述文件块写入所述固态硬盘的缓存空间,并创建第二线程将写入到所述固态硬盘的所述文件块写入所述机械硬盘的缓存空间。Responding to newly adding data of the corresponding cache data buffered to the memory to form a file block of a predetermined size, creating a new first thread to write the file block to a cache space of the solid state hard disk, and creating a second thread The file block written to the solid state hard disk is written to a cache space of the mechanical hard disk.
  13. 根据权利要求1-12任一所述的方法,其特征在于,还包括:The method of any of claims 1-12, further comprising:
    在任一缓存元件的缓存空间被写满时,根据预定缓存空间释放策略,删除或替换缓存空间被写满的缓存元件的缓存空间原有的缓存数据。When the buffer space of any of the cache elements is full, the original cache data of the cache space of the cache element whose cache space is full is deleted or replaced according to the predetermined cache space release policy.
  14. 一种数据处理***,其特征在于,包括:A data processing system, comprising:
    接收模块,用于接收由至少一处理器进程向数据缓存***发起的缓存数据读取请求;所述数据缓存***包括多种缓存元件,所述多种缓存元件的传输速率和/或存储空间不同、且所述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;a receiving module, configured to receive a cache data read request initiated by the at least one processor process to the data cache system; the data cache system includes a plurality of cache elements, and the plurality of cache elements have different transmission rates and/or storage spaces And the plurality of cache elements are preset with different lookup priorities according to a transmission rate and/or a storage space;
    第一查找模块,用于响应于接收到所述缓存数据读取请求,从发起所述缓存数据读取请求的进程对应的处理器的缓存元件开始,按照所述查找优先级从高到低的顺序依次向所述多种缓存元件查找相应缓存数据;a first searching module, configured to start, according to the cache element read request, a cache element of a processor corresponding to a process that initiates the cache data read request, according to the search priority from high to low Sequentially searching for the corresponding cache data to the plurality of cache components in sequence;
    第一返回模块,用于响应于查找到缓存有所述相应缓存数据的缓存元件,从查找到的缓存元件向所述对应的处理器进程返回所述相应缓存数据。And a first returning module, configured to return the corresponding cached data from the found cache element to the corresponding processor process in response to finding a cache element that caches the corresponding cached data.
  15. 根据权利要求14所述的***,其特征在于,还包括:The system of claim 14 further comprising:
    第二查找和返回模块,用于响应于未查找到缓存有所述相应缓存数据的缓存元件,向与所述数据缓存***通信连接的分布式文件***查找所述相应缓存数据、并将所述相应缓存数据向所述对应的处理器进程返回。a second lookup and return module responsive to not finding a cache element having the corresponding cached data cached, looking up the corresponding cached data to a distributed file system communicatively coupled to the data cache system, and Corresponding cache data is returned to the corresponding processor process.
  16. 根据权利要求14或15所述的***,其特征在于,所述多种缓存元件包括以下至少二种:图像处理器GPU显存、内存和硬盘;The system according to claim 14 or 15, wherein the plurality of cache elements comprise at least two of the following: an image processor GPU memory, a memory, and a hard disk;
    所述GPU显存的查找优先级高于所述内存的查找优先级,所述内存的查找优先级高于所述硬盘的查找优先级。The lookup priority of the GPU memory is higher than the lookup priority of the memory, and the memory has a higher lookup priority than the hard disk.
  17. 根据权利要求16所述的***,其特征在于,所述硬盘包括固态硬盘和 机械硬盘,所述固态硬盘的查找优先级高于所述机械硬盘的查找优先级。The system of claim 16 wherein said hard disk comprises a solid state drive and A mechanical hard disk having a higher priority of lookup than the mechanical hard disk.
  18. 根据权利要求14-17任一所述的***,其特征在于,所述至少一处理器进程包括:至少一GPU进程,和/或,至少一中央处理器CPU进程。The system of any of claims 14-17, wherein the at least one processor process comprises: at least one GPU process, and/or at least one central processor CPU process.
  19. 根据权利要求18所述的***,其特征在于,第一查找模块包括:The system of claim 18 wherein the first lookup module comprises:
    第一查找子模块,用于响应于接收到由GPU进程发起的所述缓存数据读取请求,按照所述查找优先级从高到低的顺序依次向GPU显存、内存和硬盘查找所述相应缓存数据。a first search submodule, configured to, in response to receiving the cache data read request initiated by the GPU process, search for the corresponding cache to the GPU memory, the memory, and the hard disk according to the lookup priority from high to low. data.
  20. 根据权利要求18或19所述的***,其特征在于,第一查找模块包括:The system of claim 18 or 19, wherein the first lookup module comprises:
    第二查找子模块,用于响应于接收到由CPU进程发起的所述缓存数据读取请求,按照所述查找优先级从高到低的顺序依次向内存和硬盘查找所述相应缓存数据。The second search submodule is configured to, in response to receiving the cache data read request initiated by the CPU process, sequentially search the memory and the hard disk for the corresponding cache data according to the lookup priority from high to low.
  21. 根据权利要求16-20任一所述的***,其特征在于,还包括:The system of any one of claims 16 to 20, further comprising:
    第一缓存模块,用于如果查找到缓存有所述相应缓存数据的缓存元件为硬盘,则将所述相应缓存数据缓存到GPU显存和/或内存中。The first cache module is configured to cache the corresponding cache data into the GPU memory and/or the memory if the cache element that caches the corresponding cache data is found to be a hard disk.
  22. 根据权利要求16-21任一所述的***,其特征在于,还包括:The system of any of claims 16-21, further comprising:
    第二缓存模块,如果未查找到缓存有所述相应缓存数据的缓存元件,则将所述相应缓存数据新增缓存到所述内存。The second cache module adds the corresponding cache data to the memory if the cache element with the corresponding cache data is not found.
  23. 根据权利要求22所述的***,其特征在于,还包括:The system of claim 22, further comprising:
    写入模块,用于响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的线程将所述文件块写入所述硬盘的缓存地址空间。And a write module, configured to respond to the newly added data of the corresponding cache data buffered to the memory to form a file block of a predetermined size, and create a new thread to write the file block to a cache address space of the hard disk.
  24. 根据权利要求23所述的***,其特征在于,所述硬盘包括固态硬盘;所述写入模块包括:The system of claim 23, wherein the hard disk comprises a solid state disk; the writing module comprises:
    第一写入子模块,用于响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将所述文件块写入所述固态硬盘的缓存空间。a first writing submodule, configured to generate a new first thread to write the file block to the solid state hard disk in response to the newly added data of the corresponding cache data buffered to the memory constituting a file block of a predetermined size Cache space.
  25. 根据权利要求23所述的***,其特征在于,所述硬盘包括固态硬盘和机械硬盘;所述写入模块包括:The system of claim 23, wherein the hard disk comprises a solid state hard disk and a mechanical hard disk; the writing module comprises:
    第二写入子模块,用于响应于新增缓存到所述内存的所述相应缓存数据的数据构成预定大小的文件块,创建新的第一线程将所述文件块写入所述固态硬盘的缓存空间,并创建第二线程将写入到所述固态硬盘的所述文件块写入所述机械硬盘的缓存空间。a second write submodule, configured to generate a new first thread to write the file block to the SSD in response to the newly added data of the corresponding cache data buffered to the memory constituting a file block of a predetermined size Cache space and create a second thread to write the file block written to the SSD to the cache space of the mechanical hard disk.
  26. 根据权利要求14-25任一所述的***,其特征在于,还包括:The system of any of claims 14-25, further comprising:
    释放模块,用于在任一缓存元件的缓存空间被写满时,根据预定缓存空间释放策略,删除或替换缓存空间被写满的所述缓存元件的缓存空间原有的缓存数据。And a release module, configured to delete or replace the original cache data of the cache space of the cache component whose cache space is full according to a predetermined cache space release policy when the cache space of any cache element is filled.
  27. 一种电子设备,其特征在于,包括权利要求14-26任一所述的数据处理***。An electronic device comprising the data processing system of any of claims 14-26.
  28. 一种电子设备,其特征在于,包括:An electronic device, comprising:
    处理器和权利要求14-26任一所述的数据处理***;The processor and the data processing system of any of claims 14-26;
    在处理器运行所述数据处理***时,权利要求14-26任一所述的数据处理系 统中的单元被运行。The data processing system of any of claims 14-26 when the processor is running the data processing system The unit in the system is running.
  29. 一种电子设备,其特征在于,包括:一个或多个处理器、存储器、多种缓存元件、通信部件和通信总线,所述处理器、所述存储器、所述多种缓存单元和所述通信部件通过所述通信总线完成相互间的通信,所述多种缓存元件的传输速率和/或存储空间不同、且所述多种缓存元件根据传输速率和/或存储空间被预先设置有不同的查找优先级;An electronic device, comprising: one or more processors, a memory, a plurality of cache elements, a communication component, and a communication bus, the processor, the memory, the plurality of cache units, and the communication The components complete communication with each other through the communication bus, the transmission rates and/or storage spaces of the plurality of cache elements are different, and the plurality of cache elements are pre-set with different lookups according to the transmission rate and/or the storage space. priority;
    所述存储器用于存放至少一可执行指令,所述可执行指令使所述处理器执行如权利要求1-13任一所述的数据处理方法对应的操作。The memory is for storing at least one executable instruction that causes the processor to perform operations corresponding to the data processing method of any of claims 1-13.
  30. 一种计算机程序,包括计算机可读代码,其特征在于,当所述计算机可读代码在设备上运行时,所述设备中的处理器执行用于实现权利要求1-13任一所述的数据处理方法中各步骤的指令。A computer program comprising computer readable code, wherein a processor in the device executes data for implementing any of claims 1-13 when the computer readable code is run on a device The instructions for each step in the processing method.
  31. 一种计算机可读存储介质,用于存储计算机可读取的指令,其特征在于,所述指令被执行时实现权利要求1-14任一所述的数据处理方法中各步骤的操作。 A computer readable storage medium for storing computer readable instructions, wherein the instructions are executed to perform the operations of the steps of the data processing method of any of claims 1-14.
PCT/CN2017/108449 2016-10-28 2017-10-30 Data processing method and system, electronic device WO2018077292A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610972718.6 2016-10-28
CN201610972718.6A CN108009008B (en) 2016-10-28 2016-10-28 Data processing method and system and electronic equipment

Publications (1)

Publication Number Publication Date
WO2018077292A1 true WO2018077292A1 (en) 2018-05-03

Family

ID=62023105

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/108449 WO2018077292A1 (en) 2016-10-28 2017-10-30 Data processing method and system, electronic device

Country Status (2)

Country Link
CN (1) CN108009008B (en)
WO (1) WO2018077292A1 (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597689A (en) * 2018-12-10 2019-04-09 浪潮(北京)电子信息产业有限公司 A kind of distributed file system Memory Optimize Method, device, equipment and medium
CN110322979A (en) * 2019-07-25 2019-10-11 美核电气(济南)股份有限公司 Nuclear power station digital control computer system core processing unit based on FPGA
CN110489058A (en) * 2019-07-02 2019-11-22 深圳市金泰克半导体有限公司 Solid state hard disk speed adjustment method, device, solid state hard disk and storage medium
CN110716900A (en) * 2019-10-10 2020-01-21 支付宝(杭州)信息技术有限公司 Data query method and system
CN110795395A (en) * 2018-07-31 2020-02-14 阿里巴巴集团控股有限公司 File deployment system and file deployment method
CN110851209A (en) * 2019-11-08 2020-02-28 北京字节跳动网络技术有限公司 Data processing method and device, electronic equipment and storage medium
CN111240845A (en) * 2020-01-13 2020-06-05 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN111831691A (en) * 2019-05-29 2020-10-27 北京嘀嘀无限科技发展有限公司 Data reading and writing method and device, electronic equipment and storage medium
CN111949661A (en) * 2020-08-12 2020-11-17 国网信息通信产业集团有限公司 Access method and device for power distribution and utilization data
CN112650450A (en) * 2020-12-25 2021-04-13 深圳大普微电子科技有限公司 Solid state disk cache management method, solid state disk cache controller and solid state disk
CN112685431A (en) * 2020-12-29 2021-04-20 京东数字科技控股股份有限公司 Asynchronous caching method, device, system, electronic equipment and storage medium
CN112988619A (en) * 2021-02-08 2021-06-18 北京金山云网络技术有限公司 Data reading method and device and electronic equipment
CN113094529A (en) * 2019-12-23 2021-07-09 深圳云天励飞技术有限公司 Image data processing method and device, electronic equipment and storage medium
CN113204502A (en) * 2021-04-20 2021-08-03 深圳致星科技有限公司 Heterogeneous accelerated computing optimization method, device and equipment and readable storage medium
CN113342265A (en) * 2021-05-11 2021-09-03 中天恒星(上海)科技有限公司 Cache management method and device, processor and computer device
CN114242173A (en) * 2021-12-22 2022-03-25 深圳吉因加医学检验实验室 Data processing method, device and storage medium for identifying microorganisms by using mNGS
CN114327280A (en) * 2021-12-29 2022-04-12 以萨技术股份有限公司 Message storage method and system based on cold-hot separation storage
CN114390098A (en) * 2020-10-21 2022-04-22 北京金山云网络技术有限公司 Data transmission method and device, electronic equipment and storage medium

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538681B (en) * 2020-03-25 2022-11-01 武汉理工大学 Cache replacement method based on maximized cache gain under Spark platform
CN111488626A (en) * 2020-04-09 2020-08-04 腾讯科技(深圳)有限公司 Data processing method, device, equipment and medium based on block chain
CN111563054B (en) * 2020-04-17 2023-05-30 深圳震有科技股份有限公司 Method for improving read-write speed of chip, intelligent terminal and storage medium
CN111552442A (en) * 2020-05-13 2020-08-18 重庆紫光华山智安科技有限公司 SSD-based cache management system and method
CN111708288B (en) * 2020-05-18 2021-11-26 慧灵科技(深圳)有限公司 Data processing method and device, electronic equipment and storage medium
CN111966283A (en) * 2020-07-06 2020-11-20 云知声智能科技股份有限公司 Client multi-level caching method and system based on enterprise-level super-computation scene
CN112329919B (en) * 2020-11-05 2022-07-29 北京百度网讯科技有限公司 Model training method and device
CN112486678A (en) * 2020-11-25 2021-03-12 广州经传多赢投资咨询有限公司 Stock market data processing method, system, device and storage medium
CN113870093A (en) * 2021-09-28 2021-12-31 上海商汤科技开发有限公司 Image caching method and device, electronic equipment and storage medium
CN113918483B (en) * 2021-12-14 2022-03-01 南京芯驰半导体科技有限公司 Multi-master device cache control method and system
CN114153754B (en) * 2022-02-08 2022-04-29 维塔科技(北京)有限公司 Data transmission method and device for computing cluster and storage medium
CN116886719B (en) * 2023-09-05 2024-01-23 苏州浪潮智能科技有限公司 Data processing method and device of storage system, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102981807A (en) * 2012-11-08 2013-03-20 北京大学 Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment
CN103200128A (en) * 2013-04-01 2013-07-10 华为技术有限公司 Method, device and system for network package processing
US20130346695A1 (en) * 2012-06-25 2013-12-26 Advanced Micro Devices, Inc. Integrated circuit with high reliability cache controller and method therefor
CN103955435A (en) * 2014-04-09 2014-07-30 上海理工大学 Method for establishing access by fusing multiple levels of cache directories
CN105988874A (en) * 2015-02-10 2016-10-05 阿里巴巴集团控股有限公司 Resource processing method and device

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8341380B2 (en) * 2009-09-22 2012-12-25 Nvidia Corporation Efficient memory translator with variable size cache line coverage
US8914466B2 (en) * 2011-07-07 2014-12-16 International Business Machines Corporation Multi-level adaptive caching within asset-based web systems
CN102662459A (en) * 2012-04-22 2012-09-12 复旦大学 Method for reducing energy consumption of server by using mixed storage of solid-state drive and mechanical hard disk
CN104102542A (en) * 2013-04-10 2014-10-15 华为技术有限公司 Network data packet processing method and device
CN103268293B (en) * 2013-06-14 2016-03-16 重庆重邮汇测通信技术有限公司 Hyperchannel multi-rate data acquisition system storage management method
US10250451B1 (en) * 2014-01-13 2019-04-02 Cazena, Inc. Intelligent analytic cloud provisioning
CN103927277B (en) * 2014-04-14 2017-01-04 中国人民解放军国防科学技术大学 CPU and GPU shares the method and device of on chip cache
CN104077368A (en) * 2014-06-18 2014-10-01 国电南瑞科技股份有限公司 History data two-level caching multi-stage submitting method for dispatching monitoring system
CN104239231B (en) * 2014-09-01 2018-01-30 上海爱数信息技术股份有限公司 A kind of method and device for accelerating L2 cache preheating
WO2016082205A1 (en) * 2014-11-28 2016-06-02 华为技术有限公司 Method, apparatus and device for controlling power consumption of multi-level cache
CN105183566B (en) * 2015-10-16 2019-01-29 上海恺英网络科技有限公司 The method for managing resource of 3D game rendering engine
CN105589664B (en) * 2015-12-29 2018-07-31 四川中电启明星信息技术有限公司 Virtual memory high speed transmission method
CN105847365A (en) * 2016-03-28 2016-08-10 乐视控股(北京)有限公司 Content caching method and content caching system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130346695A1 (en) * 2012-06-25 2013-12-26 Advanced Micro Devices, Inc. Integrated circuit with high reliability cache controller and method therefor
CN102981807A (en) * 2012-11-08 2013-03-20 北京大学 Graphics processing unit (GPU) program optimization method based on compute unified device architecture (CUDA) parallel environment
CN103200128A (en) * 2013-04-01 2013-07-10 华为技术有限公司 Method, device and system for network package processing
CN103955435A (en) * 2014-04-09 2014-07-30 上海理工大学 Method for establishing access by fusing multiple levels of cache directories
CN105988874A (en) * 2015-02-10 2016-10-05 阿里巴巴集团控股有限公司 Resource processing method and device

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795395A (en) * 2018-07-31 2020-02-14 阿里巴巴集团控股有限公司 File deployment system and file deployment method
CN110795395B (en) * 2018-07-31 2023-04-18 阿里巴巴集团控股有限公司 File deployment system and file deployment method
CN109597689B (en) * 2018-12-10 2022-06-10 浪潮(北京)电子信息产业有限公司 Distributed file system memory optimization method, device, equipment and medium
CN109597689A (en) * 2018-12-10 2019-04-09 浪潮(北京)电子信息产业有限公司 A kind of distributed file system Memory Optimize Method, device, equipment and medium
CN111831691B (en) * 2019-05-29 2024-05-03 北京嘀嘀无限科技发展有限公司 Data reading and writing method and device, electronic equipment and storage medium
CN111831691A (en) * 2019-05-29 2020-10-27 北京嘀嘀无限科技发展有限公司 Data reading and writing method and device, electronic equipment and storage medium
CN110489058A (en) * 2019-07-02 2019-11-22 深圳市金泰克半导体有限公司 Solid state hard disk speed adjustment method, device, solid state hard disk and storage medium
CN110489058B (en) * 2019-07-02 2023-03-31 深圳市金泰克半导体有限公司 Solid state disk speed adjusting method and device, solid state disk and storage medium
CN110322979A (en) * 2019-07-25 2019-10-11 美核电气(济南)股份有限公司 Nuclear power station digital control computer system core processing unit based on FPGA
CN110322979B (en) * 2019-07-25 2024-01-30 美核电气(济南)股份有限公司 Nuclear power station digital control computer system core processing unit based on FPGA
CN110716900A (en) * 2019-10-10 2020-01-21 支付宝(杭州)信息技术有限公司 Data query method and system
CN110851209B (en) * 2019-11-08 2023-07-21 北京字节跳动网络技术有限公司 Data processing method and device, electronic equipment and storage medium
CN110851209A (en) * 2019-11-08 2020-02-28 北京字节跳动网络技术有限公司 Data processing method and device, electronic equipment and storage medium
CN113094529A (en) * 2019-12-23 2021-07-09 深圳云天励飞技术有限公司 Image data processing method and device, electronic equipment and storage medium
CN113094529B (en) * 2019-12-23 2024-01-05 深圳云天励飞技术有限公司 Image data processing method and device, electronic equipment and storage medium
CN111240845A (en) * 2020-01-13 2020-06-05 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN111240845B (en) * 2020-01-13 2023-10-03 腾讯科技(深圳)有限公司 Data processing method, device and storage medium
CN111949661A (en) * 2020-08-12 2020-11-17 国网信息通信产业集团有限公司 Access method and device for power distribution and utilization data
CN114390098A (en) * 2020-10-21 2022-04-22 北京金山云网络技术有限公司 Data transmission method and device, electronic equipment and storage medium
CN112650450A (en) * 2020-12-25 2021-04-13 深圳大普微电子科技有限公司 Solid state disk cache management method, solid state disk cache controller and solid state disk
CN112650450B (en) * 2020-12-25 2024-02-27 深圳大普微电子科技有限公司 Solid state disk cache management method, solid state disk cache controller and solid state disk
CN112685431A (en) * 2020-12-29 2021-04-20 京东数字科技控股股份有限公司 Asynchronous caching method, device, system, electronic equipment and storage medium
CN112685431B (en) * 2020-12-29 2024-05-17 京东科技控股股份有限公司 Asynchronous caching method, device, system, electronic equipment and storage medium
CN112988619A (en) * 2021-02-08 2021-06-18 北京金山云网络技术有限公司 Data reading method and device and electronic equipment
CN113204502A (en) * 2021-04-20 2021-08-03 深圳致星科技有限公司 Heterogeneous accelerated computing optimization method, device and equipment and readable storage medium
CN113342265B (en) * 2021-05-11 2023-11-24 中天恒星(上海)科技有限公司 Cache management method and device, processor and computer device
CN113342265A (en) * 2021-05-11 2021-09-03 中天恒星(上海)科技有限公司 Cache management method and device, processor and computer device
CN114242173A (en) * 2021-12-22 2022-03-25 深圳吉因加医学检验实验室 Data processing method, device and storage medium for identifying microorganisms by using mNGS
CN114327280A (en) * 2021-12-29 2022-04-12 以萨技术股份有限公司 Message storage method and system based on cold-hot separation storage
CN114327280B (en) * 2021-12-29 2024-02-09 以萨技术股份有限公司 Message storage method and system based on cold and hot separation storage

Also Published As

Publication number Publication date
CN108009008B (en) 2022-08-09
CN108009008A (en) 2018-05-08

Similar Documents

Publication Publication Date Title
WO2018077292A1 (en) Data processing method and system, electronic device
US11899937B2 (en) Memory allocation buffer for reduction of heap fragmentation
US11949759B2 (en) Adaptive computation and faster computer operation
US10289555B1 (en) Memory read-ahead using learned memory access patterns
US10467152B2 (en) Dynamic cache management for in-memory data analytic platforms
CN112214424B (en) Object memory architecture, processing node, memory object storage and management method
US9652405B1 (en) Persistence of page access heuristics in a memory centric architecture
US8352517B2 (en) Infrastructure for spilling pages to a persistent store
US20210117473A1 (en) Technologies for managing connected data on persistent memory-based systems
CN110597451B (en) Method for realizing virtualized cache and physical machine
US10204175B2 (en) Dynamic memory tuning for in-memory data analytic platforms
US9760493B1 (en) System and methods of a CPU-efficient cache replacement algorithm
US9959074B1 (en) Asynchronous in-memory data backup system
TW201220197A (en) for improving the safety and reliability of data storage in a virtual machine based on cloud calculation and distributed storage environment
KR102236419B1 (en) Method, apparatus, device and storage medium for managing access request
CN104580437A (en) Cloud storage client and high-efficiency data access method thereof
US8346744B2 (en) Database management method, database management system, and processing program therefor
CN107958018B (en) Method and device for updating data in cache and computer readable medium
JP7322184B2 (en) Database change stream caching techniques
US20160306655A1 (en) Resource management and allocation using history information stored in application's commit signature log
CN107888687B (en) Proxy client storage acceleration method and system based on distributed storage system
WO2023051228A1 (en) Method and apparatus for sample data processing, and device and storage medium
US20210011847A1 (en) Optimized sorting of variable-length records
CN111488492A (en) Method and apparatus for retrieving graph database
US11055223B2 (en) Efficient cache warm up based on user requests

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17865593

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17865593

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14.08.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17865593

Country of ref document: EP

Kind code of ref document: A1