CN112667528A - Data prefetching method and related equipment - Google Patents

Data prefetching method and related equipment

Info

Publication number
CN112667528A
Authority
CN
China
Prior art keywords
addresses
neural network
network model
address
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910985051.7A
Other languages
Chinese (zh)
Inventor
方维 (Fang Wei)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201910985051.7A
Publication of CN112667528A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0862Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches with prefetch
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The application provides a data prefetching method and related equipment. The method comprises: a computing device receives a first read request; the computing device inputs the address of the data read by the first read request into a neural network model, and the neural network model outputs probability values for a plurality of addresses, where the probability value of each address represents the probability that the data corresponding to that address is the data read by a second read request, the second read request being the read request that follows the first read request; and the computing device obtains N addresses from the output of the neural network model according to the probability values and stores the data corresponding to the N addresses in a cache. The method can predict the data to be accessed and prefetch it into the cache in advance, improving the cache hit rate.

Description

Data prefetching method and related equipment
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a data prefetching method and related devices.
Background
The memory is one of the core components of a computer, and its performance directly affects the performance of the whole computer system. How to design a memory system whose capacity and speed meet the requirements of a computer system has long been one of the key issues in computer architecture design. This goal is difficult to achieve with a single memory, and the current solution is to combine several memory technologies into a multi-level memory hierarchy. As shown in fig. 1, the two-level storage model typically includes a small-capacity, high-speed input/output (I/O) device, such as a static random access memory (SRAM), and a large-capacity, low-speed I/O device, such as a hard disk. All data is stored on the low-speed I/O device, the high-speed I/O device serves as a cache, and when data is read, the high-speed I/O device is always searched first for a copy of the data.
A cache is an important component of a data processing device (such as a computer or a mobile terminal). The cache temporarily stores the instructions and data used by the central processing unit (CPU) and exchanges data with external memory such as a hard disk, so that the CPU can access data at a higher speed, shorten the access time and improve system performance, as shown in fig. 1. Caching relies on the locality of program execution and data access: within a certain window of execution time and address space, the accessed code and data are concentrated in a small region.
If the data accessed by the user can be found in the cache, this is called a cache hit; if not, it is called a cache miss. In order to increase the cache hit rate, a data prefetching technique is needed in addition to relying on the locality of program execution and data access: the data that the CPU is about to use is fetched from the hard disk into the cache in advance.
At present, sequential stream prefetching and interval stream prefetching are mainly used, and both apply only when analysis shows that the access addresses of the access requests are consecutive or equally spaced. In many cases, however, the access addresses are not sequential but are strongly associated, and the existing data prefetching approaches no longer apply. Therefore, how to prefetch the data to be accessed by the CPU into the cache and improve the cache hit rate when the access addresses are not sequential but strongly associated is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the invention discloses a data prefetching method and related equipment, which can predict data to be accessed and prefetch the data to a cache in advance, thereby improving the cache hit rate.
In a first aspect, the present application provides a method of data prefetching, the method comprising: the processor receives a first read request; the processor inputs the address of the data read by the first read request into a neural network model, and the neural network model outputs probability values for a plurality of addresses, where the probability value of each address represents the probability that the data corresponding to that address is the data read by a second read request, the second read request being the read request that follows the first read request; and the processor obtains N addresses from the output of the neural network model according to the probability values and stores the data corresponding to the N addresses in a cache.
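A minimal sketch of this first-aspect flow is given below. It assumes a hypothetical model object with a predict method returning a probability value per candidate address, and hypothetical cache/disk objects; none of these names come from the application, and the sketch is illustrative only.

```python
def prefetch_on_read(first_read_address, model, cache, disk, n=8):
    """Feed the read address to the model and prefetch the N most probable next addresses."""
    # The model outputs a probability value per address: the probability that the
    # address is the one read by the next (second) read request.
    probabilities = model.predict(first_read_address)   # e.g. {address: probability}

    # Keep the N addresses with the largest probability values.
    top_n = sorted(probabilities, key=probabilities.get, reverse=True)[:n]

    # Store the corresponding data in the cache so the second read request can hit it.
    for address in top_n:
        cache.put(address, disk.read(address))
```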
Alternatively, the neural network model may be a skip-gram model, a recurrent neural network model, or a convolutional neural network model.
In the embodiment of the application, the processor inputs the address carried by the received read request into the trained neural network model, uses the neural network model to predict the data that the subsequent read request is likely to read, selects the addresses with the larger probability values output by the neural network model and stores the corresponding data in the cache. Reading the data from the hard disk can thus be avoided, which improves the cache hit rate and the performance of the processor.
With reference to the first aspect, in a possible implementation manner of the first aspect, the processor determines, among the N addresses, M addresses whose corresponding data is not in the cache, where M is a positive integer greater than or equal to 1; and the processor stores the data corresponding to the M addresses into the cache.
In the embodiment of the application, before storing data in the cache the processor first judges whether the data is already stored there, and stores it only if it is not. This avoids duplicate data, reduces the waste of cache space and improves cache utilization.
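The de-duplication step described above can be sketched as follows; the cache.contains and disk.read calls are assumptions used for illustration, not an interface defined by the application.

```python
def prefetch_missing(addresses, cache, disk):
    """Prefetch only the M addresses among the N whose data is not already cached."""
    missing = [a for a in addresses if not cache.contains(a)]   # the M addresses
    for address in missing:
        cache.put(address, disk.read(address))                  # avoid duplicate entries
    return missing
```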
With reference to the first aspect, in a possible implementation manner of the first aspect, the neural network model includes a first neural network model and a second neural network model, where the first neural network model is used to train on addresses of write data and the second neural network model is used to train on addresses of read data; the processor inputs the address of the data read by the first read request into the first neural network model and into the second neural network model; the processor obtains the first N1 addresses from the output of the first neural network model in descending order of probability value and the first N2 addresses from the output of the second neural network model in descending order of probability value, where the sum of N1 and N2 equals N.
In the embodiment of the application, the first read request is predicted with both the first neural network model and the second neural network model, and N1 addresses and N2 addresses are selected from their respective outputs according to the magnitude of the probability values and stored in the cache, which can further improve the cache hit rate and processor performance.
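A sketch of this dual-model selection, under the same assumptions as the earlier sketch (hypothetical predict methods returning per-address probabilities):

```python
def select_candidates(address, write_model, read_model, n1, n2):
    """Top-N1 addresses from the write-trained model plus top-N2 from the read-trained model,
    each taken in descending order of probability value."""
    p1 = write_model.predict(address)   # first neural network model (trained on write addresses)
    p2 = read_model.predict(address)    # second neural network model (trained on read addresses)
    top_n1 = sorted(p1, key=p1.get, reverse=True)[:n1]
    top_n2 = sorted(p2, key=p2.get, reverse=True)[:n2]
    return top_n1 + top_n2              # N = N1 + N2 addresses to prefetch
```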
With reference to the first aspect, in a possible implementation manner of the first aspect, the processor obtains an address set of write data, where the write time interval between any two data items with adjacent write times in the address set is smaller than a preset value; the processor trains the addresses in the address set through the first neural network model, so that after a first address in the address set is input into the first neural network model, the probability of at least one address that follows the first address, among the probabilities of the plurality of addresses output by the first neural network model, is increased, where the first address is any address in the address set.
In the embodiment of the application, the processor inputs the address set of write data with adjacent write times into the first neural network model as a training sample, so that after training the first neural network model has learned the association relationship among the addresses in the address set: after any address in the address set is input into the first neural network model, the probability of at least one address that follows it is increased, which improves the prediction accuracy of the first neural network model.
With reference to the first aspect, in a possible implementation manner of the first aspect, the processor obtains an address set of read data, where the read time interval between any two data items with adjacent read times in the address set is smaller than a preset value; the processor trains the addresses in the address set through the second neural network model, so that after a first address in the address set is input into the second neural network model, the probability of at least one address that follows the first address, among the plurality of addresses output by the second neural network model, is increased, where the first address is any address in the address set.
In the embodiment of the application, the processor inputs the address set of read data with adjacent read times into the second neural network model as a training sample, so that after training the second neural network model has learned the association relationship among the addresses in the address set: after any address in the address set is input into the second neural network model, the probability of at least one address that follows it is increased, which improves the prediction accuracy of the second neural network model.
With reference to the first aspect, in a possible implementation manner of the first aspect, the processor counts, within a preset time period, the number A of addresses obtained from the output of the first neural network model that hit in the cache and the number B of addresses obtained from the output of the second neural network model that hit in the cache; the processor adjusts the number N1 of addresses obtained from the output of the first neural network model and the number N2 of addresses obtained from the output of the second neural network model according to A and B.
In the embodiment of the application, the number N1 of addresses the processor obtains from the output of the first neural network model and the number N2 of addresses it obtains from the output of the second neural network model are not fixed; they are dynamically adjusted according to the number of hits in the cache, which further improves the cache hit rate and processor performance.
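The adjustment of N1 and N2 from the hit counts A and B can be sketched as below; the rounding rule and the fallback to an even split when no hits have been counted yet are illustrative assumptions.

```python
def adjust_split(total_n, hits_a, hits_b):
    """Re-split the prefetch budget N between the two models according to their hit counts."""
    if hits_a + hits_b == 0:
        return total_n // 2, total_n - total_n // 2   # initial stage: equal weights of 0.5
    weight_a = hits_a / (hits_a + hits_b)             # weight of the first model
    n1 = round(total_n * weight_a)
    n2 = total_n - n1
    return n1, n2

# Example: A = 3 hits from the first model, B = 7 from the second, N = 10 -> (3, 7)
print(adjust_split(10, 3, 7))
```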
In a second aspect, the present application provides a computing device comprising: a receiving unit configured to receive a first read request; a prediction unit, configured to input an address of data read by the first read request into a neural network model, where the neural network model outputs a probability value of a plurality of addresses, where the probability value of each address represents a probability that data corresponding to each address is data read by a second read request, and the second read request is a read request next to the first read request; and acquiring N addresses from the output of the neural network model according to the probability value, and storing data corresponding to the N addresses in a cache.
With reference to the second aspect, in a possible implementation manner of the second aspect, the prediction unit is further configured to: determining M addresses of the N addresses, wherein the corresponding data are not in the cache, and M is a positive integer greater than or equal to 1; and storing the data corresponding to the M addresses to the cache.
With reference to the second aspect, in a possible implementation manner of the second aspect, the neural network model includes a first neural network model and a second neural network model, where the first neural network model is used to train an address of write data, predict an address of data read by the first read request, and output probability values of multiple addresses; the second neural network model is used for training addresses of read data, predicting the addresses of the data read by the first read request and outputting probability values of a plurality of addresses; the prediction unit is further configured to obtain the first N1 addresses from the output of the first neural network model in an order of decreasing probability values, and obtain the first N2 addresses from the output of the second neural network model in an order of decreasing probability values, where a sum of N1 and N2 is equal to N.
With reference to the second aspect, in a possible implementation manner of the second aspect, the prediction unit is further configured to: acquire an address set of write data, where the write time interval between any two data items with adjacent write times in the address set is smaller than a preset value; and train the addresses in the address set, so that after a first address in the address set is input into the first neural network model, the probability of at least one address that follows the first address, among the plurality of addresses output by the first neural network model, is increased, where the first address is any address in the address set.
With reference to the second aspect, in a possible implementation manner of the second aspect, the prediction unit is further configured to: acquire an address set of read data, where the read time interval between any two data items with adjacent read times in the address set is smaller than a preset value; and train the addresses in the address set, so that after a first address in the address set is input into the second neural network model, the probability of at least one address that follows the first address, among the plurality of addresses output by the second neural network model, is increased, where the first address is any address in the address set.
With reference to the second aspect, in a possible implementation manner of the second aspect, the computing device further includes a processing unit, configured to count a number a of addresses obtained from the output of the first neural network model that hit in the cache and a number B of addresses obtained from the output of the second neural network model that hit in the cache within a preset time duration; adjusting the number of addresses N1 obtained from the output of the first neural network model and the number of addresses N2 obtained from the output of the second neural network model according to A and B.
In a third aspect, the present application provides an intelligent chip into which instructions are burned; the intelligent chip executes the instructions to perform the data prefetching method provided by the first aspect or any implementation manner of the first aspect.
In a fourth aspect, the present application provides a computing device, where the computing device includes a processor and a memory, where the processor and the memory are connected through an internal bus, and the memory stores instructions, and the processor calls the instructions in the memory to execute the method for prefetching data according to the first aspect and the implementation manner of any one of the first aspect.
In a fifth aspect, the present application provides a computing device, where the computing device includes a processor, a memory and an intelligent chip, where the intelligent chip is connected to the processor through a serial bus and to the memory through a PCIe interface, and where instructions are burned into the intelligent chip; the intelligent chip executes the instructions to perform the data prefetching method provided by the first aspect or any implementation manner of the first aspect.
In a sixth aspect, the present application provides a computer storage medium, where a computer program is stored, and when the computer program is executed by a processor, the computer program can implement the first aspect and the flow of the data prefetching method provided in connection with any one of the implementations of the first aspect.
In a seventh aspect, the present application provides a computer program product, where the computer program product includes instructions that, when executed by a computer, enable the computer to perform the flow of the data prefetching method provided by the first aspect or any implementation manner of the first aspect.
In an eighth aspect, the present invention provides a computing device, including a processor and a memory, where the memory stores program instructions, and the processor executes the instructions in the memory to execute the first aspect and the data prefetching method provided in connection with any one of the implementations of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a schematic diagram of a storage model provided by an embodiment of the present application;
FIG. 2A is a schematic diagram of data access provided by an embodiment of the present application;
FIG. 2B is a schematic diagram of yet another data access provided by an embodiment of the present application;
FIG. 3 is a diagram of a system architecture provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data prefetching system according to an embodiment of the present application;
FIG. 5 is a flow chart illustrating a method for prefetching data according to an embodiment of the present application;
FIG. 6 is a schematic flow chart of a first neural network model training method according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating a write request stream splitting according to an embodiment of the present application;
FIG. 8 is a schematic flow chart diagram illustrating a second neural network model training method according to an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
First, a part of words and related technologies referred to in the present application will be explained with reference to the accompanying drawings so as to be easily understood by those skilled in the art.
A Logical Block Address (LBA) is a general mechanism for describing a block where data is located on a storage device, and is generally used in an auxiliary memory device such as a hard disk. The LBA may refer to the address of a block of data, and a logical block is typically 512 or 1024 bytes.
Caches are typically implemented with random access memory (RAM) and store temporary data. A cache is a memory capable of high-speed data exchange; its access rate is extremely high, which bridges the speed gap between the CPU and the hard disk. It exchanges data with the CPU ahead of the hard disk. The capacity of the cache is small and it can hold only a small part of the data on the hard disk, but this small part is the data the CPU is about to access; when the CPU requests a large amount of data, much of it can be read directly from the cache instead of the hard disk, which speeds up reading. Data is stored in the cache in the form of key-value pairs, that is, both the LBA and the value corresponding to the LBA are stored in the cache.
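As a toy illustration of the key-value form described above, a cache can be modeled as a mapping from LBA to value; this is only a stand-in with no eviction policy, not a real cache implementation.

```python
cache = {}                       # key: LBA, value: the data stored at that LBA

def cache_put(lba, value):
    cache[lba] = value           # both the LBA and its value are kept in the cache

def cache_get(lba):
    return cache.get(lba)        # returns the value on a hit, None on a cache miss
```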
The prefetching technique is to load data from a main memory (e.g. a hard disk) to a cache memory (cache) in advance before the CPU accesses the data, so as to reduce the stall time of the CPU accessing the data and improve the performance of the CPU.
Skip-gram is an unsupervised learning technique that learns the context of words on its own; by learning the relationships between words it can automatically complete sentences or analyze the relevance of sentences, predicting the context words for a given center word. Unsupervised learning refers to learning from an unlabeled training data set, where the samples must be analyzed according to the statistical regularities among them, as in clustering tasks.
A recurrent neural network (RNN) is an artificial neural network in which nodes are connected in a directed cycle, so the internal state of the network can exhibit dynamic temporal behavior. The RNN not only takes the input at the previous moment into account but also gives the network a memory of previous content. An RNN mainly consists of an input layer, a hidden layer and an output layer; the network memorizes previous information and applies it to the computation of the current output, i.e. the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. RNNs can be used in natural language processing (NLP), machine translation, speech recognition, image description generation and text similarity calculation; in the present application, for example, an RNN can be used to implement data prefetching.
A convolutional neural network (CNN) is a class of feed-forward neural networks that contains convolution computations and has a deep structure; it is one of the representative algorithms of deep learning. A CNN has representation learning ability and can perform translation-invariant classification of input information according to its hierarchical structure. CNNs are modeled on the visual perception mechanism of living organisms and can perform both supervised and unsupervised learning. Thanks to the parameter sharing of the convolution kernels in the hidden layers and the sparsity of inter-layer connections, a convolutional neural network can learn grid-like features such as pixels and audio with a small amount of computation, with stable results and without additional feature engineering requirements on the data.
In the data processing process, the cache is closer to the CPU and its data access speed is higher, so when an access request is processed, the CPU always first searches the cache for the data to be accessed; if the data is not found in the cache, it is further searched for on the hard disk, which reduces the CPU's access rate. In order to shorten the access time and improve access efficiency, a prefetching technique is used to prefetch the data to be accessed by the CPU into the cache, thereby improving the cache hit rate. As shown in fig. 2A, data block 1, data block 2, ..., data block n are spatially continuous, and the CPU accesses data block 1, data block 2, ..., data block n in sequence, as happens, for example, when the CPU reads a large file. The CPU can identify the sequential stream by analyzing the relationship between the access addresses (such as LBAs), i.e. the addresses corresponding to the data accessed by the CPU are consecutive, and the data to be accessed by the CPU is prefetched into the cache in advance using the sequential stream prefetching technique. This ensures that the data is found in the cache when it is needed, improving the cache hit rate and reducing the data access delay. As shown in fig. 2B, data block 1, data block 2, ..., data block n are spatially continuous, and the CPU accesses the data blocks at a fixed interval rather than strictly in sequence; for example, with an interval of 2 data blocks, the CPU skips 2 data blocks after each access before accessing the next one. The CPU identifies the interval stream by analyzing the relationship between the access addresses, i.e. the addresses corresponding to the data accessed by the CPU are equally spaced, and the data to be accessed by the CPU is prefetched into the cache in advance using the interval stream prefetching technique, which improves the cache hit rate and reduces the data access delay.
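A hedged sketch of how a sequential or interval (fixed-stride) stream might be recognized from recent access addresses and extended into prefetch candidates; the stride-detection heuristic below is an illustration, not the detection logic of any particular product.

```python
def detect_stride(recent_addresses):
    """Return the constant stride of the recent accesses, or None if there is none."""
    if len(recent_addresses) < 3:
        return None
    strides = [b - a for a, b in zip(recent_addresses, recent_addresses[1:])]
    return strides[0] if all(s == strides[0] for s in strides) else None

def stream_prefetch(recent_addresses, depth=4):
    """Sequential stream (stride 1) or interval stream (stride > 1) prefetch candidates."""
    stride = detect_stride(recent_addresses)
    if stride is None:
        return []                                   # no regular pattern: not applicable
    last = recent_addresses[-1]
    return [last + stride * i for i in range(1, depth + 1)]

print(stream_prefetch([1, 2, 3, 4]))    # sequential stream -> [5, 6, 7, 8]
print(stream_prefetch([1, 4, 7, 10]))   # interval stream   -> [13, 16, 19, 22]
```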
The sequential stream and interval stream prefetching techniques only apply when the access addresses are consecutive or equally spaced; if the access addresses are not sequential but are strongly associated, they no longer apply. For example, when a virtual machine is started, the operating system reads a number of configuration files in a fixed order. The addresses corresponding to these files are neither sequential nor equally spaced, but the configuration files are read in the same order every time the virtual machine starts, so the corresponding addresses have a strong association relationship. In this case the data to be accessed by the CPU cannot be prefetched with the sequential stream or interval stream prefetching technique, i.e. the cache hit rate cannot be increased and the data access delay cannot be reduced.
Based on the above, the present application provides a data prefetching method and related device that analyze the correlation between access addresses, identify the address access pattern, and prefetch the data corresponding to the addresses to be accessed into the cache in advance, thereby improving the cache hit rate, reducing the data access delay and improving system performance.
The technical scheme of the embodiment of the application can be applied to various scenes needing data prefetching, including but not limited to enterprise storage, distributed storage, cloud storage and the like.
In a particular embodiment, the data prefetching system may be deployed in any computing device that involves data prefetching. For example, as shown in fig. 3, it may be deployed on one or more computing devices (e.g. a central server) in a cloud environment, or on one or more computing devices (edge computing devices) in an edge environment, which may be servers. The cloud environment is a cluster of central computing devices owned by a cloud service provider for providing computing, storage and communication resources, and it has large storage and computing resources; an edge environment refers to a cluster of edge computing devices geographically close to the end devices for providing computing, storage and communication resources.
The data prefetching system is configured to analyze the access addresses of service access requests and to prefetch the data corresponding to the addresses that the CPU is about to access into the cache. Fig. 4 shows an exemplary division into functional units, which are briefly described below.
The data prefetching system 400 shown in fig. 4 comprises a plurality of functional units. The obtaining unit 410 is configured to obtain data access requests, which include read requests and write requests; the prediction unit 420 is configured to input the access address corresponding to the data access request acquired by the obtaining unit 410 into a neural network model for training and to predict the addresses to be accessed subsequently; and the processing unit 430 is configured to prefetch the data corresponding to the access addresses predicted by the prediction unit 420 into the cache.
Optionally, the data prefetching system 400 further includes a sequence splitting unit 440, configured to split the read requests and write requests acquired by the obtaining unit 410 based on the arrival time of each data access request, so as to obtain a plurality of read request sequences and a plurality of write request sequences. The neural network model comprises a first neural network model 4210 and a second neural network model 4220. The first neural network model 4210 is used to train on the addresses corresponding to the plurality of write request sequences obtained by the sequence splitting unit 440 and to predict the addresses to be accessed subsequently; the second neural network model 4220 is used to train on the addresses corresponding to the plurality of read request sequences obtained by the sequence splitting unit 440 and to predict the addresses to be accessed subsequently. The data prefetching system 400 further includes a counting unit 450 for counting the number of first associated addresses predicted and output by the first neural network model 4210, the number of second associated addresses predicted and output by the second neural network model 4220, the number of first associated addresses that hit in the cache, and the number of second associated addresses that hit in the cache. The data prefetching system 400 further includes an updating unit 460, configured to update the weights corresponding to the first neural network model 4210 and the second neural network model 4220 according to the statistics obtained by the counting unit 450. That is, the number of first associated addresses output by the first neural network model 4210 and the number of second associated addresses output by the second neural network model 4220 are not always equal but are output according to a weight ratio. For example, if 10 associated addresses are to be output in total, the weight of the first neural network model 4210 is 0.4 and the weight of the second neural network model 4220 is 0.6, then 4 of the first associated addresses predicted by the first neural network model 4210 are selected for output and 6 of the second associated addresses predicted by the second neural network model 4220 are selected for output.
In the present application, the data prefetching system 400 may be a software system, and the form of each portion and functional unit included in the software system being deployed on a hardware device is flexible.
It should be noted that the cache prefetching system may be an independent chip connected to the CPU through a serial bus; it receives the data access requests sent by the CPU, trains on the access addresses corresponding to the data access requests, and outputs the addresses predicted to be accessed by the next access request. The independent chip may be a field-programmable gate array (FPGA), which is an integrated circuit that can be configured by the user, or another programmable logic device, which is not limited in this application.
FIG. 5 depicts a method for data prefetching according to an embodiment of the present application. As shown in fig. 5, the method includes, but is not limited to, the following steps:
S510: The computing device obtains a first read request.
Specifically, the computing device is deployed with the data prefetching system 400 shown in fig. 4. The first read request may be a request generated by an application program in the computing device and is used to access data stored in the computing device. The first read request carries the address corresponding to the data to be read, which may be an LBA; after receiving the first read request, the obtaining unit 410 in the computing device finds and returns the corresponding data according to the LBA. The computing device searches the cache first; if the data is found in the cache, this is called a cache hit and the data is returned directly. Since only a small portion of the data is stored in the cache, the data may not be found there; if the data is not found in the cache, it is further searched for on the hard disk and then returned.
S520: the computing device inputs the address of the data read by the first read request into the neural network model and outputs probability values of the plurality of addresses.
Specifically, the reading speed of data in the cache is fast and can match the speed of the CPU, so finding data in the cache shortens the data access time, increases the cache hit rate and further improves system performance. Therefore, the address corresponding to the first read request is input into a trained neural network model, which may be the prediction unit 420 shown in fig. 4. The neural network model is used to predict the addresses of the data that may be accessed by the second read request following the first read request and to output the probability values of the predicted addresses, and N associated addresses are obtained from the predicted addresses according to the probability values, so as to increase the probability that the data accessed by the second read request is found in the cache. The N associated addresses have a stronger association with the access address corresponding to the first read request, where N is a positive integer greater than 1.
In one possible implementation, the neural network model includes a first neural network model, which may be the first neural network model 4210 shown in fig. 4, and a second neural network model, which may be the second neural network model 4220 shown in fig. 4. The computing device inputs the address corresponding to the first read request into the first neural network model and into the second neural network model; the first neural network model outputs probability values of a plurality of addresses, from which N1 addresses are selected as first associated addresses, and the second neural network model likewise outputs probability values of a plurality of addresses, from which N2 addresses are selected as second associated addresses, where N1 and N2 are positive integers greater than or equal to 1 and the sum of N1 and N2 equals N. The first neural network model is obtained by the computing device by training on the addresses of write data, and the second neural network model is obtained by training on the addresses of read data. How the first neural network model and the second neural network model are trained is described in detail in the subsequent steps.
It can be understood that, by training with the addresses of write data (i.e. the addresses corresponding to write requests acquired before prediction with the first neural network model) and the addresses of read data (i.e. the addresses corresponding to read requests acquired before prediction with the second neural network model) as training samples, the computing device can analyze the association between the addresses and find the access pattern. The accuracy with which the first neural network model and the second neural network model predict the address corresponding to the second read request that follows the first read request is thereby increased, and the data corresponding to those addresses is prefetched into the cache in advance, which improves the cache hit rate, shortens the access time and improves system performance.
S530: The computing device obtains the N addresses from the output of the neural network model according to the probability values and stores the corresponding data in the cache.
Specifically, after the address of the first read request is input into the neural network model, the model predicts and outputs probability values for a plurality of addresses, and the N associated addresses with the larger probability values are selected; the data corresponding to some of these N associated addresses may already be in the cache. Therefore, the processing unit 430 in the computing device first determines which of the N associated addresses correspond to data that already exists in the cache (that part of the data no longer needs to be prefetched), then takes the associated addresses whose data is not stored in the cache as target associated addresses and prefetches the data corresponding to the target associated addresses from the hard disk into the cache.
It will be appreciated that the capacity of the cache is relatively small and the data it can store is limited. Therefore, the number of associated addresses obtained from the output of the neural network model needs to be limited, that is, the value of N must be set so that it is not too large; this improves cache utilization and ensures that the cache retains enough capacity to support the next prefetch operation.
Further, the computing device sorts the probability values corresponding to the plurality of first associated addresses predicted by the first neural network model and selects the N1 largest, in descending order, for output, where the probability value of each first associated address represents the probability that the data corresponding to that address is the data read by the second read request, the second read request being the read request that follows the first read request. Likewise, the computing device sorts the probability values corresponding to the plurality of second associated addresses predicted by the second neural network model and selects the N2 largest, in descending order, for output, where the probability value of each second associated address represents the probability that the data corresponding to that address is the data read by the second read request.
It should be noted that, in the initialization stage, the weights of the first neural network model and the second neural network model are the same, both 0.5. In other words, in the initial stage the number of first associated addresses and the number of second associated addresses selected by the computing device are the same, i.e. N1 equals N2. However, as time goes on, the computing device periodically adjusts the weights of the first neural network model and the second neural network model, so as to predict more accurately the address corresponding to the next read request (i.e. the second read request) after the current read request and improve the cache hit rate.
In a possible implementation manner, the counting unit 450 in the computing device counts, within a preset time period, the number A of first associated addresses that hit in the cache and the number B of second associated addresses that hit in the cache, and the computing device updates the weights of the first neural network model and the second neural network model according to the counted values of A and B.
Specifically, the computing device updates the weights of the two models according to the values of the parameters A and B: the weight a corresponding to the first neural network model is a = A/(A + B), and the weight b corresponding to the second neural network model is b = B/(A + B). The computing device adjusts the number of first associated addresses and the number of second associated addresses prefetched into the cache according to the calculated a and b, which ensures that the addresses prefetched into the cache have the strongest association with the current read request and improves the cache hit rate.
Illustratively, if the calculated value of a is 0.3, the value of b is 0.7, and the computing device prefetches data corresponding to 10 associated addresses into the cache each time, then the computing device selects 3 first associated addresses from all the first associated addresses predicted and output by the first neural network model and prefetches their data into the cache; meanwhile, it selects 7 second associated addresses from all the second associated addresses predicted and output by the second neural network model and prefetches their data into the cache.
It can be seen that the computing device can dynamically and promptly adjust the number of first associated addresses and the number of second associated addresses prefetched into the cache according to the counted parameters and the weights of the models, ensuring that the associated addresses most strongly associated with the current read request are always prefetched, improving the cache hit rate and hence system performance.
In addition, since the capacity of the cache is limited, data in a storage device such as a hard disk cannot be prefetched into the cache without limit. Therefore, to keep the cache performing well and to improve the cache hit rate, the computing device needs to periodically update (i.e. evict) the data stored in the cache, so that the cache has enough capacity to store newly prefetched data.
Optionally, the computing device may update the data stored in the cache with a corresponding replacement algorithm, including but not limited to the least recently used (LRU) algorithm, the pseudo least recently used (pseudo-LRU) algorithm, the least frequently used (LFU) algorithm and the random replacement algorithm. It is easy to see that by periodically updating the data stored in the cache with a replacement algorithm, the computing device can further improve the cache hit rate.
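As an illustration of one of the replacement algorithms listed above, the following is a minimal LRU cache built on OrderedDict; the capacity handling is an assumption made for the sketch, not a description of any particular implementation.

```python
from collections import OrderedDict

class LRUCache:
    """Least recently used (LRU) replacement: evict the entry untouched for the longest time."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()          # key: LBA, value: data

    def get(self, lba):
        if lba not in self.entries:
            return None                       # cache miss
        self.entries.move_to_end(lba)         # mark as most recently used
        return self.entries[lba]

    def put(self, lba, data):
        if lba in self.entries:
            self.entries.move_to_end(lba)
        self.entries[lba] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
```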
It should be noted that the foregoing only describes prefetching data from the hard disk into the cache; the method provided in the present application also applies to any two-level storage structure, for example prefetching data from memory into the CPU cache or from the hard disk into memory, which is not repeated here for brevity.
The specific training process of the first neural network model and the second neural network model will be described in detail below, and it should be understood that the neural network models selected in the present application include, but are not limited to, skip-gram models, CNN models, and RNN models. In addition, the first neural network model and the second neural network model may be the same type of model, for example, both are skip-gram models, or may be different types of models, for example, the first neural network model is a skip-gram model, and the second neural network model is an RNN model, which is not limited in this application.
For ease of understanding and explanation, the first neural network model and the second neural network model are both specifically described as skip-gram models.
Fig. 6 is a flowchart illustrating a first neural network model training method according to an embodiment of the present disclosure. As shown in fig. 6, the method includes, but is not limited to, the following steps:
S610: The computing device obtains a plurality of write requests.
Specifically, the obtaining unit 410 in the computing device obtains write requests generated by an application program within a period of time, records write time and write address corresponding to each write request, and temporarily stores the write time and the write address in the cache.
Illustratively, the computing device obtains a write request with write address a at time t0, a write request with write address b at time t1, a write request with write address c at time t2, and a write request with write address d at time t3. The computing device temporarily stores these 4 write requests in the cache in chronological order of write time, for example in the form [{t0, a}, {t1, b}, {t2, c}, {t3, d}].
S620: the computing device divides the plurality of write requests into a plurality of address sets of write data.
Specifically, consecutive write requests form a write request stream. It should be understood that if two write requests in the stream are too far apart, i.e. their write times differ too much, the association between their write addresses is weak and it can be determined that there is no association between them. Therefore, the sequence splitting unit 440 in the computing device needs to split the write request stream; there is no association between the resulting shorter write request streams, but the write addresses corresponding to the write requests within each stream are associated. Each segment is an address set of write data.
Further, the sequence splitting unit 440 in the computing device splits based on the write time of each write request: when the gap between the write times of two write requests exceeds a preset duration, the stream is split between them, so that the write time interval between two data items with adjacent write times in each address set is less than the preset duration. The preset duration may be set according to actual needs, which is not limited in this application.
For example, as shown in fig. 7, a write request with write address a is acquired at time t0, a write request with write address b at time t1, a write request with write address c at time t2, a write request with write address d at time t3, a write request with write address e at time t4, a write request with write address a at time t5, and a write request with write address d at time t6. The interval between t0 and t1 is 3 time units (e.g. 3 milliseconds), the interval between t1 and t2 is 4 time units, the interval between t2 and t3 is 5 time units, the interval between t3 and t4 is 11 time units, the interval between t4 and t5 is 2 time units, and the interval between t5 and t6 is 5 time units. With a preset duration of 10 time units, the interval between t3 and t4 (11 time units) exceeds 10 time units, so the stream is split into two address sets, namely (a, b, c, d) and (e, a, d). In addition, the address sets corresponding to the segmented write request streams form the training set.
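The splitting rule can be sketched as follows; the timestamps below are chosen only to reproduce the gaps of the worked example (3, 4, 5, 11, 2, 5 time units with a threshold of 10) and are not taken from the application.

```python
def split_by_time(requests, max_gap):
    """Split a (time, address) stream into address sets wherever the gap exceeds max_gap."""
    address_sets, current = [], []
    last_time = None
    for t, address in requests:
        if last_time is not None and t - last_time > max_gap:
            address_sets.append(current)   # gap too large: the association is considered broken
            current = []
        current.append(address)
        last_time = t
    if current:
        address_sets.append(current)
    return address_sets

stream = [(0, 'a'), (3, 'b'), (7, 'c'), (12, 'd'), (23, 'e'), (25, 'a'), (30, 'd')]
print(split_by_time(stream, max_gap=10))   # -> [['a', 'b', 'c', 'd'], ['e', 'a', 'd']]
```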
S630: the computing device inputs each address set as a training sample into the skip-gram model for training.
Specifically, the computing device inputs the address set corresponding to each write request stream into the skip-gram model for training in turn, continuously updating the parameters of the model, so that after training the skip-gram model has the ability to predict the address of the data accessed by a second read request, where the second read request is the read request that follows the current read request (i.e. the first read request).
Further, the skip-gram model mainly consists of a hidden layer and an output layer; all of its parameters are contained in the weight matrix W1 of the hidden layer and the weight matrix W2 of the output layer, and the essence of training is to update the parameters in W1 and W2. Before the skip-gram model is used, W1 and W2 are initialized; during initialization the parameters in W1 and W2 may be chosen at random, and they are then continuously updated by the subsequent training process.
It should be further noted that all the addresses mentioned above (including the addresses corresponding to read requests, the addresses corresponding to write requests, and so on) do not refer to the address of a specific piece of data but to the address of a data block. The size of a data block may be chosen as needed, for example 64 KB or 128 KB, and the address can be obtained by logically encoding the data blocks. In addition, the storage address space of each computing device is fixed and can be divided logically; for example, a storage address space of 1 GB divided into 64 KB blocks yields 16384 data blocks, each corresponding to one address (e.g. an LBA), i.e. the size of the address space formed by all the addresses is 16384.
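The block arithmetic in the example above can be checked with a few lines; the byte-offset-to-block mapping is an illustrative assumption.

```python
BLOCK_SIZE = 64 * 1024                 # 64 KB data blocks, as in the example above
ADDRESS_SPACE = 1 * 1024 ** 3          # 1 GB storage address space

num_blocks = ADDRESS_SPACE // BLOCK_SIZE
print(num_blocks)                      # -> 16384 addresses in total

def block_address(byte_offset):
    """Map a byte offset to the logical address of the data block containing it."""
    return byte_offset // BLOCK_SIZE
```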
For example, assuming that an address set obtained after splitting is (a, b, c, d, e), the address pairs [a, b], [a, c], [a, d], [a, e] are constructed, and the address pairs are input into the skip-gram model one at a time; the first address of each pair is the model input, and the second address is used to correct the address predicted and output by the model, i.e. to update the model parameters.
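The pair construction can be sketched as below; the final pair [d, e] is implied by the same rule even though the description only lists the pairs starting from a, b and c.

```python
def build_pairs(address_set):
    """Pair every address with each address that follows it in the set,
    e.g. (a, b, c, d, e) -> [a, b], [a, c], [a, d], [a, e], [b, c], ..."""
    pairs = []
    for i, center in enumerate(address_set):
        for context in address_set[i + 1:]:
            pairs.append((center, context))
    return pairs

print(build_pairs(['a', 'b', 'c', 'd', 'e']))
```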
After the address pairs are constructed, the pair [a, b] is trained first. The address a is encoded with a one-hot code; the essence of one-hot encoding is that there are as many bits as there are states, exactly one bit is 1 and all the other bits are 0. Here, the total number of states is the size of the storage address space, so when the storage address space is 1 GB and is divided into 64 KB blocks, one-hot encoding converts a into a 16384-dimensional vector, denoted x.
Then, x is passed to the hidden layer, which performs a dot-product operation of x with the weight matrix W1 to obtain the result h, i.e. h = x × W1. It should be understood that x is a 1 × 16384 vector and W1 may be a 16384 × k matrix, so h is a 1 × k vector, where the value of k may be set as desired. In general, k is relatively small, for example 5 or 10. The dot-product operation reduces the dimensionality of x, i.e. a can be represented by a lower-dimensional vector. It is easy to see that reducing the dimensionality greatly reduces the amount of computation and improves training efficiency.
Then, h is passed to the output layer, which performs a dot-product operation of h with the weight matrix W2 to obtain the result y, i.e. y = h × W2. y is the prediction of the skip-gram model; W2 is a k × 16384 matrix, so y is a 1 × 16384 vector.
After y is obtained, it is passed to the regression classifier for normalization using the softmax function, yielding y1; y1 represents the probabilities of the 16384 addresses predicted by the skip-gram model, and the sum of the probability values of all addresses in y1 equals 1.
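A minimal NumPy sketch of the forward pass just described. The random initialization, the seed and k = 10 are illustrative assumptions; addresses are represented by their block indices.

```python
import numpy as np

VOCAB = 16384          # size of the address space (number of data blocks)
K = 10                 # hidden dimension k, small as described above

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(VOCAB, K))   # hidden-layer weight matrix, randomly initialized
W2 = rng.normal(scale=0.01, size=(K, VOCAB))   # output-layer weight matrix, randomly initialized

def one_hot(index, size=VOCAB):
    x = np.zeros(size)
    x[index] = 1.0                   # exactly one bit is 1, the rest are 0
    return x

def forward(address_index):
    x = one_hot(address_index)       # 1 x 16384 input vector
    h = x @ W1                       # 1 x k hidden vector: h = x * W1
    y = h @ W2                       # 1 x 16384 raw scores: y = h * W2
    y1 = np.exp(y - y.max())
    y1 /= y1.sum()                   # softmax: probabilities of all addresses sum to 1
    return x, h, y1
```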
It should be understood that the predicted result is not completely matched with the actual result, i.e. there is an error between the predicted result and the actual result, and therefore, the weight matrices W1 and W2 need to be modified by using a back propagation algorithm.
Alternatively, after y1 is obtained, error back propagation may be performed to achieve the correction of W1 and W2 in the following two ways.
1. First, after y1 is obtained, the average error value v of y1 with respect to b, c, d and e is calculated. That is, the difference v1 between y1 and the vector corresponding to b, the difference v2 between y1 and the vector corresponding to c, the difference v3 between y1 and the vector corresponding to d, and the difference v4 between y1 and the vector corresponding to e are calculated; after v1, v2, v3 and v4 are obtained, the average error value is computed as v = (v1 + v2 + v3 + v4)/4.
Then, an increment Δ W2 of the weight matrix W2, i.e., Δ W2 — h × v, is calculated using h and v. W2 is corrected by the calculated Δ W2, that is, the weight matrix of the corrected output layer is W2+ α Δ W2, where α represents the learning rate and ranges from 0 to 1, and may be set as needed. An increment Δ W1 of the weight matrix W1, i.e., Δ W1 ═ x (W2 ×) is calculated using x, W2, and v. And correcting W1 by using the calculated delta W1, namely, the weight matrix of the corrected hidden layer is W1+ alpha delta W2.
2. After y1 is obtained, the error value of this training step is calculated, i.e., the difference v1 between y1 and the vector corresponding to b.
Then the increment ΔW2 of the weight matrix W2 is calculated from h and v1, i.e., ΔW2 = h × v1. W2 is corrected with the calculated ΔW2, i.e., the corrected output-layer weight matrix is W2 + αΔW2, where α is the learning rate, which ranges from 0 to 1 and can be set as needed. The increment ΔW1 of the weight matrix W1 is calculated from x, W2, and v1, i.e., ΔW1 = x × (W2 × v1). W1 is corrected with the calculated ΔW1, i.e., the corrected hidden-layer weight matrix is W1 + αΔW1.
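The following is a hedged sketch of the first (averaged-error) correction; way 2 differs only in using a single difference v1 instead of the average v. The error is taken here as (target one-hot) − y1 so that the "+ α × increment" update moves the prediction toward the targets; this sign convention, the learning-rate value, and the helper names are assumptions of the sketch rather than quotes from the patent.

```python
import numpy as np

NUM_BLOCKS, K, ALPHA = 16384, 10, 0.05

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step_average(W1, W2, center, targets, alpha=ALPHA):
    """One correction of W1 and W2 with `center` as the model input (way 1)."""
    x = np.zeros(NUM_BLOCKS)
    x[center] = 1.0
    h = x @ W1                                  # hidden vector, 1 x k
    y1 = softmax(h @ W2)                        # predicted probabilities

    # average error v over the targets (b, c, d, e of the address set)
    errors = []
    for t in targets:
        t_vec = np.zeros(NUM_BLOCKS)
        t_vec[t] = 1.0
        errors.append(t_vec - y1)
    v = np.mean(errors, axis=0)

    dW2 = np.outer(h, v)                        # "delta W2 = h x v", k x 16384
    dW1 = np.outer(x, W2 @ v)                   # "delta W1 = x x (W2 x v)"
    W2 += alpha * dW2                           # W2 <- W2 + alpha * delta W2
    W1 += alpha * dW1                           # W1 <- W1 + alpha * delta W1
    return float(np.abs(v).mean())              # scalar error for convergence checks

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.01, size=(NUM_BLOCKS, K))
W2 = rng.normal(scale=0.01, size=(K, NUM_BLOCKS))
train_step_average(W1, W2, center=0, targets=[1, 2, 3, 4])
```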
Next, the address pair [a, c] is trained; the specific process is the same as that described above and, for brevity, is not repeated. After a new output y1 is obtained, the difference v2 between y1 and the vector corresponding to c is calculated, and W1 and W2 are corrected again using h and v2.
After the corrections of W1 and W2 with a as the model input are complete, the address pairs [b, c], [b, d], [b, e] are constructed, and each pair is then input into the skip-gram model for training in turn. It should be understood that either of the two ways described above can be used for error back propagation to correct W1 and W2. The specific training process is the same as above and is not repeated.
Similarly, with c as the model input, the address pairs [c, d] and [c, e] are constructed and input into the skip-gram model for training, and either of the above ways is used to correct W1 and W2.
After training on one training sample in the training set (i.e., one address set) is complete, training proceeds to the next training sample to continue correcting (updating) W1 and W2; the specific training process is the same as above. It should be noted that the samples in the training set are not trained only once; they may need to be input into the skip-gram model many times. When the overall error on the entire training set is smaller than a preset threshold, that is, when the calculated error value (e.g., v1) or average error value (e.g., v) is smaller than the preset threshold for every address pair or address set in the training set, the skip-gram model has reached the convergence condition. At this point the model has the ability to predict the address corresponding to the second read request, and the trained skip-gram model can be used for prediction to support data prefetching.
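A hedged sketch of the outer training loop implied here; it reuses the train_step_average helper from the earlier sketch, and THRESHOLD, MAX_EPOCHS and training_set are illustrative names and values, not taken from the patent.

```python
THRESHOLD = 1e-4          # illustrative preset error threshold
MAX_EPOCHS = 50           # safety bound; samples are typically reused many times

def train_until_converged(W1, W2, training_set):
    for _ in range(MAX_EPOCHS):
        worst = 0.0
        for address_set in training_set:
            # each address in turn serves as the model input, as described above
            for i in range(len(address_set) - 1):
                center = address_set[i]
                targets = list(address_set[i + 1:])
                err = train_step_average(W1, W2, center, targets)
                worst = max(worst, err)
        if worst < THRESHOLD:     # overall error below the preset threshold
            break
    return W1, W2
```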
It should be noted that, during training on the training set and updating of W1 and W2, the number of occurrences of an identical address pair is positively correlated with the probability value the skip-gram model predicts for that pair. For example, if the pair [a, b] occurs many times and [a, c] occurs few times, the model will prefer b over c when predicting the address that follows a; in other words, when a is input, the probability the model computes for b appearing after a is much greater than the probability for c appearing after a.
Fig. 8 is a schematic flowchart of a method for training the second neural network model according to an embodiment of the present application. As shown in fig. 8, the method includes, but is not limited to, the following steps:
S810: A computing device obtains a plurality of read requests.
Specifically, for the implementation of S810, refer to the related description of S610 above.
S820: the computing device divides the plurality of read requests into a plurality of address sets of read data.
Specifically, for the implementation of S820, refer to the related description of S620 above.
S830: the computing device inputs each address set as a training sample into the skip-gram model for training.
Specifically, for the implementation of S830, refer to the related description of S630 above.
It should be understood that the first neural network model and the second neural network model differ only in their training sets: one is trained on address sets of written data and the other on address sets of read data. The specific training methods are the same and can be referred to each other, and are not repeated here.
The method of the embodiments of the present application has been described in detail above. To better implement the above aspects of the embodiments of the present application, related devices for implementing these aspects are correspondingly provided below.
Embodiments of the present application further provide a data prefetching system, such as the data prefetching system 400 in fig. 4, which is configured to execute the aforementioned data prefetching method. The present application does not limit how the functional units of the data prefetching system are divided; the units in the data prefetching system may be added, removed, or merged as needed. Fig. 4 exemplarily provides one division of functional units:
the data prefetch system includes an acquisition unit 410, a prediction unit 420, a processing unit 430, and a sequence partitioning unit 440.
Specifically, the obtaining unit 410 is configured to execute the foregoing steps S510, S610, and S810, and optionally execute an optional method in the foregoing steps to obtain the first read request.
The sequence segmentation unit 440 is configured to perform the foregoing steps S620 and S820, and optionally perform an optional method in the foregoing steps, and segment the acquired read request or write request to obtain a plurality of address sets as training samples.
The prediction unit 420 is configured to perform the foregoing steps S520, S630, and S830, and optionally perform an optional method in the foregoing steps, train the neural network model, and output probability values of a plurality of addresses by using the trained neural network model.
The processing unit 430 is configured to execute the foregoing step S530, and optionally execute an optional method in the foregoing step, and acquire N addresses from the output of the prediction unit 420 according to the probability values and store corresponding data in the cache.
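To make the cooperation between the prediction unit 420 and the processing unit 430 concrete, here is a hedged sketch of one possible flow; select_prefetch_addresses, prefetch, cache and read_block are illustrative names, and the top-N1/top-N2 selection mirrors the description of the two neural network models.

```python
import numpy as np

def select_prefetch_addresses(probs_first_model, probs_second_model, n1, n2):
    """Pick the top N1 and top N2 addresses by descending probability value."""
    top1 = np.argsort(probs_first_model)[::-1][:n1]
    top2 = np.argsort(probs_second_model)[::-1][:n2]
    # merge while preserving order and dropping duplicate addresses
    return list(dict.fromkeys(list(top1) + list(top2)))

def prefetch(addresses, cache, read_block):
    """Store the data of addresses that are not yet cached (the M addresses)."""
    for addr in addresses:
        if addr not in cache:
            cache[addr] = read_block(addr)   # load the data block into the cache
```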
The four units can transmit data to one another through a communication path. It should be understood that each unit included in the data prefetching system 400 may be a software unit, a hardware unit, or a combination of a software unit and a hardware unit.
Referring to fig. 9, fig. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application. As shown in fig. 9, the computing device 900 includes: a processor 910, a communication interface 920, and a memory 930, the processor 910, the communication interface 920, and the memory 930 being connected to each other by an internal bus 940. It should be understood that the computing device may be a database server.
The processor 910 may be formed by one or more general-purpose processors, such as a Central Processing Unit (CPU), or a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a Programmable Logic Device (PLD), or a combination thereof. The PLD may be a Complex Programmable Logic Device (CPLD), a field-programmable gate array (FPGA), a General Array Logic (GAL), or any combination thereof.
The bus 940 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 940 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 9, but this does not mean that there is only one bus or only one type of bus.
The memory 930 may include volatile memory, such as random access memory (RAM); the memory 930 may also include non-volatile memory, such as read-only memory (ROM), flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); the memory 930 may also include a combination of the above. The program code stored in the memory 930 may be used to implement the functional modules of the data prefetching system 400, or to implement the method steps performed by the computing device in the method embodiment shown in fig. 5.
The embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, may implement part or all of the steps of any one of the method embodiments described above, and implement the functions of any one of the functional modules described in fig. 4 above.
Embodiments of the present application also provide a computer program product, which when run on a computer or a processor, causes the computer or the processor to perform one or more steps of any of the methods described above. The respective constituent modules of the above-mentioned apparatuses may be stored in the computer-readable storage medium if they are implemented in the form of software functional units and sold or used as independent products.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It should be understood that references herein to "first", "second", "third", "fourth", and other numerical designations are merely for convenience of description and are not intended to limit the scope of the present application.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.
It should also be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The modules in the device can be merged, divided and deleted according to actual needs.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (14)

1. A method of data prefetching, comprising:
receiving a first read request;
inputting an address of data read by the first read request into a neural network model, wherein the neural network model outputs probability values of a plurality of addresses, each probability value represents the probability that the data corresponding to the respective address is the data read by a second read request, and the second read request is the read request next to the first read request;
and acquiring N addresses from the output of the neural network model according to the probability values, and storing data corresponding to the N addresses in a cache.
2. The method of claim 1, wherein the storing data corresponding to the N addresses in a cache comprises:
determining, among the N addresses, M addresses whose corresponding data are not in the cache, wherein M is a positive integer greater than or equal to 1;
and storing the data corresponding to the M addresses to the cache.
3. The method of claim 1, wherein the neural network model comprises a first neural network model and a second neural network model, wherein the first neural network model is used to train addresses for write data and the second neural network model is used to train addresses for read data;
the inputting the address of the data read by the first read request into a neural network model comprises:
inputting the address of the data read by the first read request into a first neural network model and a second neural network model respectively;
the obtaining N addresses from the output of the neural network model according to the probability values includes:
and acquiring the first N1 addresses from the output of the first neural network model in descending order of probability values, and acquiring the first N2 addresses from the output of the second neural network model in descending order of probability values, wherein the sum of N1 and N2 is equal to N.
4. The method of claim 3, wherein the method further comprises:
acquiring an address set of write-in data, wherein the write-in time interval of two data with adjacent write-in time in the address set is smaller than a preset value;
training addresses in a set of addresses through the first neural network model, so that after a first address in the set of addresses is input into the first neural network model, the probability of at least one address after the first address in the plurality of addresses output by the first neural network model is increased, wherein the first address is any one address in the set of addresses.
5. The method of claim 3, wherein the method further comprises:
acquiring an address set of read data, wherein the read time interval between two data with adjacent read times in the address set is smaller than a preset value;
training addresses in a set of addresses through the second neural network model, so that after a first address in the set of addresses is input into the second neural network model, the probability of at least one address after the first address in a plurality of addresses output by the second neural network model is increased, wherein the first address is any one address in the set of addresses.
6. The method of any one of claims 2-5, further comprising:
counting, within a preset duration, the number A of addresses obtained from the output of the first neural network model that hit in the cache and the number B of addresses obtained from the output of the second neural network model that hit in the cache;
adjusting the number of addresses N1 obtained from the output of the first neural network model and the number of addresses N2 obtained from the output of the second neural network model according to A and B.
7. A computing device, comprising:
a receiving unit configured to receive a first read request;
a prediction unit, configured to input an address of data read by the first read request into a neural network model, where the neural network model outputs a probability value of a plurality of addresses, where the probability value of each address represents a probability that data corresponding to each address is data read by a second read request, and the second read request is a read request next to the first read request; and acquiring N addresses from the output of the neural network model according to the probability value, and storing data corresponding to the N addresses in a cache.
8. The computing device of claim 7, wherein the prediction unit is further to:
determining, among the N addresses, M addresses whose corresponding data are not in the cache, wherein M is a positive integer greater than or equal to 1;
and storing the data corresponding to the M addresses to the cache.
9. The computing device of claim 7, wherein the neural network model comprises a first neural network model and a second neural network model, wherein,
the first neural network model is used for training addresses of write-in data, predicting the addresses of the data read by the first read request and outputting probability values of a plurality of addresses;
the second neural network model is used for training addresses of read data, predicting the addresses of the data read by the first read request and outputting probability values of a plurality of addresses;
the prediction unit is further configured to obtain the first N1 addresses from the output of the first neural network model in an order of decreasing probability values, and obtain the first N2 addresses from the output of the second neural network model in an order of decreasing probability values, where a sum of N1 and N2 is equal to N.
10. The computing device of claim 9,
the prediction unit is further configured to:
acquiring an address set of write-in data, wherein the write-in time interval of two data with adjacent write-in time in the address set is smaller than a preset value;
training the addresses in the address set, so that after a first address in the address set is input into the first neural network model, the probability of at least one address after the first address in a plurality of addresses output by the first neural network model is increased, wherein the first address is any one address in the address set.
11. The computing device of claim 9, wherein the prediction unit is further to:
acquiring an address set of read data, wherein the read time interval between two data with adjacent read times in the address set is smaller than a preset value;
training the addresses in the address set, so that after a first address in the address set is input into the second neural network model, the probability of at least one address after the first address in the address set in a plurality of addresses output by the second neural network model is increased, wherein the first address is any one address in the address set.
12. The computing device of any of claims 8-11,
the computing device further comprises a processing unit, configured to count a number a of addresses obtained from the output of the first neural network model that hit in the cache and a number B of addresses obtained from the output of the second neural network model that hit in the cache within a preset duration;
adjusting the number of addresses N1 obtained from the output of the first neural network model and the number of addresses N2 obtained from the output of the second neural network model according to A and B.
13. An intelligent chip, wherein the intelligent chip has instructions burned therein, and the intelligent chip executes the instructions to perform the method according to any one of claims 1 to 6.
14. A computing device comprising a processor, a memory having program instructions stored therein, the processor executing the instructions in the memory to perform the method of any of claims 1-6.
CN201910985051.7A 2019-10-16 2019-10-16 Data prefetching method and related equipment Pending CN112667528A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910985051.7A CN112667528A (en) 2019-10-16 2019-10-16 Data prefetching method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910985051.7A CN112667528A (en) 2019-10-16 2019-10-16 Data prefetching method and related equipment

Publications (1)

Publication Number Publication Date
CN112667528A true CN112667528A (en) 2021-04-16

Family

ID=75400455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910985051.7A Pending CN112667528A (en) 2019-10-16 2019-10-16 Data prefetching method and related equipment

Country Status (1)

Country Link
CN (1) CN112667528A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140351188A1 (en) * 2013-05-22 2014-11-27 Exablade Corporation Prefetch system and method
CN105094686A (en) * 2014-05-09 2015-11-25 华为技术有限公司 Data caching method, cache and computer system
CN106407133A (en) * 2015-07-30 2017-02-15 爱思开海力士有限公司 Memory system and operating method thereof
CN110018970A (en) * 2018-01-08 2019-07-16 腾讯科技(深圳)有限公司 Cache prefetching method, apparatus, equipment and computer readable storage medium
US10437718B1 (en) * 2018-04-27 2019-10-08 International Business Machines Corporation Computerized methods for prefetching data based on machine learned sequences of memory addresses

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
王准; 何元烈: "Cloud storage cache replacement scheme based on hybrid value computation", 计算机工程与设计 (Computer Engineering and Design), no. 06 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114065947A (en) * 2021-11-15 2022-02-18 深圳大学 Data access speculation method and device, storage medium and electronic equipment
CN114065947B (en) * 2021-11-15 2022-07-22 深圳大学 Data access speculation method and device, storage medium and electronic equipment
CN114116528A (en) * 2021-11-22 2022-03-01 深圳大学 Memory access address prediction method and device, storage medium and electronic equipment
CN114358179A (en) * 2021-12-31 2022-04-15 海光信息技术股份有限公司 Pre-fetch training method of processor, processing device, processor and computing equipment
CN114095574A (en) * 2022-01-20 2022-02-25 恒生电子股份有限公司 Data processing method and device, electronic equipment and storage medium
CN114095574B (en) * 2022-01-20 2022-04-29 恒生电子股份有限公司 Data processing method and device, electronic equipment and storage medium
CN115114190A (en) * 2022-07-20 2022-09-27 上海合见工业软件集团有限公司 SRAM data reading system based on prediction logic
CN116055429A (en) * 2023-01-17 2023-05-02 杭州鸿钧微电子科技有限公司 PCIE-based communication data processing method, PCIE-based communication data processing device, PCIE-based communication data processing equipment and storage medium
CN116955223A (en) * 2023-09-18 2023-10-27 浪潮电子信息产业股份有限公司 Data prefetching method, system, electronic equipment and computer storage medium
CN116955223B (en) * 2023-09-18 2024-01-23 浪潮电子信息产业股份有限公司 Data prefetching method, system, electronic equipment and computer storage medium

Similar Documents

Publication Publication Date Title
CN112667528A (en) Data prefetching method and related equipment
US11797185B2 (en) Solid-state drive control device and learning-based solid-state drive data access method
CN111143243B (en) Cache prefetching method and system based on NVM hybrid memory
CN110287010B (en) Cache data prefetching method oriented to Spark time window data analysis
JP7108784B2 (en) DATA STORAGE METHOD, DATA ACQUISITION METHOD, AND DEVICE
US10540355B1 (en) ACID database
US11928580B2 (en) Interleaving memory requests to accelerate memory accesses
US20200293907A1 (en) Learning device and learning method
JP2020030699A (en) Leaning device and leaning method
JP7095479B2 (en) Learning device and learning method
JP2020077066A (en) Learning device and method for learning
US9570125B1 (en) Apparatuses and methods for shifting data during a masked write to a buffer
JP7363145B2 (en) Learning device and learning method
CN113435601A (en) Data prefetching method and device and storage device
JP7211020B2 (en) Learning device and learning method
Yu et al. Efficient Large-Capacity Caching in Cloud Storage Using Skip-Gram-Based File Correlation Analysis
US12001237B2 (en) Pattern-based cache block compression
CN117677941A (en) Data compression and deduplication aware layering in a storage system
JP2020077067A (en) Learning device and method for learning
CN116737607B (en) Sample data caching method, system, computer device and storage medium
JP7200766B2 (en) Learning device and learning method
JP7200765B2 (en) Learning device and learning method
CN117933270B (en) Large language model long text output method, device, equipment and storage medium
US20230342300A1 (en) Data eviction method and apparatus, cache node, and cache system
CN115098408A (en) Data pre-access method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination