WO2020156259A1 - Memory management method and device, mobile terminal, and storage medium - Google Patents


Info

Publication number
WO2020156259A1
Authority
WO
WIPO (PCT)
Prior art keywords
memory
allocated
unit
block
tensor
Prior art date
Application number
PCT/CN2020/072876
Other languages
French (fr)
Chinese (zh)
Inventor
Chen Yan (陈岩)
Liu Yaoyong (刘耀勇)
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Publication of WO2020156259A1 publication Critical patent/WO2020156259A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • This application relates to the field of communication technology, and in particular to a memory management method, device, mobile terminal and storage medium.
  • The current neural network algorithm frameworks (for example, TensorFlow Lite) can support Android and iOS.
  • For the tensors (Tensor) of a complex neural network, the required memory sizes vary. Frequent allocation and release of small memory blocks will leave memory fragments in the overall memory block, and these memory fragments cannot be reused, resulting in wasted memory.
  • the embodiments of the present application provide a memory management method, device, mobile terminal, and storage medium, which can avoid memory waste under the neural network algorithm framework.
  • an embodiment of the present application provides a memory management method based on a neural network algorithm framework.
  • the neural network algorithm framework includes a plurality of tensor units, and the method includes:
  • an embodiment of the present application provides a memory management device, the memory management device is applied to a neural network algorithm framework, the neural network algorithm framework includes a plurality of tensor units, and the memory management device includes:
  • a receiving unit configured to receive a memory application request sent by a first Tensor unit, the memory application request carrying memory space to be applied for, and the first Tensor unit is any one of the multiple tensor Tensor units;
  • the detecting unit is configured to detect whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory;
  • the memory arranging unit is configured to perform a memory arranging operation on the allocated memory when the detection unit detects that the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block of the allocated memory.
  • an embodiment of the present application provides a mobile terminal, including a processor and a memory, the memory is configured to store one or more programs, and the one or more programs are configured to be executed by the processor.
  • the program includes instructions for executing the steps in the first aspect of the embodiments of the present application.
  • An embodiment of the present application provides a computer-readable storage medium, wherein the foregoing computer-readable storage medium stores a computer program for electronic data exchange, and the foregoing computer program enables a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • The embodiments of the present application provide a computer program product, wherein the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
  • the computer program product may be a software installation package.
  • In the memory management method based on the neural network algorithm framework described in the embodiments of this application, the framework includes a memory management unit and a plurality of tensor Tensor units. The memory management unit receives a memory application request sent by the first Tensor unit, where the memory application request carries the memory space that needs to be applied for, and the first Tensor unit is any one of the multiple tensor Tensor units. The memory management unit detects whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory; if the memory space that needs to be applied for is greater than that capacity, the memory management unit performs a memory arranging operation on the allocated memory.
  • By performing a memory arranging operation, the memory block vacated in the allocated memory can meet the memory application requirements of the Tensor unit. This avoids frequently applying to the operating system for larger memory, avoids memory waste under the neural network algorithm framework, and saves the time required for frequent application and release of memory from the operating system.
  • Figure 1 is a neural network algorithm framework disclosed in an embodiment of the application.
  • FIG. 2 is a schematic flowchart of a memory management method based on a neural network algorithm framework disclosed in an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another memory management method based on a neural network algorithm framework disclosed in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a memory organization disclosed in an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a memory management device disclosed in an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of a mobile terminal disclosed in an embodiment of the present application.
  • the mobile terminals involved in the embodiments of this application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices or other processing devices connected to wireless modems, as well as various forms of user equipment (User Equipment, UE), mobile station (Mobile Station, MS), terminal device (terminal device), etc.
  • FIG. 1 is a neural network algorithm framework disclosed in an embodiment of the present application.
  • the neural network algorithm framework includes a memory management unit and a plurality of tensor units.
  • A Tensor unit may also be called a Tensor or a tensor.
  • Tensor unit is used for inference calculation of neural network.
  • the Tensor unit can apply for memory from the memory management unit, and the Tensor unit can also release memory from the memory management unit.
  • The neural network algorithm framework of this application can be applied to mobile terminals such as mobile phones and tablet computers, where memory resources are relatively scarce.
  • the Tensor unit in the neural network algorithm framework requires frequent memory application and memory release.
  • When applying for memory from the operating system, the neural network algorithm framework used in this application first applies for one large block of memory. This large block is then managed by the memory management unit, and each Tensor unit only needs to apply for and release memory through the memory management unit. There is no need to frequently apply for and release memory from the operating system, which shortens the network inference time of the neural network algorithm framework.
  • the neural network algorithm framework of this application can be TensorFlow or TensorFlow Lite.
  • TensorFlow is a framework for training and running neural network models that runs on a personal computer (PC).
  • TensorFlow Lite is a framework for training and running neural network models that runs on a mobile terminal.
  • The mobile terminal can run the iOS system or the Android system.
  • FIG. 2 is a schematic flowchart of a memory management method based on a neural network algorithm framework disclosed in an embodiment of the present application. As shown in FIG. 2, the memory management method based on a neural network algorithm framework includes the following steps.
  • the memory management unit receives a memory application request sent by a first Tensor unit, where the memory application request carries memory space that needs to be applied for, and the first Tensor unit is any one of a plurality of tensor Tensor units.
  • the neural network algorithm framework includes a memory management unit and a plurality of tensor units.
  • When the neural network algorithm framework performs neural network calculations, the Tensor units need to frequently apply for and release memory. When a Tensor unit applies for memory, it applies directly to the memory management unit, without frequently applying for and releasing memory from the operating system, which shortens the network inference time of the neural network algorithm framework.
  • the memory management unit detects whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory.
  • the memory management unit performs a memory arranging operation on the allocated memory.
  • the allocated memory is a large block of memory allocated by the operating system of the mobile terminal to the neural network computing framework for neural network calculation and inference.
  • This large block of memory is specifically used for neural network calculations and inferences.
  • the memory management unit can apply to the operating system for the allocated memory according to the number of Tensor units of the neural network calculation framework.
  • the size of the allocated memory is related to the number of Tensor units of the neural network calculation framework. Generally speaking, the larger the number of Tensor units of the neural network calculation framework, the larger the allocated memory.
  • This large block of memory is memory with consecutive addresses.
  • the allocated memory can include many small memory blocks, and these memory blocks can be in an occupied state or a blank state. If the memory block is in the occupied state, the memory block is an occupied memory block; if the memory block is in the blank state, the memory block is a blank memory block.
  • When the memory management unit receives a release request for an occupied memory block, it releases that block: the occupied memory block becomes a blank memory block, and the state of the memory block changes from occupied to blank.
  • For blank memory blocks, when the memory management unit receives a memory application request from a Tensor unit, it selects a suitable blank memory block and allocates it to the Tensor unit. The blank memory block then becomes an occupied memory block, and its state changes from blank to occupied.
  • When the memory management unit receives the memory application request of a Tensor unit, it selects a suitable blank memory block and allocates it to the Tensor unit, specifically:
  • The memory management unit selects, from the available blank memory blocks of the allocated memory, the smallest blank memory block that can hold the memory space that needs to be applied for, and allocates it to the Tensor unit. If no such blank memory block exists, the memory management unit performs a memory arranging operation on the allocated memory.
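  • The best-fit selection just described can be illustrated with a short Python sketch (our own illustration, not the patent's implementation), where a blank block is modeled as a (start, length) pair:

```python
from typing import List, Optional, Tuple

# A blank block is modeled as (start_address, length).
Block = Tuple[int, int]

def best_fit(blank_blocks: List[Block], needed: int) -> Optional[Block]:
    """Pick the smallest blank block that can hold `needed` cells.

    Returns None when no blank block is large enough, in which case the
    memory management unit falls back to the memory arranging operation.
    """
    candidates = [b for b in blank_blocks if b[1] >= needed]
    if not candidates:
        return None
    return min(candidates, key=lambda b: b[1])
```

  • For example, with blank blocks of lengths 5, 9, 9, 9 and 4, a request of size 8 selects one of the length-9 blocks, while a request of size 20 returns None and triggers the arranging operation.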
  • The memory management unit performs a memory arranging operation on the allocated memory, specifically: the memory management unit compresses the gaps between the occupied memory blocks of the allocated memory to increase the capacity of the available blank memory blocks of the allocated memory.
  • the memory management unit may compress the occupied memory block of the allocated memory in one direction, so as to make a larger blank memory block available in the other direction.
  • the memory management unit may also divide the occupied memory blocks of the allocated memory into two groups, one group compresses in the first direction, and the other group compresses in the direction opposite to the first direction, so that A larger blank memory block is left in the middle of the allocated memory.
  • After the memory arranging operation is performed, the memory block vacated in the allocated memory can meet the memory application requirements of the Tensor unit, avoiding frequent requests to the operating system for larger memory, avoiding memory waste under the neural network algorithm framework, and saving the time required for frequent application and release of memory from the operating system.
  • FIG. 3 is a schematic flowchart of another memory management method based on a neural network algorithm framework disclosed in an embodiment of the present application.
  • FIG. 3 is further optimized on the basis of FIG. 2.
  • the memory management method based on the neural network algorithm framework includes the following steps.
  • the memory management unit receives a memory application request sent by a first Tensor unit, where the memory application request carries memory space that needs to be applied for, and the first Tensor unit is any one of a plurality of tensor Tensor units.
  • the memory management unit detects whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory.
  • the memory management unit performs a memory arranging operation on the allocated memory.
  • For step 301 to step 303 in the embodiment of the present application, please refer to step 201 to step 203 shown in FIG. 2; details are not repeated here.
  • the memory management unit performs a memory arranging operation on the allocated memory, which may specifically be:
  • The memory management unit compresses the occupied memory blocks in the allocated memory toward the first memory block of the allocated memory, so that the memory gap between adjacent memory blocks in the allocated memory is smaller than a preset threshold;
  • The memory management unit adjusts the memory indexes corresponding to the occupied memory blocks in the allocated memory, where the first memory index includes the starting memory address of the first memory block and the memory length of the first memory block, the first memory block is any one of the occupied memory blocks, and the first memory index is the memory index corresponding to the first memory block.
  • the memory index can be represented by a structure pointer.
  • the memory index includes the starting memory address of the corresponding occupied memory block and the memory length of the occupied memory block. Among them, the memory length can also be called the memory size.
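  • As an illustrative sketch (the field and property names are our own), a memory index pairing the starting address with the memory length can be modeled as:

```python
from dataclasses import dataclass

@dataclass
class MemoryIndex:
    """One entry of the arena's book-keeping for an occupied block."""
    start: int   # starting memory address (offset into the allocated memory)
    length: int  # memory length, i.e. the size of the block

    @property
    def end(self) -> int:
        """Exclusive end address; the last occupied cell is end - 1."""
        return self.start + self.length
```

  • For "Memory 1" in the example below (start 5, length 16), the index yields an exclusive end address of 21, i.e. the block covers cells 5-20.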
  • The memory gap between adjacent memory blocks refers to the size of the unusable blank memory between adjacent occupied memory blocks.
  • FIG. 4 is a schematic diagram of a memory organization disclosed in an embodiment of the present application.
  • the allocated memory is a whole block of continuous memory that the memory management unit applies to the operating system.
  • "Memory 1", “Memory 2", “Memory 3", and “Memory 4" indicate the memory location in the allocated memory, and are occupied memory blocks.
  • the locations not occupied by "Memory 1", “Memory 2", “Memory 3", and “Memory 4" are empty memory blocks.
  • the 0-3 in the memory index indicates the order of the memory blocks, and the memory index can use a structure pointer to indicate the starting address of each memory block and the length of memory occupied.
  • the allocated memory can be numbered from 0-99, "Memory 1" is the memory block numbered 5-20, “Memory 2" is the memory block numbered 30-45, and “Memory 3" is the number The memory block numbered 55-70, “Memory 4" is the memory block numbered 80-95.
  • The memory index corresponding to "Memory 1" includes the starting memory address of "Memory 1" (for example, 5) and its memory length (for example, 16); the memory index corresponding to "Memory 2" includes the starting memory address of "Memory 2" (for example, 30) and its memory length (for example, 16); the memory index corresponding to "Memory 3" includes the starting memory address of "Memory 3" (for example, 55) and its memory length (for example, 16); and the memory index corresponding to "Memory 4" includes the starting memory address of "Memory 4" (for example, 80) and its memory length (for example, 16).
  • the empty memory blocks of the allocated memory are the first empty memory block numbered 0-4, the second empty memory block numbered 21-29, the third empty memory block numbered 46-54, The fourth blank memory block numbered 71-79, and the fifth blank memory block numbered 96-99.
  • The memory length of the first blank memory block is 5, the memory length of the second blank memory block is 9, the memory length of the third blank memory block is 9, the memory length of the fourth blank memory block is 9, and the memory length of the fifth blank memory block is 4. If the requested memory size of the first Tensor unit is 20, there is no blank memory block in the allocated memory that meets the requirement.
  • After the memory arranging operation, "Memory 1" becomes the memory block numbered 1-16, "Memory 2" becomes the memory block numbered 18-33, "Memory 3" becomes the memory block numbered 35-50, and "Memory 4" becomes the memory block numbered 52-67.
  • The blank memory block in the allocated memory is now numbered 68-99 and is located at the end of the allocated memory; it can also be called the last blank memory block. Its size is 32, which can fully satisfy the application requirement of the first Tensor unit, so the memory block numbered 69-88 within this blank memory block can be allocated to the first Tensor unit.
  • the starting memory address in the memory index corresponding to "Memory 1" changes from 5 to 1
  • the starting memory address in the memory index corresponding to "Memory 2" changes from 30 to 18
  • the starting memory address in the memory index corresponding to "Memory 3" changes from 55 to 35
  • the starting memory address in the memory index corresponding to "Memory 4" changes from 80 to 52.
  • the memory length in the memory index corresponding to "Memory 1", “Memory 2", “Memory 3", and "Memory 4" will not change due to memory consolidation.
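  • The arranging step in the FIG. 4 example can be reproduced with a short sketch (a sketch under the assumption that the preset gap threshold is 1 cell, which matches the numbering above):

```python
from typing import List, Tuple

def compact(blocks: List[Tuple[int, int]], gap: int = 1) -> List[Tuple[int, int]]:
    """Slide occupied blocks (start, length) toward the front of the
    allocated memory, leaving `gap` cells before each block, and return
    the adjusted (start, length) indices.  Memory lengths never change."""
    arranged, cursor = [], gap
    for _, length in sorted(blocks, key=lambda b: b[0]):
        arranged.append((cursor, length))
        cursor += length + gap
    return arranged

before = [(5, 16), (30, 16), (55, 16), (80, 16)]   # Memory 1-4 before arranging
print(compact(before))  # [(1, 16), (18, 16), (35, 16), (52, 16)]
```

  • After arranging, "Memory 4" ends at cell 67, so the last blank memory block spans cells 68-99 (capacity 32), matching the example.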
  • the memory management unit detects whether the memory space that needs to be applied for is less than or equal to the capacity of the last blank memory block of the allocated memory.
  • the memory management unit allocates the last blank memory block to the first Tensor unit.
  • When the memory management unit allocates the last blank memory block to the first Tensor unit, it can either allocate the entire last blank memory block to the first Tensor unit or allocate one segment of the continuous memory space of the last blank memory block to the first Tensor unit; which one applies depends on the memory space that needs to be applied for and the capacity of the last blank memory block.
  • If the memory space that needs to be applied for is close to the capacity of the last blank memory block, the entire last blank memory block can be allocated to the first Tensor unit; if the memory space that needs to be applied for is much smaller than the capacity of the last blank memory block of the allocated memory, one segment of the continuous memory space of the last blank memory block is allocated to the first Tensor unit (specifically, a segment of continuous memory space of the last blank memory block adjacent to the already-allocated memory blocks can be allocated to the first Tensor unit).
  • The memory arranging operation compresses the occupied memory blocks in the allocated memory toward the first memory block of the allocated memory, thereby producing the last blank memory block, whose capacity is greater than the capacity of any blank memory block before the memory arranging was performed. If the required memory space is less than or equal to the capacity of the last blank memory block of the allocated memory, the memory management unit allocates the last blank memory block to the first Tensor unit. After the memory is arranged, larger blank memory blocks can be obtained, avoiding frequent requests to the operating system for larger memory, avoiding memory waste under the neural network algorithm framework, and saving the time required for frequent application and release of memory from the operating system.
  • the memory management unit allocates the last blank memory block to the first Tensor unit, specifically:
  • the memory management unit sends the memory index corresponding to the last blank memory block to the first Tensor unit.
  • When allocating memory for the first Tensor unit, the memory management unit only needs to send the memory index corresponding to the last blank memory block to the first Tensor unit; the first Tensor unit can then use that memory index to find the start memory address and the termination memory address of the last blank memory block, and store the content corresponding to the first Tensor unit in the last blank memory block.
  • The termination memory address of the last blank memory block can be determined from the start memory address and the memory length of the last blank memory block included in the memory index, thereby obtaining both the start memory address and the termination memory address of the last blank memory block. It can be seen that a memory application within the allocated memory only needs to return a memory index; compared with applying for memory from the operating system, this greatly saves memory allocation time and thus shortens the network inference time of the neural network algorithm framework.
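  • Continuing the FIG. 4 numbers, carving the request out of the last blank block might look like the following sketch (the function name and the 1-cell gap are our assumptions, not the patent's):

```python
from typing import Optional, Tuple

Block = Tuple[int, int]  # (start_address, memory_length)

def allocate_from_last_blank(last_blank: Block, needed: int,
                             gap: int = 1) -> Tuple[Optional[Block], Optional[Block]]:
    """Carve `needed` cells out of the last blank block.

    Returns (index_for_tensor_unit, remaining_blank).  The index is None
    when the block is too small (the arena must then be grown); the
    remainder is None when the whole block is handed out.
    """
    start, length = last_blank
    if needed > length:
        return None, last_blank
    if needed == length:
        return (start, length), None
    # Place the new block adjacent to the occupied region, after `gap` cells.
    alloc = (start + gap, needed)
    remaining = (start + gap + needed, length - needed - gap)
    return alloc, remaining
```

  • With the last blank block (68, 32) and a request of 20, this yields the index (69, 20), i.e. the cells 69-88 allocated to the first Tensor unit in the example above.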
  • the method flow shown in FIG. 3 may further include the following steps:
  • The memory management unit applies to the operating system to allocate a target large block of memory, and the target large block of memory is greater than or equal to the sum of the memory size of the allocated memory and the memory space that needs to be applied for.
  • the memory management unit copies the content stored in the allocated memory to the target large block of memory, and releases the allocated memory.
  • The memory management unit needs to apply to the operating system again to allocate a block of memory larger than the currently allocated memory, so as to meet the computational requirements of the neural network algorithm framework.
  • the size of the target block of memory to be reapplied is related to the number of Tensor units of the neural network algorithm framework and the algorithm complexity of the neural network algorithm framework. Generally speaking, the larger the number of Tensor units of the neural network algorithm framework, the higher the algorithm complexity of the neural network algorithm framework, and the larger the target block memory for reapplication.
  • After re-applying for the target large block of memory, the memory management unit copies the content stored in the allocated memory to the target large block of memory and releases the allocated memory. If a Tensor unit subsequently needs to apply for memory, it can apply to the memory management unit, which selects a blank memory block from the target large block of memory and allocates it to the Tensor unit.
  • When the allocated memory managed by the memory management unit cannot meet the calculation requirements of the neural network algorithm framework, the memory management unit re-applies to the operating system to allocate a larger target block of memory to satisfy those requirements. The target large block of memory can meet the memory application requirements of all Tensor units of the neural network algorithm framework, with no need to frequently apply for memory from the operating system, which saves the time required for frequent application and release of memory from the operating system.
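  • The grow-and-copy step can be sketched as follows; this is an illustrative Python sketch in which a bytearray stands in for the OS-allocated block:

```python
def grow_arena(arena: bytearray, needed: int) -> bytearray:
    """Apply for a target large block at least the size of the current
    arena plus the pending request, copy the old contents across, and
    let the old arena be released (Python's garbage collector stands in
    for the explicit release back to the operating system)."""
    target = bytearray(len(arena) + needed)  # new target large block
    target[:len(arena)] = arena              # copy allocated-memory contents
    return target                            # old arena is now released
```

  • The target size here equals exactly the sum of the current allocated memory and the requested space, the lower bound given in the text; a real implementation might over-allocate to amortize future growth.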
  • the method flow shown in FIG. 3 may further include the following steps:
  • the memory management unit receives the memory release request sent by the second Tensor unit.
  • The memory release request carries the memory index corresponding to the memory block that needs to be released, and the memory management unit marks that memory index as blank. The second Tensor unit is any one of the multiple tensor Tensor units.
  • When releasing memory for the second Tensor unit, the memory management unit only needs to mark as blank the memory index corresponding to the memory block to be released that is carried in the memory release request sent by the second Tensor unit, and the memory block is freed. Compared with releasing memory to the operating system, this greatly saves memory release time and shortens the network inference time of the neural network algorithm framework.
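  • Release-by-marking can be sketched in a few lines (class and field names are our own invention for illustration):

```python
class SimpleMemoryArena:
    """Sketch of release-by-marking: freeing a block only flips the state
    stored on its memory index; nothing is returned to the OS."""

    def __init__(self) -> None:
        # memory-index id -> {"start": ..., "length": ..., "state": ...}
        self.indices = {}

    def release(self, index_id: int) -> None:
        """Mark the index blank; the block is immediately reusable."""
        self.indices[index_id]["state"] = "blank"
```

  • Because release is a single state change on the index, it costs constant time, which is the source of the speedup over returning memory to the operating system.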
  • the method flow shown in FIG. 3 may further include the following steps:
  • the memory management unit records the memory size occupied by the first Tensor unit through the application programming interface API for memory management of the allocated memory;
  • the memory management unit receives the memory occupation query instruction for the first program, and obtains the memory size occupied by all the Tensor units used by the first program through the application programming interface API for memory management of the allocated memory.
  • A memory management unit (Simple Memory Arena) is used to uniformly manage memory; by adding records to the application and release application programming interfaces (APIs), the memory usage of the entire program can be recorded, which makes debugging very easy.
  • The memory management unit can use the Profile program debugging tool to view the memory occupied by each memory application module (for example, a Tensor unit) for performance tuning, and can check whether any memory was applied for unnecessarily and can be optimized away.
  • the memory management method of this application is easy to expand, and the memory occupied by the program can be checked to determine whether the memory is too large, whether there is a memory leak, and the memory can be optimized.
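  • The record-on-apply/record-on-release bookkeeping described above can be sketched as follows (class and method names are our own assumptions):

```python
from typing import Dict, Iterable

class ArenaProfiler:
    """Record every application/release that passes through the arena's
    API so the footprint of each module (e.g. each Tensor unit) can be
    queried for debugging and performance tuning."""

    def __init__(self) -> None:
        self.usage: Dict[str, int] = {}  # module name -> bytes currently held

    def on_apply(self, module: str, size: int) -> None:
        self.usage[module] = self.usage.get(module, 0) + size

    def on_release(self, module: str, size: int) -> None:
        self.usage[module] = self.usage.get(module, 0) - size

    def query(self, modules: Iterable[str]) -> int:
        """Total memory held by the given modules (e.g. all Tensor
        units used by one program)."""
        return sum(self.usage.get(m, 0) for m in modules)
```

  • A query over all Tensor units used by one program then answers the memory occupation query instruction with a single sum.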
  • the mobile terminal includes hardware structures and/or software modules corresponding to each function.
  • The present invention can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer-software-driven hardware depends on the specific application and the design constraints of the technical solution. A person skilled in the art may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of the present invention.
  • the embodiment of the present application may divide the mobile terminal into functional units according to the foregoing method examples.
  • each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit. It should be noted that the division of units in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
  • FIG. 5 is a schematic structural diagram of a memory management device disclosed in an embodiment of the present application. As shown in FIG. 5, the memory management device is applied to a neural network algorithm framework.
  • the neural network algorithm framework includes a plurality of tensor units.
  • The memory management device 500 includes a receiving unit 501, a detection unit 502, and a memory arranging unit 503:
  • the receiving unit 501 is configured to receive a memory application request sent by a first Tensor unit, where the memory application request carries memory space to be applied for, and the first Tensor unit is any one of the multiple tensor Tensor units;
  • the detecting unit 502 is configured to detect whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory;
  • the memory arranging unit 503 is configured to perform a memory arranging operation on the allocated memory when the detecting unit 502 detects that the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block of the allocated memory.
  • the memory management apparatus 500 may further include a memory allocation unit 504.
  • the detecting unit 502 is further configured to detect whether the memory space that needs to be applied for is less than or equal to the capacity of the last blank memory block of the allocated memory after the memory arranging unit 503 performs a memory arranging operation on the allocated memory;
  • The memory allocation unit 504 is configured to allocate the last blank memory block to the first Tensor unit in the case that the detection unit 502 detects that the memory space that needs to be applied for is less than or equal to the capacity of the last blank memory block of the allocated memory.
  • the memory management apparatus 500 may further include a memory application unit 505 and a memory release unit 506.
  • the memory application unit 505 is configured to apply to the operating system for the allocation of a target large block of memory when the detection unit 502 detects that the memory space that needs to be applied for is greater than the capacity of the last blank memory block of the allocated memory.
  • the target large block of memory is greater than or equal to the sum of the memory size of the allocated memory and the memory space that needs to be applied for;
  • the memory release unit 506 is configured to copy the content stored in the allocated memory to the target large block of memory, and release the allocated memory.
  • the memory arranging unit 503 performs a memory arranging operation on the allocated memory, specifically by: compressing the occupied memory blocks in the allocated memory toward the first memory block of the allocated memory, so that the memory gap between adjacent memory blocks in the allocated memory is smaller than a preset threshold; and adjusting the memory indexes corresponding to the occupied memory blocks in the allocated memory; wherein a first memory index includes the starting memory address of a first memory block and the memory length of the first memory block, the first memory block is any one of the occupied memory blocks, and the first memory index is the memory index corresponding to the first memory block.
  • the memory allocation unit 504 allocates the last blank memory block to the first Tensor unit, specifically: sending a memory index corresponding to the last blank memory block to the first Tensor unit.
  • the receiving unit 501 is further configured to receive a memory release request sent by a second Tensor unit, where the memory release request carries the memory index corresponding to the memory block to be released, the memory index corresponding to the memory block to be released is marked as a blank state, and the second Tensor unit is any one of the multiple tensor Tensor units.
  • the memory management apparatus 500 may further include a recording unit 507 and an acquiring unit 508.
  • the recording unit 507 is configured to record the size of the memory occupied by the first Tensor unit through the application programming interface (API) for memory management of the allocated memory;
  • the receiving unit 501 is further configured to receive a memory occupation query instruction for a first program;
  • the obtaining unit 508 is configured to obtain the memory size occupied by all Tensor units used by the first program through the application programming interface (API) for memory management of the allocated memory.
  • the memory management device 500 may be the memory management unit in FIGS. 1 to 4.
  • the receiving unit 501 in FIG. 5 may be a communication interface;
  • the detection unit 502, the memory arranging unit 503, the memory allocation unit 504, the memory application unit 505, the memory release unit 506, the recording unit 507, and the acquisition unit 508 may be processors.
  • the memory management device shown in FIG. 5 may further include a storage unit, which may be a memory (for example, a non-volatile memory).
  • FIG. 6 is a schematic structural diagram of a mobile terminal disclosed in an embodiment of the present application.
  • the mobile terminal 600 includes a processor 601 and a memory 602.
  • the mobile terminal 600 may also include a bus 603.
  • the processor 601 and the memory 602 may be connected to each other through the bus 603.
  • the bus 603 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the bus 603 can be divided into an address bus, a data bus, a control bus, and so on. For ease of representation, only one thick line is used in FIG. 6, but it does not mean that there is only one bus or one type of bus.
  • the mobile terminal 600 may also include an input and output device 604, and the input and output device 604 may include a display screen, such as a liquid crystal display screen.
  • the memory 602 is used to store one or more programs containing instructions; the processor 601 is used to call the instructions stored in the memory 602 to execute some or all of the method steps in FIGS. 2 to 3.
  • An embodiment of the present application also provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute part or all of the steps of any memory management method based on a neural network algorithm framework described in the above method embodiments.
  • the embodiments of the present application also provide a computer program product.
  • the computer program product includes a non-transitory computer-readable storage medium storing a computer program.
  • the computer program is operable to cause a computer to execute part or all of the steps of any memory management method based on the neural network algorithm framework described in the above method embodiments.
  • the disclosed device may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • if the integrated unit is implemented in the form of a software functional unit and is sold or used as an independent product, it can be stored in a computer-readable memory.
  • the technical solution of the present invention, in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method described in each embodiment of the present invention.
  • the aforementioned memory includes: a USB flash drive, read-only memory (ROM), random access memory (RAM), a mobile hard disk, a magnetic disk, an optical disk, or other media that can store program code.
  • the program can be stored in a computer-readable memory, and the memory can include: a flash disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Memory System (AREA)

Abstract

Embodiments of the present application disclose a memory management method and device, a mobile terminal, and a storage medium. The method comprises: receiving a memory application request sent by a first tensor unit, the memory application request carrying a required memory space, and the first tensor unit being any one of a plurality of tensor units; detecting whether or not the required memory space is less than or equal to the capacity of a current largest available blank memory block in an already-allocated memory; and if not, performing a memory reorganization operation on the already-allocated memory. The embodiments of the present application are employed in a neural network algorithm framework to prevent wastage of memory.

Description

Memory management method, device, mobile terminal and storage medium
Technical Field
This application relates to the field of communication technology, and in particular to a memory management method, device, mobile terminal and storage medium.
Background
Current neural network algorithm frameworks (for example, TensorFlow Lite) can support Android and iOS. When such a framework allocates memory, a complex neural network has many tensors (Tensors) whose memory requirements vary in size; frequent allocation and release of small memory blocks therefore leaves memory fragments scattered across the whole memory block, and these fragments cannot be reused, resulting in wasted memory.
Summary of the Invention
The embodiments of the present application provide a memory management method, device, mobile terminal, and storage medium, which can avoid memory waste under a neural network algorithm framework.
In a first aspect, an embodiment of the present application provides a memory management method based on a neural network algorithm framework, the neural network algorithm framework including a plurality of tensor (Tensor) units, and the method including:
receiving a memory application request sent by a first Tensor unit, where the memory application request carries the memory space that needs to be applied for, and the first Tensor unit is any one of the plurality of Tensor units;
detecting whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory;
if the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block of the allocated memory, performing a memory arranging operation on the allocated memory.
In a second aspect, an embodiment of the present application provides a memory management device applied to a neural network algorithm framework, the neural network algorithm framework including a plurality of Tensor units, and the memory management device including:
a receiving unit, configured to receive a memory application request sent by a first Tensor unit, where the memory application request carries the memory space that needs to be applied for, and the first Tensor unit is any one of the plurality of Tensor units;
a detecting unit, configured to detect whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory;
a memory arranging unit, configured to perform a memory arranging operation on the allocated memory when the detecting unit detects that the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block of the allocated memory.
In a third aspect, an embodiment of the present application provides a mobile terminal, including a processor and a memory, where the memory is configured to store one or more programs, the one or more programs are configured to be executed by the processor, and the programs include instructions for executing the steps in the first aspect of the embodiments of the present application.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program enables a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application.
In a fifth aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
It can be seen that in the memory management method based on a neural network algorithm framework described in the embodiments of this application, the neural network algorithm framework includes a memory management unit and a plurality of Tensor units. The memory management unit receives a memory application request sent by a first Tensor unit, where the memory application request carries the memory space that needs to be applied for, and the first Tensor unit is any one of the plurality of Tensor units; the memory management unit detects whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory; and if the memory space that needs to be applied for is greater than that capacity, the memory management unit performs a memory arranging operation on the allocated memory.
In the embodiments of this application, when a Tensor unit applies for memory from the memory management unit and the current largest available blank memory block of the allocated memory cannot satisfy the request, one memory arranging operation is performed so that the memory freed up within the allocated memory can satisfy the Tensor unit's request. This avoids frequently applying to the operating system for larger memory, avoids memory waste under the neural network algorithm framework, and also saves the time that frequent application to and release from the operating system would cost.
Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a neural network algorithm framework disclosed in an embodiment of the present application;
FIG. 2 is a schematic flowchart of a memory management method based on a neural network algorithm framework disclosed in an embodiment of the present application;
FIG. 3 is a schematic flowchart of another memory management method based on a neural network algorithm framework disclosed in an embodiment of the present application;
FIG. 4 is a schematic diagram of memory arranging disclosed in an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a memory management device disclosed in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a mobile terminal disclosed in an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the solutions of the present invention, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
The terms "first", "second", and the like in the specification, the claims, and the above-mentioned drawings of the present invention are used to distinguish different objects, not to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes unlisted steps or units, or optionally also includes other steps or units inherent to the process, method, product, or device.
Reference to an "embodiment" herein means that a specific feature, structure, or characteristic described in conjunction with the embodiment may be included in at least one embodiment of the present invention. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.
The mobile terminals involved in the embodiments of this application may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices, computing devices, or other processing devices connected to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on. For ease of description, the devices mentioned above are collectively referred to as mobile terminals.
The embodiments of the present application are described in detail below.
Please refer to FIG. 1, which shows a neural network algorithm framework disclosed in an embodiment of the present application. The neural network algorithm framework includes a memory management unit and a plurality of tensor (Tensor) units. A Tensor unit, which may also be called a Tensor or a tensor, is used for the inference calculations of the neural network. A Tensor unit can apply to the memory management unit for memory, and can also release memory back to the memory management unit. The neural network algorithm framework of this application can be applied to mobile terminals such as mobile phones and tablet computers. Memory resources on a mobile terminal are precious, and the Tensor units in the framework need to apply for and release memory frequently. If the Tensor units frequently applied to the mobile terminal's operating system for memory and released it back, much time would be consumed, the framework would spend more time on network inference, and inference would become slower. The neural network algorithm framework used in this application therefore first applies to the operating system for one large block of memory; that large block is then managed by the memory management unit, and the Tensor units only apply for and release memory through the memory management unit. Since there is no need to frequently apply to and release memory from the operating system, the network inference time of the framework can be improved. The neural network algorithm framework of this application can be TensorFlow or TensorFlow Lite. TensorFlow is a framework running on a personal computer (PC) for training and running neural network models; TensorFlow Lite is a framework running on a mobile terminal (which may run iOS or Android) for training and running neural network models.
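As a rough sketch of this design (the class and method names below are hypothetical illustrations, not the framework's actual API), the memory management unit can be modeled as obtaining one large arena from the operating system up front and then serving Tensor-unit requests out of it:

```python
# Minimal, illustrative sketch: the manager obtains one large contiguous
# arena once, sized from the number of Tensor units, and Tensor units
# allocate/release against the manager instead of the OS.

class MemoryManager:
    def __init__(self, num_tensor_units, bytes_per_unit=1024):
        # Larger networks (more Tensor units) -> larger arena.
        self.capacity = num_tensor_units * bytes_per_unit
        self.blocks = []  # list of (offset, length) occupied blocks

    def allocate(self, length):
        # Simplification: place the new block after the last occupied
        # block; a fuller allocator would search blank gaps first.
        end = max((off + ln for off, ln in self.blocks), default=0)
        if end + length > self.capacity:
            raise MemoryError("arena exhausted")
        self.blocks.append((end, length))
        return end  # the block's starting offset within the arena

    def release(self, offset):
        # Releasing marks the block's space blank again.
        self.blocks = [(o, l) for o, l in self.blocks if o != offset]

mgr = MemoryManager(num_tensor_units=8)
a = mgr.allocate(100)   # first Tensor unit's request
b = mgr.allocate(200)   # second request, placed after the first
mgr.release(a)          # block a's space becomes blank
```

The point of the sketch is only the call pattern: the operating system is touched once (arena creation), and all later traffic stays inside the manager.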
Please refer to FIG. 2, which is a schematic flowchart of a memory management method based on a neural network algorithm framework disclosed in an embodiment of the present application. As shown in FIG. 2, the method includes the following steps.
201: The memory management unit receives a memory application request sent by a first Tensor unit, where the memory application request carries the memory space that needs to be applied for, and the first Tensor unit is any one of a plurality of Tensor units.
In the embodiments of the present application, the neural network algorithm framework includes a memory management unit and a plurality of Tensor units. When the framework performs neural network calculations, the Tensor units need to apply for and release memory frequently. When a Tensor unit applies for memory, it applies directly to the memory management unit rather than to the operating system, which improves the network inference time of the framework.
202: The memory management unit detects whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory.
203: If the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block of the allocated memory, the memory management unit performs a memory arranging operation on the allocated memory.
In the embodiments of this application, the allocated memory is a large block of memory that the operating system of the mobile terminal has allocated to the neural network computing framework for neural network calculation and inference; this large block is dedicated to that purpose. The memory management unit can apply to the operating system for the allocated memory according to the number of Tensor units of the framework: the size of the allocated memory is related to that number, and generally, the more Tensor units the framework has, the larger the allocated memory.
The large block of memory has consecutive addresses. The allocated memory can include many small memory blocks, each of which is in either an occupied state or a blank state. If a memory block is in the occupied state, it is an occupied memory block; if it is in the blank state, it is a blank memory block. For an occupied memory block, when the memory management unit receives a release request for it, the unit releases the block, the block becomes a blank memory block, and its state is changed from occupied to blank. For a blank memory block, when the memory management unit receives a memory application request from a Tensor unit, it selects a suitable blank memory block and allocates it to that Tensor unit; the blank block then becomes an occupied memory block, and its state is changed from blank to occupied.
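The occupied/blank state transitions described above can be sketched as follows (names are illustrative only; a real implementation would track this per block inside the memory management unit):

```python
# Illustrative sketch of the block states described above: a block
# flips to "occupied" when granted to a Tensor unit and back to
# "blank" when released.

class Block:
    def __init__(self, offset, length):
        self.offset = offset
        self.length = length
        self.state = "blank"  # every block starts out blank

def grant(block):
    # blank -> occupied, on a memory application request
    assert block.state == "blank"
    block.state = "occupied"

def release(block):
    # occupied -> blank, on a memory release request
    assert block.state == "occupied"
    block.state = "blank"

blk = Block(offset=0, length=64)
grant(blk)    # allocated to a Tensor unit
release(blk)  # released back to the manager
```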
When the memory management unit receives a memory application request from a Tensor unit, it selects a suitable blank memory block to allocate to that Tensor unit as follows:
upon receiving the request, the memory management unit selects, from the allocated memory, the smallest available blank memory block that is larger than the memory space that needs to be applied for.
If an available blank memory block larger than the requested memory space exists in the allocated memory, the memory management unit selects the smallest such block for the Tensor unit to use;
if no available blank memory block larger than the requested memory space exists in the allocated memory, that is, the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block, the memory management unit performs a memory arranging operation on the allocated memory.
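The selection rule above amounts to a best-fit search over the blank blocks. A minimal sketch (function and variable names are illustrative, not from the framework):

```python
# Sketch of the selection rule: among blank blocks, pick the smallest
# one whose capacity can hold the requested size; if none qualifies,
# the caller must run a memory-arranging (compaction) pass first.

def pick_blank_block(blank_blocks, requested):
    """blank_blocks: list of (offset, capacity) tuples."""
    candidates = [b for b in blank_blocks if b[1] >= requested]
    if not candidates:
        return None  # no blank block fits -> arrange memory
    return min(candidates, key=lambda b: b[1])  # smallest that fits

blanks = [(0, 32), (96, 128), (512, 64)]
print(pick_blank_block(blanks, 50))   # -> (512, 64), smallest fit
print(pick_blank_block(blanks, 200))  # -> None, compaction needed
```

Best-fit keeps large blank blocks in reserve for large tensors, which is why the rule asks for the *smallest* sufficient block rather than the first one found.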
The memory management unit performs a memory arranging operation on the allocated memory as follows: it compresses the gaps between the occupied memory blocks of the allocated memory, so as to increase the capacity of the available blank memory blocks. The memory management unit may compress the occupied memory blocks in one direction, so that a larger blank memory block is freed at the other end. Optionally, the memory management unit may also divide the occupied memory blocks into two groups, compressing one group in a first direction and the other group in the opposite direction, so that a larger blank memory block is freed in the middle of the allocated memory.
In the embodiments of this application, when a Tensor unit applies for memory from the memory management unit and the current largest available blank memory block of the allocated memory cannot satisfy the request, one memory arranging operation can be performed so that the memory freed up within the allocated memory satisfies the Tensor unit's request. This avoids frequently applying to the operating system for larger memory, avoids memory waste under the neural network algorithm framework, and also saves the time that frequent application to and release from the operating system would cost.
Please refer to FIG. 3, which is a schematic flowchart of another memory management method based on a neural network algorithm framework disclosed in an embodiment of the present application; FIG. 3 is a further optimization on the basis of FIG. 2. As shown in FIG. 3, the method includes the following steps.
301: The memory management unit receives a memory application request sent by a first Tensor unit, where the memory application request carries the memory space that needs to be applied for, and the first Tensor unit is any one of a plurality of Tensor units.
302: The memory management unit detects whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory.
303: If the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block of the allocated memory, the memory management unit performs a memory arranging operation on the allocated memory.
For the specific implementation of steps 301 to 303, please refer to steps 201 to 203 shown in FIG. 2, which will not be repeated here.
Optionally, in step 303, the memory management unit may perform the memory defragmentation operation on the allocated memory as follows:
the memory management unit compacts the occupied memory blocks of the allocated memory toward the first memory block of the allocated memory, so that the memory gap between adjacent memory blocks in the allocated memory is smaller than a preset threshold;
the memory management unit adjusts the memory indexes corresponding to the occupied memory blocks in the allocated memory. Here, a first memory index includes the starting memory address of a first memory block and the memory length of the first memory block, where the first memory block is any one of the occupied memory blocks and the first memory index is the memory index corresponding to the first memory block.
In this embodiment, a memory index can be represented by a structure pointer. The memory index includes the starting memory address of the corresponding occupied memory block and the memory length of that block; the memory length may also be called the memory size. The memory gap between adjacent memory blocks refers to the size of the unusable free memory between those adjacent blocks.
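As a concrete illustration, the memory index described above can be modeled as a small structure holding a starting address and a length. The following Python sketch is hypothetical (the patent gives no code; the field names `start` and `length` are assumptions):

```python
from dataclasses import dataclass

@dataclass
class MemoryIndex:
    """Sketch of the memory index described above: a structure holding
    the occupied block's starting memory address and its memory length
    (also called the memory size)."""
    start: int   # starting memory address of the occupied block
    length: int  # memory length (size) of the occupied block

idx = MemoryIndex(start=5, length=16)  # values from the example below
print(idx)  # MemoryIndex(start=5, length=16)
```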
The following uses FIG. 4 as an example to describe the defragmentation process. Please refer to FIG. 4, which is a schematic diagram of memory defragmentation disclosed in an embodiment of this application. In FIG. 4, the allocated memory is one contiguous block of memory that the memory management unit applied for from the operating system. "Memory 1", "Memory 2", "Memory 3", and "Memory 4" indicate the positions occupied within the allocated memory and are occupied memory blocks; the positions not occupied by them are free memory blocks. The numbers 0-3 in the memory index indicate the order of the memory blocks, and each memory index can use a structure pointer to record the starting address of a memory block and the memory length it occupies.
As can be seen from FIG. 4, before defragmentation the free blocks between "Memory 1", "Memory 2", "Memory 3", and "Memory 4" are small and may not satisfy the memory request of the first Tensor unit. At this point the defragmentation operation is performed: the memory management unit compacts the occupied memory blocks of the allocated memory toward the first memory block of the allocated memory ("Memory 1" in FIG. 4), so that the memory gap between adjacent blocks becomes smaller than the preset threshold, where the preset threshold is the minimum spacing allowed between adjacent blocks. Correspondingly, the memory indexes of "Memory 1", "Memory 2", "Memory 3", and "Memory 4" in FIG. 4 change accordingly (for example, the starting memory address in each index changes).
For example, suppose the allocated memory is numbered 0-99: "Memory 1" is the block numbered 5-20, "Memory 2" the block numbered 30-45, "Memory 3" the block numbered 55-70, and "Memory 4" the block numbered 80-95. The memory index of "Memory 1" includes its starting memory address (here, 5) and memory length (here, 16); likewise, the index of "Memory 2" records starting address 30 and length 16, the index of "Memory 3" records starting address 55 and length 16, and the index of "Memory 4" records starting address 80 and length 16. It can be seen that the free blocks of the allocated memory are a first free block numbered 0-4, a second numbered 21-29, a third numbered 46-54, a fourth numbered 71-79, and a fifth numbered 96-99, with memory lengths of 5, 9, 9, 9, and 4, respectively. If the first Tensor unit applies for a memory size of 20, no free block in the allocated memory meets the requirement. After the defragmentation operation, "Memory 1" becomes the block numbered 1-16, "Memory 2" the block numbered 18-33, "Memory 3" the block numbered 35-50, and "Memory 4" the block numbered 52-67. The remaining free block of the allocated memory is then numbered 68-99; since it sits at the end of the allocated memory it may also be called the tail free memory block. Its size is 32, which fully satisfies the request of the first Tensor unit, so the portion of this free block numbered 69-88 can be allocated to the first Tensor unit. It can be seen that after defragmentation the starting memory address in the index of "Memory 1" changes from 5 to 1, that of "Memory 2" from 30 to 18, that of "Memory 3" from 55 to 35, and that of "Memory 4" from 80 to 52, while the memory lengths recorded in the indexes of "Memory 1" through "Memory 4" do not change as a result of defragmentation.
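The compaction in the worked example above can be simulated in a few lines. This Python sketch is hypothetical (the patent describes the operation in prose only); it assumes a one-unit gap between blocks, which is what the post-compaction numbering 1-16, 18-33, 35-50, 52-67 implies:

```python
GAP = 1      # spacing implied by the example's post-compaction layout
TOTAL = 100  # the allocated memory is numbered 0-99

def compact(blocks):
    """Slide occupied (start, length) blocks toward the front of the
    allocated memory, leaving GAP between adjacent blocks, and return
    the adjusted indexes plus the tail free block."""
    compacted, cursor = [], 0
    for start, length in sorted(blocks):
        cursor += GAP                       # gap before each block
        compacted.append((cursor, length))  # only the start address changes
        cursor += length
    return compacted, (cursor, TOTAL - cursor)  # tail free block

before = [(5, 16), (30, 16), (55, 16), (80, 16)]  # Memory 1-4
after, tail = compact(before)
print(after)  # [(1, 16), (18, 16), (35, 16), (52, 16)]
print(tail)   # (68, 32) - units 68-99, large enough for a request of 20
```

Note that only the starting addresses in the indexes change; the lengths are carried through unchanged, matching the description above.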
304. The memory management unit detects whether the memory space to be applied for is less than or equal to the capacity of the tail free memory block of the allocated memory.
305. If the memory space to be applied for is less than or equal to the capacity of the tail free memory block of the allocated memory, the memory management unit allocates the tail free memory block to the first Tensor unit.
When the memory management unit allocates the tail free memory block to the first Tensor unit, it may allocate the entire tail free block or only a contiguous segment of it, depending on the relationship between the requested memory space and the capacity of the tail free block. For example, if the requested memory space equals or is only slightly smaller than the capacity of the tail free block, the entire tail free block can be allocated to the first Tensor unit; if the requested memory space is much smaller than the capacity of the tail free block, a contiguous segment of the tail free block is allocated to the first Tensor unit (specifically, the segment of the tail free block adjacent to the already-allocated neighboring block can be allocated).
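Continuing the numeric example, carving a request out of the tail free block can be sketched as follows. This is a hypothetical illustration (the function name and the `gap` parameter are assumptions, chosen to reproduce the 69-88 allocation described earlier):

```python
def allocate_from_tail(request, tail_free, gap=1):
    """Carve the requested space out of the (start, length) tail free
    block, placing it next to the occupied region (after a one-unit
    gap). Returns a (start, length) memory index, or None when the
    tail block is too small and the arena must grow via the OS."""
    start, length = tail_free
    if request > length - gap:
        return None
    return (start + gap, request)

print(allocate_from_tail(20, (68, 32)))  # (69, 20), i.e. units 69-88
print(allocate_from_tail(40, (68, 32)))  # None - step 306 applies instead
```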
In this embodiment, the defragmentation operation compacts the occupied memory blocks of the allocated memory toward its first memory block, thereby producing a tail free memory block whose capacity is larger than that of any single free block before defragmentation. If the memory space to be applied for is less than or equal to the capacity of the tail free block, the memory management unit allocates the tail free block to the first Tensor unit. Defragmentation thus yields a larger free block, avoids frequently applying to the operating system for larger memory, avoids memory waste under the neural network algorithm framework, and saves the time otherwise spent on frequent memory applications to, and releases from, the operating system.
Optionally, in step 305, the memory management unit allocates the tail free memory block to the first Tensor unit specifically as follows:
the memory management unit sends the memory index corresponding to the tail free memory block to the first Tensor unit.
In this embodiment, when allocating memory to the first Tensor unit, the memory management unit only needs to send it the memory index of the tail free block; from that index the first Tensor unit can find the starting memory address and the ending memory address of the tail free block and store its contents there. Specifically, since the memory addresses of the tail free block are contiguous, its ending memory address can be determined from the starting memory address and memory length included in the index, yielding both the starting and ending addresses of the block. Thus, applying for memory within the allocated memory only requires returning a memory index; compared with applying to the operating system, this greatly reduces memory allocation time and in turn improves the network inference time of the neural network algorithm framework.
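Because the block is contiguous, the ending address follows directly from the index alone. A minimal sketch, assuming inclusive addressing as in the numbered example above:

```python
def address_range(index):
    """Derive the inclusive (start, end) address range of a block from
    its (start, length) memory index - no other lookup is needed."""
    start, length = index
    return start, start + length - 1

print(address_range((69, 20)))  # (69, 88), matching units 69-88 above
```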
Optionally, the method flow shown in FIG. 3 may further include the following steps:
306. If the memory space to be applied for is greater than the capacity of the tail free memory block of the allocated memory, the memory management unit applies to the operating system for a target large memory block, where the target large memory block is greater than or equal to the sum of the size of the allocated memory and the memory space to be applied for.
307. The memory management unit copies the contents stored in the allocated memory to the target large memory block and releases the allocated memory.
In this embodiment, if the memory space to be applied for is greater than the capacity of the tail free block of the allocated memory, the allocated memory cannot satisfy the first Tensor unit's request, and the memory management unit must apply to the operating system again for a large memory block bigger than the currently allocated memory to meet the computing needs of the neural network algorithm framework. Specifically, the size of the newly applied target large memory block is related to the number of Tensor units in the framework and the algorithmic complexity of the framework: in general, the more Tensor units the framework has and the higher its algorithmic complexity, the larger the target block that is applied for.
After applying for the target large memory block, the memory management unit copies the contents stored in the allocated memory to the target block and releases the allocated memory. Subsequent Tensor units that need memory can apply to the memory management unit, which selects a free memory block from the target large memory block and allocates it to the Tensor unit.
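Steps 306-307 amount to a grow-and-copy operation. The sketch below is hypothetical and models raw memory as a Python bytearray; a real implementation would use operating-system allocation and release calls:

```python
def grow(allocated, request):
    """Steps 306-307 in miniature: obtain a new block at least as large
    as the old size plus the request, copy the live contents over, then
    release the old block."""
    target = bytearray(len(allocated) + request)  # lower bound from step 306
    target[:len(allocated)] = allocated           # step 307: copy contents
    allocated.clear()                             # step 307: release old block
    return target

old = bytearray(b"live-data")  # 9 bytes of existing contents
new = grow(old, 20)
print(len(new))  # 29 = 9 + 20
print(new[:9])   # bytearray(b'live-data') - contents preserved
print(len(old))  # 0 - old allocation released
```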
In this embodiment, when the allocated memory managed by the memory management unit cannot meet the computing needs of the neural network algorithm framework, the memory management unit applies to the operating system again for a larger target memory block that can satisfy those needs. The target large memory block can satisfy the memory requests of all Tensor units of the framework, so memory does not need to be applied for from the operating system frequently, saving the time otherwise spent on frequent memory applications and releases.
Optionally, the method flow shown in FIG. 3 may further include the following step:
the memory management unit receives a memory release request sent by a second Tensor unit, where the release request carries the memory index corresponding to the memory block to be released, and marks that memory index as free, the second Tensor unit being any one of the multiple Tensor units.
In this embodiment, when releasing memory for the second Tensor unit, the memory management unit only needs to mark as free the memory index carried in the release request sent by the second Tensor unit. Compared with releasing memory to the operating system, this greatly reduces memory release time and in turn improves the network inference time of the neural network algorithm framework.
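Marking an index free on release can be sketched minimally. This Python sketch is hypothetical (the class and state names are assumptions); the point it illustrates is that release touches only the index table, with no call to the operating system:

```python
FREE, OCCUPIED = "free", "occupied"

class Arena:
    """Releasing memory inside the arena is just flipping the state of
    the block's memory index - the underlying OS allocation is kept."""
    def __init__(self):
        self.state = {}  # memory index (start, length) -> FREE / OCCUPIED

    def allocate(self, index):
        self.state[index] = OCCUPIED

    def release(self, index):
        # The release request carries the block's index; marking it
        # free makes the block reusable by later applications.
        self.state[index] = FREE

arena = Arena()
arena.allocate((69, 20))
arena.release((69, 20))
print(arena.state[(69, 20)])  # free
```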
Optionally, the method flow shown in FIG. 3 may further include the following steps:
the memory management unit records the memory size occupied by the first Tensor unit through the application programming interface (API) for memory management of the allocated memory;
the memory management unit receives a memory-occupancy query instruction for a first program and obtains, through the memory-management API of the allocated memory, the memory size occupied by all Tensor units used by the first program.
In this embodiment, with the traditional way of applying for and releasing memory, developers cannot rewrite the system interfaces being called. If they need the memory occupied by an entire program, a record must be added at every place where memory is applied for or released; since a program contains very many such places, this approach makes the code structure confusing. In this embodiment a memory management unit (Simple Memory Arena) manages memory uniformly, so recording only needs to be added at its application and release APIs to capture the memory record of the entire program, which makes debugging very easy.
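Because all applications and releases funnel through the arena's two APIs, whole-program accounting needs exactly two hooks instead of changes at every call site. A hypothetical sketch (class, method, and program names are assumptions):

```python
class TrackingArena:
    """Per-program memory accounting added at the arena's application
    and release APIs only - no instrumentation elsewhere."""
    def __init__(self):
        self.usage = {}  # program -> total units currently held

    def allocate(self, program, size):
        self.usage[program] = self.usage.get(program, 0) + size

    def release(self, program, size):
        self.usage[program] -= size

    def query(self, program):
        # Serves the memory-occupancy query instruction for a program.
        return self.usage.get(program, 0)

arena = TrackingArena()
arena.allocate("first_program", 20)  # e.g. the first Tensor unit's block
arena.allocate("first_program", 16)
arena.release("first_program", 16)
print(arena.query("first_program"))  # 20
```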
When a bug occurs in a program, the Debug tooling of the memory management unit can show the memory occupied by the program, making it possible to judge whether memory usage is too high and whether there is a memory leak.
The memory management unit can also use the Profile tooling to view the memory occupied by each memory-applying module (for example, each Tensor unit) for performance tuning, and to check whether any unneeded memory applications can be optimized away.
The memory management method of this application is easy to extend: the memory occupied by a program can be inspected to judge whether usage is too high and whether there is a memory leak, and the memory can then be optimized.
The foregoing mainly describes the solutions of the embodiments of this application from the perspective of the method-side execution process. It can be understood that, to implement the above functions, the mobile terminal includes corresponding hardware structures and/or software modules for each function. Those skilled in the art will readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.
The embodiments of this application may divide the mobile terminal into functional units according to the foregoing method examples. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of this application is illustrative and is only a division by logical function; other divisions are possible in actual implementation.
Please refer to FIG. 5, which is a schematic structural diagram of a memory management device disclosed in an embodiment of this application. As shown in FIG. 5, the memory management device is applied to a neural network algorithm framework that includes multiple Tensor units. The memory management device 500 includes a receiving unit 501, a detection unit 502, and a memory defragmentation unit 503, where:
the receiving unit 501 is configured to receive a memory application request sent by a first Tensor unit, where the request carries the memory space to be applied for and the first Tensor unit is any one of the multiple Tensor units;
the detection unit 502 is configured to detect whether the memory space to be applied for is less than or equal to the capacity of the current largest available free memory block of the allocated memory; and
the memory defragmentation unit 503 is configured to perform a memory defragmentation operation on the allocated memory when the detection unit 502 detects that the memory space to be applied for is greater than the capacity of the current largest available free memory block of the allocated memory.
Optionally, the memory management device 500 may further include a memory allocation unit 504.
The detection unit 502 is further configured to detect, after the memory defragmentation unit 503 performs the defragmentation operation on the allocated memory, whether the memory space to be applied for is less than or equal to the capacity of the tail free memory block of the allocated memory; and
the memory allocation unit 504 is configured to allocate the tail free memory block to the first Tensor unit when the detection unit 502 detects that the memory space to be applied for is less than or equal to the capacity of the tail free memory block of the allocated memory.
Optionally, the memory management device 500 may further include a memory application unit 505 and a memory release unit 506.
The memory application unit 505 is configured to apply to the operating system for a target large memory block when the detection unit 502 detects that the memory space to be applied for is greater than the capacity of the tail free memory block of the allocated memory, where the target large memory block is greater than or equal to the sum of the size of the allocated memory and the memory space to be applied for; and
the memory release unit 506 is configured to copy the contents stored in the allocated memory to the target large memory block and release the allocated memory.
Optionally, the memory defragmentation unit 503 performs the defragmentation operation on the allocated memory specifically by: compacting the occupied memory blocks of the allocated memory toward the first memory block of the allocated memory so that the memory gap between adjacent memory blocks in the allocated memory is smaller than a preset threshold; and adjusting the memory indexes corresponding to the occupied memory blocks in the allocated memory, where a first memory index includes the starting memory address of a first memory block and the memory length of the first memory block, the first memory block is any one of the occupied memory blocks, and the first memory index is the memory index corresponding to the first memory block.
Optionally, the memory allocation unit 504 allocates the tail free memory block to the first Tensor unit specifically by sending the memory index corresponding to the tail free memory block to the first Tensor unit.
Optionally, the receiving unit 501 is further configured to receive a memory release request sent by a second Tensor unit, where the release request carries the memory index corresponding to the memory block to be released and that memory index is marked as free, the second Tensor unit being any one of the multiple Tensor units.
Optionally, the memory management device 500 may further include a recording unit 507 and an obtaining unit 508.
The recording unit 507 is configured to record the memory size occupied by the first Tensor unit through the memory-management application programming interface (API) of the allocated memory;
the receiving unit 501 is further configured to receive a memory-occupancy query instruction for a first program; and
the obtaining unit 508 is configured to obtain, through the memory-management API of the allocated memory, the memory size occupied by all Tensor units used by the first program.
The memory management device 500 may be the memory management unit in FIG. 1 to FIG. 4.
The receiving unit 501 of FIG. 5 may be a communication interface, while the detection unit 502, memory defragmentation unit 503, memory allocation unit 504, memory application unit 505, memory release unit 506, recording unit 507, and obtaining unit 508 may be a processor. The memory management device shown in FIG. 5 may further include a storage unit, which may be a memory (for example, a non-volatile memory).
By implementing the memory management device shown in FIG. 5, when a Tensor unit applies to the memory management unit for memory and the current largest available free block of the allocated memory cannot satisfy the request, a defragmentation operation can be performed so that the memory freed up within the allocated memory satisfies the Tensor unit's request. This avoids frequently applying to the operating system for larger memory, avoids memory waste under the neural network algorithm framework, and saves the time otherwise spent on frequent memory applications to, and releases from, the operating system.
Please refer to FIG. 6, which is a schematic structural diagram of a mobile terminal disclosed in an embodiment of this application. As shown in FIG. 6, the mobile terminal 600 includes a processor 601 and a memory 602, and may further include a bus 603 through which the processor 601 and the memory 602 are connected to each other. The bus 603 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is drawn in FIG. 6, but this does not mean there is only one bus or one type of bus. The mobile terminal 600 may further include an input/output device 604, which may include a display screen such as a liquid crystal display. The memory 602 is configured to store one or more programs containing instructions; the processor 601 is configured to call the instructions stored in the memory 602 to perform some or all of the method steps in FIG. 2 to FIG. 3.
By implementing the mobile terminal shown in FIG. 6, when a Tensor unit applies to the memory management unit for memory and the current largest available free block of the allocated memory cannot satisfy the request, a defragmentation operation can be performed so that the memory freed up within the allocated memory satisfies the Tensor unit's request, avoiding frequent applications to the operating system for larger memory, avoiding memory waste under the neural network algorithm framework, and saving the time otherwise spent on frequent memory applications and releases.
An embodiment of the present application further provides a computer storage medium, wherein the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any memory management method based on a neural network algorithm framework described in the above method embodiments.
An embodiment of the present application further provides a computer program product. The computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any memory management method based on a neural network algorithm framework described in the above method embodiments.
It should be noted that, for brevity, the foregoing method embodiments are all described as a series of action combinations. However, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Furthermore, those skilled in the art should also understand that the embodiments described in this specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through certain interfaces, apparatuses, or units, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
Those of ordinary skill in the art can understand that all or some of the steps in the various methods of the above embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable memory, which may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or the like.
The embodiments of the present application are described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the descriptions of the above embodiments are only intended to help understand the method and core idea of the present invention. Meanwhile, those of ordinary skill in the art may, based on the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (20)

  1. A memory management method based on a neural network algorithm framework, the neural network algorithm framework including a plurality of tensor (Tensor) units, characterized in that the method comprises:
    receiving a memory application request sent by a first Tensor unit, the memory application request carrying the memory space that needs to be applied for, the first Tensor unit being any one of the plurality of Tensor units;
    detecting whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory; and
    if the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block of the allocated memory, performing a memory arranging operation on the allocated memory.
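The decision in claim 1 can be sketched in Python. This is an illustrative model only, not the patented implementation; the pool representation and the names `largest_free_gap` and `handle_request` are invented for the example.

```python
def largest_free_gap(capacity, used_blocks):
    """Return the size of the largest blank gap in a pool of `capacity`
    bytes, given occupied blocks as (start, length) pairs."""
    gaps = []
    cursor = 0
    for start, length in sorted(used_blocks):
        gaps.append(start - cursor)   # gap before this occupied block
        cursor = start + length
    gaps.append(capacity - cursor)    # trailing gap at the end of the pool
    return max(gaps)

def handle_request(request, capacity, used_blocks):
    """Mimic claim 1: defragment only when the request exceeds the
    largest currently available blank block; otherwise allocate."""
    if request <= largest_free_gap(capacity, used_blocks):
        return "allocate"
    return "defragment"
```

For a 50-byte pool with blocks occupying (0, 10) and (30, 10), the largest gap is 20 bytes, so a 15-byte request is satisfied in place while a 25-byte request triggers the arranging operation.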
  2. The method according to claim 1, characterized in that, after the memory arranging operation is performed on the allocated memory, the method further comprises:
    detecting whether the memory space that needs to be applied for is less than or equal to the capacity of the last blank memory block of the allocated memory; and
    if the memory space that needs to be applied for is less than or equal to the capacity of the last blank memory block of the allocated memory, allocating the last blank memory block to the first Tensor unit.
  3. The method according to claim 2, characterized in that the method further comprises:
    if the memory space that needs to be applied for is greater than the capacity of the last blank memory block of the allocated memory, applying to the operating system for allocation of a target large block of memory, the target large block of memory being greater than or equal to the sum of the allocated memory and the memory space that needs to be applied for; and
    copying the content stored in the allocated memory to the target large block of memory, and releasing the allocated memory.
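The grow-and-copy step of claim 3 can be illustrated roughly as follows; this is a sketch in which a `bytearray` stands in for an OS allocation and `grow_pool` is a hypothetical name, not part of the claimed apparatus.

```python
def grow_pool(pool, request):
    """Sketch of claim 3: when even the last blank block cannot satisfy
    the request, obtain a new block whose size is at least the old pool
    size plus the requested space, copy the old contents into it, and
    release the old pool (here, by simply dropping the reference)."""
    new_size = len(pool) + request     # lower bound stated in the claim
    new_pool = bytearray(new_size)     # stands in for an OS allocation
    new_pool[:len(pool)] = pool        # copy existing contents over
    return new_pool
```

Growing a 3-byte pool by a 5-byte request yields an 8-byte pool whose first 3 bytes are unchanged.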
  4. The method according to claim 2 or 3, characterized in that performing the memory arranging operation on the allocated memory comprises:
    compressing the occupied memory blocks in the allocated memory toward the first memory block of the allocated memory, so that the memory gap between adjacent memory blocks in the allocated memory is smaller than a preset threshold; and
    adjusting the memory indexes corresponding to the occupied memory blocks in the allocated memory; wherein a first memory index includes the starting memory address of a first memory block and the memory length of the first memory block, the first memory block is any one of the occupied memory blocks, and the first memory index is the memory index corresponding to the first memory block.
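The compaction and index adjustment of claim 4 can be modeled as below. This is an illustrative sketch only: the dict-of-tuples index layout and the name `compact` are assumptions, and the example compresses all gaps to zero (trivially below any positive threshold).

```python
def compact(blocks):
    """Sketch of claim 4: slide occupied blocks toward the start of the
    pool and rewrite each memory index as a (start, length) pair.
    `blocks` maps an owner id to its (start, length) index."""
    cursor = 0
    new_index = {}
    # Visit blocks in address order so relative placement is preserved.
    for owner, (start, length) in sorted(blocks.items(), key=lambda kv: kv[1][0]):
        new_index[owner] = (cursor, length)  # adjusted memory index
        cursor += length
    return new_index, cursor  # cursor is where the tail blank block begins
```

Compacting blocks at offsets 0, 10, and 20 packs them contiguously and leaves a single blank block at the end of the pool.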
  5. The method according to claim 4, characterized in that allocating the last blank memory block to the first Tensor unit comprises:
    sending the memory index corresponding to the last blank memory block to the first Tensor unit.
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    receiving a memory release request sent by a second Tensor unit, the memory release request carrying the memory index corresponding to the memory block that needs to be released, and marking the memory index corresponding to the memory block that needs to be released as blank, the second Tensor unit being any one of the plurality of Tensor units.
  7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    recording, through an application programming interface (API) of the allocated memory for memory management, the memory size occupied by the first Tensor unit; and
    receiving a memory occupation query instruction for a first program, and obtaining, through the API of the allocated memory for memory management, the memory size occupied by all Tensor units used by the first program.
  8. The method according to claim 1, characterized in that, after detecting whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory, the method further comprises:
    if the memory space that needs to be applied for is less than the capacity of the current largest available blank memory block of the allocated memory, selecting from the allocated memory the smallest available blank memory block larger than the memory space that needs to be applied for, and allocating it to the first Tensor unit.
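The selection rule of claim 8 corresponds to the classic best-fit allocation policy. A minimal sketch, under the assumption that blank blocks are given as `(start, length)` pairs (a representation chosen for the example, not taken from the patent):

```python
def best_fit(request, free_blocks):
    """Sketch of claim 8: among blank blocks given as (start, length)
    pairs, return the smallest one whose capacity covers the request,
    or None if no blank block is large enough."""
    candidates = [blk for blk in free_blocks if blk[1] >= request]
    return min(candidates, key=lambda blk: blk[1]) if candidates else None
```

Given blank blocks of 8, 5, and 16 bytes, a 5-byte request takes the 5-byte block and a 10-byte request takes the 16-byte block, minimizing leftover fragments.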
  9. The method according to claim 1, characterized in that performing the memory arranging operation on the allocated memory comprises:
    compressing the gaps between the occupied memory blocks of the allocated memory to increase the capacity of the available blank memory blocks of the allocated memory;
    wherein compressing the gaps between the occupied memory blocks of the allocated memory comprises:
    compressing the occupied memory blocks of the allocated memory in one direction; or
    dividing the occupied memory blocks of the allocated memory into two groups, one group being compressed in a first direction and the other group being compressed in a direction opposite to the first direction.
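The two-direction variant of claim 9 can be sketched as follows. This is illustrative only: the claim does not specify how blocks are split into the two groups, so the example simply halves the address-sorted list, packing one half toward address 0 and the other toward the end of the pool so that one large blank block opens in the middle.

```python
def compact_two_ways(capacity, blocks):
    """Sketch of claim 9's second option: compress one group of occupied
    (start, length) blocks toward the start of the pool and the other
    group toward the end."""
    ordered = sorted(blocks)                 # sort by start address
    half = len(ordered) // 2
    low, high = ordered[:half], ordered[half:]
    new_blocks, cursor = [], 0
    for _, length in low:                    # pack downward from address 0
        new_blocks.append((cursor, length))
        cursor += length
    end = capacity
    for _, length in reversed(high):         # pack upward from the pool end
        end -= length
        new_blocks.append((end, length))
    return sorted(new_blocks)
```

For a 100-byte pool with four 10-byte blocks scattered at offsets 5, 30, 60, and 85, the result leaves a single 60-byte blank block in the middle of the pool.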
  10. A memory management apparatus, characterized in that the memory management apparatus is applied to a neural network algorithm framework, the neural network algorithm framework includes a plurality of tensor (Tensor) units, and the memory management apparatus comprises:
    a receiving unit, configured to receive a memory application request sent by a first Tensor unit, the memory application request carrying the memory space that needs to be applied for, the first Tensor unit being any one of the plurality of Tensor units;
    a detecting unit, configured to detect whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory; and
    a memory arranging unit, configured to perform a memory arranging operation on the allocated memory when the detecting unit detects that the memory space that needs to be applied for is greater than the capacity of the current largest available blank memory block of the allocated memory.
  11. The apparatus according to claim 10, characterized in that the memory management apparatus further comprises a memory allocation unit;
    the detecting unit is further configured to detect, after the memory arranging unit performs the memory arranging operation on the allocated memory, whether the memory space that needs to be applied for is less than or equal to the capacity of the last blank memory block of the allocated memory; and
    the memory allocation unit is configured to allocate the last blank memory block to the first Tensor unit when the detecting unit detects that the memory space that needs to be applied for is less than or equal to the capacity of the last blank memory block of the allocated memory.
  12. The apparatus according to claim 11, characterized in that the memory management apparatus further comprises a memory application unit and a memory release unit;
    the memory application unit is configured to apply to the operating system for allocation of a target large block of memory when the detecting unit detects that the memory space that needs to be applied for is greater than the capacity of the last blank memory block of the allocated memory, the target large block of memory being greater than or equal to the sum of the allocated memory and the memory space that needs to be applied for; and
    the memory release unit is configured to copy the content stored in the allocated memory to the target large block of memory and release the allocated memory.
  13. The apparatus according to claim 11 or 12, characterized in that the memory arranging unit performs the memory arranging operation on the allocated memory specifically by: compressing the occupied memory blocks in the allocated memory toward the first memory block of the allocated memory, so that the memory gap between adjacent memory blocks in the allocated memory is smaller than a preset threshold; and adjusting the memory indexes corresponding to the occupied memory blocks in the allocated memory; wherein a first memory index includes the starting memory address of a first memory block and the memory length of the first memory block, the first memory block is any one of the occupied memory blocks, and the first memory index is the memory index corresponding to the first memory block.
  14. The apparatus according to claim 13, characterized in that the memory allocation unit allocates the last blank memory block to the first Tensor unit specifically by: sending the memory index corresponding to the last blank memory block to the first Tensor unit.
  15. The apparatus according to any one of claims 10 to 14, characterized in that the receiving unit is further configured to receive a memory release request sent by a second Tensor unit, the memory release request carrying the memory index corresponding to the memory block that needs to be released, and to mark the memory index corresponding to the memory block that needs to be released as blank, the second Tensor unit being any one of the plurality of Tensor units.
  16. The apparatus according to any one of claims 10 to 15, characterized in that the memory management apparatus further comprises a recording unit and an obtaining unit;
    the recording unit is configured to record, through an application programming interface (API) of the allocated memory for memory management, the memory size occupied by the first Tensor unit;
    the receiving unit is further configured to receive a memory occupation query instruction for a first program; and
    the obtaining unit is configured to obtain, through the API of the allocated memory for memory management, the memory size occupied by all Tensor units used by the first program.
  17. The apparatus according to claim 10, characterized in that the memory allocation unit is further configured to, after the detecting unit detects whether the memory space that needs to be applied for is less than or equal to the capacity of the current largest available blank memory block of the allocated memory, and when the memory space that needs to be applied for is less than the capacity of the current largest available blank memory block of the allocated memory, select from the allocated memory the smallest available blank memory block larger than the memory space that needs to be applied for and allocate it to the first Tensor unit.
  18. The apparatus according to claim 10, characterized in that the memory arranging unit performs the memory arranging operation on the allocated memory specifically by: compressing the gaps between the occupied memory blocks of the allocated memory to increase the capacity of the available blank memory blocks of the allocated memory;
    wherein the memory arranging unit compresses the gaps between the occupied memory blocks of the allocated memory specifically by: compressing the occupied memory blocks of the allocated memory in one direction; or dividing the occupied memory blocks of the allocated memory into two groups, one group being compressed in a first direction and the other group being compressed in a direction opposite to the first direction.
  19. A mobile terminal, characterized by comprising a processor and a memory, the memory being configured to store one or more programs, the one or more programs being configured to be executed by the processor, and the programs including instructions for performing the method according to any one of claims 1 to 9.
  20. A computer-readable storage medium, characterized in that the computer-readable storage medium is configured to store a computer program for electronic data exchange, wherein the computer program causes a computer to execute the method according to any one of claims 1 to 9.
PCT/CN2020/072876 2019-01-28 2020-01-17 Memory management method and device, mobile terminal, and storage medium WO2020156259A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910082246.0A CN109815162A (en) 2019-01-28 2019-01-28 EMS memory management process, device, mobile terminal and storage medium
CN201910082246.0 2019-01-28

Publications (1)

Publication Number Publication Date
WO2020156259A1 (en)

Family

ID=66605598

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/072876 WO2020156259A1 (en) 2019-01-28 2020-01-17 Memory management method and device, mobile terminal, and storage medium

Country Status (2)

Country Link
CN (1) CN109815162A (en)
WO (1) WO2020156259A1 (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109815162A (en) * 2019-01-28 2019-05-28 Oppo广东移动通信有限公司 EMS memory management process, device, mobile terminal and storage medium
CN110347506B (en) * 2019-06-28 2023-01-06 Oppo广东移动通信有限公司 Data processing method and device based on LSTM, storage medium and electronic equipment
CN112783640B (en) * 2019-11-11 2023-04-04 上海肇观电子科技有限公司 Method and apparatus for pre-allocating memory, circuit, electronic device and medium
CN111090521B (en) * 2019-12-10 2023-05-02 Oppo(重庆)智能科技有限公司 Memory allocation method and device, storage medium and electronic equipment
CN111708641B (en) * 2020-07-14 2024-03-19 腾讯科技(深圳)有限公司 Memory management method, device, equipment and computer readable storage medium
CN111984400B (en) * 2020-07-17 2024-04-02 深圳云天励飞技术有限公司 Memory allocation method and device for neural network
CN112199190B (en) * 2020-07-31 2023-11-03 星宸科技股份有限公司 Memory allocation method and device, storage medium and electronic equipment
CN112256440B (en) * 2020-12-23 2021-03-09 上海齐感电子信息科技有限公司 Memory management method and device for neural network inference
CN112925727B (en) * 2021-03-16 2023-03-03 杭州慧芯达科技有限公司 Tensor cache and access structure and method thereof
CN113485832A (en) * 2021-07-09 2021-10-08 支付宝(杭州)信息技术有限公司 Method and device for carrying out allocation management on physical memory pool and physical memory pool
CN114327867B (en) * 2021-11-29 2023-11-10 苏州浪潮智能科技有限公司 Memory resource processing method and device, electronic equipment and storage medium
CN114237918B (en) 2022-02-28 2022-05-27 之江实验室 Graph execution method and device for neural network model calculation
CN114518962A (en) * 2022-04-15 2022-05-20 北京奥星贝斯科技有限公司 Memory management method and device
CN114741208B (en) * 2022-06-13 2022-09-23 北京智芯微电子科技有限公司 Electric energy meter, memory stack management method, memory stack management device and storage medium thereof
CN115688893A (en) * 2022-10-19 2023-02-03 北京百度网讯科技有限公司 Memory scheduling method and device, electronic equipment and storage medium
CN117785759B (en) * 2024-02-28 2024-04-23 北京壁仞科技开发有限公司 Data storage method, data reading method, electronic device, and storage medium

Citations (7)

Publication number Priority date Publication date Assignee Title
US20110246742A1 (en) * 2010-04-01 2011-10-06 Kogen Clark C Memory pooling in segmented memory architecture
CN102279808A (en) * 2011-09-06 2011-12-14 晨星软件研发(深圳)有限公司 Method and device for managing video memory of embedded equipment
CN107153576A (en) * 2017-04-10 2017-09-12 广东欧珀移动通信有限公司 The distribution method and terminal device of a kind of memory source
CN107480080A (en) * 2017-07-03 2017-12-15 香港红鸟科技股份有限公司 A kind of Zero-copy data stream based on RDMA
CN108874532A (en) * 2017-06-01 2018-11-23 北京旷视科技有限公司 Memory allocation method and equipment
CN108897617A (en) * 2018-06-19 2018-11-27 北京元心科技有限公司 The method, apparatus and terminal device of memory management
CN109815162A (en) * 2019-01-28 2019-05-28 Oppo广东移动通信有限公司 EMS memory management process, device, mobile terminal and storage medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN102479155B (en) * 2010-11-30 2014-07-02 国际商业机器公司 Method and system for memory overload management applied in network application server system
US10649691B2 (en) * 2014-03-27 2020-05-12 Hitachi, Ltd. Storage system
EP3207507B1 (en) * 2014-10-16 2021-02-17 DeepMind Technologies Limited Augmenting neural networks with external memory
CN104375949A (en) * 2014-12-01 2015-02-25 恒宝股份有限公司 Smart card storage space arrangement method and system
CN107992821B (en) * 2017-11-30 2021-12-03 宁夏恒信荣网络科技有限公司 Image identification method and system
CN108829610B (en) * 2018-04-02 2020-08-04 浙江大华技术股份有限公司 Memory management method and device in neural network forward computing process
CN109144718A (en) * 2018-07-06 2019-01-04 北京比特大陆科技有限公司 A kind of memory allocation method, memory release method and relevant device


Also Published As

Publication number Publication date
CN109815162A (en) 2019-05-28

Similar Documents

Publication Publication Date Title
WO2020156259A1 (en) Memory management method and device, mobile terminal, and storage medium
CN111813713B (en) Data acceleration operation processing method and device and computer readable storage medium
WO2021114025A1 (en) Incremental data determination method, incremental data determination apparatus, server and terminal device
CN111177113B (en) Data migration method, device, computer equipment and storage medium
CN111984400A (en) Memory allocation method and device of neural network
WO2017219524A1 (en) Page saving method and electronic device
CN112667405B (en) Information processing method, device, equipment and storage medium
WO2021135574A1 (en) Data storage method and apparatus, and terminal device
CN111885184A (en) Method and device for processing hot spot access keywords in high concurrency scene
CN112764925A (en) Data storage method, device, equipment and storage medium based on virtual memory
CN104915302B (en) Data transmission processing method and data link
CN111666150B (en) Storage space allocation method and device, terminal and computer readable storage medium
CN116467235B (en) DMA-based data processing method and device, electronic equipment and medium
WO2024027140A1 (en) Data processing method and apparatus, and device, system and readable storage medium
WO2020113421A1 (en) Method for mounting file system, terminal device, and storage medium
CN110209548B (en) Service control method, system, electronic device and computer readable storage medium
CN107463829B (en) Processing method, system and the relevant apparatus of DMA request in a kind of cipher card
CN115617800A (en) Data reading method and device, electronic equipment and storage medium
CN115905095A (en) USB drive-free communication method, device, electronic equipment and storage medium
CN109271538A (en) A kind of picture storage method and relevant device
CN112269665B (en) Memory processing method and device, electronic equipment and storage medium
CN111090627B (en) Log storage method and device based on pooling, computer equipment and storage medium
US10832132B2 (en) Data transmission method and calculation apparatus for neural network, electronic apparatus, computer-readable storage medium and computer program product
CN106202262A (en) A kind of information processing method and electronic equipment
CN112650693A (en) Static memory management method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20748822

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20748822

Country of ref document: EP

Kind code of ref document: A1