CN112363824A - Memory virtualization method and system under Shenwei architecture - Google Patents

Memory virtualization method and system under Shenwei architecture

Info

Publication number
CN112363824A
CN112363824A (application CN202011084199.2A)
Authority
CN
China
Prior art keywords
page table
tlb
address
shadow
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011084199.2A
Other languages
Chinese (zh)
Other versions
CN112363824B (en)
Inventor
沙赛
罗英伟
汪小林
张毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Advanced Technology Research Institute
Peking University
Original Assignee
Wuxi Advanced Technology Research Institute
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Advanced Technology Research Institute and Peking University
Priority to CN202011084199.2A
Publication of CN112363824A
Application granted
Publication of CN112363824B
Legal status: Active (current)
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5077Logical partitioning of resources; Management or configuration of virtualized resources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/08Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/10Address translation
    • G06F12/1009Address translation using page tables, e.g. page table structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/455Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533Hypervisors; Virtual machine monitors
    • G06F9/45558Hypervisor-specific management and integration aspects
    • G06F2009/45583Memory management, e.g. access or allocation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention relates to a memory virtualization method and system under the Shenwei architecture. The method comprises the following steps: establishing a buffer for storing shadow page table base addresses; when the CPU queries the TLB and a TLB miss occurs, the CPU accesses the buffer to obtain the shadow page table base address of the current process, loads it into the memory management unit, and starts a page table walk; when a mapping is missing during the page table walk, the CPU switches from the guest context to the host context to handle the page fault; the virtual-to-physical address mapping obtained from page fault handling is filled directly into the corresponding TLB to realize TLB prefetching; and the CPU queries the TLB again to complete the translation from guest virtual address to host physical address. Based on the software-managed TLB of the Shenwei architecture, the invention refreshes the shadow page table and the TLB simultaneously, thereby keeping the shadow page table synchronized with the guest process page table.

Description

Memory virtualization method and system under Shenwei architecture
Technical Field
The invention relates to the field of Shenwei architecture virtualization, in particular to a method and a system for realizing efficient memory virtualization under the Shenwei architecture.
Background
The Shenwei family of processors, a representative of Chinese domestic processors, has attracted wide attention. The success of the Sunway TaihuLight supercomputer established Shenwei's important position among domestic processors. Shenwei servers are particularly favored in sectors with high requirements for security and autonomous controllability, such as government, where they are mainly used for desktop office systems. The first generation of the Shenwei instruction set was derived from the Alpha instruction set and has since been developed, through continuous improvement, into an independent and autonomous instruction set.
Compared with mainstream international processor architectures such as x86, the Shenwei architecture still has a significant gap in functionality and performance. With the continuous development of information technology, Shenwei processors are no longer limited to desktop office systems and are moving toward broader cloud service systems. Virtualization is one of the main supporting technologies for cloud services. Virtualization turns one physical computer system into one or more virtual computer systems (virtual machines), each of which has its own virtual hardware (CPU, memory, etc.) and provides an independent, complete execution environment. Virtualization mainly targets three kinds of physical resources: CPU virtualization, memory virtualization, and I/O virtualization. Memory virtualization is the most complex of the three, and its quality is often the bottleneck of virtual machine performance.
From the perspective of the operating system, physical memory must satisfy two basic assumptions: physical addresses start at zero, and memory addresses are contiguous. A virtual machine runs on the host as an ordinary process, so these two conditions are difficult to meet directly. Virtualization introduces a new layer of system software, called the virtual machine monitor (hypervisor), which controls guest operating system access to physical resources. To satisfy the two assumptions, the hypervisor introduces a new address space, the guest physical address space. In a computer system, a CPU memory access involves two steps: virtual-to-physical address translation, followed by accessing the memory data at the physical address. Virtual-to-physical address translation converts the virtual addresses used by a program into actual physical addresses.
In a virtualized environment, address translation takes two layers: guest virtual address -> guest physical address -> host physical address. The task of memory virtualization is to complete this two-layer address translation efficiently. The translation overhead consists mainly of three parts: TLB lookup, page table walk, and page fault handling.
Existing memory virtualization solutions on mainstream architectures fall into two types: software memory virtualization, represented by the traditional shadow page table, and hardware-assisted virtualization, represented by the extended page table. Both are implemented on mainstream architectures such as x86, but neither suits a Shenwei processor. The extended page table model relies on hardware to complete the two-layer address translation efficiently; the Shenwei architecture lacks this hardware support, and a pure-software implementation of the model cannot meet practical performance requirements. Moreover, although the extended page table model reduces page fault handling overhead compared with the traditional shadow page table, it introduces additional page table walk overhead.
As for the traditional shadow page table model, on one hand its write-protection synchronization mechanism makes the implementation extremely complex and inefficient; on the other hand it cannot exploit the software flexibility unique to the Shenwei architecture. Compared with x86, the Shenwei architecture has distinctive virtualization advantages. First, the Shenwei architecture uses a software-managed Translation Lookaside Buffer (TLB), which provides the necessary conditions for memory virtualization optimization. The TLB is a small hardware cache that directly stores virtual-to-physical address mappings; it sits next to the CPU, which consults it first on every address translation. Second, the Shenwei architecture has a hardware mode with higher privilege than kernel mode and a unique programmable software interface, HMcode. This interface runs in hardware mode with the highest system privilege and can directly access registers, memory, and devices such as the TLB. This gives the Shenwei architecture very high low-level software flexibility and rich, diverse support for virtualization.
The Shenwei architecture has unique virtualization advantages. In addition to user mode and kernel mode, it has a mode with the highest privilege, called hardware mode. Host and guest under the Shenwei architecture can therefore have three orthogonal privilege levels, namely user mode, kernel mode, and hardware mode, which is similar to the Intel VMX operating modes. The Shenwei HMcode is a programmable interface between the kernel layer and the hardware; it runs in hardware mode and executes privileged instructions. The HMcode interface is transparent to the user layer and even the kernel layer, and can directly access registers and memory by physical address. The operating system can trap into hardware mode through system calls. For example, HMcode provides the kernel with a TLB flush interface called TBI. Similar to VPID (Virtual Processor ID) and PCID (Process Context ID) in the x86 TLB, the VPN (Virtual Processor Number) and UPN (User Process Number) in the Sunway TLB distinguish different virtual processors and processes, respectively. The HMcode interface thus provides software flexibility and can also help compensate for missing virtualization hardware support.
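Purely for illustration, the following C sketch shows how a software-managed TLB entry might carry the VPN and UPN tags described above; the field names and widths are assumptions for exposition, not the actual Shenwei hardware format.

    #include <stdint.h>

    /* Hypothetical layout of a software-managed TLB entry; field names and
     * widths are illustrative assumptions, not the real Shenwei format. */
    struct sw_tlb_entry {
        uint64_t gva;        /* guest virtual page being mapped            */
        uint64_t hpa;        /* host physical frame it translates to       */
        uint8_t  vpn;        /* virtual processor number (2 bits used)     */
        uint8_t  upn;        /* user process number (8 bits used)          */
        uint8_t  valid;      /* entry holds a live mapping                 */
    };

    /* An entry matches a lookup only if the address and both tags agree. */
    static int tlb_entry_matches(const struct sw_tlb_entry *e,
                                 uint64_t gva, uint8_t vpn, uint8_t upn)
    {
        return e->valid && e->gva == gva && e->vpn == vpn && e->upn == upn;
    }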
Disclosure of Invention
The invention aims to realize a memory virtualization system on a Shenwei server that fully exploits the advantages of the Shenwei architecture, in particular its software-managed TLB (translation lookaside buffer). Specifically, a memory virtualization system for the Shenwei 1621 server is built on the shadow page table model, making full use of the Shenwei programmable software interface HMcode and the software-managed TLB. The core idea of the invention is to refresh the shadow page table and the TLB simultaneously, based on the software-managed TLB of the Shenwei architecture, thereby keeping the shadow page table synchronized with the guest process page table.
The technical scheme adopted by the invention is as follows:
a memory virtualization method under Shenwei architecture comprises the following steps:
establishing a buffer area for storing a base address of a shadow page table;
when the CPU queries the TLB and generates TLB miss, the CPU accesses the buffer area to acquire the base address of the shadow page table of the current process by using the TLB characteristic managed by the Shenwei architecture software, loads the base address of the shadow page table into a memory management unit and starts page table query;
when mapping is missing in page table query, the CPU switches the client context to the host context to perform missing page interrupt processing;
filling the TLB by utilizing the characteristic that Shenwei architecture software fills the TLB, and directly filling virtual-real address translation mapping obtained after the missing page interrupt processing into the corresponding TLB to realize the TLB prefetching;
and the CPU inquires the TLB again to complete the address conversion from the virtual address of the client to the physical address of the host machine.
Further, the buffer uses 16 physical pages, each storing 1024 entries; the buffer is indexed by the combination of a 2-bit VPN and an 8-bit UPN, and each entry contains the 64-bit shadow page table base address of one process.
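As an illustration of the buffer layout described above, the following C sketch indexes a per-core page of shadow page table base addresses by concatenating the 2-bit VPN and the 8-bit UPN; the names and exact layout are assumptions for exposition, not code from the patent.

    #include <stdint.h>

    #define SPT_BASE_ENTRIES_PER_PAGE 1024  /* 2-bit VPN + 8-bit UPN -> 1024 slots */
    #define SPT_BASE_PAGES            16    /* one 8 KB page per physical core     */

    /* Per-core buffer of shadow page table base addresses: one 64-bit slot
     * per (VPN, UPN) pair. Names are illustrative, not the patent's code.  */
    typedef uint64_t spt_base_page_t[SPT_BASE_ENTRIES_PER_PAGE];

    static inline unsigned spt_base_index(unsigned vpn, unsigned upn)
    {
        /* concatenate the 2-bit VPN and the 8-bit UPN into a 10-bit index */
        return ((vpn & 0x3u) << 8) | (upn & 0xffu);
    }

    static inline uint64_t spt_base_lookup(const spt_base_page_t buf,
                                           unsigned vpn, unsigned upn)
    {
        return buf[spt_base_index(vpn, upn)];
    }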
Further, a 4-level shadow page table structure is adopted for the page table walk: the memory management unit traverses the 4-level shadow page table to obtain the mapping from guest virtual address to host physical address; if the walk succeeds, the mapping is filled into the TLB and the CPU queries the TLB again to complete the address translation; a mapping miss at any level of the page table results in a page fault.
Further, the page fault handling includes: guest process page table walk, host process page table walk, and shadow page table construction and filling.
Further, the page fault handling comprises:
walking the guest process page table to translate the guest virtual address into a guest physical address, and, if the guest process page table is missing or incompletely mapped, re-entering the virtual machine to complete the guest process page table;
converting the guest physical address into a host virtual address, and walking the host process page table to translate the host virtual address into a host physical address;
and using the guest-virtual-to-host-physical mapping obtained from the two walks, constructing a four-level page table according to the shadow page table organization and filling in the mapping.
Further, synchronization of the shadow page table with the guest process page table is maintained as follows:
the guest operating system flushes the TLB entries of the current process directly through a system call, without exiting the virtual machine;
a shadow page table flusher is implemented in the HMcode interface to monitor the software-managed TLB interface;
the shadow page table flusher decodes the captured TLB flush instruction and invalidates the corresponding shadow page table entries, thereby flushing the TLB and the shadow page table simultaneously.
Further, TLB flush requests are issued through two interfaces in the Shenwei operating system: one is the operating system's process page table fault handling function, and the other is the process context switch handling function.
Further, obsolete shadow page tables are reclaimed in time through the following steps:
when the TLB flush instruction captured by the shadow page table flusher flushes the TLB entries of an entire process, immediately invalidating the corresponding process entry stored in the shadow page table base address buffer;
during shadow page fault handling, when the hypervisor constructs a shadow page table mapping, first checking the valid bit of the current process's base address in the buffer, and directly reclaiming all shadow page tables of the current process if the bit is invalid.
A memory virtualization system under the Shenwei architecture using the above method comprises:
a TLB query module, used for the CPU to query the TLB to obtain the virtual-to-physical address mapping;
a page table query module, used, when a TLB miss occurs, for the CPU to access a pre-established buffer storing shadow page table base addresses (using the software-managed TLB of the Shenwei architecture) to obtain the shadow page table base address of the current process, load it into the memory management unit, and start a page table walk;
a page fault handling module, used for the CPU to switch from the guest context to the host context for page fault handling when a mapping is missing during the page table walk;
and a TLB prefetch module, used to fill the virtual-to-physical address mapping obtained from the page fault handling directly into the corresponding TLB (using the Shenwei architecture's software TLB filling), realizing TLB prefetching, so that the translation from guest virtual address to host physical address completes when the CPU queries the TLB again.
A virtual machine based on the Shenwei architecture adopts the above method for memory virtualization.
The invention provides a novel memory virtualization method and system under the Shenwei architecture, based on the characteristics of the architecture and in particular its software-managed TLB mechanism. On one hand, it builds on the shadow page table, inheriting the efficient page table walk of the traditional shadow page table model while eliminating the page fault overhead caused by that model's write-protection synchronization; on the other hand, it requires no complex hardware support and, unlike the extended page table model, introduces no extra page table walk cost.
Drawings
Fig. 1 is a diagram of the implementation interfaces of the memory virtualization model on the Shenwei architecture.
FIG. 2 is a diagram of the Shenwei memory virtualization overhead results using the SPEC CPU2006 test set.
FIG. 3 is a graph of x86 memory virtualization overhead results using the SPEC CPU2006 test set.
FIG. 4 is a comparison of Shenwei and x86 memory virtualization overhead using large-working-set programs from SPEC CPU2017.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, the present invention shall be described in further detail with reference to the following detailed description and accompanying drawings.
The "software managed TLB" described in the present disclosure refers to an architecture that exposes a TLB software management interface for an operating system to flush/fill TLB entries.
The shadow page table is used for directly caching mapping relation from virtual address of a client to physical address of a host in a virtualization environment. This is an important method for accelerating the efficiency of two-layer address translation.
The page table synchronization refers to the fact that in a shadow page table memory virtualization model, a shadow page table needs to keep mapping consistency with a client process page table, and the process of maintaining the consistency is called page table synchronization.
The present invention relates to a TLB and shadow page table simultaneous refreshing method, which is characterized in that a monitoring-capturing TLB refreshing instruction is realized based on a Shenwei architecture TLB software management interface, and corresponding shadow page table entries are refreshed simultaneously. In a computer system, to ensure the validity of TLB entries, once the process page table is modified by the operating system, the corresponding TLB entry must be flushed, i.e., the old mapping invalidated.
KVM is a module in the Linux kernel and an efficient open-source virtualization solution. It consists of a loadable kernel module that provides the core virtualization infrastructure and a processor-specific module for architecture emulation and interrupt handling. The invention implements a KVM-based memory virtualization model on the Shenwei 1621 server. Fig. 1 shows the implementation interfaces of the memory virtualization model on the Shenwei architecture, where I-TLB denotes the instruction TLB and D-TLB the data TLB. Memory virtualization in the invention covers two main tasks: efficiently completing the translation from guest virtual addresses to host physical addresses, and keeping the shadow page table synchronized with the guest process page table.
1. Address translation flow:
The task of memory virtualization is to complete the translation from a guest virtual address to a host physical address.
1) TLB query. The CPU accesses the TLB and looks up a mapping by the guest virtual address; on a TLB hit, address translation is complete; otherwise a TLB miss occurs and a page table walk begins.
2) Page table walk. The invention designs a buffer of limited size to store shadow page table base addresses. Using the software-managed TLB of the Shenwei architecture, the CPU accesses this buffer before the page table walk to obtain the shadow page table base address of the current process. The Shenwei 1621 has 16 physical cores and an 8 KB page size, so the buffer uses 16 physical pages, each storing 1024 entries; it is indexed by the combination of a 2-bit VPN and an 8-bit UPN, and each entry contains the 64-bit shadow page table base address of one process. The shadow page table base address is loaded into the memory management unit and the page table walk begins. The invention adopts a 4-level shadow page table structure; the memory management unit traverses the 4 levels to obtain the mapping from guest virtual address to host physical address (a sketch of such a walk follows below). If the walk succeeds, the mapping is filled into the TLB and the CPU queries the TLB again, completing the address translation. A mapping miss at any level results in a page fault.
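For illustration only, the following C sketch shows how a hypervisor might walk a 4-level shadow page table in software, assuming 8 KB pages and 1024 entries per level; the constants, the phys_to_virt helper, and the entry format are assumptions, not the actual Shenwei or KVM implementation.

    #include <stdint.h>

    #define PTE_VALID   0x1ull
    #define PAGE_SHIFT  13                 /* 8 KB pages, as stated above       */
    #define LEVEL_BITS  10                 /* assumed index width per level     */
    #define LEVEL_MASK  ((1ull << LEVEL_BITS) - 1)

    /* Assumed helper: map a host physical address to a dereferenceable pointer
     * (a real hypervisor would go through its direct map). */
    extern void *phys_to_virt(uint64_t hpa);

    /* Walk a 4-level shadow page table. Returns the host physical address of
     * the data, or 0 to signal a mapping miss (which triggers a page fault). */
    static uint64_t shadow_walk(uint64_t spt_base, uint64_t gva)
    {
        uint64_t table = spt_base;

        for (int level = 3; level >= 0; level--) {
            unsigned idx = (gva >> (PAGE_SHIFT + level * LEVEL_BITS)) & LEVEL_MASK;
            uint64_t pte = ((uint64_t *)phys_to_virt(table))[idx];

            if (!(pte & PTE_VALID))
                return 0;                  /* miss at any level -> page fault   */

            table = pte & ~((1ull << PAGE_SHIFT) - 1);  /* next table / frame   */
        }
        return table | (gva & ((1ull << PAGE_SHIFT) - 1));
    }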
3) Page fault handling. Once a mapping miss occurs during the page table walk, the CPU switches from the guest context to the host context to handle the page fault. The arguments passed to the page fault handler include the guest virtual address, the error information, and so on. Page fault handling consists of three parts: guest process page table walk, host process page table walk, and shadow page table construction and filling (a sketch of this handler is given after item c) below).
a) The system first walks the guest process page table to translate the guest virtual address into a guest physical address. If the guest process page table is missing (or not fully mapped), the virtual machine must be re-entered so that the guest can complete its own page table.
b) The guest physical address space is contiguous in the host virtual address space, with a direct linear mapping, so the conversion from guest physical address to host virtual address is immediate. The system then walks the host process page table to translate the host virtual address into a host physical address.
c) From the two walks above, the system obtains the mapping from guest virtual address to host physical address. It constructs a four-level page table according to the shadow page table organization and fills in the mapping.
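A minimal C sketch of the three-part fault handler described in a)-c), assuming hypothetical helpers guest_walk, host_walk, gpa_to_hva, shadow_map, and inject_guest_fault; it illustrates the control flow only, not the actual KVM code.

    #include <stdint.h>

    /* Helpers assumed to exist in the hypervisor; names are illustrative.      */
    extern int  guest_walk(uint64_t gva, uint64_t *gpa);   /* 0 on success      */
    extern int  host_walk(uint64_t hva, uint64_t *hpa);    /* 0 on success      */
    extern void shadow_map(uint64_t spt_base, uint64_t gva, uint64_t hpa);
    extern void inject_guest_fault(uint64_t gva);
    extern uint64_t gpa_to_hva(uint64_t gpa);               /* direct linear map */

    /* Sketch of the three-part shadow page fault handler described above. */
    static int shadow_page_fault(uint64_t spt_base, uint64_t gva)
    {
        uint64_t gpa, hpa;

        /* a) guest virtual -> guest physical; an incomplete guest mapping means
         *    re-entering the guest so its own fault handler can fill it in     */
        if (guest_walk(gva, &gpa)) {
            inject_guest_fault(gva);
            return 1;
        }

        /* b) guest physical -> host virtual (linear map) -> host physical */
        if (host_walk(gpa_to_hva(gpa), &hpa))
            return -1;                     /* host mapping must be established  */

        /* c) build the four-level shadow mapping gva -> hpa and fill it */
        shadow_map(spt_base, gva, hpa);
        return 0;
    }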
4) TLB prefetching. After page fault handling finishes, the CPU re-enters the virtual machine to execute the instruction that caused the TLB miss. Before that, the invention uses the Shenwei architecture's software TLB filling to insert the virtual-to-physical mapping produced by page fault handling directly into the corresponding TLB; this is called TLB prefetching. Without prefetching, the CPU would execute the original instruction, take another TLB miss and page table walk, fetch the mapping from the shadow page table, fill the TLB, and only then re-execute the instruction to complete the translation. The TLB prefetch optimization therefore saves one TLB miss and one page table walk.
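For illustration, the following C sketch shows the resume path with TLB prefetching, assuming a hypothetical HMcode-style fill primitive; the names and signatures are not the real interface.

    #include <stdint.h>

    /* Assumed HMcode-style primitives; names and signatures are illustrative. */
    extern void hmcode_tlb_fill(uint64_t gva, uint64_t hpa,
                                uint8_t vpn, uint8_t upn);
    extern void vm_resume(void);

    /* With prefetching, the mapping produced by the fault handler is pushed
     * into the TLB before the guest re-executes the faulting instruction,
     * saving one TLB miss and one page table walk compared to resuming cold. */
    static void resume_after_fault(uint64_t gva, uint64_t hpa,
                                   uint8_t vpn, uint8_t upn)
    {
        hmcode_tlb_fill(gva, hpa, vpn, upn);   /* TLB prefetch                 */
        vm_resume();                           /* re-enter guest at faulting PC */
    }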
2. Page table synchronization:
the guest operating system can flush (invalidate) the TLB entries of the current process directly through system calls without requiring a virtual machine exit (i.e., context switch). In the Shenwei operating system, there are two main interfaces that can issue "TLB flush" requests. One is the operating system process page table page fault handling function. When the operating system updates the process page table, it needs to invalidate the old TLB entries with the same guest virtual address. The other interface is a process context switch handling function. When switching process contexts, all TLB entries under the entire virtual processor should be flushed if a rotation UPN is required. The TLB entry of the Shenwei 1621 contains an 8-bit UPN that identifies the active process. Each process gets a UPN when it is first scheduled on the CPU. If the number of processes exceeds 256, the UPN will rotate, meaning that all TLB entries under the current virtual processor will be flushed, and the system will reassign the UPN to the active process. The present invention implements a shadow page table flusher in the HMcode interface to monitor the TLB interface for software management. The shadow page table flusher decodes the captured TLB flush instruction and invalidates the corresponding shadow page table entry. Therefore, the present invention realizes the simultaneous refreshing of the TLB and the shadow page table.
3. Shadow page table reclamation:
The shadow page table belongs to host process memory and is managed by the hypervisor. In a multitasking virtual machine, each process uses its own shadow page table. Under the Shenwei architecture, 256 process shadow page tables are maintained per virtual processor, i.e., at most 1024 process shadow page tables on one physical core. Frequent process creation and destruction requires the memory virtualization model to reclaim obsolete shadow page tables in a timely manner. TLB flushes have different granularities, such as a single TLB entry or all TLB entries of a process. When the TLB flush instruction captured by the shadow page table flusher flushes the TLB entries of an entire process, the corresponding process entry in the shadow page table base address buffer is immediately invalidated. During shadow page fault handling, when the hypervisor constructs a shadow page table mapping, it first checks the valid bit of the current process's base address in the buffer and directly reclaims all shadow page tables of the current process if the bit is invalid.
4. Experimental evaluation:
To verify the efficiency of the invention, we evaluated it with the SPEC CPU test suites and the STREAM bandwidth benchmark. Since the working sets in SPEC CPU2006 are generally small (under 3 GB), we also selected some large-working-set programs from SPEC CPU2017. Figures 2 and 3 show the SPEC CPU2006 results for the new shadow page table model under the Shenwei architecture and for the traditional shadow page table and extended page table models under x86, respectively. The results show that the average execution-time overhead of memory virtualization with the new shadow page table under the Shenwei architecture is only 1.36%, significantly lower than that of the traditional shadow page table (5.97%) and the extended page table (5.36%) under x86. Figure 4 shows the SPEC CPU2017 results; the new model performs well even with large-working-set programs. In this test the virtualization overhead of the new shadow page table model under the Shenwei architecture is only 3.22%, whereas the traditional shadow page table and extended page table models under x86 reach 9.27% and 11.06%, respectively. STREAM is a classic benchmark for system memory bandwidth; the results show that the bandwidth loss of memory virtualization under the Shenwei architecture is only 0.5%.
Based on the same inventive concept, another embodiment of the present invention provides a memory virtualization system under the Shenwei architecture using the above method, comprising:
a TLB query module, used for the CPU to query the TLB to obtain the virtual-to-physical address mapping;
a page table query module, used, when a TLB miss occurs, for the CPU to access a pre-established buffer storing shadow page table base addresses (using the software-managed TLB of the Shenwei architecture) to obtain the shadow page table base address of the current process, load it into the memory management unit, and start a page table walk;
a page fault handling module, used for the CPU to switch from the guest context to the host context for page fault handling when a mapping is missing during the page table walk;
and a TLB prefetch module, used to fill the virtual-to-physical address mapping obtained from the page fault handling directly into the corresponding TLB (using the Shenwei architecture's software TLB filling), realizing TLB prefetching, so that the translation from guest virtual address to host physical address completes when the CPU queries the TLB again.
Based on the same inventive concept, another embodiment of the present invention provides a virtual machine based on the Shenwei architecture, and the virtual machine performs memory virtualization by using the method of the present invention.
The foregoing disclosure of the specific embodiments of the present invention and the accompanying drawings is directed to an understanding of the present invention and its implementation, and it will be appreciated by those skilled in the art that various alternatives, modifications, and variations may be made without departing from the spirit and scope of the invention. The present invention should not be limited to the disclosure of the embodiments and drawings in the specification, and the scope of the present invention is defined by the scope of the claims.

Claims (10)

1. A memory virtualization method under the Shenwei architecture, characterized by comprising the following steps:
establishing a buffer for storing shadow page table base addresses;
when the CPU queries the TLB and a TLB miss occurs, the CPU, using the software-managed TLB of the Shenwei architecture, accesses the buffer to obtain the shadow page table base address of the current process, loads it into the memory management unit, and starts a page table walk;
when a mapping is missing during the page table walk, the CPU switches from the guest context to the host context to handle the page fault;
using the Shenwei architecture's ability to fill the TLB in software, filling the virtual-to-physical address mapping obtained from the page fault handling directly into the corresponding TLB to realize TLB prefetching;
and the CPU queries the TLB again to complete the translation from guest virtual address to host physical address.
2. The method of claim 1, wherein the buffer uses 16 physical pages, each storing 1024 entries; the buffer is indexed by the combination of a 2-bit VPN and an 8-bit UPN, and each entry contains the 64-bit shadow page table base address of one process.
3. The method of claim 1, wherein a 4-level shadow page table structure is adopted for the page table walk: the memory management unit traverses the 4-level shadow page table to obtain the mapping from guest virtual address to host physical address; if the walk succeeds, the mapping is filled into the TLB and the CPU queries the TLB again to complete the address translation; a mapping miss at any level of the page table results in a page fault.
4. The method of claim 1, wherein the page fault handling comprises: guest process page table walk, host process page table walk, and shadow page table construction and filling.
5. The method of claim 4, wherein the page fault handling comprises:
walking the guest process page table to translate the guest virtual address into a guest physical address, and, if the guest process page table is missing or incompletely mapped, re-entering the virtual machine to complete the guest process page table;
converting the guest physical address into a host virtual address, and walking the host process page table to translate the host virtual address into a host physical address;
and using the guest-virtual-to-host-physical mapping obtained from the two walks, constructing a four-level page table according to the shadow page table organization and filling in the mapping.
6. The method of claim 1, wherein synchronization of the shadow page table with the guest process page table is maintained through the following steps:
the guest operating system flushes the TLB entries of the current process directly through a system call, without exiting the virtual machine;
a shadow page table flusher is implemented in the HMcode interface to monitor the software-managed TLB interface;
the shadow page table flusher decodes the captured TLB flush instruction and invalidates the corresponding shadow page table entries, thereby flushing the TLB and the shadow page table simultaneously.
7. The method of claim 6, wherein the TLB flush request is issued through two interfaces in the Shenwei operating system: one is the operating system's process page table fault handling function, and the other is the process context switch handling function.
8. The method of claim 1, wherein obsolete shadow page tables are reclaimed in time through the following steps:
when the TLB flush instruction captured by the shadow page table flusher flushes the TLB entries of an entire process, immediately invalidating the corresponding process entry stored in the shadow page table base address buffer;
during shadow page fault handling, when the hypervisor constructs a shadow page table mapping, first checking the valid bit of the current process's base address in the buffer, and directly reclaiming all shadow page tables of the current process if the bit is invalid.
9. A memory virtualization system under the Shenwei architecture using the method of any one of claims 1-8, comprising:
a TLB query module, used for the CPU to query the TLB to obtain the virtual-to-physical address mapping;
a page table query module, used, when a TLB miss occurs, for the CPU to access a pre-established buffer storing shadow page table base addresses (using the software-managed TLB of the Shenwei architecture) to obtain the shadow page table base address of the current process, load it into the memory management unit, and start a page table walk;
a page fault handling module, used for the CPU to switch from the guest context to the host context for page fault handling when a mapping is missing during the page table walk;
and a TLB prefetch module, used to fill the virtual-to-physical address mapping obtained from the page fault handling directly into the corresponding TLB (using the Shenwei architecture's software TLB filling), realizing TLB prefetching, so that the translation from guest virtual address to host physical address completes when the CPU queries the TLB again.
10. A virtual machine based on the Shenwei architecture, wherein the virtual machine performs memory virtualization by using the method of any one of claims 1 to 8.
CN202011084199.2A 2020-10-12 2020-10-12 Memory virtualization method and system under Shenwei architecture Active CN112363824B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011084199.2A CN112363824B (en) 2020-10-12 2020-10-12 Memory virtualization method and system under Shenwei architecture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011084199.2A CN112363824B (en) 2020-10-12 2020-10-12 Memory virtualization method and system under Shenwei architecture

Publications (2)

Publication Number Publication Date
CN112363824A true CN112363824A (en) 2021-02-12
CN112363824B CN112363824B (en) 2022-07-22

Family

ID=74506664

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011084199.2A Active CN112363824B (en) 2020-10-12 2020-10-12 Memory virtualization method and system under Shenwei architecture

Country Status (1)

Country Link
CN (1) CN112363824B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860381A (en) * 2021-03-09 2021-05-28 上海交通大学 Virtual machine memory capacity expansion method and system based on Shenwei processor
CN113297104A (en) * 2021-06-16 2021-08-24 无锡江南计算技术研究所 Address translation device and method facing message transmission mechanism
CN113986775A (en) * 2021-11-03 2022-01-28 苏州睿芯集成电路科技有限公司 Method, system and device for generating page table entries in RISC-V CPU verification
CN114201269A (en) * 2022-02-18 2022-03-18 阿里云计算有限公司 Memory page changing method, system and storage medium
CN114595164A (en) * 2022-05-09 2022-06-07 支付宝(杭州)信息技术有限公司 Method and apparatus for managing TLB cache in virtualized platform
CN114610655A (en) * 2022-05-10 2022-06-10 沐曦集成电路(上海)有限公司 Continuous data access processing device and chip

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567217A (en) * 2012-01-04 2012-07-11 北京航空航天大学 MIPS platform-oriented memory virtualization method
CN107193759A (en) * 2017-04-18 2017-09-22 上海交通大学 The virtual method of device memory administrative unit
CN110196757A (en) * 2019-05-31 2019-09-03 龙芯中科技术有限公司 TLB filling method, device and the storage medium of virtual machine

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567217A (en) * 2012-01-04 2012-07-11 北京航空航天大学 MIPS platform-oriented memory virtualization method
CN107193759A (en) * 2017-04-18 2017-09-22 上海交通大学 The virtual method of device memory administrative unit
WO2018192160A1 (en) * 2017-04-18 2018-10-25 上海交通大学 Virtualization method for device memory management unit
CN110196757A (en) * 2019-05-31 2019-09-03 龙芯中科技术有限公司 TLB filling method, device and the storage medium of virtual machine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CAI Wanwei et al., "Research on Memory Virtualization Based on the MIPS Architecture", Journal of Computer Research and Development *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860381A (en) * 2021-03-09 2021-05-28 上海交通大学 Virtual machine memory capacity expansion method and system based on Shenwei processor
CN112860381B (en) * 2021-03-09 2022-04-26 上海交通大学 Virtual machine memory capacity expansion method and system based on Shenwei processor
CN113297104A (en) * 2021-06-16 2021-08-24 无锡江南计算技术研究所 Address translation device and method facing message transmission mechanism
CN113297104B (en) * 2021-06-16 2022-11-15 无锡江南计算技术研究所 Address translation device and method facing message transmission mechanism
CN113986775A (en) * 2021-11-03 2022-01-28 苏州睿芯集成电路科技有限公司 Method, system and device for generating page table entries in RISC-V CPU verification
CN113986775B (en) * 2021-11-03 2023-08-18 苏州睿芯集成电路科技有限公司 Page table item generation method, system and device in RISC-V CPU verification
CN114201269A (en) * 2022-02-18 2022-03-18 阿里云计算有限公司 Memory page changing method, system and storage medium
CN114595164A (en) * 2022-05-09 2022-06-07 支付宝(杭州)信息技术有限公司 Method and apparatus for managing TLB cache in virtualized platform
CN114610655A (en) * 2022-05-10 2022-06-10 沐曦集成电路(上海)有限公司 Continuous data access processing device and chip
CN114610655B (en) * 2022-05-10 2022-08-05 沐曦集成电路(上海)有限公司 Continuous data access processing device and chip

Also Published As

Publication number Publication date
CN112363824B (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN112363824B (en) Memory virtualization method and system under Shenwei architecture
US10303620B2 (en) Maintaining processor resources during architectural events
JP5680179B2 (en) Address mapping in virtual processing systems
US9529611B2 (en) Cooperative memory resource management via application-level balloon
TWI531912B (en) Processor having translation lookaside buffer for multiple context comnpute engine, system and method for enabling threads to access a resource in a processor
US8615643B2 (en) Operational efficiency of virtual TLBs
US6907600B2 (en) Virtual translation lookaside buffer
CN110196757B (en) TLB filling method and device of virtual machine and storage medium
US20020065989A1 (en) Master/slave processing system with shared translation lookaside buffer
US7234038B1 (en) Page mapping cookies
US20200319913A1 (en) System, apparatus and method for accessing multiple address spaces via a virtualization device
JP2021532468A (en) A memory protection unit that uses a memory protection table stored in the memory system
CN112328354A (en) Virtual machine live migration method and device, electronic equipment and computer storage medium
EP3757799B1 (en) System and method to track physical address accesses by a cpu or device
US20220269615A1 (en) Cache-based trace logging using tags in system memory
KR101200083B1 (en) A risc processor device and its instruction address conversion looking-up method
JP2021531583A (en) Binary search procedure for control tables stored in memory system
Chen et al. DMM: A dynamic memory mapping model for virtual machines
CN112363960B (en) Novel memory virtualization method and system based on shadow page table mechanism
CN115061955A (en) Processor, electronic device, address translation method and cache page table entry method
Sha et al. Accelerating address translation for virtualization by leveraging hardware mode
WO2021225896A1 (en) Memory page markings as logging cues for processor-based execution tracing
CN114840299A (en) Improved nested page table memory virtualization method and system under Shenwei architecture
US11687453B2 (en) Cache-based trace logging using tags in an upper-level cache
US11561896B2 (en) Cache-based trace logging using tags in an upper-level cache

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant