CN107368431B

CN107368431B - Memory access method, crossbar switch and computer system

Info

Publication number: CN107368431B
Application number: CN201610308364.5A
Authority: CN
Inventors: 王洪虎; 高翔; 朱琛; 苏孟豪
Original assignee: Loongson Technology Corp Ltd
Current assignee: Loongson Technology Corp Ltd
Priority date: 2016-05-11
Filing date: 2016-05-11
Publication date: 2020-03-31
Anticipated expiration: 2036-05-11
Also published as: CN107368431A

Abstract

An embodiment of the invention provides a memory access method, a crossbar switch and a computer system. The method comprises the following steps: the crossbar switch receives an access request message sent by a GPU, wherein the access request message includes a start address corresponding to the memory space to be accessed; if the start address is greater than or equal to a first address, the crossbar switch determines a mapping address corresponding to the start address, so that the GPU accesses the memory space beginning at the mapping address. The first address is the highest address of the memory space actually accessible by the GPU before the crossbar switch determines the mapping address; the mapping address lies between a second address and a third address; the second address is the highest address of the memory space theoretically accessible by the GPU; and the third address is the highest address of the memory. The first address is smaller than the second address, and the second address is smaller than the third address. The memory requirement of the GPU is thereby met.

Description

Memory access method, crossbar switch and computer system

Technical Field

The present invention relates to memory access technologies, and in particular, to a memory access method, a crossbar switch, and a computer system.

Background

With the continuous development of computer technology, many computers currently include a Central Processing Unit (CPU) and a Graphics Processing Unit (GPU).

When the number of bits of a binary number that the GPU can process at one time in unit time (its word length) is smaller than the corresponding number for the CPU, the memory space theoretically accessible by the GPU is determined by the GPU's number of bits. For example, in a system based on the 64-bit MIPS (Microprocessor without Interlocked Pipeline Stages) architecture, a 32-bit GPU can theoretically access 2³² bytes, that is, 4 GB of memory space, with the corresponding address range 0 to 0xffffffff. However, only 256 MB of this 4 GB of memory space is not used for storing the page table, so the GPU can only compete with other devices for that 256 MB. The memory requirement of the GPU is 4 GB, and thus 256 MB cannot meet it.

In summary, the memory access method in the prior art cannot meet the memory requirement of the GPU.

Disclosure of Invention

The embodiment of the invention provides a memory access method, a crossbar switch and a computer system, so that the memory requirement of a GPU (graphics processing unit) is met.

In a first aspect, an embodiment of the present invention provides a memory access method applied to a computer system of the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture. The computer system includes a central processing unit (CPU), a memory, a crossbar switch and a graphics processing unit (GPU); the crossbar switch is connected to the memory and to the GPU respectively; and the number of bits of a binary number that the CPU can process at one time in unit time is greater than the number of bits of a binary number that the GPU can process at one time in unit time. The method comprises the following steps:

the crossbar switch receives an access request message sent by the GPU, wherein the access request message includes a start address corresponding to the memory space to be accessed;

if the start address is greater than or equal to a first address, the crossbar switch determines a mapping address corresponding to the start address, so that the GPU accesses the memory space beginning at the mapping address;

before the crossbar switch determines the mapping address, the first address is the highest address corresponding to the memory space actually accessible by the GPU, the mapping address is located between a second address and a third address, the second address is the highest address corresponding to the memory space theoretically accessible by the GPU, and the third address is the highest address of the memory;

the first address is smaller than the second address, and the second address is smaller than the third address.

In the method described above, determining, by the crossbar switch, the mapping address corresponding to the start address if the start address is greater than or equal to the first address includes: if the start address is greater than or equal to the first address, the crossbar switch performs an OR operation on the start address and the address immediately following the second address to obtain the mapping address.

In the method described above, determining, by the crossbar switch, the mapping address corresponding to the start address if the start address is greater than or equal to the first address includes:

if the starting address is greater than or equal to the first address, the crossbar switch determines a first difference value between the starting address and the second address and a second difference value between the starting address and the third address;

the crossbar determines an offset based on the start address according to the first difference and the second difference, the offset being greater than or equal to the first difference and less than the second difference;

and the crossbar switch determines a mapping address corresponding to the starting address according to the starting address and the offset.

In a second aspect, an embodiment of the present invention provides a crossbar switch applied to a computer system of the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture. The computer system further includes a CPU, a memory and a graphics processing unit (GPU); the crossbar switch is connected to the memory and to the GPU respectively; and the number of bits of a binary number that the CPU can process at one time in unit time is greater than the number of bits of a binary number that the GPU can process at one time in unit time. The crossbar switch includes a receiving module and a determining module;

the receiving module is configured to receive an access request message sent by the GPU, wherein the access request message includes a start address corresponding to the memory space to be accessed;

if the start address is greater than or equal to a first address, the determining module determines a mapping address corresponding to the start address, so that the GPU accesses the memory space beginning at the mapping address;

As described above, the determining module is specifically configured to:

if the starting address is greater than or equal to the first address, the determining module performs an or operation on the starting address and a next address of the second address to obtain the mapping address.

As described above, the determining module is specifically configured to:

if the starting address is greater than or equal to the first address, the determining module determines a first difference between the starting address and the second address and a second difference between the starting address and the third address;

the determining module determines an offset based on the start address according to the first difference and the second difference, the offset being greater than or equal to the first difference and less than the second difference;

and the determining module determines a mapping address corresponding to the starting address according to the starting address and the offset.

In a third aspect, an embodiment of the present invention provides a computer system based on the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture, where the computer system includes: a central processing unit (CPU), a memory, a graphics processing unit (GPU) and the crossbar switch described above;

the number of bits of binary numbers which can be processed once by the CPU in unit time is larger than the number of bits of binary numbers which can be processed once by the GPU in unit time.

Embodiments of the present invention provide a memory access method, a crossbar switch, and a computer system. In the prior art, once the start address is greater than or equal to the first address, the memory requirement of the GPU cannot be met; that is, the GPU cannot perform a read or write operation on the memory. The method extends the usable addresses into the range between the second address and the third address, so that the GPU has an accessible memory space whether the start address is smaller than the first address or greater than or equal to it, thereby meeting the memory requirement of the GPU.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1A is a schematic diagram of an application scenario according to an embodiment of the present invention;

Fig. 1B is a schematic diagram of another application scenario according to an embodiment of the present invention;

Fig. 2 is a flowchart of a memory access method according to an embodiment of the present invention;

Fig. 3 is a flowchart of a memory access method according to another embodiment of the present invention;

Fig. 4 is a flowchart of a memory access method according to yet another embodiment of the present invention;

Fig. 5 is a schematic diagram of a crossbar switch according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of a computer system according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one

In order to solve the problem that the existing memory access method cannot meet the memory requirement of a GPU, the embodiment of the invention provides a memory access method, a crossbar switch and a computer system. The method is applied to a computer system of the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture, and the computer system includes a CPU, a memory, a crossbar switch and a GPU, the crossbar switch being connected to the memory and to the GPU respectively, wherein the number of bits of a binary number that the CPU can process at one time in unit time is greater than the number of bits of a binary number that the GPU can process at one time in unit time.

Specifically, the memory access method may be applied to the following two scenarios. Fig. 1A is a schematic diagram of an application scenario provided in an embodiment of the present invention. As shown in Fig. 1A, the memory access method is applied to a System on Chip (SoC), in which a crossbar switch is connected to a GPU, a cache, a CPU, and a memory controller respectively, the memory controller is connected to a ROM or a RAM, and the crossbar switch executes the memory access method.

Fig. 1B is a schematic diagram of another application scenario provided by an embodiment of the present invention. As shown in Fig. 1B, a crossbar switch is connected to a cache, a memory controller and a bridge chip, and the memory controller is connected to a system ROM or RAM. The bridge chip is provided with a GPU, an HT bus interface, a crossbar switch, a memory controller and a ROM or RAM, and the crossbar switch on the bridge chip is connected to the GPU and to the memory controller on the bridge chip respectively. The role of the crossbar switch on the bridge chip is to determine whether the GPU accesses the ROM or RAM on the bridge chip directly or accesses the system memory via the HT bus; as shown in Fig. 1B, the system memory is the system ROM, RAM, or cache.

Based on the above two application scenarios, Fig. 2 is a flowchart of a memory access method according to an embodiment of the present invention. As shown in Fig. 2, the memory access method includes:

S201: the crossbar switch receives an access request message sent by a GPU, wherein the access request message includes a start address corresponding to the memory space to be accessed;

S202: if the start address is greater than or equal to the first address, the crossbar switch determines a mapping address corresponding to the start address, so that the GPU accesses the memory space beginning at the mapping address.

The memory space in the embodiment of the present invention refers to a virtual memory space, that is, the address space of the CPU, and all addresses in the embodiment of the present invention are linear addresses. The first address is the highest address of the memory space actually accessible by the GPU before the crossbar switch determines the mapping address. The second address is the highest address of the memory space theoretically accessible by the GPU, determined according to the number of bits of a binary number that the GPU can process at one time in unit time. The third address is the highest address determined according to the number of bits of a binary number that the CPU can process at one time in unit time. The first address is smaller than the second address, the second address is smaller than the third address, the start address lies in the range [0, second address), and the mapping address lies between the second address and the third address.

For example: assume that the number of bits of a binary number that the CPU can process at one time in unit time is 64 and that the corresponding number for the GPU is 32. From the CPU's number of bits, the size of the memory space is 2⁶⁴ bytes, and the highest address of that memory space, that is, the third address, is 0xffffffffffffffff. From the GPU's number of bits, the size of the memory space theoretically accessible by the GPU is 2³² bytes, that is, 4 GB, and the highest address of that memory space, that is, the second address, is 0xffffffff. However, because part of the low 4 GB of memory space is needed to store the page table, usually only 256 MB of memory space is left for the GPU and other devices to compete for. This 256 MB is the size of the memory space actually accessible by the GPU, and the corresponding first address is 0x10000000.
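The three boundary addresses in this example follow directly from the bit widths and the 256 MB figure. The short sketch below only reproduces that arithmetic; the constant names are hypothetical and chosen for illustration, and they are not part of the described crossbar switch.

```python
# Illustrative arithmetic only; all names are hypothetical.
CPU_BITS = 64                           # bit width of the CPU in the example
GPU_BITS = 32                           # bit width of the GPU in the example
ACTUAL_GPU_SPACE = 256 * 1024 * 1024    # 256 MB left after page-table storage

# Highest address of the CPU address space (third address).
THIRD_ADDRESS = (1 << CPU_BITS) - 1
# Highest address theoretically reachable by the GPU (second address).
SECOND_ADDRESS = (1 << GPU_BITS) - 1
# Boundary that the text gives as the first address.
FIRST_ADDRESS = ACTUAL_GPU_SPACE

print(hex(FIRST_ADDRESS), hex(SECOND_ADDRESS), hex(THIRD_ADDRESS))
```

The required ordering, first address smaller than second address smaller than third address, holds for these values.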

It should be noted that, in the embodiment of the present invention, the memory space accessed by the GPU falls into two cases: in the first, the GPU accesses the memory space corresponding to the RAM or the ROM; in the second, the GPU accesses a cache. In the second case, when the GPU does not find the required content in the cache, the cache determines the address in the RAM or the ROM that the GPU needs to access through the mapping relation between the cache and the RAM or the ROM, so that the GPU can access the RAM or the ROM.

In the embodiment of the present invention, if the start address is greater than or equal to the first address, the crossbar switch determines the mapping address corresponding to the start address, where the mapping address lies between the second address and the third address. In the prior art, the GPU can only access the memory space between 0 and the first address; once the start address included in the access request message is greater than or equal to the first address, the memory requirement of the GPU cannot be met, that is, the GPU cannot perform a read or write operation on the memory. The method extends the usable addresses into the range between the second address and the third address, so that the GPU has an accessible memory space whether the start address is smaller than the first address or greater than or equal to it, thereby meeting the memory requirement of the GPU.
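Steps S201 and S202 can be sketched as a small dispatch function. This is a minimal sketch under stated assumptions, not the actual hardware logic: the function name `resolve`, the `mapper` parameter and `placeholder_mapper` are hypothetical, and "between the second address and the third address" is interpreted here as strictly above the second address and no higher than the third.

```python
from typing import Callable

# Values from the example in the text.
FIRST_ADDRESS = 0x10000000
SECOND_ADDRESS = 0xFFFFFFFF
THIRD_ADDRESS = 0xFFFFFFFFFFFFFFFF


def resolve(start: int, mapper: Callable[[int], int]) -> int:
    """Return the address the GPU ends up accessing for a given start address."""
    # The text gives the range of the start address as [0, second address).
    if not 0 <= start < SECOND_ADDRESS:
        raise ValueError("start address outside the GPU address range")
    # Below the first address the GPU accesses the start address directly.
    if start < FIRST_ADDRESS:
        return start
    # S202: otherwise the crossbar switch determines a mapping address.
    mapped = mapper(start)
    # Assumed interpretation of "between the second and third address".
    if not SECOND_ADDRESS < mapped <= THIRD_ADDRESS:
        raise ValueError("mapping address outside the extended range")
    return mapped


def placeholder_mapper(start: int) -> int:
    # Hypothetical mapping used only to exercise resolve().
    return start + SECOND_ADDRESS + 1


print(hex(resolve(0x1000000, placeholder_mapper)))
print(hex(resolve(0x23000000, placeholder_mapper)))
```

Passing the mapping as a parameter keeps the dispatch separate from the way the mapping address is computed, which the text leaves open at this point.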

Example two

On the basis of the first embodiment, one optional approach is as follows:

Fig. 3 is a flowchart of a memory access method according to another embodiment of the present invention. As shown in Fig. 3, the method specifically includes the following steps:

S301: the crossbar switch receives an access request message sent by a GPU, wherein the access request message includes a start address corresponding to the memory space to be accessed;

S302: the crossbar switch judges whether the start address is greater than or equal to the first address; if so, S303 is executed; otherwise, S304 is executed;

S303: the crossbar switch performs an OR operation on the start address and the address immediately following the second address to obtain a mapping address, and the GPU accesses the memory space beginning at the mapping address;

S304: the GPU directly accesses the memory space beginning at the start address.

Specifically, continuing the example of the first embodiment, assume that the number of bits of a binary number that the CPU can process at one time in unit time is 64 and that the corresponding number for the GPU is 32, so that the first address is 0x10000000, the second address is 0xffffffff, and the third address is 0xffffffffffffffff. When the start address is 0x23000000, the crossbar switch determines that the start address is greater than the first address and performs an OR operation on the start address and the address immediately following the second address. That next address is 0x100000000, and the OR of 0x100000000 and 0x23000000 gives the mapping address 0x123000000, which lies between the second address and the third address, so the GPU can access the memory space beginning at 0x123000000. When the start address is 0x1000000, the crossbar switch determines that the start address is smaller than the first address, and the GPU can access the memory space beginning at 0x1000000.
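The OR-based mapping of S302 to S304 can be checked with a few lines of code. This is an illustrative sketch using the example's addresses; the function name `map_by_or` is hypothetical.

```python
# Values from the example in the text.
FIRST_ADDRESS = 0x10000000
SECOND_ADDRESS = 0xFFFFFFFF


def map_by_or(start: int) -> int:
    """Apply the OR-based mapping to a GPU start address."""
    if start >= FIRST_ADDRESS:
        # S303: OR the start address with the address after the second address.
        return start | (SECOND_ADDRESS + 1)
    # S304: below the first address the start address is used directly.
    return start


print(hex(map_by_or(0x23000000)))  # 0x123000000
print(hex(map_by_or(0x1000000)))   # 0x1000000
```

Because a start address below 2³² never has bit 32 set, the OR here has the same effect as adding 0x100000000.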

In the embodiment of the invention, when the crossbar switch judges that the initial address is greater than or equal to the first address, the crossbar switch performs OR operation on the initial address and the next address of the second address to obtain a mapping address, and the GPU accesses the memory space corresponding to the mapping address; and when the starting address is judged to be smaller than the first address by the cross switch, the GPU directly accesses the memory space where the starting address starts. Therefore, no matter the initial address is smaller than the first address or larger than or equal to the first address, the GPU has accessible memory space, and the memory requirement of the GPU is met.

Example three

On the basis of the first embodiment, another optional approach is as follows:

Fig. 4 is a flowchart of a memory access method according to yet another embodiment of the present invention. As shown in Fig. 4, the method specifically includes the following steps:

S401: the crossbar switch receives an access request message sent by a GPU, wherein the access request message includes a start address corresponding to the memory space to be accessed;

S402: the crossbar switch judges whether the start address is greater than or equal to the first address; if so, S403 is executed; otherwise, S406 is executed;

S403: the crossbar switch determines a first difference between the start address and the second address and a second difference between the start address and the third address;

S404: the crossbar switch determines an offset relative to the start address according to the first difference and the second difference, wherein the offset is greater than or equal to the first difference and less than the second difference;

S405: the crossbar switch determines a mapping address corresponding to the start address according to the start address and the offset, and the GPU accesses the memory space beginning at the mapping address;

S406: the GPU directly accesses the memory space beginning at the start address.

Specifically, continuing the example of the first embodiment, assume that the number of bits of a binary number that the CPU can process at one time in unit time is 64 and that the corresponding number for the GPU is 32, so that the first address is 0x10000000, the second address is 0xffffffff, and the third address is 0xffffffffffffffff. When the start address is 0x23000000, the crossbar switch determines that the start address is greater than the first address. The crossbar switch then determines that the first difference between the start address and the second address is 0xffffffff - 0x23000000 = 0xdcffffff, and that the second difference between the start address and the third address is 0xffffffffffffffff - 0x23000000 = 0xffffffffdcffffff. Assuming that the determined offset is 0xdd000000, the mapping address corresponding to the start address is 0x23000000 + 0xdd000000 = 0x100000000. When the start address is 0x1000000, the crossbar switch determines that the start address is smaller than the first address, and the GPU can access the memory space beginning at 0x1000000.
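The offset-based mapping of S403 to S405 can be sketched as follows. The text does not say how the offset is chosen within the permitted range, so the sketch takes the offset as an input and only validates it; the function names `differences` and `map_by_offset` are hypothetical.

```python
from typing import Tuple

# Values from the example in the text.
FIRST_ADDRESS = 0x10000000
SECOND_ADDRESS = 0xFFFFFFFF
THIRD_ADDRESS = 0xFFFFFFFFFFFFFFFF


def differences(start: int) -> Tuple[int, int]:
    """S403: first and second difference for a given start address."""
    return SECOND_ADDRESS - start, THIRD_ADDRESS - start


def map_by_offset(start: int, offset: int) -> int:
    """Apply the offset-based mapping to a GPU start address."""
    if start < FIRST_ADDRESS:
        # S406: direct access below the first address.
        return start
    first_diff, second_diff = differences(start)
    # S404: the offset must satisfy first_diff <= offset < second_diff.
    if not first_diff <= offset < second_diff:
        raise ValueError("offset outside the permitted range")
    # S405: mapping address = start address + offset.
    return start + offset


first_diff, second_diff = differences(0x23000000)
print(hex(first_diff), hex(second_diff))
print(hex(map_by_offset(0x23000000, 0xDD000000)))  # 0x100000000
```

In the worked example the chosen offset 0xdd000000 is one more than the first difference, which places the mapping address at the first address above the GPU's theoretical range.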

In the embodiment of the invention, when the crossbar switch judges that the initial address is greater than or equal to the first address, the crossbar switch obtains the mapping address according to the first difference and the second difference, and the GPU accesses the memory space corresponding to the mapping address; and when the starting address is judged to be smaller than the first address by the cross switch, the GPU directly accesses the memory space where the starting address starts. Therefore, no matter the initial address is smaller than the first address or larger than or equal to the first address, the GPU has accessible memory space, and the memory requirement of the GPU is met.

Example four

The embodiment of the present invention further provides a crossbar switch, which is applied to a computer system of the MIPS architecture. The computer system further includes a CPU, a memory and a GPU; the crossbar switch is connected to the memory and to the GPU respectively, wherein the number of bits of a binary number that the CPU can process at one time in unit time is greater than the number of bits of a binary number that the GPU can process at one time in unit time.

Specifically, Fig. 5 is a schematic diagram of a crossbar switch according to an embodiment of the present invention. As shown in Fig. 5, the crossbar switch includes a receiving module 51 and a determining module 52. The receiving module 51 is configured to receive an access request message sent by the GPU, wherein the access request message includes a start address corresponding to the memory space to be accessed; if the start address is greater than or equal to the first address, the determining module 52 determines a mapping address corresponding to the start address, so that the GPU accesses the memory space beginning at the mapping address;

before the crossbar switch determines the mapping address, the first address is the highest address corresponding to the memory space actually accessible by the GPU, the mapping address is located between a second address and a third address, the second address is the highest address corresponding to the memory space theoretically accessible by the GPU, and the third address is the highest address of the memory; the first address is smaller than the second address, and the second address is smaller than the third address.

The present invention provides a crossbar switch, which can be used to execute the method steps in the embodiment shown in fig. 2, and the implementation principle and technical effect are similar, and are not described herein again.

Example five

On the basis of the fourth embodiment, optionally, the determining module 52 is specifically configured to: if the starting address is greater than or equal to the first address, the determining module performs an or operation on the starting address and a next address of the second address to obtain the mapping address.

The present invention provides a crossbar switch, which can be used to execute the method steps in the embodiment shown in fig. 3, and the implementation principle and technical effect are similar, and are not described herein again.

Example six

On the basis of the fourth embodiment, optionally, the determining module 52 is specifically configured to: if the start address is greater than or equal to the first address, the determining module 52 determines a first difference between the start address and the second address and a second difference between the start address and the third address; the determining module 52 determines an offset based on the start address according to the first difference and the second difference, wherein the offset is greater than or equal to the first difference and less than the second difference; the determining module 52 determines a mapping address corresponding to the start address according to the start address and the offset.

The present invention provides a crossbar switch, which can be used to execute the method steps in the embodiment shown in fig. 4, and the implementation principle and technical effect are similar, and are not described herein again.
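The split into a receiving module and a determining module described in the fourth to sixth embodiments can be modelled in software as shown below. This is a rough sketch for illustration only: the real modules are hardware, and every class, method and parameter name here is hypothetical. The default mapping is the OR operation of the fifth embodiment.

```python
from dataclasses import dataclass
from typing import Callable

FIRST_ADDRESS = 0x10000000   # value from the example in the text
SECOND_ADDRESS = 0xFFFFFFFF  # value from the example in the text


@dataclass(frozen=True)
class AccessRequest:
    """Access request message carrying the start address."""
    start_address: int


class ReceivingModule:
    def receive(self, request: AccessRequest) -> int:
        # Extract the start address from the access request message.
        return request.start_address


class DeterminingModule:
    def __init__(self, first_address: int, mapper: Callable[[int], int]):
        self.first_address = first_address
        self.mapper = mapper

    def determine(self, start: int) -> int:
        # Map only when the start address reaches the first address.
        if start >= self.first_address:
            return self.mapper(start)
        return start


def or_mapper(start: int) -> int:
    # OR with the address immediately following the second address.
    return start | (SECOND_ADDRESS + 1)


class CrossbarSwitch:
    def __init__(self, mapper: Callable[[int], int] = or_mapper):
        self.receiving = ReceivingModule()
        self.determining = DeterminingModule(FIRST_ADDRESS, mapper)

    def handle(self, request: AccessRequest) -> int:
        start = self.receiving.receive(request)
        return self.determining.determine(start)


switch = CrossbarSwitch()
print(hex(switch.handle(AccessRequest(0x23000000))))  # 0x123000000
print(hex(switch.handle(AccessRequest(0x1000000))))   # 0x1000000
```

An offset-based mapping as in the sixth embodiment could be supplied through the `mapper` argument instead of `or_mapper`.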

Example seven

Fig. 6 is a schematic diagram of a computer system according to an embodiment of the present invention, the computer system being based on the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture. As shown in Fig. 6, the computer system 60 includes: a central processing unit (CPU) 61, a memory 62, a graphics processing unit (GPU) 63, and a crossbar switch 64 according to an embodiment of the present invention; the number of bits of a binary number that the CPU can process at one time in unit time is greater than the number of bits of a binary number that the GPU can process at one time in unit time.

The invention provides a computer system, wherein a crossbar switch in the computer system can be used for executing the method steps in the embodiments shown in fig. 2, fig. 3 and fig. 4, and the implementation principle and the technical effect are similar, and are not described herein again.

Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a readable storage medium of the memory access system. When executed, the program performs the steps of the method embodiments described above; the aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A memory access method, applied to a computer system of the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture, wherein the computer system comprises: a central processing unit (CPU), a memory, a crossbar switch and a graphics processing unit (GPU), the crossbar switch is connected to the memory and to the GPU respectively, and the number of bits of a binary number that the CPU can process at one time in unit time is greater than the number of bits of a binary number that the GPU can process at one time in unit time, the method comprising the following steps:

the crossbar switch judges whether the starting address is greater than or equal to a first address;

the first address is less than the second address, and the second address is less than the third address;

and if the starting address is smaller than the first address, the GPU directly accesses the memory space where the starting address starts.

2. The method of claim 1, wherein if the start address is greater than or equal to the first address, the crossbar switch determining a mapping address corresponding to the start address comprises:

and if the starting address is greater than or equal to the first address, the crossbar switch performs OR operation on the starting address and the next address of the second address to obtain the mapping address.

3. The method of claim 1, wherein if the start address is greater than or equal to the first address, the crossbar switch determining a mapping address corresponding to the start address comprises:

4. A crossbar switch, applied to a computer system of the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture, wherein the computer system further comprises: a CPU, a memory and a graphics processing unit (GPU), the crossbar switch is connected to the memory and to the GPU respectively, and the number of bits of a binary number that the CPU can process at one time in unit time is greater than the number of bits of a binary number that the GPU can process at one time in unit time, the crossbar switch comprising: a receiving module and a determining module;

the crossbar switch judges whether the starting address is greater than or equal to a first address; if the starting address is larger than or equal to a first address, the determining module determines a mapping address corresponding to the starting address so that the GPU accesses a memory space where the mapping address starts;

5. The crossbar switch of claim 4 wherein the determination module is specifically configured to:

6. The crossbar switch of claim 4 wherein the determination module is specifically configured to:

7. A computer system based on the MIPS (Microprocessor without Interlocked Pipeline Stages) architecture, the computer system comprising: a central processing unit (CPU), a memory, a graphics processing unit (GPU) and a crossbar switch according to any of claims 4-6;