CN116132375A

CN116132375A - Multi-node arbitrary inter-core global communication method based on domestic DSP

Info

Publication number: CN116132375A
Application number: CN202211571806.7A
Authority: CN
Inventors: 侯旋; 曾令将; 杨进
Original assignee: CSIC (WUHAN) LINCOM ELECTRONICS CO LTD
Current assignee: CSIC (WUHAN) LINCOM ELECTRONICS CO LTD
Priority date: 2022-12-08
Filing date: 2022-12-08
Publication date: 2023-05-16

Abstract

The invention discloses a multi-node arbitrary inter-core global communication method based on a domestic DSP. The method comprises: partitioning an exclusive memory space for each core of a multi-core DSP processor to form a circular message queue that buffers received data; and installing a communication proxy module on the main core of each DSP processor, which processes and routes requests from cores outside its own processor node. The invention enables data communication between any two cores in the global system and, to a certain extent, mitigates the transmission delay caused by network congestion in high-throughput, high-concurrency scenarios.

Description

Multi-node arbitrary inter-core global communication method based on domestic DSP

Technical Field

The invention relates to a global communication method, in particular to a multi-node arbitrary inter-core global communication method based on a domestic DSP, and belongs to the technical field of computer communication.

Background

In the field of signal processing, as information technology develops and signal acquisition precision keeps improving, the complexity of the information to be processed keeps rising, and so does the demand for computing power in signal processing equipment.

However, because current chip fabrication processes are approaching their theoretical limits, single-core performance is improving only slowly. Raising the clock frequency of a single-core processor does improve performance, but the gain is often outweighed by the resulting increase in power consumption. Single-core processor devices can no longer meet the computing-power and power-consumption requirements of signal processing scenarios, and large-scale integrated signal processing equipment built from multiple multi-core processors has become a research hotspot in the field.

While multi-core multiprocessor systems increase computing power, efficient inter-core data communication becomes both a key factor and a difficulty in improving signal processing efficiency. Most existing inter-core communication schemes are based on shared memory and serve only the cores inside a single multi-core processor. For large-scale integrated signal processing equipment composed of multiple multi-core processors, there is as yet no effective solution for core-to-core data communication across processor nodes.

Disclosure of Invention

The invention aims to solve at least one of the above technical problems by providing a multi-node arbitrary inter-core global communication method based on a domestic DSP.

The above purpose is achieved through the following technical solution: a multi-node arbitrary inter-core global communication method based on a domestic DSP, applied to integrated signal processing equipment. The equipment comprises a signal processing module composed of a plurality of DSP processors. The DSP processors are interconnected through an SRIO switch to form a network switching system for data interaction. Each DSP processor consists of eight cores that share a 2G DDR RAM memory and communicate with one another through IPC interrupts.

The multi-node arbitrary inter-core global communication method comprises the following steps:

S1) partitioning an exclusive memory space for each core of the multi-core DSP processor to form a circular message queue that buffers received data;

S2) installing a communication proxy module on the main core of the DSP processor, which processes and routes requests from cores outside its own processor node.

As a further aspect of the invention: the circular message queue consists of 32 equal-sized memory blocks in the exclusive memory space and is managed through a mapping table. The mapping table is a 32-bit bitmap in which each bit represents one memory block of the circular message queue; a bit set to 1 indicates that the corresponding memory block is in use.
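The bitmap-managed queue described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the block size, the base address, the function names, and the idea of scanning circularly from a `start` index are assumptions, since the text only fixes the block count (32) and the meaning of each bit.

```python
NUM_BLOCKS = 32            # from the text: 32 equal-sized blocks per queue
BLOCK_SIZE = 4096          # assumption: the text does not state a block size
QUEUE_BASE = 0x8000_0000   # assumption: hypothetical base of one core's exclusive region


def first_free_block(bitmap: int, start: int = 0) -> int:
    """Scan circularly from `start`; return the first clear bit, or -1 if full."""
    for step in range(NUM_BLOCKS):
        index = (start + step) % NUM_BLOCKS
        if not (bitmap >> index) & 1:
            return index
    return -1


def lock_block(bitmap: int, index: int) -> int:
    """Set the bit for `index`: the block is now marked as in use."""
    return bitmap | (1 << index)


def unlock_block(bitmap: int, index: int) -> int:
    """Clear the bit for `index`: the block is returned to the queue."""
    return bitmap & ~(1 << index) & 0xFFFF_FFFF


def block_address(index: int) -> int:
    """Address of block `index` inside the exclusive memory region."""
    return QUEUE_BASE + index * BLOCK_SIZE


bitmap = 0b0111                    # blocks 0, 1 and 2 are already in use
free = first_free_block(bitmap)    # lowest clear bit
bitmap = lock_block(bitmap, free)  # mark it as used before writing into it
```

A single 32-bit word keeps the whole queue state readable and writable in one memory access, which is presumably why the block count matches the word width.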

As a further aspect of the invention: the communication proxy module handles the processing and routing of interaction messages in inter-processor core-to-core communication. It is implemented in the doorbell interrupt service function; a doorbell can be initiated by any core in the global system and carries 16 bits of doorbell information.

As a further aspect of the invention: for communication across DSP processor nodes, the source core sends a doorbell interrupt carrying 16 bits of doorbell information to the DSP processor where the target core is located. The doorbell information contains an interaction instruction together with the source core number and the target core number. The communication proxy module of that processor performs the corresponding action according to the instruction and the target core number and, where necessary, feeds the result back to the source core identified by the source core number.
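One possible packing of the 16-bit doorbell information is sketched below. The text states only that the 16 bits hold an instruction, a source core number, and a target core number; the field widths, bit positions, opcode values, and function names here are all assumptions chosen for illustration.

```python
# Hypothetical layout (not specified in the text):
#   bits 15..12  instruction
#   bits 11..4   source core number, global (processor and core)
#   bits  3..0   target core number, local to the receiving processor
CMD_QUERY = 0x1      # assumed opcode value
CMD_DATA_SENT = 0x2  # assumed opcode value


def pack_doorbell(cmd: int, src_core: int, dst_core: int) -> int:
    """Pack the three fields into one 16-bit doorbell word."""
    if not (0 <= cmd < 16 and 0 <= src_core < 256 and 0 <= dst_core < 16):
        raise ValueError("field does not fit the assumed layout")
    return (cmd << 12) | (src_core << 4) | dst_core


def unpack_doorbell(info: int) -> tuple:
    """Split a 16-bit doorbell word into (cmd, src_core, dst_core)."""
    return (info >> 12) & 0xF, (info >> 4) & 0xFF, info & 0xF


word = pack_doorbell(CMD_QUERY, src_core=19, dst_core=5)
cmd, src, dst = unpack_doorbell(word)
```

The source field is wider than the target field in this sketch because the proxy must be able to reply to a core on any processor, whereas the target core is always local to the processor that receives the doorbell.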

As a further aspect of the invention: the interaction instructions are of two kinds, "query" and "data sent".

As a further aspect of the invention: the query instruction is sent by the source core to the communication proxy module of the processor where the target core is located. It tells the proxy module to access the circular message queue mapping table of the target core identified by the target core number in the doorbell information, obtain the address of the first free block in that queue, and return the address to the source core in an SRIO NWRITE transaction packet.

As a further aspect of the invention: the data sent instruction is sent by the source core to the communication proxy module of the processor where the target core is located. It indicates that the source core has finished sending and that the data is already in the circular message queue of the target core. The proxy module must then "route" the message to the target core, which it does by sending an IPC interrupt to the core identified by the target core number in the doorbell information.
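The two instructions can be read as a small dispatch routine inside the proxy's doorbell handler. The sketch below is a host-side simulation under stated assumptions: `CommProxy`, the callback parameters, the address constants, and the reply of `None` when the queue is full are hypothetical, and the callbacks merely stand in for the SRIO NWRITE reply and the IPC interrupt. Locking the block during the query follows the cross-processor send flow given in the detailed description.

```python
QUERY, DATA_SENT = "query", "data_sent"   # the two instructions named in the text
NUM_BLOCKS = 32
BLOCK_SIZE = 0x1000                       # assumption
QUEUE_STRIDE = NUM_BLOCKS * BLOCK_SIZE    # assumption: queues laid out back to back
QUEUE_BASE = 0x8000_0000                  # assumption


class CommProxy:
    """Simulated proxy on the main core; one bitmap per local core."""

    def __init__(self, nwrite_reply, ipc_interrupt, num_cores=8):
        self.bitmaps = [0] * num_cores
        self.nwrite_reply = nwrite_reply      # stands in for an SRIO NWRITE reply
        self.ipc_interrupt = ipc_interrupt    # stands in for an IPC interrupt

    def on_doorbell(self, cmd, src_core, dst_core):
        """Body of the doorbell interrupt service function (fields already decoded)."""
        if cmd == QUERY:
            bitmap = self.bitmaps[dst_core]
            index = next(
                (i for i in range(NUM_BLOCKS) if not (bitmap >> i) & 1), None
            )
            if index is None:
                # Assumption: the text does not say what happens when the queue is full.
                self.nwrite_reply(src_core, None)
                return
            self.bitmaps[dst_core] = bitmap | (1 << index)   # lock the free block
            address = QUEUE_BASE + dst_core * QUEUE_STRIDE + index * BLOCK_SIZE
            self.nwrite_reply(src_core, address)
        elif cmd == DATA_SENT:
            self.ipc_interrupt(dst_core)      # "route" the message to the target core
        else:
            raise ValueError("unknown instruction: %r" % (cmd,))


replies, interrupts = [], []
proxy = CommProxy(lambda src, addr: replies.append((src, addr)), interrupts.append)
proxy.on_doorbell(QUERY, src_core=19, dst_core=5)
proxy.on_doorbell(DATA_SENT, src_core=19, dst_core=5)
```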

The beneficial effects of the invention are as follows:

On the one hand, for core-to-core data communication across processor nodes, a message is first sent to the communication proxy module of the target processor, which then either processes the message or routes it to the target core. In this way any two cores in the global system can exchange messages.

On the other hand, the target core receives data from every core in the global system through a circular queue. Compared with inter-core communication through a single shared memory area, this effectively reduces the transmission delay caused by network congestion during communication and improves the transmission efficiency of inter-core data in high-throughput, high-concurrency scenarios.

Drawings

FIG. 1 is a block diagram of a signal processing module according to the present invention;

FIG. 2 is a diagram of an inter-core communication scheme in accordance with the present invention;

FIG. 3 is a sequence diagram of intra-processor core-to-core communication according to the present invention;

FIG. 4 is a sequence diagram illustrating inter-processor core-to-core communications according to the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Example 1

As shown in FIG. 1 to FIG. 4, a multi-node arbitrary inter-core global communication method based on a domestic DSP is applied to integrated signal processing equipment. The equipment comprises a signal processing module formed by a plurality of DSP processors, which are interconnected through SRIO switches to form a network switching system and exchange data over SRIO. Each DSP processor consists of eight cores that share a 2G DDR RAM memory, and the cores within a DSP processor communicate through IPC interrupts.

In this embodiment, the circular message queue consists of 32 equal-sized memory blocks in the exclusive memory space and is managed through a mapping table. The mapping table is a 32-bit bitmap in which each bit represents one memory block of the circular message queue; a bit set to 1 indicates that the corresponding memory block is in use.

In this embodiment, the communication proxy module handles the processing and routing of interaction messages in inter-processor core-to-core communication. It is implemented in the doorbell interrupt service function; a doorbell can be initiated by any core in the global system and carries 16 bits of doorbell information.

In this embodiment, for communication across DSP processor nodes, the source core sends a doorbell interrupt carrying 16 bits of doorbell information to the DSP processor where the target core is located. The doorbell information contains an interaction instruction together with the source core number and the target core number. The communication proxy module of that processor performs the corresponding action according to the instruction and the target core number and, where necessary, feeds the result back to the source core identified by the source core number.

In this embodiment, the interaction instructions are of two kinds, "query" and "data sent".

In this embodiment, the query instruction is sent by the source core to the communication proxy module of the processor where the target core is located. It tells the proxy module to access the circular message queue mapping table of the target core identified by the target core number in the doorbell information, obtain the address of the first free block in that queue, and return the address to the source core in an SRIO NWRITE transaction packet.

In this embodiment, the data sent instruction is sent by the source core to the communication proxy module of the processor where the target core is located. It indicates that the source core has finished sending and that the data is already in the circular message queue of the target core. The proxy module must then "route" the message to the target core, which it does by sending an IPC interrupt to the core identified by the target core number in the doorbell information.

Example 2

As shown in FIG. 2, in the multi-node arbitrary inter-core global communication method based on a domestic DSP, core-to-core communication within the signal processing module falls into two scenarios. In the first, the source core and the target core are located in the same processor and share the same memory space. In the second, the source core and the target core are located in different processors and can exchange data over SRIO.

The core-to-core data interaction differs between the two scenarios. In the first scenario, the source core and the target core are in the same processor and can exchange data directly through shared memory and inter-core IPC interrupts. In the second scenario, the source core and the target core cannot communicate directly. For this case each processor is organized as a master-slave core structure, and a communication proxy module is implemented on the main core of each processor to proxy and forward interaction messages from other processors. The source core reaches the target core by interacting with the communication proxy module of the processor where the target core is located. In this implementation, core 0 of each processor is the main core by default, and the communication proxy module is implemented in the SRIO doorbell interrupt service function. The doorbell is installed on the main core of each processor as an interrupt, and each doorbell carries 16 bits of information. To interact with the proxy module, the source core sends a doorbell interrupt carrying 16 bits of information to the main core of the processor where the target core is located; the proxy module parses the 16 bits, executes the instruction (such as a query or a routing request), and feeds the result back to the source core. Although the communication model still uses a master-slave core structure between processors, this structure is invisible to the user: from the user's point of view, all cores of all processors form a decentralized mesh in which any core can communicate with any other core in the same way.
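The choice between the two scenarios can be hidden behind a single lookup, which is what makes the master-slave structure invisible to the user. The sketch below assumes a global core numbering of `processor * 8 + local_core`; that numbering and the names `locate` and `plan_send` are illustrative assumptions, not taken from the text.

```python
CORES_PER_PROCESSOR = 8   # from the text: eight cores per DSP processor
MAIN_CORE = 0             # from the text: core 0 is the main core by default


def locate(global_core: int) -> tuple:
    """Map an assumed global core number to (processor, local core)."""
    return divmod(global_core, CORES_PER_PROCESSOR)


def plan_send(src: int, dst: int) -> tuple:
    """Return (data path, notification) for a transfer from src to dst."""
    src_proc, _ = locate(src)
    dst_proc, dst_core = locate(dst)
    if src_proc == dst_proc:
        # Scenario 1: write through shared memory, then interrupt the target core.
        return "shared_memory", ("ipc", dst_proc, dst_core)
    # Scenario 2: write over SRIO, then ring the doorbell on the remote main core.
    return "srio", ("doorbell", dst_proc, MAIN_CORE)


same_chip = plan_send(1, 6)     # both cores on processor 0
cross_chip = plan_send(1, 14)   # core 14 is core 6 of processor 1
```

A user-level send routine built on such a lookup would take only the two core numbers, so calling code looks the same in both scenarios.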

In these two scenarios, the data sending flow of the source core depends on where the target core is located, whereas the data receiving flow of the target core is identical in both.

Example 3

As shown in FIG. 3, in the multi-node arbitrary inter-core global communication method based on a domestic DSP, a source core and a target core located in the same processor can exchange data directly through shared memory, and the interaction messages do not need to be routed through the communication proxy module. The sending flow of the source core is as follows:

The source core accesses the target core's message queue mapping table directly through shared memory and reads the current usage state of the target core's message queue. From this it computes the memory address of the free block at the head of the target core's free queue, and it locks that block in the mapping table to mark it as in use. After locking the free block, the source core moves the user data to the block's address through EDMA. Once the data has been moved, the source core sends an IPC interrupt to the target core to notify it to read the data. This completes one data "transmission".
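The same-processor send flow can be simulated on a host as below. This is a sketch under assumptions: a `bytearray` stands in for the shared DDR region, a slice copy stands in for the EDMA transfer, a callback stands in for the IPC interrupt, and the `threading.Lock` around the bitmap update is an added assumption (the text does not say how concurrent source cores are kept from locking the same block). The small block size is chosen only for the demonstration.

```python
import threading

NUM_BLOCKS = 32
BLOCK_SIZE = 64   # assumption: small value for the demonstration


class TargetQueue:
    """Simulated message queue of one target core in shared memory."""

    def __init__(self):
        self.memory = bytearray(NUM_BLOCKS * BLOCK_SIZE)
        self.bitmap = 0
        self.guard = threading.Lock()   # assumption: some mutual exclusion exists


def send_local(queue: TargetQueue, payload: bytes, raise_ipc) -> int:
    """Lock a free block, copy the payload in, then notify the target core."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    with queue.guard:
        index = next(
            (i for i in range(NUM_BLOCKS) if not (queue.bitmap >> i) & 1), None
        )
        if index is None:
            raise BufferError("target queue is full")
        queue.bitmap |= 1 << index                         # lock the block
    offset = index * BLOCK_SIZE
    queue.memory[offset:offset + len(payload)] = payload   # stands in for EDMA
    raise_ipc()                                            # stands in for the IPC interrupt
    return index


queue = TargetQueue()
notifications = []
first = send_local(queue, b"abc", lambda: notifications.append("ipc"))
second = send_local(queue, b"de", lambda: notifications.append("ipc"))
```

The interrupt is raised only after the copy completes, so the target core never sees a locked block whose data is still in flight.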

Example 4

As shown in FIG. 4, in the multi-node arbitrary inter-core global communication method based on a domestic DSP, when the source core and the target core are in different processors, the source core cannot communicate with the target core directly and therefore cannot read the usage state of the target core's message queue. Instead, it communicates with the target core indirectly by interacting with the communication proxy module of the target processor. The sending flow of the source core is as follows:

The source core sends a query instruction, carrying the source core number and the target core number, to the communication proxy module of the processor where the target core is located. Using the target core number, the proxy module accesses the target core's message queue mapping table through shared memory, reads the current usage state of the queue, obtains the free block at the head of the target core's free queue, and locks that block in the mapping table to mark it as in use. After locking the block, the proxy module sends the information about the locked free block to the source core in an SRIO NWRITE transaction packet. The source core parses this information to obtain the address of the free block in the target processor's DDR RAM, and then sends the user data to that address in SRIO NWRITE transaction packets. When the data has been sent, the source core sends a data sent instruction to the communication proxy module of the processor where the target core is located. On receiving this instruction, the proxy module sends an IPC interrupt to the target core to notify it to read the data. This completes one data "transmission".
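The cross-processor handshake can be traced step by step with a small simulation. Everything here is an assumption made for illustration: the `TargetProcessor` class, its method names, the address constants, and the use of direct method calls in place of doorbells and SRIO NWRITE packets. A single target core is modelled to keep the sequence readable.

```python
NUM_BLOCKS = 32
BLOCK_SIZE = 64       # assumption
QUEUE_BASE = 0x1000   # assumption: offset of the target core's queue in DDR


class TargetProcessor:
    """Simulated remote DSP: DDR memory, one queue bitmap, and the proxy."""

    def __init__(self):
        self.ddr = bytearray(QUEUE_BASE + NUM_BLOCKS * BLOCK_SIZE)
        self.bitmap = 0
        self.log = []

    def doorbell_query(self):
        """Proxy: lock the first free block; the return value models the NWRITE reply."""
        index = next(
            (i for i in range(NUM_BLOCKS) if not (self.bitmap >> i) & 1), None
        )
        if index is None:
            return None   # assumption: full-queue behaviour is not specified
        self.bitmap |= 1 << index
        self.log.append("proxy: locked block %d" % index)
        return QUEUE_BASE + index * BLOCK_SIZE

    def nwrite(self, address, payload):
        """Models SRIO NWRITE packets landing in the target processor's DDR."""
        self.ddr[address:address + len(payload)] = payload
        self.log.append("srio: wrote %d bytes" % len(payload))

    def doorbell_data_sent(self):
        """Proxy: route the message by interrupting the target core."""
        self.log.append("proxy: IPC interrupt to target core")


def send_remote(target: TargetProcessor, payload: bytes) -> bool:
    """Source-core side of the flow: query, write, then signal completion."""
    address = target.doorbell_query()   # step 1: query instruction
    if address is None:
        return False
    target.nwrite(address, payload)     # step 2: user data over SRIO
    target.doorbell_data_sent()         # step 3: data sent instruction
    return True


remote = TargetProcessor()
ok = send_remote(remote, b"hello")
```

After the call, `remote.log` lists the three steps in order, which mirrors the sequence described for FIG. 4.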

Example 5

This embodiment includes all the technical features of the third and fourth embodiments. In addition, the data receiving flow of the target core is the same regardless of whether the source core and the target core are in the same processor; only the source of the IPC interrupt differs. In the former case the IPC interrupt comes directly from the source core, and in the latter case it comes from the communication proxy module of the target core's own processor. The receiving flow of the target core is as follows:

Until an IPC interrupt from the source core or the communication proxy module arrives, the data receiving interface of the target core blocks on a semaphore that is released by the IPC interrupt service function. An IPC interrupt from the source core or the proxy module indicates that data has arrived in the target core's message queue. The IPC interrupt service function runs and releases the semaphore, which unblocks the data receiving interface. The data receiving interface then queries the message queue mapping table, finds the memory block holding the current data, moves the data from that block to the user's receive buffer address through EDMA, unlocks the block in the mapping table, and returns it to the message queue. This completes one data reception.
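The receive flow maps naturally onto a counting semaphore, as in the host-side sketch below. Several details are assumptions: `threading.Semaphore` stands in for the DSP's semaphore primitive, a slice copy stands in for EDMA, the message length is passed by the caller, and the receiver is assumed to track the next block to read with a `tail` index that advances circularly (the text only says that the interface queries the mapping table to find the block).

```python
import threading

NUM_BLOCKS = 32
BLOCK_SIZE = 64   # assumption


class Receiver:
    """Simulated target core: message queue, mapping table, and semaphore."""

    def __init__(self):
        self.memory = bytearray(NUM_BLOCKS * BLOCK_SIZE)
        self.bitmap = 0
        self.tail = 0                                # assumption: next block to read
        self.data_ready = threading.Semaphore(0)     # starts blocked

    def ipc_isr(self):
        """IPC interrupt service function: release the semaphore."""
        self.data_ready.release()

    def receive(self, length: int, timeout=None) -> bytes:
        """Data receiving interface: block until data arrives, then drain one block."""
        if not self.data_ready.acquire(timeout=timeout):
            raise TimeoutError("no IPC interrupt received")
        index = self.tail
        if not (self.bitmap >> index) & 1:
            raise RuntimeError("mapping table shows no data in the expected block")
        offset = index * BLOCK_SIZE
        data = bytes(self.memory[offset:offset + length])   # stands in for EDMA
        self.bitmap &= ~(1 << index)                        # unlock the block
        self.tail = (index + 1) % NUM_BLOCKS
        return data


core = Receiver()
core.memory[0:2] = b"hi"   # a sender has written into block 0 ...
core.bitmap = 0b1          # ... and locked it in the mapping table
core.ipc_isr()             # the IPC interrupt arrives
message = core.receive(2, timeout=1)
```

Because the semaphore counts releases, interrupts that arrive while the receiver is still busy are not lost; each later call to `receive` consumes one of them.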

Working principle: a message is first sent to the communication proxy module of the target processor, which then either processes the message or routes it to the target core, so that any two cores in the global system can exchange messages. In addition, the target core receives data from every core in the global system through a circular queue.

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution. The specification is written this way for clarity only. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that will be understood by those skilled in the art.

Claims

1. A multi-node arbitrary inter-core global communication method based on a domestic DSP, characterized by comprising integrated signal processing equipment;

the integrated signal processing equipment comprises a signal processing module composed of a plurality of DSP processors;

the DSP processors are interconnected through an SRIO switch to form a network switching system for data interaction;

each DSP processor consists of eight cores, the eight cores share a 2G DDR RAM memory, and the eight cores communicate with one another through IPC interrupts;

the multi-node arbitrary inter-core global communication method comprises the following steps:

S1) partitioning an exclusive memory space for each core of the multi-core DSP processor to form a circular message queue that buffers received data;

S2) installing a communication proxy module on the main core of the DSP processor, which processes and routes requests from cores outside its own processor node.

2. The multi-node arbitrary inter-core global communication method according to claim 1, wherein: in step S1, the circular message queue consists of 32 equal-sized memory blocks in the exclusive memory space and is managed through a mapping table, the mapping table is a 32-bit bitmap in which each bit represents one memory block of the circular message queue, and a bit set to 1 indicates that the corresponding memory block is in use.

3. The multi-node arbitrary inter-core global communication method according to claim 1, wherein: in step S2, the communication proxy module handles the processing and routing of interaction messages in inter-processor core-to-core communication, the communication proxy module is implemented in the doorbell interrupt service function, and a doorbell can be initiated by any core in the global system and carries 16 bits of doorbell information.

4. The multi-node arbitrary inter-core global communication method according to claim 2, wherein: in step S2, for communication across DSP processor nodes, the source core sends a doorbell interrupt carrying 16 bits of doorbell information to the DSP processor where the target core is located, the doorbell information contains an interaction instruction together with the source core number and the target core number, and the communication proxy module of the DSP processor where the target core is located performs the corresponding action according to the instruction and the target core number and, where necessary, feeds the result back to the source core identified by the source core number in the doorbell information.

5. The multi-node arbitrary inter-core global communication method according to claim 4, wherein: the interaction instructions are of two kinds, "query" and "data sent".

6. The multi-node arbitrary inter-core global communication method according to claim 5, wherein: the query instruction is sent by the source core to the communication proxy module of the processor where the target core is located, and tells the communication proxy module to access the circular message queue mapping table of the target core identified by the target core number carried in the doorbell information, obtain the address of the first free block in the target core's circular message queue, and return the free block address to the source core in an SRIO NWRITE transaction packet.

7. The multi-node arbitrary inter-core global communication method according to claim 5, wherein: the data sent instruction is sent by the source core to the communication proxy module of the processor where the target core is located, and indicates that the source core has finished sending and that the data is already in the circular message queue of the target core; the communication proxy module routes the message to the target core by sending an IPC interrupt to the target core identified by the target core number carried in the doorbell information.