CN116132375A - Multi-node arbitrary inter-core global communication method based on domestic DSP - Google Patents


Info

Publication number
CN116132375A
Authority
CN
China
Prior art keywords
core
processor
proxy module
inter
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211571806.7A
Other languages
Chinese (zh)
Inventor
侯旋
曾令将
杨进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CSIC (WUHAN) LINCOM ELECTRONICS CO LTD
Original Assignee
CSIC (WUHAN) LINCOM ELECTRONICS CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CSIC (WUHAN) LINCOM ELECTRONICS CO LTD
Priority to CN202211571806.7A
Publication of CN116132375A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/62 Queue scheduling characterised by scheduling criteria
    • H04L47/622 Queue service order
    • H04L47/6225 Fixed service order, e.g. Round Robin
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L45/00 Routing or path finding of packets in data switching networks
    • H04L45/02 Topology update or discovery
    • H04L45/04 Interdomain routing, e.g. hierarchical routing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9005 Buffering arrangements using dynamic buffer space allocation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00 Packet switching elements
    • H04L49/90 Buffering arrangements
    • H04L49/9063 Intermediate storage in different physical parts of a node or terminal
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a multi-node arbitrary inter-core global communication method based on a domestic DSP. The method comprises: dividing an exclusive memory space for each core of a multi-core DSP processor to form a circular message queue for caching received data; and installing a communication proxy module in the main core of the DSP processor, the communication proxy module processing and routing requests from cores outside the processor node. The invention realizes data intercommunication between any two cores in the system and, to a certain extent, alleviates the transmission delay caused by network congestion in high-throughput, high-concurrency scenarios.

Description

Multi-node arbitrary inter-core global communication method based on domestic DSP
Technical Field
The invention relates to a global communication method, in particular to a multi-node arbitrary inter-core global communication method based on a domestic DSP, and belongs to the technical field of computer communication.
Background
In the field of signal processing, with the development of information technology and the continuous improvement of signal acquisition precision, information complexity keeps increasing, and so does the demand for computing power in signal processing equipment.
However, because current chip processes are approaching their theoretical limits, single-core performance is improving only slowly. Raising the clock frequency of a single-core processor does improve performance, but the gain is often outweighed by the accompanying increase in power consumption. Single-core devices can no longer meet the computing-power and power-consumption requirements of signal processing scenarios, and large-scale integrated signal processing equipment built from multi-core multiprocessors has become a research focus in the field.
In addition, while multi-core multiprocessors bring more computing power, efficient inter-core data communication becomes both a key point and a difficulty in improving signal processing efficiency. In the prior art, most inter-core communication schemes are shared-memory schemes that work inside a single multi-core processor. Such schemes only cover data communication between cores within one multi-core processor; for large-scale integrated signal processing equipment composed of multi-core multiprocessors, there is currently no effective solution for core-to-core data communication across processor nodes.
Disclosure of Invention
The invention aims to solve at least one of the above technical problems and provides a multi-node arbitrary inter-core global communication method based on a domestic DSP.
The invention achieves the above purpose through the following technical scheme: a multi-node arbitrary inter-core global communication method based on a domestic DSP is applied to integrated signal processing equipment, wherein the integrated signal processing equipment comprises a signal processing module composed of a plurality of DSP processors; the DSP processors are interconnected through an SRIO switch to form a network switching system for data interaction; each DSP processor consists of eight cores, the eight cores share a 2G DDR RAM memory, and the eight cores communicate with one another through IPC interrupts;
the multi-node arbitrary inter-core global communication method comprises the following steps:
S1) dividing an exclusive memory space for each core of the multi-core DSP processor to form a circular message queue for caching received data;
S2) installing a communication proxy module in the main core of the DSP processor, the communication proxy module processing and routing requests from cores outside the present processor node.
As a further aspect of the invention: the circular message queue consists of 32 memory blocks of equal size in the exclusive memory space and is maintained and managed by a mapping table; the mapping table is a 32-bit bitmap in which each bit represents one memory block of the circular message queue, and a bit set to 1 indicates that the corresponding memory block is in use.
As a further aspect of the invention: the communication proxy module handles the processing and routing of interaction information in inter-processor, inter-core communication; the communication proxy module is implemented in a doorbell interrupt service function, and a doorbell can be initiated by any core in the system and carries 16 bits of doorbell information.
As a further aspect of the invention: for communication across DSP processor nodes, the source core sends a doorbell interrupt carrying 16 bits of doorbell information to the DSP processor where the target core is located; the 16-bit doorbell information contains an interaction instruction, the source core number, and the target core number; the communication proxy module of the DSP processor where the target core is located performs the corresponding action according to the instruction and the target core number and, when necessary, feeds the result back to the source core according to the source core number in the doorbell information.
As a further aspect of the invention: the interaction instructions are divided into "query" and "data delivery".
As a further aspect of the invention: the query instruction is sent by the source core to the communication proxy module of the processor where the target core is located; it indicates that the communication proxy module must access the circular message queue mapping table of the target core according to the target core number carried in the doorbell information, obtain the address of the first idle block of the target core's circular message queue, and feed the idle block address back to the source core through an SRIO NWRITE transaction packet.
As a further aspect of the invention: the data delivery instruction is sent by the source core to the communication proxy module of the processor where the target core is located; it indicates that the source core has finished sending data and that the data is already in the circular message queue of the target core; the communication proxy module must then "route" the message to the target core, which it does by sending an IPC interrupt to the target core according to the target core number carried in the doorbell information.
The beneficial effects of the invention are as follows:
On the one hand, for core-to-core data communication across processor nodes, a message is first sent to the communication proxy module of the target processor, which then either processes the message itself or routes it to the target core, so that messages can be exchanged between any cores in the system;
on the other hand, the target core receives data from every core in the system through a circular queue. Compared with inter-core communication through a single shared memory region, this effectively reduces the transmission delay caused by network congestion during communication and greatly improves the transmission efficiency of inter-core data in high-throughput, high-concurrency scenarios.
Drawings
FIG. 1 is a block diagram of a signal processing module according to the present invention;
FIG. 2 is a diagram of an inter-core communication scheme in accordance with the present invention;
FIG. 3 is a sequence diagram of intra-processor core-to-core communication according to the present invention;
FIG. 4 is a sequence diagram illustrating inter-processor core-to-core communications according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in FIG. 1 to FIG. 4, a multi-node arbitrary inter-core global communication method based on a domestic DSP is applied to integrated signal processing equipment. The equipment includes a signal processing module composed of a plurality of DSP processors. The DSP processors are interconnected through an SRIO switch to form a network switching system and exchange data over SRIO. Each DSP processor consists of eight cores that share a 2G DDR RAM memory and communicate with one another through IPC interrupts;
the multi-node arbitrary inter-core global communication method comprises the following steps:
S1) dividing an exclusive memory space for each core of the multi-core DSP processor to form a circular message queue for caching received data;
S2) installing a communication proxy module in the main core of the DSP processor, the communication proxy module processing and routing requests from cores outside the present processor node.
In the embodiment of the invention, the circular message queue consists of 32 memory blocks of equal size in the exclusive memory space and is maintained and managed by a mapping table. The mapping table is a 32-bit bitmap in which each bit represents one memory block of the circular message queue, and a bit set to 1 indicates that the corresponding memory block is in use.
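The bitmap bookkeeping can be illustrated with a minimal Python sketch. Only the 32-block count and the "bit set to 1 means in use" convention come from the description; the block size, the base address, the function names, and the circular scan from a start index are assumptions made for illustration.

```python
NUM_BLOCKS = 32            # fixed by the description
BLOCK_SIZE = 4096          # assumed block size in bytes (not given in the source)
QUEUE_BASE = 0x9000_0000   # assumed base address of one core's exclusive region


def next_free_block(bitmap: int, start: int = 0) -> int:
    """Scan circularly from `start`; return the first clear bit, or -1 if full."""
    for offset in range(NUM_BLOCKS):
        idx = (start + offset) % NUM_BLOCKS
        if not (bitmap >> idx) & 1:
            return idx
    return -1


def lock_block(bitmap: int, idx: int) -> int:
    """Mark block `idx` as in use (bit set to 1)."""
    return bitmap | (1 << idx)


def release_block(bitmap: int, idx: int) -> int:
    """Mark block `idx` as idle again (bit cleared)."""
    return bitmap & ~(1 << idx) & 0xFFFFFFFF


def block_address(idx: int) -> int:
    """Address of block `idx`, given equally sized, contiguous blocks."""
    return QUEUE_BASE + idx * BLOCK_SIZE
```

A sender would call `next_free_block`, then `lock_block`, and write its payload at `block_address(idx)`; the receiver calls `release_block` once the payload has been copied out.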
In the embodiment of the invention, the communication proxy module handles the processing and routing of interaction information in inter-processor, inter-core communication. The communication proxy module is implemented in the doorbell interrupt service function; a doorbell can be initiated by any core in the system and carries 16 bits of doorbell information.
In the embodiment of the invention, for communication across DSP processor nodes, the source core sends a doorbell interrupt carrying 16 bits of doorbell information to the DSP processor where the target core is located. The 16-bit doorbell information contains an interaction instruction, the source core number, and the target core number. The communication proxy module of the DSP processor where the target core is located performs the corresponding action according to the instruction and the target core number and, if necessary, feeds the result back to the source core according to the source core number in the doorbell information.
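The description states only that the 16-bit doorbell information carries an interaction instruction plus the source and target core numbers; it does not give a bit layout. The sketch below therefore uses a purely hypothetical layout (4-bit opcode, 6-bit source node, 3-bit source core, 3-bit target core) and hypothetical opcode values to show how such a word could be packed and unpacked.

```python
from typing import NamedTuple

# Hypothetical opcode values; the source names the instructions but not their codes.
OP_QUERY = 0x1
OP_DATA_DELIVERY = 0x2


class Doorbell(NamedTuple):
    opcode: int    # bits 15..12 (assumed)
    src_node: int  # bits 11..6  (assumed: index of the source processor)
    src_core: int  # bits 5..3   (eight cores per processor)
    dst_core: int  # bits 2..0   (eight cores per processor)


def encode_doorbell(db: Doorbell) -> int:
    """Pack the fields into one 16-bit doorbell info word."""
    if not (0 <= db.opcode < 16 and 0 <= db.src_node < 64
            and 0 <= db.src_core < 8 and 0 <= db.dst_core < 8):
        raise ValueError("field out of range for the assumed layout")
    return (db.opcode << 12) | (db.src_node << 6) | (db.src_core << 3) | db.dst_core


def decode_doorbell(info: int) -> Doorbell:
    """Unpack a 16-bit doorbell info word into its fields."""
    return Doorbell((info >> 12) & 0xF, (info >> 6) & 0x3F,
                    (info >> 3) & 0x7, info & 0x7)
```

Three bits per core number follow directly from the eight cores per processor; the widths of the other fields are a design choice that would depend on how many processors the system has and how many instructions are needed.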
In the embodiment of the invention, the interaction instructions are divided into "query" and "data delivery".
In the embodiment of the invention, the query instruction is sent by the source core to the communication proxy module of the processor where the target core is located. It indicates that the communication proxy module must access the circular message queue mapping table of the target core according to the target core number carried in the doorbell information, obtain the address of the first idle block of the target core's circular message queue, and feed the idle block address back to the source core through an SRIO NWRITE transaction packet.
In the embodiment of the invention, the data delivery instruction is sent by the source core to the communication proxy module of the processor where the target core is located. It indicates that the source core has finished sending data and that the data is already in the circular message queue of the target core. The communication proxy module must then "route" the message to the target core, which it does by sending an IPC interrupt to the target core according to the target core number carried in the doorbell information.
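The two instructions amount to a small dispatch routine inside the doorbell interrupt service function. The following Python sketch models that dispatch with callbacks standing in for the SRIO NWRITE reply and the IPC interrupt. The class name, opcode values, address arithmetic, and the `None` reply for a full queue are all assumptions; the source does not say what happens when no idle block exists.

```python
NUM_CORES = 8
NUM_BLOCKS = 32
OP_QUERY, OP_DATA_DELIVERY = 1, 2   # assumed opcode values


class CommProxy:
    """Hypothetical model of the proxy running on a processor's main core."""

    def __init__(self, send_nwrite, send_ipc,
                 queue_base=0x9000_0000, block_size=4096):
        self.bitmaps = [0] * NUM_CORES   # one 32-bit mapping table per core
        self.send_nwrite = send_nwrite   # stand-in for an SRIO NWRITE reply
        self.send_ipc = send_ipc         # stand-in for raising an IPC interrupt
        self.queue_base = queue_base     # assumed memory layout
        self.block_size = block_size

    def block_address(self, core, idx):
        return self.queue_base + (core * NUM_BLOCKS + idx) * self.block_size

    def on_doorbell(self, opcode, src, dst_core):
        """Body of the doorbell interrupt service function."""
        if opcode == OP_QUERY:
            bitmap = self.bitmaps[dst_core]
            for idx in range(NUM_BLOCKS):
                if not (bitmap >> idx) & 1:
                    # Lock the idle block, then report its address to the source.
                    self.bitmaps[dst_core] = bitmap | (1 << idx)
                    self.send_nwrite(src, self.block_address(dst_core, idx))
                    return
            self.send_nwrite(src, None)  # assumed way to signal "queue full"
        elif opcode == OP_DATA_DELIVERY:
            self.send_ipc(dst_core)      # "route" the message to the target core
        else:
            raise ValueError(f"unknown doorbell opcode {opcode}")
```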
Example 2
As shown in FIG. 2, the multi-node arbitrary inter-core global communication method based on the domestic DSP further covers two scenarios for core-to-core communication within the signal processing module. In the first, the source core and the target core are located in the same processor and share the same memory space. In the second, the source core and the target core are located in different processors and exchange data through SRIO.
The core-to-core data interaction approach differs between the two scenarios. In the first scenario, the source core and the target core are in the same processor, and data can be exchanged directly through shared memory and inter-core IPC interrupts. In the second scenario, the source core and the target core cannot communicate directly. For this case each processor is organized as a master-slave core structure, and a communication proxy module is implemented on the main core of each processor to proxy and forward interaction information from other processors. The source core interacts with the communication proxy module of the processor where the target core is located and thereby delivers data to the target core. In this implementation, core 0 of each processor is the main core by default, and the communication proxy module is implemented in the SRIO doorbell interrupt service function. The doorbell is installed on the main core of each processor as an interrupt and can carry 16 bits of information each time. To interact with the communication proxy module, the source core sends a doorbell interrupt carrying 16 bits of information to the main core of the processor where the target core is located; the communication proxy module parses the 16 bits, executes instructions such as query and routing, and feeds the result back to the source core. Although the inter-processor communication model still uses a master-slave core arrangement, this structure is invisible to the user: from the user's point of view, all cores of all processors form a decentralized mesh in which any core can communicate with any other core in the same way.
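The choice between the two scenarios reduces to comparing the processor (node) of the source and target cores. A minimal Python sketch of that decision follows; the `CoreId` type, the function name, and the path labels are hypothetical, while core 0 as the main core follows the default stated in this example.

```python
from typing import NamedTuple

MAIN_CORE = 0  # per the description, core 0 of each processor is the main core


class CoreId(NamedTuple):
    node: int  # processor (node) index
    core: int  # core index within the processor, 0..7


def select_path(src: CoreId, dst: CoreId):
    """Return (mechanism, first recipient) for a message from src to dst."""
    if src.node == dst.node:
        # Scenario 1: same processor, shared memory plus inter-core IPC interrupt.
        return ("shared-memory+ipc", dst)
    # Scenario 2: different processors, talk to the target node's proxy first.
    return ("srio-doorbell-proxy", CoreId(dst.node, MAIN_CORE))
```

Hiding this branch behind a single send interface is what gives the user the uniform, mesh-like view described above.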
In both scenarios the data sending flow of the source core depends on where the target core is located, whereas the data receiving flow of the target core is exactly the same.
Example 3
As shown in FIG. 3, in the multi-node arbitrary inter-core global communication method based on the domestic DSP, when the source core and the target core are located in the same processor, they can exchange data directly through shared memory, and the interaction information does not need to be routed through the communication proxy module. The flow by which the source core sends data is specifically:
The source core accesses the message queue mapping table of the target core directly through shared memory and obtains the current usage of the target core's message queue. From this it calculates the memory address of the idle block at the head of the target core's idle queue and locks that block in the mapping table to mark it as in use. After locking the idle block, the source core moves the user data to the idle block address through EDMA. After the data has been moved, the source core sends an IPC interrupt to the target core to notify it to read the data. This completes one data "send".
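The same-processor send flow (inspect the mapping table, lock an idle block, copy, raise an IPC interrupt) can be sketched in Python as follows. Shared memory is modeled as a `bytearray`, the EDMA transfer as a slice assignment, and the IPC interrupt as a callback; these stand-ins, the block size, and the names are assumptions. The source does not describe how two senders are prevented from locking the same block concurrently, so the sketch ignores that case.

```python
NUM_BLOCKS = 32
BLOCK_SIZE = 64  # deliberately small, for illustration only


class CoreQueue:
    """Hypothetical model of one core's mapping table and exclusive memory."""

    def __init__(self):
        self.bitmap = 0
        self.memory = bytearray(NUM_BLOCKS * BLOCK_SIZE)


def send_local(queue: CoreQueue, payload: bytes, raise_ipc) -> int:
    """Send `payload` to a core in the same processor; return the block index or -1."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    # 1. Read the target's mapping table and find an idle block.
    for idx in range(NUM_BLOCKS):
        if not (queue.bitmap >> idx) & 1:
            break
    else:
        return -1  # queue full; handling of this case is not described in the source
    # 2. Lock the block in the mapping table.
    queue.bitmap |= 1 << idx
    # 3. Move the user data into the block (stand-in for the EDMA transfer).
    start = idx * BLOCK_SIZE
    queue.memory[start:start + len(payload)] = payload
    # 4. Notify the target core (stand-in for the IPC interrupt).
    raise_ipc()
    return idx
```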
Example 4
As shown in FIG. 4, in the multi-node arbitrary inter-core global communication method based on the domestic DSP, when the source core and the target core are in different processors, the source core cannot communicate with the target core directly and therefore cannot read the usage of the target core's message queue. Instead, it communicates with the target core indirectly by interacting with the communication proxy module of the target processor. The flow by which the source core sends data is specifically:
The source core sends a query instruction to the communication proxy module of the processor where the target core is located; the instruction carries the source core number and the target core number. According to the target core number, the communication proxy module accesses the message queue mapping table of the target core through shared memory, obtains the current usage of the target core's message queue, identifies the idle block at the head of the target core's idle queue, and locks that block in the mapping table to mark it as in use. After the idle block is locked, the communication proxy module sends the information about the locked idle block to the source core through an SRIO NWRITE transaction packet. The source core parses this information, obtains the address of the idle block in the target's DDR RAM memory, and then sends the user data to that address through an SRIO NWRITE transaction packet. After the data has been sent, the source core sends a data delivery instruction to the communication proxy module of the processor where the target core is located. On receiving this instruction, the communication proxy module sends an IPC interrupt to the target core to notify it to read the data. This completes one data "send".
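Seen from the source core, the cross-processor flow is a fixed sequence of SRIO transactions. The Python sketch below simulates that sequence against a toy target node and records the order of events. The class, the method names, the memory layout, and the "abort" step for a full queue are assumptions; real doorbells and NWRITE packets are replaced by direct method calls.

```python
NUM_CORES = 8
NUM_BLOCKS = 32
BLOCK_SIZE = 64  # deliberately small, for illustration only


class TargetNode:
    """Hypothetical stand-in for the remote processor and its proxy."""

    def __init__(self):
        self.bitmaps = [0] * NUM_CORES
        self.ddr = bytearray(NUM_CORES * NUM_BLOCKS * BLOCK_SIZE)
        self.ipc_log = []

    def proxy_query(self, dst_core):
        """Proxy side of "query": lock an idle block and return its offset."""
        for idx in range(NUM_BLOCKS):
            if not (self.bitmaps[dst_core] >> idx) & 1:
                self.bitmaps[dst_core] |= 1 << idx
                return (dst_core * NUM_BLOCKS + idx) * BLOCK_SIZE
        return None

    def nwrite(self, offset, data):
        """Stand-in for an SRIO NWRITE into the target's DDR."""
        self.ddr[offset:offset + len(data)] = data

    def proxy_data_delivery(self, dst_core):
        """Proxy side of "data delivery": raise an IPC interrupt on the target."""
        self.ipc_log.append(dst_core)


def send_remote(node: TargetNode, dst_core: int, payload: bytes):
    """Run the source-side flow and return the ordered list of steps taken."""
    if len(payload) > BLOCK_SIZE:
        raise ValueError("payload larger than one block")
    trace = ["doorbell:query"]
    offset = node.proxy_query(dst_core)
    trace.append("nwrite:idle-block-reply")
    if offset is None:
        return trace + ["abort:queue-full"]  # assumed; not described in the source
    node.nwrite(offset, payload)
    trace.append("nwrite:payload")
    node.proxy_data_delivery(dst_core)
    trace += ["doorbell:data-delivery", "ipc:target-core"]
    return trace
```

The trace makes the cost of the scheme visible: one remote send takes two doorbells and two NWRITE transactions before the target core is interrupted.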
Example 5
In this embodiment, in addition to all the technical features of Examples 3 and 4, the data receiving flow of the target core is the same whether or not the source core and the target core are in the same processor; only the source of the IPC interrupt being answered differs. In the former case the IPC interrupt comes directly from the source core; in the latter it comes from the communication proxy module of the target core's processor. The flow by which the target core receives data is specifically:
Until an IPC interrupt from the source core or the communication proxy module arrives, the data receiving interface of the target core blocks on a semaphore that is released by the IPC interrupt service function. An IPC interrupt from the source core or the communication proxy module indicates that data has arrived in the core's message queue. The IPC interrupt service function then runs and releases the semaphore, unblocking the data receiving interface. The data receiving interface queries the message queue mapping table, finds the memory block holding the current data, moves the data from that block to the user's receive address through EDMA, unlocks the block in the mapping table, and releases it back to the message queue. This completes one data reception.
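The receive path can be sketched with a counting semaphore: the interrupt service function releases it, and the receive interface acquires it before touching the queue. In the Python sketch below, `threading.Semaphore` stands in for the DSP's semaphore primitive and a slice copy for the EDMA transfer. The description does not say how the receiver identifies which used block holds the current data, so the circular read cursor used here is an assumption, as are all names and the block size.

```python
import threading

NUM_BLOCKS = 32
BLOCK_SIZE = 64  # deliberately small, for illustration only


class Receiver:
    """Hypothetical model of the target core's receive interface."""

    def __init__(self):
        self.bitmap = 0
        self.memory = bytearray(NUM_BLOCKS * BLOCK_SIZE)
        self.read_cursor = 0               # assumed: next block to inspect
        self.sem = threading.Semaphore(0)  # starts empty, so receive() blocks

    def ipc_isr(self):
        """IPC interrupt service function: data has arrived, wake the receiver."""
        self.sem.release()

    def receive(self, length: int) -> bytes:
        """Block until notified, copy one message out, and free its block."""
        self.sem.acquire()
        # Find the next used block, scanning circularly from the read cursor.
        for offset in range(NUM_BLOCKS):
            idx = (self.read_cursor + offset) % NUM_BLOCKS
            if (self.bitmap >> idx) & 1:
                break
        else:
            raise RuntimeError("IPC received but no block is marked as used")
        start = idx * BLOCK_SIZE
        data = bytes(self.memory[start:start + length])  # stand-in for EDMA copy
        self.bitmap &= ~(1 << idx)                       # unlock the block
        self.read_cursor = (idx + 1) % NUM_BLOCKS
        return data
```

Because the semaphore counts, several interrupts that arrive before the receiver runs are not lost; each later `receive` call consumes one of them.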
Working principle: a message is first sent to the communication proxy module of the target processor, which then either processes the message itself or routes it to the target core, so that messages can be exchanged between any cores in the system. In addition, the target core receives data from every core in the system through a circular queue.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is organized by embodiments, not every embodiment contains only a single independent technical solution. This manner of description is adopted for clarity only. Those skilled in the art should treat the specification as a whole, and the technical solutions of the embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims (7)

1. A multi-node arbitrary inter-core global communication method based on a domestic DSP, characterized in that it is applied to integrated signal processing equipment;
the integrated signal processing equipment comprises a signal processing module consisting of a plurality of DSP processors;
the DSP processors are interconnected through an SRIO switch and form a network switching system for data interaction;
each DSP processor consists of eight cores, the eight cores share a 2G DDR RAM memory, and the eight cores communicate with one another through IPC interrupts;
the multi-node arbitrary inter-core global communication method comprises the following steps:
S1) dividing an exclusive memory space for each core of the multi-core DSP processor to form a circular message queue for caching received data;
S2) installing a communication proxy module in the main core of the DSP processor, the communication proxy module processing and routing requests from cores outside the present processor node.
2. The multi-node arbitrary inter-core global communication method according to claim 1, wherein: in step S1, the circular message queue consists of 32 memory blocks of equal size in the exclusive memory space and is maintained and managed by a mapping table, the mapping table being a 32-bit bitmap in which each bit represents one memory block of the circular message queue and a bit set to 1 indicates that the corresponding memory block is in use.
3. The multi-node arbitrary inter-core global communication method according to claim 1, wherein: in step S2, the communication proxy module handles the processing and routing of interaction information in inter-processor, inter-core communication, the communication proxy module is implemented in a doorbell interrupt service function, and a doorbell can be initiated by any core in the system and carries 16 bits of doorbell information.
4. The multi-node arbitrary inter-core global communication method according to claim 2, wherein: in step S2, for communication across DSP processor nodes, the source core sends a doorbell interrupt carrying 16 bits of doorbell information to the DSP processor where the target core is located, the 16-bit doorbell information contains an interaction instruction, the source core number, and the target core number, and the communication proxy module of the DSP processor where the target core is located performs the corresponding action according to the instruction and the target core number and, if necessary, feeds the result back to the source core according to the source core number in the doorbell information.
5. The multi-node arbitrary inter-core global communication method according to claim 4, wherein: the interaction instructions are divided into "query" and "data delivery".
6. The multi-node arbitrary inter-core global communication method according to claim 5, wherein: the query instruction is sent by the source core to the communication proxy module of the processor where the target core is located, and the communication proxy module accesses the circular message queue mapping table of the target core according to the target core number carried in the doorbell information, obtains the address of the first idle block of the target core's circular message queue, and feeds the idle block address back to the source core through an SRIO NWRITE transaction packet.
7. The multi-node arbitrary inter-core global communication method according to claim 5, wherein: the data delivery instruction is sent by the source core to the communication proxy module of the processor where the target core is located and indicates that the source core has finished sending data and that the data is already in the circular message queue of the target core; the communication proxy module must route the message to the target core and does so by sending an IPC interrupt to the target core according to the target core number carried in the doorbell information.
CN202211571806.7A 2022-12-08 2022-12-08 Multi-node arbitrary inter-core global communication method based on domestic DSP Pending CN116132375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211571806.7A CN116132375A (en) 2022-12-08 2022-12-08 Multi-node arbitrary inter-core global communication method based on domestic DSP

Publications (1)

Publication Number Publication Date
CN116132375A true CN116132375A (en) 2023-05-16

Family

ID=86296431

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211571806.7A Pending CN116132375A (en) 2022-12-08 2022-12-08 Multi-node arbitrary inter-core global communication method based on domestic DSP

Country Status (1)

Country Link
CN (1) CN116132375A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination