CN112416702B - Safety isolation system for hybrid operation of multiple safety level tasks


Info

Publication number
CN112416702B
Authority
CN
China
Prior art keywords
processor core
processor
core
memory
pcie
Legal status
Active
Application number
CN202011227314.7A
Other languages
Chinese (zh)
Other versions
CN112416702A (en)
Inventor
段小虎
程俊强
段宇博
马小博
田征戈
苏鹏涛
Current Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Original Assignee
Xian Aeronautics Computing Technique Research Institute of AVIC
Application filed by Xian Aeronautics Computing Technique Research Institute of AVIC
Priority to CN202011227314.7A
Publication of CN112416702A
Application granted
Publication of CN112416702B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3013 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is an embedded system, i.e. a combination of hardware and software dedicated to perform a certain function in mobile devices, printers, automotive or aircraft systems
    • G06F11/3051 Monitoring arrangements for monitoring the configuration of the computing system or of the computing system component, e.g. monitoring the presence of processing resources, peripherals, I/O links, software programs
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G06F11/3058 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations
    • G06F11/3062 Monitoring arrangements for monitoring environmental properties or parameters of the computing system or of the computing system component, e.g. monitoring of power, currents, temperature, humidity, position, vibrations where the monitored property is the power consumption
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Hardware Redundancy (AREA)

Abstract

The application provides a safety isolation system for the mixed operation of tasks with multiple safety levels. The system comprises a multi-core processor, DDRx memory, a FLASH memory, an NvRAM memory, a PCIe switch chip, a TTE network end system and an FPGA. The multi-core processor is connected to the PCIe switch chip, to the FPGA, and to an X-group, a Y-group and a Z-group of DDRx memory; the PCIe switch chip is connected to the TTE network end system through a PCIe bus; the FPGA is connected to the FLASH memory and the NvRAM memory and contains a watchdog circuit; the TTE network end system provides a dual-redundant or triple-redundant TTE network interface to the outside of the safety isolation system. The multi-core processor comprises processor core A, processor core B and processor core C, and has a PCIe bus interface L, a PCIe bus interface M and a PCIe bus interface N, all three of which are connected to the PCIe switch chip.

Description

Safety isolation system for hybrid operation of multiple safety level tasks
Technical Field
The application relates to the field of embedded computing, and in particular to a safety isolation system for the mixed operation of tasks with multiple safety levels.
Background
In large-scale, complex embedded computing applications, several embedded computers of different types and characteristics are often used, each serving a different kind of computing task. In an aircraft, for example, flight-control computing tasks generally require high reliability and high safety and usually run on a computer with strong fault tolerance, whereas mission-payload (Mission Payload) computing tasks generally require high performance and high bandwidth and usually run on a computer with strong computing performance. Using several dissimilar embedded computers for different types of tasks in the same application makes it easy to meet the specific requirements of each task type, but the approach has inherent drawbacks. First, when the system contains many computers, its volume, weight and power consumption are hard to reduce. Second, computers with different characteristics often have different architectures and bus networks and run independently of one another, which creates a natural barrier to data fusion and cooperation between the computing tasks running on them. This barrier prevents the computing tasks in the same application from being tightly coupled and working together, which limits improvements in the overall function, performance and operating efficiency of the application.
To address this problem, large-scale complex embedded computing systems in many fields have in recent years pursued a higher level of integration: the function and performance of a single computer are continuously improved so that computing tasks that previously ran on several computers can be integrated into one. This improves data fusion and tight cooperation between computing tasks, reduces the overall volume, weight and power consumption of the embedded computing system, and thus improves the overall function, performance and operating efficiency of the application.
In complex embedded computing systems in many fields, computing tasks can be classified by criticality.
Safety-critical tasks: computing tasks related to system safety are usually called safety-critical (Safety-Critical) tasks. In an aircraft, the flight-control computing task is a typical safety-critical task. Safety-critical tasks generally have high reliability and safety requirements, often run continuously in the background of the system, and have strict real-time requirements. An error or loss of function in a safety-critical task may cause serious consequences such as loss of life or property, so the system must guarantee that safety-critical tasks run reliably.
Mission-critical tasks: computing tasks related to the mission payload (Mission Payload) carried by the system are usually called mission-critical (Mission-Critical) tasks. In an aircraft, tasks related to payloads such as radar, communication and navigation are typical mission-critical tasks. Mission-critical tasks are typically event-triggered. When trigger events are absent or rare, their computing and bandwidth demands are low; when trigger events are frequent or arrive in bursts, their computing and bandwidth demands rise sharply and may reach a peak. An error or loss of function in a mission-critical task prevents the corresponding payload from operating correctly but does not directly affect system safety. A computer running mission-critical tasks must handle the computing and bandwidth demands at peak load, so it generally needs high performance and high bandwidth, while its reliability and safety requirements are comparatively low.
Non-critical tasks: tasks other than safety-critical and mission-critical tasks, whose anomalies have no significant impact, are usually called non-critical (Non-Critical) tasks. Computing tasks for data logging, entertainment equipment and the like are typically non-critical. Non-critical tasks have the lowest reliability and safety requirements.
As the level of integration has risen, large-scale complex embedded computing systems in many fields can now integrate safety-critical tasks that used to run on separate computers into one high-reliability computer, and integrate mission-critical and non-critical tasks that used to run on separate computers into one high-performance computer. It remains difficult, however, to go one step further and combine the safety-critical tasks on the high-reliability computer with the mission-critical and non-critical tasks on the high-performance computer in a single computer that offers both high reliability and high performance. The reason is that when high-safety-level safety-critical tasks and lower-safety-level mission-critical and non-critical tasks run simultaneously on one computer and share its computing, storage and communication resources, it is hard to guarantee that the lower-level tasks cannot disturb the higher-level tasks, and therefore hard to guarantee system safety. For example, when a mission-critical task reaches its peak load, it may squeeze the communication bandwidth of a safety-critical task and disrupt its normal operation; and when a mission-critical task fails, it may operate a shared resource incorrectly, spreading the fault and causing the safety-critical task to malfunction.
Therefore, to further raise the integration level of complex embedded computing systems and run safety-critical, mission-critical and non-critical tasks, which have different safety levels and different requirements, on the same computer, a computer capable of running tasks of multiple safety levels in a mixed manner is needed, one that solves the problem of safety isolation between computing tasks of different safety levels.
Disclosure of Invention
To address the difficulty, described in the background, of further raising the integration level of large-scale complex embedded computing systems, the application provides a safety isolation system for the mixed operation of tasks with multiple safety levels. The system can run safety-critical, mission-critical and non-critical tasks, which have different safety levels and different requirements, at the same time, and establishes safety isolation between computing tasks of different safety levels so that the normal operation of high-safety-level tasks is not affected by low-safety-level tasks.
The application provides a safety isolation system for the mixed operation of tasks with multiple safety levels, comprising a multi-core processor, DDRx memory, a FLASH memory, an NvRAM memory, a PCIe switch chip, a TTE network end system and an FPGA, wherein:
The multi-core processor is connected to the PCIe switch chip, to the FPGA, and to an X-group, a Y-group and a Z-group of DDRx memory; the PCIe switch chip is connected to the TTE network end system through a PCIe bus; the FPGA is connected to the FLASH memory and the NvRAM memory and contains a watchdog circuit; the TTE network end system provides a dual-redundant or triple-redundant TTE network interface to the outside of the safety isolation system;
The multi-core processor comprises processor core A, processor core B and processor core C, where processor core A is the group of processor cores running the tasks with the highest safety level, processor core B is the group of processor cores running the tasks with the next-highest safety level, and processor core C is the group of processor cores running the tasks with the lowest safety level;
The multi-core processor has a PCIe bus interface L, a PCIe bus interface M and a PCIe bus interface N, all three of which are connected to the PCIe switch chip.
Specifically, each processor core in the multi-core processor runs the tasks of its preassigned safety level: processor core A runs the highest-safety-level tasks, processor core B the next-highest-safety-level tasks, and processor core C the lowest-safety-level tasks. The tasks of processor core A run in the X-group DDRx memory, those of processor core B in the Y-group DDRx memory, and those of processor core C in the Z-group DDRx memory.
Specifically, after the system is reset and booted, the programs and data each processor core needs are copied into DDRx memory, and the multi-core processor then enters the task-running period, during which no processor core is allowed to access the FLASH memory;
Only processor core A is allowed to access the NvRAM memory. Data that processor core B or processor core C needs to write into the NvRAM memory is passed to processor core A through a DDRx memory address space shared between the processor cores, and processor core A writes it into the NvRAM memory as a proxy.
Specifically, the PCIe bus interface L communicates only with the processor core a and the X group DDRx memory; the PCIe bus interface M only communicates with the processor core B and the Y-group DDRx memory; PCIe bus interface N communicates only with processor core C and Z-group DDRx memory.
Specifically, the FPGA implements several watchdog timers, each monitoring the running state of one processor core. Only processor core A is allowed to access the watchdog timers. When processor core B or processor core C needs to access a watchdog timer, it passes the access request to processor core A through the DDRx memory address space shared between the processor cores, and processor core A performs the access as a proxy.
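The proxy arrangement for the watchdog timers can be illustrated with a small Python model. This is a sketch under stated assumptions, not the patented circuit: the class names, the tick-based timeout and the Python list standing in for the shared DDRx area are all hypothetical. Only core A calls into the FPGA timer block; cores B and C merely post kick requests, which core A forwards.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class WatchdogTimer:
    """One FPGA watchdog timer monitoring one processor core (hypothetical model)."""
    timeout: int        # ticks allowed between two kicks
    remaining: int = 0

    def __post_init__(self) -> None:
        self.remaining = self.timeout

    def kick(self) -> None:
        self.remaining = self.timeout

    def tick(self) -> bool:
        """Advance one tick; return True if the timer has expired."""
        self.remaining -= 1
        return self.remaining <= 0


class FpgaWatchdogs:
    """Watchdog register block in the FPGA; only processor core A may access it."""

    def __init__(self, timeout: int) -> None:
        self.timers: Dict[str, WatchdogTimer] = {c: WatchdogTimer(timeout) for c in "ABC"}

    def kick(self, requester: str, target: str) -> None:
        if requester != "A":
            raise PermissionError(f"core {requester} may not access the watchdog block")
        self.timers[target].kick()

    def tick(self) -> List[str]:
        """Advance every timer by one tick and return the cores whose timer expired."""
        return [core for core, timer in self.timers.items() if timer.tick()]


# Hypothetical stand-in for the DDRx area shared with core A: B and C append kick requests.
mailbox: List[str] = []


def request_kick(core: str) -> None:
    mailbox.append(core)


def core_a_service(wdt: FpgaWatchdogs) -> None:
    """Core A kicks its own timer, then forwards pending requests as a proxy."""
    wdt.kick("A", "A")
    while mailbox:
        wdt.kick("A", mailbox.pop(0))
```

If core C hangs and stops posting requests, only its own timer expires; the timers of cores A and B keep being serviced, and core C can never touch the FPGA block directly.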
Specifically, if processor core A and processor core B need to share data through a preset address area of DDRx memory, the area shared by processor cores A and B is located in the Y-group DDRx memory;
if processor core A and processor core C need to share data through a preset address area of DDRx memory, the area shared by processor cores A and C is located in the Z-group DDRx memory;
if processor core B and processor core C need to share data through a preset address area of DDRx memory, the area shared by processor cores B and C is located in the Z-group DDRx memory.
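The three placement rules above follow one principle: a shared area always lives in the main-memory group of the lower-safety-level core. A minimal Python sketch of that rule, assuming a numeric encoding of the safety levels that is not part of the original text:

```python
# Safety level of each processor core; a larger number means a higher level (assumed encoding).
SAFETY_LEVEL = {"A": 3, "B": 2, "C": 1}
# DDRx group used as main memory by each core, as stated in the text.
MEMORY_GROUP = {"A": "X", "B": "Y", "C": "Z"}


def shared_area_group(core1: str, core2: str) -> str:
    """Return the DDRx group that must hold the data area shared by two cores.

    The shared area is placed in the main-memory group of the core with the
    lower safety level, so the higher-level core's own group is never opened
    to a lower-level core.
    """
    if core1 == core2:
        raise ValueError("a core does not need a shared area with itself")
    lower = min(core1, core2, key=lambda core: SAFETY_LEVEL[core])
    return MEMORY_GROUP[lower]


for pair in [("A", "B"), ("A", "C"), ("B", "C")]:
    print(pair, "->", shared_area_group(*pair))
# ('A', 'B') -> Y
# ('A', 'C') -> Z
# ('B', 'C') -> Z
```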
Specifically, the PCIe switch chip supports multiple virtual channels.
Specifically, the multi-core processor uses a switched interconnect structure.
In summary, the safety isolation system for the mixed operation of tasks with multiple safety levels can run safety-critical, mission-critical and non-critical tasks, which have different safety levels and different requirements, at the same time, and establishes safety isolation between computing tasks of different safety levels so that the normal operation of high-safety-level tasks is not affected by low-safety-level tasks.
All tasks can share the computing resources (the multi-core processor), storage resources (DDRx memory, FLASH memory, NvRAM memory) and communication resources (the TTE network) of the safety isolation system. For each hardware resource, the system provides a dedicated mechanism ensuring that the normal operation of high-safety-level tasks is not affected by low-safety-level tasks.
The system can be applied to large-scale complex embedded computing systems in many fields. It allows safety-critical tasks that previously ran on a high-reliability computer, and mission-critical and non-critical tasks that previously ran on a high-performance computer, to be integrated into one computer offering both high reliability and high performance. This further raises the integration level of complex embedded computing systems, improves data fusion and tight cooperation between computing tasks, and reduces the volume, weight and power consumption of the whole embedded computing system, thereby improving the overall function, performance and operating efficiency of the application.
Drawings
FIG. 1 is a hardware block diagram of the present safety isolation system;
FIG. 2 is a functional block diagram of the multi-core processor in the present safety isolation system;
FIG. 3 is a schematic diagram of the DDRx memory in the present safety isolation system;
FIG. 4 is a schematic diagram of the PCIe bus in the present safety isolation system;
FIG. 5 is a schematic diagram of the present safety isolation system running mixed tasks;
FIG. 6 is a schematic diagram of several of the present safety isolation systems forming a larger computer system.
Detailed Description
In large-scale complex embedded computing systems in many fields, further integration has hit an obstacle: it is difficult to combine safety-critical tasks running on a high-reliability computer with mission-critical and non-critical tasks running on a high-performance computer in a single computer offering both high reliability and high performance. To solve this problem, the invention provides a safety isolation system for the mixed operation of tasks with multiple safety levels. The system can run safety-critical, mission-critical and non-critical tasks, which have different safety levels and different requirements, at the same time, and establishes safety isolation between computing tasks of different safety levels so that the normal operation of high-safety-level tasks is not affected by low-safety-level tasks. Applied to such systems, the invention further raises the integration level and lets their safety-critical, mission-critical and non-critical tasks run on the same computer. This improves data fusion and tight cooperation between computing tasks and helps reduce the overall volume, weight and power consumption of the embedded computing system, thereby improving the overall function, performance and operating efficiency of the application.
The technical scheme of the invention is as follows:
The computer system shown in FIG. 1 contains two computing nodes (in FIG. 1, the left multi-core processor together with its peripheral memory, buses, interfaces, network and so on forms one computing node, and the right multi-core processor with its peripherals forms the other). Each computing node contains one multi-core processor. The multi-core processor is connected to at least two independent groups of DDRx memory; it is connected to an FPGA, through which one group of FLASH memory and one group of NvRAM memory are attached; it provides an Ethernet interface and a serial interface to the outside of the computer system; and it is connected to the PCIe switch chip of its computing node through at least two PCIe bus interfaces. Each computing node also contains a TTE (Time-Triggered Ethernet) network end system, which is connected to the PCIe switch chip of the node through a PCIe bus and provides a dual-redundant or triple-redundant TTE network interface to the outside of the computer system. The FPGA in each computing node implements a watchdog circuit that monitors the running state of each processor core of that node's multi-core processor. The multi-core processors of the two computing nodes each provide one RapidIO bus interface, both connected to the same RapidIO switch chip, through which the two computing nodes can exchange data.
A single computing node constitutes the safety isolation system of the invention.
The specific design of the various hardware components within a compute node is set forth below.
1. Multi-core processor
Each computing node contains one multi-core processor. Within the multi-core processor, computing tasks assigned different safety levels must run on different processor cores, which isolates tasks of different safety levels from each other in terms of computing resources. For example, safety-critical tasks run on processor core A, mission-critical tasks on processor core B, and non-critical tasks on processor core C.
Inside the multi-core processor, the processor cores (with their L1/L2 caches), the DDRx memory controllers and the bus/interface controllers should be connected by a switched interconnect structure rather than a conventional internal bus, as shown in FIG. 2. There are two reasons for this: 1. a switched interconnect allows different memories and devices to be configured for the exclusive use of different processor cores; 2. a switched interconnect lets several processor cores access different memories and devices at the same time without blocking each other (with a conventional internal bus, simultaneous external accesses from several cores contend for the bus and must go through bus arbitration).
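The second reason can be made concrete with a toy cycle count in Python. This is an illustrative model only, not a description of any real interconnect: it assumes every access takes exactly one cycle and ignores latency, caching and pipelining. On a shared bus the arbiter serializes all accesses; on a switched interconnect only accesses that hit the same target are serialized.

```python
from collections import Counter

# One access = (requesting core, target memory or device).
Access = tuple[str, str]


def cycles_shared_bus(accesses: list[Access]) -> int:
    """Single internal bus: the arbiter grants one access per cycle."""
    return len(accesses)


def cycles_switched(accesses: list[Access]) -> int:
    """Switched interconnect: accesses to different targets proceed in the same
    cycle; only accesses that hit the same target are serialized."""
    per_target = Counter(target for _core, target in accesses)
    return max(per_target.values(), default=0)


# Each core works in its own DDRx group, as in the isolation scheme of this system.
accesses = [("A", "DDRx-X"), ("B", "DDRx-Y"), ("C", "DDRx-Z")]
print(cycles_shared_bus(accesses), cycles_switched(accesses))  # 3 1
```

Because each safety level has its own memory group and PCIe interface, the isolation scheme naturally produces the "different targets" case in which the switched interconnect never makes one core wait for another.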
The sections on DDRx memory, the PCIe bus and so on below use these properties of the switched interconnect to allocate computing, storage and communication resources sensibly among tasks of different safety levels and to isolate them from one another.
2. DDRx memory
DDRx memory refers to the DDR, DDR2, DDR3, DDR4 and later series of memory; it serves as the main memory in which the programs of each processor core of the multi-core processor run and store their data.
The multi-core processor of each computing node is connected to at least two independent groups of DDRx memory. Computing tasks assigned different safety levels (and therefore running on different processor cores) must run in different groups of DDRx memory; that is, processor cores running tasks of different safety levels must use different DDRx groups as the main memory for program execution and data storage. In addition, a processor core running a lower-safety-level task is not allowed to access the DDRx group that serves as main memory for a processor core running a higher-safety-level task. If a processor core running a lower-safety-level task and a processor core running a higher-safety-level task need to share data through an address area of DDRx memory, that shared area must be located in the DDRx group that serves as main memory for the lower-safety-level core. (The access rights of each processor core to each DDRx group can be set through the configuration management of the multi-core processor.)
Taking FIG. 3 as an example, the multi-core processor contains three processor cores (core A, core B and core C) and is connected to three groups of DDRx memory, X, Y and Z. Processor core A runs the highest-safety-level tasks and uses the X-group DDRx memory as its main memory for program execution and data storage; processor core B runs the next-highest-safety-level tasks and uses the Y-group DDRx memory as its main memory; processor core C runs the lowest-safety-level tasks and uses the Z-group DDRx memory as its main memory. The entire address range of the X-group DDRx memory is accessible only to processor core A (as core A's main memory). In the Y-group DDRx memory, part of the address range is accessible only to processor core B (as core B's main memory), and part is accessible to both processor cores A and B (as the shared data area of cores A and B). In the Z-group DDRx memory, part of the address range is accessible only to processor core C (as core C's main memory), and part is accessible to both processor cores A and C (as the shared data area of cores A and C).
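The FIG. 3 layout amounts to an address map with a set of permitted cores per region. In the real system these rights are set through the multi-core processor's configuration management; the Python sketch below only models the resulting policy. The 1 GiB group size and the choice of the last quarter of the Y and Z groups as shared areas are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass

GiB = 1 << 30


@dataclass(frozen=True)
class Region:
    name: str
    start: int              # inclusive byte address
    end: int                # exclusive byte address
    allowed: frozenset      # processor cores allowed to access the region


# Hypothetical map: 1 GiB per DDRx group, last quarter of Y and Z used as shared areas.
MEMORY_MAP = [
    Region("X: main memory of core A", 0, 1 * GiB, frozenset({"A"})),
    Region("Y: main memory of core B", 1 * GiB, 1 * GiB + 3 * GiB // 4, frozenset({"B"})),
    Region("Y: area shared by A and B", 1 * GiB + 3 * GiB // 4, 2 * GiB, frozenset({"A", "B"})),
    Region("Z: main memory of core C", 2 * GiB, 2 * GiB + 3 * GiB // 4, frozenset({"C"})),
    Region("Z: area shared by A and C", 2 * GiB + 3 * GiB // 4, 3 * GiB, frozenset({"A", "C"})),
]


def may_access(core: str, addr: int) -> bool:
    """Return True if `core` may access `addr`; unmapped addresses are denied."""
    for region in MEMORY_MAP:
        if region.start <= addr < region.end:
            return core in region.allowed
    return False
```

Note that no region of the X group lists core B or core C, which is exactly the rule that a lower-safety-level core may not reach a higher-level core's main memory.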
This DDRx memory design has two advantages: 1. processor cores running tasks of different safety levels are kept as independent of each other and as free of mutual interference as possible (each core has its own main memory, and the switched interconnect of the multi-core processor lets all cores access their main memories simultaneously); 2. the normal operation of high-safety-level tasks is not affected by low-safety-level tasks. Taking FIG. 3 as an example, if processor core B or processor core C, which run lower-safety-level tasks, or the Y-group or Z-group DDRx memory fails, processor core A loses at most the data it shares with cores B and C; its access to the X-group DDRx memory is unaffected, and the highest-safety-level tasks can continue to run.
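The failure argument above can be checked mechanically by listing which components each of core A's resources depends on. The Python sketch below is a simplified dependency model, assuming that a shared area is lost whenever its partner core or the DDRx group holding it fails, and that core A itself stays healthy; the component names are hypothetical labels.

```python
# Components each of core A's resources depends on, following the FIG. 3 layout.
CORE_A_RESOURCES = {
    "X-group main memory": {"DDRx-X"},
    "A/B shared area": {"core B", "DDRx-Y"},
    "A/C shared area": {"core C", "DDRx-Z"},
}


def surviving_resources(failed: set[str]) -> dict[str, bool]:
    """Map each of core A's resources to True if no component it depends on failed."""
    return {name: not (deps & failed) for name, deps in CORE_A_RESOURCES.items()}


def highest_level_task_runs(failed: set[str]) -> bool:
    """Core A keeps running as long as its own main memory survives."""
    return surviving_resources(failed)["X-group main memory"]


worst_case = {"core B", "core C", "DDRx-Y", "DDRx-Z"}
print(surviving_resources(worst_case))
print(highest_level_task_runs(worst_case))  # True
```

Even in the worst case where every lower-level core and memory group fails, only the two shared areas are lost, and the highest-safety-level task keeps its main memory.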
3. FLASH memory
The multi-core processor of each computing node attaches one group of FLASH memory through the FPGA. The FLASH memory is used to reset and boot the multi-core processor; after reset and boot, the programs and data each processor core needs are copied into the corresponding DDRx memory. When the copy is complete, the multi-core processor enters the task-running period, during which no processor core is allowed to access the FLASH memory (access to the address range of the FLASH memory can be forbidden for all processor cores through the configuration management of the multi-core processor).
During the task-running period, even if a failing processor core erroneously accesses the address range of the FLASH memory, the access is masked and does not affect the normal operation of the other processor cores. This guarantees safety isolation between tasks of different safety levels.
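The boot-then-lock behavior of the FLASH window can be sketched as a two-phase state machine in Python. This is a hypothetical model, not the configuration-management interface of any real processor: `FlashWindow`, `load_and_lock` and the `masked` counter are illustrative names. After the images are copied into DDRx, every FLASH read is masked and counted, and nothing else in the system changes.

```python
from enum import Enum
from typing import Dict, Optional


class Phase(Enum):
    BOOT = "boot"
    TASK_RUNNING = "task running"


class FlashWindow:
    """Hypothetical model of the FLASH address window being closed after boot."""

    def __init__(self, images: Dict[str, bytes]) -> None:
        self.flash = images                  # per-core program images stored in FLASH
        self.ddr: Dict[str, bytes] = {}      # copies loaded into each core's DDRx group
        self.phase = Phase.BOOT
        self.masked = 0                      # FLASH accesses masked during task running

    def load_and_lock(self) -> None:
        """Copy each core's image into DDRx, then forbid all FLASH accesses."""
        for core, image in self.flash.items():
            self.ddr[core] = image
        self.phase = Phase.TASK_RUNNING

    def read_flash(self, core: str) -> Optional[bytes]:
        """Return the core's image during boot; afterwards the access is masked."""
        if self.phase is Phase.TASK_RUNNING:
            self.masked += 1                 # the stray access is absorbed; nothing else changes
            return None
        return self.flash[core]
```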
4. NvRAM memory
The multi-core processor of each computing node attaches one group of NvRAM memory through the FPGA. NvRAM memory is typically used to record fault information when the system encounters an anomaly. Only one processor core in the multi-core processor, the one running the highest-safety-level tasks, is allowed to access the NvRAM memory; the other processor cores are not allowed to access it directly (their access to the address range of the NvRAM memory can be forbidden through the configuration management of the multi-core processor). If another processor core has fault information to record in NvRAM, it can place the data in the DDRx address space it shares with the processor core that holds NvRAM access rights, and that core writes the data into NvRAM as a proxy (the proxy writes should be performed periodically or when the core is idle, so that they do not disturb that core's own tasks).
This NvRAM design ensures that the normal operation of high-safety-level tasks is not affected by low-safety-level tasks. When a processor core running a low-safety-level task fails, at most the DDRx address space it shares with the processor core holding NvRAM access rights is affected; the computing tasks on the other processor cores continue to run normally.
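The proxy write can be sketched in Python as a bounded queue in the shared DDRx area that core A drains with a per-call budget, which models "periodically or when idle" without letting the proxy work crowd out core A's own tasks. The class and method names are hypothetical, and the choice to overwrite the oldest records when the queue is full (so a faulty core cannot flood beyond the shared area) is an assumption of this sketch, not something the description specifies.

```python
from collections import deque
from typing import Deque, List, Tuple

Record = Tuple[str, str]  # (reporting core, fault description)


class NvramProxy:
    """Core A owns the NvRAM; cores B and C post fault records to a shared DDRx queue."""

    def __init__(self, capacity: int) -> None:
        # Bounded shared area (assumed policy): when full, the oldest record is overwritten.
        self.shared_queue: Deque[Record] = deque(maxlen=capacity)
        self.nvram: List[Record] = []

    def post(self, core: str, text: str) -> None:
        """Run by core B or C: writes only into the shared DDRx area."""
        self.shared_queue.append((core, text))

    def drain(self, budget: int) -> int:
        """Run by core A periodically or when idle: write at most `budget` records."""
        written = 0
        while self.shared_queue and written < budget:
            self.nvram.append(self.shared_queue.popleft())
            written += 1
        return written
```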
5. PCIe bus
In each computing node, the multi-core processor communicates with the TTE network end system over a PCIe bus. The multi-core processor of each computing node is connected to a PCIe switch chip in the computing node through at least two PCIe bus interfaces, and the PCIe switch chip is connected to the TTE network end system through a PCIe bus.
In each multi-core processor, different PCIe bus interfaces are assigned to different security levels. The PCIe bus interface assigned to a given security level may communicate only with the processor cores and DDRx memory groups running computing tasks of that security level, and not with the processor cores and DDRx memory groups running computing tasks of other security levels (the access rights of each processor core to each PCIe bus interface, and of each PCIe bus interface to each DDRx memory group, can be set through the configuration management of the multi-core processor). In other words, processor cores and DDRx main memories running computing tasks of different security levels must use different PCIe bus interfaces to communicate with the TTE network end system.
The PCIe switch chip must support multiple virtual channels (Virtual Channels), which provide multiple transmission priorities for PCIe bus packets. According to the PCIe specification, when a PCIe link has only a single virtual channel, bus packets cannot be prioritized; packets can be prioritized only when the link has multiple virtual channels (high-priority bus packets use a high-priority virtual channel and low-priority bus packets use a low-priority virtual channel). The high-security-level PCIe bus interface of the multi-core processor is required to communicate using only high-priority PCIe bus packets, and the low-security-level PCIe bus interface is required to communicate using only low-priority PCIe bus packets. The PCIe switch chip can therefore provide a high-priority virtual channel for the data communication of high-security-level computing tasks and a low-priority virtual channel for that of low-security-level computing tasks.
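How a multi-VC switch port keeps high-priority traffic flowing can be sketched as one queue per virtual channel with strict-priority arbitration. This is a minimal Python model, not a switch configuration: the traffic-class values in `TC_TO_VC`, the assumption that a higher VC number means higher priority, and the choice of strict-priority arbitration are all assumptions for illustration; the patent only requires that the switch support multiple virtual channels.

```python
from collections import deque

# Hypothetical traffic class (TC) used by each interface -> virtual channel (VC).
# In this sketch a higher VC number means higher arbitration priority.
TC_TO_VC = {7: 2, 3: 1, 0: 0}


class StrictPriorityPort:
    """Egress port of a PCIe switch with one packet queue per virtual channel."""

    def __init__(self, num_vcs: int = 3):
        self.queues = [deque() for _ in range(num_vcs)]

    def enqueue(self, tc: int, tlp: str) -> None:
        self.queues[TC_TO_VC[tc]].append(tlp)

    def next_tlp(self):
        # Always serve the highest-priority non-empty VC first.
        for vc in reversed(range(len(self.queues))):
            if self.queues[vc]:
                return self.queues[vc].popleft()
        return None
```

Strict priority means a flood on the low-priority VC cannot delay high-priority packets; the converse cost is that sustained high-priority traffic can starve the low VC, which is acceptable here because the low VC carries the least critical task.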
Taking Fig. 4 as an example, the multi-core processor contains 3 processor cores (core A, core B, core C) and 3 PCIe bus interfaces (interface L, interface M, interface N), and is externally connected to three DDRx memory groups X, Y and Z. Processor core A and the X-group DDRx memory run the highest-security-level computing task and can communicate only with PCIe bus interface L, which is dedicated to that task, is connected to the PCIe switch chip, and communicates using only highest-priority PCIe packets. Processor core B and the Y-group DDRx memory run the next-highest-security-level computing task and can communicate only with PCIe bus interface M, which is dedicated to that task, is connected to the PCIe switch chip, and communicates using only next-highest-priority PCIe packets. Processor core C and the Z-group DDRx memory run the lowest-security-level computing task and can communicate only with PCIe bus interface N, which is dedicated to that task, is connected to the PCIe switch chip, and communicates using only lowest-priority PCIe packets. The PCIe switch chip supports 3 virtual channels and is connected to the TTE network end system through a PCIe bus.
This PCIe bus design has two advantages: 1. processor cores and DDRx memories running computing tasks of different security levels are kept as independent of each other as possible and do not interfere with each other (each processor core has its own PCIe bus interface, and the switched interconnect fabric of the multi-core processor allows every processor core to access its PCIe bus interface, and every PCIe bus interface to access its DDRx memory, simultaneously); 2. the normal operation of high-security-level tasks is not affected by low-security-level tasks. If communication between the TTE network end system and the processor cores/DDRx memories running low-security-level computing tasks fails, communication between the TTE network end system and the processor cores/DDRx memories running high-security-level computing tasks can continue. Taking Fig. 4 as an example, even if processor cores B and C, the Y- and Z-group DDRx memories, and PCIe bus interfaces M and N, which serve the next-highest and lowest security level tasks, fail and send erroneous packets into the PCIe bus network, processor core A and the X-group DDRx memory can still communicate with the TTE network end system through PCIe bus interface L (using highest-priority packets), so the highest-security-level computing task can continue to run.
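The Fig. 4 assignment is essentially a partition table, and the PCIe isolation rule is a same-partition check against it. A minimal Python sketch of that table and check follows; the `Partition` type and `pcie_may_access` function are hypothetical names, and in the real system the table would live in the multi-core processor's configuration management, whose format the patent does not describe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    level: int          # 0 = highest security level
    core: str
    ddr_group: str
    pcie_if: str
    pcie_priority: str


# Resource assignment from the Fig. 4 example.
PARTITIONS = [
    Partition(0, "A", "X", "L", "highest"),
    Partition(1, "B", "Y", "M", "next-highest"),
    Partition(2, "C", "Z", "N", "lowest"),
]


def pcie_may_access(pcie_if: str, target: str) -> bool:
    """True if a PCIe interface may talk to a core or DDRx group (same partition only)."""
    for p in PARTITIONS:
        if p.pcie_if == pcie_if:
            return target in (p.core, p.ddr_group)
    return False  # unknown interfaces get no access
```

For example, `pcie_may_access("L", "X")` is true while `pcie_may_access("N", "A")` is false, matching the rule that interface N may only reach core C and the Z group.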
6. TTE network end system
Each computing node contains a TTE (Time-Triggered Ethernet) network end system that provides dual-redundant or triple-redundant TTE network interfaces to the outside. The TTE network end system receives and transmits the various types of TTE network data frames.
In the safety isolation system of the invention, a TTE network is used as the communication network between safety isolation systems because it supports mixed transmission of three traffic classes, TT (Time-Triggered) data frames, RC (Rate-Constrained) data frames and BE (Best-Effort) data frames, which makes it particularly suitable for the data communication of mixed-criticality tasks. TT frame transmission strictly guarantees deterministic transmission timing and meets the high-reliability, hard real-time requirements of safety-critical task communication in many embedded computing applications; RC frame transmission avoids network congestion while guaranteeing high bandwidth and meets the high-bandwidth, real-time requirements of task-critical task communication in many embedded computing applications; BE frame transmission uses whatever bandwidth remains in the network, does not guarantee transmission delay, and is suitable for the communication of non-critical tasks. In a TTE network the three frame types can be transmitted together, with transmission priority, from high to low, of TT frames, RC frames and BE frames. TTE network components (network end systems and network switches) are designed so that the normal transmission of higher-priority frames is strictly unaffected by lower-priority frames.
In the safety isolation system, safety-critical tasks, task-critical tasks and non-critical tasks communicate using TT, RC and BE data frames respectively, which gives two advantages: 1. TTE network communication resources can be shared by all tasks, whose traffic is transmitted together; 2. the transmission priority mechanism of the TTE network ensures that normal communication of high-security-level tasks is not affected by low-security-level tasks. In addition, each TTE network end system typically has dual-redundant or triple-redundant TTE network interfaces, each with its own MAC-layer and physical-layer circuits, and all the interfaces of an end system transmit and receive identical data. This redundancy improves the reliability and availability of the communication links, serving the high-reliability, high-safety communication requirements of safety-critical tasks.
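The TT > RC > BE ordering can be illustrated with a toy egress port in which TT frames own pre-scheduled slots of a repeating cycle and the remaining slots go to RC before BE. This Python sketch is a deliberate simplification under stated assumptions: the slot schedule, the cycle length and the `TteEgress` class are hypothetical, and real TTE components (standardised as SAE AS6802) add clock synchronisation, RC bandwidth policing and other mechanisms not modelled here.

```python
from collections import deque


class TteEgress:
    """Toy egress: TT frames own their scheduled slots; otherwise RC is served before BE."""

    def __init__(self, tt_schedule: dict, cycle_len: int):
        self.tt_schedule = tt_schedule  # slot offset in the cycle -> TT frame (hypothetical)
        self.cycle_len = cycle_len      # slots per cycle; the TT pattern repeats every cycle
        self.rc = deque()               # rate-constrained queue (bandwidth policing omitted)
        self.be = deque()               # best-effort queue

    def send(self, slot: int):
        """Return the frame transmitted in the given slot, or None if the link stays idle."""
        phase = slot % self.cycle_len
        if phase in self.tt_schedule:
            return self.tt_schedule[phase]
        if self.rc:
            return self.rc.popleft()
        if self.be:
            return self.be.popleft()
        return None
```

However much BE traffic is queued, it only fills slots that neither TT nor RC needs, which is the property the safety isolation system relies on.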
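Because every redundant interface carries identical data, a receiver has to pass one copy of each frame upward and discard the rest. The following minimal Python sketch keeps the first copy to arrive on any channel; identifying frames by a sequence number, the window size and the `RedundancyFilter` class are all assumptions, and real TTE end systems define their own redundancy-management rules, which this sketch does not reproduce.

```python
from collections import Counter, OrderedDict


class RedundancyFilter:
    """Keeps the first copy of each frame received on any redundant channel."""

    def __init__(self, window: int = 64):
        self.window = window               # recent sequence numbers remembered (assumption)
        self.seen = OrderedDict()          # sequence number -> True, oldest first
        self.first_copies = Counter()      # channel -> frames it delivered first

    def accept(self, channel: int, seq: int, payload: bytes):
        """Return the payload if this is the first copy of `seq`, else None."""
        if seq in self.seen:
            return None                    # duplicate already delivered via another channel
        self.seen[seq] = True
        if len(self.seen) > self.window:
            self.seen.popitem(last=False)  # forget the oldest sequence number
        self.first_copies[channel] += 1
        return payload
```

If one channel drops a frame, the copy on another channel is still delivered; `first_copies` gives a rough per-channel health indicator.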
7. FPGA and watchdog
The multi-core processor of each computing node is connected to an external FPGA chip. The FPGA chip gives the multi-core processor extended access to the FLASH and NvRAM memories and also implements the watchdog circuit. The FLASH and NvRAM memories were described above; only the watchdog circuit implemented inside the FPGA is described here.
The FPGA implements multiple watchdog timers, equal in number to the processor cores enabled in the multi-core processor. The watchdog timers correspond one-to-one with the processor cores, and each watchdog timer monitors the running state of its processor core. Only the single processor core running the highest-security-level computing task in the multi-core processor may access the watchdog timers directly; no other processor core is allowed to. When another processor core needs to operate its watchdog timer, it writes the operation information into the DDRx memory address space shared between the processor cores. In addition to operating its own watchdog timer, the processor core with watchdog access rights must periodically poll the shared DDRx memory address space, retrieve the watchdog operations requested by the other processor cores, and perform those watchdog accesses on their behalf. It must also periodically monitor the states of the other watchdog timers, besides its own, and report those states back to the corresponding processor cores through the shared DDRx memory address space.
This watchdog design has two advantages: 1. the task execution of each processor core in the multi-core processor can be monitored independently; 2. the normal operation of high-security-level tasks is not affected by low-security-level tasks. When a processor core running a low-security-level computing task fails, at most the DDRx memory address space it shares with the processor core holding watchdog access rights is affected; the other processor cores (and their watchdog timers) continue to operate normally.
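The proxy watchdog protocol can be simulated in a few lines: cores B and C raise a "kick" request in shared DDRx, core A services its own timer, executes pending requests, and writes each timer's expiry state back for the other cores to read. This is a minimal Python sketch under stated assumptions: the tick-based timer model, the timeout value, the single-flag request format and the class names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Watchdog:
    """Hypothetical model of one FPGA watchdog timer, counted in ticks."""
    timeout: int
    remaining: int = 0
    expired: bool = False

    def __post_init__(self):
        self.remaining = self.timeout

    def kick(self):
        if not self.expired:
            self.remaining = self.timeout

    def tick(self):
        if not self.expired:
            self.remaining -= 1
            if self.remaining <= 0:
                self.expired = True


@dataclass
class WatchdogOwner:
    """Core A: the only core with direct access to the watchdog timers."""
    timers: dict                                   # core -> Watchdog
    requests: dict = field(default_factory=dict)   # core -> pending kick flag (shared DDRx)
    status: dict = field(default_factory=dict)     # core -> expired flag fed back (shared DDRx)

    def service(self):
        self.timers["A"].kick()                    # core A's own timer
        for core, pending in self.requests.items():
            if pending:
                self.timers[core].kick()           # proxy access on behalf of B/C
                self.requests[core] = False
        for core, wd in self.timers.items():
            if core != "A":
                self.status[core] = wd.expired     # report state back through shared memory
```

In a run where core B keeps posting requests and core C stops after its first one, only C's timer expires, and core A's timer is never at risk from C's failure.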
8. RapidIO bus
In the computer system of Fig. 1, the multi-core processor in each of the two computing nodes provides one external RapidIO bus interface, and both interfaces are connected to the same RapidIO switch chip, through which the two computing nodes can exchange data.
The RapidIO bus may be used for the data communication of the computing task of only one security level. The RapidIO bus interface can communicate only with the processor core and DDRx memory group running that security level's computing task, not with the processor cores and DDRx memory groups running computing tasks of other security levels. This usage constraint is set through the configuration management of the multi-core processor, which defines the access rights of each processor core to the RapidIO bus interface and of the RapidIO bus interface to each DDRx memory group, so that computing tasks of different security levels cannot affect each other.
The use of RapidIO communication between the two computing nodes is not restricted. Two application scenarios serve as examples: 1. the two computing nodes run the same safety-critical task and form a command/monitor comparison pair; each node sends its safety-critical results to the other over the RapidIO bus, and the results are compared to check that both nodes are operating correctly; 2. the two computing nodes cooperate on the same task-critical task and exchange task-related data over the RapidIO bus (instead of the TTE network), making sensible use of the communication bandwidth in the computer system of Fig. 1 and reducing the TTE network load.
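The command/monitor comparison in scenario 1 reduces to comparing the local result with the copy received from the partner node and releasing the output only on agreement. A minimal Python sketch follows; representing results as lists of floats, the optional tolerance (exact match by default, which suits identical code on identical hardware) and withholding the output (`None`) on mismatch are assumptions, since the patent only says the results are compared to detect incorrect operation.

```python
import math


def commands_agree(com, mon, tol=0.0):
    """Compare the COM node's result vector with the MON node's copy received over RapidIO."""
    if len(com) != len(mon):
        return False
    # With tol=0.0 this is an exact comparison; a small tol tolerates rounding differences.
    return all(math.isclose(a, b, rel_tol=0.0, abs_tol=tol) for a, b in zip(com, mon))


def com_mon_output(com, mon, tol=0.0):
    """Release the COM result only if both nodes agree; otherwise withhold it (None)."""
    return com if commands_agree(com, mon, tol) else None
```

What the system does after a mismatch (fail silent, switch to another channel, log to NvRAM) is a design decision outside this sketch.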
9. Debug interface
The multi-core processor of each computing node provides external Ethernet and serial interfaces as debug interfaces. These interfaces are enabled only when the computing node is in the debug state and are disabled while the computing node is executing tasks in normal operation.
In practical applications, the computer system of fig. 1 may be used independently as a single computer, or multiple computer systems of fig. 1 may be connected by using a TTE network to form a larger-scale computer for use.
When the computer system of Fig. 1 is used on its own as a single computer, safety-critical, task-critical and non-critical tasks can run simultaneously, as shown in Fig. 5. In the example of Fig. 5, the multi-core processor of each computing node assigns one processor core, one DDRx memory group and one PCIe bus interface to each of the safety-critical, task-critical and non-critical tasks. The two computing nodes can run the same safety-critical task, exchange its results over the RapidIO bus and compare them to check that both nodes are operating correctly, thereby ensuring correct execution of the safety-critical task.
Multiple computer systems of Fig. 1 may also be connected through a TTE network to form a larger computer, as shown in Fig. 6 (other devices may also be attached to the TTE network). Such a computer can run safety-critical, task-critical and non-critical tasks simultaneously, with the tasks deployed across the hardware resources of the individual computing nodes. For safety-critical tasks, it can flexibly build various fault-tolerant configurations to meet reliability and safety requirements. The TTE network itself must also meet the reliability and safety requirements of safety-critical tasks, but its fault-tolerant design is not the focus of this patent and is not described here.
In summary, in large, complex embedded computing systems, the invention allows safety-critical tasks that previously ran on high-reliability computers, and task-critical and non-critical tasks that previously ran on high-performance computers, to be integrated onto a single computer that offers both high reliability and high performance. This further raises the integration level of complex embedded computing systems: on the one hand it improves data fusion and close cooperation between different computing tasks, and on the other it helps reduce the overall size, weight and power consumption of the embedded computing system, thereby improving overall functionality, performance and operating efficiency in embedded computing applications.

Claims (5)

1. A safety isolation system for hybrid operation of tasks with multiple safety levels, characterized by comprising a multi-core processor, DDRx memories, a FLASH memory, an NvRAM memory, a PCIe switch chip, a TTE network end system and an FPGA, wherein:
The multi-core processor is connected to the PCIe switch chip, the FPGA, the X-group DDRx memory, the Y-group DDRx memory and the Z-group DDRx memory; the PCIe switch chip is connected to the TTE network end system through a PCIe bus; the FPGA is connected to the FLASH memory and the NvRAM memory and includes a watchdog circuit; the TTE network end system provides dual-redundant or triple-redundant TTE network interfaces to the outside of the safety isolation system;
The multi-core processor comprises a processor core A, a processor core B and a processor core C, wherein the processor core A is a group of processor cores running the tasks with the highest security level, the processor core B is a group of processor cores running the tasks with the next highest security level, and the processor core C is a group of processor cores running the tasks with the lowest security level;
The multi-core processor comprises PCIe bus interface L, PCIe bus interface M and PCIe bus interface N, all of which are connected to the PCIe switch chip;
Each processor core in the multi-core processor runs tasks of its preset security level: processor core A runs the highest-security-level task, processor core B the next-highest-security-level task, and processor core C the lowest-security-level task; the task of processor core A runs in the X-group DDRx memory, that of processor core B in the Y-group DDRx memory, and that of processor core C in the Z-group DDRx memory;
After the system is reset and booted, the programs and data required by each processor core are loaded into the DDRx memories, and the processor cores of the multi-core processor are not allowed to access the FLASH memory during task operation;
Only processor core A is allowed to access the NvRAM memory; processor core B and processor core C place the data to be written into the NvRAM in the DDRx memory address space they share with processor core A, and processor core A writes the data into the NvRAM on their behalf;
The PCIe bus interface L only communicates with the processor core A and the X group DDRx memory; the PCIe bus interface M only communicates with the processor core B and the Y-group DDRx memory; the PCIe bus interface N only communicates with the processor core C and the Z-group DDRx memory; the PCIe bus interface L uses the PCIe bus data packet with the highest priority to communicate; the PCIe bus interface M uses the next highest priority PCIe bus data packets for communication; PCIe bus interface N communicates using lowest priority PCIe bus packets.
2. The system of claim 1, wherein multiple watchdog timers are implemented in the FPGA to monitor the operating state of each processor core; only processor core A is allowed to access the watchdog timers; when processor core B or processor core C needs to access a watchdog timer, it passes the access operation information to processor core A through the DDRx memory address space shared between the processor cores, and processor core A performs the access on its behalf.
3. The system of claim 1, wherein:
The processor core A and the processor core B need to share data by sharing preset address areas of the DDRx memory, and the shared address areas of the processor core A and the processor core B are in the Y-group DDRx memory;
The processor core A and the processor core C need to share data by sharing a preset address area of the DDRx memory, and the shared address area of the processor core A and the processor core C is in the Z group DDRx memory;
Processor core B and processor core C need to share data through a preset address area of the DDRx memory, and the shared address area of processor core B and processor core C is in the Z-group DDRx memory.
4. The system of claim 1, wherein the PCIe switch chip supports multiple virtual channels.
5. The system of claim 1, wherein the multi-core processor is a multi-core processor of a switched interconnect fabric.
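Claim 3 follows a consistent rule: every shared area lies in the DDRx group of the lower-security-level core of the pair (A-B in Y, A-C in Z, B-C in Z). One plausible reading, which is an inference rather than something the claims state, is that a higher-level core reaches down into lower-level memory, so a faulty lower-level core never needs write access to a higher-level group. A minimal Python sketch of the rule, with hypothetical function names:

```python
# Security levels: 0 = highest (core A), 2 = lowest (core C); DDRx groups as in the claims.
LEVEL = {"A": 0, "B": 1, "C": 2}
DDR_GROUP = {"A": "X", "B": "Y", "C": "Z"}


def shared_region_group(core1: str, core2: str) -> str:
    """DDRx group holding the area shared by two cores: the lower-level core's own group."""
    lower = max(core1, core2, key=LEVEL.__getitem__)
    return DDR_GROUP[lower]


def ddr_groups_touched(core: str) -> set:
    """DDRx groups in which a core needs at least a window: its own and all lower-level groups."""
    return {DDR_GROUP[c] for c in LEVEL if LEVEL[c] >= LEVEL[core]}
```

Under this rule core C only ever touches the Z group, so its failure cannot corrupt the X or Y groups.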
CN202011227314.7A 2020-11-05 2020-11-05 Safety isolation system for hybrid operation of multiple safety level tasks Active CN112416702B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011227314.7A CN112416702B (en) 2020-11-05 2020-11-05 Safety isolation system for hybrid operation of multiple safety level tasks

Publications (2)

Publication Number Publication Date
CN112416702A CN112416702A (en) 2021-02-26
CN112416702B true CN112416702B (en) 2024-05-24

Family

ID=74827699

Country Status (1)

Country Link
CN (1) CN112416702B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115081033B (en) * 2022-07-20 2022-11-11 China Southern Power Grid Digital Grid Research Institute Co., Ltd. Service safety isolation method for edge computing device of digital power distribution network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940687A (en) * 2017-04-06 2017-07-11 Shanghai Spaceflight Institute of TT&C and Telecommunication A low-cost, high-performance space computer
CN111026573A (en) * 2019-11-19 2020-04-17 Xian Aeronautics Computing Technique Research Institute of AVIC Watchdog system of multi-core processing system and control method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8503484B2 (en) * 2009-01-19 2013-08-06 Honeywell International Inc. System and method for a cross channel data link
CA3041597C (en) * 2016-10-31 2020-07-07 Leonardo S.P.A. Certifiable deterministic system software framework for hard real-time safety-critical applications in avionics systems featuring multi-core processors
US10663964B2 (en) * 2017-08-04 2020-05-26 Facebook, Inc. Unified and redundant flight and mission control for an unmanned aerial vehicle

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
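Research on a high-speed inter-core communication architecture for multi-core processors (多核处理器核间高速通讯架构的研究); 汪健, 张磊, 王少轩, 赵忠惠, 陈亚宁; Electronics & Packaging (电子与封装), 2011-06-20, No. 06, pp. 46-53 *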

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant