CN113705802B - Synchronous calculation method, device, system, program product and medium for automatic driving

Info

Publication number: CN113705802B
Application number: CN202110847499.XA
Authority: CN
Inventors: 宋朝忠; 雷振华
Original assignee: Shenzhen Echiev Autonomous Driving Technology Co ltd
Current assignee: Shenzhen Echiev Autonomous Driving Technology Co ltd
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2023-09-08
Anticipated expiration: 2041-07-26
Also published as: CN113705802A

Abstract

The invention discloses a synchronous calculation method, a device, a system, a program product and a medium for automatic driving, wherein the method comprises the following steps: acquiring the target number of external targets around the target vehicle at the current moment and target data corresponding to each external target; sending each target data to a corresponding calculation unit block according to the target number, and calculating the received target data through each calculation unit block to obtain a corresponding calculation result; and determining the driving strategy of the target vehicle at the next moment according to the calculation result so as to enable the target vehicle to drive according to the driving strategy. According to the method and the device, under the condition that no additional computing resource is added, the target data of all external targets are adaptively and synchronously computed according to the target number of the external dynamic change, so that the flexibility and the synchronism of the processing process are reflected, and the prediction precision in the automatic driving process can be improved.

Description

Synchronous calculation method, device, system, program product and medium for automatic driving

Technical Field

The invention relates to the technical field of automatic driving, in particular to a synchronous calculation method, a synchronous calculation system, a synchronous calculation program product and a synchronous calculation medium for automatic driving.

Background

With the rapid development of deep learning technology, RNN (Recurrent Neural Network) algorithms are becoming increasingly widely used. For example, in the field of automatic driving, RNN algorithms can produce more accurate prediction results, providing more reliable prediction data for the subsequent driving planning and decision-making of autonomous vehicles.

Currently, the RNN algorithm is mainly deployed on GPUs (Graphics Processing Units), CPUs (Central Processing Units) and ASICs (application-specific integrated circuits), but all three deployment approaches rely on time-sharing calculation. If processing is performed in a time-sharing parallel manner, the calculation mode of the system cannot be dynamically adjusted according to the number of external targets, and to accommodate different numbers of external targets the chip is generally replaced with a larger-capacity one, which increases resource usage and cost. If processing is performed in a time-sharing serial manner, different numbers of external targets can be accommodated, but in actual processing the calculation unit implementing the RNN algorithm must calculate the information of one or two external targets at a time, in a certain order, until the information of all targets outside the target vehicle has been calculated; the information of multiple external targets therefore cannot be calculated simultaneously.

Therefore, existing automatic driving calculation schemes either cannot flexibly adapt to varying numbers of external targets, or cannot achieve high synchronism when they do adapt, which degrades prediction accuracy. How to balance dynamic adaptability and prediction accuracy in automatic driving has therefore become a main research direction for manufacturers.

Disclosure of Invention

The main purpose of the invention is to provide a synchronous calculation method, system, program product and medium for automatic driving, aiming to achieve high synchronism and prediction accuracy during processing by dynamically adapting to different numbers of external targets without additional resource cost.

In order to achieve the above object, the present invention provides a synchronous calculation method for automatic driving, the method comprising the steps of:

acquiring the target number of external targets around a target vehicle at the current moment and target data corresponding to each external target;

sending each target data to a corresponding calculation unit block according to the target number, and calculating the received target data through each calculation unit block to obtain a corresponding calculation result;

and determining a driving strategy of the target vehicle at the next moment according to the calculation result so as to enable the target vehicle to drive according to the driving strategy.

Preferably, the step of transmitting each of the target data to a corresponding calculation unit block according to the target number includes:

acquiring the total number of the computing unit blocks, and determining the computing unit blocks corresponding to the external targets according to the target number and the total number of the computing unit blocks;

and sending the corresponding target data of each external target to a corresponding computing unit block.

Preferably, the step of sending the target data corresponding to each external target to the corresponding computing unit block includes:

acquiring data dimensions corresponding to the target data, and classifying the target data according to the data dimensions to obtain classified dimension data;

and sending the corresponding dimension data of each external target to a corresponding computing unit block.

Preferably, the step of sending the dimension data corresponding to each external target to the corresponding computing unit block includes:

respectively obtaining the number of the computing unit blocks corresponding to the external targets;

if the number of the computing unit blocks corresponding to the external target is at least two, grouping the target dimension data corresponding to the external target, and distributing the grouped target dimension data to a plurality of computing unit blocks corresponding to the external target;

and if the number of the computing unit blocks corresponding to the external target is one, transmitting the corresponding target dimension data of the external target to the corresponding computing unit blocks.

Preferably, each of the computing unit blocks includes a computing matrix, each of the computing matrices includes a data column of a different data dimension, and the step of transmitting the corresponding dimension data of each of the external targets to the corresponding computing unit block includes:

if the number of the computing unit blocks corresponding to the external target is at least two, distributing the target dimension data after grouping processing to data columns with the same data dimension in the corresponding multiple computing unit blocks respectively;

or,

and if the number of the computing unit blocks corresponding to the external target is one, respectively sending the corresponding target dimension data of the external target to the data columns with the same data dimension in the corresponding computing unit blocks.

Preferably, the step of calculating the received target data by each calculation unit block to obtain a corresponding calculation result includes:

respectively obtaining weight parameters corresponding to each row in each calculation matrix;

and calculating the dimension data of each column in each calculation matrix according to the weight parameters corresponding to each row to obtain a corresponding calculation result.
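The row-weight and column-data calculation described above can be pictured with a minimal sketch. The text does not spell out the arithmetic, so the sketch assumes each calculation matrix performs a weighted sum (one slice of a matrix-vector product) and that partial results from blocks sharing one external target are added together; `block_compute`, `combine`, and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def block_compute(weights: Sequence[Sequence[float]],
                  data: Sequence[float]) -> List[float]:
    """One computing unit block (assumed behaviour): each row of weight
    parameters is applied to the dimension data held in the data columns."""
    return [sum(w * x for w, x in zip(row, data)) for row in weights]


def combine(partials: Sequence[Sequence[float]]) -> List[float]:
    """Add the partial results of blocks that share one external target."""
    return [sum(values) for values in zip(*partials)]


# Hypothetical 2-row weight matrix over 4 data dimensions.
W = [[1.0, 2.0, 3.0, 4.0],
     [0.5, 0.5, 0.5, 0.5]]
x = [1.0, 1.0, 2.0, 2.0]

# Case 1: a single block receives all four dimensions.
single = block_compute(W, x)

# Case 2: two blocks each receive two dimensions; data of a given
# dimension always meets the weight column of the same dimension.
part_a = block_compute([row[:2] for row in W], x[:2])
part_b = block_compute([row[2:] for row in W], x[2:])
split = combine([part_a, part_b])
```

Under these assumptions both cases give the same result, which is why spreading one external target over several blocks would not change the calculation result.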

In addition, to achieve the above object, the present invention also provides an automatic driving synchronization calculating apparatus, including:

the acquisition module is used for acquiring the target number of the external targets around the target vehicle at the current moment and the target data corresponding to each external target;

the calculation module is used for sending each target data to a corresponding calculation unit block according to the target number, and calculating the received target data through each calculation unit block respectively to obtain a corresponding calculation result;

and the running module is used for determining a running strategy of the target vehicle at the next moment according to the calculation result so as to enable the target vehicle to run according to the running strategy.

Furthermore, to achieve the above object, the present invention provides a program product comprising a computer program which, when executed by a processor, implements the steps of the synchronous calculation method for autopilot as described above.

In addition, to achieve the above object, the present invention also provides an automatic driving synchronization computing system, including: the system comprises a memory, a processor and an autopilot synchronous calculation program stored on the memory and capable of running on the processor, wherein the autopilot synchronous calculation program realizes the steps of the autopilot synchronous calculation method when being executed by the processor.

In addition, to achieve the above object, the present invention also provides a medium, preferably a computer-readable storage medium, on which an autopilot synchronous calculation program is stored, which when executed by a processor, implements the steps of the autopilot synchronous calculation method as described above.

According to the automatic driving synchronous calculation method, the target number of the external targets around the target vehicle at the current moment and the target data corresponding to the external targets are obtained; sending each target data to a corresponding calculation unit block according to the target number, and calculating the received target data through each calculation unit block to obtain a corresponding calculation result; and determining the driving strategy of the target vehicle at the next moment according to the calculation result so as to enable the target vehicle to drive according to the driving strategy. According to the method and the device, under the condition that no additional computing resource is added, the target data of all external targets are adaptively and synchronously computed according to the target number of the external dynamic change, so that the flexibility and the synchronism of the processing process are reflected, and the prediction precision in the automatic driving process can be improved.

Drawings

FIG. 1 is a schematic diagram of a system architecture of a hardware operating environment according to an embodiment of the present invention;

FIG. 2 is a flowchart of a first embodiment of an automatic driving synchronization calculation method according to the present invention;

FIG. 3 is a schematic block diagram of the RNN accelerator hardware according to the preferred embodiment of the autopilot synchronous computing method of the present invention;

FIG. 4 is a target distribution diagram for a target number of 8 external targets in the synchronous calculation method for automatic driving according to the present invention;

FIG. 5 is a target distribution diagram for a target number of 3 external targets in the synchronous calculation method for automatic driving according to the present invention;

FIG. 6 is a schematic diagram of the functional modules of a preferred embodiment of the synchronous calculation method for automatic driving according to the present invention.

The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.

Referring to fig. 1, fig. 1 is a schematic system architecture diagram of a hardware running environment according to an embodiment of the present invention.

The system of the embodiment of the invention can be a PC, a management server, a cloud server and the like.

As shown in fig. 1, the system may include a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables connection and communication between these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.

Those skilled in the art will appreciate that the system architecture shown in fig. 1 is not limiting of the system and may include more or fewer components than shown, or certain components may be combined, or a different arrangement of components.

As shown in fig. 1, an operating system, a network communication module, a user interface module, and a synchronous calculation program for autopilot may be included in the memory 1005 as one type of computer medium.

The operating system is a program that manages and controls the synchronous computing system for automatic driving and its software resources, and supports the operation of the network communication module, the user interface module, the synchronous calculation program for automatic driving, and other programs or software; the network communication module is used to manage and control the network interface 1004; the user interface module is used to manage and control the user interface 1003.

In the automated driving synchronization computing system shown in fig. 1, the automated driving synchronization computing system invokes an automated driving synchronization computing program stored in the memory 1005 through the processor 1001 and performs operations in various embodiments of the automated driving synchronization computing method described below.

Based on the hardware structure, the embodiment of the synchronous calculation method for the automatic driving is provided.

Referring to fig. 2, fig. 2 is a flowchart of a first embodiment of a synchronous calculation method for automatic driving according to the present invention, where the method includes:

step S10, obtaining the target number of external targets around the target vehicle at the current moment and target data corresponding to each external target;

The synchronous calculation method for automatic driving is applied to automatic driving scenarios; for convenience of description, the synchronous calculation system for automatic driving is abbreviated as the synchronous computing system. In this embodiment, the synchronous computing system includes RNN accelerator hardware, a schematic block diagram of which is shown in fig. 3. As the block diagram shows, the RNN accelerator hardware includes a control unit, first-level caches, second-level caches, calculation unit blocks, and other modules. The control unit adjusts the working mode of the accelerator hardware according to the target number of external targets of the target vehicle. The first-level cache is the RAM (Random Access Memory) portion in fig. 3, which may include a data input RAM, a parameter input RAM, and an output RAM: the data input RAM stores the target data of the external targets; the parameter input RAM stores the weight parameters corresponding to each row of calculation data of the calculation matrix; the output RAM stores the calculation results produced by the calculation matrix and is preferably a PRAM (Parallel Random Access Machine, parallel random access memory). Each calculation unit block independently calculates the target data it receives.

In addition, each first-level cache has a plurality of corresponding second-level caches. The contents of the first-level and second-level caches are copies (mappings) of frequently accessed data in memory, and each calculation unit block has its own parameter input second-level cache; the two cache levels are provided to reduce accesses by the high-speed CPU to slow memory. As shown in fig. 3, the data input RAM corresponds to a plurality of data input second-level caches xh and bias second-level caches bias; adding the bias makes the RNN network more flexible. For example, if the RNN network includes four gates, namely an input gate, a forget gate, a state gate, and an output gate, the parameter input RAM may be connected to the input gate second-level cache wi, the forget gate second-level cache wf, the state gate second-level cache wc, and the output gate second-level cache wo, respectively.
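For orientation, the four gates named above match the textbook LSTM cell, which can be written as follows. The text itself gives no equations; reading xh as the concatenation of the current input $x_t$ and the previous hidden state $h_{t-1}$, and pairing one bias vector with each of wi, wf, wc, and wo, are assumptions made only for this illustration.

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_i\,[x_t;\,h_{t-1}] + b_i\right) && \text{(input gate, weights } w_i\text{)}\\
f_t &= \sigma\!\left(W_f\,[x_t;\,h_{t-1}] + b_f\right) && \text{(forget gate, weights } w_f\text{)}\\
\tilde{c}_t &= \tanh\!\left(W_c\,[x_t;\,h_{t-1}] + b_c\right) && \text{(state gate, weights } w_c\text{)}\\
o_t &= \sigma\!\left(W_o\,[x_t;\,h_{t-1}] + b_o\right) && \text{(output gate, weights } w_o\text{)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

In this reading, every gate consumes the same input vector $[x_t; h_{t-1}]$, which would be consistent with a single data input cache xh feeding four separate parameter caches.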

Because current hardware implementations of the RNN algorithm use time-sharing processing, two problems arise. On the one hand, if the target information around the target vehicle is acquired independently, the target information at the decision moment cannot be guaranteed to be synchronized, which may reduce the accuracy and reliability of the prediction decision. On the other hand, if the target information is collected uniformly, the surrounding target information can be synchronized, but independent time-sharing calculation makes the hardware read the RNN parameters repeatedly, so that data in the calculation process is cached multiple times, increasing the power consumption and internal cache usage of the system.

In this embodiment, the target number of external targets and the target data corresponding to each external target may be determined by instruments installed in the target vehicle in advance, such as sensors, a laser radar, and cameras, or the external environment data may be acquired using V2X (vehicle-to-everything) wireless communication technology. The acquired data is then processed to obtain target data of a preset dimension, and the processed target data is sent to DDR (i.e., DDR SDRAM, Double Data Rate Synchronous Dynamic Random Access Memory) for storage. An external target may be a moving object, such as a running vehicle, or a static object, such as a stopped vehicle or a stationary obstacle. The target data may include the running speed, the running acceleration, the position relative to the target vehicle, the distance from the target vehicle, and the like. The target data stored in the DDR may be 128-dimensional or 64-dimensional data; the preset dimension may be set according to actual application requirements and is not limited here. In addition, the RAM portion shown in fig. 3 can exchange data with the external DDR, so the target number of external targets around the target vehicle at the current moment and the target data corresponding to each external target can also be obtained directly from the DDR.

Because an autonomous vehicle must monitor the road conditions around it in real time while driving, the monitoring range and the maximum number of external targets monitored simultaneously can be set according to the actual hardware conditions and application requirements. For convenience of description, take as an example RNN accelerator hardware that supports synchronous calculation of at most 8 external targets around the target vehicle, so that the target number ranges from 0 to 8. As shown in fig. 4, the autonomous target vehicle is represented by the shaded box in the middle, and the arrow indicates its driving direction at the current moment. In dense traffic, the 8 external targets surrounding the target vehicle need to be calculated synchronously, and their movement trajectories at the next moment predicted, in order to make the driving planning decision of the target vehicle at the next moment.

If there is no external target around the target vehicle at the current time, the synchronous computing system cannot acquire any target data, and the computing unit block is not required to synchronously compute the external target information. Thus, the embodiments of the present invention are mainly described with respect to the case where at least one external target exists around the target vehicle during the automatic driving, and when no other external target exists around the target vehicle, the driving of the target vehicle is controlled only according to the real-time road traffic condition.

Step S20, sending each target data to a corresponding calculation unit block according to the target number, and calculating the received target data through each calculation unit block respectively to obtain a corresponding calculation result;

In this embodiment, when the target data of each external target is sent to the corresponding calculation unit block according to the target number of external targets around the target vehicle at the current moment, the target data received by different calculation unit blocks may be different portions of the target data of the same external target, or may be the target data of different external targets. Each calculation unit block then performs calculation and prediction on the target data it receives to obtain a corresponding calculation result. Specifically, in each calculation cycle, the control unit selects the calculation mode of the calculation unit blocks according to the target number of external targets; for example, if the sensing devices in the target vehicle can identify up to 8 surrounding external targets, the calculation unit blocks support 9 different calculation modes, for 0 to 8 external targets. The selected calculation mode then determines how data is distributed from the data input first-level cache to the corresponding second-level caches during that cycle. According to the dynamically changing number of external targets around the target vehicle and the total number of calculation unit blocks, the correspondence between the target number and the calculation unit blocks is selected to determine the calculation unit blocks corresponding to each external target. Because the input target data is sent to all calculation unit blocks in the system while the optimal calculation mode is determined dynamically, the calculation efficiency and utilization of the calculation unit blocks improve, which in turn improves the prediction rate.

When the RNN network calculates different external targets, the calculation parameters and the calculation amounts are the same, so that the calculation process of the plurality of external targets can be finished at the same time when the calculation is started at the same time, thereby ensuring the synchronism of the processing process.

And step S30, determining a driving strategy of the target vehicle at the next moment according to the calculation result so as to enable the target vehicle to drive according to the driving strategy.

In this embodiment, the driving strategy of the target vehicle at the next moment may be determined from the calculation result, for example, that the target vehicle travels at a certain running speed with its heading shifted 30° to the left or right at the next moment, and the target vehicle then drives according to that strategy. Calculating the target data of the external targets in different calculation unit blocks reduces repeated cache accesses by the accelerator hardware for RNN parameters and intermediate data while maintaining prediction accuracy.

Further, based on the first embodiment of the synchronous calculation method for automatic driving of the present invention, a second embodiment of the synchronous calculation method for automatic driving of the present invention is provided.

The second embodiment of the synchronous calculation method of automatic driving is different from the first embodiment of the synchronous calculation method of automatic driving in that, further, the step of transmitting each of the target data to a corresponding calculation unit block according to the target number includes:

step a1, obtaining the total number of the calculation unit blocks;

and a2, transmitting each target data to the corresponding computing unit block according to the target number and the total number of the computing unit blocks.

In this embodiment, the total number of computing unit blocks in the synchronous computing system may be preset according to the actual hardware design, and the target number of external targets may be greater than, less than, or equal to the total number of computing unit blocks. The computing unit blocks corresponding to each external target may therefore be determined according to the target number of external targets and the total number of computing unit blocks, so that the corresponding target data is sent to the corresponding computing unit blocks. Specifically, the computing unit blocks may be numbered in advance; if the synchronous computing system includes 8 computing unit blocks, they may be numbered A1 to A8.

If the distribution of external targets around the target vehicle at a certain moment is as shown in fig. 5, the target number of external targets at that moment is 3, and the three external targets must be calculated synchronously to predict their motion trajectories at the next moment. If the external targets are numbered 1, 2, and 3, the computing unit blocks corresponding to each external target can be determined according to a preset allocation mode. For example, under a 2-3-3 allocation, the target data of external target 1 is sent to computing unit blocks A1 and A2, the target data of external target 2 is sent to computing unit blocks A3, A4, and A5, and the target data of external target 3 is sent to computing unit blocks A6, A7, and A8. The preset allocation mode used for each target number can be set according to actual application requirements and is not specifically limited here.
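One possible preset allocation mode can be sketched as follows. The policy shown (an equal share for every target, with leftover blocks going to the last targets) is an assumption chosen only because it reproduces the 2-3-3 example for 3 targets and 8 blocks; the function name `allocate_blocks` is hypothetical, and the case of more targets than blocks is left unhandled because the text does not describe it.

```python
from typing import Dict, List


def allocate_blocks(target_count: int,
                    total_blocks: int = 8) -> Dict[int, List[str]]:
    """Map external target numbers (1-based) to computing unit block labels.

    Assumed policy: each target gets total_blocks // target_count blocks,
    and the leftover blocks go one each to the last targets.
    """
    if target_count < 1 or target_count > total_blocks:
        # Behaviour outside this range is not specified in the text.
        raise ValueError("target_count must be between 1 and total_blocks")
    base, extra = divmod(total_blocks, target_count)
    allocation: Dict[int, List[str]] = {}
    next_block = 1
    for target in range(1, target_count + 1):
        # The last `extra` targets receive one additional block.
        size = base + (1 if target > target_count - extra else 0)
        allocation[target] = [f"A{i}"
                              for i in range(next_block, next_block + size)]
        next_block += size
    return allocation


plan = allocate_blocks(3)
# plan[1] holds A1-A2, plan[2] holds A3-A5, plan[3] holds A6-A8
```

With 8 targets the same function assigns exactly one block per target, and with a single target it assigns all 8 blocks to that target, so every block is used in each mode.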

Further, the step of sending the target data corresponding to each external target to the corresponding computing unit block includes:

step b1, obtaining data dimensions corresponding to the target data, and classifying the target data according to the data dimensions to obtain classified dimension data;

and b2, transmitting the corresponding dimension data of each external target to a corresponding calculation unit block.

In this embodiment, the total data dimensions of the target data of different external targets are generally the same; for example, each consists of data in dimensions 0, 1, 2, ..., n. The target data of each external target, however, contains data of different data dimensions, so it can be classified by data dimension. As shown in fig. 3, after the target data of each external target is obtained from the data input RAM, it is classified according to data dimension, that is, target data of the same data dimension is grouped into one class, so that each class holds data of a different data dimension. The resulting dimension data is stored in the data input secondary buffer of the corresponding dimension and is then sent from the data input secondary buffers to the corresponding computing unit blocks. For example, if the computing unit blocks corresponding to external target 1 are A1 and A2, the dimension data of external target 1 from dimension 0 to dimension a (a < n) is sent to computing unit block A1, and its dimension data from dimension (a+1) to dimension n is sent to computing unit block A2. The dimension data of all external targets of the target vehicle at that moment is sent to the corresponding computing unit blocks according to the same principle.
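The two-block case in the example can be illustrated with a short sketch. It assumes the dimension data of one external target is held as a list indexed by dimension (0 to n); the function name `split_by_dimension` and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def split_by_dimension(dimension_data: Sequence[float],
                       a: int) -> Tuple[List[float], List[float]]:
    """Split data indexed 0..n into dimensions 0..a and (a+1)..n.

    The first part would go to block A1 and the second to block A2.
    """
    n = len(dimension_data) - 1
    if not 0 <= a < n:
        raise ValueError("the split point must satisfy 0 <= a < n")
    return list(dimension_data[:a + 1]), list(dimension_data[a + 1:])


# Hypothetical 8-dimensional data (n = 7) split at a = 3.
data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
for_a1, for_a2 = split_by_dimension(data, 3)
```

Because a < n is enforced, neither block ever receives an empty share.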

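Further, the step of sending the dimension data corresponding to each external target to the corresponding computing unit block includes:

step c1, respectively obtaining the number of the computing unit blocks corresponding to the external targets;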

step c2, if the number of the computing unit blocks corresponding to the external target is at least two, grouping the target data corresponding to the external target, and distributing the grouped target data to a plurality of computing unit blocks corresponding to the external target;

and c3, if the number of the computing unit blocks corresponding to the external target is one, transmitting the corresponding target data of the external target to the corresponding computing unit blocks.

In this embodiment, to improve the utilization of the computing unit blocks and the prediction efficiency, whether the target data of an external target needs to be grouped during synchronous calculation may be determined according to the number of computing unit blocks allocated to that external target. If at least two computing unit blocks are allocated to an external target, its target data is grouped and the grouped target data is distributed to the corresponding computing unit blocks; if only one computing unit block is allocated to an external target, that is, each computing unit block calculates the target data of one external target, the target data of that external target may be sent directly to the computing unit block. For example, if the computing unit blocks corresponding to external target 1 are A1 and A2, its target data may be divided into two parts, distributed to A1 and A2 respectively; if the computing unit blocks corresponding to external target 2 are A3, A4, and A5, its target data may be divided into three parts, distributed to A3, A4, and A5 respectively; likewise, the computing unit blocks corresponding to external target 3 are A6, A7, and A8, so its target data may be divided into three parts, distributed to A6, A7, and A8 respectively.
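Steps c1 to c3 can be sketched as a single dispatch routine. The grouping rule used here (contiguous parts of near-equal length) is an assumption, since the text only says that the data is divided into as many parts as there are blocks; the function name `dispatch` is hypothetical.

```python
from typing import Dict, List, Sequence


def dispatch(target_data: Sequence[float],
             blocks: Sequence[str]) -> Dict[str, List[float]]:
    """Return the share of one target's data that each assigned block receives."""
    # Step c1: the number of blocks assigned to this external target.
    block_count = len(blocks)
    if block_count == 0:
        raise ValueError("at least one computing unit block is required")
    if block_count == 1:
        # Step c3: a single block receives the whole target data.
        return {blocks[0]: list(target_data)}
    # Step c2: group the data into one contiguous part per block
    # (assumed rule: near-equal parts, longer parts first).
    base, extra = divmod(len(target_data), block_count)
    shares: Dict[str, List[float]] = {}
    start = 0
    for index, block in enumerate(blocks):
        size = base + (1 if index < extra else 0)
        shares[block] = list(target_data[start:start + size])
        start += size
    return shares


# External target 2 from the example, with hypothetical 6-dimensional data.
shares = dispatch([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], ["A3", "A4", "A5"])
```

Keeping the single-block case as a separate branch mirrors the two alternatives in the text and avoids any copying or slicing logic when no grouping is needed.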

When at least two computing unit blocks are allocated to an external target, grouping the target data allows different portions of it to be computed simultaneously on several computing unit blocks, which speeds up the computation for that external target and thus helps raise the prediction rate of the synchronous computing system. It can be understood that, in an automatic driving scenario, a larger number of external targets indicates that the target vehicle is currently in a dense driving environment. For example, when the target vehicle is surrounded by 8 other vehicles, traffic is congested and the road condition is poor; the target vehicle then travels relatively slowly, and calculation results are also obtained more slowly because each external target is served by fewer computing unit blocks, so the calculation rate of this scheme matches the driving speed. When there are few or no external targets, the road condition is good and the target vehicle travels fast; calculation results are then obtained quickly as well, which again matches the driving speed.

In the synchronous calculation method for automatic driving of this embodiment, the target data of each external target is classified by data dimension to obtain classified dimension data, which improves the calculation rate of the subsequent computing unit blocks; the computing unit blocks corresponding to each external target are determined from the total number of computing unit blocks and the target number of external targets, which improves the utilization of the computing unit blocks; in addition, whether the target data needs to be grouped is determined from the number of computing unit blocks allocated to each external target, which improves the prediction rate while also improving the utilization of the computing unit blocks.

Further, based on the first and second embodiments of the synchronous calculation method for autopilot of the present invention, a third embodiment of the synchronous calculation method for autopilot of the present invention is provided.

The third embodiment of the synchronous calculation method for autopilot differs from the first and second embodiments of the synchronous calculation method for autopilot in that each of the calculation unit blocks includes a calculation matrix, each of the calculation matrices includes a data column of different data dimensions, and the step of transmitting the corresponding dimension data of each of the external targets to the corresponding calculation unit block includes:

step d, if the number of the computing unit blocks corresponding to the external target is at least two, distributing the target dimension data after grouping processing to the data columns with the same data dimension in the corresponding multiple computing unit blocks respectively;

or,

step e, if the number of computing unit blocks corresponding to the external target is one, sending the target dimension data of the external target to the data columns with the same data dimension in the corresponding computing unit block.

In this embodiment, each calculation unit block includes a calculation matrix, an accumulation and output buffer, and the like, and each column of a calculation matrix holds dimension data of a single dimension. Therefore, as shown in fig. 3, when a calculation unit block receives its dimension data, each item is written into the data column whose data dimension matches. The calculation of each calculation matrix is carried out by an array of multipliers with M rows and N columns, where M is determined by the number of gate structures in the RNN network and N is determined by the data dimension of the calculation unit block. The accumulation and output buffer receives the calculation result of each row of the calculation matrix and accumulates and stacks the results of the M rows to obtain a prediction result.
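The column-matching write can be illustrated with a small sketch. The class name `CalculationMatrixColumns` and the dimension names `position_x`, `position_y` and `speed` are hypothetical and chosen only for illustration; the text does not name the data dimensions in this passage.

```python
from typing import Dict, List


class CalculationMatrixColumns:
    """Hypothetical model of the data columns of one calculation matrix."""

    def __init__(self, dimensions: List[str]) -> None:
        # One data column per data dimension (N columns in total).
        self.columns: Dict[str, List[float]] = {name: [] for name in dimensions}

    def write(self, dimension_data: Dict[str, float]) -> None:
        """Write each received value into the column with the matching dimension."""
        unknown = set(dimension_data) - set(self.columns)
        if unknown:
            # Reject the whole write before touching any column.
            raise KeyError(f"no data column for dimensions {sorted(unknown)}")
        for name, value in dimension_data.items():
            self.columns[name].append(value)


matrix = CalculationMatrixColumns(["position_x", "position_y", "speed"])
# The order of the incoming items does not matter; routing is by dimension name.
matrix.write({"speed": 12.5, "position_x": 3.0, "position_y": -1.0})
matrix.write({"position_x": 3.4, "position_y": -0.8, "speed": 12.1})
```

Keeping one column per dimension means every column holds values of a single kind, which is the property the calculation matrix relies on.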

Further, the step of calculating the received target data by each calculation unit block to obtain a corresponding calculation result includes:

step f1, respectively obtaining weight parameters corresponding to each row in each calculation matrix;

step f2, calculating the dimension data of each column in each calculation matrix according to the weight parameters corresponding to each row to obtain a corresponding calculation result.

In this embodiment, the RNN network preferably has four gate structures, that is, M = 4. As shown in fig. 3, each gate corresponds to one row of multipliers: the input gate corresponds to the first row of the calculation matrix, the forget gate to the second row, the state gate to the third row, and the output gate to the fourth row, and the weight parameters corresponding to the input gate, forget gate, state gate and output gate are wi, wf, wc and wo respectively. In each calculation matrix, the dimension data of the different data dimensions in each row is calculated with the weight parameter of that row, that is, the dimension data of the first row with wi, the dimension data of the second row with wf, the dimension data of the third row with wc, and the dimension data of the fourth row with wo, to obtain the corresponding calculation result. Each calculation unit block consists of 4 rows and a plurality of columns of multipliers. The 4 rows of multipliers share the data provided by the data input secondary buffer, but each row uses its own weight parameters; the products of each row are then accumulated stage by stage, and the resulting calculation result is output to the output buffer.
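A rough software analogue of one calculation unit block is given below. It is a sketch under stated assumptions: the function name `compute_block` and the gate labels are hypothetical, and only the multiply-and-accumulate of each row is modelled; the biases, recurrent state and activation functions of a full RNN cell are not described in this passage and are omitted.

```python
from typing import Dict, List

# M = 4 rows, one per gate structure; the keys are hypothetical labels for wi, wf, wc, wo.
GATE_ROWS = ["input", "forget", "state", "output"]


def compute_block(dimension_data: List[float],
                  weights: Dict[str, List[float]]) -> Dict[str, float]:
    """Multiply the shared input by each row's own weights and accumulate per row."""
    results: Dict[str, float] = {}
    for gate in GATE_ROWS:
        row_weights = weights[gate]
        if len(row_weights) != len(dimension_data):
            raise ValueError(f"row {gate!r} needs one weight per data column")
        # N multipliers work on the same input data; their products are accumulated.
        results[gate] = sum(w * x for w, x in zip(row_weights, dimension_data))
    return results


shared_input = [1.0, 2.0, 3.0]          # data shared by all four rows (N = 3)
weights = {
    "input":  [1.0, 0.0, 0.0],          # wi
    "forget": [0.0, 1.0, 0.0],          # wf
    "state":  [0.0, 0.0, 1.0],          # wc
    "output": [1.0, 1.0, 1.0],          # wo
}
row_results = compute_block(shared_input, weights)
```

The four rows read the same `shared_input` but use separate weight lists, mirroring the shared data input buffer and per-row weight parameters described above.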

It should be noted that the computing unit blocks are connected in parallel, and each computing unit block has its own weight parameter secondary buffer; the weight parameters held in these buffers may be the same or different. For example, suppose there are 8 external targets and 8 computing unit blocks. When the synchronous computing system needs to process 8 independent channels of target data, the calculation process of every channel is the same, and the 8 weight parameter secondary buffers hold the same weight parameters. When the target number of external targets is 1, that is, only 1 channel of data needs to be processed, the 8 computing unit blocks calculate different portions of the target data of that external target.
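How the blocks might be shared out as the target number changes can be sketched as follows. The allocation rule here is an assumption: this passage does not state one, so the hypothetical function `allocate_blocks` simply divides the blocks as evenly as possible and gives any surplus to the last targets, which reproduces the examples in the description (8 targets get one block each, 1 target gets all 8, and 3 targets get A1 to A2, A3 to A5 and A6 to A8).

```python
from typing import List, Sequence


def allocate_blocks(block_ids: Sequence[str], num_targets: int) -> List[List[str]]:
    """Assign contiguous groups of computing unit blocks to each external target.

    Assumed rule: near-equal shares, with surplus blocks going to the last targets.
    """
    if num_targets < 0:
        raise ValueError("num_targets must not be negative")
    if num_targets == 0:
        return []  # no external targets, nothing to allocate
    if num_targets > len(block_ids):
        # Assumption: this case is not covered by the passage, so it is rejected here.
        raise ValueError("more external targets than computing unit blocks")
    base, extra = divmod(len(block_ids), num_targets)
    allocation: List[List[str]] = []
    start = 0
    for target in range(num_targets):
        size = base + (1 if target >= num_targets - extra else 0)
        allocation.append(list(block_ids[start:start + size]))
        start += size
    return allocation


blocks = [f"A{i}" for i in range(1, 9)]   # 8 computing unit blocks, A1 to A8
three_targets = allocate_blocks(blocks, 3)
```

Because every block is always assigned to some target, no block sits idle regardless of how many external targets are present, which is the utilization property the method aims for.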

The automatic driving synchronous calculation method can synchronously process different target data with external dynamic changes, and can share various parameters of the RNN network in the calculation process, so that the repeated access of various parameters when RNN acceleration hardware successively calculates the target data of different targets is avoided, and the bandwidth and the power consumption of a synchronous calculation system are saved.

The invention also provides a synchronous calculation device for automatic driving. Referring to fig. 6, the synchronous calculating device for automatic driving of the present invention includes:

an obtaining module 10, configured to obtain a target number of external targets around a target vehicle at a current moment, and target data corresponding to each of the external targets;

the calculating module 20 is configured to send each target data to a corresponding calculating unit block according to the target number, and calculate, by using each calculating unit block, the received target data to obtain a corresponding calculation result;

and the driving module 30 is used for determining a driving strategy of the target vehicle at the next moment according to the calculation result so as to drive the target vehicle according to the driving strategy.

Preferably, the computing module is configured to:

Preferably, each of the calculation unit blocks includes a calculation matrix, each of the calculation matrices includes a data column of a different data dimension, and the calculation module is configured to:

or,

Preferably, the driving module is configured to:

The invention also provides a program product comprising a computer program which, when executed by a processor, implements the steps of the synchronous calculation method of autopilot as described above.

The invention also provides a medium.

The inventive medium is preferably a computer readable storage medium having stored thereon an autopilot synchronization calculation program which when executed by a processor implements the steps of the autopilot synchronization calculation method as described above.

Embodiments of the automatic driving synchronization computing system, the program product and the medium of the present invention may refer to embodiments of the automatic driving synchronization computing method of the present invention, and are not described herein.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.

From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, including several instructions for causing an end system (which may be a mobile phone, a computer, a server, an air conditioner, or a network system, etc.) to perform the method according to the embodiments of the present invention.

The foregoing description is only of the preferred embodiments of the present invention, and is not intended to limit the scope of the invention, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein, or any application, directly or indirectly, in the field of other related technology.

Claims

1. A synchronous computing method for autopilot, characterized by being applied to a synchronous computing system for autopilot, the synchronous computing system for autopilot comprising a plurality of computing unit blocks, the method comprising the steps of:

sending each piece of target data to a corresponding calculation unit block according to the target number, which comprises: obtaining the data dimension corresponding to each piece of target data, and classifying each piece of target data according to the data dimension to obtain classified dimension data; and transmitting the corresponding dimension data of each external target to a corresponding calculation unit block;

the step of sending the corresponding dimension data of each external target to the corresponding computing unit block comprises the following steps:

if the number of the computing unit blocks corresponding to the external target is one, transmitting the corresponding target dimension data of the external target to the corresponding computing unit blocks;

calculating the received target data through each calculation unit block to obtain a corresponding calculation result, which comprises: respectively obtaining the weight parameters corresponding to each row in each calculation matrix; and calculating the dimension data of each column in each calculation matrix according to the weight parameters corresponding to each row to obtain a corresponding calculation result;

determining a driving strategy of the target vehicle at the next moment according to the calculation result so as to enable the target vehicle to drive according to the driving strategy;

each calculation unit block comprises a calculation matrix, an accumulation and output buffer module, the data of each column in each calculation matrix is dimension data with the same dimension, and when the calculation unit block receives the corresponding dimension data, the received dimension data are respectively written into the data columns with the matched data dimensions; the accumulation and output buffer is used for receiving the calculation result of each row of the calculation matrix and carrying out accumulation and stacking operation on the calculation result to obtain a prediction result.

2. The synchronous calculation method of automatic driving according to claim 1, wherein the step of transmitting each of the target data to a corresponding calculation unit block according to the target number includes:

3. The method for synchronously calculating the autopilot of claim 1 wherein each of said calculation unit blocks includes a calculation matrix, each of said calculation matrices including a data column of a different data dimension, said step of transmitting respective dimension data of each of said external targets to a corresponding calculation unit block includes:

or,

4. An autopilot synchronous computing device, the autopilot synchronous computing device comprising:

the acquisition module is used for acquiring external environment information around the target vehicle at the current moment and determining the target number of external targets around the target vehicle according to the external environment information;

the computing module is used for extracting target data corresponding to each external target from the external environment information, sending each target data to a corresponding computing unit block according to the target number, acquiring data dimensions corresponding to each target data, classifying each target data according to the data dimensions and obtaining classified dimension data; transmitting corresponding dimension data of each external target to a corresponding calculation unit block; the method is also used for respectively acquiring the number of the computing unit blocks corresponding to each external target; if the number of the computing unit blocks corresponding to the external target is at least two, grouping the target dimension data corresponding to the external target, and distributing the grouped target dimension data to a plurality of computing unit blocks corresponding to the external target; if the number of the computing unit blocks corresponding to the external target is one, transmitting the corresponding target dimension data of the external target to the corresponding computing unit blocks; the method is also used for respectively acquiring weight parameters corresponding to each row in each calculation matrix; calculating dimension data of each column in each calculation matrix according to the weight parameters corresponding to each row to obtain a corresponding calculation result;

the running module is used for respectively calculating the received target data through each calculation unit block to obtain a corresponding calculation result, and determining a running strategy of the target vehicle at the next moment according to the calculation result so as to enable the target vehicle to run according to the running strategy;

5. An automated driving synchronous computing system, the automated driving synchronous computing system comprising: memory, a processor and an autopilot synchronization computing program stored on the memory and executable on the processor, which autopilot synchronization computing program when executed by the processor implements the steps of the autopilot synchronization computing method of any one of claims 1 to 3.

6. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon an autopilot synchronization calculation program, which when executed by a processor, implements the steps of the autopilot synchronization calculation method of any one of claims 1 to 3.