CN114297576A - Weighted average calculation method and weighted average calculation device - Google Patents


Info

Publication number
CN114297576A
Authority
CN
China
Prior art keywords
weighted average
average calculation
chip
multiply
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111355490.3A
Other languages
Chinese (zh)
Inventor
江鹏
蒲宇
王彤
寇博华
陆启乐
王洁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou C Sky Microsystems Co Ltd
Original Assignee
Pingtouge Shanghai Semiconductor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Pingtouge Shanghai Semiconductor Co Ltd
Priority to CN202111355490.3A
Publication of CN114297576A
Current legal status: Pending

Landscapes

  • Microcomputers (AREA)

Abstract

Provided are a weighted average calculation method and a weighted average calculation apparatus. The apparatus includes: a normalization module configured to scale a plurality of weights so that they sum to 2^N, to obtain a plurality of weight normalization values; a multiply-accumulator configured to perform multiply-accumulate calculation on a plurality of operands and the plurality of weight normalization values; and a first register configured to store intermediate and final result data of the multiply-accumulator, wherein the bits of the final result data remaining after the lower N bits are removed are used as the weighted average calculation result of the plurality of weights and the plurality of operands, N being an integer greater than or equal to 1. Since the weight normalization values, once determined, remain unchanged while the hardware circuit executes, the hardware circuit can consist mainly of a multiply-accumulator; compared with prior-art hardware circuits, the adder and the divider are eliminated, which reduces both the cost and the size of the hardware circuit.

Description

Weighted average calculation method and weighted average calculation device
Technical Field
The present disclosure relates to the field of chips, and in particular, to a weighted average calculation method and a weighted average calculation apparatus.
Background
As integrated circuit process nodes continue to shrink, controlling chip cost and power consumption has become an important goal for major chip manufacturers, and Adaptive Voltage Frequency Scaling (AVFS) has become a standard feature in the industry for chips such as high-end central processing units (CPUs), graphics processing units (GPUs), and mobile phone processors. AVFS is a power-reduction technique in which process sensors acquire detection data in real time, and the clock frequency and supply voltage of the chip are then calculated and adjusted according to that data.
Generally, a plurality of process sensors are arranged in a chain structure at multiple positions in a chip to obtain detection data at those positions, and the detection data are then combined to obtain a process deviation, which is used to calculate the clock frequency and supply voltage of the chip. In the prior art, one option is to obtain the process deviation by a weighted average calculation; in that case, however, the efficiency of the weighted average calculation becomes an important factor limiting how quickly the clock frequency and supply voltage of the chip can be adjusted.
Disclosure of Invention
In view of the above, an object of the present disclosure is to provide a weighted average calculation method and a weighted average calculation apparatus, so as to improve the calculation efficiency of the weighted average calculation.
In a first aspect, an embodiment of the present disclosure provides a weighted average calculation apparatus, including:
a normalization module for scaling a plurality of weights so that they sum to 2^N, to obtain a plurality of weight normalization values;
a multiply-accumulate device for performing multiply-accumulate calculation on a plurality of operands and the plurality of weight normalization values;
a first register for storing intermediate and final result data of the multiply-accumulator,
wherein the bits of the final result data remaining after the lower N bits are removed are used as the weighted average calculation result of the plurality of weights and the plurality of operands, N being an integer greater than or equal to 1.
In some embodiments, the weighted average calculation apparatus further comprises a second register coupled to the first register and configured to receive the bits remaining after the lower N bits of the final result data are removed.
In some embodiments, the multiply-accumulator comprises three inputs and an output, a first one of the three inputs receives the plurality of operands one by one, a second one of the three inputs receives the plurality of weight normalization values one by one, and a third one of the three inputs is coupled to the output of the multiply-accumulator to receive intermediate result data.
In a second aspect, an embodiment of the present disclosure provides a weighted average calculation method, including:
scaling a plurality of weights so that they sum to 2^N, to obtain a plurality of weight normalization values;
calculating a multiply-accumulate sum of the plurality of weight normalization values and a plurality of operands; and
taking the bits remaining after the lower N bits of the multiply-accumulate sum are removed as the weighted average calculation result of the plurality of weights and the plurality of operands,
wherein N is an integer greater than or equal to 1, and at least the step of calculating the sum of the multiply-accumulate between the plurality of weight normalization values and the plurality of operands is performed by a hardware circuit.
In some embodiments, scaling the plurality of weights so that they sum to 2^N to obtain the plurality of weight normalization values includes:
calculating the proportion of the sum of the plurality of weights that each weight represents; and
multiplying the proportion corresponding to each weight by 2^N, and taking the product as the weight normalization value of that weight.
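The two sub-steps above can be illustrated with a short Python sketch. It is an illustration under stated assumptions rather than the disclosed implementation: the function name `normalize_weights` is hypothetical, the weights are assumed to be non-negative integers, and products that are not whole numbers are assumed to be rounded to the nearest integer, a rule the text does not specify.

```python
def normalize_weights(weights: list[int], n: int) -> list[int]:
    """Scale weights so that they sum to (approximately) 2**n.

    Each weight's proportion of the total is multiplied by 2**n. Rounding to
    the nearest integer (halves rounded up) is an assumption of this sketch.
    """
    if n < 1:
        raise ValueError("N must be an integer >= 1")
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    scale = 1 << n  # 2**N
    # Integer form of round(w * scale / total), avoiding floating point.
    return [(2 * w * scale + total) // (2 * total) for w in weights]


m = normalize_weights([1, 4, 5, 6], n=7)
print(m, sum(m))  # [8, 32, 40, 48] 128
```

Because of the rounding, the normalization values sum to exactly 2^N only when the weight total divides 2^N evenly, as it does for the weights 1, 4, 5, 6; for example, three equal weights with N = 1 round to 1, 1, 1, which sum to 3 rather than 2.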
In some embodiments, N is determined according to the accuracy requirements of the weight calculation.
In a third aspect, embodiments of the present disclosure provide a process sensor, including:
at least one oscillation ring, each formed by connecting a plurality of inverters of the same type end to end into a ring and coupled to a counting unit, the counting unit being configured to count the number of inversions of the plurality of inverters within a fixed time and to output the number of inversions of the corresponding oscillation ring per unit time as a count value;
a sensor controller for performing weighted average calculation on at least one count value output by the at least one oscillation ring and outputting a weighted calculation result,
wherein the sensor controller performs the weighted average calculation using any one of the weighted average calculation means described above.
In a fourth aspect, an embodiment of the present disclosure provides a power consumption control system used in a system on chip, including:
a plurality of sensors forming a unidirectional sensor chain by being connected in series;
a power consumption controller coupled to the head and tail of the unidirectional sensor chain, including:
a chain controller for collecting detection data from each of the unidirectional sensor chains and performing a weighted average calculation on the detection data of the plurality of sensors to determine a process variation of the system-on-chip in a current environment;
a voltage and frequency calculation unit for determining a target frequency and/or a target voltage to which the system on chip is to be adjusted according to the process deviation;
wherein the chain controller performs the weighted average calculation using any one of the weighted average calculation means described above.
In a fifth aspect, an embodiment of the present disclosure provides a power consumption control method, which is applied to a power consumption controller in a system on chip, where the power consumption control method includes:
collecting detection data from each sensor of a unidirectional sensor chain, and performing weighted average calculation on a plurality of detection data to obtain the process deviation of the system on chip under the current environment, wherein the unidirectional sensor chain is formed by connecting a plurality of sensors arranged in the system on chip in series;
determining a target frequency and/or a target voltage to which the system on chip is to be adjusted according to the process deviation;
wherein the weighted average calculation comprises the operations of:
scaling a plurality of weights so that they sum to 2^N, to obtain a plurality of weight normalization values, wherein the weights correspond one-to-one to the sensors;
calculating a multiply-accumulate sum of the plurality of weight normalization values and a plurality of operands; and
taking the bits remaining after the lower N bits of the multiply-accumulate sum are removed as the weighted average calculation result of the weights and the operands, wherein N is an integer greater than or equal to 1.
In some embodiments, the detection data of each sensor is the result of a weighted average calculation, performed as described above, on the count values of a plurality of oscillation rings provided inside the sensor.
In a sixth aspect, an embodiment of the present disclosure provides a system on a chip, including:
a processing unit;
the power consumption control system of any of the above;
an on-chip bus for coupling the processing unit and the power consumption controller.
In a seventh aspect, an embodiment of the present disclosure provides a computing apparatus, including:
the system-on-chip of any of the above as a processor;
According to the weighted average calculation method provided by the embodiments of the present disclosure, once the weight normalization values are determined, they remain essentially unchanged for the hardware circuit, whose work consists mainly of multiply-accumulate operations. Compared with the prior-art method, the hardware circuit no longer needs to add the weights or perform a division, so the overall efficiency of the weighted average calculation can be improved.
Similarly, according to the weighted average calculation apparatus provided by the embodiments of the present disclosure, the weight normalization values are determined once and remain unchanged while the hardware circuit executes, so the hardware circuit may consist mainly of a multiply-accumulator. Compared with prior-art hardware circuits, the adder and the divider are eliminated, which reduces the overall cost and size of the hardware circuit.
Further, in the context of AVFS voltage and frequency adjustment of a system-on-chip, weighted average calculations are involved in several operations, so improving their efficiency helps to improve the response speed and sensitivity of AVFS frequency and voltage adjustment.
Drawings
The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which refers to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a system-on-chip including a single sensor chain;
FIG. 2 is a block diagram of an exemplary process sensor of FIG. 1;
FIG. 3 is a flow chart of a weighted average calculation method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an exemplary weighted average calculation apparatus provided in the embodiments of the present disclosure;
FIG. 5 is a block diagram of a general-purpose computer system to which embodiments of the present disclosure are applied;
FIG. 6 is a block diagram of an embedded system to which an embodiment of the present disclosure is applied.
Detailed Description
The present disclosure is described below based on examples, but the present disclosure is not limited to these examples. In the following detailed description, some specific details are set forth. It will be apparent to those skilled in the art that the present disclosure may be practiced without these specific details. Well-known methods, procedures, and flows have not been described in detail so as not to obscure the present disclosure. The figures are not necessarily drawn to scale.
System on chip
FIG. 1 is a block diagram of a system-on-chip 100 including a single sensor chain. Referring to the figure, a processing unit 101, a high-speed memory 103, and a cache 104 are coupled to an on-chip bus 102. The on-chip bus 102 is, for example, an AXI bus. The AXI bus is the most important part of the AMBA (Advanced Microcontroller Bus Architecture) 3.0 and later protocols proposed by ARM, and is an on-chip bus oriented toward high performance, high bandwidth, and low latency. The AXI bus separates the address/control and data phases, supports unaligned data transfers, and supports both burst and out-of-order transfers, thereby meeting the requirements of ultra-high-performance and complex system-on-chip designs. The processing unit 101 may be any processing unit with a different circuit configuration, such as a microprocessor, a microcontroller, a digital signal processor (DSP), a processor core, a graphics processing unit (GPU), or a neural network processing unit. Compared with the high-speed memory 103 provided in the system-on-chip, off-chip memory can have a larger capacity but is slower and cheaper. In some implementations, the high-speed memories 103 and 104 may be static random access memories (SRAMs), while the off-chip memories are DRAM (dynamic random access memory) and flash memory.
The system-on-chip 100 also includes an AVFS controller 105 coupled to the on-chip bus 102. The AVFS controller 105 is further coupled to a clock management unit 106 and to an off-chip power management circuit 211. The system-on-chip 100 also includes a plurality of sensors ps1 to ps6 coupled to a chain controller 1051 in the AVFS controller 105. In some embodiments, the power management circuit 211 may instead be disposed inside the system-on-chip 100.
The system-on-chip 100 further comprises interface circuitry, not shown, for coupling the system-on-chip 100 with external devices off-chip. The external devices may be, for example, text, audio and video input/output devices and various memories. The processing unit 101 may access off-chip external devices through the interface circuit.
The system-on-chip 100 may also embed basic software, such as an operating system, and applications (not shown), which are special-purpose programs. Other applications may be stored in memory external to the system-on-chip 100 and copied into the high-speed memory 104 of the system-on-chip 100 through the interface circuitry, or otherwise accessed from resources in the system-on-chip 100 through the interface circuitry.
When the system-on-chip 100 is in operation, the processing unit 101 reads instructions from the cache 104, decodes them, and executes them. Meanwhile, the processing unit 101 may send a frequency and voltage regulation instruction to the AVFS controller 105 via the on-chip bus 102, and the AVFS controller 105 may perform frequency and voltage regulation either according to that instruction or adaptively. To perform the regulation, the AVFS controller 105 generates a frequency modulation signal REGF and a voltage regulation signal REGV, and sends REGF to the clock management unit 106 and REGV to the power management circuit 211; the clock management unit 106 then adjusts the clock frequency FCPU, and the power management circuit 211 adjusts the supply voltage VCPU provided to the system-on-chip 100.
The AVFS controller 105 includes a chain controller 1051 and a frequency-voltage calculation unit 1052. The chain controller 1051 sends a read-data command to the sensors ps1 to ps6; each sensor acquires its detection data upon receiving the command and transmits the data to the chain controller 1051. In some embodiments, the chain controller 1051 transmits the detection data directly to the frequency-voltage calculation unit 1052, which calculates the process deviation under the current environment from the detection data, determines the target frequency and/or target voltage to which the system-on-chip is to be adjusted according to the process deviation, and conveys the target frequency and/or target voltage via the frequency modulation signal REGF and the voltage regulation signal REGV.
Various algorithms can be used to calculate the process deviation under the current environment according to the detection data. For example, as described in the background, a weighted average calculation is used to obtain the process variation under the current environment from the detected data at a plurality of locations. The corresponding equation (1) is as follows.
$$\text{process deviation} = \frac{\mathrm{sum}}{\mathrm{weight}} = \frac{\sum_{i=0}^{5} w_i \cdot \mathrm{sensor}_i}{\sum_{i=0}^{5} w_i} \tag{1}$$
where sensor0 to sensor5 are the detection data returned from the sensors ps1 to ps6, w0 to w5 are the weights corresponding to ps1 to ps6 respectively, weight is the sum of the weights, and sum is the multiply-accumulate sum of the weights and the detection data.
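For comparison with the method described later, equation (1) can be evaluated the conventional way, as in the Python sketch below. It is illustrative only: the function name is hypothetical, the detection data and weights are assumed to be non-negative integers, and discarding the remainder of the final division is an assumed rounding rule. The sketch shows the three operations such a datapath needs: a multiply-accumulate for sum, an addition for weight, and a division.

```python
def weighted_average_conventional(data: list[int], weights: list[int]) -> int:
    """Evaluate equation (1), sum / weight, with integer division."""
    if len(data) != len(weights):
        raise ValueError("exactly one weight per detection value is required")
    acc = 0     # "sum": multiply-accumulate of weights and detection data
    weight = 0  # "weight": running total of the weights (needs an adder)
    for w, d in zip(weights, data):
        acc += w * d
        weight += w
    if weight == 0:
        raise ValueError("the weights must not sum to zero")
    return acc // weight  # needs a divider; the remainder is discarded


# Hypothetical detection data from ps1 to ps6 and their weights w0 to w5.
print(weighted_average_conventional([100, 200, 300, 400, 500, 600],
                                    [1, 2, 3, 4, 5, 6]))  # 433
```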
When performing the weighted average calculation on the detection data of a plurality of process sensors, a weight needs to be configured for each process sensor. In some cases, the weight of each process sensor is determined by its position in the unidirectional sensor chain.
FIG. 2 is a block diagram of an exemplary process sensor ps of FIG. 1. As shown, the process sensor ps includes an LVT (low threshold voltage) oscillation ring 201, an RVT (regular threshold voltage) oscillation ring 202, an HVT (high threshold voltage) oscillation ring 203, a counting unit 205 coupled to the LVT oscillation ring 201, a counting unit 206 coupled to the RVT oscillation ring 202, and a counting unit 207 coupled to the HVT oscillation ring 203.
The LVT oscillation ring 201 is a loop formed by connecting an odd number of LVT inverters. The RVT oscillation ring 202 is a loop formed by an odd number of RVT inverters. The HVT oscillation ring 203 is a loop formed by an odd number of HVT inverters. The inverters of the same type may be connected into a loop under the control of a switch control signal. Using an odd number of inverters ensures that a signal returns inverted after one pass around the ring: for example, a 0 entering the LVT oscillation ring 201 returns as a 1 after passing through five LVT inverters, and the time it takes to return equals the sum of the delays of those inverters, which produces a periodic oscillation.
The counting units 205 to 207 each calculate the count value per unit time, under the current conditions, of the oscillation ring coupled to them. Specifically, the LVT, RVT, and HVT inverters are built from inverter cells in the LVT, RVT, and HVT standard cell libraries, respectively. By design, the standard cell library provides timing parameters for each inverter type, including the inverter delay at different temperatures, process deviations, and voltages. The delay of each of the three inverter types under the current conditions is determined from these parameters; the product of the inverter delay and the number of inverters is then the oscillation period of the ring, and the count value within the fixed time is divided by the oscillation period of the ring to obtain the count value of each oscillation ring per unit time under the current conditions.
In comparison, the LVT inverter has the highest speed and the lowest delay, but also the highest power consumption. The HVT inverter has the lowest speed and the largest delay, but also the lowest power consumption. The speed, delay, and power consumption of the RVT inverter lie in between. The process sensor can therefore also be constructed based on these differences in speed, delay, and power consumption.
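The relationship between inverter delay, ring length, and count value described above can be illustrated with an idealized Python model. It is a simplification under assumptions the text does not state: every stage in a ring has the same delay, the counter increments once per inversion (that is, once per pass of the signal around the ring), and the function name and delay values are hypothetical rather than taken from any standard cell library.

```python
def ring_inversion_count(stages: int, stage_delay_ps: int, window_ps: int) -> int:
    """Idealized number of inversions of a ring oscillator within a fixed window.

    One pass around the ring takes stages * stage_delay_ps, after which the
    signal returns inverted, so that is the time between successive inversions.
    """
    if stages < 1 or stages % 2 == 0:
        raise ValueError("the ring needs an odd number of inverters")
    pass_time_ps = stages * stage_delay_ps
    return window_ps // pass_time_ps  # only complete passes are counted


# Hypothetical 5-stage rings observed for 1 microsecond (1,000,000 ps).
print(ring_inversion_count(5, 20, 1_000_000))  # 10000 (faster inverters)
print(ring_inversion_count(5, 25, 1_000_000))  # 8000  (slower inverters)
```

Because the count falls as the inverter delay rises, the count values of the three rings reflect how fast each inverter type is under the current temperature, process, and voltage.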
As an embodiment, the sensor controller (not shown) of each process sensor, or the chain controller (such as the chain controller 1051 of FIG. 1), can perform a weighted average calculation on the count values of the oscillation rings (the oscillation rings 201, 202, and 203 of FIG. 2) to obtain the detection data of each process sensor:
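$$\text{detection data} = \frac{w_0 \cdot \mathrm{counter}_0 + w_1 \cdot \mathrm{counter}_1 + w_2 \cdot \mathrm{counter}_2}{w_0 + w_1 + w_2} \tag{2}$$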
where the LVT oscillation ring 201 has a weight w0 and a count value counter0, the RVT oscillation ring 202 has a weight w1 and a count value counter1, and the HVT oscillation ring 203 has a weight w2 and a count value counter2.
It should be understood that when performing the weighted average calculation of the count values of the oscillation rings in a certain process sensor, a weight needs to be configured for each oscillation ring, and in some cases, the type of the inverter constituting the oscillation ring determines the weight of each oscillation ring.
Based on the above description, in the context of AVFS voltage and frequency adjustment of a system-on-chip, weighted average calculations are involved in several operations, and in the prior art each weighted average calculation requires a multiply-accumulate operation, an addition operation, and a division operation.
To improve the efficiency of the weighted average calculation, the embodiments of the present disclosure provide a weighted average calculation method that may be performed by a hardware circuit, or by a hardware circuit in combination with a software functional module configured for this purpose. The flow of the method is shown in FIG. 3 and includes the following steps.
In step S301, the plurality of weights are scaled so that they sum to 2^N, to obtain a plurality of weight normalization values, N being an integer greater than or equal to 1.
Specifically, the weights w0, w1, w2, and w3, for example, are scaled to 2^N. Taking N = 7 (2^N = 128) and the four weights w0 = 1, w1 = 4, w2 = 5, and w3 = 6 as an example, the corresponding weight normalization values are as follows:
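$$m_0 = \frac{w_0}{w_0 + w_1 + w_2 + w_3} \times 2^7 = \frac{1}{16} \times 128 = 8 \tag{3}$$
$$m_1 = \frac{w_1}{w_0 + w_1 + w_2 + w_3} \times 2^7 = \frac{4}{16} \times 128 = 32 \tag{4}$$
$$m_2 = \frac{w_2}{w_0 + w_1 + w_2 + w_3} \times 2^7 = \frac{5}{16} \times 128 = 40 \tag{5}$$
$$m_3 = \frac{w_3}{w_0 + w_1 + w_2 + w_3} \times 2^7 = \frac{6}{16} \times 128 = 48 \tag{6}$$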
In step S302, the multiply-accumulate sum of the plurality of weight normalization values and the plurality of operands is calculated. Specifically, continuing the example of step S301, sum is calculated from the weight normalization values m0 to m3 and the operands p0 to p3 using the following equation (7).
$$\mathrm{sum} = m_0 \cdot p_0 + m_1 \cdot p_1 + m_2 \cdot p_2 + m_3 \cdot p_3 \tag{7}$$
In step S303, the lower N bits of the multiply-accumulate sum are removed to obtain the weighted average calculation result. That is, the lower 7 bits of the sum obtained by equation (7) are removed and the data formed by the remaining bits is used as the weighted average calculation result; this is equivalent to dividing sum by 128 (a right shift by 7 bits that discards the remainder).
In this embodiment, although step S301 is added, the weight normalization values are determined once and remain unchanged while the hardware circuit executes, and the work of the hardware circuit consists mainly of the multiply-accumulate calculation. Compared with the prior-art weighted average calculation, this eliminates the addition and the division, so the overall efficiency of the weighted average calculation can be improved. Further, the weight normalization values may be calculated by a software functional module configured for this purpose; that is, step S301 may be executed in software.
Accordingly, an embodiment of the present disclosure provides a weighted average calculation apparatus configured to perform the weighted average calculation method shown in FIG. 3. Because the apparatus performs only multiply-accumulate operations, the adder and the divider are omitted, which reduces the physical area of the apparatus and can improve the efficiency of the weighted average calculation.
FIG. 4 is a schematic structural diagram of an exemplary weighted average calculation apparatus according to an embodiment of the present disclosure. As shown, the calculation apparatus 400 includes a multiply-accumulator 401 and a register 402. The multiply-accumulator 401 has input terminals A, B, and C: terminal B receives an operand, terminal A receives the weight normalization value corresponding to that operand, and terminal C receives the multiply-accumulate result previously output by the multiply-accumulator 401. The multiply-accumulator 401 performs a multiply-accumulate operation and outputs the result. For example, if the operand at terminal B is pi, the weight normalization value at terminal A is mi, and sum is the multiply-accumulate result output by the multiply-accumulator 401, then each multiply-accumulate operation outputs:
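Steps S301 to S303 can be put together in a short Python sketch that reuses the worked example above. It is illustrative only: the function names are hypothetical, the operands p0 to p3 are made-up non-negative integers, and the round-to-nearest rule in step S301 is an assumption. In this sketch S301 runs once (as it might in software), while S302 and S303 use only multiplication, accumulation, and a shift.

```python
def normalize_weights(weights: list[int], n: int) -> list[int]:
    """S301: scale weights to a total of 2**n (round-to-nearest is assumed)."""
    total = sum(weights)
    return [(2 * w * (1 << n) + total) // (2 * total) for w in weights]


def weighted_average_shift(operands: list[int], norm_weights: list[int], n: int) -> int:
    """S302 and S303: multiply-accumulate, then drop the lower n bits."""
    acc = 0
    for p, m in zip(operands, norm_weights):
        acc += m * p  # S302: no weight addition and no division
    return acc >> n   # S303: for non-negative acc, equals acc // 2**n


N = 7
m = normalize_weights([1, 4, 5, 6], N)  # [8, 32, 40, 48], computed once
p = [100, 200, 300, 400]                # hypothetical operands p0 to p3
print(weighted_average_shift(p, m, N))  # 300
```

Here the shifted result equals the exact weighted average (1·100 + 4·200 + 5·300 + 6·400) / 16 = 300, because the weight total, 16, divides 128 evenly (so S301 is exact) and the exact average happens to be an integer. In general, the rounding in S301 and the truncation in S303 each introduce a small error.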
$$\mathrm{sum} = \mathrm{sum} + p_i \cdot m_i \tag{8}$$
where i is an index ranging from 0 to 3 in the figure.
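The behavior of the multiply-accumulator 401 and the register 402 can be modeled in a few lines of Python. This is a behavioral sketch, not a hardware description: the class and method names are hypothetical, one operand pair is assumed to be processed per step, and register width and overflow are not modeled.

```python
class MultiplyAccumulator:
    """Behavioral model of multiply-accumulator 401 with register 402 (FIG. 4)."""

    def __init__(self) -> None:
        self.register_402 = 0  # intermediate and final multiply-accumulate results

    def clear(self) -> None:
        """Reset the accumulated sum before a new weighted average."""
        self.register_402 = 0

    def step(self, a: int, b: int) -> int:
        """One operation: terminal A = m_i, terminal B = p_i."""
        c = self.register_402          # terminal C: previous result fed back
        self.register_402 = c + b * a  # equation (8): sum = sum + p_i * m_i
        return self.register_402


mac = MultiplyAccumulator()
for m_i, p_i in zip([8, 32, 40, 48], [100, 200, 300, 400]):  # i = 0 to 3
    mac.step(a=m_i, b=p_i)
result = mac.register_402 >> 7  # bits left after removing the lower N = 7 bits
print(mac.register_402, result)  # 38400 300
```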
The weighted average calculation apparatus further comprises a normalization module 403 configured to scale the initial plurality of weights so that they sum to 2^N, to obtain a plurality of weight normalization values mi. The normalization module 403 may be implemented in software; for example, N is assigned in software, the proportion of the sum of the weights represented by each weight is calculated, and that proportion is multiplied by 2^N to give the weight normalization value of the weight. The resulting weight normalization values are then provided to the hardware unit for the operation. The choice of N depends on the accuracy required of the weight calculation: the larger N is, the higher the accuracy; for example, N is 6 or 7.
The figure also shows four-way selectors 501 and 502, through which the normalization module 403 feeds operands and weights to the downstream hardware unit (the multiply-accumulator 401 in the figure). The four-way selectors 501 and 502 may, however, be omitted or replaced, for example by registers or buffers through which the operands and weights are fed to the multiply-accumulator 401. In addition, the register 402 stores the multiply-accumulate result output by the multiply-accumulator 401, and the register 403 stores the bits of the register 402 remaining after the lower N bits are removed, that is, the weighted average calculation result. The register 403 may also be omitted; for example, the remaining bits may be fed directly to another part of the system, such as the frequency-voltage calculation unit 1052, for calculating the target frequency and the target voltage.
In the weighted average calculation apparatus provided in the above embodiments, once the weight normalization values are determined they remain essentially unchanged for the hardware circuit, which is implemented mainly by a multiply-accumulator. Compared with prior-art hardware circuits, the adder and the divider are therefore eliminated, which reduces the overall cost and size of the hardware circuit.
Specific applications of the weighted average calculation device
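The effect of N on accuracy can be explored with the Python sketch below, which compares the shift-based result against the exact weighted average for one set of hypothetical weights and operands. The function names, the data, and the round-to-nearest rule in the normalization are assumptions made for illustration; a single example like this does not prove that the error always decreases as N grows.

```python
from fractions import Fraction


def normalize_weights(weights: list[int], n: int) -> list[int]:
    """Scale weights to a total of 2**n, rounding to nearest (assumed)."""
    total = sum(weights)
    return [(2 * w * (1 << n) + total) // (2 * total) for w in weights]


def shifted_average(operands: list[int], weights: list[int], n: int) -> int:
    """Normalize, multiply-accumulate, then drop the lower n bits."""
    m = normalize_weights(weights, n)
    return sum(mi * p for mi, p in zip(m, operands)) >> n


weights = [1, 2, 3]            # hypothetical; their total, 6, is not a power of two
operands = [1000, 2000, 3000]  # hypothetical operands
exact = Fraction(sum(w * p for w, p in zip(weights, operands)), sum(weights))

for n in (6, 7, 8):
    approx = shifted_average(operands, weights, n)
    print(f"N={n}: result={approx}, error={float(abs(exact - approx)):.2f}")
# N=6: result=2328, error=5.33
# N=7: result=2335, error=1.67
# N=8: result=2332, error=1.33
```

The error combines the rounding of the normalization values in step S301 with the truncation of the lower N bits; a larger N reduces the rounding error at the cost of wider multiplier inputs and a wider accumulator register.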
The weighted average calculation means provided in the above embodiment may be provided in the process sensor, and configured to perform weighted average calculation on the count values acquired from the plurality of oscillation rings and output the result of the weighted average calculation to a chain controller (e.g., chain controller 1051 in fig. 1) of the AVFS controller.
The weighted average calculation means provided in the above embodiment may be disposed in a chain controller (e.g., chain controller 1051 in fig. 1) of the AVFS controller, and configured to perform weighted average calculation again on the weighted average calculation results obtained from the respective process sensors to obtain the process deviation under the current environment.
The weighted average calculation means provided in the above embodiment may be disposed in a chain controller (e.g., chain controller 1051 in FIG. 1) of the AVFS controller, and configured to perform weighted average calculation on the count values of the oscillation rings obtained from the respective process sensors and output the results to a frequency-voltage calculation unit (e.g., frequency-voltage calculation unit 1052 in FIG. 1).
The weighted average calculation means provided in the above embodiment may be disposed in a frequency voltage calculation unit (e.g. the frequency voltage calculation unit 1052 in fig. 1) of the AVFS controller, and is used to perform weighted average calculation again on the weighted average calculation results obtained from the respective process sensors to obtain the process deviation under the current environment.
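The two-level use described above, in which each process sensor averages its own ring counts and a controller then averages the per-sensor results, can be sketched in Python as follows. Everything here is hypothetical: the function names, the ring and sensor weights, and the count values. Two sensors are used instead of six to keep the example short, and the same shift-based routine is reused at both levels.

```python
def normalize_weights(weights: list[int], n: int) -> list[int]:
    """Scale weights to a total of 2**n, rounding to nearest (assumed)."""
    total = sum(weights)
    return [(2 * w * (1 << n) + total) // (2 * total) for w in weights]


def mac_shift_average(values: list[int], norm_weights: list[int], n: int) -> int:
    """Multiply-accumulate, then keep the bits above the lower n bits."""
    return sum(m * v for m, v in zip(norm_weights, values)) >> n


N = 7
ring_m = normalize_weights([1, 1, 2], N)  # hypothetical LVT/RVT/HVT ring weights
sensor_m = normalize_weights([1, 3], N)   # hypothetical weights for two sensors

# Level 1: each sensor turns its three ring counts into one detection value.
ring_counts = [[100, 200, 300], [120, 220, 320]]  # hypothetical counter0..counter2
detection = [mac_shift_average(c, ring_m, N) for c in ring_counts]

# Level 2: the controller combines the detection values into a process deviation.
deviation = mac_shift_average(detection, sensor_m, N)
print(detection, deviation)  # [225, 245] 240
```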
As can be seen from the above description, in the context of AVFS voltage and frequency adjustment of a system on chip, weighted average calculation is involved in a plurality of operations, and thus by improving the calculation efficiency of the weighted average calculation, it is helpful to improve the response speed and sensitivity of AVFS frequency and voltage adjustment.
Based on the above description, embodiments of the present disclosure also provide a power consumption control method for a system on chip, including the following steps.
In step S1, detection data is collected from each sensor of a unidirectional sensor chain (such as the one shown in fig. 1) formed by a plurality of sensors connected in series in the system on chip, and a weighted average calculation is performed on the plurality of detection data to obtain the process deviation of the system on chip under the current environment.
In step S2, a target frequency and/or a target voltage to which the system on chip is to be adjusted is determined based on the process deviation.
Here, the weighted average calculation in step S1 includes the steps of fig. 3; in other words, it may be implemented by the weighted average calculation device shown in fig. 4.
Further, the detection data delivered by each sensor is itself the result of a weighted average calculation performed on the count values of the plurality of oscillation rings provided inside that sensor.
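The two-level calculation described above (oscillation-ring count values averaged inside each sensor, then the sensor results averaged again at chain level) can be sketched in Python as follows. This is a minimal software model under stated assumptions, not the hardware circuit of fig. 4: all function names are hypothetical, rounding each normalized weight to the nearest integer is an assumption (the text gives no rounding rule), and `>> n` stands in for removing the lower N bits of the multiply-accumulate sum.

```python
def normalize_weights(weights, n):
    """Scale weights into integers that sum to (about) 2**n.

    Assumption: each proportion w / sum(weights) * 2**n is rounded to the
    nearest integer; the source does not state a rounding rule.
    """
    total = sum(weights)
    return [round(w * (1 << n) / total) for w in weights]


def weighted_average(operands, norm_weights, n):
    """Multiply-accumulate, then drop the lower n bits of the sum."""
    acc = 0  # stands in for the register holding intermediate MAC results
    for x, w in zip(operands, norm_weights):
        acc += x * w
    return acc >> n  # keep only the bits above the lower n bits


def sensor_detection_data(ring_counts, ring_weights, n):
    """Level 1 (hypothetical helper): one sensor's value from its ring counts."""
    return weighted_average(ring_counts, normalize_weights(ring_weights, n), n)


def chain_weighted_average(detection_data, sensor_weights, n):
    """Level 2 (hypothetical helper): combine the detection data of all sensors."""
    return weighted_average(detection_data, normalize_weights(sensor_weights, n), n)


if __name__ == "__main__":
    N = 8
    s1 = sensor_detection_data([1000, 1020, 980], [1, 1, 2], N)
    s2 = sensor_detection_data([1010, 1000], [1, 1], N)
    print(s1, s2, chain_weighted_average([s1, s2], [3, 1], N))
```

With N = 8 the script prints `995 1005 997`. The exact chain-level weighted average here is 997.5; dropping the lower N bits truncates it to 997. When a proportion multiplied by 2^N is not an integer, rounding the weight also shifts the result slightly, and a larger N reduces that effect.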
Specific applications of the system on chip
FIG. 5 is a block diagram of a general-purpose computer system to which embodiments of the present disclosure are applied. As shown, computer system 500 may include one or more processors 12 and a memory 14. The system on chip provided by the above embodiments may be used as the processor 12.
The memory 14 in the computer system 500 may be a main memory (referred to simply as main memory or memory). It stores instruction information and/or data information represented by data signals, such as data provided by the processor 12 (e.g., operation results), and enables data exchange between the processor 12 and an external storage device 16 (also referred to as auxiliary or external memory).
In some cases, the processor 12 needs to access the memory 14 to retrieve or modify data in it. To bridge the speed gap between the processor 12 and the slower memory 14, the computer system 500 further includes a cache memory 18 coupled to the bus 11, which caches data from the memory 14 that may be accessed repeatedly, such as program data or message data. The cache memory 18 is implemented by a storage device such as static random access memory (SRAM). The cache memory 18 may have a multi-level structure, such as a three-level cache structure with a level-1 cache (L1 Cache), a level-2 cache (L2 Cache), and a level-3 cache (L3 Cache), or a structure with more than three levels or of another type. In some embodiments, part of the cache memory 18 (e.g., the level-1 cache, or the level-1 and level-2 caches) may be integrated within the processor 12 or in the same system on chip as the processor 12.
Among other components, the processor 12 may include an instruction execution unit 121 and a memory management unit 122. When executing an instruction that modifies memory, the instruction execution unit 121 initiates a write access request that specifies the write data and the physical address to which it is to be written; the memory management unit 122 translates the virtual address specified by an instruction into the physical address to which that virtual address is mapped, and the physical address specified by the write access request may match the physical address specified by the corresponding instruction.
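The virtual-to-physical translation performed by the memory management unit 122 can be illustrated with a deliberately simplified model. The sketch assumes a single-level page table held in a Python dict and 4 KiB pages; real memory management units typically use multi-level page tables and translation lookaside buffers, and the source does not describe the translation scheme. `PAGE_SIZE` and `translate` are hypothetical names.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, single-level page table


def translate(vaddr, page_table):
    """Map a virtual address to the physical address it is mapped to.

    page_table maps virtual page numbers to physical page (frame) numbers.
    """
    vpn, offset = divmod(vaddr, PAGE_SIZE)  # split into page number and offset
    if vpn not in page_table:
        raise LookupError(f"page fault: no mapping for virtual address {vaddr:#x}")
    return page_table[vpn] * PAGE_SIZE + offset  # offset is carried over unchanged


if __name__ == "__main__":
    table = {0x1: 0x5, 0x2: 0x9}
    print(hex(translate(0x1234, table)))  # 0x5234
```

Only the page number is remapped; the offset within the page passes through unchanged, which is why the low three hex digits (`234`) survive translation.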
Information is typically exchanged between the memory 14 and the cache memory 18 in blocks. In some embodiments, the cache memory 18 and the memory 14 may be divided into data blocks of the same size, and a data block may be the smallest unit of data exchange (containing one or more data items of a preset length) between them. For brevity and clarity, each data block in the cache memory 18 is referred to below simply as a cache block (also called a cacheline or cache line), and different cache blocks have different cache block addresses; each data block in the memory 14 is referred to as a memory block, and different memory blocks have different memory block addresses. A cache block address comprises, for example, a physical address tag used to locate the data block.
Due to space and resource constraints, the cache memory 18 cannot cache the entire contents of the memory 14; that is, its storage capacity is generally smaller than that of the memory 14, and the cache block addresses it provides cannot cover all the memory block addresses provided by the memory 14. When the processor 12 needs to access memory, it first accesses the cache memory 18 via the bus 11 to determine whether the requested content is stored there. If so, the access is a cache hit, and the processor 12 reads the content directly from the cache memory 18; if not, the processor 12 accesses the memory 14 via the bus 11 to look up the corresponding information. Because the cache memory 18 is very fast to access, cache hits significantly improve the efficiency of the processor 12 and thus the performance and efficiency of the overall computer system 500.
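The block-based lookup and hit/miss behaviour described above can be sketched as a tiny cache model. The source does not specify a placement or replacement policy; this sketch assumes a direct-mapped cache with 64-byte blocks and four cache blocks, and the class `DirectMappedCache` and its parameters are hypothetical. The memory block address is split into an index (the cache slot the block may occupy) and a tag (which memory block currently occupies that slot).

```python
class DirectMappedCache:
    """Tiny direct-mapped cache in front of a byte-addressed main memory."""

    def __init__(self, memory, block_size=64, num_lines=4):
        self.memory = memory             # main memory contents (bytes-like)
        self.block_size = block_size     # cache block (cache line) size in bytes
        self.num_lines = num_lines       # number of cache blocks the cache holds
        self.lines = [None] * num_lines  # each slot: (tag, block bytes) or None
        self.hits = 0
        self.misses = 0

    def read(self, addr):
        block_addr, offset = divmod(addr, self.block_size)  # memory block address
        index = block_addr % self.num_lines  # cache slot this block maps to
        tag = block_addr // self.num_lines   # identifies which block is in the slot
        line = self.lines[index]
        if line is not None and line[0] == tag:
            self.hits += 1  # hit: serve the byte directly from the cache
        else:
            self.misses += 1  # miss: fetch the whole block from main memory
            start = block_addr * self.block_size
            line = (tag, bytes(self.memory[start:start + self.block_size]))
            self.lines[index] = line  # replace whatever block occupied the slot
        return line[1][offset]


if __name__ == "__main__":
    cache = DirectMappedCache(bytes(range(256)) * 4)
    for addr in (10, 20, 300, 10):
        cache.read(addr)
    print(cache.hits, cache.misses)  # 1 3
```

Reading address 20 hits because it lies in the same 64-byte block as address 10; address 300 falls in memory block 4, which maps to the same slot as block 0 and evicts it, so the final read of address 10 misses again.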
In addition, the computer system 500 may also include input/output devices such as the storage device 16, display device 13, audio device 19, and mouse/keyboard 15. The storage device 16 is a device for storing and accessing information, such as a hard disk, optical disc, or flash memory, coupled to the bus 11 via a corresponding interface. The display device 13 is coupled to the bus 11, for example via a corresponding graphics card, and displays content according to display signals provided by the bus 11.
The computer system 500 also typically includes a communication device 17 and can therefore communicate with a network or other devices in a variety of ways. The communication device 17 may comprise one or more communication modules; for example, it may comprise a wireless communication module adapted to a particular wireless communication protocol. For example, the communication device 17 may include a WLAN module implementing Wi-Fi™ communication in compliance with the 802.11 standards set by the Institute of Electrical and Electronics Engineers (IEEE); a WWAN module implementing wireless wide area communication conforming to a cellular or other wireless wide area protocol; a communication module using another protocol, such as a Bluetooth module, or another custom type of communication module; or a port for serial data transmission.
Of course, the structure of a computer system may vary with the motherboard, operating system, and instruction set architecture. For example, many computer systems today have an input/output control hub coupled between the bus 11 and the various input/output devices, and this hub may be integrated within the processor 12 or separate from it.
Fig. 6 is a block diagram of an embedded system to which an embodiment of the present disclosure is applied. The system on chip provided by the above embodiments may be used as the processor 601.
Although an embedded system is highly similar to a computer system in hardware structure, its application characteristics make the composition and implementation of its hardware differ greatly from those of a general-purpose computer system.
First, to meet the requirements of the embedded system 600 on speed, size, and power consumption, data that needs to be stored long-term, such as the operating system, application software, and special-purpose data, is usually not kept on large-capacity, low-speed media such as magnetic disks; instead, random access memory 602 or flash memory 603 is mostly used.
In addition, the embedded system 600 requires an A/D (analog/digital conversion) interface 605 and a serial interface 606 for measurement and control, which are rarely used in general-purpose computers. The A/D interface 605 mainly performs the analog-to-digital and digital-to-analog conversions required during testing. The embedded system 600 often requires testing when applied in industrial production; because the single-chip microcontroller produces digital signals that must be converted into analog signals for testing, an A/D interface 605 is needed to complete the conversion, unlike in a general-purpose computer. In addition, industrial applications often require multiple embedded systems to be connected in series to perform related functions, so a serial interface 606 for connecting them is required, which general-purpose computers do not need.
In addition, the embedded system 600 is a basic processing unit, and industrial designs often require multiple embedded systems 600 to be connected into a network, so a network interface 607 for connecting the embedded system 600 to the network is required; this too is mostly unnecessary in general-purpose computers. Furthermore, depending on the application and scale, some embedded systems 600 employ an external bus 604. As the application fields of the embedded system 600 expand rapidly, embedded systems are becoming increasingly specialized, and the types of buses adopted to suit their characteristics are increasingly diverse. Finally, boundary scan testing is commonly used in processor chips to test the internal circuits of the embedded processor 601, and a debug interface 608 is provided to support this testing.
With the rapid development of very large scale integration (VLSI) and semiconductor processes, part or all of an embedded system can be implemented on a single silicon chip, i.e., an embedded system on chip (SoC).
Commercial value of the disclosed embodiments
The weighted average calculation device provided by the embodiments of the present disclosure can improve the efficiency of weighted average calculation and can be applied in many scenarios, including AVFS frequency and voltage regulation; it therefore has commercial and economic value.
As will be appreciated by one skilled in the art, the present disclosure may be embodied as systems, methods and computer program products. Accordingly, the present disclosure may be embodied in the form of entirely hardware, entirely software (including firmware, resident software, micro-code), or in the form of a combination of software and hardware. Furthermore, in some embodiments, the present disclosure may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this context, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with a processing unit, apparatus, or device.
A computer-readable signal medium may include a propagated data signal carrying computer-readable program code, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., and any suitable combination of the foregoing.
Computer program code for carrying out embodiments of the present disclosure may be written in one or more programming languages or any combination thereof. These include object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as C. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above description covers only preferred embodiments of the present disclosure and is not intended to limit it; those skilled in the art may make various modifications and changes to the present disclosure. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure shall fall within its scope of protection.

Claims (12)

1. A weighted average calculation apparatus comprising:
a normalization module for scaling a plurality of weights to 2^N to obtain a plurality of weight normalization values;
a multiply-accumulator for performing multiply-accumulate calculation on a plurality of operands and the plurality of weight normalization values;
a first register for storing intermediate and final result data of the multiply-accumulator,
wherein the bits remaining after the lower N bits of the final result data are removed are used as the weighted average calculation result of the plurality of weights and the plurality of operands, and N is an integer greater than or equal to 1.
2. The weighted average calculation apparatus according to claim 1, further comprising: a second register coupled to the first register for receiving the bits remaining after the lower N bits of the final result data are removed.
3. The weighted average calculation apparatus according to claim 1, wherein the multiply-accumulator comprises three inputs and one output, a first of the three inputs receiving the plurality of operands one by one, a second of the three inputs receiving the plurality of weight normalization values one by one, and a third of the three inputs being coupled to the output of the multiply-accumulator to receive intermediate result data.
4. A weighted average calculation method, comprising:
scaling a plurality of weights to 2^N to obtain a plurality of weight normalization values;
calculating a sum of multiply-accumulate between the plurality of weight normalization values and a plurality of operands; and
taking the bits remaining after the lower N bits of the multiply-accumulate sum are removed as the weighted average calculation result of the plurality of weights and the plurality of operands,
wherein N is an integer greater than or equal to 1, and at least the step of calculating the sum of multiply-accumulate between the plurality of weight normalization values and the plurality of operands is performed by a hardware circuit.
5. The weighted average calculation method according to claim 4, wherein scaling the plurality of weights to 2^N to obtain the plurality of weight normalization values comprises:
calculating the proportion of each weight in the sum of the plurality of weights; and
multiplying the proportion corresponding to each weight by 2^N and taking the result as the weight normalization value of that weight.
6. The weighted average calculation method of claim 4 wherein N is determined according to the accuracy requirement of the weight calculation.
7. A process sensor, comprising:
at least one oscillation ring, each oscillation ring being formed by connecting a plurality of inverters of the same type end to end in a ring and being coupled to a counting unit, the counting unit counting the number of inversions of the plurality of inverters within a fixed time and outputting the number of inversions of the corresponding oscillation ring per unit time as a count value;
a sensor controller for performing a weighted average calculation on the at least one count value output by the at least one oscillation ring and outputting a weighted calculation result,
wherein the sensor controller performs the weighted average calculation using the weighted average calculation device according to any one of claims 4 to 6.
8. A power consumption control system for use in a system on a chip, comprising:
a plurality of sensors forming a unidirectional sensor chain by being connected in series;
a power consumption controller coupled to the head and tail ends of the unidirectional sensor chain, comprising:
a chain controller for collecting detection data from each sensor of the unidirectional sensor chain and performing a weighted average calculation on the detection data of the plurality of sensors to determine the process deviation of the system on chip under the current environment;
a voltage and frequency calculation unit for determining a target frequency and/or a target voltage to which the system on chip is to be adjusted according to the process deviation;
wherein the chain controller performs the weighted average calculation using the weighted average calculation device according to any one of claims 4 to 6.
9. A power consumption control method for use in a system on a chip, comprising:
collecting detection data from each sensor of a unidirectional sensor chain, and performing weighted average calculation on a plurality of detection data to obtain the process deviation of the system on chip under the current environment, wherein the unidirectional sensor chain is formed by connecting a plurality of sensors arranged in the system on chip in series;
determining a target frequency and/or a target voltage to which the system on chip is to be adjusted according to the process deviation;
wherein the weighted average calculation comprises the operations of:
scaling a plurality of weights to 2^N to obtain a plurality of weight normalization values, wherein the weights correspond one-to-one to the sensors;
calculating a sum of multiply-accumulate between the plurality of weight normalization values and a plurality of operands; and
taking the bits remaining after the lower N bits of the multiply-accumulate sum are removed as the weighted average calculation result of the plurality of weights and the plurality of operands, wherein N is an integer greater than or equal to 1.
10. The power consumption control method according to claim 9, wherein the detection data of each sensor is the result of applying the weighted average calculation to the count values of a plurality of oscillation rings provided inside that sensor.
11. A system on a chip, comprising:
a processing unit;
the power consumption control system of claim 8;
an on-chip bus for coupling the processing unit and the power consumption controller.
12. A computing device, comprising:
the system-on-chip as claimed in claim 11 as a processor;
a bus;
a memory device coupled to the processor through the bus.
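The method of claims 4 and 5 can be summarized as a worked equation. This is an illustrative reading rather than claim language: it assumes the normalized weights are rounded to the nearest integer (the claims do not state a rounding rule), writes the operands as x_i and the weights as w_i, and takes p_i = w_i / Σ_j w_j as the proportion of claim 5.

$$
\hat{w}_i = \operatorname{round}\left(p_i \cdot 2^N\right), \qquad
y = \left\lfloor \frac{\sum_i \hat{w}_i x_i}{2^N} \right\rfloor, \qquad
\bar{x} = \sum_i p_i x_i
$$

Here y is the multiply-accumulate sum with its lower N bits removed (floor division by 2^N, which is what dropping the lower N bits of a two's-complement value computes), and x̄ is the exact weighted average. Because |ŵ_i − p_i·2^N| ≤ 1/2 and removing the lower N bits discards less than one unit,

$$
\left| y - \bar{x} \right| < 1 + \frac{\sum_i |x_i|}{2^{N+1}}
$$

so each additional bit of N halves the weight-quantization term, which is one way to read the accuracy-based choice of N in claim 6.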
CN202111355490.3A 2021-11-16 2021-11-16 Weighted average calculation method and weighted average calculation device Pending CN114297576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111355490.3A CN114297576A (en) 2021-11-16 2021-11-16 Weighted average calculation method and weighted average calculation device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111355490.3A CN114297576A (en) 2021-11-16 2021-11-16 Weighted average calculation method and weighted average calculation device

Publications (1)

Publication Number Publication Date
CN114297576A (en) 2022-04-08

Family

ID=80964540

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111355490.3A Pending CN114297576A (en) 2021-11-16 2021-11-16 Weighted average calculation method and weighted average calculation device

Country Status (1)

Country Link
CN (1) CN114297576A (en)

Similar Documents

Publication Publication Date Title
US9870341B2 (en) Memory reduction method for fixed point matrix multiply
KR101253012B1 (en) Method and apparatus to facilitate shared pointers in a heterogeneous platform
CN111316261B (en) Matrix computing engine
JPH05502125A (en) Microprocessor with last-in, first-out stack, microprocessor system, and method of operating a last-in, first-out stack
TWI506428B (en) Method and system for optimizing prefetching of cache memory lines
US20140282578A1 (en) Locality aware work stealing runtime scheduler
US20210089459A1 (en) Storage control apparatus, processing apparatus, computer system, and storage control method
CN104823153B (en) Processor, method, communication equipment, machine readable media, the equipment and equipment for process instruction of normalization add operation for execute instruction
CN114297576A (en) Weighted average calculation method and weighted average calculation device
KR20130005292A (en) Methods and apparatus for sum of address compare in a content-addressable memory
US9134751B2 (en) Time keeping in unknown and unstable clock architecture
CN114924792A (en) Instruction decoding unit, instruction execution unit, and related devices and methods
CN114721464A (en) System on chip and computing device
CN114185837A (en) System on chip and method for adjusting voltage and frequency
CN113656331A (en) Method and device for determining access address based on high and low bits
CN114297131B (en) Sensor control system, system on chip and computing device
CN115202468A (en) Power consumption control system for system on chip, system on chip and computing device
CN111381875B (en) Data comparator, data processing method, chip and electronic equipment
CN114185834A (en) System on chip and method for voltage and frequency regulation
CN112580278A (en) Optimization method and optimization device for logic circuit and storage medium
US7634636B2 (en) Device, system and method of reduced-power memory address generation
US8768991B2 (en) Mechanism to find first two values
CN114258533A (en) Optimizing access to page table entries in a processor-based device
US11175926B2 (en) Providing exception stack management using stack panic fault exceptions in processor-based devices
CN111506530A (en) Interrupt management system and management method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240223

Address after: 310052 Room 201, floor 2, building 5, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: C-SKY MICROSYSTEMS Co.,Ltd.

Country or region after: China

Address before: 200120 floor 5, No. 366, Shangke road and No. 2, Lane 55, Chuanhe Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Pingtouge (Shanghai) semiconductor technology Co.,Ltd.

Country or region before: China
