JP6242170B2

JP6242170B2 - Circuit design support apparatus and program

Info

Publication number: JP6242170B2
Application number: JP2013234536A
Authority: JP
Inventors: 山本　亮; 峯岸　孝行; 平野　進; 中村　稔
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2013-11-13
Filing date: 2013-11-13
Publication date: 2017-12-06
Anticipated expiration: 2033-11-13
Also published as: JP2015095130A

Description

The present invention relates to a circuit design support apparatus that supports circuit design.
More specifically, the present invention relates to a circuit design support apparatus that supports semiconductor design using high-level synthesis (behavioral synthesis), which automatically generates a register transfer level description from behavioral description code (hereinafter also simply called the "behavioral description").

In conventional semiconductor integrated circuit design, the register transfer level (RTL), which describes the behavior of the registers (flip-flops) in a circuit and of the combinational logic between those registers, was designed in a hardware description language.
In recent years the scale of integrated circuits has grown, and the large amount of time required for RTL design has become a problem.
For this reason, high-level synthesis technology has been proposed that automatically generates RTL from a high-level language with a higher level of abstraction than RTL, such as C, C++, or SystemC, and high-level synthesis tools that implement it are commercially available.

However, RTL generated by a high-level synthesis tool is hard to read. In a design that uses such a high-level language, when a problem arises in the performance of the generated RTL, the designer must therefore be given appropriate information about the problem without having to analyze the RTL.
To this end, Patent Document 1 describes notifying the user, when a delay defect occurs, of the high-level language source code location corresponding to the delay defect and of the source code locations related to it.
Patent Document 2 discloses a method for analyzing the power consumption of a circuit from a high-level language.
Patent Document 2 discloses reducing power consumption by parallelizing a circuit when the internal processing of the circuit can be performed in parallel, the operating frequency of the parallelized circuit can be lowered, and the resulting reduction in power consumption is equal to or greater than a specified threshold.
As a supplementary note, one general technique for reducing power consumption is to lower the operating frequency and parallelize the circuit.
Patent Document 2 uses this technique.
In general, power consumption is proportional to circuit scale and operating frequency. If, for example, the operating frequency is halved, the circuit must be parallelized to raise its processing capacity so that the required throughput remains the same as before the frequency was halved.
In that case, even if the parallelized circuit is twice the size of the original circuit or larger, power consumption can still be reduced if the supply voltage can be lowered, because the supply voltage contributes to power consumption quadratically, as shown in Equation (1).
Alternatively, if parallelization does not make the circuit twice as large or larger, power consumption falls because the frequency is halved while the circuit scale less than doubles.
P = α · f · C · v² ... Equation (1)
In Equation (1), P is the power consumption, α is the activity rate, f is the operating frequency, C is the circuit scale, and v is the supply voltage.
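The two cases described above can be checked numerically with Equation (1). The following is a minimal sketch; the function name `dynamic_power` and all numeric values (activity rate, voltages, circuit scale factors) are hypothetical and chosen only for illustration.

```python
def dynamic_power(alpha, f, c, v):
    """Equation (1): P = alpha * f * C * v^2."""
    return alpha * f * c * v ** 2


# Hypothetical baseline: 100 MHz, circuit scale 1.0 (arbitrary unit), 1.2 V.
base = dynamic_power(alpha=0.2, f=100e6, c=1.0, v=1.2)

# Case A: frequency halved, circuit grows to 2.2x (more than double),
# but the supply voltage can be lowered from 1.2 V to 1.0 V.
case_a = dynamic_power(alpha=0.2, f=50e6, c=2.2, v=1.0)

# Case B: frequency halved, circuit grows to only 1.6x, voltage unchanged.
case_b = dynamic_power(alpha=0.2, f=50e6, c=1.6, v=1.2)

ratio_a = case_a / base  # below 1.0 because v enters quadratically
ratio_b = case_b / base  # 0.5 * 1.6 = 0.8
print(f"case A: {ratio_a:.3f} of baseline, case B: {ratio_b:.3f} of baseline")
```

In both cases the ratio is below 1.0, which matches the reasoning in the text: the quadratic voltage term outweighs a circuit that more than doubles, and halving the frequency outweighs a circuit that less than doubles.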

Patent Document 1: JP 2008-77490 A
Patent Document 2: JP 2001-142927 A

The method described in Patent Document 1 notifies the designer of the location of the critical path that is the main cause of a delay defect, but it does not directly improve the critical path.
In high-level synthesis, improving a critical path is partly tool-dependent, no general method for improving the critical path location has been established, and it is therefore difficult for the designer to make the improvement.
For design efficiency, it is desirable that the designer be able to improve the critical path easily.
However, as noted above, the method of Patent Document 1 only notifies the designer of the location of the critical path; it neither improves the critical path nor eliminates the delay defect.

The method of Patent Document 2, which reduces power consumption by parallelizing circuits and lowering the operating frequency, also has the problem that it cannot reduce power consumption effectively.
Specifically, an LSI (Large Scale Integration) contains a very large number of locations where circuits can be parallelized.
FIG. 16 shows a circuit before parallelization.
In the figure, there are four locations that can be parallelized: a-b, b-c, c-d, and d-e.
Choosing two of these at a time gives 4C2 = 6 combinations, choosing three gives 4C3 = 4 combinations, and choosing all four gives 1 combination, so even this small piece of logic with four locations has 15 candidates (4 + 6 + 4 + 1 = 15).
Generalized to N locations, the number of candidates is $\sum_{n=1}^{N} {}_{N}C_{n}$.
The circuits that can be parallelized are scattered throughout the LSI, so the number of combinations becomes enormous and cannot be processed within a practical amount of time.
Moreover, even if the widest parallelizable span, a-e, is parallelized, doing so is not necessarily effective in improving power consumption.
Thus, in the conventional technique, extracting parallelizable locations takes a long time, and even after that time is spent, the locations that are effective in improving power consumption cannot necessarily be extracted.
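The growth in the number of candidates can be reproduced with a short sketch. The function name `count_candidates` is hypothetical; the four site labels are the ones named for FIG. 16. The sum of the binomial coefficients from n = 1 to N equals 2^N − 1, which is why exhaustive evaluation quickly becomes impractical.

```python
from itertools import combinations
from math import comb


def count_candidates(n_sites):
    """Number of non-empty subsets of n_sites parallelizable locations."""
    return sum(comb(n_sites, k) for k in range(1, n_sites + 1))


# The four parallelizable locations named for FIG. 16.
sites = ["a-b", "b-c", "c-d", "d-e"]

# Enumerate every combination explicitly for the small example.
all_candidates = [
    combo
    for k in range(1, len(sites) + 1)
    for combo in combinations(sites, k)
]

print(len(all_candidates))   # 4 + 6 + 4 + 1
print(count_candidates(30))  # grows as 2**N - 1
```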

The present invention has been made in view of the above circumstances. Its main object is to determine the locations to be parallelized efficiently, both for parallelization that eliminates delay defects and for parallelization that improves power consumption.

A circuit design support apparatus according to the present invention includes:
a code block extraction unit that extracts, as a critical code block, from among a plurality of code blocks included in the behavioral description code of a design target circuit, the code block corresponding to the critical path among a plurality of data paths generated by high-level synthesis of the plurality of code blocks; and
an evaluation unit that excludes, from the plurality of code blocks, a non-parallelization target code block, which is a code block that comes after the critical code block and is not subject to circuit parallelization, together with the code blocks that follow it, and thereby identifies the code blocks that are candidates for circuit parallelization, including the critical code block, as parallelization candidate code blocks; sets, within the parallelization candidate code blocks, a plurality of evaluation target ranges, each being a range of code blocks to be evaluated, by widening and narrowing the range around the critical code block; evaluates, for each evaluation target range, the characteristics of the data paths corresponding to the code blocks in that range; and selects, from among the plurality of evaluation target ranges, the evaluation target range to be subjected to circuit parallelization.

According to the present invention, not all code blocks of the behavioral description code are evaluated for parallelization; evaluation is limited to the parallelization candidate code blocks, so the locations to be parallelized can be determined efficiently.
Because the evaluation target ranges are set around the critical code block, the locations to be parallelized can be determined efficiently.
By evaluating the delay time of the data paths corresponding to the code blocks in each evaluation target range, the locations to be parallelized to address a delay defect can be determined efficiently.
By evaluating the circuit scale of the design target circuit when the circuit formed by the data paths corresponding to the code blocks in each evaluation target range is parallelized, the locations to be parallelized to improve power consumption can be determined efficiently.
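The idea of setting several evaluation target ranges of varying width around the critical code block can be sketched as follows. This is only an illustration: the function name `evaluation_ranges`, the block names, and the choice to widen by one block on each side per step are assumptions, not details given in this passage.

```python
def evaluation_ranges(candidates, critical):
    """Return nested ranges of candidate blocks centered on the critical block.

    candidates: ordered list of parallelization candidate code blocks
    critical:   the critical code block (must be in candidates)
    Assumption: each step widens the range by one block on both sides,
    clipped to the bounds of the candidate list.
    """
    center = candidates.index(critical)
    lo = hi = center
    last = len(candidates) - 1
    ranges = []
    while True:
        ranges.append(candidates[lo:hi + 1])
        if lo == 0 and hi == last:
            break
        lo = max(lo - 1, 0)
        hi = min(hi + 1, last)
    return ranges


# Hypothetical block names; B2 plays the role of the critical code block.
for r in evaluation_ranges(["B1", "B2", "B3", "B4"], "B2"):
    print(r)
```

The number of ranges produced this way grows linearly with the number of candidate blocks, in contrast to the exponential number of combinations in an exhaustive search.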

FIG. 1 is a diagram showing a configuration example of the semiconductor design support apparatus according to Embodiment 1.
FIG. 2 is a diagram showing an example of the behavioral description code according to Embodiment 1.
FIG. 3 is a flowchart showing an operation example of the semiconductor design support apparatus according to Embodiment 1 in the speed improvement mode.
FIG. 4 is a diagram showing an example of a circuit generated by the high-level synthesis unit according to Embodiment 1.
FIG. 5 is a diagram showing an example of the delay information table generated by the timing analysis unit according to Embodiment 1.
FIG. 6 is a diagram showing an example of the histogram generated by the timing analysis unit according to Embodiment 1.
FIG. 7 is a diagram showing an example of a circuit at the time of code reduction according to Embodiment 1.
FIG. 8 is a diagram showing an example of the delay information table at the time of code reduction according to Embodiment 1.
FIG. 9 is a diagram showing an example of the histogram at the time of code reduction according to Embodiment 1.
FIG. 10 is a diagram showing an example of a circuit at the time of code reduction according to Embodiment 1.
FIG. 11 is a flowchart showing an operation example of the circuit parallelization determination unit according to Embodiment 1.
FIG. 12 is a flowchart showing an operation example of the semiconductor design support apparatus according to Embodiment 1 in the power consumption improvement mode.
FIG. 13 is a diagram showing an example in which the input/output control circuit generated by the semiconductor design support apparatus according to Embodiment 1 is connected to the parallelized circuits.
FIG. 14 is a timing chart showing an operation example of the input/output control circuit and the circuits according to Embodiment 1.
FIG. 15 is a diagram showing a hardware configuration example of the semiconductor design support apparatus according to Embodiment 1.
FIG. 16 is a diagram for explaining the problem of the prior art.

Embodiment 1.
This embodiment describes a semiconductor design support apparatus.
More specifically, the semiconductor design support apparatus according to this embodiment extracts the location with the largest delay from a scheduling result containing timing information obtained from the code description that is the target of behavioral synthesis, or from the result of timing analysis of the RTL.
Centering on the location with the largest delay, the apparatus then identifies a location where the circuit can be parallelized and where lowering the operating frequency through parallelization effectively improves power consumption.
Centering on the location with the largest delay, the apparatus can also identify a location where the circuit can be parallelized and where timing is improved while the circuit scale is kept down.
Furthermore, the apparatus can automatically generate, without redesign by the designer, a parallelized circuit with a reduced operating frequency together with the input/output circuit for that circuit.

FIG. 1 shows a configuration example of the semiconductor design support apparatus 10 according to this embodiment.
The semiconductor design support apparatus 10 is an example of a circuit design support apparatus.
As shown in FIG. 1, the semiconductor design support apparatus 10 includes an input/output unit 100, a data processing unit 200, and a storage unit 300.

In the input/output unit 100, the high-level language display unit 101 displays the behavioral description code that is the input to high-level synthesis.
The behavioral description code describes the behavior of the design target circuit in a high-level language.
The behavioral description code contains a plurality of code blocks (each a cohesive chunk of description).
Each code block corresponds to a data path in the design target circuit.

The parallelizable area display unit 102 displays the parallelizable area, which is the area of the behavioral description code in which circuit parallelization is possible.
The parallelizable area is the range obtained by excluding, from the plurality of code blocks of the behavioral description code, the non-parallelization target code block and the code blocks that follow it.
A non-parallelization target code block is a code block corresponding to a data path that is not subject to circuit parallelization.
The parallelizable area is the range of code blocks corresponding to data paths that are candidates for parallelization.
Hereinafter, the parallelizable area is also referred to as the parallelization candidate code blocks.

The parallelization instruction unit 103 is the element with which the designer specifies a parallelization target area on the behavioral description code.
The delay information display unit 104 displays the delay information (also called slack) on the behavioral description code, the critical path, and a histogram of the delay information.
The mode designation unit 105 designates either the speed improvement mode (first evaluation mode) or the power consumption improvement mode (second evaluation mode).
The performance display unit 106 displays the delay information and circuit scale of the generated circuit.
The performance display unit 106 is an example of an evaluation result display unit.

In the storage unit 300, the scheduling result/RTL storage unit 301 stores the scheduling result and RTL obtained by high-level synthesis of the behavioral description code.

The delay information storage unit 302 stores a delay information table describing the delay information obtained from the scheduling result of the behavioral description code or from static analysis of the RTL.
In the delay information table, delay information and a code block are associated with each data path.

The histogram storage unit 303 stores the histogram of the delay information.

The critical path information storage unit 304 stores, as critical path information, information on the code block (critical code block) corresponding to the data path that is the critical path among the data paths in the design target circuit.

The parallelizable area storage unit 305 stores the parallelizable area.

The parallelized circuit code storage unit 306 stores the behavioral description code in a high-level language, or the RTL, of the parallelized circuit and of the input/output circuit required for parallelization.

In the data processing unit 200, the high-level synthesis unit 201 takes as input the behavioral description code in a high-level language and the options required for high-level synthesis, and generates the scheduling result, which is an intermediate result of high-level synthesis, and the RTL, which is the result of synthesis.

The timing analysis unit 202 has the following three functions.
(a) It obtains the delay information corresponding to the variables and operations in the behavioral description code and associates code blocks with delay information.
(b) It creates a histogram from the delay information obtained in (a).
(c) It extracts the critical path from the obtained delay information.

The input to the timing analysis unit 202 is the behavioral description code together with the scheduling result or the RTL corresponding to that code.
The output of the timing analysis unit 202 is the critical path, the code block corresponding to the critical path (the critical code block), and the histogram of the delay information.

First, (a) is described.
Based on (1) or (2) below, the timing analysis unit 202 obtains all delay information (slack) in the design target circuit and the corresponding variables in the behavioral description code.
The timing analysis method can be chosen from the following two.
The designer can select which of the two to use.
(1) Analysis from the scheduling result after high-level synthesis
(2) Analysis by static timing analysis (after RTL mapping)

The advantage of (1) is that it can be carried out quickly, because it does not depend on downstream steps such as logic synthesis.
On the other hand, the resulting improvement in power consumption or speed may be small.
The advantage of (2) is that, because the analysis is performed after mapping, the information is accurate and the improvement in power consumption or speed is likely to be large.
However, because it requires steps such as trial mapping, it is less quick to carry out than (1).

Here, slack is a value indicating how much margin there is relative to the target speed.
For example, if the target is 10 ns (100 MHz) and the delay between a pair of registers is 10.1 ns, the slack is −0.1 ns.
This is a timing violation.
Conversely, if the delay between the registers is 9.9 ns, the slack is +0.1 ns, meaning the timing has a margin of 0.1 ns.
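The slack arithmetic in this example can be written as a small sketch. The function names `period_ns` and `slack_ns` are hypothetical; the numbers are the ones from the text.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a target frequency in MHz."""
    return 1000.0 / freq_mhz


def slack_ns(target_period_ns, path_delay_ns):
    """Slack = target period minus register-to-register delay.

    A negative value means a timing violation.
    """
    return target_period_ns - path_delay_ns


target = period_ns(100)              # 100 MHz target gives a 10 ns period
violating = slack_ns(target, 10.1)   # about -0.1 ns: timing violation
passing = slack_ns(target, 9.9)      # about +0.1 ns: 0.1 ns of margin

print(f"{violating:+.1f} ns, violation={violating < 0}")
print(f"{passing:+.1f} ns, violation={passing < 0}")
```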

The correspondence between the timing information in (2) and the high-level language is described in Patent Document 1.

Using method (1) or (2), the timing analysis unit 202 links each piece of delay information to its corresponding code block.
Specifically, it generates a delay information table that lists each data path of the scheduling result or RTL, the slack (delay information) of each data path, and the code block corresponding to each data path, and writes the generated table to the delay information storage unit 302.
For example, the delay information table for the circuit of FIG. 4 is as shown in FIG. 5.
FIGS. 4 and 5 are described in detail later.
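One possible in-memory shape for such a delay information table is sketched below. The class `DelayEntry`, the function `build_delay_table`, the path and block names, and the delay values are all hypothetical; they do not reproduce the contents of FIG. 5.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DelayEntry:
    data_path: str    # data path in the scheduling result or RTL
    slack_ns: float   # delay information (slack) for that data path
    code_block: str   # code block of the behavioral description


def build_delay_table(path_delays_ns, path_to_block, target_period_ns):
    """Associate each data path with its slack and its code block."""
    return [
        DelayEntry(path, target_period_ns - delay, path_to_block[path])
        for path, delay in path_delays_ns.items()
    ]


# Hypothetical inputs for illustration only.
delays = {"a-b": 9.9, "b-c": 10.1, "c-d": 8.0}
blocks = {"a-b": "block1", "b-c": "block2", "c-d": "block3"}

table = build_delay_table(delays, blocks, target_period_ns=10.0)
for entry in table:
    print(entry)
```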

Next, (b) is described.
The timing analysis unit 202 builds a histogram of all the delay information corresponding to the behavioral description code.
The horizontal axis of the histogram is slack, and the vertical axis is the number of data paths with that slack.
The timing analysis unit 202 writes the generated histogram to the histogram storage unit 303.
For example, the histogram generated from the delay information table of FIG. 5 is as shown in FIG. 6.
FIG. 6 is described in detail later.
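A histogram with slack on the horizontal axis and the number of data paths on the vertical axis can be sketched as follows. The function name `slack_histogram`, the bin width, and the slack values are assumptions made for illustration; the source does not state how slack values are binned.

```python
import math
from collections import Counter


def slack_histogram(slacks_ns, bin_width_ns):
    """Count data paths per slack bin.

    Keys are the lower edge of each bin (in ns); values are the number of
    data paths whose slack falls in [edge, edge + bin_width_ns).
    """
    counts = Counter(
        math.floor(s / bin_width_ns) * bin_width_ns for s in slacks_ns
    )
    return dict(sorted(counts.items()))


# Hypothetical slack values, one per data path.
hist = slack_histogram([-0.1, 0.3, 0.4, 1.2, 1.4, 2.0], bin_width_ns=0.5)
for edge, count in hist.items():
    print(f"{edge:+.1f} ns: {'#' * count}")
```

Bins with a negative lower edge hold the data paths that violate timing, so the left end of the histogram shows how many paths need attention.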

Next, (c) is described.
Referring to the delay information table obtained in (a), the timing analysis unit 202 extracts the data path with the largest delay (the critical path), the delay information of that data path, and the code block corresponding to that data path (the critical code block).
The timing analysis unit 202 then writes the extracted data path, delay information, and code block information to the critical path information storage unit 304 as critical path information.
In this way, the timing analysis unit 202 extracts the critical code block from among the plurality of code blocks in the behavioral description code, and is an example of a code block extraction unit.
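Extraction of the critical path from a delay information table can be sketched as a minimum search. The sketch assumes that all data paths share one target period, so the largest delay corresponds to the smallest slack; the function name `extract_critical` and the table contents are hypothetical.

```python
def extract_critical(delay_table):
    """Return the table row with the smallest slack (largest delay).

    Each row is a dict with the keys 'data_path', 'slack_ns', 'code_block'.
    Assumption: every data path is measured against the same target period.
    """
    return min(delay_table, key=lambda row: row["slack_ns"])


# Hypothetical delay information table.
delay_table = [
    {"data_path": "a-b", "slack_ns": 0.1, "code_block": "block1"},
    {"data_path": "b-c", "slack_ns": -0.1, "code_block": "block2"},
    {"data_path": "c-d", "slack_ns": 2.0, "code_block": "block3"},
]

critical = extract_critical(delay_table)
print(critical["data_path"], critical["code_block"], critical["slack_ns"])
```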

The information obtained in (a), (b), and (c) may be shown to the designer on the delay information display unit 104 as an aid to analysis.

The circuit parallelization possibility analysis unit 203 identifies, in the behavioral description code, the code blocks that are candidates for circuit parallelization, including the critical code block, as parallelization candidate code blocks.
The behavioral description code contains code blocks that can be parallelized and code blocks that cannot.
From the plurality of code blocks of the behavioral description code, the circuit parallelization possibility analysis unit 203 excludes the non-parallelization target code block, which is a code block that comes after the critical code block and is not subject to parallelization, together with the code blocks that follow it, and extracts the parallelization candidate code blocks.
A code block that is not subject to parallelization is one in which the processing result of the previous clock cycle affects the processing result of the current clock cycle.
By extracting the parallelization candidate code blocks in this way, the circuit parallelization possibility analysis unit 203, together with the circuit parallelization determination unit 204 described later, is an example of an evaluation unit.
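The exclusion rule described above can be sketched on an ordered list of code blocks. The function name `parallelization_candidates` and the block names are hypothetical. The sketch implements only what this passage states, namely cutting at the first non-parallelization target code block after the critical code block; how such blocks before the critical code block are treated is not covered here.

```python
def parallelization_candidates(blocks, critical, non_targets):
    """Drop the first non-target block after the critical block and all
    blocks that follow it; keep the rest as candidates.

    blocks:      code blocks in source order
    critical:    the critical code block
    non_targets: set of blocks not subject to circuit parallelization
    """
    start = blocks.index(critical)
    for i in range(start + 1, len(blocks)):
        if blocks[i] in non_targets:
            return blocks[:i]
    return list(blocks)


# Hypothetical example: B2 is critical and B4 cannot be parallelized.
candidates = parallelization_candidates(
    ["B1", "B2", "B3", "B4", "B5"], critical="B2", non_targets={"B4"}
)
print(candidates)
```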

Specifically, the following cannot be parallelized: code in which a variable is read and then assigned (for example, in counter = counter + 1 the variable counter is read and then assigned to); parts with timing constraints (in SystemC, the presence of wait()); and parts restricted by synthesis options.
Examples in which synthesis options prevent circuit parallelization include the case where loops are not unrolled and the case where a constraint on the number of memory ports leaves too few ports.
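The first two conditions can be illustrated with a deliberately simple textual check. This is a rough sketch, not the analysis the apparatus performs: the function name `blocks_parallelization` is hypothetical, it handles only single-line statements of the form `x = expr` or `x += expr`, and it ignores synthesis-option constraints entirely.

```python
import re

IDENT = re.compile(r"[A-Za-z_]\w*")
# identifier, optional compound operator, then '=' that is not part of '=='
ASSIGN = re.compile(r"\s*([A-Za-z_]\w*)\s*([-+*/]?)=(?!=)(.*)")


def blocks_parallelization(statement):
    """Return True if the statement reads a variable and then assigns it,
    or contains a SystemC-style wait() call."""
    if "wait(" in statement.replace(" ", ""):
        return True
    match = ASSIGN.match(statement)
    if match is None:
        return False
    target, compound_op, rhs = match.groups()
    if compound_op:              # e.g. counter += 1 reads and writes counter
        return True
    return target in IDENT.findall(rhs)


for stmt in ["counter = counter + 1;", "y = a + b;", "wait();", "x += 1;"]:
    print(stmt, blocks_parallelization(stmt))
```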

Taking the behavioral description code as input, the circuit parallelization possibility analysis unit 203 eliminates the locations that cannot be parallelized as described above, extracts the code blocks that can be parallelized, and writes the extracted code blocks to the parallelizable area storage unit 305.

Because the circuit parallelization possibility analysis unit 203 determines the parallelizable area (parallelization candidate code blocks) statically in this way, circuits can be parallelized easily without the designer having to judge whether parallelization is possible.

In addition, parallelizing a memory such as a ROM (Read Only Memory) may double its capacity, which can increase the circuit scale considerably.
In an FPGA (Field Programmable Gate Array) in particular, the placement of memory may be fixed, and there is a concern that adding memory will degrade speed.
Because such constraints can exist, the parallelizable area display unit 102 may highlight for the designer the areas of the behavioral description code in which circuit parallelization is possible.
The designer can exclude a location from circuit parallelization by notifying the circuit parallelization possibility analysis unit 203, via the parallelization instruction unit 103, of the location where duplication of the circuit is not wanted because of development constraints.

Rather than analyzing all of the behavioral description code written by the designer, the circuit parallelization possibility analysis unit 203 may also have the designer specify the code to be analyzed.
Furthermore, the circuit parallelization possibility analysis unit 203 may take a variable as input and judge whether parallelization is possible within a range centered on that variable.

In the above, the circuit parallelization possibility analysis unit 203 performs its analysis on the behavioral description code, but it may instead judge circuit parallelization from the scheduling result, which is an intermediate result of high-level synthesis, or from the RTL, which is the synthesis result.

回路並列化判定部２０４は、回路並列化可否解析部２０３により抽出された並列化可能領域（並列化候補コードブロック）において、データパスの特性を評価し、速度改善ができる箇所や消費電力が削減できる箇所を抽出し、回路並列化を行う範囲を判定する。
モード指定部１０５で速度改善モードが指定されていれば、回路並列化判定部２０４は、速度改善ができる箇所を抽出し、消費電力改善モードが指定されていれば、回路並列化判定部２０４は、消費電力が削減できる箇所を抽出する。
並列化ができる範囲は、通常、ソースコード上で点在している。
点在した箇所の全てを、あるいは、その組み合わせの全てに対して回路並列化を施しても、速度の改善又は消費電力の改善が実現されるとは限らない。
また、点在した箇所の全て、あるいは、その組み合わせの全てに対して回路並列化の効果を評価するには長時間を要する。
そこで、回路並列化判定部２０４は、クリティカルパスを中心にして、効率的に速度改善ができる箇所や消費電力が削減できる箇所を抽出する。
回路並列化判定部２０４の動作の詳細は、後述する。
なお、回路並列化判定部２０４は、前述の回路並列化可否解析部２０３とともに、評価部の例に相当する。 The circuit parallelization determination unit 204 evaluates the data path characteristics in the parallelizable area (parallelization candidate code blocks) extracted by the circuit parallelization availability analysis unit 203, extracts locations where speed can be improved or power consumption can be reduced, and determines the range over which circuit parallelization is performed.
If the speed improvement mode is specified through the mode designation unit 105, the circuit parallelization determination unit 204 extracts locations where speed can be improved; if the power consumption improvement mode is specified, it extracts locations where power consumption can be reduced.
Parallelizable ranges are usually scattered across the source code.
Applying circuit parallelization to all of the scattered locations, or to every combination of them, does not necessarily improve speed or power consumption.
Evaluating the effect of circuit parallelization for all of the scattered locations, or for every combination of them, also takes a long time.
Therefore, the circuit parallelization determination unit 204 focuses on the critical path to efficiently extract locations where speed can be improved or power consumption can be reduced.
Details of the operation of the circuit parallelization determination unit 204 will be described later.
The circuit parallelization determination unit 204 corresponds to an example of an evaluation unit together with the circuit parallelization availability analysis unit 203 described above.

並列回路生成部２０５は、回路並列化判定部２０４により選択された箇所（コードブロック）の並列化を実施する。
つまり、並列回路生成部２０５は、回路並列化判定部２０４により選択されたコードブロックに対応するデータパスで構成される回路を並列化する。
回路並列化前後で、外から見たときの機能を同一にするには、並列化に付随する入出力回路も必要となる。
そこで、並列回路生成部２０５は、対象回路の並列化とそれに付随する入出力回路を自動的に生成する機能も有する。 The parallel circuit generation unit 205 performs parallelization of the portion (code block) selected by the circuit parallelization determination unit 204.
That is, the parallel circuit generation unit 205 parallelizes a circuit configured with a data path corresponding to the code block selected by the circuit parallelization determination unit 204.
In order to have the same function when viewed from outside before and after circuit parallelization, an input / output circuit accompanying the parallelization is also required.
Therefore, the parallel circuit generation unit 205 has a function of parallelizing the target circuit and automatically generating an input / output circuit associated therewith.

並列回路生成部２０５が、並列化対象となる回路及び付随する入出力回路も自動生成することで、設計者に並列化対象となる回路及び入出力回路を再設計させる必要がなく、設計開発の短期化が可能となる。
また以下に示す通り、生成される回路は高位言語でも可能となる。
そのため、高速なシミュレーションも可能である。 Because the parallel circuit generation unit 205 automatically generates both the circuit to be parallelized and the accompanying input/output circuits, the designer does not need to redesign them, which shortens design and development time.
As shown below, the generated circuit can also be in a high-level language.
Therefore, high speed simulation is also possible.

並列回路生成部２０５による回路の生成方法について述べる。
並列回路生成部２０５は、並列化可能領域記憶部３０５から、回路並列化判定部２０４により選択された並列化の対象範囲の入力信号と出力信号を抽出し、スケジューリング結果・ＲＴＬ記憶部３０１にある信号と対応させる。 A circuit generation method by the parallel circuit generation unit 205 will be described.
The parallel circuit generation unit 205 extracts, from the parallelizable area storage unit 305, the input signals and output signals of the parallelization target range selected by the circuit parallelization determination unit 204, and associates them with the signals held in the scheduling result / RTL storage unit 301.

ここで、入出力制御回路はあらかじめテンプレートとなる高位言語を用意しておき、並列回路生成部２０５は、並列化の対象となる回路の入力信号毎に入力制御回路を、出力信号毎に出力制御回路を生成する。
テンプレートの入力制御回路は、並列化対象回路のクロックと、そのクロックの２倍の周期をもつクロックと、並列化対象回路と同一のリセット信号と、１つのデータ入力信号と、２つのデータ出力信号と、その２つのデータ出力信号が有意であることを示す２つのイネーブル信号を持つ。
この入力制御回路は、データ入力信号を奇数クロックと偶数クロックで交互にイネーブル付きで出力する機能を有する。
データ出力信号とイネーブル信号は、上記２倍の周期で出力される。
同様に、テンプレートの出力制御回路は、並列化対象回路のクロックと、そのクロックの２倍の周期をもつクロックと、並列化対象回路と同一のリセット信号と、２つのデータ入力信号と、そのデータ入力信号が有意であることを示すイネーブル信号と、１つのデータ出力信号を持つ。
この出力制御回路は、入力信号のイネーブルが有意なものを出力する機能を持つ。
データ出力信号は元のクロック周期で出力される。
また、上記において、もし並列化対象回路の入出力信号にイネーブルがついていた場合は、出力制御回路においてもイネーブル付きでデータ出力信号を生成する。 Here, high-level language templates for the input/output control circuits are prepared in advance, and the parallel circuit generation unit 205 generates an input control circuit for each input signal of the circuit to be parallelized and an output control circuit for each output signal.
The template input control circuit has the clock of the parallelization target circuit, a clock with twice the period of that clock, the same reset signal as the parallelization target circuit, one data input signal, two data output signals, and two enable signals indicating that the two data output signals are valid.
This input control circuit outputs the data input signal, together with an enable, alternately on odd and even clock cycles.
The data output signals and the enable signals are output at the doubled period.
Similarly, the template output control circuit has the clock of the parallelization target circuit, a clock with twice the period of that clock, the same reset signal as the parallelization target circuit, two data input signals, enable signals indicating that those data input signals are valid, and one data output signal.
This output control circuit outputs whichever data input signal has its enable asserted.
The data output signal is output at the original clock period.
In addition, if the input/output signals of the parallelization target circuit have an enable, the output control circuit also generates its data output signal with an enable.
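The behavior of the two template circuits can be pictured with a small cycle-level model in C. This is only a sketch under simplifying assumptions: the type `InputCtrlOut` and the functions `input_control` and `output_control` are hypothetical names, the model works per cycle of the original clock and does not reproduce the doubled-period holding of the outputs, the reset, or any pipeline latency, and it assumes that at most one enable is asserted at the output control circuit in a given cycle. Signal names such as `a_even` and `ien_even` follow those used for FIG. 13.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the input control circuit outputs for one cycle. */
typedef struct {
    int  a_even;    /* data toward the replica fed on even cycles */
    int  a_odd;     /* data toward the replica fed on odd cycles  */
    bool ien_even;  /* a_even is valid */
    bool ien_odd;   /* a_odd is valid  */
} InputCtrlOut;

/* Route the sample seen in cycle `cycle` to the even or the odd lane. */
InputCtrlOut input_control(unsigned cycle, int a)
{
    InputCtrlOut out = { 0, 0, false, false };
    if (cycle % 2u == 0u) {
        out.a_even = a;
        out.ien_even = true;
    } else {
        out.a_odd = a;
        out.ien_odd = true;
    }
    return out;
}

/*
 * Forward whichever lane has its enable asserted.
 * Returns the output enable (o_en); *e is written only when it is true.
 * Assumption of this sketch: at most one enable is asserted per cycle;
 * the even lane is checked first purely as a convention.
 */
bool output_control(int e_even, bool oen_even, int e_odd, bool oen_odd, int *e)
{
    assert(e != NULL);
    if (oen_even) { *e = e_even; return true; }
    if (oen_odd)  { *e = e_odd;  return true; }
    return false;
}
```

In this model, each replica sees a new input only every second cycle, which is what allows it to run from the clock with the doubled period.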

簡単のため、並列化対象回路の入力信号がａ、出力信号がｅの場合の生成される回路イメージを図１３に示す。
また、図１３の回路の動作を示すタイミングチャートを図１４に示す。
図１３及び図１４において、複製回路１及び複製回路２は、並列化された回路である。
ａ＿ｅｖｅｎは偶数クロックでの入力制御回路からのデータ出力信号であり、ａ＿ｏｄｄは奇数クロックでの入力制御回路からのデータ出力信号である。
ｉｅｎ＿ｅｖｅｎは偶数クロックでの入力制御回路からのイネーブル信号であり、ｉｅｎ＿ｏｄｄは奇数クロックでの入力制御回路からのイネーブル信号である。
ｅ＿ｅｖｅｎは偶数クロックでの出力制御回路へのデータ入力信号であり、ｅ＿ｏｄｄは奇数クロックでの出力制御回路へのデータ入力信号である。
ｏｅｎ＿ｅｖｅｎは偶数クロックでの出力制御回路へのイネーブル信号であり、ｏｅｎ＿ｏｄｄは奇数クロックでの出力制御回路へのイネーブル信号である。
ｅは出力制御回路からのデータ出力信号であり、ｏ＿ｅｎは出力制御回路からのイネーブル信号である。
ｃｌｋ１はデフォルトのクロック信号であり、ｃｌｋ２はｃｌｋ１の２倍の周期のクロック信号である。 For simplicity, FIG. 13 shows a circuit image generated when the input signal of the parallel target circuit is a and the output signal is e.
FIG. 14 is a timing chart showing the operation of the circuit of FIG. 13.
In FIG. 13 and FIG. 14, duplicate circuit 1 and duplicate circuit 2 are the parallelized circuits.
a_even is a data output signal from the input control circuit at an even clock, and a_odd is a data output signal from the input control circuit at an odd clock.
ien_even is an enable signal from the input control circuit at an even clock, and ien_odd is an enable signal from the input control circuit at an odd clock.
e_even is a data input signal to the output control circuit at an even clock, and e_odd is a data input signal to the output control circuit at an odd clock.
oen_even is an enable signal to the output control circuit at an even clock, and oen_odd is an enable signal to the output control circuit at an odd clock.
e is a data output signal from the output control circuit, and o_en is an enable signal from the output control circuit.
clk1 is a default clock signal, and clk2 is a clock signal having a cycle twice that of clk1.

また、並列回路生成部２０５は、並列化可能領域記憶部３０５から抽出した入力信号と出力信号から、上記のとおり入出力回路の高位言語を自動で生成し、高位合成を行う。
このときスケジューリング結果が得られるので、並列回路生成部２０５は、このスケジューリング結果と、前記抽出した並列化のスケジューリング結果とその複製を１つ生成し、上記全てを結合したスケジューリングを生成し、高位合成を行う。
これにより、自動的に並列化した回路とそれに付随する入出力制御回路が自動生成される。
自動生成された上記全ての回路は、並列化回路コード記憶部３０６に書き込まれる。 The parallel circuit generation unit 205 automatically generates a high-level language of the input / output circuit from the input signal and output signal extracted from the parallelizable area storage unit 305 as described above, and performs high-level synthesis.
Because this yields a scheduling result, the parallel circuit generation unit 205 takes this scheduling result together with the extracted scheduling result of the parallelization target and one duplicate of it, generates a schedule that combines all of them, and performs high-level synthesis.
As a result, the parallelized circuit and its accompanying input/output control circuits are generated automatically.
All the automatically generated circuits are written in the parallelized circuit code storage unit 306.

また、上記では生成されるのはＲＴＬだけだが、高位言語で出力する方法もある。
一般にＲＴＬより高位言語のほうがシミュレーションが高速なため、高位言語で出力することでシミュレーションの高速化が可能となる。 In the above, only RTL is generated, but there is a method of outputting in a high-level language.
In general, a higher-level language is faster in simulation than RTL, so that the simulation can be speeded up by outputting in a higher-level language.

高位言語で出力するためには、並列回路生成部２０５は、まず並列化の対象となった範囲を動作記述コードから抜き出す。
次に、並列回路生成部２０５は、抜き出したコードを関数化、あるいはモジュール化する。
その後、並列回路生成部２０５は、関数化、モジュール化された前記コードから入出力信号を抽出し、前記テンプレートとなる高位言語で記述された入出力制御回路を自動生成する。
その後、並列回路生成部２０５は、並列化対象となるコードと入出力制御コードを接続する。 In order to output in a high-level language, the parallel circuit generation unit 205 first extracts the range to be parallelized from the behavioral description code.
Next, the parallel circuit generation unit 205 converts the extracted code into a function or a module.
After that, the parallel circuit generation unit 205 extracts input / output signals from the functionalized and modularized code, and automatically generates an input / output control circuit described in a high-level language as the template.
Thereafter, the parallel circuit generation unit 205 connects the code to be parallelized and the input / output control code.

なお、本実施の形態では、周波数を１／２にし、回路を２並列化しているが、周波数を１／Ｎ（Ｎは３以上の整数）、回路をＮ並列に置き換えてもよい。 In this embodiment, the frequency is halved and the circuit is parallelized two ways; however, the frequency may instead be reduced to 1/N (N is an integer of 3 or more) with the circuit parallelized N ways.

次に、本実施の形態に係る半導体設計支援装置１０による速度改善例と消費電力改善例を説明する。
速度改善モードか消費電力改善モードかは、設計者がモード指定部１０５を介して指定する。
また、以降の説明で用いる動作記述コードのサンプルプログラムコードを図２に示す。
このサンプルプログラムコードは、説明に不要な箇所のコードは除いているので、不完全であることに注意する。
また、このサンプルプログラムコードは、Ｃ言語で記述されているが、他の言語で記述されていてもよい。
図５において、破線で区切った範囲が、一塊の記述であり、コードブロックに相当する。 Next, a speed improvement example and a power consumption improvement example by the semiconductor design support apparatus 10 according to the present embodiment will be described.
The designer designates the speed improvement mode or the power consumption improvement mode via the mode designation unit 105.
FIG. 2 shows a sample program code of the operation description code used in the following description.
Note that this sample program code is incomplete because it excludes code that is not needed for explanation.
The sample program code is described in C language, but may be described in other languages.
In FIG. 5, each range delimited by broken lines is a single group of statements and corresponds to a code block.

まず、速度改善を行うための動作例を示す。
図６に、速度改善モードでの半導体設計支援装置１０の動作フローを示す。 First, an example of operation for speed improvement is shown.
FIG. 6 shows an operation flow of the semiconductor design support apparatus 10 in the speed improvement mode.

ステップ１０１：高位合成
高位合成部２０１が、高位言語の動作記述コード（図５）を高位合成する。
そして、高位合成部２０１は、生成したスケジューリング結果とＲＴＬをスケジューリング結果・ＲＴＬ記憶部３０１に書き込む。
図４は、生成されたＲＴＬの一部を示す。 Step 101: High-level synthesis The high-level synthesis unit 201 performs high-level synthesis of the behavioral description code (FIG. 5) of the high-level language.
Then, the high-level synthesis unit 201 writes the generated scheduling result and RTL in the scheduling result / RTL storage unit 301.
FIG. 4 shows a part of the generated RTL.

ステップ１０２：クリティカルパス解析
タイミング解析部２０２が、高位合成部２０１によって得られたスケジューリング結果、あるいは、生成されたＲＴＬより静的タイミング解析された情報から、スラックを算出し、遅延情報テーブルを生成し、また、クリティカルパス及びクリティカルコードブロックの抽出を行う。
タイミング解析部２０２は、遅延情報テーブルを遅延情報記憶部３０２に書込み、クリティカルパス情報をクリティカルパス情報記憶部３０４に書き込む。
図４の回路に対して生成された遅延情報テーブルを図５に示す。
図４の例では、クリティカルパスに対応する記述が図２の「ｄ＝ａ＊ｂ＋ｃ；」（６行目）であると仮定する。
「ｄ＝ａ＊ｂ＋ｃ；」（６行目）は、５行目〜９行目で構成されるコードブロックに含まれており、タイミング解析部２０２は、５行目〜９行目で構成されるコードブロックをクリティカルコードブロックとして抽出する。
後述するように、回路並列化判定部２０４は、このクリティカルコードブロックを中心として、動作記述コード上で評価対象範囲を設定する。
評価対象範囲とは、速度改善の評価の対象にするコードブロックの範囲である。 Step 102: Critical path analysis The timing analysis unit 202 calculates slack from the scheduling result obtained by the high-level synthesis unit 201 or the information subjected to static timing analysis from the generated RTL, and generates a delay information table. Also, a critical path and a critical code block are extracted.
The timing analysis unit 202 writes the delay information table in the delay information storage unit 302 and writes critical path information in the critical path information storage unit 304.
FIG. 5 shows a delay information table generated for the circuit of FIG.
In the example of FIG. 4, it is assumed that the description corresponding to the critical path is “d = a * b + c;” (line 6) of FIG. 2.
The statement “d = a * b + c;” (line 6) is contained in the code block made up of lines 5 to 9, so the timing analysis unit 202 extracts the code block made up of lines 5 to 9 as the critical code block.
As will be described later, the circuit parallelization determination unit 204 sets an evaluation target range on the operation description code with the critical code block as the center.
The evaluation target range is a range of code blocks to be evaluated for speed improvement.
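The extraction in step 102 can be sketched in C as follows. This is a minimal illustration, not the apparatus's actual implementation: it assumes the common static-timing convention that slack is the required time minus the arrival time (this passage does not restate the formula), and the types `DataPath` and `CodeBlock`, the function names, and the idea of tagging each data path with a source line are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one data path and the source line it came from. */
typedef struct {
    int    source_line;  /* line of the behavioral description code */
    double required_ns;  /* time by which the data must arrive */
    double arrival_ns;   /* time at which the data actually arrives */
} DataPath;

/* Hypothetical record: a code block as an inclusive range of lines. */
typedef struct {
    int first_line;
    int last_line;
} CodeBlock;

/* Assumed definition: slack = required time - arrival time. */
double slack_of(const DataPath *p)
{
    assert(p != NULL);
    return p->required_ns - p->arrival_ns;
}

/* The critical path is taken to be the path with the smallest slack. */
int critical_path_index(const DataPath *paths, int n)
{
    assert(paths != NULL && n > 0);
    int worst = 0;
    for (int i = 1; i < n; i++) {
        if (slack_of(&paths[i]) < slack_of(&paths[worst])) {
            worst = i;
        }
    }
    return worst;
}

/* Index of the code block containing `line`, or -1 if there is none. */
int block_containing(const CodeBlock *blocks, int n, int line)
{
    assert(blocks != NULL);
    for (int i = 0; i < n; i++) {
        if (line >= blocks[i].first_line && line <= blocks[i].last_line) {
            return i;
        }
    }
    return -1;
}
```

With a critical path on line 6 and a block covering lines 5 to 9, `block_containing` returns that block, which corresponds to the critical code block of the example.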

ステップ１０３：回路並列化可否解析
回路並列化可否解析部２０３は、クリティカルパス情報記憶部３０４のクリティカルパス情報を入力し、クリティカルコードブロックを中心として回路並列化が可能な並列化可能領域（並列化候補コードブロック）を抽出する。
回路並列化可否解析部２０３は、まず、クリティカルコードブロックよりも後のコードブロックから、並列化対象外コードブロック及び前記並列化対象外コードブロックに後続するコードブロックを動作記述コードから除外する。
そして、回路並列化可否解析部２０３は、除外後の動作記述コードを、並列化可能領域とする。
図２のサンプルプログラムコードでは、「ｘ＝ｘ＋ｆ；」及び「ｃｏｕｎｔｅｒ＋＋」が先行するコードブロックと依存関係があるため、「ｘ＝ｘ＋ｆ；」及び「ｉｆ（ｘ＞０）ｃｏｕｎｔｅｒ＋＋」が並列化対象外コードブロックとなり、除外される。
図２のサンプルプログラムコードでは、「ｘ＝ｘ＋ｆ；」及び「ｉｆ（ｘ＞０）ｃｏｕｎｔｅｒ＋＋」に後続するコードブロックが存在していないが、後続するコードブロックが存在している場合には、後続するコードブロックも除外される。
回路並列化可否解析部２０３は、並列化可能領域を、並列化可能領域記憶部３０５に書き込む。 Step 103: Circuit parallelization possibility analysis The circuit parallelization possibility analysis unit 203 inputs critical path information from the critical path information storage unit 304, and is a parallelizable region (parallelization) that can be circuit-parallelized around a critical code block. Candidate code block) is extracted.
The circuit parallelization possibility analysis unit 203 first looks at the code blocks after the critical code block and excludes from the behavioral description code any code block that is not subject to parallelization, together with the code blocks that follow it.
The circuit parallelization possibility analysis unit 203 then treats the behavioral description code that remains after the exclusion as the parallelizable area.
In the sample program code of FIG. 2, “x = x + f;” and “counter++” depend on a preceding code block, so “x = x + f;” and “if (x > 0) counter++” are code blocks not subject to parallelization and are excluded.
In the sample program code of FIG. 2, no code block follows “x = x + f;” and “if (x > 0) counter++”, but if one did, it would also be excluded.
The circuit parallelization possibility analysis unit 203 writes the parallelizable area in the parallelizable area storage unit 305.
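The exclusion rule of step 103 amounts to truncating the list of code blocks at the first block after the critical code block that cannot be parallelized. The C sketch below illustrates only that rule; the function name and the `not_parallelizable` flags (assumed to come from a separate dependency analysis) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns how many leading code blocks remain in the parallelizable area.
 * Blocks are indexed in source order; `critical` is the index of the
 * critical code block. Scanning starts after the critical block: the
 * first block flagged as not parallelizable is dropped together with
 * every block that follows it.
 */
int parallelizable_block_count(const bool *not_parallelizable,
                               int n_blocks, int critical)
{
    assert(not_parallelizable != NULL);
    assert(critical >= 0 && critical < n_blocks);
    for (int i = critical + 1; i < n_blocks; i++) {
        if (not_parallelizable[i]) {
            return i;  /* blocks 0 .. i-1 are kept */
        }
    }
    return n_blocks;   /* nothing excluded */
}
```

For instance, with five blocks where only the last one is flagged, the function keeps the first four.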

ステップ１０４：タイミング解析
タイミング解析部２０２は、並列化可能領域記憶部３０５から並列化可能領域を入力し、並列化可能領域に対して、ステップ１０２と同様にして、スラックを算出し、遅延情報テーブルを生成する。
この段階で生成される遅延情報テーブルも図５に示すものと同様である。
並列化可能領域記憶部３０５は、生成した遅延情報テーブルを遅延情報記憶部３０２に書き込む。
Step 104: Timing analysis
The timing analysis unit 202 reads the parallelizable area from the parallelizable area storage unit 305, calculates slack for the parallelizable area in the same manner as in step 102, and generates a delay information table.
The delay information table generated at this stage is the same as that shown in FIG.
The parallelizable area storage unit 305 writes the generated delay information table in the delay information storage unit 302.

ステップ１０５：回路並列化判定
ステップ１０５−１：
まず、タイミング解析部２０２が、ステップ１０４で生成した遅延情報テーブルを元に、横軸がスラック、縦軸がデータパス個数のヒストグラムを作成し、ヒストグラム記憶部３０３に書き込む。
また、ここで基準となるスラックの値（閾値）を設計者が設定する。
通常、余裕を見て＋０．２などの値が設定される。
この段階で生成されるヒストグラム例を図６に示す。
図６のヒストグラムは、図５の遅延情報テーブルから生成されたものである。
図６において、ヒストグラムの上部に記述されている値は、データパスの個数である。
ステップ１０５−２：
次に、図１１に示すフローに従って、並列化可能範囲の中から並列化の対象とする箇所が選択される。
以下にて、図１１の各ステップを説明する。 Step 105: Circuit parallelization determination Step 105-1:
First, the timing analysis unit 202 creates a histogram in which the horizontal axis is slack and the vertical axis is the number of data paths based on the delay information table generated in step 104 and writes the histogram in the histogram storage unit 303.
At this point, the designer also sets the slack value to be used as the reference (threshold).
Normally, a value such as +0.2 is chosen to leave some margin.
An example of a histogram generated at this stage is shown in FIG.
The histogram of FIG. 6 is generated from the delay information table of FIG.
In FIG. 6, the value described at the top of the histogram is the number of data paths.
Step 105-2:
Next, according to the flow shown in FIG. 11, a location to be parallelized is selected from the parallelizable range.
Each step of FIG. 11 is described below.

ステップ１０５１：
回路並列化判定部２０４が、並列化可能範囲において、評価対象となるコードブロックの範囲である評価対象範囲を設定する。
回路並列化判定部２０４は、ステップ１０５１の繰り返しの度に、クリティカルコードブロックを中心にして評価対象範囲の広狭を変化させて、複数の評価対象範囲を設定する。
具体的には、回路並列化判定部２０４は、ステップ１０５１の繰り返しごとに、クリティカルコードブロックを中心にして徐々に範囲を狭めていって複数の評価対象範囲を設定する。
例えば、図２の例において、並列化可能範囲で、クリティカルコードブロックを中心とする最も広い評価対象範囲は、１行目〜１５行目である。
このため、回路並列化判定部２０４は、最初に、１行目〜１５行目を評価対象範囲として設定する。
次のステップ１０５１では、例えば、クリティカルコードブロックの前方で、クリティカルコードブロックから最も離れたコードブロックを除外した範囲を評価対象範囲として設定する。
前方でクリティカルコードブロックから最も離れている行は、１行目である。
図２の例では、１行目〜４行目で１つのコードブロックを形成するので、回路並列化判定部２０４は、１行目〜４行目のコードブロックを除外して、５行目〜１５行目のコードブロックを評価対象範囲として設定する。
次のステップ１０５１では、例えば、クリティカルコードブロックの後方で、クリティカルコードブロックから最も離れたコードブロックを除外した範囲を評価対象範囲として設定する。
後方でクリティカルコードブロックから最も離れている行は、１５行目である。
図２の例では、１１行目〜１５行目で１つのコードブロックを形成するので、回路並列化判定部２０４は、１１行目〜１５行目のコードブロックを除外して、１行目〜１０行目のコードブロックを評価対象範囲として設定する。
このようにして、回路並列化判定部２０４は、クリティカルコードブロックを中心にして並列化可能範囲内の全ての組合せが含まれるように、複数の評価対象範囲を設定する。 Step 1051:
The circuit parallelization determination unit 204 sets an evaluation target range that is a range of code blocks to be evaluated in the parallelizable range.
The circuit parallelization determination unit 204 sets a plurality of evaluation target ranges by changing the range of the evaluation target range around the critical code block every time step 1051 is repeated.
Specifically, the circuit parallelization determination unit 204 sets a plurality of evaluation target ranges by gradually narrowing the range around the critical code block every time step 1051 is repeated.
For example, in the example of FIG. 2, the widest evaluation target range centered on the critical code block within the parallelizable range is lines 1 to 15.
For this reason, the circuit parallelization determination unit 204 first sets lines 1 to 15 as the evaluation target range.
In the next iteration of step 1051, for example, the range that excludes the code block farthest from the critical code block on the preceding side is set as the evaluation target range.
The line farthest from the critical code block on the preceding side is line 1.
In the example of FIG. 2, lines 1 to 4 form one code block, so the circuit parallelization determination unit 204 excludes the code block of lines 1 to 4 and sets the code blocks of lines 5 to 15 as the evaluation target range.
In the following iteration of step 1051, for example, the range that excludes the code block farthest from the critical code block on the following side is set as the evaluation target range.
The line farthest from the critical code block on the following side is line 15.
In the example of FIG. 2, lines 11 to 15 form one code block, so the circuit parallelization determination unit 204 excludes the code block of lines 11 to 15 and sets the code blocks of lines 1 to 10 as the evaluation target range.
In this way, the circuit parallelization determination unit 204 sets a plurality of evaluation target ranges so that all combinations within the parallelizable range centering on the critical code block are included.
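One way to picture the set of evaluation target ranges produced by repeating step 1051 is as every contiguous run of code blocks that contains the critical code block. The C sketch below enumerates them under that reading; the type `EvalRange` and the function name are hypothetical, and the order of enumeration (widest range first, then progressively narrower) is a choice made for this sketch rather than the order prescribed by the text.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: an inclusive range of code block indices. */
typedef struct {
    int first_block;
    int last_block;
} EvalRange;

/*
 * Writes every contiguous range of blocks that contains the critical
 * block into `out` and returns how many were written.
 * There are (critical + 1) * (n_blocks - critical) such ranges.
 */
int enumerate_eval_ranges(int n_blocks, int critical,
                          EvalRange *out, int capacity)
{
    assert(out != NULL && n_blocks > 0);
    assert(critical >= 0 && critical < n_blocks);
    int count = 0;
    for (int first = 0; first <= critical; first++) {
        for (int last = n_blocks - 1; last >= critical; last--) {
            assert(count < capacity);
            out[count].first_block = first;
            out[count].last_block = last;
            count++;
        }
    }
    return count;
}
```

With four blocks and the critical block at index 1, this yields six ranges, starting with the full range and ending with the critical block alone.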

ステップ１０５２：
タイミング解析部２０２が、評価対象範囲でのスラックを計算する。
スラックの計算は、ステップ１０４のものと同じである。 Step 1052:
The timing analysis unit 202 calculates slack in the evaluation target range.
The slack calculation is the same as in step 104.

ステップ１０５３：
次に、タイミング解析部２０２が、基準値以下のスラックのデータパスの個数（ヒストグラム上の面積）を計数する。
基準値は、ステップ１０５−１で指定された基準値（＋０．２）である。 Step 1053:
Next, the timing analysis unit 202 counts the number of data paths whose slack is at or below the reference value (the corresponding area on the histogram).
The reference value is the one specified in step 105-1 (+0.2).
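The histogram of step 105-1 and the count of step 1053 can be sketched in C as follows. The function names, the bin layout, and the slack values used in any example are hypothetical; only the threshold of +0.2 comes from the text.

```c
#include <assert.h>
#include <stddef.h>

/* Number of data paths whose slack is at or below `threshold`. */
int count_at_or_below(const double *slack, int n, double threshold)
{
    assert(slack != NULL);
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (slack[i] <= threshold) {
            count++;
        }
    }
    return count;
}

/*
 * Fills `bins` with the number of data paths per slack interval.
 * Bin k covers [lowest + k * bin_width, lowest + (k + 1) * bin_width);
 * values outside the covered span are clamped into the end bins.
 */
void slack_histogram(const double *slack, int n,
                     double lowest, double bin_width,
                     int *bins, int n_bins)
{
    assert(slack != NULL && bins != NULL);
    assert(n_bins > 0 && bin_width > 0.0);
    for (int k = 0; k < n_bins; k++) {
        bins[k] = 0;
    }
    for (int i = 0; i < n; i++) {
        double pos = (slack[i] - lowest) / bin_width;
        int k = (pos < 0.0) ? 0 : (int)pos;
        if (k >= n_bins) {
            k = n_bins - 1;
        }
        bins[k]++;
    }
}
```

The value returned by `count_at_or_below` plays the role of the histogram area that is compared across evaluation target ranges.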

なお、１行目〜１５行目についてのスラックは既にステップ１０４で計算され、また、基準値以下のスラックのデータパスの個数もステップ１０５−１で計数されているので、１行目〜１５行目の評価対象範囲については、ステップ１０５２及びステップ１０５３は省略される。 Note that the slack for lines 1 to 15 has already been calculated in step 104, and the number of data paths whose slack is at or below the reference value has already been counted in step 105-1, so steps 1052 and 1053 are omitted for the evaluation target range of lines 1 to 15.

ステップ１０５４：
次に、タイミング解析部２０２は、ステップ１０５３での計数結果をヒストグラム記憶部３０３に書き込む。 Step 1054:
Next, the timing analysis unit 202 writes the count result at step 1053 in the histogram storage unit 303.

ステップ１０５５：
回路並列化判定部２０４は、並列化可能領域で設定可能な全ての評価対象範囲に対してステップ１０５２〜１０５４の処理を行ったかを判断し、未処理の評価対象範囲があれば、ステップ１０５１以降の処理を繰り返す。
一方、全ての評価対象範囲に対する処理が完了していれば、ステップ１０５６に進む。 Step 1055:
The circuit parallelization determination unit 204 determines whether the processing of steps 1052 to 1054 has been performed for every evaluation target range that can be set in the parallelizable area and, if any evaluation target range remains unprocessed, repeats the processing from step 1051 onward.
On the other hand, if the processing for all the evaluation target ranges is completed, the process proceeds to step 1056.

ステップ１０５６：
回路並列化判定部２０４は、複数の評価対象範囲の中から、並列化を行う評価対象範囲を選択する。
具体的には、回路並列化判定部２０４は、ステップ１０５３で計数されたデータパスの個数が最も大きい評価対象範囲を選択する。
また、最も大きい個数を持つ評価対象範囲が複数ある場合は、回路並列化後の動作記述コードのコードサイズが最も小さいものを選択する。
最も小さいコードサイズのものを選択するのは、回路規模の増加を抑える目的である。
例えば、評価対象範囲が図２の１行目〜１５行目である場合は、基準値＋０．２以下となる個数は、図５及び図６に示すように、５個である。
また、評価対象範囲が図２の５行目〜１５行目である場合は、回路構成は図７のようになり、基準値＋０．２以下となる個数は、図８及び図９に示すように、５個である。
また、評価対象範囲が図２の１行目〜１０行目である場合は、回路構成図、遅延情報テーブル及びヒストグラムの図示は省略するが、基準値＋０．２以下となる個数は、５個である。
これら以外の評価対象範囲の場合は、基準値＋０．２以下となる個数は、４個以下である。
そして、個数が最も大きく、かつ、コードサイズが最も小さいのは、図２の１行目〜１０行目の評価対象範囲であり、回路並列化判定部２０４は、この評価対象範囲を選択する。
なお、図２の１行目〜１０行目の評価対象範囲の回路構成は、図１０のようになる。 Step 1056:
The circuit parallelization determination unit 204 selects an evaluation target range to be parallelized from a plurality of evaluation target ranges.
Specifically, the circuit parallelization determination unit 204 selects the evaluation target range having the largest number of data paths counted in Step 1053.
In addition, when there are a plurality of evaluation target ranges having the largest number, the one with the smallest code size of the behavioral description code after circuit parallelization is selected.
The purpose of selecting the smallest code size is to suppress an increase in circuit scale.
For example, when the evaluation target range is lines 1 to 15 of FIG. 2, the number of data paths whose slack is at or below the reference value of +0.2 is five, as shown in FIG. 5 and FIG. 6.
When the evaluation target range is lines 5 to 15 of FIG. 2, the circuit configuration is as shown in FIG. 7, and the number of data paths whose slack is at or below the reference value of +0.2 is five, as shown in FIG. 8 and FIG. 9.
When the evaluation target range is lines 1 to 10 of FIG. 2, the circuit configuration diagram, delay information table, and histogram are not illustrated, but the number of data paths whose slack is at or below the reference value of +0.2 is five.
For any other evaluation target range, the number of data paths whose slack is at or below the reference value of +0.2 is four or fewer.
The evaluation target range with the largest count and the smallest code size is lines 1 to 10 of FIG. 2, and the circuit parallelization determination unit 204 selects this evaluation target range.
The circuit configuration for the evaluation target range of lines 1 to 10 of FIG. 2 is as shown in FIG. 10.
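The selection rule of step 1056 (largest count first, smallest code size as the tie-breaker) can be sketched in C as follows. The type `Candidate` and the function names are hypothetical, and the number of source lines in the range is used as a stand-in for the code size after parallelization, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: one evaluation target range and its count. */
typedef struct {
    int first_line;
    int last_line;
    int paths_at_or_below;  /* count obtained in step 1053 */
} Candidate;

/* Assumed proxy for code size: number of source lines in the range. */
static int code_size(const Candidate *c)
{
    return c->last_line - c->first_line + 1;
}

/* Index of the range with the largest count; ties go to the smaller size. */
int select_range(const Candidate *c, int n)
{
    assert(c != NULL && n > 0);
    int best = 0;
    for (int i = 1; i < n; i++) {
        int more_paths = c[i].paths_at_or_below > c[best].paths_at_or_below;
        int same_paths = c[i].paths_at_or_below == c[best].paths_at_or_below;
        if (more_paths || (same_paths && code_size(&c[i]) < code_size(&c[best]))) {
            best = i;
        }
    }
    return best;
}
```

Applied to the example above (lines 1 to 15, 5 to 15, and 1 to 10, each with a count of five), the function picks lines 1 to 10 because that range is the shortest.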

ステップ１０６：並列回路生成（図３）
並列回路生成部２０５が、ステップ１０５において選択された評価対象範囲のコードブロックに対応する回路（図１０）を並列化する。 Step 106: Parallel circuit generation (FIG. 3)
The parallel circuit generation unit 205 parallelizes the circuit (FIG. 10) corresponding to the code block in the evaluation target range selected in step 105.

なお、ここで、ステップ１０５で並列化された回路の規模が大きい場合は、ステップ１０５に戻り、次にスラックの面積が大きい評価対象範囲を選択し、許容できる回路規模内において、最適な改善効果があるものを選択してもよい。 If the circuit parallelized in step 105 is large in scale, the process may return to step 105 and select the evaluation target range with the next largest slack area, so that the range with the best improvement effect within the allowable circuit scale is chosen.

次に消費電力の改善を行う例を説明する。
以下では、図２のサンプルプログラムコードを用いて説明を行う。
消費電力改善モードの際の半導体設計支援装置１０の動作フローを図１２に示す。 Next, an example of improving power consumption will be described.
Hereinafter, description will be made using the sample program code of FIG.
FIG. 12 shows an operation flow of the semiconductor design support apparatus 10 in the power consumption improvement mode.

図１２において、ステップ１０１〜１０４は図６と同一である。 In FIG. 12, steps 101 to 104 are the same as those in FIG. 6.

ステップ２０１：再合成
設計対象回路がデフォルトの動作周波数から低減された低減動作周波数（例えば、１／２の周波数）で動作するとの設定で、高位合成部２０１が高位合成を再度行い、ＲＴＬを得る。
再度合成するのは、周波数が低減されたことで、回路規模が削減される、つまりフリップフロップ（ＦＦ）が削減され、消費電力が下がる可能性があるためである。
なお、
Step 201: Resynthesis
With the design target circuit set to operate at a reduced operating frequency that is lower than the default operating frequency (for example, half the frequency), the high-level synthesis unit 201 performs high-level synthesis again and obtains RTL.
Synthesis is performed again because lowering the frequency may reduce the circuit scale, that is, the number of flip-flops (FFs), and thereby lower power consumption.
In addition,

ステップ２０２：並列回路生成
並列回路生成部２０５が、図１３に示すように、複製回路、入力制御回路及び出力制御回路を生成し、これらを上記再合成したＲＴＬとつなげ、回路規模を得る。 Step 202: Parallel circuit generation As shown in FIG. 13, the parallel circuit generation unit 205 generates a duplicate circuit, an input control circuit, and an output control circuit, and connects them to the re-synthesized RTL to obtain a circuit scale.

ステップ２０３：回路並列化判定
次に、回路並列化判定部２０４が、並列化可能領域においてクリティカルコードブロックを中心に広狭を変化させて評価対象範囲を設定する。
これは速度改善モードと同様の処理である。 Step 203: Circuit Parallelization Determination Next, the circuit parallelization determination unit 204 sets the evaluation target range by changing the width around the critical code block in the parallelizable area.
This is the same processing as in the speed improvement mode.

ステップ２０１〜２０３を、全ての評価対象範囲について処理した後、回路並列化判定部２０４は、回路並列化後の設計対象回路の回路規模が最も小さくなる評価対象範囲を選択する。
このように、消費電力改善モードでは、回路並列化判定部２０４は、評価対象範囲のコードブロックに対応するデータパスで構成される回路が並列化された場合の設計対象回路の回路規模を評価して、評価対象範囲を選択する。 After processing steps 201 to 203 for all the evaluation target ranges, the circuit parallelization determination unit 204 selects an evaluation target range in which the circuit scale of the design target circuit after the circuit parallelization is the smallest.
As described above, in the power consumption improvement mode, the circuit parallelization determination unit 204 selects an evaluation target range by evaluating the circuit scale of the design target circuit that results when the circuit made up of the data paths corresponding to the code blocks in the evaluation target range is parallelized.

なお、上記では、ステップ２０１において、設計対象回路が低減動作周波数で動作する設定で高位合成を行うことで、設計対象回路が低減動作周波数する場合の回路規模を評価する例を説明した。
しかしながら、設計対象回路が低減動作周波数で動作すると設計対象回路に要求される条件が満たされない場合は、ステップ２０１を省略し、設計対象回路がデフォルトの動作周波数で動作する場合の設計対象回路の回路規模を評価するようにしてもよい。 In the above description, in step 201, the example in which the circuit scale is evaluated when the design target circuit operates at the reduced operating frequency by performing high-level synthesis with the setting in which the design target circuit operates at the reduced operating frequency has been described.
However, if the conditions required of the design target circuit are not satisfied when it operates at the reduced operating frequency, step 201 may be omitted and the circuit scale of the design target circuit may be evaluated for operation at the default operating frequency.

また、速度改善モード及び消費電力改善モードのいずれにおいても、性能表示部１０６が、並列化対象箇所とその性能である速度及び回路規模を、設計者にて通知する。 Further, in both the speed improvement mode and the power consumption improvement mode, the performance display unit 106 notifies the designer of the parallelization target portion and the speed and circuit scale as the performance.

また、本実施の形態では、回路並列化判定部２０４が解析を自動的に行い、並列化する範囲の決定を行っているが、回路並列化可否解析部２０３、並列回路生成部２０５を用いて、設計者が並列化する範囲を決めることも可能である。 In the present embodiment, the circuit parallelization determination unit 204 automatically performs analysis and determines the range to be parallelized, but the circuit parallelization availability analysis unit 203 and the parallel circuit generation unit 205 are used. It is also possible for the designer to determine the range to be parallelized.

以上、本実施の形態に係る半導体設計支援装置によれば、設計者の能力に依存することなく低消費電力なハードウェア構成のＲＴＬを短時間で得るという効果がある。
また、要求となる動作周波数に到達しない場合でも、設計者が問題個所の特定と改善を行うことなく、自動的に速度改善が短時間に行えるといった効果がある。 As described above, according to the semiconductor design support apparatus according to the present embodiment, there is an effect that an RTL having a hardware configuration with low power consumption can be obtained in a short time without depending on the ability of the designer.
In addition, even when the required operating frequency is not reached, speed can be improved automatically in a short time without the designer having to identify and fix the problem location.

以上、本実施の形態では、
回路の動作を記述した動作記述を入力として、高位合成処理を行い、ＨＤＬ記述を出力する半導体設計支援装置であって、
前記高位合成された結果であるスケジューリング結果、あるいはＲＴＬから、遅延情報を取得し、前記動作記述と遅延情報を関連付け、また遅延情報のヒストグラムを作成するタイミング解析手段と、
並列化回路と並列化に必要な入出力回路を自動生成する並列回路生成手段と、
並列化が可能な領域を前記動作記述から抽出する回路並列化可否解析手段と、
複数の並列化可能領域から、前記ヒストグラムと前記並列化回路と入出力回路の回路規模を取得し、
その中から、速度を満足する回路、あるいは周波数を改善できる回路を抽出する手段を有する半導体設計支援装置を説明した。 As described above, in the present embodiment,
A semiconductor design support apparatus that performs a high-level synthesis process and outputs an HDL description with an operation description describing the operation of a circuit as an input,
Timing analysis means for acquiring delay information from the scheduling result that is a result of the high-level synthesis, or RTL, associating the behavior description with delay information, and creating a histogram of delay information;
Parallel circuit generation means for automatically generating a parallel circuit and an input / output circuit necessary for parallelization;
A circuit parallelization possibility analyzing means for extracting a parallelizable area from the behavior description;
and means for obtaining, for a plurality of parallelizable areas, the histogram and the circuit scales of the parallelized circuit and the input/output circuits,
and for extracting from among them a circuit that satisfies the speed requirement or a circuit that can improve the frequency; a semiconductor design support apparatus having these means has been described.

Finally, a hardware configuration example of the semiconductor design support apparatus 10 shown in this embodiment will be described with reference to FIG. 15.
The semiconductor design support apparatus 10 is a computer, and each element of the semiconductor design support apparatus 10 can be realized by a program.
In the hardware configuration of the semiconductor design support apparatus 10, an arithmetic device 901, an external storage device 902, a main storage device 903, a communication device 904, and an input/output device 905 are connected to a bus.

The arithmetic device 901 is a CPU (Central Processing Unit) that executes programs.
The external storage device 902 is, for example, a ROM (Read Only Memory), a flash memory, or a hard disk device.
The main storage device 903 is a RAM (Random Access Memory).
The storage unit 300 is realized by the main storage device 903 or the external storage device 902.
The communication device 904 is, for example, a NIC (Network Interface Card).
The input/output device 905 is, for example, a mouse, a keyboard, a display device, or the like.

The program is normally stored in the external storage device 902 and, once loaded into the main storage device 903, is sequentially read into the arithmetic device 901 and executed.
The program realizes the functions described as "... unit" in FIG. 1 (excluding the elements in the storage unit 300; the same applies below).
Further, an operating system (OS) is also stored in the external storage device 902. At least a part of the OS is loaded into the main storage device 903, and the arithmetic device 901 executes the program that realizes the functions of the "... unit" elements shown in FIG. 1 while executing the OS.
In the description of this embodiment, information, data, signal values, and variable values indicating the results of the processing described as "judgment of ...", "determination of ...", "extraction of ...", "setting of ...", "selection of ...", "calculation of ...", "counting of ...", "analysis of ...", "evaluation of ...", "generation of ...", "input of ...", "output of ...", and the like are stored in the main storage device 903 as files.
Encryption keys, decryption keys, random number values, and parameters may also be stored in the main storage device 903 as files.

Note that the configuration of FIG. 15 is merely one example of the hardware configuration of the semiconductor design support apparatus 10; the hardware configuration of the semiconductor design support apparatus 10 is not limited to the configuration shown in FIG. 15 and may be another configuration.

DESCRIPTION OF SYMBOLS: 10 semiconductor design support apparatus, 100 input/output unit, 101 high-level language display unit, 102 parallelizable region display unit, 103 parallelization instruction unit, 104 delay information display unit, 105 mode designation unit, 106 performance display unit, 200 data processing unit, 201 high-level synthesis unit, 202 timing analysis unit, 203 circuit parallelization feasibility analysis unit, 204 circuit parallelization determination unit, 205 parallel circuit generation unit, 300 storage unit, 301 scheduling result/RTL storage unit, 302 delay information storage unit, 303 histogram storage unit, 304 critical path information storage unit, 305 parallelizable region storage unit, 306 parallelized circuit code storage unit.

Claims

A circuit design support apparatus comprising:
a code block extraction unit that extracts, as a critical code block, from among a plurality of code blocks included in behavior description code of a design target circuit, the code block corresponding to a critical path among a plurality of data paths generated by high-level synthesis of the plurality of code blocks; and
an evaluation unit that excludes, from the plurality of code blocks, a non-parallelization-target code block, which is a code block located after the critical code block and not subject to circuit parallelization, together with the code blocks that follow the non-parallelization-target code block, thereby identifying as parallelization candidate code blocks the code blocks, including the critical code block, that are candidates for circuit parallelization; sets, within the parallelization candidate code blocks, a plurality of evaluation target ranges, each being a range of code blocks to be evaluated, by widening and narrowing the range around the critical code block; evaluates, for each evaluation target range, a characteristic of the data paths corresponding to the code blocks in that evaluation target range; and selects, from among the plurality of evaluation target ranges, the evaluation target range to be subjected to circuit parallelization.
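The candidate identification and range setting recited in this claim can be sketched roughly as follows. This is only an illustration under stated assumptions: code blocks are modeled as an ordered list of labels, each evaluation target range is taken to be a contiguous run of blocks that contains the critical code block, and the names `candidate_blocks` and `evaluation_ranges` are hypothetical rather than taken from the specification.

```python
def candidate_blocks(blocks, critical, non_target):
    """Drop the first non-target block after the critical block,
    and everything that follows it (the exclusion rule of the claim).

    Assumption: `blocks` is in program order and `non_target` is a
    set of labels that must not be parallelized.
    """
    ci = blocks.index(critical)
    end = len(blocks)
    for i in range(ci + 1, len(blocks)):
        if blocks[i] in non_target:
            end = i
            break
    return blocks[:end]


def evaluation_ranges(candidates, critical):
    """Enumerate contiguous ranges that contain the critical block,
    growing outward from it on both sides."""
    ci = candidates.index(critical)
    ranges = []
    for start in range(ci, -1, -1):
        for stop in range(ci, len(candidates)):
            ranges.append(candidates[start:stop + 1])
    return ranges


if __name__ == "__main__":
    cands = candidate_blocks(["A", "B", "C", "D", "E"], "B", {"D"})
    for r in evaluation_ranges(cands, "B"):
        print(r)
```

With blocks A to E, critical block B, and D marked as not parallelizable, the candidates are A, B, and C, and the ranges range from B alone up to the full run A, B, C.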

The circuit design support apparatus according to claim 1, wherein the evaluation unit evaluates, as the characteristic of the data paths corresponding to the code blocks in the evaluation target range, the delay time in those data paths.

The circuit design support apparatus according to claim 1, wherein the evaluation unit evaluates, as the characteristic of the data paths corresponding to the code blocks in the evaluation target range, the scale of the design target circuit when the circuit formed by those data paths is parallelized.

The circuit design support apparatus according to claim 1, wherein the evaluation unit identifies the parallelization candidate code blocks by excluding, from the plurality of code blocks, the non-parallelization-target code block, the code blocks that follow the non-parallelization-target code block, and any code block designated by a user of the circuit design support apparatus.

The circuit design support apparatus according to claim 1, wherein the evaluation unit sets the plurality of evaluation target ranges so that all combinations within the parallelization candidate code blocks, centered on the critical code block, are included.

The circuit design support apparatus according to claim 2, wherein the evaluation unit selects the evaluation target range having the largest number of data paths whose delay time is equal to or greater than a threshold.

The circuit design support apparatus according to claim 6, wherein, when two or more evaluation target ranges have the largest number of data paths whose delay time is equal to or greater than the threshold, the evaluation unit selects, from among those evaluation target ranges, the one for which the code size of the behavior description code after circuit parallelization is smallest.
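The selection rule of the two preceding claims (largest number of data paths at or above the delay threshold, with ties broken by the smallest code size after parallelization) can be sketched as below. The data layout is an assumption made for illustration: `delays` maps each code block to the delays of its data paths, and `code_size` maps each range, as a tuple of block labels, to the size of the parallelized behavior description code. Neither structure nor the name `select_range` appears in the specification.

```python
def select_range(ranges, delays, code_size, threshold):
    """Pick the range with the most data paths whose delay is at
    least `threshold`; break ties by the smallest code size.

    Assumptions: delays[block] is a list of path delays, and
    code_size[tuple(range)] is the post-parallelization code size.
    """
    def slow_paths(r):
        return sum(1 for b in r for d in delays[b] if d >= threshold)

    best = max(slow_paths(r) for r in ranges)
    tied = [r for r in ranges if slow_paths(r) == best]
    return min(tied, key=lambda r: code_size[tuple(r)])


if __name__ == "__main__":
    ranges = [["B"], ["B", "C"], ["A", "B"], ["A", "B", "C"]]
    delays = {"A": [3.0], "B": [9.0, 8.5], "C": [8.2]}
    code_size = {("B",): 120, ("B", "C"): 150,
                 ("A", "B"): 140, ("A", "B", "C"): 200}
    print(select_range(ranges, delays, code_size, threshold=8.0))
```

In this example two ranges each cover three slow data paths, so the tie-break on code size decides between them.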

The circuit design support apparatus according to claim 3, wherein the evaluation unit selects the evaluation target range for which the scale of the design target circuit after circuit parallelization is smallest.

The circuit design support apparatus according to claim 3, wherein the evaluation unit evaluates the scale of the design target circuit for the case where the design target circuit operates at a reduced operating frequency that is lower than a default operating frequency.

The circuit design support apparatus according to claim 9, wherein, when a condition required of the design target circuit would not be satisfied if the design target circuit operated at the reduced operating frequency, the evaluation unit evaluates the scale of the design target circuit for the case where the design target circuit operates at the default operating frequency.
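The fallback between the reduced and the default operating frequency described in the two preceding claims can be illustrated with a short sketch. The callables `scale_at` (circuit scale estimated at a given frequency) and `meets_requirements` (whether the required condition holds at that frequency) are hypothetical stand-ins; the specification does not define such interfaces.

```python
def evaluate_scale(scale_at, meets_requirements, default_hz, reduced_hz):
    """Evaluate circuit scale at the reduced frequency when the
    required condition still holds there, else at the default one.

    Assumptions: scale_at(freq) returns an estimated circuit scale,
    and meets_requirements(freq) returns True or False.
    Returns the frequency used and the scale evaluated at it.
    """
    if meets_requirements(reduced_hz):
        return reduced_hz, scale_at(reduced_hz)
    return default_hz, scale_at(default_hz)


if __name__ == "__main__":
    # Toy model: scale grows with frequency; at least 80 MHz is required.
    scale = lambda hz: int(hz / 1e6) * 10
    ok = lambda hz: hz >= 80e6
    print(evaluate_scale(scale, ok, default_hz=200e6, reduced_hz=100e6))
    print(evaluate_scale(scale, ok, default_hz=200e6, reduced_hz=50e6))
```

The toy scale model is only there to make the example executable; in practice the scale would come from the synthesis results for each evaluation target range.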

The circuit design support apparatus according to claim 1, wherein the evaluation unit:
causes a user of the circuit design support apparatus to select either a first evaluation mode, in which the delay time in the data paths corresponding to the code blocks in the evaluation target range is evaluated, or a second evaluation mode, in which the scale of the design target circuit when the circuit formed by the data paths corresponding to the code blocks in the evaluation target range is parallelized is evaluated;
when the first evaluation mode is selected by the user, evaluates, for each evaluation target range, the delay time in the data paths corresponding to the code blocks in that evaluation target range; and
when the second evaluation mode is selected by the user, evaluates, for each evaluation target range, the scale of the design target circuit when the circuit formed by the data paths corresponding to the code blocks in that evaluation target range is parallelized.

The circuit design support apparatus according to claim 1, further comprising an evaluation result display unit that displays the result of the evaluation by the evaluation unit.

A program that causes a computer to execute:
a code block extraction process of extracting, as a critical code block, from among a plurality of code blocks included in behavior description code of a design target circuit, the code block corresponding to a critical path among a plurality of data paths generated by high-level synthesis of the plurality of code blocks; and
an evaluation process of excluding, from the plurality of code blocks, a non-parallelization-target code block, which is a code block located after the critical code block and not subject to circuit parallelization, together with the code blocks that follow the non-parallelization-target code block, thereby identifying as parallelization candidate code blocks the code blocks, including the critical code block, that are candidates for circuit parallelization; setting, within the parallelization candidate code blocks, a plurality of evaluation target ranges, each being a range of code blocks to be evaluated, by widening and narrowing the range around the critical code block; evaluating, for each evaluation target range, a characteristic of the data paths corresponding to the code blocks in that evaluation target range; and selecting, from among the plurality of evaluation target ranges, the evaluation target range to be subjected to circuit parallelization.