JP2017220149A - Data processor, data processing method and data processing program

Info

Publication number: JP2017220149A
Application number: JP2016116161A
Authority: JP
Inventors: 善之大野; Yoshiyuki Ono
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2016-06-10
Filing date: 2016-06-10
Publication date: 2017-12-14
Anticipated expiration: 2036-06-10
Also published as: JP6748372B2

Abstract

PROBLEM TO BE SOLVED: To efficiently perform, in a data-parallel manner, processing that takes as input a large number of extraction candidate data arrays and a plurality of pieces of discrimination condition formula information, and extracts from the extraction candidate data arrays only the candidate data that satisfies all of the discrimination condition formulae.

SOLUTION: A continuous candidate data discrimination part performs continuous data discrimination processing on a plurality of extraction candidate data arrays to be processed by calculating one or more of the input discrimination condition formulae. A discontinuous candidate data discrimination part performs discontinuous data discrimination processing on the extraction candidate data arrays designated by a candidate ID list, which contains index information indicating the extraction candidate data arrays, by calculating one or more of the input discrimination condition formulae. A candidate ID update determination part determines, on the basis of the discrimination result of the continuous candidate data discrimination part, whether the candidate IDs included in the candidate ID list should be updated. A candidate ID update part updates the candidate ID list when the candidate ID update determination part determines that the list should be updated, and after execution of the discontinuous candidate data discrimination part.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a data processing device, a data processing method, and a data processing program.

In recent years, extraction calculation has been used in various kinds of data processing (for example, image processing, audio processing, and other statistical processing). Here, "extraction calculation" refers to a calculation that extracts, from a large number of input data strings, only the input data that satisfies all of a plurality of discrimination conditions. A typical extraction calculation is detection processing that detects a specific object, such as a face, in image data.

Non-Patent Document 1 discloses a detection processing method. In this method, a specific object is detected as follows. First, the image data is divided into partial regions called windows. Next, for each window, a score obtained from a specific pixel region is compared with a predetermined threshold. Finally, these steps are repeated a plurality of times to detect the specific object.

FIG. 15 shows code in which the detection processing method disclosed in Non-Patent Document 1 is written as a program to be run on a computer.

Patent Document 1 discloses a method of performing the above detection processing in parallel. In the method disclosed in Patent Document 1, the process of calculating a score from a pixel region (corresponding to the inner loop in FIG. 15) is executed in parallel.

Patent Document 1: JP 2010-204947 A

Non-Patent Document 1: Rapid Object Detection using a Boosted Cascade of Simple Features, Conference of Computer Vision And Pattern Recognition, 2001.

However, the parallel detection processing disclosed in Patent Document 1 performs every score calculation and every threshold determination for every candidate window, so the amount of calculation is large.

As FIG. 15 shows, once a window fails a threshold determination during the repeated score calculations and threshold determinations, the remaining score calculations and threshold determinations need not be performed for that window. The parallel detection processing disclosed in Patent Document 1 performs them anyway, which is why its amount of calculation is large.
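The saving from early rejection can be illustrated with a minimal Python sketch. It is not the code of FIG. 15: the helper `count_evaluations`, the three stages, and the convention that a window passes when its score is at least the threshold are all assumptions made for illustration. The sketch only counts how many score calculations each strategy performs.

```python
from typing import Callable, List, Tuple

# A stage is (score function, threshold); assumed rule: pass when score >= threshold.
Stage = Tuple[Callable[[int], float], float]

def count_evaluations(windows: List[int], stages: List[Stage],
                      early_exit: bool) -> Tuple[List[int], int]:
    """Return (windows passing every stage, number of score calculations)."""
    evaluations = 0
    accepted = []
    for w in windows:
        passed = True
        for score, threshold in stages:
            evaluations += 1
            if score(w) < threshold:
                passed = False
                if early_exit:
                    break  # skip the remaining stages for this window
        if passed:
            accepted.append(w)
    return accepted, evaluations

# Hypothetical stages: keep multiples of 2, then of 3, then of 5.
stages: List[Stage] = [
    (lambda w: float(w % 2 == 0), 1.0),
    (lambda w: float(w % 3 == 0), 1.0),
    (lambda w: float(w % 5 == 0), 1.0),
]
windows = list(range(30))

exhaustive = count_evaluations(windows, stages, early_exit=False)
cascaded = count_evaluations(windows, stages, early_exit=True)
print(exhaustive, cascaded)
```

Both calls accept the same windows; only the number of score calculations differs, and the gap widens as more windows are rejected in early stages.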

An object of the present invention is to provide a data processing device, a data processing method, and a data processing program that solve any of the problems described above.

The data processing device of the present invention comprises: continuous candidate data discrimination means for performing continuous data discrimination processing on a plurality of extraction candidate data strings to be processed by calculating one or more of the input discrimination conditional expressions; discontinuous candidate data discrimination means for performing discontinuous data discrimination processing on the extraction candidate data strings specified by a candidate ID list, which includes index information indicating each extraction candidate data string, by calculating one or more of the input discrimination conditional expressions; candidate ID update determination means for determining, based on the discrimination result of the continuous candidate data discrimination means, whether to update the candidate IDs included in the candidate ID list; and candidate ID update means for updating the candidate ID list when the candidate ID update determination means determines that the candidate ID list is to be updated, or after execution of the discontinuous candidate data discrimination means. By repeating the processing of the continuous candidate data discrimination means, the discontinuous candidate data discrimination means, the candidate ID update determination means, and the candidate ID update means a plurality of times, candidate data corresponding to the candidate IDs updated by the candidate ID update means is extracted from the plurality of extraction candidate data strings.

The data processing method of the present invention is a method in which a data processing device receives a plurality of extraction candidate data strings and a plurality of pieces of discrimination conditional expression information and extracts, from the extraction candidate data strings, only the candidate data that satisfies all of the discrimination conditional expressions. The method includes: performing continuous data discrimination processing on the extraction candidate data strings to be processed by calculating one or more of the input discrimination conditional expressions; determining, based on the result of the determination using the discrimination conditional expressions, whether to update the candidate IDs specified in a candidate ID list that includes index information indicating each extraction candidate data string; when it is not determined that the candidate ID list is to be updated, performing, for the next discrimination conditional expression, the condition determination using that expression and the candidate ID update determination; when it is determined that the candidate ID list is to be updated, updating the candidate ID list; and performing discontinuous data discrimination processing on the extraction candidate data strings specified by the candidate ID list by calculating one or more of the input discrimination conditional expressions. By repeating these processes a plurality of times, candidate data corresponding to the updated candidate IDs is extracted.

The data processing program of the present invention causes a computer to receive a plurality of extraction candidate data strings and a plurality of pieces of discrimination conditional expression information and to extract, from the extraction candidate data strings, only the candidate data that satisfies all of the discrimination conditional expressions. The program causes the computer to execute: continuous candidate data discrimination processing that performs continuous data discrimination processing on the extraction candidate data strings to be processed by calculating one or more of the input discrimination conditional expressions; discontinuous candidate data discrimination processing that performs discontinuous data discrimination processing on the extraction candidate data strings specified by a candidate ID list, which includes index information indicating each extraction candidate data string, by calculating one or more of the input discrimination conditional expressions; candidate ID update determination processing that determines, based on the discrimination result of the continuous candidate data discrimination processing, whether to update the candidate IDs included in the candidate ID list; and candidate ID update processing that updates the candidate ID list when the candidate ID update determination processing determines that the candidate ID list is to be updated, and after execution of the discontinuous candidate data discrimination processing. By repeating the continuous candidate data discrimination processing, the discontinuous candidate data discrimination processing, the candidate ID update determination processing, and the candidate ID update processing a plurality of times, the program causes the computer to extract the candidate data corresponding to the updated candidate IDs.

According to the present invention, it is possible to switch between the continuous candidate data discrimination processing and the discontinuous candidate data discrimination processing. After the discrimination processing for each discriminant is executed, which processing to run for the next discriminant is decided dynamically, so that the more efficient parallel discrimination processing can be selected and executed.

FIG. 1 is a block diagram illustrating the configuration of a data processing device according to a first embodiment of the present invention.
FIG. 2 is a block diagram illustrating the functional configuration of the data processing device according to the first embodiment of the present invention.
FIG. 3 is a flowchart illustrating an outline of the operation of the data processing device according to the first embodiment of the present invention.
FIG. 4 is a diagram showing a specific example of the data stored in the storage unit of the data processing device according to the first embodiment of the present invention.
FIG. 5 is a diagram showing a specific example of candidate data discrimination conditional expression information according to the first embodiment of the present invention.
FIG. 6 is a diagram showing a specific example of candidate ID update determination threshold information according to the first embodiment of the present invention.
FIG. 7 is an example of a program outlining the continuous candidate data discrimination processing executed by the continuous candidate data discrimination unit according to the first embodiment of the present invention.
FIG. 8 is an example of a program outlining the candidate ID update determination processing executed by the candidate ID update determination unit according to the first embodiment of the present invention.
FIG. 9 is an example of a program outlining the candidate ID update processing executed by the candidate ID update unit according to the first embodiment of the present invention.
FIG. 10 is a diagram showing part of the data stored in the storage unit before and after the candidate ID update processing according to the first embodiment of the present invention.
FIG. 11 is an example of a program outlining the discontinuous candidate data discrimination processing executed by the discontinuous candidate data discrimination unit according to the first embodiment of the present invention.
FIG. 12 is a diagram showing a specific example of candidate ID update determination threshold information according to a second embodiment of the present invention.
FIG. 13 is a diagram showing a specific example of candidate ID update determination threshold information according to a third embodiment of the present invention.
FIG. 14 is a block diagram illustrating the functional configuration of a data processing device according to a fourth embodiment of the present invention.
FIG. 15 is an example of code in which the detection processing method of Non-Patent Document 1 is written as a program to be run on a computer.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings. The configurations described in the following embodiments are merely examples, and the technical scope of the present invention is not limited to them.

Next, modes for carrying out the invention will be described in detail with reference to the drawings.

[First Embodiment]
[Description of configuration]
A data processing device according to a first embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram illustrating a functional configuration of the data processing device according to the first embodiment.

Referring to FIG. 1, the data processing device 100 according to the first embodiment of the present invention includes an arithmetic processing unit 101, a storage unit 102, and an input unit 103. These components are communicably connected to one another by any communication means (for example, a communication bus or a communication network). An outline of each component is given below.

The arithmetic processing unit 101 is, for example, a dedicated or general-purpose processor (such as a CPU (central processing unit)) that can execute arbitrary arithmetic processing on data stored in the storage unit 102 described later. The arithmetic processing unit 101 may, for example, be able to execute an arbitrary software program (a computer program, hereinafter sometimes simply called a "program") stored in the storage unit 102. The program need not be held in the storage unit 102 and may be held in any device (not shown) inside or outside the data processing device 100. In that case, the arithmetic processing unit 101 reads and executes the program as necessary.

The storage unit 102 is a storage device (memory) capable of storing arbitrary data. The storage unit 102 may be a volatile or non-volatile memory device implemented as a semiconductor storage device or the like. The storage unit 102 is not limited to these and can be realized by any storage device (for example, a magnetic storage device, a magneto-optical storage device, or an optical storage device).

The input unit 103 is an input device that can input arbitrary data to the data processing device 100. The input unit 103 can be realized with any input device suited to the configuration of the data processing device 100. For example, the input unit 103 may be a network device that receives data via a network, an interface device through which a user of the data processing device 100 enters data directly, or a reading device that receives data from an arbitrary storage medium (recording medium). In the first embodiment, the input data, which is the data set to be processed, is input to the data processing device 100 via the input unit 103 and held (stored) in the storage unit 102.

The data processing device 100 according to the first embodiment configured as described above performs extraction processing that extracts, from the input data strings, only those that satisfy all of a plurality of discrimination conditions. More specifically, in the data processing device 100 according to the first embodiment, the arithmetic processing unit 101 performs the extraction processing on the data (input data strings) stored in the storage unit 102. The arithmetic processing unit 101 may perform the extraction processing by executing a specific program. Alternatively, it may perform the extraction processing using built-in logic or the like.

The operation of the data processing device 100 according to the first embodiment will now be described with reference to the drawings.

FIG. 2 is a processing block diagram illustrating an outline of the operation of the arithmetic processing unit 101 of the data processing device 100 according to the first embodiment. As illustrated in FIG. 2, the main components of the arithmetic processing unit 101 are, broadly, a continuous candidate data discrimination unit 201, a discontinuous candidate data discrimination unit 202, a candidate ID (identification; identity) update determination unit 203, and a candidate ID update unit 204.

As described later, the combination of the continuous candidate data discrimination unit 201, the discontinuous candidate data discrimination unit 202, the candidate ID update determination unit 203, and the candidate ID update unit 204 functions as extraction means that extracts, from the input data strings, only those that satisfy all of a plurality of discrimination conditions.

The continuous candidate data discrimination unit 201 executes the continuous candidate data discrimination processing described later. The discontinuous candidate data discrimination unit 202 executes the discontinuous candidate data discrimination processing described later. The candidate ID update determination unit 203 executes the candidate ID update determination processing described later. The candidate ID update unit 204 executes the candidate ID update processing described later.

The continuous candidate data discrimination unit 201 performs continuous data discrimination processing on all of the input extraction candidate data strings by calculating one or more of the input discrimination conditional expressions. The discontinuous candidate data discrimination unit 202 receives a candidate ID list that includes index information indicating each extraction candidate data string, and performs discontinuous data discrimination processing only on the extraction candidate data strings specified by that list, by calculating one or more of the input discrimination conditional expressions. The candidate ID update determination unit 203 determines, based on the discrimination result of the continuous candidate data discrimination unit 201, whether to update the candidate IDs included in the candidate ID list. The candidate ID update unit 204 updates the candidate ID list when the candidate ID update determination unit 203 determines that the list is to be updated, and after execution by the discontinuous candidate data discrimination unit 202.

The storage unit 102 (FIG. 1) holds candidate data discrimination conditional expression information 500 and candidate ID update determination threshold information 600.

FIG. 3 is a flowchart illustrating an outline of the operation of the data processing device 100. The execution order of the processes in the flowchart of FIG. 3 may be changed as long as the processing result is not affected. The processing in each step is described later.

FIG. 4 is a diagram illustrating part of the data stored in the storage unit 102 of the data processing device 100 according to the present embodiment.

As illustrated in FIG. 4, the storage unit 102 holds an input data array 401, a total number of candidates 402, a current number of candidates 403, a candidate ID array 404, a candidate flag array 405, and a candidate ID updated flag 406. The input data array 401, the candidate ID array 404, and the candidate flag array 405 are each placed in a contiguous storage area. A contiguous storage area may be one whose addresses are physically contiguous or one that is logically contiguous.

The input data array 401 is an array that holds a set of m pieces of data (m is a natural number), that is, the input data. Hereinafter, the input data array 401 may be referred to as the "input data array d", and its i-th element (i is an integer of 0 or more) as d[i].

The total number of candidates 402 and the current number of candidates 403 are, respectively, the total number of extraction candidates and the number of extraction candidates at a given point of reference. When the total number of candidates 402 is n (n is a natural number), both the total number of candidates 402 and the current number of candidates 403 hold n at the start of data processing.

The candidate ID array 404 is an array that holds the candidate IDs of up to n extraction candidates. Hereinafter, the candidate ID array 404 may be referred to as the "candidate ID array p", and its i-th element (i is an integer of 0 or more) as p[i]. At the start of data processing, the candidate ID array p holds values such that p[i] = i.

The candidate flag array 405 is an array of the same size as the candidate ID array 404. Hereinafter, the candidate flag array 405 may be referred to as the "candidate flag array q", and its i-th element (i is an integer of 0 or more) as q[i]. Element q[i] indicates whether the candidate ID held in element p[i] of the candidate ID array 404 is still an extraction candidate as a result of the discrimination processing. It takes the value true or false, meaning "is a candidate" or "is not a candidate", respectively. At the start of data processing, all elements of the candidate flag array 405 hold true.

At the start of data processing, the current number of candidates 403 is n.

The candidate ID updated flag 406 is a flag indicating whether the candidate ID array 404 has been updated from its initial state. It takes the value true or false, meaning "updated" or "not updated", respectively. At the start of data processing, the candidate ID updated flag 406 holds false (not updated).
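The initial contents of the storage unit described above can be sketched as a small Python structure. The class name `ExtractionState`, its field names, and the helper `initial_state` are hypothetical; only the reference numerals and initial values in the comments come from the description.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ExtractionState:
    d: List[int]      # input data array 401 (m elements)
    total: int        # total number of candidates 402 (n)
    current: int      # current number of candidates 403
    p: List[int]      # candidate ID array 404
    q: List[bool]     # candidate flag array 405
    updated: bool     # candidate ID updated flag 406

def initial_state(d: List[int], n: int) -> ExtractionState:
    """Build the state held at the start of data processing."""
    return ExtractionState(
        d=list(d),
        total=n,
        current=n,              # both counters start at n
        p=list(range(n)),       # p[i] = i
        q=[True] * n,           # every candidate starts as "is a candidate"
        updated=False,          # "not updated"
    )

state = initial_state(d=[7, 3, 9, 4], n=4)
print(state)
```

Python lists stand in here for the contiguous storage areas; an array library would be the closer match if data-parallel execution were the goal.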

FIGS. 5 and 6 are diagrams showing examples of the candidate data discrimination conditional expression information 500 and the candidate ID update determination threshold information 600, respectively.

As shown in FIG. 5, the candidate data discrimination conditional expression information 500 consists of the number of discrimination conditional expressions L (L is a natural number) and L discrimination conditional expressions C(0), C(1), ..., C(L-1). Each of the discrimination conditional expressions C(0) to C(L-1) consists of an evaluation value S(x), calculated with a natural number x as its argument, and a conditional expression T(y) that compares S(x) with a threshold. The argument x of the evaluation value S(x) is assumed to be, for example, a candidate ID or the value obtained by referring to the data array d with the candidate ID as an index. The following description uses the case where the candidate ID is given as the argument.
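One way to picture this structure is as a pair of functions per condition. The sketch below is only an illustration: the class `Condition`, the two example conditions, and the reading that T is applied to y = S(x) are assumptions, not the contents of FIG. 5.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Condition:
    evaluate: Callable[[int], float]   # S(x): evaluation value for argument x
    test: Callable[[float], bool]      # T(y): comparison with a threshold (assumed y = S(x))

    def holds(self, x: int) -> bool:
        return self.test(self.evaluate(x))

# Hypothetical conditions, with the candidate ID passed as x.
conditions: List[Condition] = [
    Condition(evaluate=lambda x: x % 4, test=lambda y: y < 2),     # C(0)
    Condition(evaluate=lambda x: x * x, test=lambda y: y >= 16),   # C(1)
]
L = len(conditions)  # number of discrimination conditional expressions

# Candidate IDs 0..9 that satisfy every condition.
survivors = [x for x in range(10) if all(c.holds(x) for c in conditions)]
print(L, survivors)
```

Keeping S and T separate mirrors the description: the evaluation value can be costly to compute, while the threshold comparison is a cheap final step.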

As shown in FIG. 6, the candidate ID update determination threshold information 600 consists of a candidate ID update determination threshold, which is a decimal value of at least 0 and less than 1.

Next, the operation of the data processing device 100 will be described with reference to the flowchart of FIG. 3. The description uses the candidate data discrimination conditional expression information 500 and the candidate ID update determination threshold information 600 shown in FIGS. 5 and 6 as specific examples.

As shown in the flowchart of FIG. 3, the arithmetic processing unit 101 of the data processing device 100 performs L iterations, with i running from 0 to L-1, where L is the number of discrimination conditional expressions (steps S301 and S302).

反復処理の初めに、演算処理部１０２は、候補ＩＤ更新済フラグ４０６を参照する（ステップＳ３０３）。候補ＩＤが更新済みではない場合（ステップＳ３０３の「更新されていない」）、演算処理部１０２の連続候補データ判別部２０１は、連続候補データ判別処理を実行する（ステップＳ３０４）。候補ＩＤが更新済みの場合（ステップＳ３０３の「更新されている」）、演算処理部１０２の不連続候補データ判別部２０２は、不連続候補データ判別処理を実行する（ステップＳ３０５）。 At the beginning of the iterative process, the arithmetic processing unit 102 refers to the candidate ID updated flag 406 (step S303). If the candidate ID has not been updated (“not updated” in step S303), the continuous candidate data determination unit 201 of the arithmetic processing unit 102 executes a continuous candidate data determination process (step S304). When the candidate ID has been updated (“updated” in step S303), the discontinuous candidate data determination unit 202 of the arithmetic processing unit 102 executes a discontinuous candidate data determination process (step S305).

After the continuous candidate data discrimination unit 201 has executed the continuous candidate data discrimination process, the candidate ID update determination unit 203 of the arithmetic processing unit 101 executes the candidate ID update determination process to determine whether to update the candidate IDs (step S306). If it determines that they are to be updated ("update" in step S306), the candidate ID update unit 204 of the arithmetic processing unit 101 executes the candidate ID update process to update the candidate IDs (step S307), which completes one iteration (step S308). If it determines that they are not to be updated ("do not update" in step S306), the arithmetic processing unit 101 completes the iteration without updating the candidate IDs (step S308).

On the other hand, after the discontinuous candidate data discrimination unit 202 has executed the discontinuous candidate data discrimination process, the candidate ID update determination process of the candidate ID update determination unit 203 is skipped. The candidate ID update unit 204 of the arithmetic processing unit 101 executes the candidate ID update process to update the candidate IDs (step S307), which completes one iteration (step S308).

When the L iterations have finished (NO in step S302), the current candidate count 403 holds the number of candidates that satisfy all the discrimination formulas, and the first elements of the candidate ID array 404, as many as the current candidate count, hold the IDs of the candidates that satisfy all the discrimination formulas.
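The control flow of FIG. 3 described above can be sketched as a single function. This is an illustrative reading, not the programs of FIGS. 7 to 11: the function name `extract_candidates`, the variable names, and the representation of each condition as a plain predicate on the candidate ID are assumptions. The loops are written sequentially, although each per-candidate loop has independent iterations.

```python
def extract_candidates(n, conditions, threshold):
    """Return the IDs in 0..n-1 that satisfy every predicate in `conditions`."""
    p = list(range(n))   # candidate ID array 404
    q = [True] * n       # candidate flag array 405
    current = n          # current candidate count 403
    updated = False      # candidate-ID-updated flag 406

    def compact():
        # Candidate ID update (S307): keep only flagged candidates, packed at the head.
        k = 0
        for j in range(current):
            if q[j]:
                p[k], q[k] = p[j], True
                k += 1
        return k

    for i, cond in enumerate(conditions):            # S301, S302
        if not updated:                              # S303
            for j in range(current):                 # S304: index j is the candidate ID
                if not cond(j):
                    q[j] = False
            remaining = sum(q[:current])             # S306: update determination
            last = i == len(conditions) - 1
            if not (last or n == 0 or remaining / n < threshold):
                continue                             # stay in continuous mode
        else:
            for j in range(current):                 # S305: candidate ID is p[j]
                if not cond(p[j]):
                    q[j] = False
        current = compact()                          # S307
        updated = True
    return p[:current]
```

For example, with the hypothetical predicates "even", "multiple of 3", and "less than 20" over 30 candidates, the function returns `[0, 6, 12, 18]` whether the threshold triggers an early update (0.6) or only the final one (0.0).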

The following describes the continuous candidate data discrimination process executed by the continuous candidate data discrimination unit 201 of the arithmetic processing unit 101, the candidate ID update determination process executed by the candidate ID update determination unit 203, the candidate ID update process executed by the candidate ID update unit 204, and the discontinuous candidate data discrimination process executed by the discontinuous candidate data discrimination unit 202.

FIG. 7 shows an example of a program for the continuous candidate data discrimination process executed by the continuous candidate data discrimination unit 201 of the arithmetic processing unit 101.

The continuous candidate data discrimination unit 201 discriminates, for each of the extraction candidates counted by the current candidate count, whether the i-th discrimination condition formula C(i) is satisfied. To do so, it performs the following processing on every extraction candidate.

First, the continuous candidate data discrimination unit 201 obtains the candidate ID (CandidateID). While the continuous candidate data discrimination process is in use, the candidate ID array 404 has not yet been updated, so the unit does not refer to the candidate ID array 404; the loop index j itself serves as the candidate ID. Using the candidate ID as the argument, the unit calculates the evaluation value of the discrimination condition formula C(i) and then determines whether the evaluation value satisfies the conditional expression of C(i). If the condition is not satisfied, the unit updates the j-th element q[j] of the candidate flag array 405 to false. Because the processing of each extraction candidate is independent of the others, the continuous candidate data discrimination unit 201 can execute the continuous candidate data discrimination process in parallel.
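A minimal sketch of this step follows. It is not the program of FIG. 7; the function name `continuous_pass` and the use of a plain predicate for C(i) are assumptions. The loop is written sequentially, but no iteration depends on another, which is what makes a data-parallel implementation possible.

```python
def continuous_pass(q, current, cond):
    """Apply C(i) to candidates 0..current-1, clearing q[j] for those that fail."""
    for j in range(current):
        candidate_id = j           # the loop index itself is the candidate ID
        if not cond(candidate_id):
            q[j] = False           # candidate j is no longer an extraction candidate
```

Note that a flag that is already false simply stays false, so the pass evaluates C(i) even for candidates that were eliminated earlier.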

FIG. 8 shows an example of a program for the candidate ID update determination process executed by the candidate ID update determination unit 203 of the arithmetic processing unit 101.

The candidate ID update determination unit 203 refers to the candidate flag array 405 to count the candidates that still remain as extraction candidates (lines 0 to 6 of the program). It then calculates the survival rate, that is, the ratio of the number of remaining candidates to the total number of candidates (line 9). If the survival rate falls below the candidate ID update determination threshold, or if the discrimination just performed used the last discrimination condition formula, the unit determines that the candidate IDs are to be updated (lines 10 to 12). Otherwise, it determines that the candidate IDs are not to be updated.
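The determination can be sketched as below. This is not the program of FIG. 8, so its line numbers do not correspond to the ones cited above; the function and parameter names are assumptions, as is the handling of an empty candidate set.

```python
def should_update(q, total, threshold, i, num_conditions):
    """Decide whether the candidate IDs should be updated after evaluating C(i)."""
    if total == 0:
        return True                       # assumption: nothing to scan, so just compact
    remaining = sum(1 for flag in q[:total] if flag)   # count surviving candidates
    survival_rate = remaining / total
    is_last = i == num_conditions - 1     # the last formula always triggers an update
    return survival_rate < threshold or is_last
```

The comparison is strict, following the description: a survival rate exactly equal to the threshold does not trigger an update.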

FIG. 9 shows an example of a program for the candidate ID update process executed by the candidate ID update unit 204 of the arithmetic processing unit 101.

The candidate ID update unit 204 rebuilds the candidate ID array 404 and the candidate flag array 405. It refers to the first elements of both arrays, as many as the current candidate count, and writes only the IDs of the candidates that remain as extraction candidates (p[i] for which q[i] is true) into the candidate ID array p, starting from its head. Finally, it updates the current candidate count with the number of remaining candidates.

FIG. 10 shows the state of the current candidate count 403, the candidate ID array 404, and the candidate flag array 405 in the storage unit 102 before and after a candidate ID update.

In the candidate flag array 405, T denotes true and F denotes false. Only the elements of the candidate ID array 404 (0, 1, 4, ...) whose corresponding element of the candidate flag array 405 is true before the update are stored, in order, in the candidate ID array after the update. Their number (n') is stored in the current candidate count 403, and the first n' elements of the candidate flag array 405 are set to true. The (n'+1)-th and subsequent elements of the candidate ID array 404 and the candidate flag array 405 are not used afterwards, so they may hold any values.
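The update amounts to an in-place compaction of the two arrays. The sketch below is illustrative and is not the program of FIG. 9; the function name is an assumption, and the sample data only reproduces the leading IDs (0, 1, 4) mentioned above.

```python
def update_candidate_ids(p, q, current):
    """Pack surviving candidate IDs at the head of p; return the new count n'."""
    k = 0
    for j in range(current):
        if q[j]:
            p[k] = p[j]   # safe in place because k never exceeds j
            q[k] = True
            k += 1
    return k              # elements from index k onward are left as they are


p = [0, 1, 2, 3, 4, 5]
q = [True, True, False, False, True, False]
n_prime = update_candidate_ids(p, q, 6)
```

After the call, `n_prime` is 3 and the first three elements of `p` are 0, 1, 4. This sequential version preserves order; a parallel implementation would typically compute the destination index with a prefix sum over the flags, which is a design choice outside what the description specifies.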

FIG. 11 shows an example of a program for the discontinuous candidate data discrimination process executed by the discontinuous candidate data discrimination unit 202 of the arithmetic processing unit 101.

The discontinuous candidate data discrimination unit 202 discriminates, for each of the extraction candidates counted by the current candidate count, whether the i-th discrimination condition formula C(i) is satisfied.

The discontinuous candidate data discrimination unit 202 performs almost the same processing as the continuous candidate data discrimination process of the continuous candidate data discrimination unit 201. The only difference is how the candidate ID (CandidateID) is obtained.

Specifically, in the continuous candidate data discrimination unit 201, the loop index j serves as the candidate ID. In the discontinuous candidate data discrimination unit 202, by contrast, the candidate ID is the element p[j] of the candidate ID array indicated by the loop index j. In other words, the discontinuous candidate data discrimination unit 202 receives a candidate ID list, which is index information indicating the extraction candidate data.

The rest of the processing is the same as in the continuous candidate data discrimination process. As with that process, the processing of each extraction candidate is independent of the others, so the discontinuous candidate data discrimination unit 202 can execute the discontinuous data discrimination process in parallel.
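A minimal sketch of the discontinuous pass, for comparison with the continuous one, is given below. It is not the program of FIG. 11; the function name and the predicate form of C(i) are assumptions.

```python
def discontinuous_pass(p, q, current, cond):
    """Apply C(i) to the candidates listed in p[0..current-1]."""
    for j in range(current):
        candidate_id = p[j]        # indirect lookup through the candidate ID array
        if not cond(candidate_id):
            q[j] = False
```

The only change from the continuous pass is the indirection `p[j]`. Any data referenced through the candidate ID is therefore accessed in list order rather than contiguously.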

[Description of Effects]
Next, the effects of the first embodiment will be described.

As described above, the continuous candidate data discrimination process and the discontinuous candidate data discrimination process handle each extraction candidate independently of the others, so the continuous candidate data discrimination unit 201 and the discontinuous candidate data discrimination unit 202 can execute them in parallel.

Furthermore, when the continuous candidate data discrimination unit 201 executes the continuous candidate data discrimination process in parallel, the references to the storage unit 102 during evaluation value calculation are either contiguous (sequential access) or skip a fixed number of elements (stride access), so memory can be accessed efficiently. When the discontinuous candidate data discrimination unit 202 executes the discontinuous candidate data discrimination process in parallel, on the other hand, the references to the storage unit 102 during evaluation value calculation use the candidate ID as the index (random access, list access), so memory access performance may impair the efficiency of parallel processing. If the candidate IDs are never updated, the continuous candidate data discrimination unit 201 can handle all the discrimination formulas with the continuous candidate data discrimination process. In that case, however, the discrimination process is also applied to candidates that are no longer extraction candidates (candidates whose candidate flag has become false), which results in unnecessary computation.

Therefore, the first embodiment of the present invention makes it possible to switch between the continuous candidate data discrimination process and the discontinuous candidate data discrimination process. After the discrimination process for each discrimination formula has been executed, it dynamically decides which process to use for the next discrimination formula, so that the more efficient parallel discrimination process is selected and executed.

[Second Embodiment]
In the example of the first embodiment, as shown in FIG. 6, the candidate ID update determination threshold information 600 consists of a single candidate ID update determination threshold.

In contrast, the data processing apparatus according to the second embodiment of the present invention uses, as candidate ID update determination threshold information 600A, a candidate ID update determination threshold list that holds a plurality of candidate ID update determination thresholds (for example, as many as there are discrimination condition formulas), as shown in FIG. 12.

In the data processing apparatus according to the second embodiment of the present invention, the candidate ID update determination unit 203 changes the candidate ID update determination threshold used in the candidate ID update determination process according to the number of discrimination condition formulas that remain.
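One possible way to select the threshold is sketched below. The description does not specify how the list is indexed, so the indexing by the number of remaining formulas, the function name, and the threshold values are all assumptions made for illustration.

```python
def threshold_for(thresholds, i, num_conditions):
    """Pick the update threshold to use after evaluating C(i).

    Assumption: thresholds[r] applies when r formulas remain to be evaluated.
    """
    remaining_formulas = num_conditions - 1 - i
    return thresholds[remaining_formulas]


# Hypothetical list for L = 4 formulas: the more formulas remain, the more
# readily an update is triggered, since later passes would all benefit from it.
thresholds = [0.0, 0.3, 0.5, 0.7]
```

With this list, the threshold after C(0) is 0.7 and the threshold after C(3) is 0.0. The monotonic shape of the list is a design guess, not something the description states.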

[Third Embodiment]
In the example of the first embodiment, as shown in FIG. 6, the candidate ID update determination threshold is given as input in the form of the candidate ID update determination threshold information 600.

In contrast, in the data processing apparatus according to the third embodiment of the present invention, the candidate ID update determination unit 203 computes the candidate ID update determination threshold within the candidate ID update determination process.

As an example, consider a configuration in which, as shown in FIG. 13, a continuous candidate data discrimination cost coefficient and a discontinuous candidate data discrimination cost coefficient are given for each processor as candidate ID update determination threshold information 600B, and the threshold for the candidate ID update determination is computed from them.

For processor A, for example, the continuous candidate data discrimination cost coefficient is 1.0 and the discontinuous candidate data discrimination cost coefficient is 3.0. This means that the processing cost per extraction candidate element is 1.0 in the continuous candidate data discrimination process of the continuous candidate data discrimination unit 201 and 3.0 in the discontinuous candidate data discrimination process of the discontinuous candidate data discrimination unit 202.

Let n be the total number of candidates and z the number of remaining candidates. If the candidate ID update unit 204 does not execute the candidate ID update process, the total cost of one discrimination pass is 1.0 × n, because the next continuous candidate data discrimination process of the continuous candidate data discrimination unit 201 processes all candidates. If the candidate ID update unit 204 does execute the candidate ID update process, the total cost is 3.0 × z. The candidate ID update determination process of the candidate ID update determination unit 203 therefore compares the two total costs and selects the option with the lower cost.

That is, the candidate ID update determination unit 203 counts the remaining candidates based on the result of the discrimination performed by the continuous candidate data discrimination unit 201, calculates the discontinuous candidate data discrimination cost from the counted number of remaining candidates and the discontinuous candidate data discrimination cost coefficient, calculates the continuous candidate data discrimination cost from the total number of extraction candidates and the continuous candidate data discrimination cost coefficient, and determines that the candidate IDs are to be updated when the discontinuous candidate data discrimination cost is smaller than the continuous candidate data discrimination cost.
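The cost comparison can be sketched as follows, using the coefficients given for processor A. The function and parameter names are assumptions; the candidate counts in the usage note are made up for illustration.

```python
def should_update_by_cost(n, z, continuous_coeff, discontinuous_coeff):
    """Update the candidate IDs only if the discontinuous pass would be cheaper."""
    continuous_cost = continuous_coeff * n        # next pass scans all n candidates
    discontinuous_cost = discontinuous_coeff * z  # next pass scans only z survivors
    return discontinuous_cost < continuous_cost
```

With coefficients 1.0 and 3.0 and n = 900, an update is chosen for z = 299 (897 < 900) but not for z = 300 (900 is not less than 900). Equivalently, the implied survival-rate threshold is the ratio of the two coefficients, here 1.0 / 3.0.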

[Fourth Embodiment]
The candidate ID update determination threshold information 600B described above may be given as input data by the user of the data processing apparatus 100, or it may be updated dynamically within the data processing apparatus 100.

For example, in the data processing apparatus according to the fourth embodiment of the present invention, the arithmetic processing unit 101A further includes a cost measurement unit 205, as shown in FIG. 14. The cost measurement unit 205 can update the per-processor continuous candidate data discrimination cost coefficient and discontinuous candidate data discrimination cost coefficient as multiple extraction processes are carried out.

Specifically, the cost measurement unit 205 measures cost information such as the execution time, power, communication volume, and storage size required for processing. It measures the cost incurred in executing the continuous candidate data discrimination unit 201 and the discontinuous candidate data discrimination unit 202 and updates the per-processor continuous candidate data discrimination cost coefficient and discontinuous candidate data discrimination cost coefficient accordingly.
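One way such a measurement could feed back into the coefficients is sketched below, using execution time as the cost. The description does not say how the coefficients are recalculated, so the exponential smoothing, the class name `CostCoefficients`, and the helper `timed` are all assumptions.

```python
import time


class CostCoefficients:
    """Per-element cost coefficients for the two discrimination processes."""

    def __init__(self, continuous, discontinuous, smoothing=0.5):
        self.coeff = {"continuous": continuous, "discontinuous": discontinuous}
        self.smoothing = smoothing  # weight given to the newest measurement

    def record(self, kind, cost, num_elements):
        # Blend the measured per-element cost into the stored coefficient.
        if num_elements <= 0:
            return
        per_element = cost / num_elements
        s = self.smoothing
        self.coeff[kind] = (1 - s) * self.coeff[kind] + s * per_element


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

A caller would wrap each discrimination pass in `timed` and pass the elapsed time, together with the number of candidates processed, to `record`. Other cost measures named in the description, such as power or communication volume, could be recorded through the same interface.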

The cost measurement unit 205 may also measure the above cost and update the per-processor candidate ID update determination threshold.

The present invention is not limited to the above embodiments as they stand. At the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining a plurality of the constituent elements.

Each unit of the data processing apparatus may be realized using a combination of hardware and software. In a form that combines hardware and software, a data processing program is loaded into RAM (random access memory), and hardware such as a control unit (CPU (central processing unit)) is operated based on the data processing program, whereby each unit is realized as the corresponding means. The data processing program may also be recorded on a recording medium and distributed. The data processing program recorded on the recording medium is read into memory via a wired connection, a wireless connection, or the recording medium itself, and operates the control unit and the like. Examples of the recording medium include optical disks, magnetic disks, semiconductor memory devices, and hard disks.

To express the above embodiments in another way, they can be realized by causing a computer that is to operate as the data processing apparatus to function, based on the data processing program loaded into RAM, as the continuous candidate data discrimination unit 201, the discontinuous candidate data discrimination unit 202, the candidate ID update determination unit 203, the candidate ID update unit 204, and the cost measurement unit 205.

The specific configuration of the present invention is not limited to the embodiments described above, and modifications that do not depart from the gist of the invention are also included in the invention.

Although the present invention has been described above with reference to the embodiments, the present invention is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

The present invention is applicable to data processing apparatuses that efficiently perform extraction computations in parallel on a GPGPU (General-Purpose computing on Graphics Processing Units) or on a CPU with vector instructions.

100 Data processing apparatus
101, 101A Arithmetic processing unit (processor)
102 Storage unit (memory)
103 Input unit
201 Continuous candidate data discrimination unit
202 Discontinuous candidate data discrimination unit
203 Candidate ID update determination unit
204 Candidate ID update unit
205 Cost measurement unit
500 Candidate data discrimination condition formula information
600, 600A, 600B Candidate ID update determination threshold information

Claims

A data processing apparatus comprising:
continuous candidate data discrimination means for performing continuous data discrimination processing on a plurality of extraction candidate data arrays to be processed, by calculating one or more discrimination condition formulas among input discrimination condition formulas;
discontinuous candidate data discrimination means for performing discontinuous data discrimination processing on the extraction candidate data arrays designated by a candidate ID list that contains index information indicating each of the extraction candidate data arrays, by calculating one or more condition formulas among the input discrimination condition formulas;
candidate ID update determination means for determining, based on the result of the discrimination performed by the continuous candidate data discrimination means, whether to update the candidate IDs contained in the candidate ID list; and
candidate ID update means for updating the candidate ID list when the candidate ID update determination means determines that the candidate ID list is to be updated, and after execution of the discontinuous candidate data discrimination means,
wherein the processing by the continuous candidate data discrimination means, the discontinuous candidate data discrimination means, the candidate ID update determination means, and the candidate ID update means is iterated a plurality of times, whereby, from among the plurality of extraction candidate data arrays, the candidate data corresponding to the candidate IDs updated by the candidate ID update means are extracted.

The data processing apparatus according to claim 1, wherein
the data processing apparatus holds a candidate ID update determination threshold, and
the candidate ID update determination means determines that the candidate ID list is to be updated when, as a result of the discrimination performed by the continuous candidate data discrimination means, the ratio of the number of remaining candidates to the total number of extraction candidates falls below the candidate ID update determination threshold.

The data processing apparatus according to claim 2, wherein
the candidate ID update determination threshold held by the data processing apparatus comprises a plurality of values corresponding to configurations of the data processing apparatus, and
the candidate ID update determination means selects the candidate ID update determination threshold that corresponds to the configuration of the data processing apparatus and performs the update determination process.

The data processing apparatus according to claim 2 or 3, further comprising cost measurement means for measuring the cost incurred in executing the continuous candidate data discrimination means and the discontinuous candidate data discrimination means and updating the candidate ID update determination threshold.

The data processing apparatus according to claim 1, wherein
the data processing apparatus holds a continuous candidate data discrimination cost coefficient and a discontinuous candidate data discrimination cost coefficient, and
the candidate ID update determination means counts the remaining candidates based on the result of the discrimination performed by the continuous candidate data discrimination means, calculates a discontinuous candidate data discrimination cost from the counted number of remaining candidates and the discontinuous candidate data discrimination cost coefficient, calculates a continuous candidate data discrimination cost from the total number of extraction candidates and the continuous candidate data discrimination cost coefficient, and determines that the candidate ID list is to be updated when the discontinuous candidate data discrimination cost is smaller than the continuous candidate data discrimination cost.

The data processing apparatus according to claim 5, wherein
the continuous candidate data discrimination cost coefficient and the discontinuous candidate data discrimination cost coefficient held by the data processing apparatus each comprise a plurality of values corresponding to configurations of the data processing apparatus, and
the candidate ID update determination means selects the continuous candidate data discrimination cost coefficient and the discontinuous candidate data discrimination cost coefficient that correspond to the configuration of the data processing apparatus and performs the update determination process.

The data processing apparatus according to claim 5 or 6, further comprising cost measurement means for measuring the cost incurred in executing the continuous candidate data discrimination means and the discontinuous candidate data discrimination means and updating the continuous candidate data discrimination cost coefficient and the discontinuous candidate data discrimination cost coefficient.

A data processing method in which a data processing apparatus receives a plurality of extraction candidate data strings and a plurality of pieces of discrimination conditional expression information, and extracts from the extraction candidate data strings only the candidate data that satisfies all of the discrimination conditional expressions, the method comprising:
performing continuous data discrimination processing on the plurality of extraction candidate data strings to be processed, by evaluating one or more of the input discrimination conditional expressions;
determining, based on the result of the determination using the discrimination conditional expression, whether to update the candidate IDs included in a candidate ID list containing index information indicating each extraction candidate data string;
when it is not determined that the candidate ID list is to be updated, performing, for the next discrimination conditional expression, the condition determination using that expression and the candidate ID update determination;
when it is determined that the candidate ID list is to be updated, updating the candidate ID list;
performing discontinuous data discrimination processing on the extraction candidate data strings designated by the updated candidate ID list, by evaluating one or more of the input discrimination conditional expressions; and
extracting the candidate data corresponding to the updated candidate IDs by repeating the above processing a plurality of times.
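The flow of this method can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: the function name `extract_candidates`, the one-column-per-expression data layout, and the fixed `update_threshold` used for the update determination are all hypothetical, and the sketch stays in discontinuous mode once the candidate ID list has been built.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[float], bool]  # hypothetical: one conditional expression


def extract_candidates(
    columns: Sequence[Sequence[float]],
    predicates: Sequence[Predicate],
    update_threshold: float = 0.5,
) -> List[int]:
    """Return the IDs of candidates that satisfy every predicate.

    columns[k][i] is the value that predicates[k] tests for candidate i.
    update_threshold is an assumed heuristic for the update determination.
    """
    n = len(columns[0]) if columns else 0
    candidate_ids = list(range(n))  # candidate ID list, initially every row
    mask = [True] * n               # accumulated result of the continuous passes
    dense = True                    # True while continuous processing is used

    for col, pred in zip(columns, predicates):
        if dense:
            # Continuous discrimination: evaluate the expression on every row.
            mask = [m and pred(v) for m, v in zip(mask, col)]
            survivors = sum(mask)
            # Candidate ID update determination (assumed rule): compact the
            # list once few enough rows survive.
            if survivors <= update_threshold * n:
                candidate_ids = [i for i in range(n) if mask[i]]
                dense = False
        else:
            # Discontinuous discrimination: evaluate only the listed rows,
            # then update the candidate ID list with the survivors.
            candidate_ids = [i for i in candidate_ids if pred(col[i])]

    if dense:
        # The list was never compacted; derive it from the final mask.
        candidate_ids = [i for i, m in enumerate(mask) if m]
    return candidate_ids
```

For example, with three columns and the conditions `x > 2`, `x < 55`, and `x == 1`, the continuous pass runs over all rows until half or fewer remain, after which only the rows named in the candidate ID list are read.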

A data processing program that causes a computer to receive a plurality of extraction candidate data strings and a plurality of pieces of discrimination conditional expression information and to extract, from the extraction candidate data strings, only the candidate data that satisfies all of the discrimination conditional expressions, the program causing the computer to execute:
continuous candidate data discrimination processing that performs continuous data discrimination processing on the plurality of extraction candidate data strings to be processed, by evaluating one or more of the input discrimination conditional expressions;
discontinuous candidate data discrimination processing that performs discontinuous data discrimination processing on the extraction candidate data strings designated by a candidate ID list containing index information indicating each extraction candidate data string, by evaluating one or more of the input discrimination conditional expressions;
candidate ID update determination processing that determines, based on the discrimination result of the continuous candidate data discrimination processing, whether to update the candidate IDs included in the candidate ID list; and
candidate ID update processing that updates the candidate ID list when the candidate ID update determination processing determines that the candidate ID list is to be updated, and after execution of the discontinuous candidate data discrimination processing,
wherein the candidate data corresponding to the updated candidate IDs is extracted by repeating the continuous candidate data discrimination processing, the discontinuous candidate data discrimination processing, the candidate ID update determination processing, and the candidate ID update processing a plurality of times.