JP5340185B2

JP5340185B2 - File processing apparatus and program

Info

Publication number: JP5340185B2
Application number: JP2010009363A
Authority: JP
Inventors: 吉則和泉; 金子　　豊; 真也竹内; ▲民▼錫黄
Original assignee: Japan Broadcasting Corp
Current assignee: Japan Broadcasting Corp
Priority date: 2010-01-19
Filing date: 2010-01-19
Publication date: 2013-11-13
Anticipated expiration: 2030-01-19
Also published as: JP2011150428A

Abstract

PROBLEM TO BE SOLVED: To achieve high reliability by detecting collisions of digest values with a simple method and thereby raising the detection accuracy of mismatched parts.

SOLUTION: In a file processing apparatus 1, when a mismatched part of a file is detected using the digest values of block data obtained by dividing the file, a continuity determination part 15 examines each predetermined data section affected by file editing and, when the number of mismatched parts in that section is smaller than the predetermined number of divisions m, detects the matching parts in that section as collision-generating parts. A continuation part 16 changes the collision-generating parts into mismatched parts, so that all block data in the predetermined data section affected by file editing become mismatched parts. Consequently, collisions can be detected and eliminated without comparing large volumes of file data.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a file processing apparatus and program that detect identical or mismatched portions of files such as video and audio by comparing digest values such as hash values, and more particularly to a technique for detecting mismatched portions with high accuracy.

Broadcasting stations have been introducing systems that handle video and audio signals as digital data in files, driven by faster video and audio signal processing and growing storage capacity. Such a system transfers large files such as video and audio between multiple file recording devices and file processing devices, copying them to generate duplicate files. If the original file is edited after a duplicate has been generated, the duplicate must be edited in the same way. In general, this synchronization between the original file and the duplicate file is performed not on the files themselves but on digest values calculated from the data of each file. This is because the original and duplicate files are large volumes of actual data with a high processing load, whereas digest values are small and have a low processing load. Here, the original file and the duplicate file are said to be synchronized when the data of the two files match.

In general, digest values are generated by hash functions such as SHA (Secure Hash Algorithm)-1, SHA-2, MD (Message Digest Algorithm) 1, MD2, MD3, MD4, and MD5. Hash functions are used because the probability of a collision is low, data can be compressed into a fixed-length digest value, digest values are uniformly distributed, and computation is relatively fast.

Here, a collision is the phenomenon in which the same digest value is generated from two different pieces of data. When a collision occurs, data that are actually different are judged to be the same, so the edited data of the original file is not reflected in the duplicate file. As a result, the data of the original file and the duplicate file become mismatched (out of sync).
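To make the definition concrete, the following minimal Python sketch finds a collision by brute force. It uses a deliberately weak 8-bit digest (only the first byte of SHA-1), which is an assumption made here purely for illustration and is not taken from this document; the names `tiny_digest` and `find_collision` are hypothetical.

```python
import hashlib
from itertools import count


def tiny_digest(data: bytes) -> int:
    """Illustrative weak digest: only the first byte of the SHA-1 digest."""
    return hashlib.sha1(data).digest()[0]


def find_collision() -> tuple[bytes, bytes]:
    """Return two different inputs that share the same tiny digest.

    With only 256 possible digest values, a repeat is guaranteed
    within the first 257 inputs (pigeonhole principle).
    """
    seen: dict[int, bytes] = {}
    for i in count():
        data = str(i).encode()
        d = tiny_digest(data)
        if d in seen:
            return seen[d], data
        seen[d] = data


a, b = find_collision()
print(a, b, tiny_digest(a) == tiny_digest(b))
```

A digest comparison alone would judge `a` and `b` to be identical data, which is exactly the failure mode described above.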

However, when synchronizing large files, the time needed to generate digest values with a hash function and perform synchronization can exceed the time needed to synchronize without digest values, that is, by transferring and recording the file and rewriting the differences. Various techniques such as parallel processing, assembly-language implementation, preprocessing, and hardware implementation have therefore been proposed to speed up digest generation with hash functions (see Patent Documents 1 and 2 and Non-Patent Documents 1 and 2).

(Conventional mismatch detection method 1)
When digest values are generated with a hash function, a collision, in which the same digest value is generated from two different pieces of data, is a statistically possible event, but its probability is one in an astronomically large number and is extremely close to zero. File synchronization can therefore be performed with high accuracy.

FIG. 19 is a block diagram showing the configuration of a conventional file processing apparatus. This apparatus generates digest values using a hash function and detects portions whose digest values differ as mismatched portions. Files #1 and #2 are then synchronized by overwriting data at the locations detected as mismatched. Specifically, to detect the mismatched portions of the two files #1 and #2, the apparatus divides each file into block data, calculates and compares the digest values of the block data, and detects the block data with differing digest values as mismatched portions. The file processing apparatus 100 includes a setting file unit 101, blocking units 102-1 and 102-2, digest value calculation units 103-1 and 103-2, and a comparison unit 104.

The setting file unit 101 is a storage means that stores, as a setting file, a block size N set in advance by an operator. The block size N is the size (number of bytes) of each block when the blocking units 102-1 and 102-2 divide file data into block data.

The blocking unit 102-1 receives the file #1 data, reads the block size N from the setting file unit 101, divides the file #1 data into blocks of size N (block data), and outputs the block data and their position information to the digest value calculation unit 103-1. The position information indicates the position of the block data within the file #1 data, for example the number of the frame within the file and the number of the block within the frame. Similarly, the blocking unit 102-2 receives the file #2 data, reads the block size N from the setting file unit 101, divides the file #2 data into block data of size N, and outputs the block data and their position information to the digest value calculation unit 103-2.

The digest value calculation unit 103-1 receives the block data and position information from the blocking unit 102-1, applies a hash function such as SHA-1 to the block data to obtain a digest value, and outputs the digest value and position information of the block data to the comparison unit 104. Similarly, the digest value calculation unit 103-2 receives the block data and position information from the blocking unit 102-2, applies a hash function such as SHA-1 to the block data to obtain a digest value, and outputs the digest value and position information of the block data to the comparison unit 104. The digest value calculation units 103-1 and 103-2 use the same hash function.

The comparison unit 104 receives the digest values and position information of the block data from both the digest value calculation unit 103-1 and the digest value calculation unit 103-2, and compares the digest values that share the same position information. When the comparison unit 104 determines that two digest values differ, it regards the block data at the position indicated by that position information as not matching and detects that block data as a mismatched portion. It then outputs the position information of the detected mismatched portion.
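The blocking, digest calculation, and comparison steps described above can be sketched in Python as follows. This is only an illustrative outline, not the apparatus itself: the function names are hypothetical, the block index stands in for the position information, and treating positions present in only one file as mismatched is an assumption made here.

```python
import hashlib


def split_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Divide file data into blocks of block_size bytes (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_digests(data: bytes, block_size: int) -> list[bytes]:
    """SHA-1 digest of every block; the list index serves as position information."""
    return [hashlib.sha1(b).digest() for b in split_blocks(data, block_size)]


def mismatched_positions(file1: bytes, file2: bytes, block_size: int) -> list[int]:
    """Positions whose digests differ between the two files."""
    d1 = block_digests(file1, block_size)
    d2 = block_digests(file2, block_size)
    # Assumption: a position that exists in only one file counts as mismatched.
    n = max(len(d1), len(d2))
    return [i for i in range(n)
            if i >= len(d1) or i >= len(d2) or d1[i] != d2[i]]


original = bytes(range(16))
edited = bytearray(original)
edited[5] ^= 0xFF  # modify one byte inside block 1 (bytes 4..7)
print(mismatched_positions(original, bytes(edited), block_size=4))  # [1]
```

Only the digests are compared, so a collision at any position would go unnoticed by this procedure.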

The block size N is set in the setting file unit 101 according to, among other things, the detection accuracy required when the comparison unit 104 compares the files and detects mismatched portions. For example, in a file processing apparatus 100 that updates the differences between video and audio files, a block size of N = 1 MB or N = 4 kB is set for video data encoded at 100 Mbps. Since one frame, the minimum unit of video editing, amounts to about 400 kB, differences are detected at a granularity of about twice the frame size with N = 1 MB, and at a granularity of 1/100 of the frame size with N = 4 kB.
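The relationship between bitrate, frame size, and block size can be checked with a short calculation. The sketch below assumes 30 frames per second and decimal units (1 MB = 10^6 bytes, 1 kB = 10^3 bytes); neither assumption is stated in the text, which only gives the frame size as about 400 kB.

```python
BITRATE_BPS = 100_000_000   # 100 Mbps encoded video (from the text)
FRAMES_PER_SECOND = 30      # assumption: not stated in the text

# Bytes per frame = bits per second / 8 / frames per second.
frame_bytes = BITRATE_BPS / 8 / FRAMES_PER_SECOND
print(f"frame size: about {frame_bytes / 1000:.0f} kB")

# Block size relative to one frame for the two example settings.
for label, block_bytes in (("1 MB", 1_000_000), ("4 kB", 4_000)):
    ratio = block_bytes / frame_bytes
    print(f"N = {label}: block is {ratio:.3f} x one frame")
```

Under these assumptions a 1 MB block spans a little more than two frames and a 4 kB block roughly one hundredth of a frame, in line with the granularities quoted above.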

With N = 1 MB, the aim is to synchronize multiple files at high speed. N = 1 MB is therefore chosen when fast comparison with a small amount of data takes priority over spending time to detect mismatched portions precisely, even if somewhat more data is flagged as mismatched. With N = 4 kB, the aim is to speed up file replacement on a hard disk through file system techniques. N = 4 kB is therefore chosen when mismatched portions must be detected for every 4 kB, the data unit of the file system.

In this way, the file processing apparatus 100 calculates the digest values of the block data in files #1 and #2 using a hash function and outputs the position information of the block data whose digest values differ as the position information of the mismatched portions. Mismatched portions between file #1 and file #2 can thus be detected, and file synchronization can be realized.

(Conventional mismatch detection method 2)
However, the file processing apparatus 100 shown in FIG. 19 cannot reduce the probability of a digest value collision to zero. Methods that focus on eliminating collisions have therefore also been proposed. Specifically, the digest values and the original file are both recorded, locations with differing digest values are treated as mismatched portions, and the matching portions are additionally compared using the original file to detect collisions.
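The idea of this second method can be illustrated with a small Python sketch. To make a collision easy to produce, it uses a deliberately weak digest (byte sum modulo 256), which is an assumption for illustration only; `weak_digest` and `classify_blocks` are hypothetical names, and both block lists are assumed to have the same length.

```python
def weak_digest(block: bytes) -> int:
    """Deliberately weak digest (byte sum mod 256) so collisions are easy to produce."""
    return sum(block) % 256


def classify_blocks(blocks1: list[bytes], blocks2: list[bytes]):
    """Return (mismatched, collisions) position lists for equally long block lists."""
    mismatched, collisions = [], []
    for pos, (b1, b2) in enumerate(zip(blocks1, blocks2)):
        if weak_digest(b1) != weak_digest(b2):
            mismatched.append(pos)   # digests differ: ordinary mismatch
        elif b1 != b2:
            collisions.append(pos)   # digests agree but the data differ: collision
    return mismatched, collisions


blocks1 = [b"ab", b"cd", b"ef"]
blocks2 = [b"ba", b"cd", b"eg"]  # block 0 collides, block 2 differs normally
print(classify_blocks(blocks1, blocks2))  # ([2], [0])
```

The byte-for-byte comparison in the `elif` branch is what requires the original data to be held and read, in addition to the digests.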

(Conventional mismatch detection method 3)
From the standpoint of safety when calculating digest values with a hash function and detecting portions with differing digest values as mismatched portions, methods using hash functions with longer bit lengths and methods using newly standardized hash functions have been proposed to reduce collisions (see Non-Patent Document 3). A method that achieves high reliability hierarchically by using multiple digest values has also been proposed to reduce collisions (see Non-Patent Document 4).

Patent Document 1: JP 2005-208400 A
Patent Document 2: JP 2009-230523 A

Non-Patent Document 1: Mitsuru Matsui, "Latest Trends in Cryptographic Algorithms: Current Status and Issues of Security and Implementation", [online], December 7, 2006, [retrieved December 17, 2009], Internet <URL: http://www.soi.wide.ad.jp/class/20060031/slides/41/index_1.html>, slides 47 and 48
Non-Patent Document 2: Yong Ki Lee and two others, "Design Methodology for Throughput Optimum Architectures of Hash Algorithms of the MD4-class", [online], [retrieved December 17, 2009], Internet <URL: http://www.cosic.esat.kuleuven.be/publications/article-1031.pdf>
Non-Patent Document 3: "CRYPTOGRAPHIC HASH PROJECT", [online], NIST, [retrieved December 17, 2009], Internet <URL: http://csrc.nist.gov/groups/ST/hash/index.html>
Non-Patent Document 4: "暗号学的ハッシュ関数" (Cryptographic hash function), [online], September 19, 2009, Wikipedia, the free encyclopedia, [retrieved December 17, 2009], Internet <URL: http://ja.wikipedia.org/wiki/暗号学的ハッシュ関数>

Conventional mismatch detection method 1 described above calculates digest values using a hash function and compares them, detecting portions with differing digest values as mismatched portions. The safety of file synchronization by this method therefore depends on the low collision probability of the hash function.

However, the collision probability of a hash function is not zero. As long as a hash function with a non-zero collision probability is used, collisions cannot be ruled out, and block data that actually differ may be erroneously detected as matching. In applications that demand high reliability, for example at a broadcasting station where accuracy is an absolute requirement, a collision that occurs while program content is being inspected or compared results in erroneous program content. From the standpoint of safety and reliability, the process of calculating digest values with a hash function therefore could not be used as is.

Incidentally, the SHA-1 hash function generates a 160-bit digest value, so the digest values span a space of 2^160. Estimated for files divided into 1 MB blocks, the number of program files made up of block data corresponding to this space amounts to 100 million to the fifth power (10^40) years' worth of programs, taking 24 hours × 5 media of 60-minute programs at 100 Mbps as the number of programs in one year. The probability of a collision is therefore astronomically low. In fact, no collision has yet been reported for the SHA-1 hash function, although a collision remains possible. For the MD5 hash function, by contrast, collisions have been reported.

Conventional mismatch detection method 2 addresses the inability of method 1 to reduce the collision probability of digest values to zero: it treats portions with differing digest values as mismatched portions and additionally compares the matching portions using the original file rather than digest values, thereby detecting collisions. However, because this method uses file data to detect collisions, large volumes of data must be held and processed, which complicates the processing and raises the processing load. Comparing the data also takes time.

Conventional mismatch detection method 3 reduces collisions by using hash functions with longer bit lengths or newly standardized hash functions, or by using multiple digest values hierarchically. However, these methods increase both the amount of data and the computation time.

The present invention has been made to solve these problems. Its object is to provide a file processing apparatus and program that, in detecting mismatched portions between two files using the digest values of block data obtained by dividing the files, detect digest value collisions by a simple method, improve the detection accuracy of mismatched portions, and achieve high reliability.

To achieve the above object, the file processing apparatus according to claim 1 of the present invention is a file processing apparatus that compares the data of a plurality of files and detects mismatched portions, comprising: a synchronization detection unit that, for each of the files, detects a synchronization interval indicating the size of the processing unit in the file and the timing of the start position of the synchronization interval, and generates synchronization information; a blocking unit that, for each of the files, divides the data of the file into block data of a predetermined block size within a predetermined data section in accordance with the synchronization interval and timing indicated by the synchronization information generated by the synchronization detection unit, and generates position information indicating the position of the block data within the file; a digest value calculation unit that, for each of the files, calculates the digest values of the block data divided by the blocking unit; a comparison unit that compares the digest values calculated by the digest value calculation unit at each position indicated by the same position information generated by the blocking unit, and outputs the position information of differing digest values as the position information of the mismatched portions of the files; a continuity determination unit that receives the position information of the mismatched portions output by the comparison unit, determines the continuity of the mismatched portions within the predetermined data section based on that position information, treats a matching portion at which the mismatched portions are not continuous within the predetermined data section as a collision occurrence portion, and outputs the position information of the collision occurrence portion and the position information of the mismatched portions; and a continuation unit that receives the position information of the collision occurrence portion and the position information of the mismatched portions output by the continuity determination unit, changes the position information of the collision occurrence portion into position information of a mismatched portion so that the mismatched portions become continuous, and outputs the position information of the continuous mismatched portions.
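The continuity determination and continuation steps might be sketched as follows. This is a simplified reading that rests on several assumptions: data sections are taken to be consecutive runs of m blocks aligned to multiples of m, m is an integer (non-integer division numbers are not handled), and a section counts as edited when it contains at least one mismatched block. The function name `continuity_fix` is hypothetical.

```python
def continuity_fix(mismatched: set[int], total_blocks: int, m: int):
    """Return (collision_positions, continuous_mismatched_positions).

    mismatched   : block positions whose digests differed
    total_blocks : number of blocks in the file
    m            : number of blocks per data section (assumed integer)
    """
    collisions: set[int] = set()
    for start in range(0, total_blocks, m):
        section = set(range(start, min(start + m, total_blocks)))
        hits = section & mismatched
        # A partly mismatched section is assumed to have been edited as a whole,
        # so its remaining "matching" blocks are treated as collision occurrence
        # portions (continuity determination).
        if hits and len(hits) < len(section):
            collisions |= section - hits
    # Continuation: collision positions are converted into mismatched positions.
    return sorted(collisions), sorted(mismatched | collisions)


print(continuity_fix({4, 6}, total_blocks=12, m=4))  # ([5, 7], [4, 5, 6, 7])
```

No file data is read in this step; only the position list produced by the digest comparison is examined.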

The file processing apparatus according to claim 2 of the present invention is a file processing apparatus that compares the data of a plurality of files and detects mismatched portions, comprising: a blocking unit that, for each of the files, divides the data of the file into block data of a predetermined block size and generates position information indicating the position of the block data within the file; a digest value calculation unit that, for each of the files, calculates the digest values of the block data divided by the blocking unit; a comparison unit that compares the digest values calculated by the digest value calculation unit at each position indicated by the same position information generated by the blocking unit, and outputs the position information of differing digest values as the position information of the mismatched portions of the files; and a continuation unit that receives the position information of the mismatched portions output by the comparison unit, widens each mismatched portion by a predetermined number of blocks before and after the position indicated by its position information so that the mismatched portions become continuous, and outputs the position information of the continuous mismatched portions.
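The widening performed by this continuation unit can be sketched in a few lines of Python. The function name `widen_mismatches` and the parameter name `n` (the number of blocks added on each side) are chosen here for illustration, and clamping at the file boundaries is an assumption.

```python
def widen_mismatches(mismatched: list[int], total_blocks: int, n: int) -> list[int]:
    """Extend every mismatched position by n blocks before and after it."""
    widened = set()
    for pos in mismatched:
        lo = max(0, pos - n)                 # clamp at the start of the file
        hi = min(total_blocks - 1, pos + n)  # clamp at the end of the file
        widened.update(range(lo, hi + 1))
    return sorted(widened)


print(widen_mismatches([0, 5, 6], total_blocks=10, n=1))  # [0, 1, 4, 5, 6, 7]
```

Blocks adjacent to a detected mismatch are thus rewritten even if their digests matched, so a collision next to an edited region no longer leaves stale data behind.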

The file processing apparatus according to claim 3 of the present invention is the file processing apparatus according to claim 1, further comprising a block size determination unit that, when it determines that the synchronization information generated by the synchronization detection unit is identical for each of the files, determines a block size based on a predetermined number of divisions used when dividing the file data within the predetermined data section into block data, wherein the blocking unit, for each of the files, divides the data of the file within the predetermined data section into block data of the block size determined by the block size determination unit, and generates position information indicating the position of the block data within the file.

The file processing apparatus according to claim 4 of the present invention is the file processing apparatus according to claim 2, further comprising: a synchronization detection unit that, for each of the files, detects a synchronization interval indicating the size of the processing unit in the file and the timing of the start position of the synchronization interval, and generates synchronization information; and a block size determination unit that, when it determines that the synchronization information generated by the synchronization detection unit is identical for each of the files, determines a block size based on a predetermined number of divisions used when dividing the file data within the predetermined data section into block data, wherein the blocking unit, for each of the files, divides the data of the file within the predetermined data section into block data of the block size determined by the block size determination unit, and generates position information indicating the position of the block data within the file.

The file processing apparatus according to claim 5 of the present invention is the file processing apparatus according to claim 1 or 2, further comprising a storage unit in which one of the plurality of files to be compared is stored together with the digest values and position information of the block data of that file, and comprising a digest value reading unit in place of the blocking unit, the digest value calculation unit, and the synchronization detection unit of claim 1 that would process the file stored in the storage unit, wherein the digest value reading unit reads from the storage unit the digest values corresponding to the position information generated by the blocking unit that processes the other file not stored in the storage unit, and the comparison unit compares, at each position indicated by the same position information, the digest values calculated by the digest value calculation unit that processes the other file not stored in the storage unit with the digest values read by the digest value reading unit, and outputs the position information of differing digest values as the position information of the mismatched portions of the files.
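A rough Python sketch of this arrangement follows, with a dictionary standing in for the storage unit. The names `build_digest_store` and `compare_with_store` are hypothetical, SHA-1 is used only as an example hash, and positions that exist in the stored file but not in the other file are not handled in this outline.

```python
import hashlib


def build_digest_store(data: bytes, block_size: int) -> dict[int, bytes]:
    """Precompute position -> SHA-1 digest for the file held in storage."""
    return {pos: hashlib.sha1(data[off:off + block_size]).digest()
            for pos, off in enumerate(range(0, len(data), block_size))}


def compare_with_store(store: dict[int, bytes], other: bytes,
                       block_size: int) -> list[int]:
    """Compare freshly computed digests of `other` with the stored digests."""
    mismatched = []
    for pos, off in enumerate(range(0, len(other), block_size)):
        digest = hashlib.sha1(other[off:off + block_size]).digest()
        # Read the stored digest for the same position instead of recomputing it.
        if store.get(pos) != digest:
            mismatched.append(pos)
    return mismatched


stored = b"AAAABBBBCCCC"
store = build_digest_store(stored, 4)
print(compare_with_store(store, b"AAAAXXXXCCCC", 4))  # [1]
```

Because the stored file's digests are computed once and then only read, each comparison hashes just the incoming file.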

The file processing apparatus according to claim 6 of the present invention is the file processing apparatus according to claim 3 or 4, further comprising a storage unit in which one of the plurality of files to be compared is stored together with the synchronization information, the digest values of the block data, and the position information of that file, and comprising a digest value reading unit in place of the synchronization detection unit, the blocking unit, and the digest value calculation unit that would process the file stored in the storage unit, wherein: the digest value reading unit reads from the storage unit the digest values corresponding to the position information generated by the blocking unit that processes the other file not stored in the storage unit; the block size determination unit, when it determines that the synchronization information generated by the synchronization detection unit that processes the other file not stored in the storage unit is identical to the synchronization information of the file stored in the storage unit, determines the block size based on a predetermined number of divisions used when dividing the file data into block data; and the comparison unit compares, at each position indicated by the same position information, the digest values calculated by the digest value calculation unit that processes the other file not stored in the storage unit with the digest values read by the digest value reading unit, and outputs the position information of differing digest values as the position information of the mismatched portions of the files.

Furthermore, a file processing program according to the present invention causes a computer to function as the file processing apparatus according to any one of claims 1 to 6.

As described above, according to the present invention, mismatched portions between two files are detected using the digest values of block data obtained by dividing the files, and digest value collisions are detected and eliminated either by determining the continuity of the mismatched portions with reference to the unit of file editing or by expanding the mismatched portions. Collisions can thus be detected by a simple method without comparing large volumes of file data. In addition, a mismatched portion of the two files is no longer erroneously determined to be a matched portion. In other words, the detection accuracy of mismatched portions is improved and high reliability is achieved.

FIG. 1 is a diagram showing the configuration of an overall system including a file processing apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the configuration of a file processing apparatus according to the first embodiment (Example 1) of the present invention.
FIG. 3 is a flowchart showing the processing of the continuity determination unit.
FIG. 4 is a diagram explaining mismatched portions and collision occurrence portions of synchronized block data for division number m = 2 and constant X = 1.
FIG. 5 is a diagram explaining mismatched portions and collision occurrence portions of synchronized block data for division number m = 1.5 and constant X = 1.
FIG. 6 is a block diagram showing the configuration of a file processing apparatus according to the second embodiment (Example 2) of the present invention.
FIG. 7 is a flowchart showing the processing of the continuation unit.
FIG. 8 is a diagram explaining the case where, with division number m = 1.5 and constant X = 1, blocking is performed without synchronizing the frames and the block data, and one block before and after a mismatched portion is subject to collision elimination (number of eliminated collisions n = 1).
FIG. 9 is a diagram explaining the case where, with division number m = 1.5 and constant X = 1, blocking is performed without synchronizing the frames and the block data, and two blocks before and after a mismatched portion are subject to collision elimination (number of eliminated collisions n = 2).
FIG. 10 is a block diagram showing the configuration of a file processing apparatus according to the third embodiment (Example 3) of the present invention.
FIG. 11 is a flowchart showing the processing of the block size determination unit.
FIG. 12 is a block diagram showing the configuration of a file processing apparatus according to the fourth embodiment (Example 4) of the present invention.
FIG. 13 is a block diagram showing the configuration of a file processing apparatus according to the fifth embodiment (Example 5) of the present invention.
FIG. 14 is a block diagram showing the configuration of a file processing apparatus according to the sixth embodiment (Example 6) of the present invention.
FIG. 15 is a block diagram showing the configuration of a file processing apparatus according to the seventh embodiment (Example 7) of the present invention.
FIG. 16 is a block diagram showing the configuration of a file processing apparatus according to the eighth embodiment (Example 8) of the present invention.
FIG. 17 is a diagram explaining the relationship between the type of video/audio file and the editing unit.
FIG. 18 is a diagram explaining the format of an MXF file.
FIG. 19 is a block diagram showing the configuration of a conventional file processing apparatus.

The best mode for carrying out the present invention will be described below with reference to the drawings. When detecting mismatched portions between an original file and a file obtained by editing it (the edited file), the present invention uses digest values rather than the file data itself. When mismatched portions are detected using digest values, digest value collisions may occur. When a file is edited, all of the block data constituting one frame should inherently become mismatched. The present invention therefore identifies the normally impossible state in which some of that block data nevertheless matches, and detects and eliminates the matching portion as a collision occurrence portion. Specifically, if the length (continuity) of the mismatched portion of block data detected by comparing digest values is equal to or less than a predetermined number corresponding to the file editing unit, the state is determined to be an impossible one.

In general, files are edited in units of frames, so mismatched portions are detected in units of frames. That is, all of the block data constituting a frame of the edited file become mismatched portions, and the positions of the mismatched portions are continuous within the frame. Nevertheless, among the block data constituting a frame of the edited file, a portion may be detected as matching rather than mismatched. The present invention treats such a matching portion as a collision occurrence portion and changes it to a mismatched portion. In other words, the present invention detects mismatched portions of two files by comparing the digest values of block data obtained by dividing the files, detects collisions based on the positions of the mismatched portions, and eliminates the detected collisions. Detection failures of mismatched portions caused by collisions can thereby be eliminated by a simple method, without using the data of the large original file, and the detection accuracy of mismatched portions is improved to achieve high reliability.

In Examples 1 to 4 described below, digest values of the block data obtained by dividing each of two files are calculated and compared, mismatched portions are detected, and collisions are detected and eliminated. In Examples 5 to 8, the digest values for one of the two files have been calculated and stored in advance; digest values are calculated for the other file and compared with the stored digest values, mismatched portions are detected, and collisions are detected and eliminated.

Examples 1, 2, 5, and 6 cover the case where the frame size (synchronization interval), which indicates the processing unit of the file data, is constant and identical in the two files to be compared. The target file is, for example, a fixed-rate video/audio file. Examples 3, 4, 7, and 8 cover the case where the frame size (synchronization interval) varies in the two files to be compared. The target file is, for example, a variable-rate video/audio file.

In Examples 1, 3, 5, and 7, the frames and the block data are synchronized. Block data portions whose digest values differ are taken as mismatched portions, the continuity of the mismatched portions is determined, and collisions are detected and eliminated. In Examples 2, 4, 6, and 8, the frames and the block data are not synchronized. Block data portions whose digest values differ are taken as mismatched portions, the vicinity of each mismatched portion is presumed to be a portion where a collision has occurred, and the presumed collisions are eliminated.

[Overall system]
First, an overall system including a file processing apparatus according to an embodiment of the present invention will be described. FIG. 1 is a diagram showing the configuration of the overall system. The system comprises file processing apparatuses 1 to 8 that store original files, an editing apparatus 90 that edits original files, and servers 91-1, 91-2, ... that store duplicate files. When an original file is edited in the editing apparatus 90, predetermined processing is performed so that the original file stored in the file processing apparatuses 1 to 8 and the duplicate files stored in the servers 91-1, 91-2, ... always remain identical. The file processing apparatuses 1 to 8 and the editing apparatus 90 are connected by a communication network such as the Internet or an intranet. Likewise, the file processing apparatuses 1 to 8 and the servers 91-1, 91-2, ... are connected by a communication network such as the Internet or an intranet.

The file processing apparatuses 1 to 8 store files such as video and audio as original files in storage means, and transmit an original file to the editing apparatus 90 before the editing apparatus 90 performs file editing. After the editing apparatus 90 has performed file editing, the file processing apparatuses 1 to 8 receive the edited file from the editing apparatus 90 and detect the mismatched portions between the original file and the edited file based on digest values calculated with a predetermined function or arithmetic expression such as a hash function. They then transmit the data of the mismatched portions of the edited file to the servers 91-1, 91-2, ... as edit data, together with the position information of the mismatched portions. The file processing apparatuses 1 to 8 also store the edited file received from the editing apparatus 90 in the storage means as the new original file.

The editing apparatus 90 receives an original file from the file processing apparatuses 1 to 8, edits the original file according to input operations performed by an operator with a mouse, keyboard, or the like, and generates an edited file. The editing apparatus 90 then transmits the edited file to the file processing apparatuses 1 to 8.

The servers 91-1, 91-2, ... receive the edit data and position information from the file processing apparatuses 1 to 8 and overwrite the region of the duplicate file stored in their storage means that is indicated by the position information with the edit data. As a result, the original file stored in the file processing apparatuses 1 to 8 and the duplicate files stored in the servers 91-1, 91-2, ... become identical.
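The server-side overwrite described above can be sketched as follows. This is a minimal illustration only: the function name `apply_edit_data` is hypothetical, and the position information is assumed to have already been converted to a byte offset, which the description does not specify.

```python
import io


def apply_edit_data(replica, patches):
    """Overwrite regions of a seekable binary stream with edit data.

    `patches` is an iterable of (byte_offset, data) pairs. The byte offset
    stands in for the position information (an assumption of this sketch).
    """
    for offset, data in patches:
        replica.seek(offset)
        replica.write(data)


# Duplicate file held by a server, modelled here as an in-memory stream.
replica = io.BytesIO(b"AAAABBBBCCCC")

# Only the mismatched block (bytes 4..7) is transferred and overwritten.
apply_edit_data(replica, [(4, b"XXXX")])
print(replica.getvalue())
```

Only the mismatched block crosses the network in this sketch, which is the reason the description compares digest values instead of whole files.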

The file processing apparatuses 1 to 8 of the system shown in FIG. 1 are described in detail below, one in each of Examples 1 to 8.

First, Example 1 will be described in detail. The file processing apparatus 1 of Example 1 detects mismatched portions of files in the case where the frame sizes (synchronization intervals) of the two files to be compared are constant and identical, and the frames and the block data are synchronized. Specifically, the file processing apparatus 1 calculates and compares the digest values of the block data obtained by dividing each of the two files, and detects block data portions whose digest values differ as mismatched portions. The file processing apparatus 1 then determines the continuity of the detected mismatched portions. If a run of mismatched portions that should be continuous contains a matched portion, that matched portion is taken as a collision occurrence portion and is changed to a mismatched portion, thereby eliminating the collision. In the example of FIG. 1, the two files to be compared are the original file and the edited file. The same applies to Examples 2 to 8 described later.

FIG. 2 is a block diagram showing the configuration of the file processing apparatus 1 according to Example 1. The file processing apparatus 1 comprises a setting file unit 10, synchronization detection units 11-1 and 11-2, blocking units 12-1 and 12-2, digest value calculation units 13-1 and 13-2, a comparison unit 14, a continuity determination unit 15, and a continuation unit 16.

The setting file unit 10 is storage means in which a block size N and a division number m, set in advance by an operator, are stored as a setting file. As described above, the block size N is the size (number of bytes) of each piece of block data produced when the blocking units 12-1 and 12-2 divide the data of one synchronization interval, which indicates the frame size, or of an integer multiple (X times) of that synchronization interval (that is, one frame or a plurality of frames) into block data. The division number m is the number of pieces of block data into which that data (one frame or a plurality of frames) is divided.

The synchronization detection unit 11-1 receives the file #1 data (the data of the edited file), detects the synchronization interval, which indicates the frame size when one frame or a plurality of frames is taken as the processing unit, and the timing of the head position of that synchronization interval, generates synchronization information, and outputs it to the blocking unit 12-1. Similarly, the synchronization detection unit 11-2 receives the file #2 data (the data of the original file), detects the synchronization interval and the processing timing, generates synchronization information, and outputs it to the blocking unit 12-2.

The synchronization detection units 11-1 and 11-2 are now described in detail. The video/audio files targeted by Examples 1 to 8 are of two types: uncompressed data files and compressed data files. An uncompressed data file is one in which captured video data and recorded audio data are stored as a file without modification. Editing such as correction and processing is performed in units of screens, that is, in units of frames, and the video signal normally carries header information that indicates the heads of lines and frames.

Therefore, when the video/audio file contains uncompressed data, the synchronization detection units 11-1 and 11-2 detect the head of a line or frame from the header information, and detect the synchronization interval indicating the frame size and the timing of the head position of that synchronization interval. The frame, which is the unit of editing, can thus be detected easily from the header information, and the synchronization information can be generated easily.

A compressed data file, on the other hand, is one in which the amount of data has been reduced by removing redundancy through various means before being stored as a file. For video/audio files, files conforming to standards such as MPEG-2, JPEG, Wavelet, and H.264 are in practical use. In such compressed data as well, video data and audio data are handled together as a frame or a plurality of frames, and the video signal carries header information that indicates the head of the frame or plurality of frames.

Therefore, when the video/audio file contains compressed data, the synchronization detection units 11-1 and 11-2 detect the head of a frame or plurality of frames from the header information, as in the case of uncompressed data, and detect the synchronization interval indicating the frame size and the timing of the head position of that synchronization interval.

For example, MPEG-2 video/audio files include I-only files, which are intended for editing, and long-GOP files, which have a high compression ratio. I-only files in turn include fixed-rate (CBR) files and variable-rate (VBR) files. In a fixed-rate (CBR) file the amount of data is constant, so frame heads occur at a fixed period and the file can be handled in the same way as an uncompressed data file. In a variable-rate (VBR) file, by contrast, the amount of data changes with the amount of video/audio information, so the frame period varies and the file cannot be handled in the same way as an uncompressed data file.

In an I-only file, data is compressed and encoded within each frame, so the synchronization detection units 11-1 and 11-2 can detect the head of each frame from the header information and handle the data in units of frames. In a long-GOP file, by contrast, encoding is performed in units of GOPs, each of which groups a plurality of frames, so the synchronization detection units 11-1 and 11-2 can detect the head of each GOP from the header information and handle the data in units of GOPs. The head of each frame constituting a GOP can also be obtained by decoding the header information and performing calculations. In long-GOP, however, data is inherently processed and compressed across a plurality of frames, so even if the compressed data were processed frame by frame, re-encoding would be needed to maintain the continuity and consistency of the encoding with the other frames. For this reason, editing such as correction and processing is assumed here to be performed in units of GOPs.

FIG. 17 is a diagram explaining the relationship between the type of video/audio file and the editing unit. In FIG. 17, a fixed-rate (CBR) file of uncompressed data is edited frame by frame, and the synchronization detection units 11-1 and 11-2 detect frames from the header information and generate synchronization information. A variable-rate (VBR) file of uncompressed data is also edited frame by frame, and the synchronization detection units 30-1, 30-2, and so on of Example 3, described later, detect frames from the header information and generate synchronization information. Fixed-rate (CBR) and variable-rate (VBR) files of MPEG-2 I-only compressed data are likewise edited frame by frame, as in the case of uncompressed data. A variable-rate (VBR) file of MPEG-2 long-GOP compressed data is edited GOP by GOP, and the synchronization detection units 30-1, 30-2, and so on of Example 3 detect GOPs, each consisting of a plurality of frames, from the header information and generate synchronization information.

More recently, the MXF file format has come into general use as a common format for handling various types of video and audio as files. MXF is a file format that allows a computer to recognize a file through a common header format, whether the content is uncompressed or compressed.

FIG. 18 is a diagram explaining the format of an MXF file. An MXF file consists of a header (File Header), a body (File Body), and a footer (File Footer). The header and footer store metadata, an index table (Index Table), and the like, in a structure that indicates the data type and data arrangement (frame units, GOP units, and so on) independently of the type of video/audio data. The body holds video/audio data packed by KLV coding, whose unit consists of a key (Key) indicating the type of the actual data, a length (Length) indicating the data length, and a value (Value) containing the data itself.
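The KLV unit can be illustrated with the small reader below. It is a sketch only and not part of the description: it assumes the 16-byte key and BER-encoded length used for KLV packets in MXF, does not handle the indefinite-length form, and the function name `read_klv` and the sample packet are hypothetical.

```python
def read_klv(buf, offset=0):
    """Read one KLV packet from `buf` starting at `offset`.

    Assumes a 16-byte key followed by a BER-encoded length:
    a first byte below 0x80 is the length itself (short form);
    otherwise its low 7 bits give the number of big-endian
    length bytes that follow (long form).
    Returns (key, value, offset_of_next_packet).
    """
    key = buf[offset:offset + 16]
    first = buf[offset + 16]
    pos = offset + 17
    if first < 0x80:
        length = first
    else:
        n = first & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    value = buf[pos:pos + length]
    return key, value, pos + length


# Hypothetical packet: all-zero key, long-form length of 5, 5-byte value.
packet = bytes(16) + b"\x83\x00\x00\x05" + b"hello"
key, value, next_offset = read_klv(packet)
print(len(key), value, next_offset)
```

Walking a file body packet by packet in this way is one conceivable route to the frame or GOP boundaries that the synchronization detection units need; the index table is the other route mentioned in the description.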

For a file in the MXF format, the packing state of the data can be recognized from the index table in the header. The synchronization detection units 11-1 and 11-2, or the synchronization detection units 30-1, 30-2, and so on of Example 3 described later, therefore read the size of the frame or GOP, which is the editing unit of the file, from the index table (Index Table) in the header or from the keys (Key) in the body, and detect the synchronization interval, which indicates the size when the frame or the like is taken as the processing unit, and the timing of the head position of that synchronization interval. The index table in the header and the keys in the body are well known, so a detailed description of them is omitted.

For a variable-rate (VBR) file of MPEG-2 compressed data, the synchronization detection units 30-1, 30-2, and so on of Example 3 described later read the GOP size from the header and detect the synchronization interval, which indicates the size when the GOP is taken as the processing unit, and the timing of the head position of that synchronization interval. For a variable-rate (VBR) file of MPEG-2 I-only compressed data, the same units read the size of the frame interval (the frame size) from the header and detect the synchronization interval indicating the frame size and the timing of the head position of that synchronization interval.

Returning to FIG. 2, the blocking unit 12-1 receives the file #1 data and the synchronization information from the synchronization detection unit 11-1, and reads the block size N from the setting file unit 10. Using the timing indicated by the synchronization information as the base point, it divides the data of the synchronization interval indicated by the synchronization information, or of an integer multiple (X times) of that interval, into blocks of size N, generates position information indicating the position of each piece of block data, and outputs the block data and its position information to the digest value calculation unit 13-1. Similarly, the blocking unit 12-2 receives the file #2 data and the synchronization information from the synchronization detection unit 11-2, reads the block size N from the setting file unit 10, divides the data in the same manner, generates position information, and outputs the block data and position information to the digest value calculation unit 13-2. The data of the synchronization interval, or of the integer multiple (X times) of it, is thus divided into m pieces of block data, where m is the division number.

The integer multiple (X) used when dividing the file data into blocks is set in advance. The position information indicates the position of the block data within the file and consists of, for example, a frame number and the number of the block within one frame or a plurality of frames. If the synchronization interval is F and the constant indicating the multiple is X, the size of the block data into which the file is divided (block size N) is given by N = F × X / m. That is, the block size N read from the setting file unit 10 is determined by the synchronization interval F, which is the frame size indicating the processing unit of the video, and the division number m, according to this formula. A block size N and a division number m that satisfy the formula are set and stored in the setting file unit 10 in advance. The constant X defines the range over which collisions are eliminated. For example, when X = 1, collisions are eliminated for each frame, which is the editing unit; when X = 2, collisions are eliminated for every two frames, which is twice the editing unit.
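The relationship N = F × X / m and the blocking step can be sketched as follows. The sketch makes several assumptions that are not in the description: the division number m is an integer that divides F × X exactly, the first synchronization interval starts at byte 0, and position information is represented as a (section number, block number) pair. The function name `split_into_blocks` is hypothetical.

```python
def split_into_blocks(data, sync_interval, x, m):
    """Divide `data` into blocks of size N = F * X / m.

    Returns (N, blocks), where each block is ((section_no, block_no), bytes).
    A "section" is X synchronization intervals, i.e. m consecutive blocks.
    """
    section_size = sync_interval * x
    if section_size % m != 0:
        raise ValueError("this sketch requires F * X to be divisible by m")
    n = section_size // m
    blocks = []
    for start in range(0, len(data), n):
        section_no, block_no = divmod(start // n, m)
        blocks.append(((section_no, block_no), data[start:start + n]))
    return n, blocks


# Two 8-byte "frames", X = 1, m = 2, so N = 8 * 1 / 2 = 4 bytes per block.
n, blocks = split_into_blocks(bytes(range(16)), sync_interval=8, x=1, m=2)
print(n)
print([pos for pos, _ in blocks])
```

With these values each frame yields exactly two blocks, so a frame that has been edited is expected to produce two mismatched blocks.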

The digest value calculation unit 13-1 receives the block data and position information from the blocking unit 12-1, calculates a digest value, which is a summary of the block data, by applying a predetermined function or arithmetic expression such as a hash function to the block data, and outputs the digest value and position information to the comparison unit 14. Similarly, the digest value calculation unit 13-2 receives the block data and position information from the blocking unit 12-2, calculates a digest value using the same function or arithmetic expression as the digest value calculation unit 13-1, and outputs the digest value and position information to the comparison unit 14. A hash function is generally used to calculate the digest value, but the present invention is not limited to hash functions, and another function or arithmetic expression may be used (the same applies to Examples 2 to 8 below).
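A digest calculation of this kind might look like the sketch below. SHA-256 from the Python standard library is used purely as an example; the description does not name a specific function, and the name `digest_blocks` and the (position, data) block representation are assumptions of this sketch.

```python
import hashlib


def digest_blocks(blocks):
    """Map each (position, block_data) pair to (position, digest).

    SHA-256 is an arbitrary choice here; any function applied identically
    to both files would serve the same purpose.
    """
    return [(pos, hashlib.sha256(data).digest()) for pos, data in blocks]


blocks = [((0, 0), b"abcd"), ((0, 1), b"abcd"), ((1, 0), b"abce")]
digests = digest_blocks(blocks)

# Identical block data always yields the same digest value.
print(digests[0][1] == digests[1][1])
# Different block data normally yields a different digest value;
# a collision is the rare case where it does not.
print(digests[0][1] == digests[2][1])
```

The digest is much smaller than the block it summarizes, which keeps the later comparison cheap, at the cost of the collision possibility that the rest of the description addresses.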

The comparison unit 14 receives the digest values and position information for the block data of file #1 from the digest value calculation unit 13-1, and the digest values and position information for the block data of file #2 from the digest value calculation unit 13-2. It compares the digest values that have the same position information and outputs the position information of those digest values that differ to the continuity determination unit 15 as the position information of mismatched portions. The position information output by the comparison unit 14 thus indicates where files #1 and #2 do not match.
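The blocking, digest calculation, and comparison stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the apparatus itself: SHA-256 from Python's `hashlib` stands in for the unspecified digest function, position information is reduced to a 0-based block index, and the helper names `block_digests` and `mismatched_positions` are hypothetical.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> list[bytes]:
    """Split data into fixed-size blocks and return one digest per block."""
    return [
        hashlib.sha256(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    ]


def mismatched_positions(file1: bytes, file2: bytes, block_size: int) -> list[int]:
    """Return the indices of blocks whose digests differ between the two files.

    Assumes both files have the same length; zip() ignores any extra blocks.
    """
    d1 = block_digests(file1, block_size)
    d2 = block_digests(file2, block_size)
    return [pos for pos, (a, b) in enumerate(zip(d1, d2)) if a != b]


original = bytes(16)            # 16 zero bytes, four blocks of 4 bytes
edited = bytearray(original)
edited[5] = 1                   # change a byte inside block 1
edited[13] = 1                  # change a byte inside block 3
print(mismatched_positions(original, bytes(edited), block_size=4))  # [1, 3]
```

Only the digests are compared, which is the point of the scheme: the comparison step never handles the large file data directly.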

The continuity determination unit 15 receives the position information of the mismatched portions from the comparison unit 14, reads the division number m from the setting file unit 10, and determines the continuity of the mismatched portions within each data section of one frame size (synchronization interval) or an integer multiple (X times) of it. It treats the places where the run of mismatched portions is broken as collision occurrence portions, thereby detecting that a collision has occurred, and outputs the position information of the mismatched portions and the position information of the collision occurrence portions to the continuation unit 16.

FIG. 3 is a flowchart showing the processing of the continuity determination unit 15. The continuity determination unit 15 receives the position information of the mismatched portions from the comparison unit 14 (step S301) and reads the division number m from the setting file unit 10 (step S302). From the position information, it then calculates the number of mismatched portions in each data section (the predetermined data section) of one frame size (synchronization interval) or an integer multiple (X times) of it (step S303). Each such data section contains at least one mismatched portion; data sections that contain no mismatched portion are not subject to continuity determination by the continuity determination unit 15.

The continuity determination unit 15 compares the calculated number of mismatched portions with the division number m, which is the number of blocks in the data section (step S304). If it determines in step S304 that the number of mismatched portions equals the division number m (step S304: =), that is, that every block in the data section is a mismatched portion, it judges that the mismatched portions are correctly continuous within the data section and that no collision has occurred (step S305). The continuity determination unit 15 then outputs only the received position information of the mismatched portions to the continuation unit 16 (step S306).

On the other hand, if the continuity determination unit 15 determines in step S304 that the number of mismatched portions is smaller than the division number m (step S304: <), that is, that the data section contains a matched portion, it judges that the mismatched portions are not correctly continuous within the data section and that a collision has occurred (step S307). The continuity determination unit 15 then sets the matched portions in the data section, that is, the blocks other than the mismatched portions where the run of mismatches is broken, as collision occurrence portions (step S308), and outputs the received position information of the mismatched portions and the position information of the collision occurrence portions to the continuation unit 16 (step S309).

For example, when files #1 and #2 are uncompressed video and audio files, the synchronization interval F corresponds to the data size of one frame, and the division number m is the number of blocks into which one frame is divided, with m > 1. The following description assumes X = 1 and m = 2, so that one frame is divided into two blocks. The digest value calculation units 13-1 and 13-2 each generate two digest values per frame. Because file editing is performed in units of frames, the mismatched portions produced by modifying the file are also detected by the comparison unit 14 in units of one frame, and a mismatch in the digest values extends over two consecutive blocks within one frame. If a frame contains only one mismatched block rather than two, a collision has occurred in the missing block (the matched portion). The continuity determination unit 15 therefore determines, for each frame (the predetermined data section), whether the mismatched portion extends over two consecutive blocks. If it does, the unit determines that no collision has occurred; if only one block is mismatched, it determines that a collision has occurred in the matched portion.

In this way, the continuity determination unit 15 determines that a collision has occurred when, in the data section of a frame affected by file editing, the mismatched portions do not extend continuously over all of the blocks. This allows the occurrence of a collision to be determined reliably, because it is assumed that every block in the data section of a frame affected by file editing becomes a mismatched portion.

Returning to FIG. 2, the continuation unit 16 receives the position information of the mismatched portions and of the collision occurrence portions from the continuity determination unit 15, reads the division number m from the setting file unit 10, adds the position information of the collision occurrence portions to that of the mismatched portions so that the mismatched portions become continuous, and outputs the position information of the resulting continuous mismatched portions. Because collisions are detected for each predetermined data section affected by file editing, and the position information of each detected collision is converted into position information of a mismatched portion, collisions are reliably eliminated and accurate position information of the mismatched portions is obtained.

FIG. 4 illustrates the mismatched portions and collision occurrence portions when the division number m = 2, the constant X = 1, and the frames and block data are synchronized. In the figure, the hatched part of the edited file (the file #1 data) shows where the file has been edited; frames 2 to 4 have been edited. Filled blocks are mismatched portions, and α marks the collision occurrence portions. As a result of the comparison by the comparison unit 14, the second block of frame 2, the first block of frame 3, and the first and second blocks of frame 4 are detected as mismatched portions. As a result of the continuity determination by the continuity determination unit 15, the first block of frame 2 and the second block of frame 3 are identified as collision occurrence portions α. The continuation unit 16 then changes the collision occurrence portions α into mismatched portions, so that all blocks of the edited frames 2 to 4 are detected as mismatched portions.
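The FIG. 4 walk-through can be reproduced with a short sketch of steps S303 to S308 followed by the continuation step. The code assumes an integer division number m, blocks aligned to data sections, and 0-based block indices (frame 1 holds blocks 0 and 1, frame 2 holds blocks 2 and 3, and so on); the function names `find_collisions` and `make_continuous` are hypothetical.

```python
def find_collisions(mismatched, m):
    """Flag the matched blocks in every data section that holds a mismatch.

    A data section is m consecutive blocks. Sections with no mismatch are
    skipped, as in step S303; sections with fewer than m mismatches are
    treated as containing collisions (steps S304, S307, S308).
    """
    mismatched = set(mismatched)
    collisions = set()
    for section in sorted({block // m for block in mismatched}):
        blocks = set(range(section * m, (section + 1) * m))
        if len(blocks & mismatched) < m:
            collisions |= blocks - mismatched
    return collisions


def make_continuous(mismatched, collisions):
    """Continuation step: merge collision blocks into the mismatched set."""
    return sorted(set(mismatched) | set(collisions))


# FIG. 4: frame 2 block 2, frame 3 block 1, frame 4 blocks 1 and 2 mismatch.
fig4_mismatched = [3, 4, 6, 7]
fig4_collisions = find_collisions(fig4_mismatched, m=2)
print(sorted(fig4_collisions))                            # [2, 5]
print(make_continuous(fig4_mismatched, fig4_collisions))  # [2, 3, 4, 5, 6, 7]
```

Blocks 2 and 5 correspond to the first block of frame 2 and the second block of frame 3, the two portions marked α in the figure.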

FIG. 5 illustrates the mismatched portions and collision occurrence portions when the division number m = 1.5, the constant X = 1, and the frames and block data are synchronized. As in FIG. 4, the hatched part of the edited file (the file #1 data) shows where the file has been edited; frames 2 to 4 have been edited. Filled blocks are mismatched portions, and α marks the collision occurrence portions. As a result of the comparison by the comparison unit 14, the third block of frames 1 and 2 and the second and third blocks of frames 3 and 4 are detected as mismatched portions. As a result of the continuity determination by the continuity determination unit 15, the second block of frames 1 and 2 and the first block of frames 3 and 4 are identified as collision occurrence portions α. The continuation unit 16 then changes the collision occurrence portions α into mismatched portions, so that all blocks of the edited frames 2 to 4 are detected as mismatched portions. The unedited part β is also detected as a mismatched portion, but, as shown in FIG. 1, the same data is simply overwritten in the original file and the duplicate file, so the original data is preserved.

As described above, in the file processing apparatus 1 of Embodiment 1, when mismatched portions between files #1 and #2 are detected using the digest values of the block data into which the files are divided, the continuity determination unit 15 determines the continuity of the mismatched portions for each predetermined data section affected by file editing. If the number of mismatched portions in the data section is smaller than the division number m set in advance for dividing that data section into blocks, the unit detects the matched portions in the section as collision occurrence portions, and the continuation unit 16 changes the collision occurrence portions into mismatched portions. Collisions can thus be reliably detected and eliminated by a simple method, without comparing large volumes of file data. Because the probability of a collision remaining becomes 0, the detection accuracy of the mismatched portions is improved and high reliability is achieved.

Next, Embodiment 2 is described in detail. The file processing apparatus 2 of Embodiment 2 detects mismatched portions of files when the frame sizes (synchronization intervals) of the two files being compared are constant and equal to each other, but the frames and the block data are not synchronized. Specifically, the file processing apparatus 2 calculates and compares the digest values of the block data into which each of the two files is divided, and detects the blocks whose digest values differ as mismatched portions. It then estimates that the matched blocks immediately before and after each detected mismatched portion are collision occurrence portions, and eliminates collisions by changing those portions into mismatched portions.

FIG. 6 is a block diagram showing the configuration of the file processing apparatus 2 according to Embodiment 2. The file processing apparatus 2 includes a setting file unit 20, blocking units 21-1 and 21-2, digest value calculation units 22-1 and 22-2, a comparison unit 23, and a continuation unit 24.

The setting file unit 20 is a storage means in which a block size N and a number of excluded collisions n, set in advance by an operator, are stored as a setting file. The number of excluded collisions n is the number of blocks before and after a mismatched portion that the continuation unit 24 estimates to be collision occurrence portions and eliminates. For example, n is set to a large value when a hash function prone to collisions is used, and to a small value when a hash function that rarely produces collisions is used.

The blocking unit 21-1 receives the file #1 data, reads the block size N from the setting file unit 20, divides the file #1 data into blocks of size N starting at an arbitrary phase, and outputs the resulting block data and its position information to the digest value calculation unit 22-1. Similarly, the blocking unit 21-2 receives the file #2 data, reads the block size N from the setting file unit 20, divides the file #2 data into blocks of size N starting at an arbitrary phase, and outputs the resulting block data and its position information to the digest value calculation unit 22-2. Because the frame sizes (synchronization intervals) of the two files #1 and #2 being compared are assumed to be constant and equal, the block data output from the blocking units 21-1 and 21-2 are synchronized with each other.

The digest value calculation unit 22-1 receives the block data and position information from the blocking unit 21-1, calculates a digest value by applying a predetermined function or arithmetic expression such as a hash function to the block data, and outputs the digest value and the position information to the comparison unit 23. Similarly, the digest value calculation unit 22-2 receives the block data and position information from the blocking unit 21-2, calculates a digest value by applying to the block data the same function or expression as that used in the digest value calculation unit 22-1, and outputs the digest value and the position information to the comparison unit 23.

The comparison unit 23 receives the digest values and position information for the block data of file #1 from the digest value calculation unit 22-1, and the digest values and position information for the block data of file #2 from the digest value calculation unit 22-2. It compares the digest values that have the same position information and outputs the position information of those digest values that differ to the continuation unit 24 as the position information of mismatched portions.

The continuation unit 24 receives the position information of the mismatched portions from the comparison unit 23, reads the number of excluded collisions n from the setting file unit 20, widens each mismatched portion by n blocks before and after it so that the mismatched portions become continuous, and outputs the position information of the mismatched portions.

FIG. 7 is a flowchart showing the processing of the continuation unit 24. The continuation unit 24 receives the position information of the mismatched portions from the comparison unit 23 (step S701) and reads the number of excluded collisions n from the setting file unit 20 (step S702).

Taking each mismatched portion as a base point, the continuation unit 24 determines, for each of the n blocks before it and the n blocks after it, whether that block is a mismatched portion or a matched portion (step S703). If the block is determined to be a matched portion (step S703: matched portion), the continuation unit 24 estimates that the block is a collision occurrence portion (step S704).

The continuation unit 24 changes the collision occurrence portion into a mismatched portion, that is, it adds the position information of the collision occurrence portion to the position information of the mismatched portions (step S705). After step S705, or when the block is determined in step S703 to be a mismatched portion (step S703: mismatched portion), the continuation unit 24 outputs the position information of the mismatched portions (step S706).

In this way, for each of the n blocks before and after a mismatched portion, the continuation unit 24 estimates that the block is a collision occurrence portion if it is a matched portion, and changes the estimated collision occurrence portion into a mismatched portion. The occurrence of collisions can thus be determined by a simple method, from the standpoint of safety and reliability.

FIG. 8 illustrates the case where the division number m = 1.5, the constant X = 1, blocking is performed without synchronizing the frames and the block data, and one block before and after each mismatched portion is subject to collision elimination (number of excluded collisions n = 1). In the figure, the hatched part of the edited file (the file #1 data) shows where the file has been edited; frames 2 to 4 have been edited. Filled blocks are mismatched portions, and α marks the collision occurrence portions estimated by the continuation unit 24. As a result of the comparison by the comparison unit 23, the second, third, and fifth blocks from the left are detected as mismatched portions. As a result of the continuation unit 24 examining one block before and after each mismatched portion, the first, fourth, and sixth blocks are estimated to be collision occurrence portions α. The continuation unit 24 then changes the estimated collision occurrence portions α into mismatched portions, so that the blocks covering the edited frames 2 to 4 are detected as mismatched portions. The unedited part β is also detected as a mismatched portion, but, as shown in FIG. 1, the same data is simply overwritten in the original file and the duplicate file, so the original data is preserved.

FIG. 9 illustrates the case where the division number m = 1.5, the constant X = 1, blocking is performed without synchronizing the frames and the block data, and two blocks before and after each mismatched portion are subject to collision elimination (number of excluded collisions n = 2). As in FIG. 8, the hatched part of the edited file (the file #1 data) shows where the file has been edited; frames 2 to 4 have been edited. Filled blocks are mismatched portions, and α marks the collision occurrence portions estimated by the continuation unit 24. As a result of the comparison by the comparison unit 23, the third, fifth, and sixth blocks from the left are detected as mismatched portions. As a result of the continuation unit 24 examining two blocks before and after each mismatched portion, the first, second, fourth, seventh, and eighth blocks are estimated to be collision occurrence portions α. The continuation unit 24 then changes the estimated collision occurrence portions α into mismatched portions, so that the blocks covering the edited frames 2 to 4 are detected as mismatched portions. As in FIG. 8, the unedited part β is also detected as a mismatched portion, but, as shown in FIG. 1, the same data is simply overwritten in the original file and the duplicate file, so the original data is preserved.
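The widening performed in steps S703 to S705, and the block numbers quoted for FIGS. 8 and 9, can be checked with the sketch below. It uses 0-based block indices, so "the second block from the left" is index 1. The function name `estimate_collisions` and the total of nine blocks are assumptions for the example; the figures are only known here to contain at least eight blocks.

```python
def estimate_collisions(mismatched, n, total_blocks):
    """Return matched blocks lying within n blocks of any mismatched block.

    Each mismatched block is a base point (step S703); neighbours that are
    matched are estimated to be collision occurrence portions (step S704).
    """
    mismatched = set(mismatched)
    estimated = set()
    for base in mismatched:
        for offset in range(1, n + 1):
            for neighbour in (base - offset, base + offset):
                inside = 0 <= neighbour < total_blocks
                if inside and neighbour not in mismatched:
                    estimated.add(neighbour)
    return estimated


TOTAL_BLOCKS = 9  # hypothetical file length in blocks

# FIG. 8: blocks 2, 3, 5 (1-based) mismatch, n = 1.
fig8 = estimate_collisions([1, 2, 4], n=1, total_blocks=TOTAL_BLOCKS)
print(sorted(fig8))  # [0, 3, 5], i.e. blocks 1, 4, 6 in 1-based numbering

# FIG. 9: blocks 3, 5, 6 (1-based) mismatch, n = 2.
fig9 = estimate_collisions([2, 4, 5], n=2, total_blocks=TOTAL_BLOCKS)
print(sorted(fig9))  # [0, 1, 3, 6, 7], i.e. blocks 1, 2, 4, 7, 8
```

Taking the union of the estimated set with the original mismatched set corresponds to step S705.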

As described above, in the file processing apparatus 2 of Embodiment 2, when mismatched portions between files #1 and #2 are detected using the digest values of the block data into which the files are divided, the continuation unit 24 examines each of the n blocks before and after every mismatched portion detected by the comparison unit 23. If such a block is a matched portion, the unit estimates it to be a collision occurrence portion and changes it into a mismatched portion. Collisions can thus be detected and eliminated by a simple method, without comparing large volumes of file data, so that the detection accuracy of the mismatched portions is improved and high reliability is achieved.

In addition, unlike the file processing apparatus 1 of Embodiment 1 shown in FIG. 2, the file processing apparatus 2 of Embodiment 2 has neither the synchronization detection units 11-1 and 11-2 nor the continuity determination unit 15. Even when the frames and the block data are not synchronized, it detects and eliminates collisions by widening the mismatched portions according to the number of excluded collisions n. No processing is needed to synchronize the frames with the block data, so the processing and the apparatus are considerably simpler than in Embodiment 1.

In addition to the configuration of the file processing apparatus 2 of Embodiment 2 shown in FIG. 6, an isolation removal unit may be provided downstream of the continuation unit 24. The isolation removal unit removes matched portions that are isolated between mismatched portions and produces a single run of mismatched portions. Specifically, the isolation removal unit receives from the continuation unit 24 the position information of the mismatched portions from which collisions have been eliminated, treats any run of at least a predetermined number of consecutive mismatched portions as a mismatch group, and compares the number of blocks in the matched portion lying between two mismatch groups with a preset threshold. If the number of blocks in the matched portion is smaller than the threshold, the isolation removal unit adds that matched portion to the mismatched portions and outputs the position information of the mismatched portions.
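One possible reading of the isolation removal step is sketched below. The specification does not give the predetermined run length or the threshold, so `min_group` and `threshold` are hypothetical parameters, as is the function name `remove_isolated`; block positions are 0-based indices.

```python
def remove_isolated(mismatched, min_group, threshold):
    """Fill short matched gaps that lie between two mismatch groups.

    A mismatch group is a run of at least min_group consecutive mismatched
    blocks. A gap between adjacent groups is filled when it contains fewer
    than threshold matched blocks.
    """
    original = set(mismatched)

    # Collect runs of consecutive mismatched blocks as [start, end] pairs.
    runs = []
    for block in sorted(original):
        if runs and block == runs[-1][1] + 1:
            runs[-1][1] = block
        else:
            runs.append([block, block])

    groups = [run for run in runs if run[1] - run[0] + 1 >= min_group]

    result = set(original)
    for (_, end_a), (start_b, _) in zip(groups, groups[1:]):
        gap = [b for b in range(end_a + 1, start_b) if b not in original]
        if len(gap) < threshold:
            result.update(gap)
    return sorted(result)


blocks = [0, 1, 2, 4, 5, 6, 10, 11]
print(remove_isolated(blocks, min_group=2, threshold=2))
# [0, 1, 2, 3, 4, 5, 6, 10, 11]
```

In the example, the single matched block 3 is absorbed because it is shorter than the threshold, while the three-block gap 7 to 9 is left alone, so two of the three groups merge into one.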

With the isolation removal unit, the file processing apparatus 2 can widen the mismatched portions further for the sake of safety, achieving still higher reliability. Moreover, because a matched portion isolated between mismatch groups is changed into a mismatched portion, two mismatch groups can, for example, be merged into one. Processing such as transfer then needs to be performed only once, on the data of a single mismatched portion, which reduces the processing load.

When the number of excluded collisions is set to n = 1, the continuation unit 24 widens the mismatched portions by one block on each side and eliminates collisions, but only within that range. Adding the isolation removal unit widens the mismatched portions further, so collisions can be eliminated over the same range as when n = 2 is set.

Next, Embodiment 3 is described in detail. The file processing apparatus 3 of Embodiment 3 detects mismatched portions of files when the frame size (synchronization interval) of the two files being compared varies. Specifically, the file processing apparatus 3 detects the synchronization of the two files, determines the block size N from the synchronization information, the division number m, and the constant X (the integer multiple used when dividing the file data into blocks), and then performs the same processing as the file processing apparatus 1 of Embodiment 1. That is, the file processing apparatus 3 calculates and compares the digest values of the block data obtained by dividing each file into blocks of size N, and detects the blocks whose digest values differ as mismatched portions. It then determines the continuity of the detected mismatched portions and, if a run of mismatched portions that should be continuous contains a matched portion, treats that matched portion as a digest value collision occurrence portion and eliminates the collision by changing it into a mismatched portion.

FIG. 10 is a block diagram showing the configuration of the file processing apparatus 3 according to Example 3. The file processing apparatus 3 includes synchronization detection units 30-1 and 30-2, a block size determination unit 31, blocking units 32-1 and 32-2, digest value calculation units 33-1 and 33-2, a comparison unit 34, a continuity determination unit 35, and a continuation unit 36.

Comparing the file processing apparatus 1 of Example 1 shown in FIG. 2 with the file processing apparatus 3, the two apparatuses are the same in that they include synchronization detection units 11-1, 11-2, 30-1, and 30-2, blocking units 12-1, 12-2, 32-1, and 32-2, digest value calculation units 13-1, 13-2, 33-1, and 33-2, comparison units 14 and 34, continuity determination units 15 and 35, and continuation units 16 and 36. They differ in that the file processing apparatus 1 includes a setting file unit 10, whereas the file processing apparatus 3 includes a block size determination unit 31. In the file processing apparatus 1, the block size N is set in advance, whereas in the file processing apparatus 3, the block size N is determined based on the synchronization information of the file #1 and file #2 data.

The synchronization detection units 30-1 and 30-2, the blocking units 32-1 and 32-2, the digest value calculation units 33-1 and 33-2, the comparison unit 34, the continuity determination unit 35, and the continuation unit 36 perform the same processing as the synchronization detection units 11-1 and 11-2, the blocking units 12-1 and 12-2, the digest value calculation units 13-1 and 13-2, the comparison unit 14, the continuity determination unit 15, and the continuation unit 16 of Example 1, so their description is omitted here.

The block size determination unit 31 receives synchronization information a from the synchronization detection unit 30-1 and synchronization information b from the synchronization detection unit 30-2. Each piece of synchronization information includes the synchronization interval, which indicates the frame size, and the timing of the head position of that interval. The block size determination unit 31 compares synchronization information a with synchronization information b. When it determines that they are the same, it obtains the block size N using the preset constant X and division number m, outputs the synchronization information and the block size N to the blocking units 32-1 and 32-2, and outputs the division number m to the continuity determination unit 35 and the continuation unit 36. When it determines that synchronization information a and b are not the same, it outputs "all mismatch". In that case, collision detection and exclusion are not performed. In Example 3, the frame size (synchronization interval) of the two files to be compared varies, so the block size N takes a value that follows that variation.

FIG. 11 is a flowchart showing the processing of the block size determination unit 31. The block size determination unit 31 receives synchronization information a, which includes the synchronization interval and timing, from the synchronization detection unit 30-1, and receives synchronization information b from the synchronization detection unit 30-2 (step S1101). The block size determination unit 31 then compares synchronization information a with synchronization information b (step S1102). When it determines that they are the same (step S1102: Y), that is, when the synchronization interval of the file #1 and file #2 data and the timing of its head position are the same, it divides the data of the synchronization interval, or of an integer multiple (X times) of the synchronization interval (one frame or a plurality of frames), by the division number m to obtain the block size N (step S1103). The block size determination unit 31 then outputs the synchronization information (synchronization information a, b) and the block size N to the blocking units 32-1 and 32-2, and outputs the preset division number m to the continuity determination unit 35 and the continuation unit 36 (step S1104).

On the other hand, when the block size determination unit 31 determines in step S1102 that synchronization information a and b are not the same (step S1102: N), that is, when at least one of the synchronization interval of the file #1 and file #2 data and the timing of its head position differs, it outputs "all mismatch", which indicates that collision detection and exclusion are not performed (step S1105).
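The flow of FIG. 11 (steps S1101 to S1105) can be sketched roughly as follows. This is an illustrative sketch only, not the patented implementation: the `SyncInfo` type, the function name `determine_block_size`, the use of byte counts for the interval, integer division, and returning `None` to signal "all mismatch" are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncInfo:
    """Hypothetical container for synchronization information."""
    interval: int     # synchronization interval (frame size), assumed to be in bytes
    head_offset: int  # timing of the head position of the interval


def determine_block_size(a: SyncInfo, b: SyncInfo, x: int, m: int) -> Optional[int]:
    """Return block size N, or None to signal 'all mismatch' (step S1105)."""
    # Step S1102: both the interval and the head timing must agree.
    if a != b:
        return None
    # Step S1103: divide X synchronization intervals by the division number m.
    # Integer division is an assumption; the text does not say how remainders are handled.
    return (a.interval * x) // m


# Example: a 4096-byte frame, X = 2 frames per section, m = 4 blocks per section.
n = determine_block_size(SyncInfo(4096, 0), SyncInfo(4096, 0), x=2, m=4)
print(n)
```

When the two inputs disagree, the caller would skip collision detection and exclusion entirely, as described for step S1105.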

As described above, according to the file processing apparatus 3 of Example 3, when mismatched portions of files #1 and #2 are detected using the digest values of block data obtained by dividing the files, the block size determination unit 31, upon determining that synchronization information a and b are the same, divides the data of the synchronization interval or of an integer multiple (X times) of it by the division number m to obtain the block size N. The blocking units 32-1 and 32-2 receive the synchronization information and the block size N from the block size determination unit 31 and divide the files into block data. Then, as with the continuity determination unit 15 of Example 1, the continuity determination unit 35 determines the continuity of the mismatched portions for each predetermined data section affected by file editing. When the number of mismatched portions in the predetermined data section is smaller than the preset division number m used to divide that section into block data, the matched portions other than the mismatched portions are detected as collision occurrence portions. As with the continuation unit 16 of Example 1, the continuation unit 36 changes the collision occurrence portions to mismatched portions. As a result, collisions in variable-rate files can be detected and excluded by a simple method without comparing large-capacity file data, which improves the detection accuracy of mismatched portions and achieves high reliability. In addition, the block size N does not need to be set in advance. It can be obtained directly from the variable-rate file #1 and file #2 data, which saves the effort of setting it.
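The combined behavior of the continuity determination unit and the continuation unit described above can be illustrated with a short sketch. It assumes that the comparison result is available as a list of booleans (True for a mismatched block) and that each data section consists of m consecutive blocks. The function name and the rule that a section with no mismatched blocks is left untouched are assumptions made for the example; the text only states the "fewer than m" condition.

```python
from typing import List


def eliminate_collisions_by_section(mismatch: List[bool], m: int) -> List[bool]:
    """Mark every block of a partly mismatched section as mismatched.

    mismatch[i] is True when the digest values of block i differ.
    Each section is a run of m consecutive blocks (the last may be shorter).
    """
    result = list(mismatch)
    for start in range(0, len(mismatch), m):
        section = mismatch[start:start + m]
        count = sum(section)
        # Assumption: a section with zero mismatches is treated as fully matched.
        # A section with some, but not all, blocks mismatched is assumed to
        # contain digest collisions in its matched blocks.
        if 0 < count < len(section):
            for i in range(start, start + len(section)):
                result[i] = True
    return result


flags = [False, True, False, False, False, False, False, False]
print(eliminate_collisions_by_section(flags, m=4))
```

With m = 4, the single mismatched block in the first section causes the whole section to be reported as mismatched, while the second section is left as matched.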

Next, Example 4 will be described in detail. The file processing apparatus 4 of Example 4 detects mismatched portions of files when the frame size (synchronization interval) of the two files to be compared varies. Specifically, the file processing apparatus 4 detects the synchronization of the two files, determines the block size N by the same processing as the file processing apparatus 3 of Example 3, and excludes collisions by the same processing as the file processing apparatus 2 of Example 2. That is, the file processing apparatus 4 detects the synchronization of the two files and determines the block size N based on the synchronization information, the division number m, and a constant X, which is the integer multiplier (X times) used when dividing the file data into blocks. The file processing apparatus 4 then calculates and compares the digest values of the block data obtained by dividing each file into blocks of size N, and detects the block data whose digest values differ as mismatched portions. Finally, among the block data before and after each detected mismatched portion, it estimates the matched portions to be collision occurrence portions and excludes the collisions by changing those portions to mismatched portions.

FIG. 12 is a block diagram showing the configuration of the file processing apparatus 4 according to Example 4. The file processing apparatus 4 includes synchronization detection units 40-1 and 40-2, a block size determination unit 41, blocking units 42-1 and 42-2, digest value calculation units 43-1 and 43-2, a comparison unit 44, and a continuation unit 45.

Comparing the file processing apparatus 3 of Example 3 shown in FIG. 10 with the file processing apparatus 4, the two apparatuses are the same in that they include synchronization detection units 30-1, 30-2, 40-1, and 40-2, blocking units 32-1, 32-2, 42-1, and 42-2, digest value calculation units 33-1, 33-2, 43-1, and 43-2, and comparison units 34 and 44. They differ in that the file processing apparatus 3 includes the block size determination unit 31, the continuity determination unit 35, and the continuation unit 36, whereas the file processing apparatus 4 has no continuity determination unit and includes a block size determination unit 41 and a continuation unit 45 whose functions differ from those of their counterparts. The continuation unit 45 of the file processing apparatus 4 has the same function as the continuation unit 24 of the file processing apparatus 2 of Example 2 shown in FIG. 6.

The two apparatuses 3 and 4 are the same in that the block size N is determined based on the synchronization information of the file #1 and file #2 data. They differ in how collisions are excluded. The file processing apparatus 3 determines the continuity of the mismatched portions and, when a run of mismatched portions that should be continuous contains a matched portion, treats that matched portion as a digest value collision occurrence portion and excludes the collision. The file processing apparatus 4, without determining continuity, estimates the matched portions among the block data before and after each mismatched portion to be collision occurrence portions and excludes the collisions.

The synchronization detection units 40-1 and 40-2, the blocking units 42-1 and 42-2, the digest value calculation units 43-1 and 43-2, and the comparison unit 44 perform the same processing as the synchronization detection units 30-1 and 30-2, the blocking units 32-1 and 32-2, the digest value calculation units 33-1 and 33-2, and the comparison unit 34 of Example 3, so their description is omitted here. The continuation unit 45 performs the same processing as the continuation unit 24 of Example 2, so its description is also omitted.

The block size determination unit 41 receives synchronization information a from the synchronization detection unit 40-1 and synchronization information b from the synchronization detection unit 40-2. Each piece of synchronization information includes the synchronization interval, which indicates the frame size, and the timing of the head position of that interval. The block size determination unit 41 compares synchronization information a with synchronization information b. When it determines that they are the same, it divides the data of the synchronization interval or of an integer multiple (X times) of it by the division number m to obtain the block size N, outputs the synchronization information and the block size N to the blocking units 42-1 and 42-2, and outputs the number of excluded collisions n to the continuation unit 45. The constant X, the division number m, and the number of excluded collisions n are assumed to be set in advance. When the block size determination unit 41 determines that synchronization information a and b are not the same, it outputs "all mismatch". In that case, collision detection and exclusion are not performed. In Example 4, the frame size (synchronization interval) of the two files to be compared varies, so the block size N takes a value that follows that variation.

As described above, according to the file processing apparatus 4 of Example 4, when mismatched portions of files #1 and #2 are detected using the digest values of block data obtained by dividing the files, the block size determination unit 41, upon determining that synchronization information a and b are the same, divides the data of the synchronization interval or of an integer multiple (X times) of it by the division number m to obtain the block size N. The blocking units 42-1 and 42-2 receive the synchronization information and the block size N from the block size determination unit 41 and divide the files into block data. Then, as with the continuation unit 24 of Example 2, the continuation unit 45 examines the n blocks (n being the number of excluded collisions) before and after each mismatched portion detected by the comparison unit 44. Any of those blocks that is a matched portion is estimated to be a collision occurrence portion, and the estimated collision occurrence portion is changed to a mismatched portion. As a result, collisions in variable-rate files can be detected and excluded by a simple method without comparing large-capacity file data, which improves the detection accuracy of mismatched portions and achieves high reliability. In addition, the block size N does not need to be set in advance. It can be obtained directly from the variable-rate file #1 and file #2 data, which saves the effort of setting it.
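The neighbor-based exclusion performed by the continuation unit can be sketched as follows, again assuming that the comparison result is a list of booleans (True for a mismatched block). The function name `expand_mismatches` is hypothetical. The sketch widens only around the mismatches reported by the comparison step, so the widening does not cascade; that single-pass reading is an assumption.

```python
from typing import List


def expand_mismatches(mismatch: List[bool], n: int) -> List[bool]:
    """Mark the n blocks before and after each detected mismatch as mismatched."""
    result = list(mismatch)
    # Iterate over the original flags so newly marked blocks are not expanded again.
    for i, is_mismatch in enumerate(mismatch):
        if is_mismatch:
            lo = max(0, i - n)
            hi = min(len(mismatch), i + n + 1)
            for j in range(lo, hi):
                result[j] = True
    return result


flags = [False, False, True, False, False, False]
print(expand_mismatches(flags, n=1))
print(expand_mismatches(flags, n=2))
```

A larger n removes more potential collisions at the cost of reporting more blocks as mismatched, which is the trade-off the number of excluded collisions controls.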

Next, Example 5 will be described in detail. In the file processing apparatus 5 of Example 5, one of the two files to be compared is stored in advance in a storage unit together with the digest values of its block data, position information, the block size N, and the division number m. As in Example 1, the apparatus detects mismatched portions of files when the frame sizes (synchronization intervals) of the two files to be compared are constant and equal, and the frames and the block data are synchronized.

FIG. 13 is a block diagram showing the configuration of the file processing apparatus 5 according to Example 5. The file processing apparatus 5 includes a storage unit 50, a synchronization detection unit 51, a blocking unit 52, a digest value calculation unit 53, a digest value reading unit 54, a comparison unit 55, a continuity determination unit 56, and a continuation unit 57.

Comparing the file processing apparatus 1 of Example 1 shown in FIG. 2 with the file processing apparatus 5, the two apparatuses are the same in that they include synchronization detection units 11-1 and 51, blocking units 12-1 and 52, and digest value calculation units 13-1 and 53, which process the file #1 data, as well as comparison units 14 and 55, continuity determination units 15 and 56, and continuation units 16 and 57. They differ in that the file processing apparatus 1 includes the synchronization detection unit 11-2, the blocking unit 12-2, and the digest value calculation unit 13-2, which process the file #2 data, together with the setting file unit 10, whereas the file processing apparatus 5 includes the storage unit 50 and the digest value reading unit 54.

The synchronization detection unit 51, the blocking unit 52, the digest value calculation unit 53, the comparison unit 55, the continuity determination unit 56, and the continuation unit 57 perform the same processing as the synchronization detection unit 11-1, the blocking unit 12-1, the digest value calculation unit 13-1, the comparison unit 14, the continuity determination unit 15, and the continuation unit 16 of Example 1, so their description is omitted here. Note that the digest value calculation unit 53 calculates digest values using the same function and settings that were used to calculate the digest values of the file #2 data stored in the storage unit 50.

The storage unit 50 stores the preset block size N and division number m as a setting file. The storage unit 50 also stores the file #2 data, along with the digest value of each block of data, obtained by dividing the file #2 data into blocks beforehand, and the corresponding position information. In other words, the digest values and position information of the file #2 data are not calculated from the file #2 data and the block size N at comparison time. They are stored in advance in the storage unit 50 as a database or a file.

The digest value reading unit 54 receives position information from the blocking unit 52, reads the digest value for that position from the storage unit 50, and outputs the read digest value and the position information to the comparison unit 55.

The comparison unit 55 receives the digest values and position information of the block data of file #1 from the digest value calculation unit 53, and the digest values and position information of the block data of file #2 from the digest value reading unit 54. It compares the digest values that have the same position information and outputs the position information of any digest values that differ to the continuity determination unit 56 as the position information of mismatched portions. The continuity determination unit 56 and the continuation unit 57 read the division number m from the storage unit 50.
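The lookup-and-compare path described above can be sketched as follows. The storage unit is modeled as an in-memory dictionary that maps a block's byte offset (standing in for the position information) to its digest. SHA-256 from `hashlib` is used purely as an example digest function; the text does not name one. The function names, and the choice to treat a position that is missing from the store as mismatched, are assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


def block_digests(data: bytes, n: int) -> Dict[int, str]:
    """Divide data into blocks of size n and map each block offset to its digest."""
    return {
        pos: hashlib.sha256(data[pos:pos + n]).hexdigest()
        for pos in range(0, len(data), n)
    }


def find_mismatches(file1: bytes, stored: Dict[int, str], n: int) -> List[int]:
    """Compare file #1 blocks against digests stored in advance for file #2.

    Returns the offsets of blocks whose digests differ. A position that is
    absent from the store is treated as mismatched (an assumption).
    """
    mismatched = []
    for pos, digest in block_digests(file1, n).items():
        if stored.get(pos) != digest:
            mismatched.append(pos)
    return mismatched


# Digests of file #2 are computed once and kept, standing in for the storage unit.
stored_digests = block_digests(b"aaaabbbbcccc", 4)
print(find_mismatches(b"aaaaXbbbcccc", stored_digests, 4))
```

Both sides must use the same digest function and block size for the comparison to be meaningful, which is why the calculation unit reuses the function applied to the stored file.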

As described above, according to the file processing apparatus 5 of Example 5, the file #1 data is divided into blocks to obtain digest values and position information, the digest values of the file #2 data stored in advance are read from the storage unit 50 using the position information as a key, and the mismatched portions of files #1 and #2 are detected using the two sets of digest values. In doing so, as with the continuity determination units 15 and 35 of Examples 1 and 3, the continuity determination unit 56 determines the continuity of the mismatched portions for each predetermined data section affected by file editing. When the number of mismatched portions in the predetermined data section is smaller than the preset division number m used to divide that section into block data, the matched portions other than the mismatched portions are detected as collision occurrence portions, and the continuation unit 57 changes the collision occurrence portions to mismatched portions. As a result, collisions can be reliably detected and excluded by a simple method without comparing large-capacity file data. Accordingly, since the probability of a collision occurring becomes 0, the detection accuracy of mismatched portions is improved and high reliability is achieved.

Next, Example 6 will be described in detail. In the file processing apparatus 6 of Example 6, one of the two files to be compared is stored in advance in a storage unit together with the digest values of its block data, position information, the block size N, and the number of excluded collisions n. As in Example 2, the apparatus detects mismatched portions of files when the frame sizes (synchronization intervals) of the two files to be compared are constant and equal, and the frames and the block data are not synchronized.

FIG. 14 is a block diagram showing the configuration of the file processing apparatus 6 according to Example 6. The file processing apparatus 6 includes a storage unit 60, a blocking unit 61, a digest value calculation unit 62, a digest value reading unit 63, a comparison unit 64, and a continuation unit 65.

Comparing the file processing apparatus 2 of Example 2 shown in FIG. 6 with the file processing apparatus 6, the two apparatuses are the same in that they include blocking units 21-1 and 61 and digest value calculation units 22-1 and 62, which process the file #1 data, as well as comparison units 23 and 64 and continuation units 24 and 65. They differ in that the file processing apparatus 2 includes the blocking unit 21-2 and the digest value calculation unit 22-2, which process the file #2 data, together with a setting file unit 20, whereas the file processing apparatus 6 includes the storage unit 60 and the digest value reading unit 63.

The blocking unit 61, the digest value calculation unit 62, the comparison unit 64, and the continuation unit 65 perform the same processing as the blocking unit 21-1, the digest value calculation unit 22-1, the comparison unit 23, and the continuation unit 24 of Example 2, so their description is omitted here. Note that the digest value calculation unit 62 calculates digest values using the same function and settings that were used to calculate the digest values of the file #2 data stored in the storage unit 60.

The storage unit 60 stores the preset block size N and number of excluded collisions n as a setting file. The storage unit 60 also stores the file #2 data, along with the digest value of each block of data, obtained by dividing the file #2 data into blocks beforehand, and the corresponding position information. In other words, the digest values and position information of the file #2 data are not calculated from the file #2 data and the block size N at comparison time. They are stored in advance in the storage unit 60 as a database or a file.

The digest value reading unit 63 receives position information from the blocking unit 61, reads the digest value for that position from the storage unit 60, and outputs the read digest value and the position information to the comparison unit 64.

The comparison unit 64 receives the digest values and position information of the block data of file #1 from the digest value calculation unit 62, and the digest values and position information of the block data of file #2 from the digest value reading unit 63. It compares the digest values that have the same position information and outputs the position information of any digest values that differ to the continuation unit 65 as the position information of mismatched portions. The continuation unit 65 reads the number of excluded collisions n from the storage unit 60.

As described above, according to the file processing apparatus 6 of Example 6, the file #1 data is divided into blocks to obtain digest values and position information, the digest values of the file #2 data stored in advance are read from the storage unit 60 using the position information as a key, and the mismatched portions of files #1 and #2 are detected using the two sets of digest values. In doing so, as with the continuation units 24 and 45 of Examples 2 and 4, the continuation unit 65 examines the n blocks (n being the number of excluded collisions) before and after each mismatched portion detected by the comparison unit 64. Any of those blocks that is a matched portion is estimated to be a collision occurrence portion, and the estimated collision occurrence portion is changed to a mismatched portion. As a result, collisions can be detected and excluded by a simple method without comparing large-capacity file data, which improves the detection accuracy of mismatched portions and achieves high reliability.

Next, Example 7 will be described in detail. In the file processing apparatus 7 of Example 7, one of the two files to be compared is stored in advance in a storage unit together with the digest values of its block data, position information, synchronization information, and the division number m. As in Example 3, the apparatus detects mismatched portions of files when the frame size (synchronization interval) of the two files to be compared varies.

FIG. 15 is a block diagram showing the configuration of the file processing apparatus 7 according to Example 7. The file processing apparatus 7 includes a storage unit 70, a synchronization detection unit 71, a block size determination unit 72, a blocking unit 73, a digest value calculation unit 74, a digest value reading unit 75, a comparison unit 76, a continuity determination unit 77, and a continuation unit 78.

Comparing the file processing apparatus 3 of the third embodiment shown in FIG. 10 with the file processing apparatus 7, the two are the same in that both include the units that process the file #1 data, namely the synchronization detection units 30-1 and 71, the blocking units 32-1 and 73, and the digest value calculation units 33-1 and 74, as well as the block size determination units 31 and 72, the comparison units 34 and 76, the continuity determination units 35 and 77, and the continuation units 36 and 78. They differ in that the file processing apparatus 3 includes a synchronization detection unit 30-2, a blocking unit 32-2, and a digest value calculation unit 33-2 that process the file #2 data, whereas the file processing apparatus 7 includes the storage unit 70 and the digest value reading unit 75 instead.

The synchronization detection unit 71, the block size determination unit 72, the blocking unit 73, the digest value calculation unit 74, the comparison unit 76, the continuity determination unit 77, and the continuation unit 78 perform the same processing as the synchronization detection unit 30-1, the block size determination unit 31, the blocking unit 32-1, the digest value calculation unit 33-1, the comparison unit 34, the continuity determination unit 35, and the continuation unit 36 of the third embodiment, so their description is omitted here. Note that the digest value calculation unit 74 calculates digest values using the same function and the like that was used to calculate the digest values of the file #2 data stored in the storage unit 70. The block size determination unit 72 reads the synchronization information b of the file #2 data and the division number m from the storage unit 70 and determines the block size N.

The storage unit 70 stores the preset division number m as a setting file. It also stores the file #2 data and the synchronization information b, together with the digest value of each block of data, already obtained by dividing the file #2 data into blocks, and the corresponding position information. In other words, the digest values and position information of the file #2 data are not computed from the file #2 data and the block size N but are stored in advance in the storage unit 70 as a DB or a file. Likewise, the synchronization information b is not derived from the file #2 data but is stored in advance in the storage unit 70 as a DB or a file.

The digest value reading unit 75 receives position information from the blocking unit 73, reads the digest value for that position information from the storage unit 70, and outputs the read digest value and the position information to the comparison unit 76.

The comparison unit 76 receives the digest values and position information of the file #1 block data from the digest value calculation unit 74, and the digest values and position information of the file #2 block data from the digest value reading unit 75. It compares digest values that have the same position information and outputs the position information of differing digest values to the continuity determination unit 77 as the position information of mismatched portions.
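The interplay of the digest value calculation unit, the digest value reading unit, and the comparison unit can be sketched as follows. This is a minimal illustration only: SHA-256 is an assumed digest function (the specification refers only to digest values such as hash values), position information is assumed to be a byte offset, and a plain dictionary stands in for the storage unit. All function names are hypothetical.

```python
import hashlib


def block_digests(data: bytes, block_size: int) -> dict[int, str]:
    """Map each block's byte offset (the position information) to its digest."""
    return {
        offset: hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    }


def find_mismatches(file1: bytes, stored: dict[int, str], block_size: int) -> list[int]:
    """Compare file #1 digests with stored file #2 digests at the same position.

    A position missing from the store is also reported as a mismatch.
    """
    current = block_digests(file1, block_size)
    return [pos for pos, digest in current.items() if stored.get(pos) != digest]


file2 = b"AAAABBBBCCCCDDDD"
stored = block_digests(file2, 4)   # stands in for the pre-populated storage unit
file1 = b"AAAAXXXXCCCCDDDD"
print(find_mismatches(file1, stored, 4))  # [4]
```

Only the file #1 data is read and hashed at comparison time; the file #2 side is a lookup keyed by position, which is the point of storing the digests in advance.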

As described above, the file processing apparatus 7 of the seventh embodiment divides the file #1 data into blocks to obtain digest values and position information, reads the previously stored digest values of the file #2 data from the storage unit 70 using the position information as a key, and uses the two sets of digest values to detect mismatched portions between files #1 and #2. In doing so, when the block size determination unit 72 determines that the synchronization information a and b are the same, it obtains the block size N by dividing the data of one synchronization interval, or of an integer multiple (X times) of it, by the division number m. The blocking unit 73 then receives the synchronization information and the block size N from the block size determination unit 72 and divides the file into block data. The continuity determination unit 77, like the continuity determination units 15 and 35 of the first and third embodiments, determines the continuity of mismatched portions for each predetermined data section affected by file editing. If the number of mismatched portions in a data section is smaller than the preset division number m used to divide that section into block data, the matching portions in the section are detected as collision occurrence portions, and the continuation unit 78, like the continuation units 16 and 36 of the first and third embodiments, changes the collision occurrence portions into mismatched portions. Collisions can thus be detected and eliminated by a simple method without comparing large volumes of file data, which improves the detection accuracy of mismatched portions and achieves high reliability. In addition, the block size N does not need to be set in advance: it can be obtained directly from the synchronization information a and b of the variable-rate file #1 and #2 data, which saves the effort of setting it.
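The section-wise continuity determination and continuation can be sketched as below. The sketch assumes that block indices serve as position information, that each data section consists of m consecutive blocks starting at index 0, and that a section containing no mismatch at all is left untouched (this last condition is an assumption made here so that unedited sections are not flagged). The function name `fill_sections` is hypothetical.

```python
def fill_sections(mismatched, m, total_blocks):
    """Detect collision blocks per section of m blocks and mark them mismatched.

    Returns (collision block indices, final mismatched block indices).
    """
    mismatched = set(mismatched)
    collisions = []
    for start in range(0, total_blocks, m):
        section = range(start, min(start + m, total_blocks))
        hits = [b for b in section if b in mismatched]
        # Assumption: a section with no mismatch is treated as unedited.
        if 0 < len(hits) < m:
            # Matching blocks inside a partly mismatched section are
            # presumed to be digest collisions.
            collisions.extend(b for b in section if b not in mismatched)
    return collisions, sorted(mismatched | set(collisions))


collisions, result = fill_sections([4, 5, 7], m=4, total_blocks=12)
print(collisions)  # [6]
print(result)      # [4, 5, 6, 7]
```

Because an edit is expected to affect a whole data section, a lone matching block inside an otherwise mismatched section is more plausibly a collision than unchanged data, so it is converted rather than verified against the file contents.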

Next, the eighth embodiment is described in detail. In the file processing apparatus 8 of the eighth embodiment, one of the two files to be compared is stored in advance in the storage unit together with the digest values of its block data, the position information, the synchronization information, the division number m, and the excluded collision count n. As in the fourth embodiment, the apparatus detects mismatched portions of the files when the frame size (synchronization interval) of the two files to be compared varies.

FIG. 16 is a block diagram showing the configuration of the file processing apparatus 8 according to the eighth embodiment. The file processing apparatus 8 includes a storage unit 80, a synchronization detection unit 81, a block size determination unit 82, a blocking unit 83, a digest value calculation unit 84, a digest value reading unit 85, a comparison unit 86, and a continuation unit 87.

Comparing the file processing apparatus 4 of the fourth embodiment shown in FIG. 12 with the file processing apparatus 8, the two are the same in that both include the units that process the file #1 data, namely the synchronization detection units 40-1 and 81, the blocking units 42-1 and 83, and the digest value calculation units 43-1 and 84, as well as the block size determination units 41 and 82, the comparison units 44 and 86, and the continuation units 45 and 87. They differ in that the file processing apparatus 4 includes a synchronization detection unit 40-2, a blocking unit 42-2, and a digest value calculation unit 43-2 that process the file #2 data, whereas the file processing apparatus 8 includes the storage unit 80 and the digest value reading unit 85 instead.

The synchronization detection unit 81, the block size determination unit 82, the blocking unit 83, the digest value calculation unit 84, the comparison unit 86, and the continuation unit 87 perform the same processing as the synchronization detection unit 40-1, the block size determination unit 41, the blocking unit 42-1, the digest value calculation unit 43-1, the comparison unit 44, and the continuation unit 45 of the fourth embodiment, so their description is omitted here. Note that the digest value calculation unit 84 calculates digest values using the same function and the like that was used to calculate the digest values of the file #2 data stored in the storage unit 80. The block size determination unit 82 reads the synchronization information b of the file #2 data, the division number m, and the excluded collision count n from the storage unit 80 and determines the block size N.

The storage unit 80 stores the preset division number m and excluded collision count n as a setting file. It also stores the file #2 data and the synchronization information b, together with the digest value of each block of data, already obtained by dividing the file #2 data into blocks, and the corresponding position information. In other words, the digest values and position information of the file #2 data are not computed from the file #2 data and the block size N but are stored in advance in the storage unit 80 as a DB or a file. Likewise, the synchronization information b is not derived from the file #2 data but is stored in advance in the storage unit 80 as a DB or a file.

The digest value reading unit 85 receives position information from the blocking unit 83, reads the digest value for that position information from the storage unit 80, and outputs the read digest value and the position information to the comparison unit 86.

The comparison unit 86 receives the digest values and position information of the file #1 block data from the digest value calculation unit 84, and the digest values and position information of the file #2 block data from the digest value reading unit 85. It compares digest values that have the same position information and outputs the position information of differing digest values to the continuation unit 87 as the position information of mismatched portions.

As described above, the file processing apparatus 8 of the eighth embodiment divides the file #1 data into blocks to obtain digest values and position information, reads the previously stored digest values of the file #2 data from the storage unit 80 using the position information as a key, and uses the two sets of digest values to detect mismatched portions between files #1 and #2. In doing so, when the block size determination unit 82 determines that the synchronization information a and b are the same, it obtains the block size N by dividing the data of one synchronization interval, or of an integer multiple (X times) of it, by the division number m. The blocking unit 83 then receives the synchronization information and the block size N from the block size determination unit 82 and divides the file into block data. The continuation unit 87, like the continuation units 24 and 45 of the second and fourth embodiments, examines each of the n blocks (n being the excluded collision count) before and after a mismatched portion detected by the comparison unit 86. If such a block is a matching portion, it is presumed to be a collision occurrence portion and is changed to a mismatched portion. Collisions can thus be detected and eliminated by a simple method without comparing large volumes of file data, which improves the detection accuracy of mismatched portions and achieves high reliability. In addition, the block size N does not need to be set in advance: it can be obtained directly from the synchronization information a and b of the variable-rate file #1 and #2 data, which saves the effort of setting it.
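The block size determination described above amounts to N = (X × synchronization interval) / m. A minimal Python sketch follows. It simplifies the synchronization information to a single interval in bytes per file, returns `None` when the two intervals differ (this passage does not describe that case), and rejects intervals that do not divide evenly by m; all three points, as well as the function name `determine_block_size`, are assumptions made for illustration.

```python
from typing import Optional


def determine_block_size(sync_a: int, sync_b: int, m: int, x: int = 1) -> Optional[int]:
    """Return N = (x * sync interval) / m when both sync intervals agree."""
    if sync_a != sync_b:
        # Assumption: handling of differing synchronization information
        # is outside the scope of this sketch.
        return None
    span = x * sync_a  # size of the data section in bytes
    if span % m != 0:
        # Assumption: the section must split into m equal blocks.
        raise ValueError("data section is not evenly divisible by m")
    return span // m


print(determine_block_size(4096, 4096, m=8))       # 512
print(determine_block_size(4096, 4096, m=8, x=2))  # 1024
print(determine_block_size(4096, 2048, m=8))       # None
```

Deriving N this way ties the block boundaries to the frame boundaries, so an edit confined to one frame affects exactly the m blocks of that data section.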

An ordinary computer can be used as the hardware configuration of the file processing apparatuses 1 to 8 according to the first to eighth embodiments of the present invention. Each of the file processing apparatuses 1 to 8 is configured as a computer that includes a CPU, a volatile storage medium such as RAM, a nonvolatile storage medium such as ROM, interfaces, and the like. The functions of the setting file unit 10, the synchronization detection units 11-1 and 11-2, the blocking units 12-1 and 12-2, the digest value calculation units 13-1 and 13-2, the comparison unit 14, the continuity determination unit 15, and the continuation unit 16 of the file processing apparatus 1 are each realized by causing the CPU to execute a program describing those functions. The functions of the setting file unit 20, the blocking units 21-1 and 21-2, the digest value calculation units 22-1 and 22-2, the comparison unit 23, and the continuation unit 24 of the file processing apparatus 2 are likewise each realized by causing the CPU to execute a program describing those functions. The same applies to the functions of the components of the file processing apparatuses 3 to 8. These programs can also be stored on and distributed via storage media such as magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical discs (CD-ROM, DVD, etc.), and semiconductor memory.

1 to 8, 100 File processing apparatus
10, 20, 101 Setting file unit
11, 30, 40, 51, 71, 81 Synchronization detection unit
12, 21, 32, 42, 52, 61, 73, 83, 102 Blocking unit
13, 22, 33, 43, 53, 62, 74, 84, 103 Digest value calculation unit
14, 23, 34, 44, 55, 64, 76, 86, 104 Comparison unit
15, 35, 56, 77 Continuity determination unit
16, 24, 36, 45, 57, 65, 78, 87 Continuation unit
31, 41, 72, 82 Block size determination unit
50, 60, 70, 80 Storage unit
54, 63, 75, 85 Digest value reading unit
90 Editing device
91 Server

Claims

A file processing apparatus that compares data of a plurality of files and detects mismatched portions, comprising:
a synchronization detection unit that, for each of the files, detects a synchronization interval indicating the size of a processing unit in the file and the timing of the start position of the synchronization interval, and generates synchronization information;
a blocking unit that, for each of the files, divides the data of the file into block data of a predetermined block size within a predetermined data section in accordance with the synchronization interval and timing indicated by the synchronization information generated by the synchronization detection unit, and generates position information indicating the position of the block data within the file;
a digest value calculation unit that, for each of the files, calculates a digest value of the block data divided by the blocking unit;
a comparison unit that compares the digest values calculated by the digest value calculation unit for each position indicated by the same position information generated by the blocking unit, and outputs the position information of differing digest values as position information of mismatched portions of the files;
a continuity determination unit that receives the position information of the mismatched portions output by the comparison unit, determines the continuity of the mismatched portions within the predetermined data section on the basis of that position information, treats a matching portion at which the mismatched portions are not continuous within the predetermined data section as a collision occurrence portion, and outputs the position information of the collision occurrence portion and the position information of the mismatched portions; and
a continuation unit that receives the position information of the collision occurrence portion and the position information of the mismatched portions output by the continuity determination unit, changes the position information of the collision occurrence portion into position information of a mismatched portion so that the mismatched portions become continuous, and outputs the position information of the continuous mismatched portions.

A file processing apparatus that compares data of a plurality of files and detects mismatched portions, comprising:
a blocking unit that, for each of the files, divides the data of the file into block data of a predetermined block size and generates position information indicating the position of the block data within the file;
a digest value calculation unit that, for each of the files, calculates a digest value of the block data divided by the blocking unit;
a comparison unit that compares the digest values calculated by the digest value calculation unit for each position indicated by the same position information generated by the blocking unit, and outputs the position information of differing digest values as position information of mismatched portions of the files; and
a continuation unit that receives the position information of the mismatched portions output by the comparison unit, widens each mismatched portion by a predetermined number before and after the position indicated by its position information so that the mismatched portions become continuous, and outputs the position information of the continuous mismatched portions.

The file processing apparatus according to claim 1,
further comprising a block size determination unit that, when it determines that the synchronization information generated by the synchronization detection unit is the same for each of the files, determines a block size on the basis of a predetermined division number used when dividing the file data within the predetermined data section into block data,
wherein the blocking unit, for each of the files, divides the data of the file, within the predetermined data section, into block data of the block size determined by the block size determination unit, and generates position information indicating the position of the block data within the file.

The file processing apparatus according to claim 2, further comprising:
a synchronization detection unit that, for each of the files, detects a synchronization interval indicating the size of a processing unit in the file and the timing of the start position of the synchronization interval, and generates synchronization information; and
a block size determination unit that, when it determines that the synchronization information generated by the synchronization detection unit is the same for each of the files, determines a block size on the basis of a predetermined division number used when dividing the file data within a predetermined data section into block data,
wherein the blocking unit, for each of the files, divides the data of the file, within the predetermined data section, into block data of the block size determined by the block size determination unit, and generates position information indicating the position of the block data within the file.

The file processing apparatus according to claim 1 or 2,
further comprising a storage unit in which one of the plurality of files to be compared is stored and in which the digest values and position information of the block data of that file are stored,
wherein a digest value reading unit is provided in place of the blocking unit, the digest value calculation unit, and, in the case of claim 1, the synchronization detection unit that would process the file stored in the storage unit,
the digest value reading unit reads from the storage unit the digest value corresponding to the position information generated by the blocking unit that processes the other file, which is not stored in the storage unit, and
the comparison unit compares, for each position indicated by the same position information, the digest value calculated by the digest value calculation unit that processes the other file not stored in the storage unit with the digest value read by the digest value reading unit, and outputs the position information of differing digest values as position information of mismatched portions of the files.

The file processing apparatus according to claim 3 or 4,
further comprising a storage unit in which one of the plurality of files to be compared is stored and in which the synchronization information of that file and the digest values and position information of its block data are stored,
wherein a digest value reading unit is provided in place of the synchronization detection unit, the blocking unit, and the digest value calculation unit that would process the file stored in the storage unit,
the digest value reading unit reads from the storage unit the digest value corresponding to the position information generated by the blocking unit that processes the other file, which is not stored in the storage unit,
the block size determination unit, when it determines that the synchronization information generated by the synchronization detection unit that processes the other file not stored in the storage unit is the same as the synchronization information of the file stored in the storage unit, determines the block size on the basis of a predetermined division number used when dividing the file data into block data, and
the comparison unit compares, for each position indicated by the same position information, the digest value calculated by the digest value calculation unit that processes the other file not stored in the storage unit with the digest value read by the digest value reading unit, and outputs the position information of differing digest values as position information of mismatched portions of the files.

A file processing program for causing a computer to function as the file processing apparatus according to any one of claims 1 to 6.