CN102651057A - OOXML (office open extensible markup language)-based electronic document digital evidence collecting method and device thereof - Google Patents


Info

Publication number
CN102651057A
Authority
CN
China
Prior art keywords
document
electronic document
ooxml
information
modification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011100462543A
Other languages
Chinese (zh)
Inventor
孙星明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN2011100462543A priority Critical patent/CN102651057A/en
Publication of CN102651057A publication Critical patent/CN102651057A/en
Pending legal-status Critical Current

Landscapes

  • Storage Device Security (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to an OOXML (Office Open XML)-based electronic document digital evidence collection method and a device thereof. The Office Open XML (OOXML) file format is the new file format adopted by Microsoft Office 2007 and later versions. The disclosed method collects relevant evidence from a suspected electronic file, such as author information, tampering time information, and information about hidden secret files, in order to determine the copyright ownership of the electronic file and to detect confidential information. The evidence collection process does not alter any content of the electronic file, and the collected features are highly robust. Moreover, the method can resist attacks such as Save As, Delete, Edit, and Copy, and can be used to safeguard confidential information in national defense, politics, and commerce.

Description

An OOXML-based electronic document digital evidence collection method and device thereof
Technical field
The present invention relates to the field of information security for computer text carried by electronic documents in OOXML format, and in particular to an OOXML-based electronic document digital evidence collection method and device.
Background
With the rapid development of computer and network technology, information security is becoming increasingly important. Digital forensics is an important branch of information security: it extracts valid digital evidence from the various information carriers under protection, so as to protect the carrier's original copyright against infringement.
Electronic documents are the most frequently used carrier type on the Internet, for example Word, Excel, PDF, HTML, and JavaScript files. Most information on the Internet is transmitted through electronic documents, which bring great convenience to daily life and work. However, precisely because electronic documents are easy to copy and quick to spread, they also bring a series of problems. One is copyright disputes: copyright disputes and infringement incidents occur frequently in daily life, and offenders even copy and pirate legally protected electronic documents on a large scale. Another is covert transmission: because electronic documents are so widely used, hidden information inside them is hard to discover, so terrorists or criminals use electronic documents as covert carriers to pass information about terrorist or criminal activities. In addition, in commerce and other fields, suspects' records, financial statements, and similar files are often stored on computers and circulated as electronic documents such as Word files.
The root cause of these problems is that research on digital forensics for electronic documents is still immature, and no complete and effective set of passive and active forensic methods for electronic documents has yet been formed. This makes authentication and supervision very difficult for departments such as national defense, national security, and the judiciary. Electronic document forensics extracts important information from an electronic document, such as its creator, revisers, creation time, modification times, number of revisions, and any unknown objects it contains, in order to obtain strong evidence. It thereby helps the relevant departments, authors, and individuals to verify the integrity of an electronic document and identify hidden objects in it, to combat piracy and unauthorized tampering, to counter covert communication by criminals or terrorists, and to maintain social stability.
Among office suites, Office is the most widely used, and the Office Word document type in particular. As technology and market demand have evolved, the formats and technologies used by the Office product line have changed as well. Earlier versions, such as Office 97-2003, used the compound document structure. Beginning in 2007, Microsoft's Office 2007 and Office 2010 introduced a new file format, the Office Open XML (OOXML) format. This format is entirely different from the binary format used by the earlier compound documents: documents in this format are built on XML and ZIP technology. Every Office 2007 or Office 2010 file is in fact a ZIP package made up of many parts. For example, document.xml is a main part of a Word 2007 package, and it defines all the text content of the Word document. The format can significantly reduce the size of Office files and can avoid errors that may occur during file transfer and processing, so it is expected to be widely adopted in later Office software.
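As a rough illustration of the package structure described above, the following Python sketch lists the parts of an OOXML package using only the standard-library `zipfile` module. The helper names `list_package_parts` and `build_sample_package` are hypothetical, and the in-memory sample is a stripped-down stand-in, not a document that Word would open.

```python
import io
import zipfile


def list_package_parts(docx_file):
    """Return the sorted names of all parts stored in an OOXML (ZIP) package."""
    with zipfile.ZipFile(docx_file) as package:
        return sorted(package.namelist())


def build_sample_package():
    """Build a tiny illustrative package in memory (assumed layout, not a full Word file)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", "<Relationships/>")
        package.writestr("word/document.xml", "<document/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for part_name in list_package_parts(build_sample_package()):
        print(part_name)
```

Passing the path of a real .docx file to `list_package_parts` should work the same way, since `zipfile.ZipFile` accepts both paths and file-like objects.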
Little research, in China or abroad, has addressed digital forensics for text documents, although some results exist. Digital forensics, also called computer forensics, grew up alongside the development of computers, and its main purpose is to curb increasingly rampant computer crime. Targeted research on digital forensics began only in recent years. In 2001 the first Digital Forensic Research Workshop (DFRWS), held in the United States, defined digital forensic science as follows: the use of scientifically derived and proven methods to collect, preserve, examine, validate, analyze, interpret, document, and present evidence relating to digital works, in order to facilitate the reconstruction of criminal events or to help anticipate unauthorized disruptive actions. The workshop has since been held annually, nine times so far. Many experts and scholars have discussed, at a deeper level, the meaning of digital forensics, its scope, and their respective research results.
Research on digital forensics for electronic documents is still an emerging field. Surveying the work done in China and abroad, we can divide it into two broad classes: active forensic methods and passive forensic methods. Active methods include digital watermarking and digital signatures. In practice, however, we cannot embed a digital watermark in every electronic document in advance, and a suspicious electronic document may well have been created and distributed by the suspect. What we need to study, therefore, is how to extract the desired evidence from a suspected electronic document in order to judge whether it is pirated, has been tampered with, or conceals secret information. This is the study of passive blind forensic methods for electronic documents.
Passive blind forensics works directly on the suspected electronic document and does not require a digital watermark or similar mark to be embedded in the carrier beforehand. It comprises three main classes: methods based on the document's physical structure (metadata, redundant space, data recovery, etc.), methods based on document content (unique identifiers, revision identifiers, etc.), and methods based on document properties (creator/creation time, reviser/modification time, number of revisions, etc.). Each is described below.
Forensic methods based on the document's physical structure
Every type of electronic document has a specific physical structure, which reflects the relationships among the document's internal components. For example, the physical structure of a simple Word 2007 file can be divided into three components: common parts, relationship parts, and content parts. The physical structure of a simple PDF file can be divided into four components: header, body, cross-reference table, and trailer. Bora Park et al. of Korea used the storage mechanism and relationship files of Office 2007 to detect unknown parts and unknown relationships hidden in Office 2007 documents, and thereby obtain relevant evidence. Many researchers in China and abroad have also used the physical structure of other electronic document types for information hiding. For example, Castiglione et al. of Italy proposed hiding secret information in the unused space of MS compound documents. These methods offer important guidance for our own work on digital forensics for electronic documents.
Forensic methods based on document content
Content-based forensic methods examine whether the content of an electronic document is complete and whether all or part of it has been tampered with. There is still little research of this kind, but some work offers useful guidance. For example, Simson L. Garfinkel et al. of the United States Naval Postgraduate School analyzed what the new structure of Office 2007 means for digital forensics, and proposed using the unique identifiers in the OOXML format of Office 2007 documents to infer whether a document has been tampered with.
Forensic methods based on document properties
Property-based forensic methods detect the attribute information of a suspected electronic document, such as creator/creation time, reviser/modification time, and number of revisions. For the time attributes of electronic documents, previous research mainly extracted the creation time, the creator, the last modification time, and the last reviser. Our research detects not only the time of the last modification but also digs deeper, recovering the times of all modifications from the document's creation to its last revision, and even the corresponding modifiers.
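To make the property-based approach concrete, here is a minimal sketch that reads the creator, last modifier, timestamps, and revision count from the `docProps/core.xml` part of an OOXML package. The function name `read_core_properties` and the sample values are assumptions made for illustration. Note that this covers only the last-modification data that the text attributes to earlier research, not the per-revision history the invention aims to recover.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the core-properties part of an OOXML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Result key -> element path inside docProps/core.xml.
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def read_core_properties(docx_file):
    """Return a dict of core properties; missing elements map to None."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {key: root.findtext(path, default=None, namespaces=NS)
            for key, path in FIELDS.items()}


if __name__ == "__main__":
    # Hypothetical sample values, built in memory for the demonstration.
    sample = (
        '<cp:coreProperties xmlns:cp="%(cp)s" xmlns:dc="%(dc)s" '
        'xmlns:dcterms="%(dcterms)s">'
        "<dc:creator>Alice</dc:creator>"
        "<cp:lastModifiedBy>Bob</cp:lastModifiedBy>"
        "<cp:revision>3</cp:revision>"
        "</cp:coreProperties>" % NS
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", sample)
    buffer.seek(0)
    print(read_core_properties(buffer))
```

The read is non-destructive: the package is opened in read mode, which is consistent with the requirement that evidence collection must not alter the document.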
The above analysis shows that passive blind forensics for electronic documents has attracted the interest of many researchers in China and abroad, but the results are still fragmentary, and the existing work has some major problems. For example, most of it targets electronic documents from versions before Microsoft Office 2007, that is, Office 1997-2003, which use the compound document structure. The Office 2007 or Office 2008 series of electronic documents, by contrast, use a new file format, the Office Open XML format, which is about to replace the original compound document format.
The present invention targets Office Word electronic documents in the new OOXML format and proposes an OOXML-based electronic document digital evidence collection method and device. The invention does not change any displayed content of an OOXML document and can be used to safeguard confidential information in fields such as national defense, politics, and commerce.
Summary of the invention
The object of the present invention is to provide an OOXML-based electronic document digital evidence collection method and device that can extract relevant evidence from a suspected electronic document, so that the copyright ownership of the document can be determined and confidential information detected. The evidence collection process does not change any content of the electronic document, the extracted features are highly robust, and the method can resist attacks such as Save As, Delete, Edit, and Copy.
To achieve the above object, the present invention adopts the following scheme:
An OOXML-based electronic document digital evidence collection method and device, in which the method extracts relevant evidence information from an OOXML electronic document, such as author information, tampering time information, and information about hidden secret files.
The evidence collection process of the method is carried out in the following steps:
1. First, following the Office Open XML format specification, read the main part, document.xml, from the OOXML document package, and extract from it the root node (document), the container (body), paragraphs (paragraph), text runs with their property sets (run), and plain text (text).
2. For each OOXML document package, build the corresponding internal physical structure and internal logical relationships.
3. Based on the internal storage structure and relationships, analyze how the document's internal physical space is used and the principle by which its internal components are associated with one another.
4. Using this association principle, check whether each physical object in the OOXML document package conforms to it; any object that does not conform is a suspect object.
5. Using the same approach, extract from the OOXML document its modification time information, modifying author information, revision count, and revision identifiers with their attribute values, and store them.
6. For a single editing task, check whether the attribute values of the revision identifiers within the edited region are identical; if they differ, the document has been subjected to a tampering attack.
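Step 1 of the list above can be sketched as follows. This is a minimal illustration that assumes the standard WordprocessingML namespace and walks only paragraphs that are direct children of the body; the function name `extract_runs` and the sample markup are hypothetical and are not taken from the patent.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def extract_runs(document_xml):
    """Walk document -> body -> paragraph -> run -> text.

    Returns a list of (paragraph_index, run_property_names, text) tuples.
    Only paragraphs and runs that are direct children are visited, so
    content nested in tables or hyperlinks is skipped in this sketch.
    """
    root = ET.fromstring(document_xml)
    if root.tag != f"{{{W}}}document":
        raise ValueError("root node is not w:document")
    body = root.find("w:body", NS)
    if body is None:
        raise ValueError("w:body container not found")

    runs = []
    for index, paragraph in enumerate(body.iterfind("w:p", NS)):
        for run in paragraph.iterfind("w:r", NS):
            properties = run.find("w:rPr", NS)
            # Local names of the run's formatting properties, e.g. "b" for bold.
            names = [] if properties is None else [
                child.tag.rsplit("}", 1)[-1] for child in properties
            ]
            text = "".join(t.text or "" for t in run.iterfind("w:t", NS))
            runs.append((index, names, text))
    return runs


# Hypothetical sample markup for demonstration purposes.
SAMPLE_DOCUMENT_XML = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:rPr><w:b/></w:rPr><w:t>Hello</w:t></w:r>"
    "<w:r><w:t> world</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

if __name__ == "__main__":
    for entry in extract_runs(SAMPLE_DOCUMENT_XML):
        print(entry)
```

In practice the XML string would come from reading `word/document.xml` out of the ZIP package rather than from an inline constant.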
 
Revision identifiers and their attribute values are generated in a Word 2010 document as follows:
1. After a new Word 2010 document is created, entering new text characters in it automatically generates a new text revision identifier. The attribute value of this revision identifier consists of eight hexadecimal digits, and its value is unique.
2. When an existing Word 2010 document is opened and the original text is edited (including changes to font, font size, color, etc.), a new text revision identifier is generated automatically. The attribute value of this revision identifier consists of eight hexadecimal digits, and its value is unique.
3. When an existing Word 2010 document is opened and new text is entered, a new text revision identifier is generated automatically. The attribute value of this revision identifier consists of eight hexadecimal digits, and its value is unique.
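The identifier format described above, together with the consistency check of step 6, can be sketched roughly as follows. The sketch assumes that revision identifiers appear as WordprocessingML attributes whose local names begin with `rsid` (for example `w:rsidR`), and it treats a single paragraph as the "edited region"; both are simplifying assumptions for illustration, not the patent's stated procedure. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
RSID_PATTERN = re.compile(r"^[0-9A-Fa-f]{8}$")  # eight hexadecimal digits


def collect_rsids(document_xml):
    """Return every attribute value whose local name starts with 'rsid'."""
    values = []
    for element in ET.fromstring(document_xml).iter():
        for name, value in element.attrib.items():
            if name.rsplit("}", 1)[-1].startswith("rsid"):
                values.append(value)
    return values


def is_well_formed_rsid(value):
    """True if the value consists of exactly eight hexadecimal digits."""
    return RSID_PATTERN.match(value) is not None


def paragraphs_with_mixed_rsids(document_xml):
    """Flag paragraphs whose runs carry more than one distinct rsidR value.

    A run without its own rsidR is assumed to share the paragraph's value.
    """
    flagged = []
    root = ET.fromstring(document_xml)
    for index, paragraph in enumerate(root.iter(f"{{{W}}}p")):
        default = paragraph.get(f"{{{W}}}rsidR")
        seen = {run.get(f"{{{W}}}rsidR", default)
                for run in paragraph.iter(f"{{{W}}}r")}
        seen.discard(None)
        if len(seen) > 1:
            flagged.append((index, sorted(seen)))
    return flagged


# Hypothetical sample: the second paragraph contains a run edited later.
SAMPLE_XML = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p w:rsidR="00A1B2C3"><w:r><w:t>one</w:t></w:r></w:p>'
    '<w:p w:rsidR="00A1B2C3"><w:r><w:t>two </w:t></w:r>'
    '<w:r w:rsidR="00D4E5F6"><w:t>edited</w:t></w:r></w:p>'
    "</w:body></w:document>"
)

if __name__ == "__main__":
    print(collect_rsids(SAMPLE_XML))
    print(paragraphs_with_mixed_rsids(SAMPLE_XML))
```

A flagged paragraph indicates only that text in it was produced in more than one editing session; whether that amounts to tampering depends on what the examiner knows about how the region was supposed to have been written.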
 
Advantages and positive effects of the present invention:
The present invention proposes an OOXML-based electronic document digital evidence collection method and device for files in the Office Open XML format. The method extracts relevant evidence information from an OOXML electronic document, such as author information, tampering time information, and information about hidden secret files. The invention does not change any displayed content of the electronic document, is highly robust, and can resist attacks such as Save As, Delete, Edit, and Copy.
The scheme has three main features. First, it can filter a document: by redefining the document's internal parts and relationships, it removes classified items hidden in the document. Second, it can identify the document's original author, modifying authors, modification times, number of revisions, and so on. Third, it can resist attacks such as Save As, Delete, Edit, and Copy.
 
Description of drawings
Fig. 1 shows the ZIP package composition of an OOXML file.
Fig. 2 is a schematic diagram of the electronic document digital evidence collection process.
Embodiment
To make the object and technical scheme of the invention clearer, the electronic document digital evidence collection method proposed in this embodiment is described in detail below with reference to the accompanying drawings.
The original document is a Word 2010 electronic document in the Office Open XML format, whose ZIP package file structure is shown in Fig. 1. Suppose an attacker A creates an original OOXML document named "aa.docx" and hides a secret file "bb.jpg" in it. The process of collecting digital evidence from the document with the method of the invention is shown in Fig. 2, and the specific steps are as follows:
Step 1. Following the Office Open XML format specification, read the main part, document.xml, from the OOXML document package, and extract from it the root node (document), the container (body), paragraphs (paragraph), text runs with their property sets (run), and plain text (text).
Step 2. For each OOXML document package, build the corresponding internal physical structure and internal logical relationships.
Step 3. Based on the internal storage structure and relationships, analyze how the document's internal physical space is used and the principle by which its internal components are associated with one another.
Step 4. Using this association principle, check whether each physical object in the OOXML document package conforms to it; any object that does not conform is a suspect object.
Step 5. Using the same approach, extract from the OOXML document its modification time information, modifying author information, revision count, and revision identifiers with their attribute values, and store them.
Step 6. For a single editing task, check whether the attribute values of the revision identifiers within the edited region are identical; if they differ, the document has been subjected to a tampering attack.
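For the "aa.docx" scenario above, Step 4 might look roughly like the following sketch, which treats any part that no relationship file points to as a suspect object. This is one plausible reading of the association principle, not the patent's confirmed algorithm. The location `word/media/bb.jpg`, the demo package contents, and the function names are assumptions, and a fuller check would also compare parts against `[Content_Types].xml`.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_DOC = ("http://schemas.openxmlformats.org/officeDocument/2006/"
              "relationships/officeDocument")


def source_dir(rels_name):
    """'word/_rels/document.xml.rels' -> 'word'; '_rels/.rels' -> ''."""
    return posixpath.dirname(posixpath.dirname(rels_name))


def find_unreferenced_parts(docx_file):
    """Return parts that are not the target of any internal relationship."""
    with zipfile.ZipFile(docx_file) as package:
        names = [n for n in package.namelist() if not n.endswith("/")]
        referenced = set()
        for name in names:
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            for rel in root.iter(f"{{{REL_NS}}}Relationship"):
                if rel.get("TargetMode") == "External":
                    continue  # external links do not name a package part
                target = rel.get("Target", "")
                if target.startswith("/"):
                    resolved = posixpath.normpath(target.lstrip("/"))
                else:
                    resolved = posixpath.normpath(
                        posixpath.join(source_dir(name), target))
                referenced.add(resolved)
    return sorted(
        n for n in names
        if n not in referenced
        and not n.endswith(".rels")
        and n != "[Content_Types].xml"
    )


def build_demo_package():
    """In-memory stand-in for 'aa.docx' with a hidden 'bb.jpg' (assumed location)."""
    rels = (f'<Relationships xmlns="{REL_NS}">'
            f'<Relationship Id="rId1" Type="{OFFICE_DOC}" '
            'Target="word/document.xml"/></Relationships>')
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("_rels/.rels", rels)
        package.writestr("word/document.xml", "<document/>")
        package.writestr("word/media/bb.jpg", b"hidden payload")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(find_unreferenced_parts(build_demo_package()))
```

Because the package is only read, never rewritten, the check leaves the suspected document unchanged.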
 
In summary, this embodiment of the invention proposes an OOXML-based electronic document digital evidence collection method for electronic documents in the OOXML format. The method extracts relevant evidence information from an OOXML electronic document, such as author information, tampering time information, and information about hidden secret files. The embodiment does not change any displayed content of the electronic document and can be used to safeguard confidential information in fields such as national defense, politics, and commerce.
The above is merely a preferred embodiment of the present invention, and the scope of protection of the invention is not limited to it. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention.

Claims (5)

1. An OOXML-based electronic document digital evidence collection method and device thereof, wherein the method extracts relevant evidence information from an OOXML electronic document, such as author information, tampering time information, and information about hidden secret files. The invention does not change any displayed content of the electronic document, is highly robust, and can resist attacks such as Save As, Delete, Edit, and Copy.
2. The method according to claim 1, characterized in that the evidence collection process is carried out in the following steps:
a. First, following the Office Open XML format specification, read the main part, document.xml, from the OOXML document package, and extract from it the root node (document), the container (body), paragraphs (paragraph), text runs with their property sets (run), and plain text (text).
b. For each OOXML document package, build the corresponding internal physical structure and internal logical relationships.
c. Based on the internal storage structure and relationships, analyze how the document's internal physical space is used and the principle by which its internal components are associated with one another.
d. Using this association principle, check whether each physical object in the OOXML document package conforms to it; any object that does not conform is a suspect object.
e. Using the same approach, extract from the OOXML document its modification time information, modifying author information, revision count, and revision identifiers with their attribute values, and store them.
f. For a single editing task, check whether the attribute values of the revision identifiers within the edited region are identical; if they differ, the document has been subjected to a tampering attack.
3. The method according to claim 1, characterized in that revision identifiers and their attribute values are generated as follows:
a. After a new Word 2010 document is created, entering new text characters in it automatically generates a new text revision identifier; the attribute value of this revision identifier consists of eight hexadecimal digits, and its value is unique.
b. When an existing Word 2010 document is opened and the original text is edited (including changes to font, font size, color, etc.), a new text revision identifier is generated automatically; the attribute value of this revision identifier consists of eight hexadecimal digits, and its value is unique.
c. When an existing Word 2010 document is opened and new text is entered, a new text revision identifier is generated automatically; the attribute value of this revision identifier consists of eight hexadecimal digits, and its value is unique.
4. The method according to claim 1, wherein the scheme has two main features: first, it can filter a document, removing classified items hidden in the document by redefining the document's internal parts and relationships; second, it can identify the document's original author, modifying authors, modification times, number of revisions, and so on.
5. The method according to claim 1, characterized in that the invention does not change any displayed content of the electronic document and can resist attacks such as Save As, Delete, Edit, and Copy.
CN2011100462543A 2011-02-27 2011-02-27 OOXML (office open extensible markup language)-based electronic document digital evidence collecting method and device thereof Pending CN102651057A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011100462543A CN102651057A (en) 2011-02-27 2011-02-27 OOXML (office open extensible markup language)-based electronic document digital evidence collecting method and device thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011100462543A CN102651057A (en) 2011-02-27 2011-02-27 OOXML (office open extensible markup language)-based electronic document digital evidence collecting method and device thereof

Publications (1)

Publication Number Publication Date
CN102651057A true CN102651057A (en) 2012-08-29

Family

ID=46693065

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011100462543A Pending CN102651057A (en) 2011-02-27 2011-02-27 OOXML (office open extensible markup language)-based electronic document digital evidence collecting method and device thereof

Country Status (1)

Country Link
CN (1) CN102651057A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046159A (en) * 2015-06-18 2015-11-11 中国科学院信息工程研究所 Modification identifier based OOX text document privacy information detection method
CN105117235A (en) * 2015-09-18 2015-12-02 四川效率源信息安全技术股份有限公司 Method for reorganizing Office file
CN106407820A (en) * 2016-08-31 2017-02-15 江苏中威科技软件***有限公司 Method and system for preventing document from being tampered and leaked through watermark encryption
CN109409031A (en) * 2018-10-22 2019-03-01 中国科学院信息工程研究所 A kind of PDF document privacy leakage defence method and system
CN109960608A (en) * 2017-12-26 2019-07-02 北京安天网络安全技术有限公司 The processing method and processing system of office document
CN112329062A (en) * 2020-11-06 2021-02-05 卓尔智联(武汉)研究院有限公司 Method and device for detecting hidden data and electronic equipment

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101017544A (en) * 2007-02-15 2007-08-15 江苏国盾科技实业有限责任公司 Conflated seal affix authentication method having electronic seal digital certification

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A. CASTIGLIONE et al.: "Taking advantages of a disadvantage: Digital forensics and steganography using document metadata", The Journal of Systems and Software *
BORA PARK et al.: "Data concealment and detection in Microsoft Office 2007 files", Digital Investigation *
SIMSON L. GARFINKEL et al.: "New XML-Based Files Implications for Forensics", IEEE Security & Privacy *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105046159A (en) * 2015-06-18 2015-11-11 中国科学院信息工程研究所 Modification identifier based OOX text document privacy information detection method
CN105046159B (en) * 2015-06-18 2018-04-03 中国科学院信息工程研究所 OOX text document privacy information detection methods based on modified logo symbol
CN105117235A (en) * 2015-09-18 2015-12-02 四川效率源信息安全技术股份有限公司 Method for reorganizing Office file
CN106407820A (en) * 2016-08-31 2017-02-15 江苏中威科技软件***有限公司 Method and system for preventing document from being tampered and leaked through watermark encryption
CN106407820B (en) * 2016-08-31 2019-12-10 江苏中威科技软件***有限公司 Method and system for preventing file from being tampered and leaked through watermark encryption
CN109960608A (en) * 2017-12-26 2019-07-02 北京安天网络安全技术有限公司 The processing method and processing system of office document
CN109409031A (en) * 2018-10-22 2019-03-01 中国科学院信息工程研究所 A kind of PDF document privacy leakage defence method and system
CN109409031B (en) * 2018-10-22 2021-11-09 中国科学院信息工程研究所 PDF document privacy disclosure defense method and system
CN112329062A (en) * 2020-11-06 2021-02-05 卓尔智联(武汉)研究院有限公司 Method and device for detecting hidden data and electronic equipment


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120829