CN115756964A - Coprocessor firmware parsing method and device - Google Patents

Publication number
CN115756964A
Authority
CN
China
Prior art keywords
firmware
module
partition
code
coprocessor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211469737.9A
Other languages
Chinese (zh)
Inventor
陈志锋
李清宝
张贵民
姚伟平
曹飞
焦卫华
樊亚琪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Engineering University of PLA Strategic Support Force
Original Assignee
Information Engineering University of PLA Strategic Support Force
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Engineering University of PLA Strategic Support Force
Priority to CN202211469737.9A
Publication of CN115756964A

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention belongs to the technical field of firmware reverse analysis and security, and in particular relates to a coprocessor firmware parsing method and device. The method comprises the following steps: first, reconstructing the Huffman dictionary table used by the coprocessor firmware compression algorithm by means of Gaussian elimination and heuristics; second, extracting the BIOS firmware from the target mainboard, automatically decompressing it, and extracting the coprocessor firmware from it as the firmware to be parsed; third, scanning the coprocessor firmware to obtain the firmware partition table, traversing the partition table, and extracting modules from the code partitions and data partitions respectively; and finally, traversing all extracted modules, decompressing each according to the compression algorithm it uses, and storing the decompressed results. The invention discloses a Huffman dictionary table construction method based on Gaussian elimination and heuristics, which enables decompression of coprocessor firmware modules and thereby completes the extraction and parsing of coprocessor firmware information.

Description

Coprocessor firmware parsing method and device
Technical Field
The invention belongs to the technical field of firmware reverse analysis and security, and in particular relates to a coprocessor firmware parsing method and device, mainly applied to firmware reverse analysis, firmware vulnerability mining, and the like.
Background
With the development of trusted computing technology, a CPU on an x86 platform contains several embedded microcontrollers in addition to its x86 cores, and security coprocessors are increasingly common. Most modern microprocessors now include such a processor, for example the ME (Management Engine) from Intel, the PSP (Platform Security Processor) from AMD, and the T2 processor from Apple. These processors are typically used for system initialization or to assist the main operating system at runtime with tasks such as power management. They can also act as a TPM, provide a trusted execution environment, or serve as the system trust anchor. However, neither the architecture of these coprocessors nor the firmware they run is open to the public (a coprocessor is a chip that relieves the system microprocessor of certain processing tasks). According to published research results, the Intel ME has complete access to and control over the PC: it can power the computer on and off, read open files, inspect all running programs, track keystrokes and mouse movements, and even capture screenshots. It also has a network interface that has been shown to be insecure, allowing an attacker to implant a rootkit and take control of the computer.
Most current research on the ME focuses on older versions. As processor architectures and manufacturing processes have evolved, the accompanying ME chip and firmware have changed as well, and the ME firmware has progressed from V11 to V16. Tests show that the firmware structure changed substantially in versions above V11 and differs considerably between versions, which makes reverse analysis of ME firmware much harder and more demanding. In December 2020, Intel officially released its first security white paper on Intel CSME, which systematically describes the security mechanisms and countermeasures introduced in CSME but does not disclose details such as the ME firmware. This makes handling encrypted or compressed coprocessor firmware more challenging, especially recovering the dictionaries or reference tables used for compressed content.
Disclosure of Invention
To address the problems in the prior art, the invention provides a coprocessor firmware parsing method and device. Based on a Huffman dictionary table construction method that combines Gaussian elimination with heuristics, it decompresses coprocessor firmware modules and thereby completes the extraction and parsing of coprocessor firmware information.
To achieve this purpose, the invention adopts the following technical solution:
The invention provides a coprocessor firmware parsing method, which comprises the following steps:
reconstructing the Huffman dictionary table used by the coprocessor firmware compression algorithm by means of Gaussian elimination and heuristics;
extracting the BIOS firmware from the target mainboard, automatically decompressing the BIOS firmware, and extracting the coprocessor firmware from it as the firmware to be parsed;
scanning the coprocessor firmware, obtaining the firmware partition table, traversing the firmware partition table, and extracting modules from the code partitions and the data partitions respectively;
and traversing all extracted modules, decompressing each according to the compression algorithm it uses, and storing the decompressed results.
Further, reconstructing the Huffman dictionary table used by the coprocessor firmware compression algorithm specifically comprises:
step 11, determining the length range of the codewords in the Huffman dictionary table used by the coprocessor firmware;
step 12, extracting modules of the same name that use different compression algorithms in the coprocessor firmware, calculating the hash of each module decompressed with LZMA, comparing it with the hash value of the same-name module that uses Huffman compression, and storing the modules with identical hash values in a pairing module table;
step 13, constructing, for each length in the length range of step 11, the range of codeword values at that length;
step 14, creating a system of linear equations according to the pairing module table of step 12, in which each page corresponds to one equation: the coefficients are the numbers of times each code sequence occurs in the compressed page, the unknowns are the lengths of the code values, and the constant term is the page size;
step 15, applying elementary transformations to the system of linear equations of step 14, and solving the transformed matrix by Gaussian elimination to obtain the length of each code value;
step 16, comparing against the plaintext values in the pairing module table, and obtaining the correspondence between code sequences and code values according to the length and order of the code values;
step 17, for the sequences that cannot be recovered in steps 11 to 16, determining the code value of each unknown code sequence by using a heuristic strategy;
and step 18, storing the codeword values and the corresponding code values in the Huffman dictionary table in order.
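The pairing in step 12 can be sketched as follows. This is only a minimal illustration, assuming SHA-256 as the hash (the detailed description names SHA256) and assuming that a reference digest of each Huffman-compressed module's plaintext is already available, for example from module metadata; the source does not state where that digest comes from. The function name and the dictionary layout are hypothetical.

```python
import hashlib
import lzma


def build_pairing_table(lzma_modules, huffman_digests):
    """Pair same-name modules whose plaintexts have identical SHA-256 digests.

    lzma_modules:    hypothetical dict, module name -> LZMA-compressed bytes
    huffman_digests: hypothetical dict, module name -> expected SHA-256 digest
                     of the Huffman-compressed module's plaintext (assumption)
    Returns a dict, module name -> known plaintext, for every matching pair.
    """
    pairing = {}
    for name, compressed in lzma_modules.items():
        if name not in huffman_digests:
            continue  # no same-name Huffman module to compare against
        plaintext = lzma.decompress(compressed)
        if hashlib.sha256(plaintext).digest() == huffman_digests[name]:
            pairing[name] = plaintext
    return pairing
```

Each entry of the returned table gives a known plaintext for a Huffman-compressed module, which is what the later equation-building and comparison steps rely on.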
Further, the code value of an unknown code sequence is determined in step 17 using the following heuristic strategies:
step 171, because two Huffman dictionary tables exist in the unpacker, compressing the data with each of the two tables, comparing the sizes of the compressed results, and keeping the Huffman dictionary table that occupies less space; examining different versions of the same module, searching for the same fragment packed with the other table, and recovering the unknown bytes by comparison;
step 172, searching for code sequences that occur multiple times in the same or different code and data modules, and determining the constraints on the unknown values from how the sequences change;
step 173, extracting and storing constants such as text strings or functions, together with their offsets, from all modules, obtaining the code or data segments used by modules of the same version according to the offset values, and recovering the code values corresponding to the code sequences by comparison;
step 174, analyzing the string constants of open-source libraries, and recovering the code values corresponding to text string fragments from context and source code information;
step 175, analyzing the source code of open-source libraries, locating the source text corresponding to a function, compiling the source code into a binary file, and recovering the code values corresponding to the function by comparing the binary information of the function;
step 176, comparing different versions of the same module, finding equivalent functions, and recovering the code values of the unknown module from the code values already recovered for the known module.
Further, the coprocessor firmware is scanned using a key character string as the signature to obtain the firmware partition table; the number of partitions and the offset, size, and type of each partition are extracted and stored in a partition table data structure.
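As an illustration of the signature scan, the sketch below searches a firmware image for the marker "$FPT" that the detailed description names as the keyword (ASCII bytes 0x24 0x46 0x50 0x54). It only locates candidate partition table positions; the layout of the fields that follow the marker is not given in the text and is not assumed here. The function name is hypothetical.

```python
FPT_MARKER = b"$FPT"  # ASCII bytes 0x24 0x46 0x50 0x54


def find_fpt_offsets(image):
    """Return every offset in the image at which the "$FPT" marker occurs."""
    offsets = []
    pos = image.find(FPT_MARKER)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(FPT_MARKER, pos + 1)
    return offsets
```

A caller would typically read the image with `open(path, "rb").read()` and treat each returned offset as a candidate header to validate before parsing.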
Further, all partitions in the firmware partition table are traversed and the type of each partition is determined; if the partition type is code, steps 21 to 26 are executed:
step 21, parsing the directory header information of the code partition and storing it in a partition directory table data structure;
step 22, parsing the directory data information of the code partition and extracting the structure information of each module in the partition, stopping when the number of parsed modules equals the number of modules obtained in step 21;
step 23, traversing all modules, and determining the compression algorithm type of each module from the compression algorithm field in its module structure information;
step 24, if the compression algorithm field value is None, saving the file content locally in binary form according to the module size;
step 25, if the compression algorithm field value is LZMA, extracting the file content in binary form according to the module size, then calling an LZMA decompression algorithm to decode the file content and saving the result locally;
and step 26, if the compression algorithm field value is Huffman, extracting the file content in binary form according to the module size, then calling a Huffman decompression algorithm to decode the file content and saving the result locally.
Further, the code partition directory header information includes the number of partition modules and the name of the partition; the code partition directory data information includes a compression algorithm, an offset, a compressed module size, and a decompressed module size.
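The dispatch in steps 23 to 26 can be sketched as below. The field names of `ModuleEntry` mirror the directory data listed above, but the class itself, the string encoding of the compression field, and the interpretation of the offset as relative to the partition start are assumptions made for illustration. The LZMA branch assumes the stream is in a container that Python's standard `lzma` module accepts, which may not hold for real firmware, and the Huffman decoder is passed in as a callable because it depends on the reconstructed dictionary table.

```python
import lzma
from dataclasses import dataclass


@dataclass
class ModuleEntry:
    """Hypothetical in-memory form of one code partition directory entry."""
    name: str
    compression: str        # "none", "lzma" or "huffman" (assumed encoding)
    offset: int             # assumed to be relative to the partition start
    compressed_size: int
    uncompressed_size: int


def extract_module(image, partition_offset, entry, huffman_decode):
    """Slice one module out of the image and decompress it by algorithm type."""
    start = partition_offset + entry.offset
    raw = image[start:start + entry.compressed_size]
    if entry.compression == "none":
        return raw                      # step 24: stored as-is
    if entry.compression == "lzma":
        return lzma.decompress(raw)     # step 25: standard container assumed
    if entry.compression == "huffman":
        # step 26: caller-supplied decoder built from the dictionary table
        return huffman_decode(raw, entry.uncompressed_size)
    raise ValueError(f"unknown compression type: {entry.compression!r}")
```

Passing the Huffman decoder as a parameter keeps the dispatch independent of how the dictionary table was recovered.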
Further, the Huffman decompression algorithm in step 26 comprises:
step 261, calculating the number of pages occupied by the compressed module;
step 262, building a module offset information table;
step 263, traversing the offset information table and determining whether the current item is the last item; if so, executing step 267, otherwise executing step 264;
step 264, extracting the offset parts of the current item and the next item, locating the compressed page, and taking the offset part of the next item as the page compression size;
step 265, reading the compressed module content one bit at a time and appending each bit to a character string, reducing the remaining size to be decompressed accordingly, and checking whether the current string value matches a codeword in the Huffman dictionary table; if so, extracting the code value corresponding to that codeword from the Huffman dictionary table, appending it to the decompressed file, and clearing the string; then executing step 266;
step 266, determining whether the remaining size to be decompressed is 0; if so, stopping decompression of the page and executing step 263, otherwise executing step 265;
step 267, reading the compressed module content one bit at a time and appending each bit to a character string, and checking whether the current string value matches a codeword in the Huffman dictionary table; if so, extracting the code value corresponding to that codeword from the Huffman dictionary table, appending it to the decompressed file, and clearing the string; then executing step 268;
step 268, determining whether the number of bytes occupied by the decompressed output equals the page size; if so, stopping decompression of the module, otherwise executing step 267.
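The bit-by-bit table lookup of steps 265 to 268 can be sketched as follows. This is a minimal illustration under stated assumptions: bits are consumed most significant bit first, the dictionary table is represented as a Python dict from bit strings to byte strings, and the three-entry table in the usage note is a toy example, not the real firmware table. The function name and parameters are hypothetical.

```python
def huffman_decode(data, table, max_out=None):
    """Decode a prefix-coded byte string using a codeword -> value table.

    data:    compressed bytes of one page
    table:   dict mapping bit strings such as "10" to decoded byte strings
    max_out: optional output size limit, used for the last page, which
             stops once a full page of plaintext has been produced
    """
    out = bytearray()
    bits = ""
    for byte in data:
        for shift in range(7, -1, -1):           # MSB first (assumption)
            bits += "1" if (byte >> shift) & 1 else "0"
            value = table.get(bits)
            if value is not None:                # string matches a codeword
                out += value
                bits = ""                        # clear the string
                if max_out is not None and len(out) >= max_out:
                    return bytes(out[:max_out])
    return bytes(out)
```

For example, with the toy table `{"0": b"A", "10": b"BC", "11": b"D"}`, the single byte `0x58` (bits `0 10 11 0 0 0`) decodes to `b"ABCDAAA"`, and the same call with `max_out=4` stops at `b"ABCD"`. Because no codeword is a prefix of another, the first match on the growing bit string is always the correct one.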
Further, all partitions in the firmware partition table are traversed and the type of each partition is determined; if the partition type is data, steps 31 to 38 are executed:
step 31, calculating the number of pages in the data partition, and locating the start of the first page of the data partition;
step 32, parsing the data partition directory header information and determining from the flag bytes whether the page is the last page of the partition; if so, executing step 38, otherwise extracting the page type field value and executing step 33;
step 33, determining whether the page type field value is 0; if so, setting the page type to system page and executing step 34, otherwise setting the page type to data page and executing step 35;
step 34, locating the next page according to the partition offset, and executing step 32;
step 35, calculating the number of blocks in the page and the position of the first block;
step 36, parsing the file allocation table block by block according to the block size, and determining whether a file is present; if so, recursively parsing and storing the field information according to the file type; then decrementing the number of blocks;
step 37, determining whether the number of blocks is 0; if so, locating the next page according to the offset and executing step 32, otherwise moving to the start of the next block according to the index and executing step 36;
and step 38, extracting the header information, setting the page type, and recording the parsing result.
The invention also provides a coprocessor firmware parsing device, which comprises:
a Huffman dictionary table construction module, used for reconstructing the Huffman dictionary table used by the coprocessor firmware compression algorithm by means of Gaussian elimination and heuristics;
a firmware decompression module, used for extracting the BIOS firmware from the target mainboard, automatically decompressing the BIOS firmware, and extracting the coprocessor firmware from it as the firmware to be parsed;
a partition table traversal module, used for scanning the coprocessor firmware, obtaining the firmware partition table, traversing the firmware partition table, and extracting modules from the code partitions and the data partitions respectively;
and a decompression module, used for traversing all extracted modules, decompressing each according to the compression algorithm it uses, and storing the decompressed results.
Further, the Huffman dictionary table construction module comprises:
a hash value comparison module, used for calculating the hash values of same-name modules that use different compression algorithms, where a module using the LZMA compression algorithm is decompressed first and then hashed, and for comparing the calculated hash values to obtain the pairing module table;
a Gaussian elimination module, used for creating the system of linear equations, applying elementary transformations to it, solving the matrix by Gaussian elimination to obtain the lengths of the code values, and recovering the code values of the code sequences by comparison with the pairing module table;
and a heuristic module, used for further recovering the code values of unknown code sequences.
Compared with the prior art, the invention has the following advantages:
The Huffman dictionary table construction method, which combines Gaussian elimination with heuristics, is suitable for recovering Huffman-compressed content in the latest versions of coprocessor firmware. It can automatically parse the module composition of the coprocessor firmware and restore the file content of each compressed module, providing important support for further reverse analysis and emulation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flowchart of the coprocessor firmware parsing method for partitions of type code according to an embodiment of the present invention;
FIG. 2 is a flowchart of the coprocessor firmware parsing method for partitions of type data according to an embodiment of the present invention;
FIG. 3 is a flowchart of the Huffman decompression algorithm according to an embodiment of the present invention;
FIG. 4 is a flowchart of recovering the Huffman dictionary table according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1 and FIG. 2, the present embodiment provides a firmware parsing method for an Intel coprocessor, intended to analyze the composition of the coprocessor firmware, which includes the following steps:
Step S1, reconstructing the Huffman dictionary table used by the coprocessor firmware compression algorithm.
Step S2, obtaining the coprocessor firmware.
Coprocessor firmware is typically stored together with the BIOS firmware. The firmware of the target motherboard is extracted first, and the coprocessor firmware is then obtained when that image is parsed with a firmware parsing tool (e.g., uefit).
Step S3, obtaining the firmware partition table.
The firmware binary file is scanned using the keyword "$FPT" or the hexadecimal value 0x24465054 as the signature to obtain the firmware Flash partition table; information such as the number of partitions and the offset, size, and type of each partition is extracted and stored in a partition table data structure.
Step S4, traversing all partitions in the firmware partition table and determining the type of each partition; if the partition type is code, executing steps S5 to S7, and if the partition type is data, executing steps S8 to S15.
Step S5, parsing the code partition directory header information, extracting information such as "$CPD", the number of modules in the partition, and the partition name, and storing it in a partition directory table data structure.
Step S6, parsing the code partition directory data information in units of 24 bytes, extracting for each module in the partition its module structure information, such as the compression algorithm, the offset, the compressed module size, and the decompressed module size; incrementing the module count by 1 each time the structure information of one module is extracted, and stopping and executing step S7 when the module count equals the number of partition modules extracted in step S5.
Step S7, traversing all modules, and determining the compression algorithm type of each module from the compression algorithm field in its module structure information.
If the compression algorithm field value is None, the actual position of the module in the firmware is calculated from the offset in the module structure information, and the module content is extracted from that position as a byte sequence according to the module size to form the module file. If the compression algorithm field value is LZMA, an open-source LZMA decompression algorithm is called directly: the module content is extracted from the actual position as a byte sequence according to the compressed module size and passed to the LZMA decompression algorithm, and the output of the LZMA algorithm is stored as the module file. If the compression algorithm field value is Huffman, the Huffman decompression algorithm is called to decompress the module. Step S4 is then executed.
Step S8, calculating the number of pages in the partition by dividing the partition size in the partition table data structure by 0x2000, and locating the first page according to the partition offset.
Step S9, parsing the data partition directory header information and determining whether the flag bytes are 'AA 557887'; if so, extracting the first chunk field and executing step S10; otherwise, executing step S15.
Step S10, determining whether the first chunk field value is 0; if so, setting the page type to 'system' and executing step S11; otherwise, setting the page type to 'data' and executing step S12.
Step S11, checking the CRC integrity of each chunk in the page, locating the next page according to the partition offset, and executing step S9.
Step S12, calculating the number of chunks in the page and the offset of the first chunk.
Step S13, determining from the file allocation table value whether a file exists in the current directory; if so, determining the file type and recursively parsing and storing all field information according to the defined structure; otherwise, skipping; then decrementing the number of chunks by 1 and executing step S14.
Step S14, determining whether the number of chunks equals 0; if so, locating the next page according to the partition offset and executing step S9; otherwise, jumping to the start of the next chunk according to the index and executing step S13.
Step S15, extracting the header information, setting the page type to 'Scratch', recording the parsing result, and ending the parsing.
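The page walk of steps S8 to S15 can be sketched as below. This is a rough illustration only: the text gives the page size (0x2000) and the flag bytes, but not their position in the page, their byte order, or the position and width of the first chunk field. The sketch therefore assumes the flag bytes sit at the start of the page in the order written, and takes the position of a one-byte first chunk field as a caller-supplied, hypothetical parameter. Chunk-level parsing (steps S11 to S14) is omitted.

```python
PAGE_SIZE = 0x2000
PAGE_FLAG = bytes.fromhex("AA557887")  # order as written in the text (assumption)


def page_count(partition_size):
    """Step S8: number of pages is the partition size divided by 0x2000."""
    return partition_size // PAGE_SIZE


def classify_page(page, first_chunk_pos):
    """Steps S9, S10 and S15: classify one page as system, data or scratch."""
    if page[:len(PAGE_FLAG)] != PAGE_FLAG:
        return "scratch"                 # flag bytes missing: step S15
    return "system" if page[first_chunk_pos] == 0 else "data"


def classify_partition(partition, first_chunk_pos):
    """Walk the pages in order and stop at the first scratch page."""
    kinds = []
    for index in range(page_count(len(partition))):
        page = partition[index * PAGE_SIZE:(index + 1) * PAGE_SIZE]
        kind = classify_page(page, first_chunk_pos)
        kinds.append(kind)
        if kind == "scratch":
            break                        # step S15 ends the parsing
    return kinds
```

The early exit mirrors the flow in the text, where reaching a page without the flag bytes records the result and ends the parsing of the partition.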
As shown in FIG. 4, step S1 further includes:
Step S101, analyzing the structure of the Huffman dictionary table in earlier firmware versions, and determining the length range of the codewords in the Huffman dictionary table used by newer firmware versions.
Step S102, extracting modules of the same name that use different compression algorithms in the coprocessor firmware, decompressing the modules compressed with the LZMA algorithm, calculating the SHA256 of each decompressed module, comparing it with the SHA256 of the same-name module that uses Huffman compression, and storing the modules with identical values in a pairing module table.
Step S103, constructing, for each length in the length range determined in step S101, the range (boundaries) of codeword values at that length.
Step S104, creating a system of linear equations according to the pairing module table of step S102, in which each page corresponds to one equation: the coefficients are the numbers of times each code sequence occurs in the compressed page, the unknowns are the lengths of the code values, and the constant term is 4096.
Step S105, applying elementary transformations to the system of linear equations of step S104, and solving the transformed matrix by Gaussian elimination to obtain the length of each code value.
Step S106, comparing against the plaintext values in the pairing module table, and obtaining the correspondence between code sequences and code values according to the length and order of the code values.
Step S107, for the sequences that cannot be recovered in steps S101 to S106, determining the code value of each unknown code sequence by using the following heuristic strategies.
Step S1071, because two Huffman dictionary tables exist in the unpacker, compressing the data with each of the two tables, comparing the sizes of the compressed results, and keeping the Huffman dictionary table that occupies less space; examining different versions of the same module, searching for the same fragment packed with the other table, and recovering the unknown bytes by comparison.
Step S1072, searching for code sequences that occur multiple times in the same or different code and data modules, and determining the constraints on the unknown values from how the sequences change.
Step S1073, extracting and storing constants such as text strings or functions, together with their offsets, from all modules, obtaining the code or data segments used by modules of the same version according to the offset values, and recovering the code values corresponding to the code sequences by comparison.
Step S1074, analyzing the string constants of WPA and other open-source libraries, and recovering the code values corresponding to text string fragments from context and source code information.
Step S1075, analyzing the source code of open-source libraries, locating the source text corresponding to a function, compiling the source code into a binary file, and recovering the code values corresponding to the function by comparing the binary information of the function.
Step S1076, comparing different versions of the same module, finding equivalent functions, and recovering the code values of the unknown module from the code values already recovered for the known module.
Step S108, storing the codeword values and the corresponding code values in the Huffman dictionary table in order, according to the code value lengths recovered in steps S101 to S107.
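The equation system of steps S104 and S105 can be illustrated with a small solver. Each page contributes one equation: the sum over code sequences of (occurrence count × unknown code value length) equals 4096. The sketch below is a generic Gauss-Jordan elimination over exact fractions, not the patent's own implementation; the function name is hypothetical and the occurrence counts in the usage note are made-up numbers chosen so that the lengths come out as 1, 2 and 4.

```python
from fractions import Fraction


def solve_value_lengths(counts, page_size=4096):
    """Solve counts * lengths = page_size for the lengths that are determined.

    counts: one row per page, one column per code sequence (occurrence counts)
    Returns a dict, column index -> length, for uniquely determined columns.
    """
    n = len(counts[0])
    rows = [[Fraction(c) for c in row] + [Fraction(page_size)] for row in counts]
    pivots = []
    pivot_row = 0
    for col in range(n):
        if pivot_row == len(rows):
            break
        sel = next((r for r in range(pivot_row, len(rows)) if rows[r][col] != 0), None)
        if sel is None:
            continue                              # no pivot in this column
        rows[pivot_row], rows[sel] = rows[sel], rows[pivot_row]
        pivot = rows[pivot_row][col]
        rows[pivot_row] = [x / pivot for x in rows[pivot_row]]
        for r in range(len(rows)):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivot_row])]
        pivots.append(col)
        pivot_row += 1
    solution = {}
    for i, col in enumerate(pivots):
        # a column is determined only if its row involves no other unknown
        if all(rows[i][c] == 0 for c in range(n) if c != col):
            solution[col] = rows[i][n]
    return solution
```

With the illustrative counts `[[1000, 1000, 274], [96, 2000, 0], [0, 48, 1000]]` the solver returns lengths 1, 2 and 4 for the three code sequences. With a single page and two unknowns, such as `[[2048, 1024]]`, nothing is uniquely determined and the result is empty, which is the situation the heuristic strategies of step S107 are meant to cover.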
As shown in FIG. 3, the Huffman decompression algorithm in step S7 further comprises:
Step S701, calculating the number of pages occupied by the compressed module from the decompressed size in the module structure information and the fixed size of each independent page.
Step S702, building a module offset information table: the actual position of the module in the firmware is calculated from the offset in the module structure information, and data is then read from that position in units of 4 bytes. The first 2 bytes are stored in the offset part of the offset information table, and the value of the last 2 bytes is examined: if it is 0x0040, the page is a code page and the code attribute is stored in the attribute part of the offset information table; if it is 0x00C0, the page is a data page and the data attribute is stored in the attribute part of the offset information table.
Step S703, traversing the offset information table and determining whether the current item is the last item; if so, executing step S707; otherwise, executing step S704.
Step S704, extracting the offset parts of the current item and the next item, locating the compressed page according to the offset part of the current item, and taking the offset part of the next item as the page compression size.
Step S705, reading the compressed module content one bit at a time and appending each bit to a character string, decrementing the remaining page compression size by 1, and checking whether the current string value matches a codeword in the Huffman dictionary table; if so, extracting the code value corresponding to that codeword from the Huffman dictionary table, appending it to the decompressed file, and clearing the string; then executing step S706.
Step S706, determining whether the remaining page compression size is 0; if so, stopping decompression of the page and executing step S703; otherwise, executing step S705.
Step S707, reading the compressed module content one bit at a time and appending each bit to a character string, and checking whether the current string value matches a codeword in the Huffman dictionary table; if so, extracting the code value corresponding to that codeword from the Huffman dictionary table, appending it to the decompressed file, and clearing the string; then executing step S708.
Step S708, determining whether the number of bytes occupied by the decompressed output is 4096; if so, stopping decompression of the module; otherwise, executing step S707.
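Steps S701 and S702 can be sketched as follows. The 4-byte entry size, the 2-byte offset part, the attribute values 0x0040 and 0x00C0, and the 4096-byte page size come from the text; the little-endian byte order, the function names, and the "unknown" label for any other attribute value are assumptions made for this illustration.

```python
import struct

PAGE_SIZE = 4096
ATTRIBUTES = {0x0040: "code", 0x00C0: "data"}


def compressed_page_count(uncompressed_size):
    """Step S701: pages needed to hold the decompressed module (rounded up)."""
    return (uncompressed_size + PAGE_SIZE - 1) // PAGE_SIZE


def parse_offset_table(blob, n_pages):
    """Step S702: read n_pages entries of 4 bytes each.

    Each entry is split into a 2-byte offset part and a 2-byte attribute
    part; little-endian order is assumed here.
    """
    table = []
    for index in range(n_pages):
        offset, attr = struct.unpack_from("<HH", blob, 4 * index)
        table.append((offset, ATTRIBUTES.get(attr, "unknown")))
    return table
```

The resulting list of (offset, attribute) pairs is what the traversal in steps S703 and S704 iterates over to locate each compressed page.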
Corresponding to the coprocessor firmware parsing method above, this embodiment further provides a coprocessor firmware parsing apparatus comprising a Huffman dictionary table construction module, a firmware decompression module, a partition table traversal module, and a decompression module.
The Huffman dictionary table construction module is used to recover and construct, by Gaussian elimination and a heuristic method, the Huffman dictionary table used by the coprocessor firmware compression algorithm;
the firmware decompression module is used to extract the BIOS firmware from the target mainboard, automatically decompress the BIOS firmware, and extract the coprocessor firmware from it as the firmware to be analyzed;
the partition table traversal module is used to scan the coprocessor firmware, obtain the firmware partition table, traverse the firmware partition table, and extract modules from the code partitions and the data partitions respectively;
and the decompression module is used to traverse all extracted modules, decompress each one according to the compression algorithm it uses, and store the decompressed results.
Further, the Huffman dictionary table construction module comprises a hash value comparison module, a Gaussian elimination module, and a heuristic module.
The hash value comparison module is used to calculate the hash values of modules that share a name but use different compression algorithms, where a module compressed with LZMA is decompressed first and then hashed, and to compare the calculated hash values to obtain a pairing module table;
the Gaussian elimination module is used to create a system of linear equations, apply elementary transformations to the system, solve the resulting matrix by Gaussian elimination to obtain the length of each coding value, and recover the coding values of the code sequences by reference to the pairing module table;
and the heuristic module is used to further recover the coding values of the code sequences that remain unknown.
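The pairing performed by the hash value comparison module can be sketched as follows. This is an illustrative assumption about how the comparison might be arranged: the function name `build_pairing_table`, the use of SHA-256, and the idea that a plaintext digest is available for each Huffman-compressed module are not stated in the text above.

```python
import hashlib
import lzma


def build_pairing_table(lzma_modules: dict, huffman_hashes: dict) -> dict:
    """Pair same-named modules whose plaintext digests agree.

    lzma_modules:   module name -> LZMA-compressed bytes
    huffman_hashes: module name -> hex digest recorded for the
                    Huffman-compressed module of the same name (assumed)
    Returns module name -> known plaintext for every matching pair.
    """
    pairs = {}
    for name, compressed in lzma_modules.items():
        expected = huffman_hashes.get(name)
        if expected is None:
            continue  # no same-named Huffman module to pair with
        plaintext = lzma.decompress(compressed)  # decompress first, then hash
        if hashlib.sha256(plaintext).hexdigest() == expected:
            pairs[name] = plaintext
    return pairs


# Toy usage with two modules, only one of which matches.
plain = b"hello firmware"
modules = {"kernel": lzma.compress(plain), "bup": lzma.compress(b"other")}
digests = {"kernel": hashlib.sha256(plain).hexdigest(), "bup": "00"}
pairing_table = build_pairing_table(modules, digests)
```

Each entry in the resulting table gives a known plaintext for a Huffman-compressed module, which is what makes the later recovery of coding values possible.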
The invention uses a Huffman dictionary table recovery and construction technique that combines Gaussian elimination with a heuristic method to parse coprocessor firmware automatically. It can decompress the compressed content and is applicable to newer versions of coprocessor firmware.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Finally, it should be noted that the above description is only a preferred embodiment of the present invention, intended to illustrate its technical solutions and not to limit its scope of protection. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A coprocessor firmware parsing method, comprising the steps of:
recovering and constructing, by Gaussian elimination and a heuristic method, the Huffman dictionary table used by the coprocessor firmware compression algorithm;
extracting BIOS firmware from the target mainboard, automatically decompressing the BIOS firmware, and extracting the coprocessor firmware from it as the firmware to be analyzed;
scanning the coprocessor firmware, obtaining a firmware partition table, traversing the firmware partition table, and extracting modules from the code partitions and the data partitions respectively;
and traversing all extracted modules, decompressing each one according to the compression algorithm it uses, and storing the decompressed results.
2. The coprocessor firmware parsing method according to claim 1, wherein recovering and constructing the Huffman dictionary table used by the coprocessor firmware compression algorithm specifically comprises:
step 11, determining the range of code word lengths in the Huffman dictionary table used by the coprocessor firmware;
step 12, extracting the modules in the coprocessor firmware that share a name but use different compression algorithms, calculating the hash of each module decompressed with LZMA, comparing it with the hash value of the same-named module compressed with Huffman coding, and storing the modules with identical hash values in a pairing module table;
step 13, constructing, for each length in the range from step 11, the range of code word values of that length;
step 14, creating a system of linear equations from the pairing module table of step 12, wherein each page corresponds to one equation, the coefficients are the numbers of occurrences of each code sequence in the compressed page, the unknowns are the lengths of the coding values, and the constant term is the page size;
step 15, applying elementary transformations to the system of linear equations of step 14 and solving the transformed matrix by Gaussian elimination to obtain the length of each coding value;
step 16, comparing against the plaintext values in the pairing module table and obtaining the correspondence between code sequences and coding values from the lengths and order of the coding values;
step 17, for the sequences that cannot be recovered by steps 11 to 16, determining the coding values of the unknown code sequences using a heuristic strategy;
and step 18, storing the code word values and their corresponding coding values in the Huffman dictionary table in order.
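Steps 14 and 15 amount to solving a linear system in which each compressed page contributes one equation: the occurrence counts of the code sequences, weighted by the unknown lengths of their coding values, must sum to the page size. A minimal sketch using exact rational arithmetic is shown below. The function name `solve_lengths` and the toy numbers are assumptions made for illustration, and the sketch does not check for inconsistent systems.

```python
from fractions import Fraction


def solve_lengths(counts, page_sizes):
    """Solve counts * lengths = page_sizes by Gauss-Jordan elimination.

    counts[p][i] is how often code sequence i occurs in page p and
    page_sizes[p] is the decompressed size of page p. Returns one entry
    per code sequence: its coding-value length, or None when the system
    does not determine it.
    """
    n = len(counts[0])
    rows = [[Fraction(c) for c in row] + [Fraction(b)]
            for row, b in zip(counts, page_sizes)]
    pivot_cols = []
    r = 0
    for col in range(n):
        if r == len(rows):
            break
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column: free unknown
        rows[r], rows[pivot] = rows[pivot], rows[r]
        scale = rows[r][col]
        rows[r] = [x / scale for x in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivot_cols.append(col)
        r += 1
    solution = [None] * n
    for row_idx, col in enumerate(pivot_cols):
        # An unknown is determined only if its row involves no other unknown.
        if all(rows[row_idx][c] == 0 for c in range(n) if c != col):
            solution[col] = rows[row_idx][-1]
    return solution


# Toy example: three code sequences, three pages of 16 bytes each.
lengths = solve_lengths([[4, 3, 2], [2, 1, 4], [6, 2, 2]], [16, 16, 16])
```

In this toy system the recovered lengths are 1, 2, and 3 bytes. Unknowns that remain undetermined come back as `None`, which corresponds to the sequences that step 17 hands over to the heuristic strategy.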
3. The coprocessor firmware parsing method according to claim 2, wherein the following heuristic strategies are used in step 17 to determine the coding values of the unknown code sequences:
step 171, because the unpacker contains two Huffman dictionary tables, compressing the data with both tables, comparing the sizes of the compressed data, and retaining the Huffman dictionary table that occupies less space; examining different versions of the same module, searching for the same fragment packed with the other table, and recovering the unknown bytes by comparison;
step 172, searching for code sequences that appear multiple times in the same or different code and data modules, and determining the constraints on the unknown values from how the sequences vary;
step 173, extracting and storing the text strings or function constants and offsets in all modules, obtaining from the offset values the code or data fragments used in modules of the same version, and recovering the coding values corresponding to the code sequences by comparison;
step 174, analyzing the string constants of open-source libraries, and recovering the coding values corresponding to text string fragments from context and source code information;
step 175, analyzing the source code of open-source libraries, locating the source text corresponding to a function, compiling the source code into a binary file, and recovering the coding values corresponding to the function by comparing the binary information of the function;
step 176, comparing different versions of the same module, finding equivalent functions, and recovering the coding values of the unknown module from the coding values already recovered for the known module.
4. The coprocessor firmware parsing method according to claim 2, wherein the coprocessor firmware is scanned using a key character string as a signature to obtain the firmware partition table, and the partition number, offset, size and type information of the firmware partition table are extracted and stored in a partition table data structure.
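A signature scan of this kind can be sketched as follows. Everything about the binary layout here is hypothetical: the signature `$TBL`, the 32-bit little-endian entry count that follows it, and the 16-byte entries (4-byte name, offset, size, type flag) are placeholders chosen for the example, not the real firmware format.

```python
import struct
from dataclasses import dataclass

ENTRY_FORMAT = "<4sIII"  # hypothetical entry: name, offset, size, type flag


@dataclass
class PartitionEntry:
    name: str
    offset: int
    size: int
    kind: str  # "code" or "data"


def scan_partition_table(firmware: bytes, signature: bytes = b"$TBL"):
    """Locate the partition table by its signature and parse its entries."""
    start = firmware.find(signature)
    if start < 0:
        return []  # signature not present in this image
    pos = start + len(signature)
    (count,) = struct.unpack_from("<I", firmware, pos)
    pos += 4
    entries = []
    for _ in range(count):
        raw_name, offset, size, flag = struct.unpack_from(ENTRY_FORMAT, firmware, pos)
        name = raw_name.rstrip(b"\x00").decode("ascii")
        # Assumption for the sketch: flag 0 marks a code partition.
        entries.append(PartitionEntry(name, offset, size, "code" if flag == 0 else "data"))
        pos += struct.calcsize(ENTRY_FORMAT)
    return entries


# Toy image: padding, signature, entry count, then two entries.
image = (b"\xff" * 8 + b"$TBL" + struct.pack("<I", 2)
         + struct.pack(ENTRY_FORMAT, b"FTPR", 0x1000, 0x2000, 0)
         + struct.pack(ENTRY_FORMAT, b"MFS\x00", 0x3000, 0x400, 1))
partitions = scan_partition_table(image)
```

The partition count is simply the length of the returned list, and the `kind` field is what a later traversal would branch on to choose between code partition and data partition handling.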
5. The coprocessor firmware parsing method according to claim 4, wherein all partitions in the firmware partition table are traversed and the partition type is determined, and if the partition type is code, steps 21 to 26 are performed:
step 21, parsing the header information of the code partition directory and storing it in a partition directory table data structure;
step 22, parsing the directory data information of the code partition and extracting the structure information of each module in the partition, stopping when the number of parsed modules equals the number of modules obtained in step 21;
step 23, traversing all modules and determining the type of compression algorithm from the compression algorithm field value in the module structure information;
step 24, if the compression algorithm field value is None, saving the file content locally in binary form according to the module size;
step 25, if the compression algorithm field value is LZMA, extracting the file content in binary form according to the module size, then calling an LZMA decompression algorithm to parse the file content and saving it locally;
and step 26, if the compression algorithm field value is Huffman, extracting the file content in binary form according to the module size, then calling a Huffman decompression algorithm to parse the file content and saving it locally.
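The dispatch in steps 23 to 26 can be sketched as a small function that selects a decoder from the compression algorithm field. The function name `extract_module`, the string field values, and the `huffman_decode` callback are assumptions for illustration. Real firmware LZMA streams may also need explicit format or filter parameters, which this sketch does not handle.

```python
import lzma


def extract_module(blob: bytes, algorithm: str, huffman_decode=None) -> bytes:
    """Return the plaintext of one module according to its compression field."""
    if algorithm == "None":
        return bytes(blob)            # stored uncompressed: copy as is
    if algorithm == "LZMA":
        return lzma.decompress(blob)  # auto-detects .xz and legacy .lzma containers
    if algorithm == "Huffman":
        if huffman_decode is None:
            raise ValueError("a Huffman decoder is required for this module")
        return huffman_decode(blob)   # caller supplies the dictionary-based decoder
    raise ValueError(f"unknown compression algorithm: {algorithm!r}")


# Toy usage covering the three branches.
raw = extract_module(b"\x01\x02", "None")
unpacked = extract_module(lzma.compress(b"payload"), "LZMA")
decoded = extract_module(b"ab", "Huffman", huffman_decode=lambda b: b.upper())
```

The returned bytes could then be written out with, for example, `pathlib.Path(name).write_bytes(data)`, which corresponds to saving the content locally in each branch.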
6. The coprocessor firmware parsing method of claim 5, wherein the code partition directory header information comprises a partition module number and a partition name; the code partition directory data information includes a compression algorithm, an offset, a compressed module size, and a decompressed module size.
7. The coprocessor firmware parsing method of claim 5, wherein the Huffman decompression algorithm in step 26 further comprises:
step 261, calculating the number of pages occupied by the compressed module;
step 262, establishing a module offset information table;
step 263, traversing the offset table and determining whether the current item is the last item; if so, executing step 267; otherwise, executing step 264;
step 264, extracting the offset information of the current item and of the next item, locating the compressed page, and taking the offset information of the next item as the page compression size;
step 265, reading the content of the compressed module one bit at a time, appending each bit to a character string and reducing the size of the module to be decompressed by one bit; determining whether the current string matches a code word in the Huffman dictionary table, and if so, looking up the coding value corresponding to that code word in the Huffman dictionary table, appending it to the decompressed file, and clearing the string; then executing step 266;
step 266, determining whether the size of the module to be decompressed is 0; if so, stopping decompression of the page and executing step 263; otherwise, executing step 265;
step 267, reading the content of the compressed module one bit at a time and appending each bit to a character string; determining whether the current string matches a code word in the Huffman dictionary table, and if so, looking up the coding value corresponding to that code word in the Huffman dictionary table, appending it to the decompressed file, and clearing the string; then executing step 268;
step 268, determining whether the number of bytes occupied by the decompressed output equals the page size value; if so, stopping decompression of the module; otherwise, executing step 267.
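The page loop in steps 263 to 268 treats the last page differently from the others: ordinary pages are decoded until their compressed bytes run out, while the last page is decoded until a full page of output has been produced. The sketch below shows only that control flow. The name `decompress_module`, the interpretation of the offsets as byte positions within the module (so that a page's compressed extent runs up to the next offset), and the `decode_page(chunk, max_out)` callback signature are assumptions.

```python
def decompress_module(blob: bytes, offsets, decode_page, page_size: int = 4096) -> bytes:
    """Decompress a module page by page using its offset table.

    offsets[i] is assumed to be the byte position of compressed page i in
    blob. decode_page(chunk, max_out) is a caller-supplied page decoder;
    max_out is None when the whole chunk should be consumed.
    """
    out = bytearray()
    for i, start in enumerate(offsets):
        if i < len(offsets) - 1:
            # Not the last item: the page ends where the next one begins,
            # so decode until the compressed bytes are exhausted.
            out += decode_page(blob[start:offsets[i + 1]], None)
        else:
            # Last item: no following offset, so decode until one full
            # page of output has been produced.
            out += decode_page(blob[start:], page_size)
    return bytes(out)


def toy_decode(chunk: bytes, max_out):
    """Stand-in decoder for the example: doubles every input byte."""
    expanded = b"".join(bytes([b, b]) for b in chunk)
    return expanded if max_out is None else expanded[:max_out]


result = decompress_module(b"abcdXX", [0, 2], toy_decode, page_size=4)
```

In the toy run the trailing `XX` bytes stand in for padding after the last page; they are ignored because decoding of the last page stops at the page size.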
8. The coprocessor firmware parsing method according to claim 4, wherein all partitions in the firmware partition table are traversed and the partition type is determined, and if the partition type is data, steps 31 to 38 are performed:
step 31, calculating the number of pages in the data partition and locating the start of the first page of the data partition;
step 32, parsing the data partition directory header information and determining from the flag byte whether the page is the last page of the partition; if so, executing step 38; otherwise, extracting the page type field value and executing step 33;
step 33, determining whether the page type field value is 0; if so, setting the page type to system page and executing step 34; otherwise, setting the page type to data page and executing step 35;
step 34, locating the next page according to the partition offset and executing step 32;
step 35, calculating the number of blocks in the page and the position of the first block;
step 36, parsing the file allocation table sequentially according to the block size and determining whether a file is present; if so, recursively parsing and storing the field information according to the file type; and decrementing the number of blocks;
step 37, locating the next page or block: determining whether the number of blocks is 0; if so, locating the next page according to the offset and executing step 32; otherwise, moving to the start of the next block according to the index and executing step 36;
and step 38, extracting the header information, setting the page type, and recording the parsing result.
9. A coprocessor firmware parsing device, comprising:
a Huffman dictionary table construction module, used to recover and construct, by Gaussian elimination and a heuristic method, the Huffman dictionary table used by the coprocessor firmware compression algorithm;
a firmware decompression module, used to extract the BIOS firmware from the target mainboard, automatically decompress the BIOS firmware, and extract the coprocessor firmware from it as the firmware to be analyzed;
a partition table traversal module, used to scan the coprocessor firmware, obtain the firmware partition table, traverse the firmware partition table, and extract modules from the code partitions and the data partitions respectively;
and a decompression module, used to traverse all extracted modules, decompress each one according to the compression algorithm it uses, and store the decompressed results.
10. The coprocessor firmware parsing device of claim 9, wherein the Huffman dictionary table construction module comprises:
a hash value comparison module, used to calculate the hash values of modules that share a name but use different compression algorithms, where a module compressed with LZMA is decompressed first and then hashed, and to compare the calculated hash values to obtain a pairing module table;
a Gaussian elimination module, used to create a system of linear equations, apply elementary transformations to the system, solve the resulting matrix by Gaussian elimination to obtain the length of each coding value, and recover the coding values of the code sequences by reference to the pairing module table;
and a heuristic module, used to further recover the coding values of the code sequences that remain unknown.
CN202211469737.9A 2022-11-22 2022-11-22 Coprocessor firmware parsing method and device Pending CN115756964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211469737.9A CN115756964A (en) 2022-11-22 2022-11-22 Coprocessor firmware parsing method and device

Publications (1)

Publication Number Publication Date
CN115756964A true CN115756964A (en) 2023-03-07

Family

ID=85335419

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211469737.9A Pending CN115756964A (en) 2022-11-22 2022-11-22 Coprocessor firmware parsing method and device

Country Status (1)

Country Link
CN (1) CN115756964A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116578542A (en) * 2023-07-13 2023-08-11 鹏钛存储技术(南京)有限公司 Hardware implementation method and system of self-adaptive compression algorithm based on configurable logic
CN116578542B (en) * 2023-07-13 2023-09-29 鹏钛存储技术(南京)有限公司 Hardware implementation method and system of self-adaptive compression algorithm based on configurable logic
CN116775544A (en) * 2023-08-23 2023-09-19 上海芯联芯智能科技有限公司 Coprocessor and computer equipment
CN116775544B (en) * 2023-08-23 2023-11-28 上海芯联芯智能科技有限公司 Coprocessor and computer equipment

Similar Documents

Publication Publication Date Title
CN115756964A (en) Coprocessor firmware parsing method and device
US9135289B2 (en) Matching transactions in multi-level records
US20200411138A1 (en) Compressing, storing and searching sequence data
EP2446363B1 (en) Algorithm for generating a digital DNA sequence
JP6596102B2 (en) Lossless data loss by deriving data from basic data elements present in content-associative sheaves
US8224641B2 (en) Language identification for documents containing multiple languages
US10747880B2 (en) System and method for identifying and comparing code by semantic abstractions
CA2876466A1 (en) Scan optimization using bloom filter synopsis
CN108845843B (en) Function processing method and device and related equipment
Burtscher et al. Compressing extended program traces using value predictors
CN117940894A (en) System and method for detecting code clones
EP1590732A2 (en) Method and apparatus for instruction compression
CN111444411A (en) Network data increment acquisition method, device, equipment and storage medium
AU2017248417B2 (en) Fuzzy hash algorithm
Pungila Improved file-carving through data-parallel pattern matching for data forensics
Fariña et al. On the reproducibility of experiments of indexing repetitive document collections
Garfinkel et al. Sharpening your tools: Updating bulk_extractor for the 2020s
CN113721928A (en) Binary analysis-based dynamic library clipping method
CN115904486A (en) Code similarity detection method and device
Karcioglu et al. q‐frame hash comparison based exact string matching algorithms for DNA sequences
CN112202822B (en) Database injection detection method and device, electronic equipment and storage medium
CN102930208A (en) Method and system for processing files affected by virus
CN117473494A (en) Method and device for determining homologous binary files, electronic equipment and storage medium
EP2795488B1 (en) Compressing, storing and searching sequence data
CN113553587A (en) File detection method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination