TWI765426B - Character-generating apparatus, character-generating method and computer program thereof for building test data - Google Patents
Character-generating apparatus, character-generating method and computer program thereof for building test data
- Publication number
- TWI765426B (application number TW109141188A)
- Authority
- TW
- Taiwan
- Prior art keywords
- character
- weight values
- predicted
- current output
- characters
- Prior art date
Links
- 238000012360 testing method Methods 0.000 title claims abstract description 83
- 238000000034 method Methods 0.000 title claims abstract description 58
- 238000004590 computer program Methods 0.000 title claims abstract description 17
- 238000003860 storage Methods 0.000 claims description 20
- 238000012549 training Methods 0.000 claims description 13
- 238000004891 communication Methods 0.000 claims description 9
- 238000013528 artificial neural network Methods 0.000 claims description 7
- 238000009826 distribution Methods 0.000 claims description 5
- 238000005070 sampling Methods 0.000 claims description 5
- 230000015654 memory Effects 0.000 description 15
- 238000010276 construction Methods 0.000 description 11
- 230000000694 effects Effects 0.000 description 5
- 238000013136 deep learning model Methods 0.000 description 4
- 238000012546 transfer Methods 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 3
- 230000006870 function Effects 0.000 description 3
- 238000001914 filtration Methods 0.000 description 2
- 238000004519 manufacturing process Methods 0.000 description 2
- 230000006403 short-term memory Effects 0.000 description 2
- 238000013135 deep learning Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 238000012545 processing Methods 0.000 description 1
Landscapes
- Debugging And Monitoring (AREA)
Abstract
Description
Embodiments of the present invention relate to a character generation apparatus, a character generation method, and a computer program product thereof. More specifically, embodiments of the present invention relate to a character generation apparatus, a character generation method, and a computer program product thereof for constructing test data.
Fuzz testing is a detection method for automatically discovering security vulnerabilities in an object under test (for example, software running on production machinery on a production line). It sends test data to the device under test in order to trigger errors in that device, thereby locating the security vulnerabilities it contains. Traditional text-based test data generation techniques analyze the packets of the device under test with a deep-learning model to learn the packet format of the protocol the device uses, and then generate, character by character, test data in a format the device can accept. In this situation, if the deep-learning model is poorly trained, the generated test data is often rejected by the device under test because its format does not conform to the protocol. Conversely, if the model is well trained, its high imitation accuracy for the packet format may make the generated test data too normal (that is, incapable of triggering errors in the device under test), so the data fails to meet the needs of fuzz testing. In other words, for traditional test data generation techniques, even when the deep-learning model is trained very well, the effectiveness of the generated test data remains insufficient for fuzz testing. In view of this, the technical field to which the present invention pertains urgently needs a method that can generate effective test data for fuzz testing character by character.
To solve at least the above problems, embodiments of the present invention provide a character generation apparatus for constructing test data. The character generation apparatus may include a storage and a processor electrically connected with the storage. The storage may be used to store a character prediction model and a character set, and the character set may include a plurality of characters. The processor may be used to run the character prediction model to generate a character prediction vector according to a current output string, and the character prediction vector may include a plurality of weight values corresponding to the characters. The processor may also be used to determine a predicted character from the character set according to the character prediction vector, and to determine whether the weight value corresponding to the predicted character among the weight values is less than a threshold. If the weight value corresponding to the predicted character is less than the threshold, the processor may invert the weight values and select a character from the character set as an output character according to the inverted weight values. If the weight value corresponding to the predicted character is not less than the threshold, the processor may use the predicted character as the output character.
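The selection rule in this summary can be sketched as follows. This is an illustrative Python sketch under assumptions, not the claimed implementation: `weights` stands for the character prediction vector, `charset` for the character set, and the inversion subtracts each weight from a fixed constant (here the sum of the maximum and minimum), one example of an order-swapping operation.

```python
import random

def select_output_char(weights, charset, threshold, rng=random):
    """One selection round: keep the predicted character when its weight
    reaches the threshold; otherwise invert the weights and sample."""
    predicted = max(range(len(weights)), key=lambda i: weights[i])
    if weights[predicted] >= threshold:
        return charset[predicted]   # likely a fixed, format-bearing character
    # Inversion: subtract each weight from (max + min) so the ordering flips,
    # giving originally low-weight characters a higher chance of selection.
    hi, lo = max(weights), min(weights)
    inverted = [hi + lo - w for w in weights]
    return rng.choices(charset, weights=inverted)[0]
```

With a high threshold the originally most likely character becomes the least likely draw, which is exactly the mutation effect the summary describes.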
To solve at least the above problems, embodiments of the present invention also provide a character generation method for constructing test data. The character generation method may be executed by an electronic computing device. The electronic computing device may store a character prediction model and a character set, and the character set includes a plurality of characters. The character generation method may include the following steps: running the character prediction model to generate a character prediction vector according to a current output string, wherein the character prediction vector includes a plurality of weight values corresponding to the characters; determining a predicted character from the character set according to the character prediction vector; when it is determined that the weight value corresponding to the predicted character among the weight values is less than a threshold, inverting the weight values and selecting a character from the character set as an output character according to the inverted weight values; and when it is determined that the weight value corresponding to the predicted character among the weight values is not less than the threshold, using the predicted character as the output character.
To solve at least the above problems, embodiments of the present invention also provide a computer program product. After loading a plurality of program instructions included in the computer program product, an electronic computing device can execute a character generation method for constructing test data. The electronic computing device may store a character prediction model and a character set, and the character set may include a plurality of characters. The character generation method includes the following steps: running the character prediction model to generate the character prediction vector according to a current output string, wherein the character prediction vector includes a plurality of weight values corresponding to the characters; determining a predicted character from the character set according to the character prediction vector; when it is determined that the weight value corresponding to the predicted character among the weight values is less than a threshold, inverting the weight values and selecting a character from the character set as an output character according to the inverted weight values; and when it is determined that the weight value corresponding to the predicted character among the weight values is not less than the threshold, using the predicted character as the output character.
To sum up, the character generation method provided by the present invention determines whether the predicted character is one that may be changed by judging whether the weight value corresponding to the predicted character in the character prediction vector is less than a specified threshold. If the weight value of the predicted character is not less than the threshold, the predicted character appears relatively frequently in the protocol format adopted by the device under test and may be a fixed character in the format (that is, mutating it would make the test data format non-conforming, as with each character in a string such as "http://"), so the predicted character can be output directly. Conversely, if the weight value of the predicted character is less than the threshold, the predicted character appears relatively infrequently in the protocol format adopted by the device under test and may be a mutable character, so the weight values can be inverted to increase the output probability of the other characters in the character set that originally had lower weight values, thereby achieving the effect of mutation. Accordingly, the test data constructed by the character generation method provided by the present invention not only conforms to the protocol format but also has the test-data variability that fuzz testing requires in order to trigger errors in the device under test. Therefore, the test data generation method provided by the present invention effectively solves the above-mentioned problems of traditional text-based test data generation techniques.
The above content is not intended to limit the present invention; it merely outlines the technical problems the present invention can solve, the technical means it can adopt, and the technical effects it can achieve, so that those of ordinary skill in the art to which the present invention pertains can gain a preliminary understanding of the present invention. Those of ordinary skill in the art can further understand the details of the various embodiments of the present invention from the attached drawings and the content described in the following embodiments.
The present invention will be described below through various embodiments, but these embodiments are not intended to limit the present invention to being implemented only according to the described operations, environments, applications, structures, flows, or steps. For ease of description, content not directly related to the embodiments of the present invention, or content that can be understood without special explanation, is omitted from the text and drawings. In the drawings, the sizes of the elements and the ratios between them are merely examples and are not intended to limit the present invention. Unless otherwise specified, in the following content the same (or similar) reference symbols may correspond to the same (or similar) elements. Where implementable, the number of each element described below may be one or more, unless otherwise specified.
The terms used herein are only for describing the embodiments and are not intended to limit the protection of the present invention. Unless the context clearly dictates otherwise, the singular form "a" is also intended to include the plural form. Terms such as "comprise" and "include" indicate the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence of one or more other features, integers, steps, operations, elements, components, or combinations of the foregoing. The term "and/or" includes any and all combinations of one or more of the associated listed items.
FIG. 1 illustrates a character generation apparatus according to one or more embodiments of the present invention. The content shown in FIG. 1 is intended to illustrate embodiments of the present invention, not to limit its scope of protection.
Referring to FIG. 1, a character generation apparatus 11 for constructing test data may basically include a storage 111 and a processor 112, and storage 111 may be electrically connected with processor 112. The electrical connection between storage 111 and processor 112 may be direct (that is, connected to each other without other elements in between) or indirect (that is, connected to each other through other elements). Character generation apparatus 11 may be any of various types of electronic computing devices, such as a desktop computer, a portable computer, a mobile phone, or a portable electronic accessory (glasses, a watch, and so on). Each round of operation of character generation apparatus 11 can generate one character, thereby constructing a piece of test data character by character; the specific character generation process will be detailed later.
In some embodiments, since character generation apparatus 11 can be used to construct test data, it may be included in a fuzz testing apparatus 100 used for fuzz testing, and fuzz testing apparatus 100 may directly use the test data constructed by character generation apparatus 11 to test the device under test. In some other embodiments, character generation apparatus 11 may further include a transceiver 113 electrically connected with processor 112 and storage 111, and in this case character generation apparatus 11 may instead communicate with a fuzz testing apparatus 101 through transceiver 113 to transmit the constructed test data to fuzz testing apparatus 101. Fuzz testing apparatus 100 and fuzz testing apparatus 101 may likewise be various types of electronic computing devices. In some other embodiments, character generation apparatus 11 is itself fuzz testing apparatus 100; that is, in addition to constructing test data character by character, character generation apparatus 11 may also use the constructed test data to test the device under test through the joint operation of storage 111, processor 112, and transceiver 113.
Transceiver 113 may be used for wired or wireless communication with an external device (for example, the aforementioned fuzz testing apparatus 101). In some embodiments, transceiver 113 may include a transmitter and a receiver. Taking wireless communication as an example, the transceiver may include, but is not limited to, communication elements such as an antenna, an amplifier, a modulator, a demodulator, a detector, an analog-to-digital converter, and a digital-to-analog converter. Taking wired communication as an example, the transceiver may be, for example but not limited to, a gigabit Ethernet transceiver, a gigabit interface converter (GBIC), a small form-factor pluggable (SFP) transceiver, or a ten gigabit small form-factor pluggable (XFP) transceiver.
Storage 111 may be used to store data generated by character generation apparatus 11, data transmitted from external devices, or data input by the user. Storage 111 may include a first-level memory (also called main memory or internal memory), and processor 112 can directly read the instruction sets stored in the first-level memory and execute them when needed. Storage 111 may optionally include a second-level memory (also called external memory or auxiliary memory), which can transmit stored data to the first-level memory through a data buffer. The second-level memory may be, for example but not limited to, a hard disk, an optical disc, and so on. Storage 111 may optionally include a third-level memory, that is, a storage device that can be directly inserted into or removed from a computer, such as a portable hard drive.
Storage 111 may store at least a character prediction model M1 and a character set C1. Character set C1 may be a set formed of a plurality of characters, and the characters may be, for example but not limited to, letters of various languages, digits, spaces, full-width symbols, half-width symbols, and so on. Character prediction model M1 may be used to predict, according to an input string or an input character, the next character likely to follow that input string or input character, and to output a character prediction vector; the character prediction vector may include a plurality of weight values for the likelihood that each of a plurality of characters (for example, the characters in character set C1) appears after the input string or input character. Character prediction model M1 may be a machine learning model based on a deep neural network architecture.
Processor 112 may be a microprocessor or a microcontroller with signal processing functions. A microprocessor or microcontroller is a special programmable integrated circuit with capabilities such as computation, storage, and input/output; it can accept and process various coded instructions to perform various logical and arithmetic operations and output the corresponding results. Processor 112 may be programmed to interpret various instructions, to process data in character generation apparatus 11, and to execute various computing procedures or programs.
FIG. 2A and FIG. 2B illustrate a test data construction flow according to one or more embodiments of the present invention. The content shown in FIG. 2A and FIG. 2B is intended to illustrate embodiments of the present invention, not to limit its scope of protection.
Referring concurrently to FIG. 1, FIG. 2A, and FIG. 2B, processor 112 may generate characters by executing a test data construction flow 2 in cooperation with storage 111 (and, in some embodiments, transceiver 113), thereby constructing a piece of test data. Test data construction flow 2 may include a plurality of actions 201 to 210. First, in action 201, processor 112 may analyze the content of a plurality of packets PK1 and PK2 used by a device under test to build character set C1. Specifically, packets PK1 and PK2 are packets received/sent by the device under test, so they conform to the text-based communication protocol adopted by the device under test (for example, but not limited to, the HyperText Transfer Protocol (HTTP), the File Transfer Protocol (FTP), and so on). Processor 112 may perform application layer filtering on each of packets PK1 and PK2 to extract the plurality of characters used therein, and may build character set C1 accordingly. In some embodiments, transceiver 113 may receive packets PK1 and PK2 from the outside (for example, from the device under test itself or from other devices communicating with the device under test). It should be noted that the description of two packets PK1 and PK2 in this disclosure is merely illustrative rather than limiting; those of ordinary skill in the art will appreciate that the number of packets is not limited to two.
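Action 201 can be sketched as below. This is a minimal illustration under the assumption that the application-layer payloads of PK1 and PK2 have already been extracted as text; the function and payload strings are hypothetical.

```python
def build_charset(payloads):
    """Collect the distinct characters appearing across the captured
    text-based payloads; sorting gives each character a stable index
    for the fields of the character prediction vector."""
    chars = set()
    for payload in payloads:
        chars.update(payload)
    return sorted(chars)

# Two HTTP-like payload fragments standing in for packets PK1 and PK2.
c1 = build_charset(["GET / HTTP/1.1", "Host: localhost"])
```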
In action 202, processor 112 may train a deep neural network according to packets PK1 and PK2 and character set C1 to build character prediction model M1. Specifically, the deep neural network may be a multi-layer Long Short-Term Memory (LSTM) network, so character prediction model M1 is a sequence-to-sequence (seq2seq) model. The specific way of training a deep neural network with packets containing numerous characters to obtain a character prediction model is well known in the art and is therefore not elaborated here.
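The patent trains a multi-layer LSTM for M1; the sketch below is only a loudly-labeled stand-in that exposes the same interface (current output string in, one weight per charset character out) using bigram frequency counts, so the surrounding flow can be illustrated without a deep-learning framework. All names are illustrative assumptions.

```python
from collections import Counter, defaultdict

def train_char_predictor(packet_texts, charset):
    """Stand-in for building model M1: count which character follows which
    in the captured packet text, then return a predictor mapping the current
    output string to one weight per character in `charset`.
    (The patent itself uses a multi-layer LSTM seq2seq model in this role.)"""
    follow_counts = defaultdict(Counter)
    for text in packet_texts:
        for prev, nxt in zip(text, text[1:]):
            follow_counts[prev][nxt] += 1

    def predict(current_string):
        follow = follow_counts[current_string[-1]]
        total = sum(follow.values()) or 1
        return [follow[ch] / total for ch in charset]

    return predict
```

A real M1 would condition on the whole string rather than just its last character; this stand-in only conditions on the final character, which is enough to exercise the rest of flow 2.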
In some embodiments, packets PK1 and PK2 may be included in a network traffic trace file of the device under test, such as but not limited to a ".pcap" file. In this case, training character prediction model M1 need not involve the internal application state of the device under test, and character generation apparatus 11 can generate the characters for constructing test data based solely on the network traffic trace of the device under test, without accessing the execution trace or other sensitive data of the device under test, thereby ensuring that the information security of the device under test is not put at risk by test data construction flow 2 provided by the present invention.
After the training of character prediction model M1 is completed, in action 203 processor 112 may run character prediction model M1 to generate a character prediction vector according to a current output string. Specifically, the current output string is the string output after test data construction flow 2 ends, and it may serve as a piece of test data for the device under test. The character prediction vector may include a plurality of fields; each field may correspond to one character in character set C1 and may store a weight value corresponding to that character. In other words, character prediction model M1 may determine, according to the current output string, the likelihood that each character in character set C1 appears next after the current output string, and express it in the form of a weight value in each field of the character prediction vector.
At the start of test data construction flow 2, the current output string may include a seed character determined according to the textual content of packets PK1 and PK2, and processor 112 may then generate the other characters one by one starting from that seed character (for example, starting from "G" and generating characters such as "E", "T", a space, and so on), finally constructing a complete piece of test data once test data construction flow 2 is finished. For example, the complete test data may be as follows:
GET http://local host· 8080/tiendal/miembros/jmageoes/zacauz.jpg HTTP/1.1
User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.8 (like Gecko)
Pragma: no-cache
Cache-control: no-cache
Accept: text/xml,application/xml,application/xhtml+xml,text/ html;q=47xtext/html; q=0.9, text/plain; q=0.8,image/pn; q=0.5
Accept-Encoding: x-gzip, x-deflate, gzip, deflate
Accept-Charset: utf-8, utf-8;q=0.5, * ;q=0.5
Accept - Language: en
Host: localhost:8080
Cookie: JSIONID=D13O48O90gvhunn&email=wiss%40ngrzubote
Arasticsoos.govi&eo=7emiete&email=gulloy.ya)gma: ap,he-control: no-cache
Accept: text/xml,application/xml,application:8T808080/tienda1/publico/entrar.jsp HTTP/1.1
User-Agent: Moz illa/5.0 (compatible; Konqueror/3 . 5; Linux ) KHTML/3.5.8 (like Gecko)
Pragma: no - cache
Cache-control: no-cache
Accept:text/xml,application/ xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.B,image/ png, * / */*;q=0.5
Accept-Encoding: x-gzip, x-deflate, gzip, de~late
Accept-Charset: utf-8, utf-8;q=0.5, * ;q=0.5
Accept-Language: en
Host: localhost:8080CC994E10{compatible; Konqueror/3 .5; Linux) KHTML/3.5.8 (like Gecko)
Pragma: no - cache
Cache-control: no-cache
Accept: text/xml,application/xml,applicat ion/xhtml+xml,tex t /html;q=0.9, text/plain;q=0.B,image/png, * / * ;q=0.5
Accept-Encoding: x-gzip, x-deflate, gzip, de~late
Accept-Charset: utf-8, utf-8;q=0.5, * ;q=0.5
Accept-Language: en
Host: localhost:8080,image/png, */*;q=0.5
Accept-Encoding: x-gzip, x - deflate, gzip, deflate
Accept-Charset: utf-8, utf-8;q=0.5, * ;q=760O4183EFA38loccept-Encoding: x-gzip, x-deflate, gzip, deflate
Accept-Chelil+alccept : text/xml,application/xml,application/pae: JSESSIONID=2037082C55E238447DFCED79CA44O82C
For ease of description, assume that character set C1 includes fifty-two characters, the uppercase and lowercase forms of all English letters, and that in the initial state the current output string includes only the seed character "O". Processor 112 may then input the current output string into character prediction model M1. According to the current output string "O", character prediction model M1 may output the character prediction vector containing fifty-two fields, each field storing a weight value corresponding to one character in character set C1.
Next, in action 204, processor 112 may determine a predicted character according to the character prediction vector. Specifically, processor 112 may find the highest among the plurality of weight values included in the character prediction vector and determine the character in character set C1 corresponding to that highest value as the predicted character. For example, in character set C1, the character corresponding to the highest weight value in the character prediction vector may be "N", so processor 112 may determine the character "N" as the predicted character that character prediction model M1 predicts is most likely to follow the current output string "O".
Then, in action 205, processor 112 may determine whether the weight value corresponding to the predicted character is less than a threshold. Since processor 112 has analyzed packets PK1 and PK2, the threshold may be defined by processor 112 according to the character-usage characteristics of the protocol adopted by the device under test. As mentioned above, whether the weight value corresponding to the predicted character is less than the threshold can be used to determine whether the predicted character is one that may be changed. If the weight value of the predicted character is less than the threshold, the predicted character appears relatively infrequently in the protocol format adopted by the device under test and may be a character that allows mutation, so it may be changed to another character in character set C1. In view of this, if the weight value (for example, "12") of the predicted character (for example, "N") is less than the threshold (for example, "20"), then in action 206 processor 112 may invert the weight values in the character prediction vector to increase the output probability of the other characters in character set C1 that originally had lower weight values. Specifically, "inverting" may refer to any operation that swaps the relative magnitudes of a plurality of values, for example: subtracting each weight value from the same real number and replacing the original weight value with the difference, so that originally lower weight values are replaced with relatively higher ones. In some embodiments, before inverting the weight values, processor 112 may first normalize them (for example, through a "Softmax" function) so that each weight value is converted into a probability value less than "1"; in this case the threshold may also be set to a value between "0" and "1".
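The normalization and inversion just described can be sketched as follows. This is an illustrative sketch: the subtraction constant (the sum of the maximum and minimum probabilities) is one concrete choice of the "same real number" the paragraph mentions, not the only one.

```python
import math

def softmax(weights):
    """Normalize raw weight values into probabilities between 0 and 1."""
    exps = [math.exp(w - max(weights)) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def invert(probs):
    """Swap the relative magnitudes: subtract each probability from
    (max + min), so originally low values become the high ones."""
    hi, lo = max(probs), min(probs)
    flipped = [hi + lo - p for p in probs]
    total = sum(flipped)
    return [f / total for f in flipped]   # renormalized for later sampling
```

After `invert`, the character that was most likely is the least likely to be sampled, which is what lets action 207 mutate away from the predicted character.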
Next, in action 207, processor 112 may select a character from character set C1 as an output character according to the inverted weight values. Since the relative magnitudes of the inverted weight values are now reversed, selecting according to the inverted weight values gives processor 112 a greater chance of choosing a character other than the predicted character as the output character. In some embodiments, processor 112 performs multinomial distribution sampling over the inverted weight values to select the character.
On the other hand, if the weight value (for example, "12") of the predicted character (for example, "N") is not less than the threshold (for example, "10"), the predicted character appears relatively frequently in the protocol adopted by the device under test and may be a fixed character in the format, so mutating it would easily make the test data format non-conforming. In this case, processor 112 may directly determine the predicted character to be the output character.
After determining the output character, in action 209 processor 112 may concatenate the output character (for example, "N") with the current output string (for example, "O") to update the current output string. Next, in action 210, processor 112 may determine whether the length of the current output string matches a specified length. If the length of the current output string matches the specified length (for example, 150 characters), the current output string already satisfies the user's length requirement for test data, so processor 112 may terminate test data construction flow 2, and the current output string may be used as a piece of test data for testing the device under test. Conversely, if the length of the current output string does not match the specified length, processor 112 may return to action 203 and re-execute actions 203 to 210 until the length of the current output string matches the specified length.
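Actions 203 through 210 form a loop that can be sketched end to end. Everything below is an illustrative assumption: `model` stands in for M1, the toy model simply favors the cyclically next letter of a four-character set, and the threshold and target length are arbitrary.

```python
import random

def build_test_data(model, charset, seed_char, threshold, target_len, rng):
    """Grow the output string one character per round until it reaches
    the specified length (actions 203-210 of flow 2)."""
    output = seed_char
    while len(output) < target_len:
        weights = model(output)                                  # action 203
        idx = max(range(len(weights)), key=weights.__getitem__)  # action 204
        if weights[idx] < threshold:                             # action 205
            hi, lo = max(weights), min(weights)                  # action 206
            weights = [hi + lo - w for w in weights]
            idx = rng.choices(range(len(charset)), weights=weights)[0]  # action 207
        output += charset[idx]                                   # actions 208-209
    return output                                                # action 210

charset = list("ABCD")

def toy_model(s):
    """Hypothetical predictor: strongly favors the cyclically next letter."""
    nxt = (charset.index(s[-1]) + 1) % len(charset)
    return [0.9 if i == nxt else 0.03 for i in range(len(charset))]

sample = build_test_data(toy_model, charset, "A", 0.5, 8, random.Random(0))
```

With a threshold of 0.5 the predicted weight (0.9) always passes, so the toy run reproduces the model's format exactly; raising the threshold above 0.9 forces inversion on every round and yields mutated strings instead.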
FIG. 3 illustrates a character generation method according to one or more embodiments of the present invention. The content shown in FIG. 3 is intended to illustrate embodiments of the present invention, not to limit its scope of protection.
Referring to FIG. 3, a character generation method 3 for constructing test data may be executed by an electronic computing device. The electronic computing device may store a character prediction model and a character set, and the character set may include a plurality of characters. Character generation method 3 may include the following steps:
running the character prediction model to generate a character prediction vector according to a current output string, wherein the character prediction vector includes a plurality of weight values corresponding to the characters (labeled 301);
determining a predicted character from the character set according to the character prediction vector (labeled 302);
when it is determined that the weight value corresponding to the predicted character among the weight values is less than a threshold, inverting the weight values and selecting a character from the character set as an output character according to the inverted weight values (labeled 303); and
when it is determined that the weight value corresponding to the predicted character among the weight values is not less than the threshold, using the predicted character as the output character (labeled 304).
In some embodiments, character generation method 3 may further include the following steps:
concatenating the output character with the current output string; and
continuing to generate output characters and to update the current output string until a length of the current output string matches a specified length, and using the current output string as a piece of test data.
In some embodiments, character generation method 3 may further include the following step: before determining whether the weight value corresponding to the predicted character among the weight values is less than the threshold, normalizing the weight values so that each weight value lies between 0 and 1, wherein the threshold also lies between 0 and 1.
In some embodiments, regarding character generation method 3, the electronic computing device performs multinomial distribution sampling over the characters according to the weight values to select the character.
In some embodiments, character generation method 3 may further include the following steps:
receiving from the outside a plurality of packets used by a device under test; and
analyzing the packets to build the character set, wherein the packets conform to a text-based communication protocol.
In some embodiments, character generation method 3 may further include the following steps:
receiving from the outside a plurality of packets used by a device under test;
analyzing the packets to build the character set, wherein the packets conform to a text-based communication protocol; and
training a deep neural network according to the packets and the character set to build the character prediction model.
Each embodiment of character generation method 3 essentially corresponds to an embodiment of character generation apparatus 11. Therefore, even though not every embodiment of character generation method 3 is detailed above, those of ordinary skill in the art to which the present invention pertains can directly understand the undetailed embodiments of character generation method 3 from the above description of character generation apparatus 11.
Character generation method 3 described above may be implemented as a computer program product. When the computer program product is loaded into the electronic computing device, a plurality of program instructions included in the computer program product can execute character generation method 3 described above. The computer program product may be stored in a non-transitory tangible machine-readable medium, such as but not limited to a read-only memory (ROM), a flash memory, a floppy disk, a portable hard drive, a magnetic tape, a network-accessible database, or any other storage medium with the same function known to those of ordinary skill in the art.
The above embodiments are only intended to illustrate some implementations of the present invention and to explain its technical features, not to limit the category and scope of its protection. Any change or equivalent arrangement that can easily be accomplished by those of ordinary skill in the art to which the present invention pertains falls within the scope claimed by the present invention, and the scope of protection of the present invention is defined by the claims.
As follows:
100, 101: fuzz testing apparatus
11: character generation apparatus
111: storage
112: processor
113: transceiver
2: test data construction flow
201, 202, 203, 204, 205, 206, 207, 208, 209, 210: actions
3: character generation method
301, 302, 303, 304: steps
C1: character set
M1: character prediction model
PK1, PK2: packets
The accompanying drawings assist in explaining various embodiments of the present invention, in which: FIG. 1 illustrates a character generation apparatus according to one or more embodiments of the present invention; FIG. 2A and FIG. 2B illustrate a test data construction flow according to one or more embodiments of the present invention; and FIG. 3 illustrates a character generation method according to one or more embodiments of the present invention.
None.
3: character generation method
301, 302, 303, 304: steps
Claims (18)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109141188A TWI765426B (en) | 2020-11-24 | 2020-11-24 | Character-generating apparatus, character-generating method and computer program thereof for building test data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
TW109141188A TWI765426B (en) | 2020-11-24 | 2020-11-24 | Character-generating apparatus, character-generating method and computer program thereof for building test data |
Publications (2)
Publication Number | Publication Date |
---|---|
TWI765426B true TWI765426B (en) | 2022-05-21 |
TW202221555A TW202221555A (en) | 2022-06-01 |
Family
ID=82594474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
TW109141188A TWI765426B (en) | 2020-11-24 | 2020-11-24 | Character-generating appartus, character-generating method and computer program thereof for building test data |
Country Status (1)
Country | Link |
---|---|
TW (1) | TWI765426B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136098A (en) * | 2011-11-30 | 2013-06-05 | 西门子公司 | Method, device and system for fuzzing test |
TW201617771A (en) * | 2014-11-10 | 2016-05-16 | 財團法人資訊工業策進會 | Backup method, pre-testing method for enviornment updating and system thereof |
CN108470003A (en) * | 2018-03-24 | 2018-08-31 | 中科软评科技(北京)有限公司 | Fuzz testing methods, devices and systems |
US20180365139A1 (en) * | 2017-06-15 | 2018-12-20 | Microsoft Technology Licensing, Llc | Machine learning for constrained mutation-based fuzz testing |
- 2020-11-24: TW TW109141188A patent/TWI765426B/en not_active IP Right Cessation
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136098A (en) * | 2011-11-30 | 2013-06-05 | 西门子公司 | Method, device and system for fuzzing test |
TW201617771A (en) * | 2014-11-10 | 2016-05-16 | 財團法人資訊工業策進會 | Backup method, pre-testing method for enviornment updating and system thereof |
US20180365139A1 (en) * | 2017-06-15 | 2018-12-20 | Microsoft Technology Licensing, Llc | Machine learning for constrained mutation-based fuzz testing |
CN108470003A (en) * | 2018-03-24 | 2018-08-31 | 中科软评科技(北京)有限公司 | Fuzz testing methods, devices and systems |
Also Published As
Publication number | Publication date |
---|---|
TW202221555A (en) | 2022-06-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6154824B2 (en) | Boolean logic in state machine lattices | |
CN110990273B (en) | Clone code detection method and device | |
CN109873774B (en) | Network traffic identification method and device | |
CN111431819A (en) | Network traffic classification method and device based on serialized protocol flow characteristics | |
CN115169570A (en) | Quantum network protocol simulation method and device and electronic equipment | |
CN109714356A (en) | A kind of recognition methods of abnormal domain name, device and electronic equipment | |
CN107451106A (en) | Text method and device for correcting, electronic equipment | |
KR20230094956A (en) | Techniques for performing subject word classification of document data | |
TWI765426B (en) | Character-generating appartus, character-generating method and computer program thereof for building test data | |
Viotti et al. | A survey of JSON-compatible binary serialization specifications | |
TW202234277A (en) | Detection and mitigation of unstable cells in unclonable cell array | |
CN116644180A (en) | Training method and training system for text matching model and text label determining method | |
CN116136970B (en) | Stable sub-checking line construction method, quantum error correction decoding method and related equipment | |
CN115828269A (en) | Method, device, equipment and storage medium for constructing source code vulnerability detection model | |
CN113225213B (en) | Method and device for translating configuration file of network equipment and network simulation | |
CN116980356A (en) | Network traffic identification method and device, electronic equipment and storage medium | |
Xu et al. | Feature Extraction for Payload Classification: A Byte Pair Encoding Algorithm | |
CN113886593A (en) | Method for improving relation extraction performance by using reference dependence | |
CN112764791A (en) | Incremental updating malicious software detection method and system | |
US12008580B2 (en) | Natural language processing machine learning to convert service recommendations to actionable objects | |
CN116545779B (en) | Network security named entity recognition method, device, equipment and storage medium | |
Jameel et al. | Optimal topology search for fast model averaging in decentralized parallel SGD | |
CN114491621B (en) | Text object security detection method and equipment | |
JP2019213183A (en) | Clustering method, classification method, clustering apparatus, and classification apparatus | |
WO2022259330A1 (en) | Estimation device, estimation method, and estimation program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
MM4A | Annulment or lapse of patent due to non-payment of fees |