TWI765426B - Character-generating apparatus, character-generating method and computer program thereof for building test data - Google Patents


Info

Publication number
TWI765426B
Authority
TW
Taiwan
Prior art keywords
character
weight values
predicted
current output
characters
Prior art date
Application number
TW109141188A
Other languages
Chinese (zh)
Other versions
TW202221555A (en)
Inventor
李育杰
黃咨詠
郭勝騎
Original Assignee
安華聯網科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 安華聯網科技股份有限公司
Priority to TW109141188A
Application granted
Publication of TWI765426B
Publication of TW202221555A

Landscapes

  • Debugging And Monitoring (AREA)

Abstract

A character-generating apparatus, a character-generating method, and a computer program product thereof for building test data are disclosed. The apparatus stores a character prediction model and a character set including a plurality of characters. The apparatus runs the character prediction model to generate a character prediction vector including a plurality of weight values corresponding to the characters. The apparatus then decides a predicted character from the character set according to the character prediction vector, and determines whether the weight value corresponding to the predicted character is lower than a threshold value. If it is, the apparatus inverts the weight values, selects a character from the character set according to the inverted weight values, and uses that character as the output character. Otherwise, the apparatus uses the predicted character as the output character.

Description

Character-generating device, character-generating method, and computer program product thereof for constructing test data

Embodiments of the present invention relate to a character-generating device, a character-generating method, and a computer program product thereof. More specifically, embodiments of the present invention relate to a character-generating device, a character-generating method, and a computer program product thereof for constructing test data.

Fuzz testing is a testing method for automatically discovering security vulnerabilities in an object under test (for example, software running on production equipment on a manufacturing line). It sends test data to the device under test in order to make the device fail, thereby uncovering security vulnerabilities present in the device. Traditional text-based test-data generation techniques can use a deep learning model to analyze packets of the device under test and learn the packet format of the protocol the device adopts, so as to generate, character by character, test data in a format the device can accept. In this situation, if the deep learning model is poorly trained, the generated test data is likely to be rejected by the device under test because its format does not conform to the protocol. Conversely, if the deep learning model is well trained, its highly accurate imitation of the packet format may make the generated test data too normal (that is, unable to make the device under test fail), and therefore unable to satisfy the needs of fuzz testing. In other words, for traditional test-data generation techniques, even when the deep learning model is very well trained, the effectiveness of the generated test data remains insufficient for fuzz testing. In view of this, the technical field to which the present invention pertains urgently needs a method that can generate effective test data, character by character, for fuzz testing.

To solve at least the above problems, embodiments of the present invention provide a character-generating device for constructing test data. The character-generating device may comprise a storage and a processor electrically connected to the storage. The storage may store a character prediction model and a character set, and the character set may include a plurality of characters. The processor may run the character prediction model to generate a character prediction vector according to a current output string, and the character prediction vector may include a plurality of weight values corresponding to the characters. The processor may further decide a predicted character from the character set according to the character prediction vector, and determine whether, among the weight values, the weight value corresponding to the predicted character is lower than a threshold value. If the weight value corresponding to the predicted character is lower than the threshold value, the processor may invert the weight values and select a character from the character set according to the inverted weight values as an output character. If the weight value corresponding to the predicted character is not lower than the threshold value, the processor may use the predicted character as the output character.

To solve at least the above problems, embodiments of the present invention further provide a character-generating method for constructing test data. The character-generating method may be executed by an electronic computing device. The electronic computing device may store a character prediction model and a character set, and the character set includes a plurality of characters. The character-generating method may comprise the following steps: running the character prediction model to generate a character prediction vector according to a current output string, wherein the character prediction vector includes a plurality of weight values corresponding to the characters; deciding a predicted character from the character set according to the character prediction vector; when it is determined that, among the weight values, the weight value corresponding to the predicted character is lower than a threshold value, inverting the weight values and selecting a character from the character set according to the inverted weight values as an output character; and when it is determined that the weight value corresponding to the predicted character is not lower than the threshold value, using the predicted character as the output character.
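The steps recited above can be sketched as a short routine. This is a minimal illustrative sketch and not the patented implementation: `predict_weights` is a stand-in for the trained character prediction model, the inversion is realized here as subtraction from the largest weight, and the selection from the inverted weights is made deterministic for brevity.

```python
# Hypothetical sketch of one character-generation step. `predict_weights`
# stands in for the character prediction model; the threshold value and
# weight values are illustrative assumptions, not the patent's numbers.

def generate_next_char(current_output, char_set, predict_weights, threshold):
    """Return the next output character for the current output string."""
    weights = predict_weights(current_output)          # one weight per character
    predicted_idx = max(range(len(weights)), key=weights.__getitem__)
    if weights[predicted_idx] < threshold:
        # Mutation case: invert the weights so that formerly low-weight
        # characters become the preferred choices.
        top = max(weights)
        inverted = [top - w for w in weights]
        chosen_idx = max(range(len(inverted)), key=inverted.__getitem__)
        return char_set[chosen_idx]
    return char_set[predicted_idx]                     # fixed-format case
```

For instance, with a confident prediction the predicted character is kept; with a low-confidence prediction the previously least-likely character is emitted instead.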

To solve at least the above problems, embodiments of the present invention further provide a computer program product. After an electronic computing device loads the plurality of program instructions comprised in the computer program product, it can execute a character-generating method for constructing test data. The electronic computing device may store a character prediction model and a character set, and the character set may include a plurality of characters. The character-generating method comprises the following steps: running the character prediction model to generate the character prediction vector according to a current output string, wherein the character prediction vector includes a plurality of weight values corresponding to the characters; deciding a predicted character from the character set according to the character prediction vector; when it is determined that, among the weight values, the weight value corresponding to the predicted character is lower than a threshold value, inverting the weight values and selecting a character from the character set according to the inverted weight values as an output character; and when it is determined that the weight value corresponding to the predicted character is not lower than the threshold value, using the predicted character as the output character.

In summary, the character-generating method provided by the present invention determines whether the predicted character is a character that may be changed by judging whether the weight value corresponding to the predicted character in the character prediction vector is lower than a specified threshold value. If the weight value of the predicted character is not lower than the threshold value, the predicted character appears relatively frequently in the protocol format adopted by the device under test and probably belongs to the fixed characters of that format (that is, characters whose mutation would make the test data violate the format, such as the individual characters of strings like "http://"), so the predicted character can be output directly. Conversely, if the weight value of the predicted character is lower than the threshold value, the predicted character appears relatively rarely in the protocol format adopted by the device under test and is probably a mutable character, so the weight values can be inverted to increase the output probability of other characters in the character set whose original weight values were low, thereby achieving a mutation effect. Accordingly, the test data constructed by the character-generating method provided by the present invention not only conforms to the protocol format but also has the variability that fuzz testing requires in order to make the device under test fail. Therefore, the test-data generation method provided by the present invention effectively solves the above problems of traditional text-based test-data generation techniques.

The above content is not intended to limit the present invention; it merely outlines the technical problems the present invention can solve, the technical means it can adopt, and the technical effects it can achieve, so that a person having ordinary skill in the art to which the present invention pertains may gain a preliminary understanding of the present invention. According to the attached drawings and the embodiments described below, a person having ordinary skill in the art can further understand the details of the various embodiments of the present invention.

The present invention is described below through a number of embodiments; however, these embodiments are not intended to limit the present invention to being practiced only according to the described operations, environments, applications, structures, processes, or steps. For ease of description, content not directly related to the embodiments of the present invention, or content understandable without special explanation, is omitted from the text and the drawings. In the drawings, the sizes of the elements and the proportions between them are merely examples and are not intended to limit the present invention. Unless otherwise specified, in the following, identical (or similar) reference numerals may correspond to identical (or similar) elements. Where practicable and unless otherwise specified, the number of each element described below may be one or more.

The terms used herein serve only to describe the embodiments and are not intended to limit the scope of protection of the present invention. Unless the context clearly indicates otherwise, the singular form "a" is intended to include the plural form as well. Terms such as "comprise" and "include" indicate the presence of the stated features, integers, steps, operations, elements, and/or components, but do not exclude the presence of one or more other features, integers, steps, operations, elements, components, and/or combinations thereof. The term "and/or" includes any and all combinations of one or more of the associated listed items.

FIG. 1 illustrates a character-generating device according to one or more embodiments of the present invention. The content shown in FIG. 1 illustrates embodiments of the present invention and is not intended to limit the scope of protection of the present invention.

Referring to FIG. 1, a character-generating device 11 for constructing test data may basically comprise a storage 111 and a processor 112, and the storage 111 may be electrically connected to the processor 112. The electrical connection between the storage 111 and the processor 112 may be direct (that is, connected to each other without intervening elements) or indirect (that is, connected to each other through other elements). The character-generating device 11 may be any of various types of electronic computing devices, such as a desktop computer, a portable computer, a mobile phone, or a portable electronic accessory (glasses, a watch, and so on). In each round of operation the character-generating device 11 can generate one character, thereby constructing a piece of test data character by character; the specific character-generation procedure is detailed below.

In some embodiments, since the character-generating device 11 can be used to construct test data, it may be included in a fuzz-testing device 100 for performing fuzz tests, and the fuzz-testing device 100 may directly use the test data constructed by the character-generating device 11 to test the device under test. In some other embodiments, the character-generating device 11 may further comprise a transceiver 113 electrically connected to the processor 112 and the storage 111, in which case the character-generating device 11 may instead communicate through the transceiver 113 with a fuzz-testing device 101 to transmit the constructed test data to the fuzz-testing device 101. The fuzz-testing device 100 and the fuzz-testing device 101 may likewise be any of various types of electronic computing devices. In still other embodiments, the character-generating device 11 is itself the fuzz-testing device 100; that is, besides constructing test data character by character, the character-generating device 11 may also, through the joint operation of the storage 111, the processor 112, and the transceiver 113, use the constructed test data to test the device under test.

The transceiver 113 may perform wired or wireless communication with an external device (for example, the aforementioned fuzz-testing device 101). In some embodiments, the transceiver 113 may comprise a transmitter and a receiver. For wireless communication, the transceiver may comprise, but is not limited to, communication components such as an antenna, an amplifier, a modulator, a demodulator, a detector, an analog-to-digital converter, and a digital-to-analog converter. For wired communication, the transceiver may be, for example but not limited to, a gigabit Ethernet transceiver, a gigabit interface converter (GBIC), a small form-factor pluggable (SFP) transceiver, or a ten-gigabit small form-factor pluggable (XFP) transceiver.

The storage 111 may store data produced by the character-generating device 11, data transmitted from external devices, or data entered by the user. The storage 111 may comprise first-level memory (also called main memory or internal memory), and the processor 112 can directly read the instruction sets stored in the first-level memory and execute those instruction sets when needed. The storage 111 may optionally comprise second-level memory (also called external memory or auxiliary memory), which can transfer stored data to the first-level memory through a data buffer. The second-level memory may be, for example but not limited to, a hard disk or an optical disc. The storage 111 may optionally comprise third-level memory, that is, a storage device that can be plugged directly into or removed from the computer, such as a portable hard drive.

The storage 111 may store at least a character prediction model M1 and a character set C1. The character set C1 may be a set formed of a plurality of characters, and those characters may be, for example but not limited to, letters of various languages, digits, spaces, and full-width or half-width symbols. According to an input string or an input character, the character prediction model M1 can predict the next character likely to follow that input string or input character, and output a character prediction vector that includes a plurality of weight values corresponding to a plurality of characters (for example, the characters in the character set C1) appearing after the input string or the input character. The character prediction model M1 may be a machine learning model based on a deep neural network architecture.

The processor 112 may be a microprocessor or a microcontroller with signal-processing capability. A microprocessor or microcontroller is a special programmable integrated circuit that has the capabilities of computation, storage, output, and input, and can accept and process various coded instructions so as to perform various logical and arithmetic operations and output the corresponding results. The processor 112 can be programmed to interpret various instructions, process data in the character-generating device 11, and execute various computing procedures or programs.

FIG. 2A and FIG. 2B illustrate a test-data construction process according to one or more embodiments of the present invention. The content shown in FIG. 2A and FIG. 2B illustrates embodiments of the present invention and is not intended to limit the scope of protection of the present invention.

Referring to FIG. 1, FIG. 2A, and FIG. 2B together, the processor 112 may execute a test-data construction process 2, operating jointly with the storage 111 (and, in some embodiments, the transceiver 113), to generate characters and thereby construct a piece of test data. The test-data construction process 2 may comprise a plurality of actions 201-210. First, in action 201, the processor 112 may analyze the contents of a plurality of packets PK1 and PK2 used by a device under test to build the character set C1. Specifically, the packets PK1 and PK2 are packets received or sent by the device under test, so they conform to the text-based communication protocols the device adopts (for example but not limited to the HyperText Transfer Protocol (HTTP) or the File Transfer Protocol (FTP)), and the processor 112 may perform application-layer filtering on each of the packets PK1 and PK2 to extract the characters used therein, from which the character set C1 can be built. In some embodiments, the transceiver 113 may receive the packets PK1 and PK2 from outside (for example, from the device under test itself or from another device communicating with it). It should be noted that the description of the number of packets PK1 and PK2 in this disclosure is only an example and not a limitation; a person having ordinary skill in the art will appreciate that this disclosure does not limit the number of packets to two.
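Action 201 can be illustrated with a small sketch. The extraction of application-layer payloads from a capture (for example, a .pcap file) is out of scope here; the two payload strings below are invented stand-ins for the filtered contents of packets PK1 and PK2.

```python
# Illustrative sketch of building the character set C1 from the
# application-layer payloads of captured packets. The payload strings
# pk1 and pk2 are hypothetical examples, not data from the patent.

def build_char_set(payloads):
    """Collect every distinct character used across the packet payloads."""
    chars = set()
    for payload in payloads:
        chars.update(payload)
    return sorted(chars)  # a deterministic ordering for the weight vector

pk1 = "GET /index.html HTTP/1.1"
pk2 = "Host: localhost:8080"
char_set = build_char_set([pk1, pk2])
```

Sorting the set gives each character a stable position, which is convenient when the character prediction vector later assigns one field per character.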

In action 202, the processor 112 may train a deep neural network according to the packets PK1 and PK2 and the character set C1 to build the character prediction model M1. Specifically, the deep neural network may be a multi-layer Long Short-Term Memory (LSTM) network, so the character prediction model M1 is a sequence-to-sequence (seq2seq) model. The specific way of training a deep learning network with packets comprising many characters to obtain a character prediction model is well known in the art to which the present invention pertains and is therefore not elaborated here.
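While the network training itself is standard, the shape of the training data for such a next-character model can be sketched: each window of consecutive characters from the packet text is paired with the character that follows it. The window length of 4 is an arbitrary choice for illustration and is not specified by the patent.

```python
# Hypothetical sketch of preparing training pairs for a next-character
# model such as the multi-layer LSTM mentioned above: every sliding
# window of characters is paired with the character that follows it.

def make_training_pairs(text, window=4):
    """Yield (input sequence, next character) pairs for next-char training."""
    return [(text[i:i + window], text[i + window])
            for i in range(len(text) - window)]

pairs = make_training_pairs("GET /index")
```

Each pair teaches the model that, given the window as input, the paired character should receive a high weight in the prediction vector.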

In some embodiments, the packets PK1 and PK2 may be contained in a network traffic trace file of the device under test, for example but not limited to a ".pcap" file. In this case, training the character prediction model M1 need not involve the internal application state of the device under test, and the character-generating device 11 can generate the characters that build the test data solely from the network traffic record of the device under test, without accessing its execution trace or other sensitive data, thereby ensuring that the information security of the device under test is not put at risk by the test-data construction process 2 provided by the present invention.

After the training of the character prediction model M1 is completed, in action 203 the processor 112 may run the character prediction model M1 to generate a character prediction vector according to a current output string. Specifically, the current output string is the string that is output once the test-data construction process 2 ends, and it can serve as a piece of test data for the device under test. The character prediction vector may comprise a plurality of fields; each field may correspond to one character in the character set C1 and may hold a weight value corresponding to that character. In other words, according to the current output string, the character prediction model M1 can judge the likelihood of each character in the character set C1 appearing next after the current output string, and express that likelihood as the weight values in the fields of the character prediction vector.

At the beginning of the test-data construction process 2, the current output string may contain a seed character decided according to the text contents of the packets PK1 and PK2, and the processor 112 may then, starting from that seed character (for example, "G"), generate the other characters one by one (for example, "E", "T", a space, and so on), so that a complete piece of test data is constructed once the test-data construction process 2 finishes. For example, a complete piece of test data may look like the following:

GET http://local host· 8080/tiendal/miembros/jmageoes/zacauz.jpg HTTP/1.1 User-Agent: Mozilla/5.0 (compatible; Konqueror/3.5; Linux) KHTML/3.5.8 (like Gecko) Pragma: no-cache Cache-control: no-cache Accept: text/xml,application/xml,application/xhtml+xml,text/ html;q=47xtext/html; q=0.9, text/plain; q=0.8,image/pn; q=0.5 Accept-Encoding: x-gzip, x-deflate, gzip, deflate Accept-Charset: utf-8, utf-8;q=0.5, * ;q=0.5 Accept - Language: en Host: localhost:8080 Cookie: JSIONID=D13O48O90gvhunn&email=wiss%40ngrzubote Arasticsoos.govi&eo=7emiete&email=gulloy.ya)gma: ap,he-control: no-cache Accept: text/xml,application/xml,application:8T808080/tienda1/publico/entrar.jsp HTTP/1.1 User-Agent: Moz illa/5.0 (compatible; Konqueror/3 . 5; Linux ) KHTML/3.5.8 (like Gecko) Pragma: no - cache Cache-control: no-cache Accept:text/xml,application/ xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.B,image/ png, * / */*;q=0.5 Accept-Encoding: x-gzip, x-deflate, gzip, de~late Accept-Charset: utf-8, utf-8;q=0.5, * ;q=0.5 Accept-Language: en Host: localhost:8080CC994E10{compatible; Konqueror/3 .5; Linux) KHTML/3.5.8 (like Gecko) Pragma: no - cache Cache-control: no-cache Accept: text/xml,application/xml,applicat ion/xhtml+xml,tex t /html;q=0.9, text/plain;q=0.B,image/png, * / * ;q=0.5 Accept-Encoding: x-gzip, x-deflate, gzip, de~late Accept-Charset: utf-8, utf-8;q=0.5, * ;q=0.5 Accept-Language: en Host: localhost:8080,image/png, */*;q=0.5 Accept-Encoding: x-gzip, x - deflate, gzip, deflate Accept-Charset: utf-8, utf-8;q=0.5, * ;q=760O4183EFA38loccept-Encoding: x-gzip, x-deflate, gzip, deflate Accept-Chelil+alccept : text/xml,application/xml,application/pae: JSESSIONID=2037082C55E238447DFCED79CA44O82C

For ease of description, assume that the character set C1 comprises the fifty-two uppercase and lowercase English letters, and that in the initial state the current output string contains only the seed character "O". The processor 112 may then feed the current output string into the character prediction model M1. According to the current output string "O", the character prediction model M1 may output the character prediction vector comprising fifty-two fields, each field holding a weight value corresponding to one character in the character set C1.

Next, in action 204, the processor 112 may determine a predicted character according to the character prediction vector. Specifically, the processor 112 may find the highest of the weight values contained in the character prediction vector, and determine the character in the character set C1 corresponding to that highest value as the predicted character. For example, if the character in the character set C1 corresponding to the highest weight value in the character prediction vector is "N", the processor 112 may determine that the character "N" is the predicted character that the character prediction model M1 predicts is most likely to follow the current output string "O".
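Action 204 amounts to an argmax over the weight vector. The following is a minimal illustrative sketch; the character set and weight values are made up for the example and are not taken from the patent.

```python
# Sketch of action 204: the predicted character is the one whose weight
# in the character prediction vector is highest (an argmax).
import string

def pick_predicted_char(char_set, weights):
    """Return (character, weight) for the highest-weighted character."""
    idx = max(range(len(weights)), key=lambda i: weights[i])
    return char_set[idx], weights[idx]

char_set = list(string.ascii_uppercase + string.ascii_lowercase)  # 52 characters, as in C1
weights = [1.0] * len(char_set)
weights[char_set.index("N")] = 12.0  # assume the model favours "N" after the seed "O"

predicted, weight = pick_predicted_char(char_set, weights)
print(predicted, weight)  # N 12.0
```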

Then, in action 205, the processor 112 may determine whether the weight value corresponding to the predicted character is smaller than a threshold value. Since the processor 112 has analyzed the packets PK1 and PK2, the threshold value may be defined by the processor 112 according to the character-usage characteristics of the protocol adopted by the device under test. As mentioned above, whether the weight value corresponding to the predicted character is smaller than the threshold value may be used to determine whether the predicted character is a character that may be changed. If the weight value of the predicted character is smaller than the threshold value, the predicted character appears relatively infrequently in the protocol format adopted by the device under test and may be a character that tolerates variation, so it may be replaced with another character in the character set C1. Accordingly, if the weight value (e.g., "12") of the predicted character (e.g., "N") is smaller than the threshold value (e.g., "20"), then in action 206 the processor 112 may invert the weight values in the character prediction vector to increase the output probability of the other characters in the character set C1 whose weight values were originally lower.
Specifically, the "inversion" may be any operation that swaps the ordering of a plurality of values relative to one another, for example, subtracting each weight value from the same real number and replacing the original weight value with the resulting difference, so that originally low weight values are replaced by relatively high ones. In some embodiments, before inverting the weight values, the processor 112 may first normalize them (e.g., through a Softmax function) so that each weight value becomes a probability value smaller than 1; in that case the threshold value may also be set to a value between 0 and 1.
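Actions 205-206 can be sketched as follows. This is an illustrative implementation of the two operations the paragraph names: a Softmax normalization, and inversion by subtracting each value from the same real number; the input weights are invented for the example.

```python
# Sketch of normalization (Softmax) and inversion (actions 205-206).
import math

def softmax(weights):
    m = max(weights)  # subtract the max for numerical stability
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]

def invert(probs, constant=1.0):
    # Subtracting each value from the same real number swaps their ordering:
    # the formerly largest value becomes the smallest, and vice versa.
    return [constant - p for p in probs]

probs = softmax([12.0, 3.0, 1.0])
inverted = invert(probs)
print(probs.index(max(probs)), inverted.index(min(inverted)))  # 0 0
```

After inversion, the character that dominated the prediction now has the lowest weight, which is exactly what lets action 207 favor the other characters.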

Next, in action 207, the processor 112 may select a character from the character set C1 as an output character according to the inverted weight values. Since inversion reverses the ordering of the weight values, a selection based on the inverted weight values is more likely to pick a character other than the predicted character as the output character. In some embodiments, the processor 112 selects the character by performing multinomial distribution sampling over the inverted weight values.
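A single draw from a multinomial over the character set, weighted by the inverted values, can be sketched with Python's standard library; the character set and weights below are invented for illustration.

```python
# Sketch of action 207: weighted sampling of one output character.
import random

def sample_output_char(char_set, inverted_weights, rng):
    # random.choices performs weighted sampling; k=1 gives one multinomial draw.
    return rng.choices(char_set, weights=inverted_weights, k=1)[0]

rng = random.Random(0)
char_set = ["N", "a", "z"]
inverted = [0.01, 0.50, 0.49]  # "N" was the predicted character; it now has the lowest weight
counts = {c: 0 for c in char_set}
for _ in range(1000):
    counts[sample_output_char(char_set, inverted, rng)] += 1
print(counts)
```

Over many draws, the formerly dominant "N" is selected far less often than the other characters, which is the intended effect of the inversion.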

On the other hand, if the weight value (e.g., "12") of the predicted character (e.g., "N") is not smaller than the threshold value (e.g., "10"), the predicted character appears relatively frequently in the protocol adopted by the device under test and is likely a fixed character of the format, so mutating it would easily make the test data violate the format. In this case, the processor 112 may directly determine the predicted character to be the output character.

After determining the output character, in action 209, the processor 112 may concatenate the output character (e.g., "N") with the current output string (e.g., "O") to update the current output string. Next, in action 210, the processor 112 may determine whether the length of the current output string matches a specified length. If the length of the current output string matches the specified length (e.g., 150 characters), the current output string already meets the user's length requirement for the test data, so the processor 112 may terminate the test data construction process 2, and the current output string may be used as a piece of test data for testing the device under test. Otherwise, if the length of the current output string does not match the specified length, the processor 112 may return to action 203 and execute actions 203 to 210 again until the length of the current output string matches the specified length.
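Actions 203 through 210 form one generation loop. The sketch below wires the earlier pieces together under stated assumptions: a random stub stands in for the character prediction model M1, and the character set, threshold, and target length are invented for the example.

```python
# End-to-end sketch of the generation loop (actions 203-210).
import math
import random

def softmax(ws):
    m = max(ws)
    es = [math.exp(w - m) for w in ws]
    s = sum(es)
    return [e / s for e in es]

def generate(seed, char_set, predict, threshold, target_len, rng):
    out = seed
    while len(out) < target_len:
        probs = softmax(predict(out))                            # action 203
        best = max(range(len(probs)), key=probs.__getitem__)     # action 204
        if probs[best] < threshold:                              # action 205
            inverted = [1.0 - p for p in probs]                  # action 206
            ch = rng.choices(char_set, weights=inverted, k=1)[0] # action 207
        else:
            ch = char_set[best]                                  # action 208
        out = out + ch                                           # action 209
    return out                                                   # action 210: length reached

rng = random.Random(0)
char_set = list("GET /:")
predict = lambda s: [rng.random() for _ in char_set]  # stand-in for model M1
data = generate("G", char_set, predict, threshold=0.9, target_len=20, rng=rng)
print(len(data), data[0])  # 20 G
```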

FIG. 3 illustrates a character generation method according to one or more embodiments of the present invention. The content shown in FIG. 3 is intended to illustrate embodiments of the present invention, not to limit the scope of protection of the present invention.

Referring to FIG. 3, a character generation method 3 for constructing test data may be executed by an electronic computing device. The electronic computing device may store a character prediction model and a character set, and the character set may contain a plurality of characters. The character generation method 3 may comprise the following steps:
running the character prediction model to generate a character prediction vector according to a current output string, wherein the character prediction vector contains a plurality of weight values corresponding to the characters (labeled as 301);
determining a predicted character from the character set according to the character prediction vector (labeled as 302);
when it is determined that, among the weight values, the weight value corresponding to the predicted character is smaller than a threshold value, inverting the weight values and selecting a character from the character set as an output character according to the inverted weight values (labeled as 303); and
when it is determined that, among the weight values, the weight value corresponding to the predicted character is not smaller than the threshold value, using the predicted character as the output character (labeled as 304).

In some embodiments, the character generation method 3 may further comprise the following steps:
concatenating the output character with the current output string; and
continuing to generate output characters and update the current output string until a length of the current output string matches a specified length, and using the current output string as a piece of test data.

In some embodiments, the character generation method 3 may further comprise the following step: before determining whether the weight value corresponding to the predicted character among the weight values is smaller than the threshold value, normalizing the weight values so that each weight value lies between 0 and 1, wherein the threshold value also lies between 0 and 1.

In some embodiments of the character generation method 3, the electronic computing device selects the character by performing multinomial distribution sampling over the characters according to the weight values.

In some embodiments, the character generation method 3 may further comprise the following steps:
receiving, from an external source, a plurality of packets used by a device under test; and
analyzing the packets to build the character set, wherein the packets conform to a text-based communication protocol.
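One plausible reading of the packet-analysis step is simply collecting every distinct character that appears in the captured text payloads; the sketch below assumes that reading, and the packet contents are invented for the example.

```python
# Hypothetical sketch of building the character set C1 from packet payloads.
def build_char_set(packets):
    # Collect every distinct character seen across the text payloads.
    chars = set()
    for payload in packets:
        chars.update(payload)
    return sorted(chars)

pk1 = "GET /index.html HTTP/1.1"
pk2 = "Host: localhost:8080"
c1 = build_char_set([pk1, pk2])
print(len(c1))
```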

In some embodiments, the character generation method 3 may further comprise the following steps:
receiving, from an external source, a plurality of packets used by a device under test;
analyzing the packets to build the character set, wherein the packets conform to a text-based communication protocol; and
training a deep neural network according to the packets and the character set to build the character prediction model.
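The patent does not specify how the training examples for the deep neural network are derived; one common character-level scheme, shown here purely as an assumption, is to slice each payload into (prefix, next character) pairs before training.

```python
# Illustrative only: derive (prefix, next-character) training pairs from a
# packet payload, as character-level models are often trained. The window
# size and payload are invented for the example.
def make_training_pairs(payload, window=4):
    pairs = []
    for i in range(1, len(payload)):
        prefix = payload[max(0, i - window):i]
        pairs.append((prefix, payload[i]))
    return pairs

pairs = make_training_pairs("GET /")
print(pairs[:2])  # [('G', 'E'), ('GE', 'T')]
```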

Each embodiment of the character generation method 3 essentially corresponds to an embodiment of the character generation apparatus 11. Therefore, even though not every embodiment of the character generation method 3 is detailed above, a person having ordinary skill in the art to which the present invention pertains can directly understand the undescribed embodiments of the character generation method 3 from the description of the character generation apparatus 11 above.

The character generation method 3 described above may be implemented as a computer program product. When the computer program product is loaded into the electronic computing device, a plurality of program instructions contained in the computer program product execute the character generation method 3 described above. The computer program product may be stored in a non-transitory tangible machine-readable medium, such as, but not limited to, a read-only memory (ROM), a flash memory, a floppy disk, a portable hard drive, a magnetic tape, a network-accessible database, or any other storage medium with the same function known to a person having ordinary skill in the art to which the present invention pertains.

The above embodiments are only intended to illustrate some implementations of the present invention and to explain its technical features, not to limit its scope of protection. Any change or equivalent arrangement that can be easily accomplished by a person having ordinary skill in the art to which the present invention pertains falls within the scope claimed by the present invention, and the scope of protection of the present invention is defined by the claims.

The reference numerals are as follows:
100, 101: fuzz testing apparatus
11: character generation apparatus
111: storage
112: processor
113: transceiver
2: test data construction process
201, 202, 203, 204, 205, 206, 207, 208, 209, 210: actions
3: character generation method
301, 302, 303, 304: steps
C1: character set
M1: character prediction model
PK1, PK2: packets

The accompanying drawings assist in explaining various embodiments of the present invention, in which:
FIG. 1 illustrates a character generation apparatus according to one or more embodiments of the present invention;
FIG. 2A and FIG. 2B illustrate a test data construction process according to one or more embodiments of the present invention; and
FIG. 3 illustrates a character generation method according to one or more embodiments of the present invention.

None.

3: character generation method

301, 302, 303, 304: steps

Claims (18)

1. A character generation apparatus for constructing test data, comprising:
a storage, configured to store a character prediction model and a character set, wherein the character set contains a plurality of characters; and
a processor, electrically connected to the storage, the processor being configured to:
run a character prediction model to generate a character prediction vector according to a current output string, wherein the character prediction vector contains a plurality of weight values corresponding to the characters;
determine a predicted character from the character set according to the character prediction vector; and
determine whether the weight value corresponding to the predicted character among the weight values is smaller than a threshold value, wherein:
if the weight value corresponding to the predicted character is smaller than the threshold value, the processor inverts the weight values and selects a character from the character set as an output character according to the inverted weight values; and
if the weight value corresponding to the predicted character is not smaller than the threshold value, the processor uses the predicted character as the output character.
2. The character generation apparatus of claim 1, wherein the processor is further configured to:
concatenate the output character with the current output string to update the current output string; and
continue to generate output characters and update the current output string until a length of the current output string matches a specified length, and use the current output string as a piece of test data.
3. The character generation apparatus of claim 1, wherein before determining whether the weight value corresponding to the predicted character among the weight values is smaller than the threshold value, the processor is further configured to normalize the weight values so that each weight value lies between 0 and 1, and wherein the threshold value also lies between 0 and 1.
4. The character generation apparatus of claim 1, wherein the processor selects the character by performing multinomial distribution sampling over the characters according to the weight values.
5. The character generation apparatus of claim 1, further comprising a transceiver electrically connected to the processor and the storage, the transceiver being configured to receive, from an external source, a plurality of packets used by a device under test;
wherein the processor is further configured to analyze the packets to build the character set, and the packets conform to a text-based communication protocol.
6. The character generation apparatus of claim 5, wherein the processor is further configured to train a deep neural network according to the packets and the character set to build the character prediction model.
7. A character generation method for constructing test data, executed by an electronic computing device that stores a character prediction model and a character set, the character set containing a plurality of characters, the character generation method comprising:
running the character prediction model to generate a character prediction vector according to a current output string, wherein the character prediction vector contains a plurality of weight values corresponding to the characters;
determining a predicted character from the character set according to the character prediction vector;
when it is determined that the weight value corresponding to the predicted character among the weight values is smaller than a threshold value, inverting the weight values and selecting a character from the character set as an output character according to the inverted weight values; and
when it is determined that the weight value corresponding to the predicted character among the weight values is not smaller than the threshold value, using the predicted character as the output character.
8. The character generation method of claim 7, further comprising:
concatenating the output character with the current output string to update the current output string; and
continuing to generate output characters and update the current output string until a length of the current output string matches a specified length, and using the current output string as a piece of test data.
9. The character generation method of claim 7, further comprising: before determining whether the weight value corresponding to the predicted character among the weight values is smaller than the threshold value, normalizing the weight values so that each weight value lies between 0 and 1, wherein the threshold value also lies between 0 and 1.
10. The character generation method of claim 7, wherein the electronic computing device selects the character by performing multinomial distribution sampling over the characters according to the weight values.
11. The character generation method of claim 7, further comprising:
receiving, from an external source, a plurality of packets used by a device under test; and
analyzing the packets to build the character set, wherein the packets conform to a text-based communication protocol.
12. The character generation method of claim 11, further comprising: training a deep neural network according to the packets and the character set to build the character prediction model.
13. A computer program product, wherein after an electronic computing device loads a plurality of program instructions contained in the computer program product, the electronic computing device executes a character generation method for constructing test data, the electronic computing device storing a character prediction model and a character set, the character set containing a plurality of characters, the character generation method comprising:
running the character prediction model to generate a character prediction vector according to a current output string, wherein the character prediction vector contains a plurality of weight values corresponding to the characters;
determining a predicted character from the character set according to the character prediction vector;
when it is determined that the weight value corresponding to the predicted character among the weight values is smaller than a threshold value, inverting the weight values and selecting a character from the character set as an output character according to the inverted weight values; and
when it is determined that the weight value corresponding to the predicted character among the weight values is not smaller than the threshold value, using the predicted character as the output character.
14. The computer program product of claim 13, wherein the character generation method further comprises:
concatenating the output character with the current output string to update the current output string; and
continuing to generate output characters and update the current output string until a length of the current output string matches a specified length, and using the current output string as a piece of test data.
15. The computer program product of claim 13, wherein the character generation method further comprises: before determining whether the weight value corresponding to the predicted character among the weight values is smaller than the threshold value, normalizing the weight values so that each weight value lies between 0 and 1, wherein the threshold value also lies between 0 and 1.
16. The computer program product of claim 13, wherein the electronic computing device selects the character by performing multinomial distribution sampling over the characters according to the weight values.
17. The computer program product of claim 13, wherein the character generation method further comprises:
receiving, from an external source, a plurality of packets used by a device under test; and
analyzing the packets to build the character set, wherein the packets conform to a text-based communication protocol.
18. The computer program product of claim 17, wherein the character generation method further comprises: training a deep neural network according to the packets and the character set to build the character prediction model.
TW109141188A 2020-11-24 2020-11-24 Character-generating appartus, character-generating method and computer program thereof for building test data TWI765426B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW109141188A TWI765426B (en) 2020-11-24 2020-11-24 Character-generating appartus, character-generating method and computer program thereof for building test data


Publications (2)

Publication Number Publication Date
TWI765426B true TWI765426B (en) 2022-05-21
TW202221555A TW202221555A (en) 2022-06-01

Family

ID=82594474

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109141188A TWI765426B (en) 2020-11-24 2020-11-24 Character-generating appartus, character-generating method and computer program thereof for building test data

Country Status (1)

Country Link
TW (1) TWI765426B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103136098A (en) * 2011-11-30 2013-06-05 西门子公司 Method, device and system for fuzzing test
TW201617771A (en) * 2014-11-10 2016-05-16 財團法人資訊工業策進會 Backup method, pre-testing method for enviornment updating and system thereof
CN108470003A (en) * 2018-03-24 2018-08-31 中科软评科技(北京)有限公司 Fuzz testing methods, devices and systems
US20180365139A1 (en) * 2017-06-15 2018-12-20 Microsoft Technology Licensing, Llc Machine learning for constrained mutation-based fuzz testing


Also Published As

Publication number Publication date
TW202221555A (en) 2022-06-01


Legal Events

Date Code Title Description
MM4A Annulment or lapse of patent due to non-payment of fees