TW202207155A - Model determination method and related terminal and computer readable storage medium - Google Patents

Model determination method and related terminal and computer readable storage medium

Info

Publication number
TW202207155A
Authority
TW
Taiwan
Prior art keywords
loss function
model
data
video
target
Prior art date
Application number
TW109139394A
Other languages
Chinese (zh)
Other versions
TWI755149B (en)
Inventor
蘇海昇
蘇婧
武偉
Original Assignee
大陸商上海商湯智能科技有限公司
Application filed by 大陸商上海商湯智能科技有限公司
Application granted
Publication of TWI755149B
Publication of TW202207155A

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 — Scenes; Scene-specific elements
    • G06V 20/40 — Scenes; Scene-specific elements in video content
    • G06V 20/41 — Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present application provides a model determination method, a related terminal, and a computer-readable storage medium. The method includes: classifying an input video according to a first model to obtain first feature data, and classifying the input video according to a second model to obtain second feature data; determining a first loss function according to the first feature data and the second feature data; determining a second loss function according to network parameters of the first model and the second model; determining a target loss function according to at least one of the first loss function and the second loss function; and adjusting the first model according to the target loss function to obtain a third model, which can improve the accuracy of the adjusted model when processing video classification tasks.

Description

Model determination method and related terminal and computer-readable storage medium

The present invention relates to, but is not limited to, the technical field of data processing, and in particular to a model determination method and a related terminal and computer-readable storage medium.

As one of the most fundamental research directions in the field of video behavior understanding, action recognition aims to identify the categories of actions occurring in trimmed videos and has attracted increasing attention. Deep-learning-based methods in the related art mainly fall into two typical categories: two-stream networks aim to capture appearance and motion information from RGB images and stacked optical flow respectively, while three-dimensional (3D) convolution methods use 3D convolution kernels to capture spatial and temporal information directly from the raw video. However, with either scheme, better performance usually comes at the cost of a huge number of parameters and computing resources.

Embodiments of the present invention provide a model determination method and a related terminal and computer-readable storage medium, which can improve the accuracy of the adjusted model when processing video classification tasks.

An embodiment of the present invention provides a model determination method. The model determination method is used to determine a classification model, and the determined classification model is applied to classify a video to be classified. The method includes: classifying an input video according to a first model to obtain first feature data, and classifying the input video according to a second model to obtain second feature data; determining a first loss function according to the first feature data and the second feature data; determining a second loss function according to network parameters of the first model and network parameters of the second model; determining a target loss function according to at least one of the first loss function and the second loss function; and adjusting the first model according to the target loss function to obtain a third model.

In the above embodiment, the first loss function is determined from the first feature data and the second feature data obtained after the first model and the second model classify the input video, the second loss function is determined from the network parameters of the first model and the second model, and the first model is adjusted through the target loss function determined from the first loss function and the second loss function to obtain the third model. In the related art, distillation methods mostly remain at the level of effectively selecting input data. In contrast, the target loss function determined from the first loss function and the second loss function obtained through the first model and the second model can be used to perform supervised learning on the first model to obtain the third model, so that the model can be distilled in terms of model parameters and other aspects, which improves the accuracy of the third model when processing video classification tasks.

In some embodiments of the present invention, determining the first loss function according to the first feature data and the second feature data includes: transforming the first feature data to obtain first spectral data, and transforming the second feature data to obtain second spectral data; and determining the first loss function at least according to the first spectral data and the second spectral data.

In the above embodiment, the first feature data and the second feature data are transformed to obtain the first spectral data and the second spectral data, and the first loss function is determined from the first spectral data and the second spectral data, so that distillation can be supervised by a spectral loss function (the first loss function), which improves the accuracy of model distillation.

In some embodiments of the present invention, the first spectral data includes data obtained by transforming the output data of K first identity structural blocks in the first model, and the second spectral data includes data obtained by transforming the output data of K second identity structural blocks in the second model. Determining the first loss function at least according to the first spectral data and the second spectral data includes: obtaining a first parameter of a predictor model, where the predictor model is used to ensure that the output data of the second model and of the first model have the same scale; and determining the first loss function according to the transformed output data of the K first identity structural blocks in the first model, the transformed output data of the K second identity structural blocks in the second model, and the first parameter.

In the above embodiment, the first parameter of the predictor model can ensure that the output data of the second model and of the first model have the same scale, which improves the efficiency of model distillation.

In some embodiments of the present invention, determining the second loss function according to the network parameters of the first model and the second model includes: obtaining first network parameters of the first model and second network parameters of the second model; sorting the first network parameters to obtain a first cumulative distribution, and sorting the second network parameters to obtain a second cumulative distribution; and determining the second loss function according to the divergence between the first cumulative distribution and the second cumulative distribution.

In the above embodiment, the second loss function is determined from the first network parameters of the first model and the second network parameters of the second model, so that the third model obtained after distillation is aligned with the second model in terms of frequency distribution, which improves the accuracy of model distillation.

In some embodiments of the present invention, determining the target loss function according to at least one of the first loss function and the second loss function includes: obtaining a third loss function of the first model, the third loss function being a video classification loss function; and determining the target loss function according to the first loss function and the third loss function.

In the above embodiment, the target loss function is determined from the first loss function and the third loss function, which can improve the accuracy of classification performed by the third model obtained by adjusting the first model with the target loss function.

In some embodiments of the present invention, determining the target loss function according to at least one of the first loss function and the second loss function includes: obtaining a third loss function of the first model, the third loss function being a video classification loss function; and determining the target loss function according to the second loss function and the third loss function.

In some embodiments of the present invention, determining the target loss function according to at least one of the first loss function and the second loss function includes: obtaining a third loss function of the first model, the third loss function being a video classification loss function; and determining the target loss function according to the first loss function, the second loss function and the third loss function.

In some embodiments of the present invention, determining the target loss function according to at least one of the first loss function and the second loss function includes: obtaining a third loss function of the first model, the third loss function being a video classification loss function; obtaining a weight function corresponding to the first loss function and the second loss function; and determining the target loss function according to the weight function, the first loss function, the second loss function and the third loss function.

In some embodiments of the present invention, the method further includes: receiving a video to be classified; and classifying the video to be classified through the third model to obtain a classification result.

An embodiment of the present invention provides a model determination apparatus. The model determination apparatus is used to determine a classification model, and the determined classification model is applied to classify a video to be classified. The apparatus includes: a processing unit configured to classify an input video according to a first model to obtain first feature data, and to classify the input video according to a second model to obtain second feature data; a first determining unit configured to determine a first loss function according to the first feature data and the second feature data; a second determining unit configured to determine a second loss function according to the network parameters of the first model and the second model; a third determining unit configured to determine a target loss function according to at least one of the first loss function and the second loss function; and an adjustment unit configured to adjust the first model according to the target loss function to obtain a third model.

In some embodiments of the present invention, with respect to determining the first loss function according to the first feature data and the second feature data, the first determining unit is further configured to: transform the first feature data to obtain first spectral data, and transform the second feature data to obtain second spectral data; and determine the first loss function at least according to the first spectral data and the second spectral data.

In some embodiments of the present invention, the first spectral data includes data obtained by transforming the output data of K first identity structural blocks in the first model, and the second spectral data includes data obtained by transforming the output data of K second identity structural blocks in the second model, where K is an integer greater than 0. With respect to determining the first loss function at least according to the first spectral data and the second spectral data, the first determining unit is further configured to: obtain a first parameter of a predictor model, where the predictor model is used to ensure that the output data of the second model and of the first model have the same scale; and determine the first loss function according to the transformed output data of the K first identity structural blocks in the first model, the transformed output data of the K second identity structural blocks in the second model, and the first parameter.

In some embodiments of the present invention, the second determining unit is further configured to: obtain first network parameters of the first model and second network parameters of the second model; sort the first network parameters to obtain a first cumulative distribution, and sort the second network parameters to obtain a second cumulative distribution; and determine the second loss function according to the divergence between the first cumulative distribution and the second cumulative distribution.

In some embodiments of the present invention, the third determining unit is further configured to: obtain a third loss function of the first model, the third loss function being a video classification loss function; and determine the target loss function according to the first loss function and the third loss function.

In some embodiments of the present invention, the third determining unit is further configured to: obtain a third loss function of the first model, the third loss function being a video classification loss function; and determine the target loss function according to the second loss function and the third loss function.

In some embodiments of the present invention, the third determining unit is further configured to: obtain a third loss function of the first model, the third loss function being a video classification loss function; and determine the target loss function according to the first loss function, the second loss function and the third loss function.

In some embodiments of the present invention, the third determining unit is further configured to: obtain a third loss function of the first model, the third loss function being a video classification loss function; obtain a weight function corresponding to the first loss function and the second loss function; and determine the target loss function according to the weight function, the first loss function, the second loss function and the third loss function.

In some embodiments of the present invention, the apparatus further includes: a receiving unit configured to receive a video to be classified; and a classification unit configured to classify the video to be classified through the third model to obtain a classification result.

An embodiment of the present invention provides a terminal, including a processor, an input device, an output device and a memory, where the processor, the input device, the output device and the memory are connected to one another. The memory is used to store a computer program including program instructions, and the processor is configured to invoke the program instructions to execute the steps of the method described in the embodiments of the present invention.

An embodiment of the present invention provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of the method described in the embodiments of the present invention.

An embodiment of the present invention provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of the method described in the embodiments of the present invention. The computer program product may be a software installation package.

The above method, apparatus, terminal, computer-readable storage medium and computer program product of the embodiments of the present invention will be easier to understand from the description of the following embodiments.

The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.

The terms "first", "second" and the like in the description of the present invention, the scope of the patent application and the above drawings are used to distinguish different objects rather than to describe a specific order. Furthermore, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to the process, method, product or device.

Reference herein to an "embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment can be included in at least one embodiment of the present invention. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor to separate or alternative embodiments that are mutually exclusive of other embodiments. It is explicitly and implicitly understood by those skilled in the art that the embodiments described in the present invention may be combined with other embodiments.

For a better understanding of the model determination method of the embodiments of the present invention, a scenario in which the model determination method is applied is briefly introduced below. The model determined by the model determination method is applied to classify an input video to be classified to obtain a classification result. The model determined by the model determination method may be called a student model. Compared with a teacher model, the student model is close to the teacher model in the effectiveness of its video classification results, where "close" may be understood as having similar classification accuracy. The teacher model may be understood as a model obtained by training on a large amount of sample data, but the network structure of the student model is smaller than that of the teacher model, and the student model is easy to deploy and can be deployed on electronic devices with relatively limited resources to execute classification tasks, which improves the flexibility and practicality of model deployment.

A possible application scenario is introduced below. Please refer to FIG. 1A, which is a schematic diagram of an application scenario of a model determination method provided by an embodiment of the present invention. As shown in FIG. 1A, the target area 10 may be an area in which behavior analysis needs to be performed. For example, when the shopping behavior of people in a shopping mall is analyzed, the shopping mall may be determined as the target area; for another example, when the traffic flow of vehicles at an intersection needs to be analyzed, the intersection may be determined as the target area. A video of the target area is captured by the camera 20, and the video can be used for video classification. After capturing the video, the camera 20 may send it to the server 30. The server here may be a conventional server or an electronic device such as a mobile phone or a tablet computer. After receiving the video, the server may classify the video through the student model obtained by distillation in the video classification model 31 to obtain a classification result 40. Taking the shopping mall as an example, the classification result may be the behavior of different people shopping in different stores in the mall, or the movement behavior of crowds in the mall. The video is classified by the student model obtained after distillation to obtain a classification result whose accuracy is close to that of the classification result obtained by the teacher model before distillation, but the computation speed of the student model is higher than that of the teacher model and the model size is smaller, so that the model is easy to deploy and the classification result can be obtained quickly.

In the related art, in order to take the real-time requirements of behavior classification into account, more and more researchers have begun to explore lightweight models. Distillation learning is a common means of making models lightweight: by distilling the key information of a larger teacher model into a smaller student model, the student model can achieve performance close to that of the teacher model. The few distillation methods used for video classification tasks mostly remain at the level of effectively selecting input data, or simply copy distillation methods from the field of image classification, resulting in low accuracy of the distilled student model when processing video classification tasks.

Please refer to FIG. 1B, which is a schematic flowchart of a model determination method provided by an embodiment of the present invention. The method may be executed by a terminal device, a server or other processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the method may be implemented by a processor invoking computer-readable instructions stored in a memory. As shown in FIG. 1B, the model determination method includes the following steps.

101. Classify an input video according to a first model to obtain first feature data, and classify the input video according to a second model to obtain second feature data.
The model determination method may be executed by a server, and the input video may be captured by a camera.
The input video may be a video that needs to be classified, for example, a video of a user walking on a street or a video of a user making a certain body movement; these are only examples and are not specifically limited here.
The first model may be a model with a residual neural network (Residual Network, ResNet) structure, and the ResNet structure may include multiple identity structural blocks. The second model may be a teacher model, which may be understood as a model obtained by training on a large amount of sample data and used for video classification.
The first feature data obtained by the first model classifying the input video is time-domain data, and the second feature data may also be time-domain data.
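As an illustrative sketch only (not part of the original description), the forward passes of step 101 might look as follows in PyTorch, assuming that the first model (student) and the second model (teacher) are modules returning a tuple of classification logits and per-block feature data; the names `student`, `teacher` and `extract_features` are hypothetical.

```python
# Hypothetical sketch of step 101: run both models on the same input video clip
# to obtain the first feature data (student) and second feature data (teacher).
import torch

def extract_features(student, teacher, clip):
    # clip: video tensor of shape (batch, channels, frames, height, width)
    s_logits, s_feats = student(clip)          # first feature data (K blocks)
    with torch.no_grad():                      # the teacher is not updated
        t_logits, t_feats = teacher(clip)      # second feature data (K blocks)
    return s_logits, s_feats, t_logits, t_feats
```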

102. Determine a first loss function according to the first feature data and the second feature data.
The first feature data and the second feature data may be transformed into corresponding spectral data, and the first loss function is determined according to the spectral data.
The high-frequency components of the spectral data mainly characterize the motion information between adjacent video frames, while the low-frequency components mainly characterize the scene.
Determining the first loss function from the spectral data and using it to adjust the model can increase the convergence speed during model adjustment and improve the efficiency of model adjustment.

103. Determine a second loss function according to the network parameters of the first model and the network parameters of the second model.
The network parameters of the first model and of the second model may be sorted to obtain cumulative distributions, and the second loss function is determined according to the cumulative distributions. The network parameters may be sorted by sorting the high-frequency parameters and the low-frequency parameters separately.
Determining the second loss function from the cumulative distributions can improve the accuracy with which the second loss function is determined.

104. Determine a target loss function according to at least one of the first loss function and the second loss function.
The target loss function may be determined from the first loss function, from the second loss function, or from both the first loss function and the second loss function. Determining the target loss function from at least one of the first loss function and the second loss function can improve the accuracy of the third model, obtained by adjustment with the target loss function, when classifying videos.

105. Adjust the first model according to the target loss function to obtain a third model.
The first model may be trained on sample data under the supervision of the target loss function, and the third model is obtained after convergence. The third model may be understood as a model distilled from the second model. If the second model is understood as a teacher model, the third model may be understood as a student model distilled from the teacher model.
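A minimal training-loop sketch for step 105 is given below, assuming the interfaces from the sketch above; `target_loss` stands in for the combination of loss terms described later in connection with formula (1-6), and all names are hypothetical rather than part of the original method.

```python
# Hypothetical distillation loop: adjust the first model under the supervision
# of the target loss function until convergence to obtain the third model.
import torch

def distill(student, teacher, predictor, loader, target_loss, epochs, lr=1e-2):
    params = list(student.parameters()) + list(predictor.parameters())
    opt = torch.optim.SGD(params, lr=lr, momentum=0.9)
    teacher.eval()
    for epoch in range(epochs):
        for clip, label in loader:
            s_logits, s_feats = student(clip)
            with torch.no_grad():
                _, t_feats = teacher(clip)
            loss = target_loss(s_logits, label, s_feats, t_feats,
                               student, teacher, predictor, epoch)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student   # the adjusted first model, i.e. the third model
```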

In the embodiment of the present invention, the first loss function is determined from the first feature data and the second feature data obtained after the first model and the second model classify the input video, the second loss function is determined from the network parameters of the first model and the second model, and the first model is adjusted through the target loss function determined from the first loss function and the second loss function to obtain the third model. In the related technical solutions, distillation methods mostly remain at the level of effectively selecting input data; in contrast, the target loss function determined from the first loss function and the second loss function obtained through the first model and the second model can be used to perform supervised learning on the first model to obtain the third model, so that the model can be distilled in terms of model parameters and other aspects, which improves the accuracy of the third model when processing video classification tasks.

In some embodiments of the present invention, a possible method for determining the first loss function according to the first feature data and the second feature data includes:
A1. Transform the first feature data to obtain first spectral data, and transform the second feature data to obtain second spectral data;
A2. Determine the first loss function at least according to the first spectral data and the second spectral data.

The first feature data and the second feature data may be transformed, for example, by a discrete Fourier transform, to obtain the corresponding first spectral data and second spectral data.

In some embodiments of the present invention, the first model includes K first identity structural blocks, and the second model includes K second identity structural blocks, where K is an integer greater than 0.

When the first feature data is transformed to obtain the first spectral data, the transformation can be performed as shown in the following formulas (1-1) and (1-2):

$f_i^{s} = \Phi_i^{s}(x; \theta^{s})$ ,                        (1-1);

$F_i^{s} = \mathrm{DFT}(f_i^{s})$ ,                       (1-2);

where $f_i^{s}$ is the feature data output by the i-th identity structural block in the first model, $F_i^{s}$ is the corresponding first spectral data, $\Phi^{s}$ is the first model, $\theta^{s}$ is a learnable parameter, $x$ is the input video, and $\mathrm{DFT}(\cdot)$ is the discrete Fourier transform.
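A small sketch of formulas (1-1)/(1-2) is shown below, assuming the per-block features are 5-D tensors and that the DFT is taken along the temporal (frame) axis; the choice of axis and the use of the magnitude spectrum are illustrative assumptions.

```python
# Sketch of (1-1)/(1-2): transform each identity block's feature data into
# spectral data with a discrete Fourier transform over the frame dimension.
import torch

def to_spectrum(block_feats):
    # block_feats: list of K tensors shaped (batch, channels, frames, h, w)
    spectra = []
    for f in block_feats:
        F = torch.fft.rfft(f, dim=2)                 # DFT along the time axis
        spectra.append(torch.abs(F).flatten(1, 2))   # magnitude; fold freq axis into channels
    return spectra                                   # list of (batch, C*freq, h, w)
```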

The second spectral data may be obtained in the same way as the first spectral data described above, and details are not repeated here.

The first loss function may be determined from the first spectral data, the second spectral data and the parameters of a predictor model. The predictor model may be a predictor composed of a series of 2-dimensional convolutions. The function of the predictor is to ensure that the output data of the first model and of the second model have the same scale.

In the above embodiment, the first feature data and the second feature data are transformed to obtain the first spectral data and the second spectral data, and the first loss function is determined from the first spectral data and the second spectral data, so that distillation can be supervised by a spectral loss function (the first loss function), which improves the accuracy of model distillation.

In some embodiments of the present invention, the first spectral data includes data obtained by transforming the output data of the K first identity structural blocks in the first model, and the second spectral data includes data obtained by transforming the output data of the K second identity structural blocks in the second model. A possible method for determining the first loss function at least according to the first spectral data and the second spectral data includes:
B1. Obtain a first parameter of a predictor model, where the predictor model is used to ensure that the output data of the second model and of the first model have the same scale;
B2. Determine the first loss function according to the transformed output data of the K first identity structural blocks in the first model, the transformed output data of the K second identity structural blocks in the second model, and the first parameter.

The first parameter may be a learnable parameter, which can be understood as a parameter that can be optimized and learned from sample data and the like. During model distillation, the predictor is optimized together with the first model.

The first loss function may be determined as shown in the following formula (1-3):

$L_{sp} = \sum_{x \in B} \sum_{i=1}^{K} \left\| P_{\phi}\left(F_i^{s}\right) - F_i^{t} \right\|^{2}$ ,          (1-3);

where $F_i^{s}$ is the spectrum obtained by transforming the data output by the i-th first identity structural block in the first model, $F_i^{t}$ is the spectrum obtained by transforming the data output by the i-th second identity structural block in the second model, $B$ is the batch data, $P_{\phi}$ is the predictor model with learnable parameter $\phi$, and $L_{sp}$ is the first loss function.
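The following sketch illustrates formula (1-3), assuming one predictor per identity block, each built from 2-D convolutions as described above; the exact predictor architecture and the squared-error distance are assumptions made for illustration, and the spectra are assumed to have been folded into 4-D tensors as in the previous sketch.

```python
# Sketch of (1-3): a 2-D convolutional predictor maps each student spectrum to
# the teacher's scale, and the first loss accumulates the distance over the
# batch and the K identity structural blocks.
import torch
import torch.nn as nn

class SpectrumPredictor(nn.Module):
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x)

def spectral_loss(student_spectra, teacher_spectra, predictors):
    loss = 0.0
    for F_s, F_t, P in zip(student_spectra, teacher_spectra, predictors):
        loss = loss + ((P(F_s) - F_t) ** 2).mean()   # per-block spectral distance
    return loss
```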

In the above embodiment, the first parameter of the predictor model can ensure that the output data of the second model and of the first model have the same scale, which improves the efficiency of model distillation.

In some embodiments of the present invention, a possible method for determining the second loss function according to the network parameters of the first model and the network parameters of the second model includes:
C1. Obtain first network parameters of the first model and second network parameters of the second model;
C2. Sort the first network parameters to obtain a first cumulative distribution, and sort the second network parameters to obtain a second cumulative distribution;
C3. Determine the second loss function according to the divergence between the first cumulative distribution and the second cumulative distribution.

The first network parameters of the first model may be obtained by reading the network parameters stored in a memory, or by extracting parameters from the first model, or of course in other ways, which are not specifically limited here. The second network parameters may be obtained in the same way as the first network parameters, and details are not repeated here.

The first network parameters may be divided into high-frequency parameters and low-frequency parameters. High-frequency parameters may be understood as parameters whose frequency is greater than or equal to a preset frequency threshold, and low-frequency parameters as parameters whose frequency is less than the preset frequency threshold; the preset frequency threshold is set based on empirical values or historical data.

After the high-frequency parameters and the low-frequency parameters are obtained, they are sorted separately to obtain the first cumulative distribution. The second cumulative distribution may be obtained in the same way as the first cumulative distribution, and details are not repeated here.

A possible method for determining the second loss function according to the divergence between the first cumulative distribution and the second cumulative distribution is shown in the following formula (1-4):

$L_{w} = \frac{1}{N} \sum_{j=1}^{N} \mathrm{KL}\left[\, \mathrm{CDF}\left(\pi(w_j^{s})\right) \,\Big\|\, \mathrm{CDF}\left(\pi(w_j^{t})\right) \right]$ ,     (1-4);

where $\pi(\cdot)$ denotes a random sampling function, $w^{s}$ denotes the first network parameters, $w^{t}$ denotes the second network parameters, $N$ is the number of convolution kernels $w$, $\mathrm{CDF}(\cdot)$ denotes the cumulative distribution obtained by sorting, and $\mathrm{KL}[\cdot]$ is the divergence.
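A rough sketch of formula (1-4) is given below; it randomly samples convolution weights from corresponding kernels of the two models, sorts them into empirical cumulative distributions and penalizes their divergence. Treating the normalized sorted weights as a discrete distribution for the KL term, and omitting the separate handling of high- and low-frequency parameters, are simplifying assumptions, as are the names `student_kernels` and `teacher_kernels`.

```python
# Sketch of (1-4): align the weight distributions of the first (student) and
# second (teacher) model via the divergence of their cumulative distributions.
import torch
import torch.nn.functional as F

def weight_distribution_loss(student_kernels, teacher_kernels, n_samples=256):
    loss, n = 0.0, 0
    for w_s, w_t in zip(student_kernels, teacher_kernels):   # per conv kernel
        idx_s = torch.randint(0, w_s.numel(), (n_samples,))  # random sampling
        idx_t = torch.randint(0, w_t.numel(), (n_samples,))
        cdf_s = torch.cumsum(torch.sort(w_s.flatten()[idx_s].abs())[0], dim=0)
        cdf_t = torch.cumsum(torch.sort(w_t.flatten()[idx_t].abs())[0], dim=0)
        cdf_s = cdf_s / cdf_s[-1].clamp_min(1e-8)             # normalize to [0, 1]
        cdf_t = cdf_t / cdf_t[-1].clamp_min(1e-8)
        loss = loss + F.kl_div((cdf_s + 1e-8).log(), cdf_t, reduction='batchmean')
        n += 1
    return loss / max(n, 1)
```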

In the above embodiment, the second loss function is determined from the first network parameters of the first model and the second network parameters of the second model, so that the third model obtained after distillation is aligned with the second model in terms of frequency distribution, which improves the accuracy of model distillation.

In some embodiments of the present invention, a possible method for determining the target loss function according to at least one of the first loss function and the second loss function includes:
D1. Obtain a third loss function of the first model, where the third loss function is a video classification loss function;
D2. Determine the target loss function according to the first loss function and the third loss function.

The third loss function of the first model may be the loss function used when performing supervised training on the first model, for example, a conventional video classification loss function. It may be obtained from storage, from a network, or of course in other ways. The video classification loss function may be cross entropy.

The sum of the first loss function and the third loss function may be determined as the target loss function.

In some embodiments of the present invention, a possible method for determining the target loss function according to at least one of the first loss function and the second loss function includes:
E1. Obtain a third loss function of the first model, where the third loss function is a video classification loss function;
E2. Determine the target loss function according to the second loss function and the third loss function.

The sum of the second loss function and the third loss function may be determined as the target loss function.

In some embodiments of the present invention, a possible method for determining the target loss function according to at least one of the first loss function and the second loss function includes:
F1. Obtain a third loss function of the first model, where the third loss function is a video classification loss function;
F2. Determine the target loss function according to the first loss function, the second loss function and the third loss function.

A first weight corresponding to the first loss function, a second weight corresponding to the second loss function and a third weight corresponding to the third loss function may be obtained, and a weighted combination of the first loss function, the second loss function and the third loss function is computed according to the first weight, the second weight and the third weight to obtain the target loss function.

In some embodiments of the present invention, during the distillation process the teacher network inevitably contains misleading information, that is, dark knowledge. Such dark knowledge does not help the learning of the classification network and may instead mislead the student network's judgment on the classification task. In this case, a joint learning strategy is introduced from the perspective of probability distribution to perform efficient distillation. Joint learning may be embodied in jointly determining the target loss function from the first loss function, the second loss function and the third loss function. Specifically, a possible method for determining the target loss function according to at least one of the first loss function and the second loss function includes:
G1. Obtain a third loss function of the first model, where the third loss function is a video classification loss function;
G2. Obtain a weight function corresponding to the first loss function and the second loss function;
G3. Determine the target loss function according to the weight function, the first loss function, the second loss function and the third loss function.

A possible method for obtaining the weight function is as follows: let the number of iterations of the first stage when adjusting the first model be N1 and the maximum number of iterations of the second stage be N2; then the weight function can be expressed by the following formula (1-5):

$\lambda(e) = \begin{cases} p, & e \le N_{1} \\ p \cdot \exp\left(-\dfrac{e - N_{1}}{N_{2}}\right), & N_{1} < e \le N_{1} + N_{2} \end{cases}$ ,                 (1-5);

where $p$ denotes the constant weight value, $e$ denotes the number of training iteration epochs, $N_{1}$ denotes the number of iterations of the first stage, $N_{2}$ denotes the maximum number of iterations of the second stage, and $\lambda(e)$ is the weight function.

The above weight function can be understood as follows: in the first N1 iteration epochs, the weight factor is defined as a constant equal to the probability p, where p is the probability of selecting the features corresponding to the samples whose output score is s. Then, in the following N2 iteration epochs, the weight factor changes dynamically according to an exponential function. A relatively high weight factor is set at the beginning to select the teacher features and network parameters for distillation, and this weight then decays exponentially; in the last stage, the weight factor becomes a relatively small constant, which can improve the accuracy of distillation.

The target loss function may be determined as the sum of the third loss function and the product of the weight function with the first loss function and the second loss function, specifically as shown in the following formula (1-6):

$L = L_{cls} + \lambda(e)\left( L_{sp} + L_{w} \right)$ ,                  (1-6);

where $L_{sp}$ is the first loss function, $L_{w}$ is the second loss function, $L_{cls}$ is the third loss function, and $L$ is the target loss function.
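The weight schedule of formula (1-5) and the combination of formula (1-6) could be assembled as sketched below; the exact shape of the schedule follows the reconstruction above and is an assumption, and `combined_loss` takes the two distillation terms as precomputed values rather than recomputing them.

```python
# Sketch of (1-5)/(1-6): an exponentially decaying weight schedules the two
# distillation terms, which are added to the ordinary classification loss.
import math
import torch.nn.functional as F

def weight_schedule(epoch, n1, n2, p):
    if epoch <= n1:
        return p                                    # constant in the first stage
    return p * math.exp(-(epoch - n1) / float(n2))  # exponential decay afterwards

def combined_loss(logits, labels, loss_sp, loss_w, epoch, n1, n2, p):
    loss_cls = F.cross_entropy(logits, labels)      # third (video classification) loss
    lam = weight_schedule(epoch, n1, n2, p)
    return loss_cls + lam * (loss_sp + loss_w)      # target loss, formula (1-6)
```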

In the above embodiment, the first model is adjusted by means of joint learning, using the target loss function determined from the first loss function, the second loss function and the third loss function, to obtain the third model, which can improve the accuracy of distilling the first model to obtain the third model.

In some embodiments of the present invention, the model determination method may further include:
H1. Receive a video to be classified;
H2. Classify the video to be classified through the third model to obtain a classification result.

The classification result may be, for example, the category of actions of different users, such as walking or standing; of course, it may also be the motion category of other objects, for example, the driving route or driving actions of a car. These are only examples and are not specifically limited here.

Therefore, the video to be classified can be classified through the third model obtained after adjustment to obtain a classification result; compared with the student model in the related technical solutions, the accuracy of the obtained classification result can be improved.
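For illustration, inference with the adjusted (third) model might look as follows, assuming the model returns a tuple of logits and features as in the earlier sketches and that `labels` is a list of category names; none of these names are part of the original description.

```python
# Hypothetical inference with the third model: classify a video clip to be
# classified and return the predicted action category.
import torch

@torch.no_grad()
def classify_video(third_model, clip, labels):
    third_model.eval()
    logits, _ = third_model(clip)            # clip: (1, channels, frames, h, w)
    probs = torch.softmax(logits, dim=1)
    return labels[int(probs.argmax(dim=1))]
```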

In some embodiments of the present invention, a specific application scenario is introduced below. In a scenario requiring simple deployment, simple deployment may be understood as the device on which the classification model needs to be deployed having limited resources, so that it is not easy, or not possible, to deploy a large classification model. In this case, the larger classification model (the teacher model) can be distilled to obtain a smaller classification model. The larger classification model may be understood as a model with complex model parameters and high classification accuracy, and the smaller classification model as a model with relatively simple parameters whose accuracy is close to that of the larger classification model.

When the teacher model (the second model) is distilled to obtain the student model, a target loss function for adjusting the initial student model (the first model) can be determined. When determining the target loss function, the first loss function can be determined from the spectral data obtained by applying a discrete Fourier transform to the output data of the initial student model and of the teacher model, so that the model is distilled through the characteristics of the spectral data; the second loss function is determined from the model parameters of the teacher model and of the initial student model themselves, so that the model is distilled through the characteristics of the model itself; and the video classification loss function of the initial student model, which may be cross entropy, is also obtained. Joint learning is then performed with the first loss function, the second loss function and the third loss function, that is, the target loss function is determined from the first loss function, the second loss function and the third loss function, and the initial student model is adjusted with the target loss function to obtain the adjusted student model (the third model). Determining the adjusted student model by joint learning can reduce the misleading information (dark knowledge) that inevitably exists in the teacher network and improve the accuracy of the adjusted student model when performing classification.

After the adjusted student model is obtained, it is deployed on a resource-constrained device to perform video classification tasks. Of course, the adjusted student model can also be deployed on other devices; this is only an example and is not specifically limited here.

Please refer to FIG. 2, which is a schematic flowchart of a model determination method provided by an embodiment of the present invention. The method may be executed by a terminal device, a server or other processing device. As shown in FIG. 2, the model determination method includes the following steps.

201: Classify the input video according to the first model to obtain first feature data, and classify the input video according to the second model to obtain second feature data. The input video may be a video that needs to be classified, for example, a video of a user walking on a street or a video of a user performing a certain body movement; these are only examples and are not specifically limited.

202: Transform the first feature data to obtain first spectral data, and transform the second feature data to obtain second spectral data. The first feature data and the second feature data may be transformed by a discrete Fourier transform to obtain the corresponding first spectral data and second spectral data. The first spectral data includes the transformed output data of the K first identity building blocks in the first model, and the second spectral data includes the transformed output data of the K second identity building blocks in the second model (a code sketch of this step is given below, after the flow of FIG. 2).

203: Determine a first loss function according to at least the first spectral data and the second spectral data. The first loss function may be determined according to the parameters of a predictor model, the first spectral data and the second spectral data (see the sketch below, after the flow of FIG. 2).

204: Determine a second loss function according to the network parameters of the first model and the network parameters of the second model.

205: Determine a target loss function according to at least one of the first loss function and the second loss function.

206: Adjust the first model according to the target loss function to obtain a third model.

In the above embodiment, the first spectral data and the second spectral data are obtained by transforming the first feature data and the second feature data, and the first loss function is determined from the first spectral data and the second spectral data, so that the distillation can be supervised by a spectral loss function (the first loss function), which improves the accuracy of model distillation.
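The following is a minimal sketch of steps 202 and 203, assuming PyTorch tensors. Treating the magnitude of a discrete Fourier transform as the spectral data, modeling each predictor as a small learned module applied per identity building block, and using a mean-squared error as the per-block distance are illustrative assumptions; the embodiment does not fix these choices, and all function and variable names here are hypothetical.

```python
import torch
import torch.nn.functional as F

def to_spectrum(block_outputs):
    """Step 202 sketch: apply a discrete Fourier transform to the output of each of
    the K identity building blocks and keep the magnitude spectrum."""
    spectra = []
    for feat in block_outputs:  # each feat: (batch, channels, ...) feature map
        freq = torch.fft.fftn(feat, dim=tuple(range(1, feat.dim())))
        spectra.append(freq.abs())
    return spectra

def spectral_loss(first_spectra, second_spectra, predictors):
    """Step 203 sketch: first loss as the summed distance between the predictor-aligned
    student spectra and the teacher spectra over the K identity building blocks."""
    loss = 0.0
    for s, t, p in zip(first_spectra, second_spectra, predictors):
        loss = loss + F.mse_loss(p(s), t.detach())   # teacher side carries no gradient
    return loss

# first_spectra  = to_spectrum(student_block_outputs)   # from the first model (hypothetical name)
# second_spectra = to_spectrum(teacher_block_outputs)   # from the second model (hypothetical name)
```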

Please refer to FIG. 3, which is a schematic flowchart of a model determination method provided by an embodiment of the present invention. The method may be executed by a terminal device, a server, or another processing device. As shown in FIG. 3, the method includes the following steps.

301: Classify the input video according to the first model to obtain first feature data, and classify the input video according to the second model to obtain second feature data. The input video may be a video that needs to be classified, for example, a video of a user walking on a street or a video of a user performing a certain body movement; these are only examples and are not specifically limited.

302: Determine a first loss function according to the first feature data and the second feature data. The first feature data and the second feature data may be transformed to obtain the corresponding first spectral data and second spectral data, and the first loss function is determined according to at least the first spectral data and the second spectral data. The transformation may be a discrete Fourier transform applied to the first feature data and the second feature data. The first spectral data includes the transformed output data of the K first identity building blocks in the first model, and the second spectral data includes the transformed output data of the K second identity building blocks in the second model.

303: Obtain first network parameters of the first model, and obtain second network parameters of the second model.

304: Sort the first network parameters to obtain a first cumulative distribution map, and sort the second network parameters to obtain a second cumulative distribution map. The first network parameters may be classified into high-frequency parameters and low-frequency parameters, where a high-frequency parameter can be understood as a parameter whose frequency is greater than or equal to a preset frequency threshold, a low-frequency parameter can be understood as a parameter whose frequency is less than the preset frequency threshold, and the preset frequency threshold is set according to empirical values or historical data.

305: Determine a second loss function according to the divergence of the first cumulative distribution map and the second cumulative distribution map (see the sketch below, after the flow of FIG. 3).

306: Determine a target loss function according to at least one of the first loss function and the second loss function.

307: Adjust the first model according to the target loss function to obtain a third model.

In the above embodiment, the second loss function is determined from the first network parameters of the first model and the second network parameters of the second model, so that the third model obtained after distillation can be aligned with the second model in frequency distribution, which improves the accuracy of model distillation.
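The following sketch illustrates steps 303 to 305 under two illustrative assumptions: the divergence is taken as a KL divergence between the cumulative distributions built from the sorted parameter magnitudes, and both distributions are resampled to a common length so that models of different sizes can be compared. Neither choice is fixed by the embodiment, and the helper names are hypothetical.

```python
import torch
import torch.nn.functional as F

def cumulative_distribution(params, num_points=1024):
    """Sort the flattened parameter magnitudes and build a normalized cumulative
    distribution, resampled to num_points values."""
    flat = torch.cat([p.abs().flatten() for p in params])
    sorted_vals, _ = torch.sort(flat)
    cdf = torch.cumsum(sorted_vals, dim=0)
    cdf = cdf / cdf[-1].clamp_min(1e-12)
    idx = torch.linspace(0, cdf.numel() - 1, num_points).long()   # common length for both models
    return cdf[idx]

def parameter_loss(first_model, second_model):
    """Second loss sketch: divergence between the cumulative distributions of the
    first (student) and second (teacher) model parameters."""
    cdf_s = cumulative_distribution(list(first_model.parameters()))
    cdf_t = cumulative_distribution([p.detach() for p in second_model.parameters()])
    p = (cdf_s / cdf_s.sum()).clamp_min(1e-12)
    q = (cdf_t / cdf_t.sum()).clamp_min(1e-12)
    return F.kl_div(p.log(), q, reduction='sum')   # KL(q || p) in PyTorch's convention
```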

Consistent with the above embodiments, please refer to FIG. 4, which is a schematic structural diagram of a terminal 400 provided by an embodiment of the present invention. As shown in the figure, the terminal includes a processor 410, an input device 420, an output device 430 and a memory 440, which are connected to each other. The memory is configured to store a computer program (that is, one or more programs 441 in FIG. 4), the computer program includes program instructions, and the processor is configured to invoke the program instructions; the program includes instructions for performing the following steps:
classifying the input video according to the first model to obtain first feature data, and classifying the input video according to the second model to obtain second feature data;
determining a first loss function according to the first feature data and the second feature data;
determining a second loss function according to the network parameters of the first model and the second model;
determining a target loss function according to at least one of the first loss function and the second loss function;
adjusting the first model according to the target loss function to obtain a third model.

In some embodiments of the present invention, determining the first loss function according to the first feature data and the second feature data includes:
transforming the first feature data to obtain first spectral data, and transforming the second feature data to obtain second spectral data;
determining the first loss function according to at least the first spectral data and the second spectral data.

In some embodiments of the present invention, the first spectral data includes the transformed output data of the K first identity building blocks in the first model, and the second spectral data includes the transformed output data of the K second identity building blocks in the second model. Determining the first loss function according to at least the first spectral data and the second spectral data includes:
obtaining a first parameter of a predictor model, where the predictor model is used to ensure that the output data of the second model and of the first model have the same scale;
determining the first loss function according to the transformed output data of the K first identity building blocks in the first model, the transformed output data of the K second identity building blocks in the second model, and the first parameter.

In some embodiments of the present invention, determining the second loss function according to the network parameters of the first model and the second model includes:
obtaining first network parameters of the first model, and obtaining second network parameters of the second model;
sorting the first network parameters to obtain a first cumulative distribution map, and sorting the second network parameters to obtain a second cumulative distribution map;
determining the second loss function according to the divergence of the first cumulative distribution map and the second cumulative distribution map.

In some embodiments of the present invention, determining the target loss function according to at least one of the first loss function and the second loss function includes:
obtaining a third loss function of the first model, where the third loss function is a video classification loss function;
determining the target loss function according to the first loss function and the third loss function.

In some embodiments of the present invention, determining the target loss function according to at least one of the first loss function and the second loss function includes:
obtaining a third loss function of the first model, where the third loss function is a video classification loss function;
determining the target loss function according to the second loss function and the third loss function.

In some embodiments of the present invention, determining the target loss function according to at least one of the first loss function and the second loss function includes:
obtaining a third loss function of the first model, where the third loss function is a video classification loss function;
determining the target loss function according to the first loss function, the second loss function and the third loss function.

In some embodiments of the present invention, determining the target loss function according to at least one of the first loss function and the second loss function includes:
obtaining a third loss function of the first model, where the third loss function is a video classification loss function;
obtaining weight functions corresponding to the first loss function and the second loss function;
determining the target loss function according to the weight functions, the first loss function, the second loss function and the third loss function.
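A compact sketch of this combination is given below, assuming for illustration that the weight functions reduce to scalar coefficients (w1, w2 are hypothetical names); the embodiment allows more general weighting.

```python
def target_loss(first_loss, second_loss, third_loss, w1=1.0, w2=1.0):
    """Target loss sketch: video classification loss plus the weighted distillation terms."""
    return third_loss + w1 * first_loss + w2 * second_loss
```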

In some embodiments of the present invention, the method further includes:
receiving a video to be classified;
classifying the video to be classified through the third model to obtain a classification result.
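For completeness, a minimal inference sketch with the adjusted model follows; the function name, the batched video tensor, and the (features, logits) interface carried over from the training sketch above are all hypothetical, since the text does not specify the input pipeline.

```python
import torch

def classify_video(third_model, video_tensor):
    """Run the third (adjusted) model on a video to be classified and return the class index."""
    third_model.eval()
    with torch.no_grad():
        _, logits = third_model(video_tensor)   # same (features, logits) interface assumed as above
    return int(logits.argmax(dim=-1))
```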

The foregoing describes the solutions of the embodiments of the present invention mainly from the perspective of the method-side execution process. It can be understood that, in order to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art should readily realize that, in combination with the units and algorithm steps of the examples described in the embodiments provided herein, the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of the present invention.

In the embodiments of the present invention, the terminal may be divided into functional units according to the foregoing method examples. For example, each functional unit may be divided corresponding to each function, or two or more functions may be integrated into one processing unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present invention is schematic and is only a logical function division; there may be other division manners in actual implementation.

Consistent with the above, please refer to FIG. 5, which is a schematic structural diagram of a model determination apparatus provided by an embodiment of the present invention. As shown in FIG. 5, the apparatus includes:
a processing unit 501, configured to classify the input video according to the first model to obtain first feature data, and to classify the input video according to the second model to obtain second feature data;
a first determining unit 502, configured to determine a first loss function according to the first feature data and the second feature data;
a second determining unit 503, configured to determine a second loss function according to the network parameters of the first model and the second model;
a third determining unit 504, configured to determine a target loss function according to at least one of the first loss function and the second loss function;
an adjustment unit 505, configured to adjust the first model according to the target loss function to obtain a third model.

In some embodiments of the present invention, in terms of determining the first loss function according to the first feature data and the second feature data, the first determining unit 502 is further configured to:
transform the first feature data to obtain first spectral data, and transform the second feature data to obtain second spectral data;
determine the first loss function according to at least the first spectral data and the second spectral data.

In some embodiments of the present invention, the first spectral data includes the transformed output data of the K first identity building blocks in the first model, and the second spectral data includes the transformed output data of the K second identity building blocks in the second model. In terms of determining the first loss function according to at least the first spectral data and the second spectral data, the first determining unit 502 is further configured to:
obtain a first parameter of a predictor model, where the predictor model is used to ensure that the output data of the second model and of the first model have the same scale;
determine the first loss function according to the transformed output data of the K first identity building blocks in the first model, the transformed output data of the K second identity building blocks in the second model, and the first parameter.

In some embodiments of the present invention, the second determining unit 503 is further configured to:
obtain first network parameters of the first model, and obtain second network parameters of the second model;
sort the first network parameters to obtain a first cumulative distribution map, and sort the second network parameters to obtain a second cumulative distribution map;
determine the second loss function according to the divergence of the first cumulative distribution map and the second cumulative distribution map.

In some embodiments of the present invention, the third determining unit 504 is further configured to:
obtain a third loss function of the first model, where the third loss function is a video classification loss function;
determine the target loss function according to the first loss function and the third loss function.

In some embodiments of the present invention, the third determining unit 504 is further configured to:
obtain a third loss function of the first model, where the third loss function is a video classification loss function;
determine the target loss function according to the second loss function and the third loss function.

In some embodiments of the present invention, the third determining unit 504 is further configured to:
obtain a third loss function of the first model, where the third loss function is a video classification loss function;
determine the target loss function according to the first loss function, the second loss function and the third loss function.

In some embodiments of the present invention, the third determining unit 504 is further configured to:
obtain a third loss function of the first model, where the third loss function is a video classification loss function;
obtain weight functions corresponding to the first loss function and the second loss function;
determine the target loss function according to the weight functions, the first loss function, the second loss function and the third loss function.

In some embodiments of the present invention, the apparatus further includes:
a receiving unit, configured to receive a video to be classified;
a classification unit, configured to classify the video to be classified through the third model to obtain a classification result.

An embodiment of the present invention further provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform some or all of the steps of any one of the model determination methods described in the above method embodiments.

An embodiment of the present invention further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program causes a computer to perform some or all of the steps of any one of the model determination methods described in the above method embodiments.

It should be noted that, for the sake of brevity, the foregoing method embodiments are all expressed as a series of combinations of actions, but those skilled in the art should know that the embodiments of the present invention are not limited by the described order of actions, because according to the embodiments of the present invention, certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described in detail in a certain embodiment, reference may be made to the relevant descriptions of other embodiments.

In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative; the division of the units is only a logical function division, and there may be other division manners in actual implementation, for example, multiple units or elements may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be in electrical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software program module.

If the integrated unit is implemented in the form of a software program module and sold or used as an independent product, it may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned memory includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by instructing relevant hardware through a program. The program can be stored in a computer-readable memory, and the memory may include a flash memory disk, a read-only memory, a random access memory, a magnetic disk, an optical disc, or the like.

The embodiments of the present invention have been described in detail above, and specific examples are used herein to illustrate the principles and implementations of the present invention. The descriptions of the above embodiments are only intended to help understand the method and core idea of the present invention. At the same time, persons of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementations and the application scope. In summary, the content of this specification should not be construed as limiting the present invention.

Industrial Applicability
Embodiments of the present invention provide a model determination method and a related apparatus, terminal, computer-readable storage medium and computer program product. The method includes: classifying the input video according to a first model to obtain first feature data, and classifying the input video according to a second model to obtain second feature data; determining a first loss function according to the first feature data and the second feature data; determining a second loss function according to the network parameters of the first model and the network parameters of the second model; determining a target loss function according to at least one of the first loss function and the second loss function; and adjusting the first model according to the target loss function to obtain a third model. According to the model determination method provided by the embodiments of the present invention, a classification model can be determined, the video to be classified can be classified according to the determined classification model, and the accuracy of video classification can be improved.

10: Target area
20: Camera
30: Server
31: Video classification model
40: Classification result
400: Terminal
410: Processor
420: Input device
430: Output device
440: Memory
441: One or more programs
501: Processing unit
502: First determining unit
503: Second determining unit
504: Third determining unit
505: Adjustment unit
101~105: Steps
201~206: Steps
301~307: Steps

In order to more clearly describe the technical solutions in the embodiments of the present invention or in the related art, the following briefly introduces the accompanying drawings required for the description of the embodiments or the related art. Obviously, the accompanying drawings in the following description are only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings based on these drawings without any creative effort.
FIG. 1A is a schematic diagram of an application scenario of a model determination method provided by an embodiment of the present invention;
FIG. 1B is a schematic flowchart of a model determination method provided by an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a model determination method provided by an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a model determination method provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a terminal provided by an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a model determination apparatus provided by an embodiment of the present invention.

101~105: Steps

Claims (11)

1. A model determination method, wherein the model determination method is used to determine a classification model, and the determined classification model is applied to classify a video to be classified, the method comprising: classifying an input video according to a first model to obtain first feature data, and classifying the input video according to a second model to obtain second feature data; determining a first loss function according to the first feature data and the second feature data; determining a second loss function according to network parameters of the first model and network parameters of the second model; determining a target loss function according to at least one of the first loss function and the second loss function; and adjusting the first model according to the target loss function to obtain a third model.

2. The method according to claim 1, wherein determining the first loss function according to the first feature data and the second feature data comprises: transforming the first feature data to obtain first spectral data, and transforming the second feature data to obtain second spectral data; and determining the first loss function according to at least the first spectral data and the second spectral data.

3. The method according to claim 2, wherein the first spectral data comprises transformed output data of K first identity building blocks in the first model, the second spectral data comprises transformed output data of K second identity building blocks in the second model, and K is an integer greater than 0; and determining the first loss function according to at least the first spectral data and the second spectral data comprises: obtaining a first parameter of a predictor model, wherein the predictor model is used to ensure that the output data of the second model and of the first model have the same scale; and determining the first loss function according to the transformed output data of the K first identity building blocks in the first model, the transformed output data of the K second identity building blocks in the second model, and the first parameter.
4. The method according to any one of claims 1 to 3, wherein determining the second loss function according to the network parameters of the first model and the network parameters of the second model comprises: obtaining first network parameters of the first model, and obtaining second network parameters of the second model; sorting the first network parameters to obtain a first cumulative distribution map, and sorting the second network parameters to obtain a second cumulative distribution map; and determining the second loss function according to a divergence of the first cumulative distribution map and the second cumulative distribution map.

5. The method according to any one of claims 1 to 3, wherein determining the target loss function according to at least one of the first loss function and the second loss function comprises: obtaining a third loss function of the first model, wherein the third loss function is a video classification loss function; and determining the target loss function according to the first loss function and the third loss function.

6. The method according to any one of claims 1 to 3, wherein determining the target loss function according to at least one of the first loss function and the second loss function comprises: obtaining a third loss function of the first model, wherein the third loss function is a video classification loss function; and determining the target loss function according to the second loss function and the third loss function.

7. The method according to any one of claims 1 to 3, wherein determining the target loss function according to at least one of the first loss function and the second loss function comprises: obtaining a third loss function of the first model, wherein the third loss function is a video classification loss function; and determining the target loss function according to the first loss function, the second loss function and the third loss function.

8. The method according to any one of claims 1 to 3, wherein determining the target loss function according to at least one of the first loss function and the second loss function comprises: obtaining a third loss function of the first model, wherein the third loss function is a video classification loss function; obtaining weight functions corresponding to the first loss function and the second loss function; and determining the target loss function according to the weight functions, the first loss function, the second loss function and the third loss function.

9. The method according to any one of claims 1 to 3, further comprising: receiving a video to be classified; and classifying the video to be classified through the third model to obtain a classification result.
10. A terminal, comprising a processor, an input device, an output device and a memory, wherein the processor, the input device, the output device and the memory are connected to each other, the memory is configured to store a computer program, the computer program comprises program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 9.

11. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 9.
TW109139394A 2020-07-31 2020-11-11 Model determination method and related terminal and computer readable storage medium TWI755149B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010757834.2A CN111950411B (en) 2020-07-31 2020-07-31 Model determination method and related device
CN202010757834.2 2020-07-31

Publications (2)

Publication Number Publication Date
TWI755149B TWI755149B (en) 2022-02-11
TW202207155A true TW202207155A (en) 2022-02-16

Family

ID=73338965

Family Applications (1)

Application Number Title Priority Date Filing Date
TW109139394A TWI755149B (en) 2020-07-31 2020-11-11 Model determination method and related terminal and computer readable storage medium

Country Status (3)

Country Link
CN (1) CN111950411B (en)
TW (1) TWI755149B (en)
WO (1) WO2022021624A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949433B (en) * 2021-02-18 2022-07-22 北京百度网讯科技有限公司 Method, device and equipment for generating video classification model and storage medium
CN114064973B (en) * 2022-01-11 2022-05-03 人民网科技(北京)有限公司 Video news classification model establishing method, classification method, device and equipment

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9147129B2 (en) * 2011-11-18 2015-09-29 Honeywell International Inc. Score fusion and training data recycling for video classification
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
US10706336B2 (en) * 2017-03-17 2020-07-07 Nec Corporation Recognition in unlabeled videos with domain adversarial learning and knowledge distillation
CN109299657B (en) * 2018-08-14 2020-07-03 清华大学 Group behavior identification method and device based on semantic attention retention mechanism
US11449756B2 (en) * 2018-09-24 2022-09-20 Samsung Electronics Co., Ltd. Method to balance sparsity for efficient inference of deep neural networks
CN111325318B (en) * 2019-02-01 2023-11-24 北京地平线机器人技术研发有限公司 Neural network training method, neural network training device and electronic equipment
CN109919110B (en) * 2019-03-13 2021-06-04 北京航空航天大学 Video attention area detection method, device and equipment
CN109961107B (en) * 2019-04-18 2022-07-19 北京迈格威科技有限公司 Training method and device for target detection model, electronic equipment and storage medium
CN110210560B (en) * 2019-05-31 2021-11-30 北京市商汤科技开发有限公司 Incremental training method, classification method and device, equipment and medium of classification network
CN110472681A (en) * 2019-08-09 2019-11-19 北京市商汤科技开发有限公司 The neural metwork training scheme and image procossing scheme of knowledge based distillation
CN110837846B (en) * 2019-10-12 2023-10-31 深圳力维智联技术有限公司 Image recognition model construction method, image recognition method and device
CN110766142A (en) * 2019-10-30 2020-02-07 北京百度网讯科技有限公司 Model generation method and device
CN110807434B (en) * 2019-11-06 2023-08-15 威海若维信息科技有限公司 Pedestrian re-recognition system and method based on human body analysis coarse-fine granularity combination
CN111126360B (en) * 2019-11-15 2023-03-24 西安电子科技大学 Cross-domain pedestrian re-identification method based on unsupervised combined multi-loss model

Also Published As

Publication number Publication date
CN111950411A (en) 2020-11-17
CN111950411B (en) 2021-12-28
TWI755149B (en) 2022-02-11
WO2022021624A1 (en) 2022-02-03
