TWI745808B - Situation awareness system and method - Google Patents

Situation awareness system and method

Info

Publication number
TWI745808B
Authority
TW
Taiwan
Prior art keywords
augmented reality device
server
digital image
category
Prior art date
Application number
TW108147454A
Other languages
Chinese (zh)
Other versions
TW202125326A (en)
Inventor
蘇愷宏
林永祥
Original Assignee
亞達科技股份有限公司
Priority date
Filing date
Publication date
Application filed by 亞達科技股份有限公司
Priority to TW108147454A
Publication of TW202125326A
Application granted
Publication of TWI745808B

Landscapes

  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A situation awareness system includes a server, a first augmented reality (AR) device, and a second AR device. A first digital image is captured by the first AR device. The first AR device receives, through a user interface, a class label corresponding to an object to be classified, and uploads the class label and the first digital image to the server, which trains a machine learning model accordingly. A second digital image is captured by the second AR device, and a situation object is detected in it according to the machine learning model. The second AR device displays a graphic object on a transparent display according to the location of the situation object.

Description

Situation awareness system and method

The present invention relates to a system and method for automatically generating artificial intelligence applications through an augmented reality device, where the resulting applications can be shared with other users.

Many people make poor decisions because they lack awareness of the situation they are in. Situational awareness depends on personal experience, habits, and talent; some gifted people can instinctively recognize specific situations, for example by identifying particular objects. Today's technology is sufficient to convert these experiences and talents into digital artificial intelligence models through machine learning. However, the large amount of labeling work and the complexity of model training make AI models slow to produce and hard to disseminate. In recent years, thanks to the dramatic increase in the computing power of graphics processing units (GPUs), artificial intelligence has become one of the hottest emerging industries. Yet today's AI applications remain isolated efforts: there is no systematic, structured ecosystem, and they are difficult for ordinary people to access.

An embodiment of the present invention provides a situation awareness system that includes a server, a first augmented reality device, and a second augmented reality device. The first augmented reality device is communicatively connected to the server and includes a first image sensor and a first transparent display; it captures a first digital image through the first image sensor. The second augmented reality device is communicatively connected to the server and includes a second image sensor and a second transparent display. The first augmented reality device receives a class label for an object to be classified through a user interface, and uploads the class label and the first digital image to the server. The server trains a machine learning model according to the class label and the first digital image. The second augmented reality device captures a second digital image through the second image sensor; the second augmented reality device or the server detects a situation object in the second digital image according to the machine learning model, and the second augmented reality device displays a corresponding graphic object on the second transparent display according to the position of the situation object.

In some embodiments, the aforementioned user interface includes a voice interface or a gesture interface.

In some embodiments, the first augmented reality device or the server detects at least one first recommended object in the first digital image. The first augmented reality device displays a bounding box for the first recommended object on the first transparent display, and receives instructions from the user through the user interface to adjust the position and size of the bounding box.

In some embodiments, the first augmented reality device or the server detects the first recommended object according to a first convolutional neural network, which performs the inference procedure only once to output the positions of multiple bounding boxes together with multiple class confidence values, where the sizes of the bounding boxes are fixed.

In some embodiments, the server obtains multiple training images and labeled data for each training image, where the labeled data includes the sizes of training bounding boxes. The server executes an unsupervised clustering algorithm to divide the sizes of the training bounding boxes into multiple groups and obtains a preset bounding box size from each group. The server uses these preset bounding box sizes in the one-time inference procedure described above.

In some embodiments, the aforementioned machine learning model is a second convolutional neural network, and detecting the situation object in the second digital image includes: detecting, according to the first convolutional neural network, a second recommended object in the second digital image and the class confidence value of the second recommended object; and outputting, according to the second convolutional neural network, a situation confidence value of the second recommended object, then multiplying the class confidence value of the second recommended object by the situation confidence value to obtain a result confidence value. If the result confidence value is greater than a threshold, the second recommended object is determined to be the situation object.

In some embodiments, the first recommended object belongs to a first class, the class label belongs to a second class, and the second class is encompassed by the first class.

From another perspective, an embodiment of the present invention provides a situation awareness method suitable for a situation awareness system that includes a server, a first augmented reality device, and a second augmented reality device. The first augmented reality device includes a first image sensor and a first transparent display; the second augmented reality device includes a second image sensor and a second transparent display. The situation awareness method includes: capturing a first digital image through the first image sensor; receiving, through a user interface of the first augmented reality device, a class label for an object to be classified; training a machine learning model according to the class label and the first digital image; and capturing a second digital image through the second image sensor, detecting a situation object in the second digital image according to the machine learning model, and displaying a corresponding graphic object on the second transparent display according to the position of the situation object.

In some embodiments, the method further includes detecting at least one first recommended object in the first digital image according to a first convolutional neural network. This first convolutional neural network performs the inference procedure only once to output the positions of multiple bounding boxes and multiple class confidence values, where the sizes of the bounding boxes are fixed.

In some embodiments, the situation awareness method further includes: obtaining multiple training images and labeled data for each training image, where the labeled data includes the sizes of training bounding boxes; executing an unsupervised clustering algorithm to divide the sizes of the training bounding boxes into multiple groups, and obtaining a preset bounding box size from each group; and using the preset bounding box sizes in the one-time inference procedure.

With the system and method described above, a user can build a machine learning model through an augmented reality device without writing any code and share it with other users.

To make the above features and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

100‧‧‧Situation awareness system

102‧‧‧Server

110, 120‧‧‧Augmented reality device

111, 121‧‧‧Image sensor

112, 122‧‧‧Transparent display

130‧‧‧Network

201‧‧‧Creator

210, 220, 230, 232, 240‧‧‧Steps

211‧‧‧Parameters

226‧‧‧Training samples

231‧‧‧Application

221~225‧‧‧Steps

300, 400‧‧‧Digital image

301~304‧‧‧Recommended objects

310‧‧‧Augmented reality scene

311~314‧‧‧Bounding boxes

401‧‧‧Situation object

410‧‧‧Augmented reality scene

411‧‧‧Bounding box

412‧‧‧Text

421~424‧‧‧Icons

502, 504, 506‧‧‧Steps

501‧‧‧Training images

503‧‧‧Preset bounding box size

505‧‧‧Enhanced image

507‧‧‧Labels

508‧‧‧Machine learning model

601~604‧‧‧Steps

[Fig. 1] is a schematic diagram of a situation awareness system according to an embodiment.

[Fig. 2A] is a schematic diagram of the operation of the situation awareness system according to an embodiment.

[Fig. 2B] is a flowchart of step 220 according to an embodiment.

[Fig. 3A] is a schematic diagram of a digital image captured by an augmented reality device according to an embodiment.

[Fig. 3B] is a schematic diagram of the augmented reality scene seen by a user according to an embodiment.

[Fig. 4A] is a schematic diagram of a digital image captured by an augmented reality device according to an embodiment.

[Fig. 4B] is a schematic diagram of the augmented reality scene seen by a user according to an embodiment.

[Fig. 5] is a flowchart of training a machine learning model according to an embodiment.

[Fig. 6] is a flowchart of a situation awareness method according to an embodiment.

The terms "first", "second", and so on used herein do not denote any particular order or sequence; they merely distinguish elements or operations described with the same technical terms.

The technology proposed in this disclosure may be called a Situation Awareness Magnifier (SAM). Built on artificial intelligence (AI) technology, SAM uses lightweight smart glasses as a medium, allowing creators with a high degree of situational awareness to draw on their experience and label situations that ordinary people would not notice; through automated machine learning in the cloud, these experiences are converted into AI applications (apps). Ordinary users connect their smart glasses to the SAM cloud system to obtain the situational awareness provided by the creators.

FIG. 1 is a schematic diagram of a situation awareness system according to an embodiment. The situation awareness system 100 includes a server 102, an augmented reality (AR) device 110, and an augmented reality device 120, where the AR devices 110 and 120 can be communicatively connected to the server 102 through a network 130. In this embodiment the AR devices 110 and 120 are implemented as smart glasses, but in other embodiments they may also be implemented as a face shield, a transparent tablet, and so on; the present invention does not limit the size or form of the AR devices 110 and 120. In this embodiment, the user wearing the AR device 110 is referred to as the creator. Through the server 102, the creator can turn his or her own situational-awareness ability into an application, thereby sharing that ability with the user of the AR device 120.

The augmented reality device 110 includes an image sensor 111 and a transparent display 112. The augmented reality device 120 includes an image sensor 121 and a transparent display 122. The transparent displays 112 and 122, also called see-through displays, are devices that let the user view the outside world and projected images at the same time, superimposing real and virtual images without blocking the field of view. The transparent displays 112 and 122 may include liquid crystal displays, organic light-emitting diodes, projectors, lenses, light guides, and so on; the present invention does not limit the components they contain. The image sensors 111 and 121 may include charge-coupled device (CCD) sensors, complementary metal-oxide-semiconductor (CMOS) sensors, or other suitable photosensitive elements. For simplicity, FIG. 1 does not show all components of the AR devices 110 and 120; for example, they may also include a processor, a microphone, a speaker, an inertial measurement unit, a communication module, physical buttons, and so on, and the present invention is not limited in this regard. The aforementioned communication module may use a cellular (mobile) network, near-field communication, infrared, Bluetooth, Wi-Fi, or the like to connect to the network 130.

FIG. 2A is a schematic diagram of the operation of the situation awareness system according to an embodiment. Referring to FIG. 2A, the creator 201 is the user of the augmented reality device 110. In step 210, the creator 201 creates a project on the system platform provided by the server 102; this platform may expose interfaces such as web pages, applications, or web services. The project is used to manage the machine learning models that the creator 201 builds from his or her own situational-awareness ability. The creator 201 can enter multiple parameters 211, which are sent to the server 102. The parameters 211 include the class to be recognized, the image resolution, the type of machine learning model, the model's hyperparameters, and so on. For example, the classes may include a person's emotions, whether a person is lying, animal and plant species, whether a machine is malfunctioning, whether someone is ill, and so on. The server 102 can provide multiple preset classes, or the creator 201 can create a new class. In some embodiments, the server 102 may also provide a class tree built by experts that records the subsumption relations among classes; for example, the class "face" would encompass the class "angry", so the "face" node would be the parent of the "angry" node, and so on. The subsumption relations in the class tree can help creators label objects in images: if a creator wants to build the class "angry", the server 102 can provide a face object detector, so that when labeling the "angry" class in an image, faces are first detected by this detector for the creator to judge whether each one belongs to the "angry" class.
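As an illustration, the parameters 211 might be serialized as a small structured payload when the project is created. The field names and values below are hypothetical; the disclosure does not specify a wire format. A minimal sketch in Python:

```python
import json

# Hypothetical project parameters 211; the actual field names and
# transport format are not specified in the disclosure.
project_params = {
    "class_to_recognize": "angry",    # target class, a child of "face" in the class tree
    "parent_class": "face",           # recommended-object class that encompasses the target
    "image_resolution": [640, 480],   # width, height of uploaded digital images
    "model_type": "cnn",              # e.g. decision tree, random forest, CNN, SVM ...
    "model_hyperparameters": {"learning_rate": 1e-3, "epochs": 50},
}

payload = json.dumps(project_params)
print(payload)  # sent to server 102 when the project is created (step 210)
```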

After the project is created, in step 220 the creator captures digital images through the augmented reality device 110 and labels the objects to be recognized, thereby producing training samples. Specifically, FIG. 2B is a flowchart of step 220 according to an embodiment. Referring to FIG. 2B, in step 221 the AR device 110 first captures a digital image through its own image sensor 111 (the digital image 300 shown in FIG. 3A). In this example the creator wants to label the objects in the digital image 300 that belong to the class "angry".

In step 222, the augmented reality device 110 or the server 102 detects the recommended objects 301~304 (faces, in this example) in the digital image 300. In some embodiments, the server 102 may provide an object detector for the recommended objects 301~304 to the AR device 110, or the detector may be built into the AR device 110. In some embodiments, the AR device 110 may instead send the digital image 300 to the server 102, which detects the recommended objects 301~304 and returns the detection results to the AR device 110.

In step 223, graphic objects are displayed according to the detection results. Specifically, the AR device 110 generates corresponding graphic objects (such as bounding boxes) according to the positions of the recommended objects 301~304 and displays them on the transparent display 112. FIG. 3B shows the augmented reality scene 310 seen by the creator, in which the bounding boxes 311~314 are virtual images and the rest is the real scene; the two are blended together.

In step 224, the bounding box to be labeled is selected. The creator can input one or more commands through a user interface to select the object to be labeled from the recommended objects 301~304. This user interface may be, for example, a voice interface or a gesture interface; that is, the AR device 110 can receive the creator's voice through a microphone and recognize what the creator says, or recognize the creator's gestures through the image sensor 111, in order to determine the command the creator wants to issue. In some embodiments, the user interface may instead be implemented with a mouse, a keyboard, a writing tablet, physical buttons, or other hardware. In this example the creator selects the bounding box 313 (recommended object 303).

In step 225, a class label is provided for the recommended object. Through the user interface described above, the creator can input one or more commands to further classify the bounding box 313 (recommended object 303) into the class "angry". The class label provided by the creator may be recorded as text, a number, or a binary code. In some embodiments, the creator may also issue commands through the user interface to adjust the positions and sizes of the bounding boxes 311~314 before providing the class label. In some embodiments, if the object the creator wants to label is not within any of the bounding boxes 311~314, the creator can create a new bounding box through the user interface and provide the corresponding class label. The digital image 300, the position and size of the bounding box 313 (or of a newly created bounding box), and the class label together constitute one training sample; steps 222~225 can then be repeated to produce more training samples.
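One way to picture a training sample is as a small record bundling the image with its annotation. The layout below is an assumption for illustration only; the disclosure does not fix a schema. A sketch:

```python
from dataclasses import dataclass

@dataclass
class BoundingBox:
    x: float        # X coordinate of the box in the image
    y: float        # Y coordinate of the box in the image
    width: float
    height: float

@dataclass
class TrainingSample:
    image: bytes        # the captured digital image, e.g. JPEG-encoded
    box: BoundingBox    # position and size of the labeled bounding box
    class_label: str    # e.g. "angry"; a number or binary code also works

# One hypothetical sample built from bounding box 313 of digital image 300.
sample = TrainingSample(
    image=b"",  # raw image bytes would go here
    box=BoundingBox(x=212.0, y=96.0, width=64.0, height=88.0),
    class_label="angry",
)
```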

In some embodiments, no recommended objects are generated; the AR device 110 directly receives the class label of an object to be classified through the user interface. For example, the AR device 110 can define a specific position in the digital image that corresponds to the user's gaze (such as the image center) and take the object at that position as the object to be classified. The AR device 110 can determine whether the user's gaze stays on this object for longer than a preset time (for example, 5 seconds), and if so, wait for the user to speak the class label of the object. In this disclosure, a recommended object may also be called an object to be classified; therefore, whether or not recommended objects are generated, producing a class label for an object to be classified through the user interface of the AR device 110 falls within the scope of this disclosure.
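The gaze-dwell trigger just described can be sketched as a simple timer over successive frames. The 5-second threshold follows the example in the text; the function names and callables are illustrative assumptions:

```python
import time

DWELL_SECONDS = 5.0  # preset time from the example above

def wait_for_label(get_gazed_object, listen_for_label):
    """Return (object, label) once the user's gaze has dwelt on one object
    long enough and a spoken class label has been recognized.

    get_gazed_object: callable returning the object id at the gaze position
        (e.g. the image center) for the current frame, or None.
    listen_for_label: callable blocking until speech recognition yields a
        class label string.
    Both callables are assumed to be provided by the AR device runtime.
    """
    current, dwell_start = None, None
    while True:
        obj = get_gazed_object()
        if obj != current:                  # gaze moved to a different object
            current, dwell_start = obj, time.monotonic()
            continue
        if obj is not None and time.monotonic() - dwell_start >= DWELL_SECONDS:
            return obj, listen_for_label()  # e.g. the user says "angry"
```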

Referring back to FIG. 2A, the augmented reality device 110 then sends the one or more collected training samples 226 to the server 102. In step 220, because the system first provides recommended objects, the creator can create class labels quickly. Note that the class of the recommended objects must encompass the class the creator wants to build; for example, the class "face" encompasses the class "angry". Put another way, if the recommended objects belong to a first class and the class label provided by the creator belongs to a second class, then the second class is encompassed by the first class. These classes are only examples, and the present invention does not restrict what the first and second classes are. Moreover, the creator builds class labels through the AR device 110 and can therefore move freely indoors or outdoors, producing new training samples whenever a suitable object appears. Compared with tediously creating labels in front of a computer, the system of this embodiment is far friendlier to creators.

After the AR device 110 sends the training samples 226 to the server 102, the server 102 can determine whether the number of training samples exceeds a threshold; if so, it proceeds to step 230, and if not, it notifies the creator to keep collecting training samples. In step 230, the server 102 trains a machine learning model from the collected class labels and digital images. This model may be a decision tree, a random forest, a multilayer neural network, a convolutional neural network, a support vector machine, and so on; the present invention is not limited in this regard. In some embodiments, the server 102 may also apply some preprocessing to the collected digital images, such as brightness adjustment or denoising; the present invention does not limit the content of this preprocessing. In some embodiments, the server 102 may also build an application 231 through which users can use the trained machine learning model. After building the application 231, the server 102 publishes it on a platform accessible to users.
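The server-side gating logic described above amounts to a counter check before training begins. A minimal sketch, assuming the preprocessing and training routines are supplied by the server runtime:

```python
def maybe_train(samples, threshold, preprocess, train_model):
    """Gate for step 230: train only after enough samples 226 have arrived.

    samples:     list of (image, box, class_label) training samples
    preprocess:  callable applying e.g. brightness adjustment or denoising
    train_model: callable running the chosen supervised learning algorithm
    The threshold value and both callables are illustrative assumptions.
    """
    if len(samples) < threshold:
        return None  # the server notifies the creator to keep collecting
    images = [preprocess(image) for image, _box, _label in samples]
    labels = [label for _image, _box, label in samples]
    return train_model(images, labels)  # the machine learning model of step 230
```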

In some embodiments, before publishing the application 231, the server 102 may first send it to the creator for testing (step 232). After testing, the creator may return to step 220 to collect more training samples, or accept the training result, after which the server 102 publishes the application 231.

In step 240, a user downloads and installs the application 231 through the augmented reality device 120 and can then recognize new situations using the trained machine learning model. Specifically, the AR device 120 can capture a digital image through its own image sensor 121, such as the digital image 400 of FIG. 4A, and detect the situation object 401 (an angry face, in this example) in the digital image 400 according to the trained model. In some embodiments, the AR device 120 may send the digital image 400 to the server 102, which detects the situation object 401 and returns the result to the AR device 120. Alternatively, the AR device 120 may download the trained model from the server 102 through the application 231 and detect the situation object 401 in the digital image 400 with its own processor.

The augmented reality device 120 also displays corresponding graphic objects on the transparent display 122 according to the position of the situation object 401, as in the augmented reality scene 410 shown in FIG. 4B; here the graphic objects include a bounding box 411 and the text 412, "angry". FIG. 4B is only an example, however; the graphic objects may include arbitrary patterns, symbols, numbers, and so on, and the present invention is not limited in this regard. In some embodiments, multiple icons 421~424 may also be shown in the augmented reality scene 410; these icons switch to other situational-awareness abilities, that is, to detecting other objects such as cars, flowers, trees, children, and so on. The present invention is not limited in this regard.

In some embodiments, both the object detector used in step 220 and the machine learning model trained in step 230 are convolutional neural networks. Conventional object-detection methods slide a window across the entire digital image and decide, for each window, whether it contains the object of interest; if so, the window is set as a bounding box, the window size is adjusted, and the image is scanned again. In other words, such algorithms require inference many times. In this embodiment, by contrast, multiple bounding boxes are preset, each with a fixed size, and the convolutional neural network performs the inference procedure only once to output the positions of the multiple bounding boxes and multiple class confidence values. More specifically, each bounding box corresponds to at least 3+N parameters: an X coordinate, a Y coordinate, a probability P(Object) that an object is present, and N class confidence values. If there are M bounding boxes in total, the convolutional neural network outputs at least M×(3+N) values, where M and N are positive integers. The N class confidence values correspond, for example, to N classes such as dog, cat, face, and car; multiplying the probability P(Object) by the corresponding class confidence value gives the probability that the bounding box contains an object of that class, as expressed in equation (1) below.

P(Ci) = P(Ci|Object) × P(Object) ... (1)

Here P(Ci|Object) is the class confidence value described above, Ci is the i-th class, i = 1...N, and P(Ci) is the probability that the bounding box contains an object of class Ci. Note that during training, if the number of objects in an image is smaller than the positive integer M above, some bounding boxes contain no object, so the probability P(Object) for those boxes should be 0. In some embodiments, the M×(3+N) values may be output after a fully connected layer, or in some embodiments after a convolutional layer; those of ordinary skill in the art will understand convolutional network architectures, so the details are not repeated here.
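To make the output layout concrete, the sketch below decodes a raw M×(3+N) prediction and applies equation (1). The tensor layout and all numbers are illustrative assumptions; actual single-shot detectors arrange their outputs differently:

```python
import numpy as np

M, N = 5, 4  # 5 preset boxes and 4 classes (e.g. dog, cat, face, car); illustrative

def decode(raw, anchor_sizes):
    """Decode one M x (3+N) network output into per-box class probabilities.

    raw:          array of shape (M, 3+N): [x, y, P(Object), N class confidences]
    anchor_sizes: M preset (height, width) pairs from the clustering step
    """
    detections = []
    for row, (h, w) in zip(raw, anchor_sizes):
        x, y, p_obj = row[0], row[1], row[2]
        class_conf = row[3:]              # P(Ci|Object) for i = 1..N
        p_class = p_obj * class_conf      # equation (1): P(Ci)
        best = int(np.argmax(p_class))
        detections.append({"x": x, "y": y, "height": h, "width": w,
                           "class": best, "prob": float(p_class[best])})
    return detections

raw = np.random.rand(M, 3 + N)  # stand-in for a real network output
anchors = [(64, 44), (88, 75), (120, 90), (160, 110), (210, 150)]
print(decode(raw, anchors))
```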

Because the sizes of these bounding boxes are fixed, a mechanism is needed to determine those sizes (width and height). FIG. 5 is a flowchart of training a machine learning model according to an embodiment. Referring to FIG. 5, each step is executed by the server 102. First, multiple training images 501 and their labeled data are obtained from a database. The labeled data, also called the ground truth, includes the sizes and positions of the already-labeled bounding boxes (hereinafter, training bounding boxes). In step 502, an unsupervised clustering algorithm is executed to divide the sizes of the training bounding boxes into multiple groups, and a preset bounding box size 503 is obtained from each group, for example by taking the center (the average) of each group as the preset bounding box size 503. The unsupervised clustering algorithm may be, for example, K-means, where K can be 5, 6, or any suitable positive integer. The preset bounding box sizes 503 can be expressed as [height, width] vectors, for example K vectors such as [64, 44] and [88, 75]; the present invention does not limit their values. In step 504, the preprocessing described above may be applied to the training images 501 to obtain enhanced images 505. Next, in step 506, a supervised machine learning algorithm is executed according to the labels 507, the preset bounding box sizes 503, and the enhanced images 505 to obtain the machine learning model 508.
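Step 502 can be realized with an off-the-shelf K-means routine over the [height, width] pairs of the training bounding boxes. A minimal sketch, assuming scikit-learn is available and using K = 5 as in the example values above; the box sizes are illustrative stand-ins:

```python
import numpy as np
from sklearn.cluster import KMeans

# [height, width] of training bounding boxes from the ground-truth labels.
box_sizes = np.array([
    [60, 40], [66, 47], [90, 72], [85, 78], [130, 95],
    [125, 100], [200, 150], [62, 45], [88, 75], [210, 160],
])

kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(box_sizes)

# Each cluster center (the group average) becomes a preset bounding box size 503.
preset_sizes = kmeans.cluster_centers_
print(np.round(preset_sizes))  # e.g. rows like [64, 44], [88, 75], ...
```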

Besides being used in the training phase, the preset bounding box sizes 503 are also used in the one-time inference procedure described above. The convolutional neural network of this embodiment predicts bounding box positions directly, so there is no need to scan the whole image with a sliding window; accordingly, the inference procedure needs to run only once, which reduces inference time.

In the example above, step 220 uses a convolutional neural network that detects faces, while step 230 trains a convolutional neural network that detects angry faces. In some embodiments, in step 230 the server 102 may forgo the face-detection network and train a new "angry face" machine learning model from scratch; in that case the input is the whole image including the background, annotated with angry faces. Alternatively, in some embodiments an "angry" machine learning model may be trained separately and combined with the "face" model. Specifically, the convolutional neural network used in step 220 is called the first convolutional neural network, and the one trained in step 230 is called the second convolutional neural network; the input of the second convolutional neural network is a face image, and its output is a value indicating whether the face is "angry". The operation of detecting the situation object in step 240 can first detect, according to the first convolutional neural network, the second recommended object (identical to the situation object 401) in the digital image 400 and obtain its class confidence value P(face), which from the description above can be expressed as equation (2) below.

P(face) = P(face|Object) × P(Object) ... (2)

Next, the second convolutional neural network outputs the situation confidence value P(angry|face) of the second recommended object, and the class confidence value P(face) of the second recommended object is multiplied by the situation confidence value P(angry|face) to obtain the result confidence value P(angry), expressed as equation (3) below.

P(angry) = P(angry|face) × P(face) ... (3)

If the result confidence value P(angry) is greater than a threshold, the second recommended object is determined to be the situation object. In this embodiment, because the second convolutional neural network only has to judge anger from a face, its complexity is relatively low, which reduces training time. Notably, one reason the training time can be reduced is that the class of the recommended objects used in step 220 encompasses the class the user wants to label; this shortens not only the creator's labeling time but also the training time.
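The two-stage decision of equations (2) and (3) can be sketched as a short cascade. The threshold value and the two model callables are assumptions; the disclosure does not fix their values or interfaces:

```python
THRESHOLD = 0.5  # hypothetical decision threshold for P(angry)

def detect_situation_objects(image, face_detector, angry_classifier):
    """Cascade of equations (2) and (3).

    face_detector:    callable returning a list of (crop, p_face) pairs,
                      where p_face = P(face|Object) * P(Object)   (eq. 2)
    angry_classifier: callable returning P(angry|face) for a face crop
    Both callables stand in for the first and second convolutional networks.
    """
    results = []
    for crop, p_face in face_detector(image):
        p_angry = angry_classifier(crop) * p_face   # eq. (3)
        if p_angry > THRESHOLD:
            results.append((crop, p_angry))         # situation object found
    return results
```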

FIG. 6 is a flowchart of a situation awareness method according to an embodiment. In step 601, a first digital image is captured through the first image sensor. In step 602, a class label for an object to be classified in the first digital image is received through a user interface of the first augmented reality device. In step 603, a machine learning model is trained according to the class label and the first digital image. In step 604, a second digital image is captured through the second image sensor, a situation object in the second digital image is detected according to the machine learning model, and a corresponding graphic object is displayed on the second transparent display according to the position of the situation object. Each of these steps has been described in detail above and is not repeated here. Note that each step in FIG. 6 can be implemented as program code, and the method of FIG. 6 can be used together with the embodiments above or on its own; in other words, other steps may be inserted between the steps of FIG. 6.

With lightweight smart glasses and an automated AI model-training system, the present invention refines sample quality and shortens labeling work and model-training time. Combined with a complete cloud system, it lets ordinary people acquire these valuable cognitive experiences and apply them in daily life. Starting from the idea of creators and users sharing situational awareness, the present invention builds an ecosystem centered on AI apps; automated AI training and recognition services create an entire ecosystem in which ordinary people can participate in AI. Moreover, with the system and method above, users do not need to write programs: they simply label situations through smart glasses and the user interface, and the cloud platform performs machine learning automatically, greatly lowering the barrier to entry. Besides the general consumer market, this tool can be applied wherever large numbers of objects must be labeled for future machine learning. Personnel only need to label objects through the glasses and upload them to the SAM platform; the subsequent training work is completed automatically by the platform. The resulting trained models can be further linked and recombined as needed to form multi-functional AI apps.

Although the present invention has been disclosed above by way of embodiments, they are not intended to limit the present invention. Anyone with ordinary skill in the relevant art may make some changes and refinements without departing from the spirit and scope of the present invention; the scope of protection of the present invention is therefore defined by the appended claims.

102‧‧‧Server

120‧‧‧Augmented reality device

201‧‧‧Creator

210, 220, 230, 232, 240‧‧‧Steps

211‧‧‧Parameters

226‧‧‧Training samples

231‧‧‧Application

Claims (6)

1. A situation awareness system, comprising: a server; a first augmented reality device communicatively connected to the server, the first augmented reality device comprising a first image sensor and a first transparent display and being configured to capture a first digital image through the first image sensor; and a second augmented reality device communicatively connected to the server, the second augmented reality device comprising a second image sensor and a second transparent display; wherein the first augmented reality device receives, through a user interface, a class label for an object to be classified and uploads the class label and the first digital image to the server; wherein the server trains a machine learning model according to the class label and the first digital image; wherein the second augmented reality device captures a second digital image through the second image sensor, the second augmented reality device or the server detects a situation object in the second digital image according to the machine learning model, and the second augmented reality device displays a corresponding graphic object on the second transparent display according to the position of the situation object; wherein the first augmented reality device or the server detects at least one first recommended object in the first digital image as the object to be classified according to a first convolutional neural network, the first convolutional neural network being configured to perform an inference procedure only once to output positions of a plurality of bounding boxes and a plurality of class confidence values, wherein sizes of the bounding boxes are fixed; wherein the server obtains a plurality of training images and labeled data for each of the training images, the labeled data including a size of at least one training bounding box; wherein the server executes an unsupervised clustering algorithm to divide the sizes of the at least one training bounding box of the training images into a plurality of groups and obtains a preset bounding box size from each of the groups, the preset bounding box size including a width and a height; and wherein the server uses the preset bounding box sizes in the one-time inference procedure.

2. The situation awareness system of claim 1, wherein the user interface comprises a voice interface or a gesture interface.

3. The situation awareness system of claim 1, wherein the first augmented reality device is configured to display, on the first transparent display, one of the bounding boxes of the at least one first recommended object, and to receive instructions from a user through the user interface to adjust the position and size of that bounding box.

4. The situation awareness system of claim 1, wherein the machine learning model is a second convolutional neural network, and detecting the situation object in the second digital image comprises: detecting, according to the first convolutional neural network, a second recommended object in the second digital image and the class confidence value of the second recommended object; and outputting, according to the second convolutional neural network, a situation confidence value of the second recommended object, and multiplying the class confidence value of the second recommended object by the situation confidence value to obtain a result confidence value, the second recommended object being determined to be the situation object if the result confidence value is greater than a threshold.

5. The situation awareness system of claim 4, wherein the first recommended object belongs to a first class, the class label belongs to a second class, and the second class is encompassed by the first class.

6. A situation awareness method, adapted to a situation awareness system comprising a server, a first augmented reality device and a second augmented reality device, the first augmented reality device comprising a first image sensor and a first transparent display, and the second augmented reality device comprising a second image sensor and a second transparent display, the method comprising: obtaining a plurality of training images and labeled data for each of the training images, wherein the labeled data includes a size of at least one training bounding box; executing an unsupervised clustering algorithm to divide the sizes of the at least one training bounding box of the training images into a plurality of groups, and obtaining a preset bounding box size from each of the groups, the preset bounding box sizes being used in a one-time inference procedure; capturing a first digital image through the first image sensor; detecting at least one first recommended object in the first digital image as an object to be classified according to a first convolutional neural network, the first convolutional neural network performing the inference procedure only once to output positions of a plurality of bounding boxes and a plurality of class confidence values, wherein sizes of the bounding boxes are fixed; receiving, through a user interface of the first augmented reality device, a class label for the object to be classified; training a machine learning model according to the class label and the first digital image; and capturing a second digital image through the second image sensor, detecting a situation object in the second digital image according to the machine learning model, and displaying a corresponding graphic object on the second transparent display according to the position of the situation object.
TW108147454A 2019-12-24 2019-12-24 Situation awareness system and method TWI745808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW108147454A TWI745808B (en) 2019-12-24 2019-12-24 Situation awareness system and method


Publications (2)

Publication Number Publication Date
TW202125326A TW202125326A (en) 2021-07-01
TWI745808B true TWI745808B (en) 2021-11-11

Family

ID=77908463


Country Status (1)

Country Link
TW (1) TWI745808B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI825654B (en) * 2022-04-07 2023-12-11 華碩電腦股份有限公司 Augmented reality implementing method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599939A (en) * 2016-12-30 2017-04-26 深圳市唯特视科技有限公司 Real-time target detection method based on region convolutional neural network
CN206193386U (en) * 2016-09-27 2017-05-24 北京正安维视科技股份有限公司 Alert glasses equipment of using
US20170168566A1 (en) * 2010-02-28 2017-06-15 Microsoft Technology Licensing, Llc Ar glasses with predictive control of external device based on event input
US20190332889A1 (en) * 2016-11-09 2019-10-31 Konica Minolta Laboratory U.S.A., Inc. System and method of using multi-frame image features for object detection
US20190385005A1 (en) * 2018-06-19 2019-12-19 Himax Technologies Limited Framebuffer-less system and method of convolutional neural network


Also Published As

Publication number Publication date
TW202125326A (en) 2021-07-01
