TWI802881B - System and method for visitor interest extent analysis - Google Patents

System and method for visitor interest extent analysis

Info

Publication number
TWI802881B
Authority
TW
Taiwan
Prior art keywords
image
data
head
visitor
image data
Prior art date
Application number
TW110116950A
Other languages
Chinese (zh)
Other versions
TW202143167A (en)
Inventor
黃基雲
邱德正
Original Assignee
普安科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 普安科技股份有限公司
Publication of TW202143167A
Application granted
Publication of TWI802881B

Landscapes

  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Image Processing (AREA)

Abstract

A system and a method for analyzing a visitor's extent of interest are disclosed. The system and method analyze image data to determine the targets that a visitor may be looking at and, given the positions of objects in the environment, infer what the visitor is interested in and to what degree. Using deep learning with an algorithm under the convolutional neural network (CNN) architecture, the system and method recognize the head direction of the visitor in the image data. Then, by means of a virtual light source placed behind the visitor's head, the gazing area where the visitor may be looking is obtained. When the position of a target falls within a sub-area of the gazing area representing a certain probability, it is determined that the visitor has a corresponding degree of interest in the target, which also means the visitor may be looking at that target.

Description

System and Method for Visitor Interest Degree Analysis

The present invention discloses a method for estimating which object the people/customers in image data are gazing at. In particular, it is a detection method for image data based on the convolutional neural network (CNN) architecture. The method does not need image data recorded by two camera lenses to track the face and eyeball direction; using only the image data recorded by a single camera lens, it can identify which object the people in the image data are gazing at.

Please refer to FIG. 1A, which illustrates a gaze-tracking determination method implemented in the prior art. FIG. 1A shows a person in an environment gazing at an object 50 in front of him. Cameras are set up at two different locations in this environment, called Camera 1 (Cam 1, at the dotted circle) 60 and Camera 2 (Cam 2, at the dotted circle) 70; these two cameras photograph people entering the environment and the object 50 within it. The face images captured by the two cameras Cam 1 60 and Cam 2 70 at a certain point in time are shown in FIG. 1B. Based on the information in the captured face images, the two cameras Cam 1 60 and Cam 2 70 compute the facial central line (not shown) and the direction of the eyeballs within the eyes (not shown) to obtain the direction of the person's gazing line 40, from which it is estimated whether the target of the person's gaze is the object 50.

To obtain the gazing line 40 of a person in the image data and the object 50 the person is gazing at, the prior art of FIG. 1A requires the following conditions:

(a) Under normal circumstances, two cameras must be used to implement this prior art; it cannot be achieved if only one camera is installed in the environment. Alternatively, a 3D camera may be used: because one 3D camera is equipped with two or more lenses, it can achieve the same effect as installing two ordinary cameras. However, whether two ordinary cameras or one 3D camera is used, a relatively high camera cost is required to implement this prior art.

(b) Because this prior art must recognize the eyes, which occupy only a small proportion of the face, and the direction of the eyeballs within them, the resolution of the image data must exceed a certain threshold for the person's facial central line and eyeball offset angle to be identified. Moreover, if the person's face is occluded in the image data, or the image resolution is insufficient, so that either the facial central line or the eyeball offset angle cannot be obtained, then the target the person is gazing at cannot be calculated.

(c) A larger image data storage space is required to store the higher-resolution image data. Because this prior art needs higher-resolution image data, the size of each piece of image data increases, and the same amount of data requires more storage space, thus raising the storage cost.

In addition, because its computation is complex, this prior art either takes considerable time to recognize image data or requires higher hardware cost to save time. In other words, lower-cost hardware takes longer and performs poorly; to shorten the time and increase performance, a higher hardware cost must be invested.

The above description shows the disadvantages of the prior art represented by FIG. 1A and FIG. 1B. The present invention therefore proposes a technical method that solves these disadvantages, reducing the hardware cost while increasing the overall system performance.

An object of the present invention is to provide a visitor interest degree analysis system for analyzing a visitor's degree of interest in at least one item. The system comprises: at least one image capture device, installed at a site, for capturing image data of the site, in which a first head image of the visitor is recorded; and an image analysis server, connected to the at least one image capture device, for computing and analyzing the image data from the at least one image capture device. The image analysis server further comprises: a data processing center, which executes an image analysis application program that can determine a first head direction corresponding to the first head image according to a first feature mapping obtained from the first head image, and calculate a first projection area along the first head direction; and a memory unit for temporarily storing the image data, the first head image, the first feature mapping, and other related data required or produced by the data processing center during operation. The first projection area is calculated by means of a virtual light source placed behind the visitor's head, which projects light in a direction consistent with the first head direction to form the simulated first projection area. When the first projection area covers the position of the at least one item, the visitor is determined to have a degree of interest in the at least one item.
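The projection-area test described above can be sketched in a few lines. This is a minimal illustration rather than the patented implementation: the 2D floor-plane geometry, the cone shape of the projection area, and the angular cutoffs are assumptions; only the 95%/65%/15% sub-area probabilities are taken from the figures of this disclosure.

```python
import math

def interest_level(head_pos, head_dir, item_pos,
                   inner_deg=15.0, outer_deg=40.0):
    """Return an assumed interest probability for an item, based on the
    angle between the head direction and the head-to-item vector.

    A virtual light source behind the head shining along `head_dir`
    (a unit vector) produces a cone-shaped projection area; items near
    the cone axis get a high probability, items near the edge a low
    one.  The tier values (0.95 / 0.65 / 0.15) follow the sub-area
    probabilities named in the figures; the angular cutoffs
    `inner_deg`/`outer_deg` are illustrative guesses.
    """
    vx, vy = item_pos[0] - head_pos[0], item_pos[1] - head_pos[1]
    norm = math.hypot(vx, vy)
    if norm == 0:
        return 0.95
    cos = (vx * head_dir[0] + vy * head_dir[1]) / norm
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos))))
    if angle <= inner_deg:
        return 0.95          # central sub-area of the projection
    if angle <= outer_deg:
        return 0.65          # middle sub-area
    if angle <= 90.0:
        return 0.15          # peripheral sub-area
    return 0.0               # behind the visitor: outside the area

# A visitor at the origin facing +x, with a shelf item slightly off-axis:
p = interest_level((0.0, 0.0), (1.0, 0.0), (2.0, 0.5))   # on-axis enough: 0.95
```

The same head position and direction thus classify every item in the environment in one pass, without any eyeball tracking.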

Another object of the present invention is to provide a visitor interest degree analysis system, connected to at least one image capture module and receiving image data from the at least one image capture module, for analyzing a visitor's degree of interest in at least one target. The system comprises: a data processing center, which executes an image analysis application program that can determine a first head direction corresponding to a first head image in the image data according to a first feature mapping obtained from the first head image, and calculate a first projection area along the first head direction; and a memory unit for temporarily storing the image data, the first head image, the first feature mapping, and other related data required or produced by the data processing center during operation. The first projection area is calculated by means of a virtual light source placed behind the visitor's head, which projects light in a direction consistent with the first head direction to form the simulated first projection area. When the first projection area covers the position of the at least one target, the visitor is determined to have a degree of interest in the at least one target.

A further object of the present invention is to provide a method for analyzing a visitor's degree of interest, executed by an image analysis server to determine a visitor's degree of interest in at least one target. The method comprises: providing an image analysis application program in the image analysis server; the image analysis server obtaining image data; the image analysis server detecting, in the image data, a first head image having a first head feature; the image analysis server analyzing the first head image and determining a first head direction corresponding to the first head image from a first feature mapping obtained by the analysis; the image analysis server calculating the position of the first head image in a three-dimensional space and, according to that position, the first head direction, and a virtual light source, calculating a simulated first projection area; and the image analysis server determining, according to the coverage of the first projection area and the position of the at least one target, whether the visitor corresponding to the first head image has a degree of interest in the at least one target.
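The control flow of the claimed method can be outlined as a small pipeline. All names here (`detect_heads`, `project_virtual_light`, `analyze_frame`, the `Target` record) are hypothetical stand-ins, not identifiers from the patent, and the CNN detection step is replaced by a stub that returns ready-made (position, direction) pairs so the sketch is self-contained.

```python
import math
from dataclasses import dataclass

@dataclass
class Target:
    name: str
    position: tuple  # (x, y) floor-plane coordinates

def detect_heads(frame):
    # Stand-in for the CNN head-detection and direction-classification
    # steps: a real system would infer these from head images; here
    # each "head" is already a (position, unit_direction) pair.
    return frame["heads"]

def project_virtual_light(position, direction, half_angle=0.5):
    """Approximate the simulated projection area as a 2D cone of
    `half_angle` radians opening along `direction` from `position`
    (the virtual light source sits behind the head and shines along
    the head direction).  Returns a coverage-test function."""
    def covers(point):
        vx, vy = point[0] - position[0], point[1] - position[1]
        n = math.hypot(vx, vy)
        if n == 0:
            return True
        cos = (vx * direction[0] + vy * direction[1]) / n
        return math.acos(max(-1.0, min(1.0, cos))) <= half_angle
    return covers

def analyze_frame(frame, targets):
    """Pipeline of the claimed method: detect head images, take each
    head direction, build the projection area, and test each target."""
    interested = []
    for position, direction in detect_heads(frame):
        in_area = project_virtual_light(position, direction)
        interested += [t.name for t in targets if in_area(t.position)]
    return interested

# One visitor at the origin facing +x, two shelves in the environment:
frame = {"heads": [((0.0, 0.0), (1.0, 0.0))]}
targets = [Target("shelf A", (3.0, 0.2)), Target("shelf B", (-2.0, 0.0))]
```

Only "shelf A" lies inside the visitor's projection area in this example; "shelf B" is behind the visitor and is never counted.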

40: gazing line

50: target

60, 70: cameras

100: image data capture and analysis system

150: cloud data storage unit

180, 190: network

300: image analysis server

300M: image processing device

310: central processing unit

320: dynamic random access memory

330: non-volatile memory

340: read-only memory

350: storage interface controller

360: network interface controller

370: graphics processing unit

380: physical storage device array

385: physical storage device

390: bus

400, 400A, 400B, 400C, 400N, 400M: image capture device

405: people flow analysis module

410: head detection module

415: visitor identification module

420: product detection module

425: gazing area analysis module

430: gazed-item analysis module

435: interest degree analysis module

440: movement path analysis module

445: visitor attribute analysis module

450: database

455: returning customer analysis module

460: image analysis application program

465: cloud channel service

465M: multimedia player program

470: super controller

475: operating system

480: hardware

800A, 800B: head

805A, 805B, 805C, 805D: tangent

810: screen

815A, 815B: projection

820: projection area (probability 95%)

820E: projection area (probability 98%)

830: projection area (probability 65%)

840: projection area (probability 15%)

850: audio-visual display device

855: cable

860: network

870: content server

880: media player

890: screen

510, 520, 530, 550, 560, 570, 610, 620, 630, 640, 700, 705, 710, 715, 720, 725, 772A, 722B, 722C, 730, 735, 740, 745, 750, 755, 760, 765, 900, 910, 920, 1000, 1010, 1020, 1030, 1040, 1050, 1060, 1070, 1080, 1090, 1110, 1120, 1130, 1140, 1200, 1205, 1210, 1220, 1230, 1240, 1250, 1260, 1270, 1280, 1290, 1300, 1310, 1320, 1330: steps

A1, A2, B1, B2, B3, C1, C2, C3, C4: flowchart nodes

FIG. 1A is a schematic diagram of detecting the object a woman is gazing at with two cameras according to the prior art.

FIG. 1B is a schematic diagram of the woman's face and eyes while gazing at the object, as recorded by Camera 1 and Camera 2 in FIG. 1A.

FIG. 2 is a block diagram of the system architecture of an image data capture and analysis system according to an embodiment of the present invention.

FIG. 3A is a block diagram of the basic hardware architecture of the image analysis server in FIG. 2.

FIG. 3B is a block diagram of the software/hardware architecture of the image analysis server in FIG. 2.

FIG. 4 is a block diagram of the functional modules in the image data analysis and application of FIG. 3B.

FIG. 5 is a flowchart of the overall application concept of recognizing people and objects in image data under the CNN architecture according to an embodiment of the present invention.

FIG. 6 is a flowchart of the training phase shown in FIG. 5.

FIG. 7A is a data set about people under the CNN architecture according to an embodiment of the present invention.

FIG. 7B is a data set about objects (1) under the CNN architecture according to an embodiment of the present invention.

FIG. 7C is a data set about objects (2) under the CNN architecture according to an embodiment of the present invention.

FIG. 8A is a schematic diagram of the image data analysis and application recognizing customers' faces/heads in an image picture according to an embodiment of the present invention.

FIG. 8B is a schematic diagram of the image data analysis and application recognizing customer identities and product names in an image picture according to an embodiment of the present invention.

FIG. 8C is a schematic diagram of the image data analysis and application recognizing product names in an image picture according to an embodiment of the present invention.

FIG. 9 is a flowchart of analyzing the products that shopping customers in a store are most interested in according to an embodiment of the present invention.

FIG. 10 is a schematic diagram of the information records disclosed in step 755 of FIG. 9.

FIG. 11A is a schematic diagram of the relationship between a person's head direction (1), a person's head direction (2), and the projected light according to an embodiment of the present invention.

FIG. 11B is a schematic diagram of an area range formed between the person's head direction and the projected light in FIG. 11A.

FIG. 11C is a schematic diagram of the relationship between a person's head direction (3), the rotation-angle calculation for head direction (3), and the projected light according to an embodiment of the present invention.

FIG. 11D is a schematic diagram of the relationship between a person's head direction (4), the rotation-angle calculation for head direction (4), and the projected light according to an embodiment of the present invention.

FIG. 12A is a system architecture diagram of a smart electronic signboard system according to an embodiment of the present invention.

FIG. 12B is a schematic diagram of a scenario of the smart electronic signboard system according to an embodiment of the present invention.

FIG. 13A is a block diagram of the basic hardware architecture of the image processing device in FIG. 12A.

FIG. 13B is a relationship diagram of the software/hardware architecture of the image processing device in FIG. 12A.

FIG. 14 is a flowchart of the overall operation of the smart electronic signboard system according to an embodiment of the present invention.

FIG. 15 is a flowchart of the person detection/statistical analysis phase of the smart electronic signboard system in FIG. 14.

FIG. 16 is a flowchart of the application phase of the smart electronic signboard system in FIG. 14.

Please refer to FIG. 2. According to an embodiment of the present invention, FIG. 2 shows an Image Data Capture and Analysis System (IDCAS) 100. The system 100 includes a plurality of image capture devices 400A to 400N, a cloud data storage unit 150, and an image analysis server 300. The plurality of image capture devices 400A to 400N and the cloud data storage unit 150 transmit signals and data to each other through a network 180 or a transmission line, and the cloud data storage unit 150 and the image analysis server 300 transmit signals and data to each other through a network 190 or a transmission line.

According to an embodiment of the present invention, the plurality of image capture devices 400A to 400N in FIG. 2 can be ordinary cameras or IP cameras of any form, for example: a tripod-head camera whose elevation, rotation, and zoom can be remotely controlled, a dome camera, an infrared camera, a fisheye camera, a 3D camera, and so on. The plurality of image capture devices 400A to 400N are installed at different sites and capture image data of the environment at each site in a continuous recording mode, a scheduled recording mode, or a motion detection mode, and transmit the image data through a network 180 or a transmission line to the cloud data storage unit 150 for storage. The network 180 can be a local area network (LAN), a wide area network (WAN), the Internet, or a wireless network. The format of the image data captured by the image capture devices 400A to 400N can be an audio-visual format defined by the Moving Picture Experts Group (MPEG), such as MPEG-4 or MPEG-2, or another audio-visual format such as Audio Video Interleave (AVI) or Real Media Variable Bitrate (RMVB). The specification of the image data can be, for example, a resolution of 1980x1080 pixels at 30 frames per second (FPS), but is not limited to this specification; image data of any pixel/frame-rate specification is acceptable. The continuous recording mode mentioned above means the camera records all day, day and night. The scheduled recording mode means the camera is set to record only during one or more periods: for example, recording during 09:00-20:00 (business hours) and not at other times, or recording during the two periods 11:00-14:00 and 17:00-21:00 and not at other times. The motion detection recording mode means the camera's recording function is triggered only when the camera detects object movement in its environment; the recording may last a preset period of time, or continue until no further object movement is detected.

According to another embodiment of the present invention, before the plurality of image capture devices 400A to 400N transmit the image data to the cloud data storage unit 150 for storage, an image processing device (not shown) may first process part of the image data and then transmit it to the cloud data storage unit 150, after which the image analysis server 300 downloads the image data from the cloud data storage unit 150 for further processing. Because image data processing occupies considerable resources of the graphics processing unit (GPU) 370 of the image analysis server 300, this approach reduces the amount of image data transmitted to the image analysis server 300 and lightens the workload of the image analysis server 300, increasing the overall performance of the system 100. According to an embodiment of the present invention, the above-mentioned image processing device (not shown) can be another data server, a personal computer (PC), a notebook PC, a tablet PC, or another image analysis server 300.
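The three recording modes above reduce to a simple decision per device. The sketch below is an illustration of that decision only; the function name and the default business-hours window (taken from the 09:00-20:00 example in the text) are assumptions, not part of the disclosure.

```python
from datetime import time

def should_record(mode, now, motion_detected=False,
                  windows=((time(9, 0), time(20, 0)),)):
    """Decide whether an image capture device should be recording.

    `mode` is one of "continuous", "scheduled", or "motion";
    `windows` is a tuple of (start, end) times for the scheduled mode,
    defaulting to the 09:00-20:00 business-hours example in the text.
    """
    if mode == "continuous":
        return True                      # records all day, day and night
    if mode == "scheduled":
        return any(start <= now <= end for start, end in windows)
    if mode == "motion":
        return motion_detected           # triggered only by movement
    raise ValueError(f"unknown capture mode: {mode}")
```

A device configured with the two-window example would pass `windows=((time(11, 0), time(14, 0)), (time(17, 0), time(21, 0)))` instead of the default.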

According to an embodiment of the present invention, the cloud data storage unit 150 in FIG. 2 refers to a cloud data storage environment constructed by a cloud provider, which can be, for example, Google, Amazon, Alibaba, DropBox, or another vendor providing cloud data environments. The cloud data storage unit 150 provides sufficient data storage space for the image capture devices 400A to 400N in FIG. 2 to store image data long-term, while the internal storage space of the image capture devices 400A to 400N is used only for temporary storage. According to another embodiment of the present invention, the cloud data storage unit 150 can be omitted, and the image capture devices 400A to 400N in FIG. 2 can transmit the image data directly to the image analysis server 300 for processing.

Please refer to FIG. 2. According to an embodiment of the present invention, the image analysis server 300 in FIG. 2 downloads, through a network 190, the image data that the image capture devices 400A to 400N transmitted to the cloud data storage unit 150, and then performs a series of image analysis and processing. The network 190 can be a local area network (LAN), a wide area network (WAN), the Internet, or a wireless network. Before performing the above image analysis and processing, the image analysis server 300 must first undergo deep learning to train itself to recognize specific features in images; this deep learning is basically carried out with a convolutional neural network (CNN) algorithm. The image analysis server 300 reads data sets of defined specific categories from a connected database 450 and, through the CNN algorithm, learns the features representing each data set; for example, it "learns" the features of a "human face" in image data from the data set that defines "human face". After the training/learning phase is completed, the image analysis server 300 is able to recognize whether other image data possesses that specific feature.
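The "feature mapping" at the heart of the CNN step is produced by convolution. The toy below implements one valid-mode 2D convolution in plain Python to show what a single learned kernel does to an image; it is a teaching sketch, not the server's actual network, and the edge kernel and tiny image are invented for illustration.

```python
def conv2d(image, kernel):
    """Valid-mode 2D convolution (cross-correlation, as in CNN
    libraries): slides `kernel` over `image` and produces one feature
    map.  Stacking many such maps, interleaved with nonlinearities,
    is the core of the CNN feature mapping the server learns during
    its training phase."""
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(ow)]
            for i in range(oh)]

# A horizontal difference kernel responds strongly where a dark region
# (background) meets a bright region (e.g. a head silhouette):
edge_kernel = [[1, -1]]
strip = [[0, 0, 9, 9]]                    # dark-to-bright transition
feature_map = conv2d(strip, edge_kernel)  # strong response at the edge
```

During training, gradient descent adjusts the kernel weights so that the resulting feature maps discriminate the defined categories (e.g. "human face" vs. background).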

Please refer to FIG. 3A, which is a block diagram of the basic architecture of the hardware 480 of the image analysis server 300 in FIG. 2 according to an embodiment of the present invention. The basic hardware 480 architecture of the image analysis server 300 includes: a central processing unit (CPU) 310, a read-only memory (ROM) 340, a dynamic random access memory (DRAM) 320, a storage interface controller 350, a physical storage device array (PSD array) 380, a non-volatile memory (NVRAM) 330, a network interface controller (NIC) 360, and a graphics processing unit (GPU) 370. These units exchange messages and data through one or more buses 390. The central processing unit (CPU) 310 and the graphics processing unit (GPU) 370 can be two independent units, or be integrated into one chip or one software/hardware module to form a data processing center. The physical storage device array (PSD array) 380 further comprises a plurality of physical storage devices (PSDs) 385, for example: hard disk drives (HDDs), solid state disks (SSDs), or other physical storage devices 385 achieving the same storage function.

In FIG. 3A, the CPU 310 (or the data processing center formed by integrating the CPU 310 and the GPU 370) is a core unit of the image analysis server 300, which executes the data processing procedures among the hardware, the operating system, and the application programs. The CPU 310 (or the data processing center) may be a PowerPC, an x86, or a CPU of any other architecture. The ROM 340 stores the basic input/output system (BIOS) and/or other programs used when the image analysis server 300 boots.

The DRAM 320 serves as temporary storage for CPU instructions and various image data. It can hold image data received from the cloud data storage unit 150 while that data waits to be processed by the CPU 310 and/or the GPU 370, or temporarily store data already processed by the CPU 310 and/or the GPU 370 until an appropriate time when the data is stored into the PSD array 380 or sent out through the network interface controller 360. The GPU 370 is another core unit of the image analysis server 300 and is essentially used to process graphics-related image data. Because the hardware of the GPU 370 is specifically designed for processing graphics and images, it is generally much faster than the CPU 310 at processing image data and is therefore suitable for handling large amounts of image data. The NVRAM 330, which may be implemented with flash memory, stores data about the execution state of I/O requests; if an abnormal power-off occurs before the operations of an I/O request accessing the PSD array 380 are completed, this data is used for verification. The storage interface controller 350 is a storage interface that stores data processed by the CPU 310 and/or the GPU 370 into the PSD array 380, or reads relevant data from the PSD array 380 into the DRAM 320 for temporary storage until the CPU 310 and/or the GPU 370 processes it.

The communication protocol adopted by the storage interface controller 350 may be Fibre Channel (FC), Serial Attached SCSI (SAS), Serial ATA (SATA), or any other applicable transport protocol. The PSD array 380 consists of a plurality of physical storage devices 385 and provides the image analysis server 300 with space for storing data. According to another embodiment of the present invention, when the image analysis server 300 does not provide data storage space, the PSD array 380 may be omitted; in that case, data is instead stored in the NVRAM 330 or in an external storage device, such as a JBOD. The NIC 360 connects to a network; it transmits data or messages processed by the CPU 310 and/or the GPU 370 to other devices on the network, or receives data from other devices on the network into the DRAM 320 for temporary storage. According to another embodiment of the present invention, when the processing capability of the CPU 310 is sufficient to handle external commands and image data at the same time, the data processing center comprises only the CPU 310, and the GPU 370 may be omitted.

Please refer to FIG. 3B. FIG. 3B is a schematic diagram illustrating the relationship between the software and hardware architectures of the image analysis server 300 in FIG. 2 according to an embodiment of the present invention. In FIG. 3B, the software of the image analysis server 300 is built on top of the hardware 480, whose organization is shown in FIG. 3A.

According to the embodiment shown in FIG. 3B, a hypervisor 470, also called a virtual machine monitor (VMM), sits between the hardware 480 and the operating system (OS) 475 of the image analysis server 300. The hypervisor 470 (or VMM) may be implemented in software, firmware, or hardware. The hypervisor 470 provides a virtual operating platform through which one or more operating systems share the resources of the hardware 480; it can therefore be regarded as the "pre-operating system" of the OS 475 in FIG. 3B, whose main purpose is to coordinate and allocate the resources of the hardware 480 for use by the multiple operating systems running on the image analysis server 300. Without interrupting the operation of any operating system, the hypervisor 470 can automatically adjust (increase or decrease) the hardware resources available to each operating system, such as allocated CPU time, memory space, network interfaces, and hard disk storage space, so that the workload among the operating systems stays close to balanced.

Although only one operating system 475 is depicted in FIG. 3B, multiple operating systems may in fact run on the hypervisor 470. According to another embodiment of the present invention, a second operating system may also run on the hypervisor 470. The program in the second operating system mainly organizes the plurality of physical storage devices 385 in FIG. 3A into a plurality of data blocks, and protects the data stored in these data blocks through a RAID (Redundant Array of Independent Disks) mechanism (for example, RAID 5 or RAID 6), so that data is not lost when one or two of the physical storage devices 385 fail. According to another embodiment of the present invention, if the image analysis server 300 needs only a single operating system, the hypervisor 470 may be omitted.
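The RAID protection described above can be illustrated with a minimal, hypothetical sketch of RAID 5-style XOR parity: each stripe of data blocks carries one parity block, so the contents of any single failed device can be rebuilt from the survivors. This toy is for illustration only and is not the patent's actual on-disk layout; RAID 6, also mentioned above, additionally tolerates a second failure via a second, differently computed parity block.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

def write_stripe(data_blocks):
    """Return the stripe as stored: the data blocks plus one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def recover_block(stripe, lost_index):
    """Rebuild the block at lost_index by XOR-ing all surviving blocks,
    which works because XOR-ing a value twice cancels it out."""
    survivors = [blk for i, blk in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)
```

For example, after `stripe = write_stripe([b"AAAA", b"BBBB", b"CCCC"])`, losing any one element of `stripe` leaves enough information to reconstruct it with `recover_block`.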

In FIG. 3B, the OS 475 may be a common operating system such as Windows, Linux, or Solaris. The OS 475 provides a multi-tasking, time-sharing operating environment in which multiple application programs and processes can run simultaneously. In FIG. 3B, two functional blocks are built on the OS 475, representing an image analysis application 460 and a cloud gateway service application 465, respectively. The image analysis application 460 is a software module, or a module combining software and hardware, executed by the CPU 310 in FIG. 3A, by the GPU 370, or by the integrated data processing center (corresponding to the hardware 480). The image analysis application 460 detects and analyzes the image data obtained by the image capture devices 400A to 400N in FIG. 2. After the image data is analyzed by the image analysis application 460, information about the visitors and products captured in the image data can be obtained; combined with related data such as date and time and processed with appropriate calculations, this ultimately yields a wealth of statistical information that is output for users to exploit.

The cloud gateway service application 465 is an intermediary service program handling data transmission between the related applications running on the OS 475 and the cloud data storage unit 150; it accepts instructions from those applications to access files on the cloud data storage unit 150. According to another embodiment of the present invention, when the system 100 does not need to store/analyze data through the cloud data storage unit 150, the cloud gateway service application 465 may be omitted.

According to another embodiment of the present invention, the image analysis server 300 in FIG. 2 may also be located on the cloud data storage unit 150; in other words, the image analysis server 300 may be a virtual machine (VM) provided by the supplier of the cloud data storage unit 150. The operating system running on the virtual machine is also provided by the supplier of the cloud data storage unit 150. The image analysis application 460 in FIG. 3B can run in the operating system on that virtual machine; that is, after the user uploads the image analysis application 460 to the cloud data storage unit 150, the application is executed through the virtual machine. In this case, the database 450 used by the image analysis server 300 in the cloud may be one or more object files in the cloud data storage unit 150. In this way, the image data transmitted by the image capture devices 400A to 400N to the cloud data storage unit 150 can be immediately analyzed and processed by the virtual machine, which also resides in the cloud, and the related data produced by the analysis can likewise be stored on the cloud data storage unit 150 and transmitted outward as needed.

According to another embodiment of the present invention, the image analysis server 300 in FIG. 2 may also be integrated into the image capture devices 400 (i.e., 400A to 400N). In this case, the image analysis application 460 in FIG. 3B is moved into the image capture device 400 for execution. The image analysis application 460 further comprises a plurality of functional modules, i.e., all or some of the functional modules shown in FIG. 4. According to an embodiment of the present invention, the image capture device 400 both records image data and analyzes/processes it. After the image capture device 400 analyzes/processes the image data, the complete or partial related information obtained is stored in an internal memory unit, for example: an SD card (Secure Digital Memory Card) or built-in flash memory. According to another embodiment of the present invention, after the image capture device 400 analyzes/processes the image data and obtains complete or partial related information, it transmits the analysis results through the network 180 to the cloud data storage unit 150 for storage, or to the image analysis server 300 for further analysis and processing. The complete or partial related information mentioned above may be, for example: image data after face detection, image data after product detection, and so on.

Please refer to FIG. 4, which shows the functional modules further included in the image analysis application 460 of FIG. 3B. According to one embodiment of the present invention, the image analysis application 460 includes all of the following functional modules; according to another embodiment of the present invention, it includes only some of the functional modules in FIG. 4. Although the functional modules in FIG. 4 are drawn as independent units, according to an embodiment of the present invention, two or more functional modules may be integrated together in an actual implementation. The functional modules in FIG. 4 are described as follows:

(A) People Flow analysis module 405: detects the portions of the image data that show human features and analyzes the trajectories along which people move, in order to count the number of people in the environment, assess the degree of crowding, or determine the directions in which visitors move through the environment. One application is to use the image data recorded by one or more image capture devices 400 installed in a shopping mall to compile, over a given period, the number of shoppers and/or visitor attributes (for example: age, gender, occupation, etc.), including the directions in which visitors of different attributes move while shopping.

(B) Head Detection module 410: detects the portions of the image data that show human head features and analyzes the position and attributes of each head image. An attribute of a head image may be, for example, the head direction, determined by analyzing the features of the head or face in the image data; this determination is based on the "feature mapping" (also called a "feature vector") for head direction that the image analysis server 300 obtains after the training phase. In addition, the head detection module 410 may further determine other attributes of the head or face, for example: eyes open or closed, expressed emotion, hair color, the visual geometry of the face, or other head-related attributes. Combined with metadata tags (for example: happy, glasses, age range, membership data, etc.), the attribute data of head images can help users quickly organize, search, or identify people. The head detection module 410 may require the image data to have a certain pixel size or a certain minimum resolution so that the features of a customer's face/head can be recognized.
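One plausible way to use the learned "feature vectors" mentioned above is a nearest-reference lookup: an extracted feature vector is compared against reference vectors obtained during training, and the closest one gives the head-direction label. This is only an illustrative sketch; the reference vectors, their dimensionality, and the label names below are all made up, and the patent does not specify this particular matching scheme.

```python
import math

# Hypothetical reference feature vectors per head direction (illustrative
# values; a real CNN would produce much higher-dimensional embeddings).
REFERENCE_DIRECTIONS = {
    "front":    [1.0, 0.0, 0.0],
    "right-45": [0.7, 0.7, 0.0],
    "left-45":  [0.7, -0.7, 0.0],
    "up":       [0.7, 0.0, 0.7],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def classify_head_direction(feature_vector):
    """Return the label of the reference vector most similar to the input."""
    return max(REFERENCE_DIRECTIONS,
               key=lambda label: cosine_similarity(
                   feature_vector, REFERENCE_DIRECTIONS[label]))
```

A vector close to one of the references, e.g. `[0.72, 0.69, 0.01]`, would be labeled `"right-45"` under these assumptions.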

(C) Visitor's ID identification module 415: compares the captured facial features of a visitor with the visitor data (facial features) already in the database to identify the visitor (for example: by name or code) and to retrieve further information associated with that identity (for example: frequently purchased items, payment habits, etc.). To capture a visitor's facial features, the camera's zoom-in function, or the ability to change the resolution of the recorded image data in real time, can help capture the facial features of a visitor in motion; these features are then compared with the facial feature data in the database to find identical or similar faces, making it possible to determine whether the same person has entered the environment at different points in time. According to an embodiment of the present invention, the database records, for each visitor's ID, at least 10 (but not limited to this number) images of facial features taken from various angles, and these images are used to compare against and recognize the visitor's facial features in the image data captured by the image capture device 400.

According to another embodiment of the present invention, if the image capture device 400 cannot read the features of a visitor's face, the visitor identification module 415 identifies the visitor according to the distinctive features exhibited by the parts of the visitor's body while walking in the images, for example: the swing of the arms and legs while walking, walking posture, the length ratios among the head, back, and limbs, and other appearance features unique to that particular visitor.

(D) Product Detection module 420: detects the portions of the image data that show product features and, depending on the settings, can highlight various or specific products with frames. In general, the pixels/resolution of the captured image data must reach a certain minimum size before the product detection module 420 can recognize the features of the various products; then, depending on the settings, the image of a product can be highlighted (framed).

(E) Gazing Area analysis module 425: analyzes, for each image identified as a "person" in the image data, the extent of the area being gazed at and its corresponding coordinate position. The gazing area analysis module 425 determines the direction in which the eyes are looking by determining the person's head direction. A light source is simulated and assumed to be located directly behind the head region of the person in the image data; the light source is set to emit light in a direction consistent with the head direction, producing a projection of the light on a virtual plane in front of the head. The area onto which the light is projected is then inferred to be the area the visitor may be gazing at. Through this simulation, the extent of the projected (gazed) area can be calculated, and the distance of that area relative to the image capture device 400 can also be derived. The image data may contain one or more visitors, who may be (a) in motion while continuously gazing at a specific object, or (b) staying at a certain spot in the environment while gazing at a specific object. The gazing area analysis module 425 can handle all of these gazing behaviors presented in the image data and can calculate (estimate) the area each visitor is gazing at.

Combined with the positions of the objects in the environment, the objects located within a visitor's gazing area can be determined, and these objects are very likely to be products the visitor is interested in.
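The virtual-light-source construction described above reduces to a similar-triangles calculation: a point source some distance behind the head casts the head's silhouette onto the virtual plane in front of it, and the shadow grows with the plane's distance. The sketch below is one plausible realization; the parameter names and the choice of a circular silhouette are assumptions, since the patent does not fix these details.

```python
def gaze_projection(head_center, direction, head_radius,
                    source_offset, plane_distance):
    """
    Estimate the circular gaze region cast by a virtual point light source
    placed `source_offset` behind the head along the unit vector
    `direction`, shining past a head silhouette of `head_radius` onto a
    virtual plane `plane_distance` in front of the head.
    Returns (region_center, region_radius).
    """
    # Similar triangles: the shadow scales by the ratio of the
    # source-to-plane distance over the source-to-head distance.
    scale = (source_offset + plane_distance) / source_offset
    region_center = tuple(h + plane_distance * d
                          for h, d in zip(head_center, direction))
    return region_center, head_radius * scale
```

With a head of radius 0.1 m, a source 0.2 m behind it, and a plane 1.0 m in front, the gaze region is a circle of radius 0.6 m centered 1.0 m ahead along the head direction; the region widens as the gazed surface moves farther away, matching the intuition that a distant glance covers more shelf area.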

(F) Gazing Object analysis module 430: analyzes which object(s) a visitor in the image data is looking at. The gazing object analysis module 430 is a further application built by combining the outputs of at least two of the head detection module 410, the visitor identification module 415, the product detection module 420, and the gazing area analysis module 425. Besides analyzing which objects a visitor in the image data is looking at, the gazing object analysis module 430 records various data such as the number of times a specific visitor gazes at a certain object, the duration of each gaze, and the visitor's gender and age. In addition, the gazing object analysis module 430 also records other required basic information such as date, time, location, temperature, and humidity.
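The combination step described above amounts to intersecting the estimated gaze region with the detected product positions. A minimal sketch, assuming a circular 2-D gaze region and a made-up product-position mapping (neither of which is prescribed by the patent):

```python
def objects_in_gaze(gaze_center, gaze_radius, product_positions):
    """Return the names of products whose (x, y) position falls inside
    the circular gaze region - a simple containment test."""
    cx, cy = gaze_center
    hits = []
    for name, (px, py) in product_positions.items():
        if (px - cx) ** 2 + (py - cy) ** 2 <= gaze_radius ** 2:
            hits.append(name)
    return hits
```

For instance, with a gaze circle of radius 1.0 centered at the origin, a product at (0.5, 0.5) is reported as gazed at, while one at (3.0, 3.0) is not.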

(G) Intensity of Interest analysis module 435: using the various data provided by the other modules, the intensity of interest analysis module 435 applies big-data analysis to generate the statistical information users need. This statistical information can help users adjust product placement, change sales strategies, or change advertising/promotional content to improve overall revenue and business performance. For example, analyzing the records from multiple locations (malls) over a certain period can yield statistics for a store such as "which products most easily attract customers' attention", "which products attract less customer interest", "the age group and gender of customers drawn to a specific product", and "the degree of correlation between various products and the weather, the day of the week, and specific customer segments".
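One simple instance of the aggregation described above is ranking products by total gaze time across all recorded observations. The record fields (`product`, `gaze_seconds`) are assumptions introduced for illustration; the patent only lists the kinds of data the modules record.

```python
from collections import Counter

def rank_products_by_gaze(records):
    """Rank products by total accumulated gaze time, most gazed-at first.
    Each record is a dict with assumed keys 'product' and 'gaze_seconds'."""
    totals = Counter()
    for rec in records:
        totals[rec["product"]] += rec["gaze_seconds"]
    return [product for product, _ in totals.most_common()]
```

Given records showing 9 total seconds on product "A" and 3 on product "B", the ranking is `["A", "B"]`; the same pattern extends to grouping by age range, weather, or day of the week.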

(H) Moving Path analysis module 440: tracks, through one or more image capture devices 400, the moving paths (trajectories) of visitors meeting specific conditions (specific groups). By analyzing their moving paths and dwell times, it provides visitor behavior information for the user's reference. Based on this visitor behavior information, the user can adjust where objects/products are displayed or change how they are displayed in order to attract the attention of visitors/customers. The specific conditions mentioned above can be set by the user, for example: women between 40 and 50 years old, male elementary school students aged 10 to 12, men over 70 who use a cane, and so on.

(I) Visitor attribute analysis module 445: using the data sets of various predefined attributes in the database 450, the image analysis server 300 first performs deep learning in the training phase. After deep learning is complete, the visitor attribute analysis module 445 is dedicated to determining the various attributes of visitors in the image data, for example: gender, age, occupation, and so on. As described above, the present invention is based on a CNN architecture: through deep learning on the data sets defined for various types of people, plus further cross-analysis of information such as a visitor's clothing features, movement features, and the categories of products the visitor shops for, the visitor attribute analysis module 445 can improve the accuracy of its inferences about visitor attributes.

(J) Returning Customer analysis module 455: using the historical visitor image data recorded in the database 450, the visitor identification module 415 can determine whether a visitor in the current image data has visited before. If a visitor is judged to be a returning customer, the database 450 holds historical statistical information related to that returning customer, for example: historical shopping records. Combining this historical statistical information with the information provided by the gazing object analysis module 430, such as the number of times and the duration the visitor has gazed at different objects, the returning customer analysis module 455 can further infer the probability/frequency with which the visitor will purchase a certain object again, giving users a basis for preparing in advance; for example, a mall can learn which products most customers are interested in and stock sufficient quantities early.

Please refer to FIG. 5. According to an embodiment of the present invention, FIG. 5 is a flow chart of the overall application concept of the present invention for recognizing people and objects in image data. The flow is divided into two phases: first, a training phase helps the image analysis application 460 in the image analysis server 300 perform deep learning; afterwards, in the application phase, the image analysis application 460 in the image analysis server 300 can carry out various practical applications of image detection and recognition.

The training phase in FIG. 5 comprises step 510 and step 520.

In step 510, the data sets of various predefined human/object features in the database 450 are used to help the image analysis server 300 perform deep learning. According to an embodiment of the present invention, a data set may consist of multiple images that show various human/object features from various angles, for example: a series of related pictures sharing the same feature. In the present invention, a "data set" is the collective term for multiple images that conform to the same subject and possess that subject's features; the subject may be, for example: boys, girls, the elderly, children, clothes, pants, hats, plants, animals, cars, tools, food, and so on, in their various forms. Images of the same subject have features consistent with that subject and may present its appearance under different "rotational degrees of freedom in space". These "rotational degrees of freedom in space" comprise, for example, three axial dimensions: roll, pitch, and yaw, also called the "rotational degrees of freedom in three-dimensional space". The values of the three dimensions roll, pitch, and yaw, representing the rotational degrees of freedom in space, are recorded in the "label" data of the image. In addition, the label data records the subject that defines the image, for example: boy, girl, elderly person, child, clothes, pants, hat, plant, animal, car, tool, food, and so on. Thus, different values recorded in an image's label represent the various appearances the person/object in the image presents under different rotational degrees of freedom in three-dimensional space. For example, an image in a data set may show "a person's head turned right (roll) 45, elevation (pitch) 0, tilt (yaw) 0"; in that case, "person's head, turned right 45, elevation 0, tilt 0" is the label recorded in that image. The image analysis server 300 uses data sets of various subjects to perform deep learning through a convolutional neural network (CNN) architecture. After the deep learning training, the image analysis server 300 obtains the "feature mapping" of the learned subjects; the image analysis server 300 thereby becomes able to recognize which subject(s) (people or objects) are featured in the image data captured by the image capture device 400. According to an embodiment of the present invention, the above-mentioned data sets defining various human/object features are stored in the database 450 connected to the image analysis server 300, for use by the image analysis application 460 during the training phase. According to an embodiment of the present invention, the graphics files of a data set may be graphics files of any image standard, such as Joint Photographic Experts Group (JPEG), Bitmap (BMP), PC Paintbrush Exchange (PCX), or Graphics Interchange Format (GIF).
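The per-image label described above carries a subject plus the three rotational values. A minimal sketch of such a record is shown below; the comma-separated textual form and field names are assumptions, since the patent specifies the recorded fields but not a storage format.

```python
from dataclasses import dataclass

@dataclass
class ImageLabel:
    """One image's label: its subject and the three rotational
    degrees of freedom recorded for it (values in degrees)."""
    subject: str   # e.g. "person's head", "hat", "car"
    roll: float
    pitch: float
    yaw: float

def parse_label(text):
    """Parse an assumed 'subject, roll, pitch, yaw' string into an ImageLabel."""
    subject, roll, pitch, yaw = text.split(",")
    return ImageLabel(subject.strip(), float(roll), float(pitch), float(yaw))
```

The patent's example label would then be represented as `parse_label("person's head, 45, 0, 0")`, i.e. a head turned right 45 degrees with zero elevation and zero tilt.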

Step 520 tests the deep learning results of step 510. After the deep learning process in step 510, the image analysis application 460 can use the obtained "feature mapping" to recognize various "people/objects" in image data, but the quality of this recognition ability must still be tested. According to an embodiment of the present invention, testing may consist of inputting test image data to verify whether the ability of the image analysis application 460 to recognize "people/objects" meets the requirements. Until it passes the configured evaluation criteria, the image analysis application 460 repeats the deep learning process of step 510 to improve its ability to recognize various "people/objects", continuing until the configured evaluation criteria are met.

Steps 510 and 520 above constitute the pre-training (training phase) performed before the image analysis application 460 goes live. The outcome of this pre-training is closely tied to the recognition ability of the image analysis application 460 in actual operation: if the training phase is done well, the image analysis application 460 has better image-recognition ability and can output correct, high-quality results. Moreover, after steps 510 and 520 have been executed, the image analysis application 460 has learned the "feature mapping" (also called the "feature vector") for recognizing the various "person/object" features in image data. With the learned "feature mapping", the image analysis application 460 in actual operation no longer needs to compare against the data sets (Data Set) in the database (Data Base) 450; it detects or recognizes which person or object features an image data item contains according to the learned "feature mapping".

The application phase in FIG. 5 comprises steps 530 to 570.

Step 530, one of the application-phase steps, is where the image analysis application 460 recognizes the image data coming from the image capture device 400. According to an embodiment of the present invention, when the image capture device 400 is installed in a supermarket/store environment, the image data it captures may contain the faces/heads and parts of the bodies of shoppers, and may also contain the various products on the shelves. In step 530, the head detection module 410 and the product detection module 420 of the image analysis application 460 detect which parts of the image data depict shoppers (including the clothes they wear) and which parts depict products on the shelves. Basically, whether for a shopper (face/head features) or a product on a shelf, the image must exceed a pixel threshold, for example 40x40 pixels or more, though it is not limited to this specification; a shopper (face/head features) or a shelf product meeting the threshold can then be detected and recognized by the image analysis application 460. The "40x40 pixels" mentioned above is only an example used to explain the image-quality threshold; the threshold may be any other value, such as 50x50 pixels, 80x80 pixels, 100x100 pixels, and so on, depending on the needs of the actual application. It follows that, according to the technology of the present invention, no matter how many shoppers and shelf products appear in one picture/frame of image data, as long as each of them reaches the image-quality threshold, they can all be recognized simultaneously by the image analysis application 460.
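The pixel-size gate described above can be sketched as follows; the 40x40 default mirrors the example threshold, and the helper name is an assumption:

```python
def is_detectable(width_px, height_px, min_side=40):
    """Return True when a detected face/head or product region meets the
    minimum pixel-size threshold (40x40 in the example; configurable)."""
    return width_px >= min_side and height_px >= min_side

print(is_detectable(64, 48))               # True
print(is_detectable(32, 60))               # False: width below threshold
print(is_detectable(64, 48, min_side=50))  # False under a stricter 50x50 rule
```

Regions failing the gate would simply be skipped by the detection modules rather than reported.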

In step 540, the recognized person/object feature data are provided to the other functional modules of the image analysis application 460 for further use. According to an embodiment of the present invention, a person/object feature is a specific shape, color, size, or combination thereof in the image data that can be recognized as a person or an object; in addition, step 540 also calculates the position and distance of each recognized person or object relative to the image capture device 400. The further use mentioned above means that the various functional modules of the image analysis application 460, for example those listed in FIG. 4 (the people-flow analysis module 405, the visitor identity recognition module 415, the gazing area analysis module 425, the gazed item analysis module 430, the interest extent analysis module 435, the movement path analysis module 440, the visitor attribute analysis module 445, the returning-visitor analysis module 455, and so on), further process and analyze the recognized person/object feature data to produce the data and/or information that users need.

In step 550, the historical output results of each functional module are recorded in the database (Data Base) 450. The output results recorded in the database 450 are not limited to image data; they may also be, for example, related data including text, tables, and databases, or a combination of image data and any related data. According to an embodiment of the present invention, when the technology of the present invention is applied to a supermarket or store, the records of the customers who visit (Customer, each of which may be treated as an ID), the product items they buy or gaze at (show interest in) (Product Item), the customers' movement paths, and their shopping behavior (Behavior) are important basic elements for further analysis. Besides the above records of customers, products, movement paths, and consumption behavior, the image analysis application 460 can automatically combine other reference information such as date/time, location, weather, temperature, humidity, festivals, advertising promotions, and so on, to form further reusable information.

In step 560, the historical output data that each functional module has stored in the database (Data Base) 450 are subjected to further statistical analysis. Records for a single day, a single person, or a single product item can already provide some reference information, but combining all the data for big-data analysis additionally reveals the trends and relationships within it. Step 560 statistically analyzes the historical records output by the functional modules to provide users with a variety of useful reference information, for example: the products that most easily attract customers' interest, the products all customers notice at first glance, the areas that rarely attract customers' interest, the influence of promotional products on customers, the customers' movement routes, the best placement of products, and many other statistical outputs.

In step 570, the results of the statistical analysis of the functional modules' historical records performed in step 560, and even suggestions made according to those results, are output as information for the user's reference. According to an embodiment of the present invention, the output information may take the form of reports, charts, narrative information, or a combination of any two or more of these. For example, the output information may be: "a certain item attracts more customers' attention, together with the male/female ratio and ages of those customers", "the correlation between items that easily attract customers' attention and promotional activities", "the ratio between the products customers like at first glance and the actual purchases of those products", "areas that rarely receive customers' attention", "the degree of association between customers and promotional products", "the customers' movement routes", "the best placement for a specific product", and so on. In addition, the output information may provide suggestions for improvement, for example: "the frequency and quantity of restocking for products that receive customers' attention", "which products that customers most often pay attention to should be placed on which shelf positions to increase sales volume", "suggested arrangements of product placement on the shelves based on the customers' movement routes", "which less popular products should be moved to other positions", and so on.

Please refer to FIG. 6, which is a flowchart of the training phase (steps 510 and 520) in FIG. 5. The flow of the training phase starts at step 610.

In step 610, the image analysis application 460 in the image analysis server 300 performs deep learning under the CNN architecture using input data sets (Data Set) of various person/object features. According to an embodiment of the present invention, the image data of a data set may be defined as in FIG. 7A, "a woman's head presented at various angles", or as in FIGS. 7B and 7C, "various objects and food". According to an embodiment of the present invention, the image data in FIG. 7A at least include female image data items numbered 1 to 91, one for each head direction (Head Direction). Each image data item numbered 1 to 91 in FIG. 7A carries a label (Label) comprising its subject content and the three dimensions of its head direction: roll (Roll), pitch (Pitch), and yaw (Yaw). The subject in the "Label" of image data items 1 to 91 is a woman, and the different Roll, Pitch, and Yaw values in the label represent the different appearances of that woman's face/head in three-dimensional space under different "rotational degrees of freedom in space". The woman's "different appearances under different rotational degrees of freedom in space" in FIG. 7A may be called the woman's "head direction". The Roll, Pitch, and Yaw values of each numbered image data item form part of the information in that item's "Label"; the subject being a woman is another part of the "Label" information. For example, the image data item numbered 46 (in the thick-lined box) is characterized by a face/head facing straight ahead with no elevation or deviation at all (that is, no roll, pitch, or yaw), so its Roll, Pitch, and Yaw values are defined as (0,0,0). Compared with item 46, the other numbered image data items in FIG. 7A show faces/heads whose elevation and deviation differ to varying degrees, so their Roll, Pitch, and Yaw values may be (-20,10,0), (5,-10,30), (10,20,-5), and so on; these values form part of each item's label and distinguish the image data of the head directions formed by different roll, pitch, and yaw of the face/head. The values (0,0,0), (-20,10,0), (5,-10,30), (10,20,-5), and so on do not necessarily represent actual angles; they may instead represent the degree of difference from a reference image data item. After the deep-learning process of step 610, the image analysis application 460 in the image analysis server 300 obtains the "feature mapping" (also called the "feature vector") of each learned subject, which is used to recognize "people" or "things" bearing the features of a specific subject.
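Since the label triples may encode relative differences rather than exact angles, one plausible use of them is matching a predicted pose to the closest labeled head direction among the numbered items. The sketch below, using squared Euclidean distance as a purely illustrative metric, shows such a nearest-label lookup:

```python
def nearest_head_direction(predicted, labeled_poses):
    """Pick the labeled (Roll, Pitch, Yaw) triple closest to a predicted
    triple, using squared Euclidean distance (illustrative metric only)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(labeled_poses, key=lambda pose: sq_dist(pose, predicted))

# Triples echoing the label examples given in the text.
poses = [(0, 0, 0), (-20, 10, 0), (5, -10, 30), (10, 20, -5)]
print(nearest_head_direction((8, 15, -2), poses))  # (10, 20, -5)
```

The actual recognizer works from the learned feature mapping; this lookup only illustrates how the discrete label triples partition the space of head directions.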

The image data of the "objects and food" data sets (Data Set) in FIGS. 7B and 7C may be, for example, the various products in a store, such as fruit, clothes, pants, drinks, food, and so on. Likewise, in the data set of each kind of product, every image data item of that product also carries a label with different Roll, Pitch, and Yaw values, so as to present image data of "the same product in different appearances" among the many products.

In step 620, test image data are used to verify the learning result of the image analysis application 460 from step 610. Although after the deep-learning process of step 610 the image analysis application 460 should have a certain ability to recognize the features of the learned subjects, whether its success rate meets the requirements still needs to be verified with test image data. The amount of test image data may be one quarter or one fifth of the amount of image data in the data set (Data Set), but is not limited thereto. According to an embodiment of the present invention, the correctness of the test process may still require the intervention of a (human) evaluator to make the final judgment, for example: checking whether the recognition results are correct and whether the recognition rate reaches a certain level. This verification history can be recorded for future reference during relearning and during actual recognition of image data.

In step 630, it is determined whether the learning result of the image analysis application 460 has reached a predetermined standard. For example, the criterion may be that the error rate measured with the test image data must be below 10%; this 10% threshold is only an example and not a limitation, and in practice the user may set any desired threshold. Taking 10% as the example, an error rate below 10% means: supposing a test image data item contains 10 "people" in the picture, the image analysis application 460 must correctly recognize the face/head regions of at least 9 of them and estimate the Roll, Pitch, Yaw, and other features presented by each face/head. A correct recognition rate of 90% or more on the test image data counts as passing; otherwise the standard has not been reached. The same inspection standard applies when the test targets are "things". If the learning result of the image analysis application 460 reaches the predetermined standard, the training phase ends; otherwise, step 640 is executed.
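The pass/fail decision of step 630 can be sketched as a simple accuracy check; the 90% figure mirrors the example in the text and is configurable, and the function name is an assumption:

```python
def passes_standard(correct, total, min_accuracy=0.90):
    """Step 630 check: at least 90% of the test targets (e.g. 9 of 10
    heads in a picture) must be recognized correctly; the threshold is
    configurable per deployment."""
    return correct / total >= min_accuracy

print(passes_standard(9, 10))   # True: 9 of 10 heads recognized
print(passes_standard(8, 10))   # False: below the 90% standard
```

When the check fails, the flow falls through to step 640 for another learning round.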

In step 640, for the subjects that failed to reach the standard in the test, more related data sets (Data Set) are input and deep learning is performed again. After step 640 is executed, the flow returns to step 620, where test image data are used to verify whether the learning result of the image analysis application 460 has reached the predetermined standard. Learning and verification are repeated in this way until the learning result reaches the predetermined standard.

According to another embodiment of the present invention, when the judgment result of step 630 is negative, step 640 does not input more related data sets (Data Set) for another round of deep learning, but instead selects another algorithm as the tool for recognizing the features of the various subjects in their various appearances. According to yet another embodiment of the present invention, when the judgment result of step 630 is negative, step 640 changes/adjusts the learning parameters so that the learning result can come closer to the expected result. The variables changed in order to reach the predetermined standard may thus be: (a) inputting more image data of related data sets (Data Set); (b) selecting another algorithm; (c) adjusting the learning parameters; and so on. The variables may be changed one at a time, or several may be changed simultaneously, to alter the learning result.

Please refer to FIGS. 8A to 8C, which illustrate step 620 of FIG. 6, verifying the learning result of the image analysis application 460 with test images. A frame of test image is a picture composed of many pixels, containing the specific images that the user expects the image analysis application 460 to recognize. After the deep-learning training, the image analysis application 460 should be well able to recognize the features of learned people or objects in the picture of a test image. Generally speaking, the image analysis application 460 should complete the recognition successfully in the following two situations: (a) the feature image of the person or object to be recognized in the test image exceeds a certain pixel size; (b) when feature images of people or objects overlap one another, the overlapping portions do not exceed a proportion that would impair recognizability. FIGS. 8A to 8C show that, after recognition by the image analysis application 460, boxes of one color mark the positions of "human heads" in the picture of the test image, while boxes of another color mark the positions of specific objects in the picture of the test image data. Although the angles of the "human heads" differ among the pictures of the test images in FIGS. 8A and 8B, as long as a feature image to be recognized exceeds a certain pixel size, that feature should be recognizable. The training phase can end only when the overall recognition rate on the test images reaches the predetermined standard.

Please refer to FIG. 9, which illustrates the detailed flow of step 540 of FIG. 5. According to an embodiment of the present invention, the flow of FIG. 9 is explained with respect to one of the many possible applications of the present invention: "analyzing the products that most easily arouse the interest of shoppers in a store". The flow of FIG. 9 starts at step 700.

In step 700, the image analysis server 300 obtains the image data to be analyzed. According to an embodiment of the present invention, the image data may be obtained as follows: the image capture device (Image Capture Device) 400 first uploads the captured image data through the network 180 to the cloud data storage unit 150 for storage, and the image analysis server 300 then downloads the image data from the cloud data storage unit 150 through the network 190 for further analysis. According to another embodiment of the present invention, the image data may also be transmitted directly from the image capture device (Image Capture Device) 400 to the image analysis server 300 through a network or a connection cable. The image analysis server 300 that analyzes image data in the present invention is not limited to one; there may be several. According to an embodiment of the present invention, all image data analysis is handled by a single image analysis server 300, in which case the whole flow of FIG. 9 is performed within that same image analysis server 300. Alternatively, several image analysis servers 300 may cooperate to complete the image analysis, in which case the flow of FIG. 9 is divided into several parts, each performed by a different image analysis server 300. For example, the image data are first transmitted to a first image analysis server 300 for partial image processing and/or analysis, and the partially processed and/or analyzed image data are then uploaded to the cloud data storage unit 150 for storage; a second image analysis server 300 can download the image data from the cloud data storage unit 150 at any time for further processing and analysis. The first image analysis server 300 and the second image analysis server 300 may both be fully functional image analysis servers 300, or both of them, or one of them, may have only part of the image processing and analysis functions. According to an embodiment of the present invention, after the image analysis server 300 receives the image data, it must first determine whether the file format of the image data is an acceptable file format; if not, the image data must first undergo a file-format conversion step before subsequent processing.
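The file-format pre-check mentioned at the end of the paragraph could be sketched as follows; the accepted set simply echoes the image standards named earlier in this description, and the helper name is an assumption:

```python
import os

# Illustrative set echoing the formats named earlier (JPEG, BMP, PCX, GIF).
ACCEPTED_FORMATS = {".jpg", ".jpeg", ".bmp", ".pcx", ".gif"}

def needs_conversion(filename):
    """Return True when an incoming image file must first pass through a
    file-format conversion step before the analysis can proceed."""
    suffix = os.path.splitext(filename)[1].lower()
    return suffix not in ACCEPTED_FORMATS

print(needs_conversion("aisle3_cam1.JPG"))   # False: JPEG is acceptable
print(needs_conversion("aisle3_cam1.tiff"))  # True: convert before analysis
```

A server receiving unacceptable files would route them through conversion before handing them to the detection modules.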

In step 705, the parts of the image data that contain a subject are recognized; a "subject" here is a target with specific features that the image analysis server 300 can recognize after deep learning. Each picture of the image data may contain one or more targets with specific features; these targets may bear "human face or head" features, or "article or product" features. According to an embodiment of the present invention, since each picture of the image data is composed of numerous pixels, step 705 must determine which "person"-featured targets and which "thing"-featured targets the picture includes, and also where each target is located in the picture. In other words, the main purpose of step 705 is to determine every "person"-featured target and every "thing"-featured target in the picture, together with the area each occupies in the picture (in pixels) and its position in the picture. Generally speaking, although a moving-image specification consists of several to dozens of pictures (frames) per second, according to an embodiment of the present invention it is not necessary in practice to detect, for every picture, how many "person"-featured and "thing"-featured targets it includes. Taking the most common specification of 30 frames per second as an example, one may choose, without being limited thereto, to select one frame (picture) every 1/6 second and detect how many "person"-featured and "thing"-featured targets that picture includes, together with their individual positions and extents in the picture. In other words, under this assumption, step 705 must determine the "person"-featured and "thing"-featured targets of 6 pictures every second. Detecting one frame every 1/6 second is only one embodiment of the present invention; in practice the interval may be any other value. It must be noted that the detection of "people" and/or "things" in a picture in step 705 is not achieved by comparison against the Data Base, but by means of the "feature mapping" learned through deep learning during the training process, which performs the work of recognizing the "people" and/or "things" in the picture.
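At 30 frames per second, sampling one picture every 1/6 second amounts to analyzing every 5th frame. A minimal sketch of the frame selection (the function name and rounding choice are assumptions):

```python
def frames_to_analyze(total_frames, fps=30, interval_seconds=1 / 6):
    """Indices of the frames selected for detection when sampling one
    picture every interval_seconds (1/6 s -> every 5th frame at 30 fps)."""
    step = max(1, round(fps * interval_seconds))
    return list(range(0, total_frames, step))

print(frames_to_analyze(30))  # [0, 5, 10, 15, 20, 25] -> 6 pictures per second
```

Any other interval simply changes the stride; a 1/2-second interval at 30 fps would analyze every 15th frame.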

After the image data are recognized in step 705, the recognized "person" and "thing" targets can be handled in two parallel flows, "customers" and "products". Therefore, after step 705, targets recognized as "people (customers)" proceed to step 710 and the steps that follow it, while targets recognized as "things (products)" proceed to step 715 and the steps that follow it.

Continuing from the above, step 710 is the step entered by the targets recognized as "people (customers)" in the picture. Since the faces/heads presented in a picture vary in size, a face/head occupying too small an area (too few pixels) in the picture may not be detected effectively. In an embodiment of the present invention, the detectable pixel size is 40x40 pixels or more. This 40x40-pixel threshold is only an implementation reference provided by the present invention, and the present invention is not limited thereto; the minimum recognizable pixel threshold may be any other pixel size, depending on the capabilities of the software and hardware. After the targets recognized as "people" are determined in step 710, according to an embodiment of the present invention, if the customers' identity (membership) data need not be maintained, the flow proceeds to the two steps of "detection of the customer's head direction" (step 730) and "estimation of the customer's head position" (step 735), which are explained later. According to another embodiment of the present invention, if the customers' identity (membership) data must be maintained, the flow proceeds to step 720.

In step 720, identity recognition (Identity Recognition) is performed on the customers in the picture. According to an embodiment of the present invention, the identities of all the customers in the picture are compared one by one. A person in a store may be walking around casually looking at products, or may have stopped to concentrate on certain products. According to an embodiment of the present invention, customers are those people who have stopped and whose time spent viewing some item (product) exceeds a time threshold; the time threshold may be set to, for example, 2 seconds, 5 seconds, or 10 seconds, but is not limited thereto and may be any configured length of time. Under this setting, people in the picture who do not meet the above condition will not be regarded as customers. According to another embodiment of the present invention, a person who is moving in the picture but whose gaze stays on the same item (product) for longer than a set time (for example, more than 2 seconds, but not limited thereto) will also be regarded as a customer, and that person's identity will be recognized further.
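The dwell-time rule above can be sketched as a one-line predicate; the 2-second default mirrors the example threshold, and the name is an assumption:

```python
def is_customer(gaze_seconds, dwell_threshold=2.0):
    """A person counts as a customer when the gaze on a single item
    lasts at least the threshold (2 s in the example; any value may
    be configured). The same rule covers a moving person whose gaze
    stays on one item that long."""
    return gaze_seconds >= dwell_threshold

print(is_customer(3.5))  # True: gaze long enough
print(is_customer(1.0))  # False: only a passing glance
```

People failing the predicate are simply not enrolled in the identity-recognition flow that follows.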

此外,步驟720須要藉由比對圖2中資料庫(Data Base)450的資料,以達到對顧客做身份辨識的目的;其中,資料庫(Data Base)450的資料包含了曾經來訪的顧客ID以及各來訪顧客的「臉部特徵」的影像資料,該臉部特徵的影像資料包含了不同頭部方向(Head Direction)的影像資料。透過資料庫450內所記錄的各顧客的「臉部特徵」的影像資料,影像分析應用程式460可以依所記錄的各「臉部特徵」是否符合目前圖片中的顧客的「臉部特徵」,以辨識出顧客的身份。上述的顧客ID可能是顧客的會員ID,或是透過影像分析應用程式460所指派的臨時 ID。臨時ID的樣態可能是,例如:20200101-F0000,以日期、購物區域等資料所組成的數字組合。當顧客身份比對成功時,表示該顧客曾經來訪並被記錄於資料庫(Data Base)450中,此時不需於資料庫(Data Base)450中再新增該顧客的資料。相反地,當顧客身份比對不成功時,則表示該顧客不曾被記錄於資料庫(Data Base)450、或是該顧客的頭部影像資料的畫質不佳以致無法順利進行比對,此時可以在資料庫(Data Base)450中新增一顧客臨時ID以代表該顧客的身份。 In addition, step 720 needs to achieve the purpose of identifying the customer by comparing the data of the database (Data Base) 450 in Figure 2; wherein, the data of the database (Data Base) 450 includes the customer ID and Image data of "facial features" of each visiting customer, the image data of facial features includes image data of different head directions (Head Direction). Through the image data of each customer's "facial feature" recorded in the database 450, the image analysis application 460 can determine whether each recorded "facial feature" matches the "facial feature" of the customer in the current picture, to identify customers. The aforementioned customer ID may be the customer's membership ID, or a temporary ID assigned by the image analysis application 460 ID. The form of the temporary ID may be, for example: 20200101-F0000, a combination of numbers composed of date, shopping area and other information. When the identity comparison of the customer is successful, it means that the customer has visited and is recorded in the database (Data Base) 450, and there is no need to add the customer's information in the database (Data Base) 450 at this time. Conversely, when the customer identity comparison is unsuccessful, it means that the customer has never been recorded in the database (Data Base) 450, or the quality of the customer's head image data is not good so that the comparison cannot be carried out smoothly. 
In that case, a temporary customer ID can be added to the database 450 to represent that customer's identity.
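The match-or-register logic above can be sketched as follows. The patent does not prescribe a feature representation, so this sketch assumes facial features are embedding vectors compared by cosine similarity, with a hypothetical 0.9 threshold; the temporary-ID format follows the patent's 20200101-F0000 example:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def identify(db, feature, date="20200101", threshold=0.9):
    """Match `feature` against known customers in `db`
    ({customer_id: [feature vectors]}); on failure, register a
    temporary ID in the style of the patent's example."""
    best_id, best_sim = None, threshold
    for cid, feats in db.items():
        for known in feats:
            sim = cosine(feature, known)
            if sim > best_sim:
                best_id, best_sim = cid, sim
    if best_id is not None:
        return best_id, False           # existing customer: no new DB entry
    temp_id = f"{date}-F{len(db):04d}"  # unmatched: add a temporary ID
    db[temp_id] = [feature]
    return temp_id, True

db = {"member-42": [[1.0, 0.0, 0.0]]}
assert identify(db, [0.99, 0.05, 0.0]) == ("member-42", False)
new_id, created = identify(db, [0.0, 1.0, 0.0])
assert created and new_id == "20200101-F0001"
```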

步驟725中，若是步驟720無法在資料庫(Data Base)450中比對到顧客臉部特徵的資料，則在資料庫450中新增顧客ID及記錄其臉部特徵的影像資料，並新增其注視商品的相關記錄；若是步驟720可以在資料庫(Data Base)450中比對到顧客臉部特徵的資料，則對比對到ID的顧客新增其注視商品的相關記錄，此外亦可以根據設定決定是否新增該顧客的臉部特徵的影像資料於資料庫(Data Base)450中。於本發明的一實施例，步驟725中除了記錄顧客ID、顧客的臉部特徵的影像資料之外，還會記錄注視商品的相關資訊，包括：日期/時間、注視商品的名稱/代號、注視商品的持續時間、注視商品的次數等資料。 In step 725, if step 720 fails to match the customer's facial features in the database 450, a new customer ID and the image data of the customer's facial features are added to the database 450, together with a record of the product the customer is gazing at; if step 720 does match the customer's facial features in the database 450, a record of the gazed product is added for the matched customer ID, and, depending on the settings, the image data of the customer's facial features may also be added to the database 450. In an embodiment of the present invention, besides recording the customer ID and the image data of the customer's facial features, step 725 also records information about the gazed product, including the date/time, the name/code of the product, the duration of the gaze, and the number of times the product was gazed at.

依據本發明的另一實施例,圖9中的步驟720與步驟725皆省略,以增進影像資料擷取與分析系統100的整體的效能。在這種情況下,不執行步驟720中「與資料庫(Data Base)450做比對以辨識顧客ID」的步驟,對於圖片中的每一顧客都當成新顧客;在這種情況下,也不會執行步驟725中「在資料庫(Data Base)450中記錄顧客的臉/頭部的影像資料與其注視商品」的步驟。這些相關的顧客資料只會暫存於影像分析伺服器300裡的動態隨機存取記憶體(DRAM)320、非揮發性記憶體(NVRAM)330、或其他適合的儲存空間,不做永久性保存。 According to another embodiment of the present invention, both step 720 and step 725 in FIG. 9 are omitted to improve the overall performance of the image data acquisition and analysis system 100 . In this case, the step of "comparing with the database (Data Base) 450 to identify the customer ID" in step 720 is not performed, and each customer in the picture is regarded as a new customer; The step of "recording the image data of the customer's face/head in the database (Data Base) 450 and watching the product" in step 725 will not be performed. These relevant customer data will only be temporarily stored in the dynamic random access memory (DRAM) 320, non-volatile memory (NVRAM) 330, or other suitable storage spaces in the image analysis server 300, and will not be stored permanently .

依據本發明的另一實施例,除了上述於步驟720中「對圖片中的顧客做身份辨識」之外,更進一步包含辨識顧客的屬性資料,例如:性別、年紀、職業等,但不以此為限。請參考圖9中以虛線繪示的部分,若以辨識顧客的性別、年紀、職業為例,在步驟720之後可以接著執行步驟722A、步驟722B、步驟722C等3步驟。這3個步驟都是以虛線來繪示,代表這3個步驟可以選擇性執行或省略。在實際應用上,執行較多顧客屬性的辨識很可會降低系統100的整體效能,所以使用者可以視需求而決定是否增加辨識顧客的屬性。 According to another embodiment of the present invention, in addition to the above-mentioned "identify the customer in the picture" in step 720, it further includes identifying the attribute data of the customer, such as: gender, age, occupation, etc., but not based on this limit. Please refer to the part shown by the dotted line in FIG. 9 , if the identification of the customer's gender, age, and occupation is taken as an example, after step 720, three steps such as step 722A, step 722B, and step 722C can be executed. These 3 steps are all drawn with dotted lines, which means that these 3 steps can be selectively performed or omitted. In practical applications, performing identification of more customer attributes may reduce the overall performance of the system 100 , so users can decide whether to increase the identification of customer attributes according to requirements.

步驟722A中,係對圖片中的顧客做性別辨識與記錄。同上述,本發明基於CNN的架構,利用與性別相關的資料集(Data Set)做深度學習後,影像分析應用程式460學習得到辨識主題為性別的「特徵對映(feature mapping)」,可以據以辨識圖片中的顧客是年輕的女性、年長的女性、年輕的男性、年長的男性、女童或男童、或是幼兒等等。當辨識出顧客的性別後也會在資料庫450內該顧客的性別欄位新增其性別資料。同前述,步驟722A係以虛線來繪示,代表該步驟722A可以省略。 In step 722A, identify and record the gender of the customer in the picture. As mentioned above, the present invention is based on the architecture of CNN. After using the gender-related data set (Data Set) for in-depth learning, the image analysis application program 460 learns to obtain the "feature mapping (feature mapping)" that identifies the subject as gender. To identify whether the customer in the picture is a young woman, an old woman, a young man, an old man, a girl or a boy, or a toddler, etc. When the gender of the customer is identified, the gender data of the customer will be added to the gender field of the customer in the database 450 . Same as above, the step 722A is shown with a dotted line, which means that the step 722A can be omitted.

步驟722B中,係對圖片中的顧客做年紀推估與記錄。同上述,本發明基於CNN架構,利用與年紀相關的資料集(Data Set)做深度學習後,影像分析應用程式460學習得到辨識主題為年紀的「特徵對映(feature mapping)」,可以據以推估圖片中顧客的年紀。在訓練階段,資料庫450提供愈多的各年齡的男/女性/小孩的資料集(Data Set)供做深度學習,則步驟722B中影像分析應用程式460所推估的顧客年紀就會愈正確。當推估出顧客的年紀後也會在資料庫450內該顧客的年紀欄位新增其年紀資料。同前所述,步驟722B係以虛線來繪示,代表該步驟722B可以省略。 In step 722B, it is to estimate and record the age of the customers in the pictures. As mentioned above, the present invention is based on the CNN architecture. After deep learning is done using age-related data sets (Data Set), the image analysis application program 460 learns to obtain the "feature mapping (feature mapping)" that identifies the subject as age. Estimate the age of the customer in the picture. In the training phase, the more data sets (Data Set) of men/women/children of various ages are provided by the database 450 for deep learning, the more accurate the customer age estimated by the image analysis application 460 in step 722B will be. . After the age of the customer is estimated, the age data of the customer will be added to the age field of the customer in the database 450 . As mentioned above, the step 722B is drawn with a dotted line, which means that the step 722B can be omitted.

步驟722C中,係對圖片中的顧客做職業推估與記錄。同上述,本發明基於CNN架構,利用與職業相關的資料集(Data Set)做深度學習後,影像分析應用程式460學習得到辨識主題為職業的「特徵對映(feature mapping)」,可以據以推估圖片中顧客的職業。在訓練階段,資料庫450提供愈多的各種職業所穿著的服裝、或是其工作時所使用工具的資料集(Data Set)供做深度學習,則步驟722C中影像分析應用程式460所推估出的顧客的職業就會愈接近事實。當推估出顧客的職業後也會在資料庫450內該顧客的職業欄位新增其職業資料。同前所述,步驟722C係以虛線來繪示,代表該步驟722C可以省略。 In step 722C, it is to estimate and record the occupation of the customer in the picture. As mentioned above, the present invention is based on the CNN architecture, and after deep learning is done using the data set (Data Set) related to the occupation, the image analysis application program 460 learns and obtains the "feature mapping (feature mapping)" that identifies the subject as the occupation, which can be based on Estimate the occupation of the customer in the picture. In the training phase, the database 450 provides more clothing worn by various occupations, or data sets (Data Set) of tools used in their work for deep learning, the image analysis application 460 estimates in step 722C The occupation of the customer will be closer to the truth. After the customer's occupation is estimated, the customer's occupation information will be added to the occupation field of the customer in the database 450 . As mentioned above, the step 722C is drawn with a dotted line, which means that the step 722C can be omitted.
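Steps 722A-722C above are optional and can be enabled independently. An illustrative sketch of how such switchable attribute steps might be wired together is shown below; the classifier stubs stand in for the trained CNN "feature mapping" models, and every name and returned value is hypothetical:

```python
# Stubs standing in for the CNN attribute models of steps 722A-722C;
# each would take a head/face crop and return a predicted label.
def classify_gender(crop): return "female"
def classify_age(crop):    return 34
def classify_job(crop):    return "nurse"

OPTIONAL_STEPS = {          # each step may be enabled or skipped (dashed boxes)
    "gender": classify_gender,
    "age": classify_age,
    "occupation": classify_job,
}

def enrich_record(record, crop, enabled=("gender",)):
    """Run only the enabled optional attribute steps, trading
    attribute richness against overall system throughput."""
    for name in enabled:
        record[name] = OPTIONAL_STEPS[name](crop)
    return record

rec = enrich_record({"customer_id": "member-42"}, crop=None)
assert rec == {"customer_id": "member-42", "gender": "female"}
```

Disabling a step simply omits its key from the record, mirroring the patent's point that running fewer attribute classifiers preserves overall performance.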

步驟730中,係對圖片中顧客頭部方向(Head Direction)的偵測。承前面步驟710的說明,在步驟710執行完畢之後,會以平行處理的方式執行步驟730中「顧客頭部方向的偵測」與步驟735中「顧客頭部位置之估算」。依據本發明的一實施例,進行步驟730之前,須先在圖5的訓練階段中利用如圖7A的資料集(DataSet)讓影像分析應用程式460做深度學習,以學習得到的主題為頭部方向的「特徵對映(feature mapping)」,然後再依據該頭部方向的「特徵對映(feature mapping)」判定圖片中顧客頭部方向最可能的態樣。以圖7A為例,其中編號1-91號的圖片呈現同一女性於各種不同頭部方向的影像,每一影像都有相對應的標籤(label),每一標籤(label)內的記載的資訊至少包括「主題(subject)」(例如:女性的頭部)、及「主題(subject)的翻滾(Roll)、俯仰(Pitch)、偏擺(Yaw)三數值」。由上所述可知:編號1-91號的圖片係呈現一女性的臉/頭部於不同的「頭部方向」下的態樣。於實際應用時,圖片中顧客的頭部方向(即由頭部的翻滾(Roll)、俯仰(Pitch)、偏擺(Yaw)等值所形成的空間中翻轉自由度)大致上可以對應到學習過的編號1-91其中一種狀況,因此影像分析應用程式460可以由過去訓練階段時深度 學習所得到的「特徵對映(feature mapping)」,來判定圖片中顧客頭部方向相對應的Roll,Pitch,Yaw等數值。若圖片中該顧客頭部方向並不完全符合上述編號1-91的其中之一者,步驟730中也會依照學習到的「特徵對映(feature mapping)」對顧客的頭部方向做一最接近的判斷,推估其最可能的Roll,Pitch,Yaw等數值。此外,根據本發明一實施例,步驟730可以進一步計算顧客頭部於圖片中的位置與範圍大小,並把圖片中屬於顧客臉/頭部的部分區分(標示)出來。在可以辨識的情況下,若一圖片中包括10位顧客,則10位顧客的頭部方向都會被偵測出來,且進一步該10位顧客的臉/頭部也會在圖片中被標示出來,如圖8A或是圖8B中以顏色粗框標示的顧客的臉/頭部。 In step 730, it is to detect the head direction (Head Direction) of the customer in the picture. Following the previous description of step 710, after step 710 is executed, the "detection of customer's head direction" in step 730 and the "estimation of customer's head position" in step 735 will be executed in parallel. According to an embodiment of the present invention, before performing step 730, the image analysis application program 460 must first use the data set (DataSet) as shown in FIG. 7A to do deep learning in the training phase of FIG. According to the "feature mapping" of the direction of the head, and then according to the "feature mapping" of the head direction, the most likely shape of the customer's head direction in the picture is determined. 
Taking FIG. 7A as an example, the pictures numbered 1-91 present images of the same woman in various head directions; each image has a corresponding label, and the information recorded in each label includes at least the "subject" (for example, a woman's head) and the "roll, pitch, and yaw values of the subject". From the above, it can be seen that the pictures numbered 1-91 show a woman's face/head under different head directions. In practical applications, the head direction of a customer in a picture (that is, the rotational degrees of freedom in space formed by the head's roll, pitch, and yaw values) roughly corresponds to one of the learned cases numbered 1-91, so the image analysis application 460 can use the feature mapping obtained through deep learning in the earlier training phase to determine the roll, pitch, and yaw values corresponding to the customer's head direction in the picture. If the customer's head direction in the picture does not exactly match any of the cases numbered 1-91, step 730 still makes a closest-match judgment according to the learned feature mapping and estimates the most likely roll, pitch, and yaw values. In addition, according to an embodiment of the present invention, step 730 may further calculate the position and extent of the customer's head in the picture and mark out the part of the picture belonging to the customer's face/head. Provided they can be recognized, if a picture contains 10 customers, the head directions of all 10 customers will be detected, and their faces/heads will also be marked in the picture, such as the customers' faces/heads marked with thick colored frames in FIG. 8A or FIG. 8B.
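A closest-match lookup of the kind step 730 describes could be sketched as below; the three sample poses and the squared-angle distance metric are illustrative assumptions standing in for the learned feature mapping over the 91 labelled images:

```python
def nearest_pose(pred, dataset_poses):
    """Snap a predicted (roll, pitch, yaw) triple, in degrees, to the
    closest labelled pose among the training images (e.g. poses 1-91)."""
    def angle_diff(a, b):
        d = abs(a - b) % 360          # angles wrap around at 360 degrees
        return min(d, 360 - d)
    def dist(p, q):
        return sum(angle_diff(a, b) ** 2 for a, b in zip(p, q))
    return min(dataset_poses, key=lambda p: dist(pred, p))

# Three labelled poses standing in for the 91-image data set of FIG. 7A.
poses = [(0, 0, 0), (0, 0, 45), (0, 30, 0)]
assert nearest_pose((2, -3, 40), poses) == (0, 0, 45)
```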

步驟735中,係對圖片中顧客臉/頭部在環境(賣場)中的位置做估算。本發明在執行圖9所示的應用流程之前,須先對影像擷取裝置400做校準(calibration)的工作,以得到影像擷取裝置400的相關參數(例如:Intrinsic/Extrinsic matrix),經過深度學習訓練後,步驟735可由影像擷取裝置400的參數與圖片中顧客的影像資料,推算出圖片中顧客與影像擷取裝置400的距離與相對位置。依據本發明的一實施例,該特定物體與該影像擷取裝置400的相對距離可以視為空間中的兩個點之間的距離。一般可以將影像擷取裝置400設為原點,所以影像擷取裝置400於3D立體空間的座標表示為(0,0,0)。而若可以推算出該特定物體於3D立體空間的座標,例如:(100,552,211),則特定物體相對於原點(影像擷取裝置400)的距離就是一可計算的長度。依據本發明的另一實施例,3D立體空間中的原點亦可設定為圖片中賣場的兩面相鄰牆壁與地面的三角交會點。總而言之,本發明先對影像擷取裝置400做校準(calibration)的工作,以得知2D圖片中的特定點對影像擷取裝置400或參考點(設定的原點)在3D空間中的相對距離。據此,步驟735中便可利用校準後的影像擷取裝置400估算顧客頭部在3D空間中的位置。 In step 735, the position of the customer's face/head in the picture in the environment (store) is estimated. In the present invention, before executing the application process shown in FIG. 9 , the image capture device 400 must be calibrated to obtain relevant parameters of the image capture device 400 (for example: Intrinsic/Extrinsic matrix). After learning and training, step 735 can calculate the distance and relative position between the customer and the image capture device 400 in the picture based on the parameters of the image capture device 400 and the image data of the customer in the picture. According to an embodiment of the present invention, the relative distance between the specific object and the image capture device 400 can be regarded as the distance between two points in space. Generally, the image capture device 400 can be set as the origin, so the coordinates of the image capture device 400 in the 3D three-dimensional space are expressed as (0,0,0). And if the coordinates of the specific object in the 3D space can be calculated, for example: (100, 552, 211), then the distance of the specific object relative to the origin (the image capture device 400 ) is a calculable length. According to another embodiment of the present invention, the origin in the 3D three-dimensional space can also be set as the triangular intersection point between two adjacent walls and the ground of the store in the picture. 
In short, the present invention first calibrates the image capture device 400 so as to obtain the distance, in 3D space, of a specific point in the 2D picture relative to the image capture device 400 or to a reference point (the set origin). Accordingly, in step 735 the calibrated image capture device 400 can be used to estimate the position of the customer's head in 3D space.
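A minimal sketch of how a calibrated camera's intrinsic parameters could place a head in 3D, with the image capture device as the origin (0, 0, 0), is shown below. The patent does not prescribe a specific calibration model; the pinhole back-projection, the intrinsic values, and the depth input are illustrative assumptions:

```python
def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project a 2D pixel (u, v) to a 3D point in the camera frame
    using intrinsic parameters obtained from calibration (pinhole model)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def distance_from_camera(point):
    """Distance of a 3D point from the origin, i.e. from the camera."""
    return sum(c * c for c in point) ** 0.5

# Illustrative intrinsics for a 640x480 camera (fx = fy = 500, principal
# point at the image centre); the depth value would come from elsewhere,
# e.g. apparent head size.
head = pixel_to_camera(420, 240, depth=2.0, fx=500, fy=500, cx=320, cy=240)
assert head == (0.4, 0.0, 2.0)
assert abs(distance_from_camera(head) - 4.16 ** 0.5) < 1e-12
```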

步驟740中,係推測顧客的注視區域(Gazing Area)及估算該注視區域中各部份可能是顧客視線落點的機率(probability)。依據本發明的一實施例,經過步驟730中顧客頭部方向之偵測與步驟735中顧客頭部位置的估算後,可以得知圖片中某一顧客的「頭部方向」(roll、pitch、yaw三數值)與顧客頭部於立體空間中的位置。透過此二種資訊,進行以下模擬:從顧客頭部的正後方設置一虛擬光源,該虛擬光源以與該頭部正前方的方向投射出光線(light),設定該投射光線延伸一設定距離、或是一直延伸直到有物體(商品)的位置,然後在該設定距離處、或該物體(商品)位置處,假設有一模擬的屏幕且該投射光線在上面形成一模擬的投影(projection)。該模擬投影所形成的投影區域是該顧客可能的注視區域,如圖11A與圖11B中所示的橢圓形投影區域815A、815B。根據本發明一實施例,該投影區域中進一步區分為複數個子投影區域,其中具有最高機率的子投影區域是該顧客的視線最可能聚焦的範圍,也可以推論是環境(賣場)中最引起顧客興趣的範圍,如圖11B中的子投影區域820。以圖11B為例,子投影區域820中所顯示的百分比數值為95%,其代表該顧客可能注視該子投影區域820的機率為95%。此外,以該子投影區域820為中心向外環狀放射的其他子投影區域830、840分別顯示有不同的機率數值,各是65%與15%,同樣地也代表顧客可能注視該子投影區域的機率。一般來說,人們的視線應該大多看向正前方並較多機會凝視自己正前方的物體。以此假設為前提,根據本發明一實施例,設定以該橢圓形投影區域的中心點為基準點劃分出複數個同心的子投影區域,離開中心點越遠的子投影區域,其被注視的機率愈小,如圖11B所示。根據本發明另一實施例,考量人們的眼球會轉動因此其視線可能並非看向正前方,因此設定劃分各個子投影區域的基準點偏離投影區域的中心,也可能偏左、偏右、偏上、或偏下等等。在步驟740中,圖片中的每一顧客依據其不同的頭部方向都會有各自的投影區域, 每一顧客各自的投影區域代表該每一顧客的注視區域。 In step 740, the customer's gaze area (Gazing Area) is estimated and the probability (probability) that each part of the gaze area may be the customer's gaze point is estimated. According to an embodiment of the present invention, after the detection of the customer's head direction in step 730 and the estimation of the customer's head position in step 735, the "head direction" (roll, pitch, yaw three values) and the position of the customer's head in the three-dimensional space. Through these two kinds of information, the following simulation is carried out: a virtual light source is set from the back of the customer's head, and the virtual light source projects light in the direction directly in front of the head, and the projected light is set to extend a set distance, Or extend until there is an object (commodity), and then at the set distance or the position of the object (commodity), suppose there is a simulated screen on which the projected light forms a simulated projection. 
The projection area formed by this simulated projection is the customer's possible gaze area, such as the elliptical projection areas 815A and 815B shown in FIG. 11A and FIG. 11B. According to an embodiment of the present invention, the projection area is further divided into a plurality of sub-projection areas, in which the sub-projection area with the highest probability is the range on which the customer's line of sight is most likely to focus, and can also be inferred to be the part of the environment (store) that most interests the customer, such as the sub-projection area 820 in FIG. 11B. Taking FIG. 11B as an example, the percentage displayed in the sub-projection area 820 is 95%, meaning the probability that the customer is gazing at the sub-projection area 820 is 95%. In addition, the other sub-projection areas 830 and 840, radiating outward in rings centered on the sub-projection area 820, display different probability values, 65% and 15% respectively, likewise representing the probability that the customer is gazing at those sub-projection areas. Generally speaking, people mostly look straight ahead and are more likely to gaze at objects directly in front of them. On this assumption, according to an embodiment of the present invention, a plurality of concentric sub-projection areas are delimited with the center point of the elliptical projection area as the reference point; the farther a sub-projection area is from the center point, the smaller its probability of being gazed at, as shown in FIG. 11B. According to another embodiment of the present invention, considering that people's eyeballs rotate and their line of sight may therefore not point straight ahead, the reference point used to delimit the sub-projection areas may be set off the center of the projection area, for example toward the left, right, upper, or lower side.
In step 740, each customer in the picture has his or her own projection area according to his or her head direction, and each customer's projection area represents that customer's gaze area.
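One possible way to encode the concentric sub-projection areas and their 95%/65%/15% probabilities is sketched below; the ellipse parameters and the band boundaries are illustrative assumptions, not values from the patent:

```python
def gaze_probability(point, center, axes,
                     bands=((0.33, 0.95), (0.66, 0.65), (1.0, 0.15))):
    """Probability that `point` is the gaze target, given an elliptical
    projection area with centre `center` (x, y) and semi-axes `axes`
    (a, b), divided into concentric sub-areas whose probability drops
    with distance from the reference point (95% / 65% / 15%)."""
    nx = (point[0] - center[0]) / axes[0]
    ny = (point[1] - center[1]) / axes[1]
    rho = (nx * nx + ny * ny) ** 0.5  # 0 at the centre, 1 at the ellipse edge
    for outer, prob in bands:
        if rho <= outer:
            return prob
    return 0.0  # outside the projection area entirely

area_center, area_axes = (0.0, 0.0), (1.5, 1.0)
assert gaze_probability((0.1, 0.1), area_center, area_axes) == 0.95
assert gaze_probability((0.9, 0.0), area_center, area_axes) == 0.65
assert gaze_probability((3.0, 0.0), area_center, area_axes) == 0.0
```

Shifting `center` off the geometric middle of the ellipse corresponds to the embodiment where the reference point is moved left, right, up, or down to account for eye rotation.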

請參考圖11C，根據本發明另一實施例，假設一顧客向左轉角度θ往一方向(3)觀看，使用步驟735中之將影像擷取裝置400設為原點O，推算出顧客頭部的中心點H與原點間的距離跟顧客頭部正前方向左轉角度θ。接著，依據一旋轉矩陣公式計算出一注視點(Gazing Point)G，進而以注視點G與顧客頭部的中心點H所形成之一第一向量的方向推算出位於顧客頭部的中心點H正後方、距離中心點H為的適當位置處P，為一虛擬光源的投影起始點(Fictitious Projection Point)P；該適當位置P，於本發明一實施例中，可以是頭部後方大約3倍的頭部模擬球體直徑之處。 Referring to FIG. 11C, according to another embodiment of the present invention, suppose a customer turns left by an angle θ to look in a direction (3). Using the setting of step 735 that takes the image capture device 400 as the origin O, the distance between the center point H of the customer's head and the origin, together with the leftward turning angle θ of the head relative to straight ahead, is calculated. Next, a gazing point G is computed according to a rotation matrix formula, and then, along the direction of a first vector formed by the gazing point G and the center point H of the customer's head, a suitable position P directly behind the center point H, at a given distance from the center point H, is derived as the projection starting point (fictitious projection point) P of a virtual light source; in an embodiment of the present invention, the suitable position P may be located behind the head at about 3 times the diameter of the head-modeling sphere.
At this point, the equation (x-Hx)² + (y-Hy)² = r² is used to model the outline of the customer's head, where (Hx, Hy) is the center point of the head-modeling circle and r is its radius. Then, using the tangent-point equations, the two tangent points L and R between the virtual projection starting point P and the head outline modeled by (x-Hx)² + (y-Hy)² = r² are calculated, thereby forming two vectors: a second vector PL formed by the tangent point L and the virtual projection starting point P, and a third vector PR formed by the tangent point R and the virtual projection starting point P. Finally, based on the intersections of the aforementioned second vector PL and third vector PR with the screen 810, a simulated projection area (namely, the projection area 820) is formed.
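This 2D construction can be sketched as follows (outside the patent's own disclosure). The sketch assumes P sits 3 head diameters (6r) behind the head centre, so the tangent rays to the head circle open a cone of half-angle asin(r / 6r); all concrete numbers are illustrative:

```python
import math

def rotate(v, theta):
    """Apply the 2D rotation matrix to vector v (counter-clockwise, radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def projection_interval(H, forward, theta, r, screen_dist):
    """Sketch of FIG. 11C: the head at centre H looks `theta` left of
    `forward` (a unit vector); a fictitious projection point P sits 6r
    behind H along the gaze line.  Rays from P tangent to the head
    circle (x-Hx)^2 + (y-Hy)^2 = r^2 open a cone of half-angle
    asin(r/|PH|); on a screen `screen_dist` ahead of P the cone spans
    an interval whose half-width is returned."""
    d = rotate(forward, theta)                  # gaze direction unit vector
    G = (H[0] + d[0], H[1] + d[1])              # gazing point, 1 unit ahead
    P = (H[0] - 6 * r * d[0], H[1] - 6 * r * d[1])
    half_angle = math.asin(r / (6 * r))         # tangency condition
    half_width = screen_dist * math.tan(half_angle)
    return G, P, half_width

G, P, hw = projection_interval(H=(0.0, 0.0), forward=(1.0, 0.0),
                               theta=0.0, r=0.1, screen_dist=3.0)
assert G == (1.0, 0.0)
assert abs(P[0] + 0.6) < 1e-9 and P[1] == 0.0
assert abs(hw - 3.0 * math.tan(math.asin(1 / 6))) < 1e-9
```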

根據本發明另一實施例,可以依據需求推算所需數量之切點,以形成對應數量之向量,進而得出前述向量在屏幕810上之交點,並據以形成一區域(即為投影區域820)。 According to another embodiment of the present invention, the required number of tangent points can be calculated according to the demand to form a corresponding number of vectors, and then the intersection points of the aforementioned vectors on the screen 810 can be obtained to form an area (that is, the projection area 820) .

請參考圖11D,根據本發明另一實施例,假設一顧客頭部向右翻滾角度α、向上俯仰角度β以及向右偏擺角度γ往一方向(4)觀看,使用步驟735中之將影像擷取裝置400設為原點O,推算出顧客頭部的中心點H與原點間的距離跟顧客頭部正前方向(方向(4))的頭部轉動角度。接著,依據一旋轉矩陣公式計算出一注視點(Gazing Point)G,進而以注視點G與顧客頭部的中心點H所形成之一第四向 量的方向推算出位於顧客頭部的中心點H正後方、距離中心點H為r的位置處,為一虛擬投影起始點(Fictitious Projection Point)P;該r位置,於本發明一實施例中,可以是頭部後方位於頭部模擬球體直徑的3.5倍之處。此時,以一方程式(x-Hx)2+(y-Hy)2+(z-Hz)2=r2模擬顧客頭部外圍形狀,並透過一切點方程式推算出虛擬投影起始點P與方程式(x-Hx)2+(y-Hy)2+(z-Hz)2=r2所模擬的顧客頭部外圍形狀間之二個切點A與B,從而形成二向量,亦即由切點A與虛擬投影點P、由切點B與虛擬投影點P所形成一第五向量PA跟一第六向量PB。最後,依據前述之第二向量PA跟第三向量PB在一第一虛擬平面(屏幕810)上之交點,據以形成一模擬投影區域(即為投影區域820)。 Please refer to FIG. 11D , according to another embodiment of the present invention, assuming that a customer's head is looking at a direction (4) with a right roll angle α, an upward pitch angle β, and a right yaw angle γ, using the image in step 735 The capture device 400 is set as the origin O, and calculates the distance between the center point H of the customer's head and the origin and the head rotation angle of the customer's head in the forward direction (direction (4)). Next, a gaze point (Gazing Point) G is calculated according to a rotation matrix formula, and then the center point H located on the customer's head is calculated from the direction of a fourth vector formed by the gaze point G and the center point H of the customer's head Right behind, at a position of r from the central point H, is a fictitious projection starting point (Fictitious Projection Point) P; the r position, in one embodiment of the present invention, can be located at the diameter of the simulated sphere behind the head 3.5 times of that. 
At this point, the equation (x-Hx)² + (y-Hy)² + (z-Hz)² = r² is used to model the outline of the customer's head, and, using the tangent-point equations, the two tangent points A and B between the virtual projection starting point P and the head outline modeled by (x-Hx)² + (y-Hy)² + (z-Hz)² = r² are calculated, thereby forming two vectors: a fifth vector PA formed by the tangent point A and the virtual projection point P, and a sixth vector PB formed by the tangent point B and the virtual projection point P. Finally, based on the intersections of the aforementioned fifth vector PA and sixth vector PB with a first virtual plane (the screen 810), a simulated projection area (namely, the projection area 820) is formed.

根據本發明另一實施例,參酌人因工程(Human Factors Engineering)領域的研究,人類眼睛的視野(visual fields)在頭部靜止不轉動的狀態下,單眼視野在垂直方向上約有120°~140°的視野大小、在水平方向上約有150°的視野大小,是故雙眼所共同的視野領域在垂直方向、水平方向上分別約有60°、90°的視野大小。若要更仔細觀看一物體(標的)的話(也意味著對於該物體(標的)的專注度越高),則視野領域的角度必然會更狹窄。因此,於本發明一實施例,採用垂直方向為30°的角度、水平方向為30°的角度所形成的一或多條虛擬投影線定義訪客雙眼的共同視野領域,其中該一或多條虛擬投影線往後延伸交會於該訪客之人頭部後方的約6r處(r為頭部半徑)之一點(一虛擬投影點P,如圖11C所示)。從該點(該虛擬投影點P)沿著該一或多條虛擬線(例如圖11C所示之該第五向量PA跟該第六向量PB)來模擬一圓錐狀的模擬投影照射區域,並在一或多個虛擬平面(例如圖11D之該第一虛擬平面(屏幕810)、一第二虛擬平面(屏幕812))形成該訪客之一或多個模擬投影區域(例如圖11D之該投影區域820、一投影區域822)。若該物品(標的)的部份或全部為該圓錐狀模擬投影照射區域內,則表示該物品(標的)為該顧客所關注的標的,其中,若該物品(標的)的位置越靠 近中心線(例如圖11C中之由點H與點O所形成一第一中心線及由點H與點G所形成一第二中心線),則表示該顧客不僅關注該物品(商品)且關注該物品(商品)的程度越高。 According to another embodiment of the present invention, with reference to the research in the field of Human Factors Engineering, the visual field of the human eye (visual fields) is about 120° in the vertical direction when the head is still and does not rotate. The field of view of 140° has a field of view of about 150° in the horizontal direction, so the common field of vision of both eyes has a field of view of about 60° and 90° in the vertical direction and horizontal direction, respectively. If you want to watch an object (target) more carefully (also means that the concentration of the object (target) is higher), the angle of the field of view will inevitably be narrower. Therefore, in one embodiment of the present invention, one or more virtual projection lines formed by an angle of 30° in the vertical direction and an angle of 30° in the horizontal direction are used to define the common visual field of the visitor's eyes, wherein the one or more The virtual projection line extends backward and intersects at a point (a virtual projection point P, as shown in FIG. 11C ) about 6r behind the visitor's head (r is the radius of the head). From the point (the virtual projection point P) along the one or more virtual lines (such as the fifth vector PA and the sixth vector PB shown in FIG. 
11C) to simulate a conical projection region, and one or more simulated projection areas of the visitor (such as the projection area 820 and a projection area 822 in FIG. 11D) are formed on one or more virtual planes (such as the first virtual plane (screen 810) and a second virtual plane (screen 812) in FIG. 11D). If part or all of an item (target) lies within the conical simulated projection region, the item (target) is a target that the customer is paying attention to; moreover, the closer the item (target) is to the center line (such as a first center line formed by points H and O and a second center line formed by points H and G in FIG. 11C), the higher the customer's degree of attention to that item (product).
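The cone-of-view test described above can be sketched as below: a 30° total field of view corresponds to a 15° half angle around the line from the projection point through the head. Function and variable names are hypothetical:

```python
import math

def in_viewing_cone(P, gaze_dir, target, half_angle_deg=15.0):
    """Is `target` inside the conical projection region opening from the
    fictitious projection point P along `gaze_dir`?  The cosine of the
    angle to the axis also grows as the target nears the centre line,
    matching the idea that centre-line targets draw more attention."""
    v = [t - p for t, p in zip(target, P)]
    norm_v = math.sqrt(sum(c * c for c in v))
    norm_d = math.sqrt(sum(c * c for c in gaze_dir))
    cos_angle = sum(a * b for a, b in zip(v, gaze_dir)) / (norm_v * norm_d)
    return cos_angle >= math.cos(math.radians(half_angle_deg))

P, gaze = (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
assert in_viewing_cone(P, gaze, (0.1, 0.0, 2.0))      # near the centre line
assert not in_viewing_cone(P, gaze, (2.0, 0.0, 2.0))  # 45 degrees off axis
```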

根據本發明另一實施例，該旋轉矩陣公式可以是一個二維旋轉矩陣公式如下： According to another embodiment of the present invention, the rotation matrix formula may be a two-dimensional rotation matrix formula as follows:

R(θ) = | cos θ   −sin θ |
       | sin θ    cos θ |

根據本發明另一實施例，該旋轉矩陣公式可以是一個三維旋轉矩陣公式如下： According to another embodiment of the present invention, the rotation matrix formula may be a three-dimensional rotation matrix formula, for example the composition of the roll (α), pitch (β), and yaw (γ) rotations:

R(α, β, γ) = Rz(γ) · Ry(β) · Rx(α), where

Rx(α) = | 1      0        0     |   (roll)
        | 0    cos α   −sin α  |
        | 0    sin α    cos α  |

Ry(β) = |  cos β   0   sin β |   (pitch)
        |    0     1     0   |
        | −sin β   0   cos β |

Rz(γ) = | cos γ   −sin γ   0 |   (yaw)
        | sin γ    cos γ   0 |
        |   0        0     1 |

上述說明中,依據本發明的一實施例,只有當顧客停下腳步且其觀看某一物品(標的)的時間長度超過一設定的時間門檻值,才會針對他們推測其注視區域(Gazing Area);其中的設定時間可能是,例如:2秒鐘,但不以此為限,可以是設定的任一時間長度。依據本發明的另一實施例,若是顧客正在移動且其注視視線持續於同一物品(標的)達超過一設定的時間門檻值以上(例如:2秒鐘以上,但不以此為限),也會針對他們推測其注視區域(Gazing Area)。除了無法辨識出顧客的頭部方向的情形外,圖片中的每一顧客都應可估算出其注視區域(投影區域)與其中子投影區域中被注視的機率大小。上述「無法辨識顧客的頭部方向」的原因可能是:(1)圖片中某一顧客的頭部影像不完整,例如:該顧客的頭部影像有大面積被其他顧客或物體擋住;(2)圖片中某一顧客的頭部影像範圍所包含的畫素太小,致使該顧客的頭部無法被辨識出來。依據本發明所揭露的技術,圖片中顧客的頭部影像就算沒有包括臉部的眼睛(例如:後側的頭部影像),該顧客的頭部方向還是可以被辨識出來。依據本發明的另一實施例,當圖片中顧客的頭部影像明顯地呈現出臉部的眼睛特徵時,可以計算並參考顧客的視線方向,將上述投影區域做進一步限縮,以增加估算顧客注視區域的正確性。加上 計算視線方向後限縮的注視區域可能如圖11B中的820E所示。步驟740執行完畢,即跳至步驟750執行。 In the above description, according to an embodiment of the present invention, only when the customer stops and the time length of viewing a certain item (target) exceeds a set time threshold value, the Gazing Area (Gazing Area) will be estimated for them ; The set time may be, for example, 2 seconds, but not limited thereto, and may be any set time length. According to another embodiment of the present invention, if the customer is moving and his gaze continues on the same item (target) for more than a set time threshold (for example: more than 2 seconds, but not limited thereto), also The Gazing Area will be estimated for them. In addition to the situation where the head direction of the customer cannot be identified, each customer in the picture should be able to estimate the probability of being watched in its gaze area (projection area) and its sub-projection area. The reasons for the above "unrecognizable customer's head direction" may be: (1) The head image of a certain customer in the picture is incomplete, for example: a large area of the customer's head image is blocked by other customers or objects; (2) ) image range of a certain customer's head in the picture contains too small pixels, so that the customer's head cannot be recognized. 
According to the technology disclosed in the present invention, even if the head image of a customer in the picture does not include the eyes of the face (for example, a head image taken from behind), the customer's head direction can still be recognized. According to another embodiment of the present invention, when the head image of a customer in the picture clearly shows the eye features of the face, the customer's line-of-sight direction can be calculated and referred to in order to further narrow the above projection area, thereby increasing the accuracy of the estimated gaze area. The gaze area narrowed by additionally taking the line-of-sight direction into account may look like 820E in FIG. 11B. After step 740 is completed, the process jumps to step 750.

回到上述步驟705,當步驟705對影像資料的特定圖片中具有特定特徵的部分辨識為「物(商品)」時,則進入步驟715。 Going back to step 705 above, when step 705 recognizes a part with a specific feature in a specific picture of the image data as an "object (commodity)", then go to step 715 .

步驟715中,係對圖片中辨識為「物(商品)」的標的所進入的步驟。本發明基於CNN的架構,在訓練階段時透過資料庫450內各種「物(商品)」的資料集(Data Set)做深度學習,之後影像分析應用程式460可以學習到主題為各種「物(商品)」的「特徵對映(feature mapping)」,所以在步驟715中可以辨識出圖片(幀)中屬於「物(商品)」的標的。除了無法辨識的商品外,圖片中的所有商品應都可被辨識出來。上述「無法辨識商品」的原因可能是:(1)圖片中某一商品的影像資料不完整,例如:該商品的影像範圍有大面積被其他的顧客或物品擋住;(2)圖片中該商品的影像範圍所包含的畫素太小,致使該商品無法被辨識出來。由於圖片中所包含的商品其影像範圍大小不一,所以過小影像範圍的商品可能無法有效地被辨識出來。於本發明一實施例,商品可以順利被偵測並被辨識出來的畫素大小為40X40畫素(含)以上。該40X40畫素的門檻值僅是本發明提供的一實施參考,本發明並不以此為限,實際上最低可被辨識的畫素門檻值可能是任意其它數值的畫素大小,端視軟硬體的能力而定。一般來說商場中的商品種類非常的多,為了可以正確地辨識出每一種商品,故在訓練階段,每一種商品的資料集(Data Set)都必須先提供給影像分析應用程式460做深度學習。於本發明一實施例,定義各個商品的資料集(Data Set)中,其影像資料也呈現了「該商品於立體空間中的不同旋轉自由度下所呈現的態樣」,如圖7B與圖7C所示;甚至在影像資料的標籤(Label)中記錄了代表「空間中旋轉自由度」的翻滾(Rell)、俯仰(Pitch)、偏 擺(Yaw)三個旋轉角度的數值。步驟715執行完畢接著執行步驟745。 In step 715, it is a step to enter into the target identified as "object (commodity)" in the picture. The present invention is based on the architecture of CNN. During the training phase, deep learning is done through the data sets (Data Set) of various "things (commodities)" in the database 450. Afterwards, the image analysis application program 460 can learn that the themes are various "things (commodities)". )” of “feature mapping (feature mapping)”, so in step 715, the target belonging to “object (commodity)” in the picture (frame) can be identified. All products in the image should be identifiable, except for unrecognizable products. The reasons for the above "unidentifiable product" may be: (1) the image data of a certain product in the picture is incomplete, for example: a large area of the image of the product is blocked by other customers or objects; (2) the product in the picture The image area for contains too small pixels for the item to be recognizable. Because the image size of the products included in the picture is different, the products with too small image size may not be effectively identified. 
In one embodiment of the present invention, the minimum pixel size at which products can be successfully detected and identified is 40×40 pixels. This 40×40-pixel threshold is only an implementation reference provided by the present invention, and the present invention is not limited thereto; in practice, the minimum recognizable pixel threshold may be any other pixel size, depending on the capabilities of the software and hardware. Generally speaking, there are very many kinds of products in a store; in order to correctly recognize each kind of product, the data set of each product must first be provided to the image analysis application 460 for deep learning during the training phase. In one embodiment of the present invention, in the data set defining each product, the image data also presents the appearance of the product under different rotational degrees of freedom in three-dimensional space, as shown in FIG. 7B and FIG. 7C; the labels of the image data even record the roll, pitch, and yaw rotation angles that represent the rotational degrees of freedom in space. After step 715 is executed, step 745 is executed.

依據本發明的另一實施例,步驟715可以不被執行,也就是不對影像資料中有關商品的部分做偵測。有關商品的部分,改為採用一種輸入商品資料的方式,供影像分析應用程式460參考與比對,其細節將於後文中說明。於此實施例,在訓練階段可以不提供商品的資料集(Data Set)供影像分析應用程式460做深度學習。 According to another embodiment of the present invention, step 715 may not be executed, that is, the portion of the product in the image data is not detected. For the part about the product, a method of inputting product data is used instead for reference and comparison by the image analysis application program 460 , the details of which will be described later. In this embodiment, the product data set (Data Set) may not be provided for the image analysis application 460 to do deep learning during the training phase.

In step 745, the positions of products in the environment (store) are estimated or obtained. According to an embodiment of the present invention, in step 745 a product's position is estimated by identifying the product in the image data (picture). For the same reason as in step 735, before the position of a specific point (object) in a 2D picture can be converted into a position in 3D space, the image capture device 400 must first be calibrated. Therefore, according to an embodiment of the present invention, the position of a product estimated in step 745 may be expressed in 3D space coordinates, for example (200, 20, 525), or in 2D plane coordinates, for example (550, 200).
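The 2D-to-3D conversion mentioned above can be illustrated with a standard pinhole back-projection, assuming the calibration has yielded the camera intrinsics and that a depth for the point is known or estimated. This is a minimal sketch of the general technique, not the patent's exact calibration procedure; the parameter names follow the usual pinhole-model convention.

```python
# Minimal pinhole-camera sketch: map a pixel (u, v) at known depth Z to a
# 3D point in the camera frame, given intrinsics obtained from calibration
# (focal lengths fx, fy and principal point cx, cy, all in pixels).
def pixel_to_3d(u, v, depth_z, fx, fy, cx, cy):
    """Back-project pixel (u, v) at depth Z into camera-frame coordinates."""
    x = (u - cx) * depth_z / fx
    y = (v - cy) * depth_z / fy
    return (x, y, depth_z)
```

A pixel at the principal point maps to a point straight ahead of the lens; pixels farther from the center map proportionally farther off-axis at the same depth.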

According to another embodiment of the present invention, step 745 may obtain the placement position of each product in the environment (store) where the image capture device 400 is located by referring to previously entered product data. The "entered product data" means that the preset placement position of each product is entered into the database 450 in advance, for example, at which spatial point/range in the environment (store) each product is placed, or at which position on which shelf in the store, and so on. In this case, the image analysis application 460 need not identify which products appear in the image data, nor estimate their positions by identifying them. When it is necessary to know which products lie in the customer's gaze direction (or within the region of interest), the purpose is achieved simply by referring to the product position data previously entered into the database 450.

In addition, it must be noted that in steps 735 and 745 of the present invention, for the "estimation of the customer's head position" and the "estimation of the product position", whichever position in 3D space is to be calculated, it is not necessary to use two camera lenses of two image capture devices 400 simultaneously; the function can be achieved with a single lens inside a single image capture device 400. Although a typical 3D video camera appears externally to be a single camera, it internally contains at least two photographic lenses. The technique of the present invention therefore need not rely on the multiple lenses of a 3D camera: with only one lens of an ordinary image capture device 400, the 3D position in the environment of each point (representing a person or object) in a 2D picture can be deduced.

In step 750, the information on the "customer gaze area and its probability" from step 740 and the "product positions" from step 745 is combined to estimate which products in the picture are covered by the customer's gaze area and the probability that each receives the customer's attention (interest). As shown in the flowchart of FIG. 9, the results of steps 740 and 745 are both output to step 750. After step 740, the possible coverage of the customer's gaze area in the picture and the probability that the gaze area contains the customer's point of regard are known; after step 745, the positions of the products in the environment (store) in 3D space are known. From the information provided by steps 740 and 745, step 750 can therefore infer which products are covered by the customer's gaze area, that is, "which products the customer in the picture is looking at". The customer's gaze area itself is further divided into sub-areas of different probabilities. The above "which products the customer in the picture is looking at" can thus be expressed as "which products each sub-area of the customer's gaze area covers and the probability of their being gazed at", or as "the products falling within the one or more highest-probability sub-areas of the customer's gaze area". If the customer's gaze area or one of its sub-areas covers two or more kinds of products, all of those products are included. According to another embodiment of the present invention, the product each customer in the picture is looking at is represented not by the product's item (category) but by the product's plane coordinates in the picture, or by the product's coordinates in space.
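The matching in step 750 can be sketched as testing each product position against the gaze sub-areas. The sketch below is illustrative only: it models the sub-areas as concentric circles on the virtual screen with example radii and probabilities, which the patent does not fix.

```python
import math

# Illustrative sketch of step 750: match product positions against a gaze
# area modeled as concentric circular sub-areas, innermost first.
# The (radius, probability) pairs are example values, not fixed by the patent.
SUB_AREAS = [(50.0, 0.95), (120.0, 0.65), (200.0, 0.15)]

def interest_probability(gaze_center, product_pos, sub_areas=SUB_AREAS):
    """Return the probability that the product receives the customer's gaze."""
    dist = math.dist(gaze_center, product_pos)
    for radius, prob in sub_areas:  # first hit = innermost covering sub-area
        if dist <= radius:
            return prob
    return 0.0  # outside the gaze area entirely

def gazed_products(gaze_center, products):
    """products: {name: (x, y)}; keep only those inside the gaze area."""
    hits = {n: interest_probability(gaze_center, p) for n, p in products.items()}
    return {n: p for n, p in hits.items() if p > 0.0}
```

A product inside the innermost sub-area gets the highest probability; products outside every sub-area are excluded from the result.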

In step 755, the above data about customers and products is recorded in the database 450, and a visualization effect is produced on the picture. According to an embodiment of the present invention, the "visualization effect" here means displaying, in the picture, the identified customer identity (ID) and the product being gazed at, using text, numbers, or symbols. According to another embodiment of the present invention, the visualization of a product in the picture may display the product's coordinate values instead of text, numbers, or symbols. In addition, the positions of the customer's face/head and of the products in the picture may be highlighted with boxes (or marks of other shapes), as shown in FIG. 8B. According to another embodiment of the present invention, the "visualization effect", besides displaying the identified customer ID and the gazed-at product with text, numbers, or symbols as above, is further achieved by increasing the brightness of the pixels in the areas of the customer's face/head and the products. 
According to yet another embodiment of the present invention, the "visualization effect", besides displaying the identified customer ID and the gazed-at product in the original picture with text, numbers, or symbols, is achieved by reducing the brightness of (darkening) the pixels in all areas of the picture other than the positions of the customer's face/head and the products, while the pixels at those positions keep their normal brightness. Also, referring to FIG. 8C, the various products in FIG. 8C are not all placed fully frontally; some are rotated to a certain angle. Nevertheless, after deep learning with the data sets in the database 450 during the training phase, the image analysis application 460 in step 755 can still identify each product, mark it with a colored box, and add text or a symbol representing the product's name next to it in the picture.

Please refer to FIG. 10, which illustrates an example of recording customers and the products they gaze at in step 755. In step 755, the customer-related data, the product-related data, the duration for which the customer gazed at the product, and other information obtained in the preceding steps are recorded (stored) in the database 450. According to an embodiment of the present invention, the related information recorded in the database 450 includes the fields shown in FIG. 10. The table of FIG. 10 contains a plurality of entries; each entry records a customer's identity (ID) and related attribute data, for example gender, age, and/or occupation, as well as data about the product the customer gazed at (was interested in), for example "product name", "product position", "dwell time", "gaze level", and so on; in addition, "date/time" is recorded for reference. Here "dwell time" is the duration for which a customer attends to a product; 
fundamentally this duration should exceed a set minimum gaze duration, for example 2 seconds, but the invention is not limited thereto. "Product position" is the product's actual position in the environment (store), which may be expressed in three-dimensional coordinates of 3D space (as shown in FIG. 10) or by a predefined code (for example, 123-456 denotes shelf 456 of aisle 123). According to an embodiment of the present invention, once a product's position is obtained, it can be used to help confirm the product's name. Whether customer attribute data such as "age", "gender", and occupation (not shown in FIG. 10) are recorded (stored) in the database 450 depends on the actual situation. If the function modules for identifying such attribute data are not enabled when identifying the customer ID, the "age" and "gender" data in FIG. 10 may be omitted. The "gaze level" data expresses the intensity of the interest the product arouses in the customer; it may be a level represented by a numeric value, for example 1 to 5 for five levels from weak to strong, but is not limited thereto. The "gaze level" may be correlated with the "dwell time", for example, the larger the "dwell time" value, the higher the customer's interest in the product; moreover, if the product lies in a higher-probability sub-area of the customer's gaze area, the "gaze level" is also raised. When recording a customer's ID data, if the image analysis application 460 cannot find corresponding customer data in the database 450, a temporary code is used instead, for example the "F0103" entry in FIG. 10. In this case, besides recording a temporary code as the customer's ID in step 755, the image data of the customer's face/head must also be added to the database 450, so that the customer can be identified the next time the same customer comes to the store. 
According to another embodiment of the present invention, the spatial coordinates in the "product position" of FIG. 10 may also be replaced by plane-coordinate data.
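The FIG. 10 record and the relationship between dwell time and gaze level can be sketched as follows. The field names, the 2-second minimum, and the dwell-to-level mapping are illustrative assumptions; the patent only states that longer dwell times and higher-probability sub-areas raise the level.

```python
from dataclasses import dataclass

MIN_GAZE_SECONDS = 2.0  # example minimum gaze duration from the text

@dataclass
class InterestRecord:
    """One FIG. 10-style entry (field names are illustrative)."""
    customer_id: str        # e.g. a temporary code such as "F0103" for an unknown customer
    product_name: str
    product_position: tuple  # 3D coordinates, or a predefined aisle/shelf code
    dwell_seconds: float
    gaze_level: int = 0

def gaze_level(dwell_seconds, in_high_prob_subarea=False):
    """Map dwell time to a 1..5 level; boost if the product lies in a
    high-probability sub-area of the gaze region."""
    if dwell_seconds < MIN_GAZE_SECONDS:
        return 0  # below the minimum gaze duration: not counted
    level = min(5, 1 + int(dwell_seconds // 2))  # one level per extra 2 s, capped at 5
    if in_high_prob_subarea and level < 5:
        level += 1
    return level
```

A record below the minimum duration is scored 0 and would not be stored; longer dwells and a central sub-area position both raise the stored level.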

In step 760, the various relevant data obtained in the above steps are updated into the database 450, and records are accumulated for statistical analysis. The volume of data analyzed may be the records of a period (for example, 1 hour or 1 day), such as the table shown in FIG. 10, or the data accumulated over a longer period (for example, a week, a month, a quarter, or a year). Statistical results produced from different data volumes each have their own significance, depending on the user's needs. Items of statistical analysis may be, for example: "which products in the store readily arouse shoppers' interest", "the gender and age groups in which certain products arouse shoppers' interest", "the products least likely to arouse shoppers' interest", "where in the store a given product is most likely to arouse shoppers' interest", "among products of the same model but different colors, which color readily arouses shoppers' interest", and other information useful to the user.
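One of the step-760 statistics can be sketched as a simple aggregation over accumulated records. The record layout follows the FIG. 10 fields, but the key names are assumptions for illustration.

```python
from collections import defaultdict

# Illustrative sketch of a step-760 statistic: total dwell time per product
# over an accumulation window, ranked to show which products held attention.
def dwell_by_product(records):
    """records: iterable of dicts with 'product' and 'dwell_seconds' keys."""
    totals = defaultdict(float)
    for rec in records:
        totals[rec["product"]] += rec["dwell_seconds"]
    # Rank products by how long they held shoppers' attention, longest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

The same grouping pattern extends to the other listed analyses by keying on gender, age group, or shelf position instead of the product name.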

In step 765, according to the statistical analysis results of step 760, an analysis report and/or suggestions are produced for the client. The produced analysis report may include the image data with visualization effects added, useful information obtained from analyzing the data, post-analysis suggestions, and so on. The post-analysis suggestions may be, for example: "the restocking quantity and estimated restocking time of a certain product", "moving a product not favored by customers to a different display area", "the promotion timing for a certain product and its price", "the best display position for a certain product", and other useful, sales-supporting suggestions for the client based on big-data analysis.

The production of the analysis report and suggestions in step 765 may be omitted, whether because of the particularities of different industries or based on the client's actual needs. After step 765 is executed, the flow of FIG. 9 ends.

Please refer to FIG. 11A, a schematic diagram of the present invention in which the head direction is referenced and projected light is simulated to estimate the area a customer may be gazing at (interested in). FIG. 11A includes schematic diagrams of two heads 800A and 800B presented at different angles. Head Direction 1 of head 800A is expressed as (Roll1, Pitch1, Yaw1), its three rotation angles in space; the values of the three angles in (Roll1, Pitch1, Yaw1) represent the deflection of Head Direction 1 in three-dimensional space. Likewise, Head Direction 2 of head 800B is expressed as (Roll2, Pitch2, Yaw2), its three rotation angles in space; the values of the three angles in (Roll2, Pitch2, Yaw2) represent the deflection of Head Direction 2 in three-dimensional space. 
According to an embodiment of the present invention, the area a customer may be gazing at (interested in) is estimated by simulating projected light, as follows. Assume there is a light source directly behind heads 800A and 800B in three-dimensional space, projecting light forward along Head Direction 1 and Head Direction 2, respectively. This simulated light source casts light from directly behind the head toward the front of the head, and the simulated projected light forms a cone or cylinder in three-dimensional space. Where the simulated light, projected forward, reaches the location of an item or product, a simulated screen 810 is assumed to stand, and the projection of the light cone or cylinder onto the simulated screen 810 is the area the customer may be gazing at (interested in). FIG. 11A shows that after light is projected along the different head directions (Head Direction 1 and Head Direction 2), different projections 815A and 815B are produced on the same simulated screen 810. Projections 815A and 815B are the calculated gaze areas of heads 800A and 800B, respectively, and the items located within projections 815A and 815B can be inferred to be the items (products) the customer is gazing at or interested in.
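The virtual-light-source idea above can be sketched numerically. The coordinate conventions below are assumptions (yaw turns the ray left/right, pitch tilts it up/down, roll spins the head about the ray without moving it), and the cone half-angle is an illustrative parameter; the patent does not fix these details.

```python
import math

# Rough sketch of the FIG. 11A construction: a ray leaves the head along
# (pitch, yaw) and hits a simulated screen at distance `screen_dist` in
# front of the head; a cone of half-angle `half_angle_deg` around the ray
# gives the radius of the projected (circular) gaze region on that screen.
def gaze_projection(head_pos, pitch_deg, yaw_deg, screen_dist, half_angle_deg):
    p, y = math.radians(pitch_deg), math.radians(yaw_deg)
    # Unit direction of the head; roll is omitted since it does not move the ray.
    direction = (math.sin(y) * math.cos(p), math.sin(p), math.cos(y) * math.cos(p))
    center = tuple(h + screen_dist * d for h, d in zip(head_pos, direction))
    radius = screen_dist * math.tan(math.radians(half_angle_deg))
    return center, radius
```

The returned center and radius describe the projection (815A or 815B) on screen 810; a wider half-angle, like the peripheral-vision case of FIG. 11B, simply yields a larger radius.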

In general, people rarely view objects with their peripheral vision; in most cases they gaze forward at an object with a line of sight roughly aligned with the head direction. When the eye's field of view is set to be limited to the same width as the head, as in the field of view indicated by dashed lines 805C and 805D in FIG. 11B, the estimated possible gaze area is very limited, such as area 820 in FIG. 11B; conversely, when the eye's field of view is extended to where peripheral vision can reach, as indicated by dashed lines 805A and 805B in FIG. 11B, the estimated possible gaze area becomes much larger, such as area 815A in FIG. 11B. Also, the projections 815A and 815B shown in FIG. 11A may be any of the elliptical areas 820, 830, 840 depicted in FIG. 11B. According to the technique of the present invention, the position of the customer (heads 800A and 800B) in three-dimensional space can be calculated, and the distance from the customer (heads 800A and 800B) to the virtual screen 810 (that is, where the items are located) is also a value that can be estimated; 
together with the angles the customer's heads 800A and 800B present in three-dimensional space (Head Direction 1 and Head Direction 2), judged from the feature mapping obtained through deep learning, once the above data are input into the image analysis application 460, the areas the customer may be gazing at (interested in) (projections 815A and 815B) can be calculated.

Please refer to FIG. 11B, a schematic diagram presenting, from another angle, the possible gaze areas formed by the simulated projection of FIG. 11A together with the eye's field of view. In FIG. 11B, dashed lines 805A and 805B indicate the field of view reachable by peripheral vision and define projection area 840; dashed lines 805C and 805D indicate a field of view the same width as head 800A and define projection area 820. As stated above, area 840, formed by the peripheral-vision field of view 805A, 805B, is the largest; area 820, formed by projecting a field of view the same width as head 800A, is the smallest. In the usual case the line of sight looks forward in the same direction as the head, so the innermost area 820 has the highest probability of being the area the customer is gazing at (interested in). For example, suppose there is a 95% probability that the customer is looking at the items (products) within the innermost area 820; the annular part of the next area 830 excluding area 820 might have a 65% probability; and the annular part of the outermost area 840 excluding area 830 might have only a 15% probability. 
According to another embodiment of the present invention, when the eyes of the face in the picture can be clearly identified, the calculation can be adjusted according to the identified line of sight of the eyes, yielding a more precise area 820E whose probability percentage should be higher than the 95% of area 820; the probability percentage of area 820E shown in FIG. 11B is 98%.

Compared with the prior art, the technique disclosed in the present invention differs as follows: the prior art must use two lenses to record simultaneously the center line of a person's face and the direction of the eyes' line of sight (eyeball direction) before it can learn what object the person is gazing at. The technique disclosed in the present invention needs only one lens to capture an image of the person's head, and can tell what object the person is gazing at even without seeing the face. The method disclosed in the present invention is therefore suitable for cases where the image data contains many people/objects; it requires only a single camera (lens), and the pixel count (resolution) of the image data need not be high enough to see the eyeballs of a face clearly. Using the method of the present invention, the goal of identifying which people in a picture are gazing at which objects can thus be achieved simply.

Please refer to FIG. 12A and FIG. 12B together. According to an embodiment of the present invention, FIG. 12A is an architecture diagram of an Intelligent Electric Signage System (IESS) 100M. The system 100M includes an image processing device 300M, a video/audio display device 850, an image capture device 400M, a content server (CS) 870, and a media player 880. The image processing device 300M and the image capture device 400M, the image processing device 300M and the media player 880, and the media player 880 and the video/audio display device 850 are each interconnected through a transmission line 855; the image processing device 300M and the media player 880 are connected to the content server 870 through a network 860 or transmission lines to transfer data. The network 860 may be a local area network (LAN), a wide area network (WAN), the Internet, a wireless network, or the like. 
According to another embodiment of the present invention, the image processing device 300M and the image capture device 400M transfer data not through the transmission line 855 but through the network 860 or another network. According to another embodiment of the present invention, the image processing device 300M and the media player 880 transfer data not through the transmission line 855 but through the network 860 or another network. According to an embodiment of the present invention, the image capture device 400M may be installed independently near the video/audio display device 850; according to another embodiment of the present invention, the image capture device 400M is integrated with the video/audio display device 850 into a single device.

According to an embodiment of the present invention, the image processing device 300M and/or the content server 870 may be a virtual machine (VM) in the cloud data storage unit 150 of FIG. 2, with the operating system running on that virtual machine likewise provided by the supplier of the cloud data storage unit 150. Referring to FIG. 13B, the image analysis application 460M can run in the operating system on a virtual machine provided by the cloud data storage unit 150; that is, the user uploads the image analysis application 460M to the cloud data storage unit 150 and then executes it through a virtual machine there. In this embodiment, the image data captured by the image capture device 400M is first transmitted through the network 860 to the cloud data storage unit 150, to be analyzed and processed by the virtual machine (that is, the image processing device 300M) located in the cloud.

In FIG. 12A, the image capture device 400M is used to capture the people and/or objects in front of it. The image capture device 400M of FIG. 12A is similar to the image capture devices 400A-400N of FIG. 2, with little functional difference between them; for details of the image capture device 400M, please refer to the earlier description of the image capture devices 400A-400N of FIG. 2. In an embodiment of the present invention, when the image capture device 400M and the video/audio display device 850 are integrated into a single device, the image capture device 400M may be placed above, beside, or below the video/audio display device 850 to record the people watching the image data on the video/audio display device 850. Fundamentally, the image capture device 400M needs to contain only one lens to implement the technique disclosed in the present invention; in other embodiments, however, the image capture device 400M may be fitted with multiple lenses to meet the needs of special applications. Through the image data captured by the image capture device 400M, the image processing device 300M can detect the gaze directions of the people standing in front of the image capture device 400M (that is, the video/audio display device 850), to determine whether anyone, how many people, and which people are gazing at or watching the content displayed by the video/audio display device 850. 
The installation position of the image capture device 400M therefore only needs to achieve the above purpose (that is, a position from which people's faces (gaze directions) can be recorded); it is not limited to a position above the video/audio display device 850 or any other specific position.

圖12A中,本發明之影像處理裝置300M係用來處理與分析來自影像擷取裝置400M的影像資料。該影像處理裝置300M的硬體與軟體架構係與圖3A、圖3B中的影像分析伺服器300相近。請參考圖13A與圖13B,與圖3A、圖3B的影像分析伺服器300相比,影像處理裝置300M只需要一個實體儲存裝置385及單一個作業系統即已足夠實施本發明之智慧電子看板系統;於本發明之另一實施例,亦可使用圖3A與圖3B的影像分析伺服器300做為影像處理裝置300M。關於圖13A與圖13B中有關影像處理裝置300M的各個軟硬體元件,請參考前文中圖3A與圖3B的各個硬體/軟體元件的說明。影像處理裝置300M接收來自影像擷取裝置400M的影像資料,透過影像處理裝置300M上的影像分析應用程式460M做偵測與分析,以判斷正在觀看影音顯示裝置850所播放的影音內容(例如:廣告)的人們的各種屬性(Attributes of People)。又,同步配合媒體播放器880的播放進度及/或內容伺服器870關於複數個影音廣告的內容屬性(Attribute of Contents)的資料,做為之後進一步調整複數個影音廣告的播放內容與順序的參考。當影像處理裝置300M需要記錄觀看影音廣告的人們的各種屬性、媒體播放器880的播放進度、以及複數個影音廣告的內容屬性等各項相關的統計資料時,且這些統計資料需要透過一手持裝置(Hand Hold Device)輸出時,則圖13A中的影像處理裝置300M的基本架構更包含一無線傳輸裝置(Wireless Transmission Device,圖未示),例如:WiFi裝置或是藍芽(Bluetooth)裝置。上述的手持裝置可以是,例如:智慧型手機(Smart phone)、平板電腦(Tablet PC)、筆記型電腦(Notebook PC)等。 In FIG. 12A , the image processing device 300M of the present invention is used to process and analyze the image data from the image capture device 400M. The hardware and software architecture of the image processing device 300M is similar to the image analysis server 300 in FIGS. 3A and 3B . Please refer to FIG. 13A and FIG. 13B . Compared with the image analysis server 300 in FIG. 3A and FIG. 3B , the image processing device 300M only needs one physical storage device 385 and a single operating system, which is enough to implement the smart electronic kanban system of the present invention. ; In another embodiment of the present invention, the image analysis server 300 in FIG. 3A and FIG. 3B can also be used as the image processing device 300M. For the software and hardware components of the image processing device 300M in FIG. 13A and FIG. 13B , please refer to the description of the hardware/software components in FIG. 3A and FIG. 3B above. 
The image processing device 300M receives the image data from the image capture device 400M and performs detection and analysis through the image analysis application program 460M running on it, so as to determine the various attributes (Attributes of People) of the people watching the audio-visual content (for example, advertisements) played by the audio-visual display device 850. In addition, the playback progress of the media player 880 and/or the content server 870's data on the content attributes (Attributes of Contents) of the plural audio-visual advertisements are synchronized and used as a reference for subsequently adjusting the playback content and order of those advertisements. When the image processing device 300M needs to record statistical data such as the various attributes of the people watching the audio-visual advertisements, the playback progress of the media player 880, and the content attributes of the plural audio-visual advertisements, and this statistical data needs to be output through a handheld device (Hand-Held Device), the basic architecture of the image processing device 300M in FIG. 13A further includes a wireless transmission device (Wireless Transmission Device, not shown), such as a WiFi device or a Bluetooth device. The above-mentioned handheld device may be, for example, a smart phone, a tablet PC, a notebook PC, and the like.

The media player 880 in FIG. 12A decodes the streaming video/audio data received from the content server 870 over the network 860, reassembles it into image data and sound data, and outputs them to the audio-visual display device 850 through the transmission line 855; meanwhile, the media player 880 also transmits the real-time playback progress to the image processing device 300M. According to another embodiment of the present invention, the media player 880 only outputs the image data and sound data to the audio-visual display device 850 through the transmission line 855 and does not transmit the real-time playback progress to the image processing device 300M. The transmission line 855 that carries the image data and sound data to the audio-visual display device 850 may be, for example, a transmission line matching an HDMI terminal, a transmission line matching an A/V terminal, and so on. The media player 880 in FIG. 12A is implemented in hardware. According to another embodiment of the present invention, the media player 880 may also be implemented in software, for example, as the multimedia player program 465M in FIG. 13B, which performs streaming video/audio playback in software inside the image processing device 300M.

The audio-visual display device 850 in FIG. 12A is used to display the streaming video/audio data transmitted by the media player 880, and basically contains a display (panel) 890 and a speaker system (not labeled). The streaming video/audio data is sent by the content server 870 through the network 860; after it is decoded by the video/audio decoder inside the media player 880, the image data is output to the display 890 to show the audio-visual advertisement, while the decoded sound data is sent to the speaker system to play the sound. According to another embodiment of the present invention, the audio-visual display device 850 further has a network interface and receives the image data and sound data from the media player 880 through the network 860 rather than through the transmission line 855.

According to another embodiment of the present invention, the image processing device 300M and the media player 880 in FIG. 12A may be integrated into a single device; in addition, the image capture device 400M and the audio-visual display device 850 may also be integrated into a single device. According to yet another embodiment of the present invention, the image processing device 300M, the media player 880, the image capture device 400M, and the audio-visual display device 850 may all be integrated into a single device.

The content server 870 is a server with a network interface. Basically, the hardware architecture of the content server 870 differs little from that of a general server; in terms of software content, however, the content server 870 provides many different types of video/audio files (for example, advertisements). The contents of these files are designed for people of different groups, where the so-called different groups may be, for example: elderly men/women, young men/women, children, obese men/women, men/women wearing glasses, men/women keeping pets, and other groups with various attributes. The contents of the video/audio files (e.g., advertisements) involve products or services in fields including food, clothing, housing, transportation, education, and entertainment. Furthermore, information such as the target group, playback order, and video length of each video/audio file (e.g., advertisement) constitutes the "content attributes" (Attributes of Contents) of that file.
The content server 870 can cycle through the above-mentioned various types of video/audio files (e.g., advertisements) around the clock or at specific times, convert them into a streaming video/audio data format, and transmit them through the network 860 to the media player 880. The content server 870 in FIG. 12A also transmits data such as the content attributes of the video/audio files (e.g., advertisements) to the image processing device 300M through the network 860. In addition, the content server 870 accepts commands from the image processing device 300M through the network 860 to change the playback behavior of the video/audio files (e.g., advertisements), for example: interrupting playback, changing the playback order, fast-forwarding, adjusting the volume, displaying specific text, and so on.
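As a rough illustration of the "content attributes" record described above, the following sketch models an advertisement's attributes (target group, playback order, video length) and sorts a playlist by playback order. The field names, class name, and sample values are assumptions for illustration only; the patent does not specify a data format.

```python
# Hypothetical sketch of a per-advertisement "Attributes of Contents"
# record; all names and values here are invented for illustration.
from dataclasses import dataclass

@dataclass
class ContentAttributes:
    ad_id: str
    target_group: str   # e.g. "young women", "pet owners"
    play_order: int     # position in the playback cycle
    length_s: int       # video length in seconds

playlist = [
    ContentAttributes("ad-002", "young women", 2, 15),
    ContentAttributes("ad-001", "pet owners", 1, 30),
]

# The content server cycles through the files in their playback order.
playlist.sort(key=lambda a: a.play_order)
print([a.ad_id for a in playlist])  # ['ad-001', 'ad-002']
```

A command from the image processing device 300M (e.g., "change the playback order") would then amount to rewriting the `play_order` fields and re-sorting.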

Please refer to FIG. 12B, which is an application scenario diagram of the smart digital signage system 100M according to an embodiment of the present invention. FIG. 12B shows two customers in front of the audio-visual display device 850 watching the advertisement currently being played; the image capture device 400M located above the audio-visual display device 850 is capturing the expressions and reactions of the two visitors, and at the same time transmits the recorded image data to the image processing device 300M for analysis. Through its internal image analysis application program 460M, the image processing device 300M analyzes and determines, in real time, attribute data of the two visitors such as gender, age, facial expression, clothing, and height. These visitor attributes are highly correlated with the contents of the audio-visual advertisements being played; this correlation allows the advertisements being played to attract the attention of the customer group that best matches them, thereby maximizing the advertising benefit.

Please refer to FIG. 13A, which is a block diagram of the basic hardware architecture of the image processing device 300M of FIG. 12A according to an embodiment of the present invention. The hardware architecture of the image processing device 300M in FIG. 13A is almost the same as that of the image analysis server 300 in FIG. 3A. The difference is that the image processing device 300M may further include a wireless transmission device (Wireless Transmission Device, not shown), which allows the image processing device 300M and a handheld device to transmit data wirelessly. For the hardware components in FIG. 13A, please refer to the related descriptions of FIG. 3A in this specification.

Please refer to FIG. 13B, which illustrates the software/hardware architecture of the image processing device 300M of FIG. 12A according to an embodiment of the present invention. The hardware architecture of the image processing device 300M in FIG. 13A is almost the same as that of the image analysis server 300 in FIG. 3A, differing only in the number of physical storage devices and in hardware specifications. As for the software architecture in FIG. 13B, apart from the absence of the virtual machine monitor 470, the operating system 475 and image analysis application program 460 of the image processing device 300M are almost the same as the operating system 475 and image analysis application program 460 in FIG. 3A. In addition, in FIG. 13B the operating system 475 may also include a multimedia player program 465M, drawn in dotted lines, which indicates that when the media player 880 is not used, the multimedia player program 465M (software) can take over the function of the media player 880 (hardware). Because the multimedia player program 465M is not necessarily executed in the operating system 475, it is drawn in FIG. 13B as a software module in dotted lines.
Moreover, according to an embodiment of the present invention, the application modules in the image analysis application program 460 of FIG. 13B further include, in addition to the various application modules shown in FIG. 4, modules such as: an expression detection module, a height detection module, a clothing detection module, a body-shape (fat/thin) detection module, a carried-tool detection module, a carried-pet detection module, an occupation prediction module, and so on. For the software/hardware architecture of the image analysis server 300 in FIG. 13B, please refer to the foregoing descriptions of FIG. 3B and FIG. 4.

Please refer to FIG. 14, which is a flowchart of the overall operation of the smart digital signage system 100M according to an embodiment of the present invention. The overall operation flow includes a smart digital signage person detection training stage 900, a smart digital signage person survey/statistical analysis stage 910, and a smart digital signage application stage 920. The person detection training stage 900 is the training stage shown in FIG. 5 and FIG. 6 (step 510 of FIG. 5, and step 520 and steps 610 to 640 of FIG. 6). For details of the person detection training stage 900, please refer to the descriptions of step 510 in FIG. 5, and step 520 and steps 610 to 640 in FIG. 6.

In the present invention, the detection of people in image data is based on the "feature mappings" of various human attributes obtained, under the CNN architecture, through deep learning on data sets (Data Sets) during the training stage. In the application stage, these learned feature mappings are used to detect the portions of the image data captured by the image capture device 400M that match them, and thereby analyze the "various attributes of people." Notably, the person detection training stage 900 of the smart digital signage system in FIG. 14 basically concerns only "recognition of people," such as "face/head recognition" and "body recognition," and does not include "recognition of objects/merchandise." Therefore, the image data provided in the training stage mainly consists of data sets related to "people." In addition to the file data shown in FIG. 7A, these people-related data sets include: data sets of male/female facial appearances across age groups (elderly/middle-aged/young/children...), data sets of body shapes (fat/thin/medium...), data sets of facial expressions (joy/anger/sorrow/happiness/fear/disappointment...), and so on. Furthermore, the people-related data sets may also include data sets of "babies" and "pets."
If a person's occupation needs to be estimated, further data sets including "clothes (uniforms of various industries)," "tools (tools of various industries)," and the like must also be provided. For the smart digital signage person detection training stage 900 in FIG. 14, please refer to the descriptions of steps 510 and 520 in FIG. 5 and steps 610 to 640 in FIG. 6. The detailed flow of the smart digital signage person survey/statistical analysis stage 910 of FIG. 14 will be described with FIG. 15; the detailed flow of the smart digital signage application stage 920 of FIG. 14 will be described with FIG. 16. In addition, step 910 in FIG. 14 is drawn in dotted lines because, if the system 100M does not need to output and analyze the statistical data of step 910, the person survey/statistical analysis stage of step 910 may be omitted.

Please refer to FIG. 15, which is a flowchart of the smart digital signage person survey/statistical analysis stage 910 of FIG. 14. Following the description of step 910 in FIG. 14, if the system 100M does not need to output and analyze the statistical data of step 910, the flow of FIG. 15 may be omitted. The flow of FIG. 15 starts at step 1000.

In step 1000, the image processing device 300M obtains the image data to be analyzed. In FIG. 12A, while the audio-visual display device 850 plays audio-visual advertisements of different "content attributes," the image capture device 400M synchronously records the various appearances of the people in front of the audio-visual display device 850. According to an embodiment of the present invention, the term "person" here refers only to the parts of the image data related to "people" and does not include other objects such as cars, buildings, or traffic signs; the term "person" carries this same meaning throughout the rest of this specification.

In step 1010, the parts of the image data that contain a subject are recognized; a "subject" is a target with specific features that the image analysis server 300M can recognize after deep learning. Each frame (picture) of the image data may contain one or more targets with specific features; these targets may be targets with human "face/head" features or targets with human "body" features. According to an embodiment of the present invention, since each picture in the image data is composed of many pixels, step 1010 must determine which targets in the picture have human "face/head" features and which have human "body" features, and must also determine where each target is located in the picture. In other words, identifying every target with human "face/head" features and every target with human "body" features in the picture, together with the range (pixel size) each occupies and its position in the picture, is the main purpose of step 1010.
Generally speaking, although moving image data consists of several to dozens of pictures (frames) per second, according to an embodiment of the present invention it is not necessary in practice to examine every picture for targets with human "face/head" features and human "body" features. Taking the most common specification of 30 frames per second as an example, one may choose, but is not limited to, selecting one picture every 1/6 second and detecting how many targets with human "face/head" features and human "body" features it contains, together with their positions and range sizes in the picture. In other words, under this assumption, step 1010 must analyze 6 pictures per second for targets with human "face/head" features and human "body" features. Detecting one picture every 1/6 second is merely one embodiment of the present invention; in practice, the interval may be any other length of time. It must be noted that in step 1010 the detection of a person's "face/head" or "body" in a picture is not achieved by comparison against a database, but by the "feature mappings" learned through deep learning during the training process, which are used to recognize the person's "face/head" and/or "body" in the picture.
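The frame-sampling strategy above (30 fps video, one analyzed picture every 1/6 second) can be sketched as follows; this is a minimal illustration assuming a fixed frame rate, not the patent's actual implementation:

```python
# Hedged sketch of the sampling described above: with 30 fps video and
# a 1/6-second analysis interval, every 5th frame is sent to the
# face/head and body detectors, i.e. 6 analyzed pictures per second.
def frames_to_analyze(total_frames, fps=30, interval_s=1 / 6):
    """Return the indices of the frames selected for detection."""
    step = max(1, round(fps * interval_s))  # 30 * 1/6 = 5
    return list(range(0, total_frames, step))

# One second of 30 fps video yields 6 analyzed pictures, as stated above.
print(frames_to_analyze(30))  # [0, 5, 10, 15, 20, 25]
```

Changing `interval_s` corresponds to the "any other length of time" variant the text allows.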

After the image data is recognized in step 1010, the recognized human "face/head" targets and human "body" targets are handled by two parallel flows, one for the "face/head" and one for the "body." Therefore, after step 1010, targets recognized as a person's "face/head" proceed to step 1020 and the steps that follow it, while targets recognized as a person's "body" proceed to step 1030 and the steps that follow it.

Also, according to an embodiment of the present invention, the above-mentioned "face/head" of a person includes not only the face but also the rest of the head; the "body" here refers to the parts of the body other than the "face/head." According to another embodiment of the present invention, the content analyzed by the image processing device 300M in the image data includes, besides the "face/head" and "body," all "objects" closely related to the "person," for example: worn items (clothes, hats...), carried items (babies, glasses, jewelry, pets, purses...), and so on.

Continuing from the above, step 1020 is the step entered by targets recognized in the picture as a person's "face/head"; it detects the person's face/head in the picture. Since faces/heads appear in the picture at various sizes, a face/head occupying too small an area (too few pixels) may not be effectively detected. In an embodiment of the present invention, the detectable pixel size is 40x40 pixels or larger. This 40x40-pixel threshold is only a reference implementation of the present invention, and the present invention is not limited thereto; the minimum recognizable pixel threshold may be any other pixel size, depending on the capability of the software and hardware. After the targets recognized as a person's "face/head" are confirmed in step 1020, the flow proceeds to three steps: step 1040, "detection of the person's head direction"; step 1050, "estimation of the person's head position"; and step 1060, "detection of the person's gender/age/expression." All three steps relate to the person's "face/head" and will be described later.
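The minimum-size rule above amounts to a simple filter over detected regions. The sketch below assumes an (x, y, width, height) bounding-box format, which the patent does not specify:

```python
# Illustrative filter for the 40x40-pixel minimum described above; the
# box format (x, y, width, height) and threshold name are assumptions.
MIN_SIDE = 40  # the embodiment's reference threshold; any value is allowed

def detectable(box, min_side=MIN_SIDE):
    """Keep only face/head regions large enough to be recognized."""
    _, _, w, h = box
    return w >= min_side and h >= min_side

boxes = [(10, 10, 120, 130), (300, 40, 32, 35)]  # one large, one too small
kept = [b for b in boxes if detectable(b)]
print(kept)  # [(10, 10, 120, 130)]
```

Raising or lowering `min_side` models the text's note that the real threshold depends on software/hardware capability.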

In step 1040, the head direction of the person in the picture is detected. Following the description of step 1020, after step 1020 is completed, step 1040 ("detection of the person's head direction"), step 1050 ("estimation of the person's head position"), and step 1060 ("detection of the person's gender/age/expression") are executed in parallel. According to an embodiment of the present invention, before step 1040 is performed, the image analysis application program 460M must first, in the training stage of FIG. 14, perform deep learning on a data set such as that of FIG. 7A to learn the "feature mapping" whose subject is the head direction, and then use that feature mapping to determine the most likely head direction of the person in the picture.
Taking FIG. 7A as an example, the pictures numbered 1-91 present images of the same woman in various head directions; each image has a corresponding label, and the information recorded in each label includes at least the "subject" (for example, the woman's head) and the three values of the subject: roll, pitch, and yaw. From the above, the pictures numbered 1-91 present a woman's face/head under different "head directions." In practical application, the head direction of a customer in a picture (i.e., the rotational degrees of freedom in space formed by the head's roll, pitch, and yaw values) roughly corresponds to one of the learned cases numbered 1-91; therefore, the image analysis application program 460M can use the "feature mapping" obtained through deep learning in the earlier training stage to determine the roll, pitch, and yaw values corresponding to the customer's head direction in the picture. If the head direction of the person in the picture does not exactly match any of cases 1-91, step 1040 still makes a closest judgment of the person's head direction according to the learned feature mapping and estimates the most likely roll, pitch, and yaw values. In addition, according to an embodiment of the present invention, step 1040 may further calculate the position and range of the person's head in the picture. Where recognition is possible, if a picture contains 10 people, the head directions of all 10 will be detected. Moreover, the people in front of the image capture device 400M may behave differently: they may be gazing at the audio-visual advertisement being played, they may be conversing, or they may be looking at a distant building; the appearance of the "head direction" differs for each of these behaviors.
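The "closest judgment" above can be pictured as nearest-neighbor matching of a predicted (roll, pitch, yaw) triple against the labeled poses (e.g., the 91 entries of FIG. 7A). The sketch below uses Euclidean distance over the three angles and invented sample labels; the patent's CNN performs this mapping internally via learned features, not by explicit lookup:

```python
# Hypothetical sketch: match a predicted head pose to the nearest
# labeled pose; label values and pose names are invented examples.
import math

def nearest_pose(pred, labeled):
    """pred and each labeled entry are (roll, pitch, yaw) in degrees."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(labeled, key=lambda rpy: dist(pred, rpy))

labels = {
    (0, 0, 0): "frontal",
    (0, 0, 45): "turned right",
    (0, -30, 0): "looking down",
}
best = nearest_pose((2, -1, 40), list(labels))
print(labels[best])  # "turned right"
```

A pose that matches no label exactly, as in the example, is still assigned the most plausible roll/pitch/yaw values, mirroring the behavior the text describes.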

In step 1050, the position of the person's head in the picture is estimated. Before the application flow shown in FIG. 15 is executed, the image capture device 400M must first be calibrated to obtain its relevant parameters (for example, the intrinsic/extrinsic matrices). After deep learning and training, step 1050 can use the parameters of the image capture device 400M and the image data of a specific object in the picture to calculate the distance and relative position between that object and the image capture device 400M. According to an embodiment of the present invention, the relative distance between the specific object and the image capture device 400M can be regarded as the distance between two points in space. The image capture device 400M can generally be taken as the origin, so its coordinates in 3D space are (0,0,0). The coordinates of the specific object in 3D space can then be calculated, for example: (100,552,211), and the distance of the specific object from the origin (the image capture device 400M) is then a computable length.
According to another embodiment of the present invention, the origin of the 3D space may also be set at the triple intersection point of two adjacent walls and the floor of the store in the picture. In short, the present invention first calibrates the image capture device 400M so that the relative distance in 3D space between a specific point in the 2D picture and the image capture device 400M or a reference point (the set origin) can be known. Accordingly, in step 1050 the calibrated image capture device 400M can be used to estimate the position of the person's head in 3D space.
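The distance computation described above reduces to the Euclidean norm once a head position has been estimated in the chosen coordinate system. A minimal sketch, using the example coordinates from the text (the unit of measure is not specified in the source):

```python
# With the image capture device 400M taken as the origin (0, 0, 0),
# the distance to an estimated head position is the Euclidean norm.
import math

def distance_from_camera(point, origin=(0.0, 0.0, 0.0)):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(point, origin)))

# The text's example coordinates (100, 552, 211):
d = distance_from_camera((100, 552, 211))
print(round(d, 1))  # 599.4
```

Switching `origin` to, say, the wall/floor intersection point models the alternative embodiment's choice of reference point.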

In step 1070, the gaze area of the person in the picture is estimated. From the results (data) of steps 1040 and 1050, the gaze area of the person in the picture can be further determined.

According to an embodiment of the present invention, after the person's head direction is detected in step 1040 and the person's head position is estimated in step 1050, both the "head direction" of a given person in the picture (the three values roll, pitch, and yaw) and the position of that person's head in three-dimensional space are known. With these two pieces of information, the following simulation is performed: a light ray is projected from directly behind the person's head in a direction consistent with the head direction, and the projected ray is extended until it reaches the position of the audio-visual display device 850; at that position, a simulated screen is assumed, on which the projected ray forms a simulated projection. The projection area formed by this simulated projection is the possible gaze area of that person, such as the elliptical projection areas 815A and 815B shown in FIG. 11A and FIG. 11B. According to an embodiment of the present invention, the projection area is further divided into a plurality of sub-projection areas, among which the sub-projection area with the highest probability is the range where that person's line of sight is most likely focused. From this it can also be inferred whether the object the person is gazing at is the screen 890 on the audio-visual display device 850, so as to determine whether that person is watching the audio-visual advertisement being played. Please refer to step 740 in FIG. 9 and the descriptions of FIG. 11A and FIG. 11B. In step 1070, each person in the picture has his or her own projection area according to his or her head direction, and each person's projection area represents that customer's gaze area.
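The ray-projection simulation described above can be sketched geometrically as follows. This is a minimal illustration only: it assumes a flat screen plane at a known depth and a camera-style yaw/pitch convention, and all names and coordinate choices are assumptions for illustration rather than details taken from the disclosure.

```python
import math

def gaze_point_on_screen(head_pos, yaw_deg, pitch_deg, screen_z):
    """Intersect a ray cast from behind the head, along the head
    direction, with a vertical screen plane at depth screen_z.

    head_pos is (x, y, z) in metres; yaw and pitch are in degrees
    (roll does not change the ray direction). Returns the (x, y)
    point on the screen plane, or None if the person faces away
    from the screen.
    """
    x, y, z = head_pos
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    # Unit direction vector of the head, camera-style convention:
    # yaw about the vertical axis, pitch about the horizontal axis.
    dx = math.sin(yaw) * math.cos(pitch)
    dy = math.sin(pitch)
    dz = math.cos(yaw) * math.cos(pitch)
    if dz <= 0:  # ray never reaches the screen plane
        return None
    t = (screen_z - z) / dz  # parametric distance along the ray
    return (x + t * dx, y + t * dy)
```

The returned point would be the centre of the elliptical projection area (cf. 815A/815B); the ellipse itself follows from how obliquely the ray meets the plane.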

According to an embodiment of the present invention, in step 1070, only when a person in the picture has watched an object for longer than a set time threshold is it estimated whether his or her gazing area is the screen 890 on the audio-visual display device 850; the set time threshold may be, for example, 5 seconds, but is not limited thereto and may be any set length of time. According to another embodiment of the present invention, if a person in the picture is moving and his or her gaze stays on the same object for longer than a set time threshold (for example, 5 seconds, but not limited thereto), it is likewise estimated whether his or her gaze area is the screen 890 on the audio-visual display device 850. In step 1070, except for cases where the head direction of a person in the picture cannot be recognized, the gaze area (projection area) of every person in the picture, and the gaze probability of each sub-projection area within it, should be computable. The head direction of a person in the picture may be unrecognizable because: (1) the person's head image is incomplete, for example, a large part of it is blocked by other people or objects; or (2) the person's head image covers too few pixels for the head to be recognized. According to the technology disclosed in the present invention, even if the head image of a person in the picture does not include the eyes of the face (for example, a head image seen from behind), the person's head direction can still be recognized. According to another embodiment of the present invention, when the head image of a person in the picture clearly shows the eye features of the face, the customer's line-of-sight direction can be computed and consulted to further narrow the above projection area, thereby increasing the accuracy of the estimated gaze area. After step 1070 is executed, the process jumps to step 1080.
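The division of the projection area into sub-projection areas with different gaze probabilities might, for instance, be modelled as concentric bands around the projected gaze point. The radii and probability values below are illustrative assumptions only; the patent does not specify the sub-area geometry.

```python
def gaze_probability(gaze_pt, target_pt,
                     radii=(0.3, 0.6, 1.0), probs=(0.7, 0.2, 0.1)):
    """Return the gaze probability for a target, based on which
    concentric sub-area of the projection region it falls in.

    gaze_pt and target_pt are (x, y) points on the screen plane;
    radii are band boundaries in metres, probs the probability
    assigned to each band (innermost band = most likely focus).
    Targets outside the outermost band get probability 0.
    """
    dx = gaze_pt[0] - target_pt[0]
    dy = gaze_pt[1] - target_pt[1]
    dist = (dx * dx + dy * dy) ** 0.5
    for r, p in zip(radii, probs):
        if dist <= r:
            return p
    return 0.0
```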

Return now to step 1060. In step 1060, the gender, age, and expression of the person in the picture are detected. According to an embodiment of the invention, in addition to the head direction, the "face" of the person in the picture must be clearly recognizable; people in the picture who fail to meet this standard are ignored and not detected. According to another embodiment of the present invention, when the area of a person's face covered by other people or objects exceeds the upper limit that the image analysis application 460M can handle, the occluded person is likewise ignored. Basically, the facial region of a person in the picture must contain at least 40x40 pixels; the value 40x40 serves only as an embodiment of the present invention and is not a limitation, so the facial region may contain any number of pixels. The detection of "gender", "age", and "expression" in step 1060 relies on deep learning performed with the Data Sets in the database 450 during the training phase of step 920 in FIG. 14, which yields feature mappings for "gender", "age", and "expression" respectively. Therefore, during the training phase of step 920 in FIG. 14, the database 450 must provide Data Sets covering various age groups, for example "young male/female", "middle-aged male/female", "elderly male/female", and "boy/girl", so that the "gender" and "age" of the person in the picture can be detected. For the detection of the "age" of the person in the picture in step 1060, the more Data Sets provided during the training phase, the more accurately the "age" can be detected. Likewise, for the detection of the person's "expression", the Data Sets in the database 450 used during the training phase of step 920 in FIG. 14 must cover various emotions, for example smiling, happiness, anger, sadness, joy, tears, surprise, fear, doubt, and so on. Although the detection of "gender", "age", and "expression" is listed in the same step 1060, in practice these detections are processed in parallel with step 1040 and step 1050. Moreover, the "gender", "age", and "expression" of the person in the picture discussed in step 1060 can each be regarded as one of the "attributes of the person".

Returning to step 1010 above: in step 1010, when the target for recognizing the features of a person in the picture is the "body", the process proceeds to step 1030.

In step 1030, the body parts of the person in the picture are detected, in particular the "person's height", the "person's build (fat/thin)", and the "person's clothing". For a person's height, the picture must clearly show all of the person's image data from "head to shoes"; the height of a person who fails to meet this standard is ignored and not computed. According to an embodiment of the present invention, since the person's head position can be estimated in the aforementioned step 1050, the person's height in the picture can likewise be estimated in step 1030 by the same method. Basically, height detection makes it possible to distinguish "adult male/female" from "boy/girl". Furthermore, the detection of the "person's build (fat/thin)" and the "person's clothing" in step 1030 similarly relies on deep learning performed with the Data Sets in the database 450 during the training phase of step 920 in FIG. 14, which yields feature mappings for "person's build" and "person's clothing" respectively. Therefore, during the training phase of step 920 in FIG. 14, the database 450 must provide Data Sets of various male and female builds (tall, short, fat, thin, etc.) as well as Data Sets of "uniforms worn by various occupations", so as to learn to detect the build of a person in a picture (tall, short, fat, thin, etc.) and to infer a person's occupation from the clothing (uniform) worn. For the detection of the "person's build" and "person's clothing" in step 1030, the more Data Sets provided during the training phase, the better the learning results; in other words, the stronger the ability to accurately judge the "build" and "clothing" attributes of people in pictures. According to another embodiment of the present invention, the inference of a person's occupation further includes analyzing the person's "carried items". For example, when the item carried by the person in the picture is a briefcase, the person's occupation can be inferred to be "office worker"; when the item carried is a school bag, the occupation can be inferred to be "student"; and so on. Since inferring the occupation of a person in a picture is not easy, in practice the inference analyzes several conditions: besides the above "clothing" and "carried items", conditions such as "accessories", "time", "location", and "weather" are also included in the analysis to infer the person's "occupation". Although the detection of "height", "fat/thin", and "clothing" is listed in the same step 1030, in practice these detections are processed in parallel with step 1040, step 1050, and step 1060.

In step 1070, the gaze area of the person in the picture is estimated. Based on the execution results (output) of step 1040 and step 1050, step 1070 can estimate where a given person in the picture is looking. After step 1070 is executed, it is known whether a given person in the picture is gazing at the screen 890 on the audio-visual display device 850, i.e., whether that person is watching the audio-visual advertisement being played. Collecting the relevant information (attributes) of "the people in the picture who are watching the playing audio-visual advertisement" is therefore the most important part of the process. This process must also include the condition of "continuously watching the playing audio-visual advertisement for more than a period of time", where the period is 5 seconds. The value of 5 seconds serves only as an embodiment of the present invention and is not a limitation; the length of time may be any number of seconds. Based on the output results of step 1070, step 1060, and step 1030, the basic condition for a Target Person to be counted in the present invention is: "the person in the picture has continuously watched the playing audio-visual advertisement for longer than a preset period of time". After step 1070, step 1060, and step 1030 are executed, the judgment of step 1080 is performed.

In step 1080, it is judged whether a person in the picture is gazing at the screen 890 of the audio-visual display device 850 and has continuously gazed at the screen 890 for longer than a time threshold, for example, more than 5 seconds. According to an embodiment of the present invention, only people in the picture who satisfy both of the above conditions qualify as targets to be counted; other people in the picture who do not satisfy both conditions are ignored. If the result of the judgment in step 1080 is true (Yes), step 1110 is executed; otherwise step 1120 is executed.
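The two-part test of step 1080 (gazing at the screen, and doing so continuously beyond the threshold) can be sketched over per-frame gaze flags. The frame-flag representation and the `fps` parameter are assumptions for illustration; the disclosure only specifies the time threshold (e.g. 5 seconds).

```python
def passes_dwell_threshold(gaze_frames, fps=30, threshold_s=5.0):
    """Decide whether a person qualifies as a Target Person.

    gaze_frames is a per-frame sequence of booleans, True when the
    person's gaze area covers the screen in that frame. Returns
    True only if some *continuous* run of True frames lasts longer
    than threshold_s seconds at the given frame rate.
    """
    longest = run = 0
    for gazing in gaze_frames:
        run = run + 1 if gazing else 0  # reset on any look-away
        longest = max(longest, run)
    return longest / fps > threshold_s
```

At 30 fps, a 5-second threshold means a run of more than 150 consecutive gazing frames.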

In step 1090, the media player 880 provides the image processing device 300M with the playback progress of the audio-visual advertisement currently being played, and the content server 870 provides the image processing device 300M with the content attributes of the advertisement currently being played. According to another embodiment of the present invention, both the playback progress and the content attributes of the currently playing audio-visual advertisement may be provided by the content server 870, or both by the media player 880. In step 1090, the playback progress provided by the media player 880 basically refers to the elapsed second within the currently playing audio-visual advertisement. The content attributes of the audio-visual advertisement include: the serial number of the currently playing advertisement, its playback order, its target product (for example, ice cream, cars, glasses, health food, etc.), the audience groups it suits (for example, women, the elderly, children, male office workers, etc.), its preset broadcast period, its total playback length (time), and other such information.

In step 1110, the relevant information (attributes) of the people in the picture and the content attributes of the audio-visual advertisement are recorded synchronously. Step 1110 records the output results of step 1060, step 1030, and step 1090 and other relevant information. The "record" referred to above is a statistic record (not shown in the figure); the statistic record accumulates record data in the database 450 in an append-only manner. The statistic record includes fields such as the current date/time; the gender, age, height, build, occupation, and expression of the person watching the audio-visual advertisement; the content attributes of the currently playing advertisement (serial number, target product, preset broadcast period, total playback length (time)); and the current playback progress. The statistic record may take a record format such as a database file, a table, or a text file, and may be stored on the hard disk 385 of the image processing device 300M or in the non-volatile memory 330. According to an embodiment of the present invention, when the statistic record reaches a predetermined number of records or the audio-visual advertisement reaches a predetermined test time, the statistic record stored in the image processing device 300M is transmitted through the network 860 to another server (not shown). After step 1110 is executed, the process jumps back to step 1010 to continue analyzing the image data of the other people in the picture, looping repeatedly. According to an embodiment of the present invention, the attributes of every person in the picture, such as gender, age, height, build, occupation, and expression (the person's attributes), are basically detected in parallel.
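One appended row of the statistic record described above might look like the following dictionary. The field names mirror the attributes listed in the text, but the exact schema, key names, and helper function are illustrative assumptions, not the patent's format.

```python
from datetime import datetime

def make_statistic_record(person_attrs, ad_attrs, playback_s):
    """Build one row of the cumulative statistic record, combining
    the detected person attributes (steps 1030/1060), the advert
    content attributes (step 1090), and the playback position."""
    return {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "gender": person_attrs.get("gender"),
        "age": person_attrs.get("age"),
        "height": person_attrs.get("height"),
        "build": person_attrs.get("build"),
        "occupation": person_attrs.get("occupation"),
        "expression": person_attrs.get("expression"),
        "ad_id": ad_attrs.get("ad_id"),
        "ad_product": ad_attrs.get("product"),
        "ad_broadcast_period": ad_attrs.get("broadcast_period"),
        "ad_total_length_s": ad_attrs.get("total_length_s"),
        "playback_position_s": playback_s,
    }
```

Rows like this would be appended to the database 450 (or a table/text file) and shipped over the network 860 once enough accumulate.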

Furthermore, following the execution result of step 1080 above, when the result is false (No), step 1120 is executed.

In step 1120, it is judged whether the predetermined test time has been reached or whether the amount of data accumulated in the statistic record is sufficient. Under normal circumstances, the person-survey/data-statistics phase of step 910 in FIG. 14 runs for a predetermined period, which may be a time interval such as 1 day, 3 days, or 7 days. The longer the interval, the more data accumulates, and the closer the analysis results of the statistic record come, in practical use, to the goals the present invention intends to achieve. In addition, once the accumulated data in the statistic record reaches a certain amount, the correlation between the statistic record and the audio-visual advertisements can be analyzed further. When the judgment result of step 1120 is true (Yes), step 1130 is executed; otherwise the process jumps back to step 1010 and continues.

In step 1130, the statistic record (not shown) is output through the network 860 to another device for analysis. Basically, the statistic record is output through the network 860, after which the correlation between the statistic record and the audio-visual advertisements continues to be analyzed. According to another embodiment of the present invention, the statistic record (not shown) is output to a handheld device through WiFi or a Bluetooth device.

In step 1140, the correlation between the statistic record and the content of each audio-visual advertisement is analyzed. This analysis is performed by the producers of the audio-visual advertisement content or by the provider of the image processing device 300M. Further analysis of the statistic record (not shown) accumulated during the execution of step 1110 may yield results such as: "the group attracted by advertisement A is the elderly over 70", "the group attracted by advertisement B is office women aged 35-45", "the content of advertisement C is unattractive; most people's gaze time is only about 1.5 seconds", or "the segment of advertisement D from 1:30 to 2:05 most easily makes everyone laugh". These many analysis results serve as the information basis for executing the "smart electronic signage application phase" in step 920 of FIG. 14; at the same time, such conclusions constitute an unmanned survey questionnaire for the producers of the audio-visual advertisement content.

When step 1140 has been executed, the flow in FIG. 15 has ended, which also means the "smart electronic signage person-survey/statistical-analysis phase" of step 910 in FIG. 14 has been completed.

The analysis results obtained after executing step 1140 achieve the goal of knowing "which audio-visual advertisement content attracts which groups". The analysis results of step 1140 can therefore be added as part of the "content attributes" of those audio-visual advertisements.

Furthermore, the above step 1040, step 1050, and step 1070 are executed in order to judge whether a person in the picture is gazing at the screen 890 on the audio-visual display device 850; a person in the picture who is not gazing at the screen 890 on the audio-visual display device 850 is not included among the counted subjects. According to another embodiment of the present invention, when it is unnecessary to judge whether a person in the picture is gazing at the screen 890 on the audio-visual display device 850, step 1040, step 1050, and step 1070 may be omitted. In addition, in this case, step 1080 need not judge whether the length of time a person in the picture gazes at the screen 890 on the audio-visual display device 850 exceeds a preset time threshold, so step 1080 is likewise omitted.

Please refer to FIG. 16, which shows a flow chart of the "application phase of the smart electronic signage system" 920 of FIG. 14. The process starts at step 1200.

In step 1200, the image data from the image capture device 400M is acquired. Step 1200 in FIG. 16 is the same as step 1000 in FIG. 15; please refer to the description of step 1000 in FIG. 15.

In step 1205, the relevant data on the attributes of the audio-visual advertisement content are provided. The attributes of the advertisement content refer to data such as the content attributes of the audio-visual advertisements provided by the content server 870 or the media player 880. The content attributes of the audio-visual advertisements include: the serial number of each advertisement, the playback order of each advertisement, the serial number of the advertisement currently being played, the target product of each advertisement (for example, ice cream, cars, glasses, health food, etc.), the audience groups each advertisement suits (for example, women, the elderly, children, male office workers, etc.), the preset broadcast period of each advertisement, the total playback time of each advertisement, and other such information. Among these, the audience groups suited to each audio-visual advertisement correlate very strongly with the various "person attributes" found in the pictures of the subsequently analyzed image data, such "person attributes" being, for example, gender, height, build, age, occupation, and so on.

In step 1210, the parts of the image data that contain a subject are recognized. Step 1210 in FIG. 16 is the same as step 1010 in FIG. 15; please refer to the description of step 1010 in FIG. 15.

In step 1220, the face/head of the person in the picture is detected. Step 1220 in FIG. 16 is the same as step 1020 in FIG. 15; please refer to the description of step 1020 in FIG. 15.

In step 1240, the head direction of the person in the picture is detected. Step 1240 in FIG. 16 is the same as step 1040 in FIG. 15; please refer to the description of step 1040 in FIG. 15.

In step 1250, the head position of the person in the picture is estimated. Step 1250 in FIG. 16 is the same as step 1050 in FIG. 15; please refer to the description of step 1050 in FIG. 15.

In step 1270, the gaze area of the person in the picture is estimated. Based on the results (output) of step 1240 and step 1250, the gaze area of a given person in the picture can be determined, so as to judge whether that person is gazing at the screen 890 on the audio-visual display device 850. Step 1270 in FIG. 16 is the same as step 1070 in FIG. 15; please refer to the description of step 1070 in FIG. 15.

In step 1260, the gender, age, and expression of the person in the picture are detected. According to another embodiment of the present invention, step 1260 detects only the gender and age of the person in the picture and does not include expression detection. Step 1260 in FIG. 16 is the same as step 1060 in FIG. 15; please refer to the description of step 1060 in FIG. 15.

Returning to step 1210 above: in step 1210, when the target for recognizing the features of a person in the picture is the "body", the process proceeds to step 1230.

In step 1230, the body parts of the person in the picture are detected, in particular the "person's height", the "person's build (fat/thin)", and the "person's clothing". Although the detection of the "person's height", "person's build (fat/thin)", and "person's clothing" is listed in the same step 1230, in practice these detections are processed in parallel with step 1240, step 1250, and step 1260. Step 1230 in FIG. 16 is the same as step 1030 in FIG. 15; please refer to the description of step 1030 in FIG. 15.

In step 1270, the gaze area of the person in the picture is estimated. Based on the execution results (output) of step 1240 and step 1250, step 1270 can estimate where a given person in the picture is looking. After step 1270 is executed, it is known whether a given person in the picture is gazing at the screen 890 on the audio-visual display device 850, i.e., whether that person is watching the playing audio-visual advertisement. Collecting the relevant information (attributes) of "the people in the picture who are watching the playing audio-visual advertisement" is therefore the most important part of the process. This process must also include the condition of "continuously watching the playing audio-visual advertisement for more than a period of time", where the period is 5 seconds. The value of 5 seconds serves only as an embodiment of the present invention and is not a limitation; the length of time may be any number of seconds. Based on the output results of step 1270, step 1260, and step 1230, the basic condition for a Target Person to be counted in the present invention is: "the length of time the person in the picture has continuously watched the playing audio-visual advertisement exceeds a preset time threshold". After step 1270, step 1260, and step 1230 are executed, the judgment of step 1280 is performed.

In step 1280, it is judged whether a person in the picture is gazing at the screen 890 of the audio-visual display device 850 and has gazed continuously for longer than a time threshold, for example, more than 5 seconds. According to an embodiment of the present invention, only people in the picture who satisfy both of the above conditions are the targets of the method disclosed in the present invention; other people are ignored. If the result of the judgment in step 1280 is true (Yes), step 1290 is executed; otherwise step 1330 is executed.

In step 1290, the people in the picture who are gazing at the screen 890, together with their attributes, are counted in real time. The attributes of a person gazing at the screen 890 refer to various features such as "gender", "age", "height", "build (fat/thin)", "accessories", "clothing", and "occupation". Step 1290 immediately compiles statistics on the attributes of the people in the picture who are gazing at the screen 890 (real-time attribute statistics, not shown in the figure). The results of the real-time attribute statistics may be, for example: "women aged 20-26: 6", "elderly people aged 70: 6; children aged 6-10: 2", "women wearing glasses: 10", "male office workers aged 45-60: 16; female office workers aged 45-55: 3", "heavier-set women: 9; women of average build: 2; slimmer women: 4", and so on, i.e., attribute statistics of the various people in the picture. The results of these real-time attribute statistics serve as the basis for deciding whether to change the content of the currently playing audio-visual advertisement.
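The real-time attribute statistics of step 1290 amount to counting (attribute, value) pairs over the current viewers. A minimal sketch, assuming each detected viewer is represented as a dict of already-bucketed attributes (the bucketing itself, e.g. "aged 20-26", is taken as given):

```python
from collections import Counter

def tally_viewers(viewers):
    """Count (attribute, value) buckets over the people currently
    gazing at the screen, e.g. ('gender', 'female') -> 6.

    viewers is a list of attribute dicts, one per detected person.
    """
    counts = Counter()
    for person in viewers:
        for attr, value in person.items():
            counts[(attr, value)] += 1
    return counts
```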

步驟1300中，係判斷圖片中所有的人是否都已經偵測/統計完成。依據本發明的一實施例，對於圖片中的每一人的偵測，該影像分析應用程式460M幾乎都是平行的處理的。承上述說明，該影像分析應用程式460M包括對：頭部方向之偵測、頭部位置之估算、性別之判斷、年紀之估算、身高的估算、胖瘦的判斷、職業(衣著)的判斷…等屬於臉/頭部與身體方面的偵測。當該影像分析應用程式460M完成以上工作後，則執行步驟1310，否則跳回至步驟1210中執行。 In step 1300, it is determined whether all persons in the picture have been detected and counted. According to an embodiment of the present invention, the image analysis application 460M processes the detection of each person in the picture almost in parallel. As described above, the image analysis application 460M performs face/head and body detections, including head-direction detection, head-position estimation, gender determination, age estimation, height estimation, body-shape (fat/thin) determination, and occupation (clothing) determination. When the image analysis application 460M has completed the above work, step 1310 is executed; otherwise, the flow jumps back to step 1210.

步驟1310中，係依據即時人的屬性統計的結果、各影音廣告的內容屬性來挑選目前最相符(適合)的影音廣告。上述即時人的屬性統計係步驟1290中統計圖片中所有正在注視螢幕890的人及其屬性(Attribute)；該影音廣告屬性係由步驟1205中所提供的各影音廣告的內容屬性的相關資料。該影像分析應用程式460M透過智慧型的比對與分析，會挑選出最相符(適合)「目前圖片中正在觀看影音廣告的人」的影音廣告。該所挑選出的影音廣告可能只有1則，也可能是1則以上。當該所選出的影音廣告是1則以上時，則進一步還會包括決定其播出的先後順序。上述該智慧型的比對與分析可能是：圖片中所包括的人的三分之二以上為30-65歲的男性時，選擇播出關於汽車的影音廣告；圖片中所包括的人的四分之三以上為12-50歲的女性時，選擇播出關於女性生理用品的影音廣告；圖片中所包括的人的二分之一以上有戴眼鏡時，選擇播出關於近視眼鏡的影音廣告…等分別屬於不同的內容屬性的影音廣告。 In step 1310, the currently best-matching (most suitable) audio-visual advertisement is selected according to the result of the real-time attribute statistics of the persons and the content attribute of each audio-visual advertisement. The above real-time attribute statistics are compiled in step 1290 over all persons in the picture who are gazing at the screen 890, together with their attributes; the audio-visual advertisement attributes are the content-attribute data of each audio-visual advertisement provided in step 1205. Through intelligent comparison and analysis, the image analysis application 460M selects the audio-visual advertisement that best matches (suits) "the persons currently watching the audio-visual advertisement in the picture". There may be only one selected audio-visual advertisement, or more than one; when more than one is selected, the order in which they are to be played is further determined. The intelligent comparison and analysis may be, for example: when two-thirds or more of the persons in the picture are males aged 30-65, an audio-visual advertisement about cars is selected; when three-quarters or more of the persons in the picture are females aged 12-50, an audio-visual advertisement about feminine hygiene products is selected; when one-half or more of the persons in the picture wear glasses, an audio-visual advertisement about corrective eyeglasses is selected; and so on, for audio-visual advertisements belonging to different content attributes.
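The fractional rules quoted in step 1310 can be expressed as simple threshold tests over the audience. The sketch below is an assumed illustration: `pick_ads`, the viewer record fields, and the catalog keys (`car`, `feminine_care`, `eyewear`) are all hypothetical names, and the fractions are the examples given in the paragraph.

```python
def pick_ads(viewers, ad_catalog):
    """Select audio-visual ads whose content attribute matches the current
    audience, following the example fractional rules of step 1310.
    `viewers` holds per-person attribute dicts; `ad_catalog` maps a
    content attribute to a list of ad identifiers (both hypothetical)."""
    n = len(viewers)
    if n == 0:
        return []

    def frac(pred):
        # fraction of the audience satisfying a predicate
        return sum(1 for v in viewers if pred(v)) / n

    picked = []
    if frac(lambda v: v["gender"] == "M" and 30 <= v["age"] <= 65) >= 2 / 3:
        picked += ad_catalog.get("car", [])
    if frac(lambda v: v["gender"] == "F" and 12 <= v["age"] <= 50) >= 3 / 4:
        picked += ad_catalog.get("feminine_care", [])
    if frac(lambda v: v.get("glasses", False)) >= 1 / 2:
        picked += ad_catalog.get("eyewear", [])
    # more than one ad may be returned; list order stands in for play order
    return picked
```

For an audience of three men aged 30-65 of whom one wears glasses, only the car rule fires (the glasses fraction is 1/3), so a catalog with car and eyewear ads yields only the car ads.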

步驟1320中，係通知該內容伺服器870變更目前播放的影音廣告與順序。由於該影音廣告的檔案是儲存於該內容伺服器870內，且該影音廣告的播放與影音串流的輸出也是由該內容伺服器870所完成，故該內容伺服器870可以決定目前要播放的影音廣告的內容(內容屬性)及其播放順序。所以該影像處理裝置300M通知該內容伺服器870，以「步驟1310中，透過智慧的比對與分析後所挑選出的影音廣告及其順序」來播放該影音廣告。又，此時，對於目前尚未播放完畢的影音廣告，該內容伺服器870可以是即時中斷該影音廣告並切換成新順序的影音廣告的方式來播出。在切換新順序後所播出的影音廣告，其廣告屬性是與圖片中目前正在注視影音廣告播出的人的屬性最相關的，所以也極可能是圖片中目前正在注視影音廣告播出的人最有興趣的影音廣告。就以該影音廣告來說，該影音廣告係引起該圖片中最多人的興趣。故，本發明中的智慧型電子看板系統100M可以將該影音廣告的廣告效益發揮到最大。 In step 1320, the content server 870 is notified to change the audio-visual advertisement currently being played and its playing order. Since the files of the audio-visual advertisements are stored in the content server 870, and both the playback of the audio-visual advertisements and the output of the audio-visual stream are performed by the content server 870, the content server 870 can determine the content (content attribute) of the audio-visual advertisement to be played and its playing order. The image processing device 300M therefore notifies the content server 870 to play the audio-visual advertisements according to "the audio-visual advertisements and their order selected through the intelligent comparison and analysis in step 1310". Moreover, for an audio-visual advertisement that has not yet finished playing, the content server 870 may immediately interrupt it and switch to the audio-visual advertisements in the new order. The advertisement attribute of the audio-visual advertisement played after the switch is the most relevant to the attributes of the persons in the picture who are currently watching, so it is also very likely the audio-visual advertisement those persons are most interested in; that is, the advertisement arouses the interest of the most persons in the picture. Therefore, the intelligent electronic signage system 100M of the present invention can maximize the advertising benefit of the audio-visual advertisements.

步驟1330中，係該影像處理裝置300M判斷是否收到該內容伺服器870所送出的暫停/關機信號。為使影音廣告的播映成本降低，該等影音廣告不一定是24小時來循環播放，而是在每日中的某一段時間來播放，例如：07:00至24:00的時間來播放。當該影像處理裝置300M收到該內容伺服器870所送出的暫停/關機信號時，則結束圖16中該智慧電子看板應用階段的流程，否則，跳回至步驟1210中繼續執行。 In step 1330, the image processing device 300M determines whether the pause/shutdown signal sent by the content server 870 has been received. To reduce the broadcasting cost of the audio-visual advertisements, the audio-visual advertisements are not necessarily played in a 24-hour loop, but may instead be played during a certain period of each day, for example, from 07:00 to 24:00. When the image processing device 300M receives the pause/shutdown signal sent by the content server 870, the flow of the smart electronic signage application stage in FIG. 16 ends; otherwise, the flow jumps back to step 1210 and continues.
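The daily play window mentioned in step 1330 can be modeled as a simple time-of-day comparison; this is an assumed sketch (the function name and the treatment of "24:00" as end of day are not from the patent), using the example 07:00-24:00 window above.

```python
from datetime import time as dtime

PLAY_START = dtime(7, 0)        # 07:00, from the example window in step 1330
PLAY_END = dtime(23, 59, 59)    # "24:00" treated here as the end of the day

def within_play_window(now):
    """True while advertisements should play; outside this daily window the
    content server would issue the pause/shutdown signal checked in step 1330."""
    return PLAY_START <= now <= PLAY_END
```

A scheduler on the content server could poll this check and send the pause/shutdown signal to the image processing device when it turns false.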

又，上述步驟1240、步驟1250、步驟1270的執行係為了判斷圖片中人是否正在注視影音顯示裝置850上的螢幕890，若圖片中的人不是正在注視影音顯示裝置850上的螢幕890時，則圖片中此種情況的人不會被列入統計的對象。依據本發明的另一實施例，當不需要判斷圖片中人是否正在注視影音顯示裝置850上的螢幕890時，則步驟1240、步驟1250、步驟1270可以省略。此外，在此情況下，也不需步驟1280去判斷圖片中人注視影音顯示裝置850上的螢幕890的時間長度是否超過一預設的時間門檻值，此時步驟1280也會同步省略。 Moreover, steps 1240, 1250, and 1270 above are performed to determine whether a person in the picture is gazing at the screen 890 of the audio-visual display device 850; a person in the picture who is not gazing at the screen 890 is not included in the statistics. According to another embodiment of the present invention, when it is unnecessary to determine whether a person in the picture is gazing at the screen 890 of the audio-visual display device 850, steps 1240, 1250, and 1270 can be omitted. In this case, step 1280, which determines whether the length of time a person in the picture gazes at the screen 890 of the audio-visual display device 850 exceeds a preset time threshold, is likewise unnecessary and is also omitted.

又，雖然圖14中繪示步驟910與步驟920是依照先後順序執行，並分別以圖15與圖16的2個子流程來代表步驟910與步驟920的執行細節；但，依據本發明的另一實施例，圖14中的步驟910及步驟920亦可以各自獨立地平行執行，也就是說，代表步驟910及步驟920之執行內容的2個子流程（亦即，圖15及圖16）可以同時執行。 Moreover, although FIG. 14 shows steps 910 and 920 being executed sequentially, with the two sub-flows of FIG. 15 and FIG. 16 respectively representing the execution details of step 910 and step 920, according to another embodiment of the present invention, step 910 and step 920 in FIG. 14 may also be executed independently and in parallel; that is to say, the two sub-flows representing the execution content of step 910 and step 920 (i.e., FIG. 15 and FIG. 16) can be executed simultaneously.
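The parallel-execution embodiment above can be sketched with two worker threads; this is an illustrative assumption only, and the callables standing in for the FIG. 15 and FIG. 16 sub-flows are hypothetical placeholders.

```python
import threading

def run_stages(model_build_stage, signage_stage):
    """Run step 910 (FIG. 15) and step 920 (FIG. 16) independently and in
    parallel, per the alternative embodiment. The two callables stand in
    for the sub-flows (hypothetical placeholders)."""
    t1 = threading.Thread(target=model_build_stage)
    t2 = threading.Thread(target=signage_stage)
    t1.start()
    t2.start()
    t1.join()   # wait for both sub-flows to finish
    t2.join()
```

In the sequential embodiment of FIG. 14, the same two callables would simply be invoked one after the other instead.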

本發明的實施例中，該影像處理裝置300M取得即時的影像資料後，首先辨識/分析圖片中該正在注視影音廣告的人的屬性，再經過與各影音廣告的內容屬性執行一智慧的分析比對，以得到最新的影音廣告播放順序且即時地變更影音廣告的播放，所以該最新播出的影音廣告的廣告內容極可能引起圖片中正在觀看影音廣告的人的興趣，也表示目前正在播出的影音廣告都是被目前最相關的人來觀看。所以，使用本發明所揭露的該智慧電子看板系統與方法會讓該影音廣告的廣告效益發揮到最大，以節省廣告經費。 In the embodiments of the present invention, after the image processing device 300M obtains the real-time image data, it first recognizes/analyzes the attributes of the persons in the picture who are watching the audio-visual advertisement, and then performs an intelligent analysis and comparison against the content attribute of each audio-visual advertisement to obtain the latest playing order of the audio-visual advertisements and change their playback in real time. The advertising content of the newly played audio-visual advertisement is therefore very likely to arouse the interest of the persons in the picture who are watching it, which also means that the audio-visual advertisements currently being played are being watched by the most relevant audience at the moment. Accordingly, using the smart electronic signage system and method disclosed in the present invention maximizes the advertising benefit of the audio-visual advertisements and saves advertising expenses.

以上所述僅為本發明之較佳實施例，凡依本發明申請專利範圍所做之均等變化與修飾，皆應屬本發明之涵蓋範圍。 The above descriptions are only preferred embodiments of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall fall within the coverage of the present invention.

700,705,710,715,720,725,722A,722B,722C,730,735,740,745,750,755,760,765:步驟 700,705,710,715,720,725,722A,722B,722C,730,735,740,745,750,755,760,765: steps

Claims (88)

一種訪客興趣程度分析系統,係用於分析一訪客對至少一物品的興趣程度,該系統包括:至少一影像擷取裝置,設置於一地點,用以擷取該地點的一影像資料,該影像資料中記錄有該訪客的一第一頭部影像;以及一影像分析伺服器,連接於該至少一影像擷取裝置,用以計算並分析來自該至少一影像擷取裝置的該影像資料,該影像分析伺服器更包含:一資料處理中心,用於執行一影像分析應用程式,可以依據自該第一頭部影像中所得的一第一特徵對映(feature mapping)而判斷該第一頭部影像所對應的一第一頭部方向,並依循該第一頭部方向而計算出一第一投影區域;以及一記憶單元,用來暫存該影像資料、該第一頭部影像、該第一特徵對映(feature mapping)、以及其它該資料處理中心在運作過程中所需或產出的相關資料;其中,該第一投影區域的計算係利用一虛擬光源,該虛擬光源置於該訪客的頭部位置的後方且與該第一頭部方向一致的方向投射光線而形成模擬的該第一投影區域,當該第一投影區域涵蓋該至少一物品的位置時,即表示該訪客對該至少一物品有一興趣程度。 A visitor interest degree analysis system is used to analyze a visitor's degree of interest in at least one item. The system includes: at least one image capture device, set at a location, for capturing an image data of the location, the image A first head image of the visitor is recorded in the data; and an image analysis server is connected to the at least one image capture device to calculate and analyze the image data from the at least one image capture device, the The image analysis server further includes: a data processing center for executing an image analysis application program, which can judge the first head according to a first feature mapping obtained from the first head image A first head direction corresponding to the image, and a first projection area is calculated according to the first head direction; and a memory unit is used to temporarily store the image data, the first head image, the second A feature mapping (feature mapping), and other relevant data required or produced by the data processing center during operation; wherein, the calculation of the first projection area uses a virtual light source placed on the visitor The simulated first projection area is formed by projecting light in a direction consistent with the first head direction and behind the head position of the visitor. When the first projection area covers the position of the at least one item, it means that the visitor has At least one item has an interest level. 
如請求項1所述的系統,其中該影像分析伺服器更具有一儲存單元,由至少一實體儲存裝置所組成,用來提供該影像分析伺服器儲存空間。 The system according to claim 1, wherein the image analysis server further has a storage unit, which is composed of at least one physical storage device, and is used to provide storage space for the image analysis server. 如請求項1所述的系統,其中該資料處理中心係由一中央處理單元與一圖形處理單元的其中之一或兩者所組成。 The system according to claim 1, wherein the data processing center is composed of one or both of a central processing unit and a graphics processing unit. 如請求項1所述的系統,其中該第一特徵對映(feature mapping)係該影像分析伺服器透過分析該第一頭部影像中各個像素(pixel)間的關係而得之可以描述該第一頭部影像的特徵的一組資料。 The system as described in claim 1, wherein the first feature mapping (feature mapping) is obtained by the image analysis server by analyzing the relationship between each pixel (pixel) in the first head image and can describe the first A set of data that characterizes a head image. 如請求項1所述的系統,其中該第一頭部方向係指該第一頭部影像中該訪客的頭部於空間中運動的一自由度。 The system as claimed in claim 1, wherein the first head orientation refers to a degree of freedom in which the visitor's head moves in space in the first head image. 如請求項5所述的系統,其中該自由度是以翻滾(Roll)、俯仰(Pitch)、偏擺(Yaw)三個旋轉維度來表示。 The system according to claim 5, wherein the degree of freedom is represented by three rotation dimensions of roll (Roll), pitch (Pitch) and yaw (Yaw). 如請求項5所述的系統,其中該自由度是以代表前後、左右、上下的三維直角座標系來表示。 The system as claimed in claim 5, wherein the degree of freedom is represented by a three-dimensional Cartesian coordinate system representing front and rear, left and right, and up and down. 如請求項1所述的系統,其中該至少一影像擷取裝置與該影像分析伺服器間係直接連接或是透過一網路連接。 The system according to claim 1, wherein the at least one image capture device is directly connected to the image analysis server or connected through a network. 如請求項1所述的系統,其中該影像資料僅由該至少一影像擷取裝置的單一鏡頭所擷取。 The system as claimed in claim 1, wherein the image data is only captured by a single lens of the at least one image capture device. 
如請求項1所述的系統,其中該影像分析應用程式係為一軟體、或軟硬體兼具的模組,用於分析該影像資料。 The system according to claim 1, wherein the image analysis application program is a software, or a module combining software and hardware, for analyzing the image data. 如請求項1所述的系統,其中更包含一雲資料儲存單元,連接於 該至少一影像擷取裝置與該影像分析伺服器,該雲資料儲存單元係用於儲存該至少一影像擷取裝置所擷取的該影像資料。 The system as described in claim 1, which further comprises a cloud data storage unit connected to The at least one image capture device, the image analysis server, and the cloud data storage unit are used to store the image data captured by the at least one image capture device. 如請求項11所述的系統,其中該影像分析伺服器更包括一雲通道服務應用程式,做為該影像分析伺服器與該雲資料儲存單元之間資料傳輸的中介服務程式。 The system as described in claim 11, wherein the image analysis server further includes a cloud channel service application program as an intermediary service program for data transmission between the image analysis server and the cloud data storage unit. 如請求項11所述的系統,其中該影像分析伺服器是位於該雲資料儲存單元內的一虛擬機器。 The system as claimed in claim 11, wherein the image analysis server is a virtual machine located in the cloud data storage unit. 如請求項1所述的系統,其中該第一投影區域與該第一頭部影像於空間中代表的橫截面積一樣。 The system as claimed in claim 1, wherein the first projected area is the same as the cross-sectional area represented by the first head image in space. 如請求項1所述的系統,其中該第一投影區域更區分為複數個子區域,該複數個子區域各自對應一機率,代表該至少一物品的位置被涵蓋於該子區域時,該至少一物品被注視的機率。 The system as described in claim 1, wherein the first projected area is further divided into a plurality of sub-areas, and each of the plurality of sub-areas corresponds to a probability, representing that when the position of the at least one item is covered by the sub-area, the at least one item chance of being watched. 如請求項15所述的系統,其中該複數個子區域各自對應的該機率代表該訪客對該至少一物品的該興趣程度。 The system according to claim 15, wherein the probability corresponding to each of the plurality of sub-regions represents the degree of interest of the visitor to the at least one item. 
如請求項1所述的系統,其中更計算該第一投影區域涵蓋該至少一物品的位置的時間長度是否超過一時間門檻值,做為判斷該訪客是否對該至少一物品具有該興趣程度的依據。 The system as described in claim 1, wherein it further calculates whether the length of time that the first projection area covers the position of the at least one item exceeds a time threshold, as a criterion for judging whether the visitor has the degree of interest in the at least one item in accordance with. 如請求項1所述的系統,其中該至少一物品的位置係參考一輸入的位置資料而得,或是透過分析該影像資料中該至少一物品的全部或部分影像而計算而得。 The system as claimed in claim 1, wherein the position of the at least one item is obtained by referring to an input position data, or is calculated by analyzing all or part of the image of the at least one item in the image data. 如請求項1所述的系統,其中更對該影像資料中加入文字、數字或符號說明該訪客的身份(ID)或代號。 The system as described in Claim 1, further adding words, numbers or symbols to the image data to describe the identity (ID) or code of the visitor. 如請求項1所述的系統,其中該影像資料中加入文字、數字或符號說明該至少一物品的名稱或代號。 The system according to claim 1, wherein words, numbers or symbols are added to the image data to describe the name or code of the at least one item. 如請求項1所述的系統,其中該影像資料中更包含有該至少一物品的全部或部分影像,並在該至少一物品的全部或部分影像上或周圍加入一視覺化的效果,用以凸顯該訪客對該至少一物品的興趣程度。 The system as described in Claim 1, wherein the image data further includes all or part of the image of the at least one item, and a visual effect is added on or around the whole or part of the image of the at least one item for Highlight the visitor's degree of interest in the at least one item. 
一種訪客興趣程度分析系統,連接於至少一影像擷取模組且接收來自該至少一影像擷取模組的一影像資料,係用於分析一訪客對至少一標的的興趣程度,該系統包括:一資料處理中心,執行一影像分析應用程式,可以依據自該影像資料中一第一頭部影像所得的一第一特徵對映(feature mapping)而判斷該第一頭部影像所對應的一第一頭部方向,並依循該第一頭部方向而計算出一第一投影區域;以及一記憶單元,用來暫存該影像資料、該第一頭部影像、該第一特徵對映(feature mapping)、以及其它該資料處理中心在運作過程中所需或產出的相關資料; 其中,該第一投影區域的計算係利用一虛擬光源,該虛擬光源置於該訪客的頭部位置的後方且與該第一頭部方向一致的方向投射光線而形成模擬的該第一投影區域,當該第一投影區域涵蓋該至少一標的的位置時,即表示該訪客對該至少一標的有一興趣程度。 A visitor interest degree analysis system, connected to at least one image capture module and receiving an image data from the at least one image capture module, is used to analyze a visitor's degree of interest in at least one target, the system comprising: A data processing center executes an image analysis application program, and can determine a first head image corresponding to the first head image according to a first feature mapping obtained from the first head image in the image data. A head direction, and a first projection area is calculated according to the first head direction; and a memory unit is used to temporarily store the image data, the first head image, and the first feature mapping (feature mapping), and other relevant data required or produced by the data processing center during its operation; Wherein, the calculation of the first projection area utilizes a virtual light source, which is placed behind the head of the visitor and projects light in a direction consistent with the direction of the first head to form a simulated first projection area , when the first projection area covers the position of the at least one target, it means that the visitor has a degree of interest in the at least one target. 如請求項22所述的系統,其中更具有一儲存單元,由至少一實體儲存裝置所組成,用來提供儲存空間。 The system as claimed in claim 22 further has a storage unit composed of at least one physical storage device for providing storage space. 
如請求項22所述的系統,其中該資料處理中心係由一中央處理單元與一圖形處理單元的其中之一或兩者所組成。 The system according to claim 22, wherein the data processing center is composed of one or both of a central processing unit and a graphics processing unit. 如請求項22所述的系統,其中該第一特徵對映(feature mapping)係該資料處理中心透過分析該第一頭部影像中各個像素(pixel)間的關係而得之可以描述該第一頭部影像的特徵的一組資料。 The system as described in claim 22, wherein the first feature mapping is obtained by the data processing center by analyzing the relationship between pixels in the first head image and can describe the first A set of data for the characteristics of head images. 如請求項22所述的系統,其中該第一頭部方向係指該第一頭部影像中該訪客的頭部於空間中運動的一自由度。 The system according to claim 22, wherein the first head orientation refers to a degree of freedom in which the visitor's head moves in space in the first head image. 如請求項26所述的系統,其中該自由度是以翻滾(Roll)、俯仰(Pitch)、偏擺(Yaw)三個旋轉維度來表示。 The system according to claim 26, wherein the degree of freedom is represented by three rotational dimensions of roll (Roll), pitch (Pitch) and yaw (Yaw). 如請求項26所述的系統,其中該自由度是以代表前後、左右、上下的三維直角座標系來表示。 The system as claimed in claim 26, wherein the degree of freedom is represented by a three-dimensional Cartesian coordinate system representing front-back, left-right, and up-down. 如請求項22所述的系統,其中該系統係直接或透過一網路連接於該至少一影像擷取模組。 The system according to claim 22, wherein the system is connected to the at least one image capture module directly or through a network. 如請求項22所述的系統,其中該影像資料僅由該至少一影像擷取模組的單一鏡頭所擷取。 The system as claimed in claim 22, wherein the image data is only captured by a single lens of the at least one image capture module. 如請求項22所述的系統,其中該影像分析應用程式係為一軟體、或軟硬體兼具的模組,用於分析該影像資料。 The system according to claim 22, wherein the image analysis application program is a software, or a module combining software and hardware, for analyzing the image data. 
如請求項22所述的系統,其中該系統更透過一雲資料儲存單元連接於該至少一影像擷取裝置,該雲資料儲存單元係用於儲存該至少一影像擷取裝置所擷取的該影像資料。 The system as described in claim 22, wherein the system is further connected to the at least one image capture device through a cloud data storage unit, and the cloud data storage unit is used to store the image captured by the at least one image capture device video material. 如請求項32所述的系統,其中更包括一雲通道服務應用程式,做為與該雲資料儲存單元之間資料傳輸的中介服務程式。 The system as described in claim 32 further includes a cloud channel service application program as an intermediary service program for data transmission with the cloud data storage unit. 如請求項32所述的系統,其中該系統是位於該雲資料儲存單元內的一虛擬機器。 The system as claimed in claim 32, wherein the system is a virtual machine located in the cloud data storage unit. 如請求項22所述的系統,其中該第一投影區域與該第一頭部影像於空間中代表的橫截面積一樣。 The system according to claim 22, wherein the first projected area is the same as the cross-sectional area represented by the first head image in space. 如請求項22所述的系統,其中該第一投影區域更區分為複數個子區域,該複數個子區域各自對應一機率,代表該至少一標的的位置被涵蓋於 該子區域時,該至少一標的被注視的機率。 The system according to claim 22, wherein the first projected area is further divided into a plurality of sub-areas, and each of the plurality of sub-areas corresponds to a probability, representing that the position of the at least one target is covered in In the sub-area, the probability that the at least one target is watched. 如請求項36所述的系統,其中該複數個子區域各自對應的該機率代表該訪客對該至少一標的的該興趣程度。 The system as claimed in claim 36, wherein the probability corresponding to each of the plurality of sub-areas represents the degree of interest of the visitor in the at least one target. 如請求項22所述的系統,其中更計算模擬的該第一投影區域涵蓋該至少一標的的位置的時間長度是否超過一時間門檻值,做為判斷該訪客是否對該至少一標的具有該興趣程度的依據。 The system as described in claim 22, wherein it further calculates whether the time length of the simulated first projected area covering the position of the at least one object exceeds a time threshold value, as the judgment whether the visitor has the interest in the at least one object degree of basis. 
如請求項22所述的系統,其中該至少一標的的位置係參考一輸入的位置資料而得,或是透過分析該影像資料中該至少一標的的全部或部分影像而計算而得。 The system as claimed in claim 22, wherein the position of the at least one object is obtained by referring to an input position data, or is calculated by analyzing all or part of the image of the at least one object in the image data. 如請求項22所述的系統,其中更對該影像資料中加入文字、數字或符號說明該訪客的身份(ID)或代號。 The system as described in Claim 22, further adding words, numbers or symbols to the image data to describe the identity (ID) or code of the visitor. 如請求項22所述的系統,其中更對該影像資料中加入文字、數字或符號說明該至少一標的的名稱或代號。 The system according to claim 22, further adding words, numbers or symbols to the image data to describe the name or code of the at least one object. 如請求項22所述的系統,其中更在該影像資料中更包含有該至少一物品的全部或部分影像,並在該至少一物品的全部或部分影像上或周圍加入一視覺化的效果,用以凸顯該訪客對該至少一標的的興趣程度。 The system as described in claim 22, wherein the image data further includes all or part of the image of the at least one item, and a visual effect is added on or around the entire or part of the image of the at least one item, It is used to highlight the visitor's degree of interest in the at least one target. 如請求項22所述的系統,該系統更包括: 一內容管理模組,連接於該資料處理中心並根據該資料處理中心的分析結果提供一串流影音資料,其中該串流影音資料更包括一內容屬性;以及一播放模組,連接於該內容管理模組並解碼來自該內容管理模組的該串流影音資料,並將解碼後的該串流影音資料傳送給一顯示裝置供其播放。 As the system described in claim 22, the system further includes: A content management module, connected to the data processing center and providing a stream of audio-visual data according to the analysis results of the data processing center, wherein the stream of audio-visual data further includes a content attribute; and a playback module, connected to the content The management module decodes the streaming audio-visual data from the content management module, and sends the decoded streaming audio-visual data to a display device for playing. 如請求項43所述的系統,其中該顯示裝置係用以接收來自該播放模組所解碼的該串流影音資料。 The system according to claim 43, wherein the display device is used to receive the streamed audio-visual data decoded by the playback module. 
如請求項43所述的系統,其中該顯示裝置包括一螢幕。 The system as claimed in claim 43, wherein the display device includes a screen. 如請求項43所述的系統,其中該播放模組及該資料處理中心是整合在一起。 The system according to claim 43, wherein the playing module and the data processing center are integrated together. 如請求項43所述的系統,其中該內容管理模組、該播放模組及該資料處理中心是整合在一起。 The system according to claim 43, wherein the content management module, the playback module and the data processing center are integrated together. 如請求項45所述的系統,其中該影像擷取模組是整合在該顯示裝置之中。 The system as claimed in claim 45, wherein the image capture module is integrated in the display device. 如請求項43所述的系統,其中該內容管理模組、該播放模組、該影像擷取模組及該資料處理中心之間係透過至少一網路彼此連接。 The system according to claim 43, wherein the content management module, the playback module, the image capture module and the data processing center are connected to each other through at least one network. 如請求項43所述的系統,其中該資料處理中心分析該影像資料中該訪客的數量與屬性,並根據分析後所得的結果調整該串流影音資料的內 容。 The system as described in claim 43, wherein the data processing center analyzes the number and attributes of the visitor in the video data, and adjusts the content of the streaming audio-visual data according to the analyzed results Allow. 如請求項43所述的系統,其中該資料處理中心根據該第一頭部方向而判斷該訪客是否正在注視該顯示裝置,並根據分析後的結果調整該串流影音資料的內容。 The system according to claim 43, wherein the data processing center judges whether the visitor is looking at the display device according to the first head direction, and adjusts the content of the streaming audio-visual data according to the analysis result. 如請求項43所述的系統,其中該第一投影區域更區分為複數個子區域,該複數個子區域各自對應一機率,代表該顯示裝置的位置被涵蓋於該子區域時,該顯示裝置的播放內容被注視的機率。 The system as described in claim 43, wherein the first projection area is further divided into a plurality of sub-areas, and each of the plurality of sub-areas corresponds to a probability, which means that when the position of the display device is covered by the sub-area, the playback of the display device The probability of the content being watched. 
如請求項52所述的系統,其中該複數個子區域各自對應的該機率代表該訪客對該顯示裝置的播放內容的該興趣程度。 The system as claimed in claim 52, wherein the probability corresponding to each of the plurality of sub-areas represents the degree of interest of the visitor in the playing content of the display device. 如請求項43所述的系統,其中更計算模擬的該第一投影區域涵蓋該顯示裝置的位置的時間長度是否超過一時間門檻值,做為判斷該訪客是否對該顯示裝置的播放內容具有該興趣程度的依據。 The system as described in claim 43, wherein it further calculates whether the time length of the simulated first projection area covering the position of the display device exceeds a time threshold value, as judging whether the visitor has the content played by the display device basis of interest. 如請求項22所述的系統,其中該影像擷取模組擷取該訪客的外觀而產生該影像資料。 The system according to claim 22, wherein the image capture module captures the appearance of the visitor to generate the image data. 如請求項22所述的方法,其中該虛擬光源位於該第一頭部影像於一立體空間中的所在位置的後方大約3倍的該第一頭部影像的模擬球體的直徑之處。 The method as claimed in claim 22, wherein the virtual light source is located behind the position of the first head image in a three-dimensional space by about 3 times the diameter of the simulated sphere of the first head image. 一種分析訪客興趣程度的方法,該方法係由一影像分析伺服器所執行,用以判斷一訪客對至少一標的的興趣程度,該方法包括:提供一影像分析應用程式於該影像分析伺服器中;該影像分析伺服器取得一影像資料;該影像分析伺服器偵測該影像資料中具有一第一頭部特徵的一第一頭部影像;該影像分析伺服器分析該第一頭部影像,並藉由分析所得的一第一特徵對映(feature mapping)而判斷該第一頭部影像所對應的一第一頭部方向;該影像分析伺服器計算該第一頭部影像於一立體空間中的所在位置,並依據該第一頭部影像的所在位置、該第一頭部方向以及利用一虛擬光源計算出模擬的一第一投影區域;以及該影像分析伺服器根據該第一投影區域的涵蓋範圍以及該至少一標的的位置,而判斷對應該第一頭部影像的該訪客是否對該至少一標的有一興趣程度。 A method for analyzing the degree of interest of a visitor. The method is executed by an image analysis server to determine the degree of interest of a visitor in at least one target. 
The method includes: providing an image analysis application program in the image analysis server The image analysis server obtains an image data; the image analysis server detects a first head image having a first head feature in the image data; the image analysis server analyzes the first head image, and judge a first head direction corresponding to the first head image by analyzing a first feature mapping obtained; the image analysis server calculates the first head image in a three-dimensional space and calculate a simulated first projection area based on the location of the first head image, the first head direction, and a virtual light source; and the image analysis server calculates a simulated first projection area according to the first projection area and the location of the at least one object, and determine whether the visitor who has viewed the first head image is interested in the at least one object. 如請求項57所述的方法,其中該影像分析應用程式係為一軟體、或軟硬體兼具的模組,用於分析該影像資料。 The method according to claim 57, wherein the image analysis application program is a software, or a module combining software and hardware, for analyzing the image data. 如請求項57所述的方法,其中該影像分析伺服器取得該影像資料的方式是,該影像分析伺服器接收由一影像擷取裝置透過網路或直接傳送的該影像資料。 The method as described in claim 57, wherein the image analysis server obtains the image data by receiving the image data sent by an image capture device through the network or directly by the image analysis server. 如請求項59所述的方法,其中該影像資料僅由該影像擷取裝置的單一鏡頭所擷取。 The method as claimed in claim 59, wherein the image data is only captured by a single lens of the image capture device. 如請求項57所述的方法,其中該影像分析伺服器取得該影像資料的方式是,該影像分析伺服器透過一網路從一雲資料儲存單元下載該影像資料。 The method as described in claim 57, wherein the image analysis server obtains the image data by downloading the image data from a cloud data storage unit through a network. 
如請求項61所述的方法,其中更提供一雲通道服務應用程式,做為該影像分析伺服器與該雲資料儲存單元之間資料傳輸的中介服務程式。 The method as described in claim 61, wherein a cloud channel service application program is further provided as an intermediary service program for data transmission between the image analysis server and the cloud data storage unit. 如請求項57所述的方法,其中該影像分析伺服器更包含:一資料處理中心,用於執行該影像分析應用程式,可以依據自該第一頭部影像中所得的該第一特徵對映(feature mapping)而判斷該第一頭部影像所對應的該第一頭部方向,並依循該第一頭部方向而計算出該第一投影區域;以及一記憶單元,用來暫存該影像資料、該第一頭部影像、該第一特徵對映(feature mapping)、以及其它該資料處理中心在運作過程中所需或產出的相關資料。 The method as described in claim 57, wherein the image analysis server further includes: a data processing center for executing the image analysis application program, which can be mapped according to the first feature obtained from the first head image (feature mapping) to determine the first head direction corresponding to the first head image, and calculate the first projection area according to the first head direction; and a memory unit for temporarily storing the image data, the first head image, the first feature mapping, and other relevant data required or produced by the data processing center during operation. 如請求項63所述的方法,其中該資料處理中心係由一中央處理單元與一圖形處理單元的其中之一或兩者所組成。 The method according to claim 63, wherein the data processing center is composed of one or both of a central processing unit and a graphics processing unit. 如請求項57所述的方法,其中該第一特徵對映(feature mapping)係該影像分析伺服器透過分析該第一頭部影像中各個像素(pixel)間的關係而得之可以描述該第一頭部影像的特徵的一組資料。 The method as described in claim 57, wherein the first feature mapping (feature mapping) is obtained by the image analysis server by analyzing the relationship between each pixel (pixel) in the first head image and can describe the first A set of data that characterizes a head image. 如請求項57所述的方法,其中該第一頭部方向係指該第一頭部影像中該訪客的頭部於空間中運動的一自由度。 The method according to claim 57, wherein the first head orientation refers to a degree of freedom in which the visitor's head moves in space in the first head image. 
如請求項66所述的方法,其中該自由度是以翻滾(Roll)、俯仰(Pitch)、偏擺(Yaw)三個旋轉維度來表示。 The method according to claim 66, wherein the degree of freedom is represented by three rotational dimensions of roll, pitch and yaw. 如請求項66所述的方法,其中該自由度是以代表前後、左右、上下的三維直角座標系來表示。 The method as claimed in claim 66, wherein the degree of freedom is represented by a three-dimensional Cartesian coordinate system representing front and rear, left and right, and up and down. 如請求項57所述的方法,其中該第一投影區域與該第一頭部影像於空間中代表的橫截面積相同。 The method according to claim 57, wherein the first projected area is the same as the cross-sectional area represented by the first head image in space. 如請求項57所述的方法,其中該第一投影區域更區分為複數個子區域,該複數個子區域各自對應一機率,代表該至少一標的的位置被涵蓋於該子區域時,該至少一標的被注視的機率。 The method as described in claim 57, wherein the first projection area is further divided into a plurality of sub-areas, and each of the plurality of sub-areas corresponds to a probability, representing that when the position of the at least one target is covered by the sub-area, the at least one target chance of being watched. 如請求項70所述的方法,其中該複數個子區域各自對應的該機率代表該訪客對該至少一標的的該興趣程度。 The method as claimed in claim 70, wherein the probability corresponding to each of the plurality of sub-areas represents the degree of interest of the visitor to the at least one target. 如請求項57所述的方法,其中更計算該第一投影區域涵蓋該至少一標的的位置的時間長度是否超過一時間門檻值,做為判斷該訪客是否對該至少一標的具有該興趣程度的依據。 The method as described in claim 57, wherein it is further calculated whether the length of time that the first projection area covers the position of the at least one object exceeds a time threshold, as a criterion for judging whether the visitor has the degree of interest in the at least one object in accordance with. 
The method as described in claim 57, wherein the position of the at least one target is obtained by referring to input position data, or is calculated by analyzing all or part of the image of the at least one target in the image data. The method as described in claim 57, wherein text, numbers, or symbols describing the identity (ID) or code of the visitor are further added to the image data. The method as described in claim 57, wherein text, numbers, or symbols describing the name or code of the at least one target are further added to the image data. The method as described in claim 57, wherein the image data further includes all or part of the image of the at least one item, and a visual effect is added on or around that image to highlight the visitor's degree of interest in the at least one target. The method as described in claim 57, wherein the image analysis server is a virtual machine located in a cloud data storage unit. The method as described in claim 57, wherein the image analysis server is further connected to a storage unit composed of at least one physical storage device, which provides storage space for the image analysis server. The method as described in claim 57, wherein the virtual light source is located behind the position of the first head image in three-dimensional space, at a distance of approximately three times the diameter of the simulated sphere of the first head image.
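The last claim above fixes the virtual light source at about three head-diameters behind the simulated head sphere. The size of the resulting projection area on a target plane then follows from similar triangles. The sketch below treats the head silhouette as a disc of radius d/2 at the head centre — a simplification for illustration; only the 3× distance comes from the claim, the numeric inputs are assumptions.

```python
def projection_radius(head_diameter: float, head_to_plane: float) -> float:
    """Radius of the shadow a spherical head casts on a plane when lit by
    a virtual point light source placed 3 head-diameters behind it.

    By similar triangles, the silhouette of radius d/2 at the head's
    position scales by (light-to-plane) / (light-to-head)."""
    light_to_head = 3.0 * head_diameter
    light_to_plane = light_to_head + head_to_plane
    return (head_diameter / 2.0) * light_to_plane / light_to_head

# Assumed example: head of diameter 0.2 m gazing at a wall 3 m away.
r = projection_radius(0.2, 3.0)
print(round(r, 3))  # 0.6
```

Note the design consequence: the farther the target plane, the larger the projection area, so distant targets are covered with a wider (and less certain) gazing region.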
The method as described in claim 57, wherein the at least one target is a display device. The method as described in claim 80, further comprising the following steps: when the image analysis server determines that the visitor has the degree of interest in the display device, counting the number of such visitors and their attributes; the image analysis server determining a piece of streaming audio-visual data according to the statistics on the number of visitors and their attributes, wherein the streaming audio-visual data further includes a content attribute; and the image analysis server notifying a content server and requesting it to use the determined streaming audio-visual data as the playback content of the display device. The method as described in claim 81, wherein the display device further has a multimedia player program for receiving and decoding the streaming audio-visual data. The method as described in claim 81, wherein the image analysis server further refers to the content attribute of the streaming audio-visual data and determines that the visitor has the degree of interest in that content attribute. The method as described in claim 81, further comprising the following step: the image analysis server judging, according to the first head direction, whether the visitor is gazing at the display device, and using the statistically analyzed results as a basis for adjusting the streaming audio-visual data.
The method as described in claim 81, wherein the first projection area is further divided into a plurality of sub-areas, each of which corresponds to a probability that the streaming audio-visual data is being watched when the position of the display device is covered by that sub-area. The method as described in claim 85, wherein the probability corresponding to each of the plurality of sub-areas represents the visitor's degree of interest in the playback content of the display device. The method as described in claim 81, further comprising the following step: the image analysis server calculating whether the length of time during which the first projection area covers the position of the display device exceeds a time threshold, as a basis for judging whether the visitor has the degree of interest in the playback content of the display device. The method as described in claim 80, wherein the attributes of the visitor are composed of part or all of the following: gender, age, height, body shape, accessories, clothing, and occupation.
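The sub-area probabilities and the dwell-time threshold in the claims above combine into a simple decision rule: find the innermost sub-area covering the target to get a gaze probability, then require the coverage to last longer than the threshold. A minimal sketch follows; the concentric-ring layout and every number below are illustrative assumptions, not values from the patent.

```python
def interest_probability(target_offset, sub_area_radii, probabilities):
    """Return the gaze probability for a target whose distance from the
    centre of the projection area is `target_offset`; the innermost
    sub-area (ring) that covers the target supplies the probability."""
    for radius, p in zip(sub_area_radii, probabilities):
        if target_offset <= radius:
            return p
    return 0.0  # target lies outside the projection area entirely

def is_interested(covered_seconds, threshold_seconds):
    """Dwell-time test from the claims: interest is asserted only when the
    projection area covers the target longer than the time threshold."""
    return covered_seconds > threshold_seconds

# Assumed example: three concentric sub-areas with decreasing probability.
p = interest_probability(0.4, [0.2, 0.5, 1.0], [0.9, 0.6, 0.3])
print(p)  # 0.6
print(is_interested(covered_seconds=4.2, threshold_seconds=3.0))  # True
```

Accumulating `p` over the frames in which the dwell-time test passes would yield the per-visitor interest statistics that drive the content-selection claims above.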
TW110116950A 2020-05-12 2021-05-11 System and method for visitor interest extent analysis TWI802881B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US202063023867P 2020-05-12 2020-05-12
US63/023,867 2020-05-12

Publications (2)

Publication Number Publication Date
TW202143167A TW202143167A (en) 2021-11-16
TWI802881B true TWI802881B (en) 2023-05-21

Family

ID=80783083

Family Applications (1)

Application Number Title Priority Date Filing Date
TW110116950A TWI802881B (en) 2020-05-12 2021-05-11 System and method for visitor interest extent analysis

Country Status (1)

Country Link
TW (1) TWI802881B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI460683B (en) * 2011-06-24 2014-11-11 Reallusion Inc The way to track the immediate movement of the head
US20190102905A1 (en) * 2017-09-29 2019-04-04 Tobii Ab Head pose estimation from local eye region
US20190147607A1 (en) * 2017-11-15 2019-05-16 Toyota Research Institute, Inc. Systems and methods for gaze tracking from arbitrary viewpoints
CN110363555A (en) * 2018-04-10 2019-10-22 深圳市阿西莫夫科技有限公司 Recommended method and device based on eye tracking vision algorithm
CN110795982A (en) * 2019-07-04 2020-02-14 哈尔滨工业大学(深圳) Apparent sight estimation method based on human body posture analysis


Also Published As

Publication number Publication date
TW202143167A (en) 2021-11-16

Similar Documents

Publication Publication Date Title
US11640589B2 (en) Information processing apparatus, control method, and storage medium
US11107368B1 (en) System for wireless devices and intelligent glasses with real-time connectivity
US10595072B2 (en) Systems and methods for recognizing faces using non-facial information
US9282367B2 (en) Video system with viewer analysis and methods for use therewith
US9628844B2 (en) Determining audience state or interest using passive sensor data
US10367985B2 (en) Wearable apparatus and method for processing images including product descriptors
US20140130076A1 (en) System and Method of Media Content Selection Using Adaptive Recommendation Engine
Foulsham et al. The where, what and when of gaze allocation in the lab and the natural environment
US8943526B2 (en) Estimating engagement of consumers of presented content
CN103760968B (en) Method and device for selecting display contents of digital signage
CN105339969A (en) Linked advertisements
US20120140069A1 (en) Systems and methods for gathering viewership statistics and providing viewer-driven mass media content
TW201349147A (en) Advertisement presentation based on a current media reaction
TW201407516A (en) Determining a future portion of a currently presented media program
JP2013114689A (en) Usage measurement techniques and systems for interactive advertising
TWI709098B (en) Multimedia material pushing method and device
CN116520982B (en) Virtual character switching method and system based on multi-mode data
TWI802881B (en) System and method for visitor interest extent analysis
CN114746882A (en) Systems and methods for interaction awareness and content presentation
US20240095969A1 (en) Pose recommendation and real-time guidance for user-generated content
KR102642350B1 (en) Device customer interaction-based advertising effect measurement and real-time reflection signage solution
KR20230172703A (en) Door device that outputs advertisements based on artificial intelligence using a display, advertisements output method and program
JP2020150520A (en) Attention degree utilization device, attention degree utilization method and attention degree utilization program
CN114721501A (en) Embedding digital content in virtual space