TWI322963B - Human activity recognition method by combining temple posture matching and fuzzy rule reasoning - Google Patents


Info

Publication number
TWI322963B
Authority
TW
Taiwan
Prior art keywords
image
human motion
motion recognition
equation
human
Prior art date
Application number
TW96102113A
Other languages
Chinese (zh)
Other versions
TW200832237A (en)
Inventor
Chin Teng Lin
Original Assignee
Univ Nat Chiao Tung
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Univ Nat Chiao Tung filed Critical Univ Nat Chiao Tung
Priority to TW96102113A priority Critical patent/TW200832237A/en
Publication of TW200832237A publication Critical patent/TW200832237A/en
Application granted granted Critical
Publication of TWI322963B publication Critical patent/TWI322963B/zh

Landscapes

  • Image Analysis (AREA)

Description

IX. Description of the Invention

[Technical Field of the Invention]

The main purpose of the present invention is to provide a human activity recognition method, in particular a method that combines temporal posture matching with fuzzy rule inference to accomplish the recognition of human activities.

[Prior Art]

Human activity recognition plays a central role in applications such as automatic surveillance systems, human-machine interfaces, home security systems and smart homes. Many human activity recognition systems use only the posture of a single image to classify an action. In a temporal sequence, however, the transitions between posture states carry important information for recognizing human activities. Many human activity recognition methods have been proposed recently, and most of them can be divided into two categories according to the features they use. The first category uses motion-based features. In "IEEE Trans. Pattern Anal., vol. 23, no. 3, 2001", Bobick and Davis recognized human actions by comparing the motion energy and the motion history of temporal image templates. In "Proc. Conf. Comput. Vision Pattern Recog., vol. 4", R. Hamid et al. adopted spatio-temporal features, such as the relative distance between the two hands and the relative velocity between the two hands, and further used dynamic Bayesian networks to accomplish human action recognition. The second category uses two-dimensional or three-dimensional shape features to recognize human actions. The work in "IEEE Comput. Soc. Workshop Models versus Exemplars in Comput. Vision, pp. 263-270, 2003" used data obtained with a Canny edge detector to represent the contour shape of an action and defined a key frame for each action. The authors of "IEEE Int. Workshop on Anal. Modeling of Faces and Gest., pp. 74-81, 2003" proposed three-dimensional shape descriptions that do not depend on a single viewpoint to achieve the classification and recognition of human actions.

If only motion-based and shape-based features are used for the recognition task, temporal information is not taken into account, so many actions still cannot be clearly distinguished. This motivated us to design a stable method that uses the temporal information inherent in human activities to achieve more accurate recognition. The Hidden Markov Model (HMM) can handle temporal data and provides recognition that is not affected by changes of the time scale, and it has been applied to hand posture recognition and action recognition. The cost of using an HMM is its computational inefficiency, together with the need to collect a large amount of data and to spend much time estimating the corresponding HMM parameters.

The present invention applies the eigenspace transformation and the canonical space transformation to extract features from the images, so that the transformed vectors convert a temporal sequence into a posture sequence, which can be represented as a combination of index values of the template posture categories. The posture difference between two images captured within a very short interval is very small, because human motion is limited by an inherent natural frequency that is not very high; we therefore use down-sampled images instead of all frames for recognition. In addition, the present invention proposes a fuzzy rule inference method, which not only combines the posture sequence information of a human action but whose rules also cover the slight posture differences between different people performing the same action, increasing the stability of the recognition results.

[Summary of the Invention]

The main purpose of the present invention is to provide a human activity recognition method. In the first flow, step S111 builds a background model for foreground person extraction. The invention builds a statistical background model with image division: a background model is obtained by computing the statistical maximum and minimum gray-level values and the maximum ratio between the gray-level values of consecutive images.

The present invention adopts an image-division ratio method, which has been shown to be less sensitive to illumination changes than the image-difference method, to build the background. Suppose that the image intensity of the picture captured by the camera can be expressed by Equation 1:

I_i(x,y) = s_i(x,y) \, r_i(x,y)   (1)

where s_i(x,y) is the illuminance at a pixel position, r_i(x,y) is the reflectance at that pixel position, and i is the frame index of the image sequence. If only the background is captured and the camera is kept steady, the influence of the reflectance still remains. If the images are divided instead, the influence of the reflectance can be eliminated: dividing two consecutive images, the ratio of their pixel intensities can be written as Equation 2:

\log \frac{I_i(x,y)}{I_{i-1}(x,y)} = \log \frac{s_i(x,y)\,r(x,y)}{s_{i-1}(x,y)\,r(x,y)} = \log \frac{s_i(x,y)}{s_{i-1}(x,y)}   (2)

Therefore, the present invention uses the ratio of divided images to build the background model. Every pixel of the background image is represented by three statistical values: the minimum gray-level intensity value m(x,y), the maximum gray-level intensity value n(x,y), and the maximum ratio d(x,y) between the gray-level values of consecutive images. These three values are computed as shown in Equation 3:

n(x,y) = \max_i \{ I_i(x,y) \}
m(x,y) = \min_i \{ I_i(x,y) \}
d(x,y) = \max_i \begin{cases} I_i(x,y)/I_{i-1}(x,y), & \text{if } I_i(x,y)/I_{i-1}(x,y) \ge 1 \\ I_{i-1}(x,y)/I_i(x,y), & \text{otherwise} \end{cases}   (3)
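As an illustration only (the patent itself contains no program code), the statistics of Equation 3 can be computed as in the following Python sketch; the function name, the array layout and the epsilon guard are assumptions:

```python
import numpy as np

def build_background_model(frames, eps=1e-6):
    """Statistics of Equation 3 from background-only grayscale frames,
    given as a NumPy array of shape (T, H, W)."""
    frames = frames.astype(np.float64) + eps    # guard against division by zero
    n = frames.max(axis=0)                      # per-pixel maximum gray level
    m = frames.min(axis=0)                      # per-pixel minimum gray level
    ratio = frames[1:] / frames[:-1]            # consecutive-frame ratios
    ratio = np.maximum(ratio, 1.0 / ratio)      # fold the ratio so it is >= 1
    d = ratio.max(axis=0)                       # per-pixel maximum ratio
    return n, m, d
```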

After the background model is built, steps S112 and S113 separate the foreground person from each image. Whether a pixel of an image is classified as foreground or background is decided by Equation 4:

B_t(x,y) = \begin{cases} 0 \ (\text{a background pixel}), & \text{if } I_t(x,y)/n(x,y) < k\,d(x,y) \text{ and } m(x,y)/I_t(x,y) < k\,d(x,y) \\ 1 \ (\text{a foreground pixel}), & \text{otherwise} \end{cases}   (4)

where B_t represents the binarized image after the foreground person has been separated and k is an adjustable parameter; in general, k = 1.4. The block containing the foreground person is then obtained by projecting the binary image onto the X axis and the Y axis and setting a threshold on the projection statistics; the boundary positions given by these statistics are used to cut out the foreground part, and every cropped image is resized to the same size.

If two images are captured within a short interval, the postures in the two images do not differ much. In addition, the human body has its own natural frequency, that is, when an action is performed there is a natural limit on the speed of the movement. In our method, one image is taken at every fixed interval and is called a basic template image. Fig. 2 shows an example of selecting template images, in which five groups of basic left-to-right walking templates are selected at roughly fixed time intervals. These basic template images are projected onto a new space through the eigenspace transformation and the canonical space transformation, as shown in the second flow, and the whole recognition procedure is carried out in the canonical space.
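To make the segmentation concrete, a minimal Python sketch of Equation 4 follows; it assumes the (n, m, d) arrays from the background model above, and the projection threshold proj_thresh is an illustrative parameter (the patent only states that a threshold is set on the axis projections):

```python
import numpy as np

def extract_foreground(frame, n, m, d, k=1.4, proj_thresh=5):
    """Binarize one grayscale frame with the ratio test of Equation 4 and
    crop the foreground block from the X/Y projection histograms."""
    frame = frame.astype(np.float64) + 1e-6
    is_bg = (frame / n < k * d) & (m / frame < k * d)   # Equation 4
    fg = (~is_bg).astype(np.uint8)                      # 1 = foreground pixel
    cols, rows = fg.sum(axis=0), fg.sum(axis=1)         # axis projections
    xs = np.where(cols > proj_thresh)[0]
    ys = np.where(rows > proj_thresh)[0]
    if xs.size == 0 or ys.size == 0:
        return None                                     # no foreground found
    return fg[ys[0]:ys[-1] + 1, xs[0]:xs[-1] + 1]       # crop; resize afterwards
```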

In video and image processing, the dimensionality of the data is usually very large. Because the images contain much redundant information, a common practice is to reduce them by projecting the images onto a new space through a spatial transformation; most of these methods approximate the original images with far fewer dimensions. In the second flow, the proposed method combines the eigenspace transformation of step S121 with the canonical space transformation of step S122. The eigenspace transformation has been used effectively in systems such as automatic face recognition and gait recognition. After the eigenspace transformation, the canonical space transformation is applied to further reduce the dimensionality, to maximize the separability between classes and to improve the recognition performance. Recognition is performed in the canonical space.

Suppose there are c classes in total in the training set. Each class represents the posture type of a specific action appearing in the training images, and x_{i,j} denotes the j-th image in the i-th class. The total number of images in the training set is N_T = N_1 + N_2 + \ldots + N_c, and the whole training set can be written as [x_{1,1}, \ldots, x_{1,N_1}, x_{2,1}, \ldots, x_{c,N_c}], where each x_{i,j} is an image containing n pixels.

First, the intensity of each image is normalized as shown in Equation 5:

\hat{x}_{i,j} = \frac{x_{i,j}}{\|x_{i,j}\|}   (5)

From the normalized images, the mean pixel vector of the whole training set is expressed by Equation 6:

m_x = \frac{1}{N_T} \sum_{i=1}^{c} \sum_{j=1}^{N_i} \hat{x}_{i,j}   (6)

The training image set can then be rewritten as an n x N_T matrix X, namely Equation 7:

X = [\hat{x}_{1,1} - m_x, \ldots, \hat{x}_{c,N_c} - m_x]   (7)

Suppose the rank of XX^T is K; then XX^T has K nonzero eigenvalues \lambda_1, \ldots, \lambda_K with corresponding eigenvectors e_1, e_2, \ldots, e_K that satisfy Equation 8.

R \, e_i = \lambda_i e_i, \quad i = 1, 2, \ldots, K   (8)

In Equation 8, R = XX^T, and R is a symmetric square matrix. The dimension of R, however, equals the image size n, which is usually very large and makes the computation very expensive. Based on singular value decomposition (SVD) theory, the eigenvalues and eigenvectors can instead be obtained by computing a smaller matrix, expressed by Equation 9:

\tilde{R} = X^T X   (9)

The dimension of the matrix \tilde{R} is N_T x N_T, which is much smaller than that of the original matrix R. This matrix also has K nonzero eigenvalues \tilde{\lambda}_1, \ldots, \tilde{\lambda}_K and corresponding eigenvectors \tilde{e}_1, \ldots, \tilde{e}_K, from which Equation 10 yields the eigenvalues and eigenvectors of R:

\lambda_i = \tilde{\lambda}_i, \qquad e_i = \tilde{\lambda}_i^{-1/2} X \tilde{e}_i   (10)

where e_1, \ldots, e_K are K mutually orthogonal eigenvectors. Based on principal component analysis (PCA) theory, each image can be approximated using the k largest eigenvalues and their corresponding eigenvectors e_1, e_2, \ldots, e_k. These k eigenvectors [e_1, \ldots, e_k] project the original image onto the new space, as in Equation 11:

y_{i,j} = [e_1, e_2, \ldots, e_k]^T \hat{x}_{i,j}   (11)

where i = 1, 2, \ldots, c and j = 1, 2, \ldots, N_i; y_{i,j} denotes the j-th vector of the i-th class after the eigenspace transformation. The mean vector of the whole set is computed with Equation 12:

m_y = \frac{1}{N_T} \sum_{i=1}^{c} \sum_{j=1}^{N_i} y_{i,j}   (12)

and the mean vector of each class with Equation 13:

m_i = \frac{1}{N_i} \sum_{j=1}^{N_i} y_{i,j}   (13)

After these two kinds of mean vectors are computed, three variables can be defined: S_t denotes the total scatter matrix, S_w the within-class scatter matrix and S_b the between-class scatter matrix. They are computed with the following equations:

S_t = \frac{1}{N_T} \sum_{i=1}^{c} \sum_{j=1}^{N_i} (y_{i,j} - m_y)(y_{i,j} - m_y)^T
S_w = \frac{1}{N_T} \sum_{i=1}^{c} \sum_{j=1}^{N_i} (y_{i,j} - m_i)(y_{i,j} - m_i)^T
S_b = \frac{1}{N_T} \sum_{i=1}^{c} N_i (m_i - m_y)(m_i - m_y)^T

The main purpose of the canonical space transformation is to minimize the within-class scatter and, at the same time, to maximize the between-class scatter. This result is obtained through Equation 14:

J(W) = \frac{|W^T S_b W|}{|W^T S_w W|}   (14)

The solution is the maximizing matrix W^* of Equation 15:

W^* = \arg\max_W J(W)   (15)

Suppose W^* = [w_1, \ldots, w_{c-1}]^T is the optimal solution, where w_i is the eigenvector associated with the i-th largest eigenvalue. By the theory given in "Introduction to Statistical Pattern Recognition, 2nd edition, 1990", Equation 15 can be rewritten as Equation 16:

S_b w_i = \lambda_i S_w w_i   (16)

Solving Equation 16 yields c-1 nonzero eigenvalues and their corresponding eigenvectors [w_1, \ldots, w_{c-1}].
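The two transformations can be trained together. The following NumPy/SciPy sketch is an assumption-laden illustration rather than the patent's implementation: it takes normalized images as columns, applies the small-matrix trick of Equations 9 and 10, and solves the generalized eigenproblem of Equation 16 with scipy.linalg.eigh (which requires S_w to be positive definite); it returns the combined projection H used in Equation 18 below.

```python
import numpy as np
from scipy.linalg import eigh

def train_ect_cst(images, labels, k):
    """images: (n_pixels, N_T) normalized images as columns;
    labels: length-N_T integer class indices 0..c-1;
    k: number of eigenvectors kept (assumed k <= rank of X)."""
    X = images - images.mean(axis=1, keepdims=True)          # Equation 7
    lam, e_small = np.linalg.eigh(X.T @ X)                   # Equation 9
    order = np.argsort(lam)[::-1][:k]
    lam, e_small = lam[order], e_small[:, order]
    E = (X @ e_small) / np.sqrt(lam)                         # Equation 10
    Y = E.T @ images                                         # Equation 11
    c = int(labels.max()) + 1
    m_y = Y.mean(axis=1, keepdims=True)                      # Equation 12
    Sw = np.zeros((k, k))
    Sb = np.zeros((k, k))
    for i in range(c):
        Yi = Y[:, labels == i]
        mi = Yi.mean(axis=1, keepdims=True)                  # Equation 13
        Sw += (Yi - mi) @ (Yi - mi).T
        Sb += Yi.shape[1] * (mi - m_y) @ (mi - m_y).T
    Sw /= Y.shape[1]
    Sb /= Y.shape[1]
    vals, W = eigh(Sb, Sw)                                   # Equation 16
    W = W[:, np.argsort(vals)[::-1][:c - 1]]                 # top c-1 vectors
    return W.T @ E.T                                         # H of Equation 18
```

A new frame is then mapped into the canonical space with a single matrix-vector product, z = H @ x_hat, which is the operation of Equation 18.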

Using [w_1, \ldots, w_{c-1}] as a basis, a point in the eigenspace can be projected onto another point in the canonical space, as shown in Equation 17:

z_{i,j} = [w_1, w_2, \ldots, w_{c-1}]^T y_{i,j}   (17)

where z_{i,j} represents the point in the new space, and the orthogonal basis [w_1, \ldots, w_{c-1}]^T is called the canonical space transformation matrix. By combining Equations 11 and 17, every image can be projected onto a new (c-1)-dimensional space through Equation 18:

z_{i,j} = H \hat{x}_{i,j}, \qquad H = [w_1, \ldots, w_{c-1}]^T [e_1, \ldots, e_k]^T   (18)

The third flow, S130, is mainly the recognition flow. In a temporal image sequence, the transition relations between different postures are very important information for recognizing human actions. If only a single image is used as the basis for action recognition, the classification result is prone to error, because two different actions may contain extremely similar posture images; a single posture of a human action is ambiguous. Therefore, in steps S131 to S133 we propose using fuzzy rules for human action recognition; this not only combines the information of the temporal posture sequence but also tolerates the differences between different people performing the same action. Related work on fuzzy rule inference is as follows. In "IEEE Trans. Syst., Man Cybern., vol. 22, no. 6", Wang and Mendel proposed learning fuzzy rules from examples. In "IEEE Trans. Sys., Man Cybern., A, vol. 30, no. 2", Su proposed a fuzzy-rule-based method for temporal hand posture recognition.

Let \bar{z}_{i,j} be the canonical-space vector of the j-th template image in the i-th posture class, and let \hat{g} be the transformed vector of the image at a given moment. A Gaussian membership function is used to represent the possibility that an image belongs to each posture class, as expressed by Equation 19:

f_{i,j}(\hat{g}) = \exp\left[ -(\hat{g} - \bar{z}_{i,j})^T \Sigma^{-1} (\hat{g} - \bar{z}_{i,j}) \right]   (19)

where \Sigma is the covariance matrix and \bar{z}_{i,j} is the mean vector. Assuming that all dimensions of \hat{g} and \bar{z}_{i,j} are mutually independent, Equation 19 can be rewritten as Equation 20:

r_i = \max_j \prod_{m=1}^{c-1} \exp\left[ -\frac{(\hat{g}_m - \bar{z}_{i,j,m})^2}{\sigma_m^2} \right]   (20)

where m is the dimension index, \sigma_m is the standard deviation of the m-th dimension computed over all mean vectors, and \bar{z}_{i,j,m} is the m-th component of \bar{z}_{i,j}. The posture class to which the image most probably belongs is then given by Equation 21:

i^* = \arg\max_i r_i   (21)

where i is the index of the posture class. Every image is then represented by a text symbol P_i for its class.

To capture the temporal information, three down-sampled images are combined into one group (I_1, I_2, I_3). If a group contained too many images, it would exceed one quickly completed action cycle; if it contained too few, the temporal information would be insufficient. The three images are projected into the canonical space and classified with the membership functions. Together with its action category, a group forms (I_1, I_2, I_3, D), where D denotes the action category. Such posture sequences constitute the training data of the fuzzy system, and from each input-output pair a corresponding rule is generated, of the form:

IF antecedent conditions hold, THEN consequent conditions hold

For example, as shown in Fig. 2, if the posture of the first image belongs to P_18, the second image belongs to P_19 and the third image belongs to P_20, and the corresponding action is walking from left to right (WLR), then the input-output pair of Equation 22 is:

(I_1, I_2, I_3; D) = (P_{18}, P_{19}, P_{20}; WLR)   (22)

This produces the following rule:

IF the activity's I_1 is P_18 AND its I_2 is P_19 AND its I_3 is P_20, THEN the activity is WLR.
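Putting Equations 19 to 21 into code, the following sketch (function name and data layout are assumptions) assigns one canonical-space vector to a posture symbol; since an exponential of a sum equals a product of exponentials, the product form of Equation 20 reduces to a single vectorized expression:

```python
import numpy as np

def classify_posture(g, templates, sigma):
    """g: canonical-space vector of the current image, shape (c-1,);
    templates[i]: array of template vectors of posture class i, one per row;
    sigma: per-dimension standard deviation over all template vectors."""
    scores = []
    for class_templates in templates:
        diff = (class_templates - g) / sigma              # broadcast over rows
        membership = np.exp(-(diff ** 2).sum(axis=1))     # Equation 20
        scores.append(membership.max())                   # best template in class
    return int(np.argmax(scores))                         # Equation 21
```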
4%的正確率。Un,] Fine @base, we can project a point on the feature space to another point on the standard space as shown in Equation 17. 〜 ~(17), medium ~ represents the point of W to the new space 'and the orthogonal base I ′, 】 τ is called the standard space transformation matrix. Through the combination of equations and 17 'each image can be projected through equation 18 To a new space of c - 1 dimension. Z, V = Ηχ, ν. 14 (18) 1^22963 The third process S130 is mainly for the identification process. In the time image sequence, the transition relationship between different postures is identification. If you use only one image as the basis for motion recognition, the classification result is easy to make mistakes, because in two different actions, 'similar poses may appear ^ The single-manual ambiguity of human motion, so in steps S131 to S133, we propose to use the fuzzy law to do human motion recognition, which can not only be a ship-like, but also can tolerate the same person. The difference is the following. In the paper "臓τ·. Syst., ManC fine η, vol. 22' no·6, wang and Mendel proposed to learn the fuzzy rule from the example. In the paper "IEEE Trans. Sys., ManCybera, A, vol. 30, no. 2", Su proposed a fuzzy rule-based method for temporal gesture recognition. Assume that ^ is the spatial transformation vector of the jth model image in the i-th pose category, and five is the conversion vector of the image at a certain moment. The Xianxiang Gaussian type membership function is used to indicate the probability of an image relative to each pose category. This membership function can be represented by Equation 19. ~=^^ΓχρΒ(5-马),Ή)] (19) where Σ represents the covariance matrix of a (c〇variance is difficult to calculate) and watt represents the average vector. We assume that the dimensions of 5 and only / are independent of each other, then Equation 19 can be rewritten as Equation 20. ^ =argmax c-1 0 exp (20) 15 where m represents the index of the dimension, % is the m-th standard deviation obtained using all the average vectors, and ^ represents the _ value of the mechanical, the _ _. The most popular type of posture to which this image belongs is shown by Equation 21. 1 = arg max r. ' (21) i refers to the pose category. Each image is used with a text symbol Pi to represent it, the index value of this side. * For the ls timing secret, the system combines the reduced image of the three trees into a group such as α, ι2, ι3) β. If this group of data uses too many images, then a fast completion of the action cycle is tender. __崎; exhaust _ um image, the information on the timing will be insufficient n. The three _ image will be projected into the standard space and then stored in the standard. The _ _ _ lion action category is (h h I3'D), and the D-generation action sequence (Acti〇nCateg〇ry) e constitutes a model sequence in the subject (10). And through this set of input and output pairs, the law that corresponds to the group will be generated, and its form is as follows: IF antecedent conditions hold, THEN consequent conditions hold/*, as shown in Fig. 2, the pose sequence of the first image is The second image belongs to the m image belongs to Ρ2β' and its corresponding action belongs to the left to the right (5), then the output of the input of Equation 22, taking (10) as an example, it means going from left to right. . ('", heart system~~) 1322963 This will result in the following rules: KIF the activity5 s I. 
Fig. 3 shows the structure of the classification algorithm of the system. First, one image is grabbed at every fixed time interval, as in S300. This image is transformed into the canonical space and its corresponding membership function values are computed, as in S305 to S310. Then, as shown in steps S315 to S325, the membership function values of every three images are collected into one group, and the fuzzy rule database obtained from training is searched for the most similar posture sequence; the image group is classified into the action category recorded in the most similar rule.

Our experimental environment is a classroom. The light source is a stable fluorescent lamp, and the background is a simple scene with only one table. The camera is fixed at one location and shoots the same scene without moving, capturing 30 images of 640x480 pixels per second. There are six actors, and each performs the same six actions: walking from left to right (WLR), walking from right to left (WRL), crouching down (CROH), jumping (JUMP), climbing up (CUP) and climbing down (CDN). The videos of five people are used for training and the remaining one for testing, and every person's video takes a turn as the test data. Fig. 4a is an image captured by the camera, and Fig. 4b is the result of separating the foreground person and converting it into a binarized image.

We selected six classes of basic posture template images each for walking left to right, walking right to left and climbing up, five classes for climbing down, three classes for crouching down and two classes for jumping, 28 classes in total. After all the training videos have been learned, a threshold must be set; this threshold is used to discard rules whose posture sequences occur relatively rarely, and its value affects the number of rules. Fig. 5 compares the recognition results under different thresholds. As the figure shows, the recognition rate is highest when the threshold is two, but in our experiments we use three as the threshold, because a threshold that is too low produces too many rules, some of which have too little support. In addition, if mutually contradictory rules are generated, we pick the rule that appears more often during training.

Table 1 shows our recognition rates. The system is currently in an offline test state, that is, the test videos are not live video. Because of the 5:1 down-sampling step, during testing we read the images starting from each of the different initial positions, namely positions 1 to 5, so that every case is taken into account; this matches the way the rules were learned and is also closer to what happens in real-time recognition. For example, reading may start from the first, second, third, fourth or fifth image of a video.

In "IEEE CVPR, pp. 379-385, 1992", Yamato et al. proposed using the Hidden Markov Model for human action recognition. The HMM is a state-transition probability model that is often used for temporal analysis. In the experiments we compare the recognition rate of the fuzzy rule method with that of the HMM method.

Table 1. Action recognition rates of all actors

Recognition rate (%)   WLR      WRL      CROH     JUMP     CUP      CDN
First person           100.0    92.3     71.0     78.4     78.1     94.6
Second person          100.0    82.5     97.1     61.8     100.0    94.3
Third person           100.0    100.0    74.4     94.1     100.0    45.3
Fourth person          100.0    93.7     100.0    91.3     93.6     76.7
Fifth person           100.0    100.0    100.0    100.0    90.7     100.0
Sixth person           100.0    100.0    97.6     100.0    100.0    100.0

Overall average: 91.78

The first part of the HMM method uses the nearest-neighbor classifier to convert the image sequences into posture sequences. Every image is classified into the nearest posture template class, and the posture templates are selected in the same way as in the fuzzy rule method described above. In the learning stage, each HMM must be trained to produce the posture-transition probability parameters that best represent one class of action. In the experiments we use the Baum-Welch algorithm to estimate the corresponding HMM parameters; we adopt a forward-chaining topology, the number of states is set to 28, and the length of the observation sequence is set to three.

After learning from the training data, six HMMs are obtained, each representing one class of action. To recognize the observation sequence of an unknown action, the unknown action is classified into the class whose HMM produces the maximum probability among the six HMMs, that is, the most similar HMM class; the forward algorithm is used to compute this probability value.

The comparison of recognition rates between the HMM algorithm and the fuzzy-rule-based algorithm is shown in Table 2. The fuzzy rule algorithm achieves a higher accuracy, an improvement of about 2.4%, which shows that it has a better recognition effect on human actions.

Table 2. Recognition rate comparison between the HMM and the fuzzy rule algorithm

                 HMM      Fuzzy rule algorithm
First person     81.18    84.61
Second person    88.33    91.03
Third person     86.25    87.15
Fourth person    90.00    93.33
Fifth person     93.80    96.71
Sixth person     96.90    97.85
Average          89.41    91.78

The experimental results show that, without referring to the position of the body, the movement path or the movement speed, the overall recognition rate of these six actions reaches 91.78%, about 2.4% higher than that of the HMM method.
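Because every rule maps a posture triple to an action, the learned rule base reduces to a dictionary. The sketch below is an illustration (the variable names and the majority-vote tie-breaking are assumptions consistent with the description above): it learns rules from training triples, discards rules with too little support, and resolves contradictory rules by keeping the more frequent one.

```python
from collections import Counter, defaultdict

def learn_rules(samples, support_threshold=3):
    """samples: iterable of ((p1, p2, p3), action) pairs, where p1..p3 are
    posture-class indices of three down-sampled images and action is the
    activity label; returns {(p1, p2, p3): action}."""
    counts = defaultdict(Counter)
    for triple, action in samples:
        counts[triple][action] += 1
    rules = {}
    for triple, actions in counts.items():
        action, support = actions.most_common(1)[0]   # majority action wins
        if support >= support_threshold:              # discard rare sequences
            rules[triple] = action  # IF I1,I2,I3 = triple THEN activity = action
    return rules
```

During recognition, each incoming group of three classified images is looked up in this dictionary; when no rule matches exactly, the most similar stored posture sequence is searched instead, as described for steps S315 to S325.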

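For the HMM baseline, once the parameters have been estimated (for example with the Baum-Welch algorithm, as stated above), classification only needs the forward algorithm. The following scaled forward pass is a generic sketch, not code from the patent:

```python
import numpy as np

def forward_log_likelihood(obs, pi, A, B):
    """log P(obs | HMM) for a discrete HMM: pi (S,) initial probabilities,
    A (S, S) transition matrix, B (S, V) emission matrix, obs a sequence of
    posture-class indices; the unknown action is assigned to the HMM giving
    the largest returned value."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_p = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        log_p += np.log(scale)       # accumulate log of scaling factors
        alpha /= scale               # rescale to avoid numerical underflow
    return log_p
```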
The above description is only a preferred embodiment of the present invention and is not intended to limit the scope in which the present invention is practiced. Any equivalent change or modification of the shape, structure, features and spirit described in the claims of the present invention shall be included in the scope of the patent application of the present invention.

[Brief Description of the Drawings]

Fig. 1 is a schematic diagram of the system architecture of the present invention.
Fig. 2 is a schematic diagram of an example of selecting template images.
Fig. 3 is a schematic diagram of the structure of the system classification algorithm.
Fig. 4 is a schematic diagram of an example of foreground person extraction.
Fig. 5 is a schematic diagram comparing different threshold values with the recognition results.

[Description of Main Element Symbols]

None

Claims (1)

7. The human activity recognition method of claim 1, wherein the human activity recognition is performed in the canonical space.

8. The human activity recognition method of claim 1, wherein the background model uses division of consecutive images to describe a statistical background model.

9. The human activity recognition method of claim 8, wherein the statistical background model is obtained by computing the statistical maximum and minimum gray-level values and the maximum ratio between the gray-level values of consecutive images.

10. The human activity recognition method of claim 8, wherein each background image pixel of the statistical background model is represented by three statistical values: the minimum gray-level intensity value, the maximum gray-level intensity value and the maximum ratio between the gray-level values of consecutive images.

11. The human activity recognition method of claim 1, wherein the human activity recognition uses a Gaussian membership function with respect to each posture class to represent the possibility of each image.
TW96102113A 2007-01-19 2007-01-19 Human activity recognition method by combining temple posture matching and fuzzy rule reasoning TW200832237A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
TW96102113A TW200832237A (en) 2007-01-19 2007-01-19 Human activity recognition method by combining temple posture matching and fuzzy rule reasoning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW96102113A TW200832237A (en) 2007-01-19 2007-01-19 Human activity recognition method by combining temple posture matching and fuzzy rule reasoning

Publications (2)

Publication Number Publication Date
TW200832237A TW200832237A (en) 2008-08-01
TWI322963B true TWI322963B (en) 2010-04-01

Family

ID=44818839

Family Applications (1)

Application Number Title Priority Date Filing Date
TW96102113A TW200832237A (en) 2007-01-19 2007-01-19 Human activity recognition method by combining temple posture matching and fuzzy rule reasoning

Country Status (1)

Country Link
TW (1) TW200832237A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8634595B2 (en) 2011-05-04 2014-01-21 National Chiao Tung University Method for dynamically setting environmental boundary in image and method for instantly determining human activity
US20140118556A1 (en) * 2012-10-31 2014-05-01 Pixart Imaging Inc. Detection system
CN103808305A (en) * 2012-11-07 2014-05-21 原相科技股份有限公司 Detection system
US10354413B2 (en) 2013-06-25 2019-07-16 Pixart Imaging Inc. Detection system and picture filtering method thereof

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8165349B2 (en) * 2008-11-29 2012-04-24 International Business Machines Corporation Analyzing repetitive sequential events
TWI419058B (en) * 2009-10-23 2013-12-11 Univ Nat Chiao Tung Image recognition model and the image recognition method using the image recognition model
TWI459310B (en) * 2011-12-30 2014-11-01 Altek Corp Image capturing device able to simplify characteristic value sets of captured images and control method thereof
TWI490790B (en) * 2012-11-14 2015-07-01 Far Eastern Memorial Hospital Dynamic cardiac imaging analysis and cardiac function assessment system
TW201426620A (en) 2012-12-19 2014-07-01 Ind Tech Res Inst Health check path evaluation indicator building system, method thereof, device therewith, and computer program product therein
US20150363450A1 (en) * 2014-06-12 2015-12-17 National Chiao Tung University Bayesian sequential partition system in multi-dimensional data space and counting engine thereof
US9807316B2 (en) 2014-09-04 2017-10-31 Htc Corporation Method for image segmentation
CN112990137B (en) * 2021-04-29 2021-09-21 长沙鹏阳信息技术有限公司 Classroom student sitting posture analysis method based on template matching

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8634595B2 (en) 2011-05-04 2014-01-21 National Chiao Tung University Method for dynamically setting environmental boundary in image and method for instantly determining human activity
US20140118556A1 (en) * 2012-10-31 2014-05-01 Pixart Imaging Inc. Detection system
TWI489090B (en) * 2012-10-31 2015-06-21 Pixart Imaging Inc Detection system
US9684840B2 (en) 2012-10-31 2017-06-20 Pixart Imaging Inc. Detection system
US10255682B2 (en) 2012-10-31 2019-04-09 Pixart Imaging Inc. Image detection system using differences in illumination conditions
US10755417B2 (en) 2012-10-31 2020-08-25 Pixart Imaging Inc. Detection system
CN103808305A (en) * 2012-11-07 2014-05-21 原相科技股份有限公司 Detection system
US10354413B2 (en) 2013-06-25 2019-07-16 Pixart Imaging Inc. Detection system and picture filtering method thereof

Also Published As

Publication number Publication date
TW200832237A (en) 2008-08-01

Similar Documents

Publication Publication Date Title
TWI322963B (en)
Presti et al. 3D skeleton-based human action classification: A survey
Stikic et al. Weakly supervised recognition of daily life activities with wearable sensors
Lim et al. Isolated sign language recognition using convolutional neural network hand modelling and hand energy image
Piyathilaka et al. Gaussian mixture based HMM for human daily activity recognition using 3D skeleton features
Sung et al. Unstructured human activity detection from rgbd images
Chaichulee et al. Multi-task convolutional neural network for patient detection and skin segmentation in continuous non-contact vital sign monitoring
Faria et al. A probabilistic approach for human everyday activities recognition using body motion from RGB-D images
Luo et al. A deep sum-product architecture for robust facial attributes analysis
Nicolle et al. Facial action unit intensity prediction via hard multi-task metric learning for kernel regression
WO2023082882A1 (en) Pose estimation-based pedestrian fall action recognition method and device
Mici et al. A self-organizing neural network architecture for learning human-object interactions
Li et al. Human action recognition via skeletal and depth based feature fusion
Pinquier et al. Strategies for multiple feature fusion with hierarchical hmm: application to activity recognition from wearable audiovisual sensors
Liu et al. Action recognition for sports video analysis using part-attention spatio-temporal graph convolutional network
Appenrodt et al. Multi stereo camera data fusion for fingertip detection in gesture recognition systems
Angelopoulou et al. Evaluation of different chrominance models in the detection and reconstruction of faces and hands using the growing neural gas network
Lin et al. Human action recognition using action trait code
Wang et al. A novel local feature descriptor based on energy information for human activity recognition
Batool et al. Fundamental Recognition of ADL Assessments Using Machine Learning Engineering
Jayabalan et al. Dynamic Action Recognition: A convolutional neural network model for temporally organized joint location data
Xie et al. Event voxel set transformer for spatiotemporal representation learning on event streams
Koppula et al. Human activity learning using object affordances from rgb-d videos
Ahmed et al. Adaptive pooling of the most relevant spatio-temporal features for action recognition
Sousa et al. Incremental semantic mapping with unsupervised on-line learning