CN109151387A

CN109151387A - A low-latency face recognition solution for mobile cameras based on WebRTC

Info

Publication number: CN109151387A
Application number: CN201810980968.3A
Authority: CN
Inventors: 叶�武; 潘瑶斌; 方垚
Original assignee: Hangzhou Dang Hong Polytron Technologies Inc
Current assignee: Hangzhou Dang Hong Polytron Technologies Inc; Hangzhou Arcvideo Technology Co ltd
Priority date: 2018-08-27
Filing date: 2018-08-27
Publication date: 2019-01-04
Anticipated expiration: 2038-08-27
Also published as: CN109151387B

Abstract

The invention discloses a low-latency face recognition solution for mobile cameras based on WebRTC. It specifically comprises the following steps: the mobile terminal initiates a face detection request; the monitoring server initiates a transcoding task to the transcoder; the transcoder requests the RTC server to create a room; the RTC server returns the room number to the transcoder; the transcoder passes the room number to the monitoring server; the monitoring server in turn passes the room number to the mobile terminal; the mobile terminal connects to the RTC server using the room number and joins the room; the RTC server and the communication cloud establish a data transmission node for the same room and carry out real-time low-latency data transmission based on WebRTC; the mobile terminal starts sending data to the transcoder through the communication cloud; the transcoder creates a one-input, two-output task that performs face capture and real-time pass-through. The beneficial effect of the invention is that picture delay is effectively reduced: the measured delay is roughly between 200 ms and 300 ms, and in theory it can be reduced to within 100 ms.

Description

A low-latency face recognition solution for mobile cameras based on WebRTC

Technical field

The present invention relates to the technical field of video encoding and decoding, and in particular to a low-latency face recognition solution for mobile cameras based on WebRTC.

Background art

While developing a face monitoring project for mobile phones, it was found that when the phone sends an RTMP stream to the server for face recognition, the picture delay is excessive; the farther away the phone is, the higher the delay of a stream travelling over the public network, which can exceed ten seconds.

Summary of the invention

To overcome the above deficiencies in the prior art, the present invention provides a WebRTC-based low-latency face recognition solution for mobile cameras that can effectively shorten the delay time.

To achieve the above object, the invention adopts the following technical solution:

A low-latency face recognition solution for mobile cameras based on WebRTC, specifically comprising the following steps:

(1) the mobile terminal initiates a face detection request;

(2) the monitoring server initiates a transcoding task to the transcoder;

(3) the transcoder requests the RTC server to create a room;

(4) the RTC server returns the room number to the transcoder;

(5) the transcoder passes the room number to the monitoring server;

(6) the monitoring server in turn passes the room number to the mobile terminal;

(7) the mobile terminal connects to the RTC server using the room number and joins the room;

(8) the RTC server and the communication cloud establish a data transmission node for the same room and carry out real-time low-latency data transmission based on WebRTC;

(9) the mobile terminal starts sending data to the transcoder through the communication cloud;

(10) the transcoder creates a one-input, two-output task that performs face capture and real-time pass-through.
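The room negotiation in steps (1) to (7) can be sketched as a small in-memory simulation. This is only an illustration of the message order: the class names, method names and the `room-N` identifier format are assumptions made for the sketch and do not come from the patent, and no real network or WebRTC library is involved.

```python
import itertools


class RtcServer:
    """Hypothetical stand-in for the RTC server: creates rooms and tracks members."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.rooms = {}

    def create_room(self, owner):
        # Steps (3)-(4): create a room and return its number to the caller.
        room_id = f"room-{next(self._ids)}"
        self.rooms[room_id] = [owner]
        return room_id

    def join_room(self, room_id, member):
        # Step (7): a client joins an existing room by its number.
        if room_id not in self.rooms:
            raise KeyError(f"unknown room: {room_id}")
        self.rooms[room_id].append(member)


class Transcoder:
    """Hypothetical transcoder that asks the RTC server for a room."""

    def __init__(self, rtc):
        self.rtc = rtc

    def start_task(self):
        # Steps (3)-(5): request a room and hand the number back upstream.
        return self.rtc.create_room(owner="transcoder")


class MonitorServer:
    """Hypothetical monitoring server that relays the room number."""

    def __init__(self, transcoder):
        self.transcoder = transcoder

    def handle_face_detection_request(self):
        # Step (2): start the transcoding task; the return value is step (6).
        return self.transcoder.start_task()


def negotiate(rtc, monitor, client_name):
    """Run steps (1)-(7) for one mobile client and return the room number."""
    room_id = monitor.handle_face_detection_request()  # steps (1)-(6)
    rtc.join_room(room_id, client_name)                # step (7)
    return room_id
```

After `negotiate` returns, both the transcoder and the mobile client are members of the same room, which is the precondition for the data transmission in steps (8) and (9).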

With the above WebRTC-based low-latency face recognition solution for mobile cameras, picture delay can be effectively reduced. Displaying with OpenCV the data obtained by decoding the video into RGB24 shows a delay of roughly 200 ms to 300 ms, which in theory can be reduced to within 100 ms; a mobile phone on a 4G network sees almost the same delay.

Preferably, in step (8), the WebRTC basis specifically comprises RtcMessage, communication, the communication cloud and hardware, wherein RtcMessage is a signaling set used by the mobile terminal to request the communication cloud to create a room or join a room; after the communication cloud has successfully created the room, it establishes a communication connection with the mobile terminal, and the audio and video data captured by the hardware is sent to the communication cloud, or data from the communication cloud is received.

Preferably, in step (10), the transcoder uses low-level transcoding technology when creating the one-input, two-output task, and the implementation inherits the dshow framework. It is implemented as follows: first, the Source module connects to the RTC server to obtain the mobile terminal's video data; the infTee module then distributes the data to the video data decoder (decoder) and the video frame assembling module (frame wrapper). In the first branch, the video data decoder (decoder) parses the bitstream data and passes it to the video encoder (encoder), which converts it into RGB24 images; these are passed to the face recognition module for feature comparison, so as to capture faces. In the second branch, the video frame assembling module (frame wrapper) passes the data to the FLVmux module, which generates an RTMP live stream and adds silent audio packets for real-time pass-through.

Preferably, in step (10), the video data is received and decoded into raw H.264 data; the H.264 video data is then converted into RGB24 images, which are continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.

Preferably, in step (10), face capture comprises four parts: face detection, face tracking, face recognition and liveness verification. Face detection detects faces in static images and returns the face bounding-box coordinates, landmark coordinates and quality score information. Face tracking performs millisecond-level face tracking and detection on surveillance or dynamic video in complex scenes, obtaining in real time the face bounding-box coordinates, landmark coordinates and quality score information of all faces in every frame, and is not affected by factors such as face occlusion, blur or side faces. Face recognition covers 1:1 and 1:N face comparison, wherein 1:1 comparison has a false acceptance rate below one in one hundred thousand at a recall rate of 96%, and 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database with no restriction on ethnicity or age. Liveness verification verifies whether a real person is operating in front of the mobile terminal's camera.

The beneficial effect of the invention is that picture delay is effectively reduced: displaying with OpenCV the data obtained by decoding the video into RGB24 shows a delay of roughly 200 ms to 300 ms, which in theory can be reduced to within 100 ms.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the present invention;

Fig. 2 is a schematic diagram of the WebRTC basis;

Fig. 3 is a schematic diagram of the low-level transcoding technology.

Detailed description of the embodiments

The present invention is further described below with reference to the accompanying drawings and specific embodiments.

In the embodiment shown in Fig. 1, a low-latency face recognition solution for mobile cameras based on WebRTC specifically comprises the following steps:

(1) the mobile terminal (Mobile App) initiates a face detection request;

(2) the monitoring server (monitor server) initiates a transcoding task to the transcoder (transcoder);

(3) the transcoder (transcoder) requests the RTC server to create a room;

(4) the RTC server returns the room number (session id) to the transcoder (transcoder);

(5) the transcoder (transcoder) passes the room number (session id) to the monitoring server (monitor server);

(6) the monitoring server (monitor server) in turn passes the room number (session id) to the mobile terminal (Mobile App);

(7) the mobile terminal (Mobile App) connects to the RTC server using the room number (session id) and joins the room;

As shown in Fig. 2, the WebRTC basis specifically comprises RtcMessage, communication, the communication cloud and hardware, wherein RtcMessage is a signaling set used by the mobile terminal to request the communication cloud to create a room or join a room; after the communication cloud has successfully created the room, it establishes a communication connection with the mobile terminal, and the audio and video data captured by the hardware is sent to the communication cloud, or data from the communication cloud is received.
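The patent does not give the wire format of the RtcMessage signaling set. As a minimal sketch, assuming the two signals are carried as JSON objects, the create-room and join-room requests could be encoded as follows. The field names (`type`, `sender`, `room_id`) and the enum values are hypothetical and chosen only for illustration.

```python
import json
from enum import Enum


class RtcMessageType(str, Enum):
    """Hypothetical signal types; the source only names 'create' and 'join'."""
    CREATE_ROOM = "create_room"
    JOIN_ROOM = "join_room"


def encode_message(msg_type, sender, room_id=None):
    """Serialize one signaling message to a JSON string (assumed format)."""
    payload = {"type": msg_type.value, "sender": sender}
    if msg_type is RtcMessageType.JOIN_ROOM:
        # Joining only makes sense for a room that already has a number.
        if room_id is None:
            raise ValueError("join_room requires a room_id")
        payload["room_id"] = room_id
    return json.dumps(payload, sort_keys=True)


def decode_message(raw):
    """Parse a JSON signaling message and restore the enum type."""
    data = json.loads(raw)
    data["type"] = RtcMessageType(data["type"])
    return data
```

Keeping the signal set this small mirrors the description: a client either asks for a new room or joins an existing one, and media transport starts only after that exchange succeeds.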

(10) the transcoder creates a one-input, two-output task that performs face capture and real-time pass-through;

The transcoder uses low-level transcoding technology when creating the one-input, two-output task, and the implementation inherits the dshow framework. As shown in Fig. 3, it is implemented as follows: first, the Source module connects to the RTC server to obtain the mobile terminal's video data; the infTee module then distributes the data to the video data decoder (decoder) and the video frame assembling module (frame wrapper). In the first branch, the video data decoder (decoder) parses the bitstream data and passes it to the video encoder (encoder), which converts it into RGB24 images; these are passed to the face recognition module for feature comparison, so as to capture faces. In the second branch, the video frame assembling module (frame wrapper) passes the data to the FLVmux module, which generates an RTMP live stream and adds silent audio packets so as to accommodate RTMP stream players that require audio (because a pure-video transmission mechanism is used, the time otherwise needed for AV synchronization is eliminated), and performs real-time pass-through.
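The one-input, two-output structure can be sketched as a tee that hands every incoming packet to two independent branches. The classes below are hypothetical analogues of the infTee module and the two branches, not the patent's implementation: the decoder and recognizer are injected as plain callables, and the silent audio packet is represented by an empty placeholder rather than a real encoded frame.

```python
class Tee:
    """Hypothetical analogue of the infTee module: fans each packet out to all sinks."""

    def __init__(self, sinks):
        self.sinks = list(sinks)

    def push(self, packet):
        for sink in self.sinks:
            sink(packet)


class RecognitionBranch:
    """First branch: decode the packet to an image, then run recognition on it."""

    def __init__(self, decode, recognise):
        self.decode = decode          # assumed callable: packet -> image
        self.recognise = recognise    # assumed callable: image -> result
        self.results = []

    def __call__(self, packet):
        image = self.decode(packet)
        self.results.append(self.recognise(image))


class PassThroughBranch:
    """Second branch: forward the video untouched and pair it with silent audio."""

    SILENT_AUDIO = b""  # placeholder only; a real muxer would emit an encoded silent frame

    def __init__(self):
        self.muxed = []

    def __call__(self, packet):
        self.muxed.append(("video", packet))
        self.muxed.append(("audio", self.SILENT_AUDIO))
```

Because the pass-through branch never decodes the video, its cost per packet stays small, which is consistent with the description's aim of keeping the live stream close to real time while recognition runs in parallel.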

DirectShow is a streaming media framework on the Windows platform (this method inherits the framework and implements it under Linux) that provides high-quality media stream capture and playback. It supports a wide variety of media file formats, including ASF, MPEG, AVI, MP3 and WAV files, and supports media stream capture using WDM drivers or the earlier VFW drivers. DirectShow integrates other DirectX technologies and can automatically detect and use available audio and video hardware acceleration, while also supporting systems without hardware acceleration. DirectShow greatly simplifies media playback, format conversion and capture. At the same time, it provides a low-level stream control architecture for user-defined solutions, allowing users to create their own DirectShow components to support new file formats or other purposes. Typical applications written with DirectShow include DVD players, video editing applications, AVI-to-ASF converters, MP3 players and digital video capture applications.

The video data is received and decoded into raw H.264 data; the H.264 video data is then converted into RGB24 images, which are continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.
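The patent does not say how the decoded picture is converted to RGB24. A minimal sketch of that step, assuming the decoder yields 8-bit YUV samples and using the full-range BT.601 (JFIF) coefficients, is shown below. The function names are hypothetical, and the input is simplified to one (Y, U, V) triple per pixel rather than the subsampled planes a real decoder would produce.

```python
def clamp8(value):
    """Round to the nearest integer and clamp into the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV sample to an (R, G, B) tuple."""
    d = u - 128
    e = v - 128
    r = clamp8(y + 1.402 * e)
    g = clamp8(y - 0.344136 * d - 0.714136 * e)
    b = clamp8(y + 1.772 * d)
    return r, g, b


def yuv444_to_rgb24(pixels):
    """Pack an iterable of (Y, U, V) triples into RGB24 bytes, 3 bytes per pixel."""
    out = bytearray()
    for y, u, v in pixels:
        out.extend(yuv_to_rgb(y, u, v))
    return bytes(out)
```

One practical detail when displaying the result: OpenCV's imshow interprets three-channel 8-bit images in BGR order, so RGB24 data would need its channels swapped before display.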

Face capture comprises four parts: face detection, face tracking, face recognition and liveness verification. Face detection detects faces in static images and returns the face bounding-box coordinates, landmark coordinates and quality score information; on the FDDB test set, the detection performance reaches a leading level. Face tracking performs millisecond-level face tracking and detection on surveillance or dynamic video in complex scenes, obtaining in real time the face bounding-box coordinates, landmark coordinates and quality score information of all faces in every frame, and is not affected by factors such as face occlusion, blur or side faces. Face recognition covers 1:1 and 1:N face comparison, wherein 1:1 comparison has a false acceptance rate below one in one hundred thousand at a recall rate of 96%, and 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database with no restriction on ethnicity or age; real-time recognition and alarming for multiple video channels and multiple faces can be achieved in dynamic, complex scenes, and the accuracy on the LFW test set reaches 99.87%. Liveness verification verifies whether a real person is operating in front of the mobile terminal's camera, preventing spoofing with high-definition photos, three-dimensional models, video recordings, face swapping and the like, and meeting the security requirements that sensitive industries place on face recognition.
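The difference between 1:1 and 1:N comparison can be illustrated with feature vectors and a similarity threshold. The patent does not disclose its matching method; cosine similarity and the threshold of 0.8 below are assumptions chosen for the sketch, and the function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def verify_1_to_1(probe, reference, threshold=0.8):
    """1:1 comparison: does the probe match this one claimed identity?"""
    return cosine_similarity(probe, reference) >= threshold


def identify_1_to_n(probe, gallery, threshold=0.8):
    """1:N comparison: return the best-matching identity in the gallery, or None."""
    best_id, best_score = None, threshold
    for identity, features in gallery.items():
        score = cosine_similarity(probe, features)
        if score >= best_score:
            best_id, best_score = identity, score
    return best_id
```

Raising the threshold lowers the false acceptance rate at the cost of recall, which is the trade-off behind figures such as a given false acceptance rate quoted at a fixed recall rate. The linear scan here is only for clarity; retrieval over a large database would normally use an index.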

With the above WebRTC-based low-latency face recognition solution for mobile cameras, picture delay can be effectively reduced. Displaying with OpenCV the data obtained by decoding the video into RGB24 shows a delay of roughly 200 ms to 300 ms, which in theory can be reduced to within 100 ms; a mobile phone on a 4G network sees almost the same delay, while a mobile terminal using 4G over a long distance sees a somewhat higher delay of around 2 s.
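The patent reports delay figures but not the measurement procedure. One way such a figure could be obtained, assuming each frame carries a capture timestamp and the display side records when it is shown on a clock synchronized with the sender, is sketched below. The function name and the sample timestamps in the usage note are made up for illustration and are not measurements from the patent.

```python
from statistics import mean


def latency_summary(samples):
    """Summarize end-to-end delay from (capture_ms, display_ms) timestamp pairs."""
    delays = [shown - captured for captured, shown in samples]
    if not delays:
        raise ValueError("no samples")
    if any(d < 0 for d in delays):
        # A negative delay means the two clocks are not synchronized.
        raise ValueError("display timestamp precedes capture timestamp")
    return {"min_ms": min(delays), "mean_ms": mean(delays), "max_ms": max(delays)}
```

For example, the illustrative pairs `[(0, 200), (40, 290), (80, 380)]` give per-frame delays of 200, 250 and 300 ms, so the summary reports a minimum of 200 ms, a mean of 250 ms and a maximum of 300 ms.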

Claims

1. A low-latency face recognition solution for mobile cameras based on WebRTC, characterized in that it specifically comprises the following steps:

(1) the mobile terminal initiates a face detection request;

(2) the monitoring server initiates a transcoding task to the transcoder;

(3) the transcoder requests the RTC server to create a room;

(4) the RTC server returns the room number to the transcoder;

(5) the transcoder passes the room number to the monitoring server;

(6) the monitoring server in turn passes the room number to the mobile terminal;

(7) the mobile terminal connects to the RTC server using the room number and joins the room;

(8) the RTC server and the communication cloud establish a data transmission node for the same room and carry out real-time low-latency data transmission based on WebRTC;

(10) the transcoder creates a one-input, two-output task and uses OpenCV display to realize face capture and real-time pass-through.

2. The low-latency face recognition solution for mobile cameras based on WebRTC according to claim 1, characterized in that, in step (8), the WebRTC basis specifically comprises RtcMessage, communication, the communication cloud and hardware, wherein RtcMessage is a signaling set used by the mobile terminal to request the communication cloud to create a room or join a room; after the communication cloud has successfully created the room, it establishes a communication connection with the mobile terminal, and the audio and video data captured by the hardware is sent to the communication cloud, or data from the communication cloud is received.

3. The low-latency face recognition solution for mobile cameras based on WebRTC according to claim 1 or 2, characterized in that, in step (10), the transcoder uses low-level transcoding technology when creating the one-input, two-output task, and the implementation inherits the dshow framework, as follows: first, the Source module connects to the RTC server to obtain the mobile terminal's video data; the infTee module then distributes the data to the video data decoder (decoder) and the video frame assembling module (frame wrapper); in the first branch, the video data decoder (decoder) parses the bitstream data and passes it to the video encoder (encoder), which converts it into RGB24 images, and these are passed to the face recognition module for feature comparison, so as to capture faces; in the second branch, the video frame assembling module (frame wrapper) passes the data to the FLVmux module, which generates an RTMP live stream and adds silent audio packets for real-time pass-through.

4. The low-latency face recognition solution for mobile cameras based on WebRTC according to claim 3, characterized in that, in step (10), the video data is received and decoded into raw H.264 data; the H.264 video data is then converted into RGB24 images, which are continuously refreshed and displayed with OpenCV's cv::imshow method, achieving real-time viewing.

5. The low-latency face recognition solution for mobile cameras based on WebRTC according to claim 1, characterized in that, in step (10), face capture comprises four parts: face detection, face tracking, face recognition and liveness verification; face detection detects faces in static images and returns the face bounding-box coordinates, landmark coordinates and quality score information; face tracking performs millisecond-level face tracking and detection on surveillance or dynamic video in complex scenes, obtaining in real time the face bounding-box coordinates, landmark coordinates and quality score information of all faces in every frame, and is not affected by factors such as face occlusion, blur or side faces; face recognition covers 1:1 and 1:N face comparison, wherein 1:1 comparison has a false acceptance rate below one in one hundred thousand at a recall rate of 96%, and 1:N comparison achieves millisecond-level retrieval on a large-scale portrait database with no restriction on ethnicity or age; liveness verification verifies whether a real person is operating in front of the mobile terminal's camera.