CN111259711A - Lip movement identification method and system - Google Patents


Info

Publication number
CN111259711A
Authority
CN
China
Prior art keywords
lip
face image
face
frame
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811471395.8A
Other languages
Chinese (zh)
Inventor
张修宝
沈海峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Didi Infinity Technology and Development Co Ltd
Original Assignee
Beijing Didi Infinity Technology and Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology and Development Co Ltd filed Critical Beijing Didi Infinity Technology and Development Co Ltd
Priority to CN201811471395.8A
Publication of CN111259711A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of this application disclose a lip movement recognition method and system. The method comprises the following steps: acquiring multiple frames of face images of a face to be recognized, the frames being captured at different times; obtaining lip feature values of a plurality of single-frame face images based on the multiple frames; determining a lip reference feature value based on the plurality of lip feature values; comparing the lip feature value of at least one single-frame face image with the lip reference feature value; and determining, based on the comparison result, whether the face to be recognized exhibits lip movement. The method and system can recognize lip movement simply and quickly, so that lip movement recognition can be performed on portable equipment.

Description

Lip movement identification method and system
Technical Field
The present application relates to the field of image information processing, and in particular, to a method and system for recognizing lip movements.
Background
With the rapid development of image information processing technology, face recognition is widely applied in fields such as human-computer interaction and face detection. For example, more and more online transaction platforms verify the identity of a customer or service provider to ensure transaction security, usually by acquiring a facial image. To prevent someone from maliciously using another person's photo or information to pass identity authentication, face recognition needs to verify whether the face in the image is real. Whether the face in the image is a live person or a photograph can be judged by asking the person being verified to complete a randomly specified action according to an instruction, or to say a randomly chosen word or sentence, and then detecting from the face images whether the lips move or whether the lip movement corresponds to the speech. There is therefore a need for a simple and effective lip movement recognition method.
Disclosure of Invention
One embodiment of this application provides a lip movement recognition method and system. The method can recognize lip movement simply and quickly, reducing the amount of data processing and the computational requirements on the device, so that lip movement recognition can be completed on portable equipment. The method comprises the following steps: acquiring multiple frames of face images of a face to be recognized, the frames being captured at different times; obtaining lip feature values of a plurality of single-frame face images based on the multiple frames; determining a lip reference feature value based on the plurality of lip feature values; comparing the lip feature value of at least one single-frame face image with the lip reference feature value; and determining whether the face to be recognized exhibits lip movement based on the comparison result.
In some embodiments, determining whether the face to be recognized exhibits lip movement includes: determining the absolute value of the difference between the lip feature value of at least one single-frame face image and the lip reference feature value; determining the maximum of the at least one absolute difference; and determining that the face to be recognized exhibits lip movement when the maximum is greater than a first preset value.
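The decision rule above can be summarized in a short sketch. This is a minimal illustration, assuming the per-frame lip feature values have already been computed and that the reference value is the mean of those values (as described later); the function name and threshold parameter are placeholders for the "first preset value", not names defined by the application.

```python
from typing import Sequence

def has_lip_movement(lip_features: Sequence[float], threshold: float) -> bool:
    """Decide lip movement from per-frame lip feature values.

    The reference value is the mean of all per-frame feature values; lip
    movement is reported when any frame deviates from that reference by
    more than `threshold` (the "first preset value" in the text).
    """
    reference = sum(lip_features) / len(lip_features)
    max_deviation = max(abs(value - reference) for value in lip_features)
    return max_deviation > threshold
```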
In some embodiments, determining the lip feature value of a single-frame face image includes: acquiring the coordinates of at least one first feature point located on the upper lip and at least one second feature point located on the lower lip in the single-frame face image; determining a first distance between the upper lip and the lower lip based on the first and second feature point coordinates; acquiring the coordinates of the two lip corner points in the single-frame face image; obtaining a second distance between the two lip corner points based on their coordinates; and determining the lip feature value of the single-frame face image based on the first distance and the second distance.
In some embodiments, each first feature point and its corresponding second feature point are vertically symmetrical.
In some embodiments, the lip feature value is the ratio of the first distance to the second distance.
In some embodiments, determining the first distance between the upper lip and the lower lip based on the first and second feature point coordinates comprises: determining a feature point distance for each of two or more first feature points in the single-frame face image and the second feature point corresponding to it, and determining the first distance based on the two or more feature point distances.
In some embodiments, the first distance is an average of two or more of the feature point distances.
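As a concrete illustration of the first distance, second distance, and their ratio described above, the sketch below computes the lip feature value from already-detected landmark coordinates. The array layout (vertically paired upper/lower points plus two corner points) is an assumption made for the example, not a format defined by the application.

```python
import numpy as np

def lip_feature_value(upper_pts: np.ndarray,
                      lower_pts: np.ndarray,
                      left_corner: np.ndarray,
                      right_corner: np.ndarray) -> float:
    """Lip feature value of one face frame.

    upper_pts / lower_pts: (N, 2) arrays of vertically paired upper-lip and
    lower-lip landmark coordinates (e.g. N = 3). The first distance is the
    mean Euclidean distance of the pairs; the second distance is the distance
    between the two lip corners; the feature value is their ratio, which
    normalizes away differences in lip size between people.
    """
    first_distance = np.linalg.norm(upper_pts - lower_pts, axis=1).mean()
    second_distance = np.linalg.norm(np.asarray(right_corner) - np.asarray(left_corner))
    return float(first_distance / second_distance)
```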
In some embodiments, determining the lip feature value of a single-frame face image includes: acquiring a normalized face image of the single-frame face image, and determining the lip feature value based on that normalized face image.
In some embodiments, the obtaining a normalized face image of a single frame of face image includes: and acquiring the coordinates of the human face characteristic points on the human face in the single-frame human face image. And determining an affine transformation matrix based on the human face characteristic point coordinates and the corresponding preset human face characteristic point coordinates. And carrying out affine transformation on the single-frame face image based on the affine transformation matrix to obtain a normalized face image of the single-frame face image.
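One way to realize this normalization step is sketched below with OpenCV, assuming cv2 and numpy are available and that five reference points (two eye centers, nose tip, two lip corners) are used; the preset coordinates and output size are hypothetical values chosen for the example, not values from the application.

```python
import cv2
import numpy as np

# Hypothetical preset coordinates of a frontal "standard" face in a 200x200 image:
# left eye center, right eye center, nose tip, left lip corner, right lip corner.
PRESET_POINTS = np.float32([[65, 75], [135, 75], [100, 115], [75, 150], [125, 150]])

def normalize_face(image: np.ndarray, face_points: np.ndarray) -> np.ndarray:
    """Warp a face image so its feature points match the preset frontal layout.

    face_points: (5, 2) float array of detected feature point coordinates in
    the same order as PRESET_POINTS. estimateAffine2D solves for the affine
    matrix in a least-squares/robust sense from the point correspondences.
    """
    matrix, _ = cv2.estimateAffine2D(np.float32(face_points), PRESET_POINTS)
    return cv2.warpAffine(image, matrix, (200, 200))
```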
In some embodiments, the face feature points include at least one of the following: the eye center points, the nose tip point, and the lip corner points.
In some embodiments, the method further comprises performing image size reduction processing on each of the plurality of frames of face images.
In some embodiments, the lip reference feature value is an average of lip feature values of a plurality of the single-frame face images.
In some embodiments, the plurality of frames of face images are obtained from a video file of the face to be recognized.
One of the embodiments of this application provides a lip movement recognition system, comprising: an acquisition module configured to acquire multiple frames of face images of a face to be recognized, the frames being captured at different times; a lip feature value determination module configured to obtain lip feature values of a plurality of single-frame face images based on the multiple frames; a lip reference feature value determination module configured to determine a lip reference feature value based on the plurality of lip feature values; and a recognition module configured to compare the lip feature value of at least one single-frame face image with the lip reference feature value and to determine, based on the comparison result, whether the face to be recognized exhibits lip movement.
In some embodiments, the recognition module is further configured to: determine the absolute value of the difference between the lip feature value of at least one single-frame face image and the lip reference feature value; determine the maximum of the at least one absolute difference; and determine that the face to be recognized exhibits lip movement when the maximum is greater than a first preset value.
One embodiment of the present application provides a device for recognizing lip movements, which includes at least one storage medium and at least one processor. The at least one storage medium is configured to store computer instructions. The at least one processor is configured to execute the computer instructions to implement the method of identifying lip movements as described above.
One of the embodiments of the present application provides a computer readable storage medium storing computer instructions which, when executed by a processor, implement the method for identifying lip movements as described above.
Drawings
The present application will be further explained by way of exemplary embodiments, which will be described in detail by way of the accompanying drawings. These embodiments are not intended to be limiting, and in these embodiments like numerals are used to indicate like structures, wherein:
FIG. 1 is a schematic diagram of an application scenario of an image information service system according to some embodiments of the present application;
FIG. 2 is a schematic diagram of exemplary hardware and/or software components of an exemplary computing device according to some embodiments of the present invention;
FIG. 3 is a schematic diagram of exemplary hardware components and/or software components of an exemplary mobile device shown in accordance with some embodiments of the present invention;
FIG. 4 is a block diagram of a system for identifying lip movements according to some embodiments of the present application;
FIG. 5 is an exemplary flow chart of a method of identifying lip movements shown in accordance with some embodiments of the present application;
FIG. 6 is an exemplary flow chart illustrating the determination of lip feature values according to some embodiments of the present application;
FIG. 7 is an exemplary schematic diagram of a lip feature point distribution shown in accordance with some embodiments of the present application;
FIG. 8 is another exemplary flow chart illustrating the determination of lip characteristics according to some embodiments of the present application;
FIG. 9 is an exemplary schematic diagram of a normalized face image according to some embodiments of the present application.
FIG. 10 is an exemplary schematic diagram of 68 face contour points shown in accordance with some embodiments of the present application.
Detailed Description
In order to more clearly illustrate the technical solutions of the embodiments of this application, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are only examples or embodiments of the application, from which a person skilled in the art can apply the application to other similar scenarios without inventive effort. Unless otherwise apparent from the context or otherwise indicated, like reference numbers in the figures refer to the same structure or operation.
It should be understood that "system", "device", "unit" and/or "module" as used herein is one way of distinguishing different components, elements, parts, portions, or assemblies at different levels. However, other expressions may be substituted if they accomplish the same purpose.
As used in this application and the appended claims, the singular forms "a," "an," and "the" include plural referents unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; the steps and elements do not form an exclusive list, and a method or apparatus may also include other steps or elements.
Flow charts are used herein to illustrate operations performed by systems according to embodiments of this application. It should be understood that the operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously. Other operations may also be added to these processes, or one or more steps may be removed from them.
Embodiments of this application may be applied to different transportation systems, including but not limited to one or a combination of land, sea, air, aerospace, and the like: for example, managed and/or distributed transportation systems for taxis, chauffeured cars, hitch rides, buses, designated driving, trains, railcars, high-speed rail, ships, airplanes, hot air balloons, unmanned vehicles, courier pickup/delivery, and so on. The application scenarios of the different embodiments of this application include, but are not limited to, one or a combination of a web page, a browser plug-in, a client, a customized system, an in-enterprise analysis system, an artificial intelligence robot, and the like. It should be understood that the application scenarios of the system and method of this application are merely examples or embodiments, and those skilled in the art can apply this application to other similar scenarios without inventive effort, for example other similar guided-parking systems.
The terms "passenger", "passenger end", "user terminal", "customer", "demander", "service demander", "consumer", "user demander" and the like are used interchangeably and refer to a party that needs or orders a service, either a person or a tool. Similarly, "driver," "provider," "service provider," "server," and the like, as described herein, are interchangeable and refer to an individual, tool, or other entity that provides a service or assists in providing a service. In addition, a "user" as described herein may be a party that needs or subscribes to a service, or a party that provides or assists in providing a service.
Fig. 1 is a schematic diagram illustrating an application scenario of an image information service system 100 according to some embodiments of this application. For example, the image information service system 100 may be an online service platform for a variety of services. In some embodiments, the image information service system 100 may be used for services involving facial image information in a ride-hailing service, for example face recognition for taxis, identity verification for chauffeured cars, monitoring of public transport, identity authentication of passengers, and the like. In some embodiments, the image information service system 100 may also be used for home services, online shopping, take-out, and the like, for example for identity authentication of service providers, face recognition, and monitoring of service processes in home services. The image information service system 100 may include a server 110, a network 120, a storage device 130, one or more image capture terminals 140, and an information source 150. The server 110 may include a processing engine 112.
In some embodiments, the server 110 may be a single server or a server group. The server group may be centralized or distributed (e.g., server 110 may be a distributed system). In some embodiments, the server 110 may be local or remote. For example, the server 110 may access the storage device 130 and the image capture terminal 140 via the network 120. As another example, the server 110 may be directly connected to the storage device 130 and the image capture terminal 140 to access stored information and/or data. In some embodiments, the server 110 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, the like, or any combination of the above. In some embodiments, the server 110 may be implemented on a computing device similar to that shown in FIG. 2 or FIG. 3 of this application. For example, the server 110 may be implemented on a computing device 200 as shown in FIG. 2, including one or more components of the computing device 200. As another example, the server 110 may be implemented on a mobile device 300 as shown in FIG. 3, including one or more components of the mobile device 300. In some embodiments, the processing engine 112 may process data and/or information related to image information services to perform one or more of the functions described herein. For example, the processing engine 112 may process a face image or face video acquired by the image capture terminal 140. The processing engine 112 may also obtain a face image or face video from the image capture terminal 140 and identify whether the face exhibits lip movement from the lip feature values in the face image or face video. In some embodiments, the processing engine 112 may normalize the face images, for example converting face images with different angles and degrees of tilt into upright frontal face images. For another example, the processing engine 112 may determine the lip feature values of the face after normalizing the face images, so as to improve recognition accuracy.
In some embodiments, the image capture terminal 140 may be a capture device for images or videos. In some embodiments, the image capture terminal 140 may include, but is not limited to, a camera 140-1, a laptop computer 140-2, a vehicle-mounted built-in device 140-3, a mobile device 140-4, and the like, or any combination thereof. In some embodiments, the vehicle-mounted built-in device 140-3 may include, but is not limited to, an in-vehicle computer, an in-vehicle head-up display (HUD), an on-board diagnostics (OBD) system, a driving recorder, an in-vehicle navigation system, and the like, or any combination thereof. In some embodiments, the mobile device 140-4 may include, but is not limited to, a smartphone, a personal digital assistant (PDA), a tablet, a palmtop computer, smart glasses, a smart watch, a wearable device, a virtual reality device, an augmented reality device, and the like, or any combination thereof. In some embodiments, the image capture terminal 140 may send transport service requests to the server 110 for processing. In some embodiments, the image capture terminal 140 may send facial image information to one or more devices in the image information service system 100. In some embodiments, the image capture terminal 140 may receive image capture instructions sent by one or more devices in the image information service system 100. In some embodiments, the image capture terminal 140 may be arranged in a car, a shopping mall, a supermarket, a house, an office, or the like, to obtain face image information.
Storage device 130 may store data and/or instructions. In some embodiments, the storage device 130 may store data obtained from the image capture terminal 140. In some embodiments, the storage device 130 may store data and/or instructions that the server 110 executes or uses to implement the example methods described in this application. In some embodiments, the storage device 130 may include mass storage, removable storage, volatile read-write memory, read-only memory (ROM), the like, or any combination of the above. Exemplary mass storage may include magnetic disks, optical disks, solid state drives, and the like. Exemplary removable storage may include flash drives, floppy disks, optical disks, memory cards, compact disks, magnetic tape, and the like. Exemplary volatile read-write memory may include random access memory (RAM). Exemplary RAM may include dynamic random access memory (DRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), static random access memory (SRAM), thyristor random access memory (T-RAM), zero-capacitor random access memory (Z-RAM), and the like. Exemplary ROM may include mask read-only memory (MROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM), digital versatile disc read-only memory (DVD-ROM), and the like. In some embodiments, the storage device 130 may be implemented on a cloud platform. By way of example only, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, the like, or any combination of the above.
In some embodiments, a storage device 130 may be connected to network 120 to enable communication with one or more components (e.g., server 110, image capture terminal 140, etc.) in image information service system 100. One or more components of the image information service system 100 may access data or instructions stored in the storage device 130 through the network 120. In some embodiments, storage device 130 may be directly connected to or in communication with one or more components of image information service system 100 (e.g., server 110, image capture terminal 140, etc.). In some embodiments, storage device 130 may be part of server 110.
Network 120 may facilitate the exchange of information and/or data. In some embodiments, one or more components in the image information service system 100 (e.g., the server 110, the storage device 130, and the image capture terminal 140, etc.) may send information and/or data to other components in the image information service system 100 over the network 120. For example, the server 110 may acquire/obtain data information from the image capture terminal 140 via the network 120. In some embodiments, the network 120 may be any one of, or a combination of, a wired network or a wireless network. For example, network 120 may include a cable network, a wired network, a fiber optic network, a telecommunications network, an intranet, the internet, a Local Area Network (LAN), a Wide Area Network (WAN), a Wireless Local Area Network (WLAN), a Metropolitan Area Network (MAN), a Public Switched Telephone Network (PSTN), a bluetooth network, a ZigBee network, a Near Field Communication (NFC) network, the like, or any combination of the above. In some embodiments, network 120 may include one or more network access points. For example, the network 120 may include wired or wireless network access points, such as base stations and/or Internet switching points 120-1, 120-2, and so forth. Through the access point, one or more components of the image information service system 100 may connect to the network 120 to exchange data and/or information.
The information source 150 is a source that provides other information to the image information service system 100. The information source 150 may be used to provide the system with information related to online services, such as weather conditions, legal information, news, life guidance information, and the like. The information source 150 may be implemented as a single central server, as multiple servers connected via a network, or as a large number of personal devices. When the information source 150 is implemented as a large number of personal devices, these devices may upload text, voice, images, video, and the like to a cloud server as user-generated content, so that the cloud server together with the many personal devices connected to it forms the information source 150.
It should be noted that, in some other embodiments, the image capturing terminal 140 may also integrate a processing engine, and at this time, the image capturing terminal 140 does not need to upload the face image or the face video to the server 110, but directly processes the face image or the face video acquired by the image capturing terminal to implement the exemplary method described in this application.
FIG. 2 is a schematic diagram of an exemplary computing device 200 shown in accordance with some embodiments of the invention. Server 110 and storage device 130 may be implemented on computing device 200. For example, the processing engine 112 may be implemented on the computing device 200 and configured to implement the functionality disclosed herein.
Computing device 200 may include any components used to implement the systems described herein. For example, the processing engine 112 may be implemented on the computing device 200 by its hardware, software programs, firmware, or a combination thereof. For convenience, only one computer is depicted in the figures, but the computing functions described herein with respect to the image information service system 100 may be implemented in a distributed manner by a set of similar platforms to distribute the processing load of the system.
Computing device 200 may include a communication port 250 for connecting to a network to enable data communication. Computing device 200 may include a processor (e.g., a CPU) 220, which may execute program instructions and may take the form of one or more processors. An exemplary computer platform may include an internal bus 210 and various forms of program and data storage, including, for example, a hard disk 270, read-only memory (ROM) 230, and random access memory (RAM) 240, for storing various data files processed and/or transmitted by the computer. An exemplary computing device may include program instructions stored in the read-only memory 230, the random access memory 240, and/or other types of non-transitory storage media and executed by the processor 220. The methods and/or processes of this application may be implemented in the form of program instructions. Computing device 200 also includes an input/output component 260 for supporting input/output between the computer and other components. Computing device 200 may also receive the programs and data of this disclosure via network communication.
For ease of understanding, only one processor is exemplarily depicted in fig. 2. However, it should be noted that the computing device 200 in the present application may include multiple processors, and thus the operations and/or methods described in the present application that are implemented by one processor may also be implemented by multiple processors, collectively or independently. For example, if in the present application the processors of computing device 200 perform steps 1 and 2, it should be understood that steps 1 and 2 may also be performed by two different processors of computing device 200, either collectively or independently (e.g., a first processor performing step 1, a second processor performing step 2, or a first and second processor performing steps 1 and 2 collectively).
Fig. 3 is a schematic diagram of exemplary hardware and/or software of an exemplary mobile device 300, shown in accordance with some embodiments of the present invention. The acquisition of face image data may be implemented on the mobile device 300. As shown in fig. 3, the mobile device 300 may include a communication unit 310, a display unit 320, a graphics processor 330, a processor 340, an input/output unit 350, a memory 360, and a storage unit 390. The mobile device 300 may also include a bus or a controller. In some embodiments, a mobile operating system 370 and one or more application programs 380 may be loaded from the storage unit 390 into the memory 360 and executed by the processor 340. In some embodiments, the application 380 may receive and display information for image processing or other information related to the processing engine 112. The input/output unit 350 may enable interaction of data with the image information service system 100 and provide interaction-related information to other components in the image information service system 100, such as the server 110, through the network 120.
To implement the various modules, units and their functionality described in this application, a computer hardware platform may be used as the hardware platform for one or more of the elements mentioned herein. A computer having user interface elements may be used to implement a Personal Computer (PC) or any other form of workstation or terminal equipment. A computer may also act as a server, suitably programmed.
FIG. 4 is a block diagram of a lip movement recognition system according to some embodiments of this application. As shown in fig. 4, the lip movement recognition system may include an acquisition module 410, a lip movement feature value determination module 420, a lip movement reference feature value determination module 430, a recognition module 440, and an image preprocessing module 450. In some embodiments, these modules may be included in the processing engine 112 shown in fig. 1.
The acquisition module 410 may be configured to acquire multiple frames of face images of a face to be recognized. In some embodiments, the multiple frames may be images of the face to be recognized taken at different times, for example face images that change continuously with the passage of time, or face images taken at regular intervals. In some embodiments, the multiple frames of face images are extracted from a video file of the face to be recognized. In some embodiments, a camera may be arranged to capture the multiple frames of face images. For example, in a ride-hailing service, multiple frames of face images of the face to be recognized can be acquired through a camera arranged in the vehicle. For another example, a camera may be set up in a shopping mall, an office area, a house, or the like to obtain the face images. In some embodiments, a mobile device with a camera function, such as a mobile phone, a notebook computer, or a driving recorder, may be used to collect the multiple frames of face images. In some embodiments, the face images obtained by the camera may be stored in the server 110 or the storage device 130. In some embodiments, the multiple frames of face images of the face to be recognized may be acquired from the server 110, the network 120, the storage device 130, one or more image capture terminals 140, or the information source 150.
The lip movement feature value determination module 420 may be configured to determine a lip feature value for each single-frame face image in the multiple frames of face images. In some embodiments, the lip feature value may be a value that characterizes whether the lips open and close, for example one related to the distance between the upper lip and the lower lip. In some embodiments, the coordinates of at least one first feature point located on the upper lip and at least one second feature point located on the lower lip are acquired in a single-frame face image. In some embodiments, each first feature point and its corresponding second feature point are vertically symmetrical. As shown in fig. 7, in some embodiments the first feature point 710 may be the highest point of the upper lip and the second feature point 720 may be the lowest point of the lower lip. In some embodiments, there may be three first feature points: the first feature point located between the two peaks of the upper lip may be taken as the middle feature point of the upper lip, and the three first feature points may then be this middle feature point together with two feature points located at arbitrary positions on either side of it (see fig. 7). In some embodiments, a first distance between the upper lip and the lower lip may be determined from the first and second feature point coordinates. In some embodiments, a feature point distance is determined for each of two or more first feature points in the single-frame face image and the second feature point corresponding to it, and the first distance is determined from these two or more feature point distances. The distance may be a Euclidean distance, Manhattan distance, Chebyshev distance, Mahalanobis distance, normalized Euclidean distance, or the like, or a combination thereof. In some embodiments, the first distance may be the arithmetic mean of the two or more feature point distances; in other embodiments, it may be any one or combination of the feature point distances, or their weighted sum, weighted average, geometric mean, quadratic mean, or harmonic mean. For example, in fig. 7, within a single-frame face image the distance between each first feature point 710 and its corresponding second feature point 720 is one feature point distance. With three first feature points and three corresponding second feature points, three feature point distances are obtained from the three vertically symmetrical pairs, and their arithmetic mean is the first distance. In some embodiments, the coordinates of the two lip corner points in the single-frame face image may be obtained; for example, in fig. 7, the coordinates of the two lip corner points 730 can be obtained. In some embodiments, the second distance between the lip corners is obtained from the lip corner coordinates; in fig. 7, the distance between the lip corner points 730 is the second distance.
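Where the paragraph above lists alternative ways of computing and combining the per-pair distances, a small sketch may help. The metric and weight choices below are illustrative assumptions covering two of the listed options (Euclidean or Manhattan distance, plain or weighted average), not choices prescribed by the application.

```python
from typing import Optional

import numpy as np

def first_distance(upper_pts: np.ndarray, lower_pts: np.ndarray,
                   metric: str = "euclidean",
                   weights: Optional[np.ndarray] = None) -> float:
    """Combine per-pair upper/lower lip distances into one first distance.

    metric: "euclidean" or "manhattan" per-pair distance. weights: optional
    per-pair weights; when given, a weighted average is used instead of the
    plain arithmetic mean.
    """
    diff = upper_pts - lower_pts
    if metric == "manhattan":
        dists = np.abs(diff).sum(axis=1)
    else:
        dists = np.linalg.norm(diff, axis=1)
    return float(np.average(dists, weights=weights))
```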
In some embodiments, the lip feature value is the ratio of the first distance to the second distance. Because everyone's lips differ in thickness, width, and other dimensions, using this ratio as the lip feature value avoids calculation deviations caused by differences between lips and helps improve recognition accuracy. For example, as shown in fig. 7, the multiple frames of face images may be the four single-frame face images in fig. 7. In each single-frame image, three first feature points, the second feature points 720 corresponding to them, and the two lip corner points 730 are obtained. The distance between each first feature point and its corresponding second feature point is taken as a feature point distance, and the average of the three feature point distances is the first distance. The distance between the two lip corner points 730 is calculated as the second distance, and the ratio of the first distance to the second distance is the lip feature value of that single-frame face image. Calculating the lip feature value of each single-frame face image finally yields four lip feature values.
In some embodiments, the coordinates of face feature points on the face in the single-frame face image may be acquired. In some embodiments, the face feature points include at least one of the following: the eye center points, the nose tip point, and the lip corner points. In some embodiments, the contour points of the face may include contour points of the facial outline and of components such as the eyes, nose, lips, and eyebrows. The coordinates of the eye center points, the nose tip point, and the lip corner points are not easily affected by facial expressions and are suitable as reference points of the face. In some embodiments, as shown in fig. 9, the face feature points may be the two eye center points, the nose tip point, and the two lip corner points. In some embodiments, an affine transformation matrix may be determined based on the coordinates of the face feature points and the corresponding preset face feature point coordinates, and an affine transformation is then applied to the single-frame face image based on that matrix to obtain its normalized face image. In some embodiments, the preset face feature points may be feature points on a standard frontal face of a preset overall size. Converting the face feature point coordinates to the preset face feature point coordinates through the affine transformation matrix turns face images with different angles, degrees of tilt, and lens distances into frontal face images of a uniform preset size. As shown in fig. 9, a tilted face image can be converted into a frontal face image by the determined affine transformation matrix. In some embodiments, the affine transformation matrix may be obtained by establishing a system of equations from the coordinates of the face feature points in the single-frame face image and the corresponding preset face feature point coordinates. In some embodiments, after the affine transformation matrix is determined, the coordinates of the data points representing the face information in the single-frame face image are converted into coordinates in the preset face image, yielding the normalized face image. Normalizing each single-frame face image in the multiple frames produces a normalized face image for each of them.
In some embodiments, the lip feature value may be determined based on the normalized face image of a single-frame face image: the normalized face image is obtained, the first and second feature point coordinates are taken from it, the first and second distances are calculated, and the lip feature value is determined as the ratio of the first distance to the second distance. In some embodiments, a plurality of lip feature point coordinates may be obtained from the normalized face image, and the first and second feature point coordinates are taken from among them. In some embodiments, the coordinates of 68 face contour points may be obtained from the normalized face image, and the first and second feature point coordinates are selected from these 68 points. In some embodiments, the distribution of the 68 face contour points may be as shown in fig. 10, including contour points of both cheeks and contour points of the major facial organs. Each contour point has a fixed number, from which the coordinates of the corresponding contour point can be determined. In some embodiments, the first and second feature points may be selected from the lip contour points, and their coordinates determined by the numbers of the selected points. For example, as shown in fig. 10, the first feature point coordinates may be those of the contour points numbered 50, 51, 52, 53, 54, 62, 63, or 64, and the second feature point coordinates may be those of the contour points numbered 56, 57, 58, 59, 60, 66, 67, or 68. The first distance may then be determined from the first and second feature point coordinates. In some embodiments, the coordinates of the two lip corner points may be obtained from the normalized face image to determine the second distance; as shown in fig. 10, the lip corner coordinates may be those of the contour points numbered 49 and 55, and the second distance may be the straight-line distance between points 49 and 55. The lip feature value of the single-frame face image is then calculated from the first distance and the second distance.
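As an illustration of selecting lip points by contour point number, the sketch below assumes a detector that returns the 68 contour points in the 1-based numbering used in fig. 10 (common 68-point detectors such as dlib return the same layout 0-indexed, hence the offset of one in the code); the specific upper/lower pairings chosen are an assumption made for the example.

```python
import numpy as np

# 1-based contour point numbers as in fig. 10 of the application.
UPPER_LIP = [51, 52, 53]     # upper-lip points around the middle of the lip
LOWER_LIP = [59, 58, 57]     # assumed vertically corresponding lower-lip points
LEFT_CORNER, RIGHT_CORNER = 49, 55

def lip_feature_from_landmarks(points: np.ndarray) -> float:
    """points: (68, 2) array of contour point coordinates, row i holding the
    point numbered i + 1. Returns the ratio of the mean upper/lower lip
    distance (first distance) to the lip corner distance (second distance)."""
    upper = points[np.array(UPPER_LIP) - 1]
    lower = points[np.array(LOWER_LIP) - 1]
    first = np.linalg.norm(upper - lower, axis=1).mean()
    second = np.linalg.norm(points[RIGHT_CORNER - 1] - points[LEFT_CORNER - 1])
    return float(first / second)
```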
The lip movement reference characteristic value determination module 430 may be configured to determine a lip reference characteristic value based on lip characteristic values of a plurality of the single-frame face images. In some embodiments, the lip reference feature value is an average of lip feature values of a plurality of the single-frame face images. For example, the lip feature value is a distance between a first feature point of an upper lip and a second feature point of a lower lip, and the lip reference feature value may be an average value of distances between the first feature point and the second feature point in each single frame of face images in multiple frames of face images.
The recognition module 440 may be configured to compare the lip feature value of at least one single-frame face image with the lip reference feature value, and to determine whether the face to be recognized exhibits lip movement based on the comparison result. In some embodiments, the absolute value of the difference between the lip feature value of at least one single-frame face image and the lip reference feature value is calculated, and the maximum of these absolute differences is determined. In some embodiments, when this maximum is greater than a first preset value, it is determined that the face to be recognized exhibits lip movement. In other words, if the largest opening (or closing) of the lips deviates from the average upper-to-lower lip distance by more than a certain range, the lips have opened and closed and lip movement has occurred; if it does not exceed that range, no lip movement has occurred. In some embodiments, the first preset value may be 3.0.
The image preprocessing module 450 may be configured to reduce the image size of each of the multiple frames of face images. In some embodiments, the frame width and frame height of each frame are obtained, the minimum of the two is determined and compared with a preset image size value, and when that minimum is greater than the second preset value, the frame width and frame height of every frame in the video are scaled down proportionally so that the larger of the frame width and frame height in the video is smaller than the preset image size value. Reducing the multiple frames of the face to be recognized in this way avoids an excessive amount of image data processing caused by an overly high image resolution.
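A minimal sketch of this preprocessing step with OpenCV is shown below, assuming cv2 is available; the preset size value is a placeholder, not a value given in the application.

```python
import cv2
import numpy as np

def shrink_frame(frame: np.ndarray, preset_size: int = 480) -> np.ndarray:
    """Proportionally shrink a frame when its smaller side exceeds preset_size,
    so that its larger side ends up below preset_size."""
    height, width = frame.shape[:2]
    if min(width, height) <= preset_size:
        return frame
    scale = (preset_size - 1) / max(width, height)  # keeps the larger side under the preset value
    new_size = (max(1, int(width * scale)), max(1, int(height * scale)))
    return cv2.resize(frame, new_size, interpolation=cv2.INTER_AREA)
```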
It should be understood that the system and its modules shown in FIG. 4 may be implemented in a variety of ways. For example, in some embodiments, the system and its modules may be implemented in hardware, software, or a combination of software and hardware. Wherein the hardware portion may be implemented using dedicated logic; the software portions may be stored in a memory for execution by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the methods and systems described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a diskette, CD-or DVD-ROM, a programmable memory such as read-only memory (firmware), or a data carrier such as an optical or electronic signal carrier. The system and its modules of the present application may be implemented not only by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips, transistors, or programmable hardware devices such as field programmable gate arrays, programmable logic devices, etc., but also by software executed by various types of processors, for example, or by a combination of the above hardware circuits and software (e.g., firmware).
It should be noted that the above description of the lip movement recognition system and its modules is only for convenience of description and is not intended to limit this application to the scope of the illustrated embodiments. It will be appreciated by those skilled in the art that, given an understanding of the principle of the system, the modules may be combined in any manner, or a subsystem may be connected to other modules, without departing from this principle. For example, the acquisition module 410, the lip movement feature value determination module 420, the lip movement reference feature value determination module 430, the recognition module 440, and the image preprocessing module 450 disclosed in fig. 4 may be different modules in one system, or one module may implement the functions of two or more of these modules. For example, the acquisition module 410 and the lip movement feature value determination module 420 may be two modules, or a single module may have both the acquisition and the lip movement feature value determination functions. For example, the modules may share one storage module, or each module may have its own storage module. Such variations are within the scope of this application. In some embodiments, the image preprocessing module 450 may also be omitted.
FIG. 5 illustrates an exemplary flow chart of a lip movement recognition method according to some embodiments of this application. In some embodiments, flow 500 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (instructions run on a processing device to perform hardware simulation), or the like, or any combination thereof. One or more operations of the lip movement recognition flow 500 shown in fig. 5 may be implemented by the image information service system 100 shown in fig. 1. For example, the flow 500 may be stored in the storage device 130 in the form of instructions and called and/or executed by the processing engine 112 (e.g., the processor 220 of the computing device 200 shown in fig. 2, or the central processor 340 of the mobile device 300 shown in fig. 3). As shown in fig. 5, the lip movement recognition method may include:
In step 510, multiple frames of face images of the face to be recognized may be obtained. In some embodiments, step 510 may be performed by the acquisition module 410. In some embodiments, the multiple frames may be images of the face to be recognized taken at different times, for example face images that change continuously with the passage of time, or face images taken at fixed intervals, such as one frame every 0.5 seconds. In some embodiments, the multiple frames may be obtained from a video file of the face to be recognized. They may be consecutive face images, or face images extracted from the video that are not consecutive but keep a certain order; for example, one frame may be extracted every two frames, such as the frames numbered 1, 4, 7, 10, and so on. In some embodiments, each of the multiple frames may be reduced in image size. In some embodiments, the frame width and frame height of each frame are obtained, the minimum of the two is determined and compared with a preset image size value, and when that minimum is greater than the second preset value, the frame width and frame height of every frame in the video are scaled down proportionally so that the larger of the frame width and frame height is smaller than the preset image size value. In some embodiments, the image reduction may be performed by one or a combination of nearest-neighbor interpolation, bilinear interpolation, cubic convolution, local averaging, and the like. Reducing the multiple frames of the face to be recognized avoids an excessive amount of image data processing caused by a high image resolution. In some embodiments, a camera may be arranged to capture the multiple frames of face images. For example, in a ride-hailing service, they can be acquired through a camera arranged in the vehicle; a camera may also be set up in a shopping mall, an office area, a house, or the like. In some embodiments, a mobile device with a camera function, such as a mobile phone, a notebook computer, or a driving recorder, may be used to collect the multiple frames of face images. In some embodiments, the face images obtained by the camera may be stored in the server 110 or the storage device 130. In some embodiments, the multiple frames of face images of the face to be recognized may be acquired from the server 110, the network 120, the storage device 130, one or more image capture terminals 140, or the information source 150.
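A short sketch of extracting every third frame from a video file with OpenCV is given below (the 1st, 4th, 7th, 10th frames, as in the example above); the function name, parameters, and file path handling are illustrative assumptions.

```python
import cv2

def extract_frames(video_path: str, step: int = 3, max_frames: int = 50) -> list:
    """Read a face video and keep every `step`-th frame (1st, 4th, 7th, ... for step=3)."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while len(frames) < max_frames:
        ok, frame = capture.read()
        if not ok:
            break
        if index % step == 0:   # keeps frames 0, 3, 6, ... i.e. the 1st, 4th, 7th, ...
            frames.append(frame)
        index += 1
    capture.release()
    return frames
```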
In step 520, a lip feature value of a single-frame face image may be determined based on each of the multiple frames of face images. In some embodiments, step 520 may be performed by the lip feature value determination module 420. In some embodiments, the lip feature value may be a value that characterizes whether the lips open and close, for example the distance from the highest point of the lips to the lowest point of the lips, the distance between the two lip corners, the distance between any two lip feature points, the coordinates of any lip feature point, or the coordinates of feature points on the cheeks and/or chin around the lips. In some embodiments, a lip feature value is determined for each single-frame face image in the multiple frames. In some embodiments, whether the face to be recognized exhibits lip movement can be judged from the change of the lip feature value across the single-frame face images. For example, if the lip feature value is the distance from the highest point of the lips to the lowest point of the lips, the change of this distance across the frames shows whether the lips open and close and whether the face to be recognized exhibits lip movement. For another example, if the lip feature value is the coordinates of some lip feature point, whether the lips open and close can be judged from whether and by how much those coordinates change. In some embodiments, a lip reference feature value may be set and compared with the lip feature value to determine whether there is lip movement. For example, the lip feature value may be the distance from the highest point of the lips to the lowest point of the lips, and the lip reference feature value may be that distance when the lips are closed; comparing the lip feature value with the lip reference feature value then indicates whether the lips open and whether the face to be recognized exhibits lip movement.
In step 530, lip reference feature values may be determined based on lip feature values of a plurality of the single-frame face images. In some embodiments, step 530 may be performed by lip reference feature value determination module 430. In some embodiments, the lip reference feature value may be any one of an average value, a median, a standard deviation, a weighted average value, and the like of lip feature values of a plurality of the single-frame face images. In some embodiments, the lip reference feature value is an average of lip feature values of a plurality of the single-frame face images. For example, if the lip feature value is the distance between two feature points of the lip, the lip reference feature value may be an average value of the distances between two feature points of the lip in each single frame of face images in the multi-frame face images.
In step 540, the lip feature value of at least one single-frame face image may be compared with the lip reference feature value, and whether the face to be recognized has lip movement is determined based on the comparison result. In some embodiments, step 540 may be performed by the identification module 440. In some embodiments, the absolute value of the difference between the lip feature value of at least one single-frame face image and the lip reference feature value may be determined, and the maximum of these absolute differences may be found. In some embodiments, when the maximum is greater than a first preset value, the face to be recognized is determined to have lip movement. For example, the lip feature value may be the distance between feature points on the upper lip and feature points on the lower lip in each frame of face image; the average of the lip feature values over the plurality of frames is taken as the lip reference feature value, the absolute difference between each frame's lip feature value and the lip reference feature value is calculated, and the maximum of these absolute differences is compared with the first preset value; if it exceeds the first preset value, the face to be recognized can be considered to have lip movement. In other words, lip movement is present when the widest opening (or tightest closing) of the upper and lower lips deviates from the average upper-lower lip distance by more than a certain range; if the deviation stays within that range, the lips have not moved. In some embodiments, the first preset value may be 3.0.
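The decision rule above can be summarized in a few lines. The sketch below assumes the per-frame lip feature values have already been computed, and the example numbers are illustrative only.

FIRST_PRESET_VALUE = 3.0  # the first preset value mentioned above

def has_lip_movement(feature_values, threshold=FIRST_PRESET_VALUE):
    reference = sum(feature_values) / len(feature_values)    # lip reference feature value
    max_deviation = max(abs(v - reference) for v in feature_values)
    return max_deviation > threshold

print(has_lip_movement([12.0, 11.5, 18.2, 12.3]))  # True: one frame deviates strongly
print(has_lip_movement([12.0, 11.8, 12.1, 12.2]))  # False: the lips remain still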
It should be noted that the above description is provided merely for convenience and should not be taken as limiting the scope of the present application. It will be understood by those skilled in the art that, having the benefit of the teachings of this system, various modifications and changes in form and detail may be made to the fields of application in which the above method and system are practiced without departing from these teachings.
FIG. 6 illustrates an exemplary flow chart for determining lip feature values according to some embodiments of the present application. In some embodiments, flow 600 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (instructions run on a processing device to perform hardware simulation), etc., or any combination thereof. One or more operations in the process 600 for determining lip feature values shown in fig. 6 may be implemented by the image information service system 100 shown in fig. 1. For example, the flow 600 may be stored in the storage device 150 in the form of instructions and invoked and/or executed by the processing engine 112 (e.g., the processor 220 of the computing device 200 shown in fig. 2, or the central processor 340 of the mobile device 300 shown in fig. 3). As shown in fig. 6, the method of determining a lip feature value may include:
In 610, at least one first feature point coordinate located on the upper lip and at least one second feature point coordinate located on the lower lip in the single-frame face image may be obtained. In some embodiments, step 610 may be performed by the lip feature value determination module 420. As shown in fig. 7, for each single-frame face image in the plurality of frames of face images, the coordinates of at least one first feature point 710 located on the upper lip and the coordinates of at least one second feature point 720 located on the lower lip are obtained. In some embodiments, the first feature point 710 may be the highest point of the upper lip. In some embodiments, the second feature point 720 may be the lowest point of the lower lip. In some embodiments, there may be three first feature points. In some embodiments, the first feature point located midway between the two lip peaks of the upper lip may be taken as the middle feature point of the upper lip, and the three first feature points may be the middle feature point and two feature points at arbitrary positions on either side of it (see fig. 7). In some embodiments, the number of first feature points may also be five: the middle feature point and four feature points evenly distributed on its two sides. In some embodiments, there may be two first feature points, located on the two sides of the middle feature point. In some embodiments, the first feature points and their corresponding second feature points are distributed symmetrically up and down. For example, as shown in fig. 7, the first feature points are three feature points: the middle feature point of the upper lip and two feature points on either side of it. The second feature points are then three vertically symmetric feature points sharing the same horizontal (abscissa) coordinates as the corresponding first feature points.
In 620, a first distance between the upper lip and the lower lip may be determined based on the first feature point coordinates and the second feature point coordinates. In some embodiments, step 620 may be performed by the lip feature value determination module 420. In some embodiments, a feature point distance may be determined for each of two or more first feature points in the single-frame face image and its corresponding second feature point. In some embodiments, the first distance is determined based on two or more of these feature point distances. In some embodiments, the first distance may be the average of two or more feature point distances. For example, in fig. 7, in a single-frame face image, the distance between each first feature point 710 and its corresponding second feature point 720 is a feature point distance. If there are three first feature points and three corresponding second feature points, three feature point distances are obtained from the three vertically symmetric pairs of feature points, and the average of these three feature point distances is the first distance. In some embodiments, the first distance may be one of, or any combination of, the sum, weighted sum, arithmetic mean, weighted average, geometric mean, quadratic mean (root mean square), or harmonic mean of the feature point distances.
At 630, coordinates of two lip corners in the single frame of face image may be obtained. In some embodiments, step 630 may be performed by lip characteristic value determination module 420. For example, in fig. 7, coordinates of two lip corner points 730 in a single frame of face image can be obtained. In some embodiments, the coordinates of two lip corner points of each single frame of face image in the multiple frames of face images can be obtained.
In 640, a second distance between the lip corners may be obtained based on the lip corner coordinates. In some embodiments, step 640 may be performed by the lip feature value determination module 420. For example, in fig. 7, the distance between the lip corner points 730 is the second distance. In some embodiments, the second distance may be one of, or a combination of, the Euclidean distance, Manhattan distance, Chebyshev distance, Mahalanobis distance, normalized Euclidean distance, or the like.
In 650, the lip feature value of the single-frame face image may be determined based on the first distance and the second distance. In some embodiments, step 650 may be performed by the lip feature value determination module 420. In some embodiments, the lip feature value is the ratio of the first distance to the second distance. Because lips differ from person to person in thickness, width, and other dimensions, using the ratio of the first distance to the second distance as the lip feature value avoids calculation deviations caused by these differences and helps improve recognition accuracy. In some embodiments, the first distance and the second distance of each single-frame face image in the plurality of frames may be determined, and the ratio of the first distance to the second distance of that frame may be taken as its lip feature value. After the lip feature value of each single-frame face image is determined, a number of lip feature values equal to the number of frames is obtained. For example, as shown in fig. 7, the plurality of frames may be the four single-frame face images in fig. 7; in each of them, the three first feature points, the second feature points corresponding to the first feature points, and the two lip corner points 730 are obtained. The distance between each first feature point and its corresponding second feature point is taken as a feature point distance, and the average of the three feature point distances is the first distance. The distance between the two lip corner points 730 is calculated as the second distance, and the ratio of the first distance to the second distance is the lip feature value of that single-frame face image. Calculating the lip feature value of each single-frame face image finally yields four lip feature values. In this technical scheme, the average of the four lip feature values can be used as the lip reference feature value. The four lip feature values are compared with the lip reference feature value to obtain the maximum absolute difference. If this maximum absolute difference is greater than a first preset value (for example, 3.0), the face to be recognized can be considered to have lip movement.
It should be noted that the above description is provided merely for convenience and should not be taken as limiting the scope of the present application. It will be understood by those skilled in the art that, having the benefit of the teachings of this system, various modifications and changes in form and detail may be made to the fields of application in which the above method and system are practiced without departing from these teachings.
FIG. 8 illustrates an exemplary flow chart for determining lip feature values according to some embodiments of the present application. In some embodiments, flow 800 may be performed by processing logic that may comprise hardware (e.g., circuitry, dedicated logic, programmable logic, microcode, etc.), software (instructions run on a processing device to perform hardware simulation), etc., or any combination thereof. One or more operations in the process 800 for determining lip feature values shown in fig. 8 may be implemented by the image information service system 100 shown in fig. 1. For example, the flow 800 may be stored in the storage device 150 in the form of instructions and executed and/or invoked by the processing engine 112 (e.g., the processor 220 of the computing device 200 shown in fig. 2, or the central processor 340 of the mobile device 300 shown in fig. 3). As shown in fig. 8, the method of determining a lip feature value may include:
In 810, the coordinates of face feature points in the single-frame face image may be obtained. In some embodiments, step 810 may be performed by the lip feature value determination module 420. In some embodiments, the face feature points include at least one of the following: the eye center points, the nose tip point, and the lip corner points. In some embodiments, the face feature points may include contour points of the face and contour points of components such as the eyes, nose, lips, and eyebrows. The coordinates of the eye center points, the nose tip point, and the lip corner points are not easily influenced by facial expressions and are therefore suitable as reference points of the face. In some embodiments, the coordinates of the eye center points and the nose tip point may be acquired as the face feature points. In some embodiments, the coordinates of the eye center points and the lip corner points may be acquired as the face feature points. In some embodiments, as shown in fig. 9, the face feature points may be the two eye center points, the nose tip point, and the two lip corner points.
In 820, an affine transformation matrix may be determined based on the face feature point coordinates and their corresponding preset face feature point coordinates. In some embodiments, step 820 may be performed by the lip feature value determination module 420. In some embodiments, the preset face feature points may be feature points on a standard, frontal face of a preset overall size. By relating the face feature point coordinates to the preset face feature point coordinates through an affine transformation matrix, face images with different angles, degrees of tilt, and distances to the lens are converted into frontal face images of a uniform overall size. As shown in fig. 9, a tilted face image can be converted into a frontal face image by the determined affine transformation matrix. In some embodiments, equations may be established from the coordinates of the face feature points in the single-frame face image and the coordinates of the corresponding preset face feature points, so as to obtain the affine transformation matrix. For example, if the coordinates of a face feature point in the single-frame face image are (x, y) and the coordinates of the corresponding preset face feature point are (x', y'), the following equation is established:
x' = a11·x + a12·y + tx,  y' = a21·x + a22·y + ty    (1)

Solving for the coefficients from the feature point correspondences yields the affine transformation matrix

M = [ a11  a12  tx ; a21  a22  ty ]
In some embodiments, an affine transformation matrix may be determined for each single one of the plurality of frames of face images.
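As a small numerical illustration of how such a matrix might be estimated, the sketch below fits a 2x3 affine matrix to point correspondences by least squares with NumPy; the point coordinates are illustrative, and nothing here is claimed to be the exact solver used in this application.

import numpy as np

def solve_affine(src_pts, dst_pts):
    # src_pts: face feature points in the frame; dst_pts: preset (standard face) points.
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    design = np.hstack([src, np.ones((len(src), 1))])       # rows of [x, y, 1]
    coeffs, *_ = np.linalg.lstsq(design, dst, rcond=None)   # 3 x 2 solution
    return coeffs.T                                         # 2 x 3 affine matrix M

# Eye centers, nose tip, and lip corners (illustrative coordinates).
M = solve_affine(
    src_pts=[(70, 80), (130, 78), (100, 120), (82, 150), (120, 150)],
    dst_pts=[(60, 70), (120, 70), (90, 110), (72, 140), (108, 140)],
)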
In 830, affine transformation may be performed on the single-frame face image based on the affine transformation matrix to obtain a normalized face image of that frame. In some embodiments, step 830 may be performed by the lip feature value determination module 420. In some embodiments, after the affine transformation matrix is determined, the coordinates of the data points representing face information in the single-frame face image are substituted into equation (1), and their coordinates in the preset face image are obtained, yielding the normalized face image. Normalizing each single-frame face image in the plurality of frames of face images gives a normalized face image for each frame.
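A minimal sketch of this warping step, assuming OpenCV is used; M is the 2x3 matrix from the previous step, and the output size is an illustrative stand-in for the preset overall face size.

import cv2
import numpy as np

def normalize_face(frame, M, out_size=(200, 200)):
    # warpAffine maps every pixel of the frame through M into the standard face view.
    return cv2.warpAffine(frame, np.asarray(M, dtype=np.float32), out_size)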
In 840, the lip feature value may be determined based on the normalized face image of the single-frame face image. In some embodiments, step 840 may be performed by the lip feature value determination module 420. In some embodiments, the normalized face image of the single-frame face image may be obtained, the first feature point coordinates and the second feature point coordinates may be obtained from the normalized face image, the first distance and the second distance may be calculated, and the lip feature value may be determined as the ratio of the first distance to the second distance. In some embodiments, a plurality of lip feature point coordinates may be obtained from the normalized face image, and the first and second feature point coordinates may be taken from them. In some embodiments, the coordinates of 68 face contour points may be obtained from the normalized face image, and the first and second feature point coordinates may be taken from these 68 face contour points. In some embodiments, the distribution of the 68 face contour points may be as shown in fig. 10, including contour points of both cheeks and contour points of the major facial organs. Each contour point has a fixed number, from which the coordinates of the corresponding contour point can be determined. In some embodiments, the first and second feature points may be selected from the lip contour points. In some embodiments, the coordinates of the first and second feature points may be determined by the numbers of the selected lip contour points.
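As an illustration of selecting lip points by their fixed numbers, the sketch below uses the common dlib-style 68-point numbering; the specific indices are an assumption for illustration, since the application only states that each contour point has a fixed number.

UPPER_LIP_IDX = (50, 51, 52)   # middle of the upper lip and its two neighbours
LOWER_LIP_IDX = (58, 57, 56)   # the points roughly below them on the lower lip
CORNER_IDX = (48, 54)          # left and right lip corners

def select_lip_points(landmarks):
    # landmarks: sequence of 68 (x, y) tuples from the normalized face image.
    upper = [landmarks[i] for i in UPPER_LIP_IDX]
    lower = [landmarks[i] for i in LOWER_LIP_IDX]
    corners = [landmarks[i] for i in CORNER_IDX]
    return upper, lower, corners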
In some embodiments, three first feature point coordinates (x1, y1), (x2, y2), (x3, y3) and the corresponding three second feature point coordinates (x1d, y1d), (x2d, y2d), (x3d, y3d) may be obtained from a single-frame normalized face image. In some embodiments, the three feature point distances D1, D2, D3 may be calculated separately according to equation (2):

Di = √((xi - xid)² + (yi - yid)²),  i = 1, 2, 3    (2)

The first distance is calculated from the three feature point distances:

Dmean = (D1 + D2 + D3) / 3    (3)

In some embodiments, the coordinates (x4, y4), (x5, y5) of the two lip corner points may be acquired from the single-frame normalized face image, and the second distance is calculated:

Dlip = √((x4 - x5)² + (y4 - y5)²)    (4)

The lip feature value of the single-frame face image is calculated from the first distance and the second distance:

Ei = Dmean / Dlip    (5)

In some embodiments, the lip feature value of each frame of the face image can be obtained according to the above method. The lip reference feature value of the plurality of frames of face images is then calculated:

E′ = (1/n) · ΣEi, summed over i = 1, …, n    (6)

In some embodiments, the plurality of frames of face images may be 25 consecutive frames, i.e., n = 25. In some embodiments, the absolute value of the difference between each lip feature value and the lip reference feature value is calculated according to the following formula, and the maximum of these absolute differences is determined:

Ediff_max = max |Ei - E′|    (7)
The maximum absolute difference is then compared with the first preset value. In some embodiments, if the maximum absolute difference is greater than the first preset value, the face to be recognized has lip movement. In some embodiments, if the maximum absolute difference is less than or equal to the first preset value, the face to be recognized has no lip movement.
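Putting equations (2) through (7) together, the sketch below is a minimal end-to-end illustration under the assumption that the lip point coordinates of each frame are already available; the data layout and names are illustrative only.

import math

def frame_feature(upper, lower, corners):
    d = [math.dist(u, l) for u, l in zip(upper, lower)]   # equation (2)
    d_mean = sum(d) / len(d)                              # equation (3)
    d_lip = math.dist(corners[0], corners[1])             # equation (4)
    return d_mean / d_lip                                 # equation (5)

def detect_lip_movement(frames, first_preset_value):
    # frames: list of (upper_points, lower_points, corner_points) per frame.
    e = [frame_feature(u, l, c) for u, l, c in frames]
    e_ref = sum(e) / len(e)                               # equation (6)
    e_diff_max = max(abs(ei - e_ref) for ei in e)         # equation (7)
    return e_diff_max > first_preset_value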
It should be noted that the above description is provided merely for convenience and should not be taken as limiting the scope of the present application. It will be understood by those skilled in the art that, having the benefit of the teachings of this system, various modifications and changes in form and detail may be made to the fields of application in which the above method and system are practiced without departing from these teachings.
The beneficial effects that may be brought by the embodiments of the present application include, but are not limited to: (1) the technical scheme recognizes lip movement simply and quickly with a reduced amount of computation, so that lip movement can be recognized on portable devices; (2) the face image is normalized during recognition, and the lip feature value is also normalized when it is determined (using the distance between the two lip corner points), so that recognition accuracy remains high even though the amount of computation is reduced. It should be noted that different embodiments may produce different advantages; in different embodiments, any one or a combination of the above advantages, or any other advantage, may be obtained.
Having thus described the basic concept, it will be apparent to those skilled in the art that the foregoing detailed disclosure is to be considered merely illustrative and not restrictive of the broad application. Various modifications, improvements and adaptations to the present application may occur to those skilled in the art, although not explicitly described herein. Such modifications, improvements and adaptations are proposed in the present application and thus fall within the spirit and scope of the exemplary embodiments of the present application.
Also, this application uses specific language to describe embodiments of the application. Reference throughout this specification to "one embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic described in connection with at least one embodiment of the present application is included in at least one embodiment of the present application. Therefore, it is emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, some features, structures, or characteristics of one or more embodiments of the present application may be combined as appropriate.
Moreover, those skilled in the art will appreciate that aspects of the present application may be illustrated and described in terms of several patentable species or situations, including any new and useful combination of processes, machines, manufacture, or materials, or any new and useful improvement thereon. Accordingly, various aspects of the present application may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present application may be represented as a computer product, including computer readable program code, embodied in one or more computer readable media.
The computer storage medium may comprise a propagated data signal with the computer program code embodied therewith, for example, on baseband or as part of a carrier wave. The propagated signal may take any of a variety of forms, including electromagnetic, optical, etc., or any suitable combination. A computer storage medium may be any computer-readable medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code located on a computer storage medium may be propagated over any suitable medium, including radio, cable, fiber optic cable, RF, or the like, or any combination of the preceding.
Computer program code required for the operation of various portions of the present application may be written in any one or more programming languages, including object-oriented programming languages such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, and Python, conventional programming languages such as C, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, and ABAP, dynamic programming languages such as Python, Ruby, and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any form of network, such as a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet), or in a cloud computing environment, or as a service such as software as a service (SaaS).
Additionally, the order in which elements and sequences of the processes described herein are processed, the use of alphanumeric characters, or the use of other designations, is not intended to limit the order of the processes and methods described herein, unless explicitly claimed. While various presently contemplated embodiments of the invention have been discussed in the foregoing disclosure by way of example, it is to be understood that such detail is solely for that purpose and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover all modifications and equivalent arrangements that are within the spirit and scope of the embodiments herein. For example, although the system components described above may be implemented by hardware devices, they may also be implemented by software-only solutions, such as installing the described system on an existing server or mobile device.
Similarly, it should be noted that in the preceding description of embodiments of the application, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the embodiments. This method of disclosure, however, is not intended to imply that more features are required than are expressly recited in the claims. Indeed, an embodiment may be characterized by fewer than all of the features of a single embodiment disclosed above.
Numerals describing quantities of components, attributes, and the like are used in some embodiments; it should be understood that such numerals used in the description of the embodiments are in some instances qualified by the modifiers "about," "approximately," or "substantially." Unless otherwise indicated, "about," "approximately," or "substantially" indicates that the number allows a variation of ±20%. Accordingly, in some embodiments, the numerical parameters used in the specification and claims are approximations that may vary depending upon the desired properties of the individual embodiments. In some embodiments, numerical parameters should take into account the specified significant digits and employ a general digit-preserving approach. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments are approximations, in specific examples such numerical values are set forth as precisely as practicable.
The entire contents of each patent, patent application publication, and other material cited in this application, such as articles, books, specifications, publications, and documents, are hereby incorporated by reference into this application, except for application history documents that are inconsistent with or conflict with the contents of this application, and except for documents (currently or later appended to this application) that limit the broadest scope of the claims of this application. It is noted that if there is any inconsistency or conflict between the descriptions, definitions, and/or use of terms in the materials accompanying this application and those stated in this application, the descriptions, definitions, and/or use of terms in this application shall prevail.
Finally, it should be understood that the embodiments described herein are merely illustrative of the principles of the embodiments of the present application. Other variations are also possible within the scope of the present application. Thus, by way of example, and not limitation, alternative configurations of the embodiments of the present application can be viewed as being consistent with the teachings of the present application. Accordingly, the embodiments of the present application are not limited to only those embodiments explicitly described and depicted herein.

Claims (28)

1. A method of identifying lip movements, comprising:
acquiring a plurality of frames of face images of a face to be recognized; the multi-frame face images are shot at different moments;
acquiring lip characteristic values of a plurality of single-frame face images based on the multi-frame face images;
determining a lip reference feature value based on the plurality of lip feature values;
comparing the lip characteristic value of at least one single-frame face image with the lip reference characteristic value;
and determining whether the face to be recognized has lip movement or not based on the comparison result.
2. The method of claim 1, wherein the determining whether the face to be recognized has lip movement comprises:
determining the absolute value of the difference between the lip characteristic value of at least one single-frame face image and the lip reference characteristic value;
determining a maximum of at least one of the absolute values of the differences;
and when the maximum value is larger than a first preset value, determining that the human face to be recognized has lip movement.
3. The method of claim 1, wherein determining lip feature values for a single frame of the face image comprises:
acquiring at least one first characteristic point coordinate positioned on an upper lip and at least one second characteristic point coordinate positioned on a lower lip in a single-frame face image;
determining a first distance between the upper lip and the lower lip based on the first feature point coordinates and the second feature point coordinates;
acquiring coordinates of two lip corner points in the single-frame face image;
obtaining a second distance between the two lip corner points based on the coordinates of the two lip corner points;
determining the lip characteristic value in the single-frame face image based on the first distance and the second distance.
4. The method of claim 3,
the first characteristic points and the second characteristic points corresponding to the first characteristic points are distributed in a vertically symmetrical mode.
5. The method of claim 3,
the lip characteristic value is a ratio of the first distance and the second distance.
6. The method of claim 3, wherein determining the first distance between the upper lip and the lower lip based on the first feature point coordinate and the second feature point coordinate comprises:
determining a feature point distance based on each of two or more first feature points in the single-frame face image and a second feature point corresponding to the first feature point;
determining the first distance based on two or more of the feature point distances.
7. The method of claim 6,
the first distance is an average value of distances of two or more feature points.
8. The method of claim 1, wherein determining lip feature values for a single frame of the face image comprises:
acquiring a normalized face image of a single-frame face image;
and determining the lip characteristic value based on the normalized face image of the single-frame face image.
9. The method of claim 8, wherein obtaining a normalized face image of a single frame of the face image comprises:
acquiring coordinates of human face characteristic points on a human face in the single-frame human face image;
determining an affine transformation matrix based on the human face characteristic point coordinates and the corresponding preset human face characteristic point coordinates;
and carrying out affine transformation on the single-frame face image based on the affine transformation matrix to obtain a normalized face image of the single-frame face image.
10. The method of claim 9,
the face feature points include at least one of the following feature points: eye center point, nose tip point, and lip corner point.
11. The method of claim 1, further comprising performing image size reduction processing on each of the plurality of frames of face images.
12. The method of claim 1,
the lip reference characteristic value is an average value of lip characteristic values of a plurality of single-frame face images.
13. The method of claim 1,
and the multi-frame face image is obtained from the video file of the face to be recognized.
14. A system for identifying lip movements, comprising:
the acquisition module is used for acquiring a plurality of frames of face images of a face to be recognized; the multi-frame face images are shot at different moments;
the lip characteristic value determining module is used for acquiring lip characteristic values of a plurality of single-frame face images based on the multi-frame face images;
a lip reference characteristic value determination module for determining a lip reference characteristic value based on the plurality of lip characteristic values;
the recognition module is used for comparing the lip characteristic value of at least one single-frame face image with the lip reference characteristic value; and determining whether the face to be recognized has lip movement or not based on the comparison result.
15. The system of claim 14, wherein the identification module is further to:
determining the absolute value of the difference between the lip characteristic value of at least one single-frame face image and the lip reference characteristic value;
determining a maximum of at least one of the absolute values of the differences;
and when the maximum value is larger than a first preset value, determining that the human face to be recognized has lip movement.
16. The system of claim 14, wherein the lip characteristic value determination module is further to:
acquiring at least one first characteristic point coordinate positioned on an upper lip and at least one second characteristic point coordinate positioned on a lower lip in a single-frame face image;
determining a first distance between the upper lip and the lower lip based on the first feature point coordinates and the second feature point coordinates;
acquiring coordinates of two lip corner points in the single-frame face image;
obtaining a second distance between the two lip corner points based on the coordinates of the two lip corner points;
determining the lip characteristic value in the single-frame face image based on the first distance and the second distance.
17. The system of claim 16,
the first characteristic points and the second characteristic points corresponding to the first characteristic points are distributed in a vertically symmetrical mode.
18. The system of claim 16,
the lip characteristic value is a ratio of the first distance and the second distance.
19. The system of claim 16, wherein the lip characteristic value determination module is further to:
determining a feature point distance based on each of two or more first feature points in the single-frame face image and a second feature point corresponding to the first feature point;
determining the first distance based on two or more of the feature point distances.
20. The system of claim 19,
the first distance is an average value of distances of two or more feature points.
21. The system of claim 14, wherein the lip characteristic value determination module is further to:
acquiring a normalized face image of a single-frame face image;
and determining the lip characteristic value based on the normalized face image of the single-frame face image.
22. The system of claim 21, wherein the lip characteristic value determination module is further to:
acquiring coordinates of human face characteristic points on a human face in the single-frame human face image;
determining an affine transformation matrix based on the human face characteristic point coordinates and the corresponding preset human face characteristic point coordinates;
and carrying out affine transformation on the single-frame face image based on the affine transformation matrix to obtain a normalized face image of the single-frame face image.
23. The system of claim 22,
the face feature points include at least one of the following feature points: eye center point, nose tip point, and lip corner point.
24. The system of claim 14, further comprising an image pre-processing module for performing image downscaling processing on each of the plurality of frames of face images.
25. The system of claim 14,
the lip reference characteristic value is an average value of lip characteristic values of a plurality of single-frame face images.
26. The system of claim 14,
and the multi-frame face image is obtained from the video file of the face to be recognized.
27. An apparatus for identifying lip movements, comprising at least one storage medium and at least one processor;
the at least one storage medium is configured to store computer instructions;
the at least one processor is configured to execute the computer instructions to implement the method of identifying lip movements of any of claims 1-13.
28. A computer readable storage medium storing computer instructions which, when executed by a processor, implement a method of identifying lip movements according to any one of claims 1 to 13.
CN201811471395.8A 2018-12-03 2018-12-03 Lip movement identification method and system Pending CN111259711A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811471395.8A CN111259711A (en) 2018-12-03 2018-12-03 Lip movement identification method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811471395.8A CN111259711A (en) 2018-12-03 2018-12-03 Lip movement identification method and system

Publications (1)

Publication Number Publication Date
CN111259711A true CN111259711A (en) 2020-06-09

Family

ID=70948658

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811471395.8A Pending CN111259711A (en) 2018-12-03 2018-12-03 Lip movement identification method and system

Country Status (1)

Country Link
CN (1) CN111259711A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949418A (en) * 2021-02-05 2021-06-11 深圳市优必选科技股份有限公司 Method and device for determining speaking object, electronic equipment and storage medium
CN112966654A (en) * 2021-03-29 2021-06-15 深圳市优必选科技股份有限公司 Lip movement detection method and device, terminal equipment and computer readable storage medium
CN113656842A (en) * 2021-08-10 2021-11-16 支付宝(杭州)信息技术有限公司 Data verification method, device and equipment
WO2024037280A1 (en) * 2022-08-17 2024-02-22 马上消费金融股份有限公司 Lip movement detection method and apparatus, storage medium and electronic device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110071830A1 (en) * 2009-09-22 2011-03-24 Hyundai Motor Company Combined lip reading and voice recognition multimodal interface system
CN102663361A (en) * 2012-04-01 2012-09-12 北京工业大学 Face image reversible geometric normalization method facing overall characteristics analysis
US20150302240A1 (en) * 2012-08-28 2015-10-22 Tencent Technology (Shenzhen) Company Limited Method and device for locating feature points on human face and storage medium
CN106709400A (en) * 2015-11-12 2017-05-24 阿里巴巴集团控股有限公司 Sense organ opening and closing state recognition method, sense organ opening and closing state recognition device and client
CN106919891A (en) * 2015-12-26 2017-07-04 腾讯科技(深圳)有限公司 A kind of image processing method and device
CN107679449A (en) * 2017-08-17 2018-02-09 平安科技(深圳)有限公司 Lip motion method for catching, device and storage medium
CN107992813A (en) * 2017-11-27 2018-05-04 北京搜狗科技发展有限公司 A kind of lip condition detection method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949418A (en) * 2021-02-05 2021-06-11 深圳市优必选科技股份有限公司 Method and device for determining speaking object, electronic equipment and storage medium
CN112966654A (en) * 2021-03-29 2021-06-15 深圳市优必选科技股份有限公司 Lip movement detection method and device, terminal equipment and computer readable storage medium
WO2022205843A1 (en) * 2021-03-29 2022-10-06 深圳市优必选科技股份有限公司 Lip movement detection method and apparatus, terminal device, and computer readable storage medium
CN112966654B (en) * 2021-03-29 2023-12-19 深圳市优必选科技股份有限公司 Lip movement detection method, lip movement detection device, terminal equipment and computer readable storage medium
CN113656842A (en) * 2021-08-10 2021-11-16 支付宝(杭州)信息技术有限公司 Data verification method, device and equipment
CN113656842B (en) * 2021-08-10 2024-02-02 支付宝(杭州)信息技术有限公司 Data verification method, device and equipment
WO2024037280A1 (en) * 2022-08-17 2024-02-22 马上消费金融股份有限公司 Lip movement detection method and apparatus, storage medium and electronic device

Similar Documents

Publication Publication Date Title
US11487995B2 (en) Method and apparatus for determining image quality
US11295474B2 (en) Gaze point determination method and apparatus, electronic device, and computer storage medium
US11861937B2 (en) Facial verification method and apparatus
CN111259711A (en) Lip movement identification method and system
CN107609481B (en) Method, apparatus and computer storage medium for generating training data for face recognition
CN109359526B (en) Human face posture estimation method, device and equipment
WO2021169637A1 (en) Image recognition method and apparatus, computer device and storage medium
CN110197146B (en) Face image analysis method based on deep learning, electronic device and storage medium
US20190325200A1 (en) Face image processing methods and apparatuses, and electronic devices
US10728241B2 (en) Triage engine for document authentication
CN109241842B (en) Fatigue driving detection method, device, computer equipment and storage medium
CN107545241A (en) Neural network model is trained and biopsy method, device and storage medium
CN110728234A (en) Driver face recognition method, system, device and medium
US20210004587A1 (en) Image detection method, apparatus, device and storage medium
CN108427874A (en) Identity identifying method, server and computer readable storage medium
CN113302619B (en) System and method for evaluating target area and characteristic points
CN109376717A (en) Personal identification method, device, electronic equipment and the storage medium of face comparison
WO2021057148A1 (en) Brain tissue layering method and device based on neural network, and computer device
CN110555339A (en) target detection method, system, device and storage medium
CN114861241A (en) Anti-peeping screen method based on intelligent detection and related equipment thereof
CN110390295A (en) A kind of image information recognition methods, device and storage medium
CN113673527A (en) License plate recognition method and system
WO2019201141A1 (en) Methods and systems for image processing
US11216960B1 (en) Image processing method and system
CN113362249A (en) Text image synthesis method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination