CN116189271B - Data processing method and system based on smart watch lip language recognition - Google Patents

Data processing method and system based on smart watch lip language recognition

Info

Publication number
CN116189271B
CN116189271B, CN202310425186.4A, CN202310425186A
Authority
CN
China
Prior art keywords
data
lip
preset
language
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310425186.4A
Other languages
Chinese (zh)
Other versions
CN116189271A (en)
Inventor
单文豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Manridy Technology Co ltd
Original Assignee
Shenzhen Manridy Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Manridy Technology Co ltd filed Critical Shenzhen Manridy Technology Co ltd
Priority to CN202310425186.4A
Publication of CN116189271A
Application granted
Publication of CN116189271B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/166 Detection; Localisation; Normalisation using acquisition arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a data processing method and system based on smart watch lip language recognition, applied to the field of data processing. The method comprises: collecting facial image data of a user based on a preset scanner and generating a recognition area to be captured according to the facial image data; obtaining the content to be recognized of the recognition area and analyzing it to obtain a transformation track of the recognition area; performing data synthesis on the transformation track based on its dynamic transformation to generate lip language data of the user; obtaining voice data of the user from the lip language data by means of a prediction model; extracting the voice data by applying a preset database and capturing language features appearing in the voice data; analyzing the lip language data based on the voice data and the language features to generate lip language processing data of the user; performing text translation on the lip language processing data based on a preset mapping relation to generate at least one or more language texts corresponding to the lip language processing data; and presenting the language texts on a preset display screen.

Description

Data processing method and system based on smart watch lip language recognition
Technical Field
The invention relates to the field of data processing, in particular to a data processing method and system for recognizing lip language based on an intelligent watch.
Background
With the continuous development of social productivity and of science and technology, the demand of various industries for language recognition technology keeps growing; such recognition involves three-dimensional dynamic scenes and entity behaviors built on multi-source information fusion and interaction. Smart wearable devices are used to record a user's spoken data, but they cannot perform lip language recognition, so the voice recognition function is difficult to realize when the user is in a noisy environment, and communication barriers easily occur when the user interacts with the smart wearable device.
Disclosure of Invention
The invention aims to solve the problem that a user can hardly use the voice recognition function in a noisy environment and that communication barriers easily occur when the user interacts with a smart wearable device, and provides a data processing method and a data processing system based on smart watch lip language recognition.
The invention adopts the following technical means for solving the technical problems:
the invention provides a data processing method based on intelligent watch lip language identification, which comprises the following steps:
acquiring facial image data of a user based on a preset scanner, generating a recognition area to be captured according to the facial image data, and dynamically recognizing the recognition area by using the scanner to acquire the content to be recognized of the recognition area;
Inputting part information in the identification area into a training model for training to obtain a trained prediction model, analyzing the content to be identified to obtain a transformation track of the identification area, carrying out data synthesis on the transformation track based on dynamic transformation to generate lip language data of the user, inputting the lip language data into the prediction model for prediction to obtain voice data of the user, wherein the transformation track is specifically a transformation track of mouth action when the user speaks, and the part information comprises an upper lip, a lower lip, a lip angle and a lip valley;
extracting the voice data by using a preset database, and capturing language features appearing in the voice data, wherein the language features comprise sound production, languages and language branch dialects;
analyzing the lip language data based on the voice data and the language features, generating lip language processing data of the user, and judging whether the lip language processing data is clear or not according to a preset statement scoring table;
if yes, text translation is carried out on the lip language processing data, the lip language processing data is translated based on a preset mapping relation, at least one or more language texts corresponding to the lip language processing data are generated, and the language texts are presented on a preset display screen, wherein the language texts are specifically language type texts arranged based on preset priorities.
Further, the step of acquiring facial image data of a user based on a preset scanner, generating a recognition area to be captured according to the facial image data, dynamically recognizing the recognition area by using the scanner, and acquiring the content to be recognized of the recognition area includes:
capturing the current scannable width and the scanning distance of the user;
judging whether the scannable width and the scanning distance are within a preset scanning range or not;
if yes, acquiring face information in the scanning range, comparing the face information with a preset scanning template in a difference way, generating a recognizable area of the face information, acquiring identity data corresponding to the face information according to the recognizable area, and acquiring lip fluctuation information in the recognizable area; wherein the identifiable region includes an ocular feature, a nasal feature, a lip feature, a brow feature, and an ear feature;
if the lip fluctuation information exists in the face information, recording the starting time and the ending time of the lip fluctuation information.
Further, the step of inputting the location information in the identification area into a training model to train and obtaining a trained prediction model includes:
Acquiring an initial training sample set, wherein training samples in the initial training sample set comprise image data of the position information and corresponding labels of the image data;
determining a clustering algorithm corresponding to the initial training sample set, taking the corresponding labels as cluster numbers, and clustering the image data by using the clustering algorithm to generate a clustering result, wherein the clustering result comprises the cluster label assigned to each piece of image data, each cluster comprises a plurality of image data samples, and the clustering algorithm is selected from K-means clustering and spectral clustering;
determining image data whose assigned cluster label in the clustering result is inconsistent with its corresponding annotation label as an abnormal training sample;
deleting the abnormal training sample from the initial training sample set to obtain a cleaning training sample set, and inputting the cleaning training sample set into the training model for training.
Further, the step of analyzing the content to be identified to obtain a transformation track of the identification area and synthesizing data based on dynamic transformation of the transformation track includes:
based on the dynamic scene recorded in the content to be identified, constructing each reference axis taking the identification area as a reference coordinate system, and taking the position information as each coordinate of an independent point;
And extracting each reference axis and each coordinate to serve as an image to be converted, mapping the image to be converted into a preset space template according to a preset conversion proportion, generating a converted image corresponding to the image to be converted, and acquiring a dynamic conversion track of the converted image by applying a preset frame rate based on view angle information preset by the space template and the dynamic scene.
Further, the step of extracting the voice data by using a preset database and capturing language features appearing in the voice data includes:
extracting period data of at least one or more preset time points in the voice data;
and encoding the period data by using a preset encoder, converting the period data into voice feature vectors corresponding to each preset time point, and generating voice features corresponding to the voice data based on the voice feature vectors, wherein the encoder characterizes the voice data as vectors by applying an embedding layer to each item in the period data set.
Further, the step of analyzing the lip language data based on the voice data and the language features to generate lip language processing data of the user and judging whether the lip language processing data is clear according to a preset sentence scoring table includes:
Converting the lip language processing data into text information in a preset format based on the language type;
inputting the text information into a preset reading model for reading, and generating sentence scores corresponding to the text information based on the text meanings of the text information;
judging whether the sentence score is larger than a score benchmark preset in the sentence score table;
if yes, judging that the lip language processing data has corresponding meaning.
Further, before the step of acquiring facial image data of a user based on a preset scanner, generating a recognition area to be captured according to the facial image data, dynamically recognizing the recognition area by using the scanner, and acquiring the content to be recognized of the recognition area, the method comprises the following steps:
capturing decibel data in an environment;
judging whether the decibel data is larger than a preset reading threshold value or not;
if yes, stopping capturing the decibel data in the environment, starting a preset scanner, and acquiring the image data in the preset range based on the temperature information in the preset range.
The invention also provides a data processing system based on the intelligent watch lip language identification, which comprises:
the acquisition module is used for acquiring facial image data of a user based on a preset scanner, generating an identification area to be captured according to the facial image data, and dynamically identifying the identification area by using the scanner to acquire the content to be identified of the identification area;
The prediction module is used for inputting the position information in the identification area into a training model for training to obtain a trained prediction model, analyzing the content to be identified to obtain a transformation track of the identification area, carrying out data synthesis on the transformation track based on dynamic transformation to generate lip language data of the user, inputting the lip language data into the prediction model for prediction to obtain voice data of the user, wherein the transformation track is specifically a transformation track of mouth action when the user speaks, and the position information comprises an upper lip, a lower lip, lip angles and lip valleys;
the capturing module is used for extracting the voice data by applying a preset database and capturing language features appearing in the voice data, wherein the language features comprise sound production, languages and language branch dialects;
the judging module is used for analyzing the lip language data based on the voice data and the language characteristics, generating lip language processing data of the user, and judging whether the lip language processing data is clear or not according to a preset statement scoring table;
and the execution module is used for, if the lip language processing data is clear, performing text translation on the lip language processing data, translating the lip language processing data based on a preset mapping relation, generating at least one or more language texts corresponding to the lip language processing data, and presenting the language texts on a preset display screen, wherein the language texts are specifically language type texts arranged based on a preset priority.
Further, the acquisition module further comprises:
the capturing unit is used for capturing the current scannable width and the scanning distance of the user;
the judging unit is used for judging whether the scannable width and the scanning distance are in a preset scanning range or not;
the execution unit is used for, if the scannable width and the scanning distance are within the preset scanning range, acquiring the face information in the scanning range, comparing the face information with a preset scanning template in a difference way, generating an identifiable region of the face information, acquiring identity data corresponding to the face information according to the identifiable region, and acquiring lip fluctuation information in the identifiable region; wherein the identifiable region includes an ocular feature, a nasal feature, a lip feature, a brow feature, and an ear feature;
and the recording unit is used for recording the starting time and the ending time of the lip fluctuation information if the lip fluctuation information exists in the face information.
Further, the prediction module further includes:
the acquisition unit is used for acquiring an initial training sample set, wherein training samples in the initial training sample set comprise image data of the position information and corresponding labels of the image data;
The generating unit is used for determining a clustering algorithm corresponding to the initial training sample set, taking the corresponding labels as cluster numbers, and clustering the image data by utilizing the clustering algorithm to generate a clustering result, wherein the clustering result comprises the cluster label assigned to each piece of image data, each cluster comprises a plurality of image data samples, and the clustering algorithm is selected from K-means clustering and spectral clustering;
the comparison unit is used for determining image data whose assigned cluster label in the clustering result is inconsistent with its corresponding annotation label as an abnormal training sample;
the training unit is used for deleting the abnormal training sample from the initial training sample set to obtain a cleaning training sample set, and inputting the cleaning training sample set into the training model for training.
The invention provides a data processing method and a system based on intelligent watch lip language identification, which have the following beneficial effects:
according to the invention, after the lip image data of a user is collected, it is analyzed to obtain a transformation track; the transformation track is subjected to data synthesis to generate lip language data of the user; the lip language data is input into a prediction model to predict the voice data of the user; the language features of the voice data are extracted; the lip language processing data of the user is generated and translated to obtain language text information, which is then presented on the display screen preset in the smart watch. In this way the user can still use the voice recognition function in a noisy environment, and the probability of communication barriers when the user interacts with the smart wearable device is reduced.
Drawings
FIG. 1 is a flow chart of an embodiment of a method for processing data based on a smart watch identification lip language according to the present invention;
fig. 2 is a block diagram illustrating an embodiment of a data processing system based on a smart watch identification lip language according to the present invention.
Detailed Description
The achievement of the objects, the functional features and the advantages of the present invention are further described below with reference to the embodiments and the accompanying drawings. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present invention.
The technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1, a data processing method based on a smart watch identification lip language according to an embodiment of the present invention includes:
s1: acquiring facial image data of a user based on a preset scanner, generating a recognition area to be captured according to the facial image data, and dynamically recognizing the recognition area by using the scanner to acquire the content to be recognized of the recognition area;
S2: inputting part information in the identification area into a training model for training to obtain a trained prediction model, analyzing the content to be identified to obtain a transformation track of the identification area, carrying out data synthesis on the transformation track based on dynamic transformation to generate lip language data of the user, inputting the lip language data into the prediction model for prediction to obtain voice data of the user, wherein the transformation track is specifically a transformation track of mouth action when the user speaks, and the part information comprises an upper lip, a lower lip, a lip angle and a lip valley;
s3: extracting the voice data by using a preset database, and capturing language features appearing in the voice data, wherein the language features comprise sound production, languages and language branch dialects;
s4: analyzing the lip language data based on the voice data and the language features, generating lip language processing data of the user, and judging whether the lip language processing data is clear or not according to a preset statement scoring table;
s5: if yes, text translation is carried out on the lip language processing data, the lip language processing data is translated based on a preset mapping relation, at least one or more language texts corresponding to the lip language processing data are generated, and the language texts are presented on a preset display screen, wherein the language texts are specifically language type texts arranged based on preset priorities.
In this embodiment, the intelligent system collects facial image data of the user wearing the smart watch based on a preset scanner, generates the recognition area to be captured, namely the mouth area, according to the facial image data, and dynamically recognizes the mouth area by using the scanner to obtain the voice content to be recognized of the mouth area. The scenario can be summarized as follows: because the user is in a noisy environment, the smart watch cannot record the user's voice content by capturing volume, so the user's mouth area is scanned to obtain the voice content to be recognized. The intelligent system then takes the corresponding position information in the recognition area (including the upper lip position, the lower lip position, the lip angle position and the lip valley position of the mouth area) as training samples, inputs the training samples into a blank training model for training to obtain a trained prediction model, analyzes the voice content to be recognized to obtain the lip transformation track of the user corresponding to that content, performs data synthesis on the transformation track based on the dynamic transformation process to generate the lip language data corresponding to the voice content to be recognized, and applies the prediction model to predict the lip language data to obtain the voice data corresponding to the user. The intelligent system applies a preset large database to extract features of the user's voice data and captures the language features appearing in the voice data, including the sound production features, the language types and the branch dialects corresponding to the language types. The intelligent system parses the user's lip language data based on the voice data and the corresponding language features to generate the lip language processing data of the user, and then judges whether the lip language processing data is clear according to a preset sentence scoring table so as to execute the corresponding step. For example, if the system judges from the sentence scoring table that the lip language processing data is not clear, that is, the meaning of the user's voice content to be recognized is ambiguous, the smart watch displays the preset prompt "please repeat the voice content to be read" on the watch display screen to remind the user to repeat the content whose meaning was unclear, and the system performs reading recognition again. If the system judges from the sentence scoring table that the lip language processing data is clear, that is, the voice content to be recognized has a corresponding meaning, the smart watch translates that meaning, translates the lip language processing data based on a preset mapping relation, generates one or more language texts corresponding to the translated text (for example, for a Chinese translation, language text branches such as Mandarin, Cantonese (Baihua) and Hakka; for a Spanish translation, branches such as Catalan, Basque and Galician), and finally presents these language texts on the display screen of the smart watch.
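As an illustrative sketch only (not part of the claimed method), the priority-based arrangement of the generated language texts could be implemented as follows; Python is assumed, and the function name ordered_language_texts, the sample translations and the priority list are placeholders invented for illustration:

```python
def ordered_language_texts(translations: dict[str, str], priority: list[str]) -> list[str]:
    """Arrange the generated language texts by the preset priority before display."""
    return [f"{lang}: {translations[lang]}" for lang in priority if lang in translations]

# Example usage: the texts are then written to the smart watch display in priority order.
texts = ordered_language_texts(
    {"Mandarin": "...", "Cantonese": "...", "Catalan": "..."},
    priority=["Mandarin", "Cantonese", "Catalan"],
)
```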
In this embodiment, acquiring facial image data of a user based on a preset scanner, generating a recognition area to be captured according to the facial image data, dynamically recognizing the recognition area by using the scanner, and acquiring the content to be recognized of the recognition area, where step S1 includes:
s11: capturing the current scannable width and the scanning distance of the user;
s12: judging whether the scannable width and the scanning distance are within a preset scanning range or not;
s13: if yes, acquiring face information in the scanning range, comparing the face information with a preset scanning template in a difference way, generating a recognizable area of the face information, acquiring identity data corresponding to the face information according to the recognizable area, and acquiring lip fluctuation information in the recognizable area; wherein the identifiable region includes an ocular feature, a nasal feature, a lip feature, a brow feature, and an ear feature;
s14: if the lip fluctuation information exists in the face information, recording the starting time and the ending time of the lip fluctuation information.
In this embodiment, the intelligent system captures the width of the user and the scanning distance in the scanner frame against the set maximum scannable width and maximum scanning distance of the scanner, and judges whether they fall within the preset maximum width and maximum scanning distance so as to execute the corresponding step. For example, if the width of the user and the scanning distance in the scanner frame are not within the preset maximum width and maximum scanning distance, the smart watch does not meet the scanning condition, the user cannot be confirmed to be in the scannable range, and the scanning function is not started to scan the user. If the width of the user and the scanning distance are within the preset maximum width and maximum scanning distance, the smart watch collects the face information in the scanning range, performs a difference comparison between the face information and the preset scanning template to generate the identifiable area of the face information (namely the data of the five sense organs), obtains the identity data corresponding to the user according to the face information data of the identifiable area, collects the lip fluctuation information of the user, and records the start time and the end time of the lip fluctuation information.
It should be noted that the reason for the difference comparison between the face information and the scanning template is as follows: if the user suddenly leaves the range of the scanner during scanning, the comparison prevents the scanner from erroneously treating other information with a similar temperature as the face information.
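A minimal sketch of this scan-range gate and of recording the start and end time of the lip fluctuation is given below; Python is assumed, the width and distance limits are illustrative values rather than values taken from the patent, and all names are placeholders:

```python
def within_scan_range(width: float, distance: float,
                      max_width: float = 0.4, max_distance: float = 0.6) -> bool:
    """Only start face/lip acquisition when the user fits the scanner's preset range."""
    return width <= max_width and distance <= max_distance

def fluctuation_window(samples: list[tuple[float, bool]]):
    """samples: (timestamp, lips_moving) pairs; returns (start_time, end_time) or None."""
    moving = [t for t, is_moving in samples if is_moving]
    return (moving[0], moving[-1]) if moving else None

if within_scan_range(width=0.3, distance=0.5):
    window = fluctuation_window([(0.0, False), (0.1, True), (0.2, True), (0.3, False)])
    # window == (0.1, 0.2): start and end time of the lip fluctuation information
```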
In this embodiment, the step S2 of inputting the location information in the identification area into the training model to perform training, and obtaining the trained prediction model includes:
s21: acquiring an initial training sample set, wherein training samples in the initial training sample set comprise image data of the position information and corresponding labels of the image data;
s22: determining a clustering algorithm corresponding to the initial training sample set, taking the corresponding labels as cluster numbers, and clustering the image data by using the clustering algorithm to generate a clustering result, wherein the clustering result comprises the cluster label assigned to each piece of image data, each cluster comprises a plurality of image data samples, and the clustering algorithm is selected from K-means clustering and spectral clustering;
s23: determining image data whose assigned cluster label in the clustering result is inconsistent with its corresponding annotation label as an abnormal training sample;
S24: deleting the abnormal training sample from the initial training sample set to obtain a cleaning training sample set, and inputting the cleaning training sample set into the training model for training.
In this embodiment, the system acquires the initial training sample set used to train the blank training model (including the image data of the part information of the mouth area and the annotation content corresponding to the image data), determines the clustering algorithm to be adopted for cleaning the initial training sample set (a K-means clustering algorithm or a spectral clustering algorithm), takes the corresponding annotation content as the cluster numbers, and applies the clustering algorithm to cluster the image data to generate a clustering result, which covers the image data of the lip angle, lip bow, lip peak, lip bead, lip valley, upper lip and lower lip of the part information. The system then determines, based on the annotation information and the labels in the clustering result, the image data that is inconsistent with its label as abnormal training samples (that is, redundant image data other than the lip angle, lip valley, upper lip and lower lip), deletes these abnormal training samples from the initial training sample set to obtain a cleaned training sample set, and inputs the cleaned training sample set into the blank training model for training.
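As a sketch under stated assumptions (Python with numpy and scikit-learn, integer-encoded labels; the name clean_training_set is a placeholder, not the patent's own implementation), the label-consistency cleaning described above could look like this:

```python
import numpy as np
from sklearn.cluster import KMeans

def clean_training_set(features: np.ndarray, labels: np.ndarray):
    """Cluster lip-region image features and drop samples whose cluster
    disagrees with their annotated label (the 'abnormal training samples')."""
    n_clusters = len(np.unique(labels))  # one cluster per annotation label
    clusters = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(features)

    # Map each cluster to the label that dominates it, then keep only
    # samples whose own label matches that majority label.
    keep = np.zeros(len(labels), dtype=bool)
    for c in range(n_clusters):
        members = clusters == c
        if not members.any():
            continue
        majority = np.bincount(labels[members]).argmax()
        keep |= members & (labels == majority)
    return features[keep], labels[keep]  # the cleaned training sample set
```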
In this embodiment, the step S2 of analyzing the content to be identified to obtain a transformation track of the identification area and synthesizing the transformation track based on dynamic transformation includes:
s201: based on the dynamic scene recorded in the content to be identified, constructing each reference axis taking the identification area as a reference coordinate system, and taking the position information as each coordinate of an independent point;
s202: and extracting each reference axis and each coordinate to serve as an image to be converted, mapping the image to be converted into a preset space template according to a preset conversion proportion, generating a converted image corresponding to the image to be converted, and acquiring a dynamic conversion track of the converted image by applying a preset frame rate based on view angle information preset by the space template and the dynamic scene.
In this embodiment, the intelligent system constructs three reference axes X, Y and Z with the area to be recognized (i.e., the mouth area) as the reference coordinate system, based on the dynamic scene data of the user recorded in the voice content to be recognized, and takes the four coordinates of the upper lip, the lower lip, the lip angle and the lip valley of the part information as independent points. A reference plane belonging to the mouth area is established from the reference axes and the coordinates; by extracting the three reference axes and the four coordinates in the reference plane, the reference plane is taken as the image to be converted and mapped into a preset space template according to a preset conversion proportion, generating a converted image that has the same content as the image to be converted but a different proportional size. The system then applies a preset frame rate to acquire the dynamic transformation track of the converted image based on the preset viewing angle information of the space template and the dynamic scene data, and finally obtains, through the dynamic transformation track, the data from the start of the user's lip transformation to the end of the transformation.
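A minimal numpy sketch of this mapping is shown below, assuming each captured frame supplies the four part-information landmarks (upper lip, lower lip, lip angle, lip valley) as (x, y, z) coordinates in the mouth-area reference system; the conversion proportion and frame rate are illustrative values, not values from the patent:

```python
import numpy as np

def transformation_track(frames: list[np.ndarray], proportion: float = 0.5, frame_rate: int = 30):
    """frames: one (4, 3) array of landmark coordinates per captured frame.
    Returns (timestamps, landmarks mapped into the space template)."""
    track = np.stack([proportion * f for f in frames])  # same content, different proportional size
    timestamps = np.arange(len(frames)) / frame_rate    # sampled at the preset frame rate
    return timestamps, track
```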
In this embodiment, the step S3 of extracting the voice data by using a preset database and capturing the language features appearing in the voice data includes:
s31: extracting period data of at least one or more preset time points in the voice data;
s32: and encoding the period data by using a preset encoder, converting the period data into voice feature vectors corresponding to each preset time point, and generating voice features corresponding to the voice data based on the voice feature vectors, wherein the encoder characterizes the voice data as vectors by applying an embedding layer to each item in the period data set.
In this embodiment, the system extracts the period data of one or more predetermined time points in the predicted voice data and uses the embedding layer of the encoder model to convert the voice data of each predetermined time point into a voice input vector, so as to obtain a sequence of voice input vectors that facilitates the subsequent encoding. The sequence of voice input vectors is then passed through the converter of the encoder model to convert the voice data of each predetermined time point into the voice feature vector corresponding to that time point. It should be understood that, because the converter-based encoder model encodes the voice input vectors based on context, the obtained voice feature vectors capture the associated voice information of the plurality of predetermined time points globally, so as to generate the voice features corresponding to the voice data.
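The following is a minimal sketch of such a context-based encoder, assuming PyTorch and assuming the "converter" is a standard Transformer encoder; the linear "embedding" projection, the dimensions and the layer counts are illustrative assumptions rather than the patent's actual model:

```python
import torch
import torch.nn as nn

class SpeechFeatureEncoder(nn.Module):
    def __init__(self, frame_dim: int = 40, d_model: int = 128, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(frame_dim, d_model)  # "embedding layer" for continuous period data
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, period_data: torch.Tensor) -> torch.Tensor:
        # period_data: (batch, time_points, frame_dim) -> (batch, time_points, d_model)
        vectors = self.embed(period_data)   # sequence of voice input vectors
        return self.encoder(vectors)        # context-encoded voice feature vectors

# Example: 8 predetermined time points, each described by 40-dimensional period data.
features = SpeechFeatureEncoder()(torch.randn(1, 8, 40))
```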
In this embodiment, the step S4 of analyzing the lip language data based on the voice data and the language features to generate lip language processing data of the user and determining whether the lip language processing data is clear according to a preset sentence scoring table includes:
s41: converting the lip language processing data into text information in a preset format based on the language type;
s42: inputting the text information into a preset reading model for reading, and generating sentence scores corresponding to the text information based on the text meanings of the text information;
s43: judging whether the sentence score is larger than a score benchmark preset in the sentence score table;
s44: if yes, judging that the lip language processing data has corresponding meaning.
In this embodiment, the intelligent system converts the lip language processing data into text information in a preset format based on the language type, inputs the text information into a preset reading model for reading to obtain the text meaning corresponding to the text information, generates a sentence score corresponding to the text information based on that meaning, and judges whether the sentence score is greater than the score benchmark preset in the sentence scoring table so as to execute the corresponding step. For example, if the sentence score generated by the system is 70 points and the preset score benchmark in the scoring table is 60 points, the system determines that the lip language processing data has a corresponding text meaning and displays that meaning. If the sentence score is 55 points and the preset benchmark is 60 points, the system determines that the text meaning of the lip language processing data is ambiguous and does not display it.
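A one-function sketch of this clarity check is given below (Python assumed; 60, 70 and 55 points are the example values used in this embodiment, and the function name is a placeholder):

```python
def lip_data_is_clear(sentence_score: float, benchmark: float = 60.0) -> bool:
    """Return True when the read sentence score exceeds the preset score benchmark."""
    return sentence_score > benchmark

assert lip_data_is_clear(70)       # 70 > 60: meaning judged clear, text meaning is displayed
assert not lip_data_is_clear(55)   # 55 < 60: meaning judged ambiguous, nothing is displayed
```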
In this embodiment, acquiring facial image data of a user based on a preset scanner, generating a recognition area to be captured according to the facial image data, dynamically recognizing the recognition area by using the scanner, and acquiring the content to be recognized of the recognition area, before step S1, including:
s101: capturing decibel data in an environment;
s102: judging whether the decibel data is larger than a preset reading threshold value or not;
s103: if yes, stopping capturing the decibel data in the environment, starting a preset scanner, and acquiring the image data in the preset range based on the temperature information in the preset range.
In this embodiment, the intelligent system captures the decibel data in the current environment and judges whether it is greater than the preset readable threshold, so as to execute the corresponding step. For example, if the system captures 70 dB in the current environment and the preset readable threshold is 120 dB, the decibel data is not greater than the threshold, so the system does not enable the lip-reading function but still applies the scanner to scan the user while attempting to read the voice data input by the user. If the system captures 150 dB and the preset readable threshold is 120 dB, the decibel data is greater than the threshold, so the system stops capturing the decibel data in the environment, starts the preset scanner function to scan the temperature information in the preset range, and collects the image data in that range to read the face information in the image data.
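A corresponding sketch of the decibel gate (Python assumed; 120 dB, 70 dB and 150 dB are the example values used in this embodiment, and the function name is a placeholder):

```python
def should_switch_to_lip_reading(ambient_db: float, readable_threshold: float = 120.0) -> bool:
    """Stop audio capture and start the scanner when the environment is too loud."""
    return ambient_db > readable_threshold

assert not should_switch_to_lip_reading(70)   # 70 dB: keep trying to read the spoken voice input
assert should_switch_to_lip_reading(150)      # 150 dB: stop audio capture, start the scanner
```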
Referring to fig. 2, a data processing system based on a smart watch identification lip language according to an embodiment of the present invention includes:
the acquisition module 10 is configured to acquire facial image data of a user based on a preset scanner, generate a recognition area to be captured according to the facial image data, and dynamically recognize the recognition area by using the scanner to obtain content to be recognized of the recognition area;
the prediction module 20 is configured to input location information in the recognition area into a training model for training to obtain a trained prediction model, analyze the content to be recognized to obtain a transformation track of the recognition area, perform data synthesis on the transformation track based on dynamic transformation, generate lip data of the user, input the lip data into the prediction model for prediction, and obtain voice data of the user by prediction, where the transformation track is specifically a transformation track of a mouth motion of the user during speaking, and the location information includes an upper lip, a lower lip, a lip angle and a lip valley;
the capturing module 30 is configured to extract the voice data by using a preset database, and capture language features appearing in the voice data, where the language features include a sound production, a language and a language branch dialect;
The judging module 40 is configured to parse the lip language data based on the voice data and the language features, generate lip language processing data of the user, and judge whether the lip language processing data is clear according to a preset sentence scoring table;
and the execution module 50 is configured to, if the lip language processing data is clear, perform text translation on the lip language processing data, translate the lip language processing data based on a preset mapping relation, generate at least one or more language texts corresponding to the lip language processing data, and present the language texts on a preset display screen, wherein the language texts are specifically language type texts arranged based on a preset priority.
In this embodiment, the acquisition module 10 collects facial image data of the user wearing the smart watch based on a preset scanner, generates the recognition area to be captured, namely the mouth area, according to the facial image data, and dynamically recognizes the mouth area by using the scanner to obtain the voice content to be recognized of the mouth area. The scenario can be summarized as follows: because the user is in a noisy environment, the smart watch cannot record the user's voice content by capturing volume, so the user's mouth area is scanned to obtain the voice content to be recognized. The prediction module 20 then takes the corresponding position information in the recognition area (including the upper lip position, the lower lip position, the lip angle position and the lip valley position of the mouth area) as training samples, inputs the training samples into a blank training model for training to obtain a trained prediction model, analyzes the voice content to be recognized to obtain the lip transformation track of the user corresponding to that content, performs data synthesis on the transformation track based on the dynamic transformation process to generate the lip language data corresponding to the voice content to be recognized, and applies the prediction model to predict the lip language data to obtain the voice data corresponding to the user. The capturing module 30 applies a preset large database to extract features of the user's voice data and captures the language features appearing in the voice data, including the sound production features, the language types and the branch dialects corresponding to the language types. The judging module 40 parses the user's lip language data based on the voice data and the corresponding language features to generate the lip language processing data of the user, and then judges whether the lip language processing data is clear according to a preset sentence scoring table so as to execute the corresponding step. For example, if the system judges from the sentence scoring table that the lip language processing data is not clear, that is, the meaning of the user's voice content to be recognized is ambiguous, the smart watch displays the preset prompt "please repeat the voice content to be read" on the watch display screen to remind the user to repeat the content whose meaning was unclear, and the system performs reading recognition again. If the execution module 50 determines from the sentence scoring table that the lip language processing data is clear, that is, the voice content to be recognized has a corresponding meaning, the smart watch translates that meaning, translates the lip language processing data based on a preset mapping relation, generates one or more language texts corresponding to the translated text (for example, for a Chinese translation, language text branches such as Mandarin, Cantonese (Baihua) and Hakka; for a Spanish translation, branches such as Catalan, Basque and Galician), and finally presents these language texts on the display screen of the smart watch.
In this embodiment, the acquisition module further includes:
the capturing unit is used for capturing the current scannable width and the scanning distance of the user;
the judging unit is used for judging whether the scannable width and the scanning distance are in a preset scanning range or not;
the execution unit is used for, if the scannable width and the scanning distance are within the preset scanning range, acquiring the face information in the scanning range, comparing the face information with a preset scanning template in a difference way, generating an identifiable region of the face information, acquiring identity data corresponding to the face information according to the identifiable region, and acquiring lip fluctuation information in the identifiable region; wherein the identifiable region includes an ocular feature, a nasal feature, a lip feature, a brow feature, and an ear feature;
and the recording unit is used for recording the starting time and the ending time of the lip fluctuation information if the lip fluctuation information exists in the face information.
In this embodiment, the intelligent system captures the width of the user and the scanning distance in the scanner frame against the set maximum scannable width and maximum scanning distance of the scanner, and judges whether they fall within the preset maximum width and maximum scanning distance so as to execute the corresponding step. For example, if the width of the user and the scanning distance in the scanner frame are not within the preset maximum width and maximum scanning distance, the smart watch does not meet the scanning condition, the user cannot be confirmed to be in the scannable range, and the scanning function is not started to scan the user. If the width of the user and the scanning distance are within the preset maximum width and maximum scanning distance, the smart watch collects the face information in the scanning range, performs a difference comparison between the face information and the preset scanning template to generate the identifiable area of the face information (namely the data of the five sense organs), obtains the identity data corresponding to the user according to the face information data of the identifiable area, collects the lip fluctuation information of the user, and records the start time and the end time of the lip fluctuation information.
It should be noted that the reason for the difference comparison between the face information and the scanning template is as follows: if the user suddenly leaves the range of the scanner during scanning, the comparison prevents the scanner from erroneously treating other information with a similar temperature as the face information.
In this embodiment, the prediction module further includes:
the acquisition unit is used for acquiring an initial training sample set, wherein training samples in the initial training sample set comprise image data of the position information and corresponding labels of the image data;
the generating unit is used for determining a clustering algorithm corresponding to the initial training sample set, taking the corresponding labels as cluster numbers, and clustering the image data by utilizing the clustering algorithm to generate a clustering result, wherein the clustering result comprises the cluster label assigned to each piece of image data, each cluster comprises a plurality of image data samples, and the clustering algorithm is selected from K-means clustering and spectral clustering;
the comparison unit is used for determining image data whose assigned cluster label in the clustering result is inconsistent with its corresponding annotation label as an abnormal training sample;
the training unit is used for deleting the abnormal training sample from the initial training sample set to obtain a cleaning training sample set, and inputting the cleaning training sample set into the training model for training.
In this embodiment, the prediction module further includes:
the construction unit is used for constructing each reference axis taking the identification area as a reference coordinate system and taking the position information as each coordinate of an independent point based on the dynamic scene recorded in the content to be identified;
the extraction unit is used for extracting each reference axis and each coordinate as an image to be converted, mapping the image to be converted into a preset space template according to a preset conversion proportion, generating a converted image corresponding to the image to be converted, and acquiring a dynamic conversion track of the converted image by applying a preset frame rate based on view angle information preset by the space template and the dynamic scene.
In this embodiment, the intelligent system constructs three reference axes X, Y and Z with the area to be recognized (i.e., the mouth area) as the reference coordinate system, based on the dynamic scene data of the user recorded in the voice content to be recognized, and takes the four coordinates of the upper lip, the lower lip, the lip angle and the lip valley of the part information as independent points. A reference plane belonging to the mouth area is established from the reference axes and the coordinates; by extracting the three reference axes and the four coordinates in the reference plane, the reference plane is taken as the image to be converted and mapped into a preset space template according to a preset conversion proportion, generating a converted image that has the same content as the image to be converted but a different proportional size. The system then applies a preset frame rate to acquire the dynamic transformation track of the converted image based on the preset viewing angle information of the space template and the dynamic scene data, and finally obtains, through the dynamic transformation track, the data from the start of the user's lip transformation to the end of the transformation.
In this embodiment, the capturing module further includes:
a second extraction unit configured to extract period data of at least one or more predetermined time points in the voice data;
the encoding unit is used for encoding the period data by applying a preset encoder, converting the period data into voice feature vectors corresponding to each preset time point, and generating voice features corresponding to the voice data based on the voice feature vectors, wherein the encoder characterizes the voice data as vectors by applying an embedding layer to each item in the period data set.
In this embodiment, the system extracts the period data of one or more predetermined time points in the predicted voice data and uses the embedding layer of the encoder model to convert the voice data of each predetermined time point into a voice input vector, so as to obtain a sequence of voice input vectors that facilitates the subsequent encoding. The sequence of voice input vectors is then passed through the converter of the encoder model to convert the voice data of each predetermined time point into the voice feature vector corresponding to that time point. It should be understood that, because the converter-based encoder model encodes the voice input vectors based on context, the obtained voice feature vectors capture the associated voice information of the plurality of predetermined time points globally, so as to generate the voice features corresponding to the voice data.
In this embodiment, the judging module further includes:
the conversion unit is used for converting the lip language processing data into text information in a preset format based on the language type;
the reading unit is used for inputting the text information into a preset reading model for reading, and generating sentence scores corresponding to the text information based on the text meanings of the text information;
the second judging unit is used for judging whether the sentence score is larger than a score standard preset in the sentence score table or not;
and the second execution unit is used for judging that the lip language processing data has corresponding meanings if yes.
In this embodiment, the intelligent system converts the lip language processing data into text information in a preset format based on the language type, inputs the text information into a preset reading model for reading to obtain the text meaning corresponding to the text information, generates a sentence score corresponding to the text information based on that meaning, and judges whether the sentence score is greater than the score benchmark preset in the sentence scoring table so as to execute the corresponding step. For example, if the sentence score generated by the system is 70 points and the preset score benchmark in the scoring table is 60 points, the system determines that the lip language processing data has a corresponding text meaning and displays that meaning. If the sentence score is 55 points and the preset benchmark is 60 points, the system determines that the text meaning of the lip language processing data is ambiguous and does not display it.
In this embodiment, further comprising:
the second capturing module is used for capturing decibel data in the environment;
the second judging module is used for judging whether the decibel data is larger than a preset reading threshold value or not;
and the second execution module is used for, if the decibel data is greater than the preset reading threshold, stopping capturing the decibel data in the environment, starting a preset scanner, and acquiring the image data in the preset range based on the temperature information in the preset range.
In this embodiment, the intelligent system captures the decibel data in the current environment and judges whether it is greater than the preset readable threshold, so as to execute the corresponding step. For example, if the system captures 70 dB in the current environment and the preset readable threshold is 120 dB, the decibel data is not greater than the threshold, so the system does not enable the lip-reading function but still applies the scanner to scan the user while attempting to read the voice data input by the user. If the system captures 150 dB and the preset readable threshold is 120 dB, the decibel data is greater than the threshold, so the system stops capturing the decibel data in the environment, starts the preset scanner function to scan the temperature information in the preset range, and collects the image data in that range to read the face information in the image data.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. The data processing method based on the intelligent watch identification lip language is characterized by comprising the following steps of:
acquiring facial image data of a user based on a preset scanner, generating an identification area to be captured according to the facial image data, dynamically identifying the identification area by using the scanner, and acquiring the content to be identified of the identification area, wherein the identification area is the mouth area of the user, and the content to be identified is specifically the voice content to be identified that is obtained by scanning the mouth area of the user;
inputting the position information in the identification area into a training model for training to obtain a trained prediction model, analyzing the content to be identified to obtain a transformation track of the identification area, carrying out data synthesis on the transformation track based on dynamic transformation to generate lip language data of the user, and inputting the lip language data into the prediction model for prediction to obtain voice data of the user, wherein the transformation track is specifically the transformation track of the user's mouth action during speaking, the position information comprises the upper lip, the lower lip, the lip corners and the lip valleys, the position information can be used as training samples of the training model, the training model is specifically a blank training model, and the prediction model can predict the lip language data corresponding to the content to be identified to obtain the voice data corresponding to the lip language data;
extracting the voice data by using a preset database, and capturing language features appearing in the voice data, wherein the language features comprise sound production, languages and language branch dialects;
analyzing the lip language data based on the voice data and the language features, generating lip language processing data of the user, and judging whether the lip language processing data is clear or not according to a preset sentence scoring table;
if yes, carrying out text translation on the lip language processing data: translating the lip language processing data based on a preset mapping relation, generating at least one or more language texts corresponding to the lip language processing data, and displaying the language texts on a preset display screen, wherein the language texts are specifically language type texts arranged based on preset priorities.
2. The method for processing data based on the smart watch lip language according to claim 1, wherein the step of acquiring the face image data of the user based on the preset scanner, generating the identification area to be captured according to the face image data, dynamically identifying the identification area by using the scanner, and acquiring the content to be identified of the identification area comprises the following steps:
capturing the current scannable width and the scanning distance of the user;
judging whether the scannable width and the scanning distance are within a preset scanning range or not;
if yes, acquiring the face information in the scanning range, performing a difference comparison between the face information and a preset scanning template, generating an identifiable region of the face information, acquiring identity data corresponding to the face information according to the identifiable region, and acquiring lip fluctuation information in the identifiable region; wherein the identifiable region includes an ocular feature, a nasal feature, a lip feature, a brow feature, and an ear feature;
if the lip fluctuation information exists in the face information, recording the starting time and the ending time of the lip fluctuation information.
3. The data processing method based on the intelligent watch identification lip language according to claim 1, wherein the step of inputting the position information in the identification area into a training model for training to obtain a trained prediction model comprises the following steps:
acquiring an initial training sample set, wherein training samples in the initial training sample set comprise image data of the position information and corresponding labels of the image data;
determining a clustering algorithm corresponding to the initial training sample set, taking the corresponding labels as the cluster numbers, and clustering the image data by using the clustering algorithm to generate a clustering result, wherein the clustering result comprises the corresponding label of each piece of image data, each cluster comprises a plurality of image data samples, and the clustering algorithm is selected from K-means or spectral clustering;
determining the image data whose label in the clustering result is inconsistent with its corresponding label as abnormal training samples;
deleting the abnormal training sample from the initial training sample set to obtain a cleaning training sample set, and inputting the cleaning training sample set into the training model for training.
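For illustration, a minimal sketch of this cleaning step follows. It assumes the position-information images have been flattened into numeric feature vectors and that the corresponding labels are integer class ids; K-means is one of the clustering choices named above, while the majority-label rule used to flag abnormal samples is an assumption rather than claim language.

import numpy as np
from sklearn.cluster import KMeans

def clean_training_set(features: np.ndarray, labels: np.ndarray):
    # Use the number of distinct corresponding labels as the cluster count.
    n_clusters = len(np.unique(labels))
    clusters = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    keep = np.ones(len(labels), dtype=bool)
    for c in range(n_clusters):
        members = clusters == c
        majority = np.bincount(labels[members]).argmax()   # cluster's dominant label
        keep[members] &= labels[members] == majority       # mismatches are abnormal samples
    return features[keep], labels[keep]                    # the cleaned training sample set

# Usage with random stand-in data: 100 samples, 8 features, 4 label classes.
X = np.random.rand(100, 8)
y = np.random.randint(0, 4, size=100)
X_clean, y_clean = clean_training_set(X, y)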
4. The method for processing data based on smart watch lip language identification according to claim 1, wherein the step of analyzing the content to be identified to obtain a transformation track of the identification area and synthesizing the transformation track based on dynamic transformation comprises:
based on the dynamic scene recorded in the content to be identified, constructing reference axes that take the identification area as the reference coordinate system, and taking the position information as the coordinates of independent points;
and extracting each reference axis and each coordinate to serve as an image to be converted, mapping the image to be converted into a preset space template according to a preset conversion proportion, generating a converted image corresponding to the image to be converted, and acquiring a dynamic conversion track of the converted image by applying a preset frame rate based on the view angle information preset by the space template and the dynamic scene.
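For illustration only, a minimal sketch of the coordinate mapping in this step is given below. It assumes two-dimensional landmark coordinates for the position information, an assumed conversion proportion, and simple per-frame sampling of the transformation track; the preset view angle and frame-rate handling are omitted.

import numpy as np

TEMPLATE_SCALE = 0.5    # preset conversion proportion (assumed value)

def map_to_template(landmarks: np.ndarray, area_origin: np.ndarray) -> np.ndarray:
    # Express the landmarks in the identification-area reference frame,
    # then scale them into the preset space template.
    return (landmarks - area_origin) * TEMPLATE_SCALE

def transformation_track(frames, area_origin: np.ndarray) -> np.ndarray:
    # Mapping the landmarks of every captured frame and stacking the results
    # yields the dynamic transformation track of the converted images.
    return np.stack([map_to_template(f, area_origin) for f in frames])

# Usage: two frames of four lip landmarks (upper lip, lower lip, two lip corners).
origin = np.array([120.0, 80.0])
frames = [origin + np.random.rand(4, 2) * 40 for _ in range(2)]
track = transformation_track(frames, origin)    # shape (2, 4, 2)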
5. The smart watch-based lip-language recognition data processing method according to claim 1, wherein the step of extracting the voice data by using a preset database and capturing language features appearing in the voice data comprises:
extracting period data of at least one or more preset time points in the voice data;
and coding the period data by using a preset encoder, converting the period data into voice feature vectors corresponding to each preset time point, and generating voice features corresponding to the voice data based on the voice feature vectors, wherein the encoder specifically represents the voice data of each preset time point as a vector through an embedded layer and encodes each vector in the period data set.
6. The method for processing data based on intelligent watch lip language recognition according to claim 1, wherein the step of analyzing the lip language data based on the voice data and the language features to generate the lip language processing data of the user and judging whether the lip language processing data is clear according to a preset sentence scoring table comprises the steps of:
converting the lip language processing data into text information in a preset format based on the language type;
inputting the text information into a preset reading model for reading, and generating sentence scores corresponding to the text information based on the text meanings of the text information;
judging whether the sentence score is larger than a score benchmark preset in the sentence score table;
if yes, judging that the lip language processing data has corresponding meaning.
7. The method for processing the data based on the smart watch lip language according to claim 1, wherein the step of acquiring the face image data of the user based on the preset scanner, generating the identification area to be captured according to the face image data, dynamically identifying the identification area by using the scanner, and acquiring the content to be identified of the identification area comprises the following steps:
capturing decibel data in an environment;
judging whether the decibel data is larger than a preset reading threshold value or not;
if yes, stopping capturing the decibel data in the environment, starting a preset scanner, and acquiring the image data in the preset range based on the temperature information in the preset range.
8. The data processing system based on the intelligent watch identification lip language is characterized by comprising:
the acquisition module is used for acquiring facial image data of a user based on a preset scanner, generating an identification area to be captured according to the facial image data, dynamically identifying the identification area by using the scanner, and acquiring the content to be identified of the identification area, wherein the identification area is the mouth area of the user, and the content to be identified is specifically the voice content to be identified that is obtained by scanning the mouth area of the user;
the prediction module is used for inputting the position information in the identification area into a training model for training to obtain a trained prediction model, analyzing the content to be identified to obtain a transformation track of the identification area, carrying out data synthesis on the transformation track based on dynamic transformation to generate lip language data of the user, inputting the lip language data into the prediction model for prediction to obtain voice data of the user, wherein the transformation track is specifically a transformation track of mouth action when the user speaks, the position information comprises an upper lip, a lower lip, lip angles and lip valleys, the position information can be used as a training sample of the training model, the training model is specifically a blank training model, and the prediction model can predict the lip language data corresponding to the content to be identified to obtain voice data corresponding to the lip language data;
the capturing module is used for extracting the voice data by applying a preset database and capturing language features appearing in the voice data, wherein the language features comprise sound production, languages and language branch dialects;
the judging module is used for analyzing the lip language data based on the voice data and the language features, generating lip language processing data of the user, and judging whether the lip language processing data is clear or not according to a preset sentence scoring table;
and the execution module is used for, if so, carrying out text translation on the lip language processing data, translating the lip language processing data based on a preset mapping relation, generating at least one or more language texts corresponding to the lip language processing data, and displaying the language texts on a preset display screen, wherein the language texts are specifically language type texts arranged based on a preset priority.
9. The smart watch-based lip language identification data processing system of claim 8, wherein the acquisition module further comprises:
the capturing unit is used for capturing the current scannable width and the scanning distance of the user;
the judging unit is used for judging whether the scannable width and the scanning distance are in a preset scanning range or not;
the execution unit is used for, if so, acquiring the face information in the scanning range, performing a difference comparison between the face information and a preset scanning template, generating an identifiable region of the face information, acquiring identity data corresponding to the face information according to the identifiable region, and acquiring lip fluctuation information in the identifiable region; wherein the identifiable region includes an ocular feature, a nasal feature, a lip feature, a brow feature, and an ear feature;
and the recording unit is used for recording the starting time and the ending time of the lip fluctuation information if the lip fluctuation information exists in the face information.
10. The smart watch-based lip-recognition data processing system of claim 8, wherein the prediction module further comprises:
the acquisition unit is used for acquiring an initial training sample set, wherein training samples in the initial training sample set comprise image data of the position information and corresponding labels of the image data;
the generating unit is used for determining a clustering algorithm corresponding to the initial training sample set, taking the corresponding labels as the cluster numbers, and clustering the image data by utilizing the clustering algorithm to generate a clustering result, wherein the clustering result comprises the corresponding label of each piece of image data, each cluster comprises a plurality of image data samples, and the clustering algorithm is selected from K-means or spectral clustering;
the comparison unit is used for determining the image data whose label in the clustering result is inconsistent with its corresponding label as abnormal training samples;
the training unit is used for deleting the abnormal training sample from the initial training sample set to obtain a cleaning training sample set, and inputting the cleaning training sample set into the training model for training.
CN202310425186.4A 2023-04-20 2023-04-20 Data processing method and system based on intelligent watch identification lip language Active CN116189271B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310425186.4A CN116189271B (en) 2023-04-20 2023-04-20 Data processing method and system based on intelligent watch identification lip language

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310425186.4A CN116189271B (en) 2023-04-20 2023-04-20 Data processing method and system based on intelligent watch identification lip language

Publications (2)

Publication Number Publication Date
CN116189271A CN116189271A (en) 2023-05-30
CN116189271B true CN116189271B (en) 2023-07-14

Family

ID=86438675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310425186.4A Active CN116189271B (en) 2023-04-20 2023-04-20 Data processing method and system based on intelligent watch identification lip language

Country Status (1)

Country Link
CN (1) CN116189271B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020253051A1 (en) * 2019-06-18 2020-12-24 平安科技(深圳)有限公司 Lip language recognition method and apparatus

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2012059017A (en) * 2010-09-09 2012-03-22 Kyushu Institute Of Technology Word-spotting lip reading device and method
CN105825167A (en) * 2016-01-29 2016-08-03 维沃移动通信有限公司 Method for enhancing lip language recognition rate and mobile terminal
CN112784696B (en) * 2020-12-31 2024-05-10 平安科技(深圳)有限公司 Lip language identification method, device, equipment and storage medium based on image identification
CN114581812B (en) * 2022-01-12 2023-03-21 北京云辰信通科技有限公司 Visual language identification method and device, electronic equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020253051A1 (en) * 2019-06-18 2020-12-24 平安科技(深圳)有限公司 Lip language recognition method and apparatus

Also Published As

Publication number Publication date
CN116189271A (en) 2023-05-30

Similar Documents

Publication Publication Date Title
CN110751208B (en) Criminal emotion recognition method for multi-mode feature fusion based on self-weight differential encoder
CN110909613B (en) Video character recognition method and device, storage medium and electronic equipment
US10878824B2 (en) Speech-to-text generation using video-speech matching from a primary speaker
CN104361276B (en) A kind of multi-modal biological characteristic identity identifying method and system
CN109714608B (en) Video data processing method, video data processing device, computer equipment and storage medium
CN112507311A (en) High-security identity verification method based on multi-mode feature fusion
CN113327621A (en) Model training method, user identification method, system, device and medium
CN112768070A (en) Mental health evaluation method and system based on dialogue communication
CN111554279A (en) Multi-mode man-machine interaction system based on Kinect
CN111402892A (en) Conference recording template generation method based on voice recognition
CN116246610A (en) Conference record generation method and system based on multi-mode identification
CN115910066A (en) Intelligent dispatching command and operation system for regional power distribution network
CN115083392A (en) Method, device, equipment and storage medium for acquiring customer service coping strategy
CN112466287B (en) Voice segmentation method, device and computer readable storage medium
CN114239610A (en) Multi-language speech recognition and translation method and related system
CN116189271B (en) Data processing method and system based on intelligent watch identification lip language
CN113393841B (en) Training method, device, equipment and storage medium of voice recognition model
CN109817223A (en) Phoneme marking method and device based on audio fingerprints
CN113837907A (en) Man-machine interaction system and method for English teaching
CN112734604A (en) Device for providing multi-mode intelligent case report and record generation method thereof
CN111243597A (en) Chinese-English mixed speech recognition method
CN114283493A (en) Artificial intelligence-based identification system
CN114171000A (en) Audio recognition method based on acoustic model and language model
CN114078470A (en) Model processing method and device, and voice recognition method and device
CN111276146A (en) Teaching training system based on voice recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant