CN117219078A - Ornament voice recognition method and system based on artificial intelligence


Info

Publication number: CN117219078A
Application number: CN202311362890.6A
Authority: CN (China)
Prior art keywords: control, input information, voice input, control task, ornament
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventor: 陈建明
Current assignee: Shenzhen Hongtai Intelligent Creative Electronic Technology Co., Ltd. (the listed assignee may be inaccurate)
Original assignee: Shenzhen Hongtai Intelligent Creative Electronic Technology Co., Ltd.
Application filed by Shenzhen Hongtai Intelligent Creative Electronic Technology Co., Ltd.
Priority to CN202311362890.6A
Publication of CN117219078A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to the artificial-intelligence-based ornament voice recognition method and system provided by the embodiments of the invention, the algorithm architecture of the control task event recognition algorithm is improved so that the algorithm focuses only on the voice vectors related to the target control task event in the to-be-recognized ornament control voice input information. This reduces the processing overhead of the control task event recognition algorithm and thereby improves the timeliness and accuracy of generating an ornament control execution strategy with the algorithm.

Description

Ornament voice recognition method and system based on artificial intelligence
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an artificial intelligence-based ornament voice recognition method and system.
Background
An electric sound-controlled ornament is an ornament that is driven by electric power and can automatically swing according to music rhythm or sound changes. It is usually composed of one or more oscillating parts combined with sensors and audio equipment, so that it can sense sounds in the environment and respond accordingly. When it detects sound, it oscillates at different speeds, amplitudes or directions, creating a dynamic visual effect. These ornaments may take a variety of shapes and designs, such as stage performers, animals, musical instruments, and the like. Electric sound-controlled ornaments are often used in home decoration, entertainment or music activities to add interest and vitality to the environment. Most traditional electric sound-controlled ornaments can only act on simple voice instructions, and as users' ornament control requirements become more complex, it is difficult for the prior art to achieve accurate and timely ornament control from relatively complex voice instructions.
Disclosure of Invention
In order to address the technical problems in the related art, the invention provides an artificial intelligence-based ornament voice recognition method and system.
In a first aspect, an embodiment of the present invention provides an artificial intelligence-based ornament voice recognition method, applied to an AI speech recognition processing system, the method comprising:
acquiring control voice input information of the to-be-identified ornament; loading the to-be-identified ornament control voice input information to a control task event identification algorithm, and carrying out voice description vector mining on the to-be-identified ornament control voice input information through a first voice description vector mining branch set of the control task event identification algorithm to obtain control task event demand description data, wherein the first voice description vector mining branch set comprises a synchronous control task event demand description mining component and a first feature aggregation component connected with each synchronous control task event demand description mining component;
loading the control task event demand description data into a second voice description vector mining branch set according to the control task event recognition algorithm to obtain control task relative distribution description data corresponding to a target control task event in the to-be-recognized ornament control voice input information, wherein the second voice description vector mining branch set comprises a synchronous control task relative distribution description mining component and a second feature aggregation component connected with each synchronous control task relative distribution description mining component;
and generating an ornament control execution strategy corresponding to the target control task event based on the control task relative distribution description data.
In some optional solutions, performing voice description vector mining on the to-be-recognized ornament control voice input information through the first voice description vector mining branch set of the control task event recognition algorithm to obtain the control task event requirement description data includes:
inputting the input information of the to-be-recognized ornament control voice into a first deep learning component, and extracting to obtain a first ornament control voice semantic vector;
loading the first ornament control voice semantic vector into a synchronized first moving average component and first feature downsampling component, wherein the first deep learning component, the first moving average component, and the first feature downsampling component comprise a semantic refinement process;
generating an ornament control voice moving average vector through the first moving average component, and generating local structure control logic features through the first feature downsampling component;
and carrying out AI knowledge fusion on the ornament control voice moving average vector and the local structure control logic features through the first feature aggregation component to obtain the control task event demand description data.
In some alternative solutions, loading the control task event requirement description data into the second voice description vector mining branch set according to the control task event recognition algorithm to obtain the control task relative distribution description data corresponding to the target control task event in the to-be-recognized ornament control voice input information includes:
inputting the control task event demand description data into a second deep learning component, and extracting to obtain a second ornament control voice semantic vector;
loading the second ornament control voice semantic vector into a synchronized second moving average component, third moving average component and second feature downsampling component, wherein the second moving average component and the third moving average component differ in the scale of their voice description vector mining branches, and the second deep learning component, the second moving average component, the third moving average component and the second feature downsampling component comprise a semantic refinement process;
generating a first relative distribution moving average vector by the second moving average component, a second relative distribution moving average vector by the third moving average component, and a relative distribution upstream and downstream feature by the second feature downsampling component;
And carrying out AI knowledge fusion on the first relative distribution moving average vector, the second relative distribution moving average vector and the relative distribution upstream and downstream features through the second feature aggregation component to obtain the control task relative distribution description data.
In some optional solutions, generating, based on the control task relative distribution description data, an ornament control execution strategy corresponding to the target control task event includes:
determining an event state of a target control task event in the to-be-identified ornament control voice input information based on the control task relative distribution description data;
determining the number of to-be-controlled parts of the target control task event based on the event state of the target control task event;
determining key part distribution characteristics of the target control task event based on the control task relative distribution description data;
and determining an ornament control execution strategy and an ornament control expected trajectory of the target control task event based on the number of parts to be controlled and the key part distribution characteristics.
In some alternatives, the debugging method of the control task event recognition algorithm includes:
acquiring an ornament control voice input information sample set, wherein the ornament control voice input information samples in the sample set comprise ornament control execution strategy samples corresponding to control task event samples;
loading the ornament control voice input information samples in the ornament control voice input information sample set into a to-be-debugged control task event recognition algorithm, wherein the to-be-debugged control task event recognition algorithm performs voice description vector mining on the ornament control voice input information samples through a first voice description vector mining branch set to obtain debugging control task event demand description data, and the first voice description vector mining branch set comprises a synchronous control task event demand description mining component and a first feature aggregation component connected with each synchronous control task event demand description mining component;
loading the debugging control task event demand description data into a second voice description vector mining branch set through the to-be-debugged control task event recognition algorithm to obtain debugging control task relative distribution description data corresponding to the control task event samples in the ornament control voice input information samples, wherein the second voice description vector mining branch set comprises a synchronous control task relative distribution description mining component and a second feature aggregation component connected with each synchronous control task relative distribution description mining component;
and optimizing algorithm variables of the to-be-debugged control task event recognition algorithm based on the debugging control task relative distribution description data and the ornament control execution strategy samples until the debugging control task relative distribution description data generated by the to-be-debugged control task event recognition algorithm meets the steady state requirement, so as to obtain the debugged control task event recognition algorithm.
In some alternatives, the method further comprises:
acquiring a priori debugging learning basis corresponding to an ornament control execution strategy sample corresponding to the control task event sample in the ornament control voice input information sample;
determining a general variable corresponding to a control task event recognition algorithm based on the prior debugging learning basis;
and adjusting algorithm variables of the control task event recognition algorithm based on the universal variables to obtain the control task event recognition algorithm to be debugged.
In some alternatives, the acquiring an ornament control voice input information sample set, wherein the ornament control voice input information samples in the ornament control voice input information sample set comprise ornament control execution strategy samples corresponding to control task event samples, includes:
Acquiring a past ornament control voice input information set, wherein the past ornament control voice input information in the past ornament control voice input information set comprises corresponding past ornament control execution strategies;
acquiring a setting selection instruction, wherein the setting selection instruction comprises the correlation characteristics of past control task events and past voice input information processing instructions of different numbers of parts to be controlled;
based on the number of the control parts of the past ornament control execution strategy in the past ornament control voice input information and the related characteristics, voice input information processing conditions corresponding to each past ornament control voice input information are obtained, the corresponding past ornament control voice input information is subjected to past ornament control voice input information processing according to the voice input information processing conditions to obtain an ornament control voice input information sample, and the ornament control voice input information sample comprises an ornament control execution strategy sample corresponding to a control task event sample.
In some optional solutions, the setting selection instruction includes a first layer of past voice input information processing instruction, a second layer of past voice input information processing instruction, a third layer of past voice input information processing instruction and a fourth layer of past voice input information processing instruction, the number of control parts of a past ornament control execution policy in the past ornament control voice input information and the associated features are based on to obtain a voice input information processing condition corresponding to each past ornament control voice input information, and the past ornament control voice input information processing is performed on the corresponding past ornament control voice input information according to the voice input information processing condition to obtain an ornament control voice input information sample, where the ornament control voice input information sample includes an ornament control execution policy sample corresponding to a control task event sample, including:
Removing the past ornament control voice input information when the number of the control parts corresponding to the past ornament control execution strategy in the past ornament control voice input information is smaller than the number of first to-be-controlled parts corresponding to the first-layer past voice input information processing instruction and/or when the number of the control parts corresponding to the past ornament control execution strategy in the past ornament control voice input information is larger than the number of fourth to-be-controlled parts corresponding to the fourth-layer past voice input information processing instruction;
when the past ornament control execution strategy in the past ornament control voice input information is matched with the number of the second to-be-controlled parts corresponding to the second layer of past ornament control voice input information processing instruction, a set reconstruction index is obtained, voice fine granularity expansion is carried out on part of the past ornament control voice input information based on the set reconstruction index, the past ornament control voice input information after the voice fine granularity expansion is obtained, and the past ornament control voice input information after the voice fine granularity expansion forms the ornament control voice input information sample set;
and when the past ornament control execution strategy in the past ornament control voice input information is matched with the number of the third to-be-controlled parts corresponding to the third-layer past voice input information processing instruction, forming the past ornament control voice input information into the ornament control voice input information sample set.
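The layered screening rules above can be summarised in a short sketch. The following Python fragment is an illustrative reduction only: the threshold values, the record fields and the way the set reconstruction index drives the fine-granularity expansion are assumptions, not values taken from the embodiment.

```python
# Illustrative sketch of the layered sample-screening rules described above.
# Threshold values and field names are hypothetical, not taken from the patent.
from dataclasses import dataclass

@dataclass
class PastRecord:
    voice_input: str          # past ornament control voice input information
    execution_strategy: dict  # past ornament control execution strategy
    controlled_parts: int     # number of parts the strategy controls

FIRST_LAYER_MIN = 1   # below this: discard (first-layer instruction)
SECOND_LAYER = 2      # match: expand with finer-grained variants (second layer)
THIRD_LAYER = 3       # match: keep as-is (third layer)
FOURTH_LAYER_MAX = 6  # above this: discard (fourth-layer instruction)

def expand_fine_grained(record: PastRecord, reconstruction_index: int) -> list[PastRecord]:
    """Placeholder for 'voice fine granularity expansion': duplicate the record
    reconstruction_index times; a real system would generate perturbed variants."""
    return [record] * reconstruction_index

def build_sample_set(history: list[PastRecord], reconstruction_index: int = 2) -> list[PastRecord]:
    samples: list[PastRecord] = []
    for rec in history:
        if rec.controlled_parts < FIRST_LAYER_MIN or rec.controlled_parts > FOURTH_LAYER_MAX:
            continue  # remove records outside the first/fourth layer bounds
        if rec.controlled_parts == SECOND_LAYER:
            samples.extend(expand_fine_grained(rec, reconstruction_index))
        else:
            samples.append(rec)  # third-layer matches (and other in-range counts) kept as-is
    return samples
```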
In a second aspect, the present invention also provides an AI speech recognition processing system, including a processor and a memory; the processor is in communication with the memory, and the processor is configured to read and execute a computer program from the memory to implement the method described above.
In a third aspect, the present invention also provides a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the method described above.
According to the method and the system for recognizing the ornament voice based on the artificial intelligence, the ornament control voice input information to be recognized is loaded into the debugged control task event recognition algorithm, the control task event recognition algorithm obtains control task event demand description data through the first voice description vector mining branch set, the control task event recognition algorithm loads the control task event demand description data into the second voice description vector mining branch set, the second voice description vector mining branch set determines a target control task event in the ornament control voice input information to be recognized according to the control task event demand description data, then the control task relative distribution description data corresponding to the target control task event is obtained through extraction, and finally an ornament control execution strategy corresponding to the target control task event is generated according to the control task relative distribution description data. The first voice description vector mining branch set comprises a synchronous control task event demand description mining component and a first feature aggregation component connected with each synchronous control task event demand description mining component, and the second voice description vector mining branch set comprises a synchronous control task relative distribution description mining component and a second feature aggregation component connected with each synchronous control task relative distribution description mining component. By improving the algorithm architecture of the control task event recognition algorithm, the control task event recognition algorithm only focuses on the voice vector related to the target control task event in the input information of the to-be-recognized ornament control voice, and the processing overhead of the control task event recognition algorithm is reduced, so that the timeliness and the accuracy of generating an ornament control execution strategy by combining the control task event recognition algorithm are improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flow chart of an artificial intelligence-based ornament voice recognition method according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention.
It should be noted that the terms "first," "second," and the like in the description of the present invention and the above-described drawings are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided by the embodiments of the present invention may be performed in an AI speech recognition processing system, a computer device, or a similar computing device. Taking the example of running on an AI speech recognition processing system, the AI speech recognition processing system may comprise one or more processors (which may include, but is not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA) and a memory for storing data, and optionally the AI speech recognition processing system may further comprise transmission means for communication functions. It will be appreciated by those of ordinary skill in the art that the above-described architecture is merely illustrative and is not intended to limit the architecture of the AI speech recognition processing system described above. For example, the AI speech recognition processing system can also include more or fewer components than those shown above, or have a different configuration than those shown above.
The memory may be used to store a computer program, for example, a software program of application software and a module, for example, a computer program corresponding to an artificial intelligence-based method for recognizing a voice of an ornament according to an embodiment of the present invention, and the processor executes the computer program stored in the memory, thereby performing various functional applications and data processing, that is, implementing the method. The memory may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid state memory. In some examples, the memory may further include memory remotely located with respect to the processor, which may be connected to the AI speech recognition processing system via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of an AI speech recognition processing system. In one example, the transmission means comprises a network adapter (Network Interface Controller, simply referred to as NIC) that can be connected to other network devices via a base station to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
Referring to fig. 1, fig. 1 is a schematic flow chart of an artificial intelligence-based ornament voice recognition method according to an embodiment of the invention; the method is applied to an AI speech recognition processing system and includes steps 110-140.
Step 110, obtaining the control voice input information of the to-be-recognized ornament.
The to-be-recognized ornament control voice input information is used for controlling an electric ornament, and the electric ornament can be a Christmas ornament or the like.
Step 120, loading the to-be-identified ornament control voice input information into a control task event recognition algorithm, and performing voice description vector mining on the to-be-identified ornament control voice input information through a first voice description vector mining branch set of the control task event recognition algorithm to obtain control task event demand description data.
The control task event recognition algorithm may be a speech recognition model, and the first speech description vector mining branch set includes a synchronous control task event demand description mining component and a first feature aggregation component connected with each synchronous control task event demand description mining component.
Step 130, loading the control task event demand description data into a second voice description vector mining branch set according to the control task event recognition algorithm to obtain control task relative distribution description data corresponding to a target control task event in the to-be-recognized ornament control voice input information.
The second voice description vector mining branch set comprises a synchronous control task relative distribution description mining component and a second characteristic aggregation component connected with each synchronous control task relative distribution description mining component. Further, the control task relative distribution description data is used for representing the priority, the sequence and the like of the target control task event in the to-be-identified ornament control voice input information.
Step 140, generating an ornament control execution strategy corresponding to the target control task event based on the control task relative distribution description data.
The control execution strategy of the ornament can be packaged into a control instruction which is sent to the corresponding controller of the Christmas ornament, so that the control of the Christmas ornament is realized.
Steps 110-140 are described by way of a complete example.
In step 110, the user provides the following instruction via voice input: "Christmas ornament, rotate the sphere part 30 degrees, play the Jingle Bells music at the same time, and let the snowflake module fall in fast mode."
In step 120, the voice instructions of the user are processed using the voice recognition model. And converting the to-be-recognized ornament control voice input information into a voice description vector mining result through the synchronous control task event demand description mining component and the first characteristic aggregation component.
In step 130, control task event demand description data is loaded into the second set of speech description vector mining branches based on the output of the control task event recognition algorithm. This branch set includes a synchronized control task relative distribution description mining component and a second feature aggregation component.
In step 140, an ornament control execution strategy for the target control task event is generated based on the control task relative distribution description data.
The ornament control execution strategy may include the following: (1) for the sphere part rotation: generating a sphere rotation control instruction according to the angle information in the control task relative distribution description data, rotating the sphere part by 30 degrees; (2) for music playback: generating a music playing control instruction according to the music selection information in the control task relative distribution description data, selecting the Jingle Bells track; (3) for the snowflake module falling: generating a snowflake module falling control instruction according to the speed information in the control task relative distribution description data, setting it to fast mode.
Finally, the generated ornament control instructions are packaged into a control instruction sequence and issued to the corresponding Christmas ornament controller. The Christmas ornament then executes the corresponding actions according to the instructions, including rotating the sphere part, playing the Jingle Bells music and letting the snowflake module fall in fast mode. By handling such compound examples of ornament control transformation, the user obtains a more interactive and personalized ornament control experience.
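As an illustration of this last packaging step, the fragment below shows one plausible way to serialise the three parsed control tasks of the example into a control instruction sequence. The field names, the JSON layout and the transport are assumptions; the embodiment does not prescribe an instruction format.

```python
# Minimal sketch of packaging the three parsed control tasks from the example
# into a control instruction sequence; field names and transport are assumptions.
import json

execution_strategy = [
    {"target": "sphere", "action": "rotate", "angle_deg": 30},
    {"target": "audio", "action": "play", "track": "Jingle Bells"},
    {"target": "snowflake_module", "action": "fall", "mode": "fast"},
]

def package_instructions(strategy: list) -> bytes:
    """Serialize the ornament control execution strategy into a byte payload
    understood by a (hypothetical) Christmas ornament controller."""
    return json.dumps({"version": 1, "instructions": strategy}).encode("utf-8")

payload = package_instructions(execution_strategy)
# A real system would now transmit `payload` to the ornament's controller,
# e.g. over serial or BLE; the transport is outside the scope of this sketch.
print(payload)
```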
In the embodiment of the invention, the control task event recognition algorithm processes the to-be-recognized ornament control voice input information based on a speech recognition model. Embodiments of the invention may employ the Recurrent Neural Network Transducer (RNN-T) for this processing.
RNN-T is a sequence-to-sequence model that can map an input sequence (speech signal) to an output sequence (text label). It is composed of three main components:
1. Encoder: extracts features of the input voice signal and generates a voice feature vector sequence. Common feature extraction methods include Mel Frequency Cepstral Coefficients (MFCC) and the like;
2. Decoder: generates the output sequence step by step using a recurrent neural network (RNN) structure. In control task event recognition, the decoder aims to map the voice feature vector sequence to the corresponding control task event demand description;
3. Connectionist Temporal Classification layer (CTC layer): aligns the output sequence generated by the decoder with the real labels for training and optimization. The CTC loss function helps the model learn to correctly recognize the input speech signal.
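For orientation, the following PyTorch sketch mirrors the three components listed above in a strongly simplified form. It is not the patent's model: a full RNN-T couples an acoustic encoder and a prediction network through a joint network trained with the RNN-T loss, whereas this sketch trains a BiLSTM encoder directly with a CTC loss, and all layer sizes are assumed.

```python
# Simplified PyTorch sketch of the encoder / decoder / CTC pieces described above.
import torch
import torch.nn as nn

class SpeechEncoder(nn.Module):
    """Encoder: turns MFCC frames into a sequence of label logits."""
    def __init__(self, n_mfcc: int = 13, hidden: int = 128, n_labels: int = 40):
        super().__init__()
        self.rnn = nn.LSTM(n_mfcc, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * hidden, n_labels)  # label set includes the CTC blank

    def forward(self, mfcc: torch.Tensor) -> torch.Tensor:  # (B, T, n_mfcc)
        out, _ = self.rnn(mfcc)
        return self.proj(out)                               # (B, T, n_labels)

model = SpeechEncoder()
ctc = nn.CTCLoss(blank=0)

# Dummy batch: 2 utterances of 100 MFCC frames, target label sequences of length 10.
feats = torch.randn(2, 100, 13)
targets = torch.randint(1, 40, (2, 10))
logits = model(feats).log_softmax(dim=-1).transpose(0, 1)   # CTCLoss expects (T, B, C)
loss = ctc(logits, targets,
           input_lengths=torch.full((2,), 100),
           target_lengths=torch.full((2,), 10))
loss.backward()  # gradients for training on labelled ornament-control utterances
```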
By training on a large amount of labelled voice data, the speech recognition model can gradually learn the mapping relation between the input voice signal and the corresponding control task event demand description. Once the model training is completed, it can be used to convert the to-be-recognized ornament control voice input information into control task event demand description data.
It should be noted that other types of models, such as end-to-end models or hybrid models based on the Transformer architecture, may be employed for the specific control task event recognition algorithm to better handle complex voice input information. The selection of an appropriate model depends on the specific application scenario and performance requirements.
In an embodiment of the present invention, processing steps 110-140 with the Recurrent Neural Network Transducer (RNN-T) has the following advantages:
(1) Sequence modeling capability: RNN-T is a sequence-to-sequence model suitable for processing tasks where both input and output are variable length sequences. In the control of the ornament, both the voice input information and the control task event demand description can be regarded as sequence data. RNN-T is able to efficiently model these two sequences and capture the timing relationship between them;
(2) Context understanding capabilities: with the RNN structure, the RNN-T can use the context information to infer the association between the current input and output. In voice control, the user's instructions typically include a number of components, such as control actions, parameter settings, and the like. The RNN-T can better understand the meaning of the whole control task event using the context information in the decoder;
(3) Robustness and generalization capability: the RNN-T is trained through a large-scale training data set, so that rich speech feature representation and modes of controlling task events can be learned. This makes it appear more robust and generalizable in the face of speech input and diversified control task events for different users;
(4) End-to-end training and reasoning: the RNN-T model may perform end-to-end training and reasoning. This means that the optimization can be performed directly from the voice input to the generation of the control task event demand description, simplifying the complexity of system design and deployment;
(5) Iterative update and improvement: the RNN-T model is relatively flexible and can be improved by gradually adjusting the model structure, tuning hyperparameters and the like. As more data is collected and the model is iteratively trained, the recognition accuracy and the effect of control task execution can be continuously improved.
In summary, the processing of steps 110-140 using the Recurrent Neural Network Transducer gives better performance and adaptability in speech recognition and control task event recognition. The method can effectively process variable-length sequence data, provide context understanding capability, and has the characteristics of robustness, generalization capability and iterative optimization, providing a more accurate, intelligent and personalized ornament control experience.
In the above embodiment, voice description vector mining is a key step; a voice description vector is a representation obtained by feature extraction and encoding of the voice input. For example, with the MFCC (Mel Frequency Cepstral Coefficients) algorithm, the voice signal is divided into different time windows and the spectral characteristics of each window are calculated. These features can then be converted into a fixed-dimension vector representation, forming the voice description vector. Example: through voice description vector mining, the user's voice command "rotate the sphere part 30 degrees" is converted into the corresponding voice description vector for subsequent processing and analysis.
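A minimal sketch of this kind of feature extraction, assuming librosa is used for the MFCC computation and that simple mean/std pooling turns the frame-level coefficients into a fixed-dimension description vector (the audio file name is hypothetical):

```python
# Sketch of turning a spoken command into a fixed-length speech description vector
# via MFCC features; librosa is one common choice, the file name is hypothetical.
import librosa
import numpy as np

def speech_description_vector(wav_path: str, n_mfcc: int = 13) -> np.ndarray:
    audio, sr = librosa.load(wav_path, sr=16000)                # mono, 16 kHz
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc)  # (n_mfcc, frames)
    # Pool over time (mean + std per coefficient) to get a fixed-dimension vector.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

vec = speech_description_vector("rotate_sphere_30_degrees.wav")
print(vec.shape)  # (26,)
```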
In the above embodiment, the control task event demand description data contains the user's specific requirements and description information for ornament control. It is derived from the voice description vector or a text representation and is used to guide the ornament control operations. For example, a user may require the ornament to perform a variety of control tasks, such as rotating, playing music and module falling. Example: the user's voice command "rotate the sphere part 30 degrees, play the Jingle Bells music at the same time, and let the snowflake module fall in fast mode" is converted into control task event demand description data, whose specific requirements include the sphere rotation angle, the music track, the falling speed of the snowflake module, and so on.
In the above embodiments, the control task relative distribution description data is used to describe the relative relationships and constraints between different control tasks. It provides information on the order, time intervals, duration, etc. of the control tasks so as to generate a reasonable ornament control execution strategy. These data may be obtained based on semantic analysis or a priori knowledge in the user instructions. Example: the control task event demand description data is analyzed to generate the control task relative distribution description data; for example, it is determined that the sphere rotation should be triggered after music playing starts, and that the snowflake module should keep falling during the music playing.
Through voice description vector mining and the processing of control task event demand description data and control task relative distribution description data, the system can better understand the user's instructions and generate corresponding ornament control execution strategies, so as to realize the personalized and complex ornament control experience required by the user.
In the above-described embodiments, the control task event demand description mining component is configured to extract and parse control task event demand description information from voice input or text data. It may include Natural Language Processing (NLP) techniques and domain-specific rules to identify and extract specific control requirements of the user. Examples: by using natural language processing technology, the control task event requirement description mining component can extract keywords of 'sphere part' and '30 degrees of rotation' from the voice command of a user, and further recognize the rotation action and angle of the sphere.
In the above embodiment, the control task relative distribution description mining component is configured to analyze a relative order, a time constraint, and the like between control tasks, and generate relative distribution description information of the control tasks. It can infer time sequence relation between different control tasks based on semantic analysis, knowledge base or statistical inference. Examples: by analyzing the user's voice command to "rotate the ball portion 30 degrees, let the snowflake module fly down in a fast mode after the music play starts", the control task relative distribution description mining component can infer that the ball rotation action should be triggered after the music play starts, and the snowflake module fly down should be continued during the music play.
In the above embodiment, the feature aggregation component is configured to integrate and aggregate feature information obtained from different sources to form a complete control task event description. It can fuse multiple features such as the voice description vector, the control task event demand description and the control task relative distribution description, so as to provide a comprehensive control task description. Example: the feature aggregation component can integrate the sphere rotation angle obtained from voice description vector mining, the music track and snowflake module falling speed obtained from control task event demand description mining, and the timing relations obtained from control task relative distribution description mining into a complete control task description, for example "rotate the sphere 30 degrees, and after music playing starts let the snowflake module fall in fast mode".
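A toy sketch of such aggregation, assuming the three feature sources are simply gathered into one dictionary-shaped control task description (field names chosen to mirror the running example, not prescribed by the embodiment):

```python
# Illustrative aggregation of the three feature sources into one task description.
def aggregate_task_description(speech_vector, demand_desc: dict, relative_dist: dict) -> dict:
    return {
        "speech_vector": speech_vector,   # e.g. MFCC-based description vector
        "requirements": demand_desc,      # sphere angle, music track, snowfall speed
        "ordering": relative_dist,        # timing relations between the tasks
    }

task = aggregate_task_description(
    speech_vector=[0.2, 0.7, -0.5],
    demand_desc={"sphere_rotation_deg": 30,
                 "music_track": "Jingle Bells",
                 "snowflake_mode": "fast"},
    relative_dist={"sphere_rotation": "after music starts",
                   "snowflake_fall": "during music playback"},
)
print(task)
```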
Through the cooperative work of the control task event demand description mining component, the control task relative distribution description mining component and the feature aggregation component, the system can extract key information from voice input, analyze the relation among tasks and integrate various features so as to generate comprehensive and accurate control task event description and provide guidance for subsequent ornament control operation.
In summary, applying steps 110-140 and processing with the RNN-T can significantly improve the timeliness and accuracy of the ornament control execution strategy. The system can respond in real time and update instantly, so as to meet the user's demand for quick response in ornament control. Meanwhile, through context understanding, multi-modal fusion and iterative optimization, the system can generate more accurate control task event descriptions and provide accurate and intelligent guidance for ornament control. Timeliness and accuracy are elaborated below.
1. Timeliness: by performing the processing of steps 110-140 using the Recurrent Neural Network Transducer (RNN-T), the timeliness of the ornament control execution strategy may be improved.
Real-time response: the RNN-T model has lower reasoning delay, and can realize rapid voice recognition and task event recognition. This enables the system to quickly generate a corresponding control task event description after receiving a voice instruction from the user, and to drive the ornament to perform a corresponding control operation.
Updating in real time: because the RNN-T model supports end-to-end training and reasoning, online learning and updating can be flexibly performed. When the system encounters new voice command or task event requirements during steps 110-140, it can integrate these information in real-time reasoning, adjusting the control strategy in time to meet the new control requirements.
2. Accuracy: using the RNN-T for the processing of steps 110-140 may also improve the accuracy of the ornament control execution strategy.
Context understanding: the RNN-T model can use the context information to infer the association between the current input and output. This means that it can better understand the semantics and control requirements in the user's voice instructions to generate an accurate control task event description.
Multimodal fusion: in addition to voice input, the RNN-T may incorporate other sensory data, such as multimodal information such as images, sensors, etc. By comprehensively analyzing the data of different modes, the system can more comprehensively understand the control intention of the user and generate a more accurate ornament control execution strategy.
Iterative optimization: the RNN-T model has the characteristics of iterative updating and improvement. By continuously collecting user feedback and iterative training, the model can gradually improve recognition accuracy and control task execution effects, thereby providing a more accurate ornament control execution strategy.
In some possible embodiments, step 120 of performing voice description vector mining on the to-be-identified ornament control voice input information through the first voice description vector mining branch set of the control task event recognition algorithm to obtain control task event requirement description data includes steps 121-124.
Step 121, inputting the to-be-recognized ornament control voice input information into a first deep learning component, and extracting to obtain a first ornament control voice semantic vector.
Here, the deep learning component may be a deep learning sub-network with shared neural network weights.
For example, the voice instruction spoken by the user is "increase rotation speed and turn on the lamp light". The first deep learning component processes the speech input and extracts corresponding semantic vectors indicating that the user wishes to increase rotational speed and turn on the light.
Step 122, loading the first ornament control voice semantic vector into a synchronized first moving average component and first feature downsampling component, wherein the first deep learning component, the first moving average component, and the first feature downsampling component comprise a semantic refinement process.
For example, the first ornament control voice semantic vector is loaded into the first moving average component and the first feature downsampling component for the semantic refinement process.
As such, these components can extract important control instruction information from the semantic vector, such as rotational speed and lighting brightness requirements. Through the semantic refining process, the extracted features can be further optimized, noise interference is reduced, and the extracted semantic information is ensured to be more accurate.
Step 123, generating an ornament control voice moving average vector through the first moving average component, and generating local structure control logic features through the first feature downsampling component.
For example, using the first moving average component, the ornament control voice moving average vector is calculated from the successive semantic vectors. Meanwhile, through the first feature downsampling component, local structure control logic features are extracted, such as key feature points of the change trends of the rotation speed and the light brightness.
In this way, the ornament control voice moving average vector can provide smooth control transition instructions that enable a smooth transition of the mechanical structure. The local structure control logic features can capture important change patterns of the rotation speed and the light brightness, providing more specific information for subsequent control strategy generation.
Step 124, carrying out AI knowledge fusion on the ornament control voice moving average vector and the local structure control logic features through the first feature aggregation component to obtain the control task event demand description data.
For example, the ornament control voice moving average vector and the local structure control logic features are fused by the first feature aggregation component, generating control task event demand description data such as "increase the rotation speed to 50%, turn the light up to 80%" based on the rotation speed moving average vector and the key feature points of the light brightness.
Therefore, through the fusion process of the first characteristic aggregation component, information of different characteristic sources can be integrated, and more comprehensive and accurate control task event demand description data can be formed. This helps the system better understand the control intent of the user and generates a more accurate, intelligent control task event description, thereby guiding the control transformation of the mechanical structure.
In summary, in steps 121-124, more accurate and comprehensive control task event demand description data can be extracted from the ornament control voice input through the semantic information extraction of the deep learning component, the smoothing of the moving average component, the local structure extraction of the feature downsampling component and the fusion of the feature aggregation component, as sketched below. The beneficial effects of these steps include improving semantic understanding capability, reducing noise interference, capturing important information and integrating multiple feature sources, providing more accurate and intelligent guidance for the subsequent mechanical structure control transformation.
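The sketch below illustrates steps 121-124 on random data, assuming a simple moving average over frame-level semantic vectors, strided downsampling as the local-structure feature, and concatenation as a stand-in for the AI knowledge fusion; window size, stride and dimensions are all assumptions.

```python
# Sketch of the first mining branch set (steps 121-124) on toy data.
import numpy as np

def moving_average(frames: np.ndarray, window: int = 3) -> np.ndarray:
    """Smooth a (T, D) sequence of semantic vectors along the time axis."""
    kernel = np.ones(window) / window
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, frames)

def downsample(frames: np.ndarray, stride: int = 2) -> np.ndarray:
    """Keep every `stride`-th frame as a crude local-structure feature."""
    return frames[::stride]

frames = np.random.randn(8, 4)                 # 8 frames of 4-dim semantic vectors
smoothed = moving_average(frames)              # ornament control voice moving average
local = downsample(frames)                     # local structure control logic features
fused = np.concatenate([smoothed.mean(0), local.mean(0)])  # simple feature fusion
print(fused.shape)                             # (8,)
```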
Under some optional design considerations, in step 130, the control task event requirement description data is loaded to a second voice description vector mining branch set according to the control task event recognition algorithm to obtain control task relative distribution description data corresponding to a target control task event in the to-be-recognized ornament control voice input information, including steps 131-134.
Step 131, inputting the control task event demand description data into a second deep learning component, and extracting a second ornament control voice semantic vector.
Assuming that the control task event demand description data is "adjust rotation speed to 50%, adjust light brightness to 80%", the description is processed by the second deep learning component, and a semantic vector representation of the second ornament control voice is obtained.
Step 132, loading the second ornament control voice semantic vector into a synchronized second moving average component, third moving average component and second feature downsampling component, wherein the scales of the voice description vector mining branches of the second moving average component and the third moving average component are different, and the second deep learning component, the second moving average component, the third moving average component and the second feature downsampling component comprise a semantic refinement process.
The second ornament control voice semantic vector is loaded into the second moving average component, the third moving average component and the second feature downsampling component. These components perform feature smoothing and extraction based on the semantic vector, while performing a semantic refinement process to capture important features and trends of the control task event.
Step 133, generating a first relative distribution moving average vector by the second moving average component, generating a second relative distribution moving average vector by the third moving average component, and generating a relative distribution upstream and downstream feature by the second feature downsampling component.
And calculating a first relative distribution moving average vector through a second moving average component, and describing the change trend of the control task event on the time sequence. And obtaining a second relative distribution moving average vector through a third moving average component, wherein the second relative distribution moving average vector represents the distribution characteristics of the control task event in different dimensions. Meanwhile, through the second feature downsampling component, the relatively distributed upstream and downstream features are extracted, and the associated information of the control task event is revealed.
And step 134, carrying out AI knowledge fusion on the first relative distribution moving average vector, the second relative distribution moving average vector and the relative distribution upstream and downstream features through the second feature aggregation component to obtain the control task relative distribution description data. Wherein AI knowledge fusion includes feature fusion.
The first relative distribution moving average vector, the second relative distribution moving average vector and the relative distribution upstream and downstream features are fused by a second feature aggregation component. For example, the control task relative distribution description data is generated by combining the trend and distribution characteristics of the two moving average vectors and the associated information of the relative distribution upstream and downstream characteristics. The data expresses the distribution characteristics of the target control task event in time sequence and multiple dimensions, and provides more comprehensive and accurate guidance for the subsequent generation of the ornament control strategy.
The following is a more detailed example, illustrating numerical calculations and vector examples in each step: assume the following control task event demand description data: "the rotation speed is adjusted to 50%, and the light brightness is adjusted to 80%".
Step 131: assume that the second deep learning component obtains the following semantic vectors of the second ornament control speech by processing: [0.2,0.7, -0.5].
Step 132: assuming that the speech description vector mining branch size of the second moving average component is 3, the speech description vector mining branch size of the third moving average component is 2. The second feature downsampling component performs feature extraction according to the semantic vector to generate relatively distributed upstream and downstream features. Output of the second moving average component (first relative distribution moving average vector): [0.15,0.35,0.55]; output of the third moving average component (second relative distributed moving average vector): [0.4,0.6]; relatively distributed upstream and downstream features: [0.7,0.9,0.8].
Step 134: and carrying out AI knowledge fusion through a second feature aggregation component, and fusing the first relative distribution moving average vector, the second relative distribution moving average vector and the relative distribution upstream and downstream features. Control task relative distribution description data after AI knowledge fusion: [0.15,0.35,0.55,0.4,0.6,0.7,0.9,0.8].
In the above example, specific numerical values are used to illustrate the calculation result and vectors for each step. Note that this is just an example; the values and vectors in actual practice may vary depending on the system design and algorithm model.
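The fusion in step 134 of this example is a plain concatenation of the three intermediate outputs, which the following fragment reproduces; the intermediate vectors are taken as given from the example rather than recomputed, since the example does not fix the exact averaging formula.

```python
# Reproducing the fusion step of the numerical example above by concatenation.
import numpy as np

first_rel_dist_ma = np.array([0.15, 0.35, 0.55])   # second moving average component
second_rel_dist_ma = np.array([0.4, 0.6])          # third moving average component
updown_features = np.array([0.7, 0.9, 0.8])        # second feature downsampling component

relative_distribution = np.concatenate(
    [first_rel_dist_ma, second_rel_dist_ma, updown_features]
)
print(relative_distribution.tolist())
# [0.15, 0.35, 0.55, 0.4, 0.6, 0.7, 0.9, 0.8]  -- matches the fused vector above
```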
In step 131, a second deep learning component is used to extract the second ornament control voice semantic vector, which enables extraction of a vector representation with more semantic information from the input data. In step 132, the second ornament control voice semantic vector is loaded into the synchronized second moving average component, third moving average component and second feature downsampling component and processed through a semantic refinement process. This processing can effectively extract and capture important features of the input data, further improving the expression capacity and generalization performance of the model. In step 133, a first relative distribution moving average vector, a second relative distribution moving average vector and relative distribution upstream and downstream features are generated; these features describe the relative distribution of the data more fully and help understand the characteristics of the control task event more accurately. In step 134, AI knowledge fusion is performed by the second feature aggregation component, fusing the first relative distribution moving average vector, the second relative distribution moving average vector and the relative distribution upstream and downstream features, which further improves the comprehensive expression capability and performance of the model.
In conclusion, the design thought has the advantages of improving the performance of a control task event recognition algorithm, extracting vector representation with more semantic information, capturing important features, describing the relative distribution condition of data, and improving the comprehensive expression capability of the model through AI knowledge fusion.
In some alternative embodiments, the generating, in step 140, the ornament control execution policy corresponding to the target control task event based on the control task relative distribution description data includes steps 141-144.
And 141, determining an event state of a target control task event in the to-be-identified ornament control voice input information based on the control task relative distribution description data.
Step 142, determining the number of the to-be-controlled parts of the target control task event based on the event state of the target control task event.
The part to be controlled is a relevant structural part in the Christmas ornament, and the Christmas ornament can be composed of a plurality of structural parts.
Step 143, determining key part distribution characteristics of the target control task event based on the control task relative distribution description data.
Step 144, determining an ornament control execution strategy and an ornament control expected trajectory of the target control task event based on the number of parts to be controlled and the key part distribution characteristics.
In the above embodiment, according to the control task relative distribution description data, the current state of the target control task event may be determined, for example whether rotation, music playing or other actions need to be triggered. By analyzing the state of the target control task event, the number of specific parts to be controlled can be determined; in this example, the parts to be controlled may include the sphere part and the snowflake module. According to the control task relative distribution description data, the positions and distribution characteristics of key parts in the target control task event can also be determined, which helps to control these parts more accurately to achieve the desired effect. Combining the number of parts to be controlled and the distribution characteristics of the key parts, it is possible to determine how to implement the ornament control strategy as well as the desired motion trajectory of the ornament. In this example, based on the input information, a control strategy may be designed: rotate the sphere part 30 degrees, play the Jingle Bells music at the same time, and let the snowflake module fall in fast mode.
To sum up: first, the current state of the target control task event can be determined through step 141; the specific actions and operations to be performed can be understood from the control task relative distribution description data. Next, the number of parts to be controlled is determined based on the event state of the target control task event via step 142, which enables accurate control of the relevant structural parts of the Christmas ornament. Then, in step 143, the key part distribution characteristics of the target control task event are determined; the structure and characteristics of the ornament can be better understood by analyzing the control task relative distribution description data. Finally, in step 144, combining the number of parts to be controlled and the distribution characteristics of the key parts, an ornament control execution strategy and the desired trajectory can be generated, meaning that specific actions and movements suitable for achieving the desired effect can be designed.
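A hedged sketch of this mapping from relative-distribution data to an execution strategy and expected trajectory is given below; the rule set, field names and default values are illustrative assumptions only.

```python
# Illustrative reduction of steps 141-144: deriving an execution strategy and an
# expected trajectory from relative-distribution data. Rules and fields are assumed.
def derive_execution_strategy(relative_distribution: dict) -> dict:
    event_state = relative_distribution["event_state"]           # step 141
    parts = relative_distribution["parts_to_control"]            # step 142
    key_parts = relative_distribution["key_part_distribution"]   # step 143

    strategy = []                                                # step 144
    if "sphere" in parts:
        strategy.append({"part": "sphere", "action": "rotate",
                         "angle_deg": key_parts.get("sphere_angle", 30)})
    if "snowflake_module" in parts:
        strategy.append({"part": "snowflake_module", "action": "fall", "mode": "fast"})
    expected_trajectory = [step["part"] for step in strategy]    # simple ordering
    return {"state": event_state, "strategy": strategy,
            "expected_trajectory": expected_trajectory}

print(derive_execution_strategy({
    "event_state": "active",
    "parts_to_control": ["sphere", "snowflake_module"],
    "key_part_distribution": {"sphere_angle": 30},
}))
```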
In some possible embodiments, the method for debugging the control task event recognition algorithm includes steps 210-240.
Step 210, obtaining an ornament control voice input information sample set, where the ornament control voice input information samples in the ornament control voice input information sample set include ornament control execution strategy samples corresponding to control task event samples.
This step is part of the debugging method of the control task event recognition algorithm. In this step, an ornament control voice input information sample set is obtained; it contains a series of ornament control voice input information samples, each corresponding to a control task event sample and a corresponding ornament control execution strategy sample. The purpose of this sample set is to debug the control task event recognition algorithm: with specific voice input information samples, the recognition behaviour of the algorithm under different conditions can be observed, and debugging and optimization can be carried out. Constructing the ornament control voice input information sample set typically involves collecting voice command examples for various ornament control tasks and recording the expected ornament control execution strategy for each example. These samples are used as input data when debugging the control task event recognition algorithm, so as to verify the recognition capability and accuracy of the algorithm for different task events. Through this step, the ornament control voice input information sample set is obtained, providing the necessary input data for the subsequent debugging of the control task event recognition algorithm.
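As an aid to understanding, one possible in-memory layout for such a sample set is sketched below in Python; the field names (voice_input, control_task_event, execution_policy) are illustrative assumptions rather than a format defined by the patent:

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class OrnamentControlSample:
        voice_input: str             # recorded or transcribed ornament control voice command
        control_task_event: str      # the control task event sample expressed by the command
        execution_policy: List[str]  # the corresponding ornament control execution strategy sample

    sample_set = [
        OrnamentControlSample(
            voice_input="rotate the ball and play Jingle Bells",
            control_task_event="rotate_and_play",
            execution_policy=["rotate_sphere_30deg", "play_jingle_bells"],
        ),
    ]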
Step 220, loading the ornament control voice input information samples in the ornament control voice input information sample set into a to-be-debugged control task event recognition algorithm, wherein the to-be-debugged control task event recognition algorithm performs voice description vector mining on the ornament control voice input information samples through a first voice description vector mining branch set to obtain debugging control task event demand description data, and the first voice description vector mining branch set comprises a synchronous control task event demand description mining component and a first feature aggregation component connected with each synchronous control task event demand description mining component.
In step 220 of the debugging method of the control task event recognition algorithm, the ornament control voice input information samples in the ornament control voice input information sample set are loaded into the to-be-debugged control task event recognition algorithm. The to-be-debugged control task event recognition algorithm performs voice description vector mining on the loaded ornament control voice input information samples through the first voice description vector mining branch set. The goal of this process is to extract the demand description data associated with the control task event from the input voice information for further debugging and optimization. The first voice description vector mining branch set generally includes synchronized control task event demand description mining components and a first feature aggregation component coupled to each mining component. The control task event demand description mining components are configured to extract demand descriptions of the control task event from the voice input information, and the first feature aggregation component is configured to integrate the results of the individual demand description mining components. Through step 220, the ornament control voice input information samples are loaded into the to-be-debugged control task event recognition algorithm, and the first voice description vector mining branch set is used to mine voice description vectors. In this way, the control task event demand description data required for debugging can be obtained, providing a basis for further optimization of the algorithm.
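Purely as an illustration, and assuming a neural-network realization that the patent does not mandate, the first voice description vector mining branch set could be sketched in Python/PyTorch as several parallel mining components feeding one aggregation layer; all layer sizes and class names here are invented for the example:

    import torch
    import torch.nn as nn

    class FirstMiningBranchSet(nn.Module):
        def __init__(self, in_dim: int = 128, hidden: int = 64, n_branches: int = 3):
            super().__init__()
            # Synchronized control task event demand description mining components.
            self.branches = nn.ModuleList(
                [nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU()) for _ in range(n_branches)]
            )
            # First feature aggregation component: fuses the branch outputs.
            self.aggregate = nn.Linear(hidden * n_branches, hidden)

        def forward(self, speech_vec: torch.Tensor) -> torch.Tensor:
            mined = [branch(speech_vec) for branch in self.branches]  # branches run on the same input
            return self.aggregate(torch.cat(mined, dim=-1))           # demand description data

    demand_description = FirstMiningBranchSet()(torch.randn(1, 128))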
Step 230, loading the debugging control task event demand description data into a second voice description vector mining branch set through the to-be-debugged control task event recognition algorithm to obtain debugging control task relative distribution description data corresponding to the control task event sample in the ornament control voice input information sample, wherein the second voice description vector mining branch set comprises a synchronous control task relative distribution description mining component and a second feature aggregation component connected with each synchronous control task relative distribution description mining component.
In step 230 of the method for debugging the control task event recognition algorithm, the debugging control task event demand description data is loaded into the second voice description vector mining branch set by the to-be-debugged control task event recognition algorithm. The second voice description vector mining branch set is typically composed of synchronized control task relative distribution description mining components and a second feature aggregation component coupled to each mining component. The debugging control task event demand description data comes from the first voice description vector mining branch set. By loading the debugging control task event demand description data into the second voice description vector mining branch set, the debugging control task relative distribution description data corresponding to the control task event sample in the ornament control voice input information sample can be further extracted. The debugging control task relative distribution description data helps to understand more accurately the relationships and feature distributions between different control task events, and these data are important for optimizing the accuracy and stability of the control task event recognition algorithm. Therefore, in step 230, the debugging control task event demand description data is loaded into the second voice description vector mining branch set to obtain the debugging control task relative distribution description data corresponding to the control task event sample in the ornament control voice input information sample, providing guidance for the optimization and improvement of the algorithm.
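Continuing the same illustrative (and assumed) PyTorch framing, a second mining branch set with two moving-average components of different scales, a feature downsampling component, and a second feature aggregation component could look like the following; kernel sizes and dimensions are arbitrary choices made for the sketch:

    import torch
    import torch.nn as nn

    class SecondMiningBranchSet(nn.Module):
        def __init__(self, channels: int = 64):
            super().__init__()
            # Two moving-average components operating at different scales.
            self.avg_small = nn.AvgPool1d(kernel_size=3, stride=1, padding=1)
            self.avg_large = nn.AvgPool1d(kernel_size=7, stride=1, padding=3)
            # Feature downsampling component.
            self.downsample = nn.Sequential(nn.MaxPool1d(kernel_size=2), nn.AdaptiveAvgPool1d(1))
            # Second feature aggregation component (the fusion step).
            self.aggregate = nn.Linear(channels * 3, channels)

        def forward(self, demand: torch.Tensor) -> torch.Tensor:
            # demand: (batch, channels, time) demand description data from the first branch set.
            a = self.avg_small(demand).mean(dim=-1)   # first relative distribution moving average vector
            b = self.avg_large(demand).mean(dim=-1)   # second relative distribution moving average vector
            c = self.downsample(demand).squeeze(-1)   # relative distribution up/downstream feature
            return self.aggregate(torch.cat([a, b, c], dim=-1))  # relative distribution description data

    relative_distribution = SecondMiningBranchSet()(torch.randn(1, 64, 50))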
Step 240, optimizing algorithm variables of the to-be-debugged control task event recognition algorithm based on the debugging control task relative distribution description data and the ornament control execution strategy sample until the debugging control task relative distribution description data generated by the to-be-debugged control task event recognition algorithm meets a steady state requirement (convergence requirement), so as to obtain the debugged control task event recognition algorithm.
In step 240 of the method for debugging the control task event recognition algorithm, algorithm variables of the to-be-debugged control task event recognition algorithm are optimized based on the debugging control task relative distribution description data and the ornament control execution strategy sample. First, the debugging control task relative distribution description data is used; it describes the relationships and feature distributions between different control task events. By analysing these data, it is possible to check whether the algorithm can accurately identify different task events and to determine possible directions of improvement. Second, the ornament control execution strategy samples, which represent the expected behaviour of the ornament control, are taken into account. The accuracy and consistency of the algorithm can be evaluated by comparing the debugging control task relative distribution description data generated by the algorithm with the ornament control execution strategy samples. Based on these evaluation results, the algorithm variables are optimized: the parameters of the algorithm can be adjusted, the feature extraction method improved, or other machine learning techniques employed to improve the performance of the algorithm. This is an iterative process that continues until the debugging control task relative distribution description data generated by the to-be-debugged control task event recognition algorithm meets the steady state requirement. Through step 240, the to-be-debugged control task event recognition algorithm is continuously debugged and optimized until the debugged control task event recognition algorithm is obtained, which can accurately recognize ornament control task events and generate debugging control task relative distribution description data that matches expectations. In this way, accurate identification of control task events and optimization of the ornament control execution strategy can be achieved.
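A hedged sketch of such an optimization loop is given below; the loss function, optimizer, and convergence threshold are assumptions chosen for illustration, since the patent only requires that the generated relative distribution description data reach a steady state:

    import torch
    import torch.nn as nn

    def debug_algorithm(model, samples, policy_targets, lr=1e-3, tol=1e-4, max_iters=1000):
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        prev_loss = float("inf")
        for _ in range(max_iters):
            optimizer.zero_grad()
            # Compare the generated relative distribution description data with the
            # ornament control execution strategy samples (encoded as target tensors).
            loss = loss_fn(model(samples), policy_targets)
            loss.backward()
            optimizer.step()
            if abs(prev_loss - loss.item()) < tol:  # steady state (convergence) requirement
                break
            prev_loss = loss.item()
        return model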
Through the series of operations from step 210 to step 240, debugging and optimization of the control task event recognition algorithm can be achieved, thereby improving its performance and accuracy. The process includes constructing the ornament control voice input information sample set, loading the samples and mining voice description vectors, loading the debugging control task event demand description data and mining further, and optimizing the algorithm variables.
First, examples of ornament control voice instructions are collected and a sample set for debugging is created. These real voice input information samples can be used to test the recognition capability of the algorithm and provide the basis for subsequent optimization work.
Next, these samples are loaded into the to-be-debugged control task event recognition algorithm, and the first voice description vector mining branch set is used to extract the demand description data from them. This step enables the algorithm to accurately extract information related to the control task event from the voice input.
Then, the debugging control task event demand description data is loaded into the second voice description vector mining branch set, and the relative distribution description data of the control task events in the ornament control voice input information samples is further extracted. Analyzing the relationships and feature distributions among the task events gives a more comprehensive understanding and allows the performance of the algorithm to be optimized.
Finally, the parameters and feature extraction method of the algorithm are optimized based on the debugging control task relative distribution description data and the ornament control execution strategy samples. Through this iterative process, the accuracy and consistency of the algorithm are continuously improved until a debugged control task event recognition algorithm meeting the expected effect is obtained.
In summary, through steps 210 to 240, the algorithm can be tested with real sample data, key information can be extracted from it, and the performance of the algorithm can finally be optimized. These technical effects enable the algorithm to accurately identify control task events and generate relative distribution description data that matches expectations, providing guidance and support for optimizing the ornament control execution strategy.
In further alternative embodiments, the method further comprises steps 201-203.
Step 201, obtaining a priori debugging learning basis (training annotation information) corresponding to the ornament control execution strategy sample corresponding to the control task event sample in the ornament control voice input information sample.
In step 201 of this alternative embodiment, a priori debugging learning basis, also referred to as training annotation information, is obtained that corresponds to the ornament control execution strategy sample corresponding to the control task event sample in the ornament control voice input information sample. The priori debugging learning basis is marking or annotation information provided, manually or by other means, for each ornament control task event sample. Such annotation information may include the expected ornament control execution actions, the correct command sequence, or other relevant information. By acquiring the priori debugging learning basis, a benchmark or reference may be established to evaluate the performance of the to-be-debugged control task event recognition algorithm. These bases provide the algorithm with the expected control task events and the corresponding ornament control execution strategies for comparison and verification against the results generated by the algorithm. Through analysis and understanding of the priori debugging learning basis, the characteristics and requirements of the control task events can be better understood, providing guidance for optimizing the accuracy and consistency of the algorithm. In addition, this training annotation information may also be used for model training and algorithm improvement to improve the performance of the control task event recognition algorithm. Therefore, in step 201, the priori debugging learning basis corresponding to the control task event sample in the ornament control voice input information sample is obtained, so as to provide a benchmark and reference, evaluate the performance of the algorithm, and serve as a basis for improvement and optimization.
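For concreteness only, one possible shape for such training annotation information is shown below; every field name here is a hypothetical choice, not a format fixed by the patent:

    # Hypothetical priori debugging learning basis (training annotation) for one sample.
    prior_debug_basis = {
        "sample_id": "voice_0001",
        "expected_event": "rotate_and_play",                               # expected control task event
        "expected_actions": ["rotate_sphere_30deg", "play_jingle_bells"],  # expected execution strategy
        "command_sequence": ["rotate", "play_music"],                      # correct command sequence
    }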
Step 202, determining a universal variable (initialization variable) corresponding to the control task event recognition algorithm based on the prior debugging learning basis.
Step 202 in an alternative embodiment of the control task event recognition algorithm determines a generic variable corresponding to the control task event recognition algorithm, also referred to as an initialization variable, based on a priori debug learning basis. The generic variables are initial parameters or configurations set at the algorithm initialization stage. The selection of these variables is based on a priori debug learning basis and analysis and inference of domain knowledge. By determining the universal variables, a reasonable starting point is provided for the control task event recognition algorithm so as to carry out subsequent debugging and optimization. The setting of these variables may reflect a priori knowledge such as common characteristics of a particular task event, relevant context information, etc. The setting of the general variables may involve aspects such as model structure, algorithm parameters, feature extraction methods, etc. By reasonable initialization, the convergence, stability and accuracy of the algorithm can be improved, and a foundation is laid for subsequent debugging work. The universal variable determined based on the prior debugging learning basis in step 202 becomes the initial setting of the control task event recognition algorithm to be debugged, and provides a starting point and a reference for the improvement and optimization of the algorithm. Subsequent steps will make further adjustments and optimizations around these initialization variables to improve the performance and accuracy of the algorithm. Therefore, in step 202, the universal variable corresponding to the control task event recognition algorithm is determined based on the a priori debug learning basis, and is used as a starting point and a reference for algorithm initialization to support subsequent debugging and optimization work.
And 203, adjusting (initializing) algorithm variables of the control task event recognition algorithm based on the universal variables to obtain the control task event recognition algorithm to be debugged.
In step 203 in an alternative embodiment of the control task event recognition algorithm, algorithm variables of the control task event recognition algorithm are adjusted (initialized) based on the universal variables, resulting in a control task event recognition algorithm to be debugged. Algorithm variables refer to parameters or configuration items that can be adjusted and optimized in the control task event recognition algorithm. By adjusting these algorithm variables, the behavior and performance of the algorithm can be changed. In step 203, the algorithm variables are initialized based on the determined universal variables. This may involve setting parameters such as specific weight matrices, thresholds, learning rates, etc., and selecting an appropriate model structure or algorithm framework. An initial state is provided for a control task event recognition algorithm to be debugged through reasonable initialization of algorithm variables, so that subsequent debugging and optimization work can be facilitated. This process helps to develop the algorithm in the desired direction and increases its accuracy, robustness and efficiency. The objective of step 203 is to adjust the algorithm variables in the initialization phase, so that the control task event recognition algorithm to be debugged can have a certain initial performance, and lay a foundation for subsequent optimization and improvement. Accordingly, in step 203, the algorithm variables of the control task event recognition algorithm are adjusted (initialized) based on the universal variables, thereby obtaining the algorithm state to be debugged. This provides a starting point and basis for subsequent debugging and optimization to improve the performance and accuracy of the algorithm.
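The following minimal sketch illustrates steps 202 and 203 under the same assumed Python setting; the heuristic of sizing the output layer from the annotations and the Xavier initialization are examples of what determining generic variables and adjusting algorithm variables might mean, not the patent's prescription:

    import torch.nn as nn

    def derive_universal_variables(prior_bases):
        # Step 202: derive generic (initialization) variables from the priori debugging
        # learning basis, e.g. size the output layer from the average action count.
        avg_actions = sum(len(b["expected_actions"]) for b in prior_bases) / len(prior_bases)
        return {"output_dim": max(1, round(avg_actions)), "learning_rate": 1e-3}

    def initialize_algorithm(universal, in_dim=128):
        # Step 203: adjust (initialize) the algorithm variables based on the generic variables.
        model = nn.Linear(in_dim, universal["output_dim"])
        nn.init.xavier_uniform_(model.weight)
        nn.init.zeros_(model.bias)
        return model, universal["learning_rate"]

    model, lr = initialize_algorithm(derive_universal_variables(
        [{"expected_actions": ["rotate_sphere_30deg", "play_jingle_bells"]}]))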
In some alternative embodiments, the obtaining the ornament control voice input information sample set in step 210, where the ornament control voice input information samples in the ornament control voice input information sample set include ornament control execution strategy samples corresponding to control task event samples, includes steps 211-213.
Step 211, obtaining a past ornament control voice input information set, where the past ornament control voice input information in the past ornament control voice input information set includes a corresponding past ornament control execution policy.
In step 211 of this alternative embodiment, a past ornament control voice input information set is obtained, consisting of voice input information together with the corresponding past ornament control execution policies. The past ornament control voice input information set refers to the voice input information recorded during past ornament control tasks. Such information may include voice commands issued by users, the corresponding ornament control execution actions, and other related data. By acquiring the past ornament control voice input information set, a historical data set can be established for analyzing and learning from past control task events and the corresponding control execution strategies. Such past information provides valuable experience and reference that helps optimize and improve the performance of the control task event recognition algorithm. In step 211, past ornament control voice input information is collected and consolidated for subsequent processing and analysis. This information may be used to train models, extract features, build statistical models, or perform other forms of data processing, with the aim of improving the processing and recognition accuracy of ornament control voice input information. Therefore, in step 211, a past ornament control voice input information set is obtained that includes the voice input information and the corresponding past ornament control execution policies, providing a basis for subsequent data analysis and algorithm optimization so as to improve the performance and accuracy of control task event identification.
Step 212, obtaining a setting selection instruction, wherein the setting selection instruction comprises the association features between past control task events and past voice input information processing indications for different numbers of parts to be controlled. The setting selection instruction defines screening rules, and an association feature can be understood as a matching feature or a pairing relation.
In step 212 of this alternative embodiment, a setting selection instruction is obtained. The setting selection instruction is a set of indications or rules used to determine the association features between past control task events and past voice input information processing indications. These indications may include features related to the number of parts to be controlled, as well as rules for screening and matching past data. By obtaining the setting selection instruction, the past ornament control voice input information can be selected and processed according to the specified association features, so as to meet the requirements of the current control task. The setting selection instruction may relate past control task events to past voice input information processing indications for different numbers of parts to be controlled. These features enable appropriate screening, matching, and correlation of the data, so that past ornament control voice input information related to the current control task event can be extracted. In step 212, the setting selection instruction is acquired and its content is parsed. These indications are used as references in the subsequent steps to obtain, based on the number of controlled parts and the association features in the past ornament control voice input information, the voice input information processing conditions applicable to the current control task event. Therefore, in step 212, a setting selection instruction is obtained that includes the association features between past control task events and past voice input information processing indications for different numbers of parts to be controlled, which provides a basis for the subsequent data processing and screening used to obtain ornament control voice input information samples suitable for the current control task event.
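A purely illustrative representation of such a setting selection instruction is sketched below; the layer thresholds and field names are assumptions used only to make the screening rules concrete:

    # Hypothetical setting selection instruction: one processing indication per layer,
    # each tied to a number of parts to be controlled, plus association features.
    setting_selection_instruction = {
        "layer_1": {"parts_to_control": 1, "action": "remove_if_fewer"},
        "layer_2": {"parts_to_control": 2, "action": "expand_fine_grained"},
        "layer_3": {"parts_to_control": 4, "action": "keep"},
        "layer_4": {"parts_to_control": 6, "action": "remove_if_more"},
        "association_features": ["matching_part_names", "paired_command_patterns"],
    }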
Step 213, obtaining a voice input information processing condition corresponding to each piece of past ornament control voice input information based on the number of controlled parts of the past ornament control execution strategy and the association features in the past ornament control voice input information, and processing the corresponding past ornament control voice input information according to the voice input information processing condition to obtain an ornament control voice input information sample, wherein the ornament control voice input information sample comprises an ornament control execution strategy sample corresponding to a control task event sample.
In step 213 of this alternative embodiment, based on the number of controlled parts of the past ornament control execution policies in the past ornament control voice input information set and the association features, a voice input information processing condition corresponding to each piece of past ornament control voice input information is obtained. The voice input information processing condition is a condition determined from the association features and the number of controlled parts involved, and is used to select and process the past ornament control voice input information. These conditions may include specific filtering rules, matching algorithms, or other relevant processing strategies. Based on the association features and the number of controlled parts, the data in the past ornament control voice input information set is analysed to determine which voice input information is relevant and applicable to the current control task event. In step 213, the corresponding past ornament control voice input information is then processed according to the voice input information processing conditions. This may involve data screening, feature extraction, pattern matching, and similar operations, so as to obtain ornament control voice input information samples that meet the requirements of the current control task event. By processing the voice input information in this way, the past data can be better matched and correlated with the current control task, providing more accurate and useful training samples for the control task event recognition algorithm. Therefore, in step 213, the voice input information processing conditions corresponding to each piece of past ornament control voice input information are obtained according to the number of controlled parts and the association features in the past ornament control voice input information set, and the past ornament control voice input information is processed to obtain ornament control voice input information samples applicable to the current control task event.
Firstly, a historical data set is established by obtaining a past ornament control voice input information set comprising the corresponding past ornament control execution strategies, for analyzing and learning from past control task events and the corresponding control execution strategies. This allows valuable knowledge and information to be drawn from past experience. Then, by obtaining the setting selection instruction, the association features and matching rules are determined so as to select and process the past ornament control voice input information. The advantage is that past data related to the current control task event can be screened out according to specific conditions and requirements, providing accuracy and pertinence for subsequent analysis and processing. Finally, based on the number of controlled parts involved and the association features, the past ornament control voice input information is processed according to the voice input information processing conditions. This includes data screening, feature extraction, pattern matching and so on, in order to obtain ornament control voice input information samples that meet the requirements of the current control task event. The purpose of this step is to improve the quality and applicability of the data samples related to the current task. In summary, through the technical operations of steps 211 to 213, a historical data set can be established and valuable information extracted from past experience; the association features and matching rules are determined according to the setting selection instruction, enabling screening and matching of the data; and the past ornament control voice input information is processed based on the processing conditions to obtain samples applicable to the current control task event. These technical effects provide more accurate, specific, and targeted training samples for the control task event recognition algorithm, which improves the performance and accuracy of the algorithm and promotes the optimization and improvement of the ornament control system.
In some examples, the setting selection instruction includes a first-layer past voice input information processing indication, a second-layer past voice input information processing indication, a third-layer past voice input information processing indication, and a fourth-layer past voice input information processing indication. On this basis, in step 213, obtaining the voice input information processing conditions corresponding to each piece of past ornament control voice input information based on the number of controlled parts of the past ornament control execution policy and the association features in the past ornament control voice input information, and processing the past ornament control voice input information according to the voice input information processing conditions to obtain an ornament control voice input information sample, where the ornament control voice input information sample includes an ornament control execution strategy sample corresponding to a control task event sample, includes steps 2131, 2132, or 2133.
Step 2131, when the number of controlled parts corresponding to the past ornament control execution policy in the past ornament control voice input information is smaller than the number of first to-be-controlled parts corresponding to the first-layer past voice input information processing indication, and/or when the number of controlled parts corresponding to the past ornament control execution policy in the past ornament control voice input information is greater than the number of fourth to-be-controlled parts corresponding to the fourth-layer past voice input information processing indication, removing the past ornament control voice input information.
Step 2132, when the past ornament control execution policy in the past ornament control voice input information matches the number of second to-be-controlled parts corresponding to the second-layer past voice input information processing indication, obtaining a set reconstruction index, performing voice fine-granularity expansion on part of the past ornament control voice input information based on the set reconstruction index to obtain past ornament control voice input information after voice fine-granularity expansion, and adding the expanded past ornament control voice input information to the ornament control voice input information sample set.
Step 2133, when the past ornament control execution policy in the past ornament control voice input information matches the number of third to-be-controlled parts corresponding to the third-layer past voice input information processing indication, adding the past ornament control voice input information to the ornament control voice input information sample set.
For example, for the past voice input information processing indications of the different layers:
the first-layer past voice input information processing indication may define the number of first parts to be controlled. If the number of controlled parts corresponding to the ornament control execution strategy in the past ornament control voice input information is smaller than the number of first parts to be controlled corresponding to the first-layer past voice input information processing indication, the past ornament control voice input information is removed;
the fourth-layer past voice input information processing indication may define the number of fourth parts to be controlled. If the number of controlled parts corresponding to the ornament control execution strategy in the past ornament control voice input information is larger than the number of fourth parts to be controlled corresponding to the fourth-layer past voice input information processing indication, the past ornament control voice input information is removed;
the second-layer past voice input information processing indication may specify the number of second parts to be controlled. If the ornament control execution strategy in the past ornament control voice input information matches the number of second parts to be controlled corresponding to the second-layer past voice input information processing indication, a set reconstruction index is calculated and used to perform voice fine-granularity expansion on part of the past ornament control voice input information, yielding expanded past ornament control voice input information. The expanded information can form the ornament control voice input information sample set;
similarly, the third-layer past voice input information processing indication may specify the number of third parts to be controlled. If the ornament control execution strategy in the past ornament control voice input information matches the number of third parts to be controlled corresponding to the third-layer past voice input information processing indication, the past ornament control voice input information is also added to the ornament control voice input information sample set.
Through these past voice input information processing indications of different layers, ornament control voice input information samples related to the current control task event can be obtained according to the number of controlled parts and the association features. Such processing helps to improve the quality, applicability, and accuracy of the data, providing useful samples for training and optimization of the control task event recognition algorithm.
In step 2131, the past ornament control voice input information is processed based on a comparison between the numbers of controlled parts specified by the first-layer and fourth-layer past voice input information processing indications in the setting selection instruction and the number of controlled parts involved in the information itself. Specifically, if the number of controlled parts corresponding to the past ornament control execution policy in the past ornament control voice input information is smaller than the number of first to-be-controlled parts corresponding to the first-layer past voice input information processing indication, or if it is larger than the set number of fourth to-be-controlled parts corresponding to the fourth-layer past voice input information processing indication, the past ornament control voice input information is removed from the sample set. For example, assume that the first-layer past voice input information processing indication specifies that the number of first parts to be controlled is 3, while the ornament control execution strategy of some past ornament control voice input information involves only 2 parts. According to the setting selection instruction, the number of controlled parts is found to be smaller than the number of first parts to be controlled corresponding to the first-layer indication; in this case the past ornament control voice input information is removed because it does not meet the set condition. Similarly, if the number of parts involved in the ornament control execution strategy exceeds the set number of fourth parts to be controlled corresponding to the fourth-layer indication, the past ornament control voice input information is also removed. Through the processing in step 2131, past ornament control voice input information whose number of controlled parts does not meet the requirements can be eliminated, improving the quality and applicability of the sample set. This helps to ensure that the data used is more relevant and accurate with respect to the current control task event.
In step 2132, the past ornament control voice input information is processed according to the match between the second-layer past voice input information processing indication in the setting selection instruction and the number of controlled parts involved. Specifically, when the past ornament control execution strategy in the past ornament control voice input information matches the number of second parts to be controlled corresponding to the second-layer indication, a set reconstruction index is calculated, and voice fine-granularity expansion is performed on part of the past ornament control voice input information using this index, producing expanded past ornament control voice input information. The expanded information then forms part of the ornament control voice input information sample set. For example, assume that the second-layer past voice input information processing indication specifies that the number of second parts to be controlled is 2, and the ornament control execution strategy of an existing piece of past ornament control voice input information involves exactly 2 parts, matching the second-layer indication. In this case, the set reconstruction index is calculated and used to perform voice fine-granularity expansion on the past ornament control voice input information. Through the expansion, more detailed and fine-grained voice input information can be obtained, increasing the diversity and coverage of the sample set. The processing in step 2132 therefore enriches the sample set with more comprehensive and specific information, which helps the training and optimization of the algorithm to better accommodate a variety of different control task events.
In step 2133, the past ornament control voice input information is processed according to the match between the third-layer past voice input information processing indication in the setting selection instruction and the number of controlled parts involved. Specifically, when the past ornament control execution strategy in the past ornament control voice input information matches the number of third parts to be controlled corresponding to the third-layer indication, the past ornament control voice input information is added to the ornament control voice input information sample set. For example, assume that the third-layer past voice input information processing indication specifies that the number of third parts to be controlled is 4, and the number of parts involved in the ornament control execution strategy of an existing piece of past ornament control voice input information exactly matches this indication. In this case, the past ornament control voice input information is incorporated into the ornament control voice input information sample set. Through the processing in step 2133, the past ornament control voice input information matching the third-layer past voice input information processing indication can be collected to form the sample set. Such processing helps to extract samples related to a specific number of parts, providing more targeted training data for the control task event recognition algorithm.
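To make the layered screening of steps 2131-2133 concrete, a simplified sketch is given below; the thresholds mirror the hypothetical setting selection instruction shown earlier and are arbitrary, and the expansion step is reduced to a trivial duplication placeholder, which is an assumption rather than the patent's voice fine-granularity expansion method:

    def build_sample_set(past_records, instruction):
        # past_records: list of dicts with "voice_input" and "parts" (number of controlled parts).
        samples = []
        low = instruction["layer_1"]["parts_to_control"]
        high = instruction["layer_4"]["parts_to_control"]
        expand_at = instruction["layer_2"]["parts_to_control"]
        keep_at = instruction["layer_3"]["parts_to_control"]
        for rec in past_records:
            n = rec["parts"]
            if n < low or n > high:          # step 2131: remove out-of-range records
                continue
            if n == expand_at:               # step 2132: fine-granularity expansion (placeholder)
                samples.extend([rec, dict(rec, voice_input=rec["voice_input"] + " (expanded)")])
            elif n == keep_at:               # step 2133: keep the record as-is
                samples.append(rec)
        return samples

    samples = build_sample_set(
        [{"voice_input": "rotate the ball", "parts": 2},
         {"voice_input": "light every part", "parts": 7}],
        {"layer_1": {"parts_to_control": 1}, "layer_2": {"parts_to_control": 2},
         "layer_3": {"parts_to_control": 4}, "layer_4": {"parts_to_control": 6}})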
Further, there is also provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the above-described method.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus and method embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present invention may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a network device, or the like) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. The ornament voice recognition method based on artificial intelligence is characterized by being applied to an AI voice recognition processing system and comprising the following steps:
acquiring control voice input information of the to-be-identified ornament; loading the to-be-identified ornament control voice input information to a control task event identification algorithm, and carrying out voice description vector mining on the to-be-identified ornament control voice input information through a first voice description vector mining branch set of the control task event identification algorithm to obtain control task event demand description data, wherein the first voice description vector mining branch set comprises a synchronous control task event demand description mining component and a first feature aggregation component connected with each synchronous control task event demand description mining component;
loading the control task event demand description data into a second voice description vector mining branch set according to the control task event recognition algorithm to obtain control task relative distribution description data corresponding to a target control task event in the to-be-recognized object control voice input information, wherein the second voice description vector mining branch set comprises a synchronous control task relative distribution description mining component and a second feature aggregation component connected with each synchronous control task relative distribution description mining component;
and generating an ornament control execution strategy corresponding to the target control task event based on the control task relative distribution description data.
2. The method of claim 1, wherein performing voice description vector mining on the to-be-recognized ornament control voice input information through the first voice description vector mining branch set of the control task event recognition algorithm to obtain control task event demand description data comprises:
inputting the to-be-recognized ornament control voice input information into a first deep learning component, and extracting a first ornament control voice semantic vector;
loading the first ornament control voice semantic vector into a synchronized first moving average component and first feature downsampling component, the first deep learning component, the first moving average component, and the first feature downsampling component comprising a semantic refinement process;
generating an ornament control voice moving average vector through the first moving average component, and generating a local structure control logic feature through the first feature downsampling component;
and carrying out AI knowledge fusion on the ornament control voice moving average vector and the local structure control logic feature through the first feature aggregation component to obtain the control task event demand description data.
3. The method of claim 1, wherein loading the control task event demand description data into a second speech description vector mining branch set according to the control task event recognition algorithm to obtain control task relative distribution description data corresponding to a target control task event in the to-be-recognized ornament control speech input information, comprises:
inputting the control task event demand description data into a second deep learning component, and extracting to obtain a second ornament control voice semantic vector;
loading the second ornament control voice semantic vector into a synchronized second moving average component, third moving average component, and second feature downsampling component, the second moving average component and the third moving average component differing in the scale of voice description vector mining, and the second deep learning component, the second moving average component, the third moving average component, and the second feature downsampling component comprising a semantic refinement process;
generating a first relative distribution moving average vector by the second moving average component, a second relative distribution moving average vector by the third moving average component, and a relative distribution upstream and downstream feature by the second feature downsampling component;
And carrying out AI knowledge fusion on the first relative distribution moving average vector, the second relative distribution moving average vector and the relative distribution upstream and downstream features through the second feature aggregation component to obtain the control task relative distribution description data.
4. The method of claim 1, wherein generating the ornament control execution policy corresponding to the target control task event based on the control task relative distribution description data comprises:
determining an event state of a target control task event in the to-be-identified ornament control voice input information based on the control task relative distribution description data;
determining the number of to-be-controlled parts of the target control task event based on the event state of the target control task event;
determining key part distribution characteristics of the target control task event based on the control task relative distribution description data;
and determining an ornament control execution strategy and an ornament control expected trajectory of the target control task event based on the number of the parts to be controlled and the key part distribution characteristics.
5. The method of claim 1, wherein the method of debugging the control task event recognition algorithm comprises:
acquiring an ornament control voice input information sample set, wherein the ornament control voice input information samples in the ornament control voice input information sample set comprise ornament control execution strategy samples corresponding to control task event samples;
loading the ornament control voice input information samples in the ornament control voice input information sample set into a to-be-debugged control task event recognition algorithm, wherein the to-be-debugged control task event recognition algorithm performs voice description vector mining on the ornament control voice input information samples through a first voice description vector mining branch set to obtain debugging control task event demand description data, and the first voice description vector mining branch set comprises a synchronous control task event demand description mining component and a first feature aggregation component connected with each synchronous control task event demand description mining component;
loading the debugging control task event demand description data into a second voice description vector mining branch set through the to-be-debugged control task event recognition algorithm to obtain debugging control task relative distribution description data corresponding to the control task event sample in the ornament control voice input information sample, wherein the second voice description vector mining branch set comprises a synchronous control task relative distribution description mining component and a second feature aggregation component connected with each synchronous control task relative distribution description mining component;
and optimizing algorithm variables of the to-be-debugged control task event recognition algorithm based on the debugging control task relative distribution description data and the ornament control execution strategy sample until the debugging control task relative distribution description data generated by the to-be-debugged control task event recognition algorithm meets the steady state requirement, so as to obtain the debugged control task event recognition algorithm.
6. The method of claim 5, wherein the method further comprises:
acquiring a priori debugging learning basis corresponding to an ornament control execution strategy sample corresponding to the control task event sample in the ornament control voice input information sample;
determining a general variable corresponding to a control task event recognition algorithm based on the prior debugging learning basis;
and adjusting algorithm variables of the control task event recognition algorithm based on the universal variables to obtain the control task event recognition algorithm to be debugged.
7. The method of claim 5, wherein obtaining the ornament control voice input information sample set, the ornament control voice input information samples in the ornament control voice input information sample set comprising ornament control execution strategy samples corresponding to control task event samples, comprises:
Acquiring a past ornament control voice input information set, wherein the past ornament control voice input information in the past ornament control voice input information set comprises corresponding past ornament control execution strategies;
acquiring a setting selection instruction, wherein the setting selection instruction comprises the correlation characteristics of past control task events and past voice input information processing instructions of different numbers of parts to be controlled;
based on the number of controlled parts of the past ornament control execution strategy in the past ornament control voice input information and the related characteristics, obtaining a voice input information processing condition corresponding to each piece of past ornament control voice input information, and processing the corresponding past ornament control voice input information according to the voice input information processing condition to obtain an ornament control voice input information sample, wherein the ornament control voice input information sample comprises an ornament control execution strategy sample corresponding to a control task event sample.
8. The method of claim 7, wherein the setting selection instruction includes a first-layer past voice input information processing instruction, a second-layer past voice input information processing instruction, a third-layer past voice input information processing instruction, and a fourth-layer past voice input information processing instruction, and the obtaining a voice input information processing condition corresponding to each piece of past ornament control voice input information based on the number of controlled parts of the past ornament control execution policy and the related characteristics, and processing the corresponding past ornament control voice input information according to the voice input information processing condition to obtain an ornament control voice input information sample, the ornament control voice input information sample including an ornament control execution strategy sample corresponding to a control task event sample, comprises:
Removing the past ornament control voice input information when the number of the control parts corresponding to the past ornament control execution strategy in the past ornament control voice input information is smaller than the number of first to-be-controlled parts corresponding to the first-layer past voice input information processing instruction and/or when the number of the control parts corresponding to the past ornament control execution strategy in the past ornament control voice input information is larger than the number of fourth to-be-controlled parts corresponding to the fourth-layer past voice input information processing instruction;
when the past ornament control execution strategy in the past ornament control voice input information is matched with the number of the second to-be-controlled parts corresponding to the second layer of past ornament control voice input information processing instruction, a set reconstruction index is obtained, voice fine granularity expansion is carried out on part of the past ornament control voice input information based on the set reconstruction index, the past ornament control voice input information after the voice fine granularity expansion is obtained, and the past ornament control voice input information after the voice fine granularity expansion forms the ornament control voice input information sample set;
and when the past ornament control execution strategy in the past ornament control voice input information is matched with the number of the third to-be-controlled parts corresponding to the third-layer past voice input information processing instruction, forming the past ornament control voice input information into the ornament control voice input information sample set.
9. An AI speech recognition processing system, comprising a processor and a memory; the processor is communicatively connected to the memory, the processor being configured to read a computer program from the memory and execute the computer program to implement the method of any of claims 1-8.
10. A computer readable storage medium, characterized in that a program is stored thereon, which program, when being executed by a processor, implements the method of any of claims 1-8.
Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105161098A (en) * 2015-07-31 2015-12-16 北京奇虎科技有限公司 Speech recognition method and speech recognition device for interaction system
CN113488028A (en) * 2021-06-23 2021-10-08 中科极限元(杭州)智能科技股份有限公司 Speech transcription recognition training decoding method and system based on rapid skip decoding
CN113723442A (en) * 2021-07-08 2021-11-30 华中科技大学 Electronic nose gas identification method and system, electronic equipment and storage medium
CN114360515A (en) * 2021-12-09 2022-04-15 北京声智科技有限公司 Information processing method, information processing apparatus, electronic device, information processing medium, and computer program product
CN114974230A (en) * 2022-05-09 2022-08-30 北京声智科技有限公司 Voice recognition method, device, system, electronic equipment and storage medium
CN116052648A (en) * 2022-08-03 2023-05-02 荣耀终端有限公司 Training method, using method and training system of voice recognition model
CN116884402A (en) * 2023-08-28 2023-10-13 北京有竹居网络技术有限公司 Method and device for converting voice into text, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liang Xiaohui: "Research on Time Series Classification Methods Fusing Frequency-Domain Information", China Master's Theses Full-text Database, Information Science and Technology Series, 15 February 2022 (2022-02-15) *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination