CN118092652A - Man-machine interaction control method based on acoustic wearable equipment - Google Patents

Man-machine interaction control method based on acoustic wearable equipment

Info

Publication number
CN118092652A
CN118092652A (application CN202410227238.1A)
Authority
CN
China
Prior art keywords
signal
microphone
signals
chirp
frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410227238.1A
Other languages
Chinese (zh)
Inventor
曲雯毓
吴原原
马仁杰
佟鑫宇
陈建成
钟臻哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202410227238.1A priority Critical patent/CN118092652A/en
Publication of CN118092652A publication Critical patent/CN118092652A/en
Pending legal-status Critical Current

Landscapes

  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a human-machine interaction control method based on acoustic wearable devices, belonging to the technical field of indoor motion tracking. Two loudspeakers of an audio system serve as signal transmitters that emit pre-modulated FMCW signals; these signals are received by microphones in mobile devices such as smartphones and headsets, and the distance, angle, and velocity information fused into the signals during transmission is resolved, finally realizing motion tracking and human-machine interaction control based on acoustic wearable devices. With this method, the system can track the position and orientation of the acoustic mobile device, mapping the user's actions into system input and enabling a novel human-machine interaction mode in intelligent scenarios such as AR and VR.

Description

Man-machine interaction control method based on acoustic wearable equipment
Technical Field
The invention relates to the technical field of indoor motion tracking methods, in particular to a human-computer interaction control method based on acoustic wearable equipment.
Background
The Internet of Things (IoT) plays an important role in current and next-generation information, network, and communication development and applications, and is of extraordinary significance to the advancement of human society. With the increasing demand of modern applications for location information, Location-Based Services (LBS) have become one of the most central intelligent services in the IoT field. Diversified LBS applications highlight the importance of acquiring accurate real-time location knowledge, both indoors and outdoors. Outdoors, the Global Positioning System (GPS) is mature, with positioning accuracy generally below 5 meters and in some cases at the sub-meter level. Indoors, however, GPS often fails due to building occlusion and complex multipath propagation. Indoor positioning thus remains an open challenge that lacks a mature wireless solution. To fill this gap, academia and industry have invested substantial research in innovative indoor positioning approaches, including wireless-signal-based positioning, inertial navigation systems, and visual positioning. These studies aim to improve the accuracy and reliability of indoor positioning and to promote the development and application of indoor positioning technology.
More accurate positioning techniques can support richer LBS applications, such as endowing mobile devices with location and direction awareness to enable innovative human-machine interactions. Traditional indoor positioning techniques can be classified into passive and active positioning according to whether the user carries a device. Passive positioning, also known as device-free tracking, senses and analyzes how wireless signals change after reflection by the human body and the environment, thereby inferring the location and behavior of the target. Indoor passive tracking research is still at an early stage, and its accuracy and robustness remain poor, making industrialization difficult. Active tracking, also known as device-based tracking, relies on the exchange of information between a mobile device carried by the user (e.g., a cell phone, smart watch, or tablet) and a base station, i.e., the active participation of the mobile terminal is used to locate and track the target. Active positioning offers a highly reliable signal link and extremely strong tracking robustness.
In indoor scenes, typical applications such as smart home, somatosensory games, virtual reality VR and augmented reality AR put higher demands on various man-machine interaction control technologies, wherein the key is how to realize high-precision motion pose tracking. Recent work has focused on using acoustic signals to perform indoor positioning, which has two distinct advantages:
1) Acoustic positioning and tracking are highly accurate. Among wireless-sensing methods, electromagnetic signals such as WiFi, Bluetooth, RFID, and UWB propagate at the speed of light and typically achieve only decimeter-level accuracy, whereas an acoustic signal is a mechanical wave propagating at only about 340 m/s, allowing millimeter-level positioning accuracy.
2) Acoustic hardware is inexpensive and widely deployed. While WiFi, Bluetooth, and infrared-camera hardware costs from hundreds to thousands of yuan, speakers and microphones usually cost only tens of yuan. Moreover, acoustic components are already integrated into mobile phones, earphones, watches, and other wearable smart devices, giving an acoustic motion tracking system strong extensibility and commercial deployment potential.
Based on the above, the invention provides a novel man-machine interaction control method based on the acoustic wearable device by utilizing commercial sound equipment and intelligent wearable mobile equipment which are common in daily life, and a prototype system is completed by utilizing commercial equipment.
Disclosure of Invention
The invention aims to provide a man-machine interaction control method based on an acoustic wearable device to solve the problems in the background technology. The invention realizes an indoor active tracking scheme based on commercial sound equipment and mobile acoustic intelligent equipment, thereby providing various novel interaction modes for users and machines.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a man-machine interaction control method based on an acoustic wearable device comprises the following steps:
S1, designing a dual-link code division multiplexing signal model: a code division signal model is designed from the perspective of signal shape, with a dedicated encoder and decoder; its core is the x-chirp signal, which separates a pair of signals overlapping in both the time and frequency domains, realizing concurrent transmission and analysis of the two loudspeaker channels;
S2, motion characteristic joint estimation: introducing an arrival angle ambiguity estimation mechanism, providing a joint estimation algorithm based on signal second-order modulation, analyzing a received signal, reducing the influence of Doppler effect, and enhancing the estimation of distance and arrival angle motion parameters;
S3, fusion tracking of the position and the direction: based on S1-S2, motion feature perception under a microphone array coordinate system in the mobile device is obtained through calculation, and then a geometric relation is established according to microphone layout and use characteristics of different devices to calculate the position and the direction of the mobile device under a world coordinate system.
Preferably, the S1 specifically includes the following:
S1.1, designing an encoder to support dual-link orthogonal coding: the left and right speakers simultaneously transmit FMCW signals with opposite slopes; the frequency of the FMCW signal from the left speaker increases linearly with time (the up chirp), while the frequency of the FMCW signal from the right speaker decreases linearly (the down chirp); the signal transmitted in each time slot is represented as:

s_up(t) = cos(2π(f₀t + Bt²/(2T))),  s_down(t) = cos(2π(f₁t − Bt²/(2T))),  0 ≤ t < T

where f₀, f₁, B, T denote the start frequency, end frequency, bandwidth, and duration of the chirp signal, respectively, with f₁ − f₀ = B;
After experiencing signal time-of-flight delays τ_l and τ_r over the left and right links, respectively, the microphone receives a mixture of the two links:

r(t) = a_l·s_up(t − τ_l) + a_r·s_down(t − τ_r)

where a_l and a_r represent the amplitudes of the left-link and right-link signals, respectively;
S1.2, designing a decoder for the mixed signal to separate the two links: first, the decoder is shown to satisfy the autocorrelation and orthogonality requirements of CDMA. When the up chirp is modulated with itself using the product-to-sum identity:

F_low(s_up(t) ⊗ s_up(t − τ)) = ½·cos(2π(Bτ/T)t + φ)

where F_low(·) denotes low-pass filtering, ⊗ denotes signal mixing (pointwise multiplication), and φ is a constant phase; the calculation yields a single-tone signal of frequency Bτ/T, which shows a peak on the spectrogram, so autocorrelation is satisfied. Mutual modulation of the up chirp and the down chirp instead yields, after low-pass filtering, a component whose frequency sweeps across the whole 0–B band; this produces white-noise-like energy distributed uniformly over the spectrum, so orthogonality is satisfied;
S1.3, mixing the received signal with the up chirp and the down chirp respectively using the product-to-sum identity, passing the results through a low-pass filter, and converting them into complex signals via the Hilbert transform, to obtain two mixed signals corresponding to the left and right links:

m_l(t) = (a_l/2)·e^{j2π(Bτ_l/T)t + jφ_l} + ⟨·⟩_l,  m_r(t) = (a_r/2)·e^{j2π(Bτ_r/T)t + jφ_r} + ⟨·⟩_r

where the ⟨·⟩ parts represent the orthogonal components, which are white-noise-like; their energy is uniformly distributed and has no effect on the peaks of the mixed-signal spectra.
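The encode/decode pipeline of S1.1–S1.3 can be sketched numerically. The simulation below is a minimal sketch under assumed parameters (96 kHz sampling, integer-sample delays, a SciPy Butterworth low-pass standing in for F_low, FFT peak picking instead of the Hilbert-transform step); it shows the up-chirp link producing a beat peak while the concurrent down-chirp link contributes only spread energy.

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Prototype-like parameters (18-22 kHz chirps, 0.08 s); the 96 kHz
# sample rate and the delays below are illustrative assumptions.
fs = 96_000
f0, f1, T = 18_000.0, 22_000.0, 0.08
B = f1 - f0
t = np.arange(int(T * fs)) / fs

# Encoder: up chirp (left speaker) and down chirp (right speaker).
up = np.cos(2 * np.pi * (f0 * t + B * t**2 / (2 * T)))
down = np.cos(2 * np.pi * (f1 * t - B * t**2 / (2 * T)))

# Received mixture: both links overlap in time and frequency,
# each with its own time-of-flight delay (in samples) and amplitude.
d_l, d_r = 400, 700
rx = 0.8 * np.roll(up, d_l) + 0.6 * np.roll(down, d_r)

# Decoder for the left link: mix with the up-chirp reference,
# then low-pass filter to keep only the beat component.
b, a = butter(4, 2_000 / (fs / 2))
left = filtfilt(b, a, rx * up)

# The left link shows a beat peak near (B/T) * tau_l, while the
# right (down-chirp) link contributes only spread, noise-like energy.
spec = np.abs(np.fft.rfft(left))
freqs = np.fft.rfftfreq(len(left), 1 / fs)
spec[freqs < 100] = 0.0           # ignore the DC region
peak = freqs[np.argmax(spec)]
expected = (B / T) * (d_l / fs)   # theoretical beat frequency
```

With the illustrative delay above, the detected peak lies close to the theoretical beat frequency (B/T)·τ_l of roughly 208 Hz, within the FFT's bin spacing.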
Preferably, the S2 specifically includes the following:
S2.1, considering a linear microphone array and taking the position of the first microphone as the reference point, the delay caused by the distance d between the sound source and the microphone is represented as:

τ_d = d/c

where c ≈ 340 m/s is the speed of sound;
assuming that adjacent microphones are spaced r apart and the target is at azimuth θ, the signal received by the i-th microphone travels an additional distance (i−1)·r·cosθ compared with the 1st microphone in the array, so the delay caused by the angle of arrival θ is expressed as:

τ_θ^(i) = (i−1)·r·cosθ / c;
S2.2, to account for the influence of the Doppler effect on signal frequency, the triangular chirp is incorporated into the signal model, i.e., the speaker transmits the inverse chirp in the adjacent time slot. For illustration, the signal is modeled from a single path: when the transceivers move away from each other, the received frequency decreases, which is equivalent on the spectrogram to a rightward shift of the up chirp and a leftward shift of the down chirp; the equivalent flight time induced by the Doppler velocity is expressed as:

τ_v = (f_T − f_R)·T / B

where f_R and f_T denote the frequencies of the received and transmitted signals, respectively;
S2.3, based on S2.2, the equivalent time delays of the up-chirp and down-chirp received signals at the i-th microphone are, respectively:

τ_up^(i) = τ_d + τ_θ^(i) + τ_v,  τ_down^(i) = τ_d + τ_θ^(i) − τ_v

A second signal modulation is performed on two consecutive segments of the triangular signal sequence to cancel the signal phase shift caused by the Doppler effect: the up-chirp mixed signal is mixed with the sequence-flipped version of the down-chirp mixed signal, giving the second-order mixed signal S_i(t) of the i-th microphone, whose beat depends only on τ_up^(i) + τ_down^(i) = 2(τ_d + τ_θ^(i)), free of the Doppler term;
S2.4, performing joint estimation in a search space; specifically, a joint estimator is first constructed from the theoretical form of the second-order mixed signal:

G_i(d̂, θ̂; t) = exp( j2π·(2B/T)·( d̂ + (i−1)·r·cos θ̂ )/c · t )

where d̂ and θ̂ represent the candidate estimates of the distance and azimuth, respectively;

S2.5, removing an accurate joint estimator from the measured samples of the second-order mixed signal yields a signal with phase value 0 (i.e., S_i(t)·G_i*(d̂, θ̂; t) has constant phase); thus, the final optimization objective is:

(d*, θ*) = argmax_{d̂, θ̂} | Σ_i ∫₀ᵀ S_i(t)·G_i*(d̂, θ̂; t) dt |
preferably, the S3 specifically includes the following:
S3.1, for position tracking, the distances between the mobile device and the left and right speakers, d̂_l and d̂_r, are estimated based on the joint estimator in S2, and the position of the device is then solved through the triangulation relation; e.g., with the left speaker at the origin and the right speaker at (D, 0), the position (x, y) is expressed as:

x = (d̂_l² − d̂_r² + D²) / (2D),  y = √(d̂_l² − x²)

where D represents the distance between the left and right speakers;
S3.2, for direction tracking, designing different device direction estimation modes for different microphone layouts and application characteristics based on the estimated arrival angle characteristics of the signals, wherein the device direction estimation modes specifically comprise:
For a headset-based face attention detection application scenario, detecting a face orientation of a user in a horizontal direction using left and right ANC microphones as a set of microphone arrays; detecting a face orientation of a user in a vertical direction using an ANC microphone and a voice microphone of the left ear;
For an air mouse application program based on a mobile phone, detecting the pitch angle of the mobile phone by using an upper voice microphone and a lower voice microphone of the mobile phone, and setting a threshold of 60 degrees as a mark for executing a mouse click event of the mobile phone.
Preferably, a set of intelligent control programs based on the acoustic wearable device is designed based on the control method, wherein the programs comprise a headset-based attention detection module and a smart phone-based air mouse module.
Compared with the prior art, the invention provides a human-computer interaction control method based on an acoustic wearable device, which has the following beneficial effects:
(1) The invention realizes an indoor human-computer interaction control method based on an acoustic wearable device. According to the technical method, the system can track the position and direction information of the acoustic mobile equipment, so that the action information of the user is mapped into the input of the system, and a novel man-machine interaction mode is realized in intelligent scenes such as AR, VR and the like.
(2) The system realized by the invention has extremely strong expansibility and commercial deployment potential. Currently, televisions, computers, etc. on the market are equipped with two or more speakers, and most intelligent wearable devices are equipped with microphone components, such as smart phones, smart watches, headsets, smart glasses, etc., which can be used as controllers of audio-visual systems.
(3) The invention provides a novel FMCW-based code division multiplexing signal model, which realizes dual-channel concurrent transmission through a unique signal coding and decoding technology, so that the whole time and bandwidth resources are distributed to two loudspeakers, and the tracking precision of an FMCW system is improved from a physical aspect.
(4) The invention designs a time-frequency domain fusion multi-resolution signal characteristic analysis method. The method combines a triangular chirp signal model, solves the ambiguity in the range-speed-angle-of-arrival estimation by a novel second-order hybrid technology and a multi-resolution joint estimation technology, and improves the tracking precision of an FMCW system from an algorithm level.
(5) The present invention has devised a number of experiments in a number of scenarios to verify the capabilities of the system. The invention implements a set of facial attention detection programs and air mouse programs using commercial speakers, headphones, and smartphones.
Drawings
Fig. 1 is a flowchart of a man-machine interaction control method based on an acoustic wearable device;
fig. 2 is a schematic diagram of a dual-link code division multiplexing signal model in embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the signal delay mentioned in embodiment 1 of the present invention;
FIG. 4 is a schematic diagram of the triangular chirp model mentioned in embodiment 1 of the present invention;
FIG. 5 is a schematic diagram of the position and direction tracking mentioned in embodiment 1 of the present invention;
fig. 6 is a view showing the effect of detecting the attention of a face based on a headphone as mentioned in embodiment 1 of the present invention;
Fig. 7 is an air mouse effect diagram based on a smart phone according to embodiment 1 of the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The embodiments described are only some, not all, of the embodiments of the present invention.
The invention aims to provide a human-machine interaction control method using acoustic wearable devices. The basic principle is that the two speakers of an audio system transmit mutually orthogonal FMCW signals, which are received and analyzed by the microphone assembly in an acoustic wearable device; by tracking the mobile device, the motion of the human body is tracked, realizing intelligent human-machine interaction.
Term interpretation:
Human-computer interaction (HCI) technology studies how humans and machines exchange information, including machines understanding user behavior, designing user interfaces, and improving the operating experience. With the advent of the intelligent Internet of Things, new intelligent devices and sensors emerge endlessly; these devices provide users with more interaction interfaces and data-input modes, enabling more intelligent and personalized interaction experiences, for example human-machine interaction through voice control, gesture recognition, and biometric recognition.
Indoor motion tracking technology is a method for tracking indoor activities by utilizing technologies such as computer vision, sensors and the like so as to implement man-machine interaction. The method based on computer vision detects key points in the video through an AI model and transforms a coordinate system, thereby completing motion tracking. Sensor-based methods typically utilize IMUs (accelerometers, gyroscopes, magnetometers, etc.) or wireless sensing elements (WIFI, bluetooth, acoustic microphones, infrared cameras, etc.) to analyze motion data to perform tracking. Due to privacy security, dim light environment and accumulated error problems, and the integration of more and more sensors by intelligent wearable devices, tracking based on wireless sensors becomes a popular indoor motion tracking solution.
The multiple access multiplexing technique is a technique for transmitting a plurality of user data over the same communication channel. It allows multiple users to use shared communication resources simultaneously, improving the utilization of the communication channel. This is particularly important in wireless communication systems, which include a variety of multiple access multiplexing techniques such as Time Division Multiple Access (TDMA), frequency Division Multiple Access (FDMA), code Division Multiple Access (CDMA), and the like. Time division multiple access divides time into a plurality of time slots, and each user transmits data in different time slots; frequency division multiple access divides a frequency spectrum into different sub-channels, and each user transmits data in different frequency bands; code division multiple access uses different codes to distinguish between data of different users. The multiple access multiplexing technology is widely applied to the communication field and plays an important role in improving the communication efficiency and capacity.
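The CDMA idea described above, distinguishing users by code rather than by time slot or frequency band, can be shown with a minimal toy example using Walsh (Hadamard) codes; this is an illustration of the general principle, not the chirp-based coding used by the invention.

```python
import numpy as np

# Rows of a 4x4 Hadamard matrix are mutually orthogonal Walsh codes.
H = np.array([[1,  1,  1,  1],
              [1, -1,  1, -1],
              [1,  1, -1, -1],
              [1, -1, -1,  1]])

# Two users spread one bit each with their own code and transmit
# concurrently over the same channel (same time, same frequency).
tx = (+1) * H[1] + (-1) * H[2]

# Despreading: correlating with a user's own code recovers that
# user's bit; the other user's contribution integrates to zero.
bit_a = int(np.dot(tx, H[1]) / 4)
bit_b = int(np.dot(tx, H[2]) / 4)
```

Both bits are recovered exactly even though the two transmissions fully overlap, which is the resource-sharing property the invention exploits with its x-chirp codes.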
The Doppler effect is a physical phenomenon describing the change in wavelength and frequency caused by relative motion between the source and the observer. The difference between the transmitted and received frequencies caused by the Doppler effect, known as the Doppler shift, reveals how the wave properties vary with motion. The Doppler shift is proportional to the motion velocity:

Δf = (v/c)·f

where f is the emission frequency of the wave, c is the propagation velocity of the wave, and v is the relative velocity of the observer; Δf is positive when source and observer approach each other and negative when they recede. The Doppler shift is typically measured as the frequency difference between the transmitted wave and the echo to obtain the velocity of the target object.
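The Doppler-shift relation above can be written directly as a small helper; the 20 kHz tone and 0.5 m/s speed below are illustrative values, not parameters of the invention.

```python
def doppler_shift(f_tx, v, c=340.0):
    """Acoustic Doppler shift: delta_f = f * v / c.
    Positive v (approaching) gives a positive shift; negative v
    (receding) gives a negative shift."""
    return f_tx * v / c

# A 20 kHz tone and a device moving at 0.5 m/s toward the speaker:
df = doppler_shift(20_000.0, 0.5)
```

A sub-m/s hand or head motion thus already shifts an ultrasonic carrier by tens of hertz, which is why the estimation stage must explicitly disentangle velocity from distance.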
Based on the above, the human-computer interaction control method based on the acoustic wearable device provided by the invention is described below with reference to specific examples, and the specific examples are as follows.
Example 1:
To ensure the adaptability of the test environment, the invention builds a system prototype on a common desktop space of 2 m × 1 m using commercial speakers, a headset, and a smartphone. The two speakers are connected to a desktop computer and placed on both sides of a 27-inch display with a spacing of 0.7 m. Two orthogonal 18–22 kHz triangular chirps with a duration of 0.08 s are generated and played back cyclically from the two tracks. Because headset manufacturers do not expose the noise-reduction microphone interface, a YDM8Mic recording board is used to connect three microphones, which are attached at the left and right ANC microphone positions and the voice microphone position, a configuration consistent with commercial ANC earphones. In addition, an Android application is developed to transmit the recorded microphone data to the computer in real time via websocket. The desktop runs Python code to launch the attention detection and air mouse applications.
Referring to fig. 1, the invention provides a man-machine interaction control method using an acoustic wearable device, and the main flow of the man-machine interaction control method is divided into three parts of designing a dual-link code division multiplexing signal model, analyzing three-dimensional motion characteristics, and fusing and tracking positions and postures, and the specific contents are as follows.
(1) Dual-link code division multiplexing signal model: the basis of 2D tracking is estimating motion characteristics from the microphone's observations of two loudspeakers. In a multi-speaker system, existing sensing-signal transmission strategies fall into TDMA and FDMA, each with inherent limitations: i) TDMA sacrifices time efficiency, since each speaker queues for the full band and only one path's signal can be processed at a time; ii) FDMA sacrifices accuracy by reducing bandwidth, since different speakers are assigned different frequency bands to resolve multipath signals. One potential way to overcome these limitations is a CDMA scheme, which shares time and frequency resources among transmitters and thus avoids resource partitioning. The invention therefore designs a code division signal model from the perspective of signal shape, with a dedicated encoder and decoder. Its core is the x-chirp signal, which can separate a pair of signals overlapping in both time and frequency, realizing concurrent transmission and analysis of the two loudspeaker channels.
A dual speaker system requires two signals to independently carry motion data. In a CDMA scenario, the two signals share the same resources in terms of time and frequency allocation. Thus, their separation and modulation requires implementation of a well-designed encoding and decoding scheme. In general, CDMA imposes two specific requirements on signal coding:
1) Orthogonality: the coding sequences of different signals must be distinct and mutually orthogonal;
2) Autocorrelation: the signal code sequence exhibits a strong correlation with itself.
The invention designs an encoder to support dual-link orthogonal coding. Specifically, the two speakers simultaneously transmit FMCW signals with opposite slopes. The FMCW signal from the left speaker is characterized by a linear increase of frequency over time (up chirp), while the FMCW signal from the right speaker is characterized by a linear decrease (down chirp). The signal transmitted in each time slot can be expressed as:

s_up(t) = cos(2π(f₀t + Bt²/(2T))),  s_down(t) = cos(2π(f₁t − Bt²/(2T))),  0 ≤ t < T

where f₀, f₁, B, T denote the start frequency, end frequency, bandwidth, and duration of the chirp signal, respectively, with f₁ − f₀ = B. After experiencing signal time-of-flight delays τ_l and τ_r over the left and right links, respectively, the microphone receives a mixture of the two links:

r(t) = a_l·s_up(t − τ_l) + a_r·s_down(t − τ_r)

where a_l and a_r represent the amplitudes of the left-link and right-link signals, respectively.
The present invention then designs a decoder for the mixed signal to separate the two links. The decoder is first shown to meet the autocorrelation and orthogonality requirements of CDMA. When the up chirp (the down chirp satisfies the same conclusion) is modulated with itself using the product-to-sum identity:

F_low(s_up(t) ⊗ s_up(t − τ)) = ½·cos(2π(Bτ/T)t + φ)

where F_low(·) denotes low-pass filtering, ⊗ denotes signal mixing, and φ is a constant phase. The result of the above equation is a single-tone signal of frequency Bτ/T, which shows a peak on the spectrogram; autocorrelation is satisfied. Mutual modulation of the up chirp and the down chirp instead yields, after low-pass filtering, a component whose frequency sweeps across the whole 0–B band, producing white-noise-like energy distributed uniformly over the spectrum; orthogonality is thus satisfied. Therefore, the invention mixes the received signal with the up chirp and the down chirp respectively using the product-to-sum identity, passes the results through a low-pass filter, and converts them into complex signals via the Hilbert transform, obtaining two mixed signals corresponding to the left and right links:

m_l(t) = (a_l/2)·e^{j2π(Bτ_l/T)t + jφ_l} + ⟨·⟩_l,  m_r(t) = (a_r/2)·e^{j2π(Bτ_r/T)t + jφ_r} + ⟨·⟩_r

where the ⟨·⟩ portions represent the orthogonal components, which are white-noise-like; their energy is uniformly distributed and has no effect on the peaks of the mixed-signal spectra. This part of the principle is shown in fig. 2.
(2) Motion feature joint estimation: continuously tracking the position and orientation of the mobile device depends on fine-grained resolution of three-dimensional motion features. However, the Doppler effect introduces a frequency shift between the received and transmitted signals, creating ambiguity between the range and velocity estimates; in addition, the low resolution of frequency detection makes the distance estimate coarse-grained, introducing ambiguity into the angle-of-arrival estimate. The invention therefore proposes a joint estimation algorithm based on second-order signal modulation to analyze the received signal, mitigate the influence of the Doppler effect, and enhance the estimation of the distance and angle-of-arrival motion parameters. Specifically, for the Doppler effect, the triangular chirp is incorporated into the signal model, i.e., its two sides are mixed to perform a new second-order modulation that resolves the velocity-distance ambiguity; joint estimation is then further performed on the second-order mixed signal. Finally, this part outputs fine-grained distance and angle-of-arrival information.
During the propagation of the signal, it will be affected by distance, angle of arrival and speed, resulting in a delay τ of signal reception.
Considering a linear microphone array, as shown in fig. 3, with the position of the first microphone as the reference point, the delay caused by the distance d between the sound source and the microphone can be expressed as:

τ_d = d/c

where c ≈ 340 m/s is the speed of sound. Assuming adjacent microphones are spaced r apart and the target is at azimuth θ, the signal received by the i-th microphone travels an additional distance (i−1)·r·cosθ compared with the 1st microphone in the array, so the delay caused by the angle of arrival θ is expressed as:

τ_θ^(i) = (i−1)·r·cosθ / c
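The angle-of-arrival delay formula can be sketched directly; the 2 cm spacing and the angles below are illustrative assumptions, not parameters from the invention.

```python
import math

C = 340.0  # speed of sound in air, m/s

def aoa_delay(i, r, theta_deg, c=C):
    """Extra propagation delay at the i-th microphone (1-indexed)
    of a linear array, relative to microphone 1, for a far-field
    source at azimuth theta: (i - 1) * r * cos(theta) / c."""
    return (i - 1) * r * math.cos(math.radians(theta_deg)) / c

# Example: 2 cm spacing, source at 60 degrees azimuth.
tau_2 = aoa_delay(2, 0.02, 60.0)          # extra delay at microphone 2
tau_broadside = aoa_delay(3, 0.02, 90.0)  # broadside source: no extra path
```

At broadside (θ = 90°) the extra path length vanishes for every element, which is why the delay differences across the array encode the arrival angle.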
The effect of the Doppler effect on signal frequency is accounted for by incorporating the triangular chirp into the signal model, i.e., the loudspeaker transmits the inverse chirp in adjacent time slots. Since the two links are mutually orthogonal, the signal is modeled from a single path to simplify the analysis. When the transceivers move away from each other, the received frequency decreases, which on the spectrogram is equivalent to a rightward shift of the up chirp (i.e., increased signal flight time) and a leftward shift of the down chirp (i.e., decreased signal flight time). The equivalent flight time induced by the Doppler velocity is expressed as:

τ_v = (f_T − f_R)·T / B

where f_R and f_T denote the frequencies of the received and transmitted signals, respectively; the principle is shown in fig. 4.
Thus, the equivalent delays of the up-chirp and down-chirp received signals at the i-th microphone can be expressed as:

τ_up^(i) = τ_d + τ_θ^(i) + τ_v,  τ_down^(i) = τ_d + τ_θ^(i) − τ_v

Since the Doppler effect acts oppositely on the up chirp and the down chirp, a second signal modulation is performed on two consecutive segments of the triangular signal sequence to cancel the Doppler-induced phase shift: the up-chirp mixed signal is mixed with the sequence-flipped version of the down-chirp mixed signal, yielding the second-order mixed signal S_i(t) of the i-th microphone, whose beat depends only on τ_up^(i) + τ_down^(i) = 2(τ_d + τ_θ^(i)) and is free of the Doppler term.
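The cancellation achieved by the second-order mixing can be checked algebraically on the equivalent delays; the delay values below are hypothetical, chosen only to make the arithmetic concrete.

```python
# Hypothetical equivalent-delay components (seconds):
tau_d = 2.9e-3      # distance-induced delay (about 1 m of travel)
tau_theta = 2.9e-5  # angle-of-arrival delay at this microphone
tau_v = 1.2e-4      # Doppler-equivalent flight time

# Per the model, the Doppler term enters the two chirps with
# opposite signs:
tau_up = tau_d + tau_theta + tau_v
tau_down = tau_d + tau_theta - tau_v

# Second-order mixing adds the equivalent delays, so tau_v cancels
# and only distance and angle-of-arrival information remain:
tau_mixed = tau_up + tau_down
```

Whatever the motion speed, tau_mixed equals 2·(tau_d + tau_theta), which is what allows the subsequent joint estimator to search only over distance and azimuth.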
Then, joint estimation is performed within the search space. Specifically, the joint estimator is first constructed from the theoretical form of the second-order mixed signal:

G_i(d̂, θ̂; t) = exp( j2π·(2B/T)·( d̂ + (i−1)·r·cos θ̂ )/c · t )

where d̂ and θ̂ represent the candidate estimates of the distance and azimuth, respectively. If an accurate joint estimator is removed from the measured samples of the second-order mixed signal, a signal with phase value 0 is obtained (i.e., S_i(t)·G_i*(d̂, θ̂; t) has constant phase). Thus, the final optimization objective of the present invention is:

(d*, θ*) = argmax_{d̂, θ̂} | Σ_i ∫₀ᵀ S_i(t)·G_i*(d̂, θ̂; t) dt |
(3) Position and orientation fusion tracking: according to the steps, motion feature perception under a microphone array coordinate system in the mobile device is obtained through calculation, and then a special geometric relation is established according to microphone layout and use characteristics of different devices to calculate the position and the direction of the mobile device under a world coordinate system. Finally, in the invention, a set of intelligent control program based on the acoustic wearable device is realized, and the intelligent control program comprises a headset-based attention detection module and a smart phone-based air mouse module.
For position tracking, the distances between the mobile device and the two speakers can be estimated based on the joint estimator, and the location information of the device is then solved through the triangulation relation. The location of the device can be expressed as:
wherein D represents the distance between the left and right speakers.
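A minimal sketch of this triangulation relation (the coordinate convention is an assumption: left speaker at the origin, right speaker at (D, 0), and the device on the y ≥ 0 side):

```python
import math

def triangulate(d_left: float, d_right: float, D: float) -> tuple[float, float]:
    """Device position from its estimated distances to the two speakers.

    Intersecting the circles x^2 + y^2 = d_left^2 and
    (x - D)^2 + y^2 = d_right^2 gives x in closed form; y follows.
    """
    x = (d_left ** 2 - d_right ** 2 + D ** 2) / (2 * D)
    y = math.sqrt(max(d_left ** 2 - x ** 2, 0.0))
    return x, y

# A device 0.4 m to the right of the left speaker and 0.3 m in front,
# with speakers 1 m apart:
x, y = triangulate(0.5, math.hypot(0.6, 0.3), 1.0)
```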
For direction tracking, the angle-of-arrival characteristics of the signal have already been estimated; on this basis, different device direction estimation modes can be designed for different microphone layouts and application features. For the headset-based face attention detection scenario, the left and right ANC microphones are used as a microphone array to detect the face orientation of the user in the horizontal direction (i.e., panning left and right); in addition, the ANC microphone and the voice microphone of the left ear are used to detect the face orientation in the vertical direction (i.e., head up and down). For the smartphone-based air mouse application, the upper and lower voice microphones of the phone are used to detect its pitch angle, and a threshold of 60 degrees is set as the trigger for executing a mouse click event. This principle is shown in fig. 5.
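Each such microphone pair yields an orientation angle from the inter-microphone delay via the r·cos(θ) path-difference geometry of the array model. A minimal sketch (the spacing value and speed of sound are assumptions):

```python
import math

C = 343.0  # assumed speed of sound, m/s

def angle_from_delay(delta_tau: float, spacing: float) -> float:
    """Angle of arrival (rad) from the extra delay between two microphones,
    per the path-difference model delta_d = spacing * cos(theta):
    theta = arccos(C * delta_tau / spacing)."""
    ratio = C * delta_tau / spacing
    return math.acos(max(-1.0, min(1.0, ratio)))  # clamp against noise

# zero delay -> source broadside to the pair (90 degrees)
broadside = math.degrees(angle_from_delay(0.0, 0.012))
# maximum delay (sound travels the full spacing) -> end-fire (0 degrees)
endfire = math.degrees(angle_from_delay(0.012 / C, 0.012))
```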
The foregoing is only a preferred embodiment of the present invention, and the scope of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art according to the technical scheme and inventive concept of the present invention, within the scope disclosed by the present invention, shall be covered by the scope of the present invention.

Claims (4)

1. The human-computer interaction control method based on the acoustic wearable device is characterized by comprising the following steps of:
S1, designing a dual-link code division multiplexing signal model: designing a code division signal model from the perspective of the signal shape, wherein the model is provided with a dedicated encoder and decoder, and its core is an x-chirp signal used to separate a pair of signals overlapping in both the time domain and the frequency domain, so as to realize concurrent transmission and analysis of the two loudspeaker channels;
S2, motion characteristic joint estimation: introducing an arrival angle ambiguity estimation mechanism, providing a joint estimation algorithm based on signal second-order modulation, analyzing a received signal, reducing the influence of Doppler effect, and enhancing the estimation of distance and arrival angle motion parameters;
S3, fusion tracking of the position and the direction: based on S1-S2, motion feature perception under a microphone array coordinate system in the mobile device is obtained through calculation, and then a geometric relation is established according to microphone layout and use characteristics of different devices to calculate the position and the direction of the mobile device under a world coordinate system.
2. The human-computer interaction control method based on the acoustic wearable device according to claim 1, wherein the S1 specifically includes the following:
S1.1, designing an encoder to support dual-link orthogonal coding: setting the left speaker and the right speaker to simultaneously transmit FMCW signals with opposite slopes, wherein the frequency of the FMCW signal sent by the left speaker increases linearly with time and is denoted as the up chirp signal, and the frequency of the FMCW signal sent by the right speaker decreases linearly and is denoted as the down chirp signal; the signal transmitted in each time slot is represented as:
wherein f 0, f 1, B and T represent the start frequency, end frequency, bandwidth and duration of the chirp signal, respectively, and f 1 - f 0 = B;
after experiencing the respective signal time-of-flight delays of the left link and the right link, the microphone receives a mixed signal of the two links:
wherein the two amplitude coefficients represent the amplitudes of the left-link signal and the right-link signal, respectively;
S1.2, designing a decoder for the mixed signal to separate the two links: it is first proved that the decoder satisfies the autocorrelation and orthogonality requirements of CDMA; the up chirp signal is modulated with itself using the product-to-sum identity:
wherein F low (·) represents low-pass filtering, and the mixing operator represents signal mixing;
the calculation yields a single-tone signal whose frequency shows a peak on the spectrogram, so autocorrelation is satisfied; the mixing part obtained by modulating the up chirp signal and the down chirp signal with each other is:
by the above formula, white noise in the range of 0-kHz is generated, whose energy is uniformly distributed over the spectrum, so orthogonality is satisfied;
S1.3, mixing the received signal with the up chirp signal and the down chirp signal respectively using the product-to-sum identity, passing the results through a low-pass filter, and converting them into complex signals by the Hilbert transform, so as to obtain two mixed signals corresponding to the left link and the right link:
wherein the < · > part represents the orthogonal part, which is white noise; its energy is uniformly distributed over the spectrum and has no effect on the peaks of the mixed-signal spectrum.
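The two decoder properties stated in S1.2-S1.3 can be checked numerically. In the sketch below, the parameter values (an 18-23 kHz chirp of 40 ms, 96 kHz sampling so the sum-frequency products stay below Nyquist) are assumptions, and a crude FFT-bin mask stands in for the low-pass filter: self-mixing a delayed up chirp concentrates energy in a single spectral peak (autocorrelation), while mixing the up chirp with the down chirp spreads the energy out (orthogonality).

```python
import numpy as np

FS, T, F0, F1 = 96_000, 0.04, 18_000.0, 23_000.0  # assumed parameters
B = F1 - F0
t = np.arange(int(FS * T)) / FS

def up(t):   return np.cos(2 * np.pi * (F0 * t + B * t**2 / (2 * T)))
def down(t): return np.cos(2 * np.pi * (F1 * t - B * t**2 / (2 * T)))

tau = 0.002        # 2 ms flight time -> beat frequency B * tau / T = 250 Hz
rx = up(t - tau)   # received (delayed) up chirp

def lowpass_spectrum(x, cutoff=5_000.0):
    """Magnitude spectrum with bins above the cutoff zeroed (crude low-pass)."""
    X = np.abs(np.fft.rfft(x))
    keep = np.fft.rfftfreq(len(x), 1 / FS) <= cutoff
    return X * keep

auto = lowpass_spectrum(rx * up(t))     # self-mix: sharp peak at ~250 Hz
cross = lowpass_spectrum(rx * down(t))  # cross-mix: energy smeared (chirp)

freqs = np.fft.rfftfreq(len(t), 1 / FS)
peak_hz = freqs[np.argmax(auto)]
```

Here `peak_hz` lands at the 250 Hz beat frequency predicted by B·τ/T, and the peak-to-total-energy ratio of `auto` far exceeds that of `cross`, which is exactly the separation property the decoder relies on.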
3. The human-computer interaction control method based on the acoustic wearable device according to claim 2, wherein the S2 specifically includes the following:
S2.1, setting a linear microphone array, and taking the position of the first microphone as the reference point, the delay caused by the distance d between the sound source and the microphone is expressed as:
assuming that adjacent microphones are spaced apart by r and the target is at azimuth θ, the signal received by the i-th microphone travels an additional distance (i-1) r cos θ compared to the signal received by the 1st microphone in the array; the delay caused by the angle of arrival θ is thus expressed as:
S2.2, to account for the influence of the Doppler effect on the signal frequency, introducing the triangular chirp into the signal model, i.e., the loudspeaker transmits the inverse chirp signal in adjacent time slots; modeling the signal from a single path for purposes of illustration, when the transmitter and receiver move apart from each other, the received frequency decreases, which is equivalent on the spectrogram to a rightward shift of the up chirp signal and a leftward shift of the down chirp signal; the equivalent flight time of the Doppler velocity is expressed as:
wherein f R and f T denote the frequencies of the received and transmitted signals, respectively;
s2.3, based on S2.2, the equivalent time delays of the up chirp and the down chirp receiving signals of the ith microphone are respectively expressed as:
performing a second signal modulation over two consecutive segments of the triangular signal sequence to cancel the signal phase shift caused by the Doppler effect: mixing the time-reversed version of one mixed-signal sequence with the other to obtain the second-order mixed signal of the i-th microphone:
S2.4, performing joint estimation in a search space, specifically, firstly constructing a joint estimator according to a theoretical formula of a second-order mixed signal:
wherein the two estimated quantities represent the estimated values of the distance and the azimuth, respectively;
S2.5, removing the exact joint estimator from the measured samples of the second-order mixed signal to obtain a signal with a phase value of 0; thus, the final optimization objective is:
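The opposite signs of the up- and down-chirp equivalent delays in S2.2-S2.3 imply a simple cancellation arithmetic, sketched here (the explicit averaging/differencing step is an inference from those opposite signs, not quoted from the claim):

```python
def cancel_doppler(tau_up: float, tau_down: float) -> tuple[float, float]:
    """Recover (true flight time, Doppler-equivalent shift) from the two slots.

    The up-chirp slot measures tau + tau_d and the down-chirp slot measures
    tau - tau_d, so the average recovers the true flight time tau and the
    half-difference isolates the Doppler term tau_d.
    """
    tau = (tau_up + tau_down) / 2.0
    tau_d = (tau_up - tau_down) / 2.0
    return tau, tau_d

# Example: a 3 ms flight time perturbed by a 0.4 ms Doppler-equivalent shift
tau, tau_d = cancel_doppler(0.0034, 0.0026)
```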
4. the human-computer interaction control method based on the acoustic wearable device according to claim 3, wherein the S3 specifically includes the following:
S3.1, for position tracking, estimating the distances between the mobile device and the left and right speakers based on the joint estimator in S2, and further solving the position information of the device through the triangulation relation, specifically expressed as:
wherein D represents the distance between the left speaker and the right speaker;
S3.2, for direction tracking, designing different device direction estimation modes for different microphone layouts and application characteristics based on the estimated arrival angle characteristics of the signals, wherein the device direction estimation modes specifically comprise:
For a headset-based face attention detection application scenario, detecting a face orientation of a user in a horizontal direction using left and right ANC microphones as a set of microphone arrays; detecting a face orientation of a user in a vertical direction using an ANC microphone and a voice microphone of the left ear;
For an air mouse application program based on a mobile phone, detecting the pitch angle of the mobile phone by using an upper voice microphone and a lower voice microphone of the mobile phone, and setting a threshold of 60 degrees as a mark for executing a mouse click event of the mobile phone.
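The click logic of the air mouse in S3.2 can be sketched as follows. The claim fixes only the 60-degree threshold; the edge-triggered detail (one click per upward crossing, rather than a click on every frame above the threshold) is an assumption added for illustration:

```python
def detect_clicks(pitch_deg_stream, threshold=60.0):
    """Return indices where the estimated pitch first exceeds the threshold,
    firing one click event per upward crossing."""
    above = False
    events = []
    for i, p in enumerate(pitch_deg_stream):
        if p >= threshold and not above:
            events.append(i)
        above = p >= threshold
    return events

# Two tilt gestures in a stream of per-frame pitch estimates (degrees):
clicks = detect_clicks([10, 30, 65, 70, 40, 61])
```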
CN202410227238.1A 2024-02-29 2024-02-29 Man-machine interaction control method based on acoustic wearable equipment Pending CN118092652A (en)

Publications (1)

Publication Number Publication Date
CN118092652A true CN118092652A (en) 2024-05-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination