CN114664032A

CN114664032A - Voice broadcasting method, system, device and readable storage medium

Info

Publication number: CN114664032A
Application number: CN202210273478.6A
Authority: CN
Inventors: 周婧; 鲁健; 郭俊琪; 李俊
Original assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Current assignee: Shanghai Sensetime Intelligent Technology Co Ltd
Priority date: 2022-03-18
Filing date: 2022-03-18
Publication date: 2022-06-24

Abstract

The application discloses a voice broadcasting method, system, device, and readable storage medium. The method comprises: acquiring first broadcast data and acquiring a first target audio matched with the first broadcast data; and sending the first target audio to a playing terminal associated with a target scene by wireless transmission, wherein the first broadcast data is generated in response to an alarm event existing in the target scene, and the first target audio is provided for the playing terminal to play. In this way, the alarm event can be announced by voice broadcast.

Description

Voice broadcasting method, system, device and readable storage medium

Technical Field

The present application relates to the field of voice processing technologies, and in particular, to a voice broadcasting method, system, device, and readable storage medium.

Background

In some scenes, a server processes image data of a target scene acquired by a scene information acquisition device and can determine whether a special condition exists in the target scene, for example, whether a baby carriage, a large luggage case, or a falling person appears on an escalator, or whether a person entering the target scene is authorized. In such scenes, a corresponding voice broadcast is needed to alert the user to the risk.

Disclosure of Invention

The application at least provides a voice broadcasting method, a system, equipment and a readable storage medium.

The application provides a voice broadcasting method, which comprises the following steps: acquiring first broadcast data; acquiring a first target audio matched with the first broadcast data; and sending the first target audio to a playing terminal associated with a target scene by wireless transmission, wherein the first broadcast data is generated in response to an alarm event existing in the target scene, and the first target audio is provided for the playing terminal to play.

Therefore, by receiving the first broadcast data, determining the corresponding first target audio, and transmitting the first target audio wirelessly, the wirelessly connected playing terminal associated with the target scene plays the first target audio, thereby realizing voice broadcast of the alarm event.

Wherein the first broadcast data comprises the event type to which the alarm event existing in the target scene belongs, and acquiring a first target audio matched with the first broadcast data comprises: selecting a preset audio matched with the event type in the first broadcast data as the first target audio.

Therefore, the audio matched with the event type in the first broadcast data is selected from the preset audio to serve as the first target audio for voice broadcast, and different audio can be adopted for broadcasting the alarm events of different event types, so that the voice broadcast is more targeted, and the risk can be prompted in a targeted manner.

Before the first broadcast data is acquired, the method further comprises: sending a subscription request containing alarm event information to a server, to request the server to push broadcast data related to an alarm event in response to the target scene having an alarm event matched with the alarm event information.

Therefore, the voice processing device can obtain broadcast data corresponding to the alarm event needing voice broadcast from the server in a targeted manner by sending the subscription request containing the alarm event information to the server, so as to perform voice broadcast in a targeted manner.

Wherein the alarm event information includes at least one event type of the alarm event.

Therefore, the voice processing device can subscribe, from the server, to the broadcast data corresponding to the event types of one or more alarm events, so as to perform voice broadcast.

Wherein sending a subscription request containing alarm event information to a server comprises: packaging the alarm event information and a receiving address to obtain the subscription request, and sending the subscription request to the server, wherein the receiving address is an address for receiving broadcast data related to the alarm event information.

Therefore, by sending a subscription request obtained by packaging the alarm event information and the receiving address, the voice processing device subscribes to the broadcast data matched with the alarm event information, and the server pushes that data to the receiving address, so that the broadcast data corresponding to the alarm events to be broadcast can be obtained for voice broadcast.

Wherein sending the first target audio to the playing terminal associated with the target scene by wireless transmission comprises: in response to receiving a plurality of first broadcast data, sending the first target audio corresponding to each first broadcast data to the playing terminal according to a preset sequence; or, in response to the first broadcast data containing event types of a plurality of alarm events, sending the first target audio corresponding to each event type in the first broadcast data to the playing terminal according to a preset sequence.

Therefore, when a plurality of broadcasts need to be made at the same time, the corresponding target audios can be sent one after another in the preset sequence and played in turn.

After the first target audio is sent to the playing terminal associated with the target scene in a wireless transmission mode, the method further comprises the following steps: acquiring second broadcast data, wherein the second broadcast data is generated by the server in response to the alarm event existing in the target scene; and sending a second target audio matched with the second broadcast data to the playing terminal in a wireless transmission mode so that the playing terminal stops playing the first target audio and plays the second target audio.

Therefore, in the process that the playing terminal still plays the target audio, if the voice processing device receives the latest broadcast data, the target audio corresponding to the latest broadcast data can be sent instead, so that the playing terminal can play the target audio corresponding to the latest broadcast data instead, and the voice broadcast is more timely.

Wherein acquiring the first broadcast data comprises: an event detection module of the server receiving image data obtained by shooting the target scene and, in response to an alarm event existing in the image data, generating first broadcast data related to the alarm event; and the event detection module pushing the first broadcast data to a voice processing module of the server, wherein the voice processing module is used for performing the acquiring of the first target audio matched with the first broadcast data and the subsequent steps.

Therefore, the server can also be used as a voice processing device, namely, the server generates broadcast data, acquires matched audio based on the broadcast data and sends the audio to the playing terminal.

Wherein, the target scene is an escalator or an elevator.

Therefore, when the escalator or the elevator generates an alarm event, the alarm event can be subjected to voice broadcasting, and corresponding risks are prompted.

Wherein sending the first target audio to the playing terminal associated with the target scene by wireless transmission comprises: controlling a wireless audio transmitter to transmit the first target audio to a wireless audio receiver connected with the playing terminal.

Therefore, the voice processing device can wirelessly control the playing terminal to play audio, and correspondingly performs voice broadcast under the condition that the alarm event occurs.

The application provides a voice broadcasting system, which comprises a voice processing device, a playing device associated with a target scene, and a scene information acquisition device, wherein the scene information acquisition device is used for acquiring scene information of the target scene, and the scene information is used for determining whether an alarm event exists in the target scene; the voice processing device is used for acquiring first broadcast data, acquiring a first target audio matched with the first broadcast data, and sending the first target audio to a playing terminal associated with the target scene by wireless transmission, wherein the first broadcast data is generated in response to an alarm event existing in the target scene; and the playing device is used for playing the first target audio wirelessly transmitted by the voice processing device.

The voice broadcasting system also comprises a server, wherein the server is used for detecting whether an alarm event exists in a target scene or not and responding to the alarm event existing in the target scene, and sending broadcasting data about the alarm event to the voice processing equipment; or the voice processing device is used for detecting whether an alarm event exists in the target scene or not, and generating broadcast data about the alarm event in response to the alarm event existing in the target scene.

The scene information acquisition device is a network camera device; and/or the voice broadcasting system further comprises an audio transmitter connected with the voice processing device and an audio receiver connected with the playing device, and wireless transmission can be realized between the audio transmitter and the audio receiver; the voice processing device sends the first target audio to the audio receiver through the audio transmitter, and the playing device receives the first target audio through the audio receiver.

The application provides a voice processing device, which comprises a processor and a memory, wherein the memory is used for storing program data, and the processor is used for executing the program data to realize any voice broadcasting method.

The present application provides a computer-readable storage medium for storing program data that can be executed to implement any of the above-described voice broadcasting methods.

In the above scheme, by receiving the first broadcast data, determining the corresponding first target audio, and transmitting the first target audio wirelessly, the wirelessly connected playing terminal associated with the target scene plays the first target audio, thereby realizing voice broadcast of the alarm event.

Drawings

Fig. 1 is a schematic flowchart of an embodiment of a voice broadcast method according to the present application;

Fig. 2 is a schematic flowchart of another embodiment of the voice broadcasting method of the present application;

Fig. 3 is a schematic flowchart of a further embodiment of the voice broadcasting method of the present application;

Fig. 4 is a schematic framework diagram of an embodiment of a voice broadcast device according to the present application;

Fig. 5 is a schematic framework diagram of an embodiment of a voice processing device according to the present application;

Fig. 6 is a schematic framework diagram of an embodiment of the voice broadcasting system of the present application;

Fig. 7 is a schematic framework diagram of an embodiment of a computer-readable storage medium of the present application.

Detailed Description

In order to make the purpose, technical solution, and effect of the present application clearer, the present application is further described in detail below with reference to the accompanying drawings and examples.

The term "and/or" herein merely describes an association between associated objects and means that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the objects before and after it are in an "or" relationship. Further, the term "plurality" herein means two or more.

It is understood that the methods of the present application can include any of the method embodiments described below as well as any non-conflicting combinations of the method embodiments described below.

It is understood that the voice broadcasting method in the present application can be executed by a voice processing device, and the voice processing device can be any device with processing capability, such as a tablet computer, a mobile phone, a computer, and the like, and the voice processing device can also be simply referred to as a device. The speech processing device may be a device capable of connecting to and communicating with the server, or the speech processing device may be the same device as the server.

Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a voice broadcasting method according to an embodiment of the present application.

The method comprises the following steps:

Step S110: acquiring first broadcast data.

It should be noted that the server can acquire image data of the target scene through the scene information acquisition device and determine, based on the image data, whether an alarm event exists in the target scene; when an alarm event exists, a voice broadcast is needed to announce it. The first broadcast data is generated in response to an alarm event existing in the target scene.

It can be understood that when an alarm event occurs in a target scene, voice broadcasting is required to remind people in the target scene of the risk. For example, if the target scene is an escalator and a fall occurs there, a voice broadcast is required to alert people at the escalator to the fall alarm event. The server can determine whether an alarm event exists in the target scene, but it cannot itself perform voice broadcast when the alarm event occurs. In some cases, the target scene already has a playing terminal; for example, the escalator itself can play audio, or a broadcast speaker is provided at the escalator. In other cases, the target scene has no playing terminal, and a separate one may be provided, for example by placing an active speaker next to the escalator. The server may be located elsewhere, for example in a computer room, and is used for determining whether an alarm event exists in the target scene. The voice processing device provided by the application can connect to the server and receive the broadcast data that the server sends when an alarm event exists, so as to control the playing terminal to play audio.

In a specific application scenario, the target scene may be an escalator or an elevator, and the scene information acquisition device is a plurality of network cameras that acquire image data at the escalator or elevator. The server receives the image data sent by the scene information acquisition device and determines whether an alarm event exists in the image data of the target scene. An alarm event indicates that a risk exists, and the event types that the server checks for can be set by the user according to actual needs. For example, the alarm events for an escalator or elevator may include a fall alarm, a reverse travel alarm, a retention alarm, a congestion alarm, a wheelchair alarm, a stroller alarm, a large luggage case alarm, and other alarm types that indicate a safety accident risk. To announce by voice that an alarm event currently exists, the server may generate first broadcast data in response to an alarm event existing in the image data, so that the voice processing device can obtain the first broadcast data and perform the voice broadcast.

Step S120: acquiring a first target audio matched with the first broadcast data.

It can be understood that several audios may be pre-stored in the voice processing device, and different audios may be selected and sent to the playing terminal for different alarm events. After the first broadcast data is obtained, the device may obtain, according to the first broadcast data, a first target audio matched with it, where the first target audio is used to announce that the alarm event corresponding to the first broadcast data exists in the target scene.

Step S130: sending the first target audio to a playing terminal associated with the target scene by wireless transmission.

The playing terminal associated with the target scene can be any device capable of playing audio to the target scene, and the number and positions of playing terminals can be set by the user according to actual needs. The playing terminal receives the audio sent by the voice processing device by wireless transmission and plays it, so as to alert people in the target scene to the alarm event.

Generally speaking, wired connections are needed from the server to the voice processing device and then to the playing terminal. However, application environments vary, and in some situations the path from the server to the playing terminal cannot be fully wired, for example because the distance is too great or a fire door is in the way, so voice broadcast cannot be realized. Because the voice processing device sends the first target audio to the playing terminal by wireless transmission, the placement of the voice processing device and the playing terminal is no longer constrained by such environmental factors and can be arranged flexibly.
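The three steps S110 to S130 can be sketched as a small pipeline. This is only an illustrative sketch, not the implementation described in the application: the function name `voice_broadcast`, the three callables, the `event_type` field, and the file-name convention are all hypothetical, and the wireless link is represented by a plain callback.

```python
from typing import Callable


def voice_broadcast(
    acquire: Callable[[], dict],
    match_audio: Callable[[dict], str],
    send_wireless: Callable[[str], None],
) -> str:
    """Run one pass of steps S110 to S130 with injected (hypothetical) helpers."""
    # S110: acquire the first broadcast data.
    broadcast_data = acquire()
    # S120: acquire the first target audio matched with the broadcast data.
    target_audio = match_audio(broadcast_data)
    # S130: hand the audio to the wireless link toward the playing terminal.
    send_wireless(target_audio)
    return target_audio


# Stand-ins for the server push, the audio lookup, and the wireless transmitter.
sent_audio = []
played = voice_broadcast(
    acquire=lambda: {"event_type": "fall"},
    match_audio=lambda data: data["event_type"] + "_alarm.wav",
    send_wireless=sent_audio.append,
)
```

Passing the three stages in as callables keeps the sketch independent of how broadcast data actually arrives and of the audio hardware in use.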

In some embodiments, the target scene is not limited to an escalator or an elevator, and may also be another area where a risk needs to be announced, such as an office building or a campus. The alarm event may indicate a safety accident risk, a security leakage risk, and the like; for example, when the target scene is a campus, the alarm event may include unauthorized entry.

Referring to fig. 2, fig. 2 is a schematic flowchart illustrating another embodiment of a voice broadcasting method according to the present application. It is understood that the voice processing device may run a voice broadcast program, where the program includes a service layer, and the relevant steps of the voice processing device in the embodiment of the present application may be executed by the service layer.

The method comprises the following steps:

Step S210: sending a subscription request containing alarm event information to a server, to request the server to push broadcast data related to an alarm event in response to the target scene having an alarm event matched with the alarm event information.

The voice processing device may send a subscription request to the server, where the subscription request includes the alarm event information and is used to request the server to push broadcast data about an alarm event to the voice processing device when an alarm event matching the alarm event information exists in the target scene.

The alarm event information includes at least one event type of alarm event, and if the event type of a certain alarm event is the same as one of the event types included in the alarm event information, the alarm event can be considered to be matched with the alarm event information.
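The matching rule above amounts to a membership test. A minimal sketch follows; the function name and the event-type identifiers are hypothetical, since the application does not specify how event types are encoded.

```python
from typing import Iterable


def matches_subscription(event_type: str, subscribed_types: Iterable[str]) -> bool:
    """Return True when the event type equals one of the subscribed event types."""
    return event_type in set(subscribed_types)


# Hypothetical identifiers for the subscribed event types.
subscribed = {"fall", "reverse_travel", "stroller"}
fall_matches = matches_subscription("fall", subscribed)
congestion_matches = matches_subscription("congestion", subscribed)
```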

It should be noted that the server may determine, based on the image acquired by the scene information acquisition device, whether an alarm event exists in the target scene, the event types of the alarm event that the server can identify may be several, and the event type of the alarm event that the voice processing device subscribes to the server is at least one of the event types of the alarm event that the server can identify.

In a specific application scenario, the target scene is an escalator, and the voice processing device may send a subscription request to the server through a Kafka (a distributed publish-subscribe messaging system) message, where the subscription request includes three event types of alarm events, namely a fall alarm, a reverse travel alarm, and a stroller alarm, so as to request the server to push broadcast data about an alarm event to the voice processing device when the event type of the alarm event that occurs is any one of these three.

It is to be understood that, in order for the server to send the broadcast data to the voice processing device, the subscription request may further include a receiving address, where the receiving address is an address for receiving the broadcast data related to the alarm event information.

Specifically, the voice processing device may package the alarm event information and the receiving address to obtain the subscription request and send the subscription request to the server. After receiving the subscription request, the server sends broadcast data about an alarm event to the receiving address in the subscription request whenever it determines that the target scene has an alarm event matching the alarm event information in the subscription request. In this way, when the target scene has an alarm event matching the alarm event information, the broadcast data about that alarm event is pushed to the voice processing device.
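Packaging the alarm event information together with a receiving address might look like the sketch below. The JSON encoding, the field names `event_types` and `receiving_address`, and the example address (taken from a range reserved for documentation) are assumptions for illustration; the application does not specify the message format, and transport (for example over Kafka) is left out.

```python
import json
from typing import List


def build_subscription_request(event_types: List[str], receiving_address: str) -> str:
    """Package alarm event information and a receiving address into one request."""
    if not event_types:
        # The alarm event information includes at least one event type.
        raise ValueError("at least one event type is required")
    payload = {
        # Drop duplicates while keeping the caller's order.
        "event_types": list(dict.fromkeys(event_types)),
        "receiving_address": receiving_address,
    }
    return json.dumps(payload)


# Placeholder address where the voice processing device would receive pushes.
request = build_subscription_request(
    ["fall", "reverse_travel", "stroller"], "http://192.0.2.10:8080/broadcast"
)
decoded = json.loads(request)
```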

Step S220: acquiring first broadcast data.

The first broadcast data may include the event type to which the alarm event existing in the target scene belongs, and the voice processing device may determine the first target audio to be played based on the event type in the first broadcast data.

In some embodiments, the first broadcast data may include, in addition to the event type of the alarm event, other data related to the alarm event, such as image data related to the alarm event and the identifier of the scene information acquisition device that collected the image data. This related data can be used for statistics on alarm events.

Step S230: acquiring a first target audio matched with the first broadcast data.

It can be understood that preset audios corresponding to a plurality of event types can be pre-stored in the voice processing device and used for voice broadcasts of the different event types; for example, preset audios corresponding to a wheelchair alarm, a stroller alarm, and a large luggage case alarm are pre-stored. Specifically, step S230 may be selecting the preset audio matched with the event type in the first broadcast data as the first target audio.
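Selecting a preset audio by event type can be sketched as a dictionary lookup. The `BroadcastData` fields, the file names, and the choice to return `None` for an event type with no preset audio are assumptions made for this example, not details given in the application.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BroadcastData:
    """Hypothetical shape of broadcast data pushed by the server."""
    event_type: str
    camera_id: Optional[str] = None  # optional related data, e.g. for statistics


# Hypothetical table of pre-stored audio, keyed by event type.
PRESET_AUDIO: Dict[str, str] = {
    "wheelchair": "wheelchair_alarm.wav",
    "stroller": "stroller_alarm.wav",
    "large_luggage": "large_luggage_alarm.wav",
}


def select_target_audio(data: BroadcastData) -> Optional[str]:
    """Return the preset audio matched with the event type, or None if absent."""
    return PRESET_AUDIO.get(data.event_type)


stroller_audio = select_target_audio(BroadcastData("stroller", camera_id="cam-1"))
unknown_audio = select_target_audio(BroadcastData("fall"))
```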

Step S240: sending the first target audio to a playing terminal associated with the target scene by wireless transmission.

The voice processing device and the playing terminal are connected in a wireless mode through the wireless audio transmitter, so that the voice processing device can send the first target audio to the playing terminal in a wireless transmission mode. In a specific application scenario, the target scenario is an escalator, a sound box is arranged at the escalator to serve as a playing terminal, the voice processing device is a personal computer, the personal computer is connected with a wireless audio transmitter, and the sound box is connected with a wireless audio receiver, so that step S240 can be implemented.

In some embodiments, multiple alarm events may occur simultaneously in the target scene, and these may include multiple alarm events of the same event type. For alarm events of the same event type, the server may merge them or process them separately; merging is used as the example below. In addition, the simultaneous alarm events may cover multiple event types, for example retention, a fall, and congestion occurring on an escalator at the same time. In that case, within a short time or almost simultaneously, the server generates first broadcast data for the different types of alarm events. It may send a plurality of first broadcast data to the voice processing device, each corresponding to the event type of one alarm event, or it may send a single first broadcast data containing the event types of the plurality of alarm events, so that the voice processing device performs voice broadcast for all of them.

If the server sends a plurality of first broadcast data to the voice processing device, the device may, in response to receiving them, send the first target audio corresponding to each first broadcast data to the playing terminal according to a preset sequence. The preset sequence may be a priority order of the event types of the alarm events. For example, with the order fall alarm, reverse travel alarm, retention alarm, congestion alarm, wheelchair alarm, stroller alarm, large luggage case alarm, the first target audio corresponding to an earlier event type is sent first.

In addition, if the server sends a single first broadcast data containing the event types of a plurality of alarm events, the voice processing device, in response to the first broadcast data containing the event types of a plurality of alarm events, sends the first target audio corresponding to each event type in the first broadcast data to the playing terminal according to the preset sequence.
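Sending audio in a preset priority order can be sketched as a sort keyed on each event type's rank. The identifiers below are hypothetical names for the event types listed in the example order, and placing unknown event types last is an assumption of this sketch.

```python
from typing import Dict, List

# Example priority order from the text, using hypothetical identifiers.
PRIORITY_ORDER: List[str] = [
    "fall",
    "reverse_travel",
    "retention",
    "congestion",
    "wheelchair",
    "stroller",
    "large_luggage",
]
RANK: Dict[str, int] = {name: index for index, name in enumerate(PRIORITY_ORDER)}


def order_event_types(event_types: List[str]) -> List[str]:
    """Sort event types by preset priority; unknown types go last (an assumption)."""
    return sorted(event_types, key=lambda name: RANK.get(name, len(PRIORITY_ORDER)))


ordered = order_event_types(["congestion", "fall", "retention"])
```

The same function covers both cases in the text: the event types may come from several pieces of broadcast data or from one piece that lists several event types.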

It can be understood that the voice broadcast of one alarm event may last for a certain time; for example, playing a first target audio may take 1 minute. Multiple alarm events may occur in a target scene in succession, and the voice processing device may receive multiple pieces of broadcast data in succession, each for one alarm event. If the voice broadcasts of the successive alarm events do not overlap in time, that is, the preset audio for the previous piece of broadcast data has finished playing by the time the preset audio for the next piece is to be played, the voice processing device may handle them separately by repeating steps S220 to S240. If the device receives new broadcast data and determines the corresponding preset audio while the preset audio for an earlier alarm event is still playing, the voice processing device may decide what to play in several ways. For example, the preset audio corresponding to the new broadcast data may be queued and played after the currently playing preset audio finishes; or the current preset audio may be stopped and the preset audio corresponding to the new broadcast data played immediately; or the choice may be made based on preset rules, for example the priorities of the current and new alarm events; or the choice may be made at random between the currently playing preset audio and the preset audio corresponding to the new broadcast data. The following description takes immediate playing of the preset audio corresponding to the new broadcast data as the example.
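The options for handling overlapping broadcasts can be sketched as a small policy function. The policy names, the priority table, and the decision to discard the audio that loses under the priority rule are assumptions for illustration; the text does not state what happens to the audio that is not chosen, and the random option is omitted to keep the sketch deterministic.

```python
from typing import Dict, Optional, Tuple

# Hypothetical priorities: a lower number means a more urgent event type.
PRIORITY: Dict[str, int] = {"fall": 0, "stroller": 1}


def resolve_overlap(current: str, new: str, policy: str) -> Tuple[str, Optional[str]]:
    """Return (play_now, play_next) for a new audio arriving during playback.

    Audio items are identified here by their event type for brevity.
    """
    if policy == "queue":
        # Finish the current audio, then play the new one.
        return current, new
    if policy == "preempt":
        # Stop the current audio and play the new one immediately.
        return new, None
    if policy == "priority":
        # Play whichever event type is more urgent; the other is dropped here.
        if PRIORITY[new] < PRIORITY[current]:
            return new, None
        return current, None
    raise ValueError(f"unknown policy: {policy}")


queued = resolve_overlap("stroller", "fall", "queue")
preempted = resolve_overlap("stroller", "fall", "preempt")
by_priority = resolve_overlap("fall", "stroller", "priority")
```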

Step S250: acquiring second broadcast data.

It should be noted that after the voice processing device sends the first target audio to the playing terminal associated with the target scene by wireless transmission, the playing terminal receives and plays the first target audio; the voice processing device receives the second broadcast data while the first target audio is still being played.

The second broadcast data is generated by the server in response to an alarm event existing in the target scene.

Step S260: sending a second target audio matched with the second broadcast data to the playing terminal by wireless transmission, so that the playing terminal stops playing the first target audio and plays the second target audio.

When the playing terminal receives the second target audio while playing the first target audio, it stops playing the first target audio and plays the second target audio, so that the new alarm event corresponding to the second target audio is broadcast.

In a specific implementation scenario, the target scene is an escalator, and two network cameras serve as the scene information acquisition device to acquire image data at the escalator; they are connected to a security network switch in a security monitoring room. The server is connected to its own network switch, and both are deployed in a power distribution room machine room; the server is connected to the security network switch through its own network switch, so that the network cameras can send the acquired image data of the target scene to the server. In addition, the server has a wired connection to the voice processing device through its network switch, so that the server can receive the subscription request and send broadcast data. An active speaker is provided at the escalator as the playing terminal; the voice processing device is connected to a wireless audio transmitter, and the active speaker is connected to a wireless audio receiver. The transmitter and receiver are wirelessly connected, so the target audio can be transmitted and the playing terminal can play it to realize voice broadcast. The voice processing device can be placed at any position from which it can connect wirelessly to the playing terminal, and the wired connection between the server and the voice processing device can be routed flexibly according to site conditions. The positions of the server, the voice processing device, and the playing terminal can thus be arranged flexibly, which reduces the constraints that site conditions place on voice broadcast and improves its flexibility.

In some embodiments, the server may also be connected to the external network switch through its corresponding network switch, so that data transmission may be performed.

In the above scheme, by receiving the first broadcast data, determining the corresponding first target audio, and transmitting the first target audio wirelessly, the wirelessly connected playing terminal associated with the target scene plays the first target audio, thereby realizing voice broadcast of the alarm event. In addition, when a plurality of alarm events occur and their voice broadcasts overlap in time, the voice processing device can handle them according to the order and rules preset by the user, making the voice broadcast more flexible.

Referring to fig. 3, fig. 3 is a schematic flowchart of a voice broadcasting method according to another embodiment of the present application, and in this embodiment, a voice processing device and a server are taken as the same device for example.

It is understood that the server includes an event detection module and a voice processing module, and data transmission can be performed between the voice processing module and the event detection module. The event detection module may be configured to determine whether an alarm event exists in a target scene, and the voice processing module may be configured to perform a related operation of voice broadcast based on the alarm event, for example, to perform acquisition of a first target audio matched with first broadcast data and subsequent steps thereof. The playing terminal can be located near the target scene, so that when the playing terminal plays audio, a user in the target scene can hear the voice prompt, and the server can be located within the wireless transmission range of the playing terminal.

Further, the server may further include a message system module, and the message system module may be used for data transmission between the voice processing module and the event detection module. The voice processing module can package the alarm event information and the receiving address to obtain a subscription request, and sends the subscription request to the message system module, and the message system module stores the subscription request after receiving the subscription request, so as to push the broadcast data.

Step S310: an event detection module of the server receives image data obtained by shooting a target scene and, in response to an alarm event existing in the image data, generates first broadcast data related to the alarm event.

The scene information acquisition equipment can shoot a target scene to obtain image data and transmit the image data to the event detection module of the server. The event detection module can process the image data to detect whether an alarm event exists in the target scene, and generates first broadcast data regarding the alarm event when an alarm event exists in the image data.

It should be noted that, in some embodiments, the event types of the alarm event that can be detected by the event detection module may include several types, including the event type subscribed by the voice processing module, and may also include other event types beyond the event type subscribed by the voice processing module. The event detection module may detect all detectable event types and generate corresponding broadcast data, where the first broadcast data includes the event types subscribed by the voice processing module. The event detection module can also store the generated broadcast data.
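The relationship between all generated broadcast data and the first broadcast data can be illustrated with a minimal sketch. The record fields (`scene_id`, `event_type`), the event type names, and both function names are hypothetical and are not specified by the application; the sketch only assumes that the detection result is already available as a list of event type labels.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class BroadcastData:
    """Hypothetical broadcast record: where the event happened and its type."""
    scene_id: str
    event_type: str


# Hypothetical set of event types the event detection module can detect.
DETECTABLE_TYPES = {"baby_carriage", "large_luggage", "person_fall"}


def generate_broadcast_data(scene_id: str, detected: Iterable[str]) -> List[BroadcastData]:
    """Create one broadcast record for every detectable event type that was detected."""
    return [BroadcastData(scene_id, t) for t in detected if t in DETECTABLE_TYPES]


def select_subscribed(records: Iterable[BroadcastData], subscribed: Set[str]) -> List[BroadcastData]:
    """Keep only the records whose event type the voice processing module subscribed to."""
    return [r for r in records if r.event_type in subscribed]


all_records = generate_broadcast_data("escalator_1", ["person_fall", "large_luggage"])
first_broadcast_data = select_subscribed(all_records, {"person_fall"})
```

Here every detected event yields a record that can be stored, while only the subscribed subset would be forwarded for voice broadcast.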

Step S320: the event detection module pushes the first broadcast data to a voice processing module of the server.

It can be understood that step S110 can be implemented by step S310 and step S320. Because the voice processing device and the server are the same device, the event detection module of the server generating the first broadcast data and pushing it to the voice processing module is equivalent to the voice processing device acquiring the first broadcast data, after which subsequent processing can be performed based on the first broadcast data.

Specifically, the event detection module can push the generated broadcast data, which includes the first broadcast data, to the message system module through a Kafka message queue "transmission queue". After receiving the broadcast data, the message system module determines, according to the stored subscription request, that the event type corresponding to the first broadcast data matches the subscription request sent by the voice processing module, and pushes the first broadcast data to the receiving address in the subscription request, so that the first broadcast data reaches the voice processing module.
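The matching performed by the message system module can be sketched as a small in-memory publish/subscribe router. This is only an illustration under stated assumptions: it does not use Kafka, and the class name `MessageSystem`, the dictionary keys (`event_types`, `receiving_address`, `event_type`), and the `deliver` callback are hypothetical stand-ins for whatever transport an implementation actually uses.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class MessageSystem:
    """Stores subscription requests and pushes matching broadcast data."""

    def __init__(self, deliver: Callable[[str, dict], None]) -> None:
        # Maps an event type to the receiving addresses subscribed to it.
        self._subscriptions: Dict[str, List[str]] = defaultdict(list)
        # Hypothetical transport hook: called as deliver(address, broadcast_data).
        self._deliver = deliver

    def store_subscription(self, request: dict) -> None:
        """Record the receiving address under every subscribed event type."""
        address = request["receiving_address"]
        for event_type in request["event_types"]:
            if address not in self._subscriptions[event_type]:
                self._subscriptions[event_type].append(address)

    def publish(self, broadcast_data: dict) -> int:
        """Push the data to every matching subscriber; return how many received it."""
        addresses = self._subscriptions.get(broadcast_data["event_type"], [])
        for address in addresses:
            self._deliver(address, broadcast_data)
        return len(addresses)


delivered = []
system = MessageSystem(lambda address, data: delivered.append((address, data)))
system.store_subscription({"event_types": ["person_fall"], "receiving_address": "voice-module-1"})
system.publish({"event_type": "person_fall", "scene": "escalator_1"})    # pushed
system.publish({"event_type": "large_luggage", "scene": "escalator_1"})  # no subscriber
```

Broadcast data whose event type has no stored subscription is simply not pushed, which mirrors the case where the event detection module detects more event types than the voice processing module subscribed to.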

In the above embodiments, the server processes the image data of one target scene, and the voice processing device broadcasts based on the broadcast data of that target scene. In some embodiments, the server may also receive and separately process image data corresponding to a plurality of different target scenes to generate broadcast data for each of them. Each target scene may correspond to its own voice processing module/voice processing device, or a plurality of target scenes may correspond to one voice processing module/voice processing device. If a plurality of voice processing modules/voice processing devices exist, the server may also receive subscription requests from multiple parties; the message system module may match the subscription request sent by each voice processing module/voice processing device against the broadcast data sent by the event detection module, and push the corresponding broadcast data to each subscriber. The subscribers may be a number of voice processing modules in the server and/or a number of voice processing devices connected to the server.

Step S330: the voice processing module acquires a first target audio matched with the first broadcast data.

Step S340: the voice processing module sends a first target audio to a playing terminal associated with a target scene in a wireless transmission mode.

In the above scheme, the server and the voice processing device can be the same device, which simplifies the equipment needed for voice broadcasting and makes deployment more flexible. The first broadcast data is received, the first target audio corresponding to the first broadcast data is determined, and the first target audio is transmitted wirelessly, so that the wirelessly connected playing terminal associated with the target scene plays the first target audio, realizing voice broadcast of the alarm event.

Referring to fig. 4, fig. 4 is a schematic frame diagram of a voice broadcast device according to an embodiment of the present application.

In this embodiment, the voice broadcasting device 40 includes a first obtaining module 41, a second obtaining module 42, and a wireless transmission module 43, where the first obtaining module 41 may be configured to obtain first broadcast data, the first broadcast data is generated in response to an alarm event existing in a target scene, the second obtaining module 42 may be configured to obtain a first target audio matched with the first broadcast data, the wireless transmission module 43 may be configured to send the first target audio to a playing terminal associated with the target scene in a wireless transmission manner, and the first target audio is used for being provided to the playing terminal for playing.

In the above scheme, the first broadcast data is received, the first target audio corresponding to the first broadcast data is determined, and the first target audio is transmitted wirelessly, so that the wirelessly connected playing terminal associated with the target scene plays the first target audio, realizing voice broadcast of the alarm event. In addition, because the playing terminal and the voice processing device are connected wirelessly, their positions can be set flexibly, so that voice broadcast can be carried out at any position. This reduces the restrictions that environmental factors, such as the inability to run cables, place on the scenes where voice broadcast can be applied, and improves the flexibility of voice broadcast.

The first broadcast data includes the event type to which an alarm event existing in the target scene belongs. The second obtaining module 42 obtaining a first target audio matched with the first broadcast data may specifically include: selecting a preset audio matched with the event type in the first broadcast data as the first target audio.

According to the scheme, the audio matched with the event type in the first broadcast data is selected from the preset audio to serve as the first target audio for voice broadcast, and different audio can be adopted for broadcasting the alarm events of different event types, so that the voice broadcast is more targeted, and the risk can be prompted in a targeted manner.
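Selecting a preset audio by event type amounts to a table lookup, as the following minimal sketch shows. The event type names, the file paths, and the function name `select_target_audio` are hypothetical; the application does not specify how the preset audio is stored.

```python
from typing import Dict, Optional

# Hypothetical mapping from event type to a preset audio file.
PRESET_AUDIO: Dict[str, str] = {
    "baby_carriage": "audio/baby_carriage.wav",
    "large_luggage": "audio/large_luggage.wav",
    "person_fall": "audio/person_fall.wav",
}


def select_target_audio(broadcast_data: dict,
                        presets: Dict[str, str] = PRESET_AUDIO) -> Optional[str]:
    """Return the preset audio matching the event type, or None if there is none."""
    return presets.get(broadcast_data.get("event_type"))


first_target_audio = select_target_audio({"event_type": "person_fall"})
```

Returning `None` for an unknown event type is a design choice of this sketch; an implementation could equally fall back to a generic prompt.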

The voice broadcasting device 40 may further include a subscription module, configured to send a subscription request including the alarm event information to the server before acquiring the first broadcast data, so as to request the server to push broadcast data about an alarm event in response to an alarm event that exists in the target scene and matches with the alarm event information.

In the above scheme, by sending the subscription request including the alarm event information to the server, the broadcast data corresponding to the alarm event requiring voice broadcast can be obtained from the server in a targeted manner, so that voice broadcast is performed in a targeted manner.

In the above scheme, the voice processing device may subscribe to the server for broadcast data corresponding to the event type of the one or more alarm events, so as to perform voice broadcast.

The subscription module sending the subscription request including the alarm event information to the server may specifically include: packaging the alarm event information and a receiving address to obtain a subscription request, and sending the subscription request to the server, wherein the receiving address is an address for receiving broadcast data related to the alarm event information.

According to the scheme, by sending to the server a subscription request obtained by packaging the alarm event information and the receiving address, the server can push the broadcast data matching the alarm event information to the receiving address, so that the broadcast data corresponding to the alarm event that needs to be broadcast can be obtained for voice broadcast.
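Packaging the alarm event information together with the receiving address can be sketched as follows. The JSON layout, the key names, and the example address are assumptions made for illustration only; the application does not define a wire format for the subscription request.

```python
import json
from typing import Iterable


def build_subscription_request(event_types: Iterable[str], receiving_address: str) -> str:
    """Package alarm event information and a receiving address into one request.

    The JSON structure used here is hypothetical, not taken from the application.
    """
    types = sorted(set(event_types))
    if not types:
        raise ValueError("at least one event type is required")
    if not receiving_address:
        raise ValueError("a receiving address is required")
    return json.dumps(
        {
            "alarm_event_info": {"event_types": types},
            "receiving_address": receiving_address,
        },
        sort_keys=True,
    )


# Hypothetical receiving address of the voice processing device.
request = build_subscription_request(
    ["person_fall", "baby_carriage"], "http://192.0.2.10:8080/broadcast"
)
```

The server side would parse this request, store it, and later push matching broadcast data to the address it contains.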

The wireless transmission module 43 sending the first target audio to the playing terminal associated with the target scene in a wireless transmission manner may specifically include: in response to receiving a plurality of first broadcast data, sending the first target audio corresponding to each first broadcast data to the playing terminal according to a preset sequence; or, in response to the first broadcast data containing event types of a plurality of alarm events, sending the first target audio corresponding to each event type in the first broadcast data to the playing terminal according to a preset sequence.

In the above scheme, when a plurality of broadcasts need to be made at the same time, the corresponding target audios can be sent one after another according to the preset sequence and broadcast in turn.
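One possible way to apply a preset sequence is to rank event types by their position in a configured list, as in the sketch below. The list `PRESET_ORDER`, its contents, and the rule that unlisted event types go last are assumptions of this example; the application leaves the sequence and rules to user configuration.

```python
from typing import Iterable, List, Sequence

# Hypothetical user-configured sequence, first entry is broadcast first.
PRESET_ORDER: Sequence[str] = ("person_fall", "baby_carriage", "large_luggage")


def order_event_types(event_types: Iterable[str],
                      preset_order: Sequence[str] = PRESET_ORDER) -> List[str]:
    """Sort event types by the preset sequence.

    Types missing from the sequence are placed last; because sorted() is
    stable, they keep their arrival order relative to each other.
    """
    rank = {event_type: i for i, event_type in enumerate(preset_order)}
    return sorted(event_types, key=lambda t: rank.get(t, len(rank)))


send_order = order_event_types(["large_luggage", "person_fall"])
```

The target audio for each event type would then be sent to the playing terminal in `send_order`.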

The voice broadcasting device 40 may further include a third obtaining module, configured to obtain second broadcast data after the first target audio is sent to the playing terminal associated with the target scene in a wireless transmission manner, where the second broadcast data is generated by the server in response to an alarm event existing in the target scene, and to send a second target audio matched with the second broadcast data to the playing terminal in a wireless transmission manner, so that the playing terminal stops playing the first target audio and plays the second target audio.

In the above scheme, if the voice processing device receives newer broadcast data while a target audio is still being played at the playing terminal, it can send the target audio corresponding to the newest broadcast data instead, so that the playing terminal switches to playing that audio and the voice broadcast is more timely.
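The replace-on-arrival behaviour can be sketched with a small controller that tracks what the playing terminal is currently playing. The class `PlaybackController` and its action log are hypothetical; a real system would issue the stop and play actions through the wireless audio transmitter rather than append them to a list.

```python
from typing import List, Optional, Tuple


class PlaybackController:
    """Tracks the audio being played and replaces it when newer audio arrives."""

    def __init__(self) -> None:
        self.current: Optional[str] = None
        # Hypothetical record of the actions sent to the playing terminal.
        self.actions: List[Tuple[str, str]] = []

    def send(self, audio: str) -> None:
        """Stop whatever is still playing, then start the newly received audio."""
        if self.current is not None:
            self.actions.append(("stop", self.current))
        self.current = audio
        self.actions.append(("play", audio))

    def finished(self) -> None:
        """Mark the current audio as having played to the end."""
        self.current = None


controller = PlaybackController()
controller.send("audio/large_luggage.wav")  # first target audio
controller.send("audio/person_fall.wav")    # second target audio replaces the first
```

This differs from the preset-sequence case: here the newest audio interrupts the one in progress instead of waiting for its turn.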

The first obtaining module 41 obtaining the first broadcast data may specifically include: an event detection module of the server receives image data obtained by shooting a target scene and, in response to an alarm event existing in the image data, generates first broadcast data related to the alarm event; the event detection module pushes the first broadcast data to a voice processing module of the server, wherein the voice processing module is used for executing the step of obtaining a first target audio matched with the first broadcast data and the steps subsequent thereto.

In the above scheme, the server can also serve as the voice processing device; that is, the server generates the broadcast data, acquires the matched audio based on the broadcast data, and sends the audio to the playing terminal.

Wherein, the target scene is an escalator or an elevator.

In the above scheme, when an alarm event occurs in the escalator or the elevator, the alarm event is subjected to voice broadcast, so that corresponding risks are prompted.

The wireless transmission module 43 sending the first target audio to the playing terminal associated with the target scene in a wireless transmission manner may specifically include: controlling the wireless audio transmitter to transmit the first target audio to a wireless audio receiver connected with the playing terminal.

In the scheme, the playing terminal can be controlled to play audio in a wireless mode, and voice broadcasting is correspondingly carried out under the condition that an alarm event occurs.

Referring to fig. 5, fig. 5 is a schematic diagram of a frame of an embodiment of a speech processing apparatus according to the present application.

In this embodiment, the speech processing device 50 includes a memory 51 and a processor 52, wherein the memory 51 is coupled to the processor 52. Specifically, the respective components of the speech processing apparatus 50 may be coupled together by a bus, or the processor 52 of the speech processing apparatus 50 is connected one-to-one with the other components, respectively. The speech processing device 50 may be any device having processing capabilities, such as a computer, tablet, cell phone, etc.

The memory 51 is used for storing program data executed by the processor 52 and data generated by the processor 52 during processing, for example, the first broadcast data and the first target audio. The memory 51 includes a nonvolatile storage portion for storing the program data.

The processor 52 controls the operation of the speech processing device 50, and the processor 52 may also be referred to as a Central Processing Unit (CPU). The processor 52 may be an integrated circuit chip having signal processing capabilities. The processor 52 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. In addition, the processor 52 may be jointly implemented by a plurality of integrated circuit chips.

The processor 52 is configured to execute instructions to implement any of the above-described voice broadcasting methods by calling the program data stored in the memory 51.

Referring to fig. 6, fig. 6 is a block diagram of an embodiment of a voice broadcasting system of the present application.

In this embodiment, the voice broadcasting system 60 includes a voice processing device 61, a playing device 62 associated with a target scene, and a scene information collecting device 63. The voice processing device 61 is configured to obtain first broadcast data, obtain a first target audio matched with the first broadcast data, and send the first target audio to a playing terminal associated with the target scene in a wireless transmission manner, where the first broadcast data is generated in response to an alarm event existing in the target scene; the voice processing device 61 may be any one of the foregoing devices. The playing device 62 is configured to play the first target audio wirelessly transmitted by the voice processing device 61. The scene information collecting device 63 is configured to collect scene information of the target scene, and the scene information may be used to determine whether the alarm event exists in the target scene.

In some embodiments, the voice broadcasting system 60 further includes a server 64, and the server 64 is configured to detect whether an alarm event exists in the target scene, and to send broadcast data regarding the alarm event to the voice processing device 61 in response to the alarm event existing in the target scene. In this case, the scene information collecting device 63 is connected to the server 64.

In some embodiments, the voice broadcasting system 60 does not include the server 64, and the voice processing device 61 is configured to detect whether an alarm event exists in the target scene, and to generate broadcast data regarding the alarm event in response to the alarm event existing in the target scene. In this case, the scene information collecting device 63 is connected to the voice processing device 61 (not shown in the figure).

In some embodiments, the scene information acquisition device 63 may be a network camera device, such as a network camera or the like.

In some embodiments, the voice broadcasting system 60 further includes an audio transmitter 65 connected to the voice processing device 61, and an audio receiver 66 connected to the playing device 62, wherein wireless transmission is enabled between the audio transmitter 65 and the audio receiver 66, the voice processing device 61 transmits the first target audio to the audio receiver 66 through the audio transmitter 65, and the playing device 62 receives the first target audio through the audio receiver 66.

Referring to fig. 7, fig. 7 is a block diagram illustrating an embodiment of a computer-readable storage medium according to the present application.

In this embodiment, the computer-readable storage medium 70 stores processor-executable program data 71, which can be executed to implement any of the voice broadcasting methods described above.

The computer-readable storage medium 70 may be a medium that can store program data, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, or may be a server that stores the program data; the server may send the stored program data to other devices to be run, or may run the stored program data itself.

In some embodiments, the computer-readable storage medium 70 may also be a memory as shown in FIG. 5.

The above description is only an embodiment of the present application, and is not intended to limit the scope of the present application, and all equivalent structures or equivalent processes performed by the present application and the contents of the attached drawings, which are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims

1. A voice broadcast method, the method comprising:

acquiring first broadcast data, wherein the first broadcast data is generated in response to an alarm event existing in a target scene;

acquiring a first target audio matched with the first broadcast data;

and sending the first target audio to a playing terminal associated with the target scene in a wireless transmission mode, wherein the first target audio is used for being provided for the playing terminal to play.

2. The method according to claim 1, wherein the first broadcast data includes an event type to which an alarm event existing in the target scene belongs;

the acquiring of the first target audio matched with the first broadcast data includes:

and selecting a preset audio matched with the event type in the first broadcast data as the first target audio.

3. The method of claim 1 or 2, wherein prior to said obtaining first broadcast data, the method further comprises:

sending a subscription request containing alarm event information to a server, to request the server to push broadcast data about an alarm event in response to an alarm event matching the alarm event information existing in the target scene.

4. The method of claim 3, wherein the alarm event information includes at least one event type of alarm event;

and/or, the sending the subscription request containing the alarm event information to the server includes:

and packaging the alarm event information and a receiving address to obtain a subscription request, and sending the subscription request to the server, wherein the receiving address is an address for receiving broadcast data related to the alarm event information.

5. The method according to any one of claims 1 to 4, wherein the sending the first target audio to the play terminal associated with the target scene by using a wireless transmission manner comprises:

in response to receiving a plurality of first broadcast data, sending a first target audio corresponding to each first broadcast data to the playing terminal according to a preset sequence;

or, in response to the first broadcast data containing event types of a plurality of alarm events, sending a first target audio corresponding to each event type in the first broadcast data to the playing terminal according to a preset sequence.

6. The method according to any one of claims 1 to 5, wherein after the sending the first target audio to the play terminal associated with the target scene by using wireless transmission, the method further comprises:

acquiring second broadcast data, wherein the second broadcast data is generated by the server in response to an alarm event existing in the target scene;

and sending a second target audio matched with the second broadcast data to the playing terminal by adopting the wireless transmission mode so as to enable the playing terminal to stop playing the first target audio and play the second target audio.

7. The method according to any one of claims 1 to 6, wherein the server is located within a wireless transmission range of the playing terminal; the acquiring first broadcast data includes:

an event detection module of the server receives image data obtained by shooting the target scene and, in response to the alarm event existing in the image data, generates first broadcast data related to the alarm event;

and the event detection module pushes the first broadcast data to a voice processing module of the server, wherein the voice processing module is used for executing the step of acquiring a first target audio matched with the first broadcast data and the steps subsequent thereto.

8. The method according to any one of claims 1 to 7, wherein the target scene is an escalator or an elevator;

and/or, the sending the first target audio to the playing terminal associated with the target scene in a wireless transmission mode includes:

and controlling a wireless audio transmitter to transmit the first target audio to a wireless audio receiver connected with the playing terminal.

9. A voice broadcasting system, characterized in that the voice broadcasting system comprises a voice processing device, a playing device associated with a target scene, and a scene information collecting device, wherein,

the scene information acquisition equipment is used for acquiring scene information of a target scene, and the scene information is used for determining whether an alarm event exists in the target scene;

the voice processing device is used for acquiring first broadcast data, acquiring a first target audio matched with the first broadcast data, and sending the first target audio to a playing terminal associated with the target scene in a wireless transmission mode, wherein the first broadcast data are generated in response to the existence of an alarm event in the target scene;

the playing device is used for playing the first target audio wirelessly transmitted by the voice processing device.

10. The voice broadcasting system according to claim 9, further comprising a server for detecting whether an alarm event exists in the target scene, and in response to the alarm event existing in the target scene, transmitting broadcast data regarding the alarm event to the voice processing device;

or the voice processing device is used for detecting whether an alarm event exists in the target scene or not, and generating broadcast data about the alarm event in response to the alarm event existing in the target scene.

11. The voice broadcasting system according to claim 9 or 10, wherein the scene information collecting device is a network camera device;

and/or the voice broadcasting system further comprises an audio transmitter connected with the voice processing device and an audio receiver connected with the playing device, and wireless transmission can be realized between the audio transmitter and the audio receiver; the voice processing device sends the first target audio to the audio receiver through the audio transmitter, and the playing device receives the first target audio through the audio receiver.

12. A speech processing device characterized in that the speech processing device comprises a processor and a memory for storing program data, the processor being adapted to execute the program data to implement the method according to any of claims 1-8.

13. A computer-readable storage medium, characterized in that the computer-readable storage medium is used for storing program data, which can be executed for implementing the method according to any one of claims 1-8.