WO2018228280A1 - Method for outputting notification information, server, and monitoring system - Google Patents

Method for outputting notification information, server, and monitoring system

Info

Publication number
WO2018228280A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio information
information
feature value
audio
determining
Prior art date
Application number
PCT/CN2018/090388
Other languages
English (en)
French (fr)
Inventor
崔枝
Original Assignee
杭州海康威视数字技术股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州海康威视数字技术股份有限公司 (Hangzhou Hikvision Digital Technology Co., Ltd.)
Priority to US16/622,159 priority Critical patent/US11275628B2/en
Priority to EP18817001.3A priority patent/EP3640935B1/en
Publication of WO2018228280A1 publication Critical patent/WO2018228280A1/zh

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/542: Event management; Broadcasting; Multicasting; Notifications
    • G06F 16/65: Information retrieval; Database structures therefor; File system structures therefor, of audio data: Clustering; Classification
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/18: Speech classification or search using natural language modelling
    • G10L 15/1815: Semantic context, e.g. disambiguation of the recognition hypotheses based on word meaning
    • G10L 15/20: Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L 21/0272: Voice signal separating
    • G10L 25/03: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, characterised by the type of extracted parameters
    • G10L 25/51: Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
    • G10L 25/54: Speech or voice analysis techniques specially adapted for comparison or discrimination, for retrieval

Definitions

  • the present application relates to the field of multimedia information processing technologies, and in particular, to a method, a server, and a monitoring system for outputting notification information.
  • During video surveillance, it is usually necessary to output notification information for abnormal events so as to remind relevant personnel to handle them in time. For example, if a robbery occurs in the captured video images, notification information needs to be output for the robbery event. Likewise, during video surveillance of a checkout counter in a shopping mall or supermarket, notification information can be output if a property dispute arises, and so on.
  • At present, solutions for outputting notification information generally include: analyzing the video images collected by a video capture device, for example determining the moving targets in the video images and their motion trajectories; determining, according to the analysis result, whether an abnormal event occurs in the video images; and if so, outputting notification information. However, if there are many moving targets in the video images and their trajectories are chaotic, it is difficult to judge accurately for each target whether an abnormal event has occurred, so the accuracy of the output notification information is low.
  • The purpose of the embodiments of the present application is to provide a method for outputting notification information, a server, and a monitoring system, so as to improve the accuracy of the output notification information.
  • To achieve this, an embodiment of the present application discloses a method for outputting notification information, including: acquiring audio information; performing feature value extraction on the audio information; matching the extracted feature values with feature value models in a preset database, the database storing the correspondence between feature value models and alert levels; determining, according to the matching result, the alert level corresponding to the audio information; determining whether the alert level satisfies a preset condition, and if so, determining the notification information corresponding to the audio information; and outputting the determined notification information.
  • the feature value model includes a scene sound model;
  • the scene sound model is a feature value model established for a preset scene sound;
  • the matching the extracted feature value with the feature value model in the preset database may include:
  • the extracted feature values are matched to the scene sound model.
  • Optionally, after the audio information is acquired, the method may further include: determining whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound; if so, first decomposing the multi-type audio information into at least one piece of single-type audio information, each piece of single-type audio information containing one type of sound, and then performing the step of feature value extraction on the audio information; if not, performing the step of feature value extraction directly.
  • Performing feature value extraction on the audio information then includes: performing feature value extraction on each piece of single-type audio information.
  • Matching the extracted feature values with the feature value models in the preset database includes: for each piece of single-type audio information, matching the feature values extracted from that piece against the feature value models in the preset database.
  • Determining, according to the matching result, the alert level corresponding to the audio information includes: if the audio information is single-type audio information, determining, according to the matching result, the alert level corresponding to the single-type audio information; if the audio information is multi-type audio information, obtaining the matching result corresponding to each piece of single-type audio information contained in it, determining the weight corresponding to each piece, and determining, according to the determined weights and matching results, the alert level corresponding to the multi-type audio information.
  • Decomposing the multi-type audio information into at least one piece of single-type audio information may include: segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule; and, for each audio segment, determining whether the segment contains multiple sound types.
  • If not, the audio segment is treated as one piece of single-type audio information;
  • if so, the audio segment is decomposed into at least one piece of single-type audio information according to the sound parameters in the segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
  • Optionally, when the audio information is determined to be multi-type audio information, the method may further include: matching the multi-type audio information with at least one preset scene sound model; determining, according to the matching result, each scene sound contained in the multi-type audio information; and determining the alert level and weight corresponding to each scene sound.
  • In that case, decomposing the multi-type audio information into at least one piece of single-type audio information may include: determining the voice information contained in the multi-type audio information; and determining, according to the timbre of the voice information, each piece of single-type audio information corresponding to the voice information.
  • Determining the alert level corresponding to the multi-type audio information may then include: determining the alert level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information, and the alert level and weight corresponding to each scene sound.
  • Determining the notification information corresponding to the audio information may include: acquiring the video images and/or geographic location information corresponding to the audio information; and determining the video images and/or geographic location information as the notification information corresponding to the audio information.
  • Optionally, before the determined notification information is output, the method may further include: prompting the user whether to output the notification information; determining whether rejection information sent by the user is received within a preset time period; and if not, performing the step of outputting the determined notification information.
  • The process of constructing the database may include: acquiring simulated audio information of an abnormal event; performing feature value extraction on the simulated audio information; constructing a feature value model according to the extracted feature values; and storing the constructed feature value model into the database in correspondence with the alert level set by the user.
  • Optionally, the method may further include: receiving an add instruction sent by the user; extracting the feature values of the target audio information corresponding to the add instruction; constructing a target feature value model according to those feature values; and adding the target feature value model to the database in correspondence with the alert level included in the add instruction.
  • An embodiment of the present application further discloses a server, including a processor and a memory, wherein the memory is used to store executable program code, and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the following steps:
  • acquiring audio information; performing feature value extraction on the audio information; matching the extracted feature values with feature value models in a preset database, the database storing the correspondence between feature value models and alert levels; determining, according to the matching result, the alert level corresponding to the audio information; determining whether the alert level satisfies a preset condition, and if so, determining the notification information corresponding to the audio information; and outputting the determined notification information.
  • the feature value model includes a scene sound model; the scene sound model is a feature value model established for a preset scene sound; and the processor is further configured to perform the following steps:
  • the extracted feature values are matched to the scene sound model.
  • the processor is further configured to perform the following steps: after the audio information is acquired, determining whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound;
  • if so, first decomposing the multi-type audio information into at least one piece of single-type audio information, each piece containing one type of sound, and then performing feature value extraction on each piece of single-type audio information; if not, performing feature value extraction directly on the single-type audio information;
  • for each piece of single-type audio information, matching the feature values extracted from that piece against the feature value models in the preset database;
  • if the audio information is single-type audio information, determining, according to the matching result, the alert level corresponding to the single-type audio information;
  • if the audio information is multi-type audio information, obtaining the matching result corresponding to each piece of single-type audio information contained in it, determining the weight corresponding to each piece, and determining, according to the determined weights and matching results, the alert level corresponding to the multi-type audio information.
  • the processor is further configured to perform the following steps: segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule; for each audio segment, determining whether the segment contains multiple sound types;
  • if not, treating the audio segment as one piece of single-type audio information;
  • if so, decomposing the audio segment into at least one piece of single-type audio information according to the sound parameters in the segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
  • the processor is further configured to perform the following steps: when the audio information is determined to be multi-type audio information, matching the multi-type audio information with at least one preset scene sound model; determining, according to the matching result, each scene sound contained in the multi-type audio information; determining the alert level and weight corresponding to each scene sound; determining the voice information contained in the multi-type audio information; determining, according to the timbre of the voice information, each piece of single-type audio information corresponding to the voice information; and determining the alert level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information and the alert level and weight corresponding to each scene sound.
  • the processor is further configured to perform the following steps: acquiring the video images and/or geographic location information corresponding to the audio information; and determining the video images and/or geographic location information as the notification information corresponding to the audio information.
  • the processor is further configured to perform the following steps: before outputting the determined notification information, prompting the user whether to output the notification information; determining whether rejection information sent by the user is received within a preset time period; and if not, performing the step of outputting the determined notification information.
  • the processor is further configured to perform the following steps: acquiring simulated audio information of an abnormal event; performing feature value extraction on the simulated audio information; constructing a feature value model according to the extracted feature values; and storing the constructed feature value model into the database in correspondence with the alert level set by the user.
  • the processor is further configured to perform the following steps: receiving an add instruction sent by the user; extracting the feature values of the target audio information corresponding to the add instruction; constructing a target feature value model according to those feature values; and adding the target feature value model to the database in correspondence with the alert level contained in the add instruction.
  • the embodiment of the present application further discloses a monitoring system, including: a server,
  • the server is configured to acquire audio information; perform feature value extraction on the audio information; match the extracted feature values with feature value models in a preset database, the database storing the correspondence between feature value models and alert levels; determine, according to the matching result, the alert level corresponding to the audio information; determine whether the alert level satisfies a preset condition; if so, determine the notification information corresponding to the audio information; and output the determined notification information.
  • the system further includes: an audio collection device,
  • the audio collection device is configured to collect audio information, and send the collected audio information to the server.
  • the system further includes: a video collection device,
  • the video capture device is configured to collect video images, determine its own geographic location information, and send the collected video images and the determined geographic location information to the server;
  • the server is further configured to, in the process of determining the notification information corresponding to the audio information, determine the video images and geographic location information corresponding to the audio information, and add the video images and geographic location information to the notification information.
  • the server includes a communication server and a database server, wherein:
  • the database server is configured to acquire simulated audio information of an abnormal event; perform feature value extraction on the simulated audio information; construct a feature value model according to the extracted feature values; and store the constructed feature value model into the database of the database server in correspondence with the alert level set by the user;
  • the communication server is configured to acquire audio information; perform feature value extraction on the audio information; match the extracted feature values with the feature value models in the database of the database server, the database storing the correspondence between feature value models and alert levels; determine, according to the matching result, the alert level corresponding to the audio information; determine whether the alert level satisfies a preset condition, and if so, determine the notification information corresponding to the audio information; and output the determined notification information.
  • An embodiment of the present application further discloses a computer-readable storage medium storing a computer program which, when executed by a processor, implements any of the above methods for outputting notification information.
  • An embodiment of the present application further discloses executable program code to be run so as to perform any of the above methods for outputting notification information.
  • By applying the embodiments of the present application, a database is established in advance containing the correspondence between feature value models and alert levels; the feature values of audio information are acquired and matched against the feature value models in the database, so that the alert level corresponding to the audio information is determined; when the alert level satisfies a preset condition, the notification information is output. It can thus be seen that the embodiments of the present application output notification information by analyzing audio information, without having to determine the moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, notification information can still be output accurately by applying this solution.
  • FIG. 1 is a first schematic flowchart of a method for outputting notification information according to an embodiment of the present application
  • FIG. 2 is a second schematic flowchart of a method for outputting notification information according to an embodiment of the present application
  • FIG. 3 is a third schematic flowchart of a method for outputting notification information according to an embodiment of the present application
  • FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a first structure of a monitoring system according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of a second structure of a monitoring system according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a third structure of a monitoring system according to an embodiment of the present application.
  • the embodiment of the present application provides a method, a server, and a monitoring system for outputting notification information.
  • the method can be applied to a server in a monitoring system, or can be applied to various electronic devices, and is not limited.
  • a method for outputting notification information provided by an embodiment of the present application is described in detail below.
  • FIG. 1 is a schematic flowchart of a method for outputting notification information according to an embodiment of the present application, including:
  • S101: Acquire audio information.
  • As one implementation, the device implementing this solution (hereinafter referred to as "this device") may have an audio collection function, and the audio information acquired in S101 may be collected by this device itself.
  • the device can be communicatively coupled to the audio collection device to obtain audio information from the audio collection device.
  • the solution may be executed once every preset time period, that is, the audio information is acquired once every preset time length.
  • the solution may be executed after receiving the trigger command of the user, which is not limited.
  • S102: Perform feature value extraction on the audio information. In one implementation, the acquired audio information may first be filtered, denoised, and so on, and then the feature values are extracted.
  • the extracted feature values may include one or more of the following types:
  • speech rate, semantic information, zero-crossing rate, maximum volume, minimum volume, average volume, maximum volume change rate, minimum volume change rate, average volume change rate, maximum sound frequency, minimum sound frequency, average sound frequency, maximum sound frequency change rate, minimum sound frequency change rate, average sound frequency change rate, audio curve vector, volume curve vector, and the like.
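  • For illustration only (not part of the patent text): the sketch below shows how a few of the feature values listed above could be computed from raw PCM samples with NumPy. The 20 ms frame length, the dB reference, and the function name are assumptions made for this example; speech rate and semantic information would instead come from a speech recognizer.

        import numpy as np

        def extract_features(samples: np.ndarray, sample_rate: int) -> dict:
            """A minimal sketch: volume statistics, volume change rate, zero-crossing rate."""
            frame_len = sample_rate // 50                      # 20 ms frames (assumed)
            usable = len(samples) // frame_len * frame_len
            frames = samples[:usable].astype(np.float64).reshape(-1, frame_len)
            rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12  # per-frame RMS volume
            volume_db = 20 * np.log10(rms / (np.abs(samples).max() + 1e-12))
            dv = np.diff(volume_db)                            # frame-to-frame volume change
            zcr = np.mean(np.abs(np.diff(np.sign(samples))) > 0)
            return {
                "volume_max": volume_db.max(),
                "volume_min": volume_db.min(),
                "volume_avg": volume_db.mean(),
                "volume_change_rate_max": dv.max(),
                "volume_change_rate_min": dv.min(),
                "volume_change_rate_avg": dv.mean(),
                "zero_crossing_rate": zcr,
            }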
  • S103: Match the extracted feature values with the feature value models in the preset database.
  • In this embodiment, the database is constructed in advance, before this solution is executed. The database stores the correspondence between feature value models and alert levels, and a feature value model may be a set of multiple feature values.
  • In this embodiment, the kinds of feature values contained in a feature value model are consistent with the kinds of feature values extracted in S102. Only in this way can a good matching result be obtained.
  • For example, assume the alert levels are divided into three levels, with level three the highest. In the database, the feature value model corresponding to the level-one alert level may be: speech rate 200 words/minute, average volume 70 dB, semantic information "be careful". The feature value model corresponding to the level-two alert level may be: speech rate 300 words/minute, average volume 80 dB, semantic information "somebody come". The feature value model corresponding to the level-three alert level may be: speech rate 400 words/minute, average volume 90 dB, semantic information "help".
  • It should be noted that each alert level may correspond to multiple feature value models; to simplify the description, only the above models are taken as examples here.
  • S104: Determine, according to the matching result, the alert level corresponding to the audio information.
  • Assume the feature values acquired in S102 include: speech rate 300 words/minute, average volume 80 dB, semantic information "somebody come". These feature values are matched against the feature value models in the above database, and the level-two alert level is matched, so the alert level corresponding to the audio information acquired in S101 is determined to be level two.
  • It should be noted that, when the extracted feature values are matched against the feature value models in the database, the criterion for a successful match may be set according to the actual situation; for example, it may be set that the match succeeds when the match rate is higher than a preset value.
  • The matching result may include information that a certain feature value model was matched successfully, or that matching a certain feature value model failed, or other information; this is not specifically limited.
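  • For illustration only: a sketch of the matching step under the example models above. The per-feature tolerances, the 0.8 match-rate threshold, and the class and function names are assumptions; the patent only requires that a match is deemed successful when the match rate exceeds a preset value.

        from dataclasses import dataclass

        @dataclass
        class FeatureValueModel:
            speech_rate: float   # words per minute
            volume_avg: float    # dB
            semantics: str       # key phrase
            alert_level: int

        DATABASE = [
            FeatureValueModel(200, 70, "be careful", 1),
            FeatureValueModel(300, 80, "somebody come", 2),
            FeatureValueModel(400, 90, "help", 3),
        ]

        def match_rate(features: dict, model: FeatureValueModel) -> float:
            """Fraction of feature values that agree with the model (tolerances assumed)."""
            checks = [
                abs(features["speech_rate"] - model.speech_rate) <= 30,
                abs(features["volume_avg"] - model.volume_avg) <= 5,
                model.semantics in features["semantics"],
            ]
            return sum(checks) / len(checks)

        def alert_level_for(features: dict, threshold: float = 0.8):
            """Return the alert level of the best-matching model, or None if no match."""
            best = max(DATABASE, key=lambda m: match_rate(features, m))
            return best.alert_level if match_rate(features, best) >= threshold else None

        # The S104 example: these feature values match the level-two model.
        print(alert_level_for({"speech_rate": 300, "volume_avg": 80,
                               "semantics": "somebody come quickly"}))  # -> 2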
  • As an optional implementation, the feature value models stored in the preset database may include scene sound models; a scene sound model may be a feature value model established for a preset scene sound.
  • Scene sounds may include gunshots, crying, whistles, and so on; this is not limited. It can be understood that when chaos breaks out in a scene such as a shopping mall, supermarket, or bank, it is usually accompanied by gunshots, whistles, and crying; in this embodiment, these sounds are referred to as scene sounds.
  • A machine learning algorithm may be used in advance to perform model training on these scene sounds to obtain the scene sound models. It can be understood that when such scene sounds are present, the probability that an abnormal event is occurring is high; therefore, the alert levels corresponding to the scene sound models can be set relatively high.
  • In this case, the feature values extracted in S102 are matched against the scene sound models, and the alert level corresponding to the successfully matched scene sound model is determined as the alert level of the audio information.
  • S105: Determine whether the alert level satisfies a preset condition; if so, execute S106. The alert level in this step refers to the alert level corresponding to the above audio information determined in S104.
  • S106 Determine notification information corresponding to the audio information.
  • For example, assume the preset condition is that the alert level is level one or higher; then the alert level determined above satisfies the condition, and the notification information corresponding to the audio information acquired in S101 is determined.
  • S106 may include: acquiring video images and/or geographic location information corresponding to the audio information; and determining the video image and/or geographic location information as notification information corresponding to the audio information.
  • As one implementation, this device may have a video capture function and a positioning function, so that it can obtain the video images it collects itself and the geographic location information it determines itself; alternatively, this device may communicate with other devices and obtain the video images and/or geographic location information from those devices. The manner of acquiring the video images and/or geographic location information is not limited.
  • The video images corresponding to the audio information are video images collected in the same scene and over the same time period as the audio information; the geographic location information corresponding to the audio information is the geographic location information of the device that collects the audio information.
  • If this device obtains the video images and/or geographic location information corresponding to the audio information from other devices, those devices perform video capture on the same scene as the device that collects the audio information.
  • In this way, the notification information includes the video images and/or geographic location information, so that the abnormal event can be reported to the relevant personnel more accurately for handling.
  • As an optional implementation, before the determined notification information is output, the user may be prompted whether to output the notification information; it is then determined whether rejection information sent by the user is received within a preset time period; if not, S107 (outputting the determined notification information) is executed.
  • The prompt information may include one or more of the following: the alert level corresponding to the audio information, the video images, the geographic location information, and so on; this is not specifically limited.
  • The prompt information is displayed to the user, and it can be displayed in various forms, such as a pop-up window or a flashing reminder; this is not limited.
  • For the prompt information, the user may confirm the output, reject the output, or make no selection. If confirmation information sent by the user is received (the user confirms the output), or no user feedback is received within the preset time period (the user makes no selection), S107 is executed; if rejection information sent by the user is received (the user rejects the output), the notification information is not output.
  • As an optional implementation, the process of constructing the foregoing database may include: acquiring simulated audio information of an abnormal event; performing feature value extraction on the simulated audio information; constructing a feature value model according to the extracted feature values; and storing the constructed feature value model into the database in correspondence with the alert level set by the user.
  • An abnormal event can be understood as a robbery, a property dispute, or the like; this is not limited. The above database can be constructed according to actual needs.
  • For example, simulated audio information of a robbery can be recorded, and the feature values of the simulated audio information extracted. Assume the extracted feature values include: speech rate 400 words/minute, average volume 90 dB, semantic information "help". A feature value model is constructed according to the extracted feature values; the feature value model may be the set of the above feature values.
  • The feature value model is then stored in correspondence with the alert level set by the user. In this way, the correspondence between each feature value model and its alert level is stored in the database.
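  • For illustration only: a sketch of this database-construction flow, reusing the hypothetical extract_features sketched earlier; the in-memory storage layout and the function names are assumptions.

        DATABASE = []  # list of (feature_value_model, alert_level) pairs

        def build_model_from_simulated_audio(samples, sample_rate, alert_level: int):
            """Extract feature values from simulated audio of an abnormal event and
            store the resulting feature value model with the user-set alert level."""
            model = extract_features(samples, sample_rate)  # the sketch shown earlier
            DATABASE.append((model, alert_level))

        def handle_add_instruction(target_samples, sample_rate, alert_level: int):
            """Update the built database: construct a target feature value model from
            user-designated target audio and add it under the level in the instruction."""
            build_model_from_simulated_audio(target_samples, sample_rate, alert_level)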
  • As an optional implementation, the constructed database can also be updated: a target feature value model can be added to the database in correspondence with the alert level contained in an add instruction sent by the user.
  • Specifically, audio information that the user considers to match the desired audio is referred to as target audio information. The user may send an add instruction to this device, and the add instruction may carry the target audio information or an identifier of it.
  • This device determines the target audio information according to the add instruction, extracts the feature values of the target audio information, constructs a target feature value model according to the extracted feature values, and adds the target feature value model to the database in correspondence with the alert level contained in the add instruction.
  • By applying this embodiment, a database containing the correspondence between feature value models and alert levels is established in advance; the feature values of the audio information are acquired and matched against the feature value models in the database, so that the alert level corresponding to the audio information is determined; when the alert level satisfies the preset condition, the notification information is output. It can thus be seen that this embodiment outputs notification information by analyzing audio information, without having to determine the moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, the notification information can still be output accurately by applying this solution.
  • FIG. 2 is a second schematic flowchart of a method for outputting notification information according to an embodiment of the present application, including:
  • S201: Acquire audio information.
  • S202: Determine whether the audio information is multi-type audio information; if so, execute S203; if not, execute S204 directly.
  • S203: Decompose the multi-type audio information into at least one piece of single-type audio information.
  • Multi-type audio information contains multiple types of sound, and single-type audio information contains one type of sound.
  • The application scenario of this solution may be a single-sound scenario; for example, in a home scenario, the collected audio information may contain only one person's voice information, and such audio information is the single-type audio information mentioned above.
  • The application scenario of this solution may also be a multi-sound scenario, such as a supermarket, shopping mall, or bank, where the collected audio information contains multiple persons' voice information; such audio information is the multi-type audio information mentioned above.
  • Alternatively, the collected audio information may contain one person's voice information together with environmental sounds; such audio information is also multi-type audio information.
  • Or the collected audio information may contain multiple persons' voice information together with environmental sounds; such audio information is likewise multi-type audio information.
  • the multi-type audio information may be first decomposed into single-type audio information, and then the subsequent steps are performed.
  • Specifically, S203 may include: segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule; and, for each audio segment, determining whether the segment contains multiple sound types. If not, the segment is treated as one piece of single-type audio information; if so, the segment is decomposed into at least one piece of single-type audio information according to the sound parameters in the segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
  • There can be multiple preset segmentation rules. For example, the multi-type audio information may be divided into audio segments of equal duration, or into audio segments of equal data size; or the number of segments may be determined according to the total duration of the multi-type audio information and the information segmented into that many segments; or the number of segments may be determined according to the total data volume of the multi-type audio information and the information segmented accordingly; and so on. The specific segmentation rule is not limited.
  • For example, the multi-type audio information can be divided into audio segments of 1-second duration; if the total duration of the multi-type audio information is 1 minute, 60 audio segments are obtained.
  • Assume the multi-type audio information is a conversation between person A and person B, one minute long, in which person A's voice information does not overlap person B's voice information. Suppose the first 30 audio segments obtained by the segmentation contain only person A's voice information and the last 30 audio segments contain only person B's voice information; then each of the 60 audio segments contains only one sound type, and all of them are single-type audio information.
  • However, it cannot be guaranteed that each audio segment contains only one person's voice information; in practice, multiple sound types may appear in a single audio segment. Still assume that the multi-type audio information is the one-minute conversation between person A and person B, but now some of the segmented audio segments contain only one person's voice information while others contain both persons' voice information.
  • An audio segment containing one person's voice information is treated as one piece of single-type audio information; an audio segment containing two persons' voice information is further decomposed according to the sound parameters in the segment, as sketched below.
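  • For illustration only: a sketch of S203 as a whole. The 1-second segmentation follows the example above; the classify and separate callbacks are hypothetical, pluggable stand-ins for a sound-type detector and a sound-parameter-based separator (one separation idea is sketched further below).

        def decompose_multi_type_audio(samples, sample_rate, classify, separate,
                                       segment_seconds=1.0):
            """Segment multi-type audio; keep single-type segments, decompose mixed ones.
            classify(seg) returns the set of sound types found in a segment, and
            separate(seg) splits a mixed segment; both are assumed external models."""
            step = int(segment_seconds * sample_rate)
            segments = [samples[i:i + step] for i in range(0, len(samples), step)]
            pieces = []
            for seg in segments:
                if len(classify(seg)) <= 1:
                    pieces.append(seg)            # already single-type
                else:
                    pieces.extend(separate(seg))  # decompose by pitch/loudness/timbre
            return pieces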
  • In scenarios such as supermarkets, shopping malls, and banks, multiple types of sound often occur at the same moment. When multi-type audio information collected in such scenes is segmented, the audio segments corresponding to those moments contain multiple sound types, and such segments are further decomposed according to the sound parameters in them.
  • The sound parameters may include one or more of the following: pitch, loudness, and timbre. Pitch characterizes how high or low a sound is (its frequency), loudness characterizes its intensity (amplitude), and timbre characterizes the quality that distinguishes one sound source from another.
  • Different sounds differ in sound parameters such as pitch, loudness, and timbre, so these parameters can be used to separate them. Therefore, an audio segment containing multiple sound types can be further decomposed to obtain the individual pieces of single-type audio information. One possible starting point is sketched below.
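  • For illustration only: one way a sound-parameter-based decomposition might begin is by estimating a dominant pitch for each short frame via autocorrelation and grouping frames into pitch bands. This is an assumption about one possible approach; properly separating overlapping sources would need a real source-separation method.

        import numpy as np

        def dominant_pitch(frame: np.ndarray, sample_rate: int) -> float:
            """Crude autocorrelation pitch estimate for one frame (50-500 Hz assumed)."""
            frame = frame.astype(np.float64)
            frame -= frame.mean()
            corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
            lo, hi = sample_rate // 500, sample_rate // 50  # candidate pitch lags
            lag = lo + int(np.argmax(corr[lo:hi]))
            return sample_rate / lag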
  • S204 corresponds to S102 in FIG. 1, and S205 corresponds to S103 in FIG. 1; however, the feature value extraction and feature value matching steps in FIG. 2 are performed for each piece of single-type audio information. Therefore:
  • S204: Perform feature value extraction on each piece of single-type audio information.
  • S205: For each piece of single-type audio information, match the feature values extracted from that piece against the feature value models in the preset database.
  • S206 corresponds to S104 in FIG. 1, and S206 is:
  • If the audio information acquired in S201 is single-type audio information, determine, according to the matching result, the alert level corresponding to the single-type audio information;
  • if the audio information acquired in S201 is multi-type audio information, obtain the matching result corresponding to each piece of single-type audio information contained in the multi-type audio information, determine the weight corresponding to each piece of single-type audio information, and determine, according to the determined weights and the matching results, the alert level corresponding to the multi-type audio information.
  • It can be understood that each piece of single-type audio information contained in the multi-type audio information corresponds to a matching result, and a weight can be determined for each piece.
  • There are many specific ways to determine the weights: for example, according to the order in which the pieces of single-type audio information were obtained by decomposition, or according to the average volume of each piece, and so on; this is not limited.
  • Assume the multi-type audio information acquired in S201 contains a whistle, crying, and multiple persons' voice information, and that decomposing it yields four pieces of single-type audio information: "whistle", "crying", "person A's voice information", and "person B's voice information".
  • Assume further that, according to the matching result corresponding to "whistle", the determined alert level is level two; according to the matching result corresponding to "crying", level three; according to the matching result corresponding to "person A's voice information", level three; and according to the matching result corresponding to "person B's voice information", level one.
  • Assume the weight corresponding to "whistle" is 0.7, the weight corresponding to "crying" is 0.9, the weight corresponding to "person A's voice information" is 0.8, and the weight corresponding to "person B's voice information" is 0.6. The alert level of the multi-type audio information is then determined from these levels and weights, for example as sketched below.
  • If decomposing the multi-type audio information acquired in S201 yields single-type audio information corresponding to scene sounds as well as other single-type audio information (for example, persons' voice information), it is also reasonable to consider only the alert levels and weights corresponding to the scene sounds when calculating the alert level of the multi-type audio information.
  • S207 is the same as S105 in FIG. 1, S208 is the same as S106 in FIG. 1, and S209 is the same as S107 in FIG. 1.
  • That is, after the alert level corresponding to the multi-type audio information is determined, the notification information corresponding to the multi-type audio information is determined; the subsequent steps are similar to those in the embodiment of FIG. 1 and are not repeated here.
  • FIG. 3 is a third schematic flowchart of a method for outputting notification information according to an embodiment of the present application, including:
  • S301: Acquire audio information.
  • S302: Determine whether the audio information is multi-type audio information; if so, execute S303; if not, execute S308 directly.
  • S303: Match the multi-type audio information with at least one preset scene sound model.
  • S304: Determine, according to the matching result, each scene sound contained in the multi-type audio information.
  • S305: Determine the alert level and weight corresponding to each scene sound.
  • The scene sound models may include a gunshot model, a whistle model, a crying model, and so on; this is not specifically limited. It can be understood that when chaos breaks out in a scene such as a shopping mall, supermarket, or bank, it is usually accompanied by gunshots, whistles, and crying; in the embodiment of FIG. 3, these sounds are referred to as scene sounds.
  • A machine learning algorithm can be used in advance to perform model training on these scene sounds to obtain the scene sound models. Before the multi-type audio information is decomposed, it can be matched against these scene sound models.
  • Assume the multi-type audio information acquired in S301 contains a whistle, crying, and multiple persons' voice information.
  • The multi-type audio information is matched against the various preset scene sound models, and the matching result is: it matches the whistle model and the crying model; that is, it is determined that the multi-type audio information contains a whistle and crying.
  • As one implementation, the corresponding alert level and weight may be set in advance for each kind of scene sound.
  • The set alert level and weight can be stored in correspondence with the scene sound model, so that, according to the matching result in S303, the alert level and weight corresponding to each scene sound (the whistle and the crying) can be determined directly, for example from a preset table like the one sketched below.
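  • For illustration only: the preset alert levels and weights could live in a table stored alongside the scene sound models. The entries below reuse the values from the running example; the gunshot entry and all names are assumptions.

        SCENE_SOUND_PRESETS = {
            "gunshot": {"alert_level": 3, "weight": 1.0},  # assumed entry
            "whistle": {"alert_level": 2, "weight": 0.7},
            "crying":  {"alert_level": 3, "weight": 0.9},
        }

        def scene_sound_levels(matched_sounds):
            """S305 as a lookup: alert level and weight for each matched scene sound."""
            return {name: SCENE_SOUND_PRESETS[name] for name in matched_sounds}

        print(scene_sound_levels(["whistle", "crying"]))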
  • As another implementation, S305 may include: extracting each scene sound from the multi-type audio information; and, for each extracted scene sound, performing feature value extraction on the scene sound, matching the extracted feature values against the feature value models in the preset database, and determining the alert level corresponding to the successfully matched feature value model as the alert level of that scene sound.
  • Following the above example, the multi-type audio information contains a whistle and crying.
  • The whistle and the crying can be extracted separately according to pitch, loudness, timbre, or other sound parameters.
  • In this case the scene sound is also handled as single-type audio information.
  • Feature value extraction and feature value matching are then performed on the whistle and the crying; the specific process is similar to S204 and S205 in the embodiment of FIG. 2 and is not repeated here.
  • The database in this embodiment and the database in the embodiment of FIG. 1 may be the same database or different databases; this is not specifically limited.
  • In the embodiment of FIG. 3, the scene sounds and the voice information in the multi-type audio information are processed separately. The scene sounds may be processed first and then the voice information, or the voice information first and then the scene sounds; that is, S303-S305 may be executed before S306-S309, or S306-S309 before S303-S305. The specific order is not limited.
  • The voice information in this embodiment refers to speech uttered by a person that carries semantics, and does not include sounds without semantics, such as the crying mentioned above.
  • S306: Determine the voice information contained in the multi-type audio information.
  • S307: Determine, according to the timbre of the voice information, each piece of single-type audio information corresponding to the voice information.
  • The voice information uttered by a person can be extracted using timbre, or in other ways; this is not limited.
  • S308 corresponds to S204 in FIG. 2, and S309 corresponds to S205 in FIG. 2; the specific process is not repeated here.
  • S310: If the audio information acquired in S301 is multi-type audio information, obtain the matching result corresponding to each piece of single-type audio information contained in the multi-type audio information; determine the weight corresponding to each piece of single-type audio information; and determine the alert level corresponding to the multi-type audio information according to the weights and matching results corresponding to the pieces of single-type audio information, together with the alert level and weight corresponding to each scene sound.
  • It can be understood that each piece of single-type audio information contained in the multi-type audio information corresponds to a matching result, and a weight can be determined for each piece.
  • There are many specific ways to determine the weights: for example, according to the order in which the pieces of single-type audio information were obtained by decomposition, or by random assignment, or according to the average volume of each piece, and so on; this is not limited.
  • In the embodiment of FIG. 3, the alert levels and weights determined in S305 and the matching results and weights corresponding to the pieces of single-type audio information are considered together to determine the alert level corresponding to the multi-type audio information. That is, the alert level and weight corresponding to each scene sound, and the alert level and weight corresponding to each piece of voice information, are considered comprehensively.
  • Following the above example, the multi-type audio information acquired in S301 contains two scene sounds, a whistle and crying, together with the voice information of person A and person B.
  • It is possible first to match the multi-type audio information against the scene sound models to determine that it contains the "whistle" and the "crying", and then to determine that the voice information contained in it corresponds to the two pieces of single-type audio information "person A's voice information" and "person B's voice information"; or first to determine the two pieces of voice information and then to match the multi-type audio information against the scene sound models to determine that it contains the "whistle" and the "crying". The order is not limited.
  • Assume that S305 determines that the alert level corresponding to the "whistle" is level two with weight 0.7 and that the alert level corresponding to the "crying" is level three with weight 0.9, and that S306-S309 determine that the alert level corresponding to "person A's voice information" is level three with weight 0.8 and that the alert level corresponding to "person B's voice information" is level one with weight 0.6. The alert level corresponding to the multi-type audio information is then determined from these four levels and weights, for example with the weighted aggregation sketched after the FIG. 2 example above.
  • S311 is the same as S105 in FIG. 1, S312 is the same as S106 in FIG. 1, and S313 is the same as S107 in FIG. 1.
  • That is, after the alert level corresponding to the multi-type audio information is determined, the notification information corresponding to the multi-type audio information is determined; the subsequent steps are similar to those in the embodiment of FIG. 1 and are not repeated here.
  • the embodiment of the present application further provides a server.
  • FIG. 4 is a schematic structural diagram of a server according to an embodiment of the present application, including a processor 401 and a memory 402, wherein the memory 402 is configured to store executable program code, and the processor 401 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 402, so as to perform the following steps:
  • acquiring audio information; performing feature value extraction on the audio information; matching the extracted feature values with feature value models in a preset database, the database storing the correspondence between feature value models and alert levels; determining, according to the matching result, the alert level corresponding to the audio information; determining whether the alert level satisfies a preset condition, and if so, determining the notification information corresponding to the audio information; and outputting the determined notification information.
  • the feature value model includes a scene sound model;
  • the scene sound model is a feature value model established for a preset scene sound;
  • the processor 401 can also be configured to perform the following steps:
  • the extracted feature values are matched to the scene sound model.
  • the processor 401 is further configured to perform the following steps: after the audio information is acquired, determining whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound;
  • if so, first decomposing the multi-type audio information into at least one piece of single-type audio information, each piece containing one type of sound, and then performing feature value extraction on each piece of single-type audio information; if not, performing feature value extraction directly on the single-type audio information;
  • for each piece of single-type audio information, matching the feature values extracted from that piece against the feature value models in the preset database;
  • if the audio information is single-type audio information, determining, according to the matching result, the alert level corresponding to the single-type audio information;
  • if the audio information is multi-type audio information, obtaining the matching result corresponding to each piece of single-type audio information contained in it, determining the weight corresponding to each piece, and determining, according to the determined weights and matching results, the alert level corresponding to the multi-type audio information.
  • the processor 401 is further configured to perform the following steps: segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule; for each audio segment, determining whether the segment contains multiple sound types;
  • if not, treating the audio segment as one piece of single-type audio information;
  • if so, decomposing the audio segment into at least one piece of single-type audio information according to the sound parameters in the segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
  • the processor 401 is further configured to perform the following steps: when the audio information is determined to be multi-type audio information, matching the multi-type audio information with at least one preset scene sound model;
  • determining, according to the matching result, each scene sound contained in the multi-type audio information; determining the alert level and weight corresponding to each scene sound; determining the voice information contained in the multi-type audio information; determining, according to the timbre of the voice information, each piece of single-type audio information corresponding to the voice information; and determining the alert level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information and the alert level and weight corresponding to each scene sound.
  • the processor 401 is further configured to perform the following steps: acquiring the video images and/or geographic location information corresponding to the audio information;
  • determining the video images and/or geographic location information as the notification information corresponding to the audio information.
  • the processor 401 is further configured to perform the following steps: before outputting the determined notification information, prompting the user whether to output the notification information; determining whether rejection information sent by the user is received within a preset time period; and if not, performing the step of outputting the determined notification information.
  • the processor 401 is further configured to perform the following steps: acquiring simulated audio information of an abnormal event; performing feature value extraction on the simulated audio information; constructing a feature value model according to the extracted feature values; and storing the constructed feature value model into the database in correspondence with the alert level set by the user.
  • the processor 401 is further configured to perform the following steps: receiving an add instruction sent by the user; extracting the feature values of the target audio information corresponding to the add instruction; constructing a target feature value model according to those feature values; and adding the target feature value model to the database in correspondence with the alert level contained in the add instruction.
  • By applying this embodiment, a database containing the correspondence between feature value models and alert levels is established in advance; the feature values of the audio information are acquired and matched against the feature value models in the database, so that the alert level corresponding to the audio information is determined; when the alert level satisfies the preset condition, the notification information is output. It can thus be seen that this embodiment outputs notification information by analyzing audio information, without having to determine the moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, the notification information can still be output accurately by applying this solution.
  • An embodiment of the present application further provides a monitoring system. The system may include only a server that itself has an audio collection function; or it may include a server and an audio collection device, as shown in FIG. 5; or it may include a server and a multimedia collection device having both audio and video collection functions, as shown in FIG. 6; or it may include a server, an audio collection device, and a video collection device, as shown in FIG. 7.
  • the audio collection device or the multimedia collection device is configured to collect audio information and send the collected audio information to the server.
  • The video capture device or the multimedia capture device is configured to collect video images, determine its own geographic location information, and send the collected video images and the determined geographic location information to the server;
  • the server is further configured to determine, in the process of determining the notification information corresponding to the audio information, a video image and geographic location information corresponding to the audio information, and add the video image and geographic location information to the notification information.
  • Optionally, the server may include a communication server and a database server, wherein:
  • the database server is configured to acquire simulated audio information of an abnormal event; perform feature value extraction on the simulated audio information; construct a feature value model according to the extracted feature values; and store the constructed feature value model into the database of the database server in correspondence with the alert level set by the user;
  • the communication server is configured to acquire audio information; perform feature value extraction on the audio information; match the extracted feature values with the feature value models in the database of the database server, the database storing the correspondence between feature value models and alert levels; determine, according to the matching result, the alert level corresponding to the audio information; determine whether the alert level satisfies a preset condition, and if so, determine the notification information corresponding to the audio information; and output the determined notification information.
  • In this system, the server may be configured to: acquire audio information; perform feature value extraction on the audio information; match the extracted feature values with feature value models in a preset database, the database storing the correspondence between feature value models and alert levels; determine, according to the matching result, the alert level corresponding to the audio information; determine whether the alert level satisfies a preset condition, and if so, determine the notification information corresponding to the audio information; and output the determined notification information.
  • the feature value model includes a scene sound model; the scene sound model is a feature value model established for a preset scene sound; the server may further be configured to:
  • the extracted feature values are matched to the scene sound model.
  • The server can also be configured to: after acquiring the audio information, determine whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound; if so, first decompose the multi-type audio information into at least one piece of single-type audio information, each piece containing one type of sound, and then perform feature value extraction on each piece of single-type audio information; if not, perform feature value extraction directly on the single-type audio information; for each piece of single-type audio information, match the feature values extracted from that piece against the feature value models in the preset database; if the audio information is single-type audio information, determine, according to the matching result, the alert level corresponding to the single-type audio information; if the audio information is multi-type audio information, obtain the matching result corresponding to each piece of single-type audio information contained in it, determine the weight corresponding to each piece, and determine, according to the determined weights and matching results, the alert level corresponding to the multi-type audio information.
  • The server can also be configured to: segment the multi-type audio information into multiple audio segments according to a preset segmentation rule; for each audio segment, determine whether the segment contains multiple sound types; if not, treat the audio segment as one piece of single-type audio information; if so, decompose the audio segment into at least one piece of single-type audio information according to the sound parameters in the segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
  • The server can also be configured to: when the audio information is determined to be multi-type audio information, match the multi-type audio information with at least one preset scene sound model; determine, according to the matching result, each scene sound contained in the multi-type audio information; determine the alert level and weight corresponding to each scene sound; determine the voice information contained in the multi-type audio information; determine, according to the timbre of the voice information, each piece of single-type audio information corresponding to the voice information; and determine the alert level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information and the alert level and weight corresponding to each scene sound.
  • The server can also be configured to: acquire the video images and/or geographic location information corresponding to the audio information; and determine the video images and/or geographic location information as the notification information corresponding to the audio information.
  • The server can also be configured to: before outputting the determined notification information, prompt the user whether to output the notification information; determine whether rejection information sent by the user is received within a preset time period; and if not, perform the step of outputting the determined notification information.
  • The process by which the server constructs the database may include: acquiring simulated audio information of an abnormal event; performing feature value extraction on the simulated audio information; constructing a feature value model according to the extracted feature values; and storing the constructed feature value model into the database in correspondence with the alert level set by the user.
  • The server can also be configured to: receive an add instruction sent by the user; extract the feature values of the target audio information corresponding to the add instruction; construct a target feature value model according to those feature values; and add the target feature value model to the database in correspondence with the alert level contained in the add instruction.
  • By applying this embodiment, a database containing the correspondence between feature value models and alert levels is established in advance; the feature values of audio information are acquired and matched against the feature value models in the database, so that the alert level corresponding to the audio information is determined; when the alert level satisfies a preset condition, the notification information is output. It can thus be seen that this embodiment outputs notification information by analyzing audio information, without having to determine the moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, the notification information can still be output accurately by applying this solution.
  • An embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements any one of the foregoing methods for outputting notification information.
  • An embodiment of the present application further provides executable program code to be run so as to perform any one of the foregoing methods for outputting notification information.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Alarm Systems (AREA)
  • Emergency Alarm Devices (AREA)

Abstract

A method for outputting notification information, a server, and a monitoring system. A database is established in advance, containing the correspondence between feature value models and alert levels; the feature values of audio information are acquired (S102) and matched against the feature value models in the database (S103), so that the alert level corresponding to the audio information is determined (S104); when the alert level satisfies a preset condition (S105), notification information is output (S107). It can thus be seen that notification information is output by analyzing audio information, without having to determine the moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, the notification information can still be output accurately.

Description

Method for outputting notification information, server, and monitoring system
This application claims priority to Chinese patent application No. 201710436582.1, filed with the Chinese Patent Office on June 12, 2017 and entitled "Method for outputting notification information, server, and monitoring system", the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of multimedia information processing technologies, and in particular to a method for outputting notification information, a server, and a monitoring system.
Background
During video surveillance, it is usually necessary to output notification information for abnormal events so as to remind relevant personnel to handle them in time. For example, if a robbery occurs in the captured video images, notification information needs to be output for the robbery event. Likewise, during video surveillance of a checkout counter in a shopping mall or supermarket, notification information can be output if a property dispute arises, and so on.
At present, solutions for outputting notification information generally include: analyzing the video images collected by a video capture device, for example determining the moving targets in the video images and their motion trajectories; judging, according to the analysis result, whether an abnormal event occurs in the video images; and if so, outputting notification information.
However, when notification information is output in the above manner, if there are many moving targets in the video images and their trajectories are chaotic, it is difficult to judge accurately for each moving target whether an abnormal event has occurred, so the accuracy of the output notification information is low.
Summary
The purpose of the embodiments of the present application is to provide a method for outputting notification information, a server, and a monitoring system, so as to improve the accuracy of the output notification information.
To achieve the above purpose, an embodiment of the present application discloses a method for outputting notification information, including:
acquiring audio information;
performing feature value extraction on the audio information;
matching the extracted feature values against feature value models in a preset database, the database storing the correspondence between feature value models and alert levels;
determining, according to the matching result, the alert level corresponding to the audio information;
judging whether the alert level satisfies a preset condition, and if so, determining the notification information corresponding to the audio information;
outputting the determined notification information.
Optionally, the feature value models include a scene sound model; the scene sound model is a feature value model established for a preset scene sound;
matching the extracted feature values against the feature value models in the preset database may include:
matching the extracted feature values against the scene sound model.
Optionally, after the audio information is acquired, the method may further include:
judging whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound;
if so, first decomposing the multi-type audio information into at least one piece of single-type audio information, each piece of single-type audio information containing one type of sound, and then performing the step of feature value extraction on the audio information;
if not, performing the step of feature value extraction on the audio information;
performing feature value extraction on the audio information includes:
performing feature value extraction on each piece of single-type audio information;
matching the extracted feature values against the feature value models in the preset database includes:
for each piece of single-type audio information, matching the feature values extracted from that piece against the feature value models in the preset database;
determining, according to the matching result, the alert level corresponding to the audio information includes:
if the audio information is single-type audio information:
determining, according to the matching result, the alert level corresponding to the single-type audio information;
if the audio information is multi-type audio information:
obtaining the matching result corresponding to each piece of single-type audio information contained in the multi-type audio information;
determining the weight corresponding to each piece of single-type audio information;
determining, according to the determined weights and the matching results, the alert level corresponding to the multi-type audio information.
Optionally, decomposing the multi-type audio information into at least one piece of single-type audio information may include:
segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule;
for each audio segment, judging whether the audio segment contains multiple sound types;
if not, treating the audio segment as one piece of single-type audio information;
if so, decomposing the audio segment into at least one piece of single-type audio information according to the sound parameters in the audio segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
Optionally, when the audio information is judged to be multi-type audio information, the method may further include:
matching the multi-type audio information against at least one preset scene sound model;
determining, according to the matching result, each scene sound contained in the multi-type audio information;
determining the alert level and weight corresponding to each scene sound;
decomposing the multi-type audio information into at least one piece of single-type audio information may include:
determining the voice information contained in the multi-type audio information;
determining, according to the timbre of the voice information, each piece of single-type audio information corresponding to the voice information;
determining, according to the determined weights and the matching results, the alert level corresponding to the multi-type audio information may include:
determining the alert level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information, and the alert level and weight corresponding to each scene sound.
Optionally, determining the notification information corresponding to the audio information may include:
acquiring the video images and/or geographic location information corresponding to the audio information;
determining the video images and/or geographic location information as the notification information corresponding to the audio information.
Optionally, before outputting the determined notification information, the method may further include:
prompting the user whether to output the notification information;
judging whether rejection information sent by the user is received within a preset time period;
if not, performing the step of outputting the determined notification information.
Optionally, the process of constructing the database may include:
acquiring simulated audio information of an abnormal event;
performing feature value extraction on the simulated audio information;
constructing a feature value model according to the extracted feature values;
storing the constructed feature value model into the database in correspondence with the alert level set by the user.
Optionally, the method may further include:
receiving an add instruction sent by the user;
extracting the feature values of the target audio information corresponding to the add instruction;
constructing a target feature value model according to the feature values of the target audio information;
adding the target feature value model to the database in correspondence with the alert level contained in the add instruction.
为达到上述目的,本申请实施例还公开了一种服务器,包括:处理器和存储器,其中,存储器用于存储可执行程序代码,处理器通过读取存储器中存储的可执行程序代码来运行与可执行程序代码对应的程序,以用于执行以下步骤:
获取音频信息;
对所述音频信息进行特征值提取;
将所提取的特征值与预设数据库中的特征值模型进行匹配,所述数据库中存储有特征值模型与预警级别的对应关系;
根据匹配结果,确定所述音频信息对应的预警级别;
判断所述预警级别是否满足预设条件,如果是,确定所述音频信息对应的通知信息;
输出所确定的通知信息。
可选的,所述特征值模型包含场景声音模型;所述场景声音模型为针对预设场景声音建立的特征值模型;所述处理器还用于执行如下步骤:
将所提取的特征值与所述场景声音模型进行匹配。
可选的,所述处理器还用于执行如下步骤:
在获取音频信息之后,判断所述音频信息是否为多类型音频信息,所述多类型音频信息中包含多种类型的声音;
如果是,先将所述多类型音频信息分解为至少一个单类型音频信息,所述单类型音频信息中包含一种类型的声音;再对每个单类型音频信息进行特征值提取;
如果否,直接对单类型音频信息进行特征值提取;
针对每个单类型音频信息,将从所述单类型音频信息提取的特征值与所述预设数据库中的特征值模型进行匹配;
若所述音频信息为单类型音频信息:
根据匹配结果,确定所述单类型音频信息对应的预警级别;
若所述音频信息为多类型音频信息:
获得所述多类型音频信息中包含的每个单类型音频信息对应的匹配结果;
确定所述每个单类型音频信息对应的权重;
根据所确定的权重及所述匹配结果,确定所述多类型音频信息对应的预警级别。
Optionally, the processor is further configured to perform the following steps:
segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule;
for each audio segment, determining whether the audio segment contains multiple sound types;
if not, taking the audio segment as one piece of single-type audio information;
if so, decomposing the audio segment into at least one piece of single-type audio information according to the sound parameters in the audio segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
Optionally, the processor is further configured to perform the following steps:
when the audio information is determined to be multi-type audio information, matching the multi-type audio information against at least one preset scene sound model;
determining, according to the matching result, each scene sound contained in the multi-type audio information;
determining the warning level and weight corresponding to each scene sound;
determining the speech information contained in the multi-type audio information;
determining, according to the timbre of the speech information, each piece of single-type audio information corresponding to the speech information;
determining the warning level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information as well as the warning level and weight corresponding to each scene sound.
Optionally, the processor is further configured to perform the following steps:
acquiring the video image and/or geographic location information corresponding to the audio information;
determining the video image and/or geographic location information as the notification information corresponding to the audio information.
Optionally, the processor is further configured to perform the following steps:
before outputting the determined notification information, prompting a user as to whether to output the notification information;
determining whether rejection information sent by the user is received within a preset time period;
if not, performing the step of outputting the determined notification information.
Optionally, the processor is further configured to perform the following steps:
acquiring simulated audio information of an abnormal event;
performing feature value extraction on the simulated audio information;
constructing a feature value model according to the extracted feature values;
storing the constructed feature value model in the database in correspondence with a user-defined warning level.
Optionally, the processor is further configured to perform the following steps:
receiving an add instruction sent by a user;
extracting feature values of target audio information corresponding to the add instruction;
constructing a target feature value model according to the feature values of the target audio information;
adding the target feature value model to the database in correspondence with the warning level contained in the add instruction.
To achieve the above objective, an embodiment of this application further discloses a monitoring system, including a server,
the server being configured to acquire audio information; perform feature value extraction on the audio information; match the extracted feature values against feature value models in a preset database, the database storing correspondences between feature value models and warning levels; determine, according to the matching result, the warning level corresponding to the audio information; determine whether the warning level satisfies a preset condition, and if so, determine the notification information corresponding to the audio information; and output the determined notification information.
Optionally, the system further includes an audio capture device,
the audio capture device being configured to capture audio information and send the captured audio information to the server.
Optionally, the system further includes a video capture device,
the video capture device being configured to capture video images and determine its own geographic location information, and to send the captured video images and the determined geographic location information to the server;
the server being further configured to, in the process of determining the notification information corresponding to the audio information, determine the video image and geographic location information corresponding to the audio information and add the video image and geographic location information to the notification information.
Optionally, the server includes a communication server and a database server, where
the database server is configured to acquire simulated audio information of an abnormal event; perform feature value extraction on the simulated audio information; construct a feature value model according to the extracted feature values; and store the constructed feature value model in a database of the database server in correspondence with a user-defined warning level;
the communication server is configured to acquire audio information; perform feature value extraction on the audio information; match the extracted feature values against the feature value models in the database of the database server, the database storing correspondences between feature value models and warning levels; determine, according to the matching result, the warning level corresponding to the audio information; determine whether the warning level satisfies a preset condition, and if so, determine the notification information corresponding to the audio information; and output the determined notification information.
To achieve the above objective, an embodiment of this application further discloses a computer-readable storage medium having a computer program stored therein, where the computer program, when executed by a processor, implements any of the above notification information output methods.
To achieve the above objective, an embodiment of this application further discloses executable program code, the executable program code being configured to be run to perform any of the above notification information output methods.
By applying the embodiments of this application, a database is pre-established, the database containing correspondences between feature value models and warning levels; feature values of audio information are acquired and matched against the feature value models in the database, so as to determine the warning level corresponding to the audio information; when the warning level satisfies a preset condition, notification information is output. It can thus be seen that the embodiments of this application output notification information by analyzing audio information, without the need to identify moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, this solution can still output notification information accurately.
Of course, implementing any product or method of this application does not necessarily require achieving all of the above advantages at the same time.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of this application and of the prior art more clearly, the drawings required for the embodiments and the prior art are briefly introduced below. Apparently, the drawings described below are merely some embodiments of this application, and those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a first schematic flowchart of the notification information output method according to an embodiment of this application;
Fig. 2 is a second schematic flowchart of the notification information output method according to an embodiment of this application;
Fig. 3 is a third schematic flowchart of the notification information output method according to an embodiment of this application;
Fig. 4 is a schematic structural diagram of a server according to an embodiment of this application;
Fig. 5 is a first schematic structural diagram of a monitoring system according to an embodiment of this application;
Fig. 6 is a second schematic structural diagram of a monitoring system according to an embodiment of this application;
Fig. 7 is a third schematic structural diagram of a monitoring system according to an embodiment of this application.
Detailed Description
To make the objectives, technical solutions and advantages of this application clearer, this application is described in further detail below with reference to the drawings and embodiments. Apparently, the described embodiments are merely some rather than all of the embodiments of this application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
To solve the above technical problem, embodiments of this application provide a notification information output method, a server and a monitoring system. The method may be applied to a server in a monitoring system, or to various electronic devices, which is not specifically limited.
The notification information output method provided by an embodiment of this application is first described in detail below.
Fig. 1 is a schematic flowchart of a notification information output method according to an embodiment of this application, including:
S101: acquire audio information.
In one implementation, the device executing this solution (hereinafter "the device") may have an audio capture function, and what is acquired in S101 may be audio information captured by the device itself.
In another implementation, the device may be communicatively connected to an audio capture device and acquire the audio information from that audio capture device.
In this embodiment, the solution may be executed once every preset time interval, i.e., audio information is acquired once every preset time interval; alternatively, the solution may be executed after a trigger instruction from a user is received, which is not specifically limited.
S102: perform feature value extraction on the audio information.
In one implementation, the acquired audio information may first be filtered, denoised, etc., before the feature values are extracted.
For example, the extracted feature values may include one or more of the following types:
speech rate, semantic information, volume zero-crossing rate, maximum volume, minimum volume, average volume, maximum volume change rate, minimum volume change rate, average volume change rate, maximum sound frequency, minimum sound frequency, average sound frequency, maximum sound frequency change rate, minimum sound frequency change rate, average sound frequency change rate, audio curve vector, volume curve vector, and so on.
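Purely by way of illustration (this sketch is not part of the application text; the function name, the 10 ms framing and the dBFS volume measure are assumptions), a few of the feature values listed above might be computed as follows in Python:

```python
import numpy as np

def extract_features(samples: np.ndarray, sample_rate: int,
                     transcript: str, duration_min: float) -> dict:
    """Illustrative extraction of a few of the feature values listed above.

    `samples` is a mono waveform scaled to [-1, 1]; `transcript` is assumed
    to come from a separate speech-recognition step (not specified here).
    """
    # Volume features: a simple per-frame dBFS estimate over 10 ms frames.
    frame_len = max(sample_rate // 100, 1)
    n_frames = len(samples) // frame_len
    frames = samples[:n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    volume_db = 20 * np.log10(rms + 1e-12)

    # Zero-crossing rate over the whole clip.
    zcr = float(np.mean(np.abs(np.diff(np.sign(samples))) > 0))

    return {
        "volume_avg": float(np.mean(volume_db)),
        "volume_max": float(np.max(volume_db)),
        "volume_min": float(np.min(volume_db)),
        "zero_crossing_rate": zcr,
        # Speech rate in characters/minute, matching the examples below.
        "speech_rate": len(transcript) / duration_min,
        "semantics": transcript,
    }
```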
S103: match the extracted feature values against the feature value models in a preset database.
In this embodiment, the database is constructed before this solution is executed. The database stores correspondences between feature value models and warning levels, and a feature value model may be a set of multiple feature values.
In this embodiment, the types of feature values contained in a feature value model are consistent with the types of feature values extracted in S102; only then can a good matching result be obtained.
For example, suppose the warning levels are divided into three levels, with level three being the highest. In the database, the feature value model corresponding to warning level one may be: speech rate 200 characters/minute, average volume 70 dB, semantic information "be careful". The feature value model corresponding to warning level two may be: speech rate 300 characters/minute, average volume 80 dB, semantic information "somebody come". The feature value model corresponding to warning level three may be: speech rate 400 characters/minute, average volume 90 dB, semantic information "help".
It should be noted that each warning level may correspond to multiple feature value models; for simplicity of description, only the above models are taken as examples here.
S104: determine, according to the matching result, the warning level corresponding to the audio information.
Suppose the feature values extracted in S102 include: speech rate 300 characters/minute, average volume 80 dB, semantic information "somebody come". Matching these feature values against the feature value models in the above database hits warning level two, so the warning level corresponding to the audio information acquired in S101 is determined to be level two.
It should be noted that when the extracted feature values are matched against the feature value models in the database, the criterion for a successful match may be set according to the actual situation; for example, a match may be deemed successful when the matching rate is higher than a preset value. The matching result may include information indicating a successful match with a certain feature value model, a failed match with a certain feature value model, or other information, which is not specifically limited.
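A minimal sketch of one possible matching rule (the relative tolerance for numeric features, the exact matching of semantic information and the 0.8 threshold are all illustrative assumptions; the application leaves the criterion open):

```python
def match_rate(features: dict, model: dict, tolerance: float = 0.1) -> float:
    """Fraction of the model's feature values that the extracted features hit."""
    hits = 0
    for key, expected in model.items():
        actual = features.get(key)
        if isinstance(expected, str):
            hits += int(actual == expected)          # e.g. semantic information
        elif actual is not None:
            hits += int(abs(actual - expected) <= tolerance * abs(expected))
    return hits / len(model) if model else 0.0

def warning_level(features: dict, database: dict, threshold: float = 0.8):
    """Return the warning level of the best-matching model if its match
    rate exceeds the preset threshold (S103/S104), otherwise None."""
    name, entry = max(database.items(),
                      key=lambda kv: match_rate(features, kv[1]["model"]))
    if match_rate(features, entry["model"]) >= threshold:
        return entry["level"]
    return None
```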
As an optional implementation, the feature value models stored in the preset database may include scene sound models, a scene sound model being a feature value model established for a preset scene sound. Scene sounds may include gunshots, crying, sirens, etc., which are not specifically limited. It can be understood that when chaos breaks out in scenes such as shopping malls, supermarkets and banks, it is usually accompanied by gunshots, sirens and crying; in this embodiment, such sounds are referred to as scene sounds.
Specifically, models of these scene sounds may be trained in advance using a machine learning algorithm to obtain the scene sound models. It can be understood that when such scene sounds are present, the probability that an abnormal event is occurring is relatively high; therefore, the warning levels corresponding to the scene sound models may be set relatively high.
The feature values extracted in S102 are matched against these scene sound models, and the warning level corresponding to a successfully matched scene sound is determined as the warning level of the audio information.
S105: determine whether the warning level satisfies a preset condition; if so, perform S106.
The warning level in this step refers to the warning level corresponding to the audio information determined in S104.
S106: determine the notification information corresponding to the audio information.
Continuing the above example, suppose the preset condition is a warning level above level one; the condition is then satisfied, and the notification information corresponding to the audio information acquired in S101 is determined.
In one implementation, S106 may include: acquiring the video image and/or geographic location information corresponding to the audio information; and determining the video image and/or geographic location information as the notification information corresponding to the audio information.
It can be understood that the device may have video capture and positioning functions, in which case it can acquire video images captured by itself and geographic location information determined by itself; alternatively, the device may be communicatively connected to other devices and acquire the video image and/or geographic location information corresponding to the audio information from those devices, which is not specifically limited.
The video image corresponding to the audio information refers to a video image captured at the same moment and for the same scene as the audio information; the geographic location information corresponding to the audio information refers to the geographic location of the device that captured the audio information.
If the device acquires the video image and/or geographic location information corresponding to the audio information from another device, that other device performs audio or video capture for the same scene as the device that captured the audio information.
S107: output the determined notification information.
In the above implementation, the notification information contains the video image and/or geographic location information, so that the abnormal event can be reported more accurately to the relevant personnel for handling.
In one implementation, before S107 is performed, the user may be prompted as to whether to output the notification information; it is determined whether rejection information sent by the user is received within a preset time period; if not, S107 is then performed.
In this implementation, the prompt information may include one or more of the following: the warning level corresponding to the audio information, the video image, the geographic location information, or others, which is not specifically limited. The prompt information may be presented to the user in various forms, such as a pop-up window or a flashing reminder, which is not specifically limited.
It can be understood that, in response to the prompt, the user may confirm the output, reject the output, or make no choice; if confirmation information sent by the user is received (the user confirms the output), or no feedback from the user is received within the preset time period (the user makes no choice), S107 is performed; if rejection information sent by the user is received (the user rejects the output), the notification information is not output.
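A minimal sketch of this confirm/reject/timeout logic (the prompt_ui object, the event queue and the 30-second period are assumptions; the application leaves the preset time period and presentation form open):

```python
import queue

def confirm_output(prompt_ui, user_events: "queue.Queue[str]",
                   timeout_s: float = 30.0) -> bool:
    """Prompt the user and decide whether the notification is output.

    Output goes ahead on explicit confirmation or on timeout (no choice);
    it is suppressed only on explicit rejection.
    """
    prompt_ui.show("Output the notification information?")  # pop-up, flashing reminder, etc.
    try:
        answer = user_events.get(timeout=timeout_s)
    except queue.Empty:
        return True               # no feedback within the preset period
    return answer != "reject"     # "confirm" -> True, "reject" -> False
```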
In one implementation, the process of constructing the database may include:
acquiring simulated audio information of an abnormal event; performing feature value extraction on the simulated audio information; constructing a feature value model according to the extracted feature values; and storing the constructed feature value model in the database in correspondence with a user-defined warning level.
An abnormal event may be understood as a robbery, a dispute over property, etc., which is not specifically limited.
Those skilled in the art can understand that the database may be constructed according to actual needs. For example, simulated audio information of a robbery may be recorded and its feature values extracted. Suppose the extracted feature values include: speech rate 400 characters/minute, average volume 90 dB, semantic information "help". A feature value model is constructed from the extracted feature values; the feature value model may be the set of the above feature values. The feature value model is then stored in correspondence with a user-defined warning level. In this way, the database stores correspondences between the feature value models and the warning levels.
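A sketch of this construction step, reusing extract_features from the earlier sketch (the tuple layout of `recordings` and the in-memory dict standing in for the database are assumptions):

```python
def build_database(recordings: list) -> dict:
    """Build the feature-value-model database from simulated recordings.

    Each entry of `recordings` is a (name, waveform, sample_rate,
    transcript, duration_min, user_level) tuple recorded for a simulated
    abnormal event.
    """
    database = {}
    for name, wav, sr, transcript, duration_min, user_level in recordings:
        model = extract_features(wav, sr, transcript, duration_min)
        database[name] = {"model": model, "level": user_level}
    return database
```

The update path described next could reuse the same code: extract the target audio information's feature values and insert them with the warning level carried in the add instruction.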
In one implementation, the constructed database may be updated:
receiving an add instruction sent by a user; extracting feature values of target audio information corresponding to the add instruction; constructing a target feature value model according to the feature values of the target audio information; and adding the target feature value model to the database in correspondence with the warning level contained in the add instruction.
It can be understood that if the user considers a piece of audio information to meet expectations (for ease of description, such audio information is referred to as target audio information), the user may send an add instruction to the device, the add instruction containing an identifier of the target audio information and the warning level set by the user for that target audio information. After receiving the add instruction, the device determines the target audio information from the identifier in the instruction and extracts its feature values; a target feature value model is constructed from the extracted feature values and added to the database in correspondence with the warning level contained in the add instruction.
It can be seen that the above implementation updates the database; further, matching the feature values of acquired audio information against the feature value models in the updated database can improve the matching accuracy.
By applying the embodiment shown in Fig. 1 of this application, a database is pre-established, the database containing correspondences between feature value models and warning levels; feature values of audio information are acquired and matched against the feature value models in the database, so as to determine the warning level corresponding to the audio information; when the warning level satisfies a preset condition, notification information is output. It can thus be seen that the embodiments of this application output notification information by analyzing audio information, without the need to identify moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, this solution can still output notification information accurately.
Fig. 2 is a second schematic flowchart of the notification information output method according to an embodiment of this application, including:
S201: acquire audio information.
S202: determine whether the audio information is multi-type audio information; if so, perform S203; if not, proceed directly to S204.
S203: decompose the multi-type audio information into at least one piece of single-type audio information.
It should be noted that multi-type audio information contains multiple types of sound, while single-type audio information contains one type of sound.
It can be understood that this solution may be applied in a single-sound scene; for example, in a home scene, the captured audio information may contain only one person's speech, and such audio information is the above single-type audio information.
Alternatively, this solution may be applied in a multi-type sound scene; for example, in scenes such as supermarkets, shopping malls and banks, the captured audio information contains the speech of multiple people, and such audio information is the above multi-type audio information.
Alternatively, in scenes such as supermarkets, shopping malls and banks, the captured audio information may contain one person's speech together with ambient sound; such audio information is also multi-type audio information.
Alternatively, in scenes such as supermarkets, shopping malls and banks, the captured audio information may contain the speech of multiple people together with ambient sound; such audio information is also multi-type audio information.
If the audio information acquired in S201 is multi-type audio information, the multi-type audio information may first be decomposed into single-type audio information before the subsequent steps are performed.
In one implementation, S203 may include: segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule; for each audio segment, determining whether the audio segment contains multiple sound types; if not, taking the audio segment as one piece of single-type audio information; if so, decomposing the audio segment into at least one piece of single-type audio information according to the sound parameters in the audio segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
There may be various preset segmentation rules. For example, the multi-type audio information may be segmented into multiple audio segments of equal duration, or into multiple audio segments of equal size; alternatively, the number of audio segments may be determined from the total duration of the multi-type audio information and the segmentation performed according to that number, or the number of audio segments may be determined from the total size of the multi-type audio information and the segmentation performed according to that number, and so on; the specific segmentation rule is not limited.
For example, the multi-type audio information may be segmented into audio segments of 1 second each; if the total duration of the multi-type audio information is 1 minute, 60 audio segments are obtained.
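A minimal sketch of the equal-duration rule, assuming 1-second segments and keeping any trailing partial segment (the application fixes neither choice):

```python
import numpy as np

def split_fixed_duration(samples: np.ndarray, sample_rate: int,
                         segment_s: float = 1.0) -> list:
    """Segment audio into fixed-duration pieces; a 1-minute clip at
    segment_s=1.0 yields 60 segments."""
    seg_len = int(segment_s * sample_rate)
    return [samples[i:i + seg_len] for i in range(0, len(samples), seg_len)]
```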
For each audio segment, it is determined whether it contains multiple sound types. For example, suppose the multi-type audio information is a one-minute conversation between person A and person B in which their speech does not overlap. If the first 30 audio segments obtained by segmentation contain only person A's speech and the last 30 contain only person B's speech, then each of the 60 audio segments contains only one sound type, and all of them are single-type audio information.
This example is fairly ideal, with each audio segment happening to contain only one person's speech; in practice, multiple sound types may appear within one audio segment. Again suppose the multi-type audio information is a one-minute conversation between person A and person B, but some of the segments obtained by segmentation contain only one person's speech while others contain both. A segment containing one person's speech is taken as single-type audio information, while a segment containing both persons' speech is further decomposed according to the sound parameters in the segment.
As another example, in some noisy scenes, the speech of multiple people occurs at the same moment, mixed with sirens and crying. Multi-type audio information captured in such scenes is segmented into multiple audio segments. Since multiple types of sound exist at the same moment, the audio segment corresponding to that moment contains multiple sound types; such a segment is further decomposed according to the sound parameters in it.
The sound parameters may include one or more of the following: pitch, loudness, timbre. Those skilled in the art can understand that different sounds can be separated out using sound parameters such as pitch, loudness and timbre; therefore, an audio segment containing multiple sound types can be further decomposed into individual pieces of single-type audio information.
In the embodiment shown in Fig. 2, S204 corresponds to S102 in Fig. 1 and S205 corresponds to S103 in Fig. 1, but in Fig. 2 the feature value extraction and matching steps are performed on each piece of single-type audio information; therefore:
S204: perform feature value extraction on each piece of single-type audio information.
S205: for each piece of single-type audio information, match the feature values extracted from the single-type audio information against the feature value models in the preset database.
In Fig. 2, S206 corresponds to S104 in Fig. 1. S206 is:
if (the audio information acquired in S201) is single-type audio information, determining, according to the matching result, the warning level corresponding to the single-type audio information;
if (the audio information acquired in S201) is multi-type audio information, obtaining the matching result corresponding to each piece of single-type audio information contained in the multi-type audio information; determining the weight corresponding to each piece of single-type audio information; and determining, according to the determined weights and the matching results, the warning level corresponding to the multi-type audio information.
It can be understood that if the audio information is multi-type audio information, then after S205 is performed each piece of single-type audio information contained in it corresponds to a matching result, and the weight corresponding to each piece of single-type audio information can be determined. There are various specific ways to do so, for example, determining the weights according to the order in which the pieces of single-type audio information were obtained by decomposition, or according to the average volume of each piece, and so on, which are not specifically limited.
For example, suppose the multi-type audio information acquired in S201 contains a siren, crying and the speech of multiple people. Decomposing the multi-type audio information yields four pieces of single-type audio information: "siren", "crying", "person A's speech" and "person B's speech".
Suppose "siren" is successfully matched with the feature value model corresponding to warning level two in the database, i.e., the warning level determined from the matching result for "siren" is level two. Further suppose the warning level determined from the matching result for "crying" is level three, the warning level determined from the matching result for "person A's speech" is level three, and the warning level determined from the matching result for "person B's speech" is level one.
Suppose the weight corresponding to "siren" is 0.7, the weight corresponding to "crying" is 0.9, the weight corresponding to "person A's speech" is 0.8, and the weight corresponding to "person B's speech" is 0.6. The warning level corresponding to the multi-type audio information is then determined as (0.7*2 + 0.9*3 + 0.8*3 + 0.6*1) / 4 = 1.775. This warning level may be regarded as between level one and level two, or may simply be taken as approximately level two, which is not specifically limited.
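This calculation can be written down directly; the sketch below reproduces the worked example, and note that, following the example, the weighted sum is divided by the number of single-type audio pieces rather than by the sum of the weights:

```python
def combined_warning_level(parts: list) -> float:
    """Weighted warning level for multi-type audio information.

    `parts` holds (weight, level) pairs, one per piece of single-type
    audio information.
    """
    return sum(w * lvl for w, lvl in parts) / len(parts)

# The example from the text: siren (0.7, level 2), crying (0.9, level 3),
# person A's speech (0.8, level 3), person B's speech (0.6, level 1).
assert abs(combined_warning_level([(0.7, 2), (0.9, 3), (0.8, 3), (0.6, 1)])
           - 1.775) < 1e-9
```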
Alternatively, the weights and warning levels of scene sounds such as "siren" and "crying" may be set relatively high. In one implementation, if decomposing the multi-type audio information acquired in S201 yields single-type audio information corresponding to scene sounds as well as other single-type audio information (e.g., people's speech), it is also reasonable to calculate the warning level of the multi-type audio information by considering only the warning levels and weights corresponding to the scene sounds.
In Fig. 2, S207 is the same as S105 in Fig. 1, S208 is the same as S106 in Fig. 1, and S209 is the same as S107 in Fig. 1.
If the determined warning level satisfies the preset condition, the notification information corresponding to the multi-type audio information is determined; the subsequent steps are similar to those of the embodiment of Fig. 1 and are not repeated here.
By applying the embodiment shown in Fig. 2 of this application, in a multi-type sound scene, multi-type audio information is acquired and decomposed into single-type audio information, and the single-type audio information is then analyzed to output notification information, further improving the accuracy of outputting notification information.
Fig. 3 is a third schematic flowchart of the notification information output method according to an embodiment of this application, including:
S301: acquire audio information.
S302: determine whether the audio information is multi-type audio information; if so, perform S303; if not, proceed directly to S308.
S303: match the multi-type audio information against at least one preset scene sound model.
S304: determine, according to the matching result, each scene sound contained in the multi-type audio information.
S305: determine the warning level and weight corresponding to each scene sound.
The scene sound models may include a gunshot model, a siren model, a crying model, etc., which are not specifically limited. It can be understood that when chaos breaks out in scenes such as shopping malls, supermarkets and banks, it is usually accompanied by gunshots, sirens and crying; in the embodiment of Fig. 3, such sounds are referred to as scene sounds.
Models of these scene sounds may be trained in advance using a machine learning algorithm to obtain the scene sound models. Before the multi-type audio information is decomposed, it may first be matched against these scene sound models.
For example, suppose the multi-type audio information acquired in S301 contains a siren, crying and the speech of multiple people. The multi-type audio information is first matched against the preset scene sound models; suppose the matching result is a successful match with the siren model and the crying model, i.e., it is determined that the multi-type audio information contains a siren and crying.
In one implementation, the warning level and weight corresponding to each scene sound may be set in advance, and the set warning levels and weights may be stored in correspondence with the above scene sound models; in this way, the warning level and weight corresponding to each scene sound (the siren and the crying) can be determined directly from the matching result in S303.
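Such a preset table might look as follows (the specific sounds, levels and weights are illustrative assumptions, not values taken from the application):

```python
# Preset warning level and weight stored alongside each scene sound model.
SCENE_SOUNDS = {
    "gunshot": {"level": 3, "weight": 1.0},
    "siren":   {"level": 2, "weight": 0.7},
    "crying":  {"level": 3, "weight": 0.9},
}
```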
In another implementation, S305 may include: extracting each scene sound from the multi-type audio information; and for each extracted scene sound, performing feature value extraction on the scene sound, matching the extracted feature values against the feature value models in the preset database, and determining the warning level corresponding to the successfully matched feature value model as the warning level of the scene sound.
Continuing the above example, the multi-type audio information contains a siren and crying, which may be extracted separately according to pitch, loudness, timbre or other sound parameters. In this implementation, the scene sounds are also processed as single-type audio information; specifically, feature value extraction and matching are performed on the siren and the crying, and the specific process is similar to S204 and S205 in the embodiment of Fig. 2 and is not repeated here.
The database in this implementation and the database in the embodiment of Fig. 1 may be the same database or different databases, which is not specifically limited.
In the embodiment shown in Fig. 3, the scene sounds and the speech information in the multi-type audio information are processed separately. The scene sounds may be processed before the speech information or vice versa; that is, S303-S305 may be performed before S306-S309, or S306-S309 may be performed before S303-S305; the specific order is not limited.
The speech information in this embodiment refers to semantically meaningful speech uttered by a person, and does not include sounds without semantics, such as the above crying.
S306: determine the speech information contained in the multi-type audio information.
S307: determine, according to the timbre of the speech information, each piece of single-type audio information corresponding to the speech information.
Those skilled in the art can understand that speech uttered by a person can be extracted through timbre, or through other methods, which are not specifically limited.
Feature value extraction (S308) and feature value matching (S309) are then performed on each piece of single-type audio information corresponding to the speech information. In Fig. 3, S308 corresponds to S204 in Fig. 2 and S309 corresponds to S205 in Fig. 2; the specific process is not repeated here.
S310: if the audio information acquired in S301 is single-type audio information, determine, according to the matching result, the warning level corresponding to the single-type audio information;
if the audio information acquired in S301 is multi-type audio information, obtain the matching result corresponding to each piece of single-type audio information contained in the multi-type audio information; determine the weight corresponding to each piece of single-type audio information; and determine the warning level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information as well as the warning level and weight corresponding to each scene sound.
It can be understood that if the audio information is multi-type audio information, then after S309 is performed each piece of single-type audio information contained in it corresponds to a matching result, and the weight corresponding to each piece of single-type audio information can be determined. There are various specific ways to do so, for example, determining the weights according to the order in which the pieces of single-type audio information were obtained by decomposition, assigning the weights randomly, or determining the weights according to the average volume of each piece, and so on, which are not specifically limited.
The warning level corresponding to the multi-type audio information is determined by jointly considering the warning levels and weights determined in S305 and the matching results and weights corresponding to the above single-type audio information; that is, by jointly considering the warning level and weight corresponding to each scene sound and the warning level and weight corresponding to each piece of speech information.
Suppose the multi-type audio information acquired in S301 contains two scene sounds, a siren and crying, as well as the speech of person A and person B. The multi-type audio information is first matched against the scene sound models, determining that it contains a "siren" and "crying"; it is then determined that the speech information contained in the multi-type audio information corresponds to two pieces of single-type audio information, "person A's speech" and "person B's speech".
Alternatively, it may first be determined that the speech information contained in the multi-type audio information corresponds to the two pieces of single-type audio information "person A's speech" and "person B's speech", and the multi-type audio information may then be matched against the scene sound models, determining that it contains a "siren" and "crying".
Suppose it is determined through S305 that the warning level corresponding to "siren" is level two with a weight of 0.7, and that the warning level corresponding to "crying" is level three with a weight of 0.9; and it is determined through S306-S309 that the warning level corresponding to "person A's speech" is level three with a weight of 0.8, and that the warning level corresponding to "person B's speech" is level one with a weight of 0.6.
S310 is performed, determining the warning level corresponding to the multi-type audio information as (0.7*2 + 0.9*3 + 0.8*3 + 0.6*1) / 4 = 1.775. This warning level may be regarded as between level one and level two, or may simply be taken as approximately level two, which is not specifically limited.
In Fig. 3, S311 is the same as S105 in Fig. 1, S312 is the same as S106 in Fig. 1, and S313 is the same as S107 in Fig. 1.
If the determined warning level satisfies the preset condition, the notification information corresponding to the multi-type audio information is determined; the subsequent steps are similar to those of the embodiment of Fig. 1 and are not repeated here.
By applying the embodiment shown in Fig. 3 of this application, in a multi-type sound scene, multi-type audio information is acquired, and the scene sounds and the speech information in the multi-type audio information are processed separately, so that they can be handled differently according to their differences.
Corresponding to the above method embodiments, an embodiment of this application further provides a server.
Fig. 4 is a schematic structural diagram of a server according to an embodiment of this application, including a processor 401 and a memory 402, where the memory 402 is configured to store executable program code, and the processor 401 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 402, so as to perform the following steps:
acquiring audio information;
performing feature value extraction on the audio information;
matching the extracted feature values against feature value models in a preset database, where the database stores correspondences between feature value models and warning levels;
determining, according to the matching result, the warning level corresponding to the audio information;
determining whether the warning level satisfies a preset condition, and if so, determining the notification information corresponding to the audio information;
outputting the determined notification information.
As an implementation, the feature value models include scene sound models; a scene sound model is a feature value model established for a preset scene sound; the processor 401 may be further configured to perform the following step:
matching the extracted feature values against the scene sound models.
As an implementation, the processor 401 may be further configured to perform the following steps:
after acquiring the audio information, determining whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound;
if so, first decomposing the multi-type audio information into at least one piece of single-type audio information, each piece of single-type audio information containing one type of sound, and then performing feature value extraction on each piece of single-type audio information;
if not, performing feature value extraction directly on the single-type audio information;
for each piece of single-type audio information, matching the feature values extracted from the single-type audio information against the feature value models in the preset database;
if the audio information is single-type audio information:
determining, according to the matching result, the warning level corresponding to the single-type audio information;
if the audio information is multi-type audio information:
obtaining the matching result corresponding to each piece of single-type audio information contained in the multi-type audio information;
determining the weight corresponding to each piece of single-type audio information;
determining, according to the determined weights and the matching results, the warning level corresponding to the multi-type audio information.
As an implementation, the processor 401 may be further configured to perform the following steps:
segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule;
for each audio segment, determining whether the audio segment contains multiple sound types;
if not, taking the audio segment as one piece of single-type audio information;
if so, decomposing the audio segment into at least one piece of single-type audio information according to the sound parameters in the audio segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
As an implementation, the processor 401 may be further configured to perform the following steps:
when the audio information is determined to be multi-type audio information, matching the multi-type audio information against at least one preset scene sound model;
determining, according to the matching result, each scene sound contained in the multi-type audio information;
determining the warning level and weight corresponding to each scene sound;
determining the speech information contained in the multi-type audio information;
determining, according to the timbre of the speech information, each piece of single-type audio information corresponding to the speech information;
determining the warning level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information as well as the warning level and weight corresponding to each scene sound.
As an implementation, the processor 401 may be further configured to perform the following steps:
extracting each scene sound from the multi-type audio information;
for each extracted scene sound, performing feature value extraction on the scene sound, matching the extracted feature values against the feature value models in the preset database, and determining the warning level corresponding to the successfully matched feature value model as the warning level of the scene sound.
As an implementation, the processor 401 may be further configured to perform the following steps:
acquiring the video image and/or geographic location information corresponding to the audio information;
determining the video image and/or geographic location information as the notification information corresponding to the audio information.
As an implementation, the processor 401 may be further configured to perform the following steps:
before outputting the determined notification information, prompting a user as to whether to output the notification information;
determining whether rejection information sent by the user is received within a preset time period;
if not, performing the step of outputting the determined notification information.
As an implementation, the processor 401 may be further configured to perform the following steps:
acquiring simulated audio information of an abnormal event;
performing feature value extraction on the simulated audio information;
constructing a feature value model according to the extracted feature values;
storing the constructed feature value model in the database in correspondence with a user-defined warning level.
As an implementation, the processor 401 may be further configured to perform the following steps:
receiving an add instruction sent by a user;
extracting feature values of target audio information corresponding to the add instruction;
constructing a target feature value model according to the feature values of the target audio information;
adding the target feature value model to the database in correspondence with the warning level contained in the add instruction.
By applying the embodiment shown in Fig. 4 of this application, a database is pre-established, the database containing correspondences between feature value models and warning levels; feature values of audio information are acquired and matched against the feature value models in the database, so as to determine the warning level corresponding to the audio information; when the warning level satisfies a preset condition, notification information is output. It can thus be seen that the embodiments of this application output notification information by analyzing audio information, without the need to identify moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, this solution can still output notification information accurately.
An embodiment of this application further provides a monitoring system. The monitoring system may include only a server, the server having an audio capture function; or, as shown in Fig. 5, it may include a server and an audio capture device; or, as shown in Fig. 6, it may include a server and a multimedia capture device, the multimedia capture device having audio and video capture functions; or, as shown in Fig. 7, it may include a server, an audio capture device and a video capture device.
In the embodiments shown in Fig. 5, Fig. 6 or Fig. 7, the audio capture device or the multimedia capture device is configured to capture audio information and send the captured audio information to the server.
In the embodiments shown in Fig. 6 or Fig. 7, the video capture device or the multimedia capture device is configured to capture video images and determine its own geographic location information, and to send the captured video images and the determined geographic location information to the server;
the server is further configured to, in the process of determining the notification information corresponding to the audio information, determine the video image and geographic location information corresponding to the audio information and add the video image and geographic location information to the notification information.
In one implementation, the server may include a communication server and a database server, where
the database server is configured to acquire simulated audio information of an abnormal event; perform feature value extraction on the simulated audio information; construct a feature value model according to the extracted feature values; and store the constructed feature value model in a database of the database server in correspondence with a user-defined warning level;
the communication server is configured to acquire audio information; perform feature value extraction on the audio information; match the extracted feature values against the feature value models in the database of the database server, the database storing correspondences between feature value models and warning levels; determine, according to the matching result, the warning level corresponding to the audio information; determine whether the warning level satisfies a preset condition, and if so, determine the notification information corresponding to the audio information; and output the determined notification information.
In the monitoring system provided by the embodiments of this application, the server may be configured to:
acquire audio information; perform feature value extraction on the audio information; match the extracted feature values against feature value models in a preset database, the database storing correspondences between feature value models and warning levels; determine, according to the matching result, the warning level corresponding to the audio information; determine whether the warning level satisfies a preset condition, and if so, determine the notification information corresponding to the audio information; and output the determined notification information.
As an implementation, the feature value models include scene sound models; a scene sound model is a feature value model established for a preset scene sound; the server may be further configured to:
match the extracted feature values against the scene sound models.
As an implementation, the server may be further configured to:
after acquiring the audio information, determine whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound; if so, first decompose the multi-type audio information into at least one piece of single-type audio information, each piece of single-type audio information containing one type of sound, and then perform feature value extraction on each piece of single-type audio information; if not, perform feature value extraction directly on the single-type audio information; for each piece of single-type audio information, match the feature values extracted from the single-type audio information against the feature value models in the preset database; if the audio information is single-type audio information: determine, according to the matching result, the warning level corresponding to the single-type audio information; if the audio information is multi-type audio information: obtain the matching result corresponding to each piece of single-type audio information contained in the multi-type audio information; determine the weight corresponding to each piece of single-type audio information; and determine, according to the determined weights and the matching results, the warning level corresponding to the multi-type audio information.
As an implementation, the server may be further configured to:
segment the multi-type audio information into multiple audio segments according to a preset segmentation rule;
for each audio segment, determine whether the audio segment contains multiple sound types;
if not, take the audio segment as one piece of single-type audio information;
if so, decompose the audio segment into at least one piece of single-type audio information according to the sound parameters in the audio segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
As an implementation, the server may be further configured to:
when the audio information is determined to be multi-type audio information, match the multi-type audio information against at least one preset scene sound model;
determine, according to the matching result, each scene sound contained in the multi-type audio information;
determine the warning level and weight corresponding to each scene sound;
determine the speech information contained in the multi-type audio information;
determine, according to the timbre of the speech information, each piece of single-type audio information corresponding to the speech information;
determine the warning level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information as well as the warning level and weight corresponding to each scene sound.
As an implementation, the server may be further configured to:
extract each scene sound from the multi-type audio information;
for each extracted scene sound, perform feature value extraction on the scene sound, match the extracted feature values against the feature value models in the preset database, and determine the warning level corresponding to the successfully matched feature value model as the warning level of the scene sound.
As an implementation, the server may be further configured to:
acquire the video image and/or geographic location information corresponding to the audio information;
determine the video image and/or geographic location information as the notification information corresponding to the audio information.
As an implementation, the server may be further configured to:
prompt a user as to whether to output the notification information;
determine whether rejection information sent by the user is received within a preset time period;
if not, perform the step of outputting the determined notification information.
As an implementation, the process by which the server constructs the database may include:
acquiring simulated audio information of an abnormal event;
performing feature value extraction on the simulated audio information;
constructing a feature value model according to the extracted feature values;
storing the constructed feature value model in the database in correspondence with a user-defined warning level.
As an implementation, the server may be further configured to:
receive an add instruction sent by a user;
extract feature values of target audio information corresponding to the add instruction;
construct a target feature value model according to the feature values of the target audio information;
add the target feature value model to the database in correspondence with the warning level contained in the add instruction.
By applying the embodiments of this application, a database is pre-established, the database containing correspondences between feature value models and warning levels; feature values of audio information are acquired and matched against the feature value models in the database, so as to determine the warning level corresponding to the audio information; when the warning level satisfies a preset condition, notification information is output. It can thus be seen that the embodiments of this application output notification information by analyzing audio information, without the need to identify moving targets in a video image; even if there are many moving targets in the scene and their trajectories are chaotic, this solution can still output notification information accurately.
An embodiment of this application further provides a computer-readable storage medium having a computer program stored therein, where the computer program, when executed by a processor, implements any of the above notification information output methods.
An embodiment of this application further provides executable program code, the executable program code being configured to be run to perform any of the above notification information output methods.
It should be noted that, in this document, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "contain" or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article or device that includes that element.
The embodiments in this specification are described in a related manner; for identical or similar parts between the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the others. In particular, since the server embodiment shown in Fig. 4 and the monitoring system embodiments shown in Figs. 5-7 are substantially similar to the notification information output method embodiments shown in Figs. 1-3, they are described relatively briefly; for relevant parts, reference may be made to the corresponding description of the method embodiments shown in Figs. 1-3.
The above are merely preferred embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (24)

  1. A notification information output method, comprising:
    acquiring audio information;
    performing feature value extraction on the audio information;
    matching the extracted feature values against feature value models in a preset database, wherein the database stores correspondences between feature value models and warning levels;
    determining, according to a matching result, a warning level corresponding to the audio information;
    determining whether the warning level satisfies a preset condition, and if so, determining notification information corresponding to the audio information;
    outputting the determined notification information.
  2. The method according to claim 1, wherein the feature value models comprise scene sound models, a scene sound model being a feature value model established for a preset scene sound;
    the matching the extracted feature values against feature value models in a preset database comprises:
    matching the extracted feature values against the scene sound models.
  3. The method according to claim 1, further comprising, after the acquiring audio information:
    determining whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound;
    if so, first decomposing the multi-type audio information into at least one piece of single-type audio information, each piece of single-type audio information containing one type of sound, and then performing the step of performing feature value extraction on the audio information;
    if not, performing the step of performing feature value extraction on the audio information;
    wherein the performing feature value extraction on the audio information comprises:
    performing feature value extraction on each piece of single-type audio information;
    the matching the extracted feature values against feature value models in a preset database comprises:
    for each piece of single-type audio information, matching the feature values extracted from the single-type audio information against the feature value models in the preset database;
    the determining, according to a matching result, a warning level corresponding to the audio information comprises:
    if the audio information is single-type audio information:
    determining, according to the matching result, a warning level corresponding to the single-type audio information;
    if the audio information is multi-type audio information:
    obtaining a matching result corresponding to each piece of single-type audio information contained in the multi-type audio information;
    determining a weight corresponding to each piece of single-type audio information;
    determining, according to the determined weights and the matching results, a warning level corresponding to the multi-type audio information.
  4. The method according to claim 3, wherein the decomposing the multi-type audio information into at least one piece of single-type audio information comprises:
    segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule;
    for each audio segment, determining whether the audio segment contains multiple sound types;
    if not, taking the audio segment as one piece of single-type audio information;
    if so, decomposing the audio segment into at least one piece of single-type audio information according to sound parameters in the audio segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
  5. The method according to claim 3, wherein, when the audio information is determined to be multi-type audio information, the method further comprises:
    matching the multi-type audio information against at least one preset scene sound model;
    determining, according to a matching result, each scene sound contained in the multi-type audio information;
    determining a warning level and a weight corresponding to each scene sound;
    the decomposing the multi-type audio information into at least one piece of single-type audio information comprises:
    determining speech information contained in the multi-type audio information;
    determining, according to the timbre of the speech information, each piece of single-type audio information corresponding to the speech information;
    the determining, according to the determined weights and the matching results, a warning level corresponding to the multi-type audio information comprises:
    determining the warning level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information and the warning level and weight corresponding to each scene sound.
  6. The method according to claim 1, wherein the determining notification information corresponding to the audio information comprises:
    acquiring a video image and/or geographic location information corresponding to the audio information;
    determining the video image and/or geographic location information as the notification information corresponding to the audio information.
  7. The method according to claim 1, further comprising, before the outputting the determined notification information:
    prompting a user as to whether to output the notification information;
    determining whether rejection information sent by the user is received within a preset time period;
    if not, performing the step of outputting the determined notification information.
  8. The method according to claim 1, wherein the process of constructing the database comprises:
    acquiring simulated audio information of an abnormal event;
    performing feature value extraction on the simulated audio information;
    constructing a feature value model according to the extracted feature values;
    storing the constructed feature value model in the database in correspondence with a user-defined warning level.
  9. The method according to claim 1, further comprising:
    receiving an add instruction sent by a user;
    extracting feature values of target audio information corresponding to the add instruction;
    constructing a target feature value model according to the feature values of the target audio information;
    adding the target feature value model to the database in correspondence with a warning level contained in the add instruction.
  10. A server, comprising a processor and a memory, wherein the memory is configured to store executable program code, and the processor runs a program corresponding to the executable program code by reading the executable program code stored in the memory, so as to perform the following steps:
    acquiring audio information;
    performing feature value extraction on the audio information;
    matching the extracted feature values against feature value models in a preset database, wherein the database stores correspondences between feature value models and warning levels;
    determining, according to a matching result, a warning level corresponding to the audio information;
    determining whether the warning level satisfies a preset condition, and if so, determining notification information corresponding to the audio information;
    outputting the determined notification information.
  11. The server according to claim 10, wherein the feature value models comprise scene sound models, a scene sound model being a feature value model established for a preset scene sound; the processor is further configured to perform the following step:
    matching the extracted feature values against the scene sound models.
  12. The server according to claim 10, wherein the processor is further configured to perform the following steps:
    after acquiring the audio information, determining whether the audio information is multi-type audio information, the multi-type audio information containing multiple types of sound;
    if so, first decomposing the multi-type audio information into at least one piece of single-type audio information, each piece of single-type audio information containing one type of sound, and then performing feature value extraction on each piece of single-type audio information;
    if not, performing feature value extraction directly on the single-type audio information;
    for each piece of single-type audio information, matching the feature values extracted from the single-type audio information against the feature value models in the preset database;
    if the audio information is single-type audio information:
    determining, according to the matching result, a warning level corresponding to the single-type audio information;
    if the audio information is multi-type audio information:
    obtaining a matching result corresponding to each piece of single-type audio information contained in the multi-type audio information;
    determining a weight corresponding to each piece of single-type audio information;
    determining, according to the determined weights and the matching results, a warning level corresponding to the multi-type audio information.
  13. The server according to claim 12, wherein the processor is further configured to perform the following steps:
    segmenting the multi-type audio information into multiple audio segments according to a preset segmentation rule;
    for each audio segment, determining whether the audio segment contains multiple sound types;
    if not, taking the audio segment as one piece of single-type audio information;
    if so, decomposing the audio segment into at least one piece of single-type audio information according to sound parameters in the audio segment, the sound parameters including one or more of the following: pitch, loudness, timbre.
  14. The server according to claim 12, wherein the processor is further configured to perform the following steps:
    when the audio information is determined to be multi-type audio information, matching the multi-type audio information against at least one preset scene sound model;
    determining, according to a matching result, each scene sound contained in the multi-type audio information;
    determining a warning level and a weight corresponding to each scene sound;
    determining speech information contained in the multi-type audio information;
    determining, according to the timbre of the speech information, each piece of single-type audio information corresponding to the speech information;
    determining the warning level corresponding to the multi-type audio information according to the weight and matching result corresponding to each piece of single-type audio information and the warning level and weight corresponding to each scene sound.
  15. The server according to claim 10, wherein the processor is further configured to perform the following steps:
    acquiring a video image and/or geographic location information corresponding to the audio information;
    determining the video image and/or geographic location information as the notification information corresponding to the audio information.
  16. The server according to claim 10, wherein the processor is further configured to perform the following steps:
    before outputting the determined notification information, prompting a user as to whether to output the notification information;
    determining whether rejection information sent by the user is received within a preset time period;
    if not, performing the step of outputting the determined notification information.
  17. The server according to claim 10, wherein the processor is further configured to perform the following steps:
    acquiring simulated audio information of an abnormal event;
    performing feature value extraction on the simulated audio information;
    constructing a feature value model according to the extracted feature values;
    storing the constructed feature value model in the database in correspondence with a user-defined warning level.
  18. The server according to claim 10, wherein the processor is further configured to perform the following steps:
    receiving an add instruction sent by a user;
    extracting feature values of target audio information corresponding to the add instruction;
    constructing a target feature value model according to the feature values of the target audio information;
    adding the target feature value model to the database in correspondence with a warning level contained in the add instruction.
  19. A monitoring system, comprising a server,
    the server being configured to acquire audio information; perform feature value extraction on the audio information; match the extracted feature values against feature value models in a preset database, the database storing correspondences between feature value models and warning levels; determine, according to a matching result, a warning level corresponding to the audio information; determine whether the warning level satisfies a preset condition, and if so, determine notification information corresponding to the audio information; and output the determined notification information.
  20. The system according to claim 19, further comprising an audio capture device,
    the audio capture device being configured to capture audio information and send the captured audio information to the server.
  21. The system according to claim 19, further comprising a video capture device,
    the video capture device being configured to capture video images, determine its own geographic location information, and send the captured video images and the determined geographic location information to the server;
    the server being further configured to, in the process of determining the notification information corresponding to the audio information, determine a video image and geographic location information corresponding to the audio information and add the video image and geographic location information to the notification information.
  22. The system according to claim 19, wherein the server comprises a communication server and a database server, wherein
    the database server is configured to acquire simulated audio information of an abnormal event; perform feature value extraction on the simulated audio information; construct a feature value model according to the extracted feature values; and store the constructed feature value model in a database of the database server in correspondence with a user-defined warning level;
    the communication server is configured to acquire audio information; perform feature value extraction on the audio information; match the extracted feature values against the feature value models in the database of the database server, the database storing correspondences between feature value models and warning levels; determine, according to a matching result, a warning level corresponding to the audio information; determine whether the warning level satisfies a preset condition, and if so, determine notification information corresponding to the audio information; and output the determined notification information.
  23. A computer-readable storage medium having a computer program stored therein, wherein the computer program, when executed by a processor, implements the notification information output method according to any one of claims 1-9.
  24. Executable program code, wherein the executable program code is configured to be run to perform the notification information output method according to any one of claims 1-9.
PCT/CN2018/090388 2017-06-12 2018-06-08 Notification information output method, server and monitoring system WO2018228280A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US16/622,159 US11275628B2 (en) 2017-06-12 2018-06-08 Notification information output method, server and monitoring system
EP18817001.3A EP3640935B1 (en) 2017-06-12 2018-06-08 Notification information output method, server and monitoring system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710436582.1A CN109036461A (zh) 2017-06-12 2017-06-12 Notification information output method, server and monitoring system
CN201710436582.1 2017-06-12

Publications (1)

Publication Number Publication Date
WO2018228280A1 true WO2018228280A1 (zh) 2018-12-20

Family

ID=64630058

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/090388 WO2018228280A1 (zh) 2017-06-12 2018-06-08 Notification information output method, server and monitoring system

Country Status (4)

Country Link
US (1) US11275628B2 (zh)
EP (1) EP3640935B1 (zh)
CN (1) CN109036461A (zh)
WO (1) WO2018228280A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110197663B * 2019-06-30 2022-05-31 Lenovo (Beijing) Co., Ltd. Control method and apparatus, and electronic device
CN110532888A * 2019-08-01 2019-12-03 悉地国际设计顾问(深圳)有限公司 Monitoring method, apparatus and system
CN111028860B * 2019-11-22 2021-08-06 深圳市康冠智能科技有限公司 Audio data processing method and apparatus, computer device, and storage medium
CN111178883A * 2019-12-16 2020-05-19 秒针信息技术有限公司 Abnormality determination method and apparatus, storage medium, and electronic apparatus
CN113838478B * 2020-06-08 2024-04-09 Huawei Technologies Co., Ltd. Abnormal event detection method and apparatus, and electronic device
CN112188427A * 2020-08-19 2021-01-05 Tianjin University IoT sensing system and method for group abnormal events in public places

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7979146B2 (en) * 2006-04-13 2011-07-12 Immersion Corporation System and method for automatically producing haptic events from a digital audio signal
CN101587710B * 2009-07-02 2011-12-14 Beijing Institute of Technology Multi-codebook coding parameter quantization method based on audio burst event classification
CN102014278A * 2010-12-21 2011-04-13 Sichuan University Intelligent video surveillance method based on speech recognition technology
CN103366740B * 2012-03-27 2016-12-14 Lenovo (Beijing) Co., Ltd. Voice command recognition method and apparatus
CN102970438A * 2012-11-29 2013-03-13 Guangdong OPPO Mobile Telecommunications Corp., Ltd. Automatic alarm method and automatic alarm apparatus for a mobile phone
CN104347068B * 2013-08-08 2020-05-22 Sony Corporation Audio signal processing apparatus and method, and monitoring system
CN104036617B * 2014-06-11 2017-05-17 广东安居宝数码科技股份有限公司 Alarm method and alarm system
CN104156297A * 2014-08-07 2014-11-19 Inspur (Beijing) Electronic Information Industry Co., Ltd. Alert method and apparatus
US20160241818A1 (en) * 2015-02-18 2016-08-18 Honeywell International Inc. Automatic alerts for video surveillance systems
CN104795064B * 2015-03-30 2018-04-13 Fuzhou University Method for recognizing sound events in low-SNR acoustic scenes
CN105022835B * 2015-08-14 2018-01-12 Wuhan University Crowdsensing big data public safety recognition method and system
JP6682222B2 2015-09-24 2020-04-15 Canon Inc. Detection device, control method therefor, and computer program
CN106328134A * 2016-08-18 2017-01-11 都伊林 Prison speech data recognition and monitoring early-warning system
CN106683361A * 2017-01-24 2017-05-17 宇龙计算机通信科技(深圳)有限公司 Sound monitoring method and apparatus

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102810311A * 2011-06-01 2012-12-05 Ricoh Co., Ltd. Speaker estimation method and speaker estimation device
CN102521945A * 2011-12-02 2012-06-27 无锡奥盛信息科技有限公司 Call detection alarm method and apparatus
CN103456301A * 2012-05-28 2013-12-18 ZTE Corporation Scene recognition method and apparatus based on ambient sound, and mobile terminal
US20140088960A1 (en) * 2012-09-25 2014-03-27 Seiko Epson Corporation Voice recognition device and method, and semiconductor integrated circuit device
CN103198838A * 2013-03-29 2013-07-10 苏州皓泰视频技术有限公司 Abnormal sound monitoring method and monitoring apparatus for an embedded system
CN104239372A * 2013-06-24 2014-12-24 Zhejiang Dahua Technology Co., Ltd. Audio data classification method and apparatus
CN105812721A * 2014-12-30 2016-07-27 Zhejiang Dahua Technology Co., Ltd. Tracking and monitoring method and tracking and monitoring device
CN105679313A * 2016-04-15 2016-06-15 福建新恒通智能科技有限公司 Audio recognition alarm system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3640935A4

Also Published As

Publication number Publication date
US20200364097A1 (en) 2020-11-19
EP3640935B1 (en) 2024-02-14
US11275628B2 (en) 2022-03-15
EP3640935A4 (en) 2020-06-17
CN109036461A (zh) 2018-12-18
EP3640935A1 (en) 2020-04-22

Similar Documents

Publication Publication Date Title
WO2018228280A1 (zh) Notification information output method, server and monitoring system
US11178275B2 (en) Method and apparatus for detecting abnormality of caller
US10275210B2 (en) Privacy protection in collective feedforward
Marchi et al. A novel approach for automatic acoustic novelty detection using a denoising autoencoder with bidirectional LSTM neural networks
US11941968B2 (en) Systems and methods for identifying an acoustic source based on observed sound
Droghini et al. A Combined One‐Class SVM and Template‐Matching Approach for User‐Aided Human Fall Detection by Means of Floor Acoustic Features
CN108696768A (zh) Speech recognition method and system
CN108320757B (zh) Delivery information prompting method and apparatus, smart speaker, and storage medium
CN110800053A (zh) Method and device for obtaining event indication based on audio data
CN109671430A (zh) Speech processing method and apparatus
CN117234455B (zh) Intelligent control method and system for audio devices based on environment awareness
CN110782622A (zh) Security monitoring system, security detection method and apparatus, and electronic device
US11133020B2 (en) Assistive technology
US20230052442A1 (en) Analyzing Objects Data to Generate a Textual Content Reporting Events
WO2019187107A1 (ja) Information processing device, control method, and program
CN108694388B (zh) Campus monitoring method and device based on smart camera
KR102100304B1 (ko) Snake identification method using image patterning
KR20170087225A (ko) Apparatus, method and recording medium for providing animal voice analysis information
CN115862682B (zh) Sound detection method and related device
US20240221764A1 (en) Sound detection method and related device
US20120291051A1 (en) Generating Event Definitions Based on Spatial and Relational Relationships
Goel et al. Audio Dialogues: Dialogues dataset for audio and music understanding
US20230317086A1 (en) Privacy-preserving sound representation
JP2014002336A (ja) Content processing device, content processing method, and computer program
US11145320B2 (en) Privacy protection in collective feedforward

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18817001

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2018817001

Country of ref document: EP

Effective date: 20200113